
Audio overview

AgentSquadAudio is a separate Swift package product that ships two AVFoundation-backed implementations, MicCapture and AudioPlayback, which conform to two protocols declared in the core AgentSquad module.

import AgentSquad // AudioInput, AudioOutput protocols
import AgentSquadAudio // MicCapture, AudioPlayback

Both protocols are declared in AgentSquad (not AgentSquadAudio), so you can write custom conformances, unit-test stubs, or file-based implementations without pulling in AVFoundation.

public protocol AudioInput: Sendable {
    /// Captured PCM16 chunks. Bounded drop-oldest: a slow consumer never blocks the audio thread.
    /// Finishes when capture stops.
    var frames: AsyncStream<Data> { get }
    func start() async throws
    func stop() async
}

start() should install the capture source and begin yielding frames. stop() should halt capture and finish the frames stream (by calling finish() on the AsyncStream continuation) so that consumers exit their for await loop cleanly.
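As a rough illustration of the contract above, here is a minimal sketch of a stub conformance that replays fixed chunks instead of capturing audio. StubAudioInput, its chunks parameter, and bufferLimit are hypothetical names introduced for this example. The protocol is redeclared locally only so the snippet compiles on its own, and AsyncStream.makeStream assumes Swift 5.9 or later.

```swift
import Foundation

// Local copy of the protocol shown above so this sketch is self-contained;
// in a real project you would `import AgentSquad` instead.
protocol AudioInput: Sendable {
    var frames: AsyncStream<Data> { get }
    func start() async throws
    func stop() async
}

/// Hypothetical test stub: replays a fixed list of PCM16 chunks.
final class StubAudioInput: AudioInput {
    let frames: AsyncStream<Data>
    private let continuation: AsyncStream<Data>.Continuation
    private let chunks: [Data]

    init(chunks: [Data], bufferLimit: Int = 8) {
        self.chunks = chunks
        // .bufferingNewest(n) keeps at most n elements; when the buffer is
        // full, the oldest buffered element is discarded (drop-oldest).
        let (stream, continuation) = AsyncStream.makeStream(
            of: Data.self,
            bufferingPolicy: .bufferingNewest(bufferLimit)
        )
        self.frames = stream
        self.continuation = continuation
    }

    func start() async throws {
        // yield(_:) never suspends, so a slow consumer cannot block the producer.
        for chunk in chunks {
            continuation.yield(chunk)
        }
    }

    func stop() async {
        // Ends the stream so `for await` loops over `frames` terminate.
        continuation.finish()
    }
}
```

With a bufferLimit of 2 and three chunks yielded before any consumer starts reading, only the last two chunks should reach the consumer, and the for await loop ends once stop() has been called.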

public protocol AudioOutput: Sendable {
    func start() async throws
    func enqueue(_ pcm16: Data) async
    func flush() async
    func stop() async
}

enqueue schedules one PCM16 frame and returns without waiting for it to finish playing; implementations must not throttle the caller to real-time playback speed. flush is the barge-in primitive: it immediately discards all queued and in-flight audio.
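The enqueue and flush semantics can be sketched with an in-memory sink that plays nothing. InMemoryAudioOutput and its pending and isRunning properties are hypothetical names for this example, and dropping frames while stopped is an assumption rather than documented behavior. The protocol is redeclared locally so the snippet stands alone.

```swift
import Foundation

// Local copy of the protocol shown above so this sketch is self-contained;
// in a real project you would `import AgentSquad` instead.
protocol AudioOutput: Sendable {
    func start() async throws
    func enqueue(_ pcm16: Data) async
    func flush() async
    func stop() async
}

/// Hypothetical in-memory sink. An actor is implicitly Sendable and
/// serializes access to its state without explicit locks.
actor InMemoryAudioOutput: AudioOutput {
    private(set) var pending: [Data] = []
    private(set) var isRunning = false

    func start() async throws {
        isRunning = true
    }

    func enqueue(_ pcm16: Data) async {
        // Assumption for this sketch: frames arriving while stopped are dropped.
        guard isRunning else { return }
        // Only records the frame; it never waits for playback to complete.
        pending.append(pcm16)
    }

    func flush() async {
        // Barge-in: discard everything that has not been played yet.
        pending.removeAll()
    }

    func stop() async {
        pending.removeAll()
        isRunning = false
    }
}
```

A real output would hand each frame to a player rather than an array, but the shape is the same: enqueue returns as soon as the frame is scheduled, and flush empties whatever is still waiting.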


RealtimeRuntime accepts an AudioInput and an AudioOutput at construction time:

let runtime = RealtimeRuntime(
    input: MicCapture(),
    output: AudioPlayback(),
    // ... other config
)

The runtime drives start, stop, enqueue, and flush from its single event pump, so implementations are never called concurrently by the runtime itself.


| Type | Protocol | Description |
| --- | --- | --- |
| MicCapture | AudioInput | AVAudioEngine tap → PCM16 @ 24 kHz, with iOS permission gating |
| AudioPlayback | AudioOutput | AVAudioEngine + AVAudioPlayerNode, with barge-in flush |

You can replace either built-in with any conforming type — useful for tests, file-based input, or platforms where AVFoundation is unavailable.

See Custom audio for worked examples of a file-replay AudioInput and a recording AudioOutput test sink.