Audio overview
AgentSquadAudio is a separate Swift package product that ships two AVFoundation-backed implementations — MicCapture and AudioPlayback — built on top of two protocols declared in the core AgentSquad module.
```swift
import AgentSquad       // AudioInput, AudioOutput protocols
import AgentSquadAudio  // MicCapture, AudioPlayback
```

Protocols
Both protocols are declared in AgentSquad (not AgentSquadAudio), so you can write custom conformances, unit-test stubs, or file-based implementations without pulling in AVFoundation.
AudioInput
```swift
public protocol AudioInput: Sendable {
    /// Captured PCM16 chunks. Bounded drop-oldest: a slow consumer never blocks the audio thread.
    /// Finishes when capture stops.
    var frames: AsyncStream<Data> { get }

    func start() async throws
    func stop() async
}
```

start() should install the capture source and begin yielding frames. stop() should halt capture and call continuation.finish() so consumers exit their for await loop cleanly.
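The bounded drop-oldest behaviour described for frames can be approximated with the standard library's `.bufferingNewest` policy on AsyncStream. The following is a minimal sketch under that assumption, not a description of how MicCapture is implemented; `makeFrameStream` and `frameStreamDemo` are hypothetical names, and the capacity of 2 is chosen only to keep the example small.

```swift
import Foundation

// Hypothetical helper (not part of AgentSquad): builds a bounded stream that
// keeps only the newest `capacity` chunks when the consumer falls behind.
func makeFrameStream(
    capacity: Int
) -> (stream: AsyncStream<Data>, continuation: AsyncStream<Data>.Continuation) {
    var captured: AsyncStream<Data>.Continuation!
    // The build closure runs synchronously inside the initializer,
    // so `captured` is set before this function returns.
    let stream = AsyncStream<Data>(bufferingPolicy: .bufferingNewest(capacity)) { continuation in
        captured = continuation
    }
    return (stream, captured)
}

// Yields four one-byte chunks before anyone consumes, then finishes the stream.
func frameStreamDemo() async -> [UInt8] {
    let (stream, continuation) = makeFrameStream(capacity: 2)
    let bytes: [UInt8] = [1, 2, 3, 4]
    for byte in bytes {
        continuation.yield(Data([byte]))   // never blocks the producer
    }
    continuation.finish()                  // what stop() is expected to do

    var seen: [UInt8] = []
    for await chunk in stream {            // exits once buffered chunks are drained
        seen.append(chunk[0])
    }
    return seen                            // the two oldest chunks were dropped
}
```

Yielding never suspends the producer, which is the property that keeps a slow consumer from stalling the audio thread; the cost is that stale chunks are silently discarded.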
AudioOutput
```swift
public protocol AudioOutput: Sendable {
    func start() async throws
    func enqueue(_ pcm16: Data) async
    func flush() async
    func stop() async
}
```

enqueue schedules one PCM16 frame without waiting for it to finish playing; implementations must not serialize to real-time playback speed. flush is the barge-in primitive: it discards all queued or in-flight audio immediately.
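To illustrate the enqueue and flush contract, here is a minimal sketch of an in-memory sink that records frames instead of playing them. `RecordingOutput` and `recordingOutputDemo` are hypothetical names, and the protocol is redeclared locally (without `public`) only so the sketch compiles on its own; in a real project you would import AgentSquad instead. Ignoring frames before start() is an assumption of this sketch, not documented behaviour.

```swift
import Foundation

// Local copy of the protocol described above, so this sketch is self-contained.
protocol AudioOutput: Sendable {
    func start() async throws
    func enqueue(_ pcm16: Data) async
    func flush() async
    func stop() async
}

// Hypothetical test sink: an actor keeps the queue safe without manual locking.
actor RecordingOutput: AudioOutput {
    private(set) var queued: [Data] = []
    private(set) var flushCount = 0
    private var running = false

    func start() async throws { running = true }

    // Returns as soon as the frame is stored; it never waits for "playback".
    func enqueue(_ pcm16: Data) async {
        guard running else { return }   // assumption: drop frames while stopped
        queued.append(pcm16)
    }

    // Barge-in: discard everything that has not been consumed yet.
    func flush() async {
        queued.removeAll()
        flushCount += 1
    }

    func stop() async { running = false }
}

// Enqueues two frames, then flushes, reporting the queue size before and after.
func recordingOutputDemo() async throws -> (beforeFlush: Int, afterFlush: Int) {
    let sink = RecordingOutput()
    try await sink.start()
    await sink.enqueue(Data([0, 0]))
    await sink.enqueue(Data([1, 0]))
    let before = await sink.queued.count
    await sink.flush()
    let after = await sink.queued.count
    await sink.stop()
    return (before, after)
}
```

A sink like this lets a unit test assert that a barge-in actually emptied the queue, without touching AVFoundation.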
How they feed the voice runtime
RealtimeRuntime accepts an AudioInput and an AudioOutput at construction time:

```swift
let runtime = RealtimeRuntime(
    input: MicCapture(),
    output: AudioPlayback(),
    // ... other config
)
```

The runtime drives start, stop, enqueue, and flush from its single event pump, so implementations are never called concurrently by the runtime itself.
Built-in implementations
| Type | Protocol | Description |
|---|---|---|
| MicCapture | AudioInput | AVAudioEngine tap → PCM16 @ 24 kHz, with iOS permission gating |
| AudioPlayback | AudioOutput | AVAudioEngine + AVAudioPlayerNode, with barge-in flush |
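For custom implementations that need to produce or size PCM16 data at the 24 kHz rate listed in the table, the arithmetic can be sketched as follows. The mono channel layout and little-endian byte order are assumptions here (the table states only the sample format and rate), and `PCM16`, `pcm16ByteCount`, and `pcm16Data` are hypothetical names.

```swift
import Foundation

// Assumptions, not stated in the source: mono audio, little-endian samples.
enum PCM16 {
    static let sampleRate = 24_000   // Hz, from the table above
    static let bytesPerSample = 2    // 16 bits per sample
}

// Bytes needed for a mono chunk of the given duration.
func pcm16ByteCount(milliseconds: Int) -> Int {
    PCM16.sampleRate * milliseconds / 1000 * PCM16.bytesPerSample
}

// Converts Float samples in -1...1 to PCM16, clamping out-of-range values.
func pcm16Data(from samples: [Float]) -> Data {
    var data = Data(capacity: samples.count * PCM16.bytesPerSample)
    for sample in samples {
        let clamped = max(-1, min(1, sample))
        let value = Int16((clamped * Float(Int16.max)).rounded())
        // Append the two bytes of the sample, low byte first.
        withUnsafeBytes(of: value.littleEndian) { data.append(contentsOf: $0) }
    }
    return data
}
```

Under these assumptions a 20 ms chunk holds 480 samples, or 960 bytes.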
Custom implementations
You can replace either built-in with any conforming type, which is useful for tests, file-based input, or platforms where AVFoundation is unavailable.
See Custom audio for worked examples of a file-replay AudioInput and a recording AudioOutput test sink.
Related pages
- Voice overview: the RealtimeRuntime that consumes these protocols
- OpenAI Voice agent: built-in agent wired to the voice runtime
- Custom audio