LLM clients — Overview
Every Agent, GroundedAgent, and LLMClassifier accepts an LLMClient. The protocol is a single streaming method; swap the underlying model or provider without touching any orchestration code.
- Built-in: ChatCompletionsClient — works with OpenAI and any compatible endpoint.
- Custom: Writing a custom LLM connector or transport
LLMClient protocol

    public protocol LLMClient: Sendable {
        func complete(_ request: LLMRequest) -> AsyncThrowingStream<LLMStreamEvent, any Error>
    }

complete streams events for one turn. The caller iterates until .done, runs any requested tools, and re-invokes complete. Agent and GroundedAgent handle this tool loop for you.
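Because the protocol is a single method, a test double can be very small. The following is a minimal sketch rather than the library's own mock: ScriptedClient and collectText are hypothetical names, and the stand-in types are simplified (messages is a plain [String] here, and .done carries only a reason) so that the block compiles on its own.

```swift
// Stand-in types so the sketch compiles without the library.
// These are simplified assumptions, not the real declarations.
struct LLMRequest: Sendable {
    let system: String?
    let messages: [String]   // the real type uses [ConversationMessage]
}

enum FinishReason: Sendable, Equatable { case stop, toolCalls }

enum LLMStreamEvent: Sendable {
    case textDelta(String)
    case done(reason: FinishReason)
}

protocol LLMClient: Sendable {
    func complete(_ request: LLMRequest) -> AsyncThrowingStream<LLMStreamEvent, any Error>
}

// Hypothetical test double: replays fixed text chunks, then emits .done.
struct ScriptedClient: LLMClient {
    let chunks: [String]

    func complete(_ request: LLMRequest) -> AsyncThrowingStream<LLMStreamEvent, any Error> {
        AsyncThrowingStream { continuation in
            for chunk in chunks {
                continuation.yield(.textDelta(chunk))
            }
            // Exactly one .done closes the turn, then the stream finishes.
            continuation.yield(.done(reason: .stop))
            continuation.finish()
        }
    }
}

// Hypothetical helper: concatenates the text deltas of one turn.
func collectText(from client: some LLMClient, request: LLMRequest) async throws -> String {
    var text = ""
    for try await event in client.complete(request) {
        if case .textDelta(let delta) = event {
            text += delta
        }
    }
    return text
}
```

Any code that takes an LLMClient can be handed a double like this in tests, which is the point of keeping the protocol to one streaming method.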
LLMRequest

    public struct LLMRequest: Sendable {
        public let system: String?
        public let messages: [ConversationMessage]
        public let tools: [AgentTool]

        public init(
            system: String? = nil,
            messages: [ConversationMessage],
            tools: [AgentTool] = []
        )
    }

See Messages & events for ConversationMessage, and Tools for AgentTool.
LLMStreamEvent

    public enum LLMStreamEvent: Sendable {
        case textDelta(String)
        case toolCall(id: String, name: String, arguments: JSONValue)
        case done(reason: FinishReason, usage: LLMUsage?)
    }

Events arrive in order: zero or more .textDelta and .toolCall events, then exactly one .done. Multiple .toolCall events may appear before .done when the model requests parallel calls.
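The ordering guarantee means a consumer can fold one turn into a single value: append text as it arrives, queue every tool call, and record the finish reason when .done shows up. The sketch below illustrates that idea under stated assumptions: TurnResult and collectTurn are hypothetical names, arguments is simplified to a String because JSONValue belongs to the library, and .done omits the usage payload.

```swift
// Simplified stand-ins for the library types (assumptions, see the lead-in).
enum FinishReason: Sendable, Equatable {
    case stop, toolCalls, length, contentFilter
    case other(String)
}

enum LLMStreamEvent: Sendable {
    case textDelta(String)
    case toolCall(id: String, name: String, arguments: String)
    case done(reason: FinishReason)
}

// Hypothetical accumulator for one turn.
struct TurnResult: Equatable {
    var text = ""
    var toolNames: [String] = []          // requested tools, in arrival order
    var finishReason: FinishReason? = nil
}

// Folds a stream of events into a TurnResult.
func collectTurn<S: AsyncSequence>(_ stream: S) async throws -> TurnResult
where S.Element == LLMStreamEvent {
    var result = TurnResult()
    for try await event in stream {
        switch event {
        case .textDelta(let delta):
            result.text += delta
        case .toolCall(_, let name, _):
            // Several of these may arrive before .done (parallel calls).
            result.toolNames.append(name)
        case .done(let reason):
            result.finishReason = reason
        }
    }
    return result
}
```

A caller would then run the queued tools only if finishReason is .toolCalls, and treat any other reason as the end of the turn.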
FinishReason

    public enum FinishReason: Sendable, Equatable {
        case stop           // model finished its answer
        case toolCalls      // stopped to request tool calls
        case length         // truncated by token limit
        case contentFilter  // refused or filtered
        case other(String)  // provider-specific value
    }

LLMUsage

    public struct LLMUsage: Sendable, Equatable {
        public let promptTokens: Int?
        public let completionTokens: Int?
    }

Usage is reported on .done when the provider includes it in the stream. See Tracing for how token counts surface in traces.
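Both the usage value and its fields are optional, so code that reads token counts needs to handle missing data, and code that switches on the finish reason needs a branch for provider-specific values. A minimal sketch, assuming local copies of the two types as declared on this page; totalTokens and describe are hypothetical helpers, not library functions.

```swift
// Local copies of the declarations so the sketch compiles on its own.
enum FinishReason: Sendable, Equatable {
    case stop, toolCalls, length, contentFilter
    case other(String)
}

struct LLMUsage: Sendable, Equatable {
    let promptTokens: Int?
    let completionTokens: Int?
}

// Hypothetical helper: returns the total only when both counts are present.
func totalTokens(_ usage: LLMUsage?) -> Int? {
    guard let prompt = usage?.promptTokens,
          let completion = usage?.completionTokens else {
        return nil
    }
    return prompt + completion
}

// Hypothetical helper: a short label for logging each finish reason.
func describe(_ reason: FinishReason) -> String {
    switch reason {
    case .stop: return "complete"
    case .toolCalls: return "needs tool results"
    case .length: return "truncated"
    case .contentFilter: return "filtered"
    case .other(let raw): return "provider-specific: \(raw)"
    }
}
```

Returning nil instead of a partial sum keeps a missing count from being mistaken for zero tokens.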
ResponseFormat
ResponseFormat is defined alongside ChatCompletionsClient but applies to any client that supports structured output:

    public enum ResponseFormat: Sendable, Equatable {
        case text
        case json
        case jsonSchema(name: String, schema: JSONValue, strict: Bool = true)
    }

- .text: omits response_format from the request body (default prose output).
- .json: sends {"type": "json_object"}. The model returns valid JSON but no schema is enforced.
- .jsonSchema(name:schema:strict:): sends {"type": "json_schema", ...} with your schema. strict defaults to true.
Consuming the stream directly
If you need raw stream access outside an Agent, iterate complete yourself:

    let request = LLMRequest(
        system: "You are a helpful assistant.",
        messages: [ConversationMessage(role: .user, parts: [.text("Hello")])]
    )

    for try await event in llm.complete(request) {
        switch event {
        case .textDelta(let text):
            print(text, terminator: "")
        case .toolCall(let id, let name, let arguments):
            print("\nTool call: \(name) [\(id)] args=\(arguments)")
        case .done(let reason, let usage):
            print("\nDone: \(reason), tokens: \(String(describing: usage))")
        }
    }

Next steps
- Use the built-in ChatCompletionsClient for OpenAI and compatible providers.
- Write a custom connector or transport for non-OpenAI backends or test mocks.