LLM clients — Overview

Every Agent, GroundedAgent, and LLMClassifier accepts an LLMClient. The protocol is a single streaming method, so you can swap the underlying model or provider without touching any orchestration code.

Built-in: ChatCompletionsClient — works with OpenAI and any compatible endpoint.
Custom: Writing a custom LLM connector or transport

LLMClient protocol

public protocol LLMClient: Sendable {
    func complete(_ request: LLMRequest) -> AsyncThrowingStream<LLMStreamEvent, any Error>
}

complete streams the events for one model turn. A caller iterates until .done, runs any requested tools, and calls complete again with the results. Agent and GroundedAgent run this tool loop for you.
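
To make the shape of the protocol concrete, here is a minimal sketch of a conforming client that replays a fixed script, which can be handy as a test double. ScriptedClient and collectText are hypothetical names, and the type declarations at the top are trimmed stand-ins for the library types (messages is a plain [String] here rather than [ConversationMessage]) so that the snippet compiles on its own.

```swift
// Stand-ins that mirror the declarations on this page, trimmed so the sketch
// compiles on its own. Delete them when you import the real library.
enum FinishReason: Sendable, Equatable {
    case stop, toolCalls, length, contentFilter
    case other(String)
}

enum LLMStreamEvent: Sendable {
    case textDelta(String)
    case done(reason: FinishReason)   // usage and .toolCall omitted for brevity
}

struct LLMRequest: Sendable {
    let system: String?
    let messages: [String]            // stand-in for [ConversationMessage]
}

protocol LLMClient: Sendable {
    func complete(_ request: LLMRequest) -> AsyncThrowingStream<LLMStreamEvent, any Error>
}

// Hypothetical test double: replays a fixed script of text chunks.
struct ScriptedClient: LLMClient {
    let chunks: [String]

    func complete(_ request: LLMRequest) -> AsyncThrowingStream<LLMStreamEvent, any Error> {
        AsyncThrowingStream { continuation in
            for chunk in chunks {
                continuation.yield(.textDelta(chunk))
            }
            // Close the turn with exactly one .done, then end the stream.
            continuation.yield(.done(reason: .stop))
            continuation.finish()
        }
    }
}

// Hypothetical helper: joins the text deltas of a single turn.
func collectText(from client: any LLMClient, request: LLMRequest) async throws -> String {
    var text = ""
    for try await event in client.complete(request) {
        if case .textDelta(let chunk) = event { text += chunk }
    }
    return text
}
```

Because the helper depends only on the protocol, the same call works unchanged with any other conforming client.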

LLMRequest

public struct LLMRequest: Sendable {
    public let system: String?
    public let messages: [ConversationMessage]
    public let tools: [AgentTool]

    public init(
        system: String? = nil,
        messages: [ConversationMessage],
        tools: [AgentTool] = []
    )
}

See Messages & events for ConversationMessage, and Tools for AgentTool.

LLMStreamEvent

public enum LLMStreamEvent: Sendable {
    case textDelta(String)
    case toolCall(id: String, name: String, arguments: JSONValue)
    case done(reason: FinishReason, usage: LLMUsage?)
}

Events arrive in order: zero or more .textDelta and .toolCall events, then exactly one .done. Multiple .toolCall events may appear before .done when the model requests parallel calls.
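
The ordering rule suggests one way to buffer a turn before acting on it: gather text and tool calls as they arrive, then decide what to do once .done shows up. The sketch below assumes simplified stand-in types (arguments is a String instead of JSONValue, and usage is dropped). PendingToolCall, TurnSummary, and summarizeTurn are hypothetical names, not part of the library.

```swift
// Stand-ins that mirror the declarations on this page, trimmed so the sketch
// compiles on its own. Delete them when you import the real library.
enum FinishReason: Sendable, Equatable {
    case stop, toolCalls, length, contentFilter
    case other(String)
}

enum LLMStreamEvent: Sendable {
    case textDelta(String)
    case toolCall(id: String, name: String, arguments: String) // String stands in for JSONValue
    case done(reason: FinishReason)                            // usage omitted for brevity
}

// Hypothetical record of one requested tool call.
struct PendingToolCall: Equatable {
    let id: String
    let name: String
    let arguments: String
}

// Hypothetical summary of everything one turn produced.
struct TurnSummary {
    var text = ""
    var toolCalls: [PendingToolCall] = []
    var finishReason: FinishReason?   // stays nil if the stream ended without .done
}

// Drains one turn, keeping tool calls in the order the model emitted them.
func summarizeTurn<S: AsyncSequence>(_ events: S) async throws -> TurnSummary
where S.Element == LLMStreamEvent {
    var summary = TurnSummary()
    for try await event in events {
        switch event {
        case .textDelta(let chunk):
            summary.text += chunk
        case .toolCall(let id, let name, let arguments):
            summary.toolCalls.append(PendingToolCall(id: id, name: name, arguments: arguments))
        case .done(let reason):
            summary.finishReason = reason
        }
    }
    return summary
}
```

With parallel calls, toolCalls holds more than one entry by the time finishReason is set, so a caller can run them together before re-invoking complete.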

FinishReason

public enum FinishReason: Sendable, Equatable {
    case stop           // model finished its answer
    case toolCalls      // stopped to request tool calls
    case length         // truncated by token limit
    case contentFilter  // refused or filtered
    case other(String)  // provider-specific value
}
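
A caller that drives the stream itself usually branches on the finish reason. The mapping below is only an illustrative sketch: NextStep and nextStep(after:) are hypothetical, and the FinishReason declaration is repeated so that the snippet compiles on its own.

```swift
// Mirrors the declaration above. Delete it when you import the real library.
enum FinishReason: Sendable, Equatable {
    case stop
    case toolCalls
    case length
    case contentFilter
    case other(String)
}

// Hypothetical follow-up actions a caller might take after a turn.
enum NextStep: Equatable {
    case returnAnswer            // the reply is complete
    case runToolsAndContinue     // execute the requested tools, then call complete again
    case warnTruncated           // the reply was cut off by the token limit
    case reportRefusal           // the provider refused or filtered the reply
    case logUnknown(String)      // provider-specific reason, kept for diagnostics
}

// Maps each finish reason to one follow-up action.
func nextStep(after reason: FinishReason) -> NextStep {
    switch reason {
    case .stop:
        return .returnAnswer
    case .toolCalls:
        return .runToolsAndContinue
    case .length:
        return .warnTruncated
    case .contentFilter:
        return .reportRefusal
    case .other(let raw):
        return .logUnknown(raw)
    }
}
```

The switch has no default case on purpose, so the compiler flags this function if a new finish reason is ever added.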

LLMUsage

public struct LLMUsage: Sendable, Equatable {
    public let promptTokens: Int?
    public let completionTokens: Int?
}

Usage is reported on .done when the provider includes it in the stream. See Tracing for how token counts surface in traces.
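
Since the usage value and both of its counts are optional, code that totals tokens across turns has to tolerate missing data. A minimal sketch, assuming a hypothetical UsageTotals accumulator that is not part of the library (the LLMUsage declaration is repeated so that the snippet compiles on its own):

```swift
// Mirrors the declaration above. Delete it when you import the real library.
struct LLMUsage: Sendable, Equatable {
    let promptTokens: Int?
    let completionTokens: Int?
}

// Hypothetical accumulator for token counts across several turns.
struct UsageTotals: Equatable {
    var promptTokens = 0
    var completionTokens = 0
    var turnsWithoutUsage = 0   // turns whose .done carried no usage at all

    var totalTokens: Int { promptTokens + completionTokens }

    // Adds one turn's usage. A missing count contributes zero.
    mutating func add(_ usage: LLMUsage?) {
        guard let usage = usage else {
            turnsWithoutUsage += 1
            return
        }
        promptTokens += usage.promptTokens ?? 0
        completionTokens += usage.completionTokens ?? 0
    }
}
```

Tracking turnsWithoutUsage separately keeps a total of zero from being mistaken for a free request when the provider simply did not report counts.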

ResponseFormat

ResponseFormat is defined alongside ChatCompletionsClient but applies to any client that supports structured output:

public enum ResponseFormat: Sendable, Equatable {
    case text
    case json
    case jsonSchema(name: String, schema: JSONValue, strict: Bool = true)
}

.text — omits response_format from the request body (default prose output).
.json — sends {"type": "json_object"}. The model returns valid JSON but no schema is enforced.
.jsonSchema(name:schema:strict:) — sends {"type": "json_schema", ...} with your schema. strict defaults to true.

Consuming the stream directly

If you need raw stream access outside an Agent, iterate complete yourself:

let request = LLMRequest(
    system: "You are a helpful assistant.",
    messages: [ConversationMessage(role: .user, parts: [.text("Hello")])]
)

// llm is any LLMClient value. The loop must run in an async throwing context.
for try await event in llm.complete(request) {
    switch event {
    case .textDelta(let text):
        print(text, terminator: "")
    case .toolCall(let id, let name, let arguments):
        print("\nTool call: \(name) [\(id)] args=\(arguments)")
    case .done(let reason, let usage):
        print("\nDone: \(reason), tokens: \(String(describing: usage))")
    }
}

Next steps

Use the built-in ChatCompletionsClient for OpenAI and compatible providers.
Write a custom connector or transport for non-OpenAI backends or test mocks.