
ChatCompletionsClient

ChatCompletionsClient implements LLMClient for any provider that speaks the OpenAI chat-completions SSE format: OpenAI, Azure OpenAI, OpenRouter, Together AI, Groq, Fireworks, Ollama, llama.cpp, LiteLLM, and others. Switch providers by changing baseURL.
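Here is what that SSE format looks like on the wire. Each streamed event is a "data:" line carrying a JSON chunk, and choices[].delta.content in that chunk holds the next piece of text. The stream ends with "data: [DONE]". The sketch below is a deliberately simplified parser and not the client's own. It ignores multi-line data fields, tool-call deltas and error envelopes. The Chunk, ChunkChoice and ChunkDelta types and the collectText function are hypothetical.

```swift
import Foundation

// Hypothetical minimal chunk model: only the fields this sketch reads.
struct ChunkDelta: Decodable { let content: String? }
struct ChunkChoice: Decodable { let delta: ChunkDelta? }
struct Chunk: Decodable { let choices: [ChunkChoice] }

/// Concatenates delta.content from a complete SSE body (simplified sketch).
func collectText(fromSSE body: String) -> String {
    var text = ""
    let decoder = JSONDecoder()
    for line in body.split(whereSeparator: \.isNewline) {
        // Skip blank separators, comments (":"), and event:/id: fields.
        guard line.hasPrefix("data:") else { continue }
        let payload = line.dropFirst(5).trimmingCharacters(in: .whitespaces)
        if payload == "[DONE]" { break }  // end-of-stream sentinel
        // Lines that are not valid chunk JSON are skipped, not fatal.
        guard let chunk = try? decoder.decode(Chunk.self, from: Data(payload.utf8)) else { continue }
        for choice in chunk.choices {
            text += choice.delta?.content ?? ""  // role-only deltas contribute nothing
        }
    }
    return text
}
```

A body with no parseable "data:" lines yields an empty string. An HTML gateway page is one example. This is the situation the client reports as .emptyStream.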

public init(
    baseURL: URL = URL(string: "https://api.openai.com/v1")!,
    model: String,
    apiKey: String? = nil,
    headers: [String: String] = [:],
    responseFormat: ResponseFormat = .text,
    extraBody: [String: JSONValue] = [:],
    maxRetries: Int = 2,
    retryDelay: Duration = .milliseconds(250),
    transport: any ChatCompletionsTransport = URLSessionEventStream()
)

The client appends /chat/completions to baseURL automatically. Pass the API root (for example https://api.openai.com/v1), not the full endpoint URL.
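The sketch below shows the resulting endpoint. It is a hypothetical helper, not the library's code. The trailing-slash trimming is an assumption about how a robust join would behave, and the library may handle "…/v1/" differently.

```swift
import Foundation

/// Hypothetical helper mirroring the documented rule: baseURL + "/chat/completions".
func chatCompletionsEndpoint(for baseURL: URL) -> URL {
    var base = baseURL.absoluteString
    // Assumption: tolerate a trailing slash so ".../v1/" and ".../v1" agree.
    while base.hasSuffix("/") { base.removeLast() }
    return URL(string: base + "/chat/completions")!
}
```

Under this rule, a baseURL that already ends in /chat/completions would produce .../chat/completions/chat/completions. That is why baseURL should be the API root.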

| Parameter      | Default                   | Notes |
|----------------|---------------------------|-------|
| baseURL        | https://api.openai.com/v1 | Override for any compatible provider |
| model          | none                      | Required; provider model identifier |
| apiKey         | nil                       | Sent as Authorization: Bearer <key> |
| headers        | [:]                       | Merged after the built-in headers; use for provider-specific auth (e.g. api-key on Azure) |
| responseFormat | .text                     | See ResponseFormat |
| extraBody      | [:]                       | Arbitrary top-level body keys (temperature, max_tokens, seed, …); applied last, so they can override defaults |
| maxRetries     | 2                         | Retries only before the first streamed event, and only for URLError, 429, or 5xx |
| retryDelay     | 250 ms                    | Linear backoff: delay × attempt number |
| transport      | URLSessionEventStream()   | See Custom transport |
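The maxRetries and retryDelay rows together describe a retry policy. The sketch below spells it out. AttemptFailure, shouldRetry and retryDelays are hypothetical names. Delays are plain milliseconds rather than Duration to keep the example free of platform availability checks. It also assumes "attempt number" is 1-based, so the first retry waits retryDelay.

```swift
/// Hypothetical classification of a failed attempt; the real client works
/// with URLError and ChatCompletionsError directly.
enum AttemptFailure {
    case network            // stands in for any URLError
    case httpStatus(Int)    // a non-2xx response
    case emptyStream        // 200 with nothing parseable (documented as not retried)
}

/// Documented policy: retry only before the first streamed event,
/// and only for network errors, 429, or 5xx.
func shouldRetry(_ failure: AttemptFailure, afterFirstEvent: Bool) -> Bool {
    if afterFirstEvent { return false }  // output already reached the caller
    switch failure {
    case .network:
        return true
    case .httpStatus(let code):
        return code == 429 || (500...599).contains(code)
    case .emptyStream:
        return false
    }
}

/// Linear backoff: retry n (1-based) waits retryDelay × n.
func retryDelays(maxRetries: Int, retryDelayMs: Int) -> [Int] {
    (0..<max(0, maxRetries)).map { retryDelayMs * ($0 + 1) }
}
```

With the defaults (maxRetries 2, retryDelay 250 ms), a request makes at most three attempts. Under the 1-based assumption it waits 250 ms before the second and 500 ms before the third.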
A minimal client for OpenAI, reading the key from the environment:

let llm = ChatCompletionsClient(
    model: "gpt-4o-mini",
    apiKey: ProcessInfo.processInfo.environment["OPENAI_API_KEY"]
)

Change baseURL to point at any OpenAI-compatible endpoint:

// Groq
let groq = ChatCompletionsClient(
    baseURL: URL(string: "https://api.groq.com/openai/v1")!,
    model: "llama3-8b-8192",
    apiKey: groqKey
)

// Local Ollama (no apiKey needed)
let local = ChatCompletionsClient(
    baseURL: URL(string: "http://localhost:11434/v1")!,
    model: "llama3.2"
)
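Some providers authenticate differently. Azure OpenAI, for instance, expects an api-key header, which you supply through headers. The sketch below shows the documented merge order: built-in headers first, caller headers last. requestHeaders is a hypothetical function. The built-in Content-Type and Accept values are assumptions, and the library's actual set may differ. A plain dictionary only overrides on an exact, case-sensitive name match.

```swift
/// Hypothetical sketch: built-in headers first, then the caller's `headers`,
/// so a caller entry with the same name replaces the built-in one.
func requestHeaders(apiKey: String?, headers: [String: String]) -> [String: String] {
    // Assumed built-ins for a streaming JSON request.
    var result = [
        "Content-Type": "application/json",
        "Accept": "text/event-stream",
    ]
    // apiKey, when present, becomes a bearer token.
    if let key = apiKey {
        result["Authorization"] = "Bearer \(key)"
    }
    // Merge caller headers last; on a clash keep the caller's value.
    result.merge(headers) { _, callerValue in callerValue }
    return result
}
```

For Azure, leave apiKey as nil and pass ["api-key": azureKey] in headers, so no Authorization header is sent at all.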

Pass provider-specific or model-tuning parameters through extraBody:

let llm = ChatCompletionsClient(
    model: "gpt-4o",
    apiKey: key,
    extraBody: [
        "temperature": .number(0.2),
        "max_tokens": .number(1024),
        "seed": .number(42),
    ]
)

extraBody keys are applied after the built-in keys, so they can override any of them, including stream_options when a provider rejects it.
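"Applied last" amounts to a dictionary merge in which extraBody wins every clash. The sketch below illustrates this with a hypothetical Value enum standing in for JSONValue, which keeps it self-contained. The built-in keys shown are assumptions, since the page names only stream_options. Real requests also carry messages and response_format.

```swift
/// Hypothetical stand-in for JSONValue with just enough cases for this sketch.
enum Value: Equatable {
    case string(String)
    case number(Double)
    case bool(Bool)
    case object([String: Value])
}

/// Builds a body in the documented order: built-in keys first, extraBody last.
func requestBody(model: String, extraBody: [String: Value]) -> [String: Value] {
    // Assumed built-ins for a streaming request.
    let builtIns: [String: Value] = [
        "model": .string(model),
        "stream": .bool(true),
        "stream_options": .object(["include_usage": .bool(true)]),
    ]
    // On a key clash, keep the extraBody value.
    return builtIns.merging(extraBody) { _, extra in extra }
}
```

Keys that do not clash, such as temperature, are simply added. Keys that do clash, such as stream_options, are replaced wholesale, not deep-merged.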

Set responseFormat to request JSON or schema-constrained output:

let schema: JSONValue = .object([
    "type": .string("object"),
    "properties": .object([
        "answer": .object(["type": .string("string")]),
        "confidence": .object(["type": .string("number")]),
    ]),
    "required": .array([.string("answer"), .string("confidence")]),
])
let llm = ChatCompletionsClient(
    model: "gpt-4o-mini",
    apiKey: key,
    responseFormat: .jsonSchema(name: "answer_with_confidence", schema: schema)
)

See ResponseFormat on the overview page for the full enum definition.
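With a schema-constrained format, the model's text output is a JSON document matching the schema. You can decode it with Foundation's JSONDecoder once the streamed text has been collected into a String. This page does not show the LLMClient streaming API, so that collection step is assumed. AnswerWithConfidence and decodeAnswer are hypothetical names mirroring the "answer_with_confidence" schema above.

```swift
import Foundation

/// Mirrors the "answer_with_confidence" schema: both fields are required.
struct AnswerWithConfidence: Decodable, Equatable {
    let answer: String
    let confidence: Double
}

/// Decodes the model's complete text output; throws if it does not match.
func decodeAnswer(from text: String) throws -> AnswerWithConfidence {
    try JSONDecoder().decode(AnswerWithConfidence.self, from: Data(text.utf8))
}
```

Decoding still throws if a provider ignores response_format and returns free text. This makes it a cheap check that the provider honored the schema.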

public enum ChatCompletionsError: Error, Equatable {
    case httpStatus(Int, body: String?)
    case nonHTTPResponse
    case emptyStream
}
  • .httpStatus: the provider returned a non-2xx status. body contains up to 2,048 bytes of the response for diagnostics.
  • .nonHTTPResponse: URLSession returned a non-HTTP response (should not occur in practice).
  • .emptyStream: the provider returned 200, but the SSE stream carried nothing parseable; typically a provider error envelope or an HTML gateway page. Not retried.
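When these errors reach application code, a switch on the enum gives actionable messages. The sketch below is one way to do it. diagnose is a hypothetical helper, and its status-code advice is general HTTP guidance, not library behavior. The enum is repeated verbatim from above so the block compiles on its own.

```swift
// Local copy of the enum shown above, so this sketch compiles on its own.
enum ChatCompletionsError: Error, Equatable {
    case httpStatus(Int, body: String?)
    case nonHTTPResponse
    case emptyStream
}

/// Hypothetical helper: maps a client failure to a short diagnostic message.
func diagnose(_ error: Error) -> String {
    guard let failure = error as? ChatCompletionsError else {
        return "other error: \(error)"  // e.g. a URLError once retries are exhausted
    }
    switch failure {
    case .httpStatus(let code, body: let body):
        switch code {
        case 401, 403: return "auth rejected (\(code)); check apiKey or headers"
        case 404: return "not found (404); check baseURL and model"
        case 429: return "rate limited (429)"
        default: return "HTTP \(code): \(body ?? "<no body>")"
        }
    case .nonHTTPResponse:
        return "non-HTTP response from the transport"
    case .emptyStream:
        return "200 with no parseable events; check baseURL for a gateway or error page"
    }
}
```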

The transport parameter is the HTTP seam. The default URLSessionEventStream suffices for production use:

public struct URLSessionEventStream: ChatCompletionsTransport {
    public init(timeout: TimeInterval = 60)
}

timeout is an idle timeout: the maximum gap between incoming bytes. It is not a cap on total duration, so long responses are not cut off.
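Plain URLSession already separates these two notions. timeoutIntervalForRequest is an idle timeout that resets whenever data arrives, and timeoutIntervalForResource caps the whole transfer. The sketch below shows how a custom session could get the same behavior. Whether URLSessionEventStream is built this way is an assumption, and makeStreamingSession is a hypothetical name.

```swift
import Foundation
#if canImport(FoundationNetworking)
import FoundationNetworking  // URLSession lives here on Linux
#endif

/// Hypothetical helper: a session whose timeout behaves as an idle timeout.
func makeStreamingSession(idleTimeout: TimeInterval = 60) -> URLSession {
    let config = URLSessionConfiguration.default
    // Maximum gap between received bytes; resets on every new chunk of data.
    config.timeoutIntervalForRequest = idleTimeout
    // Total-transfer cap; seven days (the system default), so long streams survive.
    config.timeoutIntervalForResource = 7 * 24 * 60 * 60
    return URLSession(configuration: config)
}
```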

To replace the HTTP layer for test mocks, custom URLSession configurations, or an alternative networking stack, implement ChatCompletionsTransport. See Custom transport for a worked example.