chat.agent is built to keep that prefix stable across turns, suspends, and resumes; this page shows how to place the cache breakpoints and verify they’re hitting.
Caching is provider-specific. This guide covers Anthropic (@ai-sdk/anthropic), where you opt in per breakpoint with providerOptions.anthropic.cacheControl. Other providers cache differently, and most cache automatically — see Other providers.
What you cache, and where
A request renders as tools → system → messages. There are three prefix regions worth caching:
| Region | How to cache it | Stability |
|---|---|---|
| System prompt (+ tools) | cacheControl / systemProviderOptions on chat.toStreamTextOptions(), or providerOptions on chat.prompt.set() | Set once, never changes — the highest-value target |
| Conversation history | prepareMessages adds a breakpoint to the last message | Grows append-only across turns |
| Tool definitions | Covered by the system breakpoint, since tools render before the system prompt | Stable as long as your tool set doesn't change between turns; they render at position 0, so changing them invalidates everything |
chat.agent preserves providerOptions through message persistence and rehydration, so a breakpoint you place survives a suspend/resume or a page refresh. The recommended way to place message breakpoints is prepareMessages (below) rather than baking cacheControl into stored messages — prepareMessages runs on every prompt-assembly path, including after compaction, so the breakpoint is always in the right place.
Cache the system prompt
The system prompt (your chat.prompt text plus any skills preamble) is usually the largest stable block, so it's the first thing to cache. chat.toStreamTextOptions() returns system as a plain string by default; opt into caching and it returns a structured system message carrying the cache breakpoint instead.
System-prompt caching needs AI SDK v6 or later, where the system parameter accepts a structured message. On AI SDK v5, system is a plain string, so these options won't apply a breakpoint to the system block; cache the conversation via prepareMessages instead.
cacheControl at the streamText call site is the Anthropic-flavored one-liner.
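A minimal sketch of what the opt-in changes, assuming the shapes described above: with no caching option the system prompt stays a plain string, and with the cacheControl shorthand it becomes a structured system message whose providerOptions.anthropic.cacheControl marks the breakpoint. The buildSystem helper and the local types are hypothetical stand-ins that model the output of chat.toStreamTextOptions(); they are not its implementation.

```typescript
// Hypothetical model of the `system` value; not chat.agent's implementation.
type JSONValue =
  | null
  | string
  | number
  | boolean
  | JSONValue[]
  | { [key: string]: JSONValue };

// Provider-keyed options, shaped like the AI SDK's providerOptions.
type ProviderOptions = Record<string, Record<string, JSONValue>>;

// Shaped like an AI SDK (v6) structured system message.
interface SystemMessage {
  role: "system";
  content: string;
  providerOptions?: ProviderOptions;
}

type CacheControl = { type: "ephemeral"; ttl?: "5m" | "1h" };

function buildSystem(
  prompt: string,
  options: { cacheControl?: CacheControl } = {},
): string | SystemMessage {
  const cc = options.cacheControl;
  // No opt-in: the prompt is returned as a plain string.
  if (!cc) return prompt;

  // Copy only the fields that are set, so the options stay plain JSON.
  const cacheControl: Record<string, JSONValue> = { type: cc.type };
  if (cc.ttl) cacheControl.ttl = cc.ttl;

  // Opt-in: a structured message whose Anthropic namespace carries the breakpoint.
  return {
    role: "system",
    content: prompt,
    providerOptions: { anthropic: { cacheControl } },
  };
}
```

The string-or-message return type is the point of the sketch: anything that consumes system has to accept both forms, which is why the structured form needs AI SDK v6.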
systemProviderOptions is the provider-agnostic form: pass the raw providerOptions, so it composes with any provider.
providerOptions on chat.prompt.set() co-locates the intent with where the prompt is defined. It carries through to toStreamTextOptions() with no call-site change.
systemProviderOptions overrides cacheControl, and both override chat.prompt.set’s providerOptions. There’s no deep merge — the most specific option replaces the rest.
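The precedence rule can be modeled as a plain function. This is a sketch, not chat.agent's code: the resolveSystemProviderOptions name and its input shape are hypothetical. It shows whole-object replacement, where the most specific option that is set wins outright and nothing from the less specific ones is merged in.

```typescript
// Hypothetical model of the documented precedence; not chat.agent's code.
type JSONValue =
  | null
  | string
  | number
  | boolean
  | JSONValue[]
  | { [key: string]: JSONValue };

type ProviderOptions = Record<string, Record<string, JSONValue>>;

interface SystemCacheInputs {
  // providerOptions given to chat.prompt.set(): least specific.
  promptProviderOptions?: ProviderOptions;
  // Anthropic-only shorthand given to chat.toStreamTextOptions().
  cacheControl?: Record<string, JSONValue>;
  // Raw providerOptions given to chat.toStreamTextOptions(): most specific.
  systemProviderOptions?: ProviderOptions;
}

// Returns the providerOptions that end up on the system block.
// The first option that is set wins as a whole; nothing is deep-merged.
function resolveSystemProviderOptions(
  inputs: SystemCacheInputs,
): ProviderOptions | undefined {
  if (inputs.systemProviderOptions) return inputs.systemProviderOptions;
  if (inputs.cacheControl) {
    return { anthropic: { cacheControl: inputs.cacheControl } };
  }
  return inputs.promptProviderOptions;
}
```

Note the consequence of no deep merge: if chat.prompt.set() sets a Bedrock cachePoint and the call site adds cacheControl, only the Anthropic breakpoint survives.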
Use the 1-hour cache for prefixes that sit idle longer than 5 minutes between turns: cacheControl: { type: "ephemeral", ttl: "1h" }. Writes cost more (2× the base input price, versus 1.25× for the default 5-minute cache), so it pays off only when reads span the longer window.
Cache the conversation history
Place a breakpoint on the last message and the entire conversation prefix up to that point is cached, so the next turn reads it back instead of re-processing it. Do this in prepareMessages: it transforms model messages once, and chat.agent applies it on every path that builds a prompt (each turn, and both compaction rebuild paths), so the breakpoint always lands on the real last message.
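A minimal sketch of the message transform, assuming prepareMessages receives and returns an array of AI SDK model messages; the exact hook signature is an assumption, so check the chat.agent reference. ModelMessageLike is a local stand-in for the AI SDK's ModelMessage, and addHistoryBreakpoint is a hypothetical name. The function copies rather than mutates, and merges into any providerOptions the last message already carries so other settings survive.

```typescript
// Local stand-ins for the AI SDK's JSON and message types.
type JSONValue =
  | null
  | string
  | number
  | boolean
  | JSONValue[]
  | { [key: string]: JSONValue };

type ProviderOptions = Record<string, Record<string, JSONValue>>;

interface ModelMessageLike {
  role: "system" | "user" | "assistant" | "tool";
  content: unknown;
  providerOptions?: ProviderOptions;
}

// Puts one rolling Anthropic breakpoint on the last message so the whole
// conversation prefix up to and including it is cached. Returns a new array
// and never mutates the input, so stored history stays breakpoint-free.
function addHistoryBreakpoint<M extends ModelMessageLike>(messages: M[]): M[] {
  const last = messages[messages.length - 1];
  if (last === undefined) return messages;

  const marked: M = {
    ...last,
    providerOptions: {
      // Keep options other providers (or other Anthropic settings) already set.
      ...last.providerOptions,
      anthropic: {
        ...last.providerOptions?.anthropic,
        cacheControl: { type: "ephemeral" },
      },
    },
  };
  return [...messages.slice(0, -1), marked];
}
```

Because the breakpoint is added at prompt-assembly time rather than stored, an older message never keeps a stale breakpoint, and the count stays at one no matter how long the chat runs.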
Anthropic allows at most 4 cache breakpoints per request, and a prefix must be at least ~1024 tokens (model-dependent) to cache at all — shorter prefixes silently don’t cache. One system breakpoint plus one rolling message breakpoint is the typical setup and leaves headroom.
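Since several layers can each add a breakpoint, a cheap guard is to count them before sending. This is a hypothetical helper, not part of chat.agent or the AI SDK. It counts only block-level providerOptions.anthropic.cacheControl on the system message and on messages, not on individual content parts, and it does not check the minimum prefix length, which needs a real token count.

```typescript
// Local stand-ins for the AI SDK's JSON and providerOptions types.
type JSONValue =
  | null
  | string
  | number
  | boolean
  | JSONValue[]
  | { [key: string]: JSONValue };

type ProviderOptions = Record<string, Record<string, JSONValue>>;

// Anything that can carry providerOptions: a system message or a chat message.
interface HasProviderOptions {
  providerOptions?: ProviderOptions;
}

// Anthropic's per-request limit on cache breakpoints.
const MAX_ANTHROPIC_BREAKPOINTS = 4;

// Counts blocks marked with an Anthropic cache breakpoint.
function countBreakpoints(blocks: HasProviderOptions[]): number {
  return blocks.filter(
    (block) => block.providerOptions?.anthropic?.cacheControl !== undefined,
  ).length;
}

// Throws before the request is sent if the prompt carries too many breakpoints.
function assertBreakpointBudget(blocks: HasProviderOptions[]): void {
  const count = countBreakpoints(blocks);
  if (count > MAX_ANTHROPIC_BREAKPOINTS) {
    throw new Error(
      `${count} cache breakpoints; Anthropic allows at most ${MAX_ANTHROPIC_BREAKPOINTS} per request`,
    );
  }
}
```

With the typical setup (one system breakpoint plus one rolling message breakpoint) the count is 2.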
Caching and compaction
Compaction rewrites the conversation prefix (it replaces earlier turns with a summary), so it necessarily invalidates the cached message prefix at that point. That's a one-time reset, not a regression: because prepareMessages also runs on the compaction rebuild and result paths, the new (shorter) prefix gets a fresh breakpoint and re-warms on the next turn. Your system-prompt cache is unaffected, since compaction never touches the system block. See Compaction for how the summary is produced.
Other providers
Caching is provider-specific, and most providers don't use per-block breakpoints at all:
- OpenAI and Google Gemini cache automatically. OpenAI caches any prompt prefix over 1024 tokens; Gemini 2.5 caches implicitly (1024 tokens on Flash, 2048 on Pro). Neither needs a breakpoint, so the system-caching options above are a no-op for them. chat.agent already gives automatic caching exactly what it needs: a byte-stable prefix that only grows across turns. Keep the system prompt frozen and the prefix over the model's minimum, and reads happen on their own. (OpenAI's optional providerOptions.openai.promptCacheKey improves hit-routing across requests; it's a top-level option, not a system-block breakpoint.)
- Anthropic and Amazon Bedrock take an explicit breakpoint on the system block: Anthropic via cacheControl, Bedrock via cachePoint. Both go through the provider-agnostic systemProviderOptions.
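A sketch of one systemProviderOptions value that marks the system block for both providers, assuming the AI SDK's option shapes: providerOptions.anthropic.cacheControl for Anthropic and providerOptions.bedrock.cachePoint for Bedrock. Because providerOptions is keyed by provider name and each provider reads only its own key, the same value works whichever provider serves the request. The optionsFor helper is hypothetical and only illustrates that lookup.

```typescript
// Local stand-ins for the AI SDK's JSON and providerOptions types.
type JSONValue =
  | null
  | string
  | number
  | boolean
  | JSONValue[]
  | { [key: string]: JSONValue };

type ProviderOptions = Record<string, Record<string, JSONValue>>;

// Marks the system block as a breakpoint for Anthropic and for Bedrock.
const systemProviderOptions: ProviderOptions = {
  anthropic: { cacheControl: { type: "ephemeral" } },
  bedrock: { cachePoint: { type: "default" } },
};

// Illustrates namespacing: a provider sees only the entry under its own name,
// so options meant for other providers are simply ignored.
function optionsFor(
  provider: string,
  options: ProviderOptions,
): Record<string, JSONValue> {
  return options[provider] ?? {};
}
```

An automatic-caching provider such as OpenAI finds nothing under its own key here, which matches the no-op behavior described above.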
The cacheControl shorthand is Anthropic-only; systemProviderOptions (and chat.prompt.set's providerOptions) is the form to reach for on any other breakpoint-based provider.
Usage reporting is normalized. Each provider reports cache tokens under its own provider-specific field, but the AI SDK maps them into the same inputTokenDetails.cacheReadTokens / cacheWriteTokens that previousTurnUsage and totalUsage carry and the dashboard shows — so the verify step is the same regardless of provider.
Verify caching is working
The turn's usage carries cache token counts. chat.agent accumulates them across turns and hands them to run as previousTurnUsage (last turn) and totalUsage (whole chat), both LanguageModelUsage.
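A small helper can turn the usage numbers into a per-turn verdict. This is a sketch under stated assumptions: UsageLike is a local stand-in for the subset of the AI SDK's LanguageModelUsage it reads, and cacheStatus is a hypothetical name. You might call it on previousTurnUsage inside run; how run receives that value is an assumption, so check the chat.agent reference.

```typescript
// Subset of the AI SDK's LanguageModelUsage that the check reads.
interface UsageLike {
  inputTokenDetails?: {
    cacheReadTokens?: number;
    cacheWriteTokens?: number;
  };
}

type CacheStatus = "read" | "write" | "miss";

// "read":  part of the prefix was served from cache. A rolling breakpoint may
//          also write the new tail in the same turn; that is expected.
// "write": nothing was read, but a cache entry was created.
// "miss":  neither; the prompt was processed uncached.
function cacheStatus(usage: UsageLike): CacheStatus {
  const read = usage.inputTokenDetails?.cacheReadTokens ?? 0;
  const write = usage.inputTokenDetails?.cacheWriteTokens ?? 0;
  if (read > 0) return "read";
  if (write > 0) return "write";
  return "miss";
}
```

A healthy chat shows write on the first turn and read afterwards. A miss on every turn usually means the prefix is under the model's minimum size.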
The first turn writes the cache (cacheWriteTokens > 0, cacheReadTokens is 0). Every turn after, on an unchanged prefix, reads it (cacheReadTokens > 0). The dashboard surfaces the same numbers on the AI span as Cache write and Cache read, so you can confirm hits per run without logging.
If cacheReadTokens stays 0 across turns with an identical prefix, a silent invalidator is shifting the bytes — see below.
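One way to locate a silent invalidator is to compare the previous turn's prompt with the current one, message by message, and report the first entry that changed. This is a hypothetical debugging helper, not part of chat.agent. It uses JSON.stringify as a stand-in for the provider's byte serialization, and it ignores providerOptions because the rolling breakpoint legitimately moves to a new message each turn. Put the system prompt at index 0 so that a changed system prompt (for example, one that interpolates the current time) is reported as 0.

```typescript
// Serializes a message for comparison, skipping providerOptions at any depth
// because the rolling cache breakpoint moves between turns by design.
function fingerprint(message: unknown): string {
  return JSON.stringify(message, (key, value) =>
    key === "providerOptions" ? undefined : value,
  );
}

// Returns the index of the first entry of `prev` that `next` does not repeat
// unchanged, or -1 if `next` only appends to `prev` (a cache-friendly turn).
function firstChangedIndex(prev: unknown[], next: unknown[]): number {
  for (let i = 0; i < prev.length; i++) {
    if (i >= next.length || fingerprint(prev[i]) !== fingerprint(next[i])) {
      return i;
    }
  }
  return -1;
}
```

A result of -1 means the message prefix was append-only, so a missing cache read is more likely due to a prefix below the minimum size or a change in tool definitions.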
Next steps
Compaction
Keep long conversations within token limits — and re-warm the cache after.
Fast starts
Cut cold-start latency so a cached prefix is the only thing between a message and a reply.
chat.agent reference
Full option surface, including prepareMessages and toStreamTextOptions.
Building agents: backend
The three ways to build a chat backend and when to reach for each.

