Prompt caching - Trigger.dev

The AI Agents and Prompts surface ships as part of the v4.5 release candidate. Install with @trigger.dev/sdk@rc (or pin 4.5.0-rc.0 or later) to use these features. They aren't in the latest stable release yet, and APIs may still change before the 4.5.0 GA. See supported AI SDK versions and the AI chat changelog for details.

Prompt caching lets a provider reuse the unchanged prefix of your prompt across requests. The provider skips re-processing that prefix and bills it at a fraction of the normal input price. With Anthropic, cache reads cost about 10% of the base input-token price. A long, stable system prompt or a growing conversation history pays a one-time cache-write premium (1.25× the base input price for the default 5-minute cache) and is then read cheaply on every later turn.

Caching is a byte-exact prefix match: any change in the prefix invalidates everything after it. A multi-turn agent is the ideal case, because the system prompt, tools, and earlier turns are identical from turn to turn, so the cacheable prefix only grows. chat.agent is built to keep that prefix stable across turns, suspends, and resumes. This page shows how to place the cache breakpoints and how to verify they're hitting.

Caching is provider-specific. This guide covers Anthropic (@ai-sdk/anthropic), where you opt in per breakpoint with providerOptions.anthropic.cacheControl. Other providers cache differently, and most cache automatically; see Other providers.
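The arithmetic behind the pricing claim can be made concrete with a minimal sketch. `prefixInputCost` is a hypothetical helper. It assumes a prefix that never changes, every turn arriving before the cache expires, and Anthropic's multipliers as stated on this page (1.25× for a 5-minute cache write, about 0.1× for a read). It ignores output tokens and the uncached tail of each request.

```typescript
// Multipliers relative to the base input-token price. These mirror the Anthropic
// figures on this page (5-minute cache write = 1.25x, cache read ~= 0.1x); check
// your provider's current pricing before relying on them.
const CACHE_WRITE_5M = 1.25;
const CACHE_READ = 0.1;

// Hypothetical helper: input cost, in "base-price tokens", of resending the same
// prefix on each of `turns` turns. Assumes the prefix never changes and every
// turn arrives before the cache expires.
function prefixInputCost(prefixTokens: number, turns: number, cached: boolean): number {
  if (turns <= 0) return 0;
  if (!cached) {
    // Without caching, the whole prefix is billed at full price on every turn.
    return prefixTokens * turns;
  }
  // With caching, turn 1 writes the cache and every later turn reads it.
  return prefixTokens * CACHE_WRITE_5M + prefixTokens * CACHE_READ * (turns - 1);
}

// A 10,000-token system prompt over a 10-turn conversation.
const uncached = prefixInputCost(10_000, 10, false);
const cached = prefixInputCost(10_000, 10, true);
console.log({ uncached, cached, savings: 1 - cached / uncached });
```

Under these assumptions the cached run bills 21,500 base-price tokens against 100,000 uncached, and the gap widens as the conversation gets longer.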

What you cache, and where

A request renders as tools → system → messages. There are three prefix regions worth caching:

Region	How to cache it	Stability
System prompt (+ tools)	`cacheControl` / `systemProviderOptions` on `chat.toStreamTextOptions()`, or `providerOptions` on `chat.prompt.set()`	Set once and never changes; the highest-value target
Conversation history	`prepareMessages` adds a breakpoint to the last message	Grows append-only across turns
Tool definitions	Covered by the system breakpoint, since tools render before the system block	Stable as long as your tool set doesn't change between turns; tools render at position 0, so changing them invalidates everything
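The render order is what lets a system breakpoint cover the tools as well: a breakpoint caches everything up to and including the block it sits on. The sketch below is a hypothetical model of that rule, not an SDK type. `Block`, `renderRequest`, and `cachedRegions` are illustrative names.

```typescript
// Simplified, hypothetical model of how a request is laid out for caching. It is
// not an SDK type; it only captures the ordering rule described above.
type Region = "tools" | "system" | "messages";

interface Block {
  region: Region;
  text: string;
  breakpoint?: boolean; // true where a cache breakpoint is placed
}

// Blocks always render tools -> system -> messages.
function renderRequest(
  tools: string[],
  system: string,
  messages: string[],
  opts: { cacheSystem?: boolean; cacheLastMessage?: boolean } = {},
): Block[] {
  const blocks: Block[] = [
    ...tools.map((text): Block => ({ region: "tools", text })),
    { region: "system", text: system, breakpoint: opts.cacheSystem },
    ...messages.map((text): Block => ({ region: "messages", text })),
  ];
  if (opts.cacheLastMessage && messages.length > 0) {
    blocks[blocks.length - 1].breakpoint = true;
  }
  return blocks;
}

// A breakpoint caches everything before it and the block it sits on,
// so the cached prefix ends at the last breakpoint.
function cachedRegions(blocks: Block[]): Region[] {
  const lastBreakpoint = blocks.map((b) => b.breakpoint === true).lastIndexOf(true);
  return blocks.slice(0, lastBreakpoint + 1).map((b) => b.region);
}

const tools = ["search", "calculator"];
const messages = ["hi", "hello!", "next question"];
const systemOnly = renderRequest(tools, "You are a helpful assistant.", messages, { cacheSystem: true });
console.log(cachedRegions(systemOnly)); // tools and system, no messages
```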

chat.agent preserves providerOptions through message persistence and rehydration, so a breakpoint you place survives a suspend/resume or a page refresh. The recommended way to place message breakpoints is prepareMessages (below) rather than baking cacheControl into stored messages — prepareMessages runs on every prompt-assembly path, including after compaction, so the breakpoint is always in the right place.
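One concrete failure mode of baking breakpoints into stored messages is that they accumulate. This is an illustration, not taken from Trigger.dev's code. Every persisted message keeps its `cacheControl`, so after a few turns the request exceeds Anthropic's limit of 4 breakpoints. Adding the breakpoint at prompt-assembly time keeps exactly one. `Msg` is a simplified stand-in for an AI SDK model message, and `withLastBreakpoint` is a hypothetical helper.

```typescript
// Minimal stand-in for an AI SDK model message: only the fields this sketch uses.
interface Msg {
  role: "user" | "assistant";
  content: string;
  providerOptions?: Record<string, Record<string, unknown>>;
}

const BREAKPOINT = { anthropic: { cacheControl: { type: "ephemeral" } } };

const countBreakpoints = (messages: Msg[]) =>
  messages.filter((m) => m.providerOptions?.anthropic?.cacheControl !== undefined).length;

// Applied at prompt-assembly time: returns a copy with a breakpoint on the last
// message only. The stored history is never modified.
function withLastBreakpoint(messages: Msg[]): Msg[] {
  if (messages.length === 0) return messages;
  const last = messages[messages.length - 1];
  return [...messages.slice(0, -1), { ...last, providerOptions: { ...last.providerOptions, ...BREAKPOINT } }];
}

const baked: Msg[] = []; // breakpoint written into each stored user message
const clean: Msg[] = []; // stored without breakpoints
let bakedRequest: Msg[] = [];
let cleanRequest: Msg[] = [];

for (let turn = 1; turn <= 5; turn++) {
  baked.push({ role: "user", content: `question ${turn}`, providerOptions: { ...BREAKPOINT } });
  clean.push({ role: "user", content: `question ${turn}` });
  // What each approach would send to the model on this turn.
  bakedRequest = [...baked];
  cleanRequest = withLastBreakpoint(clean);
  const answer: Msg = { role: "assistant", content: `answer ${turn}` };
  baked.push(answer);
  clean.push(answer);
}

console.log(countBreakpoints(bakedRequest)); // 5: one per turn, over Anthropic's limit of 4
console.log(countBreakpoints(cleanRequest)); // 1
```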

Cache the system prompt

The system prompt (your chat.prompt text plus any skills preamble) is usually the largest stable block, so it’s the first thing to cache. chat.toStreamTextOptions() returns system as a plain string by default; opt into caching and it returns a structured system message carrying the cache breakpoint instead.
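Opting in only changes the shape of `system`. The hypothetical helper below mirrors that behavior. It assumes the AI SDK v6 structured system-message shape (`role`, `content`, `providerOptions`), simplified here, and it is not Trigger.dev's implementation.

```typescript
type ProviderOptions = Record<string, Record<string, unknown>>;

// Simplified shape of a structured system message (AI SDK v6 style).
interface SystemMessage {
  role: "system";
  content: string;
  providerOptions?: ProviderOptions;
}

// Hypothetical: without caching options `system` stays a plain string; with them
// it becomes a structured message that carries the breakpoint.
function systemOption(prompt: string, providerOptions?: ProviderOptions): string | SystemMessage {
  if (!providerOptions) return prompt;
  return { role: "system", content: prompt, providerOptions };
}

console.log(systemOption("You are a support agent."));
console.log(
  systemOption("You are a support agent.", { anthropic: { cacheControl: { type: "ephemeral" } } }),
);
```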

System-prompt caching needs AI SDK v6 or later, where the system parameter accepts a structured message. On AI SDK v5 system is a plain string, so these options won’t apply a breakpoint to the system block — cache the conversation via prepareMessages instead.

There are three ways to opt in, depending on where you'd rather express it. The first is cacheControl at the streamText call site, the Anthropic-specific one-liner:

/trigger/chat.ts

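import { chat } from "@trigger.dev/sdk/ai";
import { streamText } from "ai";
import { anthropic } from "@ai-sdk/anthropic";

// Placeholder: in practice a long instruction block (caching needs roughly 1024+ tokens).
// Keep it byte-identical across turns.
const SYSTEM_PROMPT = "You are a helpful assistant.";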

export const myChat = chat.agent({
  id: "my-chat",
  onChatStart: async () => {
    chat.prompt.set(SYSTEM_PROMPT); // a large, stable instruction block
  },
  run: async ({ messages, signal }) => {
    return streamText({
      model: anthropic("claude-sonnet-4-5"),
      // Caches the system block with a 5-minute breakpoint.
      ...chat.toStreamTextOptions({ cacheControl: { type: "ephemeral" } }),
      messages,
      abortSignal: signal,
    });
  },
});

systemProviderOptions is the provider-agnostic form — pass the raw providerOptions so it composes with any provider:

/trigger/chat.ts

return streamText({
  model: anthropic("claude-sonnet-4-5"),
  ...chat.toStreamTextOptions({
    systemProviderOptions: { anthropic: { cacheControl: { type: "ephemeral" } } },
  }),
  messages,
  abortSignal: signal,
});

providerOptions on chat.prompt.set() co-locates the intent with where the prompt is defined. It carries through to toStreamTextOptions() with no call-site change:

/trigger/chat.ts

onChatStart: async () => {
  chat.prompt.set(SYSTEM_PROMPT, {
    providerOptions: { anthropic: { cacheControl: { type: "ephemeral" } } },
  });
},
run: async ({ messages, signal }) => {
  return streamText({
    model: anthropic("claude-sonnet-4-5"),
    ...chat.toStreamTextOptions(), // already cached
    messages,
    abortSignal: signal,
  });
},

If more than one is set, the call-site option wins: systemProviderOptions overrides cacheControl, and both override chat.prompt.set’s providerOptions. There’s no deep merge — the most specific option replaces the rest.
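The precedence rule can be sketched as a resolver. `resolveSystemProviderOptions` is hypothetical, and it assumes the `cacheControl` shorthand expands to `{ anthropic: { cacheControl } }`. The point it shows is that the first option set wins outright, so a 1-hour TTL configured on the prompt is dropped when the call site passes a plain `cacheControl`.

```typescript
type ProviderOptions = Record<string, Record<string, unknown>>;

interface SystemCacheInputs {
  systemProviderOptions?: ProviderOptions; // toStreamTextOptions({ systemProviderOptions })
  cacheControl?: Record<string, unknown>; // toStreamTextOptions({ cacheControl }), Anthropic-only
  promptProviderOptions?: ProviderOptions; // chat.prompt.set(..., { providerOptions })
}

// Hypothetical resolver mirroring the documented precedence. The most specific
// option that is set replaces the others; nothing is deep-merged.
function resolveSystemProviderOptions(inputs: SystemCacheInputs): ProviderOptions | undefined {
  if (inputs.systemProviderOptions) return inputs.systemProviderOptions;
  if (inputs.cacheControl) return { anthropic: { cacheControl: inputs.cacheControl } };
  return inputs.promptProviderOptions;
}

const fromPrompt = { anthropic: { cacheControl: { type: "ephemeral", ttl: "1h" } } };
// The call-site shorthand replaces the prompt-level options entirely: the 1h TTL is gone.
const resolved = resolveSystemProviderOptions({
  cacheControl: { type: "ephemeral" },
  promptProviderOptions: fromPrompt,
});
console.log(JSON.stringify(resolved));
```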

Use the 1-hour cache for prefixes that sit idle longer than 5 minutes between turns: cacheControl: { type: "ephemeral", ttl: "1h" }. Writes to the 1-hour cache cost 2× the base input price (versus 1.25× for the 5-minute cache), so it pays off only when reads span the longer window.
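The break-even can be checked with a small cost model. `costPerPrefixToken` is a hypothetical helper that uses the multipliers above (5-minute write 1.25×, 1-hour write 2×, read about 0.1×). It assumes evenly spaced turns, and it assumes each cache read refreshes the TTL, so a turn reads the cache only if the gap since the previous turn fits inside it.

```typescript
type Ttl = "5m" | "1h";

// Multipliers relative to the base input price, as stated on this page.
const WRITE_COST: Record<Ttl, number> = { "5m": 1.25, "1h": 2 };
const READ_COST = 0.1;
const TTL_MINUTES: Record<Ttl, number> = { "5m": 5, "1h": 60 };

// Cost per cached prefix token over `turns` turns spaced `gapMinutes` apart.
// A turn reads the cache only if the previous turn was within the TTL; otherwise
// the cache has expired and the turn pays for a fresh write.
function costPerPrefixToken(ttl: Ttl, turns: number, gapMinutes: number): number {
  let cost = 0;
  for (let turn = 0; turn < turns; turn++) {
    const warm = turn > 0 && gapMinutes <= TTL_MINUTES[ttl];
    cost += warm ? READ_COST : WRITE_COST[ttl];
  }
  return cost;
}

for (const gap of [2, 10]) {
  console.log(`gap ${gap} min`, {
    "5m": costPerPrefixToken("5m", 5, gap),
    "1h": costPerPrefixToken("1h", 5, gap),
  });
}
```

With 2-minute gaps the 5-minute cache is cheaper. With 10-minute gaps every 5-minute turn pays a fresh write, and the 1-hour cache wins from the second turn on.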

Cache the conversation history

Place a breakpoint on the last message and the entire conversation prefix up to that point is cached, so the next turn reads it back instead of re-processing it. Do this in prepareMessages — it transforms model messages once, and chat.agent applies it on every path that builds a prompt (each turn, and both compaction rebuild paths), so the breakpoint always lands on the real last message.

/trigger/chat.ts

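import { chat } from "@trigger.dev/sdk/ai";
import { streamText } from "ai";
import { anthropic } from "@ai-sdk/anthropic";

export const myChat = chat.agent({
  id: "my-chat",
  // Runs on every prompt-assembly path: each turn and both compaction rebuilds.
  prepareMessages: async ({ messages }) => {
    if (messages.length === 0) return messages;
    const last = messages[messages.length - 1];
    // Return a copy with a breakpoint on the last message; stored history is untouched.
    return [
      ...messages.slice(0, -1),
      {
        ...last,
        providerOptions: {
          ...last.providerOptions,
          // Keep any Anthropic options already on the message and add the breakpoint.
          anthropic: { ...last.providerOptions?.anthropic, cacheControl: { type: "ephemeral" } },
        },
      },
    ];
  },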
  run: async ({ messages, signal }) => {
    return streamText({
      model: anthropic("claude-sonnet-4-5"),
      ...chat.toStreamTextOptions({ cacheControl: { type: "ephemeral" } }),
      messages,
      abortSignal: signal,
    });
  },
});

The system breakpoint and the conversation breakpoint compose: the system block is cached once for the life of the chat, and each turn extends the cached message prefix.

Anthropic allows at most 4 cache breakpoints per request, and a prefix must be at least ~1024 tokens (model-dependent) to cache at all — shorter prefixes silently don’t cache. One system breakpoint plus one rolling message breakpoint is the typical setup and leaves headroom.
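The breakpoint budget is easy to guard in a test of your prompt assembly. `checkBreakpointBudget` is hypothetical, and it only counts message-level `providerOptions`. Breakpoints placed on individual content parts would need to be counted too.

```typescript
type ProviderOptions = Record<string, Record<string, unknown>>;

// Anything that can carry a breakpoint: a structured system message or a model message.
interface CacheablePart {
  providerOptions?: ProviderOptions;
}

const MAX_ANTHROPIC_BREAKPOINTS = 4;

const hasAnthropicBreakpoint = (part: CacheablePart) =>
  part.providerOptions?.anthropic?.cacheControl !== undefined;

// Hypothetical check: throws if a request would carry more breakpoints than
// Anthropic accepts, otherwise returns how many are in use.
function checkBreakpointBudget(system: CacheablePart | undefined, messages: CacheablePart[]): number {
  const parts = system ? [system, ...messages] : messages;
  const used = parts.filter(hasAnthropicBreakpoint).length;
  if (used > MAX_ANTHROPIC_BREAKPOINTS) {
    throw new Error(`${used} cache breakpoints; Anthropic accepts at most ${MAX_ANTHROPIC_BREAKPOINTS}`);
  }
  return used;
}

const bp: ProviderOptions = { anthropic: { cacheControl: { type: "ephemeral" } } };
// Typical setup: one system breakpoint plus one rolling breakpoint on the last message.
console.log(checkBreakpointBudget({ providerOptions: bp }, [{}, {}, { providerOptions: bp }])); // 2
```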

Caching and compaction

Compaction rewrites the conversation prefix — it replaces earlier turns with a summary — so it necessarily invalidates the cached message prefix at that point. That’s a one-time reset, not a regression: because prepareMessages also runs on the compaction rebuild and result paths, the new (shorter) prefix gets a fresh breakpoint and re-warms on the next turn. Your system-prompt cache is unaffected — compaction never touches the system block. See Compaction for how the summary is produced.
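Why compaction is a one-time reset rather than a regression can be shown by counting how many leading messages two consecutive requests share, since only a shared prefix can be read from cache. Messages are plain strings here for brevity. `sharedPrefix` and the compaction result are hypothetical.

```typescript
// Count how many leading messages two requests share exactly. Only this shared
// prefix can be served from the cache.
function sharedPrefix(previous: string[], next: string[]): number {
  let i = 0;
  while (i < previous.length && i < next.length && previous[i] === next[i]) i++;
  return i;
}

const beforeCompaction = ["u1", "a1", "u2", "a2", "u3", "a3", "u4"];
// Hypothetical compaction result: earlier turns replaced by one summary message.
const afterCompaction = ["summary of turns 1-3", "u4"];
// The following turn appends to the compacted history.
const nextTurn = [...afterCompaction, "a4", "u5"];

console.log(sharedPrefix(beforeCompaction, afterCompaction)); // 0: the message cache resets once
console.log(sharedPrefix(afterCompaction, nextTurn)); // 2: the next turn shares the whole compacted prefix
```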

Other providers

Caching is provider-specific, and most providers don’t use per-block breakpoints at all:

OpenAI and Google Gemini cache automatically. OpenAI caches any prompt prefix over 1024 tokens; Gemini 2.5 caches implicitly (1024 tokens on Flash, 2048 on Pro). Neither needs a breakpoint, so the system-caching options above are a no-op for them — chat.agent already gives automatic caching exactly what it needs: a byte-stable prefix that only grows across turns. Keep the system prompt frozen and the prefix over the model’s minimum and reads happen on their own. (OpenAI’s optional providerOptions.openai.promptCacheKey improves hit-routing across requests; it’s a top-level option, not a system-block breakpoint.)
Anthropic and Amazon Bedrock take an explicit breakpoint on the system block — Anthropic via cacheControl, Bedrock via cachePoint. Both go through the provider-agnostic systemProviderOptions:

/trigger/chat.ts

// Amazon Bedrock
return streamText({
  ...chat.toStreamTextOptions({
    systemProviderOptions: { bedrock: { cachePoint: { type: "default" } } },
  }),
  messages,
});
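Choosing the right system options per provider can be kept in one place. `systemCacheOptions` is a hypothetical helper that uses only the option shapes shown on this page. It returns nothing for the automatic-caching providers, which need a stable prefix above the model's minimum rather than a breakpoint. You would pass its result as `systemProviderOptions`.

```typescript
type ProviderOptions = Record<string, Record<string, unknown>>;
type Provider = "anthropic" | "bedrock" | "openai" | "google";

// Hypothetical helper: system-block cache options per provider.
function systemCacheOptions(provider: Provider): ProviderOptions | undefined {
  switch (provider) {
    case "anthropic":
      // Breakpoint-based: explicit cacheControl on the system block.
      return { anthropic: { cacheControl: { type: "ephemeral" } } };
    case "bedrock":
      // Breakpoint-based: explicit cachePoint on the system block.
      return { bedrock: { cachePoint: { type: "default" } } };
    case "openai":
    case "google":
      // Cached automatically once the prefix passes the model's minimum size.
      return undefined;
  }
}

for (const provider of ["anthropic", "bedrock", "openai", "google"] as const) {
  console.log(provider, JSON.stringify(systemCacheOptions(provider)));
}
```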

The cacheControl shorthand is Anthropic-only; systemProviderOptions (and chat.prompt.set's providerOptions) is the form to reach for on any other breakpoint-based provider.

Usage reporting is normalized. Each provider reports cache tokens under its own provider-specific field, but the AI SDK maps them into the same inputTokenDetails.cacheReadTokens / cacheWriteTokens fields that previousTurnUsage and totalUsage carry and the dashboard shows. The verification step is therefore the same regardless of provider.

Verify caching is working

Each turn's usage includes cache token counts. chat.agent accumulates them across turns and passes them to run as previousTurnUsage (the last turn) and totalUsage (the whole chat), both of type LanguageModelUsage:

/trigger/chat.ts

run: async ({ messages, signal, previousTurnUsage }) => {
  // After turn 1, cacheReadTokens should be > 0 on a stable prefix.
  console.log("cache read", previousTurnUsage?.inputTokenDetails?.cacheReadTokens);
  console.log("cache write", previousTurnUsage?.inputTokenDetails?.cacheWriteTokens);

  return streamText({
    model: anthropic("claude-sonnet-4-5"),
    ...chat.toStreamTextOptions({ cacheControl: { type: "ephemeral" } }),
    messages,
    abortSignal: signal,
  });
},

The first turn writes the cache (cacheWriteTokens > 0, cacheReadTokens is 0). Every turn after, on an unchanged prefix, reads it (cacheReadTokens > 0). The dashboard surfaces the same numbers on the AI span as Cache write and Cache read, so you can confirm hits per run without logging. If cacheReadTokens stays 0 across turns with an identical prefix, a silent invalidator is shifting the bytes — see below.
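The expectations above can be turned into a small classifier for logging or alerting. `cacheStatus` is hypothetical, and `UsageLike` is a hand-written subset of AI SDK v6's `LanguageModelUsage`, not the real type. A first turn with no write at all is reported as a miss, which is what a prefix below the model's minimum looks like.

```typescript
// The subset of AI SDK v6's LanguageModelUsage this sketch reads.
interface UsageLike {
  inputTokens?: number;
  inputTokenDetails?: { cacheReadTokens?: number; cacheWriteTokens?: number };
}

type CacheStatus = "no data" | "warming" | "hit" | "miss";

// Hypothetical classifier for one turn's usage: turn 1 should write the cache,
// later turns on an unchanged prefix should read it.
function cacheStatus(turn: number, usage: UsageLike | undefined): CacheStatus {
  if (!usage?.inputTokenDetails) return "no data";
  const read = usage.inputTokenDetails.cacheReadTokens ?? 0;
  const write = usage.inputTokenDetails.cacheWriteTokens ?? 0;
  if (read > 0) return "hit";
  if (turn === 1 && write > 0) return "warming";
  // No read on a later turn (or no write at all on turn 1): something broke the prefix
  // or it is too short to cache.
  return "miss";
}

console.log(cacheStatus(1, { inputTokenDetails: { cacheReadTokens: 0, cacheWriteTokens: 3000 } }));
console.log(cacheStatus(2, { inputTokenDetails: { cacheReadTokens: 3000, cacheWriteTokens: 150 } }));
```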

Anything that changes the prefix between turns silently kills the cache. Keep the system prompt byte-stable — never interpolate a timestamp, request ID, or per-turn value into chat.prompt. Don’t change the model or the tool set mid-conversation (tools render at position 0, so adding one invalidates everything after). Inject dynamic per-turn context as a late message via pending messages or background injection, not into the cached prefix.
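To track down a silent invalidator, serialize what you send on two consecutive turns and find where they first differ. If that index lands inside the system prompt or an earlier message, you have found the culprit. `firstDivergence` and `snippetAround` are hypothetical debugging helpers. JSON serialization only approximates what the provider actually renders, and string indices are UTF-16 code units rather than bytes, which is still enough to locate the change.

```typescript
// Return the first index at which two serialized requests differ.
function firstDivergence(previous: string, next: string): number {
  const n = Math.min(previous.length, next.length);
  for (let i = 0; i < n; i++) {
    if (previous[i] !== next[i]) return i;
  }
  return n; // one string is a prefix of the other
}

// A short window of text around an index, for logging.
function snippetAround(text: string, index: number, radius = 20): string {
  return text.slice(Math.max(0, index - radius), index + radius);
}

// Anti-pattern: interpolating the time into the system prompt changes it every turn.
const systemAt = (now: string) => `You are a support agent. Current time: ${now}. Be concise.`;
const turn1 = JSON.stringify({ system: systemAt("10:00"), messages: ["hi"] });
const turn2 = JSON.stringify({ system: systemAt("10:07"), messages: ["hi", "hello!", "thanks"] });

const index = firstDivergence(turn1, turn2);
// The divergence falls inside the system prompt, so nothing after it can be cached.
console.log(index, snippetAround(turn2, index));
```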

Next steps

Compaction

Keep long conversations within token limits — and re-warm the cache after.

Fast starts

Cut cold-start latency so a cached prefix is the only thing between a message and a reply.

chat.agent reference

Full option surface, including prepareMessages and toStreamTextOptions.

Building agents: backend

The three ways to build a chat backend and when to reach for each.
