Prompt Caching in the API
Many developers use the same context repeatedly across multiple API calls when building AI applications, such as when making edits to a codebase or having long, multi-turn conversations with a chatbot. Today, we're introducing Prompt Caching, allowing developers to reduce costs and latency. By reusing recently seen input tokens, developers get a 50% discount on those tokens and faster prompt processing times.
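Because caching matches on the prefix of recently seen input tokens, requests benefit most when the stable content of a prompt comes first and the variable content comes last. Here is a minimal sketch using the official Python SDK; the SYSTEM_PROMPT text and the review function are illustrative placeholders, not part of the announcement, and per OpenAI's documentation caching applies to prompts above a minimum length (1,024 tokens at launch):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Static, reusable context (instructions, few-shot examples, shared
# codebase context) goes first so repeated calls share the same
# cacheable prefix; only the final user message varies per request.
SYSTEM_PROMPT = "You are a code-review assistant. ..."  # long, stable text

def review(diff: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": f"Review this diff:\n{diff}"},
        ],
    )
    return response.choices[0].message.content
```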
Prompt Caching Availability & Pricing
Starting today, Prompt Caching is automatically applied on the latest versions of GPT-4o, GPT-4o mini, o1-preview, and o1-mini, as well as fine-tuned versions of those models. Cached input tokens are billed at a 50% discount compared to uncached input tokens.
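No code changes are needed to opt in, and you can verify cache hits from the usage object the API returns. A short sketch, continuing with the client above, that reads the cached_tokens field reported under prompt_tokens_details (field names here reflect the current Chat Completions response shape):

```python
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": "Summarize the last review."},
    ],
)

# cached_tokens reports how many input tokens were served from the
# cache and therefore billed at the discounted rate.
details = response.usage.prompt_tokens_details
print(f"input tokens: {response.usage.prompt_tokens}")
print(f"of which cached: {details.cached_tokens}")
```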