LLM API pricing has never been more consequential — or more confusing. In Q1 2026, the market features more than 15 production-grade models across five major providers, with per-token rates spanning two orders of magnitude. Choosing the wrong model for a high-volume agent pipeline can inflate monthly costs by 10x or more. Choosing the right one can make previously uneconomical automation suddenly viable.
This pricing index tracks current input and output token rates across all major LLMs, normalized to cost per 1M tokens for direct comparison. It includes historical trend data, per-pattern cost estimates for five common agent deployment types, and a practical framework for budgeting and optimizing AI agent infrastructure. For teams building agentic systems at scale, see our analysis of agentic AI statistics for 2026 and how cost is reshaping deployment decisions across enterprise teams.
The index is updated monthly. Rates reflect standard pay-as-you-go pricing in USD as of March 2026. Enterprise contract rates, batch discounts, and prompt caching rates are noted separately where they differ materially.
Q1 2026 LLM Pricing Landscape Overview
The defining trend of the past three years has been rapid, competitive price compression at the frontier. OpenAI's GPT-4 launched at $30 per 1M input tokens in March 2023. By Q1 2026, comparable capability (as measured by MMLU, HumanEval, and agentic benchmarks) is available for under $3 per 1M input tokens. The compression has been driven by hardware improvements, inference optimization, and intense competition between OpenAI, Anthropic, Google, and open-weight model hosts such as Together AI and Fireworks.
The market has also stratified into three clear tiers. Frontier models (GPT-5.4, Claude Sonnet 4.5, Gemini 2.5 Pro) command premium pricing for maximum reasoning capability. Mid-tier models (GPT-4o Mini, Claude Haiku 3.5, Gemini 2.0 Flash) offer strong general-purpose performance at a fraction of the cost. Budget models (Mistral 7B, Gemini Flash Lite, various open-weight deployments) serve high-volume classification and extraction workflows at rates below $0.15 per 1M tokens.
Frontier input token pricing fell from $30 per 1M tokens at GPT-4 launch in 2023 to under $3 per 1M for comparable models in Q1 2026. Budget models sit below $0.15 per 1M.
Frontier, mid-tier, and budget models now serve distinct use cases. Mixing tiers intelligently within a single pipeline is the primary cost optimization lever available in 2026.
Output tokens cost roughly 4–8x more than input tokens across providers. In agent pipelines with high tool-call volumes, verbose output formats are the single largest controllable cost driver.
Context window pricing has become the second major dimension after per-token rates. Models now offer 128K, 200K, and 1M+ token windows, but teams that fill these windows pay proportionally. A research agent that feeds an entire 200K-token document corpus into each reasoning step will spend orders of magnitude more than one that retrieves only the relevant 2K-token chunks via RAG. The context window is a capability, not a default operating mode.
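The gap is easy to quantify. A rough sketch, assuming an illustrative $3.00 per 1M input rate (the rate is an assumption for the arithmetic, not a quote from the index):

```python
# Input-token cost of one reasoning step: full 200K-token corpus in context
# vs. a 2K-token chunk retrieved via RAG. The $3.00/1M rate is an assumed
# frontier-tier input price, used only to make the ratio concrete.

INPUT_RATE_PER_M = 3.00  # USD per 1M input tokens (assumption)

def step_input_cost(context_tokens: int) -> float:
    """USD input cost of a single model call with this much context."""
    return context_tokens / 1_000_000 * INPUT_RATE_PER_M

full_corpus = step_input_cost(200_000)  # ~$0.60 per step
rag_chunk = step_input_cost(2_000)      # ~$0.006 per step
print(f"full: ${full_corpus:.3f}  rag: ${rag_chunk:.3f}  "
      f"ratio: {full_corpus / rag_chunk:.0f}x")
```

At these sizes the full-context step costs 100x more per call, which is the two-orders-of-magnitude gap described above.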
Per-Token Rate Index: All Major Models
All rates below are in USD per 1M tokens, standard pay-as-you-go as of March 2026. Batch pricing (50% discount where available) and prompt caching read rates are noted separately.
Frontier Models

GPT-5.4
Input: $2.50 / 1M tokens
Output: $10.00 / 1M tokens
Context: 128K tokens
Batch discount: 50% off
Best for: Complex reasoning, multi-step planning, code generation at high accuracy
Claude Sonnet 4.5
Input: $3.00 / 1M tokens
Output: $15.00 / 1M tokens
Context: 200K tokens
Cache reads: $0.30 / 1M tokens
Best for: Long-document analysis, nuanced instruction following, enterprise workflows
Gemini 2.5 Pro
Input (up to 200K): $1.25 / 1M
Input (200K+): $2.50 / 1M
Output: $10.00 / 1M tokens
Context: 1M tokens
Best for: Multimodal tasks, very long context, Google ecosystem integration
Mistral Large
Input: $2.00 / 1M tokens
Output: $6.00 / 1M tokens
Context: 128K tokens
Batch discount: Available
Best for: European data residency requirements, multilingual tasks, cost-sensitive frontier use cases
Mid-Tier Models
GPT-4o Mini
Input: $0.15 / 1M
Output: $0.60 / 1M
Context: 128K
Strong instruction following, fast latency
Claude Haiku 3.5
Input: $0.80 / 1M
Output: $4.00 / 1M
Cache reads: $0.08 / 1M
Top mid-tier for structured output + tool use
Gemini 2.0 Flash
Input: $0.10 / 1M
Output: $0.40 / 1M
Context: 1M tokens
Best mid-tier price-performance, 1M context window
Index note: Rates reflect standard API pricing as of March 2026. Enterprise agreements, Azure OpenAI Service, and Google Cloud Vertex AI pricing may differ. Always verify current rates at provider pricing pages before finalizing cost models.
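For cost modeling, the standard rates above can be collected into a single lookup. The model keys mirror this index's names; verify the figures against provider pricing pages before relying on them:

```python
# Standard pay-as-you-go rates from this index, USD per 1M tokens.
# (input, output) pairs; batch discounts and cache reads not included.

RATES = {
    "gpt-5.4":           (2.50, 10.00),
    "claude-sonnet-4.5": (3.00, 15.00),
    "gemini-2.5-pro":    (1.25, 10.00),  # input rate for prompts up to 200K
    "gpt-4o-mini":       (0.15, 0.60),
    "claude-haiku-3.5":  (0.80, 4.00),
    "gemini-2.0-flash":  (0.10, 0.40),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of a single request at standard rates."""
    inp, out = RATES[model]
    return (input_tokens * inp + output_tokens * out) / 1_000_000

# e.g. a 12K-in / 1.2K-out call on a mid-tier model:
print(round(request_cost("claude-haiku-3.5", 12_000, 1_200), 4))  # 0.0144
```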
Context Window Costs and Long-Context Penalties
Every token in a model's context window is billed as an input token, including the conversation history, system prompt, tool definitions, retrieved documents, and any previous turns in a multi-step agent loop. Long-context capability is therefore not free: filling a 128K context window costs 16x more in input tokens than filling an 8K window for the same task.
For agentic deployments, context accumulation is the most common source of unexpectedly high costs. Each turn in a multi-turn agent conversation retransmits the full conversation history plus the new input. A 10-turn research agent that adds 20K tokens of context per turn is resending roughly 200K tokens of history by the final turn, and it pays input rates on every retransmission along the way.
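Because each turn rebills the whole history, cumulative billed input grows quadratically with turn count. A sketch, assuming 20K tokens of new context per turn and an illustrative $3/1M input rate:

```python
# Total input tokens billed across a multi-turn agent loop when every turn
# resends the full history. 20K new tokens per turn and the $3/1M rate are
# assumptions matching the example in the text.

RATE_PER_M = 3.00  # USD per 1M input tokens (assumption)

def billed_input_tokens(turns: int, per_turn: int = 20_000) -> int:
    """Cumulative input tokens billed over the whole loop."""
    # Turn t resends all prior context plus its own: t * per_turn tokens.
    return sum(t * per_turn for t in range(1, turns + 1))

total = billed_input_tokens(10)  # 1,100,000 tokens, not just the final 200K
print(f"{total:,} tokens -> ${total / 1_000_000 * RATE_PER_M:.2f}")
```

The 10-turn agent is billed for 1.1M input tokens in total, roughly 5.5x the size of its final history.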
Remove completed tool call results, intermediate reasoning steps, and verbose error messages from context between turns. Retaining only the agent's final state and the current task reduces context costs by 40–60% in typical multi-turn workflows without affecting task completion quality.
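A minimal sketch of that pruning step; the message roles (`state_summary`, `current_task`) and the helper are illustrative inventions, not any provider's message schema:

```python
# Between agent turns, drop completed tool results and intermediate
# reasoning, keeping only the system prompt, the agent's latest state,
# and the live task. Roles here are hypothetical, not a provider API.

def prune_history(messages: list[dict]) -> list[dict]:
    """Keep system prompt, latest state summary, and the current task."""
    keep_roles = {"system", "state_summary", "current_task"}
    return [m for m in messages if m["role"] in keep_roles]

history = [
    {"role": "system", "content": "You are a research agent."},
    {"role": "tool_result", "content": "...8K tokens of search results..."},
    {"role": "reasoning", "content": "...intermediate chain..."},
    {"role": "state_summary", "content": "Found 3 relevant sources."},
    {"role": "current_task", "content": "Synthesize the final answer."},
]
pruned = prune_history(history)  # 3 of 5 messages survive
```

Only the system prompt, latest state, and live task carry into the next turn; the bulky tool output is gone.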
Retrieval-augmented generation costs far less than loading full document corpora into context. Retrieving 2K relevant tokens via vector search instead of loading a 100K-token document reduces that step's input cost by 98%. Context windows are best reserved for tasks that genuinely require holistic document understanding.
Anthropic and Google both offer prompt caching for repeated prefixes. System prompts and tool definitions that appear at the start of every request can be cached once and read at 10% of the standard input token price. For deployments with 5K+ token system prompts, caching alone cuts input costs by 30–50%.
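The savings are easy to estimate. Assuming a 5K-token system prompt, 1K requests per day, and illustrative $3.00/1M standard vs. $0.30/1M cache-read rates:

```python
# Monthly cost of retransmitting a fixed prompt prefix, with and without
# prompt caching. Prefix size, volume, and rates are assumptions.

INPUT_RATE = 3.00       # USD per 1M input tokens, standard (assumption)
CACHE_READ_RATE = 0.30  # USD per 1M tokens for cached prefix reads

def monthly_prefix_cost(prefix_tokens: int, requests: int, cached: bool) -> float:
    """USD per month spent just on the repeated prefix."""
    rate = CACHE_READ_RATE if cached else INPUT_RATE
    return prefix_tokens * requests / 1_000_000 * rate

reqs = 30 * 1_000  # 1K requests/day over a 30-day month
uncached = monthly_prefix_cost(5_000, reqs, cached=False)  # ~$450/month
cached = monthly_prefix_cost(5_000, reqs, cached=True)     # ~$45/month
print(f"uncached ${uncached:.2f} vs cached ${cached:.2f}")
```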
Requiring JSON or structured output instead of prose reduces output token counts by 30–70% for data extraction and classification tasks. A verbose narrative explanation of a classification decision costs 5–10x more in output tokens than returning {"label":"positive","confidence":0.94}.
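A rough comparison at an assumed $4/1M mid-tier output rate; the ~150 vs. ~15 token counts are estimates for illustration, not tokenizer output:

```python
# Monthly output-token cost of a prose rationale vs. a compact JSON label
# for one classification, at 1K requests/day. Rate and token counts are
# illustrative assumptions.
import json

OUTPUT_RATE = 4.00  # USD per 1M output tokens (assumed mid-tier rate)

compact = json.dumps({"label": "positive", "confidence": 0.94})
# A prose explanation of the same decision might run ~150 tokens;
# the JSON above is roughly 15 (rough estimates, not tokenizer counts).
prose_tokens, json_tokens = 150, 15

def output_cost(tokens_per_request: int, requests: int = 30_000) -> float:
    """USD per month of output tokens at the assumed rate."""
    return tokens_per_request * requests / 1_000_000 * OUTPUT_RATE

print(output_cost(prose_tokens), output_cost(json_tokens))
```

At these assumptions the prose format costs about 10x more per month for identical information content.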
Cost Estimates for Five Agent Deployment Patterns
The following estimates assume 1,000 tasks per day, using mid-tier models (Gemini 2.0 Flash or Claude Haiku 3.5) for standard steps and frontier models (Claude Sonnet 4.5) for reasoning-heavy steps. Costs are monthly totals. Task complexity tiers are defined as: simple (single-step, structured output), moderate (2–4 steps with tool use), and complex (5+ steps, iterative refinement).
Research Agent
Pattern: Web search → summarize sources → synthesize answer → format report
Avg input tokens: 12K per task
Avg output tokens: 1,200 per task
Steps: 4–6 (moderate-complex)
Monthly cost at 1K tasks/day:
Mid-tier only: ~$480
Hybrid (frontier synthesis): ~$1,200
Frontier only: ~$4,800
Code Review Agent
Pattern: Parse diff → check patterns → security scan → generate feedback
Avg input tokens: 8K per task
Avg output tokens: 800 per task
Steps: 3–4 (moderate)
Monthly cost at 1K tasks/day:
Mid-tier only: ~$260
Hybrid: ~$720
Frontier only: ~$3,200
Customer Support Agent
Pattern: Classify intent → retrieve KB → draft response → escalation check
Avg input tokens: 2.5K per task
Avg output tokens: 400 per task
Steps: 2–3 (simple-moderate)
Monthly cost at 1K tasks/day:
Budget model: ~$22
Mid-tier: ~$90
Frontier: ~$900
Content Generation Agent
Pattern: Research brief → outline → draft sections → edit for brand voice
Avg input tokens: 6K per task
Avg output tokens: 3K per task
Steps: 4–5 (moderate-complex)
Monthly cost at 1K tasks/day:
Mid-tier: ~$540
Hybrid: ~$1,400
Frontier: ~$5,400
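Estimates like these can be approximated with a simple per-task model. The sketch below bills average tokens at flat rates and ignores per-step context retransmission, so treat its results as a lower bound; the rates and the 30-day month are assumptions:

```python
# Lower-bound monthly cost for an agent pattern: average per-task tokens
# billed at flat per-1M rates. Real multi-step loops resend context between
# steps, so actual spend runs higher than this estimate.

def monthly_cost(in_tokens: int, out_tokens: int, in_rate: float,
                 out_rate: float, tasks_per_day: int = 1_000,
                 days: int = 30) -> float:
    """USD per month; rates are USD per 1M tokens."""
    tasks = tasks_per_day * days
    return tasks * (in_tokens * in_rate + out_tokens * out_rate) / 1_000_000

# Customer-support-style pattern (2.5K in / 400 out) at budget-tier rates:
print(round(monthly_cost(2_500, 400, 0.15, 0.60), 2))  # 18.45
```

Swapping in frontier rates for only the reasoning-heavy steps reproduces the hybrid figures' shape: a handful of expensive steps dominates the total.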