Key Takeaways
In Q2 2026, the spread in input token pricing has hit 60x: as low as $0.05 per million tokens with Qwen 3.5 9B, rising to $3 per million for Claude Sonnet 4.6, with Claude Opus 4.6 output reaching $25 per million. The Digital Applied LLM API Pricing Index monitors where this gap is widening or narrowing, which providers are holding prices high, and how agencies should distribute traffic across tiers to preserve margins without sacrificing performance.
This Q2 2026 update categorizes every major model on OpenRouter into five pricing tiers — ultra-low, economy, mid, premium, and free. We then layer on the 90-day change, the cost-routing strategies we employ in production, and the total-cost-of-ownership factors that list prices often miss. All data below is sourced from OpenRouter's April 2026 public pricing table.
Pricing snapshot date: April 12, 2026. LLM rates change monthly, so verify against the OpenRouter models catalog before locking in any cost model. Combine this with our performance-vs-price efficient frontier analysis for a complete view of capabilities.
The Q2 2026 Pricing Landscape
The pricing curve for Q2 2026 is shaped by two opposing forces. On one side, Chinese and open-weight providers continue to squeeze the low end — with Qwen 3.5 9B at $0.05 input, MiMo V2 Flash at $0.09, and Step 3.5 Flash at $0.10. On the other, Anthropic, OpenAI, and Google maintain steady premium prices because spend driven by capability doesn't hunt for discounts. Between them lies a crowded economy tier ($0.15-$0.50) where most high-volume agentic traffic is now settling.
- Ultra-low (<$0.15/M input): ideal for bulk classification, extraction, OCR post-processing, retrieval re-ranking, and agent memory compaction.
- Economy ($0.15-$0.50): suited for planning, tool selection, routine code generation, and shaping structured data.
- Mid-tier ($0.50-$3): designed for reasoning-heavy tasks, complex tool chains, multi-step agent workflows, and technical writing.
- Premium ($3+): reserved for terminal reasoning, irreversible actions, client-facing one-shot outputs, and the final mile of difficult coding problems.
- Free tier: perfect for experimentation, load testing, fallback routes, and non-critical background jobs where variable latency is acceptable.
Design the routing layer first. Model selection is merely a result of workload classification. Work with our AI Digital Transformation team to build the classification and routing tier that will sustain the rest of your AI budget.
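The classification-first idea above can be sketched as a minimal lookup-based router. The workload labels and the tier assignments below are illustrative assumptions drawn from the tier descriptions in this post, not a production taxonomy:

```python
# Minimal sketch of a classification-first router.  The workload
# labels and tier assignments are illustrative assumptions, not a
# production taxonomy.
TIER_FOR_WORKLOAD = {
    # ultra-low: bulk, non-reasoning token flow
    "classification": "ultra-low",
    "extraction": "ultra-low",
    "rerank": "ultra-low",
    "memory_compaction": "ultra-low",
    # economy: planning and routine generation
    "planning": "economy",
    "tool_selection": "economy",
    "routine_codegen": "economy",
    # mid: reasoning-heavy, multi-step work
    "agent_workflow": "mid",
    "technical_writing": "mid",
    # premium: irreversible or client-facing one-shot work
    "irreversible_action": "premium",
    "client_oneshot": "premium",
}

def route(workload: str, default: str = "mid") -> str:
    """Map a classified workload to a pricing tier, defaulting to mid."""
    return TIER_FOR_WORKLOAD.get(workload, default)

print(route("extraction"))   # -> ultra-low
print(route("planning"))     # -> economy
```

The point of the sketch is that model names never appear in application code: when a cheaper model enters a tier, only the tier-to-model binding changes, not the router.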
Ultra-Low Tier (<$0.15/M Input)
The most significant action in Q2 2026 is found in the ultra-low tier. Four models priced under $0.15 input (Qwen 3.5 9B, Qwen 3.5 Flash, MiMo V2 Flash, and Step 3.5 Flash) collectively handle the majority of non-reasoning agent traffic we observe in agency pipelines. All four support at least 256K context, and Qwen 3.5 Flash offers a full 1M context at $0.065 input, a price-to-context ratio that was unavailable anywhere just twelve months ago.
| Model | Provider | Input $/M | Output $/M | Context |
|---|---|---|---|---|
| Qwen 3.5 9B | Alibaba | $0.05 | $0.15 | 256K |
| Qwen 3.5 Flash | Alibaba | $0.065 | $0.26 | 1M |
| MiMo V2 Flash | Xiaomi | $0.09 | $0.29 | 262K |
| Step 3.5 Flash | StepFun | $0.10 | $0.30 | 262K (free tier) |
Route aggressively to the ultra-low tier. In our internal pipelines, approximately 55-65% of total tokens flow through this band after classification-first routing. On extraction tasks, the cost difference compared to mid-tier is typically 10-20x for identical output quality.
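A quick back-of-envelope check on that 10-20x claim, using Qwen 3.5 9B prices from the table above and MiMo V2 Pro prices from the mid-tier table later in this post. The job size (1M input tokens, 200K output tokens) is an assumed extraction workload:

```python
# Back-of-envelope check on the 10-20x cost gap.  Prices come from
# the tables in this post; the job size is an assumption.
def job_cost(in_tokens_m, out_tokens_m, in_price, out_price):
    """Total $ for a job, with token counts given in millions."""
    return in_tokens_m * in_price + out_tokens_m * out_price

ultra = job_cost(1.0, 0.2, 0.05, 0.15)   # Qwen 3.5 9B ($0.05/$0.15)
mid   = job_cost(1.0, 0.2, 1.00, 3.00)   # MiMo V2 Pro ($1/$3)

print(round(ultra, 2), round(mid, 2), round(mid / ultra))  # 0.08 1.6 20
```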
Economy Tier ($0.15-$0.50)
The economy tier is the most active segment of the Q2 2026 market. It hosts models like Qwen 3 Coder Next for software tasks, MiniMax M2.5 and M2.7 for general agentic traffic, Qwen 3.5 35B and 3.5 Plus for balanced reasoning, and MiMo V2 Omni for multimodal needs. This is where most planning, tool-routing, and structured generation should land for agencies optimizing cost without dropping to ultra-low quality.
| Model | Provider | Input $/M | Output $/M | Context |
|---|---|---|---|---|
| Qwen 3 Coder Next | Alibaba | $0.12 | $0.75 | 256K |
| MiniMax M2.5 | MiniMax | $0.12 | $0.99 | 197K |
| Qwen 3.5 35B | Alibaba | $0.16 | $1.30 | 262K |
| Qwen 3.5 Plus | Alibaba | $0.26 | $1.56 | 1M |
| MiniMax M2.7 | MiniMax | $0.30 | $1.20 | 205K |
| MiMo V2 Omni | Xiaomi | $0.40 | $2.00 | 262K |
Be mindful of the output pricing variance within this band. Qwen 3 Coder Next is at $0.75 output versus a $0.12 input, while MiMo V2 Omni hits $2 output on a $0.40 input. For workloads involving extensive generation, the economics will vary significantly depending on which economy-tier model is used, so benchmark your specific input/output ratio before standardizing.
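One way to run that benchmark is to compute a blended price per million tokens as a function of output share. This is a minimal sketch; the 10% and 50% output shares are assumptions, and you should substitute the input/output ratio you actually observe in your pipeline:

```python
# Blended $/M tokens as a function of output share.  The output
# shares tried below are assumptions; use your own telemetry.
def blended_price(in_price: float, out_price: float, output_share: float) -> float:
    """Effective $/M tokens when `output_share` of tokens are generated."""
    return in_price * (1 - output_share) + out_price * output_share

# Economy-tier prices from the table above
for share in (0.1, 0.5):
    coder = blended_price(0.12, 0.75, share)   # Qwen 3 Coder Next
    omni = blended_price(0.40, 2.00, share)    # MiMo V2 Omni
    print(f"output share {share:.0%}: coder ${coder:.3f}/M, omni ${omni:.3f}/M")
```

At a 10% output share the two models differ by about 3x per token; at 50% the gap narrows slightly but the absolute cost per token roughly triples, which is why generation-heavy workloads need their own benchmark.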
Mid-Tier ($0.50-$3)
The mid-tier has become leaner, as the ultra-low and economy bands have absorbed much of the workload that would have fallen here in 2025. What remains sits roughly between $0.75 and $1 on input: MiMo V2 Pro acts as the heavyweight generalist with a 1.04M context window, and Qwen 3 Max Thinking serves as the reasoning variant for step-by-step problem solving.
| Model | Provider | Input $/M | Output $/M | Context |
|---|---|---|---|---|
| Qwen 3 Max Thinking | Alibaba | $0.78 | $3.90 | 262K |
| MiMo V2 Pro | Xiaomi | $1.00 | $3.00 | 1.04M |
MiMo V2 Pro is currently the top model on OpenRouter by volume, processing 4.79T weekly tokens and handling about a quarter of all coding tokens observed on the network. This concentration of real-world workload at $1 input / $3 output marks the mid-tier's pricing ceiling: the market has decided that reasoning-grade, 1M-context capability should cost no more than roughly $1 per million input tokens unless the model clears a premium capability bar.
Premium Tier ($3+)
The premium tier is dominated by Anthropic and OpenAI. Claude Sonnet 4.6 at $3/$15 and Opus 4.6 at $5/$25 (via OpenRouter) have held firm through Q2 despite pressure from cheaper Chinese models performing similarly on benchmarks. The GPT-5.4 family fits in here as well: GPT-5.4 at $2.50/$15, GPT-5.3-Codex at $1.75/$14, and GPT-5.4 Pro leading the market at $30/$180. This is where capability-bound spend is concentrated.
| Model | Provider | Input $/M | Output $/M | Context |
|---|---|---|---|---|
| GPT-5.4 | OpenAI | $2.50 | $15.00 | 1.05M |
| Claude Sonnet 4.6 | Anthropic | $3.00 | $15.00 | 200K / 1M beta |
| Claude Opus 4.6 | Anthropic | $5.00 | $25.00 | 200K / 1M beta |
| GPT-5.4 Pro | OpenAI | $30.00 | $180.00 | 1.05M |
The Opus concentration problem. Claude Opus 4.6 alone drives roughly $25.1M per month in API spend, dominating Anthropic's direct API revenue mix. We explore the revenue-geometry implications in the Anthropic cost problem analysis.
Free Tier Models
Q2 2026 has brought an unusually robust free tier. Qwen 3.6 Plus is completely free during preview with a 1M context window — already rising to the #2 spot on OpenRouter by volume with 1.64T weekly tokens. NVIDIA's Nemotron 3 Super 120B and Nemotron 3 Nano 30B both offer free tiers and 256K+ context. For agencies, these free tiers act as a genuine infrastructure subsidy and should be included in any cost plan as a fallback and experimentation route.
| Model | Provider | Cost | Context | Notes |
|---|---|---|---|---|
| Qwen 3.6 Plus | Alibaba | Free (preview) | 1M | #2 on OpenRouter, always-on CoT, native function calling |
| Nemotron 3 Super 120B | NVIDIA | Free tier | 262K | 120B/12B active, 60.47% SWE-Bench Verified, open-source |
| Nemotron 3 Nano 30B | NVIDIA | Free tier | 256K | Open-source, compact deployment-friendly |
| Step 3.5 Flash | StepFun | Free tier | 262K | Paid tier also available at $0.10/$0.30 |
Approach free-tier routing as an operational decision, not just a cost-saving measure. Free tiers come with rate limits, latency fluctuations, and provider-side preview caveats, making them best suited for fallback chains, background batch jobs, and development sandboxes rather than customer-facing production paths.
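A fallback chain along those lines can be sketched as follows. The model identifiers and the `call_model` client are hypothetical; a real implementation would wrap the OpenRouter API with timeouts, backoff, and explicit rate-limit handling:

```python
# Sketch of a free-tier-first fallback chain.  Model identifiers and
# the call_model client are hypothetical placeholders.
FALLBACK_CHAIN = [
    "qwen-3.6-plus-free",    # free preview: rate limits, variable latency
    "step-3.5-flash",        # cheap paid fallback ($0.10/$0.30)
    "qwen-3.5-flash",        # second paid fallback, 1M context
]

def complete_with_fallback(prompt: str, call_model) -> str:
    """Try each model in order; raise only if every one fails."""
    last_err = None
    for model in FALLBACK_CHAIN:
        try:
            return call_model(model, prompt)
        except Exception as err:   # rate limit, timeout, preview outage
            last_err = err
    raise RuntimeError("all models in fallback chain failed") from last_err
```

Note the ordering: the free model leads only because this chain is meant for background and batch paths; a customer-facing chain would typically invert it, with the free tier as the last resort.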
90-Day Delta Analysis
The most notable shift between Q1 2026 and Q2 2026 is what failed to occur. Anthropic held Sonnet and Opus pricing steady despite Sonnet 4.6's launch pressing Opus margins. OpenAI did not significantly reprice the GPT-5.4 family. Google kept Gemini 3.1 Pro at $2/$12. The premium tier remains stable, not eroding.
- Ultra-low compression continues. Qwen 3.5 Flash debuted at $0.065/$0.26 with 1M context, resetting price-per-context expectations for the entire low-end market.
- Economy tier crowding. Six distinct models now occupy the $0.12-$0.40 input band, with output prices differing by 2.5x across them for similar task quality.
- Mid-tier shrinks. Workloads previously directed to mid-tier have moved to either cheaper economy-tier models or premium Claude Sonnet 4.6. Only MiMo V2 Pro and Qwen 3 Max Thinking retain significant mid-tier share.
- Premium holds. No price changes for flagship Anthropic or OpenAI models in Q2 2026. Spend driven by capability is not price-elastic at the premium tier.
- Free-tier expansion. Qwen 3.6 Plus and the Nemotron 3 family added large-context free options that were absent from Q1 2026 pricing sheets.
The strategic takeaway is that the pricing curve is becoming more bimodal, not smoother. Cheap gets cheaper, premium stays premium. Agencies should be most wary of defaulting to the middle, because workload classification now routes most requests either below or above it.
Agency Cost-Routing Strategy
The most impactful lever in LLM cost management is establishing a routing tier before choosing models. The goal is straightforward: classify every query by complexity and match it to the cheapest model capable of serving it at the required quality level. When executed well, this reduces API spend by 60-80% compared to naive single-model deployments, and it scales automatically with every new model released without requiring architectural changes.
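To make that savings range concrete, here is an input-token-only estimate under an assumed traffic split. The split and the blended per-tier prices are illustrative assumptions, and because this ignores output tokens it lands above the 60-80% range quoted above; real savings depend heavily on your output mix:

```python
# Input-token-only estimate of routing savings.  The traffic split
# and blended per-tier prices are illustrative assumptions.
SPLIT = {"ultra-low": 0.60, "economy": 0.25, "mid": 0.10, "premium": 0.05}
INPUT_PRICE = {"ultra-low": 0.07, "economy": 0.20, "mid": 0.90, "premium": 3.00}

routed = sum(SPLIT[t] * INPUT_PRICE[t] for t in SPLIT)   # blended $/M input
naive = INPUT_PRICE["premium"]    # everything on a single premium model
savings = 1 - routed / naive

print(f"blended ${routed:.3f}/M vs ${naive:.2f}/M -> {savings:.0%} saved")
```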
The Four-Stage Stack
- Classi