
AI API Cost Calculator: Claude, GPT-5, Gemini & Groq 2026 Pricing | AIPricingCalc

News · 2026-05-11

AI API Cost Calculator

Compare real token pricing for Claude, GPT-5, Gemini, Groq, DeepSeek, and more. Estimate your monthly API spend before you build – no surprises on your invoice.

Prices verified April 2026

How to use the calculator:

1. Select a provider.
2. Paste your prompt to count tokens (optional) – the token count appears once you paste text.
3. Estimate your usage – the tool shows cost per call and monthly cost at 1K, 10K, and 100K calls per month, then compares the same usage across every model in the pricing table below.

What Are Tokens and Why Do They Cost Money?

When you send a message to an AI model through its API, the text is broken into small units called tokens before processing. A token is roughly 4 characters or about 0.75 words in English – so a 1,000-word document is approximately 1,300 tokens.
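As a rough sketch of those heuristics (the 4-characters-per-token and 0.75-words-per-token ratios are approximations; real counts vary by tokenizer and language):

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters per token for English text."""
    return max(1, round(len(text) / 4))

def estimate_tokens_from_words(word_count: int) -> int:
    """Rough token estimate: ~0.75 words per token (~1.33 tokens per word)."""
    return round(word_count / 0.75)

# A 1,000-word document comes out to roughly 1,300 tokens under this heuristic.
doc_tokens = estimate_tokens_from_words(1_000)
```

For production billing, use the provider's own tokenizer or token-counting endpoint rather than a character heuristic.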

AI providers charge separately for input tokens (what you send to the model – your prompt, system instructions, and conversation history) and output tokens (what the model sends back). Output tokens are almost always more expensive than input tokens, typically 3-10x more depending on the model.

This is why API costs can surprise developers. A simple chatbot with a long system prompt, a full conversation history, and verbose responses can cost far more than flat-fee subscriptions like ChatGPT Plus or Claude Pro. The calculator above helps you estimate real costs before you commit to building.

How to Reduce Your AI API Costs

Prompt compression is the highest-leverage optimization available. Every token you remove from your system prompt multiplies across every API call you make. Trimming a 1,000-token system prompt to 600 tokens saves 400 tokens per call – at 10,000 calls per month, that’s 4 million tokens saved. Rewrite instructions in direct, imperative language. “Please always make sure to respond in a polite and friendly manner” becomes “Be polite and friendly.” Same instruction, 60% fewer tokens.
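The arithmetic above, as a quick sketch (the $3.00 per 1M input rate is Claude Sonnet 4.6's, taken from the pricing table in this article):

```python
# Monthly savings from trimming a 1,000-token system prompt to 600 tokens.
tokens_saved_per_call = 1_000 - 600
calls_per_month = 10_000
tokens_saved_per_month = tokens_saved_per_call * calls_per_month  # 4,000,000 tokens

input_rate_per_million = 3.00  # USD per 1M input tokens (Claude Sonnet 4.6)
dollars_saved = tokens_saved_per_month / 1_000_000 * input_rate_per_million  # 12.0 USD/mo
```

The dollar figure scales linearly with call volume and input rate, so the same trim on a flagship model at 100K calls/month saves proportionally more.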

Choose the right model tier for each task. Not every API call needs a flagship model. Claude Haiku 4.5 and GPT-5.4 nano handle simple classification, extraction, summarization, and support responses at 10-30x lower cost than flagship models with comparable quality for those tasks. Build a tiered routing system that sends simple tasks to cheap models and only escalates complex reasoning to expensive ones.
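A minimal sketch of such a router, assuming each request arrives tagged with a task type (the model ID strings are illustrative placeholders, not verified API identifiers):

```python
# Task types that a budget-tier model handles well, per the discussion above.
CHEAP_TASKS = {"classification", "extraction", "summarization", "support_reply"}

def route_model(task_type: str) -> str:
    """Send simple tasks to a cheap model; escalate everything else."""
    if task_type in CHEAP_TASKS:
        return "claude-haiku-4.5"  # illustrative budget-tier model ID
    return "claude-opus-4.6"       # illustrative flagship model ID
```

In practice the tag can come from a rules layer or from a cheap classifier call, and you can add a middle tier for moderately complex tasks.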

Implement prompt caching for repeated system prompts. Both Anthropic and OpenAI offer prompt caching that charges 90% less for cached input tokens. If your system prompt is 2,000 tokens sent with every call, caching it reduces that cost to roughly 200 tokens per call. On a high-volume application, this single change can cut your monthly bill by 50% or more.
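As an illustration, an Anthropic-style request body that marks the system prompt for caching. The field names follow Anthropic's prompt-caching documentation at the time of writing (verify against the current API reference), and the model ID is illustrative:

```python
# Request body with the system prompt flagged as a cacheable block.
request_body = {
    "model": "claude-sonnet-4.6",  # illustrative model ID
    "max_tokens": 1024,
    "system": [
        {
            "type": "text",
            "text": "You are a support assistant. <...~2,000 tokens of instructions...>",
            # Subsequent calls that reuse this exact block read it from cache
            # at roughly 10% of the standard input-token rate.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    "messages": [{"role": "user", "content": "Where is my order?"}],
}
```

Cached content must be byte-identical between calls, so keep the static prompt in the cached block and put anything per-user in the messages.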

Set max_tokens on every API call. Uncapped output tokens are the most common cause of runaway API costs. If your use case only needs 200-word responses, set max_tokens to 280. You will never pay for tokens you don’t need.
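One simple way to derive the cap from a target word count, using the ~1.33 tokens-per-word heuristic from earlier plus a little slack (the helper name and slack value are illustrative choices):

```python
def output_token_cap(max_words: int, slack: int = 15) -> int:
    """Token budget for a response of at most max_words English words."""
    return max_words * 4 // 3 + slack  # ~1.33 tokens per word, plus slack

cap = output_token_cap(200)  # roughly 280 tokens for a 200-word response
```

Pass the result as `max_tokens` on every call so a verbose response can never run past your budget.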

Current AI API Pricing — April 2026

| Model | Input /1M tokens | Output /1M tokens | Best for |
|---|---|---|---|
| Llama 3.1 8B (Groq, cheapest) | $0.05 | $0.08 | Ultra-low cost, simple tasks |
| GPT-OSS 20B (Groq, 1000 TPS) | $0.075 | $0.30 | Fastest inference available |
| Gemini 2.5 Flash-Lite (cheapest Google) | $0.10 | $0.40 | Bulk processing, classification |
| Llama 4 Scout (Groq) | $0.11 | $0.34 | 512K context at very low cost |
| DeepSeek V3.2 (best value) | $0.14 | $0.28 | Strong quality at near-zero cost |
| GPT-OSS 120B (Groq) | $0.15 | $0.60 | Best open-source quality on Groq |
| GPT-5.4 nano | $0.20 | $1.25 | Cheapest GPT-5 model |
| GPT-5 mini | $0.25 | $2.00 | Affordable OpenAI mid-tier |
| Gemini 2.5 Flash | $0.30 | $2.50 | Fast multimodal, 1M context |
| Claude Haiku 4.5 (cheapest Claude) | $1.00 | $5.00 | High-volume Claude tasks |
| GPT-5 | $1.25 | $10.00 | OpenAI flagship at competitive cost |
| Gemini 2.5 Pro | $1.25 | $10.00 | Best value Google model |
| GPT-5.2 | $1.75 | $14.00 | Capable OpenAI mid-tier |
| GPT-4.1 | $2.00 | $8.00 | 1M context, proven workhorse |
| Gemini 3.1 Pro | $2.00 | $12.00 | Google flagship, cheaper output than GPT-5.4 |
| GPT-5.4 (OpenAI flagship) | $2.50 | $15.00 | Complex reasoning and vision |
| Claude Sonnet 4.6 | $3.00 | $15.00 | Best-in-class coding and agents |
| Claude Opus 4.6 (Anthropic flagship) | $5.00 | $25.00 | 1M context, extended thinking |
| Claude Opus 4.1 (legacy) | $15.00 | $75.00 | Migrate to Opus 4.6 (3x cheaper) |
| Gemini 2.0 Flash-Lite (deprecated Jun 1) | $0.10 | $0.40 | Migrate to Gemini 2.5 Flash-Lite |

Frequently Asked Questions

What do the Claude models cost via API?

As of April 2026, Anthropic offers three current Claude models. Claude Haiku 4.5 is the most affordable at $1.00 per million input tokens and $5.00 per million output tokens. Claude Sonnet 4.6 costs $3.00 input and $15.00 output per million tokens. Claude Opus 4.6, the flagship, is $5.00 input and $25.00 output per million tokens. This represents a significant price reduction from earlier generations — the legacy Opus 4.1 cost $15/$75, meaning Opus 4.6 delivers comparable or better performance at one-third the price.
How much does the GPT-5 API cost?

OpenAI’s GPT-5 family has several tiers as of April 2026. GPT-5.4 nano is the cheapest at $0.20 input / $1.25 output per million tokens. GPT-5.4 mini sits at $0.75 / $4.50. GPT-5 is $1.25 / $10.00. GPT-5.2 is $1.75 / $14.00. The flagship GPT-5.4 costs $2.50 input and $15.00 output per million tokens. GPT-5.4 pro, the most expensive, is $30.00 / $180.00 per million tokens and is intended for highly specialized workloads only.
Is GPT-5 or Claude cheaper?

At the budget tier, GPT-5.4 nano ($0.20/$1.25) is significantly cheaper than Claude Haiku 4.5 ($1.00/$5.00). At the mid tier, GPT-5.2 ($1.75/$14.00) undercuts Claude Sonnet 4.6 ($3.00/$15.00) substantially on input, though output pricing is comparable. At the flagship tier, GPT-5.4 ($2.50/$15.00) is cheaper than Claude Opus 4.6 ($5.00/$25.00) on both input and output. The best choice depends on your workload – test both on your specific task before committing.
What is the cheapest AI API in 2026?

The cheapest production-grade options as of April 2026 are Llama 3.1 8B on Groq at $0.05/$0.08 per million tokens, and Gemini 2.5 Flash-Lite at $0.10/$0.40. DeepSeek V3.2 at $0.14/$0.28 is remarkably capable for the price and worth testing for tasks that need more intelligence than the budget open-source models. For applications requiring proprietary model quality, Claude Haiku 4.5 at $1/$5 and GPT-5.4 nano at $0.20/$1.25 offer the best cost-to-quality ratio in their respective ecosystems.
How do I calculate my AI API costs?

You need three numbers: your average input tokens per API call (your prompt plus any context or conversation history), your average output tokens per call (the model’s response), and your estimated number of API calls per month. Multiply input tokens by the model’s input rate per million, output tokens by the output rate per million, sum them for cost per call, then multiply by monthly call volume. The calculator above does all of this automatically and compares the result across all major models simultaneously.
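The calculation described above, sketched in code (the example rates are GPT-5's $1.25/$10.00, from the pricing table in this article):

```python
def monthly_cost(avg_input_tokens: int, avg_output_tokens: int,
                 calls_per_month: int,
                 input_rate: float, output_rate: float) -> float:
    """Estimate monthly API spend in USD. Rates are USD per 1M tokens."""
    cost_per_call = (avg_input_tokens * input_rate
                     + avg_output_tokens * output_rate) / 1_000_000
    return cost_per_call * calls_per_month

# Example: 1,500 input + 400 output tokens per call, 10K calls/mo, GPT-5 at $1.25/$10.00
estimate = monthly_cost(1_500, 400, 10_000, 1.25, 10.00)  # 58.75 USD/month
```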
Why are output tokens more expensive than input tokens?

Generating output tokens is computationally more intensive than processing input tokens. Input tokens can be processed in parallel through the model’s attention mechanism, while output tokens must be generated one at a time — each requiring a full forward pass through the model. This sequential generation is what makes output tokens 3-10x more expensive than input tokens depending on the model, and why limiting your max_tokens setting is one of the most effective ways to control API costs.
How does prompt caching work?

Prompt caching stores previously processed portions of your prompt — typically your system prompt, large documents, or conversation history — so subsequent requests read from cache rather than reprocessing the same tokens. Cache reads are charged at roughly 10% of the standard input rate on both Anthropic and OpenAI. For an application with a 2,000-token system prompt sent with every call, caching reduces that system prompt cost by 90% on every cached request. At high volumes, prompt caching combined with the Batch API (which offers a 50% discount) can reduce total API costs by up to 95%.
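The stacked discounts work out as follows (the base rate is illustrative; the ~10% cache-read rate and 50% batch discount are as described above):

```python
base_input_rate = 3.00                     # USD per 1M input tokens (illustrative)
cached_rate = base_input_rate * 0.10       # cache reads at ~10% of the standard rate
batched_cached_rate = cached_rate * 0.50   # Batch API halves it again
reduction = 1 - batched_cached_rate / base_input_rate  # 0.95, i.e. 95% off
```

Note the 95% figure applies only to the cached, batched portion of your traffic; cache misses and live (non-batch) calls still pay higher rates.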
Which Gemini models are deprecated?

Gemini 2.0 Flash and Gemini 2.0 Flash-Lite are both deprecated as of April 2026 and will be shut down on June 1, 2026. If you are using either of these models, migrate to Gemini 2.5 Flash-Lite (same $0.10/$0.40 pricing, newer architecture) or Gemini 2.5 Flash ($0.30/$2.50 for better quality) before the shutdown date to avoid service disruption.