1. Select a provider
2. Paste your prompt to count tokens (optional)
3. Estimate your usage
What Are Tokens and Why Do They Cost Money?
When you send a message to an AI model through its API, the text is broken into small units called tokens before processing. A token is roughly 4 characters or about 0.75 words in English – so a 1,000-word document is approximately 1,300 tokens.
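The two rules of thumb above can be turned into a quick estimator. This is a minimal sketch, not a real tokenizer: the function names are illustrative, and actual counts vary by model and language.

```python
import math

# Rules of thumb quoted in the text; real tokenizers vary by model and language.
CHARS_PER_TOKEN = 4
WORDS_PER_TOKEN = 0.75


def estimate_tokens_from_text(text: str) -> int:
    """Estimate tokens from character count (about 4 characters per token)."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def estimate_tokens_from_words(word_count: int) -> int:
    """Estimate tokens from word count (about 0.75 words per token)."""
    return math.ceil(word_count / WORDS_PER_TOKEN)


print(estimate_tokens_from_words(1000))  # 1334, i.e. roughly 1,300 tokens
```

For billing-grade numbers, use the token counts that the provider reports for each call rather than an estimate like this one.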
AI providers charge separately for input tokens (what you send to the model – your prompt, system instructions, and conversation history) and output tokens (what the model sends back). Output tokens are almost always more expensive than input tokens, typically 3-10x more depending on the model.
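Because input and output are billed at different rates, the cost of a call is a weighted sum of the two token counts. The sketch below assumes a hypothetical workload (2,000 input tokens, 500 output tokens, 10,000 calls per month) and uses the Claude Sonnet 4.6 rates from the pricing table on this page; the function name is illustrative.

```python
def call_cost(input_tokens: int, output_tokens: int,
              input_price_per_m: float, output_price_per_m: float) -> float:
    """Dollar cost of one call; prices are quoted per 1M tokens."""
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000


# Hypothetical workload priced at $3.00 input and $15.00 output per 1M tokens.
per_call = call_cost(2_000, 500, 3.00, 15.00)
monthly = per_call * 10_000  # assumed 10,000 calls per month

print(f"per call: ${per_call:.4f}")  # per call: $0.0135
print(f"monthly:  ${monthly:.2f}")   # monthly:  $135.00
```

In this example the output accounts for only 20% of the tokens but more than half of the cost, which is the effect the price gap produces.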
This is why API costs can surprise developers. A simple chatbot with a long system prompt, a full conversation history, and verbose responses can cost far more than flat-fee subscriptions like ChatGPT Plus or Claude Pro. The calculator above helps you estimate real costs before you commit to building.
How to Reduce Your AI API Costs
Prompt compression is one of the highest-leverage optimizations available. Every token you remove from your system prompt is saved on every API call you make. Trimming a 1,000-token system prompt to 600 tokens saves 400 tokens per call; at 10,000 calls per month, that’s 4 million input tokens saved. Rewrite instructions in direct, imperative language. “Please always make sure to respond in a polite and friendly manner” becomes “Be polite and friendly.” Same instruction, roughly 60% fewer tokens.
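The arithmetic in that example can be checked with a few lines. This is a sketch with an illustrative function name; the $3.00 per 1M input price is the Claude Sonnet 4.6 rate from the pricing table and is only one possible choice.

```python
def monthly_prompt_savings(old_prompt_tokens: int, new_prompt_tokens: int,
                           calls_per_month: int,
                           input_price_per_m: float) -> tuple[int, float]:
    """Return (tokens saved, dollars saved) per month from a shorter prompt."""
    tokens_saved = (old_prompt_tokens - new_prompt_tokens) * calls_per_month
    dollars_saved = tokens_saved * input_price_per_m / 1_000_000
    return tokens_saved, dollars_saved


tokens_saved, dollars_saved = monthly_prompt_savings(1_000, 600, 10_000, 3.00)
print(tokens_saved)              # 4000000
print(f"${dollars_saved:.2f}")   # $12.00
```

The dollar figure scales linearly with call volume and with the model's input price, so the same trim matters far more on a flagship model at high volume.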
Choose the right model tier for each task. Not every API call needs a flagship model. Claude Haiku 4.5 and GPT-5.4 nano handle simple classification, extraction, summarization, and support responses at a fraction of flagship prices: by the pricing table below, Haiku 4.5 costs 5x less than Claude Opus 4.6 and GPT-5.4 nano about 12x less than GPT-5.4, with comparable quality for those tasks. Build a tiered routing system that sends simple tasks to cheap models and escalates only complex reasoning to expensive ones.
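A tiered router can start as a simple lookup on task type. The sketch below is one possible shape, not a prescribed design: the task labels, the `force_flagship` override, and the model strings are all assumptions, and the model strings are display names from the text rather than verified API identifiers.

```python
# Task types the text describes as suitable for cheap models (labels assumed).
SIMPLE_TASKS = frozenset({"classification", "extraction", "summarization", "support"})

# Placeholder labels taken from the text, not verified API model identifiers.
CHEAP_MODEL = "Claude Haiku 4.5"
FLAGSHIP_MODEL = "Claude Opus 4.6"


def pick_model(task_type: str, force_flagship: bool = False) -> str:
    """Send simple tasks to the cheap tier; escalate everything else."""
    if force_flagship:
        return FLAGSHIP_MODEL
    if task_type.lower() in SIMPLE_TASKS:
        return CHEAP_MODEL
    return FLAGSHIP_MODEL


print(pick_model("classification"))        # Claude Haiku 4.5
print(pick_model("multi-step reasoning"))  # Claude Opus 4.6
```

Defaulting unknown task types to the flagship model trades cost for safety; a production router might instead try the cheap model first and escalate on low-confidence answers.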
Implement prompt caching for repeated system prompts. Both Anthropic and OpenAI offer prompt caching that charges up to 90% less for cached input tokens. If your system prompt is 2,000 tokens sent with every call, caching it reduces its cost to the equivalent of roughly 200 uncached tokens per call. On a high-volume application where the system prompt dominates the input, this single change can cut your monthly bill by 50% or more.
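How much caching saves depends on how large the system prompt is relative to the rest of the call. The sketch below assumes a flat 90% discount on cached tokens and ignores any cache-write surcharge and cache expiry, which providers handle differently. The workload (2,000 cached prompt tokens, 300 fresh input tokens, 300 output tokens) and the $3.00/$15.00 prices are illustrative.

```python
def input_cost_with_cache(cached_tokens: int, fresh_tokens: int,
                          input_price_per_m: float,
                          cache_discount: float = 0.90) -> float:
    """Dollar input cost when cached tokens are billed at a discount."""
    billable = cached_tokens * (1 - cache_discount) + fresh_tokens
    return billable * input_price_per_m / 1_000_000


# Hypothetical call: 2,000-token system prompt, 300 fresh input, 300 output tokens.
output_cost = 300 * 15.00 / 1_000_000
uncached_total = (2_000 + 300) * 3.00 / 1_000_000 + output_cost
cached_total = input_cost_with_cache(2_000, 300, 3.00) + output_cost
saving = 1 - cached_total / uncached_total

print(f"uncached: ${uncached_total:.4f}")  # uncached: $0.0114
print(f"cached:   ${cached_total:.4f}")    # cached:   $0.0060
print(f"saving:   {saving:.0%}")           # saving:   47%
```

With shorter responses or a longer system prompt the saving rises above 50%; with long outputs it shrinks, because output tokens are not discounted.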
Set max_tokens on every API call. Uncapped output tokens are the most common cause of runaway API costs. If your use case only needs 200-word responses (about 267 tokens), set max_tokens to 280. The cap bounds the output cost of every call; a response that reaches it is cut off, so leave a little headroom.
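A cap can be derived from the target response length with the same 0.75 words-per-token rule of thumb used earlier. This is a sketch under that assumption; the helper names and the `headroom_tokens` parameter are illustrative, and the $15.00 output price is taken from the pricing table as an example.

```python
import math

WORDS_PER_TOKEN = 0.75  # rule of thumb from the text


def max_tokens_for_words(target_words: int, headroom_tokens: int = 0) -> int:
    """Token cap for a response of about `target_words` words."""
    return math.ceil(target_words / WORDS_PER_TOKEN) + headroom_tokens


def worst_case_output_cost(max_tokens: int, output_price_per_m: float) -> float:
    """Upper bound on the output cost of one call with this cap."""
    return max_tokens * output_price_per_m / 1_000_000


print(max_tokens_for_words(200))      # 267
print(max_tokens_for_words(200, 13))  # 280
print(f"${worst_case_output_cost(280, 15.00):.4f}")  # $0.0042
```

The name of the cap parameter differs between providers and API versions, so check the request reference for the endpoint you call.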
Current AI API Pricing — April 2026
| Model | Input ($/1M tokens) | Output ($/1M tokens) | Best for |
|---|---|---|---|
| Llama 3.1 8B (Groq, cheapest) | $0.05 | $0.08 | Ultra-low cost, simple tasks |
| GPT-OSS 20B (Groq, 1000 TPS) | $0.075 | $0.30 | Fastest inference available |
| Gemini 2.5 Flash-Lite (cheapest Google) | $0.10 | $0.40 | Bulk processing, classification |
| Llama 4 Scout (Groq) | $0.11 | $0.34 | 512K context at very low cost |
| DeepSeek V3.2 (best value) | $0.14 | $0.28 | Strong quality at near-zero cost |
| GPT-OSS 120B (Groq) | $0.15 | $0.60 | Best open-source quality on Groq |
| GPT-5.4 nano | $0.20 | $1.25 | Cheapest GPT-5 model |
| GPT-5 mini | $0.25 | $2.00 | Affordable OpenAI mid-tier |
| Gemini 2.5 Flash | $0.30 | $2.50 | Fast multimodal, 1M context |
| Claude Haiku 4.5 (cheapest Claude) | $1.00 | $5.00 | High-volume Claude tasks |
| GPT-5 | $1.25 | $10.00 | OpenAI flagship at competitive cost |
| Gemini 2.5 Pro | $1.25 | $10.00 | Best value Google model |
| GPT-5.2 | $1.75 | $14.00 | Capable OpenAI mid-tier |
| GPT-4.1 | $2.00 | $8.00 | 1M context, proven workhorse |
| Gemini 3.1 Pro | $2.00 | $12.00 | Google flagship, cheaper output than GPT-5.4 |
| GPT-5.4 (OpenAI flagship) | $2.50 | $15.00 | Complex reasoning and vision |
| Claude Sonnet 4.6 | $3.00 | $15.00 | Best-in-class coding and agents |
| Claude Opus 4.6 (Anthropic flagship) | $5.00 | $25.00 | 1M context, extended thinking |
| Claude Opus 4.1 (legacy) | $15.00 | $75.00 | Migrate to Opus 4.6 (3x cheaper) |
| Gemini 2.0 Flash-Lite (deprecated Jun 1) | $0.10 | $0.40 | Migrate to Gemini 2.5 Flash-Lite |
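
To compare models on the same usage, the table rows can be fed into a small ranking script. The sketch below copies five rows from the table and assumes a hypothetical workload of 1,000 input tokens and 300 output tokens per call at 10,000 calls per month; the function names are illustrative.

```python
# (input $/1M, output $/1M) copied from five rows of the table above.
PRICES = {
    "Llama 3.1 8B (Groq)": (0.05, 0.08),
    "DeepSeek V3.2": (0.14, 0.28),
    "Claude Haiku 4.5": (1.00, 5.00),
    "GPT-5.4": (2.50, 15.00),
    "Claude Opus 4.6": (5.00, 25.00),
}


def monthly_cost(prices: tuple[float, float], input_tokens: int,
                 output_tokens: int, calls: int) -> float:
    """Monthly dollar cost for a fixed per-call token profile."""
    in_price, out_price = prices
    return calls * (input_tokens * in_price + output_tokens * out_price) / 1_000_000


def rank_by_monthly_cost(input_tokens: int, output_tokens: int,
                         calls: int) -> list[tuple[str, float]]:
    """Return (model, monthly cost) pairs, cheapest first."""
    costs = {model: monthly_cost(p, input_tokens, output_tokens, calls)
             for model, p in PRICES.items()}
    return sorted(costs.items(), key=lambda item: item[1])


ranked = rank_by_monthly_cost(1_000, 300, 10_000)
for model, cost in ranked:
    print(f"{model:<22} ${cost:>8.2f}")
```

For this workload the spread runs from under $1 per month on the cheapest row to $125 on Claude Opus 4.6, so the choice of tier dominates every other optimization.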