LLM Token Pricing Index
Every major model, side by side. Free forever. Powered by CARTIE AI.
| Provider | Model | $/1M in | $/1M out | Context | Tier | Notes |
|---|---|---|---|---|---|---|
| AWS | Amazon Nova Lite amazon-nova-lite | $0.060 | $0.240 | 300k | Cheap | Bedrock. One of the cheapest multimodal models. |
Gemini 2.0 Flash gemini-2.0-flash | $0.075 | $0.300 | 1M | Cheap | Cheapest 1M-context option on the market. | |
Gemini 3 Flash gemini-3-flash | $0.100 | $0.400 | 1M | Cheap | Insanely cheap for 1M context. | |
| OpenAI | GPT-4o mini gpt-4o-mini | $0.150 | $0.600 | 128k | Cheap | Best price/quality for high-volume. |
| Cohere | Command R command-r | $0.150 | $0.600 | 128k | Cheap | Cheap RAG-tuned model. |
| Mistral | Mistral Small 3 mistral-small-3 | $0.200 | $0.600 | 32k | Cheap | Apache 2.0 license, self-hostable. |
| DeepSeek | DeepSeek V3 deepseek-v3 | $0.270 | $1.10 | 64k | Balanced | Strong code/math. Cache-hit pricing as low as $0.07. |
| OpenAI | GPT-5.2 mini gpt-5.2-mini | $0.300 | $2.40 | 400k | Balanced | Cheapest GPT-5 family. |
| xAI | Grok 3 mini grok-3-mini | $0.300 | $0.500 | 131k | Cheap | Fastest in xAI family. |
| DeepSeek | DeepSeek R1 deepseek-r1 | $0.550 | $2.19 | 64k | Reasoning | Open reasoning model. ~30× cheaper than o1. |
| Meta | Llama 3.3 70B llama-3.3-70b | $0.590 | $0.790 | 128k | Balanced | Open weights. Pricing shown is Together AI; self-host = free. |
| Anthropic | Claude Haiku 4.5 claude-haiku-4-5 | $0.800 | $4.00 | 200k | Cheap | Anthropic's fastest. Code completion sweet spot. |
| AWS | Amazon Nova Pro amazon-nova-pro | $0.800 | $3.20 | 300k | Balanced | Bedrock-native. Best for AWS-shop integration. |
| OpenAI | GPT-5.2 gpt-5.2 | $1.75 | $14.00 | 400k | Frontier | Highest reasoning tier. Cache-friendly. |
Gemini 3 Pro gemini-3-pro | $2.00 | $12.00 | 1M | Frontier | Massive 1M context window. | |
| Mistral | Mistral Large 3 mistral-large-3 | $2.00 | $6.00 | 128k | Balanced | Strong European option. EU data residency. |
| OpenAI | GPT-4o gpt-4o | $2.50 | $10.00 | 128k | Balanced | Workhorse multimodal model. |
| Cohere | Command R+ command-r-plus | $2.50 | $10.00 | 128k | Balanced | Optimized for RAG and tool use. |
| OpenAI | o1-mini o1-mini | $3.00 | $12.00 | 128k | Reasoning | Reasoning at 5× cheaper than o1. |
| Anthropic | Claude Sonnet 4.5 claude-sonnet-4-5 | $3.00 | $15.00 | 200k | Balanced | Excellent for code + writing. Strong cache pricing. |
| Meta | Llama 3.3 405B llama-3.3-405b | $3.50 | $3.50 | 128k | Frontier | Open weights. Pricing via Fireworks/Together. |
| xAI | Grok 3 grok-3 | $5.00 | $15.00 | 131k | Frontier | X-integrated. Real-time web context. |
| OpenAI | o1 (reasoning) o1 | $15.00 | $60.00 | 200k | Reasoning | Chain-of-thought reasoning, expensive per output token. |
| Anthropic | Claude Opus 4.5 claude-opus-4-5 | $15.00 | $75.00 | 200k | Frontier | Anthropic's largest. Best for complex reasoning. |
Cheapest per tier
Best dollar-for-dollar pick in each quality band (blended in/out).
Cost Calculator
Estimate your monthly bill across every major model.
Spread
Cheapest Amazon Nova Lite at $14.40/mo · most expensive Claude Opus 4.5 at $4.2K/mo · spread is $4.2K/mo.
Math = (requests × input_tokens / 1M × $input) + (requests × output_tokens / 1M × $output). Excludes cache discounts, fine-tuning, multimodal, and committed-use rates.
Hidden costs nobody talks about
Provider pricing pages don't advertise these — but they'll wreck your monthly bill.
Cache pricing diff
Anthropic & OpenAI offer 90% discount on cache READS but 25% premium on cache WRITES. Get this wrong and you pay 3× more.
Fine-tuning surcharge
Fine-tuned models cost 4-6× the base model per token. Sometimes a smaller base + good prompts is cheaper.
Image/audio token math
A single image can be 1,000-12,000 tokens depending on detail tier. Audio billed per second, not per token.
Context-window amplifier
Filling a 1M-token context with Gemini = $2 per request even with cheap pricing. Use the smallest window you can.
Free, citable JSON API
Embed this in your tools. No key required. Updated weekly.
{
"powered_by": "CARTIE AI",
"docs": "https://cartieai.com/tokens",
"last_updated": "2026-02-13",
"currency": "USD",
"unit": "per_1m_tokens",
"models": [
{ "provider": "OpenAI", "model": "gpt-5.2", "input": 1.75, "output": 14.00, "context_k": 400, "tier": "frontier", ... },
...
]
}Please cache responses for ≥1 hour. Cite CARTIE AI if you build with this data. Refreshed weekly.
Want this math on your actual usage?
CARTIE plugs into your LLM proxy and shows the same numbers — line by line, route by route, with model-swap recommendations.