How is AI API pricing calculated?

Most LLM providers charge per token, with separate rates for input (prompt) tokens and output (completion) tokens. This calculator multiplies your token counts by each model's per-token rate to show the total cost per request, per day, or per month.

What's the difference between input and output token pricing?

Input tokens are what you send to the model (your prompt, system instructions, and context). Output tokens are what the model generates in response. Output tokens typically cost 2-5x as much as input tokens because generation is more compute-intensive.

Which AI model is the cheapest to use?

For budget use, GPT-4o Mini and Gemini 2.5 Flash offer some of the lowest per-token pricing among hosted APIs. Open-weight models like Llama 4 Maverick and DeepSeek V3 carry no per-token fee if you self-host, though you pay for GPU compute instead.

How often do AI model prices change?

Pricing can change at any time — providers often drop prices after launching newer models. We update our pricing data regularly, but always verify against the provider's official pricing page for the most current rates.

Can I calculate monthly AI API costs?

Yes. Enter your estimated tokens per request and number of requests, then switch to the 'Per Month' multiplier to see projected monthly costs across all models side by side.
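As a rough illustration of that projection, here is a minimal sketch of a side-by-side monthly comparison. The model names and rates in `RATES` are hypothetical placeholders, not real prices, and the calculator's own implementation may differ.

```python
# Hypothetical (input, output) rates in dollars per million tokens.
RATES = {
    "model-small": (0.15, 0.60),
    "model-mid": (3.00, 15.00),
    "model-large": (5.00, 25.00),
}

def monthly_costs(input_tokens, output_tokens, requests_per_month, rates=RATES):
    """Return (model, monthly cost in dollars) pairs, cheapest first."""
    costs = {}
    for name, (in_rate, out_rate) in rates.items():
        per_request = (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000
        costs[name] = per_request * requests_per_month
    return sorted(costs.items(), key=lambda item: item[1])

# 2,000 input tokens and 500 output tokens per request, 30,000 requests a month.
table = monthly_costs(2_000, 500, requests_per_month=30_000)
for name, cost in table:
    print(f"{name}: ${cost:,.2f}/month")
```

Swap in the current rates from each provider's pricing page before relying on the output.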

What are cached input tokens and how do they reduce cost?

Cached input tokens are prompt prefixes that don't change between requests (like system prompts). Both Anthropic and OpenAI offer 50-90% discounts on cached tokens, which can dramatically reduce costs for high-volume applications.

How do I estimate how many tokens my requests will use?

A rough rule: 1 token is about 4 characters or 0.75 words in English. For precise counts, use a token counter tool. Our AI Token Counter supports GPT, Claude, and Llama tokenizers for exact counts.
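The rule of thumb can be sketched in a few lines. The function name `estimate_tokens` is made up for this example, and the result is only a ballpark for English text. A real tokenizer will give different numbers.

```python
def estimate_tokens(text):
    """Return two rough token estimates: one from characters, one from words."""
    by_chars = round(len(text) / 4)               # about 4 characters per token
    by_words = round(len(text.split()) / 0.75)    # about 0.75 words per token
    return by_chars, by_words

by_chars, by_words = estimate_tokens("The quick brown fox jumps over the lazy dog.")
print(by_chars, by_words)
```

The two estimates rarely agree exactly. Treating them as a range is usually more honest than picking one.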

AI Cost Calculator

How to Use This AI Cost Calculator

Enter your token counts. Set the number of input tokens (your prompt) and output tokens (the model’s response) per request. Not sure how many tokens you’ll use? A rough rule: 1 token ≈ 4 characters or 0.75 words in English. Use our AI Token Counter for exact numbers.
Set the request volume. Enter how many requests you expect, then choose the frequency — per request, per day, or per month.
Hit Calculate (or press Ctrl+Enter). The calculator shows every major model’s cost, sorted cheapest first.
Compare and export. Toggle between grid and table view. Use “Copy as CSV” to drop the comparison into a spreadsheet or Notion doc.

The results update live as you change inputs, so you can quickly model different scenarios — “what if I double the context window?” or “what does switching from Opus to Sonnet save?”

How LLM API Pricing Works

If you’re building anything with AI, you’ve probably stared at a pricing page and thought “okay, but what will this actually cost me?” That’s what this calculator is for. Plug in your numbers, see what you’ll pay across every major provider, and stop guessing.

Most LLM providers use a token-based pricing model. Tokens aren’t characters or words — they’re chunks of text that the model processes internally. Different models use different tokenizers, so the same text produces different token counts across providers. That’s why comparing raw per-token prices without normalizing for tokenizer efficiency can be misleading.
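One way to normalize is to compare cost per character instead of cost per token. The sketch below uses two hypothetical models. Both the rates and the tokens-per-character figures are invented for illustration, so measure them on your own text.

```python
def cost_per_million_chars(rate_per_m_tokens, tokens_per_char):
    """Dollars to process one million characters of text."""
    return rate_per_m_tokens * tokens_per_char

# Hypothetical: model A has the lower per-token price but a less efficient tokenizer.
model_a = cost_per_million_chars(rate_per_m_tokens=1.00, tokens_per_char=0.30)
model_b = cost_per_million_chars(rate_per_m_tokens=1.10, tokens_per_char=0.25)
cheaper = "B" if model_b < model_a else "A"
print(f"A: ${model_a:.3f}, B: ${model_b:.3f}, cheaper: {cheaper}")
```

Here the model with the higher per-token rate comes out cheaper once tokenizer efficiency is taken into account.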

The key thing to understand: input tokens and output tokens are priced differently. Input tokens (your prompt) are cheaper because the model processes them together in a single pass. Output tokens (the model’s response) cost more because they are generated one at a time, and each new token requires another pass through the model. With most providers, output tokens run 2-5x the input price.
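In code, the per-request calculation is a two-term sum. This is a minimal sketch. The $3 and $15 per million token rates are illustrative, not any provider's actual prices.

```python
def request_cost(input_tokens, output_tokens, input_rate_per_m, output_rate_per_m):
    """Cost in dollars of one request; rates are dollars per million tokens."""
    return (input_tokens * input_rate_per_m
            + output_tokens * output_rate_per_m) / 1_000_000

# Illustrative rates: $3 per million input tokens, $15 per million output tokens.
cost = request_cost(2_000, 500, input_rate_per_m=3.00, output_rate_per_m=15.00)
print(f"${cost:.4f} per request")
```

Note that the 500 output tokens cost more than the 2,000 input tokens in this example, which is why long responses dominate many bills.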

The Real Cost Breakdown

Here’s what actually hits your bill:

Input tokens — everything you send: system prompts, user messages, conversation history, injected context (like RAG results)
Output tokens — everything the model generates back
Hidden multiplier — if you’re passing conversation history, those tokens get re-sent (and re-charged) every turn

That last point catches a lot of people. A chatbot that keeps 20 turns of history isn’t just paying for the latest message — it’s paying for all 20 turns as input every single time. This is where costs spiral.

Common Mistakes When Estimating AI API Costs

Ignoring conversation history accumulation. Every turn of a multi-turn chat re-sends all previous messages as input tokens. A 20-turn conversation with 500 tokens per turn isn’t 10,000 input tokens in total: it’s 105,000, because turn k re-sends all k turns so far (500 × (1 + 2 + ... + 20) = 500 × 210). That is more than 10x the naive estimate.
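The triangular sum is easy to check in code. The sketch assumes every turn adds the same number of tokens and that the full history is re-sent each time, with no truncation or summarization.

```python
def cumulative_input_tokens(turns, tokens_per_turn):
    """Total input tokens billed across a conversation that re-sends full history."""
    return sum(turn * tokens_per_turn for turn in range(1, turns + 1))

total = cumulative_input_tokens(turns=20, tokens_per_turn=500)
naive = 20 * 500
print(total, naive, total / naive)  # 105000 10000 10.5
```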

Forgetting system prompt costs. A 2,000-token system prompt gets charged on every single request. At 100k requests/month, that’s 200M input tokens just for the system prompt, which costs about $30/month at $0.15 per million input tokens and $600/month at $3 per million.
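The same arithmetic as a small function. The two rates are illustrative values chosen to reproduce the low and high ends of that range, not quotes from a specific provider.

```python
def monthly_system_prompt_cost(prompt_tokens, requests_per_month, input_rate_per_m):
    """Dollars per month spent re-sending the system prompt."""
    total_tokens = prompt_tokens * requests_per_month
    return total_tokens / 1_000_000 * input_rate_per_m

low = monthly_system_prompt_cost(2_000, 100_000, input_rate_per_m=0.15)
high = monthly_system_prompt_cost(2_000, 100_000, input_rate_per_m=3.00)
print(f"${low:.2f} to ${high:.2f} per month")
```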

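Comparing prices without accounting for output quality. A cheaper model that needs 3 retries can cost more than an expensive model that gets it right the first time: each retry is billed in full, so the savings disappear once the number of attempts exceeds the price gap. Factor in your expected success rate.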
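One simple way to factor in success rate is to divide the cost per attempt by the probability of success. This sketch assumes each attempt succeeds independently with the same probability and that you retry until one succeeds. The costs and success rates are hypothetical.

```python
def cost_per_success(cost_per_attempt, success_rate):
    """Expected dollars spent per usable result when retrying until success."""
    if not 0 < success_rate <= 1:
        raise ValueError("success_rate must be in (0, 1]")
    return cost_per_attempt / success_rate

# Hypothetical numbers: the cheap model is 3x cheaper per attempt but fails more often.
cheap = cost_per_success(cost_per_attempt=0.002, success_rate=0.25)
strong = cost_per_success(cost_per_attempt=0.006, success_rate=0.95)
print(f"cheap: ${cheap:.4f}, strong: ${strong:.4f}")
```

With these numbers the cheaper model ends up costing more per usable answer.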

Overlooking embedding and retrieval costs. If your app uses RAG (retrieval-augmented generation), you’re paying for embedding generation on every document chunk plus the LLM call. The embedding cost is usually small per-call, but it compounds fast at scale.

Not using prompt caching. If your system prompt or few-shot examples stay the same across requests, both Anthropic and OpenAI offer 50-90% discounts on cached input tokens. Skipping this is leaving money on the table.
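To see what that discount means per request, here is a minimal sketch. It assumes a flat discount on cached tokens and ignores any surcharge a provider may apply when the cache is first written. The rate and discount are illustrative.

```python
def input_cost_with_cache(cached_tokens, uncached_tokens, input_rate_per_m, cache_discount):
    """Input cost in dollars for one request when part of the prompt is cached."""
    cached_rate = input_rate_per_m * (1 - cache_discount)
    return (cached_tokens * cached_rate
            + uncached_tokens * input_rate_per_m) / 1_000_000

# 2,000-token cached system prompt plus 500 fresh tokens, at an illustrative $3 per million.
with_cache = input_cost_with_cache(2_000, 500, input_rate_per_m=3.00, cache_discount=0.90)
without_cache = input_cost_with_cache(0, 2_500, input_rate_per_m=3.00, cache_discount=0.90)
print(f"${with_cache:.4f} vs ${without_cache:.4f}")
```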

Beyond Per-Token Pricing

Token costs aren’t the whole picture. Depending on your use case, watch for:

Fine-tuning costs — training runs are billed per token at higher rates, plus you pay storage fees for custom models
Embedding costs — if you’re doing RAG, you’re paying separately for embedding generation (usually cheap, but it adds up at scale)
Rate limits — higher tiers cost more monthly but give you better throughput
Minimum spend — some enterprise tiers require committed spend
Cached input discounts — Anthropic and OpenAI both offer discounts when your prompt prefix is cached, cutting input costs by 50-90%

Tips for Keeping AI API Costs Down

Pick the right model for the job. Don’t use GPT-5.4 or Claude Opus for tasks that Haiku or GPT-4o Mini can handle. Classification, extraction, and simple Q&A don’t need frontier models.

Trim your prompts. Every token in your system prompt gets charged on every request. Be concise. Use abbreviations in few-shot examples. Strip unnecessary formatting.

Use caching. If your system prompt or context doesn’t change between requests, prompt caching can cut input costs dramatically. Both Anthropic and OpenAI support this.

Set max_tokens wisely. Don’t set it to 4096 if you expect 200-token responses. While you only pay for tokens actually generated, a lower cap prevents runaway completions.
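The cap effectively sets a ceiling on output spend per request. A quick sketch of that ceiling, using an illustrative $15 per million output tokens:

```python
def worst_case_output_cost(max_tokens, output_rate_per_m):
    """Upper bound in dollars on output cost for one request at a given cap."""
    return max_tokens * output_rate_per_m / 1_000_000

loose = worst_case_output_cost(4_096, output_rate_per_m=15.00)
tight = worst_case_output_cost(300, output_rate_per_m=15.00)
print(f"cap 4096: ${loose:.5f}, cap 300: ${tight:.5f}")
```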

Batch when possible. Some providers offer batch APIs at 50% discounts for non-real-time workloads. Anthropic’s Message Batches API and OpenAI’s Batch API both support this.
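As a rough sketch of the savings, suppose some fraction of your spend can move to a batch API at a 50% discount. The function name and the 60% split below are assumptions for illustration.

```python
def blended_cost(total_cost, batch_fraction, batch_discount=0.5):
    """Monthly cost after moving a fraction of the workload to a discounted batch API."""
    return total_cost * (1 - batch_fraction * batch_discount)

# $1,000/month of traffic, 60% of which can tolerate batch latency.
after = blended_cost(1_000, batch_fraction=0.6)
print(f"${after:.2f}")
```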

Route by complexity. Use a fast classifier (or a cheap model) to categorize incoming requests, then route simple ones to budget models and only send complex queries to frontier models. This “model router” pattern can cut costs 50-70% with minimal quality loss.
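A toy version of the router pattern might look like the sketch below. The keyword heuristic stands in for a real classifier, and the tier names and per-request costs are hypothetical. A production router would need a proper classifier and quality checks.

```python
def route(prompt):
    """Toy stand-in for a classifier: long or multi-step prompts go to the frontier tier."""
    hard_markers = ("step by step", "prove", "refactor")
    if len(prompt) > 2_000 or any(marker in prompt.lower() for marker in hard_markers):
        return "frontier"
    return "budget"

def routed_cost(prompts, cost_per_request):
    """Total cost in dollars when each prompt is sent to its routed tier."""
    return sum(cost_per_request[route(prompt)] for prompt in prompts)

prompts = [
    "Classify this ticket: refund request",
    "Refactor this module and explain step by step",
    "Extract the date from: due 3 March",
]
costs = {"budget": 0.001, "frontier": 0.02}  # hypothetical dollars per request
routed = routed_cost(prompts, costs)
all_frontier = len(prompts) * costs["frontier"]
print(f"routed: ${routed:.3f}, all frontier: ${all_frontier:.3f}")
```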

2026 AI Model Pricing Landscape

The pricing landscape has shifted significantly. Here’s how the tiers break down today:

Premium frontier (Claude Opus 4.6, GPT-5.4, Gemini 3 Ultra) — highest capability, highest cost. Use for complex reasoning, multi-step coding, and deep analysis. Expect $10-30 per million output tokens.
Mid-tier workhorses (Claude Sonnet 4.6, GPT-4o, Gemini 2.5 Pro, Grok 3) — the sweet spot for most production apps. Strong quality at $3-15 per million output tokens.
Budget (Claude Haiku 4.5, GPT-4o Mini, Gemini 2.5 Flash, DeepSeek V3) — surprisingly capable for routine tasks at $0.25-4 per million output tokens. These handle 80% of typical workloads.
Open-weight (Llama 4 Maverick, Qwen 3) — zero API cost if self-hosted on your own GPUs. You’re trading API fees for compute infrastructure costs.

The right choice depends on your latency requirements, quality bar, and volume. Most production systems use 2-3 models: a cheap one for simple tasks, a mid-tier for most work, and a frontier model for the hard stuff.

Use the calculator above to model your specific usage pattern. Enter your token counts per request, how many requests you expect, and see exactly what each provider will charge. For individual model deep-dives, see our dedicated calculators for GPT-5, Claude Opus, and DeepSeek.
