How LLM API Pricing Works
If you’re building anything with AI, you’ve probably stared at a pricing page and thought “okay, but what will this actually cost me?” That’s what this calculator is for. Plug in your numbers, see what you’ll pay across every major provider, and stop guessing.
Most LLM providers use a token-based pricing model. Tokens aren’t characters or words — they’re chunks of text that the model processes internally. A rough rule of thumb: 1 token is about 3-4 characters in English, or roughly 0.75 words. But that varies by model and tokenizer.
The key thing to understand: input tokens and output tokens are priced differently. Input tokens (your prompt) are cheaper because the model just reads them. Output tokens (the model’s response) cost more because generation is computationally heavier. With most providers, output tokens run 2-5x the input price.
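The arithmetic is simple: per-request cost is input tokens times the input rate plus output tokens times the output rate, with rates quoted per million tokens. A minimal sketch (the prices here are illustrative, not any provider's real rates):

```python
# Per-request cost: tokens are billed per million, with separate
# input and output rates. Prices below are illustrative only.
def request_cost(input_tokens, output_tokens,
                 input_price_per_m, output_price_per_m):
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1_000_000

# e.g. a 2,000-token prompt with a 500-token reply,
# at $3/M input and $15/M output (a 5x output multiplier)
cost = request_cost(2_000, 500, 3.00, 15.00)
print(f"${cost:.4f}")  # $0.0135
```

Notice that even though the prompt is four times longer than the reply, the reply accounts for more than half the bill — that's the output multiplier at work.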
The Real Cost Breakdown
Here’s what actually hits your bill:
- Input tokens — everything you send: system prompts, user messages, conversation history, injected context (like RAG results)
- Output tokens — everything the model generates back
- Hidden multiplier — if you’re passing conversation history, those tokens get re-sent (and re-charged) every turn
That last point catches a lot of people. A chatbot that keeps 20 turns of history isn’t just paying for the latest message — it’s paying for all 20 turns as input every single time. This is where costs spiral.
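To see how history compounds, here's a rough sketch. It assumes each turn adds about 150 tokens of user-plus-assistant text on top of a 300-token system prompt — both numbers are made up for illustration:

```python
# Cumulative input tokens for a chatbot that re-sends its full
# history every turn. Token counts are illustrative assumptions.
def total_input_tokens(turns, tokens_per_turn=150, system_prompt=300):
    total = 0
    history = 0
    for _ in range(turns):
        total += system_prompt + history  # whole history re-sent as input
        history += tokens_per_turn        # this turn appended for next time
    return total

print(total_input_tokens(1))   # 300
print(total_input_tokens(20))  # 34500 — input grows quadratically with turns
```

One turn costs 300 input tokens; twenty turns cost 34,500 — over five times what you'd pay if each turn were billed in isolation. This is why long-running chats get expensive even when individual messages are short.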
Beyond Per-Token Pricing
Token costs aren’t the whole picture. Depending on your use case, watch for:
- Fine-tuning costs — training runs are billed per token at higher rates, plus you pay storage fees for custom models
- Embedding costs — if you’re doing RAG, you’re paying separately for embedding generation (usually cheap, but it adds up at scale)
- Rate limits — higher-throughput tiers are typically gated behind higher monthly spend, so scaling up can push you into a pricier usage tier
- Minimum spend — some enterprise tiers require committed spend
- Cached input discounts — Anthropic and OpenAI both discount input tokens covered by a cached prompt prefix, cutting those costs by 50-90% (though writing to the cache may carry a small premium)
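To put rough numbers on the cache discount: only the cached prefix is discounted, and fresh tokens still bill at full rate. The 90% discount and the $3/M price below are assumptions for illustration, not a specific provider's rates:

```python
# Input cost when part of the prompt hits a cached prefix.
# Price and discount are illustrative assumptions.
def input_cost(cached_tokens, fresh_tokens,
               price_per_m=3.00, cache_discount=0.90):
    cached = cached_tokens * price_per_m * (1 - cache_discount)
    fresh = fresh_tokens * price_per_m
    return (cached + fresh) / 1_000_000

# 10k-token system prompt cached, 500 fresh tokens per request
print(f"cached:   ${input_cost(10_000, 500):.4f}")   # $0.0045
print(f"uncached: ${input_cost(0, 10_500):.4f}")     # $0.0315
```

With a large, stable system prompt, the cached request here costs about a seventh of the uncached one — which is why caching is usually the first optimization worth reaching for.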
Tips for Keeping Costs Down
After working with these APIs for a while, here’s what actually moves the needle:
Pick the right model for the job. Don’t use GPT-5.4 or Claude Opus for tasks that Haiku or GPT-4o Mini can handle. Classification, extraction, and simple Q&A don’t need frontier models.
Trim your prompts. Every token in your system prompt gets charged on every request. Be concise. Use abbreviations in few-shot examples. Strip unnecessary formatting.
Use caching. If your system prompt or context doesn’t change between requests, prompt caching can cut input costs dramatically. Both Anthropic and OpenAI support this.
Set max_tokens wisely. Don’t set it to 4096 if you expect 200-token responses. While you only pay for tokens actually generated, a lower cap prevents runaway completions.
Batch when possible. Some providers offer batch APIs at 50% discounts for non-real-time workloads.
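To put numbers on the batch discount (50% is the commonly offered rate; the request volume and per-request cost below are illustrative assumptions):

```python
# Monthly cost of a non-real-time workload, with and without a
# 50% batch-API discount. All numbers are illustrative.
requests_per_month = 1_000_000
cost_per_request = 0.0135   # e.g. ~2k input + 500 output tokens at mid-tier rates
realtime = requests_per_month * cost_per_request
batched = realtime * 0.5    # 50% batch discount
print(f"real-time: ${realtime:,.0f}  batched: ${batched:,.0f}")
# real-time: $13,500  batched: $6,750
```

For overnight jobs like bulk classification or backfills, that's money saved for simply tolerating a delayed response.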
Comparing Pricing Models
The landscape breaks down roughly like this:
- Premium frontier (Claude Opus 4.6, GPT-5.4, Gemini 3) — highest capability, highest cost. Use for complex reasoning, coding, analysis.
- Mid-tier (Claude Sonnet, GPT-4o, Gemini 2.5 Pro, Grok 3) — great balance of quality and cost for production apps.
- Budget (Claude Haiku, GPT-4o Mini, Gemini 2.5 Flash, DeepSeek V3) — surprisingly capable for routine tasks at a fraction of the price.
- Open-weight (Llama 4 Maverick) — zero API cost if self-hosted, but you’re paying for GPU compute instead.
The right choice depends on your latency requirements, quality bar, and volume. Most production systems end up using 2-3 models: a cheap one for simple tasks, a mid-tier for most work, and a frontier model for the hard stuff.
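A minimal sketch of that tiered routing — the task labels, tier names, and model pairings here are placeholders, not a real API:

```python
# Route each task type to the cheapest tier that can handle it.
# Tier names are placeholders; map them to real models as needed.
ROUTES = {
    "classify": "budget",       # e.g. Haiku / GPT-4o Mini
    "extract": "budget",
    "summarize": "mid",         # e.g. Sonnet / GPT-4o
    "chat": "mid",
    "code_review": "frontier",  # e.g. Opus / frontier-class models
}

def pick_model(task, default="mid"):
    """Fall back to the mid-tier for unknown task types."""
    return ROUTES.get(task, default)

print(pick_model("classify"))  # budget
print(pick_model("planning"))  # mid (fallback)
```

The routing table is the cheap part; the real work is deciding which tasks genuinely need the frontier tier, which is an empirical question worth testing with your own evals.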
Use the calculator above to model your specific usage pattern. Enter your token counts per request, how many requests you expect, and see exactly what each provider will charge.