LLM API cost calculator
Claude, GPT-4o, Gemini, DeepSeek — cost per call, daily/monthly/annual with prompt caching.
Monthly cost
$2,925
$98/day · $35,588/year
Cost per call
$0
Input: $0 · Output: $0
Show the work
- Daily input tokens: 12,500,000
- Daily output tokens: 4,000,000
- Cost per call: $0
- Daily cost: $98
- Monthly cost (30 days): $2,925
- Annual cost: $35,588
LLM API cost calculator — model routing and unit economics
Large Language Model API costs drop 10x every 18 months, but they're still the biggest infrastructure line item for most AI-powered SaaS. And pricing is deceptively complex: input tokens, output tokens, cache hits, tool use, and vision all price separately. This calculator helps you estimate monthly spend across providers and model tiers so you can price your product profitably.
How token pricing works
All major LLM providers charge per token, split between input (prompt) and output (generation):
- Input tokens: Everything you send — system prompt, conversation history, RAG context, tool definitions. Cheaper per token than output.
- Output tokens: Everything the model generates. 3-5x more expensive than input on most providers.
- Cached input: Anthropic and OpenAI cache large system prompts. Cache hits cost ~10% of normal input — huge for apps with repeated system prompts.
1 token ≈ 4 characters ≈ 0.75 words. A 500-word input prompt with a 200-word response is ~660 input + ~265 output tokens.
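That rule of thumb is easy to encode as a quick estimator. This is only a sketch of the approximation above — for accurate counts use a real tokenizer (e.g. tiktoken, or a provider's token-counting endpoint):

```python
def estimate_tokens(text: str) -> int:
    """Rough estimate: ~4 characters per token for English prose."""
    return max(1, round(len(text) / 4))

def tokens_from_words(word_count: int) -> int:
    """Rough estimate: ~0.75 words per token."""
    return round(word_count / 0.75)

tokens_from_words(500)  # ~667 input tokens for a 500-word prompt
tokens_from_words(200)  # ~267 output tokens for a 200-word response
```

Code, non-English text, and heavily formatted content tokenize less favorably, so treat these as lower bounds.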
Current model landscape (2026)
Prices move quickly — treat these as ballpark:
- Top-tier reasoning: Claude Opus 4, o1, Gemini Ultra. $3-30 per 1M input, $15-120 per 1M output. Best quality, highest cost.
- Mid-tier workhorses: Claude Sonnet 4.6, GPT-4o, Gemini Pro. $1-5 per 1M input, $5-25 per 1M output. Best cost/quality ratio for most production use.
- Budget tier: Claude Haiku 4.5, GPT-4o mini, Gemini Flash. $0.15-0.50 per 1M input, $0.50-2.50 per 1M output. Great for high-volume simple tasks.
- Open source (via Together, Fireworks, Groq): Llama 3.3, DeepSeek V3, Qwen 2.5. $0.10-0.80 per 1M tokens. Competitive quality for many tasks at 5-20x cost advantage.
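A minimal sketch of tiered cost estimation, using placeholder per-1M prices picked from the ranges above (assumptions, not quotes — check each provider's pricing page before relying on them):

```python
# (input $/1M, output $/1M) — illustrative mid-range picks per tier
PRICES = {
    "top":    (15.0, 60.0),
    "mid":    (3.0, 15.0),
    "budget": (0.25, 1.25),
    "oss":    (0.40, 0.40),
}

def monthly_cost(tier: str, input_tok_per_day: float, output_tok_per_day: float) -> float:
    """30-day cost at a steady daily token volume for a given tier."""
    p_in, p_out = PRICES[tier]
    daily = input_tok_per_day / 1e6 * p_in + output_tok_per_day / 1e6 * p_out
    return daily * 30

monthly_cost("mid", 12_500_000, 4_000_000)  # 2925.0 — the headline example above
```

Run the same volume through "budget" or "oss" to see how much of your traffic you could move down-tier.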
The cost killers in production
- Agentic loops: Multi-step agents that plan, execute, observe, and re-plan can easily consume 20-100x the tokens of a single turn. Budget carefully — a "simple" agent task that looks like one request might be 50 calls under the hood.
- Long-context usage: 200k-token context calls look the same as 2k-token calls in code but cost 100x. Retrieval is usually cheaper than stuffing full docs.
- Chain-of-thought / thinking: Reasoning models (o1, Claude with thinking) generate invisible reasoning tokens that count toward output. 5-20x more output cost typical.
- Tool/function calling loops: Each tool round-trip is another API call. A complex workflow hitting 10 tools = 10 API calls billed.
- Repeated system prompts without caching: If your system prompt is 3k tokens and you don't cache, every call pays for it. Cache saves 80-90% on that portion.
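To make the long-context point above concrete, here's the arithmetic at an assumed mid-tier $3 per 1M input tokens:

```python
def input_cost(tokens: int, price_per_million: float = 3.0) -> float:
    """Input-side cost of one call at an assumed $3/1M mid-tier rate."""
    return tokens / 1e6 * price_per_million

full_doc  = input_cost(200_000)  # $0.60 per call — full document stuffed in
rag_slice = input_cost(2_000)    # $0.006 per call — retrieved slice, 100x cheaper
```

The two calls are indistinguishable in application code; only the token counts differ, which is exactly why long-context costs creep in unnoticed.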
Unit economics: how to price your feature
Example: SaaS product with 1,000 users, each making 50 AI calls/month, avg 3k input + 800 output tokens, using Sonnet 4.6:
- Per call: (3000 × $0.003/1k) + (800 × $0.015/1k) = $0.009 + $0.012 = $0.021
- Per user per month: 50 × $0.021 = $1.05
- Total monthly API cost: 1,000 × $1.05 = $1,050
- If pricing at $25/user/mo: 4.2% of revenue
- If pricing at $10/user/mo: 10.5% of revenue
At 10%+ of revenue, margins compress fast. Target API cost < 5% of revenue for sustainable margin. Path there: routing, caching, rate limits.
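The worked example above, as a reusable sketch (default prices match the Sonnet-style $0.003/1k input, $0.015/1k output figures used there):

```python
def per_call_cost(in_tok: int, out_tok: int,
                  in_price_per_1k: float = 0.003,
                  out_price_per_1k: float = 0.015) -> float:
    """Cost of one call: input and output priced separately per 1k tokens."""
    return in_tok / 1000 * in_price_per_1k + out_tok / 1000 * out_price_per_1k

call     = per_call_cost(3000, 800)  # $0.021 per call
per_user = 50 * call                 # $1.05 per user per month
total    = 1000 * per_user           # $1,050 per month across 1,000 users
share    = per_user / 25 * 100       # 4.2% of a $25/user/mo price
```

Swapping in budget-tier prices for the 80% of traffic that doesn't need a mid-tier model is usually the fastest way to pull that revenue share under 5%.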
Model routing strategy
Don't default to the most expensive model. Classify requests and route:
- Simple classification / extraction: Haiku, Flash, GPT-4o mini. 80% of production traffic fits here.
- Summarization / rewriting: Haiku or Sonnet. Rarely needs Opus.
- Multi-step reasoning: Sonnet or GPT-4o.
- Complex planning / agents: Opus or o1. Reserve for the hard cases.
- Fallback / retry: If cheap model output fails validation, retry with mid-tier. If mid-tier fails, escalate to top-tier. Caps cost while preserving quality.
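The fallback pattern can be sketched in a few lines. The model names here are placeholders, and `call_model` / `is_valid` stand in for whatever your stack provides (an SDK wrapper and an output validator):

```python
from typing import Callable

TIERS = ["cheap-model", "mid-model", "top-model"]  # hypothetical names

def route_with_fallback(prompt: str,
                        call_model: Callable[[str, str], str],
                        is_valid: Callable[[str], bool]) -> tuple[str, str]:
    """Try the cheapest tier first; escalate one tier each time
    validation fails. Returns (model_used, answer)."""
    answer = ""
    for model in TIERS:
        answer = call_model(model, prompt)
        if is_valid(answer):
            return model, answer
    return TIERS[-1], answer  # top-tier output, even if still invalid
```

The validator is the crux: schema checks, regex on expected structure, or a cheap classifier all work. Retries add latency, so this fits async or batch paths better than hot interactive ones.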
Prompt caching: the biggest lever
Anthropic's prompt caching (and OpenAI's equivalent) can cut input costs 80-90% for repetitive workloads. How it works:
- Mark static parts of your prompt (system message, long context) as cacheable
- First call: full price + small cache-write surcharge
- Subsequent calls within 5-minute window: cached portion at ~10% of normal input price
A support bot with a 5k-token system prompt handling 10k queries/day (~50M input tokens/day at an assumed $0.75 per 1M): without caching, ~$37.50/day on input; with 90% cache hits, ~$7/day. That's roughly $11k/year saved on one feature.
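The cache math generalizes to a one-liner. This sketch ignores the cache-write surcharge, so exact figures will differ slightly from provider bills:

```python
def cached_input_cost(daily_tokens: float, price_per_million: float,
                      hit_rate: float = 0.9,
                      cached_frac: float = 0.10) -> float:
    """Daily input cost with prompt caching: misses pay full price,
    hits pay ~10% of the normal input price."""
    full = daily_tokens / 1e6 * price_per_million
    return full * ((1 - hit_rate) + hit_rate * cached_frac)

cached_input_cost(50_000_000, 0.75, hit_rate=0.0)  # $37.50/day uncached
cached_input_cost(50_000_000, 0.75)                # ~$7.13/day at 90% hits
```

Hit rate is the variable you control: batching related traffic and keeping the static prompt prefix byte-identical across calls is what keeps it near 90%.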
Rate limiting and abuse
Without per-user rate limits, one abusive user can blow your monthly budget. Must-haves:
- Per-user daily quota: e.g., 100 calls/day on free tier, 1000/day on paid. Soft throttle at 80%, hard block at 100%.
- Max tokens per call: Cap output tokens to prevent runaway generations.
- Anomaly detection: Alert when any single user consumes > 5% of daily API cost. Usually bot abuse.
- Cost-per-user dashboard: Track actual API cost per user alongside revenue per user. Flag unprofitable users.
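A minimal sketch of the per-user daily quota with a soft throttle at 80%. In-memory only — production would back this with Redis or similar shared storage:

```python
import datetime
from collections import defaultdict

class DailyQuota:
    """Per-user daily call quota: 'ok' below 80%, 'throttle' above it,
    'block' once the hard limit is reached."""

    def __init__(self, limit: int, soft_frac: float = 0.8):
        self.limit = limit
        self.soft = int(limit * soft_frac)
        self.counts: dict[tuple[str, datetime.date], int] = defaultdict(int)

    def check(self, user_id: str) -> str:
        key = (user_id, datetime.date.today())
        if self.counts[key] >= self.limit:
            return "block"
        self.counts[key] += 1
        return "throttle" if self.counts[key] > self.soft else "ok"
```

Pair this with a hard `max_tokens` on every request and the two worst overage modes (bot abuse, runaway generations) are both capped.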
Related calculators
Keep the math moving
Dev & engineering
Cloud hosting cost estimator
AWS, GCP, Azure, DO, Fly — monthly cost per MAU by compute, bandwidth, DB, storage.
Dev & engineering
Freelance dev hourly rate
What to charge per hour based on target salary + benefits + overhead + utilization + profit margin.
Dev & engineering
Server capacity planning
Servers needed for peak RPS with CPU/RAM math, utilization targets, and N+1 / 2N redundancy.
Dev & engineering
Database cost calculator
RDS, Aurora Serverless, PlanetScale, Supabase, Neon, Atlas — monthly DB cost with storage + reads + writes.
Dev & engineering
Load balancer breakeven
Self-hosted HAProxy vs managed AWS ALB / GCP LB / Cloudflare — where the crossover point actually is.
Dev & engineering
Tech debt ROI calculator
Turn a debt-fix project into a finance pitch: drag cost today, fix cost, payback months, ROI over 3 years.