

LLM API cost calculator

Claude, GPT-4o, Gemini, DeepSeek — cost per call, daily/monthly/annual with prompt caching.

Example: 12,500,000 input tokens and 4,000,000 output tokens per day

Show the work

  • Daily input tokens: 12,500,000
  • Daily output tokens: 4,000,000
  • Daily cost: $98
  • Monthly cost (30 days): $2,925
  • Annual cost: $35,588

LLM API cost calculator — model routing and unit economics

Large Language Model API costs drop 10x every 18 months, but they're still the biggest infrastructure line item for most AI-powered SaaS. And pricing is deceptively complex: input tokens, output tokens, cache hits, tool use, and vision all price separately. This calculator helps you estimate monthly spend across providers and model tiers so you can price your product profitably.

How token pricing works

All major LLM providers charge per token, split between input (prompt) and output (generation):

  • Input tokens: Everything you send — system prompt, conversation history, RAG context, tool definitions. Cheaper per token than output.
  • Output tokens: Everything the model generates. 3-5x more expensive than input on most providers.
  • Cached input: Anthropic and OpenAI cache large system prompts. Cache hits cost ~10% of normal input — huge for apps with repeated system prompts.

1 token ≈ 4 characters ≈ 0.75 words. A 500-word input prompt with a 200-word response is ~660 input + ~265 output tokens.
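The rule of thumb above translates directly into a quick estimator. This is a rough heuristic, not a real tokenizer (use your provider's tokenizer for exact counts):

```python
def estimate_tokens(word_count: int) -> int:
    """Rough token estimate from word count (1 token ~= 0.75 words)."""
    return round(word_count / 0.75)

# 500-word prompt, 200-word response
print(estimate_tokens(500))  # 667 input tokens (the ~660 ballpark above)
print(estimate_tokens(200))  # 267 output tokens
```

Real tokenizers vary by model and by text (code and non-English text tokenize less efficiently), so treat this as a planning estimate only.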

Current model landscape (2026)

Prices move quickly — treat these as ballpark:

  • Top-tier reasoning: Claude Opus 4, o1, Gemini Ultra. $3-30 per 1M input, $15-120 per 1M output. Best quality, highest cost.
  • Mid-tier workhorses: Claude Sonnet 4.6, GPT-4o, Gemini Pro. $1-5 per 1M input, $5-25 per 1M output. Best cost/quality ratio for most production use.
  • Budget tier: Claude Haiku 4.5, GPT-4o mini, Gemini Flash. $0.15-0.50 per 1M input, $0.50-2.50 per 1M output. Great for high-volume simple tasks.
  • Open source (via Together, Fireworks, Groq): Llama 3.3, DeepSeek V3, Qwen 2.5. $0.10-0.80 per 1M tokens. Competitive quality for many tasks at 5-20x cost advantage.
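For quick comparisons, the tiers above can be encoded as a small lookup table. The rates here are illustrative midpoints from the ranges listed, not authoritative rate cards — always check the provider's current pricing page:

```python
# Ballpark $ per 1M tokens (input, output) — illustrative only
PRICING = {
    "top-tier":    (15.00, 75.00),
    "mid-tier":    (3.00, 15.00),
    "budget":      (0.25, 1.25),
    "open-source": (0.40, 0.40),
}

def cost_per_call(tier: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a single call at the tier's per-1M-token rates."""
    in_rate, out_rate = PRICING[tier]
    return input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate

# A 3k-input / 800-output call on a mid-tier model
print(round(cost_per_call("mid-tier", 3000, 800), 4))  # 0.021
```

Swapping the same call from top-tier to budget tier changes the per-call cost by roughly 12x, which is why routing (covered below) matters so much.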

The cost killers in production

  1. Agentic loops: Multi-step agents that plan, execute, observe, and re-plan can easily consume 20-100x the tokens of a single turn. Budget carefully — a "simple" agent task that looks like one request might be 50 API calls under the hood.
  2. Long-context usage: 200k-token context calls look the same as 2k-token calls in code but cost 100x. Retrieval is usually cheaper than stuffing full docs.
  3. Chain-of-thought / thinking: Reasoning models (o1, Claude with thinking) generate invisible reasoning tokens that count toward output. 5-20x more output cost typical.
  4. Tool/function calling loops: Each tool round-trip is another API call. A complex workflow hitting 10 tools = 10 API calls billed.
  5. Repeated system prompts without caching: If your system prompt is 3k tokens and you don't cache, every call pays for it. Cache saves 80-90% on that portion.
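Several of these multipliers compound: an agent loop repeats calls, and each call repeats the system prompt. A back-of-envelope model (step counts and rates here are illustrative assumptions, not measurements):

```python
def workflow_cost(steps: int, input_tokens: int, output_tokens: int,
                  in_rate: float = 3.0, out_rate: float = 15.0,
                  cached_fraction: float = 0.0) -> float:
    """Total $ cost of a multi-step loop at per-1M-token rates.

    cached_fraction: share of input tokens billed at ~10% via prompt caching.
    """
    billed_in = (input_tokens * (1 - cached_fraction)
                 + input_tokens * cached_fraction * 0.1)
    per_step = billed_in / 1e6 * in_rate + output_tokens / 1e6 * out_rate
    return steps * per_step

single = workflow_cost(1, 3000, 800)
agent = workflow_cost(50, 3000, 800)   # "one request" that is really 50 calls
print(round(agent / single))           # 50x the single-turn cost
```

Adding `cached_fraction=0.9` to the agent call claws back most of the input-side cost, which is why caching and loop budgets are usually the first two optimizations.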

Unit economics: how to price your feature

Example: SaaS product with 1,000 users, each making 50 AI calls/month, avg 3k input + 800 output tokens, using Sonnet 4.6:

  • Per call: (3000 × $0.003/1k) + (800 × $0.015/1k) = $0.009 + $0.012 = $0.021
  • Per user per month: 50 × $0.021 = $1.05
  • Total monthly API cost: 1,000 × $1.05 = $1,050
  • If pricing at $25/user/mo: 4.2% of revenue
  • If pricing at $10/user/mo: 10.5% of revenue
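The arithmetic above, as a sketch (rates are the Sonnet-tier figures used in the example):

```python
users, calls_per_user = 1000, 50
in_tok, out_tok = 3000, 800
in_rate, out_rate = 3.0, 15.0            # $ per 1M tokens

per_call = in_tok / 1e6 * in_rate + out_tok / 1e6 * out_rate
per_user = calls_per_user * per_call
total = users * per_user

print(f"${per_call:.3f}/call, ${per_user:.2f}/user/mo, ${total:,.0f}/mo total")
# -> $0.021/call, $1.05/user/mo, $1,050/mo total
for price in (25, 10):
    print(f"at ${price}/user/mo: {per_user / price:.1%} of revenue")
# -> 4.2% and 10.5%
```

Plugging your own call volume and token averages into the same three lines gives the revenue-percentage figure that the next paragraph argues should stay under 5%.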

At 10%+ of revenue, margins compress fast. Target API cost < 5% of revenue for sustainable margin. Path there: routing, caching, rate limits.

Model routing strategy

Don't default to the most expensive model. Classify requests and route:

  • Simple classification / extraction: Haiku, Flash, GPT-4o mini. 80% of production traffic fits here.
  • Summarization / rewriting: Haiku or Sonnet. Rarely needs Opus.
  • Multi-step reasoning: Sonnet or GPT-4o.
  • Complex planning / agents: Opus or o1. Reserve for the hard cases.
  • Fallback / retry: If cheap model output fails validation, retry with mid-tier. If mid-tier fails, escalate to top-tier. Caps cost while preserving quality.
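The fallback pattern in the last bullet might be sketched like this. The `call_model` stub and tier names are placeholders — in production it would wrap your provider SDK, and `validate` would be a schema check, regex, or judge-model verdict:

```python
# Tiered fallback: try the cheapest model, escalate on validation failure.
TIERS = ["budget", "mid-tier", "top-tier"]

def call_model(tier: str, prompt: str) -> str:
    """Stub standing in for a real provider SDK call (simulated here)."""
    if tier == "budget" and "hard" in prompt:
        return ""                        # simulate a cheap-model failure
    return f"{tier} answer"

def validate(output: str) -> bool:
    """Placeholder validation: non-empty output passes."""
    return bool(output.strip())

def routed_call(prompt: str) -> str:
    output = ""
    for tier in TIERS:
        output = call_model(tier, prompt)
        if validate(output):
            return output                # cheapest model that passes wins
    return output                        # all tiers failed: return last attempt

print(routed_call("easy question"))      # budget answer
print(routed_call("a hard question"))    # mid-tier answer
```

Because most traffic passes validation on the first try, the expected cost per request stays close to the budget-tier rate while worst-case quality matches the top tier.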

Prompt caching: the biggest lever

Anthropic's prompt caching (and OpenAI's equivalent) can cut input costs 80-90% for repetitive workloads. How it works:

  • Mark static parts of your prompt (system message, long context) as cacheable
  • First call: full price + small cache-write surcharge
  • Subsequent calls within 5-minute window: cached portion at ~10% of normal input price

A support bot with a 5k-token system prompt handling 10k queries/day: without caching, ~$37.50/day on input; with 90% cache hits, ~$7.50/day. That's roughly $11k/year saved on one feature.
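The support-bot math, spelled out. The $0.75 per 1M input rate is an assumption chosen to match the daily figures; cache hits are billed at ~10% of the normal input price:

```python
def daily_input_cost(sys_tokens: int, queries: int, rate_per_m: float,
                     cache_hit_rate: float = 0.0,
                     cached_price_frac: float = 0.1) -> float:
    """Daily $ spend on the system-prompt portion of input, with caching."""
    tokens = sys_tokens * queries
    billed = (tokens * (1 - cache_hit_rate)
              + tokens * cache_hit_rate * cached_price_frac)
    return billed / 1e6 * rate_per_m

no_cache = daily_input_cost(5000, 10_000, 0.75)
cached = daily_input_cost(5000, 10_000, 0.75, cache_hit_rate=0.9)
print(no_cache, cached)                      # 37.5 vs ~7.1 per day
print(round((no_cache - cached) * 365))      # ~11k: annual savings
```

Note this ignores the small cache-write surcharge on the first call in each window, which slightly reduces the savings at low query volumes.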

Rate limiting and abuse

Without per-user rate limits, one abusive user can blow your monthly budget. Must-haves:

  • Per-user daily quota: e.g., 100 calls/day on free tier, 1000/day on paid. Soft throttle at 80%, hard block at 100%.
  • Max tokens per call: Cap output tokens to prevent runaway generations.
  • Anomaly detection: Alert when any single user consumes > 5% of daily API cost. Usually bot abuse.
  • Cost-per-user dashboard: Track actual API cost per user alongside revenue per user. Flag unprofitable users.
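The per-user daily quota with soft/hard thresholds might look like this. It's an in-memory sketch; a production version would back the counters with Redis or your database so limits survive restarts and apply across instances:

```python
from collections import defaultdict
from datetime import date

class DailyQuota:
    """Per-user daily call quota: throttle at the soft threshold, block at the cap."""

    def __init__(self, daily_limit: int, soft_fraction: float = 0.8):
        self.daily_limit = daily_limit
        self.soft = int(daily_limit * soft_fraction)
        self.counts = defaultdict(int)   # (user_id, date) -> calls today

    def check(self, user_id: str) -> str:
        key = (user_id, date.today())    # counters reset naturally each day
        self.counts[key] += 1
        used = self.counts[key]
        if used > self.daily_limit:
            return "block"               # hard block at 100%
        if used > self.soft:
            return "throttle"            # soft throttle past 80%
        return "ok"

quota = DailyQuota(daily_limit=100)
print(quota.check("user-1"))             # ok
```

The "throttle" state is where you add jittered delays or a degraded (cheaper-model) response before cutting the user off entirely.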
