Question 1

How do I reduce LLM API costs?

Accepted Answer

Top five: (1) Prompt caching — Anthropic caches large system prompts at ~10% of normal input cost, can cut bills 50-80% for repetitive workloads. (2) Model routing — use Haiku/Flash/mini for simple tasks, reserve Opus/GPT-4 for complex reasoning. Often 80% of calls fit the cheaper model. (3) Shorter outputs — limit max_tokens aggressively; unused tokens still cost. (4) Batching — some providers offer 50% discount for async batch processing. (5) Fine-tuning — for repeated patterns, a fine-tuned smaller model can match big-model quality at 5-10x lower cost.

Question 2

How many tokens is my average prompt?

Accepted Answer

Rough conversion: 1 token ≈ 4 English characters ≈ 0.75 words. So 1,000 tokens ≈ 750 words ≈ 3-4 paragraphs. Typical values: chatbot turn (500-2,000 tokens), RAG retrieval with 5 chunks (3,000-8,000 tokens), full document analysis (10,000-100,000 tokens), agentic multi-step workflow (50,000-500,000 tokens total across calls). Use the actual tokenizer (tiktoken for OpenAI, anthropic SDK count_tokens) before committing to a use case at scale.

Question 3

Is Haiku/mini good enough?

Accepted Answer

For 60-80% of production use cases, yes. Smaller models (Haiku 4.5, GPT-4o mini, Gemini Flash) handle classification, extraction, summarization, simple rewrites, and basic Q&A at 90-95% quality of top-tier models for 5-30x less cost. The exceptions where top-tier wins: multi-step reasoning, novel problem-solving, adversarial robustness, long-context reasoning. Build evals for your specific task before deciding — the gap varies wildly by use case.

Question 4

What margin should I charge on LLM API passthrough?

Accepted Answer

Direct passthrough-plus-markup is dying as API costs plummet and users compare prices. Better: price on value delivered, not tokens consumed. If your feature saves 2 hours of work at $100/hr, charge $20-50 per use even if API cost is $0.50. Unit economics rule: target 80%+ gross margin on API spend for B2B SaaS, 70%+ for B2C. Cap usage per tier to prevent runaway costs from power users. Monitor per-user API cost daily — a 1% of users consuming 80% of tokens is normal and requires rate limits.

LLM API cost calculator

LLM API cost calculator — model routing and unit economics

How token pricing works

Current model landscape (2026)

The cost killers in production

Unit economics: how to price your feature

Model routing strategy

Prompt caching: the biggest lever

Rate limiting and abuse

Keep the math moving