Server capacity planning
Servers needed for peak RPS with CPU/RAM math, utilization targets, and N+1 / 2N redundancy.
Servers needed
11
Base: 10 · With redundancy: 11
Monthly cost
$880
$0 per million requests
Show the work
- Concurrent requests: 300.0
- CPU cores needed: 24.00
- RAM needed: 24,000 MB
- Servers for CPU: 10
- Servers for RAM: 5
- Base servers (max): 10
- Total with redundancy: 11
Server capacity planning — how many machines do you actually need?
Over-provisioning wastes money. Under-provisioning causes outages. Most teams do both: wildly over-provisioned on baseline compute and under-provisioned at peak. Proper capacity planning uses three inputs: request volume, per-request resource consumption, and redundancy requirements. This calculator gives you a grounded answer so you don't have to guess.
The capacity math
Two resources typically bottleneck: CPU and memory. You need enough of both.
CPU capacity: CPU-cores needed = (requests/sec × CPU-ms per request) / 1000. If your API handles 2,000 rps and each request needs 12ms of actual CPU work, you need 24 CPU-seconds per second = 24 CPU cores.
Memory capacity: Concurrent requests × memory per request. 2,000 rps with 150ms response time = 300 concurrent requests. At 80MB each, that's 24GB of request-handling RAM.
Add OS + runtime overhead (typically 1-4GB per server), plus whatever persistent data structures your app keeps (caches, connection pools, session data). Real memory need is usually 1.5-2x the request-handling number.
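The CPU and memory formulas above combine into a short sizing function. The per-server specs here (4 cores, 8 GB RAM) and the 65% utilization target are illustrative assumptions, not fixed constants:

```python
import math

def base_servers(rps, cpu_ms_per_req, resp_ms, mem_mb_per_req,
                 cores_per_server=4, ram_mb_per_server=8192,
                 target_util=0.65):
    # CPU: cores = (rps * CPU-ms per request) / 1000, then divide by
    # per-server core capacity derated to the utilization target.
    cores_needed = rps * cpu_ms_per_req / 1000
    servers_cpu = math.ceil(cores_needed / (cores_per_server * target_util))
    # Memory: concurrency via Little's law (rps * response time),
    # times MB held per in-flight request.
    concurrent = rps * resp_ms / 1000
    ram_needed_mb = concurrent * mem_mb_per_req
    servers_ram = math.ceil(ram_needed_mb / (ram_mb_per_server * target_util))
    # Provision for whichever resource bottlenecks first.
    return max(servers_cpu, servers_ram)
```

With the worked example's inputs, `base_servers(2000, 12, 150, 80)` returns 10, matching the base server count in the breakdown above.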
Utilization: the non-intuitive cap
Queueing theory tells us that as utilization approaches 100%, response time grows without bound. For an M/M/1 queue, mean response time at utilization ρ (as a fraction) is: service time / (1 − ρ).
- At 50% utilization: response time = 2x service time
- At 70%: response time = 3.3x service time
- At 80%: response time = 5x service time
- At 90%: response time = 10x service time
- At 95%: response time = 20x service time
Practical target: 60-70% for web/API tiers (bursty, latency-sensitive), 70-80% for batch workers (less sensitive to individual-request latency).
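The multipliers in the list fall straight out of the M/M/1 formula; a one-liner makes them easy to check at any utilization:

```python
def response_multiplier(utilization):
    # M/M/1: mean response time = service_time / (1 - utilization),
    # so the slowdown over bare service time is 1 / (1 - utilization).
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return 1 / (1 - utilization)
```

At the 60-70% practical target the multiplier sits between 2.5x and 3.3x, which is why pushing web tiers past 80% gets painful fast.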
Redundancy models
Running exactly what you need (N) means any failure causes an outage. Five common patterns:
- None: Run exactly N servers. Cheapest. One server failure = capacity shortage + potential outage. Only acceptable for low-tier services.
- N+1: Run one more than needed. Any single server failure = still at full capacity. Adds 25-50% cost for small N, < 5% for large N. Industry default.
- N+2: Run two extra. Tolerates simultaneous failures. Common for critical services during deploys.
- 2N: Double deployment, typically across zones. Full zone failure survived. Doubles cost. Reserved for highest-tier services.
- Multi-region active-active: 2+ regions, each handling full production traffic. Highest resilience, 2-4x baseline cost. Reserved for payments, auth, critical infra.
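A minimal sketch of the single-region models as a lookup (the model labels are this sketch's own, not a standard API), plus the overhead calculation behind the "25-50% for small N, < 5% for large N" claim:

```python
def total_servers(n, model):
    # Map a redundancy policy to total fleet size.
    models = {
        "none": n,       # any failure = capacity shortage
        "n+1":  n + 1,   # survives one server failure
        "n+2":  n + 2,   # survives two simultaneous failures
        "2n":   2 * n,   # full second copy, typically cross-zone
    }
    return models[model]

def redundancy_overhead(n, model):
    # Extra servers as a fraction of base capacity.
    return (total_servers(n, model) - n) / n
```

For N+1, `redundancy_overhead(2, "n+1")` is 0.5 (50% extra) while `redundancy_overhead(20, "n+1")` is 0.05, which is why N+1 gets cheap at scale.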
Peak vs average
Most apps see 3-10x variance between daily peak and trough. Provisioning for 24/7 peak wastes money; provisioning for average causes peak-hour outages.
Options:
- Static + peak: Provision for peak, pay for idle capacity off-peak. Simple, predictable cost, 2-5x more expensive than necessary at average hours.
- Scheduled scaling: Know your patterns (traffic 9am-9pm on weekdays). Scale up/down on schedule. Predictable, with no response-time risk if the schedule is right.
- Autoscaling on signal: Scale based on CPU, request queue, or custom metric. Responsive but has lag — cold starts matter.
- Serverless: Lambda, Cloud Run, Fargate. Pay per request. Zero idle cost. 2-5x more expensive at steady load, but wins at bursty/low-volume. Cold starts hurt latency-sensitive workloads.
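A rough monthly-cost model makes the static-peak vs scheduled-scaling tradeoff concrete. The $0.10/server-hour price, the 12-hour peak window, and the 4-server off-peak floor are illustrative assumptions:

```python
def monthly_cost(peak_servers, offpeak_servers, peak_hours_per_day,
                 cost_per_server_hour):
    # Split an average 730-hour month into peak and off-peak server-hours.
    hours = 730
    peak_h = peak_hours_per_day / 24 * hours
    server_hours = peak_servers * peak_h + offpeak_servers * (hours - peak_h)
    return cost_per_server_hour * server_hours

# Static + peak: run the full 11-server fleet around the clock.
static = monthly_cost(11, 11, 24, 0.10)
# Scheduled: drop to 4 servers outside a 12-hour peak window.
scheduled = monthly_cost(11, 4, 12, 0.10)
```

Under these assumptions scheduled scaling cuts the bill roughly a third; the gap widens as the peak-to-trough ratio grows.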
Where predictions break
Common capacity-planning failures:
- Measuring dev-load perf: Local or staging traffic doesn't represent prod patterns. DB query caches hit differently, real users do weird things. Load test with realistic traffic.
- Ignoring p99/p95 bursts: Average rps looks fine; peaks are what matter. Plan capacity for 2-3x average, not the average.
- Memory leaks not accounted for: Many apps leak 50-200MB/day per instance. After 7 days, memory usage is 2-4x baseline. Restart schedule or fix leaks before capacity planning.
- N+1 on too-small N: If you need 1 server and run 2 (N+1), you're 100% over-provisioned. Below 3-4 servers, N+1 is expensive percentage-wise.
- DB connection pool bottlenecks: Adding app servers doesn't help if DB connection pool is maxed. Check downstream dependencies before scaling out.
- Deploy spikes: During rolling deploys, capacity briefly drops. N+1 covers this; N doesn't.
Server sizing tradeoffs
Two philosophies:
- Many small servers (cattle): More instances of smaller machines. Finer granularity for autoscaling. Better fault isolation. More overhead per server (OS, runtime, connections).
- Fewer large servers (pets): Big instances serving many requests each. Better resource utilization. More blast radius per failure. Simpler to operate.
Rule of thumb: modern apps favor the cattle model (small + many) for resilience and scaling. Legacy monoliths and heavy services (databases, caches) often run better on pets (few + large).
Capacity buffers
Always have explicit buffers for growth:
- Short-term: 25% headroom at current utilization for traffic spikes.
- Medium-term: 2x current capacity provisionable within hours (autoscaling + quota).
- Long-term: Quarterly capacity review. Plan for 6-month growth at current rate.
Without explicit buffers, the first outage from a traffic spike costs more in lost revenue and reputation than the over-provisioning would have.
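The buffer guidance above folds into a simple projection. The growth rate is an input you'd replace with your own trend; the 25% spike headroom mirrors the short-term buffer:

```python
import math

def capacity_target(current_servers, monthly_growth, months=6,
                    spike_buffer=0.25):
    # Compound the current fleet forward, then add spike headroom on top.
    projected = current_servers * (1 + monthly_growth) ** months
    return math.ceil(projected * (1 + spike_buffer))
```

For example, a 10-server fleet growing 10% per month needs to be able to reach 23 servers within six months to keep both buffers intact.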
Related calculators
Keep the math moving
Cloud hosting cost estimator
AWS, GCP, Azure, DO, Fly — monthly cost per MAU by compute, bandwidth, DB, storage.
LLM API cost calculator
Claude, GPT-4o, Gemini, DeepSeek — cost per call, daily/monthly/annual with prompt caching.
Freelance dev hourly rate
What to charge per hour based on target salary + benefits + overhead + utilization + profit margin.
Database cost calculator
RDS, Aurora Serverless, PlanetScale, Supabase, Neon, Atlas — monthly DB cost with storage + reads + writes.
Load balancer breakeven
Self-hosted HAProxy vs managed AWS ALB / GCP LB / Cloudflare — where the crossover point actually is.
Tech debt ROI calculator
Turn a debt-fix project into a finance pitch: drag cost today, fix cost, payback months, ROI over 3 years.