

Server capacity planning

Servers needed for peak RPS with CPU/RAM math, utilization targets, and N+1 / 2N redundancy.

Redundancy: N+1

Servers needed

11

Base: 10 · With redundancy: 11

Monthly cost

$880

$0.17 per million requests

Show the work

  • Concurrent requests: 300.0
  • CPU cores needed: 24.00
  • RAM needed: 24,000 MB
  • Servers for CPU: 10
  • Servers for RAM: 5
  • Base servers (max): 10
  • Total with redundancy: 11

Server capacity planning — how many machines do you actually need?

Over-provisioning wastes money. Under-provisioning causes outages. Most teams do both: wildly over-provisioned on baseline compute and under-provisioned at peak. Proper capacity planning uses three inputs: request volume, per-request resource consumption, and redundancy requirements. This calculator gives you a grounded answer so you don't have to guess.

The capacity math

Two resources typically bottleneck: CPU and memory. You need enough of both.

CPU capacity: CPU-cores needed = (requests/sec × CPU-ms per request) / 1000. If your API handles 2,000 rps and each request needs 12ms of actual CPU work, you need 24 CPU-seconds per second = 24 CPU cores.

Memory capacity: Concurrent requests × memory per request. 2,000 rps with 150ms response time = 300 concurrent requests. At 80MB each, that's 24GB of request-handling RAM.

Add OS + runtime overhead (typically 1-4GB per server), plus whatever persistent data structures your app keeps (caches, connection pools, session data). Real memory need is usually 1.5-2x the request-handling number.
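To make the arithmetic concrete, here is a minimal Python sketch of both checks, using the example numbers from the paragraphs above; the 1.75x overhead multiplier is an assumed midpoint of the 1.5-2x range, not a calculator default.

```python
# Capacity math sketch. Inputs mirror the worked example:
# 2,000 rps, 12 ms CPU per request, 150 ms response time, 80 MB per request.

rps = 2000            # peak requests per second
cpu_ms = 12           # CPU time consumed per request, in ms
response_ms = 150     # wall-clock response time per request, in ms
mem_per_req_mb = 80   # memory held while a request is in flight

# CPU: cores needed = (requests/sec x CPU-ms per request) / 1000
cores_needed = rps * cpu_ms / 1000            # -> 24.0 cores

# Memory: in-flight requests (Little's law), times per-request footprint
concurrent = rps * response_ms / 1000         # -> 300 concurrent requests
request_ram_mb = concurrent * mem_per_req_mb  # -> 24,000 MB

# Real memory need: 1.5-2x the request-handling number (assumed 1.75x here)
total_ram_mb = request_ram_mb * 1.75

print(f"{cores_needed:.1f} cores, {concurrent:.0f} concurrent, {total_ram_mb:,.0f} MB RAM")
```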

Utilization: the non-intuitive cap

Queueing theory tells us that as utilization approaches 100%, response time blows up. For an M/M/1 queue, mean response time at utilization ρ (expressed as a fraction) is: service time / (1 − ρ).

  • At 50% utilization: response time = 2x service time
  • At 70%: response time = 3.3x service time
  • At 80%: response time = 5x service time
  • At 90%: response time = 10x service time
  • At 95%: response time = 20x service time

Practical target: 60-70% for web/API tiers (bursty, latency-sensitive), 70-80% for batch workers (less sensitive to individual-request latency).
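The multiplier is easy to compute directly. Here is a short sketch reproducing the table above; `latency_multiplier` is a hypothetical helper name, with utilization as a fraction:

```python
# M/M/1 blow-up: mean response time = service time / (1 - utilization)

def latency_multiplier(utilization: float) -> float:
    """How many times the bare service time a request spends in the system."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return 1 / (1 - utilization)

for u in (0.50, 0.70, 0.80, 0.90, 0.95):
    print(f"{u:.0%} utilization -> {latency_multiplier(u):.1f}x service time")
# 50% -> 2.0x, 70% -> 3.3x, 80% -> 5.0x, 90% -> 10.0x, 95% -> 20.0x
```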

Redundancy models

Running exactly what you need (N) means any failure causes an outage. Common patterns, from cheapest to most resilient (a sizing sketch follows the list):

  • None: Run exactly N servers. Cheapest. One server failure = capacity shortage + potential outage. Only acceptable for low-tier services.
  • N+1: Run one more than needed. Any single server failure = still at full capacity. Adds 25-50% cost for small N, < 5% for large N. Industry default.
  • N+2: Run two extra. Tolerates simultaneous failures. Common for critical services during deploys.
  • 2N: Double deployment, typically across zones. Full zone failure survived. Doubles cost. Reserved for highest-tier services.
  • Multi-region active-active: 2+ regions, each handling full production traffic. Highest resilience, 2-4x baseline cost. Reserved for payments, auth, critical infra.
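Here is the sizing sketch promised above: a hypothetical `with_redundancy` helper that maps a base N to total servers under each model. The printed cost percentages assume uniform per-server pricing.

```python
# Redundancy sizing sketch: total servers for each model, given a base N.
# `with_redundancy` is a hypothetical helper, not part of this calculator.

def with_redundancy(n: int, model: str, regions: int = 2) -> int:
    if model == "none":
        return n
    if model == "n+1":
        return n + 1
    if model == "n+2":
        return n + 2
    if model == "2n":
        return 2 * n
    if model == "multi-region":  # each region carries full production traffic
        return regions * n
    raise ValueError(f"unknown model: {model}")

n = 10  # base servers from the capacity math above
for model in ("none", "n+1", "n+2", "2n", "multi-region"):
    total = with_redundancy(n, model)
    print(f"{model:>12}: {total} servers (+{(total - n) / n:.0%} cost)")
# n+1 on N=10 adds 10%; on N=2 it would add 50% -- the small-N trap noted below
```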

Peak vs average

Most apps have 3-10x variance between daily peak and trough. Provisioning for 24/7 peak wastes money; provisioning for the average causes peak-hour outages.

Options:

  1. Static + peak: Provision for peak, pay for idle capacity off-peak. Simple, predictable cost, 2-5x more expensive than necessary at average hours.
  2. Scheduled scaling: Know your patterns (traffic 9am-9pm on weekdays). Scale up/down on schedule. Predictable, no response-time risk if the schedule is right.
  3. Autoscaling on signal: Scale based on CPU, request queue, or custom metric. Responsive but has lag — cold starts matter.
  4. Serverless: Lambda, Cloud Run, Fargate. Pay per request. Zero idle cost. 2-5x more expensive at steady load, but wins at bursty/low-volume. Cold starts hurt latency-sensitive workloads. A rough cost-crossover sketch follows the list.
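The crossover between options 1 and 4 is worth sketching. The prices and per-server throughput below are made-up round numbers for illustration (the $80/server figure matches the $880 / 11 servers example above), not any provider's actual pricing:

```python
# Static peak provisioning vs pay-per-request serverless: monthly cost.
# All constants are illustrative assumptions.

SERVER_MONTHLY = 80.0    # $ per server per month
SERVERLESS_PER_M = 0.60  # $ per million requests, all-in (assumed)
RPS_PER_SERVER = 200     # sustained rps one server absorbs (assumed)

def static_cost(peak_rps: float) -> float:
    servers = -(-peak_rps // RPS_PER_SERVER)  # ceiling division
    return servers * SERVER_MONTHLY

def serverless_cost(avg_rps: float) -> float:
    monthly_requests = avg_rps * 86_400 * 30
    return monthly_requests / 1_000_000 * SERVERLESS_PER_M

# Bursty: peak 400 rps, average 20 rps -> serverless wins
print(static_cost(400), serverless_cost(20))   # 160.0 vs ~31.1
# Steady: peak 400 rps, average 350 rps -> static wins
print(static_cost(400), serverless_cost(350))  # 160.0 vs ~544.3
```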

Where predictions break

Common capacity-planning failures:

  • Measuring dev-load perf: Local or staging traffic doesn't represent prod patterns. DB query caches hit differently, and real users do weird things. Load test with realistic traffic.
  • Ignoring p95/p99 bursts: Average rps looks fine; the peaks are what matter. Plan capacity for 2-3x average, not the average.
  • Memory leaks not accounted for: Many apps leak 50-200MB/day per instance. After 7 days, memory usage is 2-4x baseline. Set a restart schedule or fix the leaks before capacity planning.
  • N+1 on too-small N: If you need 1 server and run 2 (N+1), you're 100% over-provisioned. Below 3-4 servers, N+1 is expensive percentage-wise.
  • DB connection pool bottlenecks: Adding app servers doesn't help if DB connection pool is maxed. Check downstream dependencies before scaling out.
  • Deploy spikes: During rolling deploys, capacity briefly drops. N+1 covers this; N doesn't.

Server sizing tradeoffs

Two philosophies:

  • Many small servers (cattle): More instances of smaller machines. Finer granularity for autoscaling. Better fault isolation. More overhead per server (OS, runtime, connections).
  • Fewer large servers (pets): Big instances serving many requests each. Better resource utilization. More blast radius per failure. Simpler to operate.

Rule of thumb: modern apps favor the cattle model (small + many) for resilience and scaling. Legacy monoliths and heavy services (databases, caches) often run better on pets (few + large).

Capacity buffers

Always have explicit buffers for growth (a quick check sketch follows the list):

  • Short-term: 25% headroom at current utilization for traffic spikes.
  • Medium-term: 2x current capacity provisionable within hours (autoscaling + quota).
  • Long-term: Quarterly capacity review. Plan for 6-month growth at current rate.
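A quick check sketch for the first and third buffers; both helper names are hypothetical, and growth is assumed to compound monthly:

```python
def has_short_term_buffer(current_rps: float, provisioned_rps: float,
                          headroom: float = 0.25) -> bool:
    """True if provisioning can absorb a spike of `headroom` over current traffic."""
    return provisioned_rps >= current_rps * (1 + headroom)

def six_month_target(current_rps: float, monthly_growth: float) -> float:
    """Long-term plan: 6 months of compounding growth at the current rate."""
    return current_rps * (1 + monthly_growth) ** 6

print(has_short_term_buffer(2000, 2400))    # False: 25% headroom needs 2,500 rps
print(round(six_month_target(2000, 0.05)))  # 2680 rps to plan for at 5%/month
```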

Without explicit buffers, the first outage from a traffic spike costs you more in lost revenue and reputation than the over-provisioning would have.
