

Server capacity planning

Servers needed for peak RPS with CPU/RAM math, utilization targets, and N+1 / 2N redundancy.

Redundancy: N+1

Servers needed

11

Base: 10 · With redundancy: 11

Monthly cost

$880

$0.17 per million requests

Show the work

  • Concurrent requests: 300.0
  • CPU cores needed: 24.00
  • RAM needed: 24,000 MB
  • Servers for CPU: 10
  • Servers for RAM: 5
  • Base servers (max): 10
  • Total with redundancy: 11

Server capacity planning — how many machines do you actually need?

Over-provisioning wastes money. Under-provisioning causes outages. Most teams do both: wildly over-provisioned on baseline compute and under-provisioned at peak. Proper capacity planning uses three inputs: request volume, per-request resource consumption, and redundancy requirements. This calculator gives you a grounded answer so you don't have to guess.

The capacity math

Two resources typically bottleneck: CPU and memory. You need enough of both.

CPU capacity: CPU-cores needed = (requests/sec × CPU-ms per request) / 1000. If your API handles 2,000 rps and each request needs 12ms of actual CPU work, you need 24 CPU-seconds per second = 24 CPU cores.

Memory capacity: Concurrent requests × memory per request. 2,000 rps with 150ms response time = 300 concurrent requests. At 80MB each, that's 24GB of request-handling RAM.

Add OS + runtime overhead (typically 1-4GB per server), plus whatever persistent data structures your app keeps (caches, connection pools, session data). Real memory need is usually 1.5-2x the request-handling number.
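To make the arithmetic concrete, here is a minimal Python sketch of both checks, using the example numbers from the paragraphs above; the 1.75x overhead multiplier is an assumed midpoint of the 1.5-2x range, not a calculator default.

```python
# Capacity math sketch. Inputs mirror the worked example:
# 2,000 rps, 12 ms CPU per request, 150 ms response time, 80 MB per request.

rps = 2000            # peak requests per second
cpu_ms = 12           # CPU time consumed per request, in ms
response_ms = 150     # wall-clock response time per request, in ms
mem_per_req_mb = 80   # memory held while a request is in flight

# CPU: cores needed = (requests/sec x CPU-ms per request) / 1000
cores_needed = rps * cpu_ms / 1000            # -> 24.0 cores

# Memory: in-flight requests (Little's law), times per-request footprint
concurrent = rps * response_ms / 1000         # -> 300 concurrent requests
request_ram_mb = concurrent * mem_per_req_mb  # -> 24,000 MB

# Real memory need: 1.5-2x the request-handling number (assumed 1.75x here)
total_ram_mb = request_ram_mb * 1.75

print(f"{cores_needed:.1f} cores, {concurrent:.0f} concurrent, {total_ram_mb:,.0f} MB RAM")
```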

Utilization: the non-intuitive cap

Queueing theory tells us that as utilization approaches 100%, response time blows up. For an M/M/1 queue, mean response time at utilization ρ (expressed as a fraction) is: service time / (1 − ρ).

  • At 50% utilization: response time = 2x service time
  • At 70%: response time = 3.3x service time
  • At 80%: response time = 5x service time
  • At 90%: response time = 10x service time
  • At 95%: response time = 20x service time

Practical target: 60-70% for web/API tiers (bursty, latency-sensitive), 70-80% for batch workers (less sensitive to individual-request latency).
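The multiplier is easy to compute directly. Here is a short sketch reproducing the table above; `latency_multiplier` is a hypothetical helper name, with utilization as a fraction:

```python
# M/M/1 blow-up: mean response time = service time / (1 - utilization)

def latency_multiplier(utilization: float) -> float:
    """How many times the bare service time a request spends in the system."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return 1 / (1 - utilization)

for u in (0.50, 0.70, 0.80, 0.90, 0.95):
    print(f"{u:.0%} utilization -> {latency_multiplier(u):.1f}x service time")
# 50% -> 2.0x, 70% -> 3.3x, 80% -> 5.0x, 90% -> 10.0x, 95% -> 20.0x
```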

Redundancy models

Running exactly what you need (N) means any failure causes an outage. Common patterns, from cheapest to most resilient (a sizing sketch follows the list):

  • None: Run exactly N servers. Cheapest. One server failure = capacity shortage + potential outage. Only acceptable for low-tier services.
  • N+1: Run one more than needed. Any single server failure = still at full capacity. Adds 25-50% cost for small N, < 5% for large N. Industry default.
  • N+2: Run two extra. Tolerates simultaneous failures. Common for critical services during deploys.
  • 2N: Double deployment, typically across zones. Full zone failure survived. Doubles cost. Reserved for highest-tier services.
  • Multi-region active-active: 2+ regions, each handling full production traffic. Highest resilience, 2-4x baseline cost. Reserved for payments, auth, critical infra.
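Here is the sizing sketch promised above: a hypothetical `with_redundancy` helper that maps a base N to total servers under each model. The printed cost percentages assume uniform per-server pricing.

```python
# Redundancy sizing sketch: total servers for each model, given a base N.
# `with_redundancy` is a hypothetical helper, not part of this calculator.

def with_redundancy(n: int, model: str, regions: int = 2) -> int:
    if model == "none":
        return n
    if model == "n+1":
        return n + 1
    if model == "n+2":
        return n + 2
    if model == "2n":
        return 2 * n
    if model == "multi-region":  # each region carries full production traffic
        return regions * n
    raise ValueError(f"unknown model: {model}")

n = 10  # base servers from the capacity math above
for model in ("none", "n+1", "n+2", "2n", "multi-region"):
    total = with_redundancy(n, model)
    print(f"{model:>12}: {total} servers (+{(total - n) / n:.0%} cost)")
# n+1 on N=10 adds 10%; on N=2 it would add 50% -- the small-N trap noted below
```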

Peak vs average

Most apps have 3-10x variance between daily peak and trough. Provisioning for 24/7 peak wastes money; provisioning for the average causes peak-hour outages.

Options:

  1. Static + peak: Provision for peak, pay for idle capacity off-peak. Simple, predictable cost, 2-5x more expensive than necessary at average hours.
  2. Scheduled scaling: Know your patterns (traffic 9am-9pm on weekdays). Scale up/down on schedule. Predictable, no response-time risk if the schedule is right.
  3. Autoscaling on signal: Scale based on CPU, request queue, or custom metric. Responsive but has lag — cold starts matter.
  4. Serverless: Lambda, Cloud Run, Fargate. Pay per request. Zero idle cost. 2-5x more expensive at steady load, but wins at bursty/low-volume. Cold starts hurt latency-sensitive workloads. A rough cost-crossover sketch follows the list.
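The crossover between options 1 and 4 is worth sketching. The prices and per-server throughput below are made-up round numbers for illustration (the $80/server figure matches the $880 / 11 servers example above), not any provider's actual pricing:

```python
# Static peak provisioning vs pay-per-request serverless: monthly cost.
# All constants are illustrative assumptions.

SERVER_MONTHLY = 80.0    # $ per server per month
SERVERLESS_PER_M = 0.60  # $ per million requests, all-in (assumed)
RPS_PER_SERVER = 200     # sustained rps one server absorbs (assumed)

def static_cost(peak_rps: float) -> float:
    servers = -(-peak_rps // RPS_PER_SERVER)  # ceiling division
    return servers * SERVER_MONTHLY

def serverless_cost(avg_rps: float) -> float:
    monthly_requests = avg_rps * 86_400 * 30
    return monthly_requests / 1_000_000 * SERVERLESS_PER_M

# Bursty: peak 400 rps, average 20 rps -> serverless wins
print(static_cost(400), serverless_cost(20))   # 160.0 vs ~31.1
# Steady: peak 400 rps, average 350 rps -> static wins
print(static_cost(400), serverless_cost(350))  # 160.0 vs ~544.3
```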

Where predictions break

Common capacity-planning failures:

  • Measuring dev-load perf: Local or staging traffic doesn't represent prod patterns. DB query caches hit differently, and real users do weird things. Load test with realistic traffic.
  • Ignoring p95/p99 bursts: Average rps looks fine; the peaks are what matter. Plan capacity for 2-3x average, not the average.
  • Memory leaks not accounted for: Many apps leak 50-200MB/day per instance. After 7 days, memory usage is 2-4x baseline. Set a restart schedule or fix the leaks before capacity planning.
  • N+1 on too-small N: If you need 1 server and run 2 (N+1), you're 100% over-provisioned. Below 3-4 servers, N+1 is expensive percentage-wise.
  • DB connection pool bottlenecks: Adding app servers doesn't help if DB connection pool is maxed. Check downstream dependencies before scaling out.
  • Deploy spikes: During rolling deploys, capacity briefly drops. N+1 covers this; N doesn't.

Server sizing tradeoffs

Two philosophies:

  • Many small servers (cattle): More instances of smaller machines. Finer granularity for autoscaling. Better fault isolation. More overhead per server (OS, runtime, connections).
  • Fewer large servers (pets): Big instances serving many requests each. Better resource utilization. More blast radius per failure. Simpler to operate.

Rule of thumb: modern apps favor the cattle model (small + many) for resilience and scaling. Legacy monoliths and heavy services (databases, caches) often run better on pets (few + large).

Capacity buffers

Always have explicit buffers for growth (a quick check sketch follows the list):

  • Short-term: 25% headroom at current utilization for traffic spikes.
  • Medium-term: 2x current capacity provisionable within hours (autoscaling + quota).
  • Long-term: Quarterly capacity review. Plan for 6-month growth at current rate.
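A quick check sketch for the first and third buffers; both helper names are hypothetical, and growth is assumed to compound monthly:

```python
def has_short_term_buffer(current_rps: float, provisioned_rps: float,
                          headroom: float = 0.25) -> bool:
    """True if provisioning can absorb a spike of `headroom` over current traffic."""
    return provisioned_rps >= current_rps * (1 + headroom)

def six_month_target(current_rps: float, monthly_growth: float) -> float:
    """Long-term plan: 6 months of compounding growth at the current rate."""
    return current_rps * (1 + monthly_growth) ** 6

print(has_short_term_buffer(2000, 2400))    # False: 25% headroom needs 2,500 rps
print(round(six_month_target(2000, 0.05)))  # 2680 rps to plan for at 5%/month
```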

Without explicit buffers, the first outage from a traffic spike costs you more in lost revenue and reputation than the over-provisioning would have.
