Users & Traffic
Turn DAU into requests per second.
ƒ QPS_avg = DAU × req/user/day ÷ 86,400     QPS_peak = QPS_avg × peak_ratio
Inputs
DAU (users): type "10m" or "1.5b"
Requests per user per day (req)
Peak ratio (×): Web 2–3×. Spiky 5–10×.
Read:write ratio (n : 1): Social 100:1. Logging 1:100.
Results
Total requests / day
Average QPS
Peak QPS
Peak read QPS
Peak write QPS

Storage
Disk needed for records over a retention window.
ƒ total = records/day × size × retention_days × (1 + index) × replicas
Inputs
Records per day (rec)
Record size: Tweet ≈ 300 B. Image ≈ 1 MB.
Replicas (×)
Retention (days): 5 yr = 1,825 days · 1 yr = 365 days
Index overhead (frac): 0.30 = +30%
Results
Per day
Per month
Per year

Raw over retention
Total provisioned
with replication + overhead

Bandwidth
Network traffic in and out of your service.
ƒ bandwidth = QPS × payload_size     (per direction)
Inputs
QPS (req/s)
Payload size per direction: JSON API ~1 KB. Photo ~1 MB. Webpage ~100 KB. Video chunk ~256 KB.
Results
Ingress (in)
Egress (out)

Total (in + out)
Per month
Per year
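
As a rough illustration of the bandwidth formula, the sketch below multiplies QPS by payload size for each direction and scales the total to a 30-day month. The function name and the example workload (1,000 req/s, 1 KB requests, 100 KB responses) are assumptions chosen for illustration, not values from the calculator.

```python
SECONDS_PER_DAY = 86_400

def bandwidth_bytes_per_sec(qps: float, payload_bytes: float) -> float:
    """bandwidth = QPS × payload_size, for one direction."""
    return qps * payload_bytes

# Hypothetical workload: 1,000 req/s, 1 KB in, 100 KB out.
ingress = bandwidth_bytes_per_sec(1_000, 1_000)
egress = bandwidth_bytes_per_sec(1_000, 100_000)
total = ingress + egress
per_month = total * SECONDS_PER_DAY * 30  # napkin month of 30 days

print(f"ingress {ingress / 1e6:.0f} MB/s, egress {egress / 1e6:.0f} MB/s")
print(f"total {total * 8 / 1e6:.0f} Mbps, about {per_month / 1e12:.0f} TB per month")
```

Note the factor of 8 when converting bytes per second to bits per second; link capacity is usually quoted in bits.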

Throughput & Concurrency
How many in-flight requests does this load create?
ƒ Little's Law: L = λ × W     (concurrency = QPS × latency)
Inputs
QPS (req/s)
Average latency (ms)
Servers (srv)
Results
Total in-flight requests
across the whole fleet

Per-server QPS
Per-server concurrency
threads/connections per box
CPU cores hint
rough: concurrency/2 (I/O-bound)

Cache Sizing
Size a Redis / Memcached tier for the hot working set.
ƒ cache = items × hot_fraction × bytes_per_item × (1 + overhead)
Inputs
Total items (items)
Bytes per item
Hot fraction (frac): 0.20 = 80/20 rule
Overhead (frac): 0.20 = +20% for keys / metadata
Results
Hot items
Working set

Recommended cache size
with overhead

RAM per Server
Memory for in-memory session / connection state.
ƒ per_server = users × bytes/user ÷ servers × (1 + headroom)
Inputs
Concurrent users (users)
Bytes per user
Servers (srv)
Headroom (frac): 0.30 = +30% safety
Results
Total session memory
Per server (steady)
Per server (with headroom)
size boxes to at least this
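
A minimal sketch of the per-server formula above. The function name and the example inputs (1M users, 10 KB of state per user, 10 servers) are hypothetical; only the formula and the 0.30 headroom come from this section.

```python
def session_memory(users: int, bytes_per_user: int, servers: int, headroom: float):
    """Return (total, per-server steady, per-server with headroom) in bytes."""
    total = users * bytes_per_user
    steady = total / servers
    return total, steady, steady * (1 + headroom)

# Hypothetical fleet: 1M users, 10 KB each, 10 servers, +30% headroom.
total, steady, sized = session_memory(1_000_000, 10_000, servers=10, headroom=0.30)
print(f"total {total / 1e9:.1f} GB, per server {steady / 1e9:.1f} GB, "
      f"with headroom {sized / 1e9:.1f} GB")
```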

Media Streaming
Bandwidth and storage for audio/video delivery.
ƒ peak_bw = viewers × bitrate     storage = upload_hrs/day × bitrate × 3600s × versions × retention × replication
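Inputs
Concurrent viewers (viewers)
Bitrate (Mbps): Audio ~0.128 · 720p ~2.5 · 1080p ~5 · 4K ~20
Upload hours per day (hrs/day): YouTube ≈ 720k hrs/day. Small platform ≈ 1k–10k.
Encoding versions (×): quality tiers to store (e.g. 360p/720p/1080p = 3)
Retention (days)
Replication (×)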
Results
Peak streaming bandwidth
total egress to all viewers
Daily data transferred

Storage per hour of content
all encoding versions
Storage per day of uploads
Total storage over retention
with replication

AI / LLM Inference
GPU count and memory for serving a large language model.
ƒ GPUs = max(⌈(weights + KV_cache) ÷ VRAM⌉, ⌈RPS × tokens ÷ GPU_tps⌉)
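Model & Hardware
Model size (B params): Llama 3 8B · Llama 3 70B · GPT-4 ~1.8T (unconfirmed)
Bytes per parameter (B/p): FP32=4 · FP16/BF16=2 · INT8=1 · INT4=0.5
VRAM per GPU (GB): H100=80 · A100=40/80 · A10=24 · RTX 4090=24
GPU throughput (tok/s): H100 ~3k · A100 ~2k · A10 ~500 (output tokens)
Concurrent requests (reqs)
KV cache per request: ≈ context_len × 0.5 MB per token (7B model, FP16)
Request rate (req/s)
Tokens per request (tok): input + output combined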
Results
Model weight memory
KV cache (full batch)
Total GPU memory needed

GPUs for memory
⌈total_mem ÷ VRAM⌉
GPUs for throughput
⌈RPS × tokens ÷ GPU_tps⌉
Min GPU count
memory- or throughput-bound

WebSocket / Real-time
Persistent connection bandwidth, fan-out, and server sizing.
ƒ outbound = connections × msg_rate × msg_size × fan_out     servers = ⌈connections × mem_per_conn ÷ server_mem⌉
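Inputs
Connections (conns)
Message rate per connection (msg/s): Presence ~0.03 · Chat ~0.1 · Trading ~10
Message size: JSON ~200–2k B · Binary frame ~50–500 B
Fan-out (recv/msg): DM=1 · Group chat ~50 · Global broadcast=all
Memory per connection: Node.js WS ~10–50 KB · Erlang/Go ~2–5 KB
Server memory (GB)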
Results
Inbound bandwidth
all clients → server
Outbound bandwidth
server → clients (after fan-out)
Total bandwidth

Total connection memory
Max connections / server
memory-bound
Servers needed
Frequently Asked Questions
How do I convert DAU to QPS?
Multiply DAU by requests per user per day, divide by 86,400 (seconds in a day) to get average QPS. Multiply by a peak factor (typically 2–3×) to get peak QPS. Example: 10M DAU × 20 req/day ÷ 86,400 × 3 = ~6,944 peak QPS.
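
The conversion can be sketched in a few lines. The function name `dau_to_qps` is illustrative; the inputs are the ones from the example above.

```python
SECONDS_PER_DAY = 86_400

def dau_to_qps(dau: float, req_per_user_per_day: float,
               peak_ratio: float = 3.0) -> tuple[float, float]:
    """Return (average QPS, peak QPS) for a daily active user count."""
    qps_avg = dau * req_per_user_per_day / SECONDS_PER_DAY
    return qps_avg, qps_avg * peak_ratio

avg, peak = dau_to_qps(10_000_000, 20, peak_ratio=3)
print(f"avg ≈ {avg:,.0f} QPS, peak ≈ {peak:,.0f} QPS")
# prints: avg ≈ 2,315 QPS, peak ≈ 6,944 QPS
```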
What is Little's Law for concurrent requests?
Little's Law states L = λ × W: concurrency (in-flight requests) equals arrival rate (QPS) multiplied by average latency in seconds. At 10,000 QPS with 50ms latency, you have 500 concurrent requests in-flight at any moment.
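
A small sketch of the same calculation, extended to per-server numbers. The fleet size of 10 servers is an assumption for illustration, and the cores estimate uses the calculator's rough concurrency/2 hint for I/O-bound work.

```python
def in_flight(qps: float, latency_ms: float) -> float:
    """Little's Law: L = λ × W, with W converted from ms to seconds."""
    return qps * latency_ms / 1000.0

def per_server(qps: float, latency_ms: float, servers: int) -> dict:
    concurrency = in_flight(qps, latency_ms) / servers
    return {
        "qps": qps / servers,
        "concurrency": concurrency,
        "cores_hint": concurrency / 2,  # rough rule for I/O-bound services
    }

print(in_flight(10_000, 50))               # 500.0 in-flight requests fleet-wide
print(per_server(10_000, 50, servers=10))  # hypothetical 10-server fleet
```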
How much storage does an app need?
Total storage = records/day × bytes/record × retention days × replication factor × (1 + index overhead). For 1M records/day at 1 KB each with 5-year retention, 3× replication, and 30% index overhead: 1 GB/day × 1,825 days ≈ 1.8 TB raw, and × 3 × 1.3 gives roughly 7.1 TB provisioned.
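
The formula is easy to check with a short script. This is a sketch with an illustrative function name; it uses decimal units (1 KB = 1,000 B, 1 TB = 10¹² B).

```python
def provisioned_storage_bytes(records_per_day: int, bytes_per_record: int,
                              retention_days: int, index_overhead: float,
                              replicas: int) -> float:
    """total = records/day × size × retention_days × (1 + index) × replicas"""
    raw = records_per_day * bytes_per_record * retention_days
    return raw * (1 + index_overhead) * replicas

total = provisioned_storage_bytes(
    records_per_day=1_000_000,
    bytes_per_record=1_000,
    retention_days=5 * 365,
    index_overhead=0.30,
    replicas=3,
)
print(f"{total / 1e12:.1f} TB")  # prints: 7.1 TB
```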
How many GPUs do I need for a 7B LLM?
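A 7B parameter model in FP16 requires ~14 GB of VRAM for weights, plus KV cache for each concurrent request (~2 GB at 4K context for a 7B model with full multi-head attention, about 0.5 MB per token; models with grouped-query attention need less). On a single H100 (80 GB), the weights and roughly 30 such requests fit. GPU count is often throughput-bound: at 10 RPS × 500 tokens ÷ 3,000 tokens/sec per H100, you need ⌈1.67⌉ = 2 GPUs.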
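
The GPU-count formula takes the larger of a memory bound and a throughput bound. The sketch below is one way to write it; the function name, the 8 concurrent requests, and the 2 GB of KV cache per request are assumptions for illustration. It evaluates two per-GPU throughput figures because the result is sensitive to that input.

```python
import math

def gpus_needed(params_b: float, bytes_per_param: float,
                kv_gb_per_request: float, concurrent_requests: int,
                vram_gb: float, rps: float, tokens_per_request: float,
                gpu_tokens_per_sec: float) -> tuple[int, int, int]:
    """Return (GPUs for memory, GPUs for throughput, minimum GPU count)."""
    weights_gb = params_b * bytes_per_param  # 1B params × 1 byte ≈ 1 GB
    total_mem_gb = weights_gb + kv_gb_per_request * concurrent_requests
    for_memory = math.ceil(total_mem_gb / vram_gb)
    for_throughput = math.ceil(rps * tokens_per_request / gpu_tokens_per_sec)
    return for_memory, for_throughput, max(for_memory, for_throughput)

# 7B model in FP16 on 80 GB GPUs, 10 RPS × 500 tokens per request.
for tps in (2_000, 3_000):
    mem, thr, need = gpus_needed(7, 2, kv_gb_per_request=2.0,
                                 concurrent_requests=8, vram_gb=80,
                                 rps=10, tokens_per_request=500,
                                 gpu_tokens_per_sec=tps)
    print(f"{tps} tok/s per GPU: memory {mem}, throughput {thr}, need {need}")
```

In both cases the memory bound is 1 GPU and the throughput bound decides the count: 3 GPUs at 2,000 tok/s, 2 GPUs at 3,000 tok/s.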
How do I size a Redis or Memcached cache?
Use the 80/20 rule: 20% of items receive 80% of traffic. Cache size = total items × hot fraction × bytes per item × (1 + overhead). For 100M items at 1 KB each with 20% hot fraction and 20% overhead, you need ~24 GB of cache memory.
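
A minimal sketch of the cache formula, using the numbers from the example above. The function name is illustrative.

```python
def cache_size_bytes(items: int, hot_fraction: float,
                     bytes_per_item: int, overhead: float) -> float:
    """cache = items × hot_fraction × bytes_per_item × (1 + overhead)"""
    hot_items = items * hot_fraction
    working_set = hot_items * bytes_per_item
    return working_set * (1 + overhead)

size = cache_size_bytes(100_000_000, hot_fraction=0.20,
                        bytes_per_item=1_000, overhead=0.20)
print(f"{size / 1e9:.0f} GB")  # prints: 24 GB
```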
How much bandwidth does video streaming use?
Streaming bandwidth = concurrent viewers × bitrate per stream. 1080p video at 5 Mbps with 100,000 concurrent viewers requires 500 Gbps of egress. That is far more than a few origin servers can push, which is why CDNs are essential for video at scale.
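
The sketch below works through the egress figure and also the storage cost of one hour of content, following the calculator's storage formula. Function names are illustrative, and applying one bitrate to every encoding version is the formula's simplification, not a claim about real encoding ladders.

```python
def peak_streaming_gbps(viewers: int, bitrate_mbps: float) -> float:
    """peak_bw = viewers × bitrate, converted from Mbps to Gbps."""
    return viewers * bitrate_mbps / 1000

def storage_per_content_hour_gb(bitrate_mbps: float, versions: int) -> float:
    """Mbps × 3600 s gives megabits; ÷ 8 gives MB; ÷ 1000 gives GB."""
    return bitrate_mbps * 3600 / 8 / 1000 * versions

print(peak_streaming_gbps(100_000, 5))      # 500.0 Gbps
print(storage_per_content_hour_gb(5, 1))    # 2.25 GB per hour, one version
```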
What is the read:write ratio and why does it matter?
The read:write ratio tells you how to architect your data layer. Social media apps are read-heavy (~100:1); logging pipelines are write-heavy (1:100). High read ratios suggest caching and read replicas; high write ratios suggest write-optimized stores such as Cassandra, or an append-only log such as Kafka for ingestion.
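
One way to split a peak QPS figure into reads and writes is sketched below. It assumes a ratio of r:1 means r reads for every write and that reads plus writes add up to the total; the calculator may split differently. The function name and the 6,944 QPS input are illustrative.

```python
def split_peak_qps(peak_qps: float, reads_per_write: float) -> tuple[float, float]:
    """Split total QPS into (read QPS, write QPS) for a ratio of r reads : 1 write."""
    write_qps = peak_qps / (reads_per_write + 1)
    read_qps = peak_qps - write_qps
    return read_qps, write_qps

# 100:1 is read-heavy; 1:100 is the same as 0.01 reads per write.
for label, ratio in [("social, 100:1", 100), ("logging, 1:100", 0.01)]:
    read_qps, write_qps = split_peak_qps(6_944, ratio)
    print(f"{label}: read ≈ {read_qps:,.0f} QPS, write ≈ {write_qps:,.0f} QPS")
```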
How do I calculate WebSocket server count?
Server count is usually memory-bound: servers = ⌈concurrent connections × memory per connection ÷ server RAM⌉. Node.js uses ~10–50 KB per WebSocket; Erlang and Go use ~2–5 KB. With 500K connections at 10 KB each on 64 GB servers, you need just 1 server for connection state.
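
The memory-bound server count and the outbound bandwidth formula can be sketched as follows. Function names are illustrative, and the chat workload in the second call (0.1 msg/s, 500 B messages, fan-out of 50) is an assumed example built from the hint values, not a measured one.

```python
import math

def ws_servers(connections: int, mem_per_conn_bytes: int,
               server_mem_bytes: int) -> int:
    """servers = ⌈connections × mem_per_conn ÷ server_mem⌉"""
    return math.ceil(connections * mem_per_conn_bytes / server_mem_bytes)

def outbound_bytes_per_sec(connections: int, msg_rate: float,
                           msg_size_bytes: int, fan_out: int) -> float:
    """outbound = connections × msg_rate × msg_size × fan_out"""
    return connections * msg_rate * msg_size_bytes * fan_out

print(ws_servers(500_000, 10_000, 64 * 10**9))  # 1 server for connection state

out = outbound_bytes_per_sec(500_000, msg_rate=0.1, msg_size_bytes=500, fan_out=50)
print(f"outbound ≈ {out * 8 / 1e9:.0f} Gbps")   # hypothetical group-chat load
```

With these assumed inputs, memory fits on one box but outbound traffic is about 10 Gbps, so bandwidth rather than memory may set the real server count.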
Latency Numbers (Jeff Dean, updated for modern hardware)
Operation | Time | Notes
L1 cache reference | 1 ns |
Branch mispredict | 3 ns |
L2 cache reference | 4 ns | ≈ 4× L1
Mutex lock/unlock | 17 ns |
Main memory reference | 100 ns | ≈ 25× L2, 100× L1
Compress 1 KB w/ Snappy | 2 µs |
Read 1 MB sequentially from memory | 3 µs |
Send 1 KB over 1 Gbps network | 10 µs |
Read 4 KB random from SSD | 16 µs |
Read 1 MB sequentially from SSD | 49 µs |
Round trip within same datacenter | 500 µs | 0.5 ms
Read 1 MB sequentially from disk | 825 µs |
Disk seek | 2 ms |
Packet CA → Netherlands → CA | 150 ms |
Powers of 2 — quick lookup
Power | Value | Meaning
2¹⁰ | 1,024 | ≈ 1 thousand (Kilo)
2¹⁶ | 65,536 | number of UInt16 values (max is 65,535)
2²⁰ | 1,048,576 | ≈ 1 million (Mega)
2³⁰ | 1,073,741,824 | ≈ 1 billion (Giga)
2³² | 4,294,967,296 | number of UInt32 values, ~4.3 B
2⁴⁰ | 1,099,511,627,776 | ≈ 1 trillion (Tera)
2⁵⁰ | 1.13 × 10¹⁵ | ≈ 1 quadrillion (Peta)
2⁶³ | 9.22 × 10¹⁸ | ≈ max Int64 (signed), which is 2⁶³ − 1
Common Assumptions
Day length | 86,400 sec ≈ 10⁵ (useful for QPS math)
Month | ≈ 30 days for napkin math
Year | ≈ 3.15 × 10⁷ sec ≈ π × 10⁷
Replication factor | 3 (HDFS default; common choice for Cassandra and Kafka)
Read:Write ratio | Social 100:1, Read-heavy CMS 10:1, Logging 1:100
Peak-to-average | 2–3× typical; 5–10× for spiky / event-driven
Cache hit rate target | 80–95% (80/20 rule on hot keys)
Disk speed (SSD) | 500 MB/s sequential, 100k IOPS random
Network | 1 Gbps ≈ 125 MB/s; 10 Gbps ≈ 1.25 GB/s
Compression | Text: 5–10× (gzip); binary: 1–3×
Throughput & Bandwidth Ballparks
Single MySQL/Postgres node | ~1k–10k QPS reads, ~1k QPS writes
Redis (single node) | ~100k QPS
Kafka broker | ~1 GB/s ingest, ~100k msg/s
Nginx (single box) | ~50k req/s static content
App server (single box) | ~1k–10k QPS depending on workload
S3 / object store | Effectively unlimited; ~100 ms p50 GET
DNS lookup | ~20–120 ms cold; ~1 ms cached
TCP handshake (RTT) | ~1 ms LAN, ~30–100 ms WAN
TLS handshake | 1–2 extra RTTs
Mobile 4G RTT | ~50–100 ms