All-in-One
Every calculator on one page. Type shorthand like "10m" or "1.5b".
Users & Traffic
Turn DAU into requests per second.
ƒ QPS_avg = DAU × req/user/day ÷ 86,400 QPS_peak = QPS_avg × peak_ratio
Inputs
users
Type "10m" or "1.5b"
req
×
Web: 2–3×. Spiky: 5–10×.
: 1
Social 100:1. Logging 1:100.
Results
Total requests / day
—
Average QPS
—
Peak QPS
—
Peak read QPS
—
Peak write QPS
—
Storage
Disk needed for records over a retention window.
ƒ total = records/day × size × retention_days × (1 + index) × replicas
Inputs
rec
Tweet ≈ 300 B. Image ≈ 1 MB.
×
days
5 yr = 1,825 days · 1 yr = 365 days
frac
0.30 = +30%
Results
Per day
—
Per month
—
Per year
—
Raw over retention
—
Total provisioned
—with replication + overhead
Bandwidth
Network traffic in and out of your service.
ƒ bandwidth = QPS × payload_size (per direction)
Inputs
req/s
JSON API ~1 KB. Photo ~1 MB.
Webpage ~100 KB. Video chunk ~256 KB.
Results
Ingress (in)
——
Egress (out)
——
Total (in + out)
—
Per month
—
Per year
—
Throughput & Concurrency
How many in-flight requests does this load create?
ƒ Little's Law: L = λ × W (concurrency = QPS × latency)
Inputs
req/s
ms
srv
Results
Total in-flight requests
—across the whole fleet
Per-server QPS
—
Per-server concurrency
—threads/connections per box
CPU cores hint
—rough: concurrency/2 (I/O-bound)
Cache Sizing
Size a Redis / Memcached tier for the hot working set.
ƒ cache = items × hot_fraction × bytes_per_item × (1 + overhead)
Inputs
items
frac
0.20 = 80/20 rule
frac
0.20 = +20% for keys / metadata
Results
Hot items
——
Working set
—
Recommended cache size
—with overhead
RAM per Server
Memory for in-memory session / connection state.
ƒ per_server = users × bytes/user ÷ servers × (1 + headroom)
Inputs
users
srv
frac
0.30 = +30% safety
Results
Total session memory
—
Per server (steady)
—
Per server (with headroom)
—size boxes to at least this
Media Streaming
Bandwidth and storage for audio/video delivery.
ƒ peak_bw = viewers × bitrate storage = upload_hrs/day × bitrate × 3600s × versions × retention × replication
Inputs
viewers
Mbps
Audio ~0.128 · 720p ~2.5 · 1080p ~5 · 4K ~20
hrs/day
YouTube ≈ 720k hrs/day. Small platform ≈ 1k–10k.
×
Quality tiers to store (e.g. 360p/720p/1080p = 3)
days
×
Results
Peak streaming bandwidth
—total egress to all viewers
Daily data transferred
—
Storage per hour of content
—all encoding versions
Storage per day of uploads
—
Total storage over retention
—with replication
AI / LLM Inference
GPU count and memory for serving a large language model.
ƒ GPUs = max(⌈(weights + KV_cache) ÷ VRAM⌉, ⌈RPS × tokens ÷ GPU_tps⌉)
Model & Hardware
B params
Llama 3 8B · Llama 3 70B · GPT-4 ~1.8T
B/p
FP32=4 · FP16/BF16=2 · INT8=1 · INT4=0.5
GB
H100=80 · A100=40/80 · A10=24 · RTX 4090=24
tok/s
H100 ~3k · A100 ~2k · A10 ~500 (output tokens)
reqs
≈ context_len × 0.5 MB per 1K tokens (7B model)
req/s
tok
Input + output combined
Results
Model weight memory
—
KV cache (full batch)
—
Total GPU memory needed
—
GPUs for memory
—⌈total_mem ÷ VRAM⌉
GPUs for throughput
—⌈RPS × tokens ÷ GPU_tps⌉
Min GPU count
—memory- or throughput-bound
WebSocket / Real-time
Persistent connection bandwidth, fan-out, and server sizing.
ƒ outbound = connections × msg_rate × msg_size × fan_out servers = ⌈connections × mem_per_conn ÷ server_mem⌉
Inputs
conns
msg/s
Presence ~0.03 · Chat ~0.1 · Trading ~10
JSON ~200–2k B · Binary frame ~50–500 B
recv/msg
DM=1 · Group chat ~50 · Global broadcast=all
Node.js WS ~10–50 KB · Erlang/Go ~2–5 KB
GB
Results
Inbound bandwidth
—all clients → server
Outbound bandwidth
—server → clients (after fan-out)
Total bandwidth
—
Total connection memory
—
Max connections / server
—memory-bound
Servers needed
—
Frequently Asked Questions
How do I convert DAU to QPS?
Multiply DAU by requests per user per day, divide by 86,400 (seconds in a day) to get average QPS. Multiply by a peak factor (typically 2–3×) to get peak QPS. Example: 10M DAU × 20 req/day ÷ 86,400 × 3 = ~6,944 peak QPS.
What is Little's Law for concurrent requests?
Little's Law states L = λ × W: concurrency (in-flight requests) equals arrival rate (QPS) multiplied by average latency in seconds. At 10,000 QPS with 50ms latency, you have 500 concurrent requests in-flight at any moment.
How much storage does an app need?
Total storage = records/day × bytes/record × retention days × replication factor × (1 + index overhead). For 1M records/day at 1 KB each with 5-year retention, 3× replication, and 30% index overhead, you need roughly 13.5 TB.
How many GPUs do I need for a 7B LLM?
A 7B parameter model in FP16 requires ~14 GB of VRAM for weights, plus KV cache per concurrent request (~2 MB at 4K context). On a single H100 (80 GB), that fits easily. GPU count is often throughput-bound: at 10 RPS × 500 tokens ÷ 2,000 tokens/sec per H100, you need 3 GPUs.
How do I size a Redis or Memcached cache?
Use the 80/20 rule: 20% of items receive 80% of traffic. Cache size = total items × hot fraction × bytes per item × (1 + overhead). For 100M items at 1 KB each with 20% hot fraction and 20% overhead, you need ~24 GB of cache memory.
How much bandwidth does video streaming use?
Streaming bandwidth = concurrent viewers × bitrate per stream. 1080p video at 5 Mbps with 100,000 concurrent viewers requires 500 Gbps of egress — far beyond a single datacenter, which is why CDNs are essential for video at scale.
What is the read:write ratio and why does it matter?
Read:write ratio tells you how to architect your data layer. Social media apps are read-heavy (~100:1), logging pipelines are write-heavy (1:100). High read ratios suggest caching and read replicas; high write ratios suggest write-optimized databases like Cassandra or Kafka.
How do I calculate WebSocket server count?
Server count is usually memory-bound: servers = ⌈concurrent connections × memory per connection ÷ server RAM⌉. Node.js uses ~10–50 KB per WebSocket; Erlang and Go use ~2–5 KB. With 500K connections at 10 KB each on 64 GB servers, you need just 1 server for connection state.
Reference Cheatsheet
Numbers every engineer should know before the whiteboard.
Latency Numbers (Jeff Dean, updated for modern hardware)
| Operation | Time | Notes |
|---|---|---|
| L1 cache reference | 1 ns | |
| Branch mispredict | 3 ns | |
| L2 cache reference | 4 ns | ≈ 4× L1 |
| Mutex lock/unlock | 17 ns | |
| Main memory reference | 100 ns | ≈ 20× L2, 200× L1 |
| Compress 1 KB w/ Snappy | 2 µs | |
| Read 1 MB sequentially from memory | 3 µs | |
| Send 1 KB over 1 Gbps network | 10 µs | |
| Read 4 KB random from SSD | 16 µs | |
| Read 1 MB sequentially from SSD | 49 µs | |
| Round trip within same datacenter | 500 µs | 0.5 ms |
| Read 1 MB sequentially from disk | 825 µs | |
| Disk seek | 2 ms | |
| Packet CA → Netherlands → CA | 150 ms |
Powers of 2 — quick lookup
| Power | Value | Meaning |
|---|---|---|
| 210 | 1,024 | ≈ 1 thousand (Kilo) |
| 216 | 65,536 | max value of UInt16 |
| 220 | 1,048,576 | ≈ 1 million (Mega) |
| 230 | 1,073,741,824 | ≈ 1 billion (Giga) |
| 232 | 4,294,967,296 | max UInt32, ~4.3 B |
| 240 | 1,099,511,627,776 | ≈ 1 trillion (Tera) |
| 250 | 1.13 × 1015 | ≈ 1 quadrillion (Peta) |
| 263 | 9.22 × 1018 | max Int64 (signed) |
Common Assumptions
Day length86,400 sec ≈ 10⁵ — useful for QPS math
Month≈ 30 days for napkin math
Year≈ 3.15 × 10⁷ sec ≈ π × 10⁷
Replication factor3 (Cassandra, HDFS, Kafka default)
Read:Write ratioSocial 100:1, Read-heavy CMS 10:1, Logging 1:100
Peak-to-average2–3× typical; 5–10× for spiky / event-driven
Cache hit rate target80–95% (80/20 rule on hot keys)
Disk speed (SSD)500 MB/s sequential, 100k IOPS random
Network1 Gbps ≈ 125 MB/s; 10 Gbps ≈ 1.25 GB/s
CompressionText: 5–10× (gzip); binary: 1–3×
Throughput & Bandwidth Ballparks
Single MySQL/Postgres node~1k–10k QPS reads, ~1k QPS writes
Redis (single node)~100k QPS
Kafka broker~1 GB/s ingest, ~100k msg/s
Nginx (single box)~50k req/s static content
App server (single box)~1k–10k QPS depending on workload
S3 / object storeEffectively unlimited; ~100 ms p50 GET
DNS lookup~20–120 ms cold; ~1 ms cached
TCP handshake (RTT)~1 ms LAN, ~30–100 ms WAN
TLS handshake1–2 extra RTTs
Mobile 4G RTT~50–100 ms