How do I calculate QPS from DAU?

QPS_avg = DAU x requests_per_user_per_day / 86400. For peak QPS, multiply by a peak-to-average ratio (typically 2-3x). For example, 10M DAU with 20 requests/user/day gives 2,315 avg QPS and ~6,944 peak QPS at 3x.

How many GPUs do I need to serve a large language model?

You need enough GPUs for both memory and throughput. Memory GPUs = ceil((model_params_B x bytes_per_param x 1e9 + kv_cache_bytes) / gpu_vram_bytes). Throughput GPUs = ceil(target_rps x tokens_per_req / gpu_tokens_per_sec). Take the maximum. A 70B FP16 model needs at least 2 A100-80GB GPUs for memory, but throughput requirements usually dominate at scale.

What is Little's Law in system design?

Little's Law states L = lambda x W, where L is the average number of requests in the system, lambda is the arrival rate (QPS), and W is the average time a request spends in the system (latency). It lets you calculate concurrency: at 1,000 QPS with 200ms latency, there are 200 concurrent requests in flight at any given moment.

How much storage does a video streaming platform need?

Total storage = upload_hours_per_day x bitrate_Bps x 3600 x encoding_versions x retention_days x replication. For a YouTube-scale platform with 500 hours/day uploaded, 5 Mbps average, 5 encoding versions, 5-year retention, and 3 CDN replicas, you need hundreds of petabytes.

How many WebSocket servers do I need?

The bottleneck is usually memory, not CPU. Servers = ceil(concurrent_connections x mem_per_connection / server_RAM). At 10KB per connection on 64GB servers, each server handles about 6,500 connections. For 5 million concurrent connections you need roughly 770 servers.

Can I use this calculator for S3 or object storage?

Yes. In the Storage calculator, set replication = 1 (S3 manages redundancy internally) and index_overhead = 0. Enter your object count as records/day and average object size as bytes/record to get raw object storage requirements.

System Design Calculator

Users & Traffic

Turn DAU into requests per second.

QPS_avg = DAU × req/user/day ÷ 86,400 QPS_peak = QPS_avg × peak_ratio

Inputs

Daily Active Users users

Type "10m" or "1.5b"

Requests per user per day req

Peak-to-average ratio ×

Web: 2–3×. Spiky: 5–10×.

Read : Write ratio : 1

Social 100:1. Logging 1:100.

Results

Total requests / day

—

Average QPS

—

Peak QPS

—

Peak read QPS

—

Peak write QPS

—

Storage

Disk needed for records over a retention window.

total = records/day × size × retention_days × (1 + index) × replicas

Inputs

Records per day rec

Size per record

Tweet ≈ 300 B. Image ≈ 1 MB.

Replication factor ×

Retention days

5 yr = 1,825 days · 1 yr = 365 days

Index / metadata overhead frac

0.30 = +30%

Results

Per day

—

Per month

—

Per year

—

Raw over retention

—

Total provisioned

—with replication + overhead

Bandwidth

Network traffic in and out of your service.

bandwidth = QPS × payload_size (per direction)

Inputs

Peak QPS req/s

Avg request size

JSON API ~1 KB. Photo ~1 MB.

Avg response size

Webpage ~100 KB. Video chunk ~256 KB.

Results

Ingress (in)

——

Egress (out)

——

Total (in + out)

—

Per month

—

Per year

—

Throughput & Concurrency

How many in-flight requests does this load create?

Little's Law: L = λ × W (concurrency = QPS × latency)

Inputs

Target QPS (λ) req/s

Avg request latency (W) ms

Server count srv

Results

Total in-flight requests

—across the whole fleet

Per-server QPS

—

Per-server concurrency

—threads/connections per box

CPU cores hint

—rough: concurrency/2 (I/O-bound)

Cache Sizing

Size a Redis / Memcached tier for the hot working set.

cache = items × hot_fraction × bytes_per_item × (1 + overhead)

Inputs

Total items in dataset items

Hot fraction frac

0.20 = 80/20 rule

Size per item

Cache overhead frac

0.20 = +20% for keys / metadata

Results

Hot items

——

Working set

—

Recommended cache size

—with overhead

RAM per Server

Memory for in-memory session / connection state.

per_server = users × bytes/user ÷ servers × (1 + headroom)

Inputs

Concurrent users / sessions users

Size per session

Server count srv

Headroom frac

0.30 = +30% safety

Results

Total session memory

—

Per server (steady)

—

Per server (with headroom)

—size boxes to at least this

Media Streaming

Bandwidth and storage for audio/video delivery.

peak_bw = viewers × bitrate storage = upload_hrs/day × bitrate × 3600s × versions × retention × replication

Inputs

Concurrent viewers (peak) viewers

Bitrate per stream Mbps

Audio ~0.128 · 720p ~2.5 · 1080p ~5 · 4K ~20

Content uploaded per day hrs/day

YouTube ≈ 720k hrs/day. Small platform ≈ 1k–10k.

Encoding versions ×

Quality tiers to store (e.g. 360p/720p/1080p = 3)

Storage retention days

Replication factor ×

Results

Peak streaming bandwidth

—total egress to all viewers

Daily data transferred

—

Storage per hour of content

—all encoding versions

Storage per day of uploads

—

Total storage over retention

—with replication

AI / LLM Inference

GPU count and memory for serving a large language model.

GPUs = max(⌈(weights + KV_cache) ÷ VRAM⌉, ⌈RPS × tokens ÷ GPU_tps⌉)

Model & Hardware

Model size B params

Llama 3 8B · Llama 3 70B · GPT-4 ~1.8T

Bytes per parameter B/p

FP32=4 · FP16/BF16=2 · INT8=1 · INT4=0.5

GPU VRAM GB

H100=80 · A100=40/80 · A10=24 · RTX 4090=24

GPU throughput tok/s

H100 ~3k · A100 ~2k · A10 ~500 (output tokens)

Batch size (concurrent reqs) reqs

KV cache per request

≈ context_len × 0.5 MB per 1K tokens (7B model)

Target requests / sec req/s

Avg tokens per request tok

Input + output combined

Results

Model weight memory

—

KV cache (full batch)

—

Total GPU memory needed

—

GPUs for memory

—⌈total_mem ÷ VRAM⌉

GPUs for throughput

—⌈RPS × tokens ÷ GPU_tps⌉

Min GPU count

—memory- or throughput-bound

WebSocket / Real-time

Persistent connection bandwidth, fan-out, and server sizing.

outbound = connections × msg_rate × msg_size × fan_out servers = ⌈connections × mem_per_conn ÷ server_mem⌉

Inputs

Concurrent connections conns

Messages / connection / sec msg/s

Presence ~0.03 · Chat ~0.1 · Trading ~10

Message size

JSON ~200–2k B · Binary frame ~50–500 B

Fan-out ratio recv/msg

DM=1 · Group chat ~50 · Global broadcast=all

Memory per connection

Node.js WS ~10–50 KB · Erlang/Go ~2–5 KB

Server memory GB

Results

Inbound bandwidth

—all clients → server

Outbound bandwidth

—server → clients (after fan-out)

Total bandwidth

—

Total connection memory

—

Max connections / server

—memory-bound

Servers needed

—

Frequently Asked Questions

How do I convert DAU to QPS?

Multiply DAU by requests per user per day, divide by 86,400 (seconds in a day) to get average QPS. Multiply by a peak factor (typically 2–3×) to get peak QPS. Example: 10M DAU × 20 req/day ÷ 86,400 × 3 = ~6,944 peak QPS.

What is Little's Law for concurrent requests?

Little's Law states L = λ × W: concurrency (in-flight requests) equals arrival rate (QPS) multiplied by average latency in seconds. At 10,000 QPS with 50ms latency, you have 500 concurrent requests in-flight at any moment.

How much storage does an app need?

Total storage = records/day × bytes/record × retention days × replication factor × (1 + index overhead). For 1M records/day at 1 KB each with 5-year retention, 3× replication, and 30% index overhead, you need roughly 13.5 TB.

How many GPUs do I need for a 7B LLM?

A 7B parameter model in FP16 requires ~14 GB of VRAM for weights, plus KV cache per concurrent request (~2 MB at 4K context). On a single H100 (80 GB), that fits easily. GPU count is often throughput-bound: at 10 RPS × 500 tokens ÷ 2,000 tokens/sec per H100, you need 3 GPUs.

How do I size a Redis or Memcached cache?

Use the 80/20 rule: 20% of items receive 80% of traffic. Cache size = total items × hot fraction × bytes per item × (1 + overhead). For 100M items at 1 KB each with 20% hot fraction and 20% overhead, you need ~24 GB of cache memory.

How much bandwidth does video streaming use?

Streaming bandwidth = concurrent viewers × bitrate per stream. 1080p video at 5 Mbps with 100,000 concurrent viewers requires 500 Gbps of egress — far beyond a single datacenter, which is why CDNs are essential for video at scale.

What is the read:write ratio and why does it matter?

Read:write ratio tells you how to architect your data layer. Social media apps are read-heavy (~100:1), logging pipelines are write-heavy (1:100). High read ratios suggest caching and read replicas; high write ratios suggest write-optimized databases like Cassandra or Kafka.

How do I calculate WebSocket server count?

Server count is usually memory-bound: servers = ⌈concurrent connections × memory per connection ÷ server RAM⌉. Node.js uses ~10–50 KB per WebSocket; Erlang and Go use ~2–5 KB. With 500K connections at 10 KB each on 64 GB servers, you need just 1 server for connection state.

Latency Numbers (Jeff Dean, updated for modern hardware)

Operation	Time	Notes
L1 cache reference	1 ns
Branch mispredict	3 ns
L2 cache reference	4 ns	≈ 4× L1
Mutex lock/unlock	17 ns
Main memory reference	100 ns	≈ 20× L2, 200× L1
Compress 1 KB w/ Snappy	2 µs
Read 1 MB sequentially from memory	3 µs
Send 1 KB over 1 Gbps network	10 µs
Read 4 KB random from SSD	16 µs
Read 1 MB sequentially from SSD	49 µs
Round trip within same datacenter	500 µs	0.5 ms
Read 1 MB sequentially from disk	825 µs
Disk seek	2 ms
Packet CA → Netherlands → CA	150 ms

Powers of 2 — quick lookup

Power	Value	Meaning
2¹⁰	1,024	≈ 1 thousand (Kilo)
2¹⁶	65,536	max value of UInt16
2²⁰	1,048,576	≈ 1 million (Mega)
2³⁰	1,073,741,824	≈ 1 billion (Giga)
2³²	4,294,967,296	max UInt32, ~4.3 B
2⁴⁰	1,099,511,627,776	≈ 1 trillion (Tera)
2⁵⁰	1.13 × 10¹⁵	≈ 1 quadrillion (Peta)
2⁶³	9.22 × 10¹⁸	max Int64 (signed)

Common Assumptions

Day length86,400 sec ≈ 10⁵ — useful for QPS math

Month≈ 30 days for napkin math

Year≈ 3.15 × 10⁷ sec ≈ π × 10⁷

Replication factor3 (Cassandra, HDFS, Kafka default)

Read:Write ratioSocial 100:1, Read-heavy CMS 10:1, Logging 1:100

Peak-to-average2–3× typical; 5–10× for spiky / event-driven

Cache hit rate target80–95% (80/20 rule on hot keys)

Disk speed (SSD)500 MB/s sequential, 100k IOPS random

Network1 Gbps ≈ 125 MB/s; 10 Gbps ≈ 1.25 GB/s

CompressionText: 5–10× (gzip); binary: 1–3×

Throughput & Bandwidth Ballparks

Single MySQL/Postgres node~1k–10k QPS reads, ~1k QPS writes

Redis (single node)~100k QPS

Kafka broker~1 GB/s ingest, ~100k msg/s

Nginx (single box)~50k req/s static content

App server (single box)~1k–10k QPS depending on workload

S3 / object storeEffectively unlimited; ~100 ms p50 GET

DNS lookup~20–120 ms cold; ~1 ms cached

TCP handshake (RTT)~1 ms LAN, ~30–100 ms WAN

TLS handshake1–2 extra RTTs

Mobile 4G RTT~50–100 ms