A video streaming platform like YouTube or Netflix must handle massive storage (hundreds of terabytes of daily uploads), complex multi-format transcoding, global CDN delivery with minimal buffering, and ML-driven recommendations that keep users engaged. The upload pipeline and the playback pipeline are effectively two separate systems with very different constraints — upload is a throughput-optimized batch processing problem; playback is a latency-optimized read path problem. Getting the boundary between these two right is the central design challenge.

⚡ Quick Takeaways
  • TCP over UDP for on-demand — retransmission eliminates packet-loss artifacts; UDP dominates live streaming where latency beats quality.
  • Upload pipeline via Kafka — raw video fans out to content filtering, classification, and multi-resolution transcoding in parallel before landing in blob storage.
  • HLS/DASH for ABR — video is split into 2–10 second segments at multiple bitrates; the client's ABR algorithm picks the quality tier that matches available bandwidth in real time.
  • Blob storage (S3) + Cassandra — binary files in object storage; video metadata (fixed query patterns by ID) in Cassandra for high write throughput.
  • CDN + lazy segment loading — serve only the segments being watched; dramatically cuts bandwidth and startup latency.
  • Recommendations offline — pre-compute from Kafka-buffered watch signals; serve from key-value store at O(1) lookup.
  • Redis cache-aside for auth — load on first miss; lazy loading avoids caching data that is never requested.
  • Signed URLs for playback auth — time-limited tokens in CDN URLs prevent hotlinking and enforce access control without touching the video bytes.
tldr

Video uploads flow through a Kafka-backed pipeline of content filtering, classification, and multi-resolution transcoding before landing in blob storage and Cassandra metadata. Transcoded segments are served from geographically distributed CDNs using HLS/DASH adaptive bitrate streaming. Recommendations are computed offline and stored in key-value NoSQL. TCP is chosen over UDP for reliability — no packet loss means no buffering artifacts.

High-level video streaming platform architecture
High-level video streaming platform architecture

Step 1 — Clarify Requirements

Video streaming covers a wide surface area. In an interview, scope explicitly to avoid spending 45 minutes on a system you can't finish.

Functional

Non-Functional

Step 2 — Capacity Estimation

Assume a large platform in the YouTube tier: 10 million daily active users (DAU), each watching ~10 videos per day; 1% are content creators uploading ~2 videos per day each.

Upload Volume

Playback Volume

Storage Growth

Step 3 — API Design

Two distinct API surfaces: the upload API (write-heavy, async processing) and the playback API (read-heavy, latency-sensitive).

HTTP
# Upload: initiate a resumable upload
POST /api/videos/upload/init
     { title, description, file_size_bytes, content_type }
     → { upload_id, upload_url }  (pre-signed S3 URL)

# Upload: upload chunk (resumable, idempotent)
PUT  {upload_url}
     Content-Range: bytes 0-5242879/104857600
     → 308 Resume Incomplete  |  200 OK (final chunk)

# Check processing status
GET  /api/videos/{video_id}/status
     → { status: "processing"|"ready"|"failed", progress_pct }

# Playback: get video manifest
GET  /api/videos/{video_id}/manifest
     → { hls_url, dash_url, thumbnail_url, duration_s, title }

# Playback: HLS master playlist (served from CDN)
GET  https://cdn.example.com/v/{video_id}/master.m3u8
     → HLS playlist listing all quality variants with signed URLs

# Watch history / resume
POST /api/videos/{video_id}/progress
     { user_id, position_s, event: "pause"|"seek"|"complete" }

# Homepage feed
GET  /api/feed?user_id=123&page_token=abc
     → { videos: [...], next_page_token }

Two design choices worth calling out: upload uses resumable chunked upload (equivalent to Google's resumable upload protocol or AWS S3 multipart upload) so a mobile creator on a flaky connection can resume mid-upload without restarting. Playback returns a signed CDN URL in the manifest — the video bytes are served directly from CDN edge nodes, never through application servers.

Step 4 — Protocol: TCP vs. UDP vs. QUIC

YouTube and Netflix adopted TCP as the transport protocol primarily to guarantee video quality — TCP's retransmission ensures zero packet loss, eliminating the pixelation or blocking artifacts that dropped UDP packets cause. Security is a secondary benefit. The tradeoff is slightly higher latency compared to UDP, which is acceptable for on-demand streaming where a 2–10 second buffer absorbs jitter.

ProtocolLatencyReliabilityUse case
TCPModerateGuaranteed delivery, orderedOn-demand VOD streaming — quality beats latency
UDPLowBest-effort, may drop packetsLive streaming, video calls — latency beats quality
QUICLow (0-RTT)Reliable over UDP, multiplexed streamsEmerging: YouTube uses it to reduce buffering on lossy networks
WebRTCVery lowUDP-based, browser-nativeReal-time 1:1 or small group video (Zoom, Meets)

Google's QUIC protocol deserves a mention: it achieves TCP-like reliability over UDP, eliminates TCP head-of-line blocking across streams, and supports 0-RTT connection resumption. YouTube has been migrating to QUIC and reports measurable reductions in buffering events, especially on mobile networks with frequent IP address changes (handoffs between WiFi and cellular).

Video upload pipeline flow
Video upload pipeline flow

Step 5 — Upload Pipeline

The upload path is the most complex flow in the system. A raw video file goes through multiple transformations before it is ready for viewers. The key design principle is fan-out through a message queue: as soon as the raw file lands in staging storage, a Kafka event triggers all downstream processing steps in parallel, so content filtering, classification, and transcoding all run concurrently rather than sequentially.

flow
Creator uploads raw video (chunked, resumable)
  → Content Onboarding Initializer Service
      - Validates file integrity (checksum)
      - Writes PENDING record to Cassandra
      - Moves raw file to staging S3 bucket
  → Publishes VideoUploaded event to Kafka
      ├── Multi-part Resolver: chunks file into ~5 min GOP segments
      ├── Content Filter: screens for illegal/inappropriate material
      │     └── On violation: mark REJECTED, notify creator
      ├── Classifier & Tagger: generates tags, categories, language
      │     └── ML model: visual + audio + transcript signals
      └── Transcoder (parallel workers):
            ├── 360p  H.264  ~400 Kbps
            ├── 480p  H.264  ~800 Kbps
            ├── 720p  H.264  ~2.5 Mbps
            ├── 1080p H.264  ~5 Mbps
            └── 4K   HEVC   ~15 Mbps
                  └── Each resolution → segment files in S3
                        └── Content Service writes metadata to Cassandra
                              → Elasticsearch indexing
                                    → Status: READY

Chunking the upload into parallel parts through the Multi-part Resolver dramatically reduces total processing time for large files. Each GOP (Group of Pictures) segment can be transcoded independently and in parallel across a fleet of transcoding workers. A 2-hour movie that takes 6 hours to transcode sequentially can be completed in under 30 minutes with enough parallel workers — critical for YouTube's promise of near-instant availability after upload.

Resumable Upload Implementation

On mobile networks, a 500 MB upload will almost certainly be interrupted. The resumable upload protocol uses pre-signed chunk URLs from S3 and tracks which chunks are confirmed in a Redis bitmap (indexed by chunk number). On reconnect, the client queries which chunks are already received and resumes from the first missing chunk. The server-side chunking state is stored in Redis with a TTL of 24 hours — after which an incomplete upload is abandoned.

Step 6 — Transcoding and ABR Streaming

Transcoding is the most resource-intensive step in the pipeline. The goal is to produce multiple quality representations of each video, split into short segments, formatted for adaptive bitrate (ABR) delivery.

HLS vs. DASH

ProtocolSegment formatManifestSupport
HLS (HTTP Live Streaming)MPEG-TS or fMP4.m3u8 playlistNative on iOS/macOS/Safari; widely supported
DASH (Dynamic Adaptive Streaming)MP4 fragments (fMP4)MPD XML manifestAndroid, Chrome, smart TVs; not native on iOS

In practice, large platforms transcode into both HLS and DASH to cover all client types. The segment files themselves (fMP4 chunks) can often be shared between HLS and DASH since both support fragmented MP4 — only the manifest files differ. Segment duration is typically 2–6 seconds: shorter segments allow faster quality switches but increase manifest overhead and the number of HTTP requests per minute of video.

ABR Algorithm

The client player implements an ABR (Adaptive Bitrate) algorithm that continuously monitors available bandwidth and buffer occupancy to select the optimal quality segment to fetch next. The two dominant approaches:

pseudocode
-- Buffer-based ABR (simplified)
FUNCTION select_next_quality(buffer_s, available_bitrates):
  IF buffer_s > 30:
    RETURN step_up(current_quality, available_bitrates)
  ELIF buffer_s < 10:
    RETURN step_down(current_quality, available_bitrates)
  ELSE:
    RETURN current_quality  -- stable zone, hold current

-- Segment fetch loop
WHILE playing:
  quality = select_next_quality(buffer.seconds_remaining(), bitrates)
  segment_url = manifest.segment_url(next_segment_idx, quality)
  fetch_async(segment_url)  -- from CDN edge
  next_segment_idx += 1
Transcoding pipeline and multi-resolution output
Transcoding pipeline and multi-resolution output

Step 7 — Data Model and Storage Architecture

Storage selection is driven by access patterns. Video streaming has a particularly sharp split: binary video segments belong in object storage; structured metadata belongs in a database.

Data TypeStorageRationale
User accounts, billingSQL (RDBMS)ACID compliance for authentication and financial data
Video metadataCassandra (NoSQL)High write throughput; fixed query patterns (by video ID, by creator)
Video segmentsBlob storage (S3/GCS)Object store purpose-built for large binary content at petabyte scale
RecommendationsKey-value NoSQL (DynamoDB/Redis)Fast point lookups by user ID; updated via offline batch jobs
Search indexElasticsearchFull-text and fuzzy search with type-ahead
Watch progressCassandraHigh write throughput (continuous position updates); keyed by (user_id, video_id)
Upload chunk stateRedis (TTL 24h)Temporary, fast bitmap for tracking which chunks are confirmed

Cassandra Data Model for Video Metadata

CQL
CREATE TABLE videos (
  video_id      UUID PRIMARY KEY,
  creator_id    UUID,
  title         TEXT,
  description   TEXT,
  status        TEXT,          -- PENDING|PROCESSING|READY|FAILED
  duration_s    INT,
  s3_prefix     TEXT,          -- base path in S3 for all segments
  hls_manifest  TEXT,          -- CDN URL to master.m3u8
  dash_manifest TEXT,          -- CDN URL to manifest.mpd
  thumbnail_url TEXT,
  tags          LIST<TEXT>,
  created_at    TIMESTAMP,
  updated_at    TIMESTAMP
);

-- Watch progress: (user, video) → last known position
CREATE TABLE watch_progress (
  user_id       UUID,
  video_id      UUID,
  position_s    INT,
  updated_at    TIMESTAMP,
  PRIMARY KEY   (user_id, video_id)
);

Step 8 — User Authentication

The authentication flow uses a standard RESTful API service backed by a SQL database for ACID guarantees. A Redis cache with Cache-Aside (lazy loading) pattern sits in front of the SQL database: data is loaded into cache on first miss and served from cache on subsequent reads. Lazy loading is preferred over write-through here because it avoids caching data that is never actually requested — most user accounts are never queried after registration; only active session tokens matter for the hot path.

Session tokens are short-lived JWTs (15-minute access tokens + 7-day refresh tokens). On token refresh, the refresh token is validated against Redis (where active refresh tokens are stored) and a new access token is issued without touching the SQL database. This keeps the auth hot path entirely in Redis.

Step 9 — Content Delivery and Playback Authorization

Video files are served through a globally distributed CDN network. A Host Optimizer service directs each client to the nearest CDN node based on geographic proximity and current server load. The CDN layer applies lazy segment loading — only the segments the user is currently watching are fetched, rather than the full video file, reducing bandwidth costs and startup latency.

Playback Authorization with Signed URLs

Access control for video playback cannot rely on the CDN checking tokens on every segment request — that would saturate the application servers. Instead, the solution is CDN-signed URLs:

  1. Client authenticates to the application server and requests a video manifest.
  2. Application server generates a short-lived signed token (signed with a CDN-shared secret), embeds it in the manifest's segment URLs, and returns the manifest.
  3. Client fetches segments directly from CDN. The CDN validates the signature without any application server involvement — zero overhead on the hot path.
  4. Tokens expire after ~1 hour, preventing hotlinking and enforcing subscription validity.
pseudocode
-- Server-side: generate signed URL for a segment
FUNCTION sign_cdn_url(base_url, user_id, expires_at, secret):
  policy = {
    "url": base_url,
    "user_id": user_id,
    "exp": expires_at
  }
  signature = hmac_sha256(json_encode(policy), secret)
  RETURN base_url + "?token=" + base64url(policy) + "&sig=" + signature

-- CDN edge: validate on each segment request
FUNCTION validate_cdn_request(request, secret):
  policy, sig = parse_token(request.query.token, request.query.sig)
  IF policy.exp < now():  RETURN 401
  expected = hmac_sha256(json_encode(policy), secret)
  IF sig != expected:       RETURN 403
  RETURN 200               -- serve the segment

Step 10 — Homepage, Search, and Recommendations

The homepage combines two data sources:

Recommendation Architecture

Recommendations are computed offline from watch history, watch duration, and interaction signals (likes, shares, re-watches, skips). These signals are fed into Kafka from the playback service and used to train recommendation models in batch. The architecture has two components:

Pre-computed recommendations for each user are stored in DynamoDB keyed by user_id and refreshed every 1–6 hours by the offline batch pipeline. This gives O(1) lookup at serve time with no inference latency on the critical homepage path.

Step 11 — Playback Tracking

During playback, the client continuously logs watch status and position. This data flows into Kafka and drives multiple downstream consumers:

Step 12 — Scaling, Fault Tolerance, and Live Streaming

Scaling the Transcoding Pipeline

Transcoding is CPU-intensive and naturally parallelizable. At 100 TB/day of ingest, the transcoding fleet must be elastic:

CDN Fault Tolerance

Live Streaming vs. VOD

DimensionVOD (on-demand)Live streaming
ProtocolTCP (HLS/DASH)UDP or QUIC (WebRTC for ultra-low latency)
Segment availabilityAll segments pre-generatedSegments generated in real time; manifest is rolling
CDN cachingHigh cache hit rate (same segments served many times)Each segment is unique and expires after a few minutes
TranscodingOffline, parallel, retryableReal-time, single pass, must finish before segment deadline
Latency target2–10 sec startup OK<3 sec for sports/events; <30 sec for standard live

For live streaming, the transcoding path must process each GOP segment in real time with a strict deadline. A dedicated real-time transcoding cluster (distinct from the VOD fleet) handles live streams, using lower-latency encoding presets (faster but slightly lower quality) to meet the deadline. The rolling manifest is a special HLS/DASH variant (low-latency HLS) that publishes new segments every 0.5–2 seconds instead of the standard 2–6 seconds.

Key Tradeoffs

DecisionChoiceTrade-off accepted
Transport protocol (VOD)TCP (HLS/DASH)Higher latency than UDP; acceptable because 2–10s buffer absorbs jitter
RecommendationsOffline pre-computeRecommendations are 1–6 hours stale; catches most behavior changes
Video metadata storeCassandraNo relational joins; query patterns must be known at schema design time
ABR segment length4–6 secondsShorter = faster quality switches but more HTTP requests and manifest overhead
Playback authCDN signed URLsToken expiry means brief re-auth on very long sessions; app server not on hot path
Transcoding parallelismGOP-parallel workersMust ensure GOP boundaries align; slight overhead versus per-file sequential
takeaway

Video streaming is two systems: upload (complex multi-stage pipeline optimized for throughput, correctness, and parallel processing) and playback (optimized for latency via CDN, ABR, and lazy segment loading). TCP's no-packet-loss guarantee is worth the latency tradeoff for on-demand streaming. Recommendations belong offline — pre-compute once, serve fast from key-value store, retrain continuously from Kafka-buffered behavior signals. CDN signed URLs are the right way to enforce playback auth without application servers on the hot path.

🎯 interview hot-takes

Why TCP over UDP for on-demand video? TCP retransmission eliminates dropped-packet artifacts (pixelation, blocking); the slight latency increase is acceptable for buffered on-demand content because the player maintains a 2–10 second buffer. UDP is preferred for live streaming where low latency matters more than perfect quality.
Why store recommendations in a key-value store rather than computing them on-the-fly? Recommendation models are expensive to run; pre-computing offline and serving from a key-value store by user ID gives O(1) lookup with no inference latency on the critical path.
Why use Cassandra for video metadata? Queries are fixed-pattern (by video ID, by creator), write throughput is high (status updates, view count increments), and there are no relational joins needed — Cassandra's wide-column model is ideal.
How does ABR work? Video is split into 2–6 second segments at multiple bitrates. The client's ABR algorithm monitors buffer occupancy and measured throughput, selecting the highest quality tier that can be downloaded without rebuffering.
How do you authorize video playback without overloading application servers? Use CDN-signed URLs embedded in the manifest. The CDN validates the HMAC signature at the edge without any application server involvement — auth happens once at manifest fetch time, not on every segment.

← previous
Design a Map Web App