Chat, live notifications, collaborative editing, multiplayer games, trading dashboards — all need the server to push data to the client the instant something happens. Plain HTTP can't: it's request/response, always initiated by the client. WebSockets solve this with a persistent, full-duplex connection over which either side can send at any time. Understanding the options — polling, long-polling, SSE, and WebSockets — and how to scale stateful connections is the core of any real-time design, like our chat app.
- HTTP is client-initiated request/response — the server can't push, so real-time needs another approach.
- WebSockets give a persistent, full-duplex connection — after an HTTP upgrade handshake, both sides send freely over one long-lived TCP connection.
- The handshake starts as HTTP with an Upgrade header → 101 Switching Protocols → the connection becomes ws:// or wss://.
- Know the alternatives: long-polling (hacky push) and SSE (server→client only, simpler); pick by whether you need bidirectional communication.
- Scaling is the hard part — connections are stateful and long-lived; you need many open sockets, sticky routing, and a pub/sub backplane to fan out across servers.
- Heartbeats + reconnection keep connections healthy and recover from drops.
WebSockets upgrade an HTTP connection into a persistent, full-duplex channel so the server can push to clients in real time. For one-way server→client streams, SSE is simpler; for true bidirectional, use WebSockets; long-polling is the fallback. The real engineering challenge is scale: connections are stateful and long-lived, so you need sticky routing, capacity for huge numbers of idle sockets, and a pub/sub backplane (e.g. Redis) to deliver a message to clients connected to other servers. Add heartbeats and reconnection for resilience.
The Problem: HTTP Is One-Way
HTTP's model is simple and scalable: a client sends a request, the server responds, done. But it means the server has no way to initiate — it can only answer. For anything where the server needs to tell the client "a new message arrived" or "the price changed," that's a fundamental mismatch. The naive workaround is polling (the client asks "anything new?" every few seconds), which is wasteful — mostly empty responses — and laggy. The history of real-time on the web is a series of increasingly better answers to this.
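To put rough numbers on "wasteful and laggy", here is a back-of-the-envelope sketch. The function name, the client count, the poll interval, and the event rate are all illustrative assumptions, and the empty-response figure is a simple estimate that treats each poll window as containing at most one event.

```python
def polling_cost(clients: int, interval_s: float, events_per_client_per_s: float) -> dict:
    """Estimate the load and lag of short polling (illustrative model only)."""
    requests_per_s = clients / interval_s
    # Expected events per poll window; when this is below 1, most polls return nothing.
    events_per_poll = events_per_client_per_s * interval_s
    useful_fraction = min(1.0, events_per_poll)
    return {
        "requests_per_s": requests_per_s,
        "empty_fraction": 1.0 - useful_fraction,
        # An event arrives at a random point in the interval, so it waits
        # half an interval on average and a full interval at worst.
        "avg_added_latency_s": interval_s / 2,
        "max_added_latency_s": interval_s,
    }


# Hypothetical workload: 100,000 clients polling every 5 s,
# each receiving about one event per minute.
cost = polling_cost(clients=100_000, interval_s=5, events_per_client_per_s=1 / 60)
print(cost)
```

Under these assumed numbers the servers field 20,000 requests per second, roughly eleven out of twelve responses carry nothing, and a new event still waits 2.5 seconds on average before the client sees it.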
The Evolution: Polling → Long-Polling → SSE → WebSockets
- Short polling — client requests on a timer. Simple, but wastes requests and adds latency up to the poll interval.
- Long polling — the client requests and the server holds the request open until it has data (or a timeout), then the client immediately re-requests. Near-real-time, works everywhere, but it's a hack with connection churn and overhead.
- Server-Sent Events (SSE) — a standardized one-way stream: the server pushes a continuous event stream over a single long-lived HTTP response. Simple, auto-reconnects, but server→client only.
- WebSockets — a persistent, full-duplex connection where both sides push freely. The most capable, and the right tool when the client also needs to send frequently.
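As an illustration of why SSE counts as the simple option, the stream is plain text: each event is a few "field: value" lines followed by a blank line. The sketch below serializes one event in that format; the helper name `format_sse` is an assumption for this example, not part of any library.

```python
from typing import Optional


def format_sse(data: str, event: Optional[str] = None, event_id: Optional[str] = None) -> str:
    """Serialize one Server-Sent Event in the text/event-stream wire format."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if event is not None:
        lines.append(f"event: {event}")
    # Every line of the payload gets its own "data:" field.
    for chunk in data.split("\n"):
        lines.append(f"data: {chunk}")
    # A blank line terminates the event.
    return "\n".join(lines) + "\n\n"


print(format_sse("price=101.5", event="tick", event_id="7"), end="")
```

A server would write strings like this to a response with Content-Type text/event-stream and keep the response open. The id field is what makes the built-in reconnect useful: the browser sends the last id it saw back in a Last-Event-ID header when it reconnects.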
What WebSockets Are
A WebSocket is a single, long-lived TCP connection that supports full-duplex communication — both client and server can send messages at any time, independently, with low overhead per message (no HTTP headers per message). It begins life as an ordinary HTTP request so it works through existing web infrastructure, then upgrades: once established, it's a bare bidirectional message pipe (ws://, or wss:// over TLS).
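# client → server (looks like HTTP)
GET /chat HTTP/1.1
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Sec-WebSocket-Version: 13
# server → client
HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
# proof the server understood the upgrade: derived from the client's key
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
# → connection is now a full-duplex WebSocket; either side sends anytime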
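One detail of the handshake is easy to show in code. A complete 101 response also carries a Sec-WebSocket-Accept header, which the server derives from the client's Sec-WebSocket-Key so the client can tell it is talking to a real WebSocket server and not a cache or a confused HTTP server. A minimal sketch of that derivation as defined in RFC 6455 (the function name is an assumption for this example):

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455; every WebSocket server uses the same value.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def websocket_accept(sec_websocket_key: str) -> str:
    """Compute the Sec-WebSocket-Accept value for a client's Sec-WebSocket-Key."""
    digest = hashlib.sha1((sec_websocket_key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


# The key from the request above is the sample key used in RFC 6455.
print(websocket_accept("dGhlIHNhbXBsZSBub25jZQ=="))  # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

In practice a WebSocket library or framework does this for you; the sketch is only meant to show that the handshake is a small, mechanical step on top of HTTP.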
WebSockets vs SSE vs Long-Polling
| Aspect | Long-Polling | SSE | WebSockets |
|---|---|---|---|
| Direction | Server→client (faked) | Server→client only | Full-duplex |
| Connection | Repeated requests | One long-lived HTTP | One persistent socket |
| Overhead | High (re-connect churn) | Low | Lowest per message |
| Complexity | Low | Low (built into browsers) | Higher |
| Use for | Legacy fallback | Feeds, notifications, tickers | Chat, collab, games |
The decision rule: if the client only needs to receive updates, SSE is simpler and rides on plain HTTP. If the client also sends frequently (typing in a chat, moving in a game), use WebSockets. Long-polling is the universal fallback when neither is available.
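The decision rule can be written down as a small function. This is only a sketch of the rule as stated here; the names `Transport` and `choose_transport` and the two availability flags are assumptions for illustration.

```python
from enum import Enum


class Transport(Enum):
    LONG_POLLING = "long-polling"
    SSE = "sse"
    WEBSOCKET = "websocket"


def choose_transport(
    client_sends_frequently: bool,
    sse_available: bool = True,
    websocket_available: bool = True,
) -> Transport:
    """Pick a real-time transport following the rule of thumb in the text."""
    if client_sends_frequently:
        # Bidirectional traffic: WebSockets, or fall back if they are blocked.
        return Transport.WEBSOCKET if websocket_available else Transport.LONG_POLLING
    if sse_available:
        # Receive-only clients: SSE is simpler and rides on plain HTTP.
        return Transport.SSE
    # Receive-only but no SSE: a WebSocket still works, else the universal fallback.
    return Transport.WEBSOCKET if websocket_available else Transport.LONG_POLLING


print(choose_transport(client_sends_frequently=True))   # Transport.WEBSOCKET
print(choose_transport(client_sends_frequently=False))  # Transport.SSE
```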
Scaling WebSockets: the Real Challenge
Anyone can open a WebSocket; the hard part is operating millions of them. Unlike stateless HTTP requests that any server can handle, a WebSocket is a stateful, long-lived connection pinned to one specific server. That creates several problems:
- Many idle connections — a chat service may hold millions of mostly-idle sockets; each consumes memory and a file descriptor, so connection density per server matters.
- Sticky routing — the load balancer must keep a client pinned to the server holding its connection (and must support the WebSocket upgrade in the first place — see load balancing).
- The fan-out problem — the killer issue, below.
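Connection density can be estimated with simple arithmetic: a server holds as many sockets as the tighter of its memory budget and its file-descriptor limit allows. The sketch below assumes hypothetical figures (16 GiB set aside for sockets, about 20 KiB per idle connection, a file-descriptor limit of 1,048,576, and 70% target utilization); measure your own stack before relying on any of them.

```python
def max_connections(ram_bytes: int, bytes_per_connection: int,
                    fd_limit: int, reserved_fds: int = 1024) -> int:
    """Connections one server can hold: the tighter of memory and file descriptors."""
    by_memory = ram_bytes // bytes_per_connection
    by_fds = fd_limit - reserved_fds  # leave descriptors for logs, backplane, etc.
    return max(0, min(by_memory, by_fds))


def servers_needed(total_connections: int, per_server: int,
                   headroom_percent: int = 70) -> int:
    """Servers required if each runs at headroom_percent of its maximum."""
    usable = per_server * headroom_percent // 100
    if usable <= 0:
        raise ValueError("per-server capacity is too small")
    return -(-total_connections // usable)  # ceiling division


per_server = max_connections(
    ram_bytes=16 * 1024**3,          # assumed memory budget for sockets
    bytes_per_connection=20 * 1024,  # assumed cost of one idle connection
    fd_limit=1_048_576,              # assumed per-process descriptor limit
)
print(per_server)                             # 838860 (memory-bound here)
print(servers_needed(5_000_000, per_server))  # 9
```

Running below full capacity matters because a restarting server sheds all of its connections at once, and the remaining servers have to absorb the reconnects.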
The Fan-Out Problem and the Backplane
Suppose Alice and Bob are in the same chat room but their WebSockets are connected to different servers. When Alice sends a message, the server holding her connection has no direct way to reach Bob's connection on another server. The solution is a pub/sub backplane: servers publish incoming messages to a shared bus (commonly Redis pub/sub or Kafka), and every server subscribed to that room receives the message and pushes it down to its own connected clients.
Alice ──ws──▶ WS-server-1 ──publish "room42"──▶ ┌───────────────┐
                                                │ Redis pub/sub │
Bob   ──ws──▶ WS-server-2 ◀──subscribe "room42"─┤  (backplane)  │
                  │                             └───────────────┘
                  └──push──▶ Bob (server-2 delivers to its own client)
every WS server subscribes to the rooms its clients are in;
the backplane bridges connections that live on different servers
This backplane is the defining piece of WebSocket scaling — without it, horizontal scaling breaks because messages can't cross server boundaries.
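The Alice and Bob scenario can be simulated in a few lines. This sketch uses an in-process `Backplane` class as a stand-in for a real bus such as Redis pub/sub, and records "pushed" messages in a dictionary where a real server would write to a socket; all class and method names here are assumptions for illustration.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List, Set

Handler = Callable[[str, str], None]


class Backplane:
    """In-process stand-in for a shared pub/sub bus (e.g. Redis pub/sub)."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, channel: str, handler: Handler) -> None:
        self._subscribers[channel].append(handler)

    def publish(self, channel: str, message: str) -> None:
        for handler in list(self._subscribers[channel]):
            handler(channel, message)


class WSServer:
    """One WebSocket server; it only knows about its own connected clients."""

    def __init__(self, name: str, backplane: Backplane) -> None:
        self.name = name
        self.backplane = backplane
        self.rooms: DefaultDict[str, Set[str]] = defaultdict(set)       # room -> local clients
        self.delivered: DefaultDict[str, List[str]] = defaultdict(list)  # client -> pushed messages

    def connect(self, client: str, room: str) -> None:
        if not self.rooms[room]:
            # First local client in this room: subscribe to the room once.
            self.backplane.subscribe(room, self._on_backplane_message)
        self.rooms[room].add(client)

    def on_client_message(self, room: str, text: str) -> None:
        # Publish instead of delivering directly, so every server
        # (including this one) fans the message out to its own clients.
        self.backplane.publish(room, text)

    def _on_backplane_message(self, room: str, text: str) -> None:
        for client in sorted(self.rooms[room]):
            self.delivered[client].append(text)  # stands in for a socket write


bus = Backplane()
server1, server2 = WSServer("ws-1", bus), WSServer("ws-2", bus)
server1.connect("alice", "room42")
server2.connect("bob", "room42")
server1.on_client_message("room42", "alice: hi")
print(server2.delivered["bob"])  # ['alice: hi']
```

Note that each server subscribes only to rooms it has clients in, so a message is delivered to the servers that need it and no others. A real bus adds concerns this sketch ignores, such as unsubscribing when the last client leaves a room and what happens to messages published while a server is disconnected from the bus.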
Heartbeats and Reconnection
Long-lived connections die silently — a laptop sleeps, a network blips, a proxy times out an idle connection — often without a clean close. Two mechanisms keep things healthy. Heartbeats (WebSocket ping/pong frames at intervals) detect dead connections so the server can reclaim resources and the client knows to reconnect. Reconnection logic on the client re-establishes the socket after a drop, ideally with exponential backoff (to avoid a thundering herd when a server restarts) and a way to resume — replaying missed messages since the last received ID so nothing is lost across the gap.
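The three pieces described above (a staleness check driven by heartbeats, backoff with jitter, and resume by last received ID) can be sketched as follows. The names, the 20-second ping interval, the backoff base and cap, and the buffer size are all illustrative assumptions; the backoff shown is the "full jitter" variant, one of several reasonable choices.

```python
import random
from collections import deque
from typing import Deque, List, Optional, Tuple


def is_stale(last_pong_at: float, now: float,
             ping_interval_s: float = 20.0, missed_allowed: int = 2) -> bool:
    """Treat a connection as dead once it has missed several heartbeats in a row."""
    return now - last_pong_at > ping_interval_s * missed_allowed


def backoff_delay(attempt: int, base_s: float = 0.5, cap_s: float = 30.0,
                  rng: Optional[random.Random] = None) -> float:
    """Full-jitter exponential backoff: uniform in [0, min(cap, base * 2**attempt)]."""
    rng = rng if rng is not None else random.Random()
    ceiling = min(cap_s, base_s * (2 ** attempt))
    # Randomizing the wait spreads reconnects out after a server restart.
    return rng.uniform(0.0, ceiling)


class ReplayBuffer:
    """Server-side log of recent messages so a reconnecting client can catch up."""

    def __init__(self, max_messages: int = 1000) -> None:
        self._log: Deque[Tuple[int, str]] = deque(maxlen=max_messages)
        self._next_id = 1

    def append(self, text: str) -> int:
        msg_id = self._next_id
        self._next_id += 1
        self._log.append((msg_id, text))
        return msg_id

    def since(self, last_seen_id: int) -> List[Tuple[int, str]]:
        # Only messages still in the buffer can be replayed; a client that
        # was away for longer than the buffer covers needs a full resync.
        return [(i, t) for i, t in self._log if i > last_seen_id]


buffer = ReplayBuffer()
for text in ("a", "b", "c"):
    buffer.append(text)
print(buffer.since(1))  # [(2, 'b'), (3, 'c')]
```

A client would call something like `backoff_delay(attempt)` before each reconnect attempt, reset `attempt` to zero after a successful connection, and send its last received ID so the server can answer with `since(last_seen_id)`.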
Use Cases
- Chat & messaging — the canonical bidirectional case (see chat app design).
- Live notifications & presence — "user is typing," online status, alerts (SSE often suffices if one-way).
- Collaborative editing — shared cursors and edits (Google Docs-style), needing low-latency two-way sync.
- Live dashboards, gaming, trading: continuous, latency-sensitive updates. Games and trading clients send in both directions; a read-only dashboard only receives, so SSE may be enough.
Pitfalls
- Infrastructure support — load balancers, proxies, and gateways must explicitly support the WebSocket upgrade and long-lived connections.
- Connection limits — OS file-descriptor and memory limits cap connections per server; tune and plan capacity for idle sockets.
- No backplane — forgetting the pub/sub layer means messages can't reach clients on other servers; the classic scaling miss.
- State on the connection — pinning users to servers complicates deploys (draining connections on restart) and failover.
- Overkill — if updates are infrequent or one-way, SSE or even polling is simpler and cheaper than a full WebSocket stack.
WebSockets turn HTTP's one-way request/response into a persistent, full-duplex channel for true real-time. Choose SSE when you only push server→client, WebSockets when the client also sends, long-polling as the fallback. The hard part isn't opening a socket — it's scaling stateful connections: sticky routing, capacity for millions of idle sockets, and above all a pub/sub backplane to fan messages out across servers, plus heartbeats and reconnection for resilience.
Why not just HTTP for real-time? HTTP is client-initiated request/response — the server can't push; polling is wasteful and laggy.
WebSocket vs SSE? SSE is one-way (server→client) over plain HTTP and simpler; WebSockets are full-duplex for when the client also sends frequently.
How does the handshake work? An HTTP request with Upgrade: websocket → 101 Switching Protocols → the connection becomes a persistent bidirectional socket.
How do you scale WebSockets? Sticky routing to the server holding the connection, capacity for many idle sockets, and a pub/sub backplane (Redis/Kafka) to fan messages out to clients on other servers.
How do you keep connections healthy? Ping/pong heartbeats to detect dead sockets, and client reconnection with backoff and message resume.