Caching stores the result of an expensive operation — a database query, an API call, a computed value — in fast storage so subsequent requests can be served without repeating the work. It is one of the highest-leverage performance techniques in distributed systems: a well-placed cache can reduce database load by orders of magnitude and cut response times from hundreds of milliseconds to single digits. At its core, caching exploits a simple empirical observation known as the 80/20 rule: 80% of traffic touches only 20% of the data. If you can keep that hot 20% in memory, you absorb the bulk of the load without touching the slower persistence tier at all.

But caching is not simply "put things in Redis." Every design decision — where in the stack you cache, which write strategy you use, what eviction policy governs memory pressure, how you handle TTLs, and how you defend against stampedes and avalanches — compounds into either a highly available, low-latency system or a subtle consistency minefield that breaks in production under load. This guide covers all of it: from the five canonical read/write strategies to eviction algorithm internals, distributed cache consistency, hot-key pathology, CDN edge caching, and the full taxonomy of failure modes with their mitigations.

⚡ Quick Takeaways
  • Cache-aside (lazy loading) is the safest default — only cache what's actually requested, and degrade gracefully on cache failure.
  • Write-through keeps cache and DB in sync but wastes space on data that's never read; write-behind maximizes write throughput but risks data loss on crash.
  • Write-around bypasses the cache on write — useful when written data is unlikely to be re-read soon (large batch imports, archival writes).
  • LRU is the general-purpose eviction policy; use LFU for stable hot-key workloads; ARC adapts automatically; add TTL jitter to prevent avalanche.
  • Cache stampede — when a popular key expires, many threads hit the DB simultaneously; fix with mutex locking or probabilistic early refresh.
  • Redis vs Memcached — Redis wins for rich data structures, persistence, and pub/sub; Memcached wins for pure simple string throughput at scale.
  • Bloom filter prevents cache penetration for non-existent keys without a DB lookup.
  • Hot keys in a distributed cache saturate a single shard — replicate to multiple slots or add an in-process L1 cache in front of Redis.
tldr

Cache frequently-read, rarely-changing data close to the consumer. Choose a write strategy (cache-aside, write-through, write-behind, write-around, read-through) based on your consistency and write-amplification budget. Use TTL and an eviction policy (LRU is the safe default, ARC adapts) to bound memory. Design for cache misses — a cold start or stampede should degrade gracefully, not cascade into a database meltdown. Defend against penetration with Bloom filters, against hot keys with local replication, and against distributed inconsistency with careful invalidation or short TTLs.

Caching layers in a distributed system stack
Caching layers in a distributed system stack

Where to Cache: The Layered Stack

Caches can live at multiple layers of the stack, each with different tradeoffs in latency, visibility, and invalidation complexity:

design principle

Layer your caches from fast to slow: L1 in-process → L2 distributed → database. A cache miss at L1 hits L2 before touching the database. This keeps database connections free for writes and genuinely uncacheable reads. The key insight is that each layer absorbs a different fraction of traffic, so even a modest L1 hit rate dramatically reduces L2 and DB pressure.

The Five Cache Write Strategies

There is no single correct strategy. The right choice depends on your consistency tolerance, write frequency, and what happens if the cache and the database diverge. Most real systems combine multiple strategies across different data types.

1. Cache-Aside (Lazy Loading)

The application owns the cache interaction entirely. On a read: check the cache first; if it misses, query the database, write the result to the cache, then return it. On a write: update the database, then either delete or update the cache entry.

python
def get_product(product_id):
    key = f"product:{product_id}"
    data = cache.get(key)                     # 1. check cache
    if data:
        return data                            # cache hit — done
    data = db.query("SELECT * FROM products WHERE id=?", product_id)
    cache.setex(key, 300, data)               # 2. populate on miss
    return data

def update_product(product_id, payload):
    db.execute("UPDATE products SET ... WHERE id=?", product_id)
    cache.delete(f"product:{product_id}")     # 3. invalidate on write

2. Read-Through

Similar to cache-aside but the cache library itself sits between the application and the database. The app always talks to the cache; on a miss, the cache goes to the database, populates itself, and returns the value. The application code does not need to implement the miss-and-populate logic explicitly.

3. Write-Through

Every write goes to both the cache and the database synchronously before acknowledging success to the caller. The cache is always consistent with the database immediately after any write.

4. Write-Behind (Write-Back)

Writes land in the cache immediately; the database write is deferred and batched asynchronously in the background. The cache serves as the primary write target, and the database is eventually brought in sync.

5. Write-Around

Writes go directly to the database, bypassing the cache entirely. The cache is only populated on subsequent reads (lazy loading). This prevents the cache from being filled with data that is unlikely to be re-read soon after being written.

StrategyWrite PathConsistencyWrite SpeedBest Fit
Cache-AsideDB, then invalidate cacheEventual (TTL-bounded)Fast (no cache write)General read-heavy
Read-ThroughDB, then cache auto-populates on missEventualFastORM-managed caches
Write-ThroughCache + DB (sync)StrongSlower (dual write)Low-write, read-critical
Write-BehindCache immediately, DB asyncEventual (risky)FastestHigh write throughput
Write-AroundDB only, bypasses cacheEventual (cold writes)FastBatch/archival writes

Eviction Policies: Algorithms in Depth

When the cache is full, something must be evicted to make room. The policy you choose determines which data survives — and getting it wrong means evicting hot entries while keeping cold ones, which destroys your hit rate. Each algorithm encodes a different assumption about your access pattern.

LRU — Least Recently Used

Evicts the item that has not been accessed for the longest time. Implemented as a doubly-linked list where accessed items are moved to the front; the item at the back is the eviction candidate. O(1) access and update with a hash map backing the list.

LRU works well because most workloads exhibit temporal locality — recently accessed data tends to be accessed again soon. However, LRU has a famous failure mode: a single sequential scan over a large dataset (a full-table backup, a batch report) evicts the entire working set and leaves the cache cold for the hot OLTP queries that follow. This is sometimes called a cache pollution event.

LFU — Least Frequently Used

Evicts the item with the lowest access count. Naturally keeps viral content and stable hot keys alive regardless of access recency. The downside: newly inserted items start with a count of 1 and are vulnerable to immediate eviction even if they are about to become hot. Modern LFU implementations (TinyLFU, used in Redis 4.0+ as allkeys-lfu) combine a frequency sketch with a small recency window to handle the "new popular item" problem.

ARC — Adaptive Replacement Cache

ARC automatically adapts between LRU and LFU by maintaining two lists: one for recently-used items (T1) and one for frequently-used items (T2), plus ghost lists (B1, B2) that remember recently evicted items' identities without their data. When a ghost-list hit occurs, ARC shifts the balance toward that list's strategy. The result is a cache that learns your access pattern in real time and self-tunes without configuration. ARC is patent-encumbered (IBM), so some systems implement approximations like LIRS or CLOCK-Pro instead.

FIFO — First In, First Out

The simplest policy: evict the oldest-inserted item. Requires no access tracking, making it fast and memory-efficient to implement. Almost never optimal for a cache because it has no connection to access frequency or recency; an item inserted first might be the most-accessed one in the system. FIFO's legitimate use cases are simple message buffers and rate-limiter windows, not general-purpose caches.

TTL-Based Expiration

Items are evicted when they pass their expiry timestamp, independent of the eviction policy. TTL and eviction work in concert: TTL controls data freshness; eviction controls memory pressure. Redis combines both — you set a TTL per key and choose a policy (LRU, LFU, etc.) that governs what happens when memory is full and no TTLs have fired yet.

PolicyEvictsStrengthsWeaknesses
LRULeast recently accessedGreat temporal locality; simple O(1)Sequential scan pollution; ignores frequency
LFUFewest total accessesKeeps stable hot keys alive long-termNew hot items start cold; stale counts linger
ARCAdapts LRU+LFU dynamicallySelf-tuning; best of both worldsMore memory overhead; patent-encumbered
FIFOOldest insertedZero overhead; deterministicNo relationship to access pattern
TTLPast expiry timestampGuarantees freshness; predictable staleness windowNot memory-pressure-driven; stale data until expiry fires

TTL Design: Freshness vs. Stability

Setting a TTL is a tradeoff between freshness (how quickly a cache update reflects a database change) and stability (how often you pay the miss penalty). Getting it right requires understanding both the data's change frequency and the business cost of serving stale data.

TTL jitter

Never set the same TTL for a large batch of keys — if you load 10,000 product records into Redis with a uniform TTL of 300 seconds, they all expire simultaneously 5 minutes later, generating a thundering-herd avalanche on the database. Add random jitter: ttl = base_ttl + random.randint(0, base_ttl // 5). This spreads expiries over time and eliminates the synchronized miss burst.

The Three Fatal Failure Modes

Cache Stampede (Thundering Herd)

When a single popular key expires under high traffic, many concurrent requests simultaneously discover a cache miss and all race to rebuild it from the database. If rebuilding takes 200 ms and you have 1,000 req/s hitting that key, you generate 200 concurrent database queries in the first second alone — enough to crash a moderately sized database.

Mitigations:

python
import time, math, random

def fetch_with_per(key, beta=1.0):
    # probabilistic early recompute — avoids stampede
    entry = cache.get_with_ttl(key)          # returns (value, remaining_ttl)
    if entry is None:
        return recompute_and_store(key)       # cold miss
    value, ttl = entry
    delta = time.monotonic() - time.monotonic()  # compute time proxy
    if -beta * math.log(random.random()) > ttl:
        return recompute_and_store(key)       # early recompute
    return value                               # serve cached value

Cache Penetration

Requests for keys that do not exist in either the cache or the database pass through to the database on every call. Each request misses the cache (nothing to cache since the result is empty) and triggers a full database query. A common cause is malicious clients querying invalid IDs in a loop — a DoS vector that bypasses the cache entirely.

Mitigations:

python
import redis, random
from pybloom_live import BloomFilter

r = redis.Redis(host='localhost', port=6379)
bloom = BloomFilter(capacity=1_000_000, error_rate=0.01)

def get_user(user_id):
    if user_id not in bloom:
        return None                            # bloom says definitely missing

    key = f"user:{user_id}"
    cached = r.get(key)
    if cached == "__null__":
        return None                            # cached negative result
    if cached:
        return cached

    user = db.query("SELECT * FROM users WHERE id=?", user_id)
    if user is None:
        r.setex(key, 30, "__null__")            # cache negative result
        return None

    ttl = 300 + random.randint(0, 60)         # jitter to prevent avalanche
    r.setex(key, ttl, user)
    bloom.add(user_id)
    return user

Cache Avalanche

A large batch of keys expires simultaneously, sending a flood of cache misses to the database at exactly the same moment. Common triggers: a rolling deployment that flushes the cache, a bulk data import that populates thousands of keys with the same TTL, or a caching layer restart after a maintenance window.

Mitigations:

Hot Keys: The Silent Bottleneck

In a distributed cache cluster like Redis Cluster, each key lives on exactly one shard. A hot key — a single product page, a trending tweet, a viral news article — can receive millions of requests per second, all routed to the same shard. That shard becomes a bottleneck regardless of how many other shards are idle. This is a distinct problem from cache stampede (which is about expiry) — hot keys are a throughput problem that exists even when the key is populated and never expires.

Mitigations:

python
import random

NUM_REPLICAS = 10

def get_hot_product(product_id):
    # spread reads across N replica keys on different shards
    replica_idx = random.randint(0, NUM_REPLICAS - 1)
    key = f"product:{product_id}:r{replica_idx}"
    return cache.get(key) or rebuild_and_fan_write(product_id)

def invalidate_hot_product(product_id):
    # on update, invalidate all replica keys
    keys = [f"product:{product_id}:r{i}" for i in range(NUM_REPLICAS)]
    cache.delete(*keys)

Distributed Cache Consistency

A distributed cache introduces a second system of record: the cache and the database can hold different values for the same key at any point in time. Managing this divergence is one of the hardest problems in caching at scale. The core tension is between two approaches to invalidation:

the correct order

Always write to the database first, then invalidate the cache — not the other way around. If you invalidate the cache first and then the DB write fails, you have a cold cache with a stale (or missing) DB value. If you write the DB first and the cache invalidation fails, your cache is briefly stale but the DB is authoritative; the TTL will eventually clean it up. The asymmetry matters.

For systems requiring stronger guarantees, consider event-driven invalidation via Change Data Capture (CDC): a CDC connector (Debezium, Maxwell) reads the database write-ahead log and publishes change events to a message queue. Cache workers subscribe and invalidate or refresh the cache entries. This completely decouples the write path from cache invalidation latency and ensures the cache is eventually consistent with the DB's WAL, which is the authoritative record of all writes.

Redis vs. Memcached: An Honest Comparison

Redis and Memcached are both in-memory key-value stores but they diverged substantially in their design philosophy and feature set. The decision is rarely close once you know your requirements.

DimensionRedisMemcached
Data structuresStrings, hashes, lists, sets, sorted sets, HyperLogLog, streams, bitmaps, geospatialStrings only (binary-safe)
PersistenceRDB snapshots + AOF write-ahead log; configurable durability vs performance tradeoffNone — memory-only; all data lost on restart
Clustering / HANative Redis Cluster (hash-slot partitioning) + Sentinel for automatic failoverClient-side consistent hashing; no server-side cluster
Pub/SubYes — native publish/subscribe + streams (Kafka-lite for simple use cases)No
Lua scriptingYes — atomic server-side scripts; enables complex atomic operationsNo
TransactionsMULTI/EXEC (optimistic) + WATCH for CASNo
Threading modelSingle-threaded event loop (network I/O multi-threaded since 6.0)Multi-threaded — true CPU parallelism per connection
Memory efficiencyRich structures add overhead; use hashes for small objectsLean, minimal overhead for raw strings
Use casesSessions, leaderboards (sorted sets), rate limiting, queues, pub/sub, feature flagsSimple string caching at extreme scale, read-only object caches

Redis is the default choice for most new systems. Memcached retains a meaningful edge in pure raw-string throughput on multi-core machines — its multi-threaded architecture allows true CPU-level parallelism per connection, whereas Redis's single-threaded command processing serializes all commands regardless of available cores (network I/O was multi-threaded in 6.0, but command execution remains single-threaded). For a workload that is literally millions of simple GET/SET on plain strings with no need for persistence, rich types, or pub/sub, Memcached can outperform Redis by 20–40% at extreme scale. For everything else — sorted set leaderboards, stream processing, Lua-atomic rate limiters, persistent sessions — Redis is unambiguously the better choice.

Redis eviction policies to know

Redis supports multiple eviction policies configured via maxmemory-policy: noeviction (return errors when full), allkeys-lru (LRU across all keys), volatile-lru (LRU only among TTL-set keys), allkeys-lfu (LFU across all keys), allkeys-random, and volatile-ttl (evict keys with shortest TTL first). The most common production choice is allkeys-lru for a pure cache, and volatile-lru when Redis is dual-used as both a cache and a data store (only evict TTL-expiring cache entries, never the persistent data).

CDN Edge Caching

A Content Delivery Network is a globally distributed cache that sits at the network edge — geographically close to end users. CDNs are the highest-leverage cache you can deploy for public-facing content: instead of a request from Singapore traveling 200 ms to your US-East origin, it hits a CDN edge node in Singapore and returns in under 10 ms.

What to cache at the CDN

What not to cache at the CDN

Cache invalidation at the CDN

CDN invalidation (cache purge) is expensive in two ways: it has an API rate limit, and it takes seconds to propagate to all edge nodes globally. The standard strategy is to avoid needing invalidation for static assets (content-hash URLs) and to rely on short TTLs for semi-dynamic content. For urgent invalidation (a data breach requiring immediate removal), use the CDN's purge API but treat it as a rare emergency tool, not a routine cache-refresh mechanism.

Caching Patterns in Production: Real Scenarios

E-commerce product page

A product detail page aggregates data from multiple services: catalog, inventory, pricing, and reviews. The typical pattern is fragment caching: cache each fragment independently at its appropriate TTL rather than caching the assembled page. Product description: 1-hour TTL; pricing: 30-second TTL (or event-invalidated); inventory: not cached (always read from source). The CDN caches the assembled page at a 60-second TTL for anonymous users; personalized or authenticated responses bypass the CDN entirely.

Social feed (news feed / timeline)

User timelines are expensive to assemble from raw graph data. The industry standard (Facebook Memcache, Twitter Pelikan) is to cache pre-assembled timeline objects per user. Writes fan out to the timelines of all followers — the push-on-write model — so reads are always cache hits. For users with millions of followers (celebrities), the fan-out is unbounded; switch to a pull-on-read model for those accounts and merge their content into the timeline at read time.

Rate limiting with Redis

Redis sorted sets or atomic increments are the standard for distributed rate limiting. The sliding-window log pattern stores each request timestamp as a sorted set member; a ZRANGEBYSCORE counts requests in the last N seconds; ZADD adds the new request; ZREMRANGEBYSCORE prunes old entries — all in a Lua script for atomicity. The fixed-window counter is simpler: INCR key + EXPIRE key window_seconds.

lua
-- Redis Lua: sliding-window rate limiter (atomic)
local key      = KEYS[1]               -- e.g. "ratelimit:user:42"
local now      = tonumber(ARGV[1])     -- current timestamp ms
local window   = tonumber(ARGV[2])     -- window size ms (e.g. 60000)
local limit    = tonumber(ARGV[3])     -- max requests per window

redis.call('ZREMRANGEBYSCORE', key, 0, now - window)
local count = redis.call('ZCARD', key)
if count >= limit then
  return 0                             -- rate limited
end
redis.call('ZADD', key, now, now)
redis.call('PEXPIRE', key, window)
return 1                               -- allowed

Cache Observability: What to Measure

A cache you cannot observe is a cache you cannot tune. The key metrics to instrument:

Best Practices and Anti-Patterns

Do

Do Not

takeaway

Cache-aside is the safest starting point — it is simple, only caches what is actually needed, and degrades gracefully on cache failure. Add TTL jitter from day one to avoid avalanche. Understand your eviction policy relative to your access pattern — LRU for general workloads, LFU for stable hot-key sets, ARC when you cannot predict which. Reach for Redis when you need persistence, pub/sub, sorted sets, or atomic Lua scripts; consider Memcached only when raw string throughput at extreme scale is the sole requirement. Treat hot keys, stampede, penetration, and avalanche as distinct failure modes — each needs a specific mitigation, not a generic "add more cache" response.

🎯 interview hot-takes

Cache stampede vs cache avalanche — what's the difference? Stampede: one popular key expires and many threads hit the DB simultaneously — fix with mutex locking or probabilistic early refresh. Avalanche: many keys expire simultaneously sending a flood of misses — fix with TTL jitter and pre-warming. Distinct problems, distinct fixes.
How do you handle cache penetration? Cache negative results (a sentinel like "__null__" with a short TTL) or place a Bloom filter in front to reject nonexistent keys before they hit the cache or DB.
Redis or Memcached? Redis for anything beyond simple string caching — persistence, pub/sub, sorted sets for leaderboards, Lua-atomic rate limiters, and streams. Memcached only when raw string throughput at multi-core extreme scale is the sole requirement and ops simplicity is paramount.
What eviction policy do you use? allkeys-lru for a pure cache; volatile-lru if Redis is dual-purposed with persistent data. Switch to allkeys-lfu when the working set has stable, clearly defined hot keys (leaderboards, viral content).
How do you handle a hot key saturating one Redis shard? Replicate the key to N slots with a random suffix and fan out reads; add an in-process L1 cache with a 1-second TTL to absorb the bulk of requests before they even hit Redis.

← previous
Load Balancing