Nearly every connection on the internet starts with a question: "what's the IP address for this name?" The Domain Name System answers it. DNS is the internet's distributed, hierarchical, heavily cached phone book that turns a name like sdeoffer.com into an IP address a machine can connect to. It looks like plumbing, but it's also a powerful infrastructure tool: it's how CDNs route you to the nearest edge, how traffic is balanced and failed over across regions, and a frequent (often overlooked) single point of failure.

⚡ Quick Takeaways
  • DNS maps names → IPs through a hierarchy: root → TLD (.com) → authoritative nameserver for the domain.
  • A recursive resolver does the legwork — your client asks it; it walks the hierarchy and caches the answer.
  • Caching + TTL are everything — answers are cached at every level for their TTL; "DNS propagation" is just waiting for old TTLs to expire.
  • Record types encode different answers — A/AAAA (IPv4/IPv6), CNAME (alias), MX (mail), TXT, NS.
  • DNS is an infra tool — GeoDNS returns location-based IPs and round-robin/weighted records balance load and do failover.
  • Anycast lets one IP be served from many locations, the basis for resilient root servers and CDNs.
  • It's a SPOF and attack surface — DNS outages take down "everything"; DNSSEC and DoH/DoT add integrity and privacy.
TL;DR

DNS resolves a hostname to an IP by walking a hierarchy (root → TLD → authoritative), with a recursive resolver doing the work and caching results by TTL. Records (A, CNAME, MX…) hold the answers. Beyond lookups, DNS is a routing tool: GeoDNS and weighted/round-robin records steer and balance traffic, and anycast serves one IP from many sites. Because nearly everything depends on it, DNS is a notorious single point of failure — secure it (DNSSEC, DoH/DoT) and design TTLs deliberately.

The Problem: Names to Addresses

Humans remember github.com; machines route packets to 140.82.x.x. DNS is the indirection layer between them — and that indirection is valuable beyond convenience: because the name→IP mapping is resolved fresh (subject to caching), you can change where a name points without changing the name, which is what enables migrations, failover, and geo-routing.

The Resolution Flow

When your browser needs sdeoffer.com, it asks a recursive resolver (run by your ISP, or a public one such as 1.1.1.1 or 8.8.8.8). If the answer isn't cached, the resolver walks the hierarchy on your behalf:

resolving sdeoffer.com (cache miss)
client ─▶ recursive resolver
            ├─▶ root server      : "ask the .com TLD servers → here"
            ├─▶ .com TLD server  : "ask sdeoffer.com's nameservers → here"
            ├─▶ authoritative NS : "sdeoffer.com A = 203.0.113.10"  ✓
            └─ cache the answer (for its TTL), return to client

  client connects to 203.0.113.10
  next lookup within TTL → served instantly from the resolver's cache

The resolver's queries are iterative (each server refers it one step further down), while the client's single query to the resolver is recursive (the resolver returns the final answer). The first lookup touches several servers; every subsequent one within the TTL is a cache hit.
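
The walk above can be sketched with a toy in-memory hierarchy. This is a minimal illustration rather than a real resolver: the SERVERS table, the server labels, and the query/resolve helpers are all hypothetical, and no network traffic is involved. Each "server" either refers the resolver one step down or returns the final A record.

```python
# Toy model of the hierarchy: each "server" maps a name suffix to either a
# referral (the next server to ask) or a final answer. All names here are
# made up for illustration; no real DNS traffic is sent.
SERVERS = {
    "root": {"com": ("referral", "tld-com")},
    "tld-com": {"sdeoffer.com": ("referral", "ns-sdeoffer")},
    "ns-sdeoffer": {"sdeoffer.com": ("answer", "203.0.113.10")},
}


def query(server: str, name: str):
    """Ask one server about a name; it matches on the longest suffix it knows."""
    labels = name.split(".")
    for i in range(len(labels)):
        suffix = ".".join(labels[i:])
        if suffix in SERVERS[server]:
            return SERVERS[server][suffix]
    raise LookupError(f"{server} knows nothing about {name}")


def resolve(name: str, cache: dict):
    """Recursive resolver: answer from cache, else walk root -> TLD -> authoritative."""
    if name in cache:
        return cache[name], ["cache"]
    server, path = "root", []
    while True:
        path.append(server)
        kind, value = query(server, name)
        if kind == "answer":
            cache[name] = value  # a real resolver would also store and honor the TTL
            return value, path
        server = value  # iterative step: follow the referral one level down


cache = {}
print(resolve("sdeoffer.com", cache))  # first lookup walks three servers
print(resolve("sdeoffer.com", cache))  # second lookup is a cache hit
```

The client only ever calls resolve once per name (the recursive query); the loop inside it is the iterative part, following one referral per step.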

The Hierarchy

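DNS is a tree, which is what lets it scale to billions of names without any single authority: the root delegates to the TLD servers (.com), and each TLD delegates to the authoritative nameservers for the domains beneath it.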

Record Types

Record     Maps                          Example use
A / AAAA   Name → IPv4 / IPv6            The core hostname → address mapping
CNAME      Name → another name (alias)   www → apex; pointing at a CDN hostname
MX         Domain → mail servers         Email routing
NS         Domain → its nameservers      Delegation
TXT        Arbitrary text                Domain verification, SPF/DKIM

A common gotcha: a CNAME can't coexist with other records at the same name, and the zone apex (example.com itself) must always carry SOA and NS records, so a CNAME isn't allowed there. That is why providers offer "ALIAS/ANAME" flattening to point an apex at a CDN.
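
The CNAME rules above can be expressed as a small zone check. This is only a sketch: the (name, type, value) tuple layout and the check_cname_rules helper are hypothetical, not any DNS provider's API, and the check ignores DNSSEC-related record types.

```python
from collections import defaultdict


def check_cname_rules(apex: str, records: list[tuple[str, str, str]]) -> list[str]:
    """Return rule violations for (name, type, value) records in one zone."""
    types_by_name = defaultdict(set)
    for name, rtype, _value in records:
        types_by_name[name].add(rtype)

    problems = []
    for name, types in sorted(types_by_name.items()):
        if "CNAME" not in types:
            continue
        if name == apex:
            problems.append(f"{name}: CNAME not allowed at the zone apex")
        elif len(types) > 1:
            others = ", ".join(sorted(types - {"CNAME"}))
            problems.append(f"{name}: CNAME cannot coexist with {others}")
    return problems


zone = [
    ("example.com", "NS", "ns1.example.net"),
    ("example.com", "CNAME", "cdn.example.net"),      # invalid: apex
    ("www.example.com", "CNAME", "example.com"),      # fine: alias only
    ("mail.example.com", "CNAME", "mx.example.net"),
    ("mail.example.com", "TXT", "v=spf1 -all"),       # invalid: coexists with CNAME
]
for problem in check_cname_rules("example.com", zone):
    print(problem)
```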

Caching and TTL

DNS scales because answers are cached aggressively at every layer — the OS, the recursive resolver, sometimes the browser. Each record carries a TTL (time to live) telling caches how long to keep it. This is the single most important operational lever:

"propagation" is just TTL

When you change a DNS record and it "takes hours to propagate," nothing is actively propagating: caches around the world are simply still serving the old answer until its TTL expires. The fix when planning a change: lower the TTL ahead of time (e.g. to 60s), at least one full old-TTL period before the cutover so every cache has already picked up the short value, then make the change and raise the TTL again. High TTLs mean fast lookups but slow changes; low TTLs mean the opposite.
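
To see why "propagation" is really cache expiry, here is a minimal sketch of a resolver-style cache. TtlCache, lookup, and the authoritative dictionary are hypothetical names for illustration, and time is passed in as a plain number of seconds so the behaviour is easy to follow.

```python
class TtlCache:
    """Minimal resolver-style cache; `now` is passed in so time is easy to simulate."""

    def __init__(self):
        self._entries = {}  # name -> (value, expires_at)

    def put(self, name, value, ttl, now):
        self._entries[name] = (value, now + ttl)

    def get(self, name, now):
        entry = self._entries.get(name)
        if entry is None or now >= entry[1]:
            return None  # missing or expired: caller must ask upstream again
        return entry[0]


def lookup(name, cache, authoritative, now):
    """Serve from cache while the TTL is live, else refetch from the authority."""
    cached = cache.get(name, now)
    if cached is not None:
        return cached
    value, ttl = authoritative[name]
    cache.put(name, value, ttl, now)
    return value


authoritative = {"sdeoffer.com": ("203.0.113.10", 3600)}  # 1-hour TTL
cache = TtlCache()

print(lookup("sdeoffer.com", cache, authoritative, now=0))     # fetched and cached
authoritative["sdeoffer.com"] = ("203.0.113.20", 3600)         # record changes at the source
print(lookup("sdeoffer.com", cache, authoritative, now=1800))  # still the old IP
print(lookup("sdeoffer.com", cache, authoritative, now=3600))  # TTL expired: new IP
```

With a 3600s TTL the old address can be served for up to an hour after the change; had the TTL been 60s when the entry was cached, the stale window would shrink to a minute.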

DNS as an Infrastructure Tool

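Because the resolver asks "what IP for this name?", the authoritative server can answer differently depending on who's asking or current conditions, turning DNS into a routing and load-balancing layer. GeoDNS returns location-based IPs, while round-robin and weighted records balance load and support failover.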
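
As a rough sketch of an authoritative server answering by location and current conditions, the function below picks a regional pool, drops unhealthy targets, and chooses among the rest by weight. The POOLS table, region labels, health flags, and the answer function are all assumptions made for illustration; real providers expose this through their own configuration.

```python
import random

# Hypothetical pools: region -> list of (ip, weight, healthy) entries.
POOLS = {
    "eu": [("198.51.100.10", 3, True), ("198.51.100.11", 1, True)],
    "us": [("203.0.113.10", 1, False), ("203.0.113.11", 1, True)],
}
DEFAULT_REGION = "us"


def healthy_targets(region: str) -> list[tuple[str, int]]:
    """Return (ip, weight) pairs for targets currently marked healthy."""
    return [(ip, weight) for ip, weight, ok in POOLS[region] if ok]


def answer(client_region: str, rng: random.Random) -> str:
    """Pick an IP: region first (geo), then weighted choice among healthy targets."""
    region = client_region if client_region in POOLS else DEFAULT_REGION
    targets = healthy_targets(region)
    if not targets:  # whole region down: fail over to the default region
        targets = healthy_targets(DEFAULT_REGION)
    if not targets:
        raise LookupError("no healthy targets in any pool")
    ips = [ip for ip, _ in targets]
    weights = [weight for _, weight in targets]
    return rng.choices(ips, weights=weights, k=1)[0]


rng = random.Random(0)
print(answer("us", rng))  # the unhealthy 203.0.113.10 is never returned
print(answer("eu", rng))  # 198.51.100.10 is chosen about 3 times out of 4
```

Any steering done this way is still subject to caching: a resolver that has cached one answer keeps using it until the TTL runs out, which is why failover records are usually given short TTLs.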

Anycast

Anycast is the technique that lets a single IP address be announced from many physical locations at once; internet routing delivers each client to the topologically nearest one. It's how the 13 root server addresses (a through m) are actually served by hundreds of machines worldwide, and how CDNs and big DNS providers achieve both low latency and resilience: if one location goes down, routing simply sends clients to the next-nearest. (Contrast with GeoDNS, which returns different answers; anycast uses the same IP and lets the network pick the site.)
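
A toy model of that behaviour, heavily simplified: real anycast relies on BGP route selection, whereas this sketch just uses a made-up cost table. ANYCAST_IP, the site names, and the costs are all illustrative assumptions.

```python
ANYCAST_IP = "192.0.2.53"  # one address, announced from every site (example value)

# Hypothetical routing costs from one client to each site announcing the IP.
path_cost = {"frankfurt": 12, "london": 18, "virginia": 85}


def route(announcing: set[str]) -> str:
    """Return the site that receives the client's packets: the lowest-cost announcer."""
    reachable = {site: cost for site, cost in path_cost.items() if site in announcing}
    if not reachable:
        raise LookupError(f"no site is announcing {ANYCAST_IP}")
    return min(reachable, key=reachable.get)


sites = {"frankfurt", "london", "virginia"}
print(ANYCAST_IP, "->", route(sites))  # frankfurt
sites.discard("frankfurt")             # site fails and withdraws its announcement
print(ANYCAST_IP, "->", route(sites))  # london, with no DNS change at all
```

The client's destination address never changes; only the network's choice of site does, so failover here does not wait on any TTL.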

Reliability and Security

Because virtually everything begins with a DNS lookup, DNS is a high-impact single point of failure: the 2016 Dyn attack took down major sites not by attacking them but by DDoSing their DNS provider. Mitigations: use multiple/redundant DNS providers, anycast, and sensible TTLs. On security: plain DNS is unauthenticated and unencrypted, so DNSSEC adds cryptographic signatures to prevent forged/spoofed answers (cache poisoning), and DoH/DoT (DNS over HTTPS/TLS) encrypt the queries between your client and its resolver, so intermediaries on that path can't see or tamper with your lookups.
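
The value of redundant providers can be modelled as a simple try-in-order loop. In practice the redundancy comes from listing nameservers from more than one provider for the domain, and resolvers do the retrying; the sketch below only imitates that retry behaviour. resolve_with_fallback, provider_a, and provider_b are hypothetical stand-ins, not real services.

```python
from typing import Callable

Lookup = Callable[[str], str]  # returns an IP, or raises OSError on failure


def resolve_with_fallback(name: str, providers: list[tuple[str, Lookup]]) -> tuple[str, str]:
    """Try each provider in order; return (label, ip) from the first that answers."""
    errors = []
    for label, lookup in providers:
        try:
            ip = lookup(name)
            if ip:
                return label, ip
            errors.append(f"{label}: empty answer")
        except OSError as exc:  # TimeoutError and network errors are OSError subclasses
            errors.append(f"{label}: {exc}")
    raise LookupError(f"all providers failed for {name}: {errors}")


def provider_a(name: str) -> str:
    raise TimeoutError("provider A is unreachable")  # simulated outage


def provider_b(name: str) -> str:
    return {"sdeoffer.com": "203.0.113.10"}[name]


print(resolve_with_fallback("sdeoffer.com", [("A", provider_a), ("B", provider_b)]))
```

With only provider A configured, the same call raises LookupError, which is the single-provider failure mode the Dyn incident exposed.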

In System Design

DNS shows up in most designs as the very first hop: it routes users to a CDN edge or a regional load balancer, enables blue-green/region failover (repoint a name), and underlies service discovery (internal services resolve each other by name, often via the platform's DNS). The recurring interview point is to remember TTL: anything you want to change quickly (failover, cutover) needs a low TTL set in advance.

Takeaway

DNS is a hierarchical, cached lookup from names to IPs — root → TLD → authoritative, fronted by a recursive resolver — and TTL governs the trade between fast lookups and fast changes. Beyond resolution it's a routing layer (GeoDNS, weighted records, anycast) used for load balancing, geo-routing, and failover. Treat it as critical infrastructure: redundant providers, deliberate TTLs, and DNSSEC/encrypted DNS.

🎯 Interview Hot-Takes

Walk through a DNS lookup. Client → recursive resolver → (cache miss) root → TLD (.com) → authoritative nameserver → A record; resolver caches by TTL and returns the IP.
Recursive vs authoritative? The recursive resolver does the legwork and caches; authoritative servers hold the real records for a domain. Root/TLD servers delegate downward.
What is "DNS propagation"? Not active propagation — caches serving the old record until its TTL expires. Lower TTL before a planned change.
GeoDNS vs anycast? GeoDNS returns a different IP based on location; anycast announces one IP from many sites and lets routing pick the nearest.
Why is DNS a reliability concern? Everything starts with a lookup, so a DNS outage looks total (cf. the Dyn attack); use redundant providers, anycast, and sensible TTLs. DNSSEC addresses forged answers, not outages.
