Every connection on the internet starts with a question: "what's the IP address for this name?" The Domain Name System answers it. DNS is the internet's distributed, hierarchical, heavily-cached phone book that turns sdeoffer.com into an IP a machine can connect to. It looks like plumbing, but it's also a powerful infrastructure tool — it's how CDNs route you to the nearest edge, how traffic is balanced and failed over across regions, and a frequent (often overlooked) single point of failure.
- DNS maps names → IPs through a hierarchy: root → TLD (.com) → authoritative nameserver for the domain.
- A recursive resolver does the legwork: your client asks it, and it walks the hierarchy and caches the answer.
- Caching + TTL are everything — answers are cached at every level for their TTL; "DNS propagation" is just waiting for old TTLs to expire.
- Record types encode different answers — A/AAAA (IPv4/IPv6), CNAME (alias), MX (mail), TXT, NS.
- DNS is an infra tool — GeoDNS returns location-based IPs and round-robin/weighted records balance load and do failover.
- Anycast lets one IP be served from many locations, the basis for resilient root servers and CDNs.
- It's a SPOF and attack surface — DNS outages take down "everything"; DNSSEC and DoH/DoT add integrity and privacy.
DNS resolves a hostname to an IP by walking a hierarchy (root → TLD → authoritative), with a recursive resolver doing the work and caching results by TTL. Records (A, CNAME, MX…) hold the answers. Beyond lookups, DNS is a routing tool: GeoDNS and weighted/round-robin records steer and balance traffic, and anycast serves one IP from many sites. Because nearly everything depends on it, DNS is a notorious single point of failure — secure it (DNSSEC, DoH/DoT) and design TTLs deliberately.
The Problem: Names to Addresses
Humans remember github.com; machines route packets to 140.82.x.x. DNS is the indirection layer between them — and that indirection is valuable beyond convenience: because the name→IP mapping is resolved fresh (subject to caching), you can change where a name points without changing the name, which is what enables migrations, failover, and geo-routing.
The Resolution Flow
When your browser needs sdeoffer.com, it asks a recursive resolver (run by your ISP, or 1.1.1.1 / 8.8.8.8). If the answer isn't cached, the resolver walks the hierarchy on your behalf:
client ─▶ recursive resolver
├─▶ root server : "ask the .com TLD servers → here"
├─▶ .com TLD server : "ask sdeoffer.com's nameservers → here"
├─▶ authoritative NS : "sdeoffer.com A = 203.0.113.10" ✓
└─ cache the answer (for its TTL), return to client
client connects to 203.0.113.10
next lookup within TTL → served instantly from the resolver's cache
The resolver's queries are iterative (each server refers it one step further down), while the client's single query to the resolver is recursive (the resolver returns the final answer). The first lookup touches several servers; every subsequent one within the TTL is a cache hit.
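The flow above can be sketched as a toy simulation. This is a minimal model, not a real resolver: the server labels (`tld-com`, `ns-sdeoffer`), the in-memory tables, and the 300-second TTL are illustrative assumptions, and nothing touches the network.

```python
# Toy model of the hierarchy. Every table below is a made-up stand-in
# for a real server; names, labels, and the TTL are illustrative only.
ROOT = {"com.": "tld-com"}                       # root refers to the TLD servers
TLD_COM = {"sdeoffer.com.": "ns-sdeoffer"}       # TLD refers to the domain's NS
AUTHORITATIVE = {"sdeoffer.com.": ("203.0.113.10", 300)}  # (A record, TTL seconds)

SERVERS = {"root": ROOT, "tld-com": TLD_COM, "ns-sdeoffer": AUTHORITATIVE}


def resolve_iteratively(name):
    """Walk root -> TLD -> authoritative, following one referral per step."""
    fqdn = name if name.endswith(".") else name + "."
    labels = fqdn.rstrip(".").split(".")
    tld = labels[-1] + "."
    domain = ".".join(labels[-2:]) + "."

    trace = ["root"]
    tld_server = SERVERS["root"][tld]            # referral to the TLD servers
    trace.append(tld_server)
    auth_server = SERVERS[tld_server][domain]    # referral to the authoritative NS
    trace.append(auth_server)
    address, ttl = SERVERS[auth_server][fqdn]    # the final answer
    return address, ttl, trace


class RecursiveResolver:
    """Answers the client's single query and caches the result.

    TTL expiry is deliberately left out here to keep the walk visible.
    """

    def __init__(self):
        self.cache = {}
        self.upstream_queries = 0

    def resolve(self, name):
        if name in self.cache:                   # cache hit: no upstream traffic
            return self.cache[name]
        address, _ttl, trace = resolve_iteratively(name)
        self.upstream_queries += len(trace)      # one query per server contacted
        self.cache[name] = address
        return address
```

In this model the first `resolve("sdeoffer.com")` contacts three servers, and a second call is answered from the cache without any upstream query.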
The Hierarchy
DNS is a tree, which is what lets it scale to billions of names without any single authority:
- Root servers (the . at the top): know where the TLD servers are. There are 13 root server identities, each massively replicated via anycast.
- TLD servers: manage top-level domains (.com, .org, .io, country codes); they know which nameservers are authoritative for each domain under them.
- Authoritative nameservers: hold the actual records for a specific domain (often run by your DNS provider, such as Route 53 or Cloudflare).
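To make the tree shape concrete, the sketch below lists the path from the root down to a name by peeling labels off from the right. The function name `delegation_chain` is a hypothetical helper, and the sketch assumes a delegation could exist at every label; in a real zone, several labels often live inside the same zone.

```python
def delegation_chain(name):
    """Return the candidate zones from the root down to the full name."""
    labels = name.rstrip(".").split(".")
    chain = ["."]                                  # the root comes first
    for i in range(len(labels) - 1, -1, -1):       # rightmost label is the TLD
        chain.append(".".join(labels[i:]) + ".")
    return chain
```

For `www.sdeoffer.com` this yields the root, then `com.`, then `sdeoffer.com.`, then `www.sdeoffer.com.`, which is the order in which a resolver is referred downward.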
Record Types
| Record | Maps | Example use |
|---|---|---|
| A / AAAA | Name → IPv4 / IPv6 | The core hostname → address mapping |
| CNAME | Name → another name (alias) | www → apex; pointing at a CDN hostname |
| MX | Domain → mail servers | Email routing |
| NS | Domain → its nameservers | Delegation |
| TXT | Arbitrary text | Domain verification, SPF/DKIM |
A common gotcha: a CNAME can't coexist with other records at the same name, and the zone apex (example.com itself) always carries SOA and NS records, so a CNAME isn't allowed there. That is why providers offer "ALIAS/ANAME" flattening to point an apex at a CDN.
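The CNAME rule can be checked mechanically. The following is a minimal sketch of such a check over a list of `(name, type, value)` tuples; the function name and the tuple layout are assumptions made for illustration, and the sketch ignores the DNSSEC record types that are permitted alongside a CNAME.

```python
def find_cname_conflicts(records, apex):
    """Return human-readable problems for CNAME records in a zone.

    records: iterable of (name, record_type, value) tuples (hypothetical layout).
    apex: the zone apex, e.g. "example.com".
    """
    types_by_name = {}
    for name, rtype, _value in records:
        types_by_name.setdefault(name, set()).add(rtype)

    problems = []
    for name, types in sorted(types_by_name.items()):
        if "CNAME" not in types:
            continue
        if name == apex:
            # The apex always needs SOA/NS, so a CNAME can never live there.
            problems.append(f"{name}: CNAME at the zone apex")
        elif len(types) > 1:
            others = ", ".join(sorted(types - {"CNAME"}))
            problems.append(f"{name}: CNAME coexists with {others}")
    return problems
```

A `www` CNAME on its own passes; a CNAME on the apex, or a CNAME next to an MX record on the same name, is reported.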
Caching and TTL
DNS scales because answers are cached aggressively at every layer — the OS, the recursive resolver, sometimes the browser. Each record carries a TTL (time to live) telling caches how long to keep it. This is the single most important operational lever:
When you change a DNS record and it "takes hours to propagate," nothing is actively propagating: caches around the world are simply still serving the old answer until its TTL expires. The fix when planning a change: lower the TTL ahead of time (e.g. to 60s), wait at least as long as the old TTL so every cache has picked up the shorter value, make the change, then raise the TTL again. High TTLs mean fast lookups but slow changes; low TTLs mean the opposite.
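The "propagation" effect falls out of how a TTL cache behaves. Below is a minimal sketch of a resolver-style cache; the class name `TTLCache` and the injected `clock` callable are assumptions chosen so the behavior can be exercised without waiting in real time.

```python
class TTLCache:
    """Minimal resolver-style cache: entries expire once their TTL has elapsed."""

    def __init__(self, clock):
        self._clock = clock        # callable returning the current time in seconds
        self._entries = {}         # name -> (value, expires_at)

    def put(self, name, value, ttl):
        self._entries[name] = (value, self._clock() + ttl)

    def get(self, name):
        entry = self._entries.get(name)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[name]    # expired: the next lookup must go upstream
            return None
        return value
```

If an answer is cached with a TTL of 3600 and the authoritative record changes a minute later, `get` keeps returning the old address until the clock reaches 3600; only then does the cache miss and force a fresh lookup. With a TTL of 60 the same change would be visible within a minute.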
DNS as an Infrastructure Tool
Because the resolver asks "what IP for this name?", the authoritative server can answer differently depending on who's asking or current conditions — turning DNS into a routing and load-balancing layer:
- Round-robin / weighted DNS — return multiple A records (or rotate/weight them) to spread clients across servers, a crude but real load balancer (see load balancing).
- GeoDNS — return a different IP based on the resolver's location, sending users to the nearest datacenter/edge — one way CDNs route traffic.
- DNS failover — health-check endpoints and stop returning the IP of a dead region, steering traffic away (bounded by TTL — which is why low TTLs matter for failover).
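The three techniques in this list can be sketched as answer-selection logic on the authoritative side. This is an illustration under stated assumptions, not any provider's implementation: the function names, the region keys, and the addresses are hypothetical, and the health-check results are simply passed in as a set.

```python
import random

# Hypothetical region -> address table for the GeoDNS case.
GEO_ANSWERS = {"us": "203.0.113.10", "eu": "203.0.113.20"}


def geo_answer(region, default="203.0.113.10"):
    """GeoDNS: answer based on where the resolver appears to be."""
    return GEO_ANSWERS.get(region, default)


def pick_address(records, healthy, rng=random):
    """Weighted selection with failover.

    records: list of (ip, weight) pairs.
    healthy: set of IPs currently passing health checks.
    Returns one IP, or None if nothing healthy is left.
    """
    candidates = [(ip, weight) for ip, weight in records
                  if ip in healthy and weight > 0]
    if not candidates:
        return None
    ips = [ip for ip, _ in candidates]
    weights = [weight for _, weight in candidates]
    # random.choices draws proportionally to the given weights.
    return rng.choices(ips, weights=weights, k=1)[0]
```

With weights 3 and 1, roughly three quarters of answers point at the first address. Removing an address from `healthy` stops it from being returned, although clients that cached it earlier keep using it until the TTL expires.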
Anycast
Anycast is the technique that lets a single IP address be announced from many physical locations at once; internet routing delivers each client to the topologically nearest one. It's how the 13 root server identities actually run as more than a thousand instances worldwide, and how CDNs and big DNS providers achieve both low latency and resilience: if one location goes down, routing simply sends clients to the next-nearest. (Contrast with GeoDNS, which returns different answers; anycast uses the same IP and lets the network pick the site.)
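The difference from GeoDNS can be shown with a small model in which the answer never changes and only the chosen site does. This is a deliberate simplification: real route selection is done by BGP using path attributes and operator policy, and the site names and cost numbers here are made up.

```python
def anycast_route(path_costs, sites_up):
    """Pick the announcing site with the lowest path cost for one client.

    path_costs: dict of site -> cost as seen from the client (illustrative).
    sites_up: set of sites currently announcing the shared IP.
    """
    reachable = {site: cost for site, cost in path_costs.items()
                 if site in sites_up}
    if not reachable:
        return None
    return min(reachable, key=reachable.get)
```

A client whose costs are `{"fra": 2, "nyc": 5, "sin": 9}` lands on `fra`; if `fra` withdraws its announcement, the same IP is served from `nyc` with no DNS change and no TTL to wait for.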
Reliability and Security
Because virtually everything begins with a DNS lookup, DNS is a high-impact single point of failure: the 2016 Dyn attack took down major sites not by attacking them but by DDoSing their DNS provider. Mitigations: use multiple/redundant DNS providers, anycast, and sensible TTLs. On security: plain DNS is unauthenticated and unencrypted. DNSSEC adds cryptographic signatures so a validating resolver can reject forged/spoofed answers (cache poisoning); it provides integrity, not confidentiality. DoH/DoT (DNS over HTTPS/TLS) encrypt the query between the client and its resolver, so intermediaries on that path can't see or tamper with your lookups.
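Why a second provider helps can be sketched as a simple fallback loop. The provider functions below are hypothetical stand-ins that either return an address or raise `TimeoutError`; real resolvers handle retries and server selection in more elaborate ways.

```python
def query_with_fallback(providers, name):
    """Try each provider in order and return the first answer.

    providers: list of (label, lookup) pairs, where lookup(name) returns an
    IP string or raises TimeoutError. Returns (answer, labels_that_failed).
    """
    failed = []
    for label, lookup in providers:
        try:
            return lookup(name), failed
        except TimeoutError:
            failed.append(label)       # this provider is down, try the next
    raise TimeoutError(f"all providers failed for {name}: {failed}")


def provider_down(name):
    """Stand-in for a provider that is under attack and never answers."""
    raise TimeoutError("no response")


def provider_up(name):
    """Stand-in for a healthy provider serving the same zone."""
    return {"sdeoffer.com": "203.0.113.10"}[name]
```

With only `provider_down` configured, every lookup fails, which is what a single-provider outage looks like from the outside. Adding `provider_up` as a second entry lets the lookup succeed.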
In System Design
DNS shows up in most designs as the very first hop: it routes users to a CDN edge or a regional load balancer, enables blue-green/region failover (repoint a name), and underlies service discovery (internal services resolve each other by name, often via the platform's DNS). The recurring interview point is to remember TTL: anything you want to change quickly (failover, cutover) needs a low TTL set in advance.
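The TTL point can be made numeric with a rough model: clients may keep hitting a dead endpoint for the time it takes health checks to notice the failure plus one full TTL. The function and parameter names are assumptions for illustration, and the model assumes every cache honors the TTL.

```python
def worst_case_stale_seconds(check_interval, failures_to_trip, ttl):
    """Upper bound on how long clients may still be sent to a dead endpoint.

    check_interval: seconds between health checks.
    failures_to_trip: consecutive failed checks before the record is pulled.
    ttl: TTL of the record, in seconds.
    """
    detection = check_interval * failures_to_trip   # time to notice the failure
    return detection + ttl                          # plus caches draining
```

With 10-second checks and three failures to trip, a 60-second TTL gives a bound of 90 seconds, while a one-day TTL (86400 seconds) gives 86430 seconds. The TTL dominates, which is why it has to be lowered before the failover is needed.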
Pitfalls
- TTL too high for changes — you can't fail over or migrate fast if the world caches the old IP for a day; pre-lower TTLs.
- DNS as an unguarded SPOF — single provider, no redundancy; a DNS outage looks like a total outage.
- CNAME at the apex — not allowed; use ALIAS/ANAME flattening.
- Stale negative caching: failures (NXDOMAIN) are cached too, for a duration derived from the zone's SOA record, so a misconfiguration can linger after it is fixed.
- Ignoring DNS in latency budgets: a cold lookup adds round trips; dns-prefetch/preconnect help on the web.
DNS is a hierarchical, cached lookup from names to IPs — root → TLD → authoritative, fronted by a recursive resolver — and TTL governs the trade between fast lookups and fast changes. Beyond resolution it's a routing layer (GeoDNS, weighted records, anycast) used for load balancing, geo-routing, and failover. Treat it as critical infrastructure: redundant providers, deliberate TTLs, and DNSSEC/encrypted DNS.
Walk through a DNS lookup. Client → recursive resolver → (cache miss) root → TLD (.com) → authoritative nameserver → A record; resolver caches by TTL and returns the IP.
Recursive vs authoritative? The recursive resolver does the legwork and caches; authoritative servers hold the real records for a domain. Root/TLD servers delegate downward.
What is "DNS propagation"? Not active propagation — caches serving the old record until its TTL expires. Lower TTL before a planned change.
GeoDNS vs anycast? GeoDNS returns a different IP based on location; anycast announces one IP from many sites and lets routing pick the nearest.
Why is DNS a reliability concern? Everything starts with a lookup, so a DNS outage looks total (cf. the Dyn attack); use redundant providers, anycast, and sensible TTLs. DNSSEC addresses forged answers, not availability.