Object Storage & S3

Object storage is the quiet workhorse behind half the internet: every uploaded photo, video, backup, and static asset almost certainly lives in an object store like Amazon S3, Google Cloud Storage, or Azure Blob. It's the durable, effectively-infinite bucket our Google Drive and Pastebin designs offload their bytes to. What makes it different from a database or a filesystem is a deliberate set of constraints — a flat namespace, immutable objects, and an HTTP API — that trade convenience for near-limitless scale and durability.

⚡ Quick Takeaways

Objects, not files or blocks — each object is a blob + a key + metadata, stored in a flat namespace (no real directories), accessed over HTTP.
Built for durability — ~11 nines via replication across AZs and erasure coding; you offload "never lose this" to the storage layer.
Effectively infinite + cheap — scale to exabytes with no capacity planning; pay per GB stored and per request.
Immutable objects — you replace a whole object, not edit in place; there's no cheap "rename" or "append."
Big files use multipart upload; clients transfer directly via presigned URLs so app servers never proxy the bytes.
Storage classes / lifecycle tier data from hot to archive to cut cost; pair with a CDN for fast global reads.

tldr

Object storage keeps immutable blobs addressed by key in a flat namespace, exposed over an HTTP API. It achieves ~11 nines of durability with erasure coding and cross-AZ replication, scales to exabytes with no provisioning, and is cheap, billed per GB stored and per request. Upload big files with multipart and let clients transfer directly via presigned URLs. It's not a filesystem (no rename, listing is paginated and not free) and not a database (no queries); it's the durable bucket you put bytes in and front with a CDN.

Object vs Block vs File Storage

There are three storage paradigms, and choosing object storage means accepting its model on purpose:

Aspect	Object (S3)	Block (EBS)	File (NFS)
Unit	Object (blob + metadata)	Fixed-size blocks	Files in directories
Access	HTTP API by key	Attached as a disk	Mounted, POSIX paths
Mutation	Replace whole object	Edit any block	Edit in place
Scale	Effectively unlimited	Per-volume limits	Limited by server
Best for	Blobs: media, backups, assets	Databases, OS disks	Shared app files

Block storage is a raw disk (great for a database's files); file storage is a shared, mountable hierarchy (great for legacy apps). Object storage gives up in-place edits and a real directory tree in exchange for limitless scale and a simple network API — ideal for write-once, read-many blobs.

Anatomy of an Object and the Flat Namespace

An object is three things: the data (the bytes), a unique key (its name within a bucket), and metadata (content type, size, custom tags). Crucially, the namespace is flat — there are no real folders. A key like 2023/09/photo.jpg looks hierarchical, but the slashes are just characters in the key; "folders" are a UI convenience produced by listing keys with a common prefix. This flatness is what lets the system scale: there's no directory tree to traverse or lock, just a giant distributed key→blob map.
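
To make the "folders are just prefixes" point concrete, here is a minimal sketch that models a bucket as a plain Python dict and derives folder-like groupings purely from key strings. The function name list_keys and the sample bucket are illustrative assumptions, not any provider's API; real stores expose a similar prefix/delimiter listing over HTTP.

```python
from typing import Dict, List, Tuple


def list_keys(store: Dict[str, bytes], prefix: str = "",
              delimiter: str = "/") -> Tuple[List[str], List[str]]:
    """Return (objects, folders) found directly under `prefix` in a flat key map."""
    objects: List[str] = []
    folders = set()
    for key in sorted(store):
        if not key.startswith(prefix):
            continue
        rest = key[len(prefix):]
        if delimiter and delimiter in rest:
            # Everything up to the next delimiter is rolled up into one "folder".
            folders.add(prefix + rest.split(delimiter, 1)[0] + delimiter)
        else:
            objects.append(key)
    return objects, sorted(folders)


# Hypothetical bucket contents: just keys mapped to bytes, no directory objects.
bucket = {
    "2023/09/photo.jpg": b"...",
    "2023/09/video.mp4": b"...",
    "2023/10/photo.jpg": b"...",
    "readme.txt": b"...",
}

top_objects, top_folders = list_keys(bucket)
sept_objects, sept_folders = list_keys(bucket, prefix="2023/09/")
```

Nothing in the dict represents a directory; the "2023/" folder only appears because several keys happen to share that prefix.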

The HTTP API

You interact with object storage over plain HTTP verbs against a key:

object storage API

PUT    /my-bucket/2023/09/photo.jpg     # upload (replaces if exists)
GET    /my-bucket/2023/09/photo.jpg     # download
DELETE /my-bucket/2023/09/photo.jpg     # remove
GET    /my-bucket?prefix=2023/09/       # list keys under a prefix (paginated)

# big files: multipart upload (parallel, resumable)
initiate → upload part 1..N (in parallel) → complete
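
The initiate → upload parts → complete flow can be sketched against an in-memory stand-in. FakeMultipartStore and multipart_upload are hypothetical names invented for this example, and the thread pool size is an arbitrary assumption; a real client would send each part as its own HTTP request.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple


class FakeMultipartStore:
    """Hypothetical in-memory stand-in for a store's multipart API."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}
        self._uploads: Dict[str, Tuple[str, Dict[int, bytes]]] = {}
        self._next_id = 1

    def initiate(self, key: str) -> str:
        upload_id = f"upload-{self._next_id}"
        self._next_id += 1
        self._uploads[upload_id] = (key, {})
        return upload_id

    def upload_part(self, upload_id: str, part_number: int, data: bytes) -> int:
        # Re-sending a part number overwrites it, which keeps retries safe.
        self._uploads[upload_id][1][part_number] = data
        return part_number

    def complete(self, upload_id: str, part_numbers: List[int]) -> None:
        key, parts = self._uploads.pop(upload_id)
        # Reassemble in part-number order, regardless of arrival order.
        self.objects[key] = b"".join(parts[n] for n in sorted(part_numbers))


def multipart_upload(store: FakeMultipartStore, key: str,
                     data: bytes, part_size: int) -> int:
    """Upload `data` in parallel parts; returns the number of parts sent."""
    chunks = [data[i:i + part_size] for i in range(0, len(data), part_size)]
    upload_id = store.initiate(key)
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [
            pool.submit(store.upload_part, upload_id, number, chunk)
            for number, chunk in enumerate(chunks, start=1)
        ]
        sent = [f.result() for f in futures]
    store.complete(upload_id, sent)
    return len(sent)
```

Because each part is addressed by its number, a failed part can be retried on its own without restarting the whole upload, which is where the resumability comes from.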

Two patterns matter at scale. Multipart upload splits a large object into parts uploaded in parallel and reassembled server-side — resumable and fast for multi-GB files. Presigned URLs are time-limited, signed links that let a client PUT/GET an object directly to/from the store without routing the bytes through your application servers — essential for keeping bulk traffic off your backend (exactly how the Drive design moves chunks).
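
The idea behind a presigned URL can be illustrated with a deliberately simplified HMAC scheme. This is not the signing algorithm of S3 or any other provider; the secret, the host storage.example.com, and the query parameter names are all assumptions made up for this sketch. It only shows why the link is safe to hand to a client: it is bound to one method, one key, and an expiry time.

```python
import hashlib
import hmac
from urllib.parse import parse_qs, urlencode, urlparse

SECRET = b"demo-secret"  # hypothetical signing key that never leaves the server


def _sign(method: str, path: str, expires: int) -> str:
    message = f"{method}\n{path}\n{expires}".encode()
    return hmac.new(SECRET, message, hashlib.sha256).hexdigest()


def presign(method: str, path: str, now: int, ttl_seconds: int) -> str:
    """Build a time-limited URL authorizing one method on one object key."""
    expires = now + ttl_seconds
    query = urlencode({"expires": expires,
                       "signature": _sign(method, path, expires)})
    return f"https://storage.example.com{path}?{query}"


def verify(method: str, url: str, now: int) -> bool:
    """What the store would check before accepting the request."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires = int(params["expires"][0])
        signature = params["signature"][0]
    except (KeyError, ValueError):
        return False
    if now > expires:
        return False  # link has expired
    expected = _sign(method, parsed.path, expires)
    return hmac.compare_digest(signature.encode(), expected.encode())


# The app server issues the URL; the client then talks to the store directly.
upload_url = presign("PUT", "/my-bucket/2023/09/photo.jpg", now=1000, ttl_seconds=300)
```

The application server only does the cheap signing step; the bytes themselves travel between the client and the store.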

Durability via Replication and Erasure Coding

The headline feature is durability. Providers advertise ~11 nines (99.999999999%) as an annual, per-object design target: at that rate, storing 10 million objects means expecting to lose roughly one every 10,000 years. Two techniques get there. Replication stores copies across multiple availability zones (physically separate datacenters), so a fire or flood in one doesn't lose data. Erasure coding is cleverer and cheaper than full replication: it splits an object into k data fragments plus m parity fragments, and can reconstruct the object from any k of the k+m total. With (say) 10 data + 4 parity, you tolerate losing any 4 fragments while using only 1.4× storage instead of 3× for triple replication.

erasure coding: durability without 3x cost

object → split into k=10 data + m=4 parity = 14 fragments
         spread across 14 disks / zones

  rebuild from ANY 10 of 14   → tolerate losing up to 4
  storage overhead = 14/10 = 1.4x   (vs 3x for triple replication)
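
The overhead arithmetic, plus the simplest possible erasure code, can be sketched in a few lines. Note the assumption: the code below uses a single XOR parity fragment (m = 1), which can repair only one lost fragment. Production systems use codes such as Reed-Solomon to support m > 1 (like the 10 + 4 layout above); that is not implemented here. All function names are illustrative.

```python
from functools import reduce
from typing import List, Optional


def storage_overhead(k: int, m: int) -> float:
    """Raw bytes stored per byte of user data for k data + m parity fragments."""
    return (k + m) / k


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> List[bytes]:
    """Split into k equal data fragments and append one XOR parity fragment."""
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\0")
    fragments = [padded[i * size:(i + 1) * size] for i in range(k)]
    return fragments + [reduce(xor_bytes, fragments)]


def decode(fragments: List[Optional[bytes]], length: int) -> bytes:
    """Rebuild the object when at most one fragment (data or parity) is missing."""
    missing = [i for i, f in enumerate(fragments) if f is None]
    if len(missing) > 1:
        raise ValueError("single parity can only repair one lost fragment")
    repaired = list(fragments)
    if missing:
        survivors = [f for f in fragments if f is not None]
        # XOR of all surviving fragments equals the one that was lost.
        repaired[missing[0]] = reduce(xor_bytes, survivors)
    return b"".join(repaired[:-1])[:length]


payload = b"object storage durability demo"
stored = encode(payload, k=4)  # 4 data fragments + 1 parity fragment
```

With this toy code, storage_overhead(4, 1) is 1.25×, while storage_overhead(1, 2) reproduces the 3× cost of triple replication (one original plus two copies).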

Consistency Model

Historically S3 was eventually consistent for some operations: a freshly written object might briefly 404 on read, and an overwrite or delete might briefly return the old data. Modern object stores now offer strong read-after-write consistency: once a PUT succeeds, a subsequent GET returns the new data. This matters for pipelines that write then immediately read. On S3 the same guarantee now covers list operations as well; other stores document their own guarantees, so check listing behavior for the provider you use. There are still no multi-object transactions: each object operation is independent and atomic on its own.

Storage Classes and Lifecycle

Not all data is accessed equally, so object stores offer storage classes at different price/latency points, and lifecycle rules to move objects between them automatically:

Class	Use	Trade-off
Standard (hot)	Frequently accessed	Highest storage cost, instant access
Infrequent access	Accessed monthly	Cheaper storage, retrieval fee
Archive (cold)	Backups, compliance	Very cheap, minutes-to-hours to restore

A lifecycle policy might keep an object in Standard for 30 days, move it to Infrequent Access, then to Archive after a year, and finally expire it — all automatically, slashing cost for aging data.
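
The policy described above boils down to a small age-based rule table. The sketch below is plain Python, not a provider's lifecycle configuration format; the class names are placeholders, and the expiry age is left as a parameter because the text does not specify one.

```python
from typing import List, Optional, Tuple

# (minimum age in days, storage class), mirroring the example policy:
# Standard for 30 days, then Infrequent Access, then Archive after a year.
TRANSITIONS: List[Tuple[int, str]] = [
    (0, "STANDARD"),
    (30, "INFREQUENT_ACCESS"),
    (365, "ARCHIVE"),
]


def storage_class_for(age_days: int,
                      expire_after_days: Optional[int] = None) -> Optional[str]:
    """Return the class an object of this age belongs in, or None once expired."""
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    if expire_after_days is not None and age_days >= expire_after_days:
        return None  # the lifecycle rule would delete the object
    current = TRANSITIONS[0][1]
    for min_age, storage_class in TRANSITIONS:
        if age_days >= min_age:
            current = storage_class
    return current
```

For example, storage_class_for(45) returns "INFREQUENT_ACCESS", and with a hypothetical two-year expiry, storage_class_for(800, expire_after_days=730) returns None.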

Scaling and Performance

Because the namespace is flat, the store partitions the keyspace and scales nearly linearly, with no directory tree to bottleneck. Historically, performance was tied to key prefixes: the store partitioned by prefix, so many keys sharing one prefix could create a hotspot, which is why the advice used to be to randomize prefixes. Modern S3 scales request capacity per prefix automatically, but spreading load across prefixes still helps for extreme throughput. For read-heavy public content, you put a CDN in front so most reads are served from the edge and never hit the origin bucket.
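
The old "randomize prefixes" advice usually meant prepending a short hash to each key so that writes fan out over many prefixes. A minimal sketch follows, assuming a hypothetical spread_key helper and an arbitrary shard count of 16.

```python
import hashlib


def spread_key(key: str, shards: int = 16) -> str:
    """Prepend a stable hash-derived shard prefix so keys spread across prefixes."""
    digest = hashlib.sha256(key.encode()).hexdigest()
    shard = int(digest, 16) % shards
    # Same key always maps to the same shard, so reads can recompute the prefix.
    return f"{shard:02x}/{key}"


# Date-ordered keys would otherwise all land under the same "2023/09/" prefix.
keys = [f"2023/09/photo-{i}.jpg" for i in range(1000)]
sharded = [spread_key(k) for k in keys]
```

The trade-off is that keys no longer list in date order: finding everything for September means listing under every shard prefix, which is one reason the technique is only worth it at extreme request rates.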

What It's Used For

Static assets & media — images, video, JS/CSS bundles, served via CDN.
Backups & archives — durable, cheap, lifecycle-tiered cold storage.
Data lake — raw data for batch/analytics jobs to read directly (schema-on-read).
Application blobs — user uploads, file-sync chunks, generated artifacts (Drive, Pastebin).

Pitfalls

No rename / move — "renaming" means copy-to-new-key + delete-old; moving a big "folder" is many operations.
Listing isn't free — large buckets list in paginated batches and per-request costs add up; don't use listing as a query engine.
Not a database — no queries, joins, or partial updates; pair it with a metadata DB (as the Drive design does).
Request costs — millions of tiny objects can cost more in requests than in storage; batch small items.
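
To see why "renaming a folder" is expensive, here is a sketch that counts the requests involved, using a dict as a stand-in bucket. list_page and rename_prefix are hypothetical helpers written for this example; the page size and the use of the last key as a continuation token are assumptions about how a paginated listing might behave.

```python
from typing import Dict, List, Optional, Tuple


def list_page(store: Dict[str, bytes], prefix: str, max_keys: int,
              start_after: Optional[str] = None) -> Tuple[List[str], Optional[str]]:
    """Return one page of keys under `prefix` plus a token for the next page."""
    keys = sorted(
        k for k in store
        if k.startswith(prefix) and (start_after is None or k > start_after)
    )
    page = keys[:max_keys]
    token = page[-1] if len(keys) > max_keys else None
    return page, token


def rename_prefix(store: Dict[str, bytes], old: str, new: str,
                  page_size: int = 1000) -> int:
    """'Rename a folder' by copy + delete per object; returns requests issued."""
    if new.startswith(old):
        raise ValueError("new prefix must not live under the old prefix")
    requests = 0
    token: Optional[str] = None
    while True:
        page, token = list_page(store, old, page_size, token)
        requests += 1  # one list request per page
        for key in page:
            store[new + key[len(old):]] = store[key]  # copy to the new key
            del store[key]                            # delete the old key
            requests += 2
        if token is None:
            return requests
```

Moving n objects costs roughly 2n requests plus one list request per page, and nothing makes the whole move atomic: a failure partway through leaves some objects under each prefix.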

takeaway

Object storage trades the conveniences of a filesystem (rename, in-place edit, cheap listing) and a database (queries) for three superpowers: limitless scale via a flat namespace, ~11 nines durability via erasure coding and cross-AZ replication, and a dead-simple HTTP API. Use it for immutable blobs, move bytes directly with presigned URLs and multipart, tier with lifecycle rules, and front it with a CDN.

🎯 interview hot-takes

Object vs block vs file? Object = blobs by key over HTTP, flat namespace, limitless (S3); block = raw disk for databases (EBS); file = mountable POSIX hierarchy (NFS).
How does it get 11 nines? Cross-AZ replication plus erasure coding (k data + m parity, rebuild from any k) — durability far cheaper than 3x replication.
Why presigned URLs and multipart? Clients upload/download directly to the store (bytes never touch your servers); multipart parallelizes and resumes large uploads.
Is it a filesystem? No — flat namespace, immutable objects, no cheap rename, paginated non-free listing. "Folders" are just key prefixes.
Why front it with a CDN? Reads are bulk and cacheable; the edge serves hot content so the origin bucket isn't hammered.