A hotel booking platform is a two-sided marketplace with strict correctness requirements on the critical path: double-booking a room is a severe user experience failure, while incorrectly blocking available inventory costs revenue. The core design challenge is maintaining accurate room availability under concurrent bookings across a global user base, across a stay that spans multiple calendar days, with a checkout flow that must survive network failures, payment timeouts, and user abandonment — all without ever selling the same night to two different guests. Beyond that correctness challenge, the platform must serve fast, filtered, geo-aware search results across hundreds of thousands of properties, manage complex cancellation and refund policies per-hotel, and provide business analytics to hotel operators in near real time.

⚡ Quick Takeaways
  • SQL throughout — hotel/room catalog is bounded and structured; booking demands ACID; volume doesn't justify NoSQL complexity.
  • DB-level constraint prevents oversellCHECK (available_count >= 0) on the room_availability table; concurrent decrements to -1 fail at the database, not the application layer.
  • Optimistic locking for low-contention — use a version column; retry on conflict; no row-level lock held between the user's search and their booking confirmation.
  • Redis TTL hold (~10 min) — soft-reserves a room during checkout; auto-releases on TTL expiry if user abandons without completing payment.
  • Kafka → Elasticsearch — hotel/room create or update events feed the search index asynchronously; fully-booked rooms are removed from results automatically.
  • Idempotent booking API — client-supplied idempotency key prevents duplicate reservations on retried requests (double-tap, network retry).
  • S3 + CDN for media — hotel images live in object storage; CDN edge nodes serve them globally with low latency.
tldr

Use SQL throughout for ACID correctness — hotel and room data is bounded in size, and bookings demand transaction integrity. A row-level constraint on available_count prevents overselling at the database level. A Redis hold with ~10-minute TTL bridges the checkout gap without permanently blocking inventory. Kafka feeds listing changes into Elasticsearch for search. CDN distributes hotel media globally. Cancellation and refund are their own idempotent state machine paths.

High-level hotel booking app architecture
High-level hotel booking app architecture

Step 1 — Clarify Requirements

Before drawing boxes, scope the system. Hotel booking covers a lot of ground — narrow it to a defensible, coherent subset.

Functional requirements (Hotel Admin)

Functional requirements (User)

Non-functional requirements

interview tip

Explicitly scope out: loyalty programs, third-party OTA (Online Travel Agency) integration, dynamic room upgrade logic, fraud detection. These are real product features but eating the full scope in 45 minutes leaves no time for the hard parts — concurrency control and the booking state machine.

Step 2 — Capacity Estimation

Hotel booking is a read-heavy, write-careful system. Assume a large OTA (Booking.com / Expedia scale): 500M hotels globally, 1M bookings/day.

Traffic

Storage

note

The data volumes confirm that SQL is the right primary store — there is no scale justification for NoSQL complexity. The hard problem is not storage but concurrent write correctness on the availability table.

Step 3 — API Design

Clean REST surface across the hotel admin and user-facing services:

REST API
# Hotel admin — listing management
POST   /hotels              -- create hotel
PUT    /hotels/{id}         -- update hotel
GET    /hotels/{id}         -- get hotel details
PUT    /hotels/{id}/rooms/{room_id}   -- update room
GET    /hotels/{id}/rooms             -- list rooms for hotel
PUT    /rooms/{room_id}/availability  -- set available_count per date range

# User — search and discovery
GET    /hotels/search?city=Paris&checkin=2025-07-01&checkout=2025-07-05&guests=2&max_price=200
GET    /hotels/{id}/availability?checkin=...&checkout=...

# User — booking (idempotent via Idempotency-Key header)
POST   /bookings
       Idempotency-Key: "client-uuid-123"
       { hotel_id, room_id, checkin, checkout, guests, payment_method_id }
       → 201 { booking_id, status: "RESERVED", hold_expires_at }

GET    /bookings/{id}       -- booking status and details
DELETE /bookings/{id}       -- cancel a booking (triggers refund)
PUT    /bookings/{id}/modify -- change dates (if hotel policy allows)

The POST /bookings endpoint carries an Idempotency-Key header. A retried request with the same key (double-tap, network timeout) returns the existing booking rather than creating a second one. This is essential because the booking flow involves a payment charge — without idempotency, a network glitch between the client and server can result in two charges to the guest's card.

Step 4 — Hotel and Room Listing Service

The hotel and room catalog is stored in a SQL database. The rationale is deliberately simple: the number of hotels in the world is finite and grows predictably. A relational database is the right fit — the data is structured, growth is foreseeable, and the query patterns (look up hotel by ID, list rooms by hotel) are well-suited to SQL indexes.

Hotel images and videos are stored in AWS S3 (or equivalent object storage). The SQL database holds references (URLs) to these media assets. A CDN sits in front of S3 to serve media to a global user base with low latency — fetching a hotel image from a geographically proximate CDN edge node is orders of magnitude faster than hitting the origin bucket in us-east-1.

When hotels or rooms are created or updated, the changes are published to Kafka as events. A consumer reads these events and updates the Elasticsearch search index asynchronously, keeping search results eventually consistent with the source of truth. Fully-booked rooms are removed from results as their available_count drops to zero — this removal flows through the same CDC pipeline, not through a direct Elasticsearch write on the booking path.

Step 5 — Data Model

The core tables span the hotel catalog and the booking system. All live in SQL for ACID correctness:

SQL schema
CREATE TABLE hotels (
  id            BIGINT PRIMARY KEY,
  name          VARCHAR(255) NOT NULL,
  city          VARCHAR(100),
  country       VARCHAR(2),     -- ISO 3166-1 alpha-2
  lat           DECIMAL(9,6),
  lng           DECIMAL(9,6),
  star_rating   SMALLINT,
  description   TEXT,
  amenities     JSONB,            -- pool, wifi, parking, pet-friendly...
  created_at    TIMESTAMP
);

CREATE TABLE rooms (
  id            BIGINT PRIMARY KEY,
  hotel_id      BIGINT REFERENCES hotels(id),
  room_type     VARCHAR(64),    -- STANDARD|DELUXE|SUITE
  max_guests    SMALLINT,
  base_rate_cents INT,
  description   TEXT
);

CREATE TABLE room_availability (
  room_id         BIGINT REFERENCES rooms(id),
  date            DATE,
  available_count INT NOT NULL CHECK (available_count >= 0),
  price_cents     INT NOT NULL, -- per-night rate for this specific date
  version         INT NOT NULL DEFAULT 0, -- optimistic lock
  PRIMARY KEY (room_id, date)
);

CREATE TABLE bookings (
  id              BIGINT PRIMARY KEY,
  user_id         BIGINT NOT NULL,
  room_id         BIGINT NOT NULL,
  checkin_date    DATE   NOT NULL,
  checkout_date   DATE   NOT NULL,
  guest_count     SMALLINT,
  total_cents     BIGINT NOT NULL,
  status          VARCHAR(32),  -- RESERVED|BOOKED|CANCELLED|COMPLETED
  idempotency_key VARCHAR(64) UNIQUE,
  payment_intent_id VARCHAR(128),
  hold_expires_at TIMESTAMP,   -- Redis TTL mirrored here for recovery
  created_at      TIMESTAMP,
  updated_at      TIMESTAMP
);

Key schema decisions: the CHECK (available_count >= 0) constraint is the database-level guard against overselling. The version column enables optimistic concurrency control. The idempotency_key UNIQUE constraint prevents duplicate bookings at the DB level. The price_cents column on room_availability captures the per-night rate at booking time — the price can change after booking but the guest's confirmed rate is locked in the booking record's total_cents.

Hotel and room database schema
Hotel and room database schema

Step 6 — Search Service

Search is powered by an Elasticsearch cluster (Solr is a comparable alternative — both are built on Apache Lucene). For a hotel platform where listings update continuously (price changes, availability, new properties), Elasticsearch is the better fit due to its real-time indexing and built-in geo-distance queries.

FeatureElasticsearchSolr
Best forTime-series data, real-time indexing, geo queriesStatic data sets with heavy caching
Fuzzy searchExcellent (Levenshtein automata)Good
Geo-distance filterNative geo_point type + distance filterSupported via spatial module
Type-aheadBuilt-in completion suggesterSupported via edge n-gram

Search query flow

A typical search — "hotels in Paris, 2 adults, July 1–5, max €200/night" — executes as follows:

  1. Elasticsearch query — filter by city = "Paris" (or geo_distance from "Paris" centroid), min star rating, amenities. Return hotel IDs sorted by relevance score (blending text match, popularity, conversion rate).
  2. Availability filter — for each returned hotel, query the room_availability SQL table to verify at least one room type has available_count > 0 for every night in the requested range and price_cents <= 200 × 100. Filter out hotels with no qualifying rooms.
  3. Price computation — for qualifying hotels, sum the per-night price_cents across the stay dates, apply promotions and loyalty discounts, compute the total displayed price.
  4. Return results — paginated list of hotels with thumbnail, nightly rate, total stay price, availability badge.
search caching

Search results for popular city+date combinations (e.g., "Paris July 4th weekend") can be cached in Redis with a short TTL (60–120 s). Availability changes invalidate the cache for affected hotel IDs. This significantly reduces Elasticsearch and SQL load for high-traffic search queries without serving stale availability data for long.

Step 7 — Booking Service and Concurrency Control

The booking service uses a SQL database for its ACID guarantees — a double-booking is unacceptable, and ACID transactions are the clean way to prevent it. The central challenge is multi-night availability: a 5-night stay requires available_count > 0 on five separate rows in room_availability, and all five decrements must succeed atomically or all fail — there is no valid partial booking.

Pessimistic locking

One approach is to lock all rows for the stay upfront using SELECT ... FOR UPDATE, then decrement and commit. This is safe but holds locks for the duration of the transaction — including any external calls (e.g., payment processing) that happen inside the transaction boundary. Holding DB locks while waiting for a Stripe API response (which can take 2–5 seconds) is a recipe for connection pool exhaustion under load.

Optimistic locking (preferred)

The preferred approach uses the version column on room_availability for optimistic concurrency control. No locks are held between the user's search and their booking confirmation — the transaction is only opened at the moment of the actual write:

SQL
-- Step 1: Read current state (no lock)
SELECT date, available_count, version
FROM room_availability
WHERE room_id = 42
  AND date BETWEEN '2025-07-01' AND '2025-07-04';
-- → returns 4 rows with version=7,8,7,9 and available_count=3,3,2,3

-- Step 2: Begin transaction; attempt conditional decrement on each date
BEGIN;
UPDATE room_availability
  SET available_count = available_count - 1,
      version = version + 1
  WHERE room_id = 42
    AND date = '2025-07-01'
    AND version = 7        -- optimistic lock: fails if row was modified
    AND available_count > 0; -- prevents going negative
-- Repeat for all dates in stay
-- If any UPDATE returns 0 rows affected → ROLLBACK (someone beat us to it)
-- If all succeed → INSERT booking record → COMMIT

If the version check fails (another concurrent booking modified one of the rows), the transaction rolls back immediately and the service retries with fresh data. Under low contention (most nights), the first attempt succeeds. Under high contention (popular hotel on peak weekend), the service may retry 1–3 times before succeeding or returning a "room no longer available" error. The CHECK (available_count >= 0) constraint is a second line of defense at the database level — even if application logic has a bug, the DB rejects any decrement below zero.

why not 2PC across services?

Two-phase commit across booking and payment services would give atomic checkout — either both succeed or neither does. But 2PC holds locks across services for the full transaction duration, creating a coordinator bottleneck and cratering availability under load. The chosen approach (Redis hold + optimistic lock on booking + async payment confirmation) trades strict atomicity for high availability, with explicit compensating actions for the failure cases.

Step 8 — Preventing Overselling During Checkout

Between when a user selects a room and when they complete payment, other users might book the same room. The solution mirrors the e-commerce cart pattern: a Redis hold with a ~10-minute TTL soft-reserves the room during checkout. This creates a two-phase booking flow:

flow
-- Phase 1: Reserve (immediate)
User selects room + dates
  → Booking Service: BEGIN TX
      Decrement available_count for each stay date (optimistic lock)
      Insert booking with status=RESERVED, hold_expires_at=NOW()+10min
    COMMIT
  → Redis: SET hold:{booking_id} 1 EX 600   (mirrors the DB expiry)
  → Return booking_id + checkout URL to user

-- Phase 2: Confirm (within 10 min)
User enters payment details + submits
  → Payment Service: charge card (Stripe PaymentIntent)
      → webhook: payment.succeeded
  → Booking Service: UPDATE booking SET status=BOOKED
  → Notification: send confirmation email + calendar invite

-- Timeout path (user abandons)
Redis TTL expires (10 min)
  → Expiry listener / background job detects hold:* key deleted
  → Booking Service: UPDATE booking SET status=CANCELLED
      Re-increment available_count for each stay date
  → Notification: "your hold has expired"

The Redis hold TTL expiry path requires a reliable mechanism to restore the availability count when a user abandons checkout. Two approaches:

Booking schema and status lifecycle
Booking schema and status lifecycle

Step 9 — Booking State Machine

An explicit state machine prevents illegal transitions and makes the booking lifecycle auditable. Each state transition is driven by a business event and is idempotent — receiving the same event twice in the same state is a no-op:

states
RESERVED         ← room held, checkout in progress (Redis TTL active)
  │ payment.succeeded webhook
  ▼
BOOKED           ← confirmed; room locked for guest's dates
  │ guest checks out / stay date passes
  ▼
COMPLETED        ← stay has occurred (terminal)

  From RESERVED:
  → HOLD_EXPIRED     (TTL expired without payment; room released)
  → PAYMENT_DECLINED (payment failed; room released)
  → CANCELLED_BY_USER (user cancelled before payment completed)

  From BOOKED:
  → CANCELLED_BY_USER   (cancellation within cancellation policy window)
  → CANCELLED_BY_HOTEL  (hotel cancels due to overbooking/ops issue)
  → NO_SHOW             (guest didn't arrive; charged no-show fee)

All CANCELLED_* states trigger:
  → re-increment available_count for each stay date
  → initiate refund (if applicable, per hotel policy)
  → notify guest and hotel admin

The state machine is persisted in SQL as the status column on the bookings table, guarded by application-level transition checks. Each transition also emits a Kafka event, allowing downstream services (notifications, analytics, revenue reporting) to react without being directly coupled to the booking service.

Step 10 — Payment Processing and Idempotency

Payment processing for hotel bookings follows the same principles as e-commerce but with a critical timing difference: hotel payments are often authorized at booking time but captured at check-in (or at check-out for post-pay hotels). This authorization-capture split gives the hotel a guarantee of funds while giving the guest flexibility — but it adds complexity to the payment state machine.

Payment flow

  1. During checkout, create a PaymentIntent with the payment processor (Stripe) for the full stay amount. This authorizes the card without charging it — the hold appears on the guest's card statement but no money moves.
  2. On check-in (or immediately for pre-pay hotels), capture the authorized PaymentIntent. If the card authorization has expired (typically 7 days), request a new authorization.
  3. On cancellation, release the authorization or issue a refund if already captured, according to the hotel's cancellation policy (free cancellation until X days before check-in; partial refund; no refund).

Idempotency at every step

edge case

What if the capture succeeds at the payment processor but the webhook confirming success is lost? The booking is stuck in RESERVED state with the guest's money actually captured. A nightly reconciliation job compares booking records in RESERVED state against the processor's settled transactions — if a PaymentIntent is captured in Stripe but the booking is RESERVED in our DB, promote the booking to BOOKED. This pattern (optimistic local state + reconciliation) is essential for any payment-adjacent system.

Step 11 — Cancellation and Refunds

Cancellation is its own state machine within the booking lifecycle. Hotel cancellation policies vary widely — from "free cancellation any time" to "no refund within 48 hours of check-in" — and the booking service must enforce these rules at cancellation time, not just at booking time.

Cancellation policy enforcement

Availability restoration on cancellation

When a booking is cancelled, the availability count must be restored for each night of the cancelled stay. This is the compensating action in the booking Saga:

SQL
-- Compensating action: restore availability on cancellation
BEGIN;
UPDATE bookings
  SET status = 'CANCELLED_BY_USER', updated_at = NOW()
  WHERE id = :booking_id
    AND status IN ('RESERVED', 'BOOKED');  -- idempotent: no-op if already cancelled

UPDATE room_availability
  SET available_count = available_count + 1
  WHERE room_id = :room_id
    AND date BETWEEN :checkin AND :checkout - INTERVAL '1 day';
COMMIT;
-- Then: issue refund via Payment Service (async)
-- Then: publish BookingCancelled event to Kafka

Step 12 — Availability Calendar

The availability calendar is the data structure that drives both search filtering and the room detail view ("show me which dates are available for this room type"). The room_availability table with a (room_id, date) primary key is the authoritative source. For the user-facing calendar UI, a pre-computed cache is essential:

multi-night availability check

A search for "Paris, July 1–5" must verify that at least one room type has availability on all four nights (Jul 1, 2, 3, 4). A SQL query on room_availability with a GROUP BY and HAVING count = 4 works correctly. For search-time performance at scale, pre-compute per-room "available from date X to date Y" ranges and index them in Elasticsearch using a date range field. This allows Elasticsearch to do the filtering without hitting the SQL DB on every search hit.

Step 13 — Analytics and Reporting

The Analytics Service consumes two event streams from Kafka: search actions (what users search for, which results they click) and booking transactions (bookings, cancellations, revenue). These feeds power hotel admin dashboards and platform-level reporting:

Analytics events flow from Kafka into a data warehouse (ClickHouse or Redshift) where SQL aggregations run without touching the operational DB. Hotel admin dashboards query the warehouse; the operational booking service is never used for analytics queries.

Step 14 — Scaling and Fault Tolerance

Hotel booking is read-dominated (searches vastly outnumber bookings) but write-critical (booking writes must be correct). The scaling strategy differs for each:

Search scaling

Booking write scaling

Fault tolerance

Step 15 — Why Not Cassandra for Bookings?

Unlike e-commerce order history (high volume, append-heavy, suitable for Cassandra archival), hotel booking volume is low enough that a SQL primary can retain the full booking history without a dedicated archival service. The correctness benefits of ACID transactions outweigh the scalability benefits of NoSQL at this data volume. Specifically:

Step 16 — Key Tradeoffs

DecisionChoiceTrade-off accepted
Primary storeSQL throughoutLower horizontal write scale than NoSQL; mitigated by the naturally low booking write rate
Concurrency controlOptimistic lockingRetry needed on conflict; under high contention for hot rooms, retry rate spikes — add exponential backoff
Search indexElasticsearch (eventual)Newly booked rooms may appear available in search for seconds until Kafka propagates; compensate with short-TTL availability cache
Checkout holdRedis TTL (~10 min)Room blocked for 10 min even if user abandons; mitigate with shorter TTL (5 min) and clear UI warning
Payment timingAuthorize now, capture at check-inAuthorization can expire (7 days) for far-future bookings; must re-authorize if needed
AnalyticsKafka → data warehouseAnalytics are eventually consistent (minutes to hours lag); acceptable for reporting use cases
takeaway

Hotel booking design is all about consistency: SQL + row-level constraints prevent double-booking at the database level; Redis TTL holds solve the checkout gap without permanently blocking inventory; optimistic locking keeps transactions short and highly concurrent; Kafka feeds Elasticsearch asynchronously to keep search accurate. The "why not NoSQL" answer is simple — booking volume doesn't justify sacrificing ACID, and hotel counts are bounded. The idempotency key on the booking API is the difference between a correct system and one that double-charges guests on network retries.

🎯 interview hot-takes

How do you prevent double-booking under concurrent requests? A CHECK (available_count >= 0) constraint on the room_availability table makes any transaction that would decrement below zero fail at the DB level. Optimistic locking with a version column ensures two concurrent bookings for the same room+date can't both succeed — one will retry with fresh data.
Why not use Cassandra for bookings like e-commerce uses it for order history? Hotel booking volume is low enough for SQL to retain full history; ACID transactions cleanly prevent double-bookings; Cassandra's eventual consistency model would require complex application-level conflict resolution.
What happens if a user abandons checkout after selecting a room? The Redis TTL (~10 min) expires automatically, triggering a background job (or keyspace notification handler) that re-increments available_count for each stay date — no manual cleanup or cron job needed.
How do you make the booking API idempotent? Require a client-generated Idempotency-Key header; store it in the bookings table as a UNIQUE column. A retried request with the same key returns the existing booking rather than creating a duplicate — preventing double-charges on network retries.
How do you handle the payment processor timing out during booking confirmation? Store the PaymentIntent ID in the booking record before making the capture call. Retry capture with exponential backoff using the same PaymentIntent ID (Stripe deduplicates). A nightly reconciliation job catches any booking in RESERVED state whose PaymentIntent has been captured — promotes them to BOOKED.

← previous
Design a URL Shortener