Leetcode S17 — Probability & Randomization

Q: How do you make a fair coin from a biased coin?

Toss the biased coin twice. Map 10 to heads and 01 to tails; if you get 11 or 00, discard and retry. It works because P(10) = p(1-p) and P(01) = (1-p)p are exactly equal, so whatever the bias p, the two accepted outcomes are equally likely. This is the von Neumann trick and it is a special case of rejection sampling.

Q: How do you build rand7 from rand5 (or Rand10 from Rand7)?

Combine two rolls as digits, not a sum: rand5*5 + rand5 gives a uniform value in [0,25). Then reject to a clean multiple of the target: keep 0..20 (21 = 3*7 values) and re-roll on 21..24, so each group of three maps to one rand7 value via /3. Never add two rolls — a sum is triangular, not uniform. LC 470 Rand10 from Rand7 is the same technique.

Q: What is reservoir sampling and when do you use it?

Reservoir sampling picks k items uniformly at random from a stream whose length is unknown or too large to store, in one pass and O(k) space. Keep the first k; for the n-th arrival, let it replace a random slot with probability k/n. Induction shows every seen element stays uniformly likely. It powers LC 382 Linked List Random Node and LC 398 Random Pick Index.

Q: How do you do weighted random selection?

Build a prefix-sum array of the weights (the cumulative distribution), draw a uniform integer in [1, total], and binary search for the first bucket whose cumulative sum is at least that value. Construction is O(n), each pick is O(log n). This is LC 528 Random Pick with Weight.

Probability questions are where interviews stop testing whether you can write code and start testing whether you understand what your code actually does. There is no data structure to memorize here — just a handful of moves (expand a range, reject a remainder, swap in place) and one non-negotiable obligation: to prove, out loud, that every outcome is equally likely. This final session of the series walks the canonical set — a fair coin out of a biased one, rand7 from rand5 (and Rand10 from Rand7), building any range from a single random bit, the one correct card shuffle, reservoir sampling for streams you can't store, and weighted random selection — and closes on the interviewer's favorite curveball: "now implement random() without a random function."

⚡ Quick Takeaways

Rejection sampling is the master key — when a source over-covers your target range, keep the values that fit and re-roll the rest. Almost every problem here is a variation on "expand, then reject."
Fair coin from a biased coin — toss twice; 10 → 1, 01 → 0, discard 11/00 and retry. Since P(10) = P(01) = p(1−p), the bias cancels exactly (the von Neumann trick).
Combine, never add — rand5*5 + rand5 is a uniform rand25 (base-5 digits); random bits build any range. A sum of two rolls is triangular, not uniform — a classic trap.
Fisher–Yates or bust — swap position i with a random index in [i, n). The naive "swap with any index" gives biased permutations. O(n), in place.
Reservoir sampling — sample uniformly from a stream of unknown length: the n-th arrival claims the slot with probability k/n. One pass, O(k) space.
Weighted pick = prefix sums + binary search — lay the weights on a cumulative number line, draw one uniform value, binary search the bucket. O(log n) per pick (LC 528).

tldr

Almost every randomization question reduces to two moves: expand a small uniform source into a bigger one (digits, bits, or products), then reject the leftover so what survives stays uniform. Layered on top are three reusable patterns — Fisher–Yates for permutations, reservoir sampling for streams, and prefix-sum + binary search for weighted choice. Learn why each one preserves uniformity and you can re-derive the rest at the whiteboard.

Two Primitives: Uniform Ranges and Rejection Sampling

Before any named algorithm, pin down the two building blocks every question in this session leans on. The first is generating a uniform integer. In Java there are two idioms, and interviewers do notice which one you reach for:

Java — uniform integer helpers

// [0, k):  Math.random() returns a double in [0, 1)
int a = (int) (Math.random() * k);        // note the parens — cast the product, not just Math.random()

// [0, k):  the idiomatic way — no floating point at all
Random random = new Random();
int b = random.nextInt(k);                // uniform in [0, k)

// inclusive [min, max]
int c = min + (int) (Math.random() * ((max - min) + 1));

The (int)(Math.random() * k) parenthesization matters: (int) Math.random() * k casts the [0,1) double to 0 first and always yields zero. Prefer Random.nextInt(k) when you can — it avoids floating point entirely and is a hair more honest about uniformity.

The second primitive is rejection sampling, and it is the single most important idea in the session. The recipe: build a uniform source that over-covers the range you want, then throw away and re-roll any value that falls outside a clean sub-range. As long as the accepted region is exactly a whole number of copies of your target, every target value is equally likely. The cost is a few wasted rolls; the payoff is provable uniformity. Every one of Q1, Q2, and Q3 below is rejection sampling wearing a different costume.

Fair Coin from a Biased Coin

The warm-up: you are handed a coin that lands heads with probability p and tails with 1 − p, where p is unknown and not 0.5. Produce a fair 50/50 bit. Adding a correction constant is impossible — you don't know p. The trick is to find two events that are guaranteed equally likely no matter what p is. Toss twice and look at the pair:

Two tosses	Probability	Action
1 1	p · p	discard, retry
1 0	p · (1 − p)	return 1
0 1	(1 − p) · p	return 0
0 0	(1 − p) · (1 − p)	discard, retry

The two mixed outcomes 10 and 01 have identical probability p(1−p) — the asymmetry cancels. So map 10 to one result, 01 to the other, and when both tosses agree (11 or 00), reject and start over. That is rejection sampling in its purest, two-outcome form, and it is known as the von Neumann trick.

Java — fair coin from biased toss()

// toss() returns 1 with prob p, 0 with prob 1-p  (p unknown, p != 0.5)
public int fairToss() {
    int t1 = toss();
    int t2 = toss();
    if (t1 == 1 && t2 == 0) return 1;      // P = p(1-p)
    if (t1 == 0 && t2 == 1) return 0;      // P = (1-p)p  — equal, bias cancels
    return fairToss();                        // 11 or 00 → reject and retry
}

The recursion terminates with probability 1: each pair of tosses accepts with probability 2p(1−p) > 0, so the expected number of rounds is finite (it is 1 / (2p(1−p))). In an interview, volunteer that this only fails if p is exactly 0 or 1 — a coin with no entropy to extract.

rand5 → rand7: Base Expansion, Then Reject

Now the flagship pattern. Given rand5() — uniform on [0, 5), so P(rand5() == 0) = 1/5 — implement rand7(), uniform on [0, 7). On LeetCode you will meet the mirror image, LC 470 Implement Rand10() Using Rand7(); the technique is identical, so master it once here.

The instinct to write rand5() + rand5() is a trap: the sum ranges over 0..8 but the middle values are far more likely than the ends (it is a triangular distribution — there is only one way to make 0 but many ways to make 4). The fix is to treat the two rolls as digits in base 5, exactly like hashing coordinates into one number:

building a uniform rand25

rand5()*5 + rand5()   →   uniform over [0, 25)

        rand5()  (second digit, the column)
        0   1   2   3   4
      +--------------------
   0  | 0   1   2   3   4
r  1  | 5   6   7   8   9        every one of the 25 cells is
o  2  |10  11  12  13  14        hit by exactly one (row, col)
w  3  |15  16  17  18  19        pair → each value has prob 1/25
   4  |20  21  22  23  24

Now we have a uniform rand25, but 25 is not a multiple of 7. Reject the ragged tail: keep 0..20 (that's 21 = 3 × 7 equally likely values) and re-roll on 21..24. Every surviving value maps to a rand7 result by grouping in threes — tmp / 3 (or equivalently tmp % 7).

Java — rand7 from rand5

public int rand7() {
    int tmp = rand5() * 5 + rand5();   // uniform [0, 25)
    if (tmp < 21) return tmp / 3;         // keep 0..20 = 3*7, group by 3
    return rand7();                       // reject 21..24 and retry
}

// the loop form some interviewers prefer — same logic, no recursion
public int rand7Loop() {
    int tmp;
    do {
        tmp = rand5() * 5 + rand5();
    } while (tmp > 20);
    return tmp / 3;
}

The generalization — randomM → randomN — falls out of two observations about the relative sizes of m and n:

m > n (e.g. rand25 → rand7): you already have enough range; just reject down to the largest multiple of n that fits, then divide.
m < n (e.g. rand5 → rand25): you're short on range; combine rolls as digits — randomM*M + randomM — to inflate the range until it covers n, then reject as above.
m ≫ n (e.g. rand100 → rand9): group in bulk — ⌊100/9⌋ = 11 values per bucket, keep 0..98, reject 99, then divide by 11.
m ≪ n: keep inflating naively (multiply in more digits) until the range is at least n.

the mistake that fails the round

Never add two independent rolls to widen a range, and never tack on a constant. Sums pile up in the middle; only a positional combination (hi * base + lo) keeps the joint distribution flat. If you can articulate why the sum is triangular, you've shown the interviewer you understand the distribution and not just the incantation.

rand2 → randN: One Bit Builds Everything

Push the idea to its limit: with only rand2() — a single fair bit — construct randN() for arbitrary N. Bits are the atoms of range: stack k of them and you can address 2^k values. The reachable ranges march up in powers of two minus one:

bits → range, and the numbers worth memorizing

max value with k bits:  0  1  3  7  15  31  63  127  255 ...   (2^k - 1)
bits needed for N:      k = floor(log2(N)) + 1

10 bits → 1024            ≈ 1K
20 bits → 1024 * 1024     ≈ 1 million (1M)
30 bits → 1024^3          ≈ 1 billion (1G)          1 byte = 8 bits

So to build randN(): draw k = ⌈log2 N⌉ bits, assemble them into a number in [0, 2^k) by weighted sum (each bit doubles the accumulator), and — the now-familiar move — reject anything ≥ N and retry. Because 2^k < 2N, the acceptance probability is above one half, so retries stay cheap.

Java — randN from a single fair bit

// rand2() returns 0 or 1, each with prob 1/2
public int randN(int n) {
    int bits = 1, range = 2;
    while (range < n) { range *= 2; bits++; }   // smallest 2^bits >= n

    while (true) {
        int val = 0;
        for (int i = 0; i < bits; i++)
            val = val * 2 + rand2();       // weighted sum: assemble a k-bit number
        if (val < n) return val;              // accept
        // else reject and re-draw all bits
    }
}

This closes the loop with the rest of the session: rand2 → randN is the atomic version of rand5 → rand7. Whatever source you're handed — a fair bit, a fair die, a biased coin you've already de-biased — the recipe is the same three words: expand, combine, reject.

Fisher–Yates: Shuffling a Deck the Right Way

A frequent OOD-flavored ask (see the Java OOP & design session for the class-modeling half): design a Deck and implement shuffle() so that every card is equally likely to land at every position — a uniform random permutation, 1/52 for each card in each slot. The corresponding LeetCode is LC 384 Shuffle an Array.

The algorithm is random, then swap, walking position by position. Fix position 0 and pick its occupant uniformly from all 52 cards; fix position 1 and pick from the remaining 51; and so on. Position 0 gets any given card with probability 1/52. Position 1 gets a given card with probability (51/52) · (1/51) = 1/52 — the 51/52 is the chance it survived round 0, the 1/51 the chance it's chosen now. The telescoping continues for every slot, which is exactly the uniformity proof.

Java — Fisher–Yates, both directions

class Deck {
    private List<Card> cards;
    private Random random = new Random();

    // forward: fix position i, draw its occupant from the unshuffled tail [i, n)
    public void shuffle() {
        for (int i = 0; i < cards.size(); i++) {
            int j = i + random.nextInt(cards.size() - i);  // uniform in [i, n)
            Collections.swap(cards, i, j);
        }
    }

    // backward: same guarantee, i from n-1 down to 1, j in [0, i]
    public void shuffleBackward() {
        for (int i = cards.size() - 1; i >= 1; i--) {
            int j = random.nextInt(i + 1);              // uniform in [0, i]
            Collections.swap(cards, i, j);
        }
    }
}

The bug interviewers hunt for: choosing j uniformly from [0, n) on every iteration instead of from the shrinking window [i, n). That version produces n^n equally likely execution paths mapped onto n! permutations, and since n^n is not divisible by n!, some permutations come out more often than others — subtly, provably non-uniform. The one-line difference between the correct and broken versions is the entire question. Both correct variants are O(n) time and shuffle in place.

Reservoir Sampling: Sampling a Stream You Can't Store

The scenario from the notes: a security desk must pick one entering person to screen, uniformly at random, but people arrive in an unbounded stream and you can't buffer them all — you must be able to answer "who's chosen so far?" at any instant, having seen each person exactly once. This is reservoir sampling, and it powers LC 382 Linked List Random Node and LC 398 Random Pick Index.

For a single pick (k = 1): keep the first arrival; when the n-th person arrives, replace the current choice with probability 1/n. The induction is clean — assume after n − 1 arrivals each has probability 1/(n−1); the newcomer takes over with probability 1/n, and any incumbent survives with probability (1/(n−1)) · (n−1)/n = 1/n. Everyone ends at 1/n.

Java — reservoir sampling, k = 1 and k slots

// pick ONE element uniformly from a stream of unknown length
int chosen = -1, count = 0;
for (int x : stream) {
    count++;                              // x is the n-th value seen
    if (random.nextInt(count) == 0)      // prob 1/n
        chosen = x;
}
// count==1 → always chosen; later arrivals win their slot with prob 1/n

// pick K elements (the 100-target version from the notes)
int[] reservoir = new int[100];
int n = 0;
for (int x : stream) {
    if (n < 100) {
        reservoir[n] = x;                 // fill the pool with the first 100
    } else {
        int j = random.nextInt(n + 1);      // uniform in [0, n]
        if (j < 100) reservoir[j] = x;      // enters with prob 100/(n+1)
    }
    n++;
}

The whole appeal is the resource profile: one pass, O(k) memory, no knowledge of the stream length. That's why it's the standard answer for "sample from a file too big for RAM" or "random node in a list you can only traverse once." When the interviewer stresses streaming or unknown size, reservoir sampling is the reflex.

Weighted Random by Cumulative Probability

Not every distribution is uniform. Suppose visitors break down as China 7, US 2, Japan 1, and you must pick a country in proportion to those weights. Lay the weights end-to-end on a number line and the picture is a set of buckets:

cumulative buckets on [0, total)

weights:  China 7      US 2   Japan 1
          |-----------|------|---|
          0           7      9   10   ← prefix sums = [7, 9, 10], total = 10

draw a uniform r in [0, 10):
   r in [0,7) → China     r in [7,9) → US     r in [9,10) → Japan

The prefix sums are the cumulative distribution. Draw a uniform value in [0, total), then find the first bucket whose running sum exceeds it — a first-greater-than search, which is a textbook binary search over the prefix array (the exact binary-search template from Session 1). This is LC 528 Random Pick with Weight.

Java — LC 528 Random Pick with Weight

class Solution {
    private int[] prefix;
    private int total;
    private Random random = new Random();

    public Solution(int[] w) {
        prefix = new int[w.length];
        int sum = 0;
        for (int i = 0; i < w.length; i++) {
            sum += w[i];
            prefix[i] = sum;                     // cumulative distribution
        }
        total = sum;
    }

    public int pickIndex() {
        int target = random.nextInt(total) + 1;  // [1, total]
        int left = 0, right = prefix.length - 1;
        while (left < right) {                     // first prefix[i] >= target
            int mid = left + (right - left) / 2;
            if (prefix[mid] < target) left = mid + 1;
            else                     right = mid;
        }
        return left;
    }
}

Construction is O(n) once; each pick is O(log n). This is the same machinery behind sampling a point inside one of several weighted rectangles or intervals — pick the region by weight, then pick uniformly inside it. And note the callback the notes flag: the cumulative-count idea is a cousin of the majority-element / boss-counting trick — both slice a total into proportional territory on a line.

Where Does Randomness Even Come From?

The closer is deliberately unsettling: implement random() without calling any random function. You can't manufacture entropy from pure logic, so the honest answer names an external source — and then reuses everything above to shape it.

The reduction is random(n) → random(2) → getCurrentTime() % 2. A low-order bit of a high-resolution clock (System.nanoTime() % 2), or the jitter in thread scheduling, or a hardware entropy source, gives you an approximately fair bit — a random(2). Feed that bit into the rand2 → randN constructor from earlier and you have random(n) for any range. The interview point is not that nanoTime() % 2 is a good RNG (it is weak and predictable, unfit for anything security-sensitive), but that the entire hierarchy collapses to a single fair bit plus rejection sampling. Get one bit of real entropy and the rest is construction.

Problem Checklist

Problem	Pattern	Complexity
Biased coin → fair coin	von Neumann rejection (10/01)	O(1) expected, 1/(2p(1−p)) rounds
rand5 → rand7 / LC 470 Rand10 from Rand7	base expansion + reject tail	O(1) expected
randomM → randomN	combine digits and/or reject	O(1) expected
rand2 → randN	k bits + reject if ≥ n	O(log n) expected
LC 384 Shuffle an Array	Fisher–Yates swap into [i, n)	O(n), in place
LC 382 / 398 Reservoir sampling	n-th arrival wins slot w.p. k/n	O(n) time, O(k) space
LC 528 Random Pick with Weight	prefix sums + binary search	O(n) build, O(log n) pick
random() without random()	time/entropy bit → rand2 → randN	O(log n) per value

takeaway

Randomization is a small toolkit with one obsession: prove it's uniform. The spine is expand → combine → reject — inflate a weak source, glue rolls together as positional digits (never sums), and reject the ragged remainder so the survivors stay flat. On top sit three named patterns: Fisher–Yates (swap into the shrinking tail) for permutations, reservoir sampling (the n-th arrival wins with probability k/n) for streams, and prefix-sum + binary search for weighted choice. In every answer, say the probability argument out loud — that's the part interviewers are actually grading.

🎯 interview hot-takes

How do you make a fair coin from a biased coin? Toss twice; 10 → 1, 01 → 0, discard 11/00 and retry. It works because P(10) = P(01) = p(1−p) exactly, so the bias cancels regardless of p. Fails only when p is 0 or 1 (von Neumann trick).
How do you build rand7 from rand5 (or Rand10 from Rand7)? Combine as digits, not a sum: rand5*5 + rand5 is a uniform [0,25). Keep 0..20 (that's 3×7), re-roll on 21..24, then /3. Adding two rolls gives a triangular distribution — that's the trap.
What is reservoir sampling and when do you use it? Sampling k items uniformly from a stream of unknown/huge length in one pass and O(k) space. Keep the first k; the n-th arrival replaces a random slot with probability k/n. Induction proves uniformity. Powers LC 382 and LC 398.
How do you do weighted random selection? Prefix-sum the weights (the cumulative distribution), draw a uniform value in [1, total], and binary search for the first bucket ≥ it. O(n) build, O(log n) pick — LC 528.
What's the correct Fisher–Yates shuffle and its classic bug? For i in [0, n), swap i with a random j in [i, n) — the shrinking window is the point. Swapping with any index in [0, n) every time yields n^n paths over n! permutations and is provably biased.