For Kth largest, should you use a heap or quickselect?

Quickselect averages O(n) but degrades to O(n^2) and mutates the input array. A bounded size-k min-heap is O(n log k), touches each element once, and keeps working on a stream that does not fit in memory. Rule of thumb: k tiny or streaming data → size-k heap; k about n/2 and the array fits in memory → quickselect; k equals n → just sort.

What is the difference between HashMap, Hashtable and HashSet?

HashMap stores key to value, is not synchronized, and allows one null key plus any number of null values. Hashtable has the same API but is synchronized (thread-safe via locks, slower, and forbids null keys or values) — it is essentially legacy. HashSet stores keys only and is backed by a HashMap; it is the go-to for dedup and membership. All three offer O(1) average add, get, and contains.

Why is HashMap get() O(1), and what happens on a collision?

A good hash function maps every key to a deterministic, well-scattered bucket index in [0, capacity). Collisions are inevitable, so buckets use separate chaining (a list or tree per bucket) or open addressing (probe to the next free slot). The table resizes at load factor 0.75 (new capacity about n / 0.75) so chains stay short — average O(1), worst case O(n) if everything collides into one bucket.

Comparable vs Comparator — which one should you use?

Comparable defines a type's single natural order via compareTo and lives on the class itself. Comparator is external and pluggable via compare — you can define as many orderings as you need and even order a class you cannot modify. Comparator is the recommended default for flexibility. For a min-heap on frequency, return a.freq - b.freq; flip to b.freq - a.freq for a max-heap.

Leetcode S6 — Heap, HashMap & Graph

Three data structures in one sitting, and they share a single idea: each one buys you exactly one cheap operation, and interview problems are really about recognizing which cheap operation the question wants. A heap hands you the minimum or maximum in O(1) and is the engine behind every top-K question. A HashMap hands you membership and lookup in O(1) — as long as you understand the fine print behind that promise. A graph is just nodes, edges, and a storage decision. This session builds all three from the metal up and then attacks the same handful of problems — Kth largest, top-K words, missing number, array intersection — from the sort, heap, hash, and two-pointer angles so you learn to choose a tool rather than memorize one.

⚡ Quick Takeaways

A heap is a complete binary tree stored in a flat array — the node at index i has parent (i-1)/2 and children 2i+1 / 2i+2. That index arithmetic is the whole data structure — no pointers, cache-friendly, space-efficient.
Get is O(1); everything else is O(log n) — you can only ever see the top. Insert, poll, and update each sift one root-to-leaf path. Building a heap from scratch with heapify is O(n), not O(n log n).
Kth-largest has four gears — sort O(n log n), quickselect O(n) average, a full min-heap O(n + k log n), or a bounded size-k heap O(n log k). Pick by how k compares to n and whether the data streams.
A size-k heap is the streaming answer — cap the heap at k and it keeps working when the data doesn't fit in memory. That's the exact primitive that scales up to the distributed / MapReduce version.
HashMap's O(1) is a promise, not magic — a well-scattered hash function, a load-factor-triggered resize (capacity ≈ n / 0.75), and collision handling (chaining or open addressing) are what keep buckets short.
A graph is nodes + edges + a storage choice — adjacency list for sparse graphs, matrix for dense graphs or O(1) edge lookup. Direction and cycles (DAG) decide which algorithms you can even run.

tldr

Heap: a complete binary tree in an array, top-only access, O(log n) mutations — the engine behind top-K. HashMap/Set: O(1) membership via hashing, with load factor and collision handling as the fine print. Graph: nodes, edges, directed-vs-undirected, adjacency-list-vs-matrix. The recurring skill is attacking one problem shape (Kth largest, top-K words, missing number, intersection) from sort / heap / hash / two-pointer angles and choosing deliberately.

Where Heap, HashMap and Graph Fit

By this point in the course the toolbox already holds a lot. Arrays gave us sort, binary search, and two pointers. Linked lists gave us dummy heads, fast/slow pointers, and recursion. Queues and stacks gave us ordered access. Trees and BSTs gave us recursion in two flavors — bottom-up (classical recursion that returns a result up the call stack) and top-down (carry state down as parameters) — plus the divide-and-conquer move that turns an n-sized problem into two n/2 subproblems in O(n) merge work, for O(n log n) overall.

This session adds the three heavier structures that show up constantly in real systems: the heap (exposed in Java as PriorityQueue), the HashMap / HashTable / HashSet family, and the graph. They look unrelated, but the through-line is the same: identify the single operation the structure makes cheap, and let that drive the algorithm.

Heap = Complete Binary Tree, Living in an Array

A heap is a complete binary tree — every level full except possibly the last, which fills left to right — with one ordering rule: a parent's value is smaller than both children (min-heap) or larger than both (max-heap). Crucially, that's a parent-child rule only. Siblings are unordered; there is no total order. That is exactly why you can only trust the element at the very top.

Because the tree is complete, it maps perfectly onto an array with no gaps. If a node sits at index i (0-indexed), its parent is at (i-1)/2, its left child at 2i+1, and its right child at 2i+2. No node pointers, contiguous memory, cache-friendly. The five facts worth memorizing:

Heap order — min-heap (smallest on top) or max-heap (largest on top).
Always complete — that's the invariant that makes the array representation valid.
Only the top is accessible — peek() is the single element you can read directly.
Operations — insert (push/offer/add), remove-top (poll/remove), read-top (peek/get), and update.
Initialize with heapify — turn an arbitrary array into a valid heap in linear time.

Operation	Cost	Why
get / peek top	O(1)	the root is always index 0
insert / offer	O(log n)	bubble up one leaf-to-root path
poll / delete-root	O(log n)	sift the new root down one path
update	O(log n)	sift up or down from the changed node
build / heapify	O(n)	only internal nodes sift, and cheaply

In Java, new PriorityQueue<>() is a min-heap by default; pass a Comparator to flip it to a max-heap or to order custom objects (more on that below).

Building a Heap by Hand — heapify, insert, delete

Writing the heap yourself, even once, makes the complexities obvious. Here is the full min-heap over a raw int[]:

Java — a min-heap from scratch

public class Heap {
    public int[] A;          // the array used as the heap
    public int n;            // number of live nodes

    public Heap(int[] B) {
        A = new int[B.length];
        System.arraycopy(B, 0, A, 0, B.length);
        n = A.length;
        for (int i = n / 2 - 1; i >= 0; i--)  // only non-leaf nodes, bottom-up
            heapify(i);
    }

    private void heapify(int i) {                 // sift node i down (min-heap)
        int left = 2 * i + 1, right = 2 * i + 2, min = i;
        if (left  < n && A[left]  < A[min]) min = left;   // child must exist (index < n)
        if (right < n && A[right] < A[min]) min = right;
        if (min != i) {
            int tmp = A[i]; A[i] = A[min]; A[min] = tmp;
            heapify(min);                          // keep sifting where the swap landed
        }
    }

    public void insert(int key) {                 // append at the end, then bubble up
        int i = n++;                              // grow; i is the new last slot
        while (i > 0 && A[(i - 1) / 2] > key) {   // A[(i-1)/2] is the parent
            A[i] = A[(i - 1) / 2];               // parent slides down
            i = (i - 1) / 2;
        }
        A[i] = key;
    }

    public int deleteRoot() {                    // remove the minimum
        if (n < 1) return -1;                    // empty guard
        int min = A[0];
        A[0] = A[--n];                          // last element becomes the new root
        heapify(0);                             // percolate it back down
        return min;
    }
}

Why heapify is O(n), not O(n log n). The trick is that the leaves — indices n/2 … n-1 — are already valid one-element heaps, so you never touch them. You only sift the internal nodes, from n/2 - 1 up to the root. Now count the work by height: at height h there are at most n / 2^(h+1) nodes, and each costs O(h) to sift down. The total is Σ h · n / 2^(h+1), and that sum converges — Σ h / 2^h = 2 — so it collapses to O(n). Building a heap is linear. (Heapsort is not, because the k pops that follow each cost O(log n).) Insert and delete-root each walk a single root-to-leaf path, so both are O(log n).

Kth Largest Element (LC 215) — Four Gears

LC 215 Kth Largest Element in an Array is the canonical heap problem, and it's worth solving four ways because the best answer depends on the inputs.

S1 — sort, then index. Sort descending and return arr[k-1]. O(n log n) time, trivial to write. Offer it as the baseline, then improve on it.

S2 — quickselect. Borrow the partition step from quicksort, but recurse into only the side that contains the answer. Quicksort doesn't need a fully sorted array here; once the pivot lands in a position with exactly k-1 larger elements ahead of it, the pivot is the answer. The partition tells you which side to keep and which to discard.

Java — LC 215 quickselect

class Solution {
    private Random rand = new Random();

    public int findKthLargest(int[] nums, int k) {
        int index = quickSelect(nums, 0, nums.length - 1, k);
        return nums[index];
    }

    // returns the array index holding the k-th largest value
    private int quickSelect(int[] nums, int left, int right, int k) {
        int pivotIndex = left + rand.nextInt(right - left + 1);
        int pivot = nums[pivotIndex];
        swap(nums, pivotIndex, right);        // park the pivot at the end

        int i = left, j = right - 1;
        while (i <= j) {
            while (i <= j && nums[i] < pivot) i++;   // smaller values drift left
            while (j >= i && nums[j] > pivot) j--;   // larger values drift right
            if (i <= j) swap(nums, i++, j--);
        }
        swap(nums, i, right);                  // pivot back to its final slot

        int m = right - i + 1;                 // count of values >= pivot
        if (m == k) return i;
        else if (m < k) return quickSelect(nums, left, i - 1, k - m);
        else return quickSelect(nums, i + 1, right, k);
    }

    private void swap(int[] nums, int i, int j) {
        int t = nums[i]; nums[i] = nums[j]; nums[j] = t;
    }
}

The complexity story is the whole point. In the average case you spend O(n) to shrink an n-sized problem to n/2, so n + n/2 + n/4 + … = 2n → O(n). In the worst case, an adversarial pivot peels off one element per pass — n + (n-1) + (n-2) + … → O(n²) — which is why the pivot is randomized. Space is O(1), but note it mutates the input array. Watch the offset when you discard the left side: recurse with k - m, not k.

S3 — a full min-heap of size n. Heapify the whole array in O(n) (or insert one by one in O(n log n)), then poll k times at O(log n) each. Total O(n + k log n), space O(n).

S4 — a bounded min-heap of size k. The elegant one: keep a min-heap that never holds more than k elements — the k largest seen so far. Heapify the first k in O(k), then for each remaining element, if it beats the heap's top, evict the top and insert it. The top of a size-k min-heap is, by construction, the k-th largest.

Java — LC 215 bounded size-k heap

// k-th largest = smallest of the k largest → keep a size-k MIN-heap
public int findKthLargest(int[] nums, int k) {
    PriorityQueue<Integer> pq = new PriorityQueue<>();   // min-heap by default
    for (int val : nums) {
        pq.offer(val);
        if (pq.size() > k) pq.poll();      // drop the smallest, keep the top k
    }
    return pq.peek();                        // heap top = k-th largest
}

Time is O(k + (n-k) log k) ≈ O(n log k), space O(k). S3 vs S4 depends on k relative to n. When k is tiny (say k=1, "the max"), the size-k heap is basically O(n). When k ≈ n/2 and the array fits in memory, quickselect's O(n) wins outright. When k = n, you might as well sort. And there's a decisive tiebreaker: the size-k heap is the only option that survives when the data streams and can't all fit in memory — you hold just k elements at a time. Follow-up often heard: "return all k smallest, in sorted order." Then sort the k, or run quickselect to isolate them and sort that slice.

Top K Frequent Words (LC 692) & Comparator vs Comparable

Top K Frequent Words in a book (LC 692) chains a HashMap and a heap. Count every word's frequency into a Map<String, Integer>, wrap each (word, freq) pair in a small object, and push it into a size-k heap ordered by frequency. To keep the k most frequent, use a min-heap on frequency so the least frequent word is always the one that falls out when the heap overflows.

The direction of the comparator is the whole game, and it's a classic interview stumble. This is also where Comparable vs Comparator matters:

Java — comparator direction, Comparable vs Comparator

class Element {
    String word;
    int freq;
}

// return a.freq - b.freq  →  MIN-heap (least frequent on top, so it gets evicted)
// return b.freq - a.freq  →  MAX-heap
PriorityQueue<Element> heap =
    new PriorityQueue<>((a, b) -> a.freq - b.freq);

// Comparable: the type's single natural order — one per class
class Employee implements Comparable<Employee> {
    int salary;
    public int compareTo(Employee o) { return this.salary - o.salary; }  // low first
}

// Comparator: external and pluggable — define as many orderings as you want
Arrays.sort(employees, new Comparator<Employee>() {
    public int compare(Employee a, Employee b) { return a.getName().compareTo(b.getName()); }
});

Comparable defines the one "natural" order a type has, via compareTo on the class itself — there can only be one. Comparator is an external object with a compare method; you can define as many as you need and even sort a class you can't modify. Comparator is the recommended default for exactly that flexibility. To walk a map's keys, iterate for (String key : map.keySet()).

The notes list three more ways to scale this problem up, which is a nice systems-design bridge. S2 — TreeMap: a red-black tree keeps keys in sorted order, useful when you want ordered traversal rather than just the top k. S3 — distributed / limited memory: when the book is too big for one machine, partition the words across machines, compute a local top-k on each, then merge. S4 — big data / MapReduce: map emits (word, 1), reduce sums per word, and a final top-K pass merges the partitions. The size-k heap you just wrote is precisely the per-partition and final-merge primitive.

HashMap, HashTable & HashSet

The distinction starts with what they store: a Map holds key→value pairs; a Set holds keys only and exists to answer "have I seen this before?" A Set is an interface, implemented by HashSet, and it never holds duplicates. A Map is an interface, implemented by HashMap.

	HashMap	Hashtable
Thread safety	not synchronized (faster single-threaded)	synchronized via locks (thread-safe, slower)
Null keys/values	one null key, any number of null values	no null keys or values
Use when	the default choice	essentially legacy

Unsynchronized objects generally outperform synchronized ones, so HashMap is the everyday pick; reach for explicit concurrency tools (or ConcurrentHashMap) when you actually need thread safety. The core APIs are all O(1) average:

Set — set.add(key), set.contains(key), set.remove(key). add returns a boolean: false if the element was already present, which is the trick behind dedup and intersection.
Map — map.put(key, value), map.get(key), map.containsKey(key), map.remove(key). put returns the old value (or null) — handy for detecting overwrites.

Two practical warnings from the notes. First, when you drop objects into a HashSet for dedup, be clear on whether you're deduping by reference or by value — the latter requires proper equals/hashCode, and getting this wrong is a silent bug. Second, if you need one-to-many mapping, bundle the multiple fields into a small class and use that as the value. A neat real-world example is a self-destructing message store: Map<Key, (message, expireTime)> — the value carries both the payload and a TTL, and you check expiry on read.

Inside HashMap — Why get() Is O(1)

Interviewers love to push past "it's O(1)" to "why?" The answer is a chain of design decisions.

The promise. A hash function turns any key into a bucket index that is stable (the same key always maps to the same bucket), deterministic, and well-scattered across [0, capacity) with no obvious pattern. That's what makes the lookup a single array access instead of a scan.

Collisions are inevitable. By the pigeonhole principle, distinct keys will sometimes land in the same bucket. Two standard fixes: separate chaining (each bucket holds a linked list — or, once long, a tree — of entries) and open addressing (on collision, probe forward to the next free slot, e.g. linear probing).

Keeping buckets short. As the table fills, collisions rise, so a load factor caps occupancy. Java resizes at 0.75: once the table is three-quarters full, it grows and rehashes every entry into a larger array — new capacity ≈ n / 0.75 so the table stays roughly a quarter empty. Resizing is O(n), but it's rare, so lookups stay amortized O(1); the pathological worst case is O(n) when every key collides into one bucket. In distributed hash tables the same idea appears as consistent hashing, which minimizes how many keys have to move when you add or remove a bucket.

A concrete hash function for strings — a polynomial rolling hash — makes it tangible:

Java — string hashing

int hash(String key) {
    int sum = 0;
    for (char c : key.toCharArray()) {
        sum = sum * 32 + (int) c;      // polynomial rolling hash (31/32 are typical)
        sum %= HASH_MAP_CAPACITY;         // fold back into the bucket range
    }
    return sum;                           // stable, deterministic, well-scattered
}

Multiplying by a small constant and folding in each character spreads similar strings (like "abc" and "acb") into different buckets — that spread is what keeps chains short and get() effectively constant-time.

Missing Number (LC 268) — Five Angles

LC 268 Missing Number — an array holds n distinct numbers drawn from [0, n], and exactly one is missing. Five approaches, each teaching a different reflex:

S1 — XOR. XOR every index 0…n together with every value in the array. Each present number cancels against its index (x ^ x = 0), leaving only the missing number. O(n) time, O(1) space, and no overflow risk — the cleanest answer.
S2 — sum. The expected sum is n(n+1)/2; subtract the actual sum of the array and the difference is the missing number. O(n)/O(1), but watch integer overflow for large n.
S3 — sort, then scan. Sort, then find the first index where value != index, or compare adjacent elements. Mind the edge cases the notes flag: a single element, or nothing missing until the very end.
S4 — sort + binary search. On a sorted [0…n]-with-one-gap, nums[mid] > mid means the gap is to the left, otherwise to the right. O(n log n) dominated by the sort, but a good demonstration that the missing element is a flip-boundary — see Session 1.
S5 — HashSet / HashMap. Drop every value into a set, then probe 0…n for the absentee. This generalizes cleanly to the richer counting variants — "each number appears m times except n that appear k times" — which a frequency map answers directly.

Intersection of Two Arrays (LC 349 / 350)

Given two arrays, find their common elements. Three approaches, chosen by the shape of the input.

S1 — HashSet. Add the smaller array into a set (to save space), then scan the larger one; any element already in the set is common.

Java — intersection via HashSet

public List<Integer> intersection(int[] arr1, int[] arr2) {
    if (arr1 == null || arr1.length == 0 ||
        arr2 == null || arr2.length == 0) return new ArrayList<>();

    if (arr1.length < arr2.length) return intersection(arr2, arr1);  // set holds the smaller one

    List<Integer> ans = new ArrayList<>();
    Set<Integer> set = new HashSet<>();
    for (int x : arr2) set.add(x);        // smaller array → the set
    for (int x : arr1) {                    // scan the larger array
        if (!set.add(x)) ans.add(x);       // add()==false → already there → common
    }
    return ans;
}

One caveat the notes call out: if the larger array contains duplicates of a common element, this emits it multiple times (that's the LC 350 "keep multiplicity" behavior). If you want each common value once (LC 349), collect results into a set instead of a list. S2 — two pointers: if both arrays are already sorted, walk two pointers together in O(n + m) with O(1) extra space. S3 — binary search: if one array is enormous and the other tiny, sort the big one and binary-search each small element into it — O(small · log big). The lesson is the same one from the whole session: recognize which resource dominates and let that pick the tool.

Graph — Nodes, Edges, and How to Store Them

A graph is nothing more than nodes connected by edges. In code, the object-oriented representation is a node that carries its value, its neighbours, and a visited flag:

Java — a graph node

class GraphNode {
    int val;
    List<GraphNode> neighbours;   // unweighted edges
    // List<Pair> neighbours;       // weighted: Pair = (GraphNode, weight)
    boolean visited;

    GraphNode(int x) {
        this.val = x;
        neighbours = new ArrayList<>();
    }
}

For a weighted graph, a neighbour is a Pair of (node, weight) rather than a bare node; if the weights are keyed by neighbour you can store them in a HashMap. Two properties shape every graph problem:

Directed vs undirected. Directed edges go one way — Twitter's follow/unfollow is directed (I can follow you without you following me). Undirected edges are mutual — a Facebook or WeChat friendship goes both ways.
Connectivity and cycles. Is the graph connected? Does it contain a cycle? A DAG (directed acyclic graph) is the important special case: course prerequisites, where you must take A before B, form a DAG, and "can you finish all the courses?" — LC 207 Course Schedule — is answered by topological sort, i.e. detecting whether a cycle exists.

How you store a graph is a real decision with tradeoffs:

Representation	Space	Best for
Adjacency matrix	O(V²)	dense graphs, O(1) edge lookup; wastes space on sparse graphs, and a plain matrix doesn't encode direction unless you make it asymmetric
Adjacency list	O(V + E)	sparse graphs — the everyday default
Node objects	O(V + E)	traversal / clone-graph style problems, where you already hold node references

Matrices and adjacency lists are the common academic-research representations; the node-object form is what you usually manipulate in interview traversal problems. Which one you pick flows directly from density and from the operation you need to be cheap — the same recognition muscle this whole session has been training.

Problem Checklist

Problem	Pattern	Complexity
Build-heap / heapify	sift only internal nodes, bottom-up	O(n)
LC 215 Kth Largest Element	quickselect / bounded size-k min-heap	O(n) avg / O(n log k)
LC 692 Top K Frequent Words	freq HashMap + size-k min-heap	O(n log k)
LC 268 Missing Number	XOR / sum / sort+BS / HashSet	O(n)
LC 349 / 350 Intersection of Two Arrays	HashSet / two pointers / binary search	O(n + m)
LC 207 Course Schedule	DAG topological sort / cycle detection	O(V + E)

takeaway

Heap, HashMap and Graph are three "one cheap operation" structures. The heap gives you the min/max in O(1) and is the engine of every top-K question — reach for a bounded size-k heap when data streams, quickselect when it all fits and you want O(n). The HashMap gives you O(1) membership as long as the hash scatters well and you resize before buckets grow long. The graph is nodes + edges + a storage decision, with direction and cycles gating which algorithms apply. Across LC 215, 692, 268, and 349 you attacked the same shapes from sort, heap, hash, and two-pointer angles — the real skill is choosing the angle, not memorizing one.

🎯 interview hot-takes

Kth largest — heap or quickselect? Quickselect averages O(n) but degrades to O(n²) and mutates the array; a bounded size-k min-heap is O(n log k), touches each element once, and works on a stream that doesn't fit in memory. k tiny → bounded heap; k ≈ n/2 with the array in memory → quickselect; k = n → just sort.
HashMap vs Hashtable vs HashSet? HashMap: key→value, not synchronized, allows one null key. Hashtable: same API but synchronized (slower, no nulls) — legacy. HashSet: keys only, backed by a HashMap, for dedup and membership. All O(1) average.
Why is HashMap get() O(1), and what about collisions? A good hash function scatters keys into buckets deterministically; collisions go into a chain (or probe to the next slot); the table resizes at load factor 0.75 (new capacity ≈ n/0.75) so chains stay short. Average O(1), worst O(n) if everything collides.
Comparable vs Comparator? Comparable is the class's one natural order (compareTo); Comparator is external and pluggable (compare) — define as many as you like, or order a class you can't modify. For a min-heap on frequency: a.freq - b.freq.
How do you store a graph? Adjacency list for sparse graphs (space O(V+E), the default); adjacency matrix for dense graphs or O(1) edge lookup (space O(V²)); node-with-neighbours objects for clone/traversal problems. Add direction for follow-graphs; check for cycles before a topological sort (DAG).