How is dynamic programming just mathematical induction?

DP is base case plus induction rule — you define f(0), then a transition f(k) to f(k+1), exactly like a proof by induction. The only addition is memoization: you store each subproblem's answer in a table so it is computed once. DP fills that table bottom-up (small to large); recursion computes it top-down (large to small).

When should I reach for DP in a coding interview?

Four keyword signals: max/min, longest/shortest, yes/no feasibility, and count the number of ways. If the question asks for one of those over choices that build on smaller choices, define a dp state and a transition. Note that vanilla DP returns the optimal value or the count, not the concrete plan that achieves it — reconstructing the plan needs extra bookkeeping.

How do you find the maximum-sum submatrix efficiently?

Brute force over every rectangle is O(n^6). Precompute a 2D prefix-sum so any rectangle sum is O(1) via inclusion-exclusion (D - B - C + A), which drops it to O(n^4). The best trick fixes the top and bottom rows, compresses the columns between them into a 1D array, and runs Kadane's maximum-subarray on it — O(n^3) overall.

Why does the Maximal Square DP take the min of three neighbors?

dp[i][j] is the side length of the largest all-ones square whose bottom-right corner is (i,j). To grow a square of side s+1 you need the cells above, to the left, and diagonally up-left to each already support a square of side at least s — the shortest of those three is the bottleneck, so dp[i][j] = min(top, left, top-left) + 1 when the cell is 1.

Leetcode S13 — Dynamic Programming II

The first DP session teaches you that dynamic programming is nothing more than mathematical induction with a memo table. This second session is where that idea gets stress-tested: the states go 2D, the transitions stop being one-liners, and the real difficulty moves from "what is the recurrence" to "what is the state." We walk from grid-path DP through Maximal Square, the maximum-sum-submatrix pipeline that collapses an O(n⁶) brute force into O(n³), the sequence classics (Decode Ways, Catalan-counted BSTs), and finish with the House Robber trilogy. Along the way we detour into two matrix problems — Surrounded Regions and the largest hollow square — that look like DP but are really about how you preprocess a grid.

⚡ Quick Takeaways

DP is induction plus a table — a base case, an induction rule f(k) → f(k+1), and memoization so each subproblem is computed once. Bottom-up fills the table small-to-large; recursion computes it top-down.
Four keyword signals — max/min, longest/shortest, yes/no feasibility, or count the ways. See any of them building on smaller choices and reach for a dp state.
Getting the state right is the whole game — dp[i][j] as "square ending at (i,j)" or "prefix sum to (i,j)" turns an intractable search into two nested loops.
Compress a dimension to kill complexity — the max-sum submatrix drops O(n⁶)→O(n³) by fixing two rows, flattening the columns between them, and running Kadane on the 1D strip.
DP counts, it doesn't reconstruct — vanilla DP gives you the optimal value or the number of ways, not the actual path or plan; recovering that needs extra parent pointers.
Rolling arrays save space for free — most grid DPs only read the previous row, so a 2D table collapses to a single 1D array with no logic change.

tldr

DP II is a masterclass in state design. Once you can name what dp[i][j] means — a square side, a path count, a prefix sum, "best rob up to house i" — the transition writes itself. The recurring trick of the session is dimensional collapse: prefix sums make a rectangle-sum O(1), and compressing two rows into one turns a 2D optimization into a 1D Kadane pass.

The DP Mindset: Memoization as Induction

The session opens by re-anchoring the definition from the first DP class. The key to every DP solution is memoized storage. Structurally, DP is a proof by mathematical induction: you establish a base case — f(0) — and an induction rule that carries you from f(k) to f(k+1). Nothing more mysterious than that. The memo table is simply where you cache each f(k) so it is never recomputed.

That framing immediately clarifies the difference between DP and plain recursion, which people often blur together:

	Dynamic Programming	Recursion
Direction	small → large (bottom-up)	large → small (top-down)
Stores	subproblem values in a table	the concrete result on the call stack
Recovers a plan?	not by default — value/count only	naturally, as it unwinds

The last row is a genuine interview trap and the notes flag it bluntly: DP cannot hand you the concrete solution, only the optimum or the count. If the interviewer asks not just "how many paths" but "print one path," you either add parent pointers or switch to a recursion/backtracking formulation. Knowing which question you were actually asked is half the battle.

The Four-Step DP Recipe

Every problem below is solved by the same four (really five) moves. Say them out loud in an interview — the structure is worth as many points as the code:

Definition — state precisely what dp[i] or dp[i][j] means. This is where problems are won or lost.
Base case / start — the seed values induction builds from.
Induction rule / transition — the state-transition equation relating the current index to earlier ones. Ask: does i depend on one step back, a few steps back, or all previous states?
Termination / answer — which cell holds the final answer (often dp[n], sometimes a running max).
Optimization — collapse the table (rolling array), or reuse space cleverly, once the correct version works.

To recognize a DP problem in the first place, the notes give four keyword hints: Max/Min, Longest/Shortest, Yes/No (feasibility), and Count the number of ways. And they sort DP problems into four recurring shapes, which is a useful map for the whole session:

Sequence DP — one array/string, dp[i] over a prefix (Decode Ways, House Robber).
Two-sequence DP — two strings, dp[i][j] over two prefixes (Distinct Subsequences, Interleaving String).
Matrix DP — a grid, dp[i][j] over a cell (Min Path Sum, Maximal Square).
Backpack / knapsack — capacity-indexed states, the classic weight-vs-value tradeoff.

Grid-Path DP: Min Path Sum & Unique Paths

LC 64 Minimum Path Sum is the textbook entry point, and the keyword minimum is the tell — it screams DP. Define dp[i][j] as the minimum path sum from grid[0][0] down to grid[i][j]. Since you can only move right or down, a cell is reached either from the top or from the left, so the transition is a clean two-way min:

Java — LC 64 Minimum Path Sum

public int minPathSum(int[][] grid) {
    int[][] dp = new int[grid.length][grid[0].length];
    dp[0][0] = grid[0][0];
    for (int i = 1; i < grid.length; i++)
        dp[i][0] = dp[i-1][0] + grid[i][0];        // first column: only from top
    for (int j = 1; j < grid[0].length; j++)
        dp[0][j] = dp[0][j-1] + grid[0][j];        // first row: only from left

    for (int i = 1; i < grid.length; i++)
        for (int j = 1; j < grid[0].length; j++)
            dp[i][j] = Math.min(dp[i-1][j], dp[i][j-1]) + grid[i][j];
    return dp[grid.length-1][grid[0].length-1];
}

LC 62 Unique Paths swaps the operation from min to +: dp[i][j] = dp[i-1][j] + dp[i][j-1] counts the number of ways to reach a cell. Here comes the first optimization the notes push — the rolling array. Because each row only needs the row above, you keep a single 1D dp of length n and update it in place: dp[j] += dp[j-1]. The old dp[j] is the value "from above," and the freshly updated dp[j-1] is "from the left." One line, O(n) space.

LC 63 Unique Paths II adds obstacles, and the rolling array handles it with one branch: iterate cell by cell, and whenever obstacleGrid[i][j] == 1 set dp[j] = 0 (no path routes through a wall); otherwise accumulate dp[j] += dp[j-1] as before. The obstacle simply zeroes out that column's contribution for the rest of the sweep — the transition logic is untouched.

Maximal Square and the Min-of-Three Trick

LC 221 Maximal Square — given a binary matrix, find the area of the largest all-ones square. The brute force is a cautionary tale: enumerating every square is Θ(n⁵) and every rectangle Θ(n⁶), and it re-checks the same cells over and over. DP erases that redundancy with a beautifully compact state.

Define dp[i][j] as the side length of the largest all-ones square whose bottom-right corner is (i, j). If that cell is 1, the square it anchors is limited by its three neighbors — top, left, and top-left diagonal — because if any one of them is short, a full square can't form. So you take the smallest of the three and add one:

Java — LC 221 Maximal Square

public int maximalSquare(char[][] matrix) {
    if (matrix == null || matrix.length == 0
        || matrix[0] == null || matrix[0].length == 0) return 0;

    int row = matrix.length, col = matrix[0].length, max = 0;
    int[][] dp = new int[row][col];
    for (int i = 0; i < row; i++) {           // first column seeds the base case
        dp[i][0] = matrix[i][0] - '0';
        max = Math.max(max, dp[i][0]);
    }
    for (int j = 0; j < col; j++) {           // first row seeds the base case
        dp[0][j] = matrix[0][j] - '0';
        max = Math.max(max, dp[0][j]);
    }
    for (int i = 1; i < row; i++) {
        for (int j = 1; j < col; j++) {
            if (matrix[i][j] == '0') dp[i][j] = 0;
            else dp[i][j] = Math.min(dp[i-1][j-1],
                       Math.min(dp[i-1][j], dp[i][j-1])) + 1;
            max = Math.max(max, dp[i][j]);
        }
    }
    return max * max;                          // side → area
}

Two details worth internalizing: the answer is a running max over the whole table, not dp[n-1][m-1] (the biggest square can end anywhere), and the return is max * max because the state tracks a side, not the area. The matrix[i][0] - '0' idiom converts the char digit to its int value for the base row and column.

The hollow-square cousin

The notes then pose a harder relative: given an X/O grid, find the largest hollow square whose border is all 1 (all X). Brute force spiral-checks every corner and size at O(n⁴). The clever route is preprocessing directional run-lengths: build a hor table (consecutive ones to the left of each cell) and a ver table (consecutive ones above each cell), each in O(n²). Then for each cell take small = min(hor, ver) — the guaranteed right and bottom edges — and shrink small until the matching left edge (ver[i][j-small+1]) and top edge (hor[i-small+1][j]) are both long enough. Two precomputed matrices turn an O(n⁴) search into roughly O(n³): the whole session's "trade space for time" motif in miniature.

LC 130 Surrounded Regions rides along in the same matrix-thinking discussion, though it's a graph-traversal problem, not DP. Any O connected to the border survives; everything else flips to X. The trick is to invert the logic: start a DFS or BFS from every border O, mark the whole boundary-connected region as safe (say '*' or 'U'), then in one final sweep turn every remaining O into X and restore the marked cells. It's the canonical "flood from the edges" pattern and a clean contrast to how DP handles a grid.

Patterns of Ones: From 1D Runs to Plus Signs

A short but instructive family the notes build up in stages. It starts in 1D: given 1 1 1 1 0 0 1 0 1 1, find the longest run of ones. The DP is dp[i] = dp[i-1] + 1 when the cell is 1, else 0 — yielding 1 2 3 4 0 0 1 0 1 2. That "run-length" idea is the atom everything else is built from.

Level up to 2D and the target becomes a plus sign: a center cell plus four equal arms of ones. Naively, for every center you walk out in all four directions and take the minimum arm length — O(n²·n) = O(n³) time, O(1) space. But precompute four directional run-length matrices (up, down, left, right) and each center becomes an O(1) lookup: O(n²) time at the cost of O(4n²) space. The same space-for-time lever, and the same four-direction precomputation that powered the hollow square. From there the notes gesture at fancier shapes — an ⌐ or ⊔ border — all solvable by combining those directional tables.

Maximum-Sum Submatrix: Compressing 2D into 1D

This is the intellectual centerpiece of the session — a single problem taught as a five-stage optimization from O(n⁶) to O(n³), and every stage is a reusable trick.

Stage 1 — brute force. A rectangle needs four indices to pin down, and summing it costs O(n²): O(n⁴)·O(n²) = O(n⁶). Hopeless, but it names the two costs we'll attack separately.

Stage 2 — 1D intuition first. Before touching 2D, recall the 1D toolkit. Kadane's algorithm finds the maximum subarray sum in O(n): dp[i] = dp[i-1] + arr[i] if dp[i-1] > 0, else arr[i]. And a prefix-sum array answers any range sum in O(1): with dp[i] = sum(0..i), sum(i, j) = dp[j] − dp[i-1]. A neat corollary the notes highlight: to check whether a subarray sums to k, rearrange to dp[i] + k = dp[j] and use a HashSet/HashMap — the same idea as Two Sum.

Stage 3 & 4 — prefix sums go 2D. Apply prefix sums along one direction to get row sums in O(1), then apply them to the entire matrix. Define dp[i][j] = sum of the rectangle from (0,0) to (i,j). The build and query are both inclusion-exclusion:

2D prefix sum — inclusion / exclusion

// build: sum of rectangle (0,0)..(i,j)
dp[i][j] = dp[i-1][j] + dp[i][j-1] - dp[i-1][j-1] + matrix[i][j];

// query: sum of sub-rectangle with corners A(top-left) .. D(bottom-right)
//   A --- B          submatrix = D - B - C + A
//   |     |        = dp[x][y] - dp[a-1][y] - dp[x][b-1] + dp[a-1][b-1]
//   C --- D          each rectangle sum is now O(1)

With O(1) rectangle sums, the brute force drops to O(n⁴) — we killed the inner O(n²), but four corner loops remain.

Stage 5 — compress two rows, then Kadane. The final leap borrows Stage 2's insight: if you don't need where the best sum is, only its value, Kadane replaces an O(n) scan with O(1) bookkeeping. So fix a top row a and a bottom row x (O(n²) such pairs), collapse every column between them into a single value using the row prefix sums — column_sum = prefix[x] − prefix[a-1] — and run Kadane across that flattened 1D strip to find the best left/right span. Total: O(n²) preprocessing + O(n²) row pairs × O(n) Kadane = O(n³). From O(n⁶) to O(n³) by pattern-matching a 2D problem onto its 1D skeleton.

Sequence DP: Decode Ways & Counting BSTs

LC 91 Decode Ways — how many ways can a digit string be decoded into letters (A=1 … Z=26)? Classic sequence DP: dp[0] = 1 (an empty string has one decoding), dp[1] depends on whether the first digit is a valid single, and each further position adds the ways from a valid one-digit read (dp[i-1]) plus a valid two-digit read (dp[i-2]).

Java — LC 91 Decode Ways

public int numDecodings(String s) {
    if (s == null || s.length() == 0) return 0;
    int n = s.length();
    int[] dp = new int[n + 1];
    dp[0] = 1;
    dp[1] = s.charAt(0) != '0' ? 1 : 0;

    for (int i = 2; i <= n; i++) {
        int one = Integer.parseInt(s.substring(i-1, i));   // last single digit
        int two = Integer.parseInt(s.substring(i-2, i));   // last two digits
        if (one >= 1 && one <= 9)  dp[i] += dp[i-1];
        if (two >= 10 && two <= 26) dp[i] += dp[i-2];
    }
    return dp[n];
}

The traps are all in the zeros: a lone 0 has no single-digit decoding, and only 10–26 qualify as a two-digit read (so 07 is invalid). The >= 10 lower bound on two is exactly what rejects a leading-zero pair.

LC 96 Unique Binary Search Trees is DP wearing combinatorics. To build a BST from 1…n, pick each value i as the root; the subsequence 1…(i-1) forms the left subtree and (i+1)…n the right, each built recursively. Distinct roots guarantee distinct trees. Let G(n) be the count for a sequence of length n; choosing root i contributes G(i-1) · G(n-i) (a Cartesian product of left and right shapes). Summing over roots gives the Catalan recurrence G(n) = Σ G(k-1)·G(n-k):

Java — LC 96 Unique BSTs (Catalan)

public int numTrees(int n) {
    if (n < 2) return 1;
    int[] dp = new int[n + 1];
    dp[0] = 1;                       // empty tree
    dp[1] = 1;                       // single node
    for (int i = 2; i <= n; i++)
        for (int k = 1; k <= i; k++)
            dp[i] += dp[k-1] * dp[i-k];   // root k: left size k-1, right size i-k
    return dp[n];
}

Its sibling LC 95 Unique Binary Search Trees II asks you to actually construct every tree, not just count them — which is precisely the "DP gives a number, recursion gives the objects" boundary from the opening section. You build LC 95 with recursion that returns the list of subtrees for each range and combines left×right, while LC 96 only needs the count DP above.

House Robber I / II / III

A trilogy that shows the same recurrence adapting to a line, a circle, and a tree. The core decision never changes: at each house, either skip it (keep the best so far) or rob it and add the best from two houses back.

Java — LC 198 & LC 213 House Robber I / II

// LC 198 — houses in a line, O(1) space via two rolling variables
public int rob(int[] nums) {
    int pre = 0, prePre = 0;               // best up to i-1 and i-2
    for (int x : nums) {
        int cur = Math.max(pre, prePre + x);    // skip x, or rob x + best two back
        prePre = pre;
        pre = cur;
    }
    return pre;
}

// LC 213 — houses in a circle: first and last are adjacent, so never rob both
public int robCircular(int[] nums) {
    if (nums == null || nums.length == 0) return 0;
    if (nums.length == 1) return nums[0];
    return Math.max(robRange(nums, 0, nums.length - 2),   // drop the last house
                    robRange(nums, 1, nums.length - 1));  // drop the first house
}

LC 198 is the linear base case — two variables (pre, prePre) roll forward, no array needed. LC 213 House Robber II arranges the houses in a circle, so robbing both the first and the last is illegal; the fix is elegant — run the linear solver twice, once over [0, n-2] and once over [1, n-1], and take the max. You've simply forbidden one of the two conflicting endpoints in each pass.

LC 337 House Robber III moves the money into a binary tree: rob a node and you skip its children, but may take its four grandchildren. This is tree DP with memoization — a HashMap keyed by TreeNode caches each node's best. For a node you compare "rob this node + best of the four grandchildren" against "skip this node + best of both children," and store the winner. Same skip-or-take decision as the linear version, hung on a recursion instead of a loop.

Two-Sequence & Interval DP: Three Hard Finishers

The session closes by naming three harder problems that extend the same machinery into two-dimensional string states — the kind that separate a passing candidate from a strong one:

LC 132 Palindrome Partitioning II — the minimum cuts to partition a string into palindromes. It pairs a boolean isPalindrome[i][j] interval DP with a 1D cut[i] DP: cut[i] = min over j of cut[j-1] + 1 whenever s[j..i] is a palindrome. Two DP tables cooperating.
LC 115 Distinct Subsequences — count how many times string t appears as a subsequence of s. Two-sequence DP: dp[i][j] = ways to form t[0..j) from s[0..i); when characters match you both take and skip (dp[i-1][j-1] + dp[i-1][j]), otherwise you only skip.
LC 97 Interleaving String — can s3 be formed by interleaving s1 and s2? A yes/no two-sequence DP where dp[i][j] is reachable if the current s3 character matches either s1[i-1] (from dp[i-1][j]) or s2[j-1] (from dp[i][j-1]).

All three are the two-sequence and interval shapes from the recipe — proof that once the state is named, even the "hard" DPs are the same four steps with a bigger table.

Problem Checklist

Problem	Pattern	Complexity
LC 64 Minimum Path Sum	matrix DP, min from top/left	O(mn)
LC 62 / 63 Unique Paths I / II	count paths, rolling 1D array	O(mn), O(n) space
LC 221 Maximal Square	dp = square side, min of 3 neighbors	O(mn)
Largest Hollow Square (GfG)	hor/ver run-length precompute	~O(n³)
LC 130 Surrounded Regions	flood from border (DFS/BFS, not DP)	O(mn)
Plus-Sign / Patterns of 1s	4-direction run-length matrices	O(n²)
Max-Sum Submatrix (GfG)	2D prefix sum + row compress + Kadane	O(n³)
LC 91 Decode Ways	sequence DP, 1- and 2-digit reads	O(n)
LC 96 / 95 Unique BSTs	Catalan count / recursive build	O(n²) / exponential
LC 198 House Robber	rolling max(skip, take+2back)	O(n)
LC 213 House Robber II	circular → linear DP twice	O(n)
LC 337 House Robber III	tree DP + HashMap memo	O(n)
LC 132 Palindrome Partitioning II	interval palindrome + cut DP	O(n²)
LC 115 Distinct Subsequences	two-sequence DP, take/skip	O(mn)
LC 97 Interleaving String	two-sequence yes/no DP	O(mn)

takeaway

DP II is state-design practice disguised as a problem set. Run the four-step recipe every time — define dp, seed the base case, write the transition, read off the answer, then optimize — and let the four keyword hints (max/min, longest/shortest, yes/no, count) flag the problems worth attacking with it. The signature move of this session is dimensional collapse: prefix sums make a rectangle O(1), a rolling array shrinks a 2D table to 1D, and compressing two rows turns the maximum-sum submatrix into a Kadane pass. And remember the fine print — DP hands you the value or the count, not the plan.

🎯 interview hot-takes

How is DP just induction? A base case plus an induction rule f(k) → f(k+1), with memoization so each subproblem is computed once. Bottom-up DP fills the table small-to-large; recursion does it top-down. DP stores values; recursion stores the concrete result.
When do you reach for DP? Four keyword signals — max/min, longest/shortest, yes/no feasibility, count the ways — over choices that build on smaller choices. But say up front: plain DP returns the value or count, not the actual plan.
How do you find the max-sum submatrix fast? Brute force is O(n⁶). A 2D prefix sum makes any rectangle O(1) via D-B-C+A → O(n⁴); then fix top/bottom rows, compress the columns into 1D, and run Kadane → O(n³).
Why the min of three in Maximal Square? dp[i][j] is the side of the largest all-ones square ending at (i,j); growing to side s+1 needs the top, left, and top-left neighbors to each support side ≥ s, so min(three) + 1. Answer is a running max, and area is side².
How does House Robber generalize? The decision max(skip, take + best two back) is constant across all three. A line is two rolling variables; a circle runs the line twice excluding one endpoint; a tree becomes memoized recursion over grandchildren.