Algorithms
Sorting, searching, hashing, asymptotic analysis, divide and conquer, dynamic programming, greedy methods, graph algorithms, NP-completeness.
Why this unit matters
Algorithms is one of the most mark-dense units in ISRO CS. Questions range from applying the Master theorem to compute recurrence solutions, to tracing Dijkstra on a small weighted graph, to identifying whether a problem is NP-complete. ISRO examiners frequently ask for the exact number of comparisons or swaps performed by a sorting algorithm on a specific input, so knowing average-case vs worst-case is not enough. You must be able to simulate the algorithm.
The unit pairs closely with Data Structures (heaps for heap sort, BST for binary search) and Theory of Computation (decidability analogies with P/NP).
Syllabus map
Asymptotic notation
- Big-O, Big-Omega, Big-Theta definitions
- Common complexity classes: O(1), O(log n), O(n), O(n log n), O(n^2), O(2^n)
- Recurrence relations: substitution method, recursion tree, Master theorem
Sorting algorithms
- Comparison-based: insertion, selection, bubble, merge, quick, heap
- Non-comparison: counting sort, radix sort, bucket sort
- Stability, in-place property, worst/average/best complexity
Searching and hashing
- Linear search, binary search
- Hash tables: division method, multiplication method
- Collision resolution: chaining, open addressing (linear probing, quadratic probing, double hashing)
- Load factor and expected time
Greedy algorithms
- Activity selection problem
- Fractional knapsack
- Huffman coding
- Minimum spanning tree: Kruskal and Prim
Dynamic programming
- Memoisation vs bottom-up (tabulation)
- Classic problems: 0/1 knapsack, longest common subsequence (LCS), matrix chain multiplication, edit distance, coin change
Divide and conquer
- Binary search, merge sort, quick sort, Strassen matrix multiplication
Graph algorithms
- BFS, DFS, topological sort
- Shortest path: Dijkstra (single source, non-negative weights), Bellman-Ford (negative weights, detects negative cycles), Floyd-Warshall (all pairs)
- MST: Kruskal (sort edges, union-find), Prim (grow tree greedily)
NP-completeness
- Complexity classes P, NP, NP-hard, NP-complete
- Reduction concept
- Classic NP-complete problems: SAT, 3-SAT, vertex cover, clique, Hamiltonian cycle, TSP, subset sum
Asymptotic notation
Master theorem
For recurrences of the form T(n) = a * T(n/b) + f(n) where a >= 1, b > 1:
Compute c = log_b(a).
- Case 1: f(n) = O(n^(c - epsilon)) for some epsilon > 0. Then T(n) = Theta(n^c).
- Case 2: f(n) = Theta(n^c * log^k n) for k >= 0. Then T(n) = Theta(n^c * log^(k+1) n).
- Case 3: f(n) = Omega(n^(c + epsilon)) and af(n/b) <= df(n) for some d < 1. Then T(n) = Theta(f(n)).
Common recurrences:
- Merge sort: T(n) = 2T(n/2) + O(n). c = 1, f(n) = Theta(n). Case 2 (k=0). T(n) = Theta(n log n).
- Binary search: T(n) = T(n/2) + O(1). c = 0, f(n) = Theta(1) = Theta(n^0). Case 2. T(n) = Theta(log n).
- Strassen: T(n) = 7T(n/2) + O(n^2). c = log2(7) approx 2.81, f(n) = O(n^2). Since 2 < 2.81, Case 1. T(n) = Theta(n^(log2 7)).
ISRO trap: Master theorem does not directly apply when f(n) has a logarithmic factor that makes it fall exactly between cases (the "gap"). Use the recursion tree method instead.
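The case analysis above can be drilled with a small self-check script. This is a simplified sketch: it assumes f(n) is exactly Theta(n^p * log^k n), a form for which the Case 3 regularity condition always holds, so it does not cover the "gap" situations mentioned above.

```python
import math

def master_theorem(a, b, p, k=0):
    """Classify T(n) = a*T(n/b) + Theta(n^p * log^k n), a >= 1, b > 1, k >= 0.

    Simplified checker: f(n) is assumed to be a polynomial times a log power,
    so Case 3 regularity never needs a separate check.
    """
    c = math.log(a, b)
    if math.isclose(p, c):                      # Case 2 (extended form): gain one log factor
        return f"Theta(n^{c:.2f} * log^{k + 1} n)"
    if p < c:                                   # Case 1: f grows polynomially slower than n^c
        return f"Theta(n^{c:.2f})"
    # Case 3: f grows polynomially faster than n^c, so f dominates
    return f"Theta(n^{p} * log^{k} n)" if k else f"Theta(n^{p})"
```

For example, `master_theorem(2, 2, 1)` reproduces the merge sort result and `master_theorem(7, 2, 2)` the Strassen result.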
Sorting algorithms
Comparison-based sort summary
| Algorithm | Best | Average | Worst | Stable | In-place |
|---|---|---|---|---|---|
| Insertion | O(n) | O(n^2) | O(n^2) | Yes | Yes |
| Selection | O(n^2) | O(n^2) | O(n^2) | No | Yes |
| Bubble | O(n) | O(n^2) | O(n^2) | Yes | Yes |
| Merge | O(n log n) | O(n log n) | O(n log n) | Yes | No |
| Quick | O(n log n) | O(n log n) | O(n^2) | No | Yes |
| Heap | O(n log n) | O(n log n) | O(n log n) | No | Yes |
Quick sort worst case occurs on already-sorted input with naive pivot selection (first or last element). Randomised pivot makes worst case highly unlikely.
Merge sort requires O(n) auxiliary space (for the merge step). For linked lists, merge sort is preferred because merging linked lists needs O(1) extra space.
Non-comparison sorts
Counting sort: O(n + k) where k is the range of values. Requires integer values in a known range.
Radix sort: sorts digit by digit (from least significant to most significant for LSD radix sort) using a stable sort (typically counting sort) for each digit. Time: O(d * (n + k)) where d = number of digits, k = base. For 32-bit integers with base 256: d = 4, k = 256.
The theoretical lower bound for comparison-based sorting is Omega(n log n). Counting and radix sort bypass this by not using comparisons.
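A minimal sketch of counting sort as described above, including the backwards output pass that makes it stable (stability is what lets radix sort use it per digit):

```python
def counting_sort(arr, k):
    """Stable counting sort for integers in [0, k]. Time O(n + k), space O(n + k)."""
    count = [0] * (k + 1)
    for x in arr:
        count[x] += 1
    # prefix sums: count[v] becomes the number of elements <= v
    for v in range(1, k + 1):
        count[v] += count[v - 1]
    out = [0] * len(arr)
    for x in reversed(arr):          # reverse pass preserves the original order of equal keys
        count[x] -= 1
        out[count[x]] = x
    return out
```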
Searching and hashing

Binary search
Works only on sorted arrays. Divides search space in half each step.
```python
def binary_search(arr, target):
    lo, hi = 0, len(arr) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1
```
Time: O(log n). Number of comparisons in worst case: floor(log2 n) + 1.
Hashing
Hash function h(k) maps key k to slot in table of size m.
Division method: h(k) = k mod m. Choose m to be a prime not close to a power of 2.
Collision resolution by chaining: each slot is a linked list. Expected search time with load factor alpha = n/m is O(1 + alpha). For alpha = 1 (n = m), this is O(1) expected.
Open addressing: all elements stored in the table itself. On collision, probe sequence determines next slot.
- Linear probing: h(k, i) = (h(k) + i) mod m. Suffers from primary clustering.
- Quadratic probing: h(k, i) = (h(k) + c1*i + c2*i^2) mod m. Suffers from secondary clustering.
- Double hashing: h(k, i) = (h1(k) + i * h2(k)) mod m. Fewer clustering issues, better distribution.
Load factor alpha must be < 1 for open addressing. Expected probes for a successful search: approximately (1/alpha) * ln(1/(1-alpha)).
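The three probe sequences above can be generated side by side for hand-checking. A hypothetical helper, assuming the division-method base hash h(k) = k mod m:

```python
def probe_sequence(key, m, method="linear", c1=1, c2=1, h2=None):
    """Full probe sequence h(key, i) for i = 0..m-1 under open addressing.

    method: 'linear', 'quadratic', or 'double' (pass a secondary hash h2 for double).
    """
    h = key % m                                  # division-method base hash
    seq = []
    for i in range(m):
        if method == "linear":
            slot = (h + i) % m
        elif method == "quadratic":
            slot = (h + c1 * i + c2 * i * i) % m
        else:                                    # double hashing
            slot = (h + i * h2(key)) % m
        seq.append(slot)
    return seq
```

For a table of size 7 and key 50, linear probing yields 1, 2, 3, 4, 5, 6, 0; note how double hashing with h2(k) = 1 + (k mod 6) visits slots in a key-dependent order instead.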
Greedy algorithms
Huffman coding
Given character frequencies, build an optimal prefix code (minimum total encoding length). Algorithm:
- Create a leaf node for each character and add to a min-priority queue.
- While more than one node: extract two nodes with minimum frequency, create a new internal node with their sum as frequency, add back.
- The resulting tree defines the code: left edge = 0, right edge = 1.
The greedy choice is always merging the two least-frequent nodes first. This is provably optimal (exchange argument).
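The merge loop above maps directly onto a min-heap. A sketch using Python's `heapq`, with a counter as a tiebreaker so heap entries stay comparable:

```python
import heapq

def huffman_codes(freq):
    """Huffman coding: freq maps symbol -> frequency. Returns symbol -> bitstring.

    Left edge = '0', right edge = '1'. Subtrees are nested tuples; leaves are symbols.
    """
    heap = [(f, i, sym) for i, (sym, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)        # two least-frequent subtrees
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, counter, (left, right)))
        counter += 1
    codes = {}
    def walk(node, code):
        if isinstance(node, tuple):
            walk(node[0], code + "0")
            walk(node[1], code + "1")
        else:
            codes[node] = code or "0"            # single-symbol edge case
    walk(heap[0][2], "")
    return codes
```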
Activity selection
Given activities with start and end times, select the maximum number of non-overlapping activities. Greedy choice: always pick the activity with the earliest finish time that is compatible with the previously selected activity. Provably optimal by exchange argument.
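The earliest-finish-time rule fits in a few lines. A sketch, with sample activity data chosen for illustration:

```python
def activity_selection(activities):
    """Greedy activity selection: activities is a list of (start, finish) pairs.

    Sorts by finish time and keeps each activity compatible with the last one chosen.
    """
    chosen = []
    last_finish = float("-inf")
    for start, finish in sorted(activities, key=lambda a: a[1]):
        if start >= last_finish:         # compatible: starts after the last one ends
            chosen.append((start, finish))
            last_finish = finish
    return chosen
```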
Dynamic programming
Key insight
DP applies when the problem has optimal substructure (optimal solution contains optimal solutions to subproblems) and overlapping subproblems (same subproblems solved multiple times in a naive recursion).
0/1 Knapsack
n items, each with weight w[i] and value v[i], knapsack capacity W. dp[i][j] = maximum value using first i items with capacity j.
Recurrence:
- dp[0][j] = 0 for all j.
- dp[i][j] = dp[i-1][j] if w[i] > j (cannot include item i).
- dp[i][j] = max(dp[i-1][j], dp[i-1][j-w[i]] + v[i]) otherwise.
Time: O(n * W). Space: O(n * W), reducible to O(W) with a 1D array.
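The O(W) space reduction mentioned above comes from iterating capacity downward, so each item is considered at most once per row. A minimal sketch:

```python
def knapsack(weights, values, W):
    """0/1 knapsack, bottom-up DP with a 1-D rolling array. dp[j] = best value at capacity j.

    Iterating j downward ensures dp[j - w] still refers to the previous item row,
    so no item is used twice.
    """
    dp = [0] * (W + 1)
    for w, v in zip(weights, values):
        for j in range(W, w - 1, -1):
            dp[j] = max(dp[j], dp[j - w] + v)
    return dp[W]
```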
LCS
dp[i][j] = length of LCS of X[1..i] and Y[1..j].
- dp[i][j] = dp[i-1][j-1] + 1 if X[i] == Y[j].
- dp[i][j] = max(dp[i-1][j], dp[i][j-1]) otherwise.
Time: O(m * n). To recover the actual LCS, backtrack through the dp table.
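A sketch of the table fill plus the backtracking step that recovers one LCS (when several LCS exist, which one is returned depends on the tie-breaking rule):

```python
def lcs(X, Y):
    """LCS length via bottom-up DP, then backtrack through the table for one LCS string."""
    m, n = len(X), len(Y)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if X[i - 1] == Y[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    # backtrack from dp[m][n]: diagonal on a match, otherwise follow the larger cell
    out, i, j = [], m, n
    while i > 0 and j > 0:
        if X[i - 1] == Y[j - 1]:
            out.append(X[i - 1])
            i -= 1
            j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return dp[m][n], "".join(reversed(out))
```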
Graph algorithms
Dijkstra's algorithm
Single-source shortest path for graphs with non-negative edge weights. Uses a min-priority queue (binary heap). Time: O((V + E) log V) with a binary heap, or O(V^2) with a simple array.
Does NOT work with negative edge weights. For negative weights (no negative cycles), use Bellman-Ford.
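A minimal sketch of Dijkstra with Python's `heapq` as the min-priority queue, using lazy deletion (stale heap entries are skipped on extraction rather than decreased in place):

```python
import heapq

def dijkstra(adj, source):
    """Dijkstra's algorithm. adj: vertex -> list of (neighbour, weight), weights >= 0.

    Returns the shortest-distance dict from source.
    """
    dist = {v: float("inf") for v in adj}
    dist[source] = 0
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:                  # stale entry: u was already finalised
            continue
        for v, w in adj[u]:
            if d + w < dist[v]:          # relax edge (u, v)
                dist[v] = d + w
                heapq.heappush(heap, (dist[v], v))
    return dist
```

Running it on the small graph from worked Example 3 below reproduces the hand trace.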
Bellman-Ford
Relaxes all edges V-1 times. Detects negative cycles by checking if any edge can still be relaxed after V-1 iterations. Time: O(V * E).
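Both the V-1 relaxation rounds and the negative-cycle check fit in one short function. A sketch on an edge-list representation:

```python
def bellman_ford(vertices, edges, source):
    """Bellman-Ford: edges is a list of (u, v, w) for directed edge u -> v with weight w.

    Returns (dist, has_negative_cycle).
    """
    dist = {v: float("inf") for v in vertices}
    dist[source] = 0
    for _ in range(len(vertices) - 1):           # relax all edges V-1 times
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
    # one extra pass: any further relaxation implies a negative cycle
    has_neg = any(dist[u] + w < dist[v] for u, v, w in edges)
    return dist, has_neg
```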
Floyd-Warshall
All-pairs shortest paths. dp[i][j][k] = shortest path from i to j using only vertices {1..k} as intermediaries.
Recurrence: dp[i][j][k] = min(dp[i][j][k-1], dp[i][k][k-1] + dp[k][j][k-1]).
Time: O(V^3). Space: O(V^2) using in-place updates.
ISRO trap: Floyd-Warshall works with negative edges but not negative cycles. To detect negative cycles, check if dp[i][i] < 0 for any i after completion.
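The in-place O(V^2) formulation and the diagonal check can be sketched together, operating on an adjacency matrix with `inf` for missing edges:

```python
def floyd_warshall(dist):
    """In-place Floyd-Warshall on an n x n distance matrix (float('inf') = no edge).

    Returns (matrix, has_negative_cycle); a negative diagonal entry signals a cycle.
    """
    n = len(dist)
    for k in range(n):                   # allow vertex k as an intermediary
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    has_neg_cycle = any(dist[i][i] < 0 for i in range(n))
    return dist, has_neg_cycle
```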
Topological sort
Applies only to directed acyclic graphs (DAGs). Two methods:
- Kahn's algorithm: repeatedly remove vertices with in-degree 0. Uses a queue. O(V + E).
- DFS-based: finish times in DFS give reverse topological order. O(V + E).
If during Kahn's algorithm not all vertices are processed (queue empties with remaining vertices), a cycle exists.
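A sketch of Kahn's algorithm with the cycle check described above:

```python
from collections import deque

def kahn_toposort(adj):
    """Kahn's algorithm. adj: vertex -> list of successors.

    Returns one topological order, or None if the graph contains a cycle.
    """
    indeg = {v: 0 for v in adj}
    for u in adj:
        for v in adj[u]:
            indeg[v] += 1
    queue = deque(v for v in adj if indeg[v] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in adj[u]:                 # removing u lowers each successor's in-degree
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    return order if len(order) == len(adj) else None   # leftover vertices => cycle
```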
NP-completeness
P: problems solvable in polynomial time. NP: problems whose solutions can be verified in polynomial time. NP-hard: at least as hard as the hardest problem in NP (not necessarily in NP). NP-complete: in NP and NP-hard.
The open question is whether P = NP. If any NP-complete problem is shown to be in P, then all NP problems are in P.
Classic NP-complete problems: Boolean satisfiability (SAT, the first proved NP-complete by Cook-Levin theorem), 3-SAT, vertex cover, clique, independent set, Hamiltonian cycle, travelling salesman decision version, subset sum, graph colouring.
To show problem X is NP-complete: (1) show X is in NP by demonstrating a polynomial-time verifier, and (2) show a known NP-complete problem reduces to X in polynomial time.
Worked examples
Example 1: Master theorem
T(n) = 3T(n/4) + n^2 log n.
a = 3, b = 4, c = log4(3) approx 0.79. f(n) = n^2 log n = Omega(n^(c + epsilon)), e.g. for epsilon = 1, so Case 3 is a candidate; it remains to check the regularity condition. Check: a * f(n/b) = 3 * (n/4)^2 * log(n/4) = (3/16) * n^2 * (log n - 2) <= (3/16) * n^2 log n = d * f(n) for d = 3/16 < 1. Case 3 applies. T(n) = Theta(n^2 log n).
Example 2: Counting comparisons in quick sort
Array: [5, 3, 8, 1, 9, 2], pivot = last element = 2.
Partition (Lomuto scheme, boundary starts just before the subarray): compare 5 > 2 (boundary does not move), 3 > 2, 8 > 2, 1 < 2 (advance boundary, swap 1 and 5, giving [1, 3, 8, 5, 9, 2]), 9 > 2. Finally the pivot swaps with the element just after the boundary: [1, 2, 8, 5, 9, 3]. That is 5 comparisons against the pivot.
Pivot 2 is placed at index 1. Left subarray [1] is trivially sorted; right subarray [8, 5, 9, 3] is recursed on.
Total comparisons: for a subarray of size n, partitioning makes n-1 comparisons with the pivot. Worst case (sorted input, last-element pivot) gives n-1 + n-2 + ... + 1 = n(n-1)/2 = O(n^2).
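The trace and the worst-case count can be verified mechanically. A sketch of quick sort with the Lomuto partition (last-element pivot) and a comparison counter:

```python
def quicksort_comparisons(arr):
    """Quick sort (Lomuto partition, last-element pivot).

    Returns (sorted_list, number_of_pivot_comparisons). Does not modify the input.
    """
    a = list(arr)
    count = 0

    def sort(lo, hi):
        nonlocal count
        if lo >= hi:
            return
        pivot, i = a[hi], lo - 1             # i = boundary of the "less than pivot" region
        for j in range(lo, hi):
            count += 1                       # one comparison per non-pivot element
            if a[j] < pivot:
                i += 1
                a[i], a[j] = a[j], a[i]
        a[i + 1], a[hi] = a[hi], a[i + 1]    # place pivot at its final position
        sort(lo, i)
        sort(i + 2, hi)

    sort(0, len(a) - 1)
    return a, count
```

On an already-sorted input of size 5 this reports 10 comparisons, matching n(n-1)/2.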
Example 3: Dijkstra trace
Graph: vertices {A, B, C, D}. Edges: A-B weight 1, A-C weight 4, B-C weight 2, B-D weight 5, C-D weight 1.
Start at A. Initial distances: A=0, B=inf, C=inf, D=inf.
Step 1: Extract A (dist 0). Relax neighbours. B: 0+1=1, C: 0+4=4. Step 2: Extract B (dist 1). Relax. C: min(4, 1+2)=3, D: 1+5=6. Step 3: Extract C (dist 3). Relax. D: min(6, 3+1)=4. Step 4: Extract D (dist 4). No neighbours to relax.
Shortest paths from A: B=1, C=3, D=4.
Example 4: LCS of two strings
X = "ABCBDAB", Y = "BDCABA".
Build dp table (dimensions 8x7, including 0-row and 0-column of zeros). The LCS length is dp[7][6] = 4. One LCS is "BCBA" (or "BDAB").
The critical formula: match if X[i] == Y[j], otherwise take max from above or left.
Example 5: Huffman coding
Characters and frequencies: A=5, B=9, C=12, D=13, E=16, F=45.
Priority queue (min): {A:5, B:9, C:12, D:13, E:16, F:45}.
Merge A+B: 14. Queue: {C:12, D:13, E:16, AB:14, F:45}. Merge C+D: 25. Queue: {E:16, AB:14, F:45, CD:25}. Merge E+AB: 30. Queue: {CD:25, F:45, EAB:30}. Merge CD+EAB: 55. Queue: {F:45, CDEAB:55}. Merge: 100 (root).
F gets a code of length 1 (shortest), A and B get codes of length 4 (deepest). Optimal encoding length = 5*4 + 9*4 + 12*3 + 13*3 + 16*3 + 45*1 = 20 + 36 + 36 + 39 + 48 + 45 = 224 bits for 100 characters.
Quick-revision summary
- Master theorem: compute c = log_b(a), compare with growth rate of f(n). Three cases.
- Merge sort: stable, O(n log n) always, O(n) space. Quick sort: in-place, O(n log n) average, O(n^2) worst.
- Counting sort and radix sort beat O(n log n) by not comparing elements.
- Dijkstra: non-negative weights only. Bellman-Ford: handles negative weights, detects negative cycles. Floyd-Warshall: all-pairs, O(V^3).
- Kruskal: sort edges, add if no cycle (union-find). Prim: grow tree greedily using priority queue.
- Greedy for activity selection and Huffman coding: exchange argument proves optimality.
- 0/1 knapsack: DP O(nW). LCS: DP O(mn). Matrix chain: DP O(n^3).
- NP-complete: in NP and NP-hard. SAT was the first proved NP-complete.
- Topological sort only on DAGs. Kahn's algorithm detects cycle if not all vertices are processed.
How to study this unit
- Memorise and practise applying the Master theorem to at least 10 different recurrences. Cover cases where it does not apply (non-polynomial f, logarithmic factors).
- Simulate sorting algorithms on small arrays (5-7 elements) by hand, counting comparisons. ISRO asks "how many comparisons to sort [specific array] using merge sort".
- For graph algorithms, practise on the same small graph (5-6 vertices) using Dijkstra, Bellman-Ford, and Floyd-Warshall. Compare the results and the number of steps.
- For DP problems, always write the recurrence first, then draw the table. Fill the table systematically before computing the answer.
- Know the list of NP-complete problems well enough to classify a new problem by analogy. ISRO may describe a variant and ask if it is in P or NP-complete.
- Review hashing collision resolution step by step: probe sequences for linear, quadratic, and double hashing for a small table (size 7 or 11).