Algorithms
Searching, sorting, hashing, asymptotic analysis, greedy, DP, divide & conquer, graph algorithms.
Why this unit matters
Algorithms is the most analytically demanding unit in GATE CS and arguably the most rewarding. Questions span complexity analysis, algorithm correctness, and numerical tracing. This unit overlaps with Unit 4 (data structures) and Unit 1 (recurrences and combinatorics); strong performance here amplifies marks across the paper.
GATE draws 10-15 marks from algorithms each year. Expect: worst-case complexity of sorting algorithms, hash collision counting, Dijkstra/Bellman-Ford on a given graph, dynamic programming subproblem formulation, greedy correctness argument, and Master Theorem applications.
Syllabus map
Asymptotic Complexity
- Big-O, Omega, Theta, Little-o, Little-omega definitions
- Recurrence solving: substitution, recursion tree, Master Theorem
- Time and space complexity analysis
Searching and Sorting
- Linear search: O(n)
- Binary search: O(log n), conditions, invariants
- Bubble sort, selection sort, insertion sort: O(n^2)
- Merge sort: O(n log n), stable, not in-place
- Quick sort: O(n log n) average, O(n^2) worst, in-place
- Heap sort: O(n log n), in-place, not stable
- Counting sort, radix sort: linear time under conditions
- Lower bound for comparison-based sorting: Omega(n log n)
Hashing
- Hash functions, hash tables
- Collision resolution: chaining, open addressing (linear probing, quadratic probing, double hashing)
- Load factor, expected number of probes
Greedy Algorithms
- Activity selection, interval scheduling
- Fractional knapsack
- Huffman coding
- Greedy correctness (exchange argument)
Dynamic Programming
- Optimal substructure and overlapping subproblems
- 0/1 Knapsack
- Longest Common Subsequence (LCS)
- Longest Increasing Subsequence (LIS)
- Matrix chain multiplication
- Coin change
Divide and Conquer
- Merge sort, quick sort
- Binary search
- Strassen's matrix multiplication
- Closest pair of points
Graph Algorithms
- BFS: shortest path in unweighted graph, O(V+E)
- DFS: topological sort, finding connected/strongly connected components, O(V+E)
- Dijkstra's algorithm: single-source shortest path, non-negative weights, O((V+E) log V) with min-heap
- Bellman-Ford: handles negative weights, detects negative cycles, O(VE)
- Floyd-Warshall: all-pairs shortest paths, O(V^3)
- Prim's and Kruskal's: minimum spanning trees, O(E log V)
Asymptotic Complexity
Definitions
- f(n) = O(g(n)): there exist c > 0 and n0 such that f(n) <= c*g(n) for all n >= n0. (Upper bound)
- f(n) = Omega(g(n)): there exist c > 0 and n0 such that f(n) >= c*g(n) for all n >= n0. (Lower bound)
- f(n) = Theta(g(n)): f(n) = O(g(n)) and f(n) = Omega(g(n)). (Tight bound)
Common growth rates (slow to fast): O(1) < O(log n) < O(sqrt(n)) < O(n) < O(n log n) < O(n^2) < O(n^3) < O(2^n) < O(n!)
Typical GATE trap: log(n!) = Theta(n log n) by Stirling's approximation. This is often needed in sorting lower-bound proofs.
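A quick numeric check of this fact (a Python sketch; `math.lgamma(n + 1)` gives ln(n!) without overflow):

```python
import math

# Illustrate log(n!) = Theta(n log n): the ratio log2(n!) / (n log2 n)
# climbs toward 1 as n grows, per Stirling's approximation.
ratios = []
for n in [10, 100, 1000, 10000]:
    log2_fact = math.lgamma(n + 1) / math.log(2)  # log2(n!) without computing n!
    ratios.append(log2_fact / (n * math.log2(n)))
print(ratios)
```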
Master Theorem
For T(n) = aT(n/b) + f(n) where a >= 1, b > 1:
Let p = log_b(a) (the "critical exponent").
- Case 1: If f(n) = O(n^(p-epsilon)) for some epsilon > 0, then T(n) = Theta(n^p).
- Case 2: If f(n) = Theta(n^p), then T(n) = Theta(n^p * log n).
- Case 3: If f(n) = Omega(n^(p+epsilon)) for some epsilon > 0, AND af(n/b) <= cf(n) for c < 1 (regularity condition), then T(n) = Theta(f(n)).
Worked example: T(n) = 4T(n/2) + n.
a = 4, b = 2, p = log_2(4) = 2. f(n) = n = O(n^(2-1)) = O(n^1). Case 1 applies. T(n) = Theta(n^2).
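The conclusion can be sanity-checked numerically (a sketch assuming the base case T(1) = 1; with that choice the ratio T(n)/n^2 settles near the constant 2, consistent with Theta(n^2)):

```python
from functools import lru_cache

# T(n) = 4T(n/2) + n evaluated exactly for powers of two.
# If T(n) = Theta(n^2), then T(n)/n^2 approaches a constant.
@lru_cache(maxsize=None)
def T(n):
    return 1 if n == 1 else 4 * T(n // 2) + n

for k in (5, 10, 20):
    n = 2 ** k
    print(n, T(n) / n ** 2)
```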
Worked example: T(n) = 2T(n/2) + n log n.
a = 2, b = 2, p = log_2(2) = 1. f(n) = n log n. Compare with n^1: n log n grows faster than n, but not polynomially faster. Case 3 would need n log n = Omega(n^(1+epsilon)) for some epsilon > 0; in fact n log n = o(n^(1+epsilon)) for every epsilon > 0, so Case 3 does NOT apply. This is a Master Theorem gap case; use the recursion tree method instead. Result: T(n) = Theta(n log^2 n).
Sorting Algorithms
Comparison-Based Sorts
| Algorithm | Best | Average | Worst | Space | Stable |
|---|---|---|---|---|---|
| Bubble sort | O(n) | O(n^2) | O(n^2) | O(1) | Yes |
| Selection sort | O(n^2) | O(n^2) | O(n^2) | O(1) | No |
| Insertion sort | O(n) | O(n^2) | O(n^2) | O(1) | Yes |
| Merge sort | O(n log n) | O(n log n) | O(n log n) | O(n) | Yes |
| Quick sort | O(n log n) | O(n log n) | O(n^2) | O(log n) | No |
| Heap sort | O(n log n) | O(n log n) | O(n log n) | O(1) | No |
Lower bound for comparison-based sorting: Any comparison-based sorting algorithm must make Omega(n log n) comparisons in the worst case. Proof: a decision tree for n elements has n! leaves; its height is at least log2(n!) = Theta(n log n).
Quick sort worst case: Occurs when the pivot is always the smallest or largest element; this happens on already-sorted or reverse-sorted input with naive pivot selection (first or last element). Random pivot selection or median-of-three reduces this risk.
Merge sort: Recurrence T(n) = 2T(n/2) + n. Master Theorem Case 2: T(n) = Theta(n log n). Stable and guaranteed O(n log n) but requires O(n) extra space.
Worked example, Quick sort pivot: Sort [3, 1, 4, 1, 5, 9, 2, 6] with last element (6) as pivot.
Partition: elements < 6 on left, > 6 on right. [3, 1, 4, 1, 5, 2, 6, 9], pivot 6 is at index 6. Recursively sort [3,1,4,1,5,2] and [9]. Continue until sorted.
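The partition step can be sketched in Python (Lomuto scheme with the last element as pivot, replaying the worked example's array):

```python
def lomuto_partition(a, lo, hi):
    # Partition a[lo..hi] in place around pivot a[hi]; return the pivot's final index.
    pivot = a[hi]
    i = lo                         # boundary: a[lo..i-1] holds elements < pivot
    for j in range(lo, hi):
        if a[j] < pivot:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[i], a[hi] = a[hi], a[i]      # place the pivot between the two regions
    return i

a = [3, 1, 4, 1, 5, 9, 2, 6]
p = lomuto_partition(a, 0, len(a) - 1)
print(a, p)                        # [3, 1, 4, 1, 5, 2, 6, 9] with pivot at index 6
```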
Linear-Time Sorting
Counting sort: For integers in range [0, k], uses an auxiliary array of size k+1. Time: O(n+k), Space: O(n+k). Stable. Not comparison-based.
Radix sort: Sort digit by digit (LSD to MSD) using a stable sort (typically counting sort) at each digit position. Time: O(d*(n+k)) where d = number of digits, k = range per digit. Effective when d is small.
Typical GATE trap: Radix sort requires a stable sort at each digit pass. If counting sort (which is stable) is used, radix sort is stable overall.
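A minimal LSD radix sort sketch in Python (assumes non-negative integers; the sample keys are hypothetical). The right-to-left pass inside counting sort is exactly what preserves stability:

```python
def counting_sort_by_digit(a, exp):
    # Stable counting sort on the base-10 digit at place value `exp`.
    count = [0] * 10
    for x in a:
        count[(x // exp) % 10] += 1
    for d in range(1, 10):
        count[d] += count[d - 1]   # prefix sums give each digit's end position
    out = [0] * len(a)
    for x in reversed(a):          # right-to-left pass keeps equal digits stable
        d = (x // exp) % 10
        count[d] -= 1
        out[count[d]] = x
    return out

def radix_sort(a):
    # LSD radix sort: one stable counting-sort pass per digit, least significant first.
    exp = 1
    while max(a) // exp > 0:
        a = counting_sort_by_digit(a, exp)
        exp *= 10
    return a

print(radix_sort([170, 45, 75, 90, 802, 24, 2, 66]))
```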
Hashing
Collision Resolution
Chaining: Each table slot holds a linked list. With n elements in m slots, average list length = load factor alpha = n/m. Successful search: O(1 + alpha). Unsuccessful search: O(1 + alpha).
Open addressing: Probe a sequence of slots. Three variants:
- Linear probing: h(k, i) = (h'(k) + i) mod m. Simple but causes primary clustering.
- Quadratic probing: h(k, i) = (h'(k) + c1i + c2i^2) mod m. Reduces primary clustering but causes secondary clustering.
- Double hashing: h(k, i) = (h1(k) + i*h2(k)) mod m. Best distribution, no clustering issues.
Expected number of probes for unsuccessful search in open addressing: approximately 1/(1 - alpha) for alpha < 1.
Worked example: Hash table with 7 slots, hash function h(k) = k mod 7. Insert keys 50, 700, 76, 85, 92, 73 using linear probing.
50 mod 7 = 1 → slot 1
700 mod 7 = 0 → slot 0
76 mod 7 = 6 → slot 6
85 mod 7 = 1 → slot 1 occupied, probe slot 2
92 mod 7 = 1 → slots 1,2 occupied, probe slot 3
73 mod 7 = 3 → slot 3 occupied, probe slot 4
Table: [700, 50, 85, 92, 73, _, 76]. Note primary clustering around slot 1.
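The same insertion sequence can be replayed with a short Python sketch (slot layout matches the table above; `None` marks an empty slot; the sketch assumes the table never fills):

```python
def insert_linear_probing(keys, m=7):
    # Open addressing with h(k, i) = (k mod m + i) mod m.
    table = [None] * m
    for k in keys:
        i = k % m
        while table[i] is not None:
            i = (i + 1) % m        # linear probe: try the next slot
        table[i] = k
    return table

print(insert_linear_probing([50, 700, 76, 85, 92, 73]))
# [700, 50, 85, 92, 73, None, 76]
```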
Greedy Algorithms
Activity Selection
Given n activities with start times s_i and finish times f_i, select the maximum number of non-overlapping activities.
Greedy choice: always select the activity with the earliest finish time among those compatible with previously selected activities. Time: O(n log n) for sorting + O(n) for selection.
Why it works: by an exchange argument, any optimal solution can be transformed to include the greedy choice without reducing the number of activities selected.
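The greedy rule in code (a sketch; the interval list is hypothetical sample data given as (start, finish) pairs):

```python
def select_activities(intervals):
    # Sort by finish time; keep each activity that starts no earlier than
    # the finish of the last one chosen.
    chosen = []
    last_finish = float("-inf")
    for start, finish in sorted(intervals, key=lambda iv: iv[1]):
        if start >= last_finish:
            chosen.append((start, finish))
            last_finish = finish
    return chosen

acts = [(1, 4), (3, 5), (0, 6), (5, 7), (3, 9), (5, 9), (6, 10), (8, 11)]
print(select_activities(acts))     # [(1, 4), (5, 7), (8, 11)]
```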
Huffman Coding
Build an optimal prefix-free binary code for characters with given frequencies. Greedy: repeatedly merge the two lowest-frequency nodes into a new internal node. Use a min-heap to get the two minimums efficiently. Time: O(n log n).
Worked example: Frequencies: A=5, B=9, C=12, D=13, E=16, F=45.
Step 1: Merge A(5) and B(9) → AB(14). Step 2: Merge C(12) and D(13) → CD(25). Step 3: Merge AB(14) and E(16) → ABE(30). Step 4: Merge CD(25) and ABE(30) → CDABE(55). Step 5: Merge F(45) and CDABE(55) → root(100).
F gets code 0 (1 bit). Others get longer codes. A and B get the longest codes.
Typical GATE trap: Huffman produces an optimal prefix-free code for a given set of frequencies, but the code itself is not unique, only the total length of the encoded string is optimal.
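A compact way to compute the code lengths for the example above with a min-heap (a sketch: it tracks only code lengths, not the bit strings themselves, since those are not unique anyway):

```python
import heapq

def huffman_code_lengths(freqs):
    # Each merge pushes every symbol in the merged subtree one level deeper.
    heap = [(f, [sym]) for sym, f in freqs.items()]
    heapq.heapify(heap)
    depth = {sym: 0 for sym in freqs}
    while len(heap) > 1:
        f1, syms1 = heapq.heappop(heap)    # two lowest-frequency nodes
        f2, syms2 = heapq.heappop(heap)
        for s in syms1 + syms2:
            depth[s] += 1
        heapq.heappush(heap, (f1 + f2, syms1 + syms2))
    return depth

print(huffman_code_lengths({"A": 5, "B": 9, "C": 12, "D": 13, "E": 16, "F": 45}))
# {'A': 4, 'B': 4, 'C': 3, 'D': 3, 'E': 3, 'F': 1}
```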
Dynamic Programming
Key Conditions
A problem is suitable for DP when it has:
- Optimal substructure: optimal solution contains optimal solutions to subproblems.
- Overlapping subproblems: same subproblems are solved multiple times in a naive recursive approach.
0/1 Knapsack
n items, each with weight w_i and value v_i. Capacity W. Choose items to maximise total value without exceeding W.
Recurrence: dp[i][w] = max(dp[i-1][w], v_i + dp[i-1][w - w_i]) if w_i <= w, else dp[i-1][w].
Time: O(nW), Space: O(nW) (or O(W) with 1D optimisation). NOT polynomial in the input size: the value W can be exponential in the number of bits needed to represent it, so this is a pseudopolynomial algorithm.
Worked example: Items: (w=2, v=3), (w=3, v=4), (w=4, v=5). Capacity W=5.
dp[0][*] = 0 (no items)
Item 1 (w=2, v=3):
dp[1][0]=0, dp[1][1]=0, dp[1][2]=3, dp[1][3]=3, dp[1][4]=3, dp[1][5]=3
Item 2 (w=3, v=4):
dp[2][0]=0, dp[2][1]=0, dp[2][2]=3, dp[2][3]=4, dp[2][4]=4, dp[2][5]=7
Item 3 (w=4, v=5):
dp[3][0]=0, ..., dp[3][4]=5, dp[3][5]=7
Maximum value = dp[3][5] = 7 (items 1 and 2: weights 2+3=5, values 3+4=7).
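The table-filling above, as a Python sketch (items given as (weight, value) pairs):

```python
def knapsack_01(items, W):
    # dp[i][w] = best value using the first i items with capacity w.
    n = len(items)
    dp = [[0] * (W + 1) for _ in range(n + 1)]
    for i, (w_i, v_i) in enumerate(items, start=1):
        for w in range(W + 1):
            dp[i][w] = dp[i - 1][w]                                 # skip item i
            if w_i <= w:
                dp[i][w] = max(dp[i][w], v_i + dp[i - 1][w - w_i])  # take item i
    return dp[n][W]

print(knapsack_01([(2, 3), (3, 4), (4, 5)], 5))  # 7
```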
Longest Common Subsequence (LCS)
Recurrence:
- If X[i] == Y[j]: dp[i][j] = 1 + dp[i-1][j-1], Else: dp[i][j] = max(dp[i-1][j], dp[i][j-1])
Time: O(mn), Space: O(mn).
Typical GATE trap: LCS is not the same as longest common substring. LCS allows gaps (non-contiguous); substring requires contiguous characters.
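The LCS recurrence as a runnable sketch:

```python
def lcs_length(X, Y):
    # dp[i][j] = LCS length of the prefixes X[:i] and Y[:j].
    m, n = len(X), len(Y)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if X[i - 1] == Y[j - 1]:
                dp[i][j] = 1 + dp[i - 1][j - 1]
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n]

print(lcs_length("ABCBDAB", "BDCABA"))  # 4
```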
Graph Algorithms
BFS and DFS
BFS uses a queue. Produces BFS tree, computes shortest path (by number of edges) in unweighted graphs. Time: O(V+E).
DFS uses a stack (or recursion). Produces DFS tree/forest. Used for:
- Topological sort (reverse finishing order of DFS on a DAG)
- Finding strongly connected components (Kosaraju's or Tarjan's algorithm)
- Cycle detection
Topological sort condition: A topological ordering exists if and only if the graph is a DAG (Directed Acyclic Graph).
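A DFS-based topological sort sketch (the four-vertex DAG is hypothetical sample data; the sketch assumes the input is acyclic and does no cycle detection):

```python
def topological_sort(vertices, edges):
    # Append each vertex when its DFS call finishes; the reverse finishing
    # order is a topological order on a DAG.
    adj = {v: [] for v in vertices}
    for u, v in edges:
        adj[u].append(v)
    visited, order = set(), []
    def dfs(u):
        visited.add(u)
        for v in adj[u]:
            if v not in visited:
                dfs(v)
        order.append(u)            # u finishes after all its descendants
    for v in vertices:
        if v not in visited:
            dfs(v)
    return order[::-1]

print(topological_sort("ABCD", [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]))
```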
Dijkstra's Algorithm
Single-source shortest path for graphs with non-negative edge weights.
Initialize dist[source] = 0, dist[all others] = infinity
Priority queue Q with all vertices keyed by dist
While Q is not empty:
u = extract-min from Q
For each neighbour v of u:
if dist[u] + w(u,v) < dist[v]:
dist[v] = dist[u] + w(u,v)
update v's key in Q
Time: O((V+E) log V) with a binary min-heap.
Typical GATE trap: Dijkstra fails with negative edge weights. Use Bellman-Ford instead.
Worked example: Dijkstra on graph with edges: A-B(4), A-C(2), B-C(5), B-D(10), C-E(3), E-D(4). Source = A.
Initial: dist[A]=0, others=inf
Extract A: relax B→4, C→2.
Extract C(2): relax E→5.
Extract B(4): relax D→14.
Extract E(5): relax D→9.
Extract D(9): no improvements.
Final: A=0, B=4, C=2, D=9, E=5.
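The trace above can be reproduced with a heap-based Python sketch (the adjacency list encodes the worked example's edges in the direction the trace relaxes them; a "lazy deletion" heap is used in place of a decrease-key operation):

```python
import heapq

def dijkstra(adj, source):
    # Pop the closest unsettled vertex; skip stale heap entries instead of
    # updating keys in place.
    dist = {v: float("inf") for v in adj}
    dist[source] = 0
    pq = [(0, source)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist[u]:
            continue               # stale entry for an already-settled vertex
        for v, w in adj[u]:
            if d + w < dist[v]:
                dist[v] = d + w
                heapq.heappush(pq, (dist[v], v))
    return dist

graph = {"A": [("B", 4), ("C", 2)], "B": [("C", 5), ("D", 10)],
         "C": [("E", 3)], "D": [], "E": [("D", 4)]}
print(dijkstra(graph, "A"))  # {'A': 0, 'B': 4, 'C': 2, 'D': 9, 'E': 5}
```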
Bellman-Ford
Relaxes all edges V-1 times. Detects negative cycles on the V-th pass.
Time: O(VE). Slower than Dijkstra but handles negative weights.
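A minimal Bellman-Ford sketch (the edge list is hypothetical and includes one negative edge, which is exactly the case where Dijkstra would not apply):

```python
def bellman_ford(vertices, edges, source):
    # Relax every edge V-1 times; any improvement on a further pass means a
    # negative cycle is reachable from the source.
    dist = {v: float("inf") for v in vertices}
    dist[source] = 0
    for _ in range(len(vertices) - 1):
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
    for u, v, w in edges:          # V-th pass: detection only
        if dist[u] + w < dist[v]:
            raise ValueError("negative cycle reachable from source")
    return dist

edges = [("A", "B", 4), ("A", "C", 2), ("B", "C", -5), ("C", "D", 3)]
print(bellman_ford("ABCD", edges, "A"))  # {'A': 0, 'B': 4, 'C': -1, 'D': 2}
```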
Minimum Spanning Tree
Kruskal's: Sort edges by weight, add edge if it doesn't create a cycle (use Union-Find). O(E log E).
Prim's: Grow MST one vertex at a time, always picking the minimum-weight edge crossing the cut. O(E log V) with binary heap.
Both produce an MST of total weight = minimum possible to connect all V vertices. An MST has exactly V-1 edges.
Typical GATE trap: MST is not necessarily unique. If all edge weights are distinct, the MST is unique. If there are ties, multiple MSTs with the same total weight may exist.
Worked example: Find MST of the graph: edges (A,B,4), (A,C,3), (B,C,1), (B,D,2), (C,D,5).
Kruskal's run, with edges sorted by weight: (B,C,1), (B,D,2), (A,C,3), (A,B,4), (C,D,5).
- Add (B,C,1): no cycle. MST edges: {B-C}.
- Add (B,D,2): no cycle. MST edges: {B-C, B-D}.
- Add (A,C,3): no cycle. MST edges: {B-C, B-D, A-C}.
- Skip (A,B,4): it would create cycle A-B-C-A.
- Done: 3 = V-1 edges for 4 vertices.
MST total weight = 1+2+3 = 6.
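The same run in Python, with a small Union-Find (path compression only; union by rank omitted for brevity):

```python
def kruskal(labels, edges):
    # Add edges in weight order; Union-Find rejects edges that close a cycle.
    parent = {v: v for v in labels}
    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v
    mst = []
    for u, v, w in sorted(edges, key=lambda e: e[2]):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv                 # union the two components
            mst.append((u, v, w))
    return mst

edges = [("A", "B", 4), ("A", "C", 3), ("B", "C", 1), ("B", "D", 2), ("C", "D", 5)]
mst = kruskal("ABCD", edges)
print(mst, sum(w for _, _, w in mst))  # [('B', 'C', 1), ('B', 'D', 2), ('A', 'C', 3)] 6
```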
Worked Examples
Example 1 (Master Theorem): T(n) = 3T(n/3) + n^2.
a=3, b=3, p = log_3(3) = 1. f(n) = n^2 = Omega(n^(1+1)) = Omega(n^2). Case 3: check regularity: 3*(n/3)^2 = 3n^2/9 = n^2/3 <= cn^2 for c=1/3 < 1. Yes. T(n) = Theta(n^2).
Example 2 (Sorting): What is the worst-case number of comparisons for insertion sort on an array of n elements?
Worst case (reverse-sorted): 1+2+...+(n-1) = n(n-1)/2 = O(n^2).
Example 3 (Hashing): A hash table of size 10 uses h(k) = k mod 10. How many collisions occur when inserting keys 23, 33, 43, 53?
All four keys hash to slot 3 (23 mod 10 = 3, 33 mod 10 = 3, etc.). First insertion has 0 collisions. Each subsequent insertion collides with existing elements. Total collisions = 0+1+2+3 = 6 (in chaining, collision count = number of existing elements in the chain when inserting).
Example 4 (Graph, BFS path): BFS on graph: A-B, A-C, B-D, C-D, D-E starting from A. What is the shortest path from A to E?
BFS from A: queue = [A]. Visit A, enqueue B, C. Queue=[B,C]. Visit B, enqueue D. Visit C, D already enqueued. Queue=[D]. Visit D, enqueue E. Queue=[E]. Visit E.
Shortest path: A → B → D → E (or A → C → D → E), length 3 edges.
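Example 4 as code: BFS with parent pointers recovers one shortest path (a sketch; the adjacency list encodes the same undirected graph):

```python
from collections import deque

def bfs_shortest_path(adj, source, target):
    # Parent pointers recorded at discovery time trace back a shortest path
    # (fewest edges) once the target is dequeued.
    parent = {source: None}
    q = deque([source])
    while q:
        u = q.popleft()
        if u == target:
            path = []
            while u is not None:
                path.append(u)
                u = parent[u]
            return path[::-1]
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                q.append(v)
    return None                    # target unreachable

graph = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"],
         "D": ["B", "C", "E"], "E": ["D"]}
print(bfs_shortest_path(graph, "A", "E"))  # ['A', 'B', 'D', 'E']
```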
Example 5 (DP, LCS): Find length of LCS of X = "ABCBDAB" and Y = "BDCABA".
Using the DP table (8x7, including the all-zero row and column), LCS length = 4. Both "BCBA" and "BDAB" are valid LCSs.
Quick-revision Summary
- Lower bound for comparison-based sorting: Omega(n log n)
- Merge sort: O(n log n) worst, stable, O(n) extra space
- Quick sort: O(n log n) average, O(n^2) worst (sorted input, bad pivot), in-place, not stable
- Heap sort: O(n log n) worst, in-place, not stable
- Master Theorem: compare f(n) with n^(log_b(a)) to determine the case
- Hashing with chaining: expected O(1 + alpha); open addressing: ~1/(1-alpha) probes for unsuccessful search
- Linear probing: primary clustering; double hashing: best open-addressing distribution
- Greedy activity selection: always pick the earliest finish time
- Huffman coding: greedy merging of lowest-frequency nodes, O(n log n)
- 0/1 Knapsack: O(nW) pseudopolynomial DP
- LCS != longest common substring (LCS allows gaps)
- Dijkstra: non-negative weights, O((V+E) log V); fails with negative weights
- Bellman-Ford: O(VE), handles negative weights, detects negative cycles
- Kruskal's: sort edges + Union-Find, O(E log E)
- Prim's: grow MST with a min-heap, O(E log V)
- MST unique iff all edge weights distinct
How to Study this Unit
- Master the Master Theorem before anything else. At least one recurrence question appears every year. Write out the three cases and practice identifying which case applies. For gap cases, default to the recursion tree method.
- For sorting, know the exact complexity table. GATE asks: "which of the following is true about quicksort's worst case?" or "which sorting algorithm is stable?" Memorise the table in this unit: best, average, worst, space, and stability for every algorithm listed.
- Graph algorithm questions give you a graph and ask you to trace. Always draw the graph, label edges with weights, and simulate the algorithm step by step. For Dijkstra, maintain a dist[] table and update it at each step. Do not try to reason about the output without tracing.
- DP questions in GATE follow a pattern. Read the problem, identify the subproblem definition, write the recurrence, and fill the table for a small example. The most common DP problems in GATE are: 0/1 Knapsack, LCS, matrix chain multiplication, and coin change. Practise the table-filling for each.
- Greedy: always justify your choice. GATE sometimes asks which greedy strategy is correct. An incorrect greedy (e.g., always picking the highest value in 0/1 knapsack) is a common wrong answer. Know the exchange argument for activity selection and Huffman coding.
- Connect Unit 4 and Unit 5 during revision. When you revise heaps (Unit 4), immediately solve heap sort complexity and Dijkstra's priority queue questions (Unit 5). When you revise BSTs (Unit 4), connect to binary search analysis (Unit 5). The units are not siloed; treating them as connected saves revision time.