Programming, Data Structures & Algorithms
Python programming, stacks, queues, trees, hash tables, search and sort, graphs.
Why this unit matters
GATE DA expects you to read and write Python, reason about time and space complexity, and trace algorithms step by step. Questions in this unit are usually concrete and deterministic: there is a right answer, and you can verify it by tracing the code. That makes this unit high yield if you practise it actively. It also underpins your ability to implement ML algorithms cleanly, which is tested in the machine learning unit.
Syllabus map
| Sub-topic | Key concepts |
|---|---|
| Python basics | Types, control flow, functions, comprehensions, scope |
| Stacks and queues | LIFO/FIFO, operations, complexity |
| Linked lists | Singly, doubly, insertion, deletion |
| Trees | Binary trees, BST, traversals |
| Hash tables | Hash functions, collision resolution |
| Searching | Linear search, binary search |
| Sorting | Selection, bubble, insertion, merge, quicksort |
| Graphs | Representation, BFS, DFS, Dijkstra |
Python programming fundamentals
Python is pass-by-object-reference. Mutable objects (lists, dicts) can be modified in place inside a function; immutable objects (int, str, tuple) cannot, so rebinding them inside a function does not affect the caller.
```python
# Mutable default argument trap
def append_to(element, to=[]):  # BAD: the default list is created only once
    to.append(element)
    return to

def append_to_safe(element, to=None):  # GOOD
    if to is None:
        to = []
    to.append(element)
    return to
```
List comprehensions are frequently tested for their output:
```python
result = [x**2 for x in range(5) if x % 2 == 0]
# result = [0, 4, 16]
```
Lambda and higher-order functions:
```python
nums = [3, 1, 4, 1, 5]
nums.sort(key=lambda x: -x)  # sorts descending
# nums = [5, 4, 3, 1, 1]
```
Trap. Python's floor division operator is //: the expression 7 // 2 gives 3, while 7 / 2 gives 3.5. In sorting and binary search problems, this distinction matters.
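A quick check of the distinction, including the midpoint computation used later in binary search:

```python
# Floor division returns an int; true division returns a float
print(7 // 2)  # 3
print(7 / 2)   # 3.5

# The midpoint in binary search must stay an integer index
lo, hi = 0, 9
mid = (lo + hi) // 2  # 4, usable as a list index
```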
Stacks and queues
Stack (LIFO, Last In, First Out). Push and pop from the same end.
```python
stack = []
stack.append(10)   # push
stack.append(20)
top = stack.pop()  # pop returns 20
```
Time complexity: push O(1), pop O(1), peek O(1).
Queue (FIFO, First In, First Out). Enqueue at rear, dequeue from front.
```python
from collections import deque

q = deque()
q.append(10)         # enqueue
q.append(20)
front = q.popleft()  # dequeue returns 10
```
Using deque gives O(1) at both ends. A plain list's pop(0) is O(n); this is a common trap question.
Application. Stacks are used in function call management, expression evaluation, and DFS (explicitly). Queues are used in BFS.
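As an illustration of stack-based expression checking, here is a minimal balanced-brackets checker (the function name is ours, not from the text):

```python
def is_balanced(expr):
    """Return True if every bracket in expr is properly matched."""
    pairs = {')': '(', ']': '[', '}': '{'}
    stack = []
    for ch in expr:
        if ch in '([{':
            stack.append(ch)  # push each opening bracket
        elif ch in pairs:
            # a closer must match the most recent opener
            if not stack or stack.pop() != pairs[ch]:
                return False
    return not stack  # every opener must have been closed

print(is_balanced("(a[b]{c})"))  # True
print(is_balanced("(]"))         # False
```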
Linked lists
A singly linked list node has data and a pointer to the next node. Traversal is O(n); random access does not exist.
```python
class Node:
    def __init__(self, data):
        self.data = data
        self.next = None

class LinkedList:
    def __init__(self):
        self.head = None

    def insert_at_front(self, data):
        new_node = Node(data)
        new_node.next = self.head
        self.head = new_node
```
Insertion at front: O(1). Insertion at end without tail pointer: O(n). Deletion requires tracking the previous node.
Doubly linked list adds a prev pointer, making backward traversal and deletion O(1) (once the node is found).
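A sketch of the doubly linked node and the O(1) unlinking of an already-located node (class and function names are illustrative, not from the text):

```python
class DNode:
    def __init__(self, data):
        self.data = data
        self.prev = None
        self.next = None

def unlink(node):
    """Remove a located node from its list in O(1) by rewiring neighbours."""
    if node.prev:
        node.prev.next = node.next
    if node.next:
        node.next.prev = node.prev
    node.prev = node.next = None
```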
Trees
A binary tree has at most two children per node. A binary search tree (BST) satisfies: all left descendants < node < all right descendants.
Tree traversals:
```python
def inorder(node):
    if node:
        inorder(node.left)
        print(node.data)
        inorder(node.right)

def preorder(node):
    if node:
        print(node.data)
        preorder(node.left)
        preorder(node.right)

def postorder(node):
    if node:
        postorder(node.left)
        postorder(node.right)
        print(node.data)
```
Inorder traversal of a BST gives elements in sorted order. This is a direct GATE question pattern.
Heights and node counts. A complete binary tree with n nodes has height floor(log2(n)). BST search, insert, delete: O(h) where h is height. Worst case (skewed tree): O(n). Best case (balanced): O(log n).
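A minimal recursive BST insert and search matching the O(h) bounds above (the TreeNode class and function names are ours, not from the text):

```python
class TreeNode:
    def __init__(self, data):
        self.data = data
        self.left = None
        self.right = None

def bst_insert(root, data):
    """Insert data and return the (possibly new) root. O(h) time."""
    if root is None:
        return TreeNode(data)
    if data < root.data:
        root.left = bst_insert(root.left, data)
    else:
        root.right = bst_insert(root.right, data)
    return root

def bst_search(root, target):
    """Return True if target is in the tree. O(h) time."""
    if root is None:
        return False
    if target == root.data:
        return True
    if target < root.data:
        return bst_search(root.left, target)
    return bst_search(root.right, target)
```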
Hash tables
A hash table maps keys to values via a hash function. Average-case insert, lookup, delete: O(1). Worst case (all keys collide): O(n).
Collision resolution strategies:
- Chaining: each slot holds a linked list of (key, value) pairs.
- Open addressing: probe for the next empty slot (linear probing, quadratic probing, double hashing).
Load factor = n / m, where n is the number of stored keys and m is the table size. A high load factor increases collision probability. Rehashing is done when the load factor exceeds a threshold (typically around 0.7 to 0.75).
Trap. Python's dict is a hash table. dict lookup is O(1) average; however, using a list for a "set membership" check is O(n) and a common performance mistake.
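A quick sketch of the membership-check difference (sizes chosen arbitrarily for illustration):

```python
items_list = list(range(100000))
items_set = set(items_list)

# List membership: O(n), scans elements one by one
print(99999 in items_list)  # True, but slow on large lists

# Set/dict membership: O(1) average, hashes the key directly
print(99999 in items_set)   # True, fast
```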
Searching
Linear search. Scan every element. O(n) time, O(1) space. Works on unsorted data.
Binary search. Works only on sorted data. Halve the search space each time.
```python
def binary_search(arr, target):
    lo, hi = 0, len(arr) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1
```
Time: O(log n). Space: O(1) iterative, O(log n) recursive.
Trap. The classic off-by-one: use lo <= hi (not lo < hi) and update lo = mid + 1, hi = mid - 1 (not mid) to avoid infinite loops.
Sorting algorithms
| Algorithm | Best | Average | Worst | Space | Stable? |
|---|---|---|---|---|---|
| Selection | O(n^2) | O(n^2) | O(n^2) | O(1) | No |
| Bubble | O(n) | O(n^2) | O(n^2) | O(1) | Yes |
| Insertion | O(n) | O(n^2) | O(n^2) | O(1) | Yes |
| Merge | O(n log n) | O(n log n) | O(n log n) | O(n) | Yes |
| Quicksort | O(n log n) | O(n log n) | O(n^2) | O(log n) | No |
Merge sort divides, sorts recursively, then merges. Guaranteed O(n log n), preferred when stability matters or worst-case guarantees are needed.
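A sketch of merge sort following that description (returns a new list rather than sorting in place):

```python
def merge_sort(arr):
    """Return a new sorted list. O(n log n) time, O(n) extra space, stable."""
    if len(arr) <= 1:
        return arr
    mid = len(arr) // 2
    left = merge_sort(arr[:mid])
    right = merge_sort(arr[mid:])
    # Merge the two sorted halves
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:  # <= keeps equal elements stable
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged

print(merge_sort([5, 2, 9, 1, 5, 6]))  # [1, 2, 5, 5, 6, 9]
```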
Quicksort picks a pivot, partitions around it. Average O(n log n) but degrades to O(n^2) with bad pivots (sorted input + first-element pivot). Randomised pivot or median-of-three avoids this in practice.
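A compact quicksort sketch with a randomised pivot; written with list comprehensions for readability rather than in place:

```python
import random

def quicksort(arr):
    """Average O(n log n); a random pivot guards against sorted input."""
    if len(arr) <= 1:
        return arr
    pivot = random.choice(arr)
    # Partition into three parts around the pivot
    less = [x for x in arr if x < pivot]
    equal = [x for x in arr if x == pivot]
    greater = [x for x in arr if x > pivot]
    return quicksort(less) + equal + quicksort(greater)

print(quicksort([3, 6, 1, 8, 2, 9]))  # [1, 2, 3, 6, 8, 9]
```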
Insertion sort is efficient for nearly sorted data (O(n) in the best case) and is used as a subroutine in hybrid sorts (like Timsort, which Python uses internally).
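Insertion sort in full, showing why nearly sorted input is O(n): the inner while loop barely runs when elements are already close to their final positions.

```python
def insertion_sort(arr):
    """In-place, stable. O(n) best case on nearly sorted data."""
    for i in range(1, len(arr)):
        key = arr[i]
        j = i - 1
        # Shift larger elements right until key's slot is found
        while j >= 0 and arr[j] > key:
            arr[j + 1] = arr[j]
            j -= 1
        arr[j + 1] = key
    return arr

print(insertion_sort([4, 3, 1, 2]))  # [1, 2, 3, 4]
```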
Graph theory and traversals
A graph G = (V, E). Represented as an adjacency list (space O(V + E)) or adjacency matrix (space O(V^2)).
BFS (Breadth-First Search): uses a queue, explores level by level. Finds shortest path in unweighted graphs.
```python
from collections import deque

def bfs(graph, start):
    """Return nodes in breadth-first visit order."""
    visited = set([start])
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbour in graph[node]:
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(neighbour)
    return order
```
DFS (Depth-First Search): uses a stack (or recursion), explores as deep as possible before backtracking. Used for cycle detection, topological sort, connected components.
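A recursive DFS sketch using the same adjacency-list graph format as the BFS above (note the None default, avoiding the mutable default argument trap):

```python
def dfs(graph, start, visited=None):
    """Return nodes in depth-first visit order (recursive)."""
    if visited is None:  # avoid the mutable default argument trap
        visited = []
    visited.append(start)
    for neighbour in graph[start]:
        if neighbour not in visited:
            dfs(graph, neighbour, visited)
    return visited

graph = {'A': ['B', 'C'], 'B': ['D'], 'C': ['D'], 'D': []}
print(dfs(graph, 'A'))  # ['A', 'B', 'D', 'C']
```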
Dijkstra's algorithm: shortest path in a weighted graph with non-negative edge weights. Uses a min-heap (priority queue). Time: O((V + E) log V).
Trap. Dijkstra does not work with negative edge weights. For negative weights, use Bellman-Ford (O(VE)).
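A minimal Dijkstra sketch using heapq as the priority queue; the graph format (dict of node to list of (neighbour, weight) pairs) is our assumption, not from the text:

```python
import heapq

def dijkstra(graph, source):
    """Shortest distances from source; requires non-negative weights."""
    dist = {node: float('inf') for node in graph}
    dist[source] = 0
    heap = [(0, source)]  # (distance, node) pairs
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry, skip it
        for neighbour, weight in graph[node]:
            nd = d + weight
            if nd < dist[neighbour]:
                dist[neighbour] = nd
                heapq.heappush(heap, (nd, neighbour))
    return dist

g = {'A': [('B', 1), ('C', 4)], 'B': [('C', 2)], 'C': []}
print(dijkstra(g, 'A'))  # {'A': 0, 'B': 1, 'C': 3}
```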
Worked examples
Example 1. What is the output of the following Python code?
```python
x = [1, 2, 3]
y = x
y.append(4)
print(x)
```
Output: [1, 2, 3, 4]. Both x and y reference the same list object.
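To get an independent copy instead of an alias:

```python
x = [1, 2, 3]
y = x.copy()   # or list(x), or x[:]
y.append(4)
print(x)       # [1, 2, 3], x is unaffected
print(y)       # [1, 2, 3, 4]
```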
Example 2. Insert keys 10, 20, 5, 15 into a BST (initially empty). What is the inorder traversal?
Insert 10 (root), 20 (right of 10), 5 (left of 10), 15 (left of 20). Inorder: 5, 10, 15, 20.
Example 3. How many comparisons does merge sort make in the worst case for n = 8?
Merge sort makes O(n log n) comparisons. The estimate n log2(n) = 8 * 3 = 24 is an upper bound; the exact worst-case count, n * ceil(log2 n) - 2^ceil(log2 n) + 1 = 24 - 8 + 1, is 17 for 8 elements.
Example 4. A hash table has 7 slots (indices 0 to 6) and uses h(k) = k mod 7 with linear probing. Insert 10, 20, 30. At which indices do they land?
h(10) = 3, h(20) = 6, h(30) = 2. No collisions; 10 at index 3, 20 at index 6, 30 at index 2.
Example 5. BFS from node A in the graph A-B, A-C, B-D, C-D. What is the BFS order?
Queue starts: [A]. Visit A, enqueue B, C. Queue: [B, C]. Visit B, enqueue D. Queue: [C, D]. Visit C (D already queued). Queue: [D]. Visit D. Order: A, B, C, D.
Quick-revision summary
- Stack: LIFO, O(1) push/pop. Queue: FIFO, use deque for O(1) at both ends.
- BST inorder traversal = sorted order.
- Merge sort: O(n log n) always, O(n) space, stable.
- Quicksort: O(n log n) average, O(n^2) worst (sorted input + bad pivot), in-place, not stable.
- Binary search: O(log n), only on sorted data.
- Hash table: O(1) average lookup; load factor drives rehashing.
- BFS uses a queue, finds shortest paths in unweighted graphs.
- DFS uses a stack/recursion, used for cycle detection and topological sort.
- Dijkstra: O((V+E) log V), requires non-negative weights.
How to study this unit
- Write Python code daily (days 1 and 2): practise list comprehensions, lambda, mutable vs. immutable traps, and common built-ins (sorted, zip, enumerate, defaultdict).
- Implement stack, queue, and linked list from scratch (day 3). Do not rely on library imports during practice; implement the logic yourself.
- Trace BST operations manually: insert a sequence of keys, then write down inorder/preorder/postorder results (day 4).
- Memorise the sorting complexity table. Then implement merge sort and binary search by hand (day 5).
- Practise BFS and DFS on small hand-drawn graphs (day 6). Trace each step with the queue/stack state.
- Solve 10 to 15 previous GATE CS questions on data structures and algorithms; many overlap with GATE DA.