Programming & Data Structures

Why this unit matters

Programming and Data Structures is the backbone of the GATE CS paper. It is tightly coupled with Unit 5 (Algorithms) and together they contribute the largest chunk of marks, often 20+ marks combined. Unit 4 specifically tests your ability to reason about code behaviour in C, trace recursive calls, and understand how data structures are organised in memory.

GATE questions here are often deceptively short: a 10-line C snippet or a tree with 7 nodes. The traps are in pointer arithmetic, undefined behaviour in C, and off-by-one errors in tree/heap properties. Data structure questions frequently ask about height, number of nodes, insertion order, or traversal output.

Syllabus map

Programming in C
- Data types, operators, expressions
- Control flow: if-else, switch, loops, break/continue
- Functions, parameter passing (call by value)
- Pointers and pointer arithmetic
- Arrays and strings
- Structures and unions
- Dynamic memory allocation (malloc, calloc, free)
- Scope, linkage, storage classes
Recursion
- Base cases and recursive cases
- Recursion tree, call stack
- Tail recursion vs non-tail recursion
- Mutual recursion
Arrays
- 1D and 2D array storage (row-major vs column-major)
- Address calculation for multi-dimensional arrays
Stacks
- LIFO behaviour
- Applications: expression evaluation, function call stack, backtracking
Queues
- FIFO behaviour
- Circular queue, deque
- Applications: BFS, scheduling
Linked Lists
- Singly, doubly, and circular linked lists
- Insertion, deletion, reversal
- Floyd's cycle detection
Trees
- Binary trees: height, diameter, traversals (in-order, pre-order, post-order, level-order)
- Relationship between n, height, and number of leaf nodes
- Binary Search Trees (BST): search, insert, delete
- AVL trees: rotations, height balance factor
- Full, complete, and perfect binary trees
Binary Heaps
- Max-heap and min-heap property
- Heapify, insert, extract-max/min
- Heap sort
Graphs
- Representations: adjacency matrix, adjacency list
- Directed vs undirected, weighted vs unweighted
- (Traversals are covered in Unit 5)

Programming in C

Pointers and Pointer Arithmetic

A pointer stores a memory address. Adding 1 to a pointer of type T advances it by sizeof(T) bytes, not by 1 byte.

int a[] = {10, 20, 30, 40};
int *p = a;        // p points to a[0]
p++;               // p now points to a[1], not a[0]+1 in bytes
printf("%d", *p);  // prints 20

Typical GATE trap: *p++ increments the pointer (after dereferencing), while (*p)++ increments the value. Operator precedence: ++ and * have the same precedence and are right-associative, so *p++ is parsed as *(p++).

Parameter Passing in C

C uses call by value exclusively. Even when you pass a pointer, you are passing the value of the pointer (the address). You can modify what the pointer points to, but you cannot change where the original pointer variable points unless you pass a pointer to a pointer.

void swap(int *a, int *b) {
    int temp = *a;
    *a = *b;
    *b = temp;
}
// This correctly swaps because we dereference the pointers.

void bad_swap(int a, int b) {
    int temp = a; a = b; b = temp;
    // This does NOT affect the caller's variables.
}

Structures and Unions

A structure allocates memory for all members separately. A union allocates memory equal to its largest member, all members share the same memory location.

union U {
    int i;     // 4 bytes
    char c;    // 1 byte
};
// sizeof(U) = 4 (size of largest member)

Typical GATE trap: Writing to one union member and reading from another is implementation-defined behaviour in C. GATE sometimes asks what value is read, the answer depends on endianness and byte layout.

Worked Example, C Output Prediction

#include <stdio.h>
int f(int n) {
    if (n <= 1) return 1;
    return n * f(n - 1);
}
int main() {
    int x = 4;
    printf("%d", f(x));
    return 0;
}

Trace: f(4) = 4 * f(3) = 4 * 3 * f(2) = 4 * 3 * 2 * f(1) = 4 * 3 * 2 * 1 = 24.

Output: 24.

Recursion

Recursion Tree and Stack Depth

Each recursive call adds a frame to the call stack. Maximum stack depth = maximum recursion depth. For a function that recurses k times before reaching the base case, space usage is O(k).

Tail recursion: The recursive call is the very last operation. A tail-recursive function can be optimised by a compiler into a loop (no new stack frame needed). Non-tail recursion cannot be optimised this way.

// Non-tail recursive factorial
int fact(int n) {
    if (n == 0) return 1;
    return n * fact(n-1);  // multiplication happens AFTER recursion returns
}

// Tail recursive factorial
int fact_tail(int n, int acc) {
    if (n == 0) return acc;
    return fact_tail(n-1, n * acc);  // recursive call is the LAST thing
}

Worked example, Recurrence from code:

int T(int n) {
    if (n == 1) return 1;
    return T(n/2) + T(n/2) + n;
}

Recurrence: T(n) = 2T(n/2) + n. By Master Theorem (case 2): T(n) = O(n log n).

Arrays

2D Array Address Calculation

For a 2D array A[R][C] stored in row-major order, element A[i][j] is at:

base_address + (i * C + j) * element_size

For column-major (used in Fortran, MATLAB):

base_address + (j * R + i) * element_size

Worked example: int A[5][4], base address = 1000, element size = 4 bytes, row-major. Where is A[3][2]?

Address = 1000 + (3*4 + 2) * 4 = 1000 + 14 * 4 = 1000 + 56 = 1056.

Stacks and Queues

Stack Applications

Expression evaluation: infix to postfix conversion uses a stack. Evaluating postfix also uses a stack.
Function calls: the call stack stores return addresses and local variables.
Backtracking algorithms: DFS uses a stack (explicitly or via recursion).

Infix to postfix algorithm: Scan left to right. Operand → output. Operator → push to stack, popping operators of greater or equal precedence first. Left parenthesis → push. Right parenthesis → pop and output until matching left paren.

Circular Queue

A circular queue of size n can hold at most n-1 elements (one slot wasted to distinguish full from empty) unless a separate count variable is used. Front and rear wrap around modulo n.

Linked Lists

Reversal of Singly Linked List

Node* reverse(Node* head) {
    Node *prev = NULL, *curr = head, *next = NULL;
    while (curr != NULL) {
        next = curr->next;  // save next
        curr->next = prev;  // reverse link
        prev = curr;        // advance prev
        curr = next;        // advance curr
    }
    return prev;  // new head
}

Time: O(n), Space: O(1).

Floyd's Cycle Detection

Use two pointers: slow moves one step at a time, fast moves two steps. If there is a cycle, they will meet inside the cycle. If fast reaches NULL, there is no cycle.

Typical GATE trap: Floyd's algorithm detects if a cycle exists. Finding the start of the cycle requires a second phase: after meeting point, reset one pointer to head, advance both one step at a time, they meet at the cycle start.

Trees

Binary Tree Properties

A binary tree of height h has at most 2^(h+1), 1 nodes (where height of a single node = 0)., A binary tree with n nodes has height at least floor(log2(n))., A full binary tree has every node with 0 or 2 children. If it has n internal nodes, it has n+1 leaves., A complete binary tree has all levels filled except possibly the last, which is filled left to right., A perfect binary tree is both full and complete: all leaves at the same level.

Traversals:

In-order (left-root-right): gives sorted order for BST.
Pre-order (root-left-right): used for copying/serialising a tree.
Post-order (left-right-root): used for deleting a tree, evaluating expression trees.
Level-order: BFS traversal level by level.

Worked example: For the BST with insertions 5, 3, 7, 1, 4, what is the in-order traversal?

Build the BST:

In-order: 1, 3, 4, 5, 7 (sorted ascending, always true for BST in-order).

BST Operations

Search: O(h) where h = height. Best case O(log n) for balanced BST; worst case O(n) for skewed BST.
Insert: like search, then add node at the appropriate leaf position.
Delete: three cases, (1) leaf: just remove; (2) one child: replace with child; (3) two children: replace with in-order successor (smallest in right subtree) or in-order predecessor.

Typical GATE trap: After deleting a node with two children using in-order successor, the successor itself must also be deleted from its original position (it has at most one child, no left child, so that deletion is a simpler case).

AVL Trees

An AVL tree is a BST where for every node, the height difference between left and right subtrees (balance factor) is at most 1.

After insert/delete, rotations restore balance:

Single right rotation (LL case): left child's left subtree too tall.
Single left rotation (RR case): right child's right subtree too tall.
Left-right rotation (LR case): left child's right subtree too tall.
Right-left rotation (RL case): right child's left subtree too tall.

Height of an AVL tree with n nodes: O(log n).

Binary Heaps

Heap Property

A max-heap satisfies: parent >= all children for every node. A min-heap satisfies: parent <= all children.

A binary heap is implemented as a complete binary tree, typically stored in an array. For a node at index i (1-based):

Left child: 2i, Right child: 2i + 1, Parent: floor(i/2)

Heapify and Build-Heap

max_heapify(A, i) fixes a single violation at node i assuming both subtrees are already heaps. Time: O(log n).

build_max_heap calls max_heapify on all non-leaf nodes from bottom up:

void build_max_heap(int A[], int n) {
    for (int i = n/2; i >= 1; i--)
        max_heapify(A, i, n);
}

Time: O(n), not O(n log n) as intuition might suggest. The O(n) analysis uses the fact that most heapify calls operate on small subtrees near the leaves.

Worked example, Heap insertion: Insert 50 into the max-heap [100, 80, 60, 40, 30].

Array: [100, 80, 60, 40, 30, 50]. New node 50 at index 6. Parent = floor(6/2) = 3, value = 60. 50 < 60, so heap property satisfied. No swap needed.

Now insert 90: Array = [100, 80, 60, 40, 30, 50, 90]. Node at index 7, parent at 3 = 60. 90 > 60, swap: [100, 80, 90, 40, 30, 50, 60]. Now node at index 3, parent at 1 = 100. 90 < 100, stop. Heap property satisfied.

Heap Sort

Build a max-heap, then repeatedly extract the maximum (swap root with last element, reduce heap size, heapify root). Time: O(n log n), in-place, but not stable.

Typical GATE trap: Heap sort is NOT stable. Two equal elements may appear in different order in the output.

Graphs

Representations

Adjacency matrix: n x n matrix. A[i][j] = 1 if edge (i,j) exists. Space: O(n^2). Edge lookup: O(1). Neighbour enumeration: O(n).

Adjacency list: Array of n lists. Space: O(n + m) where m = number of edges. Neighbour enumeration: O(degree). Preferred for sparse graphs.

Typical GATE trap: For dense graphs (m close to n^2), adjacency matrix has no storage disadvantage. For sparse graphs, adjacency list is significantly more space-efficient.

Worked Examples

Example 1 (C, pointer arithmetic):

int a[] = {1, 2, 3, 4, 5};
int *p = a + 2;
printf("%d %d", *p, *(p-1));

p = a + 2 points to a[2] = 3. *p = 3. *(p-1) = a[1] = 2. Output: 3 2.

Example 2 (BST): Insert keys in order: 10, 5, 15, 3, 7 into an initially empty BST. What is the height?

Height = 2 (root at 0, deepest nodes 3 and 7 at level 2).

Example 3 (Heap): Given the array [16, 14, 10, 8, 7, 9, 3, 2, 4, 1] as a max-heap. Extract max and show resulting heap.

Remove 16 (root), place last element (1) at root: [1, 14, 10, 8, 7, 9, 3, 2, 4]. Heapify: 1 vs children 14 and 10 → swap with 14: [14, 1, 10, 8, 7, 9, 3, 2, 4]. Then 1 vs 8 and 7 → swap with 8: [14, 8, 10, 1, 7, 9, 3, 2, 4]. Then 1 vs 2 and 4 → swap with 4: [14, 8, 10, 4, 7, 9, 3, 2, 1]. Heap property restored.

Example 4 (Recursion): How many times is f(3) called when computing f(5) using the naive Fibonacci: f(n) = f(n-1) + f(n-2)?

Call tree: f(5) calls f(4) and f(3). f(4) calls f(3) and f(2). So f(3) is called 2 times.

Example 5 (Linked List): A singly linked list has nodes 1→2→3→4→5. After one call to reverse, what is the list?

After reversal: 5→4→3→2→1. Return value is a pointer to node 5 (new head).

Quick-revision Summary

C is call-by-value; passing a pointer lets you modify the pointed-to value
*p++ = dereference then increment pointer; (*p)++ = dereference and increment value, Union size = size of largest member; all members share the same memory, Tail recursion can be compiled to iteration; non-tail cannot, 2D row-major address: base + (i*C + j) * size, Full binary tree with n internal nodes has n+1 leaves, In-order traversal of BST gives sorted output, BST search/insert/delete: O(h); AVL tree h = O(log n), AVL rotations: LL→right rotation, RR→left rotation, LR→left-right, RL→right-left, Max-heap parent index i: children at 2i and 2i+1 (1-based array), build_max_heap is O(n), not O(n log n), Heap sort: O(n log n), in-place, not stable, Adjacency matrix: O(n^2) space, O(1) edge lookup, Adjacency list: O(n+m) space, O(degree) neighbour enumeration, Floyd's cycle detection: slow+fast pointers; they meet if and only if cycle exists

How to Study this Unit

Write C code by hand, without an IDE. GATE does not give you a compiler. Practise tracing pointer operations, recursion, and output prediction on paper. The most common mistake is computing *p++ incorrectly.
For trees, always draw the structure. Even for a 5-node tree, drawing it removes ambiguity. BST insertion questions are free marks if you draw the tree first.
Heap operations: practise on paper arrays. Indexing mistakes (0-based vs 1-based) are frequent. Decide on one convention and stick with it. GATE uses 1-based for heaps in most solutions.
Recursion: write the recurrence first, then trace. For any recursive function asked in GATE, write T(n) = ... before tracing. This connects Unit 4 and Unit 5, knowing the recurrence tells you the time complexity without full tracing.
Do PYQs grouped by data structure. Solve all GATE BST questions together (2010-2024), then all heap questions, then all linked list questions. Patterns become obvious. BST questions repeat: height after insertion sequence, in-order traversal, delete and re-balance.
Linked list pointer manipulation is a common 2-marker. Practise reversal, finding the middle node (slow/fast pointers), and cycle detection by writing the code and tracing pointer assignments step by step.