Artificial Intelligence
Search (informed, uninformed, adversarial), logic, reasoning under uncertainty, inference.
Why this unit matters
Artificial intelligence in GATE DA focuses on three well-defined areas: search, logic, and probabilistic reasoning. These are foundational to understanding how AI agents make decisions, represent knowledge, and reason under uncertainty. The probabilistic reasoning sub-section (Bayesian networks) connects directly to the probability unit and is increasingly important for modern ML. Search algorithms are highly tractable exam topics: you can trace them step by step and get exactly the right answer. This unit is manageable in 10–12 days if you are systematic.
Syllabus map
| Sub-topic | Key concepts |
|---|---|
| Uninformed search | BFS, DFS, IDDFS, UCS |
| Informed search | Greedy best-first, A*, admissible heuristics |
| Adversarial search | Minimax, alpha-beta pruning |
| Propositional logic | Syntax, semantics, inference rules |
| Predicate logic | FOL, quantifiers, unification |
| Conditional independence | D-separation, Markov blanket |
| Exact inference | Variable elimination, factor graphs |
| Approximate inference | Sampling, MCMC, importance sampling |
Uninformed search
Uninformed (blind) search explores the state space without domain-specific knowledge about goal proximity.
BFS (Breadth-First Search).
- Uses a FIFO queue.
- Explores all nodes at depth d before depth d+1.
- Complete: yes (finds a solution if one exists).
- Optimal: yes, for unit step costs.
- Time and space: O(b^d), where b = branching factor and d = depth of the shallowest solution.
- Space is the main weakness: all frontier nodes must be kept in memory.
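The FIFO-frontier mechanics can be sketched in a few lines of Python; the adjacency-dict graph below is illustrative, not from the syllabus:

```python
from collections import deque

def bfs(graph, start, goal):
    """Return a shortest path (by number of edges) from start to goal, or None."""
    frontier = deque([[start]])          # FIFO queue of partial paths
    visited = {start}
    while frontier:
        path = frontier.popleft()        # oldest path first: breadth-first order
        node = path[-1]
        if node == goal:
            return path
        for nbr in graph.get(node, []):
            if nbr not in visited:       # mark on generation to avoid duplicates
                visited.add(nbr)
                frontier.append(path + [nbr])
    return None

# toy graph (illustrative)
graph = {'S': ['A', 'B'], 'A': ['G'], 'B': ['A', 'G']}
print(bfs(graph, 'S', 'G'))              # a path with the fewest edges
```

Storing whole paths in the queue makes the O(b^d) space cost visible: the frontier itself is what dominates memory.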
DFS (Depth-First Search).
- Uses a LIFO stack.
- Space: O(b*m), where m = maximum depth.
- Not complete for infinite state spaces (may loop). Complete for finite, acyclic spaces.
- Not optimal.
IDDFS (Iterative Deepening DFS).
- Runs DFS with increasing depth limits (0, 1, 2, ...).
- Combines the space efficiency of DFS (O(b*d)) with the completeness and optimality of BFS.
- Preferred when the search depth is unknown.
UCS (Uniform Cost Search).
- Expands the node with the lowest path cost.
- Uses a min-priority queue.
- Optimal for any non-negative step costs.
- Equivalent to Dijkstra's algorithm on a search graph.
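UCS is essentially BFS with the FIFO queue swapped for a min-priority queue keyed on path cost. A minimal sketch, with an illustrative weighted graph:

```python
import heapq

def ucs(graph, start, goal):
    """Uniform cost search: graph maps node -> list of (neighbor, step_cost)."""
    frontier = [(0, start, [start])]     # min-heap ordered by path cost g(n)
    best = {start: 0}                    # cheapest known cost to each node
    while frontier:
        cost, node, path = heapq.heappop(frontier)
        if node == goal:                 # goal test on expansion => optimal
            return cost, path
        for nbr, step in graph.get(node, []):
            new_cost = cost + step
            if new_cost < best.get(nbr, float('inf')):
                best[nbr] = new_cost
                heapq.heappush(frontier, (new_cost, nbr, path + [nbr]))
    return None

# toy weighted graph (illustrative): cheapest S->G path is S-A-B-G, cost 4
graph = {'S': [('A', 1), ('B', 4)], 'A': [('B', 2), ('G', 6)], 'B': [('G', 1)]}
print(ucs(graph, 'S', 'G'))
```

Note the goal test happens when a node is popped, not when it is generated; testing on generation would break optimality for non-uniform costs.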
Trap. DFS is not optimal. BFS is optimal only for uniform step costs. For varying costs, use UCS.
Informed search
Informed search uses a heuristic function h(n) estimating the cost from node n to the goal.
Greedy best-first search: expands the node with the smallest h(n). Fast but not optimal and not complete.
A* search: expands the node with the smallest f(n) = g(n) + h(n), where g(n) = cost from start to n. Complete and optimal if h is admissible.
- Admissible heuristic: never overestimates the true cost to the goal. h(n) <= h*(n) for all n.
- Consistent heuristic: h(n) <= c(n, n') + h(n') for every successor n'. Consistency implies admissibility.
Comparing heuristics. If h1(n) >= h2(n) for all n and both are admissible, h1 dominates h2. A* with h1 expands fewer nodes.
Trap. A* is optimal if the heuristic is admissible. If it overestimates, you may miss the optimal path. Always verify admissibility before claiming a heuristic is suitable for A*.
Example. In the 8-puzzle, the number of misplaced tiles is admissible (each tile needs at least 1 move). The Manhattan distance (sum of horizontal and vertical distances) is also admissible and dominates the first heuristic.
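A minimal A* sketch: UCS's priority becomes f(n) = g(n) + h(n). The graph and heuristic table below are illustrative, with h values chosen to be admissible (each is at most the true remaining cost):

```python
import heapq

def astar(graph, h, start, goal):
    """A*: graph maps node -> list of (neighbor, cost); h maps node -> heuristic."""
    frontier = [(h[start], 0, start, [start])]   # (f, g, node, path)
    best_g = {start: 0}
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return g, path
        for nbr, step in graph.get(node, []):
            g2 = g + step
            if g2 < best_g.get(nbr, float('inf')):
                best_g[nbr] = g2
                heapq.heappush(frontier, (g2 + h[nbr], g2, nbr, path + [nbr]))
    return None

# toy instance (illustrative); h never overestimates, so A* stays optimal
graph = {'S': [('A', 1), ('B', 2)], 'A': [('G', 4)], 'B': [('G', 2)]}
h = {'S': 3, 'A': 4, 'B': 2, 'G': 0}
print(astar(graph, h, 'S', 'G'))
```

With h = 0 everywhere this reduces to the UCS sketch, which is exactly the degenerate case discussed in Example 3 below.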
Adversarial search
Adversarial search models games between two players (MAX and MIN) who alternate moves.
Minimax algorithm. MAX chooses the move that maximises the utility; MIN chooses the move that minimises it. The minimax value is the utility that MAX can guarantee against an optimal MIN.
minimax(node, is_max):
if terminal(node): return utility(node)
if is_max:
return max over children c of minimax(c, False)
else:
return min over children c of minimax(c, True)
Time: O(b^m). Space: O(b*m).
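The pseudocode above translates directly to Python; here the game tree is encoded as nested lists with integer utilities at the leaves (an assumed encoding, for illustration):

```python
def minimax(node, is_max):
    """node is either a terminal utility (int) or a list of child nodes."""
    if not isinstance(node, list):                   # terminal: return utility
        return node
    values = [minimax(c, not is_max) for c in node]  # evaluate all children
    return max(values) if is_max else min(values)

# MAX root with two MIN children holding leaf utilities {3, 5} and {2, 9}
tree = [[3, 5], [2, 9]]
print(minimax(tree, True))   # 3: MAX picks the left branch
```

This is the same tree used in worked Example 2 below: the MIN nodes evaluate to 3 and 2, and MAX takes the maximum, 3.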
Alpha-beta pruning. Prunes branches that cannot affect the final decision.
- alpha: best value MAX can guarantee so far.
- beta: best value MIN can guarantee so far.
- Prune when alpha >= beta.
- Best case (perfect move ordering): reduces time to O(b^(m/2)), effectively doubling the search depth for the same computation.
Trap. Alpha-beta pruning does not change the result of minimax; it only eliminates branches that would not be chosen. The final answer is identical to minimax.
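Alpha-beta can be sketched by threading alpha and beta through a recursive minimax. The nested-list tree encoding (leaves are integer utilities) is assumed for illustration:

```python
import math

def alphabeta(node, is_max, alpha=-math.inf, beta=math.inf):
    """Minimax value of node; prunes branches that cannot change the result."""
    if not isinstance(node, list):
        return node
    if is_max:
        value = -math.inf
        for c in node:
            value = max(value, alphabeta(c, False, alpha, beta))
            alpha = max(alpha, value)
            if alpha >= beta:            # MIN above will never allow this branch
                break
        return value
    value = math.inf
    for c in node:
        value = min(value, alphabeta(c, True, alpha, beta))
        beta = min(beta, value)
        if alpha >= beta:                # MAX above will never choose this branch
            break
    return value

tree = [[3, 5], [2, 9]]
print(alphabeta(tree, True))  # 3, identical to plain minimax
```

On this tree the leaf 9 is never visited: once the right MIN node sees 2, alpha (3) >= beta (2) and the rest of that branch is pruned.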
Propositional logic
Propositional logic deals with statements that are either true or false.
Connectives: NOT (¬), AND (∧), OR (∨), implies (→), biconditional (↔).
Implication: A → B is false only when A is true and B is false. Equivalently: A → B = ¬A ∨ B.
Inference rules:
- Modus Ponens: A, A → B ⊢ B.
- Modus Tollens: ¬B, A → B ⊢ ¬A.
- Resolution: A ∨ B, ¬B ∨ C ⊢ A ∨ C.
CNF (Conjunctive Normal Form): a conjunction of clauses, each clause a disjunction of literals. Resolution works on CNF. Every propositional formula can be converted to CNF.
Validity, satisfiability, unsatisfiability.
- Valid (tautology): true under all truth assignments. Example: A ∨ ¬A.
- Satisfiable: true under at least one assignment.
- Unsatisfiable (contradiction): false under all assignments. Example: A ∧ ¬A.
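For small formulas, these three classes can be checked by brute-force truth-table enumeration. A sketch in which a formula is represented as a Python function of booleans (an assumed encoding):

```python
from itertools import product

def classify(formula, n_vars):
    """Classify a propositional formula by enumerating all 2^n assignments."""
    results = [formula(*vals) for vals in product([False, True], repeat=n_vars)]
    if all(results):
        return 'valid'          # tautology: true everywhere
    if any(results):
        return 'satisfiable'    # true somewhere, false somewhere
    return 'unsatisfiable'      # contradiction: false everywhere

print(classify(lambda a: a or not a, 1))         # valid
print(classify(lambda a, b: (not a) or b, 2))    # A -> B: satisfiable only
print(classify(lambda a: a and not a, 1))        # unsatisfiable
```

The middle case also illustrates the equivalence A → B = ¬A ∨ B from above: the implication is encoded as `(not a) or b`.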
Predicate logic (First-Order Logic)
FOL extends propositional logic with quantifiers and predicates.
Quantifiers:
- Universal: ∀x P(x), meaning P holds for all x.
- Existential: ∃x P(x), meaning P holds for some x.
Negation rules:
- ¬∀x P(x) ≡ ∃x ¬P(x)
- ¬∃x P(x) ≡ ∀x ¬P(x)
Unification: finds a substitution that makes two expressions identical. Used in resolution-based theorem proving.
Example: P(x, f(y)) and P(a, f(b)) unify with substitution {x/a, y/b}.
Trap. FOL is semi-decidable: a valid sentence can always be proven, but an invalid sentence may cause the proof procedure to loop forever. Propositional logic is decidable.
Probabilistic reasoning: conditional independence
A Bayesian network is a directed acyclic graph (DAG) where:
- Each node is a random variable.
- Each edge represents a direct dependency.
- Each node has a conditional probability table (CPT) given its parents.
Conditional independence. X and Y are conditionally independent given Z if:
P(X, Y | Z) = P(X | Z) * P(Y | Z)
In a Bayesian network, d-separation can be used to read off conditional independencies from the graph structure.
Three patterns of d-separation:
- Chain: X → Z → Y. X and Y are d-separated given Z (independent given Z).
- Fork: X ← Z → Y. X and Y are d-separated given Z.
- Collider: X → Z ← Y. X and Y are d-separated if Z (and its descendants) are not observed. Observing Z makes X and Y dependent (explaining away effect).
Markov blanket of a node X: its parents, children, and other parents of its children. Given its Markov blanket, X is conditionally independent of all other variables in the network.
Exact inference: variable elimination
Variable elimination computes marginal or conditional probabilities by summing out (marginalising over) irrelevant variables one at a time.
Steps:
- Choose an ordering of the variables to eliminate (those not in the query or evidence).
- For each variable, collect all factors involving it, multiply them, and sum out that variable.
- Repeat until only the query variable(s) remain.
Complexity. Exponential in the treewidth of the graph. For chains or trees, variable elimination is efficient. For dense graphs (many inter-connections), exact inference is intractable.
Example. Joint: P(A, B, C) = P(A) * P(B | A) * P(C | B). Query: P(A | C = true).
P(A | C=t) proportional to sum over B: P(A) * P(B|A) * P(C=t|B).
Step 1: Eliminate B. Factor1 = P(B|A) * P(C=t|B), summed over B -> new factor f(A). Step 2: Multiply by P(A) -> P(A) * f(A). Normalise.
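The two steps can be traced numerically. The CPT values below are assumed purely for illustration (binary variables; each number is the probability of value 1):

```python
# Chain A -> B -> C with illustrative CPTs (values assumed for this sketch)
pA = {1: 0.3, 0: 0.7}                    # P(A)
pB_A = {(1, 1): 0.8, (1, 0): 0.4}        # P(B=1 | A=a), keyed by (b=1, a)
pC_B = {1: 0.9, 0: 0.2}                  # P(C=1 | B=b)

# Step 1: eliminate B -> f(A) = sum_B P(B|A) * P(C=1|B)
f = {}
for a in (0, 1):
    pb1 = pB_A[(1, a)]
    f[a] = pb1 * pC_B[1] + (1 - pb1) * pC_B[0]

# Step 2: multiply by P(A), then normalise over A
unnorm = {a: pA[a] * f[a] for a in (0, 1)}
z = sum(unnorm.values())
posterior = {a: unnorm[a] / z for a in (0, 1)}
print(posterior[1])                      # P(A=1 | C=1), roughly 0.40 here
```

Only a table over A survives the elimination of B, which is why chains stay cheap: no intermediate factor ever involves more than two variables.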
Approximate inference: sampling
When exact inference is intractable, sampling methods approximate the distribution.
Direct sampling. Sample each variable in topological order using its CPT. Use samples to estimate marginals. Efficient but cannot handle evidence directly.
Rejection sampling. Generate samples and reject those inconsistent with evidence. Simple but wasteful when evidence has low probability.
Likelihood weighting. Set evidence variables to their observed values (no rejection). Weight each sample by the product of the conditional probabilities of the evidence. More efficient than rejection sampling.
MCMC (Markov Chain Monte Carlo) / Gibbs sampling. Sample each variable conditioned on the current values of all others (from its Markov blanket). Asymptotically correct. Useful for high-dimensional distributions.
Trap. Rejection sampling becomes exponentially slow as the number of evidence variables grows. Likelihood weighting avoids this but can have high variance if the evidence is unlikely. Gibbs sampling is generally preferred.
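A likelihood-weighting sketch for the chain A -> B -> C of the variable-elimination example, with assumed CPT values: the evidence C = 1 is clamped rather than sampled, and each sample carries weight P(C=1 | B):

```python
import random

# Illustrative CPTs (assumed): P(A=1), P(B=1|A=a), P(C=1|B=b)
pA1 = 0.3
pB1_A = {1: 0.8, 0: 0.4}
pC1_B = {1: 0.9, 0: 0.2}

def estimate_p_a_given_c(n_samples, seed=0):
    """Estimate P(A=1 | C=1) by likelihood weighting."""
    rng = random.Random(seed)
    num = den = 0.0
    for _ in range(n_samples):
        a = int(rng.random() < pA1)          # sample A from its prior
        b = int(rng.random() < pB1_A[a])     # sample B given A
        w = pC1_B[b]                         # weight: P(C=1 | B=b); C is clamped
        num += w * a
        den += w
    return num / den

print(estimate_p_a_given_c(100_000))  # should approach the exact value ~0.404
```

No sample is ever rejected; unlikely evidence simply shows up as small weights, which is where the high-variance caveat in the trap above comes from.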
Worked examples
Example 1. BFS vs. DFS on a graph with 4 levels, branching factor 3. What is the maximum number of nodes stored in memory by each?
BFS (at the deepest frontier): 3^4 = 81 nodes. DFS: at most b*m = 3*4 = 12 nodes. DFS wins dramatically on memory.
Example 2. Minimax trace. Root is MAX. Children of root have MIN nodes with values {3, 5} (left subtree) and {2, 9} (right subtree). What does MAX choose?
Left MIN node: min(3, 5) = 3. Right MIN node: min(2, 9) = 2. MAX node: max(3, 2) = 3. MAX chooses the left branch.
Example 3. Is h(n) = 0 for all n an admissible heuristic for A*?
Yes. h(n) = 0 never overestimates, since 0 <= true cost for every n. A* with h = 0 degenerates to UCS. It is admissible but useless: it provides no guidance.
Example 4. P(A) = 0.3, P(B|A) = 0.7, P(B|¬A) = 0.2. Find P(A | B).
P(B) = P(B|A)P(A) + P(B|¬A)P(¬A) = 0.7*0.3 + 0.2*0.7 = 0.21 + 0.14 = 0.35. P(A|B) = P(B|A)*P(A)/P(B) = 0.21/0.35 = 0.6.
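The arithmetic can be checked in a couple of lines of Python:

```python
pA, pB_A, pB_notA = 0.3, 0.7, 0.2
pB = pB_A * pA + pB_notA * (1 - pA)   # law of total probability: 0.35
pA_B = pB_A * pA / pB                 # Bayes' rule: 0.21 / 0.35 = 0.6
print(pB, pA_B)                       # approximately 0.35 and 0.6
```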
Example 5. In a Bayesian network A → B → C, is A conditionally independent of C given B?
Yes. This is the chain pattern. Given B, A and C are d-separated. P(A, C | B) = P(A | B) * P(C | B).
Quick-revision summary
- BFS: optimal for unit costs, O(b^d) space.
- DFS: O(b*m) space, not optimal.
- IDDFS: best of both, O(b*d) space, optimal for unit costs. Preferred when depth is unknown.
- A*: optimal if the heuristic is admissible. f(n) = g(n) + h(n).
- Alpha-beta pruning does not change the minimax outcome; with perfect ordering it roughly doubles the searchable depth.
- Implication: A → B = ¬A ∨ B. Modus ponens: A, A → B yields B.
- Collider node: observing it creates dependence between its parents.
- Variable elimination: multiply relevant factors, sum out one variable at a time.
- Likelihood weighting is more efficient than rejection sampling when evidence has low probability.
- Gibbs sampling: resample each variable from its conditional given its Markov blanket.
How to study this unit
- Trace BFS and DFS manually on 4–5 small graphs. Record the frontier (queue/stack) at each step; GATE asks this directly.
- Practice A*: compute f(n) = g(n) + h(n) for all nodes, show which gets expanded first, and verify the heuristic is admissible.
- Build a minimax tree by hand (2–3 levels) and then apply alpha-beta pruning. Identify which nodes are pruned and why.
- For propositional logic: convert 5 formulas to CNF and apply the resolution rule at least twice each.
- Draw small Bayesian networks (3–5 nodes) and practice d-separation questions: are X and Y independent given Z? Check all three patterns.
- Understand variable elimination conceptually with a 3-variable chain example. You do not need to implement it, but you should be able to trace the factor operations on paper.
- For approximate inference, focus on the difference between the four methods (direct, rejection, likelihood weighting, Gibbs). GATE typically tests conceptual understanding, not numerical computation.