Theory of Computation
Finite automata, regular languages, CFG, PDA, Turing machines, decidability, pumping lemma.
Why this unit matters
Theory of Computation (TOC) is where ISRO CS tests formal reasoning. Questions are typically about identifying the class of a given language, minimising a DFA, applying the pumping lemma, checking whether a grammar is ambiguous, and determining the decidability of a problem. ISRO papers tend to combine TOC with a twist: for example, the complement of a context-free language, the intersection of a CFL and a regular language, or whether a specific Turing machine halts.
Expect 8-11 marks. Most questions are 1-2 marks and conceptual, but they require careful formal thinking. Careless students confuse NFA with DFA equivalence details, or forget which classes are closed under which operations.
Syllabus map
-
Finite automata
- DFA: formal definition, transition function, extended transition function
- NFA: non-determinism, epsilon-NFA
- Equivalence of DFA and NFA (subset construction)
- DFA minimisation: Myhill-Nerode theorem, table-filling algorithm
-
Regular languages
- Regular expressions: syntax, semantics
- Equivalence of DFA, NFA, and regular expressions
- Closure properties: union, concatenation, Kleene star, complement, intersection, difference
- Pumping lemma for regular languages
- Non-regular languages
-
Context-free languages
- Context-free grammars (CFG): productions, derivation, parse trees
- Ambiguity: multiple parse trees for the same string
- Normal forms: Chomsky Normal Form (CNF), Greibach Normal Form (GNF)
- Pushdown automata (PDA): stack, transitions
- Equivalence of CFG and PDA
- Closure properties: union, concatenation, Kleene star; NOT closed under intersection or complement
- Pumping lemma for context-free languages
- Non-context-free languages
-
Turing machines
- Formal definition: 7-tuple
- Deterministic TM, nondeterministic TM (equivalent power)
- Multi-tape TM (equivalent to single-tape, polynomial slowdown)
- Church-Turing thesis
-
Decidability and computability
- Recursive (decidable) languages: membership decidable
- Recursively enumerable (RE) languages: TM recognises (may not halt on non-members)
- Non-RE languages
- Halting problem: undecidable, RE but not recursive
- Reductions: mapping reduction, Rice's theorem
- Common undecidable problems
Finite automata
DFA formal definition
A DFA is a 5-tuple (Q, Sigma, delta, q0, F) where:
- Q: finite set of states
- Sigma: finite input alphabet
- delta: Q x Sigma -> Q (total transition function)
- q0 in Q: start state
- F subset Q: set of accepting (final) states
A string w is accepted if the extended transition function delta*(q0, w) is in F.
NFA and subset construction
An NFA allows multiple transitions on the same input and epsilon (empty) transitions. For each NFA state set, the subset construction builds the equivalent DFA. The DFA has at most 2^|Q| states but often far fewer are reachable.
ISRO trap: an NFA with n states and the equivalent DFA can have up to 2^n states. But for many practical languages, the DFA has far fewer states. ISRO sometimes asks for the minimum number of states in the DFA for a specific language.
DFA minimisation
The table-filling (or marking) algorithm identifies pairs of states that are distinguishable (one is accepting, the other is not, or they have distinguishable transitions). States that remain indistinguishable after the algorithm are merged.
Myhill-Nerode theorem: the minimum DFA has exactly as many states as there are equivalence classes of the right-congruence relation on strings over the language. This also gives a characterisation of regular languages: a language is regular if and only if it has finitely many equivalence classes.
Regular languages
Pumping lemma for regular languages
If L is regular, there exists a pumping length p such that every string s in L with |s| >= p can be written as s = xyz where:
- |xy| <= p
- |y| >= 1
- For all i >= 0, xy^i z is in L
To prove a language is NOT regular, assume it is regular, choose a string s that must have some pumping length p, and show that for all valid decompositions xyz, some pumping of y falls outside L.
Classic non-regular language: L = {a^n b^n | n >= 0}. Choosing s = a^p b^p, any y in the first p characters is a block of a's. Pumping y twice gives more a's than b's, which is not in L.
Closure properties of regular languages
Regular languages are closed under: union, intersection, complement, concatenation, Kleene star, difference, reversal, homomorphism, inverse homomorphism.
These closure properties are useful for proving a language is regular (construct it from known regular languages using closed operations) or non-regular (if a derived language is non-regular, the original cannot be regular either).
Context-free languages
CFG and ambiguity
A CFG is ambiguous if some string has two or more different parse trees (equivalently, two or more leftmost derivations, or two or more rightmost derivations).
Example: S -> S + S | S * S | id. The string "id + id * id" has two parse trees. This grammar is ambiguous.
Some languages are inherently ambiguous: every CFG for them is ambiguous. The language of arithmetic expressions is NOT inherently ambiguous (an unambiguous grammar exists). The language {a^n b^n c^m | n,m >= 0} union {a^n b^m c^m | n,m >= 0} IS inherently ambiguous.
Chomsky Normal Form
Every production is either A -> BC or A -> a (where A, B, C are variables and a is a terminal). The start symbol may produce epsilon only if epsilon is in the language.
Any CFG can be converted to CNF. CNF is useful for the CYK parsing algorithm and the pumping lemma proof.
PDA
A PDA is an NFA with a stack. It accepts by final state or by empty stack (both models are equivalent in power to CFG). Deterministic PDAs (DPDA) accept deterministic context-free languages, a proper subset of CFLs.
Closure properties of CFLs
CFLs are closed under: union, concatenation, Kleene star, homomorphism, inverse homomorphism, intersection with a regular language.
CFLs are NOT closed under: intersection (in general), complement. The intersection of two CFLs may not be a CFL.
ISRO trap: intersection of a CFL and a regular language IS a CFL. Intersection of two CFLs is not necessarily a CFL. Complement of a CFL is not necessarily a CFL (though complement of a deterministic CFL is a DCFL).
Pumping lemma for CFLs
If L is CFL, there exists p such that every s in L with |s| >= p can be written as s = uvwxy where:
- |vwx| <= p
- |vx| >= 1
- For all i >= 0, uv^i wx^i y is in L
Classic non-CFL: L = {a^n b^n c^n | n >= 0}. Choosing s = a^p b^p c^p, pumping any combination of v and x cannot maintain equal counts of all three characters simultaneously.
Turing machines
Formal definition
A TM is a 7-tuple (Q, Sigma, Gamma, delta, q0, q_accept, q_reject) where:
- Q: finite set of states
- Sigma: input alphabet (not containing the blank symbol)
- Gamma: tape alphabet (Sigma subset Gamma, includes blank B)
- delta: Q x Gamma -> Q x Gamma x {L, R} (transition function)
- q0: start state
- q_accept: accept state
- q_reject: reject state (q_accept != q_reject)
The TM reads/writes symbols on an infinite tape, moves left or right, and halts by entering q_accept or q_reject.
Equivalence of TM variants
- Nondeterministic TM: equivalent in power to deterministic TM (DTM), but DTM simulation of NTM may have exponential overhead in time.
- Multi-tape TM: equivalent to single-tape TM in power (single-tape simulation has O(t^2) overhead in time for t steps of multi-tape TM).
- Church-Turing thesis: any effectively computable function can be computed by a TM.
Decidability
Classes of languages
Recursive (decidable): a TM that always halts (accepts or rejects) on any input. Closed under complement, union, intersection.
Recursively enumerable (RE): a TM that accepts all strings in the language but may loop forever on strings not in the language. Closed under union and intersection but NOT under complement.
Non-RE: no TM can even recognise the language.
If L is RE and complement of L is RE, then L is recursive.
Halting problem
HP = {<M, w> | TM M halts on input w}.
HP is RE (a universal TM accepts when M halts on w) but not recursive (undecidable). The proof uses diagonalisation: assume a decider H exists for HP, construct a TM D that runs H on <D, D>, leading to a contradiction.
The complement of HP is not RE.
Rice's theorem
Any non-trivial property of the language recognised by a TM is undecidable. A property is non-trivial if some TMs satisfy it and some do not.
This means questions like "does M accept the empty string?", "is L(M) regular?", "is L(M) = Sigma*?" are all undecidable.
Common decidable and undecidable problems
Decidable: membership in a regular language, membership in a CFL, emptiness of a regular language, equivalence of two DFAs, emptiness of a CFL, finiteness of a regular language.
Undecidable: halting problem, emptiness of L(TM), equivalence of two CFGs (or two TMs), ambiguity of a CFG, Post's Correspondence Problem (PCP).
ISRO trap: emptiness of a DFA (or NFA, or regular language) is DECIDABLE. Emptiness of a TM's language is UNDECIDABLE.
Worked examples
Example 1: DFA minimisation
DFA with states {q0, q1, q2, q3}, alphabet {0, 1}, start = q0, final = {q1, q3}.
Transitions:
- q0 on 0 -> q1, on 1 -> q2
- q1 on 0 -> q1, on 1 -> q3
- q2 on 0 -> q1, on 1 -> q2
- q3 on 0 -> q1, on 1 -> q3
Initial partition: {q1, q3} (accepting) and {q0, q2} (non-accepting).
Check {q0, q2}: on 0, q0 -> q1 (accepting), q2 -> q1 (accepting). Same. On 1, q0 -> q2 (non-accepting), q2 -> q2 (non-accepting). Same. So q0 and q2 are indistinguishable.
Check {q1, q3}: on 0, both go to q1 (accepting). On 1, both go to q3 (accepting). Indistinguishable.
Minimised DFA has 2 states: {q0, q2} merged as state A, {q1, q3} merged as state B.
Example 2: Pumping lemma application
Show L = {ww | w in {a,b}*} is not regular.
Assume regular, let p be the pumping length. Choose s = a^p b^p a^p b^p (ww with w = a^p b^p, |s| = 4p >= p).
Any decomposition xyz with |xy| <= p and |y| >= 1 has y entirely within the first p a's.
Pumping y^0 = xz: we get a^(p-|y|) b^p a^p b^p. For this to be in L, the first half must equal the second half. First half has fewer a's, so it cannot equal a^p b^(p-something). Contradiction.
L is not regular.
Example 3: CFL intersection
Show L = {a^n b^n c^n | n >= 0} is not context-free.
L1 = {a^n b^n c^m | n, m >= 0} is CFL (grammar: S -> AB, A -> aAb | epsilon, B -> cB | epsilon). L2 = {a^m b^n c^n | m, n >= 0} is CFL similarly.
L1 intersection L2 = {a^n b^n c^n} (the set of strings where all three counts are equal). If L were CFL, and CFLs were closed under intersection, this would be a contradiction. Since CFLs are not closed under intersection, this alone is not a proof. Use pumping lemma for CFLs directly.
With pumping length p, choose s = a^p b^p c^p. Any vx has at most two types of characters. Pumping cannot increase all three equally. So for some i, uv^i wx^i y has unequal counts. Contradiction.
Example 4: Undecidability via Rice's theorem
Is it decidable whether a given TM M accepts at least one string?
This is a property of L(M): "L(M) is non-empty." It is a language property (not about M's structure). Is it non-trivial? Yes: some TMs accept nothing (trivial rejector), some accept something. So by Rice's theorem, this property is undecidable.
Example 5: PDA for a^n b^n
Grammar: S -> aSb | epsilon.
PDA: start in q0, push 'a' for each 'a' read. When 'b' is encountered, transition to q1 and pop 'a' for each 'b'. Accept if stack is empty when input is exhausted.
More formally: delta(q0, a, Z) = {(q0, AZ)}, delta(q0, a, A) = {(q0, AA)}, delta(q0, b, A) = {(q1, epsilon)}, delta(q1, b, A) = {(q1, epsilon)}, delta(q1, epsilon, Z) = {(q_accept, Z)}.
Quick-revision summary
- DFA and NFA are equivalent. NFA -> DFA via subset construction (at most 2^n DFA states).
- Regular languages are closed under boolean operations, concatenation, Kleene star.
- Pumping lemma proves languages are NOT regular (cannot prove they are regular).
- CFLs are NOT closed under intersection or complement. Intersection CFL with regular = CFL.
- Pumping lemma for CFLs: split into uvwxy with |vwx| <= p and |vx| >= 1.
- TM variants (nondeterministic, multi-tape) are equivalent in power to DTM.
- Recursive: TM always halts. RE: TM accepts (may loop). Both closed under union and intersection; recursive also closed under complement.
- Halting problem: RE but not recursive (undecidable).
- Rice's theorem: any non-trivial language property of TM is undecidable.
- Emptiness of DFA/NFA/CFL: decidable. Emptiness of TM language: undecidable.
How to study this unit
- Practise DFA construction for at least 10 different languages. The language of strings over {a, b} ending in 'ab', divisible by 3 (in binary), not containing 'aa', etc.
- Practise DFA minimisation using the table-filling algorithm on a 5-6 state DFA. Check your work by verifying the minimised DFA accepts the same strings.
- For pumping lemma proofs, follow a strict template: assume regular/CFL, choose s (in terms of p), enumerate all possible decompositions, show all lead to a contradiction.
- Know the closure property tables for regular, CFL, recursive, and RE languages. ISRO loves "is the intersection of a CFL and a regular language always a CFL?" style questions.
- Study Rice's theorem carefully. Any question asking whether a TM's language has some property is almost certainly undecidable.
- Build a mental map of the language hierarchy: regular subset CFL subset recursive subset RE. Know one example of each class and why it belongs there.
Prefer watching over reading?
Subscribe for free.