Theory of Computation
Regular and context-free languages, automata, pumping lemma, Turing machines, undecidability.
Why this unit matters
Theory of Computation (TOC) consistently contributes 4-6 marks in GATE CS. More importantly, it is a unit where almost every mark is available to anyone who masters a small set of formal tools. There are no implementation tricks, no memorisation of library calls, just clean mathematical reasoning. Students who skip TOC because it feels abstract almost always regret it on exam day. The questions are precise, the marks are reliable, and the preparation time-to-mark ratio is among the best in the syllabus.
Syllabus map
- Regular expressions and finite automata (DFA, NFA, epsilon-NFA, minimisation, equivalence), Context-free grammars (CFG) and pushdown automata (PDA), Regular and context-free languages, pumping lemma for both classes, Turing machines (TM) and undecidability (halting problem, Rice's theorem, reductions)
Regular expressions and finite automata
Deterministic finite automata
A DFA is a 5-tuple (Q, Sigma, delta, q0, F) where Q is a finite set of states, Sigma is the input alphabet, delta: Q x Sigma -> Q is the transition function, q0 is the start state, and F is the set of accepting states.
A string w is accepted if the extended transition function delta*(q0, w) lands in F.
Key properties:
- DFAs and NFAs recognise exactly the same class of languages (regular languages)., Every NFA can be converted to a DFA via the subset construction. If the NFA has n states the resulting DFA has at most 2^n states., Minimisation: use the table-filling algorithm to find distinguishable state pairs, then merge indistinguishable ones.
NFA to DFA conversion, the subset construction
Start with {q0} (or the epsilon-closure of q0 for an epsilon-NFA). For each unprocessed subset S and each symbol a, compute the set of states reachable from any state in S on a. That set is a new DFA state. A DFA state is accepting if it contains at least one NFA accepting state.
GATE trap: Counting DFA states after subset construction. Remember: the empty set {} is also a valid (dead/trap) DFA state.
Regular expressions
Every regular language can be described by a regular expression over operations: concatenation, union (+), Kleene star (*).
Arden's lemma: If X = AX + B (languages), then X = A*B. This is used to extract a regex from a DFA's state equations.
Minimisation
Myhill-Nerode theorem: the minimum DFA for a language L has one state per equivalence class of the relation ~L (x ~L y iff for all z, xz in L iff yz in L). Two practical uses: proving a language is not regular (infinitely many classes), and counting minimum DFA states.
GATE trap: Questions ask for the minimum number of states including the dead state.
Context-free grammars and pushdown automata
CFGs
A CFG is a 4-tuple (V, Sigma, R, S) where V is variables, Sigma is terminals, R is production rules, S is the start variable.
A language is context-free if and only if it is accepted by some PDA.
Normal forms:
- Chomsky Normal Form (CNF): every production is A -> BC or A -> a. Useful for the CYK parsing algorithm., Greibach Normal Form (GNF): every production starts with a terminal, A -> a alpha. Useful for constructing PDAs.
Closure properties of CFLs: closed under union, concatenation, Kleene star. NOT closed under intersection or complement. (Intersection of two CFLs can be context-sensitive.)
GATE trap: Intersection of a CFL and a regular language is always a CFL. This is tested repeatedly.
Pushdown automata
A PDA is like an NFA with a stack. It can be defined to accept by empty stack or by final state, both models are equivalent for CFLs.
Deterministic PDAs (DPDA) are strictly weaker than nondeterministic PDAs. Every DCFL (language accepted by a DPDA) is a CFL, but not every CFL is a DCFL. Example: the language of palindromes over {a, b} is a CFL but not a DCFL.
Pumping lemma
For regular languages
If L is regular, there exists a pumping length p such that every string w in L with |w| >= p can be written as w = xyz where:
- |xy| <= p
- |y| >= 1
- For all i >= 0, xy^i z is in L.
Use this to prove a language is NOT regular: assume L is regular, pick a clever string (usually parameterised by p), show that no valid split can be pumped to stay in L. Contradiction.
GATE trap: The pumping lemma gives a necessary condition, not a sufficient one. Passing the pumping lemma does not prove a language is regular.
For context-free languages
If L is a CFL, there exists p such that every w in L with |w| >= p can be written as w = uvxyz where:
- |vxy| <= p
- |vy| >= 1
- For all i >= 0, uv^i xy^i z is in L.
Note: both v and y are pumped together.
Turing machines and undecidability
Turing machines
A TM is a 7-tuple: (Q, Sigma, Gamma, delta, q0, q_accept, q_reject). Gamma is the tape alphabet (superset of Sigma plus the blank symbol). delta: Q x Gamma -> Q x Gamma x {L, R} is the transition.
Variants (multi-tape TM, nondeterministic TM, enumerators) are all equivalent in power to the standard single-tape TM.
A language is Turing-recognisable (recursively enumerable, RE) if some TM accepts it. A language is decidable (recursive) if some TM always halts and decides membership.
Closure:
- Decidable: closed under complement, union, intersection, concatenation, star., RE: closed under union, concatenation, star. NOT closed under complement.
The halting problem
The halting problem H = {<M, w> | TM M halts on input w} is Turing-recognisable but not decidable. The proof uses diagonalisation.
Rice's theorem
Any non-trivial semantic property of the language recognised by a TM is undecidable. A property is non-trivial if some TMs have it and some do not. This gives a fast way to show undecidability: identify a property of L(M), check that it is non-trivial.
GATE trap: Rice's theorem applies to properties of L(M) (the language), not to properties of the TM itself (e.g., "does M have 5 states?" is decidable).
Reductions
To show problem A is undecidable, reduce a known undecidable problem (often the halting problem) to A: if A were decidable, we could decide H, contradiction.
Many-one reduction (A <=m B): a computable function f such that w in A iff f(w) in B.
Worked examples
Example 1. How many states does the minimum DFA have for the language L = {w in {a,b}* | w ends with 'ab'}?
The equivalence classes of ~L are: strings whose longest suffix that is a prefix of 'ab' has length 0 (neither 'a' nor 'ab' seen at end), length 1 ('a' seen at end), and length 2 ('ab' seen). Plus a dead state is not needed here because every string is live. The minimum DFA has 3 states: one for "last two chars not ending in a", one for "last char is a", one for "last two chars are ab" (accepting). Answer: 3 states.
Example 2. Is L = {a^n b^n c^n | n >= 1} a CFL?
Use the pumping lemma for CFLs. Let p be the pumping length. Pick w = a^p b^p c^p. Any split w = uvxyz with |vxy| <= p means vxy spans at most two of the three character groups. Pumping i = 2 increases at most two of the counts but not all three equally. So the pumped string is not in L. Therefore L is not a CFL, it is context-sensitive.
Example 3. Is the property "L(M) is finite" decidable?
This is a non-trivial semantic property of L(M): some TMs recognise finite languages, some do not. By Rice's theorem it is undecidable.
Example 4. Convert the regular expression (a+b)*abb to a DFA and count states.
Build the NFA by Thompson's construction (states for each operator), then apply subset construction. The resulting minimum DFA has 4 states corresponding to: no progress, seen 'a', seen 'ab', seen 'abb' (accepting). The dead state is not needed because transitions from the accepting state loop back sensibly.
Example 5. Prove L = {a^n | n is a prime} is not regular.
Assume it is regular with pumping length p. Pick w = a^q where q > p is a prime. Write w = xyz with |xy| <= p, |y| >= 1, so |y| = k where 1 <= k <= p. Consider i = q: pump to xy^q z has length q + (q-1)k = q(1+k), k. Choose i = 0: length q, k. We need q, k to be prime. But we can choose i = q + k, giving length q + k(q+k-1) = q + kq + k^2, k = (q+k)(... ), actually the cleanest path is to pump with i = q: length = |xz| + q*|y| = (q, k) + qk = q, k + qk = q(1+k), k. Since 1+k > 1 and q > p >= k, this is composite. Contradiction. Hence L is not regular.
Quick-revision summary
| Automaton | Language class | Closure under intersection? | Closure under complement? |
|---|---|---|---|
| DFA/NFA | Regular | Yes | Yes |
| PDA (nondeterministic) | CFL | No | No |
| DPDA | DCFL | No | Yes |
| TM (decidable) | Recursive | Yes | Yes |
| TM (recogniser) | RE | No | No |
Key facts to memorise:
- Epsilon-NFA -> NFA -> DFA: all equivalent; subset construction gives at most 2^n DFA states., CFL intersection regular = CFL., Halting problem is RE but not decidable., Rice's theorem: non-trivial semantic property of L(M) => undecidable., Pumping lemma: necessary condition only, not sufficient.
How to study this unit
- Build all four automaton types from scratch on paper, DFA, NFA, PDA, TM. If you cannot construct one for a simple language, you do not know it yet.
- Practice at least 10 subset-construction problems and 5 minimisation problems. Speed matters here.
- For pumping lemma, always write out the three conditions formally before choosing your string. Skipping formalism leads to errors.
- For Rice's theorem problems, write one sentence identifying the property and one sentence confirming it is non-trivial.
- Do GATE previous-year questions from 2010 onwards, TOC questions are highly predictable in style.
- Pay special attention to closure properties: a single table on a flashcard covers a third of all TOC MCQs.
Prefer watching over reading?
Subscribe for free.