Unit 1 of 10 · 11 min read

Computer Architecture

Number systems, ALU, control unit, pipelining, memory hierarchy, I/O, RISC vs CISC.


Why this unit matters

ISRO CS exams have a strong bias toward computer architecture numericals. Past papers consistently carry 10-14 marks from this unit, and the questions are rarely trivial definitions. Expect pipeline execution timelines requiring you to count stall cycles, cache mapping problems asking you to identify set and tag fields, effective memory access time (EMAT) calculations involving TLB hit rates, and disk I/O timing. ISRO setters particularly love two traps: forgetting to account for stall cycles when a pipeline has a data hazard, and mixing up write-through with write-back semantics in cache problems.

Getting this unit right also gives you a foundation for Operating Systems (paging, TLB) and Compiler Design (code generation, register allocation).

Syllabus map

  • Number systems and arithmetic

    • Binary, octal, hexadecimal conversions
    • Signed representation: sign-magnitude, 1s complement, 2s complement
    • Fixed-point addition and subtraction in 2s complement
    • Overflow detection
    • Floating-point representation: IEEE 754 single and double precision
    • Normalisation, rounding, precision
  • ALU and control unit

    • ALU design: adder, subtractor, comparator, shifter
    • Datapath components: MAR, MDR, PC, IR, accumulator, general-purpose registers
    • Hardwired control unit: combinational logic, fast but inflexible
    • Microprogrammed control unit: control memory, microinstruction formats, horizontal vs vertical microprogramming
  • Instruction set architecture

    • Addressing modes: immediate, direct, indirect, register, register-indirect, displacement, relative, auto-increment/decrement
    • RISC vs CISC characteristics
  • Pipelining

    • Classic 5-stage pipeline: IF, ID, EX, MEM, WB
    • Performance metrics: throughput, speedup, efficiency
    • Hazards: structural, data (RAW, WAR, WAW), control
    • Mitigation: stalls (bubbles), data forwarding, branch prediction
  • Memory hierarchy

    • Cache organisation: direct-mapped, set-associative, fully associative
    • Cache parameters: block size, number of sets, associativity, tag/index/offset field sizes
    • Replacement policies: LRU, FIFO, optimal (Belady)
    • Write policies: write-through, write-back, write-allocate vs no-write-allocate
    • EMAT formula
    • Main memory: DRAM internals (row/column access)
    • Virtual memory: page tables, multi-level page tables, TLB
    • EMAT with TLB
  • I/O organisation

    • Programmed I/O (polling)
    • Interrupt-driven I/O: interrupt vector, priority, maskable vs non-maskable
    • DMA: burst mode, cycle steal, transparent mode
    • Comparison: CPU involvement and throughput

Number systems and arithmetic

Conversions and complements

To convert a decimal integer to binary, repeatedly divide by 2 and collect remainders bottom-up. For a fraction, multiply by 2 and collect integer parts top-down.

2s complement of an n-bit number N is (2^n - N). The shortcut: flip all bits and add 1. Subtraction A - B is done as A + (2s complement of B), then discard the carry out of the MSB.

Overflow in signed addition: if two positives produce a negative result, or two negatives produce a positive result, overflow has occurred. The XOR of the carry into and out of the sign bit is the overflow flag.
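The subtraction shortcut and the overflow rule can be sketched in Python. This is an illustrative helper (the function name and interface are ours, not from any library):

```python
def twos_complement_sub(a, b, n=8):
    """a - b in n-bit 2s complement; returns (signed result, overflow flag).
    Illustrative sketch only."""
    mask = (1 << n) - 1
    a &= mask
    neg_b = (~b + 1) & mask              # flip all bits and add 1
    total = a + neg_b
    raw = total & mask                   # discard the carry out of the MSB
    # Overflow = carry into the sign bit XOR carry out of the sign bit
    carry_in = ((a & (mask >> 1)) + (neg_b & (mask >> 1))) >> (n - 1)
    carry_out = total >> n
    overflow = (carry_in ^ carry_out) & 1
    signed = raw - (1 << n) if raw & (1 << (n - 1)) else raw
    return signed, overflow

print(twos_complement_sub(15, 23))    # (-8, 0): fits in 8 bits, no overflow
print(twos_complement_sub(-100, 50))  # (106, 1): -150 is out of 8-bit range
```

The second call shows the trap: two negatives (here, -100 plus the complement of 50) producing a positive raw result signals overflow.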

ISRO trap: sign-magnitude has two representations of zero (+0 and -0). 2s complement has only one zero, so an n-bit 2s complement can represent one extra negative number (-2^(n-1)) compared to sign-magnitude.

IEEE 754 single precision

32 bits: 1 sign + 8 exponent (bias 127) + 23 mantissa (implicit leading 1).

Value = (-1)^S * 1.Mantissa * 2^(Exponent - 127)

Special cases: exponent all 0s with mantissa 0 is zero; exponent all 1s with mantissa 0 is infinity; exponent all 1s with non-zero mantissa is NaN.
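Python's struct module can be used to inspect these fields directly; the helper name below is ours, but the bit layout is the IEEE 754 binary32 format:

```python
import struct

def f32_fields(x):
    """Split a float's IEEE 754 single-precision encoding into its fields."""
    bits = struct.unpack('>I', struct.pack('>f', x))[0]
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF    # biased: stored value = true exponent + 127
    mantissa = bits & 0x7FFFFF        # 23 fraction bits; the leading 1 is implicit
    return sign, exponent, mantissa

print(f32_fields(1.0))            # (0, 127, 0): exponent field 127 means 2^0
print(f32_fields(float('inf')))   # (0, 255, 0): all-1s exponent, zero mantissa
```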

ALU and control unit

Hardwired vs microprogrammed

Hardwired control generates control signals directly from a finite state machine implemented in combinational logic. It is fast because there is no memory fetch for each micro-operation, but modifying it requires rewiring.

Microprogrammed control stores micro-operations in a control store (ROM or RAM). Each machine instruction maps to a sequence of microinstructions. The control unit fetches a microinstruction, decodes it, and generates control signals. It is slower but easy to modify and extend. ISRO questions sometimes ask about the width of a horizontal microinstruction (one bit per control signal) vs a vertical microinstruction (encoded fields, narrower word but needs a decoder).

Instruction pipelining

Pipeline performance

For n instructions through a k-stage pipeline with each stage taking 1 clock cycle:

  • Without pipeline: n * k cycles
  • With ideal pipeline: k + (n - 1) cycles (k cycles to fill the pipeline, then 1 per instruction)
  • Speedup over non-pipelined: (n * k) / (k + n - 1), which approaches k for large n
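These formulas are easy to check numerically; a minimal sketch (function names are ours):

```python
def pipeline_cycles(n, k, stalls=0):
    """Cycles for n instructions on a k-stage pipeline, 1 cycle per stage."""
    return k + (n - 1) + stalls

def speedup(n, k):
    """Ideal speedup over non-pipelined execution: n*k / (k + n - 1)."""
    return n * k / pipeline_cycles(n, k)

print(pipeline_cycles(100, 5))    # 104
print(round(speedup(100, 5), 2))  # 4.81 -- approaches k = 5 as n grows
```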

Data hazards and forwarding

A RAW (Read After Write) hazard occurs when an instruction needs a value that a preceding instruction has not yet written back. Without forwarding, the pipeline must stall (insert bubbles).

With forwarding (bypassing), the result from the EX or MEM stage is fed directly to the EX input of a later instruction, eliminating stalls in most cases. One situation forwarding cannot fully resolve: a load followed immediately by an instruction that uses the loaded value (load-use hazard) still requires one stall cycle even with forwarding.

Control hazards

A branch decision is typically known at the EX stage. By the time it resolves, the two instructions after the branch have already entered the pipeline (one in ID, one in IF), wasting two cycles (a 2-cycle branch penalty in a 5-stage pipeline unless the branch is resolved earlier). Branch prediction reduces the wasted cycles by guessing whether the branch will be taken.

Memory hierarchy

Cache organisation

A cache with capacity C, block size B, and k-way set associativity has:

  • Number of sets S = C / (k * B)
  • Address breakdown: offset = log2(B) bits, index = log2(S) bits, tag = remaining bits
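A quick field-width calculator, assuming all sizes are powers of two (the function name is illustrative):

```python
def cache_fields(addr_bits, cache_bytes, block_bytes, ways):
    """Return (tag, index, offset) bit widths; sizes must be powers of two."""
    sets = cache_bytes // (ways * block_bytes)   # S = C / (k * B)
    offset = block_bytes.bit_length() - 1        # log2(B)
    index = sets.bit_length() - 1                # log2(S)
    return addr_bits - index - offset, index, offset

# 32-bit address, 16 KB direct-mapped cache, 64-byte blocks
print(cache_fields(32, 16 * 1024, 64, 1))   # (18, 8, 6)
# Same cache made 4-way: fewer sets, so fewer index bits and a wider tag
print(cache_fields(32, 16 * 1024, 64, 4))   # (20, 6, 6)
```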

Direct-mapped (k=1): each main-memory block maps to exactly one cache line. Simple but suffers from conflict misses.

Fully associative (S = 1, k = number of lines = C/B): a block can go anywhere. Needs content-addressable memory (CAM) for parallel tag comparison but has the fewest conflict misses.

Set-associative: a compromise.

Write policies

Write-through: on a write hit, update both cache and main memory immediately. Simple, always consistent, but generates more memory traffic.

Write-back: on a write hit, update only the cache line and mark it dirty. Write to memory only when the dirty line is evicted. Reduces memory traffic but complicates coherence.

Write-allocate: on a write miss, bring the block into cache, then write. Usually paired with write-back.

No-write-allocate: on a write miss, write directly to memory without loading the block into cache. Usually paired with write-through.

EMAT calculation

EMAT = Hit time + Miss rate * Miss penalty

Example: cache hit time = 4 ns, miss penalty = 100 ns, miss rate = 5%. EMAT = 4 + 0.05 * 100 = 4 + 5 = 9 ns.
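The same arithmetic as a short helper, useful for checking practice problems:

```python
def emat(hit_time, miss_rate, miss_penalty):
    """Effective memory access time = hit time + miss rate * miss penalty."""
    return hit_time + miss_rate * miss_penalty

print(emat(4, 0.05, 100))   # 9.0 ns, matching the example above
```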

With TLB: each memory access first consults TLB.

EMAT = TLB hit time + TLB miss rate * (page table access time) + memory access time

(Exact formula depends on whether TLB miss requires one or multiple page-table accesses.)

Virtual memory and TLB

In a single-level page table, each virtual address is split into a page number and an offset. The page table maps the page number to a physical frame number.

A TLB (Translation Lookaside Buffer) caches recent page-table entries. TLB hit: get the frame number in 1 cycle. TLB miss: must access main memory for the page table entry.

With a two-level page table and no TLB, a single memory reference requires 3 memory accesses (two for page table levels, one for the actual data). The TLB reduces this to near 1 on a hit.
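The access counts can be folded into one expected-value formula; a sketch, where levels is the page-table depth (the function name is ours):

```python
def expected_accesses(levels, tlb_hit_rate):
    """Expected memory accesses per data reference with an L-level page table.
    TLB hit: just the data access. TLB miss: one access per table level + data."""
    return tlb_hit_rate * 1 + (1 - tlb_hit_rate) * (levels + 1)

print(expected_accesses(2, 0.0))             # 3.0 -- no TLB: two-level walk + data
print(round(expected_accesses(2, 0.9), 2))   # 1.2 -- near 1 with a warm TLB
```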

I/O organisation

Programmed I/O wastes CPU cycles busy-waiting for the device to be ready. Interrupt-driven I/O frees the CPU, which only handles the device when it signals via an interrupt. DMA transfers large blocks between memory and device without CPU involvement for each byte, making it best for bulk transfers (disk, network).

In cycle-steal mode, the DMA controller takes the bus from the CPU one cycle at a time without stopping program execution, so the program keeps running but the CPU is occasionally slowed. (Transparent mode goes further: the controller uses the bus only during cycles the CPU does not need it, so the CPU is never slowed, at the cost of slower transfers.)

Worked examples

Example 1: 2s complement subtraction

Compute 15 - 23 in 8-bit 2s complement.

15 in binary: 0000 1111
23 in binary: 0001 0111
2s complement of 23: flip bits = 1110 1000, add 1 = 1110 1001

Add: 0000 1111 + 1110 1001 = 1111 1000

1111 1000 in 2s complement: flip = 0000 0111, add 1 = 0000 1000 = 8. The sign bit is 1, so the result is -8.

Check: 15 - 23 = -8. Correct.
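The same check in Python, using the fact that 2s complement arithmetic is ordinary integer arithmetic modulo 2^n:

```python
n = 8
raw = (15 - 23) % (1 << n)    # arithmetic mod 2^8
print(format(raw, '08b'))     # 11111000
signed = raw - (1 << n) if raw >= (1 << (n - 1)) else raw
print(signed)                 # -8
```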

Example 2: Cache address breakdown

A system has a 32-bit address, 16 KB direct-mapped cache, and 64-byte blocks. Find the number of bits for tag, index, and offset.

Block offset: 64 = 2^6, so 6 bits. Number of cache lines: 16 KB / 64 B = 256 = 2^8, so 8 index bits. Tag: 32 - 8 - 6 = 18 bits.

Example 3: Pipeline execution time with stalls

A 5-stage pipeline (IF, ID, EX, MEM, WB), 1 ns per stage. Instruction sequence:

I1: ADD R1, R2, R3
I2: SUB R4, R1, R5 (needs R1, written by I1 at end of WB)

Without forwarding, I2 must wait until I1 completes WB before it can use R1 in EX. I1 writes back at the end of cycle 5. I2 needs R1 at its EX stage. Without stalls, I2's EX would be cycle 4. So we need 2 stall cycles (bubbles inserted after I2's ID stage).

Total cycles for these 2 instructions: k + (n - 1) + stalls = 5 + 1 + 2 = 8, which the pipeline diagram confirms:

Cycle:  1   2   3   4   5   6   7   8
I1:     IF  ID  EX  MEM WB
I2:         IF  ID  st  st  EX  MEM WB

Without forwarding, 2 stalls are needed. With forwarding (EX-to-EX), the result is available after EX of I1 at end of cycle 3, and I2 needs it at EX which starts cycle 4. Zero stalls needed with forwarding.

Example 4: EMAT with TLB

A system has: TLB access time = 20 ns, main memory access time = 100 ns, TLB hit rate = 90%, single-level page table.

On TLB hit: total access time = 20 + 100 = 120 ns (TLB lookup + data access). On TLB miss: total access time = 20 + 100 + 100 = 220 ns (TLB lookup + page table access in memory + data access).

EMAT = 0.9 * 120 + 0.1 * 220 = 108 + 22 = 130 ns.
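The same computation in a few lines (values taken from the example):

```python
tlb, mem, hit_rate = 20, 100, 0.9
hit_time = tlb + mem             # TLB lookup + data access
miss_time = tlb + mem + mem      # + one extra page-table access in memory
emat = hit_rate * hit_time + (1 - hit_rate) * miss_time
print(round(emat, 1))            # 130.0 ns
```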

Example 5: IEEE 754 representation

Represent -0.75 in IEEE 754 single precision.

0.75 = 0.11 in binary = 1.1 * 2^(-1). Sign: 1 (negative). Exponent: -1 + 127 = 126 = 0111 1110. Mantissa (fractional part of 1.1): 100 0000 0000 0000 0000 0000 (23 bits).

Full 32-bit: 1 | 0111 1110 | 100 0000 0000 0000 0000 0000 = BF40 0000 in hex.
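A one-line sanity check with Python's struct module, which emits the big-endian single-precision encoding:

```python
import struct

print(struct.pack('>f', -0.75).hex())   # bf400000
```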

Quick-revision summary

  • 2s complement: flip and add 1. Overflow when carry-in XOR carry-out of sign bit is 1.
  • IEEE 754 single: 1 sign + 8 exponent (bias 127) + 23 mantissa, implicit leading 1.
  • Pipeline speedup approaches k (number of stages) for large n. Actual speedup is degraded by stalls.
  • RAW hazard without forwarding: 2 stall cycles in a 5-stage pipeline. With forwarding: 0 stalls except load-use (1 stall).
  • Cache address: offset = log2(block size), index = log2(number of sets), tag = rest.
  • Write-through: always writes to memory. Write-back: writes on eviction. Remember dirty bit.
  • EMAT = hit time + miss rate * miss penalty.
  • TLB miss in single-level paging costs 1 extra memory access; in two-level paging, 2 extra.
  • DMA: bulk transfers without per-byte CPU involvement.
  • Microprogrammed control: flexible, slower. Hardwired: fast, inflexible.

How to study this unit

  1. Practice conversions and 2s complement arithmetic by hand until you can do an 8-bit subtraction in under 30 seconds. Overflow detection is a frequent 1-mark question.
  2. Draw pipeline execution diagrams for 4-5 instruction sequences, deliberately introducing RAW and control hazards. Count cycles with and without forwarding.
  3. For each cache problem, first extract offset/index/tag bits from the address size and cache parameters, then work the mapping. Never guess which bits are which.
  4. Memorise the EMAT formula with and without TLB. Practice 3-4 variations where the hit rate or penalty changes.
  5. Know the DMA vs interrupt-driven I/O trade-offs well enough to answer a comparison question in two sentences.
  6. Revise IEEE 754 special cases (NaN, infinity, denormal) separately. They appear as trap questions.
