Unit 2 of 10 · 13 min read

Computer System Architecture

Digital logic, data representation, register transfer, CPU, pipelining, I/O, memory hierarchy.


Why this unit matters

Unit 2 is the hardware backbone of the NET CS paper. Questions here test both conceptual understanding (why does pipelining improve throughput?) and numerical ability (convert a floating-point number, calculate cache miss penalty, trace a pipeline hazard). The unit bridges the gap between raw silicon and the instruction set, making it foundational for understanding compilers and operating systems in later units. Expect 8-12 direct questions from this unit in each NET exam.

Syllabus map

Every named topic in the official UGC NET CS Code 87 syllabus for Unit 2:

  • Digital Logic Circuits: logic gates, Boolean algebra, Karnaugh map simplification, combinational circuits (adders, subtractors, comparators), flip-flops (SR, JK, D, T), sequential circuits, integrated circuits, decoders, multiplexers/demultiplexers, registers, counters, memory unit
  • Data Representation: data types, number systems (binary, octal, hexadecimal), complements (1's, 2's, 9's, 10's), fixed-point representation, floating-point representation (IEEE 754), error detection codes (parity, Hamming), BCD, Gray code, computer arithmetic (addition, subtraction, multiplication, division)
  • Register Transfer Language (RTL) and Microoperations: register notation, bus and memory transfers, arithmetic microoperations, logic microoperations, shift microoperations
  • Basic Computer Organisation and Design: stored-program concept, instruction codes, processor registers, instruction cycle, memory-reference instructions, input-output, interrupt handling, timing and control
  • Programming the Basic Computer: machine language, assembly language, assembler design, program loops, subroutines, I/O programming
  • Microprogrammed Control: control memory, address sequencing, microprogrammed CPU design, horizontal vs vertical microcode
  • Central Processing Unit: general register organisation, stack organisation, instruction formats (zero, one, two, three address), addressing modes (immediate, direct, indirect, register, register indirect, autoincrement, autodecrement, displacement, relative, indexed), RISC vs CISC
  • Pipeline and Vector Processing: parallel processing concepts, instruction pipelining, pipeline stages, data hazards, control hazards, structural hazards, hazard mitigation, arithmetic pipelines, vector processing, array processors
  • Input-Output Organisation: peripheral devices, I/O interface (I/O ports, I/O bus), asynchronous data transfer (strobe, handshaking), modes of transfer (programmed I/O, interrupt-driven, DMA), priority interrupt, daisy-chain, DMA controller, serial communication (RS-232, UART)
  • Memory Hierarchy: main memory (DRAM, SRAM), auxiliary storage, associative memory (content-addressable), cache memory (direct mapped, set-associative, fully associative, replacement policies, write policies), virtual memory, memory management unit (MMU), TLB
  • Multiprocessors: characteristics of multiprocessors, interconnection structures (bus, crossbar switch, multistage network), arbitration and communication, synchronisation, cache coherence (snooping, directory-based protocols), multicore processors

Digital Logic Circuits

Logic Gates and Boolean Algebra

The seven basic gates: AND, OR, NOT, NAND, NOR, XOR, XNOR. NAND and NOR are each functionally complete: any Boolean function can be realised with only NAND gates, or with only NOR gates.

Karnaugh map (K-map) simplification: a visual method for minimising Boolean expressions. Group adjacent 1s (SOP) or 0s (POS) in groups of 1, 2, 4, 8, or 16. The map wraps around; corners and edges are adjacent. The largest group gives the fewest literals.

Combinational Circuits

A half adder computes Sum = A XOR B, Carry = A AND B. A full adder includes a carry-in: Sum = A XOR B XOR Cin, Cout = (A AND B) OR (Cin AND (A XOR B)). Ripple carry adder: chain n full adders; the carry propagates from bit to bit. Slow for large n. Carry lookahead adder: computes carry for all bits simultaneously using generate (G = A AND B) and propagate (P = A XOR B) signals.
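The full-adder equations above can be checked directly in a minimal Python sketch of a ripple-carry adder (the helper names `full_adder` and `ripple_add` are illustrative, not from any particular library):

```python
def full_adder(a, b, cin):
    """One full adder: Sum = A XOR B XOR Cin, Cout = AB OR Cin(A XOR B)."""
    s = a ^ b ^ cin
    cout = (a & b) | (cin & (a ^ b))
    return s, cout

def ripple_add(a_bits, b_bits):
    """Add two equal-length bit lists (LSB first); the carry ripples bit to bit."""
    carry = 0
    result = []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)
        result.append(s)
    return result, carry

# 5 (101) + 6 (110), given LSB first: 1011 = 11 with the final carry as bit 3
bits, carry = ripple_add([1, 0, 1], [0, 1, 1])
```

Note how the loop mirrors the hardware: stage i cannot produce its sum until stage i-1 has produced its carry, which is exactly why ripple-carry adders are slow for large n.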

Decoder: n-to-2^n. Enables exactly one of 2^n output lines based on n input bits. Multiplexer (MUX): 2^n-to-1. Selects one of 2^n data lines based on n select lines.

Flip-flops and Sequential Circuits

SR flip-flop: Set and Reset inputs; the state S=1, R=1 is forbidden. D flip-flop: the output follows D on the clock edge; eliminates the forbidden state. JK flip-flop: toggles the output when J=1, K=1; the most versatile. T flip-flop: toggles when T=1; used in counters.

Registers: a group of flip-flops that hold a multi-bit value. Shift registers shift bits left or right on each clock cycle.

Counters: ripple counter (asynchronous, simpler, has propagation delay), synchronous counter (all flip-flops clock together, faster).

Data Representation

Number Systems

Decimal, binary (base 2), octal (base 8), hexadecimal (base 16).

Conversion: decimal to binary, divide repeatedly by 2, collect remainders bottom-up. Example: 45 in decimal = 101101 in binary (32+8+4+1).

Binary to hexadecimal: group binary digits in groups of 4 from the right. Example: 10110110 = 1011 | 0110 = B6 in hex.
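Both conversion procedures can be practised against a short Python sketch (function names are illustrative):

```python
def dec_to_bin(n):
    """Repeated division by 2; remainders collected bottom-up."""
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        digits.append(str(n % 2))  # remainder is the next bit, LSB first
        n //= 2
    return "".join(reversed(digits))

def bin_to_hex(b):
    """Pad to a multiple of 4 bits, then map each 4-bit group to one hex digit."""
    b = b.zfill((len(b) + 3) // 4 * 4)
    return "".join(format(int(b[i:i+4], 2), "X") for i in range(0, len(b), 4))
```

For example, `dec_to_bin(45)` reproduces 101101 and `bin_to_hex("10110110")` reproduces B6, matching the hand conversions above.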

Complements

1's complement: flip all bits. 2's complement: flip all bits and add 1. Used for signed integer arithmetic. Example: -5 in 8-bit 2's complement: 5 = 00000101, 1's complement = 11111010, 2's complement = 11111011.
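The flip-and-add-1 rule is equivalent to masking the value into the available bit width, which a short Python sketch makes concrete (names are illustrative):

```python
def twos_complement(value, bits=8):
    """Two's complement bit pattern of a (possibly negative) integer."""
    return format(value & ((1 << bits) - 1), f"0{bits}b")

def from_twos_complement(s):
    """Decode a bit string: if the sign bit is set, subtract 2^bits."""
    n = int(s, 2)
    if s[0] == "1":
        n -= 1 << len(s)
    return n
```

`twos_complement(-5)` gives 11111011, the same pattern derived by hand above, and `from_twos_complement` inverts it.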

9's complement of decimal N with d digits: (10^d - 1) - N. 10's complement: 9's complement + 1. Same concept as 2's complement, applied to decimal.

Floating-Point Representation (IEEE 754)

Single precision (32 bits): 1 sign bit, 8 exponent bits (biased by 127), 23 mantissa bits. Double precision (64 bits): 1 sign bit, 11 exponent bits (biased by 1023), 52 mantissa bits.

Value = (-1)^sign * 1.mantissa * 2^(exponent - bias)

Special cases: exponent all 0s = denormalised number (no implicit leading 1). Exponent all 1s + mantissa 0 = infinity. Exponent all 1s + mantissa nonzero = NaN.
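The three fields can be inspected for any float using Python's standard `struct` module, which packs a value in IEEE 754 single precision (the helper names are illustrative):

```python
import struct

def float_to_bits(x):
    """Pack x as IEEE 754 single precision and return the 32-bit pattern."""
    return format(struct.unpack(">I", struct.pack(">f", x))[0], "032b")

def decode_fields(bits):
    """Split a 32-bit pattern into sign, biased exponent (as int), mantissa."""
    return bits[0], int(bits[1:9], 2), bits[9:]
```

For -0.75, the sign bit is 1, the biased exponent is 126 (true exponent -1), and the mantissa is 1 followed by 22 zeros, as derived in the worked example later in this unit.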

Error Detection Codes

Parity bit: adds one bit so that the total number of 1s is even (even parity) or odd (odd parity). Detects single-bit errors, cannot correct them.

Hamming code: places parity bits at positions that are powers of 2 (1, 2, 4, 8, ...). Each parity bit covers a specific set of data bits. Can detect and correct single-bit errors; detect (but not correct) double-bit errors.

For a message of m data bits, the number of parity bits r must satisfy: 2^r >= m + r + 1.
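The inequality can be solved by simply counting up, as in this minimal sketch (function name is illustrative):

```python
def parity_bits_needed(m):
    """Smallest r satisfying 2^r >= m + r + 1 for m data bits."""
    r = 1
    while (1 << r) < m + r + 1:
        r += 1
    return r
```

For 4 data bits this yields r = 3 (the classic (7,4) Hamming code); for 11 data bits, r = 4 (the (15,11) code).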

RTL and Microoperations

Register Transfer Language uses notation like R1 <- R2 to mean "copy contents of R2 into R1." A bus allows multiple registers to share a single transfer path.

Arithmetic microoperations: add, subtract, increment, decrement, with or without carry. Logic microoperations: AND, OR, XOR, complement, applied bitwise. Shift microoperations: logical shift (fills with 0), arithmetic shift (preserves sign bit), rotate (wraps bits around).
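The three shift microoperations differ only in what fills the vacated bit, which a sketch over 8-bit values makes explicit (function names are illustrative; Python's `>>` alone has no fixed width, hence the masking):

```python
def logical_shift_right(x, bits=8):
    """Vacated MSB position is filled with 0."""
    return (x >> 1) & ((1 << bits) - 1)

def arithmetic_shift_right(x, bits=8):
    """Vacated MSB position is refilled with the sign bit (MSB)."""
    msb = x & (1 << (bits - 1))
    return (x >> 1) | msb

def rotate_right(x, bits=8):
    """The outgoing LSB wraps around into the MSB position."""
    lsb = x & 1
    return (x >> 1) | (lsb << (bits - 1))
```

Shifting 10110010 right gives 01011001 logically but 11011001 arithmetically, preserving the negative sign in 2's complement.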

Basic Computer Organisation and Design

Instruction Cycle

The basic fetch-decode-execute cycle:

  1. Fetch: load the instruction from the address in the program counter (PC) into the instruction register (IR). Increment PC.
  2. Decode: the control unit interprets the opcode.
  3. Execute: the ALU or memory unit carries out the operation.
  4. Interrupt check: after execute, check if an interrupt is pending.

Instruction Formats and Addressing Modes

A typical instruction has an opcode field and one or more address fields.

  • Immediate: operand is part of the instruction itself. Fast, but limited range.
  • Direct: instruction contains the memory address of the operand.
  • Indirect: instruction contains the address of a memory location that holds the actual address. Two memory accesses.
  • Register: operand is in a named register.
  • Register Indirect: register holds the address of the operand.
  • Indexed: effective address = base address + index register.
  • Relative: effective address = PC + offset. Used for branches.
  • Autoincrement / Autodecrement: register is automatically incremented or decremented after (or before) use. Useful for arrays and stacks.
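The effective-address (EA) rules above can be drilled with a toy model where memory is a dict and registers are a dict; everything here (the `effective_address` helper, the register name "X" for the index register) is a hypothetical illustration, not a real ISA:

```python
def effective_address(mode, operand, regs, mem, pc=0):
    """EA for a few common addressing modes on a toy machine (illustrative)."""
    if mode == "direct":
        return operand                # EA is in the instruction
    if mode == "indirect":
        return mem[operand]           # one extra memory access to fetch the EA
    if mode == "register_indirect":
        return regs[operand]          # register holds the EA
    if mode == "indexed":
        return operand + regs["X"]    # EA = base + index register
    if mode == "relative":
        return pc + operand           # EA = PC + offset
    raise ValueError(f"unknown mode: {mode}")
```

Tracing a question then reduces to picking the right rule, e.g. indexed mode with base 20 and X = 4 gives EA = 24.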

CPU Architecture

RISC vs CISC

CISC (Complex Instruction Set Computer): many complex instructions, variable-length encoding, multiple addressing modes, instructions take variable clock cycles. Example: x86.

RISC (Reduced Instruction Set Computer): small, uniform instruction set, fixed-length instructions, load-store architecture (only load and store access memory; all computation on registers), most instructions execute in one cycle. Examples: ARM, MIPS.

RISC advantages: simpler control unit, easier pipelining, more registers. CISC advantages: denser code (fewer instructions to fetch), easier to program in assembly.

Stack and Register Organisations

Zero-address instruction: both operands are implicit on a stack. The stack is the central data structure (e.g., Java bytecode, Forth).

General register organisation: multiple general-purpose registers, ALU connected via internal bus. Instructions specify register numbers explicitly.

Pipeline and Vector Processing

Instruction Pipeline

Pipelining overlaps execution of multiple instructions. A classic 5-stage pipeline:

  1. IF: Instruction Fetch
  2. ID: Instruction Decode / Register Read
  3. EX: Execute (ALU operation)
  4. MEM: Memory Access
  5. WB: Write Back to register

Throughput: with n stages and k instructions, pipelined time = (n + k - 1) * cycle time, versus k * n * cycle time without pipelining. Speedup approaches n for large k.
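The two timing formulas are worth encoding once and reusing for every numerical question (function names are illustrative):

```python
def pipeline_time(stages, instructions, cycle_ns):
    """Pipelined execution: (n + k - 1) cycles for k instructions, n stages."""
    return (stages + instructions - 1) * cycle_ns

def sequential_time(stages, instructions, cycle_ns):
    """Non-pipelined: every instruction occupies all n stage times in turn."""
    return stages * instructions * cycle_ns

def speedup(stages, instructions, cycle_ns):
    return sequential_time(stages, instructions, cycle_ns) / \
           pipeline_time(stages, instructions, cycle_ns)
```

With 8 stages of 10 ns and 1000 instructions this gives 10,070 ns pipelined versus 80,000 ns sequential, a speedup just under the stage count of 8.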

Hazards

Data hazard: an instruction depends on a result not yet written by a previous instruction. Solutions: forwarding (bypass the pipeline, send result directly), stalling (insert NOP bubbles), out-of-order execution.

Control hazard: a branch changes the PC before the fetch stage can react. Solutions: branch prediction, delayed branching, flushing the pipeline on mispredict.

Structural hazard: two instructions need the same hardware resource simultaneously. Solution: add duplicate resources, or stall.

Vector Processing

A vector processor operates on arrays of data in a single instruction (SIMD, Single Instruction Multiple Data). The vector registers hold multiple data elements. Used in scientific computing (matrix operations, signal processing). Example: SSE/AVX extensions in modern x86.

Input-Output Organisation

Transfer Modes

Programmed I/O: CPU polls the device status register continuously (busy waiting). Simple but wastes CPU time.

Interrupt-driven I/O: device raises an interrupt when ready. CPU handles the interrupt via an Interrupt Service Routine (ISR) and then resumes normal work. More efficient.

Direct Memory Access (DMA): a DMA controller transfers data directly between device and memory without CPU involvement per byte. CPU is only involved at the start (setup) and end (interrupt on completion). Best for high-bandwidth transfers.

Priority Interrupt

Multiple devices may interrupt simultaneously. Daisy-chain: devices are linked in series; priority is positional. Vectored interrupts: each device provides its own vector address for the ISR. Interrupt mask registers allow software to disable specific interrupt lines.

Memory Hierarchy

Cache Memory

Cache sits between CPU and main memory. It exploits spatial locality (nearby locations accessed together) and temporal locality (recently accessed locations accessed again soon).

Direct mapped cache: each main memory block maps to exactly one cache line. Address is split into tag, index, and block offset. Simple but suffers from conflict misses.

Set-associative cache: each block can go in one of k cache lines (k-way set associative). k=1 is direct mapped; k = number of lines is fully associative.

Replacement policies: LRU (Least Recently Used), FIFO, Random.
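The tag/index/offset split for a direct-mapped cache can be rehearsed with a tiny simulator (the function name and dict-based cache are an illustrative sketch, assuming power-of-two line counts and block sizes):

```python
def simulate_direct_mapped(addresses, num_lines, block_size):
    """Count hits and misses for a direct-mapped cache of num_lines lines."""
    lines = {}          # line index -> currently stored tag
    hits = misses = 0
    for addr in addresses:
        block = addr // block_size   # strip the block offset
        index = block % num_lines    # which line the block must use
        tag = block // num_lines     # remaining high-order bits
        if lines.get(index) == tag:
            hits += 1
        else:
            misses += 1
            lines[index] = tag       # fill (or evict and replace) the line
    return hits, misses
```

Accesses 0, 1, 2, 3 with a 4-byte block give one compulsory miss then three hits (spatial locality); alternating 0 and 8 in a 2-line cache misses every time (conflict misses on the same index).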

Write policies:

  • Write-through: write to both cache and main memory simultaneously.
  • Write-back: write only to cache; write to main memory only when the cache line is evicted (dirty bit).

Miss penalty: additional cycles taken when a cache miss occurs and data must be fetched from main memory. Average memory access time = hit time + miss rate * miss penalty.
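The average-access-time formula is a one-liner, but having it as a function avoids slips with percentages in exam practice (function name is illustrative):

```python
def amat(hit_time_ns, miss_rate, miss_penalty_ns):
    """Average memory access time = hit time + miss rate * miss penalty."""
    return hit_time_ns + miss_rate * miss_penalty_ns

# e.g. 10 ns hit time, 5% miss rate, 100 ns penalty -> 15.0 ns
```

Note that miss_rate is a fraction (0.05), not a percentage (5).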

Virtual Memory and TLB

Virtual memory allows programs to use more address space than physical RAM. Pages are mapped via a page table. When a page is not in RAM, a page fault occurs and the OS loads the page from disk.

TLB (Translation Lookaside Buffer): a small, fast cache for page table entries. Reduces the overhead of virtual-to-physical address translation.

Multiprocessors

Tightly coupled (shared memory) multiprocessors communicate via a shared address space. Loosely coupled (message passing) communicate via explicit messages.

Cache coherence problem: if two processors have the same cache line and one writes, the other has a stale copy. Solutions:

Snooping protocol: each cache monitors (snoops) the shared bus. Write-invalidate (MESI protocol) invalidates other copies on a write. Write-update broadcasts the new value.

Directory-based protocol: a directory tracks which caches hold each line. Scales better than bus snooping for large systems.

Multicore: multiple processor cores on a single chip sharing last-level cache (LLC) and memory controller.

Worked Examples

Example 1: Data Representation, 2's Complement Arithmetic

Add -6 and +4 in 8-bit 2's complement. -6: 6 = 00000110, so its 2's complement is 11111010. +4 = 00000100. Sum: 11111010 + 00000100 = 11111110 (no carry out of bit 7). Decoding 11111110: the sign bit is 1, and taking its 2's complement gives 00000010 = 2, so the result is -2. Correct: -6 + 4 = -2.

Example 2: Hamming Code

Data bits: 1011 (4 bits). How many parity bits r needed? 2^r >= 4 + r + 1. For r=3: 8 >= 8. Yes, r=3. Bit positions 1-7: P1 at 1, P2 at 2, D1 at 3, P3 at 4, D2 at 5, D3 at 6, D4 at 7. Data: D1=1, D2=0, D3=1, D4=1. P1 covers positions 1,3,5,7: D1 XOR D2 XOR D4 = 1 XOR 0 XOR 1 = 0. P2 covers positions 2,3,6,7: D1 XOR D3 XOR D4 = 1 XOR 1 XOR 1 = 1. P3 covers positions 4,5,6,7: D2 XOR D3 XOR D4 = 0 XOR 1 XOR 1 = 0. Transmitted: P1=0, P2=1, D1=1, P3=0, D2=0, D3=1, D4=1 → 0110011.
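The parity computations in this example can be verified with a small (7,4) encoder (the function name is illustrative; positions follow the layout P1 P2 D1 P3 D2 D3 D4 used above):

```python
def hamming_encode_4(d1, d2, d3, d4):
    """(7,4) Hamming code: even parity over the standard coverage sets."""
    p1 = d1 ^ d2 ^ d4   # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4   # covers positions 4, 5, 6, 7
    return f"{p1}{p2}{d1}{p3}{d2}{d3}{d4}"
```

Encoding the data bits 1, 0, 1, 1 reproduces the transmitted word 0110011 derived by hand above.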

Example 3: Cache Miss, Average Memory Access Time

Cache hit time = 10 ns, miss rate = 5%, main memory access time = 100 ns. Average access time = 10 + 0.05 * 100 = 10 + 5 = 15 ns. (Assuming write-through with no write buffer and sequential access on miss.)

Example 4: Pipeline Speedup

A non-pipelined processor executes each instruction in 80 ns (8 stages of 10 ns each). A pipelined processor uses 8 stages at 10 ns each. Speedup for 1000 instructions: non-pipelined = 1000 * 80 = 80,000 ns. Pipelined = (8 + 1000 - 1) * 10 = 1007 * 10 = 10,070 ns. Speedup = 80,000 / 10,070 ≈ 7.94.

Example 5: IEEE 754 Single Precision

Represent -0.75 in IEEE 754 single precision. 0.75 = 0.11 in binary = 1.1 * 2^(-1). Sign = 1 (negative). Exponent = -1 + 127 = 126 = 01111110. Mantissa = 10000000000000000000000 (23 bits after the implicit 1). Result: 1 01111110 10000000000000000000000.

Quick-Revision Summary

  • 2's complement = 1's complement + 1. Used for signed integers.
  • IEEE 754 single: 1 sign, 8 exponent (bias 127), 23 mantissa bits.
  • Hamming code parity bits satisfy 2^r >= m + r + 1.
  • Pipeline speedup approaches the number of stages for large instruction counts.
  • Pipeline hazards: data, control, structural. Solutions: forwarding, stalling, prediction, duplication.
  • Cache: direct mapped (fastest lookup, most conflicts), fully associative (most flexible, expensive).
  • Average access time = hit time + miss rate * miss penalty.
  • Virtual memory uses page tables; the TLB accelerates translation.
  • DMA transfers data without per-byte CPU involvement.
  • Cache coherence: snooping (bus-based) vs directory-based (scalable).
  • RISC: fixed-length, load-store, single-cycle instructions. CISC: variable-length, complex addressing.

How to Study This Unit

  1. Data representation questions are almost always numerical. Practice converting numbers and representing them in IEEE 754 until it is mechanical.
  2. For cache memory, work through direct-mapped and 2-way set-associative examples with small caches (4-8 lines). Calculate hit/miss for a sequence of memory accesses.
  3. Draw the 5-stage pipeline and trace through 5-6 instructions by hand, including a data hazard, to understand where stalls appear.
  4. For digital logic, practice K-map simplification for both 3-variable and 4-variable maps; know half adder and full adder truth tables.
  5. RISC vs CISC is a favourite short-answer topic; have a concise 5-point comparison ready.
  6. Previous NET papers frequently test addressing modes with a formula like EA = Base + Index. Memorise the EA formula for each mode.
