Microprogramming

Arvind
Computer Science & Artificial Intelligence Lab, M.I.T.
Based on the material prepared by Arvind and Krste Asanovic
6.823 L4, September 21, 2005

ISA to Microarchitecture Mapping

• An ISA is often designed for a particular microarchitectural style, e.g.,
  – CISC ⇒ microcoded
  – RISC ⇒ hardwired, pipelined
  – VLIW ⇒ fixed-latency in-order pipelines
  – JVM ⇒ software interpretation
• But an ISA can be implemented in any microarchitectural style
  – Pentium-4: hardwired pipelined CISC (x86) machine (with some microcode support)
  – This lecture: a microcoded RISC (MIPS) machine
  – Intel will probably eventually have a dynamically scheduled out-of-order VLIW (IA-64) processor
  – PicoJava: a hardware JVM processor

Microarchitecture: Implementation of an ISA

[Figure: the Controller drives the Data path through control points; the Data path reports back to the Controller through status lines.]

Structure: how components are connected (static).
Behavior: how data moves between components (dynamic).

Microcontrol Unit (Maurice Wilkes, 1954)

Embed the control logic state table in a memory array.

[Figure labels: op, conditional code flip-flop, next state, µaddress, Decoder, Matrix A, Matrix B, control lines to ALU, MUXs, Registers.]

Microcoded Microarchitecture

[Figure: a µcontroller (ROM), which holds fixed microcode instructions, takes busy?, zero? and opcode as inputs and drives the Datapath. The Datapath exchanges Data and Addr with Memory (RAM), which is controlled by enMem and MemWrt and holds the user program written in macrocode instructions (e.g., MIPS, x86, etc.).]
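The idea of embedding the control logic state table in a memory array can be sketched in a few lines of Python. This is only an illustrative toy, not Wilkes's actual design: the state names (`S0` to `S2`), the control-line names, and the helper functions `step` and `run` are all hypothetical. The point it tries to show is that each cycle is a pure table lookup keyed by the current state and a condition bit.

```python
# Hypothetical toy control store. Each entry maps
#   (current state, condition flip-flop) -> (control lines to assert, next state).
# State and signal names are illustrative assumptions, not taken from the slides.
CONTROL_STORE = {
    ("S0", False): (("load_a",), "S1"),
    ("S0", True):  (("load_a",), "S1"),
    ("S1", False): (("load_b",), "S2"),
    ("S1", True):  (("load_b",), "S2"),
    ("S2", False): (("alu_add", "write_result"), "S0"),
    ("S2", True):  (("alu_sub", "write_result"), "S0"),
}


def step(state, condition):
    """One microcycle: look up the control lines and the next state."""
    return CONTROL_STORE[(state, condition)]


def run(start, conditions):
    """Follow the table for one cycle per condition value; return (trace, final state)."""
    trace, state = [], start
    for condition in conditions:
        controls, state = step(state, condition)
        trace.append(controls)
    return trace, state


if __name__ == "__main__":
    trace, final_state = run("S0", [False, False, True])
    for cycle, controls in enumerate(trace):
        print(cycle, controls)
    print("final state:", final_state)
```

Changing the machine's behavior here means editing table entries rather than rewiring logic, which is the property the rest of the lecture builds on.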
The MIPS32 ISA

• Processor state
  – 32 32-bit GPRs, R0 always contains a 0
  – 16 double-precision / 32 single-precision FPRs
  – FP status register, used for FP compares & exceptions
  – PC, the program counter
  – some other special registers
• Data types
  – 8-bit byte, 16-bit half word
  – 32-bit word for integers
  – 32-bit word for single-precision floating point
  – 64-bit word for double-precision floating point
• Load/Store style instruction set
  – data addressing modes: immediate & indexed
  – branch addressing modes: PC relative & register indirect
  – byte-addressable memory, big-endian mode

All instructions are 32 bits.
See H&P p129-137 & Appendix C (online) for full description.

MIPS Instruction Formats

Format       Fields (width in bits)                                  Semantics
ALU          0 (6) | rs (5) | rt (5) | rd (5) | 0 (5) | func (6)      rd ← (rs) func (rt)
ALUi         opcode (6) | rs (5) | rt (5) | immediate (16)            rt ← (rs) op immediate
Mem          opcode (6) | rs (5) | rt (5) | displacement (16)         M[(rs) + displacement]
BEQZ, BNEZ   opcode (6) | rs (5) | - (5) | offset (16)
JR, JALR     opcode (6) | rs (5) | - (5) | - (16)
J, JAL       opcode (6) | offset (26)

A Bus-based Datapath for MIPS

[Figure: a single 32-bit Bus connects the IR, the ALU (input registers A and B, operation chosen by OpSel), the register file (32 GPRs + PC ..., 32-bit registers, addressed through RegSel from rs, rt, rd, 31 (Link) or 32 (PC)), the immediate extender (ExtSel) and Memory (address register MA). Load signals: ldIR, ldA, ldB, ldMA. Write signals: RegWrt, MemWrt. Bus enables: enImm, enALU, enReg, enMem. Opcode, zero? and busy? are the status outputs.]

Microinstruction: register to register transfer (17 control signals)
  MA ← PC means RegSel = PC; enReg = yes; ldMA = yes
  B ← Reg[rt] means RegSel = rt; enReg = yes; ldB = yes

Memory Module

[Figure: a RAM with inputs addr, din (from the bus), we (Write(1)/Read(0)) and Enable, and outputs dout (to the bus) and busy.]

Assumption: memory operates asynchronously and is slow compared to Reg-to-Reg transfers.

Instruction Execution

Execution of a MIPS instruction involves
1. instruction fetch
2. decode and register fetch
3. ALU operation
4. memory operation (optional)
5. write back to register file (optional)
plus the computation of the next instruction address.

Microprogram Fragments

instr fetch:  MA ← PC
              A ← PC
              IR ← Memory
              PC ← A + 4
              dispatch on OPcode
(instr fetch can be treated as a macro)

ALU:          A ← Reg[rs]
              B ← Reg[rt]
              Reg[rd] ← func(A,B)
              do instruction fetch

ALUi:         A ← Reg[rs]
              B ← Imm            (sign extension ...)
              Reg[rt] ← Opcode(A,B)
              do instruction fetch

Microprogram Fragments (cont.)

LW:           A ← Reg[rs]
              B ← Imm
              MA ← A + B
              Reg[rt] ← Memory
              do instruction fetch

J:            A ← PC
              B ← IR
              PC ← JumpTarg(A,B)
              do instruction fetch

beqz:         A ← Reg[rs]
              If zero?(A) then go to bz-taken
              do instruction fetch

bz-taken:     A ← PC
              B ← Imm << 2
              PC ← A + B
              do instruction fetch

JumpTarg(A,B) = {A[31:28], B[25:0], 00}

MIPS Microcontroller: first attempt

[Figure: a µProgram ROM addressed by the Opcode (6 bits), zero?, Busy (memory) and the µPC (state, s bits). Its data output is the next state plus the Control Signals (17).]

Latching the inputs may cause a one-cycle delay.
How big is "s"?
ROM size? $2^{(\text{opcode} + \text{status} + s)}$ words
Word size? control + s bits

Microprogram in the ROM worksheet

(* = don't care)

State    Op     zero?  busy  Control points          next-state
fetch0   *      *      *     MA ← PC                 fetch1
fetch1   *      *      yes   ....                    fetch1
fetch1   *      *      no    IR ← Memory             fetch2
fetch2   *      *      *     A ← PC                  fetch3
fetch3   *      *      *     PC ← A + 4              ?
fetch3   ALU    *      *     PC ← A + 4              ALU0
ALU0     *      *      *     A ← Reg[rs]             ALU1
ALU1     *      *      *     B ← Reg[rt]             ALU2
ALU2     *      *      *     Reg[rd] ← func(A,B)     fetch0

Microprogram in the ROM

State    Op     zero?  busy  Control points          next-state
fetch0   *      *      *     MA ← PC                 fetch1
fetch1   *      *      yes   ....                    fetch1
fetch1   *      *      no    IR ← Memory             fetch2
fetch2   *      *      *     A ← PC                  fetch3
fetch3   ALU    *      *     PC ← A + 4              ALU0
fetch3   ALUi   *      *     PC ← A + 4              ALUi0
fetch3   LW     *      *     PC ← A + 4              LW0
fetch3   SW     *      *     PC ← A + 4              SW0
fetch3   J      *      *     PC ← A + 4              J0
fetch3   JAL    *      *     PC ← A + 4              JAL0
fetch3   JR     *      *     PC ← A + 4              JR0
fetch3   JALR   *      *     PC ← A + 4              JALR0
fetch3   beqz   *      *     PC ← A + 4              beqz0
...
ALU0     *      *      *     A ← Reg[rs]             ALU1
ALU1     *      *      *     B ← Reg[rt]             ALU2
ALU2     *      *      *     Reg[rd] ← func(A,B)     fetch0

Microprogram in the ROM (cont.)

State    Op     zero?  busy  Control points          next-state
ALUi0    *      *      *     A ← Reg[rs]             ALUi1
ALUi1    sExt   *      *     B ← sExt16(Imm)         ALUi2
ALUi1    uExt   *      *     B ← uExt16(Imm)         ALUi2
ALUi2    *      *      *     Reg[rt] ← Op(A,B)       fetch0
...
J0       *      *      *     A ← PC                  J1
J1       *      *      *     B ← IR                  J2
J2       *      *      *     PC ← JumpTarg(A,B)      fetch0
...
beqz0    *      *      *     A ← Reg[rs]             beqz1
beqz1    *      yes    *     A ← PC                  beqz2
beqz1    *      no     *     ....                    fetch0
beqz2    *      *      *     B ← sExt16(Imm)         beqz3
beqz3    *      *      *     PC ← A + B              fetch0
...

JumpTarg(A,B) = {A[31:28], B[25:0], 00}

Size of Control Store

[Figure: the Control ROM is addressed by w bits of status & opcode plus s bits of µPC; its data output is the next µPC (s bits) plus c control signals.]

$\text{size} = 2^{(w+s)} \times (c + s)$

MIPS: w = 6 + 2, c = 17, s = ?
no. of steps per opcode = 4 to 6 + fetch-sequence
no. of states ≈ (4 steps per op-group) × op-groups + common sequences
              = 4 × 8 + 10 states = 42 states ⇒ s = 6
Control ROM = $2^{(8+6)} \times 23$ bits ≈ 48 Kbytes

Reducing Control Store Size

Control store has to be fast ⇒ expensive

• Reduce the ROM height (= address bits)
  – reduce inputs by extra external logic: each input bit doubles the size of the control store
  – reduce states by grouping opcodes: find common sequences of actions
  – condense input status bits: combine all exceptions into one, i.e., exception/no-exception
• Reduce the ROM width
  – restrict the next-state encoding: Next, Dispatch on opcode, Wait for memory, ...
  – encode control signals (vertical microcode)

MIPS Controller V2

[Figure: the Opcode (with an ext input) passes through input encoding to produce an op-group, which reduces ROM height. µPCSrc selects the next µPC (state) from µPC+1, the op-group, or an absolute address (start of a predetermined sequence). The jump logic takes zero, busy and µJumpType. The Control ROM outputs the jump address, µJumpType = next | spin | fetch | dispatch | feqz | fnez (a next-state encoding that reduces ROM width) and the Control Signals (17).]

Jump Logic

µPCSrc = Case µJumpTypes
  next     ⇒ µPC+1
  spin     ⇒ if (busy) then µPC else µPC+1
  fetch    ⇒ absolute
  dispatch ⇒ op-group
  feqz     ⇒ if (zero) then absolute else µPC+1
  fnez     ⇒ if (zero) then µPC+1 else absolute

Instruction Fetch & ALU: MIPS-Controller-2

State    Control points          next-state
fetch0   MA ← PC                 next
fetch1   IR ← Memory             spin
fetch2   A ← PC                  next
fetch3   PC ← A + 4              dispatch
...
ALU0     A ← Reg[rs]             next
ALU1     B ← Reg[rt]             next
ALU2     Reg[rd] ← func(A,B)     fetch
ALUi0    A ← Reg[rs]             next
ALUi1    B ← sExt16(Imm)         next
ALUi2    Reg[rt] ← Op(A,B)       fetch

Load & Store: MIPS-Controller-2

State    Control points          next-state
LW0      A ← Reg[rs]             next
LW1      B ← sExt16(Imm)         next
LW2      MA ← A + B              next
LW3      Reg[rt] ← Memory        spin
LW4                              fetch
SW0      A ← Reg[rs]             next
SW1      B ← sExt16(Imm)         next
SW2      MA ← A + B              next
SW3      Memory ← Reg[rt]        spin
SW4                              fetch

Branches: MIPS-Controller-2

State    Control points          next-state
BEQZ0    A ← Reg[rs]             next
BEQZ1                            fnez
BEQZ2    A ← PC                  next
BEQZ3    B ← sExt16(Imm << 2)    next
BEQZ4    PC ← A + B              fetch
BNEZ0    A ← Reg[rs]             next
BNEZ1                            feqz
BNEZ2    A ← PC                  next
BNEZ3    B ← sExt16(Imm << 2)    next
BNEZ4    PC ← A + B              fetch

Jumps: MIPS-Controller-2

State    Control points          next-state
J0       A ← PC                  next
J1       B ← IR                  next
J2       PC ← JumpTarg(A,B)      fetch
JR0      A ← Reg[rs]             next
JR1      PC ← A                  fetch
JAL0     A ← PC                  next
JAL1     Reg[31] ← A             next
JAL2     B ← IR                  next
JAL3     PC ← JumpTarg(A,B)      fetch
JALR0    A ← PC                  next
JALR1    B ← Reg[rs]             next
JALR2    Reg[31] ← A             next
JALR3    PC ← B                  fetch

Five-minute break to stretch your legs

Implementing Complex Instructions

[Figure: the same bus-based datapath as in "A Bus-based Datapath for MIPS".]

rd ← M(rs) op (rt)          Reg-Memory-src ALU op
M(rd) ← (rs) op (rt)        Reg-Memory-dst ALU op
M(rd) ← M(rs) op M(rt)      Mem-Mem ALU op

Mem-Mem ALU Instructions: MIPS-Controller-2

Mem-Mem ALU op: M(rd) ← M(rs) op M(rt)

State    Control points          next-state
ALUMM0   MA ← Reg[rs]            next
ALUMM1   A ← Memory              spin
ALUMM2   MA ← Reg[rt]            next
ALUMM3   B ← Memory              spin
ALUMM4   MA ← Reg[rd]            next
ALUMM5   Memory ← func(A,B)      spin
ALUMM6                           fetch

Complex instructions usually do not require datapath modifications in a microprogrammed implementation, only extra space for the control program.
Implementing these instructions using a hardwired controller is difficult without datapath modifications.

Performance Issues

Microprogrammed control ⇒ multiple cycles per instruction

Cycle time? $t_C > \max(t_{\text{reg-reg}}, t_{\text{ALU}}, t_{\mu\text{ROM}}, t_{\text{RAM}})$

Given complex control, $t_{\text{ALU}}$ and $t_{\text{RAM}}$ can be broken into multiple cycles. However, $t_{\mu\text{ROM}}$ cannot be broken down. Hence $t_C > \max(t_{\text{reg-reg}}, t_{\mu\text{ROM}})$.

Suppose $10 \cdot t_{\mu\text{ROM}} < t_{\text{RAM}}$. Then good performance, relative to the single-cycle hardwired implementation, can be achieved even with a CPI of 10.

Horizontal vs Vertical µCode

[Figure axes: bits per µinstruction vs. number of µinstructions.]

• Horizontal µcode has wider µinstructions
  – Multiple parallel operations per µinstruction
  – Fewer steps per macroinstruction
  – Sparser encoding ⇒ more bits
• Vertical µcode has narrower µinstructions
  – Typically a single datapath operation per µinstruction
  – Separate µinstruction for branches
  – More steps per macroinstruction
  – More compact ⇒ fewer bits
• Nanocoding
  – Tries to combine the best of horizontal and vertical µcode

Nanocoding

Exploits recurring control signal patterns in µcode, e.g.,
  ALU0:   A ← Reg[rs]
  ...
  ALUi0:  A ← Reg[rs]
  ...

[Figure: the µPC (state) supplies the µaddress to the µcode ROM, which outputs the next state and a nanoaddress; the nanoaddress indexes the nanoinstruction ROM, whose data output drives the control signals.]

• MC68000 had 17-bit µcode containing either a 10-bit µjump or a 9-bit nanoinstruction pointer
  – Nanoinstructions were 68 bits wide, decoded to give 196 control signals

Some more history …

• IBM 360
• Microcoding through the seventies
• Microcoding now

Microprogramming in IBM 360

                         M30     M40     M50     M65
Datapath width (bits)    8       16      32      64
µinst width (bits)       50      52      85      87
µcode size (K µinsts)    4       4       2.75    2.75
µstore technology        CCROS   TCROS   BCROS   BCROS
µstore cycle (ns)        750     625     500     200
memory cycle (ns)        1500    2500    2000    750
Rental fee ($K/month)    4       7       15      35

Only the fastest models (75 and 95) were hardwired.

Microcode Emulation

• IBM initially miscalculated the importance of software compatibility with earlier models when introducing the 360 series
• Honeywell stole some IBM 1401 customers by offering translation software ("Liberator") for the Honeywell H200 series machine
• IBM retaliated with optional additional microcode for the 360 series that could emulate the IBM 1401 ISA, later extended for the IBM 7000 series
  – one popular program on the 1401 was a 650 simulator, so some customers ran many 650 programs on emulated 1401s
  – (650 simulated on 1401 emulated on 360)

Microprogramming thrived in the Seventies

• Significantly faster ROMs than DRAMs were available
• For complex instruction sets, datapath and controller were cheaper and simpler
• New instructions, e.g., floating point, could be supported without datapath modifications
• Fixing bugs in the controller was easier
• ISA compatibility across various models could be achieved easily and cheaply

Except for the cheapest and fastest machines, all computers were microprogrammed.

Writable Control Store (WCS)

• Implement control store with SRAM, not ROM
  – MOS SRAM memories now almost as fast as control store (core memories/DRAMs were 2-10x slower)
  – Bug-free microprograms difficult to write
• User-WCS provided as an option on several minicomputers
  – Allowed users to change microcode for each process
• User-WCS failed
  – Little or no programming tools support
  – Difficult to fit software into small space
  – Microcode control tailored to original ISA, less useful for others
  – Large WCS part of processor state: expensive context switches
  – Protection difficult if user can change microcode
  – Virtual memory required restartable microcode

Microprogramming: late seventies

• With the advent of VLSI technology, assumptions about ROM & RAM speed became invalid
• Micromachines became more complicated
• Micromachines were pipelined to overcome slower ROM
• Complex instruction sets led to the need for subroutine and call stacks in µcode
• Need for fixing bugs in control programs was in conflict with the read-only nature of µROM ⇒ WCS (B1700, QMachine, Intel432, …)
• Introduction of caches and buffers, especially for instructions, made multiple-cycle execution of reg-reg instructions unattractive

Modern Usage

• Microprogramming is far from extinct
• Played a crucial role in micros of the Eighties
  – Motorola 68K series
  – Intel 386 and 486
• Microcode plays an assisting role in most modern CISC micros (AMD Athlon, Intel Pentium-4 ...)
  – Most instructions are executed directly, i.e., with hard-wired control
  – Infrequently-used and/or complicated instructions invoke the microcode engine
• Patchable microcode is common for post-fabrication bug fixes, e.g., Intel Pentiums load µcode patches at bootup

Thank you
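As a recap of two technical points from the lecture, the sketch below models the MIPS-Controller-2 jump logic (µPCSrc as a function of µJumpType, busy and zero) and the control-store size formula 2^(w+s) × (c+s). It is a minimal illustration under stated assumptions, not the lecture's implementation: the numeric µPC addresses (fetch0 at 0, ALU0 at 4), the function names, and the memory model that simply reports busy for a fixed number of cycles are all hypothetical.

```python
def next_upc(upc, jump_type, busy, zero, absolute, op_group):
    """Jump logic: choose the next µPC from the µJumpType field."""
    if jump_type == "next":
        return upc + 1
    if jump_type == "spin":
        return upc if busy else upc + 1
    if jump_type == "fetch":
        return absolute
    if jump_type == "dispatch":
        return op_group
    if jump_type == "feqz":
        return absolute if zero else upc + 1
    if jump_type == "fnez":
        return upc + 1 if zero else absolute
    raise ValueError(f"unknown µJumpType: {jump_type}")


# Fetch and ALU microprograms; the addresses 0..6 are an assumed layout.
MICROCODE = [
    ("MA <- PC", "next"),               # 0: fetch0
    ("IR <- Memory", "spin"),           # 1: fetch1
    ("A <- PC", "next"),                # 2: fetch2
    ("PC <- A + 4", "dispatch"),        # 3: fetch3
    ("A <- Reg[rs]", "next"),           # 4: ALU0
    ("B <- Reg[rt]", "next"),           # 5: ALU1
    ("Reg[rd] <- func(A,B)", "fetch"),  # 6: ALU2
]
FETCH0, FETCH1, ALU0 = 0, 1, 4


def trace_upc(busy_cycles, steps):
    """Return the µPC visited on each of `steps` cycles for an ALU instruction.

    Memory is modelled (as an assumption) as busy for `busy_cycles` cycles
    each time the fetch1 microinstruction is first reached.
    """
    upc, remaining, visited = FETCH0, busy_cycles, []
    for _ in range(steps):
        _, jump_type = MICROCODE[upc]
        visited.append(upc)
        busy = upc == FETCH1 and remaining > 0
        if busy:
            remaining -= 1
        upc = next_upc(upc, jump_type, busy, zero=False,
                       absolute=FETCH0, op_group=ALU0)
    return visited


def control_store_bits(w, s, c):
    """Control ROM size in bits: 2^(w+s) words of (c+s) bits each."""
    return 2 ** (w + s) * (c + s)


if __name__ == "__main__":
    print(trace_upc(busy_cycles=2, steps=10))
    print(control_store_bits(w=8, s=6, c=17) // 8, "bytes")
```

With w = 8, s = 6 and c = 17 the formula gives 376,832 bits, or 47,104 bytes, which is in line with the slide's rounded figure of about 48 Kbytes. The trace shows the µPC holding at fetch1 while memory is busy and then dispatching to the ALU sequence.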