Microprogramming

Arvind
Computer Science and Artificial Intelligence Lab, M.I.T.
Based on the material prepared by Arvind and Krste Asanovic
6.823 L4, September 21, 2005

ISA to Microarchitecture Mapping
• An ISA is often designed for a particular microarchitectural style, e.g.,
  – CISC ⇒ microcoded
  – RISC ⇒ hardwired, pipelined
  – VLIW ⇒ fixed-latency in-order pipelines
  – JVM ⇒ software interpretation
• But an ISA can be implemented in any microarchitectural style
  – Pentium 4: hardwired pipelined CISC (x86) machine (with some microcode support)
  – This lecture: a microcoded RISC (MIPS) machine
  – Intel will probably eventually have a dynamically scheduled out-of-order VLIW (IA-64) processor
  – PicoJava: a hardware JVM processor

Microarchitecture: Implementation of an ISA
[Figure: a Controller drives the Data path through control points and receives status lines back from it.]
Structure: How components are connected. Static
Behavior: How data moves between components. Dynamic

Microcontrol Unit (Maurice Wilkes, 1954)
Embed the control logic state table in a memory array.
[Figure: Decoder, Matrix A and Matrix B; the inputs are the op, a conditional code flip-flop and the µ address; the outputs are the next state and the control lines to ALU, MUXs, Registers.]

Microcoded Microarchitecture
[Figure: the µcontroller (ROM) takes busy, zero and opcode as inputs and drives the Datapath; the Datapath exchanges Data and Addr with Memory (RAM), which is controlled by enMem and MemWrt.]
µcontroller (ROM): holds fixed microcode instructions
Memory (RAM): holds user program written in macrocode instructions (e.g., MIPS, x86, etc.)
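The idea of embedding the control logic state table in a memory array can be sketched in a few lines of Python. This is only an illustrative toy, not the encoding used in the lecture: the state names `s0` and `s1`, the table contents, and the helpers `step` and `run` are all assumptions made for the example. Each controller cycle is a single table lookup keyed by the current state and a status bit.

```python
from typing import Dict, List, Tuple

# Hypothetical toy control store; state names and contents are illustrative only.
# Key:   (current state, busy status bit)
# Value: (control lines to assert this cycle, next state)
ControlStore = Dict[Tuple[str, bool], Tuple[List[str], str]]

CONTROL_STORE: ControlStore = {
    ("s0", False): (["ldMA"], "s1"),
    ("s0", True): (["ldMA"], "s1"),
    ("s1", True): ([], "s1"),                   # memory busy: assert nothing, stay put
    ("s1", False): (["enMem", "ldIR"], "s0"),   # memory ready: latch the instruction
}


def step(state: str, busy: bool) -> Tuple[List[str], str]:
    """One controller cycle: a memory lookup stands in for hand-built logic."""
    return CONTROL_STORE[(state, busy)]


def run(busy_trace: List[bool], state: str = "s0") -> List[Tuple[str, List[str]]]:
    """Feed one busy bit per cycle and record (state, asserted control lines)."""
    log = []
    for busy in busy_trace:
        lines, next_state = step(state, busy)
        log.append((state, lines))
        state = next_state
    return log


if __name__ == "__main__":
    for cycle, (state, lines) in enumerate(run([False, True, True, False])):
        print(cycle, state, lines)
```

Changing the controller's behavior here means editing table entries rather than rewiring logic, which is the property that makes a microcoded controller easy to extend.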
The MIPS32 ISA
• Processor State
  32 32-bit GPRs, R0 always contains a 0
  16 double-precision/32 single-precision FPRs
  FP status register, used for FP compares and exceptions
  PC, the program counter
  some other special registers
  (See H&P p129-137 and Appendix C (online) for full description)
• Data types
  8-bit byte, 16-bit half word
  32-bit word for integers
  32-bit word for single precision floating point
  64-bit word for double precision floating point
• Load/Store style instruction set
  data addressing modes: immediate and indexed
  branch addressing modes: PC relative and register indirect
  Byte addressable memory: big-endian mode
All instructions are 32 bits

MIPS Instruction Formats (field widths in bits)
ALU         | 0 (6)      | rs (5) | rt (5) | rd (5) | 0 (5) | func (6) |  rd ← (rs) func (rt)
ALUi        | opcode (6) | rs (5) | rt (5) | immediate (16)            |  rt ← (rs) op immediate
Mem         | opcode (6) | rs (5) | rt (5) | displacement (16)         |  M[(rs) + displacement]
BEQZ, BNEZ  | opcode (6) | rs (5) | (5)    | offset (16)               |
JR, JALR    | opcode (6) | rs (5) | (5)    | (16)                      |
J, JAL      | opcode (6) | offset (26)                                 |

A Bus-based Datapath for MIPS
[Figure: a 32-bit bus connects the IR, the A and B registers feeding the ALU, the register file (32 GPRs + PC ..., 32-bit Reg), the immediate extender (Imm Ext), and the MA register that addresses Memory. RegSel chooses among 32(PC), 31(Link), rd, rt, rs. Control signals: ldIR, ldA, ldB, ldMA, OpSel, ALU control, RegSel, RegWrt, enReg, ExtSel, enImm, enALU, MemWrt, enMem. Status signals: Opcode, zero, Busy.]
Microinstruction: register to register transfer (17 control signals)
  MA ← PC      means RegSel = PC; enReg = yes; ldMA = yes
  B ← Reg[rt]  means RegSel = rt; enReg = yes; ldB = yes

Memory Module
[Figure: a RAM with addr, din and dout ports connected to the bus, a write-enable input we driven by Write(1)/Read(0), an Enable input, and a busy output.]
Assumption: Memory operates asynchronously and is slow compared to Reg-to-Reg transfers

Instruction Execution
Execution of a MIPS instruction involves
1. instruction fetch
2. decode and register fetch
3. ALU operation
4. memory operation (optional)
5. write back to register file (optional)
+ the computation of the next instruction address

Microprogram Fragments
instr fetch:  MA ← PC
              A ← PC
              IR ← Memory
              PC ← A + 4
              dispatch on OPcode
              (can be treated as a macro)

ALU:          A ← Reg[rs]
              B ← Reg[rt]
              Reg[rd] ← func(A,B)
              do instruction fetch

ALUi:         A ← Reg[rs]
              B ← Imm              sign extension ...
              Reg[rt] ← Opcode(A,B)
              do instruction fetch

Microprogram Fragments (cont.)
LW:           A ← Reg[rs]
              B ← Imm
              MA ← A + B
              Reg[rt] ← Memory
              do instruction fetch

J:            A ← PC
              B ← IR
              PC ← JumpTarg(A,B)
              do instruction fetch
              where JumpTarg(A,B) = {A[31:28], B[25:0], 00}

beqz:         A ← Reg[rs]
              If zero(A) then go to bz-taken
              do instruction fetch

bz-taken:     A ← PC
              B ← Imm << 2
              PC ← A + B
              do instruction fetch

MIPS Microcontroller: first attempt
[Figure: a µProgram ROM addressed by Opcode (6 bits), zero, Busy (memory) and the µPC (state, s bits); its data output is the next state plus the Control Signals (17).]
Latching the inputs may cause a one-cycle delay
How big is "s"?
ROM size = $2^{(opcode+status+s)}$ words
Word size = control + s bits

Microprogram in the ROM worksheet
State    Op     zero  busy  Control points         next-state
fetch0   *      *     *     MA ← PC                fetch1
fetch1   *      *     yes   ....                   fetch1
fetch1   *      *     no    IR ← Memory            fetch2
fetch2   *      *     *     A ← PC                 fetch3
fetch3                      PC ← A + 4             ?
fetch3   ALU    *     *     PC ← A + 4             ALU0
ALU0     *      *     *     A ← Reg[rs]            ALU1
ALU1     *      *     *     B ← Reg[rt]            ALU2
ALU2     *      *     *     Reg[rd] ← func(A,B)    fetch0
(* = any value)

Microprogram in the ROM
State    Op     zero  busy  Control points         next-state
fetch0   *      *     *     MA ← PC                fetch1
fetch1   *      *     yes   ....                   fetch1
fetch1   *      *     no    IR ← Memory            fetch2
fetch2   *      *     *     A ← PC                 fetch3
fetch3   ALU    *     *     PC ← A + 4             ALU0
fetch3   ALUi   *     *     PC ← A + 4             ALUi0
fetch3   LW     *     *     PC ← A + 4             LW0
fetch3   SW     *     *     PC ← A + 4             SW0
fetch3   J      *     *     PC ← A + 4             J0
fetch3   JAL    *     *     PC ← A + 4             JAL0
fetch3   JR     *     *     PC ← A + 4             JR0
fetch3   JALR   *     *     PC ← A + 4             JALR0
fetch3   beqz   *     *     PC ← A + 4             beqz0
...
ALU0     *      *     *     A ← Reg[rs]            ALU1
ALU1     *      *     *     B ← Reg[rt]            ALU2
ALU2     *      *     *     Reg[rd] ← func(A,B)    fetch0

Microprogram in the ROM (cont.)
State    Op     zero  busy  Control points            next-state
ALUi0    *      *     *     A ← Reg[rs]               ALUi1
ALUi1    sExt   *     *     B ← sExt16(Imm)           ALUi2
ALUi1    uExt   *     *     B ← uExt16(Imm)           ALUi2
ALUi2    *      *     *     Reg[rt] ← Op(A,B)         fetch0
...
J0       *      *     *     A ← PC                    J1
J1       *      *     *     B ← IR                    J2
J2       *      *     *     PC ← JumpTarg(A,B)        fetch0
...
beqz0    *      *     *     A ← Reg[rs]               beqz1
beqz1    *      yes   *     A ← PC                    beqz2
beqz1    *      no    *     ....                      fetch0
beqz2    *      *     *     B ← sExt16(Imm)           beqz3
beqz3    *      *     *     PC ← A + B                fetch0
...
JumpTarg(A,B) = {A[31:28], B[25:0], 00}
(* = any value)

Size of Control Store
[Figure: the Control ROM is addressed by w bits of status and opcode plus s bits of µPC; its data output is the next µPC (s bits) and the control signals (c bits).]
size = $2^{(w+s)} \times (c + s)$ bits
MIPS: w = 6+2, c = 17, s = ?
no. of steps per opcode = 4 to 6 + fetch-sequence
no. of states ≈ (4 steps per op-group) x op-groups + common sequences
              = 4 x 8 + 10 states = 42 states ⇒ s = 6
Control ROM = $2^{(8+6)} \times 23$ bits ≈ 48 Kbytes

Reducing Control Store Size
Control store has to be fast ⇒ expensive
• Reduce the ROM height (= address bits)
  – reduce inputs by extra external logic
    (each input bit doubles the size of the control store)
  – reduce states by grouping opcodes
    (find common sequences of actions)
  – condense input status bits
    (combine all exceptions into one, i.e., exception/no-exception)
• Reduce the ROM width
  – restrict the next-state encoding: Next, Dispatch on opcode, Wait for memory, ...
  – encode control signals (vertical microcode)

MIPS Controller V2
[Figure: the Opcode passes through an input-encoding block that produces the op-group (ext), which reduces ROM height. µPCSrc selects the next µPC (state) from µPC+1, the absolute jump address held in the Control ROM (start of a predetermined sequence), or the op-group. The jump logic takes zero, busy and µJumpType = next | spin | fetch | dispatch | feqz | fnez; this next-state encoding reduces ROM width. The Control ROM data output supplies the Control Signals (17).]

Jump Logic
µPCSrc = Case µJumpTypes
  next     ⇒ µPC+1
  spin     ⇒ if (busy) then µPC else µPC+1
  fetch    ⇒ absolute
  dispatch ⇒ op-group
  feqz     ⇒ if (zero) then absolute else µPC+1
  fnez     ⇒ if (zero) then µPC+1 else absolute

Instruction Fetch and ALU: MIPS-Controller-2
State    Control points          next-state
fetch0   MA ← PC                 next
fetch1   IR ← Memory             spin
fetch2   A ← PC                  next
fetch3   PC ← A + 4              dispatch
...
ALU0     A ← Reg[rs]             next
ALU1     B ← Reg[rt]             next
ALU2     Reg[rd] ← func(A,B)     fetch
ALUi0    A ← Reg[rs]             next
ALUi1    B ← sExt16(Imm)         next
ALUi2    Reg[rt] ← Op(A,B)       fetch

Load and Store: MIPS-Controller-2
State    Control points          next-state
LW0      A ← Reg[rs]             next
LW1      B ← sExt16(Imm)         next
LW2      MA ← A + B              next
LW3      Reg[rt] ← Memory        spin
LW4                              fetch
SW0      A ← Reg[rs]             next
SW1      B ← sExt16(Imm)         next
SW2      MA ← A + B              next
SW3      Memory ← Reg[rt]        spin
SW4                              fetch

Branches: MIPS-Controller-2
State    Control points          next-state
BEQZ0    A ← Reg[rs]             next
BEQZ1                            fnez
BEQZ2    A ← PC                  next
BEQZ3    B ← sExt16(Imm << 2)    next
BEQZ4    PC ← A + B              fetch
BNEZ0    A ← Reg[rs]             next
BNEZ1                            feqz
BNEZ2    A ← PC                  next
BNEZ3    B ← sExt16(Imm << 2)    next
BNEZ4    PC ← A + B              fetch

Jumps: MIPS-Controller-2
State    Control points          next-state
J0       A ← PC                  next
J1       B ← IR                  next
J2       PC ← JumpTarg(A,B)      fetch
JR0      A ← Reg[rs]             next
JR1      PC ← A                  fetch
JAL0     A ← PC                  next
JAL1     Reg[31] ← A             next
JAL2     B ← IR                  next
JAL3     PC ← JumpTarg(A,B)      fetch
JALR0    A ← PC                  next
JALR1    B ← Reg[rs]             next
JALR2    Reg[31] ← A             next
JALR3    PC ← B                  fetch

Five-minute break to stretch your legs

Implementing Complex Instructions
[Figure: the bus-based datapath for MIPS, with the same registers, control signals and status signals as on the earlier datapath slide.]
  rd ← M[(rs)] op (rt)             Reg-Memory-src ALU op
  M[(rd)] ← (rs) op (rt)           Reg-Memory-dst ALU op
  M[(rd)] ← M[(rs)] op M[(rt)]     Mem-Mem ALU op

Mem-Mem ALU Instructions: MIPS-Controller-2
Mem-Mem ALU op: M[(rd)] ← M[(rs)] op M[(rt)]
State    Control points          next-state
ALUMM0   MA ← Reg[rs]            next
ALUMM1   A ← Memory              spin
ALUMM2   MA ← Reg[rt]            next
ALUMM3   B ← Memory              spin
ALUMM4   MA ← Reg[rd]            next
ALUMM5   Memory ← func(A,B)      spin
ALUMM6                           fetch
Complex instructions usually do not require datapath modifications in a microprogrammed implementation; they only need extra space for the control program.
Implementing these instructions using a hardwired controller is difficult without datapath modifications.

Performance Issues
Microprogrammed control ⇒ multiple cycles per instruction
Cycle time? $t_C > \max(t_{reg\text{-}reg}, t_{ALU}, t_{\mu ROM}, t_{RAM})$
Given complex control, $t_{ALU}$ and $t_{RAM}$ can be broken into multiple cycles. However, $t_{\mu ROM}$ cannot be broken down.
Hence $t_C > \max(t_{reg\text{-}reg}, t_{\mu ROM})$
Suppose $10 \times t_{\mu ROM} < t_{RAM}$
Good performance, relative to the single-cycle hardwired implementation, can be achieved even with a CPI of 10

Horizontal vs Vertical µCode
[Figure: axes labeled "Bits per µInstruction" and "µInstructions".]
• Horizontal µcode has wider µinstructions
  – Multiple parallel operations per µinstruction
  – Fewer steps per macroinstruction
  – Sparser encoding ⇒ more bits
• Vertical µcode has narrower µinstructions
  – Typically a single datapath operation per µinstruction
    (separate µinstruction for branches)
  – More steps per macroinstruction
  – More compact ⇒ fewer bits
• Nanocoding
  – Tries to combine best of horizontal and vertical µcode

Nanocoding
Exploits recurring control signal patterns in µcode, e.g.,
  ALU0    A ← Reg[rs]
  ...
  ALUi0   A ← Reg[rs]
  ...
[Figure: the µPC (state) supplies the µaddress to the µcode ROM, which outputs the µcode next-state and a nanoaddress; the nanoaddress indexes the nanoinstruction ROM, which outputs the data.]
• MC68000 had 17-bit µcode containing either 10-bit µjump or 9-bit nanoinstruction pointer
  – Nanoinstructions were 68 bits wide, decoded to give 196 control signals

Some more history …
• IBM 360
• Microcoding through the seventies
• Microcoding now

Microprogramming in IBM 360
                          M30     M40     M50     M65
Datapath width (bits)     8       16      32      64
µinst width (bits)        50      52      85      87
µcode size (K µinsts)     4       4       2.75    2.75
µstore technology         CCROS   TCROS   BCROS   BCROS
µstore cycle (ns)         750     625     500     200
memory cycle (ns)         1500    2500    2000    750
Rental fee ($K/month)     4       7       15      35
Only the fastest models (75 and 95) were hardwired

Microcode Emulation
• IBM initially miscalculated the importance of software compatibility with earlier models when introducing the 360 series
• Honeywell stole some IBM 1401 customers by offering translation software ("Liberator") for Honeywell H200 series machines
• IBM retaliated with optional additional microcode for the 360 series that could emulate the IBM 1401 ISA, later extended for the IBM 7000 series
  – one popular program on the 1401 was a 650 simulator, so some customers ran many 650 programs on emulated 1401s
  – (650 simulated on 1401 emulated on 360)

Microprogramming thrived in the Seventies
• Significantly faster ROMs than DRAMs were available
• For complex instruction sets, datapath and controller were cheaper and simpler
• New instructions, e.g., floating point, could be supported without datapath modifications
• Fixing bugs in the controller was easier
• ISA compatibility across various models could be achieved easily and cheaply
Except for the cheapest and fastest machines, all computers were microprogrammed

Writable Control Store (WCS)
• Implement control store with SRAM, not ROM
  – MOS SRAM memories now almost as fast as control store (core memories/DRAMs were 2-10x slower)
  – Bug-free microprograms difficult to write
• User-WCS provided as option on several minicomputers
  – Allowed users to change microcode for each process
• User-WCS failed
  – Little or no programming tools support
  – Difficult to fit software into small space
  – Microcode control tailored to original ISA, less useful for others
  – Large WCS part of processor state: expensive context switches
  – Protection difficult if user can change microcode
  – Virtual memory required restartable microcode

Microprogramming: late seventies
• With the advent of VLSI technology, assumptions about ROM and RAM speed became invalid
• Micromachines became more complicated
  – Micromachines were pipelined to overcome slower ROM
  – Complex instruction sets led to the need for subroutine and call stacks in µcode
  – Need for fixing bugs in control programs was in conflict with read-only nature of µROM ⇒ WCS (B1700, QMachine, Intel432, …)
• Introduction of caches and buffers, especially for instructions, made multiple-cycle execution of reg-reg instructions unattractive

Modern Usage
• Microprogramming is far from extinct
• Played a crucial role in micros of the Eighties
  – Motorola 68K series
  – Intel 386 and 486
• Microcode plays an assisting role in most modern CISC micros (AMD Athlon, Intel Pentium 4 ...)
  – Most instructions are executed directly, i.e., with hardwired control
  – Infrequently used and/or complicated instructions invoke the microcode engine
• Patchable microcode common for post-fabrication bug fixes, e.g., Intel Pentiums load µcode patches at boot-up

Thank you
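
As a recap of the Jump Logic slide, the case statement that picks the next µPC in Controller V2 can be sketched in Python. This is a hedged illustration only: the function name `next_upc`, its keyword parameters, and the use of plain integers for µPC values are assumptions for the example and are not taken from the lecture.

```python
def next_upc(upc: int, jump_type: str, *, busy: bool = False, zero: bool = False,
             absolute: int = 0, opgroup: int = 0) -> int:
    """Return the next µPC for one µJumpType (illustrative sketch).

    upc      -- current µPC
    absolute -- jump address field from the control ROM word
    opgroup  -- start address produced by the opcode input encoding
    """
    if jump_type == "next":        # fall through to the next µinstruction
        return upc + 1
    if jump_type == "spin":        # wait in place while memory is busy
        return upc if busy else upc + 1
    if jump_type == "fetch":       # jump to a predetermined sequence
        return absolute
    if jump_type == "dispatch":    # jump to the sequence for this op-group
        return opgroup
    if jump_type == "feqz":        # jump to absolute if zero, else continue
        return absolute if zero else upc + 1
    if jump_type == "fnez":        # jump to absolute if not zero, else continue
        return upc + 1 if zero else absolute
    raise ValueError(f"unknown µJumpType: {jump_type}")


if __name__ == "__main__":
    # A spin state holds the µPC for as long as memory reports busy.
    upc = 1
    for busy in (True, True, False):
        upc = next_upc(upc, "spin", busy=busy)
        print(upc)
```

Because each µinstruction carries only a jump type and, where needed, one absolute address, the control ROM word no longer has to store a full next state for every combination of inputs, which is how this encoding reduces ROM width.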