Question? Leave a message!




Pipeline Hazards

Pipeline Hazards
Dr.ShaneMatts Profile Pic
Dr.ShaneMatts,United States,Teacher
Published Date:23-07-2017
Website URL
Comment
1 Pipeline Hazards Arvind Computer Science and Artificial Intelligence Laboratory M.I.T. Based on the material prepared by Arvind and Krste Asanovic6.823 L6- 2 Arvind Technology Assumptions • A small amount of very fast memory (caches) backed up by a large, slower memory • Fast ALU (at least for integers) • Multiported Register files (slower) It makes the following timing assumption valid t ≈ t ≈ t ≈ t ≈ t IM RF ALU DM RW A 5-stage pipelined Harvard architecture will be the focus of our detailed design September 28, 20056.823 L6- 3 Arvind 5-Stage Pipelined Execution 0x4 Add we rs1 rs2 rd1 we PC addr ws IR addr rdata ALU wd rd2 rdata GPRs Data Inst. Memory Imm Memory wdata Ext Write Decode, Reg. Fetch I-Fetch Execute Memory -Back (ID) (IF) (EX) (MA) (WB) time t0 t1 t2 t3 t4 t5 t6 t7 . . . . instruction1 IF ID EX MA WB 1 1 1 1 1 instruction2 IF ID EX MA WB 2 2 2 2 2 instruction3 IF ID EX MA WB 3 3 3 3 3 instruction4 IF ID EX MA WB 4 4 4 4 4 instruction5 IF ID EX MA WB 5 5 5 5 5 September 28, 20056.823 L6- 4 Arvind 5-Stage Pipelined Execution Resource Usage Diagram 0x4 Add we rs1 rs2 rd1 we PC addr ws IR addr rdata ALU wd rd2 rdata GPRs Data Inst. Memory Imm Memory wdata Ext Write Decode, Reg. Fetch I-Fetch Execute Memory -Back (ID) (IF) (EX) (MA) (WB) time t0 t1 t2 t3 t4 t5 t6 t7 . . . . IF I I I I I 1 2 3 4 5 ID I I I I I 1 2 3 4 5 EX I I I I I 1 2 3 4 5 MA I I I I I 1 2 3 4 5 WB I I I I I 1 2 3 4 5 September 28, 2005 Resources6.823 L6- 5 Arvind Pipelined Execution: ALU Instructions 0x4 IR IR IR Add 31 we rs1 rs2 A addr rd1 we PC ws inst IR addr ALU Y wd rd2 Inst GPRs B rdata Memory Data R Memory Imm wdata wdata Ext MD1 MD2 Not quite correct We need an Instruction Reg (IR) for each stage September 28, 20056.823 L6- 6 Arvind IR’s and Control points 0x4 IR IR IR Add 31 we rs1 rs2 A addr rd1 we PC ws inst IR addr ALU Y wd rd2 Inst GPRs B rdata Memory Data R Memory Imm wdata wdata Ext MD1 MD2 Are control points connected properly? - ALU instructions - Load/Store instructions -Write back September 28, 20056.823 L6- 7 Arvind Pipelined MIPS Datapath without jumps FD E M W IR IR IR 31 0x4 Add RegDst RegWrite we OpSel MemWrite rs1 WBSrc rs2 A addr rd1 we PC ws inst IR addr ALU Y wd rd2 Inst GPRs B rdata Memory Data R Memory Imm wdata wdata Ext MD1 MD2 ExtSel BSrc September 28, 20056.823 L6- 8 Arvind How Instructions can Interact with each other in a pipeline • An instruction in the pipeline may need a resource being used by another instruction in the pipeline – structural hazard • An instruction may produce data that is needed by a later instruction – data hazard • In the extreme case, an instruction may determine the next instruction to be executed – control hazard (branches, interrupts,...) September 28, 20056.823 L6- 9 Arvind Data Hazards r1 ← … r4 ← r1 … 0x4 IR IR IR Add 31 we rs1 rs2 A addr rd1 we PC ws inst IR addr ALU Y wd rd2 Inst GPRs B rdata Memory Data R Memory Imm wdata wdata Ext MD1 MD2 ... r1 ← r0 + 10 r4 ← r1 + 17 r1 is stale. Oops ... September 28, 20056.823 L6- 10 Arvind Resolving Data Hazards Freeze earlier pipeline stages until the data becomes available ⇒ interlocks If data is available somewhere in the datapath provide a bypass to get it to the right stage Speculate about the hazard resolution and kill the instruction later if the speculation is wrong. September 28, 20056.823 L6- 11 Arvind Feedback to Resolve Hazards FB FB FB FB 1 2 3 4 stage stage stage stage 3 4 1 2 • Detect a hazard and provide feedback to previous stages to stall or kill instructions • Controlling a pipeline in this manner works provided the instruction at stage i+1 can complete without any interference from instructions in stages 1 to i (otherwise deadlocks may occur) September 28, 20056.823 L6- 12 Arvind Interlocks to resolve Data Hazards Stall Condition 0x4 nop IR IR IR Add 31 we rs1 rs2 A addr rd1 we PC ws inst IR addr ALU Y wd rd2 Inst GPRs B rdata Memory Data R Memory Imm wdata wdata Ext ... MD1 MD2 r1 ← r0 + 10 r4 ← r1 + 17 ... September 28, 20056.823 L6- 13 Arvind Stalled Stages and Pipeline Bubbles time t0 t1 t2 t3 t4 t5 t6 t7 . . . . (I ) r1 ← (r0) + 10 IF ID EX MA WB 1 1 1 1 1 1 (I ) r4 ← (r1) + 17 IF ID ID ID ID EX MA WB 2 2 2 2 2 2 2 2 2 (I)IF IF IF IF ID EX MA WB 3 3 3 3 3 3 3 3 3 (I)IF ID EX MA WB 4 4 4 4 4 4 stalled stages (I)IF ID EX MA WB 5 5 5 5 5 5 time t0 t1 t2 t3 t4 t5 t6 t7 . . . . IF I I I I I I I I 1 2 3 3 3 3 4 5 ID I I I I I I I I 1 2 2 2 2 3 4 5 Resource EX I nop nop nop I I I I 1 2 3 4 5 Usage MA I nop nop nop I I I I 1 2 3 4 5 WB I nop nop nop I I I I 1 2 3 4 5 nop ⇒ pipeline bubble September 28, 20056.823 L6- 14 Arvind Interlock Control Logic ws stall C stall rs ? rt 0x4 nop IR IR IR Add 31 we rs1 rs2 A addr rd1 we PC ws inst IR addr ALU Y wd rd2 Inst GPRs B rdata Memory Data R Memory Imm wdata wdata Ext MD1 MD2 Compare the source registers of the instruction in the decode stage with the destination register of the uncommitted instructions. September 28, 20056.823 L6- 15 Arvind Interlocks Control Logic ignoring jumps & branches ws stall we C stall rs ? rt ws ws we we C re1 re2 C dest dest C re 0x4 nop IR IR IR Add 31 C dest we rs1 rs2 A addr rd1 we PC ws inst IR addr ALU Y wd rd2 Inst GPRs B rdata Memory Data R Memory Imm wdata wdata Ext MD1 MD2 Should we always stall if the rs field matches some rd? not every instruction writes a register ⇒ we not every instruction reads a register ⇒ re September 28, 20056.823 L6- 16 Arvind Source & Destination Registers R-type: op rs rt rd func I-type: op rs rt immediate16 J-type: op immediate26 source(s) destination ALU rd ← (rs) func (rt) rs, rt rd ALUi rt ← (rs) op imm rs rt LW rt ← M (rs) + imm rs rt SW M (rs) + imm ← (rt) rs, rt BZ cond (rs) true: PC ← (PC) + imm rs false: PC ← (PC) + 4 rs JPC ← (PC) + imm JAL r31 ← (PC), PC ← (PC) + imm 31 JR PC ← (rs) rs JALR r31 ← (PC), PC ← (rs) rs 31 September 28, 20056.823 L6- 17 Arvind Deriving the Stall Signal C C dest re ws = Case opcode re1 = Case opcode ALU ⇒ rd ALU, ALUi, ALUi, LW ⇒ rt LW, SW, BZ, JR, JALR JAL, JALR ⇒ R31 ⇒ on J, JAL ⇒ off we = Case opcode ALU, ALUi, LW ⇒(ws ≠ 0) re2 = Case opcode JAL, JALR ⇒ on ⇒ on ALU, SW ... ⇒ off ⇒ off ... C stall stall = ((rs =ws ).we + D E E (rs =ws ).we + D M M (rs =ws ).we ) . re1 + D W W D ((rt =ws ).we + D E E (rt =ws ).we + D M M (rt =ws ).we ) . re2 D W W D September 28, 2005 This is not the full story 6.823 L6- 18 Arvind Hazards due to Loads & Stores Stall Condition What if (r1)+7 = (r3)+5 ? 0x4 nop IR IR IR Add 31 we rs1 rs2 A addr rd1 we PC ws inst IR addr ALU Y wd rd2 Inst GPRs B rdata Memory Data R Memory Imm wdata wdata Ext MD1 MD2 ... Is there any possible data hazard M(r1)+7 ← (r2) r4 ← M(r3)+5 in this instruction sequence? ... September 28, 20056.823 L6- 19 Arvind Load & Store Hazards ... M(r1)+7 ← (r2) (r1)+7 = (r3)+5 ⇒ data hazard r4 ← M(r3)+5 ... However, the hazard is avoided because our memory system completes writes in one cycle Load/Store hazards, even when they do exist, are often resolved in the memory system itself. More on this later in the course. September 28, 200520 Five-minute break to stretch your legs