Question? Leave a message!




Advanced Superscalar Microprocessors

Advanced Superscalar Microprocessors
1 Advanced Superscalar Microprocessors Joel Emer Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology Based on the material prepared by Krste Asanovic and Arvind 6.823 L14 2 Emer OoO Execution with ROB R1 R1 1 1 R1 R1 tag tag t t 0 0 ii Reg Regiiste ster r Rename Rename R2 R2 2 2 t t 0 0 R2 R2 j j va valid bit lid bit R3 R3 3 3 File File Table Table t t 2 2 1 1 R3 R3 : : t t 1 1 1 1 R4 R4 :: : : Ins Ins Ins use use use exec exec exec op op op p1 p1 p1 sr sr src1 c1 c1 p2 p2 p2 src2 src2 src2 pd pd pd dest dest dest d d da a ata ta ta Ne Nex xt t t to o 0 0 X X X X ad add d X X 1 1 X X 2 2 X X R R4 4 4 4 t t 1 1 com comm mit it t t 8 8 X X ld ld X X 2 25 56 6 R R3 3 2 2 Nex Next t . . ava avaiilla ab blle e . . Reorder Reorder t t n n buffer buffer Commit Commit Load Load Store Store FU FU FU FU FU FU Uni Unit t Un Unit it t t, , r re esul sult t Basic Operation: • Enter op and tag or data (if known) for each source • Replace tag with data as it becomes available • Issue instruction when all sources are available • Save dest data when operation finishes • Commit saved dest data when instruction commits October 31, 2005 6.823 L14 3 Emer Unified Physical Register File (MIPS R10K, Alpha 21264, Pentium 4) t 1 Reg Snapshots for t 2 r t 1 i mispredict recovery . File r t 2 j t n Rename Load Store FU FU FU FU Table Unit Unit (ROB not shown) t, result • One regfile for both committed and speculative values (no data in ROB) • During decode, instruction result allocated new physical register, source regs translated to physical regs through rename table • Instruction reads data from regfile at start of execute (not in decode) • Writeback updates reg. busy bits on instructions in ROB (assoc. search) • Snapshots of rename table taken at every branch to recover mispredicts • On exception, renaming undone in reverse order of issue (MIPS R10000) October 31, 2005 6.823 L14 4 Emer Speculative OutofOrder Execution Update predictors Branch kill Resolution Branch kill kill Prediction kill OutofOrder InOrder Decode PC Fetch Reorder Buffer Commit Rename InOrder Physical Reg. File Branch Store ALU MEM D Unit Buffer Execute October 31, 2005 6.823 L14 5 Emer Lifetime of Physical Registers • Physical regfile holds committed and speculative values • Physical registers decoupled from ROB entries (no data in ROB) ld r1, (r3) ld P1, (Px) add r3, r1, 4 add P2, P1, 4 sub r6, r7, r9 sub P3, Py, Pz add r3, r3, r6 add P4, P2, P3 Rename ld r6, (r1) ld P5, (P1) add r6, r6, r3 add P6, P5, P4 st r6, (r1) st P6, (P1) ld r6, (r11) ld P7, (Pw) When can we reuse a physical register When next write of same architectural register commits October 31, 2005 6.823 L14 6 Emer Physical Register Management Rename Physical Regs Free List Table P0 P0 R0 P1 P1 ld r1, 0(r3) R1 P2 P8 P3 R2 P3 P2 add r3, r1, 4 R3 P4 P7 P4 R4 P5 R6 p sub r6, r7, r6 R5 P6 R7 p R6 P7 add r3, r3, r6 P5 R3 p R7 P8 P6 R1 p ld r6, 0(r1) Pn ROB use ex op p1 PR1 p2 PR2 Rd LPRd PRd (LPRd requires third read port on Rename Table for each instruction) October 31, 2005 6.823 L14 7 Emer Physical Register Management Rename Physical Regs Free List Table P0 P0 R0 P1 P1 ld r1, 0(r3) R1 P0 P2 P8 P3 R2 P3 P2 add r3, r1, 4 R3 P4 P7 P4 R4 P5 R6 p sub r6, r7, r6 R5 P6 R7 p R6 P7 add r3, r3, r6 P5 R3 p R7 P8 P6 R1 p ld r6, 0(r1) Pn ROB use ex op p1 PR1 p2 PR2 Rd LPRd PRd x ld p P7 r1 P8 P0 October 31, 2005 6.823 L14 8 Emer Physical Register Management Rename Physical Regs Free List Table P0 P0 R0 P1 P1 ld r1, 0(r3) R1 P0 P2 P8 P3 R2 P3 P2 add r3, r1, 4 R3 P1 P4 P7 P4 R4 P5 R6 p sub r6, r7, r6 R5 P6 R7 p R6 P7 add r3, r3, r6 P5 R3 p R7 P8 P6 R1 p ld r6, 0(r1) Pn ROB use ex op p1 PR1 p2 PR2 Rd LPRd PRd x ld p P7 r1 P8 P0 x add P0 r3 P7 P1 October 31, 2005 6.823 L14 9 Emer Physical Register Management Rename Physical Regs Free List Table P0 P0 R0 P1 P1 ld r1, 0(r3) R1 P0 P2 P8 P3 R2 P3 P2 add r3, r1, 4 R3 P1 P4 P7 P4 R4 P5 R6 p sub r6, r7, r6 R5 P6 R7 p R6 P3 P7 add r3, r3, r6 P5 R3 p R7 P8 P6 R1 p ld r6, 0(r1) Pn ROB use ex op p1 PR1 p2 PR2 Rd LPRd PRd x ld p P7 r1 P8 P0 x add P0 r3 P7 P1 x sub p P6 p P5 r6 P5 P3 October 31, 2005 6.823 L14 10 Emer Physical Register Management Rename Physical Regs Free List Table P0 P0 R0 P1 P1 ld r1, 0(r3) R1 P0 P2 P8 P3 R2 P3 P2 add r3, r1, 4 R3 P1 P2 P4 P7 P4 R4 P5 R6 p sub r6, r7, r6 R5 P6 R7 p R6 P3 P7 add r3, r3, r6 P5 R3 p R7 P8 P6 R1 p ld r6, 0(r1) Pn ROB use ex op p1 PR1 p2 PR2 Rd LPRd PRd x ld p P7 r1 P8 P0 x add P0 r3 P7 P1 x sub p P6 p P5 r6 P5 P3 x add P1 P3 r3 P1 P2 October 31, 2005 6.823 L14 11 Emer Physical Register Management Rename Physical Regs Free List Table P0 P0 R0 P1 P1 ld r1, 0(r3) R1 P0 P2 P8 P3 R2 P3 P2 add r3, r1, 4 R3 P1 P2 P4 P7 P4 R4 P5 R6 p sub r6, r7, r6 R5 P6 R7 p R6 P3 P4 P7 add r3, r3, r6 P5 R3 p R7 P8 P6 R1 p ld r6, 0(r1) Pn ROB use ex op p1 PR1 p2 PR2 Rd LPRd PRd x ld p P7 r1 P8 P0 x add P0 r3 P7 P1 x sub p P6 p P5 r6 P5 P3 x add P1 P3 r3 P1 P2 x ld P0 r6 P3 P4 October 31, 2005 6.823 L14 12 Emer Physical Register Management Rename Physical Regs Free List Table P0 R1 P0 p R0 P1 P1 ld r1, 0(r3) R1 P0 P2 P8 P3 R2 P3 P2 add r3, r1, 4 R3 P1 P2 P4 P7 P4 R4 P5 R6 p P8 sub r6, r7, r6 R5 P6 R7 p R6 P3 P4 P7 add r3, r3, r6 P5 R3 p R7 P8 P6 R1 p ld r6, 0(r1) Pn ROB use ex op p1 PR1 p2 PR2 Rd LPRd PRd Execute x x x ld ld p p P7 P7 r1 r1 P8 P0 P0 Commit x add P0 r3 P7 P1 p x sub p P6 p P5 r6 P5 P3 x add P1 P3 r3 P1 P2 x ld P0 r6 P3 P4 p October 31, 2005 6.823 L14 13 Emer Physical Register Management Rename Physical Regs Free List Table P0 R1 P0 p R0 P1 R3 p P1 ld r1, 0(r3) R1 P0 P2 P8 P3 R2 P3 P2 add r3, r1, 4 R3 P1 P2 P4 P7 P4 R4 P5 R6 p P8 sub r6, r7, r6 R5 P6 R7 p P7 R6 P3 P4 P7 add r3, r3, r6 P5 R3 p R7 P8 P6 ld r6, 0(r1) Pn ROB use ex op p1 PR1 p2 PR2 Rd LPRd PRd x x ld p P7 r1 P8 P0 Execute x x x add add P0 P0 r3 r3 P7 P1 P1 p Commit x sub p P6 p P5 r6 P5 P3 x add P1 P3 r3 P1 P2 p x ld P0 r6 P3 P4 p October 31, 2005 6.823 L14 14 Emer Reorder Buffer Holds Active Instruction Window … … (Older instructions) Commit ld r1, (r3) ld r1, (r3) add r3, r1, r2 add r3, r1, r2 sub r6, r7, r9 sub r6, r7, r9 Execute add r3, r3, r6 add r3, r3, r6 ld r6, (r1) ld r6, (r1) add r6, r6, r3 add r6, r6, r3 st r6, (r1) Fetch st r6, (r1) ld r6, (r1) ld r6, (r1) … (Newer instructions) … Cycle t + 1 Cycle t October 31, 2005 6.823 L14 15 Emer Issue Timing i1 Add R1,R1,1 Issue Execute 1 1 i2 Sub R1,R1,1 Issue Execute 2 2 How can we issue earlier i1 Add R1,R1,1 Issue Execute 1 1 i2 Sub R1,R1,1 Issue Execute 2 2 What makes this schedule fail October 31, 2005 6.823 L14 16 Emer Issue Queue with latency prediction Inst use exec op p1 lat1 src1 p2 lat2 src2 dest ptr 2 next to commit BEQZ Speculative Instructions ptr 1 next available Issue Queue (Reorder buffer) • Fixed latency: latency included in queue entry (‘bypassed’) • Predicted latency: latency included in queue entry (speculated) • Variable latency: wait for completion signal (stall) October 31, 2005 6.823 L14 17 Emer DatainROB vs. Single Register File DatainROB style FU Read Read Write Decode/ Commit Reg ROB ROB Rename File Source Dest Cache Singleregisterfile style FU Read Write Decode/ Issue Commit Reg Reg Queue Rename File File Cache How does issue speculation differ October 31, 2005 6.823 L14 18 Emer Superscalar Register Renaming • During decode, instructions allocated new physical destination register • Source operands renamed to physical register with newest value • Execution unit only sees physical register numbers Inst 1 Inst 2 Op Dest Src1 Src2 Op Dest Src1 Src2 Read Addresses Update Register Rename Table Mapping Free List Read Data Op PDest PSrc1 PSrc2 Op PDest PSrc1 PSrc2 Does this work October 31, 2005 Write Ports 6.823 L14 19 Emer Superscalar Register Renaming Inst 1 Inst 2 Op Dest Src1 Src2 Op Dest Src1 Src2 Read Addresses Update Register = = Rename Table Mapping Free List Read Data Must check for RAW hazards between instructions issuing in same cycle. Can be done in parallel with rename Op PDest PSrc1 PSrc2 Op PDest PSrc1 PSrc2 lookup. MIPS R10K renames 4 seriallyRAWdependent insts/cycle) October 31, 2005 Write Ports20 Fiveminute break to stretch your legs 6.823 L14 21 Emer Memory Dependencies st r1, (r2) ld r3, (r4) When can we execute the load October 31, 2005 6.823 L14 22 Emer Speculative Loads / Stores Just like register updates, stores should not modify the memory until after the instruction is committed store buffer entry must carry a speculation bit and the tag of the corresponding store instruction • If the instruction is committed, the speculation bit of the corresponding store buffer entry is cleared, and store is written to cache • If the instruction is killed, the corresponding store buffer entry is freed Loads work normally “older” store buffer entries needs to be searched before accessing the memory or the cache October 31, 2005 6.823 L14 23 Emer Load Path Load Address Speculative Store Buffer L1 Data Cache V S Tag Data V S Tag Data V S Tag Data Tags Data V S Tag Data V S Tag Data V S Tag Data Store Commit Path Load Data • Hit in speculative store buffer has priority over hit in data cache • Hit to newer store has priority over hits to older stores in speculative store buffer October 31, 2005 6.823 L14 24 Emer Datapath: Branch Prediction and Speculative Execution Update predictors kill Branch Branch Resolution kill Prediction kill kill Decode PC Fetch Reorder Buffer Commit Rename Reg. File Branch Store ALU MEM D Unit Buffer Execute October 31, 2005 6.823 L14 25 Emer InOrder Memory Queue • Execute all loads and stores in program order = Load and store cannot leave ROB for execution until all previous loads and stores have completed execution • Can still execute loads and stores speculatively, and outoforder with respect to other instructions • Stores held in store buffer until commit October 31, 2005 6.823 L14 26 Emer Conservative OoO Load Execution st r1, (r2) ld r3, (r4) • Split execution of store instruction into two phases: address calculation and data write • Can execute load before store, if addresses known and r4 = r2 • Each load address compared with addresses of all previous uncommitted stores (can use partial conservative check i.e., bottom 12 bits of address) • Don’t execute load if any previous store address not known (MIPS R10K, 16 entry address queue) October 31, 2005 6.823 L14 27 Emer Address Speculation st r1, (r2) ld r3, (r4) • Guess that r4 = r2 • Execute load before store address known • Need to hold all completed but uncommitted load/store addresses in program order • If subsequently find r4==r2, squash load and all following instructions = Large penalty for inaccurate address speculation October 31, 2005 6.823 L14 28 Emer Memory Dependence Prediction (Alpha 21264) st r1, (r2) ld r3, (r4) • Guess that r4 = r2 and execute load before store • If later find r4==r2, squash load and all following instructions, but mark load instruction as storewait • Subsequent executions of the same load instruction will wait for all previous stores to complete • Periodically clear storewait bits October 31, 2005 6.823 L14 29 Emer Store Sets (Alpha 21464) Multiple Readers PC 8 PC 0 Store Store 4 Empty Program 8 Store Order Store 12 PC 0 PC 12 Multiple Writers multiple code paths Load 28 multiple stack spills 32 Load multiple components 36 Load of a single location PC 8 40 Load October 31, 2005 6.823 L14 30 Emer Memory Dependence Prediction using Store Sets •The processor approximates each load’s store set by initially allowing naïve speculation and recording memoryorder violations. • A load must wait for any stores in its store set that have not yet executed. October 31, 2005 6.823 L14 31 Emer The Store Set Map Table Store Set Map Table Program Store Index Order . . Store Index V . Writer . . . Load Index . Store . Set A Load Index V . Reader . . . Load Index Store/Load Pair causing Memory Order Violation October 31, 2005 6.823 L14 32 Emer Store Set Sharing for Multiple Readers Store Set Map Table Program Index Store Order . . Index Store V . . . . Index Load . Store . Set A Index Load V . . . . Index Load V Store/Load Pair causing Memory Order Violation October 31, 2005 6.823 L14 33 Emer Store Set Map Table, cont. Store Set Map Table Program Order Index Store V . . Store Index Store Set B V . . . . Index Load V . Store . Set A Index Load V . . . . Index Load V Store/Load Pair causing Memory Order Violation October 31, 2005 34 Thank you 35 Extras 6.823 L14 36 Emer Mispredict Recovery • Inorder execution machines: – Assume no instruction issued after branch can writeback before branch resolves – Kill all instructions in pipeline behind mispredicted branch Outoforder execution –Multiple instructions following branch in program order can complete before branch resolves October 31, 2005 6.823 L14 37 Emer Precise Exceptions via InOrder Commit Inorder Outoforder Inorder Commit Reorder Buffer Fetch Decode Kill Kill Kill Execute Exception Inject handler PC • Instructions fetched and decoded into instruction reorder buffer inorder • Execution is outoforder ( ⇒ outoforder completion) • Commit (writeback to architectural state, i.e., regfile memory, is inorder Temporary storage needed in ROB to hold results before commit October 31, 2005 6.823 L14 38 Emer Extensions for Precise Exceptions Inst use exec op p1 src1 p2 src2 pd dest data cause ptr 2 next to commit ptr 1 next available Reorder buffer • add pd, dest, data, cause fields in the instruction template • commit instructions to reg file and memory in program order ⇒ buffers can be maintained circularly • on exception, clear reorder buffer by resetting ptr =ptr 1 2 (stores must wait for commit before updating memory) October 31, 2005 6.823 L14 39 Emer Branch Misprediction Recovery Inst use exec op p1 src1 p2 src2 pd dest data cause ptr 2 next to commit BEQZ rollback next Speculative Instructions available ptr 1 next available Reorder buffer On mispredict • Roll back “next available” pointer to just after branch • Reset use bits • Flush misspeculated instructions from pipelines • Restart fetch on correct branch path October 31, 2005 6.823 L14 40 Emer Branch Misprediction in Pipeline Inject correct PC Branch Branch Kill Resolution Prediction Kill Kill Commit Reorder Buffer Fetch Decode PC Complete Execute • Can have multiple unresolved branches in ROB • Can resolve branches outoforder by killing all the instructions in ROB that follow a mispredicted branch October 31, 2005 6.823 L14 41 Emer Recovering Renaming Table t v t v t v r t v 1 Register Rename Rename r 2 File Snapshots Table Ins use exec op p1 src1 p2 src2 pd dest data t 1 t 2 Reorder . buffer . t n Commit Load Store FU FU FU Unit Unit t, result Take snapshot of register rename table at each predicted branch, recover earlier snapshot if branch mispredicted October 31, 2005