Question? Leave a message!




More Loop Optimizations

More Loop Optimizations
Spring Spring 2010 2010 More Loop Optimizations 5 Outline • Strength Reduction • Loop Test Replacement • Loop Invariant Code Motion • SIMDization with SSE Saman Amarasinghe 2 6.035 ©MIT Fall 1998 Streng gth Reduction •Rep p lace exp p ensive op p erations in an exp p ression using cheaper ones – Not a data-flow p problem – Algebraic simplification – Example: Example: aa4 4  a a 2 2 Saman Amarasinghe 3 6.035 ©MIT Fall 1998 Streng gth Reduction •In loop p s reduce exp pensive op perations in expressions in to cheaper ones by using the p p reviously y calculated value Saman Amarasinghe 4 6.035 ©MIT Fall 1998 Streng gth Reduction t = 202 for for j j = = 1 1 to to 100 100 t = t - 2 A(j) A(j) = tt Saman Amarasinghe 5 6.035 ©MIT Fall 1998 Streng gth Reduction t = 202 for for j j = = 1 1 to to 100 100 t = t - 2 ( ( ab b ase + + 4j) 4j) = tt Saman Amarasinghe 6 6.035 ©MIT Fall 1998 Streng gth Reduction t = 202 for for j j = = 1 1 to to 100 100 t = t - 2 ( ( ab b ase + + 4j) 4j) = tt Basic Induction variable: J J = 1 1 , 2 2, 3 3 , 4 4, ….. Induction variable 200 - 2j t = 202, 200, 198, 196, ….. Induction Induction variable variable abase abase+4 4j: j: abase+4j = abase+4, abase+8, abase+12, abase+14, …. Saman Amarasinghe 7 6.035 ©MIT Fall 1998 Streng gth Reduction t = 202 for for j j = = 1 1 to to 100 100 t = t - 2 ( (ab b ase + + 4j) 4j) = tt Basic Induction variable: J J = 1 1 , 2 2 , 3 3, 4 4, ….. 1 1 1 Induction variable 200 - 2j t = 202, 200, 198, 196, ….. Induction Induction variable variable abase abase+4 4j: j: abase+4j = abase+4, abase+8, abase+12, abase+14, …. Saman Amarasinghe 8 6.035 ©MIT Fall 1998 Streng gth Reduction t = 202 for for j j = = 1 1 to to 100 100 t = t - 2 ( (ab b ase + + 4j) 4j) = tt Basic Induction variable: J J = 1 1, 2 2, 3 3, 4 4 , ….. 1 1 1 Induction variable 200 - 2j t = 202, 200, 198, 196, ….. -2 -2 -2 Induction Induction variable variable abase abase+4 4j: j: abase+4j = abase+4, abase+8, abase+12, abase+14, …. Saman Amarasinghe 9 6.035 ©MIT Fall 1998 Streng gth Reduction t = 202 for for j j = = 1 1 to to 100 100 t = t - 2 ( ( ab b ase + + 4j) 4j) = tt Basic Induction variable: J J = 1 1 , 2 2, 3 3, 4 4, ….. 1 1 1 Induction variable 200 - 2j t = 202, 200, 198, 196, ….. -2 -2 -2 Induction Induction variable variable abase abase+4 4j: j: abase+4j = abase+4, abase+8, abase+12, abase+14, …. 4 4 4 Saman Amarasinghe 10 6.035 ©MIT Fall 1998 Streng g th Reduction Alg g orithm • For a dep pendent induction variable k = aj j + b for j = 1 to 100 (abase + 4j) = j Saman Amarasinghe 11 6.035 ©MIT Fall 1998 Streng gth Reduction Alg g orithm • For a dep pendent induction variable k = aj j + b • Add a pre-header k’ = ajinit + b t = abase + 41 for j = 1 to 100 (abase + 4j) = j Saman Amarasinghe 12 6.035 ©MIT Fall 1998 Streng gth Reduction Alg g orithm • For a dep pendent induction variable k = aj j + b • Add a pre-header k’ = ajinit + b • • Next Next to to j j = = j j + + c c add add k k’ = =k k ’ +a + ac c t = abase + 41 for j = 1 to 100 (abase + 4j) = j t = t + 4 Saman Amarasinghe 13 6.035 ©MIT Fall 1998 Streng gth Reduction Alg g orithm • For a dep pendent induction variable k = aj j + b • Add a pre-header k’ = ajinit + b • • Next Next to to j j = = j j + + c c add add k k’ =k = k ’ + +a ac c • Use k’ instead of k t = abase + 41 for j = 1 to 100 (t) = j t = t + 4 Saman Amarasinghe 14 6.035 ©MIT Fall 1998 Examp ple double A256, B256256 jj = 11 while(j100) Aj = Bjj j = j + 2 Saman Amarasinghe 15 6.035 ©MIT Fall 2006 Examp ple double A256, B256256 jj = 11 while(j100) (&A + 4j) = (&B + 4(256j + j)) j = j + 2 Saman Amarasinghe 16 6.035 ©MIT Fall 2006 Examp ple double A256, B256256 jj = 11 while(j100) (&A + 4j) = (&B + 4(256j + j)) j = j + 2 Base Induction Variable: j Saman Amarasinghe 17 6.035 ©MIT Fall 2006 Examp ple double A256, B256256 j j = 11 while(j100) (&A + 4j) = (&B + 4(256j + j)) j = j + 2 Base Induction Variable: j D D epend dent I I nd d uctii on V V arii abl bl e: a = &A &A + 4j 4j Saman Amarasinghe 18 6.035 ©MIT Fall 2006 Examp ple double A256, B256256 j j = 11 a = &A + 4 while(j100) (&A + 4j) = (&B + 4(256j + j)) j = j + 2 Base Induction Variable: j D D epend dent I I nd d uctii on V V arii abl bl e: a = &A &A + 4j 4j Saman Amarasinghe 19 6.035 ©MIT Fall 2006 Examp ple double A256, B256256 jj = 11 a = &A + 4 while(j100) (&A + 4j) = (&B + 4(256j + j)) j = j + 2 a = a + 8 j Base Induction Variable: Dependent Dependent I Ind nd uction ction V Va ariable iable: a a = & &A A + + 4j 4j Saman Amarasinghe 20 6.035 ©MIT Fall 2006 Examp ple double A256, B256256 jj = 11 a = &A + 4 while(j100) a = (&B + 4(256j + j)) j = j + 2 a = a + 8 Base Induction Variable: j Dependent Dependent I Ind nd uction ction V Va ariable iable: a a = & &A A + + 4j 4j Saman Amarasinghe 21 6.035 ©MIT Fall 2006 Examp ple double A256, B256256 j=1 j = 1 a = &A + 4 while(j100) a = (&B + 4(256j + j)) j j = jj + 2 a = a + 8 BB ase I Ind duct tii on V Varii ablbl e: j j Dependent Induction Variable: b = &B + 4257j 4257j Saman Amarasinghe 22 6.035 ©MIT Fall 2006 Examp ple double A256, B256256 j j=1 = 1 a = &A + 4 b = &B + 1028 while(j100) a = (&B + 4(256j + j)) jj = jj + 2 a = a + 8 BB ase I Ind duct tii on V Varii ablbl e: j j Dependent Induction Variable: b = &B + 4257j 4257j Saman Amarasinghe 23 6.035 ©MIT Fall 2006 Examp ple double A256, B256256 j=1 j = 1 a = &A + 4 b = &B + 1028 while(j100) a = (&B + 4(256j + j)) j j = jj + 2 a = a + 8 b = b + 2056 Base Base Induction Induction Variable: Variable: j j Dependent Induction Variable: b = &B + 4 4 257 257 j j Saman Amarasinghe 24 6.035 ©MIT Fall 2006 Examp ple double A256, B256256 j=1 j = 1 a = &A + 4 b = &B + 1028 while(j100) a = b j j = jj + 2 a = a + 8 b = b + 2056 Base Base Induction Induction Variable: Variable: j j Dependent Induction Variable: b = &B + 4 4 257 257 j j Saman Amarasinghe 25 6.035 ©MIT Fall 2006 Examp ple double A256, B256256 j j = 11 a = &A + 4 b b = &B &B+1 + 1028 028 while(j100) a = b j = j + 2 a = a + 8 b b=b+2 = b + 2056 056 Saman Amarasinghe 26 6.035 ©MIT Fall 2006 5 Outline • Strength Reduction • Loop Test Replacement • Loop Invariant Code Motion • SIMDization with SSE Saman Amarasinghe 27 6.035 ©MIT Fall 1998 Loop p Test Rep placement • Eliminate basic induction variable used only y for calculating other induction variables Saman Amarasinghe 28 6.035 ©MIT Fall 1998 Loop p Test Rep placement • Eliminate basic induction variable used only y for calculating other induction variables double A256, B256256 j = 1 while(j100) Aj = Bjj Saman Amarasinghe 29 6.035 ©MIT Fall 1998 Loop p Test Rep placement • Eliminate basic induction variable used only y for calculating other induction variables double A256, B256256 j = 1 a = &A + 4 b = &B + 1028 while(j100) while(j100) a = b j = j + 2 a = a + 8 b = b + 2056 Saman Amarasinghe 30 6.035 ©MIT Fall 1998 Loop p Test Rep placement • Eliminate basic induction variable used only y for calculating other induction variables double A256, B256256 j = 1 a = &A + 4 • J is only used for the loop bound b = &B + 1028 while(j100) while(j100) a = b j = j + 2 a = a + 8 b = b + 2056 Saman Amarasinghe 31 6.035 ©MIT Fall 1998 Loop p Test Rep placement • Eliminate basic induction variable used only y for calculating other induction variables double A256, B256256 j = 1 a = &A + 4 • J is only used for the loop bound b = &B + 1028 • Use Use a a dependent dependent IV IV (a (a or or b) b) while(j100) while(j100) a = b j = j + 2 a = a + 8 b = b + 2056 Saman Amarasinghe 32 6.035 ©MIT Fall 1998 Loop p Test Rep placement • Eliminate basic induction variable used only y for calculating other induction variables double A256, B256256 j = 1 a = &A + 4 • J is only used for the loop bound b = &B + 1028 • Use Use a a dependent dependent IV IV (a (a or or b) b) while(j100) while(j100) • Lets choose a a = b j = j + 2 a = a + 8 b = b + 2056 Saman Amarasinghe 33 6.035 ©MIT Fall 1998 Loop p Test Rep placement • Eliminate basic induction variable used only y for calculating other induction variables double A256, B256256 j = 1 a = &A + 4 • J is only used for the loop bound b = &B + 1028 • Use Use a a dependent dependent IV IV (a (a or or b) b) while(j100) while(j100) • Lets choose a a = b j = j + 2 j 100  a &A + 800 a = a + 8 b = b + 2056 Saman Amarasinghe 34 6.035 ©MIT Fall 1998 Loop p Test Rep placement • Eliminate basic induction variable used only y for calculating other induction variables double A256, B256256 j = 1 a = &A + 4 • J is only used for the loop bound b = &B + 1028 • Use Use a a dependent dependent IV IV (a (a or or b) b) while(a&A+800) while(a&A+800) • Lets choose a a = b j = j + 2 j 100  a &A + 800 a = a + 8 • Replace the loop condition b = b + 2056 Saman Amarasinghe 35 6.035 ©MIT Fall 1998 Loop p Test Rep placement • Eliminate basic induction variable used only y for calculating other induction variables double A256, B256256 a = &A + 4 • J is only used for the loop bound b = &B + 1028 • Use Use a a dependent dependent IV IV (a (a or or b) b) while(a&A+800) while(a&A+800) • Lets choose a a = b j 100  a &A + 800 a = a + 8 • Replace the loop condition b = b + 2056 • Get rid of j Saman Amarasinghe 36 6.035 ©MIT Fall 1998 Loop p Test Rep placement • Eliminate basic induction variable used only y for calculating other induction variables double A256, B256256 a = &A + 4 b = &B + 1028 while(a&A+800) a a = = b b a = a + 8 b = b + 2056 Saman Amarasinghe 37 6.035 ©MIT Fall 1998 Loop p Test Rep placement Alg gorithm • If basic induction variable J is only used for calculating calculating other other induction induction variables variables • Select an induction variable k in the family of J (K = aJ + b) • • Replace Replace a a comparison comparison such such as as if (J X) goto L1 by if(K’ aX + b) goto L1 if a is positive if(K’ aX + b) goto L1 if a is negative • If J is live at any exit from loop, recompute J = (K’ - b)/a Saman Amarasinghe 38 6.035 ©MIT Fall 1998 5 Outline • Strength Reduction • Loop Test Replacement • Loop Invariant Code Motion • SIMDization with SSE Saman Amarasinghe 39 6.035 ©MIT Fall 1998 Loop p Invariant Code Motion • If a comp putation p p roduces the same value in every loop iteration, move it out of the loop • Same Same idea idea as as with with induction induction variables variables – Variables not updated in the loop are loop invariant – Expressions Expressions of of loop loop invariant invariant v variables ariables are are loop loop invariant – Variables Variables assigned assigned only only loop loop invariant invariant expressions expressions are loop invariant Saman Amarasinghe 40 6.035 ©MIT Fall 1998 Loop p Invariant Code Motion • If a comp p utation p p roduces the same value in every loop iteration, move it out of the loop for i = 1 to N x x = x+1 x + 1 for j = 1 to N a(i,j) = 100N + 10i + j + x Saman Amarasinghe 41 6.035 ©MIT Fall 1998 Loop p Invariant Code Motion • If a comp putation p p roduces the same value in every loop iteration, move it out of the loop for i = 1 to N xx = x+1 x + 1 for j = 1 to N a(i,j) = 100N + 10i + j + x Saman Amarasinghe 42 6.035 ©MIT Fall 1998 Loop p Invariant Code Motion • If a comp putation p produces the same value in every loop iteration, move it out of the loop t1 = 100N for i = 1 to N x x = x+1 x + 1 for j = 1 to N a(i,j) = 100N + 10i + j + x Saman Amarasinghe 43 6.035 ©MIT Fall 1998 Loop p Invariant Code Motion • If a comp p utation p p roduces the same value in every loop iteration, move it out of the loop t1 = 100N for i = 1 to N x x = x+1 x + 1 for j = 1 to N a(i,j) = t1 + 10i + j + x Saman Amarasinghe 44 6.035 ©MIT Fall 1998 Loop p Invariant Code Motion • If a comp putation p p roduces the same value in every loop iteration, move it out of the loop t1 = 100N for i = 1 to N xx = x+1 x + 1 for j = 1 to N a(i,j) = t1 + 10i + j + x Saman Amarasinghe 45 6.035 ©MIT Fall 1998 Loop p Invariant Code Motion • If a comp putation p produces the same value in every loop iteration, move it out of the loop t1 = 100N for i = 1 to N x x = x+1 x + 1 for j = 1 to N a(i,j) = t1 + 10i + j + x Saman Amarasinghe 46 6.035 ©MIT Fall 1998 Loop p Invariant Code Motion • If a comp putation p produces the same value in every loop iteration, move it out of the loop t1 = 100N for i = 1 to N xx = x+1 x + 1 t2 = t1 + 10i + x for j = 1 to N a(( i,j) ,j) = t1 + 10i + j j + x Saman Amarasinghe 47 6.035 ©MIT Fall 1998 Loop p Invariant Code Motion • If a comp putation p p roduces the same value in every loop iteration, move it out of the loop t1 = 100N for i = 1 to N x x = x+1 x + 1 t2 = t1 + 10i + x for j = 1 to N a( (i,j) ,j) = t2 + j j Saman Amarasinghe 48 6.035 ©MIT Fall 1998 5 Outline • Strength Reduction • Loop Test Replacement • Loop Invariant Code Motion • SIMDization with SSE Saman Amarasinghe 49 6.035 ©MIT Fall 1998 SIMD Throug gh SSE extensions •Sing g le Instruction Multip p le Data – Compute multiple identical operations in a single instruction – Exploit fine grained parallelism Saman Amarasinghe 50 6.035 ©MIT Fall 1998 SSE Reg gisters • 16 128-bit reg gisters: %xmm0 to %xmm16 – Multiple interpretations for each register – Each arithmetic op peration comes in multip ple versions 128 bit Double Quadword 64 bit Quadword 64 bit Quadword 32 bit Doubleword 32 bit Doubleword 32 bit Doubleword 32 bit Doubleword 16 16 bit bit w wo ord rd 16 bit word 16 bit word 16 16 bit bit w wo ord rd 16 bit word 16 16 bit bit w wo ord rd 16 16 bit bit w wo ord rd 16 bit word Saman Amarasinghe 51 6.035 ©MIT Fall 1998 Data Transfer • Moving Data From Memory or xmm registers – MOVDQA OP1, OP2 Move aligned Double Quadword • Can read or write to memory in 128 bit chunks • • If If OP1 OP1 or or OP2 OP2 are are registers registers , they they must must be be xmm xmm r registers egisters • Memory locations in OP1 or OP2 must be multiples of 16 – MOVDQU OP1, OP2 Move unaligned Double Quadword • Same as MOVDQA but • memory addresses don’t have to be multiples of 16 Saman Amarasinghe 52 6.035 ©MIT Fall 1998 � �� � Data Transfer • Moving Data From 64-bit registers – MOVQ OP1,OP2 Move Double Quadword Can move from 64 bit register to xmm register or viceversa Writes Writes to/Reads to/Reads from from the the lower lower 64 64 bits bits of of xmm xmm register register Can also be used to read a 64-bit chunk to/from memory MOVQ %r11, %xmm1 %r11 %xmm1 128 128 0 0 Saman Amarasinghe 53 6.035 ©MIT Fall 1998Data Reordering g •Unp pack and Interleave – PUNPCKLDQ Low Doublewords 128 128 0 0 128 0 Saman Amarasinghe 54 6.035 ©MIT Fall 1998Data Reordering g •Unp pack and Interleave – PUNPCKLQDQ Low Quadwords 128 128 0 0 128 0 Saman Amarasinghe 55 6.035 ©MIT Fall 1998Arithmetic • Arithmetic operations come in many flavors – based on the datatype of the register – specified in the instruction sufix • Example: Addition – PADDQ Add 64-bit Quadwords – PADDD PADDD Add Add 3 32 2 -bi bi t D D oub bl l eword ds – PADDW Add 16-bit words • • E E xample: ample: S S ubtraction btraction – PSUBQ Subtract 64-bit Quadwords – – PSUBD PSUBD Subtract Subtract 32 32- -bit bit Doublewords Doublewords – PSUBW Subtract 16-bit words Saman Amarasinghe 56 6.035 ©MIT Fall 1998 Putting g It All Tog gether • Source Code for i = 1 to N Ai = Ai b • After Unrolling loop: Reading g from consecutive mov mov (%rdi,%rax), (%rdi,%rax), %r10 %r10 mov (%rdi,%rbx), %rcx addresses imul %r11, %r10 Mult by a loop invariant imul %r11, %rcx mov %r10, (%rdi,%rax) Writing to consecutive sub 8, %rax addresses mov %rcx, (%rdi,%rbx) sub 8, %rbx jz loop Saman Amarasinghe 57 6.035 ©MIT Fall 1998 Putting g it all tog gether Original Version SSE Version movq %r11, %xmm2 punpckldq %xmm2, %xmm2 loop: loopp : movdd qa (% (%rdi di ,% %rax))% , %xmm00 mov (%rdi,%rax), %r10 pmuludq %xmm2, %xmm0 mov (%rdi,%rbx), %rcx movdqa %xmm0, (%rdi, %rax) imul %r11, %r10 sub 8, %rax imul imul %r11 %r11, %%rcx rcx jz loop mov %r10, (%rdi,%rax) sub 8, %rax mov %rcx, (%rdi,%rbx) sub 8, %rbx jz loop Saman Amarasinghe 58 6.035 ©MIT Fall 1998 Putting g it all tog gether SSE Version movq %r11, %xmm2 Populate xmm2 with copies of %r11 punpckldq %xmm2, %xmm2 loop: movdqa (%rdi,%rax), %xmm0 pmuludq %xmm2, %xmm0 movdqa %xmm0, (%rdi, %rax) sub sub 8, 8,%a %rax Only Only one one index index is is needed needed jz loop Saman Amarasinghe 59 6.035 ©MIT Fall 1998 Conditions for SIMDization • Consecutive iterations reading and writing from consecutive locations • Consecutive iterations are independent of each other • The easiest thing is to pattern match at the basic block level after unrolling loops Saman Amarasinghe 60 6.035 ©MIT Fall 1998 MIT OpenCourseWare http://ocw.mit.edu 6.035 Computer Language Engineering Spring 2010 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
Website URL
Comment
sharer
Presentations
Free
Document Information
Category:
Presentations
User Name:
Dr.MasonHanks
User Type:
Teacher
Country:
Germany
Uploaded Date:
23-07-2017