Parallelism vs Concurrency

What is parallelization in computer processing, and what is memory parallelization?
Dr. Mason Hanks, Germany, Teacher
Published: 23-07-2017
Parallelization
Saman Amarasinghe, Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology (6.035, Spring 2010)

Outline
• Why Parallelism
• Parallel Execution
• Parallelizing Compilers
• Dependence Analysis
• Increasing Parallelization Opportunities

Moore's Law
[Figure: number of transistors per chip, 1978 to 2016, for the 8086, 286, 386, 486, Pentium, P2, P3, P4, Itanium, and Itanium 2; growth at 25%/year, then 52%/year, then an open "??%/year". From Hennessy and Patterson, Computer Architecture: A Quantitative Approach, 4th edition, 2006, via David Patterson.]

Uniprocessor Performance (SPECint)
[Figure: performance relative to the VAX-11/780 for the same processors over the same years. Same sources.]

Multicores Are Here
[Figure: cores per chip, 1970 to the mid-2000s. Uniprocessors from the 4004 through the Itanium 2 sit at 1 core; Power4, PExtreme, Power6, Yonah, Opteron, Xbox 360, PA-8800, Tanglewood, Xeon MP, Broadcom 1480, Opteron 4P, Niagara, Cell, Raw, Cavium Octeon, Raza XLR, Cisco CSR-1, and the Intel Tflops chip span 2 to 128 cores; the Picochip PC102 and Ambric AM2045 reach 256 to 512.]

Issues with Parallelism
• Amdahl's Law
– Any computation can be analyzed in terms of a portion that must be executed sequentially, Ts, and a portion that can be executed in parallel, Tp. Then for n processors:
– T(n) = Ts + Tp/n
– T(∞) = Ts, so the maximum speedup is (Ts + Tp)/Ts (a worked example follows this list)
• Load Balancing
– The work is distributed among processors so that all processors are kept busy while the parallel task executes.
• Granularity
– The size of the parallel regions between synchronizations, or the ratio of computation (useful work) to communication (overhead).
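To make Amdahl's Law concrete, here is a minimal sketch of mine (not from the slides); the values Ts = 1 and Tp = 9 are an assumed workload that is 10% sequential:

    /* Amdahl's Law: T(n) = Ts + Tp/n. Tabulates run time and speedup
       for an assumed 10%-sequential workload (Ts = 1, Tp = 9). */
    #include <stdio.h>

    double t(double ts, double tp, int n) { return ts + tp / n; }

    int main(void) {
        double ts = 1.0, tp = 9.0;  /* assumed example values */
        for (int n = 1; n <= 64; n *= 2)
            printf("n=%2d  T(n)=%5.2f  speedup=%5.2f\n",
                   n, t(ts, tp, n), (ts + tp) / t(ts, tp, n));
        /* As n grows, T(n) approaches Ts, so the speedup is capped. */
        printf("max speedup = (Ts+Tp)/Ts = %.2f\n", (ts + tp) / ts);
        return 0;
    }

Even at n = 64 the speedup is only about 8.8, and no processor count can push it past (Ts + Tp)/Ts = 10: the sequential portion sets the ceiling.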
Types of Parallelism
• Instruction Level Parallelism (ILP): scheduling and hardware
• Task Level Parallelism (TLP): mainly by hand
• Loop Level Parallelism (LLP) or Data Parallelism: hand or compiler generated
• Pipeline Parallelism: hardware or streaming
• Divide and Conquer Parallelism: recursive functions

Why Loops?
• 90% of the execution time is in 10% of the code
– Mostly in loops
• If parallel, can get good performance
– Load balancing
• Relatively easy to analyze

Programmer Defined Parallel Loop
• FORALL
– No "loop carried dependences"
– Fully parallel
• FORACROSS
– Some "loop carried dependences"

Parallel Execution
• Example
    FORPAR I = 0 to N
      A[I] = A[I] + 1
• Block Distribution: the program gets mapped into
    Iters = ceiling(N/NUMPROC);
    FOR P = 0 to NUMPROC-1
      FOR I = P*Iters to MIN((P+1)*Iters, N)
        A[I] = A[I] + 1
• SPMD (Single Program, Multiple Data) Code
    If (myPid == 0) ...
    Iters = ceiling(N/NUMPROC);
    Barrier();
    FOR I = myPid*Iters to MIN((myPid+1)*Iters, N)
      A[I] = A[I] + 1
    Barrier();
• Code that forks a function
    Iters = ceiling(N/NUMPROC);
    ParallelExecute(func1);
    ...
    void func1(integer myPid)
      FOR I = myPid*Iters to MIN((myPid+1)*Iters, N)
        A[I] = A[I] + 1

SPMD vs. Fork
• SPMD
– Need to get all the processors to execute the control flow
• Extra synchronization overhead, or redundant computation on all processors, or both
– Stack: private or shared?
• Fork
– Local variables are not visible within the function
• Either make the variables used/defined in the loop body global, or pass and return them as arguments
• Function call overhead

Parallel Thread Basics
• Create separate threads
– Create an OS thread; (hopefully) it will be run on a separate core
– pthread_create(&thr, NULL, &entry_point, NULL) (see the sketch after this section)
– Overhead in thread creation
• Create a separate stack
• Get the OS to allocate a thread
• Thread pool
– Create all the threads (= num cores) at the beginning
– Keep N-1 idling on a barrier while execution is sequential
– Get them to run parallel code by each executing a function
– Back to the barrier when the parallel region is done
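The fork scheme above maps naturally onto POSIX threads. Below is a minimal sketch of mine, not the lecture's code: func1, Iters, N, and NUMPROC follow the slides' names, ParallelExecute is realized with pthread_create, and pthread_join plays the role of the closing barrier. The blocks use half-open ranges so no iteration runs twice.

    /* SPMD-style FORPAR via fork-a-function. Build with: cc -pthread file.c */
    #include <pthread.h>
    #include <stdio.h>

    #define N 100
    #define NUMPROC 4

    static int A[N];
    static int Iters;

    static void *func1(void *arg) {
        int myPid = (int)(long)arg;
        int hi = (myPid + 1) * Iters;
        if (hi > N) hi = N;                      /* MIN((myPid+1)*Iters, N) */
        for (int i = myPid * Iters; i < hi; i++) /* this thread's block */
            A[i] = A[i] + 1;
        return NULL;
    }

    int main(void) {
        pthread_t thr[NUMPROC];
        Iters = (N + NUMPROC - 1) / NUMPROC;     /* ceiling(N/NUMPROC) */
        for (long p = 0; p < NUMPROC; p++)       /* "ParallelExecute(func1)" */
            pthread_create(&thr[p], NULL, func1, (void *)p);
        for (int p = 0; p < NUMPROC; p++)        /* join = barrier at region end */
            pthread_join(thr[p], NULL);
        printf("A[0]=%d  A[N-1]=%d\n", A[0], A[N - 1]);
        return 0;
    }

A thread pool, as the last slide suggests, would instead create the threads once at startup and park them on a barrier between parallel regions, avoiding the creation overhead on every loop.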
Parallelizing Compilers
• Finding FORALL loops out of FOR loops
• Examples
    FOR I = 0 to 5
      A[I] = A[I] + 1

    FOR I = 0 to 5
      A[I] = A[I+6] + 1

    FOR I = 0 to 5
      A[2I] = A[2I+1] + 1
(All three can be proven parallel: the first touches a distinct A[I] in each iteration, the second writes A[0..5] while reading the disjoint A[6..11], and the third writes even elements while reading odd ones.)

Iteration Space
• N deep loops correspond to an n-dimensional discrete cartesian space
– Normalized loops: assume step size = 1
    FOR I = 0 to 6
      FOR J = I to 7
[Figure: the triangular iteration space of this nest, with I = 0..6 down the rows and J = 0..7 across the columns; row I begins at column J = I.]
• Iterations are represented as coordinates in the iteration space
– i = (i1, i2, i3, …, in)
• Sequential execution order of iterations is lexicographic order:
    [0,0], [0,1], [0,2], …, [0,6], [0,7],
    [1,1], [1,2], …, [1,6], [1,7],
    [2,2], …, [2,6], [2,7],
    …,
    [6,6], [6,7]
• Iteration i is lexicographically less than j, written i < j, iff there exists c such that i1 = j1, i2 = j2, …, i(c-1) = j(c-1), and ic < jc
• An affine loop nest (such as the example above; see the sketch that follows)
– Loop bounds are integer linear functions of constants, loop-constant variables, and outer loop indexes
– Array accesses are integer linear functions of constants, loop-constant variables, and loop indexes
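As a quick check on the lexicographic-order claim, this small sketch of mine (not from the slides) runs the example nest and prints each iteration's coordinates in exactly the order the sequential loops visit them:

    /* Enumerate the iteration space of: FOR I = 0 to 6 / FOR J = I to 7.
       Each [I,J] is one point of the triangular space; the print order is
       the sequential execution order, i.e. lexicographic order. */
    #include <stdio.h>

    int main(void) {
        for (int i = 0; i <= 6; i++) {       /* outer index */
            for (int j = i; j <= 7; j++)     /* lower bound J = I is an affine
                                                function of the outer index */
                printf("[%d,%d] ", i, j);
            printf("\n");                    /* one row of the triangle */
        }
        return 0;
    }

The output reproduces the listing above, from [0,0] … [0,7] down to [6,6] [6,7]. Both loop bounds are integer linear functions of constants and outer indexes, so this nest satisfies the affine-loop-nest definition, which is what allows a compiler to reason exactly about its dependences.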
