Question? Leave a message!




Multithreaded Parallelism and Performance Measures

Multithreaded Parallelism and Performance Measures
Dr.JakeFinlay Profile Pic
Dr.JakeFinlay,Germany,Teacher
Published Date:22-07-2017
Website URL
Comment
Multithreaded Parallelism and Performance Measures Marc Moreno Maza University of Western Ontario, London, Ontario (Canada) CS 3101 (Moreno Maza) Multithreaded Parallelism and Performance Measures CS 3101 1 / 52Plan 1 Parallelism Complexity Measures 2 cilk for Loops 3 Scheduling Theory and Implementation 4 Measuring Parallelism in Practice 5 Announcements (Moreno Maza) Multithreaded Parallelism and Performance Measures CS 3101 2 / 52Parallelism Complexity Measures Plan 1 Parallelism Complexity Measures 2 cilk for Loops 3 Scheduling Theory and Implementation 4 Measuring Parallelism in Practice 5 Announcements (Moreno Maza) Multithreaded Parallelism and Performance Measures CS 3101 3 / 52Parallelism Complexity Measures The fork-join parallelism model int int ffib (int ib (int n) n) Example: if if ((() () n2 n2)) return return ((((nn); );));; fib(4 fib(4)) else else int int xx,y; ,y; x = x = cilk_spawn cilk_spawn fib(n-1); fib(n-1); 4 y y y y == fib(n fib(n fib(n fib(n2); 2); 2); 2); cilk_sync cilk_sync;; return (x+y); return (x+y); 3 2 22 11 11 00 “Processor oblivious” 11 00 The computation dag unfolds dynamically. We shall also call this model multithreaded parallelism. (Moreno Maza) Multithreaded Parallelism and Performance Measures CS 3101 4 / 52Parallelism Complexity Measures Terminology initial strand final strand strand strand continue edge return edge spawn spawn edge edge call edge a strand is is a maximal sequence of instructions that ends with a spawn, sync, or return (either explicit or implicit) statement. At runtime, the spawn relation causes procedure instances to be structured as a rooted tree, called spawn tree or parallel instruction stream, where dependencies among strands form a dag. (Moreno Maza) Multithreaded Parallelism and Performance Measures CS 3101 5 / 52Parallelism Complexity Measures Work and span We de ne several performance measures. We assume an ideal situation: no cache issues, no interprocessor costs: T is the minimum running time on p processors p T is called the work, that is, the sum of the number of instructions at 1 each node. T is the minimum running time with in nitely many processors, called 1 the span (Moreno Maza) Multithreaded Parallelism and Performance Measures CS 3101 6 / 52Parallelism Complexity Measures The critical path length Assuming all strands run in unit time, the longest path in the DAG is equal to T . For this reason, T is also referred to as the critical path length. 1 1 (Moreno Maza) Multithreaded Parallelism and Performance Measures CS 3101 7 / 52Parallelism Complexity Measures Work law We have: T  T =p. p 1 Indeed, in the best case, p processors can do p works per unit of time. (Moreno Maza) Multithreaded Parallelism and Performance Measures CS 3101 8 / 52Parallelism Complexity Measures Span law We have: T  T . p 1 Indeed, T T contradicts the de nitions of T and T . p 1 p 1 (Moreno Maza) Multithreaded Parallelism and Performance Measures CS 3101 9 / 52Parallelism Complexity Measures Speedup on p processors T =T is called the speedup on p processors 1 p A parallel program execution can have: linear speedup: T =T = (p) 1 P superlinear speedup: T =T =(p) (not possible in this model, 1 P though it is possible in others) sublinear speedup: T =T = o(p) 1 P (Moreno Maza) Multithreaded Parallelism and Performance Measures CS 3101 10 / 52Parallelism Complexity Measures Parallelism Because the Span Law dictates that that T T≥≥ T T , the the m maximum aximum P∞ possible speedup given T 1 and T is ∞ T T /T /T = parall lleli lism 1∞ = the average amount amount of of work work per step along the span. (Moreno Maza) Multithreaded Parallelism and Performance Measures CS 3101 11 / 52Parallelism Complexity Measures The Fibonacci example 1 8 22 77 3 4 6 5 For Fib(4), we have T = 17 and T = 8 and thus T =T = 2:125. 1 1 1 1 What about T (Fib(n)) and T (Fib(n))? 1 1 (Moreno Maza) Multithreaded Parallelism and Performance Measures CS 3101 12 / 52Parallelism Complexity Measures Series composition A B Work? Span? (Moreno Maza) Multithreaded Parallelism and Performance Measures CS 3101 13 / 52Parallelism Complexity Measures Series composition A B Work: T (AB) = T (A) +T (B) 1 1 1 Span: T (AB) = T (A) +T (B) 1 1 1 (Moreno Maza) Multithreaded Parallelism and Performance Measures CS 3101 14 / 52Parallelism Complexity Measures Parallel composition A A B Work? Span? (Moreno Maza) Multithreaded Parallelism and Performance Measures CS 3101 15 / 52Parallelism Complexity Measures Parallel composition A A B Work: T (AB) = T (A) +T (B) 1 1 1 Span: T (AB) = max(T (A);T (B)) 1 1 1 (Moreno Maza) Multithreaded Parallelism and Performance Measures CS 3101 16 / 52cilk for Loops Plan 1 Parallelism Complexity Measures 2 cilk for Loops 3 Scheduling Theory and Implementation 4 Measuring Parallelism in Practice 5 Announcements (Moreno Maza) Multithreaded Parallelism and Performance Measures CS 3101 17 / 52cilk for Loops For loop parallelism in Cilk++ a a a a a a 11 12⋯ 1n 11 21⋯ n1 a a a a a a 21 21 22 22⋯ 2n 2n 12 12 22 22⋯ n2 n2 ⋮⋮⋱⋮ ⋮⋮⋱⋮ a a a a a a n1 n1 n2 n2⋯ nn nn 1n 1n 2n 2n⋯ nn nn T AA cilk_for (int i=1; in; ++i) for (int j=0; ji; ++j) double temp = Aij; Aij = Aji; Aji = temp; The iterations of a cilk for loop execute in parallel. (Moreno Maza) Multithreaded Parallelism and Performance Measures CS 3101 18 / 52cilk for Loops Implementation of for loops in Cilk++ Up to details (next week) the previous loop is compiled as follows, using a divide-and-conquer implementation: void recur(int lo, int hi) if (hi lo) // coarsen int mid = lo + (hi - lo)/2; cilk_spawn recur(lo, mid); recur(mid+1, hi); cilk_sync; else for (int j=0; jhi; ++j) double temp = Ahij; Ahij = Ajhi; Ajhi = temp; (Moreno Maza) Multithreaded Parallelism and Performance Measures CS 3101 19 / 52cilk for Loops Analysis of parallel for loops 11 22 33 44 5 6 7 8 Here we do not assume that each strand runs in unit time. Span of loop control: (log(n)) Max span of an iteration: (n) Span: (n) 2 Work: (n ) Parallelism: (n) (Moreno Maza) Multithreaded Parallelism and Performance Measures CS 3101 20 / 52