Multithreaded Parallelism and Performance Measures

Multithreaded Parallelism and Performance Measures
Dr.JakeFinlay Profile Pic
Dr.JakeFinlay,Germany,Teacher
Published Date:22-07-2017
Your Website URL(Optional)
Comment
Multithreaded Parallelism and Performance Measures Marc Moreno Maza University of Western Ontario, London, Ontario (Canada) CS 3101 (Moreno Maza) Multithreaded Parallelism and Performance Measures CS 3101 1 / 52Parallelism Complexity Measures The fork-join parallelism model int int ffib (int ib (int n) n) Example: if if ((() () n2 n2)) return return ((((nn); );));; fib(4 fib(4)) else else int int xx,y; ,y; x = x = cilk_spawn cilk_spawn fib(n-1); fib(n-1); 4 y y y y == fib(n fib(n fib(n fib(n2); 2); 2); 2); cilk_sync cilk_sync;; return (x+y); return (x+y); 3 2 22 11 11 00 “Processor oblivious” 11 00 The computation dag unfolds dynamically. We shall also call this model multithreaded parallelism. (Moreno Maza) Multithreaded Parallelism and Performance Measures CS 3101 4 / 52Parallelism Complexity Measures Terminology initial strand final strand strand strand continue edge return edge spawn spawn edge edge call edge a strand is is a maximal sequence of instructions that ends with a spawn, sync, or return (either explicit or implicit) statement. At runtime, the spawn relation causes procedure instances to be structured as a rooted tree, called spawn tree or parallel instruction stream, where dependencies among strands form a dag. (Moreno Maza) Multithreaded Parallelism and Performance Measures CS 3101 5 / 52Parallelism Complexity Measures Work and span We de ne several performance measures. We assume an ideal situation: no cache issues, no interprocessor costs: T is the minimum running time on p processors p T is called the work, that is, the sum of the number of instructions at 1 each node. T is the minimum running time with in nitely many processors, called 1 the span (Moreno Maza) Multithreaded Parallelism and Performance Measures CS 3101 6 / 52Parallelism Complexity Measures The critical path length Assuming all strands run in unit time, the longest path in the DAG is equal to T . For this reason, T is also referred to as the critical path length. 1 1 (Moreno Maza) Multithreaded Parallelism and Performance Measures CS 3101 7 / 52Parallelism Complexity Measures Work law We have: T  T =p. p 1 Indeed, in the best case, p processors can do p works per unit of time. (Moreno Maza) Multithreaded Parallelism and Performance Measures CS 3101 8 / 52Parallelism Complexity Measures Span law We have: T  T . p 1 Indeed, T T contradicts the de nitions of T and T . p 1 p 1 (Moreno Maza) Multithreaded Parallelism and Performance Measures CS 3101 9 / 52Parallelism Complexity Measures Speedup on p processors T =T is called the speedup on p processors 1 p A parallel program execution can have: linear speedup: T =T = (p) 1 P superlinear speedup: T =T =(p) (not possible in this model, 1 P though it is possible in others) sublinear speedup: T =T = o(p) 1 P (Moreno Maza) Multithreaded Parallelism and Performance Measures CS 3101 10 / 52Parallelism Complexity Measures Parallelism Because the Span Law dictates that that T T≥≥ T T , the the m maximum aximum P∞ possible speedup given T 1 and T is ∞ T T /T /T = parall lleli lism 1∞ = the average amount amount of of work work per step along the span. (Moreno Maza) Multithreaded Parallelism and Performance Measures CS 3101 11 / 52Parallelism Complexity Measures The Fibonacci example 1 8 22 77 3 4 6 5 For Fib(4), we have T = 17 and T = 8 and thus T =T = 2:125. 1 1 1 1 What about T (Fib(n)) and T (Fib(n))? 1 1 (Moreno Maza) Multithreaded Parallelism and Performance Measures CS 3101 12 / 52Parallelism Complexity Measures Series composition A B Work? Span? (Moreno Maza) Multithreaded Parallelism and Performance Measures CS 3101 13 / 52Parallelism Complexity Measures Series composition A B Work: T (AB) = T (A) +T (B) 1 1 1 Span: T (AB) = T (A) +T (B) 1 1 1 (Moreno Maza) Multithreaded Parallelism and Performance Measures CS 3101 14 / 52Parallelism Complexity Measures Parallel composition A A B Work? Span? (Moreno Maza) Multithreaded Parallelism and Performance Measures CS 3101 15 / 52Parallelism Complexity Measures Parallel composition A A B Work: T (AB) = T (A) +T (B) 1 1 1 Span: T (AB) = max(T (A);T (B)) 1 1 1 (Moreno Maza) Multithreaded Parallelism and Performance Measures CS 3101 16 / 52cilk for Loops Plan 1 Parallelism Complexity Measures 2 cilk for Loops 3 Scheduling Theory and Implementation 4 Measuring Parallelism in Practice 5 Announcements (Moreno Maza) Multithreaded Parallelism and Performance Measures CS 3101 17 / 52cilk for Loops For loop parallelism in Cilk++ a a a a a a 11 12⋯ 1n 11 21⋯ n1 a a a a a a 21 21 22 22⋯ 2n 2n 12 12 22 22⋯ n2 n2 ⋮⋮⋱⋮ ⋮⋮⋱⋮ a a a a a a n1 n1 n2 n2⋯ nn nn 1n 1n 2n 2n⋯ nn nn T AA cilk_for (int i=1; in; ++i) for (int j=0; ji; ++j) double temp = Aij; Aij = Aji; Aji = temp; The iterations of a cilk for loop execute in parallel. (Moreno Maza) Multithreaded Parallelism and Performance Measures CS 3101 18 / 52cilk for Loops Implementation of for loops in Cilk++ Up to details (next week) the previous loop is compiled as follows, using a divide-and-conquer implementation: void recur(int lo, int hi) if (hi lo) // coarsen int mid = lo + (hi - lo)/2; cilk_spawn recur(lo, mid); recur(mid+1, hi); cilk_sync; else for (int j=0; jhi; ++j) double temp = Ahij; Ahij = Ajhi; Ajhi = temp; (Moreno Maza) Multithreaded Parallelism and Performance Measures CS 3101 19 / 52cilk for Loops Analysis of parallel for loops 11 22 33 44 5 6 7 8 Here we do not assume that each strand runs in unit time. Span of loop control: (log(n)) Max span of an iteration: (n) Span: (n) 2 Work: (n ) Parallelism: (n) (Moreno Maza) Multithreaded Parallelism and Performance Measures CS 3101 20 / 52cilk for Loops Parallelizing the inner loop cilk_for (int i=1; in; ++i) cilk_for (int j=0; ji; ++j) double temp = Aij; Aij = Aji; Aji = temp; Span of outer loop control: (log(n)) Max span of an inner loop control: (log(n)) Span of an iteration: (1) Span: (log(n)) 2 Work: (n ) Parallelism: (n) (Moreno Maza) Multithreaded Parallelism and Performance Measures CS 3101 21 / 52Scheduling Theory and Implementation Plan 1 Parallelism Complexity Measures 2 cilk for Loops 3 Scheduling Theory and Implementation 4 Measuring Parallelism in Practice 5 Announcements (Moreno Maza) Multithreaded Parallelism and Performance Measures CS 3101 22 / 52