DIVIDE AND CONQUER II
master theorem ‣ integer multiplication ‣ matrix multiplication ‣ convolution and FFT

Lecture slides by Kevin Wayne. Copyright © 2005 Pearson-Addison Wesley. Copyright © 2013 Kevin Wayne. http://www.cs.princeton.edu/wayne/kleinbergtardos. Last updated on Jul 20, 2015.

‣ master theorem (SECTIONS 4.3–4.6)

Master method

Goal. Recipe for solving common divide-and-conquer recurrences:

  T(n) = a T(n/b) + f(n)

Terms.
- a ≥ 1 is the number of subproblems.
- b > 1 is the factor by which the subproblem size decreases.
- f(n) = work to divide/merge subproblems.

Recursion tree.
- k = log_b n levels.
- a^i = number of subproblems at level i.
- n / b^i = size of each subproblem at level i.

Case 1: total cost dominated by cost of leaves

Ex 1. If T(n) satisfies T(n) = 3 T(n/2) + n, with T(1) = 1, then T(n) = Θ(n^{lg 3}).

[Recursion tree: level i has 3^i subproblems of size n/2^i, for total cost 3^i (n/2^i) per level; the bottom level has 3^{log₂ n} = n^{lg 3} leaves.]

  T(n) = (1 + r + r² + r³ + … + r^{log₂ n}) n,  where r = 3/2
       = (r^{1 + log₂ n} − 1) / (r − 1) · n
       = 3 n^{lg 3} − 2n.

Case 2: total cost evenly distributed among levels

Ex 2. If T(n) satisfies T(n) = 2 T(n/2) + n, with T(1) = 1, then T(n) = Θ(n log n).

[Recursion tree: level i has 2^i subproblems of size n/2^i, so every one of the log₂ n + 1 levels costs exactly n.]

  T(n) = (1 + r + r² + r³ + … + r^{log₂ n}) n,  where r = 1
       = n (log₂ n + 1).

Case 3: total cost dominated by cost of root

Ex 3. If T(n) satisfies T(n) = 3 T(n/4) + n⁵, with T(1) = 1, then T(n) = Θ(n⁵).

[Recursion tree: level i has 3^i subproblems of size n/4^i, for total cost 3^i (n/4^i)⁵ per level; the bottom level has 3^{log₄ n} = n^{log₄ 3} leaves.]

  n⁵ ≤ T(n) ≤ (1 + r + r² + r³ + …) n⁵ ≤ n⁵ / (1 − r),  where r = 3/4⁵ < 1.

Master theorem

Master theorem. Suppose that T(n) is a function on the nonnegative integers that satisfies the recurrence

  T(n) = a T(n/b) + f(n),

where n/b means either ⎣n/b⎦ or ⎡n/b⎤. Let k = log_b a. Then:

Case 1. If f(n) = O(n^{k − ε}) for some constant ε > 0, then T(n) = Θ(n^k).

Ex. T(n) = 3 T(n/2) + n.
- a = 3, b = 2, f(n) = n, k = log₂ 3.
- T(n) = Θ(n^{lg 3}).

Case 2. If f(n) = Θ(n^k log^p n), then T(n) = Θ(n^k log^{p+1} n).

Ex. T(n) = 2 T(n/2) + 17 n.
- a = 2, b = 2, f(n) = 17 n = Θ(n), k = log₂ 2 = 1, p = 0.
- T(n) = Θ(n log n).

Case 3. If f(n) = Ω(n^{k + ε}) for some constant ε > 0, and if a f(n/b) ≤ c f(n) for some constant c < 1 and all sufficiently large n, then T(n) = Θ(f(n)). [The regularity condition holds automatically if f(n) = Θ(n^{k + ε}).]

Ex. T(n) = 3 T(n/4) + n⁵.
- a = 3, b = 4, f(n) = n⁵, k = log₄ 3.
- T(n) = Θ(n⁵).
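For the common special case f(n) = Θ(n^c), applying the master theorem reduces to comparing c with the critical exponent k = log_b a. The sketch below is illustrative only (not part of the original slides); the function name and its exponent-based interface are assumptions.

  import math

  def master_theorem(a, b, c):
      """Classify T(n) = a T(n/b) + Theta(n^c), for a >= 1, b > 1.
      Handles only the polynomial case f(n) = Theta(n^c); extra log factors
      (the general case 2) are not covered."""
      k = math.log(a, b)                     # critical exponent log_b a
      if c < k:
          return f"Theta(n^{k:.3f})"         # case 1: leaves dominate
      if c == k:
          return f"Theta(n^{k:.3f} log n)"   # case 2: every level contributes equally
      return f"Theta(n^{c})"                 # case 3: root dominates (regularity holds for n^c)

  # Examples from the slides:
  print(master_theorem(3, 2, 1))   # 3 T(n/2) + n   -> Theta(n^1.585)
  print(master_theorem(2, 2, 1))   # 2 T(n/2) + n   -> Theta(n^1.000 log n)
  print(master_theorem(3, 4, 5))   # 3 T(n/4) + n^5 -> Theta(n^5)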
Master theorem (summary)

Master theorem. Suppose that T(n) is a function on the nonnegative integers that satisfies the recurrence T(n) = a T(n/b) + f(n), where n/b means either ⎣n/b⎦ or ⎡n/b⎤, and let k = log_b a. Then:
- Case 1. If f(n) = O(n^{k − ε}) for some constant ε > 0, then T(n) = Θ(n^k).
- Case 2. If f(n) = Θ(n^k log^p n), then T(n) = Θ(n^k log^{p+1} n).
- Case 3. If f(n) = Ω(n^{k + ε}) for some constant ε > 0, and if a f(n/b) ≤ c f(n) for some constant c < 1 and all sufficiently large n, then T(n) = Θ(f(n)).

Pf sketch.
- Use the recursion tree to sum up terms (assuming n is an exact power of b).
- Three cases for the geometric series.
- Deal with floors and ceilings.

Akra–Bazzi theorem

Desiderata. Generalizes the master theorem to divide-and-conquer algorithms where the subproblems have substantially different sizes.

Theorem. [Akra–Bazzi] Given constants a_i > 0 and 0 < b_i ≤ 1, functions h_i(n) = O(n / log² n) [small perturbations to handle floors and ceilings] and g(n) = O(n^c), if the function T(n) satisfies the recurrence

  T(n) = Σ_{i=1}^{k} a_i T(b_i n + h_i(n)) + g(n)   [a_i subproblems of size b_i n]

then T(n) = Θ( n^p ( 1 + ∫₁ⁿ g(u) / u^{p+1} du ) ), where p satisfies Σ_{i=1}^{k} a_i b_i^p = 1.

Ex. T(n) = 7/4 T(⎣n/2⎦) + T(⎡3n/4⎤) + n².
- a₁ = 7/4, b₁ = 1/2, a₂ = 1, b₂ = 3/4 ⇒ p = 2.
- h₁(n) = ⎣n/2⎦ − n/2, h₂(n) = ⎡3n/4⎤ − 3n/4.
- g(n) = n² ⇒ T(n) = Θ(n² log n).

‣ integer multiplication (SECTION 5.5)

Integer addition

Addition. Given two n-bit integers a and b, compute a + b.
Subtraction. Given two n-bit integers a and b, compute a − b.

Grade-school algorithm. Θ(n) bit operations.
[Worked example of grade-school addition with carries omitted.]

Remark. The grade-school addition and subtraction algorithms are asymptotically optimal.

Integer multiplication

Multiplication. Given two n-bit integers a and b, compute a × b.

Grade-school algorithm. Θ(n²) bit operations.
[Worked example of grade-school multiplication omitted.]

Conjecture. [Kolmogorov 1952] The grade-school algorithm is optimal.
Theorem. [Karatsuba 1960] The conjecture is wrong.

Divide-and-conquer multiplication

To multiply two n-bit integers x and y:
- Divide x and y into low- and high-order bits.
- Multiply four ½n-bit integers, recursively.
- Add and shift to obtain the result.

  m = ⎡n/2⎤
  a = ⎣x / 2^m⎦,  b = x mod 2^m   [use bit shifting to compute the four terms]
  c = ⎣y / 2^m⎦,  d = y mod 2^m

  x y = (2^m a + b)(2^m c + d) = 2^{2m} ac + 2^m (bc + ad) + bd

Ex. x = 1000 1101, y = 1110 0001: a = 1000, b = 1101, c = 1110, d = 0001.

MULTIPLY(x, y, n)
IF (n = 1) RETURN x × y.
ELSE
  m ← ⎡n/2⎤.
  a ← ⎣x / 2^m⎦;  b ← x mod 2^m.
  c ← ⎣y / 2^m⎦;  d ← y mod 2^m.
  e ← MULTIPLY(a, c, m).
  f ← MULTIPLY(b, d, m).
  g ← MULTIPLY(b, c, m).
  h ← MULTIPLY(a, d, m).
  RETURN 2^{2m} e + 2^m (g + h) + f.
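A minimal Python sketch of MULTIPLY, assuming nonnegative integers and using bit shifts for the divide step; the one-bit base case follows the pseudocode above.

  def multiply(x, y, n):
      """Divide-and-conquer product of two n-bit nonnegative integers
      using four recursive half-size multiplications (Theta(n^2) overall)."""
      if n == 1:
          return x * y
      m = (n + 1) // 2                      # m = ceil(n / 2)
      a, b = x >> m, x & ((1 << m) - 1)     # high and low halves of x
      c, d = y >> m, y & ((1 << m) - 1)     # high and low halves of y
      e = multiply(a, c, m)                 # ac
      f = multiply(b, d, m)                 # bd
      g = multiply(b, c, m)                 # bc
      h = multiply(a, d, m)                 # ad
      return (e << (2 * m)) + ((g + h) << m) + f

  # e.g. multiply(0b10001101, 0b11100001, 8) == 141 * 225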
Divide-and-conquer multiplication: analysis

Proposition. The divide-and-conquer multiplication algorithm requires Θ(n²) bit operations to multiply two n-bit integers.

Pf. Apply Case 1 of the master theorem to the recurrence

  T(n) = 4 T(n/2) + Θ(n)   [4 recursive calls + add, shift]   ⇒   T(n) = Θ(n²).

Karatsuba trick

To compute the middle term bc + ad, use the identity

  bc + ad = ac + bd − (a − b)(c − d).

  m = ⎡n/2⎤
  a = ⎣x / 2^m⎦,  b = x mod 2^m
  c = ⎣y / 2^m⎦,  d = y mod 2^m

  (2^m a + b)(2^m c + d) = 2^{2m} ac + 2^m (bc + ad) + bd
                         = 2^{2m} ac + 2^m (ac + bd − (a − b)(c − d)) + bd

Bottom line. Only three multiplications of n/2-bit integers.

KARATSUBA-MULTIPLY(x, y, n)
IF (n = 1) RETURN x × y.
ELSE
  m ← ⎡n/2⎤.
  a ← ⎣x / 2^m⎦;  b ← x mod 2^m.
  c ← ⎣y / 2^m⎦;  d ← y mod 2^m.
  e ← KARATSUBA-MULTIPLY(a, c, m).
  f ← KARATSUBA-MULTIPLY(b, d, m).
  g ← KARATSUBA-MULTIPLY(a − b, c − d, m).
  RETURN 2^{2m} e + 2^m (e + f − g) + f.

Karatsuba analysis

Proposition. Karatsuba's algorithm requires O(n^{1.585}) bit operations to multiply two n-bit integers.

Pf. Apply Case 1 of the master theorem to the recurrence

  T(n) = 3 T(n/2) + Θ(n)   ⇒   T(n) = Θ(n^{lg 3}) = O(n^{1.585}).

Practice. Faster than the grade-school algorithm for about 320–640 bits.
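A minimal Python sketch of KARATSUBA-MULTIPLY under the same assumptions as the previous sketch; Python's arbitrary-precision integers absorb the possibly negative intermediate product (a − b)(c − d).

  def karatsuba(x, y, n):
      """Karatsuba product of two n-bit nonnegative integers:
      three recursive half-size multiplications, O(n^1.585) bit operations."""
      if n <= 1:
          return x * y
      m = (n + 1) // 2
      a, b = x >> m, x & ((1 << m) - 1)
      c, d = y >> m, y & ((1 << m) - 1)
      e = karatsuba(a, c, m)                 # ac
      f = karatsuba(b, d, m)                 # bd
      g = karatsuba(a - b, c - d, m)         # (a - b)(c - d), may be negative
      return (e << (2 * m)) + ((e + f - g) << m) + f

  # e.g. karatsuba(0b10001101, 0b11100001, 8) == 141 * 225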
Integer arithmetic reductions

Integer multiplication. Given two n-bit integers, compute their product.

  problem                  arithmetic        running time
  integer multiplication   a × b             Θ(M(n))
  integer division         a / b, a mod b    Θ(M(n))
  integer square           a²                Θ(M(n))
  integer square root      ⎣√a⎦              Θ(M(n))

Integer arithmetic problems with the same complexity M(n) as integer multiplication.

History of asymptotic complexity of integer multiplication

  year   algorithm             order of growth
  —      brute force           Θ(n²)
  1962   Karatsuba–Ofman       Θ(n^{1.585})
  1963   Toom-3, Toom-4        Θ(n^{1.465}), Θ(n^{1.404})
  1966   Toom–Cook             Θ(n^{1 + ε})
  1971   Schönhage–Strassen    Θ(n log n log log n)   [used in Maple, Mathematica, gcc, cryptography, ...]
  2007   Fürer                 n log n 2^{O(log* n)}
  ?      ?                     Θ(n)

Number of bit operations to multiply two n-bit integers.

Remark. The GNU Multiple Precision Library uses one of five different algorithms, depending on the size of the operands.

‣ matrix multiplication (SECTION 4.2)

Dot product

Dot product. Given two length-n vectors a and b, compute c = a ⋅ b.

Grade-school. Θ(n) arithmetic operations.

  a ⋅ b = Σ_{i=1}^{n} a_i b_i

Ex. a = (.70, .20, .10), b = (.30, .40, .30):
  a ⋅ b = (.70 × .30) + (.20 × .40) + (.10 × .30) = .32.

Remark. The grade-school dot product algorithm is asymptotically optimal.

Matrix multiplication

Matrix multiplication. Given two n-by-n matrices A and B, compute C = AB.

Grade-school. Θ(n³) arithmetic operations.

  c_{ij} = Σ_{k=1}^{n} a_{ik} b_{kj}

[3-by-3 numerical example omitted.]

Q. Is the grade-school matrix multiplication algorithm asymptotically optimal?

Block matrix multiplication

Matrix multiplication works block by block. Ex. Partition two 4-by-4 matrices A (entries 0–15, row-major) and B (entries 16–31) into 2-by-2 blocks; then

  C₁₁ = A₁₁ × B₁₁ + A₁₂ × B₂₁ = [[0, 1], [4, 5]] × [[16, 17], [20, 21]] + [[2, 3], [6, 7]] × [[24, 25], [28, 29]] = [[152, 158], [504, 526]].

Matrix multiplication: warmup

To multiply two n-by-n matrices A and B:
- Divide: partition A and B into ½n-by-½n blocks.
- Conquer: multiply 8 pairs of ½n-by-½n matrices, recursively.
- Combine: add the appropriate products using 4 matrix additions.

  C₁₁ = (A₁₁ × B₁₁) + (A₁₂ × B₂₁)
  C₁₂ = (A₁₁ × B₁₂) + (A₁₂ × B₂₂)
  C₂₁ = (A₂₁ × B₁₁) + (A₂₂ × B₂₁)
  C₂₂ = (A₂₁ × B₁₂) + (A₂₂ × B₂₂)

Running time. Apply Case 1 of the master theorem:

  T(n) = 8 T(n/2) + Θ(n²)   [recursive calls + add, form submatrices]   ⇒   T(n) = Θ(n³).

Strassen's trick

Key idea. Multiply 2-by-2 block matrices with only 7 multiplications (plus 11 additions and 7 subtractions).

  P₁ ← A₁₁ × (B₁₂ − B₂₂)             C₁₁ = P₅ + P₄ − P₂ + P₆
  P₂ ← (A₁₁ + A₁₂) × B₂₂             C₁₂ = P₁ + P₂
  P₃ ← (A₂₁ + A₂₂) × B₁₁             C₂₁ = P₃ + P₄
  P₄ ← A₂₂ × (B₂₁ − B₁₁)             C₂₂ = P₁ + P₅ − P₃ − P₇
  P₅ ← (A₁₁ + A₂₂) × (B₁₁ + B₂₂)
  P₆ ← (A₁₂ − A₂₂) × (B₂₁ + B₂₂)
  P₇ ← (A₁₁ − A₂₁) × (B₁₁ + B₁₂)

Pf (of the C₁₂ identity). C₁₂ = P₁ + P₂ = A₁₁ × (B₁₂ − B₂₂) + (A₁₁ + A₁₂) × B₂₂ = A₁₁ × B₁₂ + A₁₂ × B₂₂. ✔

Strassen's algorithm

STRASSEN(n, A, B)   [assume n is a power of 2]
IF (n = 1) RETURN A × B.
Partition A and B into 2-by-2 block matrices.   [keep track of the indices of submatrices; don't copy matrix entries]
P₁ ← STRASSEN(n/2, A₁₁, B₁₂ − B₂₂).
P₂ ← STRASSEN(n/2, A₁₁ + A₁₂, B₂₂).
P₃ ← STRASSEN(n/2, A₂₁ + A₂₂, B₁₁).
P₄ ← STRASSEN(n/2, A₂₂, B₂₁ − B₁₁).
P₅ ← STRASSEN(n/2, A₁₁ + A₂₂, B₁₁ + B₂₂).
P₆ ← STRASSEN(n/2, A₁₂ − A₂₂, B₂₁ + B₂₂).
P₇ ← STRASSEN(n/2, A₁₁ − A₂₁, B₁₁ + B₁₂).
C₁₁ = P₅ + P₄ − P₂ + P₆.
C₁₂ = P₁ + P₂.
C₂₁ = P₃ + P₄.
C₂₂ = P₁ + P₅ − P₃ − P₇.
RETURN C.

Analysis of Strassen's algorithm

Theorem. Strassen's algorithm requires O(n^{2.81}) arithmetic operations to multiply two n-by-n matrices.

Pf. Apply Case 1 of the master theorem to the recurrence

  T(n) = 7 T(n/2) + Θ(n²)   [recursive calls + add, subtract]   ⇒   T(n) = Θ(n^{log₂ 7}) = O(n^{2.81}).

Q. What if n is not a power of 2?
A. Could pad the matrices with zeros.
[4-by-4 padding example omitted.]

Strassen's algorithm: practice

Implementation issues.
- Sparsity.
- Caching effects.
- Numerical stability.
- Odd matrix dimensions.
- Crossover to the classical algorithm when n is "small."

Common misperception. "Strassen is only a theoretical curiosity."
- Apple reports an 8x speedup on the G4 Velocity Engine when n ≈ 2,048.
- The range of instances where it's useful is a subject of controversy.
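A compact sketch of STRASSEN using NumPy arrays for the block arithmetic, assuming n is a power of 2 (pad with zeros otherwise). For clarity it copies submatrices rather than tracking indices, contrary to the slide's advice, and it recurses all the way to 1-by-1 blocks instead of crossing over to the classical algorithm.

  import numpy as np

  def strassen(A, B):
      """Strassen product of two n-by-n matrices, n a power of 2.
      O(n^2.81) arithmetic operations."""
      n = A.shape[0]
      if n == 1:
          return A * B
      m = n // 2
      A11, A12, A21, A22 = A[:m, :m], A[:m, m:], A[m:, :m], A[m:, m:]
      B11, B12, B21, B22 = B[:m, :m], B[:m, m:], B[m:, :m], B[m:, m:]
      P1 = strassen(A11, B12 - B22)
      P2 = strassen(A11 + A12, B22)
      P3 = strassen(A21 + A22, B11)
      P4 = strassen(A22, B21 - B11)
      P5 = strassen(A11 + A22, B11 + B22)
      P6 = strassen(A12 - A22, B21 + B22)
      P7 = strassen(A11 - A21, B11 + B12)
      C = np.empty_like(A)
      C[:m, :m] = P5 + P4 - P2 + P6      # C11
      C[:m, m:] = P1 + P2                # C12
      C[m:, :m] = P3 + P4                # C21
      C[m:, m:] = P1 + P5 - P3 - P7      # C22
      return C

  # Sanity check against the grade-school algorithm:
  # A = np.arange(16).reshape(4, 4); B = np.arange(16, 32).reshape(4, 4)
  # np.array_equal(strassen(A, B), A @ B)  ->  True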
Linear algebra reductions

Matrix multiplication. Given two n-by-n matrices, compute their product.

  problem                      linear algebra      order of growth
  matrix multiplication        A × B               Θ(MM(n))
  matrix inversion             A⁻¹                 Θ(MM(n))
  determinant                  |A|                 Θ(MM(n))
  system of linear equations   Ax = b              Θ(MM(n))
  LU decomposition             A = L U             Θ(MM(n))
  least squares                min ‖Ax − b‖₂       Θ(MM(n))

Numerical linear algebra problems with the same complexity MM(n) as matrix multiplication.

Fast matrix multiplication: theory

Q. Multiply two 2-by-2 matrices with 7 scalar multiplications?
A. Yes. [Strassen 1969]   ⇒ Θ(n^{log₂ 7}) = O(n^{2.807}).

Q. Multiply two 2-by-2 matrices with 6 scalar multiplications?
A. Impossible. [Hopcroft and Kerr 1971]   [Would have given Θ(n^{log₂ 6}) = O(n^{2.59}).]

Q. Multiply two 3-by-3 matrices with 21 scalar multiplications?
A. Unknown.   [Would give Θ(n^{log₃ 21}) = O(n^{2.77}).]

Begun, the decimal wars have. [Pan, Bini et al., Schönhage, …]
- Two 20-by-20 matrices with 4,460 scalar multiplications: O(n^{2.805}).
- Two 48-by-48 matrices with 47,217 scalar multiplications: O(n^{2.7801}).
- A year later: O(n^{2.7799}).
- December 1979: O(n^{2.521813}).
- January 1980: O(n^{2.521801}).

History of asymptotic complexity of matrix multiplication

  year   algorithm               order of growth
  —      brute force             O(n³)
  1969   Strassen                O(n^{2.808})
  1978   Pan                     O(n^{2.796})
  1979   Bini                    O(n^{2.780})
  1981   Schönhage               O(n^{2.522})
  1982   Romani                  O(n^{2.517})
  1982   Coppersmith–Winograd    O(n^{2.496})
  1986   Strassen                O(n^{2.479})
  1989   Coppersmith–Winograd    O(n^{2.376})
  2010   Stothers                O(n^{2.3737})
  2011   Williams                O(n^{2.3727})
  ?      ?                       O(n^{2 + ε})

Number of floating-point operations to multiply two n-by-n matrices.

‣ convolution and FFT (SECTION 5.6)

Fourier analysis

Fourier theorem. [Fourier, Dirichlet, Riemann] Any (sufficiently smooth) periodic function can be expressed as the sum of a series of sinusoids, e.g.

  y(t) = (2/π) Σ_{k=1}^{N} sin(kt) / k.

[Plots of the partial sums for N = 1, 5, 10, 100 omitted.]

Euler's identity

Euler's identity. e^{ix} = cos x + i sin x.
Sinusoids. Sum of sines and cosines = sum of complex exponentials.

Time domain vs. frequency domain

Signal. [touch-tone button 1]

  y(t) = ½ sin(2π ⋅ 697 t) + ½ sin(2π ⋅ 1209 t)

Time domain: sound pressure as a function of time. Frequency domain: amplitude as a function of frequency (two spikes of amplitude 0.5, at 697 Hz and 1209 Hz).

Signal. [speech recording, 8192 samples per second] Magnitude of its discrete Fourier transform.

Reference: Cleve Moler, Numerical Computing with MATLAB.
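To make the time-domain/frequency-domain picture concrete, here is a small NumPy sketch (not from the slides) that samples the touch-tone signal above and reads off its two frequency peaks; the one-second duration is an arbitrary choice that makes each DFT bin exactly 1 Hz wide.

  import numpy as np

  fs = 8192                                   # samples per second
  t = np.arange(0, 1, 1 / fs)                 # one second of samples
  y = 0.5 * np.sin(2 * np.pi * 697 * t) + 0.5 * np.sin(2 * np.pi * 1209 * t)

  spectrum = np.abs(np.fft.rfft(y))           # magnitude of the DFT
  freqs = np.fft.rfftfreq(len(y), 1 / fs)     # frequency of each DFT bin
  peaks = freqs[np.argsort(spectrum)[-2:]]    # two strongest components
  print(sorted(peaks))                        # [697.0, 1209.0]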
Fast Fourier transform

FFT. A fast way to convert between the time domain and the frequency domain.
Alternate viewpoint. A fast way to multiply and evaluate polynomials. [We take this approach.]

"If you speed up any nontrivial algorithm by a factor of a million or so, the world will beat a path towards finding useful applications for it." — Numerical Recipes

Fast Fourier transform: applications

Applications. Optics, acoustics, quantum physics, telecommunications, radar, control systems, signal processing, speech recognition, data compression, image processing, seismology, mass spectrometry, …
- Digital media. [DVD, JPEG, MP3, H.264]
- Medical diagnostics. [MRI, CT, PET scans, ultrasound]
- Numerical solutions to Poisson's equation.
- Integer and polynomial multiplication.
- Shor's quantum factoring algorithm.
- …

"The FFT is one of the truly great computational developments of the 20th century. It has changed the face of science and engineering so much that it is not an exaggeration to say that life as we know it would be very different without the FFT." — Charles van Loan

Fast Fourier transform: brief history

- Gauss (1805, 1866). Analyzed the periodic motion of the asteroid Ceres.
- Runge–König (1924). Laid theoretical groundwork.
- Danielson–Lanczos (1942). Efficient algorithm, x-ray crystallography.
- Cooley–Tukey (1965). Monitoring nuclear tests in the Soviet Union and tracking submarines. Rediscovered and popularized the FFT. [The paper was published only after IBM lawyers decided not to set a precedent of patenting numerical algorithms; the conventional wisdom was that nobody could make money selling software.]

Importance not fully realized until the advent of digital computers.

Polynomials: coefficient representation

Polynomial. [coefficient representation]

  A(x) = a₀ + a₁x + a₂x² + … + a_{n−1}x^{n−1}
  B(x) = b₀ + b₁x + b₂x² + … + b_{n−1}x^{n−1}

Add. O(n) arithmetic operations.

  A(x) + B(x) = (a₀ + b₀) + (a₁ + b₁)x + … + (a_{n−1} + b_{n−1})x^{n−1}

Evaluate. O(n) using Horner's method.

  A(x) = a₀ + x(a₁ + x(a₂ + … + x(a_{n−2} + x a_{n−1})…))

Multiply (convolve). O(n²) using brute force.

  A(x) × B(x) = Σ_{i=0}^{2n−2} c_i x^i,  where  c_i = Σ_{j=0}^{i} a_j b_{i−j}
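A short sketch of the two coefficient-representation operations just described: Horner evaluation and brute-force convolution. Coefficient lists are assumed to be in order a₀, a₁, …, a_{n−1}.

  def horner(a, x):
      """Evaluate a[0] + a[1] x + ... + a[n-1] x^(n-1) with n-1 multiplications."""
      result = 0
      for coeff in reversed(a):
          result = result * x + coeff
      return result

  def convolve(a, b):
      """Brute-force product of two coefficient vectors: Theta(n^2) operations."""
      c = [0] * (len(a) + len(b) - 1)
      for i, ai in enumerate(a):
          for j, bj in enumerate(b):
              c[i + j] += ai * bj
      return c

  # (1 + 2x)(3 + 4x) = 3 + 10x + 8x^2
  assert convolve([1, 2], [3, 4]) == [3, 10, 8]
  assert horner([3, 10, 8], 2) == 55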
A modest Ph.D. dissertation title

"Demonstratio nova theorematis omnem functionem algebraicam rationalem integram unius variabilis in factores reales primi vel secundi gradus resolvi posse." — Carl Friedrich Gauss, Helmstedt, 1799.

["New proof of the theorem that every algebraic rational integral function in one variable can be resolved into real factors of the first or the second degree."]

Polynomials: point-value representation

Fundamental theorem of algebra. A degree-n polynomial with complex coefficients has exactly n complex roots.

Corollary. A degree n − 1 polynomial A(x) is uniquely specified by its evaluation at n distinct values of x.
[Plot of y = A(x) passing through the points (x_j, y_j) omitted.]

Polynomial. [point-value representation]

  A(x): (x₀, y₀), …, (x_{n−1}, y_{n−1})
  B(x): (x₀, z₀), …, (x_{n−1}, z_{n−1})

Add. O(n) arithmetic operations.

  A(x) + B(x): (x₀, y₀ + z₀), …, (x_{n−1}, y_{n−1} + z_{n−1})

Multiply (convolve). O(n), but need 2n − 1 points.

  A(x) × B(x): (x₀, y₀ × z₀), …, (x_{2n−2}, y_{2n−2} × z_{2n−2})

Evaluate. O(n²) using Lagrange's formula.

  A(x) = Σ_{k=0}^{n−1} y_k · Π_{j≠k}(x − x_j) / Π_{j≠k}(x_k − x_j)

Converting between the two representations

Tradeoff. Fast evaluation or fast multiplication. We want both.

  representation   multiply   evaluate
  coefficient      O(n²)      O(n)
  point-value      O(n)       O(n²)

Goal. Efficient conversion between the two representations ⇒ all ops fast.

  a₀, a₁, …, a_{n−1}  [coefficient representation]   ⟷   (x₀, y₀), …, (x_{n−1}, y_{n−1})  [point-value representation]

Converting between the two representations: brute force

Coefficient ⇒ point-value. Given a polynomial a₀ + a₁x + … + a_{n−1}x^{n−1}, evaluate it at n distinct points x₀, …, x_{n−1}: compute y = V a, where V is the Vandermonde matrix whose row j is (1, x_j, x_j², …, x_j^{n−1}).

Running time. O(n²) for the matrix-vector multiply (or n Horner evaluations).

Point-value ⇒ coefficient. Given n distinct points x₀, …, x_{n−1} and values y₀, …, y_{n−1}, find the unique polynomial a₀ + a₁x + … + a_{n−1}x^{n−1} that has the given values at the given points: solve the same Vandermonde system V a = y for a. [The Vandermonde matrix is invertible iff the x_i are distinct.]

Running time. O(n³) for Gaussian elimination [or O(n^{2.3727}) via fast matrix multiplication].

Divide-and-conquer

Decimation in frequency. Break up the polynomial into low and high powers.
  A(x) = a₀ + a₁x + a₂x² + a₃x³ + a₄x⁴ + a₅x⁵ + a₆x⁶ + a₇x⁷.
  A_low(x) = a₀ + a₁x + a₂x² + a₃x³.
  A_high(x) = a₄ + a₅x + a₆x² + a₇x³.
  A(x) = A_low(x) + x⁴ A_high(x).

Decimation in time. Break up the polynomial into even and odd powers.
  A(x) = a₀ + a₁x + a₂x² + a₃x³ + a₄x⁴ + a₅x⁵ + a₆x⁶ + a₇x⁷.
  A_even(x) = a₀ + a₂x + a₄x² + a₆x³.
  A_odd(x) = a₁ + a₃x + a₅x² + a₇x³.
  A(x) = A_even(x²) + x A_odd(x²).

Coefficient to point-value representation: intuition

Coefficient ⇒ point-value. Given a polynomial a₀ + a₁x + … + a_{n−1}x^{n−1}, evaluate it at n distinct points x₀, …, x_{n−1}. [We get to choose which ones.]

Divide. Break up the polynomial into even and odd powers, so that
  A(x)  = A_even(x²) + x A_odd(x²),
  A(−x) = A_even(x²) − x A_odd(x²).

Intuition. Choose the two points to be ±1:
  A(+1) = A_even(1) + 1 ⋅ A_odd(1),
  A(−1) = A_even(1) − 1 ⋅ A_odd(1).
Can evaluate a polynomial of degree ≤ n at 2 points by evaluating two polynomials of degree ≤ ½n at 1 point.
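A tiny illustrative sketch (not from the slides) that checks the even/odd identity A(x) = A_even(x²) + x ⋅ A_odd(x²) and the paired evaluations at ±x, reusing the horner helper defined earlier.

  def split_eval(a, x):
      """Evaluate A at +x and -x with one even/odd split."""
      even, odd = a[0::2], a[1::2]              # coefficients of A_even and A_odd
      e, o = horner(even, x * x), horner(odd, x * x)
      return e + x * o, e - x * o               # A(x), A(-x)

  a = [1, 2, 3, 4, 5, 6, 7, 8]                  # a0 .. a7
  assert split_eval(a, 1) == (horner(a, 1), horner(a, -1))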
Intuition (continued). Choosing the four points ±1, ±i,
  A(+1) = A_even(1) + 1 ⋅ A_odd(1),
  A(−1) = A_even(1) − 1 ⋅ A_odd(1),
  A(+i) = A_even(−1) + i ⋅ A_odd(−1),
  A(−i) = A_even(−1) − i ⋅ A_odd(−1),
we can evaluate a polynomial of degree ≤ n at 4 points by evaluating two polynomials of degree ≤ ½n at 2 points.

Discrete Fourier transform

Coefficient ⇒ point-value. Given a polynomial a₀ + a₁x + … + a_{n−1}x^{n−1}, evaluate it at n distinct points x₀, …, x_{n−1}.

Key idea. Choose x_k = ω^k, where ω is the principal nth root of unity. Then y = F_n a, where F_n is the n-by-n Fourier matrix with entries (F_n)_{kj} = ω^{kj}; this matrix-vector product is the discrete Fourier transform (DFT):

  y_k = Σ_{j=0}^{n−1} ω^{kj} a_j.

Roots of unity

Def. An nth root of unity is a complex number x such that xⁿ = 1.

Fact. The nth roots of unity are ω⁰, ω¹, …, ω^{n−1}, where ω = e^{2πi/n}.
Pf. (ω^k)ⁿ = (e^{2πik/n})ⁿ = (e^{πi})^{2k} = (−1)^{2k} = 1.

Fact. The (½n)th roots of unity are ν⁰, ν¹, …, ν^{n/2−1}, where ν = ω² = e^{4πi/n}.

[Diagram of the case n = 8 omitted: ω⁰ = ν⁰ = 1, ω² = ν¹ = i, ω⁴ = ν² = −1, ω⁶ = ν³ = −i.]

Fast Fourier transform

Goal. Evaluate a degree n − 1 polynomial A(x) = a₀ + … + a_{n−1}x^{n−1} at its nth roots of unity: ω⁰, ω¹, …, ω^{n−1}.

Divide. Break up the polynomial into even and odd powers:
  A_even(x) = a₀ + a₂x + a₄x² + … + a_{n−2}x^{n/2 − 1}.
  A_odd(x) = a₁ + a₃x + a₅x² + … + a_{n−1}x^{n/2 − 1}.
  A(x) = A_even(x²) + x A_odd(x²).

Conquer. Evaluate A_even(x) and A_odd(x) at the (½n)th roots of unity: ν⁰, ν¹, …, ν^{n/2−1}.

Combine. Since ν^k = (ω^k)² = (ω^{k + ½n})²,
  A(ω^k)        = A_even(ν^k) + ω^k A_odd(ν^k),   0 ≤ k < n/2,
  A(ω^{k + ½n}) = A_even(ν^k) − ω^k A_odd(ν^k),   0 ≤ k < n/2.

FFT: implementation

FFT(n, a₀, a₁, a₂, …, a_{n−1})
IF (n = 1) RETURN a₀.
(e₀, e₁, …, e_{n/2−1}) ← FFT(n/2, a₀, a₂, a₄, …, a_{n−2}).
(d₀, d₁, …, d_{n/2−1}) ← FFT(n/2, a₁, a₃, a₅, …, a_{n−1}).
FOR k = 0 TO n/2 − 1:
  ω^k ← e^{2πik/n}.
  y_k ← e_k + ω^k d_k.
  y_{k + n/2} ← e_k − ω^k d_k.
RETURN (y₀, y₁, y₂, …, y_{n−1}).

FFT: summary

Theorem. The FFT algorithm evaluates a degree n − 1 polynomial at each of the nth roots of unity in O(n log n) steps and O(n) extra space. [Assumes n is a power of 2.]

Pf. T(n) = 2 T(n/2) + Θ(n) ⇒ T(n) = Θ(n log n).

  coefficient representation  — O(n log n) —→  point-value representation

FFT: recursion tree

The recursion repeatedly applies an inverse perfect shuffle to the coefficients:

  a₀, a₁, a₂, a₃, a₄, a₅, a₆, a₇
  a₀, a₂, a₄, a₆ | a₁, a₃, a₅, a₇
  a₀, a₄ | a₂, a₆ | a₁, a₅ | a₃, a₇
  a₀ | a₄ | a₂ | a₆ | a₁ | a₅ | a₃ | a₇

The leaves appear in "bit-reversed" order: 000, 100, 010, 110, 001, 101, 011, 111.
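A direct Python transcription of the FFT pseudocode above: a teaching sketch that assumes n is a power of 2 and allocates new lists at every level (production code would use an iterative, in-place version).

  import cmath

  def fft(a):
      """Evaluate the polynomial with coefficients a at the n-th roots of unity.
      n = len(a) must be a power of 2. Returns [A(w^0), ..., A(w^{n-1})]."""
      n = len(a)
      if n == 1:
          return list(a)
      e = fft(a[0::2])                           # A_even at the (n/2)-th roots of unity
      d = fft(a[1::2])                           # A_odd  at the (n/2)-th roots of unity
      y = [0] * n
      for k in range(n // 2):
          w = cmath.exp(2j * cmath.pi * k / n)   # w^k
          y[k] = e[k] + w * d[k]
          y[k + n // 2] = e[k] - w * d[k]
      return y

  # e.g. fft([1, 1, 1, 1]) is approximately [4, 0, 0, 0]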
Inverse discrete Fourier transform

Point-value ⇒ coefficient. Given n distinct points x₀, …, x_{n−1} and values y₀, …, y_{n−1}, find the unique polynomial a₀ + a₁x + … + a_{n−1}x^{n−1} that has the given values at the given points. With x_k = ω^k this is the inverse DFT: a = F_n⁻¹ y.

Claim. The inverse of the Fourier matrix F_n is given by the following formula:

  G_n = (1/n) [ω^{−kj}]_{k, j = 0, …, n−1},

i.e., the Fourier matrix with ω replaced by ω⁻¹, scaled by 1/n. [Equivalently, F_n / √n is a unitary matrix.]

Consequence. To compute the inverse FFT, apply the same algorithm but use ω⁻¹ = e^{−2πi/n} as the principal nth root of unity (and divide the result by n).

Inverse FFT: proof of correctness

Claim. F_n and G_n are inverses.
Pf.

  (F_n G_n)_{k k′} = (1/n) Σ_{j=0}^{n−1} ω^{kj} ω^{−j k′} = (1/n) Σ_{j=0}^{n−1} ω^{(k − k′) j} = 1 if k = k′, and 0 otherwise,

by the summation lemma below.

Summation lemma. Let ω be a principal nth root of unity. Then

  Σ_{j=0}^{n−1} ω^{kj} = n if k ≡ 0 (mod n), and 0 otherwise.

Pf.
- If k is a multiple of n, then ω^k = 1 ⇒ the series sums to n.
- Each nth root of unity ω^k is a root of xⁿ − 1 = (x − 1)(1 + x + x² + … + x^{n−1}).
- So if ω^k ≠ 1, then 1 + ω^k + ω^{2k} + … + ω^{(n−1)k} = 0 ⇒ the series sums to 0. ▪

Inverse FFT: implementation

Note. Need to divide the result by n.

INVERSE-FFT(n, y₀, y₁, y₂, …, y_{n−1})   [switch the roles of the a_i and y_i]
IF (n = 1) RETURN y₀.
(e₀, e₁, …, e_{n/2−1}) ← INVERSE-FFT(n/2, y₀, y₂, y₄, …, y_{n−2}).
(d₀, d₁, …, d_{n/2−1}) ← INVERSE-FFT(n/2, y₁, y₃, y₅, …, y_{n−1}).
FOR k = 0 TO n/2 − 1:
  ω^k ← e^{−2πik/n}.
  a_k ← (e_k + ω^k d_k).
  a_{k + n/2} ← (e_k − ω^k d_k).
RETURN (a₀, a₁, a₂, …, a_{n−1}).

Inverse FFT: summary

Theorem. The inverse FFT algorithm interpolates a degree n − 1 polynomial given its values at each of the nth roots of unity in O(n log n) steps. [Assumes n is a power of 2.]

  coefficient representation  — O(n log n) (FFT) —→  point-value representation
  point-value representation  — O(n log n) (inverse FFT) —→  coefficient representation

Polynomial multiplication

Theorem. Can multiply two degree n − 1 polynomials in O(n log n) steps. [Pad with 0s to make n a power of 2.]

Pf.
- Two FFTs, O(n log n): evaluate A and B at the (2n)th roots of unity, giving A(ω⁰), …, A(ω^{2n−1}) and B(ω⁰), …, B(ω^{2n−1}).
- Pointwise multiplication, O(n): C(ω^k) = A(ω^k) × B(ω^k).
- One inverse FFT, O(n log n): interpolate to recover the coefficients c₀, c₁, …, c_{2n−2}.

FFT in practice

Fastest Fourier transform in the West. [Frigo and Johnson]
- Optimized C library.
- Features: DFT, DCT, real, complex, any size, any dimension.
- Won the 1999 Wilkinson Prize for Numerical Software.
- Portable, competitive with vendor-tuned code.

Implementation details.
- The core algorithm is a nonrecursive version of Cooley–Tukey.
- Instead of executing a predetermined algorithm, it evaluates your hardware and uses a special-purpose compiler to generate an optimized algorithm catered to the "shape" of the problem.
- Runs in O(n log n) time, even when n is prime.
- Multidimensional FFTs.

http://www.fftw.org

Integer multiplication, redux

Integer multiplication. Given two n-bit integers a = a_{n−1} … a₁a₀ and b = b_{n−1} … b₁b₀, compute their product a ⋅ b.

Convolution algorithm.
- Form two polynomials A(x) = a₀ + a₁x + a₂x² + … + a_{n−1}x^{n−1} and B(x) = b₀ + b₁x + b₂x² + … + b_{n−1}x^{n−1}. [Note: a = A(2), b = B(2).]
- Compute C(x) = A(x) ⋅ B(x).
- Evaluate C(2) = a ⋅ b.
- Running time: O(n log n) complex arithmetic operations.

Theory. [Schönhage–Strassen 1971] O(n log n log log n) bit operations.
Theory. [Fürer 2007] n log n 2^{O(log* n)} bit operations.
Practice. [GNU Multiple Precision Arithmetic Library] Uses brute force, Karatsuba, or FFT, depending on the size of n.
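A sketch of the convolution algorithm for polynomial multiplication, reusing the fft function defined earlier. The inverse transform is computed here via the standard conjugation identity F⁻¹(y) = conj(F(conj(y))) / n rather than by re-deriving the ω⁻¹ recursion; padding to a power of 2 is the assumption stated in the proof above.

  def poly_multiply(a, b):
      """Multiply coefficient vectors a and b in O(n log n) using the FFT.
      Pads with zeros to the next power of 2 that holds all len(a)+len(b)-1
      product coefficients. Integer coefficients assumed."""
      m = len(a) + len(b) - 1
      size = 1
      while size < m:
          size *= 2
      fa = fft(list(a) + [0] * (size - len(a)))
      fb = fft(list(b) + [0] * (size - len(b)))
      fc = [x * y for x, y in zip(fa, fb)]               # pointwise multiplication
      c = [v.conjugate() / size                          # inverse FFT via conjugation
           for v in fft([x.conjugate() for x in fc])]
      return [round(v.real) for v in c[:m]]              # rounding removes float error

  # (1 + 2x)(3 + 4x) = 3 + 10x + 8x^2
  print(poly_multiply([1, 2], [3, 4]))                   # [3, 10, 8]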