Lecture notes: Dynamic Programming

6. DYNAMIC PROGRAMMING II

‣ sequence alignment
‣ Hirschberg's algorithm
‣ Bellman-Ford algorithm
‣ distance vector protocols
‣ negative cycles in a digraph

Lecture slides by Kevin Wayne. Copyright © 2005 Pearson-Addison Wesley. Copyright © 2013 Kevin Wayne. http://www.cs.princeton.edu/~wayne/kleinberg-tardos. Last updated on Oct 11, 2014, 9:26 AM.

SECTION 6.6

String similarity

Q. How similar are two strings?
Ex. ocurrance and occurrence.

  o c u r r a n c e –
  o c c u r r e n c e      (6 mismatches, 1 gap)

  o c – u r r a n c e
  o c c u r r e n c e      (1 mismatch, 1 gap)

  o c – u r r – a n c e
  o c c u r r e – n c e    (0 mismatches, 3 gaps)

Edit distance

Edit distance. [Levenshtein 1966, Needleman-Wunsch 1970]
‣ Gap penalty δ; mismatch penalty α_pq.
‣ Cost = sum of gap and mismatch penalties.

  C T – G A C C T A C G
  C T G G A C G A A C G    cost = δ + α_CG + α_TA

Applications. Unix diff, speech recognition, computational biology, ...

Sequence alignment

Goal. Given two strings x_1 x_2 ... x_m and y_1 y_2 ... y_n, find a min-cost alignment.

Def. An alignment M is a set of ordered pairs x_i–y_j such that each item occurs in at most one pair and there are no crossings. Pairs x_i–y_j and x_i'–y_j' cross if i < i' but j > j'.

Def. The cost of an alignment M is

  cost(M) = Σ_{(x_i, y_j) ∈ M} α_{x_i y_j}                        [mismatch]
          + Σ_{i : x_i unmatched} δ + Σ_{j : y_j unmatched} δ     [gap]

Ex. An alignment of CTACCG and TACATG:

  C T A C C – G
  – T A C A T G

  M = { x_2–y_1, x_3–y_2, x_4–y_3, x_5–y_4, x_6–y_6 }

Sequence alignment: problem structure

Def. OPT(i, j) = min cost of aligning prefix strings x_1 x_2 ... x_i and y_1 y_2 ... y_j.

Case 1. OPT matches x_i–y_j.
‣ Pay mismatch for x_i–y_j + min cost of aligning x_1 ... x_{i−1} and y_1 ... y_{j−1}.

Case 2a. OPT leaves x_i unmatched.
‣ Pay gap for x_i + min cost of aligning x_1 ... x_{i−1} and y_1 ... y_j.

Case 2b. OPT leaves y_j unmatched.
‣ Pay gap for y_j + min cost of aligning x_1 ... x_i and y_1 ... y_{j−1}.

(Optimal substructure property; proof via exchange argument.)

  OPT(i, j) = jδ                                    if i = 0
            = iδ                                    if j = 0
            = min { α_{x_i y_j} + OPT(i−1, j−1),
                    δ + OPT(i−1, j),
                    δ + OPT(i, j−1) }               otherwise

Sequence alignment: algorithm

SEQUENCE-ALIGNMENT (m, n, x_1, ..., x_m, y_1, ..., y_n, δ, α)
  FOR i = 0 TO m
    M[i, 0] ← iδ.
  FOR j = 0 TO n
    M[0, j] ← jδ.
  FOR i = 1 TO m
    FOR j = 1 TO n
      M[i, j] ← min(α_{x_i y_j} + M[i−1, j−1],
                    δ + M[i−1, j],
                    δ + M[i, j−1]).
  RETURN M[m, n].
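The recurrence and the SEQUENCE-ALIGNMENT pseudocode above translate almost line for line into code. Below is a minimal sketch in Python (ours, not from the slides), where `gap` plays the role of δ and `mismatch(p, q)` the role of α_pq:

```python
def sequence_alignment(x, y, gap, mismatch):
    """Fill the DP table M, where M[i][j] = OPT(i, j)."""
    m, n = len(x), len(y)
    M = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        M[i][0] = i * gap              # aligning x_1..x_i with the empty string
    for j in range(n + 1):
        M[0][j] = j * gap              # aligning the empty string with y_1..y_j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            M[i][j] = min(
                mismatch(x[i - 1], y[j - 1]) + M[i - 1][j - 1],  # match x_i with y_j
                gap + M[i - 1][j],                               # leave x_i unmatched
                gap + M[i][j - 1],                               # leave y_j unmatched
            )
    return M
```

With δ = 1 and α_pq = 1 for p ≠ q (0 otherwise), M[m][n] is the classic Levenshtein edit distance; for ocurrance vs. occurrence it equals 2, matching the "1 mismatch, 1 gap" alignment above.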
Sequence alignment: analysis

Theorem. The dynamic programming algorithm computes the edit distance (and an optimal alignment) of two strings of length m and n in Θ(mn) time and Θ(mn) space.
Pf. The algorithm computes the edit distance. Trace back through the table to extract the optimal alignment itself. ▪

Q. Can we avoid using quadratic space?
A. Easy to compute the optimal value in O(mn) time and O(m + n) space: compute OPT(i, •) from OPT(i − 1, •). But it is no longer easy to recover the optimal alignment itself.

SECTION 6.7

Sequence alignment in linear space

Theorem. There exists an algorithm to find an optimal alignment in O(mn) time and O(m + n) space.
‣ Clever combination of divide-and-conquer and dynamic programming.
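The O(m + n)-space computation of the optimal value described above, keeping only the rows OPT(i − 1, •) and OPT(i, •), can be sketched as follows (a hypothetical Python illustration; `gap` and `mismatch` again stand in for δ and α):

```python
def alignment_cost(x, y, gap, mismatch):
    """Optimal alignment cost in O(m + n) space (value only, no traceback)."""
    n = len(y)
    prev = [j * gap for j in range(n + 1)]   # row OPT(0, .)
    for i in range(1, len(x) + 1):
        cur = [i * gap] + [0] * n            # OPT(i, 0) = i * gap
        for j in range(1, n + 1):
            cur[j] = min(
                mismatch(x[i - 1], y[j - 1]) + prev[j - 1],
                gap + prev[j],
                gap + cur[j - 1],
            )
        prev = cur                           # discard row i - 1
    return prev[n]
```

Because each row is overwritten, the traceback information is gone: this is precisely why recovering the alignment itself needs the divide-and-conquer idea that follows.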
‣ Inspired by an idea of Savitch from complexity theory.

[The slides reproduce the first page of D. S. Hirschberg, "A Linear Space Algorithm for Computing Maximal Common Subsequences," Communications of the ACM 18(6), June 1975, p. 341, which solves the longest-common-subsequence problem in quadratic time and linear space.]

Hirschberg's algorithm

Edit distance graph. Let f(i, j) be the length of a shortest path from (0, 0) to (i, j).
[Figure: the edit-distance graph on nodes (i, j), 0 ≤ i ≤ m, 0 ≤ j ≤ n, with diagonal edges of weight α_{x_i y_j} and horizontal/vertical edges of weight δ.]

Lemma. f(i, j) = OPT(i, j) for all i and j.

Pf of Lemma. [by strong induction on i + j]
‣ Base case: f(0, 0) = OPT(0, 0) = 0.
‣ Inductive hypothesis: assume true for all (i', j') with i' + j' < i + j.
‣ The last edge on a shortest path to (i, j) comes from (i − 1, j − 1), (i − 1, j), or (i, j − 1). Thus,

  f(i, j) = min { α_{x_i y_j} + f(i−1, j−1), δ + f(i−1, j), δ + f(i, j−1) }
          = min { α_{x_i y_j} + OPT(i−1, j−1), δ + OPT(i−1, j), δ + OPT(i, j−1) }
          = OPT(i, j)   ▪

(The second equality uses the inductive hypothesis; the third is the Bellman equation for OPT.)

Can compute f(•, j) for any j in O(mn) time and O(m + n) space.

Edit distance graph. Let g(i, j) be the length of a shortest path from (i, j) to (m, n).
‣ Can compute g by reversing the edge orientations and inverting the roles of (0, 0) and (m, n).
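The quantities f(•, j) and g(•, j) can each be computed with the linear-space scan from the previous section; for g, reversing both strings corresponds to reversing the edge orientations. A hedged sketch (function names and signatures are ours, not the slides'):

```python
def forward_scores(x, y, j, gap, mismatch):
    """f(i, j) for i = 0..m: min cost of aligning x[:i] with y[:j]."""
    col = y[:j]
    prev = [c * gap for c in range(j + 1)]       # row OPT(0, .)
    f = [prev[j]]
    for i in range(1, len(x) + 1):
        cur = [i * gap] + [0] * j
        for c in range(1, j + 1):
            cur[c] = min(mismatch(x[i - 1], col[c - 1]) + prev[c - 1],
                         gap + prev[c], gap + cur[c - 1])
        prev = cur
        f.append(prev[j])                        # record f(i, j)
    return f

def backward_scores(x, y, j, gap, mismatch):
    """g(i, j) for i = 0..m: min cost of aligning x[i:] with y[j:]."""
    # Reversing both strings turns the backward problem into a forward one.
    rev = forward_scores(x[::-1], y[::-1], len(y) - j, gap, mismatch)
    return rev[::-1]
```

Since every path from (0, 0) to (m, n) crosses column j at some row i, the minimum over i of f(i, j) + g(i, j) already equals the optimal alignment cost, for any fixed j.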
Can compute g(•, j) for any j in O(mn) time and O(m + n) space.

Observation 1. The cost of the shortest path that uses (i, j) is f(i, j) + g(i, j).

Observation 2. Let q be an index that minimizes f(q, n/2) + g(q, n/2). Then there exists a shortest path from (0, 0) to (m, n) that uses (q, n/2).

Divide. Find index q that minimizes f(q, n/2) + g(q, n/2); align x_q and y_{n/2}.
Conquer. Recursively compute an optimal alignment in each piece.

Hirschberg's algorithm: running time analysis warmup

Theorem. Let T(m, n) = max running time of Hirschberg's algorithm on strings of length at most m and n. Then T(m, n) = O(mn log n).
Pf. T(m, n) ≤ 2 T(m, n/2) + O(mn) ⇒ T(m, n) = O(mn log n).
Remark. The analysis is not tight because the two subproblems are of size (q, n/2) and (m − q, n/2). The next analysis saves the log n factor.

Hirschberg's algorithm: running time analysis

Theorem. Let T(m, n) = max running time of Hirschberg's algorithm on strings of length at most m and n. Then T(m, n) = O(mn).
Pf. [by induction on n]
‣ O(mn) time to compute f(•, n/2) and g(•, n/2) and to find index q.
‣ T(q, n/2) + T(m − q, n/2) time for the two recursive calls.
‣ Choose a constant c so that:
    T(m, 2) ≤ c m
    T(2, n) ≤ c n
    T(m, n) ≤ c m n + T(q, n/2) + T(m − q, n/2)

Claim. T(m, n) ≤ 2 c m n.
‣ Base cases: m = 2 or n = 2.
‣ Inductive hypothesis: T(m', n') ≤ 2 c m' n' for all (m', n') with m' + n' < m + n.

  T(m, n) ≤ T(q, n/2) + T(m − q, n/2) + c m n
          ≤ 2 c q (n/2) + 2 c (m − q)(n/2) + c m n
          = c q n + c m n − c q n + c m n
          = 2 c m n   ▪
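Putting divide and conquer together gives a complete linear-space alignment procedure in the spirit of the slides: split the columns at n/2, find the crossing index q with two linear-space scans, and recurse on the two pieces. The sketch below is our own reconstruction, not the slides' code; the (char, '-') gap convention and all names are assumptions.

```python
def hirschberg(x, y, gap, mismatch):
    """Min-cost alignment of x and y in O(mn) time and O(m + n) working space.

    Returns a list of (a, b) column pairs, with '-' marking a gap."""

    def scores(a, b):
        # Cost of optimally aligning a with every prefix of b, in linear space.
        prev = [i * gap for i in range(len(b) + 1)]
        for k in range(1, len(a) + 1):
            cur = [k * gap] + [0] * len(b)
            for i in range(1, len(b) + 1):
                cur[i] = min(mismatch(b[i - 1], a[k - 1]) + prev[i - 1],
                             gap + prev[i], gap + cur[i - 1])
            prev = cur
        return prev

    if len(x) == 0:
        return [('-', b) for b in y]
    if len(y) == 0:
        return [(a, '-') for a in x]
    if len(y) == 1:
        # Base case: match y_1 with its cheapest partner in x,
        # unless leaving it unmatched is cheaper.
        b = y[0]
        i = min(range(len(x)), key=lambda i: mismatch(x[i], b))
        if mismatch(x[i], b) <= 2 * gap:
            return ([(a, '-') for a in x[:i]] + [(x[i], b)]
                    + [(a, '-') for a in x[i + 1:]])
        return [('-', b)] + [(a, '-') for a in x]

    # Divide: split the columns at n/2 and find the crossing row q
    # minimizing f(q, n/2) + g(q, n/2).
    n2 = len(y) // 2
    f = scores(y[:n2], x)                      # f[i] = cost of x[:i] vs y[:n2]
    g = scores(y[n2:][::-1], x[::-1])[::-1]    # g[i] = cost of x[i:] vs y[n2:]
    q = min(range(len(x) + 1), key=lambda i: f[i] + g[i])

    # Conquer: recursively align each piece and concatenate.
    return (hirschberg(x[:q], y[:n2], gap, mismatch)
            + hirschberg(x[q:], y[n2:], gap, mismatch))
```

The two scans dominate each level of the recursion, which is where the O(mn) total of the analysis above comes from; only the O(m + n) score arrays are ever alive at once, plus the output itself.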
