Lecture notes for Advanced data structures and Algorithms

advanced data structures and algorithm analysis notes
GeorgeThomson Profile Pic
Published Date:09-07-2017
Your Website URL(Optional)
ADVANCED D AT A-STRUCTURES & ALGORITHM ANALYSIS Dr.Sukhamay Kundu Computer Science Dept, Louisiana state University Baton Rouge, LA 70803 kunducsc.lsu.edu Spring 2011 (copyright2010 , 2011)1.2 ROLE OF D AT A-STRUCTURES IN COMPUTATION Makes Computations Faster: •Faster is better.(Another way to makecomputations faster is to use parallel or distributed computation.) Three Basic Computation Steps: Computation = Sequence of Computation Steps (1) Locate/Accessdata-values (inputs to a step) External External (2) Computeavalue (output of a step) Input Output (3) Store the newvalue Program: Algorithm + DataStructure + Implementation. • Algorithm - The basic method; it determines the data-items computed. - Also, the order in which those data-items are computed (and hence the order of read/write data-access operations). • Data structures - Supports efficient read/write of data-items used/computed. Total Time=Time to access/store data + Time to compute data. Efficient Algorithm = Good method + Good data-structures (+ Good Implementation) Question: •? What is an efficient program? •? What determines the speed of an Algorithm? •? A program must also solvea"problem". Which of the three parts algorithm, data-structure, and implementation embodies this?1.3 ALGORITHM OR METHOD vs. D AT A STRUCTURE Problem: Compute the average of three numbers. Tw o Methods: (1) aver=(x+y+z)/3. (2) aver=(x/3) + (y/3) + (z/3). •Method (1) superior to Method (2); twoless div-operations. •Theyaccess data in the same order: 〈x, y, z,aver〉. •Any improvement due to data-structure applies equally well to both methods. Data structures: (a) Threevariablesx,y,z. (b) An array nums0..2. - This is inferior to (a) because accessing an array-item takes more time than accessing a simple variable. (Toaccess numsi, the executable code has to compute its address addr(numsi) = addr(nums0) + isizeof(int), which involves 1 addition and 1 multiplication.) - When there are large number of data-items, naming indi- vidual data-items is not practical. - Use of individually named data-items is not suitable when a varying number of data-items are involved (in particular,if theyare used as parameters to a function). APoor Implementation of (1): Using 3 additions and 1 division. a=x+y;//uses 2 additional assignments b=a+z; aver = b/3;1.4 LIMITS OF EFFICIENCY Hardwarelimit: • Physical limits of time (speed of electrons) and space (layout of circuits). This limit is computation problem independent. From 5 mips (millions of instructions per sec) to 10 mips is an improvement by the factor of 2. - 9 One nano-second = 10 (one billionth of a second); 10 mips = 100 ns/instruction. Softwarelimit: • Limitless in a way,except for the inherent nature of the problem. That is, the limit is problem dependent. Sorting Algorithm A1: O(n.log n)time 2 Sorting Algorithm A2: O(n )time (n =number of items sorted) A1 is an improvement overA2bythe factor 2 n n = =→∞ as n →∞. n.log n log n • O(n.log n)isthe efficiency-limit for sorting Algorithms.1.5 MEASURING PERFORMANCE Analytic Method: •Theoretical analysis of the Algorithm’stime complexity. Empirical Methods: •Count the number of times specific operations are performed by executing an instrumented version of the program. •Measure directly the actual program-execution time in a run. Example of Instrumentation: Original code: if (x y) small = x; else small = y; Instrumentd code: countComparisons++; //initialized elsewhere if (x y) small = x; else small = y; Question: •? What is wrong with the following instrumentation: if (x y) countComparisons++; small = x; else small = y; •? Instrument the code belowfor readCount and writeCount of x: if (x 3) y = x + 5; •? Showthe newcode when updates to loopCount is movedoutside the loop: for (i=j; imax; i++) loopCount++; if (xi 0) break; 1.6 EXERCISE 1. Instrument the code belowtocount the number of Exchanges (numExchanges) and number of comparisons (numComparisons) of the array data-items. Showthe values of numExchanges and numComparisons after each iteration of the outer for-loop for the input items = 3, 2, 4, 5, 2, 0. void crazySort(int items, int numItems) int i, j, small, for (i=0; inumItems; i++) //put ith smallest item in itemsi for (j=i+1; jnumItems; j++) if (itemsi itemsj) //exchange small = itemsj; itemsj = itemsi; itemsi = small; (a) If we use "i numItems - 1" in place of "i numItems" in the outer for-loop, do we still get the same final result? Will it affect the execution time? (b) Is the algorithm in the code more closely related to inser- tion-sort or to selection-sort? In what way does it differ from that? 2. For numItems = 6, find an input for which crazySort will give maximum numExchanges. When will numExchanges be mini- mum? 3. Give a pseudocode for deciding whether three givenline seg- ments of lengths x, y,and z can form a triangle, and if so whether it is a right-angled, obtuse-angled, or an acute-angled triangle. Makesure that you minimize the total number operations (arith- metic and comparisons of data-items)? 4. Givenanarray lengths1..nofthe lengths of n line segments, find a method for testing if theycan form a polygon (quadrilateral for n =4,pentagon for n =5,etc).1.7 SOLUTION TOSELECTED EXERCISES: 1. void crazySort(int items, int numItems) int i, j, small, numComparisons=0, //for two elements in items numExchanges=0; //of elements in items for (i=0; inumItems; i++) //put ith smallest item in itemsi for (j=i+1; jnumItems; j++) numComparisons++; //keep it here if (itemsi itemsj) //exchange numExchanges++; small = itemsj; itemsj = itemsi; itemsi = small; printf("numComparisons = %d, numExchanges = %d\n", numComparisons, numExchanges); After the comparison and exchanges (if any) for input items = 3, 2, 4, 5, 2, 0. i=0, j=1, items: 2 3 4 5 2 0 i=0, j=2, items: 2 3 4 5 2 0 i=0, j=3, items: 2 3 4 5 2 0 i=0, j=4, items: 2 3 4 5 2 0 i=0, j=5, items: 0 3 4 5 2 2 numComparisons = 5, numExchanges = 2 i=1, j=2, items: 0 3 4 5 2 2 i=1, j=3, items: 0 3 4 5 2 2 i=1, j=4, items: 0 2 4 5 3 2 i=1, j=5, items: 0 2 4 5 3 2 numComparisons = 9, numExchanges = 3 i=2, j=3, items: 0 2 4 5 3 2 i=2, j=4, items: 0 2 3 5 4 2 i=2, j=5, items: 0 2 2 5 4 3 numComparisons = 12, numExchanges = 5 i=3, j=4, items: 0 2 2 4 5 3 i=3, j=5, items: 0 2 2 3 5 4 numComparisons = 14, numExchanges = 7 i=4, j=5, items: 0 2 2 3 4 5 numComparisons = 15, numExchanges = 8 i=5, j=6, items: 0 2 2 3 4 5 numComparisons = 15, numExchanges = 8 This is more closely related to selection-sort, which involves at most one n exchange for each iteration of outer-loop. (Comparisons) is still C . 2 2. Triangle classification pseudocode; assume that 0 x ≤ y ≤ z.1.8 if (z x + y) zSquare = zz; xySquareSum = xx + yy; if (zSquare == xySquareSum) right-angled triangle; else if (zSquare xySquareSum) obtuse-angled triangle; else acute-angled triangle; else not a triangle; 3. Condition for polygon: •The largest length is less than the sum of the other lengths. •The lengths 2, 4, 5, 20 will not makea quadrilateral because 20 /2+4+5=11, but the lengths 2, 4, 5, 10 will.1.9 ANALYZING NUMBER OF EXCHANGES IN CRAZY-SORT Pseudocode 1: 1. Create all possible permutations p of 0, 1, 2, ⋅⋅⋅, n - 1. 2. For each p,apply crazySort and determine numExchanges. 3. Collect these data to determine numPermutationsi=(permuta- n tions which has numExchanges = i)for i =0,2, ⋅⋅⋅, C . 2 4. Plot numPermutationsiagainst i to visualize the behavior of numExchanges. Pseudocode 2: //No need to store all npermutations. n 1. For (i=0; iC ;i++), initialize numPermutationsi=0. 2 2. While (there is a nextPermutation(n)= p)dothe following: (a) Apply crazySort to p and determine numExchagnes. (b) Add1tonumPermutationnumExchanges. 3. PlotnumPermutationsiagainsti. Note: We can use this idea to analyze other sorting algorithms. Question: •? If p is a permutation of S =0, 1, 2, ⋅⋅⋅, n - 1, then howtodeter- mine the nextPermutation(p)inthe lexicographic order? Shown beloware permutations for n =4inlexicographic order. 0123 0312 1203 2013 2301 3102 0132 0321 1230 2031 2310 3120 ↓↓↓↓↓↓ 0213 1023 1302 2103 3012 3201 0231 1032 1320 2130 3021 32101.10 PSEUDOCODE vs. CODE Characteristics of Good Pseudocode: +Shows the key concepts and the key computation steps of the Algorithm, avoiding too much details. +Avoids dependencyonany specific prog. language. +Allows determining the correctness of the Algorithm. +Allows choosing a suitable data-structures for an efficient imple- mentation and complexity analysis. Example. Compute the number of positive and negative items in nums0. . n - 1; assume each numsi ≠ 0. (A)Pseudocode: 1. Initialize positiveCount = negativeCount = 0. 2. Use each numsitoincrement one of the counts by one. Code: 1.1 positiveCount = negativeCount = 0; 2.1 for (i=0; in; i++) //each numsi ≠ 0 2.2 if (0 numsi) positiveCount++; 2.3 elsenegativeCount++; (B)Pseudocode: 1. Initialize positiveCount = 0. 2. Use each numsi0toincrement positiveCount by one. 3. Let negativeCount = n - positiveCount. Code: 1. positiveCount = 0; 2. for (i=0; in; i++) //each numsi ≠ 0 3. if (0 numsi) positiveCount++; 4. negativeCount = n - positiveCount; Question: •? Whyis(B)slightly more efficient than (A)? Writing a pseudocode requires skills to express an Algorithm in a concise and yet clear fashion.1.11 PSEUDOCODE FOR SELECTION-SORT Idea: Successively find the ith smallest item, i =0,1, ⋅⋅⋅. Algorithm Selection-Sort: Input: Array items and its size numItems. Output: Array items sorted in increasing order. 1. For each i in 0, 1, ⋅⋅⋅,numItems-1, in some order,do(a)-(b): (a) Findtheith smallest item in items. (b) Place it at position i in items. Finding ith smallest item in items: •Finding ith smallest item directly is difficult, but it is easy if we knowall the kth smallest items for k =0,1,2, ⋅⋅⋅,(i - 1). •Itisthe smallest item among the remaining items. •Ifweassume that itemsk, 0 ≤ k ≤ (i - 1), are the kth smallest items, then smallest item in itemsi..numItems - 1 = ith smallest item. Thisgivesthe pseudocode: (a.1) smallestItemIndex=i; (a.2) for(j=i+1;jnumItems;j++) (a.3) if(itemsjitemssmallestItemIndex) (a.4) thensmallestItemIndex=j; Question: In what way (a.1)-(a.4) is better than step (a)? Placing ith smallest item at position i in items. (b.1) if(smallestItemIndexi)//why not smallestItemIndex ≠ i (b.2) thenexchange itemsiand itemssmallestItemIndex; "What" comes before "how".1.12 EXERCISE 1. Which of "put the items in right places" and "fill the places by right items" best describes the selection-sort Algorithm? Shown beloware the steps in the twomethods for input 3, 5, 0, 2, 4, 1. Put the items in Fill the places right places with right items 1. 2, 5, 0, 3,4,10,5,3,2,4,1 3movedtoright place 1st place is filled by 0 2. 0,5, 2, 3,4,10, 1,3,2,4,5 2movedtoright place 2nd place is filled by 1 3. 0,5, 2, 3,4,10, 1, 2,3,4,5 0already in right place 3rd place is filled by 2 4. 0,1, 2, 3,4,50, 1, 2, 3, 4, 5 5movedtoright place all places filled properly 5. 0, 1, 2, 3, 4, 5 all items in right places Note that once an item is put in right place, you must not change its position while putting other items in proper places. It is for this reason, we makeanexchange (and not an insertion) when we move anitem in the right place. The insertion after removing 3 from its current position in 3, 5, 0, 2, 4, 1 would have giv en5, 0, 2, 3, 4, 1 but not 2, 5, 0, 3, 4, 1 as we showed above. 2. Which input array for the set numbers 0, 1, 2, 3, 4, 5 requires maximum number of exchanges in the first approach? 3. Give a pseudocode for the first approach.1.13 ANOTHER EXAMPLE OF PSEUDOCODE Problem: Find the position of rightmost "00" in binString0..(n-1). 1. Search for 0 right to left upto position 1 (initially,start at position n-1). 2. If (0 is found and the item to its left is 1), then go back to step (1) to start the search for 0 from the left of the current position. Three Implementations: Only the first one fits the pseudocode. (1) i=n;//= length of binString do for (i=i-1 ; i0; i) if (0 == binStringi) break; while (1 == binStringi); //has a bug; find it (2) for (i=n-1; i0; i) if (0 == binStringi) && (0 == binStringi-1) break; //inefficient but works (3) for (i=n-1; i0; i) //bad for-loop; body updates i if (0 == binStringi) && (0 == binStringi) break; // works and efficient Question: •? Showhow these implementations work differently using the bin- String: ⋅⋅⋅000111010101. Extend each implementation to return the position of the left 0 of the rightmost "00". •? Instrument each code for readCount of the items in binString . •? Which of (1)-(3) is the least efficient in terms readCount? •? Give a pseudocode to find rightmost "00" without checking all bits from right till "00" is found. It is not necessary to sacrifice clarity for the sakeofefficiency.1.14 EXERCISE 1. BinStrings(n, m)=x:x is a binary string of length n and m ones, 0 ≤ m ≤ n.The strings in BinStrings(4, 2) in lexicographic order are: 0011, 0101, 0110, 1001, 1010, 1100. Which of the pseudocodes belowfor generating the strings in BinStrings(n, m)inlexicographic order is more efficient? (a) 1. Generate and save all binary strings of length n in lexicographic order. 2. Throwawaythe strings which have numOnes ≠ m. n- m m (b) 1. Generate the first binary string 0 1 ∈ Bin- Strings(n, m). 2. Successively create the next string in Bin- m n- m Strings(n,m)until the last string 1 0 . Which of the three characteristics of a good pseudocode hold for each of these pseudocodes? 2. Give the pseudocode of a recursive Algorithm for generating the binary strings in BinStrings(n, m)inlexicographic order. 3. Give anefficient pseudocode for finding the position of rightmost "01" in an arbitrary string x ∈ BinStrings(n, m). (The underlined portion in 10110011100 shows the rightmost "01".) Give enough details so that one can determine the number of times various items xiinthe array x are looked at. 4. Givenastring x ∈BinStrings(n,m), give a pseudocode for gen- erating the next string in BinStrings(n, m), if any.1.15 AL WA YSTEST YOUR METHOD AND YOUR ALGORITHM •Create a fewgeneral examples of input and the corresponding outputs. - Select some input-output pairs based on your understanding of the problem and before you design the Algorithm. - Select some other input-output pairs after you design the Algorithm, including a fewcases that involvespecial handling of the input or output. •Use these input-output pairs for testing (but not proving) the cor- rectness of your Algorithm. •Illustrate the use of data-structures by showing the "state" of the data-structures (lists, trees, etc.) at various stages in the Algo- rithm’sexecution for some of the example inputs. Always use one or more carefully selected example to illustrate the critical steps in your method/algorithm.1.16 EFFICIENCY OF NESTED IF-THEN-ELSE •Let E =average (condition evaluations). Wecount 1 for evalua- tion of both x and its negation (¬x). Example 1. Forthe code below, E =3⋅5. if (x and y) z = 0; else if ((not x) and y) z = 1; else if (x and (not y)) z = 2; else z = 3; Value of z (condition evaluations) 02(x=T and y = T ) 13(x=F,¬x=T,and y = T ) 25(x=T,y=F,¬x=F,x=T,and ¬y = T ) 34(x=F,¬x=T,y=F,x=F) Question: •? Show(condition evaluations) for each z for the code and also the av erage E : if (x) if (y) z = 0; else z = 2; else if (y) z = 1; else z = 3; •? Give a code to compute z without using the keyword "else" (or "case") and show(condition evaluations) for each value of z. •? Showthe improvedform of the twocode-segments below. (a). if (numsi = max) max = numsi; (b). if (x 0) z = 1; if ((x 0) && (y 0)) z = 2;1.17 BRIEF REVIEW OF SORTING Questions: •What is Sorting? Explain with an example. •Why dowewant to sort data? •What are some well-known sorting Algorithms? •Which sorting Algorithm uses the following idea: Successively,find the smallest item, the second small- est item, the third smallest items, etc. •Can we sort a set of pairs of numbers like(1,7), (2,7), (5,4), (3,6)? What is the result after sorting? •Can we sort non-numerical objects likethe ones shown below? Strings: abb, ba, baca, cab. Binary trees on 3 nodes (convert them to strings to sort): Flowcharts with 2 nodes (convert them to trees or strings to sort): A B C D E1.18 EXERCISE 1. Give a more detailed pseudocode (not code) for sorting using the idea "put the items in the right places". Determine the number of comparisons of involving data from items0..numItems-1 based on the pseudocode. Explain the Algorithm in detail for the input items = 3, 2, 4, 5, 1, 0. 2. Writeapseudocode for insertion-sort. Determine the number of comparisons of involving data from items0..numItems-1 based on the pseudocode; also determine the number of data- movements (i.e., movements of items from the items-array) based on the pseudocode. Explain the Algorithm in detail for the input items = 3, 2, 4, 5, 1, 0. 3. For each of the sorting Algorithms insertion-sort, selection-sort, bubble-sort, and merge-sort, showthe array after each successive exchange operation starting the initial array 3, 2, 4, 5, 1, 0. 4. Some critical thinking questions on selection-sort. Assume that the input is a permutation of 1, 2, ⋅⋅⋅, n. (a) Give anexample input for which the number of data- movements is maximum (resp., minimum). (b) In what sense, selection-sort minimizes data-movements? (c) Suppose we have exchanges of the form e :itemsi1 and 1 itemsi2, e :itemsi2 and itemsi3, ... , e :itemsi(k-1) 2 k- 1 and itemsik. Then argue that the indices i1, i2, ..., ik form a cycle in the permutation. Note that the exchange operations e may be interleavedwith other exchanges. i 5. Is it true that in bubble-sort if an item movesup, then it never movesdown? Explain with the input items = 3, 2, 4, 5, 1, 0.1.19 AVERAGE (COMPARISONS) TO LOCATE A D AT A-ITEM IN A SORTED-ARRAY 4 Binary Search: Assume N =numItems = 15 = 2 - 1. A0 A1 A2 ⋅⋅⋅ A14 (Nodes at (Compar. this level per node) 1 1 A7 2 2 A3 A11 4 3 A1 A5 A9 A13 8 4 A0 A2 A4 A6 A8 A10 A12 A14 •Number of comparisons for an item x: If x were A6, then we would make4comparisons: x A7, x A3, x A5, and x = A6. Total (Comparisons) =1×1+2×2+3×4+4×8=49; Av erage = 49/15 = 3⋅3. n •General case (N =2 - 1): Total (Comparisons) = n- 1 (compar . per node at level i)×(nodes at level i) Σ i=0 n- 1 n =1×1+2×2+3×4+ ⋅⋅⋅ + n×2 =1+(n- 1)2 =1+log(N + 1) - 1. (N + 1) = O(N.log N) Av erage (Comp.) = O(log N) Asimpler argument: •Max(Comp) = n and hence average ≤ n = O(log N).1.20 HEAP D AT A-STRUCTURE Heap: Aspecial kind of binary-tree, which givesanefficient O(N.log N)implementation of selection-sort. • Shape constraints: Nodes are added left to right, levelbylev el. - Anode has a rightchild only if it has a leftchild. - If there is a node at level m,then there are no missing nodes at level m - 1. • Node-Value constraint: Foreach node x and its children y,val(x) ≥ val(y), val(x)=the value associated with node x. Example: The shape of heaps with upto 7 nodes. Questions: Which of the following is true? (1) Each node has exactly one parent, except the root. (2) Each node has 0 or 2 children, except perhaps one. (3) The leftchild node with no brother has the maximum height. (4) The properties (1)-(3) define a heap. Example. Heaps with upto 4 nodes and small node-values. 1 2 3 3 4 4 4 1 1 2 2 1 2 3 3 2 3 1 1 1 2