Lecture Notes for Algorithm Analysis and Design

Sandeep Sen
Department of Computer Science and Engineering, IIT Delhi, New Delhi 110016, India.
E-mail: ssen@cse.iitd.ernet.in
November 6, 2013

Contents

1 Model and Analysis
  1.1 Computing Fibonacci numbers
  1.2 Fast Multiplication
  1.3 Model of Computation
  1.4 Other models
    1.4.1 External memory model
    1.4.2 Parallel Model

2 Warm up problems
  2.1 Euclid's algorithm for GCD
    2.1.1 Extended Euclid's algorithm
  2.2 Finding the k-th element
    2.2.1 Choosing a random splitter
    2.2.2 Median of medians
  2.3 Sorting words
  2.4 Mergeable heaps
    2.4.1 Merging Binomial Heaps
  2.5 A simple semi-dynamic dictionary
    2.5.1 Potential method and amortized analysis

3 Optimization I: Brute force and Greedy strategy
  3.1 Heuristic search approaches
    3.1.1 Game Trees
  3.2 A framework for Greedy Algorithms
    3.2.1 Maximal Spanning Tree
    3.2.2 A Scheduling Problem
  3.3 Efficient data structures for MST algorithms
    3.3.1 A simple data structure for union-find
    3.3.2 A faster scheme
    3.3.3 The slowest growing function?
    3.3.4 Putting things together
    3.3.5 Path compression only
  3.4 Compromising with Greedy

4 Optimization II: Dynamic Programming
  4.1 A generic dynamic programming formulation
  4.2 Illustrative examples
    4.2.1 Context Free Parsing
    4.2.2 Longest monotonic subsequence
    4.2.3 Function approximation
    4.2.4 Viterbi's algorithm for Maximum likelihood estimation

5 Searching
  5.1 Skip Lists - a simple dictionary
    5.1.1 Construction of Skip-lists
    5.1.2 Analysis
  5.2 Treaps: Randomized Search Trees
  5.3 Universal Hashing
    5.3.1 Example of a Universal Hash function
  5.4 Perfect Hash function
    5.4.1 Converting expected bound to worst case bound
  5.5 A log log N priority queue

6 Multidimensional Searching and Geometric algorithms
  6.1 Interval Trees and Range Trees
    6.1.1 Two Dimensional Range Queries
  6.2 k-d trees
  6.3 Priority Search Trees
  6.4 Planar Convex Hull
    6.4.1 Jarvis March
    6.4.2 Graham's Scan
    6.4.3 Sorting and Convex hulls
  6.5 A Quickhull Algorithm
    6.5.1 Analysis
    6.5.2 Expected running time (*)
  6.6 Point location using persistent data structure

Chapter 1  Model and Analysis

When we make a claim like "Algorithm A has running time O(n^2 log n)", there is an underlying computational model in which this statement is valid. It may not be true if we change the model. Before we formalize the notion of a computational model, let us consider the example of computing Fibonacci numbers.

1.1 Computing Fibonacci numbers

One of the most popular sequences is the Fibonacci sequence, defined by

  F_0 = 0,  F_1 = 1,  F_i = F_{i-1} + F_{i-2}  for i ≥ 2.

Exercise 1.1 How large is F_n, the n-th Fibonacci number? Show that

  F_n = (φ^n − φ'^n)/√5,  where φ = (1 + √5)/2 and φ' = 1 − φ.

Clearly it grows exponentially with n. You can also prove that

  F_n = 1 + Σ_{i=0}^{n-2} F_i.

Since the closed-form solution for F_n involves the golden ratio, an irrational number, we must find a way to compute it efficiently without incurring numerical errors or approximation, as F_n is an integer.

Method 1
Simply use the recursive formula. Unfortunately, one can easily argue that the number of operations (primarily additions) involved is proportional to the value of F_n: just unfold the recursion tree, where each internal node corresponds to an addition. As noted earlier, this leads to an exponential time algorithm that we cannot afford.

Method 2
Observe that we only need the last two terms of the series to compute the new term. So by applying the idea of dynamic programming we gradually compute F_n starting with F_0 = 0 and F_1 = 1. This takes approximately n additions, where each addition involves adding (increasingly large) numbers. The size of F_{⌈n/2⌉} is about n/2 bits, so the last n/2 additions take Ω(n) steps each (adding two k-bit numbers takes Θ(k)), culminating in an O(n^2) algorithm. Since the n-th Fibonacci number is at most n bits long, it is reasonable to look for a faster algorithm.

Method 3

  [ F_i     ]   [ 1 1 ] [ F_{i-1} ]
  [ F_{i-1} ] = [ 1 0 ] [ F_{i-2} ]

By iterating the above equation we obtain

  [ F_n     ]   [ 1 1 ]^(n-1) [ 1 ]
  [ F_{n-1} ] = [ 1 0 ]       [ 0 ]

To compute A^n, where A is a square matrix, we can extend the following strategy for computing x^n where n is an integer:

  x^{2k}   = (x^k)^2      for even powers
  x^{2k+1} = x · x^{2k}   for odd powers

The number of multiplications taken by the above approach to compute x^n is bounded by 2 log n (convince yourself by writing a recurrence).
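To see Method 3 at work, here is a minimal Python sketch (an illustration, not part of the original notes). It computes F_n by repeated squaring of the 2x2 matrix, relying on Python's built-in arbitrary-precision integers; the cost of those integer multiplications is exactly the M(.) analysed next.

    def mat_mult(X, Y):
        """Multiply two 2x2 matrices of (arbitrary-precision) integers."""
        return [[X[0][0]*Y[0][0] + X[0][1]*Y[1][0], X[0][0]*Y[0][1] + X[0][1]*Y[1][1]],
                [X[1][0]*Y[0][0] + X[1][1]*Y[1][0], X[1][0]*Y[0][1] + X[1][1]*Y[1][1]]]

    def mat_pow(A, n):
        """Compute A^n by repeated squaring: at most about 2 log n multiplications."""
        if n == 0:
            return [[1, 0], [0, 1]]              # identity matrix
        half = mat_pow(A, n // 2)
        sq = mat_mult(half, half)                # x^(2k) = (x^k)^2
        return mat_mult(A, sq) if n % 2 else sq  # extra factor for odd powers

    def fib(n):
        """F_n via the identity [[1,1],[1,0]]^(n-1) = [[F_n, F_(n-1)], ...]."""
        if n == 0:
            return 0
        M = mat_pow([[1, 1], [1, 0]], n - 1)
        return M[0][0]                           # top-left entry is F_n

    # fib(10) == 55, fib(50) == 12586269025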
However, the actual running time depends on the time to multiply two numbers, which in turn depends on their lengths (number of digits). Suppose M(n) is the number of (bit-wise) steps needed to multiply two n-bit numbers. Then the number of steps to implement the above approach must take into account the lengths of the numbers being multiplied. The following observations will be useful.

The length of x^k is bounded by k·|x|, where |x| denotes the length of x. Therefore, the cost of squaring x^k is bounded by M(k|x|). Similarly, the cost of computing x^{2k} × x can be bounded by M(2k|x|). The overall recurrence for computing x^n can be written as

  T_B(n) ≤ T_B(⌊n/2⌋) + M(n|x|)

where T_B(n) is the number of bit operations to compute the n-th power using the previous recurrence. The solution of the above recurrence can be written as the following summation (by unfolding):

  Σ_{i=1}^{log n} M(2^i |x|)

If M(2i) > 2M(i), then the above summation can be bounded by O(M(n|x|)), i.e. the cost of the last squaring operation.

In our case, A is a 2×2 matrix: each squaring operation involves 8 multiplications and 4 additions of matrix entries. Since multiplications are more expensive than additions, let us count the cost of multiplications only. Here, we have to keep track of the lengths of the entries of the matrix. Observe that if the maximum size of an entry is |x|, then the maximum size of an entry after squaring is at most 2|x| + 1 (why?).

Exercise 1.2 Show that the cost of computing A^n is O(M(n|x|)) where A is a 2×2 matrix and the maximum length of any entry is |x|.

So the running time of computing F_n using Method 3 depends on the multiplication algorithm. Well, multiplication is multiplication - what can we do about it? Before that, let us summarize what we know about it. Multiplying two n-digit numbers using the add-and-shift method takes O(n^2) steps, where each step involves multiplying two single digits (bits in the case of binary representation) and generating and managing carries. For binary representation this takes O(n) for multiplying with each bit, and finally the n shifted summands are added - the whole process takes O(n^2) steps.

Using such a method of multiplication implies that we cannot do better than Ω(n^2) steps to compute F_n. For any significant (asymptotically better) improvement, we must find a way to multiply faster.

1.2 Fast Multiplication

Problem Given two numbers A and B in binary, we want to compute the product A × B.

Let us assume that the numbers A and B have lengths equal to n = 2^k - this will keep our calculations simpler without affecting the asymptotic analysis.

  A × B = (2^{n/2}·A_1 + A_2) × (2^{n/2}·B_1 + B_2)

where A_1 (B_1) is the leading n/2 bits of A (B). Likewise A_2 is the trailing n/2 bits of A. We can expand the above product as

  A_1×B_1 · 2^n + (A_1×B_2 + A_2×B_1) · 2^{n/2} + A_2×B_2

Observe that multiplication by 2^k can be easily achieved in binary by appending k trailing 0's (likewise, in any radix r, multiplying by r^k can be done by appending trailing zeros). So the product of two n-bit numbers can be obtained by recursively computing four products of n/2-bit numbers.

Exercise 1.3 What is the time to multiply using the above method? Write and solve an appropriate recurrence.
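As an illustration (not part of the original notes), the following Python sketch implements the four-product recursion on built-in integers so that the recurrence asked for in Exercise 1.3 can be read off from the code; the assumption that n is a power of two mirrors the text.

    def multiply4(a, b, n):
        """Multiply two n-bit non-negative integers (n a power of two) with four
        recursive half-size products, following
        A*B = 2^n A1*B1 + 2^(n/2) (A1*B2 + A2*B1) + A2*B2."""
        if n <= 1:                                   # base case: single-bit product
            return a * b
        half = n // 2
        a1, a2 = a >> half, a & ((1 << half) - 1)    # leading / trailing n/2 bits
        b1, b2 = b >> half, b & ((1 << half) - 1)
        p1 = multiply4(a1, b1, half)
        p2 = multiply4(a1, b2, half)
        p3 = multiply4(a2, b1, half)
        p4 = multiply4(a2, b2, half)
        # shifts implement multiplication by powers of two (append trailing zeros)
        return (p1 << n) + ((p2 + p3) << half) + p4

    # Example: multiply4(13, 11, 4) == 143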
We can achieve an improvement by reducing the computation to three recursive multiplications of n/2-bit numbers, by rewriting the coefficient of 2^{n/2} as follows:

  A_1×B_2 + A_2×B_1 = (A_1 + A_2)×(B_1 + B_2) − (A_1×B_1) − (A_2×B_2)

Although, strictly speaking, A_1 + A_2 is not n/2 bits but at most n/2 + 1 bits (why?), we can still view this as computing three separate products involving n/2-bit numbers recursively and subsequently subtracting appropriate terms to get the required products. Subtractions and additions are identical in modulo arithmetic (2's complement), so the cost of a subtraction can be bounded by O(n). (What is the maximum size of the numbers involved in the subtractions?) This gives us the following recurrence

  T_B(n) ≤ 3·T_B(n/2) + O(n)

where the last term accounts for additions, subtractions and shifts.

Exercise 1.4 With an appropriate terminating condition, show that the solution to the recurrence is O(n^{log_2 3}).

The running time is roughly O(n^{1.7}), which is asymptotically better than n^2, and therefore we have succeeded in designing an algorithm to compute F_n faster than n^2.

It is possible to multiply much faster, in O(n log n log log n) steps, using a generalization of the above method due to Schönhage and Strassen. However, it is quite involved, as it uses Discrete Fourier Transform computation over modulo integer rings and has fairly large constants that neutralize the advantage of the asymptotic improvement unless the numbers are a few thousand bits long. It is, however, conceivable that such methods will become more relevant as we may need to multiply large keys for cryptographic/security requirements.

1.3 Model of Computation

Although there are a few thousand variations of the computer with different architectures and internal organization, it is best to think about them at the level of the assembly language. Despite architectural variations, the assembly-level language support is very similar - the major differences being the number of registers and the word length of the machine. But these parameters are also within a restricted range of a factor of two, and hence asymptotically in the same ballpark. In summary, think about any computer as a machine that supports a basic instruction set consisting of arithmetic and logical operations and memory accesses (including indirect addressing). We will avoid cumbersome details of the exact instruction set and assume realistically that any instruction of one machine can be simulated using a constant number of available instructions of another machine. Since the analysis of algorithms involves counting the number of operations and not the exact timings (which could differ by an order of magnitude), the above simplification is justified.

The careful reader would have noticed that during our detailed analysis of Method 3 in the previous sections, we were not simply counting the number of arithmetic operations but actually the number of bit-level operations. Therefore the cost of a multiplication or addition was not unity but proportional to the length of the input. Had we only counted the number of multiplications for computing x^n, that would only be O(log n). This would indeed be the analysis in a uniform cost model, where only the number of arithmetic (and logical) operations is counted and the cost does not depend on the length of the operands. A very common use of this model is for comparison-based problems like sorting, selection, merging, and many data-structure operations. For these problems, we often count only the number of comparisons (not even other arithmetic operations) without bothering about the length of the operands involved.
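A minimal Python sketch of this three-multiplication scheme (an illustration, not from the notes); the small base case and reliance on Python's arbitrary-precision integers are simplifying assumptions.

    def karatsuba(a, b):
        """Multiply non-negative integers using three recursive half-length
        products; the built-in * is used only for small operands."""
        if a < 16 or b < 16:                          # base case: single "digit"
            return a * b
        n = max(a.bit_length(), b.bit_length())
        half = n // 2
        a1, a2 = a >> half, a & ((1 << half) - 1)     # leading / trailing bits
        b1, b2 = b >> half, b & ((1 << half) - 1)
        p = karatsuba(a1, b1)                         # A1 * B1
        q = karatsuba(a2, b2)                         # A2 * B2
        r = karatsuba(a1 + a2, b1 + b2)               # (A1+A2)(B1+B2), one extra bit
        middle = r - p - q                            # = A1*B2 + A2*B1
        return (p << (2 * half)) + (middle << half) + q

    # Example: karatsuba(20, 21) == 420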
In other words, we implicitly assume O(1) cost for any comparison. This is not considered unreasonable since the size of the numbers involved in sorting does not increase during the course of the algorithm for the majority of the commonly known sorting problems. On the other hand, consider the problem of repeated squaring n times starting with 2. The result is the number 2^{2^n}, which requires 2^n bits to be represented. It would be very unreasonable to assume that a number that is exponentially long can be written out (or even stored) in O(n) time. Therefore the uniform cost model does not reflect any realistic setting for this problem.

On the other extreme is the logarithmic cost model, where the cost of an operation is proportional to the length of the operands. This is very consistent with the physical world and also has a close relation with the Turing Machine model, which is a favorite of complexity theorists. Our analysis in the previous sections was actually done with this model in mind. It is not only the arithmetic operations: the cost of a memory access is also proportional to the length of the address and the operand.

The most commonly used model is something in between. We assume that for an input of size n, any operation involving operands of size log n takes O(1) steps (we can also work with c·log n bits, as the asymptotic analysis does not change for a constant c). This is justified as follows. All microprocessor chips have specialized hardware circuits for arithmetic operations like multiplication, addition, division, etc. that take a fixed number of clock cycles when the operands fit into a word. The reason that log n is a natural choice for a word is that, even to address an input of size n, you require log n bits of address space. The present high-end microprocessor chips typically have 2-4 GBytes of RAM and about 64-bit word size - clearly 2^64 exceeds 4 GBytes. We will also use this model, popularly known as the Random Access Machine (or RAM in short), except for problems that deal with numbers as inputs, like the multiplication in the previous section, where we will invoke the log cost model. In the beginning, it is desirable that for any algorithm you get an estimate of the maximum size of the numbers, to ensure that operands do not exceed Ω(log n), so that it is safe to use the RAM model.

1.4 Other models

There is a clear trade-off between the simplicity and the fidelity achieved by an abstract model. One of the obvious (and sometimes serious) drawbacks of the RAM model is the assumption of an unbounded number of registers, since the memory access cost is uniform. In reality, there is a memory hierarchy comprising registers, several levels of cache, main memory and finally the disks. We incur a higher access cost as we go from registers towards the disk, and for technological reasons the size of the faster memory is limited. There could be a disparity of 10^5 between the fastest and the slowest memory, which makes the RAM model somewhat suspect for larger input sizes. This has been redressed by the external memory model.

1.4.1 External memory model

In this model, the primary concern is the number of disk accesses. Given the rather high cost of a disk access compared to any CPU operation, this model actually ignores all other costs and counts only the number of disk accesses. The disk is accessed as contiguous memory locations called blocks. The blocks have a fixed size B, and the simplest model is parameterized by B and the size M of the faster memory. In this two-level model, the algorithms are only charged for transferring a block between the internal and external memory, and all other computation is free.
In this model, the cost of sorting n elements is O((n/B) log_{M/B} (n/B)) disk accesses, and this is also optimal. To see this, analyse M/B-way mergesort in this model. Note that one block from each of the M/B sorted streams can fit into the main memory. Using appropriate data structures, we can generate the next B elements of the output and write an entire block to the output stream. So the overall number of I/Os per phase is O(n/B), since each block is read and written exactly once. The algorithm makes O(log_{M/B} (n/B)) passes, yielding the required bound.

There are further refinements of this model that parameterize multiple levels and also account for internal computation. As the model becomes more complicated, designing algorithms also becomes more challenging and often more laborious.

1.4.2 Parallel Model

The basic idea of parallel computing is extremely intuitive and a fundamental intellectual pursuit. At the most intuitive level it symbolises what can be achieved by cooperation among individuals in terms of expediting an activity. It is not in terms of division of labor (or specialisation), but actually assuming similar capabilities. Employing more labourers clearly speeds up construction, and similarly using more than one processor is likely to speed up computation. Ideally, by using p processors we would like to obtain a p-fold speed-up over the conventional algorithms; however, the principle of decreasing marginal utility shows up. One of the intuitive reasons for this is that with more processors (as with more individuals), the communication requirements tend to dominate after a while. But more surprisingly, there are algorithmic constraints that pose serious limitations to our objective of obtaining proportional speed-up.

This is best demonstrated in the model called PRAM (or Parallel Random Access Machine), which is the analogue of the RAM. Here p processors are connected to a shared memory and communication happens through reading and writing in the globally shared memory. It is left to the algorithm designer to avoid read and write conflicts. It is further assumed that all operations are synchronized globally and there is no cost of synchronization. In this model there is no extra overhead for communication, as it is charged in the same way as a local memory access. Even in this model, it has been shown that it is not always possible to obtain ideal speed-up. As an example, consider the elementary problem of finding the minimum of n elements. It has been proved that with n processors, the (parallel) time is at least Ω(log log n). For certain problems, like depth first search of graphs, it is known that even if we use any polynomial number of processors, we cannot obtain polylogarithmic time. So, clearly, not all problems can be parallelized effectively.

A more realistic parallel model is the interconnection network model that has an underlying communication network, usually a regular topology like a two-dimensional mesh, hypercube, etc. These can be embedded into VLSI chips and can be scaled according to our needs. To implement any parallel algorithm, we have to design efficient schemes for data routing.

A very common model of parallel computation is a hardware circuit comprising basic logic gates. The signals are transmitted in parallel through different paths and the output is a function of the input.
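To make the I/O counting concrete, here is a small Python sketch (not from the notes) that simulates one phase of M/B-way merging in main memory; the in-memory lists standing in for disk-resident runs and the block size B are illustrative assumptions, and real external-memory code would read and write blocks explicitly.

    import heapq

    def merge_runs(runs, B):
        """Merge k = M/B sorted runs, emitting output one block (B elements) at a
        time; each element of each run is consumed once, so every block is read
        and written exactly once, i.e. O(n/B) I/Os per merge phase."""
        heap = []                                    # (value, run index, position in run)
        for i, run in enumerate(runs):
            if run:
                heapq.heappush(heap, (run[0], i, 0))
        output_block, merged = [], []
        while heap:
            val, i, j = heapq.heappop(heap)
            output_block.append(val)
            if len(output_block) == B:               # a full block: "write" it out
                merged.extend(output_block)
                output_block = []
            if j + 1 < len(runs[i]):                 # fetch next element of run i
                heapq.heappush(heap, (runs[i][j + 1], i, j + 1))
        merged.extend(output_block)                  # flush the last partial block
        return merged

    # merge_runs([[1, 4, 7], [2, 5, 8], [3, 6, 9]], B=2) == [1, 2, 3, 4, 5, 6, 7, 8, 9]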
The size of the circuit is the number of gates, and the (parallel) time is usually measured in terms of the maximum path length from any input gate to the output gate (each gate contributes a unit delay). Those familiar with circuits for addition and comparison can analyse them in this framework. The carry-save adder is a low-depth circuit that adds two n-bit numbers in about O(log n) steps, which is much faster than a sequential circuit that adds one bit at a time, taking n steps.

An example Given numbers x_1, x_2, ..., x_n, consider the problem of computing the terms S_i = Σ_{j=1}^{i} x_j for all 1 ≤ i ≤ n. Each term corresponds to a partial sum. It is trivial to compute all the partial sums in O(n) steps. Computing S_i for each i can be done in parallel using a binary tree of depth ⌈log i⌉, where the inputs are given at the leaf nodes and each internal node corresponds to a summation operation. All the summations at the same level can be done simultaneously and the final answer is available at the root. Doing this computation independently for each S_i is wasteful, since S_{i+1} = S_i + x_{i+1}: it would take about O(n^2) additions, compared to the sequential complexity of O(n).

Instead we use the following idea. Add every odd-even pair of inputs into a single value y_{i/2} = x_{i-1} + x_i, for every even i (assume n is a power of two). Now compute the partial sums S'_1, S'_2, ..., S'_{n/2} recursively. Note that S'_j = Σ_{k=1}^{2j} x_k = S_{2j}, i.e., half the terms can be computed this way. To obtain S_{2j+1}, 0 ≤ j ≤ n/2 − 1, add x_{2j+1} to S'_j. This can also be done simultaneously for all terms.

The total parallel time satisfies T(n) = T(n/2) + 2, where the last term corresponds to the two additional rounds of operations to combine the terms. This yields T(n) = O(log n). The total number of operations used satisfies W(n) = W(n/2) + 2n, so W(n) = O(n), which is also optimal. This recursive description can be unfolded to yield a parallel circuit for this computation. The algorithm can be generalized for any associative operation and is known as the parallel prefix or scan operation. Using an appropriately defined composition function for a semi-adder (adding two bits given a carry), we can construct the carry-save adder circuit.

One of the most fascinating developments is the Quantum Model, which is inherently parallel but also fundamentally different from the previous models. A breakthrough result in recent years is a polynomial time algorithm for factorization, whose presumed hardness in the conventional model forms the basis of many cryptographic protocols.

Biological Computing models are a very active area of research where scientists are trying to assemble a machine out of DNA strands. It has potentially many advantages over silicon-based devices and is inherently parallel.

Chapter 2  Warm up problems

One of the primary challenges in algorithm design is to come up with provably optimal algorithms. The optimality is with respect to the underlying model. In this chapter, we look closely at some well-known algorithms for basic problems that use basic properties of the problem domain in conjunction with elementary analytical methods.

2.1 Euclid's algorithm for GCD

Euclid's algorithm for computing the greatest common divisor (gcd) of two positive integers is allegedly the earliest known algorithm in a true sense. It is based on two very simple observations: the gcd of numbers a, b satisfies

  gcd(a, b) = gcd(a, a + b)
  gcd(a, b) = b  if b divides a

Exercise 2.1 Prove this rigorously.
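The following Python sketch (a sequential simulation, not part of the notes) mirrors the recursive prefix-sum scheme just described, assuming n is a power of two; in a parallel setting, each loop or comprehension would run as one round of simultaneous additions.

    def prefix_sums(x):
        """Recursive parallel-prefix scheme: pair up inputs, solve a half-size
        prefix-sum problem, then fix up the remaining positions.
        Depth T(n) = T(n/2) + 2 = O(log n); work W(n) = W(n/2) + O(n) = O(n)."""
        n = len(x)                                   # n assumed to be a power of two
        if n == 1:
            return [x[0]]
        # combine odd-even pairs: y_k = x_(2k) + x_(2k+1) in 0-based indexing
        y = [x[2 * k] + x[2 * k + 1] for k in range(n // 2)]
        s = prefix_sums(y)                           # s[j] equals the prefix of length 2j+2
        result = [0] * n
        for j in range(n // 2):
            result[2 * j + 1] = s[j]                 # even-length prefixes come for free
            # odd-length prefixes need one extra addition each
            result[2 * j] = s[j - 1] + x[2 * j] if j > 0 else x[0]
        return result

    # prefix_sums([3, 1, 4, 1, 5, 9, 2, 6]) == [3, 4, 8, 9, 14, 23, 25, 31]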
The above also implies that gcd(a, b) = gcd(a − b, b) for b < a, and repeated applications imply that gcd(a, b) = gcd(a mod b, b), where mod denotes the remainder operation. So we have essentially derived Euclid's algorithm, described formally as

  Algorithm Euclid GCD
  Input: Positive integers a, b such that b ≤ a
  Output: GCD of a, b
  Let c = a mod b.
  If c = 0 then return b
  else return Euclid GCD(b, c)

Let us now analyze the running time of Euclid's algorithm in the bit model. Since it depends on integer division, which is a topic in its own right, let us address the number of iterations of Euclid's algorithm in the worst case.

Observation 2.1 a mod b ≤ a/2, i.e. the size of a mod b is strictly less than |a|.

This is a simple case analysis based on b ≤ a/2 and b > a/2. As a consequence of the above observation, it follows that the number of iterations of Euclid's algorithm is bounded by |a|, or equivalently O(log a).

Exercise 2.2 Construct an input for which the number of iterations matches this bound.

So, by using the long division method to compute mod, the running time is bounded by O(n^3) where n = |a| + |b|.

2.1.1 Extended Euclid's algorithm

If you consider the numbers defined by the linear combinations of a, b, namely {xa + yb | x, y are integers}, it is known that

  gcd(a, b) = min{ xa + yb | xa + yb > 0 }

Proof: Let ℓ = min{ xa + yb | xa + yb > 0 }. Clearly gcd(a, b) divides ℓ and hence gcd(a, b) ≤ ℓ. We now prove that ℓ divides a (and likewise b). Suppose, for contradiction, that ℓ does not divide a, i.e. a = ℓq + r where ℓ > r > 0. Writing ℓ = xa + yb, we get r = a − ℓq = (1 − xq)a − (yq)b > 0, contradicting the minimality of ℓ. ✷

The above result can be restated as: ℓ divides a and b, so ℓ divides gcd(a, b); combined with the earlier inequality, ℓ = gcd(a, b). For some applications, we are interested in computing x and y corresponding to gcd(a, b). We can compute them recursively along with Euclid's algorithm.

Exercise 2.3 Let (x', y') correspond to gcd(b, a mod b), i.e. gcd(b, a mod b) = x'·b + y'·(a mod b). Then show that gcd(a, b) = y'·a + (x' − q·y')·b, where q is the quotient of the integer division of a by b.

One immediate application of the extended Euclid's algorithm is computing the inverse in a multiplicative prime field F*_q, where q is prime and F*_q = {1, 2, ..., q−1} with multiplication performed modulo q. It is known (since F*_q forms a group) that for every x ∈ F*_q there exists y ∈ F*_q such that x·y ≡ 1 mod q; y is called the inverse of x. To compute the inverse of a, we can use the extended Euclid algorithm to find s, t such that sa + tq = 1, since a is relatively prime to q. By taking remainders modulo q, we see that s mod q is the required inverse.

Exercise 2.4 Extend the above result to Z*_N = {x | x is relatively prime to N}. First show that Z*_N is closed under multiplication modulo N, i.e., a, b ∈ Z*_N implies a·b mod N ∈ Z*_N.

2.2 Finding the k-th element

Problem Given a set S of n elements and an integer k, 1 ≤ k ≤ n, find an element x ∈ S such that the rank of x is k. The rank of an element x in S is k if x = x_k in the sorted order x_1, x_2, ..., x_n of S. We will denote the rank of x in S by R(x, S).

Note that k is not unique if the value of x is not unique, but the value of the k-th element is unique. If S is a multiset, we can (hypothetically) append log n trailing bits equal to the input index to each element, so an element x_i can be thought of as a pair (x_i, i); every pair is then unique since the input index is unique. The case k = 1 (k = n) corresponds to finding the minimum (maximum) element.

We can easily reduce the selection problem to sorting by first sorting S and then reporting the k-th element of the sorted set.
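Below is a small Python sketch (an illustration, not from the notes) of Euclid's algorithm, its extended version in the form suggested by Exercise 2.3, and the resulting modular inverse.

    def euclid_gcd(a, b):
        """Euclid's GCD, as in Algorithm Euclid GCD: gcd(a, b) = gcd(b, a mod b)."""
        c = a % b
        return b if c == 0 else euclid_gcd(b, c)

    def extended_euclid(a, b):
        """Return (g, x, y) with g = gcd(a, b) = x*a + y*b."""
        if b == 0:
            return a, 1, 0
        q, r = divmod(a, b)
        g, x1, y1 = extended_euclid(b, r)        # g = x1*b + y1*(a mod b)
        return g, y1, x1 - q * y1                # g = y1*a + (x1 - q*y1)*b

    def inverse_mod(a, q):
        """Inverse of a modulo q (a relatively prime to q):
        from s*a + t*q = 1 we get a^(-1) = s mod q."""
        g, s, _ = extended_euclid(a, q)
        assert g == 1, "a must be relatively prime to q"
        return s % q

    # inverse_mod(3, 7) == 5, since 3*5 = 15 = 1 (mod 7)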
But this also implies that we cannot circumvent the Ω(n log n) lower bound for comparison-based sorting. If we want a faster algorithm, we cannot afford to sort. For instance, when k = 1 or k = n, we can easily select the minimum (maximum) element using n − 1 comparisons. The basic idea for a faster selection algorithm is based on the following observation. Given an element x ∈ S, we can answer the following query in n − 1 comparisons:

  Is x the k-th element, or is x larger than the k-th element, or is x smaller than the k-th element?

This is easily done by comparing x with all elements in S − {x} and finding the rank of x. Using an arbitrary element x as a filter, we can subsequently confine our search for the k-th element to either
  (i) S_> = { y ∈ S − {x} | y > x } if R(x, S) < k, or
  (ii) S_< = { y ∈ S − {x} | y < x } if R(x, S) > k.
In the fortuitous situation R(x, S) = k, x is the required element. In case (i), we must find the k'-th element in S_>, where k' = k − R(x, S).

Suppose T(n) is the worst case running time for selecting the k-th element for any k; then we can write the following recurrence

  T(n) ≤ max{ T(|S_<|), T(|S_>|) } + O(n)

A quick inspection tells us that if we can ensure max{|S_<|, |S_>|} ≤ εn for some 1/2 ≤ ε ≤ (n−1)/n (why these bounds?), then T(n) is bounded by O(n/(1−ε)). So it could vary between Ω(n) and O(n^2), where a better running time is achieved by ensuring a smaller value of ε.

An element x used to divide the set is often called a splitter or a pivot. So now we will discuss methods to select a good splitter. From our previous discussion, we would like to select a splitter that has a rank in the range [ε·n, (1−ε)·n] for a fixed fraction ε. Typically, ε will be chosen as 1/4.

2.2.1 Choosing a random splitter

Let us analyze the situation where the splitter is chosen uniformly at random from S, i.e., any of the n elements is equally likely to be chosen as the splitter. This can be done using standard routines for random number generation in the range {1, 2, ..., n}. A central observation is:

  For a randomly chosen element r ∈ S, the probability Pr{ n/4 ≤ R(r, S) ≤ 3n/4 } ≥ 1/2.

It is easy to verify whether the rank R(r, S) falls in the above range, and if it does not, we choose another element independently at random. This process is repeated till we find a splitter in the above range - let us call such a splitter a good splitter.

How many times do we need to repeat the process? To answer this, we have to take a slightly different view. One can argue easily that there is no guarantee that we will terminate after some fixed number of trials, while it is also intuitively clear that it is extremely unlikely that we need to repeat this more than, say, 10 times. The probability of failing 9 consecutive times, when the success probability of picking a good splitter is ≥ 1/2 independently, is ≤ 1/2^9. More precisely, the expected number of trials is bounded by 2 (see the Appendix for a quick recap of basic measures of discrete probability). So, in (expected) two trials, we will find a good splitter that reduces the size of the problem to at most (3/4)n. This argument can be repeated for the recursive calls, namely, the expected number of splitter selections (and verifications of their ranks) is 2. If n_i is the size of the problem after i recursive calls with n_0 = n, then the expected number of comparisons done after the i-th recursive call is 2n_i.
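The following Python sketch (an illustration, not from the notes) realizes this strategy: pick a random splitter, retry until it is good, then recurse on one side; distinct keys and the small base case are simplifying assumptions.

    import random

    def select(S, k):
        """Return the k-th smallest (rank k, 1-based) element of S, using a
        randomly chosen splitter whose rank must fall in [n/4, 3n/4]."""
        n = len(S)
        if n <= 4:                                   # small base case: sort directly
            return sorted(S)[k - 1]
        while True:
            r = random.choice(S)
            smaller = [y for y in S if y < r]
            larger = [y for y in S if y > r]
            rank = len(smaller) + 1                  # rank of r in S (distinct keys assumed)
            if n // 4 <= rank <= 3 * n // 4:         # good splitter (expected <= 2 trials)
                break
        if rank == k:
            return r
        if k < rank:
            return select(smaller, k)
        return select(larger, k - rank)

    # select([9, 2, 7, 4, 5, 1, 8], 3) == 4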
The total expected number of comparisons X after t calls can be written as X_0 + X_1 + ... + X_t, where t is sufficiently large such that the problem size n_t ≤ C for some constant C (you can choose other stopping criteria), and X_i is the number of comparisons done at stage i. By taking expectations on both sides,

  E[X] = E[X_0 + X_1 + ... + X_t] = E[X_0] + E[X_1] + ... + E[X_t]

From the previous discussion E[X_i] = 2n_i, and moreover n_i ≤ (3/4)·n_{i−1}. Therefore the expected number of comparisons is bounded by 4n.

2.2.2 Median of medians

Partition the elements into groups of 5 and choose the median of each group. Denote the groups by G_i and their medians by m_i. Now consider the median of the set {m_i}, which contains about n/5 elements. Denote this median of medians by M.

How many elements are guaranteed to be smaller than M?

Without loss of generality, assume all elements are distinct; then about n/10 medians are smaller than M (strictly speaking we should be using the floor function, but we avoid the extra symbols as it does not affect the analysis). For each such median there are 3 elements of its group that are smaller than M, giving a total of at least n/10 · 3 = 3n/10 elements smaller than M. Likewise, we can argue that there are at least 3n/10 elements larger than M. Therefore we can conclude that 3n/10 ≤ R(M, S) ≤ 7n/10, which satisfies the requirement of a good splitter.

The next question is how to find M, the median of medians. Each m_i can be determined in O(1) time because we are dealing with groups of size 5. However, finding the median of the n/5 medians is like going back to square one! But it is n/5 elements instead of n, and therefore we can apply a recursive strategy. We can write a recurrence for the running time as follows:

  T(n) ≤ T(7n/10) + T(n/5) + O(n)

where the second recursive call is to find the median of medians (for finding a good splitter). After we find the splitter (by recursively applying the same algorithm), we use it to reduce the original problem size to at most 7n/10.

Exercise 2.5 By using an appropriate terminating condition, show that T(n) ∈ O(n). Try to minimize the leading constant by adjusting the size of the group.

2.3 Sorting words

Problem Given n words w_1, w_2, ..., w_n of lengths l_1, l_2, ..., l_n respectively, arrange the words in lexicographic order. A word is an ordered sequence of characters from a given alphabet Σ.

Recall that lexicographic ordering refers to the dictionary ordering. Let N = Σ_i l_i, i.e. the cumulative length of all words. A single word may be very long and we cannot assume that it fits into a single word of the computer, so we cannot use straightforward comparison sorting. Let us recall some basic results about integer sorting.

Claim 2.1 n integers in the range [1..m] can be sorted in O(n + m) steps.

A sorting algorithm is considered stable if the relative order of input elements having identical values is preserved in the sorted output.

Claim 2.2 Using stable sorting, n integers in the range [1..m^k] can be sorted in O(k(n + m)) steps.

The above is easily achieved by applying integer sorting in the range [1..m] starting from the least significant digits - note that the maximum number of digits in radix-m representation is k. If we apply the same algorithm for sorting words, then the running time will be O(L(n + |Σ|)) where L = max{l_1, l_2, ..., l_n}. This is not satisfactory, since L·n can be much larger than N (the size of the input).
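As a companion to the analysis above, here is a Python sketch (illustrative, not from the notes) of the deterministic median-of-medians selection; distinct elements and the base-case threshold are simplifying assumptions.

    def select_mom(S, k):
        """Deterministic selection of the k-th smallest (1-based) element via
        median of medians: T(n) <= T(7n/10) + T(n/5) + O(n) = O(n)."""
        if len(S) <= 25:                             # small base case: sort directly
            return sorted(S)[k - 1]
        # medians of groups of 5 (each found in O(1) time per group)
        groups = [S[i:i + 5] for i in range(0, len(S), 5)]
        medians = [sorted(g)[len(g) // 2] for g in groups]
        M = select_mom(medians, (len(medians) + 1) // 2)   # median of medians, recursively
        smaller = [y for y in S if y < M]
        larger = [y for y in S if y > M]
        rank = len(smaller) + 1                      # roughly 3n/10 <= rank <= 7n/10
        if rank == k:
            return M
        if k < rank:
            return select_mom(smaller, k)
        return select_mom(larger, k - rank)

    # select_mom(list(range(100, 0, -1)), 10) == 10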
The reason that the above method is potentially inefficient is that many words may be much shorter than L, and by treating them as length-L words (with hypothetical trailing blanks) we increase the input size asymptotically. In that use of radix sort, the words had to be left-aligned, i.e., all words begin from the same position. To make radix sort efficient and to avoid redundant comparisons (of blanks), we should not consider a word until the radix sort reaches the right boundary of that word. The radix sort will take a maximum of L rounds, and a word of length l will start participating from the (L − l + 1)-th iteration. This can be easily achieved. A bigger challenge is to reduce the range of sorting in each iteration depending on which symbols of the alphabet participate.

Given a word w_i = a_{i,1} a_{i,2} ... a_{i,l_i}, where a_{i,j} ∈ Σ, we form the pairs (1, a_{i,1}), (2, a_{i,2}), .... There are N such pairs from the n words, and we can think of them as length-two strings where the first symbol is from the range [1..L] and the second symbol is from Σ. We can sort them using radix sort in two rounds in time proportional to O(N + L + |Σ|), which is O(N + |Σ|) since N > L. From the sorted pairs we know exactly which symbols appear in a given position (between 1 and L) - let there be m_i words that have non-blank symbols in position i. We also obtain the ordering of the symbols in position i, which is crucial to implement the integer sort of that round in O(m_i) steps.

Now we go back to sorting the given words using radix sort, where we will use the information available from the sorted pairs. When we are sorting position i from the left, we apply integer sort in the range [1..m_i], where the ordered buckets are also defined by the sorted pairs. We do not move the entire words into the buckets but only the pointers (which could be the input indices of the words) associated with the words. For every round, we allocate an array of size m_i where we place the pointers according to the sorted order of the symbols involved. For identical symbols, we maintain the order of the previous round (stable sorting). We must also take care of the new words that start participating in the radix sort - once a word participates, it will participate in all future rounds. (Where should the new words be placed within their symbol group?)

The analysis of this algorithm can be done by looking at the cost of each radix sort, which is proportional to Σ_{i=1}^{L} O(m_i), and this sum is bounded by N. Therefore the overall running time of the algorithm is the sum of sorting the pairs and the radix sort, which is O(N + |Σ|). If |Σ| < N, then the running time is O(N), which is optimal.

Exercise 2.6 Work out the details of sorting n binary strings in O(N) steps, where N = Σ_i ℓ_i and ℓ_i is the number of bits in the i-th string.

2.4 Mergeable heaps

Heaps (we assume min-heaps throughout) are one of the most common implementations of priority queues and are known to support the operations min, delete-min, insert, and delete in logarithmic time. A complete binary tree (often implemented as an array) is one of the simplest ways to represent a heap. In many situations, we are interested in an additional operation, namely combining two heaps into a single heap. A binary tree doesn't support fast (polylogarithmic) merging and is not suitable for this purpose - instead we use binomial trees.

A binomial tree B_i of order i is recursively defined as follows:
  - B_0 is a single node.
  - For i ≥ 0, B_{i+1} is constructed from two B_i's by making the root node of one B_i the left child of the other B_i.
Exercise 2.7 Prove the following properties of B_i by induction:
  (i) The number of nodes in B_i equals 2^i.
  (ii) The height of B_k is k (by definition B_0 has height 0).
  (iii) There are exactly C(i, k) nodes at depth k, for k = 0, 1, ....
  (iv) The children of B_i are roots of B_{i−1}, B_{i−2}, ..., B_0.

A binomial heap is an ordered set of binomial trees such that for any i there is at most one B_i.

Let us refer to the above property as the unique-order property. We actually maintain the list of the root nodes in increasing order of their degrees. You may think of the above property as the binary representation of a number, where the i-th bit from the right is 0 or 1 and, in the latter case, its contribution is 2^i (with i = 0 for the LSB). From this analogy, a Binomial Heap on n elements has roughly log n binomial trees. Therefore, finding the minimum element can be done in O(log n) comparisons by finding the minimum of the roughly log n roots.

2.4.1 Merging Binomial Heaps

Merging two Binomial Heaps amounts to merging the root lists and restoring the unique-order property. First we merge the two lists, each of size at most log n. Subsequently, we walk along the list, combining two trees of the same degree whenever we find them - they must be consecutive. We know that by combining two B_i trees we obtain a B_{i+1} tree, which may now have to be combined with the next B_{i+1} tree if one exists. In this process, if you have three consecutive binomial trees of order i (can that happen?), merge the second and the third instead of the first and the second - it simplifies the procedure. Combining two binomial trees takes O(1) time, so the running time is proportional to the number of times we combine.

Claim 2.3 Two Binomial Heaps can be combined in O(log n) steps, where the total number of nodes in the two heaps is n.

Every time we combine two trees, the number of binomial trees decreases by one, so there can be at most 2 log n combining steps.

Remark The reader may compare this with the method for summing two numbers in binary representation.

Exercise 2.8 Show that the delete-min operation can be implemented in O(log n) steps using merging.

Inserting a new element is easy - add a node to the root list and merge. Deletion takes a little thought. Let us first consider the operation decrease-key, which happens when the key value of a node x decreases. Clearly, the min-heap property between x and its parent may no longer hold; it can be restored by repeatedly exchanging x with its parent until the property holds again (or x becomes the root).
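A compact Python sketch (illustrative, not from the notes) of the root-list merge of Section 2.4.1; the BTree class is a hypothetical representation, and the handling of three consecutive equal-order trees follows the description above.

    class BTree:
        """A binomial tree: a root key plus a list of child binomial trees."""
        def __init__(self, key):
            self.key, self.children, self.order = key, [], 0

    def link(s, t):
        """Combine two binomial trees of the same order into one of order + 1,
        keeping the min-heap property: the smaller root becomes the new root."""
        if t.key < s.key:
            s, t = t, s
        s.children.append(t)
        s.order += 1
        return s

    def merge_heaps(h1, h2):
        """Merge two binomial heaps (root lists in increasing order of degree)
        and restore the unique-order property with O(log n) link operations."""
        L = sorted(h1 + h2, key=lambda t: t.order)   # merge the two root lists
        i = 0
        while i + 1 < len(L):
            if L[i].order != L[i + 1].order:
                i += 1
            elif i + 2 < len(L) and L[i + 2].order == L[i].order:
                i += 1                               # three consecutive trees of the same
                                                     # order: skip the first, combine the
                                                     # second and third on the next step
            else:
                L[i:i + 2] = [link(L[i], L[i + 1])]  # combine an equal-order pair
        return L

    # Insertion is just a merge with a one-node heap:
    # h = merge_heaps([BTree(5)], [BTree(3)])   ->  a single B_1 with root key 3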
