Lecture Notes Optimization

Network Mathematics Graduate Programme
Hamilton Institute, Maynooth, Ireland

Lecture Notes: Optimization I
Angelia Nedić [1]
4th August 2008

(c) by Angelia Nedić 2008. All rights are reserved. The author hereby gives permission to print and distribute copies of these lecture notes intact, as long as the copies are not used for any commercial purpose.

[1] Industrial and Enterprise Systems Engineering Department, University of Illinois at Urbana-Champaign, Urbana IL 61801. E-mail: angelia@illinois.edu

Chapter 1: Review and Miscellanea

In this chapter, we briefly review, without proofs, some basic concepts and facts of mathematical (real) analysis and linear algebra. We assume that the reader is familiar with elementary calculus and linear algebra, such as the fundamental properties of, and operations with, scalar functions (continuity, derivatives, integrals, etc.) and matrices (addition, multiplication, inversion, determinant, etc.).

1.1 Linear Algebra

This section provides notation, definitions, and basic results of linear algebra. More on this material can be found, for example, in the textbook by Strang [32]. A more general treatment of matrices can be found in the book by Horn and Johnson [18].

1.1.1 Vectors and Set Operations

Vectors

We use R^n to denote the set of n-dimensional vectors, which we view as columns. Given a vector x ∈ R^n, we write x_i to denote its i-th component. We write x ≥ 0 and x > 0 when, respectively, x_i ≥ 0 and x_i > 0 for all components i. For any vectors x, y ∈ R^n, we write x ≥ y and x > y when x − y ≥ 0 and x − y > 0, respectively. Similarly, we interpret x ≤ 0, x < 0, x ≤ y, and x < y.

For a vector x ∈ R^n, we write x^+ to denote the componentwise maximum of x and the zero vector, i.e.,

    x^+ = ( max{x_1, 0}, ..., max{x_n, 0} )^T.

Note that x^+ ≥ 0. Similarly, we define x^- as the componentwise minimum of x and the zero vector,

    x^- = ( min{x_1, 0}, ..., min{x_n, 0} )^T.

Note that x^- ≤ 0.
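These componentwise operations, and the identity x = x^+ + x^-, are easy to check numerically. A minimal Python sketch (vectors as plain lists; the helper names plus_part and minus_part are illustrative, not from the notes):

```python
def plus_part(x):
    # componentwise maximum of x and the zero vector
    return [max(xi, 0.0) for xi in x]

def minus_part(x):
    # componentwise minimum of x and the zero vector
    return [min(xi, 0.0) for xi in x]

x = [3.0, -1.5, 0.0, 2.0]
xp = plus_part(x)   # [3.0, 0.0, 0.0, 2.0]
xm = minus_part(x)  # [0.0, -1.5, 0.0, 0.0]
assert all(a >= 0 for a in xp) and all(b <= 0 for b in xm)
assert [a + b for a, b in zip(xp, xm)] == x  # x = x^+ + x^-
```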
Furthermore, we have x = x^+ + x^-.

We use x^T to denote the transpose of a vector x. Accordingly, we use x^T y to denote the inner product of two vectors x, y ∈ R^n, i.e., x^T y = Σ_{i=1}^n x_i y_i. The vectors x, y ∈ R^n are orthogonal when their inner product is zero, i.e., x^T y = 0.

Set Operations

We denote the empty set by ∅. Given a set X ⊆ R^n, its complement set is

    X^c = { x ∈ R^n | x ∉ X }.

Given a scalar α ∈ R and a set X ⊆ R^n, the scaled set αX is given by

    αX = { z = αx | x ∈ X }.

Given two sets, their set difference X \ Y is the set given by

    X \ Y = { z | z ∈ X and z ∉ Y }.

Given two sets, their intersection X ∩ Y, union X ∪ Y, Cartesian product X × Y, and sum X + Y are the sets respectively given by

    X ∩ Y = { z | z ∈ X and z ∈ Y },
    X ∪ Y = { z | z ∈ X or z ∈ Y },
    X × Y = { (x, y) | x ∈ X, y ∈ Y },
    X + Y = { x + y | x ∈ X, y ∈ Y }.

The preceding definitions extend naturally to more than two sets. For example, the intersection of sets X_1, ..., X_m ⊆ R^n is given by

    X_1 ∩ ··· ∩ X_m = { x | x ∈ X_i for all i = 1, ..., m }.

1.1.2 Linear Combination and Independence

Given scalars α_1, ..., α_m ∈ R and vectors x_1, ..., x_m ∈ R^n, the vector z given by

    z = α_1 x_1 + ... + α_m x_m

is referred to as a linear combination of the vectors x_1, ..., x_m.

The vectors x_1, ..., x_m are said to be linearly dependent when the zero vector can be obtained as a nonzero linear combination of these vectors. Formally, x_1, ..., x_m are linearly dependent when there exist scalars α_1, ..., α_m, not all equal to zero, such that

    α_1 x_1 + ... + α_m x_m = 0.

The vectors x_1, ..., x_m are said to be linearly independent when they are not linearly dependent. Formally, they are independent when the equality α_1 x_1 + ... + α_m x_m = 0 holds only for α_1 = 0, ..., α_m = 0. In other words, the vectors x_1, ..., x_m are linearly independent when the zero vector cannot be obtained as a nonzero linear combination of these vectors.
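The inner product, orthogonality, and (for two vectors in R^2) linear dependence can be illustrated directly; a small Python sketch (helper names are illustrative, and the 2x2 determinant test below is a standard fact not stated in the notes):

```python
def inner(x, y):
    # inner product x^T y = sum_i x_i y_i
    return sum(a * b for a, b in zip(x, y))

def dependent_2d(x, y):
    # two vectors in R^2 are linearly dependent iff the determinant
    # x_1 y_2 - x_2 y_1 of the matrix [x y] vanishes
    return x[0] * y[1] - x[1] * y[0] == 0

assert inner([1.0, 2.0], [2.0, -1.0]) == 0.0   # orthogonal pair
assert dependent_2d([1.0, 2.0], [2.0, 4.0])    # y = 2x, dependent
assert not dependent_2d([1.0, 0.0], [0.0, 1.0])
```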
1.1.3 Subspace and Dimension

A nonempty set S ⊆ R^n is a subspace of R^n when every linear combination of its elements belongs to S. Formally, S is a subspace when for every set of vectors x_1, ..., x_m ∈ S we have

    α_1 x_1 + ... + α_m x_m ∈ S for any set of scalars α_1, ..., α_m.

Note that R^n is a subspace. Also, the set containing only the zero vector is a subspace.

Given a subspace S ⊆ R^n and a set of vectors x_1, ..., x_m in S, the vectors x_1, ..., x_m are said to span the subspace S when every vector in S can be expressed as a linear combination of x_1, ..., x_m, i.e.,

    S = { α_1 x_1 + ··· + α_m x_m | α_i ∈ R for all i }.

A set of vectors x_1, ..., x_m ∈ S is said to be a basis of the subspace S when the vectors x_1, ..., x_m span S and are linearly independent. A subspace S ⊆ R^n can have more than one basis. However, every basis of S has the same number of vectors. This common number is the dimension of the subspace S. Note that the zero subspace, S = {0}, has zero dimension.

We use e_j to denote the vector in R^n whose j-th entry is 1 and all other entries are 0. The vectors e_1, ..., e_n are the standard basis for R^n. Note that these vectors are mutually orthogonal, i.e.,

    e_j^T e_i = 0 for all i ≠ j.

They are also referred to as the standard orthogonal basis for R^n.

1.1.4 Affine Sets

A set X ⊆ R^n is an affine set when X is a translation of a subspace in R^n, i.e., when X can be written as

    X = x + S = { x + s | s ∈ S }    (1.1)

for some x ∈ X and some subspace S ⊆ R^n. We also say that the affine set X is generated by the subspace S. The dimension of an affine set X is the same as the dimension of its generating subspace S. For example, every singleton set {x} for x ∈ R^n is an affine set, since {x} = x + {0}. Each of these sets has zero dimension. The dimension of an affine set X is denoted by dim X.

If X is an affine set, then the subspace S that generates X is unique. However, the translation vector x in Eq. (1.1) is not unique. In fact, any vector x ∈ X can be used.
Formally, if X is an affine set generated by the subspace S, then we have X = x_0 + S for any x_0 ∈ X.

1.1.5 Orthogonal Vectors and Orthogonal Subspace

We say that vectors x, y ∈ R^n are orthogonal when their inner product is zero, i.e., when x^T y = 0. We often write x ⊥ y to denote that x and y are orthogonal.

Given a set X ⊆ R^n, the orthogonal complement of X is the set X^⊥ of vectors y that are orthogonal to every element of X, i.e.,

    X^⊥ = { y | y^T x = 0 for all x ∈ X }.

Note that X^⊥ is a subspace regardless of whether X is a subspace or not. For a subspace S, its orthogonal complement is also referred to as the orthogonal subspace of S. Note that (S^⊥)^⊥ = S. Furthermore, the sum of the dimensions of a subspace S ⊆ R^n and its orthogonal subspace S^⊥ is equal to the dimension of the underlying space R^n, i.e.,

    dim S + dim S^⊥ = n for a subspace S ⊆ R^n.

1.1.6 Vector Norm

A vector norm is a scalar function that assigns a nonnegative scalar to every vector x ∈ R^n. Specifically, a norm ||·|| : R^n → [0, +∞) is a scalar function with the following defining properties:

1. Nonnegativity: ||x|| ≥ 0 for all x ∈ R^n, where ||x|| = 0 if and only if x = 0.
2. Homogeneity: ||αx|| = |α| ||x|| for any x ∈ R^n and any α ∈ R.
3. Triangle Inequality (Subadditivity): ||x + y|| ≤ ||x|| + ||y|| for all x, y ∈ R^n.

For the most part, we use the Euclidean norm given by

    ||x|| = sqrt(x^T x) = sqrt( Σ_{i=1}^n x_i^2 ).

Occasionally, we also use the 1-norm and the ∞-norm, respectively, given by

    ||x||_1 = Σ_{i=1}^n |x_i|,    ||x||_∞ = max_{1≤i≤n} |x_i|.

When we write ||·||, it is the Euclidean norm. The following are important relations:

    |x^T y| ≤ ||x|| · ||y|| for any x, y ∈ R^n    (Schwarz inequality),
    ||x + y||^2 = ||x||^2 + ||y||^2 for any x, y ∈ R^n with x^T y = 0    (Pythagorean Theorem).

1.1.7 Matrices

Notation and Basics

Given a matrix A, we use A_ij or [A]_ij to denote its ij-th entry. We write A_i or [A]_i to denote its i-th row vector. Similarly, we use A^j or [A]^j to denote its j-th column vector. We denote the identity matrix by I. We use A^T to denote the transpose of the matrix A.
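The vector-norm definitions of Section 1.1.6 above, together with the Schwarz and triangle inequalities, can be verified numerically; a minimal Python sketch (helper names are illustrative):

```python
import math

def norm2(x):
    # Euclidean norm ||x|| = sqrt(x^T x)
    return math.sqrt(sum(v * v for v in x))

def norm1(x):
    # 1-norm: sum of absolute values
    return sum(abs(v) for v in x)

def norm_inf(x):
    # infinity-norm: largest absolute component
    return max(abs(v) for v in x)

def inner(x, y):
    return sum(a * b for a, b in zip(x, y))

x, y = [3.0, -4.0], [1.0, 2.0]
assert norm2(x) == 5.0 and norm1(x) == 7.0 and norm_inf(x) == 4.0
# Schwarz inequality and triangle inequality
assert abs(inner(x, y)) <= norm2(x) * norm2(y)
assert norm2([a + b for a, b in zip(x, y)]) <= norm2(x) + norm2(y)
```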
The rank of an m×n matrix A is the largest number of linearly independent columns of A. We say that A has full column rank when the columns of A are linearly independent. Similarly, we say that A has full row rank when the rows of A are linearly independent. We say that A has full rank when its rank is equal to the minimum of m and n. The matrix A and its transpose A^T have the same rank.

Null Space and Range

Given an m×n matrix A, the null space of A is the set N_A defined by

    N_A = { x ∈ R^n | Ax = 0 }.

The range of A is the set R_A given by

    R_A = { y ∈ R^m | y = Ax for some x ∈ R^n }.

Both the null space and the range are subspaces (of R^n and R^m, respectively). There is an important orthogonality relation between the null space of A and the range of its transpose A^T. In particular, we have

    N_A = ( R_{A^T} )^⊥.    (1.2)

1.1.8 Square Matrices

For a square matrix A, we use det A to denote its determinant. The matrix is said to be singular when its determinant is zero; otherwise, it is nonsingular or invertible.

For an n×n nonsingular matrix A, its inverse is the n×n matrix A^{-1} such that

    A^{-1} A = A A^{-1} = I.

The inverse matrix A^{-1} is unique. For n×n nonsingular matrices A and B, the following relation holds:

    (AB)^{-1} = B^{-1} A^{-1}.

Additional important properties of square matrices are given in the following lemma.

Lemma 1 For a square matrix A, the following statements are equivalent:

(a) A is nonsingular.
(b) A^T is nonsingular.
(c) A has full column rank.
(d) A has full row rank.
(e) Ax = 0 if and only if x = 0.
(f) For every y ∈ R^n, the equation Ax = y has a unique solution x ∈ R^n.
(g) The inverse A^{-1} exists and it is unique.

1.1.9 Eigenvalues and Eigenvectors

Given an n×n square matrix A, we say that a complex number λ is an eigenvalue of A when

    Ax = λx for some nonzero vector x.

A vector x ≠ 0 satisfying the preceding relation is referred to as an eigenvector of A associated with the eigenvalue λ. Note that the eigenvalues are in general complex numbers.
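For a symmetric 2x2 matrix, the eigenvalues can be computed from the characteristic polynomial by hand; the Python sketch below does so and checks the trace and determinant identities stated shortly (the function name eig2x2_sym is illustrative):

```python
import math

def eig2x2_sym(a, b, d):
    # eigenvalues of the symmetric 2x2 matrix [[a, b], [b, d]] via the
    # characteristic equation lambda^2 - (a+d) lambda + (ad - b^2) = 0
    tr, det = a + d, a * d - b * b
    disc = math.sqrt(tr * tr / 4.0 - det)  # real for symmetric matrices
    return tr / 2.0 - disc, tr / 2.0 + disc

l1, l2 = eig2x2_sym(2.0, 1.0, 2.0)   # -> (1.0, 3.0)
assert abs((l1 + l2) - 4.0) < 1e-12  # trace = sum of eigenvalues
assert abs(l1 * l2 - 3.0) < 1e-12    # determinant = product of eigenvalues
```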
Furthermore, every n×n matrix has n eigenvalues (possibly repeated). Also, if λ is a complex eigenvalue of A, then the complex conjugate λ̄ is also an eigenvalue of A.

The set of all eigenvalues of A is the spectrum of A, denoted by σ(A). The spectral radius of A is the largest magnitude of the eigenvalues, denoted by ρ(A). Formally, it is given by

    ρ(A) = max_{λ ∈ σ(A)} |λ|.

The determinant of A is equal to the product of the eigenvalues of A, i.e.,

    det A = λ_1 ··· λ_n.

The trace of A, denoted by Tr A, is defined as the sum of the diagonal entries of A:

    Tr A = Σ_{i=1}^n A_ii.

The trace of A is equal to the sum of the eigenvalues of A,

    Tr A = Σ_{i=1}^n λ_i.

Some additional properties of the eigenvalues of square matrices are given in the following lemma.

Lemma 2 Let A be an n×n square matrix. We have:

(a) A^T has the same eigenvalues as A, i.e., σ(A^T) = σ(A).
(b) Let B be a square matrix, and assume that B = CAC^{-1} for some invertible matrix C. Then, the matrices A and B have the same eigenvalues, i.e., σ(A) = σ(B).
(c) Let A be invertible and let its eigenvalues be λ_1, ..., λ_n. Then, the eigenvalues of A^{-1} are 1/λ_1, ..., 1/λ_n, i.e.,

    σ(A^{-1}) = { 1/λ | λ ∈ σ(A) }.

(d) Let λ_1, ..., λ_n be the eigenvalues of A. Then, the eigenvalues of the matrix αI + A are α + λ_1, ..., α + λ_n, i.e.,

    σ(αI + A) = α + σ(A).

(e) Let λ_1, ..., λ_n be the eigenvalues of A. Then, the eigenvalues of A^k are λ_1^k, ..., λ_n^k, i.e.,

    σ(A^k) = { λ^k | λ ∈ σ(A) }.

1.1.10 Matrix Norms

There are several matrix norms. Here, we exclusively consider the matrix norms induced by vector norms. We write ||A|| to denote the matrix norm induced by the Euclidean vector norm, which is given by

    ||A|| = sup_{x ≠ 0} ||Ax|| / ||x||.

Similarly, we write ||A||_1 and ||A||_∞ to denote the matrix norms induced by the vector 1-norm and ∞-norm, respectively, which are given by

    ||A||_1 = sup_{x ≠ 0} ||Ax||_1 / ||x||_1,
    ||A||_∞ = sup_{x ≠ 0} ||Ax||_∞ / ||x||_∞.

The norm ||A|| induced by the Euclidean vector norm is also known as the spectral norm.
It satisfies the following relation:

    ||A|| = sqrt( max { λ | λ is an eigenvalue of A^T A } ).

The norm ||A||_1 induced by the vector 1-norm is also known as the maximum column sum matrix norm. It is equivalently given by the following relation:

    ||A||_1 = max_j Σ_i |A_ij|.

Similarly, the norm ||A||_∞ induced by the vector ∞-norm is also known as the maximum row sum matrix norm. The following relation holds:

    ||A||_∞ = max_i Σ_j |A_ij|.

In the space of all square matrices A, the spectral radius ρ(A) is a continuous function of A (the distance between A and B being measured by ||A − B||, or any other matrix norm). Note that, in view of Lemma 2(a), we have for any square matrix A,

    ||A|| = ||A^T||.

1.1.11 Symmetric Matrices

A matrix A is said to be symmetric when A = A^T. Symmetric matrices have special eigenvalue properties, as given in the following.

Lemma 3 Let A be an n×n symmetric matrix. We have:

(a) The eigenvalues of A are real.
(b) There exist unit-norm and mutually orthogonal eigenvectors x_1, ..., x_n associated with the eigenvalues λ_1, ..., λ_n of A, respectively, i.e., vectors such that

    A x_i = λ_i x_i with ||x_i|| = 1 for all i = 1, ..., n,
    x_j^T x_i = 0 for all i ≠ j.

Furthermore, the matrix A has the following representation:

    A = Σ_{i=1}^n λ_i x_i x_i^T.

Additional important properties related to the norm of a square symmetric matrix A and its powers A^k are discussed in the following lemma.

Lemma 4 Let A be a square symmetric matrix. We have:

(a) ||A|| = ρ(A) = max_{λ ∈ σ(A)} |λ|.
(b) ||A^k|| = ||A||^k for any integer k ≥ 0.
(c) Assume that A is nonsingular. Then

    ||A^{-1}|| = max_{λ ∈ σ(A)} 1/|λ|.

(d) Let A be of size n×n and let λ_1 ≤ ··· ≤ λ_n be its eigenvalues sorted in nondecreasing order. Then,

    λ_1 ||x||^2 ≤ x^T Ax ≤ λ_n ||x||^2 for all x ∈ R^n.

Positive Semidefinite and Positive Definite Matrices

Let A be an n×n symmetric matrix. The matrix A is positive semidefinite when

    x^T Ax ≥ 0 for all x ∈ R^n.

The matrix A is positive definite if the preceding inequality is strict when x ≠ 0, i.e.,

    x^T Ax > 0 for all x ∈ R^n with x ≠ 0.
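For 2x2 symmetric matrices, definiteness can be tested directly. The Python sketch below uses the principal-minor (Sylvester) criterion, which is not stated in these notes but is equivalent to the eigenvalue characterization given in Lemma 5; the function names are illustrative:

```python
def is_pd_2x2(a, b, d):
    # [[a, b], [b, d]] is positive definite iff its leading principal
    # minors a and ad - b^2 are positive (Sylvester's criterion)
    return a > 0 and a * d - b * b > 0

def is_psd_2x2(a, b, d):
    # positive semidefinite iff all principal minors are nonnegative
    return a >= 0 and d >= 0 and a * d - b * b >= 0

assert is_pd_2x2(2.0, 1.0, 2.0)       # eigenvalues 1 and 3, both positive
assert not is_pd_2x2(1.0, 2.0, 1.0)   # det = -3 < 0, indefinite
assert is_psd_2x2(1.0, 1.0, 1.0)      # eigenvalues 0 and 2
```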
Similarly, A is negative semidefinite when

    x^T Ax ≤ 0 for all x ∈ R^n,

and it is negative definite if the preceding inequality is strict when x ≠ 0, i.e.,

    x^T Ax < 0 for all x ∈ R^n with x ≠ 0.

Note that positive semidefinite and positive definite, as well as negative semidefinite and negative definite, matrices are all square and symmetric by definition.

Lemma 5 Let A be a symmetric matrix.

(a) A is positive semidefinite if and only if all its eigenvalues are nonnegative.
(b) A is positive definite if and only if all its eigenvalues are positive.
(c) A is negative semidefinite (negative definite) if and only if all its eigenvalues are nonpositive (negative).
(d) Let A be positive definite. Then, its inverse is also positive definite.

A positive semidefinite matrix A has a symmetric square root matrix, denoted by A^{1/2}. The symmetric square root matrix A^{1/2} is such that

    A^{1/2} A^{1/2} = A.

Further properties of the square root matrix are given in the following lemma.

Lemma 6 Let A be a positive semidefinite matrix. Then:

(a) A^{1/2} is nonsingular if and only if A is nonsingular.
(b) A^{-1/2} A^{-1/2} = A^{-1}.
(c) A A^{1/2} = A^{1/2} A.

1.2 Real Analysis and Multivariate Calculus

This section contains a brief review of basic notions and results related to vector sequences (such as accumulation points and convergence), topological properties of sets in R^n (such as closed and open sets), and vector functions (such as continuity, differentiability, and Taylor expansions). More on these topics can be found in many textbooks on real analysis, such as, for example, Rudin [27], Kolmogorov and Fomenko [20], and Zorich [35]. Also, a good summary of these results (and more) can be found in the books by Bertsekas [5], Bertsekas, Nedić and Ozdaglar [9], Boyd and Vandenberghe [13], and Polyak [25].

Throughout this section, we consider the n-dimensional vector space R^n equipped with the standard inner product x^T y = Σ_{i=1}^n x_i y_i and the Euclidean norm ||x|| = sqrt(x^T x) = sqrt( Σ_{i=1}^n x_i^2 ).
1.2.1 Vector Sequence

Given a vector sequence {x_k} ⊂ R^n and an infinite index set K ⊆ {1, 2, ...}, we refer to the sequence {x_k | k ∈ K} as a subsequence of the sequence {x_k}. We also use {x_k}_K or {x_{k_i}} to denote a subsequence of {x_k}. We say that the sequence {x_k} is bounded when for some scalar C, we have ||x_k|| ≤ C for all k.

The sequence {x_k} converges to a vector x̃ ∈ R^n if

    lim_{k→∞} ||x_k − x̃|| = 0.

The vector x̃ is referred to as the limit of {x_k}. When the sequence {x_k} converges to a vector x̃, we also write x_k → x̃. If the sequence does not converge, we say that it diverges, or that it is a divergent sequence. Somewhat abusing the terminology, we say that the sequence {x_k} converges to ∞ when lim_{k→∞} ||x_k|| = ∞.

A vector y ∈ R^n is an accumulation point of the sequence {x_k} when there is a subsequence {x_{k_i}} of {x_k} converging to y, i.e., such that x_{k_i} → y as i → ∞.

Theorem 1 (Bolzano) A bounded sequence {x_k} ⊂ R^n has at least one accumulation point.

Given a scalar sequence {x_k} ⊂ R, we say that a is an upper bound for {x_k} when

    x_k ≤ a for all k.

The smallest upper bound for {x_k} is referred to as the supremum of {x_k}, and it is denoted by sup_k x_k. If there exists an index k̃ such that

    x_{k̃} = sup_k x_k,

then we say that the supremum of {x_k} is attained, and we write max_k x_k instead of sup_k x_k. Similarly, we say that b is a lower bound for {x_k} when

    b ≤ x_k for all k.

The largest lower bound for {x_k} is referred to as the infimum of {x_k}, and it is denoted by inf_k x_k. When there exists an index k̃ such that

    x_{k̃} = inf_k x_k,

then we say that the infimum of {x_k} is attained, and we write min_k x_k instead of inf_k x_k. We note that both the infimum and the supremum of {x_k} may be infinite. Moreover, we always have inf_k x_k ≤ sup_k x_k.

We refer to the largest accumulation point a of {x_k} as the limit superior of {x_k}, and we write

    limsup_{k→∞} x_k = a.

Note that this includes the possibility that a may be infinite (i.e., a = −∞ or a = +∞).
Similarly, we refer to the smallest accumulation point b as the limit inferior of {x_k}, and we write

    liminf_{k→∞} x_k = b,

including the possibility that b is infinite. We always have

    inf_k x_k ≤ liminf_{k→∞} x_k ≤ limsup_{k→∞} x_k ≤ sup_k x_k.

Furthermore, there holds

    liminf_{k→∞} x_k = limsup_{k→∞} x_k

if and only if {x_k} is convergent, including the possibilities x_k → +∞ or x_k → −∞.

1.2.2 Set Topology

Open, Closed and Compact Sets

Let X ⊆ R^n be a nonempty set. The set X is bounded when there exists a scalar C such that ||x|| ≤ C for all x ∈ X.

A vector x̃ is an accumulation (or a limit) point of the set X when there is a sequence {x_k} ⊆ X such that x_k → x̃. The set X is closed when it contains all of its accumulation points. The set X is open when its complement X^c = { x ∈ R^n | x ∉ X } is closed.

A set X is either open, or closed, or neither open nor closed. The only exceptions to this rule are the whole vector space R^n and the empty set ∅. By convention, the sets R^n and ∅ are the only sets that are both open and closed. We have:

• A subspace of R^n is closed.
• An affine set in R^n is closed.
• The set B(x, r) ⊆ R^n given by

    B(x, r) = { y ∈ R^n | ||y − x|| ≤ r } for some x ∈ R^n and r > 0

is closed. This ball is also referred to as the closed ball centered at x with radius r.

There is an alternative (equivalent) way of defining open and closed sets. The preceding definition starts with the notion of a closed set, and then defines an open set as a set whose complement is closed. In the following definition of an open set, we use the notion of an open ball in R^n. An open ball in R^n centered at x and with radius r > 0, denoted by B(x, r), is the set given by

    B(x, r) = { y ∈ R^n | ||y − x|| < r }.

Given a set X ⊆ R^n, the set X is open if for every x ∈ X there is a radius r small enough (depending on x) such that the ball B(x, r) is contained in the set X, i.e., for every x ∈ X there is r > 0 such that B(x, r) ⊆ X. The set X is closed if its complement X^c is open.

Let J be some set.
We say that J is finite if J has finitely many elements, i.e., the cardinality of J is finite. The following are some important properties of a family of open/closed sets.

Lemma 7 Let { X_j | j ∈ J } be a family of sets X_j ⊆ R^n, where J is some index set.

(a) If X_j is closed for each j ∈ J, then the intersection ∩_{j∈J} X_j is closed.
(b) If X_j is open for each j ∈ J, then the union ∪_{j∈J} X_j is open.
(c) If J is finite and X_j is closed for each j ∈ J, then the union ∪_{j∈J} X_j is closed.
(d) If J is finite and X_j is open for each j ∈ J, then the intersection ∩_{j∈J} X_j is open.

A set X is compact when every sequence {x_k} ⊆ X has an accumulation point x̃ that belongs to the set X. Compact sets in R^n have another characterization, as given in the following.

Lemma 8 The set X is compact if and only if it is closed and bounded.

A closed ball in R^n is compact. Neither a subspace nor an affine set is compact, since no such set is bounded.

Closure, Interior and Boundary

Let X ⊆ R^n be a nonempty set. The closure of the set X is the set of all accumulation points of X, and it is denoted by cl X. Note that we always have X ⊆ cl X, where the equality holds only when X is closed.

A vector x ∈ X is an interior point of X when there exists a ball centered at x with some radius r such that B(x, r) ⊆ X. The set of all interior points of X is referred to as the interior of X, and it is denoted by int X. We always have int X ⊆ X, where the equality holds only when X is open.

A vector x̃ ∈ R^n is a boundary point of the set X when for every radius r, the ball B(x̃, r) contains points that belong to the set X and points that do not belong to the set X. Formally, x̃ is a boundary point of X when

    B(x̃, r) ∩ X ≠ ∅ and B(x̃, r) ∩ X^c ≠ ∅ for all r > 0.

1.2.3 Mapping and Function

Mapping

Let X ⊆ R^n. A mapping from a set X to R^m is an operation that to every x ∈ X assigns a vector y ∈ R^m.
We write F : X → R^m to denote a mapping F from the set X to R^m. We refer to X as the domain of the mapping F. The domain of a mapping F is denoted by dom F. The image of X under the mapping F is the set

    F(X) = { y ∈ R^m | y = F(x) for some x ∈ X }.

Given a set Y ⊆ R^m, the inverse image of Y under the mapping F is the set

    F^{-1}(Y) = { x ∈ X | F(x) ∈ Y }.

Let F be a mapping F : X → R^m with X ⊆ R^n. The mapping F is affine when, for some matrix A and a vector b ∈ R^m,

    F(x) = Ax + b for all x ∈ X.

For example, a space translation (i.e., for a given x_0 ∈ R^n, F(x) = x + x_0 for all x ∈ R^n) is an affine mapping from R^n to R^n. The mapping F is linear when, for a matrix A,

    F(x) = Ax for all x ∈ X.

For example, given the coordinate subspace S = { x ∈ R^n | x_j = 0 for j ∈ J }, where J ⊆ {1, ..., n}, the projection on the subspace S is a linear mapping from R^n to R^n, i.e., the mapping F given by

    F(x) = Ax, where A_ii = 1 for i ∉ J and A_ij = 0 otherwise,

is linear from R^n to R^n.

Function

When f is a mapping from a set X ⊆ R^n to the scalar set R, we say that f is a function. Some special functions defined on R^n include:

• Quadratic function, given for an n×n matrix Q, a vector a ∈ R^n, and a scalar b ∈ R, by

    f(x) = x^T Qx + a^T x + b for all x ∈ R^n.

• Affine function, given for a vector a ∈ R^n and a scalar b ∈ R, by

    f(x) = a^T x + b for all x ∈ R^n.

• Linear function, given for a vector a ∈ R^n, by

    f(x) = a^T x for all x ∈ R^n.

• Constant function, given for a scalar b ∈ R, by

    f(x) = b for all x ∈ R^n.

1.2.4 Continuity

Let F : X → R^m be a mapping with X ⊆ R^n, and let x ∈ X be a given vector. We say that F is continuous at the vector x when the vectors F(x_k) converge to F(x) for every sequence {x_k} ⊆ X converging to x, i.e.,

    F(x_k) → F(x) for every {x_k} ⊆ X with x_k → x.

We say F is continuous over X when F is continuous at every x ∈ X. When F : R^n → R^m is continuous over R^n, we just say that F is continuous. For example, any vector norm in R^n is a continuous function.
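The continuity of the norm follows from the reverse triangle inequality | ||x|| − ||y|| | ≤ ||x − y|| (a standard consequence of subadditivity, not proved in the notes), which says the norm is Lipschitz with constant 1; a quick Python check:

```python
import math

def norm2(x):
    # Euclidean norm ||x|| = sqrt(x^T x)
    return math.sqrt(sum(v * v for v in x))

# reverse triangle inequality: | ||x|| - ||y|| | <= ||x - y||,
# so the norm is Lipschitz (constant 1), hence continuous
x, y = [3.0, -4.0], [1.0, 1.0]
diff = [a - b for a, b in zip(x, y)]
assert abs(norm2(x) - norm2(y)) <= norm2(diff)
```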
A mapping F : X → R^m, with X ⊆ R^n, is Lipschitz (continuous) over X if there exists a scalar c > 0 (possibly depending on X) such that

    ||F(x) − F(y)|| ≤ c ||x − y|| for all x, y ∈ X.

When the preceding relation holds for X = R^n, we simply say F is Lipschitz (continuous).

A function f : X → R is lower semicontinuous at a vector x ∈ X when

    f(x) ≤ liminf_{k→∞} f(x_k) for every {x_k} ⊆ X with x_k → x.

A function f is upper semicontinuous at a vector x ∈ X when the function −f is lower semicontinuous at x, i.e.,

    f(x) ≥ limsup_{k→∞} f(x_k) for every {x_k} ⊆ X with x_k → x.

When f is lower semicontinuous at every point x in some set X, we say f is lower semicontinuous over X. When f is lower semicontinuous at every point x ∈ R^n, we say f is lower semicontinuous. Analogous terminology is used for upper semicontinuity.

Lemma 9 The following are some properties of continuous mappings.

(a) Let Y ⊆ R^p. Let F : R^n → R^m be a continuous mapping over R^n and G : Y → R^n be a continuous mapping over Y. Then, their composition F ◦ G : Y → R^m given by

    (F ◦ G)(y) = F(G(y)) for all y ∈ Y

is continuous over Y.

(b) Let F : R^n → R^m be a continuous map.

    (i) Let X ⊂ R^n be a compact set. Then, the image F(X) of X under F is compact.
    (ii) Let Y ⊆ R^m. If Y is open, then the inverse image F^{-1}(Y) is open. If Y is closed, then the inverse image F^{-1}(Y) is closed.

Given a function f : R^n → R, the (lower) level set L_γ(f) of f for a given γ ∈ R is the set defined by

    L_γ(f) = { x ∈ R^n | f(x) ≤ γ }.

When f is lower semicontinuous, the level set L_γ(f) is closed for each γ. In fact, a stronger relation holds, as given in the following.

Theorem 2 The function f : R^n → R is lower semicontinuous if and only if the level set L_γ(f) is closed for each γ ∈ R.

Let X ⊆ R^n and f : X → R. The function f is coercive over X when it satisfies the following relation:

    lim_{k→∞} f(x_k) = +∞ for every sequence {x_k} ⊆ X with ||x_k|| → ∞.

When X = R^n in the preceding relation, we just say f is coercive.
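Coercivity forces the lower level sets to be bounded, which is why it is useful for guaranteeing minimizers. A quick numerical illustration in Python, using the coercive function f(x) = x^2 on a fine grid (the setup is illustrative):

```python
# f(x) = x^2 is coercive: f(x) -> +inf as |x| -> inf,
# so its level set { x : f(x) <= gamma } is bounded
f = lambda x: x * x
gamma = 4.0
# sample [-10, 10] on a grid of step 0.01 and collect the level set
level = [x / 100.0 for x in range(-1000, 1001) if f(x / 100.0) <= gamma]
assert min(level) == -2.0 and max(level) == 2.0  # level set is [-2, 2]
```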
In the next theorem, we provide conditions on the function f and the given set X that are sufficient for the attainment of both the infimum of f over X and the supremum of f over X.

Theorem 3 (Weierstrass) Let f : R^n → R be a continuous function and X ⊆ R^n be a nonempty compact set. Then, both inf_{x∈X} f(x) and sup_{x∈X} f(x) are finite, and there exist vectors x* ∈ X and x̃ ∈ X such that

    f(x*) = inf_{x∈X} f(x) and f(x̃) = sup_{x∈X} f(x).

The following theorem provides some conditions on the function f and the given set X that guarantee only the attainment of the infimum of f over X.

Theorem 4 Let X ⊆ R^n be a nonempty set and let f : X → R be a function lower semicontinuous over X. Furthermore, let any of the following conditions be satisfied:

(i) The function f is coercive over X and the set X is closed.
(ii) For some γ ∈ R, the set { x ∈ X | f(x) ≤ γ } is nonempty and compact.
(iii) The set X is compact.

Then, inf_{x∈X} f(x) is finite and there exists a vector x* ∈ X such that

    f(x*) = inf_{x∈X} f(x).

1.2.5 Differentiability

Differentiable Functions

Let f : R^n → R be a function, and let a vector x ∈ R^n and a direction d ∈ R^n be given. Consider the limit

    f'(x; d) = lim_{λ↓0} ( f(x + λd) − f(x) ) / λ,

where λ ↓ 0 means that λ → 0 with λ > 0. When this limit exists, we say that f'(x; d) is the directional derivative of f along the direction d at the point x.

Suppose that f at the point x has directional derivatives f'(x; d) in all directions d ∈ R^n. If the directional derivative function f'(x; ·) : R^n → R is linear, we say that f is differentiable at x. This type of differentiability is also known as Gateaux differentiability. It is equivalent to the existence of the gradient ∇f(x) of f at x, which satisfies

    f'(x; d) = ∇f(x)^T d for all d ∈ R^n.

The gradient ∇f(x) is the column vector of partial derivatives,

    ∇f(x) = ( ∂f(x)/∂x_1, ..., ∂f(x)/∂x_n )^T,
where for each i = 1, ..., n, ∂f(x)/∂x_i is the partial derivative of f at x given by

    ∂f(x)/∂x_i = lim_{λ→0} ( f(x + λe_i) − f(x) ) / λ,

with e_i being the i-th standard basis vector in R^n (see Section 1.1.3). When f is differentiable at x for all x in some (open) set X, we say f is differentiable over X. If f is differentiable over R^n, we just say f is differentiable.

Let f be differentiable over some (open) set X, and assume that ∇f(·) is a continuous function over X. We then say that f is continuously differentiable over X. In this case, the following relation holds:

    lim_{||d||→0} ( f(x + d) − f(x) − ∇f(x)^T d ) / ||d|| = 0 for all x ∈ X.

Let f : R^n → R have second partial derivatives ∂²f(x)/∂x_i∂x_j for all i, j at a given vector x. Then, we say f is twice differentiable at x. The matrix ∇²f(x) with entries [∇²f(x)]_ij = ∂²f(x)/∂x_i∂x_j is the Hessian of f at x, i.e., the Hessian ∇²f(x) is given by

    ∇²f(x) = [ ∂²f(x)/∂x_i∂x_j ]_{i,j=1,...,n}.

Since the second partial derivatives are symmetric,

    ∂²f(x)/∂x_i∂x_j = ∂²f(x)/∂x_j∂x_i for all i, j,

the Hessian ∇²f(x) is a symmetric matrix.

If the Hessian ∇²f(x) exists at every x in an (open) set X, we say that f is twice differentiable over X. In addition, when the Hessian is continuous over X, we say f is twice continuously differentiable over X. Similarly, we say f is twice (continuously) differentiable when f is twice (continuously) differentiable over R^n.

Mean Value Theorem

The following theorem provides some useful properties of continuously and twice continuously differentiable functions.

Theorem 5 Let X ⊆ R^n be an open set and let x, y ∈ X be arbitrary. Also, let f : X → R.

(a) If f is continuously differentiable over X, then

    f(y) = f(x) + ∇f(z)^T (y − x),

where z = αx + (1 − α)y for some scalar α ∈ [0, 1].
(b) If f is twice continuously differentiable over X, then

    f(y) = f(x) + ∇f(x)^T (y − x) + (1/2)(y − x)^T ∇²f(ξ)(y − x),

where ξ = βx + (1 − β)y for some scalar β ∈ [0, 1].

Taylor Expansions

The following theorem provides useful function value approximations based on Taylor's series expansion.

Theorem 6 Let X ⊆ R^n be an open set and let x, y ∈ X be arbitrary. Also, let f : X → R.

(a) If f is continuously differentiable over X, then

    f(y) = f(x) + ∇f(x)^T (y − x) + o(||y − x||).

(b) If f is twice continuously differentiable over X, then

    f(y) = f(x) + ∇f(x)^T (y − x) + (1/2)(y − x)^T ∇²f(x)(y − x) + o(||y − x||^2).

Here, o(α) denotes a continuous function such that lim_{α→0} o(α)/α = 0.

A function with Lipschitz continuous gradient can be approximated from above by a quadratic function, as seen in the following theorem.

Theorem 7 Let f : R^n → R be a continuously differentiable function with Lipschitz gradient, i.e., for some scalar c > 0,

    ||∇f(x) − ∇f(y)|| ≤ c ||x − y|| for all x, y ∈ R^n.

Then, we have for all x, y ∈ R^n,

    f(y) ≤ f(x) + ∇f(x)^T (y − x) + (c/2) ||y − x||^2.

Differentiable Mappings

Let F : R^n → R^m be a mapping given by

    F(x) = ( f_1(x), ..., f_m(x) )^T,

where f_i : R^n → R for all i. When each f_i is a differentiable function at a given x, the mapping F is differentiable at x. The matrix whose rows are ∇f_1(x)^T, ..., ∇f_m(x)^T is the Jacobian of F at x. The Jacobian is an m×n matrix denoted by JF(x), i.e., its i-th row is ∇f_i(x)^T.

Chain Rules

Let G : R^p → R^n be a mapping differentiable at x, and let F : R^n → R^m be another mapping differentiable at G(x). Then, the composite mapping F ◦ G is differentiable at x, and the following chain rule holds for the Jacobian J(F ◦ G)(x):

    J(F ◦ G)(x) = JF(G(x)) JG(x).

When G is a linear mapping given by

    G(x) = Ax for all x ∈ R^p and some n×p matrix A,

then

    J(F ◦ G)(x) = JF(Ax) A.

Let G : R^p → R^n be a mapping differentiable at x, and let f : R^n → R be a function differentiable at G(x).
Then, the composite function f ◦ G is differentiable at x, and the following chain rule holds for the gradient ∇(f ◦ G)(x):

    ∇(f ◦ G)(x) = JG(x)^T ∇f(G(x)).

When G is a linear mapping given by

    G(x) = Ax for all x ∈ R^p and an n×p matrix A,

then

    ∇(f ◦ G)(x) = A^T ∇f(Ax).

Moreover, if f is twice differentiable at Ax, then the following chain rule holds for the Hessian ∇²(f ◦ G)(x):

    ∇²(f ◦ G)(x) = A^T ∇²f(Ax) A.

Let f : R^n × R^m → R be a function of the variable (x, y), where x ∈ R^n and y ∈ R^m. Then, the gradients ∇_x f(x, y) and ∇_y f(x, y) of f at (x, y) with respect to x and y, respectively, are given by

    ∇_x f(x, y) = ( ∂f(x, y)/∂x_1, ..., ∂f(x, y)/∂x_n )^T,
    ∇_y f(x, y) = ( ∂f(x, y)/∂y_1, ..., ∂f(x, y)/∂y_m )^T.

Let F : R^p → R^n and G : R^p → R^m be two mappings differentiable at z ∈ R^p. Let f : R^n × R^m → R be a function of (x, y) differentiable at (F(z), G(z)). Then, the gradient ∇_z f(F(z), G(z)) is given by

    ∇_z f(F(z), G(z)) = JF(z)^T ∇_x f(F(z), G(z)) + JG(z)^T ∇_y f(F(z), G(z)).
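The chain rule for a linear inner mapping, ∇(f ◦ G)(x) = A^T ∇f(Ax), can be checked against central finite differences. A minimal Python sketch for h(x) = ||Ax||^2 (all names and the 2x2 example matrix are illustrative):

```python
# chain rule for h(x) = f(Ax): grad h(x) = A^T grad f(Ax),
# verified against a central-difference approximation
A = [[1.0, 2.0], [0.0, 1.0]]          # example 2x2 matrix
f = lambda y: y[0] ** 2 + y[1] ** 2   # grad f(y) = 2y

def matvec(M, v):
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def grad_h(x):
    y = matvec(A, x)
    gf = [2.0 * yi for yi in y]       # grad f(Ax)
    # A^T grad f(Ax)
    return [sum(A[i][j] * gf[i] for i in range(2)) for j in range(2)]

h = lambda x: sum(v * v for v in matvec(A, x))
x = [1.0, -1.0]
g = grad_h(x)
for i in range(2):
    xp, xm = list(x), list(x)
    xp[i] += 1e-6
    xm[i] -= 1e-6
    assert abs(g[i] - (h(xp) - h(xm)) / 2e-6) < 1e-4
```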
