Lecture Notes on Information Theory

Lecture Notes in Information Theory, Volume II

by Po-Ning Chen and Fady Alajaji

Po-Ning Chen: Department of Electrical Engineering, Institute of Communication Engineering, National Chiao Tung University, 1001 Ta Hsueh Road, Hsin Chu, Taiwan 30056, Republic of China. Email: poning@faculty.nctu.edu.tw

Fady Alajaji: Department of Mathematics & Statistics, Queen's University, Kingston, ON K7L 3N6, Canada. Email: fady@polya.mast.queensu.ca

December 7, 2014

Copyright by Po-Ning Chen and Fady Alajaji, December 7, 2014.

Preface

The reliable transmission of information-bearing signals over a noisy communication channel is at the heart of what we call communication. Information theory, founded by Claude E. Shannon in 1948, provides a mathematical framework for the theory of communication; it describes the fundamental limits to how efficiently one can encode information and still be able to recover it with negligible loss. This course will examine the basic concepts of this theory. What follows is a tentative list of topics to be covered.

1. Volume I:

(a) Fundamentals of source coding (data compression): discrete memoryless sources, entropy, redundancy, block encoding, variable-length encoding, Kraft inequality, Shannon code, Huffman code.

(b) Fundamentals of channel coding: discrete memoryless channels, mutual information, channel capacity, coding theorem for discrete memoryless channels, weak converse, channel capacity with output feedback, the Shannon joint source-channel coding theorem.

(c) Source coding with distortion (rate-distortion theory): discrete memoryless sources, rate-distortion function and its properties, rate-distortion theorem.

(d) Other topics: information measures for continuous random variables, capacity of discrete-time and band-limited continuous-time Gaussian channels, rate-distortion function of the memoryless Gaussian source, encoding of discrete sources with memory, capacity of discrete channels with memory.

(e) Fundamental background on real analysis and probability (Appendix): the concept of set, supremum and maximum, infimum and minimum, boundedness, sequences and their limits, equivalence, probability space, random variable and random process, relation between a source and a random process, convergence of sequences of random variables, ergodicity and laws of large numbers, central limit theorem, concavity and convexity, Jensen's inequality.

2. Volume II:

(a) General information measures: information spectrum and quantile and their properties, Rényi's information measures.

(b) Advanced topics in lossless data compression: fixed-length lossless data compression theorem for arbitrary sources, variable-length lossless data compression theorem for arbitrary sources, entropy of English, Lempel-Ziv code.

(c) Measure of randomness and resolvability: resolvability and source coding, approximation of output statistics for arbitrary channels.

(d) Advanced topics in channel coding: channel capacity for arbitrary single-user channels, optimistic Shannon coding theorem, strong capacity, ε-capacity.

(e) Advanced topics in lossy data compression.

(f) Hypothesis testing: error exponent and divergence, large deviations theory, Berry-Esseen theorem.

(g) Channel reliability: random-coding exponent, expurgated exponent, partitioning exponent, sphere-packing exponent, the asymptotic largest minimum distance of block codes, Elias bound, Varshamov-Gilbert bound, Bhattacharyya distance.
(h) Information theory of networks: distributed detection, data compression over distributed sources, capacity of multiple-access channels, degraded broadcast channel, Gaussian multiple-terminal channels.

As the list shows, the lecture notes are divided into two volumes. The first volume is suitable for a 12-week introductory course such as the one given at the Department of Mathematics and Statistics, Queen's University at Kingston, Canada. It also meets the needs of a fundamental course for senior undergraduates, such as the one given at the Department of Computer Science and Information Engineering, National Chi Nan University, Taiwan. For an 18-week graduate course, such as the one given at the Department of Communications Engineering, National Chiao Tung University, Taiwan, the lecturer can selectively add advanced topics covered in the second volume to enrich the lecture content and provide students with a more complete and advanced view of information theory.

The authors are very much indebted to all the people who provided insightful comments on these lecture notes. Special thanks are devoted to Prof. Yunghsiang S. Han of the Department of Computer Science and Information Engineering, National Chi Nan University, Taiwan, for his enthusiasm in testing these lecture notes at his school and for providing the authors with valuable feedback.

Notes to readers

In these notes, all the assumptions, claims, conjectures, corollaries, definitions, examples, exercises, lemmas, observations, properties, and theorems are numbered under the same counter for ease of searching. For example, the lemma that immediately follows Theorem 2.1 is numbered Lemma 2.2 instead of Lemma 2.1.

You may obtain the latest version of the lecture notes from http://shannon.cm.nctu.edu.tw. Interested readers are welcome to return comments to qponing@mail.nctu.edu.tw.

Acknowledgements

Thanks are given to our families for their full support during the period of writing these lecture notes.

Table of Contents

List of Tables
List of Figures

1 Introduction
  1.1 Notations

2 Generalized Information Measures for Arbitrary System Statistics
  2.1 Spectrum and Quantile
  2.2 Properties of quantile
  2.3 Generalized information measures
  2.4 Properties of generalized information measures
  2.5 Examples for the computation of general information measures
  2.6 Rényi's information measures

3 General Lossless Data Compression Theorems
  3.1 Fixed-length data compression codes for arbitrary sources
  3.2 Generalized AEP theorem
  3.3 Variable-length lossless data compression codes that minimize the exponentially weighted codeword length
    3.3.1 Criterion for optimality of codes
    3.3.2 Source coding theorem for Rényi's entropy
  3.4 Entropy of English
    3.4.1 Markov estimate of entropy rate of English text
    3.4.2 Gambling estimate of entropy rate of English text
      A) Sequential gambling
  3.5 Lempel-Ziv code revisited
4 Measure of Randomness for Stochastic Processes
  4.1 Motivation for resolvability: measure of randomness of random variables
  4.2 Notations and definitions regarding resolvability
  4.3 Operational meanings of resolvability and mean-resolvability
  4.4 Resolvability and source coding

5 Channel Coding Theorems and Approximations of Output Statistics for Arbitrary Channels
  5.1 General models for channels
  5.2 Variations of capacity formulas for arbitrary channels
    5.2.1 Preliminaries
    5.2.2 ε-capacity
    5.2.3 General Shannon capacity
    5.2.4 Strong capacity
    5.2.5 Examples
  5.3 Structures of good data transmission codes
  5.4 Approximations of output statistics: resolvability for channels
    5.4.1 Motivations
    5.4.2 Notations and definitions of resolvability for channels
    5.4.3 Results on resolvability and mean-resolvability for channels

6 Optimistic Shannon Coding Theorems for Arbitrary Single-User Systems
  6.1 Motivations
  6.2 Optimistic source coding theorems
  6.3 Optimistic channel coding theorems
  6.4 Examples
    6.4.1 Information stable channels
    6.4.2 Information unstable channels

7 Lossy Data Compression
  7.1 General lossy source compression for block codes

8 Hypothesis Testing
  8.1 Error exponent and divergence
    8.1.1 Composition of sequence of i.i.d. observations
    8.1.2 Divergence typical set on composition
    8.1.3 Universal source coding on compositions
    8.1.4 Likelihood ratio versus divergence
    8.1.5 Exponent of Bayesian cost
  8.2 Large deviations theory
    8.2.1 Tilted or twisted distribution
    8.2.2 Conventional twisted distribution
    8.2.3 Cramér's theorem
    8.2.4 Exponent and moment generating function: an example
  8.3 Theories on large deviations
    8.3.1 Extension of Gärtner-Ellis upper bounds
    8.3.2 Extension of Gärtner-Ellis lower bounds
    8.3.3 Properties of (twisted) sup- and inf-large-deviation rate functions
  8.4 Probabilistic subexponential behavior
    8.4.1 Berry-Esseen theorem for compound i.i.d. sequence
    8.4.2 Berry-Esseen theorem with a sample-size-dependent multiplicative coefficient for i.i.d. sequence
    8.4.3 Probability bounds using Berry-Esseen inequality
  8.5 Generalized Neyman-Pearson hypothesis testing

9 Channel Reliability Function
  9.1 Random-coding exponent
    9.1.1 The properties of the random-coding exponent
  9.2 Expurgated exponent
    9.2.1 The properties of the expurgated exponent
  9.3 Partitioning bound: an upper bound for the channel reliability
  9.4 Sphere-packing exponent: an upper bound on the channel reliability
    9.4.1 Problem of sphere-packing
    9.4.2 Relation of sphere-packing and coding
    9.4.3 The largest minimum distance of block codes
      A) Distance-spectrum formula on the largest minimum distance of block codes
      B) Determination of the largest minimum distance for a class of distance functions
      C) General properties of the distance-spectrum function
      D) General Varshamov-Gilbert lower bound
    9.4.4 Elias bound: a single-letter upper-bound formula on the largest minimum distance for block codes
    9.4.5 Gilbert bound and Elias bound for Hamming distance and binary alphabet
    9.4.6 Bhattacharyya distance and expurgated exponent
  9.5 Straight-line bound

10 Information Theory of Networks
  10.1 Lossless data compression over distributed sources for block codes
    10.1.1 Full decoding of the original sources
    10.1.2 Partial decoding of the original sources
  10.2 Distributed detection
    10.2.1 Neyman-Pearson testing in parallel distributed detection
    10.2.2 Bayes testing in parallel distributed detection systems
  10.3 Capacity region of multiple access channels
  10.4 Degraded broadcast channel
  10.5 Gaussian multiple terminal channels

List of Tables

2.1 Generalized entropy measures, where $\delta \in [0, 1]$.
2.2 Generalized mutual information measures, where $\delta \in [0, 1]$.
2.3 Generalized divergence measures, where $\delta \in [0, 1]$.

List of Figures

2.1 The asymptotic CDFs of a sequence of random variables $\{A_n\}_{n=1}^{\infty}$: $\bar{u}(\theta)$ = sup-spectrum of $A_n$; $\underline{u}(\theta)$ = inf-spectrum of $A_n$; $\bar{U} = \lim_{\delta \uparrow 1} \underline{U}_\delta$.
2.2 The spectrum $h(\theta)$ for Example 2.7.
2.3 The spectrum $i(\theta)$ for Example 2.7.
2.4 The limiting spectrums of $(1/n)\, h_{Z^n}(Z^n)$ for Example 2.8.
2.5 The possible limiting spectrums of $(1/n)\, i_{X^n;Y^n}(X^n; Y^n)$ for Example 2.8.
3.1 Behavior of the probability of block decoding error as block length $n$ goes to infinity for an arbitrary source $\mathbf{X}$.
3.2 Illustration of the generalized AEP theorem; $F_n(\delta; \varepsilon)$ is the dashed region.
3.3 Notations used in the Lempel-Ziv coder.
4.1 Source generator: $\{X_t\}_{t \in I}$ ($I = (0, 1)$) is an independent random process with $P_{X_t}(0) = 1 - P_{X_t}(1) = t$, and is also independent of the selector $Z$, where $X_t$ is output if $Z = t$. The source generator at each time instant is independent temporally.
4.2 The ultimate CDF of $-(1/n) \log P_{X^n}(X^n)$: $\Pr\{h_b(Z) \le t\}$.
5.1 The ultimate CDFs of $-(1/n) \log P_{N^n}(N^n)$.
5.2 The ultimate CDF of the normalized information density for Example 5.14, Case B).
5.3 The communication system.
5.4 The simulated communication system.
7.1 Illustration of $\lambda_{Z,f(Z)}(D + \gamma)$ and $\sup\{\theta : \lambda_{Z,f(Z)}(\theta) \le D + \gamma\}$.
7.2 The CDF of $(1/n)\, \rho_n(Z^n, f_n(Z^n))$ for the probability-of-error distortion measure.
8.1 The geometric meaning of Sanov's theorem.
8.2 The divergence view on hypothesis testing.
8.3 The function $(\pi/6)\, u\, h(u)$.
8.4 The Berry-Esseen constant as a function of the sample size $n$; the sample size $n$ is plotted in log-scale.
9.1 BSC with crossover probability $\varepsilon$ and input distribution $(p, 1-p)$.
9.2 Random-coding exponent for the BSC with crossover probability 0.2; also plotted is the maximizing parameter $s^* = \arg\sup_{0 \le s \le 1}[E_0(s) - sR]$. $R_{cr} = 0.056633$.
9.3 Expurgated exponent (solid line) and random-coding exponent (dashed line) for the BSC with crossover probability 0.2, over the range (0, 0.192745).
9.4 Expurgated exponent (solid line) and random-coding exponent (dashed line) for the BSC with crossover probability 0.2, over the range (0, 0.006).
9.5 Partitioning exponent (thick line), random-coding exponent (thin line) and expurgated exponent (thin line) for the BSC with crossover probability 0.2.
9.6 (a) The shaded area is $\mathcal{U}^c$; (b) the shaded area is $\mathcal{A}^c_{m,m'}$.
9.7 $\Lambda_{\mathbf{X}}(R)$ asymptotically lies between $\operatorname{ess\,inf}\,(1/n)\,\rho_n(X^n, \hat{X}^n)$ and $(1/n)\,E[\rho_n(X^n, \hat{X}^n) \mid \cdot\,]$ for $R_p(\mathbf{X}) \le R \le R_0(\mathbf{X})$.
9.8 General curve of $\Lambda_{\mathbf{X}}(R)$.
9.9 Function of $\sup_{s \ge 0}\{-sR - s \log[2s(1 - e^{-1/(2s)})]\}$.
9.10 Function of $\sup_{s \ge 0}\{-sR - s \log[(2 + e^{1/s})/4]\}$.
10.1 The multiple-access channel.
10.2 Distributed detection with $n$ senders. Each observation $Y_i$ may come from one of two categories. The final decision is $D \in \{H_0, H_1\}$.
10.3 Distributed detection in $S_n$.
10.4 Bayes error probabilities associated with $g$ and $g^*$.
10.5 Upper and lower bounds on $e_{NP}^*(\cdot)$ in Case B.

Chapter 1
Introduction

This volume discusses some advanced topics in information theory. The mathematical background on which these topics are based can be found in Appendices A and B of Volume I.

1.1 Notations

Here, we clarify some of the easily confused notations used in Volume II of the lecture notes.

For a random variable $X$, we use $P_X$ to denote its distribution.
For con- X venience, we will use interchangeably the two expressions for the probability of X =x, i.e., PrfX =xg and P (x). Similarly, for the probability of a set char- X acterizing through an inequality, such as f(x) a, its probability mass will be expressed by either P fx2X : f(x)ag X or Prff(X)ag: In the second expression, we viewf(X) as a new random variable de ned through X and a function f(). Obviously, the above expressions can be applied to any legitimate function f() de ned overX , including any probability functionP () (or logP (x)) of a X X random variable X. Therefore, the next two expressions denote the probability of f(x) =P (x)a evaluated under distribution P : X X P fx2X : f(x)ag =P fx2X : P (x)ag X X X and Prff(X)ag = PrfP (X)ag: X 1As a result, if we write   P (x;y) X;Y P (x;y)2XY : log a X;Y P (x)P (y) X Y   P (X;Y ) X;Y = Pr log a ; P (X)P (Y ) X Y it means that we de ne a new function P (x;y) X;Y f(x;y), log P (x)P (y) X Y in terms of the joint distribution P and its two marginal distributions, and X;Y concern the probability of f(x;y)a where x and y have distribution P . X;Y 2Chapter 2 Generalized Information Measures for Arbitrary System Statistics In Volume I of the lecture notes, we show that the entropy, de ned by X H(X), P (x) logP (x) =E logP (X) nats; X X X X x2X of a discrete random variableX is a measure of the average amount of uncertainty in X. An extension de nition of entropy to a sequence of random variables X ;X ;:::;X ;::: is the entropy rate, which is given by 1 2 n 1 1 n n n lim H(X ) = lim E logP (X ); X n1 n1 n n assuming the limit exists. The above quantities have an operational signi cance established via Shannon's coding theorems when the stochastic systems under consideration satisfy certain regularity conditions, such as stationarity and ergod- icity 3, 5. However, in more complicated situations such as when the systems are non-stationary or with time-varying nature, these information rates are no longer valid and lose their operational signi cance. This results in the need to establish new entropy measures which appropriately characterize the operational limits of arbitrary stochastic systems. Let us begin with the model of arbitrary system statistics. In general, there are two indices for observations: time index and space index. When a sequence of observations is denoted by X ;X ;:::;X ;:::, the subscript i of X can be 1 2 n i treated as either a time index or a space index, but not both. Hence, when a sequence of observations are functions of both time and space, the notation of X ;X ;:::;X ;:::, is by no means sucient; and therefore, a new model for a 1 2 n time-varying multiple-sensor system, such as (n) (n) (n) X ;X ;:::;X ;:::; 1 2 t 3where t is the time index and n is the space or position index (or vice versa), becomes signi cant. When block-wise compression of such source (with block lengthn) is consid- ered, same question as to the compression of i.i.d. source arises: what is the minimum compression rate (bits per source sample) for which the error can be made arbitrarily small (2.0.1) as the block length goes to in nity? To answer the question, information theorists have to nd a sequence of data compression codes for each block length n and investigate if the decompression error goes to zero as n approaches in nity. 
However, unlike the simple source models considered in Volume I, an arbitrary source may exhibit distinct statistics at the respective samples for each block length $n$, i.e.,

$$\begin{array}{ll}
n = 1: & X_1^{(1)} \\
n = 2: & X_1^{(2)}, X_2^{(2)} \\
n = 3: & X_1^{(3)}, X_2^{(3)}, X_3^{(3)} \\
n = 4: & X_1^{(4)}, X_2^{(4)}, X_3^{(4)}, X_4^{(4)} \\
\vdots &
\end{array} \qquad (2.0.2)$$

and the statistics of $X_1^{(4)}$ could be different from those of $X_1^{(1)}$, $X_1^{(2)}$ and $X_1^{(3)}$. Since this is the most general model for the question in (2.0.1), and the system statistics can be arbitrarily defined, it is named the arbitrary statistics system.

In notation, the triangular array of random variables in (2.0.2) is often denoted by a boldface letter as

$$\mathbf{X} \triangleq \{X^n\}_{n=1}^{\infty},$$

where $X^n \triangleq \big(X_1^{(n)}, X_2^{(n)}, \ldots, X_n^{(n)}\big)$; for convenience, this statement is sometimes abbreviated as

$$\mathbf{X} \triangleq \Big\{X^n = \big(X_1^{(n)}, X_2^{(n)}, \ldots, X_n^{(n)}\big)\Big\}_{n=1}^{\infty}.$$

In this chapter, we will first introduce a new concept for defining information measures for arbitrary system statistics, and then analyze their algebraic properties in detail. In the next chapter, we will utilize the new measures to establish general source coding theorems for arbitrary finite-alphabet sources.

2.1 Spectrum and Quantile

Definition 2.1 (inf/sup-spectrum) If $\{A_n\}_{n=1}^{\infty}$ is a sequence of random variables, then its inf-spectrum $\underline{u}(\theta)$ and its sup-spectrum $\bar{u}(\theta)$ are defined by

$$\underline{u}(\theta) \triangleq \liminf_{n \to \infty} \Pr\{A_n \le \theta\}$$

and

$$\bar{u}(\theta) \triangleq \limsup_{n \to \infty} \Pr\{A_n \le \theta\}.$$

In other words, $\underline{u}(\theta)$ and $\bar{u}(\theta)$ are respectively the liminf and the limsup of the cumulative distribution function (CDF) of $A_n$. Note that by definition, the CDF of $A_n$, namely $\Pr\{A_n \le \theta\}$, is non-decreasing and right-continuous. However, for $\underline{u}(\theta)$ and $\bar{u}(\theta)$, only the non-decreasing property remains. (See Footnote 1.)

Footnote 1: It is pertinent to also point out that even if we do not require right-continuity as a fundamental property of a CDF, the spectrums $\underline{u}(\cdot)$ and $\bar{u}(\cdot)$ are not necessarily legitimate CDFs of (conventional real-valued) random variables, since there might exist cases where the "probability mass escapes to infinity" (cf. [1, pp. 346]). A necessary and sufficient condition for $\underline{u}(\cdot)$ and $\bar{u}(\cdot)$ to be conventional CDFs (without requiring right-continuity) is that the sequence of distribution functions of $A_n$ be tight [1, pp. 346]. Tightness is actually guaranteed if the alphabet of $A_n$ is finite.

Definition 2.2 (quantile of inf/sup-spectrum) For any $0 \le \delta \le 1$, the quantiles $\bar{U}_\delta$ and $\underline{U}_\delta$ of the sup-spectrum and the inf-spectrum are defined by

$$\bar{U}_\delta \triangleq \sup\{\theta : \bar{u}(\theta) \le \delta\}$$

and

$$\underline{U}_\delta \triangleq \sup\{\theta : \underline{u}(\theta) \le \delta\},$$

respectively. (See Footnotes 2 and 3.) It follows from the above definitions that $\bar{U}_\delta$ and $\underline{U}_\delta$ are right-continuous and non-decreasing in $\delta$. Note that the supremum of an empty set is defined to be $-\infty$.

Footnote 2: Generally speaking, one can define the "quantile" in four different ways:

$$\bar{U}_\delta \triangleq \sup\Big\{\theta : \limsup_{n \to \infty} \Pr[A_n \le \theta] \le \delta\Big\}, \qquad \bar{U}_\delta^+ \triangleq \sup\Big\{\theta : \limsup_{n \to \infty} \Pr[A_n < \theta] \le \delta\Big\},$$

$$\underline{U}_\delta \triangleq \sup\Big\{\theta : \liminf_{n \to \infty} \Pr[A_n \le \theta] \le \delta\Big\}, \qquad \underline{U}_\delta^+ \triangleq \sup\Big\{\theta : \liminf_{n \to \infty} \Pr[A_n < \theta] \le \delta\Big\}.$$

The general relations between these four quantities are as follows:

$$\bar{U}_\delta = \bar{U}_\delta^+ \le \underline{U}_\delta = \underline{U}_\delta^+.$$

Obviously, $\bar{U}_\delta \le \bar{U}_\delta^+$, $\underline{U}_\delta \le \underline{U}_\delta^+$ and $\bar{U}_\delta^+ \le \underline{U}_\delta^+$ by their definitions. It remains to show that $\underline{U}_\delta^+ \le \underline{U}_\delta$ and that $\bar{U}_\delta^+ \le \bar{U}_\delta$.

Suppose $\underline{U}_\delta^+ > \underline{U}_\delta + \gamma$ for some $\gamma > 0$. Then by the definition of $\underline{U}_\delta^+$, $\liminf_{n \to \infty} \Pr[A_n < \underline{U}_\delta + \gamma] \le \delta$, which implies $\liminf_{n \to \infty} \Pr[A_n \le \underline{U}_\delta + \gamma/2] \le \liminf_{n \to \infty} \Pr[A_n < \underline{U}_\delta + \gamma] \le \delta$ and violates the definition of $\underline{U}_\delta$ (since it shows that $\underline{U}_\delta + \gamma/2$ belongs to the set whose supremum is $\underline{U}_\delta$). This completes the proof of $\underline{U}_\delta^+ \le \underline{U}_\delta$.
To prove that $\bar{U}_\delta^+ \le \bar{U}_\delta$, note that from the definition of $\bar{U}_\delta^+$ we have, for any $\varepsilon > 0$, $\limsup_{n \to \infty} \Pr[A_n < \bar{U}_\delta^+ - \varepsilon] \le \delta$, and hence $\limsup_{n \to \infty} \Pr[A_n \le \bar{U}_\delta^+ - 2\varepsilon] \le \delta$, implying that $\bar{U}_\delta^+ - 2\varepsilon \le \bar{U}_\delta$. The latter yields $\bar{U}_\delta^+ \le \inf_{\varepsilon > 0} (\bar{U}_\delta + 2\varepsilon) = \bar{U}_\delta$, which completes the proof.

It is also worth noting that $\sup\{\theta : \underline{u}(\theta) < \delta\} = \lim_{\beta \uparrow \delta} \underline{U}_\beta$. Writing $S \triangleq \sup\{\theta : \underline{u}(\theta) < \delta\}$, the equality can be proved by first observing that $S \ge \lim_{\beta \uparrow \delta} \underline{U}_\beta$ by their definitions, and then assuming that $\Delta \triangleq S - \lim_{\beta \uparrow \delta} \underline{U}_\beta > 0$. Then $\underline{U}_\beta \le S - \Delta$ implies that $\underline{u}(S - \Delta/2) > \beta$ for $\beta$ arbitrarily close to $\delta$ from below, which in turn implies $\underline{u}(S - \Delta/2) \ge \delta$, contradicting the definition of $S$ (every $\theta < S$ satisfies $\underline{u}(\theta) < \delta$). In the lecture notes, we will use $\sup\{\theta : \underline{u}(\theta) < \delta\}$ and $\lim_{\beta \uparrow \delta} \underline{U}_\beta$ interchangeably for convenience. The final note is that $\bar{U}_\delta^+$ and $\underline{U}_\delta^+$ will not be used in defining our general information measures; they are introduced only for mathematical interest.

Footnote 3: Note that the usual definition of the quantile function $\Phi(\delta)$ of a non-decreasing function $F(\cdot)$ is slightly different from our definition [1, pp. 190], where $\Phi(\delta) \triangleq \sup\{\theta : F(\theta) < \delta\}$. Remark that if $F(\cdot)$ is strictly increasing, then the quantile is nothing but the inverse of $F(\cdot)$: $\Phi(\delta) = F^{-1}(\delta)$.

Based on the above definitions, the liminf in probability $\underline{U}$ of $\{A_n\}_{n=1}^{\infty}$ [4], which is defined as the largest extended real number such that for all $\delta > 0$,

$$\lim_{n \to \infty} \Pr\big[A_n \le \underline{U} - \delta\big] = 0,$$

satisfies

$$\underline{U} = \lim_{\delta \downarrow 0} \bar{U}_\delta = \bar{U}_0.$$

(See Footnote 4.) Also, the limsup in probability $\bar{U}$ (cf. [4]), defined as the smallest extended real number such that for all $\delta > 0$,

$$\lim_{n \to \infty} \Pr\big[A_n \ge \bar{U} + \delta\big] = 0,$$

is exactly

$$\bar{U} = \lim_{\delta \uparrow 1} \underline{U}_\delta = \sup\{\theta : \underline{u}(\theta) < 1\}.$$

(See Footnote 5.) Straightforwardly by their definitions,

$$\underline{U} \le \bar{U}_\delta \le \underline{U}_\delta \le \bar{U}$$

for $\delta \in [0, 1)$.

Footnote 4: It is obvious from their definitions that $\lim_{\delta \downarrow 0} \bar{U}_\delta \ge \bar{U}_0 \ge \underline{U}$. The equality of $\lim_{\delta \downarrow 0} \bar{U}_\delta$ and $\underline{U}$ can be proved by contradiction by first assuming

$$\Delta \triangleq \lim_{\delta \downarrow 0} \bar{U}_\delta - \underline{U} > 0.$$

Then $\bar{u}(\underline{U} + \Delta/2) \le \delta$ for arbitrarily small $\delta > 0$, which immediately implies $\bar{u}(\underline{U} + \Delta/2) = 0$, contradicting the definition of $\underline{U}$ as the largest number with the stated property.

Footnote 5: Since $1 = \lim_{n \to \infty} \Pr\{A_n \le \bar{U} + \delta\} = \liminf_{n \to \infty} \Pr\{A_n \le \bar{U} + \delta\} = \underline{u}(\bar{U} + \delta)$ for every $\delta > 0$, it is straightforward that

$$\bar{U} \ge \sup\{\theta : \underline{u}(\theta) < 1\} = \lim_{\delta \uparrow 1} \underline{U}_\delta.$$

The equality of $\bar{U}$ and $\lim_{\delta \uparrow 1} \underline{U}_\delta$ can be proved by contradiction by first assuming that

$$\Delta \triangleq \bar{U} - \lim_{\delta \uparrow 1} \underline{U}_\delta > 0.$$

Then $\underline{u}(\bar{U} - \Delta/2) \ge \delta$ for $\delta$ arbitrarily close to 1, which implies $\underline{u}(\bar{U} - \Delta/2) = 1$. Accordingly, by

$$1 \ge \liminf_{n \to \infty} \Pr\{A_n \le \bar{U} - \Delta/4\} \ge \liminf_{n \to \infty} \Pr\{A_n \le \bar{U} - \Delta/2\} = \underline{u}(\bar{U} - \Delta/2) = 1,$$

we obtain the desired contradiction, since then $\bar{U} - \Delta/4$ would also satisfy the defining property of $\bar{U}$, contradicting its minimality.

Remark that $\bar{U}_\delta$ and $\underline{U}_\delta$ always exist (as extended real numbers). Furthermore, if $\bar{U}_\delta = \underline{U}_\delta$ for all $\delta$ in $[0, 1]$, the sequence of random variables $A_n$ converges in distribution to a random variable $A$, provided the sequence of $A_n$ is tight.

For a better understanding of the quantities defined above, we depict them in Figure 2.1.
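To see how the inf/sup-spectrum and its quantiles behave for a source whose statistics depend on the block length, the following sketch is an illustration added here (not part of the original notes): it estimates the CDF of the normalized self-information $A_n = -(1/n)\log_2 P_{X^n}(X^n)$ for an assumed source that is i.i.d. Bernoulli(0.1) when $n$ is even and Bernoulli(0.4) when $n$ is odd. The empirical CDFs oscillate between two limiting curves, so $\underline{u}(\theta)$ and $\bar{u}(\theta)$ differ; the quantiles are read off as the largest $\theta$ at which the corresponding spectrum stays at or below $\delta$. The source model, the finite window of block lengths standing in for the liminf/limsup, and all function names are assumptions made for illustration.

import numpy as np

def spectrum_cdf(p: float, n: int, thetas: np.ndarray, trials: int = 20_000,
                 seed: int = 0) -> np.ndarray:
    """Empirical CDF of A_n = -(1/n) log2 P_{X^n}(X^n) for an i.i.d. Bernoulli(p) block."""
    rng = np.random.default_rng(seed)
    k = rng.binomial(n, p, size=trials)
    a_n = -(k * np.log2(p) + (n - k) * np.log2(1 - p)) / n
    return np.array([np.mean(a_n <= t) for t in thetas])

def quantile(spec: np.ndarray, thetas: np.ndarray, delta: float) -> float:
    """sup{theta : spectrum(theta) <= delta}, evaluated on the given grid."""
    ok = thetas[spec <= delta]
    return float(ok.max()) if ok.size else float("-inf")

def h2(p: float) -> float:
    """Binary entropy in bits."""
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

if __name__ == "__main__":
    thetas = np.linspace(0.2, 1.2, 201)
    # Block statistics depend on n: Bernoulli(0.1) for even n, Bernoulli(0.4) for odd n.
    cdfs = [spectrum_cdf(0.1 if n % 2 == 0 else 0.4, n, thetas) for n in range(400, 420)]
    u_inf = np.min(cdfs, axis=0)   # finite-n stand-in for the inf-spectrum
    u_sup = np.max(cdfs, axis=0)   # finite-n stand-in for the sup-spectrum
    for delta in (0.05, 0.5, 0.95):
        print(f"delta = {delta:.2f}   U_bar ~ {quantile(u_sup, thetas, delta):.3f}"
              f"   U_underbar ~ {quantile(u_inf, thetas, delta):.3f}")
    print(f"h2(0.1) = {h2(0.1):.3f},  h2(0.4) = {h2(0.4):.3f}")  # locations of the two limiting steps

With these parameters the two estimated quantiles settle near the two binary entropy values (about 0.47 and 0.97 bits), consistent with the ordering $\bar{U}_\delta \le \underline{U}_\delta$ noted above; the gap between the two spectra is exactly the feature that conventional entropy-rate arguments miss for such non-stationary sources.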
