Probability & Stochastic Processes for Communications: A Gentle Introduction
Shivkumar Kalyanaraman, Rensselaer Polytechnic Institute ("shiv rpi")

Outline
• Please see my experimental networking class for a longer video/audio primer on probability (not stochastic processes): http://www.ecse.rpi.edu/Homepages/shivkuma/teaching/fall2006/index.html
• Focus on Gaussian, Rayleigh/Ricean/Nakagami, Exponential, Chi-Squared distributions: Q-function, erfc(), complex Gaussian r.v.s, random vectors (covariance matrix, Gaussian vectors) ... which we will encounter in wireless communications.
• Some key bounds are also covered: union bound, Jensen's inequality, etc.
• Elementary ideas in stochastic processes: i.i.d., autocorrelation function, power spectral density (PSD), stationarity, weak-sense stationarity (w.s.s.), ergodicity, Gaussian processes, AWGN ("white"), random processes operated on by linear systems.

Elementary Probability Concepts (self-study)

Probability
• Think of probability as modeling an experiment, e.g. tossing a coin.
• The set of all possible outcomes is the sample space S.
• Classic "experiment": tossing a die, S = {1, 2, 3, 4, 5, 6}.
• Any subset A of S is an event: A = {the outcome is even} = {2, 4, 6}.

Probability of Events: Axioms
• P is the probability mass function if it maps each event A into a real number P(A), and:
  i) $P(A) \ge 0$ for every event $A \subseteq S$
  ii) $P(S) = 1$
  iii) if A and B are mutually exclusive events, then $P(A \cup B) = P(A) + P(B)$

Probability of Events
• In fact, for any sequence of pairwise mutually exclusive events $A_1, A_2, A_3, \ldots$ (i.e. $A_i \cap A_j = \emptyset$ for any $i \ne j$) with $\bigcup_{i=1}^{\infty} A_i = S$, we have
  $P\!\left(\bigcup_{n=1}^{\infty} A_n\right) = \sum_{n=1}^{\infty} P(A_n)$

Detour: Approximations/Bounds/Inequalities
• Why? A large part of information theory consists in finding bounds on certain performance measures.

Approximations/Bounds: Union Bound
• $P(A \cup B) \le P(A) + P(B)$, and in general $P(A_1 \cup A_2 \cup \ldots \cup A_N) \le \sum_{i=1}^{N} P(A_i)$.
• Applications: getting bounds on BER (bit-error rates); in general, bounding the tails of probability distributions.
• We will use this in the analysis of error probabilities with various coding schemes (see Chap. 3, Tse/Viswanath).

Approximations/Bounds: log(1+x)
• $\ln(1+x) \approx x$ for small x, so $\log_2(1+x) \approx x \log_2 e$ grows linearly in x.
• Application: Shannon capacity with AWGN noise: bits-per-Hz = $C/B = \log_2(1+\mathrm{SNR})$.
• If we can increase the SNR linearly when the SNR is small (i.e. very poor, e.g. at the cell edge), we get a linear increase in capacity.
• When the SNR is large, a further increase in SNR gives only a diminishing return in capacity: $\log_2(1+\mathrm{SNR})$.

Approximations/Bounds: Jensen's Inequality
• For a convex function f (second derivative ≥ 0): $f(E[X]) \le E[f(X)]$.

Schwartz Inequality & Matched Filter
• Inner product ≤ product of norms: $|\langle a, x\rangle| \le \|a\|\,\|x\|$, i.e. the projection length is at most the product of the individual lengths. This is the Schwartz inequality.
• Equality happens when a and x are in the same direction (i.e. $\cos\theta = 1$, when $\theta = 0$).
• Application: the "matched" filter. Received vector y = x + w (zero-mean AWGN). Note: w is infinite dimensional.
• Project y onto the subspace formed by the finite set of transmitted symbols x, giving y'. y' is said to be a "sufficient statistic" for detection, i.e. we reject the noise dimensions outside the signal space.
• This operation is called "matching" to the signal space (projecting).
• Now pick the x which is closest to y' in distance (ML detection = nearest neighbor).
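To make the Schwartz-inequality/matched-filter idea concrete, here is a minimal NumPy sketch (not from the slides; the signal vector, noise level, and dimensions are arbitrary illustrative choices): the inner product never exceeds the product of norms, and projecting y = s + w onto the signal direction keeps the full signal component while discarding the noise energy outside the signal space.

```python
# A minimal sketch of the Schwartz inequality and the matched-filter
# projection idea. The signal s, noise scale, and dimensions are arbitrary.
import numpy as np

rng = np.random.default_rng(0)

# Schwartz inequality: |<a, x>| <= ||a|| * ||x||
a = rng.normal(size=5)
x = rng.normal(size=5)
assert abs(a @ x) <= np.linalg.norm(a) * np.linalg.norm(x) + 1e-12

# Matched filtering: project y = s + w onto the (unit-norm) signal direction.
s = np.array([1.0, 1.0, 1.0, 1.0]) / 2.0   # unit-energy "symbol" direction
w = rng.normal(scale=0.5, size=4)          # zero-mean AWGN sample
y = s + w
y_proj = (y @ s) * s                       # projection onto the signal subspace
# The scalar y @ s is the matched-filter output ("sufficient statistic"):
# it keeps the full signal component (s @ s = 1) plus only the in-direction noise.
print("matched filter output:", y @ s)
print("noise energy rejected (outside signal space):",
      np.linalg.norm(y - y_proj) ** 2)
```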
cos = 1, when  = 0)  Application: “matched” filter  Received vector y = x + w (zeromean AWGN)  Note: w is infinite dimensional  Project y to the subspace formed by the finite set of transmitted symbols x: y’  y’ is said to be a “sufficient statistic” for detection, i.e. reject the noise dimensions outside the signal space.  This operation is called “matching” to the signal space (projecting)  Now, pick the x which is closest to y’ in distance (ML detection = nearest neighbor) Shivkumar Kalyanaraman Rensselaer Polytechnic Institute : “shiv rpi” 11Back to Probability… Shivkumar Kalyanaraman Rensselaer Polytechnic Institute : “shiv rpi” 12Conditional Probability P(A B) • = (conditional) probability that the outcome is in A given that we know the outcome in B P() AB P(A B) P(B) 0 PB () •Example: Toss one die. Pi ( 3 i is odd)= •Note that: P(AB) P(B)P(A B) P(A)P(B A) What is the value of knowledge that B occurred How does it reduce uncertainty about A Shivkumar Kalyanaraman How does it change P(A) Rensselaer Polytechnic Institute : “shiv rpi” 13Independence  Events A and B are independent if P(AB) = P(A)P(B).  Also: and P(B A) P(B) P(A B) P(A)  Example: A card is selected at random from an ordinary deck of cards.  A=event that the card is an ace.  B=event that the card is a diamond. P() AB PA () PB () P(A)P(B) Shivkumar Kalyanaraman Rensselaer Polytechnic Institute : “shiv rpi” 14Random Variable as a Measurement  Thus a random variable can be thought of as a measurement (yielding a real number) on an experiment  Maps “events” to “real numbers”  We can then talk about the pdf, define the mean/variance and other moments Shivkumar Kalyanaraman Rensselaer Polytechnic Institute : “shiv rpi” 15Histogram: Plotting Frequencies Class Freq. 15 but 25 3 Count 25 but 35 5 5 35 but 45 2 4 Frequency 3 Relative Bars 2 Frequency Touch 1 Percent 0 0 15 25 35 45 55 Lower Boundary Shivkumar Kalyanaraman Rensselaer Polytechnic Institute : “shiv rpi” 16Probability Distribution Function (pdf): continuous version of histogram a.k.a. frequency histogram, p.m.f (for discrete r.v.) Shivkumar Kalyanaraman Rensselaer Polytechnic Institute : “shiv rpi” 17Continuous Probability Density Function  1. Mathematical Formula Frequency  2. Shows All Values, x, (Value, Frequency) Frequencies, f(x)  f(X) Is Not Probability f(x)  3. Properties  f (x)dx 1 x a b All X (Area Under Curve) Value f (x ) 0, a x  b Shivkumar Kalyanaraman Rensselaer Polytechnic Institute : “shiv rpi” 18Cumulative Distribution Function  The cumulative distribution function (CDF) for a random variable X is F (x) P(X x) P(sS X(s) x) X  Note that is nondecreasing in x, i.e. 
Fx() X x x F (x ) F (x ) 1 2 xx 1 2  Also and limFx ( )1 limFx ( ) 0 x x x  x Shivkumar Kalyanaraman Rensselaer Polytechnic Institute : “shiv rpi” 19Probability density functions (pdf) 1.5 Lognormal(0,1) Gamma(.53,3) Exponential(1.6) Weibull(.7,.9) Pareto(1,1.5) 1 0.5 0 0 0.5 1 1.5 2 2.5 3 3.5 4 4.5 5 x Emphasizes main body of distribution, frequencies, Shivkumar Kalyanaraman Rensselaer Polytechnic Institute various modes (peaks), variability, skews : “shiv rpi” 20 f(x)Cumulative Distribution Function (CDF) 1 0.9 Lognormal(0,1) Gamma(.53,3) 0.8 Exponential(1.6) Weibull(.7,.9) Pareto(1,1.5) 0.7 0.6 0.5 0.4 0.3 0.2 median 0.1 0 0 2 4 6 8 10 12 14 16 18 20 x Emphasizes skews, easy identification of median/quartiles, Shivkumar Kalyanaraman Rensselaer Polytechnic Institute converting uniform rvs to other distribution rvs : “shiv rpi” 21 F(x)Complementary CDFs (CCDF) 0 10 1 10 2 10 3 10 Lognormal(0,1) Gamma(.53,3) Exponential(1.6) Weibull(.7,.9) 4 ParetoII(1,1.5) 10 ParetoI(0.1,1.5) 1 0 1 2 10 10 10 10 log(x) Useful for focussing on “tails” of distributions: Shivkumar Kalyanaraman Rensselaer Polytechnic Institute Line in a loglog plot = “heavy” tail : “shiv rpi” 22 log(1F(x))Numerical Data Properties Central Tendency (Location) Variation (Dispersion) Shape Shivkumar Kalyanaraman Rensselaer Polytechnic Institute : “shiv rpi” 23Numerical Data Properties Measures Numerical Data Properties Central Variation Shape Tendency Mean Range Skew Interquartile Range Median Mode Variance Standard Deviation Shivkumar Kalyanaraman Rensselaer Polytechnic Institute : “shiv rpi” 24Expectation of a Random Variable: EX  The expectation (average) of a (discretevalued) random variable X is  X E(X ) xP(X x) xP (x) X x Shivkumar Kalyanaraman Rensselaer Polytechnic Institute : “shiv rpi” 25Continuousvalued Random Variables  Thus, for a continuous random variable X, we can define its probability density function (pdf) dF () x ' X f (x) F (x) X x dx  Note that since is nondecreasing in x we Fx () X have fx ( ) 0 for all x. X Shivkumar Kalyanaraman Rensselaer Polytechnic Institute : “shiv rpi” 26Expectation of a Continuous Random Variable  The expectation (average) of a continuous random variable X is given by  E(X ) xf (x)dx X    Note that this is just the continuous equivalent of the discrete expectation  E(X ) xP (x) X x Shivkumar Kalyanaraman Rensselaer Polytechnic Institute : “shiv rpi” 27Other Measures: Median, Mode 1  Median = F (0.5), where F = CDF  Aka 50 percentile element  I.e. Order the values and pick the middle element  Used when distribution is skewed  Considered a “robust” measure  Mode: Most frequent or highest probability value  Multiple modes are possible  Need not be the “central” element  Mode may not exist (eg: uniform distribution)  Used with categorical variables Shivkumar Kalyanaraman Rensselaer Polytechnic Institute : “shiv rpi” 28Shivkumar Kalyanaraman Rensselaer Polytechnic Institute : “shiv rpi” 29Indices/Measures of Spread/Dispersion: Why Care You can drown in a river of average depth 6 inches Lesson: The measure of uncertainty or dispersion may matter more than the index of central tendency Shivkumar Kalyanaraman Rensselaer Polytechnic Institute : “shiv rpi” 30Standard Deviation, Coeff. 
Standard Deviation, Coefficient of Variation, SIQR
• Variance: the second moment around the mean: $\sigma^2 = E[(X-\mu)^2]$.
• Standard deviation $= \sigma$.
• Coefficient of variation (C.o.V.) $= \sigma/\mu$.
• SIQR = semi-interquartile range (used with the median, i.e. the 50th percentile): (75th percentile − 25th percentile)/2.

Covariance and Correlation: Measures of Dependence
• Covariance: $\mathrm{Cov}(X,Y) = E[(X-\mu_X)(Y-\mu_Y)] = E[XY] - \mu_X\mu_Y$.
• For i = j, the covariance $\mathrm{Cov}(X_i, X_i)$ equals the variance.
• Independence ⇒ covariance = 0 (not vice versa).
• The correlation (coefficient) is a normalized (scale-less) form of covariance: between −1 and +1, with zero meaning no correlation (uncorrelated). Note: uncorrelated does NOT mean independent.

Random Vectors & Sums of R.V.s
• Random vector X = [X_1, ..., X_n], where each X_i is an r.v.
• Covariance matrix: K is an n×n matrix with $K_{ij} = \mathrm{Cov}[X_i, X_j]$ and $K_{ii} = \mathrm{Cov}[X_i, X_i] = \mathrm{Var}[X_i]$.
• Sum of independent r.v.s: Z = X + Y; the pdf of Z is the convolution of the pdfs of X and Y. Can use transforms.

Characteristic Functions & Transforms
• The characteristic function is a special kind of expectation, $\Phi_X(\omega) = E[e^{j\omega X}]$: it captures all the moments, and is related to the inverse Fourier transform of the pdf.

Important (Discrete) Random Variable: Bernoulli
• The simplest possible measurement on an experiment: success (X = 1) or failure (X = 0).
• Usual notation: $P_X(1) = P(X=1) = p$, $P_X(0) = P(X=0) = 1-p$.
• E(X) = p.

Binomial Distribution
• [Figure: binomial pmfs for n = 5 with p = 0.1 (skewed) and p = 0.5 (symmetric).]
• Mean: $\mu = E(X) = np$. Standard deviation: $\sigma = \sqrt{np(1-p)}$.

Binomial Can Be Skewed or Normal
• Depends upon p and n.

Binomials for Different p, N = 20
• [Figure: distribution of blocks experiencing k losses out of N = 20 for packet error rates of 10% (Npq = 1.8), 30% (Npq = 4.2), and 50% (Npq = 5).]
• As Npq ≫ 1, the binomial is better approximated by a normal distribution, especially near the mean: symmetric, sharp peak at the mean, exponential-square ($e^{-x^2}$) decay of the tails (pmf concentrated near the mean).

Important Random Variable: Poisson
• A Poisson random variable X is defined by its PMF (the limit of the binomial):
  $P(X = x) = \dfrac{\lambda^x}{x!}\,e^{-\lambda}, \quad x = 0, 1, 2, \ldots$
  where $\lambda > 0$ is a constant.
• Exercise: show that $\sum_{x=0}^{\infty} P_X(x) = 1$ and $E(X) = \lambda$.
• Poisson random variables are good for counting frequency of occurrence: like the number of customers that arrive at a bank in one hour, or the number of packets that arrive at a router in one second.
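A quick numerical sketch of the "limit of the binomial" statement above: Binomial(n, p = λ/n) approaches Poisson(λ) as n grows. The value λ = 3 and the choices of n are arbitrary illustrative assumptions.

```python
# Poisson as the limit of the binomial: Binomial(n, lambda/n) -> Poisson(lambda).
import numpy as np
from scipy import stats

lam = 3.0
k = np.arange(0, 15)

for n in (10, 100, 1000):
    binom_pmf = stats.binom.pmf(k, n, lam / n)
    poisson_pmf = stats.poisson.pmf(k, lam)
    print(f"n = {n:4d}, max |Binomial - Poisson| = "
          f"{np.max(np.abs(binom_pmf - poisson_pmf)):.4f}")
```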
Important Continuous Random Variable: Exponential
• Used to represent time, e.g. the time until the next arrival.
• Has pdf
  $f_X(x) = \begin{cases} \lambda e^{-\lambda x}, & x \ge 0 \\ 0, & x < 0 \end{cases}$
  for some $\lambda > 0$.
• Properties: $\int_0^{\infty} f_X(x)\,dx = 1$ and $E(X) = \dfrac{1}{\lambda}$ (use integration by parts).

Gaussian/Normal Distribution
• References: Appendix A.1 (Tse/Viswanath); Appendix B (Goldsmith).

Gaussian/Normal
• The normal distribution is completely characterized by its mean ($\mu$) and variance ($\sigma^2$).
• Q-function: the one-sided tail of the normal pdf. erfc(): the two-sided tail.
• So: $Q(x) = \tfrac{1}{2}\,\mathrm{erfc}\!\left(x/\sqrt{2}\right)$.

Normal Distribution: Why?
• A uniform distribution looks nothing like the bell-shaped (Gaussian) curve and has a large spread ($\sigma$).
• CENTRAL LIMIT TENDENCY: the sum of r.v.s drawn from a uniform distribution looks remarkably normal after very few samples.
• Bonus: the resulting average has decreasing $\sigma$.

Gaussian: Rapidly Dropping Tail Probability
• Why? The pdf has an $e^{-z^2}$ term, so the tail drops off very rapidly. A.k.a. "light-tailed" (not heavy-tailed).
• No skew or heavy tail means there is nothing beyond the 2nd-order parameters to worry about: the distribution is fully specified with just the mean and variance (2nd order).

Height & Spread of a Gaussian Can Vary
• [Figure: Gaussian pdfs with different means and variances.]

Gaussian R.V.
• Standard Gaussian: zero mean, unit variance.
• Tail: Q(x); the tail decays exponentially.
• The Gaussian property is preserved under linear transformations.

Standardize the Normal Distribution
• $Z = \dfrac{X - \mu}{\sigma}$ converts a normal distribution into the standardized normal distribution ($\mu = 0$, $\sigma = 1$), so one table suffices.

Obtaining the Probability
• [Table: portion of the standardized normal probability table; e.g. the entry for Z = 0.12 is 0.0478.]

Example: P(X ≥ 8)
• With $\mu = 5$ and $\sigma = 10$: $Z = \dfrac{X - \mu}{\sigma} = \dfrac{8 - 5}{10} = 0.30$, so $P(X \ge 8) = P(Z \ge 0.30) = 0.5000 - 0.1179 = 0.3821$.

Q-function: Tail of the Normal Distribution
• $Q(z) = P(Z \ge z) = 1 - P(Z < z)$.

Sampling from Non-Normal Populations
• [Figure: a non-normal population with $\mu = 50$, $\sigma_X = 10$; sampling (with replacement) distributions of the sample mean for n = 4 ($\sigma_{\bar X} = 5$) and n = 30 ($\sigma_{\bar X} = 1.8$), both centered at $\mu_{\bar X} = 50$.]

Central Limit Theorem (CLT)
• As the sample size gets large enough (n ≥ 30), the sampling distribution of the sample mean becomes almost normal, with $\mu_{\bar X} = \mu_X$ and $\sigma_{\bar X} = \sigma_X/\sqrt{n}$.

Aside: Caveat about CLT
• The central limit theorem works if the original distributions are not heavy-tailed, and you need to have enough samples. E.g. with multipath, if there is not rich enough scattering, the convergence to normal may not have happened yet.
• Moments converge to limits.
• There is trouble with aggregates of samples from "heavy-tailed" distributions.
• The rate of convergence to normal also varies with the distributional skew and with dependence among the samples.
• Non-classical versions of the CLT exist for some such (heavy-tailed) cases: the sum converges to stable Lévy noise (heavy-tailed, with long-range-dependent autocorrelations).
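Here is a minimal sketch of the CLT statement above: averages of n i.i.d. uniform samples, once standardized, already look close to N(0,1) for modest n. The choice n = 30 and the number of trials are arbitrary assumptions.

```python
# Standardized averages of uniforms look approximately standard normal.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, trials = 30, 50_000

x = rng.uniform(0, 1, size=(trials, n))
z = (x.mean(axis=1) - 0.5) / (np.sqrt(1 / 12) / np.sqrt(n))   # standardize

print("skewness       :", stats.skew(z))        # ~ 0 for a normal
print("excess kurtosis:", stats.kurtosis(z))    # ~ 0 for a normal
print("P(Z > 1.0) est :", np.mean(z > 1.0),
      " vs Q(1.0) =", 1 - stats.norm.cdf(1.0))  # Q-function comparison
```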
Gaussian Vectors & Other Distributions
• References: Appendix A.1 (Tse/Viswanath); Appendix B (Goldsmith).

Gaussian Vectors (Real-Valued)
• A collection of i.i.d. Gaussian r.v.s: $\mathbf{w} = (w_1, \ldots, w_n)$.
• $\|\mathbf{w}\|$ is the Euclidean distance from the origin to $\mathbf{w}$; the density $f(\mathbf{w})$ depends only on the magnitude of $\mathbf{w}$, i.e. on $\|\mathbf{w}\|^2$.
• An orthogonal transformation O (i.e. $O^t O = O O^t = I$) preserves the magnitude of a vector.

2-D Gaussian Random Vector
• Level sets (isobars) of the density are circles.
• w has the same distribution in any orthonormal basis; its distribution is invariant to rotations and reflections, i.e. Qw ~ w.
• w does not prefer any specific direction ("isotropic").
• Projections of the standard Gaussian random vector in orthogonal directions are independent.
• The sum of squares of n i.i.d. Gaussian r.v.s is chi-squared with n degrees of freedom; exponential for n = 2.

Gaussian Random Vectors (Contd)
• Linear transformations of the standard Gaussian vector: $\mathbf{x} = A\mathbf{w}$.
• The pdf has the covariance matrix $K = AA^t$ in the quadratic form instead of $\sigma^2$.
• When the covariance matrix K is diagonal, the component random variables are uncorrelated. Uncorrelated + Gaussian = independent.
• "White" Gaussian vector = uncorrelated components, i.e. K is diagonal.
• Whitening filter = convert K into a diagonal matrix (using an eigendecomposition).
• Note: AWGN noise nominally has infinitely many components, but it is projected onto a finite signal space to become a Gaussian vector.

Gaussian Random Vectors (uncorrelated vs correlated)
• [Figure: scatter plots of uncorrelated (circular) vs correlated (elliptical) Gaussian vectors.]

Complex Gaussian R.V.: Circular Symmetry
• A complex Gaussian random variable X whose real and imaginary components are i.i.d. Gaussian satisfies a circular-symmetry property: $e^{j\theta}X$ has the same distribution as X for any $\theta$.
• Multiplication by $e^{j\theta}$ is a rotation in the complex plane.
• We call such a random variable circularly symmetric complex Gaussian, denoted $\mathcal{CN}(0, \sigma^2)$, where $\sigma^2 = E[|X|^2]$.

Complex Gaussian & Circular Symmetry (Contd)
• [Figure: covariance matrix of the real and imaginary components.]

Complex Gaussian: Summary (I)
• [Figure: summary of complex Gaussian definitions.]

Complex Gaussian Vectors: Summary
• We will often see equations involving a complex Gaussian noise vector w; here we will make use of the fact that projections of w are complex Gaussian.

Related Distributions
• X = [X_1, ..., X_n] with each X_i normal:
• $\|X\|$ is Rayleigh, e.g. the magnitude of a complex Gaussian channel $X_1 + jX_2$.
• $\|X\|^2$ is chi-squared with n degrees of freedom.
• When n = 2, the chi-squared distribution becomes exponential, e.g. the power in a complex Gaussian channel: a sum of squares.
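A brief numerical sketch of the relationships just listed: for X1, X2 i.i.d. N(0, σ²), |X1 + jX2| is Rayleigh and |X1 + jX2|² is exponential (chi-squared with 2 degrees of freedom). The choice σ² = 1/2 (so that E[|h|²] = 1) is an arbitrary normalization assumption.

```python
# Magnitude of a complex Gaussian is Rayleigh; its power is exponential.
import numpy as np

rng = np.random.default_rng(4)
sigma = 1 / np.sqrt(2)                                   # so E[|h|^2] = 1
h = rng.normal(0, sigma, 100_000) + 1j * rng.normal(0, sigma, 100_000)

mag, power = np.abs(h), np.abs(h) ** 2
print("E[|h|^2] ~", power.mean())                        # ~ 1
# Compare empirical tails with the theoretical Rayleigh / exponential tails:
print("P(|h| > 1)   ~", np.mean(mag > 1),
      " vs", np.exp(-1 / (2 * sigma**2)))                # Rayleigh: e^{-x^2/(2 sigma^2)}
print("P(|h|^2 > 1) ~", np.mean(power > 1),
      " vs", np.exp(-1))                                 # Exponential(mean 1) tail
```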
Chi-Squared Distribution
• The sum of squares of n normal variables is chi-squared.
• For n = 2, it becomes an exponential distribution.
• It becomes bell-shaped for larger n.

Maximum Likelihood (ML) Detection: Concepts
• Reference: MacKay, Information Theory, Inference, and Learning Algorithms, http://www.inference.phy.cam.ac.uk/mackay/itprnn/book.html (Chap. 3, online book).

Likelihood Principle
• Experiment: pick Urn A or Urn B at random, then select a ball from that urn. The ball is black.
• What is the probability that the selected urn is A?

Likelihood Principle (Contd)
• Write out what you know: P(Black | Urn A) = 1/3, P(Black | Urn B) = 2/3, P(Urn A) = P(Urn B) = 1/2.
• We want P(Urn A | Black).
• Gut feeling: Urn B is more likely than Urn A (given that the ball is black). But by how much?
• This is an inverse probability problem. Make sure you understand the inverse nature of the conditional probabilities.
• Solution technique: use Bayes' theorem.

Likelihood Principle (Contd)
• Bayes manipulations: P(Urn A | Black) = P(Urn A and Black) / P(Black).
• Decompose the numerator and denominator in terms of the probabilities we know:
  P(Urn A and Black) = P(Black | Urn A) P(Urn A)
  P(Black) = P(Black | Urn A) P(Urn A) + P(Black | Urn B) P(Urn B)
• We know all these values; plug in and crank:
  P(Urn A and Black) = 1/3 × 1/2 = 1/6
  P(Black) = 1/3 × 1/2 + 2/3 × 1/2 = 1/2
  P(Urn A | Black) = (1/6)/(1/2) = 1/3 ≈ 0.333
• Notice that this matches our gut feeling that Urn A is less likely once we have seen black.
• The information that the ball is black has CHANGED the probability: from P(Urn A) = 0.5 to P(Urn A | Black) = 0.333.

Likelihood Principle: Way of Thinking
• Hypotheses: Urn A or Urn B. Observation: "Black".
• Prior probabilities: P(Urn A) and P(Urn B).
• Likelihood of Black given the choice of urn (a.k.a. forward probability): P(Black | Urn A) and P(Black | Urn B).
• Posterior probability of each hypothesis given the evidence: P(Urn A | Black), a.k.a. inverse probability.
• Likelihood principle (informal): all inferences depend ONLY on the likelihoods P(Black | Urn A), P(Black | Urn B) and the priors P(Urn A), P(Urn B).
• The result is a probability (or distribution) model over the space of possible hypotheses.

Maximum Likelihood (intuition)
• Recall: P(Urn A | Black) = P(Urn A and Black) / P(Black) = P(Black | Urn A) P(Urn A) / P(Black).
• With equal priors, P(Urn | Black) is maximized when P(Black | Urn) is maximized.
• The maximization is over the hypothesis space (Urn A or Urn B); P(Black | Urn) is the "likelihood".
• This is the "maximum likelihood" approach to maximizing the posterior probability.
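The urn example worked above fits in a few lines of code: the ML decision picks the hypothesis with the largest likelihood of the observation "black", and Bayes' rule gives the posterior P(Urn A | Black) = 1/3. The dictionary names below are just illustrative.

```python
# Urn example: ML decision and Bayes posterior.
priors      = {"A": 0.5, "B": 0.5}          # P(Urn A), P(Urn B)
likelihoods = {"A": 1/3, "B": 2/3}          # P(Black | urn)

# Maximum likelihood decision: argmax over hypotheses of P(Black | urn)
ml_choice = max(likelihoods, key=likelihoods.get)
print("ML choice:", ml_choice)                               # Urn B

# Posterior via Bayes rule: P(urn | Black) = P(Black|urn) P(urn) / P(Black)
p_black = sum(likelihoods[u] * priors[u] for u in priors)
posterior = {u: likelihoods[u] * priors[u] / p_black for u in priors}
print("P(Urn A | Black) =", posterior["A"])                  # 0.333...
```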
Maximum Likelihood: Intuition
• [Figure: several candidate hypotheses (pdfs) over the observation space; the max-likelihood choice is the hypothesis with the highest (maximum) likelihood of explaining the data observed.]

Maximum Likelihood (ML): Mechanics
• Independent observations (like "Black"): X_1, ..., X_n. Hypothesis $\theta$.
• Likelihood function: $L(\theta) = P(X_1, \ldots, X_n \mid \theta) = \prod_i P(X_i \mid \theta)$ (independence ⇒ multiply individual likelihoods).
• Log-likelihood: $LL(\theta) = \sum_i \log P(X_i \mid \theta)$.
• Maximum likelihood: maximize over $\theta$, e.g. by taking the derivative, setting it to zero, and solving for $\theta$.
• Maximum a posteriori (MAP): used if the prior probabilities/distributions are non-uniform; a different optimization function.

Back to the Urn Example
• In our urn example, we are asking: given the observed data "the ball is black", which hypothesis (Urn A or Urn B) has the highest likelihood of explaining this observed data?
• Answer from the above analysis: Urn B.
• Note: this does not give the posterior probability P(Urn A | Black), but it quickly helps us choose the best hypothesis (Urn B) that would explain the data.
• More examples (biased coin etc.): http://en.wikipedia.org/wiki/Maximumlikelihood and http://www.inference.phy.cam.ac.uk/mackay/itprnn/book.html (Chap. 3).

Not Just Urns and Balls: Detection of a Signal in AWGN
• Detection problem: given the observation vector z, perform a mapping from z to an estimate $\hat m$ of the transmitted symbol $m_i$, such that the average probability of error in the decision is minimized.
• [Block diagram: $m_i$ → Modulator → $s_i$ → (+ noise n) → z → Decision rule → $\hat m$.]

Binary PAM + AWGN Noise
• [Figure: likelihoods $p_z(z \mid m_1)$ and $p_z(z \mid m_2)$ centered at $s_1$ and $s_2$, located at $\pm\sqrt{E_b}$ along the basis function $\psi_1(t)$.]
• Signal s_1 or s_2 is sent; z is received.
• Additive white Gaussian noise (AWGN) ⇒ the likelihoods are bell-shaped pdfs around s_1 and s_2.
• MLE: at any point on the x-axis, see which curve has the higher (maximum) value and select the corresponding signal (s_1 or s_2). This simplifies into a "nearest-neighbor" rule.

AWGN Nearest-Neighbor Detection
• Projection onto the signal directions (subspace) is called matched filtering, and yields the "sufficient statistic".
• The error probability is the tail of the normal distribution (Q-function), evaluated from the midpoint between the two signals.

Detection in AWGN: Summary
• [Figure: summary of the AWGN detection rule and its error probability.]

Vector Detection (Contd)
• [Figure: extension of the detection rule to signal vectors.]
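A compact simulation sketch of the binary PAM / nearest-neighbor picture above: antipodal symbols ±√Eb in AWGN, ML detection by the sign of the matched-filter output, and the resulting error rate compared with the Q-function tail. The values of Eb, N0, and the number of symbols are arbitrary illustrative assumptions; the error-probability expression Q(√(2Eb/N0)) is the standard result for antipodal signaling.

```python
# Binary antipodal (2-PAM) detection in AWGN: simulated BER vs Q-function.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
Eb, N0, n_sym = 1.0, 0.5, 200_000

bits = rng.integers(0, 2, n_sym)
s = np.sqrt(Eb) * (2 * bits - 1)                    # map 0/1 -> -/+ sqrt(Eb)
z = s + rng.normal(0, np.sqrt(N0 / 2), n_sym)       # matched-filter output + noise

bits_hat = (z > 0).astype(int)                      # nearest-neighbor (ML) rule
ber = np.mean(bits_hat != bits)

Q = lambda x: 1 - stats.norm.cdf(x)                 # Q-function (normal tail)
print("simulated BER  :", ber)
print("Q(sqrt(2Eb/N0)):", Q(np.sqrt(2 * Eb / N0)))
```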
Estimation
• References: Appendix A.3 (Tse/Viswanath); Stark & Woods, Probability and Random Processes with Applications to Signal Processing, Prentice Hall, 2001; Schaum's Outline of Probability, Random Variables, and Random Processes; Papoulis & Pillai, Probability, Random Variables and Stochastic Processes, McGraw-Hill, 2002.

Detection vs Estimation
• In detection we have to decide which symbol was transmitted, s_A or s_B. This is a binary (0/1, or yes/no) type of answer, with an associated error probability.
• In estimation, we have to output an estimate h' of a transmitted signal h. This estimate is a complex number, not a binary answer.
• Typically, we try to estimate the complex channel h, so that we can use it in coherent combining (matched filtering).

Estimation in AWGN: MMSE
• We need a performance criterion: the mean-squared error (MSE).
• The optimal estimator is the "conditional mean" of x given the observation y; it gives the minimum mean-square error (MMSE).
• It satisfies the orthogonality property: the estimation error is independent of (orthogonal to) the observation.
• But the conditional mean is a nonlinear operator. It becomes linear if x is also Gaussian; otherwise, we need to find the best linear approximation (LMMSE).

LMMSE
• We are looking for a linear estimate: $\hat x = c\,y$.
• The best linear estimator, i.e. the weighting coefficient c, weights the received signal y by the transmit signal energy as a fraction of the received signal energy.
• The corresponding error (MMSE) follows from this choice of c.

LMMSE: Generalization & Summary
• [Figure: vector/matrix form of the LMMSE estimator and its error covariance.]
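As a small numerical sketch of the scalar LMMSE estimator described verbally above, assume the model y = x + w with zero-mean, independent x and w (this model and the variances below are assumptions, chosen to match the "transmit energy as a fraction of received energy" description). The weight is then c = var(x) / (var(x) + var(w)), and the residual error is orthogonal to the observation.

```python
# Scalar LMMSE sketch under the assumed model y = x + w.
import numpy as np

rng = np.random.default_rng(6)
var_x, var_w, n = 1.0, 0.25, 200_000

x = rng.normal(0, np.sqrt(var_x), n)
w = rng.normal(0, np.sqrt(var_w), n)
y = x + w

c = var_x / (var_x + var_w)            # transmit energy / received energy
x_hat = c * y

print("MSE of x_hat = c*y              :", np.mean((x - x_hat) ** 2))
print("theory var_x*var_w/(var_x+var_w):", var_x * var_w / (var_x + var_w))
print("orthogonality E[(x - x_hat)*y] ~:", np.mean((x - x_hat) * y))   # ~ 0
```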
Random Processes
• References: Appendix B (Goldsmith); Stark & Woods, Probability and Random Processes with Applications to Signal Processing, Prentice Hall, 2001; Schaum's Outline of Probability, Random Variables, and Random Processes; Papoulis & Pillai, Probability, Random Variables and Stochastic Processes, McGraw-Hill, 2002.

Random Sequences and Random Processes
• [Figure: relationship between random variables, random sequences, and random processes.]

Random Process
• A random process is a collection of time functions, or signals, corresponding to various outcomes of a random experiment. For each outcome there exists a deterministic function, which is called a sample function or a realization.
• [Figure: sample functions (realizations) versus time t; at each fixed time, the values across realizations form a random variable.]

Specifying a Random Process
• A random process is defined by all its joint CDFs $P(X(t_0) \le x_0, X(t_1) \le x_1, \ldots, X(t_n) \le x_n)$ for all possible sets of sample times $\{t_0, t_1, \ldots, t_n\}$.

Stationarity
• A process is stationary if time shifts (by any value T) do not affect its joint CDFs: the joint statistics of $X(t_0+T), \ldots, X(t_n+T)$ are the same as those of $X(t_0), \ldots, X(t_n)$.

Weak-Sense Stationarity (w.s.s.)
• Keep only the first two moments (mean and autocorrelation) invariant to time shifts (2nd-order stationarity); don't insist that higher-order moments or higher-order joint CDFs be unaffected by the lag T.
• With LTI systems, we will see that WSS inputs lead to WSS outputs. In particular, if a WSS process with PSD $S_X(f)$ is passed through a linear time-invariant filter with frequency response H(f), then the filter output is also a WSS process, with power spectral density $|H(f)|^2 S_X(f)$.
• Gaussian + w.s.s. = Gaussian stationary process (since a Gaussian process only has 2nd-order moments).

Stationarity: Summary
• Strictly stationary: none of the statistics of the random process are affected by a shift in the time origin.
• Wide-sense stationary (WSS): the mean and autocorrelation function do not change with a shift in the time origin.
• Cyclostationary: the mean and autocorrelation function are periodic in time.

Ergodicity
• Time averages = ensemble averages, i.e. "ensemble" averages like the mean/autocorrelation can be computed as "time averages" over a single realization of the random process.
• A random process is ergodic in mean and autocorrelation (like w.s.s.) if its time-averaged mean and time-averaged autocorrelation equal the corresponding ensemble averages.

Autocorrelation: Summary
• Autocorrelation can be defined for an energy signal, for a power signal (including the periodic case), and for a random signal.
• For a WSS process: $R_X(\tau) = E[X(t)\,X^*(t+\tau)]$, which depends only on the lag $\tau$.

Power Spectral Density (PSD)
• 1. $S_X(f)$ is real and $S_X(f) \ge 0$.
• 2. $S_X(-f) = S_X(f)$.
• 3. $R_X(0) = \int S_X(f)\,df$ (the total average power).

Power Spectrum
• For a deterministic signal x(t), the spectrum is well defined: if $X(\omega)$ represents its Fourier transform, i.e. $X(\omega) = \int_{-\infty}^{\infty} x(t)\,e^{-j\omega t}\,dt$, then $|X(\omega)|^2$ represents its energy spectrum.
• This follows from Parseval's theorem, since the signal energy is given by $E = \int_{-\infty}^{\infty} x^2(t)\,dt = \frac{1}{2\pi}\int_{-\infty}^{\infty} |X(\omega)|^2\,d\omega$.
• Thus $|X(\omega)|^2\,\Delta\omega$ represents the signal energy in the band $(\omega, \omega + \Delta\omega)$.

Spectral Density: Summary
• Energy signals → energy spectral density (ESD).
• Power signals → power spectral density (PSD).
• Random processes → power spectral density (PSD).
• Note: we have used f for ω and $G_x$ for $S_x$ interchangeably.

Properties of an Autocorrelation Function
• For real-valued signals (and WSS random signals):
• 1. The autocorrelation and the spectral density form a Fourier-transform pair: $R_X(\tau) \leftrightarrow S_X(\omega)$.
• 2. The autocorrelation is symmetric around zero: $R_X(-\tau) = R_X(\tau)$.
• 3. Its maximum value occurs at the origin: $|R_X(\tau)| \le R_X(0)$.
• 4. Its value at the origin equals the average power or energy.

Noise in Communication Systems
• Thermal noise is described by a zero-mean Gaussian random process n(t). Its PSD is flat, hence it is called white noise. i.i.d. Gaussian.
• [Figure: flat power spectral density ($N_0/2$ W/Hz), delta-function autocorrelation, and Gaussian probability density function.]

White Gaussian Noise
• White: the power spectral density (PSD) is the same, i.e. flat, for all frequencies of interest (from dc to ~10^12 Hz).
• The autocorrelation is a delta function ⇒ two samples, no matter how close, are uncorrelated.
• $N_0/2$ indicates a two-sided PSD.
• Zero-mean Gaussian: completely characterized by its variance ($\sigma^2$). The variance of filtered noise is finite (= $N_0/2$).
• Similar to "white light", which contains equal amounts of all frequencies in the visible band of the EM spectrum.
• Gaussian + uncorrelated = i.i.d.: the noise affects each symbol independently (a memoryless channel).
• Practically: if the bandwidth of the noise is much larger than that of the system, the white-noise model is good enough.
• Colored noise: exhibits correlations at non-zero lags.
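Here is a short sketch of the white-noise properties above in discrete time: i.i.d. N(0, σ²) samples have an autocorrelation that is (nearly) zero at every non-zero lag and an estimated PSD that is approximately flat across frequency. The variance, record length, and Welch segment length are arbitrary assumptions; the time-average estimate of R(k) also illustrates the ergodicity idea.

```python
# White Gaussian noise: delta-like autocorrelation and flat estimated PSD.
import numpy as np
from scipy import signal

rng = np.random.default_rng(7)
sigma2, n = 2.0, 100_000
w = rng.normal(0, np.sqrt(sigma2), n)

# Autocorrelation as a time average (ergodicity): R(0) ~ sigma^2, R(k != 0) ~ 0
def R(k):
    return np.mean(w[:n - k] * w[k:]) if k else np.mean(w * w)

print("R(0), R(1), R(5):", R(0), R(1), R(5))

# Estimated PSD is approximately flat across frequency
f, Pxx = signal.welch(w, fs=1.0, nperseg=1024)
print("PSD estimate: min", Pxx.min(), "max", Pxx.max(), "mean", Pxx.mean())
```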
Signal Transmission Through Linear Systems (Filters)
• [Block diagram: Input → Linear system → Output, with the deterministic-signal and random-signal input/output relations.]
• Ideal distortionless transmission: all the frequency components of the signal not only arrive with an identical time delay, but are also amplified or attenuated equally.

(Deterministic) Systems with Stochastic Inputs
• A deterministic system transforms each input waveform $X(t, \xi_i)$ into an output waveform $Y(t, \xi_i) = T[X(t, \xi_i)]$ by operating only on the time variable t. Thus a set of realizations at the input, corresponding to a process X(t), generates a new set of realizations at the output, associated with a new process Y(t).
• [Figure 14.3: realizations of X(t) entering the operator T[·] and emerging as realizations of Y(t).]
• Our goal is to study the output process statistics in terms of the input process statistics and the system function.
• (A stochastic system, on the other hand, operates on both the variables t and $\xi$.)

Deterministic Systems
• Memoryless systems: $Y(t) = g[X(t)]$.
• Systems with memory: time-invariant or time-varying; linear systems: $Y(t) = L[X(t)]$.
• Linear time-invariant (LTI) systems:
  $Y(t) = \int_{-\infty}^{\infty} h(t-\tau)\,X(\tau)\,d\tau = \int_{-\infty}^{\infty} h(\tau)\,X(t-\tau)\,d\tau$
  i.e. X(t) → h(t) → Y(t).

LTI Systems: WSS Input Is Good Enough
• (a) A wide-sense stationary process X(t) through an LTI system h(t) gives a wide-sense stationary process Y(t).
• (b) A strict-sense stationary process through an LTI system gives a strict-sense stationary process.
• (c) A Gaussian process (also stationary) through a linear system gives a Gaussian process (also stationary).

White Noise Process & LTI Systems
• W(t) is said to be a white noise process if $R_{WW}(t_1, t_2) = q(t_1)\,\delta(t_1 - t_2)$, i.e. $E[W(t_1)W(t_2)] = 0$ unless $t_1 = t_2$.
• W(t) is said to be wide-sense stationary (w.s.s.) white noise if E[W(t)] = constant and $R_{WW}(t_1, t_2) = q\,\delta(t_1 - t_2) = q\,\delta(\tau)$.
• If W(t) is also a Gaussian process (white Gaussian process), then all of its samples are independent random variables.
• [Figure: white noise W(t) passed through an LTI filter h(t) yields colored noise $N(t) = h(t) * W(t)$.]

Summary
• Probability, the union bound, Bayes' rule, maximum likelihood.
• Expectation, variance, characteristic functions.
• Distributions: normal/Gaussian, Rayleigh, chi-squared, exponential.
• Gaussian vectors, complex Gaussian: circular symmetry vs isotropy.
• Random processes: stationarity, w.s.s., ergodicity; autocorrelation, PSD, white Gaussian noise.
• Random signals through LTI systems: Gaussian/WSS have useful properties that are preserved; frequency-domain analysis is possible.
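To tie the last summary bullet (random signals through LTI systems, analyzed in the frequency domain) to something concrete, here is a closing sketch in discrete time. The 5-tap moving-average filter, record length, and PSD-estimation settings are arbitrary assumptions; the point is that white noise through an LTI filter comes out "colored", with an estimated output PSD close to |H(f)|² times the (flat) input PSD.

```python
# White Gaussian noise through an LTI filter becomes colored noise.
import numpy as np
from scipy import signal

rng = np.random.default_rng(8)
w = rng.normal(0, 1.0, 200_000)                 # white Gaussian input, variance 1
h = np.ones(5) / 5                              # example LTI impulse response
y = signal.lfilter(h, [1.0], w)                 # y = h * w (convolution)

# The output is correlated at small lags (colored), unlike the input
r1 = np.mean(y[:-1] * y[1:]) / np.var(y)
print("lag-1 correlation of y:", r1)            # ~ 0.8 for this filter

# Compare the estimated output PSD with |H(f)|^2 * (estimated input PSD)
f, Pyy = signal.welch(y, fs=1.0, nperseg=2048)
_, H = signal.freqz(h, worN=f, fs=1.0)
_, Pww = signal.welch(w, fs=1.0, nperseg=2048)
print("max |Pyy - |H|^2 * Pww| :", np.max(np.abs(Pyy - np.abs(H) ** 2 * Pww)))
```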