



Markov processes and Hidden Markov Models (HMMs)

www.ThesisScientist.com

Motivation
• The Bayes nets we considered so far were static: they referred to a single point in time
  – E.g., medical diagnosis
• An agent needs to model how the world evolves
  – Speech recognition software needs to process speech over time
  – An artificially intelligent software assistant needs to keep track of the user's intentions over time
  – …

Markov processes
• We have time periods t = 0, 1, 2, …
• In each period t, the world is in a certain state S_t
• The Markov assumption: given S_t, S_{t+1} is independent of all S_i with i < t
  – P(S_{t+1} | S_1, S_2, …, S_t) = P(S_{t+1} | S_t)
  – Given the current state, history tells us nothing more about the future

  S_1 → S_2 → S_3 → … → S_t

• Typically, all the CPTs are the same:
  – For all t, P(S_{t+1} = j | S_t = i) = a_ij (stationarity assumption)

Weather example
• S_t is one of s, c, r (sun, cloudy, rain)
• Transition probabilities (a state diagram, not a Bayes net):

  from\to    s    c    r
  s         .6   .3   .1
  c         .4   .3   .3
  r         .2   .5   .3

• Also need to specify an initial distribution P(S_0)
• Throughout, assume P(S_0 = s) = 1

Weather example…
• What is the probability that it rains two days from now, P(S_2 = r)?
• P(S_2 = r) = P(S_2 = r, S_1 = r) + P(S_2 = r, S_1 = s) + P(S_2 = r, S_1 = c) = .1·.3 + .6·.1 + .3·.3 = .18

Weather example…
• What is the probability that it rains three days from now?
• Computationally inefficient way: P(S_3 = r) = P(S_3 = r, S_2 = r, S_1 = r) + P(S_3 = r, S_2 = r, S_1 = s) + …
• For n periods into the future, we need to sum over 3^{n-1} paths

Weather example…
• More efficient: P(S_3 = r) = P(S_3 = r, S_2 = r) + P(S_3 = r, S_2 = s) + P(S_3 = r, S_2 = c) = P(S_3 = r | S_2 = r)P(S_2 = r) + P(S_3 = r | S_2 = s)P(S_2 = s) + P(S_3 = r | S_2 = c)P(S_2 = c)
• Only hard part: figure out P(S_2)
• Main idea: compute the distribution P(S_1), then P(S_2), then P(S_3)
• Linear in the number of periods (example on board)
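The efficient forward computation is just a repeated sum-product (a matrix–vector product). Here is a minimal sketch in Python for the weather chain; the state ordering (s, c, r) and variable names are my own:

```python
# Forward computation of the marginals P(S_t) for the weather chain.
# Rows of T are the current state, columns the next state (order: s, c, r).
T = [[0.6, 0.3, 0.1],   # from s
     [0.4, 0.3, 0.3],   # from c
     [0.2, 0.5, 0.3]]   # from r

def step(p, T):
    """One period: P(S_{t+1} = j) = sum_i P(S_t = i) * a_ij."""
    return [sum(p[i] * T[i][j] for i in range(3)) for j in range(3)]

p = [1.0, 0.0, 0.0]          # initial distribution: P(S_0 = s) = 1
for t in range(3):
    p = step(p, T)
    print(f"P(S_{t+1}) =", [round(x, 4) for x in p])
```

This reproduces P(S_2 = r) = .18 from the slide and gives P(S_3 = r) = .2, with cost linear in the number of periods rather than exponential in it.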
Stationary distributions
• As t goes to infinity, "generally," the distribution P(S_t) will converge to a stationary distribution
• A distribution given by probabilities π_i (where i is a state) is stationary if: P(S_t = i) = π_i implies P(S_{t+1} = i) = π_i
• Of course, P(S_{t+1} = i) = Σ_j P(S_{t+1} = i, S_t = j) = Σ_j P(S_t = j) a_ji
• So, the stationary distribution is defined by π_i = Σ_j π_j a_ji

Computing the stationary distribution
• π_s = .6π_s + .4π_c + .2π_r
• π_c = .3π_s + .3π_c + .5π_r
• π_r = .1π_s + .3π_c + .3π_r
• (These three equations are linearly dependent; also use π_s + π_c + π_r = 1)

Restrictiveness of Markov models
• Are past and future really independent given the current state?
• E.g., suppose that when it rains, it rains for at most 2 days

  S_1 → S_2 → S_3 → S_4 → …

• That is a second-order Markov process
• Workaround: change the meaning of "state" to the events of the last 2 days:

  (S_1, S_2) → (S_2, S_3) → (S_3, S_4) → (S_4, S_5) → …

• Another approach: add more information to the state
  – E.g., the full state of the world would include whether the sky is full of water
  – Additional information may not be observable
  – Blowup of the number of states…

Hidden Markov models (HMMs)
• Same as a Markov model, except we cannot see the state
• Instead, we only see an observation each period, which depends on the current state

  S_1 → S_2 → S_3 → … → S_t
   ↓     ↓     ↓          ↓
  O_1   O_2   O_3   …    O_t

• Still need a transition model: P(S_{t+1} = j | S_t = i) = a_ij
• Also need an observation model: P(O_t = k | S_t = i) = b_ik

Weather example extended to HMM
• Transition probabilities: as in the weather example above
• Observation: labmate wet or dry
• b_sw = .1, b_cw = .3, b_rw = .8

HMM weather example: a question
• You have been stuck in the lab for three days
• On those days, your labmate was dry, wet, wet, respectively
• What is the probability that it is now raining outside?
• P(S_2 = r | O_0 = d, O_1 = w, O_2 = w)
• By Bayes' rule, really want to know P(S_2, O_0 = d, O_1 = w, O_2 = w)
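Before turning to inference in the HMM: the stationary-distribution equations a few slides back can be checked numerically by simply iterating the transition update until it stops changing (power iteration). A sketch, using my own variable names; the exact solution of the linear system is π = (17/38, 13/38, 8/38):

```python
# Power iteration for the stationary distribution of the weather chain:
# iterate p <- pT until convergence, then pi_i = sum_j pi_j a_ji holds.
T = [[0.6, 0.3, 0.1],
     [0.4, 0.3, 0.3],
     [0.2, 0.5, 0.3]]

p = [1.0, 0.0, 0.0]              # any starting distribution works for this chain
for _ in range(200):
    p = [sum(p[i] * T[i][j] for i in range(3)) for j in range(3)]

print([round(x, 4) for x in p])  # pi = (17/38, 13/38, 8/38) ~ (0.4474, 0.3421, 0.2105)
```

Convergence from any starting point is guaranteed here because the chain is irreducible and aperiodic (every state has a self-loop); "generally" in the slide hedges against chains where that fails.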
Solving the question
• Computationally efficient approach: first compute P(S_1 = i, O_0 = d, O_1 = w) for all states i
• General case: solve for P(S_t, O_0 = o_0, O_1 = o_1, …, O_t = o_t) for t = 1, then t = 2, … This is called monitoring (or filtering)
• P(S_t, O_0 = o_0, O_1 = o_1, …, O_t = o_t) = Σ_{s_{t-1}} P(S_{t-1} = s_{t-1}, O_0 = o_0, O_1 = o_1, …, O_{t-1} = o_{t-1}) P(S_t | S_{t-1} = s_{t-1}) P(O_t = o_t | S_t)

Predicting further out
• You have been stuck in the lab for three days
• On those days, your labmate was dry, wet, wet, respectively
• What is the probability that two days from now it will be raining outside?
• P(S_4 = r | O_0 = d, O_1 = w, O_2 = w)

Predicting further out, continued…
• Want to know: P(S_4 = r | O_0 = d, O_1 = w, O_2 = w)
• Already know how to get: P(S_2 | O_0 = d, O_1 = w, O_2 = w)
• P(S_3 = r | O_0 = d, O_1 = w, O_2 = w) = Σ_{s_2} P(S_3 = r, S_2 = s_2 | O_0 = d, O_1 = w, O_2 = w) = Σ_{s_2} P(S_3 = r | S_2 = s_2) P(S_2 = s_2 | O_0 = d, O_1 = w, O_2 = w)
• Etc. for S_4
• So: monitoring first, then straightforward Markov process updates
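The monitoring recursion and the prediction step can be sketched together for the dry, wet, wet example. This assumes the "dry" probabilities are one minus the "wet" probabilities, which is implicit in the slides; function and variable names are my own:

```python
# Monitoring (filtering): alpha_t(i) = P(S_t = i, O_0 = o_0, ..., O_t = o_t),
# computed left to right with the recursion from the slide above.
STATES = ["s", "c", "r"]
T = [[0.6, 0.3, 0.1],
     [0.4, 0.3, 0.3],
     [0.2, 0.5, 0.3]]
B_WET = [0.1, 0.3, 0.8]                     # b_sw, b_cw, b_rw

def emit(i, obs):
    return B_WET[i] if obs == "w" else 1.0 - B_WET[i]

def monitor(observations):
    alpha = [emit(i, observations[0]) * (1.0 if STATES[i] == "s" else 0.0)
             for i in range(3)]             # P(S_0 = s) = 1
    for obs in observations[1:]:
        alpha = [sum(alpha[i] * T[i][j] for i in range(3)) * emit(j, obs)
                 for j in range(3)]
    return alpha

alpha2 = monitor(["d", "w", "w"])
z = sum(alpha2)
print("P(S_2 = r | d, w, w) =", round(alpha2[2] / z, 4))   # ~0.5707

# Predicting further out: normalize (Bayes' rule), then do plain Markov
# updates for S_3 and S_4.
p = [a / z for a in alpha2]
for _ in range(2):
    p = [sum(p[i] * T[i][j] for i in range(3)) for j in range(3)]
print("P(S_4 = r | d, w, w) =", round(p[2], 4))            # ~0.2384
```

Dividing by z = P(O_0 = d, O_1 = w, O_2 = w) is the "by Bayes' rule" step: the joint probabilities from monitoring only need to be normalized to give the conditional distribution.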
Integrating newer information
• You have been stuck in the lab for four days
• On those days, your labmate was dry, wet, wet, dry, respectively
• What is the probability that two days ago it was raining outside? P(S_1 = r | O_0 = d, O_1 = w, O_2 = w, O_3 = d)
  – the smoothing or hindsight problem

Hindsight problem continued…
• Want: P(S_1 = r | O_0 = d, O_1 = w, O_2 = w, O_3 = d)
• "Partial" application of Bayes' rule: P(S_1 = r | O_0 = d, O_1 = w, O_2 = w, O_3 = d) = P(S_1 = r, O_2 = w, O_3 = d | O_0 = d, O_1 = w) / P(O_2 = w, O_3 = d | O_0 = d, O_1 = w)
• So really want to know P(S_1, O_2 = w, O_3 = d | O_0 = d, O_1 = w)

Hindsight problem continued…
• Want to know P(S_1 = r, O_2 = w, O_3 = d | O_0 = d, O_1 = w)
• P(S_1 = r, O_2 = w, O_3 = d | O_0 = d, O_1 = w) = P(S_1 = r | O_0 = d, O_1 = w) P(O_2 = w, O_3 = d | S_1 = r)
• Already know how to compute P(S_1 = r | O_0 = d, O_1 = w)
• Just need to compute P(O_2 = w, O_3 = d | S_1 = r)

Hindsight problem continued…
• P(O_2 = w, O_3 = d | S_1 = r) = Σ_{s_2} P(S_2 = s_2, O_2 = w, O_3 = d | S_1 = r) = Σ_{s_2} P(S_2 = s_2 | S_1 = r) P(O_2 = w | S_2 = s_2) P(O_3 = d | S_2 = s_2)
• The first two factors are directly in the model; the last factor is a "smaller" problem of the same kind
• Use dynamic programming, backwards from the future
  – Similar to the forwards approach from the past

Backwards reasoning in general
• Want to know P(O_{k+1} = o_{k+1}, …, O_t = o_t | S_k)
• First compute P(O_t = o_t | S_{t-1}) = Σ_{s_t} P(S_t = s_t | S_{t-1}) P(O_t = o_t | S_t = s_t)
• Then compute P(O_{t-1} = o_{t-1}, O_t = o_t | S_{t-2}) = Σ_{s_{t-1}} P(S_{t-1} = s_{t-1} | S_{t-2}) P(O_{t-1} = o_{t-1} | S_{t-1} = s_{t-1}) P(O_t = o_t | S_{t-1} = s_{t-1})
• Etc.
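The backwards dynamic program above can be sketched for the four-day question. The code assumes "dry" probabilities are one minus the "wet" probabilities, and reuses the hand-computable fact that P(S_1 = i, O_0 = d, O_1 = w) is proportional to (.054, .081, .072); names are my own:

```python
# Backwards DP for the hindsight (smoothing) question:
# beta_k(i) = P(O_{k+1} = o_{k+1}, ..., O_t = o_t | S_k = i), built from the
# future toward the past, then combined with the forward (monitoring) result.
T = [[0.6, 0.3, 0.1],
     [0.4, 0.3, 0.3],
     [0.2, 0.5, 0.3]]
B_WET = [0.1, 0.3, 0.8]

def emit(i, obs):
    return B_WET[i] if obs == "w" else 1.0 - B_WET[i]

def backward(future_obs):
    beta = [1.0, 1.0, 1.0]                  # base case: empty future
    for obs in reversed(future_obs):
        beta = [sum(T[i][j] * emit(j, obs) * beta[j] for j in range(3))
                for i in range(3)]
    return beta

beta1 = backward(["w", "d"])                # P(O_2 = w, O_3 = d | S_1 = i)
alpha1 = [0.054, 0.081, 0.072]              # forward pass up to O_1 = w
gamma = [a * b for a, b in zip(alpha1, beta1)]
z = sum(gamma)
print("P(S_1 = r | d, w, w, d) =", round(gamma[2] / z, 4))   # ~0.4045
```

Multiplying the forward and backward quantities and normalizing is exactly the "partial Bayes' rule" decomposition from the slides: the normalizer z plays the role of P(O_2 = w, O_3 = d | O_0 = d, O_1 = w) (up to the shared forward constant).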
Variable elimination
• Because all of this is inference in a Bayes net, we can also just do variable elimination

  S_1 → S_2 → S_3 → … → S_t
   ↓     ↓     ↓          ↓
  O_1   O_2   O_3   …    O_t

• E.g., P(S_3 = r, O_1 = d, O_2 = w, O_3 = w) = Σ_{s_1} Σ_{s_2} P(S_1 = s_1) P(O_1 = d | S_1 = s_1) P(S_2 = s_2 | S_1 = s_1) P(O_2 = w | S_2 = s_2) P(S_3 = r | S_2 = s_2) P(O_3 = w | S_3 = r)
• It's a tree, so variable elimination works well

Dynamic Bayes nets
• So far we assumed that each period has one variable for the state and one variable for the observation
• Often better to divide the state and observation up into multiple variables, e.g.:
  – weather in Durham, 1 → weather in Durham, 2 → …
  – NC wind, 1 → NC wind, 2 → …
  – weather in Beaufort, 1 → weather in Beaufort, 2 → …
• There can be edges both within a period and from one period to the next…

Some interesting things we skipped
• Finding the most likely sequence of states, given observations
  – Not necessarily equal to the sequence of most likely states (example)
  – Viterbi algorithm
    • Key idea: for each period t, for every state, keep track of the most likely sequence to that state at that period, given the evidence up to that period
• Continuous variables
• Approximate inference methods
  – Particle filtering
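The Viterbi key idea above can be sketched for the weather HMM with observations dry, wet, wet: replace the sums of the monitoring recursion by maximizations, and keep backpointers to recover the sequence. A sketch with my own names, not part of the original slides:

```python
# Viterbi: for each period and state, keep the probability of the single most
# likely state sequence ending in that state, plus a backpointer to recover it.
STATES = ["s", "c", "r"]
T = [[0.6, 0.3, 0.1],
     [0.4, 0.3, 0.3],
     [0.2, 0.5, 0.3]]
B_WET = [0.1, 0.3, 0.8]

def emit(i, obs):
    return B_WET[i] if obs == "w" else 1.0 - B_WET[i]

def viterbi(observations):
    delta = [emit(i, observations[0]) * (1.0 if STATES[i] == "s" else 0.0)
             for i in range(3)]             # P(S_0 = s) = 1
    backptrs = []
    for obs in observations[1:]:
        bp, new = [], []
        for j in range(3):
            i_best = max(range(3), key=lambda i: delta[i] * T[i][j])
            bp.append(i_best)
            new.append(delta[i_best] * T[i_best][j] * emit(j, obs))
        backptrs.append(bp)
        delta = new
    # Trace the best path backwards from the most likely final state.
    j = max(range(3), key=lambda j: delta[j])
    path = [j]
    for bp in reversed(backptrs):
        j = bp[j]
        path.append(j)
    return [STATES[j] for j in reversed(path)], max(delta)

path, prob = viterbi(["d", "w", "w"])
print(path, prob)    # most likely state sequence given dry, wet, wet
```

Because max does not distribute over the products the way the marginalizing sum does, this sequence can differ from the sequence of per-period most likely states, which is exactly the caveat in the slide.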