



Bayesian networks

Specifying probability distributions

• Specifying a probability for every atomic event is impractical
• P(X_1, …, X_n) would need to be specified for every combination x_1, …, x_n of values for X_1, …, X_n
– If there are k possible values per variable…
– …we need to specify k^n - 1 probabilities
• We have already seen that it can be easier to specify probability distributions by using (conditional) independence
• Bayesian networks allow us
– to specify any distribution,
– to specify such distributions concisely when there is (conditional) independence, in a natural way

A general approach to specifying probability distributions

• Say the variables are X_1, …, X_n
• P(X_1, …, X_n) = P(X_1) P(X_2 | X_1) P(X_3 | X_1, X_2) … P(X_n | X_1, …, X_{n-1})
• or:
• P(X_1, …, X_n) = P(X_n) P(X_{n-1} | X_n) P(X_{n-2} | X_n, X_{n-1}) … P(X_1 | X_n, …, X_2)
• Can specify every component
– For every combination of values for the variables to the right of the conditioning bar, specify the probability distribution over the values of the variable on the left
• If every variable can take k values,
– P(X_i | X_1, …, X_{i-1}) requires (k-1)k^{i-1} values
– Σ_{i=1..n} (k-1)k^{i-1} = Σ_{i=1..n} (k^i - k^{i-1}) = k^n - 1
• Same as specifying probabilities of all atomic events – of course, because we can specify any distribution

Graphically representing influences

[figure: a fully connected DAG over X_1, X_2, X_3, X_4, one edge per conditional in the factorization]

Conditional independence to the rescue

• Problem: P(X_i | X_1, …, X_{i-1}) requires us to specify too many values
• Suppose X_1, …, X_{i-1} partition into two subsets, S and T, so that X_i is conditionally independent of T given S
• Then P(X_i | X_1, …, X_{i-1}) = P(X_i | S, T) = P(X_i | S)
• Requires only (k-1)k^{|S|} values instead of (k-1)k^{i-1} values

Graphically representing influences

• … if X_4 is conditionally independent of X_2 given X_1 and X_3
[figure: the same DAG with the edge from X_2 to X_4 removed]

Rain and sprinklers example

[figure: raining (X) and sprinklers (Y) both point to grass wet (Z)]
• sprinklers is independent of raining, so no edge between them
• P(X=1) = .3, P(Y=1) = .4
• Each node has a conditional probability table (CPT); for grass wet:
– P(Z=1 | X=0, Y=0) = .1
– P(Z=1 | X=0, Y=1) = .8
– P(Z=1 | X=1, Y=0) = .7
– P(Z=1 | X=1, Y=1) = .9

Rigged casino example

[figure: casino rigged (CR) points to die 1 (D1) and die 2 (D2)]
• P(CR=1) = 1/2
• P(D1=1 | CR=0) = 1/6, …, P(D1=5 | CR=0) = 1/6
• P(D2=1 | CR=0) = 1/6, …, P(D2=5 | CR=0) = 1/6
• P(D1=1 | CR=1) = 3/12, …, P(D1=5 | CR=1) = 1/6
• P(D2=1 | CR=1) = 3/12, …, P(D2=5 | CR=1) = 1/6
• die 2 is conditionally independent of die 1 given casino rigged, so no edge between them

Rigged casino example with poorly chosen order

[figure: die 2 points to die 1, and both dice point to casino rigged]
• die 1 and die 2 are not (marginally) independent, so we need an edge between them
• both dice have relevant information for whether the casino is rigged, so casino rigged has both as parents
• the CPT for casino rigged now needs 36 probabilities

More elaborate rain and sprinklers example

[figure: rained and sprinklers were on point to grass wet; rained points to neighbor walked dog; neighbor walked dog and grass wet point to dog wet]
• P(+r) = .2
• P(+s) = .6
• P(+g | +r, +s) = .9, P(+g | +r, -s) = .7, P(+g | -r, +s) = .8, P(+g | -r, -s) = .2
• P(+n | +r) = .3, P(+n | -r) = .4
• P(+d | +n, +g) = .9, P(+d | +n, -g) = .4, P(+d | -n, +g) = .5, P(+d | -n, -g) = .3

Inference

(same network and CPTs as above)
• Want to know: P(+r | +d) = P(+r, +d) / P(+d)
• Let's compute P(+r, +d)

Inference…

(same network and CPTs as above)
• P(+r, +d) = Σ_s Σ_g Σ_n P(+r) P(s) P(n | +r) P(g | +r, s) P(+d | n, g)
  = P(+r) Σ_s P(s) Σ_g P(g | +r, s) Σ_n P(n | +r) P(+d | n, g)

Variable elimination

(same network and CPTs as above)
• From the factor Σ_n P(n | +r) P(+d | n, g) we sum out n to obtain a factor depending only on g:
– Σ_n P(n | +r) P(+d | n, +g) = P(+n | +r) P(+d | +n, +g) + P(-n | +r) P(+d | -n, +g) = .3·.9 + .7·.5 = .62
– Σ_n P(n | +r) P(+d | n, -g) = P(+n | +r) P(+d | +n, -g) + P(-n | +r) P(+d | -n, -g) = .3·.4 + .7·.3 = .33
• Continuing to the left, g will be summed out next, etc. (continued on board)
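To make the elimination concrete, here is a minimal Python sketch of this computation (the CPT dictionaries and helper names such as pg, pn, p_r_and_d are mine, not from the slides). It sums out n, then g, then s, in the order of the factored expression above, and reproduces the .62 and .33 entries along the way.

```python
# CPTs from the slides; True stands for '+', False for '-'.
P_r = {True: 0.2, False: 0.8}                   # P(r)
P_s = {True: 0.6, False: 0.4}                   # P(s)
P_g = {(True, True): 0.9, (True, False): 0.7,   # P(+g | r, s)
       (False, True): 0.8, (False, False): 0.2}
P_n = {True: 0.3, False: 0.4}                   # P(+n | r)
P_d = {(True, True): 0.9, (True, False): 0.4,   # P(+d | n, g)
       (False, True): 0.5, (False, False): 0.3}

def pg(g, r, s): return P_g[(r, s)] if g else 1 - P_g[(r, s)]
def pn(n, r):    return P_n[r] if n else 1 - P_n[r]
def pd(d, n, g): return P_d[(n, g)] if d else 1 - P_d[(n, g)]

B = [True, False]

def p_r_and_d(r):
    """P(r, +d), eliminating variables innermost-first: n, then g, then s."""
    # Sum out n: f1(g) = sum_n P(n | r) P(+d | n, g).
    # For r = +r this reproduces the slide: f1[True] = .62, f1[False] = .33.
    f1 = {g: sum(pn(n, r) * pd(True, n, g) for n in B) for g in B}
    # Sum out g: f2(s) = sum_g P(g | r, s) f1(g).
    f2 = {s: sum(pg(g, r, s) * f1[g] for g in B) for s in B}
    # Sum out s and multiply in P(r).
    return P_r[r] * sum(P_s[s] * f2[s] for s in B)

p_d = p_r_and_d(True) + p_r_and_d(False)   # P(+d), summing over r
print(p_r_and_d(True))                     # P(+r, +d), roughly 0.1136
print(p_r_and_d(True) / p_d)               # P(+r | +d), roughly 0.2147
```

Note that every intermediate factor here has a single free variable; the next slide shows how a worse order produces larger factors.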
Elimination order matters

(same network and CPTs as above)
• P(+r, +d) = Σ_n Σ_s Σ_g P(+r) P(s) P(n | +r) P(g | +r, s) P(+d | n, g)
  = P(+r) Σ_n P(n | +r) Σ_s P(s) Σ_g P(g | +r, s) P(+d | n, g)
• Last factor will depend on two variables in this case (the innermost sum over g leaves a factor over both s and n)

Don't always need to sum over all variables

(same network and CPTs as above)
• Can drop parts of the network that are irrelevant
• P(+r, +s) = P(+r) P(+s) = .2·.6 = .12
• P(+n, +s) = Σ_r P(r, +n, +s) = Σ_r P(r) P(+n | r) P(+s) = P(+s) Σ_r P(r) P(+n | r) = P(+s) (P(+r) P(+n | +r) + P(-r) P(+n | -r)) = .6·(.2·.3 + .8·.4) = .6·.38 = .228
• P(+d | +n, +g, +s) = P(+d | +n, +g) = .9

Trees are easy

[figure: a singly connected (tree-structured) network over X_1, …, X_8]
• Choose an extreme variable (a leaf) to eliminate first
• Its probability is "absorbed" by its neighbor:
• … Σ_x4 P(x_4 | x_1, x_2) … Σ_x5 P(x_5 | x_4) … = … Σ_x4 P(x_4 | x_1, x_2) Σ_x5 P(x_5 | x_4) …

Clustering algorithms

[figure: the dog-wet network before and after merging "neighbor walked dog" and "grass wet" into a single node]
• Merge nodes into "meganodes" until we have a tree
– Then, can apply the special-purpose algorithm for trees
• Merged node has values +n+g, +n-g, -n+g, -n-g
– Much larger CPT

Logic gates in Bayes nets

• Not everything needs to be random…
• AND gate (X_1, X_2 point to Y):
– P(+y | +x_1, +x_2) = 1
– P(+y | -x_1, +x_2) = 0
– P(+y | +x_1, -x_2) = 0
– P(+y | -x_1, -x_2) = 0
• OR gate (X_1, X_2 point to Y):
– P(+y | +x_1, +x_2) = 1
– P(+y | -x_1, +x_2) = 1
– P(+y | +x_1, -x_2) = 1
– P(+y | -x_1, -x_2) = 0

Modeling satisfiability as a Bayes net

• Formula: (+X_1 OR -X_2) AND (-X_1 OR -X_2 OR -X_3)
• P(+x_1) = 1/2, P(+x_2) = 1/2, P(+x_3) = 1/2
• Y = -X_1 OR -X_2:
– P(+y | +x_1, +x_2) = 0, P(+y | -x_1, +x_2) = 1, P(+y | +x_1, -x_2) = 1, P(+y | -x_1, -x_2) = 1
• C_1 = +X_1 OR -X_2:
– P(+c_1 | +x_1, +x_2) = 1, P(+c_1 | -x_1, +x_2) = 0, P(+c_1 | +x_1, -x_2) = 1, P(+c_1 | -x_1, -x_2) = 1
• C_2 = Y OR -X_3 (i.e., -X_1 OR -X_2 OR -X_3):
– P(+c_2 | +y, +x_3) = 1, P(+c_2 | -y, +x_3) = 0, P(+c_2 | +y, -x_3) = 1, P(+c_2 | -y, -x_3) = 1
• formula node F = C_1 AND C_2:
– P(+f | +c_1, +c_2) = 1, P(+f | -c_1, +c_2) = 0, P(+f | +c_1, -c_2) = 0, P(+f | -c_1, -c_2) = 0
• P(+f) ≠ 0 iff the formula is satisfiable, so inference is NP-hard
• P(+f) = (number of satisfying assignments)/2^n, so inference is #P-hard (because counting the number of satisfying assignments is)

More about conditional independence

• A node is conditionally independent of its non-descendants, given its parents
• A node is conditionally independent of everything else in the graph, given its parents, children, and children's parents (its Markov blanket)
[figure: the dog-wet network again]
• N is independent of G given R
• N is not independent of G given R and D
• N is independent of S given R, G, D
• Note: we can't know for sure that two nodes are not independent: edges may be dummy edges
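The three independence claims above can be checked numerically. Below is a brute-force sketch (the helpers joint, prob, and indep are my own, not from the slides) that builds the full joint distribution from the CPTs and tests whether P(x, y | z) = P(x | z) P(y | z) for all value combinations, in the cross-multiplied form that avoids dividing by zero.

```python
from itertools import product

# Same CPTs as in the elimination sketch above; True stands for '+', False for '-'.
P_r = {True: 0.2, False: 0.8}
P_s = {True: 0.6, False: 0.4}
P_g = {(True, True): 0.9, (True, False): 0.7, (False, True): 0.8, (False, False): 0.2}
P_n = {True: 0.3, False: 0.4}
P_d = {(True, True): 0.9, (True, False): 0.4, (False, True): 0.5, (False, False): 0.3}

VARS = "rsgnd"

def joint(r, s, g, n, d):
    """P(r, s, g, n, d): the product of the network's CPT entries."""
    pg = P_g[(r, s)] if g else 1 - P_g[(r, s)]
    pn = P_n[r] if n else 1 - P_n[r]
    pd = P_d[(n, g)] if d else 1 - P_d[(n, g)]
    return P_r[r] * P_s[s] * pg * pn * pd

def prob(assign):
    """Marginal probability of a partial assignment, by brute-force summation."""
    return sum(joint(*vals) for vals in product([True, False], repeat=5)
               if all(dict(zip(VARS, vals))[v] == t for v, t in assign.items()))

def indep(x, y, given):
    """Test X indep. of Y given `given`: P(x,y,z)P(z) = P(x,z)P(y,z) everywhere."""
    for zvals in product([True, False], repeat=len(given)):
        z = dict(zip(given, zvals))
        for xv, yv in product([True, False], repeat=2):
            lhs = prob({x: xv, y: yv, **z}) * prob(z)
            rhs = prob({x: xv, **z}) * prob({y: yv, **z})
            if abs(lhs - rhs) > 1e-12:
                return False
    return True

print(indep("n", "g", ["r"]))             # True:  N independent of G given R
print(indep("n", "g", ["r", "d"]))        # False: D is a common child of N and G
print(indep("n", "s", ["r", "g", "d"]))   # True:  {R, G, D} is N's Markov blanket
```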
General criterion: d-separation

• Sets of variables X and Y are conditionally independent given variables in Z if all paths between X and Y are blocked; a path is blocked if one of the following holds:
– it contains U → V → W, or U ← V ← W, or U ← V → W, and V is in Z
– it contains U → V ← W, and neither V nor any of its descendants is in Z
[figure: the dog-wet network again]
• N is independent of G given R
• N is not independent of S given R and D
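To close, here is a short sketch of the d-separation test exactly as stated above, for the dog-wet network (the function names and edge encoding are mine). It enumerates all simple undirected paths between two nodes and checks each inner node against the two blocking rules.

```python
# Directed edges of the network: R -> G, S -> G, R -> N, N -> D, G -> D.
EDGES = {("R", "G"), ("S", "G"), ("R", "N"), ("N", "D"), ("G", "D")}

def descendants(v):
    """All nodes reachable from v along directed edges."""
    out, stack = set(), [v]
    while stack:
        u = stack.pop()
        for a, b in EDGES:
            if a == u and b not in out:
                out.add(b)
                stack.append(b)
    return out

def paths(x, y):
    """Yield all simple undirected paths from x to y as lists of nodes."""
    def extend(path):
        if path[-1] == y:
            yield path
            return
        for a, b in EDGES:
            for nxt in ((b,) if a == path[-1] else (a,) if b == path[-1] else ()):
                if nxt not in path:
                    yield from extend(path + [nxt])
    yield from extend([x])

def blocked(path, Z):
    """Apply the two blocking rules from the slide to every inner node V."""
    for u, v, w in zip(path, path[1:], path[2:]):
        collider = (u, v) in EDGES and (w, v) in EDGES   # U -> V <- W
        if not collider and v in Z:
            return True    # chain or fork with V observed
        if collider and v not in Z and not (descendants(v) & Z):
            return True    # collider with V and all its descendants unobserved
    return False

def d_separated(x, y, Z):
    return all(blocked(p, set(Z)) for p in paths(x, y))

print(d_separated("N", "G", {"R"}))        # True:  N independent of G given R
print(d_separated("N", "S", {"R", "D"}))   # False: N not indep. of S given R, D
```

Both printed answers agree with the examples on the slide.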