How Bayesian Networks Work

Dr. Benjamin Clark, United States, Teacher
Published: 21-07-2017
CSE 473, Chapter 14: Bayesian Networks
© CSE AI Faculty

What are Bayesian networks?
• A simple, graphical notation for conditional independence assertions
• Allows compact specification of full joint distributions
• Syntax:
  - a set of nodes, one per random variable
  - a directed, acyclic graph (a link ≈ "directly influences")
  - a conditional distribution for each node given its parents: P(X_i | Parents(X_i))
• For discrete variables, the conditional distribution is a conditional probability table (CPT): a distribution over X_i for each combination of parent values

Back at the Dentist's
• The topology of the network encodes conditional independence assertions:
  - Weather is independent of the other variables
  - Toothache and Catch are conditionally independent of each other given Cavity

Example 2: Burglars and Earthquakes
• You are at a "Done with 473" party at a friend's.
• Neighbor John calls to say your home alarm is ringing (but neighbor Mary doesn't).
• Sometimes your alarm is set off by minor earthquakes.
• Question: Is your home being burglarized?
• Variables: Burglary, Earthquake, Alarm, JohnCalls, MaryCalls
• Network topology reflects "causal" knowledge:
  - A burglar can set the alarm off
  - An earthquake can set the alarm off
  - The alarm can cause Mary to call
  - The alarm can cause John to call

Burglars and Earthquakes
(Network diagram: Burglary and Earthquake are parents of Alarm; Alarm is the parent of JohnCalls and MaryCalls.)

Compact Representation of Probabilities in Bayesian Networks
• A CPT for a Boolean X_i with k Boolean parents has 2^k rows, one for each combination of parent values
• Each row requires one number p for X_i = true (the number for X_i = false is just 1 − p)
• If each variable has no more than k parents, the complete network requires O(n · 2^k) numbers, i.e., it grows linearly with n, vs. O(2^n) for the full joint distribution
• For our network: 1 + 1 + 4 + 2 + 2 = 10 numbers (vs. 2^5 − 1 = 31)

Semantics
The full joint distribution is defined as the product of the local conditional distributions:

  P(X_1, …, X_n) = ∏_{i=1}^{n} P(X_i | Parents(X_i))

e.g., P(j ∧ m ∧ a ∧ ¬b ∧ ¬e) = P(j | a) P(m | a) P(a | ¬b, ¬e) P(¬b) P(¬e)

Constructing Bayesian networks
1. Choose an ordering of variables X_1, …, X_n
2. For i = 1 to n:
   - add X_i to the network
   - select parents from X_1, …, X_{i−1} such that P(X_i | Parents(X_i)) = P(X_i | X_1, …, X_{i−1})

This choice of parents guarantees:

  P(X_1, …, X_n) = P(X_n | X_1, …, X_{n−1}) P(X_1, …, X_{n−1})
                 = P(X_n | X_1, …, X_{n−1}) P(X_{n−1} | X_1, …, X_{n−2}) P(X_1, …, X_{n−2})
                 = ∏_{i=1}^{n} P(X_i | X_1, …, X_{i−1})     (chain rule)
                 = ∏_{i=1}^{n} P(X_i | Parents(X_i))        (by construction)

Example
• Suppose we choose the ordering M, J, A, B, E:
  - P(J | M) = P(J)? No
  - P(A | J, M) = P(A | J)? No
  - P(A | J, M) = P(A)? No
  - P(B | A, J, M) = P(B)? No
  - P(B | A, J, M) = P(B | A)? Yes
  - P(E | B, A, J, M) = P(E | A)? No
  - P(E | B, A, J, M) = P(E | A, B)? Yes

Example contd.: Non-causal vs. Causal
• Deciding conditional independence is hard in non-causal directions
• Causal models and conditional independence seem hardwired for humans (recent Cog Sci research)
• The non-causal network is less compact: 1 + 2 + 4 + 2 + 4 = 13 numbers (vs. 1 + 1 + 4 + 2 + 2 = 10 numbers)

Lesson: Add nodes representing "root causes" first, then the variables they influence, and so on. Keep it causal, baby.

Probabilistic Inference in BNs
• The graphical independence representation yields efficient inference schemes
• We generally want to compute P(X | E), where E is evidence from sensory measurements etc. (known values for variables). Sometimes we may want to compute just P(X)
• One simple algorithm: variable elimination (VE)

P(B | J=true, M=true)
(Network: Burglary, Earthquake → Alarm → JohnCalls, MaryCalls)

  P(b | j, m) = α Σ_{e,a} P(b, j, m, e, a)
              = α P(b) Σ_e P(e) Σ_a P(a | b, e) P(j | a) P(m | a)

Structure of Computation
• Repeated computations ⇒ use dynamic programming

Variable Elimination
• A factor is a function from some set of variables to a specific value, e.g., f(E, A, Mary). CPTs are factors, e.g., P(A | E, B) is a function of A, E, B
• VE works by eliminating all variables in turn until there is a factor with only the query variable
• To eliminate a variable:
  1. join all factors containing that variable (as in DBs/SQL), multiplying probabilities
  2. sum out the influence of the variable on the new factor

Example of VE: P(J)

  P(J) = Σ_{M,A,B,E} P(J, M, A, B, E)
       = Σ_{M,A,B,E} P(J | A) P(M | A) P(B) P(A | B, E) P(E)
       = Σ_A P(J | A) Σ_M P(M | A) Σ_B P(B) Σ_E P(A | B, E) P(E)
       = Σ_A P(J | A) Σ_M P(M | A) Σ_B P(B) f1(A, B)
       = Σ_A P(J | A) Σ_M P(M | A) f2(A)
       = Σ_A P(J | A) f3(A)
       = f4(J)

Other Inference Algorithms
• Direct Sampling: Repeat N times:
  - Use a random number generator to generate sample values for each node
  - Start with nodes with no parents
  - Condition on sampled parent values for the other nodes
  Count frequencies of samples to get an approximation to the joint distribution
• Other variants: rejection sampling, likelihood weighting, Gibbs sampling and other MCMC methods (see text)
• Belief Propagation: a "message passing" algorithm for approximating P(X | evidence) for each node variable X
• Variational Methods: approximate inference using distributions that are more tractable than the original ones

Summary
• Bayesian networks provide a natural way to represent conditional independence
• Network topology + CPTs = a compact representation of the joint distribution
• Generally easy for domain experts to construct
• BNs allow inference algorithms such as VE that are efficient in many cases

Next Time
• Machine Learning
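The factorization in the Semantics slide can be checked numerically. Below is a minimal Python sketch of the burglary network; the CPT values are illustrative assumptions, since the slides show only the topology and the parameter counts (1 + 1 + 4 + 2 + 2 = 10 numbers):

```python
from itertools import product

# Illustrative CPT values (assumed; the slides give only the structure).
P_B = 0.001                                          # P(Burglary = true): 1 number
P_E = 0.002                                          # P(Earthquake = true): 1 number
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(Alarm = true | B, E): 4 numbers
P_J = {True: 0.90, False: 0.05}                      # P(JohnCalls = true | A): 2 numbers
P_M = {True: 0.70, False: 0.01}                      # P(MaryCalls = true | A): 2 numbers

def bern(p_true, val):
    """P(X = val) for a Boolean X with P(X = true) = p_true."""
    return p_true if val else 1.0 - p_true

def joint(b, e, a, j, m):
    """P(B, E, A, J, M) as the product of the local conditional distributions."""
    return (bern(P_B, b) * bern(P_E, e) * bern(P_A[(b, e)], a)
            * bern(P_J[a], j) * bern(P_M[a], m))

# The slide's example: P(j ∧ m ∧ a ∧ ¬b ∧ ¬e) = P(j|a) P(m|a) P(a|¬b,¬e) P(¬b) P(¬e)
p_example = joint(False, False, True, True, True)

# Sanity check: the 2^5 = 32 joint entries sum to 1.
total = sum(joint(*vals) for vals in product([True, False], repeat=5))
```

With these assumed numbers, `p_example` equals 0.999 · 0.998 · 0.001 · 0.90 · 0.70, roughly 6.3 × 10⁻⁴; the point is that 10 stored numbers suffice to reconstruct any of the 31 free joint probabilities.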
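The query P(B | J=true, M=true) from the inference slides can likewise be evaluated straight from the formula P(b | j, m) = α P(b) Σ_e P(e) Σ_a P(a | b, e) P(j | a) P(m | a); VE improves on this enumeration by caching the intermediate factors. A sketch, again with assumed CPT values:

```python
# Illustrative CPT values (assumed; the slides give only the structure).
P_B = 0.001
P_E = 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}   # P(JohnCalls = true | A)
P_M = {True: 0.70, False: 0.01}   # P(MaryCalls = true | A)

def bern(p_true, val):
    return p_true if val else 1.0 - p_true

def unnormalized(b):
    """P(b) * sum_e P(e) * sum_a P(a|b,e) P(j|a) P(m|a), with j = m = true."""
    return bern(P_B, b) * sum(
        bern(P_E, e) * sum(
            bern(P_A[(b, e)], a) * P_J[a] * P_M[a]
            for a in (True, False))
        for e in (True, False))

u_true, u_false = unnormalized(True), unnormalized(False)
posterior = u_true / (u_true + u_false)   # alpha is just this normalization
```

With these numbers the posterior P(b | j, m) comes out near 0.28: both neighbors calling raises the burglary probability far above the 0.001 prior, but an alarm with no burglary still explains most of the evidence.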
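The Direct Sampling procedure from the Other Inference Algorithms slide (sample roots first, then condition each child on its sampled parent values) can be sketched as follows; the CPT values are the same assumed numbers as above:

```python
import random

random.seed(0)

# Illustrative CPT values (assumed; the slides give only the structure).
P_B = 0.001
P_E = 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}
P_M = {True: 0.70, False: 0.01}

def sample_once():
    """One pass of direct (prior) sampling: visit parents before children."""
    b = random.random() < P_B          # nodes with no parents first
    e = random.random() < P_E
    a = random.random() < P_A[(b, e)]  # condition on sampled parent values
    j = random.random() < P_J[a]
    m = random.random() < P_M[a]
    return b, e, a, j, m

N = 50_000
samples = [sample_once() for _ in range(N)]

# Counting sample frequencies approximates the joint distribution;
# here we estimate the marginal P(JohnCalls = true).
p_j_hat = sum(s[3] for s in samples) / N
```

Rejection sampling and likelihood weighting, mentioned on the same slide, are variants of this loop that respectively discard or down-weight samples inconsistent with the evidence.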
