Bayesian Networks

Outline

• Introduction to Bayesian Networks

• Conditional probability and Bayes’ Theorem

• Analyzing a Bayesian Network

• Practical Uses for Bayesian Networks

• Conclusion

• Resources

Why use BNs?

• There are countless real-world examples where the probability of one event depends on the outcome of an earlier one.

• Typical applications: decision aids, data fusion, diagnostic aids, classification, natural language disambiguation, data mining.

What is BN?

• Directed Acyclic Graph

• Graphical formalism to represent dependencies between random variables

• Nodes are random variables

What is BN?

[Figure: network with arcs Train strike → Martin late and Train strike → Norman late]

What is BN?

Key features of BN:

• Enable us to model and reason about uncertainty.

• For example:
  • A train strike does not imply that Norman will definitely be late (he might leave early and drive),
  • but there is an increased probability that he will be late.

• We model this by filling in a probability table for each node: the Conditional Probability Table (CPT).

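What is BN?

[Figure: network with arcs Train strike → Martin late and Train strike → Norman late, each node annotated with its probability table]

P(Train strike)
True     0.1
False    0.9

P(Martin late | Train strike)
               Train strike
Martin late    True    False
True           0.6     0.5
False          0.4     0.5

P(Norman late | Train strike)
               Train strike
Norman late    True    False
True           0.8     0.1
False          0.2     0.9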

Conditional probability

• Recall: P(A|B) is the probability of event A, given event B.

• Lemma: P(AB) = P(A|B)P(B)

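• Bayes’ Theorem

$$P(A \mid B) = \frac{P(AB)}{P(B)}$$

For events $A_1, \dots, A_S$ that partition the sample space:

$$P(A_k \mid B) = \frac{P(A_k B)}{P(B)} = \frac{P(A_k)\,P(B \mid A_k)}{\sum_{i=1}^{S} P(A_i)\,P(B \mid A_i)}$$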

Example of Bayes’ Theorem

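• Suppose there are three possible routes from Taipei Main Station to the World Trade Center: Xinyi Road (A), Zhongxiao East Road (B), and Nanjing East Road (C). The probabilities that each route is chosen are P(A) = 0.3, P(B) = 0.4, P(C) = 0.3. The probability of a traffic jam is 0.4 on Xinyi Road, 0.5 on Zhongxiao East Road, and 0.3 on Nanjing East Road. Which route should be taken to avoid a traffic jam?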

Example of Bayes’ Theorem

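• Answer: P(A) = 0.3, P(B) = 0.4, P(C) = 0.3.

Let G denote a traffic jam. P(G|A) = 0.4 is the probability of a jam on Xinyi Road, P(G|B) = 0.5 the probability of a jam on Zhongxiao East Road, and P(G|C) = 0.3 the probability of a jam on Nanjing East Road.

$$P(A \mid G) = \frac{0.3 \times 0.4}{(0.3 \times 0.4) + (0.4 \times 0.5) + (0.3 \times 0.3)} = \frac{12}{41}$$

$$P(B \mid G) = \frac{0.4 \times 0.5}{(0.3 \times 0.4) + (0.4 \times 0.5) + (0.3 \times 0.3)} = \frac{20}{41}$$

$$P(C \mid G) = \frac{0.3 \times 0.3}{(0.3 \times 0.4) + (0.4 \times 0.5) + (0.3 \times 0.3)} = \frac{9}{41}$$

P(C|G) is the smallest of the three, and P(G|C) = 0.3 is also the lowest jam probability, so Nanjing East Road (C) is the route least likely to be jammed.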
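The route example can be checked with a few lines of Python. This is a minimal sketch: the helper name `posterior` and the dictionary layout are illustrative choices, not part of the original slides, and exact fractions are used so the results are not blurred by rounding.

```python
from fractions import Fraction

def posterior(priors, likelihoods):
    """Return P(A_k | B) for every k, given P(A_k) and P(B | A_k).

    Hypothetical helper: applies Bayes' theorem over a partition A_1..A_S.
    """
    joint = {k: priors[k] * likelihoods[k] for k in priors}  # P(A_k) P(B | A_k)
    evidence = sum(joint.values())                           # P(B), total probability
    return {k: joint[k] / evidence for k in joint}

# Route priors and jam probabilities from the example.
priors = {"A": Fraction(3, 10), "B": Fraction(4, 10), "C": Fraction(3, 10)}
jam_given_route = {"A": Fraction(4, 10), "B": Fraction(5, 10), "C": Fraction(3, 10)}

route_given_jam = posterior(priors, jam_given_route)
for route, p in route_given_jam.items():
    print(route, p, round(float(p), 3))
```

The three posteriors sum to 1, which is a quick sanity check on any hand calculation of this kind.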

Analyzing a Bayesian Network

P(Train strike)
T    0.1
F    0.9

Calculate the probability that Norman is late:

p(Norman late) = p(Norman late | train strike) * p(train strike) + p(Norman late | no train strike) * p(no train strike)

= (0.8 * 0.1) + (0.1 * 0.9) = 0.17 (marginal probability)

Similarly, the marginal probability that Martin is late = (0.6 * 0.1) + (0.5 * 0.9) = 0.51

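[Figure: network with arcs Train strike → Martin late and Train strike → Norman late]

P(Martin late | Train strike)
               Train strike
Martin late    T      F
T              0.6    0.5
F              0.4    0.5

P(Norman late | Train strike)
               Train strike
Norman late    T      F
T              0.8    0.1
F              0.2    0.9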
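The two marginal probabilities can be reproduced as follows. This is a small sketch; the function name `marginal` and the dictionary encoding of the tables (keyed by whether there is a train strike) are assumptions made for illustration.

```python
# P(train strike) and the two conditional tables, keyed by "is there a strike?".
p_strike = 0.1
norman_late_given_strike = {True: 0.8, False: 0.1}
martin_late_given_strike = {True: 0.6, False: 0.5}

def marginal(cpt, p_parent_true):
    """Sum the child's probability over both states of its single parent."""
    return cpt[True] * p_parent_true + cpt[False] * (1.0 - p_parent_true)

p_norman_late = marginal(norman_late_given_strike, p_strike)
p_martin_late = marginal(martin_late_given_strike, p_strike)

print(f"p(Norman late) = {p_norman_late:.2f}")
print(f"p(Martin late) = {p_martin_late:.2f}")
```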

Analyzing a Bayesian Network

• Revising probabilities

Suppose that we do not know if there is a train strike but do know that Norman is late. Given:

p(Norman late | train strike) = 0.8, p(train strike) = 0.1, p(Norman late) = 0.17,

determine:

a) the (revised) probability that there is a train strike; and

b) the (revised) probability that Martin will be late.

Analyzing a Bayesian Network

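• To calculate a) we use Bayes’ theorem:

$$p(\text{train strike} \mid \text{Norman late}) = \frac{p(\text{Norman late} \mid \text{train strike})\; p(\text{train strike})}{p(\text{Norman late})} = \frac{0.8 \times 0.1}{0.17} \approx 0.47$$

The observation that Norman is late significantly increases the probability that there is a train strike (up from 0.1 to 0.47).

• To calculate b) we repeat the marginal calculation with the revised strike probability. This is valid because Martin and Norman are conditionally independent given the train strike:

p(Martin late | Norman late) = p(Martin late | train strike) * p(train strike | Norman late) + p(Martin late | no train strike) * p(no train strike | Norman late)

= (0.6 * 0.47) + (0.5 * 0.53) = 0.55

The observation that Norman is late has slightly increased the probability that Martin is late (up from 0.51 to 0.55). Entering evidence and using it to update the probabilities is called propagation.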
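Both revised probabilities can be verified numerically. The sketch below is illustrative only: the variable names are made up for this example, and part b) is computed by summing the joint distribution over the strike variable, which gives the same result as plugging the revised strike probability into the marginal formula.

```python
p_strike = 0.1
p_norman = {True: 0.8, False: 0.1}   # p(Norman late | strike?)
p_martin = {True: 0.6, False: 0.5}   # p(Martin late | strike?)

def prior(strike):
    """p(train strike) or p(no train strike)."""
    return p_strike if strike else 1.0 - p_strike

# a) Bayes' theorem: p(strike | Norman late).
p_norman_late = sum(p_norman[s] * prior(s) for s in (True, False))
p_strike_given_norman = p_norman[True] * prior(True) / p_norman_late

# b) p(Martin late | Norman late), summing the joint over the strike variable.
p_martin_and_norman = sum(p_martin[s] * p_norman[s] * prior(s) for s in (True, False))
p_martin_given_norman = p_martin_and_norman / p_norman_late

print(f"p(strike | Norman late)      = {p_strike_given_norman:.2f}")
print(f"p(Martin late | Norman late) = {p_martin_given_norman:.2f}")
```

Summing over the joint avoids carrying the rounded value 0.47 into the second step, so the two answers agree with the hand calculation to two decimal places.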

Example of Bayesian Network

• I'm at work, neighbor John calls to say my alarm is ringing, but neighbor Mary doesn't call. Sometimes it's set off by minor earthquakes. Is there a burglar?

• Variables: Burglary, Earthquake, Alarm, JohnCalls, MaryCalls

• Network topology reflects "causal" knowledge:
  • A burglar can set the alarm off
  • An earthquake can set the alarm off
  • The alarm can cause Mary to call
  • The alarm can cause John to call

Example contd.

• Chain rule:
  • P(A,B) = P(A|B) P(B)
  • => P(A,B,C) = P(A|B,C) P(B,C) = P(A|B,C) P(B|C) P(C)
  • P(A1, A2, ..., An) = P(A1|A2, ..., An) P(A2|A3, ..., An) ... P(An-1|An) P(An)

• Joint probability distribution in BBN
  • {A1,A2,…,An}: a set of variables in BN
  • parents(Ai): the set of parents of the node Ai in BN.
  • The joint probability distribution for {A1,A2,…,An} is

$$P(A_1, \dots, A_n) = \prod_{i=1}^{n} P(A_i \mid \mathrm{parents}(A_i))$$

Example contd.

• e.g., the alarm has sounded, but neither a burglary nor an earthquake has occurred, and both John and Mary call.

P(j, m, a, ¬b, ¬e)

= P(j | a) P(m | a) P(a | ¬b, ¬e) P(¬b) P(¬e)

= 0.90 x 0.70 x 0.001 x 0.999 x 0.998 ≈ 0.000628
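The product above is easy to confirm. The sketch uses only the five factor values quoted in this example; the dictionary keys are descriptive labels chosen here, and the full probability tables of the alarm network are not reproduced.

```python
import math

# One factor per node, each conditioned on its parents in the alarm network.
factors = {
    "P(j | a)": 0.90,
    "P(m | a)": 0.70,
    "P(a | not b, not e)": 0.001,
    "P(not b)": 0.999,
    "P(not e)": 0.998,
}

p_joint = math.prod(factors.values())
print(f"P(j, m, a, not b, not e) = {p_joint:.6f}")
```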

Example contd.

• The probability that someone has a smoking history, has lung cancer but not bronchitis, suffers from fatigue, and tests positive in an X-ray test is:

$$P(s, \neg b, l, f, x) = 0.2 \times 0.75 \times 0.003 \times 0.5 \times 0.6 = 0.000135$$

[Figure: network with arcs Smoking history → Bronchitis, Smoking history → Lung Cancer, Bronchitis → Fatigue, Lung Cancer → Fatigue, Lung Cancer → X-ray]

P(s) = 0.2

P(b | s) = 0.25, P(b | ¬s) = 0.05

P(l | s) = 0.003, P(l | ¬s) = 0.00005

P(f | b, l) = 0.75, P(f | b, ¬l) = 0.10, P(f | ¬b, l) = 0.5, P(f | ¬b, ¬l) = 0.05

P(x | l) = 0.6, P(x | ¬l) = 0.02

$$P(s, b, l, f, x) = P(s)\,P(b \mid s)\,P(l \mid s)\,P(f \mid b, l)\,P(x \mid l)$$
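The factorization for this network can be written directly as code. This is a sketch under the assumption that the probability tables read as P(b | s) = 0.25, P(b | ¬s) = 0.05, P(l | s) = 0.003, P(l | ¬s) = 0.00005, P(f | b, l) = 0.75, P(f | b, ¬l) = 0.10, P(f | ¬b, l) = 0.5, P(f | ¬b, ¬l) = 0.05, P(x | l) = 0.6, P(x | ¬l) = 0.02; the names `bern` and `joint` are illustrative.

```python
from itertools import product

# Probability that each node is True, indexed by the values of its parents.
P_S = 0.2
P_B = {True: 0.25, False: 0.05}                  # keyed by s
P_L = {True: 0.003, False: 0.00005}              # keyed by s
P_F = {(True, True): 0.75, (True, False): 0.10,  # keyed by (b, l)
       (False, True): 0.5, (False, False): 0.05}
P_X = {True: 0.6, False: 0.02}                   # keyed by l

def bern(p_true, value):
    """Probability of `value` for a binary variable with P(True) = p_true."""
    return p_true if value else 1.0 - p_true

def joint(s, b, l, f, x):
    """P(s, b, l, f, x) = P(s) P(b|s) P(l|s) P(f|b,l) P(x|l)."""
    return (bern(P_S, s) * bern(P_B[s], b) * bern(P_L[s], l)
            * bern(P_F[(b, l)], f) * bern(P_X[l], x))

# Smoking history, no bronchitis, lung cancer, fatigue, positive X-ray.
p_example = joint(True, False, True, True, True)
total = sum(joint(*values) for values in product([True, False], repeat=5))
print(f"{p_example:.6f}", round(total, 12))
```

Summing `joint` over all 32 assignments should give 1, which confirms that the product of the local tables defines a proper distribution.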

Why do we need a BBN for the probability computations?

• A network consisting of five variables (nodes) A, B, C, D, E.

• No specification of the dependencies.

• Applying the chain rule, we get
  p(A,B,C,D,E) = p(A|B,C,D,E) * p(B|C,D,E) * p(C|D,E) * p(D|E) * p(E)

• Now suppose that the dependencies are explicitly modeled in a BN as:

[Figure: network with arcs B → A, C → B, E → B, D → C]

• Then the joint probability distribution is
  p(A,B,C,D,E) = p(A|B) * p(B|C,E) * p(C|D) * p(D) * p(E)
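One way to see what the explicit dependencies buy is to count the independent probabilities that must be specified. The sketch below assumes all five variables are binary (the slides do not say so) and reads the parent sets off the factorization p(A|B) * p(B|C,E) * p(C|D) * p(D) * p(E); the function names are illustrative.

```python
# Parent sets implied by the factorization above.
parents = {"A": ["B"], "B": ["C", "E"], "C": ["D"], "D": [], "E": []}

def bn_parameter_count(parent_sets):
    """Each binary node needs one free probability per parent configuration."""
    return sum(2 ** len(ps) for ps in parent_sets.values())

def full_joint_parameter_count(n_variables):
    """An unstructured joint over n binary variables has 2**n - 1 free entries."""
    return 2 ** n_variables - 1

print("Bayesian network:", bn_parameter_count(parents))
print("Full joint table:", full_joint_parameter_count(len(parents)))
```

Under these assumptions the network needs 10 numbers where the unstructured joint needs 31, and the gap grows exponentially with the number of variables.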

Dealing with a large number of variables

• It is tricky to work out all the probabilities and the revised probabilities by hand.

• Can BN be used to solve realistic problems?
  • Yes, with the introduction of software tools that
    • implement the propagation algorithms
    • provide a graphical interface to draw the graphs and fill in the probability tables

• BN tools:

• Hugin (http://www.hugin.com/)

• Bayesian Knowledge Discoverer (BKD)

• Norsys (http://www.norsys.com/)

Practical Uses for Bayesian Networks

• AutoClass
  • Automatically interpolates raw data from interplanetary probes and deep space explorations.

• For more information, see http://ic-www.arc.nasa.gov/ic/projects/bayes-group/autoclass/index.html

Practical Uses for Bayesian Networks

• Lumiere
  • Project created by Microsoft

• Resulted in the "Office Assistant", introduced with the Office 97 suite of desktop products.

• Foundation:
  • “Inferring Informational Goals from Free-Text Queries: A Bayesian Approach.” David Heckerman and Eric Horvitz, Decision Theory & Adaptive Systems Group, Microsoft Research, Redmond, Washington 98052-6399.

• Describes a Bayesian approach to modeling the relationship between the words in a user's query for assistance and the informational goals of the user.

Practical Uses for Bayesian Networks

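• The Development of Communication and Data Mining System for Medical Image
  • A database was built for Pap smear examinations.
  • Case data from patients with abnormal smears were fed into a Bayesian network for training.
  • The study found that the Bayesian network gave the best diagnostic results on tongue-diagnosis image analysis data from patients with upper gastrointestinal diseases.

• Source:

The Development of Communication and Data Mining System for Medical Image, by 范振添, Department of Biomedical Engineering, Chung Yuan Christian University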

Practical Uses for Bayesian Networks

• An Agent-based Bayesian Forecasting Model for Enhanced Network Security
  • Provides a novel application of the Bayesian forecasting technique to predict user actions.
  • Bayesian Intrusion Detection System
  • Invalid behavior is determined by comparing
    • the user's current behavior with their typical behavior
    • the user's current behavior with a set of general rules governing valid behavior, formed by systems administrators

Source: Pikoulas, J.; Buchanan, W.J.; Mannion, M.; Triantafyllopoulos, K. An agent-based Bayesian forecasting model for enhanced network security. Engineering of Computer Based Systems, 2001 (ECBS 2001), Proceedings of the Eighth Annual IEEE International Conference and Workshop, 17-20 April 2001, pp. 247-254.

Practical Uses for Bayesian Networks

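• Student assessment model based on Bayesian method
  • By 鄭乃塵, Learning Technology Lab, Graduate Institute of Computer Science and Engineering, Yuan Ze University
  • Advisor: Prof. 劉晨鐘

• Dynamic Software Project Management using Bayesian Belief Network
  • By 陳建蒝, Software Engineering Lab, Graduate Institute of Computer Science and Engineering, Yuan Ze University
  • Advisor: Prof. 范金鳳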

Conclusion

• Offer assistance in a wide range of endeavors.

• Support the use of probabilistic inference to update and revise belief values.

• Support complex inference modeling, including rational decision-making systems, value of information, and sensitivity analysis.

• Are useful for causality analysis and, through statistical induction, support a form of automated learning.

Resources

• Papers about BBNs

• A Roadmap to Research on Bayesian Networks and other Decomposable Probabilistic Models - hosted by Lonnie Chrisman, School of Computer Science, Pittsburgh.

• Special Issue on Bayesian Networks: Communications of the ACM., March, 1995, vol 38, no. 3.

• Special Issue on Data Mining: Communications of the ACM., November, 1996, vol 39, no. 11.

• D. Heckerman. A tutorial on learning Bayesian networks. Technical Report MSR-TR-95-06, Microsoft Research, March, 1995.

Resources

• BBN projects

• IMPRESS (IMproving the software PRocESS using bayesian nets) EPSRC Project GR/L06683. 1 Jan 1997 - 31 Dec 1999.

• SERENE (SafEty and Risk Evaluation using Bayesian NEts) ESPRIT Framework IV Collaborative Project 22187. 1 June 1996 - 1 June 1999.

• TRACS DERA contract for CSR, LSF/E20173. Sept 1996 - February 1999.

• European Union, Project INTAS 93-725: Multivariate Statistical Analysis and Bayesian Belief Networks for the Development of Intelligent Decision Support Systems with Applications in Medicine and Agriculture.