Bayesian Networks and Decision Graphs
A 3-week course at Reykjavik University
Finn V. Jensen & Uffe Kjærulff ({fvj,uk}@cs.aau.dk)
Group of Machine Intelligence
Department of Computer Science, Aalborg University
26 April – 13 May, 2005
Outline
1 Introduction
2 Contents of Course
3 Causal Networks and Relevance Analysis
4 Bayesian Probability Theory
Reykjavik University, April/May 2005, BNs and DGs: Introduction 2
Introduction
1 Artificial Intelligence
2 Expert Systems
3 Normative Expert Systems
4 Sample Application Areas
Artificial Intelligence / Machine Intelligence
What is artificial intelligence?
A device or service that
reasons and makes decisions under uncertainty,
extracts knowledge from data/experience, and
solves problems efficiently and adapts to new situations.
Why use artificial intelligence?
Automate tasks.
Automate reasoning and decision making.
Extract knowledge and information from data.
Expert Systems
The first expert systems were constructed in the late 1960s.
Expert System = Knowledge Base + Inference Engine
The first expert systems were constructed as computer models of the expert, e.g. production rules like:
if condition, then fact
if condition, then action
In most systems there is a need for handling uncertainty:
if condition with certainty x, then fact with certainty f(x)
The algebras for combining certainty factors are not mathematically coherent and can lead to incorrect conclusions.
Normative Expert Systems
[Figure labels: Observation, Action]
Model the problem domain, not the expert.
Support the expert, don’t substitute the expert.
Use classical probability calculus and decision theory, not a non-coherent uncertainty calculus.
Closed-world representation of a given problem domain (i.e., the domain model assumes some given background conditions or context in which the model is valid).
Motivation for Normative Approach
Some important motivations for using model-based systems:
Procedure-based (extensional) systems are semantically sloppy; model-based (intensional) systems are not.
Speak the language of causality; use a single knowledge base to provide simulation, diagnosis, and prognosis.
Both knowledge and data can be used to construct Bayesian networks.
Adapt to individual settings.
Probabilities make it easy to interface with decision and utility theory.
Bayes’ Rule
$$P(Y \mid X) = \frac{P(X \mid Y)\,P(Y)}{P(X)}$$
Rev. Thomas Bayes (1702–1761), an 18th century minister from England.
The rule, as generalized by Laplace, is the basic starting point for inference problems using probability theory as logic.
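As a small numerical illustration of the rule, the following Python sketch computes P(Y | X) for a binary Y. The function name `posterior` and the numbers (a prior of 0.01, P(X | Y) = 0.9, and P(X | not Y) = 0.05) are hypothetical and chosen only for illustration; they do not come from the course material.

```python
def posterior(prior_y, p_x_given_y, p_x_given_not_y):
    """Return P(Y | X) for a binary Y using Bayes' rule."""
    # P(X) by summing over the two states of Y.
    p_x = p_x_given_y * prior_y + p_x_given_not_y * (1.0 - prior_y)
    return p_x_given_y * prior_y / p_x


# Hypothetical numbers: P(Y) = 0.01, P(X | Y) = 0.9, P(X | not Y) = 0.05.
p = posterior(0.01, 0.9, 0.05)
print(round(p, 4))  # 0.1538
```

Even with a fairly reliable indicator X, the posterior stays modest here because the assumed prior P(Y) is small.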
Example: Chest Clinic
Chest Clinic
Shortness-of-breath (dyspnoea) may be due to tuberculosis, lung cancer or bronchitis, or none of them, or more than one of them. A recent visit to Asia increases the chances of tuberculosis, while smoking is known to be a risk factor for both lung cancer and bronchitis. The results of a single chest X-ray do not discriminate between lung cancer and tuberculosis, as neither does the presence or absence of dyspnoea.
This is a typical diagnostic situation.
Bayesian Decision Problems
Bayesian networks can be augmented with explicit representation of decisions and utilities. Such augmented models are called influence diagrams (or decision networks).
Bayesian decision theory provides a solid foundation for assessing and thinking about actions under uncertainty.
Intuitive, graphical specification of a decision problem.
Automatic determination of an optimal strategy and computation of the maximal expected utility of this strategy.
Example: Oil Wildcatter
Oil Wildcatter
An oil wildcatter must decide either to drill or not to drill. He is uncertain whether the hole is dry, wet, or soaking. The wildcatter could perform a seismic soundings test that will help determine the geological structure of the site. The soundings will give a closed reflection pattern (indication of much oil), an open pattern (indication of some oil), or a diffuse pattern (almost no hope of oil). The cost of testing is $10,000 whereas the cost of drilling is $70,000. The utility of drilling is $270,000, $120,000, and $0 for a soaking, wet, and dry hole, respectively.
This is a typical decision scenario.
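To make the decision scenario concrete, the sketch below computes the expected utility of drilling without the seismic test. Two things are assumptions and not part of the scenario as stated: the prior over the state of the hole (`PRIOR`), and the reading that the listed utilities are gross payoffs from which the drilling cost is subtracted. The test decision is left out to keep the example short.

```python
# Payoffs and drilling cost taken from the scenario text (in dollars).
PAYOFF = {"dry": 0, "wet": 120_000, "soaking": 270_000}
DRILL_COST = 70_000

# ASSUMPTION: a prior over the state of the hole; the scenario gives none.
PRIOR = {"dry": 0.5, "wet": 0.3, "soaking": 0.2}


def expected_utility_of_drilling(prior):
    """Expected net utility of drilling without testing first."""
    # ASSUMPTION: net utility = gross payoff minus drilling cost.
    return sum(p * (PAYOFF[state] - DRILL_COST) for state, p in prior.items())


eu_drill = expected_utility_of_drilling(PRIOR)
eu_skip = 0.0  # not drilling costs nothing and earns nothing
best = "drill" if eu_drill > eu_skip else "do not drill"
print(round(eu_drill), best)
```

Under these assumed numbers the expected net utility of drilling is about $20,000, so drilling beats not drilling. A full influence diagram would also weigh the $10,000 test against the information it provides.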
Normative Systems — Characteristics
Systems based on Bayesian networks and influence diagrams are normative, and have the following characteristics:
Graph representing causal relations.
Strength of relations by probabilities.
Preferences represented by utilities.
Recommendations based on maximizing expected utility.
Normative Systems — Task Categories
The class of tasks suitable for normative systems can be divided into three broad subclasses:
Forecasting:
Computing probability distributions for future events.
Interpretation:
Pattern identification (diagnosis, classification).
Planning:
Generation of optimal sequences of decisions/actions.
Sample Application Areas of Normative Systems
Medical
Software
Info. proc.
Industry
Economy
Military
Agriculture
Mining
Law enforcement
Etc.
Contents of Course
Lecture plan:

| No. | Topic | Date | Lecturer |
|---|---|---|---|
| 1 | Causal networks and Bayesian probability calculus | Tue 26/4 | UK |
| 2 | Construction of Bayesian networks | Wed 27/4 | UK |
| 3 | Workshop: Construction of Bayesian networks | Thu 28/4 | UK |
| 4 | Inference and analyses in Bayesian networks | Fri 29/4 | UK |
| 5 | Decisions, utilities and decision trees | Mon 2/5 | FVJ |
| 6 | Troubleshooting and influence diagrams | Tue 3/5 | FVJ |
| 7 | Solution of influence diagrams | Wed 4/5 | FVJ |
| 8 | Methods for analysing an ID spec. dec. scenario | Fri 6/5 | FVJ |
| 9 | Workshop: Construction of influence diagrams | Mon 9/5 | FVJ |
| 10 | Learning parameters from data | Tue 10/5 | FVJ |
| 11 | Bayesian networks as classifiers | Wed 11/5 | FVJ |
| 12 | Learning the structure of Bayesian networks | Thu 12/5 | FVJ |
| 13 | Continuous variables | Fri 13/5 | FVJ |
The two workshops have a duration of 4 hours.
All lectures start at 10:00.
The plan is subject to changes.
Literature, Requirements, Etc.
Literature:
Finn V. Jensen (2001), Bayesian Networks and Decision Graphs, Springer-Verlag.
Exercises suggested after each lecture.
To pass the course you are required to
attend all lectures and
hand in written answers to home assignments.
A number of the exercises require access to the HUGIN Tool. See the course home page for instructions on downloading and installing the HUGIN Tool.
Home page: www.cs.aau.dk/~uk/teaching/Reykjavik-05/.
Causal Networks and Relevance Analysis
1 Causal networks, variables and DAGs
2 Relevance analysis (transmission of evidence)
Three types of connections
Explaining away
d-separation
Burglary or Earthquake
Burglary or Earthquake
Mr. Holmes is working in his office when he receives a phone call from his neighbor Dr. Watson, who tells him that Holmes’ burglar alarm has gone off. Convinced that a burglar has broken into his house, Holmes rushes to his car and heads for home. On his way, he listens to the radio, and in the news it is reported that there has been a small earthquake in the area. Knowing that earthquakes have a tendency to turn burglar alarms on, he returns to his work.
Burglary or Earthquake: The Model
[Figure: causal network with the edges Burglary → Alarm, Earthquake → Alarm, Earthquake → RadioNews, and Alarm → WatsonCalls]
Each node in the graph represents a random variable.
In this example, each variable has state space {no, yes}.
Three types of connections:
Serial
Diverging
Converging
Serial Connections
“Burglary” has a causal influence on “Alarm”, which in turn has a causal influence on “Watson calls”.
[Figure: Burglary → Alarm → WatsonCalls]
If we observe “Alarm”, any information about the state of “Burglary” is irrelevant to our belief about “Watson calls” and vice versa.
Serial Connections
X has a causal influence on Y, which in turn has a causal influence on Z:
[Figure: X → Y → Z]
Serial Connections
Information may be transmitted through a serial connection unless the state of the variable (Y) in the connection is known.
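The blocking behaviour of a serial connection can be checked numerically. The sketch below builds a chain X → Y → Z over binary variables and compares a few conditional probabilities computed by brute-force enumeration. All probability tables and the helper names `joint` and `prob` are hypothetical, introduced only for this illustration.

```python
from itertools import product

# ASSUMPTION: illustrative probability tables for a chain X -> Y -> Z over {0, 1}.
P_X = {1: 0.1, 0: 0.9}
P_Y_GIVEN_X = {1: {1: 0.95, 0: 0.05}, 0: {1: 0.02, 0: 0.98}}  # indexed [x][y]
P_Z_GIVEN_Y = {1: {1: 0.8, 0: 0.2}, 0: {1: 0.1, 0: 0.9}}      # indexed [y][z]


def joint(x, y, z):
    """P(x, y, z) as the product of the three tables."""
    return P_X[x] * P_Y_GIVEN_X[x][y] * P_Z_GIVEN_Y[y][z]


def prob(query, evidence=None):
    """P(query | evidence); both are dicts such as {"z": 1}."""
    evidence = evidence or {}
    num = den = 0.0
    for x, y, z in product((0, 1), repeat=3):
        world = {"x": x, "y": y, "z": z}
        if all(world[k] == v for k, v in evidence.items()):
            p = joint(x, y, z)
            den += p
            if all(world[k] == v for k, v in query.items()):
                num += p
    return num / den


# With Y unknown, evidence on X changes the belief in Z.
print(prob({"z": 1}), prob({"z": 1}, {"x": 1}))
# With Y known, additional evidence on X no longer matters.
print(prob({"z": 1}, {"y": 1}), prob({"z": 1}, {"y": 1, "x": 1}))
```

The first pair of numbers differs, while the second pair agrees up to floating-point rounding, which is the behaviour stated for a serial connection.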
Diverging Connections
“Earthquake” has a causal influence on both “Alarm” and “Radio news”.
[Figure: Alarm ← Earthquake → RadioNews]
If we observe “Earthquake”, any information about the state of “Alarm” is irrelevant for our belief about an earthquake report in the “Radio news” and vice versa.
Diverging Connections
Y has a causal influence on both X and Z:
[Figure: X ← Y → Z]
Diverging Connections
Information may be transmitted through a diverging connection, unless the state of the variable (Y) in the connection is known.
Converging Connections
“Alarm” is causally influenced by both “Burglary” and “Earthquake”.
[Figure: Burglary → Alarm ← Earthquake]
If we observe “Alarm” and “Burglary”, then this will affect our belief about “Earthquake”: Burglary explains the alarm, reducing our belief that earthquake is the triggering factor, and vice versa.
Converging Connections
Both X and Z have a causal influence on Y:
[Figure: X → Y ← Z]
Converging Connections
Information may only be transmitted through a converging connection if either information about the state of the variable in the connection (Y) or one of its descendants is available.
Transmission of Evidence
Serial
Evidence may be transmitted unless the state of B is known.
[Figure: A → B → C]
Diverging
Evidence may be transmitted unless the state of B is known.
[Figure: B → A1, B → A2, …, B → An]
Converging
Evidence may only be transmitted if B or one of its descendants has received evidence.
[Figure: A1 → B, A2 → B, …, An → B]
It takes hard evidence to block a serial or diverging connection, whereas soft evidence suffices to open a converging connection.
Notice the explaining away effect in a converging connection: if B has been observed and Ai is then observed, Ai explains the observation of B and the other causes are explained away (i.e., the beliefs in them are reduced).
Explaining Away I
[Figure: causal network with the edges Burglary → Alarm, Earthquake → Alarm, Earthquake → RadioNews, and Alarm → WatsonCalls]
The converging connection realizes the explaining away mechanism: the news about the earthquake strongly suggests that the earthquake is the cause of the alarm, and thereby explains away burglary as the cause.
The ability to perform this kind of intercausal reasoning is unique to graphical models and is one of the main differences between automatic reasoning systems based on graphical models and those based on e.g. production rules.
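The explaining away effect can be reproduced with a few lines of enumeration over the converging connection Burglary → Alarm ← Earthquake. In the sketch below, all probabilities are hypothetical, and the earthquake is treated as directly observed (the RadioNews and WatsonCalls nodes are left out to keep the example small).

```python
from itertools import product

# ASSUMPTION: illustrative numbers, not taken from the course material.
P_B = 0.01  # P(Burglary = yes)
P_E = 0.02  # P(Earthquake = yes)
# P(Alarm = yes | Burglary, Earthquake), keyed by (burglary, earthquake).
P_ALARM = {(1, 1): 0.98, (1, 0): 0.95, (0, 1): 0.30, (0, 0): 0.001}


def joint(b, e, a):
    """P(Burglary = b, Earthquake = e, Alarm = a)."""
    pb = P_B if b else 1 - P_B
    pe = P_E if e else 1 - P_E
    pa = P_ALARM[(b, e)] if a else 1 - P_ALARM[(b, e)]
    return pb * pe * pa


def p_burglary(alarm, earthquake=None):
    """P(Burglary = yes | Alarm = alarm), optionally also given Earthquake."""
    num = den = 0.0
    for b, e in product((0, 1), repeat=2):
        if earthquake is not None and e != earthquake:
            continue  # skip worlds that contradict the earthquake evidence
        p = joint(b, e, alarm)
        den += p
        if b:
            num += p
    return num / den


print(round(p_burglary(alarm=1), 3))                # belief after the alarm
print(round(p_burglary(alarm=1, earthquake=1), 3))  # after the earthquake too
```

With these assumed numbers the belief in a burglary is substantial once the alarm is known and drops sharply once the earthquake is also known, mirroring Holmes' change of mind.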
Explaining Away II
Assume that we have observed the symptom RunnyNose, and that there are two competing causes of it: Cold and Allergy. Observing Fever, however, provides strong evidence that cold is the cause of the problem, while our belief in Allergy being the cause decreases substantially (i.e., it is explained away by the observation of Fever).
[Figure: network over Cold, Allergy, Fever, and RunnyNose, with arrows annotated as causal reasoning, diagnostic reasoning, and intercausal reasoning]
The ability of probabilistic networks to automatically perform such intercausal inference is a key contribution to their reasoning power.
d-separation
The rules for transmission of evidence over serial, diverging, and converging connections can be combined into one general rule known as d-separation:
d-separation
A path π = ⟨u, …, v⟩ in a DAG, G = (V, E), is blocked by S ⊆ V if π contains a node w such that either
w ∈ S and the connections in π do not meet head-to-head at w, or
w ∉ S, w has no descendants in S, and the connections in π meet head-to-head at w.
For three (not necessarily disjoint) subsets A, B, S of V, A and B are said to be d-separated by S if all paths between A and B are blocked by S.
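The definition can be applied mechanically on small graphs. The following Python sketch enumerates every path between two single nodes and tests the two blocking conditions at each inner node. It is a direct, unoptimized reading of the definition, not the algorithm used by any particular tool. The edge list is the Burglary or Earthquake network as described in the text, the function names are hypothetical, and the sketch assumes the two endpoints are not themselves in S.

```python
# Burglary-or-Earthquake network: each node mapped to its parents.
PARENTS = {
    "Burglary": [],
    "Earthquake": [],
    "Alarm": ["Burglary", "Earthquake"],
    "RadioNews": ["Earthquake"],
    "WatsonCalls": ["Alarm"],
}


def children(node):
    return [c for c, ps in PARENTS.items() if node in ps]


def descendants(node):
    seen, stack = set(), [node]
    while stack:
        for c in children(stack.pop()):
            if c not in seen:
                seen.add(c)
                stack.append(c)
    return seen


def simple_paths(u, v, path=None):
    """Yield all paths from u to v without repeated nodes, ignoring edge direction."""
    path = (path or []) + [u]
    if u == v:
        yield path
        return
    for n in PARENTS[u] + children(u):
        if n not in path:
            yield from simple_paths(n, v, path)


def is_blocked(path, s):
    """Check the two blocking conditions at every inner node w of the path."""
    for prev, w, nxt in zip(path, path[1:], path[2:]):
        head_to_head = prev in PARENTS[w] and nxt in PARENTS[w]
        if w in s and not head_to_head:
            return True
        if head_to_head and w not in s and not (descendants(w) & s):
            return True
    return False


def d_separated(u, v, s=()):
    """True if every path between u and v is blocked by the set s."""
    s = set(s)
    return all(is_blocked(p, s) for p in simple_paths(u, v))


print(d_separated("Burglary", "Earthquake"))             # True
print(d_separated("Burglary", "Earthquake", {"Alarm"}))  # False
```

Path enumeration grows quickly with graph size, so this form is only suitable for small examples; it is chosen here because it follows the wording of the definition line by line.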
d-separation: Dependence and Independence
If X and Y are not d-separated, they are d-connected.
d-separation provides a criterion for reading statements of (conditional) dependence and independence (or relevance and irrelevance) from a causal structure.
Dependence and independence depend on what you know (and do not know).
Example: Dependence and Independence
[Figure: DAG over the nodes A, B, C, D, E, F, G, H]
1 C and G are d-connected.
2 C and E are d-separated.
3 C and E are d-connected given evidence on G.
4 A and G are d-separated given evidence on D and E.
5 A and G are d-connected given evidence on D.
Summary
Causal networks
Serial/diverging/converging connections
Transmission of evidence in causal networks
Explaining away (intercausal reasoning)
Dependence and independence
d-separation in causal networks
Bayesian Probability Theory
Axioms of probability theory
Probability calculus
Fundamental rule
Bayes’ rule
The chain rule
Combination and marginalization
Conditional independence
Evidence
Axioms of Probability
The probability of an event, a, is denoted P(a). Probabilities obey the following axioms:
1. 0 ≤ P(a) ≤ 1, with P(a) = 1 if a is certain.
2. If events a and b are mutually exclusive, then
$$P(a \text{ or } b) \equiv P(a \lor b) = P(a) + P(b).$$
In general, if events $a_1, a_2, \ldots$ are pairwise incompatible, then
$$P\Big(\bigcup_i a_i\Big) = P(a_1) + P(a_2) + \cdots = \sum_i P(a_i).$$
3. Joint probability: P(a and b) ≡ P(a, b) = P(b | a) P(a).
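As a minimal sketch of Axioms 1 and 2, the following checks them on a toy sample space. The fair six-sided die, the helper `prob`, and the chosen events are assumptions made for illustration only; they are not part of the slides.

```python
from fractions import Fraction

# Assumed toy sample space: one roll of a fair six-sided die.
outcomes = range(1, 7)
p_outcome = {o: Fraction(1, 6) for o in outcomes}


def prob(event):
    """P(event), where an event is a set of outcomes (hypothetical helper)."""
    return sum(p_outcome[o] for o in event)


six = {6}
one = {1}

# Axiom 1: every probability lies in [0, 1]; the certain event has probability 1.
assert all(0 <= prob({o}) <= 1 for o in outcomes)
assert prob(set(outcomes)) == 1

# Axiom 2: additivity for mutually exclusive events.
assert six & one == set()
assert prob(six | one) == prob(six) + prob(one)
```

Exact fractions are used so that the equalities hold without floating-point tolerance.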
Conditional Probabilities
The basic concept in the Bayesian treatment of uncertainty in causal networks is conditional probability.
Every probability is conditioned on a context. For example,
“P(six) = 1/6” ≡ “P(six | SymmetricDie) = 1/6”
In general, given the event b, the conditional probability of the event a is x:
P(a | b) = x.
It is not “whenever b we have P(a) = x”.
Conditional Probability: if b is true and everything else known is irrelevant for a, then the probability of a is x.
A Justification of Axiom 3 — The Fundamental Rule
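C: a set of cats
B: the subset of brown cats (m)
A: the subset of Abyssinians (i of them are brown)
[Figure: Venn diagram of the set C (n cats) containing B (m cats) and A, which overlap in i cats.]
$$f(A \mid B, C) = \frac{i}{m}, \qquad f(B \mid C) = \frac{m}{n}$$
$$f(A, B \mid C) = \frac{i}{n} = \frac{i}{m} \cdot \frac{m}{n} = f(A \mid B, C) \cdot f(B \mid C).$$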
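The counting argument can be sketched in a few lines. The population of eight cats below is invented for illustration; only the names n, m and i follow the slide.

```python
from fractions import Fraction

# Hypothetical population C: each cat is a pair (is_brown, is_abyssinian).
cats = [
    (True, True), (True, True), (True, False), (True, False), (True, False),
    (False, True), (False, False), (False, False),
]

n = len(cats)                                        # number of cats in C
m = sum(1 for brown, _ in cats if brown)             # number of brown cats (B)
i = sum(1 for brown, aby in cats if brown and aby)   # brown Abyssinians

f_A_given_BC = Fraction(i, m)   # f(A | B, C)
f_B_given_C = Fraction(m, n)    # f(B | C)
f_AB_given_C = Fraction(i, n)   # f(A, B | C)

# The fundamental rule for frequencies: i/n = (i/m) * (m/n).
assert f_AB_given_C == f_A_given_BC * f_B_given_C
```

The identity holds for any counts with m > 0, because m cancels in the product.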
Discrete Random Variables
A discrete random variable, A, has a set of exhaustive and mutually exclusive states, $\mathrm{dom}(A) = \{a_1, \ldots, a_n\}$.
In this context, an event is an assignment of values to a set of variables and
$$P(A = a_1 \lor \cdots \lor A = a_n) = P(A = a_1) + \cdots + P(A = a_n) = \sum_{i=1}^{n} P(A = a_i) = 1.$$
Capital letters will denote a variable, or a set of variables, and lower case letters will denote states (values) of variables.
Example: R = r, B = ¬b (Rain? = Raining, BirdsOnRoof = No) is an event.
Probability Distributions for Variables
If X is a variable with states $x_1, \ldots, x_n$, then P(X) denotes a probability distribution over these states:
$$P(X) = (P(X = x_1), \ldots, P(X = x_n)),$$
where
$$P(X = x_i) \ge 0 \quad\text{and}\quad \sum_{i=1}^{n} P(X = x_i) = 1.$$
P(X | pa(X)) consists of one P(X) for each configuration of the parents, pa(X), of X.
[Figure: network with arrows B → A and E → A.]
P(A | B, E):

| | B = n, E = n | B = n, E = y | B = y, E = n | B = y, E = y |
|---|---|---|---|---|
| A = n | 0.999 | 0.1 | 0.05 | 0.01 |
| A = y | 0.001 | 0.9 | 0.95 | 0.99 |
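One possible way to hold such a conditional probability table in code is a dictionary keyed by parent configuration. The sketch below uses the numbers from the table above; the names `cpt_A` and `is_valid_cpt` are made up for this example.

```python
import math

# P(A | B, E) from the table, keyed by the parent configuration (b, e).
# Each value is one distribution P(A) over the states of A.
cpt_A = {
    ("n", "n"): {"n": 0.999, "y": 0.001},
    ("n", "y"): {"n": 0.1, "y": 0.9},
    ("y", "n"): {"n": 0.05, "y": 0.95},
    ("y", "y"): {"n": 0.01, "y": 0.99},
}


def is_valid_cpt(cpt):
    """True if every parent configuration maps to a proper distribution."""
    return all(
        all(p >= 0 for p in dist.values())
        and math.isclose(sum(dist.values()), 1.0)
        for dist in cpt.values()
    )


assert is_valid_cpt(cpt_A)
```

The check mirrors the two conditions stated for P(X): non-negative entries that sum to one, applied once per configuration of the parents.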
Rule of Total Probability
$$\begin{aligned}
P(A) &= P(A, B = b_1) + \cdots + P(A, B = b_n) \\
     &= P(A \mid B = b_1) P(B = b_1) + \cdots + P(A \mid B = b_n) P(B = b_n).
\end{aligned}$$
Computing P(A) from P(A, B) using the rule of total probability is often called marginalization, and is written compactly as
$$P(A) = \sum_i P(A, B = b_i),$$
or even shorter as
$$P(A) = \sum_B P(A, B).$$
The Fundamental Rule and Bayes’ Rule
The fundamental rule of probability calculus on variables:
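$$\begin{aligned}
P(X, Y) &= P(X \mid Y) P(Y) \\
        &= P(Y \mid X) P(X).
\end{aligned}$$
Bayes’ rule:
$$\begin{aligned}
P(Y \mid X) &= \frac{P(X \mid Y) P(Y)}{P(X)} \\
            &= \frac{P(X \mid Y) P(Y)}{P(X \mid Y = y_1) P(Y = y_1) + \cdots + P(X \mid Y = y_n) P(Y = y_n)}.
\end{aligned}$$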
Simple Bayesian Inference
Graphically, inference using Bayes’ rule corresponds to reversingarrows:
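[Figure: two two-node graphs over A and B, one with the arrow A → B and one with the arrow reversed, B → A.]
$$P(A, B) = P(A) P(B \mid A) \qquad\qquad P(A, B) = P(B) P(A \mid B)$$
$$P(A \mid B) = \frac{P(A, B)}{P(B)} = \frac{P(A) P(B \mid A)}{\sum_{a \in \mathrm{dom}(A)} P(A = a) P(B \mid A = a)}$$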
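The inversion can be sketched as a small function. The numbers for P(A) and P(B | A) below are hypothetical and do not come from the slides, and `bayes` is an illustrative helper name.

```python
import math

# Hypothetical prior P(A) and conditional P(B | A) for two binary variables.
p_A = {"a1": 0.2, "a2": 0.8}
p_B_given_A = {
    "a1": {"b1": 0.9, "b2": 0.1},
    "a2": {"b1": 0.3, "b2": 0.7},
}


def bayes(prior, likelihood, b):
    """Return P(A | B = b) computed from P(A) and P(B | A)."""
    # Numerator: P(A = a) P(B = b | A = a) for each state a.
    joint = {a: prior[a] * likelihood[a][b] for a in prior}
    # Denominator: P(B = b), obtained by summing the numerator over dom(A).
    p_b = sum(joint.values())
    return {a: joint[a] / p_b for a in joint}


posterior = bayes(p_A, p_B_given_A, "b1")
assert math.isclose(sum(posterior.values()), 1.0)
```

With these assumed numbers, P(B = b1) = 0.2 · 0.9 + 0.8 · 0.3 = 0.42, so P(A = a1 | B = b1) = 0.18 / 0.42 = 3/7.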
The Chain Rule
Let $V = \{X_1, \ldots, X_n\}$ be a set of variables.
Let P(V) denote the joint probability distribution over V.
Using the Fundamental Rule, P(V) can be written as
$$P(V) = \prod_{i=1}^{n} P(X_i \mid X_1, \ldots, X_{i-1}).$$
Thus, any joint can be represented as a product of conditionals, e.g.,
$$\begin{aligned}
P(X_1, X_2, X_3) &= P(X_3 \mid X_1, X_2) P(X_2, X_1) \\
                 &= P(X_3 \mid X_1, X_2) P(X_2 \mid X_1) P(X_1).
\end{aligned}$$
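A quick numerical check of the chain rule, as a sketch: the joint over three binary variables below is built from arbitrary illustrative weights, and `marginal` and `conditional` are hypothetical helpers that apply marginalization and the fundamental rule.

```python
import itertools
import math

# Hypothetical joint P(X1, X2, X3): arbitrary positive weights, normalized.
weights = [3, 1, 4, 1, 5, 9, 2, 6]
total = sum(weights)
joint = {
    xs: w / total
    for xs, w in zip(itertools.product([0, 1], repeat=3), weights)
}


def marginal(k):
    """P(X1, ..., Xk): sum the joint over the remaining variables."""
    out = {}
    for xs, p in joint.items():
        out[xs[:k]] = out.get(xs[:k], 0.0) + p
    return out


def conditional(xs, k):
    """P(Xk | X1, ..., Xk-1) at configuration xs, via the fundamental rule."""
    return marginal(k)[xs[:k]] / marginal(k - 1)[xs[:k - 1]]


# The product of the conditionals reproduces every entry of the joint.
for xs, p in joint.items():
    product = math.prod(conditional(xs, k) for k in range(1, 4))
    assert math.isclose(product, p)
```

For k = 1 the conditioning set is empty, so `marginal(0)` is the total mass 1 and the first factor is simply P(X1).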
The Chain Rule and Graph Structure
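Let V = {A, B, C, D}. Then P(V) factorizes as
$$\begin{aligned}
P(V) = P(A, B, C, D) &= P(A \mid B, C, D) P(B, C, D) \\
 &= P(A \mid B, C, D) P(B \mid C, D) P(C, D) \\
 &= P(A \mid B, C, D) P(B \mid C, D) P(C \mid D) P(D) \qquad (1) \\
 &= P(B \mid A, C, D) P(D \mid A, C) P(C \mid A) P(A) \qquad (2) \\
 &= \cdots
\end{aligned}$$
[Figure: two graphs over the nodes A, B, C and D, labelled (1) and (2), one for each of the factorizations (1) and (2), etc.]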
Combination and Marginalization
Combination of probability distributions is multiplication.
Using the Rule of Total Probability, from P(X, Y) the marginal probability distribution P(X) can be computed:
$$P(x) = P(X = x) = \sum_{y \in \mathrm{dom}(Y)} P(X = x, Y = y).$$
A variable Y is marginalized out of P(X, Y) as
$$P(X) = \sum_{y \in \mathrm{dom}(Y)} P(X, Y = y) \quad\text{or short:}\quad P(X) = \sum_{Y} P(X, Y).$$
The unity rule:
$$\sum_{X} P(X \mid \mathrm{pa}(X)) = 1_{\mathrm{pa}(X)}$$
Combination and Marginalization — Example
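Combination:

P(A | B):

| | b1 | b2 |
|---|---|---|
| a1 | 0.4 | 0.2 |
| a2 | 0.5 | 0.6 |
| a3 | 0.1 | 0.2 |

× P(B):

| b1 | b2 |
|---|---|
| 0.3 | 0.7 |

= P(A, B):

| | b1 | b2 |
|---|---|---|
| a1 | 0.12 | 0.14 |
| a2 | 0.15 | 0.42 |
| a3 | 0.03 | 0.14 |

Marginalization:
$$P(A) = (0.12 + 0.14,\; 0.15 + 0.42,\; 0.03 + 0.14) = (0.26, 0.57, 0.17)$$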
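The example can be reproduced with plain dictionaries. This is a sketch using the numbers from the tables above; reading the first table as P(A | B) and the second as P(B) is an interpretation based on their column sums, and the variable names are illustrative.

```python
import math

# First table, read as P(A | B): each column (fixed b) sums to 1.
p_A_given_B = {
    "a1": {"b1": 0.4, "b2": 0.2},
    "a2": {"b1": 0.5, "b2": 0.6},
    "a3": {"b1": 0.1, "b2": 0.2},
}
# Second table, read as P(B).
p_B = {"b1": 0.3, "b2": 0.7}

# Combination: P(A, B) = P(A | B) P(B), multiplied entry by entry.
p_AB = {
    a: {b: p_A_given_B[a][b] * p_B[b] for b in p_B}
    for a in p_A_given_B
}

# Marginalization: P(A) is the sum of P(A, B) over the states of B.
p_A = {a: sum(row.values()) for a, row in p_AB.items()}

assert math.isclose(sum(p_A.values()), 1.0)
```

Comparing with `math.isclose` rather than `==` avoids spurious failures from floating-point rounding in products such as 0.2 · 0.7.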
Conditional Independence
A variable X is independent of Y given Z if
$$P(x_i \mid y_j, z_k) = P(x_i \mid z_k), \quad \forall i, j, k$$
Shorthand: P(X | Y, Z) = P(X | Z)
Notice that the definition is symmetric
Notation: X ⊥⊥ Y | Z
Under X ⊥⊥ Y | Z the Fundamental Rule reduces to:
$$\begin{aligned}
P(X, Y \mid Z) &= P(X \mid Y, Z) P(Y \mid Z) \\
               &= P(X \mid Z) P(Y \mid Z)
\end{aligned}$$
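The definition can be tested directly on a joint table. In the sketch below, the distributions are hypothetical, the joint is constructed so that the independence holds, and `is_cond_independent` is an illustrative helper, not a library function.

```python
import itertools
import math

# Hypothetical distributions; indexing is p_X_given_Z[z][x], p_Y_given_Z[z][y].
p_Z = {0: 0.6, 1: 0.4}
p_X_given_Z = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}
p_Y_given_Z = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.5, 1: 0.5}}

# Joint over (x, y, z), built as P(X | Z) P(Y | Z) P(Z).
joint = {
    (x, y, z): p_X_given_Z[z][x] * p_Y_given_Z[z][y] * p_Z[z]
    for x, y, z in itertools.product([0, 1], repeat=3)
}


def is_cond_independent(table):
    """Check P(x | y, z) == P(x | z) wherever P(y, z) > 0."""
    xs = sorted({k[0] for k in table})
    ys = sorted({k[1] for k in table})
    zs = sorted({k[2] for k in table})
    for z in zs:
        p_z = sum(table[(x, y, z)] for x in xs for y in ys)
        for y in ys:
            p_yz = sum(table[(x, y, z)] for x in xs)
            if p_yz == 0:
                continue  # the conditional P(x | y, z) is undefined here
            for x in xs:
                p_xz = sum(table[(x, yy, z)] for yy in ys)
                if not math.isclose(table[(x, y, z)] / p_yz, p_xz / p_z):
                    return False
    return True


# A joint in which X always equals Y, so X depends on Y even given Z.
dependent = {
    (x, y, z): 0.25 if x == y else 0.0
    for x, y, z in itertools.product([0, 1], repeat=3)
}

assert is_cond_independent(joint)
assert not is_cond_independent(dependent)
```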
Graphical Representations of X ⊥⊥ Y |Z
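X and Y are conditionally independent given Z:
$$\begin{aligned}
P(X, Y, Z) &= P(X \mid Y, Z) P(Y \mid Z) P(Z) \\
           &= P(X \mid Z) P(Y \mid Z) P(Z)
\end{aligned}$$
[Figure: graph with arrows Z → X and Z → Y.]
$$\begin{aligned}
P(X, Y, Z) &= P(X) P(Y \mid X, Z) P(Z \mid X) \\
           &= P(X) P(Y \mid Z) P(Z \mid X)
\end{aligned}$$
[Figure: graph with arrows X → Z and Z → Y.]
$$\begin{aligned}
P(X, Y, Z) &= P(X \mid Y, Z) P(Y) P(Z \mid Y) \\
           &= P(X \mid Z) P(Y) P(Z \mid Y)
\end{aligned}$$
[Figure: graph with arrows Y → Z and Z → X.]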
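To illustrate that these factorizations describe the same joint, the sketch below builds a joint from the first form, P(X | Z) P(Y | Z) P(Z), derives P(X) and P(Z | X) from it, and confirms that P(X) P(Y | Z) P(Z | X) gives the same table. All numbers are hypothetical.

```python
import itertools
import math

vals = [0, 1]
# Hypothetical distributions; indexing is p_X_given_Z[z][x], p_Y_given_Z[z][y].
p_Z = {0: 0.6, 1: 0.4}
p_X_given_Z = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}
p_Y_given_Z = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.5, 1: 0.5}}

# First form: P(X, Y, Z) = P(X | Z) P(Y | Z) P(Z).
fork = {
    (x, y, z): p_X_given_Z[z][x] * p_Y_given_Z[z][y] * p_Z[z]
    for x, y, z in itertools.product(vals, repeat=3)
}

# Quantities needed for the second form, derived from the joint.
p_X = {x: sum(fork[(x, y, z)] for y in vals for z in vals) for x in vals}
p_XZ = {(x, z): sum(fork[(x, y, z)] for y in vals) for x in vals for z in vals}
p_Z_given_X = {(z, x): p_XZ[(x, z)] / p_X[x] for x in vals for z in vals}

# Second form: P(X, Y, Z) = P(X) P(Y | Z) P(Z | X).
chain = {
    (x, y, z): p_X[x] * p_Y_given_Z[z][y] * p_Z_given_X[(z, x)]
    for x, y, z in itertools.product(vals, repeat=3)
}

assert all(math.isclose(fork[k], chain[k]) for k in fork)
```

The agreement follows because P(X) P(Z | X) = P(X, Z) = P(X | Z) P(Z).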
Evidence
Definition 1. An instantiation of a variable X is an observation of the exact state of X.
f(X) = (0, …, 0, 1, 0, …, 0)
Definition 2. Let X be a variable with n states. An evidence function on X is an n-dimensional table of zeros and ones.
f(X) = (0, …, 0, 1, 0, …, 0, 1, 0, …, 0)
Definition 3. Let X be a variable with n states. An evidence function (likelihood evidence) on X is an n-dimensional table of non-negative numbers.
f(X) = (0, …, 0, 2, 0, …, 0, 1, 0, …, 0)
Evidence
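Let U be a set of variables and let $\{X_1, \ldots, X_n\}$ be a subset of U.
Given a set of evidence $\varepsilon = \{\varepsilon_1, \ldots, \varepsilon_m\}$ we want to compute $P(X_i \mid \varepsilon)$, for all i.
This can be done via
$$P(X_i \mid \varepsilon) = \frac{\sum_{X \in U \setminus \{X_i\}} P(U, \varepsilon)}{\sum_{U} P(U, \varepsilon)} = \frac{P(X_i, \varepsilon)}{P(\varepsilon)}.$$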
This requires the full joint P(U).
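A brute-force version of this computation can be sketched as follows. It assumes that P(U, ε) is obtained by multiplying the joint by each evidence function; the joint over two three-state variables is hypothetical, and `posterior` is an illustrative helper.

```python
import itertools
import math

# Hypothetical joint P(U) over U = (X1, X2), each with states 0, 1, 2.
configs = list(itertools.product(range(3), repeat=2))
weights = [1, 2, 3, 4, 5, 6, 7, 8, 9]
p_U = {u: w / sum(weights) for u, w in zip(configs, weights)}


def posterior(joint, evidence, i):
    """P(X_i | evidence) from the full joint.

    `evidence` maps a variable index to an evidence function: a tuple of
    non-negative numbers, one per state of that variable.
    """
    # P(U, e): multiply the joint by every evidence function.
    p_Ue = {}
    for u, p in joint.items():
        for j, f in evidence.items():
            p *= f[u[j]]
        p_Ue[u] = p
    p_e = sum(p_Ue.values())  # P(e): sum out all of U
    if p_e == 0:
        raise ValueError("evidence has probability zero")
    # P(X_i, e): sum out every variable except X_i, then normalize.
    n_states = 1 + max(u[i] for u in joint)
    p_Xi_e = [0.0] * n_states
    for u, p in p_Ue.items():
        p_Xi_e[u[i]] += p
    return [p / p_e for p in p_Xi_e]


# X2 (index 1) is known to be in state 1 or state 2.
print(posterior(p_U, {1: (0, 1, 1)}, 0))
```

The loop visits every configuration of U, so the cost grows exponentially with the number of variables, which is the practical consequence of requiring the full joint.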
Summary
Axioms of probability theory.
Conditional probabilities.
The fundamental rule and Bayes’ rule.
Probability calculus.
  - Fundamental Rule.
  - Bayes’ Rule.
  - Combination and marginalization.
  - The chain rule.
Conditional independence.
Evidence.