INFERENCE IN BAYESIAN NETWORKS
AGENDA
Reading off independence assumptions
Efficient inference in Bayesian networks
  Top-down inference
  Variable elimination
  Monte-Carlo methods
SOME APPLICATIONS OF BN
Medical diagnosis
Troubleshooting of hardware/software systems
Fraud/uncollectible debt detection
Data mining
Analysis of genetic sequences
Data interpretation, computer vision, image understanding
MORE COMPLICATED SINGLY-CONNECTED BELIEF NET
[Figure: belief net over Battery, Radio, SparkPlugs, Gas, Starts, Moves]
[Figure: image-labeling net over regions R1, R2, R3, R4 and Above, with Region = {Sky, Tree, Grass, Rock}]
[Figure: BN to evaluate insurance risks]
BN FROM LAST LECTURE
[Figure: Burglary and Earthquake (causes) point to Alarm; Alarm points to JohnCalls and MaryCalls (effects)]
Directed acyclic graph
Intuitive meaning of arc from x to y: “x has direct influence on y”
ARCS DO NOT NECESSARILY ENCODE CAUSALITY!
[Figure: two three-node chains, one ordered A, B, C and the other C, B, A]
Two BNs that can encode the same joint probability distribution
READING OFF INDEPENDENCE RELATIONSHIPS
[Figure: chain A, B, C]
Given B, does the value of A affect the probability of C? That is, does P(C|B,A) differ from P(C|B)?
No! C's parents (B) are given, so C is independent of its non-descendants (A), and P(C|B,A) = P(C|B)
Independence is symmetric: C ⊥ A | B => A ⊥ C | B
WHAT DOES THE BN ENCODE?
Burglary ⊥ Earthquake
JohnCalls ⊥ MaryCalls | Alarm
JohnCalls ⊥ Burglary | Alarm
JohnCalls ⊥ Earthquake | Alarm
MaryCalls ⊥ Burglary | Alarm
MaryCalls ⊥ Earthquake | Alarm
A node is independent of its non-descendants, given its parents
READING OFF INDEPENDENCE RELATIONSHIPS
How about Burglary ⊥ Earthquake | Alarm? No! Why?
P(B,E|A) = P(A|B,E) P(B,E) / P(A) = 0.00075
P(B|A) P(E|A) = 0.086
The two values differ, so Burglary and Earthquake are not independent given Alarm.
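The two numbers in this comparison can be checked directly from the alarm network's tables. The following is a minimal Python sketch, assuming the CPT values of the lecture's alarm example; the helper name `prob` is illustrative and not part of the slides.

```python
from itertools import product

# CPT values from the alarm example (assumed to match the lecture's tables).
P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}

def prob(p_true, value):
    """P(X = value) for a binary variable with P(X = true) = p_true."""
    return p_true if value else 1.0 - p_true

# Joint P(B=b, E=e, A=true) for every combination of b and e.
joint = {(b, e): prob(P_B, b) * prob(P_E, e) * P_A[(b, e)]
         for b, e in product((True, False), repeat=2)}
p_a = sum(joint.values())

p_be_given_a = joint[(True, True)] / p_a
p_b_given_a = (joint[(True, True)] + joint[(True, False)]) / p_a
p_e_given_a = (joint[(True, True)] + joint[(False, True)]) / p_a

print(f"P(B,E|A)      = {p_be_given_a:.5f}")
print(f"P(B|A) P(E|A) = {p_b_given_a * p_e_given_a:.3f}")
```

The gap between the two printed values is what "explaining away" looks like numerically: once the alarm is known, learning of a burglary makes an earthquake far less likely.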
READING OFF INDEPENDENCE RELATIONSHIPS
How about Burglary ⊥ Earthquake | JohnCalls? No! Why?
Knowing JohnCalls affects the probability of Alarm, which makes Burglary and Earthquake dependent
INDEPENDENCE RELATIONSHIPS
Rough intuition (this holds for tree-like graphs, polytrees):
  Evidence on the (directed) road between two variables makes them independent
  Evidence on an “A” node makes descendants independent
  Evidence on a “V” node, or below the V, makes the ancestors of the variables dependent (otherwise they are independent)
Formal property in the general case: d-separation => independence (see R&N)
BENEFITS OF SPARSE MODELS
Modeling
  Fewer relationships need to be encoded (either through understanding or statistics)
  Large networks can be built up from smaller ones
Intuition
  Dependencies/independencies between variables can be inferred through network structures
Tractable inference
[Figure: alarm network (Burglary, Earthquake -> Alarm -> JohnCalls, MaryCalls) with its conditional probability tables]
P(B) = 0.001    P(E) = 0.002

B  E  P(A|B,E)
T  T  0.95
T  F  0.94
F  T  0.29
F  F  0.001

A  P(J|A)      A  P(M|A)
T  0.90        T  0.70
F  0.05        F  0.01
TOP-DOWN INFERENCE
Suppose we want to compute P(Alarm)
1. P(A) = Σ_{b,e} P(A,b,e)
2. P(A) = Σ_{b,e} P(A|b,e) P(b) P(e)
3. P(A) = P(A|B,E) P(B) P(E) + P(A|B,¬E) P(B) P(¬E) + P(A|¬B,E) P(¬B) P(E) + P(A|¬B,¬E) P(¬B) P(¬E)
4. P(A) = 0.95*0.001*0.002 + 0.94*0.001*0.998 + 0.29*0.999*0.002 + 0.001*0.999*0.998 = 0.00252
TOP-DOWN INFERENCE
Now, suppose we want to compute P(MaryCalls)
1. P(M) = P(M|A) P(A) + P(M|¬A) P(¬A)
2. P(M) = 0.70*0.00252 + 0.01*(1-0.00252) = 0.0117
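The top-down pass can be written as two chained sums: first P(Alarm) from its parents, then P(MaryCalls) from P(Alarm). A minimal Python sketch, assuming the CPT values of the alarm example; the names `prob`, `p_alarm`, and `p_mary` are illustrative.

```python
# CPT values from the alarm example (assumed to match the lecture's tables).
P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}
P_M = {True: 0.70, False: 0.01}   # P(M = true | A)

def prob(p_true, value):
    """P(X = value) for a binary variable with P(X = true) = p_true."""
    return p_true if value else 1.0 - p_true

# P(A) = sum over b, e of P(A | b, e) P(b) P(e)
p_alarm = sum(P_A[(b, e)] * prob(P_B, b) * prob(P_E, e)
              for b in (True, False) for e in (True, False))

# P(M) = P(M | A) P(A) + P(M | not A) P(not A)
p_mary = P_M[True] * p_alarm + P_M[False] * (1.0 - p_alarm)

print(f"P(A) = {p_alarm:.5f}, P(M) = {p_mary:.4f}")
```

Each node is processed only after its parents, so every sum ranges over the node's own parent values rather than over the whole joint.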
TOP-DOWN INFERENCE WITH EVIDENCE
Suppose we want to compute P(Alarm|Earthquake), written P(A|e)
1. P(A|e) = Σ_b P(A,b|e)
2. P(A|e) = Σ_b P(A|b,e) P(b)    [since Burglary ⊥ Earthquake, P(b|e) = P(b)]
3. P(A|e) = 0.95*0.001 + 0.29*0.999 = 0.29066
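With evidence on a parent, the same sum simply drops the evidence variable from the summation and fixes its value. A minimal Python sketch under the same assumed CPT values; `p_alarm_given_quake` is an illustrative name.

```python
# CPT values from the alarm example (assumed to match the lecture's tables).
P_B = 0.001
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}

# P(A | e) = sum over b of P(A | b, e) P(b), with Earthquake fixed to true.
p_alarm_given_quake = sum(
    P_A[(b, True)] * (P_B if b else 1.0 - P_B)
    for b in (True, False)
)

print(f"P(A|e) = {p_alarm_given_quake:.5f}")
```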
TOP-DOWN INFERENCE
Only works if the graph of ancestors of a variable is a polytree
Evidence given on ancestor(s) of the query variable
Efficient: O(d 2^k) time, where d is the number of ancestors of a variable, with k a bound on # of parents
Evidence on an ancestor cuts off influence of the portion of the graph above the evidence node
QUERYING THE BN
The BN gives P(T|C)
What about P(C|T)?
[Figure: Cavity points to Toothache]
P(C) = 0.1

C  P(T|C)
T  0.4
F  0.01111
BAYES’ RULE
P(A,B) = P(A|B) P(B) = P(B|A) P(A)
So… P(A|B) = P(B|A) P(A) / P(B)
APPLYING BAYES’ RULE
Let A be a cause, B be an effect, and let’s say we know P(B|A) and P(A) (conditional probability tables)
What’s P(B)?
  P(B) = Σ_a P(B,A=a) [marginalization]
  P(B,A=a) = P(B|A=a) P(A=a) [conditional probability]
  So, P(B) = Σ_a P(B|A=a) P(A=a)
What’s P(A|B)?
  P(A|B) = P(B|A) P(A) / P(B) [Bayes’ rule]
  P(B) = Σ_a P(B|A=a) P(A=a) [from above]
  So, P(A|B) = P(B|A) P(A) / [Σ_a P(B|A=a) P(A=a)]
HOW DO WE READ THIS?
P(A|B) = P(B|A) P(A) / [Σ_a P(B|A=a) P(A=a)]
[An equation that holds for all values A can take on, and all values B can take on]
P(A=a|B=b) = P(B=b|A=a) P(A=a) / [Σ_a P(B=b|A=a) P(A=a)]
Are these the same a? NO!
The a inside the sum is a separate summation index, so give it its own name:
P(A=a|B=b) = P(B=b|A=a) P(A=a) / [Σ_a’ P(B=b|A=a’) P(A=a’)]
Be careful about indices!
QUERYING THE BN
The BN gives P(T|C)
What about P(C|T)?
P(Cavity|Toothache) = P(Toothache|Cavity) P(Cavity) / P(Toothache) [Bayes’ rule]
Denominator computed by summing out the numerator over Cavity and ¬Cavity
Querying a BN is just applying Bayes’ rule on a larger scale…
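Plugging the Cavity/Toothache table into Bayes' rule gives a concrete posterior. A minimal Python sketch, assuming P(C) = 0.1, P(T|C) = 0.4, and P(T|¬C) = 0.01111 from the slide; the function name `posterior` is illustrative.

```python
def posterior(prior, likelihood_if_true, likelihood_if_false):
    """P(cause | effect) for a binary cause, by Bayes' rule."""
    numerator = likelihood_if_true * prior
    # Denominator: sum the numerator over cause = true and cause = false.
    evidence = numerator + likelihood_if_false * (1.0 - prior)
    return numerator / evidence

p_cavity_given_toothache = posterior(0.1, 0.4, 0.01111)
print(f"P(Cavity|Toothache) = {p_cavity_given_toothache:.3f}")
```

Here P(Toothache) works out to about 0.05, so a toothache raises the probability of a cavity from 0.1 to roughly 0.8.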
PERFORMING INFERENCE
Variables X
Have evidence set E=e, query variable Q
Want to compute the posterior probability distribution over Q, given E=e
Let the non-evidence variables be Y (= X \ E)
Straightforward method:
1. Compute joint P(Y,E=e)
2. Marginalize to get P(Q,E=e)
3. Divide by P(E=e) to get P(Q|E=e)
INFERENCE IN THE ALARM EXAMPLE
P(J|MaryCalls) = ??    [query Q = J, evidence E=e is MaryCalls]
1. P(J,A,B,E,MaryCalls) = P(J|A) P(MaryCalls|A) P(A|B,E) P(B) P(E)
   using P(x1,x2,…,xn) = ∏_{i=1,…,n} P(xi|parents(Xi))
   Full joint distribution table: 2^4 entries
2. P(J,MaryCalls) = Σ_{a,b,e} P(J,A=a,B=b,E=e,MaryCalls)
   2 entries: one for JohnCalls, the other for ¬JohnCalls
3. P(J|MaryCalls) = P(J,MaryCalls) / P(MaryCalls) = P(J,MaryCalls) / (Σ_j P(j,MaryCalls))
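The three steps translate almost line for line into code. The following is a minimal Python sketch of this enumeration, assuming the CPT values of the alarm example; the names `prob`, `joint`, and `p_j_given_m` are illustrative.

```python
from itertools import product

# CPT values from the alarm example (assumed to match the lecture's tables).
P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}   # P(J = true | A)
P_M = {True: 0.70, False: 0.01}   # P(M = true | A)

def prob(p_true, value):
    """P(X = value) for a binary variable with P(X = true) = p_true."""
    return p_true if value else 1.0 - p_true

def joint(b, e, a, j, m):
    """Step 1: product of each variable's CPT entry given its parents."""
    return (prob(P_B, b) * prob(P_E, e) * prob(P_A[(b, e)], a)
            * prob(P_J[a], j) * prob(P_M[a], m))

# Step 2: sum out A, B, E with MaryCalls fixed to true.
p_j_and_m = {j: sum(joint(b, e, a, j, True)
                    for b, e, a in product((True, False), repeat=3))
             for j in (True, False)}

# Step 3: normalize by P(MaryCalls), the sum over both values of J.
p_m = sum(p_j_and_m.values())
p_j_given_m = {j: v / p_m for j, v in p_j_and_m.items()}

print(f"P(J|MaryCalls) = {p_j_given_m[True]:.4f}")
```

The inner sum visits every assignment of the hidden variables, which is exactly the exponential cost that variable elimination tries to avoid.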
HOW EXPENSIVE?
P(X) = P(x1,x2,…,xn) = ∏_{i=1,…,n} P(xi|parents(Xi))
Straightforward method:
1. Use above to compute P(Y,E=e)
2. P(Q,E=e) = Σ_{y1} … Σ_{yk} P(Y,E=e)
3. P(E=e) = Σ_q P(Q,E=e)
Step 1: O(2^(n-|E|)) entries!
Step 3: normalization factor, no big deal once we have P(Q,E=e)
Can we do better?
VARIABLE ELIMINATION
Consider linear network X1 -> X2 -> X3
P(X) = P(X1) P(X2|X1) P(X3|X2)
P(X3) = Σ_{x1} Σ_{x2} P(x1) P(x2|x1) P(X3|x2)
      = Σ_{x2} P(X3|x2) Σ_{x1} P(x1) P(x2|x1)    [rearrange equation]
      = Σ_{x2} P(X3|x2) P(x2)    [inner sum computed for each value of X2]
Cache P(x2) for both values of X3!
How many * and + saved?
  *: 2*4*2 = 16 vs 4+4 = 8
  +: 2*3 = 6 vs 2+2 = 4
Can lead to huge gains in larger networks
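The rearrangement can be seen side by side in code. A minimal Python sketch for a binary chain X1 -> X2 -> X3; the CPT numbers below are hypothetical values chosen for illustration and do not come from the lecture.

```python
# Hypothetical CPTs for a binary chain X1 -> X2 -> X3 (illustration only).
p_x1 = [0.6, 0.4]                            # p_x1[x1]
p_x2_given_x1 = [[0.7, 0.3], [0.2, 0.8]]     # p_x2_given_x1[x1][x2]
p_x3_given_x2 = [[0.9, 0.1], [0.5, 0.5]]     # p_x3_given_x2[x2][x3]

def naive_marginal_x3():
    """Sum the full product over every (x1, x2) pair, for each x3."""
    return [sum(p_x1[x1] * p_x2_given_x1[x1][x2] * p_x3_given_x2[x2][x3]
                for x1 in range(2) for x2 in range(2))
            for x3 in range(2)]

def eliminated_marginal_x3():
    """Eliminate X1 first, cache P(x2), then reuse it for each x3."""
    p_x2 = [sum(p_x1[x1] * p_x2_given_x1[x1][x2] for x1 in range(2))
            for x2 in range(2)]
    return [sum(p_x3_given_x2[x2][x3] * p_x2[x2] for x2 in range(2))
            for x3 in range(2)]

print(naive_marginal_x3())
print(eliminated_marginal_x3())
```

Both functions return the same distribution; the second one computes the intermediate table P(x2) once instead of recomputing it inside every term.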
VE IN ALARM EXAMPLE
P(E|j,m) = P(E,j,m) / P(j,m)
P(E,j,m) = Σ_a Σ_b P(E) P(b) P(a|E,b) P(j|a) P(m|a)
         = P(E) Σ_b P(b) Σ_a P(a|E,b) P(j|a) P(m|a)
         = P(E) Σ_b P(b) P(j,m|E,b)    [compute for all values of E, b]
         = P(E) P(j,m|E)    [compute for all values of E]
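Each line of the derivation corresponds to one small intermediate table. A minimal Python sketch, assuming the CPT values of the alarm example and evidence j = true, m = true; the table names `f_a`, `g_be`, and `h_e` are illustrative.

```python
# CPT values from the alarm example (assumed to match the lecture's tables).
P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # keyed by (b, e)
P_J = {True: 0.90, False: 0.05}
P_M = {True: 0.70, False: 0.01}
VALUES = (True, False)

def prob(p_true, value):
    """P(X = value) for a binary variable with P(X = true) = p_true."""
    return p_true if value else 1.0 - p_true

# P(j | a) P(m | a) for the observed j = true, m = true.
f_a = {a: P_J[a] * P_M[a] for a in VALUES}

# Eliminate a: P(j, m | E, b) for all values of E and b.
g_be = {(b, e): sum(prob(P_A[(b, e)], a) * f_a[a] for a in VALUES)
        for b in VALUES for e in VALUES}

# Eliminate b: P(j, m | E) for all values of E.
h_e = {e: sum(prob(P_B, b) * g_be[(b, e)] for b in VALUES) for e in VALUES}

# Multiply in P(E) and normalize by P(j, m).
p_e_jm = {e: prob(P_E, e) * h_e[e] for e in VALUES}
p_jm = sum(p_e_jm.values())
p_e_given_jm = {e: v / p_jm for e, v in p_e_jm.items()}

print(f"P(E|j,m) = {p_e_given_jm[True]:.4f}")
```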
WHAT ORDER TO PERFORM VE?
For tree-like BNs (polytrees), order so parents come before children
  # of entries in each intermediate probability table is 2^(# of parents of a node)
  If the number of parents of a node is bounded, then VE is linear time!
Other networks: intermediate factors may become large
NON-POLYTREE NETWORKS
[Figure: A points to B and C; B and C point to D]
P(D) = Σ_a Σ_b Σ_c P(a) P(b|a) P(c|a) P(D|b,c)
     = Σ_b Σ_c P(D|b,c) Σ_a P(a) P(b|a) P(c|a)
No more simplifications…
APPROXIMATE INFERENCE TECHNIQUES
Based on the idea of Monte Carlo simulation
Basic idea: to estimate the probability of a coin flipping heads, I can flip it a huge number of times and count the fraction of heads observed
Conditional simulation: to estimate the probability P(H) that a coin picked out of bucket B flips heads, I can:
1. Pick a coin C out of B (occurs with probability P(C))
2. Flip C and observe whether it flips heads (occurs with probability P(H|C))
3. Put C back and repeat from step 1 many times
4. Return the fraction of heads observed (estimate of P(H))
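The four steps above can be simulated directly. A minimal Python sketch; the bucket contents (two kinds of coin with the pick probabilities and biases shown) are hypothetical values for illustration, and `estimate_heads` is an illustrative name.

```python
import random

# Hypothetical bucket: (probability of picking this coin, its P(heads)).
BUCKET = [(0.75, 0.5), (0.25, 0.9)]

def estimate_heads(num_trials, rng):
    """Estimate P(H) by repeatedly picking a coin and flipping it."""
    pick_weights = [w for w, _ in BUCKET]
    biases = [h for _, h in BUCKET]
    heads = 0
    for _ in range(num_trials):
        bias = rng.choices(biases, weights=pick_weights, k=1)[0]  # step 1
        if rng.random() < bias:                                   # step 2
            heads += 1
    return heads / num_trials                                     # step 4

exact = sum(w * h for w, h in BUCKET)   # P(H) = sum over c of P(c) P(H|c)
estimate = estimate_heads(100_000, random.Random(0))
print(f"exact = {exact:.3f}, estimate = {estimate:.3f}")
```

The exact value is available here only because the bucket is tiny; the point of the simulation is that the same loop still works when the sum over hidden causes is too large to enumerate.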
APPROXIMATE INFERENCE: MONTE-CARLO SIMULATION
Sample from the joint distribution
B=0, E=0, A=0, J=1, M=0
APPROXIMATE INFERENCE: MONTE-CARLO SIMULATION
As more samples are generated, the distribution of the samples approaches the joint distribution!
B=0, E=0, A=0, J=1, M=0
B=0, E=0, A=0, J=0, M=0
B=0, E=0, A=0, J=0, M=0
B=1, E=0, A=1, J=1, M=0
APPROXIMATE INFERENCE: MONTE-CARLO SIMULATION
Inference: given evidence E=e (e.g., J=1)
Remove the samples that conflict
B=0, E=0, A=0, J=1, M=0
B=0, E=0, A=0, J=0, M=0    [conflicts with J=1, removed]
B=0, E=0, A=0, J=0, M=0    [conflicts with J=1, removed]
B=1, E=0, A=1, J=1, M=0
Distribution of remaining samples approximates the conditional distribution!
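Sampling parents before children and then discarding conflicting samples is commonly called rejection sampling. A minimal Python sketch, assuming the CPT values of the alarm example; the function names and the choice of query (Alarm given JohnCalls) are illustrative.

```python
import random

# CPT values from the alarm example (assumed to match the lecture's tables).
P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}
P_M = {True: 0.70, False: 0.01}

def sample_joint(rng):
    """Draw one sample from the joint, sampling parents before children."""
    b = rng.random() < P_B
    e = rng.random() < P_E
    a = rng.random() < P_A[(b, e)]
    j = rng.random() < P_J[a]
    m = rng.random() < P_M[a]
    return {"B": b, "E": e, "A": a, "J": j, "M": m}

def rejection_estimate(query, evidence, num_samples, rng):
    """Fraction of evidence-consistent samples in which `query` is true."""
    kept = hits = 0
    for _ in range(num_samples):
        sample = sample_joint(rng)
        if all(sample[var] == val for var, val in evidence.items()):
            kept += 1
            hits += sample[query]
    return hits / kept if kept else float("nan")

estimate = rejection_estimate("A", {"J": True}, 200_000, random.Random(0))
print(f"P(A|J=1) is approximately {estimate:.3f}")
```

Only about 5% of the samples have J=1 in this network, so most of the work is thrown away; the waste grows quickly as the evidence becomes less likely.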
HOW MANY SAMPLES?
Error of estimate, for n samples, is on average O(1/√n)
Variance-reduction techniques
RARE EVENT PROBLEM:
What if some events are really rare (e.g., burglary & earthquake)?
# of samples must be huge to get a reasonable estimate
Solution: likelihood weighting
  Enforce that each sample agrees with evidence
  While generating a sample, keep track of the ratio of
    (how likely the sampled value is to occur in the real world) / (how likely you were to generate the sampled value)
LIKELIHOOD WEIGHTING
Suppose evidence Alarm & MaryCalls
Sample B, E with P=0.5 each; the weight w starts at 1

Sample 1
  B=0, E=1                   w = 0.008     [0.999*0.002 / (0.5*0.5)]
  B=0, E=1, A=1              w = 0.0023    A=1 is enforced, and the weight updated to reflect the likelihood that this occurs
  B=0, E=1, A=1, M=1, J=1    w = 0.0016
Sample 2
  B=0, E=0                   w = 3.988
  B=0, E=0, A=1              w = 0.004
  B=0, E=0, A=1, M=1, J=1    w = 0.0028
Sample 3
  B=1, E=0, A=1              w = 0.00375
  B=1, E=0, A=1, M=1, J=1    w = 0.0026
Sample 4
  B=1, E=1, A=1, M=1, J=1    w = 5e-6 (~= 0)

N=4 gives P(B|A,M) ~= 0.371
Exact inference gives P(B|A,M) = 0.374
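The weights in this walkthrough can be reproduced in a few lines. A minimal Python sketch, assuming the CPT values of the alarm example and the slides' scheme of drawing B and E uniformly (P=0.5 each) while enforcing A=1 and M=1; the names `weight` and `estimate_burglary` are illustrative. J is drawn from its own CPT, so it contributes a ratio of 1 and is left out.

```python
import random

# CPT values from the alarm example (assumed to match the lecture's tables).
P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # keyed by (b, e)
P_M_GIVEN_ALARM = 0.70
PROPOSAL = 0.5   # each value of B and E is generated with probability 0.5

def prob(p_true, value):
    """P(X = value) for a binary variable with P(X = true) = p_true."""
    return p_true if value else 1.0 - p_true

def weight(b, e):
    """Weight of a sample with B=b, E=e under evidence A=1, M=1."""
    w = (prob(P_B, b) / PROPOSAL) * (prob(P_E, e) / PROPOSAL)  # true / proposal
    w *= P_A[(b, e)]          # likelihood of the enforced A=1
    w *= P_M_GIVEN_ALARM      # likelihood of the enforced M=1
    return w

def estimate_burglary(samples):
    """Weighted fraction of samples with B=1, an estimate of P(B|A,M)."""
    total = sum(weight(b, e) for b, e in samples)
    return sum(weight(b, e) for b, e in samples if b) / total

# The four samples from the walkthrough, as (B, E) pairs.
four = [(False, True), (False, False), (True, False), (True, True)]
print(f"N=4 estimate: {estimate_burglary(four):.3f}")

# A larger run with randomly drawn (B, E) pairs.
rng = random.Random(0)
draws = [(rng.random() < PROPOSAL, rng.random() < PROPOSAL)
         for _ in range(10_000)]
print(f"N=10000 estimate: {estimate_burglary(draws):.3f}")
```

Every sample agrees with the evidence, so none is discarded; the rare combinations simply carry small weights instead of needing a huge number of draws to appear.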
RECAP
Efficient inference in BNs
  Variable elimination
  Approximate methods: Monte-Carlo sampling
NEXT LECTURE
Statistical learning: from data to distributions
R&N 20.1-2