Experiments in Bayes Nets


Jamie Liu and Adam Smith
6.825 – Project 2
11/4/2004

We learned a lot from this project. Enjoy.

1. Variable Elimination Functionality

After executing our variable elimination procedure, we obtained the following results for each of the queries below.

For the sake of easy analysis of the PropCost probability distributions obtained throughout this project from the insurance network, we define the function f to be a weighted average across the discrete domain, resulting in a single scalar value representative of the overall cost. More specifically,

f = 1E5 * P(HundredThou) + 1E6 * P(Million) + 1E4 * P(TenThou) + 1E3 * P(Thousand)
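For concreteness, a minimal Python sketch of this computation (the function name and dictionary representation are illustrative, not taken from our code; the distribution plugged in below is the one from query 3):

    def expected_prop_cost(dist):
        # Collapse a PropCost distribution into the scalar f defined above.
        dollar_value = {"Thousand": 1e3, "TenThou": 1e4,
                        "HundredThou": 1e5, "Million": 1e6}
        return sum(dollar_value[v] * p for v, p in dist.items())

    # Distribution from query 3 below; prints roughly 48275.6.
    query3 = {"HundredThou": 0.1729786918964137, "Million": 0.02709352198178344,
              "TenThou": 0.3427002442093675, "Thousand": 0.45722754191243536}
    print(expected_prop_cost(query3))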

1. P(Burglary | JohnCalls = true, MaryCalls = true)

<[Burglary] = [false]> = 0.7158281646356072

<[Burglary] = [true]> = 0.284171835364393

2. P(Earthquake | JohnCalls = true, Burglary = true)

<[Earthquake] = [false]> = 0.8239331615949207

<[Earthquake] = [true]> = 0.17606683840507917

3. P(PropCost | Age = Adolescent, Antilock = False, Mileage = FiftyThou)

<[PropCost] = [HundredThou]> = 0.1729786918964137

<[PropCost] = [Million]> = 0.02709352198178344

<[PropCost] = [TenThou]> = 0.3427002442093675

<[PropCost] = [Thousand]> = 0.45722754191243536

(f = 48275.62)

These results are consistent with those obtained by executing the given enumeration procedure, and those given in Table 1 of the project hand-out.
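For reference, the heart of each variable elimination step is multiplying together all factors that mention the variable being eliminated and then summing that variable out. A minimal Python sketch of that step follows; the factor representation (a scope tuple plus a table keyed by value tuples) is an illustration, not necessarily how our actual code stores factors.

    from itertools import product

    def sum_out(var, factors, domains):
        # Multiply all factors whose scope contains `var`, then sum `var` out.
        touching = [f for f in factors if var in f[0]]
        rest = [f for f in factors if var not in f[0]]
        new_scope = sorted({v for scope, _ in touching for v in scope} - {var})
        table = {}
        for values in product(*(domains[v] for v in new_scope)):
            ctx = dict(zip(new_scope, values))
            total = 0.0
            for val in domains[var]:
                ctx[var] = val
                prob = 1.0
                for scope, tab in touching:
                    prob *= tab[tuple(ctx[v] for v in scope)]
                total += prob
            table[values] = total
        return rest + [(tuple(new_scope), table)]

    # Toy example: eliminate B from P(B) and P(A | B); the result is a factor over A.
    doms = {"A": (True, False), "B": (True, False)}
    fB = (("B",), {(True,): 0.001, (False,): 0.999})
    fAB = (("A", "B"), {(True, True): 0.95, (True, False): 0.01,
                        (False, True): 0.05, (False, False): 0.99})
    print(sum_out("B", [fB, fAB], doms))

Full variable elimination repeats this step for each hidden variable in the chosen order and finally normalizes the remaining factor over the query variable.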

2. More Variable Elimination Exercise

A. Insurance Network Queries


1. P(PropCost | Age = Adolescent, Antilock = False, Mileage = FiftyThou, MakeModel = SportsCar)

If the MakeModel of the car in question is a sports car then, based on the network as illustrated in Figure 1 of the handout, we expect that the driver would be less risk averse, that the driver would have more money, and that the car would be of higher value. All of these things should cause the cost of insurance to "go up" relative to our previous query, which did not involve any evidence about the MakeModel of the car. An increase in terms of the PropCost domain means that the probability distribution should be shifted towards the higher-cost elements of the domain (e.g. Million might have a higher probability than Thousand).

Indeed, this is what happens. As can be seen below, f is about four thousand dollars greater in this case relative to that from Section 1.3.

<[PropCost] = [HundredThou]> = 0.17179333672003955

<[PropCost] = [Million]> = 0.03093877334365239

<[PropCost] = [TenThou]> = 0.34593039737969233

<[PropCost] = [Thousand]> = 0.45133749255661565

(f = 52028.74)

2. P(PropCost | Age = Adolescent, Antilock = False, Mileage = FiftyThou, GoodStudent = True)

In this case, counter-intuitive as it may seem, if the driver is a GoodStudent, then the overall cost of insurance goes up. This follows from the network as shown in Figure 1 of the project handout: GoodStudent is connected to the network only through its two parents, Age and SocioEcon. Since Age is an evidence variable, SocioEcon is the only node affected by adding GoodStudent to the evidence. More specifically, if the adolescent driver is a good student, they are likely to have more money, and thus drive fancier cars, be less risk averse, et cetera.

This is manifested in the results of variable elimination given the proper evidence. More specifically, f is a little less than four thousand dollars greater in this case relative to that from Section 1.3.

<[PropCost] = [HundredThou]> = 0.1837467917616061

<[PropCost] = [Million]> = 0.029748793596801583

<[PropCost] = [TenThou]> = 0.32771416728772235

<[PropCost] = [Thousand]> = 0.4587902473538701

(f = 51859.40)


B. Carpo Network Queries

1. P(N112 | N64 = “3”, N113 = “1”, N116 = “0”)

<[N112] = [0]> = 0.9880400004226929

<[N112] = [1]> = 0.01195999957730707

2. P(N143 | N146 = “1”, N116 = “0”, N121 = “1”)

<[N143] = [0]> = 0.899999996961172

<[N143] = [1]> = 0.10000000303882783

3. Random Elimination Ordering

A. Histograms

[Chart: "Histogram of Computation Time under Random Elimination Ordering: Problem 1"; x-axis: Trials (1-10), y-axis: computation time (0-6000).]

Figure 1. Histogram of Computation Time for P(PropCost | Age = Adolescent, Antilock = False, Mileage = FiftyThou, MakeModel = SportsCar).


[Chart: "Histogram of Computation Time under Random Elimination Ordering: Problem 2"; x-axis: Trials (1-10), y-axis: computation time (0-6000).]

Figure 2. Histogram of Computation Time for P(PropCost | Age = Adolescent, Antilock = False, Mileage = FiftyThou, GoodStudent = True).

[Chart: "Histogram of Computation Time under Random Elimination Ordering: Problem 3"; x-axis: Trials (1-10), y-axis: computation time (0-6000).]

Figure 3. Histogram of Computation Time for P(N112 | N64 = "3", N113 = "1", N116 = "0").


[Chart: "Histogram of Computation Time under Random Elimination Ordering: Problem 4"; x-axis: Trials (1-10), y-axis: computation time (0-6000).]

Figure 4. Histogram of Computation Time for P(N143 | N146 = "1", N116 = "0", N121 = "1").

B. Discussion

Figures 1 through 4 illustrate the running time of a random-order variable elimination algorithm for each of the problems in Task 2 of the project handout. We ran the algorithm ten times for each problem. If a bar is stacked with a purple bar on top of it, then the heap ran out of memory during that execution. In that case, we know that the execution would have taken at least the amount of time illustrated by the blue bar, i.e. the time it ran before exhausting memory. We suppose that each execution where the computer ran out of memory would have taken at least 5000 seconds to complete.

It is worth noting that the time taken on the successful runs (the samples without a purple bar) is much lower than the time taken by the unsuccessful runs before they crashed, i.e. the successful blue bars tend to be shorter than the unsuccessful blue bars. This indicates that random ordering tends to get it either very right or very wrong.
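A minimal sketch of how such trials can be driven (the `eliminate` callable stands in for our variable elimination routine, and the 5000-second floor mirrors the assumption described above; none of these names are from our actual harness):

    import random
    import time

    def time_random_orderings(eliminate, hidden_vars, trials=10, cap_seconds=5000):
        # Run `eliminate(order)` on independently shuffled elimination orders,
        # recording out-of-memory runs as taking at least `cap_seconds`.
        results = []
        for _ in range(trials):
            order = list(hidden_vars)
            random.shuffle(order)              # fresh random order each trial
            start = time.time()
            try:
                eliminate(order)
                results.append(time.time() - start)
            except MemoryError:
                results.append(max(time.time() - start, cap_seconds))
        return results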


4. Greedy Elimination Ordering

A. Histograms

[Chart: "Greedy Variable Elimination Runtimes"; x-axis: Problem Number (1-4), y-axis: runtime (0-1.4 seconds).]

Figure 5. Greedy Variable Elimination Runtimes for 10 trials of running each of 4 problems.

Problem        Average Time (seconds)
Insurance – 1  0.629
Insurance – 2  1.086
Carpo – 1      0.088
Carpo – 2      0.087

Table 1. Average time of execution for variable elimination for the problems from Task 2. Averages are constructed across ten independent runs each, which are illustrated in Figure 5.

B. Discussion

As can be seen from Table 1, the time needed for variable elimination is much smaller for a greedy elimination ordering than for a random ordering. This makes a lot of sense, because the random ordering could happen to eliminate a parent of many children, creating a huge factor which slows down the algorithm and eats up memory. By contrast, greedy-ordering variable elimination works very well. Even compared against the cases from Section 3 in which we did not run out of memory, the greedy algorithm tends to be about 100-200 times faster.
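A sketch of the greedy heuristic, which at each step eliminates the hidden variable whose elimination would produce the smallest intermediate factor (the scoring rule here is an illustration and may differ in detail from our implementation):

    def greedy_order(scopes, domain_size, hidden):
        # scopes: one set of variable names per factor in the network.
        # domain_size: number of values each variable can take.
        # hidden: the set of variables that must be eliminated.
        scopes = [set(s) for s in scopes]
        order = []
        remaining = set(hidden)
        while remaining:
            def factor_size(var):
                # Size of the factor created by eliminating `var` right now.
                joined = set().union(*(s for s in scopes if var in s)) - {var}
                size = 1
                for v in joined:
                    size *= domain_size[v]
                return size
            best = min(remaining, key=factor_size)
            joined = set().union(*(s for s in scopes if best in s)) - {best}
            scopes = [s for s in scopes if best not in s] + [joined]
            order.append(best)
            remaining.remove(best)
        return order

    # Alarm-style factors P(B), P(E), P(A|B,E), P(J|A), P(M|A): B and E are cheap
    # to eliminate first, while eliminating A first would create a large factor.
    scopes = [{"B"}, {"E"}, {"A", "B", "E"}, {"A", "J"}, {"A", "M"}]
    print(greedy_order(scopes, {v: 2 for v in "ABEJM"}, {"A", "B", "E"}))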


5. Likelihood Weighting and Gibbs Sampling Functionality

Each of our results below looks like it is in the right neighborhood. We give more explicit quality results in the problems that follow this one.
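For reference, a minimal Python sketch of likelihood weighting. The network representation (a topologically ordered list of (name, parents, cpt) triples) is a hypothetical illustration rather than the representation our code uses; the CPT numbers in the toy check are the standard textbook alarm-network values, which agree with our exact result from Section 1.

    import random
    from collections import defaultdict

    def likelihood_weighting(nodes, query, evidence, n_samples):
        # nodes: topologically ordered (name, parents, cpt) triples, where cpt maps
        # a tuple of parent values to a {value: probability} dictionary.
        weight_of = defaultdict(float)
        for _ in range(n_samples):
            sample, weight = {}, 1.0
            for name, parents, cpt in nodes:
                dist = cpt[tuple(sample[p] for p in parents)]
                if name in evidence:
                    sample[name] = evidence[name]
                    weight *= dist[evidence[name]]   # weight by likelihood of evidence
                else:
                    r, acc = random.random(), 0.0
                    for value, p in dist.items():    # sample from the conditional
                        acc += p
                        sample[name] = value
                        if r <= acc:
                            break
            weight_of[sample[query]] += weight
        total = sum(weight_of.values())
        return {value: w / total for value, w in weight_of.items()}

    # Toy check: estimate P(Burglary | JohnCalls = true, MaryCalls = true).
    nodes = [
        ("Burglary", (), {(): {True: 0.001, False: 0.999}}),
        ("Earthquake", (), {(): {True: 0.002, False: 0.998}}),
        ("Alarm", ("Burglary", "Earthquake"),
         {(True, True): {True: 0.95, False: 0.05},
          (True, False): {True: 0.94, False: 0.06},
          (False, True): {True: 0.29, False: 0.71},
          (False, False): {True: 0.001, False: 0.999}}),
        ("JohnCalls", ("Alarm",), {(True,): {True: 0.90, False: 0.10},
                                   (False,): {True: 0.05, False: 0.95}}),
        ("MaryCalls", ("Alarm",), {(True,): {True: 0.70, False: 0.30},
                                   (False,): {True: 0.01, False: 0.99}}),
    ]
    print(likelihood_weighting(nodes, "Burglary",
                               {"JohnCalls": True, "MaryCalls": True}, 10000))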

A. Basic Results – Likelihood Weighting

1. P(Burglary | JohnCalls = true, MaryCalls = true)

<[Burglary] = [false]> = 0.5448387970739699

<[Burglary] = [true]> = 0.4551612029260302

2. P(Earthquake | JohnCalls = true, Burglary = true)

<[Earthquake] = [false]> = 0.9997158283603297

<[Earthquake] = [true]> = 2.8417163967036946E-4

3. P(PropCost | Age = Adolescent, Antilock = False, Mileage = FiftyThou)

<[PropCost] = [HundredThou]> = 0.17105091038203132

<[PropCost] = [Million]> = 0.021563876240368398

<[PropCost] = [TenThou]> = 0.35877461270610517

<[PropCost] = [Thousand]> = 0.44861060067149516

4. P(PropCost | Age = Adolescent, Antilock = False, Mileage = FiftyThou, MakeModel = SportsCar)

<[PropCost] = [HundredThou]> = 0.16339257873401916

<[PropCost] = [Million]> = 0.030620517617711222

<[PropCost] = [TenThou]> = 0.35048331774243846

<[PropCost] = [Thousand]> = 0.4555035859058312

5. P(PropCost | Age = Adolescent, Antilock = False, Mileage = FiftyThou, GoodStudent = True)

<[PropCost] = [HundredThou]> = 0.20177159162635994

<[PropCost] = [Million]> = 0.032866049889275516

<[PropCost] = [TenThou]> = 0.30414914618811645

<[PropCost] = [Thousand]> = 0.46121321229624807

6. P(N112 | N64 = “3”, N113 = “1”, N116 = “0”)

<[N112] = [0]> = 0.9910128302117664

<[N112] = [1]> = 0.00898716978823346


7. P(N143 | N146 = “1”, N116 = “0”, N121 = “1”)

<[N143] = [0]> = 0.9172494563262301

<[N143] = [1]> = 0.08275054367376986

B. Basic Results – Gibbs Sampling

1. P(Burglary | JohnCalls = true, MaryCalls = true)

<[Burglary] = [false]> = 0.71

<[Burglary] = [true]> = 0.29

2. P(Earthquake | JohnCalls = true, Burglary = true)

<[Earthquake] = [false]> = 0.842

<[Earthquake] = [true]> = 0.158

3. P(PropCost | Age = Adolescent, Antilock = False, Mileage = FiftyThou)

<[PropCost] = [HundredThou]> = 0.06

<[PropCost] = [Million]> = 0.01

<[PropCost] = [TenThou]> = 0.355

<[PropCost] = [Thousand]> = 0.5750000000000001

4. P(PropCost | Age = Adolescent, Antilock = False, Mileage = FiftyThou, MakeModel = SportsCar)

<[PropCost] = [HundredThou]> = 0.09

<[PropCost] = [Million]> = 0.011

<[PropCost] = [TenThou]> = 0.34

<[PropCost] = [Thousand]> = 0.559

5. P(PropCost | Age = Adolescent, Antilock = False, Mileage = FiftyThou, GoodStudent = True)

<[PropCost] = [HundredThou]> = 0.213

<[PropCost] = [Million]> = 0.038

<[PropCost] = [TenThou]> = 0.372

<[PropCost] = [Thousand]> = 0.377

6. P(N112 | N64 = “3”, N113 = “1”, N116 = “0”)

<[N112] = [0]> = 0.97

<[N112] = [1]> = 0.03


7. P(N143 | N146 = “1”, N116 = “0”, N121 = “1”)

<[N143] = [0]> = 0.922

<[N143] = [1]> = 0.078

6. Ignoring Prefix of Samples in Gibbs Sampling

A. Results

[Chart: "Prefix Throwaway in Gibbs Sampling"; x-axis: Size of Prefix Thrown Away (0-1000), y-axis: KL Divergence (0 to 4.00E-03).]

Figure 6. Quality (KL divergence) of estimates produced by the Gibbs sampler. Each run used 2000 samples and threw away the first x samples, the independent variable expressed on the x-axis.

[Chart: "Prefix Throwaway in Gibbs Sampling"; x-axis: Size of Prefix Thrown Away (0-1000), y-axis: Average KL Divergence (0 to 1.20E-03).]

Figure 7. Averages for different prefix throwaway sizes from Figure 6.


B. Discussion

In this analysis, we ran the Gibbs sampler with 2000 samples on the same problem (Carpo – 1). For each iteration, we threw away a variable number of the first samples. The idea is that since Gibbs sampling is a Markov chain algorithm, each sample depends heavily on the samples before it. Since we choose a random initialization vector for each variable, it can take some "burn in" time before the algorithm begins to settle into the right global solution.

The results of our experiments are expressed in Figure 6 and Figure 7. We have a fairly nice characteristic curve, as can be seen in the average graph, with the only exception being when we threw away the first 600 samples. Looking at each run, however, at x = 600 there was a single outlier with an extremely high KL divergence; we can ignore it given the many runs that we did. It seems that the ideal "burn in" time, a trade-off between good initialization and diversity of counted samples, is 800 samples.
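A minimal sketch of how the prefix is discarded (the `gibbs_step` callable stands in for resampling one non-evidence variable from its Markov-blanket conditional; the interface is hypothetical, not our actual code):

    from collections import Counter

    def gibbs_estimate(gibbs_step, init_state, query, n_samples, burn_in):
        # Run the Gibbs chain for n_samples steps, counting the query variable's
        # value only after the first `burn_in` samples have been thrown away.
        # Assumes burn_in < n_samples.
        state = dict(init_state)
        counts = Counter()
        for i in range(n_samples):
            state = gibbs_step(state)
            if i >= burn_in:
                counts[state[query]] += 1
        kept = sum(counts.values())
        return {value: c / kept for value, c in counts.items()}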

7. Detailed Analysis – KL Divergences

A. Results

We present results indexed first by the algorithm (Likelihood Weighting, then Gibbs Sampling) and then by the problem. Within each problem we display two graphs: the first showing the results from ten iterations, and the second showing the KL divergence averaged across those iterations.
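The quality measure in the figures below is the KL divergence between the exact answer (e.g. as computed by variable elimination) and the sampled estimate. A minimal sketch, assuming the divergence is taken from the exact distribution to the estimate and that the estimate is non-zero wherever the exact distribution is:

    import math

    def kl_divergence(exact, estimate):
        # D(exact || estimate) for two discrete distributions over the same domain.
        return sum(p * math.log(p / estimate[v])
                   for v, p in exact.items() if p > 0)

    # Hypothetical check: exact P(Burglary | JohnCalls, MaryCalls) vs. a rough estimate.
    exact = {"true": 0.2842, "false": 0.7158}
    estimate = {"true": 0.29, "false": 0.71}
    print(kl_divergence(exact, estimate))   # a small positive number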


1. Likelihood Weighting

[Chart: "Likelihood Weighting - Problem Insurance1"; x-axis: Number of Samples, y-axis: KL Divergence (0 to 7.00E-02).]

Figure 3. KL Divergences when applying Likelihood Weighting to P(PropCost | Age = Adolescent, Antilock = False, Mileage = FiftyThou, MakeModel = SportsCar).

[Chart: "Likelihood Weighting: Average KL Divergence - Problem Insurance1"; x-axis: Number of Samples (100-2000), y-axis: KL Divergence (0-0.035).]

Figure 4. Average KL Divergence when applying Likelihood Weighting to P(PropCost | Age = Adolescent, Antilock = False, Mileage = FiftyThou, MakeModel = SportsCar) for sample sizes between 100 and 2000.


[Chart: "Likelihood Weighting - Problem Insurance2"; x-axis: Sample Size (100-2000), y-axis: KL Divergence (0 to 7.00E-02).]

Figure 8. KL Divergences when applying Likelihood Weighting to P(PropCost | Age = Adolescent, Antilock = False, Mileage = FiftyThou, GoodStudent = True).

[Chart: "Likelihood Weighting: Average KL Divergence - Problem Insurance2"; x-axis: Number of Samples (100-2000), y-axis: KL Divergence (0-0.02).]

Figure 9. Average KL Divergence when applying Likelihood Weighting to P(PropCost | Age = Adolescent, Antilock = False, Mileage = FiftyThou, GoodStudent = True).


[Chart: "Likelihood Weighting - Problem 3"; x-axis: Number of Samples (100-2000), y-axis: KL Divergence (0 to 8.00E-02).]

Figure 10. KL Divergences when applying Likelihood Weighting to P(N112 | N64 = "3", N113 = "1", N116 = "0").

[Chart: "Likelihood Weighting: Average KL Divergence - Problem Carpo1"; x-axis: Number of Samples (100-2000), y-axis: KL Divergence (0-0.05).]

Figure 11. Average KL Divergence when applying Likelihood Weighting to P(N112 | N64 = "3", N113 = "1", N116 = "0").


[Chart: "Likelihood Weighting - Problem 4"; x-axis: Number of Samples (100-2000), y-axis: KL Divergence (0 to 2.50E-02).]

Figure 12. KL Divergences when applying Likelihood Weighting to P(N143 | N146 = "1", N116 = "0", N121 = "1").

[Chart: "Likelihood Weighting: Average KL Divergence - Problem Carpo2"; x-axis: Number of Samples (100-2000), y-axis: KL Divergence (0-0.007).]

Figure 13. Average KL Divergence when applying Likelihood Weighting to P(N143 | N146 = "1", N116 = "0", N121 = "1").


2. Gibbs Sampling

[Chart: "Gibbs Sampling: KL Divergences vs Number of Samples for Problem 1"; x-axis: Number of Samples (1000-25000), y-axis: KL Divergence (0-1.2).]

Figure 14. Divergences resulting from Gibbs Sampling applied to P(PropCost | Age = Adolescent, Antilock = False, Mileage = FiftyThou, MakeModel = SportsCar) for sample sizes between 1000 and 25000.


[Chart: "Gibbs Sampling: Average KL Divergence vs Number of Samples for Problem 1"; x-axis: Number of Samples (1000-25000), y-axis: Average KL Divergence (0-0.4).]

Figure 15. Average divergence resulting from Gibbs Sampling applied to P(PropCost | Age = Adolescent, Antilock = False, Mileage = FiftyThou, MakeModel = SportsCar) for sample sizes between 1000 and 25000.

[Chart: "Gibbs Sampling: KL Divergences vs Number of Samples for Problem 2"; x-axis: Number of Samples (1000-25000), y-axis: KL Divergence (0-1.2).]

Figure 16. Divergences resulting from Gibbs Sampling applied to P(PropCost | Age = Adolescent, Antilock = False, Mileage = FiftyThou, GoodStudent = True) for sample sizes between 1000 and 25000.


[Chart: "Gibbs Sampling: Average KL Divergence vs Number of Samples for Problem 2"; x-axis: Number of Samples (1000-25000), y-axis: Average KL Divergence (0-0.25).]

Figure 17. Average divergence resulting from Gibbs Sampling applied to P(PropCost | Age = Adolescent, Antilock = False, Mileage = FiftyThou, GoodStudent = True) for sample sizes between 1000 and 25000.

[Chart: "Gibbs Sampling: KL Divergences vs Number of Samples for Problem 3"; x-axis: Number of Samples (1000-25000), y-axis: KL Divergence (0 to 4.00E-03).]

Figure 18. Divergences resulting from Gibbs Sampling applied to P(N112 | N64 = "3", N113 = "1", N116 = "0") for sample sizes between 1000 and 25000.


[Chart: "Gibbs Sampling: Average KL Divergence vs Number of Samples for Problem 3"; x-axis: Number of Samples (1000-25000), y-axis: Average KL Divergence (0 to 1.20E-03).]

Figure 19. Average divergence resulting from Gibbs Sampling applied to P(N112 | N64 = "3", N113 = "1", N116 = "0") for sample sizes between 1000 and 25000.

[Chart: "Gibbs Sampling: KL Divergences vs Number of Samples for Problem 4"; x-axis: Number of Samples (1000-25000), y-axis: KL Divergence (0-0.035).]

Figure 20. Divergences resulting from Gibbs Sampling applied to P(N143 | N146 = "1", N116 = "0", N121 = "1") for sample sizes between 1000 and 25000.


[Chart: "Gibbs Sampling: Average KL Divergence vs Number of Samples for Problem 4"; x-axis: Number of Samples (1000-25000), y-axis: Average KL Divergence (0-0.014).]

Figure 21. Average divergence resulting from Gibbs Sampling applied to P(N143 | N146 = "1", N116 = "0", N121 = "1") for sample sizes between 1000 and 25000.

B. Discussion of Results

Four interesting things:

1. Number of samples in Gibbs versus Likelihood Weighting
As seen from the figures in Section 7.A.1, likelihood weighting tends to converge after about 500 samples, and always by 1000 samples in our problems and analyses.

We originally assumed that Gibbs sampling would converge in about the same number of samples, if not fewer. It turns out that Gibbs takes much longer; it typically converges by 5000 samples, a full order of magnitude higher, as can be seen from the figures in Section 7.A.2. This is likely because of the Markov chain approach used: since each sample depends on the ones before it, it can take many iterations before the algorithm settles into the global optimum, whereas likelihood weighting assigns each independent sample its appropriate probability (i.e. weight) by construction.

2. Variance of time to converge can be high
The convergence of Likelihood Weighting in Problem 3, as illustrated in Figure 10 and Figure 11, exhibits very interesting properties. In the other problems, likelihood weighting runs tended to exhibit relatively low variance in time to convergence. Here, however, we see some runs which converged very quickly and others that took abnormally long. This high variance occurred very consistently in this problem, and thus is likely induced by some characteristic of the problem; one likely explanation is that our query variable is a leaf node in a very poly-tree-like network.

3. Convergence is logarithmic

This is an evident feature of all of the graphs, but it has enormous implications for the choice of algorithm.

The criterion for "completeness" of an algorithm is that it arrives at the right answer. For the sampling methods that we surveyed, it unfortunately takes infinite time to arrive at the exact answer. Variable elimination, by contrast, always arrives at the exact answer. Thus, if a user needs completeness (i.e. the exactly right answer), they should probably use variable elimination.

However, even if they only need a certain level of completeness, i.e. they want to be x% right, they still cannot rely on sampling methods to guarantee it; the best a sampling method can offer is to be x% correct y% of the time. We certainly see this from our graphs.

4. Local optima in Gibbs sampling, but not in Likelihood Weighting
This is a very interesting point. Under Gibbs sampling, in both problems 3 and 4 from Task 2, one of the runs does not converge to zero. Instead, it seems to converge to a local optimum (which is not the global optimum). This can be seen in the pink line in Figure 18 and the jungle green line in Figure 20.

This is probably more likely in some networks than others. We could probably construct a very simple network that would not provoke this behavior.

C. Computational Considerations – Sampling versus Variable Elimination

In comparing the computation time of sampling methods to variable elimination, we limit ourselves to discussion of greedy-ordering variable elimination, since random ordering is very sub-optimal (see Section 4).

It turns out that for the networks and queries that we considered, variable elimination is the champ on both accuracy and speed. As can be seen from Table 2, variable elimination finished in about a second or less on each problem, while Gibbs took about 15 seconds and Likelihood Weighting took around 5 seconds.

This is with 1000 samples for the sampling algorithms, and effectively infinitely many samples for variable elimination (which is exact).

Our results might have been different if the networks involved were much more dense (i.e. more highly connected) or much larger.


                      Task2.Insurance1  Task2.Insurance2  Task2.Carpo1  Task2.Carpo2
Variable Elimination  0.741             1.142             0.120         0.090
Gibbs Sampling        12.778            13.530            19.228        18.045
Likelihood Weighting  4.377             4.687             5.608         5.317

Table 2. Execution time (in seconds) of various algorithms on the four problems from Task 2.