Software Engineering Laboratory
Introduction of Bayesian Network
4/20/2005
CSE634 Data Mining, Prof. Anita Wasilewska
105269827 Hiroo Kusaba

References
[1] D. Heckerman: "A Tutorial on Learning with Bayesian Networks", in "Learning in Graphical Models", ed. M. I. Jordan, The MIT Press, 1998.
[2] http://www.cs.huji.ac.il/~nir/Nips01-Tutorial/
[3] Jiawei Han: "Data Mining: Concepts and Techniques", ISBN 1-55860-489-8
[4] J. Whittaker: "Graphical Models in Applied Multivariate Statistics", John Wiley and Sons, 1990.

Contents
- Brief introduction
- Review
  - A little review of probability
  - Bayes theorem
- Bayesian Classification
- Steps of using Bayesian Network

- Random variables: X, Y, X_i, Θ (capitals)
- Condition (or value) of a variable: x, y, x_i, θ (lowercase)
- Set of variables: X, Y, X_i, Θ (bold capitals)
- Set of conditions (or values): x, y, x_i, θ (bold lowercase)
- P(x | a), also written P(x / a): probability that an event x occurs (or happens) under the condition a

What is a Bayesian Network?
- A network (a directed acyclic graph) that expresses the dependencies among the random variables
- Each node has a conditional probability distribution that depends on its parent nodes, i.e. on preceding random variables
- The whole network also expresses the joint probability distribution over all of the random variables
- Pa_i denotes the parent(s) of node i

$$X = \{x_1, x_2, \ldots, x_n\}$$

$$p(X) = \prod_{i=1}^{n} p(x_i \mid Pa_i)$$

How is it used?
- Bayesian Learning
  - Estimating the dependencies between the random variables from actual data
- Bayesian Inference
  - When the values of some of the random variables are observed, it calculates the probabilities of the others
  - Example: with a patient's condition as random variables, it predicts the disease from the observed condition

What is so good about it?
- Conditional independencies and graphical expression capture the structure of many real-world distributions. [1]
- A learned model can be used for many tasks
- Supports all the features of probabilistic learning
  - Model selection criteria
  - Dealing with missing data and hidden variables

Example of Bayesian Network

Structure of a network: X → Y → Z

Conditional probability: X, Y, Z are random variables, each of which takes either 0 or 1. The network is specified by p(X), p(Y|X), p(Z|Y).

X  P(X)
0  0.5
1  0.5

X  Y  P(Y|X)
0  0  0.1
0  1  0.9
1  0  0.2
1  1  0.8

Y  Z  P(Z|Y)
0  0  0.3
0  1  0.7
1  0  0.4
1  1  0.6

Example of Bayesian Network 2

What is the joint probability P(X, Y, Z)?

P(X, Y, Z) = P(X) * P(Y|X) * P(Z|Y)

X  Y  Z  P(X,Y,Z)
0  0  0  0.015
0  0  1  0.035
0  1  0  0.180
0  1  1  0.270
1  0  0  0.030
1  0  1  0.070
1  1  0  0.160
1  1  1  0.240
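
The joint table can be reproduced mechanically from the three conditional probability tables of the example. The following is a minimal sketch, not part of the original slides: the dictionary layout and the function names `joint_table` and `posterior_x_given_z` are illustrative choices. It also shows a small inference step, P(X=1 | Z=1), obtained by summing the joint over Y.

```python
from itertools import product

# Conditional probability tables copied from the example network X -> Y -> Z.
p_x = {0: 0.5, 1: 0.5}
p_y_given_x = {(0, 0): 0.1, (0, 1): 0.9, (1, 0): 0.2, (1, 1): 0.8}  # key: (x, y)
p_z_given_y = {(0, 0): 0.3, (0, 1): 0.7, (1, 0): 0.4, (1, 1): 0.6}  # key: (y, z)


def joint_table():
    """Return {(x, y, z): P(x, y, z)} using P(X) * P(Y|X) * P(Z|Y)."""
    return {
        (x, y, z): p_x[x] * p_y_given_x[(x, y)] * p_z_given_y[(y, z)]
        for x, y, z in product((0, 1), repeat=3)
    }


def posterior_x_given_z(joint, x, z):
    """P(X = x | Z = z), obtained by summing the joint over Y."""
    numerator = sum(p for (xi, _, zi), p in joint.items() if xi == x and zi == z)
    evidence = sum(p for (_, _, zi), p in joint.items() if zi == z)
    return numerator / evidence


joint = joint_table()
for assignment, prob in sorted(joint.items()):
    print(assignment, round(prob, 3))
print("P(X=1 | Z=1) =", round(posterior_x_given_z(joint, 1, 1), 4))
```

Enumerating the full joint is only practical for very small networks such as this one; it is used here because it follows the factorization term by term.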

A little review of probability 1
- Probability: how likely is it that an event will happen?
- Sample space S
  - An element of S is an elementary event
  - An event A is a subset of S
- P(A) ≥ 0
- P(S) = 1

A little review of probability 2
- Discrete probability distribution: P(A) = Σ_{s ∈ A} P(s)
- Conditional probability distribution: P(A|B) = P(A, B) / P(B)
- If the events are independent: P(A, B) = P(A) * P(B)
- Bayes Theorem:

$$P(A \mid B) = \frac{P(A)\,P(B \mid A)}{P(B)} = \frac{P(A)\,P(B \mid A)}{P(A)\,P(B \mid A) + P(\bar{A})\,P(B \mid \bar{A})}$$
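
These definitions can be checked on a small concrete sample space. The sketch below is an illustration only: the fair six-sided die, the events A and B, and the helper names `prob` and `cond` are all assumptions introduced here, not taken from the slides. Exact fractions are used so the two routes to P(A|B) can be compared without rounding.

```python
from fractions import Fraction

# Hypothetical sample space: one roll of a fair six-sided die.
P = {s: Fraction(1, 6) for s in range(1, 7)}


def prob(event):
    """P(A) = sum of P(s) over the elementary events s in A."""
    return sum((P[s] for s in event), Fraction(0))


def cond(a, b):
    """P(A | B) = P(A, B) / P(B)."""
    return prob(a & b) / prob(b)


A = {2, 4, 6}            # "the roll is even"
B = {4, 5, 6}            # "the roll is greater than 3"
not_A = set(P) - A

# Direct definition of conditional probability.
direct = cond(A, B)

# Bayes theorem, with P(B) expanded over A and its complement.
p_b = prob(A) * cond(B, A) + prob(not_A) * cond(B, not_A)
via_bayes = prob(A) * cond(B, A) / p_b

print(direct, via_bayes)
```

Both routes give the same value, and the expanded denominator equals P(B) computed directly from the sample space.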

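Bayes Theorem

For events A_1, ..., A_n that are mutually exclusive and together cover the sample space:

$$P(A_i \mid B) = \frac{P(A_i)\,P(B \mid A_i)}{P(B)} = \frac{P(A_i)\,P(B \mid A_i)}{\sum_{j=1}^{n} P(A_j)\,P(B \mid A_j)}$$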

Example of Bayes Theorem
- You are about to be tested for a rare disease. How worried should you be if the test result is positive?
- Accuracy of the test: P(T) = 85%
- Chance of infection: P(I) = 0.01%
- What is P(I | not T)?

http://www.gametheory.net/Mike/applets/Bayes/Bayes.html
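
One way to work the question "how worried should you be after a positive result" is sketched below. It rests on assumptions that the slide does not state: the 85% accuracy is read as both the true-positive rate P(positive | infected) and the true-negative rate P(negative | not infected), and the function name `posterior_infected` is made up for this illustration.

```python
def posterior_infected(prior, sensitivity, specificity):
    """P(infected | positive test) by Bayes theorem."""
    true_pos = prior * sensitivity                    # P(I) * P(+ | I)
    false_pos = (1.0 - prior) * (1.0 - specificity)   # P(not I) * P(+ | not I)
    return true_pos / (true_pos + false_pos)


# Assumptions: "accuracy 85%" is used for both sensitivity and specificity;
# the 0.01% chance of infection is written as the prior 0.0001.
p = posterior_infected(prior=0.0001, sensitivity=0.85, specificity=0.85)
print(f"P(infected | positive) = {p:.6f}")
```

Under these assumptions the posterior stays well below 0.1%: the disease is so rare that false positives from the healthy majority far outnumber true positives.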

Bayesian Classification
- Suppose that there are m classes, C_1, C_2, ..., C_m.
- Given an unknown data sample X, the Bayesian classifier assigns X to the class C_i if and only if

$$P(C_i \mid X) > P(C_j \mid X) \quad \text{for } 1 \le j \le m,\ j \ne i$$

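- By Bayes theorem,

$$P(C_i \mid X) = \frac{P(C_i)\,P(X \mid C_i)}{P(X)}$$

- P(X) is the same for every class, so we only have to maximize P(X | C_i) P(C_i).
- In order to reduce computation, the assumption of class conditional independence is made:

$$P(X \mid C_i) = \prod_{k=1}^{n} P(x_k \mid C_i)$$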

Example of Bayesian Classification in the textbook [3]
- A customer is under 30, has "medium" income, is a student, and has a "fair" credit rating. Which category does the customer belong to: buy or not buy?
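
The classification rule can be sketched in a few lines by estimating P(C_i) and each P(x_k | C_i) from counts. The training table below is an assumption: it is the commonly reproduced 14-row version of the textbook's customer table and should be checked against [3] before the numbers are relied on. The function names `train` and `scores` are illustrative, and no smoothing is applied, so an attribute value never seen with a class gives that class a score of zero.

```python
from collections import Counter, defaultdict

# Assumed training table: (age, income, student, credit_rating, buys_computer).
# Verify against the table in [3]; it is not given in the slides.
DATA = [
    ("<=30", "high", "no", "fair", "no"),
    ("<=30", "high", "no", "excellent", "no"),
    ("31..40", "high", "no", "fair", "yes"),
    (">40", "medium", "no", "fair", "yes"),
    (">40", "low", "yes", "fair", "yes"),
    (">40", "low", "yes", "excellent", "no"),
    ("31..40", "low", "yes", "excellent", "yes"),
    ("<=30", "medium", "no", "fair", "no"),
    ("<=30", "low", "yes", "fair", "yes"),
    (">40", "medium", "yes", "fair", "yes"),
    ("<=30", "medium", "yes", "excellent", "yes"),
    ("31..40", "medium", "no", "excellent", "yes"),
    ("31..40", "high", "yes", "fair", "yes"),
    (">40", "medium", "no", "excellent", "no"),
]


def train(rows):
    """Count class frequencies and attribute-value frequencies per class."""
    class_counts = Counter(row[-1] for row in rows)
    value_counts = defaultdict(Counter)  # (attribute index, class) -> value counts
    for *attrs, label in rows:
        for k, value in enumerate(attrs):
            value_counts[(k, label)][value] += 1
    return class_counts, value_counts


def scores(sample, class_counts, value_counts):
    """Return {class: P(X | C_i) * P(C_i)} under class conditional independence."""
    total = sum(class_counts.values())
    result = {}
    for label, n in class_counts.items():
        score = n / total                                   # P(C_i)
        for k, value in enumerate(sample):
            score *= value_counts[(k, label)][value] / n    # P(x_k | C_i)
        result[label] = score
    return result


class_counts, value_counts = train(DATA)
sample = ("<=30", "medium", "yes", "fair")
result = scores(sample, class_counts, value_counts)
prediction = max(result, key=result.get)
print(result, prediction)
```

With this assumed table the score for "yes" is larger than the score for "no", so the customer is assigned to the buying class.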

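Bayesian Network
- A network that expresses the dependencies among the random variables
- The whole network also expresses the joint probability distribution over all of the random variables
- Pa_i denotes the parent(s) of node i

$$X = \{x_1, x_2, \ldots, x_n\}$$

$$p(X) = \prod_{i=1}^{n} p(x_i \mid Pa_i)$$

Example structure: X → Y → Z

By the chain rule of probability:

$$p(\mathbf{x}) = \prod_{i=1}^{n} p(x_i \mid x_1, x_2, \ldots, x_{i-1})$$

In a Bayesian network each factor simplifies to:

$$p(x_i \mid x_1, x_2, \ldots, x_{i-1}) = p(x_i \mid Pa_i)$$

Pa_i is a subset of {x_1, ..., x_{i-1}}.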
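
As a worked instance, take the chain X → Y → Z from the earlier example, where X has no parent, the parent of Y is X, and the parent of Z is Y. Applying the chain rule in the order X, Y, Z and then replacing each conditioning set by the parents gives:

```latex
\begin{aligned}
p(x, y, z) &= p(x)\, p(y \mid x)\, p(z \mid x, y) && \text{(chain rule, order } X, Y, Z\text{)} \\
           &= p(x)\, p(y \mid x)\, p(z \mid y)    && \text{(since } Pa_Z = \{Y\}\text{)}
\end{aligned}
```

The second line is exactly the product P(X) * P(Y|X) * P(Z|Y) used to fill in the joint table of that example.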

Steps to apply Bayesian Network
- Step 1: Create a Bayesian Belief Network
  - Include all the variables that are important in your system
  - Use causal knowledge to guide the connections made in the graph
  - Use your prior knowledge to specify the conditional distributions
- Step 2: Calculate the p(x_i | Pa_i) for your goal

Example from [1]
- An example of building a BN from prior knowledge: a BN to detect credit card fraud
- Define the random variables
  - Fraud (F): whether the current use of the card is fraudulent
  - Gas (G): whether gas was bought in the last 24 hours
  - Jewelry (J): whether jewelry was bought in the last 24 hours
  - Age (A): age of the owner of the card
  - Sex (S): gender of the owner of the card

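- Give an ordering to the random variables (here F, A, S, G, J)
- Define the dependencies, but you have to be careful.

First set of dependencies:

$$\begin{aligned}
p(a \mid f) &= p(a) \\
p(s \mid f, a) &= p(s) \\
p(g \mid f, a, s) &= p(g \mid f) \\
p(j \mid f, a, s, g) &= p(j \mid f, a, s)
\end{aligned}$$

(Figure: network over the nodes F, G, J, S, A. These equations imply the edges F → G, F → J, A → J, S → J.)

Second set of dependencies:

$$\begin{aligned}
p(a \mid f) &= p(a \mid f) \\
p(s \mid f, a) &= p(s) \\
p(g \mid f, a, s) &= p(g \mid f) \\
p(j \mid f, a, s, g) &= p(j \mid f)
\end{aligned}$$

(Figure: second network over the same nodes. These equations imply the edges F → A, F → G, F → J.)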
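
The link between an ordering, the parent sets, and the resulting factorization can be sketched as follows. The parent sets are read off the first set of equations in this example; the names `ORDER`, `PARENTS`, `check_order` and `factorization` are illustrative and not from [1].

```python
# Parent sets read off the first set of equations (ordering F, A, S, G, J).
ORDER = ["f", "a", "s", "g", "j"]
PARENTS = {"f": [], "a": [], "s": [], "g": ["f"], "j": ["f", "a", "s"]}


def check_order(order, parents):
    """Every parent must come earlier in the ordering than its child."""
    position = {v: i for i, v in enumerate(order)}
    return all(position[p] < position[v] for v in order for p in parents[v])


def factorization(order, parents):
    """Render p(x) = prod_i p(x_i | Pa_i) as a string."""
    terms = []
    for v in order:
        pa = parents[v]
        terms.append(f"p({v}|{','.join(pa)})" if pa else f"p({v})")
    return " ".join(terms)


assert check_order(ORDER, PARENTS)
print(factorization(ORDER, PARENTS))
```

An ordering that places J first fails `check_order` for these parent sets, which is one concrete way in which the choice of ordering constrains the dependencies that can be declared.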

Next topic
- Training with Bayesian Network
- Bayes Inference
  - If the training data is complete
  - If the training data is missing
- Network Evaluation

Thank you for listening.