COSC 522 – Machine Learning: Bayesian Decision Theory


COSC 522 – Machine Learning

Bayesian Decision Theory

Hairong Qi, Gonzalez Family Professor
Electrical Engineering and Computer Science
University of Tennessee, Knoxville
https://www.eecs.utk.edu/people/hairong-qi/
Email: hqi@utk.edu

1

Course Website: http://web.eecs.utk.edu/~hqi/cosc522/

Recap from Previous Lecture

• The differences among DL, ML, and AI
• The evolution of AI and ML – the three waves/eras
• The different ways of categorizing ML algorithms
  – Supervised vs. Unsupervised
  – Linear vs. Nonlinear
  – Parametric (model-based) vs. Nonparametric (modeless)

2

Questions
• What is supervised learning (vs. unsupervised learning)?
• What is the difference between the training set and the test set?
• What is the difference between classification and regression?
• What are features and samples?
• What is a dimension?
• What is a histogram?
• What is a pdf?
• What is Bayes' Formula?
• What is a conditional pdf?
• What is the difference between prior probability and posterior probability?
• What is the Bayesian decision rule, or MPP?
• What are decision regions?
• How to calculate the conditional probability of error and the overall probability of error?
• What are the cost function (or objective function) and the optimization method?

3

terminologies

the Formula

decision rule

The Toy Example

• Student (taking COSC 522 in F21) gender classification
• Feature: height (1-D); if adding weight, then 2-D
• Data:

– Training set: For half of the class, we take measurement of their height and record the student’s gender

– Testing set: For the other half of the class, given height information, determine the student’s gender

4

Terminologies

• Supervised learning:
  – Training data vs. testing data vs. validation data
  – Training: given input-output pairs
• Features (e.g., height)
• Samples
• Dimensions
• Classification vs. Regression

5

Example 1 – 2-D Feature

6

height  weight  label
5’5”    140 lb  1
5’6”    150 lb  2
5’4”    130 lb  1
5’7”    160 lb  2
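To make the terminology concrete, this 2-D example can be written as a feature matrix with one row per sample and one column per dimension. A minimal sketch (the NumPy representation and the inch conversion are my additions, not from the slides):

```python
import numpy as np

# The 2-D example as a feature matrix: one row per sample, one column per
# dimension (heights converted to inches; values taken from the table above).
X = np.array([[65.0, 140.0],   # 5'5", 140 lb
              [66.0, 150.0],   # 5'6", 150 lb
              [64.0, 130.0],   # 5'4", 130 lb
              [67.0, 160.0]])  # 5'7", 160 lb
y = np.array([1, 2, 1, 2])     # class labels

print("samples:", X.shape[0], "dimensions:", X.shape[1])  # samples: 4 dimensions: 2
```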

Questions (revisited; same list as slide 3)

7

terminologies

the Formula

decision rule

Example 2 – 1-D feature

8

height  label
5’5”    1
5’6”    2
5’4”    1
5’7”    2
5’5”    1
5’8”    2
5’6”    1
5’7”    2
5’7”    2

From Histogram to Probability Density Function (pdf)

9

[Figure: histogram and the corresponding pdf, plotted against x]
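The step from histogram to pdf can be sketched in code: build a normalized histogram of the samples, or fit a parametric density such as a Gaussian. A minimal sketch, assuming made-up height values and a Gaussian model (neither is given on the slide):

```python
import numpy as np

# Hypothetical 1-D height samples (inches) for one class; values are made up.
heights = np.array([65, 66, 64, 67, 65, 68, 66, 67, 67], dtype=float)

# Nonparametric view: a histogram normalized so that its bars integrate to 1.
counts, bin_edges = np.histogram(heights, bins=4, density=True)

# Parametric view: fit a Gaussian pdf by matching the sample mean and variance.
mu, sigma = heights.mean(), heights.std(ddof=1)

def gaussian_pdf(x, mu, sigma):
    """Value of the N(mu, sigma^2) density at x."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

print("normalized histogram:", counts)
print("Gaussian pdf at 66 inches:", gaussian_pdf(66.0, mu, sigma))
```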

Q&A Session - Looking into Gaussian• Two classes with one intersection?

• Two classes with no intersection?

• Two classes with two intersections?

10
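One way to answer these questions concretely is to solve for where the two scaled class-conditional Gaussians $p(x \mid \omega_1)P(\omega_1)$ and $p(x \mid \omega_2)P(\omega_2)$ cross: taking logs of both sides gives a quadratic in $x$, so there can be zero, one (equal variances), or two intersections. A sketch with made-up means, variances, and priors:

```python
import numpy as np

def gaussian_intersections(mu1, s1, P1, mu2, s2, P2):
    """Solve p(x|w1)P1 = p(x|w2)P2 for two 1-D Gaussians.
    Taking logs of both sides gives a quadratic a*x^2 + b*x + c = 0."""
    a = 1.0 / (2 * s2**2) - 1.0 / (2 * s1**2)
    b = mu1 / s1**2 - mu2 / s2**2
    c = (mu2**2 / (2 * s2**2) - mu1**2 / (2 * s1**2)
         + np.log(P1 / P2) + np.log(s2 / s1))
    if np.isclose(a, 0.0):                      # equal variances: linear equation
        return [] if np.isclose(b, 0.0) else [-c / b]
    disc = b**2 - 4 * a * c
    if disc < 0:
        return []                               # no real intersection
    r = np.sqrt(disc)
    return sorted([(-b - r) / (2 * a), (-b + r) / (2 * a)])

# Unequal variances -> typically two intersections
print(gaussian_intersections(mu1=64.0, s1=2.0, P1=0.5, mu2=68.0, s2=3.0, P2=0.5))
# Equal variances -> a single intersection (midpoint of the means here)
print(gaussian_intersections(mu1=64.0, s1=2.5, P1=0.5, mu2=68.0, s2=2.5, P2=0.5))
```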

Bayes’ Formula (Bayes’ Rule)

11

$$P(\omega_j \mid x) = \frac{p(x \mid \omega_j)\, P(\omega_j)}{p(x)}$$

where
• $P(\omega_j \mid x)$ is the posterior probability (a-posteriori probability)
• $p(x \mid \omega_j)$ is the conditional probability density function (likelihood)
• $P(\omega_j)$ is the prior probability (a-priori probability), which comes from domain knowledge
• $p(x)$ is the normalization constant (evidence):

$$p(x) = \sum_{j=1}^{c} p(x \mid \omega_j)\, P(\omega_j)$$
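Bayes' formula translates directly into code. The sketch below assumes two Gaussian class-conditional densities with made-up parameters and equal priors (none of these numbers come from the slides):

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """N(mu, sigma^2) density evaluated at x."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Placeholder likelihoods p(x|w_j) and priors P(w_j) for c = 2 classes.
likelihoods = [lambda x: gaussian_pdf(x, 64.0, 2.5),   # p(x | w1)
               lambda x: gaussian_pdf(x, 69.0, 2.5)]   # p(x | w2)
priors = [0.5, 0.5]                                    # from domain knowledge

def posteriors(x):
    """P(w_j | x) = p(x|w_j) P(w_j) / p(x), with p(x) the normalizing evidence."""
    joint = np.array([p(x) * P for p, P in zip(likelihoods, priors)])
    evidence = joint.sum()          # p(x) = sum_j p(x|w_j) P(w_j)
    return joint / evidence

print(posteriors(66.0))             # the two posteriors sum to 1
```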

Q&A Session

• How do you interpret prior probability in the toy example?

12

Questions (revisited; same list as slide 3)

13

the Formula

terminologies

decision rule

Bayes Decision Rule

14

Maximum Posterior Probability (MPP): for a given $x$, if $P(\omega_1 \mid x) > P(\omega_2 \mid x)$, then $x$ belongs to class 1; otherwise, class 2, where

$$P(\omega_j \mid x) = \frac{p(x \mid \omega_j)\, P(\omega_j)}{p(x)}$$

[Figure: the curves $p(x \mid \omega_1)P(\omega_1)$ and $p(x \mid \omega_2)P(\omega_2)$ plotted against $x$]
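Under MPP we pick the class whose posterior is largest; since the evidence $p(x)$ is common to all classes, comparing $p(x \mid \omega_j)P(\omega_j)$ is enough. A minimal sketch with the same kind of hypothetical Gaussian likelihoods as above:

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Placeholder class-conditional densities and priors (illustrative values).
likelihoods = [lambda x: gaussian_pdf(x, 64.0, 2.5),   # p(x | w1)
               lambda x: gaussian_pdf(x, 69.0, 2.5)]   # p(x | w2)
priors = [0.5, 0.5]

def mpp_decide(x):
    """Maximum Posterior Probability rule: choose argmax_j p(x|w_j) P(w_j).
    Dividing by p(x) would not change the argmax, so it is omitted."""
    scores = [p(x) * P for p, P in zip(likelihoods, priors)]
    return int(np.argmax(scores)) + 1          # class labels 1, 2

print(mpp_decide(63.0), mpp_decide(68.0))      # -> 1 2
```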

15

Decision Regions

The effect of any decision rule is to partition the feature space into $c$ decision regions $\Re_1, \Re_2, \ldots, \Re_c$.

[Figure: $p(x \mid \omega_1)P(\omega_1)$ and $p(x \mid \omega_2)P(\omega_2)$ plotted against $x$, with the axis partitioned into regions labeled $\Re_1$ and $\Re_2$ (with $\Re_1$ appearing on both sides)]
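The decision regions can be traced numerically by applying the MPP decision on a grid of $x$ values and noting where the chosen class changes; with unequal variances one class's region can split into two disjoint intervals, as the figure suggests. A sketch with made-up parameters:

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Hypothetical parameters: class 2 has the larger variance, so one class's
# region can show up as two disjoint intervals.
params = [(64.0, 2.0, 0.5),        # (mu, sigma, prior) for w1
          (68.0, 3.0, 0.5)]        # (mu, sigma, prior) for w2

def decide(x):
    return int(np.argmax([gaussian_pdf(x, m, s) * P for m, s, P in params])) + 1

# Sweep a grid and report the (approximate) intervals of constant decision.
xs = np.linspace(50.0, 85.0, 3501)
labels = np.array([decide(x) for x in xs])
changes = np.where(np.diff(labels) != 0)[0]
boundaries = [50.0] + [float(xs[i]) for i in changes] + [85.0]
for lo, hi in zip(boundaries[:-1], boundaries[1:]):
    mid = 0.5 * (lo + hi)
    print(f"region R{decide(mid)}: {lo:.2f} .. {hi:.2f}")
```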

Conditional Probability of Error

16

$$P(\text{error} \mid x) =
\begin{cases}
P(\omega_1 \mid x) & \text{if we decide } \omega_2 \\
P(\omega_2 \mid x) & \text{if we decide } \omega_1
\end{cases}
= \min\left[\, P(\omega_1 \mid x),\; P(\omega_2 \mid x) \,\right]$$

with $P(\omega_j \mid x) = \dfrac{p(x \mid \omega_j)\, P(\omega_j)}{p(x)}$.

[Figure: $p(x \mid \omega_1)P(\omega_1)$ and $p(x \mid \omega_2)P(\omega_2)$ plotted against $x$]
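In code, the conditional probability of error at a given $x$ is simply the smaller of the two posteriors, since MPP always picks the larger one. A small sketch (hypothetical Gaussians again):

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

params = [(64.0, 2.5, 0.5), (69.0, 2.5, 0.5)]   # hypothetical (mu, sigma, prior)

def posterior(x):
    joint = np.array([gaussian_pdf(x, m, s) * P for m, s, P in params])
    return joint / joint.sum()

def p_error_given_x(x):
    """P(error | x) = min[P(w1|x), P(w2|x)] when we always pick the larger posterior."""
    return float(posterior(x).min())

for x in (62.0, 66.5, 71.0):
    print(x, round(p_error_given_x(x), 4))      # largest error (0.5) at the midpoint
```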

Overall Probability of Error

17

$$P(\text{error}) = \int_{\Re_1} P(\omega_2 \mid x)\, p(x)\, dx + \int_{\Re_2} P(\omega_1 \mid x)\, p(x)\, dx$$

Or, written as the unconditional risk (unconditional probability of error),

$$P(\text{error}) = \int_{-\infty}^{\infty} P(\text{error}, x)\, dx = \int_{-\infty}^{\infty} P(\text{error} \mid x)\, p(x)\, dx$$

The two region integrals are the joint probabilities $P(\text{error}, \omega_2)$ and $P(\text{error}, \omega_1)$, so the overall error splits by true class.

[Figure: $p(x \mid \omega_1)P(\omega_1)$ and $p(x \mid \omega_2)P(\omega_2)$ plotted against $x$, with the axis partitioned into regions $\Re_1$ and $\Re_2$]
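Numerically, the overall probability of error is the integral of $\min_j p(x \mid \omega_j)P(\omega_j)$ over $x$ when the regions are chosen by MPP. A sketch that approximates it with a Riemann sum on a grid (parameters are made up):

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Hypothetical (mu, sigma, prior) for the two classes.
params = [(64.0, 2.5, 0.5), (69.0, 2.5, 0.5)]

xs = np.linspace(45.0, 90.0, 20001)   # grid wide enough to cover both densities
dx = xs[1] - xs[0]

# joint[j] = p(x|w_j) P(w_j) on the grid; summing over j gives the evidence p(x).
joint = np.array([gaussian_pdf(xs, m, s) * P for m, s, P in params])

# Under MPP, P(error|x) p(x) = min_j p(x|w_j) P(w_j) at every x.
integrand = joint.min(axis=0)

# Riemann-sum approximation of the overall (Bayes) probability of error;
# for these parameters it comes out near 0.159.
print("overall P(error) ≈", round(float(integrand.sum() * dx), 4))
```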

How Does It Work Altogether?

18

Training Set:

height  label
5’5”    1
5’6”    2
5’4”    1
5’7”    2
5’5”    1
5’8”    2
5’6”    1
5’7”    2
5’7”    2

Testing Sample: a new height measurement whose gender label is to be decided.
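Putting the pieces together on the training table above: estimate each class's prior from its label counts, fit a 1-D Gaussian to each class's heights (a parametric modeling choice of mine; the slide only shows the raw data), and classify a test height by MPP. The test height of 66 inches is arbitrary, since the slide leaves the testing sample unspecified:

```python
import numpy as np

# Training set from the slide, with heights converted to inches.
heights = np.array([65, 66, 64, 67, 65, 68, 66, 67, 67], dtype=float)
labels  = np.array([ 1,  2,  1,  2,  1,  2,  1,  2,  2])

def gaussian_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# "Training": estimate a prior and a Gaussian (mu, sigma) per class.
model = {}
for c in (1, 2):
    xc = heights[labels == c]
    model[c] = (xc.mean(), xc.std(ddof=1), len(xc) / len(heights))

def classify(x):
    """MPP: return the class with the largest p(x|w_c) P(w_c)."""
    scores = {c: gaussian_pdf(x, mu, sigma) * prior
              for c, (mu, sigma, prior) in model.items()}
    return max(scores, key=scores.get)

# "Testing": decide the gender label for a new height, e.g. 5'6" = 66 inches.
test_height = 66.0        # arbitrary testing sample, not specified on the slide
print("estimated model:", model)
print(f"decision for {test_height} in:", classify(test_height))
```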

Questions (revisited; same list as slide 3)

19

Q&A Session

• What is the cost function?
• What is the optimization approach we use to find the optimal solution to the cost function?

20

Recap

21

Maximum Posterior Probability (MPP): for a given $x$, if $P(\omega_1 \mid x) > P(\omega_2 \mid x)$, then $x$ belongs to class 1; otherwise, class 2, where

$$P(\omega_j \mid x) = \frac{p(x \mid \omega_j)\, P(\omega_j)}{p(x)}$$

Overall probability of error:

$$P(\text{error}) = \int_{\Re_1} P(\omega_2 \mid x)\, p(x)\, dx + \int_{\Re_2} P(\omega_1 \mid x)\, p(x)\, dx$$