COSC 522 – Machine Learning: Bayesian Decision Theory


COSC 522 – Machine Learning

Bayesian Decision Theory

Hairong Qi, Gonzalez Family Professor
Electrical Engineering and Computer Science
University of Tennessee, Knoxville
https://www.eecs.utk.edu/people/hairong-qi/
Email: hqi@utk.edu

1

Course Website: http://web.eecs.utk.edu/~hqi/cosc522/

Recap from Previous Lecture

• The differences among DL, ML, and AI
• The evolution of AI and ML – the three waves/eras
• The different ways of categorizing ML algorithms
  – Supervised vs. Unsupervised
  – Linear vs. Nonlinear
  – Parametric (model-based) vs. Nonparametric (modeless)

2

Questions
• What is supervised learning (vs. unsupervised learning)?
• What is the difference between the training set and the test set?
• What is the difference between classification and regression?
• What are features and samples?
• What is a dimension?
• What is a histogram?
• What is a pdf?
• What is Bayes' Formula?
• What is a conditional pdf?
• What is the difference between prior probability and posterior probability?
• What is the Bayesian decision rule, or MPP?
• What are decision regions?
• How to calculate the conditional probability of error and the overall probability of error?
• What are the cost function (or objective function) and the optimization method?

3

terminologies

the Formula

decision rule

The Toy Example

• Student (taking COSC 522 in F21) gender classification
• Feature: height (1-D); if adding weight, then 2-D
• Data:

– Training set: For half of the class, we take measurement of their height and record the student’s gender

– Testing set: For the other half of the class, given height information, determine the student’s gender

4

Terminologies

• Supervised learning:
  – Training data vs. testing data vs. validation data
  – Training: given input-output pairs
• Features (e.g., height)
• Samples
• Dimensions
• Classification vs. Regression

5

Example 1 – 2-D Feature

6

height  weight  label
5’5”    140 lb  1
5’6”    150 lb  2
5’4”    130 lb  1
5’7”    160 lb  2
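To make the terminology concrete, this 2-D example can be written as a feature matrix with one row per sample and one column per dimension. A minimal sketch (the NumPy representation and the inch conversion are my additions, not from the slides):

```python
import numpy as np

# The 2-D example as a feature matrix: one row per sample, one column per
# dimension (heights converted to inches; values taken from the table above).
X = np.array([[65.0, 140.0],   # 5'5", 140 lb
              [66.0, 150.0],   # 5'6", 150 lb
              [64.0, 130.0],   # 5'4", 130 lb
              [67.0, 160.0]])  # 5'7", 160 lb
y = np.array([1, 2, 1, 2])     # class labels

print("samples:", X.shape[0], "dimensions:", X.shape[1])  # samples: 4 dimensions: 2
```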

Questions (revisited; same list as slide 3)

7

terminologies

the Formula

decision rule

Example 2 – 1-D feature

8

height  label
5’5”    1
5’6”    2
5’4”    1
5’7”    2
5’5”    1
5’8”    2
5’6”    1
5’7”    2
5’7”    2

From Histogram to Probability Density Function (pdf)

9

[Figure: histogram and the corresponding pdf, plotted against x]
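The step from histogram to pdf can be sketched in code: build a normalized histogram of the samples, or fit a parametric density such as a Gaussian. A minimal sketch, assuming made-up height values and a Gaussian model (neither is given on the slide):

```python
import numpy as np

# Hypothetical 1-D height samples (inches) for one class; values are made up.
heights = np.array([65, 66, 64, 67, 65, 68, 66, 67, 67], dtype=float)

# Nonparametric view: a histogram normalized so that its bars integrate to 1.
counts, bin_edges = np.histogram(heights, bins=4, density=True)

# Parametric view: fit a Gaussian pdf by matching the sample mean and variance.
mu, sigma = heights.mean(), heights.std(ddof=1)

def gaussian_pdf(x, mu, sigma):
    """Value of the N(mu, sigma^2) density at x."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

print("normalized histogram:", counts)
print("Gaussian pdf at 66 inches:", gaussian_pdf(66.0, mu, sigma))
```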

Q&A Session - Looking into Gaussian• Two classes with one intersection?

• Two classes with no intersection?

• Two classes with two intersections?

10
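One way to answer these questions concretely is to solve for where the two scaled class-conditional Gaussians $p(x \mid \omega_1)P(\omega_1)$ and $p(x \mid \omega_2)P(\omega_2)$ cross: taking logs of both sides gives a quadratic in $x$, so there can be zero, one (equal variances), or two intersections. A sketch with made-up means, variances, and priors:

```python
import numpy as np

def gaussian_intersections(mu1, s1, P1, mu2, s2, P2):
    """Solve p(x|w1)P1 = p(x|w2)P2 for two 1-D Gaussians.
    Taking logs of both sides gives a quadratic a*x^2 + b*x + c = 0."""
    a = 1.0 / (2 * s2**2) - 1.0 / (2 * s1**2)
    b = mu1 / s1**2 - mu2 / s2**2
    c = (mu2**2 / (2 * s2**2) - mu1**2 / (2 * s1**2)
         + np.log(P1 / P2) + np.log(s2 / s1))
    if np.isclose(a, 0.0):                      # equal variances: linear equation
        return [] if np.isclose(b, 0.0) else [-c / b]
    disc = b**2 - 4 * a * c
    if disc < 0:
        return []                               # no real intersection
    r = np.sqrt(disc)
    return sorted([(-b - r) / (2 * a), (-b + r) / (2 * a)])

# Unequal variances -> typically two intersections
print(gaussian_intersections(mu1=64.0, s1=2.0, P1=0.5, mu2=68.0, s2=3.0, P2=0.5))
# Equal variances -> a single intersection (midpoint of the means here)
print(gaussian_intersections(mu1=64.0, s1=2.5, P1=0.5, mu2=68.0, s2=2.5, P2=0.5))
```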

Bayes’ Formula (Bayes’ Rule)

11

$$P(\omega_j \mid x) = \frac{p(x \mid \omega_j)\, P(\omega_j)}{p(x)}$$

where
• $P(\omega_j \mid x)$ is the posterior probability (a-posteriori probability)
• $p(x \mid \omega_j)$ is the conditional probability density function (likelihood)
• $P(\omega_j)$ is the prior probability (a-priori probability), which comes from domain knowledge
• $p(x)$ is the normalization constant (evidence):

$$p(x) = \sum_{j=1}^{c} p(x \mid \omega_j)\, P(\omega_j)$$
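Bayes' formula translates directly into code. The sketch below assumes two Gaussian class-conditional densities with made-up parameters and equal priors (none of these numbers come from the slides):

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """N(mu, sigma^2) density evaluated at x."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Placeholder likelihoods p(x|w_j) and priors P(w_j) for c = 2 classes.
likelihoods = [lambda x: gaussian_pdf(x, 64.0, 2.5),   # p(x | w1)
               lambda x: gaussian_pdf(x, 69.0, 2.5)]   # p(x | w2)
priors = [0.5, 0.5]                                    # from domain knowledge

def posteriors(x):
    """P(w_j | x) = p(x|w_j) P(w_j) / p(x), with p(x) the normalizing evidence."""
    joint = np.array([p(x) * P for p, P in zip(likelihoods, priors)])
    evidence = joint.sum()          # p(x) = sum_j p(x|w_j) P(w_j)
    return joint / evidence

print(posteriors(66.0))             # the two posteriors sum to 1
```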

Q&A Session

• How do you interpret prior probability in the toy example?

12

Questions (revisited; same list as slide 3)

13

the Formula

terminologies

decision rule

Bayes Decision Rule

14

Maximum Posterior Probability (MPP): for a given $x$, if $P(\omega_1 \mid x) > P(\omega_2 \mid x)$, then $x$ belongs to class 1; otherwise, class 2, where

$$P(\omega_j \mid x) = \frac{p(x \mid \omega_j)\, P(\omega_j)}{p(x)}$$

[Figure: the curves $p(x \mid \omega_1)P(\omega_1)$ and $p(x \mid \omega_2)P(\omega_2)$ plotted against $x$]
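Under MPP we pick the class whose posterior is largest; since the evidence $p(x)$ is common to all classes, comparing $p(x \mid \omega_j)P(\omega_j)$ is enough. A minimal sketch with the same kind of hypothetical Gaussian likelihoods as above:

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Placeholder class-conditional densities and priors (illustrative values).
likelihoods = [lambda x: gaussian_pdf(x, 64.0, 2.5),   # p(x | w1)
               lambda x: gaussian_pdf(x, 69.0, 2.5)]   # p(x | w2)
priors = [0.5, 0.5]

def mpp_decide(x):
    """Maximum Posterior Probability rule: choose argmax_j p(x|w_j) P(w_j).
    Dividing by p(x) would not change the argmax, so it is omitted."""
    scores = [p(x) * P for p, P in zip(likelihoods, priors)]
    return int(np.argmax(scores)) + 1          # class labels 1, 2

print(mpp_decide(63.0), mpp_decide(68.0))      # -> 1 2
```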

15

Decision Regions

The effect of any decision rule is to partition the feature space into $c$ decision regions $\Re_1, \Re_2, \ldots, \Re_c$.

[Figure: $p(x \mid \omega_1)P(\omega_1)$ and $p(x \mid \omega_2)P(\omega_2)$ plotted against $x$, with the axis partitioned into regions labeled $\Re_1$ and $\Re_2$ (with $\Re_1$ appearing on both sides)]
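The decision regions can be traced numerically by applying the MPP decision on a grid of $x$ values and noting where the chosen class changes; with unequal variances one class's region can split into two disjoint intervals, as the figure suggests. A sketch with made-up parameters:

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Hypothetical parameters: class 2 has the larger variance, so one class's
# region can show up as two disjoint intervals.
params = [(64.0, 2.0, 0.5),        # (mu, sigma, prior) for w1
          (68.0, 3.0, 0.5)]        # (mu, sigma, prior) for w2

def decide(x):
    return int(np.argmax([gaussian_pdf(x, m, s) * P for m, s, P in params])) + 1

# Sweep a grid and report the (approximate) intervals of constant decision.
xs = np.linspace(50.0, 85.0, 3501)
labels = np.array([decide(x) for x in xs])
changes = np.where(np.diff(labels) != 0)[0]
boundaries = [50.0] + [float(xs[i]) for i in changes] + [85.0]
for lo, hi in zip(boundaries[:-1], boundaries[1:]):
    mid = 0.5 * (lo + hi)
    print(f"region R{decide(mid)}: {lo:.2f} .. {hi:.2f}")
```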

Conditional Probability of Error

16

$$P(\text{error} \mid x) =
\begin{cases}
P(\omega_1 \mid x) & \text{if we decide } \omega_2 \\
P(\omega_2 \mid x) & \text{if we decide } \omega_1
\end{cases}
= \min\left[\, P(\omega_1 \mid x),\; P(\omega_2 \mid x) \,\right]$$

with $P(\omega_j \mid x) = \dfrac{p(x \mid \omega_j)\, P(\omega_j)}{p(x)}$.

[Figure: $p(x \mid \omega_1)P(\omega_1)$ and $p(x \mid \omega_2)P(\omega_2)$ plotted against $x$]
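In code, the conditional probability of error at a given $x$ is simply the smaller of the two posteriors, since MPP always picks the larger one. A small sketch (hypothetical Gaussians again):

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

params = [(64.0, 2.5, 0.5), (69.0, 2.5, 0.5)]   # hypothetical (mu, sigma, prior)

def posterior(x):
    joint = np.array([gaussian_pdf(x, m, s) * P for m, s, P in params])
    return joint / joint.sum()

def p_error_given_x(x):
    """P(error | x) = min[P(w1|x), P(w2|x)] when we always pick the larger posterior."""
    return float(posterior(x).min())

for x in (62.0, 66.5, 71.0):
    print(x, round(p_error_given_x(x), 4))      # largest error (0.5) at the midpoint
```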

Overall Probability of Error

17

$$P(\text{error}) = \int_{\Re_1} P(\omega_2 \mid x)\, p(x)\, dx + \int_{\Re_2} P(\omega_1 \mid x)\, p(x)\, dx$$

Or, written as the unconditional risk (unconditional probability of error),

$$P(\text{error}) = \int_{-\infty}^{\infty} P(\text{error}, x)\, dx = \int_{-\infty}^{\infty} P(\text{error} \mid x)\, p(x)\, dx$$

The two region integrals are the joint probabilities $P(\text{error}, \omega_2)$ and $P(\text{error}, \omega_1)$, so the overall error splits by true class.

[Figure: $p(x \mid \omega_1)P(\omega_1)$ and $p(x \mid \omega_2)P(\omega_2)$ plotted against $x$, with the axis partitioned into regions $\Re_1$ and $\Re_2$]
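Numerically, the overall probability of error is the integral of $\min_j p(x \mid \omega_j)P(\omega_j)$ over $x$ when the regions are chosen by MPP. A sketch that approximates it with a Riemann sum on a grid (parameters are made up):

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Hypothetical (mu, sigma, prior) for the two classes.
params = [(64.0, 2.5, 0.5), (69.0, 2.5, 0.5)]

xs = np.linspace(45.0, 90.0, 20001)   # grid wide enough to cover both densities
dx = xs[1] - xs[0]

# joint[j] = p(x|w_j) P(w_j) on the grid; summing over j gives the evidence p(x).
joint = np.array([gaussian_pdf(xs, m, s) * P for m, s, P in params])

# Under MPP, P(error|x) p(x) = min_j p(x|w_j) P(w_j) at every x.
integrand = joint.min(axis=0)

# Riemann-sum approximation of the overall (Bayes) probability of error;
# for these parameters it comes out near 0.159.
print("overall P(error) ≈", round(float(integrand.sum() * dx), 4))
```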

How Does It Work Altogether?

18

Training Set:

height  label
5’5”    1
5’6”    2
5’4”    1
5’7”    2
5’5”    1
5’8”    2
5’6”    1
5’7”    2
5’7”    2

Testing Sample: a new height measurement whose gender label is to be decided.
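Putting the pieces together on the training table above: estimate each class's prior from its label counts, fit a 1-D Gaussian to each class's heights (a parametric modeling choice of mine; the slide only shows the raw data), and classify a test height by MPP. The test height of 66 inches is arbitrary, since the slide leaves the testing sample unspecified:

```python
import numpy as np

# Training set from the slide, with heights converted to inches.
heights = np.array([65, 66, 64, 67, 65, 68, 66, 67, 67], dtype=float)
labels  = np.array([ 1,  2,  1,  2,  1,  2,  1,  2,  2])

def gaussian_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# "Training": estimate a prior and a Gaussian (mu, sigma) per class.
model = {}
for c in (1, 2):
    xc = heights[labels == c]
    model[c] = (xc.mean(), xc.std(ddof=1), len(xc) / len(heights))

def classify(x):
    """MPP: return the class with the largest p(x|w_c) P(w_c)."""
    scores = {c: gaussian_pdf(x, mu, sigma) * prior
              for c, (mu, sigma, prior) in model.items()}
    return max(scores, key=scores.get)

# "Testing": decide the gender label for a new height, e.g. 5'6" = 66 inches.
test_height = 66.0        # arbitrary testing sample, not specified on the slide
print("estimated model:", model)
print(f"decision for {test_height} in:", classify(test_height))
```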

Questions (revisited; same list as slide 3)

19

Q&A Session

• What is the cost function?
• What is the optimization approach we use to find the optimal solution to the cost function?

20

Recap

21

Maximum Posterior Probability (MPP): for a given $x$, if $P(\omega_1 \mid x) > P(\omega_2 \mid x)$, then $x$ belongs to class 1; otherwise, class 2, where

$$P(\omega_j \mid x) = \frac{p(x \mid \omega_j)\, P(\omega_j)}{p(x)}$$

Overall probability of error:

$$P(\text{error}) = \int_{\Re_1} P(\omega_2 \mid x)\, p(x)\, dx + \int_{\Re_2} P(\omega_1 \mid x)\, p(x)\, dx$$