
Lecture Slides for

INTRODUCTION TO MACHINE LEARNING, 3RD EDITION

ETHEM ALPAYDIN
© The MIT Press, 2014
alpaydin@boun.edu.tr
http://www.cmpe.boun.edu.tr/~ethem/i2ml3e

Modified by Prof. Carolina Ruiz for CS539 Machine Learning at WPI

CHAPTER 5:

MULTIVARIATE METHODS

3

Multivariate Data

$$\mathbf{X} = \begin{bmatrix} X_1^1 & X_2^1 & \cdots & X_d^1 \\ X_1^2 & X_2^2 & \cdots & X_d^2 \\ \vdots & \vdots & & \vdots \\ X_1^N & X_2^N & \cdots & X_d^N \end{bmatrix}$$

Multiple measurements (sensors)
d inputs/features/attributes: d-variate
N instances/observations/examples

4

Multivariate Parameters

Mean: $E[\mathbf{x}] = \boldsymbol{\mu} = [\mu_1, \ldots, \mu_d]^T$

Covariance: $\sigma_{ij} \equiv \mathrm{Cov}(X_i, X_j) = E\!\left[(X_i - \mu_i)(X_j - \mu_j)\right]$

Correlation: $\mathrm{Corr}(X_i, X_j) \equiv \rho_{ij} = \dfrac{\sigma_{ij}}{\sigma_i \sigma_j}$

Covariance matrix:
$$\boldsymbol{\Sigma} \equiv \mathrm{Cov}(\mathbf{X}) = E\!\left[(\mathbf{X} - \boldsymbol{\mu})(\mathbf{X} - \boldsymbol{\mu})^T\right] = \begin{bmatrix} \sigma_1^2 & \sigma_{12} & \cdots & \sigma_{1d} \\ \sigma_{21} & \sigma_2^2 & \cdots & \sigma_{2d} \\ \vdots & \vdots & & \vdots \\ \sigma_{d1} & \sigma_{d2} & \cdots & \sigma_d^2 \end{bmatrix}$$

5

Parameter Estimation from data sample X

Sample mean $\mathbf{m}$: $\quad m_i = \dfrac{1}{N}\sum_{t=1}^{N} x_i^t, \quad i = 1, \ldots, d$

Covariance matrix $\mathbf{S}$: $\quad s_{ij} = \dfrac{1}{N}\sum_{t=1}^{N} (x_i^t - m_i)(x_j^t - m_j)$

Correlation matrix $\mathbf{R}$: $\quad r_{ij} = \dfrac{s_{ij}}{s_i s_j}$
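As an illustration (not from the slides), here is a minimal NumPy sketch that computes these three estimates from an N x d data matrix; the function name and array shapes are my own choices:

```python
import numpy as np

def estimate_parameters(X):
    """Sample mean, covariance S (divides by N, as on the slide), and correlation R
    for an N x d data matrix X."""
    N, d = X.shape
    m = X.mean(axis=0)                # sample mean, shape (d,)
    Xc = X - m                        # centered data
    S = Xc.T @ Xc / N                 # covariance matrix, shape (d, d)
    s = np.sqrt(np.diag(S))           # per-feature standard deviations
    R = S / np.outer(s, s)            # correlation r_ij = s_ij / (s_i * s_j)
    return m, S, R

# Example: 100 instances with 3 features
X = np.random.randn(100, 3)
m, S, R = estimate_parameters(X)
```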

6

Estimation of Missing Values

What to do if certain instances have missing attribute values?

Ignore those instances: not a good idea if the sample is small.

Use 'missing' as an attribute value: it may give information.

Imputation: fill in the missing value (a small sketch follows below).
Mean imputation: use the most likely value (e.g., the mean).
Imputation by regression: predict the missing value from the other attributes.
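A minimal sketch of mean imputation, assuming missing entries are coded as NaN in a NumPy array (the helper name is illustrative, not from the book):

```python
import numpy as np

def mean_impute(X):
    """Replace each NaN with the mean of the observed values in its column."""
    X = X.copy()
    col_means = np.nanmean(X, axis=0)     # per-column means, ignoring NaNs
    missing = np.isnan(X)                 # boolean mask of missing entries
    X[missing] = np.take(col_means, np.nonzero(missing)[1])
    return X

X = np.array([[1.0, 2.0], [np.nan, 4.0], [3.0, np.nan]])
print(mean_impute(X))   # NaNs replaced by the column means (2.0 and 3.0)
```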

7

Multivariate Normal Distribution

$$\mathbf{x} \sim \mathcal{N}_d(\boldsymbol{\mu}, \boldsymbol{\Sigma})$$

$$p(\mathbf{x}) = \frac{1}{(2\pi)^{d/2}|\boldsymbol{\Sigma}|^{1/2}} \exp\!\left[-\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1}(\mathbf{x} - \boldsymbol{\mu})\right]$$

8

Multivariate Normal Distribution

Mahalanobis distance: (x – μ)T ∑–1 (x – μ) measures the distance from x to μ in terms of ∑ (normalizes for difference in variances and correlations)

Bivariate: d = 2

$$p(x_1, x_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}} \exp\!\left[-\frac{1}{2(1-\rho^2)}\left(z_1^2 - 2\rho z_1 z_2 + z_2^2\right)\right]$$

where $z_i = \dfrac{x_i - \mu_i}{\sigma_i}$ (z-normalization)
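To make the Mahalanobis distance concrete, here is a small NumPy sketch (my own illustration, not part of the slides):

```python
import numpy as np

def mahalanobis_sq(x, mu, Sigma):
    """Squared Mahalanobis distance (x - mu)^T Sigma^{-1} (x - mu)."""
    diff = x - mu
    return float(diff @ np.linalg.solve(Sigma, diff))   # solve() avoids forming the inverse

mu = np.array([0.0, 0.0])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])
x = np.array([1.0, 1.0])
print(mahalanobis_sq(x, mu, Sigma))
```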

9

Bivariate Normal: isoprobability contour plots [i.e., (x – μ)T Σ–1 (x – μ) = c²]

when covariance is 0, ellipsoid axes are parallel to coordinate axes

10

11

Independent Inputs: Naive Bayes

If $x_i$ are independent, the off-diagonal entries of $\boldsymbol{\Sigma}$ are 0, and the Mahalanobis distance reduces to a weighted (by $1/\sigma_i$) Euclidean distance:

$$p(\mathbf{x}) = \prod_{i=1}^{d} p_i(x_i) = \frac{1}{(2\pi)^{d/2}\prod_{i=1}^{d}\sigma_i} \exp\!\left[-\frac{1}{2}\sum_{i=1}^{d}\left(\frac{x_i - \mu_i}{\sigma_i}\right)^2\right]$$

If the variances are also equal, this reduces to the Euclidean distance.

Note: the use of the term "Naïve Bayes" in this chapter is somewhat wrong, since Naïve Bayes assumes independence in the probability sense, not in the linear algebra sense.

12

Parametric Classification

If $p(\mathbf{x} \mid C_i) \sim \mathcal{N}(\boldsymbol{\mu}_i, \boldsymbol{\Sigma}_i)$:

$$p(\mathbf{x} \mid C_i) = \frac{1}{(2\pi)^{d/2}|\boldsymbol{\Sigma}_i|^{1/2}} \exp\!\left[-\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu}_i)^T \boldsymbol{\Sigma}_i^{-1}(\mathbf{x} - \boldsymbol{\mu}_i)\right]$$

Discriminant functions:

$$g_i(\mathbf{x}) = \log p(\mathbf{x} \mid C_i) + \log P(C_i) = -\frac{d}{2}\log 2\pi - \frac{1}{2}\log|\boldsymbol{\Sigma}_i| - \frac{1}{2}(\mathbf{x} - \boldsymbol{\mu}_i)^T \boldsymbol{\Sigma}_i^{-1}(\mathbf{x} - \boldsymbol{\mu}_i) + \log P(C_i)$$

13

Estimation of Parameters from data sample X

Given a sample $\mathcal{X} = \{\mathbf{x}^t, \mathbf{r}^t\}$, where $r_i^t = 1$ if $\mathbf{x}^t \in C_i$ and 0 otherwise:

$$\hat{P}(C_i) = \frac{\sum_t r_i^t}{N}$$

$$\mathbf{m}_i = \frac{\sum_t r_i^t \mathbf{x}^t}{\sum_t r_i^t}$$

$$\mathbf{S}_i = \frac{\sum_t r_i^t (\mathbf{x}^t - \mathbf{m}_i)(\mathbf{x}^t - \mathbf{m}_i)^T}{\sum_t r_i^t}$$

$$g_i(\mathbf{x}) = -\frac{1}{2}\log|\mathbf{S}_i| - \frac{1}{2}(\mathbf{x} - \mathbf{m}_i)^T \mathbf{S}_i^{-1}(\mathbf{x} - \mathbf{m}_i) + \log \hat{P}(C_i)$$
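A minimal sketch of these per-class estimates and the quadratic discriminant, assuming X is an N x d NumPy array and y a vector of class labels (names and shapes are my own):

```python
import numpy as np

def fit_class(X, y, c):
    """Prior, mean, and covariance estimates for class c (the label indicator plays the role of r_i^t)."""
    Xc = X[y == c]
    prior = len(Xc) / len(X)
    m = Xc.mean(axis=0)
    S = (Xc - m).T @ (Xc - m) / len(Xc)
    return prior, m, S

def g_quadratic(x, prior, m, S):
    """g_i(x) = -1/2 log|S_i| - 1/2 (x - m_i)^T S_i^{-1} (x - m_i) + log P(C_i)."""
    diff = x - m
    return (-0.5 * np.log(np.linalg.det(S))
            - 0.5 * diff @ np.linalg.solve(S, diff)
            + np.log(prior))
```

A new instance x would then be assigned to the class with the largest g_i(x).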

14

Assuming a different Si for each Ci

Quadratic discriminant. Expanding the formula on the previous slide:

$$g_i(\mathbf{x}) = -\frac{1}{2}\log|\mathbf{S}_i| - \frac{1}{2}\left(\mathbf{x}^T\mathbf{S}_i^{-1}\mathbf{x} - 2\,\mathbf{x}^T\mathbf{S}_i^{-1}\mathbf{m}_i + \mathbf{m}_i^T\mathbf{S}_i^{-1}\mathbf{m}_i\right) + \log\hat{P}(C_i) = \mathbf{x}^T\mathbf{W}_i\mathbf{x} + \mathbf{w}_i^T\mathbf{x} + w_{i0}$$

where

$$\mathbf{W}_i = -\frac{1}{2}\mathbf{S}_i^{-1}, \qquad \mathbf{w}_i = \mathbf{S}_i^{-1}\mathbf{m}_i, \qquad w_{i0} = -\frac{1}{2}\mathbf{m}_i^T\mathbf{S}_i^{-1}\mathbf{m}_i - \frac{1}{2}\log|\mathbf{S}_i| + \log\hat{P}(C_i)$$

This has the form of a quadratic function of x.

See figure on next slide.

15

[Figure: likelihoods p(x|Ci), posterior for C1, and the discriminant P(C1|x) = 0.5]

16

Assuming Common Covariance Matrix S

Shared common sample covariance S:

$$\mathbf{S} = \sum_i \hat{P}(C_i)\,\mathbf{S}_i$$

The discriminant reduces to

$$g_i(\mathbf{x}) = -\frac{1}{2}(\mathbf{x} - \mathbf{m}_i)^T \mathbf{S}^{-1}(\mathbf{x} - \mathbf{m}_i) + \log \hat{P}(C_i)$$

which is a linear discriminant (the quadratic term $-\frac{1}{2}\mathbf{x}^T\mathbf{S}^{-1}\mathbf{x}$ is common to all classes and can be dropped):

$$g_i(\mathbf{x}) = \mathbf{w}_i^T \mathbf{x} + w_{i0}, \qquad \text{where } \mathbf{w}_i = \mathbf{S}^{-1}\mathbf{m}_i, \quad w_{i0} = -\frac{1}{2}\mathbf{m}_i^T \mathbf{S}^{-1}\mathbf{m}_i + \log \hat{P}(C_i)$$
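For the shared-covariance case, here is a short sketch of the precomputed linear parameters w_i and w_i0 (again an illustration, not the book's code; the pooled S and per-class means are assumed to be estimated as on the earlier slides):

```python
import numpy as np

def linear_discriminant_params(m_i, S_shared, prior_i):
    """w_i = S^{-1} m_i,  w_i0 = -1/2 m_i^T S^{-1} m_i + log P(C_i)."""
    w = np.linalg.solve(S_shared, m_i)
    w0 = -0.5 * m_i @ w + np.log(prior_i)
    return w, w0

def g_linear(x, w, w0):
    """Linear discriminant g_i(x) = w_i^T x + w_i0."""
    return w @ x + w0
```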

17

Common Covariance Matrix S

Arbitrary covariances but shared by classes

18

Assuming Common Covariance Matrix S is Diagonal

When $x_j$, $j = 1, \ldots, d$, are independent, $\boldsymbol{\Sigma}$ is diagonal:

$$p(\mathbf{x} \mid C_i) = \prod_j p(x_j \mid C_i) \quad \text{(Naive Bayes' assumption)}$$

Classify based on weighted Euclidean distance (in $s_j$ units) to the nearest mean:

$$g_i(\mathbf{x}) = -\frac{1}{2}\sum_{j=1}^{d}\left(\frac{x_j^t - m_{ij}}{s_j}\right)^2 + \log \hat{P}(C_i)$$

19

Assuming Common Covariance Matrix S is Diagonal

Variances may be different.

Covariances are 0, so ellipsoid axes are parallel to coordinate axes

20

Assuming Common Covariance Matrix S is Diagonal and variances are equal

$$g_i(\mathbf{x}) = -\frac{\lVert\mathbf{x} - \mathbf{m}_i\rVert^2}{2s^2} + \log \hat{P}(C_i) = -\frac{1}{2s^2}\sum_{j=1}^{d}\left(x_j^t - m_{ij}\right)^2 + \log \hat{P}(C_i)$$

Nearest mean classifier: Classify based on Euclidean distance to the nearest mean

Each mean can be considered a prototype or template and this is template matching
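A minimal nearest-mean (template matching) sketch, assuming equal priors and already-estimated class means (array names are illustrative, not from the slides):

```python
import numpy as np

def nearest_mean_classify(x, means):
    """Assign x to the class whose mean is closest in Euclidean distance.
    means: K x d array, one row per class mean (equal priors assumed)."""
    dists = np.linalg.norm(means - x, axis=1)
    return int(np.argmin(dists))

means = np.array([[0.0, 0.0], [3.0, 3.0]])
print(nearest_mean_classify(np.array([2.5, 2.0]), means))   # -> 1 (closer to [3, 3])
```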

21

Assuming Common Covariance Matrix S is Diagonal and variances are equal


Covariances are 0, so ellipsoid axes are parallel to coordinate axes. Variances are the same, so ellipsoids become circles.

Classifier looks for nearest mean

22

Model Selection

Assumption                     Covariance matrix         No. of parameters
Shared, Hyperspheric           S_i = S = s²I             1
Shared, Axis-aligned           S_i = S, with s_ij = 0    d
Shared, Hyperellipsoidal       S_i = S                   d(d+1)/2
Different, Hyperellipsoidal    S_i                       K·d(d+1)/2

As we increase complexity (less restricted S), bias decreases and variance increases

Assume simple models (allow some bias) to control variance (regularization)

23

Different cases of covariance matrices fitted to the same data lead to different decision boundaries.

24

Discrete Features

Binary features: $p_{ij} \equiv p(x_j = 1 \mid C_i)$

If $x_j$ are independent (Naive Bayes'):

$$p(\mathbf{x} \mid C_i) = \prod_{j=1}^{d} p_{ij}^{x_j}(1 - p_{ij})^{1 - x_j}$$

the discriminant is linear:

$$g_i(\mathbf{x}) = \log p(\mathbf{x} \mid C_i) + \log P(C_i) = \sum_j \left[x_j \log p_{ij} + (1 - x_j)\log(1 - p_{ij})\right] + \log P(C_i)$$

Estimated parameters:

$$\hat{p}_{ij} = \frac{\sum_t x_j^t r_i^t}{\sum_t r_i^t}$$
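A short sketch of the Bernoulli (binary-feature) Naive Bayes estimates and discriminant described above; the clipping constant eps is a practical addition of mine to avoid log 0, not something on the slides:

```python
import numpy as np

def fit_bernoulli(X, y, c, eps=1e-9):
    """p_hat_ij = (sum_t x_j^t r_i^t) / (sum_t r_i^t) for class c."""
    Xc = X[y == c]                      # rows belonging to class c (r_i^t = 1)
    p = Xc.mean(axis=0)
    return np.clip(p, eps, 1 - eps)     # clip so the log-based discriminant stays finite

def g_bernoulli(x, p, prior):
    """g_i(x) = sum_j [x_j log p_ij + (1 - x_j) log(1 - p_ij)] + log P(C_i)."""
    return np.sum(x * np.log(p) + (1 - x) * np.log(1 - p)) + np.log(prior)
```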

25

Discrete Features

Multinomial (1-of-$n_j$) features: $x_j \in \{v_1, v_2, \ldots, v_{n_j}\}$

$$p_{ijk} \equiv p(z_{jk} = 1 \mid C_i) = p(x_j = v_k \mid C_i)$$

where $z_{jk} = 1$ if $x_j = v_k$ and 0 otherwise.

If $x_j$ are independent:

$$p(\mathbf{x} \mid C_i) = \prod_{j=1}^{d}\prod_{k=1}^{n_j} p_{ijk}^{z_{jk}}$$

$$g_i(\mathbf{x}) = \sum_j \sum_k z_{jk}\log p_{ijk} + \log P(C_i)$$

Estimated parameters:

$$\hat{p}_{ijk} = \frac{\sum_t z_{jk}^t r_i^t}{\sum_t r_i^t}$$

26

Multivariate Regression

$$r^t = g(\mathbf{x}^t \mid w_0, w_1, \ldots, w_d) + \varepsilon$$

Multivariate linear model:

$$g(\mathbf{x}^t) = w_0 + w_1 x_1^t + w_2 x_2^t + \cdots + w_d x_d^t$$

$$E(w_0, w_1, \ldots, w_d \mid \mathcal{X}) = \frac{1}{2}\sum_t \left[r^t - \left(w_0 + w_1 x_1^t + w_2 x_2^t + \cdots + w_d x_d^t\right)\right]^2$$

Multivariate polynomial model: define new higher-order variables

$$z_1 = x_1,\quad z_2 = x_2,\quad z_3 = x_1^2,\quad z_4 = x_2^2,\quad z_5 = x_1 x_2$$

and use the linear model in this new z-space (basis functions, kernel trick: Chapter 13).
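As a concrete illustration of fitting the multivariate linear model by minimizing E(w | X), here is a sketch with made-up data (not from the slides):

```python
import numpy as np

def fit_linear(X, r):
    """Least-squares fit of r^t ≈ w_0 + w_1 x_1^t + ... + w_d x_d^t.
    X: N x d input matrix, r: length-N target vector. Returns (w_0, w)."""
    N = len(X)
    A = np.hstack([np.ones((N, 1)), X])          # prepend a column of ones for the bias w_0
    w_full, *_ = np.linalg.lstsq(A, r, rcond=None)
    return w_full[0], w_full[1:]

# Example: 2-D inputs with a known linear relationship plus noise
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
r = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1] + 0.1 * rng.normal(size=200)
w0, w = fit_linear(X, r)   # recovers approximately w0 = 1.0, w = [2.0, -0.5]
```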