12. Discriminant Analysis - KOCW (contents.kocw.net/KOCW/document/2014/hanyang/huhseon/9.pdf)


Chapter 12: Discriminant Analysis
Fall 2014, Department of IME, Hanyang University (Quest Lab)
2014-11-02

12.1 Introduction

• Classical statistical method for classification and profiling
• Model-based approach to classification
• Classification is based on the "statistical" distance from each class average
• Classification functions are used to produce classification scores
• Common uses: classifying microorganism species, loans, credit cards, insurance, college admission decisions, fingerprint analysis, etc.


Example 1: Riding mowers (from Chapter 7)

• 12 owners and 12 nonowners
• The line seems to do a good job: 4 misclassifications. Can we do better?


12.2 Distance of an observation from a class

General idea:

• To classify a new record, measure its distance from the center of each class
• Then, classify the record to the closest class

Step 1: Measuring Distance

• Euclidean distance between an item \mathbf{x} and the mean \bar{\mathbf{x}} of a class:

D_{\text{Euclidean}}(\mathbf{x}, \bar{\mathbf{x}}) = \sqrt{(x_1 - \bar{x}_1)^2 + (x_2 - \bar{x}_2)^2 + \cdots + (x_p - \bar{x}_p)^2}

→ The distance value depends on the units of each predictor (dollars vs. thousands of dollars, etc.)
→ It does not account for the variability of each predictor (e.g., even if a record is closer to class B than to class A, classifying it to A may be more reasonable if B's variability is smaller)
→ Euclidean distance does not account for the correlation between predictors
→ Remedy: use the "statistical distance" (Mahalanobis distance)


Statistical distance (Mahalanobis distance):

D^2_{\text{Statistical}}(\mathbf{x}, \bar{\mathbf{x}}) = (\mathbf{x} - \bar{\mathbf{x}})^T S^{-1} (\mathbf{x} - \bar{\mathbf{x}}), \quad \mathbf{x} - \bar{\mathbf{x}} = \left[(x_1 - \bar{x}_1), (x_2 - \bar{x}_2), \ldots, (x_p - \bar{x}_p)\right]^T,

where S is the covariance matrix.

If p = 1 (single predictor), this reduces to the z-score.

If S = I (all predictors are uncorrelated with each other), then

D_{\text{Statistical}}(\mathbf{x}, \bar{\mathbf{x}}) = D_{\text{Euclidean}}(\mathbf{x}, \bar{\mathbf{x}}).
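To make the two distances concrete, here is a small NumPy sketch. The class mean, covariance matrix, and query point are made-up illustrative values, not the mower data; it also checks the two special cases noted above.

```python
import numpy as np

# Hypothetical class statistics (illustrative values, NOT from the textbook data):
x_bar = np.array([60.0, 20.0])          # class mean
S = np.array([[120.0, 30.0],
              [ 30.0, 15.0]])           # class covariance matrix

x = np.array([70.0, 22.0])              # new record to classify
d = x - x_bar

# Euclidean distance: ignores units, variability, and correlation
d_euclid = np.sqrt(d @ d)

# Statistical (Mahalanobis) distance: D^2 = (x - x_bar)^T S^{-1} (x - x_bar)
d2_mahal = d @ np.linalg.inv(S) @ d
d_mahal = np.sqrt(d2_mahal)

# p = 1: the Mahalanobis distance reduces to the |z-score|
assert np.isclose(np.sqrt((70.0 - 60.0) ** 2 / 120.0),
                  abs((70.0 - 60.0) / np.sqrt(120.0)))

# S = I: the Mahalanobis distance reduces to the Euclidean distance
assert np.isclose(np.sqrt(d @ np.linalg.inv(np.eye(2)) @ d), d_euclid)

print(d_euclid, d_mahal)
```

Note how the correlated, high-variance covariance matrix shrinks the statistical distance well below the raw Euclidean one.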

12.3 Fisher's linear classification functions

Step 2: Classification Functions

• Idea: find a linear function of the measurements that maximizes the ratio of between-class separation to within-class variability (the larger the numerator, the clearer the separation between classes; the smaller the denominator, the higher the homogeneity within each class)
• For each observation, compute a score with this function and assign the observation to the class with the largest score
• One classification function is generated for each class


12.3 Fisher’s linear classification functions

Record #1: income = $60K, lot size = 18.4K

Owner score = -73.16 + (0.43)(60) + (5.47)(18.4) = 53.2

Non-owner score= -51.42+(0.33)(60)+(4.68)(18.4)= 54.48

“Non-owner” score is higher → classify as non-owner

How do we find these functions?

Hanyang UniversityQuest Lab.

12.3 Fisher’s linear classification functions


12.3 Fisher’s linear classification functions

Step 3: (alternative way to classify) Converting to Probabilities

• It is possible to convert classification scores to probabilities of belonging to a class: = ()() + () +⋯+ ()

Probability that record belongs to class • The probability is then compared to the cutoff value in order

to classify a record

• 장점: 비교없이 lift chart 가능

Hanyang UniversityQuest Lab.

12.3 Fisher’s linear classification functions

.. + . = 0.218
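The conversion is a softmax over the class scores; this sketch reproduces the 0.218 from the slide using the rounded scores 53.2 and 54.48:

```python
import math

def scores_to_probs(scores):
    """Convert classification scores to class membership probabilities:
    P(class i) = exp(c_i) / sum_k exp(c_k).
    The max score is subtracted first for numerical stability."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

p_owner, p_non = scores_to_probs([53.2, 54.48])
print(round(p_owner, 3))   # 0.218, matching the slide
```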


12.3 Fisher’s linear classification functions

DA

추측

Linear/Quadratic Discriminant Analysis


Bayesian Decision Theory

Posterior probability from Bayes' formula:

P(C_j \mid \mathbf{x}) = \frac{P(\mathbf{x} \mid C_j)\, P(C_j)}{P(\mathbf{x})}

• P(C_j): class probabilities (prior)
• P(\mathbf{x} \mid C_j): conditional density of \mathbf{x}
• P(\mathbf{x}) = \sum_{j=1}^{n} P(\mathbf{x} \mid C_j)\, P(C_j): unconditional density of \mathbf{x}

Bayesian Classifier

Find the maximum posterior probability:

For a given \mathbf{x}, if P(C_1 \mid \mathbf{x}) > P(C_2 \mid \mathbf{x}), then \mathbf{x} is classified to class 1; otherwise, class 2.

Since P(C_i \mid \mathbf{x}) = \frac{P(\mathbf{x} \mid C_i)\, P(C_i)}{P(\mathbf{x})} and P(\mathbf{x}) does not depend on the class, this is equivalent to:

For a given \mathbf{x}, if P(\mathbf{x} \mid C_1)\, P(C_1) > P(\mathbf{x} \mid C_2)\, P(C_2), then \mathbf{x} is classified to class 1; otherwise, class 2.


Bayesian Classifier (continued)

• Computing the priors: P(C_1) = n_1/N, P(C_2) = n_2/N
  These are estimates, not exact values (they approach the true values as N grows)
• Computing the likelihood: estimate P(\mathbf{x} \mid C_i) from the training samples that belong to C_i
  Naïve Bayes: P(\mathbf{x} \mid C_i) = P(x_1 \mid C_i)\, P(x_2 \mid C_i) \cdots P(x_p \mid C_i)
  In DA: assume P(\mathbf{x} \mid C_i) is a multivariate normal density

Classification Based on Normal Populations

• The normal (Gaussian) distribution:
  - often fits real-world data
  - is described by only two kinds of parameters, the mean and the variance
  - is mathematically attractive
• We analyze the behavior of the Bayesian classifier under the assumption that the likelihood follows a normal distribution


Multivariate Normal Distribution

P(\mathbf{x} \mid C_j) = \frac{1}{(2\pi)^{p/2} |\boldsymbol{\Sigma}_j|^{1/2}} \exp\!\left[-\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_j)^T \boldsymbol{\Sigma}_j^{-1} (\mathbf{x} - \boldsymbol{\mu}_j)\right]

\mathbf{x} = [x_1, \ldots, x_p]^T \sim \text{MVN}(\boldsymbol{\mu}, \boldsymbol{\Sigma}), \quad \text{where } \boldsymbol{\mu} = [\mu_1, \ldots, \mu_p]^T, \quad \boldsymbol{\Sigma} = \begin{bmatrix} \sigma_{11} & \cdots & \sigma_{1p} \\ \vdots & \ddots & \vdots \\ \sigma_{p1} & \cdots & \sigma_{pp} \end{bmatrix}

In case of p = 1:

p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left[-\frac{(x-\mu)^2}{2\sigma^2}\right]



Discriminant Function

Discriminant function in terms of the posterior probability:

\delta_i(\mathbf{x}) \equiv P(C_i \mid \mathbf{x}) \propto P(\mathbf{x} \mid C_i)\, P(C_i), \quad \text{or} \quad \delta_i(\mathbf{x}) = \ln P(\mathbf{x} \mid C_i) + \ln P(C_i)

\delta_i(\mathbf{x}), \; i = 1, 2, \ldots, K \; \text{(where } K \text{ is the number of classes)}

The classifier will assign an observation \mathbf{x} to class C_i if

\delta_i(\mathbf{x}) > \delta_j(\mathbf{x}) \; \text{for all } j \neq i

Discriminant Function for Normal Distribution

With

P(\mathbf{x} \mid C_i) = \frac{1}{(2\pi)^{p/2} |\boldsymbol{\Sigma}_i|^{1/2}} \exp\!\left[-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_i)^T \boldsymbol{\Sigma}_i^{-1} (\mathbf{x}-\boldsymbol{\mu}_i)\right],

\delta_i(\mathbf{x}) = \ln P(\mathbf{x} \mid C_i) + \ln P(C_i) = -\frac{p}{2}\ln 2\pi - \frac{1}{2}\ln|\boldsymbol{\Sigma}_i| - \frac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_i)^T \boldsymbol{\Sigma}_i^{-1} (\mathbf{x}-\boldsymbol{\mu}_i) + \ln P(C_i)

Decision boundary between class k and class r:  \delta_k(\mathbf{x}) = \delta_r(\mathbf{x})
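The discriminant function translates directly into code; a minimal sketch with two made-up Gaussian classes (all parameters invented for illustration):

```python
import numpy as np

def delta(x, mu, Sigma, prior):
    """Discriminant function for a normal class:
    delta_i(x) = -p/2 ln(2 pi) - 1/2 ln|Sigma_i|
                 - 1/2 (x - mu_i)^T Sigma_i^{-1} (x - mu_i) + ln P(C_i)."""
    p = len(mu)
    d = x - mu
    return (-0.5 * p * np.log(2 * np.pi)
            - 0.5 * np.log(np.linalg.det(Sigma))
            - 0.5 * d @ np.linalg.inv(Sigma) @ d
            + np.log(prior))

# Two hypothetical classes
mu1, S1, P1 = np.array([0.0, 0.0]), np.eye(2), 0.5
mu2, S2, P2 = np.array([3.0, 3.0]), np.eye(2), 0.5

x = np.array([0.5, 0.2])
cls = 1 if delta(x, mu1, S1, P1) > delta(x, mu2, S2, P2) else 2
print(cls)   # the point near mu1 is assigned to class 1
```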


Derivation of Decision Boundary

Setting \delta_1(\mathbf{x}) - \delta_2(\mathbf{x}) = 0:

\ln P(1) - \frac{p}{2}\ln(2\pi) - \frac{1}{2}\ln|\boldsymbol{\Sigma}_1| - \frac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_1)^T \boldsymbol{\Sigma}_1^{-1} (\mathbf{x}-\boldsymbol{\mu}_1) - \ln P(2) + \frac{p}{2}\ln(2\pi) + \frac{1}{2}\ln|\boldsymbol{\Sigma}_2| + \frac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_2)^T \boldsymbol{\Sigma}_2^{-1} (\mathbf{x}-\boldsymbol{\mu}_2) = 0

which simplifies to

(\mathbf{x}-\boldsymbol{\mu}_1)^T \boldsymbol{\Sigma}_1^{-1} (\mathbf{x}-\boldsymbol{\mu}_1) - (\mathbf{x}-\boldsymbol{\mu}_2)^T \boldsymbol{\Sigma}_2^{-1} (\mathbf{x}-\boldsymbol{\mu}_2) = \ln\frac{|\boldsymbol{\Sigma}_2|}{|\boldsymbol{\Sigma}_1|} + 2\ln\frac{P(1)}{P(2)}

Decision Boundary When \boldsymbol{\Sigma}_1 = \boldsymbol{\Sigma}_2

When \boldsymbol{\Sigma}_1 = \boldsymbol{\Sigma}_2 = \boldsymbol{\Sigma}, the determinant term vanishes and the quadratic terms cancel in

(\mathbf{x}-\boldsymbol{\mu}_1)^T \boldsymbol{\Sigma}_1^{-1} (\mathbf{x}-\boldsymbol{\mu}_1) - (\mathbf{x}-\boldsymbol{\mu}_2)^T \boldsymbol{\Sigma}_2^{-1} (\mathbf{x}-\boldsymbol{\mu}_2) = \ln\frac{|\boldsymbol{\Sigma}_2|}{|\boldsymbol{\Sigma}_1|} + 2\ln\frac{P(1)}{P(2)},

leaving the boundary

(\boldsymbol{\mu}_1-\boldsymbol{\mu}_2)^T \boldsymbol{\Sigma}^{-1} \mathbf{x} - \frac{1}{2}(\boldsymbol{\mu}_1-\boldsymbol{\mu}_2)^T \boldsymbol{\Sigma}^{-1} (\boldsymbol{\mu}_1+\boldsymbol{\mu}_2) + \ln\frac{P(1)}{P(2)} = 0

When \boldsymbol{\Sigma}_1 = \boldsymbol{\Sigma}_2 = \boldsymbol{\Sigma}, estimate \boldsymbol{\Sigma} by the pooled covariance

\hat{\boldsymbol{\Sigma}} = \frac{n_1 - 1}{n_1 + n_2 - 2}\hat{\boldsymbol{\Sigma}}_1 + \frac{n_2 - 1}{n_1 + n_2 - 2}\hat{\boldsymbol{\Sigma}}_2

A\mathbf{x} + B = 0: a linear equation.

Decision: assign \mathbf{x} to class 1 if A\mathbf{x} + B > 0, to class 2 if A\mathbf{x} + B < 0.
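With equal covariances, the boundary coefficients A and B can be computed directly from the class parameters. This sketch (made-up means, covariance, and priors) verifies that the linear rule Ax + B equals the difference of the discriminant functions once the terms common to both classes are dropped:

```python
import numpy as np

# Hypothetical shared-covariance setup
mu1, mu2 = np.array([1.0, 2.0]), np.array([4.0, 0.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
P1, P2 = 0.4, 0.6

Si = np.linalg.inv(Sigma)
A = Si @ (mu1 - mu2)                                          # linear coefficients
B = -0.5 * (mu1 - mu2) @ Si @ (mu1 + mu2) + np.log(P1 / P2)   # constant term

def delta(x, mu, prior):
    """Discriminant function with the class-independent terms dropped."""
    d = x - mu
    return -0.5 * d @ Si @ d + np.log(prior)

# A x + B should equal delta_1(x) - delta_2(x) everywhere
rng = np.random.default_rng(0)
for x in rng.normal(size=(100, 2)) * 3:
    assert np.isclose(A @ x + B, delta(x, mu1, P1) - delta(x, mu2, P2))

print("A =", A, "B =", B)
```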


• Decision boundary (figure)
• Example (figure)


Decision Boundary When \boldsymbol{\Sigma}_1 \neq \boldsymbol{\Sigma}_2

Starting again from

(\mathbf{x}-\boldsymbol{\mu}_1)^T \boldsymbol{\Sigma}_1^{-1} (\mathbf{x}-\boldsymbol{\mu}_1) - (\mathbf{x}-\boldsymbol{\mu}_2)^T \boldsymbol{\Sigma}_2^{-1} (\mathbf{x}-\boldsymbol{\mu}_2) = \ln\frac{|\boldsymbol{\Sigma}_2|}{|\boldsymbol{\Sigma}_1|} + 2\ln\frac{P(1)}{P(2)},

when \boldsymbol{\Sigma}_1 \neq \boldsymbol{\Sigma}_2 the quadratic terms no longer cancel:

\mathbf{x}^T(\boldsymbol{\Sigma}_1^{-1}-\boldsymbol{\Sigma}_2^{-1})\mathbf{x} - 2(\boldsymbol{\mu}_1^T\boldsymbol{\Sigma}_1^{-1} - \boldsymbol{\mu}_2^T\boldsymbol{\Sigma}_2^{-1})\mathbf{x} + (\boldsymbol{\mu}_1^T\boldsymbol{\Sigma}_1^{-1}\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2^T\boldsymbol{\Sigma}_2^{-1}\boldsymbol{\mu}_2) - \ln\frac{|\boldsymbol{\Sigma}_2|}{|\boldsymbol{\Sigma}_1|} - 2\ln\frac{P(1)}{P(2)} = 0

Collecting the quadratic, linear, and constant terms of 2[\delta_1(\mathbf{x}) - \delta_2(\mathbf{x})] as A\mathbf{x}^2 + B\mathbf{x} + C:

A\mathbf{x}^2 + B\mathbf{x} + C = 0: a quadratic equation.

Decision: assign \mathbf{x} to class 1 if A\mathbf{x}^2 + B\mathbf{x} + C > 0, to class 2 if A\mathbf{x}^2 + B\mathbf{x} + C < 0.


• Example (figure)

Linear Decision Boundary from LDA

(a) Three normal distributions, with the same covariance and different means. (b) A sample of 30 drawn from each normal distribution, and the fitted LDA decision boundaries.

Quadratic Decision Boundary from QDA (figure)

QDA vs. LDA (figure)
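A NumPy-only sketch contrasting the two rules on synthetic data (not the course's dataset). With equal means but very different covariances, only QDA's quadratic boundary can separate the classes; LDA's pooled-covariance rule stays near chance:

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic data: two classes with equal means but unequal covariances
n = 500
X1 = rng.multivariate_normal([0.0, 0.0], np.diag([0.3, 0.3]), n)  # tight class
X2 = rng.multivariate_normal([0.0, 0.0], np.diag([6.0, 6.0]), n)  # diffuse class
X = np.vstack([X1, X2])
y = np.array([1] * n + [2] * n)

mu1, S1 = X1.mean(axis=0), np.cov(X1, rowvar=False)
mu2, S2 = X2.mean(axis=0), np.cov(X2, rowvar=False)
# Pooled covariance used by LDA (Sigma_1 = Sigma_2 assumption)
Sp = ((n - 1) * S1 + (n - 1) * S2) / (2 * n - 2)

def delta(x, mu, S):
    """Normal-population discriminant function, equal priors."""
    d = x - mu
    return -0.5 * np.log(np.linalg.det(S)) - 0.5 * d @ np.linalg.inv(S) @ d

def classify(x, Sa, Sb):
    return 1 if delta(x, mu1, Sa) > delta(x, mu2, Sb) else 2

preds_lda = np.array([classify(x, Sp, Sp) for x in X])   # linear boundary
preds_qda = np.array([classify(x, S1, S2) for x in X])   # quadratic boundary
acc_lda = (preds_lda == y).mean()
acc_qda = (preds_qda == y).mean()
print(f"LDA accuracy: {acc_lda:.2f}, QDA accuracy: {acc_qda:.2f}")
```

This is exactly the failure mode the assumption check in Section 12.4 warns about: when the within-class covariances truly differ, LDA's shared-covariance assumption costs accuracy.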


12.4 Classification performance of discriminant analysis

Two main assumptions:

① Measuring the distance from each class assumes a multivariate normal distribution.
• When this assumption holds, DA is more effective than any other classification method (it can achieve the same results with less data)
• It is fairly robust even when the predictors are non-normal or are dummy variables
• It is sensitive to outliers, so exploratory analysis is needed

② The correlation structure within a class is assumed to be the same for all classes.
• Check by computing the correlation matrix of each class and comparing them
• If the correlation matrices differ across classes, records tend to be classified into the class with the greater variability
• An alternative is to perform quadratic discriminant analysis (if the dataset is very large)


12.5 Prior probabilities

③ The method so far assumes that each class is equally likely to be encountered in the future. What if the priors differ?

Let p_i = prior probability of membership in class i. Add log(p_i) to each class's classification function (natural logarithm).

Ex.: Suppose the population proportion of riding-mower owners is 15% (in the sample it was 12 vs. 12, i.e., 50%).
→ We should make the rule classify fewer records as owners.
→ Add log(0.15) to the constant term of the owner classification function in Fig. 12.3, and log(0.85) to the nonowner function.

For the record with income = $60K, lot size = 18.4K sq ft:

Owner score = -73.16 + (0.43)(60) + (5.47)(18.4) + log(0.15) = 51.3
Non-owner score = -51.42 + (0.33)(60) + (4.68)(18.4) + log(0.85) = 54.32
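The prior adjustment is a one-line change per score. This sketch works from the unrounded scores, so the values differ slightly from the slide's rounded arithmetic:

```python
import math

income, lot_size = 60, 18.4

# Classification scores using the coefficients from the slides
owner = -73.16 + 0.43 * income + 5.47 * lot_size        # ~53.29
non_owner = -51.42 + 0.33 * income + 4.68 * lot_size    # ~54.49

# Adjust each constant term by the natural log of the class prior
owner_adj = owner + math.log(0.15)        # owner prior 15%
non_owner_adj = non_owner + math.log(0.85)

print(round(owner_adj, 2), round(non_owner_adj, 2))
# The owner score drops by ~1.9 while the non-owner score barely moves,
# so this record is still a non-owner, and borderline records are pushed
# away from the "owner" label.
```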


12.6 Unequal misclassification costs

④ What if the misclassification costs are not symmetric?

Let q_i = the cost of misclassifying a record from class i.
• Add log(q_i) to each class's classification function (if priors are considered at the same time, this amounts to adding log(p_i q_i))
• If the individual costs are hard to estimate, use their ratio q_2/q_1; in that case, add log(q_2/q_1) to the classification function of class 2 only

12.7 Classifying more than two classes

Example 3: Medical dispatch to accident scene

When 119 emergency resources are temporarily short and a dispatch request comes in, which scene should the unit go to?

Outcome (3 classes):
• No injury
• Non-fatal injury
• Fatal injury

Predictors: time of day, day of week, weather, type of road, road surface conditions, …


Example 3 (continued): Medical dispatch to accident scene

- Applied discriminant analysis to 1,000 records
- Three classification functions (one per class); the confusion matrix is 3×3
- Compute the classification scores for each record
- Assign each record to the class with the highest classification score

For one record with scores 25.94, 31.42, and 30.93, for example:

e^{30.93} / (e^{25.94} + e^{31.42} + e^{30.93}) = 0.38
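The same score-to-probability conversion extends to three classes; this sketch reproduces the slide's 0.38 (the class labels attached to each score are an assumption for illustration):

```python
import math

# Classification scores for one record, from the slide
scores = {"no injury": 25.94, "non-fatal": 31.42, "fatal": 30.93}

m = max(scores.values())                      # stabilize the exponentials
exps = {k: math.exp(v - m) for k, v in scores.items()}
total = sum(exps.values())
probs = {k: e / total for k, e in exps.items()}

# The highest score (here "non-fatal") is also the most probable class
assert max(probs, key=probs.get) == "non-fatal"
print(round(probs["fatal"], 2))   # 0.38
```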


12.8 Advantages and weaknesses

• A statistical classification method more than a data-mining one
• Very popular in the social sciences
• Use and performance are similar to multiple linear regression:
→ uses the same least-squares method
→ normality assumption (predictors are approximately multivariate normal)
→ works well in practice even when predictors are not normal
→ for heavily skewed variables, a log transform helps
• It provides estimates of single-predictor contributions, and is computationally simple, useful for small datasets.


Ch. 12 Problems

• 12.1 (Personal Loan)
• 12.2 (System Administrator)
• 12.3 (detecting spam mail)