12. Discriminant Analysis - KOCW (contents.kocw.net/KOCW/document/2014/hanyang/huhseon/9.pdf)
2014-11-02
Hanyang University, Quest Lab.
Chapter 12
Discriminant Analysis
Fall 2014, Department of IME, Hanyang University
• Classical statistical method for classification and profiling
• Model-based approach to classification
• Classification is based on the “statistical” distance from each class
average
• Classification functions are used to produce classification scores
• Common uses: classifying microbial species, loans, credit cards, insurance, college admission decisions, fingerprint analysis, etc.
12.1 Introduction
Example 1 : Riding mowers (from Chapter 7)
12 owners and 12 nonowners
Example 1 : Riding mowers (from Chapter 9)
The line seems to do a good job: 4 misclassifications. Can we do better?
12.2 Distance of an observation from a class
General idea:
• To classify a new record, measure its distance from the center
of each class
• Then, classify the record to the closest class
Step 1: Measuring Distance

• Euclidean distance between an item and the mean of a class:

$$D_{\text{Euclidean}}(\mathbf{x}, \bar{\mathbf{x}}) = \sqrt{(x_1 - \bar{x}_1)^2 + (x_2 - \bar{x}_2)^2 + \cdots + (x_p - \bar{x}_p)^2}$$

→ The distance value changes with the units of each predictor (e.g., dollars vs. thousands of dollars)
→ It ignores the variability of each predictor (e.g., even if a record is closer to class B than to class A, classifying it as A may be more reasonable if B's variability is smaller)
→ Euclidean distance also ignores the correlations between predictors
→ Remedy: use the "statistical distance" (Mahalanobis distance)
Statistical distance (Mahalanobis distance):

$$D_{\text{Statistical}}(\mathbf{x}, \bar{\mathbf{x}}) = \sqrt{[\mathbf{x} - \bar{\mathbf{x}}]^T\, S^{-1}\, [\mathbf{x} - \bar{\mathbf{x}}]}, \qquad \mathbf{x} - \bar{\mathbf{x}} = [(x_1 - \bar{x}_1), (x_2 - \bar{x}_2), \ldots, (x_p - \bar{x}_p)]^T$$

where $S$ is the covariance matrix.

If $p = 1$ (a single predictor), this reduces to the z-score.
If $S = I$ (all predictors uncorrelated, with unit variances), then $D_{\text{Statistical}}(\mathbf{x}, \bar{\mathbf{x}}) = D_{\text{Euclidean}}(\mathbf{x}, \bar{\mathbf{x}})$.
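As a sanity check, both distances can be computed with NumPy. The class mean and covariance matrix below are made-up illustration values, not the riding-mowers estimates.

```python
import numpy as np

def euclidean(x, xbar):
    """Plain Euclidean distance from a record x to a class mean xbar."""
    return float(np.sqrt(np.sum((x - xbar) ** 2)))

def statistical(x, xbar, S):
    """Statistical (Mahalanobis) distance using covariance matrix S."""
    d = x - xbar
    return float(np.sqrt(d @ np.linalg.inv(S) @ d))

x    = np.array([60.0, 18.4])     # income ($K), lot size (K sq ft)
xbar = np.array([79.5, 20.3])     # hypothetical class mean
S    = np.array([[350.0, 10.0],
                 [ 10.0,  4.0]])  # hypothetical covariance matrix

print(euclidean(x, xbar))
print(statistical(x, xbar, S))

# With S = I the two distances coincide, as stated above.
assert np.isclose(statistical(x, xbar, np.eye(2)), euclidean(x, xbar))
```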
12.3 Fisher's linear classification functions

Step 2: Classification Functions

Idea: find a linear function of the measurements that maximizes the ratio of between-class separation to within-class variability:
• The larger the numerator, the more clearly the classes are separated; the smaller the denominator, the more homogeneous each class is
• For each observation, compute a score with each class's function and assign the observation to the class with the largest score
• One classification function is generated for each class
Record #1: income = $60K, lot size = 18.4K
Owner score = -73.16 + (0.43)(60) + (5.47)(18.4) = 53.2
Non-owner score = -51.42 + (0.33)(60) + (4.68)(18.4) = 54.48
The "non-owner" score is higher → classify as non-owner
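The two scores can be reproduced directly from the coefficients shown on the slide; only the rounding of the printed values is ours.

```python
# Fisher classification scores for Record #1 of the riding-mowers data,
# using the coefficient values shown on the slide.
income, lot_size = 60.0, 18.4

owner_score     = -73.16 + 0.43 * income + 5.47 * lot_size
non_owner_score = -51.42 + 0.33 * income + 4.68 * lot_size

print(round(owner_score, 2))       # 53.29
print(round(non_owner_score, 2))   # 54.49
print("owner" if owner_score > non_owner_score else "non-owner")  # non-owner
```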
How do we find these functions?
Step 3 (an alternative way to classify): Converting to Probabilities

• Classification scores can be converted into probabilities of belonging to a class:

$$P(\text{class } k \mid \mathbf{x}) = \frac{e^{c_k(\mathbf{x})}}{e^{c_1(\mathbf{x})} + e^{c_2(\mathbf{x})} + \cdots + e^{c_m(\mathbf{x})}}$$

where $c_k(\mathbf{x})$ is the classification score of class $k$; this is the probability that the record belongs to class $k$.
• The probability is then compared to the cutoff value in order to classify a record
• Advantage: a lift chart can be built directly from these probabilities
$$\frac{e^{53.2}}{e^{53.2} + e^{54.48}} = 0.218$$
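The 0.218 follows from plugging the two (rounded) scores into the probability formula; a quick check, subtracting the maximum score first for numerical stability:

```python
import math

scores = {"owner": 53.2, "non-owner": 54.48}  # rounded slide values

m = max(scores.values())  # subtract max before exponentiating (stability)
total = sum(math.exp(s - m) for s in scores.values())
p_owner = math.exp(scores["owner"] - m) / total

print(round(p_owner, 3))  # 0.218
```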
Linear/Quadratic Discriminant Analysis
Bayesian Decision Theory

Posterior probability from Bayes' formula:

$$P(C_j \mid \mathbf{x}) = \frac{P(\mathbf{x} \mid C_j)\, P(C_j)}{P(\mathbf{x})}$$

• $P(C_j)$: class probabilities (prior)
• $P(\mathbf{x} \mid C_j)$: conditional density of $\mathbf{x}$
• Unconditional density of $\mathbf{x}$: $P(\mathbf{x}) = \sum_{j=1}^{n} P(\mathbf{x} \mid C_j)\, P(C_j)$
Bayesian Classifier

Find the maximum posterior probability:
For a given $\mathbf{x}$, if $P(C_1 \mid \mathbf{x}) > P(C_2 \mid \mathbf{x})$, then $\mathbf{x}$ is classified to class 1; otherwise, class 2.

Since $P(C_i \mid \mathbf{x}) = P(\mathbf{x} \mid C_i)\, P(C_i) / P(\mathbf{x})$, this is equivalent to:
For a given $\mathbf{x}$, if $p(\mathbf{x} \mid C_1)\, P(C_1) > p(\mathbf{x} \mid C_2)\, P(C_2)$, then $\mathbf{x}$ is classified to class 1; otherwise, class 2.
Computing the prior probabilities: P(C1) = n1 / N, P(C2) = n2 / N. These are estimates, not exact values (they approach the true values as N grows).
Computing the likelihood: estimate P(x | Ci) from the training samples that belong to Ci.
• Naïve Bayes: P(x | Ci) = P(x1 | Ci) P(x2 | Ci) ⋯ P(xp | Ci)
• In DA: P(x | Ci) is assumed to follow a multivariate normal distribution.
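A minimal sketch of these estimation steps, using synthetic two-class training data and the DA choice of likelihood (a multivariate normal fitted per class); all numbers are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic 2-class training set (labels 0 and 1).
X0 = rng.normal([0, 0], 1.0, size=(30, 2))
X1 = rng.normal([3, 3], 1.0, size=(20, 2))
N = len(X0) + len(X1)

# Priors estimated as class proportions: P(Ci) = ni / N.
priors = {0: len(X0) / N, 1: len(X1) / N}

# Likelihood P(x|Ci) estimated as a multivariate normal fitted
# to the samples of class Ci (the DA assumption).
params = {c: (X.mean(axis=0), np.cov(X, rowvar=False))
          for c, X in {0: X0, 1: X1}.items()}

def log_mvn(x, mu, Sigma):
    """Log of the multivariate normal density."""
    d = x - mu
    p = len(x)
    return -0.5 * (p * np.log(2 * np.pi) + np.log(np.linalg.det(Sigma))
                   + d @ np.linalg.inv(Sigma) @ d)

def classify(x):
    # Maximum posterior: argmax over classes of log P(x|Ci) + log P(Ci).
    return max(params, key=lambda c: log_mvn(x, *params[c]) + np.log(priors[c]))

print(classify(np.array([0.2, -0.1])))  # near the class-0 mean
print(classify(np.array([2.8, 3.1])))   # near the class-1 mean
```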
Classification Based on Normal Populations

The normal (Gaussian) distribution:
• often matches real-world data
• is fully described by only two kinds of parameters, the mean and the variance
• is mathematically attractive
Under the assumption that the likelihood follows a normal distribution, we can analyze the behavior of the Bayesian classifier.
Multivariate Normal Distribution

$$p(\mathbf{x} \mid C_j) = \frac{1}{(2\pi)^{p/2}\, |\Sigma_j|^{1/2}} \exp\!\left[ -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_j)^T \Sigma_j^{-1} (\mathbf{x} - \boldsymbol{\mu}_j) \right]$$

where $\mathbf{x} = (x_1, \ldots, x_p)^T \sim \mathrm{MVN}(\boldsymbol{\mu}, \Sigma)$, with $\boldsymbol{\mu} = (\mu_1, \ldots, \mu_p)^T$ and

$$\Sigma = \begin{bmatrix} \sigma_{11} & \cdots & \sigma_{1p} \\ \vdots & \ddots & \vdots \\ \sigma_{p1} & \cdots & \sigma_{pp} \end{bmatrix}$$

In case $p = 1$:

$$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left[ -\frac{(x - \mu)^2}{2\sigma^2} \right]$$
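The density formula can be coded directly; as a check, for p = 1 it reduces to the univariate normal density given above (the numbers are arbitrary).

```python
import numpy as np

def mvn_density(x, mu, Sigma):
    """Multivariate normal density p(x | mu, Sigma), per the formula above."""
    x, mu = np.atleast_1d(x).astype(float), np.atleast_1d(mu).astype(float)
    Sigma = np.atleast_2d(Sigma).astype(float)
    p = len(x)
    d = x - mu
    norm = (2 * np.pi) ** (p / 2) * np.sqrt(np.linalg.det(Sigma))
    return float(np.exp(-0.5 * d @ np.linalg.inv(Sigma) @ d) / norm)

# For p = 1 this reduces to the familiar univariate normal density.
mu, sigma = 1.0, 2.0
x = 1.5
uni = np.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)
assert np.isclose(mvn_density(x, mu, sigma ** 2), uni)
print(uni)
```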
Discriminant Function

Discriminant function in terms of posterior probability:

$$\delta_i(\mathbf{x}) \equiv P(C_i \mid \mathbf{x}) \propto p(\mathbf{x} \mid C_i)\, P(C_i), \quad \text{or} \quad \delta_i(\mathbf{x}) = \ln p(\mathbf{x} \mid C_i) + \ln P(C_i)$$

for $i = 1, 2, \ldots, K$ (where $K$ is the number of classes).

The classifier will assign an observation $\mathbf{x}$ to class $C_i$ if

$$\delta_i(\mathbf{x}) > \delta_j(\mathbf{x}) \quad \text{for all } j \neq i$$
Discriminant Function for Normal Distribution

Substituting the multivariate normal density

$$p(\mathbf{x} \mid C_i) = \frac{1}{(2\pi)^{p/2}\, |\Sigma_i|^{1/2}} \exp\!\left[ -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_i)^T \Sigma_i^{-1} (\mathbf{x} - \boldsymbol{\mu}_i) \right]$$

into $\delta_i(\mathbf{x}) = \ln p(\mathbf{x} \mid C_i) + \ln P(C_i)$ gives

$$\delta_i(\mathbf{x}) = -\frac{p}{2}\ln(2\pi) - \frac{1}{2}\ln|\Sigma_i| - \frac{1}{2}(\mathbf{x} - \boldsymbol{\mu}_i)^T \Sigma_i^{-1} (\mathbf{x} - \boldsymbol{\mu}_i) + \ln P(C_i)$$

Decision boundary between class $k$ and class $r$:

$$\delta_k(\mathbf{x}) = \delta_r(\mathbf{x})$$
Derivation of Decision Boundary

Setting $\delta_1(\mathbf{x}) - \delta_2(\mathbf{x}) = 0$:

$$\ln P(1) - \frac{p}{2}\ln(2\pi) - \frac{1}{2}\ln|\Sigma_1| - \frac{1}{2}(\mathbf{x} - \boldsymbol{\mu}_1)^T \Sigma_1^{-1} (\mathbf{x} - \boldsymbol{\mu}_1)$$
$$- \ln P(2) + \frac{p}{2}\ln(2\pi) + \frac{1}{2}\ln|\Sigma_2| + \frac{1}{2}(\mathbf{x} - \boldsymbol{\mu}_2)^T \Sigma_2^{-1} (\mathbf{x} - \boldsymbol{\mu}_2) = 0$$

which rearranges to

$$(\mathbf{x} - \boldsymbol{\mu}_1)^T \Sigma_1^{-1} (\mathbf{x} - \boldsymbol{\mu}_1) - (\mathbf{x} - \boldsymbol{\mu}_2)^T \Sigma_2^{-1} (\mathbf{x} - \boldsymbol{\mu}_2) = \ln\frac{|\Sigma_2|}{|\Sigma_1|} + 2\ln\frac{P(1)}{P(2)}$$
Decision Boundary When $\Sigma_1 = \Sigma_2$

When $\Sigma_1 = \Sigma_2 = \Sigma$, the quadratic terms $\mathbf{x}^T \Sigma^{-1} \mathbf{x}$ cancel and the boundary reduces to

$$(\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2)^T \Sigma^{-1} \mathbf{x} - \frac{1}{2}(\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2)^T \Sigma^{-1} (\boldsymbol{\mu}_1 + \boldsymbol{\mu}_2) + \ln\frac{P(1)}{P(2)} = 0$$

In practice $\Sigma$ is estimated by the pooled covariance matrix

$$\hat{\Sigma} = \frac{n_1 - 1}{n_1 + n_2 - 2}\,\hat{\Sigma}_1 + \frac{n_2 - 1}{n_1 + n_2 - 2}\,\hat{\Sigma}_2$$

Writing the boundary as $A\mathbf{x} + B = 0$ with $A = (\boldsymbol{\mu}_1 - \boldsymbol{\mu}_2)^T \Sigma^{-1}$, this is a linear equation.
Decision: assign $\mathbf{x}$ to class 1 if $A\mathbf{x} + B > 0$, to class 2 if $A\mathbf{x} + B < 0$.
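For the equal-covariance case, the linear rule Ax + B can be assembled as sketched below; the means, common covariance, and priors are made-up illustration values.

```python
import numpy as np

mu1 = np.array([2.0, 2.0])
mu2 = np.array([0.0, 0.0])
Sigma = np.array([[1.0, 0.3],
                  [0.3, 1.0]])  # common covariance (assumed known here)
P1, P2 = 0.5, 0.5               # priors

Sinv = np.linalg.inv(Sigma)
A = (mu1 - mu2) @ Sinv  # coefficient (row) vector of the linear boundary
B = -0.5 * (mu1 - mu2) @ Sinv @ (mu1 + mu2) + np.log(P1 / P2)

def classify(x):
    # Linear rule: class 1 if A·x + B > 0, else class 2.
    return 1 if A @ x + B > 0 else 2

print(classify(np.array([2.1, 1.8])))   # near mu1
print(classify(np.array([-0.5, 0.2])))  # near mu2
```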
• Decision boundary (figure omitted)
• Example (figure omitted)
Decision Boundary When $\Sigma_1 \neq \Sigma_2$

When $\Sigma_1 \neq \Sigma_2$, the quadratic terms no longer cancel:

$$-\frac{1}{2}\mathbf{x}^T (\Sigma_1^{-1} - \Sigma_2^{-1}) \mathbf{x} + (\boldsymbol{\mu}_1^T \Sigma_1^{-1} - \boldsymbol{\mu}_2^T \Sigma_2^{-1})\mathbf{x} + C = 0$$

where

$$C = -\frac{1}{2}(\boldsymbol{\mu}_1^T \Sigma_1^{-1} \boldsymbol{\mu}_1 - \boldsymbol{\mu}_2^T \Sigma_2^{-1} \boldsymbol{\mu}_2) - \frac{1}{2}\ln\frac{|\Sigma_1|}{|\Sigma_2|} + \ln\frac{P(1)}{P(2)}$$

Writing the boundary as $\mathbf{x}^T A \mathbf{x} + B\mathbf{x} + C = 0$, this is a quadratic equation.
Decision: assign $\mathbf{x}$ to class 1 if $\mathbf{x}^T A \mathbf{x} + B\mathbf{x} + C > 0$, to class 2 if $\mathbf{x}^T A \mathbf{x} + B\mathbf{x} + C < 0$.
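The quadratic rule is simply the full discriminant δi(x) evaluated per class, so it need not be expanded by hand; a minimal sketch with made-up class parameters:

```python
import numpy as np

classes = {
    1: {"mu": np.array([0.0, 0.0]),
        "Sigma": np.array([[1.0, 0.0], [0.0, 1.0]]),
        "prior": 0.5},
    2: {"mu": np.array([3.0, 0.0]),
        "Sigma": np.array([[4.0, 0.0], [0.0, 0.25]]),
        "prior": 0.5},
}

def delta(x, c):
    """Quadratic discriminant delta_i(x) for normal class-conditionals."""
    mu, Sigma, prior = c["mu"], c["Sigma"], c["prior"]
    d = x - mu
    return (-0.5 * np.log(np.linalg.det(Sigma))
            - 0.5 * d @ np.linalg.inv(Sigma) @ d
            + np.log(prior))  # the common -p/2 ln(2*pi) term is dropped

def classify(x):
    return max(classes, key=lambda k: delta(x, classes[k]))

print(classify(np.array([0.2, 0.1])))  # near class-1 mean
print(classify(np.array([3.1, 0.0])))  # near class-2 mean
```

Because the two covariance matrices differ, the implied boundary between the two classes is quadratic rather than linear.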
• Example (figure omitted)
Linear Decision Boundary from LDA
(a) Three normal distributions, with the same covariance and different means.
(b) A sample of 30 drawn from each normal distribution, and the fitted LDA decision boundaries.
Quadratic Decision Boundary from QDA
QDA vs. LDA
12.4 Classification performance of discriminant analysis
Main assumptions:

① A multivariate normal distribution is assumed when measuring the distance from each class.
• When this assumption holds, DA is more effective than any other classification method (that is, it can achieve the same results with less data)
• It is robust even when predictors are non-normal or are dummy variables
• It is sensitive to outliers, so exploratory analysis is needed
② The correlation structure within a class is assumed to be the same across all classes.
• Check this by computing the correlation matrix of each class and comparing them
• When the correlation matrices differ across classes, records tend to be classified into the class with the greatest variability
• An alternative is to perform quadratic discriminant analysis (if the dataset is very large)
12.5 Prior probabilities
③ The method assumes that each class is equally likely to be encountered in a future item. What if the priors differ?

Let $p_j$ = prior probability of membership in class $j$. Add $\log(p_j)$ (natural logarithm) to the classification function of each class.

Ex.: suppose the population proportion of riding-mower owners is 15% (in the sample it was 12 vs. 12, i.e., 50%).
→ The classifier should be steered to classify fewer records as owners
→ Add log(0.15) to the constant term of the owner classification function in Fig 12.3, and log(0.85) to the nonowner function
For a record with income = $60K and lot size = 18.4K:
Owner score = -73.16 + (0.43)(60) + (5.47)(18.4) + log(0.15) = 51.39
Non-owner score = -51.42 + (0.33)(60) + (4.68)(18.4) + log(0.85) = 54.33
The non-owner score is still higher, so the record is classified as a non-owner.
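The prior adjustment can be verified directly; the slide's coefficients are used as given, and the printed rounding is ours.

```python
import math

income, lot_size = 60.0, 18.4
p_owner = 0.15  # assumed population proportion of riding-mower owners

# Add log(prior) to each class's classification function.
owner     = -73.16 + 0.43 * income + 5.47 * lot_size + math.log(p_owner)
non_owner = -51.42 + 0.33 * income + 4.68 * lot_size + math.log(1 - p_owner)

print(round(owner, 2), round(non_owner, 2))  # 51.39 54.33
print("owner" if owner > non_owner else "non-owner")  # still non-owner
```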
12.6 Unequal misclassification costs
④ What if the misclassification costs are not symmetric?

Let $q_j$ = the cost of misclassifying a record from class $j$.
• Add $\log(q_j)$ to the classification function of each class (if prior probabilities are considered at the same time, this amounts to adding $\log(p_j q_j)$)
• If the individual costs are hard to estimate, use their ratio $q_2/q_1$; in that case, add $\log(q_2/q_1)$ to the classification function of class 2 only
12.7 Classifying more than two classes
Example 3: Medical dispatch to accident scene

When emergency (119) resources are temporarily scarce and a dispatch request arrives, which scene should be served?
Outcome (3 classes):
No injury
Non-fatal injury
Fatal injury
Predictors: time of day, day of week, weather, type of road, road surface conditions, …
- Applied discriminant analysis to 1,000 records
- There are three classification functions (one per class), and the confusion matrix is 3×3
- Compute the classification scores for each record
- Assign each record to the class with the highest classification score
$$\frac{e^{30.93}}{e^{25.94} + e^{31.42} + e^{30.93}} = 0.38$$
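The 0.38 is the score-to-probability formula applied to the three classification scores; the slide does not label which class each score belongs to, so the indexing below is illustrative.

```python
import math

scores = [25.94, 31.42, 30.93]  # three classification scores from the slide

m = max(scores)  # subtract the max before exponentiating (stability)
exps = [math.exp(s - m) for s in scores]
p = exps[2] / sum(exps)  # probability for the class with score 30.93

print(round(p, 2))  # 0.38
```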
12.8 Advantages and weaknesses
• A statistical classification method more than a data mining method
• Very popular in the social sciences
• Use and performance are similar to multiple linear regression:
-> uses the same least-squares method
-> normality assumption (predictors approximately multivariate normal)
-> works well in practice even when the data are not actually normal
-> heavily skewed variables should be log-transformed first
• It provides estimates of single-predictor contributions, is computationally simple, and is useful for small datasets.
Ch. 12 Problems
• 12.1 (Personal Loan)
• 12.2 (System Administrator)
• 12.3 (Detecting spam mail)