
Chapter 8

Discriminant Analysis

8.1 Introduction

Classification is an important issue in multivariate analysis and data mining.

Classification: constructs a model from the training set and the values (class labels) in a classifying attribute, and uses the model to classify new data, i.e., to predict unknown or missing class labels.

Classification—A Two-Step Process

1. Model construction: describing a set of predetermined classes. Each tuple/sample is assumed to belong to a predefined class, as determined by the class label attribute. The set of tuples used for model construction is the training set. The model is represented as classification rules, decision trees, or mathematical formulae.

2. Prediction: classifying future or unknown objects. First estimate the accuracy of the model: the known label of each test sample is compared with the result from the model, and the accuracy rate is the percentage of test-set samples that are correctly classified. The test set must be independent of the training set, otherwise over-fitting will occur. If the accuracy is acceptable, use the model to classify data tuples whose class labels are not known. (A minimal code sketch of the two-step process follows.)
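As an illustration, here is a minimal sketch of the two steps in Python; the synthetic data and the choice of a decision-tree learner are assumptions for illustration, not part of the chapter.

# Minimal sketch of the two-step classification process.
# Assumptions: synthetic data and a decision-tree model, for illustration only.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
# Labeled data: X holds the attributes, y the class labels.
X, y = make_classification(n_samples=200, n_features=4, random_state=0)
# Hold out a test set so it stays independent of the training set.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
# Step 1: model construction on the training set.
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
# Step 2: estimate accuracy by comparing known test labels with predictions.
print("accuracy:", accuracy_score(y_test, model.predict(X_test)))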

Classification Process: Model Construction

Training Data:

NAME   RANK            YEARS   TENURED
Mike   Assistant Prof  3       no
Mary   Assistant Prof  7       yes
Bill   Professor       2       yes
Jim    Associate Prof  7       yes
Dave   Assistant Prof  6       no
Anne   Associate Prof  3       no

Running a classification algorithm on the training data yields the classifier (model):

IF rank = 'professor' OR years > 6 THEN tenured = 'yes'

Classification Process: Use the Model in Prediction

The classifier is first applied to testing data to estimate its accuracy:

NAME     RANK            YEARS   TENURED
Tom      Assistant Prof  2       no
Merlisa  Associate Prof  7       no
George   Professor       5       yes
Joseph   Assistant Prof  7       yes

Unseen Data

(Jeff, Professor, 4): Tenured?
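Applying the model by hand: a minimal sketch that encodes the learned rule as a Python function and checks it against the testing data above (the function name and record layout are ours, not the slide's).

# The classifier learned in the model-construction step.
def tenured(rank, years):
    return "yes" if rank == "Professor" or years > 6 else "no"
# Testing data from the table above: (name, rank, years, true label).
testing = [("Tom", "Assistant Prof", 2, "no"),
           ("Merlisa", "Associate Prof", 7, "no"),
           ("George", "Professor", 5, "yes"),
           ("Joseph", "Assistant Prof", 7, "yes")]
correct = sum(tenured(r, yrs) == label for _, r, yrs, label in testing)
print("accuracy:", correct, "/", len(testing))   # 3/4 -- Merlisa is misclassified
# The unseen record (Jeff, Professor, 4): the rank clause fires.
print("Jeff tenured?", tenured("Professor", 4))  # -> yes

The 75% accuracy on the independent test set is the estimate the two-step process reports before the model is accepted for classifying unseen records such as Jeff's.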

Supervised vs. Unsupervised Learning

Supervised learning (classification)

Supervision: the training data (observations, measurements, etc.) are accompanied by labels indicating the class of each observation.

New data are classified based on the training set.

Unsupervised learning (clustering)

The class labels of the training data are unknown.

Given a set of measurements, observations, etc., the aim is to establish the existence of classes or clusters in the data.

Discrimination—Introduction

Discrimination is a technique concerned with allocating new observations to previously defined groups.

There are k samples from k distinct populations:

$$G_1: x_{11}, \ldots, x_{1n_1}; \qquad G_2: x_{21}, \ldots, x_{2n_2}; \qquad \ldots; \qquad G_k: x_{k1}, \ldots, x_{kn_k},$$

where each observation $x_{ji} = (x_{ji1}, \ldots, x_{jip})'$ is a vector in $R^p$. One wants to find a so-called discriminant function and a related rule to classify new observations.

Example 11.3: Bivariate case

[Figure: two bivariate groups $G_1$ and $G_2$ separated by the line $l'x = a$.]

Discriminant function: $w(x) = l'x$.

Rule: assign $x$ to $G_1$ if $w(x) \ge a$; assign $x$ to $G_2$ if $w(x) < a$.

Example 11.1: Riding mowers

Consider two groups in a city: riding-mower owners and those without riding mowers. In order to identify the best sales prospects for an intensive sales campaign, a riding-mower manufacturer is interested in classifying families as prospective owners or non-owners on the basis of income and lot size.

Example 11.1: Riding mowers

π1: Riding-mower owners            π2: Nonowners
x1 (Income,   x2 (Lot size,        x1 (Income,   x2 (Lot size,
 $1000s)       1000 ft²)            $1000s)       1000 ft²)
  60.0          18.4                 75.0          19.6
  85.5          16.8                 52.8          20.8
  64.8          21.6                 64.8          17.2
  61.5          20.8                 43.2          20.4
  87.0          23.6                 84.0          17.6
 110.1          19.2                 49.2          17.6
 108.0          17.6                 59.4          16.0
  82.8          22.4                 66.0          18.4
  69.0          20.0                 47.4          16.4
  93.0          20.8                 33.0          18.8
  51.0          22.0                 51.0          14.0
  81.0          20.0                 63.0          14.8

Example 11.1: Riding mowers

The classification results are:

            Classified as G1   Classified as G2
True G1           10                  2
True G2            2                 10
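The slide gives no code for this example; the following sketch reproduces the analysis in Python, with scikit-learn's LinearDiscriminantAnalysis as an assumed stand-in for whatever software produced the table above.

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import confusion_matrix
# (income in $1000s, lot size in 1000 ft^2) from the table above.
owners = np.array([[60, 18.4], [85.5, 16.8], [64.8, 21.6], [61.5, 20.8],
                   [87, 23.6], [110.1, 19.2], [108, 17.6], [82.8, 22.4],
                   [69, 20], [93, 20.8], [51, 22], [81, 20]])
nonowners = np.array([[75, 19.6], [52.8, 20.8], [64.8, 17.2], [43.2, 20.4],
                      [84, 17.6], [49.2, 17.6], [59.4, 16], [66, 18.4],
                      [47.4, 16.4], [33, 18.8], [51, 14], [63, 14.8]])
X = np.vstack([owners, nonowners])
y = np.array([1] * len(owners) + [2] * len(nonowners))  # 1 = owner, 2 = nonowner
lda = LinearDiscriminantAnalysis().fit(X, y)
# Apparent (in-sample) confusion matrix; compare with the table above.
print(confusion_matrix(y, lda.predict(X)))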

8.2 Discriminant by Distance

Assume k = 2 for simplicity:

$$G_1: N_p(\mu_1, \Sigma_1), \qquad G_2: N_p(\mu_2, \Sigma_2).$$

Consider the Mahalanobis distance

$$d^2(x, G_j) = (x - \mu_j)'\Sigma_j^{-1}(x - \mu_j), \qquad j = 1, 2.$$

Discriminant function: $w(x) = d^2(x, G_2) - d^2(x, G_1)$.

Rule: assign $x$ to $G_1$ if $w(x) \ge 0$; assign $x$ to $G_2$ if $w(x) < 0$.

When $\Sigma_1 = \Sigma_2 = \Sigma$,

$$w(x) = (x - \mu_2)'\Sigma^{-1}(x - \mu_2) - (x - \mu_1)'\Sigma^{-1}(x - \mu_1) = 2\left(x - \frac{\mu_1 + \mu_2}{2}\right)'\Sigma^{-1}(\mu_1 - \mu_2).$$

Let

$$\bar{\mu} = \frac{\mu_1 + \mu_2}{2}, \qquad c = \Sigma^{-1}(\mu_1 - \mu_2).$$

The discriminant function can then be written as

$$w(x) = 2(x - \bar{\mu})'\Sigma^{-1}(\mu_1 - \mu_2) = 2c'(x - \bar{\mu}).$$

When $\mu_1, \mu_2, \Sigma$ are known, the rule can be applied directly; when they are unknown, their estimators are

$$\hat{\mu}_j = \bar{x}_j = \frac{1}{n_j}\sum_{i=1}^{n_j} x_{ji} \quad (j = 1, 2), \qquad \hat{\Sigma} = \frac{1}{n_1 + n_2 - 2}(A_1 + A_2),$$

where

$$A_j = \sum_{i=1}^{n_j}(x_{ji} - \bar{x}_j)(x_{ji} - \bar{x}_j)', \qquad j = 1, 2.$$
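A minimal numeric sketch of this plug-in rule in Python; the two sample arrays are hypothetical illustration data, not from the chapter.

import numpy as np
rng = np.random.default_rng(0)
# Hypothetical training samples from two populations in R^2.
X1 = rng.normal([0.0, 0.0], 1.0, size=(20, 2))
X2 = rng.normal([2.0, 1.0], 1.0, size=(25, 2))
xbar1, xbar2 = X1.mean(axis=0), X2.mean(axis=0)
A1 = (X1 - xbar1).T @ (X1 - xbar1)
A2 = (X2 - xbar2).T @ (X2 - xbar2)
Sigma_hat = (A1 + A2) / (len(X1) + len(X2) - 2)  # pooled covariance estimate
c = np.linalg.solve(Sigma_hat, xbar1 - xbar2)    # c = Sigma^{-1}(mu1 - mu2)
mu_bar = (xbar1 + xbar2) / 2
def w(x):
    # Distance discriminant w(x) = 2 c'(x - mu_bar); w(x) >= 0 means G1.
    return 2 * c @ (x - mu_bar)
x_new = np.array([0.5, 0.2])
print("assign to", "G1" if w(x_new) >= 0 else "G2")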

Example: Univariate case with equal variance

$$G_1: N(\mu_1, \sigma^2), \qquad G_2: N(\mu_2, \sigma^2), \qquad \text{say } \mu_1 > \mu_2.$$

Rule: assign $x$ to $G_1$ if $x \ge a$; assign $x$ to $G_2$ if $x < a$, where

$$a = \frac{\mu_1 + \mu_2}{2}.$$

Example: Univariate case with unequal variances

$$G_1: N(\mu_1, \sigma_1^2), \qquad G_2: N(\mu_2, \sigma_2^2).$$

The cutting point becomes

$$a^* = \frac{\sigma_2\mu_1 + \sigma_1\mu_2}{\sigma_1 + \sigma_2},$$

the point at which the standardized distances $|x - \mu_1|/\sigma_1$ and $|x - \mu_2|/\sigma_2$ are equal.
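As a quick numeric check with hypothetical values (not from the chapter), take $\mu_1 = 4$, $\sigma_1 = 1$ and $\mu_2 = 0$, $\sigma_2 = 3$:

$$a = \frac{\mu_1 + \mu_2}{2} = \frac{4 + 0}{2} = 2, \qquad a^* = \frac{\sigma_2\mu_1 + \sigma_1\mu_2}{\sigma_1 + \sigma_2} = \frac{3 \cdot 4 + 1 \cdot 0}{1 + 3} = 3.$$

The cutting point moves toward the mean of the less dispersed population, so the tighter population receives a smaller classification region.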

8.3 Fisher's Discriminant Function

Idea: projection, ANOVA.

Training samples:

$$G_1: N_p(\mu_1, \Sigma_1),\ x_{11}, \ldots, x_{1n_1}; \qquad \ldots; \qquad G_k: N_p(\mu_k, \Sigma_k),\ x_{k1}, \ldots, x_{kn_k}.$$

Projecting the data onto a direction $l \in R^p$, the F-statistic is

$$F(l) = \frac{(n - k)\,l'Bl}{(k - 1)\,l'El}, \qquad n = n_1 + \cdots + n_k,$$

where

$$B = \sum_{a=1}^{k} n_a(\bar{x}_a - \bar{x})(\bar{x}_a - \bar{x})', \qquad E = \sum_{a=1}^{k}\sum_{j=1}^{n_a}(x_{aj} - \bar{x}_a)(x_{aj} - \bar{x}_a)'$$

are the between-groups and within-groups sums of squares and cross-products matrices.

We want to find $l^* \in R^p$ such that

$$F(l^*) = \max_{l \in R^p} F(l).$$

The solution $l^*$ is the eigenvector associated with the largest eigenvalue $\lambda$ of $|B - \lambda E| = 0$.

Discriminant function: $u(x) = l'x$, where $l = l^*$.
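A sketch of this eigenproblem in Python; scipy.linalg.eigh solves the generalized problem $Bv = \lambda Ev$ directly, and the three random groups are assumptions for illustration.

import numpy as np
from scipy.linalg import eigh
rng = np.random.default_rng(1)
# Hypothetical samples from k = 3 groups in R^2.
groups = [rng.normal(m, 1.0, size=(30, 2)) for m in ([0, 0], [3, 1], [1, 4])]
xbar = np.vstack(groups).mean(axis=0)
B = sum(len(g) * np.outer(g.mean(0) - xbar, g.mean(0) - xbar) for g in groups)
E = sum((g - g.mean(0)).T @ (g - g.mean(0)) for g in groups)
# Generalized eigenproblem B v = lambda E v; eigh returns ascending eigenvalues.
eigvals, eigvecs = eigh(B, E)
l_star = eigvecs[:, -1]            # direction maximizing F(l)
print("largest eigenvalue:", eigvals[-1])
u = lambda x: l_star @ x           # Fisher's discriminant function u(x) = l'x
print("u of a new point:", u(np.array([1.0, 1.0])))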

(B) Two Populations

$$B = n_1(\bar{x}_1 - \bar{x})(\bar{x}_1 - \bar{x})' + n_2(\bar{x}_2 - \bar{x})(\bar{x}_2 - \bar{x})'$$

Note

$$\bar{x} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2}.$$

We have $E = A_1 + A_2$ and

$$B = \frac{n_1 n_2}{n_1 + n_2}(\bar{x}_1 - \bar{x}_2)(\bar{x}_1 - \bar{x}_2)'.$$

There is only one non-zero eigenvalue of $|B - \lambda E| = 0$, since $\mathrm{rank}(B) = 1$. The associated eigenvector is $E^{-1}(\bar{x}_1 - \bar{x}_2)$.

Discriminant function: $u(x) = (\bar{x}_1 - \bar{x}_2)'E^{-1}x = c'x$, with $c = E^{-1}(\bar{x}_1 - \bar{x}_2)$.

Rule (when $\Sigma_1 = \Sigma_2$): assign $x$ to $G_1$ if $u(x) \ge \bar{c}$; assign $x$ to $G_2$ if $u(x) < \bar{c}$, where

$$\bar{c} = \frac{1}{2}c'(\bar{x}_1 + \bar{x}_2).$$

(B) Two Populations

When the assumption $\Sigma_1 = \Sigma_2$ is dropped, the pooled matrix $E$ is replaced by the separate within-group matrices $A_1$ and $A_2$, giving each group its own coefficient vector

$$\hat{c}_j = A_j^{-1}(\bar{x}_1 - \bar{x}_2), \qquad j = 1, 2,$$

and the rule classifies $x$ by comparing the projections $\hat{c}_1'x$ and $\hat{c}_2'x$ with cutting points formed from the group means, in the same way as above.

Example: Insect Classification

Table 2.1 Data of two species of insects. Here x1 and x2 are measured characteristics of the insects (Hoel, 1947), n.g. is the natural group (species), c.g. the classified group, and y the value of the discriminant function.

Species 1:
No.  x1    x2    n.g.  c.g.  y
1    6.36  5.24  1     1     2.4713
2    5.92  5.12  1     2     2.3335
3    5.92  5.36  1     1     2.3663
4    6.44  5.64  1     1     2.5481
5    6.40  5.16  1     1     2.4714
6    6.56  5.56  1     1     2.5702
7    6.64  5.36  1     1     2.5650
8    6.68  4.96  1     1     2.5213
9    6.72  5.48  1     1     2.6034
10   6.76  5.60  1     1     2.6309
11   6.72  5.08  1     1     2.5488

Species 2:
No.  x1    x2    n.g.  c.g.  y
1    6.00  4.88  2     2     2.3227
2    5.60  4.64  2     2     2.1796
3    5.65  4.96  2     2     2.2343
4    5.76  4.80  2     2     2.2456
5    5.96  5.08  2     2     2.3391
6    5.72  5.04  2     2     2.2674
7    5.64  4.96  2     2     2.2343
8    5.44  4.88  2     2     2.1682
9    5.04  4.44  2     2     1.9977
10   4.56  4.04  2     2     1.8106
11   5.48  4.20  2     2     2.0863
12   5.76  4.80  2     2     2.2456

For these data,

$$\bar{x}_1 = \begin{pmatrix} 6.4654 \\ 5.3236 \end{pmatrix}, \qquad \bar{x}_2 = \begin{pmatrix} 5.5500 \\ 4.7267 \end{pmatrix}, \qquad \bar{x} = \begin{pmatrix} 5.9878 \\ 5.0122 \end{pmatrix},$$

$$E = \begin{pmatrix} 2.6765 & 1.2942 \\ 1.2942 & 1.7545 \end{pmatrix}, \qquad B = \begin{pmatrix} 4.8097 & 3.1364 \\ 3.1364 & 2.0453 \end{pmatrix}.$$

The eigenvalue of $|B - \lambda E| = 0$ is $\lambda = 1.9187$, and the associated eigenvector is

$$l = E^{-1}(\bar{x}_1 - \bar{x}_2) = \begin{pmatrix} 0.2759 \\ 0.1367 \end{pmatrix}.$$

Example: Insect Classification

The discriminant function is

$$u(x_1, x_2) = 0.2759\,x_1 + 0.1367\,x_2,$$

and the value of $u$ for each observation is the column y in Table 2.1. The cutting point is

$$\bar{c} = \frac{1}{2}\,l'(\bar{x}_1 + \bar{x}_2) = 2.3447.$$
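These numbers can be checked in a few lines of Python from the quoted means and E; differences in the last digit are rounding.

import numpy as np
xbar1 = np.array([6.4654, 5.3236])
xbar2 = np.array([5.5500, 4.7267])
E = np.array([[2.6765, 1.2942],
              [1.2942, 1.7545]])
l = np.linalg.solve(E, xbar1 - xbar2)  # eigenvector E^{-1}(xbar1 - xbar2)
print(l)                               # approx [0.2759, 0.1367]
cut = 0.5 * l @ (xbar1 + xbar2)        # cutting point, approx 2.344
x_new = np.array([6.1, 5.2])           # a hypothetical new insect
print("G1" if l @ x_new >= cut else "G2")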

The classification results are:

            Classified as G1   Classified as G2
True G1           10                  1
True G2            0                 12

If we use $\hat{c}_1$ and $\hat{c}_2$ instead (with the corresponding values 2.3831, 0.0939, and 0.1497), we obtain the same classification.

8.4 Bayes' Discriminant Analysis

A. Idea

There are k populations $G_1, \ldots, G_k$ in $R^p$. A partition of $R^p$ into regions $R_1, \ldots, R_k$ is determined based on a training sample.

Rule: assign $x$ to $G_i$ if $x$ falls into $R_i$.

Loss: $c(j|i)$ is the cost incurred when $x$ is from $G_i$ but falls into $R_j$.

The probability of this misclassification is

$$P(j|i) = \int_{R_j} p_i(x)\,dx,$$

where $p_i(x)$ is the density of $G_i$.

The expected cost of misclassification is

$$\mathrm{ECM}(R_1, \ldots, R_k) = \sum_{i=1}^{k} q_i \sum_{j \ne i} c(j|i)\,P(j|i),$$

where $q_1, \ldots, q_k$ are prior probabilities. We want to minimize $\mathrm{ECM}(R_1, \ldots, R_k)$ with respect to $R_1, \ldots, R_k$.

Theorem 6.4.1

Let

$$h_t(x) = \sum_{i \ne t} q_i\,p_i(x)\,c(t|i), \qquad t = 1, \ldots, k.$$

Then the optimal regions are

$$R_t = \{x : h_t(x) \le h_j(x),\ j = 1, \ldots, k,\ j \ne t\}.$$

B. Method

Take $c(j|i) = 1$ if $j \ne i$ and $c(j|i) = 0$ if $j = i$. Then

$$R_t = \{x : q_t\,p_t(x) \ge q_j\,p_j(x),\ j = 1, \ldots, k,\ j \ne t\}.$$
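A minimal sketch of this maximum-of-$q_t p_t(x)$ rule in Python; the three normal populations and their priors are hypothetical.

import numpy as np
from scipy.stats import multivariate_normal
# Hypothetical populations G1, G2, G3 with prior probabilities q_t.
populations = [(0.5, multivariate_normal([0, 0], np.eye(2))),
               (0.3, multivariate_normal([3, 1], np.eye(2))),
               (0.2, multivariate_normal([1, 4], np.eye(2)))]
def classify(x):
    # Assign x to the G_t maximizing q_t * p_t(x) (equal-cost Bayes rule).
    scores = [q * dist.pdf(x) for q, dist in populations]
    return int(np.argmax(scores)) + 1  # groups numbered from 1
print(classify([0.4, 0.1]))  # -> 1
print(classify([2.5, 1.2]))  # -> 2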

Proof: With this 0-1 cost,

$$h_t(x) = \sum_{i \ne t} q_i\,p_i(x) = \sum_{i=1}^{k} q_i\,p_i(x) - q_t\,p_t(x),$$

so minimizing $h_t(x)$ is equivalent to maximizing $q_t\,p_t(x)$, which gives the stated regions.

Corollary 1

In the case of k = 2,

$$h_1(x) = q_2\,p_2(x)\,c(1|2), \qquad h_2(x) = q_1\,p_1(x)\,c(2|1),$$

and we have

$$R_1 = \{x : q_1\,p_1(x)\,c(2|1) \ge q_2\,p_2(x)\,c(1|2)\},$$
$$R_2 = \{x : q_2\,p_2(x)\,c(1|2) > q_1\,p_1(x)\,c(2|1)\}.$$

Corollary 2

In the case of k = 2, the discriminant function is

$$u(x) = \frac{p_1(x)}{p_2(x)}.$$

Rule: assign $x$ to $G_1$ if $u(x) \ge d$; assign $x$ to $G_2$ if $u(x) < d$, where

$$d = \frac{q_2\,c(1|2)}{q_1\,c(2|1)}.$$

Corollary 3

In the case of k = 2 and

$$x \sim N_p(\mu_1, \Sigma)\ \text{if}\ x \in G_1, \qquad x \sim N_p(\mu_2, \Sigma)\ \text{if}\ x \in G_2,$$

we have

$$u(x) = \frac{p_1(x)}{p_2(x)} = \exp\{w(x)\}, \qquad \text{where } w(x) = \left(x - \frac{\mu_1 + \mu_2}{2}\right)'\Sigma^{-1}(\mu_1 - \mu_2).$$

Rule: assign $x$ to $G_1$ if $w(x) \ge \ln d$; assign $x$ to $G_2$ if $w(x) < \ln d$.
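A numeric sketch of Corollary 3 with hypothetical parameters, verifying that the density ratio $u(x)$ equals $\exp\{w(x)\}$ and applying the rule.

import numpy as np
from scipy.stats import multivariate_normal
# Hypothetical parameters; Corollary 3 requires a common covariance.
mu1, mu2 = np.array([1.0, 0.0]), np.array([-1.0, 0.5])
Sigma = np.array([[1.0, 0.3], [0.3, 1.0]])
q1, q2 = 0.6, 0.4
c12 = c21 = 1.0                      # equal misclassification costs
d = (q2 * c12) / (q1 * c21)
def w(x):
    # w(x) = (x - (mu1 + mu2)/2)' Sigma^{-1} (mu1 - mu2)
    return (x - (mu1 + mu2) / 2) @ np.linalg.solve(Sigma, mu1 - mu2)
x = np.array([0.2, 0.1])
ratio = multivariate_normal(mu1, Sigma).pdf(x) / multivariate_normal(mu2, Sigma).pdf(x)
assert np.isclose(np.log(ratio), w(x))  # u(x) = exp{w(x)}
print("G1" if w(x) >= np.log(d) else "G2")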

C. Example 11.3: Detection of hemophilia A carriers

To construct a procedure for detecting potential hemophilia A carriers, blood samples were assayed for two groups of women, with measurements taken on two variables. The first group of 30 women was selected from a population of women who did not carry the hemophilia gene; this group was called the normal group. The second group of 45 women was selected from known hemophilia A carriers; this group was called the obligatory carriers.

Variables: log10(AHF activity) and log10(AHF-like antigen)

Populations: women who do not carry the hemophilia gene (n1 = 30), and women who are known hemophilia A carriers (n2 = 45)

C. Example 11.3: Detection of hemophilia A carriers

Data set

Normal group, log10(AHF activity) (30 values):
-0.0056 -0.1698 -0.3469 -0.0894 -0.1679 -0.0836 -0.1979 -0.0762 -0.1913 -0.1092
-0.5268 -0.0842 -0.0225  0.0084 -0.1827  0.1237 -0.4702 -0.1519  0.0006 -0.2015
-0.1932  0.1507 -0.1259 -0.1551 -0.1952  0.0291 -0.2280 -0.0997 -0.1972 -0.0867

Normal group, log10(AHF-like antigen) (30 values):
-0.1657 -0.1585 -0.1879  0.0064  0.0713  0.0106 -0.0005  0.0392 -0.2123 -0.1190
-0.4773  0.0248 -0.0580  0.0782 -0.1138  0.2140 -0.3099 -0.0686 -0.1153 -0.0498
-0.2293  0.0933 -0.0669 -0.1232 -0.1007  0.0442 -0.1710 -0.0733 -0.0607 -0.0560

Obligatory carriers, log10(AHF activity) (45 values):
-0.3478 -0.3618 -0.4986 -0.5015 -0.1326 -0.6911 -0.3608 -0.4535 -0.3479 -0.3539
-0.4719 -0.3610 -0.3226 -0.4319 -0.2734 -0.5573 -0.3755 -0.4950 -0.5107 -0.1652
-0.2447 -0.4232 -0.2375 -0.2205 -0.2154 -0.3447 -0.2540 -0.3778 -0.4046 -0.0639
-0.3351 -0.0149 -0.0312 -0.1740 -0.1416 -0.1508 -0.0964 -0.2642 -0.0234 -0.3352
-0.1878 -0.1744 -0.4055 -0.2444 -0.4784

Obligatory carriers, log10(AHF-like antigen) (45 values):
 0.1151 -0.2008 -0.0860 -0.2984  0.0097 -0.3390  0.1237 -0.1682 -0.1721  0.0722
-0.1079 -0.0399  0.1670 -0.0687 -0.0020  0.0548 -0.1865 -0.0153 -0.2483  0.2132
-0.0407 -0.0998  0.2876  0.0046 -0.0219  0.0097 -0.0573 -0.2682 -0.1162  0.1569
-0.1368  0.1539  0.1400 -0.0776  0.1642  0.1137  0.0531  0.0867  0.0804  0.0875
 0.2510  0.1892 -0.2418  0.1614  0.0282

C. Example 11.3: Detection of hemophilia A carriers

SAS output

[The SAS output and accompanying figures (scatter plot of the two groups and the classification summary) are not reproduced in this transcript.]
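Since the SAS output itself did not survive, here is a hedged sketch of the same analysis in Python; scikit-learn's LinearDiscriminantAnalysis stands in for SAS PROC DISCRIM, and priors proportional to the sample sizes are an assumption.

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import confusion_matrix
# Data from the table above (log10 AHF activity, log10 AHF-like antigen).
normal_act = """-0.0056 -0.1698 -0.3469 -0.0894 -0.1679 -0.0836 -0.1979 -0.0762
-0.1913 -0.1092 -0.5268 -0.0842 -0.0225 0.0084 -0.1827 0.1237 -0.4702 -0.1519
0.0006 -0.2015 -0.1932 0.1507 -0.1259 -0.1551 -0.1952 0.0291 -0.2280 -0.0997
-0.1972 -0.0867"""
normal_ant = """-0.1657 -0.1585 -0.1879 0.0064 0.0713 0.0106 -0.0005 0.0392
-0.2123 -0.1190 -0.4773 0.0248 -0.0580 0.0782 -0.1138 0.2140 -0.3099 -0.0686
-0.1153 -0.0498 -0.2293 0.0933 -0.0669 -0.1232 -0.1007 0.0442 -0.1710 -0.0733
-0.0607 -0.0560"""
carrier_act = """-0.3478 -0.3618 -0.4986 -0.5015 -0.1326 -0.6911 -0.3608 -0.4535
-0.3479 -0.3539 -0.4719 -0.3610 -0.3226 -0.4319 -0.2734 -0.5573 -0.3755 -0.4950
-0.5107 -0.1652 -0.2447 -0.4232 -0.2375 -0.2205 -0.2154 -0.3447 -0.2540 -0.3778
-0.4046 -0.0639 -0.3351 -0.0149 -0.0312 -0.1740 -0.1416 -0.1508 -0.0964 -0.2642
-0.0234 -0.3352 -0.1878 -0.1744 -0.4055 -0.2444 -0.4784"""
carrier_ant = """0.1151 -0.2008 -0.0860 -0.2984 0.0097 -0.3390 0.1237 -0.1682
-0.1721 0.0722 -0.1079 -0.0399 0.1670 -0.0687 -0.0020 0.0548 -0.1865 -0.0153
-0.2483 0.2132 -0.0407 -0.0998 0.2876 0.0046 -0.0219 0.0097 -0.0573 -0.2682
-0.1162 0.1569 -0.1368 0.1539 0.1400 -0.0776 0.1642 0.1137 0.0531 0.0867
0.0804 0.0875 0.2510 0.1892 -0.2418 0.1614 0.0282"""
parse = lambda s: np.array(s.split(), dtype=float)
X1 = np.column_stack([parse(normal_act), parse(normal_ant)])    # 30 normals
X2 = np.column_stack([parse(carrier_act), parse(carrier_ant)])  # 45 carriers
X, y = np.vstack([X1, X2]), np.array([0] * 30 + [1] * 45)
# Priors proportional to sample sizes (30/75, 45/75) -- an assumption here.
lda = LinearDiscriminantAnalysis(priors=[30 / 75, 45 / 75]).fit(X, y)
print(confusion_matrix(y, lda.predict(X)))  # apparent (in-sample) error rates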