SL&DM © Hastie & Tibshirani, March 26, 2002. Supervised Learning: 28
Classification of microarray samples
Example: small round blue cell tumors; Khan et al., Nature Medicine, 2001
• Tumors classified as BL (Burkitt lymphoma), EWS (Ewing), NB (neuroblastoma) and RMS (rhabdomyosarcoma).
• There are 63 training samples and 25 test samples, although five of the latter were not SRBCTs. There are 2308 genes.
• Khan et al. report zero training and test errors, using a complex neural network model. They decided that 96 genes were "important".
• Upon close examination, the network is linear: it is essentially extracting linear principal components and classifying in their subspace.
• But even principal components is unnecessarily complicated for this problem!
Khan data
[Figure: the Khan data, samples grouped by class: BL, EWS, NB, RMS]
Class centroids
[Figure: four panels (BL, EWS, NB, RMS) plotting Average Expression against Gene for the 2308 genes, plus a fifth panel showing the average expression profile of a test sample]
Nearest Shrunken Centroids
Idea: shrink each class centroid towards
the overall centroid. First normalize by
the within-class standard deviation for
each gene.
Details
• Let $x_{ij}$ be the expression for genes $i = 1, 2, \ldots, p$ and samples $j = 1, 2, \ldots, n$.
• We have classes $1, 2, \ldots, K$; let $C_k$ be the indices of the $n_k$ samples in class $k$.
• The $i$th component of the centroid for class $k$ is $\bar{x}_{ik} = \sum_{j \in C_k} x_{ij}/n_k$, the mean expression value in class $k$ for gene $i$; the $i$th component of the overall centroid is $\bar{x}_i = \sum_{j=1}^{n} x_{ij}/n$.
• Let
$$d_{ik} = (\bar{x}_{ik} - \bar{x}_i)/s_i$$
where $s_i$ is the pooled within-class standard deviation for gene $i$:
$$s_i^2 = \frac{1}{n-K} \sum_k \sum_{j \in C_k} (x_{ij} - \bar{x}_{ik})^2.$$
• Shrink each $d_{ik}$ towards zero, giving $d'_{ik}$, and form new shrunken centroids or prototypes
$$\bar{x}'_{ik} = \bar{x}_i + s_i d'_{ik}.$$
• The shrinkage is by soft-thresholding:
$$d'_{ik} = \mathrm{sign}(d_{ik})\,(|d_{ik}| - \Delta)_+$$
• Choose $\Delta$ by cross-validation.
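The computation on this slide is short enough to spell out. Below is a minimal NumPy sketch (function and variable names are my own); note the published version of this method also adds a small constant $s_0$ to each $s_i$, which is omitted here since the slide omits it:

```python
import numpy as np

def shrunken_prototypes(X, y, delta):
    """Nearest-shrunken-centroid prototypes, following the formulas above.

    X: (n, p) matrix, rows = samples j, columns = genes i.
    y: length-n integer class labels.
    delta: soft-threshold amount (Delta).
    """
    n, p = X.shape
    classes = np.unique(y)
    K = len(classes)
    overall = X.mean(axis=0)                       # overall centroid xbar_i

    centroids = np.zeros((K, p))
    ss = np.zeros(p)
    for k, c in enumerate(classes):
        Xk = X[y == c]
        centroids[k] = Xk.mean(axis=0)             # class centroid xbar_ik
        ss += ((Xk - centroids[k]) ** 2).sum(axis=0)
    s = np.sqrt(ss / (n - K))                      # pooled within-class sd s_i
    s = np.where(s > 0, s, 1e-12)                  # guard zero-variance genes

    d = (centroids - overall) / s                  # d_ik
    d_shrunk = np.sign(d) * np.maximum(np.abs(d) - delta, 0.0)  # soft-threshold
    return overall + s * d_shrunk                  # prototypes xbar'_ik
```

With `delta = 0` this returns the ordinary class centroids; as `delta` grows, genes whose $|d_{ik}|$ falls below $\Delta$ contribute nothing for that class, which is how the method selects genes.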
K-Fold Cross-Validation
Primary method for estimating a tuning parameter $\Delta$.
Divide the data into K roughly equal parts, e.g. K = 5:

  1      2      3      4      5
  Train  Train  Test   Train  Train
• For each $k = 1, 2, \ldots, K$, fit the model with parameter $\Delta$ to the other $K-1$ parts, and compute its error in predicting the $k$th part. Average this error over the $K$ parts to give the estimate $CV(\Delta)$.
• Do this for many values of $\Delta$. Draw the curve $CV(\Delta)$ and choose the value of $\Delta$ that makes $CV(\Delta)$ smallest.
Typically we use K = 5 or 10.
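The two bullets above can be sketched generically; `fit_predict` is a hypothetical callback (not from the slides) standing in for training the classifier at a given $\Delta$:

```python
import numpy as np

def cv_curve(X, y, deltas, fit_predict, K=5, seed=0):
    """Estimate CV(Delta) over a grid of candidate Delta values.

    fit_predict(X_train, y_train, X_test, delta) -> predicted labels
    (hypothetical callback; plug in the shrunken-centroid fit here).
    """
    n = len(y)
    # assign samples to K roughly equal parts at random
    folds = np.random.default_rng(seed).permutation(n) % K
    cv = []
    for delta in deltas:
        fold_errs = []
        for k in range(K):
            test = folds == k
            pred = fit_predict(X[~test], y[~test], X[test], delta)
            fold_errs.append(np.mean(pred != y[test]))
        cv.append(np.mean(fold_errs))   # average error over the K parts
    return np.array(cv)
```

Choosing the minimizer is then `best_delta = deltas[np.argmin(cv_curve(X, y, deltas, fit_predict))]`.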
Results
[Figure: training (tr), cross-validation (cv) and test (te) error versus amount of shrinkage $\Delta$ (0 to 6); the top axis gives the number of genes remaining, from 2308 at $\Delta = 0$ down to 1]
Advantages
• Simple; includes the nearest centroid classifier as a special case.
• Thresholding denoises large effects and sets small ones to zero, thereby selecting genes.
• With more than two classes, the method can select different genes, and different numbers of genes, for each class.
The genes that matter
[Figure: the selected genes, identified by clone ID (295985, 866702, ..., 241412), shown across the four classes BL, EWS, NB, RMS]
Estimated Class Probabilities
[Figure: estimated class probability for each sample, grouped by class (BL, EWS, NB, RMS). Top panel: training data; bottom panel: test data, with the five non-SRBCT samples marked O]
Class probabilities
• For a test sample $x^* = (x^*_1, x^*_2, \ldots, x^*_p)$, we define the discriminant score for class $k$:
$$\delta_k(x^*) = \sum_{i=1}^{p} \frac{(x^*_i - \bar{x}'_{ik})^2}{s_i^2} - 2 \log \pi_k$$
• The classification rule is then
$$C(x^*) = \ell \;\text{ if }\; \delta_\ell(x^*) = \min_k \delta_k(x^*)$$
• Estimates of the class probabilities, by analogy to Gaussian linear discriminant analysis, are
$$\hat{p}_k(x^*) = \frac{e^{-\frac{1}{2}\delta_k(x^*)}}{\sum_{\ell=1}^{K} e^{-\frac{1}{2}\delta_\ell(x^*)}}$$
• Still very simple. In statistical parlance, this is a restricted version of a naive Bayes classifier (also called idiot's Bayes!).
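A sketch of the scoring and probability formulas, assuming the prototypes $\bar{x}'_{ik}$, pooled standard deviations $s_i$ and priors $\pi_k$ have already been computed (function names are mine):

```python
import numpy as np

def discriminant_scores(x_star, prototypes, s, priors):
    """delta_k(x*) = sum_i (x*_i - xbar'_ik)^2 / s_i^2 - 2 log(pi_k).

    prototypes: (K, p) shrunken centroids; s: (p,) pooled sds;
    priors: (K,) class priors pi_k. Returns the K scores.
    """
    return ((x_star - prototypes) ** 2 / s ** 2).sum(axis=1) - 2 * np.log(priors)

def class_probabilities(scores):
    """Softmax of -delta_k / 2, as in Gaussian LDA."""
    w = np.exp(-0.5 * (scores - scores.min()))   # shift for numerical stability
    return w / w.sum()
```

The predicted class is `np.argmin(scores)`, equivalently `np.argmax(class_probabilities(scores))`.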
Adaptive threshold scaling
• Idea: define class-dependent scaling factors $\theta_k$ for each class:
$$d_{ik} = \frac{\bar{x}_{ik} - \bar{x}_i}{m_k \theta_k s_i} \qquad (1)$$
• Use smaller factors for hard-to-classify classes $\Rightarrow$ same test error with a smaller total number of genes.
• Adaptive procedure: start with all $\theta_k = 1$, and then reduce $\theta_k$ by 10% for the class $k$ with the largest area under the training error curve.
• Repeat 20 times and choose the solution with the smallest total area under the curves for all classes.
• This can dramatically reduce the total number of genes used, without increasing the error rate.
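The adaptive loop above can be sketched as follows; `area_under_training_error` is a hypothetical callback (the slides leave this refitting step implicit) that, for given factors $\theta$, refits over a grid of $\Delta$ values and returns the per-class areas under the training error curves:

```python
import numpy as np

def adapt_scaling(area_under_training_error, K, steps=20):
    """Adaptive threshold scaling, as sketched on the slide.

    area_under_training_error(theta) -> length-K array of per-class
    areas under the training error curve (hypothetical callback).
    """
    theta = np.ones(K)                              # start with all theta_k = 1
    best_theta, best_total = theta.copy(), np.inf
    for _ in range(steps):
        areas = area_under_training_error(theta)
        if areas.sum() < best_total:                # keep smallest total area
            best_total, best_theta = areas.sum(), theta.copy()
        theta = theta.copy()
        theta[np.argmax(areas)] *= 0.9              # cut worst class by 10%
    return best_theta
```

The returned factors are then plugged back into equation (1) for the final fit.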
Lymphoma data
Scaling factors changed from $(1, 1, 1)$ to $(1.9, 1, 1.5)$.
[Figure: training (tr) and test (te) error versus amount of shrinkage $\Delta$ for the lymphoma data, before (left) and after (right) adaptive scaling; the top axes give the number of genes remaining, from 4026 at $\Delta = 0$]