Multi-class SVM with Negative Data Selection for Web Page Classification
Chih-Ming Chen, Hahn-Ming Lee and Ming-Tyan Kao
International Joint Conference on Neural Networks 2004
Motivation
• Several new websites are launched every day
• Need to search fast and efficiently
• Search engines organize websites under topic hierarchy (taxonomy)
• Need a classifier: one-against-all SVM
• Catch: the huge amount of negative data increases training time
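To make the one-against-all setup concrete, here is a minimal sketch of how the labels for one binary classifier could be built. The function name `one_against_all_labels` and the toy category list are assumptions made for illustration and do not come from the slides.

```python
from collections import Counter


def one_against_all_labels(categories, target):
    """Return +1 for documents of the target category and -1 for all others."""
    return [1 if c == target else -1 for c in categories]


# Toy category labels (made up for illustration).
categories = ["crude", "trade", "acq", "acq", "dlr", "crude", "acq", "nat-gas"]

labels = one_against_all_labels(categories, "crude")
counts = Counter(labels)

print(labels)                  # [1, -1, -1, -1, -1, 1, -1, -1]
print(counts[1], counts[-1])   # 2 6
```

Every document outside the target category becomes a negative example, so the negative set grows with the number of categories while the positive set does not.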
Negative Data Selection
Support vectors in the negative data are much more similar to the positive data than the other negative data are
Negative Data Selection
1. Feature Selection: top n keywords from the positive data
2. All websites are represented as vectors of these top n keywords.
3. Cosine Similarity:
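Document frequency of term k in category i (used for feature selection in step 1):
$$DF_k = \frac{n(doc_{all}(term_k), cat_i)}{n(doc_{all}, cat_i)}$$
Cosine similarity between a positive document D_p and a negative document D_n, where p_k and n_k are the weights of keyword k in the two vectors and m is the number of selected keywords:
$$Sim(D_p, D_n) = \frac{\sum_{k=1}^{m} p_k \, n_k}{\sqrt{\sum_{i=1}^{m} p_i^2} \, \sqrt{\sum_{j=1}^{m} n_j^2}}$$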
Negative Data Selection
• Plot the similarity scores of the negative documents to the positive documents in descending order, with the negative documents along the horizontal axis
[Figure: similarity scores in descending order (vertical axis) plotted against negative documents (horizontal axis), with the convergence point marked]
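One possible way to turn the ranked curve into a selection rule is sketched below. The slides do not say how the convergence point is located, so the flattening heuristic here (the first position where several consecutive drops fall below a tolerance) is purely an assumption, as are the names `convergence_index` and `select_negatives` and the parameters `tol` and `window`. The sketch also assumes that the negatives ranked before the convergence point are the ones kept for training.

```python
def convergence_index(sorted_scores, tol=0.01, window=3):
    """Return the first index i such that the `window` consecutive drops
    starting at i are all below `tol`.

    Hypothetical heuristic. Returns len(sorted_scores) if the curve never flattens.
    """
    drops = [a - b for a, b in zip(sorted_scores, sorted_scores[1:])]
    for i in range(len(drops) - window + 1):
        if all(d < tol for d in drops[i:i + window]):
            return i
    return len(sorted_scores)


def select_negatives(scores, tol=0.01, window=3):
    """Keep the negative documents ranked before the convergence point.

    scores: dict mapping a negative document id to its similarity score.
    """
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    cut = convergence_index([s for _, s in ranked], tol, window)
    return [doc_id for doc_id, _ in ranked[:cut]]


# Toy similarity scores, made up for illustration.
scores = {"n1": 0.9, "n2": 0.7, "n3": 0.4, "n4": 0.2,
          "n5": 0.195, "n6": 0.19, "n7": 0.188}

print(select_negatives(scores))   # ['n1', 'n2', 'n3']
```

The values of `tol` and `window` control how flat the curve must be before it counts as converged, and would need tuning on real similarity scores.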
Experiments
• Reuters dataset (10802 training, 565 test)
Class      Number of Positive Data   Number of Negative Data
Crude      580                       10222
Trade      475                       10327
Dlr        162                       10640
Nat-gas    92                        10710
Acq        2357                      8445
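As a quick check on the table, the short script below verifies that each row sums to the 10802 training documents and computes the negative-to-positive ratio per class. The counts are copied from the table. The variable names are illustrative.

```python
TRAINING_SIZE = 10802

# (positive, negative) counts per class, as reported in the table.
class_counts = {
    "Crude": (580, 10222),
    "Trade": (475, 10327),
    "Dlr": (162, 10640),
    "Nat-gas": (92, 10710),
    "Acq": (2357, 8445),
}

ratios = {}
for name, (pos, neg) in class_counts.items():
    # Each one-against-all split should cover the whole training set.
    assert pos + neg == TRAINING_SIZE
    ratios[name] = neg / pos
    print(f"{name:8s} negatives per positive: {ratios[name]:.1f}")
```

Nat-gas is the most imbalanced class in the table, with more than 100 negative documents for every positive one, which is the situation negative data selection targets.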