Multi-class SVM with Negative Data Selection for Web Page Classification

Multi-class SVM with Negative Data Selection for Web Page Classification Chih-Ming Chen, Hahn-Ming Lee and Ming-Tyan Kao International Joint Conference on Neural Networks 2004


Transcript of Multi-class SVM with Negative Data Selection for Web Page Classification

Page 1: Multi-class SVM with Negative Data Selection for Web Page Classification

Multi-class SVM with Negative Data Selection for Web Page Classification

Chih-Ming Chen, Hahn-Ming Lee and Ming-Tyan Kao

International Joint Conference on Neural Networks 2004

Page 2: Multi-class SVM with Negative Data Selection for Web Page Classification

Motivation

• Several new websites are launched every day

• Need to search fast and efficiently

• Search engines organize websites under topic hierarchy (taxonomy)

• Need a classifier: one-against-all SVM

• Catch: the huge amount of negative data increases training time
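To illustrate why one-against-all training produces so much negative data, here is a minimal sketch of how the per-class training sets are formed. The function name `one_against_all_splits` and the toy labels are hypothetical and are not taken from the slides; the sketch only builds the positive/negative index lists and does not train an SVM.

```python
def one_against_all_splits(labels):
    """For each class, return (positive_indices, negative_indices).

    In one-against-all training, the documents of one class are the
    positives and the documents of every other class are the negatives.
    """
    splits = {}
    for cls in sorted(set(labels)):
        positives = [i for i, y in enumerate(labels) if y == cls]
        negatives = [i for i, y in enumerate(labels) if y != cls]
        splits[cls] = (positives, negatives)
    return splits


# Hypothetical toy labels, one per document (not data from the slides).
toy_labels = ["crude", "trade", "acq", "acq", "trade", "acq"]
splits = one_against_all_splits(toy_labels)

for cls, (pos, neg) in splits.items():
    print(cls, "positives:", len(pos), "negatives:", len(neg))
```

Every class sees the whole collection, so a small class is trained against almost all documents as negatives.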

Page 3: Multi-class SVM with Negative Data Selection for Web Page Classification

Negative Data Selection

Support vectors in the negative data are much more similar to the positive data than the other negative data are

Page 4: Multi-class SVM with Negative Data Selection for Web Page Classification

Negative Data Selection

1. Feature Selection: top n keywords from the positive data

2. All websites are represented as vectors of these top n keywords.

3. Cosine Similarity:

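Similarity between a positive document D_p and a negative document D_n, where k_{p,i} and k_{n,i} are the weights of keyword i in each document and m is the number of keywords:

$$\mathrm{Sim}(D_p, D_n) = \frac{\sum_{i=1}^{m} k_{p,i}\, k_{n,i}}{\sqrt{\sum_{i=1}^{m} k_{p,i}^{2}}\; \sqrt{\sum_{j=1}^{m} k_{n,j}^{2}}}$$

Document frequency of a term within a category:

$$DF_k = \frac{n(doc_{all}(term_k),\, cat_i)}{n(doc_{all},\, cat_i)}$$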
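The three steps above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the authors' implementation: documents are modelled as sets of terms, keywords are ranked by document frequency within the positive class, and keyword weights are binary (1 if the keyword occurs, 0 otherwise). The slides do not state the weighting scheme, and all function names here are hypothetical.

```python
import math


def document_frequency(term, docs):
    """Fraction of the documents in `docs` (each a set of terms) containing `term`."""
    return sum(1 for d in docs if term in d) / len(docs)


def top_keywords(positive_docs, n):
    """Step 1: pick the n terms with the highest document frequency in the positive data."""
    vocab = set().union(*positive_docs)
    # Sort by descending document frequency; break ties alphabetically.
    ranked = sorted(vocab, key=lambda t: (-document_frequency(t, positive_docs), t))
    return ranked[:n]


def to_vector(doc, keywords):
    """Step 2: represent a document over the selected keywords (binary weights, an assumption)."""
    return [1.0 if k in doc else 0.0 for k in keywords]


def cosine_similarity(u, v):
    """Step 3: cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy documents (not data from the slides).
positive_docs = [{"oil", "barrel", "price"}, {"oil", "opec", "price"}]
negative_doc = {"oil", "tanker"}

keywords = top_keywords(positive_docs, 2)
sim = cosine_similarity(to_vector(positive_docs[0], keywords),
                        to_vector(negative_doc, keywords))
print(keywords, round(sim, 4))
```

How the similarity of one negative document to the whole positive set is aggregated (for example, per pair or against a centroid) is not stated in the slides, so the sketch stops at a single pair.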

Page 5: Multi-class SVM with Negative Data Selection for Web Page Classification

Negative Data Selection

• Plot each negative document's similarity score to the positive documents, with the negative documents sorted in descending order of score

[Figure: similarity scores in descending order (y-axis) plotted against negative documents (x-axis), with the convergence point marked]
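The slide does not define how the convergence point is located, so the following is only one possible reading. It assumes that the curve has "converged" once the drop between consecutive sorted scores stays below a tolerance for the rest of the ranking. The function name `select_negatives` and the parameter `tol` are hypothetical.

```python
def select_negatives(scores, tol=0.01):
    """Return indices of negative documents ranked up to the assumed convergence point.

    `scores[i]` is the similarity of negative document i to the positive data.
    Assumption: the convergence point is where the flat tail of the sorted
    curve begins, i.e. every later consecutive drop is smaller than `tol`.
    """
    # Rank documents by descending similarity (stable for ties).
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranked = [scores[i] for i in order]

    # Walk back from the end while consecutive drops stay below tol.
    cut = len(ranked)
    while cut > 1 and ranked[cut - 2] - ranked[cut - 1] < tol:
        cut -= 1

    # Keep everything up to and including the first document of the flat tail.
    return order[:cut]


# Hypothetical similarity scores for six negative documents.
toy_scores = [0.10, 0.90, 0.11, 0.60, 0.105, 0.35]
kept = select_negatives(toy_scores)
print(kept)
```

Only the kept negatives would then be passed to the SVM together with the positive data; the rest of the flat tail is dropped.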

Page 6: Multi-class SVM with Negative Data Selection for Web Page Classification

Experiments

• Reuters dataset (10802 training, 565 test)

Class     Number of Positive Data   Number of Negative Data
Crude     580                       10222
Trade     475                       10327
Dlr       162                       10640
Nat-gas   92                        10710
Acq       2357                      8445
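As a quick check of the table, each negative count is the training-set size minus the positive count, which makes the class imbalance explicit. The short sketch below uses only the figures from the table; the variable names are illustrative.

```python
TRAINING_SIZE = 10802  # training documents, from the slide

# Positive counts per class, copied from the table.
positives = {"Crude": 580, "Trade": 475, "Dlr": 162, "Nat-gas": 92, "Acq": 2357}

# In one-against-all training, every other training document is a negative.
negatives = {cls: TRAINING_SIZE - pos for cls, pos in positives.items()}

for cls, pos in positives.items():
    neg = negatives[cls]
    print(f"{cls:8s} positives={pos:5d} negatives={neg:5d} ratio={neg / pos:.1f}")
```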

Page 7: Multi-class SVM with Negative Data Selection for Web Page Classification

Experiments

Page 8: Multi-class SVM with Negative Data Selection for Web Page Classification

Experiments