Multiple Classifier Systems under attack

Battista Biggio, Giorgio Fumera, Fabio Roli

Dept. of Electrical and Electronic Eng., Univ. of Cagliari

http://prag.diee.unica.it

9th International Workshop on Multiple Classifier Systems

2

Outline

● Adversarial classification

● MCSs in adversarial classification tasks

● Some experimental results

3

Adversarial classification

I am John Smith

Subject: MCS2010 Suggested tours

Dear MCS 2010 Participant, Attached please find the offers we negotiated with the travel agency...

legitimate

Subject: Need affordable Drugs??

Order from Canadian Pharmacy & Save You Money We are having Specials Hot Promotion this week!...

spam

genuine

I am Bob Brown

impostor

J. Smith B. Brown

Template database

Biometric verification

Spam filtering

Two pattern classes: legitimate, malicious

Examples:
● Biometric verification and recognition
● Intrusion detection in computer networks
● Spam filtering
● Network traffic identification
● ...

4


Subject: Need affordab1e D r u g s??

Order from (anadian Ph@rmacy & S@ve You Money We are having Specials H0t Promotion this week!...
"Don't you guys ever read a paper? Moyer's a gentleman now. He knows t
"Well I'm sure I can't help what you think," she said tartly. "After a

spam

Subject: Need affordable Drugs??

Order from Canadian Pharmacy & Save You Money We are having Specials Hot Promotion this week!...

spam

I am Bob Brown

Attack:
● Bad word obfuscation
● Good word insertion

Template database

J. Smith B. Brown

B. Brown

impostor

Attack: fingerprint spoofing

5


Main issues:
● vulnerabilities of pattern recognition systems
● performance evaluation under attack
● design of pattern recognition systems robust to attacks

6

Multiple classifier systems in adversarial environments

J. Smith B. Brown

I am Bob Brown

impostor

Accepted/Rejected

Fusion rule

Multimodal biometric systems: more accurate than unimodal ones

7

Multiple classifier systems in adversarial environments

J. Smith B. Brown

I am Bob Brown

impostor

Fusion rule

Multimodal biometric systems: more accurate than unimodal ones

And also more robust to attacks (?)

Analogous claims in other applications (spam filtering, network intrusion detection, etc.)

Accepted/Rejected

8

Aim of our work

Main issues in adversarial classification:
● vulnerabilities of pattern recognition systems
● performance evaluation under attack
● design of pattern recognition systems robust to attacks

Our goal: to investigate whether and how MCSs can improve the robustness of pattern recognition (PR) systems under attack

9

Linear classifiers under attack

Buy vi4gr4!

Did you ever play that game when you were a kid where the little plastic hippo tries to gobble up all your marbles?

x’ = [ 1 0 0 0 1 0 0 1 …]

Buy viagra!

x = [ 1 0 1 0 0 0 0 0 …]

The adversary exploits some knowledge of
● the features
● the classifier's decision function

An example: spam filtering, linear classifiers

$f(x) = \mathrm{sign}\{\omega_1 x_1 + \omega_2 x_2 + \dots + \omega_N x_N + \omega_0\}$

$x_i \in \{0,1\}$; $f(x) = +1$: spam; $f(x) = -1$: legitimate

10


The adversary exploits some knowledge of
● the features
● the classifier's decision function

Weights ω: buy = 0.5, viagra = 2.0, kid = -0.5, game = -2.0; $\omega_0 = -0.9$

$f(x) = \mathrm{sign}\{\omega_1 x_1 + \omega_2 x_2 + \dots + \omega_N x_N + \omega_0\}$

Buy viagra!: 0.5 + 2.0 - 0.9 = 1.6 > 0: spam

Buy vi4gr4!: 0.5 - 0.9 = -0.4 < 0: legitimate

Buy viagra! game: 0.5 + 2.0 - 2.0 - 0.9 = -0.4 < 0: legitimate
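The scoring on this slide can be reproduced with a few lines of Python. This is only a sketch: the weights and the bias are the ones shown on the slide, while the dictionary layout, the function names `score` and `classify`, the crude tokenizer, and the choice to treat a score of exactly 0 as legitimate are illustrative assumptions.

```python
# Weights and bias as shown on the slide; everything else is an assumption.
WEIGHTS = {"buy": 0.5, "viagra": 2.0, "kid": -0.5, "game": -2.0}
BIAS = -0.9  # omega_0


def score(message):
    """Return omega_1*x_1 + ... + omega_N*x_N + omega_0 for binary word-occurrence features."""
    # Crude tokenizer (assumption): lowercase, drop '!', split on whitespace.
    words = set(message.lower().replace("!", " ").split())
    return sum(w for word, w in WEIGHTS.items() if word in words) + BIAS


def classify(message):
    """f(x) = +1 (spam) if the score is positive, otherwise -1 (legitimate)."""
    return "spam" if score(message) > 0 else "legitimate"


for text in ["Buy viagra!", "Buy vi4gr4!", "Buy viagra! game"]:
    print(f"{text!r}: score = {score(text):+.1f} -> {classify(text)}")
```

The obfuscated token "vi4gr4" simply fails to match the "viagra" feature, and the added word "game" contributes its negative weight; either change alone is enough to flip the decision with these weights.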

11


Possible strategy to improve the robustness of linear classifiers: keep the weights as uniform as possible (Kolcz and Teo, 6th Conf. on Email and Anti-Spam, CEAS 2009)

Weights ω: buy = 1.0, viagra = 1.5, kid = -1.0, game = -1.5; $\omega_0 = -0.9$

$f(x) = \mathrm{sign}\{\omega_1 x_1 + \omega_2 x_2 + \dots + \omega_N x_N + \omega_0\}$

Buy viagra!: 1.0 + 1.5 - 0.9 = 1.6 > 0: spam

Buy vi4gr4!: 1.0 - 0.9 = 0.1 > 0: spam

Buy viagra! game: 1.0 + 1.5 - 1.5 - 0.9 = 0.1 > 0: spam

Buy viagra! kid game: 1.0 + 1.5 - 1.0 - 1.5 - 0.9 = -0.9 < 0: legitimate
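One way to make the effect of more uniform weights concrete is to count how many word changes the adversary needs before "Buy viagra!" is misclassified. The sketch below assumes a greedy adversary who knows the weights and, at each step, either obfuscates a present word with positive weight or inserts an absent word with negative weight, whichever lowers the score most. Both weight sets are taken from the slides; the names `PEAKED`, `UNIFORM` and `min_changes_to_evade` are illustrative.

```python
# Weight sets from the two slides; names are illustrative assumptions.
PEAKED = {"buy": 0.5, "viagra": 2.0, "kid": -0.5, "game": -2.0}
UNIFORM = {"buy": 1.0, "viagra": 1.5, "kid": -1.0, "game": -1.5}
BIAS = -0.9  # omega_0, the same on both slides


def min_changes_to_evade(weights, bias, present):
    """Number of word changes a greedy adversary needs to push the score to <= 0.

    Returns None if the message cannot be made to evade the classifier.
    """
    current = sum(weights[w] for w in present) + bias
    # Score reduction offered by each possible single change:
    # obfuscating a present word with positive weight ...
    gains = [weights[w] for w in present if weights[w] > 0]
    # ... or inserting an absent word with negative weight.
    gains += [-weights[w] for w in weights if w not in present and weights[w] < 0]
    changes = 0
    for gain in sorted(gains, reverse=True):
        if current <= 0:
            break
        current -= gain
        changes += 1
    return changes if current <= 0 else None


message = {"buy", "viagra"}
print("less uniform weights:", min_changes_to_evade(PEAKED, BIAS, message))
print("more uniform weights:", min_changes_to_evade(UNIFORM, BIAS, message))
```

Because each change lowers the score of a linear classifier by a fixed amount, taking the largest reductions first gives the smallest possible number of changes under these assumptions.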

12

Ensembles of linear classifiers under attack

Do randomisation-based MCS techniques result in more uniform weights of linear base classifiers?
● bagging
● random subspace method
● ...

(accuracy-robustness trade-off)

13

Experimental setting (1)

● Spam filtering task
● TREC 2007 data set (20,000 out of > 75,000 e-mails, 2/3 spam)
● Features: bag of words (word occurrence), > 360,000 features
● Base linear classifiers: SVM, Logistic Regression
● MCS
    ● ensemble size: 3, 5, 10
    ● bagging: 20%, 100% training samples
    ● RSM: 20%, 50%, 80% feature subset sizes
● 5 runs
● Evaluation of performance under attack: worst-case BWO/GWI (bad word obfuscation / good word insertion) attack, for m obfuscated/added words (m = “attack strength”)
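The two randomisation schemes in this setting can be sketched as follows. Several things here are assumptions rather than the authors' setup: the base learner is a simple mean-difference linear rule standing in for SVM or logistic regression, the base classifiers are fused by averaging their weight vectors (the slides do not state the fusion rule), and the toy data and all function names are hypothetical.

```python
import random


def train_mean_difference(X, y, features):
    """Stand-in linear learner (assumption): w_j = P(x_j=1 | spam) - P(x_j=1 | legit).

    Only the listed features get a weight; all others stay at 0.
    """
    weights = [0.0] * len(X[0])
    spam = [x for x, label in zip(X, y) if label == +1]
    legit = [x for x, label in zip(X, y) if label == -1]
    for j in features:
        p_spam = sum(x[j] for x in spam) / len(spam) if spam else 0.0
        p_legit = sum(x[j] for x in legit) / len(legit) if legit else 0.0
        weights[j] = p_spam - p_legit
    return weights


def bagging_ensemble(X, y, n_classifiers, sample_frac, rng):
    """Train each base classifier on a bootstrap sample (drawn with replacement)."""
    n = len(X)
    size = max(1, round(sample_frac * n))
    members = []
    for _ in range(n_classifiers):
        idx = [rng.randrange(n) for _ in range(size)]
        members.append(train_mean_difference(
            [X[i] for i in idx], [y[i] for i in idx], range(len(X[0]))))
    return members


def rsm_ensemble(X, y, n_classifiers, feature_frac, rng):
    """Random subspace method: each base classifier sees a random feature subset."""
    n_features = len(X[0])
    k = max(1, round(feature_frac * n_features))
    return [train_mean_difference(X, y, rng.sample(range(n_features), k))
            for _ in range(n_classifiers)]


def average_weights(members):
    """With averaging fusion (assumption), the ensemble is again a linear classifier."""
    return [sum(column) / len(members) for column in zip(*members)]


# Hypothetical toy data: 4 binary features, +1 = spam, -1 = legitimate.
X = [[1, 1, 0, 0]] * 3 + [[0, 0, 1, 1]] * 3
y = [+1, +1, +1, -1, -1, -1]
rng = random.Random(0)
rsm_members = rsm_ensemble(X, y, n_classifiers=5, feature_frac=0.5, rng=rng)
bag_members = bagging_ensemble(X, y, n_classifiers=3, sample_frac=1.0, rng=rng)
print(average_weights(rsm_members))
print(average_weights(bag_members))
```

Under the averaging assumption the ensemble's weight vector is just the mean of the base weight vectors, so its uniformity can be inspected in the same way as for a single linear classifier.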

14

Performance measure

Receiver Operating Characteristic (ROC) curve: TP versus FP, both ranging from 0 to 1

TP = Prob [f(X) = Malicious | Y = Malicious]

FP = Prob [f(X) = Malicious | Y = Legitimate]

AUC10%: area under the ROC curve for FP between 0 and 0.1
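A partial area of this kind can be computed from classifier scores as sketched below. The function names, the label encoding (1 = malicious, 0 = legitimate) and the trapezoidal integration are assumptions, and the slides do not say whether the area is normalised, so the sketch returns the raw area (at most 0.1 for an FP limit of 0.1).

```python
def roc_points(scores, labels):
    """ROC points (FP, TP) from sweeping a threshold over the scores.

    labels: 1 = malicious, 0 = legitimate; a higher score means "more malicious".
    Assumes both classes are present.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        j = i
        # Samples with identical scores cross the threshold together.
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            if ranked[j][1] == 1:
                tp += 1
            else:
                fp += 1
            j += 1
        points.append((fp / n_neg, tp / n_pos))
        i = j
    return points


def partial_auc(points, max_fp=0.1):
    """Trapezoidal area under the ROC curve for FP in [0, max_fp] (not normalised)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 >= max_fp:
            break
        if x1 > max_fp:
            # Cut the last segment at max_fp by linear interpolation.
            y1 = y0 + (y1 - y0) * (max_fp - x0) / (x1 - x0)
            x1 = max_fp
        area += (x1 - x0) * (y0 + y1) / 2
    return area
```

Restricting the area to low FP values focuses the measure on the operating region where few legitimate samples are misclassified as malicious.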

15

Measure of weights uniformity

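$F(K) = \dfrac{\sum_{i=1}^{K} |\omega_{(i)}|}{\sum_{i=1}^{N} |\omega_i|}$, $K = 0, \dots, N$: the sum of the top-K weights' absolute values divided by the sum of all weights' absolute values, with $|\omega_{(1)}| \ge \dots \ge |\omega_{(N)}|$

[Figure: F(K) versus K, with K from 0 to N and F(K) from 0 to 1; two curves, labelled "least uniform weights" and "most uniform weights"]

Kolcz and Teo, 6th Conf. on Email and Anti-Spam (CEAS 2009)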
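The measure is straightforward to compute. The following sketch assumes that the bias term $\omega_0$ is excluded and that at least one weight is nonzero; the function name `uniformity_curve` is illustrative.

```python
def uniformity_curve(weights):
    """Return [F(0), F(1), ..., F(N)].

    F(K) is the share of the total absolute weight held by the K weights
    with the largest absolute values. Assumes at least one nonzero weight.
    """
    magnitudes = sorted((abs(w) for w in weights), reverse=True)
    total = sum(magnitudes)
    curve = [0.0]
    running = 0.0
    for magnitude in magnitudes:
        running += magnitude
        curve.append(running / total)
    return curve


# The two weight sets from the linear-classifier example slides.
print(uniformity_curve([0.5, 2.0, -0.5, -2.0]))  # less uniform weights
print(uniformity_curve([1.0, 1.5, -1.0, -1.5]))  # more uniform weights
```

For perfectly uniform weights F(K) = K/N, a straight line; the more the total weight is concentrated in a few features, the faster the curve rises towards 1.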

16

Results (1)

[Figure: results plotted against the number of obfuscated/added words]

17

Experimental setting (2)

● SpamAssassin
● About N = 900 Boolean “tests”, $x_1, x_2, \dots, x_N$, $x_i \in \{0,1\}$
● Decision function: $f(x) = \mathrm{sign}\{\omega_1 x_1 + \omega_2 x_2 + \dots + \omega_N x_N + \omega_0\}$, $f(x) = +1$: spam; $f(x) = -1$: legitimate
● Default weights: machine learning + manual tuning
● Evaluation of performance under attack: evasion of the worst m tests (m = “attack strength”)
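An evaluation of this kind might look like the sketch below. It reads "evasion of the worst m tests" as: for each spam sample, the adversary switches off the m fired tests with the largest positive weights. That reading, the function names, and the idea of reporting the fraction of spam still detected are all assumptions; no real SpamAssassin weights are used.

```python
def score_after_evasion(x, weights, bias, m):
    """Score of a sample after the m fired tests with the largest positive weights are evaded."""
    clean_score = sum(w * xi for xi, w in zip(x, weights)) + bias
    fired_positive = sorted(
        (w for xi, w in zip(x, weights) if xi == 1 and w > 0), reverse=True)
    return clean_score - sum(fired_positive[:m])


def detection_rate(spam_samples, weights, bias, m):
    """Fraction of spam samples still classified as spam at attack strength m."""
    detected = sum(
        1 for x in spam_samples if score_after_evasion(x, weights, bias, m) > 0)
    return detected / len(spam_samples)


# Hypothetical toy values, for illustration only.
toy_weights = [3.0, 2.0, 1.0, -1.0]
toy_bias = -2.5
toy_spam = [[1, 1, 1, 0], [1, 0, 0, 0]]
for m in range(3):
    print(m, detection_rate(toy_spam, toy_weights, toy_bias, m))
```

Plotting the detection rate (or another performance measure) against m gives a robustness curve over the number of evaded tests.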

18

Results (2)

[Figure: results plotted against the number of evaded tests]

19

Conclusions

● Adversarial classification: which roles can MCSs play?

● This work:
    ● linear classifiers
    ● attacks based on some knowledge about features and decision function (case study: spam filtering)

● Future work: investigating MCSs on different applications, base classifiers, kinds of attacks, ...
