Transcript posted on 29-Jun-2020

Garbage In, Garbage Out How purportedly great ML models can be

screwed up by bad data

Hillary Sanders, Data Scientist and operations team lead

What I’ll show...

1. Model accuracy claimed by security ML researchers is misleading

2. It’s generally biased in an overly optimistic direction

3. → Estimating the severity of that bias is important, and will help you make sure that your model isn’t… garbage.

ƒ(input) = output

ƒ(http://www.trustus.evil.ru/paypal/login/) = .944780
ƒ(https://www.facebook.com/) = .019367

Machine Learning

ƒ(input) = output:

http://www.trustus.evil.ru/paypal/login/ → .944780
http://gsbyntwqmem.mrjz5viern.ru/start_page.exe → .99981683
https://www.facebook.com/ → .019367
http://imgur.com/r/cats/omgn4Zv → .008448


(Supervised) Machine Learning: TRAINING

Training Data:
http://www..evil.ru/paypal/login/, 1
http://gsbynr.ru/start_page.exe, 1
https://www.facebook.com/, 0
http://imgur.com/r/cats/omgn4Zv, 0

Test Data:
dwprgbyfykriizylpqcltzx.biz/, 1
http://git.demo.nick.net.nz/, 0
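The TRAINING step above can be sketched end to end in a few lines. This is a toy stand-in, not the talk's model: the perceptron and its two hand-picked features (scaled URL length and digit count) are assumptions chosen only to make the fit-then-score loop concrete. With features this crude, the two test URLs happen to look identical to the model, so it can get at most one of them right, which is itself a small preview of the talk's point about data.

```python
# Toy sketch of the supervised train/test loop, using the slide's
# example URLs. The "model" is a tiny perceptron on two hand-picked
# features -- a deliberate stand-in for the real model.

def features(url):
    # crude, hypothetical features: scaled length, digit count, bias
    return [len(url) / 50.0, sum(c.isdigit() for c in url) / 5.0, 1.0]

train = [("http://www..evil.ru/paypal/login/", 1),
         ("http://gsbynr.ru/start_page.exe", 1),
         ("https://www.facebook.com/", 0),
         ("http://imgur.com/r/cats/omgn4Zv", 0)]
test = [("dwprgbyfykriizylpqcltzx.biz/", 1),
        ("http://git.demo.nick.net.nz/", 0)]

w = [0.0, 0.0, 0.0]
for _ in range(5000):                      # perceptron updates
    for url, y in train:
        x = features(url)
        pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) > 0 else 0
        w = [wi + (y - pred) * xi for wi, xi in zip(w, x)]

def f(url):
    return 1 if sum(wi * xi for wi, xi in zip(w, features(url))) > 0 else 0

train_acc = sum(f(u) == y for u, y in train) / len(train)
test_acc = sum(f(u) == y for u, y in test) / len(test)
print("train accuracy:", train_acc, "test accuracy:", test_acc)
```

Both test URLs have the same length and no digits, so under these toy features they are indistinguishable and one of the two is always misclassified.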

(Supervised) Machine Learning: TRAINING

“fitted” model ƒ = ƒ(input)

Training Data → Training accuracy
Test Data → Test accuracy ← supposed to represent “real world” accuracy

(Supervised) Machine Learning: TESTING

“fitted” model ƒ

ƒ(input) on the held-out Test Data:
dwprgbyfykriizylpqcltzx.biz/, 1 → .99 (error ≈ .01)
http://git.demo.nick.net.nz/, 0 → .01 (error ≈ .01)

Test Accuracy
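The numbers on the TESTING slide can be reproduced directly: per-URL error is the absolute difference between the model's score and the true label, and thresholding the score at 0.5 yields test accuracy. The scores below are the slide's illustrative values, not real model output.

```python
# Per-example error and thresholded accuracy for the slide's test set.
# (url, true label, model score) -- scores are illustrative.
test_scores = [("dwprgbyfykriizylpqcltzx.biz/", 1, 0.99),
               ("http://git.demo.nick.net.nz/", 0, 0.01)]

errors = [round(abs(score - label), 6) for _, label, score in test_scores]
accuracy = sum((score > 0.5) == bool(label)
               for _, label, score in test_scores) / len(test_scores)
print("errors:", errors, "test accuracy:", accuracy)
```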

DEPLOYMENT accuracy

(Supervised) Machine Learning: DEPLOYMENT

“fitted” model ƒ, trained on the Training Data and scored on the Test Data, now faces Deployment Data!

???.evil.com
???.good.com

Deployment Accuracy: ???, ???

Lab (Training → Testing) → Deployment

Train / Test “Sensitivity Analysis”: Identifying training data that leads to improved and consistent performance on new datasets

Train and test the same model across different datasets, and evaluate the results:

1. Which training datasets generalize better to others?

2. How sensitive is a model’s accuracy to changes in test datasets?

Train Data A, Train Data B, Train Data C
× Test Data A, Test Data B, Test Data C
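The train × test grid above is just a double loop: fit one model per training dataset, score every model on every test dataset, and read off the accuracy matrix. In this sketch, `fit`, `evaluate`, and the three datasets are hypothetical stand-ins for real training and evaluation code.

```python
# Sketch of the train/test sensitivity analysis: one model per
# training set, evaluated on every test set.

def fit(train_set):
    # stand-in "model": remembers which TLDs were labeled malicious
    bad = {url.rsplit(".", 1)[-1].split("/")[0]
           for url, y in train_set if y == 1}
    return lambda url: 1 if url.rsplit(".", 1)[-1].split("/")[0] in bad else 0

def evaluate(model, test_set):
    return sum(model(url) == y for url, y in test_set) / len(test_set)

datasets = {  # hypothetical labeled datasets A, B, C
    "A": [("evil.ru/x", 1), ("good.com/", 0)],
    "B": [("bad.biz/y", 1), ("ok.org/", 0)],
    "C": [("mal.ru/z", 1), ("fine.net/", 0)],
}

matrix = {(tr, te): evaluate(fit(datasets[tr]), datasets[te])
          for tr in datasets for te in datasets}
for (tr, te), acc in sorted(matrix.items()):
    print(f"train {tr} / test {te}: {acc:.2f}")
```

The diagonal (train and test drawn from the same source) scores perfectly, while off-diagonal cells expose how well each training set generalizes, which is exactly what the analysis is after.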

Train / Test “Sensitivity Analysis”

IRL!

Train / Test “Sensitivity Analysis”

1. Model Used

2. Accuracy Metric Used: AUC

3. Datasets Used

4. Results!

1. Model Used

URL Model

A Character-Level Convolutional Neural Network with Embeddings For Detecting Malicious URLs, File Paths and Registry Keys

Joshua Saxe, Konstantin Berlin

ƒ(input) = output:

http://www.trustus.evil.ru/paypal/login/ → .999583
https://www.facebook.com/ → .001491
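The cited model is a character-level CNN with embeddings; the first step of any such model is mapping a raw URL to a fixed-length sequence of character IDs that the embedding layer consumes. A minimal sketch of that encoding step follows; the vocabulary (printable ASCII) and the 200-character maximum length are arbitrary choices for illustration, not the paper's exact settings.

```python
# Character-level encoding of a URL into fixed-length integer IDs,
# the input format a character-level CNN with embeddings expects.
MAX_LEN = 200
# printable ASCII; ID 0 is reserved for padding / unknown characters
VOCAB = {ch: i + 1 for i, ch in enumerate(chr(c) for c in range(32, 127))}

def encode(url):
    ids = [VOCAB.get(ch, 0) for ch in url[:MAX_LEN]]
    return ids + [0] * (MAX_LEN - len(ids))   # right-pad to MAX_LEN

x = encode("http://www.trustus.evil.ru/paypal/login/")
print(len(x), x[:8])
```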

2. Accuracy Metric Used: AUC

AUC = “Area Under the [ROC] Curve”
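AUC has an equivalent probabilistic reading that makes it easy to compute by hand: the probability that a randomly chosen malicious sample scores higher than a randomly chosen benign one, with ties counting half. The scores below reuse the slide's illustrative values.

```python
# AUC via its probabilistic definition: P(random positive outscores
# a random negative), ties counted as 0.5.
def auc(labeled_scores):
    pos = [s for y, s in labeled_scores if y == 1]
    neg = [s for y, s in labeled_scores if y == 0]
    pairs = [(p > n) + 0.5 * (p == n) for p in pos for n in neg]
    return sum(pairs) / (len(pos) * len(neg))

scores = [(1, .944780), (1, .999816), (0, .019367), (0, .008448)]
print(auc(scores))   # 1.0: every malicious URL outranks every benign one
```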

3. Datasets Used

VirusTotal: 10 million URLs from January 2017, ≈ 4% malware

Sophos: 10 million internal URLs from January 2017, ≈ 4% malware

CommonCrawl & PhishTank: 10 million URLs from January 2017*, ≈ 20k malware samples

* plus pre-Jan ‘17 PhishTank malicious URLs, due to lack of data


Models, each trained on January ‘17 data:
CommonCrawl & PT model, Sophos model, VirusTotal model

each evaluated against every test dataset:
VirusTotal test data, CommonCrawl & PT test data, Sophos test data
(Feb-Apr ‘17 / Jan-Apr ‘17 data)

4. Results!

Sophos Model Tested On Sophos Test Data

What did we learn?

- Model accuracy is extremely dependent on the training and test datasets used

- Which datasets generalize better

- Expected variance in accuracy on new, inherently different data


How to minimize the probability… of failing spectacularly

Models are liable to fail on different, future data.

Especially when we lack deployment test data, we need to map the limitations of our models using train / test dataset sensitivity analyses.

This technique can help us choose better training datasets and gain a better understanding of how sensitive model accuracy is to new test data distributions. This allows us to develop models that work in the real world, not just in idealized laboratory settings.

Thanks!