[CB16] Around the Web in 80 Hours: Scalable Fingerprinting with Chromium Automation by Isaac Dawson

Transcript of [CB16] Around the Web in 80 Hours: Scalable Fingerprinting with Chromium Automation by Isaac Dawson

Page 1: [CB16] Around the Web in 80 Hours: Scalable Fingerprinting with Chromium Automation by Isaac Dawson

ISAAC DAWSON,

AROUND THE WEB IN 80 HOURS: SCALABLE FINGERPRINTING WITH CHROMIUM AUTOMATION

VERACODE

15

Page 2: [CB16] Around the Web in 80 Hours: Scalable Fingerprinting with Chromium Automation by Isaac Dawson

ABOUT ME:

▸ Previously at @stake, Symantec (10 years)

▸ Moved into research role at Veracode, Inc. (6 years)

▸ Living in Japan for 12 years

▸ I <3

Page 3: [CB16] Around the Web in 80 Hours: Scalable Fingerprinting with Chromium Automation by Isaac Dawson

IT ALL STARTED IN 2012…

Page 4: [CB16] Around the Web in 80 Hours: Scalable Fingerprinting with Chromium Automation by Isaac Dawson

SECURITY HEADER SCANNING HISTORY

▸ All scanners use the Alexa Top 1 Million URLs

▸ Galexa (November 2012 - March 2014)

▸ Golexa (March 2014 - February 2016)

▸ Creeper v0-v1 (February 2016 - July 2016)

▸ Creeper v2 (July 2016 - …)

Page 5: [CB16] Around the Web in 80 Hours: Scalable Fingerprinting with Chromium Automation by Isaac Dawson

THE SYSTEM: ARCHITECTURE

Page 6: [CB16] Around the Web in 80 Hours: Scalable Fingerprinting with Chromium Automation by Isaac Dawson

SUMMARY OF SYSTEMS & COMPONENTS

▸ Admin (x1) - Manages jobs

▸ Agents (x50) - Analyze URLs

▸ DB Writers (x4) - Feed analysis data into the DB & S3

▸ Database (x1) - PostgreSQL 9.5 DB

▸ NSQ - A message queue for URLs, reports and responses

▸ S3 - Stores serialized DOM and HTML/JS

Page 7: [CB16] Around the Web in 80 Hours: Scalable Fingerprinting with Chromium Automation by Isaac Dawson

THE MESSAGE QUEUE - NSQD, NSQLOOKUPD

▸ NSQ is an easy-to-deploy message queue

▸ JSON messages between all systems

▸ All agents point to Admin service running NSQLookupd

Page 8: [CB16] Around the Web in 80 Hours: Scalable Fingerprinting with Chromium Automation by Isaac Dawson

HELPFUL NSQ FEATURES

    // Create consumer
    c.urlConsumer, err = nsq.NewConsumer(job.Topics["url"], creeper_types.UrlChannel, cfg)

    // Process numBrowsers messages concurrently (7)
    c.urlConsumer.AddConcurrentHandlers(nsq.HandlerFunc(c.processUrls), numBrowsers)

    // Job taking too long to handle/process a message?
    msg.Touch() // notify we are still working on this message

    // Need to requeue because chrome crashed?
    msg.RequeueWithoutBackoff(-1)

    // Need to change max # of inflight messages?
    c.urlConsumer.ChangeMaxInFlight(c.getInflightCount())

Page 9: [CB16] Around the Web in 80 Hours: Scalable Fingerprinting with Chromium Automation by Isaac Dawson

VERACODE

DATAFLOW

[Dataflow diagram with nodes: ADMIN, AGENT (x3), WRITER (x3), and a DATA STORAGE group containing DB and S3]

Page 10: [CB16] Around the Web in 80 Hours: Scalable Fingerprinting with Chromium Automation by Isaac Dawson

GETTING THE DATA WITH: CREEPER AGENTS

Page 11: [CB16] Around the Web in 80 Hours: Scalable Fingerprinting with Chromium Automation by Isaac Dawson

VERACODE

CREEPER AGENTS: GETTING THE DATA

Page 12: [CB16] Around the Web in 80 Hours: Scalable Fingerprinting with Chromium Automation by Isaac Dawson

VERACODE

CREEPER AGENTS: GETTING THE DATA

BROWSER AUTOMATION REQUIREMENTS

▸ Automatable

▸ Fast

▸ Capture network

▸ Capture various browser events (CSP violations)

▸ Inject JavaScript

Page 13: [CB16] Around the Web in 80 Hours: Scalable Fingerprinting with Chromium Automation by Isaac Dawson

VERACODE

CREEPER AGENTS: GETTING THE DATA

CHOSE CHROME, FOR OBVIOUS REASONS…

▸ Each agent runs 3-6 tabs concurrently

▸ Headless, uses Xvfb

▸ Can get full read access to network response data

▸ Easily inject JavaScript

▸ Can subscribe to console messages

Page 14: [CB16] Around the Web in 80 Hours: Scalable Fingerprinting with Chromium Automation by Isaac Dawson

VERACODE

CREEPER AGENTS: GETTING THE DATA

AGENT DESIGN

CREEPER AGENT

BROWSER MANAGER

ANALYZER

REPORTER

APP LOGIC

Page 15: [CB16] Around the Web in 80 Hours: Scalable Fingerprinting with Chromium Automation by Isaac Dawson

CONTROLLING THE BROWSER

Page 16: [CB16] Around the Web in 80 Hours: Scalable Fingerprinting with Chromium Automation by Isaac Dawson

VERACODE

CREEPER AGENTS: GETTING THE DATA

GOOGLE CHROME REMOTE DEBUGGER

▸ Huge definition files: browser_protocol.json and js_protocol.json

    {
      "version": { "major": "1", "minor": "1" },
      "domains": [{
        "domain": "Inspector",
        "hidden": true,
        "types": [],
        "commands": [{
          "name": "enable",
          "description": "Enables inspector domain...",
          "handlers": ["browser", "renderer"]
        }],
        "events": [{
          "name": "evaluateForTestInFrontend",
          "parameters": [ … ]
        }]
      }]
    }

Page 17: [CB16] Around the Web in 80 Hours: Scalable Fingerprinting with Chromium Automation by Isaac Dawson

VERACODE

CREEPER AGENTS: GETTING THE DATA

GCD

▸ GCD generates Go code using templates

▸ Remote access to debugger events, functions, types.

▸ Can be updated easily as the protocol files change

Page 18: [CB16] Around the Web in 80 Hours: Scalable Fingerprinting with Chromium Automation by Isaac Dawson

VERACODE

CREEPER AGENTS: GETTING THE DATA

GCD WAS GOOD BUT…

▸ Needed something better

▸ Built autogcd to automate:

▸ Trapping console messages

▸ Intercepting network data

▸ Injecting JS

▸ Took some inspiration from WebDriver

Page 19: [CB16] Around the Web in 80 Hours: Scalable Fingerprinting with Chromium Automation by Isaac Dawson

VERACODE

CREEPER AGENTS: GETTING THE DATA

GETTING CSP EVENTS

    func (b *Browser) StartIntercepting() error {
        b.tab.GetConsoleMessages(b.cspHandler())
        return nil
    }

    func (b *Browser) cspHandler() autogcd.ConsoleMessageFunc {
        return func(tab *autogcd.Tab, message *gcdapi.ConsoleConsoleMessage) {
            // only console messages from the "security" source are of interest
            if message.Source != "security" {
                return
            }
            parseCsp(b.creeperData.CspResults, b.creeperData.ReportOnlyCspResults, message.Text)
        }
    }

Page 20: [CB16] Around the Web in 80 Hours: Scalable Fingerprinting with Chromium Automation by Isaac Dawson

VERACODE

CREEPER AGENTS: GETTING THE DATA

TRAPPING NETWORK RESPONSES

    func (b *Browser) StartIntercepting() error {
        b.tab.GetNetworkTraffic(nil, b.responseHandler(), b.respFinishedHandler())
        return nil
    }

    func (b *Browser) responseHandler() autogcd.NetworkResponseHandlerFunc {
        return func(tab *autogcd.Tab, response *autogcd.NetworkResponse) {
            // creeperResponse is set up in code not shown on the slide
            creeperResponse.Url = response.Response.Url
            // wait for the body (see respFinishedHandler below)
            b.networkContainer.WaitFor(response.RequestId)
            creeperResponse.ResponseBody, _ = b.encodeBody(response.RequestId, creeperResponse.MimeType, creeperResponse.Url)
            b.networkContainer.AddReady(creeperResponse)
        }
    }

    // mark the body as ready
    func (b *Browser) respFinishedHandler() autogcd.NetworkFinishedHandlerFunc {
        return func(tab *autogcd.Tab, requestId string, dataLength, timeStamp float64) {
            b.networkContainer.BodyReady(requestId)
        }
    }
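The WaitFor / BodyReady pairing above suggests a small synchronization helper keyed by request ID. The slides do not show networkContainer itself, so the following is only a minimal sketch of one way such a helper could work. The bodyTracker type and its internals are hypothetical, and a real agent would likely also need a timeout and cleanup of finished entries.

```go
package main

import (
    "fmt"
    "sync"
)

// bodyTracker is a hypothetical stand-in for the slide's networkContainer:
// it lets a response handler block until the matching "finished" event arrives.
type bodyTracker struct {
    mu    sync.Mutex
    ready map[string]chan struct{}
}

func newBodyTracker() *bodyTracker {
    return &bodyTracker{ready: make(map[string]chan struct{})}
}

// signal returns the channel for requestID, creating it on first use so that
// WaitFor and BodyReady may be called in either order.
func (t *bodyTracker) signal(requestID string) chan struct{} {
    t.mu.Lock()
    defer t.mu.Unlock()
    ch, ok := t.ready[requestID]
    if !ok {
        ch = make(chan struct{})
        t.ready[requestID] = ch
    }
    return ch
}

// BodyReady marks the body as ready; repeated calls are harmless.
func (t *bodyTracker) BodyReady(requestID string) {
    ch := t.signal(requestID)
    t.mu.Lock()
    defer t.mu.Unlock()
    select {
    case <-ch: // already closed
    default:
        close(ch)
    }
}

// WaitFor blocks until BodyReady has been called for requestID.
func (t *bodyTracker) WaitFor(requestID string) {
    <-t.signal(requestID)
}

func main() {
    t := newBodyTracker()
    done := make(chan string)
    go func() {
        t.WaitFor("req-1")
        done <- "body for req-1 can be read"
    }()
    t.BodyReady("req-1")
    fmt.Println(<-done)
}
```

Closing a channel instead of sending on it means any number of waiters are released, and a late WaitFor returns immediately.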

Page 21: [CB16] Around the Web in 80 Hours: Scalable Fingerprinting with Chromium Automation by Isaac Dawson

VERACODE

CREEPER AGENTS: GETTING THE DATA

INJECTING JAVASCRIPT

▸ Extract JS libraries and versions

▸ Retire.js and Wappalyzer have some good pointers

▸ Created a JSON file with 86 frameworks

▸ Must wait for the page to be fully loaded

Page 22: [CB16] Around the Web in 80 Hours: Scalable Fingerprinting with Chromium Automation by Isaac Dawson

VERACODE

CREEPER AGENTS: GETTING THE DATA

INJECTING JAVASCRIPT - THE QUERIES

    {
      "libraries": [
        {
          "url": "http://jquery.com/",
          "key": "jquery",
          "statement": "jQuery.fn.jquery"
        },
        {
          "url": "https://jquerymobile.com/",
          "key": "jquery-mobile",
          "statement": "jQuery.mobile.version"
        },
        {
          "url": "http://www.embeddedjs.com/",
          "key": "embeddedjs 1.0",
          "statement": "(typeof EJS === \"function\" && typeof EJS.Buffer === \"function\") ? \"ejs 1.0\":\"\""
        },
        {
          "url": "http://www.embeddedjs.com/",
          "key": "embeddedjs 0.x",
          "statement": "(typeof EJS === \"function\" && typeof EjsScanner === \"function\") ? \"ejs 0.x\":\"\""
        }
      ]
    }

Page 23: [CB16] Around the Web in 80 Hours: Scalable Fingerprinting with Chromium Automation by Isaac Dawson

VERACODE

CREEPER AGENTS: GETTING THE DATA

INJECTING JAVASCRIPT - INJECTING

    for _, library := range JsLibs.Libraries {
        // run each detection statement in the page; a non-empty result is the version
        res, err := b.ExecuteScript(library.Statement)
        if err == nil && string(res) != "" {
            log.Printf("%s library result was: %s\n", library.Key, string(res))
            report.JavaScriptLibraries[library.Key] = string(res)
        }
    }
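The loop above reads Key and Statement from each entry in JsLibs.Libraries, which matches the shape of the JSON query file. As a rough sketch, the file could be decoded with encoding/json as follows. The type names Library and LibraryFile and the function parseLibraries are assumptions for illustration, not names from Creeper.

```go
package main

import (
    "encoding/json"
    "fmt"
)

// Library mirrors one entry of the query file shown on the slides.
type Library struct {
    URL       string `json:"url"`
    Key       string `json:"key"`
    Statement string `json:"statement"`
}

// LibraryFile is the top-level document (hypothetical type name).
type LibraryFile struct {
    Libraries []Library `json:"libraries"`
}

// parseLibraries decodes the JSON query file into Go structs.
func parseLibraries(data []byte) (LibraryFile, error) {
    var f LibraryFile
    err := json.Unmarshal(data, &f)
    return f, err
}

// sample holds the first two entries from the slide.
const sample = `{"libraries": [
  {"url": "http://jquery.com/", "key": "jquery", "statement": "jQuery.fn.jquery"},
  {"url": "https://jquerymobile.com/", "key": "jquery-mobile", "statement": "jQuery.mobile.version"}
]}`

func main() {
    libs, err := parseLibraries([]byte(sample))
    if err != nil {
        panic(err)
    }
    for _, l := range libs.Libraries {
        fmt.Printf("%s -> %s\n", l.Key, l.Statement)
    }
}
```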

Page 24: [CB16] Around the Web in 80 Hours: Scalable Fingerprinting with Chromium Automation by Isaac Dawson

VERACODE

CREEPER AGENTS: GETTING THE DATA

INJECTING JAVASCRIPT - WHEN IS A PAGE DONE?

▸ DOMContentLoaded doesn’t handle dynamically loaded JS

▸ Listen for DOM change events

▸ Page loaded if no DOM change events occur for > 2 seconds

▸ Timeout after 5 seconds
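The "quiet period" rule above can be sketched with two timers: one that restarts on every DOM change event and one overall deadline. This is a minimal illustration, not Creeper's code. The function name waitForQuiet and the event channel are assumptions, and the durations are shortened in main so the demo finishes quickly.

```go
package main

import (
    "fmt"
    "time"
)

// waitForQuiet returns true once no event has arrived on domChanges for the
// quiet period, or false if the overall timeout expires first.
func waitForQuiet(domChanges <-chan struct{}, quiet, timeout time.Duration) bool {
    deadline := time.NewTimer(timeout)
    defer deadline.Stop()
    idle := time.NewTimer(quiet)
    defer idle.Stop()
    for {
        select {
        case <-domChanges:
            // The DOM changed: restart the quiet-period timer.
            if !idle.Stop() {
                select {
                case <-idle.C:
                default:
                }
            }
            idle.Reset(quiet)
        case <-idle.C:
            return true // page considered loaded
        case <-deadline.C:
            return false // gave up waiting for the page to settle
        }
    }
}

func main() {
    changes := make(chan struct{})
    go func() {
        // Simulate three DOM mutations 10ms apart, then silence.
        for i := 0; i < 3; i++ {
            time.Sleep(10 * time.Millisecond)
            changes <- struct{}{}
        }
    }()
    // The slides use a 2s quiet period and a 5s timeout; shortened here.
    fmt.Println(waitForQuiet(changes, 50*time.Millisecond, 500*time.Millisecond))
}
```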

Page 25: [CB16] Around the Web in 80 Hours: Scalable Fingerprinting with Chromium Automation by Isaac Dawson

CHALLENGES

Page 26: [CB16] Around the Web in 80 Hours: Scalable Fingerprinting with Chromium Automation by Isaac Dawson

VERACODE

CREEPER AGENTS: GETTING THE DATA

CHALLENGES - CONTAMINATION

[Timeline diagram: StartCapture -> LoadURL -> DocumentLoaded -> StopCapture]

Page 27: [CB16] Around the Web in 80 Hours: Scalable Fingerprinting with Chromium Automation by Isaac Dawson

VERACODE

CREEPER AGENTS: GETTING THE DATA

CHALLENGES - CONTAMINATION - SOLUTION

[Timeline diagram: BorrowBrowser -> StartCapture -> LoadURL -> DocumentLoaded -> StopCapture -> KillBrowser -> Start/AddPool]
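The borrow / kill / replace cycle in this solution can be sketched as a pool of single-use browsers. Everything here is hypothetical: the browser type only carries an ID, and starting or killing a real Chrome process is left as a comment. The point is just that a browser is never reused after it has loaded a page.

```go
package main

import (
    "fmt"
    "sync"
)

// browser stands in for a Chrome process (hypothetical type).
type browser struct {
    id int
}

// pool hands out fresh browsers and never reuses one after a page load.
type pool struct {
    mu     sync.Mutex
    ready  chan *browser
    nextID int
    killed int
}

func newPool(size int) *pool {
    p := &pool{ready: make(chan *browser, size)}
    for i := 0; i < size; i++ {
        p.startAndAdd()
    }
    return p
}

// startAndAdd launches a new browser and adds it to the pool (Start/AddPool).
func (p *pool) startAndAdd() {
    p.mu.Lock()
    p.nextID++
    b := &browser{id: p.nextID}
    p.mu.Unlock()
    p.ready <- b
}

// borrow blocks until a fresh browser is available (BorrowBrowser).
func (p *pool) borrow() *browser { return <-p.ready }

// kill discards a used browser and starts a replacement (KillBrowser).
// A real implementation would terminate the Chrome process here.
func (p *pool) kill(b *browser) {
    p.mu.Lock()
    p.killed++
    p.mu.Unlock()
    p.startAndAdd()
}

// analyze runs one URL in a throwaway browser and returns that browser's ID.
func (p *pool) analyze(url string) int {
    b := p.borrow()
    defer p.kill(b)
    // StartCapture, LoadURL, wait for DocumentLoaded, StopCapture would go here.
    return b.id
}

func main() {
    p := newPool(2)
    for _, u := range []string{"http://example.com/", "http://example.org/"} {
        fmt.Printf("%s analyzed in browser %d\n", u, p.analyze(u))
    }
}
```

Because each URL gets its own browser, leftover network traffic from one site cannot end up in the capture of the next.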

Page 28: [CB16] Around the Web in 80 Hours: Scalable Fingerprinting with Chromium Automation by Isaac Dawson

VERACODE

CREEPER AGENTS: GETTING THE DATA

CHALLENGES - CHROME BUG #1

▸ Turns out that opening too many tabs can cause tabs to stop responding to the debugger protocol

Page 29: [CB16] Around the Web in 80 Hours: Scalable Fingerprinting with Chromium Automation by Isaac Dawson

VERACODE

CREEPER AGENTS: GETTING THE DATA

CHALLENGES - CHROME BUG #1 - SOLUTION

▸ Mark tabs as ‘dead’

▸ If the max dead tab count is reached, drain active URLs and kill Chrome

Page 30: [CB16] Around the Web in 80 Hours: Scalable Fingerprinting with Chromium Automation by Isaac Dawson

CRASHSAFARI.COM

Page 31: [CB16] Around the Web in 80 Hours: Scalable Fingerprinting with Chromium Automation by Isaac Dawson

VERACODE

CREEPER AGENTS: GETTING THE DATA

CHALLENGES - CHROME BUG #2 - CRASHSAFARI.COM

▸ Would completely kill Chrome *and* the agent

▸ Lost all active tabs

▸ This site cost me about 2-3 weeks of development time

Page 32: [CB16] Around the Web in 80 Hours: Scalable Fingerprinting with Chromium Automation by Isaac Dawson

VERACODE

CREEPER AGENTS: GETTING THE DATA

CHALLENGES - CRASHSAFARI.COM - SOLUTION

▸ Created killface package

▸ Sends a notification to stop active work

▸ Worker count dynamically adjusted to 1

▸ Pauses queue, runs all unfinished URLs again

▸ Once active count is 0, restart normally

Page 33: [CB16] Around the Web in 80 Hours: Scalable Fingerprinting with Chromium Automation by Isaac Dawson

VERACODE

CREEPER AGENTS: GETTING THE DATA

OTHER CHALLENGES

✘ NSQ messages too large, zipping ineffective

✓ Split response data/report data

✘ Sites block AWS IP ranges (craigslist.com, etc.)

☹ Timeout…

✘ Concurrency issues

✓ Very careful use of goroutines, channels and timers

✘ Site analysis failures/timeouts

✓ Try 3 times, keep track of retry state

✓ During retry, open a new browser and work on an additional URL

Page 34: [CB16] Around the Web in 80 Hours: Scalable Fingerprinting with Chromium Automation by Isaac Dawson

STORING THE DATA WITH: DB WRITERS & S3

Page 35: [CB16] Around the Web in 80 Hours: Scalable Fingerprinting with Chromium Automation by Isaac Dawson

VERACODE

DB WRITERS: STORING THE DATA

PREVIOUSLY…

▸ Creeper v0 had many problems

▸ RDS did not support PostgreSQL 9.5

▸ Duplicate data

▸ For v1, wrote to disk, SHA1 of contents:

▸ /job/files/5/a/b/c/5abcfbe73e39e0572a939b09f1eb16d7.html

▸ v1 did not shard database tables

▸ Database tables were normalized

▸ Lock contention
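The v1 on-disk layout above (file named after a hash of its contents, with the leading hex characters used as directory levels) can be sketched as follows. The function name shardedPath is hypothetical. The sketch uses SHA-1 as the slide states, so it yields a 40-character hex name, whereas the example file name on the slide has 32 hex characters.

```go
package main

import (
    "crypto/sha1"
    "encoding/hex"
    "fmt"
    "path"
)

// shardedPath returns root/<c0>/<c1>/<c2>/<c3>/<digest><ext>, where the digest
// is the hex SHA-1 of body and c0..c3 are its first four characters.
func shardedPath(root string, body []byte, ext string) string {
    sum := sha1.Sum(body)
    digest := hex.EncodeToString(sum[:])
    return path.Join(root, digest[0:1], digest[1:2], digest[2:3], digest[3:4], digest+ext)
}

func main() {
    // prints /job/files/a/9/9/9/a9993e364706816aba3e25717850c26c9cd0d89d.html
    fmt.Println(shardedPath("/job/files", []byte("abc"), ".html"))
}
```

Identical response bodies map to the same path, so a file only has to be written once, and the directory levels keep any single directory from growing too large.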

Page 36: [CB16] Around the Web in 80 Hours: Scalable Fingerprinting with Chromium Automation by Isaac Dawson

VERACODE

DB WRITERS: STORING THE DATA

DATABASE REFRESHER - NORMALIZING

FLATTENED:

    url                  header_name       header_value
    http://veracode.com  x-xss-protection  1; mode=block
    http://codeblue.jp   x-xss-protection  1; mode=block
    http://google.jp     x-xss-protection  1; mode=block report-uri

NORMALIZED:

    url                  header_name_id  header_value_id
    http://veracode.com  0               0
    http://codeblue.jp   0               0
    http://google.jp     0               1

    header_name_id  header_name
    0               x-xss-protection

    header_value_id  header_value
    0                1; mode=block
    1                1; mode=block report-uri …

Page 37: [CB16] Around the Web in 80 Hours: Scalable Fingerprinting with Chromium Automation by Isaac Dawson

VERACODE

DB WRITERS: STORING THE DATA

CHALLENGES - GETTING THE DATA IN QUICKLY

▸ Get the data out of the DB writers as soon as possible

▸ Careful to not overload the database with many connections

▸ Reduce lock contention for writing

Page 38: [CB16] Around the Web in 80 Hours: Scalable Fingerprinting with Chromium Automation by Isaac Dawson

VERACODE

DB WRITERS: STORING THE DATA

SOLUTION #1 - GETTING THE DATA IN QUICKLY

▸ DB Writers batch up reports and responses

▸ Inserted every 2.5-3.5 seconds

▸ Reduces number of required DB connections

Page 39: [CB16] Around the Web in 80 Hours: Scalable Fingerprinting with Chromium Automation by Isaac Dawson

VERACODE

DB WRITERS: STORING THE DATA

SOLUTION #1 BATCHER

    // AddReport queues a report for the next batch insert.
    func (b *Batcher) AddReport(r *creeper_types.CreeperReport) {
        select {
        case b.reportPool <- r:
            atomic.AddInt32(&b.reportCount, 1)
        }
    }

    // EmptyReports drains every queued report without blocking.
    func (b *Batcher) EmptyReports() []*creeper_types.CreeperReport {
        reports := make([]*creeper_types.CreeperReport, 0)
        for {
            select {
            case report := <-b.reportPool:
                reports = append(reports, report)
            default:
                return reports
            }
        }
    }
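The slides do not show the loop that empties the batcher every 2.5-3.5 seconds. Below is a minimal sketch of how such a loop might look, using strings instead of CreeperReport values. The names flushInterval, drain and runFlusher are hypothetical, and the insert callback stands in for the real database write.

```go
package main

import (
    "fmt"
    "math/rand"
    "time"
)

// flushInterval maps a jitter value in [0, 1) to a delay between 2.5s and 3.5s,
// so that several writers do not all hit the database at the same moment.
func flushInterval(jitter float64) time.Duration {
    return 2500*time.Millisecond + time.Duration(jitter*float64(time.Second))
}

// drain empties everything currently buffered in pool without blocking.
func drain(pool chan string) []string {
    var batch []string
    for {
        select {
        case item := <-pool:
            batch = append(batch, item)
        default:
            return batch
        }
    }
}

// runFlusher drains the pool after each randomized interval and hands
// non-empty batches to insert, until stop is closed.
func runFlusher(pool chan string, stop <-chan struct{}, insert func([]string)) {
    for {
        select {
        case <-time.After(flushInterval(rand.Float64())):
            if batch := drain(pool); len(batch) > 0 {
                insert(batch)
            }
        case <-stop:
            return
        }
    }
}

func main() {
    pool := make(chan string, 1024)
    pool <- "report-1"
    pool <- "report-2"
    fmt.Println("next flush in", flushInterval(rand.Float64()))
    fmt.Println("batch:", drain(pool))
}
```

One batched insert per interval keeps the number of database connections small regardless of how many reports arrive in between.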

Page 40: [CB16] Around the Web in 80 Hours: Scalable Fingerprinting with Chromium Automation by Isaac Dawson

VERACODE

DB WRITERS: STORING THE DATA

SOLUTION #2 - GETTING THE DATA IN QUICKLY

▸ Insert into temporary table using COPY FROM

▸ Extracted from the temporary table and INSERTed into the final table. This allows for upserts:

    INSERT INTO header_names (header_name)
        SELECT responses_tmp.header_name FROM responses_tmp
        ON CONFLICT DO NOTHING;

Page 41: [CB16] Around the Web in 80 Hours: Scalable Fingerprinting with Chromium Automation by Isaac Dawson

VERACODE

DB WRITERS: STORING THE DATA

CHALLENGES - LARGE TABLES

▸ INSERT INTO … SELECT … FROM … on a table with 80,000,000 rows

▸ As tables got bigger, DB writers slowed down

▸ This is not scalable

Page 42: [CB16] Around the Web in 80 Hours: Scalable Fingerprinting with Chromium Automation by Isaac Dawson

VERACODE

DB WRITERS: STORING THE DATA

SOLUTION - TABLE SHARDING

▸ Much like sharding for the file system

▸ Requires a key:

▸ URL ID (e.g. 1 = google.com, 2 = microsoft.com, etc.)

▸ Only large tables require sharding

Page 43: [CB16] Around the Web in 80 Hours: Scalable Fingerprinting with Chromium Automation by Isaac Dawson

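VERACODE

DB WRITERS: STORING THE DATA

TABLE SHARDING

[Diagram: WRITER computes the shard key from the input ID (labelled "shardKey % inputId" on the slide) and routes rows to the table shard for shardKey = 1, shardKey = 2 or shardKey = 3 in the DB]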

Page 44: [CB16] Around the Web in 80 Hours: Scalable Fingerprinting with Chromium Automation by Isaac Dawson

VERACODE

DB WRITERS: STORING THE DATA

CREATING A SHARD KEY

▸ Choose the number of times to shard your tables:

▸ shardKey = input_id % 32

▸ Created PL/pgSQL functions:

▸ EXECUTE merge_headers(job, shardKey)

    create unlogged table if not exists job_0_responses (
        response_id serial primary key,
        input_id integer not null,
        body_hash varchar(64) not null,
        resp_url bytea not null,
        resp_uuid varchar(64) unique not null,
        resp_type_id integer references resp_types (resp_type_id) not null,
        status_id integer references status_lines (status_id) not null,
        status_code integer,
        mime_type_id integer references mime_types (mime_type_id) not null,
        response_time bigint
    );
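On the writer side, the shard key calculation above amounts to a modulo followed by grouping rows per shard. The sketch below assumes 32 shards as on the slide. The per-shard table naming in shardTable is purely hypothetical, since the slides only show the name job_0_responses.

```go
package main

import "fmt"

// numShards comes from the slide: shardKey = input_id % 32.
const numShards = 32

// shardKey maps a URL's input ID to one of numShards table shards.
func shardKey(inputID int) int { return inputID % numShards }

// shardTable builds a per-shard table name. The naming scheme is hypothetical.
func shardTable(job, key int) string {
    return fmt.Sprintf("job_%d_responses_%d", job, key)
}

// groupByShard buckets input IDs so each shard can be written in one batch.
func groupByShard(inputIDs []int) map[int][]int {
    groups := make(map[int][]int)
    for _, id := range inputIDs {
        k := shardKey(id)
        groups[k] = append(groups[k], id)
    }
    return groups
}

func main() {
    groups := groupByShard([]int{1, 2, 33, 34, 65})
    fmt.Println(len(groups[1]), len(groups[2])) // prints "3 2"
    fmt.Println(shardTable(0, shardKey(33)))    // prints "job_0_responses_1"
}
```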

Page 45: [CB16] Around the Web in 80 Hours: Scalable Fingerprinting with Chromium Automation by Isaac Dawson

VERACODE

DB WRITERS: STORING THE DATA

CONS WITH SHARDING

▸ Added complexity for querying

▸ Best to create a new table with all data for reporting

▸ In the future, may use Citus for sharding across multiple databases

Page 46: [CB16] Around the Web in 80 Hours: Scalable Fingerprinting with Chromium Automation by Isaac Dawson

VERACODE

DB WRITERS: STORING THE DATA

RESPONSE DATA (JS/HTML)

Page 47: [CB16] Around the Web in 80 Hours: Scalable Fingerprinting with Chromium Automation by Isaac Dawson

VERACODE

DB WRITERS: STORING THE DATA

MOVING TO S3

▸ S3 limit is 100 requests/s, but we were pushing 200-2000 requests/s

▸ Had to contact support

▸ Exponential backoff, retry 10 times

▸ Hash is stored in response table

▸ HeadObject first to check existence, then PutObject

▸ HeadObject requests are way cheaper
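The "HeadObject first, then PutObject, with exponential backoff and 10 retries" approach on this slide can be sketched against a small interface instead of the real S3 client. The objectStore interface, memStore fake, and the retry and storeOnce functions are all hypothetical names. In real code, Exists would wrap HeadObject and Put would wrap PutObject.

```go
package main

import (
    "errors"
    "fmt"
    "time"
)

// objectStore is a hypothetical stand-in for the S3 client.
type objectStore interface {
    Exists(key string) (bool, error)   // would wrap HeadObject
    Put(key string, body []byte) error // would wrap PutObject
}

// retry calls fn up to attempts times, doubling the delay after each failure.
func retry(attempts int, base time.Duration, fn func() error) error {
    var err error
    delay := base
    for i := 0; i < attempts; i++ {
        if err = fn(); err == nil {
            return nil
        }
        if i < attempts-1 {
            time.Sleep(delay)
            delay *= 2
        }
    }
    return fmt.Errorf("giving up after %d attempts: %w", attempts, err)
}

// storeOnce uploads body under key only if no object with that key exists yet.
func storeOnce(s objectStore, key string, body []byte, base time.Duration) error {
    return retry(10, base, func() error {
        found, err := s.Exists(key)
        if err != nil {
            return err
        }
        if found {
            return nil // already stored, skip the more expensive upload
        }
        return s.Put(key, body)
    })
}

// memStore is an in-memory fake whose first `failures` Exists calls fail.
type memStore struct {
    objects  map[string][]byte
    failures int
    puts     int
}

func (m *memStore) Exists(key string) (bool, error) {
    if m.failures > 0 {
        m.failures--
        return false, errors.New("throttled")
    }
    _, ok := m.objects[key]
    return ok, nil
}

func (m *memStore) Put(key string, body []byte) error {
    m.puts++
    m.objects[key] = body
    return nil
}

func main() {
    s := &memStore{objects: map[string][]byte{}, failures: 2}
    fmt.Println(storeOnce(s, "abc123", []byte("<html>"), time.Millisecond), s.puts)
    fmt.Println(storeOnce(s, "abc123", []byte("<html>"), time.Millisecond), s.puts)
}
```

Keying objects by content hash is what makes the existence check useful: the second call finds the object and never uploads it again.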

Page 48: [CB16] Around the Web in 80 Hours: Scalable Fingerprinting with Chromium Automation by Isaac Dawson

VERACODE

DB WRITERS: STORING THE DATA

LASTLY…

▸ Created unlogged tables

▸ Modified PostgreSQL configuration:

▸ Set checkpoints 5 minutes (max) instead of 1

▸ Enabled fsync

▸ Set max_wal_size 256

Page 49: [CB16] Around the Web in 80 Hours: Scalable Fingerprinting with Chromium Automation by Isaac Dawson

THE RESULTS: A LOOK AT THE DATA

Page 50: [CB16] Around the Web in 80 Hours: Scalable Fingerprinting with Chromium Automation by Isaac Dawson

VERACODE

THE RESULTS: A LOOK AT DATA

SCAN STATISTICS

Responses 72,193,155

Headers 525,385,900

JS Results 1,943,925

URLs w/Errors 67,315

Redirected to HTTPS 145,268

URLs w/CSP Violations 740

Scan Time 15 Hours

Cost 343$ / 35063円

Page 51: [CB16] Around the Web in 80 Hours: Scalable Fingerprinting with Chromium Automation by Isaac Dawson

VERACODE

THE RESULTS: A LOOK AT DATA

CSP VIOLATIONS

▸ 722 out of 4965 sites using CSP had violations

▸ Security sites:

▸ https://www.globalsign.com/en/, http://secunia.com/,

▸ https://lastpass.com/, https://www.avant.com/, http://www.veracode.com/

▸ Well known organizations:

▸ http://www.alibaba.com, https://www.doubleclickbygoogle.com

▸ https://mozillians.org/en-US/

Page 52: [CB16] Around the Web in 80 Hours: Scalable Fingerprinting with Chromium Automation by Isaac Dawson

VERACODE

THE RESULTS: A LOOK AT DATA

SUM OF CSP VIOLATION TYPES

[Bar chart, axis 0 to 3000. CSP directives, in slide order: script-src, img-src, frame-src, font-src, style-src, connect-src, media-src, child-src, object-src, base-uri, form-action, manifest-src]

Page 53: [CB16] Around the Web in 80 Hours: Scalable Fingerprinting with Chromium Automation by Isaac Dawson

VERACODE

THE RESULTS: A LOOK AT DATA

TOP JAVASCRIPT LIBRARIES > 3000

[Bar chart, axis 0 to 800,000. Libraries, in slide order: jQuery, jQuery-UI, Modernizr, jQuery-UI-Dialog, Yepnope, jQuery-UI-Autocomplete, jQuery-UI-Tooltip, Bootstrap, HTML5Shiv, Underscore, jQuery.prettyPhoto, PrototypeJS, Drupal, MooTools, MEJS, Backbone.js, AngularJS, Foundation, JWPlayer, RequireJS, Handlebars.js, HammerJS, jPlayer, Mustache.js, Scriptaculous, Shadowbox, ZeroClipboard, YUI, Raphael, DataTables, Knockout]

Page 54: [CB16] Around the Web in 80 Hours: Scalable Fingerprinting with Chromium Automation by Isaac Dawson

VERACODE

THE RESULTS: A LOOK AT DATA

JAVASCRIPT ‘NEXTGEN’ FRAMEWORKS > 100

[Bar chart, axis 0 to 18,000. Frameworks, in slide order: Backbone.js, AngularJS, Foundation, YUI, Knockout, Dojo, ReactJS, MarionetteJS, VueJS, Ember, Meteor, Mithril, ExtJS, Polymer]

Page 55: [CB16] Around the Web in 80 Hours: Scalable Fingerprinting with Chromium Automation by Isaac Dawson

VERACODE

THE RESULTS: A LOOK AT DATA

VULNERABILITY COUNTS

[Bar chart, axis 0 to 80,000. Libraries, in slide order: jQuery, jQuery-UI-Dialog, jQuery.prettyPhoto, AngularJS, jQuery-UI-Tooltip, jPlayer, Handlebars.js, ZeroClipboard, Mustache.js, YUI, PrototypeJS, MEJS, JWPlayer, Dojo, Ember, TinyMCE, Plupload, jQuery-Mobile, CKEditor]

Page 56: [CB16] Around the Web in 80 Hours: Scalable Fingerprinting with Chromium Automation by Isaac Dawson

VERACODE

THE RESULTS: A LOOK AT DATA

LONGEST SECURITY HEADER AWARD - HTTPS://WWW.INSIGHTGUIDES.COM/Content-Security-Policy: default-src 'self' http://tagmanager.google.com https://tagmanager.google.com https://*.doubleclick.net http://*.doubleclick.net https://*.google-analytics.com http://*.google-analytics.com https://*.livechatinc.com http://*.livechatinc.com https://*.cloudfront.net http://*.cloudfront.net https://*.googleusercontent.com http://*.googleusercontent.com https://www.bugherd.com http://www.bugherd.com https://*.braintreegateway.com http://*.braintreegateway.com https://www.biblioimages.com http://www.biblioimages.com https://fonts.gstatic.com http://fonts.gstatic.com https://*.googleapis.com http://*.googleapis.com https://tripadvisor.com http://tripadvisor.com https://*.gstatic.com http://*.gstatic.com https://www.tripadvisor.com http://www.tripadvisor.com https://www.insightguides.com http://www.insightguides.com https://rum-static.pingdom.net http://rum-static.pingdom.net https://rum-collector.pingdom.net http://rum-collector.pingdom.net https://*.youtube.com http://*.youtube.com https://www.googleadservices.com http://www.googleadservices.com https://connect.facebook.net http://connect.facebook.net https://googleads.g.doubleclick.net http://googleads.g.doubleclick.net https://www.facebook.com http://www.facebook.com https://cdn.inspectlet.com http://cdn.inspectlet.com https://hn.inspectlet.com http://hn.inspectlet.com https://*.apa.yoda.site http://*.apa.yoda.site https://www.preprod.apa.yoda.site http://www.preprod.apa.yoda.site https://www.test.apa.yoda.site http://www.test.apa.yoda.site https://www.google.com http://www.google.com https://www.google.pl http://www.google.pl https://www.google.co.uk http://www.google.co.uk https://google.com http://google.com https://google.pl http://google.pl https://google.co.uk http://google.co.uk https://ethn.io http://ethn.io https://stats.g.doubleclick.net http://stats.g.doubleclick.net https://platform.instagram.com 
http://platform.instagram.com https://instagram.com http://instagram.com https://www.instagram.com http://www.instagram.com https://*.amazonaws.com http://*.amazonaws.com blob:; script-src 'self' http://www.googletagmanager.com https://www.googletagmanager.com http://tagmanager.google.com https://tagmanager.google.com https://*.doubleclick.net http://*.doubleclick.net https://*.google-analytics.com http://*.google-analytics.com https://*.livechatinc.com http://*.livechatinc.com https://*.cloudfront.net http://*.cloudfront.net https://*.googleusercontent.com http://*.googleusercontent.com https://www.bugherd.com http://www.bugherd.com https://*.braintreegateway.com http://*.braintreegateway.com https://www.biblioimages.com http://www.biblioimages.com https://fonts.gstatic.com http://fonts.gstatic.com https://*.googleapis.com http://*.googleapis.com https://tripadvisor.com http://tripadvisor.com https://*.gstatic.com http://*.gstatic.com https://www.tripadvisor.com http://www.tripadvisor.com https://www.insightguides.com http://www.insightguides.com https://rum-static.pingdom.net http://rum-static.pingdom.net https://rum-collector.pingdom.net http://rum-collector.pingdom.net https://*.youtube.com http://*.youtube.com https://www.googleadservices.com http://www.googleadservices.com https://connect.facebook.net http://connect.facebook.net https://googleads.g.doubleclick.net http://googleads.g.doubleclick.net https://www.facebook.com http://www.facebook.com https://cdn.inspectlet.com http://cdn.inspectlet.com https://hn.inspectlet.com http://hn.inspectlet.com https://*.apa.yoda.site http://*.apa.yoda.site https://www.preprod.apa.yoda.site http://www.preprod.apa.yoda.site https://www.test.apa.yoda.site http://www.test.apa.yoda.site https://www.google.com http://www.google.com https://www.google.pl http://www.google.pl https://www.google.co.uk http://www.google.co.uk https://google.com http://google.com https://google.pl http://google.pl https://google.co.uk 
http://google.co.uk https://ethn.io http://ethn.io https://stats.g.doubleclick.net http://stats.g.doubleclick.net https://platform.instagram.com http://platform.instagram.com https://instagram.com http://instagram.com https://www.instagram.com http://www.instagram.com https://*.amazonaws.com http://*.amazonaws.com 'unsafe-eval' 'unsafe-inline' https://apis.google.com blob:; connect-src * 'self' http://tagmanager.google.com https://tagmanager.google.com https://*.doubleclick.net http://*.doubleclick.net https://*.google-analytics.com http://*.google-analytics.com https://*.livechatinc.com http://*.livechatinc.com https://*.cloudfront.net http://*.cloudfront.net https://*.googleusercontent.com http://*.googleusercontent.com https://www.bugherd.com http://www.bugherd.com https://*.braintreegateway.com http://*.braintreegateway.com https://www.biblioimages.com http://www.biblioimages.com https://fonts.gstatic.com http://fonts.gstatic.com https://*.googleapis.com http://*.googleapis.com https://tripadvisor.com http://tripadvisor.com https://*.gstatic.com http://*.gstatic.com https://www.tripadvisor.com http://www.tripadvisor.com https://www.insightguides.com http://www.insightguides.com https://rum-static.pingdom.net http://rum-static.pingdom.net https://rum-collector.pingdom.net http://rum-collector.pingdom.net https://*.youtube.com http://*.youtube.com https://www.googleadservices.com http://www.googleadservices.com https://connect.facebook.net http://connect.facebook.net https://googleads.g.doubleclick.net http://googleads.g.doubleclick.net https://www.facebook.com http://www.facebook.com https://cdn.inspectlet.com http://cdn.inspectlet.com https://hn.inspectlet.com http://hn.inspectlet.com https://*.apa.yoda.site http://*.apa.yoda.site https://www.preprod.apa.yoda.site http://www.preprod.apa.yoda.site https://www.test.apa.yoda.site http://www.test.apa.yoda.site https://www.google.com http://www.google.com https://www.google.pl http://www.google.pl 
https://www.google.co.uk http://www.google.co.uk https://google.com http://google.com https://google.pl http://google.pl https://google.co.uk http://google.co.uk https://ethn.io http://ethn.io https://stats.g.doubleclick.net http://stats.g.doubleclick.net https://platform.instagram.com http://platform.instagram.com https://instagram.com http://instagram.com https://www.instagram.com http://www.instagram.com https://*.amazonaws.com http://*.amazonaws.com blob:;

Page 57: [CB16] Around the Web in 80 Hours: Scalable Fingerprinting with Chromium Automation by Isaac Dawson

VERACODE

THE RESULTS: A LOOK AT DATA

SOME OF MY FAVORITE HTTP STATUS LINES

▸ HTTP 500 access denied ("java.io.FilePermission" "D:\home\XXXXXXXXX.com\ori\ModelGlue\unity\eventrequest\EventRequest.cfc" "read")

▸ HTTP 500 "Duplicate entry '1473335051' for key 'timestamp' SQL=INSERT INTO `#__zt_visitor_counter` (`id`,`timestamp`,`visits`,`guests`,`ipaddress`,`useragent`) VALUES (null, '1473335051', 1 , 1 , '54.208.81.16', 'chrome')"

▸ HTTP 500 "Server Made Big Boo"

Page 58: [CB16] Around the Web in 80 Hours: Scalable Fingerprinting with Chromium Automation by Isaac Dawson

ABSOLUTE FAVORITE STATUS LINE: “NO HACKING”

Page 59: [CB16] Around the Web in 80 Hours: Scalable Fingerprinting with Chromium Automation by Isaac Dawson

VERACODE

THE RESULTS: A LOOK AT DATA

CONCLUSION

▸ Use NSQ, seriously.

▸ Concurrency can be difficult

▸ Batch data before inserting to DB

▸ If DB rows > a few million, consider sharding

▸ Test different types of table schema for performance

▸ Treat browsers like garbage and handle appropriately

Page 60: [CB16] Around the Web in 80 Hours: Scalable Fingerprinting with Chromium Automation by Isaac Dawson

VERACODE

THE RESULTS: A LOOK AT DATA

QUESTIONS?

▸ twitter: @_wirepair

▸ github: wirepair

▸ gcd: https://github.com/wirepair/gcd

▸ autogcd: https://github.com/wirepair/autogcd

▸ killface: https://github.com/wirepair/killface

▸ Thanks to all my coworkers supporting and listening to my daily rants!