NachumShacham, PhD Director, Data Science PayPal · Constructing Operational Big-Data Models...

4
5/26/16 1 © 2014 PayPal Inc. All rights reserved. . © 2 0 1 5 Pay Pal In c. All rig hts reserved.. Constructing Operational Big-Data Models Nachum Shacham, PhD Director, Data Science PayPal 5/18/2016 Predictive Modeling With Big Data 2 Data Sourcing Data Warehouse Model(s) selection and tuning Data Munging Deploy Feature Engineering Observational big- data Data Exploration Validation

Transcript of NachumShacham, PhD Director, Data Science PayPal · Constructing Operational Big-Data Models...

Page 1: NachumShacham, PhD Director, Data Science PayPal · Constructing Operational Big-Data Models NachumShacham, PhD Director, Data Science PayPal 5/18/2016 Predictive Modeling With Big

5/26/16

1© 2014 PayPal Inc. All rights reserved. .

© 2015 PayPal Inc. All rights reserved..

Constructing Operational Big-Data Models

Nachum Shacham, PhDDirector, Data Science

PayPal

5/18/2016

Predictive Modeling With Big Data

2

Data Sourcing

Data Warehouse

Model(s) selection and tuning

Data Munging

Deploy

FeatureEngineering

Observational big-data

DataExploration

Validation

Page 2: NachumShacham, PhD Director, Data Science PayPal · Constructing Operational Big-Data Models NachumShacham, PhD Director, Data Science PayPal 5/18/2016 Predictive Modeling With Big

5/26/16

2© 2014 PayPal Inc. All rights reserved. .

Tools & Infrastructure for Data and Modeling

3

Data: Collect, stream, store, process

Training large scale predictive models

Score millions of objects

Big Server

Open source cluster-based tools

Cloud interface: IAAS, PAAS, SAAS

Collaboration/model-sharing tools

GitHub

• Big predictive models are complex• Trained on historical data to score future cases•Hundreds/thousands of features•Millions of cases

• Machine-learning algorithms have been upgraded to operate efficiently on big data•Neural nets• Tree-based ensembles (Random Forest, GBM)•Clustering

• Do the model results work?

4

Modeling: Predicting Individual Users

Page 3: NachumShacham, PhD Director, Data Science PayPal · Constructing Operational Big-Data Models NachumShacham, PhD Director, Data Science PayPal 5/18/2016 Predictive Modeling With Big

5/26/16

3© 2014 PayPal Inc. All rights reserved. .

• Performance Metrics: R2, Error Rate(s)•Measure via cross validation•Based on historical data

• Cost of errors: False Positive, False Negative•Set by the users of the model

• Pre-production evaluation requires trust•A/B testing for selecting alternatives

5

How Good Is Your Model

• Model results: interpretable or black box• Can score reasoning be articulated

• Identify champion(s) who believe in the model• Incentivize adoption of the model• Avoid impression of positional bias • Integrate gradually to prove results on a gradually increasing scale • Repeatedly test and adjust• Measure the benefits of actions based on the model

6

Gain a Trust in the Model

Page 4: NachumShacham, PhD Director, Data Science PayPal · Constructing Operational Big-Data Models NachumShacham, PhD Director, Data Science PayPal 5/18/2016 Predictive Modeling With Big

5/26/16

4© 2014 PayPal Inc. All rights reserved. .

THANK YOU

7