Best Practices for Big Data Analytics with Machine Learning by Datameer

© 2013 Datameer, Inc. All rights reserved. Best Practices for Big Data Analytics with Machine Learning

Description

Don't forget! You can watch the full Datameer recording here: http://info.datameer.com/Online-Slideshare-Big-Data-Analytics-Machine-Learning-OnDemand.html

Learn, through industry use cases, how to empower users to identify patterns and relationships for recommendations using big data analytics.

Transcript of Best Practices for Big Data Analytics with Machine Learning by Datameer

Page 1: Best Practices for Big Data Analytics with Machine Learning by Datameer


Best Practices for Big Data Analytics with Machine Learning

Page 2

About our Speakers

Dr. Alex Guazzelli, Vice President, Analytics, Zementis (@DrAlexGuazzelli)

Dr. Alex Guazzelli has co-authored the first book on PMML, the Predictive Model Markup Language. At Zementis, Dr. Guazzelli is responsible for developing core technology and analytical solutions for Big Data and real-time scoring. Most recently, Dr. Guazzelli started teaching a class on standards for predictive analytics at UC San Diego Extension.

Page 3

About our Speakers

Karen Hsu, Senior Director, Product Marketing, Datameer (@Karenhsumar)

•  Over 15 years of enterprise software experience
•  Came from Informatica
•  Worked with start-ups that Informatica purchased to bring data solutions to market:
   –  Data quality
   –  Master data management
   –  B2B
   –  Data security solutions
•  Co-authored 4 patents
•  Worked in a variety of engineering, marketing and sales roles
•  Bachelor of Science degree in Management Science and Engineering from Stanford University

Page 4

Agenda
•  Considerations
•  Best Practices
•  Demonstration
•  Q&A

Page 5

Considerations

Page 6

Considerations

Target Users: Business, IT, Data Scientist

Questions: Descriptive! Predictive! Prescriptive!

Page 7

Target Users

Business Professional
▪ Visual
▪ Clustering, Decision Trees, Dependencies + More!

Page 8

Target Users

IT
▪ Flexible, powerful

Page 9

Target Users

Data Scientist
▪ Algorithms
▪ SAS, SPSS, R

Page 10

Questions: Descriptive! Predictive! Prescriptive!

▪ Descriptive machine learning tells you what has happened

Page 11

Questions: Descriptive! Predictive! Prescriptive!

▪ Predictive machine learning answers the question “What will happen?”

Page 12

Questions: Descriptive! Predictive! Prescriptive!

▪ Prescriptive machine learning tells you what will happen, when it will happen, and why it will happen
  – It predicts what will happen and prescribes how to take advantage of this future

Page 13

Best Practices

Page 14

Lean Analytics

Identify Use Case → 1. Integrate → 2. Prepare → 3. Analyze → 4. Visualize → Deploy

Page 15

Data Preparation

Profile ▪ Cleanse ▪ Enrich ▪ Transform

Outliers ▪ Missing values ▪ Invalid values ▪ Bin ▪ Normalize ▪ Join ▪ Union
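Several of the data preparation steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not Datameer's implementation: the function names are hypothetical, missing values are filled with the column median, outliers are clipped to bounds chosen by the analyst, and binning uses equal-width bins. Join and union are omitted.

```python
from statistics import median

def impute_missing(values):
    """Replace None (missing) entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]

def clip_outliers(values, low, high):
    """Clip values outside [low, high]; the bounds are an analyst's choice."""
    return [min(max(v, low), high) for v in values]

def min_max_normalize(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def equal_width_bins(values, n_bins):
    """Assign each value to one of n_bins equal-width bins, numbered 0 .. n_bins-1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid division by zero for a constant column
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

# Hypothetical "total spent on merchandise" column with one missing value and one outlier.
spend = [1234.0, None, 9876.0, 250.0, 1_000_000.0]
cleaned = clip_outliers(impute_missing(spend), low=0.0, high=20_000.0)
print(cleaned)                          # [1234.0, 5555.0, 9876.0, 250.0, 20000.0]
print(min_max_normalize(cleaned))
print(equal_width_bins(cleaned, n_bins=4))  # [0, 1, 1, 0, 3]
```

Median imputation and clipping are only two of many reasonable choices; the right treatment depends on what profiling reveals about each column.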

Page 16

Descriptive Analytics

Drag & Drop Smart Analytics

Page 17

Predictive Analytics

Descriptive vs. Predictive Analytics
▪ Descriptive Analytics answers “What happened?”
▪ Predictive Analytics answers “What will happen next?”

Predictive Analytics helps you discover patterns in the past, which can signal what is ahead.

Predictive analytics can discover hidden patterns in historical data that a human expert may not see. It is, in effect, the result of mathematics applied to data. As such, it benefits from clever mathematical techniques as well as from good data.

Page 18

Example: Predicting Churn

Matt - Churned 2 days ago

Scott - “Liked” our company last week

John - ??

Page 19

Churn-related features

Matt
▪ 3 complaints in last 6 months
▪ Opened 2 support tickets in last 4 weeks
▪ Spent a total of $1,234 buying merchandise
▪ Spent a total of $123 in services
▪ Purchased 2 items in last 4 weeks
▪ Is 34 years old
▪ Is a male
▪ Lives in Los Angeles
▪ ...

Scott
▪ No complaints in last 6 months
▪ Opened 1 support ticket in last 4 weeks
▪ Spent a total of $9,876 buying merchandise
▪ Spent a total of $987 in services
▪ Purchased 12 items in last 4 weeks
▪ Is 54 years old
▪ Is a male
▪ Lives in Chicago
▪ ...
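A model usually needs every feature as a number, so categorical attributes such as gender and city must be encoded first. Below is a minimal one-hot encoding sketch. It assumes hypothetical field names and a hypothetical fixed vocabulary of known cities; a city outside the vocabulary simply becomes all zeros.

```python
CITIES = ["Chicago", "Los Angeles"]  # hypothetical vocabulary of known cities

def encode(customer):
    """Turn one customer record into a flat numeric feature vector."""
    vector = [
        float(customer["complaints_6m"]),
        float(customer["tickets_4w"]),
        customer["merchandise_spend"],
        customer["services_spend"],
        float(customer["items_4w"]),
        float(customer["age"]),
        1.0 if customer["gender"] == "male" else 0.0,
    ]
    # One-hot encode the city: one indicator column per known city.
    vector += [1.0 if customer["city"] == c else 0.0 for c in CITIES]
    return vector

matt = {"complaints_6m": 3, "tickets_4w": 2, "merchandise_spend": 1234.0,
        "services_spend": 123.0, "items_4w": 2, "age": 34,
        "gender": "male", "city": "Los Angeles"}
scott = {"complaints_6m": 0, "tickets_4w": 1, "merchandise_spend": 9876.0,
         "services_spend": 987.0, "items_4w": 12, "age": 54,
         "gender": "male", "city": "Chicago"}
print(encode(matt))   # [3.0, 2.0, 1234.0, 123.0, 2.0, 34.0, 1.0, 0.0, 1.0]
print(encode(scott))  # [0.0, 1.0, 9876.0, 987.0, 12.0, 54.0, 1.0, 1.0, 0.0]
```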

Page 20

Big Data

An ever-expanding ocean of data containing people and sensor data (lots and lots of it):
▪ 90% of today's data was created in the last 2 years

Breadth and Depth
▪ Transaction records
▪ Social media
▪ Climate information
▪ Mobile GPS signals
▪ Healthcare
▪ Smart Grid
▪ Digital Breadcrumbs

Page 21

Churn-related “Big Data” features

Matt
▪ 12 friends listed as customers
▪ 2 complaints from friends in last 6 months
▪ Average age of friends is 41 years old
▪ 2 friends churned in last 30 days
▪ No purchases for same items as friends
▪ 1 website visit in last 7 days
▪ 2 website pages opened during last visit
▪ Opened 3 newsletters in last 6 months
▪ ...

Scott
▪ 34 friends listed as customers
▪ 1 complaint from friends in last 6 months
▪ Average age of friends is 62 years old
▪ No friends churned in last 30 days
▪ Purchased same 2 items as friends in last 2 months
▪ 3 website visits in last 7 days
▪ 5 website pages opened during last visit
▪ Opened 12 newsletters in last 6 months
▪ ...
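Social features such as “2 friends churned in last 30 days” are aggregations over a relationship graph joined with customer events. Below is a minimal sketch of one such feature. It assumes a hypothetical in-memory friend list and churn-date table. In a Big Data setting, the same logic would run as a join over much larger datasets.

```python
from datetime import date, timedelta

def friends_churned_recently(customer, friends, churn_dates, today, window_days=30):
    """Count the customer's friends whose churn date falls within the last window_days."""
    cutoff = today - timedelta(days=window_days)
    return sum(
        1
        for friend in friends.get(customer, [])
        if friend in churn_dates and cutoff <= churn_dates[friend] <= today
    )

# Hypothetical relationship and churn data.
friends = {"Matt": ["Ann", "Bob", "Cid"], "Scott": ["Dee", "Eve"]}
churn_dates = {"Ann": date(2013, 5, 20), "Bob": date(2013, 5, 28), "Dee": date(2013, 1, 5)}
today = date(2013, 6, 1)
print(friends_churned_recently("Matt", friends, churn_dates, today))   # 2
print(friends_churned_recently("Scott", friends, churn_dates, today))  # 0
```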

Page 22

Predictive Model

Building a predictive model ...

Data (churn-related features) → Model Training → Prediction: Churned / Not-churned

Diagram: a neural network with an Input Layer, a Hidden Layer and an Output Layer

Neural Networks ▪ Linear/Logistic Regression ▪ Support Vector Machines ▪ Scorecards ▪ Decision Trees ▪ Clustering ▪ Association Rules ▪ K-Nearest Neighbors ▪ Naive Bayes Classifiers ▪ ...
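To make the “Model Training” step concrete, here is a minimal logistic regression, one of the techniques listed, trained by batch gradient descent in plain Python. The two features (complaints, support tickets), the four training rows, the learning rate, and the epoch count are all hypothetical. A real churn model would use many more features and far more data, and would typically rely on a library rather than hand-written code.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(rows, labels, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n_features = len(rows[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            # Gradient of the log-loss for one row is (prediction - label) * input.
            error = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j in range(n_features):
                grad_w[j] += error * x[j]
            grad_b += error
        w = [wi - lr * gw / len(rows) for wi, gw in zip(w, grad_w)]
        b -= lr * grad_b / len(rows)
    return w, b

def predict_proba(w, b, x):
    """Probability that the customer churns (label 1)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

# Hypothetical training data: [complaints in last 6 months, support tickets in last 4 weeks]
rows = [[3, 2], [4, 3], [0, 1], [0, 0]]
labels = [1, 1, 0, 0]  # 1 = churned, 0 = not churned
w, b = train_logistic(rows, labels)
print(predict_proba(w, b, [3, 2]))  # Matt-like profile: high churn probability
print(predict_proba(w, b, [0, 1]))  # Scott-like profile: low churn probability
```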

Page 23

Why not several models?

Model Ensemble

Raw Inputs → Data Pre-Processing → Model 1, Model 2, . . ., Model n → Voting → Prediction

▪ Scores from all models are computed
▪ Voting: Majority Voting, Weighted Voting, Weighted Average, etc.
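The voting strategies named on the slide can be sketched as small combiner functions. This is an illustration under assumptions: the class labels, scores, and weights are hypothetical. In practice, the weights would typically reflect each model's performance on validation data.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the class predicted by the most models (ties go to the class seen first)."""
    return Counter(predictions).most_common(1)[0][0]

def weighted_vote(predictions, weights):
    """Add up each model's weight behind the class it predicts; return the heaviest class."""
    totals = {}
    for label, weight in zip(predictions, weights):
        totals[label] = totals.get(label, 0.0) + weight
    return max(totals, key=totals.get)

def weighted_average(scores, weights):
    """Combine per-model churn probabilities into a single score."""
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)

# Hypothetical outputs of three models for one customer.
labels = ["churn", "no-churn", "churn"]
scores = [0.8, 0.4, 0.7]
weights = [0.5, 0.3, 0.2]
print(majority_vote(labels))             # churn
print(weighted_vote(labels, weights))    # churn
print(weighted_average(scores, weights))
```

Weighted voting lets a single strong model outvote several weak ones, which plain majority voting cannot do.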

Page 24

End Goal: Predicting churn ...

Model Deployment and Execution in Big Data

Churn-related Features → Predictive Churn Model → Churn Risk Score
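Scoring is embarrassingly parallel because each record is scored independently. This means a trained model can run inside a per-record map step over the data. Below is a minimal sketch in the style of a Hadoop Streaming mapper, which reads `id,complaints,tickets` lines and emits `id<TAB>score`. The input layout and the hard-coded coefficients are hypothetical. They are not a trained model and not Datameer's execution engine.

```python
import math

# Hypothetical coefficients of an already-trained logistic churn model.
INTERCEPT = -3.0
COEFFICIENTS = {"complaints": 0.8, "tickets": 0.5}

def churn_risk(complaints, tickets):
    """Churn probability from the hypothetical logistic model."""
    z = INTERCEPT + COEFFICIENTS["complaints"] * complaints + COEFFICIENTS["tickets"] * tickets
    return 1.0 / (1.0 + math.exp(-z))

def score_lines(lines):
    """Yield 'id<TAB>score' for each 'id,complaints,tickets' line; skip malformed lines."""
    for line in lines:
        parts = line.strip().split(",")
        if len(parts) != 3:
            continue
        customer_id, complaints, tickets = parts
        try:
            risk = churn_risk(float(complaints), float(tickets))
        except ValueError:
            continue
        yield f"{customer_id}\t{risk:.4f}"

# In a Hadoop Streaming job, `lines` would be sys.stdin and each output line
# would be printed to stdout; here a small in-memory sample stands in for it.
sample = ["matt,3,2", "scott,0,1", "malformed line"]
for out in score_lines(sample):
    print(out)
```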

Page 25

From Model Building to Model Deployment (Traditionally ...)

Scientist’s Desktop: SAS, R, IBM SPSS, Perl, Python

Production Environment: Java, .NET, C, SQL

Lost in Translation

SAS, R, IBM SPSS …
▪ Great for model building but not for scoring, even more so when it comes to Hadoop

Page 26

From Model Building to Model Deployment (with PMML)

Model Building
▪ Angoss
▪ BigML
▪ FICO Model Builder
▪ IBM SPSS
▪ KNIME
▪ KXEN
▪ MicroStrategy
▪ Open Data
▪ Pervasive DataRush
▪ RapidMiner
▪ R / Rattle
▪ SAS
▪ SAP BusinessObjects
▪ Salford Systems
▪ StatSoft STATISTICA
▪ SQL Server
▪ TIBCO Spotfire
▪ Custom Code, etc.

Model Deployment and Execution
▪ PMML (models) → Universal PMML Plug-in (UPPI) → Datameer Server

Deploy in minutes ...

Page 27

Predictive Model Markup Language

▪ PMML is an XML-based language used to define statistical and data mining models and to share them between compliant applications.
▪ It is a mature standard developed by the DMG (Data Mining Group) to avoid proprietary issues and incompatibilities and to deploy models.
▪ PMML eliminates the need for custom model deployment and ensures reliability.

PMML defines a standard not only to represent data mining models, but also data handling and data transformations (pre- and post-processing).

PMML covers: Data Transformations ▪ Models
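The following example shows what “XML-based” looks like in practice. It contains a hand-written PMML fragment for a tiny logistic churn model and a minimal Python scorer that reads the fragment with the standard library. This is a sketch under stated assumptions: the field names, coefficients, and the choice of the PMML 4.1 namespace are illustrative. The scorer handles only this one narrow case, which is a classification RegressionModel whose first RegressionTable is scored with logit normalization. It is not the Zementis UPPI engine, which implements the full standard.

```python
import math
import xml.etree.ElementTree as ET

# Hypothetical PMML document: a logistic regression churn model with two inputs.
PMML_DOC = """<PMML version="4.1" xmlns="http://www.dmg.org/PMML-4_1">
  <Header description="Hypothetical churn model"/>
  <DataDictionary numberOfFields="3">
    <DataField name="complaints" optype="continuous" dataType="double"/>
    <DataField name="tickets" optype="continuous" dataType="double"/>
    <DataField name="churn" optype="categorical" dataType="string">
      <Value value="yes"/>
      <Value value="no"/>
    </DataField>
  </DataDictionary>
  <RegressionModel functionName="classification" normalizationMethod="logit">
    <MiningSchema>
      <MiningField name="complaints"/>
      <MiningField name="tickets"/>
      <MiningField name="churn" usageType="predicted"/>
    </MiningSchema>
    <RegressionTable intercept="-3.0" targetCategory="yes">
      <NumericPredictor name="complaints" coefficient="0.8"/>
      <NumericPredictor name="tickets" coefficient="0.5"/>
    </RegressionTable>
    <RegressionTable intercept="0.0" targetCategory="no"/>
  </RegressionModel>
</PMML>"""

NS = {"pmml": "http://www.dmg.org/PMML-4_1"}

def score(pmml_text, record):
    """Return the probability of the first target category ("yes" = churn) for one record."""
    root = ET.fromstring(pmml_text)
    model = root.find("pmml:RegressionModel", NS)
    table = model.find("pmml:RegressionTable", NS)  # first table: targetCategory="yes"
    z = float(table.get("intercept"))
    for predictor in table.findall("pmml:NumericPredictor", NS):
        z += float(predictor.get("coefficient")) * record[predictor.get("name")]
    # Logit normalization for a binary target: p = 1 / (1 + exp(-z)).
    return 1.0 / (1.0 + math.exp(-z))

print(score(PMML_DOC, {"complaints": 3, "tickets": 2}))  # Matt-like record
print(score(PMML_DOC, {"complaints": 0, "tickets": 1}))  # Scott-like record
```

Because the model lives in a standard XML document rather than in tool-specific code, any PMML-compliant engine can, in principle, score it without re-implementation.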

Page 28

UPPI: Supported Techniques

▪ Neural Networks (neural gas, radial-basis and backpropagation)
▪ Support Vector Machines (for classification and regression)
▪ Naive Bayes Classifier (for continuous and categorical inputs)
▪ Rule Set Models
▪ Clustering Models (2-step clustering, distribution and center-based)
▪ Decision Trees (for classification and regression)
▪ General Regression Models (Cox, General and Generalized Linear Models)
▪ Regression Models (Linear, Logistic and Polynomial Regression Models)
▪ Scorecards (with support for Reason Codes)
▪ Restricted Boltzmann Machines
▪ Association Rules
▪ Multiple Models (with the possibility of having models spread over multiple PMML files)
▪ Model Ensemble (including Random Forest Models and Boosted Trees)
▪ Model Segmentation
▪ Model Chaining
▪ Model Composition
▪ Model Cascade

© Zementis, Inc. - Confidential

Page 29

Demonstration Flow

Descriptive (Karen) → Predictive Modeling (Alex) → Prescriptive (Karen) → Predictive Production (Karen)

Page 30

Descriptive Analytics

Page 31

Descriptive Analytics
▪ Answers: What caused people to churn?
▪ Clustering
▪ Column Dependencies
▪ Decision Tree
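Clustering is the first technique listed. It groups customers with similar behavior so that churned and retained groups can be compared. Below is a minimal k-means (Lloyd's algorithm) sketch in plain Python. The two features, the data points, the iteration count, and the simple initialization (using the first k points as starting centroids) are assumptions for illustration. Production tools typically use smarter initialization such as k-means++.

```python
import math

def kmeans(points, k, iterations=20):
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid update."""
    centroids = [list(p) for p in points[:k]]  # naive initialization: first k points
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest centroid (Euclidean distance).
        assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignment, centroids

# Hypothetical customers: [complaints in last 6 months, website visits in last 7 days]
customers = [[3, 1], [0, 3], [4, 0], [0, 4], [3, 0], [1, 3]]
labels, centers = kmeans(customers, k=2)
print(labels)   # [0, 1, 0, 1, 0, 1]
print(centers)
```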

Page 32

Demonstration Flow

Descriptive (Karen) → Predictive Modeling (Alex) → Prescriptive (Karen) → Predictive Production (Karen)

Page 33

Predictive Analytics

Page 34

Predictive Analytics
▪ Who will churn?

Page 35

Demonstration Flow

Descriptive (Karen) → Predictive Modeling (Alex) → Prescriptive (Karen) → Predictive Production (Karen)

Page 36

Prescriptive Analytics

Page 37

Prescriptive Analytics
▪ Who will churn? Why will they churn?
▪ What can we do to support that outcome?
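One way to move from “who will churn” to “why” and “what can we do” is to rank each feature's contribution to a linear churn score. This is the idea behind scorecard reason codes. The top-ranked reasons can then be mapped to retention actions. Below is a minimal sketch. The coefficients, the baseline values, and the action table are all hypothetical. Real reason-code schemes, such as those in PMML scorecards, define contributions more carefully.

```python
# Hypothetical linear churn model: coefficient per feature and a "typical customer" baseline.
COEFFICIENTS = {"complaints": 0.8, "tickets": 0.5, "friends_churned": 0.9, "visits": -0.3}
BASELINE = {"complaints": 0.5, "tickets": 0.5, "friends_churned": 0.0, "visits": 2.0}

# Hypothetical mapping from reason code to a prescribed retention action.
ACTIONS = {
    "complaints": "Escalate open complaints to a senior support agent",
    "tickets": "Follow up on recent support tickets",
    "friends_churned": "Offer a loyalty discount",
    "visits": "Send a personalized re-engagement newsletter",
}

def reason_codes(customer, top_n=2):
    """Return up to top_n features that push this customer's churn score above the baseline."""
    contributions = {
        name: coef * (customer[name] - BASELINE[name])
        for name, coef in COEFFICIENTS.items()
    }
    ranked = sorted(contributions, key=contributions.get, reverse=True)
    return [name for name in ranked[:top_n] if contributions[name] > 0]

matt = {"complaints": 3, "tickets": 2, "friends_churned": 2, "visits": 1}
for reason in reason_codes(matt):
    print(reason, "->", ACTIONS[reason])
# complaints -> Escalate open complaints to a senior support agent
# friends_churned -> Offer a loyalty discount
```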

Page 38

Demonstration Flow

Descriptive (Karen) → Predictive Modeling (Alex) → Prescriptive (Karen) → Predictive Production (Karen)

Page 39

Q&A

Page 40

Next Steps:

▪ More about Datameer and Big Data: www.datameer.com
▪ More about Zementis: www.zementis.com
▪ Contact us: Alex Guazzelli [email protected], Karen Hsu [email protected]