Best Practices for Big Data Analytics with Machine Learning by Datameer

Post on 31-Aug-2014

Don't forget! You can watch the full Datameer recording here: http://info.datameer.com/Online-Slideshare-Big-Data-Analytics-Machine-Learning-OnDemand.html Learn, through industry use cases, how to empower users to identify patterns and relationships for recommendations using big data analytics.

Transcript of Best Practices for Big Data Analytics with Machine Learning by Datameer

© 2013 Datameer, Inc. All rights reserved.

Best Practices for Big Data Analytics with Machine Learning

Dr. Alex Guazzelli has co-authored the first book on PMML, the Predictive Model Markup Language. At Zementis, Dr. Guazzelli is responsible for developing core technology and analytical solutions for Big Data and real-time scoring. Most recently, Dr. Guazzelli started teaching a class on standards for predictive analytics at UC San Diego Extension.

About our Speakers

Dr. Alex Guazzelli Zementis Vice President, Analytics (@DrAlexGuazzelli)

•  Came from Informatica
•  Worked with start-ups
•  Informatica purchased to bring data solutions to market
•  Data quality
•  Master data management
•  B2B
•  Data security solutions

About our Speakers

•  Over 15 years of enterprise software experience
•  Co-authored 4 patents
•  Worked in a variety of engineering, marketing and sales roles
•  Bachelor of Science degree in Management Science and Engineering from Stanford University

Karen Hsu Datameer Senior Director, Product Marketing (@Karenhsumar)

Agenda
•  Considerations
•  Best Practices
•  Demonstration
•  Q&A

Considerations

Target Users

▪ Business Professional: Visual (Clustering, Decision Trees, Dependencies, + More!)
▪ IT: Flexible, powerful
▪ Data Scientist: Algorithms; SAS, SPSS, R

Questions: Descriptive! Predictive! Prescriptive!

▪ Descriptive machine learning…
  – Tells you what has happened
▪ Predictive machine learning…
  – Answers the question of what will happen
▪ Prescriptive machine learning…
  – What will happen, when it will happen, why it will happen
  – Predict what will happen and prescribe how to take advantage of this future

Best Practices

Lean Analytics

▪ Identify Use Case
▪ 1. Integrate
▪ 2. Prepare
▪ 3. Analyze
▪ 4. Visualize
▪ Deploy

Data Preparation

Profile, Cleanse, Enrich

Transform, Bin, Normalize, Join, Union, Outliers, Missing Values, Invalid Values
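A few of the preparation steps named on this slide (missing values, outliers, normalize, bin) can be sketched in plain Python. This is only an illustration under stated assumptions: the function names, the median fill, the clipping bounds and the age bins are all hypothetical choices, not how Datameer implements these steps.

```python
from bisect import bisect_right
from statistics import median


def impute_missing(values):
    """Missing values: replace None with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def clip_outliers(values, lower, upper):
    """Outliers / invalid values: clamp each value into [lower, upper]."""
    return [min(max(v, lower), upper) for v in values]


def normalize(values):
    """Normalize: min-max scale the values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def bin_value(value, edges, labels):
    """Bin: map a number to the label of the interval it falls into."""
    return labels[bisect_right(edges, value)]


# Merchandise spend: the first two values appear in the churn example
# later in the deck; the missing and the extreme value are made up.
spend = [1234.0, 9876.0, None, 250000.0]
prepared = normalize(clip_outliers(impute_missing(spend), 0.0, 20000.0))

# Hypothetical age bins applied to the two ages from the churn example.
age_bins = [bin_value(age, [30, 50], ["<30", "30-49", "50+"]) for age in (34, 54)]
```

The order matters in this sketch: imputing before clipping keeps the fill value from being skewed by the clamp, and normalizing last keeps the extreme value from compressing everything else toward zero.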

Descriptive Analytics

Drag & Drop Smart Analytics

Predictive Analytics

Descriptive vs. Predictive Analytics
▪ Descriptive Analytics answers “What happened?”
▪ Predictive Analytics answers “What will happen next?”

Predictive Analytics helps you discover patterns in the past, which can signal what is ahead.

Predictive analytics is able to discover hidden patterns in historical data that the human expert may not see. It is in fact the result of mathematics applied to data. As such, it benefits from clever mathematical techniques as well as good data.

Example: Predicting Churn

Matt - Churned 2 days ago

Scott - “Liked” our company last week

John - ??

Churn-related features

Matt
▪ 3 complaints in last 6 months
▪ Opened 2 support tickets in last 4 weeks
▪ Spent a total of $1,234 buying merchandise
▪ Spent a total of $123 in services
▪ Purchased 2 items in last 4 weeks
▪ Is 34 years old
▪ Is a male
▪ Lives in Los Angeles
▪ ...

Scott
▪ No complaints in last 6 months
▪ Opened 1 support ticket in last 4 weeks
▪ Spent a total of $9,876 buying merchandise
▪ Spent a total of $987 in services
▪ Purchased 12 items in last 4 weeks
▪ Is 54 years old
▪ Is a male
▪ Lives in Chicago
▪ ...

Big Data

An ever-expanding ocean of data containing people and sensor data (lots and lots of it):

90% of the data today was created in the last 2 years

Breadth and Depth

▪ Transaction records
▪ Social media
▪ Climate information
▪ Mobile GPS signals
▪ Healthcare
▪ Smart Grid
▪ Digital Breadcrumbs

Churn-related “Big Data” features

Matt
▪ 12 friends listed as customers
▪ 2 complaints from friends in last 6 months
▪ Average age of friends is 41 years old
▪ 2 friends churned in last 30 days
▪ No purchases for same items as friends
▪ 1 website visit in last 7 days
▪ 2 website pages opened during last visit
▪ Opened 3 newsletters in last 6 months
▪ ...

Scott
▪ 34 friends listed as customers
▪ 1 complaint from friends in last 6 months
▪ Average age of friends is 62 years old
▪ No friends churned in last 30 days
▪ Purchased same 2 items as friends in last 2 months
▪ 3 website visits in last 7 days
▪ 5 website pages opened during last visit
▪ Opened 12 newsletters in last 6 months
▪ ...
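Friend-based features like the ones listed above have to be derived by joining a customer's social connections against customer records. The sketch below shows one way this could look for three of them. The data layout (`friends`, `profiles`), the identifiers and every value in the toy data are hypothetical; the slides do not say how these features were computed.

```python
from datetime import date, timedelta


def friend_features(customer_id, friends, profiles, today):
    """Derive social churn features for one customer.

    friends:  dict mapping customer id -> list of friend ids
    profiles: dict mapping customer id -> {"age": int, "churned_on": date or None}
    Friends without a profile are not customers and are ignored.
    """
    cutoff = today - timedelta(days=30)
    friend_ids = [f for f in friends.get(customer_id, []) if f in profiles]
    ages = [profiles[f]["age"] for f in friend_ids]
    recently_churned = sum(
        1
        for f in friend_ids
        if profiles[f]["churned_on"] is not None
        and profiles[f]["churned_on"] >= cutoff
    )
    return {
        "friends_listed_as_customers": len(friend_ids),
        "average_age_of_friends": sum(ages) / len(ages) if ages else None,
        "friends_churned_last_30_days": recently_churned,
    }


# Toy data, entirely made up: "z" is a friend who is not a customer.
profiles = {
    "a": {"age": 40, "churned_on": date(2014, 1, 20)},
    "b": {"age": 42, "churned_on": None},
    "c": {"age": 41, "churned_on": date(2013, 10, 1)},
}
friends = {"matt": ["a", "b", "c", "z"]}

features = friend_features("matt", friends, profiles, today=date(2014, 1, 31))
```

At big-data scale the same join would run as a distributed job rather than in-memory dictionaries, but the per-customer logic stays the same.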

Predictive Model

Building a predictive model ...

[Figure: Model Training. Data (churn-related features) → Input Layer → Hidden Layer → Output Layer → Prediction (Churned / Not-churned)]

▪ Neural Networks
▪ Linear/Logistic Regression
▪ Support Vector Machines
▪ Scorecards
▪ Decision Trees
▪ Clustering
▪ Association Rules
▪ K-Nearest Neighbors
▪ Naive Bayes Classifiers
▪ ...
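To make the training step concrete, here is a minimal sketch of one of the listed techniques, logistic regression, fitted by batch gradient descent in plain Python. It is an illustration, not the model built in the webinar. Two feature values per customer are reused from the Matt and Scott slides; the remaining rows, the labels for those rows, the learning rate and the epoch count are assumptions, and Scott is assumed not to have churned.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def churn_probability(features, weights, bias):
    """Score one customer: probability of churn under the trained model."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, features)))


def train_logistic(rows, labels, learning_rate=0.05, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for features, label in zip(rows, labels):
            error = churn_probability(features, weights, bias) - label
            for j, x in enumerate(features):
                grad_w[j] += error * x
            grad_b += error
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / n
    return weights, bias


# Features: [complaints in last 6 months, items purchased in last 4 weeks].
# Rows 0 and 1 reuse Matt's and Scott's values from the slides; the other
# rows and their labels are made up for illustration.
rows = [[3, 2], [0, 12], [4, 1], [1, 9], [2, 3], [0, 10]]
labels = [1, 0, 1, 0, 1, 0]  # 1 = churned, 0 = not churned

weights, bias = train_logistic(rows, labels)
matt_risk = churn_probability([3, 2], weights, bias)
scott_risk = churn_probability([0, 12], weights, bias)
```

Scoring a new customer such as John from the earlier slide would be a single call to `churn_probability` with his feature values.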

Why not several models?

Model Ensemble

[Figure: Raw Inputs → Data Pre-Processing → Model 1, Model 2, . . ., Model n → Voting → Prediction]

▪ Scores from all models are computed
▪ Majority Voting, Weighted Voting, Weighted Average, etc.
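The three combination rules named on the ensemble slide can be sketched in a few lines of Python. The function names, the labels, and the example scores and weights are hypothetical; real ensembles (and PMML's own ensemble support) offer more options than shown here.

```python
from collections import Counter


def majority_vote(labels):
    """Return the label predicted by the most models.

    Ties go to the label that appears first in the input.
    """
    return Counter(labels).most_common(1)[0][0]


def weighted_vote(labels, weights):
    """Return the label with the largest total model weight."""
    totals = {}
    for label, weight in zip(labels, weights):
        totals[label] = totals.get(label, 0.0) + weight
    return max(totals, key=totals.get)


def weighted_average(scores, weights):
    """Combine numeric scores into one score, weighting each model."""
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)


# Made-up outputs of three models for one customer.
predicted_labels = ["churn", "stay", "churn"]
churn_scores = [0.8, 0.3, 0.6]
model_weights = [0.2, 0.7, 0.1]  # e.g. proportional to validation accuracy

by_majority = majority_vote(predicted_labels)
by_weight = weighted_vote(predicted_labels, model_weights)
combined_score = weighted_average(churn_scores, model_weights)
```

With these numbers the two voting rules disagree: two of three models say "churn", but the single most trusted model says "stay", which is exactly the kind of case where the choice of combination rule matters.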

End Goal: Predicting churn ...

Model Deployment and Execution in

[Figure: Big Data → Churn-related Features → Predictive Churn Model → Churn Risk Score]

From Model Building to Model Deployment (Traditionally ...)

▪ Scientist’s Desktop: SAS, R, IBM SPSS, Perl, Python
▪ Production Environment: Java, .NET, C, SQL
▪ Lost in Translation
▪ SAS, R, IBM SPSS …: Great for model building but not for scoring, even more so when it comes to Hadoop

From Model Building to Model Deployment (with PMML)

Model Building

▪ Angoss
▪ BigML
▪ FICO Model Builder
▪ IBM SPSS
▪ KNIME
▪ KXEN
▪ MicroStrategy
▪ Open Data
▪ Pervasive DataRush
▪ RapidMiner
▪ R / Rattle
▪ SAS
▪ SAP Business Objects
▪ Salford Systems
▪ StatSoft STATISTICA
▪ SQL Server
▪ TIBCO Spotfire
▪ Custom Code, etc.

Model Deployment and Execution

[Figure: model-building tools → PMML (models) → Universal PMML Plug-in (UPPI) → Datameer Server]

Deploy in minutes ...

Predictive Model Markup Language

▪ PMML is an XML-based language used to define statistical and data mining models and to share these between compliant applications.
▪ It is a mature standard developed by the DMG (Data Mining Group) to avoid proprietary issues and incompatibilities and to deploy models.
▪ PMML eliminates the need for custom model deployment and ensures reliability.

PMML defines a standard not only to represent data-mining models, but also data handling and data transformations (pre- and post-processing).
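To give a feel for what "an XML-based language used to define models" means in practice, the sketch below embeds a small hand-written PMML regression model and scores a record against it using only Python's standard library. The model name, field names and coefficients are made up, the fragment has not been validated against the DMG schema, and the tiny scorer handles only this one model shape. It is not how the UPPI or the Datameer Server executes PMML.

```python
import xml.etree.ElementTree as ET

# Hand-written PMML 4.1 fragment with hypothetical fields and coefficients.
PMML_DOC = """
<PMML version="4.1" xmlns="http://www.dmg.org/PMML-4_1">
  <Header description="Toy churn score, for illustration only"/>
  <DataDictionary numberOfFields="3">
    <DataField name="complaints" optype="continuous" dataType="double"/>
    <DataField name="items_purchased" optype="continuous" dataType="double"/>
    <DataField name="churn_score" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel modelName="toy_churn" functionName="regression">
    <MiningSchema>
      <MiningField name="complaints"/>
      <MiningField name="items_purchased"/>
      <MiningField name="churn_score" usageType="predicted"/>
    </MiningSchema>
    <RegressionTable intercept="0.1">
      <NumericPredictor name="complaints" coefficient="0.2"/>
      <NumericPredictor name="items_purchased" coefficient="-0.01"/>
    </RegressionTable>
  </RegressionModel>
</PMML>
""".strip()

NS = {"pmml": "http://www.dmg.org/PMML-4_1"}


def score_regression(pmml_text, record):
    """Evaluate intercept + sum(coefficient * value ** exponent)."""
    root = ET.fromstring(pmml_text)
    table = root.find("pmml:RegressionModel/pmml:RegressionTable", NS)
    score = float(table.get("intercept"))
    for predictor in table.findall("pmml:NumericPredictor", NS):
        exponent = int(predictor.get("exponent", "1"))
        value = record[predictor.get("name")]
        score += float(predictor.get("coefficient")) * value ** exponent
    return score


matt_score = score_regression(PMML_DOC, {"complaints": 3, "items_purchased": 2})
scott_score = score_regression(PMML_DOC, {"complaints": 0, "items_purchased": 12})
```

Because the model lives in the XML rather than in the scoring code, the same scorer could evaluate a model exported from any tool that writes this model type, which is the portability argument the slides make.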

Models

Data Transformations

UPPI: Supported Techniques

▪ Neural Networks (neural gas, radial-basis and backpropagation)
▪ Support Vector Machines (for classification and regression)
▪ Naive Bayes Classifier (for continuous and categorical inputs)
▪ Rule Set Models
▪ Clustering Models (2-step clustering, distribution and center-based)
▪ Decision Trees (for classification and regression)
▪ General Regression Models (Cox, General and Generalized Linear Models)
▪ Regression Models (Linear, Logistic and Polynomial Regression Models)
▪ Scorecards (with support for Reason Codes)
▪ Restricted Boltzmann Machines
▪ Association Rules
▪ Multiple Models (with the possibility of having models spread over multiple PMML files)
▪ Model Ensemble (including Random Forest Models and Boosted Trees)
▪ Model Segmentation
▪ Model Chaining
▪ Model Composition
▪ Model Cascade

© Zementis, Inc. - Confidential

Demonstration Flow

▪ Descriptive: Karen
▪ Predictive Modeling: Alex
▪ Prescriptive: Karen
▪ Predictive Production: Karen

Descriptive Analytics

▪ Answers: What caused people to churn?
▪ Clustering
▪ Column Dependencies
▪ Decision Tree

Predictive Analytics

▪ Who will churn?

Prescriptive Analytics

▪ Who will churn? Why will they churn?
▪ What can we do to support that outcome?

Q&A

Next Steps:

More about Datameer and Big Data www.datameer.com

More about Zementis www.zementis.com

Contact us:
Alex Guazzelli aguazzeli@zementis.com
Karen Hsu khsu@datameer.com