wise.io meetup at Hacker Dojo


description

Slides by http://wise.io from the Silicon Valley Hands On Programming Events meetup group on July 13, 2013.

Transcript of wise.io meetup at Hacker Dojo

Page 1: wise.io meetup at Hacker Dojo

Automate Your Workflow

wise.io Machine Learning Tools

Joseph W. Richards, PhD

Co-founder & Chief Scientist, wise.io

[email protected]

@wiseio

Wednesday, July 31, 13

Page 2: wise.io meetup at Hacker Dojo

Machine Learning - Set of data-driven statistical models aimed at solving problems with real-world data (large, high-dimensional, messy)

• Focus on prediction problems (e.g., spam vs. not spam) instead of parameter inference (e.g., is the effect significant)

• Computational requirements / constraints are important

• Variety of input data from simple (structured) to complicated text, image, time series & video (unstructured)

• Flexible, non-parametric models are employed instead of simple, parametric models

Page 3: wise.io meetup at Hacker Dojo

Drew Conway

Page 4: wise.io meetup at Hacker Dojo

APPROACHES TO DATA ANALYSIS

Manual Analysis & Simple Business Rules

• Slow & labor intensive
• Non-optimal
• No statistical guarantees

Basic Analytics

• Automated
• Simple models
• Inaccurate & limited

Machine Learning

• Automated
• Highly accurate
• Adaptable & flexible
• Learn from new data


Page 5: wise.io meetup at Hacker Dojo

TYPES OF MACHINE LEARNING

information retrieval - search, indexing, document retrieval

classification - spam/fraud detection, sentiment analysis, ad targeting

regression - stock market prediction, sales prediction, cost forecasting

imputation - data cleaning, inference of missing information

recommendation - product recommendation, recruiting, ‘Netflix prize’

clustering - customer segmentation, product categorization

dimensionality reduction - visualization, manual insight, prediction

outlier detection - anomaly identification, process control


Page 6: wise.io meetup at Hacker Dojo

ML USE CASES

fraud detection

ad targeting

sentiment analysis

intelligent sensors

healthcare & genomics


Page 7: wise.io meetup at Hacker Dojo

PERVASIVE ML PAIN POINTS

Long lead time from inception to deployment

Existing toolkits are buckling under data gravity

Data scientist scarcity

Results of prototyping ... large-scale production environments


Page 8: wise.io meetup at Hacker Dojo

MACHINE INTELLIGENCE ENGINE

Democratizing Machine Intelligence

Your Data Sources

text, images, video, time series, graph

Feature Marketplace

Fast, Scalable Machine Learning

100x faster

patent-pending

Beautiful UI

API & Embeddable Models

Actionable Insight

CLOUD-BASED AND ON-PREMISE SOLUTIONS AVAILABLE


Page 9: wise.io meetup at Hacker Dojo

EXAMPLE WORKFLOW: FRAUD DETECTION

1. Connect historical data (CSV, SQL, MongoDB, S3, Dropbox, etc.)

2. Ask the appropriate question: “how can I predict fraud (yes/no) for new transactions?”

3. Perform feature engineering, e.g., if I have text data then do NLP

4. Train an optimized classification model from the historical data to predict fraud as a function of input features

5. Use the optimized model in your production workflow: RESTful APIs (Python, Ruby, Java) or embedded model file
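The five steps above can be sketched end to end in plain Python. This is only an illustration under stated assumptions, not the wise.io API: the inline CSV, the column names (amount, memo, fraud), the "urgent" keyword feature, and the helper names load_rows, featurize, train_stump and predict are all hypothetical, and a one-split decision stump stands in for the optimized classifier.

```python
import csv
import io
import json

# Step 1: "connect" historical data. An inline CSV (hypothetical columns)
# stands in for a real source such as a file, SQL table or S3 object.
HISTORY_CSV = """amount,memo,fraud
12.50,coffee,0
2500.00,wire transfer urgent,1
40.00,groceries,0
3100.00,urgent gift cards,1
18.75,lunch,0
2900.00,wire transfer,1
55.20,fuel,0
"""

def load_rows(text):
    """Parse CSV text into a list of dicts, one per transaction."""
    return list(csv.DictReader(io.StringIO(text)))

def featurize(row):
    """Step 3: feature engineering. A keyword flag is a crude stand-in for NLP."""
    return {
        "amount": float(row["amount"]),
        "urgent": 1.0 if "urgent" in row["memo"].lower() else 0.0,
    }

def train_stump(features, labels):
    """Step 4: exhaustively pick the (feature, threshold) with fewest training
    errors, using the rule 'predict fraud when value >= threshold'."""
    best = None
    for name in features[0]:
        for threshold in sorted({f[name] for f in features}):
            predictions = [1 if f[name] >= threshold else 0 for f in features]
            errors = sum(p != y for p, y in zip(predictions, labels))
            if best is None or errors < best["errors"]:
                best = {"feature": name, "threshold": threshold, "errors": errors}
    return best

def predict(model, row):
    """Step 5: score a new transaction with the trained model."""
    return 1 if featurize(row)[model["feature"]] >= model["threshold"] else 0

rows = load_rows(HISTORY_CSV)
features = [featurize(r) for r in rows]
labels = [int(r["fraud"]) for r in rows]  # Step 2: the target is fraud yes/no

model = train_stump(features, labels)
model_file = json.dumps(model)      # stands in for an "embedded model file"
restored = json.loads(model_file)   # what a production service would load

print(predict(restored, {"amount": "2750.00", "memo": "wire transfer"}))  # 1
print(predict(restored, {"amount": "9.99", "memo": "snack"}))             # 0
```

On this toy data the stump settles on amount >= 2500.0 with zero training errors. A real deployment would replace the stump with a stronger model and serve it behind an API, as step 5 describes.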


Page 10: wise.io meetup at Hacker Dojo

WiseRF™ THE ML ENGINE

RANDOM FORESTS

‣ accurate nonlinear algorithm
‣ heterogeneous data: categories, numbers, integers, boolean
‣ versatile
‣ little tuning required
‣ no normalization needed
‣ handle missing data
‣ robust to outliers

WiseRF™

Our fast, memory-efficient and scalable implementation of Random Forest

‣ Faster training: better optimization of models
‣ Faster predictions: smaller time between data collection and decision making
‣ Scalable learning: no need to subsample or approximate
‣ Memory efficient: embedded devices!
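To make the random forest idea concrete, here is a minimal pure-Python sketch of the general technique: bootstrap sampling, a random feature subset at each split, and majority voting. It is not WiseRF and makes no claim about how WiseRF is implemented. The function names, the toy rows and the square-root feature-subset heuristic are assumptions chosen for illustration, and missing-value handling is not shown.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels, feature_ids):
    """Return the (feature, threshold) that most reduces weighted Gini, or None."""
    best, best_score = None, gini(labels)
    for f in feature_ids:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] < t]
            right = [y for r, y in zip(rows, labels) if r[f] >= t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, t), score
    return best

def grow_tree(rows, labels, rng, n_try, depth=0, max_depth=5):
    """Grow one tree; each node considers only n_try randomly chosen features."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth == max_depth or len(set(labels)) == 1:
        return majority  # leaf: a bare class label
    split = best_split(rows, labels, rng.sample(range(len(rows[0])), n_try))
    if split is None:
        return majority
    f, t = split
    lo = [i for i, r in enumerate(rows) if r[f] < t]
    hi = [i for i, r in enumerate(rows) if r[f] >= t]
    return (f, t,
            grow_tree([rows[i] for i in lo], [labels[i] for i in lo],
                      rng, n_try, depth + 1, max_depth),
            grow_tree([rows[i] for i in hi], [labels[i] for i in hi],
                      rng, n_try, depth + 1, max_depth))

def tree_predict(tree, row):
    """Walk from the root to a leaf and return its class label."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] < t else right
    return tree

def train_forest(rows, labels, n_trees=25, seed=0):
    """Train n_trees trees, each on a bootstrap sample of the data."""
    rng = random.Random(seed)
    n_try = max(1, int(len(rows[0]) ** 0.5))  # common sqrt(#features) heuristic
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # sample with replacement
        forest.append(grow_tree([rows[i] for i in idx],
                                [labels[i] for i in idx], rng, n_try))
    return forest

def forest_predict(forest, row):
    """Majority vote over all trees."""
    return Counter(tree_predict(t, row) for t in forest).most_common(1)[0][0]

# Toy rows mixing a float, a boolean and an integer, with no normalization.
rows = [[12.5, False, 1], [40.0, False, 2], [18.75, False, 1], [55.2, False, 3],
        [2500.0, True, 9], [3100.0, True, 12], [2900.0, True, 8], [2700.0, True, 10]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]

forest = train_forest(rows, labels)
print(forest_predict(forest, [5000.0, True, 20]))  # 1
print(forest_predict(forest, [5.0, False, 1]))     # 0
```

Because every split compares raw values against a threshold, the float, boolean and integer columns can be used side by side without rescaling, which is the property the "no normalization needed" bullet refers to.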

Page 11: wise.io meetup at Hacker Dojo

WiseRF™ BENCHMARKS

Digit recognition (MNIST): 45,000 training images, 784 dimensions

Learning on Large Data: 8GB dataset in 99 sec (SVM takes ~1 week)

Prediction on Fast Data: 20 M predictions per second
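As a rough sanity check, the quoted figures can be turned into per-unit rates. The arithmetic below assumes "8GB" means decimal gigabytes and that "~1 week" for the SVM refers to the same 8GB dataset. Both are readings of the slide, not stated facts, and the variable names are illustrative.

```python
# Figures quoted on the slide.
dataset_bytes = 8 * 10**9            # "8GB", assumed to be decimal gigabytes
train_seconds = 99
predictions_per_second = 20 * 10**6  # "20 M predictions per second"
mnist_images, mnist_dims = 45_000, 784
svm_seconds = 7 * 24 * 3600          # "~1 week", taken as exactly 7 days

train_throughput_mb_s = dataset_bytes / train_seconds / 10**6
ns_per_prediction = 10**9 / predictions_per_second
mnist_values = mnist_images * mnist_dims
speedup = svm_seconds / train_seconds

print(f"{train_throughput_mb_s:.1f} MB/s of training data")  # 80.8 MB/s of training data
print(f"{ns_per_prediction} ns per prediction")              # 50.0 ns per prediction
print(f"{mnist_values} values in the MNIST training matrix") # 35280000 values in the MNIST training matrix
print(f"about {round(speedup)}x faster than one week")       # about 6109x faster than one week
```

Since the SVM time is only approximate, the last figure is best read as an order of magnitude (thousands of times faster) rather than a precise ratio.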
