DSX vs. DataRobot - Future Network · algorithm implementation, data pipelines, and code, in...

22
https://www.meetup.com/Technologietrends-und-Innovation-fur-die-Praxis/

Transcript of DSX vs. DataRobot - Future Network · algorithm implementation, data pipelines, and code, in...

Page 1: DSX vs. DataRobot - Future Network · algorithm implementation, data pipelines, and code, in general, hidden from view. Quite simply, it is the means by which your business can optimize

https://www.meetup.com/Technologietrends-und-Innovation-fur-die-Praxis/

Page 2: DSX vs. DataRobot - Future Network · algorithm implementation, data pipelines, and code, in general, hidden from view. Quite simply, it is the means by which your business can optimize

Introducing AutoAI

with IBM Watson Studio

Benedikt Klotz

[email protected]/in/benediktklotz

IT Architect, IBM Hybrid Cloud

October 3rd 2019

Page 3: DSX vs. DataRobot - Future Network · algorithm implementation, data pipelines, and code, in general, hidden from view. Quite simply, it is the means by which your business can optimize

AutoAI definitions from various sources

Automated machine learning (AutoML) is the process of automating the end-to-end process of applying machine learning to real-world problems.

Auto ML services provide machine learning at the click of a button, or, at the very least, promise to keep algorithm implementation, data pipelines, and code, in general, hidden from view.

Quite simply, it is the means by which your business can optimize resources, encourage collaboration and rapidly and dependably distribute data across the enterprise and use that data to predict, plan and achieve revenue goals.

AutoML is the capability to automatically ingest, clean, transform, and model with hyperparameter optimization

1

2

3

4

Page 4: DSX vs. DataRobot - Future Network · algorithm implementation, data pipelines, and code, in general, hidden from view. Quite simply, it is the means by which your business can optimize

AutoAI definitions from various sources

Automated machine learning (AutoML) is the process of automating the end-to-end process of applying machine learning to real-world problems.

Auto ML services provide machine learning at the click of a button, or, at the very least, promise to keep algorithm implementation, data pipelines, and code, in general, hidden from view.

Quite simply, it is the means by which your business can optimize resources, encourage collaboration and rapidly and dependably distribute data across the enterprise and use that data to predict, plan and achieve revenue goals.

AutoML is the capability to automatically ingest, clean, transform, and model with hyperparameter optimization

1

2

3

4

Blogger

IBM

Wikipedia

Blogger

Page 5: DSX vs. DataRobot - Future Network · algorithm implementation, data pipelines, and code, in general, hidden from view. Quite simply, it is the means by which your business can optimize

IBM Watson operationalizes AI across the enterprise

5

IBM TOOLS

Data Scientist App Developer

Build AI Run AI

3RD PARTY IDE & FRAMEWORKS

IBM AI RUNTIME

Watson OpenScale

Automated Anomaly and Drift detection

Business KPIs

Watson Studio Watson Machine Learning

Manage AI at scale

3RD PARTY RUNTIMES

Build Deploy and run Operate trusted AI

Business user

Consume AI

Fairness and Explainability

Inputs for Continuous Evolution

Accuracy

Validation and Feedback

SPSS Modeler

Custom (Kubernetes etc.)

Microsoft Azure ML

Amazon Web Services

Keras

Pytorch

Scikit-learn

Spark ML

Caffe2 …

Watson Knowledge Catalog

Data Profiling

Quality and Lineage

Data Governance

Organize and Govern data

Data Engineer

Organize Data for AI

Page 6: DSX vs. DataRobot - Future Network · algorithm implementation, data pipelines, and code, in general, hidden from view. Quite simply, it is the means by which your business can optimize

7

Case for AI Automation: AI Workflow’s Bigger & More Complex

Majority spent on

data wrangling!

Ground Truth Gathering

Data Cleansing

Feature Engineering

Model Selection

Parameter Optimization

Ensemble

Model Validation

Model Deployment

Runtime

Monitoring

Model Improvement

Source: https://www.kaggle.com/paultimothymooney/2018-kaggle-machine-learning-data-science-survey

AI Lifecycle

Management

Page 7: DSX vs. DataRobot - Future Network · algorithm implementation, data pipelines, and code, in general, hidden from view. Quite simply, it is the means by which your business can optimize

Coming up with features is difficult, time-consuming, requires expert knowledge. "Applied machine learning" is basically feature engineering.

-Andrew Ng

The algorithms we used are very standard for Kagglers. […] We spent most of our efforts in feature engineering. [...]

-Xavier Conort

Feature Engineering is essential,

difficult and costly.

Why care about Automation?

...some machine learning projects succeed and some fail. What makes the difference? Easily the most important factor is the features used.

- Pedro Domingos

Page 8: DSX vs. DataRobot - Future Network · algorithm implementation, data pipelines, and code, in general, hidden from view. Quite simply, it is the means by which your business can optimize

https://www.kaggle.com/paultimothymooney/2018-kaggle-machine-learning-data-science-survey

https://economicgraph.linkedin.com/research/linkedin-2018-emerging-jobs-report

https://economicgraph.linkedin.com/research/LinkedIns-2017-US-Emerging-Jobs-Report

Data Scientist Jobs Explosion

2016 2017 2018

How long have you been

writing code

How long have you used Machine Learning

methods

• ~50% of Data Scientist respondents on Kaggle said they had less

than 2 years of experience on ML methods. Same for coding

experience

• Significant percentage uptick in AI roles being adopted in vertical

industries (LinkedIn 2018 jobs report) (again more

domain/application knowledge, less ML and CS)

The Gap: Data scientists acknowledge lack of expertise

Page 9: DSX vs. DataRobot - Future Network · algorithm implementation, data pipelines, and code, in general, hidden from view. Quite simply, it is the means by which your business can optimize

AI for AI: Livecycle of Automation of AI Development and Operation

AI Designing AI AI Optimizing AI AI Governing AI

Neural network architecture and search

Lifecycle managementAI pipeline optimizationDecision optimization

Monitoring AI outcome with trust and explainability

Page 10: DSX vs. DataRobot - Future Network · algorithm implementation, data pipelines, and code, in general, hidden from view. Quite simply, it is the means by which your business can optimize

IBM’s Portfolio for Automation of AI Development

Transfer Learning

• Transfer knowledge learning in one deep learning system to apply to a different domain

• Featured in Watson Visual Recognition or NLP Services, available via Watson Studio

Neural Network Search

• Just bring data and automatically generate a custom deep neural network through searching the best architectures for the input data

• NeuNetS as a feature of Watson Studio, available in Open Beta

AutoAI Experiments & Pipeline optimization

• Auto clean data, engineer features, and complete HPO to find the optimal end to end pipeline

• AutoAI as a feature of Watson Studio

May 2019

Page 11: DSX vs. DataRobot - Future Network · algorithm implementation, data pipelines, and code, in general, hidden from view. Quite simply, it is the means by which your business can optimize

© 2018 IBM Corporation

What does AutoAI with IBM Watson Studio do?

• Integrated with Watson Studio and Watson Machine learning

• Automatically ingest, clean, transform, and model with hyperparameter optimization

• Training feedback visualizations provide real-time results to see model performance

• One-click deployment to Watson Machine Learning

Page 12: DSX vs. DataRobot - Future Network · algorithm implementation, data pipelines, and code, in general, hidden from view. Quite simply, it is the means by which your business can optimize

© 2018 IBM Corporation

How does it work?

AutoAI Pipeline

Optimization

Raw Labeled

Data SetPrep

Model

selectionHPO

Feature

EngineeringHPO Ensembling

Finds best preprocessing imputation / encoding and scaling strategies

Finds top-K estimators

HPO on selected estimator

HPO on estimator after Feature Engineering

Finds best data transformation sequence

Computes ensemble predictions based on pipeline predictions

Page 13: DSX vs. DataRobot - Future Network · algorithm implementation, data pipelines, and code, in general, hidden from view. Quite simply, it is the means by which your business can optimize

Train

Infrastructure

AutoAI Experiment

Watson Studio

Import/Add

Project Data

Watson Machine Learning

Artifact

Repository

Deploy

Model saved

to WML

Raw

Labeled

Data Set

PrepModel

selectionHPO

Feature

EngineeringHPO Ensembling

hosted

model

ApppredictionREST

API

How does it work?

Page 14: DSX vs. DataRobot - Future Network · algorithm implementation, data pipelines, and code, in general, hidden from view. Quite simply, it is the means by which your business can optimize
Page 15: DSX vs. DataRobot - Future Network · algorithm implementation, data pipelines, and code, in general, hidden from view. Quite simply, it is the means by which your business can optimize
Page 16: DSX vs. DataRobot - Future Network · algorithm implementation, data pipelines, and code, in general, hidden from view. Quite simply, it is the means by which your business can optimize

User Benefits of using AutoAI

Build models fasterAutomate data preparation and model development

Jump the skills gapNo coding? No problem – get started with a couple clicks

Discover more use casesSupercharge collaboration with AI everywhere to disrupt and transform

Find signal from noiseAuto-feature engineering makes it easy to extract more predictive power from your data

Rank and explore modelsQuickly compare candidate pipelines to find the best model for the job

Ready, set, deployPipelines generated with AutoAI can be deployed to REST APIs with one click

Page 17: DSX vs. DataRobot - Future Network · algorithm implementation, data pipelines, and code, in general, hidden from view. Quite simply, it is the means by which your business can optimize

Get Started Today

Experience AutoAI: https://www.ibm.com/demos/collection/IBM-Watson-Studio-AutoAI/

AutoAI:

• Accelerate• Automate • Optimize

Page 18: DSX vs. DataRobot - Future Network · algorithm implementation, data pipelines, and code, in general, hidden from view. Quite simply, it is the means by which your business can optimize
Page 19: DSX vs. DataRobot - Future Network · algorithm implementation, data pipelines, and code, in general, hidden from view. Quite simply, it is the means by which your business can optimize

AI DesigningAI

Page 20: DSX vs. DataRobot - Future Network · algorithm implementation, data pipelines, and code, in general, hidden from view. Quite simply, it is the means by which your business can optimize

23 | © 2018 - IBM Corporation Copyright | IBM Confidential

NeuNetS: A Breakthrough in AI Designing AI

Weeks to a monthOvernight / dayIn a few hours light weight

Page 21: DSX vs. DataRobot - Future Network · algorithm implementation, data pipelines, and code, in general, hidden from view. Quite simply, it is the means by which your business can optimize

24Think 2019 / DOC ID / Month XX, 2019 / © 2019 IBM

Corporation

Design Neural Networks is challenging

• Highly skilled researchers/data-scientists are needed to

hand-craft custom neural networks

• Hand-crafting complex networks is time-consuming, error

prone, and does not scale with time and resources

• Neural networks continue to grow in size and complexity

Page 22: DSX vs. DataRobot - Future Network · algorithm implementation, data pipelines, and code, in general, hidden from view. Quite simply, it is the means by which your business can optimize