Automatic Machine Learning, AutoML


By: Himadri Mishra, 13074014

Overview: What is Machine Learning?

● Subfield of computer science
● Evolved from the study of pattern recognition and computational learning theory in artificial intelligence
● Gives computers the ability to learn without being explicitly programmed
● Explores the study and construction of algorithms that can learn from and make predictions on data


Basic Flow of Machine Learning

Overview: Why Machine Learning?

● Some tasks are difficult to define algorithmically. Example: learning to recognize objects.
● Produces high-value predictions that can guide better decisions and smart actions in real time without human intervention
● Machine learning is a technology that helps analyze large chunks of big data

What is Automatic Machine Learning?

● Research area that targets progressive automation of machine learning
● Also known as AutoML
● Focuses on end users without expert knowledge
● Offers new tools to Machine Learning experts:
  ○ Perform architecture search over deep representations
  ○ Analyse the importance of hyperparameters
  ○ Development of flexible software packages that can be instantiated automatically in a data-driven way
● Follows the paradigm of Programming by Optimization (PbO)


Examples of AutoML

● AutoWEKA: Approach for the simultaneous selection of a machine learning algorithm and its hyperparameters

● Deep Neural Networks: notoriously dependent on their hyperparameters, and modern optimizers have achieved better results in setting them than humans (Bergstra et al, Snoek et al).

● Making a science of model search: a complex computer vision architecture could automatically be instantiated to yield state-of-the-art results on 3 different tasks: face matching, face identification, and object recognition.
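The idea of selecting an algorithm and its hyperparameters simultaneously can be sketched as a search over one joint space. The example below is only an illustration under stated assumptions: it uses scikit-learn and plain random search, whereas Auto-WEKA itself is built on WEKA and a Bayesian optimizer; the `SEARCH_SPACE` dictionary and `joint_random_search` function are hypothetical names, and the value ranges are arbitrary.

```python
import random

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Hypothetical joint search space: each entry pairs an algorithm with a
# function that samples one hyperparameter setting for that algorithm.
SEARCH_SPACE = {
    "random_forest": lambda rng: RandomForestClassifier(
        n_estimators=rng.choice([10, 25, 50]),
        max_depth=rng.choice([2, 4, None]),
        random_state=0,
    ),
    "logistic_regression": lambda rng: LogisticRegression(
        C=10 ** rng.uniform(-2, 2), max_iter=1000
    ),
}


def joint_random_search(X, y, n_trials=6, seed=0):
    """Sample (algorithm, hyperparameters) pairs; keep the best by CV accuracy."""
    rng = random.Random(seed)
    best_name, best_model, best_score = None, None, float("-inf")
    for _ in range(n_trials):
        name = rng.choice(sorted(SEARCH_SPACE))  # pick the algorithm
        model = SEARCH_SPACE[name](rng)          # pick its hyperparameters
        score = cross_val_score(model, X, y, cv=3).mean()
        if score > best_score:
            best_name, best_model, best_score = name, model, score
    return best_name, best_model, best_score


X, y = load_iris(return_X_y=True)
best_name, best_model, best_score = joint_random_search(X, y)
print(best_name, round(best_score, 3))
```

Swapping the random sampler for a model-based optimizer changes only how the next candidate is proposed; the joint search space stays the same.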

Methods of AutoML

● Bayesian optimization
● Regression models for structured data and big data
● Meta learning
● Transfer learning
● Combinatorial optimization


An AutoML Framework

Modules of AutoML Framework, unraveled

● Data Pre-Processing
● Problem Identification and Data Splitting
● Feature Engineering
● Feature Stacking
● Application of various models to data
● Decomposition
● Feature Selection
● Model selection and HyperParameter tuning
● Evaluation of Model


Data Pre-Processing

● Tabular data is the most common way of representing data in machine learning or data mining
● Data must be converted to a tabular form


Problem Identification and Data Splitting

Types of Labels

● Single column, binary values (Binary Classification)
● Single column, real values (Regression problem)
● Multiple columns, binary values (Multi-Class Classification)
● Multiple columns, real values (Multiple-target Regression problem)
● Multilabel Classification
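The label types above suggest a simple rule for identifying the problem from the label array alone. The sketch below is one possible reading, not the framework's actual code: the function name `identify_problem` and the returned strings are hypothetical, and the one-hot test used to separate multi-class from multilabel is an assumption. Note that a single column of integer class ids (0, 1, 2, ...) would be treated as regression under this rule.

```python
import numpy as np


def identify_problem(y):
    """Guess the task type from the shape and values of the label array."""
    y = np.asarray(y)
    if y.ndim == 1:
        y = y.reshape(-1, 1)
    is_binary = bool(np.isin(y, [0, 1]).all())
    n_columns = y.shape[1]

    if n_columns == 1:
        return "binary_classification" if is_binary else "regression"
    if not is_binary:
        return "multi_target_regression"
    # One-hot rows (exactly one 1 per row) encode a single class per sample;
    # any other pattern means several labels can be active at once.
    if (y.sum(axis=1) == 1).all():
        return "multiclass_classification"
    return "multilabel_classification"


print(identify_problem([0, 1, 1, 0]))
print(identify_problem([[1, 0, 0], [0, 0, 1]]))
```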

● Stratified KFold splitting for Classification
● Normal KFold split for regression
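A minimal sketch of this splitting rule, assuming scikit-learn's `StratifiedKFold` and `KFold`; the helper `make_splitter` and the toy data are hypothetical. Stratification keeps the class ratio of the full dataset in every validation fold, which matters when classes are imbalanced.

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold


def make_splitter(problem_type, n_splits=5, seed=0):
    """Stratified folds for classification, plain folds for regression."""
    if problem_type == "classification":
        return StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    return KFold(n_splits=n_splits, shuffle=True, random_state=seed)


# Toy imbalanced labels: 15 samples of class 0 and 5 samples of class 1.
X = np.arange(40).reshape(20, 2)
y_class = np.array([0] * 15 + [1] * 5)

splitter = make_splitter("classification")
for train_idx, valid_idx in splitter.split(X, y_class):
    # Per-class counts in the validation fold.
    print(np.bincount(y_class[valid_idx], minlength=2))
```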


Feature Engineering

Types of Variables

● Numerical Variables
  ○ No Processing Required
● Categorical Variables
  ○ Label Encoders
  ○ One Hot Encoders
● Text Variables
  ○ Count Vectorize
  ○ TF-IDF vectorize
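The encoders named above map directly onto scikit-learn classes. The following is a small illustration with made-up data (the `colors` and `texts` values are assumptions), showing what each transformation produces for a categorical column and a text column.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

colors = np.array(["red", "green", "blue", "green"])  # categorical variable
texts = [                                             # text variable
    "automatic machine learning",
    "machine learning pipeline",
    "feature engineering",
    "learning to learn",
]

# Label encoding: one integer per category (categories are sorted first).
label_codes = LabelEncoder().fit_transform(colors)

# One-hot encoding: one binary column per category, returned as a sparse matrix.
one_hot = OneHotEncoder(handle_unknown="ignore").fit_transform(colors.reshape(-1, 1))

# Count and TF-IDF vectorization: one column per vocabulary term.
counts = CountVectorizer().fit_transform(texts)
tfidf = TfidfVectorizer().fit_transform(texts)

print(label_codes, one_hot.shape, counts.shape, tfidf.shape)
```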


Feature Stacking

● Two Kinds of Stacking
  ○ Model Stacking
    ■ An ensemble approach
    ■ Combines the power of diverse models into a single model
  ○ Feature Stacking
    ■ Different features, after processing, get combined
● Our Stacker Module is a feature stacker
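Feature stacking in this sense amounts to concatenating the processed feature blocks column-wise. A minimal sketch, assuming NumPy and SciPy; the helper `stack_features` is a hypothetical name and not the Stacker Module itself. It falls back to a sparse result as soon as any block is sparse, so one-hot or TF-IDF blocks are not densified.

```python
import numpy as np
from scipy import sparse


def stack_features(blocks):
    """Column-wise concatenation of feature blocks that share the same rows."""
    if any(sparse.issparse(block) for block in blocks):
        return sparse.hstack([sparse.csr_matrix(b) for b in blocks], format="csr")
    return np.hstack(blocks)


numeric = np.array([[0.5, 1.0], [1.5, 2.0], [2.5, 3.0]])  # dense numeric block
one_hot = sparse.csr_matrix(np.eye(3))                    # sparse encoded block

stacked = stack_features([numeric, one_hot])
print(stacked.shape)  # (3, 5)
```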


Application of models and Decomposition

● We should go for ensemble tree-based models:
  ○ Random Forest Regressor/Classifier
  ○ Extra Trees Regressor/Classifier
  ○ Gradient Boosting Machine Regressor/Classifier
● Can’t apply linear models without normalization
  ○ For dense features: Standard Scaler normalization
  ○ For sparse features: scale to unit variance only, without centering about the mean
● If the above steps give a “good” model, we can go for the hyperparameter optimization module, else continue
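The normalization rule above can be sketched with scikit-learn's `StandardScaler`, whose `with_mean` flag controls centering. This is an illustration on synthetic data, not the framework's code: `make_scaler` is a hypothetical helper, and the data and model settings are assumptions. Tree ensembles are fitted on the raw features, while the linear model is fitted behind a scaler.

```python
import numpy as np
from scipy import sparse
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def make_scaler(X):
    """Standardize dense data fully; scale sparse data to unit variance only."""
    # Centering a sparse matrix would fill in its zeros, so skip the mean shift.
    return StandardScaler(with_mean=not sparse.issparse(X))


rng = np.random.default_rng(0)
X_dense = rng.normal(loc=3.0, scale=5.0, size=(200, 4))
y = (X_dense[:, 0] + X_dense[:, 1] > 6.0).astype(int)

# Tree ensembles do not depend on feature scale, so no normalization is needed.
tree_model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_dense, y)

# Linear models get a scaler in front of them.
linear_model = make_pipeline(make_scaler(X_dense), LogisticRegression()).fit(X_dense, y)

# Sparse features: unit variance without centering keeps the matrix sparse.
mask = rng.random((200, 10)) < 0.2
X_sparse = sparse.csr_matrix(np.where(mask, rng.random((200, 10)), 0.0))
X_sparse_scaled = make_scaler(X_sparse).fit_transform(X_sparse)

print(linear_model.score(X_dense, y), sparse.issparse(X_sparse_scaled))
```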

● For high-dimensional data, PCA is used to decompose
● For images, start with 10-15 components and increase it as long as results improve
● For other kinds of data, start with 50-60 components
● For text data, we use Singular Value Decomposition after converting text to a sparse matrix
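A short sketch of the two decomposition paths, assuming scikit-learn's `PCA` for dense data and `TruncatedSVD` for the sparse matrix obtained from text. The data is synthetic and the component counts are only starting points in the spirit of the guidance above (the tiny text corpus forces a much smaller number).

```python
import numpy as np
from sklearn.decomposition import PCA, TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

# Dense, high-dimensional data: PCA with a small starting number of components.
rng = np.random.default_rng(0)
X_dense = rng.normal(size=(100, 80))
X_dense_reduced = PCA(n_components=12, random_state=0).fit_transform(X_dense)

# Text data: convert to a sparse TF-IDF matrix, then apply truncated SVD,
# which works on sparse input without densifying it.
texts = [
    "automatic machine learning",
    "machine learning pipeline",
    "feature engineering for text",
    "learning to learn",
]
X_text = TfidfVectorizer().fit_transform(texts)
X_text_reduced = TruncatedSVD(n_components=2, random_state=0).fit_transform(X_text)

print(X_dense_reduced.shape, X_text_reduced.shape)
```

In practice the number of components would be increased step by step for as long as the validation score keeps improving.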


Feature Selection

● Greedy Forward Selection
  ○ Selecting best features iteratively
  ○ Selecting features based on coefficients of model
● Greedy backward elimination
● Use GBM for normal features and Random Forest for sparse features for feature evaluation
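Greedy forward selection can be written in a few lines: at each step, add the single feature that improves the cross-validated score the most, and stop when nothing improves. The sketch below is an assumption-laden illustration using scikit-learn on synthetic data; `greedy_forward_selection` is a hypothetical helper, and the second part shows model-based ranking with GBM feature importances as a cheaper alternative.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score


def greedy_forward_selection(model, X, y, max_features, cv=3):
    """Iteratively add the feature that most improves the mean CV score."""
    selected, best_score = [], float("-inf")
    remaining = list(range(X.shape[1]))
    while remaining and len(selected) < max_features:
        scores = {
            j: cross_val_score(model, X[:, selected + [j]], y, cv=cv).mean()
            for j in remaining
        }
        candidate = max(scores, key=scores.get)
        if scores[candidate] <= best_score:
            break  # no remaining feature improves the score
        selected.append(candidate)
        remaining.remove(candidate)
        best_score = scores[candidate]
    return selected, best_score


X, y = make_regression(
    n_samples=120, n_features=8, n_informative=3, noise=1.0, random_state=0
)
selected, score = greedy_forward_selection(LinearRegression(), X, y, max_features=3)

# Model-based evaluation: rank features by GBM importance.
gbm = GradientBoostingRegressor(n_estimators=50, random_state=0).fit(X, y)
top_by_importance = np.argsort(gbm.feature_importances_)[::-1][:3]

print(selected, round(score, 3), top_by_importance)
```

Backward elimination follows the same pattern in reverse: start from all features and repeatedly drop the one whose removal hurts the score the least.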


Model selection and HyperParameter tuning


● Most important and fundamental process of Machine Learning

Choice of Model and Hyperparameters

● Classification:
  ○ Random Forest
  ○ GBM
  ○ Logistic Regression
  ○ Naive Bayes
  ○ Support Vector Machines
  ○ k-Nearest Neighbors
● Regression:
  ○ Random Forest
  ○ GBM
  ○ Linear Regression
  ○ Ridge
  ○ Lasso
  ○ SVR
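Model selection and hyperparameter tuning can be combined by running a search per candidate model and keeping the best result. The sketch below assumes scikit-learn's `GridSearchCV` and covers only three of the classifiers listed; the `CANDIDATES` dictionary, the `select_model` helper, and every grid value are hypothetical choices for illustration, not recommended settings from the slides.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical candidates: (estimator, hyperparameter grid) per model family.
CANDIDATES = {
    "random_forest": (
        RandomForestClassifier(random_state=0),
        {"n_estimators": [25, 50], "max_depth": [3, None]},
    ),
    "logistic_regression": (
        make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
        {"logisticregression__C": [0.1, 1.0, 10.0]},
    ),
    "knn": (
        make_pipeline(StandardScaler(), KNeighborsClassifier()),
        {"kneighborsclassifier__n_neighbors": [3, 5, 9]},
    ),
}


def select_model(X, y, cv=3):
    """Tune each candidate with a grid search and return the best one by AUC."""
    searches = {
        name: GridSearchCV(estimator, grid, cv=cv, scoring="roc_auc").fit(X, y)
        for name, (estimator, grid) in CANDIDATES.items()
    }
    best_name = max(searches, key=lambda name: searches[name].best_score_)
    return best_name, searches[best_name]


X, y = load_breast_cancer(return_X_y=True)
best_name, best_search = select_model(X, y)
print(best_name, best_search.best_params_, round(best_search.best_score_, 3))
```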


Evaluation of Model


Saving all Transformations on Train Data for reuse


Re-Use of saved transformations for Evaluation on validation set
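Saving the transformations fitted on the training data and replaying them on the validation set might look like the following. This is a sketch under assumptions: it uses a scikit-learn `Pipeline` and Python's `pickle` (serialized to bytes here; writing to a file works the same way), and the particular steps and data are made up. The key point is that the validation data is only ever passed through `transform`, never `fit`.

```python
import pickle

import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X_train = rng.normal(loc=5.0, scale=2.0, size=(80, 6))
X_valid = rng.normal(loc=5.0, scale=2.0, size=(20, 6))

# Fit every transformation on the training data only.
transforms = Pipeline([("scale", StandardScaler()), ("pca", PCA(n_components=3))])
X_train_t = transforms.fit_transform(X_train)

# Save the fitted transformations for later reuse.
saved = pickle.dumps(transforms)

# Later: restore and apply the same fitted steps to the validation set.
restored = pickle.loads(saved)
X_valid_t = restored.transform(X_valid)

print(X_train_t.shape, X_valid_t.shape)
```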


Current Research


Automatic Architecture selection for Neural Network

Automatically Tuned Neural Network

● Auto-Net is a system that automatically configures neural networks
● Achieved the best performance on two datasets in the human expert track of the recent ChaLearn AutoML Challenge
● Works by tuning:
  ○ layer-independent network hyperparameters
  ○ per-layer hyperparameters
● Auto-Net submission reached an AUC score of 90%, while the best human competitor (Ideal Intel Analytics) only reached 80%
● First time an automatically constructed neural network won a competition dataset
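The split between layer-independent and per-layer hyperparameters can be pictured as a nested configuration. The sketch below only samples such a configuration at random; it is not Auto-Net's actual search space or optimizer, and every name and range in it (`sample_network_config`, `learning_rate`, `units`, and so on) is a hypothetical choice for illustration.

```python
import random


def sample_network_config(rng, max_layers=4):
    """Draw one configuration with global and per-layer hyperparameters."""
    # Layer-independent hyperparameters apply to the whole network.
    config = {
        "num_layers": rng.randint(1, max_layers),
        "learning_rate": 10 ** rng.uniform(-5, -1),
        "batch_size": rng.choice([32, 64, 128, 256]),
        "layers": [],
    }
    # Per-layer hyperparameters are drawn separately for each layer.
    for _ in range(config["num_layers"]):
        config["layers"].append(
            {
                "units": rng.choice([64, 128, 256, 512]),
                "activation": rng.choice(["relu", "tanh", "sigmoid"]),
                "dropout": rng.uniform(0.0, 0.8),
            }
        )
    return config


config = sample_network_config(random.Random(0))
print(config["num_layers"], len(config["layers"]))
```

Because the number of per-layer entries depends on `num_layers`, the search space is conditional, which is one reason such spaces are usually handed to a dedicated optimizer rather than a plain grid.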


Conclusion


● Machine learning (ML) has achieved considerable successes in recent years and an ever-growing number of disciplines rely on it.

● However, its success crucially relies on human machine learning experts to perform various tasks manually

● The rapid growth of machine learning applications has created a demand for off-the-shelf machine learning methods that can be used easily and without expert knowledge

● AutoML is an open research topic and will very soon be challenging state-of-the-art results in various domains


Thank You