Machine Learning, hype or hit?

Posted on 21-Apr-2017


ANP126

Machine Learning: Hype or Hit?

Fred Verheul

2

Agenda

1. Introduction: Hype or Hit?!

2. Machine Learning

3. Demo, SAP ICN

4. Skill set for aspiring ML experts

5. Take-aways

3

Agenda

1. Introduction: Hype or Hit?!

2. Machine Learning

3. Demo, SAP ICN

4. Skill set for aspiring ML experts

5. Take-aways

4

Machine Learning

“Field of study that gives computers the ability to learn without being explicitly programmed” (Arthur Samuel, 1959)

5

What is Machine Learning?

Traditional Programming: Data + Program → Computer → Output

Machine Learning: Data + Output → Computer → Program
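A minimal Python sketch of this contrast (a hypothetical illustration, not taken from the talk): in the traditional case a human writes the rule; in the machine-learning case the rule is derived from example inputs and outputs. Here an ordinary least-squares line fit stands in for a learning algorithm, and the Celsius/Fahrenheit data and function names are invented for the example.

```python
def fahrenheit_rule(celsius):
    """Traditional programming: a human writes the rule explicitly."""
    return 1.8 * celsius + 32


def learn_linear_rule(xs, ys):
    """Machine learning in miniature: derive the rule y = a*x + b from
    example inputs and outputs with an ordinary least-squares fit."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    a = covariance / variance
    b = mean_y - a * mean_x
    return (lambda x: a * x + b), a, b


# Data + Output go in ...
celsius = [0, 10, 20, 30, 100]
fahrenheit = [32, 50, 68, 86, 212]
# ... and a Program (the learned rule) comes out.
learned_rule, slope, intercept = learn_linear_rule(celsius, fahrenheit)
print(slope, intercept)   # approximately 1.8 and 32
print(learned_rule(37))   # approximately 98.6, like fahrenheit_rule(37)
```

The learned function is never written by hand; only the examples are supplied, which is the point of the right-hand side of the diagram.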

6

Examples: Recommender systems

7

Examples: Natural Language Processing

Siri

Google Translate

8

Examples, continued…

Spam filtering

Handwriting recognition

9

ML in the news: IBM Watson

10

ML in the news: DeepMind’s AlphaGo

11

ML in the news: business example

12

Vendor Platforms…

13

Tricking a neural network…

A cat! Surely also a cat?!

More examples and explanation by Julia Evans (@b0rk)
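Why can an imperceptible change fool a classifier? A minimal Python sketch, assuming a hypothetical linear classifier over 1,000 "pixel" features instead of a real neural network (the slide does not show the model): nudging every feature by a small amount against the sign of its weight, the idea behind the fast gradient sign method (FGSM), adds up across many features and flips the decision.

```python
def score(weights, bias, x):
    """Linear 'cat score': positive means the model says cat."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def predict(weights, bias, x):
    return "cat" if score(weights, bias, x) > 0 else "not a cat"


def sign(v):
    return (v > 0) - (v < 0)


def fgsm_perturb(weights, x, epsilon):
    """For a linear score the gradient w.r.t. the input is the weight vector,
    so stepping each feature by -epsilon * sign(weight) lowers the score as
    much as possible when no feature may change by more than epsilon."""
    return [xi - epsilon * sign(w) for w, xi in zip(weights, x)]


# Hypothetical model and "image": 1,000 features with alternating weights.
n = 1000
weights = [0.1 if i % 2 == 0 else -0.1 for i in range(n)]
bias = 0.0
image = [0.51 if i % 2 == 0 else 0.50 for i in range(n)]

adversarial = fgsm_perturb(weights, image, epsilon=0.01)
print(predict(weights, bias, image))        # cat
print(predict(weights, bias, adversarial))  # not a cat
```

No single feature moves by more than 0.01, yet the total score drops by 0.01 × 100 = 1.0, enough to cross the decision boundary. Real attacks on neural networks use the network's gradient in the same way.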

14

Machine Learning gone wrong

15

Data Mining Fail (by Carina C. Zona)

16

Prediction is hard…

17

Agenda

1. Introduction: Hype or Hit?!

2. Machine Learning

3. Demo, SAP ICN

4. Skill set for aspiring ML experts

5. Take-aways

18

CRISP-DM: data mining process

(Figure: CRISP-DM process cycle, with two phases annotated "ML important")

19

Data: terminology

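• Feature: an input variable (a column of the data table)

• Target / label: the variable to be predicted

• Instance: one example (a row of the data table)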

20

Examples of ML tasks

Supervised learning

Regression: target is numeric

Classification: target is categorical

Unsupervised learning

Clustering

Dimensionality reduction
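The supervised tasks need a target column; clustering does not. As an illustration of the unsupervised side, here is a minimal Python sketch of k-means, one common clustering algorithm (the talk does not name a specific one). The data points are hypothetical, and the naive initialisation (first k points) is an assumption chosen to keep the example deterministic.

```python
import math


def kmeans(points, k, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then move
    each centroid to the mean of its assigned points; repeat."""
    centroids = [list(p) for p in points[:k]]  # naive initialisation: first k points
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        for c, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = [sum(coord) / len(members) for coord in zip(*members)]
    return centroids, clusters


# Hypothetical unlabeled data: two features per instance, no target.
points = [(1, 1), (10, 10), (1.5, 2), (9, 11), (2, 1), (10, 9)]
centroids, clusters = kmeans(points, k=2)
print(centroids)
print(clusters)
```

In practice, the initial centroids are usually chosen randomly or with a smarter scheme (e.g. k-means++), and the algorithm is run several times.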

21

Exploratory Data Analysis

22

Data preparation

• Data Cleaning

• Missing Data

• Feature Engineering

  • Normalization

  • Categorical data → numerical features

  • Log-based features or target

  • Date/time-related features

  • Combine features, e.g. by +, −, ×, /
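The feature-engineering steps above can be sketched in plain Python. This is a minimal illustration on a hypothetical house-listing dataset (areas, prices, cities and dates are invented); min-max scaling and one-hot encoding are assumed as the normalization and categorical-encoding techniques, which are common choices but not the only ones.

```python
import math
from datetime import date


def min_max_normalize(values):
    """Rescale to [0, 1]; assumes the values are not all equal."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(values):
    """Categorical data -> numerical features: one 0/1 column per category."""
    categories = sorted(set(values))
    return categories, [[1 if v == c else 0 for c in categories] for v in values]


def date_features(d):
    """Date/time-related features derived from a single date."""
    return {"year": d.year, "month": d.month, "weekday": d.weekday(),
            "is_weekend": d.weekday() >= 5}


# Hypothetical house-listing data (invented for illustration).
areas = [50, 80, 120]                  # square metres
prices = [150_000, 240_000, 480_000]   # euro
cities = ["Utrecht", "Amsterdam", "Utrecht"]
listed = [date(2017, 4, 21), date(2017, 4, 22), date(2017, 5, 1)]

norm_area = min_max_normalize(areas)                   # normalization
categories, city_vectors = one_hot(cities)             # categorical -> numerical
log_price = [math.log(p) for p in prices]              # log-based target
price_per_m2 = [p / a for p, a in zip(prices, areas)]  # combined feature (/)
listing_dates = [date_features(d) for d in listed]     # date/time features
print(norm_area, city_vectors, price_per_m2, listing_dates[1])
```

A log-transformed target is often useful when values such as prices are skewed; predictions are then mapped back with `math.exp`.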

23

Modeling: so many algorithms…

24

ML Algorithms: by Representation

Representation: collection of candidate models/programs, aka the hypothesis space

Decision trees

Instance-based

Neural networks

Model ensembles
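The instance-based family is the easiest to show in a few lines. A minimal Python sketch of k-nearest neighbours (k-NN), the textbook instance-based method: the "model" is simply the stored training instances, and a new instance gets the majority label of its k closest neighbours. The training data is hypothetical, and Euclidean distance and k = 3 are assumptions.

```python
import math
from collections import Counter


def knn_predict(training, query, k=3):
    """Instance-based classification: no model is built in advance; the
    stored training instances are searched when a prediction is requested."""
    nearest = sorted(training, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical training set: (feature vector, label) pairs.
training = [((1, 1), "small"), ((1, 2), "small"), ((2, 1), "small"),
            ((8, 8), "large"), ((8, 9), "large"), ((9, 8), "large")]
print(knn_predict(training, (2, 2)))  # small
print(knn_predict(training, (7, 8)))  # large
```

Because all work is postponed until prediction time, k-NN is also the classic example of lazy learning.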

ML Algorithms: by Evaluation

Evaluation: Quality measure for a model

25

Regression

Example metric: Root Mean Squared Error

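RMSE = $\sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}$, where $y_i$ is the actual and $\hat{y}_i$ the predicted target value of instance $i$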

Binary classification: confusion matrix

Accuracy: (8 + 971) / 1000 = 97.9%

Example: medical test for a disease

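Accuracy alone is misleading here: a test that always answers "no disease" would score (971 + 19) / 1000 = 99.0% while detecting none of the 10 patients. Better evaluation metrics:

• Precision: 8 / (8 + 19) ≈ 29.6%

• Recall: 8 / (8 + 2) = 80%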
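The evaluation metrics on this slide can be computed directly. A minimal Python sketch: the confusion-matrix cells for the medical-test example (true positives 8, false positives 19, false negatives 2, true negatives 971) are inferred from the slide's precision and recall formulas and its total of 1000 cases; the function names are our own.

```python
import math


def rmse(actual, predicted):
    """Root Mean Squared Error for regression."""
    n = len(actual)
    return math.sqrt(sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted)) / n)


def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision and recall from the four cells of a confusion matrix."""
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
    }


# Medical-test example: cells inferred from the slide's formulas.
metrics = classification_metrics(tp=8, fp=19, fn=2, tn=971)
print(metrics)  # accuracy 0.979, precision about 0.296, recall 0.8

# Baseline that always answers "no disease": tp = fp = 0, so precision is undefined.
always_no_accuracy = (0 + 971 + 19) / 1000
print(always_no_accuracy)  # 0.99

print(rmse([3.0, 5.0, 2.5], [2.5, 5.0, 3.0]))
```

The baseline shows why accuracy is a poor metric for rare events: it beats the real test while being useless.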

26

Optimization: how the algorithm ‘learns’, depends on representation and evaluation

ML Algorithms: by Optimization

Greedy Search (an example of combinatorial optimization)

Gradient Descent (or in general: Convex Optimization)

Linear Programming (or in general: Constrained/Nonlinear Optimization)
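To see how representation, evaluation and optimization fit together, here is a minimal Python sketch of gradient descent, assuming a linear model y ≈ a·x + b as the representation and mean squared error as the evaluation. The toy data (generated from y = 2x + 1), the learning rate and the number of steps are arbitrary choices for this example.

```python
def gradient_descent_linear(xs, ys, learning_rate=0.05, steps=1000):
    """Fit y ~ a*x + b by minimising mean squared error with batch gradient descent."""
    a, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [(a * x + b) - y for x, y in zip(xs, ys)]
        # Partial derivatives of MSE = (1/n) * sum(error^2) with respect to a and b.
        grad_a = 2 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2 / n * sum(errors)
        # Step against the gradient: this is where the algorithm "learns".
        a -= learning_rate * grad_a
        b -= learning_rate * grad_b
    return a, b


a, b = gradient_descent_linear([0, 1, 2, 3, 4], [1, 3, 5, 7, 9])
print(a, b)  # close to 2 and 1
```

Too large a learning rate makes the updates overshoot and diverge; too small a one makes convergence slow.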

27

Algorithms by Optimization: Heuristics

• Hill climbing

• Simulated Annealing

• Nelder-Mead Simplex Method

• Artificial Bee Colony Optimization

• Genetic Algorithms

• Particle Swarm Optimization

• Ant Colony Optimization
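The simplest heuristic on this list is hill climbing. A minimal Python sketch of a stochastic variant (try a random neighbour, keep it only if it is better), maximising a hypothetical one-dimensional objective; the step size, iteration count and seed are arbitrary assumptions.

```python
import random


def hill_climb(objective, start, step=0.1, iterations=10_000, seed=0):
    """Maximise `objective` by repeatedly trying a random neighbour of the
    current solution and moving there only if it scores better."""
    rng = random.Random(seed)
    current, current_value = start, objective(start)
    for _ in range(iterations):
        candidate = current + rng.uniform(-step, step)
        candidate_value = objective(candidate)
        if candidate_value > current_value:
            current, current_value = candidate, candidate_value
    return current, current_value


best_x, best_value = hill_climb(lambda x: -(x - 3) ** 2, start=0.0)
print(best_x)  # close to 3
```

Because it never accepts a worse candidate, hill climbing can get stuck on a local optimum. Simulated annealing differs mainly in sometimes accepting a worse candidate, with a probability that shrinks over time, which helps it escape such optima.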

28

Choice of ML-algorithm, considerations

• Size & dimensionality of the training set

• Computational efficiency

• Model building, number of parameters

• Eager vs lazy learning

• Online vs batch learning

• Interpretability

29

Evaluation: training vs test data

5-fold cross-validation
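The fold bookkeeping behind k-fold cross-validation is small enough to write out. A minimal Python sketch (our own helper, not from the talk) that splits instance indices into k folds; each fold serves once as test data while the remaining folds are used for training.

```python
def k_fold_indices(n_samples, k=5):
    """Yield (train_indices, test_indices) for k-fold cross-validation.
    Each instance lands in exactly one test fold; no shuffling is done here."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


for fold, (train, test) in enumerate(k_fold_indices(12, k=5), start=1):
    print(f"fold {fold}: train on {len(train)} instances, test on {test}")
```

The model is trained k times, and the k test errors are averaged into one estimate. In practice the data is usually shuffled first; scikit-learn's `KFold` has a `shuffle` parameter for this.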

30

Training error vs test error

31

Overfitting

32

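Distance metrics

For points $P = (p_1, p_2)$ and $Q = (q_1, q_2)$:

Euclidean distance (L2-norm: $\|\cdot\|_2$)

$\|P - Q\|_2 = \sqrt{(p_1 - q_1)^2 + (p_2 - q_2)^2}$ (the length of the straight line from P to Q)

Manhattan distance (L1-norm: $\|\cdot\|_1$)

$\|P - Q\|_1 = |p_1 - q_1| + |p_2 - q_2|$

Chebyshev distance (L∞-norm: $\|\cdot\|_\infty$)

$\|P - Q\|_\infty = \max(|p_1 - q_1|, |p_2 - q_2|)$

Number of moves of a king on a chessboard ;-)

(Figure: P and Q at (2,2) and (6,5) on a grid, showing the straight line between them, the horizontal segment y = 2 from x = 2 to 6, and the vertical segment x = 6 from y = 2 to 5)

Many more: Cosine distance, Edit distance (aka Levenshtein distance), …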
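The distance metrics on this slide translate directly into code. A minimal Python sketch (function names are our own) that computes the three norms for the points in the figure, (2,2) and (6,5), plus cosine distance and the Levenshtein edit distance, using the standard dynamic-programming formulation for the latter.

```python
import math


def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def manhattan(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))


def chebyshev(p, q):
    return max(abs(a - b) for a, b in zip(p, q))


def cosine_distance(p, q):
    """1 - cosine similarity: depends on the angle between vectors, not their length."""
    dot = sum(a * b for a, b in zip(p, q))
    norm_p = math.sqrt(sum(a * a for a in p))
    norm_q = math.sqrt(sum(b * b for b in q))
    return 1 - dot / (norm_p * norm_q)


def levenshtein(s, t):
    """Minimum number of single-character insertions, deletions and
    substitutions that turn s into t (row-by-row dynamic programming)."""
    previous = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        current = [i]
        for j, ct in enumerate(t, start=1):
            current.append(min(previous[j] + 1,                 # deletion
                               current[j - 1] + 1,              # insertion
                               previous[j - 1] + (cs != ct)))   # substitution
        previous = current
    return previous[-1]


P, Q = (2, 2), (6, 5)
print(euclidean(P, Q), manhattan(P, Q), chebyshev(P, Q))  # 5.0 7 4
print(levenshtein("kitten", "sitting"))                   # 3
```

For the figure's points the horizontal gap is 4 and the vertical gap is 3, so the Euclidean distance is 5, the Manhattan distance is 4 + 3 = 7, and the Chebyshev distance is max(4, 3) = 4.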

33

Agenda

1. Introduction: Hype or Hit?!

2. Machine Learning

3. Demo, SAP ICN

4. Skill set for aspiring ML experts

5. Take-aways

34

Agenda

1. Introduction: Hype or Hit?!

2. Machine Learning

3. Demo, SAP ICN

4. Skill set for aspiring ML experts

5. Take-aways

35

So you want to be a Data Scientist?

36

CRISP-DM: data mining process

37

Hacking skills

• Programming languages:

• Libraries (examples):

  • TensorFlow, Caffe, Theano, Keras

  • SciPy & scikit-learn

  • Spark MLlib (Scala/Java/Python)
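As one illustration of how such a library is used (scikit-learn is picked arbitrarily from the list; the talk itself shows no code), the sketch below evaluates a decision tree on scikit-learn's bundled Iris dataset with 5-fold cross-validation. It assumes scikit-learn is installed; the choice of model and dataset is ours.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Features X (one row per instance) and target y (species label).
X, y = load_iris(return_X_y=True)

# Representation: decision tree; evaluation: accuracy over 5 folds.
model = DecisionTreeClassifier(random_state=0)
scores = cross_val_score(model, X, y, cv=5)
print(f"mean accuracy over 5 folds: {scores.mean():.3f}")
```

The library hides the optimization details; the data scientist's work lies mostly in preparing features, choosing the model and judging the evaluation.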

38

Math skills: Statistics

Source: http://xkcd.com/552/

39

More math skills that may be needed…

Calculus, Linear Algebra

40

Data Science for Business

• Focuses more on general principles than specific algorithms

• Not math-heavy, does contain some math

• O’Reilly link: http://shop.oreilly.com/product/0636920028918.do

• Book website: http://data-science-for-biz.com/DSB/Home.html

41

Agenda

1. Introduction: Hype or Hit?!

2. Machine Learning

3. Demo, SAP ICN

4. Skill set for aspiring ML experts

5. Take-aways

42

What has NOT been covered

• Deep learning / Neural Networks

• Specifics of ML-algorithms

• Tools / Libraries / Code

• SAP Products, like HANA / Predictive Analytics / Vora / …

• Hardware

• …

43

Take-aways

• Goal of ML: generalize from training data (not optimization!!)

• Part of ‘Data Mining Process’, not a goal in and of itself

• No magic! Just some clever algorithms…

• Increasingly important non-technical aspects:

  • Ethics

  • Algorithmic transparency

Thank You

www.soapeople.com

info@soapeople.com

@SOAPEOPLE

Fred Verheul

Big Data Consultant

+31 6 3919 2986

fred.verheul@soapeople.com