Transcript of Machine Learning CUNY Graduate Center Lecture 1: Introduction.

Page 1:

Machine Learning

CUNY Graduate Center

Lecture 1: Introduction

Page 2:

Today

• Welcome
• Overview of Machine Learning
• Class Mechanics
• Syllabus Review
• Basic Classification Algorithm

Page 3:

My research and background

• Speech
  – Analysis of Intonation
  – Segmentation
• Natural Language Processing
  – Computational Linguistics
• Evaluation Measures
• All of this research relies heavily on Machine Learning

Page 4:

You

• Why are you taking this class?
• For Ph.D. students:
  – What is your dissertation on?
  – Do you expect it to require Machine Learning?
• What is your background and comfort with:
  – Calculus
  – Linear Algebra
  – Probability and Statistics
• What is your programming language of preference?
  – C++, Java, or Python are preferred

Page 5:

Machine Learning

• Automatically identifying patterns in data
• Automatically making decisions based on data
• Hypothesis:

  Data → Learning Algorithm → Behavior

  (in place of: Data → Programmer → Behavior)

Page 6:

Machine Learning in Computer Science

[Figure: Machine Learning shown as a hub connected to application areas:]

• Biomedical/Chemedical Informatics
• Financial Modeling
• Natural Language Processing
• Speech/Audio Processing
• Planning
• Locomotion
• Vision/Image Processing
• Robotics
• Human Computer Interaction
• Analytics

Page 7:

Major Tasks

• Regression
  – Predict a numerical value from “other information”
• Classification
  – Predict a categorical value
• Clustering
  – Identify groups of similar entities
• Evaluation

Page 8:

Feature Representations

• How do we view data?

[Figure: an Entity in the World (Web Page, User Behavior, Speech or Audio Data, Vision, Wine, People, etc.) passes through Feature Extraction to produce a Feature Representation, which is the input to a Machine Learning Algorithm. Our focus in this course is the Machine Learning Algorithm rather than Feature Extraction.]

Page 9:

Feature Representations

Height (in) | Weight (lb) | Eye Color | Gender
66          | 170         | Blue      | Male
73          | 210         | Brown     | Male
72          | 165         | Green     | Male
70          | 180         | Blue      | Male
74          | 185         | Brown     | Male
68          | 155         | Green     | Male
65          | 150         | Blue      | Female
64          | 120         | Brown     | Female
63          | 125         | Green     | Female
67          | 140         | Blue      | Female
68          | 165         | Brown     | Female
66          | 130         | Green     | Female
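Since later slides compute split accuracies over this table, it may help to have it in code. A minimal sketch (the tuple encoding and names are my own; the units are presumed; note the comment about one inconsistent row):

```python
# Toy dataset from the slides: (height_in, weight_lb, eye_color, gender).
# Note: this slide and page 41 list the 68in brown-eyed female at 165 lb,
# but the sorted tables and worked accuracy numbers on pages 44-49 use
# 145 lb; 145 is used here so later sketches reproduce those numbers.
DATA = [
    (66, 170, "blue", "male"),
    (73, 210, "brown", "male"),
    (72, 165, "green", "male"),
    (70, 180, "blue", "male"),
    (74, 185, "brown", "male"),
    (68, 155, "green", "male"),
    (65, 150, "blue", "female"),
    (64, 120, "brown", "female"),
    (63, 125, "green", "female"),
    (67, 140, "blue", "female"),
    (68, 145, "brown", "female"),
    (66, 130, "green", "female"),
]
```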

Page 10:

Classification

• Identify which of N classes a data point, x, belongs to.
• x is a column vector of features: $\mathbf{x} = (x_1, x_2, \ldots, x_D)^{\mathsf{T}}$

Page 11:

Target Values

• In supervised approaches, in addition to a data point, x, we will also have access to a target value, t.

Goal of Classification

Identify a function y, such that y(x) = t

Page 12:

Feature Representations

Height (in) | Weight (lb) | Eye Color | Gender
66          | 170         | Blue      | Male
73          | 210         | Brown     | Male
72          | 165         | Green     | Male
70          | 180         | Blue      | Male
74          | 185         | Brown     | Male
68          | 155         | Green     | Male
65          | 150         | Blue      | Female
64          | 120         | Brown     | Female
63          | 125         | Green     | Female
67          | 140         | Blue      | Female
68          | 165         | Brown     | Female
66          | 130         | Green     | Female

Here the rightmost column, Gender, is the target value, t.

Page 13:

Graphical Example of Classification

Page 14:

Graphical Example of Classification

[Figure: labeled data points from two classes, plus an unlabeled query point marked “?”]

Page 15:

Graphical Example of Classification

[Figure: the same labeled data points, with the query point “?” in a different location]

Page 16:

Graphical Example of Classification

Page 17:

Graphical Example of Classification

Page 18:

Graphical Example of Classification

Page 19:

Decision Boundaries

Page 20:

Regression

• Regression is a supervised machine learning task.
  – So a target value, t, is given.
• Classification: nominal t
• Regression: continuous t

Goal of Regression

Identify a function y, such that y(x) = t

Page 21:

Differences between Classification and Regression

• Similar goals: Identify y(x) = t.
• What are the differences?
  – The form of the function, y (naturally).
  – Evaluation (two common measures are sketched below)
    • Root Mean Squared Error
    • Absolute Value Error
    • Classification Error
    • Maximum Likelihood
  – Evaluation drives the optimization operation that learns the function, y.
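As a concrete illustration, here is a minimal sketch of two of these measures; the function names are my own:

```python
import math

def rmse(predictions, targets):
    # Root Mean Squared Error: square root of the mean squared residual.
    return math.sqrt(
        sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)
    )

def classification_error(predictions, targets):
    # Fraction of points whose predicted class differs from the target.
    return sum(p != t for p, t in zip(predictions, targets)) / len(targets)

print(rmse([1.0, 2.0, 3.0], [1.5, 2.0, 2.0]))        # ~0.645
print(classification_error(["m", "f"], ["m", "m"]))  # 0.5
```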

Page 22:

Graphical Example of Regression

[Figure: scatter of (x, t) points; the t value to predict at a new x is marked “?”]

Page 23:

Graphical Example of Regression

Page 24:

Graphical Example of Regression

Page 25:

Clustering

• Clustering is an unsupervised learning task.
  – There is no target value to shoot for.
• Identify groups of “similar” data points that are “dissimilar” from others.
• Partition the data into groups (clusters) that satisfy these constraints (one common algorithm is sketched below):
  1. Points in the same cluster should be similar.
  2. Points in different clusters should be dissimilar.
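The slides do not name a specific algorithm; as one common way to find such a partition, here is a minimal k-means sketch (k, the iteration count, and squared Euclidean distance are my choices):

```python
import random

def kmeans(points, k, iters=20, seed=0):
    # points: list of numeric feature tuples.
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        # Assignment step: each point joins its nearest center's cluster.
        clusters = [[] for _ in range(k)]
        for p in points:
            dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers]
            clusters[dists.index(min(dists))].append(p)
        # Update step: move each center to the mean of its cluster.
        for i, cluster in enumerate(clusters):
            if cluster:
                centers[i] = tuple(sum(xs) / len(cluster) for xs in zip(*cluster))
    return centers, clusters

centers, clusters = kmeans([(1, 1), (1, 2), (8, 8), (9, 8)], k=2)
```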

Page 26:

Graphical Example of Clustering

Page 27:

Graphical Example of Clustering

Page 28:

Graphical Example of Clustering

Page 29:

Mechanisms of Machine Learning

• Statistical Estimation
  – Numerical Optimization
  – Theoretical Optimization
• Feature Manipulation
• Similarity Measures

Page 30:

Mathematical Necessities

• Probability
• Statistics
• Calculus
  – Vector Calculus
• Linear Algebra

• Is this a Math course in disguise?

Page 31:

Why do we need so much math?

• Probability Density Functions allow the evaluation of how likely a data point is under a model.
  – Want to identify good PDFs. (calculus)
  – Want to evaluate against a known PDF. (algebra)

Page 32:

Gaussian Distributions

• We use Gaussian Distributions all over the place.
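The slide shows plots only; for reference, the univariate Gaussian density presumably being plotted is the standard form:

```latex
\mathcal{N}(x \mid \mu, \sigma^{2})
  = \frac{1}{\sqrt{2\pi\sigma^{2}}}
    \exp\!\left(-\frac{(x - \mu)^{2}}{2\sigma^{2}}\right)
```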

Page 33:

Gaussian Distributions

• We use Gaussian Distributions all over the place.

Page 34:

Class Structure and Policies

• Course website:
  – http://eniac.cs.qc.cuny.edu/andrew/gcml-11/syllabus.html
• Google Group for discussions and announcements:
  – http://groups.google.com/gcml-spring2011
  – Please sign up for the group ASAP.
  – Or put your email address on the sign-up sheet, and you will be sent an invitation.

Page 35:

Data Data Data

• “There’s no data like more data”
• All machine learning techniques rely on the availability of data to learn from.
• There is an ever-increasing amount of data being generated, but it’s not always easy to process.
• UCI Machine Learning Repository
  – http://archive.ics.uci.edu/ml/
• LDC (Linguistic Data Consortium)
  – http://www.ldc.upenn.edu/

Page 36:

Half time.

Get Coffee. Stretch.

Page 37:

Decision Trees

• Classification Technique.

[Figure: decision tree — the root tests eye color (blue / brown / green); internal nodes below test height (h) or weight (w) against thresholds such as <66, <140, <150, <64, <145, <170; leaves are male (m) / female (f).]

Page 38:

Decision Trees

• Very easy to evaluate.
• Nested if statements (see the sketch below)

[Figure: the same decision tree as on the previous slide.]
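To make the nested-if point concrete, here is a minimal sketch of how a tree like the figure's evaluates. The figure's thresholds are only partially recoverable, so the ones below are illustrative:

```python
def classify(height, weight, eye_color):
    # Each internal node is an if/else; each leaf returns a class.
    # The structure mirrors the slide's tree; thresholds are illustrative.
    if eye_color == "blue":
        return "female" if height < 66 else "male"
    elif eye_color == "brown":
        return "female" if weight < 140 else "male"
    else:  # green
        return "female" if weight < 150 else "male"

print(classify(66, 170, "blue"))   # male
print(classify(63, 125, "green"))  # female
```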

Page 39:

More formal Definition of a Decision Tree

• A Tree data structure
• Each internal node corresponds to a feature
• Leaves are associated with target values.
• Nodes with nominal features have N children, where N is the number of nominal values.
• Nodes with continuous features have two children, for values less than and greater than or equal to a break point (a direct code translation follows below).
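A sketch of that definition as a data structure; the class and field names are my own:

```python
class Node:
    # One decision-tree node, per the definition above.
    def __init__(self, feature=None, break_point=None,
                 children=None, target=None):
        self.feature = feature          # feature tested at an internal node
        self.break_point = break_point  # threshold, for continuous features
        self.children = children or {}  # nominal value -> child node, or
                                        # {"lt": ..., "gte": ...} if continuous
        self.target = target            # target value, if this is a leaf

    def is_leaf(self):
        return self.target is not None
```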

Page 40:

Training a Decision Tree

• How do you decide what feature to use?
• For continuous features, how do you decide what break point to use?

• Goal: Optimize Classification Accuracy.

Page 41:

Example Data Set

Height (in) | Weight (lb) | Eye Color | Gender
66          | 170         | Blue      | Male
73          | 210         | Brown     | Male
72          | 165         | Green     | Male
70          | 180         | Blue      | Male
74          | 185         | Brown     | Male
68          | 155         | Green     | Male
65          | 150         | Blue      | Female
64          | 120         | Brown     | Female
63          | 125         | Green     | Female
67          | 140         | Blue      | Female
68          | 165         | Brown     | Female
66          | 130         | Green     | Female

Page 42:

Baseline Classification Accuracy

• Select the majority class.
  – Here 6/12 Male, 6/12 Female.
  – Baseline Accuracy: 50%
• How good is each branch?
  – The improvement to classification accuracy (see the sketch below)
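A minimal sketch of this accuracy-improvement criterion, reusing DATA from the earlier sketch (helper names are my own):

```python
def accuracy(labels):
    # Accuracy of predicting the majority class within a group.
    return max(labels.count("male"), labels.count("female")) / len(labels)

def improvement(rows, group_fn):
    # Majority-class accuracy gained by splitting rows with group_fn.
    labels = [r[3] for r in rows]
    groups = {}
    for r in rows:
        groups.setdefault(group_fn(r), []).append(r[3])
    after = sum(accuracy(g) * len(g) for g in groups.values()) / len(rows)
    return after - accuracy(labels)

# The three candidate splits worked through on the next slides:
print(improvement(DATA, lambda r: r[2]))        # eye color: 0.0 (slide: 0%)
print(improvement(DATA, lambda r: r[0] < 68))   # height < 68: ~0.333 (33.3%)
print(improvement(DATA, lambda r: r[1] < 165))  # weight < 165: ~0.417 (41.7%)
```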

Page 43:

Training Example

• Possible branches

eye color → blue: 2M / 2F; brown: 2M / 2F; green: 2M / 2F

50% Accuracy before Branch
50% Accuracy after Branch
0% Accuracy Improvement

Page 44:

Example Data Set

Height (in) | Weight (lb) | Eye Color | Gender
63          | 125         | Green     | Female
64          | 120         | Brown     | Female
65          | 150         | Blue      | Female
66          | 170         | Blue      | Male
66          | 130         | Green     | Female
67          | 140         | Blue      | Female
68          | 145         | Brown     | Female
68          | 155         | Green     | Male
70          | 180         | Blue      | Male
72          | 165         | Green     | Male
73          | 210         | Brown     | Male
74          | 185         | Brown     | Male

Page 45:

Training Example

• Possible branches

height → < 68: 1M / 5F; ≥ 68: 5M / 1F

50% Accuracy before Branch
83.3% Accuracy after Branch
33.3% Accuracy Improvement

Page 46:

Example Data Set

Height (in) | Weight (lb) | Eye Color | Gender
64          | 120         | Brown     | Female
63          | 125         | Green     | Female
66          | 130         | Green     | Female
67          | 140         | Blue      | Female
68          | 145         | Brown     | Female
65          | 150         | Blue      | Female
68          | 155         | Green     | Male
72          | 165         | Green     | Male
66          | 170         | Blue      | Male
70          | 180         | Blue      | Male
74          | 185         | Brown     | Male
73          | 210         | Brown     | Male

Page 47:

Training Example

• Possible branches

weight → < 165: 1M / 6F; ≥ 165: 5M

50% Accuracy before Branch
91.7% Accuracy after Branch
41.7% Accuracy Improvement

Page 48:

Training Example

• Recursively train child nodes (a code sketch follows below).

[Tree so far: weight → ≥ 165: 5M (leaf); < 165: height → < 68: 5F (leaf); ≥ 68: 1M / 1F (train further)]
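A compact sketch of the recursive procedure, greedily choosing the split with the best accuracy improvement; it reuses improvement() and DATA from the earlier sketches, and taking every observed feature value as a candidate break point is my choice. Note that with this exhaustive candidate set, the toy data happens to be separable by a single split (weight < 155), so the sketch finds a shallower tree than the lecture's walk-through, which illustrates with weight < 165:

```python
def best_split(rows):
    # Candidates: the nominal eye-color split, plus a threshold split
    # at every observed value of each continuous feature.
    candidates = [("eye color", lambda r: r[2])]
    for i, name in ((0, "height"), (1, "weight")):
        for v in sorted({r[i] for r in rows}):
            candidates.append((f"{name} < {v}", lambda r, i=i, v=v: r[i] < v))
    return max(candidates, key=lambda c: improvement(rows, c[1]))

def train(rows, depth=0):
    labels = [r[3] for r in rows]
    if len(set(labels)) == 1:          # pure node: emit a leaf
        print("  " * depth + labels[0])
        return
    name, fn = best_split(rows)
    groups = {}
    for r in rows:
        groups.setdefault(fn(r), []).append(r)
    if len(groups) < 2:                # no useful split: majority leaf
        print("  " * depth + max(set(labels), key=labels.count))
        return
    for value, group in sorted(groups.items(), key=lambda kv: str(kv[0])):
        print("  " * depth + f"{name} = {value}:")
        train(group, depth + 1)

train(DATA)  # prints a tree with the single split "weight < 155"
```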

Page 49:

Training Example

• Finished Tree

[Tree: weight → ≥ 165: 5M (male); < 165: height → < 68: 5F (female); ≥ 68: weight → < 155: 1F (female); ≥ 155: 1M (male)]

Page 50:

Generalization

• What is the performance of the tree on the training data?
  – Is there any way we could get less than 100% accuracy?
• What performance can we expect on unseen data?

Page 51:

Evaluation

• Evaluate performance on data that was not used in training.

• Isolate a subset of data points to be used for evaluation.

• Evaluate generalization performance.
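A minimal sketch of this held-out evaluation protocol; the split fraction, shuffling, and seed are my choices:

```python
import random

def train_test_split(rows, test_fraction=0.25, seed=0):
    # Hold out a random subset for evaluation; train on the rest.
    rows = rows[:]                     # copy so the caller's list is untouched
    random.Random(seed).shuffle(rows)
    n_test = int(len(rows) * test_fraction)
    return rows[n_test:], rows[:n_test]   # (train, test)

train_rows, test_rows = train_test_split(DATA)
# Fit the tree on train_rows only, then measure accuracy on test_rows
# to estimate generalization performance.
```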

Page 52:

Evaluation of our Decision Tree

• What is the Training performance?
• What is the Evaluation performance?
  – Never classify female over 165.
  – Never classify male under 165 and under 68.
  – The middle section is trickier.
• What are some ways to make these similar?

Page 53:

Pruning

• There are many pruning techniques.
• A simple approach is to have a minimum membership size in each node (see the sketch below).

[Before pruning: weight → ≥ 165: 5M; < 165: height → < 68: 5F; ≥ 68: weight → < 155: 1F; ≥ 155: 1M]

[After pruning: weight → ≥ 165: 5M; < 165: height → < 68: 5F; ≥ 68: 1M / 1F (small node collapsed into a leaf)]
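One way to apply that minimum-membership rule is as a stopping condition during training (the slide's figure shows the equivalent post-hoc collapse of the small node); the threshold is illustrative:

```python
MIN_MEMBERS = 2  # illustrative minimum membership size per node

def should_stop(rows):
    # Do not split a node that cannot yield two children which each
    # meet the minimum membership size; emit a majority-class leaf instead.
    return len(rows) < 2 * MIN_MEMBERS
```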

Page 54:

Decision Tree Recap

• Training via Recursive Partitioning.
• Simple, interpretable models.
• Different node selection criteria can be used.
  – Information theory is a common choice.
• Pruning techniques can be used to make the model more robust to unseen data.

Page 55:

Next Time: Math Primer

• Probability
  – Bayes Rule
  – Naïve Bayes Classification
• Calculus
  – Vector Calculus
• Optimization
  – Lagrange Multipliers