Lecture 1 Recap · PowerPoint Presentation Author: Laura LT Created Date: 4/22/2018 5:59:36 PM ...


Lecture 1 Recap

Prof. Leal-Taixé and Prof. Niessner 1

Page 2

Reinforcement learning: agents interact with an environment

Machine learning

Unsupervised learning Supervised learning

Page 3

Nearest Neighbor

?

Page 4

Nearest Neighbor

distance

NN classifier = dog
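A minimal sketch of the nearest-neighbor idea above: the query takes the label of the closest training point. The Euclidean distance, the 2-D feature values, and the function name `nearest_neighbor` are assumptions for illustration only.

```python
import math


def nearest_neighbor(train_points, train_labels, query):
    """Return the label of the training point closest to `query`."""
    best_label, best_dist = None, math.inf
    for point, label in zip(train_points, train_labels):
        dist = math.dist(point, query)  # Euclidean distance (assumed metric)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


# Hypothetical 2-D features for four labelled images.
train_points = [(1.0, 1.0), (1.5, 2.0), (5.0, 5.0), (6.0, 4.5)]
train_labels = ["cat", "cat", "dog", "dog"]

prediction = nearest_neighbor(train_points, train_labels, (5.5, 5.0))
print(prediction)
```

The query lies closest to a "dog" sample, so the classifier outputs that label; no training step is needed beyond storing the data.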

Page 5

Nearest Neighbor

Courtesy of Stanford course cs231n

Decision boundaries

Page 6

Linear Regression

Page 7

Linear regression

• Supervised learning

• Find a linear model that explains a target given the inputs

Page 8

Linear regression

Training: data points (input, e.g. image, measurement; labels, e.g. cat/dog) → learner → model parameters

Page 9

Linear regression

Training: data points → learner → model parameters

Testing: data points → predictor → estimation

Page 10

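Linear prediction

• A linear model is expressed in the form

$\hat{y}_i = \sum_{j=1}^{d} x_{ij}\,\theta_j$

where $x_{ij}$ are the input data (features), $\theta_j$ are the weights (model parameters), and $d$ is the number of inputs.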

Page 11

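Linear prediction

• A linear model is expressed in the form

$\hat{y}_i = \theta_0 + \sum_{j=1}^{d} x_{ij}\,\theta_j$

where $\theta_0$ is the bias (the weight of a constant input of 1).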

Page 12

Linear prediction

Output: temperature of a building

Inputs:

• Outside temperature

• Number of people

• Sun exposure

• Level of humidity

Page 13

Input features

Model parameters

Prediction

Linear prediction
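A minimal sketch of a linear prediction for the building-temperature example: the prediction is the bias plus the weighted sum of the input features. All numeric values and the name `predict` are hypothetical.

```python
def predict(x, theta, bias):
    """Linear prediction: bias + sum over j of x[j] * theta[j]."""
    return bias + sum(x_j * theta_j for x_j, theta_j in zip(x, theta))


# Hypothetical input features:
# outside temperature, number of people, sun exposure, level of humidity.
x = [10.0, 20.0, 0.5, 0.4]
# Hypothetical model parameters (one weight per feature) and bias.
theta = [0.5, 0.1, 4.0, -2.0]
bias = 12.0

y_hat = predict(x, theta, bias)
print(y_hat)
```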

Page 14

Linear prediction

Inputs (outside temperature, humidity, number of people, sun exposure (%)) → MODEL → temperature of the building

Page 15

Linear prediction

Inputs (outside temperature, humidity, number of people, sun exposure (%)) → MODEL → temperature of the building

How do we obtain the model?

Page 16

How to obtain the model?

Data points

Model parameters

Labels (ground truth)

Estimation

Loss function

Optimization

Page 17

How to obtain the model?

• Loss function: measures how good my estimation is (how good my model is) and tells the optimization method how to make it better.

• Optimization: changes the model in order to improve the loss function (i.e. to improve my estimation).

Page 18

Prediction: temperature of the building

Linear regression: loss function

Page 20

Minimizing the loss. Other names for the function being minimized: objective function, energy, cost function.

Linear regression: loss function
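A minimal sketch of such a loss, assuming the mean squared error that underlies least squares. The name `mse_loss` and the temperature values are illustrative.

```python
def mse_loss(y_hat, y):
    """Mean of squared differences between estimates and ground-truth labels."""
    n = len(y)
    return sum((p - t) ** 2 for p, t in zip(y_hat, y)) / n


# Hypothetical ground-truth building temperatures and two sets of estimates.
y_true = [20.0, 22.0, 19.0]
good_estimates = [20.5, 21.5, 19.0]
bad_estimates = [25.0, 18.0, 15.0]

loss_good = mse_loss(good_estimates, y_true)
loss_bad = mse_loss(bad_estimates, y_true)
print(loss_good, loss_bad)
```

A lower value means the estimates are closer to the labels, which is what the optimization tries to achieve.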

Page 21

Optimization: linear least squares

• Linear least squares: an approach to fit a mathematical model to the data

• Convex problem: there exists a closed-form solution, which is unique when the input features are linearly independent.

Page 22

Optimization: linear least squares

The estimation comes from the linear model

training samples

Page 23

Optimization: linear least squares

Matrix notation

Labels; training samples, where each input vector has size d

Page 24

Optimization: linear least squares

Matrix notation

Thursday April 19th: exercise session, more on matrix notation

Page 25

Optimization: linear least squares

Convex

Optimum

Page 26

Optimization

More info on Thursday!

No theta there!
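A minimal sketch of the closed-form least-squares solution, assuming one feature plus a bias so that the normal equations $(X^\top X)\,\theta = X^\top y$ reduce to a 2×2 system that can be solved by hand. The right-hand side contains only the data, no theta. The name `fit_least_squares` and the data are illustrative.

```python
def fit_least_squares(xs, ys):
    """Closed-form least squares for y ≈ theta0 + theta1 * x.

    Solves the 2x2 normal equations (X^T X) theta = X^T y,
    where each row of X is [1, x_i].
    """
    n = len(xs)
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    sy = sum(ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx  # determinant of X^T X
    if det == 0:
        raise ValueError("X^T X is singular; the solution is not unique")
    theta0 = (sxx * sy - sx * sxy) / det
    theta1 = (n * sxy - sx * sy) / det
    return theta0, theta1


# Hypothetical noise-free data generated from y = 1 + 2x.
theta0, theta1 = fit_least_squares([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(theta0, theta1)
```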

Page 27

Output: temperature of the building

Inputs: outside temperature, number of people…

Optimization

Page 28

Is this the best estimate?

• Least squares estimate

Page 29

Maximum Likelihood

Page 30

Maximum Likelihood Estimate

True underlying distribution

Parametric family of distributions, controlled by a parameter

Page 31

Maximum Likelihood Estimate

• A method of estimating the parameters of a statistical model given observations,

Observations from the true underlying distribution

Page 32

Maximum Likelihood Estimate

• A method of estimating the parameters of a statistical model given observations, by finding the parameter values that maximize the likelihood of making the observations given the parameters.

Training samples

Page 33

Maximum Likelihood Estimate

Logarithmic property
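A sketch of how the logarithmic property is typically used here, assuming the $n$ training samples $(x_i, y_i)$ are independent. The symbols $\theta_{\mathrm{ML}}$ and $p(y_i \mid x_i, \theta)$ are notation chosen for this sketch. Because the logarithm is monotonically increasing, it does not change the maximizer, and it turns the product into a sum:

```latex
\theta_{\mathrm{ML}}
  = \arg\max_{\theta} \prod_{i=1}^{n} p(y_i \mid x_i, \theta)
  = \arg\max_{\theta} \sum_{i=1}^{n} \log p(y_i \mid x_i, \theta)
```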

Page 34

Back to linear regression

Page 35

Back to linear regression

Assuming

Gaussian or Normal distribution

mean

Page 36

Back to linear regression

Assuming

Page 37

Back to linear regression

Assuming

mean

Page 38

Back to linear regression

Original optimization problem

Page 39

Back to linear regression

Matrix notation

Canceling log and e

Page 40

Back to linear regression

How can we find the estimate of theta?
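One way to answer this, sketched under stated assumptions: each label is the linear prediction plus Gaussian noise with zero mean and fixed variance $\sigma^2$, the bias is absorbed into $\theta$, and $X^\top X$ is invertible. The symbols $X$ (training samples stacked as rows), $y$ (labels) and $\sigma$ are notation chosen for this sketch.

```latex
% Assumption: y_i = x_i \theta + \epsilon_i, with \epsilon_i \sim \mathcal{N}(0, \sigma^2)
p(y_i \mid x_i, \theta)
  = \frac{1}{\sqrt{2\pi\sigma^2}}
    \exp\!\left(-\frac{(y_i - x_i\theta)^2}{2\sigma^2}\right)

% Log-likelihood of all n samples, in matrix notation
\sum_{i=1}^{n} \log p(y_i \mid x_i, \theta)
  = -\frac{n}{2}\log(2\pi\sigma^2)
    - \frac{1}{2\sigma^2}\,(y - X\theta)^{\top}(y - X\theta)

% Setting the derivative with respect to \theta to zero
\frac{1}{\sigma^2}\,X^{\top}(y - X\theta) = 0
  \;\Longrightarrow\;
  \theta = (X^{\top}X)^{-1}X^{\top}y
```

Maximizing the log-likelihood is therefore the same as minimizing the sum of squared errors.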

Page 41

Linear regression

• Maximum Likelihood Estimate (MLE) corresponds to the Least Squares Estimate (given the assumptions)

• Introduced the concepts of loss function and optimization to obtain the best model for regression

Page 42

Image classification

Page 43

Regression vs. Classification

• Regression: predict a continuous output value (e.g. temperature of a room)

• Classification: predict a discrete value

– Binary classification: output is either 0 or 1

– Multi-class classification: set of N classes

Page 44

Logistic regression

CAT classifier

Page 45

Sigmoid for binary predictions

• Output lies between 0 and 1

Can be interpreted as a probability
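A minimal sketch of the sigmoid, $\sigma(z) = 1 / (1 + e^{-z})$, written in a numerically careful way. The function name and the branch for negative inputs are implementation choices made for this sketch.

```python
import math


def sigmoid(z):
    """Logistic sigmoid: squashes any real number into the range 0 to 1."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    # Equivalent form that avoids overflow in exp() for very negative z.
    ez = math.exp(z)
    return ez / (1.0 + ez)


for z in (-5.0, 0.0, 5.0):
    print(z, sigmoid(z))
```

An input of 0 maps to 0.5; large positive inputs approach 1 and large negative inputs approach 0.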

Page 46

Spoiler alert: 1 layer Neural Network

• Output lies between 0 and 1

Can be interpreted as a probability

Page 47

Logistic regression

• Probability of a binary output

Page 48

Logistic regression

• Probability of a binary output

Model for coins

Bernoulli trial

Page 49

Logistic regression

• Probability of a binary output

Model for coins

Page 50

Logistic regression: loss function

• Probability of a binary output

• Maximum Likelihood Estimate

Page 51

Logistic regression: loss function

Page 52

Logistic regression: loss function


Maximize!

Page 53

Logistic regression: loss function

For a label $y_i = 1$: we want $\log \hat{y}_i$ to be large; since the logarithm is a monotonically increasing function, we also want the estimate $\hat{y}_i$ to be large (1 is the largest value my estimate can take!).

Page 54

Logistic regression: loss function

For a label $y_i = 0$: we want $\log(1 - \hat{y}_i)$ to be large, so we want the estimate $\hat{y}_i$ to be small (0 is the smallest value my estimate can take!).

Page 55

Logistic regression: loss function

• Related to the multi-class loss you will see in this course (also called softmax loss)


Referred to as cross-entropy loss
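A minimal sketch of the binary cross-entropy loss, written as a quantity to minimize (the negative of the log-likelihood that is maximized). The clipping constant `eps` and the function names are assumptions added for this sketch.

```python
import math


def binary_cross_entropy(y_hat, y, eps=1e-12):
    """Negative log-likelihood of a binary label y (0 or 1) given estimate y_hat."""
    y_hat = min(max(y_hat, eps), 1.0 - eps)  # keep log() finite
    return -(y * math.log(y_hat) + (1 - y) * math.log(1.0 - y_hat))


def cost(y_hats, ys):
    """Average loss over all training samples."""
    return sum(binary_cross_entropy(p, t) for p, t in zip(y_hats, ys)) / len(ys)


# For label 1 a high estimate is cheap; for label 0 a low estimate is cheap.
print(binary_cross_entropy(0.9, 1), binary_cross_entropy(0.1, 1))
print(binary_cross_entropy(0.1, 0), binary_cross_entropy(0.9, 0))
```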

Page 56

Logistic regression: optimization

• Loss function

• Cost function

Minimization

Page 57

Logistic regression: optimization

• No closed-form solution

• Make use of an iterative method: gradient descent

Page 58

Gradient descent

Page 59

Following the slope

Optimum


Initialization

Page 61

Following the slope


Initialization

Follow the slope of the DERIVATIVE

Optimum

Page 62

Gradient steps

• From derivative to gradient: the gradient points in the direction of greatest increase of the function

• Gradient steps in direction of negative gradient, scaled by the learning rate

Page 63

Gradient steps

• From derivative to gradient: the gradient points in the direction of greatest increase of the function

• Gradient steps in direction of negative gradient

SMALL learning rate

Page 64

Gradient steps

• From derivative to gradient: the gradient points in the direction of greatest increase of the function

• Gradient steps in direction of negative gradient

LARGE Learning rate
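A minimal sketch of gradient steps with different learning rates, assuming a toy convex function $f(x) = (x - 3)^2$ chosen only for illustration. Each step moves against the gradient: x ← x − learning_rate · f′(x).

```python
def gradient_descent(grad, x0, learning_rate, steps):
    """Repeatedly step in the direction of the negative gradient."""
    x = x0
    for _ in range(steps):
        x = x - learning_rate * grad(x)
    return x


def grad_f(x):
    """Derivative of the toy function f(x) = (x - 3)^2."""
    return 2.0 * (x - 3.0)


# Same initialization and number of steps, three learning rates.
x_small = gradient_descent(grad_f, x0=0.0, learning_rate=0.01, steps=50)
x_medium = gradient_descent(grad_f, x0=0.0, learning_rate=0.1, steps=50)
x_large = gradient_descent(grad_f, x0=0.0, learning_rate=1.1, steps=50)
print(x_small, x_medium, x_large)
```

With the small rate the iterate is still far from the optimum at 3 after 50 steps, the medium rate gets very close, and the large rate overshoots more on every step and diverges.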

Page 65

Convergence


Initialization

What is the gradient when we reach this point?

Not guaranteed to reach the optimum

Optimum

Page 66

Numerical gradient

• Approximate

• Slow evaluation

Page 67

Analytical gradient

• Exact and fast

Analytical gradient

Recall Linear Regression

A step in that direction
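A minimal sketch comparing the two kinds of gradient for linear regression, assuming a one-parameter model $\hat{y} = \theta x$ with mean squared error. The data, the step size `h`, and the function names are illustrative.

```python
def loss(theta, xs, ys):
    """Mean squared error of the 1-D linear model y_hat = theta * x."""
    return sum((theta * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def analytical_gradient(theta, xs, ys):
    """Exact derivative of the loss with respect to theta."""
    return sum(2.0 * (theta * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def numerical_gradient(theta, xs, ys, h=1e-5):
    """Central finite difference: two extra loss evaluations per parameter."""
    return (loss(theta + h, xs, ys) - loss(theta - h, xs, ys)) / (2.0 * h)


xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]
theta = 0.5

g_exact = analytical_gradient(theta, xs, ys)
g_approx = numerical_gradient(theta, xs, ys)

# A step in the direction of the negative gradient lowers the loss.
theta_next = theta - 0.05 * g_exact
print(g_exact, g_approx, loss(theta, xs, ys), loss(theta_next, xs, ys))
```

The numerical version is handy for checking an analytical gradient, but it needs extra loss evaluations for every parameter, which is why it is slow for large models.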

Page 68

Gradient descent for least squares


Convex, always converges to the same solution
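A minimal sketch of this behaviour, assuming a one-parameter model $\hat{y} = \theta x$, mean squared error, and a learning rate small enough for the iteration to be stable. Two different initializations end at the closed-form least-squares solution; the data and names are illustrative.

```python
def gd_least_squares(xs, ys, theta0, learning_rate=0.05, steps=500):
    """Gradient descent on the mean squared error of y_hat = theta * x."""
    theta = theta0
    n = len(xs)
    for _ in range(steps):
        grad = sum(2.0 * (theta * x - y) * x for x, y in zip(xs, ys)) / n
        theta -= learning_rate * grad
    return theta


# Hypothetical noisy data roughly following y = 2x.
xs = [1.0, 2.0, 3.0]
ys = [2.1, 3.9, 6.2]

# Closed-form solution for this one-parameter model.
closed_form = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

from_left = gd_least_squares(xs, ys, theta0=-10.0)
from_right = gd_least_squares(xs, ys, theta0=10.0)
print(closed_form, from_left, from_right)
```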

Page 69

Gradient descent for logistic regression

Explained in the following lectures when the following topics are introduced:

• Computational graphs

• Backpropagation of the gradients

Page 70

Regularization

Page 71

Regularization

Loss

L2 regularization

L1 regularization

Max norm regularization

Dropout

Page 72

Regularization: example

• Input: 3 features

• Two linear classifiers that give the same result:

Ignores 2 features

Takes information from all features

Page 73

Regularization: example

Loss

L2 regularization

Minimization

Page 74

Regularization: example

Loss

Minimization

L1 regularization

Page 76

Regularization: example

• Input: 3 features

• Two linear classifiers that give the same result:

L1 regularization enforces sparsity

Takes information from all features

Page 77

Regularization: example

• Input: 3 features

• Two linear classifiers that give the same result:

L1 regularization enforces sparsity

L2 regularization enforces that the weights have similar values
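A minimal sketch of this comparison. The input and the two weight vectors are hypothetical values chosen so that both classifiers give the same result; the L1 penalty is taken as the sum of absolute weights and the L2 penalty as the sum of squared weights.

```python
def dot(a, b):
    """Inner product of two equally long lists."""
    return sum(a_i * b_i for a_i, b_i in zip(a, b))


def l1_penalty(theta):
    """Sum of absolute weights."""
    return sum(abs(t) for t in theta)


def l2_penalty(theta):
    """Sum of squared weights."""
    return sum(t * t for t in theta)


# Hypothetical input with 3 features and two weight vectors.
x = [1.0, 2.0, 1.0]
theta_sparse = [0.0, 0.75, 0.0]    # ignores 2 features
theta_spread = [0.25, 0.5, 0.25]   # takes information from all features

print(dot(x, theta_sparse), dot(x, theta_spread))
print(l1_penalty(theta_sparse), l1_penalty(theta_spread))
print(l2_penalty(theta_sparse), l2_penalty(theta_spread))
```

Both weight vectors produce the same score, but the L1 penalty is lower for the sparse vector while the L2 penalty is lower for the spread-out one, so each regularizer would pick a different classifier.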

Page 78

Regularization: effect

• Dog classifier takes different inputs:

– Furry

– Has two eyes

– Has a tail

– Has paws

– Has two ears

L1 regularization will focus all the attention on a few key features

Page 79

Regularization: effect

• Dog classifier takes different inputs:

– Furry

– Has two eyes

– Has a tail

– Has paws

– Has two ears

L2 regularization will take all information into account to make decisions

Page 80

Regularization

Credits: University of Washington

What is the goal of regularization?

What happens to the training error?

Page 81

Regularization

• Any strategy that aims to lower the validation error, typically at the cost of increasing the training error

Page 82

Beyond linear

This lecture Next lectures

Page 83

Next lectures

• Next Monday: Introduction to Neural Networks

• This Thursday: Math refresh #2, particularly useful for the jump to NN!

Page 84

Useful references

• General Machine Learning book:

– Pattern Recognition and Machine Learning. C. Bishop.
