
Transcript of the 2010 Winter School on Machine Learning and Vision

Page 1

2010 Winter School on Machine Learning and Vision

Sponsored by the Canadian Institute for Advanced Research and Microsoft Research India

With additional support from

Indian Institute of Science, Bangalore and The University of Toronto, Canada

Page 2

Agenda

Saturday Jan 9 – Sunday Jan 10: Preparatory Lectures

Monday Jan 11 – Saturday Jan 16: Tutorials and Research Lectures

Sunday Jan 17: Discussion and closing

Page 3

Speakers

William Freeman, MIT
Brendan Frey, University of Toronto
Yann LeCun, New York University
Jitendra Malik, UC Berkeley
Bruno Olshausen, UC Berkeley
B Ravindran, IIT Madras
Sunita Sarawagi, IIT Bombay
Manik Varma, MSR India
Martin Wainwright, UC Berkeley
Yair Weiss, Hebrew University
Richard Zemel, University of Toronto

Page 4

Winter School Organization

Co-Chairs:
Brendan Frey, University of Toronto
Manik Varma, Microsoft Research India

Local Organization:
KR Ramakrishnan, IISc, Bangalore
B Ravindran, IIT, Madras
Sunita Sarawagi, IIT, Bombay

CIFAR and MSRI:
Dr P Anandan, Managing Director, MSRI
Michael Hunter, Research Officer, CIFAR
Vidya Natampally, Director Strategy, MSRI
Dr Sue Schenk, Programs Director, CIFAR
Ashwani Sharma, Manager Research, MSRI
Dr Mel Silverman, VP Research, CIFAR

Page 5

The Canadian Institute for Advanced Research (CIFAR)

• Objective: To fund networks of internationally leading researchers, and their students and postdoctoral fellows

• Programs
– Neural computation and perception (vision)
– Genetic networks
– Cosmology and gravitation
– Nanotechnology
– Successful societies
– …

• Track record: 13 Nobel prizes (8 current)

Page 6

Neural Computation and Perception (Vision)

• Goal: Develop computational models for human-spectrum vision

• Members
– Geoff Hinton, Director, Toronto
– Yoshua Bengio, Montreal
– Michael Black, Brown
– David Fleet, Toronto
– Nando De Freitas, UBC
– Bill Freeman*, MIT
– Brendan Frey*, Toronto
– Yann LeCun*, NYU
– David Lowe, UBC
– David MacKay, U Cambridge
– Bruno Olshausen*, Berkeley
– Sam Roweis, NYU
– Nikolaus Troje, Queens
– Martin Wainwright*, Berkeley
– Yair Weiss*, Hebrew Univ
– Hugh Wilson, York Univ
– Rich Zemel*, Toronto
– …

Page 7

Introduction to Machine Learning

Brendan J. Frey

University of Toronto

Page 8

Textbook

Christopher M. Bishop

Pattern Recognition and Machine Learning

Springer 2006

To avoid cluttering slides with citations, I'll cite sources only when the material is not presented in the textbook.

Page 9

Analyzing video

How can we develop algorithms that will
• Track objects?
• Recognize objects?
• Segment objects?
• Denoise the video?
• Determine the state (e.g., gait) of each object?
…and do all this in 24 hours?

Page 10

Handwritten digit clustering and recognition

How can we develop algorithms that will
• Automatically cluster these images?
• Use a training set of labeled images to learn to classify new images?
• Discover how to account for variability in writing style?

Page 11

Document analysis

How can we develop algorithms that will
• Produce a summary of the document?
• Find similar documents?
• Predict document layouts that are suitable for different readers?

Page 12

Bioinformatics

How can we develop algorithms that will
• Identify regions of DNA that have high levels of transcriptional activity in specific tissues?
• Find start sites and stop sites of genes, by looking for common patterns of activity?
• Find "out of place" activity patterns and label their DNA regions as being non-functional?

[Figure: DNA activity (low to high) by position in DNA, across a set of mouse tissues]

Page 13

The machine learning algorithm development pipeline

Problem statement
e.g., "Given training vectors x1,…,xN and targets t1,…,tN, find…"

Mathematical description of a cost function
e.g., E(w), L(θ), p(x|w)

Mathematical description of how to minimize the cost function
e.g., ∂E/∂wi, ∂L/∂θ = 0

Implementation
e.g., r(i,k) = s(i,k) – maxj{ s(i,j) + a(i,j) }…

Page 14

Tracking using hand-labeled coordinates

To track the man in the striped shirt, we could
1. Hand-label his horizontal position in some frames
2. Extract a feature, such as the location of a sinusoidal (stripe) pattern in a horizontal scan line
3. Relate the real-valued feature to the true labeled position

[Figure: pixel intensity versus horizontal location of pixel (0 to 320), yielding the feature x = 100; hand-labeled horizontal coordinate t = 75 plotted against feature x]

Page 15

Tracking using hand-labeled coordinates

[Figure: two scatter plots of hand-labeled horizontal coordinate t against feature x]

How do we develop an algorithm that relates our input feature x to the hand-labeled target t?

Page 16

Regression: Problem set-up

Input: x, Target: t, Training data: (x1,t1)…(xN,tN)

t is assumed to be a noisy measurement of an unknown function applied to x

[Figure: "ground truth" function relating the feature extracted from the video frame (x) to the horizontal position of the object (t)]

Page 17

Example: Polynomial curve fitting

y(x, w) = w0 + w1 x + w2 x^2 + … + wM x^M

Regression: Learn parameters w = (w0, w1, …, wM)

Page 18

Linear regression

• The form y(x, w) = w0 + w1 x + w2 x^2 + … + wM x^M is linear in the w's

• Instead of x, x^2, …, x^M, we can generally use basis functions (a small sketch follows below):

y(x, w) = w0 + w1 φ1(x) + w2 φ2(x) + … + wM φM(x)
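As a small illustration (mine, not from the slides), here is how a prediction y(x, w) is computed for two standard choices of basis functions; the function names and example weights are assumptions:

    import numpy as np

    # Polynomial basis: phi_m(x) = x**m for m = 0..M (phi_0 = 1 covers w0)
    def poly_basis(x, M):
        return np.array([x**m for m in range(M + 1)])

    # Gaussian basis: bumps centred at mu_1..mu_M, plus a constant for w0
    def gauss_basis(x, centres, width=0.1):
        return np.concatenate(([1.0], np.exp(-(x - centres)**2 / (2 * width**2))))

    # y(x, w) = sum_m w_m * phi_m(x)
    def predict(x, w, basis):
        return w @ basis(x)

    w = np.array([0.5, -1.0, 2.0])                      # (w0, w1, w2), arbitrary
    print(predict(0.3, w, lambda x: poly_basis(x, 2)))  # 0.5 - 0.3 + 0.18 = 0.38
    print(predict(0.3, w, lambda x: gauss_basis(x, np.array([0.25, 0.75]))))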

Page 19

Multi-input linear regression

y(x, w) = w0 + w1 φ1(x) + w2 φ2(x) + … + wM φM(x)

• x and φ1(·),…,φM(·) are known, so the task of learning w doesn't change if the scalar x is replaced with a vector of inputs x:

y(x, w) = w0 + w1 φ1(x) + w2 φ2(x) + … + wM φM(x)

• Example: x = entire scan line

• Now, each φm(x) maps a vector to a real number

• A special case is linear regression for a linear model: φm(x) = xm

Page 20

Multi-input linear regression

• If we like, we can create a set of basis functions and lay them out in the D-dimensional space:

[Figure: grids of basis functions laid out uniformly in 1-D and in 2-D]

• Problem: Curse of dimensionality

Page 21

The curse of dimensionality

• Distributing bins or basis functions uniformly in the input space may work in 1 dimension, but will become exponentially useless in higher dimensions
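To make "exponentially useless" concrete (my own arithmetic, not from the slides): with k bins per dimension, uniformly tiling a D-dimensional input space takes k^D bins, and the training data needed to populate them grows at the same rate:

    # Bins needed to tile a D-dimensional space with k bins per dimension
    k = 10
    for D in (1, 2, 3, 10, 100):
        print(f"D = {D:3d}: {k**D:.3g} bins")   # 10, 100, 1e+03, 1e+10, 1e+100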

Page 22

Objective of regression: Minimize error

E(w) = ½ Σn ( tn − y(xn, w) )²

• This is called Sum of Squared Error, or SSE

Other forms
• Mean Squared Error, MSE = (1/N) Σn ( tn − y(xn, w) )²

• Root Mean Squared Error, RMSE, ERMS = √[ (1/N) Σn ( tn − y(xn, w) )² ]
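A minimal sketch of the three measures in Python (my own; here y holds the model's predictions y(xn, w) and t the targets, both made up for illustration):

    import numpy as np

    t = np.array([1.0, 2.0, 3.0])          # targets t_1..t_N
    y = np.array([1.1, 1.9, 3.2])          # predictions y(x_n, w)

    sse  = 0.5 * np.sum((t - y)**2)        # Sum of Squared Error, E(w)
    mse  = np.mean((t - y)**2)             # Mean Squared Error
    rmse = np.sqrt(mse)                    # Root Mean Squared Error
    print(sse, mse, rmse)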

Page 23

How the observed error propagates back to the parameters

E(w) = ½ Σn ( tn − Σm wm φm(xn) )², where the inner sum Σm wm φm(xn) is just y(xn, w)

• The rate of change of E w.r.t. wm is

∂E(w)/∂wm = − Σn ( tn − y(xn, w) ) φm(xn)

• The influence of input φm(xn) on E(w) is given by weighting the error for each training case by φm(xn)

Page 24

Gradient-based algorithms

• Gradient descent
– Initially, set w to small random values
– Repeat until it's time to stop:
For m = 0…M:
δm ← − Σn ( tn − y(xn, w) ) φm(xn)
or δm ← ( E(w1,…, wm + ε,…, wM) − E(w1,…, wm,…, wM) ) / ε, where ε is tiny
(the latter is a finite-difference approximation to ∂E(w)/∂wm)
For m = 0…M:
wm ← wm − η δm, where η is the learning rate
(a runnable sketch of this loop follows below)

• "Off-the-shelf" conjugate gradients optimizer: You provide a function that, given w, returns E(w) and ∂E/∂w0,…,∂E/∂wM (a total of M+2 numbers)
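A minimal runnable sketch of the gradient-descent loop above (my own illustration: the toy data, the 1/N scaling of the gradient, the learning rate, and the step count are all assumptions):

    import numpy as np

    # Toy data: t is a noisy function of x
    rng = np.random.default_rng(0)
    x = rng.uniform(0, 1, size=20)
    t = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(20)

    M = 3
    Phi = np.vander(x, M + 1, increasing=True)   # Phi[n, m] = phi_m(x_n) = x_n**m

    w = 0.01 * rng.standard_normal(M + 1)        # small random initial weights
    eta = 0.5                                    # learning rate

    for step in range(10000):
        y = Phi @ w                              # predictions y(x_n, w)
        delta = -Phi.T @ (t - y) / len(x)        # gradient, scaled by 1/N so that
        w -= eta * delta                         #   a fixed eta stays stable
    print(w)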

Page 25

An exact algorithm for linear regression

y(x, w) = w0 + w1 φ1(x) + w2 φ2(x) + … + wM φM(x)

• Evaluate the basis functions for the training cases x1,…,xN and put them in a "design matrix" Φ, with Φnm = φm(xn), where we define φ0(x) = 1 (to account for w0)

• Now, the vector of predictions is y = Φw and the error is E = (t − Φw)ᵀ(t − Φw) = tᵀt − 2tᵀΦw + wᵀΦᵀΦw

• Setting ∂E/∂w = 0 gives −2Φᵀt + 2ΦᵀΦw = 0

• Solution: w = (ΦᵀΦ)⁻¹Φᵀt (in MATLAB: w = Phi\t)
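The same solution in Python (my own sketch; np.linalg.lstsq solves the identical least-squares problem more stably than inverting ΦᵀΦ, and is the numpy counterpart of MATLAB's backslash):

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.uniform(0, 1, size=20)
    t = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(20)

    M = 3
    Phi = np.vander(x, M + 1, increasing=True)   # design matrix, phi_0(x) = 1

    # Normal equations: solve (Phi^T Phi) w = Phi^T t
    w_normal = np.linalg.solve(Phi.T @ Phi, Phi.T @ t)

    # Equivalent but numerically preferable
    w_lstsq, *_ = np.linalg.lstsq(Phi, t, rcond=None)

    print(np.allclose(w_normal, w_lstsq))        # True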

Page 26

Over-fitting

• After learning, collect "test data" and measure its error
• Over-fitting the training data leads to large test error

Page 27

If M is fixed, say at M = 9, collecting more training data helps…

[Figure: M = 9 fit, N = 10 training points]

Page 28

Model selection using validation data

• Collect additional “validation data” (or set aside some training data for this purpose)

• Perform regression with a range of values of M and use validation data to pick M

• Here, we could choose M = 7

[Figure: validation error for different values of M]
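A sketch of this selection loop (my own illustration; the split sizes and the use of validation RMSE are assumptions):

    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.uniform(0, 1, size=40)
    t = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(40)

    # Hold out the last 10 points as validation data
    x_tr, t_tr, x_va, t_va = x[:30], t[:30], x[30:], t[30:]

    best_M, best_err = None, np.inf
    for M in range(1, 10):
        Phi_tr = np.vander(x_tr, M + 1, increasing=True)
        w, *_ = np.linalg.lstsq(Phi_tr, t_tr, rcond=None)
        Phi_va = np.vander(x_va, M + 1, increasing=True)
        err = np.sqrt(np.mean((t_va - Phi_va @ w)**2))   # validation RMSE
        if err < best_err:
            best_M, best_err = M, err
    print(best_M, best_err)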

Page 29

Regularization using weight penalties (aka shrinkage, ridge regression, weight decay)

• To prevent over-fitting, we can penalize large weights:

E(w) = ½ Σn ( tn − y(xn, w) )² + (λ/2) Σm wm²

• Now, over-fitting depends on the value of λ
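For the basis-function model, this penalized error is minimized in closed form by w = (λI + ΦᵀΦ)⁻¹Φᵀt. A minimal sketch (mine; the value of λ is arbitrary):

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.uniform(0, 1, size=20)
    t = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(20)

    M, lam = 9, 1e-3                         # flexible model, small weight penalty
    Phi = np.vander(x, M + 1, increasing=True)

    # Ridge solution: w = (lam*I + Phi^T Phi)^(-1) Phi^T t
    w = np.linalg.solve(lam * np.eye(M + 1) + Phi.T @ Phi, Phi.T @ t)
    print(w)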

Page 30

Comparison of model selection and ridge regression/weight decay

Page 31

Using validation data to regularize tracking

[Figure: hand-labeled horizontal coordinate t plotted against feature x for the training data, the validation data, and the entire data set; the chosen fit has M = 5]

Page 32

Validation when data is limited

• S-fold cross validation
– Partition the data into S sets
– For M = 1, 2, …:
• For s = 1…S:
– Train on all data except the sth set
– Measure error on the sth set
• Add errors to get cross-validation error for M
– Pick M with lowest cross-validation error (a sketch follows below)

• Leave-one-out cross validation
– Use when data is sparse
– Same as S-fold cross validation, with S = N
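A compact sketch of S-fold cross-validation for choosing M (my own illustration; S, the toy data, and the squared-error measure are assumptions):

    import numpy as np

    rng = np.random.default_rng(2)
    x = rng.uniform(0, 1, size=30)
    t = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(30)

    S = 5
    folds = np.array_split(rng.permutation(len(x)), S)   # S disjoint index sets

    cv_err = {}
    for M in range(1, 10):
        err = 0.0
        for s in range(S):
            test = folds[s]
            train = np.concatenate([folds[r] for r in range(S) if r != s])
            Phi_tr = np.vander(x[train], M + 1, increasing=True)
            w, *_ = np.linalg.lstsq(Phi_tr, t[train], rcond=None)
            Phi_te = np.vander(x[test], M + 1, increasing=True)
            err += np.sum((t[test] - Phi_te @ w)**2)     # add held-out errors
        cv_err[M] = err

    print(min(cv_err, key=cv_err.get))                   # M with lowest CV error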

Page 33

Questions?

Page 34

How are we doing on the pass sequence?

• This fit is pretty good, but…

[Figure: fitted curve (red line) through hand-labeled horizontal coordinate t versus feature x]

– The red line doesn't reveal different levels of uncertainty in predictions
– Cross validation reduced the training data, so the red line isn't as accurate as it should be
– Choosing a particular M and w seems wrong – we should hedge our bets

Page 35