
Deep Learning Models

2012-05-03

Byoung-Hee Kim

Biointelligence Lab, CSE,

Seoul National University

NOTE: most slides are from talks of Geoffrey Hinton, Andrew Ng, and Yoshua Bengio.


[Slide figure: a supervised learning example with an input image, the network output, and the target label (“Two!”).]


Artificial Neural Networks


Historical background: first-generation neural networks

Perceptrons (~1960) used a layer of hand-coded features and tried to recognize objects by learning how to weight these features.

There was a neat learning algorithm for adjusting the weights.

But perceptrons are fundamentally limited in what they can learn to do.

[Slide figure: sketch of a typical perceptron from the 1960s. Input units (e.g. pixels) feed a layer of non-adaptive, hand-coded features, which feed output units (e.g. class labels such as “bomb” vs. “toy”).]


Second generation neural networks (~1985)

[Slide figure: a feed-forward network with an input vector, hidden layers, and outputs. Training compares the outputs with the correct answer to get an error signal, then back-propagates that error signal to get derivatives for learning.]
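As a minimal illustrative sketch (not from the slides), the back-propagation loop described above might look like the following for a single hidden layer with sigmoid units and squared error; the data, layer sizes, and learning rate are stand-in assumptions, and biases are omitted for brevity.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
X = rng.random((8, 4))                       # stand-in input vectors
T = rng.random((8, 2))                       # stand-in correct answers (targets)
W1 = rng.normal(0.0, 0.1, (4, 5))            # input -> hidden weights
W2 = rng.normal(0.0, 0.1, (5, 2))            # hidden -> output weights
lr = 0.5                                     # learning rate (arbitrary choice)

for _ in range(1000):
    H = sigmoid(X @ W1)                      # forward pass: hidden layer
    Y = sigmoid(H @ W2)                      # forward pass: outputs
    err = Y - T                              # compare outputs with the correct answer
    dY = err * Y * (1.0 - Y)                 # error signal through the output sigmoid
    dH = (dY @ W2.T) * H * (1.0 - H)         # back-propagate error signal to hidden layer
    W2 -= lr * (H.T @ dY)                    # derivatives used for learning
    W1 -= lr * (X.T @ dH)
```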


However, training models with deep architectures was largely unsuccessful until 2006.


http://www.iro.umontreal.ca/~pift6266/H10/notes/deepintro.html


Agenda

Computer Perception

Unsupervised feature learning

Various deep learning models

Application cases of deep learning models

Handwritten digit recognition/generation (MNIST dataset)

Image classification

Audio recognition

Language modeling

Motion generation

References

Appendix


Brain-like Cognitive Computing & Deep Learning

It is well known that the brain has a hierarchical structure.

Researchers try to build models that simulate and/or act like the brain

Learning deep structures from data, i.e. deep learning, is a new frontier in Artificial Intelligence research.

Researchers try to find analogies between the characteristics of the brain and their deep models


Feature Learning

[Slide figure: raw input space. Motorbike and “non”-motorbike images are plotted directly on pixel intensities (axes “pixel 1” and “pixel 2”) and fed to a learning algorithm.]


Feature Learning

[Slide figure: the same motorbike vs. “non”-motorbike data, but a feature extractor first maps the input space into a feature space with axes “wheel” and “handle”; the learning algorithm then operates on these features, where the classes separate much more cleanly.]


How is computer perception done?

Image → low-level vision features → recognition (object detection)

Audio → low-level audio features → speaker identification, audio classification

Helicopter state → low-level features → action (helicopter control)


Learning representations

Sensor data → learning algorithm → feature representation


Computer vision features

SIFT, Spin image, HoG, RIFT, Textons, GLOH

Audio features

ZCR, Spectrogram, MFCC, Rolloff, Flux


Problems of hand-tuned features

Require expert knowledge

Sub-optimal

Time-consuming and expensive to design

Do not generalize to other domains


Can we automatically learn good feature representations?

Sensor representation in the brain

[BrainPort; Martinez et al; Roe et al.]

Examples: seeing with your tongue (BrainPort), human echolocation (sonar), and auditory cortex that learns to see.

[Slide figure: location of the auditory cortex.]


Unsupervised Feature Learning

Find a better way to represent images than pixels


The goal of Unsupervised Feature Learning

Unlabeled images → learning algorithm → feature representation


Stochastic binary units (Bernoulli variables)

These have a state of 1 or 0.

The probability of turning on is determined by the weighted input from other units (plus a bias):

$p(s_i = 1) = \dfrac{1}{1 + \exp\left(-b_i - \sum_j s_j w_{ji}\right)}$

[Slide figure: the logistic sigmoid curve of $p(s_i = 1)$ against the total input $b_i + \sum_j s_j w_{ji}$, rising from 0 to 1.]
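A minimal sketch of sampling such a unit, directly implementing the logistic probability above; the function name and example weights are illustrative assumptions.

```python
import numpy as np

def sample_binary_unit(s, w_i, b_i, rng=np.random.default_rng()):
    """Sample the state (0 or 1) of one stochastic binary unit i.

    s   : 0/1 states of the other units (vector)
    w_i : weights w_ji from those units into unit i
    b_i : bias of unit i
    """
    p_on = 1.0 / (1.0 + np.exp(-(b_i + s @ w_i)))   # logistic probability of turning on
    return int(rng.random() < p_on)

# Example: three other units, illustrative weights and bias.
print(sample_binary_unit(np.array([1, 0, 1]), np.array([0.5, -1.0, 0.2]), b_i=-0.1))
```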


Binary Stochastic Neuron

(C) 2012, SNU Biointelligence Lab, http://bi.snu.ac.kr/ 44


A model of digit recognition

[Slide figure: network architecture. A 28 x 28 pixel image feeds two hidden layers of 500 neurons each; the top 500-neuron layer, together with 10 label neurons, connects to 2000 top-level neurons.]

The model learns to generate combinations of labels and images. To perform recognition we start with a neutral state of the label units and do an up-pass from the image, followed by a few iterations of the top-level associative memory.

The top two layers form an associative memory whose energy landscape models the low-dimensional manifolds of the digits. The energy valleys have names.
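A hedged sketch of that recognition procedure follows. The layer sizes come from the slide; the weight matrices, biases, and the use of plain sigmoids for the label units (instead of a softmax) are assumptions of this illustration, not the authors' code.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def recognize(image, W1, b1, W2, b2, W_top, b_top, b_pen, n_iters=10):
    """Hypothetical recognition pass for the digit model.

    image : flattened 28*28 pixel vector
    W1    : 784 x 500 weights,  W2 : 500 x 500 weights
    W_top : 510 x 2000 weights from (500 hidden + 10 label) units to the top level
    """
    h1 = sigmoid(image @ W1 + b1)                # up-pass to first 500-unit layer
    h2 = sigmoid(h1 @ W2 + b2)                   # up-pass to second 500-unit layer
    labels = np.full(10, 0.1)                    # neutral state of the label units
    for _ in range(n_iters):                     # iterations of the associative memory
        pen = np.concatenate([h2, labels])
        top = sigmoid(pen @ W_top + b_top)       # up into the 2000 top-level units
        down = sigmoid(top @ W_top.T + b_pen)    # back down; h2 stays clamped
        labels = down[-10:]                      # only the label units are updated
    return int(np.argmax(labels))                # most probable digit
```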


Generation & Recognition of Digits by DBN

Deep belief network that learns to generate handwritten digits

http://www.cs.toronto.edu/~hinton/digits.html


First stage of visual processing in brain: V1

The first stage of visual processing in the brain (V1) does “edge detection.”

[Slide figure: schematic of a simple cell and an actual simple cell receptive field, which resemble “Gabor functions.” Images from DeAngelis, Ohzawa & Freeman, 1995.]

Sparse coding illustration

[Slide figure: natural images and the 64 learned bases (f1, …, f64), which look like oriented “edges.”]

A test example x is then represented as a sparse combination of a few bases:

x ≈ 0.8 * f36 + 0.3 * f42 + 0.5 * f63

so the feature representation is

[a1, …, a64] = [0, 0, …, 0, 0.8, 0, …, 0, 0.3, 0, …, 0, 0.5, …]

which is compact and easily interpretable.
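A minimal, self-contained sketch of how such a sparse code could be computed for a test patch, given an already learned dictionary of bases. It uses plain iterative soft-thresholding (ISTA) for the L1-penalized reconstruction; the dictionary, penalty weight, and iteration count are assumptions for illustration, not the method used on the slides.

```python
import numpy as np

def sparse_code(x, F, lam=0.1, n_iters=200):
    """Find a sparse vector a with x ≈ F @ a, via iterative soft-thresholding (ISTA).

    x : image patch flattened to a length-d vector
    F : dictionary of learned bases, shape (d, k), columns f1 ... fk
    """
    a = np.zeros(F.shape[1])
    step = 1.0 / (np.linalg.norm(F, 2) ** 2)              # safe step size (1 / Lipschitz constant)
    for _ in range(n_iters):
        grad = F.T @ (F @ a - x)                          # gradient of 0.5 * ||x - F a||^2
        a = a - step * grad
        a = np.sign(a) * np.maximum(np.abs(a) - step * lam, 0.0)   # soft-threshold -> sparsity
    return a

# Tiny example with random stand-in data: 64 bases for 256-pixel patches.
rng = np.random.default_rng(0)
F = rng.normal(size=(256, 64))
x = rng.normal(size=256)
a = sparse_code(x, F)
print("non-zero activations:", np.count_nonzero(a))
```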

Supervised learning: labeled training images of cars and motorcycles; at test time the question is “what is this?”

Semi-supervised learning: the same labeled sets plus unlabeled images that are all cars or motorcycles.

Self-taught learning: the same labeled sets plus unlabeled images drawn at random from the internet (not necessarily cars or motorcycles).

Self-taught learning pipeline: learn bases f1, f2, …, fk from the unlabeled images with sparse coding, LCC, etc.; then use the learned f1, f2, …, fk to represent the labeled car/motorcycle training and test sets, turning each image into activations a1, a2, …, ak (a sketch of this step follows).
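A minimal sketch of the classification step on top of such learned features. The activation matrices here are random stand-ins for the a1, …, ak codes that the sparse_code sketch above would produce for each labeled image, and the choice of logistic regression is an assumption for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Random stand-ins for the sparse activations a1 ... a64 of each labeled image,
# with 0/1 labels (0 = car, 1 = motorcycle).
rng = np.random.default_rng(0)
A_train, y_train = rng.random((100, 64)), rng.integers(0, 2, 100)
A_test, y_test = rng.random((20, 64)), rng.integers(0, 2, 20)

clf = LogisticRegression(max_iter=1000).fit(A_train, y_train)   # classifier on learned features
print("test accuracy:", clf.score(A_test, y_test))
```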

Convolutional DBN for Images

Convolutional DBN on face images: the learned feature hierarchy goes from pixels, to edges, to object parts (combinations of edges), to object models.

Learning of object parts: [Slide figure: examples of learned object parts from several object categories (faces, cars, elephants, chairs).]

Training on multiple objects: trained on 4 classes (cars, faces, motorbikes, airplanes). The second layer learns shared features and object-specific features; the third layer learns more specific features. [Slide figure: plot of H(class | neuron active).]
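To make the notion of a convolutional feature layer concrete, here is a minimal, generic sketch (not the convolutional DBN itself) of convolving an image with a small filter bank, rectifying, and max-pooling the responses; the image and filter values are stand-ins.

```python
import numpy as np
from scipy.signal import convolve2d

def conv_layer(image, filters, pool=2):
    """Convolve an image with each filter, rectify, and max-pool the response maps."""
    maps = []
    for f in filters:
        r = np.maximum(convolve2d(image, f, mode="valid"), 0.0)   # rectified response map
        h, w = r.shape
        r = r[:h - h % pool, :w - w % pool]                       # crop so pooling tiles evenly
        r = r.reshape(h // pool, pool, w // pool, pool).max(axis=(1, 3))   # max pooling
        maps.append(r)
    return np.stack(maps)

# Stand-in data: a random "image" and two hand-made edge-like filters.
image = np.random.default_rng(0).random((28, 28))
filters = [np.array([[1.0, -1.0], [1.0, -1.0]]),    # vertical-edge-like filter
           np.array([[1.0, 1.0], [-1.0, -1.0]])]    # horizontal-edge-like filter
print(conv_layer(image, filters).shape)             # (2, 13, 13)
```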

Hierarchical probabilistic inference: generating posterior samples from faces by “filling in” experiments (cf. Lee and Mumford, 2003); combine bottom-up and top-down inference.

[Slide figure: input images, samples from feedforward inference (control), and samples from full posterior inference.]

An application to modeling motion capture data (Taylor, Roweis & Hinton, 2007)

Human motion can be captured by placing reflective markers on the joints and then using lots of infrared cameras to track the 3-D positions of the markers.

Given a skeletal model, the 3-D positions of the markers can be converted into the joint angles plus 6 parameters that describe the 3-D position and the roll, pitch and yaw of the pelvis. We only represent changes in yaw, because physics doesn’t care about its value and we want to avoid circular variables.


Video lecture: http://videolectures.net/gesturerecognition2011_taylor_tutorial/

Motion Generation by Conditional RBM

Hinton’s talk at Google:

http://www.youtube.com/watch?v=VdIURAu1-aU

Andrew Ng’s talk at the Bay Area Vision Meeting: Unsupervised Feature Learning and Deep Learning

http://www.youtube.com/watch?v=ZmNOAtZIgIk&feature=relmfu


References

General Info on Deep Learning

http://deeplearning.net/

Review

Y. Bengio, Learning deep architectures for AI, Foundations and Trends in Machine Learning, 2(1):1-127, 2009.

I. Arel, D. C. Rose, and T. P. Karnowski, Deep machine learning – a new frontier in artificial intelligence research, IEEE Computational Intelligence Magazine, 5(4):13-18, 2010.


References

Tutorials & Workshops

Deep Learning and Unsupervised Feature Learning workshop – NIPS 2010: http://deeplearningworkshopnips2010.wordpress.com/schedule/acceptedpapers/

Workshop on Learning Feature Hierarchies - ICML 2009: http://www.cs.toronto.edu/~rsalakhu/deeplearning/index.html

(C) 2012, SNU Biointelligence Lab, http://bi.snu.ac.kr/