
Deep Learning Models

2012-05-03

Byoung-Hee Kim

Biointelligence Lab, CSE,

Seoul National University

NOTE: most slides are from talks of Geoffrey Hinton, Andrew Ng, and Yoshua Bengio.


[Slide figure: a supervised learning example with an input image, the network output, and the target label (“Two!”).]


Artificial Neural Networks


Historical background: first-generation neural networks

Perceptrons (~1960) used a layer of hand-coded features and tried to recognize objects by learning how to weight these features.

There was a neat learning algorithm for adjusting the weights.

But perceptrons are fundamentally limited in what they can learn to do.

[Slide figure: sketch of a typical perceptron from the 1960s. Input units (e.g. pixels) feed a layer of non-adaptive, hand-coded features, which feed output units (e.g. class labels such as “bomb” vs. “toy”).]


Second generation neural networks (~1985)

[Slide figure: a feed-forward network with an input vector, hidden layers, and outputs. Training compares the outputs with the correct answer to get an error signal, then back-propagates that error signal to get derivatives for learning.]
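As a minimal illustrative sketch (not from the slides), the back-propagation loop described above might look like the following for a single hidden layer with sigmoid units and squared error; the data, layer sizes, and learning rate are stand-in assumptions, and biases are omitted for brevity.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
X = rng.random((8, 4))                       # stand-in input vectors
T = rng.random((8, 2))                       # stand-in correct answers (targets)
W1 = rng.normal(0.0, 0.1, (4, 5))            # input -> hidden weights
W2 = rng.normal(0.0, 0.1, (5, 2))            # hidden -> output weights
lr = 0.5                                     # learning rate (arbitrary choice)

for _ in range(1000):
    H = sigmoid(X @ W1)                      # forward pass: hidden layer
    Y = sigmoid(H @ W2)                      # forward pass: outputs
    err = Y - T                              # compare outputs with the correct answer
    dY = err * Y * (1.0 - Y)                 # error signal through the output sigmoid
    dH = (dY @ W2.T) * H * (1.0 - H)         # back-propagate error signal to hidden layer
    W2 -= lr * (H.T @ dY)                    # derivatives used for learning
    W1 -= lr * (X.T @ dH)
```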


However, training models with deep architectures was largely unsuccessful until 2006.


http://www.iro.umontreal.ca/~pift6266/H10/notes/deepintro.html


Agenda

Computer Perception

Unsupervised feature learning

Various deep learning models

Application cases of deep learning models

Handwritten digit recognition/generation (MNIST dataset)

Image classification

Audio recognition

Language modeling

Motion generation

References

Appendix


Brain-like Cognitive Computing & Deep Learning

It is well known that the brain has a hierarchical structure.

Researchers try to build models that simulate and/or act like the brain

Learning deep structures from data, i.e. deep learning, is a new frontier in Artificial Intelligence research.

Researchers try to find analogies between the characteristics of the brain and their deep models


Feature Learning

[Slide figure: raw input space. Motorbike and “non”-motorbike images are plotted directly on pixel intensities (axes “pixel 1” and “pixel 2”) and fed to a learning algorithm.]


Feature Learning

[Slide figure: the same motorbike vs. “non”-motorbike data, but a feature extractor first maps the input space into a feature space with axes “wheel” and “handle”; the learning algorithm then operates on these features, where the classes separate much more cleanly.]


How is computer perception done?

Image → low-level vision features → recognition (object detection)

Audio → low-level audio features → speaker identification, audio classification

Helicopter state → low-level features → action (helicopter control)


Learning representations

Sensor data → learning algorithm → feature representation


Computer vision features

SIFT, Spin image, HoG, RIFT, Textons, GLOH

Audio features

ZCR, Spectrogram, MFCC, Rolloff, Flux


Problems of hand-tuned features

Require expert knowledge

Sub-optimal

Time-consuming and expensive to design

Do not generalize to other domains


Can we automatically learn good feature representations?

Sensor representation in the brain

[BrainPort; Martinez et al; Roe et al.]

Examples: seeing with your tongue (BrainPort), human echolocation (sonar), and auditory cortex that learns to see.

[Slide figure: location of the auditory cortex.]


Unsupervised Feature Learning

Find a better way to represent images than pixels


The goal of Unsupervised Feature Learning

Unlabeled images → learning algorithm → feature representation


Stochastic binary units (Bernoulli variables)

These have a state of 1 or 0.

The probability of turning on is determined by the weighted input from other units (plus a bias):

$p(s_i = 1) = \dfrac{1}{1 + \exp\left(-b_i - \sum_j s_j w_{ji}\right)}$

[Slide figure: the logistic sigmoid curve of $p(s_i = 1)$ against the total input $b_i + \sum_j s_j w_{ji}$, rising from 0 to 1.]
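A minimal sketch of sampling such a unit, directly implementing the logistic probability above; the function name and example weights are illustrative assumptions.

```python
import numpy as np

def sample_binary_unit(s, w_i, b_i, rng=np.random.default_rng()):
    """Sample the state (0 or 1) of one stochastic binary unit i.

    s   : 0/1 states of the other units (vector)
    w_i : weights w_ji from those units into unit i
    b_i : bias of unit i
    """
    p_on = 1.0 / (1.0 + np.exp(-(b_i + s @ w_i)))   # logistic probability of turning on
    return int(rng.random() < p_on)

# Example: three other units, illustrative weights and bias.
print(sample_binary_unit(np.array([1, 0, 1]), np.array([0.5, -1.0, 0.2]), b_i=-0.1))
```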


Binary Stochastic Neuron

(C) 2012, SNU Biointelligence Lab, http://bi.snu.ac.kr/ 44


A model of digit recognition

[Slide figure: network architecture. A 28 x 28 pixel image feeds two hidden layers of 500 neurons each; the top 500-neuron layer, together with 10 label neurons, connects to 2000 top-level neurons.]

The model learns to generate combinations of labels and images. To perform recognition we start with a neutral state of the label units and do an up-pass from the image, followed by a few iterations of the top-level associative memory.

The top two layers form an associative memory whose energy landscape models the low-dimensional manifolds of the digits. The energy valleys have names.
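A hedged sketch of that recognition procedure follows. The layer sizes come from the slide; the weight matrices, biases, and the use of plain sigmoids for the label units (instead of a softmax) are assumptions of this illustration, not the authors' code.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def recognize(image, W1, b1, W2, b2, W_top, b_top, b_pen, n_iters=10):
    """Hypothetical recognition pass for the digit model.

    image : flattened 28*28 pixel vector
    W1    : 784 x 500 weights,  W2 : 500 x 500 weights
    W_top : 510 x 2000 weights from (500 hidden + 10 label) units to the top level
    """
    h1 = sigmoid(image @ W1 + b1)                # up-pass to first 500-unit layer
    h2 = sigmoid(h1 @ W2 + b2)                   # up-pass to second 500-unit layer
    labels = np.full(10, 0.1)                    # neutral state of the label units
    for _ in range(n_iters):                     # iterations of the associative memory
        pen = np.concatenate([h2, labels])
        top = sigmoid(pen @ W_top + b_top)       # up into the 2000 top-level units
        down = sigmoid(top @ W_top.T + b_pen)    # back down; h2 stays clamped
        labels = down[-10:]                      # only the label units are updated
    return int(np.argmax(labels))                # most probable digit
```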


Generation & Recognition of Digits by DBN

Deep belief network that learns to generate handwritten digits

http://www.cs.toronto.edu/~hinton/digits.html


First stage of visual processing in brain: V1

The first stage of visual processing in the brain (V1) does “edge detection.”

[Slide figure: schematic of a simple cell and an actual simple cell receptive field, which resemble “Gabor functions.” Images from DeAngelis, Ohzawa & Freeman, 1995.]

Sparse coding illustration

[Slide figure: natural images and the 64 learned bases (f1, …, f64), which look like oriented “edges.”]

A test example x is then represented as a sparse combination of a few bases:

x ≈ 0.8 * f36 + 0.3 * f42 + 0.5 * f63

so the feature representation is

[a1, …, a64] = [0, 0, …, 0, 0.8, 0, …, 0, 0.3, 0, …, 0, 0.5, …]

which is compact and easily interpretable.
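A minimal, self-contained sketch of how such a sparse code could be computed for a test patch, given an already learned dictionary of bases. It uses plain iterative soft-thresholding (ISTA) for the L1-penalized reconstruction; the dictionary, penalty weight, and iteration count are assumptions for illustration, not the method used on the slides.

```python
import numpy as np

def sparse_code(x, F, lam=0.1, n_iters=200):
    """Find a sparse vector a with x ≈ F @ a, via iterative soft-thresholding (ISTA).

    x : image patch flattened to a length-d vector
    F : dictionary of learned bases, shape (d, k), columns f1 ... fk
    """
    a = np.zeros(F.shape[1])
    step = 1.0 / (np.linalg.norm(F, 2) ** 2)              # safe step size (1 / Lipschitz constant)
    for _ in range(n_iters):
        grad = F.T @ (F @ a - x)                          # gradient of 0.5 * ||x - F a||^2
        a = a - step * grad
        a = np.sign(a) * np.maximum(np.abs(a) - step * lam, 0.0)   # soft-threshold -> sparsity
    return a

# Tiny example with random stand-in data: 64 bases for 256-pixel patches.
rng = np.random.default_rng(0)
F = rng.normal(size=(256, 64))
x = rng.normal(size=256)
a = sparse_code(x, F)
print("non-zero activations:", np.count_nonzero(a))
```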

Supervised learning: labeled training images of cars and motorcycles; at test time the question is “what is this?”

Semi-supervised learning: the same labeled sets plus unlabeled images that are all cars or motorcycles.

Self-taught learning: the same labeled sets plus unlabeled images drawn at random from the internet (not necessarily cars or motorcycles).

Self-taught learning pipeline: learn bases f1, f2, …, fk from the unlabeled images with sparse coding, LCC, etc.; then use the learned f1, f2, …, fk to represent the labeled car/motorcycle training and test sets, turning each image into activations a1, a2, …, ak (a sketch of this step follows).
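A minimal sketch of the classification step on top of such learned features. The activation matrices here are random stand-ins for the a1, …, ak codes that the sparse_code sketch above would produce for each labeled image, and the choice of logistic regression is an assumption for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Random stand-ins for the sparse activations a1 ... a64 of each labeled image,
# with 0/1 labels (0 = car, 1 = motorcycle).
rng = np.random.default_rng(0)
A_train, y_train = rng.random((100, 64)), rng.integers(0, 2, 100)
A_test, y_test = rng.random((20, 64)), rng.integers(0, 2, 20)

clf = LogisticRegression(max_iter=1000).fit(A_train, y_train)   # classifier on learned features
print("test accuracy:", clf.score(A_test, y_test))
```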

Convolutional DBN for Images

Convolutional DBN on face images: the learned feature hierarchy goes from pixels, to edges, to object parts (combinations of edges), to object models.

Learning of object parts: [Slide figure: examples of learned object parts from several object categories (faces, cars, elephants, chairs).]

Training on multiple objects: trained on 4 classes (cars, faces, motorbikes, airplanes). The second layer learns shared features and object-specific features; the third layer learns more specific features. [Slide figure: plot of H(class | neuron active).]
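To make the notion of a convolutional feature layer concrete, here is a minimal, generic sketch (not the convolutional DBN itself) of convolving an image with a small filter bank, rectifying, and max-pooling the responses; the image and filter values are stand-ins.

```python
import numpy as np
from scipy.signal import convolve2d

def conv_layer(image, filters, pool=2):
    """Convolve an image with each filter, rectify, and max-pool the response maps."""
    maps = []
    for f in filters:
        r = np.maximum(convolve2d(image, f, mode="valid"), 0.0)   # rectified response map
        h, w = r.shape
        r = r[:h - h % pool, :w - w % pool]                       # crop so pooling tiles evenly
        r = r.reshape(h // pool, pool, w // pool, pool).max(axis=(1, 3))   # max pooling
        maps.append(r)
    return np.stack(maps)

# Stand-in data: a random "image" and two hand-made edge-like filters.
image = np.random.default_rng(0).random((28, 28))
filters = [np.array([[1.0, -1.0], [1.0, -1.0]]),    # vertical-edge-like filter
           np.array([[1.0, 1.0], [-1.0, -1.0]])]    # horizontal-edge-like filter
print(conv_layer(image, filters).shape)             # (2, 13, 13)
```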

Hierarchical probabilistic inference: generating posterior samples from faces by “filling in” experiments (cf. Lee and Mumford, 2003); combine bottom-up and top-down inference.

[Slide figure: input images, samples from feedforward inference (control), and samples from full posterior inference.]

An application to modeling motion capture data (Taylor, Roweis & Hinton, 2007)

Human motion can be captured by placing reflective markers on the joints and then using lots of infrared cameras to track the 3-D positions of the markers.

Given a skeletal model, the 3-D positions of the markers can be converted into the joint angles plus 6 parameters that describe the 3-D position and the roll, pitch and yaw of the pelvis. We only represent changes in yaw, because physics doesn’t care about its value and we want to avoid circular variables.


Video lecture: http://videolectures.net/gesturerecognition2011_taylor_tutorial/

Motion Generation by Conditional RBM

Hinton’s talk at Google:

http://www.youtube.com/watch?v=VdIURAu1-aU

Andrew Ng’s talk at the Bay Area Vision Meeting: Unsupervised Feature Learning and Deep Learning

http://www.youtube.com/watch?v=ZmNOAtZIgIk&feature=relmfu


References

General Info on Deep Learning

http://deeplearning.net/

Review

Y. Bengio, Learning deep architectures for AI, Foundations and Trends in Machine Learning, 2(1):1-127, 2009.

I. Arel, D. C. Rose, and T. P. Karnowski, Deep machine learning – a new frontier in artificial intelligence research, IEEE Computational Intelligence Magazine, 5(4):13-18, 2010.


References

Tutorials & Workshops

Deep Learning and Unsupervised Feature Learning workshop – NIPS 2010: http://deeplearningworkshopnips2010.wordpress.com/schedule/acceptedpapers/

Workshop on Learning Feature Hierarchies - ICML 2009: http://www.cs.toronto.edu/~rsalakhu/deeplearning/index.html

(C) 2012, SNU Biointelligence Lab, http://bi.snu.ac.kr/