
Neural networks: Basics Convolutional neural networks

Deep Learning II: Neural Networks

Hinrich Schütze

Center for Information and Language Processing, LMU Munich

2017-07-21

Schütze (LMU Munich): Neural networks 1 / 33


Overview

1 Neural networks: Basics

2 Convolutional neural networks




Inverted Classroom

Andrew Ng: “Machine Learning”

http://coursera.org


Neural networks: Andrew Ng videos

Model representation I

Model representation II


A single neuron

Input nodes: $x_1, x_2, x_3$

Parameters/weights: lines connecting nodes

Raw input to neuron: weighted sum $\Theta^T \vec{x} = \sum_{i=1}^{3} \theta_i x_i$

Nonlinear activation function (e.g., sigmoid): $g(\Theta^T \vec{x}) = 1/(1 + \exp(-\Theta^T \vec{x}))$

Output of neuron: $g(\Theta^T \vec{x})$
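A minimal Python sketch of the single neuron described above, assuming plain lists for $\Theta$ and $\vec{x}$ and no bias term (the weighted sum on the slide has none). The function names `sigmoid` and `neuron` are illustrative, not taken from the slides.

```python
import math

def sigmoid(z: float) -> float:
    """Logistic activation g(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(theta: list[float], x: list[float]) -> float:
    """Output of a single neuron: g(Theta^T x)."""
    # Raw input: weighted sum of the inputs
    z = sum(t * xi for t, xi in zip(theta, x))
    # Nonlinear activation applied to the raw input
    return sigmoid(z)

# Three inputs, weights chosen arbitrarily for illustration
print(neuron([0.5, -1.0, 2.0], [1.0, 1.0, 0.5]))
```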


A neuron

Inputs: $x_1, x_2, x_3$

Parameters (= weights = lines): $\theta_1, \theta_2, \theta_3$

Activation function (e.g., sigmoid / logistic)

Hypothesis: $h_\Theta(\vec{x}) = g(\Theta^T \vec{x})$


A neural network

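Input layer (same as before): $x_i$

Hidden layer, here: three neurons

Output layer, here: single neuron

Activations $a^{(k)}_i$, where $k$ is the layer

Full connectivity: every neuron in one layer is connected to every neuron in the next layer

Different layers can use the same or different activation functions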


A neural network


Another neural network architecture


Another neural network architecture


Forward propagation of activity

$a^{(i)}_j = g_i(\Theta_{ij}^T \vec{a}^{(i-1)}) = 1/(1 + \exp(-\Theta_{ij}^T \vec{a}^{(i-1)}))$ (the second equality holds when $g_i$ is the sigmoid)
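Forward propagation can be sketched as a loop over layers. This assumes that `thetas` holds one list of weight vectors per non-input layer (the $j$-th vector playing the role of $\Theta_{ij}$), that every layer uses the sigmoid, and that there are no bias units; the names and example weights are hypothetical.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def forward(thetas: list[list[list[float]]], x: list[float]) -> list[float]:
    """Propagate x through the network; thetas[l][j] is the weight vector
    of neuron j in the (l+1)-th non-input layer."""
    a = list(x)  # a^(1): the input layer's activations are the inputs themselves
    for layer in thetas:
        # a^(i)_j = g(Theta_ij^T a^(i-1)) for every neuron j of this layer
        a = [sigmoid(sum(w * a_prev for w, a_prev in zip(theta_j, a)))
             for theta_j in layer]
    return a

# 3-3-1 network as on the slides (weights chosen arbitrarily for illustration)
hidden = [[0.2, -0.4, 0.1], [0.5, 0.3, -0.2], [-0.3, 0.8, 0.6]]
output = [[1.0, -1.0, 0.5]]
print(forward([hidden, output], [1.0, 0.0, 2.0]))
```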


Learning/Training: Backpropagation

As before: cost function

As before: objective (find parameters that minimize cost)

As before: gradient descent

That is: compute the gradient and move in the opposite direction (downhill)

What’s new: we use backpropagation to compute the gradient.
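The slides do not fix the cost function or spell out the backpropagation equations, so the following is only a sketch under stated assumptions: the 3-3-1 network from the earlier slide, sigmoid activations, no bias units, and the cross-entropy (logistic) cost for a single example with label $y \in \{0, 1\}$. It obtains the gradient by propagating the output error backwards layer by layer; the names `W1`, `w2`, `forward`, `cost`, and `backprop` are hypothetical.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def forward(W1, w2, x):
    """3-3-1 network without bias units: hidden activations h and output y_hat."""
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in W1]
    y_hat = sigmoid(sum(w * hj for w, hj in zip(w2, h)))
    return h, y_hat

def cost(W1, w2, x, y):
    """Cross-entropy (logistic) cost J for one example with label y in {0, 1}."""
    _, y_hat = forward(W1, w2, x)
    return -(y * math.log(y_hat) + (1 - y) * math.log(1 - y_hat))

def backprop(W1, w2, x, y):
    """Return (dJ/dW1, dJ/dw2), computed by propagating the error backwards."""
    h, y_hat = forward(W1, w2, x)
    # Output layer: for sigmoid + cross-entropy, dJ/dz_out simplifies to y_hat - y
    delta_out = y_hat - y
    grad_w2 = [delta_out * hj for hj in h]
    # Hidden layer: send the error back through w2, times the sigmoid derivative h(1 - h)
    delta_hidden = [delta_out * w2[j] * h[j] * (1 - h[j]) for j in range(len(h))]
    grad_W1 = [[d * xi for xi in x] for d in delta_hidden]
    return grad_W1, grad_w2

W1 = [[0.1, -0.2, 0.3], [0.0, 0.5, -0.4], [0.2, 0.2, 0.2]]  # arbitrary example weights
w2 = [0.3, -0.1, 0.4]
grad_W1, grad_w2 = backprop(W1, w2, x=[1.0, 0.5, -1.0], y=1)
```

A gradient-descent step would then update each weight as $w \leftarrow w - \alpha \cdot \partial J / \partial w$ for some learning rate $\alpha$. Comparing the output of `backprop` with finite differences of `cost` is a common way to check such code.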


Gradient descent
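A minimal sketch of gradient descent on a one-parameter toy cost, $J(\theta) = (\theta - 3)^2$, chosen here only for illustration; the learning rate `alpha` and the number of steps are arbitrary assumptions.

```python
def gradient_descent(grad, theta0: float, alpha: float = 0.1, steps: int = 100) -> float:
    """Repeatedly step against the gradient: theta <- theta - alpha * grad(theta)."""
    theta = theta0
    for _ in range(steps):
        theta = theta - alpha * grad(theta)
    return theta

# Toy cost J(theta) = (theta - 3)^2, whose gradient is 2 * (theta - 3); minimum at theta = 3
theta_min = gradient_descent(lambda theta: 2 * (theta - 3), theta0=0.0)
print(theta_min)  # approximately 3.0
```

The step size matters: for this cost each step multiplies the distance to the minimum by $1 - 2\alpha$, so any `alpha` above 1 makes the iterates diverge instead of converging.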


Neurons can be trained to detect features.


Deep learning:

Each layer learns more powerful/abstract features.


Increasingly abstract features in vision


Number of weights/parameters

$|L_1| \cdot |L_2| + |L_2| \cdot |L_3| = 3 \cdot 3 + 3 \cdot 1 = 12$ (not counting bias units)
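The count above generalizes to any fully connected feedforward network: sum $|L_k| \cdot |L_{k+1}|$ over adjacent layers. A small sketch; the `bias` flag (one extra weight per non-input neuron) is an added illustration, not something on the slide.

```python
def count_weights(layer_sizes: list[int], bias: bool = False) -> int:
    """Number of weights in a fully connected feedforward network.

    Without bias units this is sum_k |L_k| * |L_{k+1}|; with bias=True each
    non-input neuron also gets one bias weight (an extension not on the slide).
    """
    return sum((n_in + int(bias)) * n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))

print(count_weights([3, 3, 1]))             # 12, as on the slide
print(count_weights([3, 3, 1], bias=True))  # 16
```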


Exercise: Number of weights/parameters


Task: Does this sentence mention an officer leaving?

Given: A sentence

Workforce Solutions Alamo fired CEO John Hathaway yesterday.

Binary classification task

Class “yes”: This sentence contains information about an officer leaving a company (so a financial analyst should look at it).

Class “no”: This sentence does not contain information about an officer leaving a company (so nobody has to look at it).

Correct class in this case?

Class “yes”: This sentence contains information about an officer leaving a company.

Class “no”: This sentence does not contain information about an officer leaving a company.


Task: Does this sentence mention an officer leaving?

Given: A sentence

CEO John Hathaway fired his gardener yesterday.

Correct class in this case?

Class “yes”: This sentence contains information about an officer leaving a company.

Class “no”: This sentence does not contain information about an officer leaving a company.


Task: Does this sentence mention an officer leaving?

Given: A sentence

This picture shows parting CEO Cook talking with ex-CFO Dyer.

Correct class in this case?

Class “yes”: This sentence contains information about an officer leaving a company.

Class “no”: This sentence does not contain information about an officer leaving a company.


Simple architecture for detecting leaving events


Hypothesis? Parameters? Cost? Objective?


Simplest architecture: Fixed-length input

→ Padding

Each sentence is padded with PAD tokens to a fixed length of 9 positions:

The board forced him to resign PAD PAD PAD

A majority of the board forced him to quit

Eventually the escalation led him to resign PAD PAD

It’s legal threats that compelled him to leave PAD

Key idea of convolution: learn a filter for the pattern “[force] [pronoun] to [leave]” (filter = feature detector)
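A sketch of the padding step, assuming whitespace tokenization, the fixed length of 9 used in the examples above, and truncation of longer sentences (the slides do not say what happens to sentences longer than the fixed length).

```python
PAD = "PAD"

def pad(tokens: list[str], length: int) -> list[str]:
    """Right-pad a token list with PAD to exactly `length` tokens, truncating if longer."""
    # If tokens is longer than length, [PAD] * (negative number) is the empty list
    return tokens[:length] + [PAD] * (length - len(tokens))

for sentence in ["The board forced him to resign",
                 "A majority of the board forced him to quit",
                 "Eventually the escalation led him to resign"]:
    print(" ".join(pad(sentence.split(), 9)))
```

Each printed line reproduces the corresponding padded row from the slide.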


Exercise

If you use this architecture, why is it hard to learn the filter “[force] [pronoun] to [leave]”?


Outline

1 Neural networks: Basics

2 Convolutional neural networks


Use convolution & pooling architecture

Input layer: PAD PAD This picture shows parting CEO Cook talking with ex-CFO Dyer PAD PAD

Convolution layer (filter size 3): applying the convolutional filter $g(H \odot X)$ to each window of three tokens computes one feature value $c_i$ per window:

0.0 0.0 0.0 0.1 0.8 0.6 0.2 0.0 0.7 0.9 0.9 0.1

Max pooling layer: $a = \max_i c_i = 0.9$ selects the max feature.

⇒ Sentence describes “a leaving event”.

Convolution & pooling = compute & select features: convolution computes features, max pooling selects the max feature.
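The max-pooling step can be replayed on the numbers from this slide. The sketch assumes the 12 convolution outputs are listed in window order (14 tokens with filter size 3 give exactly 12 windows); it does not compute the outputs themselves, since the slide gives neither the embeddings nor the filter.

```python
tokens = ["PAD", "PAD", "This", "picture", "shows", "parting", "CEO", "Cook",
          "talking", "with", "ex-CFO", "Dyer", "PAD", "PAD"]
k = 3  # filter size
# Convolution outputs c_i from the slide, assumed to be listed in window order
c = [0.0, 0.0, 0.0, 0.1, 0.8, 0.6, 0.2, 0.0, 0.7, 0.9, 0.9, 0.1]

# Every window of k consecutive tokens: len(tokens) - k + 1 = 12 windows
windows = [tokens[i:i + k] for i in range(len(tokens) - k + 1)]
assert len(windows) == len(c)

# Max pooling: a = max_i c_i; list.index returns the first window reaching it
a = max(c)
best = windows[c.index(a)]
print(a, best)  # 0.9 ['with', 'ex-CFO', 'Dyer']
```

Note that pooling keeps only the strongest response; the position of the window is discarded, which is what makes the detector insensitive to where the pattern occurs.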


Convolution & pooling

Widely used in vision

Recent development: widely used in NLP

Best example of successful transfer from vision to NLP


Convolution and max pooling in vision


Exercise

Try to find a good example of a typical NLP task for which max pooling (i.e., detecting whether or not a particular type of thing occurs in a sentence) is the wrong approach.

(Alternatively, try to find a good example of a typical vision task for which max pooling (i.e., detecting whether or not a particular type of thing occurs in a scene) is the wrong approach.)


Convolutional filter H

$a = g(H \odot X)$

Kernel size $k$: length of the subsequence

$H$ is applied to every subsequence of length $k$.

$X$ is the representation of the subsequence, of dimensionality $D \times k$.

$D$ is the dimensionality of the embeddings.

$H$ also has dimensionality $D \times k$.

$\odot$ is the (Frobenius) inner product: $H \odot X = \sum_{(i,j)} H_{ij} X_{ij}$

$g$: nonlinearity (e.g., sigmoid)
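A plain-Python sketch of applying one filter $H$ as defined above. It assumes the sentence is stored as a $D \times n$ matrix (column $t$ is the embedding of token $t$), stride 1, and no padding inside the function (the sentence is padded beforehand, as on the earlier slides). It yields $n - k + 1$ values, which matches the 12 outputs for 14 tokens and filter size 3 shown earlier. The function names are hypothetical.

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def frobenius(H, X) -> float:
    """Frobenius inner product H ⊙ X: sum over (i, j) of H[i][j] * X[i][j]."""
    return sum(h * x for h_row, x_row in zip(H, X) for h, x in zip(h_row, x_row))

def convolve(H, E, g=sigmoid) -> list[float]:
    """Apply filter H (D x k) to every length-k subsequence of E (D x n).

    Returns one value g(H ⊙ X) per subsequence, i.e. n - k + 1 values.
    """
    k = len(H[0])
    n = len(E[0])
    out = []
    for t in range(n - k + 1):
        X = [row[t:t + k] for row in E]  # D x k slice covering tokens t .. t+k-1
        out.append(g(frobenius(H, X)))
    return out

# Toy example: D = 2, n = 4 tokens, k = 2 (values arbitrary)
E = [[0.1, 0.4, -0.2, 0.3],
     [0.0, 0.5, 0.1, -0.1]]
H = [[1.0, -1.0],
     [0.5, 0.5]]
print(convolve(H, E))  # 3 values, one per subsequence of length 2
```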


Notation

V: vocabulary size
D: embedding dimensionality
C: number of classes
Ci: number of input channels
Co: number of output channels
Ks: kernel sizes
N: minibatch size
W: padded sequence length
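To connect the symbols, here is a hypothetical shape trace for a one-layer convolution & max-pooling text classifier with Co filters per kernel size in Ks. The layout (channel-first tensors, convolution without padding and with stride 1, max pooling over positions, concatenation across kernel sizes, a final layer mapping to C classes) is an assumption; the slide only defines the symbols.

```python
def cnn_shapes(V, D, C, Ci, Co, Ks, N, W):
    """Return the tensor shape after each stage of a hypothetical conv + max-pool classifier."""
    shapes = {
        "embedding table": (V, D),        # one D-dim vector per vocabulary item
        "embedded input": (N, Ci, W, D),  # minibatch of N padded sentences
    }
    for k in Ks:
        # Co filters, each spanning Ci channels x k tokens x D dims:
        # W - k + 1 positions (no padding, stride 1)
        shapes[f"conv k={k}"] = (N, Co, W - k + 1)
        # max pooling over positions keeps one value per filter
        shapes[f"max-pool k={k}"] = (N, Co)
    shapes["concatenated features"] = (N, len(Ks) * Co)
    shapes["class scores"] = (N, C)
    return shapes

for stage, shape in cnn_shapes(V=10000, D=300, C=2, Ci=1, Co=100,
                               Ks=(3, 4, 5), N=32, W=9).items():
    print(stage, shape)
```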
