Deep Learning II: Neural Networkscnn.pdf · Neural networks: Basics Convolutional neural networks...

Post on 19-Aug-2020

23 views 0 download

Transcript of Deep Learning II: Neural Networkscnn.pdf · Neural networks: Basics Convolutional neural networks...

Neural networks: Basics Convolutional neural networks

Deep Learning II: Neural Networks

Hinrich Schutze

Center for Information and Language Processing, LMU Munich

2017-07-21

Schutze (LMU Munich): Neural networks 1 / 33

Neural networks: Basics Convolutional neural networks

Overview

1 Neural networks: Basics

2 Convolutional neural networks

Schutze (LMU Munich): Neural networks 2 / 33

Neural networks: Basics Convolutional neural networks

Outline

1 Neural networks: Basics

2 Convolutional neural networks

Schutze (LMU Munich): Neural networks 3 / 33

Neural networks: Basics Convolutional neural networks

Inverted ClassroomAndrew Ng: “Machine Learning”

http://coursera.org

Schutze (LMU Munich): Neural networks 4 / 33

Neural networks: Basics Convolutional neural networks

Neural networks: Andrew Ng videos

Model representation I

Model representation II

Schutze (LMU Munich): Neural networks 5 / 33

Neural networks: Basics Convolutional neural networks

A single neuron

Input nodes: x1, x2, x3

Parameters/weights: lines connecting nodes

Raw input to neuron: weighted sum ΘT~x =∑3

i=1 θixi

Nonlinear activation function (e.g., sigmoid):g(ΘT~x) = 1/(1 + exp(−ΘT~x))

Output of neuron: g(ΘT~x)

Schutze (LMU Munich): Neural networks 6 / 33

Neural networks: Basics Convolutional neural networks

A neuron

Inputs: x1, x2, x3

Parameters (= weights = lines): θ1, θ2, θ3

Activation function (e.g., sigmoid / logistic)

Hypothesis: hΘ(~x) = (ΘT~x)

Schutze (LMU Munich): Neural networks 7 / 33

Neural networks: Basics Convolutional neural networks

A neural network

Input layer (same as before): xi

Hidden layer, here: three neurons

Output layer, here: single neuron

Activations a(k)i , k = layer

Full connectivity

Same or different activation functions

Schutze (LMU Munich): Neural networks 8 / 33

Neural networks: Basics Convolutional neural networks

A neural network

Schutze (LMU Munich): Neural networks 9 / 33

Neural networks: Basics Convolutional neural networks

Another neural network architecture

Schutze (LMU Munich): Neural networks 10 / 33

Neural networks: Basics Convolutional neural networks

Another neural network architecture

Schutze (LMU Munich): Neural networks 11 / 33

Neural networks: Basics Convolutional neural networks

Forward propagation of activity

a(i)j = gi (Θ

Tij ~a

(i−1)) = 1/(1 + exp(−ΘTij ~a

(i−1)))

Schutze (LMU Munich): Neural networks 12 / 33

Neural networks: Basics Convolutional neural networks

Learning/Training: Backpropagation

As before: cost function

As before: objective(find parameters that minimize cost)

As before: gradient descent

That is: compute gradient and move along gradient

What’s new:We use backpropagation to compute the gradient.

Schutze (LMU Munich): Neural networks 13 / 33

Neural networks: Basics Convolutional neural networks

Gradient descent

Schutze (LMU Munich): Neural networks 14 / 33

Neural networks: Basics Convolutional neural networks

Neurons can be trained to detect features.

Schutze (LMU Munich): Neural networks 15 / 33

Neural networks: Basics Convolutional neural networks

Deep learning:

Each layer learns more powerful/abstract features.

Schutze (LMU Munich): Neural networks 16 / 33

Neural networks: Basics Convolutional neural networks

Increasingly abstract features in vision

Schutze (LMU Munich): Neural networks 17 / 33

Neural networks: Basics Convolutional neural networks

Number of weights/parameters

|L1| ∗ |L2|+ |L2| ∗ |L3| = 3 · 3 + 3 · 1 = 12

Schutze (LMU Munich): Neural networks 18 / 33

Neural networks: Basics Convolutional neural networks

Exercise: Number of weights/parameters

Schutze (LMU Munich): Neural networks 19 / 33

Neural networks: Basics Convolutional neural networks

Task: Does this sentence mention an officer leaving?

Given: A sentence

Workforce Solutions Alamo fired CEO John Hathaway yesterday.

Binary classification task

Class “yes”: This sentence contains information about an officerleaving a company (so a financial analyst should look at it).Class “no”: This sentence does not contain information about anofficer leaving a company (so nobody has to look at it).

Correct class in this case?

Class “yes”: This sentence contains information about an officerleaving a company.Class “no”: This sentence does not contain information about anofficer leaving a company.

Schutze (LMU Munich): Neural networks 20 / 33

Neural networks: Basics Convolutional neural networks

Task: Does this sentence mention an officer leaving?

Given: A sentence

CEO John Hathaway fired his gardener yesterday.

Correct class in this case?

Class “yes”: This sentence contains information about an officerleaving a company.Class “no”: This sentence does not contain information about anofficer leaving a company.

Schutze (LMU Munich): Neural networks 21 / 33

Neural networks: Basics Convolutional neural networks

Task: Does this sentence mention an officer leaving?

Given: A sentence

This picture shows parting CEO Cook talking with ex-CFO Dyer.

Correct class in this case?

Class “yes”: This sentence contains information about an officerleaving a company.Class “no”: This sentence does not contain information about anofficer leaving a company.

Schutze (LMU Munich): Neural networks 22 / 33

Neural networks: Basics Convolutional neural networks

Simple architecture for detecting leaving events

Schutze (LMU Munich): Neural networks 23 / 33

Neural networks: Basics Convolutional neural networks

Hypothesis? Parameters? Cost? Objective?

Schutze (LMU Munich): Neural networks 24 / 33

Neural networks: Basics Convolutional neural networks

Simplest architecture: Fixed-length input

→ Padding

1 2 3 4 5 6 7 8 9The board forced him to resign PAD PAD PADA majority of the board forced him to quit

Eventually the escalation led him to resign PAD PADIt’s legal threats that compelled him to leave PAD

key idea of convolution:learn a filter for the pattern“[force] [pronoun] to [leave]”filter = feature detector

Schutze (LMU Munich): Neural networks 25 / 33

Neural networks: Basics Convolutional neural networks

Exercise

If you use this architecture, why is it hard to learn the filter “[force][pronoun] to [leave]”?

Schutze (LMU Munich): Neural networks 26 / 33

Neural networks: Basics Convolutional neural networks

Outline

1 Neural networks: Basics

2 Convolutional neural networks

Schutze (LMU Munich): Neural networks 27 / 33

Neural networks: Basics Convolutional neural networks

Use convolution&pooling architecture Input layer

Convolution layer Convolution layer (filter size 3)

Max pooling layer Applying convolutional filterMax pooling ⇒ Sentence describes “a leaving

event”. Convolution computes features. Maxpooling selects max feature.

convolution&pooling=compute&select features

0.9

0.0 0.0 0.0 0.1 0.8 0.6 0.2 0.0 0.7 0.9 0.9 0.1

PAD

PAD

This

pictu

re

shows

partin

g

CEO

Cook

talking

with

ex-C

FO

Dyer

PAD

PAD

a = g(H ⊙ X )a = maxici

g(H⊙X )g(H⊙X )g(H⊙X )g(H⊙X )g(H⊙X )g(H⊙X )g(H⊙X )g(H⊙X )g(H⊙X )g(H⊙X )g(H⊙X )g(H⊙X )

max pooling

poolinglayer

selectsmax feature

convolutionlayer

computesfeatures

inputlayer

Schutze (LMU Munich): Neural networks 28 / 33

Neural networks: Basics Convolutional neural networks

Convolution & pooling

Widely used in vision

Recent development: widely used in NLP

Best example of successful transferfrom vision to NLP

Schutze (LMU Munich): Neural networks 29 / 33

Neural networks: Basics Convolutional neural networks

Convolution and max pooling in vision

Schutze (LMU Munich): Neural networks 30 / 33

Neural networks: Basics Convolutional neural networks

Exercise

Try to find a good example of a typical NLP task for which maxpooling (i.e., detecting whether or not a particular type of thingoccurs in a sentence) is the wrong approach.

(Alternatively, try to find a good example of a typical vision taskfor which max pooling (i.e., detecting whether or not a particulartype of thing occurs in a scene) is the wrong approach.)

Schutze (LMU Munich): Neural networks 31 / 33

Neural networks: Basics Convolutional neural networks

Convolutional filter H

a = g(H ⊙ X )

Kernel size k : length of subsequence

H is applied to every subsequence of length k .

X is the representation of the subsequence,of dimensionality D × k .

D is the dimensionality of the embeddings.

H also has dimensionality D × k .

⊙ is the (Frobenius) inner product:H ⊙ X =

∑(i ,j)HijXij

g : nonlinearity (e.g., sigmoid)

Schutze (LMU Munich): Neural networks 32 / 33

Neural networks: Basics Convolutional neural networks

Notation

V vocabulary sizeD embedding dimensionalityC number of classesC i number of input channelsC o number of output channelsKs kernel sizesN minibatch sizeW padded sequence length

Schutze (LMU Munich): Neural networks 33 / 33