Deep Learning II: Neural Networkscnn.pdf · Neural networks: Basics Convolutional neural networks...

Neural networks: Basics Convolutional neural networks

Deep Learning II: Neural Networks

Hinrich Schutze

Center for Information and Language Processing, LMU Munich

2017-07-21

Schutze (LMU Munich): Neural networks 1 / 33

Overview

1 Neural networks: Basics

2 Convolutional neural networks

Outline

Inverted ClassroomAndrew Ng: “Machine Learning”

http://coursera.org

Neural networks: Andrew Ng videos

Model representation I

Model representation II

A single neuron

Input nodes: x1, x2, x3

Parameters/weights: lines connecting nodes

Raw input to neuron: weighted sum ΘT~x =∑3

i=1 θixi

Nonlinear activation function (e.g., sigmoid):g(ΘT~x) = 1/(1 + exp(−ΘT~x))

Output of neuron: g(ΘT~x)

A neuron

Inputs: x1, x2, x3

Parameters (= weights = lines): θ1, θ2, θ3

Activation function (e.g., sigmoid / logistic)

Hypothesis: hΘ(~x) = (ΘT~x)

A neural network

Input layer (same as before): xi

Hidden layer, here: three neurons

Output layer, here: single neuron

Activations a(k)i , k = layer

Full connectivity

Same or different activation functions

A neural network

Another neural network architecture

Forward propagation of activity

a(i)j = gi (Θ

Tij ~a

(i−1)) = 1/(1 + exp(−ΘTij ~a

(i−1)))

Learning/Training: Backpropagation

As before: cost function

As before: objective(find parameters that minimize cost)

As before: gradient descent

That is: compute gradient and move along gradient

What’s new:We use backpropagation to compute the gradient.

Gradient descent

Neurons can be trained to detect features.

Deep learning:

Each layer learns more powerful/abstract features.

Increasingly abstract features in vision

Number of weights/parameters

|L1| ∗ |L2|+ |L2| ∗ |L3| = 3 · 3 + 3 · 1 = 12

Exercise: Number of weights/parameters

Task: Does this sentence mention an officer leaving?

Given: A sentence

Workforce Solutions Alamo fired CEO John Hathaway yesterday.

Binary classification task

Class “yes”: This sentence contains information about an officerleaving a company (so a financial analyst should look at it).Class “no”: This sentence does not contain information about anofficer leaving a company (so nobody has to look at it).

Correct class in this case?

Class “yes”: This sentence contains information about an officerleaving a company.Class “no”: This sentence does not contain information about anofficer leaving a company.

Given: A sentence

CEO John Hathaway fired his gardener yesterday.

Given: A sentence

This picture shows parting CEO Cook talking with ex-CFO Dyer.

Simple architecture for detecting leaving events

Hypothesis? Parameters? Cost? Objective?

Simplest architecture: Fixed-length input

→ Padding

1 2 3 4 5 6 7 8 9The board forced him to resign PAD PAD PADA majority of the board forced him to quit

Eventually the escalation led him to resign PAD PADIt’s legal threats that compelled him to leave PAD

key idea of convolution:learn a filter for the pattern“[force] [pronoun] to [leave]”filter = feature detector

Exercise

If you use this architecture, why is it hard to learn the filter “[force][pronoun] to [leave]”?

Outline

Use convolution&pooling architecture Input layer

Convolution layer Convolution layer (filter size 3)

Max pooling layer Applying convolutional filterMax pooling ⇒ Sentence describes “a leaving

event”. Convolution computes features. Maxpooling selects max feature.

convolution&pooling=compute&select features

0.0 0.0 0.0 0.1 0.8 0.6 0.2 0.0 0.7 0.9 0.9 0.1

partin

talking

a = g(H ⊙ X )a = maxici

g(H⊙X )g(H⊙X )g(H⊙X )g(H⊙X )g(H⊙X )g(H⊙X )g(H⊙X )g(H⊙X )g(H⊙X )g(H⊙X )g(H⊙X )g(H⊙X )

max pooling

poolinglayer

selectsmax feature

convolutionlayer

computesfeatures

inputlayer

Convolution & pooling

Widely used in vision

Recent development: widely used in NLP

Best example of successful transferfrom vision to NLP

Convolution and max pooling in vision

Exercise

Try to find a good example of a typical NLP task for which maxpooling (i.e., detecting whether or not a particular type of thingoccurs in a sentence) is the wrong approach.

(Alternatively, try to find a good example of a typical vision taskfor which max pooling (i.e., detecting whether or not a particulartype of thing occurs in a scene) is the wrong approach.)

Convolutional filter H

a = g(H ⊙ X )

Kernel size k : length of subsequence

H is applied to every subsequence of length k .

X is the representation of the subsequence,of dimensionality D × k .

D is the dimensionality of the embeddings.

H also has dimensionality D × k .

⊙ is the (Frobenius) inner product:H ⊙ X =

∑(i ,j)HijXij

g : nonlinearity (e.g., sigmoid)

Notation

V vocabulary sizeD embedding dimensionalityC number of classesC i number of input channelsC o number of output channelsKs kernel sizesN minibatch sizeW padded sequence length

Deep Learning II: Neural Networkscnn.pdf · Neural networks: Basics Convolutional neural networks...

Documents

Transcript of Deep Learning II: Neural Networkscnn.pdf · Neural networks: Basics Convolutional neural networks...

Convolutional Neural Networks for Sentence Classi …yoonkim/data/sent-cnn-slides.pdfConvolutional Neural Networks for Sentence Classi cation Word Embeddings Deep learning in Natural

ARTIFICIAL NEURAL NETWORKS. Introduction to Neural Networks.

Convolutional Neural Network for Sentence Classification

Neural network for Neural Networks

Convolutional Neural Networks for Language · Convolutional Neural Networks for Language Features from text Example: Sentiment classification The goal: Is the sentiment of a sentence

NEURAL NETWORKS,CELLULAR NEURAL NETWORKS AND ADAPTIVE FUZZY FILTERS

CS536: Machine Learning Artificial Neural Networks Neural Networks

Neural Networks Part II Feed-Forward Neural Networks

Neural Networks: Old and New · Arti cial neural networks Brain neural networks Credit: Max Pixel Arti cial neural networks Why called arti cial? {(Over-)simpli cation on neural level

Neural and Fuzzy Neural Networks

Deep Parametric Continuous Convolutional Neural Networks€¦ · Graph Neural Networks: Graph neural networks (GNNs) [25] are generalizations of neural networks to graph structured

Artiﬁcial Neural Networks and Fuzzy Neural Networks for ...

Multi-Perspective Sentence Similarity Modeling with ...kgimpel/papers/he+etal.emnlp15.poster.pdf · Multi-Perspective Sentence Similarity Modeling with Convolutional Neural Networks

Neural Networks · Neural Networks NEURAL NETWORKS by Christos Stergiou and Dimitrios Siganos Abstract This report is an introduction to Artificial Neural Networks. The various types

SENTENCE ORDERING USING R NEURAL NETWORKS

JYC: CSM17 Bioinformatics Week 9: Simulations #3: Neural Networks biological neurons natural neural networks artificial neural networks applications.

Neural Networks Chapter 8. 8.1 Feed-Forward Neural Networks.

Higher Order Neural Networks and Neural Networks for ...

Neural Networks: Backpropagation - sviveksvivek.com/.../fall2018/slides/neural-networks/neural-networks-backpropagation.pdfNeural Networks: Backpropagation 1 Based on slides and material

Neural Networks Neural Networks based on Competition CHAPTER 4.