Source: bjerva.github.io/talks/higher_seminar_081216.pdf (Johannes Bjerva, j.bjerva@rug.nl, 08/12/2016)

Johannes Bjerva — j.bjerva@rug.nl — 08/12/2016

Semantic Analysis with Deep Neural Networks


Deep Learning Overview


Why Deep Learning?


What is Machine Learning?

“[Machine Learning] gives computers the ability to learn without being explicitly programmed” (Arthur Samuel, 1959)

Annotated data: Task: ?


What is Machine Learning?

❖ Task: Part-of-Speech tagging
❖ Performance: e.g. accuracy
❖ Data: Annotated corpus

Demo!

[Diagram labels: Model, Train, Predict, ?]
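The task / performance / data framing above can be sketched with a deliberately simple model. This is not the tagger demonstrated in the talk: it is a most-frequent-tag baseline, and the tiny corpus, the function names and the fallback rule for unseen words are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

def train_tagger(sentences):
    """Train: remember the most frequent tag per word, plus a fallback tag."""
    tag_counts = defaultdict(Counter)
    all_tags = Counter()
    for sentence in sentences:
        for word, tag in sentence:
            tag_counts[word.lower()][tag] += 1
            all_tags[tag] += 1
    model = {word: counts.most_common(1)[0][0] for word, counts in tag_counts.items()}
    fallback = all_tags.most_common(1)[0][0]  # used for words never seen in training
    return model, fallback

def predict(model, fallback, words):
    """Predict: look each word up, falling back to the overall most frequent tag."""
    return [model.get(word.lower(), fallback) for word in words]

def accuracy(model, fallback, sentences):
    """Performance: fraction of tokens whose predicted tag matches the annotation."""
    correct = total = 0
    for sentence in sentences:
        words = [word for word, _ in sentence]
        gold = [tag for _, tag in sentence]
        predicted = predict(model, fallback, words)
        correct += sum(p == g for p, g in zip(predicted, gold))
        total += len(gold)
    return correct / total

# Data: a made-up annotated corpus (not from the talk).
train_data = [
    [("These", "DET"), ("cats", "NOUN"), ("live", "VERB"), ("in", "ADP"),
     ("that", "DET"), ("house", "NOUN"), (".", "PUNCT")],
    [("That", "DET"), ("dog", "NOUN"), ("sees", "VERB"), ("cats", "NOUN"), (".", "PUNCT")],
]
test_data = [
    [("These", "DET"), ("dogs", "NOUN"), ("live", "VERB"), ("here", "ADV"), (".", "PUNCT")],
]

model, fallback = train_tagger(train_data)
print(accuracy(model, fallback, test_data))
```

The unseen word "here" is tagged with the fallback tag and counted as an error, which is the kind of gap that better features or a learned representation are meant to close.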


What does the computer learn from?

Features!

❖ Hand-coded
❖ Time consuming
❖ Not necessarily effective

❖ Word and character n-grams
❖ Relevant linguistic properties (e.g. affixes, capitalisation, root form)

[Diagram labels: Model, Train, Predict, ?; example features: Colours, Number of corners]
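As a rough illustration of what hand-coded features look like in practice, the sketch below extracts a few of the properties listed above for a single token. The feature names and the choice of n = 3 are assumptions, and the root form is left out because it would need a lemmatiser.

```python
def token_features(token, n=3):
    """Hand-coded features for one token (an illustrative, hypothetical feature set)."""
    lower = token.lower()
    padded = f"<{lower}>"  # mark word boundaries so affix n-grams are distinguishable
    features = {
        "word": lower,
        "is_capitalised": token[:1].isupper(),
        "prefix": lower[:n],
        "suffix": lower[-n:],
    }
    # Character n-grams over the padded token.
    for i in range(len(padded) - n + 1):
        features[f"char_{n}gram={padded[i:i + n]}"] = True
    return features

print(token_features("Walks"))
```

Every one of these features has to be designed and maintained by hand, which is the cost the slide points at.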


What is a neural network?

❖ Biologically inspired

1. Take an input
2. Learn feature representations
3. Predict output
4. Self-correct if output is wrong
5. Repeat!

[Diagram inputs: Colour, Number of corners]
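The five steps can be sketched with the smallest possible network, a single neuron (a perceptron). This is a simplification and not the model used in the talk: there are no hidden layers, so step 2 shrinks to learning one weight per input, whereas deep networks learn intermediate representations via backpropagation. The shape data and all names are made up for illustration.

```python
def predict(weights, bias, inputs):
    """3. Predict output: 1 if the weighted sum is positive, else 0."""
    score = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if score > 0 else 0

def train_perceptron(examples, max_epochs=2000, learning_rate=1.0):
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(max_epochs):                    # 5. Repeat!
        mistakes = 0
        for inputs, target in examples:            # 1. Take an input
            error = target - predict(weights, bias, inputs)
            if error != 0:                         # 4. Self-correct if output is wrong
                # 2. The "learning" here is just adjusting one weight per input.
                weights = [w + learning_rate * error * x
                           for w, x in zip(weights, inputs)]
                bias += learning_rate * error
                mistakes += 1
        if mistakes == 0:                          # every example is classified correctly
            break
    return weights, bias

# Hypothetical data: (is_red, number_of_corners) -> 1 for triangle, 0 for square.
shapes = [((1.0, 3.0), 1), ((0.0, 3.0), 1), ((1.0, 4.0), 0), ((0.0, 4.0), 0)]

weights, bias = train_perceptron(shapes)
print([predict(weights, bias, inputs) for inputs, _ in shapes])
```

Because the two classes are linearly separable, the perceptron update is guaranteed to stop making mistakes after a finite number of corrections; the colour feature carries no signal here, so what separates the classes is the weight on the number of corners together with the bias.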


What can Deep Learning do?

❖ Make psychedelic images!
❖ Google QuickDraw: https://quickdraw.withgoogle.com
❖ Text-to-speech: https://deepmind.com/blog/wavenet-generative-model-raw-audio/
❖ Generate hand-writing: http://www.cs.toronto.edu/~graves/handwriting.cgi
❖ Currently the most successful approach to many NLP problems


Deep Learning for everything?

❖ Not a silver bullet
❖ Simple problems do not require fancy methods


Deep Learning and the Human Brain

❖ In Computational Linguistics: Not an attempt to model the brain

❖ Some inspiration is useful (ReLU)


Why are neural networks back?

❖ More computational power (GPUs)
❖ More data
❖ Better algorithms/architectures


Semantic Analysis with Deep Neural Networks


Semantic Analysis

❖ Parallel Meaning Bank: http://pmb.let.rug.nl/explorer/explore.php
❖ English, Dutch, German, Italian
❖ Goal: Parallel corpus with Discourse Representation Structures for all languages
❖ About 11 million tokens


Chapter I: Semantic Tagging

❖ Multilingual Semantic Parsing
❖ Experimenting with different Neural Network architectures


Semantic Tags — Motivation

❖ POS tags: insufficient and irrelevant information
  ❖ Insufficient:
    ❖ every (DT / univ. quant.)
    ❖ no (DT / neg.)
    ❖ some (DT / exist. quant.)
  ❖ Irrelevant:
    ❖ walks (VBZ / pres. simpl.)
    ❖ walk (VBP / pres. simpl.)


Semantic Tags — Example

Tokens:   These  cats  live  in   that  house  .
Sem-tags: PRX    CON   ENS   REL  DST   CON    NIL
UD-POS:   DET    NOUN  VERB  ADP  DET   NOUN   PUNCT


Semantic Tags — Overview

❖ About 75 tags

❖ Abstract over POS and NE tags

❖ Includes categories for negation, modality and quantification

❖ Generalises over languages (en, de, nl, it)


Auxiliary tasks

❖ Giving the NN more work to do

❖ Informing the NN of what additional task might be helpful to learn

[Diagram: Input → Neural Network → outputs POS and SEM]

[Diagram: Input → Neural Network → outputs POS and f(w)]

❖ Word frequencies for POS tagging

❖ This work: Semantic tags for POS tagging

[Plank et al., 2016]
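The auxiliary-task idea can be sketched as one shared network with two output layers whose losses are added. This is only a forward pass under stated assumptions: the layer sizes, the random weights, the input vector and the `aux_weight` parameter are all hypothetical, and the talk does not specify the architecture or how the two losses are weighted.

```python
import math
import random

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def linear(layer, inputs):
    """Apply one fully connected layer: weights times inputs, plus bias."""
    weights, bias = layer
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, bias)]

def random_layer(rng, n_out, n_in):
    """Create a layer with small random weights and zero bias."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def multitask_loss(net, inputs, pos_gold, sem_gold, aux_weight=1.0):
    """Cross-entropy of the main task (POS) plus the weighted auxiliary task (SEM)."""
    hidden = [max(0.0, h) for h in linear(net["shared"], inputs)]  # shared ReLU layer
    pos_probs = softmax(linear(net["pos"], hidden))  # main-task output
    sem_probs = softmax(linear(net["sem"], hidden))  # auxiliary-task output
    loss = -math.log(pos_probs[pos_gold]) - aux_weight * math.log(sem_probs[sem_gold])
    return loss, pos_probs, sem_probs

rng = random.Random(0)
# Hypothetical sizes: 4 input features, 5 hidden units, 3 POS tags, 6 semantic tags.
net = {
    "shared": random_layer(rng, 5, 4),
    "pos": random_layer(rng, 3, 5),
    "sem": random_layer(rng, 6, 5),
}
word_vector = [0.2, -0.1, 0.4, 0.0]  # made-up input representation of one word
loss, pos_probs, sem_probs = multitask_loss(net, word_vector, pos_gold=1, sem_gold=4)
print(round(loss, 3))
```

Because both outputs read from the same hidden layer, training on the summed loss pushes that layer towards representations that are useful for both tasks, which is the sense in which the auxiliary task "informs" the network.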


Results

[Bjerva et al., 2016]

Chapter II: Multitask Learning

When and why does Multitask Learning help?


When does MTL help?

❖ “[…] when the label distribution is compact and uniform”

❖ That is: high entropy, few labels
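The "high entropy, few labels" criterion can be made concrete with Shannon entropy over a label distribution. The sketch below is a plain-Python illustration; the label sequences are made up and the function name is an assumption.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of the empirical label distribution."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

uniform = ["A", "B", "C", "D"] * 2   # four labels, equally frequent
skewed = ["A"] * 7 + ["B"]           # two labels, one dominant

print(entropy(uniform))  # 2 bits: the maximum for four labels
print(entropy(skewed))   # well below 1 bit
```

For a fixed number of labels, entropy is highest when the labels are equally frequent, which is what "compact and uniform" amounts to.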


Is ‘high entropy’ sufficient?

Tokens:   These  cats  live  in   that  house  .
Sem-tags: PRX    CON   ENS   REL  DST   CON    NIL
UD-POS:   DET    NOUN  VERB  ADP  DET   NOUN   PUNCT


Is ‘high entropy’ sufficient?

Tokens:   These  cats  live  in   that  house  .
Sem-tags: CON    NIL   DST   PRX  ENS   REL    CON
UD-POS:   DET    NOUN  VERB  ADP  DET   NOUN   PUNCT


Tagset correlations


Information-theoretic Measures

❖ Calculating tagset correlations:
  ❖ Conditional Entropy
  ❖ Mutual Information
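Both measures can be sketched on the seven-token example sentence from the earlier slides, once with the aligned semantic tags and once with the same tags in shuffled order. Computing them over a single sentence is a toy setting; the talk's figures would be computed over a corpus, and which direction of conditioning it uses is not stated, so the choice below is an assumption.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of the empirical label distribution."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def conditional_entropy(ys, xs):
    """H(Y | X) = H(X, Y) - H(X): uncertainty left in Y once X is known."""
    return entropy(list(zip(xs, ys))) - entropy(xs)

def mutual_information(xs, ys):
    """I(X; Y) = H(X) + H(Y) - H(X, Y): information shared by X and Y."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))

pos = "DET NOUN VERB ADP DET NOUN PUNCT".split()
sem = "PRX CON ENS REL DST CON NIL".split()
sem_shuffled = "CON NIL DST PRX ENS REL CON".split()

# Shuffling keeps the tag counts, so the entropy of the tagset is unchanged...
print(entropy(sem), entropy(sem_shuffled))
# ...but the shuffled tags tell us less about the POS tags.
print(mutual_information(sem, pos), mutual_information(sem_shuffled, pos))
print(conditional_entropy(pos, sem), conditional_entropy(pos, sem_shuffled))
```

In the aligned version every semantic tag determines the POS tag, so the conditional entropy is zero; after shuffling, CON co-occurs with two different POS tags and some uncertainty remains. Entropy alone cannot see this difference.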


Correlation with Auxiliary Task Effectiveness

Conditional Entropy and Mutual Information both correlate far better than entropy!


Change in accuracy (x) vs. Entropy (y)


Change in accuracy (x) vs. Mutual Information (y)


Remaining chapters

❖ Chapter III: Multilingual Learning
❖ Chapter IV: Semantic Similarity between Words and Sentences (SemEval Shared Tasks)
❖ Chapter V: Dataset Augmentation
