Adaptive Joint Learning of Compositional and Non-Compositional Phrase Embeddings

Kazuma Hashimoto & Yoshimasa Tsuruoka

University of Tokyo

08/08/2016 ACL 2016 in Berlin, Germany

Session 1F: Noncompositionality

• Embedding meanings of phrases into a vector space

– e.g. Recurrent neural networks

Learning Phrase Embeddings

[Figure: compositional semantics. A composition function maps the word embeddings 𝒗(catch), 𝒗(eye) and 𝒗(grab), 𝒗(attention) to the phrase embeddings 𝒗(catch eye) and 𝒗(grab attention).]

• Not all phrases are compositional

Compositional or Non-Compositional?

[Figure: the phrase "catch eye" has a compositional embedding 𝒄(catch eye), computed from 𝒗(catch) and 𝒗(eye) by the composition function, and a non-compositional embedding 𝒏(catch_eye). Other phrases shown in the space: catch ear, catch e-mail, catch attention, grab attention.]

Selecting one of them is not always the best option!

This Work: Adaptive Joint Learning

[Figure: for a phrase 𝑝, compositionality detection outputs a score 𝛼(𝑝) ∈ [0, 1]. The phrase embedding 𝒗(𝑝) combines the compositional embedding 𝒄(𝑝), weighted by 𝛼(𝑝), with the non-compositional embedding 𝒏(𝑝), weighted by 1 − 𝛼(𝑝).]

State-of-the-art results on

– Compositionality detection for Verb-Object (VO) pairs

– Transitive verb disambiguation

𝑝 = 𝑉𝑂

• Very simple weighted addition of 𝒄(𝑝) and 𝒏(𝑝)

– 𝑝 : phrase

– 𝛼(𝑝) : compositionality score

Adaptive Weighted Addition

𝒗(𝑝) = 𝛼(𝑝) 𝒄(𝑝) + (1 − 𝛼(𝑝)) 𝒏(𝑝)

– 𝒄(𝑝) : compositional embedding

– 𝒏(𝑝) : non-compositional embedding

𝛼(𝑝) = logistic function of (weight vector · feature vector of 𝑝)
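The weighted addition on this slide, a logistic compositionality score that mixes the compositional and non-compositional embeddings, can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: the function names, the three-dimensional feature vector, and all numeric values are assumptions made up for the example.

```python
import math

def logistic(x):
    """Logistic (sigmoid) function, mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def compositionality_score(weights, features):
    """alpha(p) = logistic(weights . features). Names are illustrative."""
    return logistic(sum(w * f for w, f in zip(weights, features)))

def phrase_embedding(alpha, comp, noncomp):
    """v(p) = alpha * c(p) + (1 - alpha) * n(p), computed element-wise."""
    return [alpha * c + (1.0 - alpha) * n for c, n in zip(comp, noncomp)]

# Toy values, invented for illustration only.
weights = [0.5, -1.0, 0.25]   # hypothetical weight vector
features = [1.0, 0.0, 2.0]    # hypothetical feature vector of the phrase
c_p = [0.2, 0.4]              # compositional embedding c(p)
n_p = [1.0, -1.0]             # non-compositional embedding n(p)

alpha = compositionality_score(weights, features)
v_p = phrase_embedding(alpha, c_p, n_p)
```

With 𝛼(𝑝) close to 1 the phrase embedding approaches 𝒄(𝑝), and with 𝛼(𝑝) close to 0 it approaches 𝒏(𝑝), which is the behavior the slide describes.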

• All model parameters are jointly learned according to the objective function for a specific task

– The compositionality levels are automatically adjusted to the task

Overview of the General Training Process

[Figure: a specific task defines the objective function, which is minimized by backpropagation.]

• Co-occurrence-based Subject-Verb-Object (SVO) phrase embeddings (Hashimoto and Tsuruoka, 2015)

– Applying our joint learning method to VO phrases

• 𝑝 = 𝑉𝑂 in our method

Focusing on Verb-Object Compositionality

[Figure: example SVO phrase "importer make payment", with its verb-object part labeled 𝒄(𝑉𝑂).]

• Indices (binary features) of

– Verb

– Object

– Verb-object pair

• Count-based features (Venkatapathy and Joshi, 2005)

– Frequency

– PMI

• Effective in detecting the compositionality of verb-object pairs (McCarthy et al., 2007)
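As a rough illustration of the count-based features, the sketch below computes the frequency and PMI of each verb-object pair from a list of observed pairs. The slides do not say how PMI is estimated, so the unsmoothed maximum-likelihood estimate with a natural logarithm is an assumption, as are the function name and the toy data. The binary index features are not covered here.

```python
import math
from collections import Counter

def vo_count_features(pairs):
    """Return {(verb, obj): (frequency, pmi)} for a list of (verb, obj) tokens."""
    pair_counts = Counter(pairs)
    verb_counts = Counter(verb for verb, _ in pairs)
    obj_counts = Counter(obj for _, obj in pairs)
    total = len(pairs)
    features = {}
    for (verb, obj), freq in pair_counts.items():
        # PMI = log( P(v, o) / (P(v) * P(o)) ), estimated from raw counts.
        pmi = math.log(freq * total / (verb_counts[verb] * obj_counts[obj]))
        features[(verb, obj)] = (freq, pmi)
    return features

# Toy corpus of verb-object pairs, invented for illustration only.
pairs = [("catch", "eye"), ("catch", "eye"), ("catch", "ball"),
         ("make", "payment"), ("make", "eye"), ("grab", "attention")]
feats = vo_count_features(pairs)
```

A pair whose verb and object rarely occur apart gets a high PMI, which is one signal that the pair may behave as a fixed expression.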

Simple Statistical Features for 𝛼(𝑉𝑂)

• Training data

– BNC

• SVO: 1.38M pairs, SVOPN: 0.93M pairs

– Wikipedia

• SVO: 23.6M pairs, SVOPN: 17.3M pairs

– BNC-Wikipedia

• The concatenation of the BNC and Wikipedia data

• 25-dimensional embeddings (and matrices)

– Random initialization

– AdaGrad (Duchi et al., 2011)
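The slides name AdaGrad as the optimizer but give no hyperparameters. The sketch below shows the standard per-parameter AdaGrad update on a toy quadratic objective; the learning rate, the epsilon term, and the toy objective are assumptions chosen for illustration and are not taken from the talk.

```python
import math

def adagrad_update(params, grads, accum, lr=0.05, eps=1e-8):
    """Apply one AdaGrad step in place.

    accum holds the running sum of squared gradients for each parameter,
    so parameters with large past gradients take smaller steps.
    """
    for i, g in enumerate(grads):
        accum[i] += g * g
        params[i] -= lr * g / (math.sqrt(accum[i]) + eps)
    return params, accum

# Toy problem: minimize f(x) = x0^2 + x1^2, whose gradient is 2x.
params = [1.0, -2.0]
accum = [0.0, 0.0]
for _ in range(100):
    grads = [2.0 * p for p in params]
    adagrad_update(params, grads, accum)
```

In the setting of the talk the same kind of update would be applied to the embeddings, the matrices, and the weights of the compositionality score, with gradients obtained by backpropagation.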

Experimental Settings

• Measuring compositionality levels of verb-object pairs

– Human rating: [1, 6]

• A higher score means more compositional

– VJ’05: 765 pairs (Venkatapathy and Joshi, 2005)

– MC’07: 638 pairs (McCarthy et al., 2007)

Evaluation 1: Compositionality Detection

[Figure: the system output 𝛼(𝑉𝑂) is compared with the human rating using Spearman’s rank correlation.]
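The evaluation metric, Spearman’s rank correlation between human ratings and system scores, can be sketched without external libraries as the Pearson correlation of the two rank vectors, with tied values sharing their average rank. The function names and the five data points below are invented for illustration; they are not values from the VJ’05 or MC’07 datasets.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of items tied with the item at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Invented toy data: human ratings on the 1-6 scale and model scores.
human = [6, 5, 2, 1, 4]
alpha = [0.9, 0.7, 0.3, 0.2, 0.8]
rho = spearman(human, alpha)
```

Because only the ordering matters, the score 𝛼(𝑉𝑂) does not need to be on the same scale as the human ratings.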

• Our method outperforms the state-of-the-art supervised and WordNet-based methods

– Without explicit training data for the task

State of the Art Using 𝛼(𝑉𝑂)

[Table: results; annotated rows: ranking based on PMI and frequency; supervised learning with features.]

Trends of 𝛼(𝑉𝑂) during Training

• 𝛼(𝑉𝑂) converges to a level that depends on its corresponding phrase

– Room for improvement when treating light verbs

[Figure: 𝛼(𝑉𝑂) over the course of training, for compositional and non-compositional phrases.]

• The results match our intuition

– 𝛼(𝑉𝑂) is low for verb-object pairs that form idiomatic phrases

Which Verbs are Compositional?

[Annotation: light verbs]

• Measuring similarities of transitive verbs paired with the same subjects and objects

– Human rating: [1, 7]

• A higher score means more similar

Evaluation 2: Verb Disambiguation

Transitive verb disambiguation task (GS’11)

phrase pair                                     human rating   system output
student write name / student spell name         7              0.91
child show sign / child express sign            6              0.85
system meet criterion / system visit criterion  1              -0.1

System output: cosine similarity between SVO embeddings

Evaluation: Spearman’s rank correlation

GS’11: (Grefenstette and Sadrzadeh, 2011)
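The system output for a phrase pair is a cosine score between two SVO embeddings. The scores in the table rise with the human rating, so the sketch below treats it as cosine similarity. The function name and the two-dimensional vectors are assumptions for illustration and are not embeddings from the model.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Invented toy vectors standing in for two SVO phrase embeddings.
svo_a = [3.0, 4.0]
svo_b = [4.0, 3.0]
score = cosine_similarity(svo_a, svo_b)
```

The score lies in [−1, 1] and ignores vector length, so two phrases are compared by the direction of their embeddings only.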

• Outperforming the baseline and other methods

– Our adaptive joint learning method is more effective than just using compositional embeddings

State of the Art Using Joint Embeddings

[Table: results; the baseline uses 𝒗(𝑉𝑂) = 𝒄(𝑉𝑂).]

Examples of Closest Neighbor Embeddings

[Table: closest neighbor embeddings, compared with the baseline 𝒗(𝑉𝑂) = 𝒄(𝑉𝑂); annotations: "should be higher", "capturing idiomatic aspect".]

• Adaptive joint learning of

– Compositional and non-compositional embeddings

– Compositionality detection function

• State-of-the-art results on compositionality detection and transitive verb disambiguation tasks

• Future work:

– Context-sensitive compositionality detection

– General types of phrases

Summary
