
Adaptive Joint Learning of Compositional and Non-Compositional Phrase Embeddings

Kazuma Hashimoto & Yoshimasa Tsuruoka

University of Tokyo

08/08/2016 ACL 2016 in Berlin, Germany

Session 1F: Noncompositionality

Learning Phrase Embeddings

• Embedding meanings of phrases into a vector space

– e.g. Recurrent neural networks

[Figure: compositional semantics. A composition function maps the word embeddings 𝒗(catch), 𝒗(eye) and 𝒗(grab), 𝒗(attention) to the phrase embeddings 𝒗(catch eye) and 𝒗(grab attention).]

Compositional or Non-Compositional?

• Not all phrases are compositional

[Figure: two candidate embeddings for "catch eye": the compositional embedding 𝒄(catch eye), produced by the composition function from 𝒗(catch) and 𝒗(eye), and the non-compositional embedding 𝒏(catch_eye). Phrases shown around them: "catch ear", "catch e-mail", "catch attention", "grab attention".]

Selecting one of them is not always the best option!

This Work: Adaptive Joint Learning

[Figure: a phrase 𝑝 is passed to compositionality detection, which outputs a score 𝛼(𝑝) ∈ [0, 1]. The phrase embedding 𝒗(𝑝) combines the compositional embedding 𝒄(𝑝), weighted by 𝛼(𝑝), with the non-compositional embedding 𝒏(𝑝), weighted by 1 − 𝛼(𝑝).]

State-of-the-art results (with 𝑝 = 𝑉𝑂) on

– Compositionality detection for Verb-Object (VO) pairs

– Transitive verb disambiguation

Adaptive Weighted Addition

• Very simple weighted addition of 𝒄(𝑝) and 𝒏(𝑝)

– 𝑝 : phrase

– 𝛼(𝑝) : compositionality score

𝒗(𝑝) = 𝛼(𝑝) 𝒄(𝑝) + (1 − 𝛼(𝑝)) 𝒏(𝑝)

𝛼(𝑝) = σ(β · φ(𝑝))

where 𝒄(𝑝) is the compositional embedding, 𝒏(𝑝) the non-compositional embedding, σ the logistic function, β the weight vector, and φ(𝑝) the feature vector
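The weighted addition on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names and the toy numbers are assumptions, and real embeddings would have 25 dimensions rather than 2.

```python
import math

def logistic(x):
    """Standard logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def compositionality_score(weights, features):
    """alpha(p): logistic function of the dot product of weight and feature vectors."""
    return logistic(sum(w * f for w, f in zip(weights, features)))

def phrase_embedding(alpha, comp, noncomp):
    """v(p) = alpha * c(p) + (1 - alpha) * n(p), computed element-wise."""
    return [alpha * c + (1.0 - alpha) * n for c, n in zip(comp, noncomp)]

# Hypothetical toy values, not taken from the talk.
weights = [0.5, -1.0]
features = [2.0, 1.0]   # dot product is 0.0, so alpha is 0.5
comp = [1.0, 0.0]       # stand-in for the compositional embedding c(p)
noncomp = [0.0, 1.0]    # stand-in for the non-compositional embedding n(p)

alpha = compositionality_score(weights, features)
v = phrase_embedding(alpha, comp, noncomp)
print(alpha, v)  # 0.5 [0.5, 0.5]
```

With alpha close to 1 the phrase embedding follows the compositional embedding, and with alpha close to 0 it follows the non-compositional one.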

Overview of the General Training Process

• All model parameters are jointly learned according to the objective function for a specific task

– The compositionality levels are automatically adjusted to the task

[Figure: a specific task defines the objective function, which is minimized by backpropagation.]
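To see how a task objective can adjust the compositionality score, the sketch below backpropagates a toy loss through the weighted addition into the weight vector of the scoring function. The squared-error loss, the target vector, the learning rate, and all names are assumptions made for illustration; the talk does not specify the objective at this point, and here only the weight vector is updated while the two embeddings stay fixed.

```python
import math

def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def forward(beta, phi, comp, noncomp):
    """Return (alpha, v) for v = alpha * c + (1 - alpha) * n."""
    alpha = logistic(dot(beta, phi))
    v = [alpha * c + (1.0 - alpha) * n for c, n in zip(comp, noncomp)]
    return alpha, v

def loss(v, target):
    """Placeholder task objective: half squared error to a target vector."""
    return 0.5 * sum((x - t) ** 2 for x, t in zip(v, target))

def grad_beta(beta, phi, comp, noncomp, target):
    """Chain rule: dL/dbeta = (dL/dv . (c - n)) * alpha * (1 - alpha) * phi."""
    alpha, v = forward(beta, phi, comp, noncomp)
    dl_dv = [x - t for x, t in zip(v, target)]
    dl_dalpha = dot(dl_dv, [c - n for c, n in zip(comp, noncomp)])
    scale = dl_dalpha * alpha * (1.0 - alpha)
    return [scale * f for f in phi]

# Hypothetical toy setup: the "task" prefers the non-compositional embedding.
beta = [0.1, -0.2]
phi = [1.0, 0.5]
comp, noncomp = [1.0, 0.0], [0.0, 1.0]
target = [0.0, 1.0]

for _ in range(200):
    g = grad_beta(beta, phi, comp, noncomp, target)
    beta = [b - 0.5 * gi for b, gi in zip(beta, g)]  # plain gradient descent

alpha_after, _ = forward(beta, phi, comp, noncomp)
print(alpha_after)  # starts at 0.5 and moves toward 0 during training
```

Because the target coincides with the non-compositional embedding, gradient descent pushes the score down, which is the kind of task-driven adjustment the slide describes.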

Focusing on Verb-Object Compositionality

• Co-occurrence-based Subject-Verb-Object (SVO) phrase embeddings (Hashimoto and Tsuruoka, 2015)

– Applying our joint learning method to VO phrases

• 𝑝 = 𝑉𝑂 in our method

[importer make payment]

Simple Statistical Features for 𝛼(𝑉𝑂)

𝒄(𝑉𝑂)

• Indices (binary features) of

– Verb

– Object

– Verb-object pair

• Count-based features (Venkatapathy and Joshi, 2005)

– Frequency

– PMI

• Effective in detecting the compositionality of verb-object pairs (McCarthy et al., 2007)
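A feature vector of this kind could be assembled as in the sketch below. The counts are invented, the sparse-dictionary layout and feature names are hypothetical, and the talk does not say how the frequency is scaled, so the raw count is used here as an assumption. PMI is computed with its standard definition, log P(v, o) / (P(v) P(o)).

```python
import math
from collections import Counter

# Hypothetical verb-object counts; real counts would come from a parsed corpus.
pair_counts = Counter({
    ("catch", "eye"): 8,
    ("catch", "ball"): 2,
    ("make", "payment"): 6,
    ("make", "ball"): 4,
})

total = sum(pair_counts.values())
verb_counts = Counter()
object_counts = Counter()
for (verb, obj), n in pair_counts.items():
    verb_counts[verb] += n
    object_counts[obj] += n

def pmi(verb, obj):
    """Pointwise mutual information: log of P(v, o) / (P(v) * P(o))."""
    p_pair = pair_counts[(verb, obj)] / total
    p_verb = verb_counts[verb] / total
    p_obj = object_counts[obj] / total
    return math.log(p_pair / (p_verb * p_obj))

def vo_features(verb, obj):
    """Sparse feature dict: three binary index features plus two count-based ones."""
    return {
        "verb=" + verb: 1.0,
        "object=" + obj: 1.0,
        "pair=" + verb + "_" + obj: 1.0,
        "frequency": float(pair_counts[(verb, obj)]),  # raw count (assumed scaling)
        "pmi": pmi(verb, obj),
    }

print(vo_features("catch", "eye"))
```

In this toy corpus "catch eye" has a positive PMI (the pair occurs more often than chance predicts) while "catch ball" has a negative one.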

Experimental Settings

• Training data

– BNC

• SVO: 1.38M pairs, SVOPN: 0.93M pairs

– Wikipedia

• SVO: 23.6M pairs, SVOPN: 17.3M pairs

– BNC-Wikipedia

• The concatenation of the BNC and Wikipedia data

• 25-dimensional embeddings (and matrices)

– Random initialization

– AdaGrad (Duchi et al., 2011)
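For readers unfamiliar with AdaGrad, the update rule can be sketched as follows. The learning rate, the epsilon constant, and the toy quadratic objective are assumptions chosen for illustration; the slides do not give the hyperparameters used in the experiments.

```python
import math

def adagrad_step(params, grads, accum, lr=0.1, eps=1e-8):
    """One AdaGrad update: each step is scaled by the accumulated squared gradients."""
    for i, g in enumerate(grads):
        accum[i] += g * g
        params[i] -= lr * g / (math.sqrt(accum[i]) + eps)
    return params, accum

# Toy objective f(x) = sum(x_i ** 2), whose gradient is 2 * x.
params = [1.0, -2.0]
accum = [0.0, 0.0]
for _ in range(100):
    grads = [2.0 * p for p in params]
    adagrad_step(params, grads, accum)

print(params)  # both entries have moved toward 0
```

Each parameter keeps its own accumulator, so parameters that receive large or frequent gradients take smaller steps over time.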

Evaluation 1: Compositionality Detection

• Measuring compositionality levels of verb-object pairs

– Human rating: [1, 6]

• Higher score: more compositional

– VJ’05: 765 pairs (Venkatapathy and Joshi, 2005)

– MC’07: 638 pairs (McCarthy et al., 2007)

[Figure: Spearman’s rank correlation between the human rating and the system output 𝛼(𝑉𝑂).]
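The evaluation metric, Spearman's rank correlation, is the Pearson correlation of the two rank vectors. The sketch below implements it in plain Python with average ranks for ties; the ratings and scores are made-up examples, not data from VJ’05 or MC’07.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def spearman(xs, ys):
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    return pearson(average_ranks(xs), average_ranks(ys))

human = [5.5, 1.5, 4.0, 2.0]   # hypothetical human ratings in [1, 6]
system = [0.9, 0.2, 0.6, 0.1]  # hypothetical compositionality scores
print(spearman(human, system))  # approximately 0.8
```

Only the ordering of the scores matters, so the system output does not need to be on the same scale as the human ratings.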

State of the Art Using 𝛼(𝑉𝑂)

• Our method outperforms the state-of-the-art supervised and WordNet-based methods

– Without explicit training data for the task

[Table not recovered; surviving row annotations: "Ranking based on PMI and Frequency", "Supervised learning with features".]

Trends of 𝛼(𝑉𝑂) during the Training

• 𝛼(𝑉𝑂) converges to a value that depends on the phrase

– Room for improvement when treating light verbs

[Figure: 𝛼(𝑉𝑂) during training, with the labels "Compositional" and "Non-Compositional".]

Which Verbs are Compositional?

• The results match our intuition

– 𝛼(𝑉𝑂) is low for verb-object pairs that form idiomatic phrases

[Table not recovered; surviving annotation: "Light verbs".]

Evaluation 2: Verb Disambiguation

• Measuring similarities of transitive verbs paired with the same subjects and objects

– Human rating: [1, 7]

• Higher score: more similar

Transitive verb disambiguation task (GS’11) (Grefenstette and Sadrzadeh, 2011)

phrase pair | human rating | system output
student write name / student spell name | 7 | 0.91
child show sign / child express sign | 6 | 0.85
system meet criterion / system visit criterion | 1 | -0.1

System output: cosine similarity between SVO embeddings, evaluated by Spearman’s rank correlation with the human ratings
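The system output in this task is the cosine between two SVO embeddings, where a higher value means the two phrases are more similar. A minimal sketch follows; the 3-dimensional vectors are invented for illustration and are not the learned 25-dimensional embeddings.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between u and v; ranges from -1 to 1."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical SVO embeddings, not values from the talk.
student_write_name = [0.8, 0.1, 0.3]
student_spell_name = [0.7, 0.2, 0.4]
system_visit_criterion = [-0.2, 0.9, -0.1]

print(cosine_similarity(student_write_name, student_spell_name))      # close to 1
print(cosine_similarity(student_write_name, system_visit_criterion))  # below 0
```

Scores computed this way for every phrase pair would then be compared with the human ratings by rank correlation.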

State of the Art Using Joint Embeddings

• Outperforming the baseline and other methods

– Our adaptive joint learning method is more effective than just using compositional embeddings

[Table not recovered; the baseline uses 𝒗(𝑉𝑂) = 𝒄(𝑉𝑂).]

Examples of Closest Neighbor Embeddings

[Table not recovered; surviving annotations: "should be higher", "capturing idiomatic aspect", "Baseline of 𝒗(𝑉𝑂) = 𝒄(𝑉𝑂)".]

Summary

• Adaptive joint learning of

– Compositional and non-compositional embeddings

– Compositionality detection function

• State-of-the-art results on compositionality detection and transitive verb disambiguation tasks

• Future work:

– Context-sensitive compositionality detection

– General types of phrases
