Introduction to Distributional Semantics

André Freitas, Insight Centre for Data Analytics. Insight Workshop on Distributional Semantics, Galway, 2014. Based on the Great ESSLLI Tutorial from Evert & Lenci.


Page 1: Introduction to Distributional Semantics

Introduction to Distributional Semantics

André Freitas Insight Centre for Data Analytics

Insight Workshop on Distributional Semantics

Galway, 2014

Based on the Great ESSLLI Tutorial from Evert & Lenci

Page 2: Introduction to Distributional Semantics

Outline

- Contemporary Semantics
- Distributional Semantics
- Compositional-Distributional Semantics
- Take-away message

Page 3: Introduction to Distributional Semantics

Contemporary Semantics

Page 4: Introduction to Distributional Semantics

Shift in the Semantics Landscape

Corroboration

Praxis

Scientific / Formal

Philosophical

Semantics as a complex phenomenon

Page 5: Introduction to Distributional Semantics

Semantics for a Complex World

• Most semantic models have dealt with particular types of constructions, and have been carried out under very simplifying assumptions, in true lab conditions.

• If these idealizations are removed it is not clear at all that modern semantics can give a full account of all but the simplest models/statements.

Sahlgren, 2013

Formal World Real World

Baroni et al., 2012

Page 6: Introduction to Distributional Semantics

What is Distributional Semantics?

Page 7: Introduction to Distributional Semantics

Meaning

Word meaning is usually represented in terms of some formal, symbolic structure, either external or internal to the word

External structure

- Associations between different concepts

Internal structure

- Feature (property, attribute) lists

The semantic properties of a word are derived from the formal structure of its representation

- e.g. Inference algorithm, etc.

Semantics = Meaning representation model (data) + inference model

Page 8: Introduction to Distributional Semantics

Formal Representation of Meaning

Modelling fine-grained lexical inferences

Page 9: Introduction to Distributional Semantics

Formal Representation of Meaning (Problems)

Different meanings

- bat (animal), bat (artefact)

Meaning variation in context

- clever politician, clever tycoon

Meaning evolution

Ambiguity, vagueness, inconsistency

Word meaning acquisition

Lack of flexibility

Scalability

Page 10: Introduction to Distributional Semantics

Distributional Hypothesis

“Words occurring in similar (linguistic) contexts tend to be semantically similar”

He filled the wampimuk with the substance, passed it around and we all drank some

We found a little, hairy wampimuk sleeping behind the tree

Page 11: Introduction to Distributional Semantics

Weak and Strong DH (Lenci, 2008)

Weak DH:

- Word meaning is reflected in linguistic distributions
- By inspecting a sufficiently large number of distributional contexts we may have a useful surrogate representation of meaning.

Strong DH:

- A cognitive hypothesis about the form and origin of semantic representations

Page 12: Introduction to Distributional Semantics

Contextual Representation

Abstract structure that accumulates encounters with the words in various (linguistic) contexts.

For our purposes …

- Context is equated with linguistic context

Page 13: Introduction to Distributional Semantics

Distributional Semantic Models (DSMs)

“The dog barked in the park. The owner of the dog put him on the leash since he barked.”

Page 14: Introduction to Distributional Semantics

Distributional Semantic Models (DSMs)

“The dog barked in the park. The owner of the dog put him on the leash since he barked.”

contexts = nouns and verbs in the same sentence

Page 15: Introduction to Distributional Semantics

Distributional Semantic Models (DSMs)

“The dog barked in the park. The owner of the dog put him on the leash since he barked.”

bark

dog

park

leash

contexts = nouns and verbs in the same sentence

bark : 2

park : 1

leash : 1

owner : 1
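The counting step on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the tooling used for the slides: the nouns and verbs of each sentence are lemmatized by hand (an assumption, since no tagger or lemmatizer is named), and `cooccurrence_counts` is a hypothetical helper name. The verb "put" also shows up as a context of "dog" here, although the list on the slide does not show it.

```python
from collections import Counter, defaultdict

# Hand-lemmatized nouns and verbs of the two example sentences.
# (Assumption: the slide does not say which tagger/lemmatizer was used.)
sentences = [
    ["dog", "bark", "park"],
    ["owner", "dog", "put", "leash", "bark"],
]


def cooccurrence_counts(sentences):
    """For each target word, count the other words of the same sentence."""
    counts = defaultdict(Counter)
    for words in sentences:
        for i, target in enumerate(words):
            for j, context in enumerate(words):
                if i != j:
                    counts[target][context] += 1
    return counts


counts = cooccurrence_counts(sentences)
print(dict(counts["dog"]))
```

Each row of `counts` is the raw distributional profile of one target word.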

Page 16: Introduction to Distributional Semantics

Distributional Semantic Models (DSMs)

distributional matrix = targets x contexts

contexts

targets

Vector Space Model (VSM)
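As a rough sketch of the targets x contexts layout, the snippet below turns a small dictionary of co-occurrence counts into a NumPy matrix. The counts are hypothetical toy values chosen for illustration; they are not taken from the slides.

```python
import numpy as np

# Toy co-occurrence counts (hypothetical values, for illustration only).
counts = {
    "dog": {"bark": 2, "leash": 1, "run": 1},
    "cat": {"run": 1},
    "car": {"run": 2},
}

# Fix an order for rows (targets) and columns (contexts).
targets = sorted(counts)
contexts = sorted({c for row in counts.values() for c in row})

# One row per target, one column per context.
matrix = np.zeros((len(targets), len(contexts)))
for i, target in enumerate(targets):
    for j, context in enumerate(contexts):
        matrix[i, j] = counts[target].get(context, 0)

print(targets)
print(contexts)
print(matrix)
```

Each row of `matrix` is the vector of one target word in the vector space model.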

Page 17: Introduction to Distributional Semantics

Semantic Similarity & Relatedness

[Figure: vector space plot; labels: θ, car, dog, cat, bark, run, leash]

Page 18: Introduction to Distributional Semantics

Semantic Similarity & Relatedness

Semantic similarity: two words sharing a high number of salient features (attributes)

- synonymy (car/automobile)
- hyperonymy (car/vehicle)
- co-hyponymy (car/van/truck)

Semantic relatedness (Budanitsky & Hirst 2006): two words semantically associated without being necessarily similar

- function (car/drive)
- meronymy (car/tyre)
- location (car/road)
- attribute (car/fast)

Page 19: Introduction to Distributional Semantics

Distributional Semantic Models (DSMs)

Computational models that build contextual semantic representations from corpus data

Semantic context is represented by a vector

Vectors are obtained through the statistical analysis of the linguistic contexts of a word

Salience of contexts (cf. context weighting scheme)

Semantic similarity/relatedness as the core operation over the model
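One common way to compute similarity over such vectors is the cosine of the angle θ between them. The sketch below assumes cosine as the measure (other measures are possible) and uses hypothetical toy vectors over the contexts bark, run and leash.

```python
import numpy as np


def cosine(u, v):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom else 0.0


# Hypothetical count vectors; column order: bark, run, leash.
vectors = {
    "dog": [5, 2, 3],
    "cat": [0, 3, 1],
    "car": [0, 4, 0],
}

sim_dog_cat = cosine(vectors["dog"], vectors["cat"])
sim_dog_car = cosine(vectors["dog"], vectors["car"])
print(sim_dog_cat, sim_dog_car)
```

With these toy values, dog ends up closer to cat than to car; real behaviour depends on the corpus and on the weighting applied to the counts.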

Page 20: Introduction to Distributional Semantics

DSMs as Commonsense Reasoning

Commonsense is here

[Figure: vector space plot; labels: θ, car, dog, cat, bark, run, leash]

Page 21: Introduction to Distributional Semantics

DSMs as Commonsense Reasoning

Page 22: Introduction to Distributional Semantics

DSMs as Commonsense Reasoning

[Figure: vector space plot; labels: θ, car, dog, cat, bark, run, leash]

...

vs.

Semantic best-effort

Page 23: Introduction to Distributional Semantics

Demonstration (EasyESA)

http://treo.deri.ie/easyesa/

Page 24: Introduction to Distributional Semantics

Applications

- Semantic search
- Question answering
- Approximate semantic inference
- Word sense disambiguation
- Paraphrase detection
- Text entailment
- Semantic anomaly detection
- ...

Page 25: Introduction to Distributional Semantics

Alternative Names for DSMs

- Corpus-based semantics
- Statistical semantics
- Geometrical models of meaning
- Vector semantics
- Word (semantic) space models

Page 26: Introduction to Distributional Semantics

Definition of DSMs

Page 27: Introduction to Distributional Semantics

Building a DSM

Steps

- Pre-process a corpus (target, context)
- Count the target-context co-occurrences
- Weight the contexts (optional)
- Build the distributional matrix
- Reduce the matrix dimensions (optional)

Parameters

- Corpus
- Context type
- Weighting scheme
- Similarity measure
- Number of dimensions

A parameter configuration determines the DSM (LSA, ESA, …)
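The optional dimension reduction step can be illustrated with a truncated singular value decomposition, the technique behind LSA-style models. This is a sketch under assumptions: `reduce_dimensions` is a hypothetical helper name, and the input matrix is random toy data rather than real corpus counts.

```python
import numpy as np


def reduce_dimensions(matrix, k):
    """Project each target (row) onto the top-k singular directions."""
    matrix = np.asarray(matrix, dtype=float)
    u, s, _vt = np.linalg.svd(matrix, full_matrices=False)
    # Scale the left singular vectors by the singular values so that
    # row-to-row dot products are approximated in the reduced space.
    return u[:, :k] * s[:k]


# Toy targets x contexts matrix (random values, for illustration only).
rng = np.random.default_rng(0)
toy_matrix = rng.integers(0, 5, size=(4, 6)).astype(float)

reduced = reduce_dimensions(toy_matrix, k=2)
print(reduced.shape)
```

The number of dimensions `k` is one of the parameters listed above; keeping all dimensions preserves the row-to-row dot products exactly, while smaller values trade fidelity for compactness.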

Page 28: Introduction to Distributional Semantics

Parameters

Corpus pre-processing

- Stemming/lemmatization
- POS tagging
- Syntactic dependencies

Context

- Document
- Paragraph
- Passage
- Word windows
- Words
- Linguistic features
- Linguistic patterns

- Verbs: contexts = nouns
- Verbs: contexts = adverbs
- etc.

- Size
- Shape

Context Engineering
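To make the "word windows" context type and its size parameter concrete, here is a small sketch that collects the words within a symmetric window around each occurrence of a target. The function name `window_contexts` and the whitespace tokenization are assumptions made for the example.

```python
from collections import Counter


def window_contexts(tokens, target, size=2):
    """Count words at most `size` positions to the left or right of `target`."""
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != target:
            continue
        start = max(0, i - size)
        end = min(len(tokens), i + size + 1)
        for j in range(start, end):
            if j != i:
                counts[tokens[j]] += 1
    return counts


# Naive whitespace tokenization of the first example sentence.
tokens = "the dog barked in the park".split()

narrow = window_contexts(tokens, "dog", size=1)
wide = window_contexts(tokens, "dog", size=2)
print(narrow)
print(wide)
```

Changing `size` changes which contexts a target is associated with, which is one way the context definition shapes the resulting model.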

Page 29: Introduction to Distributional Semantics

Effect of Parameters

Page 30: Introduction to Distributional Semantics

Context Weighting

Smoothing frequency differences: From raw counts to log-frequency.

Association measures (Evert 2005) are used to give more weight to contexts that are more significantly associated with a target word
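Both ideas can be sketched with NumPy. The first function smooths raw counts with log(1 + count), which is one possible log-frequency variant. The second uses positive pointwise mutual information (PPMI) as an example of an association measure; choosing PPMI here is an assumption for illustration, not a claim about which measure the slides had in mind.

```python
import numpy as np


def log_weight(counts):
    """Smooth raw counts with log(1 + count)."""
    return np.log1p(np.asarray(counts, dtype=float))


def ppmi(counts):
    """Positive PMI: log2 of observed over expected counts, floored at 0."""
    counts = np.asarray(counts, dtype=float)
    total = counts.sum()
    row_sums = counts.sum(axis=1, keepdims=True)
    col_sums = counts.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        expected = row_sums * col_sums / total
        pmi = np.log2(counts / expected)
    # Zero counts give -inf and empty rows/columns give nan; treat both as 0.
    pmi[~np.isfinite(pmi)] = 0.0
    return np.maximum(pmi, 0.0)


# Toy targets x contexts counts (hypothetical values).
toy_counts = np.array([[2.0, 0.0], [0.0, 2.0]])
weighted = ppmi(toy_counts)
print(weighted)
```

In this toy matrix each target occurs with only one context, so those two cells get a positive weight and the others stay at zero.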

Page 31: Introduction to Distributional Semantics

Context Weighting Measures

Kiela & Clark, 2014

Page 32: Introduction to Distributional Semantics

Similarity Measures

Kiela & Clark, 2014

Page 33: Introduction to Distributional Semantics

What is the best parameter configuration?

The best parameter configuration depends on the task.

Systematic exploration of the parameters

Page 34: Introduction to Distributional Semantics

DSM Instances

- Latent Semantic Analysis (Landauer & Dumais 1996)
- Hyperspace Analogue to Language (Lund & Burgess 1996)
- Infomap NLP (Widdows 2004)
- Random Indexing (Karlgren & Sahlgren 2001)
- Dependency Vectors (Padó & Lapata 2007)
- Explicit Semantic Analysis (Gabrilovich & Markovitch, 2008)
- Distributional Memory (Baroni & Lenci 2009)

Page 35: Introduction to Distributional Semantics

Compositional Semantics

Page 36: Introduction to Distributional Semantics

Paraphrase Detection

I find it rather odd that people are already trying to tie the Commission's hands in relation to the proposal for a directive, while at the same calling on it to present a Green Paper on the current situation with regard to optional and supplementary health insurance schemes.

I find it a little strange to now obliging the Commission to a motion for a resolution and to ask him at the same time to draw up a Green Paper on the current state of voluntary insurance and supplementary sickness insurance.

=?

Page 37: Introduction to Distributional Semantics

Compositional Semantics

Can we extend DS to account for the meaning of phrases and sentences?

Compositionality: The meaning of a complex expression is a function of the meaning of its constituent parts.

Page 38: Introduction to Distributional Semantics

Compositional Semantics

Words in which the meaning is directly determined by their distributional behaviour (e.g., nouns).

Words that act as functions transforming the distributional profile of other words (e.g., verbs, adjectives, …).

old dogs

Page 39: Introduction to Distributional Semantics

Compositional Semantics

Mixture

Function
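The two families named on this slide can be contrasted in a short sketch. In a mixture model both words are vectors that are combined element-wise; in a function model one word (here the adjective) is an operator, modelled below as a matrix applied to the noun vector. All values and the names `old_vec` and `OLD` are hypothetical, and the specific operations shown (addition, element-wise product, matrix-vector product) are common choices rather than the only ones.

```python
import numpy as np


def additive(u, v):
    """Mixture: element-wise sum of the two word vectors."""
    return u + v


def multiplicative(u, v):
    """Mixture: element-wise product of the two word vectors."""
    return u * v


def apply_function(matrix, v):
    """Function: the modifier is a matrix that transforms the noun vector."""
    return matrix @ v


# Hypothetical toy representations.
dogs = np.array([2.0, 1.0, 1.0])
old_vec = np.array([1.0, 0.0, 2.0])      # "old" as a vector (mixture view)
OLD = np.array([[1.0, 0.0, 0.0],         # "old" as a matrix (function view)
                [0.0, 0.5, 0.0],
                [0.0, 0.0, 2.0]])

print(additive(old_vec, dogs))
print(multiplicative(old_vec, dogs))
print(apply_function(OLD, dogs))
```

Mixtures are symmetric in their arguments, while the function view treats "old" as something that acts on "dogs", which matches the distinction drawn on the previous slide.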

Page 40: Introduction to Distributional Semantics

Compositional Semantics

Take the syntactic structure to constitute the backbone guiding the assembly of the semantic representations of phrases.

(CHASE × cats) × dogs

[Figure labels: CHASE is a 3rd order tensor; cats and dogs are vectors; intermediate result (CHASE × cats)]

Baroni et al., 2012
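The expression (CHASE × cats) × dogs can be read as two successive tensor contractions. The sketch below assumes a 4-dimensional space, random toy values, and a particular index convention (object contracted over the last axis, subject over the remaining second axis); none of these choices comes from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 4  # hypothetical dimensionality of the vector space

CHASE = rng.random((dim, dim, dim))  # transitive verb: 3rd order tensor
cats = rng.random(dim)               # object noun: vector
dogs = rng.random(dim)               # subject noun: vector

# Step 1: apply the verb to its object; the result is a matrix.
chase_cats = np.einsum("ijk,k->ij", CHASE, cats)

# Step 2: apply that matrix to the subject; the result is a vector
# in the same space as the nouns.
dogs_chase_cats = chase_cats @ dogs

print(chase_cats.shape, dogs_chase_cats.shape)
```

The syntactic structure decides the order of the contractions: the verb first combines with its object, and the resulting verb phrase then combines with the subject.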

Page 41: Introduction to Distributional Semantics

Formal Model

Distributional Semantics & Category Theory

Page 42: Introduction to Distributional Semantics

Take-away message

Low acquisition effort

Simple way to build a commonsense KB

Semantic approximation as a built-in construct

Semantic best-effort

Simple to use

DSMs are evolving fast (compositional and formal grounding)

Distributional semantics offers a promising approach for building semantic models that work in the real world

Page 43: Introduction to Distributional Semantics

Great Introductory References

Evert & Lenci, ESSLLI Tutorial on Distributional Semantics, 2009 (many slides were taken or adapted from this great tutorial).

Turney & Pantel, From Frequency to Meaning: Vector Space Models of Semantics, 2010.

Baroni et al., Frege in Space: A Program for Compositional Distributional Semantics, 2012.

Kiela & Clark, A Systematic Study of Semantic Vector Space Model Parameters, 2014.