Distributional Semantic Models - Part 1: Introduction

Source: compling.hss.ntu.edu.sg/courses/hg2002/pdf/...

[Title-slide figure: cat --isa--> pet <--isa-- dog]

Distributional Semantic Models
CS 114, James Pustejovsky

slides by Stefan Evert

DSM Tutorial – Part 1 1 / 91

Outline

Introduction
  The distributional hypothesis
  Three famous DSM examples

Taxonomy of DSM parameters
  Definition & overview
  DSM parameters
  Examples

DSM in practice
  Using DSM distances
  Quantitative evaluation
  Software and further information

DSM Tutorial – Part 1 2 / 91


Introduction The distributional hypothesis

Meaning & distribution

- “Die Bedeutung eines Wortes liegt in seinem Gebrauch.” (“The meaning of a word is its use.”) — Ludwig Wittgenstein

- “You shall know a word by the company it keeps!” — J. R. Firth (1957)

- Distributional hypothesis: difference of meaning correlates with difference of distribution (Zellig Harris 1954)

- “What people know when they say that they know a word is not how to recite its dictionary definition – they know how to use it [. . . ] in everyday discourse.” (Miller 1986)

DSM Tutorial – Part 1 4 / 91

Page 5: Distributional Semantic Models - Part 1: Introductioncompling.hss.ntu.edu.sg/courses/hg2002/pdf/... · Semantic map (V-Obj from BNC) l l l l bird groundAnimal fruitTree green tool

Introduction The distributional hypothesis

What is the meaning of “bardiwac”?

I He handed her her glass of bardiwac.I Beef dishes are made to complement the bardiwacs.I Nigel staggered to his feet, face flushed from too much

bardiwac.I Malbec, one of the lesser-known bardiwac grapes, responds

well to Australia’s sunshine.I I dined off bread and cheese and this excellent bardiwac.I The drinks were delicious: blood-red bardiwac as well as light,

sweet Rhenish.+ bardiwac is a heavy red alcoholic beverage made from grapes

The examples above are handpicked, of course. But in a corpus like theBNC, you will find at least as many informative sentences.

DSM Tutorial – Part 1 5 / 91

Page 6: Distributional Semantic Models - Part 1: Introductioncompling.hss.ntu.edu.sg/courses/hg2002/pdf/... · Semantic map (V-Obj from BNC) l l l l bird groundAnimal fruitTree green tool

Introduction The distributional hypothesis

What is the meaning of “bardiwac”?

I He handed her her glass of bardiwac.

I Beef dishes are made to complement the bardiwacs.I Nigel staggered to his feet, face flushed from too much

bardiwac.I Malbec, one of the lesser-known bardiwac grapes, responds

well to Australia’s sunshine.I I dined off bread and cheese and this excellent bardiwac.I The drinks were delicious: blood-red bardiwac as well as light,

sweet Rhenish.+ bardiwac is a heavy red alcoholic beverage made from grapes

The examples above are handpicked, of course. But in a corpus like theBNC, you will find at least as many informative sentences.

DSM Tutorial – Part 1 5 / 91

Page 7: Distributional Semantic Models - Part 1: Introductioncompling.hss.ntu.edu.sg/courses/hg2002/pdf/... · Semantic map (V-Obj from BNC) l l l l bird groundAnimal fruitTree green tool

Introduction The distributional hypothesis

What is the meaning of “bardiwac”?

I He handed her her glass of bardiwac.I Beef dishes are made to complement the bardiwacs.

I Nigel staggered to his feet, face flushed from too muchbardiwac.

I Malbec, one of the lesser-known bardiwac grapes, respondswell to Australia’s sunshine.

I I dined off bread and cheese and this excellent bardiwac.I The drinks were delicious: blood-red bardiwac as well as light,

sweet Rhenish.+ bardiwac is a heavy red alcoholic beverage made from grapes

The examples above are handpicked, of course. But in a corpus like theBNC, you will find at least as many informative sentences.

DSM Tutorial – Part 1 5 / 91

Page 8: Distributional Semantic Models - Part 1: Introductioncompling.hss.ntu.edu.sg/courses/hg2002/pdf/... · Semantic map (V-Obj from BNC) l l l l bird groundAnimal fruitTree green tool

Introduction The distributional hypothesis

What is the meaning of “bardiwac”?

I He handed her her glass of bardiwac.I Beef dishes are made to complement the bardiwacs.I Nigel staggered to his feet, face flushed from too much

bardiwac.

I Malbec, one of the lesser-known bardiwac grapes, respondswell to Australia’s sunshine.

I I dined off bread and cheese and this excellent bardiwac.I The drinks were delicious: blood-red bardiwac as well as light,

sweet Rhenish.+ bardiwac is a heavy red alcoholic beverage made from grapes

The examples above are handpicked, of course. But in a corpus like theBNC, you will find at least as many informative sentences.

DSM Tutorial – Part 1 5 / 91

Page 9: Distributional Semantic Models - Part 1: Introductioncompling.hss.ntu.edu.sg/courses/hg2002/pdf/... · Semantic map (V-Obj from BNC) l l l l bird groundAnimal fruitTree green tool

Introduction The distributional hypothesis

What is the meaning of “bardiwac”?

I He handed her her glass of bardiwac.I Beef dishes are made to complement the bardiwacs.I Nigel staggered to his feet, face flushed from too much

bardiwac.I Malbec, one of the lesser-known bardiwac grapes, responds

well to Australia’s sunshine.

I I dined off bread and cheese and this excellent bardiwac.I The drinks were delicious: blood-red bardiwac as well as light,

sweet Rhenish.+ bardiwac is a heavy red alcoholic beverage made from grapes

The examples above are handpicked, of course. But in a corpus like theBNC, you will find at least as many informative sentences.

DSM Tutorial – Part 1 5 / 91

Page 10: Distributional Semantic Models - Part 1: Introductioncompling.hss.ntu.edu.sg/courses/hg2002/pdf/... · Semantic map (V-Obj from BNC) l l l l bird groundAnimal fruitTree green tool

Introduction The distributional hypothesis

What is the meaning of “bardiwac”?

I He handed her her glass of bardiwac.I Beef dishes are made to complement the bardiwacs.I Nigel staggered to his feet, face flushed from too much

bardiwac.I Malbec, one of the lesser-known bardiwac grapes, responds

well to Australia’s sunshine.I I dined off bread and cheese and this excellent bardiwac.

I The drinks were delicious: blood-red bardiwac as well as light,sweet Rhenish.

+ bardiwac is a heavy red alcoholic beverage made from grapes

The examples above are handpicked, of course. But in a corpus like theBNC, you will find at least as many informative sentences.

DSM Tutorial – Part 1 5 / 91

Page 11: Distributional Semantic Models - Part 1: Introductioncompling.hss.ntu.edu.sg/courses/hg2002/pdf/... · Semantic map (V-Obj from BNC) l l l l bird groundAnimal fruitTree green tool

Introduction The distributional hypothesis

What is the meaning of “bardiwac”?

I He handed her her glass of bardiwac.I Beef dishes are made to complement the bardiwacs.I Nigel staggered to his feet, face flushed from too much

bardiwac.I Malbec, one of the lesser-known bardiwac grapes, responds

well to Australia’s sunshine.I I dined off bread and cheese and this excellent bardiwac.I The drinks were delicious: blood-red bardiwac as well as light,

sweet Rhenish.

+ bardiwac is a heavy red alcoholic beverage made from grapes

The examples above are handpicked, of course. But in a corpus like theBNC, you will find at least as many informative sentences.

DSM Tutorial – Part 1 5 / 91

Page 12: Distributional Semantic Models - Part 1: Introductioncompling.hss.ntu.edu.sg/courses/hg2002/pdf/... · Semantic map (V-Obj from BNC) l l l l bird groundAnimal fruitTree green tool

Introduction The distributional hypothesis

What is the meaning of “bardiwac”?

I He handed her her glass of bardiwac.I Beef dishes are made to complement the bardiwacs.I Nigel staggered to his feet, face flushed from too much

bardiwac.I Malbec, one of the lesser-known bardiwac grapes, responds

well to Australia’s sunshine.I I dined off bread and cheese and this excellent bardiwac.I The drinks were delicious: blood-red bardiwac as well as light,

sweet Rhenish.+ bardiwac is a heavy red alcoholic beverage made from grapes

The examples above are handpicked, of course. But in a corpus like theBNC, you will find at least as many informative sentences.

DSM Tutorial – Part 1 5 / 91


Introduction The distributional hypothesis

A thought experiment: deciphering hieroglyphs

                   get   sij   ius   hir   iit   kil
(knife)    naif     51    20    84     0     3     0
(cat)      ket      52    58     4     4     6    26
???        dog     115    83    10    42    33    17
(boat)     beut     59    39    23     4     0     0
(cup)      kap      98    14     6     2     1     0
(pig)      pigij    12    17     3     2     9    27
(banana)   nana     11     2     2     0    18     0

sim(dog, naif)  = 0.770
sim(dog, pigij) = 0.939
sim(dog, ket)   = 0.961

DSM Tutorial – Part 1 7 / 91
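The similarity scores above can be reproduced as cosine similarity between the row vectors. The slide does not say which transformation it applied, but the printed values match cosine over log-damped counts, so the log(1 + f) transform below is an assumption inferred from the numbers:

```python
import math

# Co-occurrence counts from the slide; columns: get, sij, ius, hir, iit, kil
counts = {
    "naif":  [51, 20, 84, 0, 3, 0],
    "ket":   [52, 58, 4, 4, 6, 26],
    "dog":   [115, 83, 10, 42, 33, 17],
    "pigij": [12, 17, 3, 2, 9, 27],
}

def cosine(x, y):
    """Cosine similarity: dot product divided by the vector lengths."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny)

def log_vec(v):
    """Dampen raw frequencies with a log(1 + f) transform (assumed)."""
    return [math.log(1 + f) for f in v]

for w in ("naif", "pigij", "ket"):
    sim = cosine(log_vec(counts["dog"]), log_vec(counts[w]))
    print(f"sim(dog, {w}) = {sim:.3f}")
# sim(dog, naif) = 0.770
# sim(dog, pigij) = 0.939
# sim(dog, ket) = 0.961
```

Without the log transform, plain cosine on the raw counts gives noticeably different values, which is why the damping assumption is spelled out above.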

Introduction The distributional hypothesis

English as seen by the computer . . .

                   get   see   use   hear  eat   kill
                   get   sij   ius   hir   iit   kil
knife      naif     51    20    84     0     3     0
cat        ket      52    58     4     4     6    26
dog        dog     115    83    10    42    33    17
boat       beut     59    39    23     4     0     0
cup        kap      98    14     6     2     1     0
pig        pigij    12    17     3     2     9    27
banana     nana     11     2     2     0    18     0

verb-object counts from British National Corpus

DSM Tutorial – Part 1 8 / 91

Introduction The distributional hypothesis

Geometric interpretation

- row vector x_dog describes usage of word dog in the corpus
- can be seen as coordinates of a point in n-dimensional Euclidean space

          get   see   use   hear  eat   kill
knife      51    20    84     0     3     0
cat        52    58     4     4     6    26
dog       115    83    10    42    33    17
boat       59    39    23     4     0     0
cup        98    14     6     2     1     0
pig        12    17     3     2     9    27
banana     11     2     2     0    18     0

co-occurrence matrix M

DSM Tutorial – Part 1 9 / 91

Introduction The distributional hypothesis

Geometric interpretation

- row vector x_dog describes usage of word dog in the corpus
- can be seen as coordinates of a point in n-dimensional Euclidean space
- illustrated for two dimensions: get and use
- x_dog = (115, 10)

[Figure: "Two dimensions of English V-Obj DSM": scatter plot of cat, dog, knife, boat in the get/use plane]

DSM Tutorial – Part 1 10 / 91

Introduction The distributional hypothesis

Geometric interpretation

- similarity = spatial proximity (Euclidean dist.)
- location depends on frequency of noun (f_dog ≈ 2.7 · f_cat)

[Figure: the same get/use plot, with Euclidean distances d(dog, cat) = 63.3 and d(dog, boat) = 57.5; note that dog ends up closer to boat than to cat]

DSM Tutorial – Part 1 11 / 91

Introduction The distributional hypothesis

Geometric interpretation

- similarity = spatial proximity (Euclidean dist.)
- location depends on frequency of noun (f_dog ≈ 2.7 · f_cat)
- direction more important than location

[Figure: the same get/use plot]

DSM Tutorial – Part 1 12 / 91

Introduction The distributional hypothesis

Geometric interpretation

- similarity = spatial proximity (Euclidean dist.)
- location depends on frequency of noun (f_dog ≈ 2.7 · f_cat)
- direction more important than location
- normalise “length” ‖x_dog‖ of vector
- or use angle α as distance measure

[Figure: the same get/use plot with vectors normalised to unit length; angle α = 54.3° shown between two of the noun vectors]

DSM Tutorial – Part 1 13 / 91
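The distance, normalisation, and angle computations on these slides can be checked directly on the two-dimensional get/use counts. A minimal sketch, with the vector values read off the plots above:

```python
import math

# Two dimensions (get, use) of the V-Obj counts shown on the slides
vectors = {"knife": (51, 84), "cat": (52, 4), "dog": (115, 10), "boat": (59, 23)}

def norm(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in v))

def normalise(v):
    """Scale a vector to unit length, discarding frequency information."""
    n = norm(v)
    return tuple(x / n for x in v)

def angle(u, v):
    """Angle between two vectors, in degrees."""
    cos = sum(a * b for a, b in zip(u, v)) / (norm(u) * norm(v))
    return math.degrees(math.acos(cos))

def euclid(u, v):
    """Euclidean distance between two points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Raw Euclidean distance: dog looks closer to boat than to cat...
print(euclid(vectors["dog"], vectors["cat"]))   # ≈ 63.3
print(euclid(vectors["dog"], vectors["boat"]))  # ≈ 57.5
# ...but after normalisation, dog and cat point in nearly the same direction
print(euclid(normalise(vectors["dog"]), normalise(vectors["cat"])))
```

This illustrates the slide's point: raw frequency shifts the points around, while direction (equivalently, the angle) is what reflects usage.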

Introduction The distributional hypothesis

Semantic distances

- main result of distributional analysis is a set of “semantic” distances between words
- typical applications:
  - nearest neighbours
  - clustering of related words
  - constructing a semantic map
- other applications require clever use of the distance information:
  - semantic relations
  - relational analogies
  - word sense disambiguation
  - detection of multiword expressions

[Figure: "Word space clustering of concrete nouns (V-Obj from BNC)": dendrogram grouping nouns such as potato, onion, cat, banana, chicken, mushroom, corn, dog, pear, cherry, lettuce, penguin, swan, eagle, owl, duck, elephant, pig, cow, lion, helicopter, peacock, turtle, car, pineapple, boat, rocket, truck, motorcycle, snail, ship, chisel, scissors, screwdriver, pencil, hammer, telephone, knife, spoon, pen, kettle, bottle, cup, bowl]

[Figure: "Semantic map (V-Obj from BNC)": 2D map with legend bird, groundAnimal, fruitTree, green, tool, vehicle, showing the same nouns arranged by category]

DSM Tutorial – Part 1 14 / 91

Introduction The distributional hypothesis

Some applications in computational linguistics

- Unsupervised part-of-speech induction (Schütze 1995)
- Word sense disambiguation (Schütze 1998)
- Query expansion in information retrieval (Grefenstette 1994)
- Synonym tasks & other language tests (Landauer and Dumais 1997; Turney et al. 2003)
- Thesaurus compilation (Lin 1998a; Rapp 2004)
- Ontology & wordnet expansion (Pantel et al. 2009)
- Attachment disambiguation (Pantel and Lin 2000)
- Probabilistic language models (Bengio et al. 2003)
- Subsymbolic input representation for neural networks
- Many other tasks in computational semantics: entailment detection, noun compound interpretation, identification of noncompositional expressions, . . .

DSM Tutorial – Part 1 15 / 91


Introduction Three famous DSM examples

Latent Semantic Analysis (Landauer and Dumais 1997)

- Corpus: 30,473 articles from Grolier’s Academic American Encyclopedia (4.6 million words in total)
  ⇒ articles were limited to first 2,000 characters
- Word-article frequency matrix for 60,768 words
  - row vector shows frequency of word in each article
- Logarithmic frequencies scaled by word entropy
- Reduced to 300 dim. by singular value decomposition (SVD)
  - borrowed from LSI (Dumais et al. 1988)
  ⇒ central claim: SVD reveals latent semantic features, not just a data reduction technique
- Evaluated on TOEFL synonym test (80 items)
  - LSA model achieved 64.4% correct answers
  - also simulation of learning rate based on TOEFL results

DSM Tutorial – Part 1 17 / 91
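The LSA pipeline can be sketched on a toy matrix: log-entropy weighting followed by truncated SVD. The weighting formula below is one common formulation of "logarithmic frequencies scaled by word entropy", not necessarily the exact variant Landauer and Dumais used, and the count matrix is invented for illustration:

```python
import numpy as np

# Toy word-by-article count matrix; the real LSA model was
# 60,768 words x 30,473 Grolier articles, reduced to 300 dimensions.
F = np.array([
    [10.,  0.,  7.,  0.],
    [ 0., 10.,  4., 11.],
    [ 2., 15., 10.,  2.],
    [ 1.,  1.,  0.,  2.],
])

# Log-entropy weighting (assumed formulation): a word spread evenly
# over all articles gets a global weight near 0, a focused word near 1.
p = F / F.sum(axis=1, keepdims=True)
plogp = p * np.log(np.where(p > 0, p, 1.0))   # p log p, with 0 log 0 = 0
g = 1.0 + plogp.sum(axis=1) / np.log(F.shape[1])
W = g[:, None] * np.log(F + 1.0)

# Rank-k truncated SVD: rows of U_k * S_k are the latent word vectors.
k = 2
U, s, Vt = np.linalg.svd(W, full_matrices=False)
word_vectors = U[:, :k] * s[:k]
print(word_vectors.shape)  # (4, 2)
```

Word similarity (e.g. for the TOEFL synonym test) is then computed between rows of `word_vectors` rather than rows of the raw count matrix.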

Introduction Three famous DSM examples

Word Space (Schütze 1992, 1993, 1998)

- Corpus: ≈ 60 million words of news messages from the New York Times News Service
- Word-word co-occurrence matrix
  - 20,000 target words & 2,000 context words as features
  - row vector records how often each context word occurs close to the target word (co-occurrence)
  - co-occurrence window: left/right 50 words (Schütze 1998) or ≈ 1000 characters (Schütze 1992)
- Rows weighted by inverse document frequency (tf.idf)
- Context vector = centroid of word vectors (bag-of-words)
  ⇒ goal: determine “meaning” of a context
- Reduced to 100 SVD dimensions (mainly for efficiency)
- Evaluated on unsupervised word sense induction by clustering of context vectors (for an ambiguous word)
  - induced word senses improve information retrieval performance

DSM Tutorial – Part 1 18 / 91
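The centroid construction of context vectors can be sketched as follows. The word vectors here are invented low-dimensional stand-ins for the 100-dimensional SVD rows of the real model:

```python
import numpy as np

# Hypothetical word vectors (rows of a reduced word-word matrix)
word_vec = {
    "interest": np.array([0.9, 0.1, 0.2]),
    "bank":     np.array([0.7, 0.6, 0.1]),
    "rate":     np.array([0.8, 0.2, 0.3]),
    "river":    np.array([0.1, 0.9, 0.4]),
}

def context_vector(tokens):
    """Bag-of-words context vector = centroid of its word vectors."""
    vecs = [word_vec[t] for t in tokens if t in word_vec]
    return np.mean(vecs, axis=0)

# Two contexts of the ambiguous word "bank": clustering such context
# vectors is what induces the money sense vs. the river sense.
c_money = context_vector(["interest", "rate"])
c_river = context_vector(["river"])
print(c_money)
```

Note that the ambiguous word itself is usually excluded from its own context vector, so the two contexts above pull apart cleanly.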

Introduction Three famous DSM examples

HAL (Lund and Burgess 1996)

- HAL = Hyperspace Analogue to Language
- Corpus: 160 million words from newsgroup postings
- Word-word co-occurrence matrix
  - same 70,000 words used as targets and features
  - co-occurrence window of 1 – 10 words
- Separate counts for left and right co-occurrence, i.e. the context is structured
- In later work, co-occurrences are weighted by (inverse) distance (Li et al. 2000)
- Applications include construction of semantic vocabulary maps by multidimensional scaling to 2 dimensions

DSM Tutorial – Part 1 19 / 91
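HAL-style structured counting can be sketched as follows: separate left/right counts within a window, with nearer neighbours counting more. The linear weighting scheme is an assumption modelled on the description above, not the exact HAL formula:

```python
from collections import defaultdict

def hal_counts(tokens, window=5):
    """Directional co-occurrence counts in the spirit of HAL: separate
    left/right contexts, weighted by (window - distance + 1), an
    assumed linear scheme giving nearer neighbours higher weight."""
    counts = defaultdict(float)
    for i, target in enumerate(tokens):
        for d in range(1, window + 1):
            w = window - d + 1  # closer words get higher weight
            if i - d >= 0:
                counts[(target, tokens[i - d], "L")] += w
            if i + d < len(tokens):
                counts[(target, tokens[i + d], "R")] += w
    return counts

c = hal_counts("the dog chased the cat".split(), window=2)
print(c[("dog", "chased", "R")])  # 2.0 (distance 1, weight 2)
print(c[("dog", "cat", "R")])     # 0.0 (outside the window)
```

Because left and right contexts are kept apart, each word ends up with a 2 × 70,000-dimensional row in the full model, which is what "the context is structured" refers to.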

Introduction Three famous DSM examples

Many parameters . . .

- Enormous range of DSM parameters and applications
- Examples showed three entirely different models, each tuned to its particular application
⇒ Need overview of DSM parameters & understand their effects

DSM Tutorial – Part 1 20 / 91


Taxonomy of DSM parameters Definition & overview

General definition of DSMs

A distributional semantic model (DSM) is a scaled and/or transformed co-occurrence matrix M, such that each row x represents the distribution of a target term across contexts.

           get     see     use     hear    eat     kill
knife     0.027  -0.024   0.206  -0.022  -0.044  -0.042
cat       0.031   0.143  -0.243  -0.015  -0.009   0.131
dog      -0.026   0.021  -0.212   0.064   0.013   0.014
boat     -0.022   0.009  -0.044  -0.040  -0.074  -0.042
cup      -0.014  -0.173  -0.249  -0.099  -0.119  -0.042
pig      -0.069   0.094  -0.158   0.000   0.094   0.265
banana    0.047  -0.139  -0.104  -0.022   0.267  -0.042

Term = word, lemma, phrase, morpheme, word pair, . . .

DSM Tutorial – Part 1 22 / 91

Taxonomy of DSM parameters Definition & overview

General definition of DSMs

Mathematical notation:

- k × n co-occurrence matrix M (example: 7 × 6 matrix)
  - k rows = target terms
  - n columns = features or dimensions

      | m11  m12  ...  m1n |
  M = | m21  m22  ...  m2n |
      | ...  ...       ... |
      | mk1  mk2  ...  mkn |

- distribution vector m_i = i-th row of M, e.g. m_3 = m_dog
- components m_i = (m_i1, m_i2, . . . , m_in) = features of i-th term:

  m_3 = (-0.026, 0.021, -0.212, 0.064, 0.013, 0.014)
      = (m_31, m_32, m_33, m_34, m_35, m_36)

DSM Tutorial – Part 1 23 / 91

Taxonomy of DSM parameters Definition & overview

Overview of DSM parameters

Term-context vs. term-term matrix
⇓
Definition of terms & linguistic pre-processing
⇓
Size & type of context
⇓
Geometric vs. probabilistic interpretation
⇓
Feature scaling
⇓
Normalisation of rows and/or columns
⇓
Similarity / distance measure
⇓
Dimensionality reduction

DSM Tutorial – Part 1 24 / 91


Taxonomy of DSM parameters DSM parameters

Term-context matrix

Term-context matrix records frequency of term in each individual context (e.g. sentence, document, Web page, encyclopaedia article)

F = matrix with one row vector f_i per target term

          Felidae  Pet  Feral  Bloat  Philosophy  Kant  Back pain
cat            10   10      7      –           –     –          –
dog             –   10      4     11           –     –          –
animal          2   15     10      2           –     –          –
time            1    –      –      –           2     1          –
reason          –    1      –      –           1     4          1
cause           –    –      –      2           1     2          6
effect          –    –      –      1           –     1          –

DSM Tutorial – Part 1 27 / 91

Taxonomy of DSM parameters DSM parameters

Term-context matrix

Some footnotes:
- Features are usually context tokens, i.e. individual instances
- Can also be generalised to context types, e.g.
  - bag of content words
  - specific pattern of POS tags
  - n-gram of words (or POS tags) around target
  - subcategorisation pattern of target verb
- Term-context matrix is often very sparse

DSM Tutorial – Part 1 28 / 91
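Because the term-context matrix is so sparse, it is natural to store only the non-zero cells. A minimal sketch with toy contexts (the context labels echo the table above; the texts are invented):

```python
from collections import Counter

# Toy "contexts" (here: documents); a real term-context matrix over a
# large corpus is extremely sparse, hence the dict-of-dicts storage.
contexts = {
    "Felidae":    "cat cat animal",
    "Pet":        "cat dog animal",
    "Philosophy": "time reason cause",
}

# Sparse term-context matrix: only non-zero cells are stored.
matrix = {}
for ctx, text in contexts.items():
    for term, freq in Counter(text.split()).items():
        matrix.setdefault(term, {})[ctx] = freq

print(matrix["cat"])  # {'Felidae': 2, 'Pet': 1}
```

Missing keys stand in for the "–" cells of the table: a term-context pair that never co-occurs simply has no entry.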

Page 48: Distributional Semantic Models - Part 1: Introductioncompling.hss.ntu.edu.sg/courses/hg2002/pdf/... · Semantic map (V-Obj from BNC) l l l l bird groundAnimal fruitTree green tool

Taxonomy of DSM parameters DSM parameters

Term-term matrix

Term-term matrix records co-occurrence frequencies with featureterms for each target term

    M = ( · · · m1 · · · )
        ( · · · m2 · · · )
        (       ⋮        )
        ( · · · mk · · · )

            breed   tail   feed   kill   important   explain   likely
cat           83     17      7     37        –           1        –
dog          561     13     30     60        1           2        4
animal        42     10    109    134       13           5        5
time          19      9     29    117       81          34      109
reason         1      –      2     14       68         140       47
cause          –      1      –      4       55          34       55
effect         –      –      1      6       60          35       17

⇒ we will usually assume a term-term matrix

DSM Tutorial – Part 1 29 / 91
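A sketch of how such counts arise from a surface window (the toy sentence and the helper name are invented for illustration):

```python
from collections import defaultdict

def term_term_matrix(tokens, targets, features, k=2):
    """Build a term-term co-occurrence matrix from a tokenised corpus:
    rows are target terms, columns feature terms, and each cell counts
    how often the feature occurs within a symmetric k-word window."""
    M = {t: defaultdict(int) for t in targets}
    for i, w in enumerate(tokens):
        if w in targets:
            # Symmetric surface window: k tokens to the left and right.
            window = tokens[max(0, i - k):i] + tokens[i + 1:i + k + 1]
            for c in window:
                if c in features:
                    M[w][c] += 1
    return M

toks = "a dog bites a man the dog bites a dog".split()
M = term_term_matrix(toks, {"dog", "man"}, {"bites"}, k=2)
# M["dog"]["bites"] == 3, M["man"]["bites"] == 1
```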

Page 49: Distributional Semantic Models - Part 1: Introduction

Taxonomy of DSM parameters DSM parameters

Term-term matrix

Some footnotes:
- Often target terms ≠ feature terms
  - e.g. nouns described by co-occurrences with verbs as features
  - identical sets of target & feature terms → symmetric matrix
- Different types of contexts (Evert 2008)
  - surface context (word or character window)
  - textual context (non-overlapping segments)
  - syntactic context (specific syntagmatic relation)
- Can be seen as smoothing of term-context matrix
  - average over similar contexts (with same context terms)
  - data sparseness reduced, except for small windows
  - we will take a closer look at the relation between term-context and term-term models later in this tutorial

DSM Tutorial – Part 1 30 / 91

Page 50: Distributional Semantic Models - Part 1: Introduction

Taxonomy of DSM parameters DSM parameters

Overview of DSM parameters

Term-context vs. term-term matrix
  ⇓
Definition of terms & linguistic pre-processing
  ⇓
Size & type of context
  ⇓
Geometric vs. probabilistic interpretation
  ⇓
Feature scaling
  ⇓
Normalisation of rows and/or columns
  ⇓
Similarity / distance measure
  ⇓
Dimensionality reduction

DSM Tutorial – Part 1 31 / 91

Page 51: Distributional Semantic Models - Part 1: Introduction

Taxonomy of DSM parameters DSM parameters

Corpus pre-processing

- Minimally, corpus must be tokenised → identify terms
- Linguistic annotation
  - part-of-speech tagging
  - lemmatisation / stemming
  - word sense disambiguation (rare)
  - shallow syntactic patterns
  - dependency parsing
- Generalisation of terms
  - often lemmatised to reduce data sparseness: go, goes, went, gone, going → go
  - POS disambiguation (light/N vs. light/A vs. light/V)
  - word sense disambiguation (bank_river vs. bank_finance)
- Trade-off between deeper linguistic analysis and
  - need for language-specific resources
  - possible errors introduced at each stage of the analysis

DSM Tutorial – Part 1 32 / 91


Page 54: Distributional Semantic Models - Part 1: Introduction

Taxonomy of DSM parameters DSM parameters

Effects of pre-processing

Nearest neighbors of walk (BNC)

word forms: stroll, walking, walked, go, path, drive, ride, wander, sprinted, sauntered

lemmatised corpus: hurry, stroll, stride, trudge, amble, wander, walk-nn, walking, retrace, scuttle

DSM Tutorial – Part 1 33 / 91

Page 55: Distributional Semantic Models - Part 1: Introduction

Taxonomy of DSM parameters DSM parameters

Effects of pre-processing

Nearest neighbors of arrivare (Repubblica)

word forms: giungere, raggiungere, arrivi, raggiungimento, raggiunto, trovare, raggiunge, arrivasse, arriverà, concludere

lemmatised corpus: giungere, aspettare, attendere, arrivo-nn, ricevere, accontentare, approdare, pervenire, venire, piombare

DSM Tutorial – Part 1 34 / 91

Page 56: Distributional Semantic Models - Part 1: Introduction

Taxonomy of DSM parameters DSM parameters

Overview of DSM parameters

Term-context vs. term-term matrix
  ⇓
Definition of terms & linguistic pre-processing
  ⇓
Size & type of context
  ⇓
Geometric vs. probabilistic interpretation
  ⇓
Feature scaling
  ⇓
Normalisation of rows and/or columns
  ⇓
Similarity / distance measure
  ⇓
Dimensionality reduction

DSM Tutorial – Part 1 35 / 91

Page 57: Distributional Semantic Models - Part 1: Introduction

Taxonomy of DSM parameters DSM parameters

Surface context

Context term occurs within a window of k words around target.

The silhouette of the sun beyond a wide-open bay on the lake; the sun still glitters although evening has arrived in Kuhmo. It's midsummer; the living room has its instruments and other objects in each of its corners.

Parameters:
- window size (in words or characters)
- symmetric vs. one-sided window
- uniform or "triangular" (distance-based) weighting
- window clamped to sentences or other textual units?

DSM Tutorial – Part 1 36 / 91
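The uniform vs. "triangular" weighting choice can be sketched as follows; the linear decay shown is one common choice of distance-based weighting, not the only one, and the helper name is invented:

```python
def window_weights(k, kind="uniform"):
    """Weight for each position 1..k away from the target token.
    'triangular' weighting lets closer neighbours count more."""
    if kind == "uniform":
        return [1.0] * k
    # Linear decay: distance 1 gets weight 1, distance k gets 1/k.
    return [(k - d + 1) / k for d in range(1, k + 1)]

window_weights(4, "triangular")  # [1.0, 0.75, 0.5, 0.25]
```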

Page 58: Distributional Semantic Models - Part 1: Introduction

Taxonomy of DSM parameters DSM parameters

Effect of different window sizes

Nearest neighbours of dog (BNC)

2-word window: cat, horse, fox, pet, rabbit, pig, animal, mongrel, sheep, pigeon

30-word window: kennel, puppy, pet, bitch, terrier, rottweiler, canine, cat, to bark, Alsatian

DSM Tutorial – Part 1 37 / 91

Page 59: Distributional Semantic Models - Part 1: Introduction

Taxonomy of DSM parameters DSM parameters

Textual context

Context term is in the same linguistic unit as target.

The silhouette of the sun beyond a wide-open bay on the lake; the sun still glitters although evening has arrived in Kuhmo. It's midsummer; the living room has its instruments and other objects in each of its corners.

Parameters:
- type of linguistic unit
  - sentence
  - paragraph
  - turn in a conversation
  - Web page

DSM Tutorial – Part 1 38 / 91

Page 60: Distributional Semantic Models - Part 1: Introduction

Taxonomy of DSM parameters DSM parameters

Syntactic context

Context term is linked to target by a syntactic dependency (e.g. subject, modifier, . . . ).

The silhouette of the sun beyond a wide-open bay on the lake; the sun still glitters although evening has arrived in Kuhmo. It's midsummer; the living room has its instruments and other objects in each of its corners.

Parameters:
- types of syntactic dependency (Padó and Lapata 2007)
- direct vs. indirect dependency paths
  - direct dependencies
  - direct + indirect dependencies
- homogeneous data (e.g. only verb-object) vs. heterogeneous data (e.g. all children and parents of the verb)
- maximal length of dependency path

© Evert/Baroni/Lenci (CC-by-sa) DSM Tutorial – Part 1 39 / 91

Page 61: Distributional Semantic Models - Part 1: Introduction

Taxonomy of DSM parameters DSM parameters

“Knowledge pattern” context

Context term is linked to target by a lexico-syntactic pattern (text mining, cf. Hearst 1992, Pantel & Pennacchiotti 2008, etc.).

In Provence, Van Gogh painted with bright colors such as red and yellow. These colors produce incredible effects on anybody looking at his paintings.

Parameters:
- inventory of lexical patterns
  - lots of research to identify semantically interesting patterns (cf. Almuhareb & Poesio 2004, Veale & Hao 2008, etc.)
- fixed vs. flexible patterns
  - patterns are mined from large corpora and automatically generalised (optional elements, POS tags or semantic classes)

DSM Tutorial – Part 1 40 / 91

Page 62: Distributional Semantic Models - Part 1: Introduction

Taxonomy of DSM parameters DSM parameters

Structured vs. unstructured context

- In unstructured models, context specification acts as a filter
  - determines whether a context token counts as co-occurrence
  - e.g. linked by specific syntactic relation such as verb-object
- In structured models, context words are subtyped
  - depending on their position in the context
  - e.g. left vs. right context, type of syntactic relation, etc.

DSM Tutorial – Part 1 41 / 91


Page 64: Distributional Semantic Models - Part 1: Introduction

Taxonomy of DSM parameters DSM parameters

Structured vs. unstructured surface context

A dog bites a man. The man's dog bites a dog. A dog bites a man.

unstructured      bite
  dog               4
  man               3

structured      bite-l   bite-r
  dog              3        1
  man              1        2

DSM Tutorial – Part 1 42 / 91
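A minimal sketch of the unstructured/structured distinction for surface context (toy sentence and function name invented; as in the bite-l column above, "-l" marks occurrences where the target stands to the left of the feature):

```python
from collections import defaultdict

def surface_counts(tokens, targets, feature, k=1, structured=False):
    """Count target-feature co-occurrences in a +/-k token window.
    With structured=True the feature is subtyped by the target's side:
    '-l' if the target is left of the feature, '-r' if it is right."""
    counts = defaultdict(int)
    for i, w in enumerate(tokens):
        if w not in targets:
            continue
        for j in range(max(0, i - k), min(len(tokens), i + k + 1)):
            if j != i and tokens[j] == feature:
                key = feature + ("-l" if j > i else "-r") if structured else feature
                counts[(w, key)] += 1
    return counts

toks = "dog bites man man bites dog".split()
flat = surface_counts(toks, {"dog", "man"}, "bites", k=1)
typed = surface_counts(toks, {"dog", "man"}, "bites", k=1, structured=True)
# flat[("dog", "bites")] == 2; typed splits this into bites-l and bites-r
```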


Page 66: Distributional Semantic Models - Part 1: Introduction

Taxonomy of DSM parameters DSM parameters

Structured vs. unstructured dependency context

A dog bites a man. The man's dog bites a dog. A dog bites a man.

unstructured      bite
  dog               4
  man               2

structured      bite-subj   bite-obj
  dog               3           1
  man               0           2

DSM Tutorial – Part 1 43 / 91


Page 68: Distributional Semantic Models - Part 1: Introduction

Taxonomy of DSM parameters DSM parameters

Comparison

- Unstructured context
  - data less sparse (e.g. man kills and kills man both map to the kill dimension of the vector x_man)
- Structured context
  - more sensitive to semantic distinctions (kill-subj and kill-obj are rather different things!)
  - dependency relations provide a form of syntactic "typing" of the DSM dimensions (the "subject" dimensions, the "recipient" dimensions, etc.)
  - important to account for word order and compositionality

DSM Tutorial – Part 1 44 / 91

Page 69: Distributional Semantic Models - Part 1: Introduction

Taxonomy of DSM parameters DSM parameters

Overview of DSM parameters

Term-context vs. term-term matrix
  ⇓
Definition of terms & linguistic pre-processing
  ⇓
Size & type of context
  ⇓
Geometric vs. probabilistic interpretation
  ⇓
Feature scaling
  ⇓
Normalisation of rows and/or columns
  ⇓
Similarity / distance measure
  ⇓
Dimensionality reduction

DSM Tutorial – Part 1 45 / 91

Page 70: Distributional Semantic Models - Part 1: Introduction

Taxonomy of DSM parameters DSM parameters

Geometric vs. probabilistic interpretation

- Geometric interpretation
  - row vectors as points or arrows in n-dim. space
  - very intuitive, good for visualisation
  - use techniques from geometry and linear algebra
- Probabilistic interpretation
  - co-occurrence matrix as observed sample statistic
  - "explained" by generative probabilistic model
  - recent work focuses on hierarchical Bayesian models
  - probabilistic LSA (Hofmann 1999), Latent Semantic Clustering (Rooth et al. 1999), Latent Dirichlet Allocation (Blei et al. 2003), etc.
  - explicitly accounts for random variation of frequency counts
  - intuitive and plausible as topic model

⇒ focus on geometric interpretation in this tutorial

DSM Tutorial – Part 1 46 / 91


Page 73: Distributional Semantic Models - Part 1: Introduction

Taxonomy of DSM parameters DSM parameters

Overview of DSM parameters

Term-context vs. term-term matrix
  ⇓
Definition of terms & linguistic pre-processing
  ⇓
Size & type of context
  ⇓
Geometric vs. probabilistic interpretation
  ⇓
Feature scaling
  ⇓
Normalisation of rows and/or columns
  ⇓
Similarity / distance measure
  ⇓
Dimensionality reduction

DSM Tutorial – Part 1 47 / 91

Page 74: Distributional Semantic Models - Part 1: Introduction

Taxonomy of DSM parameters DSM parameters

Feature scaling

Feature scaling is used to "discount" less important features:

- Logarithmic scaling: x′ = log(x + 1) (cf. Weber-Fechner law for human perception)
- Relevance weighting, e.g. tf.idf (information retrieval)
- Statistical association measures (Evert 2004, 2008) take the frequency of target word and context feature into account
  - the less frequent the target word and (more importantly) the context feature are, the higher the weight given to their observed co-occurrence count should be (because their expected chance co-occurrence frequency is low)
  - different measures – e.g. mutual information, log-likelihood ratio – differ in how they balance observed and expected co-occurrence frequencies

DSM Tutorial – Part 1 48 / 91
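The logarithmic scaling above fits in a few lines (the helper name is invented):

```python
import math

def log_scale(row):
    """Feature scaling x' = log(x + 1): compresses large counts while
    keeping zero counts at zero (cf. the Weber-Fechner law)."""
    return [math.log(x + 1) for x in row]

scaled = log_scale([0, 1, 9, 99])  # ~[0.0, 0.69, 2.30, 4.61]
```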


Page 77: Distributional Semantic Models - Part 1: Introduction

Taxonomy of DSM parameters DSM parameters

Association measures: Mutual Information (MI)

word1   word2           fobs      f1        f2
dog     small            855    33,338   490,580
dog     domesticated      29    33,338       918

Expected co-occurrence frequency:

    fexp = (f1 · f2) / N

Mutual Information compares observed vs. expected frequency:

    MI(w1, w2) = log2 (fobs / fexp) = log2 (N · fobs / (f1 · f2))

Disadvantage: MI overrates combinations of rare terms.

DSM Tutorial – Part 1 49 / 91


Page 81: Distributional Semantic Models - Part 1: Introduction

Taxonomy of DSM parameters DSM parameters

Other association measures

word1   word2           fobs    fexp        MI   local-MI   t-score
dog     small            855   134.34     2.67    2282.88     24.64
dog     domesticated      29     0.25     6.85     198.76      5.34
dog     sgjkj              1     0.00027 11.85      11.85      1.00

The log-likelihood ratio (Dunning 1993) has a more complex form, but its "core" is known as local MI (Evert 2004).

    local-MI(w1, w2) = fobs · MI(w1, w2)

The t-score measure (Church and Hanks 1990) is popular in lexicography:

    t-score(w1, w2) = (fobs − fexp) / √fobs

Details & many more measures: http://www.collocations.de/

DSM Tutorial – Part 1 50 / 91
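The measures above follow directly from their formulas. In this sketch the corpus size N is an assumption (chosen so that fexp for dog/small matches the 134.34 shown in the table, roughly the BNC order of magnitude); all function names are invented for illustration:

```python
import math

# Assumed corpus size (tokens); not given on the slides.
N = 121_740_000

def fexp(f1, f2, n=N):
    """Expected chance co-occurrence frequency."""
    return f1 * f2 / n

def mi(fobs, f1, f2, n=N):
    """Mutual Information: log2 of observed vs. expected frequency."""
    return math.log2(fobs / fexp(f1, f2, n))

def local_mi(fobs, f1, f2, n=N):
    """Core of the log-likelihood ratio: observed count times MI."""
    return fobs * mi(fobs, f1, f2, n)

def t_score(fobs, f1, f2, n=N):
    """t-score of Church and Hanks (1990)."""
    return (fobs - fexp(f1, f2, n)) / math.sqrt(fobs)

# dog/small from the table: fobs=855, f1=33,338, f2=490,580
round(mi(855, 33338, 490580), 2)  # 2.67
```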


Page 84: Distributional Semantic Models - Part 1: Introduction

Taxonomy of DSM parameters DSM parameters

Overview of DSM parameters

Term-context vs. term-term matrix
  ⇓
Definition of terms & linguistic pre-processing
  ⇓
Size & type of context
  ⇓
Geometric vs. probabilistic interpretation
  ⇓
Feature scaling
  ⇓
Normalisation of rows and/or columns
  ⇓
Similarity / distance measure
  ⇓
Dimensionality reduction

DSM Tutorial – Part 1 51 / 91

Page 85: Distributional Semantic Models - Part 1: Introduction

Taxonomy of DSM parameters DSM parameters

Normalisation of row vectors

- geometric distances only make sense if vectors are normalised to unit length
- divide vector by its length: x/‖x‖
- normalisation depends on distance measure!
- special case: scale to relative frequencies with ‖x‖1 = |x1| + · · · + |xn| → probabilistic interpretation

[Figure: "Two dimensions of English V-Obj DSM" – cat, dog, knife, boat plotted in the plane spanned by the verb dimensions get and use]

DSM Tutorial – Part 1 52 / 91
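Both normalisations above can be sketched with one helper (the function name is invented): Euclidean length for geometric distances, L1 length for relative frequencies:

```python
def normalise(x, p=2):
    """Scale vector x to unit p-norm: p=2 gives unit Euclidean length,
    p=1 turns a count vector into relative frequencies."""
    norm = sum(abs(v) ** p for v in x) ** (1 / p)
    return [v / norm for v in x]

normalise([3, 4])          # unit Euclidean length
normalise([2, 1, 1], p=1)  # relative frequencies, summing to 1
```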

Page 86: Distributional Semantic Models - Part 1: Introduction

Taxonomy of DSM parameters DSM parameters

Scaling of column vectors

- In statistical analysis and machine learning, features are usually centred and scaled so that mean µ = 0 and variance σ² = 1
- In DSM research, this step is less common for the columns of M
  - centring is a prerequisite for certain dimensionality reduction and data analysis techniques (esp. PCA)
  - scaling may give too much weight to rare features
- M cannot be row-normalised and column-scaled at the same time (the result depends on the ordering of the two steps)

DSM Tutorial – Part 1 53 / 91


Page 88: Distributional Semantic Models - Part 1: Introduction

Taxonomy of DSM parameters DSM parameters

Overview of DSM parameters

Term-context vs. term-term matrix
  ⇓
Definition of terms & linguistic pre-processing
  ⇓
Size & type of context
  ⇓
Geometric vs. probabilistic interpretation
  ⇓
Feature scaling
  ⇓
Normalisation of rows and/or columns
  ⇓
Similarity / distance measure
  ⇓
Dimensionality reduction

DSM Tutorial – Part 1 54 / 91

Page 89: Distributional Semantic Models - Part 1: Introduction

Taxonomy of DSM parameters DSM parameters

Geometric distance

- Distance between vectors u, v ∈ Rn → (dis)similarity
  - u = (u1, . . . , un)
  - v = (v1, . . . , vn)
- Euclidean distance d2(u, v)
- "City block" Manhattan distance d1(u, v)
- Both are special cases of the Minkowski p-distance dp(u, v) (for p ∈ [1, ∞])

[Figure: vectors u and v in the plane, with d2(u, v) = 3.6 and d1(u, v) = 5]

    d2(u, v) := √((u1 − v1)² + · · · + (un − vn)²)

    d1(u, v) := |u1 − v1| + · · · + |un − vn|

    dp(u, v) := (|u1 − v1|^p + · · · + |un − vn|^p)^(1/p)

    d∞(u, v) = max{|u1 − v1|, . . . , |un − vn|}

DSM Tutorial – Part 1 55 / 91
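The Minkowski family can be sketched in one function; the example vectors are chosen so the component differences are 2 and 3, matching the figure's d2 = 3.6 and d1 = 5:

```python
def minkowski(u, v, p=2):
    """Minkowski p-distance: p=2 is Euclidean, p=1 Manhattan,
    p=float('inf') the maximum (Chebyshev) distance."""
    diffs = [abs(a - b) for a, b in zip(u, v)]
    if p == float("inf"):
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1 / p)

u, v = (1, 2), (3, 5)   # component differences: 2 and 3
minkowski(u, v, p=2)    # sqrt(13) ~ 3.6
minkowski(u, v, p=1)    # 5
```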

Page 94: Distributional Semantic Models - Part 1: Introduction

Taxonomy of DSM parameters DSM parameters

Other distance measures

I Information theory: Kullback-Leibler (KL) divergence forprobability vectors (non-negative, ‖x‖1 = 1)

D(u‖v) =n∑

i=1ui · log2

uivi

I Properties of KL divergenceI most appropriate in a probabilistic interpretation of MI zeroes in v without corresponding zeroes in u are problematicI not symmetric, unlike geometric distance measuresI alternatives: skew divergence, Jensen-Shannon divergence

I A symmetric distance measure (Endres and Schindelin 2003)

Duv = D(u‖z) + D(v‖z) with z = u + v2

DSM Tutorial – Part 1 56 / 91

Page 95: Distributional Semantic Models - Part 1: Introductioncompling.hss.ntu.edu.sg/courses/hg2002/pdf/... · Semantic map (V-Obj from BNC) l l l l bird groundAnimal fruitTree green tool

Taxonomy of DSM parameters DSM parameters

Other distance measures

I Information theory: Kullback-Leibler (KL) divergence forprobability vectors (non-negative, ‖x‖1 = 1)

D(u‖v) =n∑

i=1ui · log2

uivi

I Properties of KL divergenceI most appropriate in a probabilistic interpretation of MI zeroes in v without corresponding zeroes in u are problematicI not symmetric, unlike geometric distance measuresI alternatives: skew divergence, Jensen-Shannon divergence

I A symmetric distance measure (Endres and Schindelin 2003)

Duv = D(u‖z) + D(v‖z) with z = u + v2

DSM Tutorial – Part 1 56 / 91

Page 96: Distributional Semantic Models - Part 1: Introductioncompling.hss.ntu.edu.sg/courses/hg2002/pdf/... · Semantic map (V-Obj from BNC) l l l l bird groundAnimal fruitTree green tool

Taxonomy of DSM parameters DSM parameters

Other distance measures

I Information theory: Kullback-Leibler (KL) divergence forprobability vectors (non-negative, ‖x‖1 = 1)

D(u‖v) =n∑

i=1ui · log2

uivi

I Properties of KL divergenceI most appropriate in a probabilistic interpretation of MI zeroes in v without corresponding zeroes in u are problematicI not symmetric, unlike geometric distance measuresI alternatives: skew divergence, Jensen-Shannon divergence

I A symmetric distance measure (Endres and Schindelin 2003)

Duv = D(u‖z) + D(v‖z) with z = u + v2

DSM Tutorial – Part 1 56 / 91
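Both divergences follow directly from the definitions. In this sketch, probability vectors are assumed to be lists summing to 1; zero entries in u are skipped by the usual convention, while a zero in v still breaks the computation, which is exactly the caveat noted above.

```python
import math

def kl(u, v):
    """D(u‖v) in bits; fails if v has a zero where u does not (the caveat above)."""
    return sum(ui * math.log2(ui / vi) for ui, vi in zip(u, v) if ui > 0)

def sym_divergence(u, v):
    """Symmetric measure of Endres and Schindelin: D(u‖z) + D(v‖z), z = (u + v)/2."""
    z = [(ui + vi) / 2 for ui, vi in zip(u, v)]
    return kl(u, z) + kl(v, z)

u = [0.8, 0.1, 0.1]
v = [0.4, 0.4, 0.2]
print(round(kl(u, v), 3), round(kl(v, u), 3))      # → 0.5 0.6 (not symmetric)
print(sym_divergence(u, v) == sym_divergence(v, u))  # → True
```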

Taxonomy of DSM parameters DSM parameters

Similarity measures

I angle α between two vectors u, v is given by

    cos α = (∑_{i=1}^{n} ui · vi) / (√(∑_i ui²) · √(∑_i vi²)) = ⟨u, v⟩ / (‖u‖2 · ‖v‖2)

I cosine measure of similarity: cos α
I cos α = 1 Ü collinear
I cos α = 0 Ü orthogonal

[Figure: "Two dimensions of English V−Obj DSM", with the nouns cat, dog, knife, boat plotted by co-occurrence with the verbs get (x-axis) and use (y-axis); an example angle α = 54.3° is shown]

DSM Tutorial – Part 1 57 / 91
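The cosine formula in code form; the verb co-occurrence counts below are invented for illustration and are not taken from the BNC model.

```python
import math

def cosine(u, v):
    """cos α = <u, v> / (||u||2 · ||v||2)"""
    dot = sum(x * y for x, y in zip(u, v))
    return dot / (math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v)))

# hypothetical co-occurrence counts with the verbs (get, use)
knife = (6, 50)
boat = (50, 10)

alpha = math.degrees(math.acos(cosine(knife, boat)))  # angle between the vectors
print(round(alpha, 1))

print(round(cosine((2, 4), (1, 2)), 6))  # → 1.0 (collinear)
print(cosine((1, 0), (0, 1)))            # → 0.0 (orthogonal)
```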

Taxonomy of DSM parameters DSM parameters

Overview of DSM parameters

Term-context vs. term-term matrix
  ⇓
Definition of terms & linguistic pre-processing
  ⇓
Size & type of context
  ⇓
Geometric vs. probabilistic interpretation
  ⇓
Feature scaling
  ⇓
Normalisation of rows and/or columns
  ⇓
Similarity / distance measure
  ⇓
Dimensionality reduction

DSM Tutorial – Part 1 58 / 91

Taxonomy of DSM parameters DSM parameters

Dimensionality reduction = model compression

I Co-occurrence matrix M is often unmanageably large and can be extremely sparse
  I Google Web1T5: 1M × 1M matrix with one trillion cells, of which less than 0.05% contain nonzero counts (Evert 2010)
å Compress matrix by reducing dimensionality (= rows)
I Feature selection: columns with high frequency & variance
  I measured by entropy, chi-squared test, . . .
  I may select correlated (Ü uninformative) dimensions
  I joint selection of multiple features is useful but expensive
I Projection into (linear) subspace
  I principal component analysis (PCA)
  I independent component analysis (ICA)
  I random indexing (RI)
+ intuition: preserve distances between data points

DSM Tutorial – Part 1 59 / 91
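The simplest compression strategy listed above, feature selection by variance, can be sketched directly; the toy co-occurrence matrix is invented.

```python
# Toy co-occurrence matrix: rows = target words, columns = context features
M = [
    [10, 0, 3, 0, 1],
    [8, 1, 4, 0, 1],
    [0, 9, 0, 7, 1],
    [1, 8, 0, 6, 1],
]

def variance(col):
    m = sum(col) / len(col)
    return sum((x - m) ** 2 for x in col) / len(col)

def select_features(M, k):
    """Keep the k highest-variance columns (a simple feature-selection step)."""
    cols = list(zip(*M))  # transpose: columns of M
    ranked = sorted(range(len(cols)), key=lambda j: variance(cols[j]), reverse=True)
    keep = sorted(ranked[:k])
    return [[row[j] for j in keep] for row in M]

# The constant last column (variance 0) is always dropped first
print(select_features(M, 2))  # → [[10, 0], [8, 1], [0, 9], [1, 8]]
```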

Taxonomy of DSM parameters DSM parameters

Dimensionality reduction & latent dimensions

Landauer and Dumais (1997) claim that LSA dimensionality reduction (and the related PCA technique) uncovers latent dimensions by exploiting correlations between features.

I Example: term-term matrix
  I V-Obj cooc's extracted from BNC
  I targets = noun lemmas
  I features = verb lemmas
I feature scaling: association scores (modified log Dice coefficient)
I k = 111 nouns with f ≥ 20 (must have non-zero row vectors)
I n = 2 dimensions: buy and sell

    noun       buy    sell
    bond       0.28   0.77
    cigarette -0.52   0.44
    dress      0.51  -1.30
    freehold  -0.01  -0.08
    land       1.13   1.54
    number    -1.05  -1.02
    per       -0.35  -0.16
    pub       -0.08  -1.30
    share      1.92   1.99
    system    -1.63  -0.70

DSM Tutorial – Part 1 60 / 91

Taxonomy of DSM parameters DSM parameters

Dimensionality reduction & latent dimensions

[Scatter plot: the 111 nouns plotted by their association scores with buy (x-axis) and sell (y-axis)]

DSM Tutorial – Part 1 61 / 91

Taxonomy of DSM parameters DSM parameters

Motivating latent dimensions & subspace projection

I The latent property of being a commodity is "expressed" through associations with several verbs: sell, buy, acquire, . . .
I Consequence: these DSM dimensions will be correlated
I Identify latent dimension by looking for strong correlations (or weaker correlations between large sets of features)
I Projection into subspace V of k < n latent dimensions as a "noise reduction" technique Ü LSA
I Assumptions of this approach:
  I "latent" distances in V are semantically meaningful
  I other "residual" dimensions represent chance co-occurrence patterns, often particular to the corpus underlying the DSM

DSM Tutorial – Part 1 62 / 91
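One assumed reading of this idea in code: because the buy and sell scores from the earlier table are correlated, each noun can be given a single latent "commodity" coordinate by projecting onto the diagonal direction (1, 1)/√2. The projection direction is an illustrative simplification, not the actual SVD axis.

```python
import math

scores = {  # (buy, sell) association scores from the table above
    "bond": (0.28, 0.77), "cigarette": (-0.52, 0.44), "dress": (0.51, -1.30),
    "freehold": (-0.01, -0.08), "land": (1.13, 1.54), "number": (-1.05, -1.02),
    "per": (-0.35, -0.16), "pub": (-0.08, -1.30), "share": (1.92, 1.99),
    "system": (-1.63, -0.70),
}

def commodity(buy, sell):
    """Projection onto the unit diagonal: (buy + sell) / sqrt(2)."""
    return (buy + sell) / math.sqrt(2)

for noun, (b, s) in sorted(scores.items(), key=lambda kv: -commodity(*kv[1])):
    print(f"{noun:10s} {commodity(b, s):6.2f}")
# share and land come out as the most commodity-like nouns
```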

Taxonomy of DSM parameters DSM parameters

The latent "commodity" dimension

[Scatter plot: the same nouns in the buy/sell plane; the correlated scores line up along a single latent "commodity" direction]

DSM Tutorial – Part 1 63 / 91

Taxonomy of DSM parameters Examples

Outline

Introduction
  The distributional hypothesis
  Three famous DSM examples
Taxonomy of DSM parameters
  Definition & overview
  DSM parameters
  Examples
DSM in practice
  Using DSM distances
  Quantitative evaluation
  Software and further information

DSM Tutorial – Part 1 64 / 91

Taxonomy of DSM parameters Examples

Some well-known DSM examples

Latent Semantic Analysis (Landauer and Dumais 1997)
I term-context matrix with document context
I weighting: log term frequency and term entropy
I distance measure: cosine
I dimensionality reduction: SVD

Hyperspace Analogue to Language (Lund and Burgess 1996)
I term-term matrix with surface context
I structured (left/right) and distance-weighted frequency counts
I distance measure: Minkowski metric (1 ≤ p ≤ 2)
I dimensionality reduction: feature selection (high variance)

DSM Tutorial – Part 1 65 / 91

Taxonomy of DSM parameters Examples

Some well-known DSM examples

Infomap NLP (Widdows 2004)
I term-term matrix with unstructured surface context
I weighting: none
I distance measure: cosine
I dimensionality reduction: SVD

Random Indexing (Karlgren and Sahlgren 2001)
I term-term matrix with unstructured surface context
I weighting: various methods
I distance measure: various methods
I dimensionality reduction: random indexing (RI)

DSM Tutorial – Part 1 66 / 91

Taxonomy of DSM parameters Examples

Some well-known DSM examples

Dependency Vectors (Padó and Lapata 2007)
I term-term matrix with unstructured dependency context
I weighting: log-likelihood ratio
I distance measure: information-theoretic (Lin 1998b)
I dimensionality reduction: none

Distributional Memory (Baroni and Lenci 2010)
I term-term matrix with structured and unstructured dependencies + knowledge patterns
I weighting: local-MI on type frequencies of link patterns
I distance measure: cosine
I dimensionality reduction: none

DSM Tutorial – Part 1 67 / 91


DSM in practice Using DSM distances

Nearest neighbours
DSM based on verb-object relations from BNC, reduced to 100 dim. with SVD

Neighbours of dog (cosine angle):
+ girl (45.5), boy (46.7), horse (47.0), wife (48.8), baby (51.9), daughter (53.1), side (54.9), mother (55.6), boat (55.7), rest (56.3), night (56.7), cat (56.8), son (57.0), man (58.2), place (58.4), husband (58.5), thing (58.8), friend (59.6), . . .

Neighbours of school:
+ country (49.3), church (52.1), hospital (53.1), house (54.4), hotel (55.1), industry (57.0), company (57.0), home (57.7), family (58.4), university (59.0), party (59.4), group (59.5), building (59.8), market (60.3), bank (60.4), business (60.9), area (61.4), department (61.6), club (62.7), town (63.3), library (63.3), room (63.6), service (64.4), police (64.7), . . .

DSM Tutorial – Part 1 69 / 91
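Nearest-neighbour retrieval amounts to sorting all words by their cosine angle from the target. The three-dimensional toy vectors below are invented; they stand in for the 100-dimensional SVD-reduced BNC model.

```python
import math

def angle(u, v):
    """Cosine angle in degrees between two row vectors; smaller = more similar."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(x * x for x in v))
    return math.degrees(math.acos(max(-1.0, min(1.0, dot / norm))))

# hypothetical co-occurrence vectors, not the BNC model
space = {
    "dog":   [12, 7, 1],
    "cat":   [10, 8, 2],
    "horse": [9, 5, 1],
    "knife": [0, 2, 14],
}

def neighbours(target, space):
    """All other words sorted by cosine angle from the target."""
    t = space[target]
    return sorted((angle(t, v), w) for w, v in space.items() if w != target)

for a, w in neighbours("dog", space):
    print(f"{w} ({a:.1f})")
```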

DSM in practice Using DSM distances

Nearest neighbours

[Semantic map: dog at the centre with its nearest neighbours (girl, boy, horse, wife, baby, daughter, cat, . . . ) arranged around it]

DSM Tutorial – Part 1 70 / 91

DSM in practice Using DSM distances

Semantic maps

[Semantic map (V−Obj from BNC): concrete nouns from the categories bird, groundAnimal, fruitTree, green, tool and vehicle plotted in two dimensions, e.g. chicken, eagle, duck; dog, elephant, cow; cherry, banana, pear; knife, hammer, spoon; boat, car, helicopter]

DSM Tutorial – Part 1 71 / 91

DSM in practice Using DSM distances

Clustering

[Dendrogram: "Word space clustering of concrete nouns (V−Obj from BNC)", hierarchical clustering of the concrete nouns; y-axis = cluster size, 0.0 to 1.2]

DSM Tutorial – Part 1 72 / 91

DSM in practice Using DSM distances

Latent dimensions

[Scatter plot: nouns plotted by their association scores with buy (x-axis) and sell (y-axis)]

DSM Tutorial – Part 1 73 / 91

DSM in practice Using DSM distances

Semantic similarity graph (topological structure)

[Graph: two neighbourhood graphs linking similar words, one around drinks and meals (tea, cup, coffee, drink, wine, whisky, beer, soup, champagne, vodka, rum, toast, breakfast, salad, meal, lunch, dinner, . . . ) and one around body parts and everyday nouns (head, arm, foot, finger, leg, eye, hand, face, door, car, house, . . . )]

DSM Tutorial – Part 1 74 / 91

DSM in practice Using DSM distances

Context vectors (Schütze 1998)

I Distributional representation only at type level
+ What is the "average" meaning of mouse? (computer vs. animal)
I Context vector approximates meaning of individual token
I bag-of-words approach: centroid of all context words in the sentence

[Figure: two token context vectors for mouse in the plane spanned by "animals" and "computers", one for "connect the mouse and keyboard to the computer" (centroid of connect, keyboard, computer) and one for "the furry cat chased the mouse" (centroid of furry, cat, chase)]

DSM Tutorial – Part 1 75 / 91
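A minimal sketch of Schütze-style context vectors: the meaning of one token of mouse is approximated by the centroid of the type vectors of the other words in its sentence. The two-dimensional type vectors (animal-ness, computer-ness) are invented for illustration.

```python
space = {  # hypothetical type vectors: (animal-ness, computer-ness)
    "furry": (0.9, 0.0), "cat": (0.95, 0.05), "chase": (0.7, 0.1),
    "connect": (0.0, 0.8), "keyboard": (0.05, 0.95), "computer": (0.0, 1.0),
}

def context_vector(words, space):
    """Centroid of the type vectors of the context words (bag of words)."""
    vecs = [space[w] for w in words if w in space]
    return tuple(sum(dim) / len(vecs) for dim in zip(*vecs))

# two tokens of "mouse" (context lemmas only; unknown words are skipped)
animal = context_vector(["the", "furry", "cat", "chase", "the"], space)
computer = context_vector(["connect", "the", "and", "keyboard", "to", "the", "computer"], space)

print(animal)    # leans towards the animal dimension
print(computer)  # leans towards the computer dimension
```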


DSM in practice Quantitative evaluation

The TOEFL synonym task

I The TOEFL dataset: 80 items
  I Target: levied
    Candidates: believed, correlated, imposed, requested
  I Target: fashion
    Candidates: craze, fathom, manner, ration
I DSMs and TOEFL
  1. take vectors of the target (t) and of the candidates (c1 . . . cn)
  2. measure the distance between t and ci , with 1 ≤ i ≤ n
  3. select ci with the shortest distance in space from t

© Evert/Baroni/Lenci (CC-by-sa) DSM Tutorial – Part 1 77 / 91
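The three-step procedure maps onto a few lines of code. The toy vectors are invented, with imposed deliberately placed closest to levied; a real model would use corpus-derived vectors.

```python
import math

def cosine_distance(u, v):
    """1 − cosine similarity, used as the distance in step 2."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(x * x for x in v))
    return 1 - dot / norm

def answer(target, candidates, space):
    """Step 3: pick the candidate whose vector lies closest to the target's."""
    return min(candidates, key=lambda c: cosine_distance(space[target], space[c]))

# hypothetical toy vectors for one TOEFL item
space = {
    "levied": (5, 1, 0), "believed": (0, 4, 3), "correlated": (1, 1, 4),
    "imposed": (4, 1, 1), "requested": (2, 3, 1),
}
print(answer("levied", ["believed", "correlated", "imposed", "requested"], space))
# → imposed
```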

DSM in practice Quantitative evaluation

Humans vs. machines on the TOEFL task

I Average foreign test taker: 64.5%
I Macquarie University staff (Rapp 2004):
  I Average of 5 non-natives: 86.75%
  I Average of 5 natives: 97.75%
I Distributional semantics
  I Classic LSA (Landauer and Dumais 1997): 64.4%
  I Padó and Lapata's (2007) dependency-based model: 73.0%
  I Distributional memory (Baroni and Lenci 2010): 76.9%
  I Rapp's (2004) SVD-based model, lemmatized BNC: 92.5%
  I Bullinaria and Levy (2012) carry out aggressive parameter optimization: 100.0%

DSM Tutorial – Part 1 78 / 91

DSM in practice Quantitative evaluation

Semantic similarity judgments

I Rubenstein and Goodenough (1965) collected similarity ratings for 65 noun pairs from 51 subjects on a 0–4 scale

    w1     w2          avg. rating
    car    automobile  3.9
    food   fruit       2.7
    cord   smile       0.0

I DSMs vs. Rubenstein & Goodenough
  1. for each test pair (w1, w2), take vectors w1 and w2
  2. measure the distance (e.g. cosine) between w1 and w2
  3. measure (Pearson) correlation between vector distances and R&G average judgments (Padó and Lapata 2007)

DSM Tutorial – Part 1 79 / 91
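Step 3 is a plain Pearson correlation between model distances and human ratings. The three distances below are invented; the ratings are the example pairs from the table. Raw distances anti-correlate with similarity ratings, hence the negative r.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

ratings = [3.9, 2.7, 0.0]      # car/automobile, food/fruit, cord/smile
distances = [0.1, 0.5, 0.9]    # invented cosine distances for the same pairs
print(round(pearson(ratings, distances), 2))  # → -0.98
```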

DSM in practice Quantitative evaluation

Semantic similarity judgments: example

[Scatter plot: "RG65: British National Corpus", angular distance (y-axis) against human rating (x-axis) for the 65 noun pairs; |rho| = 0.748, p = 0.0000, |r| = 0.623 .. 0.842]

DSM Tutorial – Part 1 80 / 91

DSM in practice Quantitative evaluation

Semantic similarity judgments: results

Results on the RG65 task:
I Padó and Lapata's (2007) dependency-based model: 0.62
I Dependency-based model on Web corpus (Herdağdelen et al. 2009)
  I without SVD reduction: 0.69
  I with SVD reduction: 0.80
I Distributional memory (Baroni and Lenci 2010): 0.82
I Salient Semantic Analysis (Hassan and Mihalcea 2011): 0.86

DSM Tutorial – Part 1 81 / 91


DSM in practice Software and further information

Software packages

    HiDEx            C++     re-implementation of the HAL model (Lund and Burgess 1996)
    SemanticVectors  Java    scalable architecture based on random indexing representation
    S-Space          Java    complex object-oriented framework
    JoBimText        Java    UIMA / Hadoop framework
    Gensim           Python  complex framework, focus on parallelization and out-of-core algorithms
    DISSECT          Python  user-friendly, designed for research on compositional semantics
    wordspace        R       interactive research laboratory, but scales to real-life data sets

DSM Tutorial – Part 1 83 / 91

Page 137: Distributional Semantic Models - Part 1: Introductioncompling.hss.ntu.edu.sg/courses/hg2002/pdf/... · Semantic map (V-Obj from BNC) l l l l bird groundAnimal fruitTree green tool

DSM in practice Software and further information

Recent conferences and workshops

I 2007: CoSMo Workshop (at Context '07)
I 2008: ESSLLI Lexical Semantics Workshop & Shared Task, Special Issue of the Italian Journal of Linguistics
I 2009: GeMS Workshop (EACL 2009), DiSCo Workshop (CogSci 2009), ESSLLI Advanced Course on DSM
I 2010: 2nd GeMS (ACL 2010), ESSLLI Workshop on Compositionality and DSM, DSM Tutorial (NAACL 2010), Special Issue of JNLE on Distributional Lexical Semantics
I 2011: 2nd DiSCo (ACL 2011), 3rd GeMS (EMNLP 2011)
I 2012: DiDaS (at ICSC 2012)
I 2013: CVSC (ACL 2013), TFDS (IWCS 2013), Dagstuhl Seminar
I 2014: 2nd CVSC (at EACL 2014)


DSM Tutorial – Part 1 84 / 91

Page 138: Distributional Semantic Models - Part 1: Introduction

DSM in practice Software and further information

Further information

I Handouts & other materials available from the wordspace wiki at http://wordspace.collocations.de/
  + based on joint work with Marco Baroni and Alessandro Lenci

I The tutorial is open source (CC) and can be downloaded from http://r-forge.r-project.org/projects/wordspace/

I Review paper on distributional semantics:
  Turney, Peter D. and Pantel, Patrick (2010). From frequency to meaning: Vector space models of semantics. Journal of Artificial Intelligence Research, 37, 141–188.

I I should be working on the textbook Distributional Semantics for Synthesis Lectures on HLT (Morgan & Claypool)

DSM Tutorial – Part 1 85 / 91

Page 139: Distributional Semantic Models - Part 1: Introduction

DSM in practice Software and further information

References I

Baroni, Marco and Lenci, Alessandro (2010). Distributional Memory: A general framework for corpus-based semantics. Computational Linguistics, 36(4), 673–712.

Bengio, Yoshua; Ducharme, Réjean; Vincent, Pascal; Jauvin, Christian (2003). A neural probabilistic language model. Journal of Machine Learning Research, 3, 1137–1155.

Blei, David M.; Ng, Andrew Y.; Jordan, Michael I. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3, 993–1022.

Bullinaria, John A. and Levy, Joseph P. (2012). Extracting semantic representations from word co-occurrence statistics: Stop-lists, stemming and SVD. Behavior Research Methods, 44(3), 890–907.

Church, Kenneth W. and Hanks, Patrick (1990). Word association norms, mutual information, and lexicography. Computational Linguistics, 16(1), 22–29.

Dumais, S. T.; Furnas, G. W.; Landauer, T. K.; Deerwester, S.; Harshman, R. (1988). Using latent semantic analysis to improve access to textual information. In CHI '88: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 281–285.

Dunning, Ted E. (1993). Accurate methods for the statistics of surprise and coincidence. Computational Linguistics, 19(1), 61–74.

DSM Tutorial – Part 1 86 / 91

Page 140: Distributional Semantic Models - Part 1: Introduction

DSM in practice Software and further information

References II

Endres, Dominik M. and Schindelin, Johannes E. (2003). A new metric for probability distributions. IEEE Transactions on Information Theory, 49(7), 1858–1860.

Evert, Stefan (2004). The Statistics of Word Cooccurrences: Word Pairs and Collocations. Dissertation, Institut für maschinelle Sprachverarbeitung, University of Stuttgart. Published in 2005, URN urn:nbn:de:bsz:93-opus-23714. Available from http://www.collocations.de/phd.html.

Evert, Stefan (2008). Corpora and collocations. In A. Lüdeling and M. Kytö (eds.), Corpus Linguistics. An International Handbook, chapter 58. Mouton de Gruyter, Berlin, New York.

Evert, Stefan (2010). Google Web 1T5 n-grams made easy (but not for the computer). In Proceedings of the 6th Web as Corpus Workshop (WAC-6), pages 32–40, Los Angeles, CA.

Firth, J. R. (1957). A synopsis of linguistic theory 1930–55. In Studies in Linguistic Analysis, pages 1–32. The Philological Society, Oxford. Reprinted in Palmer (1968), pages 168–205.

Grefenstette, Gregory (1994). Explorations in Automatic Thesaurus Discovery, volume 278 of Kluwer International Series in Engineering and Computer Science. Springer, Berlin, New York.

DSM Tutorial – Part 1 87 / 91

Page 141: Distributional Semantic Models - Part 1: Introduction

DSM in practice Software and further information

References III

Harris, Zellig (1954). Distributional structure. Word, 10(2–3), 146–162. Reprinted in Harris (1970, 775–794).

Hassan, Samer and Mihalcea, Rada (2011). Semantic relatedness using salient semantic analysis. In Proceedings of the Twenty-Fifth AAAI Conference on Artificial Intelligence.

Herdağdelen, Amaç; Erk, Katrin; Baroni, Marco (2009). Measuring semantic relatedness with vector space models and random walks. In Proceedings of the 2009 Workshop on Graph-based Methods for Natural Language Processing (TextGraphs-4), pages 50–53, Suntec, Singapore.

Hofmann, Thomas (1999). Probabilistic latent semantic analysis. In Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence (UAI '99).

Karlgren, Jussi and Sahlgren, Magnus (2001). From words to understanding. In Y. Uesaka, P. Kanerva, and H. Asoh (eds.), Foundations of Real-World Intelligence, pages 294–308. CSLI Publications, Stanford.

Landauer, Thomas K. and Dumais, Susan T. (1997). A solution to Plato's problem: The latent semantic analysis theory of acquisition, induction and representation of knowledge. Psychological Review, 104(2), 211–240.

DSM Tutorial – Part 1 88 / 91

Page 142: Distributional Semantic Models - Part 1: Introduction

DSM in practice Software and further information

References IV

Li, Ping; Burgess, Curt; Lund, Kevin (2000). The acquisition of word meaning through global lexical co-occurrences. In E. V. Clark (ed.), The Proceedings of the Thirtieth Annual Child Language Research Forum, pages 167–178. Stanford Linguistics Association.

Lin, Dekang (1998a). Automatic retrieval and clustering of similar words. In Proceedings of the 17th International Conference on Computational Linguistics (COLING-ACL 1998), pages 768–774, Montreal, Canada.

Lin, Dekang (1998b). An information-theoretic definition of similarity. In Proceedings of the 15th International Conference on Machine Learning (ICML-98), pages 296–304, Madison, WI.

Lund, Kevin and Burgess, Curt (1996). Producing high-dimensional semantic spaces from lexical co-occurrence. Behavior Research Methods, Instruments, & Computers, 28(2), 203–208.

Miller, George A. (1986). Dictionaries in the mind. Language and Cognitive Processes, 1, 171–185.

Padó, Sebastian and Lapata, Mirella (2007). Dependency-based construction of semantic space models. Computational Linguistics, 33(2), 161–199.

DSM Tutorial – Part 1 89 / 91

Page 143: Distributional Semantic Models - Part 1: Introduction

DSM in practice Software and further information

References V

Pantel, Patrick and Lin, Dekang (2000). An unsupervised approach to prepositional phrase attachment using contextually similar words. In Proceedings of the 38th Annual Meeting of the Association for Computational Linguistics, Hong Kong, China.

Pantel, Patrick; Crestan, Eric; Borkovsky, Arkady; Popescu, Ana-Maria; Vyas, Vishnu (2009). Web-scale distributional similarity and entity set expansion. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, pages 938–947, Singapore.

Rapp, Reinhard (2004). A freely available automatically generated thesaurus of related words. In Proceedings of the 4th International Conference on Language Resources and Evaluation (LREC 2004), pages 395–398.

Rooth, Mats; Riezler, Stefan; Prescher, Detlef; Carroll, Glenn; Beil, Franz (1999). Inducing a semantically annotated lexicon via EM-based clustering. In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics, pages 104–111.

Rubenstein, Herbert and Goodenough, John B. (1965). Contextual correlates of synonymy. Communications of the ACM, 8(10), 627–633.

Schütze, Hinrich (1992). Dimensions of meaning. In Proceedings of Supercomputing '92, pages 787–796, Minneapolis, MN.

DSM Tutorial – Part 1 90 / 91

Page 144: Distributional Semantic Models - Part 1: Introduction

DSM in practice Software and further information

References VI

Schütze, Hinrich (1993). Word space. In Proceedings of Advances in Neural Information Processing Systems 5, pages 895–902, San Mateo, CA.

Schütze, Hinrich (1995). Distributional part-of-speech tagging. In Proceedings of the 7th Conference of the European Chapter of the Association for Computational Linguistics (EACL 1995), pages 141–148.

Schütze, Hinrich (1998). Automatic word sense discrimination. Computational Linguistics, 24(1), 97–123.

Turney, Peter D. and Pantel, Patrick (2010). From frequency to meaning: Vector space models of semantics. Journal of Artificial Intelligence Research, 37, 141–188.

Turney, Peter D.; Littman, Michael L.; Bigham, Jeffrey; Shnayder, Victor (2003). Combining independent modules to solve multiple-choice synonym and analogy problems. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP-03), pages 482–489, Borovets, Bulgaria.

Widdows, Dominic (2004). Geometry and Meaning. Number 172 in CSLI Lecture Notes. CSLI Publications, Stanford.

DSM Tutorial – Part 1 91 / 91