Distributional semantics with eyes - Enriching corpus...

36
Distributional semantics with eyes Enriching corpus-based models of word meaning with automatically extracted visual features Marco Baroni Center for Mind/Brain Sciences University of Trento Computational Linguistics Colloquium Computational Linguistics & Phonetics Department Saarland University May 2012 1

Transcript of Distributional semantics with eyes - Enriching corpus...

Page 1: Distributional semantics with eyes - Enriching corpus ...marcobaroni.org/publications/lectures/eyed-distsem-saarbruecken... · Distributional semantics: A general-purpose representation

Distributional semantics with eyesEnriching corpus-based models of word meaning

with automatically extracted visual features

Marco Baroni

Center for Mind/Brain SciencesUniversity of Trento

Computational Linguistics ColloquiumComputational Linguistics & Phonetics Department

Saarland UniversityMay 2012

1

Page 2: Distributional semantics with eyes - Enriching corpus ...marcobaroni.org/publications/lectures/eyed-distsem-saarbruecken... · Distributional semantics: A general-purpose representation

Collaborators

award

Jasper Uijlings

Giang Binh Tran

Nam Khanh Tran

Gemma Boleda

Elia Bruni

2

Page 3: Distributional semantics with eyes - Enriching corpus ...marcobaroni.org/publications/lectures/eyed-distsem-saarbruecken... · Distributional semantics: A general-purpose representation

Warning!

This talk is NOT about:

I image retrievalI image annotationI object recognitionI connecting specific images to captions/phrases/wordsI improving computer vision

The talk is about using information extracted from images toimprove the semantic representation of word types

3

Page 4: Distributional semantics with eyes - Enriching corpus ...marcobaroni.org/publications/lectures/eyed-distsem-saarbruecken... · Distributional semantics: A general-purpose representation

The distributional hypothesisHarris, Charles and Miller, Firth, Wittgenstein? . . .

The meaning of a word is (can beapproximated by, derived from) the setof contexts in which it occurs in texts

4

Page 5: Distributional semantics with eyes - Enriching corpus ...marcobaroni.org/publications/lectures/eyed-distsem-saarbruecken... · Distributional semantics: A general-purpose representation

The distributional hypothesis in real lifeMcDonald & Ramscar 2001

He filled the wampimuk, passed itaround and we all drunk some

We found a little, hairy wampimuksleeping behind the tree

5

Page 6: Distributional semantics with eyes - Enriching corpus ...marcobaroni.org/publications/lectures/eyed-distsem-saarbruecken... · Distributional semantics: A general-purpose representation

Distributional semanticsLandauer and Dumais (1997), Turney and Pantel (2010), . . .

he curtains open and the moon shining in on the barelyars and the cold , close moon " . And neither of the wrough the night with the moon shining so brightly , itmade in the light of the moon . It all boils down , wrsurely under a crescent moon , thrilled by ice-white

sun , the seasons of the moon ? Home , alone , Jay plam is dazzling snow , the moon has risen full and coldun and the temple of the moon , driving out of the hugin the dark and now the moon rises , full and amber a

bird on the shape of the moon over the trees in frontBut I could n’t see the moon or the stars , only the

rning , with a sliver of moon hanging among the starsthey love the sun , the moon and the stars . None of

the light of an enormous moon . The plash of flowing wman ’s first step on the moon ; various exhibits , aerthe inevitable piece of moon rock . Housing The Airsh

oud obscured part of the moon . The Allied guns behind

6

Page 7: Distributional semantics with eyes - Enriching corpus ...marcobaroni.org/publications/lectures/eyed-distsem-saarbruecken... · Distributional semantics: A general-purpose representation

Distributional semanticsDistributional meaning as co-occurrence vector

planet night full shadow shine crescent

moon 10 22 43 16 29 12

sun 14 10 4 15 45 0

dog 0 4 2 10 0 0

7

Page 8: Distributional semantics with eyes - Enriching corpus ...marcobaroni.org/publications/lectures/eyed-distsem-saarbruecken... · Distributional semantics: A general-purpose representation

Distributional semanticsThe geometry of meaning

shadow shinemoon 16 29sun 15 45dog 10 0

0 5 10 15 20

010

2030

4050

shadow

shine

dog (10,0)

sun (15,45)

moon (16,29)

8

Page 9: Distributional semantics with eyes - Enriching corpus ...marcobaroni.org/publications/lectures/eyed-distsem-saarbruecken... · Distributional semantics: A general-purpose representation

Distributional semantics: A general-purposerepresentation of lexical meaningBaroni and Lenci, 2010

I Similarity (cord-string vs. cord-smile)I Synonymy (zenith-pinnacle)I Concept categorization (car ISA vehicle; banana ISA fruit)I Selectional preferences (eat topinambur vs. *eat sympathy)I Analogy (mason is to stone like carpenter is to wood)I Relation classification (exam-anxiety are in

CAUSE-EFFECT relation)I Qualia (TELIC ROLE of novel is to entertain)I Salient properties (car-wheels, dog-barking)I Argument alternations (John broke the vase - the vase

broke, John minces the meat - *the meat minced)

9

Page 10: Distributional semantics with eyes - Enriching corpus ...marcobaroni.org/publications/lectures/eyed-distsem-saarbruecken... · Distributional semantics: A general-purpose representation

The ungrounded nature of distributional semanticsGlenberg and Robertson 2000, Andrews, Vigliocco and Vinson 2009,Riordan and Jones 2010. . .

Describing tigers. . .

Humans (McRae et al.,2005):

I have stripesI have teethI are blackI . . .

State-of-the art distributionalmodel (Baroni et al., 2010):

I live in jungleI can killI risk extinctionI . . .

10

Page 11: Distributional semantics with eyes - Enriching corpus ...marcobaroni.org/publications/lectures/eyed-distsem-saarbruecken... · Distributional semantics: A general-purpose representation

The ungrounded nature of distributional semantics

SV

DS

UB

JEC

TS

TAXO RELENT PART QUALITY ACTIVITY FUNCTION LOCATION

Baroni and Lenci 2008 11

Page 12: Distributional semantics with eyes - Enriching corpus ...marcobaroni.org/publications/lectures/eyed-distsem-saarbruecken... · Distributional semantics: A general-purpose representation

Interlude: “blind” semantics?

SIGHTED

BLIND

ABS_ENT

ABS_PROP

CONC_ENT

CONC_PROP

EVENT

PART

SPACE

TAXO

TIME

Cazzolli, Baroni, Lenci and Marotta in preparation12

Page 13: Distributional semantics with eyes - Enriching corpus ...marcobaroni.org/publications/lectures/eyed-distsem-saarbruecken... · Distributional semantics: A general-purpose representation

The distributional hypothesis, generalized

The meaning of a word is (can beapproximated by, derived from) the setof contexts in which it occurs //in///////texts

13

Page 14: Distributional semantics with eyes - Enriching corpus ...marcobaroni.org/publications/lectures/eyed-distsem-saarbruecken... · Distributional semantics: A general-purpose representation

Context in the 2010s

14

Page 15: Distributional semantics with eyes - Enriching corpus ...marcobaroni.org/publications/lectures/eyed-distsem-saarbruecken... · Distributional semantics: A general-purpose representation

Multimodal distributional semanticsusing textual and visual collocates

planet night

moon 10 22 22 0

sun 14 10 15 0

dog 0 4 0 20

15

Page 16: Distributional semantics with eyes - Enriching corpus ...marcobaroni.org/publications/lectures/eyed-distsem-saarbruecken... · Distributional semantics: A general-purpose representation

Bags of visual wordsSivic and Zisserman, 2003

3 3 0 0 2 3 1 116

Page 17: Distributional semantics with eyes - Enriching corpus ...marcobaroni.org/publications/lectures/eyed-distsem-saarbruecken... · Distributional semantics: A general-purpose representation

Determining the visual vocabulary

!

!

!

17

Page 18: Distributional semantics with eyes - Enriching corpus ...marcobaroni.org/publications/lectures/eyed-distsem-saarbruecken... · Distributional semantics: A general-purpose representation

Representing images as bags-of-visual-word vectors

!"#$%&'

!"#$ % # & #

18

Page 19: Distributional semantics with eyes - Enriching corpus ...marcobaroni.org/publications/lectures/eyed-distsem-saarbruecken... · Distributional semantics: A general-purpose representation

Associating bags-of-visual-word vectorsto word labels

!""#

!""# $ % & %

19

Page 20: Distributional semantics with eyes - Enriching corpus ...marcobaroni.org/publications/lectures/eyed-distsem-saarbruecken... · Distributional semantics: A general-purpose representation

Simple multimodal matricesBruni, G.B. Tran and Baroni 2011

!"#$%& !"'()

*%%( +, -. / 0

!1( +2 02 - .

$%3 +/ / . +

See also Feng and Lapata 2010, Leong and Mihalcea 2011 20

Page 21: Distributional semantics with eyes - Enriching corpus ...marcobaroni.org/publications/lectures/eyed-distsem-saarbruecken... · Distributional semantics: A general-purpose representation

The ESP labeled-image data setvon Ahn, 2003

framefacealienboldgreysmile

malldevilredpictureman

bedmotelwhitelampflowerbreakfastpillowsfruithotel 21

Page 22: Distributional semantics with eyes - Enriching corpus ...marcobaroni.org/publications/lectures/eyed-distsem-saarbruecken... · Distributional semantics: A general-purpose representation

Distribution of related concepts in text-vs. image-based spaceBLESS data set, 184 concrete concepts

Text

●●●

●●

●●

● ●

●●

COORD HYPER MERO ATTRI EVENT RAN.N RAN.J RAN.V

−2

−1

01

2

Image

●●

●●

●●

●●

●●

●●

COORD HYPER MERO ATTRI EVENT RAN.N RAN.J RAN.V

−2

−1

01

2

22

Page 23: Distributional semantics with eyes - Enriching corpus ...marcobaroni.org/publications/lectures/eyed-distsem-saarbruecken... · Distributional semantics: A general-purpose representation

Nearest attributes of BLESS concepts

concept text imagecabbage leafy whitecarrot fresh orangecherry ripe reddeer wild browndishwasher electric whitehat white oldhatchet sharp shortonion fresh whiteoven electric newplum juicy redsparrow wild littletanker heavy grey 23

Page 24: Distributional semantics with eyes - Enriching corpus ...marcobaroni.org/publications/lectures/eyed-distsem-saarbruecken... · Distributional semantics: A general-purpose representation

Multimodal fusionBruni, N.K. Tran and Baroni submitted

Latent multimodal smoothing(2)

normalize and concatenate

Textual fea-ture matrix

Visual fea-ture matrix

Text corpus Labeledimage data(1)

split blocks

Textualsmoothed matrix

Visualsmoothed matrix

Multimodal similarity estimation(3)

Textual features

Visual features

Featurecombination

Similarityestimate

Textual features

Visual features

Similarityestimate

Similarityestimate

Scorecombination

24

Page 25: Distributional semantics with eyes - Enriching corpus ...marcobaroni.org/publications/lectures/eyed-distsem-saarbruecken... · Distributional semantics: A general-purpose representation

Predicting human semantic relatedness judgments

Window 2 Window 20Model MEN WordSim MEN WordSimText 0.73 0.70 0.68 0.70Image 0.43 0.36 0.43 0.36SmoothedText 0.77 0.73 0.74 0.75FeatureCombine 0.78 0.72 0.76 0.75ScoreCombine 0.78 0.71 0.77 0.72

25

Page 26: Distributional semantics with eyes - Enriching corpus ...marcobaroni.org/publications/lectures/eyed-distsem-saarbruecken... · Distributional semantics: A general-purpose representation

Similar conceptsbest captured by Text vs. FeatureCombine

Text FeatureCombinedawn/dusk pet/puppysunrise/sunset candy/chocolatecanine/dog paw/petgrape/wine bicycle/bikefoliage/plant apple/cherryfoliage/petal copper/metalskyscraper/tall military/soldiercat/feline paws/whiskerspregnancy/pregnant stream/waterfallmisty/rain cheetah/lion

26

Page 27: Distributional semantics with eyes - Enriching corpus ...marcobaroni.org/publications/lectures/eyed-distsem-saarbruecken... · Distributional semantics: A general-purpose representation

Distributional semantics in TechnicolorBruni, Boleda, Baroni and N.K. Tran 2012

Experiment 1 find typical color of 52 concrete objects:cardboard is brown, coal is black, forest is green(typical colors assigned by two judges byconsensus)

Experiment 2 distinguish literal and non-literal usages of coloradjectives: blue uniform, blue shark, blue note(342 adjective-noun pairs, 227 literal, 115non-literal, as decided by two judges byconsensus)

27

Page 28: Distributional semantics with eyes - Enriching corpus ...marcobaroni.org/publications/lectures/eyed-distsem-saarbruecken... · Distributional semantics: A general-purpose representation

Experiment 1 resultsMedian rank of correct color and number of top matches

Model Exp 1TEXT30K 3 (11)

LAB128 1 (27)

SIFT40K 3 (15)

TEXT+LAB128 1 (27)

TEXT+SIFT40K 2 (17)

28

Page 29: Distributional semantics with eyes - Enriching corpus ...marcobaroni.org/publications/lectures/eyed-distsem-saarbruecken... · Distributional semantics: A general-purpose representation

Experiment 1 examples

word gold LAB SIFT TEXTbanana yellow yellow blue orange

cauliflower white green yellow orangecello brown brown black bluedeer brown green blue redfroth white brown black orange

gorilla black black red greygrass green green green greenpig pink pink brown brownsea blue blue blue grey

weed green green yellow purple

29

Page 30: Distributional semantics with eyes - Enriching corpus ...marcobaroni.org/publications/lectures/eyed-distsem-saarbruecken... · Distributional semantics: A general-purpose representation

Experiment 2 resultsAverage difference in normalized adj-noun cosines in literal vs. non-literal conditionswith t-test significance

Model Exp 1 Exp2TEXT30K 3 (11) .53***LAB128 1 (27) .25*SIFT40K 3 (15) .57***TEXT+LAB128 1 (27) .36***TEXT+SIFT40K 2 (17) .73***

30

Page 31: Distributional semantics with eyes - Enriching corpus ...marcobaroni.org/publications/lectures/eyed-distsem-saarbruecken... · Distributional semantics: A general-purpose representation

Experiment 2 color breakdown

L N

0.05

0.10

0.15

0.20

0.25

0.30

Vision: black

L N

0.0

0.1

0.2

0.3

0.4

0.5

Text: black

L N

0.10

0.15

0.20

0.25

0.30

0.35

Vision: blue

L N

0.0

0.1

0.2

0.3

Text: blue

L N

0.05

0.15

0.25

Vision: green

L N

0.00

0.04

0.08

0.12

Text: green

L N

0.05

0.10

0.15

0.20

0.25

0.30

Vision: red

L N

0.00

0.10

0.20

0.30

Text: red

L N

0.05

0.10

0.15

0.20

0.25

0.30

Vision: white

●●

L N

0.00

0.05

0.10

0.15

Text: white

black issue, culture, business blue note, shark, shieldgreen future, politics, energy red meat, belt, face

31

Page 32: Distributional semantics with eyes - Enriching corpus ...marcobaroni.org/publications/lectures/eyed-distsem-saarbruecken... · Distributional semantics: A general-purpose representation

What would you rather eat?Bergsma and Goebel 2011

I migas?

I zeolite?

I carillons?

I a ficus?

I a mamey?

I manioc?

32

Page 33: Distributional semantics with eyes - Enriching corpus ...marcobaroni.org/publications/lectures/eyed-distsem-saarbruecken... · Distributional semantics: A general-purpose representation

What would you rather eat?Bergsma and Goebel 2011

Figure 1: Which out-of-vocabulary nouns areplausible direct objects for the verb eat? Each rowcorresponds to a noun: 1. migas, 2. zeolite, 3.carillon, 4. ficus, 5. mamey and 6. manioc.

sponding classifier that scores noun arguments onthe basis of various textual features. We use thisdiscriminative framework to incorporate the visualinformation as new, visual features.Our experiments evaluate the ability of these

classifiers to correctly predict the selectional pref-erences of a small set of verbs. We evaluate twocases: 1) the case where the nouns are all as-sumed to be out-of-vocabulary, and the classifiersmust make predictions without any corpus-basedco-occurrence information, and 2) the case wherewe assume access to noun-verb co-occurrence in-formation derived from web-scale N-gram data.We show that visual features are useful for some

verbs, but not for others. For verbs taking abstractarguments without definitive visual features, theclassifier can often learn to disregard the visualdata. On the other hand, for verbs taking physi-cal arguments (such as food, animals, or people),the classifier can make accurate predictions usingthe nouns’ visual properties. In these cases, visualinformation remains useful even after incorporat-ing the web-scale statistics.

2 Visual Selectional Preference

Consider determining whether the nouns carillon,migas and mamey are plausible arguments for the

verb eat. Existing systems are unlikely to havesuch words in their training data, let alone infor-mation about their edibility. However, after in-specting a few images returned by a Google searchfor these words (Figure 1), a human might rea-sonably predict which words are edible. Humansmake this determination by observing both intrin-sic visual properties (pits, skins, rounded shapesand fruity colors) and extrinsic visual context (cir-cular plates, bowls, and other food-related tools)(Oliva and Torralba, 2007).We propose using similar information to pre-

dict the plausibility of arbitrary verb-noun pairs.That is, we aim to learn the distinguishing vi-sual features of all nouns that are plausible argu-ments for a given verb. This differs from workthat has aimed to recognize, annotate and retrieveobjects defined by a single phrase, such as tree orwrist watch (Feng and Lapata, 2010a). These ap-proaches learn from labeled images during train-ing in order to assign words to unlabeled imagesduring testing. In contrast, we analyze labeled im-ages (during training and testing) in order to deter-mine their visual compatibility with a given predi-cate. Our approach does not need labeled trainingimages for a specific noun in order to assess thatnoun during testing; e.g. we can make a reason-able prediction for the plausibility of eat mameyeven if we’ve never encountered mamey before.We now specify how we automatically 1) down-

load a set of images for each noun, 2) extract vi-sual features from each image, and 3) combine thevisual features from multiple images into plausi-bility scores. Scripts, code and data are availableat: www.clsp.jhu.edu/∼sbergsma/ImageSP/.

2.1 Mining noun images from the web

To obtain a set of images for a particular noun ar-gument, we submit the noun as a query to eitherthe Flickr photo-sharing website (www.flickr.com), or Google’s image search (www.google.com/imghp). In both cases, we download thethumbnails on the results page directly rather thandownloading the source images. Flickr returns im-ages by matching the query against user-providedtags and accompanying text. Google retrieves im-ages based on the image caption, file-name, andsurrounding text (Feng and Lapata, 2010a). Im-ages obtained from Google are known to be com-petitive with “hand prepared datasets” for trainingobject recognizers (Fergus et al., 2005).

400

33

Page 34: Distributional semantics with eyes - Enriching corpus ...marcobaroni.org/publications/lectures/eyed-distsem-saarbruecken... · Distributional semantics: A general-purpose representation

Coda: the illustrated distributional hypothesisBruni, Uijlings and Baroni rejected

The meaning of a visually depicted concept is (canbe approximated by, derived from) the set ofcontexts in which it occurs in images

34

Page 35: Distributional semantics with eyes - Enriching corpus ...marcobaroni.org/publications/lectures/eyed-distsem-saarbruecken... · Distributional semantics: A general-purpose representation

The illustrated distributional hypothesisSperman correlation of image-based models with semantic relatedness intuitionsfor 20 concrete Pascal concepts

Segmentation:Area No Manual AutomaticConcept NA 39 36Surround NA 50 51Concept+Surround 47 54 54

!"#!$%&

'())"(#*

35

Page 36: Distributional semantics with eyes - Enriching corpus ...marcobaroni.org/publications/lectures/eyed-distsem-saarbruecken... · Distributional semantics: A general-purpose representation

That’s all, folks!

Thank you!

http://clic.cimec.unitn.it/marco

36