Latent Semantic Analysis - WordPress.com · Latent Semantic Analysis a tutorial at CogSci'2005...

1 Latent Semantic Analysis a tutorial at CogSci'2005 Benoît Lemaire Laboratoire Leibniz-IMAG (CNRS UMR 5522) University of Grenoble France [email protected]


Latent Semantic Analysis

a tutorial at CogSci'2005

Benoît Lemaire

Laboratoire Leibniz-IMAG (CNRS UMR 5522)
University of Grenoble

[email protected]


Outline

● Intuitive introduction: representing the meaning of words from the analysis of huge corpora
● LSA technique
● Tests
● Cognitive models
● Limits
● Competing models

Intuitive introduction

[Figure: a corpus and the cloud of words drawn from it: plane, airport, fly, bee, pilot, truck, car, drive, wings, bird, train, travel, parks, restaurant, eat, apple, banana, vegetables, apples, fruit, tree, garden, flower, flowers, grow, bouquet, petals, flying, flew, nectar, honey.]

From words to texts

[Figure: the same word cloud, together with the sentence "The pilot parks the plane".]


Usages of LSA

[Figure: the corpus and the word cloud built from it.]

LSA as a tool

● Compare texts
● Control experiment material

[Figure: the corpus and the word cloud built from it.]

LSA as a cognitive model of semantic memory

Build lots of cognitive models:
● Text comprehension
● Interface navigation
● ...

[Figure: the corpus and the word cloud built from it.]

LSA as a cognitive model of word acquisition

[Figure: the corpus and the word cloud built from it.]

● Humans also construct the meaning of words from written material
● Does LSA account for this process?


Semantic information is drawn from raw texts

● Take a huge corpus
● Split it into paragraphs
● Determine the statistical context in which each word occurs

A first approach based on co-occurrence

● Define the meaning of a word from the co-occurring words
● Two words are similar if they occur in the same paragraphs

plane= (1,0,0,1,0,0,0,0,2,1,0,1,1,0,0,0,0,0,1,1)

airport=(1,1,0,0,0,0,0,0,1,1,0,2,1,0,0,0,0,0,0,1)

Does not work well (French, Perfetti)


A better approach

Two words are similar if they occur in the same paragraphs
becomes: two words are similar if they occur in similar paragraphs

Two paragraphs are similar if they contain common words
becomes: two paragraphs are similar if they contain similar words

A mutual recursion

● Mathematicians know how to solve such circular systems
● Suppose P paragraphs, W words
● Build M, a W x P matrix
    M[i,j] = number of occurrences of word i in paragraph j
● Each word is a P-dimensional vector
● Reduce the matrix to its best 300 dimensions
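The construction of M can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the function name `count_matrix`, the whitespace tokenization and the three toy paragraphs are all hypothetical and not part of the tutorial.

```python
from collections import Counter

def count_matrix(paragraphs):
    """Return (vocabulary, M) with M[i][j] = occurrences of word i in paragraph j."""
    # Naive tokenization (lowercase + whitespace split); an assumption for this sketch.
    tokenized = [p.lower().split() for p in paragraphs]
    vocabulary = sorted({w for tokens in tokenized for w in tokens})
    counts = [Counter(tokens) for tokens in tokenized]
    # One row per word, one column per paragraph (a W x P matrix).
    M = [[c[w] for c in counts] for w in vocabulary]
    return vocabulary, M

# Hypothetical toy corpus, one string per paragraph.
paragraphs = [
    "the pilot parks the plane",
    "the plane leaves the airport",
    "bees make honey",
]
vocabulary, M = count_matrix(paragraphs)
for word, row in zip(vocabulary, M):
    print(f"{word:8s} {row}")
```

Each row is the P-dimensional vector of one word before any dimension reduction.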


Example (Landauer & Dumais 1997)

Rows are words, columns are paragraphs §1 to §9:

           §1 §2 §3 §4 §5 §6 §7 §8 §9
Human       1  0  0  1  0  0  0  0  0
interface   1  0  1  0  0  0  0  0  0
computer    1  1  0  0  0  0  0  0  0
user        0  1  1  0  1  0  0  0  0
system      0  1  1  2  0  0  0  0  0
response    0  1  0  0  1  0  0  0  0
time        0  1  0  0  1  0  0  0  0
EPS         0  0  1  1  0  0  0  0  0
survey      0  1  0  0  0  0  0  0  1
trees       0  0  0  0  0  1  1  1  0
graph       0  0  0  0  0  0  1  1  1
minors      0  0  0  0  0  0  0  1  1

§1: Human machine interface for ABC computer applications
§2: A survey of user opinion of computer system response time
§3: The EPS user interface management system
§4: System and human system engineering testing of EPS
§5: Relation of user perceived response time to error measurement
§6: The generation of random, binary, ordered trees
§7: The intersection graph of paths in trees
§8: Graph minors IV: Widths of trees and well-quasi-ordering
§9: Graph minors: A survey

Matrix reduction

[Figure: the W x P matrix M (words by paragraphs) is written as a product of three matrices, the middle one being diagonal (zeros off the diagonal). Keeping only part of the diagonal and multiplying back gives the reduced words-by-paragraphs matrix M'.]

Example (Landauer & Dumais 1997)

Reduced matrix M' (2 dimensions kept), to be compared with the original counts:

           §1    §2    §3    §4    §5    §6    §7    §8    §9
Human      .16   .40   .38   .47   .18  -.05  -.12  -.16  -.09
interface  .14   .37   .33   .40   .16  -.03  -.07  -.10  -.04
computer   .15   .51   .36   .41   .24   .02   .06   .09   .12
user       .26   .84   .61   .70   .39   .03   .08   .12   .19
system     .45  1.23  1.05  1.27   .56  -.07  -.15  -.21  -.05
response   .16   .58   .38   .42   .28   .06   .13   .19   .22
time       .16   .58   .38   .42   .28   .06   .13   .19   .22
EPS        .22   .55   .51   .63   .24  -.07  -.14  -.20  -.11
survey     .10   .53   .23   .21   .27   .14   .31   .44   .42
trees     -.06   .23  -.14  -.27   .14   .24   .55   .77   .66
graph     -.06   .34  -.15  -.30   .20   .31   .69   .98   .85
minors    -.04   .25  -.10  -.21   .15   .22   .50   .71   .62

Original matrix: r(human,user) = -.38
Reduced matrix: r(human,user) = .94
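The reduction on this example can be replayed with NumPy. The sketch below assumes the reduction is a truncated singular value decomposition with 2 dimensions kept and that r is a Pearson correlation between word rows; the helper name `reduce_rank` is hypothetical.

```python
import numpy as np

words = ["human", "interface", "computer", "user", "system", "response",
         "time", "EPS", "survey", "trees", "graph", "minors"]

# Word-by-paragraph counts from the example (rows: words, columns: §1..§9).
M = np.array([
    [1, 0, 0, 1, 0, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 0, 0, 0, 0],
    [0, 1, 1, 2, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 1, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 0, 0, 0, 1, 1],
], dtype=float)

def reduce_rank(matrix, k):
    """Keep the k largest singular values and rebuild the matrix."""
    U, s, Vt = np.linalg.svd(matrix, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

M_reduced = reduce_rank(M, 2)

h, u = words.index("human"), words.index("user")
r_before = np.corrcoef(M[h], M[u])[0, 1]
r_after = np.corrcoef(M_reduced[h], M_reduced[u])[0, 1]
print(f"r(human,user) before: {r_before:.2f}, after: {r_after:.2f}")
```

"human" and "user" never share a paragraph, so their raw correlation is negative; after reduction the two rows become strongly correlated, which is the point of the example.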


Meaning as vectors

The meaning of each word is represented as a vector

disk = (2,4,6,-1,0,3,-4,8,-5,8,12,...,9,0,1,-9,4,1)
computer = (1,4,5,-1,0,2,-2,5,-6,8,11,...,8,0,0,-5,7,1)

[Figure: the words computer, disk, work, bike and eat drawn as vectors in a plane. For illustrative purposes only: this does not work in 2 dimensions!]

Comparison with symbolic approaches

[Figure: a semantic network with the nodes bike, bicycle, ride, wheels, handlebars, human-propelled vehicle and vehicle, linked by SYNONYM, FUNCTION, HAS and IS-A relations, shown next to a vector space containing bike, bicycle, ride, wheel, handlebars and fun.]

Symbolic approaches (semantic network):
● Representation is absolute
● Good for drawing inferences
● Not so good for comparisons
● Hard to compound nodes

Vector approaches (LSA):
● Representation is relative
● Good for semantic comparisons
● Easy to compound words

Semantic comparison

● Semantic similarity = cosine between vectors
● A value between -1 and 1
● Other measures:
    ● Euclidean distance
    ● Hellinger distance
    ● ...
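As a minimal sketch, the cosine and the Euclidean distance can be written in plain Python. The example vectors are hypothetical and only meant to show the range of values.

```python
import math

def cosine(u, v):
    """Cosine of the angle between u and v, a value between -1 and 1."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def euclidean(u, v):
    """Straight-line distance between the end points of u and v."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

print(cosine([1, 2], [2, 4]))    # same direction
print(cosine([1, 0], [0, 1]))    # orthogonal
print(cosine([1, 0], [-1, 0]))   # opposite direction
print(euclidean([1, 2], [2, 4]))
```

The first pair points in the same direction, so its cosine is maximal even though the Euclidean distance is not zero: the cosine ignores vector length, the distance does not.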


Word comparisons

A semantic space built from a 4.6 million word encyclopedia (lsa.colorado.edu)

Closest words to bicycle:
● pedals: .84
● handlebars: .79
● bicycles: .79
● pedaling: .75
● bike: .75

Other comparisons:
● fly / insect: .26
● fly / plane: .48
● insect / plane: .02

Existing semantic spaces

● English: encyclopedia, literature, psychology textbooks, scientific textbooks (available at lsa.colorado.edu)
● French: newspaper, literature, children's tales, children's productions (some of them are available at lsa.colorado.edu)

What is the right number of dimensions?

● Too few dimensions would not capture the latent semantic structure
● Too many dimensions would describe idiosyncratic structures
● An open issue
● 300 dimensions seems to be a good value for the whole language (Dumais found a maximum of performance at 90 dimensions for a specific domain)

From words to texts

Sum up word vectors to get text vectors

disk     = (2, 4, 6,-1, 0, 3,-4, 8, ..., 9, 0,-9, 4, 1)
computer = (1, 4, 5,-1, 0, 2,-2, 5, ..., 8, 0,-5, 7, 1)
kid      = (1, 0, 0,-9, 4, 5,-1, 2, ..., 0, 2, 3, 4, 5)
broken   = (5,-3, 8, 6, 5,-2, 2, 0, ..., 6, 1, 7, 1, 2)

S = The computer disk was broken by the kid

S        = (9, 5,19,-5, 9, 8,-5,15, ...,23, 3,-4,16, 9)
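The sum can be checked on the first eight components shown on the slide. In this sketch the function name `text_vector` is hypothetical, and words without a vector ("the", "was", "by") are simply skipped, which is an assumption matching the four vectors listed above.

```python
# First eight components of the word vectors given on the slide.
word_vectors = {
    "disk":     [2, 4, 6, -1, 0, 3, -4, 8],
    "computer": [1, 4, 5, -1, 0, 2, -2, 5],
    "kid":      [1, 0, 0, -9, 4, 5, -1, 2],
    "broken":   [5, -3, 8, 6, 5, -2, 2, 0],
}

def text_vector(text, vectors):
    """Sum, component by component, the vectors of the known words of a text."""
    tokens = text.lower().split()
    known = [vectors[t] for t in tokens if t in vectors]
    return [sum(components) for components in zip(*known)]

S = text_vector("The computer disk was broken by the kid", word_vectors)
print(S)
```

Because the sum ignores word order, any permutation of the sentence gets the same vector.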


Text comparisons

A semantic space built from a 4.6 million word encyclopedia (lsa.colorado.edu)

● The cat was lost in a forest / My little feline disappeared in the trees: .66
● The radius of spheres / a circle's diameter: .55 (Landauer)
● The radius of spheres / the music of spheres: .01 (Landauer)

No syntax processing

● Paragraphs are bags of words
● Semantics is determined from the lexical level

[Figure: three levels: lexical level, syntax level, semantic level.]

Birds build nests
Nests build birds

● From both sentences, LSA learns the same thing: this is about birds, nests and a building process
● The unit of analysis is the paragraph

Models based on LSA

● Semantic representation
● Vocabulary acquisition
● Text comprehension
● Free text assessment

Semantic representation

● Comparison of words
    ● TOEFL test
    ● Vocabulary test for children
● Comparison of sentences
    ● Measure of coherence
● Comparison of texts

Comparison of words

Synonymy part of the TOEFL test (Landauer & Dumais, 1997)

● 80 items: 1 word / 4 alternative words
● Guess which one is the closest
● Example: levied
    ● imposed
    ● believed
    ● requested
    ● correlated
● LSA was trained on a 4.6 million word corpus
● LSA score: 64.4%
● Foreign applicants to US colleges: 64.5%


Vocabulary test: LSA vs children

(Denhière & Lemaire, 2004)

● 115 items
● 4 definitions per item: correct, close, distant, unrelated
● Subjects were in grades 2 to 4
● Example (translated from French), slope:
    ● rising road
    ● tilted surface which goes up or down
    ● small piece of ground
    ● face of a rock or a mountain

A children's semantic space

● Stories and tales for children: ~ 1,600,000 words
● Children productions: ~ 800,000 words
● Textbooks: ~ 400,000 words
● Encyclopedia: ~ 400,000 words
● Dictionary: ~ 50,000 words

TOTAL: 3.2 million words

Vocabulary test: results

[Figure: bar chart, "Children data". Categories: Correct, Close, Far, Unrelated; vertical axis from 0 to 70; series: 2nd grade, 3rd grade, 4th grade.]

[Figure: same chart, "Children+Model data", with an added LSA series.]

99 most frequent words

[Figure: same chart, "Children+Model data", with model series for the 115 words and for the 99 words.]

Vocabulary test: word lemmatization

[Figure: same chart, "Children+Model data", with an added "lemm." series.]

Vocabulary test: verb lemmatization

[Figure: same chart, "Children+Model data", with an added "verb lemm." series.]

Comparison of sentences

● Use LSA to measure the coherence of a text

● Compute the semantic similarities between adjacent sentences

$$\mathrm{coherence}(T) = \frac{\sum_{i=1}^{n-1} \mathrm{cosine}(\mathrm{sentence}_i, \mathrm{sentence}_{i+1})}{n-1}$$
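The coherence measure is the mean cosine over the n-1 pairs of adjacent sentences. A minimal sketch with NumPy follows; the sentence vectors are assumed to have been computed already (for instance by summing word vectors), and the three 2-dimensional vectors are hypothetical.

```python
import numpy as np

def cosine(u, v):
    """Cosine between two vectors."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def coherence(sentence_vectors):
    """Mean cosine between each sentence vector and the next one."""
    n = len(sentence_vectors)
    if n < 2:
        raise ValueError("coherence needs at least two sentences")
    pairs = zip(sentence_vectors, sentence_vectors[1:])
    return sum(cosine(a, b) for a, b in pairs) / (n - 1)

# Hypothetical sentence vectors: the third sentence changes topic.
text = [[1, 0], [1, 0], [0, 1]]
print(coherence(text))
```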


Foltz et al. (1998) experiment

● McNamara et al. (1996) data
● 4 versions of a text about the heart:
    ● Low local coherence, low macro coherence (lm)
    ● Low local coherence, high macro coherence (lM)
    ● High local coherence, low macro coherence (Lm)
    ● High local coherence, high macro coherence (LM)
● Corpus: 21 texts, 100 dimensions

Foltz et al. (1998) experiment: results

● Coherence(lm) = 0.177
● Coherence(lM) = 0.205
● Coherence(Lm) = 0.210
● Coherence(LM) = 0.242

[Figure: comprehension plotted against coherence. LSA: r² = 0.853; word overlap: r² = 0.098.]

Can LSA assess text recall?

Measure M1: cosine between source text and texts recalled

Measure M2: number of propositions recalled

Text       Recall      Nb subjects                 Correlation M1/M2
Poule      Immediate   n = 48 (8.3 years old)      0.49
Dragon     Immediate   n = 48 (8.3 years old)      0.64
Dragon     Delayed     n = 48 (8.3 years old)      0.7
Araignée   Immediate   n = 57 (16-20 years old)    0.81
Clown      Immediate   n = 57 (16-20 years old)    0.65
Géant      Immediate   n = 130 (16-20 years old)   0.6
Ourson     Immediate   n = 44 (16-20 years old)    0.48
Clown      Summary     n = 44 (16-20 years old)    0.86

Comparison of texts: Foltz et al. (1996)

● Compare texts and their summaries
● Britt et al. (1994) data
● 24 participants read 21 texts
● Participants write an essay
● Corpus: 21 texts + 8 encyclopedia texts + 2 book excerpts
● Assessing the essays:
    ● Score 1 = average similarity between each sentence and the closest text sentence
    ● Score 2 = same, but compared with the 10 main text sentences
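Score 1 can be sketched as follows, assuming that "similarity" is the cosine and that every sentence has already been turned into a vector. The function name `essay_score` and the toy vectors are hypothetical; Score 2 would be the same call with only the vectors of the 10 main text sentences.

```python
import numpy as np

def cosine(u, v):
    """Cosine between two vectors."""
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def essay_score(essay_sentences, text_sentences):
    """Average, over essay sentences, of the similarity to the closest text sentence."""
    best = [max(cosine(e, t) for t in text_sentences) for e in essay_sentences]
    return sum(best) / len(best)

# Hypothetical vectors: one essay sentence matches the text, the other does not.
essay = [[1, 0], [0, 1]]
source = [[1, 0], [2, 0]]
print(essay_score(essay, source))
```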


Comparison of texts: Foltz et al. (1996)

         Judge2   Judge3   Judge4   Score1   Score2
Judge1   .575**   .768**   .381     .418*    .589*
Judge2            .582**   .412*    .552**   .626**
Judge3                     .367     .317     .384+
Judge4                              .117     .240

** p<.01   * p<.05   + p<.06


A model of learning

● LSA learns the meaning of words from texts
● Does it mimic the human rate of learning?
● Landauer & Dumais (1997) experiment:
    ● Most of the lexicon is acquired from text
    ● Children data:
        ● 3500 words read / day
        ● 50 new words encountered / day
        ● 10 words learned / day
        ● Controlled experiments lead to 2.5 words learned / day

Direct and indirect effects

● How do children acquire the remaining 7.5 words / day?
● The poverty of the stimulus problem: how do children acquire so much knowledge on the basis of so little information?
● Hypothesis: a word is learned...
    ● From texts containing it (direct effect)
    ● From texts not containing it (indirect effect)

LSA simulation

The TOEFL test score is a function of:
● T: total number of paragraphs
● S: number of paragraphs containing the stem word

$$\mathrm{score} = 0.128 \, \log(0.076\,T) \, \log(31.910\,S)$$

● Correlation with observed score = .98
● After "reading" 25,000 paragraphs, what is the effect of the 25,001st?
    ● When the stem word belongs to it?
    ● When the stem word does not belong to it?

Example

Suppose a word has appeared twice in the 25,000 paragraphs:
Score = 0.128 × log(0.076 × 25000) × log(31.91 × 2) = .75809

Direct effect: the word belongs to the 25,001st paragraph:
Score = 0.128 × log(0.076 × 25001) × log(31.91 × 3) = .83202

Indirect effect: the word does not belong to the 25,001st paragraph:
Score = 0.128 × log(0.076 × 25001) × log(31.91 × 2) = .75810
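These three values can be recomputed with a short script. The slide does not state the base of the logarithm; the sketch assumes base 10, which gives values within about 0.001 of those above (the remaining gap presumably comes from the rounding of the printed coefficients). The function name `toefl_score` is hypothetical.

```python
import math

def toefl_score(T, S):
    """Score as a function of T (total paragraphs) and S (paragraphs with the stem word).

    Assumption: logarithms are taken in base 10.
    """
    return 0.128 * math.log10(0.076 * T) * math.log10(31.910 * S)

baseline = toefl_score(25000, 2)
direct_gain = toefl_score(25001, 3) - baseline     # new paragraph contains the word
indirect_gain = toefl_score(25001, 2) - baseline   # new paragraph does not contain it

print(f"baseline      {baseline:.5f}")
print(f"direct gain   {direct_gain:.5f}")
print(f"indirect gain {indirect_gain:.5f}")
```

For a single paragraph the indirect gain is tiny compared with the direct one; it only matters because it applies to every word of the lexicon at once.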


LSA simulation

● Effects are cumulated over all word-frequency bands
● Direct effect:
    ● .0007 words gained per word encountered
    ● .0007 × 3500 ≈ 2.5 words acquired / day
● Indirect effect:
    ● .15 words gained per paragraph encountered
    ● .15 × 50 = 7.5 words acquired / day
● The meaning of a word would be acquired:
    ● for 25% by reading texts containing it
    ● for 75% by reading texts not containing it

Direct and indirect effects (Lemaire et al., to appear)

[Figure: acheter (to buy) / magasin (store). Cosine acheter/magasin (vertical axis, from -0.1 to 0.6) as a function of the number of paragraphs (horizontal axis, from 2,000 to 13,520).]

Cosine acheter/magasin

Total gain: .58

Gain due to the occurrence of acheter: -.10

Gain due to the occurrence of magasin: -.19

Gain due to the co-occurrence: .73

Gain due to 2nd order co-occurrences: .11

Gain due to 3rd-order and higher co-occurrences: .03

Same measures on 28 words

Total gain: .13

Gain due to the occurrence of W1: -.16

Gain due to the occurrence of W2: -.19

Gain due to the co-occurrence: .34

Gain due to 2nd order co-occurrences: .05

Gain due to 3rd and more co-occurrences: .09
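
As a quick consistency check, the components reported for acheter/magasin and for the 28-word average should add up to the reported total gains. The sketch below only sums the values from the two slides above; the dictionary keys and function names are illustrative labels.

```python
# Gain components copied from the two slides above.
acheter_magasin = {"occurrence of W1": -0.10, "occurrence of W2": -0.19,
                   "co-occurrence": 0.73, "2nd order": 0.11, "3rd and more": 0.03}
average_28_words = {"occurrence of W1": -0.16, "occurrence of W2": -0.19,
                    "co-occurrence": 0.34, "2nd order": 0.05, "3rd and more": 0.09}

def total_gain(components):
    """Sum the components, rounded to the two decimals used on the slides."""
    return round(sum(components.values()), 2)

def higher_order_gain(components):
    """Part of the gain that comes from 2nd-order and higher co-occurrences."""
    return round(components["2nd order"] + components["3rd and more"], 2)

print(total_gain(acheter_magasin))     # compare with the reported total of .58
print(total_gain(average_28_words))    # compare with the reported total of .13
```

In both cases the occurrence of either word alone lowers the cosine, while direct and higher-order co-occurrences raise it.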

Models based on LSA

Semantic representation
Vocabulary acquisition
Text comprehension
Free text assessment

Text comprehension

Cognitive models can be simulated on computers, using LSA as a model of semantic memory:

    the construction-integration model (Kintsch, 1998)
    a predication algorithm
    metaphor comprehension

A computational model of text comprehension (Lemaire et al., to appear)

[Diagram. Components: semantic memory (LSA), working memory, episodic memory. Processes: next sentence, select associates, retrieve previous elements, integration, decay.]

Example (translated from French)

A woodcutter was walking in the forest

Select associates:
    walk: stroll, meet, pick
    woodcutter: ax, forest, cottage
    forest: glade, oak, wood
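
A minimal sketch of the "select associates" step, assuming associates are simply the nearest neighbors of a word by cosine in the semantic space. The 3-dimensional vectors are invented for illustration (real LSA vectors typically have a few hundred dimensions), and `select_associates` is a hypothetical helper, not the model's actual code.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two vectors (0.0 if either is null)."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms if norms else 0.0

def select_associates(word, space, k=3):
    """Return the k words whose vectors are closest to `word` by cosine."""
    target = space[word]
    scored = [(cosine(target, vec), other) for other, vec in space.items() if other != word]
    return [other for _, other in sorted(scored, reverse=True)[:k]]

# Hypothetical vectors, invented for this example only.
toy_space = {
    "forest": [0.9, 0.1, 0.0],
    "glade":  [0.8, 0.2, 0.1],
    "oak":    [0.85, 0.05, 0.1],
    "wood":   [0.7, 0.3, 0.0],
    "walk":   [0.1, 0.9, 0.1],
    "stroll": [0.2, 0.9, 0.0],
}

print(select_associates("forest", toy_space))
```

With these toy vectors the three associates of forest are glade, oak and wood, mirroring the slide; the ranking among them depends entirely on the invented numbers.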

[Diagram: network linking the text words walk, woodcutter and forest to their associates: stroll, meet, pick, ax, cottage, glade, oak, wood.]

The construction-integration model

Before LSA, the construction step was performed by hand
Link weights were set approximately
With LSA, all nodes are vectors, so the weight of a link is the cosine between the corresponding node vectors
A question remains: what is the vector of P(A1,…An)?

The predication algorithm (Kintsch, 2001)

The centroid rule: P(A) = P + A

…does not work well
The meaning of the predicate depends on the meaning of the arguments
    The horse ran / The color ran
    shark(lawyer) ≠ shark + lawyer

A better rule: P(A) = P + A + M1 + … + Mn

The Mi are vectors that are close to both P and A
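
The two rules can be contrasted in a short sketch. It assumes a simplified way of choosing the Mi: take the m nearest neighbors of the predicate, then keep the k of them closest to the argument. Kintsch (2001) selects them through the integration step of a construction-integration network; a plain cosine ranking is swapped in here for brevity. The toy vectors and the function names are invented for illustration.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms if norms else 0.0

def vector_sum(*vectors):
    """Component-wise sum of any number of vectors."""
    return [sum(parts) for parts in zip(*vectors)]

def centroid_rule(space, predicate, argument):
    """P(A) = P + A"""
    return vector_sum(space[predicate], space[argument])

def predication_rule(space, predicate, argument, m=3, k=1):
    """P(A) = P + A + M1 + ... + Mk, the Mi being close to both P and A."""
    p, a = space[predicate], space[argument]
    candidates = [w for w in space if w not in (predicate, argument)]
    # m nearest neighbors of the predicate
    neighbors = sorted(candidates, key=lambda w: cosine(space[w], p), reverse=True)[:m]
    # among them, the k most similar to the argument (simplified selection)
    selected = sorted(neighbors, key=lambda w: cosine(space[w], a), reverse=True)[:k]
    return vector_sum(p, a, *(space[w] for w in selected)), selected

# Hypothetical 3-dimensional vectors, invented for this example only.
toy_space = {
    "ran":     [1.0, 0.3, 0.3],
    "horse":   [0.3, 1.0, 0.0],
    "color":   [0.1, 0.0, 1.0],
    "stopped": [0.9, 0.4, 0.1],
    "down":    [0.8, 0.0, 0.5],
    "hopped":  [0.8, 0.3, 0.0],
}

_, chosen_for_horse = predication_rule(toy_space, "ran", "horse")
_, chosen_for_color = predication_rule(toy_space, "ran", "color")
print(chosen_for_horse, chosen_for_color)
```

The same predicate picks up a different neighbor depending on its argument, which is the point of the rule: the vector for ran(horse) and the vector for ran(color) are no longer built from the same material.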

Example 1: construction step (Kintsch, 2001)

The horse ran → ran(horse)

[Diagram: network linking horse to ran and to neighbors of ran (stopped, down, hopped); values shown: .21, .18, .12.]

Example 1: integration step

The horse ran → ran(horse)

[Diagram: same network after integration (ran, stopped, down, hopped, horse); values shown: .46, .19, .00.]

The horse ran = horse + ran + stopped

Example 2: construction step

The color ran → ran(color)

[Diagram: network linking color to ran and to neighbors of ran (stopped, down, hopped); values shown: .06, .11, .02.]

Example 2: integration step

The color ran → ran(color)

[Diagram: same network after integration (ran, stopped, down, hopped, color); values shown: .00, .70, .00.]

The color ran = color + ran + down

Predication algorithm: test (Kintsch, 2001)

Landmarks    ran    horse ran    color ran
gallop       .33    .75          .29
dissolve     .01    .01          .11

Inferences (Kintsch, 2001)

The hunter shot the elk
               The hunter was dead    The elk was dead
Centroid       .66                    .54
Predication    .73                    .70

The student dropped the glass
               The student was broken    The glass was broken
Centroid       .76                       .66
Predication    .87                       .91

The student washed the table
               The student was clean    The table was clean
Centroid       .70                      .71
Predication    .62                      .83

Metaphors

My lawyer is a shark → shark(lawyer)

[Diagram: the vectors lawyer, shark and My lawyer is a shark.]

The meaning of "My lawyer is a shark" is not a simple combination of the meaning of lawyer and the meaning of shark

Metaphors

Look for words that are similar to shark but also close to lawyer

[Diagram: shark and its neighbors (fin, teeth, sea, sharks, fish, vicious, aggressive), shown together with lawyer.]

Metaphors

[Diagram: the vectors lawyer and shark, the neighbors vicious and aggressive, and the vector for My lawyer is a shark.]

Metaphors

Many more neighbors of the predicate need to be considered
According to Kintsch, predication fails if the number of neighbors is less than 100!

An incremental algorithm (Lemaire & Bianco, 2003)

Does not work on a predefined set of neighbors but looks for them progressively

This kid is intelligent
    1... 2... 3 clever 4... 5 brilliant 6... 7... 8... 9 sage

This kid is a scientist
    1... 2... 3... 97 invention ... 128 knowledge ... 138 wonderful
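
A minimal sketch of the progressive search, assuming the neighbors of the predicate arrive sorted by decreasing similarity and that the search stops once a fixed number of them are judged close enough to the argument. The stopping count of 3 and the relevance test are assumptions, and the placeholder names n1, n2, ... stand for neighbors that the slide does not name.

```python
def incremental_search(ranked_neighbors, is_close_to_argument, needed=3):
    """Scan neighbors in rank order and stop once `needed` relevant ones are found.

    Returns the selected words and the number of neighbors examined.
    """
    selected = []
    examined = 0
    for word in ranked_neighbors:
        examined += 1
        if is_close_to_argument(word):
            selected.append(word)
            if len(selected) == needed:
                break
    return selected, examined

# Ranks 3, 5 and 9 follow the "This kid is intelligent" example on the slide;
# the other entries are placeholders for neighbors the slide leaves out.
neighbors_of_intelligent = ["n1", "n2", "clever", "n4", "brilliant",
                            "n6", "n7", "n8", "sage", "n10"]
relevant = {"clever", "brilliant", "sage"}

selected, examined = incremental_search(neighbors_of_intelligent, relevant.__contains__)
print(selected, examined)
```

The number of neighbors examined (9 here, against 138 for "This kid is a scientist" on the slide) is the quantity that the simulation compares with subjects' processing times.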

Experiment

Does the model account for the processing time of various kinds of metaphors?
12 texts, 4 conditions:
    Synonym (child / kid)
    Conventional metaphor (child / scientist)
    Original metaphor (child / sage)
    Bad metaphor (child / colonel)

Subjects (N=49)

[Figure: additional time (ms) for subjects, by type of metaphor (synonym, conventional, unconventional, bad); y-axis from -200 to 350 ms.]

Subjects and model

Correlation: .62

[Figure, two panels by type of metaphor (synonym, conventional, unconventional, bad). Simulation: number of neighbors (0 to 300). Subjects: additional time in ms (-200 to 350).]

Models based on LSA

Semantic representation
Vocabulary acquisition
Text comprehension
Free text assessment

Free text assessment

Assessing the content of a student essay
Relies on a domain-specific corpus
Compares the student essay with a reference text:
    Pre-graded essays
    Text to be summarized
    Portions of a course text

Trusso Haley et al. (2005) published a good state-of-the-art report (Tech. Rep., Open University)

Intelligent Essay Assessor

Foltz et al. (1999)

Holistic score:
    Compare student essay with pre-graded essays
    Give the grade of the closest one

Gold standard score:
    Compare student essay with a teacher essay
    Give the grade according to the semantic similarity

Correlation with teacher's grades around .8
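
The two scoring modes can be sketched as follows, assuming each essay has already been turned into a vector in the semantic space. The essay vectors and grades are invented, `holistic_score` and `gold_standard_similarity` are hypothetical helpers, and how a similarity is converted into a grade in the gold standard mode is not specified on the slide, so the sketch stops at the cosine.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms if norms else 0.0

def holistic_score(essay, graded_essays):
    """Return the grade of the pre-graded essay most similar to `essay`."""
    _, grade = max(graded_essays, key=lambda pair: cosine(essay, pair[0]))
    return grade

def gold_standard_similarity(essay, teacher_essay):
    """Similarity to the teacher essay; turning it into a grade is left open."""
    return cosine(essay, teacher_essay)

# Hypothetical (vector, grade) pairs and a hypothetical student essay vector.
graded = [([0.9, 0.1, 0.2], 16), ([0.2, 0.8, 0.1], 9), ([0.5, 0.5, 0.5], 12)]
student = [0.8, 0.2, 0.3]

print(holistic_score(student, graded))
print(round(gold_standard_similarity(student, [0.9, 0.1, 0.2]), 2))
```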

Summary Street

Wade-Stein & E. Kintsch (2003)
Helps the student to summarize a text
Indicates the parts of the text that are well or not well summarized
An experiment with 60 6th-grade students has shown a positive effect

Apex

Dessus & Lemaire (1999, 2001, 2002)
Compares the student essay with relevant portions of the course
The course is structured:

#T The first chapter

#N In the first part, we will study the way...

#N Now, I will present a method...

...

#T The second chapter

#N ...
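
A minimal sketch of reading such a structured course, assuming `#T` opens a titled section and `#N` introduces a unit of content under the current title. The slide does not define the markers, so this reading, like the `parse_course` helper, is an assumption.

```python
def parse_course(text):
    """Group '#N' units under the most recent '#T' title."""
    sections = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line.startswith("#T"):
            sections.append({"title": line[2:].strip(), "units": []})
        elif line.startswith("#N") and sections:
            sections[-1]["units"].append(line[2:].strip())
        # any other line (blank lines, elisions) is ignored
    return sections

sample = """#T The first chapter
#N In the first part, we will study the way...
#N Now, I will present a method...
...
#T The second chapter
#N ...
"""

course = parse_course(sample)
for section in course:
    print(section["title"], len(section["units"]))
```

Once the course is split this way, each unit can be compared separately with the student essay, which is what allows feedback on the portions of the course that the essay covers.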

Apex

Correlation with teacher's grades: .62
Not only for grading but for engaging students in a revising process
Can be used at a distance

Strengths and limits

LSA strengths
    fully automatic
    vectorial representation allows:
        easy representation of sequences of words
        easy comparisons between words or texts
    a cognitive model of learning and representation

LSA weaknesses
    no syntax processing
    representations are not explicit
    not incremental
    paragraphs are bags of words

Competing models

Cognitive models of semantic memory could mimic:

    the human semantic associations, or
    the human construction of these associations over a long period of time

Semantic networks are a good model of semantic associations BUT... they do not account for the construction of these associations

Features

Input (corpus, word association norms)
Knowledge representation (vector-based, network-based)
Addition of a new text (incrementally or not)
Unit of context (paragraph, sliding window)
Use of high-order co-occurrences (yes/no)
Compositionality (representing the meaning of a text from the meaning of its words)

LSA

Input: corpus
Knowledge representation: vector-based
Addition of a new text: not incremental
Unit of context: paragraph
Use of high-order co-occurrences: yes
Compositionality: easy

HAL (Burgess, 1998)

Word co-occurrence matrix built from an N-word sliding window
vector = concatenation of row and column
barn = <0,2,4,3,6,0,5,0,0,0>
Measure of similarity = Euclidean distance
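
A minimal sketch of how such vectors could be built. The distance weighting (a word at distance d inside the window adds window - d + 1) follows the usual description of HAL but is not stated on the slide. The example sentence and the window size of 5 are assumptions, chosen because they reproduce the nonzero entries of the barn vector above (2, 4, 3, 6 in the row part and 5 in the column part); the function names are illustrative.

```python
import math
from collections import defaultdict

def hal_matrix(tokens, window=5):
    """matrix[w][c] accumulates weights for c appearing before w within the window."""
    matrix = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(tokens):
        for distance in range(1, window + 1):
            j = i - distance
            if j < 0:
                break
            matrix[word][tokens[j]] += window - distance + 1   # closer words weigh more
    return matrix

def hal_vector(matrix, word, vocabulary):
    """Concatenate the row of `word` (preceding words) and its column (following words)."""
    row = [matrix[word][context] for context in vocabulary]
    column = [matrix[other][word] for other in vocabulary]
    return row + column

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

tokens = "the horse raced past the barn fell".split()
vocabulary = sorted(set(tokens))      # barn, fell, horse, past, raced, the
matrix = hal_matrix(tokens)
barn = hal_vector(matrix, "barn", vocabulary)
fell = hal_vector(matrix, "fell", vocabulary)

print(barn)
print(round(euclidean(barn, fell), 2))
```

Because the vocabulary here has six words, the concatenated vector has twelve entries rather than the ten shown on the slide; only the nonzero values are meant to match.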

HAL (Burgess, 1998)

Input: corpus
Knowledge representation: vector-based
Addition of a new text: incremental
Unit of context: sliding window
Use of high-order co-occurrences: no
Compositionality: easy

Word association space (Steyvers et al., in press)

Based on association norms for 5,000 words
Word/word matrix scaled to 200-500 dimensions

Word association space (Steyvers et al., in press)

Input: association norms
Knowledge representation: vector-based
Addition of a new text: not incremental
Unit of context: N/A
Use of high-order co-occurrences: no
Compositionality: easy

PMI-IR (Turney, 2001)

Based on Mutual Information (Church & Hanks, 1989)

compares the probability of two words occurring together with the probability of the words occurring separately

MI(A,B) = log2( p(A,B) / (p(A) · p(B)) )

TOEFL test: 1 problem word / 4 choices

score1(choice) = hits(problem AND choice) / hits(choice)

score2(choice) = hits(problem NEAR choice) / hits(choice)
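
A minimal sketch of the mutual information formula and of the first score. The hit counts are invented and the choices are given placeholder names; in PMI-IR they would come from search-engine queries, which are not reproduced here.

```python
import math

def mutual_information(p_ab, p_a, p_b):
    """MI(A,B) = log2( p(A,B) / (p(A) * p(B)) )"""
    return math.log2(p_ab / (p_a * p_b))

def score1(hits_problem_and_choice, hits_choice):
    """score1(choice) = hits(problem AND choice) / hits(choice)"""
    return hits_problem_and_choice / hits_choice

def best_choice(joint_hits, choice_hits):
    """Return the choice with the highest score1."""
    return max(choice_hits, key=lambda c: score1(joint_hits[c], choice_hits[c]))

# Invented hit counts for one problem word and its four choices.
choice_hits = {"choice_a": 120000, "choice_b": 800000, "choice_c": 50000, "choice_d": 300000}
joint_hits = {"choice_a": 2400, "choice_b": 4000, "choice_c": 250, "choice_d": 900}

print(best_choice(joint_hits, choice_hits))
print(mutual_information(0.5, 0.5, 0.5))   # words that always co-occur: 1 bit
```

Since p(problem) is the same for all four choices and the logarithm is monotonic, ranking the choices by score1 gives the same order as ranking them by MI, which is why the scores divide by hits(choice) only.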

...

PMI-IR (Turney, 2001)

Input: web
Knowledge representation: N/A
Addition of a new text: incremental
Unit of context: web page
Use of high-order co-occurrences: no
Compositionality: hard

ICAN (Lemaire & Denhière, 2004)

Network-based representation
    better for quickly retrieving neighbors
    better for representing asymmetrical relations

[Diagram: weighted network around cable, with nodes electric, connector, network, rope, wire and plug; link weights range from .2 to .7.]

ICAN

... if you have such a [device, connect the cable to the network connector] then switch ...

Create or reinforce co-occurrence links
Create or slightly reinforce 2nd-order co-occurrence links
Slightly decrease other links

[Diagram: the network around cable before and after processing the window. After the first step, some weights are reinforced (.4 to .7, .6 to .8) and links involving device and connect appear with weight .5; after the second step, a new link appears with weight .3; after the third step, some other weights decrease (.7 to .6, .5 to .4, .3 to .2).]
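
The three update steps can be sketched as follows. Everything numeric here is an assumption: the update formula (move a weight toward 1 by a fraction of the remaining gap), the rates of 0.5 and 0.3, and the subtractive decay of 0.1 were chosen only because they are consistent with the weight changes visible in the slide's before/after diagram. Links are undirected for brevity, although ICAN is presented as able to represent asymmetrical relations, and function words are assumed to have been filtered out of the window.

```python
from itertools import combinations

def reinforce(weights, a, b, rate):
    """Move the weight of link a-b toward 1 by a fraction `rate` of the remaining gap."""
    link = frozenset((a, b))
    current = weights.get(link, 0.0)
    weights[link] = current + rate * (1.0 - current)
    return link

def process_window(weights, window, rate=0.5, second_order_rate=0.3, decay=0.1):
    words = sorted(set(window))
    before = dict(weights)            # links as they were before this window
    touched = set()

    # 1. Create or reinforce co-occurrence links.
    for a, b in combinations(words, 2):
        touched.add(reinforce(weights, a, b, rate))

    # 2. Create or slightly reinforce 2nd-order links: a co-occurs with b,
    #    and b was already linked to n, a word outside the window.
    for link in before:
        inside = [w for w in link if w in words]
        outside = [w for w in link if w not in words]
        if len(inside) == 1 and len(outside) == 1:
            b, n = inside[0], outside[0]
            for a in words:
                if a != b and frozenset((a, n)) not in touched:
                    touched.add(reinforce(weights, a, n, second_order_rate))

    # 3. Slightly decrease the other links attached to the window words.
    for link in before:
        if link not in touched and any(w in words for w in link):
            weights[link] = max(0.0, weights[link] - decay)
    return weights

# Hypothetical starting network.
network = {
    frozenset(("cable", "connector")): 0.4,
    frozenset(("cable", "wire")): 0.7,
    frozenset(("connector", "plug")): 0.6,
}
process_window(network, ["device", "connect", "cable", "network", "connector"])
print(round(network[frozenset(("cable", "connector"))], 2))
print(round(network[frozenset(("cable", "wire"))], 2))
```

Because each window only touches the links of the words it contains, a new text can be added without recomputing the whole representation, which is the incremental property listed for ICAN.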

ICAN

3.2 million word corpus of texts for children
Test on 1200 pairs of words (from association norms)
Average weights:
    stem/1st associate: .415
    stem/2nd associate: .269
    stem/3rd associate: .236
    stem/last associates: .098

Correlation with human data: .50

ICAN

Input: corpus
Knowledge representation: network-based
Addition of a new text: incremental
Unit of context: sliding window
Use of high-order co-occurrences: yes
Compositionality: hard

Web pages

Tests: lsa.colorado.edu

References: www-leibniz.imag.fr/~blemaire/lsa.html