
Page 1

Latent Semantic Indexing (LSI)
CE-324: Modern Information Retrieval
Sharif University of Technology

M. Soleymani

Fall 2016

Most slides have been adapted from: Profs. Manning, Nayak & Raghavan (CS-276, Stanford)

Page 2

Vector space model: pros

Partial matching of queries and docs

dealing with the case where no doc contains all search terms

Ranking according to similarity score

Term weighting schemes

improves retrieval performance

Various extensions

Relevance feedback (modifying query vector)

Doc clustering and classification

Page 3

Problems with lexical semantics

Ambiguity and association in natural language

Polysemy: Words often have a multitude of meanings and different types of usage.
More severe in very heterogeneous collections.

The vector space model is unable to discriminate between different meanings of the same word.

Page 4

Problems with lexical semantics

Synonymy: Different terms may have identical or similar meanings (weaker: words indicating the same topic).

No associations between words are made in the vector space representation.

Page 5

Polysemy and context

Doc similarity on single word level: polysemy and context

[Figure: two senses of the ambiguous word “saturn” — meaning 1 (planet: jupiter, ring, space, voyager, planet, ...) vs. meaning 2 (car: car, company, dodge, ford, ...). The word contributes to similarity if used in the 1st meaning in both docs, but not if one doc uses it in the 2nd.]

Page 6

SVD

Page 7

Latent Semantic Indexing (LSI)

Perform a low-rank approximation of the doc-term matrix (typical rank: 100-300): the latent semantic space.
Term-doc matrices are very large, but the number of topics that people talk about is small (in some sense).

General idea: Map docs (and terms) to a low-dimensional space.
Design the mapping such that the low-dimensional space reflects semantic associations.
Compute doc similarity based on the inner product in this latent semantic space.

Page 8

Goals of LSI

Similar terms map to similar locations in the low-dimensional space.
Noise reduction by dimension reduction.

Page 9

Term-document matrix

This matrix is the basis for computing similarity between docs and queries.
Can we transform this matrix so that we get a better measure of similarity between docs and queries?

Page 10

Singular Value Decomposition (SVD)

For an M × N matrix A of rank r there exists a factorization (Singular Value Decomposition):

A = UΣVᵀ   (U is M × M, Σ is M × N, V is N × N)

The columns of U are orthogonal eigenvectors of AAᵀ.
The columns of V are orthogonal eigenvectors of AᵀA.
The eigenvalues λ_1, …, λ_r of AAᵀ are also the eigenvalues of AᵀA.
Singular values: the nonzero diagonal entries of Σ are σ_1, …, σ_r, with σ_i = √λ_i.
Typically, the singular values are arranged in decreasing order.
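The following is a minimal numpy sketch (numpy is my choice here, not something the slides prescribe) that checks these properties on a small random matrix: the singular values are the square roots of the shared eigenvalues of AAᵀ and AᵀA, and the factors reproduce A.

```python
import numpy as np

# Small random M x N "term-doc" matrix, purely to illustrate the definitions.
rng = np.random.default_rng(0)
A = rng.random((4, 3))

U, s, Vt = np.linalg.svd(A, full_matrices=False)   # A = U @ diag(s) @ Vt

# Eigenvalues of A^T A (same nonzero eigenvalues as A A^T), in decreasing order.
eigvals = np.linalg.eigvalsh(A.T @ A)[::-1]

assert np.allclose(s, np.sqrt(np.clip(eigvals, 0, None)))   # sigma_i = sqrt(lambda_i)
assert np.allclose(U @ np.diag(s) @ Vt, A)                   # the factorization holds
```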

Page 11

Singular Value Decomposition (SVD)

Truncated SVD

A = UΣVᵀ, where U is M × min(M,N), Σ is min(M,N) × min(M,N), and Vᵀ is min(M,N) × N.

Page 12

SVD example

M = 3, N = 2.

A =
  1  −1
  0   1
  1   0

Thin SVD, A = UΣVᵀ:

U =
  0      2/√6
  1/√2  −1/√6
  1/√2   1/√6

Σ =
  1   0
  0  √3

Vᵀ =
  1/√2   1/√2
  1/√2  −1/√2

Or equivalently, the full SVD (U extended to a 3 × 3 orthogonal matrix, Σ padded with a row of zeros):

U =
  0      2/√6   1/√3
  1/√2  −1/√6   1/√3
  1/√2   1/√6  −1/√3

Σ =
  1   0
  0  √3
  0   0

with Vᵀ unchanged.
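As a sanity check of this worked example, the same factorization can be recomputed numerically (numpy returns the singular values in decreasing order, so √3 comes first, and the signs of individual singular vectors may be flipped relative to the slide):

```python
import numpy as np

A = np.array([[1.0, -1.0],
              [0.0,  1.0],
              [1.0,  0.0]])

U, s, Vt = np.linalg.svd(A, full_matrices=False)

print(s)                                      # approx [1.732, 1.0], i.e. sqrt(3) and 1
print(np.allclose(U @ np.diag(s) @ Vt, A))    # True: the factors reproduce A
```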

Page 13

Example


We use a non-weighted matrix here to simplify the example.

Page 14

Example of C = UΣVᵀ: All four matrices

C = UΣVᵀ

Page 15

Example of C = UΣVᵀ: matrix U

One row per term, one column per “semantic” dimension (min(M,N) columns).
Columns: “semantic” dimensions (distinct topics like politics, sports, ...).
u_ij: how strongly related term i is to the topic in column j.

Page 16

Example of C = UΣVᵀ: The matrix Σ

Square, diagonal matrix of size min(M,N) × min(M,N).
Each singular value measures the importance of the corresponding semantic dimension.
We’ll make use of this by omitting unimportant dimensions.

Page 17

Example of C = UΣVᵀ: The matrix Vᵀ

One column per doc, one row per “semantic” dimension (min(M,N) rows).
Columns of V: “semantic” dimensions.
v_ij: how strongly related doc i is to the topic in column j.

Page 18

Matrix decomposition: Summary

We’ve decomposed the term-doc matrix C into a product of three matrices:
U: consists of one (row) vector for each term
Vᵀ: consists of one (column) vector for each doc
Σ: diagonal matrix with singular values, reflecting the importance of each dimension

Next: Why are we doing this?

Page 19

LSI: Overview

Decompose the term-doc matrix C into a product of matrices using the SVD:

C = UΣVᵀ

We use the columns of U and V that correspond to the largest values in the diagonal matrix Σ as the term and document dimensions of the new space.
Using the SVD for this purpose is called LSI.

Page 20

Solution via SVD

Low-rank approximation: retain only the k largest singular values, setting the smallest r − k singular values to zero:

A_k = U diag(σ_1, …, σ_k, 0, …, 0) Vᵀ

Dropping the zeroed rows and columns leaves factors of size M × k, k × k, and k × N, so A_k is still M × N.

Column notation (sum of rank-1 matrices):

A_k = ∑_{i=1}^{k} σ_i u_i v_iᵀ
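A minimal sketch of this truncation in numpy (the helper name low_rank is mine, not from the slides):

```python
import numpy as np

def low_rank(A: np.ndarray, k: int) -> np.ndarray:
    """Rank-k approximation A_k: keep the k largest singular values, zero the rest."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Equivalent to the sum of the first k rank-1 terms sigma_i * u_i * v_i^T.
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Example: a random 6 x 4 matrix approximated with rank 2.
A = np.random.default_rng(3).random((6, 4))
A_2 = low_rank(A, 2)
print(np.linalg.matrix_rank(A_2))   # 2
```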

Page 21

Low-rank approximation

Approximation problem: Given a matrix A, find a matrix A_k of rank k (i.e., a matrix with k linearly independent rows or columns) such that

A_k = argmin_{X : rank(X) = k} ‖A − X‖_F   (‖·‖_F is the Frobenius norm)

A_k and X are both M × N matrices. Typically, we want k ≪ r.

SVD can be used to compute optimal low-rank approximations: keeping the k largest singular values and setting all others to zero yields the optimal approximation [Eckart–Young]. No matrix of rank k approximates A better than A_k.

Page 22

Approximation error

How good (bad) is this approximation?

It’s the best possible, measured by the Frobenius norm of the error:

min_{X : rank(X) = k} ‖A − X‖_F = ‖A − A_k‖_F = √(σ_{k+1}² + ⋯ + σ_r²)

where A_k = U diag(σ_1, …, σ_k, 0, …, 0) Vᵀ and the σ_i are ordered such that σ_i ≥ σ_{i+1}. (Measured in the spectral 2-norm, the error is exactly σ_{k+1}.)

This suggests why the Frobenius error drops as k increases.
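A quick numerical check of this error formula on random data (again just an illustrative sketch):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.random((8, 5))
U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Frobenius error = sqrt(sigma_{k+1}^2 + ... + sigma_r^2)
print(np.linalg.norm(A - A_k, "fro"), np.sqrt(np.sum(s[k:] ** 2)))
# Spectral (2-norm) error = sigma_{k+1}
print(np.linalg.norm(A - A_k, 2), s[k])
```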

Page 23

SVD Low-rank approximation

A term-doc matrix C may have M = 50,000 and N = 10⁶, with rank close to 50,000.
Construct an approximation C_100 with rank 100.
Of all rank-100 matrices, it would have the lowest Frobenius error.
Great … but why would we?
Answer: Latent Semantic Indexing

C. Eckart, G. Young. The approximation of one matrix by another of lower rank. Psychometrika, 1, 211–218, 1936.

Page 24

Recall the unreduced decomposition C = UΣVᵀ

Page 25

Reducing the dimensionality to 2

Page 26

Reducing the dimensionality to 2

Page 27

Original matrix C vs. reduced C_2 = UΣ_2Vᵀ

C_2 as a two-dimensional representation of C: a dimensionality reduction to two dimensions.

Page 28

Why is the reduced matrix “better”?

Similarity of d2 and d3 in the original space: 0.
Similarity of d2 and d3 in the reduced space:
0.52 × 0.28 + 0.36 × 0.16 + 0.72 × 0.36 + 0.12 × 0.20 + (−0.39) × (−0.08) ≈ 0.52
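This comparison can be reproduced with a short script, assuming the term-document matrix is the 5-term (ship, boat, ocean, wood, tree) × 6-doc example from Chapter 18 of the IIR book, which the numbers above appear to come from; exact values depend on rounding, but d2 and d3 go from orthogonal to clearly similar:

```python
import numpy as np

# Assumed term-document matrix (ship, boat, ocean, wood, tree) x (d1..d6),
# taken from IIR Chapter 18; the slides appear to use the same example.
C = np.array([
    [1, 0, 1, 0, 0, 0],   # ship
    [0, 1, 0, 0, 0, 0],   # boat
    [1, 1, 0, 0, 0, 0],   # ocean
    [1, 0, 0, 1, 1, 0],   # wood
    [0, 0, 0, 1, 0, 1],   # tree
], dtype=float)

U, s, Vt = np.linalg.svd(C, full_matrices=False)
k = 2
C2 = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]      # rank-2 approximation C_2

d2, d3 = C[:, 1], C[:, 2]
d2_r, d3_r = C2[:, 1], C2[:, 2]

print("original space:", d2 @ d3)               # 0.0
print("reduced space :", d2_r @ d3_r)           # positive, roughly 0.5
```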

Page 29

Why is the reduced matrix “better”?

“boat” and “ship” are semantically similar.

The “reduced” similarity measure reflects this.

What property of the SVD reduction is responsible for improved similarity?

Page 30

Example

[Example from Dumais et al.]

Page 31

Example

[Example from Dumais et al.]

Page 32

Example (k=2)

[Example from Dumais et al.: the truncated factors U_k, Σ_k, and V_kᵀ for k = 2]

Page 33

[Figure: two-dimensional LSI plot of the Dumais et al. example. Squares are terms (human, interface, computer, user, system, response, time, EPS, survey, tree, graph, minor); circles are docs.]

Page 34

[Example from Dumais et al.]

Page 35

How we use the SVD in LSI

Key property of the SVD: each singular value tells us how important its dimension is.

By setting less important dimensions to zero, we keep the important information but get rid of the “details”. These details may
be noise ⇒ the reduced LSI representation is a better representation;
make things dissimilar that should be similar ⇒ the reduced LSI representation is better because it represents similarity better.

Page 36

How LSI addresses synonymy and semantic relatedness

Docs may be semantically similar but not similar in the vector space (when we talk about the same topics but use different words).

Desired effect of LSI: synonyms contribute strongly to doc similarity.
Standard vector space: synonyms contribute nothing to doc similarity.

LSI (via the SVD) selects the “least costly” mapping: different words (= different dimensions of the full space) are mapped to the same dimension in the reduced space.
Thus it maps synonyms and semantically related words to the same dimension.
The “cost” of mapping synonyms to the same dimension is much less than the cost of collapsing unrelated words, so LSI avoids doing that for unrelated words.

Page 37

Performing the maps

Each row and column of C gets mapped into the k-dimensional LSI space by the SVD.

A query q is also mapped into this space. Since V_k = C_kᵀ U_k Σ_k⁻¹, we should transform a query q the same way:

q_k = qᵀ U_k Σ_k⁻¹

Note: the mapped query is NOT a sparse vector.

Claim: this is not only the mapping with the best (Frobenius-error) approximation to C, but it also improves retrieval.
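A small sketch that checks the identity V_k = CᵀU_kΣ_k⁻¹ behind this mapping on random data, and folds a query in the same way (the variable names are mine, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(2)
C = rng.random((5, 6))                              # terms x docs
U, s, Vt = np.linalg.svd(C, full_matrices=False)

k = 2
U_k, S_k_inv = U[:, :k], np.diag(1.0 / s[:k])
V_k = Vt[:k, :].T                                   # docs x k

# Mapping every doc column of C like a query recovers V_k ...
assert np.allclose(C.T @ U_k @ S_k_inv, V_k)

# ... so a query vector q is folded into the LSI space the same way.
q = rng.random(5)
q_k = q @ U_k @ S_k_inv                             # q_k = q^T U_k Sigma_k^{-1}
print(q_k)
```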

Page 38

Implementation

Compute the SVD of the term-doc matrix.
Map the docs to the reduced space.
Map the query into the reduced space: q_k = qᵀ U_k Σ_k⁻¹.
Compute the similarity of q_k with all reduced docs in V_k.
Output a ranked list of docs as usual.
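An end-to-end sketch of these steps on toy data (cosine similarity is my choice of scoring here, and the toy matrix and query are made up for illustration):

```python
import numpy as np

def lsi_rank(C: np.ndarray, q: np.ndarray, k: int) -> np.ndarray:
    """Rank the documents (columns of C) against query q in a k-dimensional LSI space."""
    U, s, Vt = np.linalg.svd(C, full_matrices=False)
    docs_k = Vt[:k, :].T                   # one row per doc in the reduced space
    q_k = q @ U[:, :k] / s[:k]             # q_k = q^T U_k Sigma_k^{-1}

    # Cosine similarity between the reduced query and every reduced doc.
    sims = (docs_k @ q_k) / (np.linalg.norm(docs_k, axis=1) * np.linalg.norm(q_k) + 1e-12)
    return np.argsort(-sims)               # doc indices, best match first

# Toy term-document counts (5 terms x 4 docs) and a query over terms 0 and 2.
C = np.array([[2, 0, 1, 0],
              [1, 1, 0, 0],
              [0, 2, 0, 1],
              [0, 0, 1, 2],
              [1, 0, 0, 1]], dtype=float)
q = np.array([1, 0, 1, 0, 0], dtype=float)

print(lsi_rank(C, q, k=2))
```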

What is the fundamental problem with this approach?

Page 39

Empirical evidence

Experiments on TREC 1/2/3 – Dumais

Lanczos SVD code (available on netlib), due to Berry, was used in these experiments.
Running times of ~ one day on tens of thousands of docs [still an obstacle to use].

Dimensions – various values 250-350 reported.

Reducing k improves recall.

Under 200 reported unsatisfactory

Generally expect recall to improve – what about precision?

Page 40

Empirical evidence

Precision at or above median TREC precision

Top scorer on almost 20% of TREC topics

Slightly better on average than straight vector spaces

Effect of dimensionality:

Dimensions Precision

250 0.367

300 0.371

346 0.374

Page 41

But why is this clustering?

We’ve talked about docs, queries, retrieval and precision here. What does this have to do with clustering?

Intuition: Dimension reduction through LSI brings together “related” axes in the vector space.

Page 42

Simplistic picture

[Figure: docs clustered along three axes labeled Topic 1, Topic 2, and Topic 3.]

Page 43

Reference

Chapter 18 of the IIR book (Manning, Raghavan & Schütze, Introduction to Information Retrieval).