A Vector Space Model for Automatic Indexing

A Vector Space Model for Automatic Indexing (G. Salton, A. Wong and C. S. Yang); Enhanced Vector Space Models for Content-based Recommender Systems (Cataldo Musto). Presenter: Sawood Alam <[email protected]>


Transcript of A Vector Space Model for Automatic Indexing

Page 1: A Vector Space Model for Automatic Indexing

A Vector Space Model for Automatic Indexing

G. Salton, A. Wong and C. S. Yang

Enhanced Vector Space Models for Content-based Recommender Systems

Cataldo Musto

Presenter: Sawood Alam <[email protected]>

Page 2: A Vector Space Model for Automatic Indexing

A Vector Space Model for Automatic Indexing

G. Salton, A. Wong and C. S. Yang

Cornell University

Page 3: A Vector Space Model for Automatic Indexing

Introduction

• In document retrieval, the best indexing space is one in which each entity lies as far away from the others as possible

• The density of the object space thus becomes a measure of the quality of an indexing system

• Retrieval performance correlates inversely with space density
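
The notion of space density in the bullets above can be illustrated with a minimal sketch. It assumes density is measured as the average pairwise cosine similarity between document vectors, which is one plausible reading rather than necessarily the exact measure used in the paper. The names `cosine`, `space_density`, `spread` and `crowded` are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def space_density(docs):
    """Assumed density measure: average cosine similarity over all document pairs."""
    n = len(docs)
    if n < 2:
        return 0.0
    total = sum(cosine(docs[i], docs[j])
                for i in range(n) for j in range(i + 1, n))
    return total / (n * (n - 1) / 2)

# Hypothetical toy collections over three index terms.
spread = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]        # documents far apart
crowded = [[1, 1, 0], [1, 1, 0.1], [1, 0.9, 0]]   # documents bunched together

print(space_density(spread), space_density(crowded))
```

Under this measure the `spread` collection has the lower density, which is the configuration the slide associates with better retrieval performance.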

Page 4: A Vector Space Model for Automatic Indexing

Document Space

• Di = (di1, di2, di3, …, dit), where dij is the weight of the j-th term in document Di and t is the number of index terms
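
As a rough illustration of the document vector Di, the sketch below builds one vector per document over a shared vocabulary. It assumes raw term frequency as the weight and whitespace tokenization, both simplifications chosen for brevity; the names `build_vocabulary`, `document_vector` and the sample texts are hypothetical.

```python
from collections import Counter

def build_vocabulary(texts):
    """Sorted list of distinct terms; its length is the dimensionality of the space."""
    return sorted({token for text in texts for token in text.lower().split()})

def document_vector(text, vocabulary):
    """Vector of term weights, one per vocabulary term (assumed weight: raw term frequency)."""
    counts = Counter(text.lower().split())
    return [counts[term] for term in vocabulary]

# Hypothetical three-document collection.
texts = [
    "vector space model",
    "space density and retrieval",
    "vector vector retrieval",
]
vocabulary = build_vocabulary(texts)
vectors = [document_vector(text, vocabulary) for text in texts]

for text, vec in zip(texts, vectors):
    print(vec, text)
```

Each position in a vector corresponds to one term of `vocabulary`, so all documents live in the same space and can be compared with any vector similarity measure.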

Page 5: A Vector Space Model for Automatic Indexing

Document Space (cont.)

Page 6: A Vector Space Model for Automatic Indexing

Document Space (cont.)

Page 7: A Vector Space Model for Automatic Indexing

Indexing Performance vs. Space Density

Page 8: A Vector Space Model for Automatic Indexing

Cluster Density vs. Indexing Performance

Page 9: A Vector Space Model for Automatic Indexing

Discrimination Value Model

Page 10: A Vector Space Model for Automatic Indexing

Discrimination Value Model (cont.)

Page 11: A Vector Space Model for Automatic Indexing

Discrimination Value Model Summary

Page 12: A Vector Space Model for Automatic Indexing

Average Recall vs. Precision

Page 13: A Vector Space Model for Automatic Indexing

Summary Recall vs. Precision

Page 14: A Vector Space Model for Automatic Indexing

Enhanced Vector Space Models for Content-based Recommender Systems

Cataldo Musto

Dept. of Computer Science, University of Bari

[email protected]

Page 15: A Vector Space Model for Automatic Indexing

Introduction

• The use of Vector Space Models (VSM) in Information Retrieval is an established practice

• Goal: investigate the impact of vector space models in Information Filtering
  – Recommender systems

Page 16: A Vector Space Model for Automatic Indexing

Problems of VSM

• High dimensionality
  – Becoming more serious due to emerging social apps and micro-blogging, which generate lots of web content and new vocabulary

• Inability to manage document semantics
  – The order in which terms occur in the document is ignored

Page 17: A Vector Space Model for Automatic Indexing

Components

• Context vector for each term
  – Values in {-1, 0, 1}

• Vector Space representation of a term (t)

• Vector Space representation of a document (d)

• Vector Space representation of a user profile (pu)
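
The sketch below shows one way these components could fit together, assuming the usual Random Indexing construction: a term vector is the sum of the context vectors of the terms it co-occurs with in a document, a document vector is the sum of its term vectors, and a user profile is the sum of the vectors of documents the user liked. The slides do not spell out these combination rules, so they are assumptions, as are the names `context_vector`, `build_term_vectors`, `document_vector`, `user_profile` and the parameters `dim`, `nonzeros` and `seed`.

```python
import random

def context_vector(dim, nonzeros, rng):
    """Sparse random vector with values in {-1, 0, 1}."""
    vec = [0] * dim
    for pos in rng.sample(range(dim), nonzeros):
        vec[pos] = rng.choice((-1, 1))
    return vec

def add(u, v):
    """Element-wise sum of two equal-length vectors."""
    return [a + b for a, b in zip(u, v)]

def build_term_vectors(documents, dim=16, nonzeros=4, seed=0):
    """Assumed rule: term vector t = sum of context vectors of co-occurring terms."""
    rng = random.Random(seed)
    vocab = sorted({term for doc in documents for term in doc})
    context = {term: context_vector(dim, nonzeros, rng) for term in vocab}
    term_vecs = {term: [0] * dim for term in vocab}
    for doc in documents:
        for term in doc:
            for other in doc:
                if other != term:
                    term_vecs[term] = add(term_vecs[term], context[other])
    return context, term_vecs

def document_vector(doc, term_vecs, dim):
    """Assumed rule: document vector d = sum of the vectors of its terms."""
    vec = [0] * dim
    for term in doc:
        vec = add(vec, term_vecs[term])
    return vec

def user_profile(liked_docs, term_vecs, dim):
    """Assumed rule: profile pu = sum of the vectors of documents the user liked."""
    vec = [0] * dim
    for doc in liked_docs:
        vec = add(vec, document_vector(doc, term_vecs, dim))
    return vec

# Hypothetical tokenized item descriptions.
DIM = 16
docs = [["space", "opera", "robots"], ["robots", "drama"], ["romance", "drama"]]
context, term_vecs = build_term_vectors(docs, dim=DIM)
profile = user_profile(docs[:2], term_vecs, DIM)
print(profile)
```

Because every vector has the fixed dimension `dim` regardless of vocabulary size, this kind of construction addresses the high-dimensionality problem without a separate reduction step.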

Page 18: A Vector Space Model for Automatic Indexing

Indexing Technique

• Random Indexing-based model

• Weighted Random Indexing-based model

• Semantic Vector-based model

• Weighted Semantic Vector-based model

Page 19: A Vector Space Model for Automatic Indexing

Experimental Evaluation

Page 20: A Vector Space Model for Automatic Indexing

Conclusions

• The first prototype, with a naive weighting scheme, is comparable to other content-based filtering techniques such as a Bayesian classifier

• More complex weighting schemes are expected to perform better

• User profiles based on Linked Data, rather than keyword-based profiles, may be studied in future work