Context Based Search
By Shatabdi Kundu (2010EET2553)
Computer Technology, M.Tech, IIT Delhi
Email ID: shatabdikundu@live.com
Project Guide: Prof. Santanu Chaudhury
Electrical Engineering Department, IIT Delhi
Email ID: santanuc@ee.iitd.ac.in
June 22, 2011 | Shatabdi Kundu :: 2010EET2553 | Prof. Santanu Chaudhury | 09 MAY 2011 | 1 of 16
Outline
Introduction to Topic Models: Probabilistic Modelling
Latent Dirichlet Allocation
Topic Discovery using WordNet
Work Done
Results
Conclusion and Future Work
References
Probabilistic Modelling
Treat data as observations that arise from a generative probabilistic process that includes hidden variables
For documents, the hidden variables reflect the thematic structure of the collection
Infer the hidden structure using posterior inference
What are the topics that describe this collection?
Situate new data into the estimated model
How does this query or new document fit into the estimated topic structure?
Intuition behind LDA
Generative Process
Cast these intuitions into a generative probabilistic process
Each document is a random mixture of corpus-wide topics
Each word is drawn from one of those topics
Graphical Models
Nodes are random variables
Edges denote possible dependence
Observed variables are shaded
Plates denote replicated structure
Graphical Models
The structure of the graph defines the pattern of conditional dependence between the ensemble of random variables.
E.g., this graph corresponds to

p(y, x_1, ..., x_N) = p(y) ∏_{n=1}^{N} p(x_n | y)    (1)
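The factorization in equation (1) can be evaluated directly. A minimal sketch (not from the slides), assuming hypothetical distributions for a binary class y and three conditionally independent binary observations x_1..x_3:

```python
import numpy as np

# Illustrative (assumed) distributions: p(y) and p(x_n = 1 | y) for n = 1..3.
p_y = np.array([0.6, 0.4])
p_x_given_y = np.array([[0.7, 0.2],
                        [0.5, 0.9],
                        [0.1, 0.4]])

def joint(y, xs):
    """Evaluate p(y, x_1..x_N) = p(y) * prod_n p(x_n | y) for binary x_n."""
    probs = np.where(xs == 1, p_x_given_y[:, y], 1.0 - p_x_given_y[:, y])
    return p_y[y] * probs.prod()

xs = np.array([1, 0, 1])
# Marginalizing over y recovers p(x_1..x_N), as the graph structure implies.
total = sum(joint(y, xs) for y in (0, 1))
```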
Latent Dirichlet Allocation
1. Draw each topic βk ∼ Dir(η), for k ∈ {1, ..., K}
2. For each document d:
   1. Draw topic proportions θd ∼ Dir(α)
   2. For each word n:
      1. Draw topic assignment Zd,n ∼ Mult(θd)
      2. Draw word Wd,n ∼ Mult(βZd,n)
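The generative process above can be sampled directly. A minimal sketch (not from the slides), with K, the vocabulary size V, document length N, and the hyperparameters η and α all chosen as illustrative values:

```python
import numpy as np

rng = np.random.default_rng(0)
K, V, N = 7, 1000, 50        # topics, vocabulary size, words per document (assumed)
eta, alpha = 0.01, 0.1       # Dirichlet hyperparameters (assumed)

# Step 1: draw each topic beta_k ~ Dir(eta); rows are topic-word distributions.
beta = rng.dirichlet(np.full(V, eta), size=K)

def generate_document():
    # Step 2a: draw topic proportions theta_d ~ Dir(alpha).
    theta = rng.dirichlet(np.full(K, alpha))
    words = []
    for _ in range(N):
        z = rng.choice(K, p=theta)     # Step 2b-i: Z_{d,n} ~ Mult(theta_d)
        w = rng.choice(V, p=beta[z])   # Step 2b-ii: W_{d,n} ~ Mult(beta_{Z_{d,n}})
        words.append(w)
    return words

doc = generate_document()
```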
Latent Dirichlet Allocation
From a collection of documents, infer
Per-word topic assignment Zd,n
Per-document topic proportions θd
Per-corpus topic distributions βk
Use posterior expectations to perform the task at hand, e.g. information retrieval, document similarity, etc.
Topic Discovery using Wordnet
Lexical relations used for finding the latent topics
synsets (synonym sets) as basic units
hyponymy: a semantic relation between word meanings. E.g. {maple} is a hyponym of {tree}
hypernymy: the inverse of hyponymy. E.g. {tree} is a hypernym of {maple}
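These relations form chains that can be walked upward. A minimal sketch (not the WordNet API itself), using a tiny hand-built hypernym map to mimic the {maple} → {tree} example:

```python
# Tiny illustrative hypernym map; real WordNet relates whole synsets,
# not single words, and has a much richer graph.
hypernym_of = {
    "maple": "tree",
    "oak": "tree",
    "tree": "plant",
    "rose": "flower",
    "flower": "plant",
}

def is_hyponym(word, ancestor):
    """True if `ancestor` lies somewhere on the hypernym chain above `word`."""
    while word in hypernym_of:
        word = hypernym_of[word]
        if word == ancestor:
            return True
    return False
```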
Work Done
I took a collection of 10 documents that had a total of around 28K words.
I removed the stop words and rare words, along with punctuation marks and numbers.
Then I trained a 7-topic LDA model on this corpus.
Now I had 7 topics, each with its 5 most probable words.
I then used the lexical relations of WordNet to identify the hidden topics, using common parents of all the words in each topic.
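The last step, finding a common parent for the words in a topic, can be sketched as follows (assumed helper over a hand-built hypernym map; the slides use real WordNet relations):

```python
# Illustrative hypernym map standing in for WordNet's hypernym relations.
hypernym_of = {
    "maple": "tree", "oak": "tree", "pine": "tree",
    "tree": "plant", "rose": "flower", "flower": "plant",
    "plant": "entity",
}

def ancestors(word):
    """Ordered hypernym chain from the word up to the root."""
    chain = []
    while word in hypernym_of:
        word = hypernym_of[word]
        chain.append(word)
    return chain

def common_parent(words):
    """Lowest hypernym shared by all words: a candidate topic name."""
    shared = None
    for w in words:
        chain = ancestors(w)
        shared = chain if shared is None else [a for a in shared if a in chain]
    return shared[0] if shared else None
```

For a topic whose top words are "maple", "oak", and "pine", this yields "tree" as the topic name.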
Results after training LDA model
This model only selects appropriate words within a topic but does not name the topic
Discovering the topic name is done using WordNet
Results after applying WordNet
The above result gives us the hidden topic names of the words that comprised the documents.
This kind of model can be used for identifying topics when given only a word.
Conclusion and Future Work
We will now work on searching based on topics (context) using this model.
Essentially, we will deal with the geo-intent of queries and decide on the topic to which they belong, for better retrieval of information.
References
D. Blei, A. Ng, and M. Jordan. Latent Dirichlet Allocation. Journal of Machine Learning Research, 3:993-1022, January 2003.
Jun Fu Cai, Wee Sun Lee, Yee Whye Teh. NUS-ML: Improving Word Sense Disambiguation Using Topic Features. SemEval (2007).
David M. Blei, Jon D. McAuliffe. Supervised Topic Models. NIPS (2007).
WordNet. http://www.shiffman.net/teaching/a2z/wordnet
Thank You