SEMANTIC SEARCH PRESENTED BY: Group No:13 Jai Mashalkar 113050007 Khushraj Madnani 113050041 Lahari Poddar 113050029


SEMANTIC SEARCH

PRESENTED BY:

Group No:13

Jai Mashalkar 113050007

Khushraj Madnani 113050041

Lahari Poddar 113050029

SEMANTIC SEARCH

Semantic search seeks to improve search accuracy by understanding the searcher’s intent and the contextual meaning of terms as they appear in the searchable dataspace, whether on the Web or within a closed system, in order to generate more relevant results.

MOTIVATION FOR SEMANTIC SEARCH

SEMANTIC SEARCH TECHNOLOGY

Semantically Relatable Sets

Query Expansion

Relevance Feedback

SEMANTICALLY RELATABLE SETS

SEMANTICALLY RELATABLE SET

A semantically relatable set (SRS) of a sentence is a group of unordered words in the sentence (not necessarily consecutive) that appear in the semantic graph of the sentence as linked nodes.

FORMS OF SRS

a. {CW,CW}
b. {CW,FW,CW}
c. {FW,CW}

CW: Content Word or Clause
FW: Function Words

Example: The girl borrowed a book on AI from library.

CW: girl, borrowed, book, AI, library
FW: the, a, on, from

THE GIRL BORROWED A BOOK ON AI FROM LIBRARY

[Semantic graph: borrowed (past tense) is linked to girl (agent; the: definite), book (object; a: indefinite) and library (place; from: modifier). book is linked to AI (modifier; on: modifier).]

THE GIRL BORROWED A BOOK ON AI FROM LIBRARY

Sets Formed:
a) {the,girl}
b) {girl,borrowed}
c) {borrowed,book}
d) {book,on,AI}
e) {borrowed,from,library}
f) {a,book}

THE PROFESSOR ANNOUNCED THAT HE WILL CONDUCT AN EXTRA LECTURE ON SUNDAY

[Semantic graph: announced is linked to professor (agent; the: definite) and to SCOPE (object; that: modifier).]

THE PROFESSOR ANNOUNCED THAT HE WILL CONDUCT AN EXTRA LECTURE ON SUNDAY

[Semantic graph of SCOPE: conduct (will: future tense) is linked to he (agent), lecture (object; an: indefinite) and sunday (time; on: modifier). lecture is linked to extra (modifier).]

SETS FORMED

a) {the,professor}
b) {professor,announced}
c) {announced,that,SCOPE}
d) SCOPE:{he,conduct}
e) SCOPE:{will,conduct}
f) SCOPE:{conduct,lecture}
g) SCOPE:{conduct,on,sunday}
h) SCOPE:{extra,lecture}
i) SCOPE:{an,lecture}

SRS BASED SEARCH

Rq(d) = Relevance of the document d to the query q
|Sd| = Number of sentences in the document d
rq(s) = Relevance of sentence s to the query q

• The relevance score for a document d:

• The relevance of the sentence s to the query q :

weight(srs) = weight of the SRS srs.
press(srs) = true if srs is present in sentence s, false otherwise.
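The two formulas themselves are not reproduced in the slide text, so the following is only a sketch under an assumed form: the sentence relevance is taken to be the total weight of the query SRSs present in the sentence, and the document relevance is taken to be the average of the sentence relevances over the |Sd| sentences. The function names and the example weights are hypothetical.

```python
from typing import Dict, FrozenSet, List

SRS = FrozenSet[str]  # an SRS is an unordered group of words


def sentence_relevance(query_srs: Dict[SRS, float], sentence_srs: List[SRS]) -> float:
    """r_q(s): ASSUMED to be the summed weight of query SRSs present in s."""
    present = set(sentence_srs)
    return sum(weight for srs, weight in query_srs.items() if srs in present)


def document_relevance(query_srs: Dict[SRS, float], doc: List[List[SRS]]) -> float:
    """R_q(d): ASSUMED to be the mean of r_q(s) over the |S_d| sentences of d."""
    if not doc:
        return 0.0
    return sum(sentence_relevance(query_srs, s) for s in doc) / len(doc)


# Hypothetical query SRSs with illustrative weights.
query = {
    frozenset({"book", "on", "AI"}): 1.0,
    frozenset({"borrowed", "book"}): 0.5,
}

# A two-sentence document, each sentence given as its list of SRSs.
doc = [
    [frozenset({"girl", "borrowed"}),
     frozenset({"borrowed", "book"}),
     frozenset({"book", "on", "AI"})],
    [frozenset({"the", "professor"})],
]

score = document_relevance(query, doc)
```

Because an SRS is unordered, frozenset is a natural representation: {book,on,AI} matches regardless of the order in which the words were extracted.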

ANALYSIS OF SRS

SRS based search gives very high precision (the fraction of retrieved instances that are relevant) compared to tf-idf based search.

But it falls short of tf-idf based search because of its low recall (the fraction of relevant instances that are retrieved).
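The two measures can be made concrete with a small sketch. The document identifiers below are hypothetical and only illustrate the high-precision, low-recall situation described above.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved and relevant items."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# A strict matcher retrieves only two documents, both relevant,
# while five documents in the collection are actually relevant.
p, r = precision_recall(["d1", "d2"], ["d1", "d2", "d3", "d4", "d5"])
```

Here every retrieved document is relevant (precision 1.0), but only two of the five relevant documents are found (recall 0.4).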

LOW RECALL

REASONS:

Morphological Divergence
Eg: Apparel for man : Clothes for men

Synonymy/Hypernymy/Hyponymy Divergence
Eg: Color : red/blue

Physical Separation Divergence
Eg: Book on AI : AI book

LOW RECALL

ENHANCEMENTS:

Stemming
Eg: Moving, moved, moves → move

Word Similarity
Eg: Clothes ~ Apparel

SRS Augmentation
<Noun1 Preposition Noun2> ~ <Noun2 Noun1>
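The SRS augmentation rule can be sketched as follows. This is a minimal illustration, assuming an SRS is stored as an unordered set of words; the function name is hypothetical.

```python
def augment_srs(noun1: str, preposition: str, noun2: str) -> set:
    """Return the original {N1, P, N2} set together with its
    augmented form {N2, N1}, treated as equivalent for matching."""
    return {
        frozenset({noun1, preposition, noun2}),
        frozenset({noun2, noun1}),
    }


# "book on AI" should also match a sentence that says "AI book".
variants = augment_srs("book", "on", "AI")
matches_ai_book = frozenset({"AI", "book"}) in variants
```

This addresses the physical separation divergence: a query containing "book on AI" can now match a document that only contains the compound "AI book".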

QUERY EXPANSION

QUERY EXPANSION

Query expansion is the process of reformulating a seed query to improve retrieval performance.

Techniques involved:
Finding synonyms of words.
Finding all the various morphological forms of a word by stemming.
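Both techniques can be sketched together. The synonym table and the suffix-stripping stemmer below are toy assumptions for illustration, not a real thesaurus or a standard stemming algorithm, and the stems they produce need not be dictionary words.

```python
# Hypothetical hand-made synonym table.
SYNONYMS = {"apparel": ["clothes"], "clothes": ["apparel"]}


def naive_stem(word: str) -> str:
    """Strip one common suffix; a toy stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def expand_query(query: str) -> list:
    """Add the stem and the synonyms of every query term, without duplicates."""
    expanded = []
    for token in query.lower().split():
        for term in [token, naive_stem(token), *SYNONYMS.get(token, [])]:
            if term not in expanded:
                expanded.append(term)
    return expanded


terms = expand_query("apparel moving")
```

With this toy stemmer, "moving", "moved" and "moves" all reduce to the same stem, so a document containing any of the three forms can match the expanded query.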

TYPES OF QUERY EXPANSION

GLOBAL: Examines word occurrences and relationships using a thesaurus, which can be constructed manually or automatically.

LOCAL: Uses the top-ranked documents retrieved by the original query.

GLOBAL QUERY EXPANSION

Manual Thesaurus Generation:
Use of a controlled vocabulary (maintained by human editors) that is built up from sets of synonymous names for concepts.

Automatic Thesaurus Generation:
Exploit word co-occurrence.
Exploit grammatical relations or grammatical dependencies.
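The co-occurrence approach can be sketched as follows, assuming that two words are related if they often appear in the same document. The function name, the document-level window and the sample documents are illustrative assumptions.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence_thesaurus(docs, top_n=1):
    """Map each word to the top_n words it most often shares a document with."""
    counts = defaultdict(Counter)
    for doc in docs:
        words = sorted(set(doc.lower().split()))
        for a, b in combinations(words, 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return {w: [t for t, _ in c.most_common(top_n)] for w, c in counts.items()}


docs = ["cheap flights", "cheap flights tickets", "cheap hotels"]
thesaurus = build_cooccurrence_thesaurus(docs)
```

In this tiny collection "cheap" and "flights" share two documents, so each becomes the other's strongest associate. A real system would normalise the counts and filter out stop words.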

ANALYSIS OF QUERY EXPANSION

Query expansion is effective in increasing recall of relevant documents.

But it may significantly decrease precision, particularly when the query contains ambiguous terms.

In general a domain specific thesaurus is required for better performance.

RELEVANCE FEEDBACK

RELEVANCE FEEDBACK

i. Initially, the query given by the user is fired.
ii. Some results are retrieved.
iii. Analyze whether or not those results are relevant.
iv. Form a new query and then produce the final search results by firing this modified query.
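The slides do not say how the modified query in step iv is built. One well-known option is Rocchio's algorithm, sketched below as an assumption rather than as the method used here. Queries and documents are term-weight dictionaries, and the values of alpha, beta and gamma are illustrative.

```python
def centroid(docs):
    """Average term-weight vector of a list of documents."""
    total = {}
    for doc in docs:
        for term, weight in doc.items():
            total[term] = total.get(term, 0.0) + weight
    return {t: w / len(docs) for t, w in total.items()} if docs else {}


def rocchio(query, relevant, non_relevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Move the query towards relevant documents and away from non-relevant ones."""
    rel_c, non_c = centroid(relevant), centroid(non_relevant)
    terms = set(query) | set(rel_c) | set(non_c)
    new_query = {}
    for t in terms:
        w = alpha * query.get(t, 0.0) + beta * rel_c.get(t, 0.0) - gamma * non_c.get(t, 0.0)
        if w > 0:  # negative weights are dropped
            new_query[t] = w
    return new_query


query = {"inferno": 1.0}
relevant = [{"inferno": 1.0, "os": 1.0}]        # judged relevant in step iii
non_relevant = [{"inferno": 1.0, "band": 1.0}]  # judged not relevant in step iii
modified = rocchio(query, relevant, non_relevant)
```

The modified query gains the term "os" from the relevant document, while "band" receives a negative weight and is dropped.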

TYPES OF RELEVANCE FEEDBACK

Explicit Feedback :

Feedback is taken directly from users, who assess the relevance of a given output (set of documents).

Eg: After a document is viewed, ask “Was this document helpful?”

ANALYSIS:

ADVANTAGE: It is able to capture the actual requirements and expectations of the user.

DISADVANTAGE:

A large fraction of users may not be interested in participating in surveys and feedback.

These surveys may be biased by the personal choices of users.

E.g.: When searching for "inferno", most people may rank the pages of the musical band named Inferno over those of the Inferno OS.

IMPLICIT FEEDBACK:

Feedback that is inferred from the actions of the user on the output documents.

Factors:
Number of times the document is visited
Duration of the visit on a particular URL
Depth and number of links visited

ANALYSIS:

ADVANTAGE:

The interaction time with the user is eliminated, as the system takes the feedback of the user implicitly.

DISADVANTAGE:

Number of hits on a URL: Users may tend to always click on the first document returned. Thus, if the search was initially not up to the mark, it may continue to perform poorly.

Time spent on a URL: Sometimes the time taken to reject a document may be long enough for the algorithm to believe that it is relevant.

Number and depth of links visited: This will rank a relevant document with links as relevant, but it will fail to rank a good document without links as relevant.

PSEUDO RELEVANCE FEEDBACK OR BLIND FEEDBACK :

Takes a query as input. From the top k ranked results for that query, some keywords are selected (according to their weights) and added to the query, and the expanded query is then used for a further search.
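The process can be sketched as follows. The term weight is assumed here to be the raw term frequency in the top k documents, which is a simplification; the function name, parameters and sample documents are hypothetical.

```python
from collections import Counter


def pseudo_relevance_feedback(query_terms, ranked_docs, k=2, num_terms=1):
    """Expand the query with the heaviest terms from the top-k ranked documents.

    ASSUMPTION: term weight = raw frequency in the feedback documents.
    """
    feedback = Counter()
    for doc in ranked_docs[:k]:
        feedback.update(doc.lower().split())
    candidates = [t for t, _ in feedback.most_common() if t not in query_terms]
    return list(query_terms) + candidates[:num_terms]


# Hypothetical ranked results of the initial query "inferno".
ranked_docs = [
    "inferno os plan9 os",
    "inferno os kernel",
    "inferno band music",
]
expanded = pseudo_relevance_feedback(["inferno"], ranked_docs)
```

No user judgement is involved: the top k documents are simply assumed to be relevant, which is why the quality of the initial ranking matters so much.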

ANALYSIS:

ADVANTAGE: It is a completely automated process and hence totally free from human bias.

DISADVANTAGE:

The effectiveness depends heavily on the ranking algorithm used. If the top documents retrieved by the initial query are not very relevant, then the final result will not be very good either.

The type of term associations obtained for QE is restricted to co-occurrence based relationships in the feedback documents, and thus other types of term associations such as lexical and semantic relations (morphological variants, synonyms) are not explicitly captured.

MULTI LINGUAL PRF

Given a query in a language, we take the help of another language to ameliorate the well-known problems of PRF.

The steps are:
i. Translation: L1 -> L2
ii. PRF performed in L2.
iii. Result back-translation: L2 -> L1
iv. Combination of feedback models of L1, L2.
v. Fetch a new ranked list of documents.
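Steps ii to iv can be sketched as follows. This is only an illustration under several assumptions: a feedback model is a normalised term-frequency distribution over the feedback documents, back-translation is a word-by-word dictionary lookup, and the two models are combined by linear interpolation with a hypothetical weight lam. Query translation (step i) and retrieval (step v) are assumed to happen elsewhere.

```python
from collections import Counter


def feedback_model(top_docs):
    """Normalised term frequencies over the feedback documents."""
    counts = Counter(w for d in top_docs for w in d.lower().split())
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}


def translate_model(model, dictionary):
    """Map each term through a bilingual dictionary, dropping unknown terms."""
    out = {}
    for term, p in model.items():
        if term in dictionary:
            target = dictionary[term]
            out[target] = out.get(target, 0.0) + p
    return out


def combine(model_l1, model_l2_back, lam=0.5):
    """Linear interpolation of the two feedback models (ASSUMED combination)."""
    terms = set(model_l1) | set(model_l2_back)
    return {t: lam * model_l1.get(t, 0.0) + (1 - lam) * model_l2_back.get(t, 0.0)
            for t in terms}


# Toy example: L1 = French, L2 = English (the assisting language).
l2_to_l1 = {"car": "voiture", "engine": "moteur"}   # hypothetical dictionary
model_l1 = feedback_model(["voiture rouge"])        # PRF in L1
model_l2 = feedback_model(["car engine"])           # step ii: PRF in L2
back = translate_model(model_l2, l2_to_l1)          # step iii
combined = combine(model_l1, back)                  # step iv
```

The combined model contains "moteur", a term that the feedback documents in L1 alone did not provide.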

  

ANALYSIS OF MULTILINGUAL PRF

Good Feedback from Assisting Language: If the feedback model in the assisting language contains good terms, then the back-translation process will introduce the corresponding feedback terms in the source language, thus leading to improved performance.

Finding Synonyms/Morphological Variations: Another situation in which MultiPRF leads to large improvements is when it finds terms semantically or lexically related to the query terms that the original feedback model was unable to find.

Abundance of documents: More documents are available on the web in the assisting language than in the base language.

OBSERVATIONS

CONCLUDING REMARKS

Semantic search will be helpful for research search but will not be of much help for navigational search.

Semantic search performs better than traditional search methods for semantically meaningful sentences or phrases but will fall short for keyword based search.

To use semantic search engines to their full potential, users also need to get used to searching with meaningful queries instead of just keywords.

THE FUTURE AHEAD…

Semantic search may not be able to replace the traditional web completely, but it has the power to enhance it.

With semantic search the web will become more intelligent as it will be able to understand exactly what we mean instead of searching just the keywords.

REFERENCES

Rajat Mohanty, Anupama Dutta and Pushpak Bhattacharyya, Semantically Relatable Sets: Building Blocks for Representing Semantics, 10th Machine Translation Summit (MT Summit 05), Phuket, September 2005.

Manoj Chinnakotla, Karthik Raman and Pushpak Bhattacharyya, Multilingual PRF: English Lends a Helping Hand, SIGIR 2010, Geneva, Switzerland, July 2010.

Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze, Introduction to Information Retrieval, Cambridge University Press, 2008.

Jinxi Xu and W. Bruce Croft, Query Expansion Using Local and Global Document Analysis, Center for Intelligent Information Retrieval, Computer Science Department, University of Massachusetts, Amherst, MA 01003-4610, USA.

http://en.wikipedia.org/wiki/Semantic_search, Last modified on 23 October 2011 at 14:11, Last accessed on 02 November 2011 at 17:31

http://en.wikipedia.org/wiki/Query_expansion, Last modified on 7 October 2011 at 20:43, Last accessed on 04 November 2011 at 18:45

http://en.wikipedia.org/wiki/Relevance_feedback, Last modified on 31 October 2011 at 03:46, Last accessed on 04 November 2011 at 19:10

THANK YOU