Intent Mining from Search Results

Jan Pedersen

Transcript of Intent Mining from Search Results

Page 1: Intent Mining from Search Results

Intent Mining from Search Results

Jan Pedersen

Page 2: Intent Mining from Search Results

Outline

• Intro to Web Search
  – Free text queries
  – Architecture
  – Why it works

• Result Set Mining
  – Disambiguation
  – Correction
  – Amplification

Page 3: Intent Mining from Search Results

The Worst Interface (ca 1990)

The Search Interface (ca 2010)

Page 4: Intent Mining from Search Results

Search wasn’t always like this

ttl/(tennis and (racquet or racket))
isd/1/8/2002 and motorcycle
in/newmar-julie

Source: USPTO

Page 5: Intent Mining from Search Results

Salton’s Contribution

Source: cs.cornell.edu

• Free text queries
• Approximate matching
• Relevance ranking

• Exploit redundancy
• Meta data
• Scored-OR
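The scored-OR retrieval model named above can be sketched in a few lines: any document matching at least one query term is a candidate, ranked by a TF-IDF-weighted sum. A minimal sketch, assuming a toy corpus and whitespace tokenization (this is not the SMART system itself):

```python
from collections import Counter
from math import log

def tf_idf_rank(query, docs):
    """Rank docs (lists of tokens) against a free-text query (scored-OR)."""
    n = len(docs)
    df = Counter()                       # document frequency per term
    for doc in docs:
        df.update(set(doc))
    scores = []
    q_terms = query.lower().split()
    for i, doc in enumerate(docs):
        tf = Counter(doc)
        # scored-OR: every matching query term contributes; none is required
        score = sum(tf[t] * log(1 + n / df[t]) for t in q_terms if df[t])
        if score > 0:
            scores.append((score, i))
    return [i for _, i in sorted(scores, reverse=True)]

docs = [
    "gerry salton cornell information retrieval".split(),
    "tennis racket reviews".split(),
    "salton sea california".split(),
]
print(tf_idf_rank("gerry salton", docs))  # → [0, 2]
```

The key property is approximate matching: document 2 lacks "gerry" but still ranks, just below the document matching both terms.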

Page 6: Intent Mining from Search Results

Life of a query

Gerry Salton

(Scored-OR 10, ([(“Gerry” or “Gerald”),0.3], [“Salton”,0.7]))

Index

• Separation between user query and backend query
• Relevance scoring and ranking
• Query-in-context summaries
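The separation between user query and backend query can be illustrated with a small rewriter that turns free text into the slide's weighted scored-OR form. The synonym table, the weights, and the last-term-is-rarer heuristic are all invented for illustration:

```python
SYNONYMS = {"gerry": ["gerald"]}   # hypothetical rewrite dictionary
LAST_TERM_WEIGHT = 0.7             # assumed: final term (often a surname) is rarer
OTHER_TERM_WEIGHT = 0.3

def to_backend_query(user_query, depth=10):
    """Translate a free-text user query into a (Scored-OR, depth, clauses) tuple."""
    terms = user_query.lower().split()
    clauses = []
    for i, term in enumerate(terms):
        variants = [term] + SYNONYMS.get(term, [])
        weight = LAST_TERM_WEIGHT if i == len(terms) - 1 else OTHER_TERM_WEIGHT
        clauses.append((variants, weight))
    return ("Scored-OR", depth, clauses)

print(to_backend_query("Gerry Salton"))
# → ('Scored-OR', 10, [(['gerry', 'gerald'], 0.3), (['salton'], 0.7)])
```

The user never sees the rewritten form; the index only ever sees the rewritten form.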

Page 7: Intent Mining from Search Results

Why Does it Work?

Page 8: Intent Mining from Search Results

Semantic Meta-Data

Segment                      Tail   Overall
All Queries                  100%   100%
Word Count > 4                41%    20%
Misspelled                    21%    11%
Perfect Matches Popularity    28%    54%
Partial Matches Popularity    45%    28%
No Matches Popularity          9%     7%

Page 9: Intent Mining from Search Results

RESULT SET MINING

Page 10: Intent Mining from Search Results

Query Expansion

• [Gerry Salton] → [Gerry Salton Cornell]
• Disambiguation via Expansion
• Pseudo Relevance Feedback (Evans)
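Pseudo relevance feedback can be sketched as: assume the top results for the original query are relevant, count the terms they contain that the query lacks, and append the most frequent one. The snippets below are invented, and a real system would weight terms rather than just count them:

```python
from collections import Counter

def expand_query(query, top_results, n_terms=1):
    """Append the most frequent non-query term from assumed-relevant results."""
    q_terms = set(query.lower().split())
    counts = Counter()
    for snippet in top_results:
        counts.update(t for t in snippet.lower().split() if t not in q_terms)
    expansion = [t for t, _ in counts.most_common(n_terms)]
    return query + " " + " ".join(expansion)

top_results = [                       # invented result snippets
    "gerry salton professor at cornell",
    "salton founded the cornell information retrieval group",
    "gerard salton cornell university",
]
print(expand_query("Gerry Salton", top_results))  # → 'Gerry Salton cornell'
```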

Page 11: Intent Mining from Search Results

Life of a query (2)

Gerry Salton

(Scored-OR 10, ([(“Gerry” or “Gerald”),0.3], [“Salton”,0.7]))

Index

Gerry Salton → Gerry Salton Cornell

• Result Set Analysis
• Automated Query Expansion
• Reranking

Page 12: Intent Mining from Search Results

Spelling Correction

• Session Log Mining
• Multiple Queries with Blending
• Behavioral feedback loop

Blend(Scored-AND(200, “britinay”, “spares”), Scored-AND(200, “britney”, “spears”))

Scored-AND(200, OR(“britinay”, “britney”), OR(“spares”, “spears”))
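The session-log mining behind the speller can be sketched as follows: when a user retypes a query within a session and the retype is a small edit away, the (before, after) pair is a correction candidate, and frequent pairs feed blended queries like the one above. The sessions and the edit-distance threshold here are illustrative:

```python
from collections import Counter

def edit_distance(a, b):
    """Levenshtein distance via the classic dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def mine_corrections(sessions, max_dist=4):
    """Count consecutive same-session query pairs that look like spelling fixes."""
    pairs = Counter()
    for session in sessions:
        for before, after in zip(session, session[1:]):
            if before != after and edit_distance(before, after) <= max_dist:
                pairs[(before, after)] += 1
    return pairs

sessions = [                                  # invented session logs
    ["britinay spares", "britney spears"],
    ["britinay spares", "britney spears"],
    ["weather boston"],
]
corrections = mine_corrections(sessions)
print(corrections[("britinay spares", "britney spears")])  # → 2
```

This is the behavioral feedback loop in miniature: user retypes supply the training pairs that later users never have to retype.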

Page 13: Intent Mining from Search Results

Web Search

Gerry Salton

• Speller
• Synonyms

(Scored-AND 200, “Gerry”, “Salton”)

[Diagram: the query fans out to federated indexes (100B documents) plus Local and News verticals; staged reranking then narrows the results: First Stage reRanking 100K, Second Stage reRanking 5K, Third Stage reRanking 50]

• Query Understanding
• Federation
• ReRanking and Blending
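The staged reranking cascade can be sketched as successive sort-and-truncate passes, each with a more expensive scorer on a smaller pool. The stage sizes and scoring functions below are toys (the slide's 100K/5K/50 stages compressed to two):

```python
def cascade(candidates, stages):
    """stages: list of (scorer, keep_n); each pass reranks and truncates."""
    pool = list(candidates)
    for scorer, keep_n in stages:
        pool = sorted(pool, key=scorer, reverse=True)[:keep_n]
    return pool

docs = list(range(1000))                      # stand-ins for 100B documents
cheap = lambda d: d % 97                      # fast, rough first-stage proxy
expensive = lambda d: (d % 97) * 2 + (d % 7)  # slower, finer-grained scorer

stages = [(cheap, 100), (expensive, 10)]
top = cascade(docs, stages)
print(len(top))  # → 10
```

The economics are the point: the expensive scorer only ever touches the survivors of the cheap pass, so its cost is independent of corpus size.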

Page 14: Intent Mining from Search Results

• Entity Detection
• Grouping
• Summarization

Page 15: Intent Mining from Search Results

Post Result Triggering

• Alternative to Answer Blending
• Structured Data Integration
• Off-page data joins
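Post-result triggering can be sketched as: inspect the entities mentioned by the organic results and show a structured card only when enough of them agree on an entity the structured store knows. The entity table and vote threshold are invented for illustration:

```python
ENTITY_CARDS = {  # hypothetical structured-data store
    "gerard salton": "Gerard Salton, IR pioneer, Cornell",
}

def trigger_card(result_entities, min_votes=2):
    """Show a card only if enough top results agree on a known entity."""
    votes = {}
    for ent in result_entities:
        votes[ent] = votes.get(ent, 0) + 1
    for ent, n in sorted(votes.items(), key=lambda kv: -kv[1]):
        if n >= min_votes and ent in ENTITY_CARDS:
            return ENTITY_CARDS[ent]
    return None

print(trigger_card(["gerard salton", "gerard salton", "salton sea"]))
```

Unlike answer blending, the decision is made after retrieval, so the organic results themselves vote on whether the structured join fires.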

Page 16: Intent Mining from Search Results

Grouping

• Reranked Results
• Compressed Presentation
• Coherently grouped
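The grouping step can be sketched as collapsing a reranked list by host, so each host surfaces once with its extra hits folded underneath; the URLs and per-group limit are illustrative:

```python
from collections import OrderedDict
from urllib.parse import urlparse

def group_by_host(ranked_urls, per_group=2):
    """Collapse a ranked URL list so each host appears once, in rank order."""
    groups = OrderedDict()
    for url in ranked_urls:
        host = urlparse(url).netloc
        groups.setdefault(host, [])
        if len(groups[host]) < per_group:
            groups[host].append(url)
    return list(groups.items())

ranked = [                                        # invented result URLs
    "https://en.wikipedia.org/wiki/Gerard_Salton",
    "https://www.cs.cornell.edu/salton",
    "https://en.wikipedia.org/wiki/SMART_Information_Retrieval_System",
    "https://www.cs.cornell.edu/salton/bio",
    "https://en.wikipedia.org/wiki/Vector_space_model",
]
for host, urls in group_by_host(ranked):
    print(host, len(urls))
```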

Page 17: Intent Mining from Search Results

Summary

• Web Queries are not User Intent
  – Suffer from ambiguity and errors

• Intent can be mined from results
  – Query Correction
  – Disambiguation
  – Grouping and Organization