The Limited Use of Relevance Feedback in Internet Search Engines (with penguins) Paul Debraski April...

31
The Limited Use of Relevance Feedback in Internet Search Engines (with penguins) Paul Debraski April 27, 2004

description

Relevance Feedback “The process of obtaining relevance information and using it in a further search.” --Croft and Harper (1972)

Transcript of The Limited Use of Relevance Feedback in Internet Search Engines (with penguins) Paul Debraski April...

Page 1: The Limited Use of Relevance Feedback in Internet Search Engines (with penguins) Paul Debraski April 27, 2004.

The Limited Use of Relevance Feedback in Internet Search

Engines(with penguins)

Paul DebraskiApril 27, 2004

Page 2: The Limited Use of Relevance Feedback in Internet Search Engines (with penguins) Paul Debraski April 27, 2004.

Relevance “Since information science first began to coalesce into a distinct discipline in the forties and fifties, relevance has been identified as its fundamental and central concept.”

--Schamber (1990)

Page 3: The Limited Use of Relevance Feedback in Internet Search Engines (with penguins) Paul Debraski April 27, 2004.

Relevance Feedback

“The process of obtaining relevance information and using it in a further search.”

--Croft and Harper (1972)

Page 4: The Limited Use of Relevance Feedback in Internet Search Engines (with penguins) Paul Debraski April 27, 2004.

Yes, yes, use it!

“In view of the simplicity of the query modification operation, the relevance feedback process should be incorporated into operational text retrieval environments.”

--Salton and Buckley (1990)

Page 5: The Limited Use of Relevance Feedback in Internet Search Engines (with penguins) Paul Debraski April 27, 2004.

“Relevance Revolution”

“There has been increasing acceptance that stated requests are not the same as information needs, and that consequently relevance should be judged in relation to the needs rather than stated requests.”

--Robertson & Hancock-Beaulieu (1992)

Page 6: The Limited Use of Relevance Feedback in Internet Search Engines (with penguins) Paul Debraski April 27, 2004.

What do you think? “Users and their questions are

fundamental to all kinds of information systems, and human decisions and human-system interactions are by far the most important variables in processes dealing with searching for and retrieval of information. Nevertheless, it is nothing but short of amazing how relatively little knowledge and understanding in a scientific sense we have about these factors.” --Saracevic (1988)

Page 7: The Limited Use of Relevance Feedback in Internet Search Engines (with penguins) Paul Debraski April 27, 2004.

Automatic v Interactive

“They want answers rather than pointers; they want document delivery rather than information retrieval… [and they] want to achieve their goals with a minimum of cognitive load and a maximum of enjoyment.”

--Marchionini (1992)

Page 8: The Limited Use of Relevance Feedback in Internet Search Engines (with penguins) Paul Debraski April 27, 2004.

“Users want magic”

“User control itself was not enough to overcome the effect of the other factors which might affect preference, in particular that of effort.” --Belkin et al (1999)

Page 9: The Limited Use of Relevance Feedback in Internet Search Engines (with penguins) Paul Debraski April 27, 2004.

Reality (the Internet)

“We found it surprising that relevance feedback was so seldom utilized.”

--Spink (2000, researching EXCITE)

“Query modification was not a typical occurrence.”

Page 10: The Limited Use of Relevance Feedback in Internet Search Engines (with penguins) Paul Debraski April 27, 2004.

It gets worse“Most users searched with one query only and did not follow with successive queries.”

“The mean number of terms per query was 1.98 for the relevance feedback population and 2.2 for the larger population.”

--Spink (2000)

Page 11: The Limited Use of Relevance Feedback in Internet Search Engines (with penguins) Paul Debraski April 27, 2004.

In fact

“The additional task of judging feedback terms is itself a difficult one which users will avoid.”

--Anick (2000 researching AltaVista)

Page 12: The Limited Use of Relevance Feedback in Internet Search Engines (with penguins) Paul Debraski April 27, 2004.

So who does use it?“As for user characteristics of the relevance feedback population, they do not appear to differ in terms of sophistication from the other Web users, but they exhibit more doggedness in attempting to locate relevance information.”

--Jansen, Spink, Saracevic (1999)

Page 13: The Limited Use of Relevance Feedback in Internet Search Engines (with penguins) Paul Debraski April 27, 2004.

Subuyliminal

“Many users did not even notice feedback terms when embedded within an already textually full results page. Those that did notice them often interpreted them as directories or advertising.”

--Anick (2000)

Page 14: The Limited Use of Relevance Feedback in Internet Search Engines (with penguins) Paul Debraski April 27, 2004.

The debate “Our findings

emphasize the need to approach design of Web IR systems, search engines, and even Web site design in a significantly different way than the design of IR systems as practiced to date”

--Spink (2000)

“The people who use your website most

will be among your most valuable

customers. Unless the original design

was deeply flawed, they will likely hate

any changes you make”

--McGovern (2000)

Page 15: The Limited Use of Relevance Feedback in Internet Search Engines (with penguins) Paul Debraski April 27, 2004.

So what did I do?

Page 16: The Limited Use of Relevance Feedback in Internet Search Engines (with penguins) Paul Debraski April 27, 2004.

The About network consists of hundreds of Guide sites neatly organized into 23 channels. The sites cover more than 50,000 subjects with over 1 million links to the best resources on the Net and the fastest-growing archive of high quality original content. Topics range from pregnancy to cars, palm pilots to painting, weight loss to video game strategies. No one has greater depth and breadth than About.

Page 17: The Limited Use of Relevance Feedback in Internet Search Engines (with penguins) Paul Debraski April 27, 2004.

During the spring of 1995, scientists at Digital Equipment Corporation’s Research lab in Palo Alto, CA, devised a way to store every word of every HTML page on the Internet in a fast, searchable index. This led to AltaVista’s development of the first searchable, full-text database on the World Wide Web. Most advanced Internet search features and capabilities: multimedia search, translation & language recognition, and specialty search.

Page 18: The Limited Use of Relevance Feedback in Internet Search Engines (with penguins) Paul Debraski April 27, 2004.

Imagine a search engine that could read your mind, one that understands what you are thinking when you type in a query. What you’ve imagined is Natural Language Processing (NLP). When we chat with friends, we speak in casual, conversational tones. It should be the same thing when you’re looking for information online.

With NLP, Jeeves is able to understand the context of what you are asking, and he can thus to offer you answers and search suggestions in the same human terms in which we all communicate. You can see this technology in action in our related search terms and editorially selected answers. It’s natural because it’s what comes to us innately. [sic]

Page 19: The Limited Use of Relevance Feedback in Internet Search Engines (with penguins) Paul Debraski April 27, 2004.

Ask.comTeoma, which means ‘expert’ in Gaelic, is unlike any other search engine out there. Now, we could throw a lot of fancy terms at you, like refinement and relevance and advanced algorithms. And all of these describe what makes Teoma so powerful. But, what’s really important for you to know is that Teoma adds a new dimension to your search results-authority. Instead of ranking results based upon the sites with the most links leading to them, Teoma analyzes the Web as it naturally occurs - in its subject-specific communities - to determine which sites are most relevant. Teoma is unique from any other search technology because it analyzes the Web as it actually exists - in subject-specific communities. This process begins by creating a comprehensive and high-quality index. Web crawling is an essential tool for this approach, and it ensures that we have the most up-to-date search results.

Page 20: The Limited Use of Relevance Feedback in Internet Search Engines (with penguins) Paul Debraski April 27, 2004.

PageRank relies on the uniquely democratic nature of the web by using its vast link structure as an indicator of an individual page’s value. In essence, Google interprets a link from page A to page B as a vote, by page A, for page B. But, Google looks at more than the sheer volume of votes, or links a page receives; it also analyzes the page that casts the vote. Votes cast by pages that are themselves “important” weigh more heavily and help to make other pages “important.”

Page 21: The Limited Use of Relevance Feedback in Internet Search Engines (with penguins) Paul Debraski April 27, 2004.

GoogleImportant, high-quality sites receive a higher PageRank, which Google remembers each time it conducts a search. Of course, important pages mean nothing to you if they don’t match your query. So, Google combines PageRank with sophisticated text-matching techniques to find pages that are both important and relevant to your search. Google goes far beyond the number of times a term appears on a page and examines all aspects of the page’s content (and the content of the pages linking to it) to determine if it’s a good match for your query.

Page 22: The Limited Use of Relevance Feedback in Internet Search Engines (with penguins) Paul Debraski April 27, 2004.
Page 23: The Limited Use of Relevance Feedback in Internet Search Engines (with penguins) Paul Debraski April 27, 2004.

At last, Penguins

• About.com: penguin.com• AltaVista: penguincomputing.com• Ask.com: penguin.co.uk• Excite: penguin.co.uk• Google: penguin.com• Lycos: penguin.com• Yahoo!: penguin.co.uk

Page 24: The Limited Use of Relevance Feedback in Internet Search Engines (with penguins) Paul Debraski April 27, 2004.

And yet,

• About: search About Network

• AltaVista: ‘Refine your search’– Linux Penguin, Linux Penguin Logo, New

Zealand, Antarctica

• Ask.com: ‘Related Searches’– Penguin books, penguin habitats, emporer

penguins

Page 25: The Limited Use of Relevance Feedback in Internet Search Engines (with penguins) Paul Debraski April 27, 2004.

Further,

• Excite: ‘Refine Your Results’– Books (12) [New Zealand (2), Gifts (2)]– Linux (10) [Computing (2) Mascot (2)]– Animal (8)– CCM Swiveling (5) [Motorola v60 (2)]

• Google: ‘Similar pages’

Page 26: The Limited Use of Relevance Feedback in Internet Search Engines (with penguins) Paul Debraski April 27, 2004.

Finally,

• Lycos: ‘Narrow Your Search’– Penguin Books, Poke the Penguin, Penguin

Putnam, Penguin Publishing

• Yahoo!: ‘Related,’ ‘More,’ ‘Show All’– Penguin Game, Penguin facts, Pittsburgh

Penguins, Penguin Jokes, Penguin Mints, Penguin Gore, Macaroni Penguin, Penguin Tux, Extreme Penguin, Jackass Penguin, Penguin Munsingwear

Page 27: The Limited Use of Relevance Feedback in Internet Search Engines (with penguins) Paul Debraski April 27, 2004.

Inflatable!

• About: penguin-place.com• AltaVista: Bullet Holed Messenger• Ask.com: Inflatable Linux Penguin (suse.com)• Excite: creatableinflatables.com• Google: Linux Journal Store• Lycos: penguin-place.com• Yahoo!: Bullet Holed Messenger

Page 28: The Limited Use of Relevance Feedback in Internet Search Engines (with penguins) Paul Debraski April 27, 2004.

Refined Inflatables

• About: Search about network

• AltaVista: ‘Refine Your Search’– Linux Penguin, New Zealand, Baseball Cap,”

Geeko

• Ask.com: ‘Related Searches’– Penguin Toys, Blow Up Animals, Elvis Wigs,

Furry Handcuffs, Hen Night L Plate

Page 29: The Limited Use of Relevance Feedback in Internet Search Engines (with penguins) Paul Debraski April 27, 2004.

More

• Excite: ‘Refine Your Results’– Toys, Doll (10) [Dinosaur (3), Ebay (3)]– Airblown (10) [Gemmy (5), Christmas (2)]– Signs, YARD (6) [Your milestone (4)]– Emperor, Tall (4)

• Google: narrow Your Search

Page 30: The Limited Use of Relevance Feedback in Internet Search Engines (with penguins) Paul Debraski April 27, 2004.

Lastly,

• Lycos: ‘Narrow Your Search’– Penguin Books, Poke the Penguin, Penguin

Putnam, Penguin Publishing– (same)

• Yahoo!: ‘Related,’ ‘More,’ ‘Show All’– nothing

Page 31: The Limited Use of Relevance Feedback in Internet Search Engines (with penguins) Paul Debraski April 27, 2004.

Feeling lucky?

• Sponsored Results• Pop up ads!

• Altruism?