Coverage and Independence: Measuring Quality in Web Search Results Panagiotis Takis Metaxas Lilia...

Coverage and Independence: Measuring Quality in Web Search Results Panagiotis Takis Metaxas Lilia Ivanova Eni Mustafaraj Department of Computer Science Wellesley College, USA


Page 1:

Coverage and Independence: Measuring Quality in Web Search Results

Panagiotis Takis Metaxas

Lilia Ivanova

Eni Mustafaraj

Department of Computer Science

Wellesley College,

USA

Page 2:

Precision and Recall in Traditional IR

Page 3:

Precision and Recall in Web IR

High Precision is easy to achieve but does not convey useful information

Recall is uninteresting and cannot be computed accurately because of the enormous size of the web

85% of Web searchers never look past the top-10 results!

Page 4:

But what is Quality?

Page 5:

Quality when searching controversial issues?

Page 6:

Quality when searching Political Issues?

But Google is usually so good at finding info… Why does it do that?

Page 7:

Define Search Quality in a web-meaningful way

Comprehensive Coverage = Lack of bias towards some search results

For a controversial issue (at a minimum): cover the pro, con and balanced opinions

For k opinions, and top-N results:

expected # of results / opinion: N/k

Coverage Bias = total distance from N/k
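The coverage bias above can be illustrated with a minimal sketch. It assumes that "total distance" means the sum of absolute deviations of each opinion's result count from N/k; the slides do not spell out the distance function. The name `coverage_bias` and the example counts are hypothetical.

```python
def coverage_bias(counts):
    """Sum of absolute deviations of per-opinion counts from N/k.

    counts[i] is the number of top-N results that express opinion i.
    Assumption: "total distance" is read as an absolute (L1) distance.
    """
    k = len(counts)
    if k == 0:
        raise ValueError("need at least one opinion")
    n = sum(counts)          # N: number of classified top results
    expected = n / k         # expected results per opinion
    return sum(abs(c - expected) for c in counts)


# Hypothetical top-10 split into pro, con and balanced results.
print(coverage_bias([7, 2, 1]))   # skewed towards one opinion
print(coverage_bias([4, 3, 3]))   # close to the even split
```

A perfectly even split gives a bias of zero, and the bias grows as the results concentrate on a single opinion.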

Page 8:

Define Search Quality in a web-meaningful way

Comprehensive Coverage = Lack of bias towards some search results

(bad coverage) 0 ≤ C ≤ 1 (good coverage)

Now we can talk about, e.g., 60% coverage
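One way to obtain a coverage value on the 0 to 1 scale is sketched below. The normalization is an assumption, not the formula from the talk: the bias is taken as the sum of absolute deviations from N/k and divided by its largest possible value, 2N(k-1)/k, which occurs when all N results express a single opinion. The name `coverage` is hypothetical.

```python
def coverage(counts):
    """Coverage in [0, 1]: 1 for an even split, 0 for a single opinion.

    counts[i] is the number of top-N results that express opinion i.
    Assumption: C = 1 - bias / max_bias, with an absolute-distance bias.
    """
    k = len(counts)
    n = sum(counts)
    if k < 2 or n == 0:
        raise ValueError("need at least two opinions and one result")
    expected = n / k
    bias = sum(abs(c - expected) for c in counts)
    max_bias = 2 * n * (k - 1) / k   # all N results on one opinion
    return 1 - bias / max_bias


# Hypothetical top-10 with 7 "pro" and 3 "con" results.
print(coverage([7, 3]))
```

Under this normalization the hypothetical 7-to-3 split works out to 60% coverage.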

Page 9:

Define Search Quality in a web-meaningful way

Independent search results = Results that are not dependent due to spamming

u: URL Dependency

r: Redirection Dependency

c: Content Dependency

l: Link Dependency

Page 10:

Example of Dependent Results: Google’s “HGH benefits”

Redirection dependency URL dependency

Table 1: Top-10 results of Google when given the query “HGH benefits” for August 2007 and September 2007. For each entry we have calculated the size of the backGraph as (|V|, |E|) revealed by the Google API and the change between these two dates.

Page 11:

Example of Dependent Results: Yahoo’s “Is ADHD a real disease”

Link dependency Content dependency

Table 4: Top-10 results of the Yahoo search engine when given the query “Is ADHD a real disease” (August and September 2007).

Page 12:

Define Search Quality in a web-meaningful way

Independent search results = Results that are not dependent due to spamming

u: URL Dependency

r: Redirection Dependency

c: Content Dependency

l: Link Dependency

(total dependence) 0 ≤ Independence ≤ 1 (total independence)
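A score on this scale could be computed along the following lines. This is a sketch under stated assumptions, not the measure used in the talk: results linked by any of the four dependency types (u, r, c, l) are merged into groups, and the score is (groups - 1) / (N - 1), so a single group gives 0 and N separate groups give 1. The function name, the pair format and the example pairs are hypothetical.

```python
def independence(n_results, dependent_pairs):
    """Score in [0, 1] from pairwise dependencies among top-N results.

    dependent_pairs holds (i, j, kind) tuples, where i and j are result
    positions and kind is one of "u", "r", "c", "l".
    Assumption: the score is (number of groups - 1) / (N - 1).
    """
    if n_results < 2:
        raise ValueError("need at least two results")
    parent = list(range(n_results))

    def find(x):
        # Follow parent links to the group representative.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, _kind in dependent_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b   # merge the two groups

    groups = len({find(i) for i in range(n_results)})
    return (groups - 1) / (n_results - 1)


# Hypothetical top-10: results 0 and 1 redirect to the same page,
# results 2 and 3 share a URL pattern.
print(independence(10, [(0, 1, "r"), (2, 3, "u")]))
```

Grouping by connected results means a chain of dependencies (A depends on B, B on C) counts as one source rather than three.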

Page 13:

Evaluating Quality of 3 Search Results

Query with commercial interest: “Human Growth Hormone (HGH) benefits”

Query with medical interest: “Is ADHD a real disease?”

Query with political interest: “Morality of abortions”

Page 14:

Evaluating Quality of 3 Search Results

[Charts: Coverage of Google; Coverage of Yahoo; Independence]

Our results show low coverage for controversial questions that are not highly pursued and higher coverage for an issue that is highly pursued (“Abortion”).

They also show high independence of results that are not highly pursued and higher independence for an issue that is highly pursued (“Abortion”).

There is significant overlap between the top-10 returns of both Yahoo and Google results!
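The overlap between two top-10 lists can be quantified with a short sketch. It assumes results are compared by exact URL match and uses the Jaccard index (shared URLs divided by all distinct URLs) as the overlap measure; neither choice comes from the slides. The function name and the example.com URLs are hypothetical.

```python
def top_n_overlap(results_a, results_b):
    """Return (shared URLs, Jaccard index) for two result lists.

    Assumption: two results are the same if their URLs match exactly.
    """
    set_a, set_b = set(results_a), set(results_b)
    shared = set_a & set_b
    union = set_a | set_b
    jaccard = len(shared) / len(union) if union else 0.0
    return sorted(shared), jaccard


# Hypothetical result lists from two engines.
engine_a = ["http://example.com/a", "http://example.com/b", "http://example.com/c"]
engine_b = ["http://example.com/b", "http://example.com/c", "http://example.com/d"]
shared, score = top_n_overlap(engine_a, engine_b)
print(shared, score)
```

In practice URLs would likely need normalizing (scheme, trailing slash, redirects) before comparison; that step is omitted here.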

Page 15:

Comparing visible neighborhoods

Google

Yahoo

Both

Page 16:

Coverage and Independence: Measuring Quality in Web Search Results

Panagiotis Takis Metaxas

Department of Computer Science

Wellesley College,

USA

Thank you!

Page 17:

Example of Dependent Results: Yahoo’s “HGH benefits”

Page 18:

Example of Dependent Results: Google’s “Is ADHD a real disease”

Page 19:

Example of Dependent Results: Google’s “Morality of Abortion”

Page 20:

Example of Dependent Results: Yahoo’s “Morality of Abortion”