Overview of the TREC 2016 Open Search track: Academic Search Edition


Transcript of Overview of the TREC 2016 Open Search track: Academic Search Edition

Page 1: Overview of the TREC 2016 Open Search track: Academic Search Edition

Overview of the TREC 2016 Open Search track

Academic search edition

Krisztian Balog, University of Stavanger

@krisztianbalog

25th Text REtrieval Conference (TREC 2016) | Gaithersburg, 2016

Anne Schuth, Blendle

@anneschuth

Page 2: Overview of the TREC 2016 Open Search track: Academic Search Edition

USERS

TREC assessors VS "unsuspecting users"

Page 3: Overview of the TREC 2016 Open Search track: Academic Search Edition

THE DATA DIVIDE

INDUSTRY ACADEMIA

Page 4: Overview of the TREC 2016 Open Search track: Academic Search Edition

WHAT IS OPEN SEARCH?

Open Search is a new evaluation paradigm for IR. The experimentation platform is an existing search engine. Researchers have the opportunity to replace components of this search engine and evaluate these components using interactions with real, "unsuspecting" users of this search engine.

Page 5: Overview of the TREC 2016 Open Search track: Academic Search Edition

WHY OPEN SEARCH?

• Because it opens up the possibility for people outside search organizations to do meaningful IR research

• Meaningful includes:
  • Real users of an actual search system
  • Access to the same data

Page 6: Overview of the TREC 2016 Open Search track: Academic Search Edition

RESEARCH QUESTIONS

• How does online evaluation compare to offline, Cranfield-style evaluation?
  • Would systems be ranked differently?
  • How stable are such system rankings?

• How much interaction volume is required to be able to reach reliable conclusions about system behavior?
  • How many queries are needed?
  • How many query impressions are needed?
  • To what degree does it matter how query impressions are distributed over queries?

Page 7: Overview of the TREC 2016 Open Search track: Academic Search Edition

RESEARCH QUESTIONS (2)

• Should systems be trained or optimized differently when the objective is online performance?

• What are questions that cannot be answered about a specific task (e.g., scientific literature search) using offline evaluation?

• How much risk do search engines that serve as the experimental platform take?

• How can this risk be controlled while still being able to experiment?

Page 8: Overview of the TREC 2016 Open Search track: Academic Search Edition

LIVING LABS METHODOLOGY

Page 9: Overview of the TREC 2016 Open Search track: Academic Search Edition

KEY IDEAS

• An API orchestrates all the data exchange between sites (live search engines) and participants

• Focus on frequent (head) queries
  • Enough traffic on them for experimentation

• Participants generate rankings offline and upload these to the API
  • Eliminates real-time requirement
  • Freedom in choice of tools and environment

K. Balog, L. Kelly, and A. Schuth. Head First: Living Labs for Ad-hoc Search Evaluation. CIKM'14

Page 10: Overview of the TREC 2016 Open Search track: Academic Search Edition

OVERVIEW

[Figure: living labs architecture. Users interact with the live site; the live site and the experimental systems exchange all data through the API.]

K. Balog, L. Kelly, and A. Schuth. Head First: Living Labs for Ad-hoc Search Evaluation. CIKM'14

Page 11: Overview of the TREC 2016 Open Search track: Academic Search Edition

METHODOLOGY (1)


• Sites make queries, candidate documents (items), historical search and click data available through the API
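To make this step concrete, below is a minimal sketch of how a participant client might pull the exposed queries and candidate documents from the track API. It is written under assumptions: the host name, endpoint paths, API key handling, and response shapes are placeholders for illustration, not the documented interface (see http://trec-open-search.org for the actual API).

import requests  # standard HTTP client; any equivalent works

# Both values are placeholders for this sketch, not real track settings.
API_BASE = "http://example-living-labs-api.org/api"
API_KEY = "YOUR-PARTICIPANT-KEY"

def get_queries():
    """Fetch the (head) queries a site has exposed for experimentation.
    Endpoint path and response shape are assumptions for illustration."""
    r = requests.get("%s/participant/query/%s" % (API_BASE, API_KEY))
    r.raise_for_status()
    return r.json()["queries"]

def get_doclist(qid):
    """Fetch the candidate documents for one query; experimental rankings
    must be built from this candidate set. Also an assumed endpoint."""
    r = requests.get("%s/participant/doclist/%s/%s" % (API_BASE, API_KEY, qid))
    r.raise_for_status()
    return [d["docid"] for d in r.json()["doclist"]]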

Page 12: Overview of the TREC 2016 Open Search track: Academic Search Edition

METHODOLOGY (2)


• Rankings are generated (offline) for each query and uploaded to the API

Page 13: Overview of the TREC 2016 Open Search track: Academic Search Edition

METHODOLOGY (3)


• When any of the test queries is fired on the live site, it requests an experimental ranking from the API and interleaves it with that of the production system


Page 14: Overview of the TREC 2016 Open Search track: Academic Search Edition

INTERLEAVING

System A: doc 1, doc 2, doc 3, doc 4, doc 5
System B: doc 2, doc 4, doc 7, doc 1, doc 3
Interleaved list: doc 1, doc 2, doc 4, doc 3, doc 7

Inference: A > B

• Experimental ranking is interleaved with the production ranking

• Needs 1-2 orders of magnitude less data than A/B testing (also, it is a within-subject as opposed to a between-subject design)

Page 15: Overview of the TREC 2016 Open Search track: Academic Search Edition

INTERLEAVING

System A: doc 1, doc 2, doc 3, doc 4, doc 5
System B: doc 1, doc 2, doc 3, doc 7, doc 4
Interleaved list: doc 1, doc 2, doc 3, doc 4, doc 7

Inference: tie

• Team Draft Interleaving
  • No preferences are inferred from the common prefix of A and B
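To make the mechanics concrete, here is a minimal sketch of Team Draft Interleaving with the common-prefix rule from this slide. It is an illustrative implementation under stated assumptions, not the track's exact code; the function names and the click-credit step are my own.

import random

def team_draft_interleave(ranking_a, ranking_b):
    """Minimal Team Draft Interleaving sketch. Documents in the common prefix
    of A and B belong to neither team, so clicks on them yield no preference."""
    interleaved = []
    team_of = {}  # docid -> 'A', 'B', or None (common prefix)

    # 1. Copy the common prefix without assigning it to a team.
    i = 0
    while (i < min(len(ranking_a), len(ranking_b))
           and ranking_a[i] == ranking_b[i]):
        team_of[ranking_a[i]] = None
        interleaved.append(ranking_a[i])
        i += 1

    # 2. Teams take turns contributing their highest-ranked unused document;
    #    ties in contribution count are broken by a coin flip.
    count = {'A': 0, 'B': 0}
    pools = {'A': ranking_a, 'B': ranking_b}
    while True:
        remaining = {t: [d for d in pools[t] if d not in team_of] for t in pools}
        if not remaining['A'] and not remaining['B']:
            break
        if count['A'] < count['B']:
            team = 'A'
        elif count['B'] < count['A']:
            team = 'B'
        else:
            team = random.choice(['A', 'B'])
        if not remaining[team]:            # this team's list is exhausted
            team = 'B' if team == 'A' else 'A'
        doc = remaining[team][0]
        interleaved.append(doc)
        team_of[doc] = team
        count[team] += 1
    return interleaved, team_of

def infer_preference(team_of, clicked_docs):
    """Credit clicks to teams; the team with more credited clicks wins."""
    clicks = {'A': 0, 'B': 0}
    for doc in clicked_docs:
        if team_of.get(doc) in clicks:
            clicks[team_of[doc]] += 1
    if clicks['A'] > clicks['B']:
        return 'A > B'
    if clicks['B'] > clicks['A']:
        return 'B > A'
    return 'tie'

# In the first example (A = doc 1..doc 5, B = doc 2, doc 4, doc 7, doc 1, doc 3),
# clicks on documents credited to A's team drive the inference toward 'A > B';
# in the second example the shared prefix doc 1..doc 3 carries no credit,
# which is how that impression can come out as a tie.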

Page 16: Overview of the TREC 2016 Open Search track: Academic Search Edition

METHODOLOGY (4)

• Participants get detailed feedback on user interactions (clicks)


Page 17: Overview of the TREC 2016 Open Search track: Academic Search Edition

METHODOLOGY (5)

• Evaluation measure:

  Outcome = #Wins / (#Wins + #Losses)

  • where the number of “wins” and “losses” is against the production system, aggregated over a period of time

• An Outcome of > 0.5 means beating the production system
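As a worked instance of this formula: in the CiteSeerX Round #1 table later in the deck, IAPLab has 8 wins and 3 losses, giving Outcome = 8 / (8 + 3) ≈ 0.73; the single tie is simply ignored. A minimal sketch of the aggregation follows; the per-impression 'win'/'loss'/'tie' labels are an assumed input format, not the API's actual feedback schema.

from collections import Counter

def outcome(per_impression_results):
    """Aggregate per-impression interleaving results ('win', 'loss', 'tie',
    from the experimental system's point of view against production)
    into the Outcome measure. Ties do not enter the formula."""
    counts = Counter(per_impression_results)
    wins, losses = counts["win"], counts["loss"]
    if wins + losses == 0:
        return 0.0  # reported as 0.00 in the result tables when there is no signal
    return wins / (wins + losses)

# IAPLab, CiteSeerX Round #1: 8 wins, 3 losses, 1 tie over 12 impressions.
print(outcome(["win"] * 8 + ["loss"] * 3 + ["tie"]))  # ~0.727 -> reported as 0.73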

Page 18: Overview of the TREC 2016 Open Search track: Academic Search Edition

WHAT IS IN IT FOR PARTICIPANTS?

• Access to privileged (search and click-through) data
• Opportunity to test IR systems with real, unsuspecting users in a live setting
  • Not the same as crowdsourcing!
• Continuous evaluation is possible, not limited to the yearly evaluation cycle

Page 19: Overview of the TREC 2016 Open Search track: Academic Search Edition

KNOWN ISSUES

• Head queries only
  • A considerable portion of traffic, but only popular information needs
• Lack of context
  • No knowledge of the searcher’s location, previous searches, etc.
• No real-time feedback
  • The API provides detailed feedback, but it is not immediate
• Limited control
  • Experimentation is limited to single searches, where results are interleaved with those of the production system; no control over the entire result list
• Ultimate measure of success
  • Search is only a means to an end, it is not the ultimate goal

Page 20: Overview of the TREC 2016 Open Search track: Academic Search Edition

KNOWN ISSUES


Come to the planning session tomorrow!

Page 21: Overview of the TREC 2016 Open Search track: Academic Search Edition

OPEN SEARCH 2016: ACADEMIC SEARCH

Page 22: Overview of the TREC 2016 Open Search track: Academic Search Edition

ACADEMIC SEARCH

• Interesting domain
  • Need semantic matching to overcome vocabulary mismatch
  • Different entity types (papers, authors, orgs, conferences, etc.)
  • Beyond document ranking: ranking entities, recommending related literature, etc.
• This year
  • Single task: ad hoc scientific literature search
  • Three academic search engines

Page 23: Overview of the TREC 2016 Open Search track: Academic Search Edition

TRACK ORGANIZATION

• Multiple evaluation rounds
  • Round #1: Jun 1 - Jul 15
  • Round #2: Aug 1 - Sep 15
  • Round #3: Oct 1 - Nov 15 (official TREC round)
• Train/test queries
  • For train queries, feedback is available for individual impressions
  • For test queries, only aggregated feedback is available (and only after the end of each evaluation period)
• Single submission per team

Page 24: Overview of the TREC 2016 Open Search track: Academic Search Edition

EXAMPLE RANKING

Ranking in TREC format:

R-q2 Q0 R-d70 1 0.9 MyRunID
R-q2 Q0 R-d72 2 0.8 MyRunID
R-q2 Q0 R-d74 3 0.7 MyRunID
R-q2 Q0 R-d75 4 0.6 MyRunID
R-q2 Q0 R-d1270 5 0.5 MyRunID
R-q2 Q0 R-d73 6 0.4 MyRunID
R-q2 Q0 R-d1271 7 0.3 MyRunID
R-q2 Q0 R-d71 8 0.2 MyRunID
...

Ranking to be uploaded to the API:

{
  'doclist': [
    {'docid': 'R-d70'},
    {'docid': 'R-d72'},
    {'docid': 'R-d74'},
    {'docid': 'R-d75'},
    {'docid': 'R-d1270'},
    {'docid': 'R-d73'},
    {'docid': 'R-d1271'},
    {'docid': 'R-d71'}
  ],
  'qid': 'R-q2',
  'runid': 'MyRunID'
}
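As an illustration of how a participant might produce the JSON payload above from a TREC-format run and push it to the API: the sketch below is written under assumptions; the upload endpoint, host name, and key handling are placeholders, not the track's documented interface (see http://trec-open-search.org for the actual API).

import json
from collections import defaultdict

import requests  # any HTTP client works; used here for illustration

API_BASE = "http://example-living-labs-api.org/api"  # placeholder host
API_KEY = "YOUR-PARTICIPANT-KEY"                      # placeholder credential

def trec_run_to_payloads(run_path):
    """Group a TREC-format run (qid Q0 docid rank score runid) per query and
    build the JSON structure shown on this slide."""
    per_query, runid = defaultdict(list), None
    with open(run_path) as f:
        for line in f:
            qid, _, docid, rank, _score, runid = line.split()
            per_query[qid].append((int(rank), docid))
    for qid, ranked in per_query.items():
        ranked.sort()  # order by rank
        yield qid, {"qid": qid,
                    "runid": runid,
                    "doclist": [{"docid": d} for _, d in ranked]}

def upload_run(run_path):
    for qid, payload in trec_run_to_payloads(run_path):
        # Hypothetical endpoint path; consult the API documentation for the real one.
        url = "%s/participant/run/%s/%s" % (API_BASE, API_KEY, qid)
        requests.put(url, data=json.dumps(payload),
                     headers={"Content-Type": "application/json"}).raise_for_status()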

Page 25: Overview of the TREC 2016 Open Search track: Academic Search Edition

SITES AND RESULTS

Page 26: Overview of the TREC 2016 Open Search track: Academic Search Edition

CITESEERX

• Main focus is on Computer and Information Sci.
  • http://citeseerx.ist.psu.edu/
• Queries
  • 107 test + 100 training for Rounds #1 and #2
  • 700 additional test queries for Round #3
• Documents
  • Title
  • Full document text (extracted from PDF)

Page 27: Overview of the TREC 2016 Open Search track: Academic Search Edition

CITESEERX RESULTS: ROUNDS #1 & #2

Team            | Round #1                            | Round #2
                | Outcome #Wins #Losses #Ties #Impr.  | Outcome #Wins #Losses #Ties #Impr.
UDel-IRL        | -       -     -       -     -       | 0.86    6     1       2     9
webis           | -       -     -       -     -       | 0.75    3     1       1     5
UWM             | -       -     -       -     -       | 0.67    2     1       3     6
IAPLab          | 0.73    8     3       1     12      | 0.60    3     2       1     6
BJUT            | 0.33    3     6       1     10      | 0.60    6     4       1     11
QU              | 0.50    3     3       3     9       | 0.50    3     3       1     7
Gesis           | 0.67    4     2       3     9       | 0.50    2     2       1     5
OpnSearch_404   | 0.00    0     0       1     1       | 0.50    4     4       1     9
KarMat          | 0.60    3     2       2     7       | 0.44    4     5       0     9

(-: no results reported for that round)

Page 28: Overview of the TREC 2016 Open Search track: Academic Search Edition

CITESEERX RESULTS: ROUND #3 (= OFFICIAL RANKING)

Team            Outcome  #Wins  #Losses  #Ties  #Impr.
Gesis           0.71     5      2        0      7
OpnSearch_404   0.71     5      2        2      9
KarMat          0.67     4      2        0      6
UWM             0.67     2      1        0      3
IAPLab          0.63     5      3        2      10
BJUT            0.55     44     36       15     95
UDel-IRL        0.54     33     28       14     75
webis           0.50     20     20       11     51
DaiictIr2       0.38     6      10       5      21
QU              0.25     2      6        2      10

Page 29: Overview of the TREC 2016 Open Search track: Academic Search Edition

SSOAR

• Social Science Open Access Repository
  • http://www.ssoar.info/
• Queries
  • 74 test + 57 training for Rounds #1 and #2
  • 988 additional test queries for Round #3
• Documents
  • Title, abstract, author(s), various metadata fields (subject, type, year, etc.)

Page 30: Overview of the TREC 2016 Open Search track: Academic Search Edition

SSOAR RESULTS: ROUNDS #1 & #2

Team            | Round #1                             | Round #2
                | Outcome #Wins #Losses #Ties #Impr.   | Outcome #Wins #Losses #Ties #Impr.
Gesis           | 1.00    1     0       461   462      | 1.00    1     0       96    97
UWM             | 0.60    3     2       473   478      | 1.00    1     0       94    95
QU              | 0.33    1     2       472   475      | 0.50    1     1       112   114
webis           | -       -     -       -     -        | 0.50    1     1       88    90
KarMat          | 0.80    4     1       504   509      | 0.00    0     2       84    86
IAPLab          | 0.00    0     0       148   148      | 0.00    0     0       24    24
UDel-IRL        | 0.00    0     0       11    11       | 0.00    0     1       84    85
OpnSearch_404   | 0.00    0     0       2     2        | 0.00    0     0       2     2

(-: no results reported for that round)

Page 31: Overview of the TREC 2016 Open Search track: Academic Search Edition

SSOAR RESULTS: ROUND #3 (= OFFICIAL RANKING)

Team            Outcome  #Wins  #Losses  #Ties  #Impr.
IAPLab          1.00     1      0        185    186
Gesis           0.61     11     7        5136   5154
webis           0.50     2      2        1640   1644
UDel-IRL        0.11     2      17       4723   4742
UWM             0.00     0      1        176    177
QU              0.00     0      0        179    179
KarMat          0.00     0      0        185    185
OpnSearch_404   0.00     0      0        6      6

Page 32: Overview of the TREC 2016 Open Search track: Academic Search Edition

MICROSOFT ACADEMIC SEARCH

• Research service developed by MSR
  • http://academic.research.microsoft.com/
• Queries
  • 480 test queries
• Documents
  • Title, abstract, URL
  • Entity ID in the Microsoft Academic Search Knowledge Graph

Page 33: Overview of the TREC 2016 Open Search track: Academic Search Edition

MICROSOFT ACADEMIC SEARCH: EVALUATION METHODOLOGY

• Offline evaluation, performed by Microsoft
• Head queries (139)
  • Binary relevance, inferred from historical click data
  • Traditional rank-based evaluation (MAP)
• Tail queries (235)
  • Side-by-side evaluation against a baseline production system
  • Top 10 results decorated with Bing captions
  • Relative ranking of systems w.r.t. the baseline
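Since the head-query evaluation uses MAP over binary relevance, here is a minimal reference sketch of that computation. It is the standard definition, not Microsoft's evaluation code, and the data structures are assumptions for illustration.

def average_precision(ranked_docids, relevant):
    """Average precision for one query with binary relevance judgments."""
    hits, precision_sum = 0, 0.0
    for rank, docid in enumerate(ranked_docids, start=1):
        if docid in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant document
    return precision_sum / len(relevant) if relevant else 0.0

def mean_average_precision(run, qrels):
    """MAP over a run ({qid: ranked docid list}) and binary qrels
    ({qid: set of docids judged or clicked relevant})."""
    aps = [average_precision(docs, qrels.get(qid, set())) for qid, docs in run.items()]
    return sum(aps) / len(aps) if aps else 0.0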

Page 34: Overview of the TREC 2016 Open Search track: Academic Search Edition

MICROSOFT ACADEMIC SEARCH: RESULTS

Head queries (click-based evaluation):

Team      MAP
UDEL-IRL  0.60
BJUT      0.56
webis     0.52*

* Significantly different from UDEL-IRL and BJUT

Tail queries (side-by-side evaluation):

Team      Rank
webis     #1
UDEL-IRL  #2
BJUT      #3

Page 35: Overview of the TREC 2016 Open Search track: Academic Search Edition

SUMMARY

• Ad hoc scientific literature search
  • 3 academic search engines, 10 participants
• TREC OS 2017
  • Academic search domain
  • Additional sites
  • One more subtask (recommending literature; ranking people, conferences, etc.)
  • Multiple runs per team
• Consider a second use case
  • Product search, contextual advertising, news recommendation, ...

Page 36: Overview of the TREC 2016 Open Search track: Academic Search Edition

CONTRIBUTORS

• API development and maintenance
  • Peter Dekker
• CiteSeerX
  • Po-Yu Chuang, Jian Wu, C. Lee Giles
• SSOAR
  • Narges Tavakolpoursaleh, Philipp Schaer

• MS Academic Search
  • Kuansan Wang, Tobias Hassmann, Artem Churkin, Ioana Varsandan, Roland Dittel

Page 37: Overview of the TREC 2016 Open Search track: Academic Search Edition

QUESTIONS?

http://trec-open-search.org