
Planning for the TREC 2008 Legal Track

Douglas Oard

Stephen Tomlinson

Jason Baron

Agenda

• Track goals
• Deciding on a document collection
• “Beating Boolean”
• Handling nasty OCR
• Making the best use of the metadata
• Ad hoc task design
• Interactive task design
• Relevance feedback task design
• Other issues

Track Goals

• Develop a reusable test collection
  – Documents, topics, evaluation measures

• Foster formation of a research community

• Establish baseline results

Choosing a Collection

• FERC Enron (w/attachments, full headers)
  – Somewhat larger than CMU
  – Email is the real killer app for E-discovery

• IIT CDIP version 1.0 (same as 2006/07)
  – We have 83 topics. Do we need more?

• State Department Cables
  – Task model would be FOIA, not E-Discovery

TREC Topic Number: 1

Title: Marketers or Traders of Electricity on the Financial Market

Description: Identify Enron employees who bought and sold electricity on California’s financial (long-term sales) energy market, solely for the purpose of re-buying/re-selling this energy later for a profit.

Narrative: A relevant document must at a minimum identify the name and email address of the marketer, as well as the Enron subsidiary to which he/she belonged. The marketer’s phone number would be helpful as well, to help analysis of the corresponding Enron voice dataset.

Hint: Enron Power Marketing, Inc. (EPMI), Enron Energy Services, Inc. and Enron Energy Marketing Corporation all appear to have conducted long-term marketing services for Enron. This observation is based on the fact that Enron submitted information for all three of these subsidiaries in its reply to FERC’s data request 2 (DR2). (DR2 asked Enron to submit information about its short-term and long-term sales. Enron replied with data from these three subsidiaries.) (38, pp. 1-2, plus personal analysis.) It would be good, however, to know for sure which entities or persons did marketing at Enron.

Query Possibilities:

• (marketer or marketers or “Enron Power Marketing” or EPMI or “Enron Energy Services” or “Enron Energy Marketing Corporation”)

• (marketer or marketers or “Enron Power Marketing” or EPMI or “Enron Energy Services” or “Enron Energy Marketing Corporation”) and (MW or KW or watt* or MwH or KwH)

o This is to target electricity sales rather than natural gas sales. All the subsequent electricity queries can be similarly modified.

• (marketer or marketers or EPMI) and (short or long)

o As in having a long or short position in sales/purchases.

• (marketer or marketers or EPMI) and (NYMEX or CBOT or “Mid-Columbia” or COB or “California-Oregon Border” or “Four Corners” or “Palo Verde” or EOL)

o The electricity futures hubs were Mid-Columbia, COB, Four Corners, and Palo Verde, as best the author can tell. (85) NYMEX and CBOT ran these. (89; 15, p. 78)

o EOL was the forward market trading place. (36, p. 3)
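To make the Boolean syntax above concrete, here is a minimal sketch (our illustration, not track infrastructure) of the second query possibility as a Python matcher; the tokenization and wildcard handling are simplifying assumptions.

```python
import re

def tokenize(text: str) -> list[str]:
    """Lowercase word tokenization; a real engine would also stem and normalize."""
    return re.findall(r"[a-z0-9]+", text.lower())

def has_phrase(tokens: list[str], phrase: str) -> bool:
    """True if the quoted phrase occurs as a contiguous token sequence."""
    words = tokenize(phrase)
    return any(tokens[i:i + len(words)] == words
               for i in range(len(tokens) - len(words) + 1))

def matches_query_2(text: str) -> bool:
    """Second query possibility: a marketing term AND an electricity-unit term."""
    tokens = tokenize(text)
    marketing = bool(
        {"marketer", "marketers", "epmi"} & set(tokens)
        or has_phrase(tokens, "Enron Power Marketing")
        or has_phrase(tokens, "Enron Energy Services")
        or has_phrase(tokens, "Enron Energy Marketing Corporation")
    )
    units = bool({"mw", "kw", "mwh", "kwh"} & set(tokens)
                 or any(t.startswith("watt") for t in tokens))  # watt* wildcard
    return marketing and units

print(matches_query_2("EPMI sold 500 MW on the long-term market"))  # True
```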

Identity Modeling in Enron

[Figure: an identity model graph linking one email address (redacted in this transcript as “[email protected]”) to the name and nickname mentions observed with it, e.g., “susan m scott”, “susan scott”, “scott susan”, “susan”, “sue”, “suebob”, and the username “sscott5”. Model statistics: 66,715 models; 82,084 addr-name links; 3,151 addr-nickname links; 19,708 addr-addr links.]
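The addr-name links in the figure could, for example, be harvested from message headers. The following is a hypothetical sketch (the slides do not specify the actual pipeline) using Python's standard email utilities.

```python
from collections import Counter
from email.utils import getaddresses

def addr_name_links(messages: list[dict[str, str]]) -> Counter:
    """Count (address, display-name) co-occurrences in From/To/Cc headers."""
    links: Counter = Counter()
    for msg in messages:
        for field in ("From", "To", "Cc"):
            for name, addr in getaddresses([msg.get(field, "")]):
                if name and addr:
                    links[(addr.lower(), name.lower())] += 1
    return links

msgs = [{"From": '"Susan M Scott" <sscott5@example.com>'}]
print(addr_name_links(msgs))
# Counter({('sscott5@example.com', 'susan m scott'): 1})
```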

Enron Identity Test Collections

Collection      Emails   Identities  Queries  Mention Candidates (Min. / Avg. / Max.)
Sager            1,628          627       51            1 /   4 /    11
Shapiro            974          855       49            1 /   8 /    21
Enron-subset    54,018       27,340       78            1 / 152 /   489
Enron-all      248,451      123,783       78            3 / 518 / 1,785


Example Document

Title: CIGNA WELL-BEING NEWSLETTER - FUTURE STRATEGY

Organization Authors: PMUSA, PHILIP MORRIS USA

Person Authors: HALLE, L

Document Date: 19970530

Document Type: MEMO, MEMORANDUM

Bates Number: 2078039376/9377

Page Count: 2

Collection: Philip Morris

Scanned OCR text (verbatim, illustrating the recognition errors this collection poses):

Philip Moxx's. U.S.A. x.dr~am~c. cvrrespoaa.aaBenffrts Departmext Rieh>pwna, Yfe&iaTa: Dishlbutfon Data aday 90,1997.From: Lisa FisllaSabj.csr CIGNA WeWedng Newsbttsr -Yntsre StratsUDuring our last CIGNA Aatfoa Plan meadng, tlu iasuo of wLetSae to i0op per'Irw+ngartieles aod discontinue mndia6 CIGNA Well-Being aawslener to om employees was amsiter of disanision . I Imvm done somme reaearc>>, and wanted to pruedt you with mySadings and pcdiminary recwmmeadatioa for PM's atratezy Ieprding l4aas aewelattee* .I believe .vayone'a input is valusble, and would epproolate hoarlng fmaa aaeh of you onwhetlne you concur with my reeommendatioa…

State Department Cables

[Figure: bar chart of the number of records per year, 1973–1975 (y-axis: Number of Records, 0–400,000), with series for Withdrawn, Metadata-only, and Full Text records.]

791,857 records – 550,983 of which are full text

Handling Nasty OCR

• Index pruning

• Error estimation

• Character n-grams (see the sketch after this list)

• Duplicate detection

• Expansion using a cleaner collection
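As one example of these techniques, here is a minimal character n-gram sketch (our illustration, not the track's prescribed method): overlapping n-grams let a query term match its OCR-corrupted variants in the example document above.

```python
def char_ngrams(text: str, n: int = 4) -> set[str]:
    """Overlapping character n-grams of each whitespace-delimited token."""
    grams: set[str] = set()
    for token in text.lower().split():
        padded = f"_{token}_"  # mark word boundaries
        grams.update(padded[i:i + n] for i in range(len(padded) - n + 1))
    return grams

def ngram_overlap(query: str, document: str, n: int = 4) -> float:
    """Dice coefficient between query and document n-gram sets."""
    q, d = char_ngrams(query, n), char_ngrams(document, n)
    return 2 * len(q & d) / (len(q) + len(d)) if q and d else 0.0

# Nonzero despite the OCR damage ("Newsbttsr" still shares grams with "newsletter").
print(ngram_overlap("newsletter", "CIGNA WeWedng Newsbttsr"))
```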

How to “Beat Boolean”

• Work from reference Boolean?
  – Swap out low-ranked-in for high-ranked-out (sketched below)

• Relax Boolean somehow?
  – Cover density, proximity perturbation, …
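A hypothetical rendering of the "swap" idea: keep the size of the reference Boolean set fixed, but trade its lowest-ranked members for the best documents a statistical ranker placed outside it. Function and variable names here are ours.

```python
def swap_boolean(boolean_set: set[str], ranking: list[str], k: int) -> set[str]:
    """Swap the k worst-ranked docs inside B for the k best-ranked docs outside B."""
    rank = {doc: i for i, doc in enumerate(ranking)}  # lower index = better
    in_b = sorted(boolean_set, key=lambda d: rank.get(d, len(ranking)))
    out_b = [d for d in ranking if d not in boolean_set]
    return set(in_b[: len(in_b) - k]) | set(out_b[:k])

B = {"d1", "d4", "d7"}
ranked = ["d2", "d1", "d4", "d9", "d7"]
print(sorted(swap_boolean(B, ranked, k=1)))  # ['d1', 'd2', 'd4']: d7 out, d2 in
```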

Using Metadata

• Title (term match)

• Author (social network)

• Bates number (sequence)
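For instance, the Bates number's sequence property could be exploited as below. The range-parsing convention (START/END-SUFFIX, as in the example document's 2078039376/9377) is our reading of the CDIP metadata, not something the slides specify.

```python
def bates_range(bates: str) -> tuple[int, int]:
    """Parse 'START/ENDSUFFIX', e.g. '2078039376/9377' -> (2078039376, 2078039377)."""
    start_s, end_suffix = bates.split("/")
    end = int(start_s[: len(start_s) - len(end_suffix)] + end_suffix)
    return int(start_s), end

def adjacent(b1: str, b2: str) -> bool:
    """True if two Bates ranges abut in the scanning sequence (likely filed together)."""
    (s1, e1), (s2, e2) = bates_range(b1), bates_range(b2)
    return e1 + 1 == s2 or e2 + 1 == s1

print(bates_range("2078039376/9377"))                   # (2078039376, 2078039377)
print(adjacent("2078039376/9377", "2078039378/9380"))   # True
```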

Ad Hoc Task Design

• Evaluation measures
  – R@B? P@R? Index size? (R@B and P@R are sketched after this list)
  – Error bars / statistical significance testing
  – Limits on post-hoc use of the collection?
  – What are “meaningful” differences?

• Topic design
  – Negotiation transcript?

• Inter-annotator agreement
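For reference, sketch implementations of the two measures named above, under one plausible reading: B is the size of the reference Boolean result set for the topic, and R is the number of known relevant documents.

```python
def recall_at_b(ranked: list[str], relevant: set[str], b: int) -> float:
    """R@B: fraction of all relevant documents retrieved in the top B."""
    return len(set(ranked[:b]) & relevant) / len(relevant) if relevant else 0.0

def precision_at_r(ranked: list[str], relevant: set[str]) -> float:
    """P@R: precision at depth R = |relevant| (a.k.a. R-precision)."""
    r = len(relevant)
    return len(set(ranked[:r]) & relevant) / r if r else 0.0

ranked = ["d3", "d1", "d8", "d2", "d5"]
relevant = {"d1", "d2", "d9"}
print(recall_at_b(ranked, relevant, b=4))  # top 4 holds d1, d2 -> 2/3
print(precision_at_r(ranked, relevant))    # top 3 holds d1 only -> 1/3
```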

Interactive Track Design

• Evaluation measure
  – Precision-oriented?
  – Recall-oriented?
  – Effect of assessor disagreement

Relevance Feedback Task

• Evaluation measure
  – Residual recall at B_Residual? (sketched below)

• Two-stage feedback?
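A sketch of one plausible definition of residual recall at B_Residual (an assumption, since the slide only poses the question): documents judged in the feedback round are removed from both the ranking and the relevant set before measuring.

```python
def residual_recall(ranked: list[str], relevant: set[str],
                    judged: set[str], b_residual: int) -> float:
    """Recall at B_Residual over the collection minus the feedback documents."""
    residual_ranked = [d for d in ranked if d not in judged]
    residual_relevant = relevant - judged
    if not residual_relevant:
        return 0.0
    return (len(set(residual_ranked[:b_residual]) & residual_relevant)
            / len(residual_relevant))

ranked = ["d1", "d4", "d2", "d7", "d5"]
print(residual_recall(ranked, relevant={"d2", "d4", "d5"},
                      judged={"d4"}, b_residual=2))  # residual top-2 = d1, d2 -> 1/2
```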

Some Open Questions

• Test collection reusability

– Unbiased estimates? Tight error bars?

• Why can’t we beat Boolean???
  – Different strategies? Detailed failure analysis?

• Can we improve topic formulation?
  – Structured relevance feedback?

• Is OCR masking effects we need to see?
  – Is it time for a new collection?
  – Must it be de-duped? Is metadata needed?

• Does Δscope invalidate the interactive task?