SF Women in eDiscovery Sept 2011


Transcript of SF Women in eDiscovery Sept 2011

Page 1: SF Women in eDiscovery Sept 2011

12/5/2011 1

Getting to a Manageable Review Set

– Intake Data: 100%

– Duplicates: 25%

– Non-Responsive: 20%

– Produced: 12.25%

– NR/Priv: 20%

– Responsive & Priv: 15%

– Junk/Spam/Porn: 20%

These figures vary based upon the data set received

Focus on finding, reviewing & using the “right” data, not just filtering data

Page 2: SF Women in eDiscovery Sept 2011

Review risks

Failure to collect the right data

Failure to find responsive documents

Failure to recognize responsive documents

Failure to recognize privileged documents

Inconsistent treatment of documents (e.g., duplicates)

Failure to complete project in a timely manner

Sophisticated Tools

– Understand What They Do and Don’t Do Well

– Inform Yourself, Speak to References, Consultants

Page 3: SF Women in eDiscovery Sept 2011

Search Methodologies

– Keyword: specific exact words, proximity searches, stemming

– Ontology: generalized words or phrases

– Clustering: similarity of salient features

– Social Network Analysis: relationships among relevant people

– Relationship Analysis: documents with causal or sequential relationship

Content

Concept

Context

Visualization

Measurement

Page 4: SF Women in eDiscovery Sept 2011

Myth

Keyword Searching is the Way to Go

If I agree to keyword terms, I am OK

Missing in Action (Under-inclusive)

Unwanted Extras (Over-inclusive)

Multiple subject/persons (Disambiguate)

Reality: Keyword Search is one tool among many!

Page 5: SF Women in eDiscovery Sept 2011

Keyword culling

"simple keyword searches end up being both over- and under-inclusive."

Judge Paul Grimm, Victor Stanley, Inc. v. Creative Pipe, Inc., No. MJG-06-2662, 2008 U.S. Dist. LEXIS 42025 (D. Md. May 29, 2008).

Page 6: SF Women in eDiscovery Sept 2011

Keyword Accuracy Example

– 8,553 responsive documents missed by keyword search (almost 8% of responsive documents; under-inclusive)

– Keyword search reduced the document set by only 47%

– 88% of the documents returned by keyword search were not responsive (over-inclusive)
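
The under-inclusive and over-inclusive percentages on this slide correspond to recall and precision. A minimal sketch of how the two measures are computed follows; the function name is hypothetical, and the counts in the usage lines are invented to roughly mirror the slide's percentages. They are not the case data.

```python
def precision_recall(true_pos, false_pos, false_neg):
    """Return (precision, recall) for a search result.

    true_pos:  responsive documents the search returned
    false_pos: non-responsive documents the search returned (over-inclusive)
    false_neg: responsive documents the search missed (under-inclusive)
    """
    returned = true_pos + false_pos
    responsive = true_pos + false_neg
    precision = true_pos / returned if returned else 0.0
    recall = true_pos / responsive if responsive else 0.0
    return precision, recall


# Hypothetical counts, chosen only to resemble the slide's percentages.
precision, recall = precision_recall(true_pos=920, false_pos=6747, false_neg=80)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Low precision means reviewers read many non-responsive documents; low recall means responsive documents never reach review at all.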

Page 7: SF Women in eDiscovery Sept 2011

Under-Inclusive - Missing in Action

Missing abbreviations / acronyms / clippings:

– incentive stock option but not ISO

Missing inflectional variants:

– grant but not grants, granted, granting

Missing spellings or common misspellings:

– gray but not grey

– privileged but not priviliged, priviledged, privilidged, priveliged, privelidged, priveledged, …

Missing syntactic variants:

– board of directors meeting but not meeting of the board of directors, BOD meeting, board meeting, BOD mtg, …

Missing synonyms / paraphrases:

– hire date but not start date
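
One way to reduce this kind of miss is to expand each search term into a pattern that also covers its known variants. The sketch below assumes a hand-built variant table (the `VARIANTS` dictionary and both function names are hypothetical); in practice the variants might come from a stemmer, a misspelling list, or an ontology.

```python
import re

# Hand-built variant lists taken from the examples on this slide.
VARIANTS = {
    "grant": ["grant", "grants", "granted", "granting"],
    "gray": ["gray", "grey"],
    "incentive stock option": ["incentive stock option", "ISO"],
    "hire date": ["hire date", "start date"],
}


def build_pattern(term):
    """Compile a case-insensitive whole-word pattern for a term and its variants."""
    variants = VARIANTS.get(term, [term])
    # Longest first so multi-word variants are tried before shorter ones.
    alternatives = sorted((re.escape(v) for v in variants), key=len, reverse=True)
    return re.compile(r"\b(?:" + "|".join(alternatives) + r")\b", re.IGNORECASE)


def matches(term, text):
    """True if the text contains the term or any listed variant."""
    return build_pattern(term).search(text) is not None


print(matches("grant", "The board granted 100,000 options"))
print(matches("hire date", "Her start date was in March"))
```

Expansion trades one problem for another: every added variant can also pull in unwanted matches, so the expanded list still needs to be tested against the data.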

Page 8: SF Women in eDiscovery Sept 2011

Over-Inclusive - Unwanted Extras (a)

Options

Target: Sheila was granted 100,000 options at $10

Match: What are our options for lunch?

Match in a signature line:

Amanda Wacz

Acme Stock Options Administrator

Destroy

Target: destroy evidence

Match in a disclaimer: The information in this email, and any attachments, may contain confidential and/or privileged information and is intended solely for the use of the named recipient(s). Any disclosure or dissemination in whatever form, by anyone other than the recipient is strictly prohibited. If you have received this transmission in error, please contact the sender and destroy this message and any attachments. Thank you.

Page 9: SF Women in eDiscovery Sept 2011

Over-Inclusive - Unwanted Extras (b)

alter*

Target: alter, alters, altered, altering

Matches: alternate, alternative, alternation, altercate, altercation, alterably, …

grant

Target: stock option grant

Matches names: Grant Woods, Howard Grant
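
The `alter*` example can be reproduced with regular expressions. This is a small sketch, assuming the wildcard behaves like "any word starting with alter"; the word list is drawn from the slide, and the variable names are illustrative.

```python
import re

WORDS = ["alter", "alters", "altered", "altering",
         "alternate", "alternative", "altercation"]

# Wildcard search: any word that begins with "alter".
wildcard = re.compile(r"\balter\w*\b", re.IGNORECASE)
# Explicit search: only the inflected forms that were actually wanted.
explicit = re.compile(r"\balter(?:s|ed|ing)?\b", re.IGNORECASE)

wildcard_hits = [w for w in WORDS if wildcard.fullmatch(w)]
explicit_hits = [w for w in WORDS if explicit.fullmatch(w)]

# Words the wildcard returns that the reviewer never intended to find.
extras = [w for w in wildcard_hits if w not in explicit_hits]
print(extras)  # ['alternate', 'alternative', 'altercation']
```

Listing the wanted forms explicitly removes these extras, at the cost of missing any form that was not anticipated.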

Page 10: SF Women in eDiscovery Sept 2011

Failure to Disambiguate

Words that Relate to Multiple Subjects

Example: refund is used to refer to:

– FERC-ordered refunds owed by Enron for overcharging

– Tax refunds (both corporate and personal)

– Mundane business matters

In a given matter, one might be of interest while the others are not

Page 11: SF Women in eDiscovery Sept 2011

Technology Enhanced Review: Speed, Predictable Costs, and Accuracy

Automate any portion of the review

Example from a real case

[Figure: review funnel from a real case. Stage labels: Source Data; Eliminate Duplicates & System Files; Non-Responsive Isolation (ontologies); NR by Technology Enhanced Review (removed another 18%); Responsive by Technology Enhanced Review (removed another 7%); Priv by High-Speed Manual Review. Chart values: 100%, 30%, 30%, 22%, 15%, 3%.]

Page 12: SF Women in eDiscovery Sept 2011

Example: “priv” ontology

Valuable, re-usable work product

Combines classifiers into concepts, into bigger concepts

Page 13: SF Women in eDiscovery Sept 2011

Disclaimer Detection

Disclaimers can throw off attempts to detect privileged communications

Prevalent throughout many companies, even on trivial communications

Detect them automatically, and exclude them from searches
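
A rough sketch of the idea: drop paragraphs that look like boilerplate disclaimers before running a privilege or keyword search. The cue phrases and function names here are assumptions for illustration; a working detector would be tuned to each organization's actual footer text.

```python
import re

# Hypothetical cue phrases that often appear in email disclaimers.
DISCLAIMER_CUES = re.compile(
    r"(the information in this (e-?mail|message)"
    r"|intended solely for"
    r"|received this (transmission|e-?mail|message) in error)",
    re.IGNORECASE,
)


def strip_disclaimer(body):
    """Drop paragraphs that contain a disclaimer cue phrase."""
    paragraphs = re.split(r"\n\s*\n", body)
    kept = [p for p in paragraphs if not DISCLAIMER_CUES.search(p)]
    return "\n\n".join(kept)


def search(term, body):
    """Whole-word search over the message text with disclaimers excluded."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return pattern.search(strip_disclaimer(body)) is not None


email = (
    "Lunch at noon?\n\n"
    "The information in this email may contain confidential and/or privileged "
    "information. If you have received this transmission in error, please "
    "contact the sender and destroy this message."
)
print(search("destroy", email))
print(search("privileged", email))
```

With the disclaimer paragraph excluded, neither "destroy" nor "privileged" matches this message, while the same terms in the body of a message would still be found.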

Page 14: SF Women in eDiscovery Sept 2011

[Figure: diagram of privilege detection among responsive documents. Labels: Responsive; Privileged by Actor and Term; Privileged by Actor Only; Privileged by Term Only; Privileged by Disclaimer Only; Domain of Disclaimer Detection.]

Page 15: SF Women in eDiscovery Sept 2011

Priv Logs

Expensive - But Do NOT Have to Be

In re Vioxx Products Liability Litigation (E.D. La. 2007)

Merck’s Priv Log had 30,000 items on it

– How to Make a Judge Angry

– How to Waste Client Money

– How to Attract Sanctions

Page 16: SF Women in eDiscovery Sept 2011

Transparency of Process

Discussing Review Protocols

– Provide transparent, defensible, sophisticated search based on document content

– Clustering, Ontologies, Analytics, and yes, sometimes Keywords too

Develop search methodologies for each case

– Use technology experts in consultation with case / legal experts

Results verifiable by Quality Control

– Defensible sampling

Sophisticated Tools

– Understand What They Do and Don’t Do Well

– Inform Yourself, Speak to References, Consultants
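
"Defensible sampling" for quality control can be sketched as follows: draw a random sample from the documents coded non-responsive, have reviewers check it, and report the rate of missed responsive documents with a confidence interval. The slides do not specify a method; the Wilson score interval, the function names, and all counts below are assumptions for illustration.

```python
import math
import random


def wilson_interval(hits, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 is roughly 95%)."""
    if n == 0:
        raise ValueError("sample size must be positive")
    p = hits / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)


def sample_for_qc(doc_ids, sample_size, seed=0):
    """Draw a reproducible simple random sample of document IDs for QC review."""
    rng = random.Random(seed)
    pool = list(doc_ids)
    return rng.sample(pool, min(sample_size, len(pool)))


# Hypothetical: 400 documents sampled from 50,000 coded non-responsive,
# of which QC reviewers found 6 that were actually responsive.
sample = sample_for_qc(range(50_000), 400)
low, high = wilson_interval(hits=6, n=400)
print(f"sampled {len(sample)} docs; miss rate between {low:.2%} and {high:.2%}")
```

Fixing the seed makes the sample reproducible, which helps when the process has to be explained to opposing counsel or a court.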

Page 17: SF Women in eDiscovery Sept 2011

Blair & Maron: Keyword search is incomplete

[Figure: bar chart of responsive documents (axis 0% to 100%), Predicted vs. Obtained. Predicted: what the lawyers thought they were finding. Obtained: what they actually found.]

Blair and Maron, Communications of the ACM, 28, 1985, 289-299

Page 18: SF Women in eDiscovery Sept 2011

Blair & Maron Study: 20% recall

Lawyers picked 3 key terms, B & M found 26 more

Defense: “Unfortunate incident” Plaintiff: “Disaster”

Blair and Maron: “It is impossibly difficult for users to predict the exact words, word combinations, and phrases that are used by all (or most) relevant documents and only (or primarily) by those documents.”

Blair and Maron, Communications of the ACM, 28, 1985, 289-299

Page 19: SF Women in eDiscovery Sept 2011

Predictive Coding

Page 20: SF Women in eDiscovery Sept 2011

Document categorization in Legal Discovery: Computer Classification vs. Manual Review

Herbert L. Roitblat, Anne Kershaw, & Patrick Oot

Page 21: SF Women in eDiscovery Sept 2011

[Figure: bar chart of agreement with the original review (axis 0.5 to 1) for Team A and Team B (manual review) and System C and System D (computer classification).]

Roitblat, Kershaw, & Oot, 2010, JASIST

Page 22: SF Women in eDiscovery Sept 2011

Gold Standard

Page 23: SF Women in eDiscovery Sept 2011

Turing test

Alan Turing, 1912-1954

Page 24: SF Women in eDiscovery Sept 2011

Substantial disagreement between Team A & Team B

[Figure: bar of responsive documents (axis 0 to 2000): 629 identified by Team A only, 580 by both teams, 858 by Team B only. Overlap: 28%.]

Roitblat, Kershaw, & Oot, 2010, JASIST
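
The 28% on this slide is consistent with a Jaccard overlap computed from the three counts, assuming they are read as Team A only (629), both teams (580), and Team B only (858). A short sketch, with a hypothetical function name:

```python
def jaccard(only_a, both, only_b):
    """Jaccard index: size of the intersection divided by size of the union."""
    union = only_a + both + only_b
    return both / union if union else 0.0


# Counts from the slide: Team A only, both teams, Team B only.
overlap = jaccard(only_a=629, both=580, only_b=858)
print(f"{overlap:.0%}")  # 28%
```

In other words, of all documents that either team called responsive, the two teams agreed on fewer than one in three.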

Page 25: SF Women in eDiscovery Sept 2011

Conclusion

The computer systems yielded a comparable level of performance relative to manual review

Fewer people, less time, less cost

Measure performance to evaluate

Page 26: SF Women in eDiscovery Sept 2011

Will lawyers lose control?

Computer system amplifies the intelligence of the Expert

Page 27: SF Women in eDiscovery Sept 2011

Will lawyers lose their jobs?

Page 28: SF Women in eDiscovery Sept 2011

Tap into the mind of an expert

Page 29: SF Women in eDiscovery Sept 2011

Technology-Enhanced or Automated Review

Page 30: SF Women in eDiscovery Sept 2011

Setup

– Sample

– Expert judges sample: Responsive / Non-responsive

– Model learns

– Model predicts

– Model categorizes all remaining documents: Responsive / Non-responsive

– Repeat as needed
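
The workflow on this slide can be sketched in a few lines. The slides do not say which model is used, so this example substitutes a tiny multinomial Naive Bayes classifier written from scratch; the class, the documents, and the labels are all invented for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


class NaiveBayes:
    """Tiny multinomial Naive Bayes with add-one smoothing (a stand-in model)."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set().union(*self.word_counts.values())
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for c in self.classes:
            total_words = sum(self.word_counts[c].values())
            score = math.log(self.doc_counts[c] / total_docs)
            for word in tokenize(text):
                count = self.word_counts[c][word] + 1  # add-one smoothing
                score += math.log(count / (total_words + len(self.vocab)))
            scores[c] = score
        return max(scores, key=scores.get)


# Sample: an expert judges a small set of documents (toy examples).
sample = [
    ("stock option grant approved by the board", "responsive"),
    ("option grant dates and strike price", "responsive"),
    ("lunch options for friday", "non-responsive"),
    ("fantasy football league picks", "non-responsive"),
]
# Model learns from the expert's decisions.
model = NaiveBayes().fit([text for text, _ in sample],
                         [label for _, label in sample])
# Model categorizes the remaining documents.
remaining = ["board approved the option grant", "friday lunch picks"]
predictions = {doc: model.predict(doc) for doc in remaining}
print(predictions)
```

"Repeat as needed" would correspond to having the expert judge a further sample, adding those decisions to the training set, and refitting the model until measured accuracy is acceptable.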

Page 31: SF Women in eDiscovery Sept 2011

Predictive coding achieves much higher accuracy (Jaccard)

Share of responsive documents identified by each party:

Comparison                      First only   Both    Second only
Team A vs. Team B               0.304        0.281   0.415
Humans vs. Predictive Coding    0.186        0.688   0.126

Data from Roitblat, et al. and an Internal OrcaTec Case Study

Page 32: SF Women in eDiscovery Sept 2011

Why doesn’t everyone use it?

• Attorneys don’t understand the technology

• May not be aware of the accuracy data

• May not understand how to fit into their work flow

• Not in everyone’s economic interest

• Acceptable to judges?

Page 33: SF Women in eDiscovery Sept 2011

Defensible?

Measure     TREC 2008   Roitblat et al. Team A   Roitblat et al. Team B   Predictive Coding*
Precision   0.210       0.197                    0.183                    0.899
Recall      0.555       0.488                    0.539                    0.873

*OrcaTec internal result

Page 34: SF Women in eDiscovery Sept 2011

Thank you!

Sonya Sigler

650-281-8325

[email protected]

Herb Roitblat

770-650-7706x229

[email protected]