
Page 1: When to stop reviewing documents in eDiscovery …sigappfr.acm.org/MEDES/13/download/Jakob-Halskov.pdf– UBIC makes this process as painless as possible while ensuring defensibility

When to stop reviewing documents in eDiscovery cases

The Lit i View Quality Monitor and Endpoint Detector

ⒸUBIC, Inc. 2013 All Rights Reserved.

Jakob Halskov, Hideki Takeda UBIC Inc., Technology Dept.

MEDES/ACM 2013

Luxembourg, October 30th 2013

Tokyo | Osaka | Nagoya | Seoul | Taipei | Hong Kong | Silicon Valley | Washington DC | New York | London


Outline of talk

• Introduction: Redefining Big Data!

• The Discovery system

• UBIC’s Legal Cloud & Lit i View SaaS

• Outline of Predictive Coding technology

• Impact of Predictive Coding: case study

• Estimating sample size & HOT ratio

• Demo of UBIC’s Quality Monitor and Endpoint Detector



Behavior Informatics

We need new approaches for analyzing human thought and behavior.

UBIC redefines Big Data: Big Data is a universe of human thought and behavior.


What is Behavior Informatics?

• Informatics: Statistics – Mathematics, Data Mining – Text Mining, Speech Technology

• Behavioral Science: Criminology, Sociology, Psychology

• Goal: Discover Risk and Discover Knowledge more effectively and efficiently


Applications of Behavior Informatics

• Discover Risk for Company: Legal Intelligence (eDiscovery), Digital Forensics

• Discover Knowledge: Business Intelligence, Medicine

• Discover Risk for Community: Intelligence (Security Support)

• M&A


The Discovery system

• Data protection and privacy laws in the US are lax

– Categorical document requests (virtually all types of ESI, i.e. electronically stored information, are discoverable)

• Being forced to give information to a competitor/government is RISKY

– Narrow down the amount of information released

– UBIC makes this process as painless as possible while ensuring defensibility

• At the “Meet and Confer”, the opposing parties agree on a Discovery Plan (a.k.a. “protocol”):

– Defining the scope of responsive (relevant) data

– Scope of Accessibility & cost shifting (who is paying?)

– Defining privileged data (exempt from production)

– Setting performance goals and deadlines for production

• Recall and defensibility are key under this system

• Famous cases

– ENRON (TREC Legal track)

– Global Aerospace (Dec 2012): Predictive Coding becomes mainstream in eDiscovery; the judge sets a minimum recall rate of 75%
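Recall is the share of all responsive (HOT) documents that a review actually finds. The sketch below shows how a finished review could be checked against a 75% minimum recall. It is only a minimal sketch: the function names and the example counts are hypothetical, and in practice the total number of HOT documents is not known and must itself be estimated from a sample.

```python
def recall(found_hot: int, total_hot: float) -> float:
    """Share of all HOT (responsive) documents that the review found."""
    if total_hot <= 0:
        raise ValueError("total_hot must be positive")
    return found_hot / total_hot


def meets_recall_target(found_hot: int, total_hot: float, target: float = 0.75) -> bool:
    """True if recall reaches the agreed minimum (75% in the Global Aerospace example)."""
    return recall(found_hot, total_hot) >= target


if __name__ == "__main__":
    # Hypothetical counts: 7,800 HOT documents found out of an estimated 10,000.
    print(recall(7_800, 10_000))               # 0.78
    print(meets_recall_target(7_800, 10_000))  # True
```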


The nine phases of eDiscovery


Review typically accounts for 70% of the total cost of an eDiscovery case.


UBIC’s Lit i View software

• Lit i View (Cloud-based SaaS)

– Used in more than 275 cross-border litigation cases, including

• Plaintiff = private

– Intellectual Property (patent infringement)

– Product Liability, …

• Plaintiff = government

– Anti-trust regulations (cartels)

– Covers virtually all phases of the EDRM (Electronic Discovery Reference Model)

• Custodian identification and management (“Central Linkage”)

• Legal hold management (“Easy hold”)

• Collection & preservation

• Processing (+CJK support, encoding/segmentation etc.)

• Analysis & Review (Predictive Coding)


UBIC Legal Cloud overview


Customer benefit: most data can stay local (in Asia or the US) for the duration of the case


Outline of Predictive Coding

What:

– A supervised machine learning algorithm that assigns Relevance Scores to documents

Why:

– Improve quality/consistency of Review

– Save time

– Optimize sampling strategy (control costs)

Flexible & iterative design

– Random sample extraction

– Feature selection

• Morphological analysis (+CJK)

• Statistical analysis

– Feature (re)weighting

• Association measures

– Document (re)scoring
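The slides do not say which learner or association measure Lit i View uses. The sketch below covers only the feature-weighting and document-scoring steps, and it rests on several assumptions. The association measure is a smoothed log-odds ratio. A naive whitespace tokenizer stands in for morphological analysis. All function names and the toy documents are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive lowercase whitespace tokenizer; a stand-in for real morphological
    # analysis (CJK text would need proper segmentation).
    return set(text.lower().split())


def feature_weights(docs, labels, alpha=1.0):
    """Weight each token by a smoothed log-odds ratio of appearing in HOT vs. NOT docs.

    labels: 1 = HOT (responsive), 0 = NOT, as tagged by reviewers on a training sample.
    alpha: add-alpha smoothing so unseen token/class pairs never give log(0).
    """
    hot_df, not_df = Counter(), Counter()
    for text, label in zip(docs, labels):
        (hot_df if label else not_df).update(tokenize(text))
    n_hot = sum(1 for label in labels if label)
    n_not = len(labels) - n_hot
    weights = {}
    for tok in set(hot_df) | set(not_df):
        p_hot = (hot_df[tok] + alpha) / (n_hot + 2 * alpha)
        p_not = (not_df[tok] + alpha) / (n_not + 2 * alpha)
        weights[tok] = math.log(p_hot / p_not)
    return weights


def relevance_score(text, weights):
    """Relevance Score = sum of the weights of the document's tokens (unknown tokens count 0)."""
    return sum(weights.get(tok, 0.0) for tok in tokenize(text))


if __name__ == "__main__":
    # Hypothetical training sample tagged by reviewers.
    sample = ["price fixing meeting with competitor", "lunch menu for friday",
              "competitor price agreement", "holiday schedule"]
    tags = [1, 0, 1, 0]
    w = feature_weights(sample, tags)
    unreviewed = ["price agreement with competitor", "friday holiday lunch"]
    # Rank unreviewed documents by descending Relevance Score.
    for doc in sorted(unreviewed, key=lambda d: relevance_score(d, w), reverse=True):
        print(f"{relevance_score(doc, w):+.2f}  {doc}")
```

Re-running `feature_weights` on each newly reviewed batch and rescoring the remaining documents mirrors the iterative (re)weighting and (re)scoring loop. A production system would use a stronger classifier and CJK-aware segmentation.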


Impact of Predictive Coding: case study


• Japanese manufacturer, US law firm

• Over 1 million documents

• Predictive Coding (PC) carried out twice

• Review costs reduced by 40% vs. conventional human review


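Estimating minimum sample size, $n_s$

The error level, $\Delta p$, for the predictor $p = N_{HOT} / N$ is given by:

$$\Delta p = \gamma \sqrt{\frac{N - n_s}{N - 1} \cdot \frac{p(1 - p)}{n_s}} \iff n_s = \frac{\gamma^2 / \Delta p^2}{\frac{N - 1}{N} \cdot \frac{1}{p(1 - p)} + \frac{\gamma^2}{N \, \Delta p^2}}$$

When $N$ is much greater than $n_s$, $\frac{N - n_s}{N - 1} \to 1$, and thus:

$$n_s \approx \frac{\gamma^2}{\Delta p^2} \, p(1 - p)$$

Unfortunately, we do not know $p$ (as $N_{HOT}$ is unknown).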
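The second form of $n_s$ follows from the error formula by squaring both sides and solving for $n_s$. The derivation below writes $q = p(1 - p)$ as a shorthand introduced here only for brevity:

```latex
\begin{aligned}
\Delta p^2 &= \gamma^2 \, \frac{N - n_s}{N - 1} \cdot \frac{q}{n_s} \\
\Delta p^2 (N - 1) \, n_s &= \gamma^2 q \, (N - n_s) \\
n_s \left[ (N - 1) \, \Delta p^2 + \gamma^2 q \right] &= \gamma^2 q N \\
n_s &= \frac{\gamma^2 q N}{(N - 1) \, \Delta p^2 + \gamma^2 q}
     = \frac{\gamma^2 / \Delta p^2}{\frac{N - 1}{N} \cdot \frac{1}{q} + \frac{\gamma^2}{N \, \Delta p^2}}
\end{aligned}
```

The last form comes from dividing numerator and denominator by $q N \Delta p^2$. If $N$ grows while everything else stays fixed, the denominator tends to $1/q$, so $n_s$ tends to $\gamma^2 q / \Delta p^2$, which is the large-$N$ approximation.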


Estimating $N_{HOT}$ and minimum $n_s$

N          | 95%: n_s | 95%: n_s (n_s << N) | 99%: n_s | 99%: n_s (n_s << N)
10,000     | 4,899    | 9,604               | 6,247    | 16,641
100,000    | 8,763    | 9,604               | 14,267   | 16,641
1,000,000  | 9,513    | 9,604               | 16,369   | 16,641
5,000,000  | 9,586    | 9,604               | 16,586   | 16,641


[Figure: f(p) = p(1 − p) plotted against p from 0 to 1]

With $p = 0.5$ as the worst case (it maximizes $p(1 - p)$ and therefore gives the largest sample size), we get

$$n_s \approx \frac{1}{4} \cdot \frac{\gamma^2}{\Delta p^2}$$

Using a confidence level of 95%, the confidence coefficient ($\gamma$) is 1.96. We can now compute the minimum sample sizes for various $N$, for example by setting the error level at 0.01.
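The sketch below reproduces the minimum sample sizes in the table. It takes $p = 0.5$ and $\Delta p = 0.01$ from the text. For the 99% level it assumes $\gamma = 2.58$, since that value gives the 16,641 entry; the more precise coefficient is about 2.576. The function name is hypothetical. Under these assumptions, the finite-population formula gives 8,763 at 95% for N = 100,000.

```python
def min_sample_size(N, delta_p=0.01, gamma=1.96, p=0.5, finite=True):
    """Minimum sample size n_s for error level delta_p at confidence coefficient gamma.

    finite=True:  n_s = gamma^2 q N / ((N - 1) delta_p^2 + gamma^2 q), with q = p(1 - p)
    finite=False: n_s ~= gamma^2 q / delta_p^2  (approximation for n_s << N; N is ignored)
    """
    q = p * (1 - p)  # p = 0.5 is the worst case: it maximises p(1 - p)
    if not finite:
        return gamma**2 * q / delta_p**2
    return gamma**2 * q * N / ((N - 1) * delta_p**2 + gamma**2 * q)


if __name__ == "__main__":
    # round() matches the slide's table; math.ceil would be the conservative choice.
    for N in (10_000, 100_000, 1_000_000, 5_000_000):
        exact = [round(min_sample_size(N, gamma=g)) for g in (1.96, 2.58)]
        approx = [round(min_sample_size(N, gamma=g, finite=False)) for g in (1.96, 2.58)]
        print(f"{N:>9,} | 95%: {exact[0]:>6,} ({approx[0]:,}) | 99%: {exact[1]:>6,} ({approx[1]:,})")
```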

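$$N_{HOT}^{est} = N \left( p_{TAG} \pm \Delta p \right), \qquad \Delta p = \gamma \sqrt{\frac{N - n_s}{N - 1} \cdot \frac{p_{TAG}(1 - p_{TAG})}{n_s}}, \qquad p_{TAG} = \frac{n_{TAG}}{n_s}$$

where $n_{TAG}$ is the number of sampled documents tagged HOT.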
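The sketch below implements the $N_{HOT}$ estimate and assumes a simple random sample. The function name and the example numbers are hypothetical. Clipping the bounds to the range 0 to N documents is an addition that is not part of the slide's formula.

```python
import math


def estimate_n_hot(N, n_s, n_tag, gamma=1.96):
    """Estimate N_HOT from a random sample of n_s documents, n_tag of which were tagged HOT.

    Returns (point estimate, lower bound, upper bound) of N * (p_TAG ± delta_p).
    """
    if not 0 < n_s <= N:
        raise ValueError("need 0 < n_s <= N")
    if not 0 <= n_tag <= n_s:
        raise ValueError("need 0 <= n_tag <= n_s")
    p_tag = n_tag / n_s
    # Finite population correction (N - n_s) / (N - 1); zero if the whole collection is sampled.
    fpc = (N - n_s) / (N - 1) if N > 1 else 0.0
    delta_p = gamma * math.sqrt(fpc * p_tag * (1 - p_tag) / n_s)
    # Clip the proportion to [0, 1] so the bounds stay within 0..N documents (not on the slide).
    low, high = max(0.0, p_tag - delta_p), min(1.0, p_tag + delta_p)
    return N * p_tag, N * low, N * high


if __name__ == "__main__":
    # Hypothetical: 951 of 9,513 sampled documents tagged HOT in a 1,000,000-document collection.
    est, low, high = estimate_n_hot(1_000_000, 9_513, 951)
    print(f"N_HOT ~ {est:,.0f} (95% CI {low:,.0f} to {high:,.0f})")
```

A lower bound on $N_{HOT}$ like this is what a recall claim can be checked against, because the true total is never observed directly.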


Quality Monitor demo


Conclusion & new/future features

• UBIC’s Quality Monitor (QM) and Endpoint Detector (EPD) provide a user-friendly UI for securing a high-quality, defensible outcome of the review process

• Leveraging theory from Social Network Analysis, UBIC released “Central Linkage” on October 1st 2013
