Statistical Validation And Data Analytics In eDiscovery

Geoff Black, Director, High Tech Investigations, Prudential. The views expressed in this presentation are solely those of the presenter and do not necessarily reflect the views of the presenter’s employer.

Page 1: Statistical Validation And Data Analytics In eDiscovery

Statistical Validation And Data Analytics In eDiscovery

Geoff Black, Director, High Tech Investigations

Prudential

The views expressed in this presentation are solely those of the presenter and do not necessarily reflect the views of the presenter’s employer.

Page 2: Statistical Validation And Data Analytics In eDiscovery

Recommended Reading

Page 3: Statistical Validation And Data Analytics In eDiscovery

Why do we need Statistics? (Ensuring Quality in eDiscovery)

Professional standards

Savvy judges already require sampling

Defensibility

Page 4: Statistical Validation And Data Analytics In eDiscovery

Types of Sampling

Judgmental

Statistical*

Page 5: Statistical Validation And Data Analytics In eDiscovery

A Recent Experience with Sampling

Setting the stage

Page 6: Statistical Validation And Data Analytics In eDiscovery

A Recent Experience with Sampling: The Challenge

Select appropriate filters for a large data set

Audit reviewers without double reviewing everything

Test our processing tools

Accomplish all of these with a high confidence level and low confidence interval

Page 7: Statistical Validation And Data Analytics In eDiscovery

Statistics for eDiscovery: Confidence Interval

The “confidence interval” or margin of error

How closely our results will reflect the general population

Lower is better

Page 8: Statistical Validation And Data Analytics In eDiscovery

Statistics for eDiscovery: Confidence Interval Example

We have 100 documents and our confidence interval is ± 2%.

Testing shows 10% responsiveness

General population should show between 8% and 12% responsiveness, or 8 to 12 documents.
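The arithmetic on this slide can be sketched in a few lines of Python. The function name `responsiveness_range` is hypothetical, and the sketch assumes the ± 2% margin is applied directly to the observed 10% rate and then scaled to the 100 documents.

```python
def responsiveness_range(observed_rate, margin, population_size):
    """Return the low/high rate and the matching document counts.

    observed_rate and margin are fractions (0.10 means 10%).
    """
    low = max(0.0, observed_rate - margin)
    high = min(1.0, observed_rate + margin)
    low_docs = round(low * population_size)
    high_docs = round(high * population_size)
    return low, high, low_docs, high_docs


# Values taken from the slide: 100 documents, 10% responsive, ± 2%.
low, high, low_docs, high_docs = responsiveness_range(0.10, 0.02, 100)
print(f"{low:.0%} to {high:.0%}, or {low_docs} to {high_docs} documents")
```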

Page 9: Statistical Validation And Data Analytics In eDiscovery

Statistics for eDiscovery: Confidence Level

The “confidence level”

How certain are we that our sample accurately represents the general population?

Higher is better

Page 10: Statistical Validation And Data Analytics In eDiscovery

Statistics for eDiscovery: Sample Sizes for Population of 1,000,000

[Chart: sample size (axis 0 to 4,500) by margin of error (± 10%, ± 5%, ± 2%), with series for 99%, 95%, and 90% confidence levels]
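Sample sizes like those charted for a population of 1,000,000 can be approximated with a short Python sketch. The slides do not say which formula or calculator was used, so this assumes the common proportion-based formula with a finite population correction and a worst-case proportion of 0.5. The function name `sample_size` is hypothetical.

```python
import math
from statistics import NormalDist


def sample_size(population, confidence, margin, p=0.5):
    """Estimate the sample size needed for a proportion.

    Assumption: normal approximation with finite population correction.
    p=0.5 is the most conservative (largest-sample) choice.
    """
    # Two-sided z-score for the requested confidence level.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Sample size for an effectively infinite population.
    n0 = (z ** 2) * p * (1 - p) / margin ** 2
    # Shrink it for a finite population.
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)


for confidence in (0.99, 0.95, 0.90):
    sizes = [sample_size(1_000_000, confidence, m) for m in (0.10, 0.05, 0.02)]
    print(f"{confidence:.0%} confidence: {sizes}")
```

Under these assumptions, `sample_size(10_000_000, 0.99, 0.02)` lands within a handful of documents of the 4,150 figure quoted later for the filter validation sample.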

Page 11: Statistical Validation And Data Analytics In eDiscovery

[Scaling] Statistics for eDiscovery

[Chart: Sample Sizes at 99% Confidence ± 2%; sample size (axis 2,800 to 4,400) by population size (10,000; 100,000; 1,000,000; 10,000,000)]
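The way sample size levels off as the population grows can be seen in the standard sample size formula for a proportion. This is a sketch under stated assumptions: the slides do not name a formula, and the symbols here are illustrative ($z$ is the z-score for the confidence level, $p$ the assumed proportion, $e$ the margin of error, $N$ the population size).

```latex
n_0 = \frac{z^2 \, p(1-p)}{e^2},
\qquad
n = \frac{n_0}{1 + \dfrac{n_0 - 1}{N}}
```

With $z \approx 2.576$ (99% confidence), $p = 0.5$, and $e = 0.02$, $n_0 \approx 4{,}147$. For $N = 10{,}000$ the correction brings $n$ down to roughly 2,930, while for $N = 10{,}000{,}000$ the denominator is nearly 1 and $n$ stays close to $n_0$. A thousandfold increase in population therefore adds only about 1,200 documents to the sample.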

Page 12: Statistical Validation And Data Analytics In eDiscovery

A Recent Experience with Sampling: Filtering Selection

Finding a good search method is difficult

Who chooses search terms?

Requires iterative testing and validation

Page 13: Statistical Validation And Data Analytics In eDiscovery

A Recent Experience with Sampling: Validating Filters

Began with around 10,000,000 documents

A 99% confidence level with a ± 2% confidence interval dictated a sample size of 4,150 documents

Chose a random sample and searched

Reviewed all the results (positive and negative)
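The steps on this slide (draw a random sample, run the search, review both hits and non-hits) can be sketched as follows. This is only an illustration: the function names `draw_sample` and `score_filter` are hypothetical, the document ids and the hit and responsive sets are made-up toy data, and the talk does not say which measures were computed.

```python
import random


def draw_sample(doc_ids, sample_size, seed=None):
    """Simple random sample of document ids, without replacement."""
    rng = random.Random(seed)
    return rng.sample(list(doc_ids), sample_size)


def score_filter(sample, search_hits, responsive):
    """Compare search results against reviewer calls for every sampled document.

    search_hits: ids returned by the keyword filter.
    responsive:  ids a human reviewer marked responsive.
    """
    tp = fp = fn = tn = 0
    for doc_id in sample:
        hit = doc_id in search_hits
        relevant = doc_id in responsive
        if hit and relevant:
            tp += 1          # filter caught a responsive document
        elif hit:
            fp += 1          # filter returned a non-responsive document
        elif relevant:
            fn += 1          # filter missed a responsive document
        else:
            tn += 1          # filter correctly excluded the document
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "precision": precision, "recall": recall}


# Toy data only: these sets stand in for real search and review results.
population = range(10_000)
search_hits = {d for d in population if d % 5 == 0}
responsive = {d for d in population if d % 10 == 0 or d % 7 == 0}
sample = draw_sample(population, 500, seed=42)
print(score_filter(sample, search_hits, responsive))
```

Reviewing the negatives as well as the positives is what makes the false-negative count available; checking only the search hits would measure precision but say nothing about what the filter missed.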

Page 14: Statistical Validation And Data Analytics In eDiscovery

A Recent Experience with Sampling: Validating Filters

Results did not match expectations

Revised the list of search terms

Tested the filtering again, and…

A more accurate search that returned less data as responsive!

Page 15: Statistical Validation And Data Analytics In eDiscovery

A Recent Experience with Sampling: Validating Filters

Wait a minute, I always test my keywords!

Not whether you test, but how much data…

Page 16: Statistical Validation And Data Analytics In eDiscovery

A Recent Experience with Sampling: Validating Review

After filtering, about 120,000 documents remained to review

Reviewers often disagree about relevance or simply don’t understand the material

Double and triple review kills budgets

Instead, sample a random set of 4,010 reviewed documents

Page 17: Statistical Validation And Data Analytics In eDiscovery

A Recent Experience with Sampling: Validating Review

Subject matter expert noted a few anomalies

Re-reviewed items with the confusing term

One reviewer’s results could not be trusted
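One way to surface a reviewer whose calls diverge from the rest is to tally, per reviewer, how often the subject matter expert overturned the original call in the audit sample. The sketch below assumes each audit record is a `(reviewer, original_call, expert_call)` tuple. That record layout, the function name `overturn_rates`, and the sample records are hypothetical and not taken from the talk.

```python
from collections import defaultdict


def overturn_rates(audit_records):
    """Fraction of each reviewer's sampled calls that the expert overturned."""
    totals = defaultdict(int)
    overturned = defaultdict(int)
    for reviewer, original_call, expert_call in audit_records:
        totals[reviewer] += 1
        if original_call != expert_call:
            overturned[reviewer] += 1
    return {reviewer: overturned[reviewer] / totals[reviewer]
            for reviewer in totals}


# Made-up audit records for illustration only.
records = [
    ("reviewer_a", "responsive", "responsive"),
    ("reviewer_a", "not responsive", "not responsive"),
    ("reviewer_b", "responsive", "not responsive"),
    ("reviewer_b", "not responsive", "not responsive"),
]
rates = overturn_rates(records)
for reviewer, rate in sorted(rates.items()):
    print(f"{reviewer}: {rate:.0%} of sampled calls overturned")
```

A reviewer whose rate stands well above the others is a candidate for targeted re-review, which is cheaper than double reviewing the whole set.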

Page 18: Statistical Validation And Data Analytics In eDiscovery

A Recent Experience with Sampling: Keeping Your Vendors Honest

How do they test their tools?

How were automated tools used in your matter?

Do you know what they cannot do?

How did you use the results in your decisions?

Page 19: Statistical Validation And Data Analytics In eDiscovery

What’s Next?

Built-in iterative review with statistical sampling

Relying solely on “Concept Searching” is a black box and a dead end

Advanced search techniques must offer explanatory reasoning

Page 20: Statistical Validation And Data Analytics In eDiscovery

What does all this mean? (The Benefits of Using Statistics)

Small dataset for testing

Minimize false positives

More accurate search, reduced data volume

Defensibility of statistically validated testing

Page 21: Statistical Validation And Data Analytics In eDiscovery

One last thing…

Technologies will always differ and change rapidly, but statistical validation is a timeless truth.

Page 22: Statistical Validation And Data Analytics In eDiscovery

References & Related Cases

The Sedona Conference Working Group Series, “Commentary on Achieving Quality in the E-Discovery Process,” May 2009.

Losey, Ralph. “The Multi-Modal ‘Where’s Waldo?’ Approach to Search…,” 2010. http://e-discoveryteam.com/2010/02/27/

William A. Gross Construction Associates, Inc. v. American Manufacturers Mutual Insurance Co., 256 F.R.D. 134, 134 (S.D.N.Y. 2009)

Victor Stanley v. Creative Pipe, 250 F.R.D. 251 (D. Md. 2008)

In re Seroquel Products Liability Litigation, 244 F.R.D. 650, 662 (M.D. Fla. 2007)

Page 23: Statistical Validation And Data Analytics In eDiscovery

Statistical Validation And Data Analytics In eDiscovery

Geoff Black

www.geoffblack.com/ediscovery