The ROI of Analytics
Scott W. Lombard, SME
Senior Vice President – Litigation Management
Rust Consulting Inc
Premise
Analytics is the strategic use of technology against a set of data: applying legal and case-specific knowledge to raw "data" with the intention of creating "information" clusters (reducing volume and identifying accurate context) that potentially prove claims, or that create confidence that you have produced all you reasonably can about the data, so that accurate productions can be created effectively and efficiently.
[Image slides: "Data", "A Little More Data", "More Data", "It's Home" — illustrating the growth of data volumes]
Key Issues
• In 2011 alone, 1.8 zettabytes (or 1.8 trillion gigabytes) of data will be created, the equivalent of every U.S. citizen writing 3 tweets per minute for 26,976 years (IDC)
• It's estimated that 294B emails are sent per day (90 trillion per year) (Radicati Group)
• 188B text messages are sent each month (6.3B per day, or 2.3 trillion per year) (CNN)
Where Is The Data?
• 52% of all data is stored on hard disk drives
• 28% is stored in optical storage
• 11% is stored on digital tapes
• 9% is stored in other forms
Defining “Analytics”
• Technology-assisted review, computer-assisted review, predictive coding, predictive priority, text analytics, etc.
• Industry buzzwords coined to describe applications of a mathematical modeling technique by which documents are mapped into a concept space according to their textual content.
• Within a concept space, similarity between documents can be measured by their proximity to one another.
The Concept Space: The Backbone of Analytics
Fundamental observation:
Documents in close proximity are similar in content
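As a rough illustration of that observation, here is a toy sketch that scores document similarity as the cosine between term-count vectors. This is an assumption for illustration only; commercial analytics engines use far richer models (e.g., latent semantic indexing), but the principle is the same: the closer two documents sit in the vector space, the more similar their content.

```python
# Toy concept-space sketch (illustrative only, not a vendor's engine):
# represent each document as a term-count vector and measure similarity
# as the cosine of the angle between the vectors. Higher cosine means
# closer proximity in the space, i.e., more similar content.
import math
from collections import Counter

def cosine_similarity(doc_a: str, doc_b: str) -> float:
    """Cosine similarity between two documents' term-count vectors."""
    a, b = Counter(doc_a.lower().split()), Counter(doc_b.lower().split())
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Documents about the same topic score near 1.0; unrelated ones near 0.0.
merger_a = "board approved the merger agreement terms"
merger_b = "merger agreement terms approved by the board"
lunch    = "lunch menu for friday pizza and salad"
print(cosine_similarity(merger_a, merger_b))  # high: similar content
print(cosine_similarity(merger_a, lunch))     # zero: no shared terms
```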
Intelligent Review
• Using a seed set of documents coded by humans, the algorithm projects coding decisions across the entire document collection
• Segregate documents for priority review
• Elevate potentially responsive documents
• Reduce priority of likely non-responsive documents
• Identify a potentially privileged set for review
• Reduce the universe of documents
• Remove "noise" and/or privileged documents
• Statistically evaluate confidence level and margin of error to verify results
[Diagram: a seed set coded "JUNK" and "RESP" is projected by intelligent review across the collection; documents are grouped as Non-Responsive, Hot, Background, and Privilege]
Clustering
• Group documents based on conceptual patterns throughout the document
Concept searching
• Find similar documents within a user-defined similarity threshold
Other Applications
• Assisted keyword generation
Submit a keyword or list of keywords, and the algorithm will suggest additional terms appearing in similar contexts across the document universe
• Near-duplicate detection
Locate and remove duplicate documents based on content rather than hash value
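Content-based near-duplicate detection can be sketched with word shingles and Jaccard overlap. This toy example (an illustrative assumption, not the vendor's implementation) shows why a trivially edited copy still matches even though its hash value differs:

```python
# Illustrative sketch of content-based near-duplicate detection:
# compare the Jaccard overlap of word 3-shingles rather than hash
# values, so lightly edited copies still match. The 3-word shingle
# size is an assumption chosen for this tiny example.
def shingles(text: str, k: int = 3) -> set:
    """All k-word contiguous sequences in the text."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of the two documents' shingle sets."""
    sa, sb = shingles(a), shingles(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

original = "please review the attached draft agreement before friday"
edited   = "please review the attached draft agreement before monday"
# One changed word: the MD5/SHA hashes differ completely, but the
# shingle overlap remains high, flagging a near-duplicate.
print(jaccard(original, edited))
```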
Plaintiffs
Added Value
• More efficient review
Remove non-responsive documents
Promote potentially responsive documents
Able to evaluate ROI
• Discover more, sooner
Quickly locate hot documents through concept searching and clustering
Form case strategy early on
Difficult to place a $ value
Techniques Used
• Clustering
• Concept searching
• Intelligent review
• Assisted keyword generation
• Statistical sampling to define confidence interval and margin of error
Cost
• Typically a one-time fee based on GB volume
e.g., for 100 GB of ESI at $125/GB, enabling analytics would cost $12,500
• Hourly fees for an analytics expert
Less common for plaintiffs
Defendants aren't concerned with the statistical defensibility of plaintiffs' review
May still be useful in verifying results
$125-$350/hour
Measuring ROI
• Ask: how does assisted review compare with human review?
• To measure this, we need to know:
Chr = cost of human review ($/document)
Car = cost of assisted review ($/GB)
V = volume of data (GB)
Nar = number of documents coded using analytics
Case Study
• Large antitrust matter
• Review set of 1.9 million documents
• Client used a linear review workflow for the first 14 months
• Estimated another 36 months to complete the review
• Implemented analytics in Relativity to accelerate the review process
Cost
• In January 2013, 46,730 documents were reviewed
• Manual review
Coding time was 1,235.5 hours
Total review cost = 1,235.5 hrs × $25/hr = $30,887
Cost per document = $30,887 / 46,730 = $0.66
• Analytics
Priced at $125/GB
Volume: 571 GB
Minimum to Add Value
Analytics will have a superior ROI over human review if it can be used to code at least 108,144 documents (i.e., when Nar × Chr > Car × V).
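That threshold follows directly from the slide's variables; a quick sketch of the arithmetic, using the case-study figures:

```python
# Break-even for the plaintiffs' case study: analytics beats manual
# review once the documents it codes would have cost more to review
# by hand than the flat per-GB analytics fee.
c_hr = 0.66    # Chr: cost of human review, $/document
c_ar = 125.0   # Car: cost of assisted review, $/GB
v = 571.0      # V:   volume of data, GB

# Analytics fee: Car * V. Divide by cost per document to find the
# minimum number of documents analytics must code to add value.
analytics_fee = c_ar * v            # $71,375
n_min = analytics_fee / c_hr
print(round(n_min))                 # 108144 documents
```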
Minimum To Add Value
[Pie chart: 5.5% of the collection required to outperform unassisted review; 31.0% reviewed; 63.5% unreviewed]
[Line chart: analytics-assisted review cost vs. manual review cost by % eliminated using analytics; break-even at 5.5%]
Results
• In the first 30 days:
Through concept searching and clustering, 250,000 non-responsive files were removed from the review set, over twice the minimum required to add value to the case
• This set of non-responsive documents could not otherwise be identified using date, file type, NIST, or other metadata filtration
Results: 30 Days
[Pie chart, 30-day results: 12.7% removed from review set (AR); 2.5% elevated for immediate review (AR); 31.0% reviewed; 53.8% unreviewed]
Savings: 30 days
• Human review cost: 250,000 × $0.66 = $165,000
• Analytics-assisted review cost: $71,375
• Savings: $93,625
[Line chart: analytics-assisted review cost vs. manual review cost; savings of $93,625 at 12.7% eliminated using analytics]
Additional Benefits
• Intelligent review identified an urgent review set of roughly 50,000 files which, after human review, proved to contain 10× more critical documents than unassisted samplings
• With 30% of the review complete and depositions fast approaching, clustering and concept searching played a critical role in identifying key communications across the non-reviewed set, doubling the number of documents brought to the first two depositions
Ongoing
• Analytics has been seamlessly integrated into the linear review workflow.
• As reviewers locate examples of responsive, non-responsive, and privileged documents, they tag them as potential examples. An analytics expert then submits the examples as "seeds," used by the engine to locate conceptually similar documents across the universe.
• Likely responsive documents are elevated in priority. When end-reviewers log in, they see these documents first in their list, without additional work.
• Likely non-responsive documents are demoted in priority or removed from review.
Conclusion
• Analytics has:
Significantly and measurably accelerated the review
Outperformed the ROI of full human review
Improved the productivity of human review
Assisted attorneys in developing a more informed case strategy, through a better upfront understanding of their discovery universe
Defense
Added Value
• Pre-process
Remove non-responsive and privileged documents before incurring processing and hosting fees
Evaluate case merits/weaknesses early on
• Post-process (review database)
Prioritize review: privilege candidates, responsive candidates, non-responsive candidates
Pre-Process vs. Post-Process
[Funnel diagram: raw-data volume shrinks as it moves through the ECA, pre-process analytics, and processing platform (pre-process) into hosting responsive data for review in Relativity (post-process), then the responsive set is processed, exported, and produced]
Cost
• Pre-process
Per GB "in" or "out"
In: the total volume of data before it is reduced by filtration and analytical culling
Lower per-GB fee, larger volume
Out: the total volume of data exiting the ECA platform and entering the hosted review platform
Higher per-GB fee, lower volume
• Post-process
Same as plaintiffs
GB's "In" vs. "Out"
[Funnel diagram: GB's "in" measured at the raw-data entry to the ECA, pre-process analytics, and processing platform; GB's "out" measured at the reduced responsive set hosted for review in Relativity, then processed, exported, and produced]
Measuring ROI
V = volume of initial collection (GB)
Ca = cost of analytics ($/GB)
Cp = cost to process ($/GB)
Ch = cost to host ($/GB/month)
Ra = proportion of documents deemed potentially responsive after analytic filtration (%)
T = duration of hosted review (months)
Case Study
• Large IP litigation
• Initial collection totaled 3.5 TB
PSTs, desktop PCs, other media
• NIST filtration and de-duplication removed 34% of the collection, leaving 2.31 TB of potentially responsive data
• Analytics was implemented after de-NIST/de-duplication but prior to processing
• Estimate for the linear privilege/responsiveness review phase was 12 months
Techniques Used
• Clustering
• Concept searching
• Assisted keyword generation
• Near-duplicate removal
• Statistical sampling to define confidence interval and margin of error
Cost
Volume of data set = V = 2.3 TB (2,300 GB)
Analytics cost = Ca = $125/GB
Processing cost = Cp = $250/GB
Hosting cost = Ch = $25/GB/month
Proportion of responsive (not filtered) data = Ra = unknown
Duration of review = T = 12 months
Minimum to Add Value
Analytics adds value when Ca + Ra·Cp + Ra·Ch·T < Cp + Ch·T
Solving with the rates above gives Ra < 0.77: analytics will have a superior ROI if it can be used to cull more than 23% of the initial data set.
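The inequality can be checked numerically with the case-study rates; a minimal sketch:

```python
# Defense break-even: solve Ca + Ra*(Cp + Ch*T) < Cp + Ch*T for Ra
# using the per-GB rates from the case study.
c_a = 125.0   # Ca: analytics, $/GB
c_p = 250.0   # Cp: processing, $/GB
c_h = 25.0    # Ch: hosting, $/GB/month
t = 12        # T:  months of hosted review

# Downstream cost per GB that survives culling: Cp + Ch*T = $550.
downstream = c_p + c_h * t
# Maximum surviving proportion for analytics to still add value.
r_max = (downstream - c_a) / downstream
print(round(r_max, 2))  # 0.77, so analytics must cull > 23% to pay off
```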
[Line chart: analytics-assisted review cost vs. unassisted cost by % eliminated using analytics; minimum to add value at 23%]
Results
• Using analytics, we were able to reduce the data set to 750 GB of potentially responsive data
• Clustering and concept searching for non-responsive (junk email) and privileged documents proved most effective
• Ra, the portion of potentially responsive documents after analytics, was 750/2,300 = 32%, meaning 68% of the data was removed, over twice the minimum to add value of 23%
• Statistical sampling was used to verify the accuracy of each cull criterion before data was removed
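That sampling step can be sketched with the standard normal-approximation margin of error for a proportion. The sample size and responsiveness rate below are illustrative assumptions, not figures from the case study:

```python
# Validate a proposed cull set by sampling: estimate the proportion of
# responsive documents in the sample and report the margin of error at
# a chosen confidence level (z = 1.96 for 95%).
import math

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Half-width of the confidence interval for proportion p, sample n."""
    return z * math.sqrt(p * (1 - p) / n)

# Hypothetical check: sample 2,400 documents from a cull candidate set
# and find 2% responsive. The margin of error is about +/-0.56%, tight
# enough to defend removing the set from review.
print(round(margin_of_error(0.02, 2400), 4))
```

Larger samples shrink the interval, so the sample size is chosen to balance validation cost against the confidence needed to defend each cull decision.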
Savings
• Cost of standard model:
$250(2,300) + $25(12)(2,300) = $1,265,500
• Cost of analytics-assisted model:
$125(2,300) + $550(0.32)(2,300) = $692,300
• Savings: $573,200
[Line chart: analytics-assisted review cost vs. unassisted cost; savings of $573,200 at 68% eliminated using analytics]
Additional Benefits
• Using analytics early on, lead attorneys were able to gain an in-depth knowledge of their data set
• Key "good" and "bad" documents were located
• Additional custodians and email addresses were identified
• Results were used to define case strategy early on
• Promoted cooperation between parties through:
Transparent reporting
Defensible, auditable workflow
Statistically measured accuracy
Is It Right For My Case?
Good
• Emails
• PCs
• Removable media
• Network shares
• High-quality scans
Bad
• Poor-quality scans
• Highly repetitive content (timecards)
• Primarily numeric content (financial documents)
Recommendations
• Work with people you can trust
Easy to get burned
• Perform a formal evaluation
A service provider may provide this free of charge
• Education
Stay up to date on court decisions
Read white papers
Schedule a demo
• Use commercial, off-the-shelf software
What is "under the hood" can vary widely
Thank You!
Scott W. Lombard, SME
Rust Consulting Inc
Senior Vice President – Litigation Management
612-359-2904 (Office)
612-293-8130 (Cell)