
Predictive Ranking: Technology-Assisted Review Designed for the Real World

    By Jeremy Pickens, Ph.D.

    Senior Applied Research Scientist

    Catalyst Repository Systems


Contents

Why Predictive Ranking?
What is Predictive Ranking?
Finding Responsive Documents
Examples of Finding
Validating What You Have Found
Workflow: Putting Finding and Validating Together
Standard TAR Workflow
How Catalyst's Workflow Differs
Consequences and Benefits of the Catalyst Workflow
About the Author


    Why Predictive Ranking?

Most articles about technology-assisted review (TAR) start with dire warnings about the explosion in electronic data. In most legal matters, however, the reality is that the quantity of data is big, but it is no explosion. The fact of the matter is that even a half million documents (a relatively small number in comparison to the big data of the web) poses a significant and serious challenge to a review team. That is a lot of documents, and it can cost a lot of money to review, especially if you have to go through them in a manual, linear fashion. Catalyst's Predictive Ranking bypasses that linearity, helping you zero in on the documents that matter most. But that is only part of what it does.

In the real world of e-discovery search and review, the challenges lawyers face come not merely from the explosion of data, but also from the constraints imposed by rolling collection, immediate deadlines, and non-standardized (and at times confusing) validation procedures. Overcoming these challenges is as much about process and workflow as it is about the technology that can be specifically crafted to enable that workflow. For these real-world challenges, Catalyst's Predictive Ranking provides solutions that no other TAR process can offer.

In this article, we will give an overview of Catalyst's Predictive Ranking and discuss how it differs from other TAR systems in its ability to respond to the dynamics of real-world litigation. But first, we will start with an overview of the TAR process and discuss some concepts that are key to understanding how it works.

What is Predictive Ranking?

Predictive Ranking is Catalyst's proprietary TAR process. We developed it more than four years ago and have continued to refine and improve it ever since. It is the process used in our newly released product, Insight Predict.

In general, all the various forms of TAR share common denominators: machine learning, sampling, subjective coding of documents, and refinement. But at the end of the day, the basic concept of TAR is simple, in that it must accomplish only two essential tasks:

    1. Finding all (or proportionally all) responsive documents.

    2. Verifying that all (or proportionally all) responsive documents have been found.

    That is it. For short, let us call these two goals finding and validating.


    Finding Responsive Documents

    Finding consists of two parts:

1. Locating and selecting documents to label. By label, we mean manually marking them as responsive or nonresponsive.

2. Propagating (via an algorithmic inference engine) these labels onto unseen documents.

This process of finding or searching for responsive documents is typically evaluated using two quantitative measures: precision and recall. Precision measures the number of true hits (actually responsive documents) returned by the search against the total number of hits returned. Recall measures the number of true hits returned by the search against the total number of true hits in the population.
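To make these two measures concrete, here is a minimal sketch in Python with hypothetical numbers; it is illustrative only and not part of any TAR product:

```python
# Illustrative sketch: computing precision and recall from hypothetical counts.

def precision_recall(true_hits_returned, total_returned, true_hits_in_population):
    """Return (precision, recall) as fractions between 0 and 1."""
    precision = true_hits_returned / total_returned
    recall = true_hits_returned / true_hits_in_population
    return precision, recall

# Example: a search returns 1,000 documents, 700 of which are actually responsive,
# out of 2,000 responsive documents in the whole collection.
p, r = precision_recall(700, 1000, 2000)
print(f"precision={p:.2f}, recall={r:.2f}")  # precision=0.70, recall=0.35
```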

One area of contention and disagreement among vendors is step 1, the sampling procedures used to train the algorithm in step 2. Vendors' philosophies generally fall into one of two camps, which loosely can be described as judgmentalists and randomists.

The judgmentalist approach assumes that litigation counsel (or the review manager) has the most insightful knowledge about the domain and matter and is therefore going to be the most effective at choosing training documents. The randomist approach, on the other hand, is concerned about bias. Expertise can help the system quickly find certain pockets of responsive information, the randomists concede, but the problem they see is that even experts do not know what they do not know. By focusing the attention of the system on some documents and not others, the judgmental approach potentially ignores large swaths of responsive information even while it does exceptionally well at finding others.

Therefore, the random approach samples every document in the collection with equal probability. This even-handed approach mitigates the problem of human bias and ensures that a wide set of starting points is selected. However, there is still no guarantee that a simple random sample will find those known pockets of responsive information about which the human assessor has more intimate knowledge.

At Catalyst, we recognize merits in both approaches. An ideal process would be one that combines the strengths of each to overcome the weakness of the other. One straightforward solution is to take the more-is-more approach and do both judgmental and random sampling. A combined sample not only has the advantage of human expertise, but also avoids some of the issues of bias.

However, while it is important to avoid bias, simple random sampling misses the point. Random sampling is good for estimating counts; it does not do as well at guaranteeing topical coverage (reaching every pocket of responsive documents). The best way to avoid bias is not to pick random documents, but to select documents about which you know that you know very little. Let's call it diverse topical coverage.


Remember the difference between the two goals: finding vs. validating. For validation, a statistically valid random sample is required. But for finding, we can be more intelligent than that. We can use intelligent algorithms to explicitly detect which documents we know the least about, no matter which other documents we already know something about. This is more than simple random sampling, which has no guarantee of topically covering a collection. This is using algorithms to explicitly seek out those documents about which we know nothing or next to nothing. The Catalyst approach is therefore not to stand in the way of our clients by shoehorning them into a single sampling regimen for the purpose of finding. Rather, our clients may pick whatever documents they want to judge, for whatever reason, and contextual diversity sampling will detect any imbalances and help select the rest.
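One generic way to "seek out the documents we know least about" is farthest-first selection over document feature vectors: repeatedly pick the document least similar to anything already judged. The sketch below illustrates that general idea only; it is hypothetical code, not Catalyst's contextual diversity algorithm, which is proprietary.

```python
# Illustrative only: farthest-first selection as a stand-in for "pick the documents
# least like anything we have already labeled." Not Catalyst's implementation.
import numpy as np

def pick_least_known(doc_vectors, labeled_idx, k):
    """Greedily pick k documents least similar to anything already labeled or picked.

    doc_vectors: (n_docs, n_features) array.
    labeled_idx: non-empty list of indices of documents already judged.
    """
    vecs = doc_vectors / np.linalg.norm(doc_vectors, axis=1, keepdims=True)
    covered = list(labeled_idx)
    picks = []
    for _ in range(k):
        # For each document, cosine similarity to the closest already-covered document.
        nearest_sim = (vecs @ vecs[covered].T).max(axis=1)
        nearest_sim[covered] = np.inf          # never re-pick a covered document
        choice = int(np.argmin(nearest_sim))   # the least-covered document
        picks.append(choice)
        covered.append(choice)
    return picks
```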

    Examples of Finding

The following examples illustrate the performance of Catalyst's intelligent algorithms with respect to the various points made in the previous section about random, judgmental, and contextual diversity sampling. In each of these examples, the horizontal x-axis represents the percentage of the collection that must be reviewed in order to reach (on the y-axis) the given recall level using Catalyst's Predictive Ranking algorithms.

For example, in this first graph we have a Predictive Ranking task with a significant number of responsive documents, a high richness. There are two lines, each representing a different initial seed condition: random versus judgmental. The first thing to note is that judgmental sampling starts slightly ahead of random sampling. The difference is not huge; the judgmental approach finds perhaps 2-3% more documents initially. That is to be expected, because the whole point of judgmental sampling is that the human can use his or her intelligence and insight into the case or domain to find documents that the computer is not capable of finding by strictly random sampling.

That brings us to the concern that judgmental sampling is biased and will not allow TAR algorithms to find all the documents. However, this chart shows that by using Catalyst's intelligent iterative Predictive Ranking algorithms, both the judgmental and random initial sampling get to the same place. They both get about 80% of the available responsive documents after reviewing only 6% of the collection, 90% after reviewing about 12% of the collection, and so forth. Initial differences and biases are swallowed up by Catalyst's intelligent Predictive Ranking algorithms.


In the second graph, we have a different matter in which the number of available responsive documents is over an order of magnitude less than in the previous example; the collection is very sparse. In this case, random sampling is not enough. A random sample does not find any responsive documents, so nothing can be learned by any algorithm. However, the judgmental sample does find a number of responsive documents, and even with this sparse matter, 85% of the available responsive documents may be found by examining only a little more than 6% of the collection.

[Figure 1: Recall (%, y-axis) versus percentage of the collection reviewed (x-axis, 0-20%), comparing random and judgmental seed sets in the high-richness matter.]

[Figure 2: Recall (%, y-axis) versus percentage of the collection reviewed (x-axis, 0-8%), comparing random and judgmental seed sets in the sparse matter.]

However, a different story emerges when the user chooses to switch on contextual diversity sampling as part of the algorithmic learning process. In the previous example, contextual diversity was not needed. In this case, especially with the failure of the random sampling approach, it is. The following graph shows the results of both random sampling and judgmental sampling with contextual diversity activated, alongside the original results with no contextual diversity:

[Figure 3: Recall (%, y-axis) versus percentage of the collection reviewed (x-axis, 0-8%), comparing random, judgmental, random-plus-diversity, and judgmental-plus-diversity seed sets.]

Adding contextual diversity to the judgmental seed has the effect of slowing learning in the initial phases. However, after only about 3.5% of the way through the collection, it catches up to the judgmental-only approach and even surpasses it. A 95% recall may be achieved a little less than 8% of the way through the collection. The results for adding contextual diversity to the random sampling are even more striking. It also catches up to judgmental sampling about 4% of the way through the collection and also surpasses it by the end, ending up at just over 90% recall a little less than 8% of the way through the collection.

These examples serve two primary purposes. First, they demonstrate that Catalyst's iterative Predictive Ranking algorithms work, and work well. The vast majority of a collection does not need to be reviewed, because the Predictive Ranking algorithm finds 85%, 90%, or even 95% of all available responsive documents within only a few percent of the entire collection.

Second, these examples demonstrate that, no matter how you start, you will attain that good result. It is this second point that bears repeating and further consideration. Real-world e-discovery is messy. Collection is rolling. Deadlines are imminent. Experts are not always available when you need them. It is not always feasible to start a TAR project in the clean, perfect, step-by-step manner that a vendor might require. Knowing that one can instead start either with judgmental samples or with random samples, and that the ability to add a contextual diversity option ensures that early shortcomings are not only mitigated but overcome, is of critical importance to a TAR project.



    Validating What You Have Found

Validating is an essential step in ensuring legal defensibility. There are multiple ways of doing it. Yes, there needs to be random sampling. Yes, it needs to be statistically significant. But there are different ways of structuring the random samples. The most common method is to do a simple random sample of the collection as a whole, and then another simple random sample of the documents that the machine has labeled as nonresponsive. If the richness of responsive documents in the latter sample has significantly decreased from the responsive-document richness in the initial whole population, then the process is considered to be valid.

However, at Catalyst we use a different procedure, one that we think is better at validating results. Like other methods, it also relies on random sampling. However, instead of doing a simple random sample of a set of documents, we use a systematic random sample of a ranking of documents. Instead of labeling documents first and sampling for richness second, the Catalyst procedure ranks all documents by their likelihood of being responsive. Only then is a random sample (a systematic random sample) taken.

At equal intervals across the entire ranked list, samples are drawn. This gives Catalyst the ability to better estimate the concentration of responsive documents at every point in the list than an approach based on unordered simple random sampling. With this better estimate, a smarter decision boundary can be drawn between the responsive and nonresponsive documents. In addition, because the documents on either side of that boundary have already been systematically sampled, there is no need for a two-stage sampling procedure.
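As an illustration of what a systematic random sample over a ranking looks like in practice, here is a minimal sketch; the code is hypothetical and is not Catalyst's implementation. It draws one document from each equal-width interval of the ranked list, starting from a random offset.

```python
# Illustrative sketch: a systematic random sample drawn at equal intervals over a ranking.
import random

def systematic_sample(ranked_doc_ids, sample_size):
    """Return document ids sampled at equal intervals across the ranked list."""
    interval = len(ranked_doc_ids) / sample_size
    offset = random.uniform(0, interval)          # random start within the first interval
    return [ranked_doc_ids[int(offset + i * interval)] for i in range(sample_size)]
```

Reviewers then code the sampled documents, and because the sample is spread evenly down the list, the coded results estimate the concentration of responsive documents at every depth in the ranking.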

    Workflow: Putting Finding and Validating Together

In the previous section, we introduced the two primary tasks involved in TAR: finding and validation. If machines (and humans, for that matter) were perfect, there would be no need for these two stages. There would only be a need for a single stage. For example, if a machine algorithm were known to perfectly find every responsive document in the collection, there would be no need to validate the algorithm's output.

And if a validation process could perfectly detect when all documents are correctly labeled, there would be no need to use an algorithm to find all the responsive ones; all possible configurations (combinatorial issues aside) could be tested until the correct one is found.

But no perfect solutions exist for either task, nor will they in the future. Thus, the reason for having a two-stage TAR process is so that each stage can provide checks and balances to the other. Validation ensures that finding is working, and finding ensures that validation will succeed.


Therefore, TAR requires some combination of both tasks. The manner in which both finding and validation are symbiotically combined is known as the e-discovery workflow. Workflow is a non-standard process that varies from vendor to vendor. For the most part, every vendor's technology combines these tasks in a way that, ultimately, is defensible. However, defensibility is the minimum bar that must be cleared.

Some combinations might work more efficiently than others. Some combinations might work more effectively than others. And some workflows allow for more flexibility to meet the challenges of real-world e-discovery, such as rolling collection.

We'll discuss a standard model, typical of the industry, then review Catalyst's approach, and finally conclude with the reason Catalyst's approach is better. Hint: It's not (only) about effectiveness, although we will show that it is that. Rather, it is about flexibility, which is crucial in the work environments in which lawyers and review teams use this technology.

    Standard TAR Workflow

Most TAR technologies follow the same essential workflow. As we will explain, this standard workflow suffers from two weaknesses when applied in the context of real-world litigation. Here are the steps it entails:

(1) Estimate, via simple random sampling, how many responsive and nonresponsive documents there are in the collection (i.e., estimate whole-population richness).

(2) Sample (and manually, subjectively code) documents.

(3) Feed those documents to a predictive coding engine to label the remainder of the collection.

(4) If manual intervention is needed to assist in the labeling (for example, via threshold or rank-cutoff setting), do so at this point.

(5) Estimate, via simple random sampling, how many responsive documents there are in the set of documents that have been labeled in steps 3 and 4 as nonresponsive.

(6) Compare the estimate in step 5 with the estimate in step 1. If there has been a significant decrease in responsive richness, then the process as a whole is valid.

TAR as a whole relies on these six steps working as a harmonious process. However, each step is not done for the same reason. Steps 2-4 are for the purpose of finding and labeling. Steps 1, 5, and 6 are for the purpose of validation.
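To make the validation half of this workflow (steps 1, 5, and 6) concrete, here is a minimal, hypothetical sketch of the two richness estimates being compared. The function and variable names are illustrative only and do not correspond to any vendor's API.

```python
# Illustrative sketch of the richness comparison in the standard workflow.
import random

def estimate_richness(doc_ids, is_responsive, sample_size):
    """Estimate the fraction of responsive documents via a simple random sample.
    is_responsive(doc_id) stands in for a human reviewer's coding decision."""
    sample = random.sample(doc_ids, sample_size)
    return sum(1 for d in sample if is_responsive(d)) / sample_size

# Step 1: richness of the whole collection, estimated before training begins.
#   initial_richness = estimate_richness(all_doc_ids, reviewer_codes, 400)
# Step 5: richness of the documents the engine labeled nonresponsive.
#   residual_richness = estimate_richness(machine_nonresponsive_ids, reviewer_codes, 400)
# Step 6: the process is treated as valid if residual_richness has dropped far
#   below initial_richness.
```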

The first potential weakness in this standard workflow stems from the fact that the validation step is split into two parts, one at the very beginning and one at the very end.


It is the relative comparison between the beginning and the end that gives this simple random-sampling-based workflow its validity. However, that also means that in order to establish validity, no new documents may arrive at any point after the workflow has started. Collection must be finished.

In real-world settings, collection is rarely complete at the outset. If new documents arrive after the whole-population richness estimate (step 1) is already done, then that estimate will no longer be statistically valid. And if that initial estimate is no longer valid, then the final estimate (step 5), which is compared against that initial estimate, will also not be valid. Thus, the process falls apart.

The second potential weakness in the standard workflow is that the manual intervention for threshold setting (step 4) occurs before the second (and final) random sampling (step 5). This is crucial to the manner in which the standard workflow operates. In order to compare before and after richness estimates (step 1 vs. step 5), concrete decisions will have had to be made about labels and decision boundaries. But in real-world settings, it may be premature to make concrete decisions at this point in the overall review.

How Catalyst's Workflow Differs

In order to circumvent these weaknesses and match our process more closely to real-world litigation, Catalyst's Predictive Ranking uses a proprietary, four-step workflow:

    (1) Sample (and manually, subjectively code) documents.

(2) Feed those documents to our Predictive Ranking engine to rank the remainder of the collection.

(3) Estimate, via a systematic random sample, the relative concentration of responsive documents throughout the ranking created in step 2.

    (4) Based on the concentration estimate from step 3, select a threshold or rank-cutoff setting which gives the desired recall and/or precision.

Once again, as with the standard predictive coding workflow, our Predictive Ranking as a whole relies on these four steps working as a harmonious process. However, each step is not done for the same reason. Steps 1 and 2 are for the purpose of finding and labeling. Steps 3 and 4 are for the purpose of validation.
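The following sketch illustrates how a rank cutoff might be chosen from a coded systematic sample (step 4). Because the sample is spread evenly over the ranking, the share of sampled responsive documents found above a given depth estimates recall at that depth. This is a simplified, hypothetical illustration, not Catalyst's implementation.

```python
# Illustrative sketch: pick the shallowest rank cutoff whose estimated recall
# meets a target, using a coded systematic sample over the ranking.

def choose_cutoff(sample_ranks, sample_codes, target_recall):
    """sample_ranks: rank positions of the sampled documents, in ascending order.
    sample_codes: 1 if the sampled document was coded responsive, else 0.
    Assumes the sample contains at least one responsive document."""
    total_responsive_in_sample = sum(sample_codes)
    found = 0
    for rank, code in zip(sample_ranks, sample_codes):
        found += code
        if found / total_responsive_in_sample >= target_recall:
            return rank   # reviewing down to this rank is estimated to achieve the target
    return sample_ranks[-1]  # fall back to the deepest sampled rank
```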

Two important points should be noted about Catalyst's workflow. The first is that the validation step is not split into two parts. Validation happens only at the very end of the entire workflow. If more documents arrive while documents are being found and labeled during steps 1 and 2 (i.e., if collection is rolling), the addition of new documents does not interfere with anything critical to the validation of the process. (Additional documents might make finding more difficult, but finding is a separate issue from validating, one which Catalyst's contextual diversity sampling algorithms are designed to address.)

The fact that validation in our workflow is not hampered by collections that are fluid and dynamic is significant. In real-world e-discovery situations, rolling collection is the norm. Our ability to handle this fluidity natively (by which we mean central to the way the workflow normally works, rather than as a tacked-on exception) is highly valuable to lawyers and review teams.

The second important point to note about Catalyst's workflow is that the manual intervention for threshold setting (step 4) happens after the systematic random sample. At first it may seem counterintuitive that this is defensible, because choices about the labeling of documents are happening after a random sample has been taken. But the purpose of the systematic random sample is to estimate concentrations in a statistically valid manner. Since the concentration estimates themselves are valid, decisions made based on those concentrations are also valid.

    Consequences and Benefits of the Catalyst Workflow

We have already touched on two key ways in which the Catalyst Predictive Ranking workflow differs from the industry-standard workflow. It is important to understand what our workflow allows us (and you) to do:

(1) Get good results. Catalyst Predictive Ranking consistently demonstrates high scores for both precision and recall.

(2) Add more training samples, of any kind, at any time. That allows the flexibility of having judgmental samples without bias.

(3) Add more documents, of any kind, at any time. You don't have to wait 158 days until all documents are collected, and you don't have to repeat step 1 of the standard workflow when those additional documents arrive.

(4) Go through multiple stages of culling and filtering without hampering validation. In the standard workflow, that would destroy your baseline. This is not a concern with the Catalyst approach, which saves the validation to the very end, via the systematic sample.

Catalyst has more than four years of experience using Predictive Ranking techniques to target review and reduce document populations. Our algorithms are highly refined and highly effective. Even more important, however, is that our Predictive Ranking workflow has what other vendors' workflows do not: the flexibility to accommodate real-world e-discovery. Out there in the trenches of litigation, e-discovery is a dynamic process. Whereas other vendors' TAR workflows require a static collection, ours flows with the dynamics of your case.


    About the Author

Jeremy Pickens, Ph.D., is one of the world's leading search scientists and a pioneer in the field of collaborative exploratory search, a form of search in which a group of people who share a common information need actively collaborate to achieve it. Dr. Pickens has six patents pending in the field of search and information retrieval, including two for collaborative exploratory search systems.

At Catalyst, Dr. Pickens researches and develops methods of using collaborative search to achieve more intelligent and precise results in e-discovery search and review. He also studies other ways to enhance search and review within the Catalyst system.

Dr. Pickens earned his master's and doctoral degrees at the University of Massachusetts, Amherst, Center for Intelligent Information Retrieval. He conducted his post-doctoral work at King's College, London, on a joint grant with Goldsmiths, University of London. As part of the OMRAS project (Online Music Recognition and Searching), he helped organize the first Music Information Retrieval (ISMIR) conference in Plymouth, Mass. Before joining Catalyst, Dr. Pickens spent five years as a research scientist at FX Palo Alto Lab, where his major research themes included video search and collaborative exploratory search.

Dr. Pickens is co-author of the forthcoming book, A Taxonomy of Collaborative Information Seeking, to be published by Morgan & Claypool Publishers. He was an editor of the spring 2010 special issue on collaborative information seeking of the journal Information Processing and Management. He is a frequent author and speaker on the topic.