e-Discovery Team at TREC 2015 Total Recall Track

Ralph C. Losey∗, National e-Discovery Counsel, Jackson Lewis P.C., e-DiscoveryTeam.com

Jim Sullivan and Tony Reichenberger, Sr. Discovery Services Consultants, Kroll Ontrack, Inc., eDiscovery.com

ABSTRACT

The 2015 TREC Total Recall Track provided instant relevance feedback in thirty prejudged topics searching three different datasets. The e-Discovery Team of three attorneys specializing in legal search participated in all thirty topics using Kroll Ontrack’s search and review software, eDiscovery.com Review (EDR). They employed a hybrid approach to continuous active learning that uses both manual and automatic searches. A variety of manual search methods were used to find training documents, including high probability ranked documents and keywords, an ad hoc process the Team calls multimodal. In the one topic (109) requiring legal analysis the Team’s approach was significantly more effective than all other participants, including the fully automated approaches that otherwise attained comparable scores. In all topics the Team’s hybrid multimodal method consistently attained the highest F1 values at the time of Reasonable Call, equivalent to a stop point. In all topics the Team’s multimodal human machine approach also found relevant documents more quickly and with greater precision than the fully automated or other methods.

Categories and Subject Descriptors: H.3.3 Information Search and Retrieval: Search process, relevance feedback, supervised learning, best practices.

Keywords: Hybrid Multimodal; AI-enhanced review; predictive coding; predictive coding 3.0; electronic discovery; e-discovery; legal search; active machine learning; continuous active learning; CAL; Computer-assisted review; CAR; Technology-assisted review; TAR; relevant irrelevant training ratios.

1. INTRODUCTION

The e-Discovery Team participated in all thirty Total Recall Track topics in the Athome group, where both manual and automatic methods were permitted. The Team is composed of three practicing attorneys who specialize in legal search. They used Kroll Ontrack’s search and review software, eDiscovery.com Review (“EDR”), employing what they call a hybrid multimodal method.[1] They attained high recall and precision in most of the thirty topics. The few exceptions appear to derive from the fact that the attorneys are accustomed to self-defining the ground truth and, in some topics, their opinions on relevance differed significantly from those of the TREC assessors. In later topics the attorney Team learned to turn off their own judgments and rely primarily on their software’s automated processes, which generally led to improved scores better matching the TREC relevance assessments. The Team’s manual efforts, as measured by time expended and number of documents manually reviewed, were very low by legal search standards.

∗ The views expressed herein are solely those of the author, Ralph Losey, and should not be attributed to his firm or its clients.


The fully automatic methods employed by the Sandbox group participants in the Total Recall Track attained comparable high recall and precision in most topics. The Team’s hybrid multimodal method did, however, consistently attain the highest F1 values at the time of Reasonable Call, equivalent to a training stop point, which is very important to legal search. One of the thirty topics, 109 - Scarlet Letter Law, required a small amount of legal knowledge and analysis to understand relevance (most of the others required none). On this topic our legal team, as you would expect, attained significantly better results than the fully automated methods that contained no base legal knowledge.

The e-Discovery Team’s hybrid multimodal method is a type of continuous active learning text retrieval system that employs supervised machine learning and a variety of manual search methods.[2,3]

The Team attained very high recall and precision rates in most, but not all, of the thirty Total Recall topics. The Team’s F1 scores at the time of Reasonable Call ranged from a perfect score of 100% in one topic (3484), to 91% to 99% in eight topics, and 82%-87% in five others. Although, of course, not directly comparable, these scores are far higher than any previously recorded in the six years of TREC Legal Track (2006-2011) or any other study of legal search. One reason for this may be that the thirty topics in the 2015 Total Recall track presented relatively simple information needs by legal search standards, with one exception (Topic 109 - Scarlet Letter Law). Another may be improved software and the Team’s improved hybrid multimodal method that includes continuous active learning.

The e-Discovery Team was able to find the target relevant documents in all thirty topics with relatively little human effort and almost no legal analysis. Only Topic 109 required legal knowledge and analysis, with four others (101, 105, 106, 107) requiring some small measure of analysis.

A total of 16,576,798 documents were classified in thirty topics. Of these documents, 70,414 were predetermined by TREC assessors to be relevant. The e-Discovery Team found these relevant documents by manual review of only 32,916 documents. The other 37,498 relevant documents were found with no human review of these documents.

1.1 Total Recall Track Description - Athome and Sandbox.

The Total Recall track offered 30 different pre-judged topics for search in two different divisions, Athome and Sandbox. Our Team only participated in the Athome experiments. In the Athome experiments the data was loaded onto the participants’ own computers. There were no restrictions on the types of searches that could be performed. The setup allowed the e-Discovery Team to use a slightly modified version of our standard Hybrid Multimodal method, which, as mentioned, employs both ad hoc manual review and machine learning.

The Sandbox participants were only permitted to use fully automated systems and the data remained on TREC administrator computers. They searched the same three datasets as Athome, plus two more not included in the Athome division due to confidentiality restrictions. The Sandbox participants were prohibited from any manual review of documents or ad hoc search adjustments.[4] Even after the submissions ended, the Sandbox participants reported at the Conference that they never looked at any documents, even the unrestricted Athome shared datasets. They never made any effort to determine where their software made errors in predicting relevance, or for any other reasons. To these participants, all of whom were academic institutions, the ground truth itself was of no relevance.

Three different datasets were searched in both the Athome and Sandbox events, with the same ten topics in each. Even though the data searched and topics overlapped in the two divisions, none of the participants in one division participated in the other division. This is unfortunate because it makes direct comparisons problematic, if not impossible, especially as to the software systems used. It is hoped that some participants will participate in both events in future Total Recall tracks.

The e-Discovery Team participated in all thirty of the Athome topics. We were the only manual participant to do so, with all others completing ten or fewer topics. The lack of participation by others in the Athome group also makes meaningful comparisons very difficult or impossible, but we note that the e-Discovery Team’s scores were consistently higher than those of any other Athome participants.

Athome participants were asked to track and report their manual efforts. The e-Discovery Team did this by recording the number of documents that were human reviewed and classified. Virtually all documents human reviewed were also classified, although not all documents classified were used for active training of the software classifier. Moreover, 53% of the relevant documents used for training were never human reviewed. We also tracked effort by number of attorney hours worked, as is traditional in legal services.

The Team used Kroll Ontrack’s software, known as eDiscovery.com Review, or EDR, which includes active machine learning features, a/k/a predictive coding in legal search. EDR employs a proprietary probabilistic type of logistic regression algorithm for document classification and ranking.

The Athome participants used their own computer systems and software for search, and then submitted documents to the TREC administrator that they considered relevant. TREC set up a “jig” whereby instant feedback was provided to a participant as to whether each document submitted as relevant was in fact previously judged to have been relevant by TREC assessors. When a participant determined that a reasonable effort had been made to find all relevant documents required, which is important in legal search and represents a stopping point for further machine training and document review, they would notify TREC of this supposition and “Call Reasonable.” Continued submissions were made after that point so that all documents were classified as either relevant or irrelevant. The goal as we understood it was to submit as many relevant documents as possible before the Reasonable Call, and thereafter to have all false negatives appear in submissions as soon after the Reasonable Call as possible.
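The submission protocol just described can be illustrated with a small sketch. This is not the TREC jig itself: the function names, the document IDs and the toy data are all assumptions made for illustration. It only shows how Recall and Precision at the Reasonable Call follow from an ordered list of submissions and the set of documents the assessors pre-judged relevant.

```python
def jig_feedback(doc_id, relevant_ids):
    """Hypothetical stand-in for the instant feedback: was this document pre-judged relevant?"""
    return doc_id in relevant_ids


def score_at_call(submissions, relevant_ids, call_index):
    """Recall and Precision over the first `call_index` submitted documents."""
    submitted = submissions[:call_index]
    if not submitted or not relevant_ids:
        return 0.0, 0.0
    hits = sum(1 for doc_id in submitted if jig_feedback(doc_id, relevant_ids))
    recall = hits / len(relevant_ids)
    precision = hits / len(submitted)
    return recall, precision


# Toy data (invented): four pre-judged relevant documents, seven submissions in order.
relevant_ids = {"d2", "d5", "d7", "d9"}
submissions = ["d2", "d5", "d1", "d7", "d3", "d9", "d4"]

# Suppose the participant calls Reasonable after the fourth submission.
recall, precision = score_at_call(submissions, relevant_ids, call_index=4)
print(f"At Reasonable Call: recall={recall:.2f}, precision={precision:.2f}")

# Documents submitted after the call that turn out relevant are the false negatives
# the participant wants to surface as early as possible.
late_hits = [d for d in submissions[4:] if jig_feedback(d, relevant_ids)]
print("Relevant documents found after the call:", late_hits)
```

In this toy run three of the four relevant documents are submitted before the call, and the one remaining false negative appears two submissions later.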
Most of the thirty topics presented only simple, single-issue information needs suitable for single-facet classification. Further, only a few of the topics required any legal analysis for relevance identification. These two factors, plus the omission of metadata, were, we think, a disadvantage to the e-Discovery Team of lawyers. Conversely, it appears that these same factors made it simpler for the academic Sandbox participants to perform well in most topics using fully automated methods. It should also be noted that although our lawyer Team was practiced and skilled in complex information needs requiring extensive legal analysis, and had long experience with projects using SME-defined ground truths, none had any prior experience using machine learning for the types of searches presented in the 2015 Recall Track.

The one exception, where legal analysis by an SME proved beneficial, was Topic 109, Scarlet Letter Law. It required some legal knowledge, albeit very rudimentary, to begin locating relevant documents. The keywords alone, “Scarlet Letter Law”, would only find relevant documents with this word combination and similar text patterns. These words were just the nickname of the proposed and eventually enacted Florida Statute. Any attorney would know that to find relevant information they would not only have to search the name, but would also have to search the various house and senate bill numbers for this law. These numbers would not often appear in the same document as the nickname, and since the machine did not know to search for these numbers, it did not realize their significance. Eventually the automated machine learning did see the connection, after many relevance feedback submissions. These submissions and instant feedback of relevant, or not, would, of course, not happen in real legal search.

1.2 Governor Bush Email

The first set of Athome Topics searched a corpus of 290,099 emails of Florida Governor Jeb Bush. Most of the metadata of these emails and associated attachments and images had been stripped and converted to pure text files. This increased the difficulty of the Team’s search, which normally includes a mixture of metadata-specific searches.

A significant percentage of the Bush emails were form-type lobbying emails from constituents, which repeated the same language with little or no variance. The unusually high prevalence of near-duplicate emails made search of many of the Bush topics easier than is typical in legal search.

The ten Bush email topics searched, and their names, which were the only guidance on relevance provided to either the Athome or Sandbox participants, are shown below.

Topic 100  School and Preschool Funding
Topic 101  Judicial Selection
Topic 102  Capital Punishment
Topic 103  Manatee Protection
Topic 104  New Medical Schools
Topic 105  Affirmative Action
Topic 106  Terri Schiavo
Topic 107  Tort Reform
Topic 108  Manatee County
Topic 109  Scarlet Letter Law

E-Discovery Team leader, Ralph Losey, a lifelong Florida native, personally searched each of these ten Topics. In about half of the topics his personal knowledge of the issues was helpful, but in several others it was detrimental. He had definite preconceptions of what emails he thought should be relevant and these sometimes differed significantly from the TREC assessors. In all of the Bush Topics Losey was at least somewhat assisted by a single “contract review attorney.”[5] The contract attorneys in most of these ten Topics did a majority of the document review under Losey’s very close supervision, but had only limited involvement in initial keyword searches, and no involvement in predictive coding searches or related decisions.

All participants in the 2015 Recall Track were required to complete all ten of the Bush Email Topics. Completion of the other twenty Topics in the two other data collections was optional. Several participants started review of the Bush Topics, but did not finish, and thus were not permitted to submit a report or attend the TREC Conference. Only one other Athome participant, Catalyst, completed all ten Bush Topics. No other Athome participants even attempted the other twenty topics, and thus comparisons with the e-Discovery Team’s results are limited to the fully automatic participants.

1.3 Black Hat World Forums.

The second set of Athome Topics searched a corpus of 465,149 posts taken from Black Hat World Forums. Again, almost all metadata of these posts and associated images had been stripped and converted to pure text files. The ten topics searched, and their names, which again were the only guidance initially provided on relevance, are shown below.


Topic 2052  Paying for Amazon Book Reviews
Topic 2108  CAPTCHA Services
Topic 2129  Facebook Accounts
Topic 2130  Surely Bitcoins can be Used
Topic 2134  PayPal Accounts
Topic 2158  Using TOR for Anonymous Internet Browsing
Topic 2225  Rootkits
Topic 2322  Web Scraping
Topic 2333  Article Spinner Spinning
Topic 2461  Offshore Host Sites

The Team members again had expertise issues with some of these arcane topics that they happened to be familiar with. Their knowledge would sometimes prove detrimental. Again, as the review continued, the Team members learned to suspend their own knowledge and ground truth judgments and instead rely entirely on the automated ranking searches, much like the fully automated participants always necessarily did.

1.4 Local News Articles.

The third set of Athome Topics searched a corpus of 902,434 online Local News Articles, again in text only format. The ten topics searched, and their names, which again were the only guidance provided on relevance aside from the instant feedback, are shown below.

Topic 3089  Pickton Murders
Topic 3133  Pacific Gateway
Topic 3226  Traffic Enforcement Cameras
Topic 3290  Rooster Turkey Chicken Nuisance
Topic 3357  Occupy Vancouver
Topic 3378  Rob McKenna Gubernatorial Candidate
Topic 3423  Rob Ford Cut the Waist
Topic 3431  Kingston Mills Lock Murders
Topic 3481  Fracking
Topic 3484  Paul and Cathy Lee Martin

The Team found the News Articles less difficult to work with than our typical legal search of corporate ESI. Still, the same kind of ground truth validity and consistency issues were noted in some of the news topics, but to a lesser degree than in the other two datasets.

1.5 E-Discovery Team’s Three Research Questions.

Our first and primary question was to determine: What Recall, Precision and Effort levels would the e-Discovery Team attain in TREC test conditions over all 30 Topics using the Team’s Predictive Coding 3.0 hybrid multimodal search methods and Kroll Ontrack’s software, eDiscovery.com Review (EDR)?

Our secondary question was: How will the Team’s results using its semi-automated, supervised learning method compare with other Recall Track participants using semi-automated supervised or fully automated unsupervised learning methods?

Our last question was: What are the ideal ratios, if any, for relevant and irrelevant training examples to maximize effectiveness of active machine learning with EDR?

2. RELATED WORK

It is generally accepted in the legal search community that the use of predictive coding type search algorithms can improve the search and review of documents in legal proceedings.[6] The use of predictive coding has also been approved, and even encouraged, by various courts around the world, including numerous courts in the U.S.[7]

Although there is agreement on use of predictive coding, there is controversy and disagreement as to the most effective methods of use.[8] There are, for instance, proponents for a variety of different methods to find training documents for predictive coding. Some advocate for the use of chance selection alone, others for the use of top ranked documents alone, others for a combination of top ranked and mid-level ranked documents where classification is unsure, and still others, including Losey, call for the use of a combination of all three of these selection processes and more.[9] The latest respectful disagreement is between Losey’s e-Discovery Team and the Administrators of the Total Recall Track, Grossman and Cormack, concerning the advisability of: 1) keeping attorney search experts in the loop, the hybrid approach, as opposed to the fully automated approach; and 2) using a variety of search methods, the multimodal approach, as opposed to reliance on high ranking documents alone for machine training.[10]

Some attorneys, predictive coding software vendors, and, apparently, Grossman and Cormack, advocate for the use of predictive coding search methods alone, and forego other search methods when they do so, such as keyword search, concept searches, similarity searches and linear review. E-Discovery Team members reject that approach and instead advocate for a hybrid multimodal approach that they call Predictive Coding 3.0, further described below. It uses all methods. As discussed in Endnote 2, we reject the notion of inherent lawyer bias that underlies some experts’ fully automated approaches, including, but to a lesser degree, Grossman and Cormack. We instead seek to augment and enhance attorney search experts, not automate and replace them. We do, however, favor certain safeguards against the propagation of errors, intentional or inadvertent, and advocate within the legal community for continuous active training of lawyers in search techniques and ethics.

Our participation in the 2015 TREC Total Recall Track, the research questions we posed, and the experiments we performed, were not in any manner designed or intended to attempt to resolve this current methodology dispute with the Administrators of this Track. In fact, it was only at the 2015 Conference that we fully understood the extent of these differences. Although Grossman and Cormack did individually participate in this Track, as well as administer it, and so too did other groups from Cormack’s university, they did not participate in the manual Athome division that we did. To our knowledge the Total Recall track was not designed to address this newly emerging disagreement in preferred methodologies, nor to advance any one particular methodology. Still, we would concede that, subject to normal caveats, some indirect lessons can be derived on this issue from the Total Recall Track results.


3. HYBRID MULTIMODAL APPROACH

The e-Discovery Team approach includes all types of search methods, with primary reliance placed on predictive coding and the use of high-ranked documents for continuous active training. In that way it is similar to the approach used by Grossman and Cormack,[11] but differs in that the Team uses a multimodal selection of search methods to locate suitable training documents, including high ranking documents, some mid-level ranked uncertain documents, and all other search methods, including keyword search, similarity search, concept search and even occasional use of linear review and random searches. The various types of searches usually included in the Team’s multimodal approach are shown in the search pyramid, below.
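The interplay of a manual keyword search with continuous active training on high-ranked documents can be sketched in a few lines. This is only a toy illustration under stated assumptions: EDR’s classifier is proprietary and is not reproduced here, so a crude term-count scorer stands in for it, and the corpus, the placeholder bill number “0000”, the function names and the stopping rule are all invented for the example.

```python
from collections import Counter


def tokenize(text):
    return text.lower().split()


def train(labeled_texts):
    """Toy stand-in for a classifier: +1 per term seen in a relevant doc, -1 in an irrelevant one."""
    weights = Counter()
    for text, is_relevant in labeled_texts:
        for term in set(tokenize(text)):
            weights[term] += 1 if is_relevant else -1
    return weights


def score(weights, text):
    return sum(weights[term] for term in set(tokenize(text)))


def hybrid_cal(corpus, oracle, seed_keywords, batch_size=1, max_rounds=10):
    """Seed with a keyword search (manual mode), then repeatedly review the top-ranked documents."""
    labels = {}
    for doc_id, text in corpus.items():
        if any(keyword in tokenize(text) for keyword in seed_keywords):
            labels[doc_id] = oracle(doc_id)           # human review of keyword hits
    for _ in range(max_rounds):
        weights = train([(corpus[d], y) for d, y in labels.items()])
        unlabeled = [d for d in corpus if d not in labels]
        ranked = sorted(unlabeled, key=lambda d: score(weights, corpus[d]), reverse=True)
        batch = ranked[:batch_size]
        # Assumed stopping rule: stop when nothing left scores above zero.
        if not batch or all(score(weights, corpus[d]) <= 0 for d in batch):
            break
        for doc_id in batch:
            labels[doc_id] = oracle(doc_id)           # review feeds the next training round
    return labels


# Invented toy corpus; "0000" is a placeholder, not a real bill number.
corpus = {
    "a": "scarlet letter law adoption notice",
    "b": "house bill 0000 adoption notice repeal",
    "c": "manatee speed zones boating",
    "d": "senate bill 0000 repeal vote",
    "e": "school funding budget vote",
}
truly_relevant = {"a", "b", "d"}
labels = hybrid_cal(corpus, lambda d: d in truly_relevant, seed_keywords=["scarlet"])
found = {d for d, is_relevant in labels.items() if is_relevant}
print(sorted(found), "reviewed:", sorted(labels))
```

In this toy run the keyword seed finds only document "a", and the ranking rounds then surface the two bill-number documents that never mention the nickname, while document "c" is never reviewed. A real workflow would add the other search modes named above (similarity, concept, random and linear review) as further ways of choosing what to review next.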

The standard eight-step workflow used by the Team in legal search projects is shown in the diagram below. A step-by-step description of the workflow can be found in e-Discovery Team writings.[12] The application of this methodology can be seen in the Team’s description of their work in each of the thirty Topics that is included in the Appendix. Our usual steps One, Three and Seven had to be omitted or severely constrained to meet the TREC experiment format.


Standard steps Three and Seven of the workflow were omitted to meet the time requirements of completing every review project in 1.5 days. Skipping these steps allowed us to complete 30 review projects in 45 days in the Team’s spare time, but had a detrimental impact.

Our usual first step, ESI Discovery Communications, is where our information needs are established. This had to be omitted to fit the format of the Recall Track Athome experiments. The only communication under the TREC protocol was a very short, often just two-word description of relevance, plus instant feedback in the form of yes or no responses as to whether particular documents submitted were relevant. In the e-Discovery Team’s typical workflow, discovery communications typically involve: 1) detailed requests for information contained in court documents such as subpoenas or Request For Production; 2) input from a qualified SME, who is typically a legal expert with deep knowledge of the factual issues in the case and how the presiding judge in the legal proceeding will likely rule on borderline relevant issues; and, 3) dialogues with the client, witnesses, and with the party requesting the production of documents to clarify the search target.

The Team never receives a request for production with just two or three word descriptions as encountered in the TREC experiments. When the Team receives vague requests, which is common, the Team seeks clarification in discussions (Step One). In practice, if there is disagreement as to relevance between the parties, which is also common, the presiding judge is asked to make relevance rulings. Again, none of this was possible in the TREC experiments.
All of our usual practices in Step One had to be adjusted to the submissions format of the 30 Athome Topics. The most profound impact of these adjustments was that the attorneys on the Team often lacked a clear understanding as to the intended scope of relevance and the rationale behind the automated TREC relevance rulings on particular documents. These protocol changes had the impact of minimizing the importance of the SME role on the active machine learning process. Instead, this role was often shifted almost entirely to the analytics of the EDR software. The software analytics could often see patterns, and correctly predict relevance, that the human attorney reviewers could not (often, but not always, because the human reviewers disagreed with the TREC assessors’ human judgment of ground truth in several topics, and otherwise could not follow or see any logic to the documents returned as relevant).

This minimization of the importance of the SME role is not common in legal search, where attorney reviewers always have some sort of understanding of relevance. The role of the SME in the Team’s decades of experience in legal search has always been important to help ensure high quality, trustworthy results. Contrary to the unfortunate popular belief among laypersons going back to the time of Shakespeare,[13] the vast majority of legal professionals maintain very high standards of ethics and trustworthiness. In spite of the alleged negative influences of the centuries-old adversarial tradition of the common law, attorneys are dedicated to uncovering the truth, the whole truth, and nothing but the truth, regardless of the particular case impact. Any notion of inherent bias by attorneys is misplaced. It is, after all, attorneys who control the discovery process and define relevance, and attorneys, not robots or scientists, who make the production of relevant documents to the other side.[14]

Scientific research is better served when driven by reason and objective measurements, not prejudices and assumptions about an entire profession and our common law system of justice, based as it is on an adversarial truth seeking process. The e-Discovery Team will continue to look for ways to improve quality control, guard against inadvertent errors, which always exist in any human endeavor, and identify intentional errors, which rarely exist in legal search but, we concede, may sometimes take place. For that reason we will explore greater reliance on automated processes and other quality control techniques in our future research.[15] We will not, however, abandon a hybrid approach where a human remains, if not in control, then at least as an active partner, out of any subjective prejudices against lawyers. We also refuse to accept the unproven assumption that our adversarial system is inherently suspect, encourages bias, and otherwise requires that humans be removed from e-discovery and replaced by robots. Conversely, we do not naively assume lawyers are automatically superior to machines. We have long advocated against the current legal standard of only using manual review of every document. The Team’s hybrid approach aims for a proportional balance.

4. EXPERIMENTS AND DISCUSSIONS

The e-Discovery Team sought to answer the three previously listed Research Questions in its experiments at the 2015 TREC Total Recall Track.

4.1 First and Primary Research Question.

What Recall, Precision and Effort levels will the e-Discovery Team attain in TREC test conditions over all 30 Topics using the Team’s Predictive Coding 3.0 hybrid multimodal search methods and Kroll Ontrack’s software, eDiscovery.com Review (EDR)?

We primarily measured effort by the number of documents that were actually human-reviewed and coded relevant or irrelevant. The Team human-reviewed only 32,916 documents to classify 16,576,798 documents. As an additional measure of effort, we estimated our total time spent on all Topics. The Team spent 45 days doing all of the work, with an estimated average of 8 hours per day total expended by the Team. (All Team members carried on their normal employment activities on only a somewhat reduced basis during the 45 days of the review, and TREC work was also reduced on most weekends.) The estimated total hours spent by Team members for both analysis and review is thus approximately 360 hours.

It is typical in legal search to try to measure the efficiency of a document review by the number of documents classified in an hour. For instance, a typical contract review attorney can classify an average of 50 documents per hour. Here, using Predictive Coding 3.0, our Team classified 16,576,798 documents in 360 hours. That is an average speed of 46,047 files per hour.


In legal search it is also typical, indeed mandatory, to measure the costs of review and bill clients accordingly. If we here assume a high attorney hourly rate of $500 per hour, then the total cost of the review of all 30 Topics would be $180,000. That is a cost of about $0.01 per document. In a traditional legal review, where a lawyer reviews one document at a time, the cost would be far higher. Even if you assume a low attorney rate of $50 per hour, and review speed of 50 files per hour, the total cost to review would be $16,576,798. That is a cost of $1.00 per document, which is actually low by legal search standards.[16]
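The throughput and cost figures above follow from simple arithmetic, reproduced here as a short sketch. All inputs are the figures stated in the text (45 days at an estimated 8 hours per day, the $500 and $50 hourly rates, and 50 documents per hour for linear review); only the variable names are invented.

```python
TOTAL_DOCS = 16_576_798        # documents classified across all 30 Topics
TEAM_HOURS = 45 * 8            # 45 days at an estimated 8 hours per day
HIGH_RATE = 500                # assumed high attorney rate, dollars per hour
LOW_RATE = 50                  # assumed low attorney rate, dollars per hour
LINEAR_SPEED = 50              # documents per hour in one-at-a-time review

# Hybrid multimodal review as performed by the Team.
team_speed = TOTAL_DOCS / TEAM_HOURS            # documents classified per hour
team_cost = TEAM_HOURS * HIGH_RATE              # total dollars
team_cost_per_doc = team_cost / TOTAL_DOCS      # dollars per document

# Hypothetical traditional linear review of the same collections.
linear_hours = TOTAL_DOCS / LINEAR_SPEED
linear_cost = TOTAL_DOCS * LOW_RATE / LINEAR_SPEED
linear_cost_per_doc = linear_cost / TOTAL_DOCS

print(f"Team: {team_speed:,.0f} docs/hour, ${team_cost:,} total, ${team_cost_per_doc:.4f} per doc")
print(f"Linear: {linear_hours:,.0f} hours, ${linear_cost:,.0f} total, ${linear_cost_per_doc:.2f} per doc")
```

Run as written, the Team’s cost works out to roughly 1.1 cents per document, against one dollar per document for the linear-review scenario.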

Analysis of project duration is also very important in legal search. Instead of the 360 hours expended by our Team using Predictive Coding 3.0, traditional linear review would have taken 331,536 hours (16,576,798/50). In other words, what we did in 45 days, taking 360 hours, would have taken a team of two lawyers using traditional methods over 45 years.

Complete details and descriptions of the ad hoc methods employed in all thirty topics are included in the Appendix.

4.2 Research Question No. 2.

How will the Team’s results using its semi-automated, supervised learning method compare with other Recall Track participants using semi-automated supervised learning methods?

Unfortunately no other Athome participants completed all thirty topics and only one completed all ten Bush email topics. The lack of participation by others in the Athome group makes meaningful comparisons very difficult or impossible, but we note that the e-Discovery Team’s scores were consistently higher than those of any other Athome participants.

The Sandbox participants’ work included the same three datasets as Athome, but none of them also participated in the Athome division. This is unfortunate because it makes direct comparisons problematic, if not impossible, especially as to the software systems used. Still, with some caveats, a few limited comparisons are possible between the two divisions because the same topics and datasets were searched.

4.3 Research Question No. 3.

What are the ideal ratios, if any, for relevant and irrelevant training examples to maximize effectiveness of active machine learning with EDR?

The Team experimented with various positive and negative training ratios using the predictive coding training features of their software. Most of these experiments were post hoc, but some were carried out during the initial TREC submissions. In some of the thirty topics our review work would have been concluded earlier but for these side experiments.

5. RESULTS

5.1 Research Question No. 1.
The TREC measured results demonstrated high levels of Recall and Precision with relatively little human review effort using the e-Discovery Team’s methods and EDR. The three-man attorney Team was able to review and classify 16,576,798 documents in 45 days under difficult TREC test conditions. They attained total Recall of all relevant documents in all 30 Topics by human review of only 32,916 documents. They did so with two-man attorney teams in the 10 Bush Email Topics, and one-attorney teams in the 20 other Topics. In Topic 3484, which searched a collection of 902,434 News Articles, the Team attained both 100% Recall and 100% Precision. On many other Topics the Team attained near-perfect scores. In total, very high scores were recorded in 18 of the 30 topics with good results obtained in all, especially when considering the low human efforts involved in the supervised learning. Moreover, the Team’s F1 scores at the time of Reasonable Call ranged from a perfect score of 100% in Topic 3484, to 91% to 99% in eight topics, and 82%-87% in five others.
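For readers less familiar with the measure, F1 is the harmonic mean of Precision and Recall. The short sketch below applies that standard definition to the Topic 106 figures reported in the highlights later in this section (98.47% Recall with 97.22% Precision at the Reasonable Call); the function name is ours, and the sketch is not taken from the TREC evaluation tools.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall, both given as fractions between 0 and 1."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Topic 106 at the time of Reasonable Call, as reported in the highlights.
topic_106_f1 = f1_score(precision=0.9722, recall=0.9847)
print(f"Topic 106 F1: {topic_106_f1:.2%}")   # prints "Topic 106 F1: 97.84%"

# A perfect run such as Topic 3484 (100% Precision and 100% Recall) gives an F1 of 1.0.
print(f1_score(1.0, 1.0))
```

Because the harmonic mean is pulled toward the lower of the two inputs, a high F1 at the stop point requires that both Recall and Precision be high at the same moment.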


Considering the limited human effort put into the reviews, and the speed of the reviews, we consider the results in all Topics to be excellent. As shown by the comparisons with traditional review discussed above, these results are far superior to the typical linear legal document review done by law firm attorneys and contract review attorneys.

The efforts by number of documents human reviewed in all thirty topics are shown in Figure 1 below. As you can see, the Team reviewed 32,916 documents to attain total recall of the 70,414 documents predetermined by TREC as relevant in all 30 Topics from out of a total of 16,576,798 documents. The average number of documents reviewed to attain total Recall in each topic was 1,097. The figure ranged from a low of 19 documents reviewed in Topic 2134 (PayPal), which had 252 relevant documents, to a high of 7,203 in Topic 103 (Manatee Protection), which had 5,725 relevant documents.
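The effort summary in this paragraph can be re-derived from the totals it quotes. The sketch below is illustrative only: the helper name `review_ratio` and the dictionary layout are assumptions, and only the two extreme topics named in the text are included.

```python
DOCS_REVIEWED = 32_916
TOTAL_CLASSIFIED = 16_576_798
TOPICS = 30

average_per_topic = DOCS_REVIEWED / TOPICS
share_reviewed = DOCS_REVIEWED / TOTAL_CLASSIFIED


def review_ratio(reviewed, relevant):
    """Documents human reviewed per relevant document in a topic."""
    return reviewed / relevant


# (documents reviewed to reach total Recall, documents pre-judged relevant)
extremes = {
    "Topic 2134 (PayPal)": (19, 252),
    "Topic 103 (Manatee Protection)": (7_203, 5_725),
}

print(f"Average reviewed per topic: {average_per_topic:,.0f}")
print(f"Share of all documents human reviewed: {share_reviewed:.2%}")
for name, (reviewed, relevant) in extremes.items():
    print(f"{name}: {review_ratio(reviewed, relevant):.2f} reviewed per relevant document")
```

The two extremes differ by more than an order of magnitude in review effort per relevant document, which is one reason the per-topic breakdown is more informative than the overall average.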

Figure 1: Effort (Docs reviewed) by RECALL SCORES. The six percentage columns give the number of documents reviewed to reach that Recall level.

Topic | Need | Total Documents | Total Relevant | 70% | 80% | 90% | 95% | 97.5% | 100%
Topic 100 | School and Preschool Funding | 290,099 | 4,542 | 651 | 651 | 651 | 651 | 651 | 651
Topic 101 | Judicial Selection | 290,099 | 5,834 | 6,841 | 6,895 | 6,895 | 6,895 | 6,895 | 6,896
Topic 102 | Capital Punishment | 290,099 | 1,624 | 1,493 | 1,493 | 1,493 | 1,493 | 1,493 | 1,493
Topic 103 | Manatee Protection | 290,099 | 5,725 | 7,203 | 7,203 | 7,203 | 7,203 | 7,203 | 7,203
Topic 104 | New Medical Schools | 290,099 | 227 | 1,091 | 1,091 | 1,091 | 1,091 | 1,091 | 1,091
Topic 105 | Affirmative Action | 290,099 | 3,635 | 582 | 582 | 582 | 674 | 674 | 674
Topic 106 | Terri Schiavo | 290,099 | 17,135 | 831 | 1,987 | 1,995 | 2,005 | 2,025 | 2,226
Topic 107 | Tort Reform | 290,099 | 2,369 | 877 | 1,142 | 1,164 | 1,164 | 1,164 | 1,164
Topic 108 | Manatee County | 290,099 | 2,375 | 696 | 696 | 696 | 696 | 696 | 696
Topic 109 | Scarlet Letter Law | 290,099 | 506 | 491 | 496 | 639 | 753 | 753 | 753
Topic 2052 | Paying for Amazon Book Reviews | 465,147 | 265 | 1,842 | 1,960 | 2,213 | 2,325 | 2,325 | 2,325
Topic 2108 | CAPTCHA Services | 465,147 | 656 | 2,101 | 2,101 | 2,101 | 2,101 | 2,101 | 2,101
Topic 2129 | Facebook Accounts | 465,147 | 589 | 94 | 94 | 94 | 94 | 94 | 94
Topic 2130 | Surely Bitcoins can be Used | 465,147 | 2,299 | 283 | 283 | 285 | 285 | 285 | 285
Topic 2134 | PayPal Accounts | 465,147 | 252 | 19 | 19 | 19 | 19 | 19 | 19
Topic 2158 | Using TOR for Anonymous Internet Browsing | 465,147 | 1,261 | 1,332 | 1,332 | 1,332 | 1,332 | 1,332 | 1,335
Topic 2225 | Rootkits | 465,147 | 182 | 183 | 186 | 205 | 214 | 219 | 225
Topic 2322 | Web Scraping | 465,147 | 10,145 | 194 | 195 | 195 | 195 | 195 | 195
Topic 2333 | Article Spinner Spinning | 465,147 | 4,805 | 190 | 228 | 228 | 228 | 228 | 228
Topic 2461 | Offshore Host Sites | 465,147 | 179 | 32 | 32 | 32 | 32 | 32 | 32
Topic 3089 | Pickton Murders | 902,434 | 255 | 472 | 516 | 779 | 834 | 834 | 836
Topic 3133 | Pacific Gateway | 902,434 | 113 | 49 | 49 | 49 | 49 | 49 | 49
Topic 3226 | Traffic Enforcement Cameras | 902,434 | 2,094 | 18 | 18 | 18 | 78 | 81 | 81
Topic 3290 | Rooster Turkey Chicken Nuisance | 902,434 | 26 | 137 | 191 | 306 | 306 | 310 | 310
Topic 3357 | Occupy Vancouver | 902,434 | 629 | 751 | 751 | 920 | 920 | 920 | 920
Topic 3378 | Rob McKenna Gubernatorial Candidate | 902,434 | 66 | 79 | 161 | 200 | 200 | 200 | 200
Topic 3423 | Rob Ford Cut the Waist | 902,434 | 76 | 92 | 92 | 92 | 92 | 92 | 92
Topic 3431 | Kingston Mills Lock Murders | 902,434 | 1,111 | 272 | 272 | 272 | 272 | 272 | 302
Topic 3481 | Fracking | 902,434 | 1,966 | 31 | 236 | 367 | 367 | 367 | 367
Topic 3484 | Paul and Cathy Lee Martin | 902,434 | 23 | 22 | 22 | 22 | 22 | 73 | 73
TOTALS | | 16,576,800 | 70,964 | 28,949 | 30,974 | 32,138 | 32,590 | 32,673 | 32,916

The Team’s attainment of high levels of Recall and Precision in multiple projects confirms the hypothesis that EDR software and the Team’s Predictive Coding 3.0 hybrid multimodal methods are effective in most projects at attaining high levels of Recall and Precision with minimal human efforts.

The charts below summarize, for each of the three datasets, the Precision results obtained in each topic at 70% or higher Recall levels. Precision is shown on the left and Recall levels attained by submissions are shown on the bottom. A different colored line shows each Topic. Although Precision was not the focus of the Team’s efforts in its Recall Track participation (the focus was instead on Recall and effort), the measurements of Precision across the Recall levels still provide valuable insights into the overall work. Figure 2 below shows the results of the 10 Topics in the Jeb Bush Email collection of 290,099 emails. Figure 3 shows the results of the 10 Topics in the Black Hat World Forum collection of 465,149 posts, and Figure 4 shows the results of the News Articles collection of 902,434 articles.

Figure 2 (Precision by Recall level, Bush Email Topics)

A quick examination of the results of the Bush Email Topics shows that four of the ten Topics had significantly less Precision in attaining 80% or higher Recall than the others. They are: Topic 104 New Medical Schools, shown in purple; Topic 100 School and Preschool Funding, shown in blue; Topic 102 Capital Punishment, shown in green; and Topic 108 Manatee County. Topic 108 was probably the most error-filled of all of the Topic standards, and this may explain part of the outlier results for that topic and others in this low performing group. Investigation of the outliers showed that the primary cause of these results was disagreement between the Team’s lead attorney for the Bush email, a lifelong Florida resident who is used to serving as the SME defining ground truth, and the TREC assessors’ relevance determinations. Also, these ten Bush topics were carried out at the beginning of the project, before the Team adopted the counter strategy of greater reliance on machine ranking to mitigate the impact of the personal judgment disagreements.


Figure 3 (Precision by Recall level, Black Hat World Topics)

Analysis of the results of the ten Topics in Black Hat World also indicated that the relevance disagreements accounted for most of the discrepancies.

It appears that errors and inconsistencies in the TREC standard judging explain most of the Precision differences among the Topics, especially the Topics in the Black Hat World dataset. In several of these Topics the Team often had difficulty detecting any logical pattern to the relevance scope. They instead, as mentioned, had to rely almost entirely on the EDR relevance predictions. Only the Team’s software in some of these Topics could detect any connectivity and pattern to the TREC relevance standards.

The results on the local News dataset of 902,434 articles (Figure 4 below) again show significant divergences in Precision, although less than the differences seen in the Bush Email or Black Hat World datasets. Analysis of the results of the ten News Articles Topics again shows considerable disagreement on relevance judgments in some topics. Inherent difficulty of the various issues in the Topics may also explain some of the differences. The size of the relevance pool also has a direct relationship to the Precision.


Figure 3

The following results are highlights of the Team’s top 18 topics where at least seventy-five percent of the target documents (Recall 75%+) were found with a Precision rate of 80% or higher. The Top-18 Projects of the Team are ranked by us, somewhat arbitrarily, as follows, starting with a previously unheard of perfect score.

1. In Topic 3484 (Paul & Kathy Martin), the e-Discovery Team (Jim Sullivan) attained a perfect score of 100% Precision and 100% Recall. All 23 of the target documents were found in the first 23 documents submitted. Sullivan then called Reasonable after the 23rd relevant document was submitted and so played the perfect game. He predicted that the remaining 902,411 articles in the News collection would be irrelevant. Sullivan was right. The effort expended for perfection was his personal review of 73 news reports out of the total collection of 902,434. 100% Recall with 100% Precision in a large search project was previously thought impossible by most text retrieval experts.

2. In Topic 3431 (Kingston Mills Murders), 100% Recall was attained by the Team (Tony Reichenberger) with 82.3% Precision. He attained 97.5% Recall with a Precision of 98.9%, and 95% Recall with 99% Precision. The effort expended to reach 100% Recall was his personal review of 332 news reports out of the total collection of 902,434.

3. In Topic 106 (Terri Schiavo), which had the highest prevalence of any topic (5.9%), 98.47% Recall was attained by the Team (Ralph Losey) with 97.22% Precision. At that time, after submitting 2,025 documents, he called reasonable. The F1 measure then attained was 97.84%. The effort, or number of documents reviewed and coded by Losey to attain this result, was 2,025 Bush emails, out of the total collection of 290,099, and total relevant of 17,135. A contract review attorney, whose standard billing rate is one-tenth that of Losey’s, assisted in the review effort. Losey also attained 99.7% Recall in this Topic with a Precision of 70%.

4. In Topic 2158 (Using TOR), the Team (Jim Sullivan) attained 97.5% Recall of the target while maintaining a Precision of 95%. He attained 95% Recall with a Precision of 98.4%, and 90% Recall with 99% Precision. The effort expended to reach 97.5% Recall was his personal review of 1,332 Black Hat Forum posts, out of the total collection of 465,149.

5. In Topic 103 (Manatee Protection), which had the third highest Prevalence of 1.97%, the Team (Ralph Losey) attained 97.5% Recall with a Precision of 90.6%, 95% Recall with a Precision of 98.8%, and 90% Recall with 99.3% Precision. The effort expended to reach 97.5% Recall was his personal review of 7,203 Bush emails, out of the total collection of 290,099. Again he was assisted by a contract review attorney. The high review count here is due to the fact this is one of two projects where the Predictive Coding 3.0 second step of random sampling was included. This is also the first project undertaken.

6. In Topic 109 (Scarlet Letter Law), the Team (Ralph Losey) attained 97.5% Recall with 84.4% Precision, 95% Recall with 95.4% Precision, and 90% Recall with 96% Precision. The effort expended to reach 97.5% Recall was his personal review of 753 Bush emails, again out of the total collection of 290,099. One contract review attorney assisted.

7. In Topic 3378 (Rob McKenna), the Team (Tony Reichenberger) attained 100% Recall after the submission of only 192 documents and review of only 200 documents. This was a low prevalence Topic with only 66 relevant out of the total collection of 902,434. For these reasons the Precision was 34.31%, even though only 192 documents were submitted to attain 100% Recall.

The Team results exceeded expectations, where our Recall goal was 90%, in many additional Topics:

8. In Topic 3481 (Fracking), the Team (Jim Sullivan) attained 95% Recall with 95.2% Precision by reviewing only 367 news articles.

9. In Topic 105 (Affirmative Action), the Team (Ralph Losey) attained 90% Recall with 99.7% Precision by reviewing only 582 emails (one contract review attorney assisted).

10. In Topic 3089 (Pickton Murders), the Team (Joe White) attained 90% Recall with 97.9% Precision by reviewing only 779 articles. A 99.61% Recall level was attained with 54.98% Precision, again with review of only 799 articles.

11. In Topic 3226 (Traffic Cameras), the Team (Jim Sullivan) attained 90% Recall with 95.9% Precision by his personal review of only 18 forum posts.

12. In Topic 101 (Judicial Selection), which had the second highest Prevalence rate of 2%, the Team (Ralph Losey) attained 90% Recall with 87.8% Precision by reviewing 6,895 emails (one contract review attorney assisted).

13. In Topic 3357 (Occupy Vancouver), the Team (Tony Reichenberger) attained 90% Recall with 82.4% Precision by reviewing only 920 news articles.

14. In Topic 107 (Tort Reform), the Team (Ralph Losey) attained 90% Recall with 80.9% Precision by reviewing only 1,164 emails (one contract review attorney assisted).

Four additional Topics also did quite well, and attained Recall levels over 75% with high Precision rates:

15. In Topic 2225 (Rootkits), the Team (Ralph Losey) attained 80% Recall with 88% Precision by reviewing only 186 forum posts.

16. In Topic 2333 (Article Spinner), the Team (Ralph Losey) attained 80% Recall with 79% Precision by reviewing only 228 forum posts.

17. In Topic 2052 (Paying for Book Reviews), the Team (Jim Sullivan) attained 80% Recall with 73.4% Precision by reviewing 1,960 forum posts.

18. In Topic 3133 (Pacific Gateway), the Team (Ralph Losey) attained 76.99% Recall with 89.69% Precision by reviewing only 49 News Articles.

Figure 5 below shows the recall and precision of these top 18 projects.
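The F1 and prevalence figures quoted in the list above follow from simple arithmetic. As a check, the short sketch below recomputes the Topic 106 F1 from its reported Recall and Precision, the Topic 106 prevalence from the reported document counts, and the Topic 3484 score. The helper names `f1_score` and `prevalence` are introduced here for illustration; the inputs are the numbers reported in items 1 and 3.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both as fractions in [0, 1])."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def prevalence(relevant: int, collection_size: int) -> float:
    """Share of the collection that is relevant."""
    return relevant / collection_size


# Topic 106 figures as reported in item 3.
topic_106_f1 = f1_score(precision=0.9722, recall=0.9847)
topic_106_prevalence = prevalence(17_135, 290_099)

# Topic 3484 as reported in item 1: 23 relevant documents, all found in the
# first 23 documents submitted.
topic_3484_f1 = f1_score(precision=23 / 23, recall=23 / 23)

print(f"Topic 106 F1: {topic_106_f1:.2%}")                  # 97.84%
print(f"Topic 106 prevalence: {topic_106_prevalence:.1%}")  # 5.9%
print(f"Topic 3484 F1: {topic_3484_f1:.0%}")                # 100%
```

Because F1 is a harmonic mean, it is pulled toward the lower of the two inputs, which is why a topic with very high Recall but modest Precision still scores well below 90%.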


Figure 5

The Team’s lower performance in the other 12 projects was, according to our analysis, primarily caused by the fact that the attorney Team members are accustomed to self-defining the ground truth, and their opinions on relevance differed significantly from those of the TREC assessors. In later topics the attorney Team learned to turn off their own judgments and rely primarily on their software’s automated processes, at which point their scores improved. In all topics the machine learning of the Team’s EDR software was able to find documents that TREC would consider relevant, even where the human team members could see no connection. But in some topics the human searchers would be completely bewildered by the zigzag relevance scope shown by TREC’s response to submissions. The attorneys would not see any kind of logical connecting pattern to some of the documents that TREC determined to be relevant. Sometimes the attorneys only saw wrong answers and inconsistencies. Even though the attorneys could not see any pattern, they learned that their EDR software could often still find the patterns and correctly predict which documents TREC would label relevant. When this happened they would in effect turn all submission decisions over to EDR and only submit the highest-ranking documents. The cut-off point of ranking for submissions, be it top 5% or top 100 documents, or some other scheme, was still determined by the human in charge. That is part of the Team’s hybrid design.
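The division of labor described above, in which the software ranks and the human chooses the cut-off, can be sketched as follows. This is a hypothetical illustration only: `select_for_submission`, its parameters, and the toy scores are assumptions made here and do not describe EDR’s actual interface or ranking output.

```python
from typing import Dict, List, Optional


def select_for_submission(scores: Dict[str, float],
                          top_n: Optional[int] = None,
                          top_fraction: Optional[float] = None) -> List[str]:
    """Return the highest-ranked document ids under a human-chosen cut-off.

    Exactly one of top_n (e.g. 100) or top_fraction (e.g. 0.05) must be given.
    """
    if (top_n is None) == (top_fraction is None):
        raise ValueError("choose exactly one cut-off scheme")
    # Rank document ids by predicted relevance, highest first.
    ranked = sorted(scores, key=scores.get, reverse=True)
    if top_fraction is not None:
        top_n = max(1, int(len(ranked) * top_fraction))
    return ranked[:top_n]


# Hypothetical relevance predictions for five documents.
toy_scores = {"doc-a": 0.91, "doc-b": 0.15, "doc-c": 0.78, "doc-d": 0.40, "doc-e": 0.99}
print(select_for_submission(toy_scores, top_n=2))
print(select_for_submission(toy_scores, top_fraction=0.4))
```

The point of the sketch is that the ranking is fully automatic while the cut-off scheme remains an explicit argument supplied by the person in charge.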
There are probably other explanations for the bottom twelve scoring topics aside from questionable TREC assessor adjudications, including: the data itself; the difficulty of the issues addressed in the Topic; relative performance of human reviewers; and the impact of the omission of Steps Three and Seven from the Team’s standard workflow to meet the 45 day time limitation, and the radical change to Step One. See: Concept Drift and Consistency: Two Keys to Document Review Quality, e-Discovery Team (Jan. 20, 2016). Not all of the Team’s inconsistencies were caused by differences of opinion on TREC relevance adjudications, only some. We appreciate the difficulty of creating interesting topics for such a diverse group of participants, most of whom used fully automated CAL approaches. We understand the inherent difficulties in setting a ground truth for prejudged relevance where the traditional TREC pooling methods could not be used.[17] In spite of our criticisms here, we overall have high praise and thanks for the TREC administrators’ tireless efforts and agree with the majority of the assessments they made under difficult, time constrained conditions.


Regardless of these issues and metric inconsistencies, the Team’s manual efforts, as measured by time expended and number of documents manually reviewed, were consistently very low in all topics. More than half of the relevant documents found were not manually reviewed. Instead, the Team was routinely able to delegate relevance coding to the EDR software, either by choice and convenience, or sometimes, as discussed, by necessity in the topics where the ground truth of relevance was unknown and incomprehensible to the attorneys. This result should shatter once and for all the already weakened legal search myth that all documents must be manually reviewed for relevance.

Although not directly comparable due to different test conditions, different searches, etc., the e-Discovery Team’s scores were far higher than any previously recorded in the six years of TREC Legal Track (2006-2011)[18] or any other study of legal search.[19] The results of Blair and Maron and TREC from 2007 to 2011 are summarized below in Figure 6 with F1 scores.

Figure 6

This is not a listing of the average score per year; such scores would be far, far lower. Rather this shows the very best effort attained by any participant in that year in any topic. These are the highest scores from each TREC year. Note how they compare with the Team’s high scores in 2015, Figure 7.

Figure 7


One reason for this significant jump in high scores may be that many of the thirty topics in the 2015 Total Recall Track presented relatively simple information needs by legal search standards, with one major exception, Topic 109 (Scarlet Letter Law). It required some legal knowledge and analysis. There were also four other minor exceptions (Topics 101, 105, 106 and 107) that required some measure of legal analysis. Another explanation may be improved software and the Team’s hybrid multimodal method that includes continuous active learning. The latter is strongly suggested because the results in Topic 109, as well as Topics 101, 105, 106 and 107, are close to typical legal search type projects and the Team’s results in these topics were all consistently high: Topic 109 (Scarlet Letter Law) - 95% F1 at Reasonable Call; Topic 101 (Judicial Selection) - 87% F1 at Reasonable Call; Topic 105 (Affirmative Action) - 95% F1 at Reasonable Call; Topic 106 (Terri Schiavo) - 98% F1 at Reasonable Call; Topic 107 (Tort Reform) - 84% F1 at Reasonable Call. This is shown in Figure 8 below.

Figure 8

5.2 Research Question No. 2.

The Team attained very high recall and precision rates in most, but not all, of the thirty Total Recall topics. The Team’s F1 scores at the time of Reasonable Call ranged from a perfect score of 100% in one topic (3484), to 91% to 99% in eight topics, and 82%-87% in five others.

Although, of course, not directly comparable, these scores are far higher than any previously recorded in the six years of TREC Legal Track (2006-2011) or any other study of legal search. One reason for this may be that the thirty topics in the 2015 Total Recall track presented relatively simple information needs by legal search standards, with one exception (Topic 109, Scarlet Letter Law). Another may be improved software and the Team’s hybrid multimodal method that includes continuous active learning.

Since most of the thirty topics presented only simple, single-issue information needs suitable for single-facet classification, they had somewhat limited value for purposes of legal search experimentation. Further, only a few of the topics required any legal analysis for relevance identification. This again limited the use of these experiments for purposes of legal search research. These two factors, plus the omission of metadata, were a disadvantage to the e-Discovery Team of lawyers who are practiced in more complex information needs requiring extensive legal analysis and SME defined ground truths. Further, their methods and EDR software are designed to utilize full metadata derived from native files. Conversely, it appears that these same factors made it simpler for the Sandbox participants to perform well in most topics.

The one exception was Topic 109, Scarlet Letter Law, which, as mentioned, was the only topic requiring legal analysis and some very rudimentary knowledge to begin locating relevant documents. The keywords alone, “Scarlet Letter Law,” would only find relevant documents with this word combination and similar text patterns. These words were just the nickname of the proposed and eventually enacted Florida Statute. Any attorney would know that to find relevant information they would not only have to search the name, they would also have to search the various house and senate bill numbers for this law. These numbers would not often appear in the same document as the nickname, and since the machine did not know to search for these numbers, it did not realize their significance. Eventually the automated machine learning saw the connection, after many relevance feedback submissions. These submissions would, of course, not happen in real legal search, and even if they did, this imprecision would equate to substantial additional human reviews and thus expense.

Somewhat surprisingly to us, the fully automatic methods employed by the Sandbox participants attained recall and precision scores comparable to those of the e-Discovery Team in most of the topics. Moreover, there were few differences between the various fully automated approaches. Still, the highest F1 values at the time of Reasonable Call were attained by the e-Discovery Team in twenty of the thirty topics, and the second or third best F1 scores in four others. This is shown in Figure 9 below. The Team F1 rankings for each topic are shown in the third column.
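The counts of first-place and second-or-third-place finishes stated above can be reproduced by tallying the Team’s per-topic ranks. The sketch below does this with the e-Discovery Team ranks transcribed from Figure 9; the variable names are introduced here for illustration.

```python
from collections import Counter

# e-Discovery Team F1 rank per topic, transcribed from Figure 9.
team_ranks = {
    100: 2, 101: 4, 102: 1, 103: 1, 104: 1,
    105: 1, 106: 1, 107: 1, 108: 5, 109: 1,
    2052: 1, 2108: 1, 2129: 6, 2130: 1, 2134: 6,
    2158: 1, 2225: 1, 2322: 3, 2333: 1, 2461: 7,
    3089: 1, 3133: 1, 3226: 4, 3290: 2, 3357: 2,
    3378: 1, 3423: 1, 3431: 1, 3481: 1, 3484: 1,
}

rank_counts = Counter(team_ranks.values())
first_place = rank_counts[1]
second_or_third = rank_counts[2] + rank_counts[3]

print(f"topics: {len(team_ranks)}")                  # 30
print(f"ranked first: {first_place}")                # 20
print(f"ranked second or third: {second_or_third}")  # 4
```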

Figure 9

F1                            100 Rank      101 Rank      102 Rank      103 Rank      104 Rank      105 Rank      106 Rank      107 Rank      108 Rank      109 Rank
eDiscoveryTeam             68.96%    2   82.45%    4   69.88%    1   90.69%    1   73.53%    1   95.07%    1   97.38%    1   84.40%    1   47.03%    5   95.58%    1
NINJA                      22.74%    8   79.17%    5   56.38%    5   83.79%    3   57.40%    4   77.24%    2   88.90%    5   50.89%    8   13.43%   11   48.79%    2
UvA.ILPS-baseline          73.55%    1   86.36%    1   56.38%    4   89.94%    2   10.27%   10   64.13%    5   95.87%    2   77.26%    4   64.47%    1   28.88%    3
UvA.ILPS-baseline2         45.56%    5   71.04%    7   42.42%    8   77.24%    6    2.42%   11   43.27%    7   84.67%    6   47.81%    9   35.13%    8   26.90%    6
WaterlooClarke-UWPAH1      11.95%    9    9.98%   11   32.16%   11   10.46%   11   68.51%    2   15.99%   10    3.61%   10   22.96%   11   21.61%    9    0.73%    8
WaterlooClarke-UWPAH2      10.37%   10    9.98%   10   32.16%   10   10.46%   10   65.93%    3   15.99%   11    3.54%   11   23.11%   10   21.54%   10    0.73%    9
WaterlooCormack-Knee100    45.02%    6   67.65%    9   42.32%    9   71.10%    9   28.49%    7   34.08%    8   77.03%    9   53.92%    7   42.65%    7    0.94%    7
WaterlooCormack-Knee1000   41.82%    7   67.67%    8   45.21%    7   71.11%    8   31.06%    5   33.90%    9   77.03%    8   57.79%    5   42.65%    6   27.17%    5
WaterlooCormack-stop2399   68.21%    3   72.02%    6   51.74%    6   75.55%    7   14.34%    9   58.92%    6   81.60%    7   57.77%    6   58.96%    2   27.17%    4
Webis-baseline             66.96%    4   83.87%    3   68.36%    2   82.42%    5   27.95%    8   64.91%    4   94.89%    4   79.24%    3   58.76%    3    0.00%   11
Webis-keyphrase             0.14%   11   85.21%    2   67.71%    3   83.15%    4   31.04%    6   65.13%    3   94.90%    3   79.24%    2   58.34%    4    0.33%   10

F1                           2052 Rank     2108 Rank     2129 Rank     2130 Rank     2134 Rank     2158 Rank     2225 Rank     2322 Rank     2333 Rank     2461 Rank
eDiscoveryTeam             45.21%    1   53.99%    1   26.10%    6   64.31%    1   12.23%    6   95.61%    1   84.90%    1   72.60%    3   73.23%    1   16.68%    7
NINJA                      58.13%    2   53.66%    2   49.22%    2   52.18%    2   39.70%    2   76.26%    2   39.43%    4   24.83%    9   62.65%    6   24.48%    5
UvA.ILPS-baseline          10.74%    3   22.74%    9   21.88%    7   41.12%    4    8.08%    7   42.02%    7    7.20%    9   73.20%    2   69.80%    2    7.33%    9
UvA.ILPS-baseline2         10.37%    4   22.45%   10   19.23%    8   30.88%    5    6.96%    8   22.47%    9    6.45%   10   48.11%    6   46.02%    9    6.53%   10
WaterlooClarke-UWPAH1      78.54%    5   52.20%    3   56.89%    1   13.42%    8   63.18%    1   40.08%    8   61.45%    2    5.85%   10   12.22%   10   49.90%    1
WaterlooCormack-Knee100    41.43%    6   33.89%    5   28.52%    5   19.49%    6   18.45%    3   16.15%   10   41.33%    3   47.39%    7   47.33%    7   43.87%    2
WaterlooCormack-Knee1000   38.10%    7   34.00%    4   30.91%    4   19.45%    7   18.45%    4   60.57%    5   27.02%    5   44.11%    8   47.30%    8   21.65%    6
WaterlooCormack-stop2399   16.94%    8   31.35%    7   31.01%    3   46.56%    3   15.51%    5   45.06%    6   11.84%    8   75.86%    1   68.87%    3   11.72%    8
Webis-baseline             13.24%    9   32.65%    6    7.73%   10    0.00%   10    2.21%   10   61.11%    4   18.36%    6   67.40%    5   68.07%    4   43.56%    3
Webis-keyphrase            10.53%   10   30.56%    8    8.29%    9    0.00%    9    2.21%    9   62.14%    3   12.97%    7   67.72%    4   68.04%    5   31.95%    4

F1                           3089 Rank     3133 Rank     3226 Rank     3290 Rank     3357 Rank     3378 Rank     3423 Rank     3431 Rank     3481 Rank     3484 Rank
eDiscoveryTeam             93.28%    1   82.46%    1   55.39%    4   37.70%    2   86.70%    2   68.21%    1   58.12%    1   99.24%    1   95.48%    1  100.00%    1
NINJA                      86.84%    2   67.97%    2   22.75%    9   38.98%    1   89.95%    1   67.88%    2   57.85%    2   74.67%    4   71.59%    2  100.00%    1
UvA.ILPS-baseline           5.47%    9    2.47%    9   37.25%    5    0.57%    9   12.75%    9    1.39%    9    1.26%    9   21.90%    7   35.00%    7    0.51%    9
UvA.ILPS-baseline2          5.35%   10    2.39%   10   34.75%    6    0.39%   10   11.82%   10    1.38%   10    0.74%   10   21.74%    8   29.19%    9    0.51%   10
WaterlooClarke-UWPAH1      76.14%    3   50.45%    3   24.73%    7   11.90%    5   62.65%    3   32.58%    4   18.65%    5   44.29%    6   26.87%   10   12.99%    6
WaterlooCormack-Knee100    57.66%    4   49.02%    4   64.61%    2   26.09%    3   55.57%    4   57.87%    3   30.70%    3   93.34%    3   53.62%    5   34.07%    4
WaterlooCormack-Knee1000   37.35%    5   18.38%    6   68.61%    1    4.59%    7   48.23%    5   11.26%    7    6.77%    7   93.77%    2   61.55%    4    4.07%    7
WaterlooCormack-stop2399   16.41%    7    8.43%    7   56.65%    3    2.01%    8   32.80%    6    5.01%    8    3.56%    8   44.78%    5   53.56%    6    1.78%    8
Webis-baseline             14.77%    8   47.06%    5   24.51%    8   19.31%    4   18.84%    7   27.37%    5   28.16%    4   19.71%    9   65.54%    3   34.59%    3
Webis-keyphrase            19.10%    6    6.40%    8   18.29%   10   10.22%    6   17.98%    8   18.23%    6   16.04%    6   19.19%   10   32.89%    8   30.08%    5

In Topic 109, Scarlet Letter Law, where some legal knowledge and analysis was required to understand relevance, the Team attained significantly better results (96% F1) at the time of Reasonable Call than did the automatic runs. In the Sandbox automatic runs the F1 values at the time of Reasonable Call ranged from 0% to 29%. Moreover, at the 1R point in Topic 109, the e-Discovery Team had attained over 95% recall, whereas all of the automated methods were still less than 1% recall. This is shown in the chart below, Figure 10.

Figure 10

The Team’s multimodal human machine approach also consistently found more relevant documents at the start of a search, and did so with greater precision than the fully automated approaches. Further, the hybrid man-machine approach was consistently more effective at determining a stop point, referred to by the Recall Track as a “Reasonable Call.” An example of this is shown in Figure 11 for Topic 109. The dark green line represents the Reasonable Call point, recall is shown on the vertical axis, and the horizontal axis is the number of documents submitted.

Figure 11


Another way to evaluate the performance of the multi-modal approach is to consider how precise the coding suggestions were during the course of review. This would indicate an efficient review, which is critical to cost savings in legal search. As to the Athome 109 topic, Figure 12 below contrasts precision percentage on the Y-axis with recall percentage on the X-axis. Precision does not begin to drop until approximately 95% Recall. Note that the green line representing percent of the database submitted barely moves off the baseline. Figure 13 shows the actual document counts reviewed and submitted in order to obtain the various precision thresholds.

Figure 12

Figure 13


For further comparison, Figure 14 below (prepared by the Total Recall administrators) plots the average Athome3 precision by recall results. The e-Discovery Team results (barely visible on top) follow a curve very similar to the Athome 109 topic. The Team’s results outperformed the automated runs for most of the duration of the process, demonstrating a consistent efficiency in results. While various automated runs experienced comparable results in the Athome1 and Athome2 sets, the consistently high level of the multimodal approach corroborates a consistent efficient process across all data sets.

Figure 14

5.3 Research Question No. 3.

The Team’s experiments with different positive negative training ratios showed that training using a 50/50 ratio of relevant to irrelevant documents performed consistently better than any other ratios. This result is believed to be specific to the proprietary type of logistic regression algorithm used in Kroll Ontrack’s EDR. It may not have applications beyond this software, or even to other more complex projects. Our work on this question continues.

6. CONCLUSIONS

The results in Topic 109 and other topics indicate that hybrid man-machine learning by skilled attorneys is, at the current time, significantly more effective at meeting complex legal search needs than fully automated approaches. This seems obvious, but more experiments on this issue are needed before this can be accurately quantified. The surprising success of the Sandbox participants using fully automated search, even though limited to non-legal topics and situations with only simple information needs, suggests that greater reliance on automated methods could be placed in legal search where the cases and needs are simple. The relatively low effort involved in automated learning, and thus low expense, is compelling, especially in view of the proportionality analysis required by law under the December 2015 Amendments to the Federal Rules of Civil Procedure. The Team has begun and will continue post hoc analysis and experiments using various hybrid methods that adjust the balance between man and machine.


We are experimenting with methods that place greater reliance on machine learning in all topics, including, but not limited to, topics with lesser complexity and information needs. We will also further investigate the use of both fully automated methods, and hybrid methods, in legal search quality control, fraud detection, and in the prediction of future wrongful conduct.[20]

The 2015 TREC Total Recall Track results also suggest that even when information needs are simple and require no complex analysis or background knowledge, as was true of most of the topics, a hybrid method outperforms fully automated methods in two ways: one, at finding relevant documents quickly and with high precision; and two, at making better stop decisions. These two considerations are very important in legal search where attorneys must find a proportional balance between recall and effort/expense. The results in all topics, even the simple ones, thus caution against over-reliance at this time on machine learning alone without proper expert supervision.

7. ACKNOWLEDGMENTS

The e-Discovery Team would like to thank Kroll Ontrack, Inc. and Jackson Lewis P.C. for their generous support of this project. We would also like to thank the many employees at Kroll Ontrack who pitched in behind the scenes, often late at night and on weekends, to help make this happen.

8. REFERENCES (Endnotes)

[1] Losey, R., Predictive Coding 3.0, part two (e-Discovery Team, 10/18/15); also see Predictive Coding Articles by Ralph Losey (collection of over 50 articles by Ralph Losey further describing the hybrid multimodal approach).

[2] The e-Discovery Team’s hybrid multimodal approach is similar to the method promoted by the Total Recall Track administrators, Maura Grossman and Gordon Cormack, in that they both use continuous active learning (CAL) in legal search as part of a technology-assisted review (TAR). It is, however, fundamentally different from Grossman and Cormack’s current methods in two ways.

First, our approach relies upon and encourages participation of skilled reviewers in the search process, the hybrid approach, whereas the Grossman and Cormack approach seeks to eliminate the role of the skilled user, namely trained attorneys. The rationale for their automation goal is the unsubstantiated claim that the adversarial context of legal search makes attorneys untrustworthy. They claim that inherent user bias means fully automated approaches are the only reliable methods of legal search. Grossman & Cormack, Autonomy and Reliability of Continuous Active Learning for Technology-Assisted Review, CoRR abs/1504.06868 at pg. 1 (2015) (“In eDiscovery, the review is typically conducted in an adversarial context, which may offer the reviewer limited incentive to conduct the best possible search.”) Obviously the Team disputes this assumption and conclusion. We do not endorse the view of the inherent bias and untrustworthiness of attorneys. In Ralph Losey’s experience as a practicing attorney since 1980 such bias is the rare exception, not the norm, and should not be the basis of a legal search strategy. The better solution to this minor issue of trustworthiness is educational, to train more attorneys in search and in professional ethics. Since our core assumptions on process and attorney honesty are fundamentally different, so too are our methods and goal. Our aim is augmentation of skilled attorneys to perform legal search, not automation, not replacement.

Second, our Team uses a variety of search methods, a multimodal approach, whereas the Grossman and Cormack approach relies solely upon the use of high-ranking documents to train a classifier. This is consistent with their aim to fully automate and eliminate attorneys from the legal search process, again based on the premise we dispute of attorney bias. In their words: “For the reasons stated above, it may be desirable to limit discretionary choices in the selection of search tools, tuning parameters, and search strategy.” Id. We disagree and seek to empower attorneys with a variety of search tools, including the one search method that they endorse of reliance on high-ranking documents. Also see the discussion and citations in Endnote 19.

[3] In these respects the e-Discovery Team follows the teachings of Gary Marchionini, Dean of the School of Information and Library Sciences of U.N.C. at Chapel Hill, who explained in Information Seeking in Electronic Environments (Cambridge 1995) that information seeking expertise is a critical skill for successful search. Professor Marchionini argues, and we agree, that: “One goal of human-computer interaction research is to apply computing power to amplify and augment these human abilities.” We also follow the teachings of UCLA Professor Marcia J. Bates who has advocated for a multimodal approach to search since 1989. Bates, Marcia J., The Design of Browsing and Berrypicking Techniques for the Online Search Interface, Online Review 13 (October 1989): 407-424. As Professor Bates explained in 2011 in Quora:

“An important thing we learned early on is that successful searching requires what I called “berrypicking.” … Berrypicking involves 1) searching many different places/sources, 2) using different search techniques in different places, and 3) changing your search goal as you go along and learn things along the way. This may seem fairly obvious when stated this way, but, in fact, many searchers erroneously think they will find everything they want in just one place, and second, many information systems have been designed to permit only one kind of searching, and inhibit the searcher from using the more effective berrypicking technique.”

Also see: White & Roth, Exploratory Search: Beyond the Query-Response Paradigm (Morgan & Claypool, 2009).

[4] The Total Recall Track fully automated method follows the Track Administrator’s preferred methodology of fully automated monomodal search (high ranking only) and their recently announced goal to eliminate attorney review in favor of full automation. Grossman & Cormack, Autonomy and Reliability of Continuous Active Learning for Technology-Assisted Review, supra at pg. 1 (2015):

“Our goal is to fully automate these choices, so that the only input required from the reviewer is, at the outset, a short query, topic description, or single relevant document, followed by an assessment of relevance for each document, as it is retrieved.”

They call the method “Autonomous TAR.” Id. at pg. 6. The protocols of the fully automated division of the Total Recall Track were apparently designed in part by Cormack and Grossman to test this premise, and the results they attained as participants in this division, along with all of the other fully automated participants from Universities around the world, are very impressive. Still, the e-Discovery Team, who did not participate in the 2015 automated division, notes that many of the protocols in this experiment are based on fictions and conditions not found in the real world of legal search, where the Team’s methods were developed. The differences include, but are not limited to: the existence of an omnipotent SME that instantly provides perfectly correct judgmental feedback as to relevance of all documents selected by the automated processes as probable relevant; simple, single-facet issues; relatively simple datasets stripped of most native metadata; and, perhaps most importantly, issues requiring little or no legal analysis or background legal knowledge. Note, in post hoc runs the e-Discovery Team ran a few fully automated runs on Kroll Ontrack systems and EDR. We used the same high ranking only Autonomous TAR training method and obtained the same results as all of the other fully automated division participants.

[5] “Contract review attorney,” or simply “contract attorney,” is a term now in common parlance in the legal profession to refer to licensed attorneys who do document review on a project-by-project basis. Their pay under a project contract is usually by the hour and is at a far lower rate than attorneys in a law firm, typically only $50 to $75 per hour. Their only responsibility is to review documents under the direct supervision of law firm attorneys who have much higher billing rates.

[6] Predictive Coding is defined by The Grossman-Cormack Glossary of Technology-Assisted Review, 2013 Fed. Cts. L. Rev. 7 (January 2013) (Grossman-Cormack Glossary) as: “An industry-specific term generally used to describe a Technology Assisted Review process involving the use of a Machine Learning Algorithm to distinguish Relevant from Non-Relevant Documents, based on Subject Matter Expert(s) Coding of a Training Set of Documents.” A Technology Assisted Review process is defined as: “A process for Prioritizing or Coding a Collection of electronic Documents using a computerized system that harnesses human judgments of one or more Subject Matter Expert(s) on a smaller set of Documents and then extrapolates those judgments to the remaining Document Collection. … TAR processes generally incorporate Statistical Models and/or Sampling techniques to guide the process and to measure overall system effectiveness.” Also see: Technology-Assisted Review in E-Discovery Can Be More Effective and More Efficient Than Exhaustive Manual Review, Richmond Journal of Law and Technology, Vol. XVII, Issue 3, Article 11 (2011).

[7] Da Silva Moore v. Publicis Groupe 868 F. Supp. 2d 137 (SDNY 2012) and numerous cases later citing to and following this landmark decision by Judge Andrew Peck, including Judge Peck’s own more recent Rio Tinto v. Vale, 2015 WL 872294 (March 2, 2015, SDNY).

[8] Grossman & Cormack, Evaluation of Machine-Learning Protocols for Technology-Assisted Review in Electronic Discovery, SIGIR’14, July 6–11, 2014; Grossman & Cormack, Comments on “The Implications of Rule 26(g) on the Use of Technology-Assisted Review”, 7 Federal Courts Law Review 286 (2014); Herbert Roitblat, series of five OrcaTec blog posts (1, 2, 3, 4, 5), May-August 2014; Herbert Roitblat, Daubert, Rule 26(g) and the eDiscovery Turkey, OrcaTec blog, August 11th, 2014; Hickman & Schieneman, The Implications of Rule 26(g) on the Use of Technology-Assisted Review, 7 FED. CTS. L. REV. 239 (2013); Losey, R., Predictive Coding 3.0, part one (e-Discovery Team 10/11/15).

[9] Id.; Webber, Random vs active selection of training examples in e-discovery (Evaluating e-Discovery blog, 7/14/14).

[10] See Endnote [2]. This disagreement is within a general framework of agreement on the superiority of computer assisted methods over traditional linear review, joint criticism of random selection methods and control sets in legal review, and agreement on the use of continuous active learning, as opposed to one and done, identified by Losey as Predictive Coding Version 1.0. Predictive Coding 3.0, part one (e-Discovery Team 10/11/15).

[11] Grossman & Cormack, Autonomy and Reliability of Continuous Active Learning for Technology-Assisted Review, CoRR abs/1504.06868 (2015); Multi-Faceted Recall of Continuous Active Learning for Technology-Assisted Review, SIGIR’15, August 09-13, 2015, Santiago, Chile (2015).

[12] Losey, R., Predictive Coding 3.0, part two (e-Discovery Team, 10/18/15).

[13] Shakespeare, W., Henry VI, Pt II, Act 4, Scene 2, 71-78 (“The first thing we do, let's kill all the lawyers.”). This famous anti-lawyer line was spoken by “Dick the butcher,” a traitor hoping to start a revolution and prop up his friend as an autocratic ruler.

[14] Losey, R., Predictive Coding 3.0, part one (2015 e-Discovery Team), see the subsection therein, Predictive Coding 1.0 and the First Patents, discussing common prejudice against lawyers by academics and IT that drove the ill-advised imposition of secret control sets in the first versions of predictive coding software. The new drive by Cormack and Grossman to fully automate legal search and eliminate SMEs and attorney search expertise from legal search seems based, at least in part, on the same false premises. Also see Losey, R., Mancia v. Mayflower Begins a Pilgrimage to the New World of Cooperation, 10 Sedona Conf. J. 377 (2009 Supp.); Losey, R., Lawyers Behaving Badly, 60 Mercer L. Rev. 983 (Spring 2009).

[15] See Zero Error Numerics for a partial list of quality control and quality assurance methods endorsed by the e-Discovery Team, found at ZeroErrorNumerics.com (ZEN Document Review). Also see: Concept Drift and Consistency: Two Keys to Document Review Quality, e-Discovery Team (Jan. 20, 2016).

[16] The cost of traditional linear document review is often far higher than $1.00 per file in practice. In 2007 the U.S. Department of Justice spent $9.09 per document for review in the Fannie Mae case, even though it used contract lawyers for the review work. In re Fannie Mae Securities Litig., 552 F.3d 814, 817 (D.C. Cir. 2009) ($6,000,000 / 660,000 emails). At about the same time Verizon paid $6.09 per document for a massive second review project that enjoyed large economies of scale and, again, utilized contract review lawyers. Roitblat, Kershaw, and Oot, Document categorization in legal electronic discovery: computer classification vs. manual review. Journal of the American Society for Information Science and Technology, 61(1):70–80, 2010 ($14,000,000 to review 2.3 million documents in four months).

[17] E. M. Voorhees, Variations in relevance judgments and the measurement of retrieval effectiveness, Information Processing & Management, 36(5):697–716, 2000 (on pooling); Oard, Baron, Hedin, Lewis, Tomlinson, Evaluation of Information Retrieval for E-Discovery, Journal Artificial Intelligence and Law, Vol. 18 Issue 4, December 2010, Pgs. 347-386.

[18] AutonomyandReliability,supraatpgs.2-3(“Thispaperoffersahistoricalreviewofresearcheffortstoachievehighrecall...”ThepaperalsoestimatestheBlairMaronprecisionscoreof20%andliststhetopscores(withoutattribution)inmostTRECyears);Hedin,Tomlinson,Baron,andOard,OverviewoftheTREC2009LegalTrack(TREC2009);Cormack,Grossman,Hedin,andOard;OverviewoftheTREC2010LegalTrack(TREC2010);Grossman,Cormack,Hedin,andOard,OverviewoftheTREC2011LegalTrack(TREC2011);EvaluationofInformationRetrievalforE-Discovery,supraatpgs.24-27.ThetopTRECresultscitedforthesixyearsofLegaltrackareinthe60%to70%F1rangewithacoupleofresultsinthelow80%F1range.TheRecommindparticipationinthelastTRECLegalTrack2011,andtheirsubsequentprohibitedmarketingadvertisementsclaimingto“win,”whichledtotheirlifetimebanfromTREC,onlyattainedaRecallof62.3%inonetopic(403).OverviewoftheTREC2011LegalTrack(TREC2011)supra.ContrastallofthepriorTRECresultswiththee-DiscoveryTeamresultsin18topicsinthe80%to100%F1range,withnumeroustopicsinthemidtohigh90%F1range.Of

27

course,thesedifferentTRECeventshadvaryingexperimentsandtestconditionsandsodirectcomparisonsbetweenTRECstudiesarenevervalid,butgeneralcomparisonsareinstructiveandfrequentlymadeinthecitedliterature.

[19] See the report on the Electronic Discovery Institute (EDI) Oracle legal search experiments involving the largest number of legal search participants to date where a member of the e-Discovery Team attained high scores. Bay, M., EDI-Oracle Study: Humans Are Still Essential in E-Discovery: Phase I of the study shows that older lawyers still have e-discovery chops and you don’t want to turn EDD over to robots (11/20/13, LTN). Monica Bay, the Editor of Law Technology News, summarizes the conclusion of EDI from the study that: “Conclusion: Software is only as good as its operators. Human contribution is the most significant element.” Patrick Oot, co-founder of the Electronic Discovery Institute, presented the findings of Phase II of the Oracle Predictive Coding Survey at ILTACON Day 3, as reported in The Relativity Blog, 9/2/15: “[W]hen it comes to what some vendors call Continuous Active Learning, Oot indicated the debate was somewhat of a red herring, adding, ‘Continuous Active Learning is just a buzzword.’ Oot summed up his thoughts by stressing the human component of technology-assisted review. Noting that the best performing technology in the Oracle study was the one used by a senior attorney, Oot said, ‘A good artist with a good brush is best.’” Unfortunately the final results of the EDI Oracle study have not yet been published and, as participants in that study, we are currently constrained from any detailed reporting.

[20] See PreSuit.com where the e-Discovery Team’s proposal is outlined to monitor the IT systems of large organizations with advanced analytics and other search methods to predict and avoid future illegal conduct. This man-machine hybrid type of early warning system includes safeguards to protect both individual privacy rights and confidential corporate information.

APPENDIX

E-Discovery Team 89-Page Narrative Report of all 30 Topics

This Appendix Narrative Report describes the search of all thirty Total Recall topics in TREC 2015 using the e-Discovery Team’s Hybrid Multimodal method. The report follows the chronological order in which the searches were conducted. The first project started on July 14, 2015. It was Topic 103 Manatee Protection. The last, Topic 3089 Pickton Murders, concluded on August 28, 2015. At the beginning of each Topic the results are reported for that Topic. Each has the same form and discloses metrics at two points: (1) when the Reasonable call was made; and (2) when 97.5% Recall was attained. They are summarized along with a variation of a standard Confusion Matrix, a/k/a Contingency Table.¹ The Confusion Matrix itself is highlighted in blue. It is followed by a list of the key values attained: Recall, Precision, F1 Measure, Accuracy, Error, Elusion and Fallout.

Work on multiple topics was conducted at the same time. Sullivan, who worked on eight topics, Reichenberger, who worked on four, and White, who did one, each worked on a single topic at a time. They did, however, work concurrently with Losey and each other. Losey, who worked on seventeen topics, and had the assistance of a contract review attorney on the ten Bush Email Topics, typically worked on multiple topics concurrently. All Topics were a Team effort, but the attorneys identified as running each Topic controlled the review work for that Topic. Consultation was common, especially at first.

Topic 103 Manatee Protection

Confusion Matrix - Topic 103
Total Documents: 290,099
Total Relevant: 5,725
Total Prevalence: 1.97%

                    @ Reas. Call    @ 97.5% Recall
True Positives             4,780             5,582
True Negatives           284,348           283,793
False Positives               26               581
False Negatives              945               143
Recall                    83.49%            97.50%
Precision                 99.46%            90.57%
F1 Measure                90.78%            93.91%
Accuracy                  99.67%            99.75%
Error                      0.33%             0.25%
Elusion                    0.33%             0.05%
Fallout                    0.01%             0.20%

¹ Grossman & Cormack Glossary, supra FN 1 at pg. 6. The Confusion Matrix is also referred to as a Contingency Table.
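The seven values listed under each Confusion Matrix follow from its four cells. The sketch below is an illustration only: the function name `confusion_metrics` is hypothetical, and the formulas are the standard definitions (Elusion as the share of unsubmitted documents that are relevant, Fallout as the share of irrelevant documents that were submitted), which we assume are the ones used in this report because they reproduce the Topic 103 figures at the Reasonable call.

```python
def confusion_metrics(tp, tn, fp, fn):
    """Derive the reported measures from the four confusion-matrix cells."""
    total = tp + tn + fp + fn
    recall = tp / (tp + fn)        # share of relevant documents found
    precision = tp / (tp + fp)     # share of submitted documents that are relevant
    return {
        "recall": recall,
        "precision": precision,
        "f1": 2 * precision * recall / (precision + recall),
        "accuracy": (tp + tn) / total,
        "error": (fp + fn) / total,
        "elusion": fn / (fn + tn),   # relevant share of the unsubmitted set
        "fallout": fp / (fp + tn),   # irrelevant documents wrongly submitted
    }


# Topic 103 cells at the time of the Reasonable call, taken from the table above.
topic_103 = confusion_metrics(tp=4780, tn=284348, fp=26, fn=945)
for name, value in topic_103.items():
    print(f"{name:>9}: {value:.2%}")
```

The same function can be applied to the second column of any matrix in this Appendix to check the values at 97.5% Recall.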


The e-Discovery Team’s TREC Total Recall project commenced on July 14, 2015 with work on Topic 103 Manatee Protection. This topic was run by Losey. He did not complete work until July 22, 2015. Although it may seem fast to see a review of 290,099 documents completed by one attorney in only eight days (with no breaks), there was more time spent on this topic than any of the others. But a significant amount of this time was spent on general set-up, procedures, contract reviewer training, project orientation, and communication protocols. Completion of this Topic was also delayed due to the availability of the contract review attorney, Anne Bottolene, who assisted Losey for the first part of the work on Topic 103, and due to some initial software configuration setup issues.

The Team found this Topic challenging for a variety of reasons, including the fact that the Bush collection of 290,099 emails had been stripped of its original metadata, images, and attachments. Further, we found some inconsistencies in judging this topic, although not many. Overall we found Topic 103 had one of the best gold-standards of the ten Bush Email Topics.

Ralph Losey is a native Floridian and has been a Florida attorney for 35 years. He was somewhat knowledgeable about all of the Bush Email issues, certainly far more so than the average person, but he did not consider himself a bona fide subject matter expert (SME) on any of them. Losey’s knowledge of and interest in Manatee Protection issues was, however, higher than for the other Bush Topics. For that reason it was chosen as the first topic. Losey’s assistant, Bottolene, had lived in Florida for several years and also had some background with the Manatee Protection issue. They generally considered their familiarity with the issue to be an asset in the search of Topic 103. The same cannot be said of other Bush Email Topics.

The project commenced after initial orientation on July 14, 2015 with Losey beginning Step Two, Multimodal Search Reviews. Bottolene was assigned Step Three, Random Baseline. Due to various scheduling and implementation issues Bottolene did not complete her review of the sample until late afternoon on July 20, 2015. She reviewed and coded as either relevant or irrelevant a random sample of 1,534 Bush emails. This was one of only two Topics wherein Step Three was followed and a full random sample was taken. It proved very helpful. Based on the sample prevalence we predicted a spot projection for Topic 103 of 5,175 relevant documents (95% confidence level, +/- 2.5% margin of error). In fact, the total relevant documents in Topic 103 proved to be 5,725, well within the 2.5% margin of error. Based on the length of time needed for random sample review, and our desire to complete all thirty topics in 45 days, we decided to skip this step for ensuing reviews. (Topic 101 Judicial Selection was started shortly after Topic 103, and also included Step Three Random Baseline.) As mentioned, we also skipped most of the procedures in Step 7 - “Zero Error Numerics” concerning quality control in this and all 30 Topics.

After Bottolene completed the random sample review on July 20th she assisted Losey on July 21st and 22nd in his work on Step Five Multimodal Search Review. At that time submission to TREC had already begun and the Team was evaluating the confirmed relevant and irrelevant documents from TREC. A total of 24 document submissions were made to TREC in this Topic: four on July 20th, one on July 21st, and the remaining nineteen on July 22, 2015.

In between most of these submissions the Team conducted Steps Four, Five and Six of its standard workflow. These are the predictive coding steps that iterate. In Step Four the software, Mr. EDR, analyzes the documents designated for training in Step Two in the seed set, and in Step Five thereafter. Mr. EDR then ranks the whole dataset according to probable relevance and irrelevance. In Step Five the attorneys search for more documents to use to train Mr. EDR. It is essentially the same as Step Two, except now the attorneys can add probability and rank based searches to their multimodal tool kit. That is the Team’s full search pyramid, shown right. The methods are used ad hoc according to what the attorney reviewer considers a promising method to find additional relevant documents based in part on the latest EDR rankings and TREC submission returns. Once new documents are found that are likely to be relevant, they are then designated in Step Six for Training. Not all documents are so designated. Again this is at the discretion of the attorneys as to what documents they think would best serve to train in the ongoing active learning process.

In Topic 103 the use of predictive coding rank based searches was severely constrained. This was due to initial configuration setup errors, where input parameters for the learning engine were set incorrectly. These setup errors were detected and corrected by July 22, 2015, and thereafter Mr. EDR was of great assistance. Still, as a result of the delays and early errors, this Topic relied much more heavily than any other on keyword searches and human linear reviews. Similarity searches were also used extensively. Basically the predictive coding assistance in this Topic did not begin until the 14th submission. Losey called Reasonable after the 15th submission.

In the TREC experiments most, but not all, of the documents returned as relevant or irrelevant by TREC were included in training (Step Six). In that way their ranking impact was evaluated (Step Four) before the next submission. Training also included various irrelevant documents that were not TREC adjudicated, but were thought to be obviously irrelevant. Experiments were made as to the impact of varying the number of irrelevant documents in the hope that some ideal range or ratio could be determined to maximize Mr. EDR efficiency. These experiments are still underway. Our conclusions as of late December 2015 are stated in the body of this report.

After a total of 15 submissions that presented 4,806 documents to TREC for adjudication, Losey called Reasonable and stopped work on July 22, 2015, a week after the Topic started. Thereafter an additional 9 submissions were made to TREC to submit the remaining 285,293 emails (98.34% of the 290,099 total). There was Training in between most of the remaining seven submissions based on the TREC adjudications, but no further human input. The first two post-call submissions were critical to the Team’s excellent performance on this Topic.

Losey called Reasonable at the point he thought that a reasonable human effort had been made to find relevant documents. Losey and his assistant Bottolene had personally reviewed and coded as relevant or irrelevant 7,203 documents. (Additional documents had been coded without review.) In fact, by the time Losey had submitted 2,309 documents to TREC for adjudication (the 14th submission) he had completed all individual document review (7,203 documents), and had completed all searches other than predictive coding ranking searches where document content is not reviewed. At that time (after the 14th submission) he essentially turned the process over to Mr. EDR, who had by then just recovered from an earlier technical illness and had not been functional before.

At the time Losey called Reasonable he had submitted a total of 4,806 documents. Of those, 4,780 had been adjudicated as relevant. This was an incredible Precision rate of 99.46%. This was the most Precise production that Losey thinks he has ever made. He also thought that he may have attained as high as a 90% Recall, but in fact the later submissions showed that at the time Reasonable was called he had attained a Recall of 83.5%. This is still considered a high Recall level in legal search, and the combined F1 measure of 90.8% is an outstanding result, in legal search as in any other.

The next submissions after Reasonable was called were always the documents that were highest ranked by Mr. EDR, which is why we call this an automated function. As we understand the game set up by TREC for the Recall Track, the actual scoring is not impacted by the Reasonable call. The scoring continues for all submissions until all documents have been returned. The Reasonable call is merely an indication of efforts. The same goes for the 70% and 80% recall calls, when and if they are made before the Reasonable effort call, except they are of even less interest. These calls were not supposed to have an impact on scoring.

In the first two submissions after the call in Topic 103, the 16th and 17th submissions, Mr. EDR identified and highly ranked 661 additional relevant documents, bringing the total relevant found to 5,467 out of the total 5,725. We were thereby able to attain in that submission a Recall of 90% with Precision of 99.33%, a Recall of 95% with Precision of 98.8%, and 97.5% Recall with a Precision of 90.57%! As far as Losey knows, these statistics represent his personal best efforts, especially considering that he did so with very little reliance on predictive ranking.

What makes this 97.5% Recall, 90.6% Precision all the more remarkable for legal search is that it was accomplished by only one expert attorney assisted by one contract review attorney. The measured effort to attain these high levels was remarkably low, especially considering that a significant amount of time in Topic 103 was spent reviewing the baseline sample (Step Three). Together the two attorneys only reviewed 7,203 documents out of the total corpus of 290,099 emails (2.5%). In legal search it is common for attorney review teams to consist of dozens or even hundreds of attorneys. Moreover, even when predictive coding is used, a far higher percent of the corpus is typically reviewed than 2.5%, and Recall levels of 97.5% are unheard of, much less precision in excess of 90%.

A graph mapping how the review was conducted appears below, with the light green line signifying the anticipated 70% recall call, and the dark green line the reasonable recall call. (Please note that the graph is not to scale, as the graph is based on individual submissions. We thought this a better depiction than by proportionally showing progress because in most cases a proportional graph would be a line virtually straight up from the start and flat going over.)

The next chart shows Precision (left and blue line), F1 (red) and percent of documents submitted (green) as tracked across varying recall thresholds. On the Manatee Protection topic, by the time 97.5% Recall had been attained only 2.12% of the corpus, 6,163 documents, had been submitted for adjudication. This is a triumph for the search pyramid foundation, especially keyword search, that supports AI training. The last portion of the graph thus represents the submission of the remaining 97.88% or 283,936 documents.


The chart below represents the amount of effort in terms of documents reviewed to attain 100% Recall using the multimodal hybrid model of search and training of Mr. EDR.
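The iterating predictive coding steps described for this Topic (Step Four: train and rank; Step Five: select more documents; Step Six: designate them for training) can be sketched as a simple loop. This is only a toy illustration under stated assumptions: the learning engine inside EDR is not described in this report, so the token-count ranker below is a stand-in, and the names `train`, `score`, `continuous_active_learning` and `judge` are hypothetical. The `judge` callback plays the role of the instant relevance feedback returned by TREC, and every judged document is fed back into training, whereas the Team used discretion about which documents to designate.

```python
from collections import Counter


def train(labeled):
    """Toy stand-in for Step Four: +1 per relevant document containing a token, -1 per irrelevant one."""
    weights = Counter()
    for tokens, is_relevant in labeled:
        for tok in set(tokens):
            weights[tok] += 1 if is_relevant else -1
    return weights


def score(weights, tokens):
    """Rank score of one document under the toy model (unseen tokens count as 0)."""
    return sum(weights[tok] for tok in set(tokens))


def continuous_active_learning(corpus, judge, seed_ids, batch_size):
    """Submit a batch, learn from the feedback, re-rank the rest, and repeat until nothing is left."""
    labels, order = {}, []
    batch = list(seed_ids)                       # seed set found by manual search (Step Two)
    while batch:
        for doc_id in batch:                     # submission and feedback
            labels[doc_id] = judge(doc_id)
            order.append(doc_id)
        weights = train([(corpus[i], labels[i]) for i in labels])   # Steps Six and Four
        remaining = [i for i in corpus if i not in labels]
        remaining.sort(key=lambda i: score(weights, corpus[i]), reverse=True)
        batch = remaining[:batch_size]           # highest ranked documents go next (Step Five)
    return order


# Invented six-document corpus; documents 1, 3 and 5 are the relevant ones.
corpus = {
    1: ["manatee", "protection", "speed", "zone"],
    2: ["budget", "meeting"],
    3: ["manatee", "boat", "speed"],
    4: ["school", "budget"],
    5: ["protection", "zone", "boat"],
    6: ["lunch", "meeting"],
}
relevant = {1, 3, 5}
order = continuous_active_learning(corpus, lambda i: i in relevant, seed_ids=[1, 2], batch_size=1)
print(order)
```

In this toy run the two remaining relevant documents are submitted immediately after the seed set, before any of the remaining irrelevant ones, which is the behavior the ranking steps are meant to produce.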

______________________________________

Topic 2108 CAPTCHA Services

Confusion Matrix - Topic 2108
Total Documents: 465,147
Total Relevant: 656
Total Prevalence: 0.14%

                    @ Reas. Call    @ 97.5% Recall
True Positives               580               640
True Negatives           463,566           458,906
False Positives              925             5,585
False Negatives               76                16
Recall                    88.41%            97.56%
Precision                 38.54%            10.28%
F1 Measure                53.68%            18.60%
Accuracy                  99.78%            98.80%
Error                      0.22%             1.20%
Elusion                    0.02%             0.00%
Fallout                    0.20%             1.20%


Topic 2108 was run by Losey without any assistance of a review lawyer. The work to search the 465,147 BlackHat World Forum posts started on July 16, 2015, but did not conclude until August 1, 2015. The reason for the delay in completion is that the Team encountered difficulties in understanding the initial TREC adjudications to their first submissions. Neither Losey, nor the other attorney Team members consulted, could understand the relevance pattern behind TREC’s initial submission responses. Due to the initial EDR configuration error, predictive coding was not available to assist at first in ascertaining the relevance scope. After several days of struggling with this project, Losey put this Topic on hold until July 29th, at which time he returned to the Topic to finish.

As a general comment the Team found all of the BlackHat World Forum posts challenging to search, more difficult than a typical search of corporate ESI. That is in part because almost all metadata of these posts, and all associated imagery, had been stripped by TREC and the ESI converted to text files. Also the language and issues (all non-legal) in the BlackHat World Forums were obscure. Even though our attorney searchers were all familiar with forums and had knowledge of most of the technologies and sometimes illegal, nearly always unethical, marketing practices discussed in BlackHat World, they still found the slang-filled posts difficult to review and analyze. The challenges were compounded by significant inconsistencies, and apparent illogic of the TREC judging in many of these topics. Still, the Team was able to overcome these challenges and, after we learned not to try to understand any relevance rules, we overall did quite well in review of the ten BlackHat World Forum Topics.

Based on the elusive (to humans) relevance standard, we found that these topics required greater reliance on Mr. EDR than the Bush Emails and News Articles. Even though we continued to use a multimodal approach in Forum topics, our emphasis was on the AI features of ranking and probability. The Team readily admits that its own human intelligence, without the considerable AI enhancements of Mr. EDR, was not up to the task of matching TREC relevance calls for the Forum Topics. But with the help of predictive coding (Mr. EDR) we overcame the difficulties and attained relatively high recall levels.

On July 31, 2015, after making 22 document submissions to TREC providing a total of 1,505 documents, Losey had found a total of 580 relevant documents. The effort, or number of documents reviewed and coded by Losey to attain this result, was 2,101 documents. In fact, Losey had stopped document review after the 21st submission. His 22nd submission was entirely based on document rankings without review. After the 22nd TREC submission, Losey decided to call Reasonable. It was later determined that a Recall of 88.41% had been attained. There were seven additional submissions to TREC after the Reasonable call point. In the next, 23rd submission, 95% Recall was attained after submitting only 2,130 additional documents.

A graph mapping how the review was conducted appears below, with the light green line signifying the anticipated 70% recall call, and the dark green line the reasonable recall call. (Please note that this graph, and all others like it, are not to scale, as the graphs are based on individual submissions. We thought this a better depiction than by proportionally showing progress because in most cases a proportional graph would be a line virtually straight up from the start and flat going over.)


The following chart shows Precision (left and blue line), F1 (red) and percent of documents submitted (green) as tracked across varying recall thresholds. On the CAPTCHA Services topic, by the time 97.5% Recall had been attained only 1.34% of the corpus, 6,225 documents, had been submitted for adjudication. The last portion of the graph thus represents the submission of the remaining 98.66% or 458,922 documents.
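The "percent of documents submitted" figure for a recall threshold can be derived from the cumulative submission counts. The sketch below is an illustration only: `effort_at_recall` is a hypothetical helper, and the three batches are not the actual submissions made to TREC; they are aggregates reconstructed from the Topic 2108 Confusion Matrix (the totals at the Reasonable call, the additional documents up to 97.5% Recall, and the rest of the corpus).

```python
def effort_at_recall(batches, target_recall):
    """Return (documents submitted, share of corpus) at the first batch that reaches the target recall.

    batches: list of (documents_submitted, relevant_among_them) in submission order.
    """
    total_relevant = sum(relevant for _, relevant in batches)
    corpus_size = sum(docs for docs, _ in batches)
    submitted = found = 0
    for docs, relevant in batches:
        submitted += docs
        found += relevant
        if found / total_relevant >= target_recall:
            return submitted, submitted / corpus_size
    return None  # unreachable for targets <= 1.0 once the whole corpus is submitted


# Aggregated from the Topic 2108 table: 1,505 documents (580 relevant) at the Reasonable call,
# 6,225 documents (640 relevant) at 97.5% Recall, and the remaining 458,922 documents (16 relevant).
topic_2108 = [(1505, 580), (6225 - 1505, 640 - 580), (458922, 16)]
docs, share = effort_at_recall(topic_2108, 0.975)
print(f"{docs:,} documents submitted, {share:.2%} of the corpus")
```

With real per-submission counts the same function would report effort at finer granularity, since the threshold can only be detected at the end of a batch.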

The last chart below represents the amount of effort in terms of documents reviewed to attain 100% Recall using the multimodal hybrid model of search and training of Mr. EDR.


______________________________________

Topic 101 Judicial Selection

Confusion Matrix - Topic 101
Total Documents: 290,099
Total Relevant: 5,834
Total Prevalence: 2.01%

                    @ Reas. Call    @ 97.5% Recall
True Positives             5,026             5,688
True Negatives           283,608           281,901
False Positives              657             2,364
False Negatives              808               146
Recall                    86.15%            97.50%
Precision                 88.44%            70.64%
F1 Measure                87.28%            81.93%
Accuracy                  99.49%            99.13%
Error                      0.51%             0.87%
Elusion                    0.28%             0.05%
Fallout                    0.23%             0.83%


Topic 101 was run by Losey with the assistance of a review attorney, David Jensen. The work to search the 290,099 Bush Emails started on July 16, 2015 and concluded on July 26, 2015. The project commenced with Losey beginning Step Two, Multimodal Search Reviews, and Jensen assigned Step Three, Random Baseline. Jensen finished the random sample review the next day and began assisting Losey in Step Two, and after submissions began, the echo Step Five, multimodal. Losey handled all of the AI related searches in Step Five, including the probability and ranking related searches. Jensen focused on keyword searches and also made suggestions of documents to submit. Final decisions on submissions were always made by Losey on all Topics. Due to the same initial configuration setup errors mentioned above, the AI features did not work until near the end of this Topic. Losey instead relied heavily on Keyword, linear, and a new type of Similarity search the Team invented out of necessity during TREC events. It is anticipated that the new similarity search feature will be included in future Mr. EDR releases.

Review of the random sample of 1,534 Bush emails found 30 that were relevant. That suggested a prevalence of 1.96% and a spot projection of 5,673 documents. The actual relevant count of 5,834 and prevalence of 2.01% were very close to the projection. Note this is the second and last Topic in which a full Step Three random sample was implemented.

On July 25, 2015, after making 15 document submissions to TREC providing a total of 5,683 documents, Losey had found a total of 5,026 relevant documents. The effort, or number of documents reviewed and coded by Losey to attain this result, was 6,895 documents. In fact, Losey had stopped document review after the 14th submission, as his 15th submission was entirely based on document rankings without review. After the 15th TREC submission, Losey decided to call Reasonable. It was later determined that a Recall of 86.15% had been attained with a Precision of 88.44%. There were an additional 8 submissions to TREC after the Reasonable call point. The next, the 16th, was a submission of 652 documents, 345 of which were relevant. 95% Recall with 82.7% Precision was attained after submitting only 6,705 documents (1,022 after the Reasonable call). 97.5% Recall with 70.6% Precision was attained after submitting only 8,052 documents (2,369 after the Reasonable call).

A graph mapping how the review was conducted appears below, with the light green line signifying the anticipated 70% recall call, and the dark green line the reasonable recall call.


The following chart shows Precision (left and blue line), F1 (red) and percent of documents submitted (green) as tracked across varying recall thresholds. On the Judicial Selection topic, by the time 97.5% Recall had been attained only 2.78% of the corpus, 8,052 documents, had been submitted for adjudication. The last portion of the graph thus represents the submission of the remaining 97.22% or 282,047 documents.



______________________________________

Topic 108 Manatee County

Confusion Matrix - Topic 108
Total Documents: 290,099
Total TREC Relevant: 2,375
Total TREC Prevalence: 0.82%

Using TREC relevant calls

                    @ Reas. Call    @ 97.5% Recall
True Positives               734             2,316
True Negatives           287,712            26,197
False Positives               12           261,527
False Negatives            1,641                59
Recall                    30.91%            97.52%
Precision                 98.39%             0.88%
F1 Measure                47.04%             1.74%
Accuracy                  99.43%             9.83%
Error                      0.57%            90.17%
Elusion                    0.57%             0.22%
Fallout                    0.00%            90.90%

Topic 108 was run by Losey with the assistance of a review attorney, Bottolene. The work to search the 290,099 Bush Emails also started on July 16, 2015 and concluded on July 24, 2015. The project commenced with Losey and his assistant beginning Step Two, Multimodal Search Reviews. Step Three, Random Baseline, was omitted. After submissions began, the echo Step Five, multimodal, was done by Losey with assistance at first of Bottolene. Losey handled all of the AI related searches in Step Five, including the probability and ranking related searches. His assistant focused on keyword searches and also made suggestions of documents to submit. All final submittal decisions were made by Losey.

Observations on the Errors of Relevance Judgments in This and Other Topics

This was the most frustrating of all of the TREC Recall Topics for the Team to work on because the judgments on relevance contained more obvious errors and inconsistencies than any other. This Topic was Manatee County, as opposed to Topic 103, which was Manatee Protection, which of course referred to the endangered mammal. Unfortunately, as a lifelong Florida attorney, Losey has substantial independent knowledge of Manatee County and manatees. Bottolene had also been a Florida resident for several years and an attorney. Their direct personal knowledge of Florida proved to be a significant disadvantage in this Topic (and, to a lesser extent, in other Topics, especially ones that contained obvious errors in relevance) because TREC adjudications were not tied to actual facts and reality (obviously no one at TREC was a Florida SME) and were otherwise surprising.

For instance in Topic 108, even though the subject was the County of Manatee, a political entity, sometimes, but not always, an email with mere mention of the mammal manatee would be considered relevant, even though there was no mention of location or the county. Also, many references to Manatee Park were considered relevant by TREC, even though that park is, as any Floridian would know, especially Losey who lives in Central Florida, not located in Manatee County and otherwise has no connection to the county. Also, almost all email addresses that had manatee in the name were called relevant by TREC, even if the email had nothing to do with the County of Manatee. There may well be some pattern to the so-called gold standard used in this Topic, but if so, it was not logical and not known to Bottolene or Losey. It appeared to these Floridians, after the fact, to be lack of expertise on the part of TREC. Other team members who later reviewed these adjudications agreed.

One example we were later able to figure out: a well-known Florida law firm (Holland & Knight) has a home office in Bradenton, Florida, and the attorneys there would often write to the governor. As part of post-hoc analysis we saw that almost all of these emails were considered relevant by TREC assessors to this topic simply because the office city was in their standard signature line address, even though the content of the emails has nothing to do with Manatee County.

Since Losey is used to directing legal search as an SME, or direct SME surrogate, his usual approach to legal search involves using his knowledge and understanding to differentiate relevant from irrelevant. As mentioned, in legal search understanding of relevance is critical; in fact, it is a legal duty and responsibility of the attorney searchers. Thus his position as an actual Florida SME served as a disadvantage in many of the Bush email Topics, including this one.

The Team later encountered other Topics with inconsistencies and mistakes like Topic 108. In such cases we eventually learned to step out of the process and stop trying to understand or look for a rational basis for the TREC relevance calls. We would put aside our traditional SME role, which is otherwise the firmly established norm in legal search. Instead, when we found ourselves in this situation (and this happened in a little less than half of the Topics), we would basically turn the search and submission decisions over to Mr. EDR. In those situations we did not even try to see any pattern or consistency to the adjudications. When we adopted this approach in later topics we did quite well, in spite of defects we saw in the TREC gold standards.

This suggests that TREC’s selection of relevant documents in some of the Topics suffered from over-delegation to computer selection without adequate SME based quality controls. It is unknown what software was used by TREC to create the relevant gold standard document set, but like any predictive coding software today, it obviously can be led astray without adequate human supervision and quality control safeguards. This is why the e-Discovery Team adopts a hybrid approach, computer and human, including SMEs, and why in normal circumstances Step Seven for quality control is so important under their Predictive Coding 3.0 method.

Topic 108 Description

On July 23, 2015, after making 10 document submissions to TREC providing a total of 746 documents, Losey had found a total of 734 relevant documents (Precision of 98.4%). The effort, or number of documents reviewed and coded by Losey to attain this result, was 696 documents. After the 10th TREC submission, Losey decided to call Reasonable. It was later determined that a Recall of 31% had been attained. The decision to call Reasonable proved to be a big mistake because the TREC adjudications were not limited to Manatee County relevance as the Team had assumed. As mentioned, the error was based upon the Team’s construction of relevance in a much narrower manner than TREC. The divergence was not known because the Team did not do enough exploration of irrational constructions and so did not detect the, to our mind, outlier nature of TREC’s approach to this Topic.

The Team should have been less precise (its submissions had a Precision of 98.4%), and should have presented more documents for submission, even though the Team did not personally consider them to be relevant. It should have better tested its relevance concept. But as mentioned, as an SME Losey was used to setting the scope of relevance, and as lawyers, the entire Team was used to rational adjudications of relevance along lines that make sense to them.


This was an early topic for us in the process and we had not yet learned to mistrust our own assessments. There were 6 additional submissions to TREC after the Reasonable call point. In retrospect, this was also an error. The Team should have submitted multiple smaller submissions after they started to discover the outlier nature of the TREC adjudications, with training between each submission where Mr. EDR could take over in an automated fashion. This was another game-type lesson learned the hard way by this Topic, which proved to be the Team’s worst performance. Even in the worst case with multiple mistakes the Team still managed to attain 78% Recall with review of only 696 documents, and submission of only 60,817 of the total 290,099 documents.

A graph mapping how the review was conducted appears below, with the light green line signifying the anticipated 70% recall call, and the dark green line the Reasonable recall call.

The following chart shows Precision (left and blue line), F1 (red) and percent of documents submitted (green) as tracked across varying recall thresholds. On the Manatee County topic, by the time 97.5% Recall had been attained 90.95% of the corpus, 263,843 documents, had been submitted for adjudication. The last portion of the graph thus represents the submission of the remaining 9.05% or 26,256 documents.



Correction of the Gold Standard Relevance Set in Topic 108

Since the Team is considering use of the Bush email set in further testing, training and research, they wanted to try to correct the many deficiencies they saw in TREC’s determination of the gold standard for this Topic. They also wanted to better understand why the score on this Topic was so out of range from their other scores. With this in mind they re-reviewed the TREC adjudications and set up a three-attorney peer review of all errors spotted in the relevancy determinations. A conservative approach was taken and deference was given to the TREC adjudications where a rational, consistent basis could be found. Losey’s personal, narrow view of what should be relevant was not followed, if there was a reason seen to follow TREC’s adjudications. (Note, the Team and others in the field of Legal Search have observed over many projects that SMEs typically take a more narrow view of relevance than non-SMEs who, by definition, do not understand the subject as well.) Losey accepted all adverse rulings against his own positions as part of this process. Also note that suggestions to revise TREC adjudications came from all three Team members, not just Losey, and were all subject to multiple reviews and objections.

After the re-review and re-adjudication process was completed, 1,264 documents adjudicated as relevant by TREC were changed to Irrelevant. Further, 3 documents adjudicated as irrelevant by TREC were changed to relevant. Below are the corrected metrics of the Team’s review under the improved adjudications.

Confusion Matrix (Adjusted) - Topic 108
Total Documents: 290,099
Total Adjusted Relevant: 1,114 (was 2,375) (1,264 changed to Irrelevant, 3 changed to Relevant)
Total Adjusted Prevalence: 0.38% (was 0.82%)

Using adjusted relevant calls

                    @ Reas. Call    @ 97.5% Recall
True Positives               736             1,087
True Negatives           288,975           131,844
False Positives               10           157,141
False Negatives              378                27
Recall                    66.07%            97.58%
Precision                 98.66%             0.69%
F1 Measure                79.14%             1.36%
Accuracy                  99.87%            45.82%
Error                      0.13%            54.18%
Elusion                    0.13%             0.02%
Fallout                    0.00%            54.38%

After the 10th TREC submission, when Losey decided to call Reasonable, Losey had found a total of 736 relevant documents (an increase of 2 documents) under the adjusted gold standard. This was a Recall of 66.07% and Precision of 98.66% under the adjusted standard. The F1 measure was 79.14%. Note that these metrics are much more in line with the other 29 projects, although the adjusted 66% Recall is still the Team’s second to lowest Recall score at the Reasonable call point. Under the corrected standard the Team attained 94.43% Recall with review of only 696 documents, and submission of only 60,817 of the total 290,099 documents. A graph mapping the Recall attained against the number of documents submitted is shown below with both the original TREC standard (blue) and the Team adjusted standard (red).
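The adjusted figures for Topic 108 at the Reasonable call can be re-derived from the counts stated in this section. The sketch below is a worked check, not part of the Team's tooling; the function name `adjusted_scores` and its parameter names are hypothetical, while the input numbers are the ones reported above (2,375 TREC relevant, 1,264 changed to irrelevant, 3 changed to relevant, 746 documents submitted, 736 of them relevant under the adjusted standard).

```python
def adjusted_scores(trec_relevant, to_irrelevant, to_relevant,
                    submitted, true_positives, corpus_size):
    """Recompute gold-standard size and the headline scores after re-adjudication."""
    adjusted_relevant = trec_relevant - to_irrelevant + to_relevant
    recall = true_positives / adjusted_relevant
    precision = true_positives / submitted
    f1 = 2 * precision * recall / (precision + recall)
    prevalence = adjusted_relevant / corpus_size
    return adjusted_relevant, prevalence, recall, precision, f1


relevant, prevalence, recall, precision, f1 = adjusted_scores(
    trec_relevant=2375, to_irrelevant=1264, to_relevant=3,
    submitted=746, true_positives=736, corpus_size=290099,
)
print(f"adjusted relevant: {relevant:,} ({prevalence:.2%} prevalence)")
print(f"recall {recall:.2%}, precision {precision:.2%}, F1 {f1:.2%}")
```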


The following chart shows Precision (left and blue line), F1 (red) and percent of documents submitted (green) as tracked across varying recall thresholds under the adjusted standard.

______________________________________


Topic 2052 Paying for Amazon Book Reviews

Confusion Matrix - Topic 2052
Total Documents: 465,147
Total Relevant: 265
Total Prevalence: 0.06%

                    @ Reas. Call    @ 97.5% Recall
True Positives               257               259
True Negatives           464,364           464,165
False Positives              518               717
False Negatives                8                 6
Recall                    96.98%            97.74%
Precision                 33.16%            26.54%
F1 Measure                49.42%            41.74%
Accuracy                  99.89%            99.84%
Error                      0.11%             0.16%
Elusion                    0.00%             0.00%
Fallout                    0.11%             0.15%

Topic 2052 was run by Sullivan, who started on July 20, 2015, and concluded July 22, 2015. This was Sullivan’s first Topic. For that reason he spent more time than in his later reviews in trying to understand the dataset and processes. Sullivan has a background in computers and programming. He has substantial experience with forums, enough to understand the unique characteristics present in forum communications. While he considers himself far more knowledgeable than the average person, he has no experience with the unethical world of Blackhat Forums and does not consider himself to be a bona fide subject matter expert (SME) on any of them. All forum topics presented a unique challenge of identifying variations of terms and understanding use of slang. While this proved to be easy to overcome, it certainly played a vital role in the process in a way not necessary in the News topics, where spelling errors were largely non-existent.

On the first day, Sullivan started with Step Three, Random Baseline, and reviewed a random sample of 1,534 documents. This was used both as a method to estimate prevalence and a means of gaining better understanding of the dataset for this and future topics in AtHome2. This random sample yielded 1 relevant document. Based on the sample prevalence we predicted 303 relevant documents existed in the dataset (95% confidence level with 2.5% margin of error). We would later discover the dataset contained 265 relevant documents, which is well within the margin of error. Given the amount of time necessary to complete this random sample, and the little value gained, Step Three was omitted from all subsequent topics reviewed by Sullivan.
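The Step Three spot projections reported in this Appendix follow from scaling the sample prevalence up to the corpus. The sketch below is an illustration under a stated assumption: we read the "2.5% margin of error" as plus or minus 2.5 percentage points of prevalence, which is how the report appears to use it when calling an actual count "well within the margin". The function name `project_relevant` is hypothetical.

```python
def project_relevant(sample_size, sample_relevant, corpus_size, margin=0.025):
    """Scale sample prevalence to the corpus; return (prevalence, point projection, (low, high))."""
    prevalence = sample_relevant / sample_size
    point = round(prevalence * corpus_size)
    low = max(0, round((prevalence - margin) * corpus_size))   # a count cannot be negative
    high = round((prevalence + margin) * corpus_size)
    return prevalence, point, (low, high)


# Topic 2052: 1 relevant document in a sample of 1,534 from 465,147 forum posts.
print(project_relevant(1534, 1, 465147))
# Topic 101: 30 relevant documents in a sample of 1,534 from 290,099 Bush emails.
print(project_relevant(1534, 30, 290099))
```

At prevalence this low the interval spans thousands of documents, which is consistent with Sullivan's remark that the sample yielded little value for the time it took.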


Day two was spent running keyword searches to find documents for seeding into the predictive coding algorithm and submitting documents to get a better understanding of the TREC standard for relevance. At the end of day two, 273 documents had been submitted, with 204 being returned as relevant. This provided an adequate seed set to begin relying more heavily on predictive coding.

On day three, Sullivan developed a strategy on which he relied heavily in future topics. Rather than relying on Mr. EDR alone and reviewing the documents that were given high scores by the machine, he used the multimodal approach to prioritize documents for review. Starting with all variations of “Amazon” w/5 “Review,” he worked down, reviewing and categorizing the highest scoring documents first. When he hit a point where few relevant documents were being found, he iteratively expanded the scope of his review universe. He moved to all variations of “Amazon” w/10 “Review,” then “Amazon” w/25 “Review,” and “Amazon” AND “Review.” He expanded into “Amazon” and (“Review” or “Book” or “Feedback” or “Purchase”) and eventually to any document containing a variation of “Amazon.”

As previously mentioned, the unique characteristics of the forums required more creative searches than necessary in other data sets. Using the Concept Searching tool as a guide, it was determined that almost all reasonable variations of “Amazon” could be found using the following search: (“amazon*” OR “@mazon” OR “@maz0n” OR “azmon*” OR “azmn*” OR “amzn*”). This method proved effective in eliminating issues of missed documents due to slang or misspelling.

Using this method, Sullivan was able to identify 257 of the 265 relevant documents at the time he called Reasonable effort. A total of 2,325 documents had been reviewed, including the 1,534 documents in the initial random sample. After calling Reasonable effort, Sullivan continued by submitting all documents that contained any variation of the term “Amazon” in order of priority score descending. 100% recall was obtained through this method. All remaining documents were then submitted in descending priority order, with no more relevant documents being returned.

A graph mapping how the review was conducted appears below, with the light green line signifying the anticipated 70% Recall call, the slightly darker line the 80% Recall call, and the dark green line the Reasonable Recall call.
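The widening sequence of proximity searches described for this topic can be sketched in a few lines of Python. This is only an illustration under stated assumptions: EDR's actual query engine is not shown in this paper, "w/N" is assumed here to mean "within N whitespace-separated tokens," a trailing `*` is assumed to be a prefix wildcard, and the function names `tokenize` and `narrowest_tier` are hypothetical.

```python
import re

# Variant terms quoted in the text; a trailing * is treated as a prefix wildcard
# (an assumption about the search syntax).
AMAZON_TERMS = ["amazon*", "@mazon", "@maz0n", "azmon*", "azmn*", "amzn*"]


def _to_regex(term):
    """Translate one search term into a regex fragment."""
    if term.endswith("*"):
        return re.escape(term[:-1]) + r"\S*"
    return re.escape(term)


AMAZON_RE = re.compile("|".join(_to_regex(t) for t in AMAZON_TERMS), re.IGNORECASE)
REVIEW_RE = re.compile(r"review\S*", re.IGNORECASE)  # "all variations" of Review


def tokenize(text):
    """Split on whitespace and trim common surrounding punctuation."""
    return [t.strip(".,;:!?\"'()") for t in text.split()]


def narrowest_tier(text, windows=(5, 10, 25)):
    """Return the tightest search tier a document falls into, or None."""
    toks = tokenize(text)
    amazon_at = [i for i, t in enumerate(toks) if AMAZON_RE.fullmatch(t)]
    review_at = [i for i, t in enumerate(toks) if REVIEW_RE.fullmatch(t)]
    if not amazon_at:
        return None
    if not review_at:
        return "amazon-only"
    gap = min(abs(a - r) for a in amazon_at for r in review_at)
    for n in windows:
        if gap <= n:
            return f"w/{n}"
    return "AND"


if __name__ == "__main__":
    print(narrowest_tier("Will pay for an amzn book review today"))
```

Grouping documents by tier and then sorting each group by predictive coding score would mirror the review order described above, with the narrowest tier reviewed first.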


The following chart shows Precision (left and blue line), F1 (red) and percent of documents submitted (green) as tracked across varying recall thresholds. On the Paying for Amazon Book Reviews topic, by the time 97.5% Recall had been attained, only 0.21% of the corpus, 976 documents, had been submitted for adjudication. The last portion of the graph thus represents the submission of the remaining 99.79% or 464,171 documents.

The last chart below represents the amount of effort in terms of documents reviewed to attain 100% recall using the multimodal hybrid model of training EDR.

______________________________________


Topic 2225 Rootkits


Topic 2225 was run by Losey, who started the search of 465,147 Black Hat Forum posts on July 21, 2015 and concluded on August 18, 2015. Losey put aside work on this Topic several times while he gave priority to the Jeb Bush Email Topics. The project commenced as usual with Losey beginning Step Two, Multimodal Search Reviews. Step Three, Random Baseline, was omitted. After submissions began, the echo Step Five, multimodal search, began, including predictive coding features, with iterated training.

In August 2015, after making 12 submissions to TREC, and training after almost every submission, Losey had provided a total of 201 documents to TREC and confirmed a total of 163 relevant documents. The effort, or number of documents reviewed and coded by Losey to attain this result, was 205 documents. After the 12th TREC submission, Losey decided to call Reasonable. It was later determined that a Recall of 89.56% had been attained with a Precision of 81%. There were 23 additional submissions to TREC after the Reasonable call point. A 90% Recall was attained after submitting only 212 documents. A 95% Recall was attained after submitting 891 documents, and 97.5% Recall was attained after 3,188 documents. Total Recall was attained after submitting 12,109 documents out of the corpus total of 465,147.

A graph mapping how the review was conducted appears below, with the light green line signifying the anticipated 70% Recall call, and the dark green line the Reasonable Recall call.


The following chart shows Precision (left and blue line), F1 (red) and percent of documents submitted (green) as tracked across varying recall thresholds. On the Rootkits topic, by the time 97.5% Recall had been attained, only 0.69% of the corpus, 3,188 documents, had been submitted for adjudication. The last portion of the graph thus represents the submission of the remaining 99.31% or 461,959 documents.



______________________________________

Topic 102 Capital Punishment

Confusion Matrix - Topic 102 Capital Punishment
Total Documents: 290,099    Total Relevant: 1,624    Total Prevalence: 0.56%

                    @Reas. Call    @97.5% Recall
True Positives              941            1,583
True Negatives          288,345           17,048
False Positives             130          271,427
False Negatives             683               41
Recall                   57.94%           97.50%
Precision                87.86%            0.58%
F1 Measure               69.83%            1.15%
Accuracy                 99.72%            6.42%
Error                     0.28%           93.58%
Elusion                   0.24%            0.24%
Fallout                   0.05%           94.09%
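Each measure in the confusion matrix follows from the four cell counts. The sketch below recomputes the Reasonable call column for Topic 102. It assumes the standard definitions (for example, Elusion as false negatives over all documents not submitted, and Fallout as false positives over all irrelevant documents); the paper does not spell out its formulas, and the class name `Confusion` is purely illustrative rather than part of EDR or the TREC tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Four cells of a binary confusion matrix (hypothetical helper)."""
    tp: int  # relevant documents submitted
    tn: int  # irrelevant documents not submitted
    fp: int  # irrelevant documents submitted
    fn: int  # relevant documents not submitted

    @property
    def total(self):
        return self.tp + self.tn + self.fp + self.fn

    def recall(self):
        return self.tp / (self.tp + self.fn)

    def precision(self):
        return self.tp / (self.tp + self.fp)

    def f1(self):
        p, r = self.precision(), self.recall()
        return 2 * p * r / (p + r)

    def accuracy(self):
        return (self.tp + self.tn) / self.total

    def error(self):
        return (self.fp + self.fn) / self.total

    def elusion(self):
        # Share of the unsubmitted documents that are in fact relevant.
        return self.fn / (self.fn + self.tn)

    def fallout(self):
        # Share of the irrelevant documents that were submitted.
        return self.fp / (self.fp + self.tn)


# Topic 102 at the Reasonable call, counts taken from the table above.
topic_102 = Confusion(tp=941, tn=288_345, fp=130, fn=683)

if __name__ == "__main__":
    for name in ("recall", "precision", "f1", "accuracy",
                 "error", "elusion", "fallout"):
        print(f"{name:<10}{getattr(topic_102, name)():8.2%}")
```

Under these assumed definitions the recomputed values agree with the table to two decimal places, which suggests the table uses the same formulas.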


Topic 102 was run by Losey with the assistance of a review attorney, Jensen. The work to search the 290,099 Bush Emails started on July 26, 2015 and concluded on July 29, 2015. The project commenced with Losey and his assistant beginning Step Two, Multimodal Search Reviews. Step Three, Random Baseline, was omitted. After submissions began, the echo Step Five, multimodal, was handled with the assistance, at first, of Jensen. Losey performed all of the AI related searches in Step Five, including the probability and ranking related searches. His assistant focused on keyword searches and also made suggestions of documents to submit. Again, all final decisions on submittal were made by Losey.

On July 28, 2015, after making 20 submissions to TREC, and training after almost every submission, Losey had provided a total of 1,071 documents to TREC and confirmed a total of 941 relevant documents. The effort, or number of documents reviewed and coded by Losey to attain this result, was 1,493 documents. After the 20th TREC submission, Losey decided to call Reasonable. It was later determined that a Recall of 57.94% had been attained with a Precision of 87.86%, so his call proved to be early. There were only 3 additional submissions to TREC after the Reasonable call point, which we later learned was a mistake. We learned later that higher Recall and overall TREC scoring comes from multiple, smaller submissions, with training after each. This is another Topic in which we found many of the TREC judgments inconsistent and incomprehensible. Still, even with these problems and errors, a Recall of 70% was attained after a total of only 7,785 documents had been submitted out of 290,099, and only 1,493 documents had been reviewed.

A graph mapping how the review was conducted appears below, with the light green line signifying the anticipated 70% Recall call, and the dark green line the Reasonable Recall call.

The following chart shows Precision (left and blue line), F1 (red) and percent of documents submitted (green) as tracked across varying recall thresholds. On the Capital Punishment topic, by the time 97.5% Recall had been attained, 94.11% of the corpus, 273,010 documents, had been submitted for adjudication. The last portion of the graph thus represents the submission of the remaining 5.89% or 17,089 documents.



______________________________________


Topic 106 Terri Schiavo

Confusion Matrix - Topic 106
Total Documents: 290,099    Total Relevant: 17,135    Total Prevalence: 5.91%

Topic 106 was run by Losey with the assistance of a review attorney, Bottolene. The work to search the 290,099 Bush Emails started on July 27, 2015 and concluded on August 2, 2015. The project commenced with Losey and his assistant beginning Step Two, Multimodal Search Reviews. Step Three, Random Baseline, was omitted. After submissions began, the echo Step Five, multimodal, was handled with the assistance, at first, of Bottolene. Losey performed all of the AI related searches in Step Five, including the probability and ranking related searches. His assistant focused on keyword searches and also made suggestions of documents to submit. Again, all final decisions on submittal were made by Losey. This review process went longer than the others because this proved to be the highest prevalence Topic (5.91%).

On August 2, 2015, after making 25 submissions, with training after most of these, Losey had submitted a total of 17,354 documents. A total of 16,872 of these submissions were confirmed relevant by TREC, for a Precision rate of 97.22%. The effort, or number of documents reviewed and coded by Losey to attain this result, was 2,025 documents. After the 25th TREC submission, Losey decided to call Reasonable. It was later determined that an incredible Recall of 98.47% had been attained. The F1 measure was 97.84%. That is the Team’s best result on any of the Bush Email Topics. Further, Losey believes this may be a personal best for Recall and F1 scores. There were 7 additional submissions to TREC after the Reasonable call point. In the 29th submission, 99.7% Recall was attained after submitting only 7,060 additional documents. The Precision was 70%.

A graph mapping how the review was conducted appears below, with the light green line signifying the anticipated 70% Recall call, and the dark green line the Reasonable Recall call.

                    @Reas. Call    @97.5% Recall
True Positives           16,872           16,707
True Negatives          272,482          272,551
False Positives             482              413
False Negatives             263              428
Recall                   98.47%           97.50%
Precision                97.22%           97.59%
F1 Measure               97.84%           97.54%
Accuracy                 99.74%           99.71%
Error                     0.26%            0.29%
Elusion                   0.10%            0.16%
Fallout                   0.18%            0.15%


The following chart shows Precision (left and blue line), F1 (red) and percent of documents submitted (green) as tracked across varying recall thresholds. On the Terri Schiavo topic, by the time 97.5% Recall had been attained, only 5.90% of the corpus, 17,120 documents, had been submitted for adjudication. The last portion of the graph thus represents the submission of the remaining 94.10% or 272,979 documents.



______________________________________

Topic 105 Affirmative Action

Confusion Matrix - Topic 105
Total Documents: 290,099    Total Relevant: 3,635    Total Prevalence: 1.25%


Topic 105 was run by Losey with the assistance of a review attorney, Jensen. The work to search the 290,099 Bush Emails started on July 29, 2015 and concluded on July 31, 2015. The project commenced with Losey and his assistant beginning Step Two, Multimodal Search Reviews. Step Three, Random Baseline, was omitted. After submissions began, the echo Step Five, multimodal, was performed with the assistance, at first, of Jensen. Losey handled all of the AI related searches in Step Five, including the probability and ranking related searches. His assistant focused on keyword searches and also made suggestions of documents to submit. Again, all final decisions on submittal were made by Losey.

On July 30, 2015, after making 23 document submissions to TREC providing a total of 3,418 documents, Losey had found a total of 3,353 relevant documents. The effort, or number of documents reviewed and coded by Losey to attain this result, was 674 documents. After the 23rd TREC submission, Losey decided to call Reasonable. It was later determined that a Recall of 92.24% had been attained, with Precision of 98.1%, and F1 of 95.08%. There were 7 additional submissions to TREC after the Reasonable call point. In the 27th submission, after submitting only 3,427 additional documents (total 6,845), 95% Recall was attained. This was attained after submission of only 2.36% of the total documents.

A graph mapping how the review was conducted appears below, with the light green line signifying the anticipated 70% Recall call, and the dark green line the Reasonable Recall call.

The following chart shows Precision (left and blue line), F1 (red) and percent of documents submitted (green) as tracked across varying recall thresholds. On the Affirmative Action topic, by the time 97.5% Recall had been attained, only 2.90% of the corpus, 8,423 documents, had been submitted for adjudication. The last portion of the graph thus represents the submission of the remaining 97.10% or 281,676 documents.



______________________________________


Topic 3357 Occupy Vancouver

Confusion Matrix - Topic 3357
Total Documents: 902,434    Total Relevant: 629    Total Prevalence: 0.07%

Topic 3357 was run by Reichenberger. The work to search the 902,434 News Articles database started on July 29, 2015, and was completed on July 30, 2015. The initial submissions on the first day were to test the outlines of the category. The initial search of “Occupy” AND “Vancouver” identified a series of protests in Vancouver about economic income inequality. Documents were selected based on a variety of content, including “Occupy” movements in other cities, riots/protests that took place in the same area (but not same time) as the Occupy Vancouver protests, and generic stories about “Occupy” protests that reference protests in Vancouver but do not specifically name them as “Occupy Vancouver.” Various sources were also tested, such as Letters to the Editor, stories sourced in other cities and so forth. Results helped formulate an anticipated rule on relevance.

After training EDR and receiving priority scores, relevant documents on subsequent submissions were confirmed by these rules and their priority scores. In fact, of the five irrelevant documents found in the last 2 submissions on July 29th, three scored over 97% and contained substantial and direct references to Occupy Vancouver; these may be TREC coding errors.

A modified Step Three, Random Sample of 1,000 documents, was taken after Step Two was complete. The first 500 contained 50 “training” documents to focus on, while the second 500 documents contained 250. All documents hitting on “Occupy” OR “Vancouver” OR “Ashlie Gough” (a student who died at the protests) OR “Robson Square” (location of the protests) were reviewed, while all others were mass-trained as irrelevant. The last TREC submission on July 29th was from the 1,000 random documents. Of the 1,000 documents, 33 were identified as relevant, confirmed by submission.


On the second day, the 30th, documents containing search terms and escalated as relevant were reviewed and submitted in priority order. In the first submission of the day, 123 were submitted as relevant and 118 came back as confirmed relevant. Of the five irrelevant in that set, four were documents that had the exact same relevant text as documents TREC previously confirmed as relevant. This is another example of the kind of “gold standard” inconsistencies the Team encountered in most of the Topics. In the next set of submissions, documents escalated as relevant by Mr. EDR included stories sourced in the Vancouver paper on Occupy movements elsewhere, and sports stories with the word “occupy” in the article (e.g. “Another Vancouver player occupied the penalty box”). Once those documents were removed as irrelevant, all others were submitted and confirmed as relevant on submission. Some additional “gray area” documents were submitted (e.g. “Occupy Christmas,” which was an offshoot of the protests, or campaign questions posed to candidates about the Occupy Vancouver protests). As the Mr. EDR ranking scores decreased, the precision dropped.

Prior to the final submissions, all documents with “Occupy” and “Vancouver” with relevance probability scores over 0.1% had either been submitted or reviewed, and all documents with scores over 75% without those terms had also been reviewed. After the final Reasonable call was made, the remaining documents were submitted in the following groups in descending priority order: 1) all documents currently coded as irrelevant by the human reviewer not yet submitted (2,212 documents, of which 45 were found to be relevant); 2) anything remaining with “Occup!” AND “Vancouver” (493 documents, all of which had scores below 0.1%, of which 8 were found to be relevant); and then 3) all else (no relevant documents found in this set).

A graph mapping how the review was conducted appears below, with the light green line signifying the anticipated 70% Recall call, and the dark green line the Reasonable call.


The following chart shows Precision (left and blue line), F1 (red) and percent of documents submitted (green) as tracked across varying recall thresholds. On the Occupy Vancouver topic, by the time 97.5% Recall had been attained, only 0.18% of the corpus, 1,584 documents, had been submitted for adjudication. The last portion of the graph thus represents the submission of the remaining 99.82% or 900,850 documents.



______________________________________

Topic 2158 Using TOR for Anonymous Browsing on the Internet

Confusion Matrix - Topic 2158
Total Documents: 465,149    Total Relevant: 1,261    Total Prevalence: 0.27%

Topic 2158 was run by Sullivan, who also started on July 29, 2015. He finished his review of 465,149 forum posts in Black Hat World on July 31, 2015. Sullivan’s computer background proved to be helpful in another uncommon forum topic. He considers himself more knowledgeable on this topic than the average person, but does not consider himself to be a subject matter expert on TOR.

Day 1 of this topic started with concept searching to find other keywords relating to TOR and anonymous browsing. Many previously unknown terms came to light, such as vpn, torbrowser, proxy, and ip. This process of using concept searching at the beginning of every topic became the standard process for all remaining reviews done by Sullivan. The results of this exercise were used in future keyword searches as well as database-wide keyword highlighting.

Next, Sullivan started manually reviewing some of the hits on terms he felt would be most likely to yield responsive documents. He started with 102 documents that hit on “TOR” and “anonym*” and moved on to hits on “TOR Browser,” then “TOR” and “Prox*.” It was not difficult to find a relatively high quantity of relevant documents. 108 relevant documents and 100 irrelevant documents were trained for predictive coding when the first learning session was run. After the first learning session completed, Sullivan manually reviewed the highest scoring documents that contained the term “TOR” and found almost all to be relevant. At the conclusion of the first day, 214 documents had been submitted to TREC, with all 214 being returned as relevant.

Day 2 consisted of many iterations of learning sessions and evaluating search results. Similar to how Sullivan reviewed Topic 2052, he started with a narrow list of keyword searches and broadened the terms iteratively. For each set, he reviewed the documents with the highest predictive coding scores. Starting the day with “TOR” and “prox*,” he moved to “Try TOR,” “Try using TOR,” and “Use TOR.” Eventually he moved to all documents that contained “TOR” or “T0R.” Every document he determined to be relevant was submitted to TREC.

At the end of the exercise, Sullivan had submitted 1,339 documents, with 1,244 being returned as relevant and 95 being returned as not relevant according to the TREC standard. At this point he called his shot at Reasonable Recall.

Day 3 started with the submission of all remaining documents that contained the term “TOR” as a method to catch any documents potentially missed. No additional relevant documents were returned.

All remaining documents in the database were submitted in order of descending predictive coding score. 14 more relevant documents were returned. Evaluation of these documents led to finding spectacular errors in the TREC standard. All 14 contained “*tor*” in some context, but none had any even marginal links to the current topic. A majority of the missed documents contained the term “hostigator.com.” Evaluation of these 14 documents resulted in a determination that all 14 were caused by an error in the TREC classification system.

A graph mapping how the review was conducted appears below, with the light green line signifying the anticipated 70% Recall call, and the dark green line the Reasonable Recall call.

The following chart shows Precision (left and blue line), F1 (red) and percent of documents submitted (green) as tracked across varying recall thresholds. On the Using TOR for Anonymous Internet Browsing topic, by the time 97.5% Recall had been attained, only 0.28% of the corpus, 1,294 documents, had been submitted for adjudication. The last portion of the graph thus represents the submission of the remaining 99.72% or 463,855 documents.


______________________________________


Topic 104 New Medical Schools


Topic 104 was run by Losey with the assistance of a review attorney, Jensen. The work to search the 290,099 Bush Emails started on July 31, 2015 and concluded on August 4, 2015. The project commenced with Losey and his assistant beginning Step Two, Multimodal Search Reviews. Step Three, Random Baseline, was omitted. After submissions began, the echo Step Five, multimodal, was performed with the assistance, at first, of Jensen. Losey handled all of the AI related searches in Step Five, including the probability and ranking related searches. His assistant focused on keyword searches and also made suggestions of documents to submit. Again, all final decisions on submittal were made by Losey.

On August 3, 2015, after making 8 document submissions to TREC providing a total of 199 documents, Losey had found a total of 157 relevant documents. The effort, or number of documents reviewed and coded by Losey to attain this result, was 1,091 documents. After the 8th TREC submission, Losey decided to call Reasonable. It was later determined that a Recall of 69.16% had been attained, with Precision of 78.89%, and F1 of 73.71%. He made the call decision a little prematurely on this Topic. In the next submission of only 20 documents, Losey brought the Recall level up to 71.37% with Precision of 73.97%. In the next submission of 781 documents he brought the Recall level to 77.97%. There were a total of 7 additional submissions to TREC after the Reasonable call point. After submitting a total of 1,611 documents, which is only 0.56% of the total documents, and reviewing only 1,091 documents, an 80% Recall was attained.

A graph mapping how the review was conducted appears below, with the light green line signifying the anticipated 70% Recall call, and the dark green line the Reasonable Recall call.


The following chart shows Precision (left and blue line), F1 (red) and percent of documents submitted (green) as tracked across varying recall thresholds. On the New Medical Schools topic, by the time 97.5% Recall had been attained, 82.16% of the corpus, 238,331 documents, had been submitted for adjudication. The last portion of the graph thus represents the submission of the remaining 17.84% or 51,768 documents.



______________________________________

Topic 109 Scarlet Letter Law

Confusion Matrix - Topic 109 Scarlet Letter Law
Total Documents: 290,099    Total Relevant: 506    Total Prevalence: 0.17%


Topic 109 was run by Losey with the assistance of a review attorney, Bottolene. The work to search the 290,099 Bush Emails started on August 3, 2015 and concluded on August 11, 2015. The project commenced with Losey and his assistant beginning Step Two, Multimodal Search Reviews. Step Three, Random Baseline, was omitted. After submissions began, the echo Step Five, multimodal, was performed with the assistance, at first, of Bottolene. Losey handled all of the AI related searches in Step Five, including the probability and ranking related searches. His assistant focused on keyword searches and also made suggestions of documents to submit. Again, all final decisions on submittal were made by Losey.

On August 11, 2015, after making 26 submissions to TREC providing a total of 510 documents, Losey had found a total of 485 relevant documents. The effort, or number of documents reviewed and coded by Losey to attain this result, was 953 documents. After the 26th TREC submission, Losey decided to call Reasonable. It was later determined that a Recall of 95.85% had been attained, with Precision of 95.1%. There were 14 additional submissions to TREC after the Reasonable call point. In the next submission after the call, of only 121 documents, a Recall of 98.62% was attained. Recall of 100% was attained three submissions later, after submitting only 1,074 documents, 0.37% of the total, and review of only 953 documents.

A graph mapping how the review was conducted appears below, with the light green line signifying the anticipated 70% Recall call, and the dark green line the Reasonable Recall call.

The following chart shows Precision (left and blue line), F1 (red) and percent of documents submitted (green) as tracked across varying recall thresholds. On the Scarlet Letter Law topic, by the time 97.5% Recall had been attained, only 0.20% of the corpus, 585 documents, had been submitted for adjudication. The last portion of the graph thus represents the submission of the remaining 99.80% or 289,514 documents.



______________________________________


Topic 100 School and Preschool Funding

Confusion Matrix - Topic 100
Total Documents: 290,097    Total Relevant: 4,542    Total Prevalence: 1.57%

Topic 100 was run by Losey with the limited assistance of a review attorney, Jensen. The work to search the 290,099 Bush Emails started on August 4, 2015 and concluded on August 8, 2015. The project commenced with Losey and his assistant beginning Step Two, Multimodal Search Reviews. Step Three, Random Baseline, was omitted. After submissions began, the echo Step Five, multimodal, was performed with some assistance, at first, of Jensen. Losey handled all of the AI related searches in Step Five, including the probability and ranking related searches. His assistant focused on keyword searches and also made a couple of suggestions of documents to submit. Again, all final decisions on submittal were made by Losey.

On August 6, 2015, after making 44 submissions to TREC providing a total of 2,537 documents, Losey had found a total of 2,441 relevant documents. The effort, or number of documents reviewed and coded by Losey to attain this result, was 651 documents. After the 44th TREC submission, Losey decided to call Reasonable. This proved to be a premature call. It was later determined that a Recall of 53.74% had been attained, with Precision of 96.22%, and F1 of 68.96%. There were 19 additional submissions to TREC after the Reasonable call point. After submitting a total of 7,541 documents, which is only 2.6% of the total documents, and reviewing only 651 documents, a 70% Recall level was attained. A Recall of 80% was attained after submitting 6.28% of the total documents, and Recall of 90% after submitting 7.92%.

A graph mapping how the review was conducted appears below, with the light green line signifying the anticipated 70% Recall call, and the dark green line the Reasonable Recall call.

                    @Reas. Call    @97.5% Recall
True Positives            2,441            4,429
True Negatives          285,459          199,460
False Positives              96           86,095
False Negatives           2,101              113
Recall                   53.74%           97.51%
Precision                96.22%            4.89%
F1 Measure               68.96%            9.32%
Accuracy                 99.24%           70.28%
Error                     0.76%           29.72%
Elusion                   0.73%            0.06%
Fallout                   0.03%           30.15%


The following chart shows Precision (left and blue line), F1 (red) and percent of documents submitted (green) as tracked across varying recall thresholds. On the School and Preschool Funding topic, by the time 97.5% Recall had been attained, 31.20% of the corpus, 90,524 documents, had been submitted for adjudication. The last portion of the graph thus represents the submission of the remaining 68.80% or 199,573 documents.



______________________________________

Topic 107 Tort Reform

Confusion Matrix - Topic 107
Total Documents: 290,099    Total Relevant: 2,369    Total Prevalence: 0.82%


Topic 107 was run by Losey with the limited assistance of a review attorney, Jensen. The work to search the 290,099 Bush Emails started on August 5, 2015 and concluded on August 15, 2015. The project commenced with Losey and his assistant beginning Step Two, Multimodal Search Reviews. Step Three, Random Baseline, was omitted. After submissions began, the echo Step Five, multimodal, was performed with some assistance, at first, of Jensen. Losey handled all of the AI related searches in Step Five, including the probability and ranking related searches. His assistant focused on keyword searches and also made a couple of suggestions of documents to submit. Again, all final decisions on submittal were made by Losey.

On August 14, 2015, after making 48 submissions to TREC providing a total of 2,259 documents, Losey had found a total of 1,950 relevant documents. The effort, or number of documents reviewed and coded by Losey to attain this result, was 1,164 documents. After the 48th TREC submission, Losey decided to call Reasonable. It was later determined that a Recall of 82.31% had been attained, with Precision of 86.32%, and F1 of 84.27%. There were 31 additional submissions to TREC after the Reasonable call point. After submitting a total of 2,648 documents, which is only 0.91% of the total documents, and reviewing only 1,164 documents, a 90% Recall level was attained with 80.55% Precision. Recall of 95% was attained after submitting 3,963 documents, 1.37% of total. Recall of 98% was attained after submitting 5,843 documents, 2.01% of total.

A graph mapping how the review was conducted appears below, with the light green line signifying the anticipated 70% Recall call, and the dark green line the Reasonable Recall call.

The following chart shows Precision (left and blue line), F1 (red) and percent of documents submitted (green) as tracked across varying recall thresholds. On the Tort Reform topic, by the time 97.5% Recall had been attained, only 2.01% of the corpus, 5,843 documents, had been submitted for adjudication. The last portion of the graph thus represents the submission of the remaining 97.99% or 284,256 documents.



______________________________________


Topic 3481 Fracking

Confusion Matrix - Topic 3481 Fracking
Total Documents: 902,434    Total Relevant: 1,966    Total Prevalence: 0.22%

Topic 3481 was run by Sullivan, who started on August 4, 2015. He finished his review of 902,434 News Articles on August 7, 2015 after 7 total hours of effort.

Sullivan had no background or knowledge of fracking prior to this exercise. While expert knowledge was not necessary, there were a few instances where some additional knowledge of the topic would have been helpful.

Sullivan had previously tackled topics in the forums data set, but this was his first topic in the News data set. He found that the lack of spelling issues and the overall consistency in the documents provided a much easier set of data to review. Much less manual review was necessary with the news topics.

On the first day, Sullivan used concept searching to identify similar topics, per his standard process. He created a list of most likely relevant keywords and used the list for searching and keyword highlighting. Both search and keyword highlighting lists were modified through the course of the review as new information was obtained.

Sullivan decided to go with a different approach to this topic. Rather than performing a manual review of documents to begin, he decided to submit as relevant, without review, any document that contained over 5 instances of the term “fracking.” 286 documents met this standard, and all were returned as relevant when submitted to TREC.

While the data used for this exercise did not contain any metadata, Sullivan determined any text that appeared in the first 2 lines of the document could be considered the document’s title. He found 61 documents that contained “fracking” in the title and an additional instance of fracking elsewhere in the document. Of these, 60 were returned as relevant and 1 as not relevant. Further evaluation determined the not relevant document was an error in the TREC standard. Next, 9 documents were found which contained “hydrofracking” in the title. All 9 were returned as relevant. He then continued with slight variations until submitting all documents that contained 2 or more hits on the term “fracking.” After 1 hour and manual review of 29 documents, 746 documents had been submitted with 745 being returned as relevant.

Sullivan continued manually reviewing the documents with a single hit on fracking to sort out the false positives. After reviewing a couple of sets of documents, he initiated his first predictive coding learning session for this topic.

At the start of Day 2, Sullivan believed he had found nearly all relevant documents for this topic. However, after reviewing documents with high predictive coding scores, he quickly realized that “fracturing” was another key term he hadn’t previously considered. The use of predictive coding helped him quickly find an additional 400 relevant documents that would have been lost if using keyword searching alone.

Reasonable Recall was called after submitting 2,077 documents, with 1,893 returned as relevant. The remaining documents were submitted in order of descending predictive coding scores, and 73 more relevant documents were returned. An evaluation of the returned documents revealed many errors in the TREC standard, as well as a fair number of relevant documents that were not properly captured due to Sullivan’s lack of knowledge of fracking and related mining terms.

A graph mapping how the review was conducted appears below, with the light green line signifying the anticipated 70% Recall call, and the dark green line the Reasonable Recall call.
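The two keyword rules Sullivan used to submit documents without manual review can be expressed as a short sketch. It assumes that "over 5 instances" means a count of six or more whole-word, case-insensitive matches, and that the title is the first two lines of the raw text as described above; the function names are hypothetical and do not correspond to any EDR feature.

```python
import re

# Whole-word match, so "hydrofracking" is counted separately from "fracking".
FRACKING = re.compile(r"\bfracking\b", re.IGNORECASE)


def split_title(text, title_lines=2):
    """Treat the first `title_lines` lines as the title, the rest as the body."""
    lines = text.splitlines()
    return "\n".join(lines[:title_lines]), "\n".join(lines[title_lines:])


def hits(text):
    """Number of occurrences of the term in the given text."""
    return len(FRACKING.findall(text))


def submit_without_review(text, threshold=5):
    """Rule 1: the document contains over `threshold` instances of the term."""
    return hits(text) > threshold


def title_rule(text):
    """Rule 2: the term is in the title and appears at least once elsewhere."""
    title, body = split_title(text)
    return hits(title) >= 1 and hits(body) >= 1


if __name__ == "__main__":
    doc = "Fracking ban debated\nStaff report\nCouncil votes on fracking tonight."
    print(submit_without_review(doc), title_rule(doc))
```

Lowering `threshold` step by step corresponds to the "slight variations" described above, ending with all documents that contain 2 or more hits.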

The following chart shows Precision (left and blue line), F1 (red) and percent of documents submitted (green) as tracked across varying recall thresholds. On the Fracking topic, by the time 97.5% Recall had been attained, only 0.27% of the corpus, 2,439 documents, had been submitted for adjudication. The last portion of the graph thus represents the submission of the remaining 99.73% or 899,995 documents.


The last chart below represents the amount of effort in terms of documents reviewed to attain 100% recall using the multimodal hybrid model of training EDR.

______________________________________


Topic 3431 Kingston Mills Lock Murders

Confusion Matrix - Topic 3431
Total Documents: 902,434    Total Relevant: 1,111    Total Prevalence: 0.12%

Topic 3431 was run by Reichenberger. The work to search the 902,434 News Articles database started on August 4, 2015, and was completed on August 5, 2015. The initial submissions on the first day were to test the outlines of the category. The initial search of “Kingston” AND “murder” identified a sensationalized murder story about a man with the last name “Shafia” murdering his daughters in an “honor killing.” Documents containing the information in various forms (headline, text, “click bait” link reference at end of article) were submitted. Results helped formulate an anticipated rule on relevance.

After training Mr. EDR and receiving relevance priority scores, the results of a search on the specific victim names or “Shafia” were sorted by prioritization order. Samples of 10 documents above 90%, 10 between 80-90%, 10 between 60-80%, 10 between 25-60% and 10 below 25% showed that documents above 60% were very likely relevant. In fact, documents scoring over 90% all had multiple name hits and were specifically on point; documents in the middle ranges were usually indirectly related (e.g. about “honor killing,” or domestic abuse, or more of a casual reference to the Kingston Mills murders); and those documents below 5% were almost always irrelevant. As a test, the second submission contained all documents with a score over 90%, along with samples of several documents at various scores greater than 50%, cutting the submission off at 200 documents even.

With only 111 documents reviewed eyes-on to this point, Reichenberger had a 98.5% precision on 205 documents submitted. Of the 205 documents submitted to this point, the only 3 irrelevant documents all had the same trait: “Shafia” appeared in the header but there was no reference to it in the text. Similar documents were mass-coded as irrelevant going forward. Likewise, people with names similar to the victims were found in the 40-60% probability range but were “false positive” documents. These included an AP photographer, the President of Gambia, and protesters in Yemen with first names the same as one of the victims. Searches were done on those specific names and the hits were mass-tagged as irrelevant. After a machine learning session, the scores adjusted, dropping those false positive names to the bottom.

At this point, a sampling of key term hits showed that everything scoring over 20% was relevant and everything below 1% was irrelevant. Everything in between consisted of low quality references to the murders with some irrelevant documents mixed in. As such, the next submission was for everything with a key term and a relevance score over 25% (456 documents), of which 449 were found relevant. The 7 documents found irrelevant were misclicks by Reichenberger (human error). In one case a document was primarily about a different murder, but later in the article there was relevant discussion of the target murder. Mr. EDR picked this up, but it was apparently missed by TREC’s relevance scope adjudications. The 70% Recall call was then made, having reviewed only 209 documents. It turned out that Recall was actually 58.6% with Precision at 98.5%.

The next submission consisted largely of documents containing a single line of “click bait” link text found by TREC to be relevant. Other documents considered were documents with key terms that had scores rise above 20% following the machine learning session from the previous set, and documents with scores above 50% with no key terms. While documents with key terms were largely found to be relevant, most of the documents without the terms were found to be irrelevant. In fact, documents scoring above 70% were often tangential to the issues in the murder (domestic violence mostly) but not relevant, while those at 50-70% had no semblance of relevance at all, and were being escalated based on coincidental “click bait” text advertisement lines at the end of the article. Another 459 documents were submitted, of which 456 were found relevant. The three irrelevant documents all were on the low end of the scores within the submission and were only passing references to the case.

At this point the 80% Recall call was made. Recall was actually at 99.64% with a precision at 99.34%. Only 272 documents were reviewed eyes-on to this point, and 1,120 relevant documents had been found. All documents with scores over 70% had been reviewed or submitted, and all those with key terms and scores over 20% had been reviewed or submitted. Following the subsequent machine learning session, 30 documents were escalated to consider. One borderline document was considered potentially relevant and submitted, and was returned as irrelevant, while the rest were all marked irrelevant. The Reasonable call was made.

After the Reasonable call was made, documents were submitted in the following groups in descending priority score order: 1) three potentially relevant documents found while results of the previous submission were pending (one was found to be relevant); 2) all documents reviewed eyes-on and anticipated to be irrelevant, but not yet submitted (199 documents, of which two were relevant, and the only relevant text within these two documents was contained in a document previously submitted to TREC and returned as irrelevant); 3) anything mass-coded as irrelevant (this resulted in one relevant document, in which there does not appear to be any relevant material and which may be yet another TREC coding error); and 4) anything remaining (all irrelevant).

A graph mapping how the review was conducted appears below, with the light green line signifying the anticipated 70% Recall call, and the dark green line the Reasonable call.
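The score-band spot check described for this topic (ten documents drawn from each of several probability ranges) can be sketched as follows. The band edges are taken from the narrative, but whether each edge is inclusive is an assumption, as are the function name `sample_by_band`, the use of scores on a 0 to 1 scale, and the fixed random seed.

```python
import random

# (low, high) probability bands from the text: above 90%, 80-90%, 60-80%,
# 25-60%, and below 25%. A score s falls in a band when low <= s < high.
BANDS = [(0.90, 1.01), (0.80, 0.90), (0.60, 0.80), (0.25, 0.60), (0.00, 0.25)]


def sample_by_band(scores: dict[str, float], per_band: int = 10, seed: int = 0):
    """Draw up to `per_band` document ids at random from each score band."""
    rng = random.Random(seed)
    samples = {}
    for low, high in BANDS:
        in_band = sorted(doc for doc, s in scores.items() if low <= s < high)
        samples[(low, high)] = rng.sample(in_band, min(per_band, len(in_band)))
    return samples


if __name__ == "__main__":
    # Toy scores only; real scores would come from the predictive coding ranking.
    toy_scores = {f"doc{i:03d}": i / 100 for i in range(100)}
    for band, docs in sample_by_band(toy_scores).items():
        print(band, len(docs))
```

Reviewing each sample and recording the share found relevant gives a rough picture of where a bulk-submission cutoff could be placed, which is how the 60% and later 20% thresholds above appear to have been chosen.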


The following chart shows Precision (left and blue line), F1 (red) and percent of documents submitted (green) as tracked across varying recall thresholds. On the Kingston Mills Lock Murders topic, by the time 97.5% Recall had been attained only 0.12% of the corpus, 1,096 documents, had been submitted for adjudication. The last portion of the graph thus represents the submission of the remaining 99.88% or 901,338 documents.

The last chart below represents the amount of effort in terms of documents reviewed to attain 100% Recall using the Multimodal Hybrid model of training Mr. EDR.


______________________________________

Topic 2130 Surely Bitcoins Can Be Used

Confusion Matrix - Topic 2130
Total Documents: 465,147
Total Relevant: 2,299
Total Prevalence: 0.49%

                   @Reas. Call   @97.5% Recall
True Positives           1,961           2,242
True Negatives         461,007         448,083
False Positives          1,841          14,765
False Negatives            338              57
Recall                  85.30%          97.52%
Precision               51.58%          13.18%
F1 Measure              64.29%          23.23%
Accuracy                99.53%          96.81%
Error                    0.47%           3.19%
Elusion                  0.07%           0.01%
Fallout                  0.40%           3.19%
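The measures in these confusion matrices all follow from the four cell counts. As a minimal sketch (the function name and dictionary layout are illustrative assumptions, not part of EDR or the TREC tooling), the Topic 2130 values at the Reasonable call can be recomputed like this:

```python
def confusion_metrics(tp, tn, fp, fn):
    """Return the standard retrieval measures as fractions between 0 and 1."""
    total = tp + tn + fp + fn
    return {
        "prevalence": (tp + fn) / total,   # share of the corpus that is relevant
        "recall": tp / (tp + fn),          # relevant documents found
        "precision": tp / (tp + fp),       # submitted documents that were relevant
        "f1": 2 * tp / (2 * tp + fp + fn), # harmonic mean of recall and precision
        "accuracy": (tp + tn) / total,
        "error": (fp + fn) / total,
        "elusion": fn / (fn + tn),         # relevant share of the unsubmitted set
        "fallout": fp / (fp + tn),         # irrelevant documents that were submitted
    }


# Topic 2130 counts at the Reasonable call, as reported in the confusion matrix.
m = confusion_metrics(tp=1961, tn=461007, fp=1841, fn=338)
for name, value in m.items():
    print(f"{name:<10} {value:.2%}")
```

Values computed this way can differ from the reported figures in the last digit, depending on how intermediate values were rounded; F1 is the one measure here where that happens.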


Topic 2130 was run by Reichenberger. The work to search the 465,147 documents in the Black Hat World Forums database started on August 7, 2015 and was completed August 13, 2015.

The initial submissions were to test the outlines of the category. The first submission was nine documents with varying discussions about Bitcoin (e.g. bitcoin exchanges, whether bitcoin was accepted, bitcoin mining, etc.). All nine came back as irrelevant. A second submission of nine returned five relevant documents but no noticeable commonality among them, except that “accept bitcoin” was relevant and “accept bitcoins” was not. The next 25 documents submitted also followed this trend, with the singular “accept bitcoin” being relevant and those in the plural being irrelevant. All documents with “accept w/3 bitcoin” were submitted in the following two submission sets; however, having that text was not indicative of relevance, as some still came back irrelevant. Likewise, a variation of bitcoin (“BTC”) was submitted (15 relevant, 5 irrelevant, no consistent thread).

After a machine learning session, the submitted documents were revisited, and it appeared that using bitcoin for legal activity, or someone vouching for a forum user, tended to be relevant, while illegal or immoral activity was irrelevant. For the next submission, the 60 highest scoring documents were submitted and anticipated as relevant or irrelevant based on the purpose of the transaction. While not perfect, this largely correlated with the results (10 expected relevant, end result was 13). The next submission contained all documents with a 90% or higher probable relevant score and containing the term “vouch*”. Of the 122 documents, 94 were relevant.

A graph mapping how the review was conducted appears below, with the light green line signifying the anticipated 70% Recall call, and the dark green line the Reasonable call.

The following chart shows Precision (left and blue line), F1 (red) and percent of documents submitted (green) as tracked across varying recall thresholds. On the Surely Bitcoins Can Be Used topic, by the time 97.5% Recall had been attained only 3.66% of the corpus, 17,007 documents, had been submitted for adjudication. The last portion of the graph thus represents the submission of the remaining 96.34% or 448,140 documents.



______________________________________


Topic 3089 Pickton Murders


Topic 3089 was run by Joe White. Work on this topic commenced on August 5, 2015 and concluded on August 28, 2015. Approximately 24 hours were spent on this topic, including a few hours up front researching the subject matter. This served as a proxy for the e-Discovery Team Hybrid Multimodal Model, Step 1, ESI Discovery Communications. Completion of this Topic was drawn out due to time conflicts including vacation.

The collection of 902,434 News Articles was generally easier to search than the Bush Emails or Black Hat World Forum posts, though the news articles contained many links, footers and subject matters that were shared with other news stories, creating the appearance of similarity. As would be expected with news articles, misspelled words and names seemed nonexistent, which was helpful. White did, however, find a few gold-standard inconsistencies in this topic.

White began Step Two, multimodal search, by creating several keyword lists based on his judgment and notes from the initial topic research. This research included events, names, locations, and other information related to the case. The goals of the keyword lists were: (a) to create a seed set to begin finding the potentially relevant documents and to begin training Mr. EDR; (b) to guesstimate how large the relevant document set would be, a kind of rough substitute for the Step Three sample; and (c) to highlight relevant terms in the software to facilitate more effective review and training. (Note: all reviewers so highlighted certain keywords as a matter of course to speed up and improve review.)

When the initial keywords brought back only 220-some documents, White, while still cognizant of the limitations of keyword search, believed this meant a relatively small potential dataset existed. This afforded him the ability to perform a linear review of all of the keyword hits, but also meant that precision would be easily harmed by false positives. For that reason White knew that care would be needed in ascertaining true relevance. A normal Step 3, initial Random Baseline sample, was omitted given the likely low prevalence and general time constraints for the work.

Based on the initial judgmental sample reviews in Step Two, White submitted initial sets of documents to TREC to establish relevance boundaries and begin whittling down the set of relevant candidate documents. A minor loss of precision was anticipated on certain documents in exchange for knowledge that would guide subsequent submissions. Each time documents were determined to be relevant, White updated the training and predictive ranking, to facilitate priority-driven review that augmented the judgmental sampling work (see Steps Four, Five and Six: AI Predictive Ranking, Multimodal Search Review & Hybrid Active Training). He also utilized conceptual search (predominantly Find Similar, via LSI) to branch off particularly interesting or novel documents to learn more. Although White, like all of the reviewers, did use concept search and similarity search, he found that the predictive coding rankings (using a more robust technology) proved to be more effective overall. All reviewers had the same experience.

During the initial part of the submission process, White trained on all documents deemed relevant or irrelevant by TREC. This helped create additional separation in the model and rankings. In one instance he left one obvious TREC mistake trained as relevant (a duplicate of another document that had been adjudicated relevant) in order to ensure he would find any others like it. During the predictive analysis and training, White found it was most helpful to review certain sets of documents from the bottom up, to analyze the least-likely candidates in cases where relevance seemed clear. In other sets of documents, where relevance seemed less certain, White reviewed from the top down.

After additional analysis was completed and 99 documents had been submitted to TREC, White predicted there would be 200–250 relevant documents in total. (In the end, he would learn there were 255 total relevant documents in this topic, so the early prediction turned out to be quite close.) White also used random sampling in one instance, to train a set of 100 documents that seemed clearly irrelevant. These documents assisted Mr. EDR in separating irrelevant docs from relevant ones at a point early in the process when only relevant documents had been trained. This was part of the Team’s experimentation with the ideal ratios of irrelevant to relevant documents in training models.

As is almost always the case with an iterative training process, as the training and learning commenced, additional relevant subject areas came to light. While almost all of these areas were somewhat apparent from the start, fascinating and subtle nuances emerged. News stories on the case took little turns and spawned entirely new areas of relevance unto themselves. White thought the biggest challenge with these documents wasn’t as much about whether they existed or how to locate them, but about whether TREC would see them as relevant or not. He found that it helped to track each pocket of relevance as a separate subject area, to utilize keywords for each subject area to create small seed sets, and to then utilize the predictive rankings within each subject area to dive deeper and ensure that each was adequately explored.

White made a total of 56 document submissions to TREC in this topic: 6 submissions between Aug. 6th and 12th, encompassing 184 documents; 22 submissions between Aug. 21st and 27th, encompassing 284 documents; and the remaining 28 submissions on Aug. 28th, encompassing 901,966 documents. In between most of these submissions he conducted iterative Steps Four, Five and Six of the standard workflow, utilizing predictive ranking, search, and training.


After 218 documents had been submitted and additional priority-ranked documents and top keyword sets had been evaluated, White called 70%. There was still a fair quantity of suspected borderline documents in hand, but his intuition was that he had probably surpassed 70% by a fair margin and so needed to call the shot. Actual Recall at this point turned out to be 83.53%. White then closely studied the suspected borderline documents before he decided to submit them. He was attempting to determine the scope of relevance for these subject areas. After locating what he believed to be the full extent of the subject, and having found 23 more relevant documents, he called the 80% shot. White believed he was even farther along than 80%, given the ranked results he was seeing. As it turned out, the actual Recall at this point was 92.55%.

After submitting 8 more documents that he thought might be considered relevant, but were close questions and probably would not be, White called Reasonable. This was with 251 total documents submitted, 236 of them relevant, and only 779 documents reviewed. Actual Recall at this point was still 92.55%. Having called Reasonable and finding nothing new that looked relevant, White turned to his pool of remaining documents that looked irrelevant, to allow the predictive ranking to help him begin submitting them. Indeed, Mr. EDR helped see things he could not, and soon found 18 additional documents that contained an oblique reference to a subject related to the case. While these documents seemed just as oblique as others that were deemed irrelevant, the fact that the predictive rankings caught them quickly was reassuring. After an additional round of training and predictive ranking turned up no additional documents, the submissions continued. Finally, at the 2,000th document submitted, a “relevant” document was discovered that completed the 255-doc set. This document appeared to be a clear mistake, as it was only a reference to an unrelated London, UK murder. After that, all remaining documents submitted were confirmed as irrelevant.

On August 28, 2015, after making 19 submissions to TREC providing a total of 251 documents, White had found a total of 236 relevant documents. The effort, or number of documents reviewed and coded by White to attain this result, was 834 documents. After the 18th TREC submission, White decided to call Reasonable. It was later determined that a Recall of 92.55% had been attained, with Precision of 94.02%. There were 37 additional submissions to TREC after the Reasonable call point. After submitting a total of 462 documents, which is only 0.05% of the total 902,434 documents, and reviewing only 834 documents, a 99.61% Recall level was attained with 54.98% Precision. 100% Recall with 12.75% Precision was attained after submission of 2,000 documents, which is 0.22% of the total.

A graph mapping how the review was conducted appears below, with the light green line signifying the anticipated 70% Recall call, and the dark green line the Reasonable Recall call.


The following chart shows Precision (left and blue line), F1 (red) and percent of documents submitted (green) as tracked across varying recall thresholds. On the Pickton Murders topic, by the time 97.5% Recall had been attained only 0.05% of the corpus, 457 documents, had been submitted for adjudication. The last portion of the graph thus represents the submission of the remaining 99.95% or 901,977 documents.
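The Recall and Precision figures quoted for this topic can be reproduced from a running log of cumulative submissions. The sketch below is illustrative only: the function names are assumptions rather than anything provided by EDR or TREC, the log holds just three points quoted in the text (so it is far coarser than the actual 56 submissions), and the count of 254 relevant documents at 462 submitted is inferred from the reported 99.61% Recall rather than stated directly.

```python
TOTAL_DOCS = 902_434     # News Articles collection
TOTAL_RELEVANT = 255     # relevant documents in Topic 3089

# Cumulative (documents submitted, relevant found) at points quoted in the text.
log = [
    (251, 236),    # Reasonable call
    (462, 254),    # 99.61% Recall; 254 is inferred from that percentage
    (2000, 255),   # 100% Recall
]


def recall_precision(submitted, relevant, total_relevant=TOTAL_RELEVANT):
    """Recall and Precision at a cumulative submission point."""
    return relevant / total_relevant, relevant / submitted


def first_point_reaching(log, target_recall, total_relevant=TOTAL_RELEVANT):
    """Return the first logged point whose Recall meets the target, or None."""
    for submitted, relevant in log:
        if relevant / total_relevant >= target_recall:
            return submitted, relevant
    return None


for submitted, relevant in log:
    r, p = recall_precision(submitted, relevant)
    print(f"{submitted:>5} submitted ({submitted / TOTAL_DOCS:.2%} of corpus): "
          f"Recall {r:.2%}, Precision {p:.2%}")
```

With a complete per-submission log, `first_point_reaching(log, 0.975)` would return the exact point at which the 97.5% threshold was crossed; with this coarse log it can only return the nearest quoted point at or beyond the target.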



______________________________________

Topic 2461 Offshore Host Sites

Confusion Matrix - Topic 2461 Offshore Host Sites
Total Documents: 465,147
Total Relevant: 179
Total Prevalence: 0.04%

                   @Reas. Call   @97.5% Recall
True Positives             175             175
True Negatives         463,225         463,408
False Positives          1,743           1,560
False Negatives              4               4
Recall                  97.77%          97.77%
Precision                9.12%          10.09%
F1 Measure              16.68%          18.29%
Accuracy                99.62%          99.66%
Error                    0.38%           0.34%
Elusion                  0.00%           0.00%
Fallout                  0.37%           0.34%

Topic 2461 was run by Sullivan, who started on August 14, 2015.


He finished his review of the 465,147 Black Hat World forum posts on Aug. 15, 2015 after 5.0 total hours of effort. Sullivan’s background and knowledge in host sites was expected to be helpful in this topic, but in reality it worked against him. While he does not consider himself to be a subject matter expert on this topic, he has a solid level of knowledge of host sites. This proved difficult, because he thought he knew what documents should be considered relevant, but the TREC gold standard disagreed with most of his determinations.

Per his standard process, Sullivan started with concept searching to identify popular keywords to use for highlighting and future searches. This generated a long list of terms relating to different hosting sites and VPNs. Sullivan continued with the next step of finding some documents to seed predictive coding and get an understanding of the TREC line for relevance. He found 8 documents that hit on “offshore host* site*” and contained clearly relevant content by his definition. TREC determined all 8 to be not relevant. He then found 5 documents that relate to specific offshore hosting sites, such as hostingpanama and anonhoster. TREC returned 1 relevant and 4 not relevant. He continued to try different variations of terms relating to hosting in specific countries, and documents with different types of content, and could not find any logic to the TREC relevance standard.

Frustrated, he initiated a learning session and took a break. Upon returning, he decided to try a test submission of 29 top scoring documents that contained the text “offshore” w/2 “host” without looking at any of the documents. To his surprise, 26 of the documents were returned by TREC as relevant. In a review of the documents, he saw no difference between the content of the TREC relevant documents and the documents he found and submitted that were returned as not relevant. The only general correlation he was able to identify is that the TREC standard appeared to favor smaller sized documents with a higher proportion of content dedicated to offshore host sites. A document with a single line discussing offshore host sites was more likely to be relevant than a document with 50 lines and 10 references.

Being unable to determine any reasonable connection between content and relevance, Sullivan had no choice but to continue riding Mr. EDR’s suggestions for documents to submit. This process consisted of many iterations of learning sessions and searching. Similar to how Sullivan reviewed Topics 2052 and 3481, he started with a narrow list of keyword searches and broadened the terms iteratively. For each set, he submitted the documents with the highest predictive coding scores. Starting with “offshore” w/2 “host*,” he moved to “offshore” and “host,” “offshore” and “web,” and “offshore” and “vpn.” Eventually he moved to all documents that contained “offshore” or “hosting.”

The difference between this process and what was used in prior reviews is that Sullivan did not actually look at any of the documents. As he found his judgment to be out of line with the TREC standard, documents were submitted without review. Results of a search would be taken and the top documents would be submitted. If most were determined to be relevant, lower sets of documents from the result would be submitted until a low number of relevant documents were returned. He would then move on to the next search and repeat. After exhausting all of the key terms, Sullivan submitted all remaining documents in descending priority order.
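The submit-without-review loop described for this topic can be sketched as follows. This is only an illustration under stated assumptions: `judge` stands in for the relevance feedback returned by TREC, and `batch_size` and `min_yield` are hypothetical parameters, since the text does not give the batch sizes or stopping threshold Sullivan used. The learning sessions that re-ranked documents between submissions are omitted.

```python
def submit_in_batches(ranked_doc_ids, judge, batch_size=25, min_yield=0.2):
    """Submit ranked hits batch by batch until a batch's relevant yield is low.

    ranked_doc_ids: search hits sorted by descending predictive coding score.
    judge: callable returning True if a submitted document comes back relevant
           (a stand-in for TREC feedback; hypothetical).
    Returns (submitted_ids, relevant_ids).
    """
    submitted, relevant = [], []
    for start in range(0, len(ranked_doc_ids), batch_size):
        batch = ranked_doc_ids[start:start + batch_size]
        hits = [d for d in batch if judge(d)]
        submitted.extend(batch)
        relevant.extend(hits)
        if len(hits) / len(batch) < min_yield:
            break  # low yield: stop and move on to the next, broader search
    return submitted, relevant


def run_searches(searches, judge, **kwargs):
    """Work through narrow-to-broad searches, skipping documents already submitted."""
    seen, found = set(), []
    for ranked in searches:
        fresh = [d for d in ranked if d not in seen]
        submitted, relevant = submit_in_batches(fresh, judge, **kwargs)
        seen.update(submitted)
        found.extend(relevant)
    return seen, found
```

Anything not in `seen` after the last search corresponds to the remainder that was submitted in descending priority order at the end.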



A graph mapping how the review was conducted appears below, with the light green line signifying the anticipated 70% recall call, and the dark green line the reasonable recall call.

The following chart shows Precision (left and blue line), F1 (red) and percent of documents submitted (green) as tracked across varying recall thresholds. On the Offshore Host Sites topic, by the time 97.5% Recall had been attained only 0.37% of the corpus, 1,735 documents, had been submitted for adjudication. The last portion of the graph thus represents the submission of the remaining 99.63% or 463,412 documents.

The last chart below represents the amount of effort in terms of documents reviewed to attain 100% recall using the multi-modal hybrid model of training EDR.

______________________________________


Topic 3290 Rooster Turkey Chicken Nuisance


Topic 3290 was run by Losey alone, who started on August 15, 2015 and concluded on August 23, 2015. The project commenced as usual with Losey beginning Step Two, Multimodal Search Reviews. Step Three, Random Baseline, was omitted. After submissions began, the echo Step Five, multimodal search, began, including predictive coding features, with iterated training.

On August 22, 2015, after making 14 submissions to TREC, and training after almost every submission, Losey had provided a total of 95 documents to TREC and confirmed a total of 23 relevant documents. The effort, or number of documents reviewed and coded by Losey to attain this result, was 306 documents. After the 14th TREC submission, Losey decided to call Reasonable. It was later determined that a Recall of 88.46% was attained by submission of only 95 documents, which is 0.01% of the total 902,434 documents. This was accomplished by review of only 0.03% of the total collection.

There were 23 additional submissions to TREC after the Reasonable call point. In the next submission after the Reasonable call, the 15th, the Recall level rose to 96.15%. Recall of 100% was attained after submission of only 0.15%. A 90% Recall was attained after submitting only 129 documents. A 95% Recall was attained after submitting 1,923 documents, and 97.5% Recall was attained after 3,188 documents. Total Recall was attained after submitting 17,414 documents out of the corpus total of 902,434 (0.15%).

A graph mapping how the review was conducted appears below, with the light green line signifying the anticipated 70% recall call, and the dark green line the reasonable recall call.


The following chart shows Precision (left and blue line), F1 (red) and percent of documents submitted (green) as tracked across varying recall thresholds. On the Rooster Turkey Chicken Nuisance topic, by the time 97.5% Recall had been attained only 1.93% of the corpus, 17,414 documents, had been submitted for adjudication. The last portion of the graph thus represents the submission of the remaining 98.07% or 885,020 documents.



______________________________________

Topic 2333 Article Spinner Spinning

Confusion Matrix - Topic 2333
Total Documents: 465,147
Total Relevant: 4,805
Total Prevalence: 1.03%

                   @Reas. Call   @97.5% Recall
True Positives           4,201           4,685
True Negatives         457,877         450,329
False Positives          2,465          10,013
False Negatives            604             120
Recall                  87.43%          97.50%
Precision               63.02%          31.88%
F1 Measure              73.24%          48.04%
Accuracy                99.34%          97.82%
Error                    0.66%           2.18%
Elusion                  0.13%           0.03%
Fallout                  0.54%           2.18%


Topic 2333 was run by Losey, who also started on August 19, 2015. He finished his review of the 465,147 forum posts in Black Hat World on August 23, 2015. The project commenced as usual with Losey beginning Step Two, Multimodal Search Reviews. Step Three, Random Baseline, was omitted. After submissions began, the echo Step Five, multimodal search, began, including predictive coding features, with iterated training.

On August 21, 2015, after making 23 submissions to TREC, and training after almost every submission, Losey had provided a total of 6,666 documents to TREC and confirmed a total of 4,201 relevant documents. The effort, or number of documents reviewed and coded by Losey to attain this result, was 228 documents. After the 23rd TREC submission, Losey decided to call Reasonable. It was later determined that a Recall of 87.43% was attained by submission of only 6,666 documents, which is 1.43% of the total 465,147 documents. This was accomplished by personal review of only 228 documents, 0.05% of the total collection.

There were 32 additional submissions to TREC after the Reasonable call point. Recall of 90% was attained after submitting 7,091 documents, and 95% Recall after 10,931. Recall of 98% was reached after submitting 14,698 documents, which was only 3.22% of the total collection of 456,147 Black Hat World Forum posts. Again, this was accomplished by personal review of only 228 documents, 0.05% of the total collection. In all topics we always stopped individual document review after the Reasonable call and relied on Mr. Robot’s automatic processes, wherein the documents were submitted in order of highest ranking.

A graph mapping how the review was conducted appears below, with the light green line signifying the anticipated 70% Recall call, and the dark green line the Reasonable Recall call.

The following chart shows Precision (left and blue line), F1 (red) and percent of documents submitted (green) as tracked across varying recall thresholds. On the Article Spinner Spinning topic, by the time 97.5% Recall had been attained only 3.16% of the corpus, 14,698 documents, had been submitted for adjudication. The last portion of the graph thus represents the submission of the remaining 96.84% or 450,449 documents.



______________________________________


Topic 2129 Facebook Accounts

Confusion Matrix - Topic 2129
Total Documents: 465,147
Total Relevant: 589
Total Prevalence: 0.13%

                   @Reas. Call   @97.5% Recall
True Positives             580             575
True Negatives         461,284         462,644
False Positives          3,274           1,914
False Negatives              9              14
Recall                  98.47%          97.62%
Precision               15.05%          23.10%
F1 Measure              26.11%          37.36%
Accuracy                99.29%          99.59%
Error                    0.71%           0.41%
Elusion                  0.00%           0.00%
Fallout                  0.70%           0.41%

Topic 2129 was run by Sullivan, who started on August 21, 2015. He finished his review of the 465,147 forum posts in Black Hat World on August 22, 2015.

While he counts himself among Facebook’s 1.5 billion active users, Sullivan does not consider himself more knowledgeable on this topic than the average person.

Day 1 on this topic started like all Sullivan topics, with concept searching to find keywords relating to Facebook accounts for searching and highlighting. Specifically, variations of Facebook spelling and slang were investigated to ensure all common variants were identified. Many previously unexpected variations of Facebook were identified, such as fbook. All variations were added to the highlighting list and documented for future searches.

Sullivan spent 2.5 hours on Day 1 trying to define relevance according to the TREC standard. He started with 8 documents that contained clear references to Facebook accounts, and only 1 of the documents was returned as relevant according to the TREC standard. He continued by isolating documents that contained “Facebook account*” in the title, as well as a number of common variants. At the end of the day, Sullivan was no closer to cracking the Facebook puzzle and was barely able to exceed 50% precision even though he was only submitting documents that were certain to be relevant by any objective standard.

Facing what appeared to be a dead end, Sullivan started Day 2 by relying on the priority scores generated by Mr. EDR, and started to see much better results. While Sullivan was unable to identify which documents would be returned as responsive by TREC, Mr. EDR seemed to be able to find the pattern. As such, he stopped looking at the documents, and just started submitting all documents that had a high priority score and contained the term Facebook or any known variation, with learning sessions being run periodically to update the scores based on new learning. Once those documents were exhausted, all remaining documents were submitted in descending priority score order. He spent 2.75 hours submitting and evaluating the results, for a total of 5.25 hours spent on this topic.

A graph mapping how the review was conducted appears below, with the light green line signifying the anticipated 70% recall call, and the dark green line the reasonable recall call.

The following chart shows Precision (left and blue line), F1 (red) and percent of documents submitted (green) as tracked across varying recall thresholds. On the Facebook Accounts topic, by the time 97.5% Recall had been attained only 0.54% of the corpus, 2,489 documents, had been submitted for adjudication. The last portion of the graph thus represents the submission of the remaining 99.46% or 462,658 documents.



______________________________________

Topic 3378 Rob McKenna Gubernatorial Candidate

Confusion Matrix - Topic 3378
Total Documents: 902,434
Total Relevant: 66
Total Prevalence: 0.01%

Topic 3378 was run by Reichenberger. The work to search the 902,434 News Articles database started on August 22, 2015, and was completed on August 23, 2015. The initial submissions on the first day were to test the outlines of the relevance scope. It was ascertained in the first two submissions that documents relating to McKenna as a candidate were relevant, and those related to his job as Attorney General were irrelevant. Borderline documents were those associated with his Attorney General job that could be a pretext to a political campaign (e.g. filing a suit related to Obamacare implementation). The third submission was made with the next 65 documents based on prioritization without looking at the content; the results largely confirmed the anticipated parameters (43 relevant, 22 irrelevant, with the borderline documents skewing to the irrelevant). The 70% call was made following the return of results.

After looking at what was being promoted by prioritization and containing “McKenna,” the next 13 documents were submitted. Most of these appeared to be borderline; only 4 were adjudicated relevant by TREC. The 80% recall call was made at that point. One more set of 14 documents was submitted and only 3 came back responsive. The decision was then made to call Reasonable, and thereafter the final submissions were made. The post-call submissions were made by the following groups, in descending priority score order: 1) all documents reviewed that were currently anticipated to be irrelevant, but had not been submitted (129 documents, of which 7 were relevant); 2) anything remaining with “McKenna” (695 documents, all irrelevant); and then 3) all else (all irrelevant).

A graph mapping how the review was conducted appears below, with the light green line signifying the anticipated 70% Recall call, and the dark green line the Reasonable call.

The following chart shows Precision (left and blue line), F1 (red) and percent of documents submitted (green) as tracked across varying Recall thresholds. On the Rob McKenna Gubernatorial Candidate topic, by the time 97.5% Recall had been attained only 0.02% of the corpus, 169 documents, had been submitted for adjudication. The last portion of the graph thus represents the submission of the remaining 99.98% or 902,265 documents.
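The post-call submission order described for this topic, groups first and descending priority score within each group, amounts to a two-level sort. A minimal sketch follows, assuming each remaining document is represented as a dictionary with hypothetical `reviewed`, `has_term` and `score` fields; none of these names come from EDR.

```python
def post_call_order(docs):
    """Order unsubmitted documents by group, then by descending priority score.

    docs: iterable of dicts with keys 'id', 'score', 'reviewed', 'has_term'
          (an assumed layout for illustration only).
    """
    def group(doc):
        if doc["reviewed"]:
            return 0   # 1) reviewed and anticipated irrelevant, not yet submitted
        if doc["has_term"]:
            return 1   # 2) unreviewed documents containing the key term
        return 2       # 3) everything else

    return sorted(docs, key=lambda d: (group(d), -d["score"]))


remaining = [
    {"id": "a", "score": 0.10, "reviewed": False, "has_term": False},
    {"id": "b", "score": 0.90, "reviewed": False, "has_term": True},
    {"id": "c", "score": 0.20, "reviewed": True, "has_term": True},
    {"id": "d", "score": 0.80, "reviewed": True, "has_term": False},
]
print([d["id"] for d in post_call_order(remaining)])  # ['d', 'c', 'b', 'a']
```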



______________________________________


Topic 2322 Web Scraping

Confusion Matrix - Topic 2322 Web Scraping
Total Documents: 456,147
Total Relevant: 10,145
Total Prevalence: 2.22%

                   @Reas. Call   @97.5% Recall
True Positives           8,060           9,892
True Negatives         441,263         436,073
False Positives          4,739           9,929
False Negatives          2,085             253
Recall                  79.45%          97.51%
Precision               62.97%          49.91%
F1 Measure              70.26%          66.02%
Accuracy                98.50%          97.77%
Error                    1.50%           2.23%
Elusion                  0.47%           0.06%
Fallout                  1.06%           2.23%

Topic 2322 was run by Losey, who also started on August 22, 2015. He finished his review of the 465,147 forum posts in Black Hat World on August 25, 2015. The project commenced as usual with Losey beginning Step Two, Multimodal Search Reviews. Step Three, Random Baseline, was omitted. After submissions began, the echo Step Five, multimodal search, began, including predictive coding features, with iterated training.

On August 25, 2015, after making 24 submissions to TREC, and training after almost every submission, Losey had provided a total of 12,799 documents to TREC and confirmed a total of 8,060 relevant documents. The effort, or number of documents reviewed and coded by Losey to attain this result, was 195 documents. After the 24th TREC submission, Losey decided to call Reasonable. It was later determined that a Recall of 79.45% was attained by submission of only 12,799 documents, which is 2.8% of the total documents. This was accomplished by review of only 0.04% of the total collection.

There were 21 additional submissions to TREC after the Reasonable call point. In the next submission after the Reasonable call, the 25th, 1,000 documents were submitted and they all came back relevant. Obviously an error in gamesmanship had been made and the call was made a little too early. After that 25th submission, the Recall level rose to 89.31% and the Precision increased to 65.66%. A 90% Recall was attained after submitting 14,477 documents. A 95% Recall was attained after submitting 16,983 documents, and 97.5% Recall was attained after 19,821 documents were submitted, which was only 4.35% of the total collection of 456,147 Black Hat World Forum posts.

A graph mapping how the review was conducted appears below, with the light green line signifying the anticipated 70% Recall call, and the dark green line the Reasonable Recall call.


The following chart shows Precision (left and blue line), F1 (red) and percent of documents submitted (green) as tracked across varying recall thresholds. On the Web Scraping topic, by the time 97.5% Recall had been attained only 4.35% of the corpus, 19,821 documents, had been submitted for adjudication. The last portion of the graph thus represents the submission of the remaining 95.65% or 436,326 documents.



______________________________________

Topic 3484 Paul and Cathy Lee Martin



This Topic was run by Sullivan, who started on August 24, 2015. He completed his review of 902,434 documents on August 25, 2015. The entire Team observed his final submissions and cheered on his perfect handling of this search project.

This topic was completely unknown to Sullivan prior to this exercise. His only knowledge came from a quick Google search on the topic. Sullivan started late on Day 1 and began with a simple search using the following keywords: ((martin w/3 paul) AND cathy) OR ((martin w/3 cathy) AND paul). This search returned 26 documents. A quick review of the documents yielded 22 clearly relevant documents and 1 marginally relevant. Sullivan submitted the 22 relevant documents, which were all returned as relevant by TREC, and quit for the night after 15 minutes of effort.

On Day 2, Sullivan went back to his standard process of using concept searching to find relevant keywords for highlighting and searches. As with all topics in dataset 3, spelling errors were non-existent, which removed the requirement of broad searching to account for slang or spelling issues. Broad searches were run using all relevant keywords and the results were sampled. Next, predictive coding scores were used to identify additional potentially relevant documents. A large number of false positives were encountered when it was discovered that a popular hockey player and a Prime Minister shared the same names as the parties. These were quickly identified and excluded from the potentially relevant set. After 90 minutes of work, Sullivan conceded that he was unable to find any additional relevant documents. In reviewing the single marginally relevant document found on Day 1, it was determined this document was very likely to be relevant, so it was submitted to TREC and was in fact returned relevant. At this point, Sullivan called reasonable recall and submitted all remaining documents in descending order of priority score.

After all documents were submitted, it was discovered that Sullivan in fact had attained 100% recall and 100% precision at the point the reasonable call was made. Additionally, 95.7% recall was attained, with 100% precision, after only 15 minutes. In all, he was able to achieve a perfect game with only 1.75 hours committed to this topic!

A graph mapping how the review was conducted appears below, with the light green line signifying the anticipated 70% recall call, and the dark green line the reasonable recall call.


The following chart shows Precision (left and blue line), F1 (red) and percent of documents submitted (green) as tracked across varying recall thresholds. On the Paul and Cathy Lee Martin topic, by the time 97.5% Recall had been attained only 0.00% of the corpus, 23 documents, had been submitted for adjudication. The last portion of the graph thus represents the submission of the remaining 100.00% or 902,411 documents.
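The seed query used on Day 1 of this topic combines proximity and Boolean operators. The sketch below shows one way such a query could be evaluated over plain text. It is an assumption-laden illustration, not EDR's implementation: `w/3` is read here as "within three words of, in either order", tokenization is a simple regular expression, and the sample sentences are made up.

```python
import re


def tokens(text):
    """Split text into lowercase word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def within(words, a, b, n):
    """True if terms a and b occur within n words of each other, in either order."""
    pos_a = [i for i, w in enumerate(words) if w == a]
    pos_b = [i for i, w in enumerate(words) if w == b]
    return any(abs(i - j) <= n for i in pos_a for j in pos_b)


def matches(text):
    """Evaluate ((martin w/3 paul) AND cathy) OR ((martin w/3 cathy) AND paul)."""
    words = tokens(text)
    present = set(words)
    return ((within(words, "martin", "paul", 3) and "cathy" in present)
            or (within(words, "martin", "cathy", 3) and "paul" in present))


# Made-up sentences for illustration only.
print(matches("Paul Martin and his wife Cathy Lee Martin were charged."))  # True
print(matches("Former Prime Minister Paul Martin spoke on Tuesday."))      # False
```

The second sentence shows why the extra AND term matters: a document about a namesake satisfies the proximity clause but fails the query because the other party's name is absent.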



______________________________________

Topic 2134 Paypal Accounts

Confusion Matrix - Topic 2134
Total Documents: 465,147
Total Relevant: 252
Total Prevalence: 0.05%

                   @Reas. Call   @97.5% Recall
True Positives             241             246
True Negatives         461,447         443,136
False Positives          3,448          21,759
False Negatives             11               6
Recall                  95.63%          97.62%
Precision                6.53%           1.12%
F1 Measure              12.23%           2.21%
Accuracy                99.26%          95.32%
Error                    0.74%           4.68%
Elusion                  0.00%           0.00%
Fallout                  0.74%           4.68%

Topic 2134 was run by Sullivan, who started on August 26, 2015. He finished his review of the 465,147 forum posts in Black Hat World on August 26, 2015.


As a regular PayPal user for about 10 years, Sullivan has a high level of knowledge regarding this topic. This advanced knowledge proved to be a burden on this topic because his understanding of what should be relevant did not match the TREC gold standard. He was able to overcome this burden by relying on a variety of advanced methods rather than using his own judgment in review of the documents.

Sullivan started this topic with his usual process of running concept searches to find similar and related keyword terms for highlighting and future searching. As with all forum topics, he spent some time identifying common variants based on misspelling or slang. All variations were added to the database for highlighting.

While using a number of methods to identify documents he felt were clearly relevant, Sullivan quickly realized he was unable to make any sense of the TREC relevance standard. Documents with similar or identical content were seemingly arbitrarily designated as relevant or not relevant. Rather than spend considerable time evaluating the documents himself, as was done in Topic 2129 Facebook Accounts, he went straight to Mr. EDR for help.

Similar to the method developed in Topic 2129, Sullivan relied heavily on the predictive coding and did very little review of any documents. He would iteratively submit the highest scoring documents to TREC for analysis, and train the documents with the relevancy determination returned. In addition to using a continuous active learning approach, he started using the “Find Similar” feature much more to find documents that contained similar characteristics to documents already determined to be relevant. He started with documents that contained a variation of PayPal in the subject line, then moved to documents that contained the term anywhere in the text. Using this multimodal method he was able to work his way through the entire dataset with almost no actual review of the documents. In all, Sullivan was able to complete the review for this topic in less than 4 hours.


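A graph mapping how the review was conducted appears below, with the light green line signifying the anticipated 70% recall call, and the dark green line the reasonable recall call.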


The following chart shows Precision (left and blue line), F1 (red) and percent of documents submitted (green) as tracked across varying recall thresholds. On the Paypal Accounts topic, by the time 97.5% Recall had been attained only 4.73% of the corpus, 22,005 documents, had been submitted for adjudication. The last portion of the graph thus represents the submission of the remaining 95.27% or 443,142 documents.


______________________________________


Topic 3423 Rob Ford Cut the Waist


Topic 3423 was run by Losey, who also started on August 26, 2015. He finished his review of 902,434 News Articles on August 27, 2015. The project commenced as usual with Losey beginning Step Two, Multimodal Search Reviews. Step Three, Random Baseline, was omitted. After submissions began, the echo Step Five, multimodal search, began, including predictive coding features, with iterated training.

On August 26, 2015, after making 11 submissions to TREC, and training after almost every submission, Losey had provided a total of 40 documents to TREC and confirmed a total of 34 relevant documents. The effort, or number of documents reviewed and coded by Losey to attain this result, was 92 documents. After the 11th TREC submission, Losey decided to call Reasonable. This proved to be a premature call. It was later determined that a Recall of 44.74% was attained.

In the 17 automatic submissions that followed, Recall of 76.32% was attained with 84.06% Precision. The 76.32% Recall was attained after submitting only 106 documents, which is 0.01% of the total of 902,434. There were 17 submissions to TREC after the Reasonable call point. Total 100% Recall was attained after submitting only 35,193 documents, which is 3.9% of the total.

A graph mapping how the review was conducted appears below, with the light green line signifying the anticipated 70% Recall call, and the dark green line the Reasonable Recall call.




The following chart shows Precision (left and blue line), F1 (red) and percent of documents submitted (green) as tracked across varying recall thresholds. On the Rob Ford Cut the Waist topic, by the time 97.5% Recall had been attained only 3.89% of the corpus, 35,096 documents, had been submitted for adjudication. The last portion of the graph thus represents the submission of the remaining 96.11% or 867,338 documents.



______________________________________

Topic 3133 Pacific Gateway

Confusion Matrix - Topic 3133
Total Documents: 902,434 Total Relevant: 113 Total Prevalence: 0.01%

Topic 3133 was run by Losey, who also started on August 27, 2015. He finished his review of 902,434 News Articles on August 28, 2015. The project commenced as usual with Losey beginning Step Two, Multimodal Search Reviews. Step Three, Random Baseline, was omitted. After submissions began, the echo Step Five, multimodal search began, including predictive coding features, with iterated training.

On August 28, 2015, after making 7 submissions to TREC, and training after almost every submission, Losey had provided a total of 97 documents to TREC and confirmed a total of 87 relevant documents. The effort, or number of documents individually reviewed and coded by Losey to attain this result, was 49 documents. After the 7th TREC submission, Losey decided to call Reasonable. That call proved to be a little premature. It was later determined that a Recall of 76.99% was attained with Precision of 89.69%. In the 6th automatic submission after the call, a Recall of 94.69% was attained after submitting only 693 documents total, which is 0.07% of the total of 902,434. There were 24 submissions to TREC after the Reasonable call point. Total 100% Recall was attained after submitting 103,189 documents, which is 11.43% of the total.

A graph mapping how the review was conducted appears below, with the light green line signifying the anticipated 70% recall call, and the dark green line the reasonable recall call.
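The Recall and Precision reported at the Reasonable call follow directly from the counts given for this topic: 97 documents submitted, 87 confirmed relevant, and 113 relevant documents in total. A small Python sketch of that arithmetic is shown here; the helper name `call_summary` and its field names are illustrative assumptions, not part of EDR or the TREC infrastructure.

```python
CORPUS_SIZE = 902_434     # News Articles collection, from the text
TOTAL_RELEVANT = 113      # Topic 3133, from the confusion matrix header


def call_summary(submitted, relevant_found,
                 total_relevant=TOTAL_RELEVANT, corpus_size=CORPUS_SIZE):
    """Return recall, precision and share of the corpus submitted (fractions)."""
    return {
        "recall": relevant_found / total_relevant,
        "precision": relevant_found / submitted,
        "corpus_share": submitted / corpus_size,
    }


at_call = call_summary(submitted=97, relevant_found=87)
print(f"Recall {at_call['recall']:.2%}, Precision {at_call['precision']:.2%}")
# prints "Recall 76.99%, Precision 89.69%"

# Share of the corpus submitted by the time 100% Recall was reached.
full_recall_share = 103_189 / CORPUS_SIZE
print(f"{full_recall_share:.2%}")
# prints "11.43%"
```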

The following chart shows Precision (left and blue line), F1 (red) and percent of documents submitted (green) as tracked across varying recall thresholds. On the Pacific Gateway topic, by the time 97.5% Recall had been attained only 11.35% of the corpus, 102,446 documents, had been submitted for adjudication. The last portion of the graph thus represents the submission of the remaining 88.65% or 799,988 documents.



______________________________________


Topic 3226 Traffic Enforcement Cameras

Confusion Matrix - Topic 3226
Total Documents: 902,434 Total Relevant: 2,094 Total Prevalence: 0.23%

Topic 3226 was run by Sullivan, who also started on August 27, 2015. He finished his review of 902,434 News Articles on August 28, 2015. Sullivan has some prior experience as a criminal defense attorney, including experience with traffic laws, but he has no prior experience with traffic enforcement cameras, which were not in use at the time he was practicing. As usual, Sullivan started his investigation with his standard process of using keyword and concept searches to formulate a list of related keywords for highlighting and future searching. For this exercise, nothing extraordinary was discovered, but he was able to generate a good list of terms relating to traffic cameras, red light cameras, and traffic tickets.

Day 1 was a short day and started with submitting the results of the most popular keyword searches with minimal review. After 30 minutes of work, 76 documents were submitted with 50 being returned as relevant.

Using the documents identified on Day 1, Sullivan was able to start utilizing the predictive coding to supplement his searches on Day 2. He was able to progressively make his way through the review set using a combination of predictive coding scores and keyword hits. He used this multimodal approach to submit large sets of documents with minimal, if any, manual review. He believed he had found all relevant documents after submitting only 5,347 total documents with 2,061 relevant. After submitting all of the remaining documents in descending order by predictive coding priority score, it was discovered he only missed 33 of the relevant documents in the data set after submitting 0.6% of the documents! Because he minimized the amount of manual review on this topic, he was able to complete this topic after 3.0 hours on Day 2, for a total of 3.5 hours on this topic.

                   @Reas. Call    @97.5% Recall
True Positives           2,061            2,042
True Negatives         897,054          899,807
False Positives          3,286              533
False Negatives             33               52
Recall                  98.42%           97.52%
Precision               38.54%           79.30%
F1 Measure              55.39%           87.47%
Accuracy                99.63%           99.94%
Error                    0.37%            0.06%
Elusion                  0.00%            0.01%
Fallout                  0.36%            0.06%
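The percentage rows of the confusion matrix above can be recomputed from the four counts. The sketch below does so in Python under the usual definitions, which are assumed rather than taken from the paper: Elusion as false negatives over all documents not submitted, and Fallout as false positives over all irrelevant documents. The function name `confusion_metrics` is hypothetical.

```python
def confusion_metrics(tp, tn, fp, fn):
    """Derive the summary rates from raw confusion-matrix counts (as fractions)."""
    total = tp + tn + fp + fn
    return {
        "recall": tp / (tp + fn),
        "precision": tp / (tp + fp),
        "f1": 2 * tp / (2 * tp + fp + fn),
        "accuracy": (tp + tn) / total,
        "error": (fp + fn) / total,
        "elusion": fn / (fn + tn),    # relevant share of the unsubmitted set
        "fallout": fp / (fp + tn),    # irrelevant documents wrongly submitted
    }


# Counts for Topic 3226, copied from the two columns of the table.
at_reasonable_call = confusion_metrics(tp=2_061, tn=897_054, fp=3_286, fn=33)
at_975_recall = confusion_metrics(tp=2_042, tn=899_807, fp=533, fn=52)

for name, metrics in [("@Reas. Call", at_reasonable_call),
                      ("@97.5% Recall", at_975_recall)]:
    print(name, {key: f"{value:.2%}" for key, value in metrics.items()})
```

Computed this way, the values agree with the table to within about 0.01 percentage point; a difference in the last digit can arise from how intermediate values were rounded.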


A graph mapping how the review was conducted appears below, with the light green line signifying the anticipated 70% recall call, and the dark green line the reasonable recall call.

The following chart shows Precision (left and blue line), F1 (red) and percent of documents submitted (green) as tracked across varying recall thresholds. On the Traffic Enforcement Cameras topic, by the time 97.5% Recall had been attained only 0.29% of the corpus, 2,575 documents, had been submitted for adjudication. The last portion of the graph thus represents the submission of the remaining 99.71% or 899,859 documents.

