Software Bugs in Common E-Discovery Search Tools · 2017-01-25 · Search tools have to regularly...

© Copyright Kivu Consulting, Inc., 2010. All rights reserved.

Software Bugs in Common E-Discovery Search Tools

Building awareness of issues in e-discovery technology

Winston Krone, Esq.

Megan Bell

October 2010

KIVU CONSULTING, Inc. 220 Montgomery Street Suite 1950 San Francisco, CA 94104 Tel: (415) 524-7320 Fax: (415) 524-7325 www.kivuconsulting.com PI License #26798

1. Introduction

Here’s a dirty secret. Most e-discovery searches in the last four years have been missing responsive files because of known bugs in common search software. If your case involved Microsoft Office 2007 files, it’s a near certainty. And yet the issue is never raised up front, or it’s buried in voluminous “exception logs.” There are three causes:

1. Common Microsoft Office files (even Word files) have become more complex in structure with each version of Office. They’re now akin to mini-web pages, with compound layers of underlying data. Until the last few months, most of the programs powering the common e-discovery search engines were unable to drill down through all these layers. If your term (say, “Highly Confidential”) was in the footer of each page, your search may well have missed it.

2. Even when the software works, many e-discovery search engines deliberately “dumb down” their searches by default. Imagine your search term is “Highly Confidential” in bold or red italics. You rely on your search tool to find different variations of the search term because the manual or the salesperson said it would? Big mistake.

3. As e-discovery has been commoditized and costs driven down, genuine quality control and checking against the specific data being searched has become an expensive option. It’s a rare e-discovery vendor who actually understands how their search tools work or understands the complexities of the data being searched.

2. The Reliance on Search Engines

Computer investigators and e-discovery professionals depend on reliable search results from the numerous software tools on the market. These tools offer multiple types of processing, including searching, parsing and de-duplication. They range from off-the-shelf products to so-called “proprietary” tools developed by individual e-discovery providers. However, whatever the outer packaging, these tools commonly incorporate one of a limited number of software engines to actually run their searches, the most common being the dtSearch Text Retrieval Engine.

Assuming search parameters are well-defined, the expectation is that search engines will locate all instances of a keyword in a defined search and not miss content due to errors in software code. Consistent reliability assumes the developer has worked out any errors in its supported file formats and that the latest version of a program is (relatively) bug free. End-users must stay current with their tools, but even when they do, reliability ultimately depends on the tools themselves.

The same level of reliability is often mistakenly expected of search tools as of a scientific test performed in a lab. For example, if a sample is weighed on a mass balance in a lab, the reported mass would be expected to be accurate within several thousandths of a gram. The expectation of a high degree of accuracy leads to reliance. The same object should have the same mass reported every time.

Unfortunately, this assumption does not hold true for search tools. Mass is a well-defined concept, and the measurement of mass does not change. One should not think of search tools in the same light as a mass scale. For example, data file formats for commonly used applications (think Microsoft Office Suite) can vary greatly across different software releases. This means that data files for the same software application can be stored differently and have a different internal structure for each software release. Every time there is a change, search tools need to account for this difference in their design.

Search tools can also have flaws or bugs in their design. Even when a search tool’s functionality has been heavily vetted, there is a possibility that errors still exist in the underlying code. Search tools have to regularly adapt to changing technology and actively correct errors as they are detected.

3. Search-Tool Failure During an Investigation

dtSearch Software Defect Misses Footer Search Terms

The dtSearch Text Retrieval Engine [1] is widely used search software that also provides the engine in many e-discovery products. However, the software suffered from a defect that was not widely known but had serious consequences for searches. Specifically, dtSearch was missing search terms in Office 2007 documents (a detailed analysis is set out below). This problem was present for more than two years after Office 2007’s release on November 30, 2006. [2]

[1] dtSearch is a platform of text search and retrieval tools such as dtSearch Text Retrieval Engine. This suite of tools is developed by dtSearch Corp. More information can be found at www.dtsearch.com.
[2] Source: http://www.microsoft.com/presspass/press/2006/nov06/11-062007officertmpr.mspx Kivu Consulting identified and verified this issue in early 2010 and reported it to dtSearch Corp. Kivu has been involved in subsequent proofing of the beta version designed to fix the problem.

Discovering dtSearch’s Defect

Kivu Consulting first came across this anomaly in a theft of trade secrets case. This involved the common scenario of senior employees going to a rival company and proceeding to share classified documents from their former employer. Kivu Consulting was brought in to conduct a forensic analysis of digital evidence. As litigation was filed, the case quickly revolved around the e-discovery issues of searching and producing relevant files.

Attorneys representing both parties provided keyword lists for searching. Kivu Consulting used a number of forensic and e-discovery tools, including dtSearch and other tools based upon dtSearch, to complete the keyword search evaluation on numerous hard drive images and network data. Initially, dtSearch appeared to work without issue. For each forensic image, it mounted the image and used the pre-defined keyword list to create an index. The index yielded a set of electronic files for legal review.

However, as part of our quality control procedure, and in conjunction with reviewing counsel, Kivu Consulting determined that certain highly important documents (identified through witness testimony) were not being hit by the search procedure. This was despite the obvious presence in those documents of terms specifically in the search list. Disturbingly, the missed keywords included “confidential” and “secret.” Furthermore, these words were manually typed and not automatic inserts such as auto-text in the footer of a Microsoft Word document. They were not buried in the file metadata. Instead, they were clearly visible to someone who manually reviewed the documents.

Our analysis determined that the error resulted from a combination of Microsoft changing its document storage technology in Office 2007 and dtSearch version 7.62 having a defect in dissecting this new file format (even though dtSearch claimed support for Office 2007 documents in release notes). [3]

Microsoft’s Changing File Formats

In the 1990s, Microsoft began the transition to storing digital files as XML documents. XML, or Extensible Markup Language, is a method of digitally formatting text so that it can be stored, retrieved, manipulated and displayed across different applications or operating systems. Microsoft’s first implementation of XML was in the Office XP version of Excel. [4] Microsoft then rolled out Microsoft Office XML formats in its 2003 Microsoft Office release. Each Microsoft Office document had a single XML file associated with it.

Microsoft expanded its use of XML technology by standardizing the Office Open XML file format and then switching to this format in the 2007 release of Microsoft Office. The change was from the Microsoft Office XML formats in Office 2003 to the Office Open XML Formats in Office 2007. To those not familiar with Microsoft’s technology, this was a substantial change. Microsoft no longer stored a single electronic file in a binary format.

[3] dtSearch reported file parsing fixes for release 7.63 in release notes.
[4] Information about Microsoft’s implementation of XML can be found at msdn.microsoft.com.

Figure: The file format container in the 2007 release [5]. Source: msdn.microsoft.com/en-us/library/aa338205.aspx#office2007aboutnewfileformat_structureoftheofficexmlformats

Coupled with the file format change to XML, Microsoft also changed file storage technology in Microsoft Office 2007. Microsoft switched to ZIP archiving technology to store electronic files. The ZIP compression format was chosen because of its broad industry acceptance. The benefits of this file format include application independence, better file recovery, reduced file size, easier application development and better control over print quality. Each Microsoft Office 2007 document is digitally stored as a set of smaller related files in a compressed ZIP file, also known as a container. Elements within the same document (e.g. a paragraph of text, a picture or a table) are stored separately following the conventions set forth in the Office Open XML Formats.

Kivu investigators spent substantial time in discussion with dtSearch tech support to evaluate and identify the cause behind dtSearch not reading Microsoft Office 2007 documents. To verify the issue, Kivu investigators set up an experiment to evaluate and replicate locating key search terms in responsive documents in dtSearch. Given that end users are often the source of supposed problems with software, reproducibility is critical in confirming software errors.

dtSearch Missing Responsive Search Terms

dtSearch breaks a file into component parts (e.g., text and formatting) in order to identify words to index. For compound files such as Office 2007 documents, dtSearch first mounts a file in order to access file contents. [6] Once a file is accessible, dtSearch uses a process called file parsing to separate words from the remainder of a file’s contents. [7] Overall, dtSearch properly parses and then indexes words in most parts of Office 2007 documents. One exception is the document footer.

[5] Illustration of Microsoft container technology from msdn.microsoft.com.
[6] dtSearch must mount (or open) a compound file in order to read the contents inside the file.
[7] dtSearch reads a file by parsing, or breaking down, XML into component parts (i.e., formatting or text).

After extensive analysis, Kivu Consulting determined that version 7.62 of dtSearch was not indexing the footer of Microsoft Office 2007 files (including spreadsheets and documents). Specifically, dtSearch was not correctly parsing words in the footer and therefore did not include these words in indexing. Having failed to index a file properly, dtSearch (and e-discovery tools based on the same version of dtSearch) was not finding responsive search terms existing in the footers.

During tests, we noticed that computer forensic tools such as Guidance Software’s EnCase (version 6.13) did recognize and find the search terms in the footers if the Office 2007 files were properly mounted. However, such forensic tools are rarely used for large-scale searching in e-discovery.

Kivu Consulting brought this to the attention of dtSearch Corporation in early 2010. Developers there informed us they were aware of the problem and provided us with a beta version 7.63 which, from our tests, had resolved the problems with indexing the footers. However, this still leaves open the possibility that many e-discovery tools used between late 2006 and early 2010 may have failed to search Office 2007 files and missed responsive terms that would have been plainly visible to a human viewer.

4. Software Updates – Sometimes the Fix Breaks Something Else

While the above problem of missing words in Office 2007 document footers was eventually resolved, a new problem arose in version 7.64 of dtSearch. Instead of missing entire words as in version 7.62, the tool now identified words, but some were incorrectly parsed so that they included the surrounding HTML formatting tags.

What does this mean in plain English? Imagine the term “Confidential” in the footer of an Office 2007 document. The word is formatted in some way (e.g., italicized Arial 12-point font or colored red). As discussed earlier, Office 2007 documents are in fact compound files containing XML code; that is, formatted words that a normal user sees in the document are actually surrounded by HTML-style code behind the scenes. In effect, while the user of such a document would see:

Confidential

The actual term as stored in the document would be:

<p class=MsoNormal align=center style='text-align:center'><i style='mso-bidi-font-style:normal'><span style='font-family:Arial'>Confidential<o:p></o:p></span></i></p>

<p class=MsoNormal align=center style='text-align:center'><i style='mso-bidi-font-style:normal'><span style='font-size:8.0pt'><span style='mso-spacerun:yes'>&nbsp;</span><o:p></o:p></span></i></p>

When dtSearch mounts and reads an Office 2007 document, it must correctly separate “Confidential” from the surrounding formatting instructions, which wrap the word in nested markup. This is where dtSearch version 7.64 failed, producing footer content similar to:

Arial>Confidential

Searching for “Confidential” alone returned no hits, even though dtSearch was now searching the Office 2007 footer. To locate the mis-parsed term, a wildcard had to be added to the search term:

*Confidential*
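
The effect can be reproduced with a toy index. The Python sketch below is not dtSearch code: the `build_index` and `search` helpers and the sample tokens are hypothetical, and serve only to illustrate why an exact-term lookup misses a token that still carries tag residue while a wildcard pattern matches it.

```python
from fnmatch import fnmatchcase


def build_index(tokens):
    """Map each lower-cased token to the positions where it occurs."""
    index = {}
    for pos, tok in enumerate(tokens):
        index.setdefault(tok.lower(), []).append(pos)
    return index


def search(index, term):
    """Exact lookup, unless the term contains '*', in which case every
    indexed token is glob-matched against the pattern."""
    term = term.lower()
    if "*" in term:
        return sorted(t for t in index if fnmatchcase(t, term))
    return [term] if term in index else []


# A correctly parsed footer versus one where tag residue stayed glued to the word.
good_index = build_index(["Confidential"])
bad_index = build_index(["Arial>Confidential"])

print(search(good_index, "Confidential"))   # ['confidential']
print(search(bad_index, "Confidential"))    # []
print(search(bad_index, "*Confidential*"))  # ['arial>confidential']
```

A practical consequence is that a reviewer who only runs the exact term has no indication that anything was missed. The wildcard is a workaround for one known symptom, not a general remedy.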

dtSearch posts file parsing fixes in most version releases. [8] File parsing to identify words for indexing is not an error-free process, and there is a real risk of missing responsive documents, as illustrated in the above examples.

[8] dtSearch posts release notes at www.dtsearch.com/releasenotes763.html. Review release note sections labeled.

5. Forensic Software: Accuracy Not Guaranteed

Most forensic investigators are well-versed in hardware and software technology. However, identifying a software error can be a daunting task, even for an experienced software developer or quality assurance engineer. dtSearch’s error required Kivu investigators to function as developers and test dtSearch for issues. They relied on their own knowledge of Microsoft’s changes to its file format technology and of dtSearch functionality to evaluate the possible causes of missing content. Lack of in-depth knowledge is a substantial risk when unsupervised, inexperienced forensic investigators or other e-discovery professionals are used on cases.

As a company like Microsoft changes its file format technology, forensic investigators rely on tools of their trade like dtSearch (and tools built upon dtSearch) to keep pace and incorporate the technology changes. The expectation in a keyword search is that dtSearch will evaluate either a Microsoft Office XP document or an Office 2007 document regardless of the difference in underlying file formats, unless stated otherwise in software literature. However, it is still user-beware in practice.

6. Do You Know What You Don’t Know?

As the net cast to evaluate more electronic devices grows wider and computer technology becomes more complex, the need for efficient and accurate forensic investigations does not change. This dynamic has created a fractured environment. Forensic investigators need to keep abreast of changes to data formats, while being aware that forensic software tools are occasionally error-prone and may have an overly simplified design that results in missed responsive ESI (electronically stored information). This deadly combination lowers the accuracy of detecting responsive ESI and increases the cost and time of document review. In the worst case scenario, responsive ESI is simply missed during initial automated searches and is never caught by manual attorney review.

User Errors Can Equal Undetected ESI

When Microsoft incorporated ZIP container technology in Office 2007, many forensic software tools experienced errors reading the Office 2007 files. As forensic investigators have encountered more Office 2007 files in investigations, an understanding of the shortcomings of forensic tools has (slowly) spread. As illustrated below, Office 2007 files could be read and analyzed in tools like EnCase [9], but additional steps had to be taken to ensure the tools properly see the files.

EnCase: Example of User Error with ZIP Files

To read a compound Office 2007 document, forensic search tools must “mount” the document. When forensic tools mount a file, each “part” is listed as a separate file: a master file is broken down into its base components. If not properly mounted, files are not visible and search tools are unable to evaluate content and metadata.

Prior to Office 2007, forensic investigators usually only saw the need to “mount” well-known compound or compressed files such as ZIP files. Mounting large numbers of files in a forensic image could add considerable time to a search and analysis. However, Office 2007 has forced a change in procedure for those investigators who understand the data types.

The following test illustrates this scenario. We created a Word 2003 document with the sentence, “The brown fox has jumped over the fence,” in the body of the document. We then created a Word 2007 document with the same content. Both documents were loaded into EnCase to illustrate the difference between the two Word versions and how Word 2007 content can be missed if not properly mounted. The EnCase search results are discussed below.

In the first EnCase screenshot, results are shown for a search on the phrase “brown fox” in a Word 2003 document. This phrase was found in the sentence, “The brown fox has jumped over the fence.” EnCase found the responsive phrase by searching a single file that is not stored in a ZIP container.

[9] EnCase is a widely used forensic imaging and analysis tool made by Guidance Software, Inc. All references to EnCase in this paper refer to version 6.13.

This second EnCase screenshot portrays a Word 2007 document that is not correctly mounted. This was done by using the wrong EnCase script, one that does not mount a ZIP file. In this illustration, the term “brown fox” is not found because EnCase is not recognizing the individual files. All that’s visible is code and directory references. Consider the situation where someone not knowledgeable about EnCase interprets these results to mean that “brown fox” was not found. With that conclusion, the evidence would be missed. To find “brown fox,” proper file mounting is required.

The final EnCase screenshot displays a properly mounted Word 2007 document with a match for “brown fox.” In the top right frame, the set of compound files is visible, and the sentence, “The brown fox has jumped over the fence,” is visible. Unlike the second EnCase illustration, the responsive term match is not lost in tags and illegible code.

User error can be avoided with knowledge about correct forensic tool usage and by consistently following appropriate procedures. Software flaws and changes like Microsoft Office’s new file format do not alleviate the burden to have deep and broad knowledge of how to retrieve, search and analyze ESI for responsive data. However, the conflict between improperly designed forensic tools and the lack of experienced forensic investigators and e-discovery professionals further magnifies the need for sound methodologies for analyzing ESI.
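
The difference between searching a mounted and an unmounted container can be approximated outside of EnCase. The following Python sketch is an illustration under stated assumptions, not a reproduction of the EnCase test: `make_docx_like`, `search_raw` and `search_mounted` are hypothetical helpers, and the archive it builds only imitates the typical part names of a Word 2007 file (it is not a valid .docx). It uses the standard `zipfile` module to show that a byte-level search of the compressed container misses text that a per-part search finds.

```python
import io
import zipfile

SENTENCE = "The brown fox has jumped over the fence"


def make_docx_like(body_text, footer_text):
    """Build a minimal ZIP container with Word-2007-style part names.
    This is an imitation for testing, not a valid .docx file."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(
            "word/document.xml",
            f"<w:document><w:body><w:p><w:r><w:t>{body_text}</w:t></w:r></w:p></w:body></w:document>",
        )
        zf.writestr(
            "word/footer1.xml",
            f"<w:ftr><w:p><w:r><w:t>{footer_text}</w:t></w:r></w:p></w:ftr>",
        )
    return buf.getvalue()


def search_raw(data, phrase):
    """Search the container bytes as stored, without mounting."""
    return phrase.encode("utf-8") in data


def search_mounted(data, phrase):
    """Open the container and search each part; return the parts that hit."""
    needle = phrase.encode("utf-8")
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return [name for name in zf.namelist() if needle in zf.read(name)]


data = make_docx_like(SENTENCE, "Confidential")

print(search_raw(data, "brown fox"))         # False: the text is compressed
print(b"word/document.xml" in data)          # True: part names are stored in the clear
print(search_mounted(data, "brown fox"))     # ['word/document.xml']
print(search_mounted(data, "Confidential"))  # ['word/footer1.xml']
```

The sketch also shows why a tool can read the document body and still miss the footer: the two live in separate parts of the container, so each part has to be opened and parsed.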

7. Simplified Software Design Can Lead to Misuse and Missed Responsive Data

Companies that develop forensic tools offer product features such as simplified searching and improved workflow with an emphasis on faster processing and turnaround time. However, these default product features can easily be misused by untrained individuals.

The default settings for several commonly used e-discovery tools are strongly “dumbed down.” One of several examples is Wave Software’s Trident Pro [10], which is marketed as “an advanced native email solution written with a beginner in mind.” Although Trident is a powerful de-duplication tool widely used in forensic investigations, there is substantial risk that a novice investigator will use the keyword and date range filtering capability incorrectly. In particular, the novice might use Trident (or a similar e-discovery product) as the only tool for culling and analysis without understanding how to adjust key settings or comparing results against output from other e-discovery tools (e.g. dtSearch in its “unadulterated” format, dtSearch Desktop). The difference that proper skill sets and training make illustrates the need for vigilance and experience when using tools such as Trident Pro.

Search and filtering tools, including search engines like Google, use noise and alphabet character files to discern and evaluate text-based content. Products such as Trident Pro offer filtering functionality that comes with default alphabet and noise files. The alphabet file tells search software which characters are text, which characters cause a word break, and which special characters to ignore. The noise file indicates which words to skip (e.g., “just” or “like”) during a search. When left untouched, default settings for alphabet and noise files can lead to missed responsive data. What is troubling is that inexperienced Trident users may not be aware of these files and the modifications needed for a specific case. [11]
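
The effect of alphabet and noise settings can be sketched in a few lines of Python. This is a simplified model, not the file format or behavior of dtSearch or Trident Pro: the `tokenize` helper, the default character set and the default noise list are all invented for illustration.

```python
import re
import string

# Hypothetical defaults: letters and digits are word characters,
# everything else breaks a word; a handful of common words are skipped.
DEFAULT_CHARS = string.ascii_lowercase + string.digits
DEFAULT_NOISE = {"just", "like", "the", "for"}

TEXT = "Just send the R&D plan for project X-17"


def tokenize(text, word_chars, noise_words):
    """Split text into index terms.

    word_chars  -- characters treated as part of a word (the 'alphabet')
    noise_words -- words that are dropped from the index
    """
    pattern = "[" + re.escape(word_chars) + "]+"
    tokens = re.findall(pattern, text.lower())
    return [t for t in tokens if t not in noise_words]


# With the defaults, "R&D" and "X-17" are split apart and "just" is dropped.
print(tokenize(TEXT, DEFAULT_CHARS, DEFAULT_NOISE))
# ['send', 'r', 'd', 'plan', 'project', 'x', '17']

# Case-specific settings: treat '&' and '-' as word characters, skip nothing.
print(tokenize(TEXT, DEFAULT_CHARS + "&-", set()))
# ['just', 'send', 'the', 'r&d', 'plan', 'for', 'project', 'x-17']
```

Under the default settings a search for the project code “x-17” or the term “r&d” finds no matching index entry, even though both are plainly present in the text.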
The correct noise file and alphabet file settings (i.e. which common words or formats should be skipped) depend upon the particular case. Ideally, the particular ESI in a case should be sampled to minimize false positives without missing responsive files.

Use of dtSearch “To, From, CC, Subject, and Body fields” as the only method of filtering emails may result in missing keywords and responsive emails. When emails are evaluated for responsive keywords, all email content including metadata and attachments should be read. Metadata is stored in what is known as the MIME [12] header and should be searched in addition to email fields like “To.” Kivu Consulting has found that email keyword searching in dtSearch (and tools like Trident that use dtSearch) alone can miss keywords in the metadata. To compensate for this issue, Kivu uses a combination of tools to filter email with the goal of maximizing the set of responsive emails. As a practice, Kivu uses multiple tools on each case to compare and analyze responsive ESI result sets, for example Guidance Software’s EnCase and dtSearch from dtSearch Corporation.

[10] Trident Pro is produced by Wave Software (www.discoverthewave.com).
[11] Trident Pro uses dtSearch as its engine, but changing the noise file is not as straightforward. Changing the noise file was not discussed in Trident’s product or user guides.
[12] MIME is an acronym for Multipurpose Internet Mail Extensions, an Internet-standard format for e-mail. The MIME format incorporates use of standard header fields like “To” and “From” as well as content-type identification (e.g., plain text).
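
The gap between field-limited and full-header searching can be illustrated with Python’s standard `email` module. The message, the `X-Project-Code` header, the keyword and the two helper functions below are invented for this sketch; it does not reproduce how dtSearch or Trident Pro select fields, and it ignores attachments and multipart messages.

```python
from email import message_from_string

# A made-up message: the keyword appears only in a non-standard header.
RAW = """\
From: alice@example.com
To: bob@example.com
Subject: Lunch
X-Project-Code: falcon
Date: Mon, 04 Jan 2010 09:15:00 -0800
Content-Type: text/plain; charset="us-ascii"

See you at noon.
"""

FIELDS = ("To", "From", "Cc", "Subject")


def hit_in_fields(msg, term):
    """Search only the selected header fields plus the body."""
    term = term.lower()
    parts = [msg.get(name, "") for name in FIELDS]
    parts.append(msg.get_payload())
    return any(term in part.lower() for part in parts)


def hit_anywhere(msg, term):
    """Search every header (name and value) plus the body."""
    term = term.lower()
    parts = [f"{name}: {value}" for name, value in msg.items()]
    parts.append(msg.get_payload())
    return any(term in part.lower() for part in parts)


msg = message_from_string(RAW)
print(hit_in_fields(msg, "falcon"))  # False: the term is not in To/From/Cc/Subject/body
print(hit_anywhere(msg, "falcon"))   # True: the term is in another header
```

Comparing the two result sets on a sample of the actual case data is one way to gauge how much a field-limited filter leaves behind.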

Computer forensics tools use different methods for date range filtering. The result: two different forensic tools can produce different result sets when filtering on the same date range for the same data.

Date range filtering of email is a common product feature in tools like Aid4Mail, dtSearch Desktop and Trident Pro. However, they differ in their execution of date range filtering. The primary difference is between tools that use an individual email’s MIME information to cull dates and times (e.g., Aid4Mail) and tools that treat emails as database records (in the case of Outlook email, taking the date/time information from the Exchange date fields in Microsoft format). In some cases, different forensic tools produce different numbers of responsive emails for the same defined date range on the same data set. For example, the date/time that an email is received can differ if the email is blocked by a spam filter. Depending on the facts of the case, we suggest that the better practice is to filter by date range using individual email files and the email’s specific date/time received, since this is the native source of email content and metadata. This level of analysis complexity is often overlooked by inexperienced investigators.

Incorrectly using simplified filtering and analysis tools in a forensic investigation is a recipe for missing responsive ESI. It does not mean that the tools themselves are not useful. These tools offer efficient high-level analysis of ESI. The issue is inexperienced forensic investigators and their ability to modify tools like Trident Pro, work across multiple forensic tools and compare results for a specific case. As more e-discovery shops, law firms and corporations purchase tools like Trident Pro or kCura’s Relativity, the burden falls on upper management to ensure these tools are used appropriately by trained forensic investigators and other e-discovery professionals. Failing to implement appropriate practices and methodologies can readily leave an end result of missed responsive ESI and the very real downstream risks of discovery sanctions.

8. Acknowledge the Complexity without Sacrificing Functionality

Competent forensic and e-discovery personnel understand the relationship between forensic tools and locating and analyzing responsive ESI. They customize the forensic tools and methodology to the needs of a specific case. One tool is not used across all cases, and widely used tools like EnCase, dtSearch, Relativity and Trident Pro are often configured differently across cases.

Unfortunately, many companies that develop forensic products create tools with misguided design. Selling “dumbed down” products might benefit ease-of-use, speed up processing, and reduce the number of “responsive” files. However, they do so at the expense of transparency when it comes to the complexity of search. Forensic investigators have a responsibility to search an entire document or data set, not a limited set of fields. This includes the knowledge to draft, test and adjust searching and analysis to evaluate a diversity of electronic media, file systems, file types, and information stored by different applications. Inappropriate product design can easily result in end users who miss responsive ESI, are not adequately trained, or don’t understand how forensic tools work. What happens when such an individual needs to testify in court?

9. Lessons To Be Learned

Keep up-to-date with new technologies, not simply follow new forensic product features

Computer forensics investigations and e-discovery assignments are complicated by ongoing changes in technology and continued growth in the amount of ESI. It is not sufficient for an investigator to merely have knowledge of hard drive images and which forensic tool(s) to employ. An investigator must also have a strong knowledge of how data is stored and how forensic tools read and interpret stored data. A gap in this knowledge can have severe consequences for the outcome of an investigation.

As computer technology changes, investigators must educate themselves to understand what they may encounter during an investigation. For example, Microsoft is scheduled to release another version of Microsoft Office in 2010. This could include additional changes or advances in the Office Open XML Formats. For forensics investigators, it is critical to stay abreast of changes and to evaluate forensics tools for the impact of any changes. Do not simply accept a claim in software release notes that Office 2010 documents are finally supported.

Use current forensics software and regularly test software when changes occur

The change in file format in Office 2007 represents a classic case of when to evaluate search tools. The new file format differed substantially from earlier releases and switched to ZIP-archiving technology. Not all search tools cleanly recognized Office 2007 files. Earlier versions of Forensic Toolkit [13] (“FTK”) (versions 1.7 and prior) also did not “see” text in the XML/ZIP-based Office documents. The key to identifying potential issues is to test the search software being used and make sure the latest release or patch is in place. Keeping up-to-date is a pre-requisite to delivering reliable results to clients.

[13] Forensic Toolkit is designed and distributed by AccessData. For more information, visit www.accessdata.com/forensictoolkit.html.

Knowing how to use and modify search tools is a requirement

In laboratories, scientists weigh ingredients using electronic scales. These scales provide highly reliable measurements because they are designed using a universal measuring standard. [14] This ensures that you get the same result across different mass balance scales. Search tools are not designed on a universal standard. One cannot assume the same standardized approach to analysis works on each case. This is where having a control for each case provides predictability of results. This also means that when a new search tool, or a new release of an existing tool, is used, it should be heavily tested so that users become familiar with its performance and identify any bugs. Starting a new case is not the best time to test a new tool and identify issues.

[14] See ASTM E617 - 97(2008) Standard Specification for Laboratory Weights And Precision Mass Standards. The American Society for Testing and Materials (ASTM) is a standards body that “develops and publishes voluntary consensus technical standards for a wide range of materials, products, systems, and services.”

Understand the limitations of search tools used in a forensics or e-discovery case

There are many solid forensic search tools such as dtSearch and EnCase, and many other e-discovery tools built with the dtSearch Text Retrieval Engine as an underlying platform. Even though these tools are well-vetted, users need to be aware of limitations and unresolved software bugs. For example, in EnCase, users must run the “mount compound files” module or mount files manually. The built-in “file mounter” script of versions 6.13 and prior failed to properly mount Office 2007 documents. Not only is it important to use current software releases, it is also important to know the technical issues should a user encounter a prior release of software.

Understand and apply basic software QA testing principles

Quality assurance (QA) is an important part of product development and manufacturing. The goal is to achieve high reliability in anything produced and ensure errors in functionality are minimized. When dealing with computer technology, investigators can benefit from applying similar principles. For example, if a new forensics software tool is released, a prior search can be run on the new tool and the results compared. If the results are the same, then there is a basis for considering the new tool to be reliable. This is known as control testing, and the more control testing that is done, the higher the level of reliability one can assume, provided of course that the earlier version was not suffering from a problem.

About Kivu Consulting

Kivu Consulting combines technical, legal and business experience to offer investigative, e-discovery, and forensic analysis services to clients worldwide.

Winston Krone, Esq.
Megan Bell