Still on Stage: Boolean Search
Your Speakers
Speaker: Richard Cheng
Richard Cheng, CISSP, CISA, directs digital forensics and e-discovery cases and consults on IT audits, governance and compliance. His experience includes the collection and processing of unique and/or proprietary ESI (Apple devices, mobile devices, collaboration sites, and the cloud). Richard has provided testimony as a neutral expert and technology authority. He has two M.S. degrees from the University of New Haven and a B.S. from MIT.
Speaker: Megan Bell
Megan Bell directs data analysis projects. She is experienced in the analysis of complex data sets, search and reporting technology and the automation of workflows that increase efficiency and deliver better outcomes. Her case experience includes data/security breach, IP theft, insurance, and employment matters. She also has extensive experience in the development and launch of new product technologies. She has a degree in Chemical Engineering from WPI.
Speaker: Shawnna Childress, P.I.
Overview: Boolean Search
Early eDiscovery famous moments:
Martha Stewart voicemail
Lehman Brothers’ bankruptcy
Merrill Lynch analyst emails on “junk” investments
It’s not just e-discovery.
Universe of Search
Types of Data
Sources: Databases, Email, Files, SharePoint
Locations: Local computer, server, backup, mobile device
Search Technologies: dtSearch, Lucene, Grep, SQL
Automated “predictive” methods/neural nets
Why Boolean?
Boolean search: Character-based searching. Toolbox of relationship connectors and limiters to broaden or narrow search
Benefits:
Identify important words/phrases and how they are used
Research “written” language context and relationship
Easily vary breadth and scope of search
Customizable search
Overview of Boolean Search Construction
Boolean connectors: AND, OR, NOT
Other Boolean elements: Proximity, Stemming, Fuzzy Searching, Parentheses, Wildcards, Numeric terms and ranges, Fields (e.g., email address)
Differences in Boolean connectors: AND versus Proximity; Stemming versus Wildcard use
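The connectors above can be illustrated with a small sketch. It assumes a toy in-memory index (the `build_index`, `docs_with` and `within` helpers and the three sample documents are hypothetical, not any particular search product) and shows how AND, OR and NOT reduce to set operations, and how a proximity connector is stricter than AND.

```python
import re
from collections import defaultdict

def build_index(docs):
    """Map each lowercased word to {doc_id: [word positions]}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, word in enumerate(re.findall(r"\w+", text.lower())):
            index[word].setdefault(doc_id, []).append(pos)
    return index

def docs_with(index, term):
    """Set of document ids that contain the term."""
    return set(index.get(term.lower(), {}))

def within(index, a, b, n):
    """Documents where a and b occur within n words of each other (either order)."""
    pos_a, pos_b = index.get(a.lower(), {}), index.get(b.lower(), {})
    hits = set()
    for doc_id in pos_a.keys() & pos_b.keys():
        if any(abs(i - j) <= n for i in pos_a[doc_id] for j in pos_b[doc_id]):
            hits.add(doc_id)
    return hits

# Hypothetical sample documents for illustration only.
docs = {
    1: "The product release in China was delayed.",
    2: "Release notes mention a product recall; China is not affected.",
    3: "Product roadmap for Japan.",
}
index = build_index(docs)

and_hits = docs_with(index, "product") & docs_with(index, "china")   # product AND china
or_hits = docs_with(index, "china") | docs_with(index, "japan")      # china OR japan
not_hits = docs_with(index, "product") - docs_with(index, "china")   # product AND NOT china
near_hits = within(index, "product", "release", 1)                   # product w/1 release
```

In this sample, document 2 contains both "product" and "release" and so satisfies AND, but the words are four positions apart, so only document 1 satisfies the w/1 proximity test.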
Overview of Boolean Search Construction for Foreign Languages
Foreign Languages: How will you handle multiple foreign languages?
Example: Chinese Dialects
Gan - 赣语 / 贛語 31 million
Guan (Mandarin) - 官话 / 官話 836 million
Hui - 徽語 3.2 million
Jin - 晋语 / 晉語 45 million
Kejia (Hakka) - 客家話 34 million
Min - 閩語 / 闽语 60 million
Wu - 吴语 / 吳語 77 million
Xiang - 湘语 / 湘語 / 湖南话 / 湖南話 36 million
Yue - 粵語 / 粤语 71 million
Unclassified not determined
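One way to handle the simplified and traditional spellings listed above is to OR every written variant of a term together. The sketch below is only an illustration: the `VARIANTS` table copies three rows from the list, and `or_statement` and `mentions` are hypothetical helpers. It uses plain substring matching because Chinese text is not separated into words by spaces.

```python
# Written variants taken from the dialect list above (simplified and traditional forms).
VARIANTS = {
    "Gan": ["赣语", "贛語"],
    "Yue": ["粵語", "粤语"],
    "Xiang": ["湘语", "湘語", "湖南话", "湖南話"],
}

def or_statement(terms):
    """Join quoted variants into a single Boolean OR statement."""
    return " OR ".join(f'"{t}"' for t in terms)

def mentions(text, name):
    """True if the text contains any written variant of the named dialect."""
    return any(variant in text for variant in VARIANTS[name])

gan_query = or_statement(VARIANTS["Gan"])
```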
Optimizing Boolean Search Statement Construction
1. Invest time in identifying relevant search terms and phrases.
2. Determine which search terms to search in combination.
3. Use the most appropriate Boolean logic.
4. Adjust Boolean search statements to account for variations in search term wording, spellings and abbreviations.
5. Modify Boolean search statement when special characters are present.
Examples
1. Capturing the Variation for a Word
Example: eDiscovery
Boolean: “e-Discovery” OR eDiscovery OR “electronic discovery” OR electronic w/1 discovery
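Outside a Boolean search tool, the same variations can be approximated with a regular expression. The pattern below is a rough sketch, not an exact equivalent of the statement above: `PATTERN` and `matches` are hypothetical names, and the w/1 connector is approximated by requiring "electronic" directly before "discovery" with any separator between them.

```python
import re

PATTERN = re.compile(
    r"\be[-\s]?discovery\b"           # e-Discovery, eDiscovery, e discovery
    r"|\belectronic\W+discovery\b",   # electronic discovery, electronic-discovery
    re.IGNORECASE,
)

def matches(text):
    """True if the text contains any spelling variant of e-discovery."""
    return PATTERN.search(text) is not None
```

The word boundary at the start keeps words such as "rediscovery" from matching.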
2. Searching for Unique Phrases
Example: Search for the ratio 1:1
Boolean: 1?1 AND (NOT(101 OR 111 OR 121 OR 131 OR 141 OR 151 OR 161 OR 171 OR 181 OR 191))
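The reason for the exclusion list is that the single-character wildcard in 1?1 also matches three-digit numbers such as 101. The sketch below emulates the wildcard with Python's `fnmatch` module and applies the exclusions per token. That is a different technique from the statement above, which excludes whole documents containing those numbers. `ratio_tokens` and the sample sentence are hypothetical.

```python
import re
from fnmatch import fnmatchcase

# 101, 111, ..., 191: numbers that the 1?1 wildcard would otherwise match.
EXCLUDED = {f"1{d}1" for d in "0123456789"}

def ratio_tokens(text):
    """Return tokens that match 1?1 and are not in the exclusion list."""
    tokens = [t.strip(".,;()") for t in re.findall(r"\S+", text)]
    return [t for t in tokens if fnmatchcase(t, "1?1") and t not in EXCLUDED]

hits = ratio_tokens("Mix at 1:1, see room 101 and form 1-1.")
```

Filtering per token keeps a document that mentions both the ratio 1:1 and the number 101, which a document-level NOT would drop.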
3. Simplifying Complex Compound Phrases
Example: (“product rollout” OR “product release”) AND (China OR Japan OR Korea OR Asia OR ASEAN OR Taiwan OR Hong Kong)
Boolean:
(“product release”) AND (China OR Japan OR Korea OR Asia OR ASEAN OR Taiwan OR Hong Kong)
(“product rollout”) AND (China OR Japan OR Korea OR Asia OR ASEAN OR Taiwan OR Hong Kong)
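Splitting a compound statement this way is mechanical, so it can be scripted when the phrase list is long. The helper below is a hypothetical sketch (`split_statement` is not part of any search tool) that emits one statement per phrase, each paired with the full region group, so hit counts can be reported per phrase.

```python
def split_statement(phrases, regions):
    """Build one '(phrase) AND (region OR ...)' statement per phrase."""
    region_group = "(" + " OR ".join(regions) + ")"
    return [f'("{phrase}") AND {region_group}' for phrase in phrases]

statements = split_statement(
    ["product release", "product rollout"],
    ["China", "Japan", "Korea", "Asia", "ASEAN", "Taiwan", "Hong Kong"],
)
```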
4. When Dates are Search Terms
Example: 1/6/11
Boolean: “1?6?11” OR “!1?6?2011”
Others?
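To answer the "Others?" question systematically, the numeric spellings of a date can be enumerated rather than guessed. The sketch below assumes month/day/year order (so 1/6/11 is January 6, 2011) and three common separators. `date_variants` is a hypothetical helper, and written-out forms such as month names are not covered.

```python
from datetime import date
from itertools import product

def date_variants(d, separators="/-."):
    """Enumerate numeric month/day/year spellings of a date."""
    months = {str(d.month), f"{d.month:02d}"}     # 1, 01
    days = {str(d.day), f"{d.day:02d}"}           # 6, 06
    years = {f"{d.year % 100:02d}", str(d.year)}  # 11, 2011
    variants = set()
    for month, day, year, sep in product(months, days, years, separators):
        variants.add(f"{month}{sep}{day}{sep}{year}")
    return sorted(variants)

variants = date_variants(date(2011, 1, 6))
```

Each variant can then be quoted and joined with OR, or the list can be used to check which spellings a wildcard statement actually covers.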
5. Compound Words
Example: Watch-out
Boolean: Watchout OR Watch?out
“watch out”?
6. Noise Filter Issues
Example: The The
Boolean: “The The”
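The issue here is that many search indexes drop common "noise" words at indexing time, so a phrase made up entirely of noise words leaves nothing to match. The sketch below illustrates the effect with a hypothetical noise list and a hypothetical `index_terms` helper. Real tools ship their own lists, which usually can be edited or disabled.

```python
import re

# Hypothetical noise-word list for illustration only.
NOISE_WORDS = {"the", "a", "an", "of", "and"}

def index_terms(text, noise=NOISE_WORDS):
    """Return the lowercased words that would survive noise-word filtering."""
    return [w for w in re.findall(r"\w+", text.lower()) if w not in noise]

filtered = index_terms("The The")                  # nothing left to search
unfiltered = index_terms("The The", noise=set())   # noise filter disabled
```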
7. Improving Search Results for an Overused and Important Word
Example: When “confidential” is important as a search term and overused
Boolean:
confidential AND NOT (“communication is confidential” OR “confidentiality notice” OR “confidential personal”)
confidential AND NOT (confidential w/3 communication)
confidential AND NOT (confidential w/3 notice)
confidential AND NOT (confidential w/3 personal)
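A related approach, sketched below, strips the boilerplate phrases first and then tests whether "confidential" still appears. This is a swapped-in technique rather than the AND NOT statements above: a document-level AND NOT drops any document that contains the boilerplate, even when it also uses the word substantively. `BOILERPLATE` reuses the phrases from the example, and `substantive_confidential` is a hypothetical helper.

```python
import re

# Boilerplate phrases taken from the example statement above.
BOILERPLATE = [
    "communication is confidential",
    "confidentiality notice",
    "confidential personal",
]

def substantive_confidential(text):
    """True if 'confidential' appears outside the known boilerplate phrases."""
    lowered = text.lower()
    for phrase in BOILERPLATE:
        lowered = lowered.replace(phrase, " ")
    return re.search(r"\bconfidential\b", lowered) is not None

footer_only = substantive_confidential(
    "CONFIDENTIALITY NOTICE: This communication is confidential."
)
mixed = substantive_confidential(
    "The pricing terms are confidential. Confidentiality notice: do not forward."
)
```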
Statistical Sampling
Recent court opinions suggest that sampling as used in Assisted Review is not only useful but may be required in certain cases. Several decisions in the past few years have penalized lawyers for not sampling documents before they were produced (waiver of privilege) and for not sampling the documents that were not produced (omission of responsive data). In two landmark decisions, U.S. Magistrate Judges John M. Facciola and Paul W. Grimm issued key rulings discussing sampling. Specifically, they criticized counsel who hoped to be excused for inadvertent waiver of privilege even though they had not sampled the documents produced after keyword searches.
United States v. O’Keefe, 537 F. Supp. 2d 14 (D.D.C. 2008) (Judge Facciola)
Victor Stanley, Inc. v. Creative Pipe, Inc., 250 F.R.D. 251 (D. Md. 2008) (Judge Grimm)
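As a rough illustration of what sampling involves in practice, the sketch below computes a sample size with the standard formula for estimating a proportion (with a finite population correction) and draws a reproducible random sample of document ids. The defaults of 95% confidence (z = 1.96), a 5% margin of error and p = 0.5 are assumptions chosen for illustration, not values taken from the cited opinions, and `sample_size` and `draw_sample` are hypothetical helpers.

```python
import math
import random

def sample_size(population, margin=0.05, z=1.96, p=0.5):
    """Sample size for estimating a proportion, with finite population correction."""
    n0 = (z ** 2) * p * (1 - p) / margin ** 2
    n = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n)

def draw_sample(doc_ids, n, seed=0):
    """Draw n distinct document ids; a fixed seed makes the draw repeatable."""
    ids = list(doc_ids)
    rng = random.Random(seed)
    return rng.sample(ids, min(n, len(ids)))

n = sample_size(10_000)
sample = draw_sample(range(10_000), n)
```

Recording the seed alongside the sample lets the same set of documents be drawn again if the sampling method is later questioned.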
Smoking Gun
Even more recently, another court found waiver of privilege in a “smoking gun” attorney-client communication because counsel failed to sample.
Mt. Hawley Ins. Co. v. Felman Prod., Inc., 2010 WL 1990555 (S.D. W. Va. May 18, 2010)
Q&A