Expand Your eDiscovery Scope Beyond English: 55 Supported Languages
At its core, eDiscovery is about analyzing huge collections of unstructured content – documents, email, call logs, transcripts, contracts – to uncover information about people, places, and organizations.
In the age of globalization, this content may be written in different languages, using multiple scripts and character sets. The challenge is therefore how to efficiently search this multilingual text, extract entities with high accuracy and precision, and ensure that all the necessary information is revealed.
Basis Technology’s Rosette® suite of text analytics components provides a robust and scalable solution to this multilingual eDiscovery challenge.
Through the combination of language identification, morphological analysis, entity extraction, and automatic name translation, Basis Technology can reveal the key information necessary to establish connections and build relationships.
KEY FEATURES
- Simple API
- Fast and scalable
- Industrial-strength support
- Easy installation
- Flexible and customizable
- Java, C++, or web services
- Unix, Linux, Mac, or Windows
“Text analytics is no longer an academic specialty. It has become a necessary component in most search and discovery software, from selling products, tracking terrorists, delivering news, or playing music, improving communication among people worldwide. Basis Technology’s new Rosette platform ups the ante with its improvements in accuracy, enabling its customers to power a new breed of intelligent workspace applications.”
Susan Feldman, Research Vice President, IDC
eDiscovery SOLUTIONS
[email protected]
+1 617-386-2090
www.basistech.com
Expand your scope - Contact us today
Select Customers
Litigation service providers and law firms have depended on Ipro’s scalable, powerful tools to perform discovery, metadata extraction, and image conversion on data collections of all sizes and compositions quickly and defensibly.
KPMG’s eDiscovery/Enterprise Discovery Management services help make the discovery process more cost-effective, while maintaining defensibility of the process and managing risks.
Identify the language(s) in a document.
[Figure: language identification example. Primary language: French. Primary script: Latin. Languages shown: English, Chinese, French, Arabic. Percentages shown: 8%, 22%, 31%, 39%. Sample text:]
English: Instantly identify and triage many languages within large volumes of text.
French: Identifiez et triez instantanément plusieurs langues à travers de nombreux textes.
Chinese: 即时识别和处理大量多语言文本。
Arabic: التحديد والتصنيف الفوري للعديد من اللغات ضمن كميات كبيرة من النصوص.

Apply linguistic intelligence to identify word forms, parts of speech, and sentence structure.
[Figure: part-of-speech tagging of the sentence "Improve the speed and accuracy of your search application with advanced linguistic analysis." Tags shown: Verb, Determiner, Noun, Conjunction, Preposition, Adjective, Punctuation.]

Automatically find names of people, places, products, and organizations in text across many languages.
The Rosette Language Identifier (RLI) identifies the language(s) and character encoding systems present in a document so that its textual content can be filtered and processed. Extracted text is converted to Unicode so that discovery and information retrieval applications can access a single data representation regardless of language. Using a module called the Language Boundary Locator, mixed-language documents are segmented into regions so that language-specific processing can be performed on each region.
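The idea of splitting a mixed-language document into regions can be illustrated with a minimal Python sketch. It is not how RLI or the Language Boundary Locator works internally, which the brochure does not describe. It assumes the text is already decoded to Unicode and uses Unicode character names as a coarse stand-in for language identification. The function names `script_of` and `segment_by_script` and the script labels are hypothetical.

```python
import unicodedata

# Coarse script labels keyed by the prefix of the Unicode character name.
# This table is an illustrative assumption, not a complete script inventory.
SCRIPT_PREFIXES = (
    ("LATIN", "Latin"),
    ("CJK", "Han"),
    ("ARABIC", "Arabic"),
    ("CYRILLIC", "Cyrillic"),
    ("HIRAGANA", "Kana"),
    ("KATAKANA", "Kana"),
)


def script_of(ch):
    """Return a coarse script label for a letter, or None for neutral characters."""
    if not ch.isalpha():
        return None  # spaces, digits, punctuation belong to no script
    name = unicodedata.name(ch, "")
    for prefix, label in SCRIPT_PREFIXES:
        if name.startswith(prefix):
            return label
    return "Other"


def segment_by_script(text):
    """Split text into (script, segment) regions; neutral characters join the current region."""
    regions = []  # each entry is [label, list_of_characters]
    for ch in text:
        label = script_of(ch)
        if not regions:
            regions.append([label, [ch]])
        elif label is None or label == regions[-1][0]:
            regions[-1][1].append(ch)
        elif regions[-1][0] is None:
            # Leading neutral characters adopt the first script that follows them.
            regions[-1][0] = label
            regions[-1][1].append(ch)
        else:
            regions.append([label, [ch]])
    return [(label, "".join(chars)) for label, chars in regions]
```

Each region could then be handed to language-specific processing. Script alone cannot separate languages that share a script, such as English and French, so a real identifier needs more evidence than this sketch uses.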
Rosette Base Linguistics (RBL) examines documents and performs a complete morphological analysis so that text can be accurately filtered, analyzed, and searched.
RBL identifies parts of speech, sentence boundaries, word breaks, tokens, lemmas and other linguistic components in European, Asian, and Middle Eastern languages.
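To make the terms above concrete, here is a toy Python sketch of sentence splitting, tokenization, and lemma lookup for English only. It is an illustration under stated assumptions, not RBL's method: the regular expressions are simplistic, and the `LEMMAS` table and all function names are hypothetical. A real morphological analyzer would cover far more word forms and languages.

```python
import re

# Tiny hand-written lemma table, purely illustrative.
LEMMAS = {
    "documents": "document",
    "filed": "file",
    "files": "file",
    "was": "be",
    "were": "be",
}


def split_sentences(text):
    """Split after '.', '!' or '?' followed by whitespace (a naive boundary rule)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Return word tokens and single punctuation marks as separate tokens."""
    return re.findall(r"\w+|[^\w\s]", sentence)


def lemmatize(token):
    """Map a token to its lemma, falling back to the lowercased token."""
    lower = token.lower()
    return LEMMAS.get(lower, lower)


def analyze(text):
    """Return one list of (token, lemma) pairs per sentence."""
    return [[(tok, lemmatize(tok)) for tok in tokenize(s)] for s in split_sentences(text)]
```

Indexing on lemmas rather than surface forms is what lets a search for "file" also match "filed" and "files".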
The Rosette Entity Extractor (REX) sifts through unstructured text and identifies people, places, dates, and other items that establish the true meaning of a document for further analysis.
REX locates generic terms as well as custom entities such as specific names, phone numbers, and email addresses. Statistical modeling helps determine whether an entity is present in a document, rather than simply consulting a list of possibilities and risking an overlooked variation. The result is entity extraction technology that lets you find what you know, and also what you didn’t know.
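Pattern-shaped custom entities such as phone numbers and email addresses can be sketched with regular expressions in Python. This is a minimal illustration, not the REX API, and it covers only the pattern-based part; the statistical modeling mentioned above is not shown. The patterns, the entity type names, and `extract_entities` are assumptions made for the example.

```python
import re

# Deliberately simple patterns; real-world formats vary far more than this.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def extract_entities(text):
    """Return (entity_type, matched_text) pairs in order of appearance."""
    found = []
    for entity_type, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((match.start(), entity_type, match.group()))
    # Sort by position so results follow the reading order of the document.
    return [(entity_type, value) for _, entity_type, value in sorted(found)]
```

A fixed pattern finds only the variations its author anticipated, which is the limitation the statistical approach is meant to address.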
Rosette Name Translator (RNT) uses a combination of user-supplied name dictionaries, linguistic algorithms, and statistical modeling to provide highly accurate, standardized English translations of names that originate from several non-Latin writing systems, including Chinese, Russian, and Arabic.
By combining REX and RNT, key names can be extracted and translated to help investigators rapidly identify relevant documents that need to be flagged for translation and further study.
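The triage step described here can be sketched in Python as a dictionary lookup followed by a match against names of interest. This is a simplified sketch under stated assumptions, not the REX or RNT API: it models only the user-supplied name dictionary, with no linguistic algorithms or statistical modeling. The three dictionary entries use name pairs that appear in this brochure; `NAME_DICTIONARY`, `names_in`, and `triage` are hypothetical names.

```python
# User-supplied dictionary mapping native-script names to standardized English forms.
NAME_DICTIONARY = {
    "姚明": "Yao Ming",
    "جون كينيدي": "John Kennedy",
    "Чан Хо Пак": "Chan Ho Pak",
}


def names_in(text):
    """Return the English forms of all dictionary names found in the text, sorted."""
    return sorted(english for native, english in NAME_DICTIONARY.items() if native in text)


def triage(documents, names_of_interest):
    """Map each document id to the names of interest it mentions; skip documents with none."""
    flagged = {}
    for doc_id, text in documents.items():
        hits = [name for name in names_in(text) if name in names_of_interest]
        if hits:
            flagged[doc_id] = hits
    return flagged
```

For example, `triage({"doc-1": "昨天姚明到了。", "doc-2": "Ничего важного."}, {"Yao Ming"})` flags only `doc-1`, which an investigator could then send for full translation.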
True Multilingual eDiscovery
[Figure: examples of extracted and translated names]
Abu-Yusif Ya'qub (أبو يوسف يعقوب): Origin Arabic; Entity Type Person; Language Arabic
Yao Ming (姚明): Origin Chinese; Entity Type Person; Language Chinese
Shinano River (信濃川): Origin Japanese; Entity Type Location; Language Japanese
John Kennedy (جون كينيدي): Origin English; Entity Type Person; Language Arabic
Chan Ho Pak (Чан Хо Пак): Origin Korean; Entity Type Person; Language Russian
Step 1: Language Identifier
Step 2: Base Linguistics
Step 3: Entity Extraction. Extract the items of interest (including those you didn’t know about).
Step 4: Name Translation. Automatically translate non-English names into English to enable rapid triage of multilingual content.
Basis Technology helps the legal community meet its multilingual discovery challenges head-on with Rosette®, a linguistics platform proven in hundreds of commercial and government environments.
The Rosette software components are configured as building blocks, and work seamlessly within discovery workflows and information retrieval applications, covering the major European, Asian, and Middle Eastern languages. For legal professionals, Rosette provides the ability to examine multilingual text with unparalleled accuracy and efficiency.
© 2015 Basis Technology Corporation. “Basis Technology Corporation”, “Rosette”, and “Highlight” are registered trademarks of Basis Technology Corporation. “Big Text Analytics” is a trademark of Basis Technology Corporation. All other trademarks, service marks, and logos used in this document are the property of their respective owners. (2015-01-23-SED)
KPMG LLP's trademarks are the sole property of KPMG LLP and their use here does not imply auditing or endorsement by KPMG LLP.
WEST COAST
1700 Montgomery St., San Francisco, CA 94111
FEDERAL
2553 Dulles View Dr., Suite 450, Herndon, VA 20171
HEADQUARTERS
One Alewife Center, Cambridge, MA 02140
EUROPE
Furzeground Way, Middlesex UB11 1BD, UK
ASIA
9-6 Nibancho, Chiyoda-ku, Tokyo 102-0084, Japan
Rosette® BIG TEXT ANALYTICS
RLI, Rosette Language Identifier: Identify languages and encodings
RBL, Rosette Base Linguistics: Search many languages with high accuracy
REX, Rosette Entity Extractor: Tag names of people, places, and organizations
RNI, Rosette Name Indexer: Match names between many variations
RNT, Rosette Name Translator: Translate foreign names into English
RCA, Rosette Categorizer: Categorize everything in sight
RSA, Rosette Sentiment Analyzer: Detect the sentiments of your text
RES, Rosette Entity Resolver: Make real-world connections in your data
Results: Better Search, Tagged Entities, Real Identities, Matched Names, Sorted Languages, Translated Names, Sorted Content, Actionable Insights