Expand Your eDiscovery Scope Beyond English: 55 Supported Languages



Page 1

At its core, eDiscovery is about analyzing huge collections of unstructured content – documents, email, call logs, transcripts, contracts – to uncover information about people, places, and organizations.

In the age of globalization, this content may be written in different languages, using multiple scripts and character sets. The challenge is therefore how to efficiently search this multilingual text, extract entities with high accuracy and precision, and ensure that all the necessary information is revealed.

Basis Technology’s Rosette® suite of text analytics components provides a robust and scalable solution to this multilingual eDiscovery challenge.

Through the combination of language identification, morphological analysis, entity extraction, and automatic name translation, Basis Technology can reveal the key information necessary to establish connections and build relationships.

55 Supported Languages

KEY FEATURES

- Simple API
- Fast and scalable
- Industrial-strength support
- Easy installation
- Flexible and customizable
- Java, C++, or web services
- Unix, Linux, Mac, or Windows

“Text analytics is no longer an academic specialty. It has become a necessary component in most search and discovery software, from selling products, tracking terrorists, delivering news, or playing music, improving communication among people worldwide. Basis Technology’s new Rosette platform ups the ante with its improvements in accuracy, enabling its customers to power a new breed of intelligent workspace applications.”
Susan Feldman, Research Vice President, IDC

eDiscovery SOLUTIONS

[email protected]
+1 617-386-2090

www.basistech.com

Expand your scope - Contact us today

Select Customers

Litigation service providers and law firms have depended on Ipro’s scalable, powerful tools to perform discovery, metadata extraction, and image conversion on data collections of all sizes and compositions quickly and defensibly.

KPMG’s eDiscovery/Enterprise Discovery Management services help make the discovery process more cost-effective, while maintaining defensibility of the process and managing risks.

Page 2


Identify the language(s) in a document

Apply linguistic intelligence to identify word forms, parts of speech, and sentence structure

Automatically find names of people, places, products, and organizations in text across many languages.

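Part-of-speech tagging example: "Improve the speed and accuracy of your search application with advanced linguistic analysis."

Improve (Verb), the (Determiner), speed (Noun), and (Conjunction), accuracy (Noun), of (Preposition), your (Determiner), search (Noun), application (Noun), with (Preposition), advanced (Adjective), linguistic (Adjective), analysis (Noun), . (Punctuation)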

Language identification example:

Primary Language: French
Primary Script: Latin
Languages detected: English, Chinese, French, Arabic (chart segments of 8%, 22%, 31%, and 39%)

Sample text in each language:

- English: Instantly identify and triage many languages within large volumes of text.
- French: Identifiez et triez instantanément plusieurs langues à travers de nombreux textes.
- Chinese: 即时识别和处理大量多语言文本。
- Arabic: التحديد والتصنيف الفوري للعديد من اللغات ضمن كميات كبيرة من النصوص.

The Rosette Language Identifier (RLI) identifies the language(s) and character encoding systems present in a document so that its textual content can be filtered and processed. Extracted text is converted to Unicode so that discovery and information retrieval applications can access a single data representation regardless of language. Using a module called the Language Boundary Locator, mixed-language documents are segmented into regions so that language-specific processing can be performed on each region.
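As a rough illustration of the two ideas in this paragraph (converting input to Unicode and splitting a mixed-language document into regions), the following Python sketch uses only the standard library. It is not the Rosette API: the function names `to_unicode`, `script_of`, and `segment_by_script` are hypothetical, and the segmentation works at the level of Unicode script (Latin, CJK, Arabic, Cyrillic), which is much coarser than true language identification. Script alone cannot tell English from French, for example; that step needs statistical models.

```python
import unicodedata


def to_unicode(raw: bytes, encodings=("utf-8", "cp1252")) -> str:
    """Decode bytes by trying candidate encodings in order (assumed list)."""
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue
    # Last resort: keep going, marking undecodable bytes.
    return raw.decode("utf-8", errors="replace")


def script_of(ch: str) -> str:
    """Return a coarse script label for a letter, or "" for neutral characters."""
    if not ch.isalpha():
        return ""  # digits, spaces, and punctuation stay with the current region
    name = unicodedata.name(ch, "")
    # The first word of the Unicode character name, e.g. "LATIN", "ARABIC", "CJK".
    return name.split(" ")[0] if name else ""


def segment_by_script(text: str) -> list[tuple[str, str]]:
    """Split text into (script, region) pairs wherever the script changes."""
    regions = []
    current_script = ""
    buffer = ""
    for ch in text:
        s = script_of(ch)
        if s and current_script and s != current_script:
            regions.append((current_script, buffer.strip()))
            buffer = ""
        if s:
            current_script = s
        buffer += ch
    if buffer.strip():
        regions.append((current_script or "UNKNOWN", buffer.strip()))
    return regions


if __name__ == "__main__":
    for script, region in segment_by_script("Hello world. 即时识别 мир"):
        print(script, "|", region)
```

Each region could then be handed to language-specific processing, which is the role the paragraph assigns to the Language Boundary Locator.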

Rosette Base Linguistics (RBL) examines documents and performs a complete morphological analysis so that text can be accurately filtered, analyzed, and searched.

RBL identifies parts of speech, sentence boundaries, word breaks, tokens, lemmas and other linguistic components in European, Asian, and Middle Eastern languages.
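To make the terms above concrete, here is a deliberately naive Python sketch of three of the listed components: sentence boundaries, tokens, and lemmas. It is an illustration only, not how RBL works. The `LEMMAS` table and all function names are hypothetical, the rules assume space-delimited text with Western punctuation, and they would not segment Chinese or Japanese, where word breaks require dictionary-based or statistical analysis.

```python
import re

# Hypothetical, tiny lemma table; a real analyzer relies on full
# morphological dictionaries and rules for each language.
LEMMAS = {
    "documents": "document",
    "names": "name",
    "searched": "search",
    "found": "find",
    "was": "be",
    "were": "be",
}


def split_sentences(text: str) -> list[str]:
    """Naive rule: a sentence ends at '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(sentence: str) -> list[str]:
    """Words (keeping internal apostrophes and hyphens) or single punctuation marks."""
    return re.findall(r"\w+(?:['’-]\w+)*|[^\w\s]", sentence)


def lemmatize(token: str) -> str:
    """Look up the base form; fall back to the lowercased token."""
    return LEMMAS.get(token.lower(), token.lower())


if __name__ == "__main__":
    text = "Documents were searched. Names were found!"
    for sentence in split_sentences(text):
        tokens = tokenize(sentence)
        print([(t, lemmatize(t)) for t in tokens])
```

Indexing the lemma rather than the surface form is what lets a search for "document" also match "documents".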

The Rosette Entity Extractor (REX) sifts through unstructured text and identifies people, places, dates, and other items that establish the true meaning of a document for further analysis.

REX locates generic terms as well as custom entities such as specific names, phone numbers, and email addresses. Statistical modeling helps determine whether an entity is present in a document, rather than simply consulting a list of possibilities and risking overlooking a variation. The result is entity extraction technology that lets you find what you know, and also what you didn’t know.
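The pattern-based and list-based side of this description can be sketched in a few lines of Python. This is an assumption-laden illustration, not the REX implementation: the `Entity` type, the `PATTERNS` table, and `extract_entities` are hypothetical names, the regular expressions are simplified, and the statistical modeling that the paragraph credits with finding unlisted entities is not shown.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    type: str
    text: str
    start: int
    end: int


# Simplified, illustrative patterns for two custom entity types.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def extract_entities(text: str, known_names=()) -> list[Entity]:
    """Find pattern matches plus exact matches from a list of known names."""
    found = []
    for etype, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            found.append(Entity(etype, m.group(), m.start(), m.end()))
    # Plain list lookup: exact matches only, so variants are missed.
    for name in known_names:
        for m in re.finditer(re.escape(name), text):
            found.append(Entity("NAME", m.group(), m.start(), m.end()))
    return sorted(found, key=lambda e: e.start)


if __name__ == "__main__":
    sample = "Call John Kennedy at +1 617-555-0100 or write to jk@example.com."
    for entity in extract_entities(sample, known_names=["John Kennedy"]):
        print(entity)
```

The list lookup finds "John Kennedy" but would miss "J. Kennedy" or a misspelling, which is the weakness of list-only approaches that the paragraph points out.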

Rosette Name Translator (RNT) uses a combination of user-supplied name dictionaries, linguistic algorithms, and statistical modeling to provide highly accurate, standardized English translations of names that originate from several non-Latin writing systems, including Chinese, Russian, and Arabic.

By combining REX and RNT, key names can be extracted and translated to help investigators rapidly identify relevant documents that need to be flagged for translation and further study.
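The layering described for name translation (a user-supplied dictionary consulted first, with an algorithmic fallback) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the RNT algorithm: `USER_DICTIONARY`, `CYRILLIC_TO_LATIN`, and `translate_name` are hypothetical, the Cyrillic table is partial, and the statistical modeling mentioned above is omitted. The sample names are the ones shown in this brochure.

```python
# Hypothetical user-supplied dictionary: exact names mapped to preferred English forms.
USER_DICTIONARY = {
    "姚明": "Yao Ming",
    "信濃川": "Shinano River",
}

# Partial Cyrillic-to-Latin table, for illustration only.
CYRILLIC_TO_LATIN = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e",
    "и": "i", "к": "k", "л": "l", "м": "m", "н": "n", "о": "o",
    "п": "p", "р": "r", "с": "s", "т": "t", "у": "u", "х": "h",
    "ч": "ch",
}


def translate_name(name: str, dictionary=USER_DICTIONARY) -> str:
    """Return an English form: dictionary entry first, then letter-by-letter fallback."""
    if name in dictionary:
        return dictionary[name]
    words = []
    for word in name.split():
        # Characters missing from the table are passed through unchanged.
        latin = "".join(CYRILLIC_TO_LATIN.get(ch.lower(), ch) for ch in word)
        words.append(latin[:1].upper() + latin[1:])
    return " ".join(words)


if __name__ == "__main__":
    for original in ["姚明", "Чан Хо Пак", "信濃川"]:
        print(original, "->", translate_name(original))
```

Putting the dictionary first lets an organization pin the spelling it prefers for known individuals, while the fallback still produces a readable English form for names nobody has listed.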

True Multilingual eDiscovery

Example names extracted and translated into English:

- أبو يوسف يعقوب: Abu-Yusif Ya'qub (Origin: Arabic; Entity Type: Person; Language: Arabic)
- 姚明: Yao Ming (Origin: Chinese; Entity Type: Person; Language: Chinese)
- 信濃川: Shinano River (Origin: Japanese; Entity Type: Location; Language: Japanese)
- جون كينيدي: John Kennedy (Origin: English; Entity Type: Person; Language: Arabic)
- Чан Хо Пак: Chan Ho Pak (Origin: Korean; Entity Type: Person; Language: Russian)

Step 1: Language Identifier

Step 2: Base Linguistics

Step 3: Entity Extraction. Extract the items of interest (including those you didn’t know about).

Step 4: Name Translation. Automatically translate non-English names into English to enable rapid triage of multilingual content.

Basis Technology helps the legal community meet its multilingual discovery challenges head-on with Rosette®, a linguistics platform proven in hundreds of commercial and government environments.

The Rosette software components are configured as building blocks, and work seamlessly within discovery workflows and information retrieval applications, covering the major European, Asian, and Middle Eastern languages. For legal professionals, Rosette provides the ability to examine multilingual text with unparalleled accuracy and efficiency.

© 2015 Basis Technology Corporation. “Basis Technology Corporation”, “Rosette”, and “Highlight” are registered trademarks of Basis Technology Corporation. “Big Text Analytics” is a trademark of Basis Technology Corporation. All other trademarks, service marks, and logos used in this document are the property of their respective owners. (2015-01-23-SED)

KPMG LLP's Trademarks are the sole property of KPMG LLP and their use here does not imply auditing or endorsement of KPMG LLP.


WEST COAST: 1700 Montgomery St., San Francisco, CA 94111

FEDERAL: 2553 Dulles View Dr., Suite 450, Herndon, VA 20171

HEADQUARTERS: One Alewife Center, Cambridge, MA 02140

EUROPE: Furzeground Way, Middlesex UB11 1BD, UK

ASIA: 9-6 Nibancho, Chiyoda-ku, Tokyo 102-0084, Japan

Rosette® BIG TEXT ANALYTICS

- RLI, Rosette Language Identifier: Identify languages and encodings
- RBL, Rosette Base Linguistics: Search many languages with high accuracy
- REX, Rosette Entity Extractor: Tag names of people, places, and organizations
- RNI, Rosette Name Indexer: Match names between many variations
- RNT, Rosette Name Translator: Translate foreign names into English
- RCA, Rosette Categorizer: Categorize everything in sight
- RSA, Rosette Sentiment Analyzer: Detect the sentiments of your text
- RES, Rosette Entity Resolver: Make real-world connections in your data

Results: Better Search, Tagged Entities, Real Identities, Matched Names, Sorted Languages, Translated Names, Sorted Content, Actionable Insights
