Redefining Search Technology Solutions for Better Information Access
ASIDIC Spring Meeting
Eric Bregand, Chief Executive Officer, TEMIS
Tampa (FL) – March 09
Copyright © 2009 TEMIS – All rights reserved 2
What Have We Learned So Far?
Users back as heart of solutions
• Engage, Empower, Ease of Use & Trust (E3T)
Information accuracy is key
• #1 criterion for churn
Web 2.0/3.0 as backbone for information delivery
• Semantic Search
• Digital desktop
“Give users what they want before they know they want it”
Agenda
1. Introduction to Text Mining
2. Text Mining for information consumers
3. Text Mining for information producers
4. Moving forward >> Text Mining Web Services
5. Summary and Q&A
What is Text Mining? Example!
Levels: Term, Entity, Fact, Knowledge
Sentence: Trimilax 500 mg makes me feel dizzy after ingestion
Part-of-speech tags, in word order: Prop., Num., Abbrev., Verb/3rd, Pron., Verb, Adj., Prep., Noun
Entity labels: Drug, Symptom, Condition
Fact labels: Product; Dosing; Action Target; State Event Action
Potential Adverse Effect
• Drug = Trimilax
• Dosing = 500mg
• Symptom = Tiredness
• When = After administration
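The Term, Entity, Fact progression above can be sketched in a few lines of Python. This is a toy illustration only: the lexicons, the dose pattern and the function name `extract_adverse_effect` are assumptions made for the example, not the extraction method used by TEMIS.

```python
import re

# Hypothetical mini-lexicons; a real system would rely on far larger resources.
DRUGS = {"trimilax"}
SYMPTOMS = {"dizzy", "tired", "nauseous"}
DOSE = re.compile(r"(\d+)\s*(mg|g|ml)\b", re.IGNORECASE)

def extract_adverse_effect(sentence):
    """Return a fact dict if a drug, a dose and a symptom occur together, else None."""
    # Term level: split the sentence into words and numbers.
    terms = re.findall(r"[A-Za-z]+|\d+", sentence)
    # Entity level: look the terms up in the lexicons.
    drug = next((t for t in terms if t.lower() in DRUGS), None)
    symptom = next((t for t in terms if t.lower() in SYMPTOMS), None)
    dose = DOSE.search(sentence)
    if not (drug and symptom and dose):
        return None
    # Fact level: combine the entities into one structured record.
    return {
        "type": "Potential Adverse Effect",
        "drug": drug,
        "dosing": dose.group(1) + dose.group(2),
        "symptom": symptom,
    }

fact = extract_adverse_effect("Trimilax 500 mg makes me feel dizzy after ingestion")
```

A sentence with no known drug or symptom returns `None`, which keeps the fact layer from asserting anything the entity layer did not find.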
Text Mining? Understand!
Title: Google gives drivers a hand at the gas pumps
Source: InformationWeek
Author: Antone Gonsalves
Date: November 7, 2007
Metadata
Entities
Facts
Text Mining? Understand!
Entities
• Locations: United States, Atlanta
• Organizations: National Association of Conveni…
• Persons: Lucy Sackett, Sackett
• Technologies: Internet, Linux, Open-source …
• Companies: Gilbarco Veeder-Root, Gilbarco, InformationWeek, T-Mobile, HTC, Qualcomm, Motorola
• Product: New Service, Google Service
Text Mining? Understand!
Facts
• Announcement. Who: Gilbarco; Whom: unknown; What: New Service; When: unknown
• Launch. Who: Gilbarco; What: Google Service; When: early next week
• Function. Who: Sackett; Company: Gilbarco; Function: spokeswoman
• Partnership. Who: Gilbarco; With whom: Google; When: unknown; State: Negative
• Alliance. Who: Google; With whom: T-Mobile, HTC, Qualcomm, Motorola; When: unknown
• Announcement. Who: Sackett; Whom: InformationWeek; When: unknown; What: unknown
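One way to hold such facts once they are extracted is as typed records with named roles. The sketch below is an assumption about a convenient in-memory shape (the `Fact` class, the role names and `facts_about` are invented for the example); it only shows why facts are easier to query than raw text.

```python
from dataclasses import dataclass, field

@dataclass
class Fact:
    kind: str                                   # e.g. "Launch", "Partnership"
    roles: dict = field(default_factory=dict)   # role name -> value; None means "unknown"

facts = [
    Fact("Announcement", {"who": "Gilbarco", "whom": None, "what": "New Service", "when": None}),
    Fact("Launch", {"who": "Gilbarco", "what": "Google Service", "when": "early next week"}),
    Fact("Function", {"who": "Sackett", "company": "Gilbarco", "function": "spokeswoman"}),
    Fact("Partnership", {"who": "Gilbarco", "with_whom": ["Google"], "when": None, "state": "Negative"}),
    Fact("Alliance", {"who": "Google", "with_whom": ["T-Mobile", "HTC", "Qualcomm", "Motorola"], "when": None}),
]

def facts_about(entity, facts):
    """Return the kinds of facts in which `entity` fills any role."""
    hits = []
    for fact in facts:
        for value in fact.roles.values():
            values = value if isinstance(value, list) else [value]
            if entity in values:
                hits.append(fact.kind)
                break
    return hits
```

Asking for `facts_about("Google", facts)` returns the Partnership and Alliance records but not the Launch of "Google Service", because roles are matched as whole values rather than as substrings.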
What is Text Mining?
• Text Mining is an information access technology…
• Text Mining generates Knowledge
• Text Mining serves information consumers & producers
Components: Text Mining Back-End, Data Repository, Text Mining Front-End (Text Analytics)
Agenda
1. Introduction to Text Mining
2. Text Mining for Information Consumers
3. Text Mining for Information Producers
4. Moving forward >> Text Mining Web Services
5. Summary and Q&A
Search Engines Today
• Scalable (billions of docs)
• Pervasive (any sources)
• Live (any time)
• Dynamic?
• Fast (millisecond queries)
• Simple (list of documents)
• Relevant?
• Informative?
Document Processing, Index, Query
Text Analytics Today
Text Mining Platform, Index
Index content: Entities & Concepts, Events & Facts, Occurrences & Position
Search, Discover, Analyze
• Scalable (100k docs)
• Domain-centric
• Live (any time)
• Pertinent
• Collaborative
Enhanced Searches with Text Mining
Enrich the search index with more, and more relevant, extracted information
Document Processing, Text Mining Platform (Business Centric Annotators), Index, Query
• Pertinent searches
• Richer indexes
• More relevant information
Just better searches: no analysis, no discovery
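To make "richer indexes" concrete, here is a minimal sketch of an inverted index that stores entity annotations next to plain terms. The annotator is a fixed lookup standing in for a text mining platform; `build_index`, `toy_annotate` and the `KNOWN` table are assumptions for the example.

```python
from collections import defaultdict

def build_index(docs, annotate):
    """Index each document under its plain terms and under (type, entity) keys."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[("term", term)].add(doc_id)
        for entity_type, entity in annotate(text):
            index[(entity_type, entity)].add(doc_id)
    return index

# Hypothetical annotator: a case-sensitive lookup of known names.
KNOWN = {"Google": "Company", "Gilbarco": "Company", "Atlanta": "Location"}

def toy_annotate(text):
    return [(etype, name) for name, etype in KNOWN.items() if name in text]

docs = {
    1: "Google gives drivers a hand at the gas pumps",
    2: "Gilbarco announced a new service in Atlanta",
    3: "How to google a gas station nearby",
}
index = build_index(docs, toy_annotate)
```

A keyword lookup for "google" returns documents 1 and 3, while the entity lookup for the company Google returns only document 1, which is the pertinence gain the slide describes.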
Beyond Search ! Discover & Analyze
Document Processing, Text Mining Platform, Index, Query
Index content: Entities & Concepts, Events & Facts, Occurrences & Position
Discover, Analyze
• Pertinent searches
• Richer indexes
• More relevant information
• Informative
• Easy reading with highlighting
• Knowledge Discovery within info links
Pertinence Gains – Beyond Terms…
• Term (pertinence: Average): Administration, Federal, Drug, Agency, Swiss, Regulatory
• Entity (pertinence: Good): Federal Drug Administration, Swiss Regulation Agency
• Concept (pertinence: Excellent): Regulation Agency
“Search Regulation Agency” better than “Search FDA or Federal…”
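The idea of searching by concept instead of by term can be sketched as a query expansion step. The `CONCEPTS` table and the sample documents are invented for the example, and substring matching stands in for a real index lookup.

```python
# Hypothetical two-level hierarchy: concept -> entities grouped under it.
CONCEPTS = {
    "Regulation Agency": ["Federal Drug Administration", "Swiss Regulation Agency"],
}

def expand_query(query):
    """Return the query plus every entity grouped under it, if it names a concept."""
    return [query] + CONCEPTS.get(query, [])

def search(query, docs):
    """Return ids of documents mentioning the query or any entity it expands to."""
    needles = [n.lower() for n in expand_query(query)]
    return sorted(
        doc_id for doc_id, text in docs.items()
        if any(n in text.lower() for n in needles)
    )

docs = {
    1: "The Federal Drug Administration approved the compound.",
    2: "The Swiss Regulation Agency requested more trial data.",
    3: "Federal funding for research increased.",
}
```

With these documents, the concept query finds both agencies (documents 1 and 2), while the bare term "Federal" finds document 1, misses the Swiss agency and also pulls in the unrelated document 3.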
Pertinence Gains – Beyond Documents
• Term, Co-Occurrence (Document): identify entities nearby in the document. Pertinence: Average
• Entity, Proximity (Paragraph): identify entities nearby in the paragraph. Pertinence: Good
• Concept, Facts (Sentence): identify entities linked by semantic sense. Pertinence: Excellent
Example (Proximity, Buy):
It was discovered by San Francisco-based Sugen, a biotechnology company that was purchased by pharmaceutical company Pharmacia Corp.
….
Five months later, Pfizer bid for Pharmacia. The maker of the popular arthritis drug Celebrex and hair-loss treatment Rogaine…
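The three granularities can be compared with a small function that reports the finest unit in which two entities appear together. This is a sketch under assumptions: the text is pre-split into paragraphs and sentences by hand, mentions are plain substring matches, and same-sentence co-occurrence is only a rough stand-in for a true fact, which needs a semantic link between the entities.

```python
def cooccurrence_level(document, a, b):
    """Return the finest unit in which both entities are mentioned.

    `document` is a list of paragraphs; each paragraph is a list of sentences.
    Returns "sentence", "paragraph", "document" or None.
    """
    found_in_paragraph = False
    for paragraph in document:
        for sentence in paragraph:
            if a in sentence and b in sentence:
                return "sentence"
        joined = " ".join(paragraph)
        if a in joined and b in joined:
            found_in_paragraph = True
    if found_in_paragraph:
        return "paragraph"
    whole = " ".join(" ".join(paragraph) for paragraph in document)
    if a in whole and b in whole:
        return "document"
    return None

# The slide's excerpt, treated here as two paragraphs (an assumption).
document = [
    ["It was discovered by San Francisco-based Sugen, a biotechnology company "
     "that was purchased by pharmaceutical company Pharmacia Corp."],
    ["Five months later, Pfizer bid for Pharmacia.",
     "The maker of the popular arthritis drug Celebrex and hair-loss treatment Rogaine…"],
]
```

Pfizer and Pharmacia meet at sentence level, Pfizer and Celebrex only at paragraph level, and Sugen and Pfizer only at document level, which mirrors the ranking from excellent to average pertinence.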
Pertinence Gains – Benchmarks
Relevance by technique
• Excellent: Concept, Facts
• Good: Entity, Proximity
• Average: Term, Co-Occurrence
Compared: Text Mining & Search Engine, Standalone Search engine
Key Feature Benefits
Combined Text Analytics & Search
• Stay fast & scalable
• But also become more pertinent & collaborative
End-user benefits = powerful search & discovery
1. Enhanced search
2. Guided navigation
3. Assisted document reading
4. Standardized data analysis and reporting
5. Information discovery
6. Collaborative platform
1. Enhanced Search Experience
From standard keyword search…
Simple recognition of words…
1. Enhanced Search Experience… to Entity & Fact search!
End-User Benefits
• Make comprehensive and precise searches
• Get more relevant documents
• Find what you don’t know!
2. Faceted Navigation
… to multi-dimensional faceted navigation
• Self-adjusting filters to refine the search
• Ability to combine several filters at once (and/or)
• Point & Click filtering
End-User Benefits
• Get a quick overview of document content
• Navigate within context-relevant information
• Rapidly focus on targeted documents
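Self-adjusting filters and and/or combination can be sketched as follows. The document records, facet names and both functions are assumptions for the example; the point is that the counts shown next to each filter are recomputed from the current result set.

```python
from collections import Counter

docs = [
    {"id": 1, "Company": {"Google", "Gilbarco"}, "Location": {"Atlanta"}},
    {"id": 2, "Company": {"Google", "HTC"}, "Location": {"United States"}},
    {"id": 3, "Company": {"Motorola"}, "Location": {"United States"}},
]

def matches(doc, filters, mode="and"):
    """`filters` is a list of (facet, value) pairs combined with 'and' or 'or'."""
    checks = [value in doc.get(facet, set()) for facet, value in filters]
    if not checks:
        return True
    return all(checks) if mode == "and" else any(checks)

def facet_counts(docs, filters, mode="and"):
    """Recompute the count beside each facet value from the filtered documents."""
    counts = Counter()
    for doc in docs:
        if matches(doc, filters, mode):
            for facet, values in doc.items():
                if facet != "id":
                    counts.update((facet, value) for value in values)
    return counts

after_google = facet_counts(docs, [("Company", "Google")])
```

After filtering on the company Google, Motorola drops to a count of zero and would disappear from the filter list, which is the "self-adjusting" behaviour.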
3. Assisted Document Reading
… to targeted information viewing
• Instant access to relevant information
• Text Highlighting
End-User Benefits
• Instant spotting of relevant information
• Guided reading
• Get additional context (“Smart Link”)
4. Data Analysis and Reporting
… to bird’s-eye view!
End-User Benefits
• Visualize key Entities & Facts (pie/bar charts)
• Detect Entities & Facts dependencies (matrix charts)
• Zoom in & out by drilling anywhere
5. Information Discovery
From a flat list of documents…
5. Information Discovery… to information network
Screen areas: Search Panel, Discovery Tools, Entities, Facts, Proofs
End-User Benefits
• Search in knowledge, not in documents
• Get a graphical representation of knowledge
• Discover information by navigating within Facts
6. Collaborative Platform
From information consumer… to information producer!
User Enriched Content
• Join 2 entities. Ex: BASF = BASF Plant Sciences
• Re-assign entity. Ex: Carl Zeiss = Company (instead of person)
• Remove entity. Ex: “BUT” is not a company (although it is the name of a French one)
• Add entity. Ex: XyyyZ is a protein
End-User Benefits
• Increase information sharing
• Capitalize on knowledge
• Improve indexing quality
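The four user corrections (join, re-assign, remove, add) can be modelled as a small list of operations applied to the extracted entities. The tuple format, the `apply_corrections` function and the choice of "BASF" as the canonical name in the join are assumptions for this sketch.

```python
def apply_corrections(entities, corrections):
    """Apply user corrections to a {name: type} map; return (entities, aliases)."""
    entities = dict(entities)      # work on a copy, keep the extractor output intact
    aliases = {}                   # alias -> canonical name
    for action, *args in corrections:
        if action == "join":       # two names denote the same entity
            canonical, alias = args
            aliases[alias] = canonical
            entities.pop(alias, None)
        elif action == "reassign": # the entity type was wrong
            name, new_type = args
            entities[name] = new_type
        elif action == "remove":   # false positive
            (name,) = args
            entities.pop(name, None)
        elif action == "add":      # entity the extractor missed
            name, new_type = args
            entities[name] = new_type
        else:
            raise ValueError(f"unknown action: {action}")
    return entities, aliases

extracted = {
    "BASF": "Company",
    "BASF Plant Sciences": "Company",
    "Carl Zeiss": "Person",
    "BUT": "Company",
}
corrections = [
    ("join", "BASF", "BASF Plant Sciences"),
    ("reassign", "Carl Zeiss", "Company"),
    ("remove", "BUT"),
    ("add", "XyyyZ", "Protein"),
]
entities, aliases = apply_corrections(extracted, corrections)
```

Keeping corrections as data rather than editing the index directly means they can be shared between users and replayed after the next automatic indexing run.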
Agenda
1. Introduction to Text Mining
2. Text Mining for Information Consumers
3. Text Mining for Information Producers
4. Moving forward >> Text Mining Web Services
5. Summary and Q&A
Text Mining as Core Component
Content sources: Original Content, Journal Scans, Expert Interviews, Event Reports
Text Mining Content Enrichment
• Related Topics Extraction
• Smart Linking
• Sentiment Analysis
• Trends Analysis & Charting
• Similarity Detection
• Content Annotation
• Metadata Extraction
• Taxonomy Management
• Automatic Categorization
• Entity & Facts Extraction
Connected systems and users: Product Management, Web Content Management, Editorial & Content Management, Content Editors, Visitors & customers
Text Mining Value Proposition
1. Enhance editorial productivity
• Reduce cost of creating information products
• Increase product quality and consistency
• Improve editorial team satisfaction & productivity
2. Enrich content for agile publishing
• Increase revenue & maximize content monetization
• Improve customer experience & loyalty
• Provide agility in creating faster, smarter products
Text Mining reduces the production costs and accelerates the delivery of information products
1. Enhancing Editorial Productivity
Content categorization & alerts
• Content is automatically categorized according to editors’ preferences and expertise
→ Reduce time in integrating content
Extraction & normalization
• References, citations and metadata are automatically extracted and normalized
→ Ensure information consistency
Semantic and topical tagging
• Semantic tags and topics are suggested for editors’ review and approval
→ Speed up the editorial process
2. Enrich Content for Agile Publishing
Semantic content linking – navigate!
• Provide more relevant content in context by suggesting similar documents
→ Create more engaging, longer lasting user visits
Richer content tagging – find!
• Leverage the powerful content enrichment to better describe the content and then power accurate searches
→ Richer user experience through accurate answers & facets
Information Analytics – understand!
• Powerful analytics to slice & dice your content
→ Quickly assess the feasibility of new product ideas
→ Reach out to new audiences with smarter products
Better Search Capabilities! Example
Capabilities shown: Relevant Topic Extraction, Automatic Categorization, Entity Extraction, Smart Linking, Similarity
News Today
Relevant topics: Peshawar, President Bush, Islamic union, Boycott, Benazir Bhutto, Pervez Musharraf, John McCain, Islamabad
Categories:
Politics Local Washington International
Business News Product Launch
Finance M&A Stock Dow/Nasdaq
Deals People
On the move Interviews
SEARCH
Pakistan polls boycott would help Musharraf: Bhutto
2 days ago
PESHAWAR, Pakistan (AFP) - Former Pakistan prime minister Benazir Bhutto said Sunday an opposition boycott of upcoming polls would only help President Pervez Musharraf legitimise his imposition of emergency rule.
Bhutto said she would meet early next week with former rival Nawaz Sharif, who has called for a boycott of the January 8 election, to discuss the issue.
"If we all boycott elections, then it will give Musharraf a two-thirds majority in the parliament to validate his provisional constitutional order," she told a press conference in northwestern city of Peshawar, an Islamic political stronghold.
"That is why we are saying that we will take part in elections under protest, but we will also leave the door open (to talks on a boycott)."
"I am getting conflicting signals from Nawaz Sharif and Qazi Hussain Ahmad about (an) election boycott as they have filed nomination papers and if someone does that it means he is taking part in election," Bhutto told reporters.
In this document
• People: President Bush, Benazir Bhutto
• Organizations: White House
• Locations: Peshawar, Washington
• (more)
Related documents
• Musharraf to hold early election
• Talibans positions move
• Administration reiterates support
• Benazir calls for resignation
• (more)
Smart Link for Pervez Musharraf: Google, Wikipedia, LinkedIn
Editorial – Current BIODATA
Objectives
• Automate primary content acquisition (scientific literature, patents, business wires, sites, …)
• Automate primary content indexing (protein, genes, diseases, company, people, etc.)
Solution
• Web harvesting with QL2
• Information extraction, categorization and alerting with Luxid® and packaged Annotators (BER, MER, CI)
Benefits
• Significant cost savings on data gathering and analysis
• Highly scalable framework covering multi-topics and thousands of sources
Editorial – LexisNexis
Objectives
• Automatic categorization & indexing using legal controlled vocabulary
• Centralized Knowledge
• Easier access to Content
Solution
• Mondeca as Legal Ontology
• Luxid® with legal Annotator (custom made)
Benefits
• More efficient asset management and update
• Improved content quality and consistency
• More efficient search/navigation based on semantics
Editorial – Search enhancement
Objective
• Increase search and retrieval quality with better part-of-speech tagging in German
Solution
• TEMIS XeLDA® to improve the indexing process
• Integration with Verity K2
Benefits
• Increase customer satisfaction by providing more accurate and comprehensive search results
Editorial – AFP
Objective
• Build the new AFP cross-media information access platform (B2B “Image Forum” platform).
Solution
• Luxid® with People, Location, Organization, Company and IPTC codes annotators
• Integration with an ontology management tool and a search engine
Benefits
• Uniform access to any AFP content (text, audio, video…)
• Make information access easier on 10M+ articles in 6 different languages, 10M+ images and between 2 and 3 million news articles per year
Agile Publishing – Elsevier
Objective
• Develop a revolutionary database indexing the last 28 years of chemistry patents
• Provide an exceptional user experience by using “smart content”
Results
• ~20 Million Chemistry Patent documents
• Searchable by chemical reactions, solvents, reactants directly extracted from the documents
• Released by Elsevier-MDL in Nov. 2004
Currently
• TEMIS distributes the Chemical Entities Relationships Annotator in partnership with Elsevier
Agile Publishing – Thomson
Objective = Rescue lost data
• 49 bound volumes of Biological Abstracts® for 1926 to 1968 digitized using offshore resources
• Required to make the data searchable with the BIOSIS
Approach
• Use Luxid® entity extraction to obtain candidate terms from the titles and abstracts
• Map the extracted entities to the BIOSIS vocabulary
• Output the resulting indexing as XML for loading to the Content Management System
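The three-step approach (extract candidates, map to a controlled vocabulary, output XML) can be sketched with the Python standard library. Everything specific here is an assumption: the tiny `VOCABULARY`, the substring-based stand-in for entity extraction and the `record`/`term` element names are invented for the example and do not reflect the Luxid® output or the BIOSIS vocabulary.

```python
import xml.etree.ElementTree as ET

# Hypothetical controlled vocabulary: surface form (lower case) -> preferred term.
VOCABULARY = {
    "e. coli": "Escherichia coli",
    "escherichia coli": "Escherichia coli",
    "photosynthesis": "Photosynthesis",
}

def extract_candidates(text):
    """Stand-in for entity extraction: vocabulary surface forms found in the text."""
    lowered = text.lower()
    return [form for form in VOCABULARY if form in lowered]

def index_record(record_id, title, abstract):
    """Map candidates to preferred terms and return the indexing as an XML element."""
    candidates = extract_candidates(title + " " + abstract)
    terms = sorted({VOCABULARY[c] for c in candidates})
    record = ET.Element("record", id=record_id)
    for term in terms:
        ET.SubElement(record, "term").text = term
    return record

xml_text = ET.tostring(
    index_record("r1", "Photosynthesis in algae", "Growth compared with E. coli cultures."),
    encoding="unicode",
)
```

Mapping through the vocabulary before writing XML means that variant spellings collapse to one preferred term, so the loaded records are searchable consistently.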
Agile Publishing – Springer
Objective
• Mapping of meaningful words and phrases in journal articles to encyclopedia entries
• Identification of related documents in a pool of over three million journal articles
Solution
• Indexing of incoming journal articles to link journal articles with the related encyclopedia entry
• Creation of a semantic fingerprint for each journal article to allow the search engine to calculate the degree of relationship
• Integration with Springer’s search engine
Benefits
• Increased product sales by improving content linking
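The slides do not say how a semantic fingerprint is built. As one plausible reading, the sketch below uses a term-frequency vector as the fingerprint and cosine similarity as the "degree of relationship"; both choices, and the function names, are assumptions rather than the method used for Springer.

```python
import math
from collections import Counter

STOPWORDS = frozenset({"the", "of", "in", "a", "and", "for"})

def fingerprint(text):
    """Toy fingerprint: term-frequency vector over non-stopword tokens."""
    tokens = [t.strip(".,;:()").lower() for t in text.split()]
    return Counter(t for t in tokens if t and t not in STOPWORDS)

def similarity(fp_a, fp_b):
    """Cosine similarity between two fingerprints, between 0 and 1."""
    dot = sum(weight * fp_b[term] for term, weight in fp_a.items())
    norm_a = math.sqrt(sum(w * w for w in fp_a.values()))
    norm_b = math.sqrt(sum(w * w for w in fp_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

yeast = fingerprint("protein folding in yeast")
bacteria = fingerprint("protein folding in bacteria")
law = fingerprint("patent law for software")
```

Because fingerprints are computed once per article, finding related documents reduces to comparing small vectors instead of re-reading full texts.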
Agenda
1. Introduction to Text Mining
2. Text Mining for Information Consumers
3. Text Mining for Information Producers
4. Moving forward >> Text Mining Web Services
5. Summary and Q&A
Market Expectations
On-the-fly annotation services
• Federated platform (web2.0/3.0)
• Serving all user/IT tools (browser, office, search, content management, …)
Text Mining Anywhere
• Highly scalable
• Anytime (24/7)
• Any documents
• Any languages (US, European, Asian, Arabic, …)
Market Expectations
On-the-fly annotation services
High-quality & accuracy
• Generic entities (people, company, …)
• Market-specific entities (drug, patient, court cases, …)
• Generic facts (acquisition, announcements, events, …)
• Market-specific facts (binding, activation, law suit, …)
• Disambiguation (Orange! Telco company? Location? Fruit?)
• Normalization (IBM Corp = IBM = I.B.M)
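Normalization and disambiguation can each be illustrated with a few lines. The suffix list, the cue words and both function names are assumptions for this sketch; production systems use much richer dictionaries and context models.

```python
import re

# Hypothetical list of legal suffixes to ignore when comparing company names.
SUFFIXES = {"corp", "corporation", "inc", "ltd", "co"}

def normalize(name):
    """Collapse spelling variants of a company name into one key."""
    compact = name.replace(".", "")                      # I.B.M -> IBM
    tokens = [t for t in re.split(r"[\s,]+", compact) if t]
    kept = [t for t in tokens if t.lower() not in SUFFIXES]
    return " ".join(kept).upper()

# Hypothetical context cues for each reading of an ambiguous name.
CUES = {
    "Company": {"telecom", "operator", "shares", "subscribers"},
    "Location": {"county", "city", "california"},
    "Fruit": {"juice", "peel", "eat"},
}

def disambiguate(context):
    """Pick the reading whose cue words overlap most with the context, or None."""
    words = set(re.findall(r"[a-z]+", context.lower()))
    best = max(CUES, key=lambda reading: len(CUES[reading] & words))
    return best if CUES[best] & words else None
```

Returning `None` when no cue matches is deliberate: an annotation service that guesses a type without evidence lowers the accuracy the slide calls for.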
Market Expectations
On-the-fly annotation services
High-quality & accuracy
More than just annotations
• Content enrichment with additional data: GPS coordinates for locations, chemical structure for drugs, …
• Information linking: content is about hyperlinking
• Semantic mash-up: Wikipedia for named entities (people, location, events, …), Google Maps for geolocation, patents database for scientific literature, …
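A semantic mash-up step can be as simple as turning each annotated entity into candidate links. The sketch below only builds URL strings; the URL patterns are assumptions about the public sites, the `mashup_links` function is invented for the example, and nothing checks that the target page exists.

```python
from urllib.parse import quote

def mashup_links(entity, entity_type):
    """Return candidate external links for an annotated entity."""
    links = {
        # Assumed pattern: article title with spaces replaced by underscores.
        "wikipedia": "https://en.wikipedia.org/wiki/" + quote(entity.replace(" ", "_")),
    }
    if entity_type == "Location":
        # Assumed pattern: map search by place name.
        links["map"] = "https://www.google.com/maps/search/?api=1&query=" + quote(entity)
    return links

person_links = mashup_links("Pervez Musharraf", "Person")
place_links = mashup_links("United States", "Location")
```

Only locations receive a map link here, which shows how the entity type produced by annotation decides which external sources are worth linking.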
Text Mining Web Services
Send documents to Text Mining Services, backed by Persistent Content Repositories (Customer Data, Public Data)
• Content Annotation Web Services (+): receive annotated documents
• Content Enrichment Web Services (++): receive annotated, enriched documents
• Content Hyper-linking Web Services (+++): receive annotated, enriched & linked documents
TEMIS – Luxid® Web Services
Production Environment and Back-up Environment, each containing:
• Luxid® Annotation Factory: API/Web Service; Annotation Plan 1 … Annotation Plan N, each built from Skill Cartridges
• Luxid® Information Mart: Workflow Engine, Data Source Reader, Data Output Generator, …, Admin
Annotation modes (HTTPS Web Services)
• On-Demand Annotation: triggered by manual intervention
• On-the-Fly Annotation: triggered by automatic call
TEMIS, Inc.
Luxid® Knowledge Studio (HTTPS browser, secured FTP)
• Create/Update Skill Cartridges™
• Create/Update Classification Plans
• Install/Upload Skill Cartridges™
Luxid® Administration Console (HTTPS browser): remote administration, monitoring & administration
• Create/Update Annotation Plans
• Create/Update Annotation Workflows
TEMIS – Company Background
TEMIS = TExt MIning Solutions
• Software company created in 2000
• Dual Headquarters in Philadelphia & Paris
• Acquisition of Xerox Linguistics (20 years of R&D)
Leader in Publishing and Life Sciences Text Mining
• Over 200 clients in Pharma and B-to-B publishing
• Founding member of UIMA’s OASIS committee
Flagship software product
• Top-20 most innovative products across Europe
Enable organizations to better interact with their environment by extracting knowledge and making sense of content
Agenda
1. Introduction to Text Mining
2. Text Mining empowers Search Engines
3. Text Mining for Publishers
4. Moving forward – Text Mining Web Services
5. Summary and Q&A
Summary
Content Enrichment is critical
• For End-Users
• For Publishers
• For any information consumers and producers
More than Content Enrichment is expected
• Content is about linking (Hyper-linking)
• Semantic mash-up
Text Mining plays an important role
• Proven technology
• Key component in the information access technology stack
• Wide range of services (from basic tagging to semantic linking)
Key business benefits
• Reduce cost of creating information products
• Increase revenue & maximize content monetization
Immediate impacts
• Improve editorial team satisfaction & productivity
• Enhance product quality and consistency
• Improve customer experience & loyalty