Enterprise Terminology Management as a Basis for Powerful Semantic Services in Content Publishing
Publishers' Forum 2013, Berlin, 22 April 2013
Martin Kaltenböck, Semantic Web Company, www.semantic-web.at
Christian Dirschl, Wolters Kluwer Deutschland GmbH, www.wolterskluwer.de
@semwebcompany
Agenda of the Workshop
Challenges and Introduction
Solution: Linked Controlled Vocabularies
Semantic Services on Top of Terminology Management
Terminology WKD Use Case (C. Dirschl, WKD)
Conclusion & Outlook: a new Business Model?
Q&A and Open Discussion… bring your own Use Cases!
© Semantic Web Company – http://www.semantic-web.at/
Semantic Web Company (SWC)
SWC Facts: Semantic Information Management
• Semantic Web Company founded 2001 in Vienna, Austria
• 20 experts in strategy, coding, consulting, research
• Product: PoolParty Suite (launched 2009)
• Serving Global 500 companies
• EU- & US-based consulting services
Partner Network
SWC Customers (excerpt)
World Bank
Roche Diagnostics
Credit Suisse
Wolters Kluwer
Biogen Idec
Wood MacKenzie
UNIQA Insurance AG
Pearson
REEEP
British Museum
Education Services Australia
Daimler
A1 Telekom
Challenges and Introduction
We use different terminologies…
We use different languages…
We use different classification systems…
We use different metadata management systems…
We use different glossaries and definitions…
We use content from several data silos…
What are the challenges?
[Figure labels: Innovationmanagement; Innovation management; HR; Marketing]
Terminology = Controlled Vocabulary = SKOS Thesaurus
SKOS = Simple Knowledge Organization System
L(O)D = Linked (Open) Data
Linked Controlled Vocabularies = using L(O)D principles
Concept based tagging = semantic tagging = semantic annotation
URI = Uniform Resource Identifier
….
I am using a special Terminology ;)
What is a thesaurus, and how does it differ from a taxonomy or an ontology?
A thesaurus is expressive enough to improve most enterprise applications significantly, but it is not too complex to create and maintain in a sustainable way.
Taxonomy – Thesaurus – Ontology
SKOS stands for 'Simple Knowledge Organization System'
• W3C Standard since 2009
• Based on Semantic Web standards
• Open for linking with additional linked data
http://www.w3.org/2004/02/skos/
What is a Concept? The Semiotic Triangle
[Figure: semiotic triangle with the corners concept, object and label]
Labels: A-Class, A-Klasse, W 176
Concept: mental model of "A-Class"
Another object: another mental model of "A-Class"
Concept-tagging vs. Term-tagging
[Figure: content from the CMS is tagged against the enterprise vocabulary in two ways, concept tagging and term tagging; 'term-tags' become 'concepts' as part of the enterprise vocabulary]
Concept-tagging is done on top of concepts which are already part of the enterprise vocabulary, thus contextualised and linked to other concepts.
Term-tagging means that tags which are not yet part of the controlled vocabulary are extracted from text (automatically, via text mining).
Term-tags can be inserted into the enterprise vocabulary. This gradually extends and refines the vocabulary.
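The difference between the two tagging modes can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the vocabulary, the `example.org` URIs, the function names and the naive substring and word-frequency matching are all hypothetical and stand in for a real text-mining component.

```python
import re
from collections import Counter

# Hypothetical mini vocabulary: concept URI -> labels (assumed for illustration).
VOCABULARY = {
    "http://example.org/vocab/innovation-management": [
        "innovation management", "Innovationsmanagement"],
    "http://example.org/vocab/a-class": ["A-Class", "A-Klasse", "W 176"],
}

def concept_tags(text, vocabulary):
    """Concept tagging: URIs of concepts with a label occurring in the text."""
    lowered = text.lower()
    return sorted(uri for uri, labels in vocabulary.items()
                  if any(label.lower() in lowered for label in labels))

def term_tags(text, vocabulary, min_count=2):
    """Term tagging: frequent words that no vocabulary label covers yet."""
    known = {word
             for labels in vocabulary.values()
             for label in labels
             for word in re.findall(r"\w+", label.lower())}
    words = re.findall(r"[a-z]{4,}", text.lower())
    counts = Counter(w for w in words if w not in known)
    return sorted(w for w, n in counts.items() if n >= min_count)

def promote(term, vocabulary, base="http://example.org/vocab/"):
    """Insert a term-tag into the vocabulary as a new concept."""
    uri = base + term
    vocabulary.setdefault(uri, [term])
    return uri

vocab = dict(VOCABULARY)
text = ("The new A-Klasse was presented. Telematics and innovation "
        "management were discussed; telematics is a growing field.")
print(concept_tags(text, vocab))   # concepts already in the vocabulary
print(term_tags(text, vocab))      # candidate terms, not yet concepts
promote("telematics", vocab)       # the term-tag becomes a concept
```

After `promote`, the former term-tag is found by `concept_tags` like any other concept, which is the feedback loop described above.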
Solution: Linked Controlled Vocabularies
Using Linked (Open) Data Principles
• Use URIs to denote things.
• Use HTTP URIs so that these things can be referred to and looked up ("dereferenced") by people and user agents.
• Provide useful information about the thing when its URI is dereferenced, leveraging standards such as RDF, SPARQL.
• Include links to other related things (using their URIs) when publishing data on the Web.
Linked Data Principles, Tim Berners-Lee
WHY?
• To enable connected vocabularies over several departments (also different languages)
• To enrich a terminology in the areas of concepts, synonyms, definitions, relations…
• To enable contextualization / data integration by linking different terminologies
Linked Controlled Vocabularies
1. Each concept is in one or many concept schemes
2. Each concept has one URI
3. Each concept has one or more labels
4. (Poly-)hierarchical and non-hierarchical relations
5. Matching between concepts from various sources
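The five properties listed above map naturally onto a small data structure. The following Python sketch is an assumption-laden illustration, not the PoolParty data model: the class, its field names and the `example.org` URIs are hypothetical, and the DBpedia link is shown only as an example of an external match.

```python
from dataclasses import dataclass, field

@dataclass
class Concept:
    uri: str                                      # 2. exactly one URI
    schemes: set = field(default_factory=set)     # 1. one or many concept schemes
    labels: dict = field(default_factory=dict)    # 3. language -> list of labels
    broader: set = field(default_factory=set)     # 4. hierarchical, several parents allowed
    related: set = field(default_factory=set)     # 4. non-hierarchical relations
    matches: dict = field(default_factory=dict)   # 5. mapping property -> external URIs

def narrower_of(uri, concepts):
    """Derive the narrower concepts of `uri` from the stored broader links."""
    return sorted(c.uri for c in concepts if uri in c.broader)

BASE = "http://example.org/regions/"   # hypothetical namespace
SCHEME = "http://example.org/schemes/regions"

belgium = Concept(
    uri=BASE + "Belgium",
    schemes={SCHEME},
    labels={"en": ["Belgium"], "de": ["Belgien"]},
    broader={BASE + "Benelux", BASE + "EU"},      # poly-hierarchy: two parents
    matches={"exactMatch": {"http://dbpedia.org/resource/Belgium"}},
)
benelux = Concept(uri=BASE + "Benelux", schemes={SCHEME},
                  labels={"en": ["Benelux"]})

print(narrower_of(BASE + "Benelux", [belgium, benelux]))
```

Storing only `broader` and deriving `narrower` on demand is one possible design choice; it avoids keeping two inverse relations in sync.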
Linked Controlled Vocabularies
• The Simple Knowledge Organization System (SKOS) is a W3C standard for developing enterprise vocabularies
• SKOS provides several properties for vocabulary linking (mapping):
– skos:exactMatch
– skos:closeMatch
– skos:broadMatch
– skos:narrowMatch
– skos:relatedMatch
http://www.w3.org/TR/2009/REC-skos-reference-20090818/
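How such mapping links connect vocabularies can be sketched with plain triples. In the SKOS reference, skos:exactMatch is symmetric and transitive, so concepts linked by it form groups that can be treated as equivalent. The Python sketch below follows only those links; the `ex:` identifiers and the function name are hypothetical, and a real system would use an RDF store rather than tuples.

```python
EXACT = "skos:exactMatch"
CLOSE = "skos:closeMatch"

# Hypothetical mapping triples between departmental vocabularies.
TRIPLES = [
    ("ex:hr/12", EXACT, "ex:marketing/7"),
    ("ex:marketing/7", EXACT, "ex:research/3"),
    ("ex:hr/12", CLOSE, "ex:sales/9"),   # close, but not interchangeable
]

def equivalents(uri, triples):
    """All concepts reachable from `uri` via skos:exactMatch links."""
    neighbours = {}
    for subject, predicate, obj in triples:
        if predicate == EXACT:
            # exactMatch is symmetric: record the link in both directions.
            neighbours.setdefault(subject, set()).add(obj)
            neighbours.setdefault(obj, set()).add(subject)
    seen, stack = {uri}, [uri]
    while stack:                          # transitive closure by traversal
        for nxt in neighbours.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen - {uri}

print(sorted(equivalents("ex:research/3", TRIPLES)))
```

The skos:closeMatch link is deliberately ignored here, since that property is not transitive and chaining it could merge concepts that are only loosely similar.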
Semantic Services on Top of Terminology Management
Topic Pages & Dossier Pages
SEO / SEM
Semantic Search
Recommender Systems
Content Aggregation
Data Integration (Services)
Matchmaking Services
Smart Glossary Services
Live demo: http://scot.curriculum.edu.au/
Smart Glossary Services. Example: Schools Online Thesaurus
Dossier Pages: From 'Gopher' to 'Super-Mashups'
Live demo: http://www.reegle.info/countries
Topic Pages: Mashups providing a quick overview
Short Description
Related Concepts
Geo-Search
Content (Twitter, videos etc.) from several different sources
API
http://
CMS
Live demo: http://www.gbpn.org/newsroom/news-aggregator
Content Aggregation. Example: GBPN News Aggregator
SKOS & Linked data alignment
Live demo: http://bit.ly/semantic_search
The Business Perspective: Costs of Data Integration
Source: Price Waterhouse Coopers – Technology Forecast, Spring 2009
Semantic Search
Search: "Innovation management methods"
[Figure labels: HR, Marketing/Sales, Research, Production]
Live demo: http://pilot4.poolparty.biz/alcedo/
Querying structured data AND unstructured data in one step
Industry News
Show me industry news which mention countries or regions to which our export volume has increased over the last 5 years at least by 10% and which deal with one of our products and/or with one of our competitors.
(Federated) SPARQL Queries
Export statistics
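The logic of the example question can be illustrated without a SPARQL endpoint. The sketch below is a plain-Python stand-in for the federated query, not the demo's implementation: the export figures, news items, `ex:` identifiers and function names are invented for illustration. It joins structured data (export statistics) with unstructured data (news tagged with concepts) through shared concept identifiers.

```python
# Structured data: export volume per country and year (invented figures).
EXPORTS = {
    "ex:Belgium": {2008: 100.0, 2013: 125.0},
    "ex:Austria": {2008: 200.0, 2013: 204.0},
}
PRODUCTS = {"ex:ProductA"}
COMPETITORS = {"ex:CompetitorX"}

# Unstructured data: news items after concept tagging (invented).
NEWS = [
    {"title": "New plant for ProductA opens in Belgium",
     "tags": {"ex:Belgium", "ex:ProductA"}},
    {"title": "CompetitorX expands in Austria",
     "tags": {"ex:Austria", "ex:CompetitorX"}},
    {"title": "Belgian elections",
     "tags": {"ex:Belgium"}},
    {"title": "CompetitorX enters the Belgian market",
     "tags": {"ex:Belgium", "ex:CompetitorX"}},
]

def growing_markets(exports, start, end, min_growth=0.10):
    """Countries whose export volume grew by at least `min_growth`."""
    return {country for country, volume in exports.items()
            if start in volume and end in volume and volume[start] > 0
            and (volume[end] - volume[start]) / volume[start] >= min_growth}

def relevant_news(news, markets, topics):
    """News mentioning a growing market AND a product or competitor."""
    return [item["title"] for item in news
            if item["tags"] & markets and item["tags"] & topics]

markets = growing_markets(EXPORTS, 2008, 2013)
for title in relevant_news(NEWS, markets, PRODUCTS | COMPETITORS):
    print(title)
```

The join only works because statistics and news refer to countries, products and competitors by the same identifiers, which is what a shared controlled vocabulary provides.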
Terminology WKD Use Case (C. Dirschl, WKD)
Content Supply Chain
Content Acquisition → Content Enrichment → Composing/Bundling → Publishing/Interfacing → Sales → Customer Service → Customer
Content Acquisition
• Manually collecting data from different sources
• Most information is not publicly available
• 1:1 contractual relationships with authors
Content Enrichment, Composing/Bundling
• Using internal taxonomies and thesauri
• Mainly manual enrichment
• Linking of WK content only
Publishing, Interfacing
• Publishing mainly in the context of a distinct product
• Publishing of texts, not information
Sales, Customer Service
• Online libraries as isolated applications
• Hardly any integration with Web content
• Only first steps in integration of client software and content
Jurion Platform
• jDesk: Real integration in local processes
• jCloud: Secure access and mobility
• jStore: Access to many sources and immediate usage
• jBook: Individualisation of content
• jLink: Networking and personalisation
• jCreate: Create and sell knowledge
• jSearch: Semantic search on legal information
Overview Search and Content Enrichment architecture
[Figure: architecture in three stages, Enrichment → Preprocessing/Indexing → Search]
Inputs: CMS; Customer Content; Metadata DB/Services; www… Crawler; 3rd Party Content; UGC; import paths
Enrichment
• Classification*
• Metadata Recognition
• Content Enrichment
Preprocessing/Indexing
• Concept Recognition*
• Doc. Segmentation
• Normalization
• Index
Search
• User Query
• Query Analysis: Concept Recognition*, Named Entity Recognition, Semantic expansion*, Link to Taxonomy*
• Search → Search Result (Raw)
• Result Analysis: Relevance Ranking
• Refinement: Data organization (e.g. faceting), Further analysis (e.g. ontology, linked data)
• Search Result (Final)
• Search Feedback (e.g. ontology)
• User Information
* Domain specific requirements
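The query-analysis steps "Concept Recognition" and "Semantic expansion" can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the Jurion implementation: the thesaurus content, the `ex:` identifiers, the function names and the substring matching are hypothetical.

```python
# Hypothetical thesaurus: concept -> labels and narrower concepts.
THESAURUS = {
    "ex:lean-manufacturing": {
        "labels": ["lean manufacturing", "lean production"],
        "narrower": ["ex:kanban"],
    },
    "ex:kanban": {"labels": ["kanban"], "narrower": []},
}

def recognise_concepts(query, thesaurus):
    """Concept recognition: concepts with a label occurring in the query."""
    q = query.lower()
    return [uri for uri, concept in thesaurus.items()
            if any(label in q for label in concept["labels"])]

def expand(query, thesaurus):
    """Semantic expansion: all labels of recognised and narrower concepts."""
    terms = []
    for uri in recognise_concepts(query, thesaurus):
        for candidate in [uri] + thesaurus[uri]["narrower"]:
            for label in thesaurus[candidate]["labels"]:
                if label not in terms:
                    terms.append(label)
    # Fall back to the literal query if no concept was recognised.
    return terms or [query.lower()]

def search(query, documents, thesaurus):
    """Return documents containing any of the expanded search terms."""
    terms = expand(query, thesaurus)
    return [doc for doc in documents
            if any(term in doc.lower() for term in terms)]

DOCS = ["Lean manufacturing in practice",
        "Kanban boards explained",
        "Annual HR report"]
print(search("lean production methods", DOCS, THESAURUS))
```

A query for "lean production methods" thus also finds documents that only mention the synonym or a narrower concept, even though none of them contains the literal query string.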
Jurion – Autosuggest from dedicated knowledge domain database
Domain knowledge in PoolParty is the basis for autocomplete.
No keywords, but detailed legal concepts are offered.
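The idea of suggesting concepts rather than keywords can be sketched as a prefix match over concept labels. The Python example below is a hypothetical illustration, not the Jurion or PoolParty API: the legal concepts, the `ex:` identifiers and the function name are assumptions.

```python
# Hypothetical legal concepts: (concept identifier, labels).
CONCEPTS = [
    ("ex:tenancy-law", ["tenancy law", "landlord and tenant law"]),
    ("ex:tenancy-agreement", ["tenancy agreement", "lease"]),
    ("ex:tax-law", ["tax law"]),
]

def suggest(prefix, concepts, limit=5):
    """Return (matching label, concept) pairs for labels starting with prefix."""
    p = prefix.strip().lower()
    if not p:
        return []
    hits = []
    for uri, labels in concepts:
        for label in labels:
            if label.lower().startswith(p):
                hits.append((label, uri))
                break            # offer each concept only once
    return sorted(hits)[:limit]

print(suggest("ten", CONCEPTS))
print(suggest("le", CONCEPTS))
```

Because every suggestion carries a concept identifier, a synonym such as "lease" leads to the same concept as its preferred label, so the subsequent search runs on the concept and not on the typed string.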
PoolParty for Metadata Storage and Development
Tool for storing the domain knowledge vocabulary; independent of content and metadata database; sound basis for applied knowledge management
Pebbles for Additional Metadata Assignment
Vocabulary maintained in PoolParty is assigned to content via an editorial workflow.
Additional free metadata can also be applied.
Pebbles as a means to include external knowledge
Leveraging the external knowledge available in the Semantic Web.
Automatic inclusion of e.g. synonyms, definitions and references.
Linked Data Publishing
vocabulary.wolterskluwer.de
Cooperation between SWC and WKD
Metadata Management
Text Mining
Data Integration
Semantic Search
Thesaurus Management
Knowledge Extraction
Knowledge Model Creation
Knowledge Model Maintenance
Knowledge Model Development
Open Data Usage
Linked Data Usage
Wolters Kluwer
Semantic Web Company
Conclusion & Outlook: a new Business Model?
Enterprise Terminologies: An Explicit Metadata Layer
• Metadata are stored and processed separately from data
• Metadata management is part of the enterprise information management strategy
[Figure labels: HR, Marketing/Sales, Research, Production]
Linked enterprise vocabularies are the backbone for a semantic infrastructure
Information integration on semantic level
Application (integrated views)
http://company.com/research/1452
http://company.com/production/729
Lean manufacturing
Lean production
http://company.com/regions/Belgium
http://company.com/regions/Benelux
[Relation labels in the figure: broader, related, match]
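An integrated view over departmental data could be computed along the following lines. This Python sketch rests on assumptions that the slide does not spell out: that the research and production concepts are linked by a match relation, that Belgium has Benelux as its broader concept, and that documents are tagged with these URIs. The document names and function names are invented.

```python
RESEARCH = "http://company.com/research/1452"
PRODUCTION = "http://company.com/production/729"
BELGIUM = "http://company.com/regions/Belgium"
BENELUX = "http://company.com/regions/Benelux"

MATCH = {(RESEARCH, PRODUCTION)}    # assumed mapping between departments
BROADER = {BELGIUM: BENELUX}        # assumed: child -> broader concept

# Hypothetical documents from different silos, tagged with concept URIs.
DOCS = {
    "research-report.pdf": {RESEARCH, BELGIUM},
    "shopfloor-guide.doc": {PRODUCTION},
    "benelux-sales.xls": {BENELUX},
}

def expand(uri):
    """The concept itself, its matched concepts and all narrower concepts."""
    uris = {uri}
    for a, b in MATCH:
        if a == uri:
            uris.add(b)
        if b == uri:
            uris.add(a)
    changed = True
    while changed:                  # walk down the hierarchy transitively
        changed = False
        for child, parent in BROADER.items():
            if parent in uris and child not in uris:
                uris.add(child)
                changed = True
    return uris

def integrated_view(uri):
    """Names of all documents tagged with the concept or its expansions."""
    uris = expand(uri)
    return sorted(name for name, tags in DOCS.items() if tags & uris)

print(integrated_view(RESEARCH))
print(integrated_view(BENELUX))
```

A request for the Benelux region thus also returns content tagged only with Belgium, and a request for the research concept also returns production content, without either department changing its own vocabulary.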
Experienced publishers can provide support in each of these steps:
1. Publishers have expertise in their specific domain and can support others with this knowledge about adequate concepts and their usage.
2. Publishers can consult partners or customers on the processes involved in creating standardized data or transforming existing data into the desired format.
3. Publishers can take over the creation of taxonomies or thesauri by using existing resources or engaging their internal domain experts' network.
4. Publishers can support enrichment by planning and executing the linking with external (cloud) or internal (publisher's) resources, and by managing the quality of the linking.
5. Curation can also be carried out manually or automatically by specialized tools. Publishers might have more experience in improving data quality and have appropriate tools at hand.
6. The value of controlled vocabularies lies in the internal structural processes. They can improve the functionality of applications or enable additional services and even completely new applications. Publishers can help to exploit the potential of these data and to monetize the advantages of existing applications by introducing proper showcases.
7. Maintenance is also an important topic that has to be taken into account as language, data and information change over time. This service can be offered by publishers.
Publishers could therefore support the implementation of external linked data infrastructures by process consulting and content expertise.
Source: A systemic perspective on linked open vocabularies (Blumauer, Dirschl, Eck, Pellegrini)
A Business Model for Publishers?
http://www.semantic-web.at/
http://poolparty.biz
Martin Kaltenböck, Managing Partner & CFO, m.kaltenboeck@semantic-web.at
"We are happy about any comments and questions – and please bring in your own use cases now!"
Christian Dirschl, Content [email protected]