Linked Data for Data Science · viennadatasciencegroup.at/wp-content/uploads/2016/11/Linked-Data …


Page 1: Data Science Linked Data forviennadatasciencegroup.at/wp-content/uploads/2016/11/Linked-Data … · POOLPARTY Semantic Web Company Founded in 2004 Based in Vienna Privately held >30

Andreas Koller
CIO, Semantic Web Company

Linked Data for Data Science

Page 2

INTRODUCING SEMANTIC WEB COMPANY (SWC) AND POOLPARTY

Semantic Web Company
▸ Founded in 2004
▸ Based in Vienna
▸ Privately held
▸ >30 employees, experts in text mining & linked data
▸ SWC participates in EU-projects with a total funding of over € 17.0 million
▸ SWC named to KMWorld’s 2016 "100 Companies That Matter in Knowledge Management"

PoolParty Semantic Suite
▸ First release in 2009
▸ Current version 5.5
▸ W3C standards compliant
▸ Over 100 installations world-wide
▸ 50% of SWC’s revenue is reinvested into development of PoolParty
▸ PoolParty on-premises or used as a cloud service
▸ KMWorld listed PoolParty as Trend-Setting Product 2015

Page 3

MAKE USE OF POOLPARTY SEMANTIC SUITE

OVERVIEW


Page 4

TECHNICAL CORE COMPONENTS

Bain Capital is a venture capital company based in Boston, MA. Since inception it has invested in hundreds of companies including AMC Entertainment, Brookstone, and Burger King. The company was co-founded by Mitt Romney.

Taxonomy & Ontology Server

Entity Extraction & Text Mining
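
A minimal sketch of dictionary-based entity extraction, run on the sample text from this slide. The `GAZETTEER` dictionary, its type labels, and the `extract_entities` helper are assumptions made for illustration. This is not PoolParty's extractor or its API, only a rough picture of how labels kept in a taxonomy could drive text mining.

```python
import re

# Hypothetical mini-gazetteer: surface form -> entity type.
# A real system would draw these labels from a taxonomy server.
GAZETTEER = {
    "Bain Capital": "Organization",
    "AMC Entertainment": "Organization",
    "Brookstone": "Organization",
    "Burger King": "Organization",
    "Mitt Romney": "Person",
    "Boston": "Place",
}

TEXT = (
    "Bain Capital is a venture capital company based in Boston, MA. "
    "Since inception it has invested in hundreds of companies including "
    "AMC Entertainment, Brookstone, and Burger King. "
    "The company was co-founded by Mitt Romney."
)


def extract_entities(text, gazetteer):
    """Return (start offset, surface form, type) for every gazetteer hit."""
    hits = []
    for label, entity_type in gazetteer.items():
        # Word boundaries avoid matching inside longer words.
        for match in re.finditer(r"\b" + re.escape(label) + r"\b", text):
            hits.append((match.start(), label, entity_type))
    return sorted(hits)


entities = extract_entities(TEXT, GAZETTEER)
for start, label, entity_type in entities:
    print(f"{start:4d}  {entity_type:12s}  {label}")
```

Each hit could then be stored as a link from the document to the matching concept rather than as a plain keyword.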

Page 5

BASIC PRINCIPLES

Benefiting from the Semantic Web in a Nutshell

Page 6

‘Things’ but not Strings: Using a ‘Semantic Knowledge Graph’

http://www.mycom.com/taxonomy/62346723
    prefLabel: Venice
    image: http://www.mycom.com/images/90546089

http://www.mycom.com/taxonomy/97345854
    prefLabel: St. Mark’s Square
    altLabel: Piazza San Marco

http://www.mycom.com/taxonomy/4543567
    prefLabel: Square
    altLabel: Piazza

has broader
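
The graph on this slide can be sketched as a list of triples. The URIs and labels below are taken from the slide. The direction of the "has broader" edge (from St. Mark's Square to Square) is an assumption, since the transcript does not say which nodes it connects. The helper names `find_things` and `pref_label` are hypothetical.

```python
# Triples (subject, predicate, object) transcribed from the slide.
VENICE = "http://www.mycom.com/taxonomy/62346723"
ST_MARKS = "http://www.mycom.com/taxonomy/97345854"
SQUARE = "http://www.mycom.com/taxonomy/4543567"

TRIPLES = [
    (VENICE, "prefLabel", "Venice"),
    (VENICE, "image", "http://www.mycom.com/images/90546089"),
    (ST_MARKS, "prefLabel", "St. Mark's Square"),
    (ST_MARKS, "altLabel", "Piazza San Marco"),
    (SQUARE, "prefLabel", "Square"),
    (SQUARE, "altLabel", "Piazza"),
    # Assumption: the "has broader" edge runs from St. Mark's Square to Square.
    (ST_MARKS, "broader", SQUARE),
]


def find_things(label):
    """Resolve a string to the URIs of all things carrying it as a label."""
    return {s for s, p, o in TRIPLES
            if p in ("prefLabel", "altLabel") and o.lower() == label.lower()}


def pref_label(uri):
    """Return the preferred label of a thing, or None if it has none."""
    return next((o for s, p, o in TRIPLES if s == uri and p == "prefLabel"), None)


# Both names resolve to the same thing, so they behave as synonyms.
same_thing = find_things("Piazza San Marco") == find_things("St. Mark's Square")
print("Same thing:", same_thing)

for uri in find_things("piazza san marco"):
    broader = [o for s, p, o in TRIPLES if s == uri and p == "broader"]
    print(pref_label(uri), "-> broader:", [pref_label(b) for b in broader])
```

Because every label hangs off a URI, a search for either name reaches the same node, which is the point of "things, not strings".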

Page 7

The power of knowledge graphs: Agility, extensibility, precision

[Figure: two panels, "Traditional approach" and "Graph-based approach". Both show documents (doc) tagged with Norway, France, Austria and Canada. Query: "Show me all documents about European Countries".]

Page 8

The power of knowledge graphs: Agility, extensibility, precision

[Figure: query "Show me all documents about European Countries". Traditional approach: the documents carry the tags "Europe,Norway", "Europe,France", "Europe,Austria" and "America,Canada". Graph-based approach: the documents are tagged with Norway, France, Austria and Canada, and the graph contains a node "Europe".]

Page 9

The power of knowledge graphs: Agility, extensibility, precision

[Figure: queries "Show me all documents about European Countries" and "Show me all documents about EU member countries". Traditional approach: the documents carry the tags "Europe,Norway", "EU,Europe,France", "EU,Europe,Austria" and "America,Canada". Graph-based approach: the documents are tagged with Norway, France, Austria and Canada, and the graph contains the nodes "Europe" and "EU".]

Page 10

The power of knowledge graphs: Agility, extensibility, precision

[Figure: queries "Show me all documents about European Countries" and "Show me all documents about EU member countries", plus the open question "French-speaking?". Traditional approach: the documents carry the tags "Europe,Norway", "French,EU,Europe,France", "EU,Europe,Austria" and "French,America,Canada". Graph-based approach: the documents are tagged with Norway, France, Austria and Canada, and the graph contains the nodes "Europe", "EU" and "French-speaking".]

Page 11

The power of knowledge graphs: Agility, extensibility, precision

[Figure: queries "Show me all documents about European Countries" and "Show me all documents about EU member countries", plus the open question "French-speaking?". Traditional approach: the documents carry the tags "Europe,Norway", "French,EU,Europe,France", "EU,Europe,Austria" and "French,America,Canada". Graph-based approach: the documents are tagged with Norway, France, Austria and Canada, and the graph contains the nodes "Europe", "EU" and "French-speaking".]

Metadata per document
1. No or little network effects
2. No reuse of metadata
3. Metadata resides in silos
4. Data quality hard to measure
5. Not machine-readable

Knowledge about metadata
1. Explicit knowledge models
2. Reusable and measurable
3. Metadata is machine-processable
4. Standards-based metadata
5. Linkable metadata opens silos
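
The contrast in these slides can be sketched in a few lines. In the sketch below, documents carry only a country tag, while memberships such as "Europe", "EU" and "French-speaking" live in a separate graph. The document ids, the `GRAPH` dictionary layout and the `docs_about` helper are assumptions for illustration. The groupings themselves follow the slides.

```python
# Documents carry only a country tag (doc ids are hypothetical).
DOCS = {"doc1": "Norway", "doc2": "France", "doc3": "Austria", "doc4": "Canada"}

# Knowledge about the tags lives in one place, the graph.
GRAPH = {
    "Norway": {"Europe"},
    "France": {"Europe", "EU", "French-speaking"},
    "Austria": {"Europe", "EU"},
    "Canada": {"America", "French-speaking"},
}


def docs_about(group):
    """Return documents whose country belongs to the given group."""
    return sorted(doc for doc, country in DOCS.items()
                  if group in GRAPH.get(country, set()))


print(docs_about("Europe"))           # ['doc1', 'doc2', 'doc3']
print(docs_about("EU"))               # ['doc2', 'doc3']
print(docs_about("French-speaking"))  # ['doc2', 'doc4']
```

Supporting a new kind of question means adding entries to `GRAPH` once. With per-document metadata, every affected document would have to be re-tagged.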

Page 12

360-degree views over various content repositories


Page 13

Semantic Web Standards & Technologies

Page 15

Simple Knowledge Organization System (SKOS)
Taxonomies and controlled vocabularies

http://www.w3.org/2004/02/skos/
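
As a rough illustration of what a SKOS concept looks like on disk, the sketch below writes one concept from page 6 as Turtle using the SKOS core namespace. The `concept_to_turtle` helper is hypothetical, it does no escaping of quotes in labels, and the `skos:broader` link from St. Mark's Square to Square is an assumption about the diagram.

```python
SKOS_NS = "http://www.w3.org/2004/02/skos/core#"


def concept_to_turtle(uri, pref_label, alt_labels=(), broader=()):
    """Serialize one SKOS concept as a Turtle snippet (no literal escaping)."""
    lines = [f"<{uri}> a skos:Concept ;",
             f'    skos:prefLabel "{pref_label}"']
    for alt in alt_labels:
        lines[-1] += " ;"  # close the previous predicate before adding one
        lines.append(f'    skos:altLabel "{alt}"')
    for parent in broader:
        lines[-1] += " ;"
        lines.append(f"    skos:broader <{parent}>")
    lines[-1] += " ."  # terminate the statement
    return "\n".join(lines)


turtle = f"@prefix skos: <{SKOS_NS}> .\n\n" + concept_to_turtle(
    "http://www.mycom.com/taxonomy/97345854",
    "St. Mark's Square",
    alt_labels=["Piazza San Marco"],
    # Assumption: Square is the broader concept of St. Mark's Square.
    broader=["http://www.mycom.com/taxonomy/4543567"],
)
print(turtle)
```

For real data, an RDF library that handles escaping and language tags would be a safer choice than string formatting.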

Page 16

From Simple SKOS to large knowledge graphs

Inputs: your data (e.g. Excel), your docs, your CMS

Generate 1st version of SKOS taxonomy
- Reuse of existing vocabularies
- Corpus Analysis
- Excel import
- XML import
- Linked data harvester

Edit, extend & curate taxonomy
- Taxonomy Editing
- Collaborative workflows
- Free term extraction
- Tag recommender
- Quality Checker

Extend schema, apply ontologies, use SKOS-XL
- Reuse existing ontologies
- Create custom schemes
- Apply SKOS-XL
- Apply ontologies on your SKOS taxonomy

Link and map between taxonomies and LD graphs
- Automatic mapping between taxonomies
- Linked Data frontend
- Link to other LD graphs, e.g. DBpedia or Geonames

Page 18

Complex queries with SPARQL

PREFIX mrv-schema: <http://gbpn.org/mrv-schema/>
PREFIX qb: <http://purl.org/linked-data/cube#>

SELECT DISTINCT *
WHERE {
  GRAPH <http://gbpn.org/mrv> {
    ?observation mrv-schema:year ?year.
    ?observation mrv-schema:region ?region.
    ?observation mrv-schema:region <http://gbpn.org/mrv-thes/region/India>.
    ?observation mrv-schema:scenario ?scenario.
    ?observation mrv-schema:scenario <http://gbpn.org/mrv-thes/scenario/deep-efficiency>.
    {
      ?observation mrv-schema:urbanizationType ?urbanizationType.
      ?observation mrv-schema:urbanizationType <http://gbpn.org/mrv-thes/urbanization-type/urban>.
      ?observation mrv-schema:buildingType ?buildingType.
      ?observation mrv-schema:buildingType <http://gbpn.org/mrv-thes/building-type/MF>.
      ?observation mrv-schema:publicBuildingType ?publicBuildingType.
      ?observation mrv-schema:publicBuildingType <http://gbpn.org/mrv-thes/public-building-type/NO>.
    }
    UNION
    {
      ?observation mrv-schema:urbanizationType ?urbanizationType.
      ?observation mrv-schema:urbanizationType <http://gbpn.org/mrv-thes/urbanization-type/urban>.
      ?observation mrv-schema:buildingType ?buildingType.
      ?observation mrv-schema:buildingType <http://gbpn.org/mrv-thes/building-type/Slums>.
      ?observation mrv-schema:publicBuildingType ?publicBuildingType.
      ?observation mrv-schema:publicBuildingType <http://gbpn.org/mrv-thes/public-building-type/NO>.
    }
    UNION
    {
      …….
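
The UNION branches in this query differ only in the thesaurus values they fix, so such a query would typically be generated rather than written by hand. The sketch below builds the two branches visible on the slide. The names `DIMENSIONS`, `COMBINATIONS` and `branch` are hypothetical, and the branches elided on the slide are deliberately not reconstructed.

```python
THES = "http://gbpn.org/mrv-thes"

# (property name, thesaurus path segment) pairs used in each UNION branch.
DIMENSIONS = [
    ("urbanizationType", "urbanization-type"),
    ("buildingType", "building-type"),
    ("publicBuildingType", "public-building-type"),
]

# Only the two value combinations visible on the slide; the rest is elided there.
COMBINATIONS = [
    ("urban", "MF", "NO"),
    ("urban", "Slums", "NO"),
]


def branch(values):
    """Build one { ... } group that binds and constrains each dimension."""
    lines = []
    for (dimension, segment), value in zip(DIMENSIONS, values):
        lines.append(f"?observation mrv-schema:{dimension} ?{dimension}.")
        lines.append(
            f"?observation mrv-schema:{dimension} <{THES}/{segment}/{value}>."
        )
    return "{\n  " + "\n  ".join(lines) + "\n}"


union_block = "\nUNION\n".join(branch(combo) for combo in COMBINATIONS)
print(union_block)
```

The resulting text would be placed inside the `GRAPH <http://gbpn.org/mrv> { ... }` group after the region and scenario patterns.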

Page 19

Challenges in Data & Information Management

1. Distributed Data Sources
2. Differing Formats
3. Implicit Semantics
4. Dubious Provenance
5. Missing Licenses
6. Unclear Topicality

Page 20

Linked Data: Discovering Answers to Complex Questions

To answer the following question,

“Are there interdependencies between the Human Development Index of certain countries and the regional research activities concerning specific types of illnesses?”

the following sources can be consulted and linked:

● MeSH (Medical Subject Headings)
● PubMed
● Geonames
● DBpedia

Page 21

Climate Tagger

Help organizations in the climate and development arenas catalogue, categorize, contextualize, and connect data and information resources.

Climate Tagger is backed by the expansive Climate Compatible Development Thesaurus.

http://www.climatetagger.net

Page 22

CTCN Matchmaking

Controlled vocabularies enable accurate matchmaking between ‘problem statements’ and capabilities of solution providers.

Matchmaking is based upon the Climate Compatible Development Thesaurus.

Reference

Page 23

healthdirect Australia

Integrated views and semantic search over more than 100 trusted sources.

Harmonization of various metadata systems through the use of a central vocabulary hub: Australian Health Thesaurus.

http://www.healthdirect.gov.au

Page 24

CONNECT

Andreas Koller
CIO, Semantic Web Company

© Semantic Web Company - http://www.semantic-web.at/ and http://www.poolparty.biz/