Tomas Knap | RDF Data Processing and Integration Tasks in UnifiedViews: Use Cases & Lessons Learned

Tomas Knap, Semantic Web Company

RDF Data Processing and Integration Tasks in UnifiedViews

Use Cases & Lessons Learned

Agenda

▸ UnifiedViews
  ▹ Introduction of the Tool
▸ UnifiedViews Use Cases
  ▹ 3 Use Cases
  ▹ Benefits/Lessons Learned

UnifiedViews: Introduction of the Tool

UnifiedViews Motivation

▸ Maintaining RDF data processing tasks is challenging
  ▹ Different tools
  ▹ Different configurations
  ▹ Tens of data processing tasks sharing parts of the data processing
▸ Debugging

UnifiedViews Approach

▸ UnifiedViews is an ETL tool for RDF data processing
  ▹ Allows users to manage RDF data processing tasks
  ▹ Natively supports the RDF data format

UnifiedViews Approach

▸ Standard maintenance interface
  ▹ Define, execute, monitor, schedule, and share data processing tasks
  ▹ Predefined and customizable building blocks (plugins) to set up the individual data processing tasks
▸ Debugging features
▸ Simplified documentation
  ▹ Visualizations of the prepared tasks
    ■ Plugins
    ■ Data flow
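
The plugins and data flow of a task can be pictured as a directed graph: plugins are the nodes, and each edge passes the output of one plugin to the next. The Python sketch below illustrates that idea only. It does not use the UnifiedViews API; the plugin names and the `run_pipeline` helper are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def run_pipeline(plugins, edges):
    """Run plugins in data-flow order.

    plugins: dict mapping a plugin name to a callable that takes a list
             of upstream outputs and returns its own output.
    edges:   list of (source, target) pairs describing the data flow.
    """
    # Map every plugin to the set of plugins it depends on.
    deps = {name: set() for name in plugins}
    for source, target in edges:
        deps[target].add(source)

    outputs = {}
    # static_order() yields each plugin only after all of its dependencies.
    order = list(TopologicalSorter(deps).static_order())
    for name in order:
        inputs = [outputs[dep] for dep in sorted(deps[name])]
        outputs[name] = plugins[name](inputs)
    return order, outputs


# Hypothetical three-step pipeline: extract -> transform -> load.
plugins = {
    "extract_csv": lambda inputs: ["alice", "bob"],
    "to_upper": lambda inputs: [value.upper() for value in inputs[0]],
    "load": lambda inputs: len(inputs[0]),
}
edges = [("extract_csv", "to_upper"), ("to_upper", "load")]

order, outputs = run_pipeline(plugins, edges)
print(order)
print(outputs["to_upper"])
```

Keeping the graph separate from the plugin code is what lets a tool validate, visualize, and schedule a pipeline without knowing what each plugin does internally.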

UnifiedViews Pipeline

UnifiedViews Core Components

▸ Web administration interface
  ▹ Define and maintain pipelines
  ▹ Validate, execute, monitor pipelines
  ▹ Possibility to schedule pipelines
    ■ Notifications
  ▹ Possibility to debug pipelines
  ▹ Possibility to share pipelines and plugins
  ▹ Define and maintain plugins
  ▹ Multi-user environment, SSO support
▸ Robust engine running the tasks
▸ API to work with tasks, executions, scheduled events

UnifiedViews Core Plugins

▸ Set of Core plugins available
  ▹ Extractors
    ■ Obtaining external sources (CSV, DBF, XLS, XML files, RDF data, or relational tables)
  ▹ Transformers
    ■ Transforming the sources between various formats (e.g. CSV files to RDF data, relational tables to RDF data)
    ■ Executing typical transformations such as SPARQL Update queries or XSL transformations
  ▹ Loaders
    ■ Loading the transformed and curated data to external systems and repositories
▸ 35+ plugins
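
To make the extractor / transformer / loader split concrete, here is a minimal Python sketch of a CSV-to-RDF flow that emits N-Triples. It is an illustration under stated assumptions, not the code of any UnifiedViews plugin: the `http://example.org/` namespace, the function names, and the in-memory "store" are all hypothetical, and the input is assumed to be well-formed CSV whose identifiers and column names are safe to use inside IRIs.

```python
import csv
import io

BASE = "http://example.org/"  # hypothetical namespace, not from the slides


def extract(csv_text):
    """Extractor: parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def escape_literal(value):
    """Escape characters that are not allowed raw in an N-Triples literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def transform(rows, id_column):
    """Transformer: emit one N-Triples line per non-empty, non-key cell."""
    triples = []
    for row in rows:
        # Assumes the key value needs no IRI escaping.
        subject = f"<{BASE}resource/{row[id_column]}>"
        for column, value in row.items():
            if column == id_column or not value:
                continue
            predicate = f"<{BASE}property/{column}>"
            triples.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return triples


def load(triples, store):
    """Loader: append to a list that stands in for an external repository."""
    store.extend(triples)
    return len(triples)


csv_text = "id,name,city\n1,Alice,Vienna\n2,Bob,Prague\n"
store = []
count = load(transform(extract(csv_text), "id"), store)
print(count)
print(store[0])
```

Each stage only depends on the output of the previous one, which is what allows a stage to be swapped (for example, a different loader) without touching the rest of the flow.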

UnifiedViews Team

PoolParty Semantic Integrator and UnifiedViews

▸ UnifiedViews is part of PoolParty Semantic Integrator
▸ A semantic technology suite
  ▹ Organize and maintain company knowledge
  ▹ Annotate documents with concepts from the knowledge base
  ▹ Provide focused search on top of the annotated document space
▸ https://www.poolparty.biz/
  ▹ Or please visit the PoolParty booth

UnifiedViews Availability

▸ Available under an open source license (GPL + LGPL v3)
  ▹ Commercial license also available as part of PoolParty Semantic Integrator
▸ Hosted on GitHub
  ▹ https://github.com/UnifiedViews
▸ Latest release (June 2016):
  ▹ UnifiedViews Core 2.3.1
▸ http://unifiedviews.eu

UnifiedViews Use Cases: Overview

3 Use Cases

1. Aligned Project
  ▹ Extraction/Annotation of data from Atlassian Confluence/JIRA
2. Boehringer Ingelheim
  ▹ Publication tracker
3. World Bank
  ▹ Annotation of World Bank docs
  ▹ Integration with MarkLogic

Use Case 1: Aligned Project

About

▸ Aligned project:
  ▹ H2020, http://aligned-project.eu/
▸ One of the goals:
  ▹ Integrate outputs from commercial tools such as Atlassian Confluence, JIRA to bring a data-centric approach to governance of software and data engineering

UnifiedViews Use Case

▸ UnifiedViews pipeline
  ▹ Extracting data from Atlassian Confluence, JIRA
  ▹ Annotating textual content with a taxonomy maintained in PoolParty
  ▹ Loading everything to a remote triple store
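
The annotation step can be illustrated with a toy Python sketch that tags a piece of text with taxonomy concepts and serializes the result as N-Triples ready for loading. This is only a stand-in: PoolParty's actual concept extraction is not shown here, simple whole-word label matching is swapped in for it, and the taxonomy, the `example.org` URIs, the function names, and the choice of `dcterms:subject` as the linking property are all assumptions made for the example.

```python
import re

# Hypothetical mini taxonomy: concept URI -> labels (preferred and alternative).
TAXONOMY = {
    "http://example.org/concept/bug": ["bug", "defect"],
    "http://example.org/concept/release": ["release"],
}

# Assumed linking property; the slides do not say which one is used.
DCTERMS_SUBJECT = "http://purl.org/dc/terms/subject"


def annotate(text, taxonomy):
    """Return the sorted concept URIs whose labels occur as whole words."""
    found = set()
    for uri, labels in taxonomy.items():
        for label in labels:
            pattern = rf"\b{re.escape(label)}\b"
            if re.search(pattern, text, flags=re.IGNORECASE):
                found.add(uri)
                break  # one matching label is enough for this concept
    return sorted(found)


def to_triples(doc_uri, concept_uris):
    """Serialize document-to-concept links as N-Triples lines."""
    return [f"<{doc_uri}> <{DCTERMS_SUBJECT}> <{concept}> ."
            for concept in concept_uris]


issue_text = "Defect found while preparing the next release."
concepts = annotate(issue_text, TAXONOMY)
triples = to_triples("http://example.org/issue/42", concepts)
for line in triples:
    print(line)
```

In a real pipeline the resulting triples would be handed to a loader that writes them to the remote triple store.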

UnifiedViews Pipeline

Benefits, Lessons Learned

▸ Predefined plugins which may be used out of the box
  ▹ No heavy programming
▸ Easy pipeline management via user interface
▸ Further support when preparing the pipeline
  ▹ Pipeline validation
  ▹ Pipeline debugging

Use Case 2: Boehringer Ingelheim

Publication Tracker

About

▸ Boehringer Ingelheim wanted to get a better overview of worldwide research activities
▸ Extract and annotate articles published on PubMed
  ▹ http://www.ncbi.nlm.nih.gov/pubmed
▸ Linking of unstructured and structured / internal and external information

UnifiedViews Use Case

Benefits, Lessons Learned

▸ Pipelines in UnifiedViews may be easily
  ▹ scheduled
  ▹ extended in the future
▸ Detailed information about the pipeline executions is available
  ▹ Events, logs
▸ Maintenance simplified

Benefits, Lessons Learned

▸ Missing
  ▹ Long running pipelines
    ■ Tighter integration of UnifiedViews and PoolParty Semantic Integrator
  ▹ Loops, conditional execution of plugins

Use Case 3: Annotation of World Bank Documents

Integration with MarkLogic

About

▸ Goal: Search over annotated World Bank documents
  ▹ World Bank topical taxonomy
  ▹ Geo taxonomy
▸ Demo:
  ▹ http://marklogic-demo.poolparty.biz

UnifiedViews Use Case

▸ UnifiedViews pipeline to annotate portions of the World Bank documents
  ▹ Country & region information annotated with Geo taxonomy
  ▹ Full text, topics annotated with World Bank topical taxonomy

UnifiedViews Pipeline

Benefits, Lessons Learned

▸ Easy pipeline management via user interface
  ▹ Easy pipeline configuration
▸ Reusing already existing plugins
  ▹ Pipeline prepared quickly

Summary: Lessons Learned

Summary

▸ UnifiedViews
  ▹ UnifiedViews and PoolParty Semantic Integrator
▸ UnifiedViews Use Cases
  ▹ Conversion of sources to RDF data
  ▹ Annotation of sources
  ▹ Enrichment of the data
  ▹ Publication of the curated data to the target store
▸ UnifiedViews 2.0 in 5mins

Summarized Lessons Learned

▸ Easy pipeline management via user interface
▸ Predefined plugins which may be used out of the box
  ▹ No heavy programming
  ▹ Simplified pipeline creation
▸ Further support when preparing pipeline
  ▹ Pipeline validation
  ▹ Pipeline debugging
▸ Pipeline scheduling

Contact

Tomas Knap, PhD
Technical Consultant, Researcher
Semantic Web Company

[email protected]
▸ https://www.semantic-web.at/
▸ https://twitter.com/semwebcompany

© Semantic Web Company - http://www.semantic-web.at/ and http://www.poolparty.biz/