Cloud-based Linked Data Management for Self-service Application Development

Peter Haase, Michael Schmidt

fluid Operations AG

International Workshop on Scalable Semantic Computing

Hangzhou, November 6, 2010

Increasing Popularity of Linked Open Data

• LOD cloud as of May 2009

• 4.7 billion triples

• 142 million RDF links

• LOD cloud as of Sep 2010

• 25 billion triples

• 395 million RDF links

• Covering various domains

• Media

• Life Science

• Geography

• Publications

• …

Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/

Agenda

• Linked Data Application Development: Opportunities and Challenges

• Information Workbench as Platform for Linked Data Application Development

• Accessing Linked Data as a Service: Vision and First Experiences

• Conclusions

New Opportunities

• Established standards define common data models, vocabularies, semantics

• RDF/RDFS, OWL, SPARQL

• From data silos to a web of data

• Ease of specifying relationships in a decentralized way

• Innovative applications that integrate data from various domains and sources

• Linked Government Data

• Linked Open Data

• Benefits of Linked Data in the enterprise

• Semantically integrate and interlink data scattered among systems

• Cross the chasm between enterprise-internal and public data

• Leverage semantic technologies for improved search and presentation

Challenges in Building Linked Data Applications

• Heterogeneity in various dimensions

• Location of data (internal / external, open / closed)

• Identifiers, structure and vocabularies

• Ownership of data

• Structured and unstructured data

• Quality of Linked Data

• Various forms of imperfection (erroneous, incomplete, imprecise data)

• Trustworthiness

• End-user oriented interfaces and interaction paradigms

• Interfaces that operate over large amounts of data, flexible and dynamic schemas

• Meaningful aggregation of the data

• Support for expressive queries, while retaining intuitive interfaces

• User-generated content

• Collaborative annotation and knowledge acquisition

The Information Workbench

• Platform for Linked Data application development

• Base functionality to build applications without any programming

• SDK for easy extensions

• Covering the entire lifecycle of interacting with Linked Data

• Discovery of data sources

• Integration of data sources

• Visualization

• Search and Exploration

• Collaborative generation of data

• Targeted at

• Semantic Web Community

• Linked Open Data community

• Innovative Enterprises

• Demo and source available at http://iwb.fluidops.com/.

The IWB Application Development Process

Linked Open Data Discovery

• Visually explore data sets registered to global registries

• Sort/filter data sets by domain, location, and many more facets to identify relevant data

LOD Discovery with the Information Workbench

Data Integration

• Integrate discovered Linked Data

• Add providers for internal and external legacy data sources

• Improve data quality, e.g. via incremental refinement of ontology

Customization

• Declaratively specify UI based on available pool of widgets

• Embed reports and charts into wiki pages and wiki page templates

• Semantically annotate and interlink connected resources

Advanced System Configuration and Extensions

• Use APIs and SDKs to implement own widgets and mashups

• Script data providers to integrate data behind non-standard interfaces

• Develop and integrate own modules, e.g. for customized search and information extraction

Information Workbench Architecture

• Extensible, widget-based UI

• Resource-centric presentation

• Living UI, which exploits semantics of underlying data

• Large collection of predefined widgets, easily extendable

• Search and information access

• Coexistence of structured and unstructured data

• Different search paradigms (keyword and faceted search, semantic query completion)

• Data integration through providers

• Convert data from a data source into the RDF data format

• Customizable, easily extensible

• Use of public LOD registries
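The provider idea above (convert data from a data source into RDF) can be sketched in a few lines. This is not the Information Workbench provider API; the function names, the `http://example.org/` namespace, and the CSV columns are assumptions made purely for illustration, and the sketch assumes the id column contains IRI-safe values.

```python
import csv
import io

BASE = "http://example.org/"  # hypothetical namespace, not from the talk


def escape_literal(value: str) -> str:
    """Escape characters that must not appear raw in an N-Triples literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def rows_to_ntriples(csv_text: str, id_column: str) -> list[str]:
    """Emit one N-Triples line per non-empty, non-id cell of the CSV input."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{BASE}resource/{row[id_column]}>"
        for column, value in row.items():
            if column == id_column or not value:
                continue  # skip the key itself and empty cells
            predicate = f"<{BASE}property/{column}>"
            triples.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return triples


# Hypothetical legacy export; the second row has no city.
legacy_export = "id,name,city\n1,fluid Operations,Walldorf\n2,Example Corp,\n"
for line in rows_to_ntriples(legacy_export, "id"):
    print(line)
```

A real provider would additionally map columns to an existing vocabulary instead of minting one property per column.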

Information Workbench Architecture

In the remainder of the talk

• Focus on challenges in data integration layer

• In particular: virtualized, cloud-based integration of data sources

Linked Data Integration – Where we are

• Non-RDF data stored locally in the repository

• On demand, this data can be updated periodically

• RDF data can be…

• persisted in repository, or

• connected via naive federation layer (where possible)

Linked Data Integration – Our Vision

• Current way of publishing

• Authors provide RDF dumps linked on some homepage

• Provisioning information missing (data zipped, split, available in different formats, …)

• Often also SPARQL endpoints (typically with poor response times)

• How it should be done

• Rich meta-data describing content, structure, properties of the data

• Enable exploration of data via meta repositories

• Efforts have been made (see CKAN), but…

• … poor quality of meta data and data

• Possibility for end-users to buy service guarantees

• Integration details should be irrelevant to the end-user

Software Components

• Definition of "Software Components"

"A software component is a unit of composition with contractually specified interfaces and explicit context dependencies only. A software component can be deployed independently and is subject to composition by third parties." (wikipedia.org)

Data Components

• What we need for Linked Data: "Data Components"

• Interfaces: data components with precise interfaces and metadata

• Deployment: easy provisioning and integration in applications

• Composition: transparent access to atomic or composite units

Next Step: Data-as-a-Service

• Idea

• Producer provides data components

• Consumers can access data components as a service

• Possible realization: use cloud technology!

• Sold on demand

• Elastic

• Fully managed by provider

The characteristics of cloud services such as AWS exactly match these needs (just as is the case for Software-as-a-Service)

Virtualized Semantic Repositories

Identification, composition, and use of (fragments of) datasets in manners

that abstract the applications from the specific setup of the data

management service (such as local vs. remote, federation, and distribution)
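To make the abstraction concrete, here is a minimal sketch of how application code could stay unaware of whether it talks to an atomic or a composite data component. All class and function names are hypothetical and the in-memory store only stands in for a real repository; this is an illustration of the idea, not the design used in the Information Workbench.

```python
from typing import Iterable, Iterator, Optional, Protocol, Tuple

Triple = Tuple[str, str, str]
Pattern = Tuple[Optional[str], Optional[str], Optional[str]]  # None = wildcard


class Repository(Protocol):
    """The only interface the application depends on."""

    def match(self, pattern: Pattern) -> Iterator[Triple]: ...


class LocalRepository:
    """Holds triples in memory; stands in for a locally persisted store."""

    def __init__(self, triples: Iterable[Triple]) -> None:
        self._triples = list(triples)

    def match(self, pattern: Pattern) -> Iterator[Triple]:
        for triple in self._triples:
            if all(p is None or p == t for p, t in zip(pattern, triple)):
                yield triple


class FederatedRepository:
    """Composite: forwards a pattern to every member and merges the results."""

    def __init__(self, members: Iterable[Repository]) -> None:
        self._members = list(members)

    def match(self, pattern: Pattern) -> Iterator[Triple]:
        seen = set()
        for member in self._members:
            for triple in member.match(pattern):
                if triple not in seen:  # drop duplicates across members
                    seen.add(triple)
                    yield triple


def names(repo: Repository) -> list[str]:
    """Application code: identical for atomic and composite repositories."""
    return sorted(o for _, _, o in repo.match((None, "name", None)))


a = LocalRepository([("ex:1", "name", "Alice")])
b = LocalRepository([("ex:2", "name", "Bob"), ("ex:1", "name", "Alice")])
print(names(a))
print(names(FederatedRepository([a, b])))
```

Because `FederatedRepository` itself satisfies `Repository`, federations can be nested, which mirrors the "atomic or composite units" requirement for data components.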

Challenge 1: Precise Interfaces

• Standardization efforts for RDF meta data descriptions

• Statistical Core Vocabulary (SCOVO)

• Very flexible

• Forms a good basis for describing RDF statistics

• Vocabulary of Interlinked Data Sets (voiD)

• Based on SCOVO

• Used to publish meta information about Linked Data Sources

• voiD 2 (in progress)

• Dataset meta information, like source, description, dump, license

• Used vocabularies/ontologies

• Dataset interlinking

• Statistics (e.g. distinct subject count, triples with given predicate etc.)

• Open data registries

• Comprehensive Knowledge Archive Network (CKAN)

• Based on Dublin Core and DERI's Data Catalog Vocabulary (dcat)
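As an illustration of the kind of dataset description listed above (source, dump, license, used vocabularies, statistics), the following sketch serializes a small record as Turtle using terms from the published voiD vocabulary (`void:Dataset`, `void:sparqlEndpoint`, `void:dataDump`, `void:vocabulary`, `void:triples`, `void:distinctSubjects`). The dataset, its IRIs, and its numbers are invented, and the helper names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetInfo:
    iri: str
    title: str  # assumed to contain no double quotes or line breaks
    license: str
    sparql_endpoint: str
    data_dump: str
    triples: int
    distinct_subjects: int
    vocabularies: list[str] = field(default_factory=list)


def to_void_turtle(info: DatasetInfo) -> str:
    """Serialize the record as one Turtle resource description."""
    statements = [
        "a void:Dataset",
        f'dcterms:title "{info.title}"',
        f"dcterms:license <{info.license}>",
        f"void:sparqlEndpoint <{info.sparql_endpoint}>",
        f"void:dataDump <{info.data_dump}>",
        f"void:triples {info.triples}",
        f"void:distinctSubjects {info.distinct_subjects}",
    ]
    statements += [f"void:vocabulary <{v}>" for v in info.vocabularies]
    body = " ;\n    ".join(statements)
    return (
        "@prefix void: <http://rdfs.org/ns/void#> .\n"
        "@prefix dcterms: <http://purl.org/dc/terms/> .\n\n"
        f"<{info.iri}>\n    {body} .\n"
    )


# Invented example values for illustration only.
example = DatasetInfo(
    iri="http://example.org/dataset/cities",
    title="Example city data",
    license="http://example.org/license",
    sparql_endpoint="http://example.org/sparql",
    data_dump="http://example.org/dumps/cities.nt.gz",
    triples=1200,
    distinct_subjects=300,
    vocabularies=["http://xmlns.com/foaf/0.1/"],
)
print(to_void_turtle(example))
```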

Challenge 2: Deployment

• Based on Interfaces

• Possibly based on cloud technologies

• State of the art not satisfactory

• URLs pointing to a human-readable description, but not the actual endpoint

• Various forms of syntax errors in RDF documents

• MIME types incorrect or missing

• Endpoints/servers not reachable

• Endpoint/file password protected
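The failure classes listed above can be told apart from little more than the HTTP status and the `Content-Type` header. The sketch below is a hypothetical classifier, not a tool from the talk: the category labels and the set of accepted MIME types are assumptions, and it deliberately does not parse the payload, so RDF syntax errors would need a separate check.

```python
from typing import Optional

# MIME types treated as "actual data" in this sketch (an assumption).
RDF_MIME_TYPES = {
    "application/rdf+xml",
    "text/turtle",
    "application/n-triples",
    "application/sparql-results+xml",
    "application/sparql-results+json",
}


def diagnose(status: Optional[int], content_type: Optional[str]) -> str:
    """Classify one access attempt; status None means no response at all."""
    if status is None:
        return "not reachable"
    if status in (401, 403):
        return "access protected"
    if status >= 400:
        return f"HTTP error ({status})"
    if not content_type:
        return "MIME type missing"
    mime = content_type.split(";")[0].strip().lower()  # drop charset etc.
    if mime in ("text/html", "application/xhtml+xml"):
        return "human-readable description, not the data"
    if mime not in RDF_MIME_TYPES:
        return f"unexpected MIME type ({mime})"
    return "ok"


# Hypothetical observations for four registered dataset URLs.
observations = [
    (None, None),
    (401, "text/html"),
    (200, "text/html; charset=utf-8"),
    (200, "text/turtle"),
]
for status, content_type in observations:
    print(status, content_type, "->", diagnose(status, content_type))
```

Treating every HTML response as "not the data" is a simplification, since it would also flag pages that embed RDFa.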

Some Statistics

Based on subset of LOD cloud

(excluding a few extremely large datasets)

Challenge 3: Composition

Query Processing over Federation: State-of-the-Art

• First public implementations exist

• AliBaba federation layer on top of Sesame

• Benchmark results show severe bottlenecks

• Efficiency issues

• Which data sets deliver results for which graph patterns?

• Localized execution of subqueries

• Global estimation of subquery result sizes

• Join order optimization

• Incremental processing with completeness/correctness guarantees

Peter Haase, Tobias Mathäß, Michael Ziller: An Evaluation of Approaches to Federated Query Processing over Linked Data. In Proc. I-Semantics 2010.
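Two of the efficiency issues above, source selection and join ordering, can be illustrated with a toy planner driven by per-source predicate counts. This is a simplified sketch and not the approach evaluated in the cited paper: the source names and counts are invented, every pattern is assumed to have a fixed predicate, and the greedy ordering ignores whether consecutive patterns actually share a variable.

```python
from typing import Dict, List, Tuple

Pattern = Tuple[str, str, str]  # (subject, predicate, object); "?x" marks a variable

# Hypothetical statistics: source -> predicate -> number of triples.
STATS: Dict[str, Dict[str, int]] = {
    "dbpedia": {"rdfs:label": 5000, "dbo:birthPlace": 800},
    "geonames": {"rdfs:label": 3000, "geo:lat": 3000},
    "dblp": {"dc:creator": 1200},
}


def select_sources(pattern: Pattern, stats: Dict[str, Dict[str, int]]) -> List[str]:
    """Sources whose statistics mention the pattern's predicate."""
    predicate = pattern[1]
    return sorted(s for s, counts in stats.items() if predicate in counts)


def estimate(pattern: Pattern, stats: Dict[str, Dict[str, int]]) -> int:
    """Upper bound on matches: predicate counts summed over relevant sources."""
    return sum(stats[s][pattern[1]] for s in select_sources(pattern, stats))


def order_joins(patterns: List[Pattern], stats: Dict[str, Dict[str, int]]) -> List[Pattern]:
    """Greedy heuristic: evaluate the pattern with the smallest estimate first."""
    return sorted(patterns, key=lambda p: estimate(p, stats))


QUERY: List[Pattern] = [
    ("?city", "rdfs:label", "?name"),
    ("?person", "dbo:birthPlace", "?city"),
    ("?city", "geo:lat", "?lat"),
]
for pattern in order_joins(QUERY, STATS):
    print(pattern, select_sources(pattern, STATS), estimate(pattern, STATS))
```

Even this crude plan avoids sending the `dbo:birthPlace` pattern to sources that cannot answer it and postpones the least selective `rdfs:label` pattern to the end.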

Linked Data Federation: Vision

[Figure: architecture diagram. Labels: Consumer, Virtualized Federation Layer, SPARQL Endpoint, Repository Component, Dump, Data Component, Publisher, Data Source (×4), Self-service Data Provisioning (Data-as-a-Service)]

Challenge 3: Composition

Rich theory for Federated Query Processing exists in the database community

• Data Statistics

• Accuracy vs. index size

• Updating statistics

• Query Optimization

• Join types (e.g., semi-joins)

• Minimizing communication cost

• Optimizing execution localization

• Streaming results

Olaf Görlitz, Steffen Staab: Federated Data Management and Query Optimization for Linked Open Data. In "New Directions in Web Data Management", to appear.
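The semi-join mentioned above reduces communication cost by shipping only those remote rows that can actually find a join partner. The following in-memory sketch shows the principle on invented bindings; the function names are hypothetical and a real federation layer would send the key set to the remote endpoint rather than filter locally.

```python
from typing import Dict, List

Row = Dict[str, str]


def semi_join_reduce(left: List[Row], right: List[Row], key: str) -> List[Row]:
    """Keep only the right-hand rows whose join key also occurs on the left."""
    keys = {row[key] for row in left}
    return [row for row in right if row[key] in keys]


def hash_join(left: List[Row], right: List[Row], key: str) -> List[Row]:
    """Plain in-memory equi-join on one shared variable."""
    index: Dict[str, List[Row]] = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


# Hypothetical bindings held by two different sources.
people = [
    {"person": "ex:p1", "city": "ex:Walldorf"},
    {"person": "ex:p2", "city": "ex:Hangzhou"},
]
cities = [
    {"city": "ex:Walldorf", "label": "Walldorf"},
    {"city": "ex:Hangzhou", "label": "Hangzhou"},
    {"city": "ex:Berlin", "label": "Berlin"},
    {"city": "ex:Paris", "label": "Paris"},
    {"city": "ex:Vienna", "label": "Vienna"},
]

reduced = semi_join_reduce(people, cities, "city")
print(f"rows shipped without semi-join: {len(cities)}")
print(f"rows shipped with semi-join:    {len(reduced)}")
print(hash_join(people, reduced, "city"))
```

The join result is the same either way; the saving only pays off when the set of keys sent to the remote source is small compared with the rows it filters out.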

Challenges

• Satisfying and standardized statistics framework for RDF

• voiD 2.0 not yet fully satisfactory (e.g. histograms missing)

• Therefore:

• Establish comprehensive, standardized statistics framework for RDF

• Should also be tailored to query optimization

• Address specifics of RDF and SPARQL

• Graph-structured data model

• Importance of efficient merge joins

• OPTIONAL queries

• Exploit built-in semantics of RDFS

• Semantic Query Optimization

Michael Schmidt, Michael Meier, Georg Lausen: Foundations of SPARQL Query

Optimization. In Proc. ICDT 2010.
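To show why histograms matter for query optimization, the sketch below builds an equi-width histogram over numeric literal values and uses it to estimate how many values satisfy a `FILTER (?x < threshold)`. It is a generic textbook technique, not a proposal from the talk or from voiD; the function names and sample values are invented, and all values are assumed to lie within `[lo, hi]`.

```python
from typing import List, Sequence


def build_histogram(values: Sequence[float], lo: float, hi: float, buckets: int) -> List[int]:
    """Count values per equi-width bucket over the range [lo, hi]."""
    counts = [0] * buckets
    width = (hi - lo) / buckets
    for v in values:
        index = min(int((v - lo) / width), buckets - 1)  # v == hi goes to the last bucket
        counts[index] += 1
    return counts


def estimate_less_than(counts: Sequence[int], lo: float, hi: float, threshold: float) -> float:
    """Estimate how many values are < threshold, assuming uniformity inside each bucket."""
    width = (hi - lo) / len(counts)
    total = 0.0
    for i, count in enumerate(counts):
        bucket_lo = lo + i * width
        bucket_hi = bucket_lo + width
        if threshold >= bucket_hi:
            total += count  # bucket lies entirely below the threshold
        elif threshold > bucket_lo:
            total += count * (threshold - bucket_lo) / width  # partial overlap
    return total


# Invented object values of some numeric property.
values = [10, 20, 30, 40, 150, 900]
histogram = build_histogram(values, lo=0, hi=1000, buckets=4)
print(histogram)
print(estimate_less_than(histogram, 0, 1000, 125))
```

The estimate for a threshold of 125 differs from the true count of 4, which illustrates the accuracy versus index size trade-off: more buckets give better estimates at the price of larger statistics.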

Conclusion

• Clear benefits of Linked Data application development platform

• Discovery of relevant data

• Virtualized integration of data sources as a key step to success

• Fast customization and extensions

• Information Workbench addressing these needs

• Still some work left to do

• Metadata quality and standardization

• Data quality in general, trust

• Data-as-a-Service

• Efficient federated query processing

Thank you for your attention!

CONTACT

fluid Operations AG

Altrottstr. 31

Walldorf, Germany

Email: info@fluidOps.com

Website: www.fluidOps.com

Tel.: +49 6227 3849-567
