Stardog Linked Data Catalog
Héctor Pérez-Urbina, Edgar Rodríguez-Díaz
Clark & Parsia, LLC
{hector, edgar}@clarkparsia.com

Who are we?
● Clark & Parsia is a semantic software startup
● HQ in Washington, DC & office in Boston
● Provides software development and integration services
● Specializing in Semantic Web, web services, and advanced AI technologies for federal and enterprise customers
http://clarkparsia.com/
Twitter: @candp

What's SLDC?
● Stardog Linked Data Catalog
● A catalog of data sources
  ○ Semi-structured
  ○ Relational
  ○ Object-oriented
  ○ ...
● Provides a coherent view over existing data repositories so that users and/or applications can easily find them and query them

Use Cases
● Sources
  ○ Management, import, subscription, categorization, sharing
● Query
  ○ Management, sharing, results export
  ○ Querying
    ■ Metadata, external sources, integration
● Locating sources
  ○ Search, browse
● NLP/AI
  ○ Entity extraction, graph algorithms, clustering analysis

[Figure: architecture layers: Application layer, Middleware layer, NLP/AI analytics layer, Data layer]

Demo

Semantic Technologies
● W3C standards
  ○ RDF(S), OWL, SPARQL
● Lower operational costs and raise productivity
  ○ Cooperation without coordination
  ○ Appropriate abstractions
  ○ Declarative is better than imperative
  ○ Correctness when it matters; sloppiness when it doesn’t

Data Model
● Similar to DCAT from W3C
  ○ Catalog entries
● Enhanced with
  ○ SSD
  ○ VoID datasets
  ○ SKOS background models
  ○ Axioms & rules

Modeling the Domain
● Use of axioms to model relationships between classes
  ○ :Query subClassOf :Resource
  ○ :Entry subClassOf :Resource
● Retrieve the resources user :u can see
  ○ SELECT ?resource WHERE { ?resource type :Resource . }
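
A minimal sketch of why the subclass axioms matter for this query, in plain Python rather than in Stardog itself. The individuals :entry1 and :query1 and the helper names are made up for illustration. Without reasoning, nothing is explicitly typed as :Resource, so the query returns nothing; once the subclass axioms are applied, both individuals are returned.

```python
# Toy triple store: (subject, predicate, object) tuples.
# :entry1 and :query1 are hypothetical individuals, not from the slides.
triples = {
    (":Query", "subClassOf", ":Resource"),
    (":Entry", "subClassOf", ":Resource"),
    (":entry1", "type", ":Entry"),
    (":query1", "type", ":Query"),
}


def infer_types(facts):
    """Apply the subclass rule until no new type facts appear."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        subclass = [(s, o) for s, p, o in facts if p == "subClassOf"]
        typed = [(s, o) for s, p, o in facts if p == "type"]
        for instance, cls in typed:
            for sub, sup in subclass:
                new = (instance, "type", sup)
                if cls == sub and new not in facts:
                    facts.add(new)
                    changed = True
    return facts


def select_resources(facts):
    """Mimics: SELECT ?resource WHERE { ?resource type :Resource . }"""
    return sorted(s for s, p, o in facts if p == "type" and o == ":Resource")


without_reasoning = select_resources(triples)
with_reasoning = select_resources(infer_types(triples))
print(without_reasoning)  # []
print(with_reasoning)     # [':entry1', ':query1']
```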

Security
● Authentication
  ○ Shiro-Based implementation
  ○ Extensible to LDAP and/or AD
● Authorization
  ○ Eat-your-own-food approach
  ○ Reasoning-Based
  ○ Use of axioms & rules

Deriving Permissions
● Users have permission roles
● Permission roles have permission relations with resources

Deriving Permissions
● If a user has a permission role containing a read permission associated to a resource, then the user has the same permission over the resource
  :permissionRole(?user,?role), :readPermission(?role,?resource) -> :readUserPermission(?user,?resource)
● Everybody has read access to public resources
  :User(?user), :PublicResource(?resource) -> :readUserPermission(?user,?resource)

Deriving Permissions
● User :user1 has delete permissions over any source
  ○ :deleteUserPermission(?user,:anySource), :DataSource(?source) -> :deleteUserPermission(?user,?source)
  ○ :user1 :deleteUserPermission :anySource
● Everybody has all permissions to the resources they created
  ○ :resourceCreator(?user,?resource) -> :allUserPermissions(?user,?resource)
  ○ :allUserPermissions(?user,?resource) -> :readUserPermission(?user,?resource)
  ○ ...
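
The rules above can be read as a small forward-chaining program. The following is only a sketch of that idea in plain Python, not how Stardog evaluates rules. The tuple encoding of atoms, the function names, and the sample facts about :user2, :source1 and :source2 are assumptions made for illustration.

```python
def unify(atom, fact, binding):
    """Match one rule atom against one fact, extending the binding or failing."""
    if len(atom) != len(fact) or atom[0] != fact[0]:
        return None
    binding = dict(binding)
    for term, value in zip(atom[1:], fact[1:]):
        if term.startswith("?"):
            # A variable must keep the value it was first bound to.
            if binding.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return binding


def match_body(body, facts, binding=None):
    """Yield every variable binding that satisfies all atoms in the body."""
    binding = binding or {}
    if not body:
        yield binding
        return
    for fact in facts:
        extended = unify(body[0], fact, binding)
        if extended is not None:
            yield from match_body(body[1:], facts, extended)


def forward_chain(facts, rules):
    """Apply the rules until no new fact is derived (naive materialization)."""
    facts = set(facts)
    while True:
        derived = set()
        for body, head in rules:
            for binding in match_body(body, facts):
                new = tuple(binding.get(term, term) for term in head)
                if new not in facts:
                    derived.add(new)
        if not derived:
            return facts
        facts |= derived


# The rules from the slides, encoded as (body, head) pairs.
rules = [
    ([("permissionRole", "?u", "?r"), ("readPermission", "?r", "?x")],
     ("readUserPermission", "?u", "?x")),
    ([("User", "?u"), ("PublicResource", "?x")],
     ("readUserPermission", "?u", "?x")),
    ([("deleteUserPermission", "?u", ":anySource"), ("DataSource", "?s")],
     ("deleteUserPermission", "?u", "?s")),
    ([("resourceCreator", "?u", "?x")], ("allUserPermissions", "?u", "?x")),
    ([("allUserPermissions", "?u", "?x")], ("readUserPermission", "?u", "?x")),
]

# Hypothetical sample data.
facts = {
    ("User", ":user1"), ("User", ":user2"),
    ("DataSource", ":source1"), ("PublicResource", ":source2"),
    ("deleteUserPermission", ":user1", ":anySource"),
    ("resourceCreator", ":user2", ":source1"),
}

closure = forward_chain(facts, rules)
can_delete = ("deleteUserPermission", ":user1", ":source1") in closure
can_read = ("readUserPermission", ":user2", ":source1") in closure
print(can_delete, can_read)  # True True
```

With the closure computed, a permission check is a single lookup; no rule knowledge has to be repeated in the check itself.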

Impact of Reasoning
Can user :user1 delete resource :source1?
ASK WHERE {
  { :user1 :deleteUserPermission :source1 . }
  UNION { :user1 :permissionRole ?role . ?role :deletePermission :source1 . }
  UNION { :user1 :resourceCreator :source1 . }
  UNION { :user1 :deleteUserPermission :anyResource . }
  UNION { :user1 :allUserPermissions :source1 . }
  UNION { ... }
  UNION ...

Impact of Reasoning
● Are you sure you're not missing anything?
● New awesome way of getting delete permissions you came up with yesterday
● Model knowledge where it belongs and let the reasoner do the work for you:
  ASK WHERE { { :user1 :deleteUserPermission :source1 . } }

Too much Inference?
When I say
  :deleteUserPermission domain :User
  :deleteUserPermission range :Resource
I mean that for every triple
  :user1 :deleteUserPermission :resource1
the individual :user1 must be an instance of :User and :resource1 of :Resource.
But the reasoner doesn't find the error!!

Typing Constraint
Only users can have delete user permissions
● :deleteUserPermission domain :User
● :user1 :deleteUserPermission :resource1

             OWA                            CWA
Consistent   true                           false
Reason       Infer that                     Assume that
             :user1 type :User              :user1 type not :User

CWA or OWA?
● Which one?
  ○ Of course use both!
● Some axioms should be interpreted under CWA
  :deleteUserPermission domain :User
● And others under OWA
  :SuperUser subClassOf :User
● So the right thing happens
  :user1 :deleteUserPermission :resource1
  :user1 type :SuperUser
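
One way to picture the mixed reading, as a sketch in plain Python under the assumption that subclass axioms are used to infer types (OWA) while domain axioms are checked as integrity constraints against the inferred types (CWA). The dictionary and function names are hypothetical.

```python
# Axioms interpreted under OWA: used to infer new facts.
SUBCLASS_OF = {":SuperUser": ":User"}

# Axioms interpreted under CWA: used as integrity constraints.
DOMAIN_CONSTRAINTS = {":deleteUserPermission": ":User"}


def inferred_types(data):
    """Return {individual: set of classes}, closed under SUBCLASS_OF."""
    types = {}
    for s, p, o in data:
        if p == "type":
            types.setdefault(s, set()).add(o)
    for classes in types.values():
        frontier = list(classes)
        while frontier:
            parent = SUBCLASS_OF.get(frontier.pop())
            if parent is not None and parent not in classes:
                classes.add(parent)
                frontier.append(parent)
    return types


def violations(data):
    """List triples whose subject is not known to be in the required domain."""
    types = inferred_types(data)
    return [
        (s, p, o)
        for s, p, o in data
        if p in DOMAIN_CONSTRAINTS
        and DOMAIN_CONSTRAINTS[p] not in types.get(s, set())
    ]


untyped = [(":user1", ":deleteUserPermission", ":resource1")]
typed = untyped + [(":user1", "type", ":SuperUser")]

print(violations(untyped))  # the delete-permission triple is reported
print(violations(typed))    # []
```

The second data set passes only because :user1 is inferred to be a :User from :SuperUser, which is the "right thing" the slide refers to.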

SLDC for Data Integration
● SLDC provides descriptions of data sources, relationships between them, and information to query them
● We can treat data sources as an integrated single data source
  ○ Distributed querying
  ○ AI analytics
● Virtual, materialized, hybrid

Mappings
● Simple
  ○ pops:Employee subClassOf foaf:Person
  ○ pops:Project equivalentTo foaf:Project
  ○ pops:hasEmployee subPropertyOf foaf:member
● SWRL-Based
  ○ pops:firstName(?person, ?first), pops:lastName(?person, ?last), swrlb:concat(?name, ?first, " ", ?last) -> foaf:name(?person, ?name)
  ○ pops:worksOnProject(?person,?project), pops:ActiveProject(?project) -> foaf:currentProject(?person,?project)
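
To make the effect of these mappings concrete, here is a small sketch in plain Python that materializes the target (foaf) facts from source (pops) facts. It covers one simple mapping and the two rules; the sample person and projects are invented, and the sketch assumes each person has at most one first and one last name.

```python
# Hypothetical source data in the pops vocabulary.
source = {
    ("pops:Employee", ":p1"),
    ("pops:firstName", ":p1", "Jane"),
    ("pops:lastName", ":p1", "Doe"),
    ("pops:worksOnProject", ":p1", ":proj1"),
    ("pops:worksOnProject", ":p1", ":proj2"),
    ("pops:ActiveProject", ":proj1"),
}


def apply_mappings(facts):
    """Return the foaf facts implied by the mappings (materialized view)."""
    mapped = set()

    # Simple mapping: pops:Employee subClassOf foaf:Person
    for fact in facts:
        if fact[0] == "pops:Employee":
            mapped.add(("foaf:Person", fact[1]))

    # Rule: first name + " " + last name -> foaf:name
    first = {f[1]: f[2] for f in facts if f[0] == "pops:firstName"}
    last = {f[1]: f[2] for f in facts if f[0] == "pops:lastName"}
    for person in first.keys() & last.keys():
        mapped.add(("foaf:name", person, first[person] + " " + last[person]))

    # Rule: works on an active project -> foaf:currentProject
    active = {f[1] for f in facts if f[0] == "pops:ActiveProject"}
    for fact in facts:
        if fact[0] == "pops:worksOnProject" and fact[2] in active:
            mapped.add(("foaf:currentProject", fact[1], fact[2]))

    return mapped


target = apply_mappings(source)
for fact in sorted(target):
    print(fact)
```

This corresponds to the materialized option; a virtual approach would instead rewrite queries over the foaf terms into queries over the pops terms.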

Summing Up
● SLDC is a linked data catalog
  ○ Manage a variety of sources
  ○ Find sources
  ○ Query sources
● Implemented using Semantic Technologies
  ○ Reasoning
    ■ Axioms & Rules
  ○ Data validation
  ○ Data integration
Questions?

Why?
● Large organizations
  ○ Disparate departments
  ○ Independent, isolated sources
● Where is what?
  ○ Do we have a data source about clients?
  ○ Where is it?
● Who created what?
  ○ Who owns it?
● Who has access to what?
  ○ Do I have access to it?
  ○ Who do I talk to to get it?

Source Management
● Management
  ○ Create, delete, update, clone
● Import
  ○ RDF, HTML, XML
● Subscription
  ○ Endpoint location
● Categorization
  ○ Categories
  ○ External vocabularies
● Sharing
  ○ To specific users
  ○ Public

Querying Sources
● Querying metadata
  ○ Queries about the catalog itself
● External query
  ○ Querying a particular source
● Integrated query
  ○ Querying a set of integrated sources
● Query management
● Query sharing
● Results export

Finding Sources
● Browse
  ○ Facets
  ○ Pelorus
● Search
  ○ Text-based search
  ○ Rich query language

Last but not least
● NLP processing
  ○ Entity/Event extraction from natural language source descriptions
  ○ Better source classification & search
● Graph algorithms
  ○ What's the shortest path between these resources?
● Clustering
  ○ Can we discover similar sources based on given criteria?

Axioms
● It's not always about simple taxonomies...
● What about domain/range axioms?
  ○ :someProperty domain :SomeClass
  ○ :a :someProperty :b
  ○ :SomeClass(x)?
● What about complex subclass chains?
  ○ :SomeClass subClassOf :someProperty some :OtherClass
  ○ :someProperty some :OtherClass subClassOf :AnotherClass
  ○ :a type :SomeClass
  ○ :AnotherClass(x)?
● What about cardinality constraints, universal quantification, datatype reasoning, ...?

Data Validation
● Fundamental data management problem
  ○ Verify data integrity and correctness
  ○ Data corruption can lead to failures in applications, errors in decision making, security vulnerabilities, etc.
● Relevant in many scenarios
  ○ Storing data for stand-alone applications
  ○ Exchanging data in distributed settings
● For some use cases, data validation is critical but we still want to do it intelligently

Participation Constraint
Each resource must have been created by a user
● :Resource subClassOf inv(:resourceCreator) some :User
● :resource1 type :Resource

             OWA                                CWA
Consistent   true                               false
Reason       Infer that                         Assume that
             _:b :resourceCreator :resource1    _:b :resourceCreator :resource1
             _:b type :User                     is false
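
Read under CWA, the participation constraint becomes a check that every known resource has a known creator who is a known user. A minimal sketch in plain Python follows; the function name and the extra individuals :resource2 and :user1 are assumptions for illustration.

```python
def participation_violations(data):
    """Return resources that have no known creator typed as :User."""
    users = {s for s, p, o in data if p == "type" and o == ":User"}
    resources = {s for s, p, o in data if p == "type" and o == ":Resource"}
    created = {o for s, p, o in data if p == ":resourceCreator" and s in users}
    return sorted(resources - created)


# Hypothetical catalog data: :resource1 has no creator, :resource2 does.
catalog = [
    (":resource1", "type", ":Resource"),
    (":resource2", "type", ":Resource"),
    (":user1", "type", ":User"),
    (":user1", ":resourceCreator", ":resource2"),
]

missing_creator = participation_violations(catalog)
print(missing_creator)  # [':resource1']
```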

Uniqueness Constraint
Each data source must belong to at most one catalog entry
● :dataSource inverseFunctional
● :entry1 :dataSource :dataSource1
● :entry2 :dataSource :dataSource1

             OWA                          CWA
Consistent   true                         false
Reason       Infer that                   Assume that
             :entry1 sameAs :entry2       :entry1 sameAs :entry2
                                          is false
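
Under CWA, and assuming that different names denote different individuals, the inverse-functional axiom can be checked by grouping entries per data source and reporting any source that appears under more than one entry. This is only a sketch in plain Python; the function name and the third triple are made up for illustration.

```python
from collections import defaultdict


def uniqueness_violations(data, prop=":dataSource"):
    """Return {data source: entries} for sources claimed by several entries."""
    owners = defaultdict(set)
    for s, p, o in data:
        if p == prop:
            owners[o].add(s)
    return {
        source: sorted(entries)
        for source, entries in owners.items()
        if len(entries) > 1
    }


entries = [
    (":entry1", ":dataSource", ":dataSource1"),
    (":entry2", ":dataSource", ":dataSource1"),
    (":entry3", ":dataSource", ":dataSource2"),  # hypothetical, no conflict
]

conflicts = uniqueness_violations(entries)
print(conflicts)  # {':dataSource1': [':entry1', ':entry2']}
```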