Filip Zavoral, Jiří Dokulil SemWex - KSI MFF UK http://www.ksi.mff.cuni.cz/semwex/ Semantic Web infrastructure Trisolda current state and perspectives 10. Mixer 26.11.2008

Page 1: Filip Zavoral, Jiří Dokulil SemWex - KSI MFF UK  Semantic Web infrastructure Trisolda current state and perspectives.

Filip Zavoral, Jiří Dokulil

SemWex - KSI MFF UK

http://www.ksi.mff.cuni.cz/semwex/

Semantic Web infrastructure Trisolda current state and perspectives

10. Mixer 26.11.2008

Page 2:

Semantic web vs. semantization

Semantic web vision: Tim Berners-Lee, “The Semantic Web,” Scientific American, 2001

Semantic research is generously funded, yet 'hardly one has ever seen ...'

New buzzwords: Web 2.0, Web 3.0, Social web, Web of data, Mashups, …

Has the Semantic web died? No, it has not been born yet

Web Semantization

Page 3:

Semantic technologies

TCP/IP

HTTP

HTML

Browser

Page 4:

Technical details

Page 5:

Semantic web services

Page 6:

Trisolda

Motto: 'hardly one has ever seen ...' the semantic web

Data from real life: incomplete, duplicated, inaccurate, more than 20 million triples

Jena: very slow load; more than 1 million triples → crash

Sesame: unable to load more than 200 000 triples; exponential complexity of loading

Where is a working platform for semantic web research?

Technology background: Repository – data integration, DataPile

Page 7:

Trisolda

Trisolda Architecture

Import interfaces

Repository

Querying & Executors

Page 8:

Repository

Trisolda Repository: stores incoming data, retrieves results for queries, stores the used ontology

DataPile structure: holds data in any format

Application server

Not all data and knowledge are available at import time, and the knowledge is not accurate

Background worker: inferencing, data unification, reasoner

Framework for plug-ins

Page 9:

Import

Direct import: data in data sources, converters to the used ontology

Crawling the wild Web: Egothor web crawler, AgentMat

Parsed pages are stored; deductors deduce data and ontology

Real-life data: incomplete, duplicated, inaccurate

Import modes: batch insert, immediate insert

Page 10:

Querying

Query API: based on simple graph matching

Query: a set of RDF triples with variables

Result: a multiset of possible variable mappings – a relation

Not another SQL-like language: a set of C++ classes and operators

Query evaluation: levels of support by query engines

Query environments: present outputs; examples: repository browser, RDF visualizer

Semantic executors: service composition – conductors
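
The query model on this slide (a query is a set of RDF triples with variables, the result is a multiset of variable mappings) can be illustrated with a minimal sketch. It is written in Python for brevity and is not the Trisolda C++ API; the function names, the `ex:p1` style identifiers, and the sample data are assumptions made for illustration only.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Mapping = Dict[str, str]


def is_var(term: str) -> bool:
    """A term is a variable if it starts with '?' (convention assumed here)."""
    return term.startswith("?")


def match_pattern(pattern: Triple, triple: Triple, mapping: Mapping) -> Optional[Mapping]:
    """Extend `mapping` so that `pattern` matches `triple`; None on conflict."""
    extended = dict(mapping)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            # Bind the variable, or check it against an earlier binding.
            if extended.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return extended


def evaluate(query: List[Triple], data: List[Triple]) -> List[Mapping]:
    """Return all variable mappings (a multiset, kept as a list) that satisfy
    every triple pattern in `query`."""
    results: List[Mapping] = [{}]
    for pattern in query:
        next_results: List[Mapping] = []
        for mapping in results:
            for triple in data:
                extended = match_pattern(pattern, triple, mapping)
                if extended is not None:
                    next_results.append(extended)
        results = next_results
    return results


# Hypothetical sample data: four people with first and last names.
DATA: List[Triple] = [
    ("ex:p1", "ex:firstname", "John"), ("ex:p1", "ex:lastname", "Smith"),
    ("ex:p2", "ex:firstname", "John"), ("ex:p2", "ex:lastname", "Doe"),
    ("ex:p3", "ex:firstname", "Jane"), ("ex:p3", "ex:lastname", "Doe"),
    ("ex:p4", "ex:firstname", "Bill"), ("ex:p4", "ex:lastname", "Jackson"),
]

QUERY: List[Triple] = [
    ("?person", "ex:firstname", "?firstname"),
    ("?person", "ex:lastname", "?lastname"),
]

ROWS = evaluate(QUERY, DATA)
```

Each element of `ROWS` is one variable mapping, so the whole result can be read as a relation with the columns `?person`, `?firstname` and `?lastname`.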

Page 11:

AgentMat - data semantization framework

Page 12:

AgentMat - data extraction

Page 13:

Future work

Conclusions

Working infrastructure (currently not working: re-deployment, AgentMat & TriQ integration)

Gathering, storing and querying of semantic data

Platform for research and experiments

Future work & long-term goals

Specialized semantic data storage

Semantic acquisition, data semantization

Interface-based, loosely coupled network of Semantic Web repositories

Semantic computing, services, composition, executors ...

Page 14:

Selected Publications

Beňo, Míšek, Zavoral: AgentMat: Framework for Data Scraping and Semantization, 3rd International Conference on Research Challenges in Information Science, IEEE, 2009

Dokulil, Yaghob, Zavoral: Trisolda: The Environment for Semantic Data Processing, International Journal On Advances in Software, IARIA, 2009

Podzimek, Dokulil, Yaghob, Zavoral: Mám hlad: pomůže mi Sémantický web?, Informačné technológie - Aplikácia a Teória, ITAT 2008

Dokulil, Tykal, Yaghob, Zavoral: Semantic Web Repository And Interfaces, International Conference on Advances in Semantic Processing, SEMAPRO 2007, IEEE Computer Society Press - Best Paper Award

Dokulil, Tykal, Yaghob, Zavoral: Semantic Web Infrastructure, IEEE International Conference on Semantic Computing ICSC, IEEE Computer Society Press 2007

Yaghob, Zavoral: Semantic Web Infrastructure using DataPile, Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, Hong Kong, IEEE Computer Society Press 2006

Page 15:
Page 16:

PART II

Tables in RDF querying - do we really need them?

Page 17:

SPARQL

Syntax: SQL-like – at first glance a “simple language”, but with a complex grammar

{ ?x ?y ?z . OPTIONAL { ?a ?b ?c . } . ?k ?l ?m . }

{ ?x ?y ?z OPTIONAL { ?a ?b ?c } ?k ?l ?m }

Page 18:

SPARQL

Semantics: a lot of changes – now stable

Based on algebra: works with sets of variable mappings – i.e. tables

Very different from SQL: “closed”, no compositionality

Page 19:

SPARQL

RDF is a graph

SPARQL provides pattern (subgraph) matching – no other graph handling

SPARQL handles only fixed-size graphs

RDFS supports arbitrary hierarchy of classes

SPARQL has no aggregate functions, no “group by”, no constructors

Page 20:

Seasoned SQL developer

Page 21:

Seasoned SQL developer

Page 22:

Idea… ?

Make the language SQL-like inside, not just outside: joins, selection, projection, grouping, aggregation

Relational algebra works with relations, i.e. sets of tuples; the database is made of relations

RDF data is made of… RDF graphs

Maybe we should work with RDF graphs

Page 23:

Tables – Graphs

[Figure: the same four people shown as a table and as a graph: John Smith, John Doe, Jane Doe, Bill Jackson]

Page 24:

Basic pattern

variables -> “columns”

[Figure: graph pattern in which ?person is linked by ex:firstname to ?firstname and by ex:lastname to ?lastname]

Page 25:

Further operations

selection, joins, aggregation, projection, group by
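
To make these operations concrete, here is a minimal Python sketch of selection, projection and a natural join over tables of variable mappings. It only illustrates the relational idea named on the slide; the function names and the sample rows are hypothetical and do not come from the presented system.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]


def select(rows: List[Row], predicate: Callable[[Row], bool]) -> List[Row]:
    """Selection: keep only the rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]


def project(rows: List[Row], variables: List[str]) -> List[Row]:
    """Projection: keep only the listed variables ("columns") of each row."""
    return [{var: row[var] for var in variables} for row in rows]


def join(left: List[Row], right: List[Row]) -> List[Row]:
    """Natural join: combine rows that agree on all shared variables."""
    joined: List[Row] = []
    for l_row in left:
        for r_row in right:
            shared = l_row.keys() & r_row.keys()
            if all(l_row[var] == r_row[var] for var in shared):
                joined.append({**l_row, **r_row})
    return joined


# Hypothetical inputs: two "tables" that share the ?person column.
FIRST_NAMES = [
    {"?person": "ex:p1", "?firstname": "John"},
    {"?person": "ex:p2", "?firstname": "Jane"},
]
LAST_NAMES = [
    {"?person": "ex:p1", "?lastname": "Smith"},
    {"?person": "ex:p2", "?lastname": "Doe"},
]

PEOPLE = join(FIRST_NAMES, LAST_NAMES)
DOES = select(PEOPLE, lambda row: row["?lastname"] == "Doe")
DOE_FIRST_NAMES = project(DOES, ["?firstname"])
```

Because every operation takes tables and returns a table, the calls compose freely, which is the compositionality that an SQL developer expects.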

Page 26:

Multiple values

[Figure: ex:john with two ex:mail edges, each leading to a separate e-mail address; both addresses appear as [email protected] in the source]

Page 27:

Local and global aggregations

more values in one “column”

maximal number of mails

total count of mails
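
One possible reading of this slide, sketched in Python: a local aggregation works on the several values held in one "column" of a single row (for example the number of mails of one person), while a global aggregation works across all rows (the maximal number of mails, the total count of mails). This interpretation, the function names and the example addresses are assumptions, not taken from the slides.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical data: ex:john has two mail addresses, ex:jane has one.
TRIPLES: List[Triple] = [
    ("ex:john", "ex:mail", "john@example.org"),
    ("ex:john", "ex:mail", "j.smith@example.org"),
    ("ex:jane", "ex:mail", "jane@example.org"),
]


def mails_per_person(data: List[Triple]) -> Dict[str, List[str]]:
    """Collect all ex:mail values of each subject into one multi-valued column."""
    column: Dict[str, List[str]] = defaultdict(list)
    for subject, predicate, obj in data:
        if predicate == "ex:mail":
            column[subject].append(obj)
    return dict(column)


def local_counts(column: Dict[str, List[str]]) -> Dict[str, int]:
    """Local aggregation: count the values inside each row's column."""
    return {person: len(mails) for person, mails in column.items()}


def global_max(counts: Dict[str, int]) -> int:
    """Global aggregation: maximal number of mails over all persons."""
    return max(counts.values(), default=0)


def global_total(counts: Dict[str, int]) -> int:
    """Global aggregation: total count of mails over all persons."""
    return sum(counts.values())


COUNTS = local_counts(mails_per_person(TRIPLES))
MAX_MAILS = global_max(COUNTS)
TOTAL_MAILS = global_total(COUNTS)
```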

Page 28:

What’s more?

optional parts of the graph

regular expressions

textual representation (language)

Page 29:

Conclusion

current state is bad

try something different?

Page 30:

PART III

Let’s have a look – RDF visualizer

Page 31:

RDF

subject – the thing we are describing

predicate – the property of the thing

object – the value of the property

a graph (directed, labeled)
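
As a small illustration of why a set of triples is a directed, labeled graph, the following Python sketch turns triples into an adjacency structure: the subject is the source node, the object is the target node, and the predicate is the edge label. The helper name and the sample triples are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]
Edge = Tuple[str, str]  # (edge label, target node)


def build_graph(triples: List[Triple]) -> Dict[str, List[Edge]]:
    """Map each subject to its outgoing (predicate, object) edges."""
    graph: Dict[str, List[Edge]] = defaultdict(list)
    for subject, predicate, obj in triples:
        graph[subject].append((predicate, obj))
    return dict(graph)


# Hypothetical sample triples.
SAMPLE: List[Triple] = [
    ("ex:john", "ex:firstname", "John"),
    ("ex:john", "ex:lastname", "Smith"),
    ("ex:john", "ex:knows", "ex:jane"),
    ("ex:jane", "ex:firstname", "Jane"),
]

GRAPH = build_graph(SAMPLE)
```

A node such as `ex:jane` can be both the target of an edge and the source of others, which is what lets a visualizer navigate from one resource to the next.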

Page 32:

Visualization

triangle layout: layered drawing for trees

node merging: more information for a node

navigation: the way to handle huge data

Page 33:

Let’s have a look

A picture is worth a thousand words…