1
SCOVO:
Using Statistics on the Web of Data
Michael Hausenblas, Wolfgang Halb, Yves Raimond, Lee Feigenbaum, Danny Ayers
ESWC2009 In-Use Track, 2009-05-04, Heraklion, Greece
2
Agenda
SCOVO Motivation
Requirements and Issues
Statistical Modelling Framework
Comparison
Usage
3
Motivation
SCOVO: Statistical Core Vocabulary
http://sw.joanneum.at/scovo
Statistical data is present everywhere
4
Motivation
The Web of Data is for sharing, accessing and using DATA
SCOVO aims to make statistical data more easily accessible on the Web of Data
5
Motivation
Based on 3 distinct efforts:
riese, the "RDFizing and Interlinking the EuroStat Data Set Effort" (http://riese.joanneum.at): Eurostat data (official European statistics)
US Census Bureau's annual Statistical Abstract
Publishing UN and OECD
http://oecd.dataincubator.org/
6
Issues
Handling of Multiple Dimensions
Reusability and Uptake
Structural vs. Domain Semantics
Performance and Scalability Issues
7
Requirements
Usable on the Web of Data (URIs, RDF, etc.)
Extensible on both the schema level and the instance level
Light-weight, addressing uptake, performance and scalability issues
8
Statistical Modelling Framework
9
SCOVO
http://purl.org/NET/scovo
10
Comparison
11
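Example
From http://purl.org/NET/scovo
# Prefix declarations are not shown on the slide. The dc: and ex: URIs below
# are assumptions; replace them with the namespaces used in the example data.
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX scv: <http://purl.org/NET/scovo#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX ex: <http://example.org/on-time-flight#>
SELECT ?airport_name ?period ?percent_ontime_arrivals
FROM <http://sw.joanneum.at/scovo/otf-example-full.rdf>
WHERE {
  # Each item carries three dimensions: airport, time period, and the measure ex:ota.
  ?item rdf:type scv:Item ;
        scv:dimension ?airport ;
        scv:dimension ?time_period ;
        scv:dimension ex:ota ;
        rdf:value ?percent_ontime_arrivals .
  ?airport rdf:type ex:Airport ;
           dc:title ?airport_name .
  ?time_period rdf:type ex:TimePeriod ;
               scv:min ?min ;
               scv:max ?max ;
               dc:title ?period .
  # Keep only periods that lie strictly inside the given date window.
  FILTER ( ?min > "2006-02-01" && ?max < "2006-08-01" )
}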
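The query above relies on the item/dimension pattern: one value qualified by several dimensions. A minimal in-memory sketch of that pattern in Python may help; it is an illustration under stated assumptions, not the SCOVO schema itself. The class names, the field names `min_date`/`max_date`, the helper `on_time_arrivals`, and all sample values are hypothetical and made up for this sketch.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Tuple


@dataclass(frozen=True)
class Dimension:
    """Rough stand-in for a scv:Dimension resource (hypothetical shape)."""
    kind: str                          # e.g. "Airport", "TimePeriod", "Measure"
    title: str                         # plays the role of dc:title
    min_date: Optional[date] = None    # plays the role of scv:min
    max_date: Optional[date] = None    # plays the role of scv:max


@dataclass(frozen=True)
class Item:
    """Rough stand-in for a scv:Item: one value plus its qualifying dimensions."""
    value: float                       # plays the role of rdf:value
    dimensions: Tuple[Dimension, ...]


def on_time_arrivals(items, measure, after, before):
    """Mimic the query: items with the given measure whose period lies in (after, before)."""
    rows = []
    for item in items:
        if measure not in item.dimensions:
            continue
        airports = [d for d in item.dimensions if d.kind == "Airport"]
        periods = [d for d in item.dimensions if d.kind == "TimePeriod"]
        for airport in airports:
            for period in periods:
                if period.min_date is None or period.max_date is None:
                    continue
                if period.min_date > after and period.max_date < before:
                    rows.append((airport.title, period.title, item.value))
    return rows


# Made-up sample data, for illustration only (not taken from the example file).
OTA = Dimension("Measure", "on-time arrivals")
AIRPORT_A = Dimension("Airport", "Airport A")
Q1 = Dimension("TimePeriod", "2006-Q1", date(2006, 1, 1), date(2006, 3, 31))
Q2 = Dimension("TimePeriod", "2006-Q2", date(2006, 4, 1), date(2006, 6, 30))

ITEMS = [
    Item(80.0, (AIRPORT_A, Q1, OTA)),
    Item(75.0, (AIRPORT_A, Q2, OTA)),
]

RESULT = on_time_arrivals(ITEMS, OTA, date(2006, 2, 1), date(2006, 8, 1))
print(RESULT)
```

Here the measure itself (`OTA`) is just another dimension of the item, which mirrors how `ex:ota` appears as a `scv:dimension` in the query.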
12
Usage: riese
RDFizing and Interlinking the Eurostat Data Set Effort
http://riese.joanneum.at
Contributing to the Linking Open Data project
Offers a linked data version of Eurostat data for both humans and machines (first LOD-in-RDFa dataset in the cloud)
13
Usage: voiD
voiD: Vocabulary of Interlinked Datasets
http://semanticweb.org/wiki/VoiD
Formal description of linked datasets
Uses SCOVO to express statistics about triples, interlinking, resources, etc.
14
Usage: RDFStats
http://semwiq.faw.uni-linz.ac.at/node/9
15
Conclusion
Modeling statistics is a non-trivial task (wide range of requirements, etc.)
SCOVO is usable, generic, simple
However, there are open issues:
Aggregation
Domain semantics
16
Let's discuss!