Query-generation-for-provo-data-201406

Towards Query Generation for PROV-O Data. Jun Zhao (1), Honghan Wu (2) and Jeff Z. Pan (2). (1) Lancaster University, @junszhao | j.zhao5 at lancaster.ac.uk. (2) University of Aberdeen, honghan.wu | jeff.z.pan at abdn.ac.uk

description

Query-generation-for-provo-data for provAnalytics 2014 at Provenance Week: http://provenanceweek.org/2014/analytics/

Transcript of Query-generation-for-provo-data-201406

Page 1: Query-generation-for-provo-data-201406

Towards Query Generation for PROV-O Data

Jun Zhao (1), Honghan Wu (2) and Jeff Z. Pan (2)

(1) Lancaster University, @junszhao | j.zhao5 at lancaster.ac.uk

(2) University of Aberdeen, honghan.wu | jeff.z.pan at abdn.ac.uk

Page 2: Query-generation-for-provo-data-201406

Outline

• Motivation
• Profile-driven query generation
– K-Drive
– ProvQ
• Result discussion
• Future work

Page 3: Query-generation-for-provo-data-201406

The Big Picture of PROV: A Motivation Scenario

http://www.w3.org/2005/Incubator/prov/wiki/images/3/38/Content-b.png

Page 4: Query-generation-for-provo-data-201406

The Big Picture of PROV: A Motivation Scenario

Adapted from: http://www.w3.org/2005/Incubator/prov/wiki/images/3/38/Content-b.png

Provenance information

Page 5: Query-generation-for-provo-data-201406

The Big Picture of PROV: A Motivation Scenario

http://www.w3.org/2005/Incubator/prov/wiki/images/b/b8/Use-b.png

Page 6: Query-generation-for-provo-data-201406

Provenance in the Wild vs. ProvBench

Taverna-PROV

Vistrails PROV

Wings PROV

Wikipedia-PROV

Twitter-PROV

OBIAMA (social simulation)

Workflow / scientific domain

• 11 repositories so far
• Various representations
• Spanning different domains
• Openly accessible under different open licenses

Web resources

Social domain

https://github.com/provbench
https://sites.google.com/site/provbench/home

Page 7: Query-generation-for-provo-data-201406

Next Step: Access PROV Datasets

Taverna-PROV

Vistrails PROV

Wings PROV

Wikipedia-PROV

Twitter-PROV

OBIAMA (social simulation)

Can we query across them?

Can we learn something by querying across them?

What can we do with them?

……

Page 8: Query-generation-for-provo-data-201406

Query Generation: A Bottom-up Approach

Taverna-PROV

Wings PROV

Wikipedia-PROV

OBIAMA (social simulation)

Provenance Data Profile Generator

Provenance Query Builder

SPARQL queries for PROV-O datasets

Example profiles:
• Class associations
• Property associations

Page 9: Query-generation-for-provo-data-201406

Query Generation: A First Step

A PROV Dataset

Provenance Data Profile Generator

Provenance Query Builder

SPARQL queries for the PROV-O dataset

Example profiles:
• Class associations
• Property associations

Page 10: Query-generation-for-provo-data-201406

K-Drive Query Generation

• University of Aberdeen
• A generic query generation tool for semantic web data
• Find key sub-graphs in the RDF data
– Big City: the most instantiated concepts in the data
– Big Road: the most frequent relations connecting those big cities

Slide credit: Dr Wu at Scottish Linked Data Workshop 2014
http://www.kdrive-project.eu (EU FP7 Marie-Curie 286348)
Pan et al. Query generation for semantic datasets. K-CAP 2013, pp. 113-116
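The Big City / Big Road idea can be illustrated with a small sketch. This is not the K-Drive implementation: it assumes triples are plain (subject, predicate, object) tuples, that "Big City" means ranking classes by instance count, and that "Big Road" means ranking properties by how often they link instances of two big cities. The function names and the toy `ex:` data are hypothetical.

```python
from collections import Counter

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
PROV = "http://www.w3.org/ns/prov#"


def big_cities(triples, top_n=2):
    """Return the top_n classes ranked by number of typed instances."""
    counts = Counter(o for s, p, o in triples if p == RDF_TYPE)
    return [cls for cls, _ in counts.most_common(top_n)]


def big_roads(triples, cities, top_n=3):
    """Return the most frequent (class, property, class) links between big cities."""
    # Index the classes of every typed resource.
    types = {}
    for s, p, o in triples:
        if p == RDF_TYPE:
            types.setdefault(s, set()).add(o)
    roads = Counter()
    for s, p, o in triples:
        if p == RDF_TYPE:
            continue
        for subject_class in types.get(s, ()):
            for object_class in types.get(o, ()):
                if subject_class in cities and object_class in cities:
                    roads[(subject_class, p, object_class)] += 1
    return [road for road, _ in roads.most_common(top_n)]


# Toy data (hypothetical, for illustration only).
triples = [
    ("ex:e1", RDF_TYPE, PROV + "Entity"),
    ("ex:e2", RDF_TYPE, PROV + "Entity"),
    ("ex:e3", RDF_TYPE, PROV + "Entity"),
    ("ex:a1", RDF_TYPE, PROV + "Activity"),
    ("ex:a2", RDF_TYPE, PROV + "Activity"),
    ("ex:ag1", RDF_TYPE, PROV + "Agent"),
    ("ex:e1", PROV + "wasGeneratedBy", "ex:a1"),
    ("ex:e2", PROV + "wasGeneratedBy", "ex:a1"),
    ("ex:e3", PROV + "wasGeneratedBy", "ex:a2"),
    ("ex:a2", PROV + "used", "ex:e1"),
    ("ex:a1", PROV + "wasAssociatedWith", "ex:ag1"),
]
cities = big_cities(triples)
roads = big_roads(triples, set(cities))
```

On this toy data the two big cities are prov:Entity and prov:Activity, and the strongest road between them is prov:wasGeneratedBy; each road could then be turned into one triple pattern of a generated query.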

Page 11: Query-generation-for-provo-data-201406

K-Drive Generator

Live demo: http://homepages.abdn.ac.uk/honghan.wu/pages/prov2/index.html

Page 12: Query-generation-for-provo-data-201406

K-Drive Generator

Live demo: http://homepages.abdn.ac.uk/honghan.wu/pages/prov2/index.html

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?Generation ?x4_1 ?x3_1 ?x0_1
WHERE {
  ?Generation rdf:type <http://www.w3.org/ns/prov#Generation> .
  ?Generation <http://www.w3.org/ns/prov#activity> ?x4_1 .
  ?Generation <http://www.w3.org/ns/prov#hadRole> ?x3_1 .
  ?x0_1 <http://www.w3.org/ns/prov#qualifiedGeneration> ?Generation .
}

Page 13: Query-generation-for-provo-data-201406

ProvQ: Property Association Mining

A PROV Dataset

Provenance Data Profile Generator

Provenance Query Builder

SPARQL queries for the PROV-O dataset

Discover properties that are used together with each PROV-O property

Expand a set of “seed” PROV-O queries using the discovered associated properties

https://github.com/junszhao/ProvQ
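The discovery step can be sketched as follows. This is a minimal illustration, not the ProvQ code: it assumes that "used together" means two properties appear on the same subject, and the function name, the choice of prov:wasGeneratedBy as seed, and the toy `ex:` data are all hypothetical.

```python
from collections import Counter, defaultdict

PROV = "http://www.w3.org/ns/prov#"
WFPROV = "http://purl.org/wf4ever/wfprov#"


def associated_properties(triples, seed_property):
    """Count properties that occur on the same subjects as seed_property."""
    props_by_subject = defaultdict(set)
    for s, p, _ in triples:
        props_by_subject[s].add(p)
    counts = Counter()
    for props in props_by_subject.values():
        # Only subjects that use the seed property contribute.
        if seed_property in props:
            counts.update(props - {seed_property})
    return counts


# Toy data (hypothetical, for illustration only).
seed = PROV + "wasGeneratedBy"
triples = [
    ("ex:d1", seed, "ex:run1"),
    ("ex:d1", WFPROV + "wasOutputFrom", "ex:run1"),
    ("ex:d1", WFPROV + "describedByParameter", "ex:p1"),
    ("ex:d2", seed, "ex:run2"),
    ("ex:d2", WFPROV + "wasOutputFrom", "ex:run2"),
    ("ex:d3", WFPROV + "describedByParameter", "ex:p2"),
]
counts = associated_properties(triples, seed)
```

Restricting the search to subjects that already use a PROV-O property keeps the candidate set small, which is one way to read the performance advantage claimed for this approach.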

Page 14: Query-generation-for-provo-data-201406

ProvQ: Property Association Mining

• Advantages
– Reduce the performance challenge usually faced in association rule mining
– Produce provenance-centric queries
• Disadvantages
– Could miss queries that are not related to PROV-O terms at all

Page 15: Query-generation-for-provo-data-201406

Expanding Starting Queries

Page 16: Query-generation-for-provo-data-201406

Approach Walk-Through

• Given a seed atomic query, we have a seed property:
• We find all properties used together with the seed property:
– http://purl.org/wf4ever/wfprov#describedByParameter
– http://purl.org/wf4ever/wfprov#wasOutputFrom
– http://www.w3.org/ns/prov#qualifiedGeneration
• Return the resulting conjunctive SPARQL query
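The final step of the walk-through, assembling a conjunctive query, might look like the sketch below. It is an assumption-laden illustration rather than ProvQ's actual builder: the helper name, the variable naming scheme, and the choice of prov:wasGeneratedBy as the seed property (the slide does not state it) are hypothetical; the three associated properties are the ones listed above.

```python
def build_conjunctive_query(seed_property, associated, subject_var="s"):
    """Build a SELECT query with one triple pattern per property, sharing one subject."""
    # Put the seed property first and skip it if it reappears in the associated list.
    properties = [seed_property] + [p for p in associated if p != seed_property]
    variables = [f"?{subject_var}"] + [f"?o{i}" for i in range(len(properties))]
    patterns = [
        f"  ?{subject_var} <{prop}> ?o{i} ."
        for i, prop in enumerate(properties)
    ]
    return "SELECT " + " ".join(variables) + "\nWHERE {\n" + "\n".join(patterns) + "\n}"


# Hypothetical seed property; the associated properties come from the slide.
seed = "http://www.w3.org/ns/prov#wasGeneratedBy"
associated = [
    "http://purl.org/wf4ever/wfprov#describedByParameter",
    "http://purl.org/wf4ever/wfprov#wasOutputFrom",
    "http://www.w3.org/ns/prov#qualifiedGeneration",
]
query = build_conjunctive_query(seed, associated)
print(query)
```

Every pattern shares the same subject variable, so the generated query only matches resources that carry the seed property and all of its associated properties at once.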

Page 17: Query-generation-for-provo-data-201406

Results Comparison

• K-Drive Generator
– 7 queries
– 3 of them are not exactly provenance queries
– Probably easier to understand because classes are included in the queries
– But queries can be complex
• ProvQ
– 7 queries
– 1 not returned by K-Drive (prov:wasDerivedFrom)
– Only provenance queries are returned
– Queries are simple, based on property associations starting from “seed” PROV-O properties

https://github.com/junszhao/ProvQ/blob/master/results/query-analysis.txt

Page 18: Query-generation-for-provo-data-201406

Future Work

• Define and evaluate usefulness
• Test against more datasets
• Experiment with reasoning
• Query generation across multiple datasets

Page 19: Query-generation-for-provo-data-201406

Thank you!

These slides have been created by Jun Zhao

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License: http://creativecommons.org/licenses/by-nc-sa/3.0/