SWAT4LS 2014 SLIDE by Yamamoto

  • Date posted: 15-Jul-2015

  • Category: Software

  • A SPARQL Endpoint Profiler for Efficient Question Answering Systems

    Yasunori Yamamoto

    Database Center for Life Science (DBCLS)

  • Various SPARQL Endpoints

  • Can you get exactly the data you need from them, no more and no less?

    SPARQL: SPARQL Protocol and RDF Query Language

  • A SPARQL editor you might often see

    Almost nothing here

    dean beyett

    Use the force

  • Lower Columbia College (LCC)

    We'd like to make this happen.

  • Question Answering

  • SELECT ?t1
    WHERE {
      ?t1 [:isa] [genes] .
      ?t2 [:isa] [alzheimer disease] .
      ?t1 [associated with] ?t2 .
    }

    What genes are associated with Alzheimer disease?

    [genes] = a [gene] class

    [alzheimer disease] = instance of a [disease] class

    Outside this study's focus

    Mapping to existing vocabularies

    PubDictionary / identifiers.org

  • Example: BIO2RDF

    [gene] Class = bio2rdf:omim_vocabulary:Gene

    [disease] Class = bio2rdf:umls_vocabulary:Resource

    [disease] Class = bio2rdf:omim_vocabulary:Phenotype

    A QA system wants to know if there is any path between them.

  • Naive method

    [Diagram: the QA system takes a question (gene/omim/umls?), queries the SPARQL endpoints directly, and returns an answer.]

    Finding EPs (endpoints) that can compose a path between a [gene] class and [alzheimer disease].

    Slow and inefficient

  • Our approach

    [Diagram: a QA system, four SPARQL endpoints, and a metadata store.]

    Gather metadata from each endpoint in advance

  • [Diagram: the QA system takes a question, consults the metadata gathered from the four SPARQL endpoints, and returns an answer.]

    Effective and efficient

  • Ideal way

    [Diagram: a QA system and four SPARQL endpoints, each with its own metadata.]

    Each endpoint provides its metadata

  • VoID: Vocabulary of Interlinked Datasets

    SD: SPARQL 1.1 Service Description

    Yes, they're fine!

  • But we cannot get

    Data about relationships between classes

    Gene: Cap9, APCS, PRNP, ...

    ?

    Phenotype: EEC SYNDROME 1, TOXEMIA OF PREGNANCY, DIABETES, ...

  • SPARQL Builder Metadata

    Extensions to VoID / SD

  • Obtain all the relationships

    Predicate first vs. Class first

  • Criterion    Average    Min   Max      Median   Total
    Classes      1,976.29   1     50,303   5.0      509,884
    Properties   33.48      1     885      21.0     9,608

    http://stats.lod2.eu/stats

  • Obtain metadata using SPARQL

    A locally declared class

    :sycamore rdf:type :Tree .

    Any resource that appears as an object of rdf:type in the dataset.

    A locally undeclared class (LUC) instance

    Any resource that does not appear as a subject of rdf:type in the dataset.
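The two definitions above can be illustrated with a small sketch in Python. It is not the profiler's actual code: triples are modelled as plain `(subject, predicate, object)` string tuples, literals are assumed to be strings wrapped in double quotes, and the function name `split_declared_and_luc` is hypothetical. One further assumption: only resources used in non-`rdf:type` triples are considered as LUC candidates, so class resources such as `:Tree` are not reported as LUC instances.

```python
RDF_TYPE = "rdf:type"


def is_literal(term):
    # Assumption for this sketch: literals are written with surrounding quotes.
    return term.startswith('"')


def split_declared_and_luc(triples):
    """Return (locally declared classes, LUC instances) for a list of triples."""
    # A locally declared class: any resource that is an object of rdf:type.
    declared_classes = {o for s, p, o in triples if p == RDF_TYPE}
    # Resources that do have a declared type (subjects of rdf:type).
    typed_subjects = {s for s, p, o in triples if p == RDF_TYPE}

    # Candidate instances: resources used in ordinary (non rdf:type) triples.
    candidates = set()
    for s, p, o in triples:
        if p == RDF_TYPE:
            continue
        candidates.add(s)
        if not is_literal(o):
            candidates.add(o)

    # A LUC instance: a resource that never appears as a subject of rdf:type.
    luc_instances = candidates - typed_subjects
    return declared_classes, luc_instances


# Toy data; only the first triple comes from the slide.
triples = [
    (":sycamore", "rdf:type", ":Tree"),
    (":sycamore", ":growsIn", ":park"),
    (":sycamore", "rdfs:label", '"Sycamore"'),
]
declared, luc = split_declared_and_luc(triples)
```

Here `:Tree` is a locally declared class, while `:park` has no type in the dataset and is therefore treated as a LUC instance.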

  • Obtain metadata using SPARQL

    Subject \ Object   Class A   Class B   LUC   Literal
    Class A            0         76        23    50
    Class B
    LUC

    SELECT ?p (COUNT(?p) AS ?rc)
    WHERE {
      ?f ?p ?t .
      ?f a :Class_A .
      ?t a :Class_B .
    }
    GROUP BY ?p

    [Diagram: ?f (a :Class_A) --?p--> ?t (a :Class_B)]
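To make the per-pair query concrete, the following Python sketch reproduces its counting over an in-memory list of triples instead of a live endpoint. The function name `count_predicates_between` and the toy data are hypothetical; the real profiler sends the SPARQL query shown on the slide.

```python
from collections import Counter

RDF_TYPE = "rdf:type"


def count_predicates_between(triples, class_a, class_b):
    """Count, per predicate, the triples linking a class_a instance to a class_b instance."""
    instances_a = {s for s, p, o in triples if p == RDF_TYPE and o == class_a}
    instances_b = {s for s, p, o in triples if p == RDF_TYPE and o == class_b}

    counts = Counter()
    for s, p, o in triples:
        # Mirrors: ?f ?p ?t . ?f a :Class_A . ?t a :Class_B . GROUP BY ?p
        if s in instances_a and o in instances_b:
            counts[p] += 1
    return counts


# Hypothetical toy dataset.
triples = [
    (":g1", RDF_TYPE, ":Class_A"),
    (":g2", RDF_TYPE, ":Class_A"),
    (":d1", RDF_TYPE, ":Class_B"),
    (":g1", ":relatedTo", ":d1"),
    (":g2", ":relatedTo", ":d1"),
    (":g1", ":seeAlso", ":d1"),
    (":g1", ":label", '"gene one"'),
]
pair_counts = count_predicates_between(triples, ":Class_A", ":Class_B")
```

Each call fills one cell of the subject-by-object matrix, which is why covering every pair of classes needs one query per cell.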

  • Improvement

    N² times (N: number of classes): inefficient

    Even a dataset with many classes has only a few relationships.

    SELECT ?p ?c (COUNT(?p) AS ?pc)
    WHERE {
      ?f a :Class_A ;
         ?p ?t ;
         !rdf:type ?t .
      ?t a ?c .
    }
    GROUP BY ?p ?c

    N times

    [Diagram: ?f (a :Class_A) --?p--> ?t (a ?c)]
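The improvement can be sketched the same way: one pass per subject class that groups by predicate and object class, so N classes need N passes rather than N² pair lookups. This is an approximation in Python over in-memory triples, not the profiler's implementation; `relations_from_class` is a hypothetical name, and the sketch simply skips `rdf:type` triples, which appears to be the intent of the `!rdf:type` pattern in the query.

```python
from collections import Counter, defaultdict

RDF_TYPE = "rdf:type"


def relations_from_class(triples, class_a):
    """Count (predicate, object class) pairs for triples whose subject is a class_a instance."""
    # Index every resource's declared classes once.
    types = defaultdict(set)
    for s, p, o in triples:
        if p == RDF_TYPE:
            types[s].add(o)

    counts = Counter()
    for s, p, o in triples:
        if p == RDF_TYPE:
            continue  # only relationships between instances are of interest
        if class_a in types.get(s, ()):
            # Mirrors: ?t a ?c . GROUP BY ?p ?c
            for object_class in types.get(o, ()):
                counts[(p, object_class)] += 1
    return counts


# Hypothetical toy dataset with three classes.
triples = [
    (":g1", RDF_TYPE, ":Class_A"),
    (":g2", RDF_TYPE, ":Class_A"),
    (":d1", RDF_TYPE, ":Class_B"),
    (":a1", RDF_TYPE, ":Class_C"),
    (":g1", ":relatedTo", ":d1"),
    (":g1", ":citedIn", ":a1"),
    (":g2", ":citedIn", ":a1"),
]
row_counts = relations_from_class(triples, ":Class_A")
```

A single call returns the whole row for `:Class_A`, and object classes with no relationship never show up, which matches the observation that most class pairs are unrelated.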

  • Gathering and providing metadata

    http://tm.dbcls.jp/tdp/

  • Searching example

    [gene] Class = bio2rdf:omim_vocabulary:Gene

    [disease] Class = bio2rdf:umls_vocabulary:Resource

    [disease] → [gene]

    OR

    [gene] → [disease]
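Searching in either direction can be pictured as a path search over the gathered class-to-class metadata. The sketch below uses a plain breadth-first search in Python; it is an illustration under assumptions, not the method used by the profiler or by any QA system. The function names and the `ex:` classes and predicates are hypothetical and do not come from Bio2RDF.

```python
from collections import deque


def find_class_path(edges, start, goal):
    """Breadth-first search over (subject_class, predicate, object_class) edges."""
    adjacency = {}
    for subject_class, predicate, object_class in edges:
        adjacency.setdefault(subject_class, []).append((predicate, object_class))

    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for predicate, nxt in adjacency.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(node, predicate, nxt)]))
    return None  # no path in this direction


def find_path_either_direction(edges, class_a, class_b):
    """Try class_a -> class_b first, then class_b -> class_a."""
    forward = find_class_path(edges, class_a, class_b)
    if forward is not None:
        return forward
    return find_class_path(edges, class_b, class_a)


# Hypothetical class-level metadata: (subject class, predicate, object class).
edges = [
    ("ex:Gene", "ex:article", "ex:Article"),
    ("ex:Article", "ex:mentions", "ex:Disease"),
    ("ex:Disease", "ex:inDataset", "ex:Dataset"),
]
path = find_path_either_direction(edges, "ex:Disease", "ex:Gene")
```

In this toy graph nothing leads from `ex:Disease` to `ex:Gene`, so the search falls back to the reverse direction and finds a two-step path through `ex:Article`.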

  • From [disease]

    [disease]

    Literal

    rdfs:Resource

    dcterms:Dataset

    rdf:type

    void:inDataset

    From [gene]

    Count    Predicate                             Object class
    608220   bio2rdf-omim:x-gi                     bio2rdf-gi:Resource!
    98639    bio2rdf-omim:article                  bio2rdf-pubmed:Resource
    95443    bio2rdf-omim:refers-to                bio2rdf-omim:Resource
    81438    bio2rdf-omim:vocabulary:refers-to     bio2rdf-omim:Gene
    28032    bio2rdf-omim:vocabulary:gene-symbol   bio2rdf-hgnc:Resource

  • Things to do

    To improve interoperability: integrate vocabularies with other related parties.

    To improve efficiency: call on data providers to provide metadata.

    To realize a QA system using metadata: discuss with LODQA developers.

  • Thank you!