Dataset Quality Ontology - An Engineering Experience
Dataset Quality Ontology: An engineering experience
Jeremy Debattista, University of Bonn / Fraunhofer IAIS
Germany
… who am I
• PhD Student at the University of Bonn
• Originally from Malta (tiny island in the Mediterranean between Italy and Libya)
deba*[email protected] 2
… who am I
• B.Sc (Hons) in Computer Science – University of Malta
  – Thesis: Collaborative Editing and Expert Finding
• M.AppSc in Computer Science – DERI (now Insight), National University of Ireland, Galway
  – Thesis: Ontology-based rules for User-Controlled Support in Ubiquitous Environments
… my PhD – the big picture
• Work related to Data Quality (in LD)
  – representing quality metadata (daQ)
  – assessing data quality (Luzzu)
  – identifying new metrics from standard vocabularies (like PROV-O) using different techniques for scalability.
… agenda
• Definitions of Quality
• Look at some quality aspects re: Ontology Engineering
• Our experience in developing daQ
  – contributions towards a W3C vocab
• VoCoL as a tool for collaborative vocabulary development
• VOWL as a tool for visual representation of ontologies
Robert Pirsig
… the result of care
Zen and the Art of Motorcycle Maintenance (1974)
Joseph Juran
Philip Crosby
… conformance to requirements
Quality is Free: The Art of Making Quality Certain. Mentor book. (1979)
… what is quality for you?
… Quality as defined in a dictionary
D1. how good or bad something is
D2. a characteristic or feature that someone or something has
D3. a high level of value or excellence
… definitions from http://www.merriam-webster.com
… dereferenceability
“Any HTTP URI should be dereferenceable, meaning that HTTP clients can look up the URI using the HTTP protocol and retrieve a description of the resource that is identified by the URI.” – Tom Heath, Chris Bizer: Linked Data Book (LDEvol)
[Figure: URI lookup answered with 303 See Other, then 200 OK]
… the process that retrieves a representation of the requested resource
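The lookup described on this slide (a request answered with 303 See Other and then 200 OK) can be sketched as a small check. This is a minimal sketch under stated assumptions, not the metric implemented in Luzzu: `fetch` is a hypothetical callable that returns a status code and a Location header, and the example URIs and canned responses are invented so the logic can be tried without network access.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical fetcher type: takes a URI, returns (HTTP status, Location header or None).
Response = Tuple[int, Optional[str]]
Fetch = Callable[[str], Response]

REDIRECTS = (301, 302, 303, 307, 308)


def is_dereferenceable(uri: str, fetch: Fetch, max_redirects: int = 5) -> bool:
    """Follow redirects (e.g. 303 See Other) until a 200 OK is reached."""
    current = uri
    for _ in range(max_redirects + 1):
        status, location = fetch(current)
        if status == 200:
            return True
        if status in REDIRECTS and location:
            current = location  # look up the document the server points to
            continue
        return False  # any other status: no description retrieved
    return False  # gave up after too many redirects


# Canned responses standing in for a real HTTP client (assumed example URIs).
RESPONSES: Dict[str, Response] = {
    "http://example.org/id/thing": (303, "http://example.org/doc/thing"),
    "http://example.org/doc/thing": (200, None),
    "http://example.org/id/missing": (404, None),
}


def fake_fetch(uri: str) -> Response:
    return RESPONSES.get(uri, (404, None))
```

Passing the fetcher in as a parameter keeps the redirect logic separate from the HTTP client, so a real client can be substituted later.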
… unknown external ontology used
• Usage of ontologies that cannot be dereferenced.
  – rdfs:domain
  – rdfs:range
  – rdfs:subClassOf
  – …
• Usage of deprecated Classes (classes tagged with owl:DeprecatedClass)
… licensing
“Specify an appropriate open data license. Data (in our case ontology) reuse is more likely to occur when there is a clear statement about the origin, ownership and terms related to the use of the published data” – BP for publishing Linked Data (https://www.w3.org/TR/ld-bp/)
• A way to define clear boundaries
• Inclusion of Machine and Human Readable License in the ontology’s meta information
… ontology declaration
• Describing an ontology using owl:Ontology (and voaf:Vocabulary – for LOV inclusion)
  – Metadata includes: creator, date modified, description, version info, preferred namespace uri, preferred prefix…
  – other provenance information such as history of changes etc…
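The metadata items listed on this slide, together with the licence from the previous one, lend themselves to a simple completeness check. The sketch below is illustrative only: triples are plain tuples of prefixed names, and the mapping of each item to a specific property (dct:, owl:, vann:) is an assumption, since the slides do not name the properties used.

```python
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]

# Assumed mapping from the listed metadata items to commonly used properties.
EXPECTED_METADATA = [
    "dct:creator",
    "dct:modified",
    "dct:description",
    "dct:license",
    "owl:versionInfo",
    "vann:preferredNamespaceUri",
    "vann:preferredNamespacePrefix",
]


def missing_metadata(triples: Iterable[Triple], ontology: str) -> List[str]:
    """Return the expected metadata properties not stated about the ontology."""
    present = {p for s, p, _ in triples if s == ontology}
    return [p for p in EXPECTED_METADATA if p not in present]


# Hypothetical ontology description used only for illustration.
EXAMPLE_ONTOLOGY = [
    ("ex:onto", "rdf:type", "owl:Ontology"),
    ("ex:onto", "dct:creator", "ex:someone"),
    ("ex:onto", "dct:description", '"An example vocabulary"'),
    ("ex:onto", "owl:versionInfo", '"0.1"'),
    ("ex:onto", "vann:preferredNamespacePrefix", '"ex"'),
]
```

Here `missing_metadata(EXAMPLE_ONTOLOGY, "ex:onto")` reports the modification date, the licence and the preferred namespace URI as absent.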
… domain and range definition
• Open domain-range is not recommended
• Reduces interoperability and thus “understanding” of resources’ properties
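A check for open domain-range definitions can be sketched in a few lines. This is a minimal sketch, assuming the ontology is available as tuples of prefixed names; the property names in the example are hypothetical, and a real tool would work on a parsed RDF graph.

```python
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]

PROPERTY_TYPES = {"rdf:Property", "owl:ObjectProperty", "owl:DatatypeProperty"}


def open_properties(triples: Iterable[Triple]) -> Dict[str, List[str]]:
    """Map each declared property to the domain/range statements it lacks."""
    triples = list(triples)
    declared = {s for s, p, o in triples if p == "rdf:type" and o in PROPERTY_TYPES}
    report: Dict[str, List[str]] = {}
    for prop in sorted(declared):
        stated = {p for s, p, _ in triples if s == prop}
        missing = [req for req in ("rdfs:domain", "rdfs:range") if req not in stated]
        if missing:
            report[prop] = missing
    return report


# Hypothetical vocabulary fragment used only for illustration.
EXAMPLE_PROPERTIES = [
    ("ex:author", "rdf:type", "owl:ObjectProperty"),
    ("ex:author", "rdfs:domain", "ex:Document"),
    ("ex:author", "rdfs:range", "foaf:Person"),
    ("ex:related", "rdf:type", "owl:ObjectProperty"),
    ("ex:title", "rdf:type", "owl:DatatypeProperty"),
    ("ex:title", "rdfs:domain", "ex:Document"),
]
```

In the example, ex:related has neither a domain nor a range, and ex:title lacks a range; ex:author is fully specified and is not reported.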
… ontology hijacking
• Redefinition of classes and properties in a vocabulary outside their natural namespace.
  – e.g. redefining foaf:Person in your own ontology to be a subclass of a newly defined Person concept.
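The foaf:Person example can be turned into a crude detection heuristic: flag any term from a foreign namespace that appears as the subject of a defining statement. This is only a sketch; the set of "defining" predicates and the prefix-based namespace test are assumptions, not an established definition of hijacking.

```python
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]

# Assumed set of predicates that (re)define the meaning of their subject.
DEFINING_PREDICATES = {
    "rdfs:subClassOf",
    "rdfs:subPropertyOf",
    "rdfs:domain",
    "rdfs:range",
    "owl:equivalentClass",
    "owl:equivalentProperty",
}


def hijacked_terms(triples: Iterable[Triple], own_prefix: str) -> List[str]:
    """Return foreign terms that this ontology makes defining statements about."""
    own = own_prefix + ":"
    return sorted(
        {s for s, p, _ in triples if p in DEFINING_PREDICATES and not s.startswith(own)}
    )


# Hypothetical ontology with prefix "my" used only for illustration.
EXAMPLE_HIJACK = [
    ("my:Person", "rdf:type", "owl:Class"),
    ("foaf:Person", "rdfs:subClassOf", "my:Person"),   # hijacks foaf:Person
    ("my:Student", "rdfs:subClassOf", "foaf:Person"),  # fine: extends, does not redefine
]
```

Reusing foaf:Person as a superclass is not flagged, because the statement is about my:Student; only statements whose subject lives in another namespace are reported.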
… consistency checking
• Possible problems when using axioms such as:
  – owl:InverseFunctionalProperty
  – owl:disjointClass
  – owl:disjointWith
  – owl:inverseOf
  – …
… other possible measures
• Multilingualism
• Human readable labels and comments
• Interlinking with similar terms/concepts
• Valid Syntax
• Un-typed classes and properties
• … others?
… the daQ meta-model
… daQ – History
• Describing Quality Metadata in a standardised manner
• Started around the end of 2013
• Forms the basis of the upcoming W3C Data Quality Vocabulary (DQV) standard
… daQ – The First Version
… daQ – Subsequent Versions
• Always pen, paper and a simple text editor
  – Git as a version control system
• 4 versions before the current version
• Use Case iteration testing
… daQ – 2nd Version
• Introduced: QualityGraph, and 3 levels of Abstraction (Based on Zaveri et al. categorisation)
[Figure: daQ 2nd version. Nodes: rdfg:Graph, QualityGraph, Category, Dimension, Metric, rdfs:Resource, xsd:dateTime. Edge labels: hasDimension, hasMetric, requires, dateComputed, value, computedOn.]
… daQ – Abstraction
• Hiding Complexity
• Distinction between daQ concepts and tangible quality measure concepts
• Abstract classes cannot be typed (rdf:type), but instead should be sub-classed (rdfs:subClassOf)
• There is no way to check for abstract class violation unless there is an application that checks such syntax errors.
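The kind of application mentioned in the last bullet could be as small as the following check. It is a sketch under assumptions: triples are tuples of prefixed names, and the abstract classes are taken to be daq:Category, daq:Dimension and daq:Metric, which is an assumption based on the three levels of abstraction named earlier.

```python
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]

# Assumed set of abstract daQ classes (the three levels of abstraction).
ABSTRACT_CLASSES = {"daq:Category", "daq:Dimension", "daq:Metric"}


def abstract_class_violations(triples: Iterable[Triple]) -> List[str]:
    """Return resources typed directly with an abstract class via rdf:type."""
    return sorted(
        {s for s, p, o in triples if p == "rdf:type" and o in ABSTRACT_CLASSES}
    )


# Hypothetical quality-measure definitions used only for illustration.
EXAMPLE_MEASURES = [
    ("ex:Accessibility", "rdfs:subClassOf", "daq:Category"),   # correct: sub-classed
    ("ex:accessibility1", "rdf:type", "ex:Accessibility"),     # correct: typed with the subclass
    ("ex:myMetric", "rdf:type", "daq:Metric"),                 # violation: abstract class typed directly
]
```

Only ex:myMetric is reported; sub-classing daq:Category and typing a resource with that subclass both follow the rule on the slide.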
… daQ – Why Abstract Properties?
• Best Practice to avoid doubt and ambiguity:
  – A metric is attached to one dimension only.
  – A dimension is attached to one category only.
• Unified view also presented in Zaveri et al. Data Quality Survey
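The "one dimension only" and "one category only" rules can be checked mechanically once the attachments are written down as triples. The sketch below is an illustration, not part of daQ: it uses the property names hasMetric and hasDimension with an assumed daq: prefix and hypothetical ex: resources.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]


def multiply_attached(triples: Iterable[Triple], predicate: str) -> Dict[str, List[str]]:
    """Map each object of `predicate` to its parents when it has more than one."""
    parents = defaultdict(set)
    for s, p, o in triples:
        if p == predicate:
            parents[o].add(s)
    return {child: sorted(ps) for child, ps in parents.items() if len(ps) > 1}


# Hypothetical attachments used only for illustration.
EXAMPLE_ATTACHMENTS = [
    ("ex:accessibility", "daq:hasDimension", "ex:availability"),
    ("ex:accessibility", "daq:hasDimension", "ex:licensing"),
    ("ex:availability", "daq:hasMetric", "ex:derefMetric"),
    ("ex:licensing", "daq:hasMetric", "ex:derefMetric"),   # same metric under two dimensions
    ("ex:licensing", "daq:hasMetric", "ex:licenceMetric"),
]
```

Calling the function with daq:hasMetric flags ex:derefMetric, which sits under two dimensions; calling it with daq:hasDimension returns an empty result because each dimension has a single category.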
… daQ – 3rd Version
• Introduced: The Data Cube Vocabulary
[Figure: daQ 3rd version. Nodes: rdfg:Graph, QualityGraph, qb:DataSet, Category, Dimension, Metric, rdfs:Resource, xsd:dateTime, qb:Observation. Edge labels: definesQBDataSet, hasDimension, hasMetric, dateComputed, requires, value, hasObservation, computedOn, metric, qb:dataSet.]
… daQ – 4th Version
• Modified: QualityGraph; Introduced: expectedDataType; Added: date to Observation
[Figure: daQ 4th version. Nodes: rdfg:Graph, QualityGraph, qb:DataSet, Category, Dimension, Metric, rdfs:Resource, xsd:anySimpleType, qb:Observation. Edge labels: hasDimension, hasMetric, expectedDataType, requires, value, hasObservation, computedOn, metric, qb:dataSet, dc:date.]
… daQ – can a metric return a value other than a simple data type?
“This property from DAQ is defined to have range xsd:anySimpleType. While it seems useful to define the expected data type for a metric, a simple type may too narrow: in many cases a metric will be determined on a data record or a subgraph.” – [Bailer Warner 28/10/2015] W3C DWBP Public Comments List - https://lists.w3.org/Archives/Public/public-dwbp-comments/2015Oct/0019.html
… daQ – 5th Version
• Introduced: isEstimate, computedBy
[Figure: daQ 5th version. Nodes: rdfg:Graph, QualityGraph, qb:DataSet, Category, Dimension, Metric, rdfs:Resource, xsd:anySimpleType, qb:Observation, xsd:boolean, prov:Agent. Edge labels: hasDimension, hasMetric, expectedDataType, requires, value, hasObservation, computedOn, metric, qb:dataSet, dc:date, isEstimate, computedBy.]
… daQ – Current Version
• Removed: computedBy, qb:Observation; Added: daq:Observation
[Figure: daQ current version. Nodes: rdfg:Graph, QualityGraph, qb:DataSet, Category, Dimension, Metric, rdfs:Resource, xsd:anySimpleType, daq:Observation, qb:Observation, prov:Entity, xsd:boolean, xsd:dateTime. Edge labels: hasDimension, hasMetric, expectedDataType, requires, value, hasObservation, computedOn, metric, qb:dataSet, sdmx-dimension:timePeriod, isEstimate.]
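To make the current model more concrete, the sketch below renders one observation as Turtle-like text. It is one reading of the diagram, not the normative daQ definition: attaching all six properties to daq:Observation, the daq: prefix on the property names, and every ex: identifier are assumptions, and prefix declarations are omitted.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """One quality observation, following the edge labels in the diagram."""
    uri: str
    metric: str
    computed_on: str
    value: str          # already serialised literal, e.g. '"0.87"^^xsd:double'
    is_estimate: bool
    time_period: str    # ISO 8601 timestamp
    dataset: str

    def to_turtle(self) -> str:
        estimate = "true" if self.is_estimate else "false"
        return "\n".join([
            f"{self.uri} a daq:Observation ;",
            f"    daq:metric {self.metric} ;",
            f"    daq:computedOn {self.computed_on} ;",
            f"    daq:value {self.value} ;",
            f"    daq:isEstimate {estimate} ;",
            f'    sdmx-dimension:timePeriod "{self.time_period}"^^xsd:dateTime ;',
            f"    qb:dataSet {self.dataset} .",
        ])


# Hypothetical observation used only for illustration.
EXAMPLE_OBSERVATION = Observation(
    uri="ex:obs1",
    metric="ex:derefMetric",
    computed_on="ex:myDataset",
    value='"0.87"^^xsd:double',
    is_estimate=False,
    time_period="2016-01-15T10:00:00Z",
    dataset="ex:qualityDataSet",
)
```

Calling `EXAMPLE_OBSERVATION.to_turtle()` yields a single subject block that a Turtle parser could read once the prefixes are declared.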
… involvement in W3C
• W3C Working Group – Data on the Web Best Practices
  – develop open data ecosystem
  – provide guidance to publishers
  – foster trust in data
• 3 Deliverables:
  – Best Practices
  – Data Quality Vocabulary (DQV)
  – Data Usage Vocabulary
… involvement in W3C - DQV
• A meta-model to cover many quality aspects of a dataset (linked data or not)
• The core component describing quantitative measures is inspired by daQ
… involvement in W3C - DQV
… involvement in W3C - DQV
• Notable issues (204, 205) between daQ and DQV (https://www.w3.org/2013/dwbp/track/issues/xxx) - where xxx is 204 or 205
  – Usage of abstract classes and properties
  – Defining Category-Dimension-Metric as subclass of skos:Concept
… collaborative framework
• VoCoL – an IDE for collaborative vocabulary development with VCS integration
• A (exchangeable) component based system
  – Human Readable Document Generation
  – Intelligent Turtle Editor
  – Evolution Tracker
  – Ontology Visualisation
  – SPARQL Endpoint Service
  – Client-Side validation before commit to VCS
• Online Demo: http://butterbur06.iai.uni-bonn.de/
… visualising ontologies - VOWL
• VOWL – A visual notation for OWL
  – Intuitive
  – Self-explaining
  – Comprehensible
  – Well-specified
  – Complete
  – Device-independent
http://vowl.visualdataweb.org
… daQ in VOWL
… References and Links
• (daQ) - Representing dataset quality metadata using multi-dimensional views – J. Debattista, C. Lange, S. Auer
• (DQV) - https://www.w3.org/TR/vocab-dqv/
• (VoCoL) - https://github.com/vocol/vocol
• (Zaveri et al.) - Quality Assessment for Linked Data: A Survey
• (LDEvol) - http://linkeddatabook.com/editions/1.0/
• (VOWL) - http://vowl.visualdataweb.org