Issues and activities in authoring ontologies

Robert Stevens, School of Computer Science, University of Manchester, [email protected]

description

Departmental seminar at the Department of Computer Science, University of Birmingham, 6 November 2014.

Abstract: Ontologies are complex knowledge representation artefacts used across biomedical sciences, the media and other domains for defining terminologies and providing metadata. Their use is increasing rapidly, but so far ontology authoring tools have not benefited from empirical research into the ontology authoring process. Understanding how people build ontologies is key to developing tools that can properly support common authoring activities. In this talk I will first present the outcomes of qualitative interviews with ontology authors and the issues they reveal. Second, I will present the results of a study that identifies common activity patterns through analysis of the event logs, screen capture and eye-tracking data collected from the popular authoring tool, Protégé. Results from this bottom-up investigation suggest that the class hierarchy is the central focus of activity, playing a role beyond simple class representation. We also find that checking how updates affect the ontology is hard and that performance is hindered by inadequate support in the user interface. From this investigation we propose design guidelines for bulk editing, efficient reasoning and increased situational awareness in ontology authoring.

Transcript of Issues and activities in authoring ontologies

1. Issues and activities in authoring ontologies
    Robert Stevens
    School of Computer Science
    University of Manchester
    [email protected]

2. We need to know what we're talking about
  • If we don't, our data are useless
  • If we are to interpret our data then we need to know what entities it describes
  • We need to share data and re-use it
  • We need to find data; compare data; analyse data
  • We need to know what we know and agree about it

3. What is an Ontology?
  • Ontology (Socrates & Aristotle 400-360BC): the study of being
  • Word borrowed by computing for the explicit description of the conceptualisation of a domain:
      • concepts
      • properties and attributes of concepts
      • constraints on properties and attributes
      • individuals (often, but not always)
  • An ontology defines:
      • an agreement on the entities of a domain
      • a common vocabulary for the entities of a domain

4. Web Ontology Language (OWL)
  • W3C recommendation for ontologies for the Semantic Web
  • OWL-DL mapped to a decidable fragment of first order logic
  • Classes, properties and instances
  • Boolean operators, plus existential and universal quantification
  • Rich class expressions used in restriction on properties:
    hasDomain some (ImnunoGlobinDomain or FibronectinDomain)
  • Automated reasoners reveal entailments from the axioms of an ontology in OWL

5. OWL represents classes of instances
  [Figure: diagram with classes labelled A, B and C]

6. Some OWL and why it's hard
    Class: RanunculusRepens
    SubClassOf:
    * Flower,
    Flower
    and (hasFlowerSymmetry some RadialSymmetry)
    and (hasPart some
      (Androecium
      and (hasAndroecialFusion some Apostemonous)
      and (hasPart some
        (Stamen
        and (hasPart some Filament)
        and (hasPart some
          (Anther
          and (hasAntherAttachment some AdnateAntherAttachment)
          and (hasDehiscenceType some LongitudinalDehiscence)))))))
    and (hasPart some
      (Gynoecium
      and (hasGynoecialFusion some Apocarpous)
      and (hasPart some
        (Pistil
        and (hasPart some Carpel)
        and (hasPart some Style)
        and (hasPart some
          (Stigma
          and (hasStickiness some Stickiness)
          and (hasStigmaShape some HookedStigmaShape)))
        and (hasPart only
          (Carpel
          or Stigma
          or Style))))
      and (hasSexualPartArrangement some SpiralArrangement)))
    and (hasPart exactly 1 (Perianth

7. Some OWL and why it's hard
    Class: RanunculusRepens
    SubClassOf:
    * Flower,
    Flower
    and (hasPart some
      (Calyx
      and (hasPart exactly 5 (Sepal
        and (hasColour some Green)
        and (hasRegion some
          (BaseRegion
          and (hasForm some Truncate)))
        and (hasRegion some
          (MarginRegion
          and (hasSepalPetalFeature some Entire)
          and (hasSepalPetalFeature some Membranous)))
        and (hasRegion some
          (SurfaceRegion
          and (hasSepalPetalFeature some Pubescent)
          and (hasSurfaceSelector some LowerSurfaceSelector)))
        and (hasRegion some
          (SurfaceRegion
          and (hasSepalPetalFeature some Smooth)
          and (hasSurfaceSelector some UpperSurfaceSelector)))
        and (hasRegion some
          (TipRegion
          and (hasForm some Truncate)))
        and (hasSepalPetalFeature some PalmatelyNetted)
        and (hasSepalPetalShape some Ovate)
        and (hasSepalousity some Aposepalos)))))

8. Some OWL and why it's hard
    Class: RanunculusRepens
    SubClassOf:
    * Flower,
    Flower
    and (hasPart some
      (Corolla
      and (hasPart exactly 5 (Petal
        and (hasColour some Yellow)
        and (hasPetalousity some Apopetalos)
        and (hasRegion some
          (BaseRegion
          and (hasForm some Acute)))
        and (hasRegion some
          (MarginRegion
          and (hasSepalPetalFeature some Entire)))
        and (hasRegion some
          (TipRegion
          and (hasForm some Acute)))
        and (hasSepalPetalFeature some PalmatelyNetted)
        and (hasSepalPetalShape some Obovate)
        and (hasPart exactly 1 Nectary)))))
    and (hasPerianthArrangement some AlternatingPerianthArrangement)
    and (hasPart only
      (Calyx
      or Corolla))))

9. Describing potatoes
  [Figure: class hierarchy with nodes Potato, BoilingPotato, LateFirstEarlyPotato and Accent]
    Class: BoilingPotato
    EquivalentTo: Potato and hasPreferredCookingMethod some Boiling

    Class: LateFirstEarlyPotato
    EquivalentTo: Potato and hasCroppingTime some LateFirstEarlyCropping

    Class: Accent
    SubClassOf:
      Potato,
      hasPreferredCookingMethod some Boiling,
      hasYield some HighYield,
      hasCroppingTime some LateFirstEarlyCropping

10. Protégé
    protege.stanford.edu

11. Understanding how ontologies are authored in OWL
  • We want to understand how these complex, cognitively hard artefacts are authored
  • HCI approaches do not pervade all computing disciplines
  • Instruments to run user studies are scarce
  • Consequences for the OWL realm
      • No real understanding about the authoring process
      • Authoring tools are not human-centered
  • What if we want to go further?
      • Automatic detection of authoring patterns
      • Intelligent support for authoring

12. How we tackle the problem
  • Qualitative approach
      • Get familiarised with the problem
      • Set the scope
      • Acquire insights for the quantitative approach
      • Interview study
      • Thematic analysis
  • Quantitative approach
      • Collection of quantifiable data
      • Use of lab apparatus (eye-tracker, video, etc.)
      • Find authoring patterns
      • Quantify and generalise
      • Instrumentation of Protégé
      • Lab study
      • Data-driven analysis

13. Little is known about the human factors of ontology authoring
  • What we know is mostly based on anecdotal evidence
  • We asked about problems and strategies

14. Uncovering issues in ontology authoring
  • Exploration and navigation
      • Increase situational awareness by giving feedback about the consequences of actions: e.g. undo, reasoning
      • Provide overviews for those who are not familiar with a given ontology
      • For those who are familiar with an ontology, allow bookmarks and provide landmarks
      • Facilitate navigation through filters, faceted navigation mechanisms and hyperlinking entities

15. Uncovering issues in ontology authoring
  • Search and retrieval
      • Integrated support to search on remote ontologies and incorporate entities in the working ontology
  • Efficient authoring
      • Include design templates and spreadsheets
      • Provide on-the-fly reasoning capabilities
      • Remove information overload in explanations
      • Include predefined unit tests for evaluation

16. Protégé4US: a step towards having observational instruments
  • Protégé4US: Protégé for User Studies
  • Logging capabilities of:
      • Interaction events: click, hover, expand hierarchy...
      • Authoring events: add siblings, add restrictions...
      • Environment commands: reason, search, undo...
  Sample log:
    76585,2,Classes,Element edited,Juliette subclass of: Potato and hasCroppingTime some Maincropping
    77786,3,Classes,Save ontology,http://owl.cs.manchester.ac.uk/ontology/start-here.owl
    80204,3,Classes,Reasoner invoked,HermiT 1.3.8
    80647,1,Classes,Mouse entered, Class hierarchy (inferred)
    82910,1,Classes,Element hovered,Early_cropping_potato
    83049,1,Classes,Element selected,Early_cropping_potato
    83661,1,Classes,Hierarchy expanded,Early_cropping_potato

17. User study to show the strengths of Protégé4US
  • Experimental design:
      • Participants: 16 expert authors
      • Stimuli: a potato ontology and Protégé4US
      • 3 authoring tasks of increasing complexity
  • Collected data
      • Protégé4US logs: 10K events
      • Completion times
      • Self-reported expertise
      • Perceived task difficulty
      • Screen video and eye-tracking

18. Describing potatoes
  [Figure: class hierarchy with nodes Potato, BoilingPotato, LateFirstEarlyPotato and Accent]
    Class: BoilingPotato
    EquivalentTo: Potato and hasPreferredCookingMethod some Boiling

    Class: LateFirstEarlyPotato
    EquivalentTo: Potato and hasCroppingTime some LateFirstEarlyCropping

    Class: Accent
    SubClassOf:
      Potato,
      hasPreferredCookingMethod some Boiling,
      hasYield some HighYield,
      hasCroppingTime some LateFirstEarlyCropping

19. Protégé4US in action

20. Analysis of log data
  • Interaction events account for 65% of events while authoring events are 30%
  • The top 3 events (entity selection, description selection and invocation of editing menu) account for 56% of events

21. Analysis of log data
  • N-gram analysis of consecutive events suggests lots of repetition
      • Especially for entity selection and hierarchy expansion
      • Mouse-driven functionalities make this possible in Protégé
  • We built adjacency matrices for participants: number of transitions from event x to event y
  [Figure: frequency (0 to 1000) against n-gram size (2 to 10), one series per event: Class addition, Description selected, Entity selected, Entity selected(i), Hierarchy expanded, Hierarchy expanded(i)]

22. Reconstructing the interaction to identify patterns through visualisation
  • Left: web diagrams show most frequent transitions between states
  • Right: time diagrams show the authoring rhythm
  [Figure: web diagram and time diagram, panel label P8. Events shown: Back, Class addition, Convert into defined, Description selected, Description selected(i), Entity deleted, Entity dragged, Entity edited:start, Entity edited:finish, Entity renamed, Entity selected, Entity selected(i), Get explanation, Hierarchy collapsed, Hierarchy collapsed(i), Hierarchy expanded, Hierarchy expanded(i), Load ontology, Property addition, Run reasoner, Save, Set property, Undo. Time axis: 0 to 4000.]

23. Analysis of eye-tracking data
  • Distribution of aggregated dwell times in the areas of interest
  • The class hierarchy and the entity editing menu get the majority of fixations and dwell time

24. Analysis of eye-tracking data
  • Number of fixations between areas of interest
  • High frequency expected at the diagonal
  • Symmetry suggests checking behaviours
  • The class hierarchy is the pivotal window

25. Log data + eye-tracking data
  • Synchronised both data sources
  • Merged same consecutive events, e.g.
    class addition[t], class addition[t+1], class addition[t+2], entity selected[t+3]
    becomes M_class_addition[t+2], entity selected[t+3]
  • Computed N-gram analysis and found 3 main activities:
      • Exploration activity
      • Authoring activity
      • Reasoning activity

26. Exploration activity
  [Figure: transition diagrams. States: Load ontology, Select entity, Expand hierarchy, Select inferred entity, Expand inferred hierarchy, Select description. Edge weights shown: 0.48, 0.31, 0.25, 0.43, 0.12, 0.54, 0.52, 0.31, 0.29, 0.37]
  • Expand the asserted class hierarchy after loading an ontology
  • The exploration of the asserted hierarchy is about finding a specific location to add or modify an entity, while exploration of the inferred one is to check the state of the ontology

27. Editing activity
  [Figure: transition diagram. States: Select description, Select entity, Modify entity. Edge weights shown: 0.29, 0.37, 0.63, 0.59]
  • Sequence found 362 times
  • 22.6 times per participant
  • The high probabilities, along with the frequency with which this activity is performed, indicate that entities were modified in batches

28. Reasoning activity
  [Figure: transition diagram. States: Run reasoner, Convert into defined class, Save, Select description, Expand inferred hierarchy, Select entity, Select inferred entity. Edge weights shown: 0.17, 0.16, 0.15, 0.40, 0.30, 0.41, 0.37, 0.43, 0.54, 0.25, 0.12]
  • After running the reasoner, participants observe the consequences of reasoning on the asserted hierarchy and the description area
  • OR
  • To check classification, participants expand the inferred class hierarchy and make selections on inferred entities

29. Discussion
  • Ontology editing is highly repetitive
  • The class hierarchy received users' attention 45% of the time
      • Acts as an external memory of the ontology
      • Plays the role of an index with pointers to extended information
  • Navigation of the inferred hierarchy is exploratory, while the navigation of the asserted hierarchy is directed

30. Discussion
  • Some outcomes corroborate initial findings: repetitiveness of the editing task and lack of situational awareness after running the reasoner
  • Design recommendations
      • Support bulk editing
      • Place editing features close to the class hierarchy
      • Show entity descriptions close to the class hierarchy
      • Anticipate reasoner invocation
      • Make changes to the inferred hierarchy explicit

31. Acknowledgements
    Markel Vigo did the work.
    Caroline Jay and Robert Stevens helped out with design, analysis, and so on.

32. Issues and activities in authoring ontologies
    Robert Stevens
    School of Computer Science
    University of Manchester
    [email protected]: Answering What if... questions for Ontology Authoring.
    EPSRC reference EP/J014176/1
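
The log processing described on slides 16, 21 and 25 (parsing events, merging repeated events, counting n-grams, building a transition matrix) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the analysis code used in the study. The column meanings (timestamp, event category, tab, event name, detail) are inferred from the sample rows on slide 16, and the function names `parse_log`, `merge_repeats`, `ngram_counts` and `transition_probabilities` are hypothetical. Row-normalising the transition counts is also an assumption about how probabilities like those in the activity diagrams could be obtained.

```python
from collections import Counter, defaultdict
from itertools import groupby

# Sample rows copied from slide 16. Column meanings are assumed:
# timestamp, event category, tab, event name, free-text detail.
SAMPLE_LOG = """\
76585,2,Classes,Element edited,Juliette subclass of: Potato and hasCroppingTime some Maincropping
77786,3,Classes,Save ontology,http://owl.cs.manchester.ac.uk/ontology/start-here.owl
80204,3,Classes,Reasoner invoked,HermiT 1.3.8
80647,1,Classes,Mouse entered, Class hierarchy (inferred)
82910,1,Classes,Element hovered,Early_cropping_potato
83049,1,Classes,Element selected,Early_cropping_potato
83661,1,Classes,Hierarchy expanded,Early_cropping_potato
"""


def parse_log(text):
    """Return (timestamp, event_name) pairs, one per non-empty log line."""
    events = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split on the first four commas only, so the detail field may contain commas.
        timestamp, _category, _tab, name, _detail = line.split(",", 4)
        events.append((int(timestamp), name.strip()))
    return events


def merge_repeats(names):
    """Collapse each run of identical consecutive events into one 'M_' event."""
    merged = []
    for name, run in groupby(names):
        run_length = len(list(run))
        merged.append(name if run_length == 1 else "M_" + name)
    return merged


def ngram_counts(names, n):
    """Count every window of n consecutive events."""
    return Counter(tuple(names[i:i + n]) for i in range(len(names) - n + 1))


def transition_probabilities(names):
    """Row-normalised adjacency matrix: P(next event is y | current event is x)."""
    counts = defaultdict(Counter)
    for current, following in zip(names, names[1:]):
        counts[current][following] += 1
    return {
        current: {nxt: c / sum(row.values()) for nxt, c in row.items()}
        for current, row in counts.items()
    }


if __name__ == "__main__":
    names = merge_repeats([name for _, name in parse_log(SAMPLE_LOG)])
    print(ngram_counts(names, 2).most_common(3))
    print(transition_probabilities(names))
```

On real logs one would presumably run this per participant and compare n-gram frequencies over a range of sizes, as in the plot on slide 21. Keeping the matrix as nested dictionaries avoids fixing the set of event names in advance.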