Extracting static and dynamic model elements from textual specifications in humanities
Patricia Martín-Rodilla
César González-Pérez
Institute of Heritage Sciences, Spanish National Research Council
Santiago de Compostela, Spain.
Index
• Research Context & Problem
• Goal(s)
• Related Work
• Proposal:
o Proposal Overview
o Proposal Phases
• Case Study in Cultural Heritage Information Systems
• Discussion & Open Issues
Research Context
Information Systems are composed of different information dimensions:
…
Structural (STATIC)
Architectural
Behavioral
Methodological (DYNAMIC)
…
BUT, IS support human activities
SOFTWARE ANALYST
Software Textual Specifications
Documents about practices
…
Structural (STATIC) MODEL
Architectural MODEL
Behavioral MODEL
Methodological (DYNAMIC) MODEL
…
BUT, in Humanities information…
• Narrative-based domains
• Importance of the methodological context of information (static and dynamic link very pronounced)
• Software analysts must invest considerable effort in each information dimension
• Software analysts are far from DH expertise
Goal(s)
• To study how other works deal with the different information dimensions from a holistic point of view, and in particular:
o For humanities IS
o Directly from textual specifications in the early stages of software conception
• To propose a pipeline method as a tentative semi-automatic approach for our needs in humanities domains
Related work
Works in modelling and automatic extraction of DIFFERENT information dimensions
• Methods (Domain rules)
• Processes (Process Mining)
• Notations (BPMN, Topic maps, Mind Maps, Concept Maps, i*,…)
• Practices (Scenarios)
Works in HOLISTIC modelling and automatic extraction of information dimensions
• Open/METIS
• ISO/IEC 24744
• Requirements: Cross-cutting concerns
NEED: From early stages textual specifications?
NEED: More than a conceptual bridge…
Semi-supervised?
Proposal
Pipeline approach, based on previous work: the TextProcessMiner tool (Epure, Martin-Rodilla et al. 2015)
• Initial dynamic information -> Process Mining Algorithms: Activity Logs
• Initial static information -> Identification of domain key concepts: Concept map
Phase I: TextProcessMiner
• Natural Language Processing approach
• TextProcessMiner extracts activities from historical and archaeological official reports.
• Previously tested at CSIC, ADS…: in different languages, validated by the reports' authors.
• Locality principle in the activity identification: tree-based syntactic structure.
TextCleaner (Lemmatization, Automatic cleaning, activities recognition)
ActivityMiner
ActivityRelationshipMiner
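The activity identification step can be illustrated with a minimal sketch. It is not TextProcessMiner's implementation: the tree-based syntactic structure is swapped for a flat token window, the tokens are hand-tagged and lemmatized instead of coming from a real tagger, and the names `extract_activities` and `window` are hypothetical. The example sentence is a simplified, active-voice paraphrase of an archaeological report sentence.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (lemma, coarse POS tag); hand-tagged, an assumption


def extract_activities(tokens: List[Token], window: int = 4) -> List[Tuple[str, str]]:
    """Pair each verb with the closest noun that follows it within `window` tokens.

    This flat window is a stand-in for the locality principle; the original
    tool works on a tree-based syntactic structure instead.
    """
    activities = []
    for i, (lemma, tag) in enumerate(tokens):
        if tag != "VERB":
            continue
        for next_lemma, next_tag in tokens[i + 1 : i + 1 + window]:
            if next_tag == "NOUN":
                activities.append((lemma, next_lemma))
                break  # keep only the nearest noun
    return activities


# Simplified paraphrase: "The archaeologist inspects the spoil to recover any artefact."
sentence = [
    ("the", "DET"), ("archaeologist", "NOUN"), ("inspect", "VERB"),
    ("the", "DET"), ("spoil", "NOUN"), ("to", "PART"),
    ("recover", "VERB"), ("any", "DET"), ("artefact", "NOUN"),
]

activity_log = extract_activities(sentence)
print(activity_log)
```

A flat window cannot handle passive constructions or compound nouns such as side_of_trench, which is one reason a syntactic tree is the more suitable structure for real reports.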
Phase II: Preliminary Concept Map
Historical and Archaeological Methodological Textual Specifications
Discovered Log (DYNAMIC INFO DIMENSION)
• Automatic identification of domain key concepts
• Part of Speech (POS) tagging techniques: decoupling action verbs (activity candidates) from countable nouns (key concept candidates)
• Why concept maps?
o Intermediate formalization degree
o Learning potential
o Iterative methodology in concept map creation
• Why semi-automatic? Better results in annotation approaches in humanities
Entities Decoupling (POS tagging)
Activities Decoupling (POS tagging)
Cross-links matching (tree-based syntactic structure)
Preliminary Concept Map
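The decoupling step can be sketched as follows, assuming the discovered log is a list of "verb noun" strings. The function name `decouple` and the dictionary layout are hypothetical, and the sketch only groups concepts under each linking verb; it does not reproduce the tree-based cross-links matching.

```python
from typing import Dict, List, Tuple


def decouple(activity_log: List[str]) -> Tuple[List[str], Dict[str, List[str]]]:
    """Split 'verb noun' entries into concept candidates and linking-verb candidates."""
    concepts: List[str] = []           # key concept candidates, in order of appearance
    links: Dict[str, List[str]] = {}   # linking verb -> concepts it applies to
    for entry in activity_log:
        verb, noun = entry.split(maxsplit=1)
        concept = noun.capitalize()
        if concept not in concepts:
            concepts.append(concept)
        targets = links.setdefault(verb.upper(), [])
        if concept not in targets:
            targets.append(concept)
    return concepts, links


# A few entries in the style of an archaeological activity log
log = [
    "excavate trench",
    "use bucket",
    "inspect side_of_trench",
    "inspect spoil",
    "recover artefact",
]

concepts, links = decouple(log)
print(concepts)
print(links)
```

The result is deliberately incomplete: each link still lacks its source concept, which is what the analyst supplies or verifies in the supervised phase.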
Phase III: Supervised Phase
Preliminary Concept Map
Iterative Phase
• Concepts and activity names verification: terminology, synonyms
• Order and dependence cross links verification
• Domain key concepts learning
Pipeline offers to Software Analysts:
- Identification of the most important concepts in the domain, in a learning environment
- Activities identification and logs
- Preliminary static and dynamic link in the domain's terminology
Pipeline is currently used:
- As a preliminary tool for extracting a holistic information view from early-stage textual specifications.
- As a tool for improving the model quality in terms of humanities terminology.
Supervised Concept Map + Activity Log sequence
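One iteration of the supervised phase might look like the sketch below, assuming the analyst's decisions are captured as two plain dictionaries. The function `supervise`, both dictionaries, and the "Artifact" spelling variant are hypothetical illustrations, not part of the original tool.

```python
from typing import Dict, List, Tuple


def supervise(
    links: Dict[str, List[str]],
    relabels: Dict[str, str],
    synonyms: Dict[str, str],
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Apply analyst corrections; return the reviewed links and the labels still pending."""
    reviewed: Dict[str, List[str]] = {}
    pending: List[str] = []
    for label, concepts in links.items():
        if label not in relabels:
            pending.append(label)  # not reviewed yet: revisit in the next iteration
        merged = reviewed.setdefault(relabels.get(label, label), [])
        for concept in concepts:
            canonical = synonyms.get(concept, concept)  # merge synonyms
            if canonical not in merged:
                merged.append(canonical)
    return reviewed, pending


preliminary = {
    "USE": ["Bucket"],
    "REVEAL": ["Deposit"],
    "RECOVER": ["Artefact", "Artifact"],  # hypothetical spelling variant
    "NUMBER": ["Layer"],
}
analyst_relabels = {
    "USE": "USES",
    "REVEAL": "ALLOWS REVEALING",
    "RECOVER": "ALLOWS RECOVERING",
}
analyst_synonyms = {"Artifact": "Artefact"}

reviewed, pending = supervise(preliminary, analyst_relabels, analyst_synonyms)
print(reviewed)
print(pending)
```

Returning the pending labels separately keeps the process iterative: each pass narrows the set of terms the analyst still has to check.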
Case Study: Extracting models in Cultural Heritage IS
Phase I: Extracting models in Cultural Heritage IS
Textual Specification
“The trench was excavated using a toothed bucket using the back actor of a small excavating machine. The watching brief archaeologist inspected the sides of the trench for any past cultural remains below the overburden. The removed spoil was inspected in order to recover any past cultural artefacts.
Where archaeological deposits were revealed, each layer, fill and cut was individually numbered and described in terms of soil detail, stratigraphic position, dimensions, artefact content, environmental samples and interpretation. The context system was cross-referenced to other records. Registers were maintained for all photographs, levels, plans, section, finds and samples taken, made or gathered in the field.”
(From ADS Archaeological Report, Gerry Martin Associates Ltd. Glasgow)
Discovered Log
- excavate trench
- use bucket
- use back_actor_of_machine
- inspect side_of_trench
- inspect spoil
- recover artefact
- reveal deposit
- number layer
- number fill
- number cut
- describe layer
- describe fill
- describe cut
- cross_referenced context_system
- maintain register
- take photograph
- take level
- take plan
- take section
- take find
- make photograph
- make level
- make plan
- make section
- make find
- gather photograph
- gather level
- gather plan
- gather section
- gather find
Phase II: Extracting models in Cultural Heritage IS
- Trench
- Bucket
- Back_actor_of_machine
- Layer
- Side_of_trench
- Level
- Cut
- Find
- Plan
- Section
- Fill
- Deposit
- Artefact
- Spoil
- Photograph
- Register
- Context system
Discovered Log String Concept Map + Activity List
Phase II: Extracting models in Cultural Heritage IS
Preliminary Concept Map
Eastgate, Hexham
Concepts: Trench, Bucket, Back_actor_of_machine, Layer, Side_of_trench, Level, Cut, Find, Plan, Section, Fill, Deposit, Artefact, Spoil, Photograph, Register, Context system
Linking phrases: EXCAVATE, USE, INSPECT, REVEAL, RECOVER, NUMBER, DESCRIBE, CROSS-REFERENCE, GATHER, MAINTAIN
Phase III: Extracting models in Cultural Heritage IS
Supervised Concept Map
Eastgate, Hexham
Concepts: Trench, Bucket, Back_actor_of_machine, Layer, Side_of_trench, Level, Cut, Find, Plan, Section, Fill, Deposit, Artefact, Spoil, Photograph, Register, Context system
Linking phrases: EXCAVATES, USES, INSPECTS, ALLOWS REVEALING, ALLOWS RECOVERING, NUMBERS, DESCRIBES, CROSS-REFERENCE, TO GATHER, HAS TO MAINTAIN
Discussion & Open Issues
• Work-in-progress proposal: holistic static and dynamic approach in information modelling
• Software analysts do not need previous domain knowledge to start creating models
• Maintenance of the semantic static and dynamic link in humanities domains' terminology
• Semi-supervised approach: software analysts can gradually learn domains' key concepts and practices
• Iterative pipeline: incremental improvement of the outputs
• Tested and evaluated by experts on historical and archaeological textual specifications
• Technological dependences: TextProcessMiner (NLP toolkit by Stanford) -> TOWARDS A METAMODEL
• Locality principle and synonyms limitations -> WordNet, CILI INTEGRATION
• Humanities sub-domains' adaptation: CH thesauri, ontologies
• Need for rigorous validation with a vast CH textual specifications corpus
• From activity list to Process Models (Process Mining tools integration: DISCO, etc.)
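As one possible first step from an activity list towards a process model, the sketch below counts directly-follows pairs between activity verbs, a relation many process mining algorithms start from. It assumes the order of the list reflects the order of the practice, which the source does not guarantee, and it uses a single trace; real discovery would need many reports. The function name `directly_follows` is hypothetical and no specific tool's format is implied.

```python
from collections import Counter
from typing import List


def directly_follows(trace: List[str]) -> Counter:
    """Count how often each activity is immediately followed by another."""
    return Counter(zip(trace, trace[1:]))


# First entries of an archaeological activity list, in text order (an assumption)
discovered_log = [
    "excavate trench",
    "use bucket",
    "use back_actor_of_machine",
    "inspect side_of_trench",
    "inspect spoil",
    "recover artefact",
]

# Work at the verb level so repeated actions on different objects collapse
verbs = [entry.split()[0] for entry in discovered_log]
dfg = directly_follows(verbs)

for (source, target), count in dfg.items():
    print(f"{source} -> {target}: {count}")
```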
Thank you for your attention
Patricia Martín-Rodilla
[email protected]
Institute of Heritage Sciences, Spanish National Research Council
Santiago de Compostela, Spain.