Extracting static and dynamic model elements from textual specifications in humanities


Page 1: Extracting static and dynamic model elements from textual specifications in humanities


Patricia Martín-Rodilla

César González-Pérez

Institute of Heritage Sciences, Spanish National Research Council

Santiago de Compostela, Spain.

Page 2

Index

• Research Context & Problem
• Goal(s)
• Related Work
• Proposal:
  o Proposal Overview
  o Proposal Phases
• Case Study in Cultural Heritage Information Systems
• Discussion & Open Issues

Page 3

Research Context

Information Systems are composed of different information dimensions:

• …
• Structural (STATIC)
• Architectural
• Behavioral
• Methodological (DYNAMIC)
• …

BUT, IS support human activities

SOFTWARE ANALYST

• Software Textual Specifications
• Documents about practices

• …
• Structural (STATIC) MODEL
• Architectural MODEL
• Behavioral MODEL
• Methodological (DYNAMIC) MODEL
• …

BUT, in Humanities information…

• Narrative-based domains
• Importance of the methodological context of information (static and dynamic link very pronounced)
• Software analysts need considerable effort to handle the information dimensions
• Software analysts are far from DH expertise

Page 4

Goal(s)

• To study how other works deal with the different information dimensions from a holistic point of view, also:
  • For humanities IS
  • Directly from textual specifications in the early stages of software conception
• To propose a pipeline method as a tentative semi-automatic approach for our needs in humanities domains

Page 5

Related work

Works in modelling and automatic extraction of DIFFERENT information dimensions:

• Methods (Domain rules)
• Processes (Process Mining)
• Notations (BPMN, Topic maps, Mind Maps, Concept Maps, i*, …)
• Practices (Scenarios)

Works in HOLISTIC modelling and automatic extraction of information dimensions:

• Open/METIS
• ISO/IEC 24744
• Requirements: Cross-cutting concerns

NEED: From early stages textual specifications?
NEED: More than a conceptual bridge… Semi-supervised?

Page 6

Proposal

Pipeline approach, based on previous work: TextProcessMiner tool (Epure, Martin-Rodilla et al. 2015)

• Initial dynamic information -> Process Mining Algorithms: Activity Logs
• Initial static information -> Identification of domain key concepts: Concept map

Page 7

Phase I: TextProcessMiner

• Natural Language Processing approach
• TextProcessMiner extracts activities from historical and archaeological official reports.
• Previously tested at CSIC, ADS…: in different languages, validated by the reports' authors.
• Locality principle in the activity identification: tree-based syntactic structure.

Components: TextCleaner (lemmatization, automatic cleaning, activities recognition), ActivityMiner, ActivityRelationshipMiner
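The locality principle above can be illustrated with a minimal sketch. It is not TextProcessMiner's implementation: the parses are written by hand (an assumption, not tool output), the `Token` type and the `extract_activities` function are hypothetical names, and the dependency labels follow the style of Stanford typed dependencies. Each noun is paired only with the verb that directly governs it in the syntactic tree.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    idx: int     # 1-based position in the sentence
    lemma: str   # lemmatized form
    pos: str     # coarse part-of-speech tag
    head: int    # idx of the syntactic head (0 = root)
    deprel: str  # dependency label


# Hand-written parses (assumption) of two sentences from an archaeological report.
TRENCH = [  # "The trench was excavated."
    Token(1, "the", "DET", 2, "det"),
    Token(2, "trench", "NOUN", 4, "nsubjpass"),
    Token(3, "be", "AUX", 4, "auxpass"),
    Token(4, "excavate", "VERB", 0, "root"),
]
SPOIL = [  # "The removed spoil was inspected."
    Token(1, "the", "DET", 3, "det"),
    Token(2, "remove", "VERB", 3, "amod"),
    Token(3, "spoil", "NOUN", 5, "nsubjpass"),
    Token(4, "be", "AUX", 5, "auxpass"),
    Token(5, "inspect", "VERB", 0, "root"),
]

# Relations that attach the affected object to its verb.
OBJECT_RELATIONS = {"dobj", "nsubjpass"}


def extract_activities(tokens):
    """Pair each object noun with the verb that directly governs it."""
    by_idx = {t.idx: t for t in tokens}
    activities = []
    for t in tokens:
        if t.pos == "NOUN" and t.deprel in OBJECT_RELATIONS:
            head = by_idx.get(t.head)
            if head is not None and head.pos == "VERB":
                activities.append(f"{head.lemma} {t.lemma}")
    return activities


print(extract_activities(TRENCH))  # ['excavate trench']
print(extract_activities(SPOIL))   # ['inspect spoil']
```

Restricting the search to the direct head keeps "remove" (a modifier of "spoil") from being reported as a separate activity in this sketch; a real pipeline would need rules for compound objects such as side_of_trench.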

Phase II: Preliminary Concept Map

• Automatic identification of domain key concepts
• Part of Speech (POS) tagging techniques: decoupling action verbs (activity candidates) from countable nouns (key concept candidates)
• Why concept maps?
  • Intermediate formalization degree
  • Learning potential
  • Iterative methodology in concept map creation
• Why semi-automatic? Better results in annotation approaches in humanities

Historical and Archaeological Methodological Textual Specifications -> Discovered Log (DYNAMIC INFO DIMENSION) -> Entities Decoupling (POS tagging) and Activities Decoupling (POS tagging) -> Cross-links matching (tree-based syntactic structure) -> Preliminary Concept Map
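The decoupling step can be sketched as follows. This is a simplified stand-in rather than the authors' method: it assumes every log entry already has the form "verb object", so a plain string split replaces real POS tagging. The function name `build_preliminary_map` is hypothetical; the entries are a subset of the log discovered in the case study.

```python
from collections import defaultdict

# Subset of the discovered log from the case study, one "verb object" per entry.
DISCOVERED_LOG = [
    "excavate trench",
    "use bucket",
    "inspect side_of_trench",
    "inspect spoil",
    "recover artefact",
]


def build_preliminary_map(log):
    """Split entries into activities (verbs) and concepts (nouns) and link them."""
    cross_links = defaultdict(list)  # activity label -> concepts it is linked to
    concepts = []                    # concepts in order of first appearance
    for entry in log:
        verb, noun = entry.split(" ", 1)
        label = verb.upper()
        concept = noun.capitalize()
        if concept not in concepts:
            concepts.append(concept)
        if concept not in cross_links[label]:
            cross_links[label].append(concept)
    return concepts, dict(cross_links)


concepts, cross_links = build_preliminary_map(DISCOVERED_LOG)
print(concepts)
print(cross_links)
```

Keeping concepts and cross-links in separate structures mirrors the static and dynamic split: the concept list is the static side, while the verb labels carry the dynamic side.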

Page 8

Phase III: Supervised Phase

Preliminary Concept Map -> Iterative Phase -> Supervised Concept Map + Activity Log sequence

Iterative Phase:
• Concepts and activity names verification: terminology, synonyms
• Order and dependence cross-links verification
• Domain key concepts learning

Pipeline offers to Software Analysts:
- Most important concepts identification in the domain in a learning environment
- Activities identification and logs
- Static and dynamic preliminary link in domains' terminology

Pipeline is currently used:
- As a preliminary tool for extracting a holistic information view from early-stage textual specifications.
- As a tool for improving the model quality in terms of humanities terminology.
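One pass of the iterative phase can be sketched as applying the analyst's corrections to the preliminary cross-links. This is a minimal illustration under stated assumptions: the `supervise` function and its parameters are hypothetical, the relabelling values mirror labels that appear in the case study maps, and the "Artifact" spelling variant is invented purely to show synonym merging.

```python
def supervise(cross_links, relabel, concept_synonyms=None):
    """Apply analyst-verified link labels and concept synonyms to a concept map."""
    concept_synonyms = concept_synonyms or {}
    revised = {}
    for label, concepts in cross_links.items():
        new_label = relabel.get(label, label)          # keep unverified labels as-is
        linked = revised.setdefault(new_label, [])
        for concept in concepts:
            canonical = concept_synonyms.get(concept, concept)
            if canonical not in linked:                # merge duplicates after renaming
                linked.append(canonical)
    return revised


PRELIMINARY = {
    "USE": ["Bucket", "Back_actor_of_machine"],
    "INSPECT": ["Side_of_trench", "Spoil"],
    "REVEAL": ["Deposit"],
}
RELABEL = {"USE": "USES", "INSPECT": "INSPECTS", "REVEAL": "ALLOWS REVEALING"}

print(supervise(PRELIMINARY, RELABEL))
# Hypothetical spelling variant merged into the domain's preferred term.
print(supervise({"RECOVER": ["Artifact", "Artefact"]}, {}, {"Artifact": "Artefact"}))
```

Because the function returns a new map instead of modifying its input, each iteration can be compared with the previous one before the analyst accepts it.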

Page 9

Case Study: Extracting models in Cultural Heritage IS

Page 10

Phase I: Extracting models in Cultural Heritage IS

Textual Specification

“The trench was excavated using a toothed bucket using the back actor of a small excavating machine. The watching brief archaeologist inspected the sides of the trench for any past cultural remains below the overburden. The removed spoil was inspected in order to recover any past cultural artefacts.

Where archaeological deposits were revealed, each layer, fill and cut was individually numbered and described in terms of soil detail, stratigraphic position, dimensions, artefact content, environmental samples and interpretation. The context system was cross-referenced to other records. Registers were maintained for all photographs, levels, plans, section, finds and samples taken, made or gathered in the field.”

(From ADS Archaeological Report, Gerry Martin Associates Ltd. Glasgow)

Discovered Log

- excavate trench
- use bucket
- use back_actor_of_machine
- inspect side_of_trench
- inspect spoil
- recover artefact
- reveal deposit
- number layer
- number fill
- number cut
- describe layer
- describe fill
- describe cut
- cross_referenced context_system
- maintain register
- take photograph
- take level
- take plan
- take section
- take find
- make photograph
- make level
- make plan
- make section
- make find
- gather photograph
- gather level
- gather plan
- gather section
- gather find

Page 11

Phase II: Extracting models in Cultural Heritage IS

Concepts: Trench, Bucket, Back_actor_of_machine, Layer, Side_of_trench, Level, Cut, Find, Plan, Section, Fill, Deposit, Artefact, Spoil, Photograph, Register, Context system


Discovered Log String Concept Map+

Activity List

Page 12

Phase II: Extracting models in Cultural Heritage IS

Preliminary Concept Map (Eastgate, Hexham)

Concepts: Trench, Bucket, Back_actor_of_machine, Layer, Side_of_trench, Level, Cut, Find, Plan, Section, Fill, Deposit, Artefact, Spoil, Photograph, Register, Context system

Cross-links: EXCAVATE, USE, INSPECT, REVEAL, RECOVER, NUMBER, DESCRIBE, CROSS-REFERENCE, GATHER, MAINTAIN

Page 13

Phase III: Extracting models in Cultural Heritage IS

Supervised Concept Map (Eastgate, Hexham)

Concepts: Trench, Bucket, Back_actor_of_machine, Layer, Side_of_trench, Level, Cut, Find, Plan, Section, Fill, Deposit, Artefact, Spoil, Photograph, Register, Context system

Cross-links: EXCAVATES, USES, INSPECTS, ALLOWS REVEALING, ALLOWS RECOVERING, NUMBERS, DESCRIBES, CROSS-REFERENCE, TO GATHER, HAS TO MAINTAIN

Page 14

Discussion & Open Issues

• Work-in-progress proposal: holistic static and dynamic approach in information modelling
• Software analysts do not need previous domain knowledge to start creating models
• Maintenance of the semantic static and dynamic link in humanities domains' terminology
• Semi-supervised approach: software analysts can gradually learn domains' key concepts and practices
• Iterative pipeline: incremental improvement of the outputs
• Tested and evaluated by experts on historical and archaeological textual specifications

• Technological dependences: TextProcessMiner (NLP toolkit by Stanford) -> TOWARDS A METAMODEL
• Locality principle and synonyms limitations -> WordNet, CILI INTEGRATION
• Humanities sub-domains' adaptation: CH thesauri, ontologies
• Need for rigorous validation with a vast corpus of CH textual specifications
• From activity list to Process Models (Process Mining tools integration: DISCO, etc.)
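The last open issue, moving from an activity list to process models, could start from a flat event log. The sketch below is only an assumption about what such an export might look like: the column names `case_id`, `step` and `activity` and the function `log_to_event_csv` are hypothetical, the whole report is treated as a single case, and the position in the list stands in for a timestamp. The import format expected by any particular process mining tool is not specified here.

```python
import csv
import io


def log_to_event_csv(activities, case_id="case_1"):
    """Serialize an ordered activity list as a CSV event log (one row per step)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["case_id", "step", "activity"])
    for step, activity in enumerate(activities, start=1):
        # The step number preserves the order in which activities were discovered.
        writer.writerow([case_id, step, activity])
    return buffer.getvalue()


print(log_to_event_csv(["excavate trench", "use bucket", "inspect spoil"]))
```

With several reports, each report would presumably receive its own case identifier so that recurring sequences of activities can be compared across cases.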

Page 15


Thank you for your attention

Patricia Martín-Rodilla

Institute of Heritage Sciences, Spanish National Research Council

Santiago de Compostela, Spain.