Public Administration, Laws, Requirements, Natural Language


Alessio Ferrari

[email protected]

ISTI-CNR, Pisa, Italy

Ferrari (ISTI-CNR) PA, Laws, Requirements, NL 1 / 45

Preliminaries

Who am I?
Alessio Ferrari, Ph.D. in Computer Engineering
Three years at GE Transportation Systems S.p.A. (Modelling and Code Generation)
Three years at ISTI-CNR (Requirements Engineering and NLP)
Main interests: artificial intelligence, natural language

Content of this Talk
LearnPAd EU Project: model-based learning for Public Administrations (www.learnpad.eu)
Requirements in LearnPAd
Natural language pragmatic ambiguities

LearnPAd


Context

[Figure: context diagram. Labels: Regulator, Norm (Law, Regulation), Natural Language, Graphical Language, Public Administration, Procedure, Civil Servant, Citizen, Needs, Requirements Engineer, Requirement, Specification, Artifact, Software Procedure, User. Callout: "WHY NOT?"]

LearnPAd Project

FP7-ICT-2013.8.2 European Project
Model-based learning in the Public Administration (PA) domain
IDEA 1: PA procedures can be modelled with Business Process Model and Notation (BPMN)
IDEA 2: PA procedures can be enriched by civil servants with Natural Language (NL) descriptions

LearnPAd: BPMN


LearnPAd: Overview


LearnPAd: Overview

Quality of service of PA improved

Quick changes in PA procedures addressed

Process-driven learning provided

Informative learning provided

Procedural learning provided

Knowledge assessment performed

Knowledge sharing fostered

Learning support provided

Learners engaged

Meritocracy promoted

Quality of learning content ensured

Learning content accessed by learners

Learning content defined

Basic definition of learning content provided

Iterative definition of learning content provided

Cooperation fostered

Learning content increased

LearnPAd: Requirements Process

Objective
Achieve a clear and agreed set of requirements for the LearnPAd platform

EU Projects Peculiarities
number/distribution of partners: 9 partners, plenary discussion difficult
culture: Italy, France, Switzerland, Austria, Lithuania; need to meet/talk
industrial vs academic mindsets: 4 academic, 2 closed-source companies, 2 open-source, 1 PA; industries more practical in RE
background: different domains and terminology
abstraction: focus on specific background leads to lack of abstraction
age/roles: uneasiness of young vs old
objectives: requirements introduced to pursue specific interests
focus: the project is not the main activity of participants

What often happens...
Everyone develops their piece of the project → integration issues

LearnPAd: Requirements Process

[Figure: requirements process diagram. Labels: KJ Sessions (Learning, Modelling, Quality), Preliminary Requirements, Collaborative Requirements Sessions (Wiki), Structured Requirements, Justifications, Glossary, Tags, Requirements Analysis, Requirements Consolidation, Consolidated Requirements, Goal Modelling, Goal Model, Goals evaluation.]

KJ Sessions

Activity
24 people in 3 groups: Modelling, Learning, Quality
Description of the task by the moderator
Write requirements on cards
Discuss the requirements
Second session to add new requirements

People really excited and high degree of participation
Initial individual activity mitigated age/role effects and objective discrepancies
Second session to align terminology
Moderators: with recognized authority, or external (not representative of any group)
Still, most of the 249 requirements were poorly specified

Collaborative Refinement

Requirements uploaded in a Wiki platform (XWiki)
Justifications given and Refinements provided

People rather motivated (even if motivation was not perceived)
249 → 337 requirements
People do not contribute to the requirements of others

Still, requirements were poorly specified
A selected task force of project participants provided a set of 191 consolidated requirements
People directly asked to clarify their requirements
Excel sheets used for refinement and consolidation

Goal Modelling

Bottom-up goal model definition
From requirements to justifications (goals)
Provide higher degree of abstraction and spot out missing needs

Goal Models                     Stage 0 (R: -)   Stage 1 (R: 82)  Stage 2 (R: 78)  Stage 3 (R: 90)  Score
                                    G    S    E      G    S    E      G    S    E      G    S    E
Main                               24    4    3     32    4    5     24    4    4     24    4    4  H
Learning content accessed           -    -    -      9    0    1     32    4    1     47    5    4  H
Quality of WIKI Documents           -    -    -     17    0    2     17    0    2     17    0    2  M
Quality of BP Models                -    -    -     12    0    3     17    0    4     17    0    4  M
Learning support provided           -    -    -     13    0    1     17    1    1     17    1    1  H
BP Models edited                    -    -    -      -    -    -     15    0    2     15    0    2  M
BP Models reused                    -    -    -      -    -    -      8    0    0      8    0    0  M
Quality by logging                  -    -    -      -    -    -     15    0    0     15    0    0  M
Iterative definition of content     -    -    -      -    -    -     19    0    1     19    0    1  M
Platform flexibility enforced       -    -    -      -    -    -     11    0    0     11    0    0  H
Knowledge assessment                -    -    -      -    -    -      8    0    0     39    1    5  M
Procedural learning provided        -    -    -      -    -    -      -    -    -     24    2    1  L
TOTAL                              24    4    3     83    4   12    183    9   15    283   13   24

Table: Growth of the goal models at each stage. R = number of original requirements. G = number of hard-goals and requirements. S = number of soft-goals. E = number of expectations.

What have we learnt

People have to be trained in writing requirements
People from academia less confident in collaborative requirements elicitation
Too few user requirements → involve users in separate meetings
Need for a web-moderator/leader to motivate collaborative refinement
XWiki is good to get statistics on requirements
Goal modelling useful to have an abstract view and spot out missing needs, but requires effort
Tooling not appropriate for goal modelling and sharing (we preferred sharing with Google Docs but traceability was poor)
Integrated tools for the whole requirements process are missing

Improved Requirements Process

[Figure: improved requirements process diagram. Labels: Requirements Lesson, Preliminary Glossary, KJ Sessions (Learning, Modelling, Quality), Preliminary Requirements, Collaborative Requirements Sessions (XWiki), Web Moderator, Structured Requirements, Justifications, Glossary, Tags, Requirements Analysis, VOLERE Requirements Analysis, Consolidated Requirements and Justifications, Goal Modelling (Objectiver), Goal Model, Goals evaluation.]

LearnPAd: Quality of NL Descriptions


LearnPAd: Quality of NL descriptions ensured

[Figure: mock-up of the quality check for NL descriptions. Labels: BP Manager, BP Model, WIKI Doc, Load, Select Criteria, VALIDATE (Press Validate), Quality Evaluation Page, INSPECT (Press Inspect), Inspection Page, WIKI Doc (Non Editable), MODIFY.]

Quality Evaluation Page, criteria and example scores:
Complexity: 0.9 (Reduce)
Structuring: 0.1 (Increase)
Ambiguity: 0.7 (Reduce)

Inspection Page, example sentence:
"The document shall be sent to the proper authorities as soon as possible after the document has been signed by the officer"

LearnPAd: Quality of NL descriptions ensured

Objective
Identify typical NL defects of PA documents

Rationale
We do not have contributions of civil servants
We ask civil servants about their difficulties with their current documents
We identify quality defects of currently existing PA documents, normally edited (and read) by civil servants

Defects in NL Descriptions: Process

[Figure: process diagram for the identification of defects. Activities and artifacts shown:]
Perform Interviews
Define Questionnaire
Deliver Questionnaire
Evaluate Questionnaire
List of most relevant categories of defects to be detected in PA procedures
Evaluate Web-links defining guidelines for editing PA procedures
Define guidelines for editing PA procedures
Guidelines for editing PA procedures
List of categories of defects to be detected in PA procedures
Evaluate guidelines
Rule-based identifiable defects
Non-rule based identifiable defects
Define defect categories to be identified with machine-learning
Implement rule-based approach for the identification of most relevant defects
Select PA procedures from the Web
Select a sub-set of PA procedures as data-set
Tag data-set according to categories
Implement machine-learning approach

Defects in NL Descriptions: From the interviews

7 people interviewed
1 EU officer, 4 people from administrative staff of CNR (Research Institute), 2 municipality employees from the Marche Region
Which are the defects in the NL documents you deal with?

Defects
Most of the time, procedures are not described anywhere!
Cross-references with too many laws
Ambiguity and Vagueness
Lack of context
Redundancy

Defects in NL Descriptions


Using Collective Intelligence to Detect Pragmatic Ambiguities


Ambiguity in Natural Language Requirements

It would be nice to have formal requirements, but NL is the most widely understood communication code
NL is inherently ambiguous
Ambiguous requirements might cause misinterpretations among stakeholders
The developer/modeller might decide a possible interpretation of the requirement (unconscious disambiguation)
Ambiguities are lexical, syntactic, semantic, and...

PRAGMATIC

A Mole at Work

There is a MOLE at WORK

mh...

Pragmatic Ambiguities depend on the CONTEXT


Common Sense Knowledge

Domain Knowledge

Other Requirements

Other Situational Aspects


Approach for Pragmatic Ambiguity Detection


Domain knowledge acquisition for different readers

DOCUMENT SET 1 DOCUMENT SET 2


Different readers analyse the same requirement

REQUIREMENT


Different readers compare their interpretations


Overview

REQUIREMENT

DOMAIN DOCUMENTS

Domain Knowledge Graph Construction

Requirement Interpretation

Interpretation Comparison


Domain Knowledge Modelling

We model the domain knowledge as a weighted graph
Each node is a concept
Each edge represents a connection among concepts
The weight of the edge represents how close the connection between two concepts is
The lower the weight, the closer the connection
The weight is derived from the number of co-occurrences

We build this weighted graph starting from Web pages concerning the domain of the requirements document
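
The graph construction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method used in the talk: the slides say the weight is derived from the number of co-occurrences, and here it is assumed to be the reciprocal of that count, with co-occurrence assumed to mean "appearing in the same sentence". The function name `build_knowledge_graph` and the toy sentences are hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_knowledge_graph(sentences):
    """Build a weighted concept graph from tokenised (stemmed) sentences.

    Assumptions (not stated in the slides): two concepts co-occur when they
    appear in the same sentence, and the edge weight is 1 / co-occurrences.
    """
    counts = Counter()
    for tokens in sentences:
        # Each unordered pair of distinct concepts counts once per sentence.
        for a, b in combinations(sorted(set(tokens)), 2):
            counts[(a, b)] += 1

    graph = {}
    for (a, b), n in counts.items():
        weight = 1.0 / n  # more co-occurrences -> lower weight -> closer
        graph.setdefault(a, {})[b] = weight
        graph.setdefault(b, {})[a] = weight
    return graph


# Toy input standing in for text extracted from domain Web pages.
sentences = [
    ["patient", "hospit", "care"],
    ["patient", "hospit", "visit"],
    ["patient", "care"],
]
graph = build_knowledge_graph(sentences)
print(graph["patient"]["hospit"])  # co-occur twice
print(graph["patient"]["visit"])   # co-occur once
```

Storing the graph as a dictionary of dictionaries keeps the sketch dependency-free; a graph library could be used instead.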


Domain Knowledge Graphs

[Figure: two domain knowledge graphs built from different document sets. Nodes are stemmed concepts (patient, observ, death, locat, visit, time, care, inform, risk, sourc, sign, contact, hospit); edge weights shown range from 0.037 to 0.5.]

Lower weights indicate stronger connections


Requirements interpretation as a least-cost path search

Interpreting a requirement is activating the concepts of the requirement in the knowledge graph
Activating two concepts in a requirement implies the activation of other neighboring concepts
The concepts that are activated are those that are more closely connected with the concepts in the requirement (i.e., their edges have lower weight)
The interpretation of the requirement is a least-cost path search within the domain knowledge graph
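
As a rough illustration of interpretation as a least-cost path search, the sketch below runs Dijkstra's algorithm on a dictionary-of-dictionaries graph and collects the concepts found along the cheapest paths. The slides do not say which pairs of concepts are connected or which search algorithm is used; pairing consecutive concepts of the requirement and using Dijkstra are assumptions made here, and the toy graph and the names `least_cost_path` and `interpret` are hypothetical.

```python
import heapq


def least_cost_path(graph, source, target):
    """Dijkstra's algorithm; returns (cost, path) or (inf, []) if unreachable."""
    queue = [(0.0, source, [source])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbour, weight in graph.get(node, {}).items():
            if neighbour not in visited:
                heapq.heappush(queue, (cost + weight, neighbour, path + [neighbour]))
    return float("inf"), []


def interpret(graph, requirement_concepts):
    """Concepts activated along least-cost paths between consecutive concepts.

    Assumption: only consecutive concept pairs of the requirement are linked.
    """
    activated = set()
    for a, b in zip(requirement_concepts, requirement_concepts[1:]):
        _, path = least_cost_path(graph, a, b)
        activated.update(path)
    return activated


# Hypothetical fragment of a domain knowledge graph (lower weight = closer).
graph = {
    "store": {"database": 0.2, "data": 1.0},
    "database": {"store": 0.2, "data": 0.25},
    "data": {"database": 0.25, "store": 1.0, "patient": 0.5},
    "patient": {"data": 0.5},
}
cost, path = least_cost_path(graph, "store", "data")
interpretation = interpret(graph, ["store", "patient", "data"])
print(path, sorted(interpretation))
```

In this toy graph the cheapest route from "store" to "data" passes through "database", so "database" becomes part of the interpretation even though it does not appear in the requirement.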


Requirements Interpretation

REQ. 1 - The system shall store patient data

[Figure: interpretation of REQ. 1 in the knowledge graph. Concepts shown: system, store, patient, data, button, feedback, screen, database, retrieve, memory, content, location, vaccine, name, sickness, doctor, surname, ram, disk.]

Interpretation Comparison

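[Figure: two interpretations of the same requirement compared. Concepts shown: system, store, patient, data, button, feedback, screen, database, retrieve, memory, content, location, vaccine, name, sickness, doctor, noise, return, health, duration, care, surname, ram, disk. Counts shown in the figure: 9, 10, 5.]

$\sigma = \frac{9}{9 + 10 + 5} \approx 0.38 < \tau = 0.5$

AMBIGUITY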
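
The comparison step can be sketched as a set-overlap ratio. The reading assumed here is that 9 is the number of concepts shared by the two interpretations, while 10 and 5 are the concepts found in only one of them, which makes the ratio a Jaccard similarity; the slide does not name the metric, so this is an interpretation rather than the author's definition. The function names and the placeholder concept labels are hypothetical.

```python
def similarity(interpretation_1, interpretation_2):
    """Shared concepts divided by all concepts (Jaccard similarity)."""
    a, b = set(interpretation_1), set(interpretation_2)
    union = a | b
    if not union:
        return 1.0  # two empty interpretations are trivially identical
    return len(a & b) / len(union)


def is_ambiguous(interpretation_1, interpretation_2, threshold=0.5):
    """Flag a requirement when the interpretations overlap too little."""
    return similarity(interpretation_1, interpretation_2) < threshold


# Placeholder concepts reproducing the counts on the slide: 9 shared,
# 10 only in the first interpretation, 5 only in the second.
shared = {f"shared_{i}" for i in range(9)}
reader_1 = shared | {f"only_1_{i}" for i in range(10)}
reader_2 = shared | {f"only_2_{i}" for i in range(5)}

sigma = similarity(reader_1, reader_2)
print(round(sigma, 3), is_ambiguous(reader_1, reader_2))
```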


Issues on Coverage and Threshold

Coverage
The content of the domain documents shall cover the content of the requirements specification
Minimum coverage: $\rho = \frac{|\text{terms in requirements} \cap \text{terms in documents}|}{|\text{terms in requirements}|}$
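
A minimal sketch of the coverage ratio, assuming terms have already been extracted and normalised (for example stemmed) into plain strings. The function name `coverage` and the example terms are hypothetical.

```python
def coverage(requirement_terms, document_terms):
    """Fraction of requirement terms that also occur in the domain documents."""
    requirement_terms = set(requirement_terms)
    if not requirement_terms:
        raise ValueError("the requirements contain no terms")
    covered = requirement_terms & set(document_terms)
    return len(covered) / len(requirement_terms)


# Hypothetical stemmed terms.
requirement_terms = {"patient", "vaccin", "outbreak", "sampl"}
document_terms = {"patient", "outbreak", "sampl", "hospit", "care"}
rho = coverage(requirement_terms, document_terms)
print(rho)
```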

Threshold
Multiple analyses with different combinations of documents to compute similarities: $\bar{\sigma}(R_i)$ and $\sigma_{\min}(R_i)$
Thresholds computed as the average of the similarities for $R_1 \dots R_n$
$\tau_{\bar{\sigma}}$ and $\tau_{\sigma_{\min}}$ are the considered thresholds
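
One possible reading of the threshold computation is sketched below: for each requirement, the similarity is computed once per combination of documents, then averaged and minimised over the combinations; each threshold is the mean of the corresponding per-requirement values. The slides do not spell this out, so the aggregation is an assumption, and the function name and the numbers are made up for illustration.

```python
from statistics import mean


def thresholds(similarities):
    """Compute the two thresholds from similarities[i][k].

    similarities[i][k] is the similarity of requirement i under document
    combination k. Returns (tau for the mean, tau for the minimum).
    """
    sigma_bar = [mean(row) for row in similarities]  # average per requirement
    sigma_min = [min(row) for row in similarities]   # minimum per requirement
    return mean(sigma_bar), mean(sigma_min)


# Made-up similarities: 2 requirements, 3 document combinations.
similarities = [
    [0.5, 0.25, 0.75],
    [0.25, 0.25, 0.25],
]
tau_bar, tau_min = thresholds(similarities)
print(tau_bar, tau_min)
```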



Experimental Evaluation

Source
Requirement specification of a system for Outbreak Management (OM) issued by the Public Health Information Network (PHIN)
Data collection (names, vaccines, clinical samples) from people that might be affected by an epidemic health event

Set-up
114 requirements
43 include pragmatic ambiguities (manual)
25 domain documents
5 different combinations of documents

Experimental Evaluation: Domain Documents

ID   Title                                                      Link
d1   PHEMCE strategy                                            http://goo.gl/hYaipm
d2   Application to clinical and Public Health Practice         http://goo.gl/hVVy1Y
d3   Biodefense countermeasure Department of Defense            http://goo.gl/I6U0Ns
d4   Wikipedia page for “Case Definition”                       http://goo.gl/yPndtx
d5   Wikipedia page for “Chain of Custody”                      http://goo.gl/4uvTuc
d6   Definition of “Chain of custody”                           http://goo.gl/OUgcQd
d7   Communicable disease outbreak plan                         http://goo.gl/rV72wX
d8   Foodborne outbreak management                              http://goo.gl/pTlgp9
d9   Guidelines for the investigation and control of outbreaks  http://goo.gl/Sv4Ebu
d10  Practice guidelines of the infectious diseases             http://goo.gl/GjLvg2
d11  Implementation guide ambulatory healthcare                 http://goo.gl/qEiLGR
d12  Management of scabies outbreaks                            http://goo.gl/GUAbKS
d13  Modeling information systems architectures by P. Grefen    http://goo.gl/j2E4Lx
d14  Outbreak control                                           http://goo.gl/f0HC1h
d15  Outbreak management guidelines for healthcare              http://goo.gl/EcYVEi
d16  Surveillance and response in humanitarian emergencies      http://goo.gl/ybje6i
d17  PHIN guide for syndromic surveillance                      http://goo.gl/lEz8zw
d18  PHIN messaging guide for syndromic surveillance            http://goo.gl/3AAXNE
d19  Developing a management system: an overview                http://goo.gl/0l5sth
d20  Industrial system 800xA system architecture                http://goo.gl/RSaBnD
d21  System architecture and complexity                         http://goo.gl/v44tC0
d22  WHO guidelines for epidemic preparedness and response      http://goo.gl/PK9yn7
d23  Wikipedia page for “Management System”                     http://goo.gl/mgWfhh
d24  Wikipedia page for “Outbreak”                              http://goo.gl/LUQEWm
d25  Wikipedia page for “Scabies”                               http://goo.gl/fjYYrQ

Experimental evaluation: Combinations and Results

Combinations of Documents

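k  Graph  Documents                                        |V|    |E|     ρ
1  G1     d1 d3 d5 d7 d9 d11 d13 d16 d17 d19 d20 d23 d25   7131   62265   0.99
1  G2     d2 d4 d6 d8 d10 d12 d14 d15 d18 d21 d22 d24      5970   33325   0.98
2  G1     d2 d3 d6 d7 d10 d11 d15 d16 d17 d19 d22 d23      7383   49989   0.98
2  G2     d1 d4 d5 d8 d9 d12 d13 d14 d18 d20 d25 d20 d24   5826   46179   0.99
3  G1     d6 d7 d15 d22 d16 d23 d1 d9 d18 d25 d8 d14 d24   6375   58736   1
3  G2     d2 d10 d17 d3 d11 d19 d5 d13 d20 d4 d12 d20      6642   34882   0.98
4  G1     d6 d22 d16 d1 d18 d8 d24 d10 d3 d19 d13 d4 d20   6914   46384   0.99
4  G2     d15 d7 d23 d9 d25 d14 d2 d17 d11 d5 d20 d12      6400   49848   0.98
5  G1     d22 d1 d8 d10 d19 d4 d15 d23 d25 d2 d11 d5       6693   41735   0.99
5  G2     d6 d16 d18 d24 d3 d13 d20 d7 d9 d14 d17 d20 d12  6550   53973   1

For each combination k, G1 and G2 are the two knowledge graphs, |V| and |E| their numbers of nodes and edges, and ρ their coverage.

Precision and Recall

Threshold                               p     r
$\tau_{\bar{\sigma}} = 0.3247$          45%   58%
$\tau_{\sigma_{\min}} = 0.2781$         51%   63%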
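
For reference, precision and recall are computed from the set of requirements flagged by the tool and the set manually marked as ambiguous. The sketch below uses made-up requirement identifiers and does not reproduce the figures in the table; the function name `precision_recall` is hypothetical.

```python
def precision_recall(flagged, truly_ambiguous):
    """Precision and recall of the flagged set against the manual annotation."""
    flagged, truly_ambiguous = set(flagged), set(truly_ambiguous)
    true_positives = len(flagged & truly_ambiguous)
    precision = true_positives / len(flagged) if flagged else 0.0
    recall = true_positives / len(truly_ambiguous) if truly_ambiguous else 0.0
    return precision, recall


# Made-up identifiers: 4 requirements flagged, 3 manually marked ambiguous.
flagged = {"R1", "R2", "R3", "R4"}
truly_ambiguous = {"R2", "R3", "R5"}
p, r = precision_recall(flagged, truly_ambiguous)
print(p, r)
```

Here "R5" is a false negative: it lowers recall, which is the kind of error the talk treats as the main issue.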


Observations

Requirements analysis tools shall be tuned to favour recall over precision (Dan Berry)
False negative cases are the main issue

“Demographic information should be collected about the investigator [...]”
→ influence of the other terms in the computation of the similarity
“Mapping interfaces and data dictionaries must be defined [...]”
→ multi-word terms

Summary and Future Works

Unsupervised and statistical (not rule-based) method
Consider novel similarity metrics to emphasize the role of single ambiguous terms
Consider multi-word terms
Include the common-sense knowledge
  Concepts that are highly connected in the domain knowledge are less connected in the common-sense knowledge
Integrate structural and dynamic beliefs about the world and the domain within the knowledge graphs

Questions?
