
Mining and Analysing Security Goal Models in Health Information Systems

Jens H. Weber-Jahnke

[email protected]

Dept. of Computer Science, University of Victoria, Victoria, B.C., Canada

Adeniyi Onabajo

[email protected]

Dept. of Computer Science, University of Victoria, Victoria, B.C., Canada

Abstract

Large-scale health information software systems have to adhere to complex, multi-lateral security and privacy regulations. Such regulations are typically defined in the form of natural language (NL) documents. There is little methodological support for bridging the gap between NL regulations and the requirements engineering methods that have been developed by the software engineering community. This paper presents a method and tool support aimed at narrowing this gap by mining and analysing structured security requirements in unstructured NL regulations. A key value proposition of our approach is that requirements are mined "in-place", i.e., the structured model is tightly integrated with the NL text. This results in better traceability and enables an iterative rather than waterfall-like requirements extraction and analysis process. The tool and method have been evaluated in the context of a real-world, large-scale project, the Canadian Electronic Health Record.

1. Introduction

Governed by complex multilateral requirements, large-scale e-Health information software systems are increasingly networked on a global level. The information maintained, processed and exchanged in these systems is often highly sensitive. Over the last decade, the software requirements engineering community has developed methods for denoting, analysing and negotiating multilateral requirements. Many such methods employ a goal-based paradigm, i.e., they consider requirements in terms of hierarchies of potentially conflicting goals and subgoals. The diffusion of these methods into industrial practice remains low, however. One reason may be that security regulations are commonly written by policy analysts using natural language (NL) rather than by software engineers. Surveying 50 years of research in modelling and using NL legal regulations, Otto and Anton conclude that "extracting requirements from legal texts is still a difficult and error-prone process" [20]. Attempts have been made to bridge the gap between NL regulations and engineering models. Breaux and Anton propose a process of generating goal models from NL policies involving (1) goal mining and (2) semantic parameterisation [3]. The first step in this process aims at detecting goals in NL regulations by translating them to a set of restricted natural language sentences (RNLS). The second step embeds the linguistic goal model into a semantic formalism, enabling subsequent analysis and reasoning.

In this paper, we present an alternative approach to closing the gap between NL privacy regulations and semantic goal models. In contrast to Breaux and Anton's approach, our method does not rely on a linguistic normalization of NL regulations; instead, it is based on an in-place embedding of semantic annotations directly within the original NL documents. Since the original NL regulations are not replaced, they retain their readability and maintainability from the viewpoint of policy analysts. At the same time, the semantic annotations provide a basis for indexing, retrieving, navigating and analysing regulations. This approach provides for better traceability between NL regulations and goal models. Traceability is important because of the emergent and evolving nature of e-health software. It is not only an enabler for collaboration between policy analysts and system engineers, but also provides for impact analysis when policies or systems evolve. Otto and Anton point out that "Traceability within the context of regulatory systems takes on a far greater significance than we already afford it in the requirements engineering community" [20].

This paper makes three contributions: (1) we present an alternative approach to mining security goals using in-place semantic annotations rather than linguistic translations as in [3]; (2) we define a formal notion of conflict and interference between multilateral security goals, taking into account a notion of defeasible default goals and overriding exceptions; (3) we evaluate the method (and its supporting tool) with a real-world, large-scale e-Health system.

The rest of the paper is structured as follows: The next section introduces the case study used to evaluate and illustrate our method. Sec. 3 discusses work related to the area of security requirement analysis in general and goal-based modelling in particular. Sec. 4 describes the proposed method and its tool support in detail. Since the method consists of several activities, we will add discussions reflecting practical experiences with each activity directly to each subsection. We will describe an empirical evaluation in Sec. 5 and offer conclusions in Sec. 6.

2. Case study: Canadian Health Infostructure

Canada is currently developing a nation-wide, ultra-large-scale (ULS) system for maintaining, processing and exchanging electronic health records (EHR). The implementation of health services falls under the jurisdiction of individual provinces and territories, which may have different regulations. The decentralized nature of the Canadian health care system has contributed to the development of approximately 40,000 different heterogeneous health information systems [1]. A not-for-profit pan-Canadian organization called Health Canada Infoway (Infoway) has been created to facilitate the integration of all these systems into a joint infrastructure. Infoway has published an extensive set of regulations, standards and architecture designs at www.infoway.ca.

In this paper, we will focus on Infoway's security regulations [21]. However, we provide an overview of the architecture, as depicted in Fig. 1, in order to introduce the terminology. Jurisdictional boundaries are indicated with dashed lines. Components either belong to the shared EHR infrastructure (EHRi) or represent specific software systems at the various Points of Service (PoS). Examples of PoS systems are hospital record systems, laboratory information systems (LIS) and drug information systems (DIS). PoS systems may be operated by different organizations, and EHRi components may be hosted by different organizations, e.g., health authorities or service providers.

Infoway’s document contains 87 nominal security re-quirements and 28 nominal privacy requirements, some ofthem pertaining to the EHRi, others pertaining to organiza-tions hosting components of the EHRi, organizations con-necting to the EHRi, and PoS systems connecting to theEHRi [21]. The Infoway case study is suitable for study-ing security requirements engineering in a multilateral con-text, since there exist many inter-dependencies between In-foway regulations and other jurisdictional policies. SeveralInfoway requirements are specified as defeasible defaults(e.g., “Unless required by law...”) or conditionals(e.g.,“If required by law...”). Few current require-ments modelling methods provide concepts for expressingsuch defeasible or conditional goals. The Infoway require-ment in Fig. 2 is an example that combines a conditionalclause (“When consent is required...”) with a de-

feasibility clause (“where no exception...”). (Theacronym PHI stands for Personal Health Information.)

Figure 1. Simplified EHR architecture (diagram omitted: PoS systems such as hospital, LIS and DIS systems connect via a bus to EHRi components hosted by different organizations; the EHR infrastructure (EHRi) and jurisdictional boundaries are marked with dashed lines)

Figure 2. Requirement 12 - Consent directives

3. Related work

Otto and Anton survey approaches to formalizing and analysing NL regulations [20]. They point out that typical systems are governed not by single regulations but by collections of multilateral regulations. They find that formalisms for modelling regulations lack stakeholder comprehensibility. They elicit requirements for methods and tools supporting requirements engineers with respect to NL regulations. Out of these requirements, we specifically highlight the following four, because they are not adequately addressed by current methods — a fact that has motivated our approach. Tools and methods for engineering requirements with NL regulations should provide: (1) annotation of regulatory statements, (2) traceability between references and requirements, (3) a data dictionary and glossary to ensure consistency, and (4) navigation and searching.

Breaux and Anton bridge the gap between NL regulations and formalisms used in requirements engineering by translating the regulations into a set of restricted natural language sentences (RNLS) [3]. RNLS have a simple structure, each dealing only with a single actor, a single action and a single object. As such, they are in "close-to-model" form, yet they technically are still in NL and should remain accessible to stakeholders. Our own experiences with translating large NL policy documents to RNLS indicate that this accessibility is limited in practice. Complex policies easily give rise to many hundreds of RNLS, whose nested structure impedes their overall readability to a degree that it becomes difficult to comprehend documents on that level. Consider Fig. 3 as an example. It contains a translation of the previously presented Infoway requirement (Fig. 2) to RNLS. A stakeholder may read the requirement starting from sentences marked with an asterisk (*). While it is possible for laypersons to decipher the requirement's meaning, it requires significant effort. Kiyavitskaya et al. provide tool support for Breaux and Anton's goal mining method by automatically tagging text with concepts of interest (e.g., right, obligation, actor, action, object, etc.) [12].

Lau et al. present an XML schema for a semantic markup of legal requirements [13]. The approach uses semi-automated feature extraction but is not described in detail, and the markup schema is not completely presented. This is also the case for many publications on policy markup - a fact that impedes evaluation and comparison of these approaches [20]. One exception is Moens' work on comparing XML retrieval models for legislation [15]. However, her models are not semantic but only structural, marking hierarchies of NL regulations in terms of statutes, titles, chapters, sections, etc.

Requirements engineering methods tend to neglect the activity of processing NL regulations. Many methods for modelling and analysing early requirements employ a goal-based paradigm. The i∗ method, which originated in the agent-oriented modelling domain, is one of the most prolific approaches [24]. Key concepts in i∗ are goals, agents, resources, tasks and dependencies. In addition to strict goals, i∗ can express softgoals to reflect non-functional requirements and how they are influenced (positively or negatively) by the satisfaction of other goals. This way, i∗ allows engineers to analyse and negotiate trade-offs between goals [16]. The i∗ method provides the basis for an extended requirements engineering method called Tropos, which provides a formal semantics and the ability to express temporal properties of goals and dependencies [7]. A further extension of this method called Secure Tropos is particularly interesting for our project's focus on security requirements. Secure Tropos extends Tropos with the ability to express trust and delegation dependencies. Secure Tropos is supported by a tool that enables automatic detection of certain classes of conflicts, e.g., regarding authorization and trust [6]. Models are created using visual diagrams, but the method does not (explicitly) consider the mining of NL data sources as input. Published case studies indicate that considerable effort is spent on the "goal mining" step: Massacci et al. report that three months of work and significant interaction with policy experts were required to elicit requirements from a 300-page document [14].

RNLS_R12_1: law requires consent
RNLS_R12_2: EHRi receives PHI
RNLS_R12_3: EHRi stores PHI
RNLS_R12_4: EHRi processes PHI
RNLS_R12_5: EHRi transmits PHI
RNLS_R12_6: RNLS_R12_1 for RNLS_R12_2
RNLS_R12_7: RNLS_R12_1 for RNLS_R12_3
RNLS_R12_8: RNLS_R12_1 for RNLS_R12_4
RNLS_R12_9: RNLS_R12_1 for RNLS_R12_5
RNLS_R12_10: EHRi must be able to maintain association between PHI and usage consent directives
RNLS_R12_12*: if RNLS_R12_6 then RNLS_R12_10
RNLS_R12_13*: if RNLS_R12_7 then RNLS_R12_10
RNLS_R12_14*: if RNLS_R12_8 then RNLS_R12_10
RNLS_R12_15*: if RNLS_R12_9 then RNLS_R12_10
RNLS_R12_11: EHRi must be able to maintain association between PHI and disclosure consent directives
RNLS_R12_16*: if RNLS_R12_6 then RNLS_R12_11
RNLS_R12_17*: if RNLS_R12_7 then RNLS_R12_11
RNLS_R12_18*: if RNLS_R12_8 then RNLS_R12_11
RNLS_R12_19*: if RNLS_R12_9 then RNLS_R12_11
RNLS_R12_20: EHRi must be able to process consent directives before RNLS_R12_5
RNLS_R12_21*: if RNLS_R12_6 then RNLS_R12_20
RNLS_R12_22*: if RNLS_R12_7 then RNLS_R12_20
RNLS_R12_23*: if RNLS_R12_8 then RNLS_R12_20
RNLS_R12_24*: if RNLS_R12_9 then RNLS_R12_20
RNLS_R12_25: RNLS_R12_5 violates consent directives
RNLS_R12_26: EHRi discloses PHI
RNLS_R12_27: law permits RNLS_R12_26
RNLS_R12_28*: if RNLS_R12_6 and RNLS_R12_25 and not RNLS_R12_27 then prohibit RNLS_R12_5
RNLS_R12_29*: if RNLS_R12_7 and RNLS_R12_25 and not RNLS_R12_27 then prohibit RNLS_R12_5
RNLS_R12_30*: if RNLS_R12_8 and RNLS_R12_25 and not RNLS_R12_27 then prohibit RNLS_R12_5
RNLS_R12_31*: if RNLS_R12_9 and RNLS_R12_25 and not RNLS_R12_27 then prohibit RNLS_R12_5
RNLS_R12_32: EHRi notifies requestor
RNLS_R12_33*: if RNLS_R12_28 then RNLS_R12_32
RNLS_R12_34*: if RNLS_R12_29 then RNLS_R12_32
RNLS_R12_35*: if RNLS_R12_30 then RNLS_R12_32
RNLS_R12_36*: if RNLS_R12_31 then RNLS_R12_32

Figure 3. R12 in Fig. 2 translated to RNLS

Realizing that "it has always been difficult to bridge the gap between legal language and computer language", Compagna et al. propose a pattern-based approach to aid requirements engineers in constructing Secure Tropos models [4]. The hypothesis behind this work is that patterns help engineers learn about typical interactions related to security policies. They make no reference to using patterns for mining goals in NL regulations.

Other approaches for the goal-based definition of security requirements have been based on the KAOS methodology [17]. KAOS provides descriptive concepts similar to i∗ but is also capable of expressing temporal phenomena. Haley et al. have presented a comprehensive framework for representing and analysing security requirements [9]. NL regulations are mentioned in this framework as "core artifacts" for the identification of goals and requirements. However, the framework does not give any further guidance on how to systematically process NL regulations. He, Anton and Jones present Requirements-based Access Control Analysis and Policy Specification (ReCAPS) as a method to ensure better compliance between policies, requirements and system design [10]. Their experience from applying ReCAPS reinforces the importance of integrating NL documents into the overall software engineering process. The limited focus on access control in ReCAPS provides for the definition of heuristic templates to help with goal mining in NL policies.

Ghanavati et al. have researched a framework for tracking compliance between goal models and policy documents, with specific application to health care and privacy policies [5]. They use the i∗-based Goal-oriented Requirements Language (GRL) and introduce traceability links between GRL models and policy documents. In contrast to our approach, policy links are established only after model creation, rather than the goal models being derived from the policy.

In an earlier publication, we have presented a process and visual formalism for modelling confidentiality requirements [8]. The focus of the current paper is on mining security requirements from NL regulations.

4. In-place goal mining (IGM)

The in-place goal-mining (IGM) process incorporates four activities, laid out in Fig. 4 and summarized below.

1) Goal Annotation: Goals are identified in NL regulations and annotated with concepts from an ontology.
2) Structural Analysis: Goal annotation structures are analysed with respect to their structural completeness.
3) Terminological Sorting: Annotated concepts are consolidated into consistent terminologies.
4) Semantic Analysis: Goal models are analysed for consistency.

The result of this four-step process is a semi-structured set of NL regulations that are indexed with a common terminology and have been analysed (and potentially revised) with respect to completeness and consistency.

Figure 4. In-place goal mining process (diagram omitted: the four activities 1) Goal Annotation, 2) Structural Analysis, 3) Terminological Sorting and 4) Semantic Analysis transform NL regulations into model-embedded, semi-structured NL regulations, drawing on the annotation ontology, structural constraints, inference rules and consistency constraints as inputs)

4.1. Goal Annotation

4.1.1 Annotation Ontology

Semantic goal annotation requires an annotation schema, also called an annotation ontology, i.e., a conceptualization of the phenomena we are interested in, which lie in the domain of multilateral security regulations. We have developed an annotation schema using a frame-based ontology model [18].

We began the development of the annotation ontology by aligning the meta-models of three major goal-based requirements engineering methods, i.e., i∗/Secure Tropos [23], GBRAM [3] and KAOS [11]. i∗/Secure Tropos models represent actors, resources and dependencies (including dependencies specific to the security domain, e.g., trust and delegation). KAOS models further provide notions of actions that implement goals. GBRAM implements a nested subject, object, target, action (SOTA) pattern of representing goals. The annotation schema in Fig. 5 combines features selected from all three meta-models, but also adds concepts specific to the type of analyses we wanted to conduct.

Goal - Since our annotation ontology is goal-based, the overarching concept in it is the Goal. Like most other goal-based methods, we allow goals to be decomposed into subgoals. This decomposition (modelled with the sub relationship) provides for hierarchical structures of goals, also commonly called goal trees.1 Goal trees are typically modelled as and/or trees, because subgoals may be combined conjunctively or disjunctively. The nature of the composition is represented by the disjunctive attribute in our model. If this attribute is true for a composite goal, only one of its subgoals must be met in order to meet the composite goal. Goals may also be negative, i.e., they aim to avoid a certain situation. This concept is similar to anti-goals in the KAOS method.

SecGoal - Our interest is primarily in goals pertaining to information security. We currently distinguish between eight different security goals in our ontology:

• Create-goals create information

• Change-goals change content of information

– Amend-goals change information by adding to it

– Edit-goals change information arbitrarily

• Collect-goals gather information

• Disclose-goals provide information to actors

• Delete-goals destroy information

• Use-goals utilize information

Some readers may view the above concepts as actions rather than goals and, thus, we would like to briefly state our position on this question before continuing with the introduction of the annotation ontology. The difference between goals and actions is not precisely defined in current goal-based methods. The New Oxford Dictionary defines an action as the fact or process of doing something, typically to achieve a goal. When requirements are decomposed into goal trees, it is clear that the leaves of the trees are more actionable than the roots. However, it is not clear when to start calling them actions rather than goals. While experimenting with previous versions of our annotation ontology, we found that annotators tend to become uncertain and confused by having to choose between annotating a particular text span as a goal or an action. For this reason, our ontology uses goal concepts uniformly.

Actor - Actors are related to goals in several possible ways. They may define the goal (maker relationship), they may execute the goal (actor relationship), they may be the target of a goal execution (target relationship), or they may be able to override a goal (defeasible by relationship). The first three relationship types exist analogously in other goal-based methods.

1 The subgoal relationship should not be confused with goal sub-classing.


Figure 5. Annotation schema (diagram omitted: Goal, with Boolean attributes negative and disjunctive, is specialized (isa) into Nominal Goal, Funct Goal and Sec Goal; Sec Goal is further specialized into Collect, Disclose, Use, Create, Delete and Change, with Amend and Edit as kinds of Change. Goals relate to Actors via maker, actor*, target* and defeasible by*, to other goals via sub*, and to Information via object*. Information objects carry part of*, about* (pointing to the root concept THING), owner and custodian* relationships; Actors form a hierarchy via sub actor.)

We introduced the defeasible by relationship to provide a way to model regulations where particular actors are given the power to specify exceptions from default goals. We will see examples of this situation later. Actors are specialized in actor hierarchies (sub actor relationship).

Information - Similar to the concept of a dependum in i∗, representing the object of a goal, we consider Information objects, which are connected to goals by the object relationship. Information objects may contain other information objects (part of relationship). Actors may appear in the role of owners or custodians of information objects. Information is data about something or someone. This is represented by the about relationship pointing to the most general root concept in the Protege ontology, called THING.

Functional Goal - Occasionally, security regulations make reference to functional requirements of software components. We provide a concept for annotating text as functional goals in order to facilitate traceability to functional requirements specifications, e.g., use case models.

Nominal Goal - This last concept is introduced merely as a means to reflect and model the nominal structure of requirements in the NL regulation documents. This means that if the NL regulation refers to a particular text passage as "Requirement X", this passage can be annotated as a Nominal Goal X to reflect this chosen structure, independently of its internal cohesion.
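
To make the schema concrete, the following is a minimal sketch of how its core concepts could be captured as data structures. The class and attribute names mirror Fig. 5, but the Python rendering (and every identifier in it) is our illustrative assumption; the actual schema is realized as a frame-based Protege ontology [18].

from dataclasses import dataclass, field
from typing import List, Optional

@dataclass(eq=False)
class Actor:
    name: str
    super_actors: List["Actor"] = field(default_factory=list)  # sub actor hierarchy

@dataclass(eq=False)
class Information:
    name: str
    part_of: List["Information"] = field(default_factory=list)
    about: Optional[object] = None          # may point to any THING
    owner: Optional[Actor] = None
    custodians: List[Actor] = field(default_factory=list)

@dataclass(eq=False)
class Goal:
    name: str
    negative: bool = False                  # anti-goal: avoid the situation
    disjunctive: bool = False               # OR- vs. AND-composition of subgoals
    maker: Optional[Actor] = None
    actor: Optional[Actor] = None
    target: Optional[Actor] = None
    obj: Optional[Information] = None       # the 'object' relationship
    subgoals: List["Goal"] = field(default_factory=list)       # the 'sub' relationship
    defeasible_by: List[Actor] = field(default_factory=list)

# The goal classes of Fig. 5 can then be modelled as subclasses:
class NominalGoal(Goal): pass
class FunctGoal(Goal): pass
class SecGoal(Goal): pass
class Collect(SecGoal): pass
class Disclose(SecGoal): pass
class Use(SecGoal): pass
class Create(SecGoal): pass
class Delete(SecGoal): pass
class Change(SecGoal): pass
class Amend(Change): pass
class Edit(Change): pass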

4.1.2 A simple annotation example

In Step 1 of our IGM process, the concepts in the annotation ontology described above are used to "mark up" the NL regulations with a semantic structure. This step is currently a human-driven, tool-supported activity. The requirements engineer has to choose appropriate concepts from the ontology to annotate the proper text passages. Fig. 6 illustrates how the following (contrived) NL regulation is annotated with ontology concepts: Let us assume that health authority X has the following regulation: "Hospitals (actor) must not (negation) release (disclose) personal health information (object) about patients (about) to the general public (target)." Note that we use a shaded node to indicate the negation in Fig. 6. Of course, real-world regulations are rarely as simple as the example presented above. Goals are often nested (conjunctively or disjunctively), may conditionally depend on other goals, or are overridden by other goals. We will discuss real-world applications after briefly describing the tool support for the IGM process.

Figure 6. Simple example regulation (diagram omitted: a <<Disclose>> goal "release" with actor <<Actor>> Hospital, object <<Information>> PHI (about <<Actor>> patient, custodian Hospital), target <<Actor>> public, and maker <<Actor>> Health Authority)
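
Purely as an illustration of what such an in-place annotation yields (the encoding and all identifiers below are our assumption, not the CREE-tool data model), the Fig. 6 example could be captured as typed text spans plus relations between them:

# Hypothetical encoding of the Fig. 6 annotation: each annotation is a
# (concept, annotated text span) pair; relations connect annotation ids.
annotations = {
    "a1": ("Actor", "Hospitals"),
    "a2": ("Disclose", "release"),
    "a3": ("Information", "personal health information"),
    "a4": ("Actor", "patients"),
    "a5": ("Actor", "general public"),
    "a6": ("Actor", "health authority X"),
}
relations = [
    ("a2", "actor", "a1"),      # Hospital executes the (negated) disclose goal
    ("a2", "object", "a3"),
    ("a2", "target", "a5"),
    ("a2", "maker", "a6"),
    ("a3", "about", "a4"),
    ("a3", "custodian", "a1"),
]
negative = {"a2"}               # "must not": the shaded node in Fig. 6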

4.1.3 Tool support

The annotation process is interactive and supported by a tool called CREE-tool. It is a plug-in to Stanford University's Protege knowledge engineering workbench and extends another plug-in developed by the University of Colorado, called Knowtator [19]. CREE-tool allows users to annotate and analyse NL regulations with goal models. Users are provided with instant visual feedback during the annotation process. The concepts in the annotation ontology are colour-coded to match the text spans annotated with them. The annotation tool is context-sensitive with respect to the annotation schema, disabling invalid structural annotations. The user is presented with a continually updated graphical view, visualizing the goal structure mined so far. CREE-tool further provides analysis operations for IGM process steps 2-4 (cf. Fig. 4).


Figure 7. CREE-tool

Fig. 7 shows a screenshot of CREE-tool with the annotation schema on the left side, the NL text pane in the centre, and the goal model visualizer on the right side. We are not using auto-generated diagrams in the rest of this paper, because manually drawn diagrams use less space and the colour-coding does not print well in black and white.

4.1.4 Annotating e-health regulations

Let us revisit our case study. Fig. 8 shows a goal model that results from annotating our example Infoway requirement from Fig. 2. Note that we use bold arrows to indicate subgoal (sub) relationships in this diagram to improve readability. We start the mining process by annotating easily identifiable concepts (i.e., the "low hanging fruit"), e.g., actors such as the EHRi and the requestor, information resources such as PHI and consent directive, positive disclosure goals such as transmitting and notify, as well as negative disclosure goals such as block transmission.

Some concepts may not be referred to explicitly in the NL text. For example, R12 in Fig. 2 mentions the possibility of exceptions outlined in law. This indicates an actor legislature who can specify regulations that defeat R12.

Figure 8. Requirement 12 - Goal Model (diagram omitted: the <<Nominal goal>> Privacy Requirement 12, with maker Infoway, actor EHRi and defeasible by the actor "law", decomposes (AND) into a <<Security goal>> "receiving, storing, processing, or transmitting" with object PHI and an AND-subgoal <<Functional goal>> "maintain the association" over the consent directives (which are about PHI), and a <<Functional goal>> "respond to request" targeting the <<Actor>> Requestor. The latter decomposes (AND) into the <<Use>> goal "process consent" over the consent directives and the <<Functional goal>> "respond or block", which decomposes (OR) into the <<Disclose>> goal "transmitting" (with a negated subgoal "violate the directives") and the negated <<Disclose>> goal "block the transmission" (with subgoals "notify" and "violate the directives").)

This defeasibility is expressed with the defeasible by association in the annotation ontology. Despite the fact that R12 does not refer to the actor explicitly, we annotate the text span "law" with the actor concept in Fig. 8. Other concepts that may not be mentioned explicitly often refer to functional goals that are typically described elsewhere, e.g., in use case specifications. Examples are the maintain the association between PHI and the consent directives and respond to request functional goals in Fig. 8. While the existence of these functionalities is indicated in Fig. 2, the actual functions are described in the functional requirements documentation of the EHRi.

Part b of R12 refers to an information usage step ("process consent directive") and essentially expresses a conditional choice based on the outcome of this usage, i.e., to allow transmission in case of positive consent or to block transmission if consent was withheld. This choice can be modelled in a goal-oriented paradigm by two opposite functional goals (violate the directives) as part of the two alternatives (transmitting or block). Our approach currently abstracts from temporal properties such as the one indicated in part b: "process consent directives before transmitting..." We decided to abstract from temporal phenomena at this time for the sake of simplicity and consider adding such properties in the future. Another phenomenon that we have occasionally encountered is goals which depend on the existence of other goals. Note that this is different from goals requiring the satisfaction of other goals in order to be satisfied. R12 contains an example by prefacing the regulation with the condition "When consent is required...". Our current method abstracts from such phenomena as well.

In terms of scalability, we experienced convoluted goal graph renderings when requiring each goal to denote an actor, target, object, and maker (cf. Fig. 5). However, we recognized that subgoals often share many of the same values for these attributes as their parents (or parents' parents, etc.). Therefore, we have introduced the convention of inheriting values for these attributes from parent goals in case they are left undeclared in subgoals. As an example, consider the transmitting goal in Fig. 8: it inherits its target from the functional goal respond to request and its object, actor and maker from the top-level nominal goal. This inheritance convention greatly reduced the annotator's work in our case studies, yet it also has its drawback, as we will point out in the next section. (The reader may have spotted the problem already.)

The case study described in Sec. 2 has generated many other interesting goal models, but we will keep using our example requirement R12 for describing the subsequent IGM activities, because of space constraints. The full set of mined goals for the Health Canada Infoway case study will be published in an extended technical report.

4.2 Structural Analysis

The objective of the structural analysis activity is to verify the structural completeness of the annotated goal model. This step aids the user in finding errors and omissions. Iterations between this activity and the annotation activity are normal. The user triggers the structural analysis from the CREE-tool user interface (button "Check Incompleteness" in Fig. 7), which causes the analysis engine to highlight missing structural elements in the goal model. The analysis function has been defined declaratively in terms of UML Object Constraint Language (OCL) invariants and programmed in Java. Some invariants simply check conformance to the cardinality constraints in the annotation ontology, while others are more complex, e.g., checking for the existence of specified or inherited properties. The expression given in Fig. 9 checks for the existence of an object, target, and actor property for each goal. It further enforces an acyclic goal structure, requires that goals are either top level (nominal goals) or children of other goals, and requires that nominal goals have a maker. Note that we utilize an extended version of OCL proposed by Schurr in [22]. Schurr introduces a transitive closure operator (*) for navigating over "zero or more" instances of a specified path. This operator provides for a more concise definition of the "inheritance" convention introduced in the previous section. Moreover, note that super denotes the role name that navigates the sub association in our annotation ontology from a subgoal to its parent in Fig. 5.

context Goal inv:
  self.super*.object->notEmpty() and
  self.super*.target->notEmpty() and
  self.super*.actor->notEmpty() and
  not self.super.super*->includes(self) and
  self.super->isEmpty() implies oclIsTypeOf(NominalGoal) and
  oclIsTypeOf(NominalGoal) implies self.maker->notEmpty()

Figure 9. Structural constraint - example
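
For readers less familiar with OCL, the following sketch shows one way the invariant of Fig. 9 could be checked programmatically over annotated goals. The Goal attributes mirror the annotation schema, but the Python code and its names (supers, violations, etc.) are our illustrative assumption, not the actual Java implementation in CREE-tool.

from dataclasses import dataclass, field
from typing import List, Optional

@dataclass(eq=False)
class Goal:
    name: str
    nominal: bool = False
    maker: Optional[str] = None
    actor: Optional[str] = None
    target: Optional[str] = None
    obj: Optional[str] = None
    parents: List["Goal"] = field(default_factory=list)   # the 'super' role of sub

def supers(g):
    """self.super*: the goal itself plus all transitive parents (cycle-safe)."""
    seen, stack = [], [g]
    while stack:
        x = stack.pop()
        if x not in seen:
            seen.append(x)
            stack.extend(x.parents)
    return seen

def violations(g):
    """Checks mirroring the OCL invariant of Fig. 9; returns human-readable findings."""
    out = []
    chain = supers(g)
    for attr in ("obj", "target", "actor"):
        if not any(getattr(x, attr) for x in chain):
            out.append(f"{g.name}: no declared or inherited {attr}")
    if any(g in supers(p) for p in g.parents):             # acyclic goal structure
        out.append(f"{g.name}: cyclic subgoal structure")
    if not g.parents and not g.nominal:                    # top level implies nominal
        out.append(f"{g.name}: top-level goal is not a nominal goal")
    if g.nominal and not g.maker:                          # nominal implies maker
        out.append(f"{g.name}: nominal goal without a maker")
    return out

For example, a goal without an object anywhere along its super chain would be reported, matching the behaviour discussed for requirement R12 in the next subsection.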

4.2.1 Analysing e-health policy structures

Using the structural analysis function has proven useful for finding annotation errors and omissions. We also frequently found NL regulations to be incomplete or implicit with respect to our structural constraints. Let us revisit our example requirement R12 to illustrate this situation: The first paragraph of the Infoway policy in Fig. 2 does not specify from whom the PHI would be received, where it would be stored, who would be allowed to process it, and to whom it would be transmitted. Consequently, the annotated goal model in Fig. 8 does not contain a target for the goal receiving, storing, processing or transmitting and would thus be flagged by the structural analysis rule in Fig. 9. Nevertheless, most human readers would understand such an omission as indicating a universal role that applies to any target. We allow the annotator to make this knowledge explicit by setting the property to the special value ANY.

Not all kinds of omissions indicate universal properties. Another kind of omission is contained in the last paragraph of Fig. 2: Here, Infoway requires the EHRi to notify the requestor whenever data is blocked. However, there is no explicit mention of what information such a notification should include. Clearly, this omission does not imply that the notification may include any arbitrary choice of information. The regulation is simply silent in this regard. Unfortunately, the structural constraint in Fig. 9 may miss this omission, due to a drawback of our convention of "inheriting" properties from parent goals if they are not annotated in subgoals. This is the drawback indicated at the end of Sec. 4.1.4: Consider the model in Fig. 8, which annotates the word "notify" as a disclosure subgoal of a parent goal that has an object property (the top-level goal). In this case, our annotation convention would allow the goal notify to inherit the object transitively from its parents, i.e., the OCL constraint in Fig. 9 would not indicate a structural omission. The omission would have been detected without our inheritance convention. However, as indicated earlier, we found that the extent to which the inheritance convention reduces the annotator's work and simplifies the resulting goal models by far outweighs this drawback with respect to structural analysis. Moreover, the problem does not remain undetected but will be picked up in the subsequent semantic analysis, as we will describe in Sec. 4.4.

4.3 Terminological sorting

The objective of this activity is to compile a consistent terminology of annotated concepts. The user interactively resolves terminological conflicts due to homonyms and synonyms, and arranges concepts in hierarchical taxonomies. A particular result of the activity is an actor hierarchy and an information hierarchy (cf. the two associations sub actor and part of in Fig. 5). Each entry in either hierarchy contains a thesaurus (set) of terms that are considered synonymous in the context of the NL policies being mined, e.g., the terms "patient" and "client" may be considered synonymous in the context of the EHR project. If several synonyms exist, one of them has to be indicated as the fully qualified term. At the end of the terminological sorting activity, in which the user can interactively rearrange and rename terms to build up the actor hierarchy and the information hierarchy, all homonyms must be eliminated among the fully qualified terms.

Another issue is the use of pronouns in NL documents, e.g., it, he, she, they, etc. We found that most pronouns can easily be reduced to their antecedents (i.e., the thing they refer to) directly during annotation and encourage the annotator to do so. Still, some annotators may find it easier to resolve pronouns after annotating the goal structure. We provide functionality to do so. The list below summarizes the operations involved in the terminological sorting activity (an illustrative sketch follows the list):

• Make one term a child of another term,

• Merge two terms into one (combine their synonyms),

• Split one term into two (divide its synonyms),

• Add a synonym to a term,

• Rename a term,

• Indicate fully qualified synonym,

• Resolve a pronoun to its antecedent term,

• Add description explaining a term.
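
As an illustration only, a term entry with its synonym set and two of the operations above might look as follows; the class and function names are our assumptions and do not correspond to the CREE-tool API.

from dataclasses import dataclass, field
from typing import Optional, Set

@dataclass(eq=False)
class Term:
    qualified: str                        # the fully qualified term
    synonyms: Set[str] = field(default_factory=set)
    parent: Optional["Term"] = None       # position in the actor/information hierarchy
    description: str = ""

def add_synonym(term: Term, synonym: str) -> None:
    term.synonyms.add(synonym)

def merge(keep: Term, absorb: Term) -> Term:
    """Merge two terms that turn out to be synonymous, e.g. 'patient' and 'client'."""
    keep.synonyms |= {absorb.qualified} | absorb.synonyms
    return keep

# usage sketch:
patient, client = Term("patient"), Term("client")
merge(patient, client)        # patient.synonyms == {"client"}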

4.4 Semantic Analysis

The objective of the semantic analysis is to verify the consistency of mined goal models. We base our concept of consistency on the following definition of a goal model's signature.

Definition 1 (Goal model signature) A goal model signature is defined by a tuple (I, A, G, ≤I, ≤A), with

• I: a set of information elements,

• A: a set of actors,

• G: a set of goals of the form (n, s, m, a, t, i, c, k, d), where

– n ∈ {T, F}: indicates if the goal is negative,

– s ∈ {nomGoal, secGoal, functGoal, use, disclose, collect, create, amend, change, edit, delete}: a goal class from the annotation schema; we use ≤s to denote the partial order of goal classes, as defined in the annotation schema,

– m, a, t ∈ A: the goal's maker, actor and target,

– i ∈ I: the object (information element) of the goal,

– k ∈ {AND, OR}: subgoal composition mode,

– c ⊂ G: subgoals,

– d ⊂ A: actors that can override the goal,

• ≤I: a partial order defining the information hierarchy,

• ≤A: a partial order defining the actor hierarchy.

In the following, we will use G(M) as a shorthand to refer to the set of goals of a given goal model M. Likewise, we will use auxiliary functions n, s, m, etc. to refer to the correspondingly named element of a given goal.

For a given goal model M with g1, g2 ∈ G(M), let ⊴ denote the partial order induced by the parent/child composition of goals, i.e., g1 ⊴ g2 ⇔ g1 = g2 ∨ g1 ∈ c(g2) ∨ ∃g3 ∈ G(M): g1 ⊴ g3 ∧ g3 ⊴ g2.

The detection of inconsistencies is based on the semantics of the security goal types in the annotation ontology (i.e., collect, change, disclose, etc.), goal negations, and the actor and information type hierarchies created during the annotation process.

The following two definitions introduce terms used for defining our notion of goal consistency. Def. 2 defines the set of all goals implied by a given goal g. Informally, these are all the goals that can be reached by traversing a connected path of conjunctive subgoals of g. Def. 3 defines a subsumption relation on goals based on goal types, the actor hierarchy and the information hierarchy.

Definition 2 (Implied goals) Let g! denote the set of implied goals of g, defined as g! := {r | r ⊴ g ∧ ∀y: (r ⊴ y ∧ y ⊴ g) ⇒ k(y) = "AND"}.

Definition 3 (Subsumed goals) For a given goal model (I, A, G, ≤I, ≤A) with g1, g2 ∈ G, we say that g2 subsumes g1, denoted as g1 ⊑ g2, if and only if s(g1) ≤s s(g2) ∧ a(g1) ≤A a(g2) ∧ t(g1) ≤A t(g2) ∧ i(g1) ≤I i(g2). We further denote ∼ as the symmetric extension of ⊑, i.e., x ∼ y ⇔ x ⊑ y ∨ y ⊑ x.

Goals are decomposed in the form of AND/OR trees. A scenario where two goals in different goal trees are inconsistent does not necessarily imply that the root goals of the trees are inconsistent. Specifically, if the inconsistent subgoals participate in a disjunctive ("or") decomposition of higher-level parent goals, the parent goals may be consistently satisfiable. We highlight such points of interference, and the analyst can decide whether these constitute potential problems.

Definition 4 (Interference) Let g1, g2 be two goals and let x and y be two security goals, i.e., s(x), s(y) ≤s "secGoal". g1 and g2 interfere with respect to the offending goals x and y, denoted as g1 †xy g2, if x ⊴ g1 ∧ y ⊴ g2 ∧ x ∼ y ∧ n(x) ≠ n(y).

Two interfering goals conflict if the offending subgoals are in exclusively "AND"-decomposed subtrees and neither goal was specified as an exception to the other.

Definition 5 (Conflict) Two goals g1, g2 conflict with respect to two offending security goals x and y, denoted as g1 ‡xy g2, if g1 †xy g2 ∧ x ∈ g1! ∧ y ∈ g2! ∧ ¬(m(g1) ≤A d(g2) ∨ m(g2) ≤A d(g1)).

Note that the last conjunct in Def. 5 precludes the situation where one of the goals is allowed to defeat the other. This condition is important because otherwise goals would appear in conflict even if they merely represent allowed exceptions. See part b of requirement R12 in Fig. 2 as an example of such a defeasible goal. As explained earlier, a defeasible goal is annotated with a defeasible by relationship referencing the actor who may override it (cf. Fig. 8).
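
To make Defs. 2-5 concrete, the following sketch computes implied goals, subsumption, interference and conflicts over a simple goal representation. The encoding (string-labelled goal classes, dictionaries mapping an element to its direct parents for the ≤s, ≤A and ≤I orders) and all function names are our assumptions for illustration; they are unrelated to the DR-Prolog encoding that the tool actually generates (cf. Fig. 10).

from dataclasses import dataclass, field
from typing import Dict, List, Set

@dataclass(eq=False)
class Goal:
    name: str
    cls: str                    # s: e.g. "disclose", "collect", "functGoal", ...
    negative: bool = False      # n
    mode: str = "AND"           # k: composition mode of this goal's subgoals
    maker: str = ""             # m
    actor: str = ""             # a
    target: str = ""            # t
    obj: str = ""               # i
    subgoals: List["Goal"] = field(default_factory=list)   # c
    defeasible_by: Set[str] = field(default_factory=set)   # d

Order = Dict[str, Set[str]]     # element -> its direct parents in the hierarchy

def below(x: str, y: str, order: Order) -> bool:
    """x <= y in the partial order spanned by 'order'."""
    return x == y or any(below(p, y, order) for p in order.get(x, set()))

def descendants(g: Goal) -> List[Goal]:
    """All r with r below-or-equal g in the subgoal order (g included)."""
    out = [g]
    for c in g.subgoals:
        out.extend(descendants(c))
    return out

def implied(g: Goal) -> List[Goal]:
    """Def. 2: goals reachable from g via conjunctive decompositions only."""
    out = [g]                   # endpoints are included regardless of their own mode
    if g.mode == "AND":
        for c in g.subgoals:
            out.extend(implied(c))
    return out

def subsumes(g1: Goal, g2: Goal, cls_o: Order, act_o: Order, inf_o: Order) -> bool:
    """Def. 3: g1 is subsumed by g2."""
    return (below(g1.cls, g2.cls, cls_o) and below(g1.actor, g2.actor, act_o)
            and below(g1.target, g2.target, act_o) and below(g1.obj, g2.obj, inf_o))

def interferences(g1, g2, cls_o, act_o, inf_o):
    """Def. 4: pairs of offending security subgoals with opposite polarity."""
    for x in descendants(g1):
        for y in descendants(g2):
            if (below(x.cls, "secGoal", cls_o) and below(y.cls, "secGoal", cls_o)
                    and (subsumes(x, y, cls_o, act_o, inf_o)
                         or subsumes(y, x, cls_o, act_o, inf_o))
                    and x.negative != y.negative):
                yield x, y

def conflicts(g1, g2, cls_o, act_o, inf_o):
    """Def. 5: interferences whose offending goals are implied and not defeated."""
    exempt = (any(below(g1.maker, d, act_o) for d in g2.defeasible_by)
              or any(below(g2.maker, d, act_o) for d in g1.defeasible_by))
    if exempt:
        return
    for x, y in interferences(g1, g2, cls_o, act_o, inf_o):
        if x in implied(g1) and y in implied(g2):
            yield x, y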

According to the above definitions, conflicting goals are clearly inconsistent, while interfering goals may be inconsistent, so the user has to investigate them manually. The semantic analysis has been implemented on top of a logic-based inference engine, using DR-Prolog, a defeasible reasoning extension of higher-order Prolog (HiLog) [2]. When the user invokes the semantic analysis, CREE-tool generates DR-Prolog clauses for the analysis. While an introduction to the DR-Prolog language is beyond the scope of this paper, Fig. 10 gives the reader an impression of what is generated for our sample goal model in Fig. 8. In DR-Prolog, clauses are either facts, strict rules or defeasible rules. Conflicts detected by DR-Prolog are visualized in CREE-tool.

4.5 Analysing e-health policy semantics

We have applied the semantic analysis to the case study defined in Sec. 2. We found that some indicated conflicts were actually not due to contradictions in the NL policies but rather due to incompleteness or ambiguity. For example, the semantic analysis indicates a conflict between the two goals "block the transmission" and "notify" in R12 (cf. Fig. 8). This conflict is actually due to our convention of "inheriting" properties from parent goals if they remain unspecified in subgoals (cf. Sec. 4.2.1). A user who investigates this indicated conflict in the NL policy (Fig. 2) quickly learns that the regulation simply does not specify the information object for the notify action in part c of the text. Technically, this conflict can be resolved by introducing a new information object (e.g., "blockage record") for the goal notify. However, more importantly, the user can now investigate with the policy provider whether the design of this blockage record is left entirely at the discretion of the software engineers implementing the system or whether the regulation should be amended with further details about this record.

fact(actor(requestor)).
fact(actor(law)).
fact(actor(ehri)).
fact(actor(infoway)).
fact(info(phi)).
fact(info(consent_direct)).
fact(info_about(consent_directives,phi)).
defeasible(r10,nomGoal(r_12,infoway,ehri,null,phi),[]).
defeasible_by(r10,[law]).
and_decompose(r_12,[resp_to_requ,receiv_stor_proc_or_transm]).
strict(r1,secGoal(receiv_stor_proc_or_transm,infoway,ehri,null,phi),[]).
and_decompose(receiv_stor_proc_or_transm,[maintain_the_assoc]).
strict(r6,functGoal(maintain_the_assoc,infoway,ehri,null,consent_direct),[]).
strict(r9,functGoal(resp_to_requ,infoway,ehri,requestor,phi),[]).
and_decompose(resp_to_requ,[proc_consent_direct,respond_or_block]).
strict(r11,functGoal(respond_or_block,infoway,ehri,requestor,phi),[]).
or_decompose(respond_or_block,[block_the_trans,transmitting]).
strict(r3,~(disclose(block_the_trans,infoway,ehri,requestor,phi)),[]).
and_decompose(block_the_trans,[notify,violate_the_directives]).
strict(r2,disclose(notify,infoway,ehri,requestor,phi),[]).
strict(r7,functGoal(violate_the_directives,infoway,ehri,requestor,phi),[]).
strict(r4,disclose(transmitting,infoway,ehri,requestor,phi),[]).
and_decompose(transmitting,[violate_the_directives_1]).
strict(r8,~(functGoal(violate_the_directives_1,infoway,ehri,requestor,phi)),[]).
strict(r5,use(proc_consent_direct,infoway,ehri,requestor,consent_direct),[]).

Figure 10. Generated DR-Prolog clauses


This example demonstrates that the IGM process not only has value for "downstream" software engineering activities (from regulation to models) but also helps to indicate weaknesses and potential omissions in regulations ("upstream").

The analysis of our case study has indicated a few other interesting points of interference, in particular when different stakeholders are considered (multilateral security). One example is described in the following. Infoway requirements state that consent directives contain personal information about patients [21]. The policy requires consent directives to be collected by the EHRi (R10 in [21]). The Canadian Personal Information Protection and Electronic Documents Act (PIPEDA) forbids the collection of personal information unless the individual has granted consent. (This can be modelled as a negative, defeasible goal against the collection of personal information.) The semantic analysis indicates interference between the PIPEDA goal against collecting personal information and the Infoway goal of collecting consent directives. This case is interesting because it raises the question whether there should be a form of consent for collecting medical consent directives. Indeed, medical consent directives may contain sensitive information, e.g., a directive to withhold the mental health section of a patient's record.

5. Evaluation

Revisiting Otto and Anton's requirements on tools and methods for engineering requirements with NL regulations (cf. Sec. 3), we note that IGM provides annotation of regulatory statements, traceability between references and requirements, a data dictionary to ensure consistency, and semi-automatic navigation and searching (implemented in the CREE-tool). A main advantage of the IGM method over other goal mining methods based on linguistic transformations (e.g., RNLS) is the tight integration of goal models and NL regulations, which aids traceability and enables an iterative goal mining process rather than necessitating a waterfall-oriented one. As a result, it becomes easier to use goal models to critique and enhance the NL regulations.

We have applied the IGM method and CREE-tool to a large-scale, real-world case study (Health Canada Infoway), and some of our experiences have already been reported in earlier sections. Infoway's security and privacy regulation document comprises 115 nominal requirements on 126 pages [21]. The document was originally published in PDF, a format that cannot be processed by the current version of CREE-tool. Therefore, we converted the document to plain text prior to processing it. The resulting loss of formatting information made the document harder to navigate but otherwise did not negatively impact the goal mining process. Infoway categorizes the regulations as either technical or administrative. While the IGM annotation schema was more useful for technical requirements, it was still possible to annotate goal models for many administrative requirements. For example, the following administrative requirement could give rise to a model with a goal to create information about the accountable individual (actor) and disclose it to the public (actor): "Organisations [] must designate and publicly name an individual who is accountable for [] privacy requirements" [21], p. 20.

Concentrating on the 63 technical requirements, we iteratively evaluated and refined the annotation schema until it proved sufficiently expressive to model the phenomena we were interested in, i.e., regulations on information confidentiality and integrity. Our approach does not currently consider availability, which is considered a third security objective. As discussed in Sec. 4.1.4, the annotation schema currently abstracts from temporal phenomena and from goals which depend on the existence of other goals. Even with this limitation, the resulting goal models helped in disambiguating and analysing the NL regulation.

We conducted an empirical comparison of IGM with Breaux and Anton's linguistic approach [3] by selecting 12 technical privacy regulations and mining them with both approaches. Lacking any tool support for the latter, it took us approximately 80 hours to produce the 518 RNLS and subsequent subject-object-action patterns for the 12 regulations (cf. Fig. 3 for an excerpt). One main problem was that the resulting RNLS became hard to understand and to trace back to the original text. Using IGM, we were able to annotate and analyse the 12 technical requirements in less than six hours. While a formal usability experiment is still outstanding, this preliminary comparison indicates benefits due to better traceability and tool support. Both approaches were effective in detecting sources of structural incompleteness (cf. Sec. 4.2.1), while semantic analyses were not available for the linguistic approach. The IGM semantic analysis indicated some interesting interferences (cf. Sec. 4.4); however, we also found examples of missed inconsistencies: the analysis currently fails to detect conflicts between different actors participating in the exchange of information, e.g., consider a goal of one actor to collect certain information from another stakeholder, who has the goal not to disclose this information.

6. Conclusion

Few requirements engineering approaches provide systematic methods and tools to mine goal models in NL regulations. Approaches based on linguistic transformations have proven effective but provide for limited traceability and maintainability. We present an alternative method that uses in-place semantic annotations of goal models in NL regulations. The approach aids the navigation, indexing and modelling of multilateral security goals formulated in NL and provides a valuable tool for critically evaluating and refining the NL text. Our evaluation with a large-scale application indicates that the approach is practical and useful. A limitation of our current evaluation is that we performed the experiments ourselves. Experiments with unbiased subjects will follow to provide further evidence of the effectiveness of the IGM method and its tool support. One problem in comparing approaches is the absence of common benchmarks. By publishing our tool and Infoway case study data online, we hope to create an accessible resource for comparing requirements engineering methods and tools.

Currently, the annotation task relies completely on human input, and its correctness depends on the user's ability to recognize the text spans in the NL source which should be annotated, and to select the right concept from the annotation ontology. For example, "notify" in Fig. 2 is annotated with the concept Disclose (Fig. 8). Kiyavitskaya et al. have demonstrated that further automation is possible using natural language processing algorithms [12]; however, this step will generally require human interaction and confirmation. We hope to integrate such algorithms with CREE-tool in the future.

References

[1] A. Allas. Canada Health Infoway: EHRS Blueprint. Health Canada Infoway, 2006.

[2] G. Antoniou and A. Bikakis. DR-Prolog: A system for defeasible reasoning with rules and ontologies on the semantic web. IEEE Trans. on Knowledge and Data Engineering, 19(2):233–245, 2007.

[3] T. Breaux and A. Anton. Analyzing goal semantics for rights, permissions, and obligations. In Proc. 13th IEEE Intl. Conf. on Requ. Eng., pages 177–188, 2005.

[4] L. Compagna, P. E. Khoury, F. Massacci, R. Thomas, and N. Zannone. How to capture, model, and verify the knowledge of legal, security, and privacy experts: a pattern-based approach. In Proc. 11th Intl. Conf. on A.I. and Law, pages 149–153, New York, NY, USA, 2007. ACM.

[5] S. Ghanavati, D. Amyot, and L. Peyton. Towards a Framework for Tracking Legal Compliance in Healthcare. In Proc. 19th Intl. Conf. on Adv. Inform. Systems Eng. Springer, 2007.

[6] P. Giorgini, F. Massacci, J. Mylopoulos, and N. Zannone. ST-Tool: A CASE tool for security requirements engineering. In Proc. 13th IEEE Intl. Conf. on Requ. Eng., pages 451–452, 2005.

[7] P. Giorgini, J. Mylopoulos, and R. Sebastiani. Goal-oriented requirements analysis and reasoning in the Tropos methodology. Eng. Applications of A.I., 18(2):159–171, 2005.

[8] S. Gurses, J. H. Jahnke, C. Obry, A. Onabajo, T. Santen, and M. Price. Eliciting confidentiality requirements in practice. In Proc. 15th Centers for Advanced Studies Conference (CASCON '05), pages 101–116, 2005.

[9] C. Haley, R. Laney, J. Moffett, and B. Nuseibeh. Security requirements engineering: A framework for representation and analysis. IEEE Trans. on Software Eng., 34(1):133–153, Jan.-Feb. 2008.

[10] Q. He, P. Otto, A. Anton, and L. Jones. Ensuring compliance between policies, requirements and software design: a case study. In 4th IEEE Intl. Worksh. on Information Assurance, 14 pp., April 2006.

[11] W. Heaven and A. Finkelstein. UML profile to support requirements engineering with KAOS. IEE Proceedings - Software, 151(1):10–27, 9 Feb. 2004.

[12] N. Kiyavitskaya, N. Zeni, T. D. Breaux, A. I. Anton, J. R. Cordy, L. Mich, and J. Mylopoulos. Extracting rights and obligations from regulations: toward a tool-supported process. In Proc. 22nd Intl. Conf. on Autom. Software Eng., pages 429–432. ACM, 2007.

[13] G. T. Lau, K. H. Law, and G. Wiederhold. Legal information retrieval and application to e-rulemaking. In Proc. 10th Intl. Conf. on A.I. and Law, pages 146–154. ACM, 2005.

[14] F. Massacci, J. Mylopoulos, and N. Zannone. Computer-aided support for Secure Tropos. Automated Software Eng., 14(3):341–364, 2007.

[15] M.-F. Moens. Combining structured and unstructured information in a retrieval model for accessing legislation. In Proc. 10th Intl. Conf. on A.I. and Law, pages 141–145, New York, NY, USA, 2005. ACM.

[16] J. Mylopoulos, L. Chung, and E. Yu. From object-oriented to goal-oriented requirements analysis. Communications of the ACM, 42(1):31–37, 1999.

[17] H. Nakagawa, T. Karube, and S. Honiden. Analysis of multi-agent systems based on KAOS modeling. In Proc. Intl. Conf. on Softw. Eng., pages 926–929, 2006.

[18] N. Noy, R. Fergerson, and M. Musen. The knowledge model of Protege-2000: Combining interoperability and flexibility. In Proc. 12th Intl. Conf. on Knowl. Eng. and Knowl. Mgmt., pages 17–32. Springer, 2000.

[19] P. Ogren. Knowtator: A Protege plug-in for annotated corpus construction. In Proc. Conf. of the North American Chapter of the Assoc. for Comput. Linguistics on Human Language Technology, pages 273–275. Assoc. for Comput. Linguistics, Morristown, NJ, USA, 2006.

[20] P. N. Otto and A. I. Anton. Addressing legal requirements in requirements engineering. In Proc. 15th IEEE Intl. Conf. on Requ. Eng., pages 5–14, 2007.

[21] S. Ratajczak. Electronic Health Record (EHR) privacy and security requirements v.1.1. Technical report, Health Canada Infoway, 2005.

[22] A. Schurr. Adding graph transformation concepts to UML's constraint language OCL. Electronic Notes in Theoretical Computer Science, 44(4):93–106, 2001.

[23] A. Susi, A. Perini, and J. Mylopoulos. The Tropos Metamodel and its Use. Informatica, 29(4):401–408, 2005.

[24] E. Yu. Towards modelling and reasoning support for early-phase requirements engineering. In Proc. 3rd IEEE Intl. Symp. on Requ. Eng., pages 226–235, 1997.
