Making Model-Driven Verification Practical and Scalable:
Experiences and Lessons Learned
Lionel Briand, IEEE Fellow, FNR PEARL Chair
Interdisciplinary Centre for ICT Security, Reliability, and Trust (SnT), University of Luxembourg, Luxembourg
STAF, L'Aquila, Italy, 2015
SnT Software Verification and Validation Lab
• SnT centre, Est. 2009: Interdisciplinary, ICT security-reliability-trust
• 230 scientists and Ph.D. candidates, 20 industry partners
• SVV Lab: Established January 2012, www.svv.lu
• 25 scientists (Research scientists, associates, and PhD candidates)
• Industry-relevant research on system dependability: security, safety, reliability
• Six partners: Cetrel, CTIE, Delphi, SES, IEE, Hitec …
An Effective, Collaborative Model of Research and Innovation
Basic Research • Applied Research • Innovation & Development
• Basic and applied research take place in a rich context
• Basic Research is also driven by problems raised by applied research, which is itself fed by innovation and development
• Publishable research results and focused practical solutions that serve an existing market.
Schneiderman, 2013
Collaboration in Practice
• Well-defined problems in context
• Realistic evaluation
• Long-term industrial collaborations
A joint cycle between research groups and industry partners:
1. Problem Formulation
2. Problem Identification
3. State of the Art Review
4. Candidate Solution(s)
5. Initial Validation
6. Training
7. Realistic Validation
8. Solution Release
Motivations
• The term “verification” is used in its wider sense: Defect detection and removal, design or run-time, before or after deployment
• One important application of models is to drive and automate verification
• In practice, despite significant advances (e.g., in model-based testing), this is not commonly part of mainstream engineering
• Decades of research have not yet significantly and widely impacted practice
Applicability? Scalability?
Definitions
• Applicable: Can a technology be efficiently and effectively applied by engineers in realistic conditions? – realistic ≠ universal
• Scalable: Can a technology be applied on large artifacts (e.g., models, data sets, input spaces) and still provide useful support within reasonable effort, CPU, and memory resources?
Plan
• Project examples, with industry collaborations
• Lessons learned and reflections regarding developing applicable and scalable verification solutions
Some Past Projects (< 5 years)
Company | Domain | Objective | Notation | Automation
Cisco | Video conference | Robustness testing | UML profile | Search, model transformation
Kongsberg Maritime | Oil & gas | CPU usage and RT deadlines | UML + MARTE | Constraint solving
WesternGeco | Marine seismic acquisition | Functional testing | UML profile + MARTE | Search, constraint solving
SES | Satellite | Functional and robustness testing, requirements QA | UML profile | Search, model mutation, NLP
Delphi | Automotive control systems | Testing safety + performance | Matlab/Simulink | Search, machine learning, statistics
CTIE | Legal & financial | Legal requirements testing & simulation | UML profile | Model transformation, constraint checking
HITEC | Crisis support systems | Testing security, access control | UML profile | Constraint verification, machine learning, search
CTIE | eGovernment | Offline and run-time verification | UML profile, BPMN, OCL extension | Domain-specific language, constraint checking
IEE | Automotive sensor systems | Requirements-driven testing, traceability | UML profile, use case modeling extension, Matlab/Simulink | NLP, constraint solving
Testing Software Controllers
References:
• R. Matinnejad et al., “Effective Test Suites for Mixed Discrete-Continuous Stateflow Controllers”, ACM ESEC/FSE 2015
• R. Matinnejad et al., “MiL Testing of Highly Configurable Continuous Controllers: Scalable Search Using Surrogate Models”, IEEE/ACM ASE 2014
• R. Matinnejad et al., “Search-Based Automated Testing of Continuous Controllers: Framework, Tool Support, and Case Studies”, Information and Software Technology (2014)
Dynamic Continuous Controllers
Trends driving Electronic Control Units (ECUs):
• More functions
• Comfort and variety
• Safety and reliability
• Faster time-to-market
• Less fuel consumption
• Greenhouse gas emission laws
A Taxonomy of Automotive Functions
• Controlling
  • State-based: state machine controllers
  • Continuous: closed-loop controllers (PID)
• Computation
  • Transforming: unit converters
  • Calculating: computing positions, duty cycles, etc.
Different testing strategies are required for different types of functions.
Development Process
• Model-in-the-Loop stage: Simulink modeling -> generic functional model -> MiL testing
• Software-in-the-Loop stage: code generation and integration -> software running on ECU -> SiL testing
• Hardware-in-the-Loop stage: HiL testing -> software release
MATLAB/Simulink model example: Fibonacci sequence 1, 1, 2, 3, 5, 8, 13, 21, …
Controller Input and Output at MiL
• Test input: a step from an initial desired value to a final desired value at T/2, over simulation time T
• Test output: the actual value tracking the desired value over time
• Setup: the controller (SUT) and the plant model form a closed loop; the error (desired value minus actual value) drives the controller, whose output drives the plant, which produces the system output
Configuration parameters
Within the plant model's feedback loop, the controller computes the error e(t) = desired(t) − actual(t) and combines the P, I, and D terms:

output(t) = K_P·e(t) + K_I·∫e(t)dt + K_D·de(t)/dt

• Inputs: time-dependent variables
• Configuration parameters (e.g., the gains K_P, K_I, K_D)
Requirements and Test Objectives
Input: a step from the initial desired value (ID) to the final desired value (FD) at T/2; output: the desired value vs. the actual value over time. Three requirements on the output:
• Smoothness
• Responsiveness
• Stability
Test Strategy: A Search-Based Approach
Worst case(s) over the (initial desired, final desired) input space:
• Continuous behavior; the controller's behavior can be complex
• Meta-heuristic search in a (large) input space: finding worst-case inputs
• Possible because of an automated oracle (the feedback loop)
• Different worst cases for different requirements
• Worst cases may or may not violate requirements
Search Elements
• Search space: initial and desired values, configuration parameters
• Search technique: variants of hill climbing, GAs, …
• Search objective: an objective/fitness function for each requirement
• Evaluation of solutions: simulation of the Simulink model => fitness computation
• Result: worst-case scenarios, i.e., values of the input variables that break requirements at the MiL level; high-risk stress test cases for actual hardware (HiL)
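These search elements can be sketched end-to-end in a few lines. The following is an illustration only, not the authors' implementation: the `fitness` callable stands in for "simulate the Simulink model and compute the objective", and the toy quadratic objective, the bounds, and the mutation scale `sigma` are all assumptions for the example.

```python
import random

def one_plus_one_ea(fitness, bounds, sigma=0.05, iterations=1000, seed=None):
    """(1+1) EA: keep a single candidate, mutate it, and accept the
    mutant only if its fitness (worst-case-ness) does not decrease."""
    rng = random.Random(seed)
    # One value per search dimension (e.g., initial/final desired
    # values and configuration parameters).
    x = [rng.uniform(lo, hi) for lo, hi in bounds]
    fx = fitness(x)
    for _ in range(iterations):
        # Gaussian mutation scaled to each variable's range, then clipped.
        y = [min(max(xi + rng.gauss(0, sigma * (hi - lo)), lo), hi)
             for xi, (lo, hi) in zip(x, bounds)]
        fy = fitness(y)
        if fy >= fx:  # maximizing: higher fitness = worse scenario
            x, fx = y, fy
    return x, fx

# Toy stand-in for a simulation-based objective, peaking at (0.7, 0.3):
worst, value = one_plus_one_ea(
    lambda v: -(v[0] - 0.7) ** 2 - (v[1] - 0.3) ** 2,
    bounds=[(0.0, 1.0), (0.0, 1.0)], seed=1)
```

In the real setting each `fitness` call is an expensive Simulink simulation, which is why the later slides introduce surrogate models.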
Smoothness Objective Function: O_Smoothness
Comparing two test cases A and B, O_Smoothness(Test Case A) > O_Smoothness(Test Case B) means A's output tracks the desired value less smoothly. We want to find test scenarios that maximize O_Smoothness.
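The slides do not give the formula, so the following is only one plausible reading of a smoothness objective: score a simulated step response by its worst deviation from the desired value after first reaching it. The signal values and the crossing logic are illustrative assumptions, not the paper's exact definition.

```python
def smoothness_objective(desired, actual):
    """Worst absolute deviation of the actual signal from the desired
    value after the actual value first reaches it: a proxy for
    over/undershoot. Higher = less smooth = worse test case."""
    rising = actual[0] < desired
    for i, a in enumerate(actual):
        if (a >= desired) if rising else (a <= desired):
            break
    else:
        return 0.0  # desired value never reached in this simulation
    return max(abs(a - desired) for a in actual[i:])

# A step response that overshoots to 1.2 before settling at 1.0:
sig = [0.0, 0.4, 0.8, 1.0, 1.2, 1.1, 1.0, 1.0]
score = smoothness_objective(1.0, sig)  # 0.2, i.e., the 20% overshoot
```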
Solution Overview (Simplified Version)
1. Exploration: using the controller-plant model and objective functions based on the requirements, produce a HeatMap diagram over the (initial desired, final desired) input space; a domain expert selects a list of critical regions.
2. Single-state search: within the critical regions, search for worst-case scenarios (desired value vs. actual value over time).
Automotive Example
• Supercharger bypass flap controller ! Flap position is bounded within [0..1] ! Implemented in MATLAB/Simulink ! 34 sub-components decomposed into 6
abstraction levels ! The simulation time T=2 seconds
Supercharger
Bypass Flap
Supercharger
Bypass Flap
Flap position = 0 (open) Flap position = 1 (closed) 23
Finding Seeded Faults (fault injection)
Analysis: Fitness Increase over Iterations
• The higher the fitness, the worse the test scenario
• How efficiently do we converge towards a worst-case scenario (fitness vs. number of iterations)?
Analysis II: Search over Different Regions
[Plots of fitness over 100 iterations in different HeatMap regions, comparing the average (1+1) EA distribution against the random search distribution.]
Conclusions
• We found much worse scenarios during MiL testing than our partner had found so far, and much worse than random search (the baseline)
• These scenarios are also run at the HiL level, where testing is much more expensive: MiL results -> test selection for HiL
• But further research was needed:
  – Simulations are expensive
  – Configuration parameters
  – Dynamically adjust search algorithms in different subregions (exploratory <-> exploitative)
[Excerpt from the accompanying paper: (1+1) EA eventually reaches a higher plateau than random search and has a smaller variance. Regions surrounded mostly by dark shaded regions typically have a clear gradient between the search's initial point and the worst-case point, so an exploitative (1+1) EA (σ = 0.01) is faster and more effective than random search; dark regions located in light shaded areas are noisier, with several local optima, and a more explorative (1+1) EA (σ = 0.03) works better. The HeatMap diagrams can thus predict which search algorithm to use for the single-state search step.]
Testing in the Configuration Space
• MiL testing for all feasible configurations
• The search space is much larger
• The search is much slower (simulations of Simulink models are expensive)
• Results are harder to visualize
• Not all configuration parameters matter for all objective functions
Modified Process and Technology
1. Exploration with dimensionality reduction: a regression tree visualizes the multi-dimensional space, dimensionality reduction identifies the significant variables, and a domain expert reviews the resulting list of critical partitions.
2. Search with surrogate modeling: surrogate models predict the objective function to speed up the search over the controller model (Simulink), yielding worst-case scenarios.
Dimensionality Reduction
• Sensitivity Analysis: Elementary Effect Analysis (EEA)
• Identify non-influential inputs in computationally costly mathematical models
• Requires less data points than other techniques
• Observations are simulations generated during the Exploration step
• Compute sample mean and standard deviation for each dimension of the distribution of elementary effects
[Scatter plot of elementary effects: sample standard deviation (σ_i) vs. sample mean (μ_i), both ×10⁻², one point per input dimension; Cal1 and Cal2 cluster near the origin (non-influential), while ID, FD, Cal3, Cal4, Cal5, and Cal6 stand out.]
Elementary Effects Analysis Method
Imagine a function F with two inputs, x and y. From each sample point A, B, C, …, perturb one input at a time by Δx or Δy, obtaining A1, A2, B1, B2, C1, C2, …. The elementary effects for x are F(A1) − F(A), F(B1) − F(B), F(C1) − F(C), …; the elementary effects for y are F(A2) − F(A), F(B2) − F(B), F(C2) − F(C), ….
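The computation just described can be sketched as follows. This is a simplified one-at-a-time version rather than the full Morris sampling design, and the sample function and point set are invented for the example.

```python
import random

def elementary_effects(F, points, delta=0.01):
    """For each sample point and each input dimension, perturb that
    dimension by `delta` and record the normalized change in F.
    Inputs whose effects have small mean and small standard deviation
    are candidates for elimination (non-influential)."""
    k = len(points[0])
    effects = [[] for _ in range(k)]
    for p in points:
        base = F(p)
        for i in range(k):
            q = list(p)
            q[i] += delta
            effects[i].append((F(q) - base) / delta)
    stats = []
    for es in effects:
        mean = sum(es) / len(es)
        var = sum((e - mean) ** 2 for e in es) / len(es)
        stats.append((mean, var ** 0.5))
    return stats  # (sample mean, sample std) per input dimension

# F depends strongly on x[0], weakly on x[1], and not at all on x[2]:
rng = random.Random(0)
pts = [[rng.random() for _ in range(3)] for _ in range(50)]
stats = elementary_effects(lambda x: 10 * x[0] + 0.1 * x[1] ** 2, pts)
```

The third dimension's mean and standard deviation come out as zero, which is exactly the signature of a non-influential input in the scatter plot above.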
Visualization in Inputs & Configuration Space
[Regression tree over 1,000 simulations (root: mean 0.007822, std 0.0049497), recursively split on input and configuration variables, e.g., FD >= 0.43306 vs. FD < 0.43306, then ID >= 0.64679 and thresholds on Cal5 (0.020847, 0.014827); each node reports the count, mean, and standard deviation of the objective, e.g., the 426 points with FD >= 0.43306 have mean 0.0103425.]
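The idea behind such a tree can be illustrated with a toy split finder: exhaustively pick the single variable/threshold pair that best separates the objective values of a set of simulations (real regression-tree learners such as CART apply this step recursively). The data, variable names, and thresholds below are invented for illustration.

```python
def best_split(X, y, names):
    """Find the (variable, threshold) split minimizing the summed
    squared error around the two child means: one step of building a
    regression tree over simulation results."""
    def sse(vals):
        if not vals:
            return 0.0
        m = sum(vals) / len(vals)
        return sum((v - m) ** 2 for v in vals)
    best = None
    for j in range(len(X[0])):
        for t in sorted(set(row[j] for row in X)):
            left = [yi for row, yi in zip(X, y) if row[j] < t]
            right = [yi for row, yi in zip(X, y) if row[j] >= t]
            if not left or not right:
                continue
            cost = sse(left) + sse(right)
            if best is None or cost < best[0]:
                best = (cost, names[j], t)
    return best[1], best[2]

# Toy data: the objective mainly jumps when FD >= 0.5 (ID barely matters).
X = [[0.1, 0.2], [0.2, 0.3], [0.3, 0.9], [0.5, 0.1], [0.6, 0.8], [0.7, 0.5]]
y = [0.005, 0.006, 0.005, 0.012, 0.013, 0.011]
var, thr = best_split(X, y, names=["FD", "ID"])
```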
Surrogate Modeling (1)
• Goal: To predict the value of the objective functions within a critical partition, given a number of observations, and use that to avoid as many simulations as possible and speed up the search
Surrogate Modeling (2)
• Any supervised learning or statistical technique providing fitness predictions with confidence intervals can serve as the surrogate of the real fitness function
1. Predicted higher fitness with high confidence: move to the new position, no simulation
2. Predicted lower fitness with high confidence: do not move to the new position, no simulation
3. Low confidence in the prediction: simulate
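The three cases map directly onto a small acceptance routine for one mutation step of the search. In this sketch `pred` and `ci` are plain numbers standing in for whatever surrogate model is fitted; no real regression is performed.

```python
def decide(pred, ci, current_fitness):
    """Surrogate-assisted acceptance for one search step.
    pred : surrogate's predicted fitness of the mutated candidate
    ci   : half-width of the prediction's confidence interval
    Returns 'move', 'stay', or 'simulate' per the three cases above."""
    if pred - ci > current_fitness:   # confidently better (worse case)
        return "move"                  # accept without simulating
    if pred + ci < current_fitness:   # confidently worse
        return "stay"                  # reject without simulating
    return "simulate"                  # uncertain: pay for a simulation
```

Only the third case costs a Simulink run, which is where the speed-up over plain search comes from.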
Experiments Results (RQ1)
• The best regression technique for building surrogate models for all three of our objective functions (Fst, Fsm, Fr) is Polynomial Regression with n = 3; other supervised learning techniques, such as SVM, were also considered
• [Table: mean R²/MRPE values of the surrogate modeling techniques PR(n = 3), LR, ER, and PR(n = 2) on Fst, Fsm, and Fr; PR(n = 3) performs best, reaching R² up to 0.98 with MRPE as low as 0.0129, while the weakest technique/objective pairs show MRPE above 1.2]
Experiments Results (RQ2)
• Dimensionality reduction helps generate better surrogate models for the Smoothness and Responsiveness requirements
• [Box plots of mean relative prediction errors (MRPE values) with and without dimensionality reduction (DR vs. No DR) for Smoothness (Fsm), Responsiveness (Fr), and Stability (Fst)]
Experiments Results (RQ3)
• For responsiveness, the search with surrogate modeling (SM) was 8 times faster
• For smoothness, the search with SM was much more effective
• [Box plots of search output values, SM vs. No SM, at several time budgets between 200 and 3000 seconds]
Experiments Results (RQ4)
• Our approach is able to identify critical violations of the controller requirements that had been found neither by our earlier work nor by manual testing.

Requirement | MiL testing, different configurations | MiL testing, fixed configurations | Manual MiL testing
Stability | 2.2% deviation | - | -
Smoothness | 24% over/undershoot | 20% over/undershoot | 5% over/undershoot
Responsiveness | 170 ms response time | 80 ms response time | 50 ms response time
A Taxonomy of Automotive Functions
• Controlling
  • State-based: state machine controllers
  • Continuous: closed-loop controllers (PID)
• Computation
  • Transforming: unit converters
  • Calculating: computing positions, duty cycles, etc.
Different testing strategies are required for different types of functions.
Differences with Closed-Loop Controllers
[Excerpt from the accompanying paper: the motivating Supercharger Clutch Controller (SCC) Stateflow moves a clutch between disengaged and engaged states; transient engaging/disengaging states take time, and the control signal ctrlSig cannot jump abruptly between 1.0 and −1.0 but must evolve gradually as a function over time (a signal). Inputs and outputs are signals. This motivates two new test selection criteria for mixed discrete-continuous behaviours, output stability and output continuity, alongside adaptations of state/transition coverage and of the output uniqueness (output diversity) criterion, evaluated on three Stateflow case study models (two industrial, one public domain).]
• Open loop controllers
• Mixed discrete-continuous behavior: Simulink stateflows
• Much quicker simulation time
• No feedback loop -> no automated oracle
• The main testing cost is the manual analysis of output signals
• Goal: Minimize test suites
• Challenge: Test selection
• Entirely different approach to testing
Selection Strategies
• Adaptive random selection
• White-box structural coverage
  • State coverage
  • Transition coverage
• Output diversity
• Failure-based selection criteria (search)
  • Domain-specific failure patterns
  • Output stability
  • Output continuity
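Output-diversity selection can be sketched with a greedy max-min loop: repeatedly pick the test case whose output signal is farthest from everything already selected. The signal distance (mean pointwise difference) and the toy signals are assumptions for illustration, not the paper's exact criterion.

```python
def signal_distance(s1, s2):
    """Mean pointwise difference between two sampled output signals."""
    return sum(abs(a - b) for a, b in zip(s1, s2)) / len(s1)

def select_diverse(outputs, budget):
    """Greedy output-diversity selection over simulated output signals."""
    remaining = list(range(len(outputs)))
    selected = [remaining.pop(0)]          # seed with the first test case
    while len(selected) < budget and remaining:
        best = max(remaining, key=lambda i: min(
            signal_distance(outputs[i], outputs[j]) for j in selected))
        remaining.remove(best)
        selected.append(best)
    return selected

signals = [
    [0.0, 0.0, 0.0, 0.0],   # flat output
    [0.0, 0.1, 0.0, 0.1],   # nearly flat (close to the first)
    [0.0, 1.0, 1.0, 1.0],   # step output
]
picked = select_diverse(signals, budget=2)  # keeps the flat and the step
```

Minimizing the selected set matters here because, without a feedback loop, the main testing cost is an engineer manually inspecting each output signal.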
Stability of Output Signals
Discontinuities in Output Signals
Automatic Generation of System Test Cases from Use Case Specifications
References:
• Chunhui Wang et al., “Automatic Generation of System Test Cases from Use Case Specifications”, ACM ISSTA 2015
• Chunhui Wang et al., “UMTG: A Toolset to Automatically Generate System Test Cases from Use Case Specifications”, ESEC/FSE 2015, tool track
Problem
• Verifying the compliance of a system with its requirements in a cost-effective way
• Traceability between requirements and system test cases – This cannot be manual
• E.g., the automotive industry, which must comply with ISO 26262
Objectives
• Automatically generate test cases from requirements
• Capture traceability information between test cases and requirements during generation
Context Matters
– Requirements are captured through use cases (common)
– Use cases are used to communicate with customers and the system test team
– Complete and precise behavioral models are not an option: too complex
Strategy
• Analyzable use case specifications (RUCM)
• Automatically extract a test model from the use case specifications through natural language processing
• Minimize modeling: no behavioral modeling
Overview of the UMTG workflow:
1. Elicit use cases -> RUCM use cases
2. Model the domain -> domain model
3. Evaluate consistency -> reports missing entities
4. Identify constraints -> constraint descriptions (e.g., errors are absent, temperature is low, status is valid)
5. Specify constraints -> OCL constraints (e.g., Errors.size() = 0)
6. Generate scenarios and inputs -> test scenarios and object diagrams
7. Generate test cases -> test cases, via a mapping table
Costs of Additional Modeling

Use case | Use case steps | Use case flows | OCL constraints
UC1 | 50 | 8 | 9
UC2 | 44 | 13 | 7
UC3 | 35 | 8 | 8
UC4 | 59 | 11 | 12
UC5 | - | 8 | 5
UC6 | 25 | 6 | 12

5 to 10 minutes to write each constraint
Effectiveness: Scenarios Covered
[Bar chart per use case (UC1-UC6): scenarios covered by the engineer vs. scenarios covered by UMTG; UMTG covers 100% of the scenarios for every use case, while the engineer covers far fewer for several of them.]
It is hard for engineers to capture all the possible scenarios involving error conditions.
Modeling and Verifying Legal Requirements
Reference:
• G. Soltana et al., “A Model-Based Framework for Probabilistic Simulation of Legal Policies”, IEEE/ACM MODELS 2015
• G. Soltana et al., “Using UML for Modeling Procedural Legal Rules”, IEEE/ACM MODELS 2014
• M. Adedjouma et al., “Automated Detection and Resolution of Legal Cross References”, IEEE RE 2014
Context and Problem
• CTIE: Government computer centre in Luxembourg
• Large government (information) systems
• Implement legal requirements, must comply with the law
• The law usually leaves room for interpretation and changes on a regular basis, many cross-references
• Involves many stakeholders, IT specialists but also legal experts, etc.
Article Example
Art. 105bis […] The commuting expenses deduction (FD) is defined as a function over the distance between the principal town of the municipality on whose territory the taxpayer's home is located and the place of the taxpayer's work. The distance is measured in units of distance expressing the kilometric distance between [principal] towns. A ministerial regulation provides these distances. The amount of the deduction is calculated as follows: if the distance exceeds 4 units but is less than 30 units, the deduction is €99 per unit of distance. The first 4 units do not trigger any deduction, and the deduction for a distance exceeding 30 units is limited to €2,574.
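Read literally, this rule is a simple capped linear function, which makes the idea of an "analyzable interpretation of the law" concrete. The integer-unit input and the handling of the boundary values below are one interpretation of the text (the law leaves room for interpretation, as noted above), not an official reading.

```python
def commuting_deduction(distance_units):
    """Commuting expenses deduction (FD) per Art. 105bis, as read from
    the slide: no deduction for the first 4 distance units, EUR 99 per
    unit beyond that, capped at EUR 2,574 (the cap is reached at 30
    units, since 99 * (30 - 4) = 2574)."""
    return min(max(distance_units - 4, 0) * 99, 2574)
```

For example, a taxpayer 10 units away gets 99 × 6 = €594; one 45 units away hits the €2,574 cap.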
Project Objectives

Objective | Benefits
Specification of legal requirements, including rationale and traceability to the text of law | Make interpretation of the law explicit; improve communication; prerequisite for automation
Checking consistency of legal requirements | Prevent errors in the interpretation of the law from propagating
Automated test strategies for checking system compliance with legal requirements | Provide effective and scalable ways to verify compliance
Run-time verification mechanisms to check compliance with legal requirements | Provide effective and scalable ways to verify compliance
Analyzing the impact of changes in the law | Decrease costs and risks associated with change; make change more predictable
Solution Overview
An analyzable interpretation of the law generates test cases that trace to the actual software system; simulating the interpretation supports analyzing the impact of legal changes; the results of the system and of the interpretation are compared (do they match?).
Research Steps
1. Conduct a grounded theory study: What information requirements should we expect? What are the complexity factors?
2. Build a UML profile: explicit means for capturing information requirements; basis for the modeling methodology; target audience: legal experts and IT specialists
3. Model transformation to enable V&V: target existing automation techniques, i.e., solvers for testing and a simulation language or library
Example
Art. 105bis […]The commuting expenses deduction (FD) is defined as a function over the distance between the principal town of the municipality on whose territory the taxpayer's home is located and the place of taxpayer’s work. The distance is measured in units of distance expressing the kilometric distance between [principal] towns. A ministerial regulation provides these distances.
Interpretation + Traces
Example
The amount of the deduction is calculated as follows: If the distance exceeds 4 units but is less than 30 units, the deduction is € 99 per unit of distance. The first 4 units does not trigger any deduction and the deduction for a distance exceeding 30 units is limited to € 2,574.
Interpretation + Traces
Challenges and Results
• The profile must lead to models that are understandable by both IT specialists and legal experts, yet precise enough to enable model transformation and support our objectives
• Tutorials and many modeling sessions with legal experts
• In theory, such legal requirements could be captured by OCL constraints alone, but this is not applicable in practice
• That is why we resorted to customized activity modeling, carefully combined with a simple subset of OCL
• Many traces to law articles and dependencies among articles: automated detection (NLP) of cross-references
Run-Time Verification of Business Processes
References:
• W. Dou et al., “OCLR: a More Expressive, Pattern-based Temporal Extension of OCL”, ECMFA 2014
• W. Dou et al., “Revisiting Model-Driven Engineering for Run-Time Verification of Business Processes”, IEEE/ACM SAM 2014
• W. Dou et al., “A Model-Driven Approach to Offline Trace Checking of Temporal Properties with OCL”, Technical Report SnT-TR-2014-5, submitted to a journal
Context and Problem
• CTIE: Government Computing Centre of Luxembourg
• E-government systems mostly implemented as business processes
• CTIE models these business processes (MDE methodology)
• Business models have temporal properties that must be checked – Temporal logics not applicable by analysts – Limited tool support (scalability)
• Goal: Efficient, scalable, and practical off-line and run-time verification in an MDE context
Solution Overview
• We identified patterns based on analyzing many properties of real business process models
• How to facilitate their expression?
• We want to transform the checking of temporal constraints into checking regular constraints on a trace conceptual model
• OCL engines (Eclipse) are our target, to rely on mature technology (scalability)
• We defined a more expressive extension of OCL (OCLR), resembling natural language, to facilitate translation
• Optimized translation
Scalability Analysis
• Analyzed 47 properties in an identity card management system, e.g., “Once a card request is approved, the applicant is notified within three days; this notification has to occur before the production of the card is started.”
• Scalability: check time as a function of trace size
• OCLR-Check vs. MonPoly scalability analysis
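That example property can be sketched as a tiny offline trace check. The event names, day-granularity timestamps, and list encoding of the trace are assumptions for illustration; the real OCLR-Check approach evaluates OCL over a trace conceptual model, not Python lists.

```python
def check_trace(events, window=3):
    """Offline check of the card-request property: after each 'approved'
    event, a 'notified' event occurs within `window` days, and that
    notification precedes the next 'production_started' event.
    `events` is a time-ordered list of (day, event_name) pairs."""
    for idx, (day, name) in enumerate(events):
        if name != "approved":
            continue
        for d2, n2 in events[idx + 1:]:
            if n2 == "production_started":
                return False    # production started before notification
            if n2 == "notified":
                if d2 - day > window:
                    return False  # notified too late
                break
        else:
            return False        # approval never followed by notification
    return True

ok = check_trace([(1, "approved"), (2, "notified"), (5, "production_started")])
bad = check_trace([(1, "approved"), (6, "notified"), (7, "production_started")])
```

A single linear pass like this over each trace is the kind of evaluation whose cost is then measured as a function of trace size.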
Schedulability Analysis and Stress Testing
References:
• S. Di Alesio et al. “Stress Testing of Task Deadlines: A Constraint Programming Approach”, ISSRE 2013, San Jose, USA
• S. Di Alesio et al. “Worst-Case Scheduling of Software Tasks – A Constraint Optimization Model to Support Performance Testing”, Constraint Programming (CP), 2014
• S. Di Alesio et al. “Combining Genetic Algorithms and Constraint Programming to Support Stress Testing of Task Deadlines”, ACM TOSEM, 2015
Problem
• Real-time, concurrent systems (RTCS) have concurrent interdependent tasks which have to finish before their deadlines
• Some task properties depend on the environment, some are design choices
• Tasks can trigger other tasks, and can share computational resources with other tasks
• Schedulability analysis encompasses techniques that try to predict whether all (critical) tasks are schedulable, i.e., meet their deadlines
• Stress testing runs carefully selected test cases that have a high probability of leading to deadline misses
• Testing in RTCS is typically expensive, e.g., hardware in the loop
Arrival Times Determine Deadline Misses
• Tasks j0, j1, j2 arrive at at0, at1, at2 and must finish before their deadlines dl0, dl1, dl2
• j1 can miss its deadline dl1 depending on when at2 occurs
[Two Gantt-style schedules over the same time window illustrate how shifting at2 changes whether j1 meets dl1.]
Context
70
Drivers (Software-Hardware Interface)
Control Modules Alarm Devices (Hardware)
Multicore Architecture
Real-Time Operating System
Monitor gas leaks and fire in oil extraction platforms
Challenges and Solutions
• Ranges for arrival times form a very large input space
• Task interdependencies and properties constrain what parts of the space are feasible
• We re-expressed the problem as a constraint optimisation problem
• Constraint programming
71
Constraint Optimization
72
Constraint Optimization Problem:
• Static properties of tasks → constants
• Dynamic properties of tasks → variables
• Performance requirement → objective function
• OS scheduler behaviour → constraints
Process and Technologies
73
[Process diagram]
• INPUT: System design → design model with time and concurrency information, expressed via UML modeling (e.g., MARTE)
• Constraint programming (CP) solves the optimization problem: find arrival times that maximize the chance of deadline misses
• Solutions: task arrival times likely to lead to deadline misses
• OUTPUT: Stress test cases, used for deadline-miss analysis on the system platform
Challenges and Solutions (2)
• Scalability problem: Constraint programming (e.g., IBM CPLEX) cannot handle such large input spaces (CPU, memory)
• Solution: Combine metaheuristic search and constraint programming
  – metaheuristic search identifies high-risk regions in the input space
  – constraint programming finds provably worst-case schedules within these (limited) regions
74
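The two-step hybrid above can be sketched in Python. Everything here is a toy stand-in: the quadratic "risk" objective replaces the scheduling simulation, random sampling replaces the GA's global exploration, and an exhaustive neighborhood search of radius D replaces the CP model:

```python
import random

# Toy stand-in objective: given two task arrival times, return a
# "risk" score (higher = closer to a deadline miss). The quadratic
# shape and its peak at (3, 7) are illustrative assumptions.
def risk(at):
    return -((at[0] - 3) ** 2 + (at[1] - 7) ** 2)

def global_phase(bounds, samples=500, seed=0):
    """Random sampling standing in for the GA: cheaply locates a
    high-risk region of the large arrival-time space."""
    rng = random.Random(seed)
    candidates = [[rng.randint(lo, hi) for lo, hi in bounds]
                  for _ in range(samples)]
    return max(candidates, key=risk)

def complete_phase(center, bounds, D=2):
    """Exhaustive search within radius D of the best global solution,
    standing in for the complete CP search: provably returns the
    worst case inside the restricted region."""
    best = center
    for a0 in range(max(bounds[0][0], center[0] - D),
                    min(bounds[0][1], center[0] + D) + 1):
        for a1 in range(max(bounds[1][0], center[1] - D),
                        min(bounds[1][1], center[1] + D) + 1):
            if risk([a0, a1]) > risk(best):
                best = [a0, a1]
    return best

bounds = [(0, 10), (0, 10)]
print(complete_phase(global_phase(bounds), bounds))
```

The design point is the division of labor: the cheap global phase only needs to land near a high-risk region, after which the complete phase guarantees the worst case within that limited region, which neither technique achieves alone at scale.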
Combining CP and GA
75
[Excerpt from S. Di Alesio et al., ACM TOSEM 2015]

Fig. 3: Overview of GA+CP: the solutions x1, y1, and z1 in the initial population of GA evolve into x6, y6, and z6; then CP searches in their neighborhood for the optimal solutions x*, y*, and z*.

— Set J*(x) of tasks missing or closest to missing their deadlines in the schedule generated by the arrival times in x:

    J*(x) ≜ { j* ∈ J | ∃k* ∈ K_{j*}(x) · (deadline_miss_{j*,k*}(x) ≥ 0 ∨ ∀j ∈ J, k ∈ K_j · deadline_miss_{j*,k*}(x) ≥ deadline_miss_{j,k}(x)) }

— Union set I* of impacting sets of tasks missing or closest to missing their deadlines. Let I*(x) be the union of the impacting sets of the tasks in J*(x):

    I*(x) ≜ ⋃_{j* ∈ J*(x)} I_{j*}(x)

By definition, I*(x) contains all the tasks that can have an impact on a task that misses a deadline or is closest to a deadline miss.

— Neighborhood ε of an arrival time and neighborhood size D. Let ε(x_{j,k}) be the interval centered in the arrival time x_{j,k} computed by GA, and let D be its radius: ε(x_{j,k}) = [x_{j,k} − D, x_{j,k} + D]. ε defines the part of the search space around x_{j,k} in which to find arrival times that are likely to break task deadlines. D is a parameter of the search.

— Constraint Model M implementing a Complete Search Strategy. Let M be the constraint model defined in our previous work [Di Alesio et al. 2014] for the purpose of identifying arrival times for tasks that are likely to lead to deadline-miss scenarios. M models the static and dynamic properties of the software system respectively as constants and variables, and the scheduler of the operating system as a set of constraints among such variables. Note that M implements a complete search strategy over the space of arrival times. This means that M searches for arrival times of all aperiodic tasks within the whole interval T.

Our combined GA+CP strategy consists of the following two steps: …
Applicable? Scalable?
76
Scalability Examples
• This is the most common SE challenge in practice
• Testing software controllers
  – Large input and configuration space
  – Smart search optimization heuristics (e.g., sensitivity analysis, machine learning)
• Requirements-driven testing from UCS
  – NLP of restricted use case specifications
  – OCL constraint solving using metaheuristic search
• Schedulability analysis and stress testing
  – Constraint programming cannot scale by itself
  – Carefully combined with genetic algorithms
77
Scalability Examples (2)
• Verifying legal requirements
  – Many provisions and articles in laws: traceability is complex
  – Many dependencies within the law
  – Natural language processing: cross references, support for identifying missing modeling concepts
  – Simulation: generating a large number of model instances satisfying constraints and probability distributions
• Run-time verification of business processes
  – Traces can be large and properties complex to verify
  – Optimized transformations of temporal properties into regular OCL properties, defined on a trace conceptual model
  – Incremental verification at regular time intervals
  – Heuristics to identify which subtraces to verify
78
Scalability: Lessons Learned
• Scalability must be part of the problem definition and solution from the start, not a refinement or an after-thought
• It often involves heuristics, e.g., meta-heuristic search, NLP, machine learning, statistics
• Scalability often leads to solutions that offer “best answers” within time constraints, not guarantees
• Solutions to achieve scalability are multidisciplinary
• Scalability analysis should be a component of every research project – otherwise the result is unlikely to be adopted in practice
• How many research papers include even a minimal form of scalability analysis?
79
Applicability
• Can the target user population efficiently apply it?
• Assumptions: Are working assumptions realistic, e.g., realistic information requirements?
• Integration into the development process, e.g., are required inputs available in the right form and level of precision?
80
Applicability examples
• Testing software controllers
  – Working assumption: availability of sufficiently precise plant (environment) models
  – Means to visualize relevant properties in the search space (inputs, configuration), to get an overview and focus search on high-risk areas
• Schedulability analysis and stress testing
  – Availability of task architecture models
  – Accurate WCET analysis
  – Assess risk based on near-deadline misses as well
81
Applicability examples (2)
• Requirements-driven testing from UCS
  – Precise behavioral scenarios are not an option
  – Number of constraints
  – Effort of writing constraints
• Run-time verification of business process models
  – Temporal logic not usable by analysts
  – Language closer to natural language, directly tied to the business process model
  – Easy transition to an industry-strength constraint checker (OCL)
• Verifying legal requirements
  – Modeling notation must be shared by IT specialists and legal experts
  – One common representation for many applications, with traces to the law to handle changes
  – Multiple model transformation targets: simulation, testing …
82
Applicability: Lessons Learned
• Make working assumptions explicit: Determine the context of applicability
• Make sure those working assumptions are at least realistic in some industrial domain and context
• Assumptions don’t need to be universally true – they rarely are anyway. No solution needs to be universally applicable
• Run industrial case studies and usability studies – do it for real!
83
Conclusions
• In most research endeavors, applicability and scalability are an after-thought, a secondary consideration, if considered at all
• Implicit assumptions are often made, often unrealistic in any context
• Problem definition in a vacuum
• Not adapted to research in an engineering discipline
• Leads to limited impact
• Research in model-based V&V is necessarily multi-disciplinary
• Industrial case studies and user studies are required and far too rare
• In engineering research, there is no substitute for reality
84
Acknowledgements
PhD students:
• Reza Matinnejad
• Stefano Di Alesio
• Wei Dou
• Ghanem Soltana
• Chunhui Wang

Scientists:
• Shiva Nejati
• Mehrdad Sabetzadeh
• Domenico Bianculli
• Fabrizio Pastore
• Arnaud Gotlieb
85
Making Model-Driven Verification Practical and Scalable: Experiences and Lessons Learned
Lionel Briand
IEEE Fellow, FNR PEARL Chair
Interdisciplinary Centre for ICT Security, Reliability, and Trust (SnT)
University of Luxembourg, Luxembourg
STAF, L'Aquila, Italy, 2015
SVV lab: svv.lu
SnT: www.securityandtrust.lu