Symbolic Execution for Software Testing in Practice – Preliminary Assessment
6.1/80
TESTING…
6.2/80 Overview
Motivation, Testing glossary, Quality issues, Non-execution-based testing, Execution-based testing, What should be tested?, Correctness proofs, Who should perform execution-based testing?, Testing distributed SW, Testing real-time SW, When does testing stop?, Summary.
6.4/80 Motivation
SW life-cycle models too often include a separate testing phase.
Nothing could be more dangerous!
Testing should be carried out continuously throughout the SW life cycle.
6.6/80 Testing – Glossary …
“V & V” vs. testing:
– Verification: determine whether a phase was completed correctly. Takes place at the end of each phase. Boehm: Verification = “Are we building the product right?”
– Validation: determine whether the product as a whole satisfies its requirements. Takes place before the product is handed to the client. Boehm: Validation = “Are we building the right product?”
6.7/80 Testing – Glossary (Cont’d)
Warning:
– “Verify” is also used for all non-execution-based testing,
– “V&V” might imply that there is a separate phase for testing.
There are two types of testing:
– Execution-based testing,
– Non-execution-based testing.
It is impossible to ‘execute’ the MRD or EPS. On the other hand, is code testing alone enough, or efficient, for the implementation phases?
6.9/80 Quality …
Quality…
6.10/80 Quality (Cont’d)
Quality: peculiar and essential character; an inherent feature; degree of excellence; superiority in kind; an intelligible feature by which a thing may be identified.
6.11/80 SW Quality …
In other areas, quality implies excellence.
Not here!
The quality of SW is the extent to which the product satisfies its specifications.
6.12/80 SW Quality (Cont’d) …
Very often, bugs are found as the delivery deadline approaches. The choice is then:
– Release a faulty product, or
– Deliver late.
Have a separate SW Quality Assurance (SQA) team:
– Instead of 100 programmers devoting 30% of their time to SQA activities, have full-time SQA professionals.
In a small company, use cross-review.
6.13/80 SW Quality (Cont’d)
Managerial independence between:
– Development group,
– SQA group.
[Figure: organization chart of the Bit & Bug Company, showing the General Manager, the SQA Team and the Development Team.]
6.14/80 The SQA Team Responsibilities
To ensure that the current phase is correct,
To ensure that the development phases have been carried out correctly,
To check that the product as a whole is correct,
To develop the standards and tools to which the SW and the SW development must conform [CMM level?],
To establish monitoring procedures for assuring compliance with those standards.
6.16/80 Non-Execution-Based Testing …
Underlying principles:
– Group synergy,
– We should not review our own work,
– We each have our own blind spot.
Cover your right eye and stare at the red circle. Then slowly move away from the page (or, if you are already far away, move toward it). Do not look at the blue star. At some distance the blue star disappears from the picture. That is your blind spot!
6.17/80 Non-Execution-Based Testing …
6.18/80 Non-Execution-Based Testing (Cont’d)
Non-execution-based testing:
– Walkthrough,
– Inspection,
– (Peer review).
6.19/80 Walkthrough – The Team
4–6 members, with representatives from:
– Specification team member (document author?),
– Specification team manager,
– Client,
– Next team (the spec’s clients),
– SQA, who chairs the walkthrough.
6.20/80 Walkthrough – Preparations
Set up the team,
Distribute the spec in advance,
Each reviewer prepares two lists:
– Things he/she does not understand,
– Things he/she thinks are incorrect,
Hold the walkthrough session(s).
6.21/80 The Walkthrough Session …
Chaired by SQA (the one who will lose the most..), whose roles are:
– Elicit questions,
– Facilitate discussion,
– Prevent a ‘point-scoring’ session,
– Prevent the session from turning into an annual evaluation (remember that the team leader and the team manager are present …).
Sessions of up to 2 hours,
Might be participant-driven or document-driven,
Verbalization leads to fault finding!
Most faults are found by the presenter!
6.22/80 The Walkthrough Session (Cont’d)
Detect faults – do not correct them! Why?
– Cost-benefit of the correction (6 members..),
– Faults should be analyzed carefully,
– There is not enough time in the session,
– The ‘committee attitude’.
6.23/80 Inspection …
A more formal process, with six stages:
– Planning: set the team, set the schedule,
– Overview session: overview and document distribution,
– Preparation: learn the spec, aided by statistics of fault types, that is, use the organization’s knowledge base,
– Inspection: walk through the document, verifying each item. A formal summary is distributed, listing the ARs (action-required items: tasks and due dates),
– Rework: fault resolving,
– Follow-up: check that every issue is resolved, by a fix or a clarification.
6.24/80 Inspection (Cont’d) …
Team of five:
Moderator (i.e. the spec team leader):
– Manages the inspection, both the team and the object,
– Ensures that the team takes a positive approach,
Specification author:
– Answers questions about the product,
Reader:
– Reads the document aloud,
Recorder:
– Documents the results of the inspection,
SQA inspector (specialist):
– Provides an independent assessment of the spec.
6.25/80 Inspections (Cont’d)
Use a checklist of potential faults:
– Is each item of the spec correctly addressed?
– In case of an interface, do actual and formal arguments correspond?
– Have error-handling mechanisms been identified?
– Is the SW design compatible with the HW design?
– Etc.
Throughout the inspection: record faults.
6.26/80 Fault Statistics
Recorded by severity and fault type:
– Major (premature termination, DB damage, etc.),
– Or minor.
Usage of the data:
– Compare with previous products,
– What if there is a disproportionate number of faults in a specific module? Maybe redesign it from scratch?
– Carry forward fault statistics to the next phase,
– Not for performance appraisal!
6.27/80 Inspection – Example [Fagan 1976]
100 person-hours task,
Rate of two 2-hour inspections per day,
Four-person team,
100 / (5*4) = 5 working days,
67% of all faults were detected before module execution-based testing!
38% fewer faults than a comparable product.
6.28/80 Statistics on Inspections
93% of all detected faults were found during inspections (IBM, 1986),
90% decrease in the cost of detecting a fault (switching system, 1986),
4 major faults and 14 minor faults found per 2 hours (JPL, 1990), a saving of $25,000 per inspection,
Number of faults decreased exponentially by phase (JPL, 1992).
6.29/80 Review Strengths and Weaknesses
Strengths:
– Effective way of detecting faults,
– Early detection,
– Saves money.
Weaknesses:
– Depends on an adequate process,
– Large-scale SW is extremely hard to review (unless it is modular, e.g. OOP),
– Depends on the previous phase’s documents,
– Might be used for performance appraisal.
6.30/80 Metrics for Inspections
Fault density:
– Faults per page, or
– Faults per KLOC,
– By severity (major/minor),
– By phase.
Fault detection rate (e.g. faults detected per hour),
Fault detection efficiency (e.g. faults detected per person-hour),
What does a 50% increase in the fault detection rate mean?
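The three metrics above are simple ratios, and a short Python sketch can make the units explicit. The function names and the 4.5 KLOC module size are assumptions made for illustration. The 18 faults in 2 hours echo the JPL figure quoted earlier (4 major plus 14 minor), and the team of five follows the inspection team size.

```python
def fault_density(faults, size):
    """Faults per unit of size: pages for a document, KLOC for code."""
    return faults / size


def detection_rate(faults, hours):
    """Faults detected per hour of inspection."""
    return faults / hours


def detection_efficiency(faults, hours, team_size):
    """Faults detected per person-hour."""
    return faults / (hours * team_size)


# Hypothetical inspection record (module size is an assumption).
faults, hours, team_size, kloc = 18, 2, 5, 4.5

density = fault_density(faults, kloc)                         # faults per KLOC
rate = detection_rate(faults, hours)                          # faults per hour
efficiency = detection_efficiency(faults, hours, team_size)   # faults per person-hour
print(density, rate, efficiency)
```

The numbers alone do not answer the closing question: a higher detection rate may mean that the inspections improved, or that the product contains more faults to find.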
6.32/80 Execution-Based Testing
Definitions:
– Failure (incorrect behavior),
– Error (mistake made by the programmer).
Nonsensical statement:
– “Testing is demonstration that faults are not present.”
Dijkstra:
– “Program testing can be a very effective way to show the presence of bugs, but it is hopelessly inadequate for showing their absence” [Dijkstra, 1972].
6.33/80 What Is Execution-Based Testing?
“The process of inferring certain behavioral properties of a product based, in part, on the results of executing the product in a known environment with selected inputs.” [IEEE 610.12, 1990]
Troubling implications:
Inference:
– Trying to find whether there is a black cat in a dark room.
Known environment?
– Neither the SW nor the HW is really known.
Selected inputs:
– What about RT systems (e.g. an avionic system)?
6.35/80 But What Should Be Tested?
Utility,
Reliability,
Robustness,
Performance,
Correctness.
6.36/80 Utility
Utility: the extent to which the user’s needs are met when a correct product is used under conditions permitted by its specification.
Does it meet the user’s needs?
– Ease of use,
– Useful functions,
– Cost-effectiveness.
Utility should be tested first; if the product fails on that score, testing should stop.
6.37/80 Reliability
Reliability: a measure of the frequency and criticality of product failure.
Frequency and criticality of failure:
– MTBF: Mean Time Between Failures,
– MTTR: Mean Time To Repair,
– Mean time and cost to repair the results of a failure.
Suppose our SW fails only once every six months, but when it fails it completely wipes out a database. The SW can be re-run within 2 hours, but reconstructing the DB might take a week.
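The database scenario shows why the time to repair the SW alone understates the cost of a failure. The sketch below puts numbers on it, under two assumptions that are not in the slides: a "week" is counted as 7 x 24 elapsed hours, and the uptime samples are invented for illustration.

```python
HOURS_PER_WEEK = 7 * 24


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


# Hypothetical uptimes (hours) observed between consecutive failures,
# roughly "once every six months".
uptimes_hours = [4300, 4400, 4500]
mtbf_hours = mean(uptimes_hours)

# Scenario from the slide: the SW is back within 2 hours,
# but the results of the failure (the lost DB) take a week to repair.
software_repair_hours = 2
database_rebuild_hours = HOURS_PER_WEEK
total_recovery_hours = software_repair_hours + database_rebuild_hours

print(mtbf_hours, software_repair_hours, total_recovery_hours)
```

Under these assumptions the repair time of the SW is 2 hours while the total recovery is 170 hours, which is the gap the third metric in the list is meant to capture.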
6.38/80 Robustness …
Range of operating conditions:
– Possibility of unacceptable results with valid input,
– Effect of invalid input.
A product with a wide range of permissible operating conditions is more robust than a product that is more restrictive.
6.39/80 Robustness (Cont’d)
A robust product should not yield unacceptable results when the input satisfies its specifications.
A robust product should not crash when it is used outside its permissible operating conditions.
6.40/80 Performance
Extent to which space and time constraints are met.
Real-time SW has hard time constraints: can the CPU process one image within 5 ms (for a 200 Hz sampling rate)?
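The 5 ms figure is the sampling period, 1/200 s. A minimal sketch of a timing check follows. The helper names are hypothetical, and a single measurement on a general-purpose OS is only an indication, not evidence that a hard deadline is always met.

```python
import time


def period_ms(sampling_rate_hz):
    """Time available per sample, in milliseconds."""
    return 1000.0 / sampling_rate_hz


def meets_deadline(process, frame, budget_ms):
    """Run process(frame) once and compare the elapsed time with the budget."""
    start = time.perf_counter()
    process(frame)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return elapsed_ms <= budget_ms, elapsed_ms


budget = period_ms(200)  # 200 Hz sampling leaves 5 ms per image

# Hypothetical stand-in for the image-processing step.
ok, elapsed = meets_deadline(lambda frame: sum(frame), list(range(1000)), budget)
print(budget, ok, elapsed)
```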
6.41/80 Correctness
A product is correct if it satisfies its output specifications, independent of its use of computing resources, when operated under permitted conditions.
6.43/80 Correctness Proofs (Verification)
A mathematical technique for showing that a product is correct.
Correctness means that it satisfies its specifications.
Is it an alternative to execution-based testing?
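6.44/80 Specifications Correctness …
Specification for a sort: [figure not preserved in the transcript]
Are these good specifications?
Function trickSort satisfies these specifications: [figure not preserved in the transcript]
6.45/80 Specifications Correctness (Cont’d)
Incorrect specification for a sort: [figure not preserved in the transcript]
Corrected specification for the sort: [figure not preserved in the transcript]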
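The specifications themselves were shown as figures, so the following Python sketch is a reconstruction under assumptions: the incorrect specification is taken to require only that the output is in nondecreasing order, and the corrected one to require in addition that the output is a permutation of the input. The names trick_sort, satisfies_weak_spec and satisfies_corrected_spec are illustrative.

```python
from collections import Counter


def trick_sort(p):
    """Return an ordered list that ignores the input values entirely."""
    return [0] * len(p)


def satisfies_weak_spec(p, q):
    """Assumed weak spec: q has the length of p and is in nondecreasing order."""
    ordered = all(q[i] <= q[i + 1] for i in range(len(q) - 1))
    return len(q) == len(p) and ordered


def satisfies_corrected_spec(p, q):
    """Assumed corrected spec: q is ordered AND is a permutation of p."""
    return satisfies_weak_spec(p, q) and Counter(p) == Counter(q)


p = [3, 1, 2]
print(satisfies_weak_spec(p, trick_sort(p)))        # weak spec is fooled
print(satisfies_corrected_spec(p, trick_sort(p)))   # corrected spec is not
print(satisfies_corrected_spec(p, sorted(p)))       # a real sort passes
```

A proof that trick_sort meets the weak specification would be valid and useless, which is why a wrong specification undermines a correctness proof.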
6.46/80 Correctness
NOT sufficient, as was demonstrated in the previous example.
NOT a showstopper. Consider a new compiler that:
– Is twice as fast,
– Produces object code that is 20% smaller,
– Produces object code that is 20% faster,
– Gives much clearer error messages,
– But issues a single error message for the first ‘for’ statement encountered in any class.
– Will you use it?
6.47/80 Correctness Proofs – Glossary
An assertion is: a claim that a certain mathematical property holds true at a given point.
An invariant is: a mathematical expression that holds true under all conditions tested.
An input specification is: a condition that holds true before the code is executed.
6.48/80 Example of Correctness Proof …
Code to be proven correct: [code listing not preserved in the transcript]
6.49/80 Example (Cont’d) …
Flowchart of code segment: [figure not preserved in the transcript]
6.50/80 Example (Cont’d) …
6.51/80 Example (Cont’d) …
We have to prove the output spec:
@H: S = y[0] + y[1] + … + y[n-1],
Or even a stronger assertion:
@H: k = n and S = y[0] + y[1] + … + y[n-1].
6.52/80 Example (Cont’d) …
We will prove the loop invariant
@D: n ≥ k and S = y[0] + y[1] + … + y[k-1].
(1) @A: n ∈ {1,2,3 …},
(2) @B: k = 0 and n ∈ {1,2,3 …} (we will omit the condition on n from here on),
(3) @C: k = 0 and S = 0.
Before the loop is entered (@D):
@D: k = 0, S = 0 and n ∈ {1,2,3 …}, hence n ≥ k, and the sum y[0] + … + y[k-1] is empty, thus S = 0.
This is our induction base.
6.53/80 Example (Cont’d) …
The induction step: we assume that at some stage, with k = k0 and n ≥ k0 ≥ 0, the loop invariant holds:
@D: n ≥ k0 and S = y[0] + y[1] + … + y[k0-1].
Control now passes to the test box.
One possibility: k0 ≥ n. Because n ≥ k0 (our assumption), k0 = n. So we get:
@H: k0 = n and S = y[0] + y[1] + … + y[k0-1],
which is our target!
6.54/80 Example (Cont’d) …
The other possibility: k0 < n, so it follows:
(4) @E: k0 < n and S = y[0] + … + y[k0-1].
We execute S = S + y[k0]:
(5) @F: k0 < n and S = y[0] + … + y[k0-1] + y[k0],
that is: S = y[0] + … + y[k0].
We execute k = k0 + 1. We are at point G, and so we get:
@G: n ≥ k and S = y[0] + … + y[k-1].
Exactly the loop invariant!
6.55/80 Example (Cont’d)
So the loop invariant holds for every k with n ≥ k ≥ 0.
We also have to prove that the loop terminates. This is obvious: each iteration increases k by 1, while n is fixed.
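The code listing for this example is missing from the transcript. The Python sketch below assumes what the assertions suggest: a loop that sums y[0] … y[n-1] with a counter k and an accumulator S. The labels in the comments refer to the assertion points used in the proof; the function name and the use of integer inputs are assumptions.

```python
def array_sum(y):
    """Sum a non-empty list of integers, checking the proof's assertions at run time."""
    n = len(y)
    assert n >= 1                          # @A: input specification, n in {1, 2, 3, ...}
    k = 0
    s = 0
    while True:
        # @D: loop invariant, checked before every evaluation of the loop test
        assert k <= n and s == sum(y[:k])
        if k >= n:                         # test box: leave the loop when k >= n
            break
        s = s + y[k]                       # @F: s now includes y[k]
        k = k + 1                          # @G: the invariant holds again for the new k
    assert k == n and s == sum(y)          # @H: output specification
    return s


print(array_sum([4, 8, 15]))
```

Run-time checks like these only confirm the invariant for the inputs that are actually executed; the induction argument is what covers every n.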
6.56/80 Correctness Proof Case Study
Never prove a program correct without testing it as well.
We need both testing and correctness proofs.
6.57/80 Naur and the Line-editor – Episode 1 …
1969: Naur’s paper, the “Naur text-processing problem”.
Given a text consisting of words separated by blank or by nl (new line) characters, convert it to line-by-line form in accordance with the following rules:
(1) Line breaks must be made only where the given text has a blank or nl,
(2) Each line is filled as far as possible, as long as
(3) No line will contain more than maxpos characters.
Naur constructed a procedure (25 lines of Algol 60) and informally proved its correctness.
6.58/80 Naur and the Line-editor – Episode 2 …
1970: a reviewer in Computing Reviews:
– In the output of Naur’s procedure, the first word of the first line is preceded by a blank unless the first word is exactly maxpos characters long.
Most likely, such a fault would have been detected by testing.
6.59/80 Naur and the Line-editor – Episode 3 …
1971: London finds 3 more faults, including:
– The procedure does not terminate unless a word longer than maxpos characters is encountered.
Again, this fault would likely have been detected if the procedure had been tested.
London presents a corrected version and a formal proof.
6.60/80 Naur and the Line-editor – Episode 4
1975: Goodenough and Gerhart find three further faults, including:
– The last word will not be output unless it is followed by a blank or nl.
Again, a reasonable choice of test data would have detected that fault.
6.61/80 Proofs and SW Engineering
Out of seven faults, four could have been detected simply by running the procedure on test data, such as the illustrations given in Naur’s original paper.
Lesson: even if a product is proved correct, it must STILL be tested.
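To make the lesson concrete, here is a small Python sketch of the line-filling problem together with test data aimed at the faults reported in the episodes. It is not Naur's Algol 60 procedure. The function name fill_lines is hypothetical, and rejecting a word longer than maxpos with an error is a design choice of this sketch, since the three rules do not say what should happen in that case.

```python
def fill_lines(text, maxpos):
    """Re-flow the words of text into lines of at most maxpos characters."""
    lines = []
    current = ""
    # str.split() with no argument splits on any run of whitespace,
    # which covers both blanks and newline characters.
    for word in text.split():
        if len(word) > maxpos:
            raise ValueError(f"word longer than maxpos: {word!r}")
        if not current:
            current = word                       # first word on a line: no leading blank
        elif len(current) + 1 + len(word) <= maxpos:
            current += " " + word                # rule 2: fill the line as far as possible
        else:
            lines.append(current)                # rule 3: never exceed maxpos
            current = word
    if current:
        lines.append(current)                    # the last word is always output
    return lines


# Test data chosen after the reported faults:
print(fill_lines("abcde fg", 5))     # first word exactly maxpos long
print(fill_lines("aa bb", 2))        # last word not followed by blank or nl
print(fill_lines("a b\nc d e", 3))   # mixed blanks and newlines
```

Each of these cases takes seconds to write, which is the point of the episodes: cheap test data would have exposed faults that an informal proof missed.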
6.62/80 Three Myths …
Why should correctness proving not be viewed as a standard SW engineering technique?
1. SW engineers do not have enough math for proofs,
2. Proving is too expensive to be practical,
3. Proving is too hard.
6.63/80 Three Myths (Cont’d)
Math knowledge: most CS graduates today either take courses in the requisite material or have the background to learn correctness proving on the job. (Remember the acquaintance questionnaire?)
Expensive: consider SW for a space station, or anywhere else where human lives are at stake.
Hard: nevertheless, many nontrivial SW products have been successfully proved correct, including OS kernels, compilers and communication systems.
6.64/80 Proofs Difficulties …
Can we trust a theorem prover?
What if a theorem prover prints out: “This product is correct”?
Consider:
void theoremProver() { System.out.println("This product is correct"); }
What if we submit a prover to itself?
6.65/80 Proofs Difficulties (Cont’d) …
How do we find input–output specifications and loop invariants?
What if the specifications are wrong (trickSort..)?
We can never be sure that the specifications or the verification system are correct [Manna & Waldinger].
6.66/80 Proofs and SW Engineering (Cont’d) …
Correctness proofs are a vital SW engineering tool, WHERE APPROPRIATE, that is, if:
– Human lives are at stake,
– Indicated by cost/benefit analysis,
– The risk of not proving is too great.
Also, informal proofs can improve the quality of the product.
Assertions in code: if at run time an assertion does not hold, the product is halted.
6.67/80 Proofs and SW Engineering (Cont’d)
Languages with assertion capability:
– Java, Ada,
– Assert statement (Eiffel):
assert (checkVar > 0)
If, at any time, checkVar is not > 0, execution is stopped.
Assert statements are mostly enabled in debug mode, and turned off to speed up execution.
Using bounds checking while developing a product but turning it off once the product is working correctly can be likened to learning to sail on dry land wearing a life jacket and then taking the life jacket off when actually at sea.
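The same trade-off can be seen in Python, used here only as an illustration since the slides name Java, Ada and Eiffel. Python's assert statement is removed when the interpreter runs with the -O option, so a check that must survive in production has to be an ordinary conditional. The function names are hypothetical.

```python
def average(total, check_var):
    """Development-time check: disappears when Python runs with -O."""
    assert check_var > 0, "check_var must be positive"
    return total / check_var


def average_checked(total, check_var):
    """Always-on check: the life jacket stays on at sea."""
    if check_var <= 0:
        raise ValueError("check_var must be positive")
    return total / check_var


print(average(10, 4))
print(average_checked(10, 4))
```

With assertions disabled, average(10, 0) would fail with a ZeroDivisionError instead of the clearer assertion message, while average_checked keeps reporting the violated condition.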
6.69/80 Who Should Perform Execution-Based Testing?
Is testing a “destructive” task?
– A successful test finds a fault,
– A programmer does not wish to destroy his own work.
Solution:
– 1. The programmer does informal testing,
– 2. SQA does systematic testing,
– 3. The programmer debugs the module (that is, finds the cause of the failure and corrects the fault).
All test cases must be:
– Planned beforehand, including the expected output,
– Retained afterwards.
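One way to read "planned beforehand, including the expected output" is a table of cases written before the code is run and kept for later re-runs. The sketch below assumes a hypothetical function under test, is_leap_year, and a hypothetical runner, run_planned_cases; neither comes from the slides.

```python
def is_leap_year(year):
    """Hypothetical function under test."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


# Planned beforehand: every case records the input AND the expected output.
# Retained afterwards: the table is kept and re-run after every change.
PLANNED_CASES = [
    (1900, False),
    (2000, True),
    (2023, False),
    (2024, True),
]


def run_planned_cases(func, cases):
    """Return (input, expected, actual) for every case that fails."""
    failures = []
    for arg, expected in cases:
        actual = func(arg)
        if actual != expected:
            failures.append((arg, expected, actual))
    return failures


print(run_planned_cases(is_leap_year, PLANNED_CASES))
# A faulty implementation is caught by the retained cases:
print(run_planned_cases(lambda year: year % 4 == 0, PLANNED_CASES))
```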
6.71/80 Testing Distributed SW …
When testing code on a uni-processor we assume that:
– There is a global environment,
– The execution of the product within the environment is deterministic,
– The instructions of the product are executed sequentially, and
– Inserting debugging statements between source code statements will not modify the execution of the product.
None of these assumptions holds in a distributed system.
6.72/80 Testing Distributed SW (Cont’d)
In distributed SW:
– There is no global environment,
– Product execution may not be reproducible,
– The product’s instructions are executed in parallel,
– Inserting debugging statements might affect process timing.
We need special tools, e.g. a distributed debugger, for testing distributed SW.
We need to maintain history files in order to be able to reproduce the exact sequence that led to a failure.
6.74/80 Testing Real-Time SW
RT systems are critically dependent on the timing of inputs and the order of inputs.
These two factors are not controlled by the programmer.
Examples:
– Aircraft arrival,
– Temperature of a nuclear reactor,
– Patient heart rate in an intensive-care unit.
In RT systems there is a higher demand for robustness, as these are often stand-alone systems: they must handle many exceptions, and are therefore required to have self-recovery capabilities.
6.75/80 Techniques for RT SW Testing …
Structure analysis:
– To investigate control flow, we prove that every part of the code is ‘feasible’, and that there is a ‘termination path’ from every part of the code,
– Detect and prevent deadlocks.
Correctness proofs:
– A number of theorem provers were constructed to prove RT systems.
Systematic testing:
– Running sets of test cases consisting of the same input data arranged in all possible orderings (n inputs induce n! test cases!).
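The n! growth is easy to see with the standard library. In the sketch below the event names are hypothetical; itertools.permutations enumerates every ordering of the same inputs, and math.factorial gives the count without enumerating.

```python
from itertools import permutations
from math import factorial


def all_orderings(inputs):
    """Every arrival order of the same set of inputs, one test case per order."""
    return list(permutations(inputs))


# Hypothetical input events for a small RT system.
events = ["sensor_a", "sensor_b", "timer"]
orderings = all_orderings(events)

print(len(orderings))     # number of test cases for 3 inputs
print(factorial(10))      # number of test cases for 10 inputs
```

Three inputs give 6 orderings, but ten already give 3,628,800, so exhaustive ordering tests are practical only for very small input sets.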
6.76/80 Techniques for RT SW Testing (Cont’d)
Statistical techniques:
– In order to decrease system failures to, say, 0.001%!
Simulation:
– “A simulator is a device which calculates, emulates or predicts the behavior of another device, or some aspect of the behavior of the world.”
– A simulator might be used as a test-bed on which the product can be run,
– SQA might use the simulator to provide selected inputs to the product,
– Simulators are particularly important when it is impossible or too dangerous to test a product against suitable sets of test data, such as an aircraft stalling.
6.78/80 When Does Testing Stop?
Only when the product has been irrevocably retired.
Do maintain old test cases.
6.80/80
TESTING
The End