2018-12-15 Software Quality Measurementmcp130030/talks/t20181215... · 2018-12-01 · Quality Is A...
Transcript of 2018-12-15 Software Quality Measurementmcp130030/talks/t20181215... · 2018-12-01 · Quality Is A...
________________________________________________________________________Jonsson School of Engineering and Computer Science
Dr. Mark C. [email protected], [email protected]
Software Quality Measurement
What Is Quality?Some Traditional Definitions
Quality is conformance to requirements- Phillip Crosby
Quality is defined by the customer- W. Edwards Deming
Quality is fitness for use.- Joseph Juran
Nine dimensions of quality: performance, features, functionality, safety, conformance, reliability, durability, service, and aesthetics
- David Garvin (1987)
2
Is Quality “Failures” of the Software?IEEE 1633 Failure Severity
A rating system for the impact of every recognized credible software failure mode.
NOTE—The following is an example of a rating system:• Severity #1: Loss of life or system• Severity #2: Affects ability to complete mission
objectives• Severity #3: Workaround available, therefore minimal
effects on procedures (mission objectives met)• Severity #4: Insignificant violation of requirements or
recommended practices, not visible to user in operational use
• Severity #5: Cosmetic issue that should be addressed or tracked for future action, but not necessarily a present problem.
3
Is Quality “Defects” in the Software? IEEE 1044 Defect Attributes
Defect ID, Description, Status, Asset, Artifact, Version detected, Version corrected, Probability, Effect, Type, Mode, Insertion activity, Detection activity, Failure reference, Change reference, …
Priority Ranking for processing assigned by the organization responsible for the evaluation,resolution, and closure of the defect relative to other reported defects.
Severity The highest failure impact that the defect could (or did) cause, as determined by(from the perspective of) the organization responsible for software engineering.
4
Severity vs Priority
Severity
Critical
Major
Average
Minor
Exception
Priority
Resolve Immediately- show stopper
Give High Attention- urgent
Normal Queue- high
Low Priority- medium
Defer- low, cosmetic
Traditional Definitions of Quality(Glass 1998)
The traditional definitions are wrong. • quality is not about user satisfaction, product
conformance, or costs and schedules• nor is it solely about defects
Instead, there is a well-defined, intuitive relationship between quality and those other product traits, one that clearly distinguishes among them.
6
Quality Is A Collection (Glass, 2004)
Quality is a collection of attributes. - Reliability is about a software product that does what
it’s supposed to do, dependably.- Human engineering (also known as usability) is
about a software product that is easy and comfortable to use.
- Understandability is about a software product that is easy for maintainers to comprehend.
- Modifiability is about a software product that is easy for a maintainer to change.
- Efficiency is about a software product that economizes on both running time and space consumption.
- Testability is about a software product that is easy to test.
- Portability is about creating a software product that is easily moved to another platform.
7
Quality equals the totality of features and characteristics of a product or service that bear on its ability to satisfy stated and implied need.
- ISO 8402: 1986
Quality = portability + reliability + efficiency + human engineering + understandability + modifiability + testability (Boehm 1975)
- different software products call for different “ilities” mixes
- no universal definition of quality at the detail level, that is independent of the product in question, exists
8
ISO/IEC 25010 Software Quality Requirement and Evaluation (SQuaRE)
Functional suitability
- functional completeness
- functional correctness
- functional appropriateness
Performance efficiency
- time behavior - resource utilization- capacity
Compatibility- coexistence- interoperability
Usability- appropriateness
recognizability- learnability- operability- user error prediction- user interface
aesthetics- accessibility
9
Reliability- maturity- availability- fault tolerance- recoverability
Security- confidentiality- integrity- nonrepudiation- accountability- authenticity
Maintainability- modularity- reusability- analyzability- modifiability- testability
Portability- adaptability- installability- replaceability
10
Seven Quality Attributesin Software Architecture
Availability
Interoperability
Modifiability
Performance
Security
Testability
Usability
L. Bass, P. Clements, and R. Kazman, Software Architecture in Practice, Third Edition, 2012.
11
Availability
A property of software that it is there and ready to carry out its task when you need it to be.
Builds upon the concept of reliability by adding the notion of recovery
“Availability refers to the ability of a system to mask or repair faults such that the cumulative service outage period does not exceed a required value over a specified time interval.”
Availability is about minimizing service outage time by mitigating faults.
12
13
Interoperability
The degree to which two or more systems can usefully exchange meaningful information via interfaces in a particular context.
Syntactic interoperability – the ability to exchange data.
Semantic interoperability – the ability to correctly interpret the data being exchanged.
14
15
Modifiability
Modifiability is about change, and our interest in it centers on the cost and risk of making changes.
What can change?
What is the likelihood of the change?
When is the change made and who makes it?
What is the cost of the change?
16
17
Performance
It’s about time and the software system’s ability to meet timing requirements.
When events occur, the system, or some element of the system, must respond to them in time.• real-time• hard real-time
Characterizing the events that can occur (and when they can occur) and the system or element’s time-based response to those events is the essence is discussing performance.
18
19
Security
A measure of the system’s ability to protect data and information from unauthorized access while still providing access to people and systems that are authorized.
An action taken against a computer system with the intention of doing harm is called an attack.• an unauthorized attempt to access data or
services• an unauthorized attempt to modify data• intended to deny services to legitimate users
20
21
Testability
Refers to the ease with which software can be made to demonstrate its faults through (typically execution-based) testing.
Refers to the probability, assuming that the software has at least one fault, that it will fail on its next test execution.
Industry estimates indicate that between 30 and 50 percent (or in some cases, even more) of the cost of developing well-engineered systems is taken up by testing.
22
23
Usability
Concerned with how easy it is for the user to accomplish a desired task and the kind of user support the system provides.
Usability comprises:• Learning system features. • Using a system efficiently. • Minimizing the impact of errors. • Adapting the system to user needs. • Increasing confidence and satisfaction.
24
25
Cost and Schedule
Should cost and schedule be considered quality attributes?
Are they included in customer expectations?• for a custom software development project?• for a commercial software product?- look among competing products “off the shelf” for
features, cost, and availability
26
Key Measurement Questions
Are we measuring the right thing?• Goal / Question / Metric (GQM)• business objectives data- cost (dollars, effort)- schedule (duration, effort)- functionality (size)- quality (defects)
Are we measuring it right?• operational definitions
Popular measures focus on project management concerns… with a focus on “quality” as defects in meeting requirements.
27
Goal-Driven Measurement
Goal / Question / Metric (GQM) paradigm- V.R. Basili and D.M. Weiss, "A Methodology for
Collecting Valid Software Engineering Data,” IEEE Transactions on Software Engineering, November 1984.
SEI variant: goal-driven measurement- R.E. Park, W.B. Goethert, and W.A. Florac, “Goal-
Driven Software Measurement – A Guidebook,”CMU/SEI-96-HB-002, August 1996.
ISO 15939 and PSM variant: measurement information model
- J. McGarry, D. Card, et al., Practical Software Measurement: Objective Information for Decision Makers, Addison-Wesley, Boston, MA, 2002.
28
29
Goal-Driven Measurement
SLOC Staff-Hours Trouble Reports Milestone dates
Business => Sub-goals => Measurement
Goals
• How large is our backlog of customer change requests?
• Is the response time for fixing bugs compatible with customer constraints?
QuestionsGOAL(s)
Questions
Indicators
Measures
Indicator TemplateObjectiveQuestion
80
204060
100
InputsAlgorithmAssumptions
Action Plans
Analysis & Diagnosis
Infrastructure Assessment
Indicators
4
_____
_____
_____
Definition Checklist
4 4
_____ 4 _____ 4
Operational Definitions
The rules and procedures used to capture and record data
What the reported values include and exclude
Operational definitions should meet two criteria• Communication – will others know what has
been measured and what has been included and excluded?• Repeatability – would others be able to repeat
the measurements and get the same results?
30
SEI Core Measures
Dovetails with SEI’s adaptation of goal-driven software measurement
Checklist-based approach with strong emphasis on operational definitions
Measurement areas where checklists have already been developed include:• effort• size• schedule• quality
See http://www.sei.cmu.edu/measurement/index.cfm31
SLOC Definition ConsiderationsWhether to include or exclude• executable and/or non-executable code statements• code produced by programming, copying without
change, automatic generation, and/or translation• newly developed code and/or previously existing
code• product-only statements or also include support
code• counts of delivered and/or non-delivered code• counts of operative code or include dead code• replicated code
When the code gets counted• at estimation, at design, at coding, at unit testing,
at integration, at test readiness review, at system test complete
32
Time or time interval when the system must be available
Availability percentage
Time to detect the fault
Time to repair the fault
Time or time interval in which system can be in degraded mode
Proportion or rate of a certain class of faults that the system prevents, or handles without failing
33
Response Measures for Availability
Response Measures for Interoperability
Percentage of information exchanges correctly processed
Percentage of information exchanges correctly rejected
34
Response Measures for Modifiability
Cost in terms of …
Number, size, complexity of affected artifacts
Effort
Calendar time
Money (direct outlay or opportunity cost)
Extend to which this modification affects other functions or quality attributes
New defects introduced35
Response Measures for Performance
The time it takes to process the arriving events (latency or deadline)
The variation in this time (jitter)
The number of events that can be processed within a particular time interval (throughput)
A characterization of the events that cannot be processed (miss rate)
36
Performance Measures for Security
How much of a system is compromised when a particular component or data value is compromised
How much time passed before an attack was detected
How many attacks were resisted
How long it took to recover from a successful attack
How much data was vulnerable to a particular attack
37
Performance Measures for TestabilityEffort to find a fault or class of faults
Effort to achieve a given percentage of state space coverage
Probability of fault being revealed by the next test
Time to perform tests
Effort to detect faults
Length of longest dependency chain in test
Length of time to prepare test environment
Reduction in risk exposure • size of loss * probability of loss
38
Performance Measures for Usability
Task time
Number of errors
Number of tasks accomplished
User satisfaction
Gain of user knowledge
Ratio of successful operations to total operations
Amount of time or data lost when an error occurs39
40
Dysfunctional Behavior
Austin’s Measuring and Managing Performance in Organizations• motivational versus information measurement
Deming strongly opposed performance measurement, merit ratings, management by objectives, etc.
Dysfunctional behavior resulting from organizational measurement is inevitable unless• measures are made “perfect”• motivational use impossible
Software Complexity
Complexity is everywhere in the software life cycle… usually an undesired property… makes software harder to read and understand… harder to change
- I. Herraiz and A.E. Hassan, “Beyond Lines of Code: Do We Need More Complexity Metrics?” Chapter 8 in Making Software: What Really Works, and Why We Believe It, A. Oram and G. Wilson (eds), 2011, pp. 125-141.
Dependencies between seemingly unrelated parts of a system… (unplanned) couplings between otherwise independent system components
- G.J. Holzmann, “Conquering Complexity,” IEEE Computer, December 2007.
41
Complexity Is Complex
S. Kelly and M.A. Allison, The Complexity Advantage, 1999.• nonlinear dynamics• open and closed systems• feedback loops• fractal structures• co-evolution• natural elements of human group behavior- exchange energy (competition to collaboration)- share information (limited to open and fully)- align choices for interaction (shallow to deep)- co-evolve (from on-the-fly to with-coordination)
42
Types of Complexity
Nonlinear dynamics small differences at the start may lead to vastly different results
- the butterfly effect
Open systems the boundaries permit interaction with the environment
Feedback loops a series of actions, each of which builds on the results of prior action and loops back in a circle to affect the original state
- amplifying and balancing feedback loops
43
Fractal structures nested parts of a system are shaped into the same pattern as the whole
- self-similarity- software design patterns may contain other
patterns…
Co-evolution continual interaction among complex systems; each system forms part of the environment for all other systems
- system of systems- simultaneous and continual change- species survive that are most capable of adapting to
their environment as it changes over time
44
A Vague Concept
Not always clear what “complexity” is measuring...
Characteristics include difficulty of implementing, testing, understanding, modifying, or maintaining a program.
E.J. Weyuker, “Evaluating Software Complexity Measures,” September 1988.
45
Potential Software Complexity Measures
Lines of code
Source lines of code
Number of functions
McCabe cyclomatic complexity• maximum of all functions• average over functions
Coupling and cohesion
46
Halstead’s software science• length• volume• level• mental discriminations
Oviedo’s data flow complexity
Chidamber and Kemerer’s object oriented measures
Knot measure- for a structured program, the knot measure is always 0
47
Fan-in, fan-out
Henry and Kafura’s measure depends onprocedure size and the flow of information into procedures and out of procedures.• length x (fan-in x fan-out)- S. Henry and D. Kafura, “The Evaluation of Software
Systems’ Structure Using Quantitative Software Metrics,” Software Practice and Experience, June 1984.
And so forth…
48
(Source) Lines of Code
LOC – total number of lines in a source code file, including comments, blank lines, etc.
- countable using the Unix wc utility
SLOC – any line of program text that is not a comment or blank line, regardless of the number of statements or fragments of statements on the line• includes program headers, declarations,
executable and non-executable statements
I. Herraiz and A.E. Hassan, “Beyond Lines of Code: Do We Need More Complexity Metrics?” Chapter 8 in Making Software: What Really Works, and Why We Believe It, A. Oram and G. Wilson (eds), 2011, pp. 125-141.
49
McCabe Cyclomatic Complexity
In the control flow graph for a procedure reachable from the main procedure containing• N nodes• E edges• p connected procedures- only procedures that are reachable from the main
procedure
V(G) = E – N + 2p
For structured programs, V(G) = # of decisions + 1
T. McCabe, “A Complexity Measure,” IEEE Transactions on Software Engineering, September 1976.
50
Recommended Ranges for Cyclomatic Complexity
V(G) should be less than 10• commonly accepted range
Some recommend less than 5
Some suggest that 10-20 should be classified as “challenging”
51
Halstead’s Software ScienceM.H. Halstead, Elements of Software Science, 1977.
N1 number of operators in a programN2 number of operands in a programη1 number of unique operators in a programη2 number of unique operands in a programη program vocabulary = η1 + η2N program length = N1 + N2
V program volume = N x log2 ηD difficulty = (η1 / 2) x (N2 / η2)lv level = 1 / DE effort (number of mental discriminations) = D x V
B number of delivered bugs = V / 300052
Halstead’s E
Halstead’s E is in terms of discriminations per second• Stroud number is 18 discriminations / second
To convert Halstead’s E to person months, one correction factor is
18 discriminations/sec * 60 sec/min * 60 min/hr * 8 hr/day * 17 day/mon
= 8,812,800 discriminations/month
“Halstead Time” is typically measured in minutes.
53
Weyuker’s Properties of Complexity Measures
E.J. Weyuker, “Evaluating Software Complexity Measures,” IEEE Transactions on Software Engineering, September 1988.
1) There exists programs P, Q, such that |P| ≠ |Q|.2) Let c be a nonnegative number. There are only finitely many
programs of complexity c.3) There are distinct programs P and Q such that |P| = |Q|.4) There exists P, Q, such that P is equivalent to Q and |P| ≠ |Q|.5) For every P, Q then |P| ≤ |P;Q| and |Q| ≤ |P;Q|.6) a) There exists P, Q, R such that |P| = |Q| and |P;R| ≠ |Q;R|.
b) There exists P, Q, R such that |P| = |Q| and |R;P| ≠ |R;Q|.7) There are P and Q such that Q is formed by permuting the order
of the statements of P and |P| ≠ |Q|.8) If P is a renaming of Q, then |P| = |Q|.9) There exists P, Q such that |P| + |Q| < |P;Q|.
54
Summary of Weyuker’s FindingsProperty LOC
(statement count)
Cyclomatic complexity
Halstead effort
Data flow complexity
1 Yes Yes Yes Yes2 Yes No Yes No3 Yes Yes Yes Yes4 Yes Yes Yes Yes5 Yes Yes No No6 No No Yes Yes7 No No No Yes8 Yes Yes Yes Yes9 No No Yes Yes
55
Scales and Legal Operations
56
Defects and Reliability
Defect prediction models• predict the number of defects in a module or
system• predict which modules are defect-prone
Reliability models• predict failures (usually mean-time-to-failure
MTTF)
Be wary of attempts to equate defect densities with failure rates!
57
Who Uses?
Defect prediction models are used during development.• by project management and the development
team• to focus effort on the parts of the system that
need the most attention• to understand the impact of selected
processes, techniques, and tools on quality
Reliability models can be used during testing to determine where the software is ready to release.• to understand the quality of the operational
software
58
Causal Factors for Defects
Difficulty of the problem
Complexity of designed solution
Programmer/analyst skill
Design methods and procedures used
N.E. Fenton and M. Neil, “A Critique of Software Defect Prediction Models,” IEEE Transactions on Software Engineering, September/October 1999.
59
Explanatory Variables for Predicting Defects
Size measures (LOC)
Complexity measures - McCabe cyclomatic complexity- Halstead software science: effort- count of procedures- Henry and Kafura’s Information Flow Complexity- Hall and Preisser’s Combined Network Complexity
OO structural measures (Chidamber and Kemerer)
Code churn measures- amount of change between releases
60
Process change and fault measures- experience- number of developers making changes- number of defects in previous releases- number of LOC added/changed/deleted
61
Techniques Used
Regression models• multicollinearity is a problem
Factor analysis / principal component analysis
Bayesian belief networks
Artificial neural networks
Capture-recapture
62
Bayesian Belief NetworksFenton and Neil (1999) concluded that Bayesian belief nets were the best solution.
- Explicit modeling of “ignorance” and uncertainty in estimates, as well as cause-effect relationships.
- Makes explicit those assumptions that were previously hidden - hence adds visibility and auditability to the decision-making process.
- Intuitive graphical format makes it easier to understand chains of complex and seemingly contradictory reasoning.
- Ability to forecast with missing data.- Use of “what-if?” analysis and forecasting of effects of
process changes.- Use of subjectively or objectively derived probability
distributions.- Rigorous, mathematical semantics for the model.
63
Capture-Recapture
Uses the overlap between the sets of defects found by different reviewers to estimate residual defects
Assumptions• reviewers work independently of each other• searching is performed before, and not during,
an inspection meeting
If the overlap is large, few defects are left to be detected.
If the overlap is small, many faults are undetected.
64
History of Capture-Recapture
First known use of capture–recapture was byLaplace (1786), who used it to estimate the population size of France
In biology, capture–recapture is used to estimate the population size of animals in an area
65
Lincoln-Petersen Method
N = M C / R
N – estimate of total population size
M – total number of animals captured and marked on the first visit
C – total number of animals captured on the second visit
R – number of animals captured on the first visit that were then recaptured on the second visit
66
Example
Capture 10 specimens on a first visit and mark them
Capture 15 specimens on a second visit • 5 are marked from the first visit
N = M C / R = (10) (15) / 5 = 30
67
Chapman Estimator
A less biased estimator for small samples
N = [(M + 1) (C + 1) / (R + 1)] – 1
var(N) = [(M + 1) (C + 1) (M – R) (C – R)] / [(R + 1) (R + 1) (R + 2)]
Example• N = [(10 + 1) (15 + 1) / (5 + 1)] -1 = 29.3• var(N) = (11*16*5*10) / (6*6*7) = 34.9• std(N) = sqrt(34.9) = 5.9
68
Capture-Recapture Models in Software Engineering
Basic model (M0) assumes that all faults are equally probable to be found and that allreviewers have equal abilities to find faults
Mh model – the probabilities of fault detection vary
Mt model – abilities of reviewers vary
Mth model – both the probabilities of fault detection and the abilities of reviewers vary
69
Capture-Recapture Estimators
M0• M0–ML – maximum likelihood (Otis, 1978)
Mt • Mt–ML – maximum likelihood (Otis, 1978)• Mt–Ch – Chao’s estimator (Chao, 1989)
Mh • Mh–JK – Jackknife (Burnham, 1978)• Mh–Ch – Chao’s estimator (Chao, 1987)
Mth • Mth–Ch – Chao’s estimator (Chao, 1992)
70
Challenges in Using Defect Prediction Models (Fenton, 1999)
Difficult to determine in advance the seriousness of a defect
Great variability in the way systems are used by different users, resulting in wide variations of operational profiles
Difficult to predict which defects are likely to lead to failures (or to commonly occurring failures)
- 33% of defects led to failures with a MTTF greater than 5,000 years
- proportion of defects which led to a MTTF of less than 50 years was around 2%
71
Software ReliabilityJ.D. Musa, A. Iannino, and K. Okumoto, Software Reliability: Measurement, Prediction, Application, 1987.
Probability of failure-free operation of a computer program for a specified time in a specified environment.
Reliability is defined with respect to time.• execution time• calendar time
Characterizing failure occurrences in time• time of failure• time interval between failures• cumulative failures experienced up to a given time• failures experienced in a time interval
72
The Random Nature of Failures
Mistakes by programmers, and hence the introduction of defects, is a complex, unpredictable process.
Conditions of execution of a program are generally unpredictable.
Failure behavior is affected by two principal factors• number of defects in the software being executed• execution environment or operational profile of
execution
73
Nonhomogenous Processes
A random process whose probability distribution varies with time is called nonhomogeneous.
Musa’s basic execution time model and logarithmic Poisson execution time model assume that failures occur as a (NHPP) nonhomogeneous Poisson process.
As the software is debugged, defects are removed changing failure profiles a nonhomogeneous process
74
Predicting Reliability
Stochastic reliability growth models can produce accurate predictions of the reliability of a software system providing that a reasonable amount of failure data can be collected for that system in representative operational use.
Unfortunately, this is of little help in those many circumstances when we need to make predictions before the software is operational.
N. Fenton and M. Neil, “Software Metrics: Successes, Failures, and New Directions,” The Journal of Systems and Software, July 1999.
75
What Is Quality Today?
A collection of quality attributes• some are relevant to a given project, some are not
Budget and schedules are important “quality” factors for custom software development
Failures are a critical quality attribute from the customer’s perspective• low failure rates are necessary but not sufficient
for customer satisfaction
It is important for developers to measure and understand defects to achieve the other quality factors.
76
Operationally Defining Software Quality
What quality attributes of the application are important to the customer and users?
How will we measure these attributes?
How will we verify they have been achieved?• are they testable?• must we simulate the operational environment?
How can we predict what they will be in operation?• operational profiles vary at any time and change
over time
77
78
Questions and Answers