QUALITY ASSURANCE - Northeastern University
• Software Assurance: The planned and systematic set of activities that ensures that software life cycle processes and
products conform to requirements, standards, and procedures.
• Software Quality: The discipline of software quality is a planned and systematic set of activities to ensure quality is
built into the software. It consists of software quality assurance, software quality control, and software quality engineering. As
an attribute, software quality is (1) the degree to which a system, component, or process meets specified requirements. (2)
The degree to which a system, component, or process meets customer or user needs or expectations [IEEE 610.12 IEEE
Standard Glossary of Software Engineering Terminology].
• Software Quality Assurance: The function of software quality that assures that the standards, processes, and
procedures are appropriate for the project and are correctly implemented.
• Software Quality Control: The function of software quality that checks that the project follows its standards,
processes, and procedures, and that the project produces the required internal and external (deliverable) products.
• Software Quality Engineering: The function of software quality that assures that quality is built into the software by
performing analyses, trade studies, and investigations on the requirements, design, code and verification processes and
results to assure that reliability, maintainability, and other quality factors are met.
• Software Reliability: The discipline of software assurance that 1) defines the requirements for software controlled
system fault/failure detection, isolation, and recovery; 2) reviews the software development processes and products for
software error prevention and/or controlled change to reduced functionality states; and 3) defines the process for measuring
and analyzing defects and defines/derives the reliability and maintainability factors.
• Verification: Confirmation by examination and provision of objective evidence that specified requirements have been
fulfilled [ISO/IEC 12207, Software life cycle processes]. In other words, verification ensures that “you built it right”.
• Validation: Confirmation by examination and provision of objective evidence that the particular requirements for a specific
intended use are fulfilled [ISO/IEC 12207, Software life cycle processes.] In other words, validation ensures that “you built
the right thing”.
Definitions According to NASA
From: http://www.hq.nasa.gov/office/codeq/software/umbrella_defs.htm
Technology Objective: Designing a quality system and writing quality software
√ The tech team aims to deliver a correctly behaving system to the client
Software Quality Assurance is about assessing if the system meets expectations
Доверяй, но проверяй
(Russian Proverb - Doveryay, no proveryay)
Trust, but verify
Software Quality Assurance
Validation
Are we building the right
product or service?
Verification
Are we building the
product or service right?
Validation Versus Verification
Both involve testing – done at every stage
but “testing can only show the presence of errors,
not their absence” Dijkstra
Product Trials, User Experience Evaluation
Validation
Typically a client-leaning activity
After all, they are the ones who asked for the
system
Optimist: It’s about showing
correctness/goodness
Pessimist: It’s about identifying defects
Verification
[Diagram: bad and good inputs flow into the System, which produces bad or good outputs; verification asks whether the outputs are correct]
Quality versus Reliability
Quality Assurance
Assessing whether a software component or system produces the expected/correct/accepted behavior or output for a given set of inputs
OR
Assessing features of the
software
Reliability
Probability of failure-free
software operation for a
specified duration in a
particular environment
Cool phrases
Five 9’s (99.999% availability, roughly five minutes of down-time per year)
No down-time
The First "Computer Bug". Moth found trapped between points at Relay # 70, Panel F, of the Mark II Aiken Relay Calculator while it was being tested at Harvard University, 9 September 1947.
The operators affixed the moth to the computer log, with the entry: "First actual case of bug being found". They put out the word that they had "debugged" the machine, thus introducing the term "debugging a computer program".
In 1988, the log, with the moth still taped by the entry, was in the Naval Surface Warfare Center Computer Museum at Dahlgren, Virginia. The log is now housed at the Smithsonian Institution’s National Museum of American History, who have corrected the date from 1945 to 1947. Courtesy of the Naval Surface Warfare Center, Dahlgren, VA., 1988. NHHC Photograph Collection, NH 96566-KN (Color).
Fun Story – First Computer Bug (1947)
From https://www.facebook.com/navalhistory/photos/a.77106563343.78834.76845133343/10153057920928344/
Other factors include
Quality of the Process
Quality of the Team
Quality of the Environment
Testing is Computationally Hard
The space is huge and it is generally infeasible to test anything completely
Assessing quality is an exercise in establishing confidence in a system
Or Minimizing Risks
[Diagram: a layered stack: App1 on OS1 inside a VM, on a Host OS, on Hardware]
Each layer introduces risk
• Component behavior
• Interactions between components
• System and sub-system behavior
• Interactions between sub-systems
• Negative path
• Behavior under load
• Behavior over time
• Usability
Lots to Consider
Static Evaluations
Making judgments
without executing the
code
Dynamic Evaluations
Involves executing the
code and judging
performance
Two Approaches
Often a formal process
Value: finding issues at design/definition time rather than waiting for results of the step to
complete
Highly effective, but does not replace the need for dynamic
techniques
Static Technique - Reviews
Fundamental QA Technique
Peer(s) reviews artifact for correctness and clarity
Requirements
Architecture & Design
Implementation
Test Plans
• Single reviewer model – usually a “certified” / senior person
• Panel model – highly structured reviews
– Can take significant preparation
• Usually done at the design or development stage
• May introduce delay between when code is written and when it gets reviewed
One Extreme: Jury/Peer Reviews
Before anything is accepted, someone other than
the creator must review it and approve it
Review Meeting
Value
Second opinion on clarity, effectiveness, and efficiency
Learning from others
Avoids “board blindness” on seeing flaws
Peer pressure to be neat and tie up loose ends
Reviews
Models exist for either the reviewer or the author to lead the discussion
Author usually provides participants materials to study in advance
Requires positive and open attitudes and preparation
[Diagram: review meeting roles: Author, Moderator, Scribe, and a Review Panel drawn from Peers, Experts, and Client(s)]
Lightweight Peer Reviews
One person drives while the other watches/reviews
Derived from Extreme Programming, current favorite in
agile
When compared to solo dev models, pairing MAY incur a higher initial cost per module created (time and resources), BUT yields higher quality and lower overall cost
Paired Programming
Continuous review
Shared problem solving
Better communications
Learning from Peer
Social!
Peer Pressure
See as an example: http://collaboration.csc.ncsu.edu/laurie/Papers/XPSardinia.PDF
Clarity
Can the reader easily and directly understand what the artifact is doing
Correctness
Analysis of algorithm used
Common Code Faults
1. Data initialization, value ranges and type mismatches
2. Control: are all the branches really necessary (are the conditions properly and efficiently organized)? Do loops terminate?
3. Input: are all parameters or collected values used?
4. Output: is every output assigned a value?
5. Interface faults: Parameter numbers, types, and order; structures and shared memory
6. Storage management: memory allocation, garbage collection, inefficient memory access
7. Exception handling: what can go wrong, what error conditions are defined and how are they handled
What do reviews look for?
List adapted from W. Arms: http://www.cs.cornell.edu/Courses/cs5150/2015fa/slides/H2-testing.pdf
You are asked to sort an array. There are many algorithms to sort an array. [You aren’t going to use a library function so you have to write this]
Many choices exist. Suppose you are deciding between bubble sort, quicksort, and merge sort. All will work (sort an array), but which will be the better code?
Examples
Bubble sort is very easy to write: two loops. Slow on average, O(n²) – how big will n be? Sorts in place, so O(1) extra memory.
Quicksort is complicated to write. O(n log n) on average, O(n²) worst case. Requires O(log n) auxiliary stack space on average. Very effective on in-memory data. Most implementations are very fast.
Mergesort is moderate to write. O(n log n) worst case. Memory required is a function of the data structure (typically O(n) auxiliary). Very effective on data that requires external access.
bool SquareRoot (double dValue,
                 double &dSquareRoot)
{
    bool bRetValue = false;
    if (dValue < 0) {
        dSquareRoot = 0.0;              // no real root; leave a defined value
        bRetValue = false;
    }
    else {
        dSquareRoot = pow(dValue, 0.5); // requires <cmath>
        bRetValue = true;
    }
    return bRetValue;
}
bool SquareRoot (double dValue,
                 double &dSquareRoot)
{
    dSquareRoot = 0.0;                  // defined value even on the failure path
    if (dValue < 0)
        return false;
    dSquareRoot = pow(dValue, 0.5);     // requires <cmath>
    return true;
}
Expressively Logical…
Evaluate code modules automatically looking for errors or odd things
Loops or programs with multiple exits (more common) or entries (less common)
Undeclared, uninitialized, or unused variables
Unused functions/procedures, parameter mismatches
Unassigned pointers
Memory leaks
Show paths through code/system
Show how outputs depend on inputs
Static Program Analyzers
1. Write SIMPLE code
2. If code is difficult to read,
RE-WRITE IT
3. Test implicit assumptions
– Check all parameters
passed in from other
modules
4. Eliminate all compiler
warnings from code
5. It never hurts to check
system states after
modification
Rules of Defensive Programming
(taken from Bill Arms)
Based on Murphy’s Law:
Anything that can go wrong, will
Quick Terminology
• Mistake – a human action that results in an incorrect result
• Fault / Defect – an incorrect step, process, or data within the software
• Failure – inability of the software to perform within performance criteria
• Error – the difference between the observed and the expected value or behavior
Objective
Write test cases and organize
them into suites that cause failure
and illuminate faults
Ideally you will fail in striving for
this objective,
but you will be surprised how
successful you may be
Dynamic Evaluations
Developers – good for exposing known risk areas
Experienced Outsiders and Clients – good for finding gaps missed by developers
Inexperienced Users – good for finding other errors
Mother Nature – always finds the hidden flaw
Who is a Tester?
1. Top Down
– System flows are tested
– Units are stubbed
2. Bottom Up
– Each unit is tested on its own
3. Stress – Test at or past design limits
Approaches
Especially useful in:
• UIs, UX
• Workflows
• Very large systems
Testing Flow
(Dynamic Evaluation)
[Diagram: flow from Unit Test through Integration, Functional, Performance, Installation, and Soak / Operational Readiness to Acceptance, spanning Unit, System Test, Operational Test, and Client Test stages]
• No access to the internal workings of the system under test (SUT)
• Testing against specifications– The tester knows what the
SUT’s I/O or behavior should be
• The tester observes the results or behavior
• With software, this tests the interface
→What is input to the system?
→What you can do from the outside to change the system?
→What is output from the system?
Black Box Testing
White Box Testing
• Have access to the internal workings of the system under test (SUT)
• Testing against specifications, with access to algorithms, data structures, and messaging.
• The tester observes the results or behavior
• Testing evaluates logical paths through code– Conditionals
– Loops
– Branches
• Impossible to exercise all paths completely, so you make compromises
– Focus on only the important paths (keeping components small is a big help here)
– Focus on only the important data structures
Tests focus on an individual component
1. Interfaces
2. Messages
3. Shared memory
4. Internal functions
Emphasizes adherence to the specifications
Code bases often include the code and the unit tests as a coherent piece
Usually done by developers building the component
Unit tests decouple the
developer from the code
Individual code ownership is not
required if unit tests protect the
code
Unit tests enable
refactoring
After each small change, the unit
tests can verify that a change in
structure did not introduce a
change in functionality
Ground Floor – Unit Testing
What Makes for a Good Test
Test Perspective
• Either addresses a partition of inputs or tests for common developer errors
• Automated
• Runs Fast – To encourage frequent use
• Small in scope – Test one thing at a time
• When a failure occurs, it should pinpoint the issue and not require much debugging
– Failure messages help make the issue clear
– Should not have to refer to the test to understand the issue
Tester Perspective
Know why the test exists
– Should target finding specific
problems
– Should optimize the cost of
defining and running the test
against the likelihood of
finding a fault/failure
Organizing Testing
Test Plan
Describes test activities
1. Scope
2. Approach
3. Resources
4. Schedule
Identifies
• What is to be tested
• The tasks required to do the testing
• Who will do each task
• The test environment
• The test design techniques
• Entry and exit criteria to be used
• Risk identification and contingency planning
Test Suite
A set of test cases and scripts to
measure answers
The postcondition of one test is often used as the precondition for the next one
OR
Tests may be executed in any order
Adapted from http://sqa.stackexchange.com/questions/9119/test-suite-vs-test-plan
An assessment of a defect’s impact
Can be a major source of contention between dev and test
Defect Severity
Critical Show stopper. The functionality cannot be delivered unless that defect is cleared. It does not have a workaround.
Major Major flaw in functionality but it still can be released. There is a workaround; but it is not obvious and is difficult.
Minor Affects minor functionality or non-critical data. There is an easy workaround.
Trivial Does not affect functionality or data. It does not even need a workaround. It does not impact productivity or efficiency. It is merely an inconvenience.
1. Document Purpose – short description of the objective
2. Application Overview – overview of the SUT
3. Testing Scope – describes the functions/modules in and out of scope for testing; also identifies what was omitted
4. Metrics – results of testing, including summaries
– Number of test cases planned vs executed
– Number of test cases passed/failed
– Number of defects identified and their status & severity
– Distribution of defects
5. Types of Testing Performed – description of tests run
6. Test Environment and Tools – description of the environment; helpful for recreating issues and understanding context
7. Recommendations – workaround options
8. Exit Criteria – statement of whether the SUT passes or not
9. Conclusion / Sign Off – go / no-go recommendation
Test Exit Report – Input to Go/No Go Decision
• If a single value, try
– Negative values
– Alternate types
– Very small or very large inputs (overflow buffers if you can)
– Null values
• If input is a sequence, try
– Using a single-valued sequence
– Repeated values
– Varying the length of sequences and the order of the data
– Forcing situations where the first, last, and middle values are used
• Try to force each and every error message
• Try to force computational overflows or underflows
Testing Hint #1 – Mess With Inputs
Each logical path (each execution path through the code) must be exercised at least once
• If…then…else = two paths
• Switch…case() = one path per case, plus one path if no catch-all case
• Repeat…Until ≥ two paths
• While…Do ≥ two paths
• Object member functions = one path per signature
Testing Hint #2 – Force Every Path
• Remember, interfaces may involve
1. References to data or functions
• Data may be passed by reference or by value
• Methods only have data interfaces
2. Shared memory
3. Messages
• Set interface parameters to extremely low and high values
• Set pointer values to NULL
• Mis-type the parameters or violate value boundaries – e.g. set input as negative where the signature expects ≥ 0
• Call the component so it will fail and check the failure reactions
• Pass too few or too many parameters
• Bombard the interface with messages
• With shared memory, vary accessor instantiation and access activities
Testing Hint #3 – Mess With Interfaces
Internals
1. Functions
2. Data
Try to break the system by using data with
extreme values to crash the system
Testing Hint #4 – Be Diabolical
• If unit testing is not thorough, all subsequent testing will likely be a waste of time.
• You should always take the time to do a good job with unit testing – even when the project is falling behind
• The end of a project is almost always compressed – developers often defer testing-related tasks until as late as possible
• Unit tests will be most needed
when you have the least
amount of time
– Unit tests should be created
before they are needed, not
when you need them
Life Lessons
Like Unit Test, activities focus
on following uses and data
1. Typical
2. Boundaries
3. Outliers
4. Failures
Unlike Unit Test
• Components may come from many, independent parties
• Bespoke development may meet Off-The-Shelf or reused components
• Testing becomes a group activity
• Testing may move to an independent team altogether
System Test
Integrating components and sub-systems to create the system
Testing checks on component compatibility, interactions, correctly passing information, and timing
Some behavior is only clear when you put
components together
This has to be tested too,
although it can be very hard to plan in advance!
Unlike Components, Systems Have
Emergent Behavior
Integrating Multiple Parties May Introduce
Conflict
System Integration
• Components may come from
multiple, possibly independent,
parties
• Bespoke development may
meet Off-The-Shelf or reused
components
• Testing becomes a group
activity
• Testing may move to an
independent team altogether
Implications
• Who controls integration readiness?
– What does lab entry mean?
– Are COTS components trusted?
• How to assign credit for test results, and who is responsible for repairs?
– How to maintain momentum when everyone isn’t at the table?
– When partner priorities are not shared?
– What about open source?
Use Cases are a useful testing model
• Forces components to interact
• Sequence diagrams form a strong basis for designing these tests
– Articulates the inputs required and the expected behaviors and outputs
Testing Focus
Emphasizes component compatibility, interactions, correctly passing information, and timing
Integration aims to find misunderstandings one component introduces when it interacts with other components
Two senses
1. Create tests incrementally
2. Run tests iteratively
a. On check-in and branch merge, test all affected modules
b. On check-in, test all modules
c. Per a schedule, test all modules (e.g. daily)
Each change, especially after a bug fix, should mean adding at least one new test case
It is always best to test after each change as completely as you can, and completely before a release
Iterative Development Leads to Iterative Testing
Regression Testing
[Chart quadrants: one quadrant shows many defects found; the other three show few defects found]
Picking the Subset
Selection based on company policy
Every statement must be executed at least once
Every path must be exercised
Crafted by specific end user use cases (scenario testing)
Selection based on testing team experience
Your testing is good enough until a problem
shows that it is not good enough
It is hard to know when you should feel enough confidence to release the system
Confidence comes, in part, from the subset of possible tests selected
[Chart axes: Software Quality (low to high) vs Test Quality (low to high)]
bugDensity_release(i) = (bugs_pre-release(i) + bugs_post-release(i)) / codeMeasure
If density for the next release’s additional code is within ranges of prior releases, it is a candidate for
release
Unless test or development practices have improved
Measuring Quality: Defect Density
Using the past to estimate the future
Judges code stability by comparing past number of bugs per code measure (lines of code, number of modules,…) to
present measured levels
[Chart: defect density by release, with Release 1 at about 7 and Release 2 at about 9.5 defects per code measure, plotted against bands for Expected Quality, Poor Software Quality, and Poor Test Coverage/Quality]
Using a known quantity as inference to the unknown
Judges code stability by intentionally inserting bugs into a program and then measuring how many get found as an estimator for the
actual number of bugs
bugs_release(i) = (seededBugs_planted(i) / seededBugs_found(i)) × bugs_found(i)
Challenges
1. Seeding is not easy. Placing right kinds of bugs in enough of the code is hard.
– Bad seeding, being too easy or too hard to find, creates false senses of confidence in your reviews and testing
• Too easy: doesn’t mean that most or all of the real bugs were found.
• Too hard: danger of looking past the “Goodenov” (good enough) line or for things that aren’t there
2. Seeded code must be cleansed of any missed seeds before release. Post clean-up, the code must be tested to ensure nothing got accidentally broken.
Measuring Quality: Defect Seeding
Applies estimating technique used in predicting wild-life populations (Humphrey, Introduction to Team Software Process, Addison
Wesley, 2000)
Uses data collected by two or more independent collectors
Collected via reviews or tests
Example: Estimating Turtle Population
You tag 5 turtles and release them.
You later catch 10 turtles, two have tags.
5 tagged turtles / Total # of turtles ≈ 2 tagged turtles / 10 turtles caught
Total # of turtles = (10 turtles × 5 turtles) / 2 turtles = 25 turtles
Measuring Quality: Capture-Recapture
Each collector finds some defects out of the total number of defects
Some of these defects found will overlap
Method
1. Count the number of defects found by each collector (A, B)
2. Count the number of intersecting defects found by each collector (C)
3. Calculate defects found = (A+B) - C
4. Estimate total defects = (A × B) / C
5. Estimate remaining defects = (A × B) / C − ((A + B) − C)
If multiple collectors, assign A to the highest collected number and set B to the rest of the collected defects. When multiple engineers find the same defect, count it just once.
Capture-Recapture
Performance
Aims to assess compliance with
non-functional requirements
Stress
Identify defects that emerge only under load
Performance Testing
Measures the system’s capacity to process load
Involves creating and executing an operational profile that reflects expected usage
Endurance
Measures reliability and availability
Ideally the system should degrade gracefully rather than collapse under load
Under load, other issues like protocol overhead or timing issues take center stage