CS5103 Software Engineering
Static bug detection
Static bug detection is a minor approach to software quality assurance compared with testing.
Compared to testing, static bug detection:
Works only for specific kinds of bugs
Is sometimes not scalable
Generates false positives
Is easy to start (no build, no setup, no install …)
Can sometimes guarantee that the software is free of certain kinds of bugs
Needs no debugging (a warning already points to the suspicious code)
State of the art: static bug detection
Type-specific detection (the specification is fixed and given; the techniques improve detection) for major or important types of bugs:
Null pointer, memory leak, unsafe cast, injection, buffer overflow, dynamic SQL error, race condition, deadlock, infinite loop, HTML error, UI inconsistency, i18n bugs, …
A large number of techniques exist for each kind of bug; most of them have severe limitations that prevent practical use
Specification-based detection: model checking, symbolic execution, theorem proving
Specification
A description of the correct behavior of software
Static bug detection requires a formal specification
Three main types of specifications:
Value
Temporal
Data Flow
Value Specification
The value(s) of one or more variables must satisfy a certain constraint
Example: Final Exam Score <= 100
sortedlist(0) >= sortedlist(1)
http_url.startsWith("http")
Sql_query belongs to Language_SQL
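To make "formal" concrete, here is a minimal sketch that writes three of the value specifications above as Java predicates. The names finalExamScore, sortedList and httpUrl are hypothetical stand-ins for the slide's variables. Evaluating a predicate on one value is only a test; a static checker instead tries to show that the predicate holds for every value the program can produce.

```java
import java.util.List;

public class ValueSpecs {
    // Final Exam Score <= 100
    static boolean scoreSpec(int finalExamScore) {
        return finalExamScore <= 100;
    }

    // sortedlist(0) >= sortedlist(1): the first element is at least the second
    static boolean sortedSpec(List<Integer> sortedList) {
        return sortedList.size() < 2 || sortedList.get(0) >= sortedList.get(1);
    }

    // http_url.startsWith("http")
    static boolean urlSpec(String httpUrl) {
        return httpUrl.startsWith("http");
    }

    public static void main(String[] args) {
        System.out.println(scoreSpec(95));                  // true
        System.out.println(sortedSpec(List.of(3, 7)));      // false
        System.out.println(urlSpec("https://example.com")); // true
    }
}
```

The SQL example is left out because deciding membership in Language_SQL needs a parser, not a one-line predicate.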
Temporal Specification
Two events (or a series of events) must happen in a certain order
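Examples: lock() -> unlock()
file.open() -> file.close() and file.open() -> file.read()
They are different, right? The first requires every open to be eventually followed by a close; the second requires every read to be preceded by an open.
Temporal logic: lock() -> F(unlock())
(!read()) U (open())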
Data Flow Specification
Data from a certain source must / must not flow to a certain sink
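Examples: ! Contact Info -> Internet (contact information must not flow to the Internet)
Password -> encryption -> Internet (a password may reach the Internet only through encryption)
Data flow specifications are mainly used for security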
General Specifications
Common behaviors expected of all software (specification, and the bug it rules out):
a/b -> b != 0                                   Divide by 0
a.field -> a != null                            Null pointer reference
a[x] -> x < a.length()                          Buffer overflow
p.malloc() -> p.free()                          Memory leak
lock(s) -> unlock(s)                            Deadlock
while(Condition) -> F(!Condition)               Infinite loop
<script> xxx </script> -> ! User_input -> xxx   XSS
! Hard-coded string -> User Interface           I18n error
Checking Specifications: Basic Ways
Value specifications: symbolic execution
Temporal specifications: model checking
Data flow specifications: graph traversal (data dependence graph)
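Static symbolic execution
Basic example
  y = read();
  y = 2 * y;
  if (y <= 12) y = 3; else y = y + 1;
  print("OK");
s is a symbolic variable for the input. Each statement is annotated as "condition (state)": the condition, starting from T (true), is the condition for the statement to be executed, and the state, e.g. (y = s), is the relationship of all variables to the inputs after the statement is executed.
  y = read();        T (y = s)
  y = 2 * y;         T (y = 2*s)
  if (y <= 12)       T (y = 2*s)
    y = 3;           T ∧ 2*s <= 12 (y = 3)
  else y = y + 1;    T ∧ !(2*s <= 12) (y = 2*s + 1)
  print("OK");       T ∧ 2*s <= 12 (y = 3) | T ∧ !(2*s <= 12) (y = 2*s + 1)
Prove y > 0? Conjoin the negation y <= 0 with each path that reaches print:
  (2*s <= 12) ∧ (y = 3) ∧ (y <= 0)             Not satisfiable
  !(2*s <= 12) ∧ (y = 2*s + 1) ∧ (y <= 0)      Not satisfiable
Both conjunctions are unsatisfiable, so y > 0 holds whenever print is reached.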
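The steps of the basic example can be mechanized. Below is a toy symbolic executor for exactly that program. It is a sketch under strong assumptions that are not from the slides: there is a single symbolic input s over Java int values, y is always an affine expression a*s + b, every path condition is an interval lo <= s <= hi, and arithmetic does not overflow (the slide's math treats y as a mathematical integer). Real tools hand general constraints to an SMT solver instead. The class name SymbolicSketch and its methods are hypothetical, and the code needs Java 16+ for records.

```java
import java.util.ArrayList;
import java.util.List;

// Toy symbolic execution of: y = read(); y = 2 * y; if (y <= 12) y = 3; else y = y + 1; print("OK");
public class SymbolicSketch {
    // Symbolic value a*s + b.
    record Affine(long a, long b) {
        long at(long s) { return a * s + b; }
    }

    // One path: path condition lo <= s <= hi, plus the symbolic value of y.
    record Path(long lo, long hi, Affine y) {
        boolean feasible() { return lo <= hi; }
    }

    // Split a path on the branch "y <= c"; assumes y = a*s + b with a > 0.
    static Path[] branchLessEq(Path p, long c) {
        long bound = Math.floorDiv(c - p.y().b(), p.y().a()); // a*s + b <= c  <=>  s <= bound
        Path thenPath = new Path(p.lo(), Math.min(p.hi(), bound), p.y());
        Path elsePath = new Path(Math.max(p.lo(), bound + 1), p.hi(), p.y());
        return new Path[] {thenPath, elsePath};
    }

    // Returns the feasible paths that reach print("OK").
    static List<Path> pathsAtPrint() {
        Path p = new Path(Integer.MIN_VALUE, Integer.MAX_VALUE, new Affine(1, 0)); // T (y = s)
        p = new Path(p.lo(), p.hi(), new Affine(2 * p.y().a(), 2 * p.y().b()));   // T (y = 2*s)
        Path[] br = branchLessEq(p, 12);
        Path thenPath = new Path(br[0].lo(), br[0].hi(), new Affine(0, 3));        // y = 3
        Path elsePath = new Path(br[1].lo(), br[1].hi(),
                new Affine(br[1].y().a(), br[1].y().b() + 1));                     // y = 2*s + 1
        List<Path> result = new ArrayList<>();
        for (Path q : List.of(thenPath, elsePath)) {
            if (q.feasible()) result.add(q);
        }
        return result;
    }

    // Is "path condition && y <= 0" satisfiable? An affine y is smallest at an end of the interval.
    static boolean canViolate(Path p) {
        return p.feasible() && Math.min(p.y().at(p.lo()), p.y().at(p.hi())) <= 0;
    }

    public static void main(String[] args) {
        boolean proved = true;
        for (Path q : pathsAtPrint()) {
            System.out.printf("%d <= s <= %d, y = %d*s + %d, y <= 0 satisfiable: %b%n",
                    q.lo(), q.hi(), q.y().a(), q.y().b(), canViolate(q));
            proved &= !canViolate(q);
        }
        System.out.println("y > 0 proved: " + proved); // y > 0 proved: true
    }
}
```

The two printed paths are the slide's two cases: s <= 6 with y = 3, and s >= 7 with y = 2*s + 1.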
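Static symbolic execution
Complex example
  y = read();
  p = 1;
  while (y < 10) {
    y = y + 1;
    if (y > 2) p = p + 1;
    else p = p + 2;
  }
  print(p);
Annotations for the first loop iterations (s is a symbolic variable for the input):
  y = read();            T (y = s)
  p = 1;                 T (p = 1, y = s)
  while (y < 10)         T (p = 1, y = s)
  y = y + 1;             T ∧ s < 10 (y = s + 1, p = 1)
  p = p + 1;             T ∧ s < 10 ∧ s + 1 > 2 (y = s + 1, p = 2)
  p = p + 2;             T ∧ s < 10 ∧ s + 1 <= 2 (y = s + 1, p = 3)
  y = y + 1; (2nd time)  T ∧ 2 < s + 1 < 10 (y = s + 2, p = 2) | T ∧ s + 1 <= 2 (y = s + 2, p = 3)
  …
Prove p > 0? The number of loop iterations depends on s, so each further iteration adds new paths and path-by-path enumeration does not terminate for unbounded s.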
Model Checking
Basic idea: transform the program into an automaton
Program states are the states of the automaton, and statements are transitions (edges)
Temporal properties are checked on the automaton by traversing it
Model Checking: Model Building
Basic approach: use the control flow graph (CFG): view all program states after a statement as ONE state
Use abstract states: view all program states after a statement with the same abstract values as ONE state
Use concrete values: view all program states after a statement with the same concrete values as ONE state; usually impossible, because the number of concrete states is enormous or unbounded
An example with a CFG model: checking whether the file is closed in all cases
boolean load() {
    f.open();
    line = f.read();
    while (line != null) {
        if (line.contains("key")) {
            f.close();
            return true;
        } else if (line.contains("value")) {
            f.close();
        }
        line = f.read();
    }
    return false;
}
(Figure: CFG model of load(), with states Start, opened, new line read, != null, key, value, none, == null, closed, closed, ret, and the label "f is not open")
An example with a CFG model: traversing the model to find counterexamples
(Figure: the CFG model of load() from the previous example)
An example with a CFG model: read must happen before close
(Figure: the CFG model of load() from the previous example)
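The traversal in these examples can be sketched as a small explicit-state model checker. This is an illustration under assumptions: the node names are our own labels for the figure's states, the file is abstracted to NEW/OPEN/CLOSED (the "abstract states" model), and a breadth-first search over (node, file state) pairs reports two kinds of counterexample. The first is reaching ret with the file still open. The second is a read while the file is not open. The class and method names are hypothetical, and the code needs Java 16+ (records, switch expressions).

```java
import java.util.*;

public class CfgModelCheck {
    // Hypothetical labels for the CFG nodes of load().
    enum Node { START, OPENED, LINE_READ, NOT_NULL, KEY, VALUE, NONE, AFTER_IF, CLOSED_KEY, RET }
    enum Event { OPEN, READ, CLOSE, NOP }
    // Abstract state of f: each concrete program state is mapped to one of these values.
    enum FileState { NEW, OPEN, CLOSED }

    record Edge(Node to, Event event) {}
    record State(Node node, FileState file) {}

    static final Map<Node, List<Edge>> CFG = new EnumMap<>(Node.class);

    static void edge(Node from, Event event, Node to) {
        CFG.computeIfAbsent(from, k -> new ArrayList<>()).add(new Edge(to, event));
    }

    static {
        edge(Node.START, Event.OPEN, Node.OPENED);        // f.open()
        edge(Node.OPENED, Event.READ, Node.LINE_READ);    // line = f.read()
        edge(Node.LINE_READ, Event.NOP, Node.NOT_NULL);   // line != null
        edge(Node.LINE_READ, Event.NOP, Node.RET);        // line == null: return false
        edge(Node.NOT_NULL, Event.NOP, Node.KEY);
        edge(Node.NOT_NULL, Event.NOP, Node.VALUE);
        edge(Node.NOT_NULL, Event.NOP, Node.NONE);
        edge(Node.KEY, Event.CLOSE, Node.CLOSED_KEY);     // f.close()
        edge(Node.CLOSED_KEY, Event.NOP, Node.RET);       // return true
        edge(Node.VALUE, Event.CLOSE, Node.AFTER_IF);     // f.close()
        edge(Node.NONE, Event.NOP, Node.AFTER_IF);
        edge(Node.AFTER_IF, Event.READ, Node.LINE_READ);  // line = f.read() at the end of the loop body
    }

    static FileState apply(FileState s, Event e) {
        return switch (e) {
            case OPEN -> FileState.OPEN;
            case CLOSE -> FileState.CLOSED;
            case READ, NOP -> s;
        };
    }

    // Rebuild the path from START to s using the BFS parent links (START maps to null).
    static List<State> pathTo(State s, Map<State, State> parent) {
        List<State> path = new ArrayList<>();
        for (State cur = s; cur != null; cur = parent.get(cur)) path.add(cur);
        Collections.reverse(path);
        return path;
    }

    // Breadth-first search over (node, file state); returns one counterexample path per violation found.
    static List<List<State>> counterexamples() {
        State init = new State(Node.START, FileState.NEW);
        Map<State, State> parent = new HashMap<>();
        parent.put(init, null);
        Deque<State> queue = new ArrayDeque<>(List.of(init));
        List<List<State>> found = new ArrayList<>();
        while (!queue.isEmpty()) {
            State cur = queue.poll();
            if (cur.node() == Node.RET && cur.file() == FileState.OPEN) {
                found.add(pathTo(cur, parent));               // spec 1: f still open at return
            }
            for (Edge e : CFG.getOrDefault(cur.node(), List.of())) {
                if (e.event() == Event.READ && cur.file() != FileState.OPEN) {
                    List<State> bad = pathTo(cur, parent);
                    bad.add(new State(e.to(), cur.file()));
                    found.add(bad);                           // spec 2: read on a file that is not open
                    continue;
                }
                State next = new State(e.to(), apply(cur.file(), e.event()));
                if (!parent.containsKey(next)) {
                    parent.put(next, cur);
                    queue.add(next);
                }
            }
        }
        return found;
    }

    public static void main(String[] args) {
        for (List<State> path : counterexamples()) System.out.println("Counterexample: " + path);
    }
}
```

Two counterexamples are reported. In the first, the first read returns null and load() returns with f still open. In the second, the "value" branch closes f and the loop then reads from it again.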
Temporal Logic
The basic idea of model checking is to find a path in the model that violates the specification
Temporal logic describes the sequential relationship among a number of events (the specification), so that any specification can be read by a generic path-finding tool
There is no need to write a separate path-finding tool for each proof
Usage of Temporal Logic
Describe the sequential relationship among a number of events
U (until): P U Q means that P has to be true until Q is true
!read(f) U open(f)
!close(f) U open(f)
F (future): F P means that P will be true at some time in the future
open(f) -> F close(f)
close(f) -> !F read(f)
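The meaning of U and F can be made concrete on a single finite event trace. The sketch below is an illustration, not a model checker: it evaluates the formulas on one recorded sequence of event names (hypothetical strings such as "open", "read", "close"), treats F as including the current position, reads U in its strong form (Q must eventually occur), and checks each implication at every position. A model checker evaluates the same formulas on every path of the model. The class and method names are hypothetical.

```java
import java.util.List;
import java.util.function.Predicate;

public class TemporalTraceCheck {
    // F p at position i: p holds at some position j >= i.
    static boolean eventually(List<String> trace, int i, Predicate<String> p) {
        for (int j = i; j < trace.size(); j++) {
            if (p.test(trace.get(j))) return true;
        }
        return false;
    }

    // p U q at position i: q holds at some j >= i, and p holds at every position in [i, j).
    static boolean until(List<String> trace, int i, Predicate<String> p, Predicate<String> q) {
        for (int j = i; j < trace.size(); j++) {
            if (q.test(trace.get(j))) return true;
            if (!p.test(trace.get(j))) return false;
        }
        return false;
    }

    // At every position: trigger -> F response, e.g. open(f) -> F close(f).
    static boolean alwaysFollowedBy(List<String> trace, String trigger, String response) {
        for (int i = 0; i < trace.size(); i++) {
            if (trace.get(i).equals(trigger) && !eventually(trace, i, response::equals)) return false;
        }
        return true;
    }

    // At every position: trigger -> !F forbidden, e.g. close(f) -> !F read(f).
    static boolean neverAfter(List<String> trace, String trigger, String forbidden) {
        for (int i = 0; i < trace.size(); i++) {
            if (trace.get(i).equals(trigger) && eventually(trace, i, forbidden::equals)) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> bad = List.of("read", "open", "close", "read");
        System.out.println(until(bad, 0, e -> !e.equals("read"), "open"::equals)); // false: read before open
        System.out.println(alwaysFollowedBy(bad, "open", "close"));                // true
        System.out.println(neverAfter(bad, "close", "read"));                      // false: read after close
    }
}
```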
Checking Specifications: Basic Ways
Value specifications: symbolic execution, abstract interpretation
Temporal specifications: model checking
Data flow specifications: graph traversal (data dependence graph)
Some simple checks with graph traversal
Check whether x flows to w
Check (!z used as divisor) U (z is written)
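A data flow check such as "x flows to w", or "Password -> encryption -> Internet", comes down to reachability in a data dependence graph. Here is a minimal sketch using breadth-first search. The slide's graph is not available, so the graph and its node names (x, y, w, password, encrypt, debugLog, internet) are hypothetical. To check the password specification, the search looks for a path to the sink that avoids the encryption node; finding one is a violation.

```java
import java.util.*;

public class DataFlowCheck {
    // Hypothetical data dependence graph: an edge u -> v means the value of u flows into v.
    static Map<String, List<String>> sampleGraph() {
        return Map.of(
                "x", List.of("y"),
                "y", List.of("w"),
                "password", List.of("encrypt", "debugLog"),
                "encrypt", List.of("internet"),
                "debugLog", List.of("internet"));
    }

    // Breadth-first search: can data flow from source to sink without passing through a blocked node?
    static boolean flows(Map<String, List<String>> graph, String source, String sink, Set<String> blocked) {
        Deque<String> queue = new ArrayDeque<>(List.of(source));
        Set<String> seen = new HashSet<>(List.of(source));
        while (!queue.isEmpty()) {
            String n = queue.poll();
            if (n.equals(sink)) return true;
            for (String m : graph.getOrDefault(n, List.of())) {
                if (!blocked.contains(m) && seen.add(m)) queue.add(m);
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, List<String>> g = sampleGraph();
        System.out.println(flows(g, "x", "w", Set.of()));                        // true: x flows to w
        System.out.println(flows(g, "password", "internet", Set.of("encrypt"))); // true: leaks via debugLog
    }
}
```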
Problems of static bug detection
Lack of specifications: project-specific formal specifications are very rare
Solutions: general specifications (for typical bugs); mining specifications (for API-specific and project-specific specifications)
False positives vs. efficiency: more sensitivity -> higher cost
Path sensitivity is rarely achieved
Combining all sensitivities -> incomputable problems
State of practice: static bug detection
Findbugs: a tool developed by researchers at UMD (University of Maryland)
Widely used in industry for checking code before commit
The idea actually comes from Lint
Lint: a code-style enforcing tool for the C language
Finds bad coding styles and raises warnings: bad naming, hard-coded strings, …
Idea: do it the other way around
Most static bug detection tools:
Set up a specification (either from users or a well-defined one), e.g., a divisor should not be 0, a null pointer should not be dereferenced, the salary of a person cannot be negative
Check all possible cases to guarantee that the specification holds
Otherwise, provide counterexamples
Findbugs: detects code patterns of bugs
E.g., a = null; b = a.field; (null dereference) and str.replace(" ", ""); (the result is discarded; Java strings are immutable, so str is unchanged)
Characteristics of Findbugs:
Based on existing concrete code patterns
Checks code patterns locally: only intra-procedural analysis. What are the advantages and disadvantages of doing so?
Performs bug ranking according to the probability and potential severity of bugs
Probability: how likely the reported bug is a true bug
Severity: how severe the consequences may be if the bug is not fixed
Application of Findbugs-like tools
Findbugs has been adopted by a number of large companies, such as Google
Usually only the issues with the highest confidence/severity are reported
Statistics from Google (2009): more than 4000 issues were identified, of which 1700 were confirmed as bugs and 1100 were fixed
The software department of USAA uses PMD, an alternative to Findbugs
Patterns to be checked: 404 bug patterns in 6 major categories:
Bad Practice / Dodgy code
Correctness
Internationalization
Vulnerability / Security
Multithread correctness
Performance
Bad practice / dodgy code: hackish code that is not stable and may harm future maintenance
Examples: the equals method should not assume the type of its object argument
public boolean equals(Object o) {
    Myclass my = (Myclass) o;   // unchecked cast: fails if o is not a Myclass
    return my.id == this.id;
}
Abstract class defines a covariant compareTo() method
int compareTo(Myclass obj) { … }
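A common way to fix both patterns is sketched below. MyClass, with a single int id, is a hypothetical stand-in for the slide's Myclass. The fixed equals checks the argument's type before casting. Implementing Comparable<MyClass> makes compareTo(MyClass) a real override of the interface method rather than an unrelated overload. The same fix applies when the class is abstract.

```java
public class MyClass implements Comparable<MyClass> {
    private final int id;

    public MyClass(int id) {
        this.id = id;
    }

    // Accept any Object: check the type before casting instead of assuming it.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof MyClass)) return false; // also covers o == null
        MyClass other = (MyClass) o;
        return id == other.id;
    }

    // Objects that are equal must have equal hash codes.
    @Override
    public int hashCode() {
        return Integer.hashCode(id);
    }

    // Overrides Comparable<MyClass>.compareTo, so code relying on Comparable (e.g. sorting) calls it.
    @Override
    public int compareTo(MyClass other) {
        return Integer.compare(id, other.id);
    }

    public static void main(String[] args) {
        System.out.println(new MyClass(1).equals(new MyClass(1))); // true
        System.out.println(new MyClass(1).equals("1"));            // false, no ClassCastException
    }
}
```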
Correctness: the code pattern may result in incorrect behavior of the software
Examples: DMI: Collections should not contain themselves
List s = new …; …
if (s.contains(s)) { … }
DMI: Invocation of hashCode on an array
int[] x = new int[10];
…
x.hashCode();
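Why is x.hashCode() on an array flagged? Java arrays do not override hashCode(), so the call returns the array object's identity hash and ignores its elements. Two arrays with equal contents therefore generally get different hashes. java.util.Arrays.hashCode and Arrays.equals work on the contents. A minimal sketch (the class name ArrayHash and the helper contentHash are hypothetical):

```java
import java.util.Arrays;

public class ArrayHash {
    // Content-based hash: arrays with equal elements get equal hashes.
    static int contentHash(int[] x) {
        return Arrays.hashCode(x);
    }

    public static void main(String[] args) {
        int[] a = {1, 2, 3};
        int[] b = {1, 2, 3};
        // Arrays inherit Object.hashCode(), i.e. the identity hash, which ignores the contents.
        System.out.println(a.hashCode() == System.identityHashCode(a)); // true
        System.out.println(contentHash(a) == contentHash(b));            // true
        System.out.println(Arrays.equals(a, b));                         // true
    }
}
```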
Internationalization: a code pattern that will hinder future i18n of the software
Examples: using toUpperCase, toLowerCase on localized strings
String s = getLocale(key);
s.toUpperCase();
Performing getBytes() on localized strings
String s = getLocale(key);
s.getBytes();
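Both calls depend on the machine's defaults: toUpperCase() uses the default locale, and getBytes() uses the platform's default charset. A sketch of the usual fix is to pass the locale and charset explicitly. The class I18nSafe and its helpers are hypothetical. Locale.ROOT suits keys and identifiers; for text shown to the user, pass the user's locale instead.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class I18nSafe {
    // Locale-independent case conversion, e.g. for keys and identifiers.
    static String upperForKey(String s) {
        return s.toUpperCase(Locale.ROOT);
    }

    // Encode with an explicit charset instead of the platform default.
    static byte[] toUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // In a Turkish locale, lowercase i uppercases to dotted capital I (U+0130), not to I.
        System.out.println("title".toUpperCase(Locale.forLanguageTag("tr")).equals("TITLE")); // false
        System.out.println(upperForKey("title"));                                             // TITLE
        System.out.println(toUtf8("\u00e9").length);                                          // 2
    }
}
```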
Multi-thread correctness: a code pattern that may cause incorrect behavior in multi-threaded execution
Examples: synchronization on a boxed primitive
private static Boolean inited = Boolean.FALSE;
...
synchronized (inited) {
    if (!inited) {
        init();
        inited = Boolean.TRUE;
    }
}
...
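Two things go wrong in the boxed-primitive version. First, inited is reassigned inside the block, so threads that read it at different times lock different objects and the critical section is not mutually exclusive. Second, Boolean.FALSE and Boolean.TRUE are shared instances that unrelated code may also lock on. A sketch of one common fix locks a dedicated private final object instead. LazyInit, ensureInit and the counter are hypothetical, and init() stands in for the slide's init().

```java
public class LazyInit {
    // A dedicated lock: private, final, never reassigned and never shared with other code.
    private static final Object LOCK = new Object();
    private static boolean inited = false; // guarded by LOCK
    private static int initCalls = 0;      // guarded by LOCK; for demonstration only

    // Hypothetical stand-in for the slide's init().
    private static void init() {
        initCalls++;
    }

    static void ensureInit() {
        synchronized (LOCK) {
            if (!inited) {
                init();
                inited = true;
            }
        }
    }

    static int initCount() {
        synchronized (LOCK) {
            return initCalls;
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread[] threads = new Thread[4];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(LazyInit::ensureInit);
            threads[i].start();
        }
        for (Thread t : threads) t.join();
        System.out.println("init ran " + initCount() + " time(s)"); // init ran 1 time(s)
    }
}
```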
Vulnerability / security: the code pattern may result in vulnerabilities or security issues
Examples: SQL: a SQL query is generated from a non-constant String
String str = "select" + bb + " ddd" + …;
server.execute(str);
XSS: this code directly writes an HTTP parameter to JSP output, which allows a cross-site scripting vulnerability
String para = request.getParameter(key);
out.print(para);
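The usual fixes are sketched below. For SQL, the query text stays constant and user data is bound as a parameter of a java.sql.PreparedStatement. For XSS, request data is HTML-escaped before it is written to the page. The table and column names (users, id, name) and the escapeHtml helper are hypothetical. escapeHtml is a minimal encoder for element content and quoted attributes only; production code should use a maintained encoding library. Running the SQL part needs a real database Connection, so main only exercises the escaping.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class SafeQueries {
    // SQL: constant query text; user data is bound as a parameter, never concatenated.
    static PreparedStatement findUser(Connection conn, String userName) throws SQLException {
        PreparedStatement ps = conn.prepareStatement("SELECT id FROM users WHERE name = ?");
        ps.setString(1, userName);
        return ps;
    }

    // XSS: escape HTML metacharacters before writing request data into a page.
    static String escapeHtml(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '<' -> out.append("&lt;");
                case '>' -> out.append("&gt;");
                case '&' -> out.append("&amp;");
                case '"' -> out.append("&quot;");
                case '\'' -> out.append("&#39;");
                default -> out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeHtml("<script>alert(1)</script>"));
        // &lt;script&gt;alert(1)&lt;/script&gt;
    }
}
```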
Performance: the code pattern may harm the performance of the software
Examples: SBSC: method concatenates strings using + in a loop (each + creates a new String and copies the existing characters, so the loop takes quadratic time)
String s = "";
for (int i = 0; i < field.length; ++i) {
    s = s + field[i];
}
Better: append to a buffer and convert once
StringBuffer buf = new StringBuffer();
for (int i = 0; i < field.length; ++i) {
    buf.append(field[i]);
}
String s = buf.toString();
Major problem: False positives
Overall precision is 5% to 10% on open-source and industrial projects
Developers want to make sure they do not waste effort on false positives
There are usually more reported bugs than developers can fix
Solution: Bug ranking
Ranking bug categories: some categories are more likely to be real bugs than others
How to give a score to each category?
Check a large number of issues in the history of the software
What proportion of them was fixed?
This raises precision to about 30% among the top 25% ranked bugs
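Category scoring from history can be sketched in a few lines. This assumes each past warning is recorded with its category and whether developers later fixed it. The Warning record, the method names and the sample history are hypothetical; they are not Findbugs APIs or data. Each category is scored by its historical fix rate, and new warnings are ordered by that score. Categories never seen before get a score of 0.

```java
import java.util.*;

public class CategoryRanking {
    // One historical warning: its bug category and whether developers later fixed it.
    record Warning(String category, boolean fixed) {}

    // Score each category by the fraction of its past warnings that were fixed.
    static Map<String, Double> fixRates(List<Warning> history) {
        Map<String, int[]> counts = new HashMap<>(); // category -> {fixed, total}
        for (Warning w : history) {
            int[] c = counts.computeIfAbsent(w.category(), k -> new int[2]);
            if (w.fixed()) c[0]++;
            c[1]++;
        }
        Map<String, Double> rates = new HashMap<>();
        counts.forEach((category, c) -> rates.put(category, (double) c[0] / c[1]));
        return rates;
    }

    // Order new warnings so that categories with the highest historical fix rate come first.
    static List<String> rank(List<String> warningCategories, Map<String, Double> rates) {
        List<String> ranked = new ArrayList<>(warningCategories);
        ranked.sort(Comparator.comparingDouble((String cat) -> rates.getOrDefault(cat, 0.0)).reversed());
        return ranked;
    }

    public static void main(String[] args) {
        // Hypothetical history, invented for illustration only (not real Findbugs statistics).
        List<Warning> history = List.of(
                new Warning("Correctness", true),
                new Warning("Correctness", true),
                new Warning("Performance", true),
                new Warning("Performance", false),
                new Warning("Bad practice", false));
        Map<String, Double> rates = fixRates(history);
        System.out.println(rank(List.of("Bad practice", "Performance", "Correctness"), rates));
        // [Correctness, Performance, Bad practice]
    }
}
```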
Findbugs
Disadvantages: cannot guarantee that the software is free of certain bugs
Still involves many false positives
Advantages: easy to start
Scalable
Relatively fewer false positives
Somewhat like testing
Has become the most popular and practical static bug detection technique