CS5103 Software Engineering

CS5103 Software Engineering Lecture 15 Static Bug Detection and Verification



Static bug detection

Compared with testing, static bug detection plays a smaller role in software quality assurance

Compared to testing:

Works only for specific kinds of bugs

Sometimes not scalable

Generates false positives

Easy to start (no build, no setup, no install …)

Can sometimes guarantee that the software is free of certain kinds of bugs

No need for debugging

State of the art: static bug detection

Type-specific detection (the specification is fixed, and improvements are provided)

Targets major or important types of bugs:

Null pointer, memory leak, unsafe cast, injection, buffer overflow, dynamic SQL error, data race, deadlock, infinite loop, HTML error, UI inconsistency, i18n bugs, …

Many techniques exist for each kind of bug

Most of them have severe limitations that prevent practical use

Specification-based detection

Model checking, symbolic execution, theorem proving

Specification

A description of the correct behavior of software

We need a formal specification to do static bug detection

Three main types of specifications:

Value

Temporal

Data Flow

Value Specification

The value(s) of one or more variables must satisfy a certain constraint

Examples:

Final Exam Score <= 100

sortedlist(0) >= sortedlist(1)

http_url.startsWith("http")

Sql_query belongs to Language_SQL

Temporal Specification

Two events (or a series of events) must happen in a certain order

Examples:

lock() -> unlock()

file.open() -> file.close() and file.open() -> file.read()

They are different, right?

Temporal logic:

lock() -> F(unlock())

(!read()) U (open())

Data Flow Specification

Data from a certain source must / must not flow to a certain sink

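Examples:

! Contact Info -> Internet (contact info must not flow to the Internet)

Password -> encryption -> Internet (a password must pass through encryption before it reaches the Internet)

Data flow specifications are mainly used for security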

General Specifications

Common behaviors of all software, each paired with the bug it rules out:

Specification                                   Bug
a/b -> b!=0                                     Divide by 0
a.field -> a!=null                              Null pointer dereference
a[x] -> x<a.length()                            Buffer overflow
p.malloc() -> p.free()                          Memory leak
lock(s) -> unlock(s)                            Deadlock
while(Condition) -> F(!Condition)               Infinite loop
<script> xxx </script> -> ! User_input -> xxx   XSS
! Hard-coded string -> User Interface           I18n error

Checking Specifications: Basic ways

Value specifications: symbolic execution

Temporal specifications: model checking

Data flow specifications: graph traversal (data dependence graph)

Static symbolic execution

Basic Example

    y = read();         T (y = s), s is a symbolic variable for the input
    y = 2 * y;          T (y = 2*s)
    if (y <= 12)        T (y = 2*s)
        y = 3;          T ^ y<=12 (y = 3)
    else
        y = y + 1;      T ^ !(y<=12) (y = 2*s + 1)
    print("OK");        T ^ 2*s<=12 (y = 3) | T ^ !(2*s<=12) (y = 2*s + 1)

Here T is the condition for the statement to be executed, and (y = s) is the relationship of all variables to the inputs after the statement is executed

Prove y > 0? Show that y <= 0 cannot hold on any path:

(2*s <= 12 & y = 3) & y <= 0: not satisfiable

!(2*s <= 12) & (y = 2*s + 1) & y <= 0: not satisfiable
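The two unsatisfiability checks for the basic example can be sketched in Java without a constraint solver. This is only a minimal sketch under stated assumptions: the two paths are written out by hand, the input s is an integer in a bounded range, and the class and method names are illustrative rather than taken from any tool.

```java
import java.util.Arrays;
import java.util.List;

// Each path is a range of the symbolic input s plus a linear value y = a*s + b.
class SymbolicSketch {
    // Assumed input domain, kept small so the long arithmetic cannot overflow.
    static final long MIN = -1_000_000L;
    static final long MAX = 1_000_000L;

    static final class Path {
        final long lo, hi, a, b;

        Path(long lo, long hi, long a, long b) {
            this.lo = lo; this.hi = hi; this.a = a; this.b = b;
        }

        // Smallest value of a*s + b for s in [lo, hi].
        long minY() {
            return a >= 0 ? a * lo + b : a * hi + b;
        }
    }

    // Paths of: y = read(); y = 2*y; if (y <= 12) y = 3; else y = y + 1;
    static List<Path> paths() {
        return Arrays.asList(
            new Path(MIN, 6, 0, 3),   // 2*s <= 12, i.e. s <= 6: y = 3
            new Path(7, MAX, 2, 1));  // !(2*s <= 12), i.e. s >= 7: y = 2*s + 1
    }

    // "y > 0" holds if "y <= 0" is unsatisfiable on every path.
    static boolean provesYPositive() {
        for (Path p : paths()) {
            if (p.minY() <= 0) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("y > 0 proved: " + provesYPositive());
    }
}
```

A real symbolic executor would derive the paths from the program and pass the path conditions to a constraint solver; the interval reasoning here only works because both conditions are simple bounds on s.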

Static symbolic execution

Complex Example

    y = read();           T (y = s), s is a symbolic variable for the input
    p = 1;                T (p = 1, y = s)
    while (y < 10) {      T (p = 1, y = s)
        y = y + 1;        T ^ s<10 (y = s + 1, p = 1)
        if (y > 2)
            p = p + 1;    T ^ s<10 ^ 2<s+1 (y = s + 1, p = 2)
        else
            p = p + 2;    T ^ s+1<=2 (y = s + 1, p = 3)
    }
    print(p);

After y = y + 1 in the second iteration: T ^ 2<s+1<10 (y = s + 2, p = 2) | T ^ s+1<=2 (y = s + 2, p = 3)

Every further iteration adds more cases, so the set of path conditions keeps growing

Prove p > 0?

Checking Specifications: Basic ways

Value specifications: symbolic execution

Temporal specifications: model checking

Data flow specifications: graph traversal (data dependence graph)

Model Checking

Basic idea:

Transform the program into an automaton

Program states are the states of the automaton, and statements are its transitions (edges)

Check temporal properties on the automaton by traversing it

Model Checking: Model Building

Basic approach: use the Control Flow Graph

View all program states after a statement as ONE state

Use abstract states

View all program states after a statement with the same abstract values as ONE state

Use concrete values

View all program states after a statement with the same concrete values as ONE state: usually impossible

An example with CFG-model

Checking whether a file is closed in all cases

    boolean load() {
        f.open();
        line = f.read();
        while (line != null) {
            if (line.contains("key")) {
                f.close();
                return true;
            } else if (line.contains("value")) {
                f.close();
            }
            line = f.read();
        }
        return false;
    }

[Figure: CFG-based model of load(). Node labels: Start, opened, new line read, !=null, ==null, key, value, none, closed (two nodes), ret. Annotation: f is not open.]

An example with CFG-model

Traversing the model to find counterexamples

[Figure: CFG-based model of load() (nodes Start, opened, new line read, !=null, ==null, key, value, none, closed, ret), traversed to find counterexamples. Annotation: f is not open.]
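Traversing a CFG-based model for a counterexample can be sketched as a depth-first search. The sketch below is an illustration, not the lecture's tool: the edges are reconstructed from the code of load() rather than copied from the figure, and the state names and the class `CfgModelCheck` are hypothetical. The property checked is "f is closed on every path that reaches ret".

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class CfgModelCheck {
    // One state per statement; edges follow the control flow of load() (assumed).
    static final Map<String, List<String>> EDGES = new HashMap<>();
    static {
        EDGES.put("start", Arrays.asList("opened"));
        EDGES.put("opened", Arrays.asList("read"));
        EDGES.put("read", Arrays.asList("!=null", "==null"));
        EDGES.put("!=null", Arrays.asList("key", "value", "none"));
        EDGES.put("key", Arrays.asList("closedKey"));
        EDGES.put("closedKey", Arrays.asList("ret"));
        EDGES.put("value", Arrays.asList("closedValue"));
        EDGES.put("closedValue", Arrays.asList("read"));
        EDGES.put("none", Arrays.asList("read"));
        EDGES.put("==null", Arrays.asList("ret"));
    }

    // Counterexample: a path from "start" to "ret" that visits no closed state.
    static List<String> findCounterexample() {
        Deque<String> path = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        if (dfs("start", seen, path)) {
            return new ArrayList<>(path);
        }
        return Collections.emptyList();
    }

    static boolean dfs(String node, Set<String> seen, Deque<String> path) {
        // Skip closed states and states that were already explored.
        if (node.startsWith("closed") || !seen.add(node)) return false;
        path.addLast(node);
        if (node.equals("ret")) return true;
        for (String next : EDGES.getOrDefault(node, Collections.<String>emptyList())) {
            if (dfs(next, seen, path)) return true;
        }
        path.removeLast();
        return false;
    }

    public static void main(String[] args) {
        // prints [start, opened, read, ==null, ret]
        System.out.println(findCounterexample());
    }
}
```

The path found corresponds to a file whose first read returns null: load() returns false without ever calling f.close().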

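An example with CFG-model

Read must happen before close

[Figure: CFG-based model of load() (nodes Start, opened, new line read, !=null, ==null, key, value, none, closed, ret), used to check that no read happens after close. Annotation: f is not open.]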

Temporal Logic

The basic idea of model checking is to find a path in the model that violates the specification

Temporal logic describes the sequential relationship among a number of events: the specification

Any specification can then simply be read by a path-finding tool

No need to write a new path-finding tool for each proof

Usage of Temporal Logic

Describe the sequential relationship among a number of events

U (until): P U Q means that P has to be true until Q is true

!read(f) U open(f)

!close(f) U open(f)

F (future): F P means that P will be true at some time in the future

open(f) -> F close(f)

close(f) -> !F read(f)
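The two operators can be illustrated on a finite trace of events. This is a hedged sketch with several assumptions: traces are finite lists of event names, "until" is read in its strong form (Q must actually occur), and `p -> F q` is read as "every p is later followed by q". The class `TraceCheck` and its methods are illustrative names only.

```java
import java.util.Arrays;
import java.util.List;

class TraceCheck {
    // (!p) U q: p must not occur before the first q, and q must occur.
    static boolean notPUntilQ(List<String> trace, String p, String q) {
        for (String event : trace) {
            if (event.equals(q)) return true;   // q reached, obligation met
            if (event.equals(p)) return false;  // p happened too early
        }
        return false; // q never occurred
    }

    // p -> F q: every occurrence of p is eventually followed by q.
    static boolean eventuallyAfter(List<String> trace, String p, String q) {
        boolean pending = false;
        for (String event : trace) {
            if (event.equals(p)) {
                pending = true;
            } else if (event.equals(q)) {
                pending = false;
            }
        }
        return !pending;
    }

    public static void main(String[] args) {
        List<String> good = Arrays.asList("open", "read", "close");
        List<String> bad = Arrays.asList("read", "open");
        System.out.println(notPUntilQ(good, "read", "open"));       // true
        System.out.println(eventuallyAfter(good, "open", "close")); // true
        System.out.println(notPUntilQ(bad, "read", "open"));        // false
        System.out.println(eventuallyAfter(bad, "open", "close"));  // false
    }
}
```

A model checker evaluates such formulas over all paths of the model rather than over one recorded trace.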

Checking Specifications: Basic ways

Value specifications: symbolic execution, abstract interpretation

Temporal specifications: model checking

Data flow specifications: graph traversal (data dependence graph)

Some simple checks with graph traversal

Check x flows to w

Check (!z used as divisor) U (z is written)
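The first check, whether x flows to w, is a reachability query on the data dependence graph. The sketch below assumes a hypothetical three-statement program (`y = x + 1; z = y * 2; w = z;`) because the slide's graph is not available; the class `DataFlowCheck` and its graph are illustrative.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class DataFlowCheck {
    // Hypothetical dependence edges for: y = x + 1; z = y * 2; w = z;
    static Map<String, List<String>> exampleGraph() {
        Map<String, List<String>> graph = new HashMap<>();
        graph.put("x", Arrays.asList("y"));
        graph.put("y", Arrays.asList("z"));
        graph.put("z", Arrays.asList("w"));
        return graph;
    }

    // Breadth-first search: does data from source reach sink?
    static boolean flowsTo(Map<String, List<String>> graph, String source, String sink) {
        Deque<String> work = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();
        work.add(source);
        seen.add(source);
        while (!work.isEmpty()) {
            String node = work.poll();
            if (node.equals(sink)) return true;
            for (String next : graph.getOrDefault(node, Collections.<String>emptyList())) {
                if (seen.add(next)) work.add(next);
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(flowsTo(exampleGraph(), "x", "w")); // true
        System.out.println(flowsTo(exampleGraph(), "w", "x")); // false
    }
}
```

A "must not flow" specification such as `! Contact Info -> Internet` is violated exactly when this query returns true for that source and sink.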

Problems of static bug detection

Lack of specifications

Project-specific formal specifications are very rare

Solutions:

General specifications (for typical bugs)

Mining specifications (for API-specific and project-specific specifications)

False positives vs. efficiency

More sensitivity -> higher cost

Path sensitivity is rarely achieved

Combining all sensitivities -> uncomputable problems

State of the practice: static bug detection

Findbugs

A tool developed by researchers at UMD

Widely used in industry for checking code before commit

The idea actually comes from Lint

Lint

A code style enforcement tool for the C language

Finds bad coding style and raises warnings: bad naming, hard-coded strings, …

Idea: do it in reverse

Most static bug detection tools:

Set up a specification (either from users or a well-defined general one)

E.g., a divisor should not be 0, a null pointer should not be dereferenced, the salary of a person cannot be negative

Check all possible cases to guarantee that the specification holds

Otherwise provide counterexamples

Findbugs:

Detects code patterns for bugs

E.g., a = null; b = a.field; (null dereference) and str.replace(" ", ""); (return value ignored)

Characteristics of Findbugs

Based on existing concrete code patterns

Checks code patterns locally: only intra-procedural analysis

What are the advantages and disadvantages of doing so?

Performs bug ranking according to the probability and potential severity of bugs

Probability: the reported bug is likely to be a real bug

Severity: the bug may cause severe consequences if not fixed

Application of Findbugs-like tools

Findbugs is adopted by a number of large companies such as Google

Usually only the issues with the highest confidence/severity are reported

Statistics from Google in 2009: more than 4000 issues were identified, of which 1700 bugs were confirmed and 1100 were fixed.

The software department of USAA uses PMD, an alternative to Findbugs

Patterns to be checked

404 bug patterns in 6 major categories:

Bad Practice / Dodgy code

Correctness

Internationalization

Vulnerability / Security

Multithread correctness

Performance

Bad Practice / Dodgy code

Hackish code that is unstable and may harm future maintenance

Examples:

Equals method should not assume the type of its object argument

    boolean equals(Object o) {
        Myclass my = (Myclass) o;   // cast without a type check: the flagged pattern
        return my.id == this.id;
    }

Abstract class defines covariant compareTo() method

    int compareTo(Myclass obj) { … }
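One common way to avoid the first pattern is to test the argument's type before casting. The sketch below is illustrative: the class `MyClass` and its `id` field stand in for the slide's `Myclass`, and the matching `hashCode()` is added because Java expects equal objects to have equal hash codes.

```java
class MyClass {
    private final int id;

    MyClass(int id) {
        this.id = id;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        // instanceof is false for null and for unrelated types, so the cast is safe.
        if (!(o instanceof MyClass)) return false;
        return ((MyClass) o).id == this.id;
    }

    @Override
    public int hashCode() {
        return Integer.hashCode(id);
    }

    public static void main(String[] args) {
        System.out.println(new MyClass(1).equals(new MyClass(1))); // true
        System.out.println(new MyClass(1).equals("1"));            // false
        System.out.println(new MyClass(1).equals(null));           // false
    }
}
```

With the unchecked cast, the second call would throw a ClassCastException and the third a NullPointerException.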

Correctness

The code pattern may result in incorrect behavior of the software

Examples:

DMI: Collections should not contain themselves

    List s = new …; …

    if (s.contains(s)) { … }

DMI: Invocation of hashCode on an array

    int[] x = new int[10];

    x.hashCode();
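Calling `hashCode()` on an array is flagged because arrays inherit the identity-based hash from `Object`, so two arrays with the same contents generally hash differently. A small sketch of the content-based alternative from the standard library, with an illustrative class name:

```java
import java.util.Arrays;

class ArrayHashDemo {
    // Content-based comparison of hashes, using java.util.Arrays.hashCode.
    static boolean sameContentHash(int[] a, int[] b) {
        return Arrays.hashCode(a) == Arrays.hashCode(b);
    }

    public static void main(String[] args) {
        int[] x = {1, 2, 3};
        int[] y = {1, 2, 3};
        System.out.println(sameContentHash(x, y)); // true
        // x.hashCode() and y.hashCode() are identity-based and usually differ.
        System.out.println(x.hashCode() == y.hashCode());
    }
}
```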

Internationalization

A code pattern that will hinder future i18n of the software

Examples:

Use of toUpperCase, toLowerCase on localized strings

    String s = getLocale(key);

    s.toUpperCase();

Performing getBytes() on localized strings

    String s = getLocale(key);

    s.getBytes();
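Both patterns depend on the platform's default locale or default charset. The sketch below shows the explicit overloads from the standard library; the class and method names are illustrative, and choosing `Locale.ROOT` and UTF-8 is an assumption that suits protocol-style strings rather than text shown to users.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

class LocaleSafeStrings {
    // Locale-independent upper-casing, e.g. for keys or protocol tokens.
    static String upperRoot(String s) {
        return s.toUpperCase(Locale.ROOT);
    }

    // Encoding with an explicit charset instead of the platform default.
    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr");
        // In Turkish, 'i' upper-cases to a dotted capital I (U+0130).
        System.out.println("title".toUpperCase(turkish).equals("T\u0130TLE")); // true
        System.out.println(upperRoot("title"));                                // TITLE
        System.out.println(utf8("\u00e9").length);                             // 2
    }
}
```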

Multi-thread correctness

A code pattern that may cause incorrect behavior in multi-threaded execution

Examples:

Synchronization on boxed primitive

    private static Boolean inited = Boolean.FALSE;
    ...
    synchronized (inited) {
        if (!inited) {
            init();
            inited = Boolean.TRUE;
        }
    }
    ...
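The pattern is risky because `Boolean.FALSE` is a shared object that unrelated code may also lock on, and because reassigning `inited` changes which object later threads lock. One possible fix, sketched with illustrative names (`LazyInit`, `LOCK`, and the counter are not from the slide), is a private lock object that never changes:

```java
class LazyInit {
    // Dedicated lock: private, final, and never reassigned.
    private static final Object LOCK = new Object();
    private static boolean inited = false;
    private static int initCount = 0;

    private static void init() {
        initCount++; // stands in for the real initialization work
    }

    static void ensureInited() {
        synchronized (LOCK) {
            if (!inited) {
                init();
                inited = true;
            }
        }
    }

    static int initCount() {
        synchronized (LOCK) {
            return initCount;
        }
    }

    // Starts several threads that all request initialization, then reports
    // how many times init() ran.
    static int runConcurrently(int threads) {
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(LazyInit::ensureInited);
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return initCount();
    }

    public static void main(String[] args) {
        System.out.println(runConcurrently(8)); // 1
    }
}
```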

Vulnerability / Security

The code pattern may result in vulnerabilities or security issues

Examples:

SQL: A SQL query is generated from a non-constant String

    String str = "select" + bb + " ddd" + …

    server.execute(str);

XSS: This code directly writes an HTTP parameter to JSP output, which allows for a cross-site scripting vulnerability

    Para = request.getParameter(key);

    out.print(Para);
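For the cross-site scripting pattern, the usual remedy is to encode the parameter before writing it into the page. The helper below is a hand-written illustration, not a library API: `HtmlEscape.escapeHtml` is a hypothetical name, it covers only the five characters that matter in HTML text and attribute values, and production code would normally rely on an established encoding library instead.

```java
class HtmlEscape {
    // Replaces characters that have special meaning in HTML with entities.
    static String escapeHtml(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&#39;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints &lt;script&gt;alert(1)&lt;/script&gt;
        System.out.println(escapeHtml("<script>alert(1)</script>"));
    }
}
```

In the slide's example this would correspond to printing the escaped value of `Para` rather than `Para` itself.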

Performance

The code pattern may harm the performance of the software

Examples:

SBSC: Method concatenates strings using + in a loop

    String s = "";
    for (int i = 0; i < field.length; ++i) {
        s = s + field[i];
    }

Better, using a StringBuffer:

    StringBuffer buf = new StringBuffer();
    for (int i = 0; i < field.length; ++i) {
        buf.append(field[i]);
    }
    String s = buf.toString();

Major problem: False positives

Overall precision is 5% to 10% on open-source and industrial projects

Developers want to make sure they do not waste effort on a false positive

Usually there are more reported bugs than developers can fix

Solution: Bug ranking

Ranking bug categories

Some categories are more likely to be real bugs than others

How to give a score to each category?

Check a large number of issues in the history of the software

What proportion of them was fixed?

Raises precision to about 30% among the top-ranked 25% of bugs
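Scoring categories by the proportion of past issues that were fixed can be sketched as follows. Everything concrete here is an assumption for illustration: the class `BugRanking`, the category names, and the counts are made up and are not data from the lecture or from Findbugs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class BugRanking {
    static final class Category {
        final String name;
        final int reported; // issues reported in the project's history
        final int fixed;    // how many of those were later fixed

        Category(String name, int reported, int fixed) {
            this.name = name;
            this.reported = reported;
            this.fixed = fixed;
        }

        // Score: proportion of reported issues that developers fixed.
        double fixRate() {
            return reported == 0 ? 0.0 : (double) fixed / reported;
        }
    }

    // Returns category names ordered from highest to lowest fix rate.
    static List<String> rank(List<Category> categories) {
        List<Category> sorted = new ArrayList<>(categories);
        sorted.sort((a, b) -> Double.compare(b.fixRate(), a.fixRate()));
        List<String> names = new ArrayList<>();
        for (Category c : sorted) {
            names.add(c.name);
        }
        return names;
    }

    public static void main(String[] args) {
        // Hypothetical history, for illustration only.
        List<Category> history = Arrays.asList(
            new Category("Performance", 200, 20),
            new Category("Correctness", 100, 60),
            new Category("Bad Practice", 400, 80));
        // prints [Correctness, Bad Practice, Performance]
        System.out.println(rank(history));
    }
}
```

New warnings would then be reported in the order of their category's score, so the categories that developers historically acted on appear first.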

Findbugs

Disadvantages:

Cannot guarantee the software to be free of certain bugs

Still involves many false positives

Advantages:

Easy to start

Scalable

Relatively fewer false positives

Somewhat like testing

Has become the most popular and practical static bug detection technique

Review of Static Bug Detection

Specification-based static bug detection

Value specifications: symbolic execution, abstract interpretation

Temporal specifications: model checking

Data flow specifications: dependence graph traversal

Pattern-based static bug detection

Findbugs

Bug ranking