Integrating Artificial Intelligence in Software Testing

Roni Stern and Meir Kalech, ISE Department, BGU
Niv Gafni, Yair Ofir and Eliav Ben-Zaken, Software Eng., BGU

Abstract

[Diagram: this work sits at the intersection of Artificial Intelligence (Diagnosis, Planning) and Software Engineering (Testing).]

[Diagram: the relationship between Testing, Planning, and Diagnosis.]

Testing is Important

• Quite a few software development paradigms
• All of them recognize the importance of testing
• There are several types of testing
  – E.g., unit testing, black-box (functional) testing, ...

Tester                          Programmer
• Runs tests                    • Writes programs
• Follows a test plan           • Follows spec
• Goal: find bugs!              • Goal: fix bugs!

Handling a Bug

1. Bugs are reported by the tester
   – "Screen A crashed on step 13 of test suite 4..."
2. Prioritized bugs are assigned to programmers
3. Bugs are diagnosed
   – What caused the bug?
4. Bugs are fixed
   – Hopefully...

Why is Debugging Hard?

• The developer needs to reproduce the bug
• Reproducing a bug (correctly) is non-trivial
  – Bug reports may be inaccurate or missing
  – Software may have stochastic elements
    • The bug occurs only once in a while
  – Bugs may appear only in special cases

Let the tester provide more information!


Introducing AI!

[Diagram, current practice: the Tester runs a test suite and finds a bug; the Programmer diagnoses the bug and fixes it.]

[Diagram, with AI: the Tester runs a test suite and finds a bug; the AI computes diagnoses and plans the next tests; the Programmer fixes the bug.]

Test, Diagnose and Plan (TDP)

[Diagram: Tester (run a test suite, find a bug) → AI (compute diagnoses, plan next tests) → Programmer (fix the bug).]

1. The tester runs a set of planned tests (a test suite)
2. The tester finds a bug
3. The AI generates possible diagnoses
4. If there is only one candidate, it is passed to the programmer
5. Else, the AI plans new tests to prune the candidates

Diagnosis

Find the cause of a problem given observed symptoms.

Requirement: knowledge of the diagnosed system
• Given by experts
• Learned by AI techniques

Model-Based Diagnosis

• Given: a model of how the system works
• Infer what went wrong
• Example:

  IF ok(battery) AND ok(ignition) THEN start(engine)

• What if the engine doesn't start?
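To make the inference concrete, here is a minimal sketch (an illustration, not from the talk): if the engine fails to start, consistency with the rule requires that at least one premise component is faulty, so every non-empty subset of {battery, ignition} is a candidate diagnosis.

```python
from itertools import combinations

# The rule's premise: if all of these are OK, the engine starts.
RULE_PREMISE = {"battery", "ignition"}

def diagnoses(engine_starts: bool):
    """Candidate diagnoses (sets of faulty components) consistent with
    the observation, smallest first."""
    if engine_starts:
        return [set()]  # the observation is consistent with all components OK
    components = sorted(RULE_PREMISE)
    # No start: at least one component in the rule's premise must be faulty.
    return [set(faulty)
            for size in range(1, len(components) + 1)
            for faulty in combinations(components, size)]

print(diagnoses(False))  # [{'battery'}, {'ignition'}, {'battery', 'ignition'}]
```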

Where is MBD Applied?

• Printers (Kuhn and de Kleer, 2008)
• Vehicles (Pernestål et al., 2006)
• Robotics (Steinbauer, 2009)
• Spacecraft: Livingstone (Williams and Nayak, 1996; Bernard et al., 1998)
• ...

Software is Difficult to Model

• Need to code how the software should behave
  – Specs are complicated to model
• Static code analysis ignores runtime behavior
  – Polymorphism
  – Instrumentation

Zoltar [Abreu et al., 2011]

• Constructs a model from the observations
• Observations should include execution traces
  – Functions visited during execution
  – Observed outcome: bug / no bug
• A weaker model:
  – Ok(function1) → function1 outputs a valid value
  – A bug entails that at least one component was not Ok

Execution Matrix

• A key component in Zoltar is the execution matrix
• It is built from the observed execution traces
  – Observation 1 (BUG): F1 F5 F6
  – Observation 2 (BUG): F2 F5
  – Observation 3 (OK):  F2 F6 F7 F8

       F1  F2  F3  F4  F5  F6  F7  F8  Bug
Obs1    1   0   0   0   1   1   0   0   1
Obs2    0   1   0   0   1   0   0   0   1
Obs3    0   1   0   0   0   1   1   1   0
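A small sketch (illustrative helper names, not from the paper) of building this matrix from the traces above:

```python
# Each trace: (set of functions visited, whether the run exposed a bug).
TRACES = [
    ({"F1", "F5", "F6"}, True),          # Observation 1 (BUG)
    ({"F2", "F5"}, True),                # Observation 2 (BUG)
    ({"F2", "F6", "F7", "F8"}, False),   # Observation 3 (OK)
]
FUNCTIONS = [f"F{i}" for i in range(1, 9)]

def execution_matrix(traces):
    """One row per observation: a hit flag per function plus a bug flag."""
    return [[int(f in visited) for f in FUNCTIONS] + [int(is_bug)]
            for visited, is_bug in traces]

for row in execution_matrix(TRACES):
    print(row)
```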

Diagnosis = Hitting Sets of Conflicts

In every BUG trace, at least one function is faulty.

• Observations:
  – Observation 1 (BUG): F1 F5 F6
  – Observation 2 (BUG): F2 F5
• Conflicts:
  – Ok(F1) AND Ok(F5) AND Ok(F6)
  – Ok(F2) AND Ok(F5)
• Possible diagnoses = the hitting sets of the conflicts: {F5}, {F1,F2}, {F6,F2}
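A minimal sketch (illustrative, by size-ordered enumeration) that recovers the minimal hitting sets of the two conflicts above:

```python
from itertools import combinations

# Conflicts: in each bug trace, at least one of these functions is faulty.
CONFLICTS = [{"F1", "F5", "F6"}, {"F2", "F5"}]

def minimal_hitting_sets(conflicts):
    """Enumerate minimal sets that intersect every conflict (the diagnoses)."""
    universe = sorted(set().union(*conflicts))
    found = []
    for size in range(1, len(universe) + 1):
        for candidate in map(set, combinations(universe, size)):
            if any(candidate >= smaller for smaller in found):
                continue  # not minimal: contains an already-found diagnosis
            if all(candidate & conflict for conflict in conflicts):
                found.append(candidate)
    return found

print(minimal_hitting_sets(CONFLICTS))  # [{'F5'}, {'F1','F2'}, {'F2','F6'}]
```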

Software Diagnosis with Zoltar

[Diagram: Bug Report → Zoltar → Set of Candidate Diagnoses → Prioritize Diagnoses]

Plan More Tests

[Diagram: Bug Report → AI Engine → Set of Possible Diagnoses → Suggest New Test to Prune Candidates]

Plan More Tests

[Diagram: the loop ends when the AI Engine arrives at a single diagnosis.]

Pacman Example

[Slides 20-26: a sequence of screenshots walking through the Pacman example.]

Execution Trace

[Diagram: a trace through three functions: F1 (Move), F2 (Stop), F3 (Eat Pill).]

Which Diagnosis is Correct?

Knowledge

• Most probable cause: Eat Pill
• Easiest to check: Stop

Objective

Plan a sequence of tests to find the best diagnosis with minimal tester actions.

1. Highest Probability (HP)

• Compute the probability of every function Ci
  = the sum of the probabilities of the candidate diagnoses containing Ci
• Plan a test to check the most probable function

[Diagram: functions F1, F2, F5, F6 with P(F1) = 0.2, P(F5) = 0.4, P(F6) = 0.4; summing over the candidates containing F2 gives P(F2) = 0.6, so HP tests F2.]
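A sketch of the HP computation, assuming candidate probabilities P({F5}) = 0.4, P({F1,F2}) = 0.2, P({F2,F6}) = 0.4, which reproduce the numbers on the slide:

```python
# Assumed candidate diagnoses and probabilities (from the running example).
DIAGNOSES = {frozenset({"F5"}): 0.4,
             frozenset({"F1", "F2"}): 0.2,
             frozenset({"F2", "F6"}): 0.4}

def function_probabilities(diagnoses):
    """P(Ci) = sum of the probabilities of the candidates containing Ci."""
    probs = {}
    for diag, p in diagnoses.items():
        for func in diag:
            probs[func] = probs.get(func, 0.0) + p
    return probs

probs = function_probabilities(DIAGNOSES)
print(probs)                      # F1: 0.2, F2: 0.6, F5: 0.4, F6: 0.4
print(max(probs, key=probs.get))  # HP picks F2, the most probable function
```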

2. Lowest Cost (LC)

• Consider only functions Ci with 0 < P(Ci) < 1
• Test the closest (cheapest to check) function from this set

[Diagram: functions F1, F2, F5, F6; the closest one is selected.]
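A sketch under the same assumed probabilities, with hypothetical per-function test costs standing in for "closeness":

```python
# Function probabilities from the HP sketch, plus hypothetical test costs
# (e.g., distance in the call graph from the entry point).
PROBS = {"F1": 0.2, "F2": 0.6, "F5": 0.4, "F6": 0.4}
COSTS = {"F1": 3.0, "F2": 1.0, "F5": 2.0, "F6": 4.0}

def lowest_cost_choice(probs, costs):
    """Among functions that are neither surely OK nor surely faulty
    (0 < P(Ci) < 1), pick the cheapest one to test."""
    uncertain = [f for f, p in probs.items() if 0.0 < p < 1.0]
    return min(uncertain, key=costs.get)

print(lowest_cost_choice(PROBS, COSTS))  # 'F2' under these assumed costs
```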

3. Entropy-based

• A test α = a set of components
  – Ω+ = the set of candidates that assume α will pass
  – Ω- = the set of candidates that assume α will fail
  – Ω? = the set of candidates that are indifferent to α
• Prob. that α will pass = the probability of the candidates in Ω+
• Choose the test with the lowest Entropy(Ω+, Ω-, Ω?) = the highest information gain

[Diagram: the same candidates as before, with P(F1) = 0.2, P(F5) = 0.4, P(F6) = 0.4.]
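A sketch of the split and its entropy, under a simplifying assumption (not necessarily the authors' model) that an exercised fault always makes the test fail, which leaves Ω? empty; the candidate probabilities are the same assumed ones as above:

```python
import math

# Assumed candidate diagnoses and probabilities (as in the HP sketch).
DIAGNOSES = {frozenset({"F5"}): 0.4,
             frozenset({"F1", "F2"}): 0.2,
             frozenset({"F2", "F6"}): 0.4}

def partition(test, diagnoses):
    """Split the candidate probability mass by the predicted test outcome.
    Simplification: a candidate predicts FAIL iff the test exercises one of
    its faulty functions; intermittent faults would populate 'indifferent'."""
    masses = {"pass": 0.0, "fail": 0.0, "indifferent": 0.0}
    for diag, p in diagnoses.items():
        masses["fail" if test & diag else "pass"] += p
    return masses

def entropy(masses):
    """Shannon entropy of the split; lower entropy = a more informative test."""
    return -sum(p * math.log2(p) for p in masses.values() if p > 0)

test = {"F2"}  # a test exercising only F2
print(partition(test, DIAGNOSES))                     # pass: 0.4, fail: 0.6
print(round(entropy(partition(test, DIAGNOSES)), 3))  # ~0.971
```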

4. Planning Under Uncertainty

• The outcome of a test is unknown; we would like to minimize the expected cost
• Formulate as a Markov Decision Process (MDP)
  – States: the set of performed tests and their outcomes
  – Actions: the possible tests
  – Transitions: Pr(Ω+), Pr(Ω-) and Pr(Ω?)
  – Reward (cost): the cost of the performed tests
• The current solver uses myopic sampling
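One way to make the myopic idea concrete (a sketch of one-step lookahead, not the authors' exact solver): score each candidate test by its cost plus the expected posterior entropy over the diagnoses, and pick the minimizer. DIAGNOSES and COSTS are the same assumptions as in the earlier sketches.

```python
import math

# Assumed candidate diagnoses and hypothetical per-function test costs.
DIAGNOSES = {frozenset({"F5"}): 0.4,
             frozenset({"F1", "F2"}): 0.2,
             frozenset({"F2", "F6"}): 0.4}
COSTS = {"F1": 3.0, "F2": 1.0, "F5": 2.0, "F6": 4.0}

def entropy(probs):
    """Entropy of a branch, after normalizing its probability mass."""
    total = sum(probs)
    return -sum(p / total * math.log2(p / total) for p in probs if p > 0)

def myopic_score(test):
    """Cost of the test plus the expected entropy of the posterior.
    A candidate predicts failure iff the test exercises one of its faults."""
    fail = [p for d, p in DIAGNOSES.items() if test & d]
    passed = [p for d, p in DIAGNOSES.items() if not (test & d)]
    expected_h = sum(sum(branch) * entropy(branch)
                     for branch in (fail, passed) if branch)
    return sum(COSTS[f] for f in test) + expected_h

tests = [{"F1"}, {"F2"}, {"F5"}, {"F6"}]
best = min(tests, key=myopic_score)
print(best, round(myopic_score(best), 3))  # {'F2'}: cheap and informative
```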

Preliminary Experimental Results

• Synthetic code
• Setup:
  – Generated random call graphs with 300 nodes
  – Generated code from the random call graphs
  – Injected 10/20 random bugs
  – Generated 15 random initial tests
• Ran until a diagnosis with probability > 0.9 was found
  – Results on 100 instances

Preliminary Experimental Results

• The probability-based approaches (HP and Entropy) perform worse than the cost-based approach (LC)
• MDP, which considers both cost and probability, outperforms the others
• 20 bugs is more costly for all algorithms than 10 bugs

Preliminary Experimental Results - NUnit

• Real code: NUnit, a well-known testing framework for C#
• Setup:
  – Generated call graphs with 302 nodes from NUnit
  – Injected 10, 15, 20, 25, 30 random bugs
  – Generated 15 random initial tests
• Ran until a diagnosis with probability > 0.9 was found
  – Results on 110 instances

Preliminary Experimental Results - NUnit

• Similar results to the synthetic setting
• The cost increases with the probability bound
• MDP, which considers both cost and probability, outperforms the others

Conclusion

• The Test, Diagnose and Plan (TDP) paradigm:
  – The AI diagnoses an observed bug with an MBD algorithm
  – The AI plans further tests to find the correct diagnosis
• Empowers the tester using AI
  – Bug diagnosis is done by the tester + AI

[Diagram: an AI Engine connecting the SCM server, logs, and source code to the QA tester and the developer.]