
Yogi – Part 3: Engineering the tool and parallelization

Windows Device Drivers

do {
    // get the write lock
    KeAcquireSpinLock(&devExt->writeListLock);
    nPacketsOld = nPackets;
    request = devExt->WriteListHeadVa;
    if (request && request->status) {
        devExt->WriteListHeadVa = request->Next;
        KeReleaseSpinLock(&devExt->writeListLock);
        irp = request->irp;
        if (request->status > 0) {
            irp->IoStatus.Status = STATUS_SUCCESS;
            irp->IoStatus.Information = request->Status;
        } else {
            irp->IoStatus.Status = STATUS_UNSUCCESSFUL;
            irp->IoStatus.Information = request->Status;
        }
        SmartDevFreeBlock(request);
        IoCompleteRequest(irp, IO_NO_INCREMENT);
        nPackets++;
    }
} while (nPackets != nPacketsOld);
KeReleaseSpinLock(&devExt->writeListLock);

[Figure: locking-rule state machine with states Unlocked, Locked, and Error; transitions labeled L and U.]

Static Driver Verifier

[Figure: Static Driver Verifier workflow. Labels: Source Code; Development; Testing; API Usage Rules (SLIC); Software Model Checking; Read for understanding; New API rules; Drive testing tools; Defects; 100% path coverage; Rules.]


Architecture of Yogi

[Figure: Yogi tool pipeline. Components shown: C Program, slamcl, Slam.li database; SLIC property, slicc, Instrumented Slam.li database; li2yir, Yogi IR (yir); alias and mod-ref information; initial function summaries; polymorphic region graphs; YSim and YAbsMan; Z3 theorem prover. Outputs: a test case that exposes a bug, or a proof that the program satisfies the property.]

Implementation

~6K lines of OCaml
Z3 theorem prover
Integrated with SDV for Windows
Engineering effort: 3 person-years
F# version available with SDVRP: http://research.microsoft.com/yogi

Optimizations

Share our experiences in making Yogi robust, scalable, and industrial-strength.

Several of the implemented optimizations are folklore.
It is very difficult to design tools that are bug-free, so evaluating optimizations is hard!
Our empirical evaluation gives tool builders information about what gains can realistically be expected from optimizations.
Details in "An empirical study of optimizations in Yogi", ICSE '10.

Vanilla implementation of algorithms: (flpydisk, CancelSpinLock) took 2 hours.
Algorithms + engineering + optimizations: (flpydisk, CancelSpinLock) took less than 1 second!

Optimizations

Initial abstraction from property predicates

Relevance heuristics for predicate abstraction
    Suitable predicates (SP)
    Control dependence predicates (CD)

Summaries for procedures

Thresholds for tests

Evaluation setup

Benchmarks: 30 WDM drivers and 83 properties (2490 runs)
Anecdotal belief: most bugs in the tools are usually caught with this test suite

Presentation methodology:
    Group optimizations logically, so that related optimizations are in the same group
    Report the total time taken and the total number of defects found for every possible choice of enabling/disabling each optimization in the group
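The "every possible choice" methodology amounts to one full benchmark run per on/off combination of the optimizations in a group. A minimal Python sketch of that enumeration follows; the function name `configurations` and the dictionary layout are illustrative assumptions, and only the group members SP and CD come from the slides.

```python
from itertools import product

def configurations(group):
    """Yield every enable/disable assignment for the optimizations in a group."""
    for flags in product([True, False], repeat=len(group)):
        yield dict(zip(group, flags))

# One group from the talk: the two relevance heuristics.
relevance_group = ["SP", "CD"]
configs = list(configurations(relevance_group))
for cfg in configs:
    print(cfg)
```

A group of k optimizations yields 2^k configurations, which is why the result tables for a two-optimization group have four rows.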

Initial abstraction

state {
    enum {Locked = 0, Unlocked = 1} state = Unlocked;
}

KeAcquireCancelSpinlock.Entry {
    if (state != Locked) {
        state = Locked;
    } else abort;
}

KeReleaseCancelSpinlock.Entry {
    if (state == Locked) {
        state = Unlocked;
    } else abort;
}

[Figure: abstraction diagrams over nodes 0 and 1, with regions labeled (state ≠ Locked), (state = Locked), and T.]
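The SLIC rule above reads as a two-state monitor that aborts on a double acquire or on a release without a matching acquire. The following Python sketch replays the same rule over a recorded event trace. It is only a runtime analogue for intuition: Yogi checks the rule statically over all paths, and the names `CancelSpinLockMonitor`, `RuleViolation`, and `check_trace` are invented for this illustration.

```python
class RuleViolation(Exception):
    """Raised at the points where the SLIC rule would execute `abort`."""

class CancelSpinLockMonitor:
    LOCKED, UNLOCKED = 0, 1  # same encoding as the enum in the rule

    def __init__(self):
        self.state = self.UNLOCKED

    def on_acquire(self):
        if self.state != self.LOCKED:
            self.state = self.LOCKED
        else:
            raise RuleViolation("acquire while already locked")

    def on_release(self):
        if self.state == self.LOCKED:
            self.state = self.UNLOCKED
        else:
            raise RuleViolation("release while not locked")

def check_trace(events):
    """Replay "acquire"/"release" events; return True if the rule never aborts."""
    monitor = CancelSpinLockMonitor()
    try:
        for event in events:
            if event == "acquire":
                monitor.on_acquire()
            elif event == "release":
                monitor.on_release()
    except RuleViolation:
        return False
    return True

print(check_trace(["acquire", "release", "acquire", "release"]))
print(check_trace(["acquire", "acquire"]))
```

The predicates that appear in the rule's conditions, (state = Locked) and its negation, are the ones the initial abstraction starts from.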

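Empirical results

Abstraction using SLIC predicates   Total time (minutes)   #defects   #timeouts
yes                                 2160                   241        77
no                                  2580                   241        86

Highlighted on the slide: 16%, which matches the reduction in total time, (2580 − 2160) / 2580 ≈ 0.16.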

Relevance heuristics (SP)

Avoid irrelevant conjuncts

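[Figure: refinement across an edge labeled assume(φ). Before: regions A, B, and C carry T, and region D carries δ. After: region C is split into C with ¬ρ and C with ρ. Callout on ρ: "Irrelevant?"]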

Relevance heuristics (CD)

Abstract assume statements that are not potentially relevant by replacing them with skip statements.

If Yogi proves that the program satisfies the property, we are done.

Otherwise, validate the error trace and, if it is spurious, refine the abstraction by putting back assume statements.

Example: SP heuristic

int x;
void foo() {
    bool protect = true;
    …
    if (x > 0)
        protect = false;
    …
    if (protect)
        KeAcquireCancelSpinLock();
    for (i = 0; i < 1000; i++) {
        a[i] = readByte(i);
    }
    if (protect)
        KeReleaseCancelSpinLock();
}

[Figure: refinement across the edge assume(i > 1000), where region D carries state = Locked and regions A, B, and C carry T. Region C is split into C with ¬ρ and C with ρ, where ρ = (state = Locked) ∧ (i > 1000).]

Example: SP heuristic

[Figure: the same program and the same refinement step across assume(i > 1000), now with ρ = (state = Locked); the conjunct (i > 1000) no longer appears in ρ.]

Example: CD heuristic

int x;
void foo() {
    bool protect = true;
    …
    if (x > 0)
        protect = false;
    …
    if (protect)
        KeAcquireCancelSpinLock();
    for (i = 0; i < 1000; i++) {
        a[i] = readByte(i);
    }
    if (protect)
        KeReleaseCancelSpinLock();
}

Empirical results

SP heuristic   CD heuristic   Total time (minutes)   #defects   #timeouts
yes            yes            2160                   241        77
yes            no             2580                   239        91
no             yes            2400                   238        87
no             no             2894                   235        174

The slide appears three times, highlighting 10%, 16%, and 25% in turn.
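The slides do not say how the highlighted percentages for the SP/CD table were computed. One reading that reproduces all three is the reduction in total time when moving from a configuration with an optimization disabled to the configuration with both enabled. The sketch below checks that reading against the table; the helper `reduction` and the variable names are illustrative assumptions.

```python
# Total time in minutes, keyed by (SP enabled, CD enabled), copied from the table.
total_time = {
    (True, True): 2160,
    (True, False): 2580,
    (False, True): 2400,
    (False, False): 2894,
}

def reduction(slow, fast):
    """Percentage of time saved when going from `slow` to `fast`, rounded."""
    return round(100 * (slow - fast) / slow)

best = total_time[(True, True)]
sp_gain = reduction(total_time[(False, True)], best)     # enabling SP, CD already on
cd_gain = reduction(total_time[(True, False)], best)     # enabling CD, SP already on
both_gain = reduction(total_time[(False, False)], best)  # enabling both heuristics
print(sp_gain, cd_gain, both_gain)
```

Under this assumption the three values come out as 10, 16, and 25, matching the highlights.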

Interprocedural analysis

Yogi performs a compositional analysis. A query ⟨φ1, f, φ2⟩ asks: is it possible to execute f starting from a state in φ1 and reach a state in φ2?

Global modification analysis

Compositional May-Must analysis (POPL 2010)

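Example

[Figure: the refinement diagram from the SP example (regions A, B, C with T, region D with state = Locked, C split into ¬ρ and ρ), with the edge now labeled by a call foo(…) instead of an assume. The call edge gives rise to the query ⟨φ1, foo(…), φ2⟩.]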
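To make the role of summaries concrete, here is a small Python sketch of answering a query ⟨φ1, f, φ2⟩ from a summary database. It is a toy under stated assumptions, not Yogi's implementation: predicates are finite sets of integer states, a must summary is taken to mean that every state in its post set is reachable from some state in its pre set, and a not-may summary is taken to mean that no state in its pre set can reach any state in its post set. The names `Summary` and `answer` are invented for the illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional

State = int  # toy state space: a predicate is a finite set of integer states

@dataclass(frozen=True)
class Summary:
    kind: str                # "must" or "not-may"
    proc: str
    pre: FrozenSet[State]
    post: FrozenSet[State]

def answer(proc: str, phi1: FrozenSet[State], phi2: FrozenSet[State],
           sum_db: List[Summary]) -> Optional[bool]:
    """Try to answer <phi1, proc, phi2> from stored summaries.

    Returns True (reachable), False (unreachable), or None (no summary applies).
    """
    for s in sum_db:
        if s.proc != proc:
            continue
        # A must summary witnesses reachability if it starts inside phi1
        # and lands on at least one state of phi2.
        if s.kind == "must" and s.pre <= phi1 and s.post & phi2:
            return True
        # A not-may summary rules the query out if it covers both sides.
        if s.kind == "not-may" and phi1 <= s.pre and phi2 <= s.post:
            return False
    return None

sum_db = [
    Summary("not-may", "foo", frozenset(range(0, 10)), frozenset(range(-20, 0))),
    Summary("must", "bar", frozenset({3}), frozenset({7})),
]
print(answer("foo", frozenset({1, 2}), frozenset({-5}), sum_db))
print(answer("bar", frozenset({3, 4}), frozenset({7, 8}), sum_db))
print(answer("baz", frozenset({0}), frozenset({1}), sum_db))
```

When no stored summary applies, the query has to be analyzed, which is where a new sub-query on the callee arises.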

Empirical results

Modification analysis   Summaries   Total time (minutes)   #defects   #timeouts
yes                     yes         2160                   241        77
yes                     no          2760                   239        109
no                      yes         3180                   237        134
no                      no          3780                   236        165

The slide appears three times, highlighting 32%, 28%, and 42% in turn.

Testing

Yogi relies on tests for "cheap" reachability.

Long tests avoid several potential reachability queries, but they produce too many states and thus consume more memory.

Test thresholds: a time vs. space tradeoff.

Empirical evaluation

Test threshold   Total time (minutes)   #defects   #timeouts
250              2600                   236        92
500              2160                   241        77
1000             2359                   240        88
1500             2400                   239        89
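The time vs. space tradeoff behind a test threshold can be illustrated with a bounded test runner. This Python sketch assumes the threshold is the maximum number of steps a test is run, which the slides do not state explicitly; `run_test` and `loop_step` are hypothetical names, and the toy program is the 1000-iteration loop from the SP/CD example with the state reduced to the loop counter.

```python
def run_test(step, initial_state, threshold):
    """Run a concrete test for at most `threshold` steps.

    `step` maps a state to its successor, or to None when the program halts.
    Returns the list of visited states, which is what has to be kept in memory.
    """
    visited = [initial_state]
    state = initial_state
    for _ in range(threshold):
        state = step(state)
        if state is None:
            break
        visited.append(state)
    return visited

def loop_step(i):
    """Toy program: for (i = 0; i < 1000; i++), tracked by its counter only."""
    return i + 1 if i < 1000 else None

short_run = run_test(loop_step, 0, 250)
long_run = run_test(loop_step, 0, 1500)
print(len(short_run), len(long_run))
```

The short run stops inside the loop, so reaching the code after the loop would need a separate reachability query; the long run gets past the loop on its own but stores roughly four times as many states.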

Modeling the environment

if (DestinationString) {
    DestinationString->Buffer = SourceString;
    // DestinationString->Length should be set to the
    // length of SourceString. The line below is missing
    // from the original stub SDV function
    DestinationString->Length = strlen(SourceString);
}

if (SourceString == NULL) {
    DestinationString->Length = 0;
    DestinationString->MaximumLength = 0;
}

Issue type                   #issues
Integers used as pointers    8
Uninitialized variables      15
Type inconsistencies         9

Summary

Described optimizations implemented in Yogi

Evaluated optimizations on the WDM test suite

Empirical data used to decide which optimizations to include in Yogi

We believe that this detailed empirical study of optimizations will enable tool builders to decide which optimizations to include and how to engineer their tools

Download: http://research.microsoft.com/yogi

Bolt

Bolt: a generic framework that uses MapReduce-style parallelism to scale top-down analysis

[Figure: Bolt architecture. Labels: Bolt; Yogi (intraprocedural analysis); sumDB; Queries; Querying summaries.]

Interprocedural analysis

not-may summary: perform refinement
must summary: generate test and extend frontier

[Figure: an edge from region 1 (Ω1) to region 2, labeled CALL(foo(i, j)).]

Can we parallelize this algorithm?

Main idea

int main (int y) {
    if (*)
        x = foo(y);
    else
        x = bar(y);
    if (x < 0)
        error();
}

⟨true ?⇒_main error⟩

⟨true ?⇒_foo x < 0⟩    ⟨true ?⇒_bar x < 0⟩

Analyze queries in parallel!

Is it worth it?

[Figure: number of unanswered sub-queries (y-axis, 0 to 60) plotted against time (x-axis, 1 to 209). Driver: func_fail. Property: ToasterDispatchPnP.]

Introducing Bolt

[Figure: Bolt architecture. Labels: Bolt; Yogi (intraprocedural analysis); sumDB; Queries; Querying summaries.]

Modification to intraprocedural Yogi

[Figure: Yogi takes a query Q1 and sumDB as input and returns Q1 in one of three states:]
    Done (add summary to sumDB)
    Blocked (add new sub-queries)
    Ready (may add new sub-queries)

Lifecycle of a query

[Figure: state diagram with states ready, blocked, and done; three transitions labeled Yogi(Qi) and two transitions labeled Reduce.]

In a nutshell …

Bolt uses a MapReduce-style parallelization:
    Map stage: run each ready query on a separate instance of Yogi
    Reduce stage: resolve dependencies between queries

Terminate when the main query has an answer in sumDB
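The Map and Reduce stages can be sketched as a small Python loop. This is a toy under loud assumptions, not Bolt itself: `toy_yogi` stands in for a Yogi instance and simply reports a query as done once all of its sub-queries have summaries, `DEPENDS_ON` is an invented dependency table (in particular, the assumption that Q_bar spawns Q_roo is only a guess based on the query names in the example slides), and a thread pool replaces separate Yogi processes.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical call structure: each query lists the sub-queries it depends on.
DEPENDS_ON = {
    "Q_main": ["Q_foo", "Q_bar", "Q_baz"],
    "Q_foo": [],
    "Q_bar": ["Q_roo"],
    "Q_baz": [],
    "Q_roo": [],
}

def toy_yogi(query, sum_db):
    """Stand-in for one Yogi instance: answer `query` or report what it needs."""
    missing = [q for q in DEPENDS_ON[query] if q not in sum_db]
    if missing:
        return query, "blocked", missing
    return query, "done", f"summary({query})"

def bolt(main_query, max_workers=4):
    sum_db = {}             # query -> summary
    ready = {main_query}
    blocked = {}            # query -> sub-queries it is waiting for
    rounds = 0
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while main_query not in sum_db:
            if not ready:
                raise RuntimeError("no ready queries left")
            rounds += 1
            # Map stage: run every ready query on its own worker.
            snapshot = dict(sum_db)
            results = list(pool.map(lambda q: toy_yogi(q, snapshot), sorted(ready)))
            ready = set()
            # Reduce stage: record summaries and collect new sub-queries.
            for query, status, payload in results:
                if status == "done":
                    sum_db[query] = payload
                else:
                    blocked[query] = payload
                    ready.update(q for q in payload
                                 if q not in sum_db and q not in blocked)
            # Reduce stage: unblock queries whose sub-queries are all answered.
            for query in list(blocked):
                if all(q in sum_db for q in blocked[query]):
                    del blocked[query]
                    ready.add(query)
    return sum_db, rounds

sum_db, rounds = bolt("Q_main")
print(rounds, sorted(sum_db))
```

Each worker reads a snapshot of the summary database taken before the Map stage, so only the Reduce stage writes to it and the workers need no locking.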

Example

void main (int i) {
    if (i > 0)
        x = foo(i);
    else if (j > -10)
        x = bar(i);
    else
        x = baz(j);
    y = x + 5;
    if (y <= -5)
        error();
}

[Slide 1, Map: sumDB is initially empty. Yogi runs Q_main, shown as ⟨i > 0 ?⇒_main y ≤ −5⟩; the resulting query set is Q_main, Q_foo, Q_bar, Q_baz.]

[Slide 2, Reduce, then Map: the query set Q_main, Q_foo, Q_bar, Q_baz is passed through Reduce before the next Map stage.]

[Slide 3, Map: three Yogi instances run in parallel. The slide shows Q_foo, Q_baz, and Q_roo, with Q_bar set apart, and the summary ⟨true ¬may⇒_foo y < 7⟩ next to sumDB.]

[Slide 4, Reduce: the query set after Reduce is Q_main, Q_baz, Q_roo.]

Implementation

let bolt () =
  while (initQuery.isNotDone()) do
    worklist := Async.AsParallel [for q in worklist -> async {return yogi q}];
    reduce ();
  done;

Evaluation

Benchmarks
    45 Windows device drivers and 150 safety properties
    Picked 50 driver+property pairs taking 1000+ seconds on the sequential version

Experimental setup
    8 processor cores
    Artificial throttle: number of threads per Map stage

Evaluation

Cumulative results on 50 checks, 8-cores, 64 threads

Total time (sequential) 26 hours

Total time (parallel) 7 hours

Average speedup 3.71x

Maximum speedup 7.41x
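As a quick consistency check on these figures, the ratio of the two totals and the parallel efficiency of the best case can be computed directly. The totals on the slide are rounded to whole hours and the slide does not say whether "average speedup" is a ratio of totals or a mean over the 50 checks, so this Python sketch is only a sanity check; the variable names are illustrative.

```python
sequential_hours = 26   # total sequential time from the slide
parallel_hours = 7      # total parallel time from the slide
cores = 8

overall_speedup = sequential_hours / parallel_hours
max_speedup = 7.41      # best single check, from the slide
best_efficiency = max_speedup / cores

print(f"{overall_speedup:.2f}x, {best_efficiency:.0%}")
```

The ratio of the rounded totals comes out at about 3.71, the same value as the reported average speedup, and the best check uses the 8 cores at roughly 93% efficiency.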

In-depth analysis

Driver: toastmon (25KLoC)
Property: PnpIrpCompletion

[Figures: a sequence of charts, one per slide. The surviving slide labels are 2, 4, 8, 8, 16, 32, and 64, presumably the number of threads per Map stage.]

Observations

Bolt achieves an average speedup of 3.7x and a maximum speedup of 7.4x

Bolt verified drivers that were previously out of reach for Yogi and other tools

Summary

Bolt: a general framework for parallelizing a large class of program analyses

A may-must instantiation of Bolt based on Yogi
Evaluation on a large set of Windows device drivers
Speedups up to 7.4x on 8 cores
Solving problems the sequential version cannot solve

[Figure: Intraprocedural analysis + Bolt → Parallel interprocedural analysis]

Recap and conclusions…

So what did we learn, and what next?

Yogi: main ideas

Combining verification and testing is interesting in theory and practice

Theory:
▪ map out the “space” between testing and verification. Can use tests to do proofs, if we observe how the tests run!
▪ interesting interplay between “may” and “must”; can write beautiful rules that bring out the duality between “may” and “must” (see POPL 2010 paper)
▪ can write down interprocedural may-must algorithms using Map-Reduce parallelism

Practice:
▪ can verify properties by running tests, reducing load on the theorem prover
▪ can use tests and must-alias analysis to come up with hypotheses for doing proofs
▪ can persist must and may summaries and scale precise analysis to large pieces of code, and use parallelism to scale even more!

Future directions

Integrate with interpolants/abduction. So far Yogi has concentrated on “where to refine”, and not much on “how to refine” or “which predicates to use to refine” (CAV ‘12)

How much parallelism is inherently there for analyzing very large programs? Can parallelism help?

PLDI 2012 tutorial
http://research.microsoft.com/yogi/pldi2012.aspx
{adityan, sriram}@microsoft.com