Open Source Verification under a Cloud (OpenCert 2010)
Open Source Verification under a Cloud
Peter Breuer (Dept. Comp. Sci., Univ. Birmingham, UK)
Simon Pickin (Dpto. Ing. Tel., U. Carlos III de Madrid, Spain)
What we’ve done
Started with monolithic analysis software
Restructured it into
  a database plus verification problem server
  an ad hoc network of remote verification solver clients
Thrown it at verification problems tackled previously
  checking Linux kernel code for SMP locking errors
  just over 1,000,000 lines of C code/assembler
  look for possible take of spinlock in dangerous context
  possible double-take of spinlock before release, etc.
  took about 9,000,000 s of system time
Outcome is intrinsically 50× as slow as the original
  takes ~100 clients to go 2× as fast
  but it scales well!
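The slowdown and scaling claims can be sanity-checked with a little arithmetic. A minimal sketch, assuming perfect parallel scaling and taking the 9,000,000 s total work figure reported later in the talk:

```python
# Back-of-envelope model of the cloud's scaling (assumes perfect
# parallelism: no coordination overhead beyond the fixed 50x factor).
TOTAL_WORK_S = 9_000_000               # total system time of the cloud run
SLOWDOWN = 50                          # intrinsic overhead vs the monolith
MONOLITH_S = TOTAL_WORK_S / SLOWDOWN   # 180,000 s for the original tool

def wall_clock(n_clients: int) -> float:
    """Wall-clock time with n clients sharing the total work."""
    return TOTAL_WORK_S / n_clients

def speedup(n_clients: int) -> float:
    """Speedup relative to the monolithic analysis."""
    return MONOLITH_S / wall_clock(n_clients)

print(speedup(100))           # 2.0 -- "~100 clients to go 2x as fast"
print(wall_clock(500) / 3600) # 5.0 h -- consistent with "< 6h for 500 clients"
```

Under these assumptions the model reproduces both the "100 clients, 2×" figure and the "< 6 h for 500 clients" figure from the summary slide.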
Why we’ve done it
For a future vision
formal methods specialists contribute standard analyses
developers upload code for analysis to the cloud
unskilled open source supporters help their favourite project
  run client solvers at home on their spare CPU cycles
specialists report regressions and new errors
  requires humans to eliminate false positives
Hopefully we get a positive reinforcement loop
  more developers develop formal methods skills
  more formal methods people understand coding problems
  more unskilled contributors develop skills
Perhaps to run faster!
500 clients analyse 1,000,000 LOC in 3h
What our Ad Hoc Volunteer Cloud of Solvers Looked Like
[Diagram: volunteer solver clients on the Internet send queries such as '3x < 1+y?' and 'p∧q ⇒ q→p?' through a firewall to a query store backed by a high-speed disk DB server]
How the Calculation is Organised
P  Parse produces syntax tree T
   1,000,000 LOC produces ~10,000,000 syntax tree nodes
A  Decorate T with post-conditions post to get T^post
   e.g. 'x ≤ 1' where x counts locks taken
H  Add evaluation [post ∧ d] for defect d to get T^post_eval
   e.g. x ≤ 1 ∧ (x ≥ 1 if p = lock(); false otherwise)
L  Where there's a nonzero evaluation, flag the point p
X  Certify intermediates T^post_eval and defect list L
All via a customized approximating logic: Symbolic Approximation
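As a toy illustration of steps P through L, and not the authors' tool (the statement representation, the lock-counting rule and the defect condition "x ≥ 1 at a lock()" are all simplifications assumed here), one can propagate a lock counter x through a straight-line statement list and flag the points where the defect condition can hold:

```python
# Toy analysis: x counts spinlocks taken; the defect d is
# "a lock() is reached while x >= 1" (possible double-take).
def analyse(stmts):
    """Return (post-condition x at each point, flagged defect points)."""
    x = 0                        # locks held on entry (the precondition)
    post, defects = [], []
    for p, stmt in enumerate(stmts):
        if stmt == "lock":
            if x >= 1:           # [post ∧ d] evaluates nonzero here
                defects.append(p)
            x += 1
        elif stmt == "unlock":
            x -= 1
        post.append(x)           # decorate point p with its post-condition
    return post, defects

post, defects = analyse(["lock", "work", "lock", "unlock", "unlock"])
print(defects)   # [2]: second lock taken while one is already held
```

The real analysis works over full syntax trees with a symbolic, approximating logic rather than concrete counters, but the decorate-then-evaluate shape is the same.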
Rearranging the Calculation for a Cloud
Parsing P is currently done monolithically first of all
Logical analysis is done piecewise at each syntax node p:
  A = ◦_p A_p
Ditto checking:
  H = ◦_p H_p
Organised as work-units of analysis and checking:
  A ◦ H = ◦_p (A_p ◦ H_p)
Subject to dependencies induced by the syntax p = Π(p_i)
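The piecewise decomposition can be sketched as one work-unit per syntax node, scheduled bottom-up over the tree. A sketch under assumptions (a dict-of-children tree shape invented here; the real system hands these units to remote solver clients rather than running them in one loop):

```python
# Each syntax node p gets one work-unit (A_p then H_p); a unit may only
# run once the units for p's children p_i have completed, since the
# node p = Π(p_i) depends on its constituents.
def schedule(tree, node, order):
    for child in tree.get(node, []):
        schedule(tree, child, order)   # satisfy dependencies first
    # here a client would run A_p followed by H_p on this node
    order.append(node)

tree = {"if": ["cond", "then", "else"], "then": ["call"]}
order = []
schedule(tree, "if", order)
print(order)   # children precede parents: cond, call, then, else, if
```

The dependency order is what lets independent subtrees be analysed concurrently by different clients.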
Certification
Store the evidence (intermediate and final results) T^post_eval
Store parse, logic configuration, defect definitions
Provide list of defects L and certificate X
Certificate contains signatures of all items above
The idea is that a doubter can ask to see the data
Any part of the computation can then be repeated ...
  ... to confirm or refute what the certificate claims
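One way to realise such a certificate, sketched below; the talk says only that the certificate contains signatures of the stored items, so SHA-256 digests stand in here for whatever signature scheme was actually used, and the store contents are placeholders:

```python
import hashlib

def digest(item: bytes) -> str:
    """Stand-in 'signature' for a stored item."""
    return hashlib.sha256(item).hexdigest()

def certify(store: dict) -> dict:
    """Certificate X = one signature per stored item."""
    return {name: digest(data) for name, data in store.items()}

def audit(store: dict, cert: dict, name: str) -> bool:
    """A doubter asks to see item `name` and rechecks its signature."""
    return digest(store[name]) == cert[name]

store = {"parse": b"...syntax tree T...",
         "T_post_eval": b"...decorated tree...",
         "defects": b"...list L..."}
cert = certify(store)
print(audit(store, cert, "defects"))   # True: claim confirmed
store["defects"] = b"tampered"
print(audit(store, cert, "defects"))   # False: claim refuted
```

A failed audit tells the doubter exactly which stored item to recompute, which is what makes piecewise repetition of the computation possible.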
Experimental Data
746,844 function definitions in >1,000,000 LOC
  many turned out to be 'static inline' duplications
Reduced to 78,619 syntactically different definitions
Clients each initially given 10m to analyse 1 function
  abort after 10m and try another
  only 373 tasks needed longer than 10m
Time-limit on task raised to 15m, etc.
  129 functions needed longer than 1h
  these were split up syntax-wise and checkpointed every 5m
24 functions remained unanalysed at experiment end
  complexity explosion in the logic accounts for most
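The time-limit policy above (try each task for 10m, abort and retry in a later round with a raised limit, split whatever survives every round) can be sketched as follows. A sketch only: task durations are given up front here, whereas the real clients discover them by running the solver:

```python
# Escalating-timeout scheduling: tasks that exceed the current limit
# are aborted and retried in a later round with a larger limit.
def run_rounds(durations, limits):
    pending = dict(durations)          # task -> true running time (s)
    finished = {}
    for limit in limits:               # e.g. the 10m round, then 15m, ...
        for task, t in list(pending.items()):
            if t <= limit:
                finished[task] = t     # completed within this round
                del pending[task]
            # else: aborted at `limit` seconds, retried next round
    return finished, set(pending)      # leftovers get split / checkpointed

durations = {"f1": 120, "f2": 540, "f3": 800, "f4": 4000}
finished, leftover = run_rounds(durations, limits=[600, 900])
print(sorted(finished))   # ['f1', 'f2', 'f3']
print(leftover)           # {'f4'}: needs > 15m, so split syntax-wise
```

The point of the escalation is that short tasks (the vast majority, per the plots below) never wait behind long ones.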
Duplicated Function Definitions are a Problem!
[Log-log plot: number of unique definitions vs number of instances, for top-level definitions with multiple instances (Σ xy = 746,844)]
Time taken per analysis task
[Log-log plot: number of tasks vs time in seconds]
Time taken per analysis task (cumulative count)
[Plot: cumulative percentage of tasks vs time in seconds, log time axis]
Percentage of total time taken per analysis task (cumulative)
[Plot: cumulative percentage of total time vs time taken per task in seconds, log time axis]
Practical Difficulties
Populating the DB remotely was going to take a month!
  parsing/loading locally took about 24h
Remote DB transactions can take seconds each
  beaten by caching (95% hits) of DB queries on clients
  a solver attempts 150-500 queries/s (90% reads)
  5-10 queries/s escape the caches to the net
Solver CPU loading is only a few %, peaking occasionally
DB server limits the transaction rate
  limit at 100 queries/s for 3GB RAM, 1.8GHz 64-bit Athlon
  needed to up RAM to ≫ DB size to avoid disk I/O limits
  CPU loaded to about 15% by each active thread
Client/net breakdowns mean only a 25-50% operating level
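The client-side query caching that rescued the transaction rate can be sketched as a simple memoising read cache. A sketch: the query strings are taken from the diagram slide, and the remote DB lookup is faked with a local function:

```python
# Memoising read-cache in front of the remote DB: repeated (strongly
# localized) read queries are answered locally, so only the first
# occurrence of each query crosses the network.
class CachingClient:
    def __init__(self, db):
        self.db = db
        self.cache = {}
        self.hits = self.misses = 0

    def read(self, query):
        if query in self.cache:
            self.hits += 1               # answered locally
            return self.cache[query]
        self.misses += 1                 # this one escapes to the net
        self.cache[query] = self.db(query)
        return self.cache[query]

db = lambda q: f"answer({q})"            # stand-in for a remote DB read
client = CachingClient(db)
for q in ["3x<1+y?"] * 19 + ["p/\\q=>q->p?"]:   # strongly localized mix
    client.read(q)
print(client.hits, client.misses)        # 18 2: a 90% hit rate
```

With a ~95% hit rate as measured, 150-500 attempted queries/s collapse to the 5-10/s that the DB server actually sees from each client.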
Summary
Prototype verification cloud software
  based on symbolic approximation, a customized approximating logic
Computation handled incrementally and piecewise
  intermediate results retained for accountability
  any part repeated/duplicated if challenged
Produces a certificate and a list of defects
Experiment on 1,000,000 LOC of C (Linux kernel)
  9,000,000 s of (normalized 1GHz CPU) system time
  < 24h for 100 clients; < 6h for 500 clients
  provided one can find DB servers to handle 2500 queries/s!
  (fortunately queries are strongly localized)