KLEE: Unassisted and Automac Generaon of High‐Coverage Tests...

19
KLEE: Unassisted and Automa2c Genera2on of High‐Coverage Tests for Complex Systems Programs Cris2an Cadar, Daniel Dunbar, Dawson Engler Stanford University Presented by Adam Bergstein November 28, 2011

Transcript of KLEE: Unassisted and Automac Generaon of High‐Coverage Tests...

Page 1: KLEE: Unassisted and Automac Generaon of High‐Coverage Tests …trj1/cse598-f11/slides/klee.pdf · KLEE > Performance and Environment • Two of the biggest challenges were performance

KLEE:UnassistedandAutoma2cGenera2onofHigh‐Coverage

TestsforComplexSystemsPrograms

Cris2anCadar,DanielDunbar,DawsonEnglerStanfordUniversity

PresentedbyAdamBergsteinNovember28,2011

Page 2: KLEE: Unassisted and Automac Generaon of High‐Coverage Tests …trj1/cse598-f11/slides/klee.pdf · KLEE > Performance and Environment • Two of the biggest challenges were performance

Outline•  Background

–  Symbolicexecu2on–  Constraintsandsolvers–  Sinks/sinksources–  Abstractdomainandconcre2za2on–  Systemmodeling

•  KLEE–  Mainconcepts–  Overallprocess–  PrecisionfromLLVMandbytecode–  No2onofstates–  Constraintsandpaths–  PerformanceandEnvironment–  Results

•  MyThoughts•  Ques2ons

Page 3: KLEE: Unassisted and Automac Generaon of High‐Coverage Tests …trj1/cse598-f11/slides/klee.pdf · KLEE > Performance and Environment • Two of the biggest challenges were performance

Background•  Symbolicexecu2on–  Simula2onthatapproximatesvariablevaluesbyusingsymbols

–  Opera2onsonvariablesconstrainthesymbols–  Usedtoreasonaboutpossiblevaluesthatcausecertaincondi2onsinaprogram•  Isasymbolicvalueintherangeofvaluesthatcausesomethingtooccur?

–  hXp://www.stat.uga.edu/stat_files/billard/tr_symbolic.pdf•  Constraintsandsolvers–  Constraintsarecollectedfactsaboutaprogramthatdefineboundsonpossibleexecu2onatspecificpointsinaprogram

–  Solversdeterminethepossibilityofconcretevaluesbasedontheconstraints

–  Certainconcretevaluescancondi2onallycauseprogramstobehaveinundesirableways

Page 4: KLEE: Unassisted and Automac Generaon of High‐Coverage Tests …trj1/cse598-f11/slides/klee.pdf · KLEE > Performance and Environment • Two of the biggest challenges were performance

Background

•  Sinksandsinksources–  Sinksiden2fymeaningfulopera2onswithinthecode–  Sourcesiden2fythedataoriginsthatcaninfluencesinks

•  Abstractdomainandconcre2za2on–  Definingtherangeofallpossiblevaluesforvariables–  Concre2za2onmapsactualvariablevaluesfromrangesofpossiblevalues

•  Systemmodeling–  “Approxima2ng”howasystembehaveswhenitruns– Wehavelookedatdifferentwaystorepresentsystems,likeCFGs,summaryfunc2ons,etc

Page 5: KLEE: Unassisted and Automac Generaon of High‐Coverage Tests …trj1/cse598-f11/slides/klee.pdf · KLEE > Performance and Environment • Two of the biggest challenges were performance

KLEE>MainConcepts•  Useofsta2canalysistodetermineiftherearepossible

concretevaluesthatcausevulnerabili2esintheprogram•  Simulateaprogramandleveragesymbolicexecu2on•  Buildconstraintsandmaintainaseriesofstatesthroughoutthe

simula2on–  Statesdefineeachuniquepaththroughouttheprogram

•  Leverageasolvertodeterminepossibili2eswithintheprogrambasedonconstraints–  Returnconcretevaluesifsomethingwassolvable

•  Documentareasofthecodethathaveanypossiblevaluesthatcancausevulnerabili2es–  Basedonasetofpossibledangerousopera2ons

•  “Basedontheconstraints(stateofuniquepath)atthe2meIgettothislineofcodewithapoten2allydangerousopera2on,isthereanypossiblevaluethatcancausethislineofcodetobedangerous?”

Page 6: KLEE: Unassisted and Automac Generaon of High‐Coverage Tests …trj1/cse598-f11/slides/klee.pdf · KLEE > Performance and Environment • Two of the biggest challenges were performance

KLEE>MainConcepts•  KLEEbeginsbyconstruc2ngunconstrainedvariablesforargumentsinto

state–  Ini2alconstraintsaresetbasedon‐‐sym‐argswhenrunningKLEE–  Definesnumberofargumentsandnumberofcharactersperargument–  Setsini2alconstraintssoopera2onisnottotallyunbounded

•  Analysissimulateseachinstruc2onandrunseachstateperinstruc2on–  Schedulingalgorithmtoselectwhichstatetoanalyzefirst–  Collectmoreconstraints,updatethesymbolicvaluesinthestate–  Whenreachingapoten2alopera2onthatcontainsanexitorerror,lookat

thepathcondi4on•  Pathcondi2onsarethecollec2onofconstraintsthatarevalidforthat

specificpath–  Apathcondi2onisuniqueforeachstatesinceapathcaninfluencethe

symbolicvaluesonapathbypathbasis–  Onabranchstatement,astateisclonedforpossiblepaths–  Thepathcondi2onisupdatedperstate,tomimicuniquepaths

•  Determiningmaliciousconcretevaluesareboundedbythepathcondi2on–  ThesearesenttoSTPsolver–  Isthereapossiblesetofvaluesthatcancauseanissue?

Page 7: KLEE: Unassisted and Automac Generaon of High‐Coverage Tests …trj1/cse598-f11/slides/klee.pdf · KLEE > Performance and Environment • Two of the biggest challenges were performance

KLEE>OverallProcess•  CompileprogramintobytecodewithLLVM•  RunKLEEwithdefinednumberofargumentsandini2alcharacter

boundconstraintsofarguments–  Assistswithabstractdomaintomakeitbounded

•  Simulatetheprogram,symbolicexecu2on–  Collectconstraintsonvariables,updatestate

•  Forbranches,determinewhatispossiblebasedonconstraints–  Passconstraintstosolvertoseewhatbranchispossible–  Clonestateforallpossiblebranches,updatepathcondi2onsineach

state–  Similartomay/mustanalysis

•  Forpoten2aldangerousopera2ons,iden2fyanyconcretevaluesthatcausedangerousopera2ons–  Passconstraintstosolver–  Returnanypossiblevaluesthatcancauseundesiredresults

•  Usefulforboundschecking,pointerdereferencing,asser2ons

Page 8: KLEE: Unassisted and Automac Generaon of High‐Coverage Tests …trj1/cse598-f11/slides/klee.pdf · KLEE > Performance and Environment • Two of the biggest challenges were performance

KLEE>PrecisionfromLLVMbytecode

•  Theconstraintsareveryprecisebecausethebytecoderepresentsbit‐levelaccuracy

•  Thisreducestheapproxima2onusedinmodelingtherunningapplica2on

•  Thisprecisionmakesthesolvermoreeffec2veindeterminingpossiblevalues

Page 9: KLEE: Unassisted and Automac Generaon of High‐Coverage Tests …trj1/cse598-f11/slides/klee.pdf · KLEE > Performance and Environment • Two of the biggest challenges were performance

KLEE>No2onofStates

•  Eachstaterepresentsoneuniquepathintheprogramatagivenpointinrun2me

•  Needtomaintainsymbolicvaluesbystateatthegiveninstruc2on

•  Maintainsregisterfile,stack,heap,programcounter–  Instruc2onpointerismaintainedbyKLEE

•  Maintainconstraintsofthepathcondi2onsforusewithinthesolver–  Statesmaybeac2veorinac2veforagiveninstruc2onbasedonpathcondi2onandconstraints

Page 10: KLEE: Unassisted and Automac Generaon of High‐Coverage Tests …trj1/cse598-f11/slides/klee.pdf · KLEE > Performance and Environment • Two of the biggest challenges were performance

KLEE>ConstraintsandPaths

•  Thegoalistofindconcretevaluesthatcausedangerousopera2ons

•  Forthesolvertobeeffec2veinfindingconcretevalues,theabstractdomainneedstobereduced

•  Pathcondi2onssetconstraintsonvariablevaluesofthespecificpath–  i<0,j==10,etc

•  Symbolicvaluescreatesitsownconstraintsonvariables–  i=(2xi)+10–  j=j2

•  Thecombina2onofsymbolicvaluesandpathcondi2onssetboundsforthesolvertodeterminepossiblevaluesbasedonstateforagiveninstruc2on

Page 11: KLEE: Unassisted and Automac Generaon of High‐Coverage Tests …trj1/cse598-f11/slides/klee.pdf · KLEE > Performance and Environment • Two of the biggest challenges were performance

KLEE>PerformanceandEnvironment

•  Twoofthebiggestchallengeswereperformanceandmodelingopera2onsinvolvingtheenvironment

•  Thenumberofstatescangrowrapidly–  Tocombatit,KLEEusesasharedmemorymappingbetweenstates

•  Useofcompiler‐liketrickstomakeproblemseasierforthesolver

•  EnvironmentcallsaremodeledbyCcode,toreflecttherun2mestate–  UseofuClibctomimicsystemcalls–  KLEEdevelopershavesetupothercustommodelstoreflectopera2onsinvolvingtheenvironment

Page 12: KLEE: Unassisted and Automac Generaon of High‐Coverage Tests …trj1/cse598-f11/slides/klee.pdf · KLEE > Performance and Environment • Two of the biggest challenges were performance

KLEE>Results

•  Lookedatpackageswhichsupportedcommoncommand‐lineprogramslikelsandtr

•  Averageof90%codecoverage•  HighlighteddifferencesbetweeninCoreU2lsandBusybox– Simulatedthesamecommandsandfounddifferencesbetweenthetwopackages

•  FounderrorsinbothCoreU2lsandBusybox,respec2vely

Page 13: KLEE: Unassisted and Automac Generaon of High‐Coverage Tests …trj1/cse598-f11/slides/klee.pdf · KLEE > Performance and Environment • Two of the biggest challenges were performance
Page 14: KLEE: Unassisted and Automac Generaon of High‐Coverage Tests …trj1/cse598-f11/slides/klee.pdf · KLEE > Performance and Environment • Two of the biggest challenges were performance
Page 15: KLEE: Unassisted and Automac Generaon of High‐Coverage Tests …trj1/cse598-f11/slides/klee.pdf · KLEE > Performance and Environment • Two of the biggest challenges were performance
Page 16: KLEE: Unassisted and Automac Generaon of High‐Coverage Tests …trj1/cse598-f11/slides/klee.pdf · KLEE > Performance and Environment • Two of the biggest challenges were performance

DifferencesbetweenCoreU2lsandBusybox

Page 17: KLEE: Unassisted and Automac Generaon of High‐Coverage Tests …trj1/cse598-f11/slides/klee.pdf · KLEE > Performance and Environment • Two of the biggest challenges were performance

MyThoughts

•  Therearealotofsimilari2esfromwhatwehavediscussedinclass–  PHPpaperusedsinksandsinksourceswithquerystatements–  Thispaperlooksforopera2onslikepointers,asser2ons,prinl,

andload/stores–  Symbolicexecu2onlikethePHPpaper–  May/mustanalysisforlookingatpoten2alpaths–  Constraintsanduseofasolver

•  Constraintsdefinedbysymbolicanalysisandpaths–  Canbeconsideredcontextandflowsensi2ve

•  Createsnewstatesbasedonpathbranches•  Simulatesfunc2oncallsperstatebasedonthecurrentstatevalues

–  Concre2za2onbasedonsymbolicvaluesandpathcondi2ons

Page 18: KLEE: Unassisted and Automac Generaon of High‐Coverage Tests …trj1/cse598-f11/slides/klee.pdf · KLEE > Performance and Environment • Two of the biggest challenges were performance

MyThoughts•  Therearesomedifferencesbetweentheapproaches– Nomen2onofacontrolflowgraph,purelyasimula2ontool

–  Theirgoalisonlytofindconcretevaluesbasedonstates,sotherearenomeetorjoinopera2ons•  Theyarelookingatspecificstatesandderivingconcretevaluesthataredangerous

•  Theyarenotapproxima2ngsystemfunc2onality

– Othersta2canalysisusedapproxima2onbecauseprecisionisexpensive•  Iamcurioushowlargethetestedapplica2onswere•  Authorsclaimthatthecodewascomplicatedbutmyassump2onisthattherewasnotalotofcode

Page 19: KLEE: Unassisted and Automac Generaon of High‐Coverage Tests …trj1/cse598-f11/slides/klee.pdf · KLEE > Performance and Environment • Two of the biggest challenges were performance

Ques2ons

WhichUniversityhastheHardTimesCaféshowntothelem?