KLEE: Unassisted and Automac Generaon of High‐Coverage Tests...
Transcript of KLEE: Unassisted and Automac Generaon of High‐Coverage Tests...
KLEE:UnassistedandAutoma2cGenera2onofHigh‐Coverage
TestsforComplexSystemsPrograms
Cris2anCadar,DanielDunbar,DawsonEnglerStanfordUniversity
PresentedbyAdamBergsteinNovember28,2011
Outline• Background
– Symbolicexecu2on– Constraintsandsolvers– Sinks/sinksources– Abstractdomainandconcre2za2on– Systemmodeling
• KLEE– Mainconcepts– Overallprocess– PrecisionfromLLVMandbytecode– No2onofstates– Constraintsandpaths– PerformanceandEnvironment– Results
• MyThoughts• Ques2ons
Background• Symbolicexecu2on– Simula2onthatapproximatesvariablevaluesbyusingsymbols
– Opera2onsonvariablesconstrainthesymbols– Usedtoreasonaboutpossiblevaluesthatcausecertaincondi2onsinaprogram• Isasymbolicvalueintherangeofvaluesthatcausesomethingtooccur?
– hXp://www.stat.uga.edu/stat_files/billard/tr_symbolic.pdf• Constraintsandsolvers– Constraintsarecollectedfactsaboutaprogramthatdefineboundsonpossibleexecu2onatspecificpointsinaprogram
– Solversdeterminethepossibilityofconcretevaluesbasedontheconstraints
– Certainconcretevaluescancondi2onallycauseprogramstobehaveinundesirableways
Background
• Sinksandsinksources– Sinksiden2fymeaningfulopera2onswithinthecode– Sourcesiden2fythedataoriginsthatcaninfluencesinks
• Abstractdomainandconcre2za2on– Definingtherangeofallpossiblevaluesforvariables– Concre2za2onmapsactualvariablevaluesfromrangesofpossiblevalues
• Systemmodeling– “Approxima2ng”howasystembehaveswhenitruns– Wehavelookedatdifferentwaystorepresentsystems,likeCFGs,summaryfunc2ons,etc
KLEE>MainConcepts• Useofsta2canalysistodetermineiftherearepossible
concretevaluesthatcausevulnerabili2esintheprogram• Simulateaprogramandleveragesymbolicexecu2on• Buildconstraintsandmaintainaseriesofstatesthroughoutthe
simula2on– Statesdefineeachuniquepaththroughouttheprogram
• Leverageasolvertodeterminepossibili2eswithintheprogrambasedonconstraints– Returnconcretevaluesifsomethingwassolvable
• Documentareasofthecodethathaveanypossiblevaluesthatcancausevulnerabili2es– Basedonasetofpossibledangerousopera2ons
• “Basedontheconstraints(stateofuniquepath)atthe2meIgettothislineofcodewithapoten2allydangerousopera2on,isthereanypossiblevaluethatcancausethislineofcodetobedangerous?”
KLEE>MainConcepts• KLEEbeginsbyconstruc2ngunconstrainedvariablesforargumentsinto
state– Ini2alconstraintsaresetbasedon‐‐sym‐argswhenrunningKLEE– Definesnumberofargumentsandnumberofcharactersperargument– Setsini2alconstraintssoopera2onisnottotallyunbounded
• Analysissimulateseachinstruc2onandrunseachstateperinstruc2on– Schedulingalgorithmtoselectwhichstatetoanalyzefirst– Collectmoreconstraints,updatethesymbolicvaluesinthestate– Whenreachingapoten2alopera2onthatcontainsanexitorerror,lookat
thepathcondi4on• Pathcondi2onsarethecollec2onofconstraintsthatarevalidforthat
specificpath– Apathcondi2onisuniqueforeachstatesinceapathcaninfluencethe
symbolicvaluesonapathbypathbasis– Onabranchstatement,astateisclonedforpossiblepaths– Thepathcondi2onisupdatedperstate,tomimicuniquepaths
• Determiningmaliciousconcretevaluesareboundedbythepathcondi2on– ThesearesenttoSTPsolver– Isthereapossiblesetofvaluesthatcancauseanissue?
KLEE>OverallProcess• CompileprogramintobytecodewithLLVM• RunKLEEwithdefinednumberofargumentsandini2alcharacter
boundconstraintsofarguments– Assistswithabstractdomaintomakeitbounded
• Simulatetheprogram,symbolicexecu2on– Collectconstraintsonvariables,updatestate
• Forbranches,determinewhatispossiblebasedonconstraints– Passconstraintstosolvertoseewhatbranchispossible– Clonestateforallpossiblebranches,updatepathcondi2onsineach
state– Similartomay/mustanalysis
• Forpoten2aldangerousopera2ons,iden2fyanyconcretevaluesthatcausedangerousopera2ons– Passconstraintstosolver– Returnanypossiblevaluesthatcancauseundesiredresults
• Usefulforboundschecking,pointerdereferencing,asser2ons
KLEE>PrecisionfromLLVMbytecode
• Theconstraintsareveryprecisebecausethebytecoderepresentsbit‐levelaccuracy
• Thisreducestheapproxima2onusedinmodelingtherunningapplica2on
• Thisprecisionmakesthesolvermoreeffec2veindeterminingpossiblevalues
KLEE>No2onofStates
• Eachstaterepresentsoneuniquepathintheprogramatagivenpointinrun2me
• Needtomaintainsymbolicvaluesbystateatthegiveninstruc2on
• Maintainsregisterfile,stack,heap,programcounter– Instruc2onpointerismaintainedbyKLEE
• Maintainconstraintsofthepathcondi2onsforusewithinthesolver– Statesmaybeac2veorinac2veforagiveninstruc2onbasedonpathcondi2onandconstraints
KLEE>ConstraintsandPaths
• Thegoalistofindconcretevaluesthatcausedangerousopera2ons
• Forthesolvertobeeffec2veinfindingconcretevalues,theabstractdomainneedstobereduced
• Pathcondi2onssetconstraintsonvariablevaluesofthespecificpath– i<0,j==10,etc
• Symbolicvaluescreatesitsownconstraintsonvariables– i=(2xi)+10– j=j2
• Thecombina2onofsymbolicvaluesandpathcondi2onssetboundsforthesolvertodeterminepossiblevaluesbasedonstateforagiveninstruc2on
KLEE>PerformanceandEnvironment
• Twoofthebiggestchallengeswereperformanceandmodelingopera2onsinvolvingtheenvironment
• Thenumberofstatescangrowrapidly– Tocombatit,KLEEusesasharedmemorymappingbetweenstates
• Useofcompiler‐liketrickstomakeproblemseasierforthesolver
• EnvironmentcallsaremodeledbyCcode,toreflecttherun2mestate– UseofuClibctomimicsystemcalls– KLEEdevelopershavesetupothercustommodelstoreflectopera2onsinvolvingtheenvironment
KLEE>Results
• Lookedatpackageswhichsupportedcommoncommand‐lineprogramslikelsandtr
• Averageof90%codecoverage• HighlighteddifferencesbetweeninCoreU2lsandBusybox– Simulatedthesamecommandsandfounddifferencesbetweenthetwopackages
• FounderrorsinbothCoreU2lsandBusybox,respec2vely
DifferencesbetweenCoreU2lsandBusybox
MyThoughts
• Therearealotofsimilari2esfromwhatwehavediscussedinclass– PHPpaperusedsinksandsinksourceswithquerystatements– Thispaperlooksforopera2onslikepointers,asser2ons,prinl,
andload/stores– Symbolicexecu2onlikethePHPpaper– May/mustanalysisforlookingatpoten2alpaths– Constraintsanduseofasolver
• Constraintsdefinedbysymbolicanalysisandpaths– Canbeconsideredcontextandflowsensi2ve
• Createsnewstatesbasedonpathbranches• Simulatesfunc2oncallsperstatebasedonthecurrentstatevalues
– Concre2za2onbasedonsymbolicvaluesandpathcondi2ons
MyThoughts• Therearesomedifferencesbetweentheapproaches– Nomen2onofacontrolflowgraph,purelyasimula2ontool
– Theirgoalisonlytofindconcretevaluesbasedonstates,sotherearenomeetorjoinopera2ons• Theyarelookingatspecificstatesandderivingconcretevaluesthataredangerous
• Theyarenotapproxima2ngsystemfunc2onality
– Othersta2canalysisusedapproxima2onbecauseprecisionisexpensive• Iamcurioushowlargethetestedapplica2onswere• Authorsclaimthatthecodewascomplicatedbutmyassump2onisthattherewasnotalotofcode
Ques2ons
WhichUniversityhastheHardTimesCaféshowntothelem?