Founda’ons of Soware Engineeringckaestne/17313/2017/20170907-metrics.pdf · Visual Studio since...
Transcript of Founda’ons of Soware Engineeringckaestne/17313/2017/20170907-metrics.pdf · Visual Studio since...
Founda'onsofSo,wareEngineering
Lecture4:MeasurementClaireLeGoues
1
Administrivia
• HW1aduetonight– Rememberteampolicy
• Reminderre:quizzes/readings
2
LearningGoals• Usemeasurementsasadecisiontooltoreduceuncertainty
• Understanddifficultyofmeasurement;discussvalidityofmeasurements
• ExamplesofmetricsforsoIwarequaliKesandprocess
• UnderstandlimitaKonsanddangersofdecisionsandincenKvesbasedonmeasurements
3
AboutYouCouldexplainCycloma'cComplexity
No
Vaguely
Yes
4
CaseStudy:TheMaintainabilityIndex
5
VisualStudiosince2007
“MaintainabilityIndexcalculatesanindexvaluebetween0and100thatrepresentstherelaKveeaseofmaintainingthecode.AhighvaluemeansbeGermaintainability.ColorcodedraKngscanbeusedtoquicklyidenKfytroublespotsinyourcode.Agreenra'ngisbetween20and100andindicatesthatthecodehasgoodmaintainability.Ayellowra'ngisbetween10and19andindicatesthatthecodeismoderatelymaintainable.Aredra'ngisara'ngbetween0and9andindicateslowmaintainability.”
6
• Indexbetween0and100represenKngtherelaKveeaseofmaintainingthecode.
• HigherisbeZer.Colorcodedbynumber:– Green:between20and100– Yellow:between10and19– Red:between0and9.FromVisualStudio,since2007
7
Designra'onal(fromMSDNblog)
• "WenoKcedthatascodetendedtoward0itwasclearlyhardtomaintaincodeandthedifferencebetweencodeat0andsomenegaKvevaluewasnotuseful."
• "Thedesirewasthatiftheindexshowedredthenwewouldbesayingwithahighdegreeofconfidencethattherewasanissuewiththecode."
hZp://blogs.msdn.com/b/codeanalysis/archive/2007/11/20/maintainability-index-range-and-meaning.aspx8
TheIndex
MaintainabilityIndex=MAX(0,(171– 5.2*log(HalsteadVolume)– 0.23*(CyclomaKcComplexity)– 16.2*log(LinesofCode) )*100/171)
9
LinesofCode• Easytomeasure
>wc–lfile1file2…
LOC projects
450 ExpressionEvaluator
2.000 Sudoku,FuncKonalGraphLibrary
40.000 OpenVPN
80-100.000 BerkeleyDB,SQLlight
150-300.000 Apache,HyperSQL,Busybox,Emacs,Vim,ArgoUML
500-800.000 gimp,glibc,mplayer,php,SVN
1.600.000 gcc
6.000.000 Linux,FreeBSD
45.000.000 WindowsXP10
NormalizingLinesofCode• Ignorecommentsandemptylines• Ignorelines<2characters• PreZyprintsourcecodefirst• Countstatements(logicallinesofcode)
for(i=0;i<100;i+=1)print("hello");/*Howmanylinesofcodeisthis?*/
/*Howmanylinesofcodeisthis?*/for(
i=0;i<100;i+=1
){print("hello");
}11
Normaliza'onperLanguageLanguage Statementfactor
(produc'vity)Linefactor
C 1 1
C++ 2.5 1
Fortran 2 0.8
Java 2.5 1.5
Perl 6 6
Smalltalk 6 6.25
Python 6.5
Source:hZp://www.codinghorror.com/blog/2005/08/are-all-programming-languages-the-same.htmlu.a.
12
HalsteadVolume
• IntroducedbyMauriceHowardHalsteadin1977
• HalsteadVolume=numberofoperators/operands*log2(numberofdisKnct operators/operands)
• Approximatessizeofelementsandvocabulary
13
HalsteadVolume-Example
main(){inta,b,c,avg;scanf("%d%d%d",&a,&b,&c);avg=(a+b+c)/3;print("avg=%d",avg);
}
Operators/Operands:main,(),{},int,a,b,c,avg,scanf,(),"…",&,a,&,b,&,c,avg,=,a,+,b,+,c,(),/,3,
print,(),"…",avg14
Cycloma'cComplexity• ProposedbyMcCabe1976• Basedoncontrolflowgraph,measureslinearlyindependentpathsthroughaprogram– ~=numberofdecisions– Numberoftestcasesneededtoachievebranchcoverage
if(c1){f1();
}else{f2();
}if(c2){
f3();}else{
f4();}
M=edgesofCFG–nodesofCFG+2*endpoints15
Origins• 1992PaperattheInternaKonalConferenceonSoIwareMaintenancebyPaulOmanandJackHagemeister
• DevelopersratedanumberofHPsystemsinCandPascal
• StaKsKcalregressionanalysistofindkeyfactorsamong40metrics
COM=percentageofcomments16
Thoughts?• MetricseemsaZracKve• Easytocompute• OIenseemstomatchintuiKon
• Parametersseemalmostarbitrary,calibratedinsinglesmallstudycode(fewdevelopers,unclearstaKsKcalsignificance)
• Allmetricsrelatedtosize:justmeasurelinesofcode?
• Original1992C/PascalprogramspotenKallyquitedifferentfromJava/JS/C#code
hZp://avandeursen.com/2014/08/29/think-twice-before-using-the-maintainability-index/17
Evalua'ngMetrics• ForeverymetricanswerthefollowingquesKons:
– Whatisthepurposeofthemeasure?– Whatisthescopeofthemeasure?– WhataZributearewetryingtomeasure?– WhatisthenaturalscaleoftheaZributewearetryingtomeasure?– WhatisthenaturalvariabilityoftheaZribute?– Whatisthemetric(measurementfuncKon)?Whatmeasuringinstrumentdo
weuse?– Whatisthenaturalscaleforthemetric?– Whatisthenaturalmeasurementerrorforthisinstrument?– WhatistherelaKonshipoftheaZributetothemetric?(constructvalidity)– Whatarethenaturalandforeseeablesideeffectsofusingthisinstrument?
Furtherreading:KanerandBond.So,wareEngineeringMetrics:WhatDoTheyMeasureandHowDoWeKnow?METRICS200418
MeasurementforDecisionMakinginSo,wareDevelopment
19
WhatisMeasurement?• Measurementistheempirical,objecKveassignmentofnumbers,accordingtoarulederivedfromamodelortheory,toaZributesofobjectsoreventswiththeintentofdescribingthem.–Craner,Bond,“SoIwareEngineeringMetrics:WhatDoTheyMeasureandHowDoWeKnow?”
• AquanKtaKvelyexpressedreducKonofuncertaintybasedononeormoreobservaKons.–Hubbard,“HowtoMeasureAnything…”
20
So,warequalitymetric
IEEE1061says:“Aso&warequalitymetricisafunc5on
whoseinputsareso&waredataandwhoseoutputisasinglenumericalvaluethatcan
beinterpretedasthedegreetowhichso&wareprocessesagivena<ributethat
affectsitsquality.”
21
MeasurementforDecisionMaking
• Fundproject?• MoretesKng?• Fastenough?Secureenough?• Codequalitysufficient?• Whichfeaturetofocuson?• Developerbonus?• TimeandcostesKmaKon?PredicKonsreliable?
22
Whatso,warequali'esdowecareabout?(examples)• Scalability• Security• Extensibility• DocumentaKon• Performance• Consistency• Portability
• Installability• Maintainability• FuncKonality(e.g.,dataintegrity)
• Availability• Easeofuse
23
Whatprocessquali'esdowecareabout?(examples)• On-Kmerelease• Developmentspeed• MeeKngefficiency• Conformancetoprocesses
• Timespentonrework• ReliabilityofpredicKons
• Fairnessindecisionmaking
• Measure5me,costs,ac5ons,resources,andqualityofworkpackages;comparewithpredic5ons
• Useinforma5onfromissuetrackers,communica5onnetworks,teamstructures,etc
• …
24
Trendanalyses
25
Benchmark-BasedMetrics
• Monitormanyprojectsormanymodules,gettypicalvaluesformetrics
• ReportdeviaKons
hZps://semmle.com/insights/26
MeasurementisDifficult
27
28
Measurementsvalidity• Construct–Arewemeasuringwhatweintendedtomeasure?
• PredicKve–TheextenttowhichthemeasurementcanbeusedtoexplainsomeothercharacterisKcoftheenKtybeingmeasured
• Externalvalidity–ConcernsthegeneralizaKonofthefindingstocontextsandenvironments,otherthantheonestudied
29
Everythingismeasurable1. IfXissomethingwecareabout,thenX,bydefiniKon,
mustbedetectable.– Howcouldwecareaboutthingslike“quality,”“risk,”“security,”or“publicimage”ifthesethingsweretotallyundetectable,directlyorindirectly?
– IfwehavereasontocareaboutsomeunknownquanKty,itisbecausewethinkitcorrespondstodesirableorundesirableresultsinsomeway.
2. IfXisdetectable,thenitmustbedetectableinsomeamount.
– Ifyoucanobserveathingatall,youcanobservemoreofitorlessofit
3. Ifwecanobserveitinsomeamount,thenitmustbemeasurable.
D.Hubbard,HowtoMeasureAnything,2010
30
31
Thestreetlighteffect• AknownobservaKonalbias.
• Peopletendtolookforsomethingonlywhereit’seasiesttodoso.– Ifyoudropyourkeysatnight,you’lltendtolookforitunderstreetlights.
32
33
Whatcouldpossiblygowrong?• BadstaKsKcs:Abasicmisunderstandingofmeasurementtheoryandwhatisbeingmeasured.
• Baddecisions:Theincorrectuseofmeasurementdata,leadingtounintendedsideeffects.
• BadincenKves:Disregardforthehumanfactors,orhowtheculturalchangeoftakingmeasurementswillaffectpeople.
34
Lies,damnedlies,and…
• In1995,theUKCommiZeeonSafetyofMedicinesissuedthefollowingwarning:"third-generaKonoralcontracepKvepillsincreasedtheriskofpotenKallylife-threateningbloodclotsinthelegsorlungstwofold--thatis,by100percent”
35
…sta's'cs• “…ofevery7,000womenwhotooktheearlier,second-generaKonoralcontracepKvepills,aboutonehadathrombosis;thisnumberincreasedtotwoamongwomenwhotookthird-generaKonpills…”
• “…Theabsoluteriskincreasewasonlyonein7,000,whereastherela'veincrease(amongwomenwhodevelopedbloodclots)wasindeed100percent.”
36
Understandingyourdata
37
Measurementscales• Scale:thetypeofdatabeingmeasured.• Thescaledictateswhatsortsofanalysis/arithmeKcislegiKmateormeaningful.
• YouropKonsare:– Nominal:categories– Ordinal:order,butnomagnitude.– Interval:order,magnitude,butnozero.– RaKo:Order,magnitude,andzero.– Absolute:specialcaseofraKo.
38
Summaryofscales
39
Nominal/categoricalscale• EnKKesclassifiedwithrespecttoacertainaZribute.
CategoriesarejointlyexhausKveandmutuallyexclusive.– Noimpliedorderbetweencategories!
• Categoriescanberepresentedbylabelsornumbers;however,theydonotrepresentamagnitude,arithme5copera5onhavenomeaning.
• CanbecomparedforidenKtyordisKncKon,andmeasurementscanbeobtainedbycounKngthefrequenciesineachcategory.Datacanalsobeaggregated.
En'ty AGribute Categories
ApplicaKon Purpose E-commerce,CRM,Finance
ApplicaKon Language Java,Python,C++,C#
Fault Source assignment,checking,algorithm,funcKon,interface,Kming
40
Ordinalscale• Orderedcategories:mapsameasuredaZributetoanordered
setofvalues,butnoinformaKonaboutthemagnitudeofthedifferencesbetweenelements.
• Measurementscanberepresentedbylabelsornumbers,BUT:ifnumbersareused,theydonotrepresentamagnitude.– Honestly,trynottodothat.IteliminatestemptaKon.
• Youcannot:add,subtract,performaverages,etc(arithmeKcoperaKonsareout).
• Youcan:comparewithoperators(like“lessthan”or“greaterthan”),createranksforthepurposesofrankcorrelaKons(Spearman’scoefficient,Kendall’sτ).
En'ty AGribute Values
ApplicaKon Complexity VeryLow,Low,Average,High,VeryHigh
Fault Severity 1–CosmeKc,2–Moderate,3–Major,4–CriKcal
41
Intervalscale• Hasorder(likeordinalscale)andmagnitude.
– TheintervalsbetweentwoconsecuKveintegersrepresentequalamountsoftheaZributebeingmeasured.
• DoesNOThaveazero:0isanarbitrarypoint,anddoesn’tcorrespondtotheabsenceofaquanKty.
• MostarithmeKc(addiKon,subtracKon)isOK,asaremeananddispersionmeasurements,asarePearsoncorrelaKons.RaKosarenotmeaningful.– Ex:Thetemperatureyesterdaywas64oF,andtodayis32oF.Istodaytwiceascoldasyesterday?
• Incrementalvariables(quanKtyasoftoday–quanKtyatanearlierKme)andpreferencesarecommonlymeasuredinintervalscales.
42
Ra'oscale• AnintervalscalethathasatruezerothatactuallyrepresentstheabsenceofthequanKtybeingmeasured.
• AllarithmeKcismeaningful.• Absolutescaleisaspecialcase,measurementsimplymadebycounKngthenumberofelementsintheobject.– Takestheform“numberofoccurrencesofXintheenKty.”
En'ty AGribute Values
Project Effort Realnumbers
SoIware Complexity CyclomaKccomplexity
43
• ForcausaKon– Provideatheory(fromdomainknowledge,independentofdata)
– ShowcorrelaKon– Demonstrateabilitytopredictnewcases(replicate/validate)
hZp://xkcd.com/552/44
45
Confoundingvariables
– IfyoulookonlyatthecoffeeconsumpKon→cancerrelaKonship,youcangetverymisleadingresults
– Smokingisaconfounder
CoffeeconsumpKon
Smoking
Cancer
AssociaKons
CausalrelaKonship
46
EffectofClassSizeontheValidityofObject-orientedMetrics
KhaledElEmam,SaidaBenlarbi,andNishithGoel,September1999
Onlyfour,outoftwenty-fourcommonlyusedobject-orientedmetrics,wereactuallyusefulinpredicKngthequalityofasoIwaremodulewhentheeffectofthemodulesizewasaccountedfor.
47
48
TheMcNamaraFallacy• Thereseemstobeageneralmisunderstandingtotheeffect
thatamathemaKcalmodelcannotbeundertakenunKleveryconstantandfuncKonalrelaKonshipisknowntohighaccuracy.ThisoIenleadstotheomissionofadmiZedlyhighlysignificantfactors(mostofthe“intangibles”influencesondecisions)becausetheseareunmeasuredorunmeasurable.Toomitsuchvariablesisequivalenttosayingthattheyhavezeroeffect...Probablytheonlyvalueknowntobewrong…– J.W.Forrester,IndustrialDynamics,TheMITPress,1961
49
McNamarafallacy
1. Measurewhatevercanbeeasilymeasured.
2. Disregardthatwhichcannotbemeasuredeasily.
3. Presumethatwhichcannotbemeasuredeasilyisnotimportant.
4. Presumethatwhichcannotbemeasuredeasilydoesnotexist.
hZps://chronotopeblog.com/2015/04/04/the-mcnamara-fallacy-and-the-problem-with-numbers-in-educaKon/50
DefectDensity• Defectdensity=Knownbugs/lineofcode• Systemspoilage=Kmetofixpost-releasedefects/
totalsystemdevelopmentKme• Post-releasevspre-release• Whatcountedasdefect?Severity?Relevance?• Whatsizemetricused?• Whatqualityassurancemechanismsused?
• LiZlereferencedatapubliclyavailable;typically2-10defects/1000linesofcode
51
MeasuringUsability
52
Measurementstrategies
• Automatedmeasuresoncoderepositories
• Useorcollectprocessdata• Instrumentprogram(e.g.,in-fieldcrashreports)
• Surveys,interviews,controlledexperiments,expertjudgment
• StaKsKcalanalysisofsample53
MetricsandIncen'ves
54
hZp://dilbert.com/strips/comic/1995-11-13/
55
56
Produc'vityMetrics
• Linesofcodeperday?– Industryaverage10-50lines/day– Debugging+reworkca.50%ofKme
• FuncKon/object/applicaKonpointspermonth
• Bugsfixed?• Milestonesreached?
57
StackRanking
58
Incen'vizingProduc'vity
• Whathappenswhendeveloperbonusesarebasedon– Linesofcodeperday– AmountofdocumentaKonwriZen– Lownumberofreportedbugsintheircode– Lownumberofopenbugsintheircode– Highnumberoffixedbugs– AccuracyofKmeesKmates
59
AutonomyMasteryPurpose
CanexKnguishintrinsicmoKvaKonCandiminishperformance
CancrushcreaKvityCancrowdoutgoodbehavior
CanencouragecheaKng,shortcuts,andunethicalbehaviorCanbecomeaddicKve
Canfostershort-termthinking60
Tempta'onofSo,wareMetrics
61
So,wareQualityMetrics
• IEEE1061definiKon:“AsoIwarequalitymetricisafuncKonwhoseinputsaresoIwaredataandwhoseoutputisasinglenumericalvaluethatcanbeinterpretedasthedegreetowhichsoIwareprocessesagivenaZributethataffectsitsquality.”
• MetricshavebeenproposedformanyqualityaZributes;maydefineownmetrics
62
ExternalaGributes:MeasuringQuality
McCall model has 41 metrics to measure 23 quality criteria from 11 factors
63
Decomposi'onofMetrics
Maintainability
Correctability
Testability
Expandability
Faultscount
DegreeoftesKng
Effort
Changecounts
ClosureKmeIsolate/fixKmeFaultrateStatementcoverageTestplancompletenessResourcepredicKonEffortexpenditureChangeeffortChangesizeChangerate
64
Object-OrientedMetrics
• NumberofMethodsperClass• DepthofInheritanceTree• NumberofChildClasses• CouplingbetweenObjectClasses• CallstoMethodsinUnrelatedClasses• …
65
Otherqualitymetrics?
• Commentdensity• Testcoverage• Componentbalance(systembreakdownopKmalityandcomponentsizeuniformity)
• Codechurn(numberoflinesadded,removed,changedinafile)
• …66
Warning• MostsoIwaremetricsarecontroversial
– Usuallyonlyplausibilityarguments,rarelyrigorouslyvalidated– CyclomaKccomplexitywasrepeatedlyrefutedandissKllused– “SimilartotheaZemptofmeasuringtheintelligenceofapersonin
termsoftheweightorcircumferenceofthebrain”
• Usecarefully!• Codesizedominatesmanymetrics• Avoidclaimsabouthumanfactors(e.g.,readability)and
quality,unlessvalidated• Calibratemetricsinprojecthistoryandotherprojects• Metricscanbegamed;yougetwhatyoumeasure
67
(Some)strategies• Metricstrackedusingtoolsandprocesses(processmetricslikeKme,orcodemetricslikedefectsinabugdatabase).
• Expertassessmentorhuman-SubjectExperiments(controlledexperiments,talk-aloudprotocols).
• MiningsoIwarerepositories,defectdatabases,especiallyfortrendanalysisordefectpredicKon.– Somesuccesse.g.,asreportedbyMicrosoIResearch
• Benchmarking(especiallyforperformance).
68
Factorsinasuccessfulmeasurementprogram1. SetsolidmeasurementobjecKvesandplans.2. Makemeasurementpartoftheprocess.3. Gainathoroughunderstandingof
measurement.4. Focusonculturalissues.5. Createasafeenvironmenttocollectand
reporttruedata.6. CulKvateapredisposiKontochange.7. Developacomplementarysuiteofmeasures.
CarolA.DekkersandPatriciaA.McQuaid,“TheDangersofUsingSoIwareMetricsto(Mis)Manage”,2002.69
Kaner’sques'onswhenchoosingametric1. Whatisthepurposeofthis
measure?2. Whatisthescopeofthis
measure?3. WhataZributeareyoutryingto
measure?4. WhatistheaZribute’snatural
scale?5. WhatistheaZribute’snatural
variability?6. Whatinstrumentareyouusing
tomeasuretheaZribute,andwhatreadingdoyoutakefromtheinstrument?
CemKanerandWalterP.Bond.“SoIwareEngineeringMetrics:WhatDoTheyMeasureandHowDoWeKnow?”2004
7. Whatistheinstrument’snaturalscale?
8. Whatisthereading’snaturalvariability(normallycalledmeasurementerror)?
9. WhatistheaZribute’srelaKonshiptotheinstrument?
10. Whatarethenaturalandforeseeablesideeffectsofusingthisinstrument?
70
Summary• Measurementisdifficultbutimportantfordecisionmaking
• SoIwaremetricsareeasytomeasurebuthardtointerpret,validityoIennotestablished
• Manymetricsexist,oIencomposed,pickordesignsuitablemetricsifneeded
• Carefulinuse:monitoringvsincenKves• Strategiesbeyondmetrics
71
FurtherReadingonMetrics• Sommerville.SoIwareEngineering.EdiKon7/8,SecKons26.1,27.5,and28.3
• Hubbard.Howtomeasureanything:Findingthevalueofintangiblesinbusiness.JohnWiley&Sons,2014.Chapter3
• KanerandBond.SoIwareEngineeringMetrics:WhatDoTheyMeasureandHowDoWeKnow?METRICS2004
• FentonandPfleeger.SoIwareMetrics:Arigorous&pracKcalapproach.ThomsonPublishing1997
72
Microso,Survey(2014)
• "SupposeyoucouldworkwithateamofdatascienKstsanddataanalysistswhospecializeinstudyinghowsoIwareisdeveloped.PleaselistuptofivequesKonsyouwouldlikethemtoanswer.Whydoyouwanttoknow?Whatwouldyoudowiththeanswers?"AndrewBegelandThomasZimmermann."Analyzethis!145quesKonsfordatascienKstsinsoIwareengineering."ICSE.2014.
73
TopQues'ons• HowdouserstypicallyusemyapplicaKon?• WhatpartsofasoIwareproductaremostusedand/orlovedbycustomers?
• HoweffecKvearethequalitygateswerunatcheckin?
• HowcanweimprovecollaboraKonandsharingbetweenteams?
• Whatarebestkeyperformanceindicators(KPIs)formonitoringservices?
• Whatistheimpactofacodechangeorrequirementschangetotheprojectandtests?
74
TopQues'ons• WhatistheimpactoftoolsonproducKvity?• HowdoIavoidreinvenKngthewheelbysharingand/orsearchingforcode?
• WhatarethecommonpaZernsofexecuKoninmyapplicaKon?
• Howwelldoestestcoveragecorrespondtoactualcodeusagebyourcustomers?
• WhatkindsofmistakesdodevelopersmakeintheirsoIware?Whichonesarethemostcommon?
• WhatareeffecKvemetricsforshipquality?
75
BoGomQues'ons• Whichindividualmeasurescorrelatewithemployee
producKvity(e.g.,employeeage,tenure,engineeringskills,educaKon,promoKonvelocity,IQ)?
• WhichcodingmeasurescorrelatewithemployeeproducKvity(e.g.,linesofcode,KmeittaketobuildthesoIware,aparKculartoolset,pairprogramming,numberofhoursofcodingperday,language)?
• Whatmetricscanbeusedtocompareemployees?• HowcanwemeasuretheproducKvityofaMicrosoI
employee?• Isthenumberofbugsagoodmeasureofdeveloper
effecKveness?• CanIgenerate100%testcoverage?
76