Cory Kapser Cloning Considered Harmful Considered Harmful...


Transcript of Cory Kapser Cloning Considered Harmful Considered Harmful...

Page 1: Cory Kapser Cloning Considered Harmful Considered Harmful ...plg.uwaterloo.ca/~migod/papers/2016/SANER-2016-MIP.pdf · • Working at a startup His office, 1986 Talk at IBM Toronto,

"Cloning Considered Harmful" Considered Harmful

A look back

Cory Kapser and Mike Godfrey, University of Waterloo

WCRE 2006, Benevento, Italy

Cory Kapser

•  Living in Calgary since graduating in 2009

•  Working at a startup

His office, 1986

Talk at IBM Toronto, 1999


My office, 2002

IWPSE 2004, Kyoto

Many conferences, incl. MSR 2006

IWPC keynote, 2003


WCRE 1998, and many others since

Dolly, RIP

The ages of software cloning research

A brief look back and forward

[Stolen from IWSC 2015 keynote]


Math, science, and engineering

"Where's the science in what you do?"

•  Science concerns building reliable explanatory models of how the world works
    –  Scientific models must be testable, and (reasonably) consistent with observed reality
    –  Scientific models may be statistical, structural (Newtonian mechanics), …
    –  Where do the models come from?
        •  Experience with the domain, grounded theory, …

Math, science, and engineering

"Where's the science in what you do?"

•  Math isn't really science, per se!
    –  Its only hard requirement is self-consistency; good math need not have practical applications.
    –  Math is a kind of poetry, with rigorous rules of construction.
    –  Math is a tool used by scientists to help build and analyze models of how the world works

Math, science, and engineering

"Where's the science in what you do?"

•  Engineering is (roughly) the practical application of science to solve real-world problems
    –  Must understand "how the world works" to get stuff done
    –  Engineers must also know about processes, materials, costs, risks, tools, people, law, ethics, etc.

The three overlapping ages of software cloning research

1.  The Age of Math: Clone detection is possible!

    –  Algorithms exist, can scale to big systems!

[1990s: Baker, Johnson, Baxter, Ducasse, Merlo, …] [2000s: CCFinder, iClones, NiCad, ConQAT, …]

Clone detection is possible!
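To make "algorithms exist" concrete, here is a minimal sketch of one token-based approach: normalize identifiers and numbers, then index every fixed-size window of tokens and report windows that occur more than once. It is an illustration only and not the algorithm of any tool or author named above; the names `normalize` and `find_clone_windows`, the tiny keyword set, the window size, and the two sample files are all assumptions made for this example.

```python
import re
from collections import defaultdict

# Crude tokenizer: identifiers, integer literals, or any single non-space character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")
# Tiny illustrative keyword set (an assumption, not a full C grammar).
KEYWORDS = {"if", "else", "for", "while", "return", "int", "void"}


def normalize(source):
    """Tokenize and replace identifiers/numbers with placeholders, so that
    copies that differ only in naming still produce identical token streams."""
    tokens = []
    for tok in TOKEN_RE.findall(source):
        if tok in KEYWORDS:
            tokens.append(tok)
        elif tok[0].isalpha() or tok[0] == "_":
            tokens.append("ID")
        elif tok[0].isdigit():
            tokens.append("NUM")
        else:
            tokens.append(tok)
    return tokens


def find_clone_windows(files, window=8):
    """Map each window of `window` normalized tokens to every (file, offset)
    where it occurs, keeping only windows seen in more than one place."""
    index = defaultdict(list)
    for name, source in files.items():
        tokens = normalize(source)
        for i in range(len(tokens) - window + 1):
            index[tuple(tokens[i:i + window])].append((name, i))
    return {w: locs for w, locs in index.items() if len(locs) > 1}


# Hypothetical inputs: the same loop, copied and renamed.
files = {
    "a.c": "int sum(int n) { int s = 0; for (i = 0; i < n; i++) s += i; return s; }",
    "b.c": "int total(int k) { int t = 0; for (j = 0; j < k; j++) t += j; return t; }",
}
clones = find_clone_windows(files, window=8)
print(f"{len(clones)} repeated token windows found")
```

A real detector would merge adjacent windows into maximal clone pairs and use a proper lexer; the sketch only shows why renamed copies remain detectable after normalization.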


The three overlapping ages of software cloning research

2.  The Age of Science: Clone analysis is possible!

    –  Let's assume detection "just works", what can you tell me about the system and its clones?
        •  Some clones evolve, some don't
        •  Type 3 clones are more stable / less buggy / …

    –  Not just source code! Other artifacts matter! We do MSR! e.g., Stack Overflow, Bugzilla, git meta-data

[2000s-2010s: Krinke, Kim, Kapser, Jürgens, … IWSC-16]

Clone analysis is possible!

The three overlapping ages of software cloning research

3.  The Age of Engineering: Clone management is possible!

    –  Clone triage, clone refactoring, linked editing, clone recommendation, program transformation, SPLs, …

[2000s-2010s: Robillard, LaToza, Basit, …]

Clone management is possible!

The road ahead: A look back?

•  Good news: We've accomplished a lot!
    –  We know what we can detect, how well, and at what scale
    –  We've done many empirical studies on type 1/2/3 clones

•  … but we still aren't sure which clones are important/risky and why
    –  So maybe we need more comprehensive models of cloning as practiced by developers and experienced by managers

Controversial statement

If cloning research is to have impact on practice, then our immediate scientific goals must be more developer oriented

•  It is not enough simply to find clones and then refactor (some of) them; rather, we must ask questions such as:
    –  Why do these clones exist in the first place?
    –  What design decisions led to their creation?
    –  How do developers and managers perceive them?
    –  What possible risks do they represent to the ongoing development of the software system?
    –  How can we recognize clones that need management?
    –  What strategies should we use to manage them over the long term?

"Physics is the only real science. The rest are just stamp collecting."

Ernest Rutherford (1871 – 1937)

Father of atomic physics

Professor at McGill Univ. & Univ. of Edinburgh

Nobel prize for … chemistry

Zoology c. 1850

•  Most time is spent doing data collection, cleansing, curation, etc.

•  Then analysis, organization, categorization, ...
    –  Based on low-level empirical observation

•  Weak predictive power

Along comes Darwin…

A taxonomy of cloning intent

1.  Forking
    –  Hardware variation, e.g., Linux SCSI drivers [SCAM 2011]
    –  Platform variation
    –  Experimental variation

2.  Templating
    –  Boilerplating
    –  API/library protocols
    –  Generalized programming idioms
    –  Parameterized code

3.  Post-hoc customizing
    –  Bug workarounds
    –  Replicate + specialize

"'Cloning considered harmful' considered harmful", Cory J. Kapser and Michael W. Godfrey, WCRE 2006


Forking: Platform variation

•  Motivation
    –  Different platforms ⇒ very different low-level details
    –  Interleaving platform-specific code in one place is too complex

•  Well known examples
    –  Linux kernel “arch” subsystem
    –  Apache Portable Runtime (APR)
        •  Portable impl of functionality that is typically platform dependent, such as file and network access
        •  fileio -> {netware, os2, unix, win32}
        •  Typical diffs: insertion of extra error checking or API calls
        •  Cloning is obvious and well documented

Forking: Platform variation

•  Advantages of cloning
    –  Each (cloned) variant is simpler to maintain
    –  No risk to stability of other variants
    –  Platforms are likely to evolve independently, so maintenance is likely to be “mostly independent”

•  Disadvantages of cloning
    –  Evolution in two dimensions: user reqs + platform support
    –  Change to the interface level means changes to many files

Forking: Platform variation

•  Management and long-term issues
    –  Factor out platform independent functionality as much as possible
    –  Document variation points + platform peculiarities
    –  As # of platforms grows, interface to the system hardens

•  Structural manifestations
    –  Cloning usually happens at the file level
        •  Clones are often stored as files (or directories) in the same source directory
        •  Directories may be named after OSs or similar
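The structural pattern above (same-named files in sibling directories named after platforms) suggests a simple way to spot candidate platform forks. The following is a rough sketch, assuming paths of the form `subsystem/platform/file`; the function names and the sample file names (`open.c`, `pipe.c`) are hypothetical and chosen only for illustration, and line-based similarity from Python's `difflib` stands in for a real clone detector.

```python
import difflib
from collections import defaultdict
from pathlib import PurePosixPath


def platform_variants(paths):
    """Group paths shaped like 'subsystem/platform/file' by (subsystem, file name)
    and keep the groups that have more than one platform variant."""
    groups = defaultdict(list)
    for p in map(PurePosixPath, paths):
        if len(p.parts) >= 3:
            subsystem = p.parent.parent  # directory that holds the per-platform dirs
            groups[(str(subsystem), p.name)].append(str(p))
    return {key: sorted(found) for key, found in groups.items() if len(found) > 1}


def similarity(text_a, text_b):
    """Line-based similarity ratio in [0, 1] between two file contents."""
    matcher = difflib.SequenceMatcher(None, text_a.splitlines(), text_b.splitlines())
    return matcher.ratio()


# Hypothetical layout modeled on "fileio -> {netware, os2, unix, win32}".
paths = [
    "fileio/unix/open.c",
    "fileio/win32/open.c",
    "fileio/unix/pipe.c",
]
for (subsystem, name), found in platform_variants(paths).items():
    print(subsystem, name, found)
```

Pairs with a high similarity ratio are candidates for the "factor out platform independent functionality" step; pairs with a low ratio are likely genuinely platform-specific.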

Cloning harmfulness: Two open source case studies

                                       Apache          Gnumeric
Group        Pattern                   Good  Harmful   Good  Harmful
Forking      Hardware variation           0        0      0        0
Forking      Platform variation          10        0      0        0
Forking      Experimental variation       4        0      0        0
Templating   Boiler-plating               5        0      6        7
Templating   API                          0        0      0        9
Templating   Idioms                       0       12      1        1
Templating   Parameterized code           5       12     10       34
Customizing  Replicate + specialize      12        4     15       16
Customizing  Bug workarounds              0        0      0        0
Total                                    36       28     32       67

Apache httpd 2.2.4 - 60 Tokens; Gnumeric 1.6.3 - 60 Tokens
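As a quick check on the table, the per-group totals and the share of clones judged harmful can be recomputed from the counts. This is a small sketch; the counts are transcribed from the table, while the dictionary layout and the function names `totals_by_group` and `harmful_share` are assumptions made for this example.

```python
# Counts transcribed from the case-study table:
# (group, pattern) -> (Apache good, Apache harmful, Gnumeric good, Gnumeric harmful)
COUNTS = {
    ("Forking", "Hardware variation"): (0, 0, 0, 0),
    ("Forking", "Platform variation"): (10, 0, 0, 0),
    ("Forking", "Experimental variation"): (4, 0, 0, 0),
    ("Templating", "Boiler-plating"): (5, 0, 6, 7),
    ("Templating", "API"): (0, 0, 0, 9),
    ("Templating", "Idioms"): (0, 12, 1, 1),
    ("Templating", "Parameterized code"): (5, 12, 10, 34),
    ("Customizing", "Replicate + specialize"): (12, 4, 15, 16),
    ("Customizing", "Bug workarounds"): (0, 0, 0, 0),
}
# Column offset of each system's (good, harmful) pair within a row.
SYSTEMS = {"apache": 0, "gnumeric": 2}


def totals_by_group(system):
    """Sum (good, harmful) counts per group for one system."""
    offset = SYSTEMS[system]
    totals = {}
    for (group, _pattern), row in COUNTS.items():
        good, harmful = totals.get(group, (0, 0))
        totals[group] = (good + row[offset], harmful + row[offset + 1])
    return totals


def harmful_share(good, harmful):
    """Fraction of clones judged harmful, or None when there are no clones."""
    total = good + harmful
    return harmful / total if total else None


for system in SYSTEMS:
    for group, (good, harmful) in totals_by_group(system).items():
        share = harmful_share(good, harmful)
        label = "n/a" if share is None else f"{share:.0%}"
        print(f"{system:9} {group:12} good={good:3} harmful={harmful:3} share={label}")
```

Summing the groups reproduces the Total row (36/28 for Apache, 32/67 for Gnumeric), and it makes the contrast visible: all 14 forking clones in Apache were judged good, while templating accounts for most of the harmful clones in both systems.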


The challenge for future cloning research

•  Grand theories and "actionable" big ideas are a noble goal, of course!
    –  It helps to avoid "yeah, OK, but who cares?" papers

•  … but learning to "swim with the data" leads to higher quality research in the long run
    –  It abets opportunistic exploration of the problem space
    –  … which leads to deeper insights about the problem space
    –  … and makes fundamental naïve mistakes less likely

Thank you