Looking Backwards The Coming Decade of BSD
George Neville-Neil
Welcome to EuroBSD 2026!
• FreeBSD 15
• Dropped support for sparc64 and PC98
• NetBSD 11.0
• Dropped VAX, Amiga, and Atari ST Support
• OpenBSD 9.0
• First implementation of SMP!
Some Notable BSD Achievements
• Scaling to 32K CPU cores
• Single System Serving 10 Terabits/sec
• Always on Petabyte File Server
• Security Isolation Technology in Every Mobile Device
• Most commonly deployed IoT OS
• The most used OS technology in the world
2017 BSD Declared Dead (again)
• 64 bit inode work complete
• First exabyte scale UFS3 deployment
• Network stack librarification continues
• Integration of Concurrency Kit primitives
• BSD API Standards Published
• LLVM Compiler Extensions Begin
2018 Linux On the Desktop
• Three new schedulers added as libraries
• Massive Multicore (MMC)
• LittleJohn (Big/Little written by John Baldwin)
• SkimpySched (Power aware scheduler for embedded)
• Enhanced NUMA Awareness started in MMC Scheduler
• 1 Terabit NICs support
• VFS system packaged as a library
• MSDOSFS first FS to be turned into a library
• Adopted as standard by most embedded systems projects
2019 Hinkley Point B Meltdown tracked to use of Linux 2.6 kernel
• All network stack components are now libraries
• Based on pioneering work with ifLib
• Network device drivers shrink by 2/3
• Librarification of VM system starts
• First working version of LLVM assisted system configurator
• LLDB and LLVM now default for all BSD systems and CPU architectures
• All calls to printf() replaced by DTrace debugging
• NVDIMM Support Complete
• Libraries may now use memory that never goes away
2021 Google Abandons Go in Favor of Rust
• VM system as a library
• All user level configuration programs now consume and emit machine readable output
• All BSDs now come in flavors which may or may not look like distributions
• pkg system achieves sentience and demands a vacation
2022 DragonFly Selected as Default OS on Open Compute
• GEOM and Storage Layers as a library
• Storage drivers shrink by 2/3
• bhyve now default virtualization system on all BSDs
• Configurator can now build kernel images between 1M and 512G
• Support for RPi 10
• Support for HAL 9000
• Which is now 25 years late
• Which we know is typical
2023 OpenBSD Adopted as the primary OS at NSA, GCHQ, FSB, etc.
• PCI as a Fabric Support Added
• Capsicumization of kernel and user space components complete
• OpenBSD adopts capsicum
• Configurator can remote or localize code
• Adoption of new X12 windowing system
• Java added to the base system of all BSDs
2025 Apple Donates to the FreeBSD, NetBSD and OpenBSD Foundations
• Universal Peace
• World Hunger Ends
• Realization of the Human Millennium
• Everyone gets a pony!
What do we want to achieve?
• The most used OS technology in the world
• Scaling to many more CPU cores
• Single System Serving many Terabits/sec
• Always on Yottabyte File Server
• Security Isolation Technology in Every Mobile Device
• Most commonly deployed IoT OS
• Or would you prefer Linux or Windows to run your next automobile?
How do we get there?
• APIs
• Design Guidelines
• Ease of remoting
• Libraries
• Shatter the kernel, and glue it back together
• Tooling
• We now have the most flexible, open source, compiler on the planet
• But we barely use its advanced features
• Or create our own extensions
• That, must, change…
Jordan Hubbard is Correct…
Hardware/Software Co-Evolution
• CPU Extensions effect on UNIX
• NVMe – Faster than SSD
• NVDIMM – Memory that never goes away
• More cores (18/36 available in 2014)
• More caches (128 MB of L4 will be available on Skylake)
• Faster NICs
• Terabit is not as far away as you think
What was UNIX written for?
Hot, bedtime, reading
A company that cared
Behold! The Pentium 4!
Current CPU Technology
Scheduler Upgrades
• Is already pluggable!
• Many more cores
• NUMA
• I/O Scheduling
• Cache Awareness
• Power
• Avoid the pitfalls
The Linux Scheduler: a Decade of Wasted Cores
NUMA Awareness
• We know the memory topology
• Memory must be allocated near the process
• And processes ought to be started where there is memory
• I/O Complicates the problem
• Extend the scheduler to know about the I/O layout
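The placement rule on this slide (allocate memory near the process) can be sketched as a lookup over a node-distance table. This is a minimal illustration under stated assumptions, not any BSD's allocator: `numa_topology`, `pick_memory_node`, the distance values and the free-page counts are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NODES 8

/* Hypothetical topology description; not a real kernel structure. */
struct numa_topology {
    size_t nnodes;                            /* number of NUMA domains   */
    unsigned distance[MAX_NODES][MAX_NODES];  /* relative access cost     */
    size_t free_pages[MAX_NODES];             /* free memory per domain   */
};

/*
 * Pick the domain to allocate from for a thread running on cpu_node:
 * the closest domain that still has at least `want` free pages.
 * Returns -1 if no domain can satisfy the request.
 */
static int
pick_memory_node(const struct numa_topology *t, size_t cpu_node, size_t want)
{
    int best = -1;

    assert(t != NULL && t->nnodes <= MAX_NODES && cpu_node < t->nnodes);
    for (size_t n = 0; n < t->nnodes; n++) {
        if (t->free_pages[n] < want)
            continue;               /* not enough memory here */
        if (best < 0 ||
            t->distance[cpu_node][n] < t->distance[cpu_node][(size_t)best])
            best = (int)n;
    }
    return best;
}
```

The same table could be consulted in the other direction, to start a process on the domain that has free memory. Extending it with an I/O device column is one way to read the last bullet, though the slides do not say how that should be done.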
Scheduling for Cache
• Instructions are cheap, cache misses are expensive
• Now the overwhelming source of most bottlenecks
• Teach the scheduler about cache layout and constraints
• Optimize for cache coherency
• Feed hwpmc samples into the scheduling decisions
Power
• Big/Little Will Become More Common
• Need to understand the compute power of each core
• Do we schedule for…
• Quickest to complete
• Earliest deadline
• Lowest power consumption
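The three goals listed above pull in different directions on a Big/Little machine. The sketch below makes that concrete under simplifying assumptions: `core_info`, `sched_goal` and `pick_core` are hypothetical names, each core is described by just two numbers, and the deadline is treated as a hard constraint rather than as an ordering rule as in a true earliest-deadline-first scheduler.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-core description for a Big/Little system. */
struct core_info {
    double ops_per_sec;     /* relative compute power of the core */
    double watts;           /* power drawn while busy             */
};

enum sched_goal { GOAL_QUICKEST, GOAL_LOWEST_ENERGY };

/* Estimated run time of `work` operations on core c. */
static double
run_time(const struct core_info *c, double work)
{
    return work / c->ops_per_sec;
}

/*
 * Return the index of the core that best meets `goal` while still
 * finishing `work` within `deadline` seconds, or -1 if none can.
 */
static int
pick_core(const struct core_info *cores, size_t ncores, double work,
    double deadline, enum sched_goal goal)
{
    int best = -1;
    double best_cost = 0.0;

    assert(cores != NULL && work > 0.0);
    for (size_t i = 0; i < ncores; i++) {
        double t = run_time(&cores[i], work);
        double cost;

        if (t > deadline)
            continue;       /* this core would miss the deadline */
        /* Cost is time for "quickest", energy (time * power) otherwise. */
        cost = (goal == GOAL_QUICKEST) ? t : t * cores[i].watts;
        if (best < 0 || cost < best_cost) {
            best = (int)i;
            best_cost = cost;
        }
    }
    return best;
}
```

With one fast, power-hungry core and one slow, frugal core, the two goals pick different cores for the same job until the deadline tightens enough to rule the slow core out.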
From monolith to building blocks
Librarification
• NetBSD’s RUMP kernels
• libuinet
• ifLib
• Must have good API standards
• Documentation standard for APIs
Need to keep going
I want one of these!
API Design
• Regularity
• Tractability
• Composability
• Assisted by the compiler tool chain
• Withered Drivers
• Easily forwardable APIs
• Better building blocks!
API Regularity
• The position of arguments matters
• What is the verb?
• What are the nouns?
• Are we writing English or Hebrew, or Japanese or?
void *memcpy(void *dst, const void *src, size_t len);
void bcopy(const void *src, void *dst, size_t len);
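The two prototypes above do the same job with the first two arguments swapped. A small sketch of what that means at a call site follows; the wrapper names `copy_with_memcpy` and `copy_with_bcopy` are hypothetical and exist only to put the two calls side by side.

```c
#include <assert.h>
#include <string.h>     /* memcpy: destination first */
#include <strings.h>    /* bcopy: source first (legacy BSD interface) */

/* Copy len bytes from src to dst using memcpy's argument order. */
static void
copy_with_memcpy(char *dst, const char *src, size_t len)
{
    assert(dst != NULL && src != NULL);
    memcpy(dst, src, len);      /* verb(destination, source, length) */
}

/* The same copy using bcopy's argument order. */
static void
copy_with_bcopy(char *dst, const char *src, size_t len)
{
    assert(dst != NULL && src != NULL);
    bcopy(src, dst, len);       /* verb(source, destination, length) */
}
```

Both pointer arguments are compatible with `void *`, so swapping them in either call compiles cleanly whenever the source is not `const`, and the copy silently runs in the wrong direction.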
API Tractability: Goldilocks and the three APIs
• Too Big
• Most Windows APIs
• X11 is classically terrible
• Too Small
• ioctl() considered harmful
• What does it mean? I can’t easily tell.
• Use as a last resort
• Just Right
• Between 5 and 7 arguments
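To make the "too small" complaint concrete, here is a sketch that contrasts an ioctl-style entry point with a typed one. Everything in it is hypothetical: `netdev`, `netdev_ctl`, `netdev_set_mtu`, the command numbers and the MTU bounds are invented for illustration and do not correspond to any real driver interface.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical network device; not a real driver structure. */
struct netdev {
    unsigned mtu;
    int up;
};

/* "Too small": one entry point, with the meaning hidden in cmd and
 * an untyped argument that the compiler cannot check. */
enum { NETDEV_SET_MTU = 1, NETDEV_SET_UP = 2 };

static int
netdev_ctl(struct netdev *d, int cmd, void *arg)
{
    assert(d != NULL);
    switch (cmd) {
    case NETDEV_SET_MTU:
        d->mtu = *(unsigned *)arg;  /* caller must know arg is unsigned * */
        return 0;
    case NETDEV_SET_UP:
        d->up = *(int *)arg;        /* ...and here that it is int * */
        return 0;
    default:
        return ENOTTY;
    }
}

/* Closer to "just right": the verb and the nouns are in the name
 * and the types, and the range check lives in one place. */
static int
netdev_set_mtu(struct netdev *d, unsigned mtu)
{
    assert(d != NULL);
    if (mtu < 68 || mtu > 65535)    /* illustrative bounds only */
        return EINVAL;
    d->mtu = mtu;
    return 0;
}
```

A reader of `netdev_set_mtu(&d, 9000)` can tell what the call means; a reader of `netdev_ctl(&d, 1, &m)` has to look up the command table first.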
API Forwarding
• In 2026 all systems are distributed systems
• It was true in 2016 but we ignored that truth
• Deep structures are hard to pack
• What if this API was an RPC?
• Pointers become more fun to deal with
• Go shallow
• Split structures into local and remote components
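One way to read "go shallow" and "split structures into local and remote components" is sketched below. The request type, its fields and the big-endian wire layout are all assumptions made for the example; the slides do not prescribe a format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Remote component: flat, fixed-width fields, no pointers. */
struct open_req_wire {
    uint32_t flags;
    uint16_t mode;
    uint16_t path_len;      /* length of the path bytes that follow */
};

/* Local component: host-only state that is never serialized. */
struct open_req {
    struct open_req_wire wire;
    const char *path;       /* sent separately as wire.path_len bytes */
    void *completion_ctx;   /* only meaningful to the local caller    */
};

#define OPEN_REQ_WIRE_SIZE 8    /* 4 + 2 + 2 bytes on the wire */

/* Pack the remote component in big-endian order.
 * Returns the number of bytes written, or 0 if buf is too small. */
static size_t
open_req_pack(const struct open_req_wire *w, uint8_t *buf, size_t buflen)
{
    assert(w != NULL && buf != NULL);
    if (buflen < OPEN_REQ_WIRE_SIZE)
        return 0;
    buf[0] = (uint8_t)(w->flags >> 24);
    buf[1] = (uint8_t)(w->flags >> 16);
    buf[2] = (uint8_t)(w->flags >> 8);
    buf[3] = (uint8_t)w->flags;
    buf[4] = (uint8_t)(w->mode >> 8);
    buf[5] = (uint8_t)w->mode;
    buf[6] = (uint8_t)(w->path_len >> 8);
    buf[7] = (uint8_t)w->path_len;
    return OPEN_REQ_WIRE_SIZE;
}

/* Inverse of open_req_pack.
 * Returns 0 on success, -1 if buf is too short. */
static int
open_req_unpack(struct open_req_wire *w, const uint8_t *buf, size_t buflen)
{
    assert(w != NULL && buf != NULL);
    if (buflen < OPEN_REQ_WIRE_SIZE)
        return -1;
    w->flags = ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
        ((uint32_t)buf[2] << 8) | (uint32_t)buf[3];
    w->mode = (uint16_t)(((unsigned)buf[4] << 8) | buf[5]);
    w->path_len = (uint16_t)(((unsigned)buf[6] << 8) | buf[7]);
    return 0;
}
```

Because the wire part contains no pointers, packing it is a fixed sequence of byte stores. The pointer-bearing fields stay on the caller's side, which is where they are meaningful.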
API to Resource Relationship
• Passing Pointers
• Who allocates?
• Who frees?
• Sharing Locks
• Who locks?
• Who unlocks?
• More about Goldilocks
• Too big?
• Too Small
• Just right?
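The "who allocates, who frees" questions have two common answers, sketched here with hypothetical functions (`name_copy`, `session_create`, `session_destroy`). Neither is taken from the talk; they only show how an API can make ownership obvious from its shape.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Style 1: the caller allocates and frees; the callee only fills the
 * buffer. Returns 0 on success or ERANGE if buf is too small. */
static int
name_copy(const char *src, char *buf, size_t buflen)
{
    size_t need;

    assert(src != NULL && buf != NULL);
    need = strlen(src) + 1;
    if (need > buflen)
        return ERANGE;
    memcpy(buf, src, need);
    return 0;
}

/* Style 2: the library allocates, so the library also supplies the
 * matching release function. */
struct session {
    char *name;
};

static struct session *
session_create(const char *name)
{
    struct session *s;
    size_t len;

    assert(name != NULL);
    len = strlen(name) + 1;
    s = malloc(sizeof(*s));
    if (s == NULL)
        return NULL;
    s->name = malloc(len);
    if (s->name == NULL) {
        free(s);
        return NULL;
    }
    memcpy(s->name, name, len);
    return s;
}

/* Releases everything session_create allocated; NULL is accepted. */
static void
session_destroy(struct session *s)
{
    if (s == NULL)
        return;
    free(s->name);
    free(s);
}
```

The same pairing discipline is one possible answer to the locking questions: whichever side takes a lock is the side that releases it.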
A worked example
Literally Littered with Libraries
Optimist or Pessimist?
Opponents point out that no such program has ever been constructed and that experience would indicate that even if it could be built, it would be rife with untestable and undetectable errors.
Proponents say the software could be assembled in smaller pieces, which could probably be tested adequately or otherwise made “fault-tolerant.”
It has been estimated by experts that the necessary software program would involve ten million (1 x 10^7) or more lines of code.
Pervasive Tracing and Debug
• Death to printf()!!!
• Tracing Features Must Be Pervasive
• Easy to use
• Produce Machine Readable Output
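As an illustration of tracing that produces machine-readable records instead of printf() text, here is a sketch of a fixed-size trace ring. It is not DTrace and not any existing kernel facility: `trace_rec`, `trace_ring`, `trace_emit` and `trace_count` are hypothetical, and the code assumes a single writer with no locking.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TRACE_SLOTS 4   /* tiny on purpose, to show wrap-around */

/* One structured trace record; a tool can decode it without parsing text. */
struct trace_rec {
    uint64_t seq;       /* monotonically increasing record number */
    uint32_t probe_id;  /* which probe fired                      */
    uint64_t arg;       /* probe-specific value                   */
};

struct trace_ring {
    struct trace_rec slot[TRACE_SLOTS];
    uint64_t next;      /* sequence number of the next record */
};

/* Record one event, overwriting the oldest record once the ring is full. */
static void
trace_emit(struct trace_ring *r, uint32_t probe_id, uint64_t arg)
{
    struct trace_rec *rec;

    assert(r != NULL);
    rec = &r->slot[r->next % TRACE_SLOTS];
    rec->seq = r->next++;
    rec->probe_id = probe_id;
    rec->arg = arg;
}

/* Number of records currently retained in the ring. */
static size_t
trace_count(const struct trace_ring *r)
{
    assert(r != NULL);
    return r->next < TRACE_SLOTS ? (size_t)r->next : (size_t)TRACE_SLOTS;
}
```

Emitting a record costs a few stores, which is what makes it plausible to leave probes in place everywhere. A consumer reads the records by `seq` and formats them only when a human asks.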
How big is an OS kernel?
                  Files      Lines
C                 5,685  5,140,567
C Header Files    5,356  2,271,425
Unify the Control Plane
• Machine readable output
• Machine controllable input
• Address both humans and programmers
• Increase and improve automation
• If you’re doing it by hand, you’re doing it wrong!
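A small sketch of "machine readable output": a configuration tool emits one status record as JSON so that another program can consume it. The `if_status` structure and `if_status_to_json` function are hypothetical, and the code assumes the interface name contains no characters that need JSON escaping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical interface status as a configuration tool might report it. */
struct if_status {
    const char *name;   /* assumed to need no JSON escaping */
    unsigned mtu;
    int up;
};

/*
 * Format s as a single JSON object into buf.
 * Returns the number of characters written (excluding the NUL),
 * or -1 if buf is too small.
 */
static int
if_status_to_json(const struct if_status *s, char *buf, size_t buflen)
{
    int n;

    assert(s != NULL && s->name != NULL && buf != NULL);
    n = snprintf(buf, buflen, "{\"name\":\"%s\",\"mtu\":%u,\"up\":%s}",
        s->name, s->mtu, s->up ? "true" : "false");
    if (n < 0 || (size_t)n >= buflen)
        return -1;      /* output error or truncated record */
    return n;
}
```

The same structure can still be rendered as a table for people, which is one way to address both humans and programmers from a single source of truth.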
“The study of computer science is the study of what can be automated.” D. Knuth
Putting the Pieces Together
• Humpty Dumpty Kernel
• Not a micro-kernel
• Though it could be
• Need the Configurator
• Tooling, tooling, tooling
• “In the 80s people got paid to add features to the kernel, and in the 90s they got paid to take the same features out of it.” – H. Massalin
Operating Systems Are Like Legos
• The BSDs have always built solid architectures
• Small and Flexible Components
• Well defined APIs
• Built into libraries
• Come in many colors!
Comments? Questions?