perfSONAR Going Forward
Eric Boyd, Internet2
http://www.perfsonar.net
Internet2 Technology Exchange, September 27th, 2016
Problem Statement
• The global Research & Education network ecosystem comprises hundreds of international, national, regional and local-scale networks.
September 27, 2016 © 2016, http://www.perfsonar.net
Problem Statement
• While these networks all interconnect, each network is owned and operated by a separate organization (called a "domain") with different policies, customers, funding models, hardware, bandwidth and configurations.
Problem Statement
• This complex, heterogeneous set of networks must operate seamlessly from "end to end" to support science and research collaborations that are distributed globally.
Problem Statement
• In practice, performance issues are prevalent and distributed.
• When a network is underperforming or errors occur, it is difficult to identify the source, as problems can happen anywhere, in any domain.
• Local-area network testing is not sufficient, as errors can occur between networks.
Problem Statement: Hard vs. Soft Failures
• "Hard failures" are the kind of problems every organization understands
– Fiber cut
– Power failure takes down routers
– Hardware ceases to function
• Classic monitoring systems are good at alerting on hard failures
– i.e., NOC sees something turn red on their screen
– Engineers paged by monitoring systems
Problem Statement: Hard vs. Soft Failures
• "Soft failures" are different and often go undetected
– Basic connectivity (ping, traceroute, web pages, email) works
– Performance is just poor
• How much should we care about soft failures?
Elephant Flows Place Great Demands on Networks
• Physical pipe that leaks water at a rate of 0.0046% by volume → Result: 99.9954% of water transferred, at "line rate."
• Network "pipe" that drops packets at a rate of 0.0046% → Result: 100% of data transferred, slowly, at <<5% of optimal speed.
• [Figure annotations: "essentially fixed"; "determined by speed of light"]
• Through careful engineering, we can minimize packet loss.
Soft Failures Cause Packet Loss and Degraded TCP Performance
[Figure: TCP throughput by path distance. Distance categories: Local (LAN), Metro Area, Regional, Continental, International. Series: Measured (TCP Reno), Measured (HTCP), Theoretical (TCP Reno), Measured (no loss).]
• With loss, high performance beyond metro distances is essentially impossible
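The "Theoretical (TCP Reno)" curve can be approximated with a short calculation. The sketch below assumes the widely cited Mathis et al. model, in which steady-state throughput is bounded by (MSS / RTT) × (C / √p); the slides do not name the formula behind the curve, so treat this as one plausible reading. The 0.0046% loss rate comes from the earlier pipe analogy. The MSS of 1460 bytes, the constant C = 1, the 10 Gbps link speed, and the round-trip time chosen for each distance category are illustrative assumptions, not figures from the talk.

```python
import math


def mathis_bound_bps(mss_bytes, rtt_s, loss_rate, c=1.0):
    """Upper bound on steady-state TCP Reno throughput, in bits per second.

    Assumed model (Mathis et al.): rate <= (MSS / RTT) * (C / sqrt(p)).
    """
    return (mss_bytes * 8 / rtt_s) * (c / math.sqrt(loss_rate))


LOSS = 0.000046    # 0.0046% packet loss, the rate quoted in the pipe analogy
LINK_BPS = 10e9    # assumption: 10 Gbps path
MSS_BYTES = 1460   # assumption: standard 1500-byte MTU, no jumbo frames

# Assumption: illustrative round-trip times for each distance category.
ASSUMED_RTT_S = {
    "Local (LAN)": 0.001,
    "Metro Area": 0.005,
    "Regional": 0.020,
    "Continental": 0.080,
    "International": 0.150,
}

for scope, rtt in ASSUMED_RTT_S.items():
    # The bound can exceed the link speed at very small RTTs, so cap it.
    bound = min(mathis_bound_bps(MSS_BYTES, rtt, LOSS), LINK_BPS)
    share = 100 * bound / LINK_BPS
    print(f"{scope:15s} RTT {rtt * 1000:6.1f} ms  <= {bound / 1e6:8.1f} Mbps  ({share:.2f}% of link)")
```

Under these assumptions the bound shrinks in proportion to RTT, which is why the same small loss rate that is tolerable on a LAN becomes crippling on continental and international paths.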
Public perfSONAR Servers (Sept 2016)
• Over 2000 publicly registered servers
– Equal number of non-registered servers?
• ESnet: 50
– mostly 10G, includes a 40G host in Boston
• GEANT: 22
• Internet2: 3
• Some other top deployments:
– Onenet (24), AMPATH (8), bc.net (10), RNP (8), Canarie (13), kreonet (14), NERO (12), AARnet (19), JGN (17), CENIC (5), KANREN (5)
More perfSONAR Statistics
• Total Hosts: 2001
• Total Domains: 413
• 75% are running the latest version (auto-update)
• 7% are running a version < 3.5 (end-of-life, orphan hosts?)
• 27% of the hosts have an IPv6 address
• 38% are .edu hosts
• 75 total top-level domains, 940 domains
More perfSONAR Stats
• 95% RHEL/CentOS; 5% Debian/Ubuntu
• 10% are VMs
• 40% using jumbo frames
• NIC Speed:
– 1 Gbps: 49.06%
– 10 Gbps: 44.17%
– 40 Gbps: 2.88%
– 100 Mbps: 2.32%
perfSONAR 4.0
• perfSONAR 4.0 Feature Tour
– Andy Lake
– 9:00 AM
• pScheduler Deep Dive
– Mark Feit
– 9:30 AM
What comes next?
• The perfSONAR Steering Group is engaged in a project to define a strategic plan for post-perfSONAR 4.0 efforts
• Four basic themes are being considered:
– Operations efficiency
– Automation of configuration, execution, & analysis
– Performance of the cloud
– Care and feeding of the open source effort
Operations Efficiency
• Reduce operations effort required of network operators and data-intensive science collaborations
– Easy deployment of ephemeral pS nodes: perfSONAR containerization (e.g. Docker)
– Automate log analysis to identify key signatures
– Improve dashboard visualization
– Enhance mesh configurations
Automation of Configuration, Execution, & Analysis
• Configuration
– Automatic test configuration
  • Significant progress in perfSONAR 4.0
– Automatic node discovery/insertion into test messages
• Execution
– Auto-detection of problems along the end-to-end path
– Automatically run additional tests
• Analysis
– Anomaly detection tools
– Intelligent alarming (e.g. adaptive via machine learning)
– Archives for research data
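To make the "anomaly detection" theme concrete, here is a minimal sketch of one simple approach: compare each new throughput measurement against the median of the measurements just before it and flag sharp drops. This is not a perfSONAR tool or API. The function name `flag_soft_failures`, the window size, the 50% drop threshold, and the sample data are all hypothetical, chosen only to illustrate the idea of detecting a soft failure from archived test results.

```python
from statistics import median


def flag_soft_failures(samples, window=6, drop_fraction=0.5):
    """Return indices of samples that fall below a fraction of the recent baseline.

    The baseline is the median of the `window` samples immediately preceding
    each point. Hypothetical heuristic, not a perfSONAR algorithm.
    """
    flagged = []
    for i in range(window, len(samples)):
        baseline = median(samples[i - window:i])
        if samples[i] < drop_fraction * baseline:
            flagged.append(i)
    return flagged


# Made-up throughput history in Gbps: steady, then a sharp drop, then recovery.
throughput_gbps = [9.4, 9.3, 9.5, 9.4, 9.2, 9.4, 9.3, 2.1, 1.9, 9.4]

print(flag_soft_failures(throughput_gbps))  # prints [7, 8]
```

A median baseline is used here because it tolerates a few bad samples inside the window. An adaptive or learned threshold, as the slide suggests, would replace the fixed `drop_fraction`.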
Performance of the Cloud
• Cloud services and cloud-based workflows are important going forward.
– To our researchers
– To our labs and universities
• Performance monitoring needs to be automated and "ephemeral", running for as long as the cloud workflow is active
• Support a targeted popular use case
– e.g.: perfSONAR Amazon EC2 deployment
– Showcase what is possible
– Research how to solve the problem more generally
• Build towards a more general solution in follow-on releases
• Observation: Increasing need/use of perfSONAR on VMs
– Inform the user of the limitations of what can be concluded
Care and Feeding
• Expand new offerings
– pScheduler is a new offering (1.0 release) which will prompt a new iteration of community ideas
• Security challenges, vulnerabilities, and inefficiencies
– e.g. Expand unit testing
– Review, refine, and reduce inefficient code components
• Support evolving default operating systems
– e.g. CentOS 6 -> CentOS 7, Debian 7 -> Debian 8 -> Debian 9
• Preferred API interfaces and middleware keep evolving
– e.g. SOAP/XML -> REST/JSON APIs
– Message-passing architectures
– e.g. RabbitMQ
• Custom Code Replacement
– Replace perfSONAR custom code with well-supported packages as they become available
  • ELK (Elasticsearch, Logstash, Kibana)
  • ESnet's React Timeseries Charts
  • IU's Time Series Data Service (TSDS)
• Small Nodes
– Emerging technologies are rapidly evolving
– Maintain a current snapshot of best offerings
Community Milestones & Feedback
• Milestones
– perfSONAR 4.0 RC1 is out this week
– perfSONAR 4.0 RC2 expected in October
– perfSONAR 4.0 out in November
• Feedback
– What do you need from perfSONAR post 4.0?