The perfSONAR Measurement Framework: Project Update …
What is perfSONAR?
• perfSONAR is a tool to:
  – Set (hopefully raise) network performance expectations
  – Find network problems (“soft failures”)
  – Help fix these problems
• All in multi-domain environments
  – These problems are all harder when multiple networks are involved
• perfSONAR provides a standard way to publish active and passive monitoring data
  – This data is interesting to network researchers as well as network operators
6/2/15 2
Target perfSONAR Users
• Network Engineers
• Wide-Area Network Operators
• Distributed Data Managers
  – Large distributed science projects (e.g., LHC)
• perfSONAR is not aimed at end-users
  – Finding the existence of performance problems is not hard with the right tools
  – Finding the cause of performance problems is hard, even with the right tools
  – perfSONAR is a great tool for skilled network engineers to diagnose problems…
• …if there are enough perfSONAR hosts along the path.
May 17, 2016 3
perfSONAR Dashboard: Raising Expectations and Improving Network Visibility
• Status at-a-glance
  – Packet loss
  – Throughput
  – Correctness
• Current live instances at:
  – http://ps-dashboard.es.net/
  – And many more
• Drill-down capabilities:
  – Test history between hosts
  – Ability to correlate with other events
• Very valuable for fault localization and isolation
Problem Statement
• In practice, performance issues are prevalent and distributed.
• When a network is underperforming or errors occur, it is difficult to identify the source, as problems can happen anywhere, in any domain.
• Local-area network testing is not sufficient, as errors can occur between networks.
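Where Are The Problems?
[Figure: path from a source host (S) on the Source Campus through Regional, NREN, and Backbone networks to a destination host (D) on the Destination Campus. Marked problem areas: congested or faulty links between domains; congested intra-campus links; latency-dependent problems inside domains with small RTT.]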
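Local Testing Will Not Find Everything
[Figure: a source host (S) on the Source Campus and a destination host (D) on the Destination Campus, connected through two Regional networks and an R&E Backbone, with a switch with small buffers on the path. Performance is good when RTT is < ~10 ms; performance is poor when RTT exceeds ~10 ms.]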
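One standard way to read the ~10 ms threshold is through the bandwidth-delay product (BDP). To keep a path full, TCP must keep rate × RTT worth of data in flight. A widely used rule of thumb sizes a bottleneck buffer for a single TCP flow at about one BDP. The slide gives no rates or buffer sizes, so the 10 Gbps rate and the RTT values below are assumptions. They only show that the data in flight grows linearly with RTT.

```python
def bdp_bytes(rate_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product in bytes: data in flight needed to fill the path."""
    return rate_bps * rtt_s / 8

# Assumed scenario: one 10 Gbps flow over paths of increasing round-trip time.
for rtt_ms in (1, 10, 50, 100):
    megabytes = bdp_bytes(10e9, rtt_ms / 1000) / 1e6
    print(f"RTT {rtt_ms:>3} ms -> {megabytes:7.2f} MB in flight")
```

Under these assumptions, a 1 ms path needs about 1.25 MB in flight, while a 100 ms path needs about 125 MB. A switch buffer that is ample for local tests can therefore be far too small for a long-distance flow through the same port.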
Hard vs. Soft Failures
• “Hard failures” are the kind of problems every organization understands
  – Fiber cut
  – Power failure takes down routers
  – Hardware ceases to function
• Classic monitoring systems are good at alerting on hard failures
  – i.e., NOC sees something turn red on their screen
  – Engineers paged by monitoring systems
• “Soft failures” are different and often go undetected
  – Basic connectivity (ping, traceroute, web pages, email) works
  – Performance is just poor
Sample Soft Failure: Failing Optics
[Figure: throughput (Gb/s) over one month, showing normal performance, then degrading performance as the optics failed, until the repair.]
A small amount of packet loss makes a huge difference
[Figure: TCP throughput versus path distance (Local (LAN), Metro Area, Regional, Continental, International) for four series: Measured (TCP Reno), Measured (HTCP), Theoretical (TCP Reno), and Measured (no loss).]
With loss, high performance beyond metro distances is essentially impossible.
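The slide does not say how its “Theoretical (TCP Reno)” curve was computed. A common model for such a curve is the Mathis et al. bound on steady-state Reno throughput:

rate ≤ (MSS / RTT) · (C / √p), with C = √(3/2) for periodic loss.

The sketch below evaluates that bound. The 1460-byte MSS, the 0.01% loss rate, and the RTT values are illustrative assumptions, not figures from the slide.

```python
import math

def mathis_throughput_bps(mss_bytes: int, rtt_s: float, loss_rate: float) -> float:
    """Mathis et al. upper bound on steady-state TCP Reno throughput, in bits/s."""
    c = math.sqrt(3 / 2)  # constant for periodic loss without delayed ACKs
    return (mss_bytes * 8 / rtt_s) * (c / math.sqrt(loss_rate))

# Illustrative assumptions: 1460-byte MSS and 0.01% packet loss.
for rtt_ms in (1, 10, 100):
    rate = mathis_throughput_bps(1460, rtt_ms / 1000, 1e-4)
    print(f"RTT {rtt_ms:>3} ms: {rate / 1e6:8.1f} Mbps")
```

With these assumptions, the bound falls from roughly 1.4 Gbps at 1 ms to about 143 Mbps at 10 ms and about 14 Mbps at 100 ms. Because the rate scales as 1/RTT, a loss rate that is tolerable on a LAN caps throughput far below link speed on long paths.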
perfSONAR Collaboration
• The perfSONAR collaboration is an Open Source project led by ESnet, Internet2, Indiana University, and GEANT.
  – Each organization has committed 1.5 FTE effort to the project
  – Plus additional help from many others in the community (OSG, RNP, SLAC, and more)
• The perfSONAR Roadmap is influenced by
  – requests on the project issue tracker
  – annual user surveys sent to everyone on the user list
  – regular meetings with VOs using perfSONAR, such as the WLCG and OSG
  – discussions at various perfSONAR-related workshops
• Based on the above, every 6-12 months the perfSONAR governance group meets to prioritize features based on:
  – impact to the community
  – level of effort required to implement and support
  – availability of someone with the right skill set for the task
May 17, 2016 ©2016, http://www.perfsonar.net 12
Public perfSONAR Servers (May 2016)
• Around 1600 publicly registered servers
  – Equal number of non-registered servers?
• ESnet: 50
  – mostly 10G, includes a 40G host in Boston
  – About 50% are now a ‘combined’ throughput/latency host
• GEANT: 22
  – 100G host coming soon
• Internet2: 3
  – PAS servers are private, used for alarming, but results are available via MaDDash
• Some other top deployments:
  – Onenet (24), AMPATH (8), bc.net (10), RNP (8), Canarie (13), kreonet (14), NERO (12), AARnet (19), JGN (17), CENIC (5), KANREN (5)
More perfSONAR Statistics
• 75% are running the latest version
  – v3.5.1.3, probably running auto-update
• 22% of the hosts have an IPv6 address
• 38% are .edu hosts
• 58 total top-level domains
• 736 domains
• 40% have MTU=9000
• 49% have a 10G NIC
• 2.5% have a 40G NIC
perfSONAR Hardware
• These days you can get a good 1U host capable of pushing 10 Gbps TCP for around $500 (+10G NIC cost, $750?).
  – See perfSONAR user list
• And you can get a host capable of 1G for around $150!
  – Get a multi-core Intel Celeron-based host
  – e.g.: ZBOX by ZOTAC: https://www.zotac.com/us/product/mini_pcs/zbox-ci323-nano
  – ARM is not fast enough!
• VMs are not recommended
  – Tools are more accurate if you can guarantee NIC isolation
Recent Updates
• 3.5 (September 2015)
  – Re-designed Toolkit web interface
  – Introduced bundles
  – Debian 7 support
  – Improved central management features
• 3.5.1 (March 2016)
  – Updated regular testing UI
  – Updated esmond API
  – Synchronized package names and file structures between RedHat and Debian
  – Debian 8 support
Common Themes From Users
• Central meshes useful but hard to set up
• Where should I run tests?
• When will perfSONAR support CentOS 7?
• I wish my dashboard had alerting
• BWCTL/scheduling issues
  – I wish I had more visibility into BWCTL/scheduling
  – Why doesn’t everything get tracked by BWCTL?
  – I want more flexible scheduling
  – How do I add new tools?
perfSONAR 4.0
• Targeting beta in late summer, final in the fall of 2016
• Want to tackle as many issues as we can, but need to keep the scope within what’s possible
Improved Support for Central Management
• Goals:
  – Make it easy to incorporate perfSONAR hosts into existing host management systems (puppet, chef, SaltStack, cfengine, etc.)
    • Include sample puppet config files
  – Make it easy to manage many perfSONAR hosts at a single institution
  – New rpm and debian bundles to support this
Current perfSONAR development
• One of the themes for v4.0 will be “Control and Scalability”
  – perfSONAR is successful because of the ‘default open’ model.
  – BUT, as the number of perfSONAR hosts worldwide grows, we need a way to control:
    • Who is running tests
    • How often they are allowed to run tests
    • What hosts can I run tests to? How do I get my host added to someone else’s list of allowed hosts?
• Working on a new test scheduler (pScheduler):
  – Shared by all tests and aware of the resources each uses
  – Containing finer-grained controls about who can run tests and what tests they are allowed to run
  – Increased visibility and control as to when tests will be run
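The slides do not describe pScheduler’s limit configuration. The following is a hypothetical sketch of the kind of control listed above: each requester has a set of allowed test types and a cap on how often it may run tests. A default (“*”) policy lets unknown requesters keep a restricted form of the ‘default open’ model. The Limit class, the check function, the “*” key, the requester names, and the one-hour sliding window are all assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class Limit:
    """Hypothetical policy: which test types a requester may run, and how often."""
    allowed_types: set[str]
    max_per_hour: int
    history: list[float] = field(default_factory=list)  # admitted request times (s)

    def admit(self, test_type: str, now: float) -> bool:
        # Sliding one-hour window: forget requests older than 3600 s
        self.history = [t for t in self.history if now - t < 3600]
        if test_type not in self.allowed_types or len(self.history) >= self.max_per_hour:
            return False
        self.history.append(now)
        return True

def check(limits: dict[str, Limit], requester: str, test_type: str, now: float) -> bool:
    """Apply the requester's own policy, falling back to the shared '*' default."""
    limit = limits.get(requester, limits.get("*"))
    return limit is not None and limit.admit(test_type, now)

# Hypothetical policies: one trusted campus, and latency-only access for everyone else.
limits = {
    "campus-a": Limit({"throughput", "latency"}, max_per_hour=2),
    "*": Limit({"latency"}, max_per_hour=10),
}
print(check(limits, "campus-a", "throughput", now=0))      # True: admitted
print(check(limits, "unknown-host", "throughput", now=0))  # False: type not allowed
```

A real limit system would also match requesters by address or network and might cap test duration or bandwidth. This sketch only shows the two controls named on the slide.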
Roadmap for the Next Release
• New graphs that allow for easier comparison of multiple metrics
  – based on ESnet Tools team React-based plotting tools
• A web interface for creating test meshes
• Easier selection of endpoints based on topology location, geographic location, accessibility, and/or custom searches
• Dashboards that support alerting based on patterns across an entire mesh
• CentOS 7/Debian 8 support
New: Endpoint Selection
• Common feedback is that it’s hard to determine where to test
• We won’t solve all the problems this release, but we are trying to put some infrastructure in place
  – Gathering more metadata in the lookup service
  – Leveraging existing information to find the closest endpoint based on traceroute
  – Looking at ways to determine endpoint accessibility (i.e., is there a firewall? Does it block me?)
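The slide does not say how “closest endpoint based on traceroute” will be computed. One plausible interpretation, offered only as a hypothetical sketch, is to pick the registered perfSONAR host whose own traceroute path shares the most leading hops with the path to the destination. Ties go to the host with the shorter path. The function names, hop addresses, and host names below are invented for illustration.

```python
def shared_prefix(a: list[str], b: list[str]) -> int:
    """Number of leading hops two traceroute paths have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def closest_endpoint(path_to_dest: list[str], candidates: dict[str, list[str]]) -> str:
    """Pick the candidate whose path overlaps most with the path to the destination,
    preferring shorter candidate paths on ties."""
    return max(candidates,
               key=lambda host: (shared_prefix(path_to_dest, candidates[host]),
                                 -len(candidates[host])))

# Hypothetical hop lists (documentation address ranges).
path = ["10.0.0.1", "192.0.2.1", "198.51.100.1", "203.0.113.1"]
candidates = {
    "ps-campus": ["10.0.0.1", "10.0.0.9"],
    "ps-regional": ["10.0.0.1", "192.0.2.1", "192.0.2.50"],
    "ps-backbone": ["10.0.0.1", "192.0.2.1", "198.51.100.1", "198.51.100.20"],
}
print(closest_endpoint(path, candidates))  # ps-backbone
```

Real traceroutes contain unresponsive hops and load-balanced paths, so a practical version would need to tolerate gaps rather than compare hop lists strictly.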
Current: MaDDash
• Status at-a-glance
  – Packet loss
  – Throughput
  – Correctness
• Current live instances at:
  – http://ps-dashboard.es.net/
  – And many more
• Drill-down capabilities:
  – Test history between hosts
  – Ability to correlate with other events
• Very valuable for fault localization and isolation
• Currently no way to receive a push notification of an issue
New: Introducing MaDAlert
• Developed at the University of Michigan
• Looks at dashboards and scans for patterns
  – Example: if every box for a host is orange, that is a good indication the host is down
• Provides a REST API to results
• Currently a GUI to look at just the alerts
  – http://madalert.aglt2.org/
• Working on Nagios checks so you can leverage that notification system without flooding yourself with emails
• Also hoping to integrate with the MaDDash UI to make identifying common problems easier
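MaDAlert’s pattern rules are not described beyond the example above. The following is a hypothetical sketch of that one pattern. Given a MaDDash-style grid of (source, destination) boxes and their colors, it flags every host whose boxes, as source or destination, are all orange. Representing the grid as a dict keyed by host pairs and the function name are assumptions.

```python
from collections import defaultdict

def hosts_likely_down(grid: dict[tuple[str, str], str], color: str = "orange") -> list[str]:
    """Return hosts whose every dashboard box (as source or destination) has `color`."""
    boxes: dict[str, list[str]] = defaultdict(list)
    for (src, dst), status in grid.items():
        boxes[src].append(status)
        boxes[dst].append(status)
    return sorted(h for h, statuses in boxes.items() if all(s == color for s in statuses))

# Hypothetical three-host mesh in which every box involving host "c" is orange.
grid = {
    ("a", "b"): "green", ("b", "a"): "green",
    ("a", "c"): "orange", ("c", "a"): "orange",
    ("b", "c"): "orange", ("c", "b"): "orange",
}
print(hosts_likely_down(grid))  # ['c']
```

Hosts "a" and "b" are not flagged, because each still has green boxes with the other. That is what separates a host that is down from a single bad path.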
New: CentOS 7
• Current toolkits run on CentOS 6
• Not a flashy change, but survey results show the migration at many institutions to RedHat/CentOS 7 is already well underway
• Current plan is to provide CentOS 6 AND CentOS 7 RPMs of all the 4.0 packages
• Likely will only be providing a CentOS 7 Toolkit ISO for this release
perfSONAR Toolkit
• Currently most people run the perfSONAR Toolkit
  – Full suite of perfSONAR tools to configure, execute, collect, and visualize measurement results
  – CentOS-based ISO pre-tuned and configured with default system and security settings
perfSONAR Bundles
• perfsonar-tools
  – Just the basics: iperf, iperf3, bwctl, owamp
• perfsonar-testpoint
  – Tools + regular testing, LS registration
• perfsonar-core
  – Testpoint + esmond (for storing results)
perfSONAR Bundle Selection Guide
1. START: Will this host primarily run regularly scheduled measurements?
  – Yes (dedicated measurement hosts solely tasked with performing network measurements): go to question 3.
  – No (central measurement archives; data transfer nodes; hosts that use the network for purposes beyond just measurement): go to question 2.
2. Is this host going to centrally archive or manage my other measurement hosts?
  – Yes (central measurement archives): install perfsonar-centralmanagement.
  – No (data transfer nodes; any other host that uses a network): install perfsonar-tools.
3. Do I want to manage each host through the web UI?
  – Yes (hosts part of a small deployment (1-2 hosts); hosts run by new perfSONAR users wanting to explore the full set of features from collection to display): install perfsonar-toolkit.
  – No (hosts part of a large deployment, usually centrally managed by Puppet, CFEngine, etc.; hosts running on minimal hardware): go to question 4.
4. Do I want to store my measurements in an archive that runs on this host?
  – Yes (hosts without access to a central archive, such as those in a large deployment that do not wish to deal with the extra effort required to run a large central archive): install perfsonar-core.
  – No (hosts running on minimal hardware; hosts with access to a central measurement archive; hosts that are part of a centrally managed mesh; you want a registered testpoint that others can run tests to): install perfsonar-testpoint.
Other useful packages:
• Dashboard: maddash
• Host configuration: perfsonar-toolkit-ntp, perfsonar-toolkit-sysctl, perfsonar-toolkit-security
• Nagios: nagios-plugins-perfsonar
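The bundle selection guide’s questions can be encoded as a small function, which makes the branch order explicit. This sketch assumes the flowchart’s order: the regular-measurements question first, then either the central-archive question or the web-UI and local-archive questions. The function and parameter names are hypothetical. Only the package names come from the guide.

```python
def select_bundle(*, regular_measurements: bool, central_archive_or_manager: bool = False,
                  manage_via_web_ui: bool = False, local_archive: bool = False) -> str:
    """Return the bundle the selection guide recommends for one host.

    Each (hypothetical) parameter answers one question in the guide.
    """
    if not regular_measurements:
        # Central archives/managers vs. data transfer nodes and other general hosts
        return "perfsonar-centralmanagement" if central_archive_or_manager else "perfsonar-tools"
    if manage_via_web_ui:
        # Small deployments and new users get the full Toolkit
        return "perfsonar-toolkit"
    # Centrally managed or minimal-hardware hosts: archive locally or not
    return "perfsonar-core" if local_archive else "perfsonar-testpoint"

# A data transfer node that only needs the basic tools:
print(select_bundle(regular_measurements=False))  # perfsonar-tools
```

Keyword-only arguments keep each call readable as a set of answers to the guide’s questions. The defaults correspond to answering “No”.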
Current: MeshConfig
• The idea of a central mesh file is to define tests in one place for all your hosts
• Basic process is:
  1. Manually create configuration file
  2. Convert to JSON using provided script
  3. Publish JSON on web server
  4. Point clients at JSON to figure out what tests to run (optionally point MaDDash at JSON to display results)
Current: MeshConfig File
• Can grow quickly
• ESnet has one about 10,000 lines long:
  – https://github.com/esnet/esnet-perfsonar-mesh/blob/master/conf/esnet-mesh_config.conf
New: MeshConfig Admin UI
• Replaces the need to edit a text file by hand
  – Based on work done for OSG
• Automatically pulls hosts from the lookup service
• Access control allows you to assign different admins edit rights to different meshes
• Automatically produces URLs to JSON specific to each host (i.e., a host does not need to see tests it is not involved in)
Current: Test Scheduling
• Currently two components are in charge of scheduling and executing tests:
  – BWCTL
  – perfSONAR Regular Testing
• BWCTL has been around a number of years and is good at what it does… but it’s challenging to make it do more
• Lots of requests for:
  – Greater visibility into the scheduler
  – Making it easier to pipe more tools through it so there aren’t unexpected conflicts
  – Support for different ways to define schedules
  – Better ability to request tests on-demand
  – Greater ability to extend in general
New: pScheduler
• Completely new software to handle all the stuff BWCTL and regular testing could do… plus more
• REST API allows tests to be requested, cancelled, viewed, etc.
• Plug-in framework for writing new tools
  – Plug-ins for all the existing tools included at launch
  – Plug-ins can be written in any language
  – System for normalizing output between similar tools
• Plug-in framework for writing to different archivers
• Keeps state in a database so it maintains the schedule between reboots, outages, etc.
• Working on designing a more flexible limits and resource management infrastructure
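The slides do not describe pScheduler’s scheduling algorithm. The following is only a hypothetical sketch of what a scheduler “aware of the resources each [test] uses” can do. Each run declares the resources it needs, and the resource names here are invented. A new run is placed at the earliest time at which it overlaps no already-scheduled run that shares a resource. Runs on disjoint resources may overlap freely.

```python
from dataclasses import dataclass

@dataclass
class Run:
    """A scheduled test run; times are seconds from an arbitrary epoch."""
    name: str
    start: int
    duration: int
    resources: frozenset[str]

    @property
    def end(self) -> int:
        return self.start + self.duration

def earliest_slot(schedule: list[Run], earliest: int, duration: int,
                  resources: frozenset[str]) -> int:
    """First start time >= earliest whose interval overlaps no run sharing a resource."""
    start = earliest
    while True:
        clashes = [r for r in schedule
                   if r.resources & resources and r.start < start + duration and start < r.end]
        if not clashes:
            return start
        # Jump past the conflicting runs and try again (start strictly increases).
        start = max(r.end for r in clashes)

# Hypothetical schedule: two back-to-back throughput tests hold the "nic" resource.
schedule = [Run("throughput A->B", 0, 20, frozenset({"nic"})),
            Run("throughput A->C", 20, 20, frozenset({"nic"}))]
print(earliest_slot(schedule, 5, 20, frozenset({"nic"})))      # 40: waits for the NIC
print(earliest_slot(schedule, 5, 10, frozenset({"latency"})))  # 5: no shared resource
```

The greedy “jump to the latest conflicting end” step keeps the search short. A production scheduler would also persist the schedule, which the slide notes pScheduler does via its database.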