The perfSONAR Measurement Framework: Project Update and Roadmap http://www.perfsonar.net May 17, 2016


What is perfSONAR?
• perfSONAR is a tool to:
– Set (hopefully raise) network performance expectations
– Find network problems (“soft failures”)
– Help fix these problems
• All in multi-domain environments
• These problems are all harder when multiple networks are involved
• perfSONAR provides a standard way to publish active and passive monitoring data
– This data is interesting to network researchers as well as network operators

Target perfSONAR Users
• Network Engineers
• Wide-Area Network Operators
• Distributed Data Managers
– Large distributed science projects (e.g., LHC)
• perfSONAR is not aimed at end-users
– Finding the existence of performance problems is not hard with the right tools
– Finding the cause of performance problems is hard, even with the right tools
– perfSONAR is a great tool for skilled network engineers to diagnose problems
• If there are enough perfSONAR hosts along the path.

perfSONAR Dashboard: Raising Expectations and Improving Network Visibility
Status at-a-glance
• Packet loss
• Throughput
• Correctness
Current live instances at:
• http://ps-dashboard.es.net/
• And many more
Drill-down capabilities:
• Test history between hosts
• Ability to correlate with other events
• Very valuable for fault localization and isolation

When did network change occur?

Problem Statement
• In practice, performance issues are prevalent and distributed.
• When a network is underperforming or errors occur, it is difficult to identify the source, as problems can happen anywhere, in any domain.
• Local-area network testing is not sufficient, as errors can occur between networks.

Where Are The Problems?
[Figure: path from source (S) to destination (D) across Source Campus, Regional, Backbone, NREN, and Destination Campus networks. Callouts: congested or faulty links between domains; congested intra-campus links; latency-dependent problems inside domains with small RTT.]

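Local Testing Will Not Find Everything
[Figure: path from source (S) to destination (D) across Source Campus, Regional, R&E Backbone, Regional, and Destination Campus networks, including a switch with small buffers. Callouts: performance is good when RTT is < ~10 ms; performance is poor when RTT exceeds ~10 ms.]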
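One way to see why RTT matters so much on this slide is the bandwidth-delay product: the amount of data a sender must keep in flight to fill a path grows linearly with RTT, so long paths recover from a drop at a small-buffered switch far more slowly than short ones. A minimal sketch, assuming a 10 Gbps path and illustrative RTT values that are not taken from the slides:

```python
def bdp_bytes(bandwidth_bps: float, rtt_s: float) -> float:
    """Bytes that must be in flight to keep a path of this speed and RTT full."""
    return bandwidth_bps * rtt_s / 8


# Illustrative values only: a 10 Gbps path at several round-trip times.
for rtt_ms in (1, 10, 50, 100):
    megabytes = bdp_bytes(10e9, rtt_ms / 1000) / 1e6
    print(f"RTT {rtt_ms:>3} ms -> {megabytes:.2f} MB in flight")
```

A test between two hosts on the same campus exercises only the smallest of these windows, which is why it can pass while a wide-area transfer through the same switch performs poorly.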

Hard vs. Soft Failures
• “Hard failures” are the kind of problems every organization understands
– Fiber cut
– Power failure takes down routers
– Hardware ceases to function
• Classic monitoring systems are good at alerting hard failures
– i.e., NOC sees something turn red on their screen
– Engineers paged by monitoring systems
• “Soft failures” are different and often go undetected
– Basic connectivity (ping, traceroute, web pages, email) works
– Performance is just poor

Sample Soft Failure: Failing Optics
[Figure: plot of throughput (Gb/s) over time. Annotations: normal performance; degrading performance; one month; repair.]

A small amount of packet loss makes a huge difference

[Figure: plot across path distances (Local (LAN), Metro Area, Regional, Continental, International). Series: Measured (TCP Reno), Measured (HTCP), Theoretical (TCP Reno), Measured (no loss).]

With loss, high performance beyond metro distances is essentially impossible
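The shape of a theoretical TCP Reno curve can be reproduced with the widely used Mathis approximation, rate ≤ (MSS / RTT) × (C / √p) with C ≈ √(3/2). The sketch below assumes that this is the model behind the "Theoretical (TCP Reno)" series; the MSS, loss rate, and RTT values are illustrative and not taken from the slide.

```python
import math


def mathis_throughput_bps(mss_bytes: int, rtt_s: float, loss: float) -> float:
    """Approximate upper bound on TCP Reno throughput (bits/s) under random loss."""
    c = math.sqrt(3 / 2)
    return (mss_bytes * 8 / rtt_s) * (c / math.sqrt(loss))


# Illustrative: 1460-byte MSS, 0.01% packet loss, RTTs from LAN to international.
for label, rtt_ms in [("LAN", 1), ("Metro", 5), ("Regional", 20),
                      ("Continental", 50), ("International", 100)]:
    mbps = mathis_throughput_bps(1460, rtt_ms / 1000, 1e-4) / 1e6
    print(f"{label:<14} RTT {rtt_ms:>3} ms -> {mbps:8.1f} Mbps")
```

Because the bound is inversely proportional to RTT, the same loss rate that is barely noticeable on a LAN caps a 100 ms path at one hundredth of the LAN figure.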

perfSONAR Collaboration
• The perfSONAR collaboration is an open source project led by ESnet, Internet2, Indiana University, and GEANT.
– Each organization has committed 1.5 FTE effort to the project
– Plus additional help from many others in the community (OSG, RNP, SLAC, and more)
• The perfSONAR Roadmap is influenced by
– requests on the project issue tracker
– annual user surveys sent to everyone on the user list
– regular meetings with VOs using perfSONAR such as the WLCG and OSG
– discussions at various perfSONAR-related workshops
• Based on the above, every 6-12 months the perfSONAR governance group meets to prioritize features based on:
– impact to the community
– level of effort required to implement and support
– availability of someone with the right skill set for the task


Public perfSONAR Servers (May 2016)
• Around 1600 publicly registered servers
– Equal number of non-registered servers?
• ESnet: 50
– mostly 10G, includes a 40G host in Boston
– About 50% are now a ‘combined’ throughput/latency host
• GEANT: 22
– 100G host coming soon
• Internet2: 3
– PAS servers are private, used for alarming, but results are available via MaDDash
• Some other top deployments:
– Onenet (24), AMPATH (8), bc.net (10), RNP (8), Canarie (13), kreonet (14), NERO (12), AARnet (19), JGN (17), CENIC (5), KANREN (5)

Who is running perfSONAR?
http://stats.es.net/ServicesDirectory/

More perfSONAR Statistics
• 75% are running latest version
– v3.5.1.3, probably running auto-update
• 22% of the hosts have an IPv6 address
• 38% are .edu hosts
• 58 total top-level domains
• 736 domains
• 40% have MTU = 9000
• 49% have a 10G NIC
• 2.5% have a 40G NIC

perfSONAR Hardware
• These days you can get a good 1U host capable of pushing 10 Gbps TCP for around $500 (+ 10G NIC cost, $750?).
– See perfSONAR user list
• And you can get a host capable of 1G for around $150!
– Get a multi-core Intel Celeron-based host (ARM is not fast enough!)
– e.g., ZBOX by ZOTAC: https://www.zotac.com/us/product/mini_pcs/zbox-ci323-nano
• VMs are not recommended
– Tools are more accurate if you can guarantee NIC isolation

PERFSONAR DETAILS

perfSONAR Components


Recent Updates
• 3.5 (September 2015)
– Re-designed Toolkit web interface
– Introduced bundles
– Debian 7 support
– Improved central management features
• 3.5.1 (March 2016)
– Updated regular testing UI
– Updated esmond API
– Synchronized package names and file structures between Red Hat and Debian
– Debian 8 support

Common Themes From Users
• Central meshes useful but hard to set up
• Where should I run tests?
• When will perfSONAR support CentOS 7?
• I wish my dashboard had alerting
• BWCTL/scheduling issues
– I wish I had more visibility into BWCTL/scheduling
– Why doesn’t everything get tracked by BWCTL?
– I want more flexible scheduling
– How do I add new tools?

perfSONAR 4.0
• Targeting beta in late summer, final in the fall of 2016
• Want to tackle as many issues as we can but need to keep scoped within what’s possible

Improved Support for Central Management
• Goals:
– Make it easy to incorporate perfSONAR hosts into existing host management systems (Puppet, Chef, SaltStack, CFEngine, etc.)
• Include sample Puppet config files
– Make it easy to manage many perfSONAR hosts at a single institution
– New RPM and Debian bundles to support this

Current perfSONAR Development
• One of the themes for v4.0 will be “Control and Scalability”
– perfSONAR is successful because of the ‘default open’ model.
– BUT, as the number of perfSONAR hosts worldwide grows, we need a way to control
• Who is running tests
• How often are they allowed to run tests
• What hosts can I run tests to? How do I get my host added to someone else’s list of allowed hosts?
• Working on a new test scheduler (pScheduler):
– Shared by all tests and aware of the resources each uses
– Containing finer-grained controls about who can run tests and what tests they are allowed to run.
– Increased visibility and control as to when tests will be run
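The two scheduler goals on this slide, awareness of the resources each test uses and controls over who may run which tests, can be illustrated with a toy model. This is only a sketch under stated assumptions: the `Run` and `Scheduler` classes, the `EXCLUSIVE` set, and the rule that only throughput tests conflict with one another are all hypothetical and do not describe pScheduler's actual design.

```python
from dataclasses import dataclass, field

# Hypothetical policy: test types that must not overlap with each other.
EXCLUSIVE = {"throughput"}


@dataclass
class Run:
    requester: str
    test_type: str
    start: int      # seconds, arbitrary epoch
    duration: int   # seconds


@dataclass
class Scheduler:
    allowed: dict                       # requester -> set of permitted test types
    runs: list = field(default_factory=list)

    def request(self, new: Run):
        """Return (accepted, reason); record the run only if it is accepted."""
        if new.test_type not in self.allowed.get(new.requester, set()):
            return False, "requester not allowed to run this test type"
        for r in self.runs:
            overlaps = (new.start < r.start + r.duration
                        and r.start < new.start + new.duration)
            if overlaps and new.test_type in EXCLUSIVE and r.test_type in EXCLUSIVE:
                return False, f"conflicts with {r.test_type} run at t={r.start}"
        self.runs.append(new)
        return True, "scheduled"


sched = Scheduler(allowed={"alice": {"throughput", "latency"}, "bob": {"latency"}})
print(sched.request(Run("alice", "throughput", start=0, duration=30)))
print(sched.request(Run("alice", "throughput", start=10, duration=30)))
print(sched.request(Run("bob", "latency", start=10, duration=60)))
print(sched.request(Run("bob", "throughput", start=100, duration=30)))
```

Keeping the permission check and the conflict check in one place is what gives a single scheduler the visibility that separate per-tool daemons lack.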

Roadmap for the Next Release
• New graphs that allow for easier comparison of multiple metrics
– based on ESnet Tools team React-based plotting tools
• A web interface for creating test meshes
• Easier selection of endpoints based on topology location, geographic location, accessibility and/or custom searches
• Dashboards that support alerting based on patterns across an entire mesh
• CentOS 7/Debian 8 support

New: Endpoint Selection
• Common feedback is that it’s hard to determine where to test
• We won’t solve all the problems this release, but trying to put some infrastructure in place
– Gathering more metadata in lookup service
– Leverage existing information to find closest endpoint based off of traceroute
– Looking at ways to determine endpoint accessibility (i.e., is there a firewall? Does it block me?)
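The slide does not say how traceroute data would be used to pick an endpoint. One plausible heuristic, shown here purely as an assumption, is to prefer the candidate whose path from the local host shares the most leading hops with the path to the destination of interest. The function names are hypothetical and the addresses come from the documentation ranges.

```python
def shared_prefix_len(a, b):
    """Number of leading hops two traceroute paths have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def closest_endpoint(path_to_target, candidate_paths):
    """Return the candidate whose path shares the most leading hops, or None.

    candidate_paths maps a candidate hostname to its list of traceroute hops.
    """
    if not candidate_paths:
        return None
    return max(candidate_paths,
               key=lambda host: shared_prefix_len(path_to_target, candidate_paths[host]))


# Hypothetical traceroute results (hop addresses only).
target = ["192.0.2.1", "198.51.100.1", "198.51.100.9", "203.0.113.5"]
candidates = {
    "ps-a.example.net": ["192.0.2.1", "198.51.100.1", "198.51.100.9", "203.0.113.7"],
    "ps-b.example.net": ["192.0.2.1", "198.51.100.2", "203.0.113.40"],
}
print(closest_endpoint(target, candidates))
```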

Current: MaDDash
• Status at-a-glance
– Packet loss
– Throughput
– Correctness
• Current live instances at:
– http://ps-dashboard.es.net/
– And many more
• Drill-down capabilities:
– Test history between hosts
– Ability to correlate with other events
• Very valuable for fault localization and isolation
• Currently no way to be pushed a notification of an issue

New: Introducing MaDAlert
• Developed at University of Michigan
• Looks at dashboards and scans for patterns
– Example: If every box for a host is orange, good indication host is down
• Provides REST API to results
• Currently a GUI to look at just the alerts
– http://madalert.aglt2.org/
• Working on Nagios checks so can leverage that notification system without flooding yourself with emails
• Also hoping to integrate with MaDDash UI to make identifying common problems easier
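The "every box for a host is orange" example amounts to a scan over the rows and columns of a dashboard grid. The sketch below assumes the grid is available as a mapping from (source, destination) pairs to a status string. That structure and the function name are hypothetical and are not MaDAlert's actual data model or API.

```python
def hosts_with_uniform_status(grid, status="orange"):
    """Return hosts for which every cell they appear in has the given status.

    grid maps (source_host, destination_host) -> status string.
    """
    cells_by_host = {}
    for (src, dst), cell_status in grid.items():
        cells_by_host.setdefault(src, []).append(cell_status)
        cells_by_host.setdefault(dst, []).append(cell_status)
    return sorted(host for host, statuses in cells_by_host.items()
                  if all(s == status for s in statuses))


# Hypothetical three-host mesh in which host "c" cannot be tested at all.
grid = {
    ("a", "b"): "green", ("b", "a"): "green",
    ("a", "c"): "orange", ("c", "a"): "orange",
    ("b", "c"): "orange", ("c", "b"): "orange",
}
print(hosts_with_uniform_status(grid))
```

Reporting one alert per suspect host, rather than one per failing cell, is what keeps a notification system such as Nagios from flooding operators with emails.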

New: CentOS 7
• Current toolkits run on CentOS 6
• Not a flashy change, but survey results show the migration at many institutions to Red Hat/CentOS 7 is already well underway
• Current plan is to provide CentOS 6 AND CentOS 7 RPMs of all the 4.0 packages
• Likely will only be providing CentOS 7 Toolkit ISO for this release

PERFSONAR INSTALLATION OPTIONS

perfSONAR Toolkit
• Currently most people run the perfSONAR Toolkit
– Full suite of perfSONAR tools to configure, execute, collect, and visualize measurement results
– CentOS-based ISO pre-tuned and configured with default system and security settings

perfSONAR Bundles
• perfsonar-tools
– Just the basics: iperf, iperf3, bwctl, owamp
• perfsonar-testpoint
– Tools + regular testing, LS registration
• perfsonar-core
– Testpoint + esmond (for storing results)

perfSONAR Bundle Selection Guide
[Figure: decision flowchart, listed here question by question]

START: Will this host primarily run regularly scheduled measurements?
– Who answers "Yes"? Dedicated measurement hosts solely tasked with performing network measurements
– Who answers "No"? Central measurement archives; data transfer nodes; hosts that use the network for purposes beyond just measurement

If "No": Is this host going to centrally archive or manage my other measurement hosts?
– Yes: Install perfsonar-centralmanagement. Who answers "Yes"? Central measurement archives
– No: Install perfsonar-tools. Who answers "No"? Data transfer nodes; any other host that uses a network

If "Yes": Do I want to manage each host through the web UI?
– Yes: Install perfsonar-toolkit. Who answers "Yes"? Hosts part of a small deployment (1-2 hosts); hosts run by new perfSONAR users wanting to explore the full set of features from collection to display
– No: continue to the next question. Who answers "No"? Hosts part of a large deployment, usually centrally managed by Puppet, CFEngine, etc.; hosts running on minimal hardware

Do I want to store my measurements in an archive that runs on this host?
– Yes: Install perfsonar-core. Who answers "Yes"? Hosts without access to a central archive, such as those in a large deployment that do not wish to deal with the extra effort required to run a large central archive
– No: Install perfsonar-testpoint. Who answers "No"? Hosts running on minimal hardware; hosts with access to a central measurement archive; hosts that are part of a centrally managed mesh; you want a registered testpoint that others can run tests to

Other Useful Packages:
– Dashboard: maddash
– Host Configuration: perfsonar-toolkit-ntp, perfsonar-toolkit-sysctl, perfsonar-toolkit-security
– Nagios: nagios-plugins-perfsonar
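The bundle selection guide can be read as a small decision function. The sketch below is one reading of the flowchart, assuming the questions are asked in the order regular testing, central management, web UI, local archive. The function and parameter names are hypothetical; only the bundle names come from the slide.

```python
def choose_bundle(regular_tests: bool,
                  central_manager: bool = False,
                  web_ui: bool = False,
                  local_archive: bool = False) -> str:
    """Map answers to the guide's four questions onto a bundle name."""
    if not regular_tests:
        # Hosts that do not primarily run scheduled measurements.
        return "perfsonar-centralmanagement" if central_manager else "perfsonar-tools"
    if web_ui:
        return "perfsonar-toolkit"
    return "perfsonar-core" if local_archive else "perfsonar-testpoint"


# A data transfer node that only needs the on-demand tools:
print(choose_bundle(regular_tests=False))
# A centrally managed measurement host with access to a central archive:
print(choose_bundle(regular_tests=True))
# A new user exploring the full feature set on one host:
print(choose_bundle(regular_tests=True, web_ui=True, local_archive=True))
```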

Current: MeshConfig
• The idea of a central mesh file is to define tests in one place for all your hosts
• Basic process is:
1. Manually create configuration file
2. Convert to JSON using provided script
3. Publish JSON on web server
4. Point clients at JSON to figure out what tests to run (optionally point MaDDash at JSON to display results)

Current: MeshConfig File
• Can grow quickly
• ESnet has one about 10,000 lines long:
– https://github.com/esnet/esnet-perfsonar-mesh/blob/master/conf/esnet-mesh_config.conf

New: MeshConfig Admin UI
• Replaces need to edit text file by hand
– Based on work done for OSG
• Automatically pulls hosts from lookup service
• Access control allows you to assign different admins edit rights to different meshes
• Automatically produces URLs to JSON specific for each host (i.e., no need to see tests not involved in)

NEW TEST SCHEDULER: PSCHEDULER

Current: Test Scheduling
• Currently two components are in charge of scheduling and executing tests:
– BWCTL
– perfSONAR Regular Testing
• BWCTL has been around a number of years and is good at what it does… but it’s challenging to make it do more
• Lots of requests for:
– Greater visibility into scheduler
– Make it easier to pipe more tools through it so there aren’t unexpected conflicts
– Support for different ways to define schedules
– Better ability to request tests on-demand
– Greater ability to extend in general

3.5 Data Collection

4.0 Data Collection

New: pScheduler
• Completely new software to handle all the stuff BWCTL and regular testing could do… plus more
• REST API allows tests to be requested, cancelled, viewed, etc.
• Plug-in framework for writing new tools
– Plug-ins for all the existing tools included at launch
– Plug-in can be written in any language
– System for normalizing output between similar tools
• Plug-in framework for writing to different archivers
• Keeps state in database so maintain schedule between reboots, outages, etc.
• Working on designing a more flexible limits and resource management infrastructure
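The idea of a plug-in framework that normalizes output between similar tools can be shown with a small registry. Everything in this sketch is hypothetical: the tool names `tool_a` and `tool_b`, their raw result fields, and the `throughput_bps` key are invented for illustration and are not pScheduler's plug-in interface.

```python
PARSERS = {}


def plugin(tool_name):
    """Decorator that registers a result parser for one measurement tool."""
    def register(fn):
        PARSERS[tool_name] = fn
        return fn
    return register


@plugin("tool_a")
def parse_tool_a(raw):
    # Hypothetical tool that reports bits per second.
    return {"throughput_bps": float(raw["bits_per_second"])}


@plugin("tool_b")
def parse_tool_b(raw):
    # Hypothetical tool that reports megabits per second.
    return {"throughput_bps": float(raw["mbps"]) * 1e6}


def normalize(tool_name, raw):
    """Convert a tool-specific result into the common result shape."""
    return PARSERS[tool_name](raw)


print(normalize("tool_a", {"bits_per_second": 9.4e9}))
print(normalize("tool_b", {"mbps": 9400}))
```

With results in one shape, an archiver plug-in or a dashboard does not need to know which tool produced a given measurement.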

Useful URLs
• http://docs.perfsonar.net/
• http://www.perfsonar.net/
• http://fasterdata.es.net/
– http://fasterdata.es.net/performance-testing/network-troubleshooting-tools/
• https://github.com/perfsonar
– https://github.com/perfsonar/project/wiki