Post on 12-Apr-2018
Deliverable D4.41
Monitoring and Maintenance - Interim
Editor G. Gardikis (SPH)
Contributors I. Koutras, G. Mavroudis, S. Costicoglou (SPH), G. Dimosthenous, D. Christofi (PTL), M. Di Girolamo (HPE), K. Karras (FINT), G. Xilouris, E. Trouva (NCSRD), M. Arnaboldi (ITALTEL), P. Harsh (ZHAW), E. Markakis (TEIC)
Version 1.0
Date November 30th, 2015
Distribution PUBLIC (PU)
Ref. Ares(2016)560809 - 02/02/2016
T-NOVA | Deliverable D4.41 Monitoring and Maintenance - Interim
© T-NOVA Consortium
Executive Summary
This deliverable is an interim report of the work currently being carried out in Task 4.4 (Monitoring and Maintenance). The task focuses on the implementation and integration of a monitoring framework, able to extract, process and communicate monitoring information from both physical and virtual nodes as well as VNFs at IVM level.
The first step is the consolidation of IVM requirements, as expressed in Deliverable D2.32, in order to derive the specific requirements for the monitoring framework. The latter include: monitoring of all NFVI domains (hypervisor/compute/storage/network) as well as VNF applications; processing and generation of events and alarms; communication of monitoring information as well as events/alarms to the Orchestrator in a scalable manner.
In parallel, a comprehensive survey of cloud and network monitoring tools is performed, in order to identify technologies which can be re-used for VIM monitoring. Special emphasis is put on frameworks which integrate smoothly with OpenStack, in particular OpenStack Telemetry/Ceilometer, Monasca, Gnocchi, Cyclops, Zabbix, Nagios as well as relevant OPNFV projects (Doctor and Prediction). It seems that most of the existing technological enablers for VIM monitoring can only partially address all the aforementioned challenges in a lightweight and resource-efficient manner. Although most of them are indeed open and modular, they are already quite complicated and resource-demanding, and therefore further expanding them to cover these needs would require considerable effort and would raise efficiency issues. We thus propose a “clean-slate” approach towards NFV monitoring at VIM level, exploiting only some basic enablers and adding only the required functionalities.
The T-NOVA VIM monitoring framework is introduced as a contribution towards this direction. The framework is built around the VIM Monitoring Manager (VIM MM), which is the key component devoted to monitoring at VIM level. The VIM MM exploits OpenStack and OpenDaylight APIs to retrieve a set of metrics for both physical and virtual nodes, which should be sufficient for most NFV handling requirements. However, in order to gain a more detailed insight into the VNF status and operation, a Monitoring Agent, based on the collectd framework, is also introduced in each VNF VM, collecting a large variety of metrics at frequent intervals.
The VIM MM consists of the following components:
• OpenStack and OpenDaylight connectors, used to periodically poll the two platforms via their monitoring APIs.
• A VNF Application connector, which accepts data periodically dispatched by the VNF application. These metrics are specific to each VNF.
• A time-series database (InfluxDB) for data persistence.
• An alarming/anomaly detection engine (currently under development), which utilises statistical methods based on pre-defined but also dynamic thresholds in order to identify possible anomalies in the NFV service and to produce the corresponding alarms/events to be forwarded to the Orchestrator/VNFM.
• A Graphical User Interface (GUI), based on Grafana, which visualizes the stored metrics and presents them as live, time-series graphs.
• A Northbound API, which communicates selected metrics and events to the Orchestrator and, in turn, to the VNF Manager(s). The provided REST API allows metrics to be communicated in either push or pull mode.
The VIM monitoring framework is integrated, validated, evaluated and released as open-source in the frame of the project.
It is concluded that, with the proposed approach, the goal of delivering an effective, efficient and scalable monitoring solution for the T-NOVA IVM layer is achieved. The solution under development is able to expose to the Orchestrator and to the Marketplace enhanced awareness of the IVM status and resources, while at the same time keeping the communication and signalling overhead at a minimum.
The next steps in implementation involve the finalization of the Orchestrator API with the alarming functionality, the integration of OpenDaylight, as well as the anomaly detection part. These advances will be reflected in the final version of this deliverable.
Table of Contents
1. Introduction
2. Requirements Overview and Consolidation
3. Technologies and Frameworks for NFV Monitoring
  3.1. OpenStack Telemetry/Ceilometer
  3.2. Monasca
  3.3. Gnocchi
  3.4. Cyclops
  3.5. Zabbix
  3.6. Nagios
  3.7. OPNFV Projects
    3.7.1. Doctor
    3.7.2. Prediction
  3.8. OpenDaylight Monitoring
  3.9. Other Relevant Monitoring Frameworks
  3.10. Technology Selection and Justification
4. The T-NOVA VIM Monitoring Framework
  4.1. Architecture and Functional Entities
  4.2. Monitoring Metrics List
    4.2.1. Generic metrics
    4.2.2. VNF-specific metrics
  4.3. VNF Monitoring Agent
  4.4. Collection of VNF-Specific Metrics
  4.5. Monitoring of FPGA-Based VNFs
  4.6. VIM Monitoring Manager Architecture and Components
    4.6.1. VIM MM Architecture
    4.6.2. Interfaces to cloud and network controllers
    4.6.3. Northbound API to Orchestrator
    4.6.4. Anomaly detection
    4.6.5. Time-series Database
    4.6.6. Graphical user interface
  4.7. Packaging, Documentation and Open-Source Release
5. Validation
  5.1. Functional Testing
  5.2. Benchmarking
  5.3. Fulfillment of Requirements
6. Conclusions and Future Work
7. References
8. List of Acronyms
9. Annex I: Survey of Relevant IT/Network Monitoring Tools
    9.1.1. IT/Cloud monitoring
    9.1.2. Network Monitoring
1. INTRODUCTION
This deliverable is an interim report of the work currently being carried out in Task 4.4 (Monitoring and Maintenance). Task 4.4 focuses on the implementation and integration of a monitoring framework, able to extract, process and communicate monitoring information from both physical and virtual nodes as well as VNFs at IVM level. In other words, the operational scope of the monitoring framework being developed in Task 4.4 corresponds to the two lower layers of the T-NOVA architecture, namely the NFVI and VIM. The metrics¹ collected, along with the alarms/events generated, are in turn communicated to the upper layers (Orchestrator and Marketplace), so that the latter have a comprehensive view of the status of the infrastructure resources as well as the network services running on them.
The present document is structured as follows:
• Chapter 2 overviews and consolidates the T-NOVA system and IVM requirements which directly or indirectly affect the monitoring framework.
• Chapter 3 presents a survey of the most prominent enabling technologies for NFV monitoring, as well as relevant OpenStack, OpenDaylight and OPNFV projects, and presents a justification for the technologies used.
• Chapter 4 presents the architecture and the functional blocks of the T-NOVA VIM monitoring framework.
• Chapter 5 presents the testing and validation of the framework against specific test cases.
• Finally, Chapter 6 concludes the document.
¹ It must be clarified that Task 4.4 focuses on the collection of dynamic metrics, i.e. metrics which change frequently in relation to resource usage. Static information reflecting the status and capabilities of the infrastructure, e.g. number of installed compute nodes, processing resources per node etc., is assumed to be handled by Task 3.2 (Infrastructure Repository).
2. REQUIREMENTS OVERVIEW AND CONSOLIDATION
Deliverable D2.32 [D232] has defined and identified architectural concepts and requirements for the IVM (NFVI and VIM) layers. The technical requirements which drive the specification and development of the T-NOVA monitoring framework can be directly derived/inherited from the specific IVM requirements. Table 1 below identifies the IVM requirements that, either directly or indirectly, are associated with IVM monitoring, focusing on NFVI-PoP resources, and describes how each of these is translated to a specific requirement for the monitoring framework (MF).
Table 1. IVM requirements which affect the monitoring framework

| IVM Req. ID | IVM Requirement Name | Requirement for the Monitoring Framework |
|---|---|---|
| VIM.1 | Ability to handle heterogeneous physical resources | The MF must provide a vendor agnostic mechanism for physical resource monitoring. |
| VIM.2 | Ability to provision virtual instances of the infrastructure resources | The MF must be able to report the status of virtualized resources as well as from physical resources in order to assist placement decisions. |
| VIM.3 | API Exposure | The MF must provide an interface to the Orchestrator for the communication of monitoring metrics. |
| VIM.6 | Translation of references between logical and physical resource identifiers | The MF must re-use resource identifiers when linking metrics to resources. |
| VIM.8 | Control and Monitoring | The MF must monitor in real time the physical network infrastructure as well as the vNets instantiated on top of it, providing measurements of the metrics relevant to service level assurance. |
| VIM.9 | Scalability | The MF must keep up with dynamic increase of the number of resources to be monitored. |
| VIM.18 | Query API and Monitoring | The MF must provide an API for communicating metrics (in either push or pull mode). |
| VIM.21 | Virtualised Infrastructure Metrics | The MF must collect performance and utilisation metrics from the virtualised resources in the NFVI. |
| C.7 | Compute Domain Metrics | The MF must collect compute domain metrics. |
| C.12 | Hardware accelerator metrics | The MF must collect hardware accelerator metrics. |
| H.1 | Compute Domain Metrics | The MF must collect compute metrics from the Hypervisor. |
| H.2 | Network Domain Metrics | The MF must collect network domain metrics from the Hypervisor. |
| H.12 | Alarm/Error Publishing | The MF must process and dispatch alarms. |
| N.5 | Usage monitoring | The MF must collect metrics from physical and virtual networking devices. |
| N.8 | SDN Management | The MF must leverage SDN monitoring capabilities. |
By consolidating the aforementioned requirements, it becomes clear that the basic required functionalities of the IVM monitoring framework are as follows:
1. Collection of IT and networking metrics from virtual and physical devices of the NFVI. It should be noted that at the IVM level, metrics correspond only to physical and virtual nodes and are not associated to services, since the VIM does not have knowledge of the end-to-end Network Service. Metrics are mapped to Network Services at Orchestrator level;
2. Processing and generation of events and alarms;
3. Communication of monitoring information and events/alarms to the Orchestrator in a scalable manner.
The following chapter overviews several technological frameworks for NFV monitoring which could be partially exploited towards fulfilling these requirements.
3. TECHNOLOGIES AND FRAMEWORKS FOR NFV MONITORING
This chapter presents a brief overview of the most relevant monitoring frameworks which can be applied to the NFV domain. It mostly focuses on monitoring tools which provide a satisfactory degree of integration with OpenStack and can be extended for NFV monitoring; a more comprehensive survey of other, more generic IT and network/SDN monitoring tools can be found in Annex I.
3.1. OpenStack Telemetry/Ceilometer
The goal of the Telemetry project within OpenStack [Telemetry] is to reliably collect measurements of the utilisation of physical and virtual resources comprising deployed clouds, store such data for offline usage, and trigger actions on the occurrence of given events. It includes three different services (Aodh, Ceilometer and Gnocchi; see Sec. 3.3), providing the different stages of the data monitoring functional chain: Aodh delivers alarming functions, Ceilometer deals with data collection, Gnocchi provides a time-series database with resource indexing.
The actual data collection service in the Telemetry project is Ceilometer. Ceilometer is an OpenStack service which performs collection of data, normalizes and duly transforms them, making them available to other services (starting from the Telemetry ones). Ceilometer efficiently collects the metering data of virtual machines (VMs) and the computing hosts (Nova), the network, the Operating System images (Glance), the disk volumes (Cinder), the identities (Keystone), the object storage (Swift), the orchestration (Heat), the energy consumption (Kwapi) and also user-defined meters.
Figure 1. OpenStack Telemetry/Ceilometer architecture
Figure 1 depicts an overall summary of the Telemetry/Ceilometer logical architecture. Each of the Telemetry services is designed to scale horizontally. Additional workers and nodes can be added depending on the expected load. The system consists of the following basic components:
• Polling agents; these are:
    o compute agents (ceilometer-agent-compute): these run on each compute node and poll for resource utilisation statistics;
    o central agents (ceilometer-agent-central): these run on one or more central management servers to poll for resource utilisation statistics for resources not tied to instances or compute nodes;
• Notification agents; these run on one or more central management servers to monitor the message queues (for notifications and for metering data coming from the agent);
• Collectors (ceilometer-collector): designed to gather and record event and metering data created by notification and polling agents;
• Databases, containing Events, Meters and Alarms; these are capable of handling concurrent writes (from one or more collector instances) and reads (from the API module);
• An Alarm Evaluator and Notifier (ceilometer-alarm-notifier): runs on one or more central management servers to allow configuration of alarms based on threshold evaluation for a collection of samples. This functionality is now undertaken by the Aodh module, as will be described later;
• An API module (ceilometer-api): runs on one or more central management servers to provide access to the data from the data store.
Ceilometer offers two independent ways to collect metering data, allowing easy integration of any OpenStack-related project which needs to be monitored:
• By listening to events generated on the notification bus, which are transformed into Ceilometer samples. This is the preferred method of data collection, since it is the simplest and most straightforward. It requires, however, that the monitored entity uses the bus to publish events, which may not be the case for all OpenStack-related projects.
• By polling the APIs of monitored components at regular intervals to collect information. The data are usually stored in a database and are available through the Ceilometer REST API. This method is the least preferred, due to the inherent difficulty in making such a component resilient.
Each meter measures a particular aspect of resource usage or on-going performance. All meters have a string name, a unit of measurement, and a type indicating whether values are monotonically increasing (cumulative), interpreted as a change from the previous value (delta), or a standalone value relating only to the current duration (gauge). Samples are individual data points associated with a particular meter and have a timestamp and a value. The aggregation of a set of samples for a specified duration (start-end time) is called a statistic. Each statistic also has an associated time period, which is a repeating interval of time that the samples are grouped into for aggregation. Currently there are five aggregation functions implemented: count, max, min, avg and sum.
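The relationship between samples, periods and statistics can be illustrated with a minimal sketch. It is not Ceilometer code: the `Sample` class, the `statistics` function and the result field names are assumptions introduced here purely for illustration, and only the five aggregation functions named above are taken from the text.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Sample:
    """One data point of a meter (illustrative structure, not the Ceilometer model)."""
    meter: str        # meter name, e.g. "cpu_util"
    timestamp: float  # seconds since an arbitrary origin
    value: float


def statistics(samples, start, end, period):
    """Group samples in [start, end) into fixed periods and aggregate each group."""
    buckets = defaultdict(list)
    for s in samples:
        if start <= s.timestamp < end:
            index = int((s.timestamp - start) // period)
            buckets[index].append(s.value)
    result = []
    for index in sorted(buckets):
        values = buckets[index]
        result.append({
            "period_start": start + index * period,
            "count": len(values),
            "max": max(values),
            "min": min(values),
            "sum": sum(values),
            "avg": sum(values) / len(values),
        })
    return result


# Four gauge samples, aggregated over two 60-second periods.
samples = [Sample("cpu_util", t, v)
           for t, v in [(0, 10.0), (30, 20.0), (60, 40.0), (90, 60.0)]]
stats = statistics(samples, start=0, end=120, period=60)
```

In this sketch each returned dictionary plays the role of one statistic: the first covers the samples at 0 s and 30 s, the second those at 60 s and 90 s.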
Another feature of Telemetry is alarming, which used to be internal to Ceilometer, but moved to a separate project, Aodh [Aodh]. An alarm is a set of rules defining a monitor of a statistic that will trigger when a threshold condition is breached. An alarm can be set on a single meter, or on a combination of meters, and can have three states:
• alarm (the threshold condition is breached)
• ok (the threshold condition is not met)
• insufficient data (not enough data has been gathered to determine if the alarm should fire or not).
The transition to these states can have an associated action, which is either writing to a log file or an HTTP POST to a URL. The concept of meta-alarm is also supported; meta-alarms aggregate over the current state of a set of other basic alarms combined via a logical operator (AND/OR). For example, a meta-alarm could be triggered when three basic alarms become active at the same time.
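The three alarm states and the AND/OR combination of basic alarms into a meta-alarm can be sketched as follows. This is a simplified illustration and not the Aodh implementation: the function names, the use of an average as the evaluated statistic, and the choice to treat "insufficient data" as "not firing" inside a meta-alarm are all assumptions made for the example.

```python
from enum import Enum


class AlarmState(Enum):
    OK = "ok"
    ALARM = "alarm"
    INSUFFICIENT_DATA = "insufficient data"


def evaluate_threshold(values, threshold, min_samples=1):
    """Evaluate a basic alarm on the average of the collected sample values."""
    if len(values) < min_samples:
        return AlarmState.INSUFFICIENT_DATA
    average = sum(values) / len(values)
    return AlarmState.ALARM if average > threshold else AlarmState.OK


def evaluate_meta(states, operator):
    """Combine basic alarm states with a logical operator ("and" / "or").

    Assumption: a basic alarm with insufficient data counts as not firing.
    """
    fired = [state is AlarmState.ALARM for state in states]
    if operator == "and":
        return AlarmState.ALARM if all(fired) else AlarmState.OK
    if operator == "or":
        return AlarmState.ALARM if any(fired) else AlarmState.OK
    raise ValueError(f"unsupported operator: {operator}")


cpu = evaluate_threshold([95.0, 97.0, 99.0], threshold=90.0)  # breached
net = evaluate_threshold([10.0, 20.0], threshold=50.0)        # not breached
mem = evaluate_threshold([], threshold=80.0)                  # no samples yet

any_alarm = evaluate_meta([cpu, net, mem], "or")
all_alarm = evaluate_meta([cpu, net, mem], "and")
```

In a real deployment, a state transition would additionally trigger the configured action (log entry or HTTP POST), which is omitted here.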
3.2. Monasca
Monasca [Monasca] is an OpenStack project, aiming at developing an open-source, multi-tenant, highly scalable, performant, fault-tolerant monitoring-as-a-service solution, which is integrated within the OpenStack framework. Monasca uses a REST API for high-speed metrics processing and querying, and has a streaming alarm and notification engine. Monasca is being developed by HPE, Rackspace and IBM.
Monasca is conceived to scale up to service provider level of metrics throughput (in the order of 100,000 metrics/sec). The Monasca architecture is natively designed to support scaling, performance and high availability. The retention period of historical data is not less than one year. Storage of metric values and querying of the metrics database use an HTTP REST API. Monasca is multi-tenant, and exploits OpenStack authentication mechanisms (Keystone) to control submission and access to metrics.
The metric definition model consists of a (key, value) pair named dimension. Basic threshold-based real-time alarms are available on metrics. Furthermore, complex alarm events can be defined and instrumented, based on a simple description grammar with specific expressions and operators.
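As a rough illustration of how (key, value) dimensions can be used to select metrics and apply a basic threshold, consider the sketch below. It does not use the Monasca API or its alarm grammar; the dictionary layout, the metric name and the helper functions are hypothetical and serve only to show the idea of dimension-based selection.

```python
def matches(metric, name, dimensions):
    """True if the metric has the given name and contains all requested dimensions."""
    return metric["name"] == name and all(
        metric["dimensions"].get(key) == value
        for key, value in dimensions.items()
    )


def threshold_breached(metrics, name, dimensions, limit):
    """Basic threshold check: does any selected measurement exceed the limit?"""
    values = [m["value"] for m in metrics if matches(m, name, dimensions)]
    return bool(values) and max(values) > limit


# Hypothetical measurements; names and dimension keys are illustrative only.
metrics = [
    {"name": "cpu.utilisation", "dimensions": {"hostname": "vnf-1", "tenant": "a"}, "value": 93.0},
    {"name": "cpu.utilisation", "dimensions": {"hostname": "vnf-2", "tenant": "a"}, "value": 41.0},
    {"name": "mem.free_mb", "dimensions": {"hostname": "vnf-1", "tenant": "a"}, "value": 512.0},
]

vnf1_hot = threshold_breached(metrics, "cpu.utilisation", {"hostname": "vnf-1"}, limit=90.0)
vnf2_hot = threshold_breached(metrics, "cpu.utilisation", {"hostname": "vnf-2"}, limit=90.0)
```

Selecting by a subset of dimensions (here only `hostname`) is what allows the same alarm definition to be applied per host, per tenant, or across a whole service.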
Monasca agents embed a number of built-in system and service level checks, plus Nagios checks and statsd.
Figure 2. Monasca architecture
Monasca agents are Python based and consist of several sub-components. They support system metrics, such as CPU utilization and available memory, Nagios plugins, statsd and many built-in checks for services such as MySQL, RabbitMQ, and many others.
The REST API provides an exhaustive set of functions:
• Real-time storage and querying of large amounts of metrics;
• Statistics query for metrics;
• Alarm definition management (create, delete, update);
• Query and cleanup of historical metrics database;
• Compound alarms definition;
• Alarm severity ranking;
• Full storage of alarm transition pattern;
• Management of alarm notification mechanisms;
• Java and Python API available.
Published metrics and events are pushed into a Kafka²-based message queue, from which a component named Persister pulls them out and stores them into the metrics database (HPE Vertica, InfluxDB and Cassandra are supported). Other engine components look after compound metric creation, predictive metrics, notification, and alarm threshold management.
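The decoupling between metric publication and persistence described above can be sketched with an in-memory queue standing in for the message bus. This is only a conceptual illustration: Python's standard `queue.Queue` replaces Kafka, a dictionary replaces the metrics database, and the function names are assumptions, not Monasca components.

```python
import queue


def publish(message_queue, metric):
    """Producer side: the API pushes an incoming metric onto the queue."""
    message_queue.put(metric)


def persist_all(message_queue, store):
    """Persister side: drain the queue and append each metric to the store."""
    stored = 0
    while True:
        try:
            metric = message_queue.get_nowait()
        except queue.Empty:
            break
        series = store.setdefault(metric["name"], [])
        series.append((metric["timestamp"], metric["value"]))
        stored += 1
    return stored


bus = queue.Queue()   # stand-in for the message queue
database = {}         # stand-in for the time-series database

publish(bus, {"name": "cpu.utilisation", "timestamp": 1, "value": 35.0})
publish(bus, {"name": "cpu.utilisation", "timestamp": 2, "value": 37.5})
publish(bus, {"name": "mem.free_mb", "timestamp": 1, "value": 2048.0})

count = persist_all(bus, database)
```

Because producers only interact with the queue, the persister (and any alarm or notification engine consuming the same stream) can be scaled or restarted independently of metric ingestion.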
Monasca also includes a multi-publisher plugin for OpenStack Ceilometer, able to convert and publish metric samples to the Monitoring API, plus an OpenStack Horizon dashboard as user interface.
² http://kafka.apache.org/
Monasca features such as real-time alarm processing, integration with OpenStack and scalability/extensibility make it a monitoring system potentially well suited for use within NFV platforms.
3.3. Gnocchi
Gnocchi [Gnocchi] is a project incubated under the OpenStack Telemetry program umbrella, addressing the development of a TDBaaS (Time Series Database as a Service) framework. Its paramount goal is to fix the significant performance issues experienced by Ceilometer in time series data collection and storage. The root cause of such issues is the highly generic nature of Ceilometer's data model, which gave the needed design flexibility in the initial OpenStack releases, but imposed a performance penalty which is no longer deemed acceptable (storing a large amount of metrics over several weeks causes the storage backend to substantially collapse). The current data model on the one hand encompasses many options never appearing in real user requests, and on the other hand does not handle use cases which are overly complex or too slow to run. From these remarks, the idea of a brand new solution for metric sample collection emerged, which led to the inception of Gnocchi.
Diving deeper into the problem, whereas the event collection model in Ceilometer is fairly robust, metrics collection and storage suffer from the aforementioned performance flaws. The root of the problem is the free-form metadata associated with each metric, storing a bevy of redundant information which is hard to query efficiently. Gnocchi proposes a faster and more scalable pair of time series storage and resource indexer, with a REST API returning an entity (the measured thing) and a resource (information metadata). Unlike Ceilometer, Gnocchi keeps separate data stores for metrics and metadata.
Figure 3. Gnocchi architecture
The storage driver (abstracted) is in charge of metrics storage. Aggregated metrics are actually pre-aggregated before the storage operation occurs, based on the user request at entity creation time. The canonical implementation of time series data (TSD) storage uses Pandas and Swift.
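The idea of aggregating at write time, rather than at query time, can be shown with a minimal sketch. It is not Gnocchi code and does not use Pandas or Swift; the class name, the single fixed granularity and the choice of the mean as the stored aggregate are assumptions made for illustration.

```python
class PreAggregatedSeries:
    """Time series that folds each incoming measure into a fixed-width bucket."""

    def __init__(self, granularity):
        # Bucket width in seconds, chosen when the series is created.
        self.granularity = granularity
        # Maps bucket start time -> [number of measures, running total].
        self.buckets = {}

    def add(self, timestamp, value):
        """Update the bucket covering this timestamp; raw measures are not kept."""
        start = timestamp - (timestamp % self.granularity)
        bucket = self.buckets.setdefault(start, [0, 0.0])
        bucket[0] += 1
        bucket[1] += value

    def means(self):
        """Return the per-bucket mean, ordered by bucket start time."""
        return {
            start: total / count
            for start, (count, total) in sorted(self.buckets.items())
        }


series = PreAggregatedSeries(granularity=60)
for timestamp, value in [(5, 1.0), (20, 3.0), (65, 10.0)]:
    series.add(timestamp, value)

per_minute = series.means()
```

Since only one small record per bucket is kept, the stored volume grows with the number of buckets rather than with the number of raw samples, which is the property that addresses the storage growth problem described above.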
The indexer driver (abstracted as well) uses SQLAlchemy, to exploit the speed and indexable nature of SQL, which fits indexing storage very well. In the Gnocchi vision, there will be predefined resource schemas (image, instance, …) to improve indexing and querying to the maximum extent.
Additional functional updates envisioned in Gnocchi include configurable per-time-series retention policies.
Looking ahead, the Gnocchi API should be transitioned to Ceilometer API V3, and its TSDB interaction fully moved into the Ceilometer collector. In an initial phase, Gnocchi should be integrated as self-standing code inside the Ceilometer workflow (via the Ceilometer Database Dispatcher).
3.4. Cyclops
Cyclops [Cyclops] is a generic rating-charging-billing (rcb) framework that allows arbitrary pricing and billing strategies to be implemented. The business rules are registered and processed inside a Drools BPM rule engine [Drools]. The framework itself is organized as a set of micro-services with clear separation of functionalities, and communication among them is carried over well-defined RESTful interfaces.
Figure 4. Cyclops architecture
Figure 4 above shows the key micro-services and modules that make up the Cyclops framework. A brief explanation of each module follows:
• udr μ-service: this service is responsible for metric collection for natively supported cloud platforms; currently OpenStack and CloudStack are supported. Metric collection drivers for other popular frameworks, including PaaS and public cloud vendors, are planned. It also persists the collected metrics data into a TSDB (InfluxDB 0.9.x datastore). For non-natively supported applications, it processes the metrics pushed into the messaging queues as and when they arrive, thus enabling the framework to support all sorts of composite and converged billing needs.
• rc μ-service: the rating and charging service processes the udr-records and transforms them into charge-records with cost details. The rating part of the service generates/fetches the rate value for identified services or resources that form part of the product portfolio of any provider. The rating rules are processed through the Drools BPM rules engine, enabling providers to activate dynamic rating and maximize the revenue potential of their portfolio while maintaining consumer satisfaction and loyalty.
• billing μ-service: as the name suggests, this service processes and aggregates all the charge records created by the rc μ-service; furthermore, it processes any SLA violations and associated penalties for a specified time period. This service also processes any pending service credits, discounts and seasonal offers, and applicable regional tax rates, before generating the final bill document.
• Auth-n/z μ-service: Cyclops micro-services validate API requests using secure tokens.
• Messaging service: this feature allows external, non-natively supported applications and platforms to send in their usage metrics data for further processing by the Cyclops framework.
• Dashboard: this module provides a rich graphical user interface for customers to manage and view their usage charts and bills, and allows admins to control various parameters of the framework and also manage the pricing and billing rules.
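The flow from usage records through charge records to a bill can be sketched in a few lines. This is not Cyclops code: the record layout, the static rate table (Cyclops evaluates rating rules in a rule engine instead), the rate values, and the order in which discount and tax are applied are all assumptions made for illustration.

```python
def rate(usage_records, rates):
    """Rating/charging step: turn usage records into charge records with a cost."""
    return [
        {
            "resource": record["resource"],
            "usage": record["usage"],
            "cost": record["usage"] * rates[record["resource"]],
        }
        for record in usage_records
    ]


def bill(charge_records, discount=0.0, tax_rate=0.0):
    """Billing step: aggregate charges, then apply a discount and a tax rate.

    Assumption: the discount is applied before tax.
    """
    subtotal = sum(charge["cost"] for charge in charge_records)
    discounted = subtotal * (1 - discount)
    return round(discounted * (1 + tax_rate), 2)


# Hypothetical usage collected for one customer over a billing period.
usage = [
    {"resource": "cpu_hours", "usage": 10},
    {"resource": "storage_gb", "usage": 100},
]
rates = {"cpu_hours": 0.5, "storage_gb": 0.02}  # illustrative prices per unit

charges = rate(usage, rates)
total = bill(charges, discount=0.1, tax_rate=0.2)
```

Keeping rating and billing as separate steps mirrors the micro-service split described above: pricing rules can change without touching how charges are aggregated into a bill.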
As the framework is implemented as a distributed platform, monitoring the health status of the various services is critical. For this, Sensu [Sensu] is currently used to track the aliveness of each service. Sensu could also be used to manage the scheduling and triggering of data collection tasks, although the framework designers are migrating towards a self-contained scheduler for their data collection and processing requirements. The usage metrics collection depends heavily on the granularity of the service monitoring implementation.
3.5. Zabbix
Zabbix [Zabbix] is an open source, general-purpose, enterprise-class network and application monitoring tool that can be customised for use with OpenStack. It can be used to automatically collect and parse data from monitored cloud resources. It also provides distributed monitoring with centralised Web administration, a high level of performance and capacity, JMX monitoring, SLAs and ITIL KPI metrics on reporting, as well as agent-less monitoring. An OpenStack Telemetry plugin for Zabbix is already available.
Using Zabbix, the administrator can monitor servers, network devices and applications, gathering statistics and performance data. Monitoring performance indicators such as CPU, memory, network, disk space and processes can be supported through an agent, which is available as a native process for Linux, UNIX and Windows platforms. For the OpenStack infrastructure it can currently monitor:
● Core OpenStack services: Nova, Keystone, Neutron, Ceilometer (OpenStack Telemetry), Horizon, Cinder, Glance, Swift Object Storage, and OVS (Open vSwitch)
● Core infrastructure components: MySQL, RabbitMQ, HAProxy, memcached, and libvirtd.
● Operating system statistics: Disk I/O, CPU load, free RAM, etc.
Zabbix is not limited to OpenStack cloud infrastructures: it can be used to monitor VMware vCenter and vSphere installations for various VMware hypervisor and virtual machine properties and statistics.
3.6. Nagios
Nagios is an open source tool that provides monitoring and reporting for network services and host resources [Nagios]. The entire suite is based on the open-source Nagios Core, which provides monitoring of all IT infrastructure components, including applications, services, operating systems, network protocols, system metrics, and network infrastructure. Nagios does not come as a one-size-fits-all monitoring system with thousands of monitoring agents and monitoring functions; it is rather a small, lightweight system reduced to the bare essentials of monitoring. It is also very flexible, since it makes use of plugins in order to set up its monitoring environment.
Nagios Fusion enables administrators to gain insight into the health of the organisation's entire network through a centralised view of their monitored infrastructure. In addition, they can automate the response to various incidents through the usage of Nagios Incident Manager and Reactor. The Network Analyser, which is part of the suite, provides an extensive view of all network traffic sources and potential security threats, allowing administrators to quickly gather high-level information regarding the status and utilisation of the network as well as detailed data for complete and thorough network analysis. All monitoring information is stored in the Log Server, which provides monitoring of all mission-critical infrastructure components, including applications, services, operating systems, network protocols, systems metrics, and network infrastructure.
Nagios and Telemetry are quite complementary products which can be used in an integrated solution. The ICCLab, which operates within the ZHAW's Institute of Applied Information Technology, has developed a Nagios plugin which can be used to capture metrics through the Telemetry API, thus allowing Nagios to monitor VMs inside OpenStack. Finally, the Telemetry plugin can be used to define thresholds and triggers in the Nagios alerting system.
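The way a Nagios plugin maps a metric value to the alerting system can be sketched as below. The sketch follows the general Nagios plugin convention (return codes 0 to 3 for OK, WARNING, CRITICAL and UNKNOWN, with a one-line status message optionally followed by performance data after a `|`). It is not the ICCLab plugin: the function name is hypothetical, and the retrieval of the value from the Telemetry API is deliberately left out, with the value simply passed in.

```python
# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_metric(name, value, warn, crit):
    """Map a metric value to a Nagios status code and a one-line status message.

    `value` is assumed to have been fetched beforehand (e.g. from a telemetry
    service); None means that no measurement was available.
    """
    if value is None:
        return UNKNOWN, f"UNKNOWN - no data for {name}"
    perfdata = f"{name}={value};{warn};{crit}"
    if value >= crit:
        return CRITICAL, f"CRITICAL - {name} is {value} | {perfdata}"
    if value >= warn:
        return WARNING, f"WARNING - {name} is {value} | {perfdata}"
    return OK, f"OK - {name} is {value} | {perfdata}"


code, message = check_metric("cpu_util", 95.0, warn=80, crit=90)
```

When used as an actual plugin, the script would print the message and exit with the returned code (for example via `sys.exit(code)`), which is how Nagios learns the state of the monitored VM.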
3.7. OPNFV Projects
3.7.1. Doctor
Doctor (Fault Management) [Doctor] is an active OPNFV requirements project. Started in December 2014, its aim is to build a fault management and maintenance framework for high availability of Network Services on top of virtualized infrastructure. The project is supported by engineers from several major telecom vendors as well as telco providers.
So far, the project has produced a report deliverable which was very recently released (October 2015) [DoctorDel]. This report identifies use cases and requirements for an NFV fault detection and management system. Specifically, the following requirements are identified for a VIM-layer monitoring system:
• Monitoring of resources
• Detection of unavailability and failures
• Correlation and Cognition (especially correlation of faults among resources)
• Notification by means of alarms
• Fencing, i.e. isolation of a faulty resource
• Recovery actions
Doctor has also specified an architectural blueprint for the fault management functional blocks within the NFV infrastructure, as shown in Figure 5. In particular, it is envisaged that certain functionalities for control, monitoring, notification and inspection need to be included in the VIM.
Figure 5. Doctor functional blocks
As a first implementation proposal, the report proposes to re-use and integrate some off-the-shelf solutions for these functionalities, namely Ceilometer (see Sec. 3.1) for the Notifier, Zabbix (see Sec. 3.5) for the Monitor and Monasca (see Sec. 3.2) for the Inspector.
However, it is evident that integrating these frameworks results in considerable overlaps, since many functionalities are present in all of them (e.g. metrics collection, storage, alarming etc.), and thus may produce an unnecessarily complex and overprovisioned system. In addition, some key requirements mentioned in the document, such as correlation and root cause detection, are not covered by the present versions of these frameworks.
3.7.2. Prediction
Data Collection for Failure Prediction [Prediction] is another OPNFV project, aiming to implement a system for predicting failures. Notifications produced can be dispatched to the fault management system (see previous section), so that the latter can proactively respond to faults before these actually happen.
The scope of the project is very promising and also very relevant. However, its progress seems quite limited for the time being. The plan is to introduce just the data collection capabilities in OPNFV Release 2.
3.8. OpenDaylight monitoring
Since OpenDaylight has been selected as the SDN network controller for T-NOVA, it is relevant to investigate the monitoring capabilities it provides.
The OpenDaylight Statistics Manager module implements statistics collection, sending statistics requests to all enabled nodes (managed switches) and storing responses in the statistics operational subtree. The Statistics Manager collects information on the following:
• node-connector (switch port)
• flow
• meter
• table
• group statistics
In the Hydrogen and Helium releases, monitoring metrics were exposed via the northbound Statistics REST API.
The Lithium release introduces the Model-Driven Service Abstraction Layer (MDSAL), which stores status-related data in the form of a document object model (DOM), known as a "data tree." MDSAL's RESTful interfaces for configuration and monitoring are designed based on the RESTCONF protocol. These interfaces are generated dynamically at runtime based on YANG models that define its data.
3.9. Other relevant monitoring frameworks
Apart from the frameworks surveyed in the present section, there exists a large number of IT/Cloud and Network/SDN monitoring tools, many of them open-source, which could be re-used as components of an NFV monitoring platform. Some of the most popular tools are presented in Figure 6, and are briefly overviewed in Annex I.
Figure 6. Other relevant Cloud/SDN Monitoring frameworks
While most of these frameworks require considerable effort in order to be adapted to suit the needs of NFV monitoring, there are certain components which are quite mature and can be reused. For example, in T-NOVA, the collectd module (the core version) is adopted as monitoring agent for VNFs and compute nodes, as will be described in the next section.
3.10. Technology selection and justification
With regard to the basic functionalities identified in Section 2 as requirements for VIM monitoring, metrics collection (Functionality 1) can already be achieved by re-using a number of the pre-existing monitoring mechanisms for virtualised infrastructures, as surveyed in the preceding sections. Apart from selecting and properly integrating the appropriate technologies and possibly selecting the appropriate set of metrics, limited progress beyond the state-of-the-art should be expected in this field.
[Figure 6 content: Shinken, Icinga, Zenoss, Ganglia, StackTach, Healthmon, SeaLion, MonALISA, collectd, StatsD and Graphite, vSphere, Amazon CloudWatch, OpenNetMon, Payless, DCM, Flowsense]
On the other hand, the actual challenges and envisaged innovation of the monitoring framework are seen to be associated with Functionalities 2 and 3. Specifically, the following challenges have been identified:
• Events and alarms generation: Moving beyond the typical approach, which is found in most monitoring systems and is based on static thresholds (i.e. generate an alarm when a metric has crossed a pre-defined threshold), the aim is to study and adopt more dynamic methods for fault detection. Such methods should be based on statistical methods and self-learning approaches, identifying outliers in system behaviour and triggering alarms reactively or even proactively (e.g. before the actual fault has occurred). This anomaly detection procedure, in the context of T-NOVA, can clearly benefit from the fact that the monitored services are composed of VNFs rather than generic VMs. As virtual appliances dedicated to traffic processing, VNFs are expected to expose some common characteristics (e.g. the CPU load is expected to rise, though not necessarily linearly, with the increase of processed traffic). A significant deviation from this correlation could, for example, indicate a potential malfunction.
• Communication with the Orchestrator: With this functionality, scalability is the key requirement that needs to be fulfilled. In an operational environment, the Orchestrator is expected to manage tens or hundreds of NFVI-PoPs (or even thousands, if micro-data centres distributed in the access network are envisaged). It is thus impossible for the Orchestrator to handle the full set of metrics from all managed physical and virtual nodes. The challenge is to optimise the communication of monitoring information to the Orchestrator so that only necessary information is transmitted. This optimisation does not only imply fine-tuning of polling frequency, careful definition of a minimal set of metrics or proper design of the communication protocol, but also requires an intelligent aggregation procedure at VIM level. This procedure should achieve the grouping/aggregation of various metrics from different parts of the infrastructure as well as of alarms, and the dynamic identification of the information that is of actual value to the Orchestrator.
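The correlation-based detection idea in the first challenge can be illustrated with a minimal Python sketch. It is not the T-NOVA implementation: it assumes, for simplicity, a linear relation between processed traffic and CPU load (the text notes the relation need not be linear), and the function names and the threshold factor k are hypothetical.

```python
from statistics import mean, pstdev


def fit_line(xs, ys):
    """Least-squares fit of ys = slope * xs + intercept."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def is_anomalous(history, sample, k=3.0):
    """Flag a (traffic, cpu) sample that deviates from the learned relation.

    history: list of (traffic, cpu) pairs observed during normal operation.
    k: hypothetical sensitivity factor (number of standard deviations).
    """
    xs = [traffic for traffic, _ in history]
    ys = [cpu for _, cpu in history]
    slope, intercept = fit_line(xs, ys)
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    sigma = pstdev(residuals)
    traffic, cpu = sample
    deviation = abs(cpu - (slope * traffic + intercept))
    # Guard against a zero spread when the history fits the line exactly.
    return deviation > k * max(sigma, 1e-9)
```

Unlike a static threshold, the alarm condition here depends on the traffic level: a CPU load that is normal under heavy traffic can still be flagged when it occurs under light traffic.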
To achieve the aforementioned innovations, the Task 4.4 work plan involves in its initial stage the establishment of a baseline framework which fulfils the basic functionalities by collecting and communicating metrics and, as a second step, the study, design and incorporation of innovative techniques for anomaly detection and metrics aggregation.
There are many alternative ways towards this direction, whose assessment is overviewed in the following table.
Table 2. Assessment of various implementation choices for VIM monitoring
Implementation choice for T-NOVA VIM monitoring | Pros | Cons
Integration of required functionalities (push meters, statistical processing, integration of guest OS and VNF metrics) into Ceilometer and Aodh. | Direct integration into Openstack, contribution to a mainstream project. Takes advantage of Ceilometer's open and modular architecture. | Will require intrusive interventions into Ceilometer. Solution will be Openstack-specific and also version-specific. Also, Ceilometer suffers specific scalability issues.
Adoption of Monasca, with some extensions (push meters, statistical processing, integration of guest OS and VNF metrics) | Monasca is a complete monitoring system with remarkable scalability and also quite mature. Its REST API already provides an exhaustive set of functions. Monasca has an open and modular architecture. | Monasca is quite complex and resource-demanding, involving many capabilities which are not required in T-NOVA, given that metrics are also processed at Orchestrator. Requires a special monitoring agent (monasca-agent).
Extension of Gnocchi (as a TDBaaS framework) with all necessary communication and processing tools | Gnocchi is quite mature and advancing rapidly, is also well integrated with Ceilometer to provide scalability. | Will need to implement several extensions for communication and processing, since Gnocchi mainly provides a storage solution.
Extension of Cyclops with all necessary communication and processing tools | Quite mature solution, know-how available within T-NOVA consortium (Cyclops is developed by ZHAW) | Will require extensive modification since Cyclops is mostly a rating-charging-billing platform.
Extension of Nagios or Zabbix with all necessary communication and processing tools | Both are well-proven monitoring frameworks and provide support for multiple systems and applications | Nagios and Zabbix already involve many features and capabilities which are not needed in T-NOVA, and thus their extension would be inefficient, also requiring several modifications.
Integration of specific enablers (agent, time-series DB, existing APIs) into a new framework. | Will result in a tailored solution for T-NOVA needs. Lightweight and directly configurable. | Some functionalities (such as alarming) will have to be redeveloped from scratch.
It seems that most of the existing technological enablers for VIM monitoring, as previously overviewed, can only partially address all the aforementioned challenges in a lightweight and resource-efficient manner. Although most of them are indeed open and modular (such as Monasca), they are already quite complicated and resource-demanding, and therefore further expanding them to cover these needs would require considerable effort and would raise efficiency issues. We argue that a "clean-slate" approach towards NFV monitoring at VIM level, exploiting some basic enablers and adding only the required functionalities, is a better-optimized approach.
The VIM monitoring framework, which we describe in the next section, aims to provide a lightweight and NFV-tailored contribution towards this direction.
4. THE T-NOVA VIM MONITORING FRAMEWORK
4.1. Architecture and functional entities
The overall architecture of the T-NOVA VIM monitoring framework can be defined by taking into account the technical requirements, as identified in Section 2, as well as the technical choices made for the NFVI and VIM infrastructure. The specification phase has concluded that the OpenStack platform will be used for the control of the virtualised IT infrastructure, as well as the OpenDaylight controller for the management of the SDN network elements.
In this context, it is proper to leverage the OpenDaylight (Statistics API) and OpenStack (Telemetry API) capabilities for collecting metrics, rather than directly polling the network elements and the hypervisors at NFVI layer, respectively.
Theoretically, it would be possible for the Orchestrator to directly poll the cloud and network controllers of each NFVI-PoP and retrieve resource metrics. This approach, although simple and straightforward, would only poorly address the challenges outlined in Section 3.10 and in particular would introduce significant scalability issues on the Orchestrator side.
Thus, it seems appropriate to introduce a mediator/processing entity at the VIM level to collect, consolidate and process metrics and communicate them to the Orchestrator. We call this entity the VIM Monitoring Manager (VIM MM); it is a stand-alone software component. As justified in Sec. 3.10, the VIM MM is designed and developed in T-NOVA as a novel component, without depending on the modification of existing monitoring frameworks.
With regard to the collection of monitoring information, OpenStack and OpenDaylight already provide a rich set of metrics for both physical and virtual nodes, which should be sufficient for most T-NOVA requirements. However, in order to gain a more detailed insight on the VNF and the NFVI status and operation, we consider it advisable to also collect a rich set of metrics from the guest OS of the VNF container (VM), including information which cannot be obtained via the hypervisor, as well as from the compute node itself.
For this purpose, we introduce an additional VNF Monitoring Agent, deployed within the VNF VMs. The agent is intended to augment VNF monitoring capabilities by collecting a large variety of metrics, as declared in the VNF Descriptor document (VNFD) of each VNF, and at a higher temporal resolution compared to Ceilometer.
The monitoring agent can be either pre-installed in the VNF image or installed upon VNF deployment. It must be noted, however, that in some cases the presence of an agent might not be desirable to the VNF developer for several reasons (e.g. resource constraints, incompatibilities etc.). In this case, the system can also work in agent-less mode, solely relying on Ceilometer data for VNFs which do not have an agent installed.
In addition to collecting generic VNF and infrastructure metrics, the VIM MM is also expected to retrieve VNF-specific metrics from the VNF application itself. For this purpose, we have developed specific lightweight libraries (currently in Python, but planned to expand to other languages), which can be used by the VNF developer to dispatch application-specific metrics to the VIM MM.
Although traditionally the VNF metrics are supposed to be directly sent to the VNF Manager, for the sake of simplicity we chose to exploit the already established VIM monitoring framework to collect and forward VNF metrics to the VNF Manager through the VIM, rather than implement a second parallel "monitoring channel".
Based on the design choices outlined above, the architecture of the T-NOVA VIM monitoring framework can be defined as shown in Figure 7 below.
Figure 7. Overview of the VIM monitoring modules
The VIM MM aggregates metrics by polling the cloud and network controllers and by receiving additional information from the monitoring agents as well as the VNF applications, consolidates these metrics, produces events/alarms if appropriate and communicates them to the Orchestrator. For the sake of scalability and efficiency, it was decided that metrics will be pushed by the VIM MM to the Orchestrator, rather than being polled by the latter. Moreover, the process of metrics collection/communication and event generation can be partially configured by the Orchestrator via a relevant configuration service to be exposed by the VIM MM. More details on the introduced modules can be found in the sections to follow.
4.2. Monitoring metrics list
4.2.1. Generic metrics
A crucial task when defining the T-NOVA approach for monitoring is the identification of metrics that need to be collected from the virtualised infrastructure. Although the list of metrics that are available via the existing controllers can be quite extensive, it is necessary, for the sake of scalability and efficiency, to restrict this list to include only the information that is actually needed for the implementation of the T-NOVA Use Cases, as defined in Deliverable D2.1. Table 3 below summarises a list of such metrics, which are "generic" in the sense that they are not VNF application-specific [3]. This list is meant to be continuously updated throughout the project in order to align with the technical capabilities and requirements of the components under development and the use cases which are implemented.
Table 3. List of generic monitoring metrics
Domain | Metric | Units | Origin | Relevant UCs
VM/VNF | CPU utilisation (user & system) | % | VNF Mon. Agent | UC3, UC4
VM/VNF | RAM allocated | MB | VNF Mon. Agent | UC3, UC4
VM/VNF | RAM available | MB | VNF Mon. Agent | UC3, UC4
VM/VNF | Disk read/write rate | MB/s | VNF Mon. Agent | UC3, UC4
VM/VNF | Disk read/write rate | Ops/s | VNF Mon. Agent | UC3, UC4
VM/VNF | Network Interface in/out bitrate | Mbps | VNF Mon. Agent | UC3, UC4
VM/VNF | Network Interface in/out packet rate | pps | VNF Mon. Agent | UC3, UC4
VM/VNF | No. of processes | # | VNF Mon. Agent | UC4
Compute Node | CPU utilisation | % | OS Telemetry | UC2, UC3, UC4
Compute Node | RAM available | MB | OS Telemetry | UC2, UC3, UC4
Compute Node | Disk read/write rate | MB/s | OS Telemetry | UC3, UC4
Compute Node | Network i/f in/out rate | Mbps | OS Telemetry | UC3, UC4
Storage (Volume) | Read/write rate | MB/s | OS Telemetry | UC3, UC4
Storage (Volume) | Free space | GB | OS Telemetry | UC2, UC3, UC4
Network (virtual/physical switch) | Port in/out bitrate | Mbps | ODL Statistics | UC2, UC3, UC4
Network (virtual/physical switch) | Port in/out packet rate | pps | ODL Statistics | UC3, UC4
Network (virtual/physical switch) | Port in/out drops | # | ODL Statistics | UC3, UC4
With regard to metrics identification, a very relevant reference is the ETSI GS NFV-INF 010 document [NFVINF010], which was released in December 2014. This document aims at defining and describing metrics which relate to the service quality, as perceived by the NFV Consumer. These metrics are overviewed in the table below.
[3] Please refer to Sec. 4.2.2 for a list of VNF-specific metrics.
Table 4. NFV Service Quality Metrics (Source: [NFVINF010])
It can be seen that, apart from the service latency metrics, which are related to the provisioning and/or reconfiguration of the service and essentially refer to the response of management commands (e.g. VM start), the remaining metrics can be directly or indirectly derived from the elementary metrics identified in Table 3 as well as the associated events/alarms. However, it is up to the Orchestrator, which has a complete view of the service, to assemble/exploit VIM metrics in order to derive the service quality metrics to be exposed to the SP and the Customer via the Dashboard. These metrics will be used as input to enforce the Service Level Agreement (SLA) that will be finally evaluated at Marketplace level for the applicability of possible rewards to the customer in case of failure (see D6.4, SLAs and billing).
4.2.2. VNF-specific metrics
Apart from the generic metrics identified in the previous section, each VNF generates specific dynamic metrics to monitor its internal status and performance.
These metrics:
• are specified inside the VNF Descriptor (VNFD) as monitoring-parameters (both for the VDUs and for the whole VNF) to define the expected performance of the VNF under certain resource requirements.
• are sent by the VNF application to the VIM Monitoring Manager, either via the agent or directly (see details in Sec. 4.4).
• are processed, aggregated and forwarded, if required, to the upper layers (Orchestrator and Marketplace).
At the Orchestration level, some of the VNF-specific metrics can be used for automating the selection of the most efficient VNF flavour in terms of usage of resources, to achieve a given SLA (for example using automated scaling procedures, see D3.3).
At the Marketplace level, those VNF-specific metrics that may be part of the SLA agreed between SP and customer will be evaluated for business and commercial clauses (e.g. penalties, rewards, etc.) that will finally impact the billing procedure (see D6.4).
The subsections to follow overview a list of VNF-specific metrics for each of the VNFs being developed in T-NOVA. The lists which follow are tentative and are meant to be continuously updated as the VNF applications evolve.
Please also note that most of these metrics refer to the specific functionality of each VNF as well as its component software modules. For a detailed description of the T-NOVA VNFs, please refer to Deliverable D5.32 [D532].
4.2.2.1. vSBC metrics
The vSBC components (VNFCs) able to generate metrics are: LB, IBCF, BGF and O&M (please refer to [D532] for more details).
These data are collected by the O&M component, and sent to the Monitoring Manager via the monitoring agent, using the SNMP protocol.
The following table sums up the VNF-specific metrics for the vSBC functions.
Table 5. vSBC monitoring metrics
Metric | Unit | Notes
Total number of SIP sessions/transactions | Count - Incremental | Generated by the LB and IBCF component
Number of failed SIP sessions/transactions due to vSBC internal problems | Count - Incremental | This counter doesn't include incoming incorrect SIP Requests, or failures coming from external network
Incoming RTP data throughput (incoming bandwidth consumption) | Count - Incremental | Number of incoming RTP packets/bytes. Generated by the BGF component
Outgoing RTP data throughput (outgoing bandwidth consumption) | Count - Incremental | Number of outgoing RTP packets/bytes. Generated by the BGF component
RTP frame loss | Average % | Due either to source filtering, or to failing a source address/port check, or if the flow rate exceeds the pre-determined bandwidth Context
Latency | msec (Average value) | Average transmission delay. Generated by the BGF component
Interarrival jitter | msec (Average value) | Average inter-packet arrival jitter. Generated by the BGF component
Number of transcoding/transrating procedures | Count - Incremental | Generated by the BGF component
Number of failed transcoding/transrating procedures due to vSBC internal problems | Count - Incremental | Generated by the BGF component
We point out that:
• the first two metrics (SIP sessions) are related to the control plane monitoring
• the remaining metrics are related to the media plane monitoring
Upon receipt of a new SNMP GET request coming from the monitoring agent, all these metrics are reset and start accumulating again.
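Because the counters restart at every read, each polled value represents the activity since the previous poll rather than a lifetime total. The following Python sketch shows how a consumer could turn such readings into rates and cumulative totals; the function and class names are hypothetical and are not part of the vSBC or of the monitoring agent.

```python
def per_poll_rate(count_since_last_poll, poll_interval_s):
    """Average events per second over the last polling interval."""
    if poll_interval_s <= 0:
        raise ValueError("poll interval must be positive")
    return count_since_last_poll / poll_interval_s


class RunningTotal:
    """Rebuilds a cumulative total from reset-on-read counter values."""

    def __init__(self):
        self.total = 0

    def add(self, count_since_last_poll):
        # Each reading already excludes everything reported earlier,
        # so the readings can simply be summed.
        self.total += count_since_last_poll
        return self.total
```

For example, a reading of 1200 SIP sessions with a 60-second polling period corresponds to an average of 20 sessions per second over that period.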
These metrics may be strongly influenced by:
• incoming packet sizes (e.g. 64, 128, 256, ..., 1518 bytes)
• hardware and software acceleration technologies (e.g. DPDK or GPU). In particular, GPU hardware accelerators might be used, in tandem with standard processors, in case of intensive processing (e.g. video transcoding/transrating).
4.2.2.2. vTC metrics
Table 6 overviews the metrics reported by the virtual Traffic Classifier VNF (vTC).
Table 6. vTC monitoring metrics
Metric | Unit | Notes
Packets per second | Count - average | Packets processed by the TC
Flows per second | Count - average | Discrete flows per second
Total flows | Count - incremental | The number of unique total flows recognized
Application Protocols | Count - incremental | The number of different applications detected
Throughput | Mbits/sec | The traffic in Mbits processed by the vTC per second
4.2.2.3. vSA metrics
The metrics reported by the virtual Security Appliance (vSA) are shown in Table 7. The vSA metrics correspond to the two vSA components (snort and pfsense).
Table 7. vSA monitoring metrics
Metric | Unit | Notes
Number of errors coming in/going out of the wan/lan interface of pfsense | Count - incremental | Four metrics (with prefix 'vsa_pfsense_'): lan_inerrs, lan_outerrs, wan_inerrs, wan_outerrs
Number of bytes coming in/going out of the wan/lan interface of pfsense | Count - incremental | Four metrics (with prefix 'vsa_pfsense_'): lan_inbytes, lan_outbytes, wan_inbytes, wan_outbytes
Number of packets coming in/going out of the wan/lan interface of pfsense | Count - incremental | Four metrics (with prefix 'vsa_pfsense_'): lan_inpkts, lan_outpkts, wan_inpkts, wan_outpkts
State table size of pfsense | Count | -
Percent of dropped packets, generated by snort | Percent (%) | -
Number of alerts per second, generated by snort | Percent (%) | -
vSA throughput, generated by snort | Kbits/sec | -
Pfsense uptime | String | For example: 01 Hour 17 Minutes 51 Seconds
4.2.2.4. vHG metrics
Table 8 summarises the virtual Home Gateway (vHG) metrics.
Table 8. vHG monitoring metrics
Metric | Unit | Notes
Total size swift node | Bytes | From Swift
Total objects stored | Count - incremental | From Swift
Total versions stored | Count - incremental | From Frontend xml file
Total unique URLs stored | Count - incremental | From Frontend xml file
Total rules | Count - incremental | From Frontend xml file
Average time for transcoding | Seconds | From workers
4.2.2.5. vProxy metrics
The metrics reported by the virtual proxy (vProxy) VNF are summarized in Table 9. Most of them are bound to the specific proxy implementation (squid), but can be extended to match other implementations as well.
Table 9. vProxy monitoring metrics
Metric | Unit | Notes
Number of HTTP requests received | Count - incremental | The number of HTTP requests received by Squid since the last measurement
Cache hits percentage | Percent (%) | The percentage of HTTP requests that result in a cache hit for the last 5 minutes. It also includes cases in which Squid validates a cached response and receives a 304 (Not Modified) reply.
Memory hits percentage | Percent (%) | The percentage of all cache hits that were served from memory (hits that are logged as TCP_MEM_HIT in Squid's logs)
Disk hits percentage | Percent (%) | The percentage of all cache hits that were served from disk (hits that are logged as TCP_HIT in Squid's logs)
Cache disk utilization | Percent (%) | The amount of disk currently being used by the cached objects divided by the total amount of disk that can be allocated for caching.
Cache memory utilization | Percent (%) | The amount of memory (RAM) currently being used by the cached objects divided by the maximum amount of memory that can be allocated for caching.
Number of users accessing the proxy | Count | Squid assumes that each user has a unique IP address
4.3. VNF Monitoring Agent
The VNF Monitoring Agent comes either pre-installed within the VM image hosting the VNFC or is installed upon VNFC deployment. It will be automatically launched upon VNF start-up and run continuously in the background. The agent collects a wide range of metrics from the local OS.
For the implementation of the monitoring agent, we exploit the popular collectd-core module [collectd] (also see Annex I, Sec. 9.1.1.8). Collectd-core comes in a package already available in most Linux distributions and can be directly installed with relatively minimal overhead.
Given that the list of available collectd plugins is quite extensive, we have selected a basic set of plugins to be used in T-NOVA, in order to cover all generic metrics identified in Sec. 4.2.1 and also to capture the most vital meters of the system, without introducing too much overhead. These plugins, accompanied by a brief description and the metrics which are collected, are overviewed in Table 10 below.
Table 10. Collectd plugins used in T-NOVA
Plugin | Description | Metrics
CPU | Collects the amount of time spent by the CPU in various states, most notably executing user code, executing system code, waiting for IO-operations and being idle. | user, interrupt, softirq, steal, nice, system, idle, wait
Memory | Collects physical memory utilization. | used, buffered, cached, free
Disk | Collects performance statistics of hard-disks and, where supported, partitions. | octets.read, octets.write, ops.read, ops.write, time.read, time.write, merged.read, merged.write
Interface | Collects information about the traffic (octets per second), packets per second and errors of interfaces | if_octets.rx, if_octets.tx, if_packets.rx, if_packets.tx, if_errors.rx, if_errors.tx
Processes | Collects the number of processes, grouped by their state (e.g. running, sleeping, zombies, etc.). | ps_state-running, ps_state-sleeping, ps_state-zombies, ps_state-stopped, ps_state-paging, ps_state-blocked, fork_rate
For the communication of metrics, the Monitoring Agent features a TCP or UDP dispatcher which pushes measurements to the VIM MM periodically. The push frequency will be configurable (either manually or automatically). The set of metrics (selection among all available ones) to be communicated will also vary among VNFs, and will be defined in the VNFD.
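As an illustration of the push model, the Python sketch below encodes one measurement and sends it as a UDP datagram. The JSON wire format, the field names and the function names are assumptions made for this example only; collectd itself uses its own network protocol, and the actual T-NOVA message format is not specified here.

```python
import json
import socket
import time


def encode_measurement(host, plugin, metric, value, timestamp=None):
    """Serialise one measurement as UTF-8 JSON (hypothetical wire format)."""
    record = {
        "host": host,
        "plugin": plugin,
        "metric": metric,
        "value": value,
        "time": time.time() if timestamp is None else timestamp,
    }
    return json.dumps(record, sort_keys=True).encode("utf-8")


def push_udp(payload, address):
    """Send one datagram to the collector at address = (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, address)
```

A periodic dispatcher would call push_udp once per configured push interval for each selected metric. UDP keeps the agent lightweight at the cost of possible message loss, whereas TCP trades some overhead for reliable delivery.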
4.4. Collection of VNF-specific metrics
The VIM monitoring framework provides several options for collecting VNF metrics; each VNF developer may choose the most appropriate option which suits their requirements, policies and constraints.
The direct communication method involves the VNF application itself reporting selected metrics as key-value pairs to the VIM MM at arbitrary intervals. For this purpose, we have developed a set of lightweight libraries (currently in Python and Java) which the VNF provider/developer can integrate in the application. This way, the VNFP can use the methods provided to easily and quickly dispatch internal application metrics without knowing the internals and interfaces of the monitoring framework.
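To give an idea of what such a library could look like from the VNF developer's side, here is a minimal Python sketch. It is not the actual T-NOVA library: the class name, the message layout and the injected send callable (standing in for the real transport to the VIM MM) are all assumptions.

```python
import json
import time


class MetricReporter:
    """Collects key-value metrics and hands them to a transport in one message."""

    def __init__(self, vnf_id, send):
        self.vnf_id = vnf_id
        self._send = send      # callable taking one string; hypothetical transport
        self._pending = {}

    def report(self, key, value):
        """Record the latest value of an application metric."""
        self._pending[key] = value

    def flush(self, timestamp=None):
        """Dispatch all pending metrics; returns the message, or None if empty."""
        if not self._pending:
            return None
        message = json.dumps(
            {
                "vnf": self.vnf_id,
                "time": time.time() if timestamp is None else timestamp,
                "metrics": self._pending,
            },
            sort_keys=True,
        )
        self._send(message)
        self._pending = {}
        return message
```

The application only calls report and flush; the transport details stay hidden behind the send callable, which mirrors the goal of shielding the VNFP from the internals of the monitoring framework.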
The indirect communication method implies that all VNF metrics are collected by the monitoring agent (collectd) by means of plugins. This can be done in three alternative ways:
1. Using the collectd "Snmp" plugin. This option is appropriate in cases when the VNF already exposes an SNMP service. In this case, the metrics to be collected are described by a Management Information Base (MIB). The MIB includes the VNF/VDU identifiers, and uses a hierarchical namespace containing object identifiers (OIDs); each OID identifies a variable that can be read via the SNMP protocol (see RFC 2578). The T-NOVA agent (collectd) periodically issues standard SNMP GET requests to the VNF SNMP service for these specific OIDs and gets back the values, which it in turn communicates to the VIM MM.
2. Using the collectd "Tail" plugin. This is the simplest method, which requires minimal integration with the VNF. With this approach, the VNF application dumps metrics as entries in a log file with a known format. The collectd Tail plugin parses the log file after each update, extracts the metrics and communicates them to the VIM MM.
3. Using a custom collectd plugin. This is the most complicated method and requires the VNFP to develop a special collectd plugin for the VNF. However, this might be the preferred choice for some VNFPs who in any case want to add collectd support in their VNF, given that collectd is very widely used also outside T-NOVA and already integrated with some of the most popular monitoring frameworks.
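The log-based option (item 2) relies on matching each new log line against a known pattern. The Python sketch below illustrates that matching step only; in a real deployment the pattern is declared in the collectd configuration rather than in Python, and the log line layout used here (timestamp, then name=value) is a hypothetical example of a "known format".

```python
import re

# Hypothetical log format: "<timestamp> <metric_name>=<numeric value>"
LINE_RE = re.compile(
    r"^(?P<time>\S+)\s+(?P<name>[A-Za-z_][\w.]*)=(?P<value>-?\d+(?:\.\d+)?)$"
)


def parse_metric_line(line):
    """Return (timestamp, name, value) for a metric line, or None otherwise."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    return match["time"], match["name"], float(match["value"])
```

Lines that do not match the pattern are simply ignored, so the VNF application can keep writing ordinary log messages to the same file.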
4.5. Monitoring of FPGA-based VNFs
The T-NOVA project attempts to expand the NFV purview to heterogeneous compute architectures such as GPUs and FPGAs. Utilizing such specialized hardware has direct effects on the monitoring infrastructure that should be used to support it. This section explores the corollaries of the use of programmable logic as compute nodes in the T-NOVA environment.
Monitoring programmable logic devices presents unique challenges in comparison to standard CPUs. Many of the notions present in the latter are not present in programmable logic devices, and furthermore programmable logic-based systems can show large disparities, which makes the task of providing one overarching concept exceedingly difficult.
As a starting point for our measurement architecture definition, we use the programmable cloud platform architecture introduced in D4.1. In this architecture we assume an FPGA SoC is used as the compute node. An FPGA SoC consists of a Processing System (PS), which comprises one or more CPUs, and the Programmable Logic (PL). In this architecture the PS executes the OpenStack worker and any software required for the management of the programmable resources available in the PL, while the actual VNFCs to be monitored are deployed to the PL.
In this scheme the monitoring infrastructure is by necessity also divided into two components. One component resides in SW and is executed in the PS. This program collects statistics from the HW components and forwards them to the monitoring manager, and can also be used to monitor the performance of the PS if this is desired.
Figure 8. Monitoring architecture for the T-NOVA FPGA SoC
The HW component, on the other hand, is responsible for collecting all of the relevant parameters on the HW side and forwarding them to the software component. These parameters are specific to each VNFC, and thus it is up to the user to provide the appropriate connections and circuits for it. The FPGA SoC platform provides the user with infrastructure which can be used to send that data on to the software component.
This infrastructure consists of a simple AXI4 stream which the user HW statistics module must use to interface with an AXI4 DMA engine. The latter is responsible for all data transfers between the PS and the PL and ensures high-throughput, low-latency transfers between the two. The HW monitoring component shares the AXI DMA with the VNFC also deployed on the FPGA. This is enabled by using separate DMA channels for the data transfers, a feature offered by the AXI DMA block used in the design. The DMA block offers one AXI4S interface for all inputs and uses an additional signal to identify the user which provided the data. It then transfers that data to the appropriate memory address space, from which the SW monitoring application can read it.
This scheme provides a clear, expandable, standard interface for the HW VM to transfer its monitoring data to the SW component and a straightforward method for the SW to read the data and perform any processing required before sending it on to the monitoring manager.
4.6. VIM Monitoring Manager architecture and components
4.6.1. VIM MM Architecture
Aligned with the requirements and the design choices set in the previous sections, the functional components of the VIM Monitoring Manager are depicted in Figure 9 and described in this section and in the ones which follow.
Figure 9. VIM MM functional components
The Monitoring Backend is the core component of the monitoring framework. It is developed in JavaScript and uses the node.js [nodejs] framework to run as a server-side application. The reason behind this choice is that JavaScript matches an asynchronous, event-driven programming style, optimal for building scalable network applications. The main functionality of the VIM MM is data communication, and the node.js ecosystem offers several services to facilitate communication, especially via web services, as well as event-driven networking.
The backend itself is divided into the following modules:
• Database connector. This module accesses the time-series database (see Sec. 4.6.5) in order to write and to read measurements. This module uses influent [4], an InfluxDB JavaScript driver.
• OpenStack and OpenDaylight connectors. These modules perform requests to various OpenStack and OpenDaylight services in order to acquire cloud- and network-related metrics (see Sec. 4.6.2). The OpenStack connector communicates with Keystone, the OpenStack Identity service, in order to generate tokens that can be used for authentication and authorisation during the rest of the OpenStack queries. It polls Nova, the OpenStack Compute service, in order to get the available instances. Finally, it polls Ceilometer, the OpenStack Telemetry service, in order to receive the latest measurements of the instances. The request-promise [5] npm (node.js package) module provides here the HTTP client to perform all these requests. The OpenDaylight connector is still under development.
[4] https://github.com/gobwas/influent
[5] https://www.npmjs.com/package/request-promise
• Northbound REST API. This module exposes all the recorded measurements via HTTP and offers the ability to subscribe to specific measurement events. hapi.js [6] has been used as a framework to build this. hapi.js plugins were also used, such as joi [7] for validation, and hapi-swaggered [8] and hapi-swagger [9] for Swagger documentation generation. See Sec. 4.6.3 for more details.
• Alarming and anomaly detection. This module, still under development, performs statistical processing in order to derive events and alarms (see Sec. 4.6.4 for more details).
• VNF Application connector. It accepts data periodically dispatched by each VNF application, then filters and stores them. These metrics are specific to each VNF (e.g. number of flows, sessions etc.). The list of metrics to be collected as well as the dispatch frequency are described in the VNF Descriptor (VNFD).
• Configuration. This module allows the use of local files in order to load settings. Node-config [10] has been used here to define a set of default parameters and extend them for different deployment environments, e.g., development, QA, staging and production. Configurations are stored in configuration files within the backend application and can be overridden and extended by environment variables.
The default config file is config/default.json. The administrator may create additional files with the same format in the config directory; the backend application uses one of them when the environment variable NODE_ENV is set to the name of that file without the .json suffix. For example, to use config/production.json, the following command needs to be invoked in a Bash-compatible shell:
export NODE_ENV=production
The configuration parameters that are currently available are the following:
Config Parameter | Description
loggingLevel | Sets the logging level. Available levels are debug, warn and info.
database | Connection information for the time-series database. Required information should be entered in the following strings: host, port, username, password and name (for the target database name).
identity | Connection information for the OpenStack Keystone service. Required information should be entered in the following strings: host, port, tenantName, username and password. It should be noted that the tenant whose credentials are used must have sufficient privileges to access all the necessary OpenStack VNF instances.
ceilometer | Connection information for the OpenStack Ceilometer service. Required information should be entered in the following strings: pollingInterval, host and port. The pollingInterval sets the period at which the backend polls Ceilometer for measurements.
nova | Connection information for the OpenStack Nova service. Required information should be entered in the following strings: host and port.
[6] http://hapijs.com/
[7] https://github.com/hapijs/joi
[8] https://github.com/z0mt3c/hapi-swaggered
[9] https://github.com/glennjones/hapi-swagger
[10] https://github.com/lorenwest/node-config
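The layering of a default configuration with a per-environment override can be illustrated with a short sketch. It is written in Python for brevity (the backend itself uses node-config in JavaScript), keeps the defaults in code rather than in config/default.json, and every value shown, including ports, credentials and the database name, is a placeholder assumed for illustration only.

```python
import copy
import json
import os

# Hypothetical defaults mirroring the parameters listed above;
# all values are placeholders, not actual T-NOVA settings.
DEFAULT_CONFIG = {
    "loggingLevel": "info",
    "database": {"host": "localhost", "port": 8086, "username": "root",
                 "password": "root", "name": "statsdb"},
    "identity": {"host": "localhost", "port": 5000, "tenantName": "admin",
                 "username": "admin", "password": "secret"},
    "ceilometer": {"pollingInterval": 10, "host": "localhost", "port": 8777},
    "nova": {"host": "localhost", "port": 8774},
}


def deep_merge(base, override):
    """Return a copy of base with override applied recursively."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def load_config(config_dir="config", env=None):
    """Start from the defaults, then overlay <NODE_ENV>.json if it exists."""
    env = env or os.environ.get("NODE_ENV", "development")
    config = copy.deepcopy(DEFAULT_CONFIG)
    path = os.path.join(config_dir, env + ".json")
    if os.path.isfile(path):
        with open(path) as handle:
            config = deep_merge(config, json.load(handle))
    return config
```

With NODE_ENV=production, a config/production.json containing only {"database": {"host": "influx"}} would change the database host and leave every other default untouched.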
4.6.2. Interfaces to cloud and network controllers
Monitoring of computing, hypervisor and storage status and resources is performed directly via the OpenStack Ceilometer framework. The VIM MM (OpenStack connector) periodically polls the Telemetry API for metrics regarding all deployed physical and virtual resources. Although these metrics could be retrieved by directly accessing the Ceilometer database, the schema of the latter may evolve in future OpenStack versions, so it is more appropriate to use the REST-based Telemetry API. The VIM MM issues GET requests to the service referring to a specific resource and meter, and the results are returned in JSON format.
Fortunately, the Telemetry support for the hypervisor selected for T-NOVA (libvirt) offers the widest list of available monitoring metrics compared to other hypervisors, such as Xen or vSphere.
The current version of the OpenStack connector has the following workflow:
• Token management. Communication with the OpenStack API always requires a valid token. The backend uses the OpenStack Keystone service to acquire a valid token, which is used for every transaction. The token is checked before any request is submitted and, if it has expired, it is renewed.
• Instance information retrieval. The backend does not know a priori which instances are to be monitored. By sending a request to the Nova API, it gets a list of the active instances in order to proceed with the measurement requests.
• Measurement retrieval. Once the backend knows of an active OpenStack instance, it is able to retrieve specific measurements for it. Currently CPU utilisation and the incoming and outgoing byte rates are supported, and the list is being expanded with other metrics that OpenStack Ceilometer supports.
This workflow is repeated with a period that can be set with the pollingInterval parameter described above.
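The three-step workflow above can be sketched as follows. This is an illustrative Python outline, not the connector's actual JavaScript code: the class name and the three injected callables (standing in for the Keystone, Nova and Ceilometer requests) are assumptions, chosen so that the control flow can be followed and exercised without an OpenStack deployment.

```python
import time


class OpenStackPoller:
    """Sketch of the token -> instances -> measurements polling cycle."""

    def __init__(self, get_token, list_instances, get_measurement, meters,
                 clock=time.time):
        # The callables are hypothetical stand-ins for the real HTTP requests.
        self.get_token = get_token              # () -> (token, expires_at)
        self.list_instances = list_instances    # (token) -> [instance_id, ...]
        self.get_measurement = get_measurement  # (token, instance_id, meter) -> sample
        self.meters = meters
        self.clock = clock
        self.token = None
        self.expires_at = 0.0

    def valid_token(self):
        """Renew the token only if it is missing or has expired."""
        if self.token is None or self.clock() >= self.expires_at:
            self.token, self.expires_at = self.get_token()
        return self.token

    def poll_once(self):
        """Run one cycle and return {(instance_id, meter): sample}."""
        results = {}
        for instance_id in self.list_instances(self.valid_token()):
            for meter in self.meters:
                results[(instance_id, meter)] = self.get_measurement(
                    self.valid_token(), instance_id, meter)
        return results

    def run(self, polling_interval, cycles, store, sleep=time.sleep):
        """Repeat poll_once every polling_interval seconds, `cycles` times."""
        for _ in range(cycles):
            store(self.poll_once())
            sleep(polling_interval)
```

Checking the token before every request, rather than once per cycle, keeps a long cycle over many instances from failing halfway through when the token expires.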
Moreover, collecting metrics via the API allows exploiting additional features of Telemetry, such as:
• Meter grouping: it is possible to define sets of metrics and retrieve an entire set with a single query;
• Sample processing: it is possible to define basic aggregation rules (average, max/min etc.) and retrieve only the aggregate instead of a set of metrics;
• Alarming: it is possible to set alarms based on thresholds for the collection of samples. An alarm can depend on a single meter or on a combination. The VIM MM may use the API to set an alarm and define an HTTP callback service to be called when the alarm has been set off.
4.6.3. Northbound API to Orchestrator
The VIM Monitoring Framework offers a Northbound API to the Orchestrator in order to inform the latter of the newest measurements and, in the future, of possible alerts. An HTTP RESTful interface provides the latest measurements upon request, as well as the ability to subscribe to measurements.
The latest draft of the ETSI NFV IFA document [NFVIFA005], which provides an insight into the Or-Vi reference point, contains a high-level specification of the requirements and the data structures which need to be adopted for infrastructure management and monitoring. The requirements related to monitoring can be summarized as follows:
• The VIM must support querying information regarding consumable virtualised resources
• The VIM must issue notifications of changes to information regarding resources
• The VIM must offer full support for alarming (alarm creation/modification/subscription/issue/deletion)
• The VIM must issue notifications for infrastructure faults
The Northbound API provided by the T-NOVA VIM Monitoring Framework intends to align with these requirements.
4.6.3.1. Querying
First, a set of REST GET endpoints supports the transmission of the latest measurements of every available type for every instance being monitored.
The template of such a URL is /api/measurements/{instance}.{type}, where instance is the Universally Unique Identifier (UUID) given by the OpenStack deployment to the instance and type is one of the supported measurement types. The currently supported measurement types are:
• cpu_util (CPU utilisation)
• cpuidle (CPU idle usage)
• fsfree (free space on the root filesystem)
• memfree (free memory space)
• network_incoming (the rate of incoming bytes) and
• network_outgoing (the rate of outgoing bytes)
The format of the answer is a JSON object whose fields are the following:
• timestamp: the time at which the measurement was taken
• value: the actual measurement value
• units: the measurement units
These endpoints require constant polling in order to retrieve their values. If a system requires a constant stream of measurements at specific intervals, it can use the subscription endpoint instead.
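A client of the querying endpoint could look like the following minimal Python sketch. The URL template and the three response fields come from the description above; the base URL, the function names and the example values are assumptions made for illustration.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def measurement_url(base_url, instance, mtype):
    """Build /api/measurements/{instance}.{type} for one instance and type."""
    return "%s/api/measurements/%s.%s" % (
        base_url.rstrip("/"), quote(instance, safe=""), quote(mtype, safe=""))


def parse_measurement(body):
    """Extract the three documented fields from a JSON response body."""
    data = json.loads(body)
    return {key: data[key] for key in ("timestamp", "value", "units")}


def fetch_measurement(base_url, instance, mtype, timeout=5):
    """Poll the endpoint once and return the parsed measurement."""
    url = measurement_url(base_url, instance, mtype)
    with urlopen(url, timeout=timeout) as response:
        return parse_measurement(response.read().decode("utf-8"))
```

A consumer that needs a continuous stream would have to call fetch_measurement in a loop, which is exactly the polling cost that the subscription endpoint is meant to avoid.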
4.6.3.2. Meters/notifications push
The VIM MM enables a publish-subscribe communication model for pushing metrics and events to the Orchestrator. In order to subscribe to measurement events, it is required to provide the following information in the form of a JSON object:
• types: This is an array of the measurement types. The supported types are the same as the ones in the GET endpoints.
• instances: This is an array of the instances that have to be monitored. The UUIDs of the instances are also used here.
• interval: This is the time the monitoring backend has to wait before sending a new set of measurements. The time should be given in minutes.
• callbackUrl: This is the URL that the monitoring backend has to call back in order to submit the newest measurements.
This JSON object has to be submitted as the payload of a POST request to the endpoint /api/subscribe. Upon submission, a confirmation message is sent back as the response and, after the specified interval, a message similar to the ones returned by the GET endpoints is sent to the callbackUrl.
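The subscription payload can be assembled as in the sketch below. The four field names and the list of measurement types are taken from the text; the client-side validation rules and the example callback URL are assumptions of this sketch and may differ from the checks the backend performs.

```python
import json

# Measurement types listed for the GET endpoints.
SUPPORTED_TYPES = {"cpu_util", "cpuidle", "fsfree", "memfree",
                   "network_incoming", "network_outgoing"}


def build_subscription(types, instances, interval, callback_url):
    """Validate the inputs and return the JSON body for POST /api/subscribe."""
    unknown = sorted(set(types) - SUPPORTED_TYPES)
    if unknown:
        raise ValueError("unsupported measurement types: %s" % ", ".join(unknown))
    if not instances:
        raise ValueError("at least one instance UUID is required")
    if interval <= 0:
        raise ValueError("interval must be a positive number of minutes")
    return json.dumps({
        "types": list(types),
        "instances": list(instances),
        "interval": interval,
        "callbackUrl": callback_url,
    })
```

For example, build_subscription(["cpu_util"], ["<instance UUID>"], 5, "http://orchestrator.example/callback") yields a body asking for CPU utilisation of one instance every five minutes.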
4.6.3.3. Alarming
The VIM MM will offer methods for creating alarms and dispatching callbacks whenever the status of an alarm changes. This feature is under development.
4.6.3.4. API documentation
For the convenience of API consumers, a Swagger-UI endpoint is provided at /docs, which users can refer to for up-to-date information (Figure 10).
Figure 10. Live API documentation via Swagger
4.6.4. Anomaly detection
For detecting critical situations and producing alarms, the T-NOVA VIM MM will at a first stage support operations via statically defined thresholds. The Orchestrator will use the alarming methods (see Sec. 4.6.3.3) to define and modify alarms. In turn, the VIM MM will check the affected measurements, evaluate the expressions given by the rules and send a notification if an expression is true. This is a standard feature provided by several monitoring frameworks, as surveyed in Chap. 3.
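Static threshold evaluation can be illustrated with the following Python sketch. Since the alarming feature is still under development, the rule format used here (a measurement type, a comparison operator and a threshold) is purely hypothetical and only shows the check-evaluate-notify sequence described above.

```python
import operator

# Comparison operators an alarm rule may use; this rule format is a
# hypothetical illustration, not the VIM MM alarm schema.
OPERATORS = {"gt": operator.gt, "ge": operator.ge,
             "lt": operator.lt, "le": operator.le}


def evaluate_alarm(rule, latest_values):
    """Return True if the latest value of the rule's type breaches its threshold."""
    value = latest_values.get(rule["type"])
    if value is None:
        return False  # no measurement yet, so nothing to compare
    return OPERATORS[rule["operator"]](value, rule["threshold"])


def check_alarms(rules, latest_values, notify):
    """Evaluate every rule and call notify(rule, value) for each one that fires."""
    fired = []
    for rule in rules:
        if evaluate_alarm(rule, latest_values):
            notify(rule, latest_values[rule["type"]])
            fired.append(rule)
    return fired
```

In a deployment, notify would typically issue the HTTP callback to the subscriber; here it is left as a parameter so that the evaluation logic stays independent of the transport.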
Going beyond this capability, anomaly detection, i.e. the identification of possible malfunctions without pre-defined alarm thresholds, is a very promising research direction which is very relevant to NFV monitoring.
In the T-NOVA VIM monitoring framework, anomaly detection will be incorporated into the alarming system. The VIM MM will use outlier detection methods to examine a set of meters and samples associated with a resource and identify whether there is a significant deviation from the "standard" operation. Some of the methods for outlier detection which will be investigated for application in T-NOVA are [Hodge04] proximity-based techniques, parametric and non-parametric methods, as well as neural networks.
Both semi-supervised and unsupervised approaches will be considered. Semi-supervised approaches are expected to yield better results, yet their application assumes that the VNF has first been properly tested, resulting in an appropriately sized dataset which corresponds to "normal" operation.
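As a concrete instance of a parametric, semi-supervised method, the sketch below learns the mean and standard deviation of a meter from samples recorded during "normal" operation and flags values that deviate by more than a chosen number of standard deviations. It is only one simple example of the family of techniques mentioned above, not the method selected for T-NOVA; the function names and the default limit of 3 are assumptions.

```python
import statistics


def fit_baseline(normal_samples):
    """Learn mean and standard deviation from samples of "normal" operation."""
    return statistics.mean(normal_samples), statistics.stdev(normal_samples)


def is_outlier(value, baseline, max_z=3.0):
    """Flag a value lying more than max_z standard deviations from the mean."""
    mean, stdev = baseline
    if stdev == 0:
        return value != mean  # constant baseline: any change is a deviation
    return abs(value - mean) / stdev > max_z
```

The quality of the result depends directly on the baseline dataset, which is why the semi-supervised approach presupposes adequate testing of the VNF beforehand.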
4.6.5. Time-series Database
Since the T-NOVA VIM Monitoring Backend handles primarily measurements, we have selected a time-series database as the most suitable store. For its implementation we have opted to use InfluxDB [InfluxDB], a time-series database written in Go. By concentrating all data in a performant DB and relying on periodic feeds, we can simplify workflows, reduce inter-component signalling and thus eliminate the need for a message queue, which is commonly used in monitoring frameworks.
Although InfluxDB is a distributed database, for the time being we are evaluating it on a single node, until storage issues appear.
The Backend requires that a database has already been created in InfluxDB. The use of a retention policy is also highly recommended, since the database could potentially store multiple gigabytes of measurement data every day. For the development and QA testing of the backend we use a retention policy of 30 days. After 30 days the measurements are erased, in order to free up disk and memory space.
Each meter is stored in a separate table, where multiple instances may store values of the specific measurement type. In addition to the actual value, a timestamp and an instance tag are also stored, in order to identify the measurement's origin and time. Finally, the data type of every measurement is float64.
Measurements are written in the Line protocol [11] and queried with InfluxDB's SQL-like query language, in both cases by using the influent module. An example query is the following:
SELECT last(value) FROM measurementType WHERE host='instanceA'
This query retrieves the last measurement of a certain type for a certain VNF instance.
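The example query can be generated for any measurement type and instance with a small helper such as the one sketched below. The helper name and the input checks are assumptions added for illustration; in the backend the query is issued through the influent driver.

```python
import re

# Measurement types are restricted to simple identifiers so that they can be
# placed in the query unquoted (an assumption of this sketch).
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def last_value_query(measurement_type, instance):
    """Build the query for the latest value of one type and one instance."""
    if not _IDENTIFIER.match(measurement_type):
        raise ValueError("invalid measurement type: %r" % measurement_type)
    # Escape backslashes and single quotes so that the instance name cannot
    # terminate the string literal early.
    safe_instance = instance.replace("\\", "\\\\").replace("'", "\\'")
    return "SELECT last(value) FROM %s WHERE host='%s'" % (
        measurement_type, safe_instance)
```

Calling last_value_query("cpu_util", "instanceA") reproduces the query shown above with measurementType replaced by cpu_util.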
4.6.6. Graphical user interface
The main interface of the VIM Monitoring Framework is the HTTP API of the backend, as described in Sec. 4.6.3. During the development of the monitoring framework, however, the VNF developers requested a graphical way of accessing the monitoring data that their VNF instances, and more specifically their VNF applications, produce. This led to the integration of a Grafana server. Grafana [Grafana] is a graph builder for visualising time-series metrics. It supports InfluxDB as a data source and thus makes it easy to visualise all the available measurements directly from the database.
Figure 11. Visualization of measurements with Grafana
4.7. Packaging, documentation and open-source release
In an effort to maximise the impact of T-NOVA on the NFV community, the T-NOVA monitoring framework is released [GH-VIM] under the GNU General Public License v3.0 [12]. Interested stakeholders and prospective contributors are welcome to download the code and submit pull requests on the public GitHub repository [13]. The plan is to move the project to the overall T-NOVA GitHub account, as soon as the latter becomes available.
[11] https://influxdb.com/docs/v0.9/write_protocols/write_syntax.html
[12] https://github.com/spacehellas/tnova-vim-backend/blob/master/LICENSE.txt
[13] https://github.com/spacehellas/tnova-vim-backend
A Docker image containing node and the application is also provided inside this repository. The Docker image allows the seamless usage of the backend in any OpenStack deployment, regardless of whether it is deployed on a physical or a virtual machine. Most of the configuration parameters are exposed as Docker environment variables and can be set during container creation. A YAML file (docker-compose.yml) is also provided inside the repository, so that users can combine the VIM monitoring backend, InfluxDB and Grafana in the same way as when the backend is being developed and tested.
For further documentation of the backend, please refer to the README file [14] and the documentation directory [15] of the repository. The information will be kept up-to-date while development progresses. For the API documentation, please refer to the /docs endpoint of a working deployment, where the Swagger-UI is hosted.
[14] https://github.com/spacehellas/tnova-vim-backend/blob/master/README.md
[15] https://github.com/spacehellas/tnova-vim/blob/master/documentation
5. VALIDATION
5.1. Functional testing
For the purpose of functional testing and benchmarking of the current release of the T-NOVA VIM monitoring framework, the latter was integrated into the T-NOVA IVM testbed as shown in Figure 12 below.
Figure 12. Testbed configuration for testing VIM Monitoring
The VIM Monitoring Manager was deployed as a Docker container on a separate physical host, as part of the VIM management and monitoring framework. It interfaced with OpenStack for the collection of metrics via Ceilometer.
The workload producing the metrics was the latest version of the vTC (virtual Traffic Classifier) VNF, deployed in a VM. The vTC used the Python library provided (see Sec. 4.4) in order to dispatch VNF metrics to the VIM MM. At the same time, the collectd agent was also installed in the VNF VM, to dispatch generic metrics.
In order to emulate realistic operational conditions, a traffic generator hosted in a separate VM was used to play back a real network traffic dump containing a mix of various services.
Behind the VIM Mon. Mgr., two clients were used: one accessing the metrics via the REST API and a second one accessing the web-based GUI.
The chosen functional test was intended to validate most of the functional capabilities of the VIM MM, namely:
• Interfacing with Ceilometer
• Collection of agent metrics
• Collection of VNF metrics
• Persistence of measurements
• GUI operation
The GUI was configured by the user to display the following metrics, integrated in a single view:
• VNF CPU utilization, retrieved from OpenStack
• VNF memory usage and network traffic (cumulative packet count), as reported by the guest OS via the monitoring agent
• VNF-specific metrics. Specifically, the vTC is able to report the packet rate of the different applications detected (e.g. Skype, Bittorrent, Dropbox, Google, Viber etc.)
Figure 13. End-to-end functional test: the VIM MM GUI screenshot, integrating metrics from various sources
It was verified that the multiple samples were collected and displayed properly. The validity of the measurements was verified by accessing the console view of the VNF VM and:
• Checking the system metrics via command-line tools (top, netstat etc.)
• Checking the vTC VNF logs, which contained periodic dumps of the VNF metrics (per-application rate). It is clarified that what is tested here is the ability to communicate and collect metrics and not the accuracy of the vTC.
System stability was also checked by allowing the system to run continuously; the vTC and VIM MM were found to be operating normally after more than two days of uptime and continuous operation, until they were manually stopped.
5.2. Benchmarking
As mentioned above, apart from the GUI, the VIM also exposes a programmatic interface (API) to the orchestration and VNFM components. We tested the scalability and performance of the VIM MM by loading it with a variable number of GET requests, asking for a single specific metric (CPU load) of the vDPI VNF. We used the httperf software [httperf] to generate synthetic HTTP GET requests at various rates and measured the rate of responses received. Then, we repeated the procedure, this time directly polling Ceilometer for the same metric.
The two sets of measurements were made on platforms with similar hardware capabilities. The results are depicted in Figure 14.
Figure 14. VIM Monitoring Manager performance
It can be seen that the VIM MM can expose metrics with performance comparable to native Ceilometer. It also seems to exhibit better stability when overloaded (at more than 160 requests/sec for the given hardware configuration).
An important added value of the VIM MM is its communication overhead, which has been reduced to a minimum to improve scalability. Figure 15 compares the length (in bytes) of the responses to a single GET request for a specific metric of a VNF VM (CPU load, memory utilization and disk usage). The response of Ceilometer is quite verbose, since it also includes detailed instance information. The VIM MM alleviates this by including in the response body only the absolutely necessary elements, i.e. the metric name, the value and the timestamp. The result is a decrease in overhead of about 95%.
Figure 15. Length of responses for single-metric requests
It is thus seen that the VIM MM exhibits acceptable performance when it comes to communicating metrics.
Regarding Ceilometer, however, it must be noted that its performance limitations are known and expected to be alleviated in the upcoming releases, and that the Telemetry project is already aiming to fulfil NFV requirements. In this context, Ceilometer could also, in the near future, offer an efficient solution for NFV monitoring, bringing at the same time strong community support as well as wide industrial uptake, being a core OpenStack component.
5.3. Fulfillment of requirements
Following the successful execution of the aforementioned validation and assessment tests, the table below explains how the implemented and tested VIM monitoring framework eventually fulfills (or is planned to fulfill) the requirements which were set in Section 2.
Table 11. Compliance to requirements
Requirement for the Monitoring Framework | Status | Justification
The MF must provide a vendor agnostic mechanism for physical resource monitoring. | Compliance | The mechanisms introduced for measurements collection are vendor agnostic for most metrics (CPU, memory, storage, network etc.)
The MF must provide an interface to the Orchestrator for the communication of monitoring metrics. | Compliance | The MF exposes a northbound REST API for metrics/alarms communication in either push or pull mode.
The MF must re-use resource identifiers when linking metrics to resources. | Compliance | The VIM MM re-uses the OpenStack UUIDs and hostnames for linking metrics to resources.
The MF must monitor in real time the physical network infrastructure as well as the vNets instantiated on top of it. | Compliance (Planned) | Network-related measurements are derived from the monitoring agents (for network interfaces) and also OpenDaylight (for virtual links and networks). (Feature under development)
The MF must provide an API for communicating metrics (in either push or pull mode) | Compliance | The MF exposes a northbound REST API for metrics/alarms communication in either push or pull mode.
The MF must collect utilisation metrics from the virtualised resources in the NFVI. | Compliance | The VIM MM collects metrics from deployed VMs and established vNets.
The MF must collect compute domain metrics. | Compliance | Direct access to compute domain metrics is achieved by means of the collectd monitoring agent.
The MF must collect hardware accelerator metrics | Compliance (Planned) | Hardware accelerator metrics are accessible by means of specific agent (collectd) plugins. (Feature under development)
The MF must collect compute metrics from the Hypervisor. | Compliance | The VIM MM collects hypervisor metrics indirectly via the Ceilometer API.
The MF must collect network domain metrics from the Hypervisor. | Compliance | The VIM MM collects hypervisor metrics indirectly via the Ceilometer API.
The MF must process and dispatch alarms. | Compliance (Planned) | The VIM MM allows configuring and subscribing to alarms.
The MF must collect metrics from physical and virtual networking devices. | Compliance (Planned) | The VIM MM interfaces with OpenDaylight to collect network device statistics. (Feature under development)
The MF must leverage SDN monitoring capabilities. | Compliance (Planned) | The VIM MM collects (mostly port) statistics for SDN devices from OpenDaylight. (Feature under development)
6. CONCLUSIONS AND FUTURE WORK
This document described the design and development of a monitoring framework for the T-NOVA IVM layer. Using a comprehensive state-of-the-art survey as well as a consolidation of T-NOVA requirements, the architecture of the T-NOVA VIM monitoring framework was specified. Taking into account the use of OpenDaylight and OpenStack as the controller technologies in the VIM, infrastructure metrics and statistics available from these controllers are collected. Furthermore, a VNF monitoring agent was also introduced, as an optional component, collecting a rich set of metrics from within VMs and VNF applications. All these metrics are aggregated and filtered in a centralised Monitoring Manager, which exposes status and resource information of the NFVI-PoP to the Orchestrator, as configured by the latter.
It is concluded that, with the proposed approach, the goal of delivering an effective, efficient and scalable monitoring solution for the T-NOVA IVM layer is achieved. The solution under development is able to expose to the Orchestrator and to the Marketplace enhanced awareness of the IVM status and resources, while at the same time keeping the communication and signalling overhead to a minimum.
The current release has been integrated with the T-NOVA IVM testbed and has been demonstrated in operation at the IEEE IM 2015 and IEEE SDN/NFV conferences as part of an integrated demonstrator of the T-NOVA project, monitoring an NFV service with the vTC VNF.
The next steps in implementation involve the finalization of the Orchestrator API with the alarming functionality, the integration of OpenDaylight, as well as the anomaly detection part. These advances will be reflected in the final version of this deliverable.
7. REFERENCES
[Aodh] OpenStack Telemetry alarming, https://github.com/openstack/Aodh
[Cloudwatch] Amazon CloudWatch, http://aws.amazon.com/cloudwatch
[Collectd] collectd – The system statistics collection daemon, https://collectd.org/
[Cyclops] Cyclops framework, http://icclab.github.io/cyclops/
[D232] M. McGrath (Ed.) et al., “Specification of the Infrastructure Virtualisation, Management and Orchestration – Final”, T-NOVA Deliverable D2.32, October 2015
[D532] “Network Functions Implementation and Testing – Final”, T-NOVA Deliverable D5.32, June 2016
[DCM] Ye Yu, C. Qian, and X. Li, "Distributed and Collaborative Traffic Monitoring in Software Defined Networks," presented at the ACM SIGCOMM Workshop on Hot Topics in Software Defined Networking (HotSDN '14), Chicago, IL, USA, 2014.
[Doctor] OPNFV Wiki - Project: Fault Management (Doctor), https://wiki.opnfv.org/doctor
[DoctorDel] Doctor Deliverable: Fault Management and Maintenance, Release 1.0.0, October 2015, http://artifacts.opnfv.org/doctor/DoctorFaultManagementandMaintenance.pdf
[Drools] Drools BPM engine, http://www.drools.org/
[Flowsense] C. Yu, C. Lumezanu, Y. Zhang, V. Singh, G. Jiang, and H. Madhyastha, "FlowSense: Monitoring Network Utilization with Zero Measurement Cost," in Passive and Active Measurement, vol. 7799, M. Roughan and R. Chang, Eds., Springer Berlin Heidelberg, 2013, pp. 31-41.
[Ganglia] Ganglia Monitoring System, http://ganglia.sourceforge.net
[GH-VIM] https://github.com/spacehellas/tnova-vim-backend
[Gnocchi] OpenStack Gnocchi project, https://wiki.openstack.org/wiki/Gnocchi
[Grafana] Grafana: An open source, feature rich metrics dashboard and graph editor for Graphite, InfluxDB & OpenTSDB, http://grafana.org/
[Graphite] Graphite: A Highly Scalable Real-time Graphing System, https://github.com/graphite-project/graphite-web
[Hodge04] V. Hodge and J. Austin, “A Survey of Outlier Detection Methodologies”, Artificial Intelligence Review 22 (2004), pp. 85-126
[httperf] https://github.com/httperf/httperf
[Icinga] ICINGA, https://www.icinga.org/
[InfluxDB] InfluxDB: An open-source, distributed, time series database with no external dependencies, http://influxdb.com/
[Monalisa] MONitoring Agents using a Large Integrated Services Architecture, http://monalisa.caltech.edu/monalisa.htm
[Monasca] OpenStack Monasca project, https://wiki.openstack.org/wiki/Monasca
[Nagios] Nagios Is The Industry Standard In IT Infrastructure Monitoring, http://www.nagios.org/
[NFVIFA005] Network Functions Virtualisation (NFV); Management and Orchestration; Or-Vi reference point – Interface and Information Model Specification, work in progress, November 2015
[nodejs] https://nodejs.org/en/
[NVFINF010] ETSI GS NFV-INF 010 V1.1.1 (2014-12), Network Functions Virtualisation (NFV); Service Quality Metrics
[OpenNetMon] N. L. M. van Adrichem, C. Doerr, and F. A. Kuipers, "OpenNetMon: Network monitoring in OpenFlow Software-Defined Networks," in Network Operations and Management Symposium (NOMS), 2014 IEEE, 2014, pp. 1-8.
[Payless] S. R. Chowdhury, M. F. Bari, R. Ahmed, and R. Boutaba, "PayLess: A low cost network monitoring framework for Software Defined Networks," in Network Operations and Management Symposium (NOMS), 2014 IEEE, 2014, pp. 1-9.
[Prediction] OPNFV Wiki – Data Collection for Failure Prediction, https://wiki.opnfv.org/prediction
[SeaLion] Sealion; Quickly Diagnose Problems with you Linux Servers, https://sealion.com/
[Shinken] Shinken, http://www.shinken-monitoring.org/
[Stacktach] Stacktach, Event-based Monitoring & Billing solution for OpenStack, https://github.com/rackerlabs/stacktach
[Statsd] StatsD; Simple daemon for easy stats aggregation, https://github.com/etsy/statsd/
[Telemetry] OpenStack Telemetry, https://wiki.openstack.org/wiki/Telemetry
[vSphere] VMware vSphere, http://www.vmware.com/products/vsphere
[Zabbix] ZABBIX, The Enterprise-class Monitoring Solution for Everyone, http://www.zabbix.com/
[Zenoss] Zenoss User Community, http://www.zenoss.org/
8. LIST OF ACRONYMS
Acronym | Explanation
API | Application Programming Interface
CPU | Central Processing Unit
DPDK | Data Plane Development Kit
FPGA | Field Programmable Gate Array
GPU | Graphics Processing Unit
HW | Hardware
KPI | Key Performance Indicator
NFV | Network Functions Virtualisation
NFVI | NFV Infrastructure
NFVI-PoP | NFVI Point-of-Presence
NFVO | NFV Orchestrator
OID | Object Identifier
OPNFV | Open Platform for NFV
OS | Operating System
REST | Representational State Transfer
SNMP | Simple Network Management Protocol
SoC | System-on-Chip
VDU | Virtual Deployment Unit
VIM | Virtualised Infrastructure Manager
VIM MM | VIM Monitoring Manager
VM | Virtual Machine
VNF | Virtual Network Function
VNFD | VNF Descriptor
VNFM | VNF Manager
VNFP | VNF Provider
vTC | Virtual Traffic Classifier
YANG | Yet Another Next Generation
9. ANNEX I: SURVEY OF RELEVANT IT/NETWORK MONITORING TOOLS
This section presents a brief overview of existing frameworks for monitoring virtualized IT infrastructures as well as SDN-enabled networks, and discusses technologies which could be partially re-used in T-NOVA.
9.1.1. IT/Cloud monitoring
9.1.1.1. Shinken
Shinken is an open source system and network monitoring application [Shinken]. It is fully compatible with Nagios plugins. It started as a proof of concept for a new Nagios architecture, but since the proposal was turned down by the Nagios authors, Shinken became an independent tool. It is not a fork of Nagios; it is a total rewrite in Python. It watches hosts and services, gathers performance data and alerts users when error conditions occur and again when the conditions clear. Shinken's architecture is focused on offering easier load balancing and high availability capabilities. The main differences and advantages compared to Nagios are:
• A more efficient distributed monitoring and high availability architecture
• Graphite integration in the Web UI
• Better performance, mostly due to the use of a distributed database (MongoDB)
9.1.1.2. Icinga
Icinga is an open-source network and system monitoring application which was born out of a Nagios fork [Icinga]. It maintains configuration and plug-in compatibility with the latter. Its new features are as follows:
• A modern Web 2.0 style user interface;
• An interface for mobile devices;
• Additional database connectors (for MySQL, Oracle, and PostgreSQL);
• RESTful API.
Currently there are two flavours of Icinga, maintained in two different development branches: Icinga 1 (the original Nagios fork) and Icinga 2 (where the core framework is being replaced by a full rewrite).
9.1.1.3. Zenoss
Zenoss is an open source monitoring platform released under the GPLv2 license [Zenoss]. It provides an easy-to-use Web UI to monitor performance, events, configuration, and inventory. Zenoss is one of the best options for unified monitoring as it is cloud-agnostic and open source. Zenoss provides powerful plug-ins named Zenpacks, which support monitoring of hypervisors (ESX, KVM, Xen and HyperV), private cloud platforms (CloudStack, OpenStack and vCloud/vSphere), and public cloud (AWS). In OpenStack, Zenoss integrates with Nova, Keystone and OpenStack Telemetry.
9.1.1.4. Ganglia
Ganglia is a scalable distributed system monitoring tool for high-performance computing systems such as clusters and grids [Ganglia]. Its structure is based on a hierarchical design using a tree of point-to-point connections among cluster nodes. Ganglia is based on an XML data representation, XDR for compact data transport and RRDtool for data storage and visualisation. The Ganglia system contains:
1. Two unique daemons, gmond and gmetad
2. A PHP-based web front-end
3. Other small programs
gmond runs on each node to monitor changes in the host state, to announce applicable changes, to listen to the state of all other Ganglia nodes via a unicast or multicast channel (depending on the installation), and to respond to requests. gmetad (Ganglia Meta Daemon) polls a collection of data sources at regular intervals, parses the XML and saves all metrics to round-robin databases. Aggregated XML can then be exported.
The Ganglia web front-end is written in PHP. It uses graphs generated by gmetad and presents the collected information, such as CPU utilisation for the past day, week, month, or year. Ganglia has been used to link clusters across university campuses and around the world and can scale to handle clusters with 2000 nodes. However, further work is required in order for it to become more cloud-agnostic.
9.1.1.5. StackTach
StackTach is a debugging and monitoring utility for OpenStack that can work with multiple Data Centres, including multi-cell deployments [Stacktach]. It was initially created as a browser-based debugging tool for OpenStack Nova. Since that time, StackTach has evolved into a tool that can perform debugging, monitoring and auditing. StackTach is quickly moving into Metrics, SLA and Monitoring territory with version 2 and the inclusion of Stacky, the command line interface to StackTach. StackTach contains a worker that reads notifications from OpenStack's RabbitMQ queues and stores them in a database. From there, StackTach reviews the stream of notifications to glean usage information and assemble it in an easy-to-query fashion. Users can inquire on instances, requests, servers, etc. using the browser interface or the Stacky command line tool. Rackspace is working on StackTach integration with Telemetry.
9.1.1.6. SeaLion
SeaLion is a cloud-based system monitoring tool for Linux servers. It installs an agent in the system, which can be run as an unprivileged user [SeaLion]. The agent collects data at regular intervals across servers and makes this data available in the user's workspace. SeaLion provides a high-level view (graphical overview) of Linux server activity. The monitoring data are transmitted over SSL to the SeaLion servers. The service provides graphs, charts and access to the raw gathered data.
9.1.1.7. MonALISA
MONitoringAgents using a Large Integrated ServicesArchitecture (MonaLISA) is basedonDynamicDistributedServiceArchitectureandisabletoprovidecompletemonitoring,controland global optimisation services for complex systems[Monalisa]. TheMonALISA system isdesigned as a collection of autonomous multi-threaded, self-describing agent-basedsubsystems which are registered as dynamic services, and are able to collaborate andcooperateinperformingawiderangeofinformationgatheringandprocessingtasks.
Theagentscananalyseandprocesstheinformationinadistributedway,inordertoprovideoptimisation decisions in large-scale distributed applications. The scalability of the systemderives from the use of amultithreaded execution engine, that hosts a variety of looselycoupledself-describingdynamicservicesoragents,andtheabilityofeachservicetoregister
T-NOVA|DeliverableD4.41 MonitoringandMaintenance-Interim
©T-NOVAConsortium51
itselfandthentobediscoveredandusedbyanyotherservices,orclientsthatrequiresuchinformation. The system is designed to easily integrate existing monitoring tools andproceduresandtoprovidethisinformationinadynamic,customised,self-describingwaytoanyotherservicesorclients.
By using MonALISA the administrator is able to monitor all aspects of complex systems, including:
• System information for computer nodes and clusters;
• Network information (traffic, flows, connectivity, topology) for WAN and LAN;
• Monitoring the performance of applications, jobs or services; and
• End-user systems and end-to-end performance measurements.
9.1.1.8. collectd, StatsD and Graphite
Cloud instances may also be monitored by using a collection of separate open source tools. collectd is a daemon which periodically collects system performance statistics and provides mechanisms to store the values in a variety of ways [Collectd]. collectd gathers statistics about the system it is running on and stores this information. These statistics can then be used to find current performance bottlenecks (i.e. performance analysis) and predict future system load (i.e. capacity planning). collectd is written in C for performance and portability, allowing it to run on systems without a scripting language or cron daemon, such as embedded systems. At the same time, it includes optimisations and features to handle large numbers of data sets. StatsD [Statsd] is a Node.js daemon that listens for messages on a UDP or TCP port. StatsD listens for statistics such as counters and timers, parses the messages, extracts the metrics data, and periodically flushes the data to other services in order to build graphs. A tool that can be used to build graphs afterwards is Graphite [Graphite], which is able to store numeric time-series data and render graphs of the data on demand.
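As an illustration of the message formats involved, the following minimal Python sketch builds StatsD counter and timer messages and a Graphite plaintext line, and sends a message over UDP. The metric names (vnf.requests, vnf.latency), the helper function names and the local host address are hypothetical and chosen only for this example; port 8125 is the usual StatsD default and may differ in a given deployment.

```python
import socket


def statsd_counter(name, value=1):
    # StatsD counter message: "<name>:<value>|c"
    return f"{name}:{value}|c"


def statsd_timer(name, millis):
    # StatsD timer message: "<name>:<milliseconds>|ms"
    return f"{name}:{millis}|ms"


def graphite_line(path, value, timestamp):
    # Graphite plaintext format: "<path> <value> <unix timestamp>" plus newline
    return f"{path} {value} {int(timestamp)}\n"


def send_udp(message, host="127.0.0.1", port=8125):
    # Fire-and-forget UDP datagram; host and port are assumptions for this sketch
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(message.encode("ascii"), (host, port))


if __name__ == "__main__":
    # Hypothetical metric names, for illustration only
    send_udp(statsd_counter("vnf.requests"))
    send_udp(statsd_timer("vnf.latency", 320))
```

Because UDP is connectionless, the sender does not block if no StatsD daemon is listening, which keeps the instrumentation overhead on the monitored instance low.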
9.1.1.9. vSphere
The vSphere statistics subsystem collects data on the resource usage of inventory objects [vSphere]. Data on a wide range of metrics is collected at frequent intervals, processed and archived in a database. Statistics regarding network utilisation are collected at Cluster, Host and Virtual Machine levels. In addition, vSphere supports performance monitoring of guest operating systems, gathering statistics regarding network utilisation among others.
9.1.1.10. Amazon CloudWatch
Amazon CloudWatch is a monitoring service for AWS cloud resources and the applications running on AWS [Cloudwatch]. It provides Amazon's EC2 customers with real-time monitoring of their resource utilisation, such as CPU, disk and network. However, CloudWatch does not provide any memory, disk space or load average metrics without running additional software on the instance. It was primarily designed for use with Amazon Elastic Load Balancing and Auto Scaling: the service checks CPU usage on multiple instances and automatically creates additional ones when the load increases.
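The scale-out behaviour described above can be illustrated with a simple threshold rule. The sketch below is not the AWS API or Amazon's actual algorithm; the function name, the 70% threshold and the instance limit are hypothetical values assumed for illustration.

```python
def desired_instance_count(cpu_percent_by_instance,
                           scale_out_threshold=70.0,
                           max_instances=10):
    """Return the instance count suggested by a simple average-CPU rule.

    cpu_percent_by_instance maps an instance identifier to its CPU usage (%).
    Threshold and limit are illustrative assumptions, not AWS defaults.
    """
    current = len(cpu_percent_by_instance)
    if current == 0:
        # Nothing is running yet: start with a single instance
        return 1
    average = sum(cpu_percent_by_instance.values()) / current
    if average > scale_out_threshold and current < max_instances:
        # Load is high across the group: add one instance
        return current + 1
    return current
```

For example, two instances at 90% and 80% CPU average 85%, so the rule suggests a third instance, while a single instance at 20% leaves the count unchanged.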
9.1.2. Network Monitoring
Network monitoring is a domain that has attracted significant attention from the research community over the past decades, with well-established technologies and standards with regard to measurement processes (active and passive) as well as the communication of monitoring metrics (SNMP, IPFIX, sFlow etc.).
In the context of T-NOVA, where network management, at least within each NFVI-PoP, is based on OpenFlow, the measurement process will leverage OpenFlow’s monitoring capabilities.
OpenFlow provides per-flow and per-port metrics, reported by the switch itself. These metrics are then collected by the Controller and communicated to SDN control applications via the northbound API of the Controller itself (Figure 16). Almost all SDN controllers offer the capability to expose monitoring metrics, either via API calls or language bindings. In this respect, the OpenFlow-based architecture provides the capability to monitor all network elements in a uniform and vendor-agnostic manner.
[Figure 16 layers, bottom to top: Network Devices; Southbound API (OpenFlow); Controllers (NOX, POX, OpenDaylight, Floodlight, Beacon, Ryu, Trema, Mul, Jaxon, Maestro, NodeFlow, Ovs-controller, NDDI-OESS); Northbound API; SDN Applications (Monitoring)]
Figure 16. Communication of monitoring metrics in an OpenFlow-enabled architecture
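The per-port metrics reported by OpenFlow switches are cumulative counters (for example received and transmitted bytes), so a monitoring application typically derives rates from two successive readings. The following Python sketch shows that calculation under simple assumptions: the PortSample type and the function name are hypothetical, the samples are assumed to have already been retrieved through a controller's northbound API, and counter wrap-around is not handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PortSample:
    """One reading of a switch port's cumulative byte counters."""
    timestamp: float  # seconds since an arbitrary reference
    rx_bytes: int
    tx_bytes: int


def port_throughput_bps(previous, current):
    """Return (rx, tx) throughput in bits per second between two samples."""
    elapsed = current.timestamp - previous.timestamp
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    # Counters are cumulative, so the rate is the delta over elapsed time
    rx_bps = (current.rx_bytes - previous.rx_bytes) * 8 / elapsed
    tx_bps = (current.tx_bytes - previous.tx_bytes) * 8 / elapsed
    return rx_bps, tx_bps
```

The same delta-over-interval approach applies to per-flow byte and packet counters; the polling interval then determines the trade-off between measurement accuracy and control-channel overhead.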
In this context, several monitoring applications have been developed, leveraging OpenFlow capabilities for integrated network management tasks. Some of these applications are summarised in the table below.
Table 12. OpenFlow monitoring applications
| Monitoring Application | Brief description | Controller Used | Open Source | Available at |
|---|---|---|---|---|
| OpenNetMon | OpenNetMon [OpenNetMon] continuously monitors all flows between predefined link destination pairs on throughput, packet loss and delay. | POX | Yes | https://github.com/TUDelftNAS/SDN-OpenNetMon/ |
| Payless | Payless [Payless] provides a flexible RESTful API for flow statistics collection at different aggregation levels. It uses an adaptive statistics collection algorithm that delivers highly accurate information in real time without incurring significant network overhead. | POX, NOX, OpenDaylight | Yes | http://github.com/srcvirus/floodlight |
| DCM | DCM [DCM] allows switches to collaboratively achieve flow-monitoring tasks and balance measurement load. | None (native OF) | No | Not available |
| FlowSense | FlowSense [Flowsense] achieves a push-based approach to performance monitoring in flow-based networks, where the network informs of performance changes, rather than being queried. | None (native OF) | No | Not available |
In addition, many of the monitoring frameworks mentioned in Section 9.1.1 for cloud infrastructures can also be used for monitoring OpenFlow infrastructures, via the appropriate plugins.