A Deep-Reinforcement Learning Approach for Software ...
Transcript of A Deep-Reinforcement Learning Approach for Software ...
ADeep-ReinforcementLearningApproachforSoftware-DefinedNetworkingRoutingOptimization
GiorgioStampa*,MartaArias,DavidSanchez-Charles,VictorMuntes-Mulero,AlbertCabellos
IETF100–NMGRGSingapore,November2017
1
Knowledge-DefinedNetworking• AKnowledgeplane,ontopofcontrol
andmanagementplanes,shouldallowfor:– automation&optimization– prediction
• MachineLearningcantakeadvantageof:– Thefullviewprovidedbythe
NetworkAnalyticsplatform– Thefullcontrolprovidedbythe
(logically)centralizedmanagementandcontrolplanes
• WhichMachineLearningtechnique?• Howweapplyit?
A.Mestres,A.Rodriguez-Natal,J.Carner,P.Barlet-Ros,E.Alarcón,M.Solé,V.Muntés,D.Meyer,S.Barkai,M.J.Hibbett,G.Estrada,K.Maruf,F.Coras,V.Ermagan,H.Latapie,C.Cassar,J.Evans,F.Maino,J.Walrand,andA.Cabellos,“Knowledge-DefinedNetworking,”SIGCOMMComput.Commun.2017https://arxiv.org/abs/1606.06222
2
Fig. 1. The Knowledge plane
shows an overview of the KDN architecture and its functionalplanes.
The Data Plane is responsible for storing, forwarding andprocessing data packets. In SDN networks, data plane devicesare typically white-boxes composed of line-rate programmableforwarding hardware. They operate unaware of the rest ofthe network and rely on the other planes to populate theirforwarding tables and update their configuration.
The Control Plane exchanges operational state in orderto update the data plane matching and processing rules. Onan SDN network, this role is assigned to the –logicallycentralized– SDN controller that programs SDN data-planeforwarding elements via a southbound interface, typicallyusing an imperative language. While the data-plane operatesat packet time-scales, the control-plane is slower and typicallyoperates at flow time-scales.
The Management Plane ensures the correct operation andperformance of the network in the long term. It defines thenetwork topology and handles the provision and configurationof network devices. In SDN networks this is usually handledalso by the SDN controller. The management plane is alsoresponsible of monitoring the network to provide criticalnetwork analytics. For this it collects telemetry informationfrom the data plane while keeping an historical record ofnetwork state and events. The management plane is orthogonalto the control and data planes and typically operates at a slowertime-scale.
The Knowledge Plane, as originally defined by Clark, isredefined in this paper under the terms of SDN. In thatsense, we adapt Clarks definition of the KP: the heart of the
knowledge plane is its ability to integrate behavioral models
and reasoning processes into an SDN network. The KP takesadvantage of the Control and Management planes to obtaina full view and control over the network. It is responsibleof learning the behavior of the network and, in some cases,automatically operate the network accordingly. Fundamentally,the KP processes the network analytics collected by the man-
Intent language
Forwarding elements
SDN controller
Analytics platform
Machine learning
Human decisionKnowledge
Automatic decision
Fig. 2. KDN operational loop
agement plane, transforms them into knowledge via machinelearning, and uses that knowledge to take decisions (eitherautomatically or through human intervention). Parsing theinformation and learning from it is typically a slow process,however using such knowledge automatically can be done ata time-scales close to the control and management planes.
III. KNOWLEDGE-DEFINED NETWORKING
The Knowledge-Defined Networking architecture operatesby means of a loop -in a similar way to control systems– toprovide automation, recommendation, optimization, validationand estimation. Fig. 2 shows the main steps of such loop, inwhat follows we describe them in detail.
a) Forwarding Elements ! Analytics Platform: The Ana-lytics Platform aims to gather as much information as possibleto offer a complete view of the network. To that end, itmonitors in real time the Data Plane elements while they for-ward packets in order to access fine-grain traffic information.Besides, it queries the state at the SDN controller to obtaincontrol and management state. The analytics platform relieson protocols such NETCONF2 (to obtain the configurationand operational data from network devices) and NetFlow3 (toextract traffic information and samples). The most relevant datacollected by the Analytics Platform is summarized below.• Packet-level data and flow-level data: this includes DPI
information, flow granularity data and relevant trafficfeatures.
• Network state: This includes the logical and physicalconfiguration of the network as well as the networktopology.
• Service-level telemetry: In some scenarios the analyticsplatform will also monitor and store service-level infor-mation (e.g, load of the services, QoE, etc), this is rele-vant to learn the service-related behavior and its relationwith network performance, load and configuration.
2RFC 62413RFC 3954
2
DeepReinforcementLearning
Reward
State
Action:Left?Right?Straight?
Mnih,Volodymyr,etal."Human-levelcontrolthroughdeepreinforcementlearning."Nature518.7540(2015):529-533. 3
DeepReinforcementLearningChangeNetwork
Configuration
NetworkInfrastructure
TargetPerformance
State:TrafficandPerformance
4
DRL:InternalArchitecture• DRL=ReinforcementLeraning+
DeepLearning• NovelActor/CriticArchitecture• TheActoractsuponthesystem• TheCriticreceivesthereward
functionandmodifiestheweightsoftheActor’sNeuralNetwork.
• Explorationvs.Exploitation• Exploration:Trainingonthe
system• Exploitation:Optimizationofthe
system
Mnih,Volodymyr,etal."Human-levelcontrolthroughdeepreinforcementlearning."Nature518.7540(2015):529-533.
5
DRLforSDNRouting
Software-Defined Networking Controller!
DRL Agent
Network Analytics!
ACTION: Change Routing Configuration!
Weights [w0…w1]
STATE: Current Traffic Load !
Traffic Matrix
Network !Management!
Plane!
REWARD FUNCTION: Minimize Delay
REWARD: Delay
6
DRLforSDNRouting
Software-Defined Networking Controller!
DRL Agent
Network Analytics!
ACTION: Change Routing Configuration!
Weights [w0…w1]
STATE: Current Traffic Load !
Traffic Matrix
Network !Management!
Plane!
REWARD FUNCTION: Minimize Delay
REWARD: Delay
ThisistheNetworkPolicy
7
Methodology
B. Quoitin et al., “IGen: Generation of router-level Internet topologies through network design heuristics,” in ITC, 2009.
Large - Realistic topology of 79 nodes
and over 200 links Optimization Algorithm Simulated Annealing
Small - Realistic topology of 13 nodes
and over 25 links DRL Agent
Exploration 100k steps (overall)
2000 iterations per traffic
1M different traffic matrices
Compare Performance
8
Results:LearningRate
G.Stampa,M.Arias,D.Sanchez-Charles,V.Muntes-Mulero,andA.Cabellos,“ADeep-ReinforcementLearningApproachforSoftware-DefinedNetworkingRoutingOptimization,”arXivpreprintarXiv:1709.07080,2017. 9
Results:Performance
G.Stampa,M.Arias,D.Sanchez-Charles,V.Muntes-Mulero,andA.Cabellos,“ADeep-ReinforcementLearningApproachforSoftware-DefinedNetworkingRoutingOptimization,”arXivpreprintarXiv:1709.07080,2017. 10
DRL:OperatinalAdvantages• FullyAutonomous– Doesnotrequirepriorknowledgeofthenetwork– Worksonlineandinreal-time– Learnsandoptimizesautonomously
• Advantagesovertraditionaloptimizationalgorithms– DRLprovidesconstanttimeoptimizationvs.thelengthysearchprocessoftraditionalalgorithms
– Model-free:Learnsfromtheenvironmentdynamics,noneedforsimulationoranalyticalmodel.
– Black-boxoptimization:WithDRLagents,differentrewardfunctionscantargetdifferentpolicies,withouttheneedofdesigninganewalgorithm.Traditionalalgorithmsaretayloredtotheperformancepolicy.
11
ChallengesofDRL:Training
DRL Agent DRL Agent DRL Agent
Simulator Expert
RealInfrastructure Simulator Expert(HumanorAlgorithm)
MaybreakyournetworkDuringtheexplorationphase!
12
LackofExplainability
• DeepNeuralNetworksareinherentlyblackboxes.Wedon´tknow:– Whenwillitwork,whenwillitfail– Whydoesitwork,whyitdoesn´t
• Noguarantees,notroubleshooting• Solution:ExplainableArtificialIntelligence– Aimstodeveloptechniquestodevelopexplainableneuralnetworks
13
Samek,Wojciech,ThomasWiegand,andKlaus-RobertMüller."ExplainableArtificialIntelligence:Understanding,VisualizingandInterpretingDeepLearningModels."arXivpreprintarXiv:1708.08296(2017).
Rewardfunction=NetworkManagementPolicy
• Therewardfunctioneffectivelyrepresents,inamathematicallanguage,thenetworkmanagementpolicy
• Openquestions– Canweactuallyrepresentanynetworkpolicy?
• Aretherefundamentallimitations?
– Howcanwecompileexistingnetworkpolicylanguagestothemathematicallanguage?
14
Summary&Conclusions
• DeepReinforcementLearningrepresentsthefullrealizationofanautonomousintelligentnetwork
• Manyadvantages– Real-timeoperation(constant-timeoptimization)– Plug&Play(black-boxoperation)– Noconfiguration,justpicktherewardfunction
• Challenges– Training:Online,offlineorviaanexpert– Noguarantees:Towardsexplainability
15
Datasets,CodeandPapers
• Knowledge-DefinedNetworkinghttps://github.com/knowledgedefinednetworking• DRLforSDNRouting(codeanddata-sets)https://github.com/knowledgedefinednetworking/a-deep-rl-approach-for-sdn-routing-optimization• Work-in-ProgressPaperhttps://arxiv.org/pdf/1709.07080.pdf
16