A Deep-Reinforcement Learning Approach for Software ...

ADeep-ReinforcementLearningApproachforSoftware-DefinedNetworkingRoutingOptimization

GiorgioStampa*,MartaArias,DavidSanchez-Charles,VictorMuntes-Mulero,AlbertCabellos

[email protected]

IETF100–NMGRGSingapore,November2017

1

Knowledge-DefinedNetworking•  AKnowledgeplane,ontopofcontrol

andmanagementplanes,shouldallowfor:–  automation&optimization–  prediction

•  MachineLearningcantakeadvantageof:–  Thefullviewprovidedbythe

NetworkAnalyticsplatform–  Thefullcontrolprovidedbythe

(logically)centralizedmanagementandcontrolplanes

•  WhichMachineLearningtechnique?•  Howweapplyit?

A.Mestres,A.Rodriguez-Natal,J.Carner,P.Barlet-Ros,E.Alarcón,M.Solé,V.Muntés,D.Meyer,S.Barkai,M.J.Hibbett,G.Estrada,K.Maruf,F.Coras,V.Ermagan,H.Latapie,C.Cassar,J.Evans,F.Maino,J.Walrand,andA.Cabellos,“Knowledge-DefinedNetworking,”SIGCOMMComput.Commun.2017https://arxiv.org/abs/1606.06222

2

Fig. 1. The Knowledge plane

shows an overview of the KDN architecture and its functionalplanes.

The Data Plane is responsible for storing, forwarding andprocessing data packets. In SDN networks, data plane devicesare typically white-boxes composed of line-rate programmableforwarding hardware. They operate unaware of the rest ofthe network and rely on the other planes to populate theirforwarding tables and update their configuration.

The Control Plane exchanges operational state in orderto update the data plane matching and processing rules. Onan SDN network, this role is assigned to the –logicallycentralized– SDN controller that programs SDN data-planeforwarding elements via a southbound interface, typicallyusing an imperative language. While the data-plane operatesat packet time-scales, the control-plane is slower and typicallyoperates at flow time-scales.

The Management Plane ensures the correct operation andperformance of the network in the long term. It defines thenetwork topology and handles the provision and configurationof network devices. In SDN networks this is usually handledalso by the SDN controller. The management plane is alsoresponsible of monitoring the network to provide criticalnetwork analytics. For this it collects telemetry informationfrom the data plane while keeping an historical record ofnetwork state and events. The management plane is orthogonalto the control and data planes and typically operates at a slowertime-scale.

The Knowledge Plane, as originally defined by Clark, isredefined in this paper under the terms of SDN. In thatsense, we adapt Clarks definition of the KP: the heart of the

knowledge plane is its ability to integrate behavioral models

and reasoning processes into an SDN network. The KP takesadvantage of the Control and Management planes to obtaina full view and control over the network. It is responsibleof learning the behavior of the network and, in some cases,automatically operate the network accordingly. Fundamentally,the KP processes the network analytics collected by the man-

Intent language

Forwarding elements

SDN controller

Analytics platform

Machine learning

Human decisionKnowledge

Automatic decision

Fig. 2. KDN operational loop

agement plane, transforms them into knowledge via machinelearning, and uses that knowledge to take decisions (eitherautomatically or through human intervention). Parsing theinformation and learning from it is typically a slow process,however using such knowledge automatically can be done ata time-scales close to the control and management planes.

III. KNOWLEDGE-DEFINED NETWORKING

The Knowledge-Defined Networking architecture operatesby means of a loop -in a similar way to control systems– toprovide automation, recommendation, optimization, validationand estimation. Fig. 2 shows the main steps of such loop, inwhat follows we describe them in detail.

a) Forwarding Elements ! Analytics Platform: The Ana-lytics Platform aims to gather as much information as possibleto offer a complete view of the network. To that end, itmonitors in real time the Data Plane elements while they for-ward packets in order to access fine-grain traffic information.Besides, it queries the state at the SDN controller to obtaincontrol and management state. The analytics platform relieson protocols such NETCONF2 (to obtain the configurationand operational data from network devices) and NetFlow3 (toextract traffic information and samples). The most relevant datacollected by the Analytics Platform is summarized below.• Packet-level data and flow-level data: this includes DPI

information, flow granularity data and relevant trafficfeatures.

• Network state: This includes the logical and physicalconfiguration of the network as well as the networktopology.

• Service-level telemetry: In some scenarios the analyticsplatform will also monitor and store service-level infor-mation (e.g, load of the services, QoE, etc), this is rele-vant to learn the service-related behavior and its relationwith network performance, load and configuration.

2RFC 62413RFC 3954

2

DeepReinforcementLearning

Reward

State

Action:Left?Right?Straight?

Mnih,Volodymyr,etal."Human-levelcontrolthroughdeepreinforcementlearning."Nature518.7540(2015):529-533. 3

DeepReinforcementLearningChangeNetwork

Configuration

NetworkInfrastructure

TargetPerformance

State:TrafficandPerformance

4

DRL:InternalArchitecture•  DRL=ReinforcementLeraning+

DeepLearning•  NovelActor/CriticArchitecture•  TheActoractsuponthesystem•  TheCriticreceivesthereward

functionandmodifiestheweightsoftheActor’sNeuralNetwork.

•  Explorationvs.Exploitation•  Exploration:Trainingonthe

system•  Exploitation:Optimizationofthe

system

Mnih,Volodymyr,etal."Human-levelcontrolthroughdeepreinforcementlearning."Nature518.7540(2015):529-533.

5

DRLforSDNRouting

Software-Defined Networking Controller!

DRL Agent

Network Analytics!

ACTION: Change Routing Configuration!

Weights [w0…w1]

STATE: Current Traffic Load !

Traffic Matrix

Network !Management!

Plane!

REWARD FUNCTION: Minimize Delay

REWARD: Delay

6

DRLforSDNRouting

Software-Defined Networking Controller!

DRL Agent

Network Analytics!

ACTION: Change Routing Configuration!

Weights [w0…w1]

STATE: Current Traffic Load !

Traffic Matrix

Network !Management!

Plane!

REWARD FUNCTION: Minimize Delay

REWARD: Delay

ThisistheNetworkPolicy

7

Methodology

B. Quoitin et al., “IGen: Generation of router-level Internet topologies through network design heuristics,” in ITC, 2009.

Large - Realistic topology of 79 nodes

and over 200 links Optimization Algorithm Simulated Annealing

Small - Realistic topology of 13 nodes

and over 25 links DRL Agent

Exploration 100k steps (overall)

2000 iterations per traffic

1M different traffic matrices

Compare Performance

8

Results:LearningRate

G.Stampa,M.Arias,D.Sanchez-Charles,V.Muntes-Mulero,andA.Cabellos,“ADeep-ReinforcementLearningApproachforSoftware-DefinedNetworkingRoutingOptimization,”arXivpreprintarXiv:1709.07080,2017. 9

Results:Performance

G.Stampa,M.Arias,D.Sanchez-Charles,V.Muntes-Mulero,andA.Cabellos,“ADeep-ReinforcementLearningApproachforSoftware-DefinedNetworkingRoutingOptimization,”arXivpreprintarXiv:1709.07080,2017. 10

DRL:OperatinalAdvantages•  FullyAutonomous–  Doesnotrequirepriorknowledgeofthenetwork– Worksonlineandinreal-time–  Learnsandoptimizesautonomously

•  Advantagesovertraditionaloptimizationalgorithms–  DRLprovidesconstanttimeoptimizationvs.thelengthysearchprocessoftraditionalalgorithms

– Model-free:Learnsfromtheenvironmentdynamics,noneedforsimulationoranalyticalmodel.

–  Black-boxoptimization:WithDRLagents,differentrewardfunctionscantargetdifferentpolicies,withouttheneedofdesigninganewalgorithm.Traditionalalgorithmsaretayloredtotheperformancepolicy.

11

ChallengesofDRL:Training

DRL Agent DRL Agent DRL Agent

Simulator Expert

RealInfrastructure Simulator Expert(HumanorAlgorithm)

MaybreakyournetworkDuringtheexplorationphase!

12

LackofExplainability

•  DeepNeuralNetworksareinherentlyblackboxes.Wedon´tknow:– Whenwillitwork,whenwillitfail– Whydoesitwork,whyitdoesn´t

•  Noguarantees,notroubleshooting•  Solution:ExplainableArtificialIntelligence– Aimstodeveloptechniquestodevelopexplainableneuralnetworks

13

Samek,Wojciech,ThomasWiegand,andKlaus-RobertMüller."ExplainableArtificialIntelligence:Understanding,VisualizingandInterpretingDeepLearningModels."arXivpreprintarXiv:1708.08296(2017).

Rewardfunction=NetworkManagementPolicy

•  Therewardfunctioneffectivelyrepresents,inamathematicallanguage,thenetworkmanagementpolicy

•  Openquestions– Canweactuallyrepresentanynetworkpolicy?

•  Aretherefundamentallimitations?

– Howcanwecompileexistingnetworkpolicylanguagestothemathematicallanguage?

14

Summary&Conclusions

•  DeepReinforcementLearningrepresentsthefullrealizationofanautonomousintelligentnetwork

•  Manyadvantages–  Real-timeoperation(constant-timeoptimization)–  Plug&Play(black-boxoperation)– Noconfiguration,justpicktherewardfunction

•  Challenges–  Training:Online,offlineorviaanexpert– Noguarantees:Towardsexplainability

15

Datasets,CodeandPapers

•  Knowledge-DefinedNetworkinghttps://github.com/knowledgedefinednetworking•  DRLforSDNRouting(codeanddata-sets)https://github.com/knowledgedefinednetworking/a-deep-rl-approach-for-sdn-routing-optimization•  Work-in-ProgressPaperhttps://arxiv.org/pdf/1709.07080.pdf

16

A Deep-Reinforcement Learning Approach for Software ...

Documents

Transcript of A Deep-Reinforcement Learning Approach for Software ...