L2 and L3 agent restructure

Post on 28-Jul-2015

Neutron L2 and L3 agents

How They Work and How Kilo Improves Them

Carl Baldwin, Rossella Sblendido / May 18, 2014

Typical OpenStack Deployment

L2 Agent

• Runs on compute node

• Configures the local vbridges (br-int, br-tun)

• Wires new devices

• Applies Security Group Rules

• Communicates with the Neutron server over RPC

When a VM is created...

Agent loop events

• OVSDB monitor has updates

• Neutron server messages

– Security groups change (rule updated, member added, provider rule updated)

– Port update

• OVS restarted

Detect port changes

• OVSDB monitor signals if something has changed on the host

• OVS agent scans all the ports on the machine

• It keeps track of the ports it has already processed in an internal dict (registered_ports)

• Diff registered_ports with the result of the scanning → infer devices added and deleted
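
The diff step above can be sketched as a plain set difference. This is only an illustration, not the agent's actual code: the port names are invented, and a set is used for brevity where the slides mention a dict. Only the name registered_ports comes from the slides.

```python
def diff_ports(registered_ports, current_ports):
    """Compare the previous scan with the current one.

    Returns (added, removed): devices that appeared on the host and
    devices that disappeared since the last iteration.
    """
    added = current_ports - registered_ports
    removed = registered_ports - current_ports
    return added, removed


# Hypothetical scan results; real names would come from the bridge.
registered_ports = {"tap-a", "tap-b"}
current_ports = {"tap-b", "tap-c"}

added, removed = diff_ports(registered_ports, current_ports)
registered_ports = current_ports  # remember what has now been seen
```

Every loop iteration pays for a full scan, which is the cost the restructuring work later tries to avoid.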

Process network ports – Port added

• request the device details

• provision local VLAN and install proper flows

• set up port filters

• update_device_up

Process network ports – Port deleted

• Remove filters

• update_device_down

• reclaim the local VLAN if it was the last device
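
The local VLAN bookkeeping implied by the added and deleted cases might look like the sketch below. The class, its method names, and the tag range are assumptions made for illustration; flow installation and filtering are left out.

```python
class LocalVlanManager:
    """Hypothetical helper: one host-local VLAN tag per network."""

    def __init__(self, first_tag=1, last_tag=4094):
        self.available = set(range(first_tag, last_tag + 1))
        self.vlan_by_network = {}     # network id -> local VLAN tag
        self.devices_by_network = {}  # network id -> set of device ids

    def port_added(self, network_id, device_id):
        """Provision a tag for the network on first use, then track the device."""
        if network_id not in self.vlan_by_network:
            tag = min(self.available)
            self.available.remove(tag)
            self.vlan_by_network[network_id] = tag
            self.devices_by_network[network_id] = set()
        self.devices_by_network[network_id].add(device_id)
        return self.vlan_by_network[network_id]

    def port_deleted(self, network_id, device_id):
        """Forget the device; give the tag back if it was the last one."""
        devices = self.devices_by_network.get(network_id, set())
        devices.discard(device_id)
        if network_id in self.vlan_by_network and not devices:
            self.available.add(self.vlan_by_network.pop(network_id))
            self.devices_by_network.pop(network_id, None)
```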

Processing Neutron server messages

• Updated port, same process as added ports

• Security group changes: filters are reapplied for all the affected devices

OVS restarted

• Detected using a canary flow

• Reconfigure bridges

• registered_ports is cleared, all ports are reprocessed
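
The canary idea can be illustrated without a real switch: install a marker flow, and if it is ever missing, assume OVS restarted and lost its flows. The FakeBridge class and the table number below are assumptions for the example only.

```python
CANARY_TABLE = 23  # assumed table number, chosen only for this example


class FakeBridge:
    """Stand-in for an OVS bridge; a restart wipes all flows."""

    def __init__(self):
        self.flows = set()

    def add_flow(self, table):
        self.flows.add(table)

    def has_flow(self, table):
        return table in self.flows

    def restart(self):
        self.flows.clear()


def check_ovs_restarted(bridge, registered_ports):
    """Return True, and reset local state, if the canary flow is gone."""
    if bridge.has_flow(CANARY_TABLE):
        return False
    bridge.add_flow(CANARY_TABLE)  # reinstall the canary with the bridges
    registered_ports.clear()       # forces every port to be reprocessed
    return True
```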

If an exception is thrown?

• registered_ports is cleared, all the ports are reprocessed

• Full resync!

© Copyright 2015 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. The OpenStack TM attribution statement should used: The OpenStack wordmark and the Square O Design, together or part, are trademarks or registered trademarks of OpenStack Foundation in the United States and other countries, and are used with the OpenStack Foundation’s permission.

L3 Agent

Deployment

• Network Hosts

– Legacy with 1 Agent

– HA with more than 1 Agent

– DVR

• Centralized part is like Legacy

– API Available to manage association

• Compute Hosts

– DVR

• Distributed part bound to multiple hypervisors

L3 Agent

• Receives update notifications for routers

• Router Processing Queue

– Prioritize user actions so the agent stays responsive

– Lower priority for full sync

• Sends status updates

Router                                  Status
51812f4e-e0a8-479a-a116-f588cb020b91    Processing…
5b80e13e-cd2d-40d6-aaea-856bcc4242f6    Processing…
d95effe5-11ca-4450-ba45-615e40d159c6    Processing…
e50750d2-42e3-4e34-888f-cef236a993f7    Processing…
be19c28c-6789-44ce-bb29-8dd4a9944deb    Waiting…
6f81708c-404e-4738-a21c-73eb2b8c2599    Waiting…
4206b114-2e97-4963-9a5d-140cfec95977    Waiting…
…                                       …
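
A router processing queue that serves user actions before full-sync work could be sketched with the standard library's PriorityQueue. The class name, priority constants, and router names are assumptions for illustration, not the agent's real interface.

```python
import itertools
import queue

PRIORITY_USER_ACTION = 0  # lower number is served first
PRIORITY_FULL_SYNC = 1


class RouterQueue:
    """Hypothetical queue of router updates ordered by priority."""

    def __init__(self):
        self._queue = queue.PriorityQueue()
        self._counter = itertools.count()  # tie-breaker keeps FIFO order

    def add(self, router_id, priority):
        self._queue.put((priority, next(self._counter), router_id))

    def next_router(self):
        _priority, _order, router_id = self._queue.get_nowait()
        return router_id


q = RouterQueue()
q.add("router-sync-1", PRIORITY_FULL_SYNC)
q.add("router-user-1", PRIORITY_USER_ACTION)
q.add("router-sync-2", PRIORITY_FULL_SYNC)
order = [q.next_router() for _ in range(3)]
```

The user-triggered update jumps ahead of the two full-sync entries, which keep their original relative order.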

Router Internals

• Network namespaces (ip netns)

• L2 Interfaces moved into namespace

– OVS port

– Veth pair (virtual cables)

• IP address configured on interfaces

• Simple routing and extra routes

• Iptables for NAT and metadata

• Proxy for metadata access

• External access for instances without floating IP

• Advanced Services

– FWaaS

– VPNaaS
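
The first three bullets map onto a handful of iproute2 commands. The sketch below only builds the command lists and prints them; it does not run anything, since that would need root. The device name, address, and shortened router id are invented for the example.

```python
def router_namespace_commands(router_id, device, cidr):
    """Build (but do not run) the commands that set up a router namespace."""
    namespace = "qrouter-" + router_id
    return [
        # Create the namespace that isolates the router.
        ["ip", "netns", "add", namespace],
        # Move the L2 interface into the namespace.
        ["ip", "link", "set", device, "netns", namespace],
        # Configure the IP address on the interface, inside the namespace.
        ["ip", "netns", "exec", namespace,
         "ip", "addr", "add", cidr, "dev", device],
        # Bring the interface up.
        ["ip", "netns", "exec", namespace,
         "ip", "link", "set", device, "up"],
    ]


commands = router_namespace_commands("51812f4e", "qr-example", "10.0.0.1/24")
for command in commands:
    print(" ".join(command))
```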

Compute Host L3 Agent

• DVR only

– Floating IPs for north/south IPv4 routing

– East/west IPv4 routing

• FWaaS Integrated (partially)

[Figure: two compute hosts, labeled VM1-1 and VM2-1; each shows QRouter-X, br-int, patch-tun, and eth0]

L2 Agent Restructuring

Restructuring work

• Get more info from OVSDB monitor

• Improve RPC calls

• Improve resync

Get events from the OVSDB monitor

• Improve OvsdbMonitor so that it can pass to the agent the devices that were added or deleted

• The agent consumes the events instead of scanning all the ports every time
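
Consuming events rather than rescanning could look like the following sketch. The (action, port) tuple format and the function name are assumptions for the example; the real monitor output is not shown in the slides.

```python
def consume_events(events, registered_ports):
    """Fold a batch of monitor events into (added, removed) sets.

    `events` is an iterable of (action, port_name) tuples, where action
    is "added" or "deleted".
    """
    added, removed = set(), set()
    for action, port in events:
        if action == "added":
            added.add(port)
        elif action == "deleted":
            # A port created and deleted within one batch needs no work.
            added.discard(port)
            if port in registered_ports:
                removed.add(port)
    return added, removed


registered_ports = {"tap-a"}
events = [
    ("added", "tap-b"),
    ("deleted", "tap-a"),
    ("added", "tap-c"),
    ("deleted", "tap-c"),
]
added, removed = consume_events(events, registered_ports)
```

The work done is proportional to the number of events, not to the number of ports on the host.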

Improve RPC calls

• Use a bulk call to update the status (up/down) of several devices

• Add a parameter: failed devices

• When security_groups_provider_updated is received, refresh only the affected devices instead of all of them

• Add the attributes modified in port update so that the L2 agent can decide if reprocessing is needed
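
Two of these ideas are easy to show in miniature: batching status changes into one payload, and using the list of modified attributes to decide whether a port needs reprocessing. The payload keys and the attribute set below are assumptions for illustration, not the actual RPC schema.

```python
# Attributes whose change is assumed to require rewiring the port.
ATTRS_NEEDING_REPROCESSING = {"admin_state_up", "fixed_ips", "security_groups"}


def needs_reprocessing(updated_attrs):
    """Return True if any modified attribute affects local wiring."""
    return bool(ATTRS_NEEDING_REPROCESSING & set(updated_attrs))


def build_status_update(devices_up, devices_down):
    """Bundle many status changes into one payload for a single call."""
    return {
        "devices_up": sorted(devices_up),
        "devices_down": sorted(devices_down),
    }


payload = build_status_update({"tap-b", "tap-a"}, {"tap-c"})
```

With a bulk payload, reporting N devices costs one round trip instead of N.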

Improve resync

• Don't resync all the devices if an error occurs

• Add a parameter in the RPC calls that collects the devices that caused an error

• The OVS agent can resync only the devices that failed. The operation can be retried or the failure ignored
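
A partial resync can be sketched as: collect the devices that fail, then retry only those. The function names and retry count are assumptions for the example.

```python
def apply_devices(devices, apply_one):
    """Try every device; return the set that failed instead of raising."""
    failed = set()
    for device in devices:
        try:
            apply_one(device)
        except Exception:
            failed.add(device)
    return failed


def sync_with_retry(devices, apply_one, max_retries=3):
    """Retry only the failed devices, up to max_retries passes."""
    pending = set(devices)
    for _ in range(max_retries):
        pending = apply_devices(pending, apply_one)
        if not pending:
            break
    return pending  # still failing; the caller may log or ignore these


attempts = {}


def flaky(device):
    """Example operation: tap-b fails on its first attempt only."""
    attempts[device] = attempts.get(device, 0) + 1
    if device == "tap-b" and attempts[device] < 2:
        raise RuntimeError("transient failure")


left = sync_with_retry({"tap-a", "tap-b"}, flaky)
```

Here tap-a is processed once and tap-b twice; no other device is touched by the retry.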

Did this improve the situation? Let's test!

• VM running Devstack

• Rally scenario

"args": {
    "flavor": {
        "name": "m1.tiny"
    },
    "image": {
        "name": "cirros-0.3.4-x86_64-uec"
    }
},
"runner": {
    "concurrency": 2,
    "times": 20,
    "type": "constant"
}

Results before

Results after

It worked!

• Min time 0.6% better

• Avg time 4% better

• 95th percentile 5.9% better

There's still work to do...

• Instead of using the command line for OVSDB monitor use the OVS Python library

• Create a queue of events to be processed so that multiple workers can be introduced

• Add priority to events so that higher priority events can be processed first

• Improve state convergence between agent and the server (resilience in case of failure)

L3 Agent Restructuring

Handyman Model

• One big file, one object: the agent

• Jack of all trades

– Worse: it was a bit forgetful

Contractor Model

• Time to move to a contractor model

– Agent is the contractor

– Calls in specialists to do the work

– One contractor for the network node, another for the hypervisor

Specialists

• New specialist for each type of router
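
The contractor and specialist split can be pictured as an agent that delegates to one class per router type. The class names and the placeholder step strings are illustrative assumptions and are not taken from the Neutron source.

```python
class RouterInfo:
    """Base specialist: behaviour shared by every router type."""

    def __init__(self, router_id):
        self.router_id = router_id

    def process(self):
        return ["plug interfaces", "configure routes"]


class HaRouter(RouterInfo):
    """Specialist for HA routers; adds its own step."""

    def process(self):
        return super().process() + ["configure failover"]


class DvrRouter(RouterInfo):
    """Specialist for distributed routers; adds its own step."""

    def process(self):
        return super().process() + ["configure distributed routing"]


SPECIALISTS = {"legacy": RouterInfo, "ha": HaRouter, "dvr": DvrRouter}


class Agent:
    """The contractor: picks a specialist and delegates the work."""

    def process_router(self, router_id, router_type):
        specialist = SPECIALISTS[router_type](router_id)
        return specialist.process()


steps = Agent().process_router("router-1", "ha")
```

The agent itself holds no router-type logic; supporting a new type means adding a class and a dictionary entry.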

More Specialized

Future Work

• Eliminate full sync on router

• Too much internal state

• Simplify DVR

• L3 VPN

• Eliminate IPv4 waste

• DVR for IPv6

Thank you!