Neutron L2 and L3 agents
How They Work and How Kilo Improves Them
Carl Baldwin, Rossella Sblendido / May 18, 2014
Typical OpenStack Deployment
L2 Agent
L2 Agent
• Runs on compute node
• Configures the local vbridges (br-int, br-tun)
• Wires new devices
• Applies Security Group Rules
• Communicates with the Neutron server over RPC
When a VM is created...
Agent loop events
• OVSDB monitor has updates
• Neutron server messages
– Security groups change (rule updated, member added, provider rule updated)
– Port update
• OVS restarted
Detect port changes
• OVSDB monitor signals if something has changed on the host
• OVS agent scans all the ports in the machine
• It keeps track of the ports that it has already processed using an internal dict (registered_ports)
• Diff registered_ports with the result of the scanning → infer devices added and deleted
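The diff step above can be sketched in a few lines. This is a minimal illustration, not the agent's actual code: the function name scan_ports and the port names are made up, and a set stands in for the internal registered_ports structure.

```python
def scan_ports(ports_on_host, registered_ports):
    """Diff the ports found on the host against those already processed."""
    current = set(ports_on_host)
    return {
        "current": current,
        "added": current - registered_ports,    # new since the last iteration
        "removed": registered_ports - current,  # gone since the last iteration
    }


# Hypothetical data: ports handled earlier versus the latest scan result.
registered_ports = {"tap-a", "tap-b"}
port_info = scan_ports(["tap-b", "tap-c"], registered_ports)
registered_ports = port_info["current"]
```

Here port_info["added"] holds the one new port and port_info["removed"] the one that disappeared. The cost of this approach is that every iteration scans every port on the host.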
Process network ports – Port added
• request the device details
• provision local VLAN and install proper flows
• set up port filters
• update_device_up
Process network ports – Port deleted
• Remove filters
• update_device_down
• Reclaim the local VLAN if it's the last device on the network
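The local VLAN bookkeeping implied by the two slides above (provision on the first port of a network, give the tag back after the last one) might look like the sketch below. LocalVlanManager and its methods are hypothetical names for illustration; flow installation, filters, and the update_device_up/update_device_down calls are left out.

```python
class LocalVlanManager:
    """Hypothetical helper mapping each network to a host-local VLAN tag."""

    def __init__(self, first_tag=1, last_tag=4094):
        self.available = set(range(first_tag, last_tag + 1))
        self.by_network = {}  # network_id -> {"vlan": int, "ports": set}

    def port_added(self, network_id, port_id):
        mapping = self.by_network.get(network_id)
        if mapping is None:
            # First port of this network on the host: provision a tag.
            vlan = min(self.available)
            self.available.discard(vlan)
            mapping = {"vlan": vlan, "ports": set()}
            self.by_network[network_id] = mapping
        mapping["ports"].add(port_id)
        return mapping["vlan"]

    def port_deleted(self, network_id, port_id):
        mapping = self.by_network.get(network_id)
        if mapping is None:
            return
        mapping["ports"].discard(port_id)
        if not mapping["ports"]:
            # Last port of the network is gone: give the tag back.
            self.available.add(mapping["vlan"])
            del self.by_network[network_id]


vlans = LocalVlanManager()
tag_p1 = vlans.port_added("net-1", "p1")
tag_p2 = vlans.port_added("net-1", "p2")  # same network, same tag
tag_p3 = vlans.port_added("net-2", "p3")  # new network, new tag
vlans.port_deleted("net-1", "p1")         # net-1 still has p2
vlans.port_deleted("net-1", "p2")         # last port: tag is returned
```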
Processing Neutron server messages
• Updated port, same process as added ports
• Security group changes, filters are reapplied for all the affected devices
OVS restarted
• Detected using a canary flow
• Reconfigure bridges
• registered_ports is cleared, all ports are reprocessed
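A canary check can be sketched as follows, assuming the agent installs a marker flow in a dedicated table and looks for it in the flow dump on each iteration. The table number, the function names, and the sample dump strings are assumptions for illustration only.

```python
import re

CANARY_TABLE = 23  # assumption: table number reserved for the canary flow


def canary_flow_missing(dump_flows_output):
    """Return True when no flow in the dump belongs to the canary table."""
    pattern = rf"\btable={CANARY_TABLE}\b"
    return re.search(pattern, dump_flows_output) is None


def check_restart(dump_flows_output, registered_ports):
    """Clear the processed-port cache when the canary flow has disappeared."""
    if canary_flow_missing(dump_flows_output):
        registered_ports.clear()  # forces every port to be reprocessed
        return True
    return False


# Hypothetical flow dumps: one with the canary flow, one without it.
healthy_dump = "cookie=0x0, table=23, priority=0 actions=drop"
restarted_dump = "cookie=0x0, table=230, priority=1 actions=NORMAL"
ports = {"tap-a", "tap-b"}
```

A restarted OVS comes back with empty flow tables, so the missing marker is a cheap signal that the bridges need to be set up again.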
If an exception is thrown?
• registered_ports is cleared, all the ports are reprocessed
• Full resync!
© Copyright 2015 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice. The OpenStack TM attribution statement should used: The OpenStack wordmark and the Square O Design, together or part, are trademarks or registered trademarks of OpenStack Foundation in the United States and other countries, and are used with the OpenStack Foundation’s permission.
L3 Agent
Deployment
• Network Hosts
– Legacy with 1 Agent
– HA with more than 1 Agent
– DVR
  • Centralized part is like Legacy
– API Available to manage association
• Compute Hosts
– DVR
  • Distributed part bound to multiple hypervisors
L3 Agent
• Receives update notifications for routers
• Router Processing Queue
– Prioritize user actions so agent is responsive
– Less priority to full sync
• Sends status updates
Router                                  Status
51812f4e-e0a8-479a-a116-f588cb020b91    Processing…
5b80e13e-cd2d-40d6-aaea-856bcc4242f6    Processing…
d95effe5-11ca-4450-ba45-615e40d159c6    Processing…
e50750d2-42e3-4e34-888f-cef236a993f7    Processing…
be19c28c-6789-44ce-bb29-8dd4a9944deb    Waiting…
6f81708c-404e-4738-a21c-73eb2b8c2599    Waiting…
4206b114-2e97-4963-9a5d-140cfec95977    Waiting…
…                                       …
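A router processing queue that favors user actions over full-sync work can be sketched with a heap. The class name, the priority constants, and the router labels below are hypothetical; the real agent's queue is more involved.

```python
import heapq
import itertools

# Hypothetical priority levels: a lower value is served first.
PRIORITY_USER_ACTION = 0
PRIORITY_FULL_SYNC = 1


class RouterQueue:
    """Priority queue of router updates, first-in first-out within a priority."""

    def __init__(self):
        self._heap = []
        self._order = itertools.count()  # tie-breaker keeps arrival order

    def add(self, router_id, priority):
        heapq.heappush(self._heap, (priority, next(self._order), router_id))

    def pop(self):
        _priority, _order, router_id = heapq.heappop(self._heap)
        return router_id

    def __len__(self):
        return len(self._heap)


queue = RouterQueue()
queue.add("router-sync-1", PRIORITY_FULL_SYNC)
queue.add("router-sync-2", PRIORITY_FULL_SYNC)
queue.add("router-user", PRIORITY_USER_ACTION)  # arrives last, served first
processing_order = [queue.pop() for _ in range(len(queue))]
```

With this ordering, a user who has just changed a router does not wait behind a long periodic sync.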
Router Internals
• Network namespaces (ip netns)
• L2 Interfaces moved into namespace
– OVS port
– Veth pair (virtual cables)
• IP address configured on interfaces
• Simple routing and extra routes
• Iptables for NAT and metadata
• Proxy for metadata access
• External access for instances without floating IP
• Advanced Services
– FWaaS
– VPNaaS
Compute Host L3 Agent
• DVR only
– Floating IPs for north/south IPv4 routing
– East/west IPv4 routing
• FWaaS Integrated (partially)
[Figure: two panels, for VM1-1 and VM2-1, each showing eth0, br-int, patch-tun, and QRouter-X]
L2 Agent Restructuring
Restructuring work
• Get more info from OVSDB monitor
• Improve RPC calls
• Improve resync
Get events from the OVSDB monitor
• Improve OvsdbMonitor so that it can pass the agent the devices that were added or deleted
• The agent consumes the events instead of scanning all the ports every time
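Consuming monitor events instead of rescanning could look roughly like this. The (action, port) tuple format and the function name are assumptions made for the sketch and do not reflect the actual OvsdbMonitor output.

```python
def consume_events(events, registered_ports):
    """Fold a batch of (action, port) events into added and removed sets."""
    added, removed = set(), set()
    for action, port in events:
        if action == "added":
            added.add(port)
            removed.discard(port)
        elif action == "removed":
            removed.add(port)
            added.discard(port)
    # Keep the agent's view of processed ports in step with the events.
    registered_ports |= added
    registered_ports -= removed
    return added, removed


# Hypothetical batch: tap-d appears and disappears within the same batch.
registered = {"tap-a", "tap-b"}
events = [
    ("added", "tap-c"),
    ("removed", "tap-a"),
    ("added", "tap-d"),
    ("removed", "tap-d"),
]
added, removed = consume_events(events, registered)
```

The work per iteration now scales with the number of events rather than with the number of ports on the host.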
Improve RPC calls
• Use a bulk call to update the status (up/down) of several devices
• Add a parameter: failed devices
• When security_groups_provider_updated is received, refresh only the affected devices instead of all of them
• Add the attributes modified in port update so that the L2 agent can decide if reprocessing is needed
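The bulk status call with a failed-devices result might be shaped like the sketch below. The function name, the result keys, and the set_status callback are illustrative assumptions, not a statement of the actual RPC signature.

```python
def update_device_list(devices_up, devices_down, set_status):
    """Update many devices in one call and report the ones that failed."""
    result = {
        "devices_up": [],
        "failed_devices_up": [],
        "devices_down": [],
        "failed_devices_down": [],
    }
    for status, key, devices in (
        ("ACTIVE", "devices_up", devices_up),
        ("DOWN", "devices_down", devices_down),
    ):
        for device in devices:
            try:
                set_status(device, status)
                result[key].append(device)
            except Exception:
                # One bad device must not fail the whole batch.
                result["failed_" + key].append(device)
    return result


def fake_set_status(device, status):
    """Hypothetical backend that rejects a single device."""
    if device == "tap-bad":
        raise RuntimeError("port not found")


reply = update_device_list(["tap-a", "tap-bad"], ["tap-z"], fake_set_status)
```

One round trip replaces one call per device, and the failure lists tell the agent exactly which devices need another attempt.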
Improve resync
• Don't resync all the devices when an error occurs
• Add a parameter to the RPC calls that collects the devices that caused an error
• The OVS agent can resync only the devices that failed
• The operation can be retried or the failure ignored
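On the agent side, retrying only the failed devices can be sketched as a loop that carries the failures into the next iteration. The function names and the retry limit are hypothetical choices for this example.

```python
def process_with_retries(devices, process_device, max_iterations=3):
    """Process devices, retrying only those that failed last time."""
    pending = set(devices)
    for _ in range(max_iterations):
        failed = set()
        for device in sorted(pending):
            try:
                process_device(device)
            except Exception:
                failed.add(device)
        if not failed:
            return set()
        pending = failed  # the next iteration touches only the failures
    return pending  # still failing: the caller may ignore or escalate


attempts = {}


def flaky_process(device):
    """Hypothetical handler: one device fails once, another always fails."""
    attempts[device] = attempts.get(device, 0) + 1
    if device == "tap-flaky" and attempts[device] == 1:
        raise RuntimeError("transient error")
    if device == "tap-broken":
        raise RuntimeError("permanent error")


still_failing = process_with_retries(
    {"tap-ok", "tap-flaky", "tap-broken"}, flaky_process
)
```

The healthy device is processed once, which is the point of the change: a single bad port no longer triggers a full resync of every port.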
Did this improve the situation? Let's test!
• VM running Devstack
• Rally scenario
    "args": {
        "flavor": {
            "name": "m1.tiny"
        },
        "image": {
            "name": "cirros-0.3.4-x86_64-uec"
        }
    },
    "runner": {
        "concurrency": 2,
        "times": 20,
        "type": "constant"
    }
Results before
Results after
It worked!
• Min time 0.6% better
• Avg time 4% better
• 95th percentile 5.9% better
There's still work to do...
• Instead of using the command line for OVSDB monitor use the OVS Python library
• Create a queue of events to be processed so that multiple workers can be introduced
• Add priority to events so that higher priority events can be processed first
• Improve state convergence between agent and the server (resilience in case of failure)
L3 Agent Restructuring
Handyman Model
• One big file, one object: the agent
• Jack of all trades
– Worse: it was a bit forgetful
Contractor Model
• Time to move to a contractor model
– Agent is the contractor
– Calls in specialists to do the work
– One contractor for network node, other for hypervisor
Specialists
• New specialist for each type of router
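The contractor model can be pictured as an agent that only decides which specialist class to instantiate for each router. The class names, the router dict keys, and the host roles below are illustrative assumptions, not the actual module layout.

```python
class RouterInfo:
    """Base specialist: knows how to process one router."""

    kind = "legacy"

    def __init__(self, router_id):
        self.router_id = router_id

    def process(self):
        return f"{self.kind} router {self.router_id} processed"


class HaRouter(RouterInfo):
    kind = "ha"


class DvrEdgeRouter(RouterInfo):
    kind = "dvr-edge"   # centralized part, on the network node


class DvrLocalRouter(RouterInfo):
    kind = "dvr-local"  # distributed part, on the hypervisor


class L3Agent:
    """The contractor: picks a specialist instead of doing the work itself."""

    def __init__(self, host_role):
        self.host_role = host_role  # "network" or "compute" (hypothetical)
        self.routers = {}

    def add_router(self, router):
        if router.get("distributed"):
            cls = DvrEdgeRouter if self.host_role == "network" else DvrLocalRouter
        elif router.get("ha"):
            cls = HaRouter
        else:
            cls = RouterInfo
        self.routers[router["id"]] = cls(router["id"])
        return self.routers[router["id"]]


network_agent = L3Agent("network")
compute_agent = L3Agent("compute")
legacy = network_agent.add_router({"id": "r1"})
ha = network_agent.add_router({"id": "r2", "ha": True})
edge = network_agent.add_router({"id": "r3", "distributed": True})
local = compute_agent.add_router({"id": "r3", "distributed": True})
```

Per-router state lives in the specialist object, so the agent itself has far less to remember.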
More Specialized
Future Work
• Eliminate full sync on router
• Too much internal state
• Simplify DVR
• L3 VPN
• Eliminate IPv4 waste
• DVR for IPv6
Thank you!