Reflections on data plane performance, iptables and ipsets
Neil Jerram – Metaswitch & Project Calico
@neiljerram www.projectcalico.org
Who am I?
• Free software hacker since 1990s
• Metaswitch (previously Data Connection) since 1995
; line+.el
;
; version 1.1
;
; This has not (yet) been accepted by the Emacs Lisp archive,
; but if it is the archive entry will probably be something like this:
;; line+|Neil Jerram|[email protected]|
;; Line Numbering & Interrupt Driven Actions|
;; 1993-02-18|1.1|<archive pathname of line+.el>|
; Mished and mashed by Neil Jerram <[email protected]>,
; Monday 21 December 1992.
Free software work
• Emacs
• Guile
• Openmoko and GTA04 smartphones
Metaswitch and Project Calico
• 30+ year provider of high quality networking software, but mostly proprietary
• Software -> hardware -> and now back again!
• Now also leading projects as open source
• Project Clearwater
• Project Calico
So, Calico?
• Connectivity and security for workloads (aka endpoints, aka micro-services, aka containers or VMs) in an elastic computing environment
• e.g. a data center
• Emphasis on simplicity and scalability
• Based on standard Linux features
• routing, iptables (see the illustrative routes below)
• and Internet protocols (BGP)
• Mainline case L3 only
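• For flavour, a Calico host’s routes look roughly like this (illustrative addresses and interface names, not real output): one /32 route per local workload via its TAP/veth interface, plus routes to other hosts’ workloads learned over BGP (Calico uses BIRD, hence proto bird)
10.28.0.40 dev tap7f470881-51 scope link
10.28.1.0/24 via 172.18.203.21 dev eth0 proto bird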
Old, zone-based security
Services in an elastic environment
Distributed firewall security
Calico architecture
Data plane performance questions
• Can we get same bandwidth between endpoints as between those endpoints’ hosts?
• What is CPU cost, and how does it compare with other networking approaches?
• What are the effects of our iptables and ipset programming?
Testing methodology
• Two hosts, directly connected by a 10Gb/s link
• 8 cores
• 64GB RAM
• 3.13 kernel
• No tuning
• qperf, using TCP
• Measure CPU usage, raw throughput and packet latency
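• Roughly the qperf invocations behind these numbers (hostnames are illustrative; -m sets the message size):
# on host A: run the qperf server
qperf
# on host B: measure TCP bandwidth and latency at the two message sizes
qperf hostA -m 20000 tcp_bw tcp_lat
qperf hostA -m 500 tcp_bw tcp_lat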
Configurations
• Bare metal, i.e. host to host
• Between OpenStack VMs
• ‘TAP’ interface between VM and host
• Between containers
• veth pair between container namespace and host namespace (sketched below)
• Between OpenStack VMs using Open vSwitch (OVS) and VXLAN
• MTU 1500, send sizes 20000 and 500
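• The container plumbing can be hand-rolled as a sketch like this (namespace and interface names are made up; Calico’s own tooling does the equivalent):
ip netns add demo                                    # stand-in for a container's network namespace
ip link add veth-host type veth peer name veth-demo  # create the veth pair
ip link set veth-demo netns demo                     # one end goes into the namespace
ip link set veth-host up                             # the other end stays in the host namespace
ip netns exec demo ip link set veth-demo up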
Data plane throughput
• Saturation for 20k messages … (red bars)
• … but not for 500 messages (blue bars)
• Why?
• OpenStack better than bare metal?
• OVS case reaches >8Gb/s if MTU is increased to 9000
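• Raising the MTU is per-interface configuration, done consistently on the physical NICs and the guest interfaces (interface name below is illustrative; the guest MTU needs headroom for the VXLAN encapsulation):
ip link set dev eth0 mtu 9000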
CPU usage
• CPU-limited for small messages
• OpenStack cases can use more cores
• Extra CPU cost for virtualization
• Namespace
• TAP or veth interface
• Routing in guest as well as host
CPU usage per throughput
• CPU required to drive each Gb/s of throughput
Latency
• Tiny extra latency for containers
• More for VMs
• But acceptable
• Note microseconds
• Not milli!
Security rules
iptables and ipsets
• iptables on a given host should be the composition of many logical security rules
• Will this impact data plane performance?
• Actually, no
-A felix-FORWARD -i tap+ -j felix-FROM-ENDPOINT
-A felix-FORWARD -o tap+ -j felix-TO-ENDPOINT
-A felix-FORWARD -i tap+ -j ACCEPT
-A felix-FORWARD -o tap+ -j ACCEPT
-A felix-FROM-ENDPOINT -i tap7f470881-51 -g felix-from-7f470881-51
-A felix-FROM-ENDPOINT -j DROP
-A felix-INPUT -i tap+ -j felix-FROM-ENDPOINT
-A felix-INPUT -i tap+ -j ACCEPT
-A felix-TO-ENDPOINT -o tap7f470881-51 -g felix-to-7f470881-51
-A felix-TO-ENDPOINT -j DROP
-A felix-from-7f470881-51 -m conntrack --ctstate INVALID -j DROP
-A felix-from-7f470881-51 -m conntrack --ctstate RELATED,ESTABLISHED -j RETURN
-A felix-from-7f470881-51 -p udp -m udp --sport 68 --dport 67 -j RETURN
-A felix-from-7f470881-51 -s 10.28.0.40/32 -m mac --mac-source FA:16:3E:4E:7A:0E -g felix-p-_6b340324948a39b-o
-A felix-from-7f470881-51 -m comment --comment "Anti-spoof DROP (endpoint 7f470881-5156-47ce-a67d-b971ef5e5cde):" -j DROP
-A felix-p-_6b340324948a39b-i -p icmp -m set --match-set felix-v4-_6b340324948a39b src -j RETURN
-A felix-p-_6b340324948a39b-i -s 172.18.203.20/32 -p tcp -m multiport --dports 22 -j RETURN
-A felix-p-_6b340324948a39b-i -s 172.18.203.20/32 -p udp -m multiport --dports 5060 -j RETURN
-A felix-p-_6b340324948a39b-i -s 172.18.203.20/32 -p tcp -m multiport --dports 80 -j RETURN
-A felix-p-_6b340324948a39b-i -m comment --comment "Default DROP rule (72d696a9-f715-495f-9152-7f5e6a69fd0f):" -j DROP
What saves us?
• conntrack
• ipsets scale well, thanks to their hash table implementation (see the sketch below)
• Nested design for source/destination interface mapping
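• A minimal sketch of the ipset pattern (set and chain names here are invented, not the actual Felix-programmed ones): one hash-based set holds many addresses, and a single iptables rule matches against the whole set, so the rule count stays flat as the set grows
ipset create demo-allowed-srcs hash:ip
ipset add demo-allowed-srcs 172.18.203.20
ipset add demo-allowed-srcs 172.18.203.21
iptables -N demo-chain
iptables -A demo-chain -p tcp -m set --match-set demo-allowed-srcs src -m multiport --dports 80 -j RETURN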
Arjan Schaaf’s measurements
What is happening here?
• http://www.slideshare.net/ArjanSchaaf/docker-network-performance-in-the-public-cloud
• Various approaches to networking between containers on AWS hosts
• For this case Calico uses IP-in-IP between the hosts
• Calico bandwidth less than half of native
• We set up the same system and got the same results as Arjan
• For t2.micro, bandwidth = 65.3 MB/sec compared with native = 125 MB/sec
• For m4.xlarge, bandwidth = 108 MB/sec compared with native = 267 MB/sec
• Why?
It’s all about the MTU
• Calico in a public cloud uses IP-in-IP, with tunnel MTU = 1440
• 1440 was optimised for GCE, which has an MTU of 1460 on its VM interfaces
• But AWS instances have an MTU of 9001!
• So the native tests were using jumbo frames, while the Calico test was using 1440
• If Calico’s tunnel MTU is increased to 8980:
• For t2.micro, Calico bandwidth = 114 MB/sec
• For m4.xlarge, Calico bandwidth = 266 MB/sec
• Problem solved – Calico throughput is now close to native
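• The fix amounts to this kind of adjustment (Calico’s IP-in-IP device is typically tunl0; in practice the MTU is set through Calico’s configuration rather than by hand):
ip link show dev eth0             # AWS instance interfaces report mtu 9001
ip link set dev tunl0 mtu 8980    # raise the tunnel MTU to just under the instance MTU minus the 20-byte IP-in-IP overhead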
So what have we learned?
• With Calico connectivity, VMs or containers can saturate a 10Gb/s link between hosts, just as the hosts themselves can
• There is a CPU cost to virtualization
• But mostly inevitable if you want virtualization at all (non-accelerated)
• Calico does not add any significant extra cost
• Conntrack largely saves us from the effects of complex iptables
• ipsets and clever programming design also help
• Be humble about performance comparisons
Further information, and thanks!
• Project Calico
• http://www.projectcalico.org/
• http://docs.projectcalico.org/en/latest/
• https://github.com/projectcalico
• Blog on Calico data plane performance
• http://www.projectcalico.org/calico-dataplane-performance/
• Thanks!
• @neiljerram
• @projectcalico
• www.metaswitch.com