Designing scalable Docker networks

Transcript of Designing scalable Docker networks

Page 1: Designing scalable Docker networks

Designing scalable Docker Networks

[March 15 2016] [ Murat Mukhtarov ]

Zendesk

Page 2: Designing scalable Docker networks

Contents

● Linux network namespaces
  ○ Introduction
  ○ Binding interface to namespace

● Docker networking
  ○ Namespaces
  ○ Inbound and outbound traffic flows
  ○ Clustered environments
  ○ Challenges

● VXLAN
  ○ Introduction
  ○ VXLAN signalling
  ○ VXLAN and Docker

● BGP
  ○ Routing VXLAN with BGP
  ○ Scaling VXLAN-based Docker networks with BGP
  ○ PoC

● What wasn't covered in this presentation

Page 3: Designing scalable Docker networks

Linux network namespaces

Network namespaces are part of the containerization technology used by the Linux kernel. Network namespaces allow:

  ○ Creating isolated network instances (namespaces) for Linux containers
  ○ Each with its own routing table, virtual interfaces, and L2 isolation

● The tool used to operate on network namespaces is iproute2

● Named network namespaces are stored in /var/run/netns

● There are two types of network namespaces:
  ○ Root namespace [ ip link ]
  ○ Non-root namespace [ ip netns .. ip link ]
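
A minimal sketch of working with namespaces via iproute2 (NAMESPACE1 is an arbitrary name):

# Create a named network namespace (stored under /var/run/netns)
ip netns add NAMESPACE1
# List the namespaces iproute2 knows about
ip netns list
# Run a command inside the namespace
ip netns exec NAMESPACE1 ip link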

Page 4: Designing scalable Docker networks

Bind interface to network namespace

When a network namespace is created it has only one interface, the loopback.

We can create a pair of peered ip links in the root namespace, then change the namespace of eth0-NAMESPACE1 from Root to NAMESPACE1:
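
The commands themselves are not in the transcript; a sketch using the interface names above:

# Create a peered veth pair in the root namespace
ip link add veth-NAMESPACE1 type veth peer name eth0-NAMESPACE1
# Move one end of the pair into NAMESPACE1
ip link set eth0-NAMESPACE1 netns NAMESPACE1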

Page 5: Designing scalable Docker networks

Bringing namespaced interface UP

We can rename the interface inside the namespace and try to bring it UP. After bringing UP the veth part of the pipe in the root namespace, the interface inside NAMESPACE1 also becomes UP.

Finally, assign an ip address to the eth0 interface inside NAMESPACE1:
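
A sketch of these steps (the 10.0.0.1/24 address is arbitrary):

# Rename the interface inside the namespace and bring it UP
ip netns exec NAMESPACE1 ip link set eth0-NAMESPACE1 name eth0
ip netns exec NAMESPACE1 ip link set eth0 up
# Bring UP the root-namespace end of the pair
ip link set veth-NAMESPACE1 up
# Assign an ip address inside NAMESPACE1
ip netns exec NAMESPACE1 ip addr add 10.0.0.1/24 dev eth0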

Page 6: Designing scalable Docker networks

Docker and network namespaces

Docker supports different containerisation backends:

● libcontainer - Docker's own native Go implementation of the kernel containerisation capabilities; the default since 0.9

● LXC - the default before 0.9

Because Docker uses libcontainer, the network namespace of a container will not be seen in the ip netns output.

However, it is possible to expose it if you know the Docker container's process PID:

PID=$(docker inspect -f '{{.State.Pid}}' $container_id)

ln -s /proc/$PID/ns/net /var/run/netns/$PID

Instead of $PID you can use any name for the symlink, the container ID for example.
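
Once the symlink exists, the container's namespace can be inspected like any other:

# The container's namespace is now visible to iproute2
ip netns list
ip netns exec $PID ip addr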

Page 7: Designing scalable Docker networks

Docker networking: introduction

Docker does the following for you:

- Creates an ip link pair: vethXXXXXX <-> eth0 inside the container's namespace

- Adds the vethXXXXXX interface (the tunnel end in the Root namespace) to the docker0 bridge (by default)

- Sets up an ip address from the docker0 network range

- Creates an iptables rule that sets up NAT (PAT) translation for you, masquerading the containers' network behind the default eth0 interface
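
All of this can be inspected on a Docker host; a sketch (the 172.17.0.0/16 range is the usual docker0 default and may differ on your host):

# veth interfaces attached to the docker0 bridge
ip link show master docker0
# NAT rules installed by Docker, typically including something like
# -A POSTROUTING -s 172.17.0.0/16 ! -o docker0 -j MASQUERADE
iptables -t nat -S POSTROUTING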

Page 8: Designing scalable Docker networks

Docker networking: exposing ports

Docker can expose internal ports and even interfaces:

- Network type: host. No network namespace isolation; the root namespace is used

- Supply port numbers to be exposed: iptables rules are created to allow the given port number(s) and to create a port mapping (port translation) rule.
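
A sketch of both options, using nginx as an arbitrary image:

# Host networking: the container shares the root namespace
docker run -d --net=host nginx
# Publish container port 80 on host port 8080; Docker adds the DNAT rule
docker run -d -p 8080:80 nginx
# Inspect the resulting port-mapping rules
iptables -t nat -S DOCKER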

Page 9: Designing scalable Docker networks

Docker networking: Clustered environments

Docker now offers multi-host networking: clustering using Docker Swarm and a KV store to signal the network. The overlay transport requires Linux kernel version > 3.17.
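
The daemon arguments for the KV store are not shown in the transcript; a sketch assuming a Consul store at a hypothetical address:

# Point the daemon at a KV store and advertise a cluster interface
docker daemon --cluster-store=consul://192.168.33.5:8500 --cluster-advertise=eth0:2376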

Page 10: Designing scalable Docker networks

Current challenges

The KV store approach is a great way to interconnect Docker-running nodes in Docker-only environments. But it still has scalability limitations for WAN, multi-datacenter, and not-only-Docker scenarios.

- Modern service-oriented applications consist of multiple processes. Sometimes a platform can be described as 30-40 applications, which would be great to containerise

- Old networking child issues could return: broadcast domain problems, segmentation, etc.

- Docker offers VXLAN support which allows you to scale to a certain extent. However, how do you distribute knowledge about the VXLAN database to non-Docker networks?

Page 11: Designing scalable Docker networks

VXLAN introduction

VXLAN is an overlay networking technology that allows sending Ethernet traffic encapsulated into UDP datagrams over IP networks. A detailed description of VXLAN networking can be found in RFC 7348.

The 24-bit VNI field is the VXLAN address field; it can be compared to the 802.1q tag of Ethernet frames or an MPLS label.

Bear in mind the MTU value when using VXLAN: the encapsulation adds 50 bytes of overhead (outer Ethernet 14 + IP 20 + UDP 8 + VXLAN header 8), so with a 1500-byte underlay MTU the inner interfaces should use 1450.

Page 12: Designing scalable Docker networks

VXLAN signalling

A VXLAN network should be properly signalled, otherwise the participating hosts will not know about each other's existence. In terms of signalling, the following information should be advertised:

- VXLAN Tunnel End-Point (VTEP) - identifies the endpoint, an entity that originates and terminates VXLAN tunnels

- VXLAN Network Identifier (VNI) - identifies the network, similar to an 802.1q tag or MPLS label

- IP and MAC addresses

Ways of signalling VXLAN:

- Unicast way - a dedicated controller

- Multicast way - using PIM, with VNI:VTEP pairs propagated as multicast routes

- Docker has an implementation with a KV store

- OpenContrail can use XMPP

- BGP

Page 13: Designing scalable Docker networks

VXLAN signalling with BGP: EVPN

Using the BGP protocol to carry VXLAN and MAC/IP information is described in the following documents:

- http://tools.ietf.org/html/rfc7432

- https://tools.ietf.org/html/draft-ietf-bess-evpn-overlay-02

- https://tools.ietf.org/html/rfc4684

The BGP protocol is designed to be highly extensible, which is why NLRI can be used to carry information other than IPv4/IPv6 routes.

For EVPN the following address families were allocated:

● AFI 25 - which corresponds to L2VPN network signalling over BGP (the Kompella approach)

● SAFI 70 - the subaddress family for EVPN (VXLAN)

Basically, VXLAN information is carried as BGP routes.

Page 14: Designing scalable Docker networks

VXLAN and Docker

To create multi-tenant Docker networks with advanced isolation we can use VXLAN in the following way (a sketch of these steps follows the list):

- Create a dedicated interface of type vxlan

- Create a bridge interface where we stitch together the vxlan interface and the Root namespace leg of the container interface

- Create a forwarding table entry:

bridge fdb add to 00:17:42:8a:b4:05 dst 192.19.0.2 dev vxlan0

- It will be signalled using multicast address 239.1.1.1 on port 4789 (multicast must be supported)
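
A sketch of the manual steps above (VNI 42 and the bridge name are arbitrary; vethXXXXXX stands for the container's root-namespace leg):

# VXLAN interface with VNI 42, multicast group 239.1.1.1, UDP port 4789
ip link add vxlan0 type vxlan id 42 group 239.1.1.1 dev eth0 dstport 4789
# Bridge stitching the vxlan interface to the container's veth leg
ip link add br-vxlan type bridge
ip link set vxlan0 master br-vxlan
ip link set vethXXXXXX master br-vxlan
ip link set br-vxlan up
ip link set vxlan0 up
# Static forwarding entry for a remote MAC behind VTEP 192.19.0.2
bridge fdb add to 00:17:42:8a:b4:05 dst 192.19.0.2 dev vxlan0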

OR

- Configure the KV store parameters as daemon arguments and create an overlay network:

- docker network create --driver overlay my-multi-host-network

Page 15: Designing scalable Docker networks

Docker and VXLAN traffic flow

Page 16: Designing scalable Docker networks

Docker with EVPN and BGP

To achieve a highly scalable network for Docker we can use:

- VXLAN as a forwarding plane to carry network traffic and isolate different container groups and hosts

- BGP to signal VXLAN, in order to manage large multi-datacenter networks

- A CNI plugin to bring EVPN tunnels up automatically (Kubernetes)

A BGP implementation for VXLAN/EVPN written in Python: bagpipe-bgp, based on ExaBGP
https://github.com/Orange-OpenSource/bagpipe-bgp

A Go BGP implementation used as the Route Reflector: GoBGP https://github.com/osrg/gobgp
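
The route reflector configuration is not part of the transcript; a sketch in GoBGP's TOML format, reusing the addresses from the demo on slide 19 (the private AS number 64512 is an assumption, and one neighbor stanza is needed per bagpipe-bgp node):

# gobgpd.conf: reflect EVPN routes between the bagpipe-bgp speakers
[global.config]
  as = 64512
  router-id = "192.168.33.30"

[[neighbors]]
  [neighbors.config]
    neighbor-address = "192.168.33.10"
    peer-as = 64512
  [[neighbors.afi-safis]]
    [neighbors.afi-safis.config]
      afi-safi-name = "l2vpn-evpn"
  [neighbors.route-reflector.config]
    route-reflector-client = true
    route-reflector-cluster-id = "192.168.33.30"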

Page 17: Designing scalable Docker networks

Stitching together Docker, BGP and VXLAN

Page 18: Designing scalable Docker networks

Proof of concept: Docker + VXLAN + BGP

Page 19: Designing scalable Docker networks

DEMO

Description:

- 4 virtual machines: 3 running bagpipe-bgp and 1 running the goBGP route reflector

- dockerbgp1, dockerbgp2 and dockerbgp3 establish BGP sessions to the goBGP RR: 192.168.33.30

- dockerbgp1: 192.168.33.10, running a web server

- dockerbgp2: 192.168.33.20, running curl

- dockerbgp3: 192.168.33.30, just busybox for a ping test

EVPN network: 192.168.10.0/24

IP network for hosts: 192.168.33.0/24
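
A plausible way to verify the demo state (addresses are taken from the description above; the container address in the EVPN network is hypothetical):

# On the route reflector: list BGP sessions and the EVPN RIB
gobgp neighbor
gobgp global rib -a evpn
# From dockerbgp2: fetch the page served by the container on dockerbgp1
curl http://192.168.10.11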

Page 20: Designing scalable Docker networks

What we did not cover

- Another BGP project for Docker and Kubernetes IP networking: https://www.projectcalico.org/why-bgp/

- CNI, the Container Network Interface: a proposed standard for configuring network interfaces for Linux application containers. https://github.com/appc/cni

- IP VPN networks using Bagpipe BGP and Open vSwitch