1.10.2012 1
SEATTLE - A Scalable Ethernet Architecture for Large Enterprises
T-110.6120 – Special Course in Future Internet Technologies
M.Sc. Pekka Hippeläinen
IBM
phippela@gmail
SEATTLE
Based on, and pictures borrowed from: Kim, C.; Caesar, M.; Rexford, J. Floodless in SEATTLE: A Scalable Ethernet Architecture for Large Enterprises
Is it possible to build a protocol that maintains the same configuration-free properties as Ethernet bridging, yet scales to large networks?
Contents
Motivation: network management challenge
Ethernet features: ARP and DHCP broadcasts
1) Ethernet bridging
2) Scaling with hybrid networks
3) Scaling with VLANs
Distributed hashing
SEATTLE approach
Results
Conclusions
Network management challenge
IP networks require massive effort to configure and manage
As much as 70% of an enterprise network's cost goes to maintenance and configuration
Ethernet is much simpler to manage
However, Ethernet does not scale well beyond small LANs
The SEATTLE architecture aims to provide the scalability of IP with the management simplicity of Ethernet
Why is Ethernet so wonderful?
Easy to set up, easy to manage
A DHCP server, some hubs, and it is plug'n'play
Flooding query 1: DHCP requests
Let's say node A joins the Ethernet
To obtain or confirm an IP address, node A sends a DHCP request as a broadcast
The request floods through the broadcast domain
Flooding query 2: ARP
For node A to communicate with node B in the same broadcast domain, the sender needs the MAC address of node B
Let's assume that node B's IP address is known
Node A sends an Address Resolution Protocol (ARP) broadcast to find out the MAC address of node B
As with the DHCP broadcast, the request is flooded through the whole broadcast domain
ARP is essentially an {IP -> MAC} mapping
Why is flooding bad?
Large Ethernet deployments contain a vast number of hosts and thousands of bridges
Ethernet was not designed for such a scale
Virtualization and mobile deployments cause many dynamic events, generating control traffic
Broadcast messages must be processed by every end host, interrupting its CPU
The bridges' forwarding tables grow roughly linearly with the number of hosts
1) Ethernet bridging
An Ethernet consists of segments, each comprising a single physical layer
Ethernet bridges interconnect segments into a multi-hop network, i.e. a LAN
This forms a single broadcast domain
A bridge learns how to reach a host by inspecting incoming frames and associating the source MAC address with the incoming port
The bridge stores this information in a forwarding table and uses the table to forward frames in the correct direction
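The learn-and-flood behaviour can be sketched as follows. This is a toy model for illustration, not code from the paper; the class and port names are assumptions:

```python
class LearningBridge:
    """Toy model of Ethernet bridge learning (illustrative only).

    The bridge associates each frame's source MAC with the port it
    arrived on, and floods only when the destination is still unknown.
    """

    def __init__(self, ports):
        self.ports = ports       # e.g. ["p1", "p2", "p3"]
        self.table = {}          # forwarding table: MAC -> port

    def receive(self, src_mac, dst_mac, in_port):
        self.table[src_mac] = in_port            # learn the source
        out = self.table.get(dst_mac)
        if out is None:                          # unknown destination:
            return [p for p in self.ports if p != in_port]  # flood
        return [out]                             # known: forward directly
```

The first frame toward an unknown destination is flooded out of every other port; once a reply has been seen, subsequent frames follow a single learned port.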
Bridge spanning tree
One bridge is configured to be the root bridge
The other bridges collectively compute a spanning tree based on their distance to the root
Traffic is therefore routed not along the shortest path but along the spanning tree
This approach avoids broadcast storms
2) Hybrid IP/Ethernet
In this approach multiple LANs are interconnected with IP routing
In hybrid networks each LAN contains at most a few hundred hosts that form an IP subnet
Each IP subnet is associated with an IP prefix
Assigning IP prefixes to subnets and associating subnets with router interfaces is a manual process
Unlike a MAC address, which is a host identifier, an IP address denotes the host's current location in the network
Drawbacks of the hybrid approach
The biggest drawback is configuration overhead
Router interfaces must be configured
Each host must have an IP address corresponding to the subnet where it is located (DHCP can be used)
Networking policies are usually defined per network prefix, i.e. by topology
When the network changes, the policies must be updated
Limited mobility support
Mobile users and virtualized hosts at datacenters
If the IP address is to stay constant, the host must stay in the same subnet
3) Virtual LANs
Overcome some problems of Ethernet and IP networks
Administrators can logically group hosts into the same broadcast domain
VLANs can be configured to overlap by configuring the bridges, not the hosts
Broadcast overhead is reduced by the isolated domains
Mobility is simplified: the IP address can be retained while moving between bridges
Virtual LANs
Traffic from B1 to B2 can be 'trunked' over multiple bridges
Inter-domain traffic needs to be routed
Drawbacks of VLANs
Trunk configuration overhead
Extending a VLAN across multiple bridges requires the VLAN to be configured at each participating bridge, often manually
Limited control-plane scalability
Forwarding-table entries and broadcast traffic for every active host in every visible VLAN
Insufficient data-plane efficiency
A single spanning tree is still used within each VLAN
Inter-VLAN traffic must be routed via IP gateways
Distributed hash tables
Hash tables store {key -> value} pairs
With multiple nodes there is an elegant way to keep the nodes symmetric:
Distribute the hash-table entries evenly among the nodes
Keep reshuffling of entries small when nodes are added or removed
The idea is to compute H(key), which is mapped to a node; one can visualize this as mapping to an angle (a point on a circle)
Distributed hash tables
Each node is mapped to several randomly distributed points on the circle
Thus each node owns multiple buckets
To store an entry, one computes H(key) and stores the entry at the node owning that bucket
If a node is removed, its values are reassigned to the next buckets
If a node is added, some entries move to its new buckets
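A minimal consistent-hashing sketch of this scheme, assuming a 32-bit hash circle and SHA-256 as the hash function (both are illustrative choices, not details from the paper):

```python
import hashlib
from bisect import bisect_left, insort

def h(key: str) -> int:
    """Map a key to a point on the hash circle (here a 32-bit space)."""
    return int(hashlib.sha256(key.encode()).hexdigest(), 16) % (1 << 32)

class HashRing:
    """Consistent hash ring; each node owns several pseudo-random points."""

    def __init__(self, points_per_node: int = 4):
        self.points_per_node = points_per_node
        self.ring = []                       # sorted list of (point, node)

    def add_node(self, node: str):
        for i in range(self.points_per_node):
            insort(self.ring, (h(f"{node}#{i}"), node))

    def remove_node(self, node: str):
        self.ring = [(p, n) for p, n in self.ring if n != node]

    def resolver(self, key: str) -> str:
        """The node owning the first point clockwise from H(key)."""
        idx = bisect_left(self.ring, (h(key), "")) % len(self.ring)
        return self.ring[idx][1]
```

Removing a node only moves the keys that node owned; every other key keeps its resolver, which is the "small reshuffling" property the slide describes.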
SEATTLE approach 1/2
1) Switches compute shortest paths among themselves
This is a link-state protocol, essentially Dijkstra
Discovery runs at the switch level; Ethernet hosts do not respond
The switch topology is much more stable than the host topology
It is also much more scalable than running at the host level
Each switch has an ID: one MAC address of the switch's interfaces
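The per-switch shortest-path computation is standard Dijkstra over the link-state graph; a minimal sketch (the adjacency-list format is an assumption for illustration):

```python
import heapq

def dijkstra(adj, source):
    """Shortest-path distances from `source` over the link-state graph.

    `adj` maps a switch ID to a list of (neighbor, link_cost) pairs,
    as learned from link-state advertisements flooded among switches.
    """
    dist = {source: 0}
    pq = [(0, source)]                      # (distance so far, switch)
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):   # stale queue entry, skip
            continue
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return dist
```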
SEATTLE approach 2/2
2) A DHT runs on the switches
{IP -> MAC} mapping: this is essentially an ARP request without flooding
{MAC -> location} mapping: once the host's switch is located, routing along the shortest path can be used
DHCP service location can also be stored
SEATTLE thus reduces flooding, allows use of shortest paths, and offers a clean way to locate the DHCP service
SEATTLE
Control overhead is reduced with consistent hashing
When the set of switches changes due to network failure or recovery, only some entries must be moved
Load is balanced with virtual switches
If some switches are more powerful, a switch can present itself as many virtual switches, attracting more load
Flexible service discovery
This is mainly for DHCP, but could be something like {"PRINTER" -> location}
Topology changes
Adding and removing switches/links can alter the topology
Switch/link failures and recoveries can also lead to partitioning events (rarer)
Non-partitioning link failures are easy to handle: the resolver for a hash entry does not change
Switch failures
If a switch fails or recovers, hash entries need to be moved
The switch that published a value monitors the liveness of its resolver, republishing the entry when needed
Entries also carry a TTL
Partitioning events
Each switch must also keep track of its locally stored location entries
If a switch s_old is removed or becomes unreachable, all switches must remove those location entries
This approach correctly handles partitioning events
Scaling: location
Hosts use the directory service to publish and maintain {MAC -> location} mappings
When host a with mac_a arrives, it attaches to switch S_a (steps 1-3)
Switch S_a publishes {mac_a -> location} by computing the correct bucket F(mac_a), i.e. the resolver switch
When node b wants to send a message to node a, F(mac_a) is computed to fetch the location
'Reactive resolution': even cache misses do not lead to flooding
Scaling: ARP
When node b makes an ARP request, SEATTLE converts it into a lookup of the {F(IP_a) -> mac_a} entry
The resolver switch for F(IP_a) is usually different from the one for F(mac_a)
Optimization for hosts making ARP requests:
The address resolver at F(IP_a) can also store mac_a and S_a
When node b makes the F(IP_a) ARP lookup, the mac_a -> S_a mapping is also cached at S_b
The shortest path (path 10 in the figure) can then be used
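The publish/resolve flow on these two slides can be sketched with a dictionary standing in for the one-hop DHT. Function and key names here are illustrative, not the paper's API:

```python
class ToyDht:
    """Stand-in for SEATTLE's one-hop DHT: in the real system, put/get
    would be unicast to the resolver switch F(key)."""

    def __init__(self):
        self.store = {}

    def put(self, key, value):
        self.store[key] = value

    def get(self, key):
        return self.store.get(key)

def publish_host(dht, ip, mac, access_switch):
    # The access switch publishes both mappings when a host attaches.
    # Storing (mac, access_switch) under the IP is the ARP optimization:
    # one lookup yields both the MAC and the location S_a.
    dht.put(("ip", ip), (mac, access_switch))
    dht.put(("mac", mac), access_switch)

def resolve_arp(dht, ip):
    # The sender's switch replaces the ARP broadcast with one DHT lookup.
    return dht.get(("ip", ip))
```

A single `resolve_arp` lookup then returns both pieces of state the sender needs, so no broadcast and no second location lookup are required.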
Handling host dynamics
Location change: wireless handoff; VM moved while retaining its MAC
Host MAC address change: NIC replaced; failover event; VM migration forcing a MAC change
Host IP change: DHCP lease expires; manual reconfiguration
Insert, delete and update
Location change:
Host h moves from s_old to s_new
s_new updates the existing MAC-to-location entry
MAC change:
IP-to-MAC update; MAC-to-location deletion (old) and insertion (new)
IP change:
S_h deletes the old IP-to-MAC entry and inserts the new one
Ethernet: bootstrapping hosts
Hosts are discovered by their access switches
SEATTLE switches snoop ARP requests
Most OSes generate an ARP request at boot or when an interface comes up
DHCP messages or other host traffic can also be used
Host configuration without broadcast:
The DHCP server's switch hashes the string "DHCP_SERVER" and stores the server's location at the resulting switch
The "DHCP_SERVER" string is then used to locate the service
No need to broadcast for ARP or DHCP
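The same hashing trick locates any named service. A minimal sketch, where the `F` helper and the switch names are assumptions for illustration:

```python
import hashlib

def F(key, switches):
    """Toy consistent-hash mapping from a key to its resolver switch."""
    point = int(hashlib.sha256(key.encode()).hexdigest(), 16)
    return switches[point % len(switches)]

switches = ["s1", "s2", "s3"]
directory = {s: {} for s in switches}    # each switch's share of the DHT

# The DHCP server's access switch publishes under the well-known string:
directory[F("DHCP_SERVER", switches)]["DHCP_SERVER"] = "S_dhcp"

# Any client's switch computes the same F and unicasts its query there:
location = directory[F("DHCP_SERVER", switches)]["DHCP_SERVER"]
```

Because every switch computes the same F("DHCP_SERVER"), discovery is a unicast to one resolver instead of a domain-wide broadcast.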
Scalable and flexible VLANs
To support broadcasts, the authors suggest using groups
Similar to a VLAN, a group is defined as a set of hosts that share the same broadcast domain
Groups are not limited to layer-2 reachability
Multicast-based group-wide broadcasting:
A multicast tree with a broadcast root for each group
F(group_id) is used to locate the broadcast root
Simulations
1) Campus: ~40,000 students; 517 routers and switches
2) AP-Large (access provider): 315 routers
3) Datacenter (DC): 4 core routers with 21 aggregation switches
Routers were converted to SEATTLE switches
Cache timeout in AP-Large with 50k hosts
The shortest-path cache timeout has an impact on the number of location lookups
Even with a 60 s timeout, 99.98% of packets were forwarded without a lookup
Control overhead (blue) decreases very fast, whereas the table size increases only moderately
The shortest path is used for the majority of routing in these simulations
Table size increase in DC
Ethernet bridges store an entry for each destination: ~O(sh) state across the network
SEATTLE requires only ~O(h) state, since only the access and resolver switches need to store location information for each host
With this topology the table size was reduced by a factor of 22
In the AP-Large case the factor increased to 64
Control overhead in AP-Large
Measured as the number of control messages over all links in the topology, divided by the number of switches and the duration of the trace
SEATTLE significantly reduces control overhead in the simulations
This is mainly because Ethernet generates network-wide floods for a significant fraction of packets
Effect of switch failure in DC
Switches were allowed to fail randomly
The average recovery time was 30 seconds
SEATTLE can use all the links in the topology, whereas Ethernet is restricted to the spanning tree
Ethernet must recompute the tree, causing outages
Effect of host mobility in Campus
Hosts were randomly moved between access switches
For high mobility rates, SEATTLE's loss rate was lower than Ethernet's
With Ethernet it takes some time for switches to evict stale location information and re-learn the new location
SEATTLE provided low loss and low broadcast overhead
What was omitted
The authors suggest multi-level one-hop DHTs
In large dynamic networks it can be beneficial to store entries nearby
This is achieved with regions and a backbone: border switches connect to the backbone switches
Topology changes:
An approach to seamless mobility is described in the paper
Updating remote host caches requires switch-based MAC revocation lists
Some simulation results
The authors also made a sample implementation
Conclusions
Operators today face challenges in managing and configuring large networks, largely due to the complexity of administering IP networks
Ethernet is not a viable alternative: poor scaling and inefficient path selection
SEATTLE promises scalable, self-configuring routing
Simulations suggest efficient routing and low latency, with quick recovery
Host mobility is supported with low control overhead
Ethernet stacks at the end hosts are not modified
Thank you for your attention!
Questions? Comments?