Applying Hyper-scale Design Patterns to Routing

Hannes Gredler, CTO, RtBrick Inc.

DEVNET-2064


Page 1: Applying Hyper-scale Design Patterns to Routing

Applying Hyper-scale Design Patterns to Routing

Hannes Gredler, CTO, RtBrick Inc.

DEVNET-2064

Page 2: Applying Hyper-scale Design Patterns to Routing

© 2016 Cisco and/or its affiliates. All rights reserved. Cisco Public

Who am I?

• CTO at RtBrick, Inc.
• Past stint: Distinguished Engineer with the “other router-vendor”
• 18 years of working experience developing, deploying and supporting routing software
• Expertise
  • BGP, IS-IS, MPLS
  • 20+ patents
  • 20+ proposed standards
    http://www.arkko.com/tools/allstats/hannesgredler.html
• IETF WG co-chair (IS-IS)

Page 3: Applying Hyper-scale Design Patterns to Routing

>2013: exposure to Data Center Networks & SR

• New large-scale data-center network model emerging [draft-ietf-rtgwg-bgp-routing-large-dc]
• End-to-end Layer-3 routing
  • Fixes issues with the L2 switching data plane
• Hierarchical topology, Clos-based
  • Max. 5 stages
• Use of aggregation at ToRs
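
Aggregation at the ToR means the switch advertises one covering prefix instead of every subnet behind it. A minimal sketch with Python's standard `ipaddress` module; the subnets and the `aggregate` helper are made up for illustration and are not taken from the talk:

```python
import ipaddress

# Hypothetical subnets reachable behind a single top-of-rack (ToR) switch.
rack_subnets = [
    ipaddress.ip_network("10.1.0.0/26"),
    ipaddress.ip_network("10.1.0.64/26"),
    ipaddress.ip_network("10.1.0.128/26"),
    ipaddress.ip_network("10.1.0.192/26"),
]


def aggregate(prefixes):
    """Collapse contiguous prefixes into the fewest covering prefixes."""
    return list(ipaddress.collapse_addresses(prefixes))


advertised = aggregate(rack_subnets)
print(advertised)  # [IPv4Network('10.1.0.0/24')]
```

The upper stages of the fabric then carry one route per rack rather than four.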

Page 4: Applying Hyper-scale Design Patterns to Routing

Got a couple of inconvenient insights …

• Networks have become Anti-Moore
  • Direct sourcing from OEM manufacturers in Taiwan
• Hardware is a commodity
  • Cost per bit dropping sharply (USD 400 / 100GbE port)
  • Boutique ASICs viable in 5 years from now?
• Curated software release models approaching EOL
  • Modularization or custom package selection desired (no PIM, no RSVP, etc.)
  • Pay per use
• Different model (node vs. system) for resiliency
• Open sourcing of components the new normal
• Integration of components becomes core competency

Page 5: Applying Hyper-scale Design Patterns to Routing

John Gage, Sun Microsystems

1) “The network is the computer”

Page 6: Applying Hyper-scale Design Patterns to Routing


Hannes Gredler, 2015

2) “Is it possible to construct a router based on the web 2.0 mindset ?”

Page 7: Applying Hyper-scale Design Patterns to Routing

Agenda

• Introduction
• Multi-Level Architecture
• Micro-services & APIs
• Commoditization & Unit Economics
• Resiliency, system coupling and state recovery
• Open Source Development & Test
• Conclusion

Page 8: Applying Hyper-scale Design Patterns to Routing

Multi-Level Architecture

Page 9: Applying Hyper-scale Design Patterns to Routing


Hyper-scale Multi-level Architecture

Page 10: Applying Hyper-scale Design Patterns to Routing

Forwarding node

• Translates RIB objects to local OS representation
  • Tables
  • Routes
  • Nexthops
• Hardware prefix caching
• Aggregate FIB table
  • (filter specifics)
• Localize fwd table
  • VPNs
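
One way to read "aggregate FIB table (filter specifics)" is that a more-specific route is redundant when the covering route already points at the same next hop. The sketch below illustrates that reading only; the `filter_specifics` name, the dict-based RIB and the next-hop labels are assumptions, not RtBrick's implementation:

```python
import ipaddress


def filter_specifics(rib):
    """Build a FIB that omits routes whose covering route has the same next hop.

    `rib` maps prefix strings to next-hop labels (hypothetical layout).
    """
    routes = sorted(
        ((ipaddress.ip_network(prefix), nexthop) for prefix, nexthop in rib.items()),
        key=lambda route: route[0].prefixlen,
    )
    fib = {}
    for net, nexthop in routes:
        # Candidates are already-installed, shorter prefixes that contain this route.
        covering = [
            (parent, parent_nh)
            for parent, parent_nh in fib.items()
            if parent.version == net.version and net.subnet_of(parent)
        ]
        if covering:
            # The longest covering prefix is what a lookup would hit instead.
            _, parent_nh = max(covering, key=lambda route: route[0].prefixlen)
            if parent_nh == nexthop:
                continue  # forwarding result is identical, so skip the specific
        fib[net] = nexthop
    return fib


rib = {
    "10.0.0.0/8": "A",
    "10.1.0.0/16": "A",      # covered by 10.0.0.0/8 with the same next hop
    "10.1.2.0/24": "B",      # different next hop, must stay
    "10.1.2.128/25": "B",    # covered by 10.1.2.0/24 with the same next hop
}
for net, nexthop in filter_specifics(rib).items():
    print(net, nexthop)
```

Routes are processed shortest prefix first, so every suppressed specific is guaranteed to resolve to the same next hop through the entry that remains installed.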

Page 11: Applying Hyper-scale Design Patterns to Routing

Protocol I/O node

• Schema-driven protocol serializer / de-serializer
• Keepalive delegation / absorption
• Terminal communication point for sockets, stdio & file I/O
• Pre-processing of protocol stream (filter BGP PA128)
• Queuing machinery & routing protocol update generation
• Interface state handling
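
"Schema driven" can be pictured as one generic encoder and decoder that take the message layout as data. The sketch below is an assumption about what that could look like, not the talk's code: the schema format and all function names are invented, while the 19-byte BGP header layout (16-byte marker, 2-byte length, 1-byte type, KEEPALIVE = type 4) follows RFC 4271. The last helper illustrates stream pre-processing by dropping path attributes of a given type code, such as 128.

```python
import struct

# Hypothetical schema: (field name, struct format code) in wire order.
BGP_HEADER = [("marker", "16s"), ("length", "H"), ("type", "B")]


def struct_format(schema):
    # "!" selects network byte order without padding.
    return "!" + "".join(code for _, code in schema)


def serialize(schema, values):
    """Pack a dict of field values according to the schema."""
    return struct.pack(struct_format(schema), *(values[name] for name, _ in schema))


def deserialize(schema, data):
    """Unpack the leading bytes of `data` into a dict keyed by field name."""
    fmt = struct_format(schema)
    fields = struct.unpack(fmt, data[: struct.calcsize(fmt)])
    return {name: field for (name, _), field in zip(schema, fields)}


def drop_attribute(attributes, type_code):
    """Remove (type_code, value) path attributes matching the given type code."""
    return [(code, value) for code, value in attributes if code != type_code]


keepalive = serialize(BGP_HEADER, {"marker": b"\xff" * 16, "length": 19, "type": 4})
print(len(keepalive), deserialize(BGP_HEADER, keepalive)["type"])

attributes = [(1, b"\x00"), (128, b"opaque"), (2, b"")]
print(drop_attribute(attributes, 128))
```

Adding a new message type then means adding a schema entry rather than writing another hand-coded parser.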

Page 12: Applying Hyper-scale Design Patterns to Routing

Application (route computation) node

• Schema-driven data structure server
• Stores application objects
  • Routes, Nexthops, Tables
• Triggered execution (Add, Chg, Del) of internal/external application code
  • Python functions
  • C/C++ library calls
  • Executables vfork()
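
Triggered execution can be sketched as an object table that calls registered code whenever an object is added, changed or deleted. The `ObjectTable` class and its method names below are hypothetical and only cover the Python-function case from the slide:

```python
class ObjectTable:
    """In-memory object store that fires callbacks on add, chg and del."""

    def __init__(self):
        self._objects = {}
        self._callbacks = []

    def subscribe(self, callback):
        """Register callback(event, key, value) for every table change."""
        self._callbacks.append(callback)

    def _notify(self, event, key, value):
        for callback in self._callbacks:
            callback(event, key, value)

    def put(self, key, value):
        if key not in self._objects:
            event = "add"
        elif self._objects[key] != value:
            event = "chg"
        else:
            return  # identical object, nothing to trigger
        self._objects[key] = value
        self._notify(event, key, value)

    def delete(self, key):
        if key in self._objects:
            self._notify("del", key, self._objects.pop(key))


routes = ObjectTable()
routes.subscribe(lambda event, key, value: print(event, key, value))
routes.put("10.0.0.0/8", "nexthop-1")   # add
routes.put("10.0.0.0/8", "nexthop-2")   # chg
routes.delete("10.0.0.0/8")             # del
```

Writing an identical object twice triggers nothing, which keeps downstream code from re-running on no-op updates.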

Page 13: Applying Hyper-scale Design Patterns to Routing


Putting it all together

Page 14: Applying Hyper-scale Design Patterns to Routing

Micro-Services & APIs

Page 15: Applying Hyper-scale Design Patterns to Routing

Build a system of little components

• Micro-service architecture is like a UNIX pipeline model
• Small pieces of software, each serving a unique purpose
• Easy transfer of state from one brick to the next

Source → Filter → Sort → Filter → Sink

curl http://192.168.1.1/bds/object | grep "Received-From:" | sort | grep "foo" > out
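
The same Source → Filter → Sort → Filter → Sink chain can be written as small composable functions. This is an illustrative sketch only: the stage functions and the sample lines are invented, and the source reads from a list rather than from the HTTP endpoint in the shell pipeline.

```python
def source(lines):
    """Emit input lines one by one (stands in for the curl stage)."""
    yield from lines


def grep(pattern, lines):
    """Pass through only the lines containing `pattern`."""
    return (line for line in lines if pattern in line)


def sort(lines):
    """Sort all lines; like sort(1), this stage has to buffer its input."""
    return iter(sorted(lines))


def sink(lines):
    """Collect the final result (stands in for the redirect to a file)."""
    return list(lines)


sample = [
    "Prefix: 10.0.0.0/8",
    "Received-From: peer-foo-2",
    "Received-From: peer-bar-1",
    "Received-From: peer-foo-1",
]
out = sink(grep("foo", sort(grep("Received-From:", source(sample)))))
print(out)  # ['Received-From: peer-foo-1', 'Received-From: peer-foo-2']
```

Each stage knows nothing about its neighbours beyond "an iterable of lines", which is the property that makes the bricks easy to recombine.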

Page 16: Applying Hyper-scale Design Patterns to Routing

REST/JSON based APIs

Page 17: Applying Hyper-scale Design Patterns to Routing

Database centric / Distributed Data Store

bds://local/bgp.neighbor

bds://local/isis.adj

bds://local/isis.lsdb.l2

bds://217.160.181.216/bgp.rib-in

PUBSUB
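
The table names above read like URLs of the form bds://<host>/<table>. A minimal sketch of addressing tables that way and attaching publish/subscribe to them; the `parse_bds` helper and the `Broker` class are assumptions made for illustration, and only the URL strings come from the slide:

```python
from collections import defaultdict
from urllib.parse import urlsplit


def parse_bds(url):
    """Split a bds:// URL into (host, table name)."""
    parts = urlsplit(url)
    if parts.scheme != "bds":
        raise ValueError(f"not a bds URL: {url!r}")
    return parts.netloc, parts.path.lstrip("/")


class Broker:
    """Toy in-process pub/sub keyed by (host, table)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, url, callback):
        self._subscribers[parse_bds(url)].append(callback)

    def publish(self, url, obj):
        # .get avoids creating empty entries for tables nobody listens to.
        for callback in self._subscribers.get(parse_bds(url), []):
            callback(obj)


broker = Broker()
broker.subscribe("bds://local/bgp.neighbor", lambda obj: print("neighbor update:", obj))
broker.publish("bds://local/bgp.neighbor", {"peer": "192.0.2.1", "state": "Established"})
print(parse_bds("bds://217.160.181.216/bgp.rib-in"))
```

The peer address and state in the published object are placeholder values.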

Page 18: Applying Hyper-scale Design Patterns to Routing

Open IPC format = BSON/JSON

• Binary JSON for memory and I/O efficiency
• JSON conversion on the fly possible
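
To make the BSON/JSON relationship concrete, here is a deliberately tiny encoder and decoder that handles only flat documents with string and 32-bit integer values, following the public BSON specification. The function names and the sample object are illustrative; a real system would use a complete BSON library.

```python
import json
import struct


def bson_encode(doc):
    """Encode a flat dict of str / int32 values as a BSON document."""
    body = b""
    for key, value in doc.items():
        name = key.encode("utf-8") + b"\x00"
        if isinstance(value, str):
            data = value.encode("utf-8") + b"\x00"
            body += b"\x02" + name + struct.pack("<i", len(data)) + data
        elif isinstance(value, int):
            body += b"\x10" + name + struct.pack("<i", value)
        else:
            raise TypeError(f"unsupported type for {key!r}")
    # Total size counts the 4-byte length prefix and the trailing NUL.
    return struct.pack("<i", len(body) + 5) + body + b"\x00"


def bson_decode(data):
    """Decode a document produced by bson_encode back into a dict."""
    pos, doc = 4, {}
    while data[pos] != 0:
        kind = data[pos]
        end = data.index(b"\x00", pos + 1)
        key = data[pos + 1:end].decode("utf-8")
        pos = end + 1
        if kind == 0x02:
            (length,) = struct.unpack_from("<i", data, pos)
            doc[key] = data[pos + 4:pos + 4 + length - 1].decode("utf-8")
            pos += 4 + length
        elif kind == 0x10:
            (doc[key],) = struct.unpack_from("<i", data, pos)
            pos += 4
        else:
            raise ValueError(f"unsupported element type {kind:#x}")
    return doc


blob = bson_encode({"prefix": "10.0.0.0/8", "metric": 20})
print(json.dumps(bson_decode(blob)))  # JSON conversion on the fly
```

Length-prefixed strings and fixed-width integers let a reader skip over fields without scanning for delimiters, which is where the I/O efficiency comes from.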

Page 19: Applying Hyper-scale Design Patterns to Routing

Table replication & state flow within a system

Page 20: Applying Hyper-scale Design Patterns to Routing

Commoditization & Unit Economics

Page 21: Applying Hyper-scale Design Patterns to Routing

Compute Strategy: Yahoo vs. Google

Few Big vs. Many Small

Page 22: Applying Hyper-scale Design Patterns to Routing

Commodity data plane = White-boxes

• Economy of scale will ultimately render custom ASICs obsolete
  • FY2016 systems shipping: 100GbE, > 128K FIB entries
  • Disintegration happening
  • Soon to enter the edge router business …
• For ease of integration: no hardware, no locality, no OS assumptions
• Unbounded configuration possibilities:
  • Single switch, cluster of switches, co-located x86 rack servers …
  • Large FIBs, small FIBs, SW-based forwarders & combos thereof

Page 23: Applying Hyper-scale Design Patterns to Routing

Commodity control plane = 1RU Rack Servers

• 16-32 CPU cores, 64 GB RAM, solid-state disks
  • approx. USD 3000
• Runs stock Ubuntu / CentOS
• Linux Containers (LXC)
  • Dependency management
  • Para-virtualization

Page 24: Applying Hyper-scale Design Patterns to Routing

Resiliency, system coupling and state recovery

Page 25: Applying Hyper-scale Design Patterns to Routing


Hyper-scale Multi-level Architecture

Page 26: Applying Hyper-scale Design Patterns to Routing


Resiliency

Page 27: Applying Hyper-scale Design Patterns to Routing


Resiliency – snapshot DB to disk

Page 28: Applying Hyper-scale Design Patterns to Routing

Resiliency (2) – restart based on disk snapshot
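
The slides name the idea (snapshot the database to disk, restart from the snapshot) without describing a format. The sketch below is therefore only one plausible shape, with an assumed JSON file layout, an assumed version counter and a made-up update log: the table is written atomically together with its version, and after a restart only updates newer than that version are replayed.

```python
import json
import os
import tempfile


def snapshot(table, version, path):
    """Write the table and its version to disk, replacing the old file atomically."""
    tmp = path + ".tmp"
    with open(tmp, "w") as handle:
        json.dump({"version": version, "table": table}, handle)
        handle.flush()
        os.fsync(handle.fileno())
    os.replace(tmp, path)


def restore(path):
    """Load (table, version) from a snapshot file."""
    with open(path) as handle:
        snap = json.load(handle)
    return snap["table"], snap["version"]


def resync(table, version, log):
    """Replay only the (version, key, value) updates newer than the snapshot."""
    for update_version, key, value in log:
        if update_version <= version:
            continue  # already reflected in the snapshot
        if value is None:
            table.pop(key, None)  # None marks a delete in this sketch
        else:
            table[key] = value
        version = update_version
    return table, version


with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, "rib.snapshot")
    snapshot({"10.0.0.0/8": "nexthop-1"}, 7, path)
    table, version = restore(path)
    log = [(6, "10.0.0.0/8", "stale"), (8, "192.0.2.0/24", "nexthop-2")]
    print(resync(table, version, log))
```

Because most state comes from the local file, the amount that has to be re-learned after a restart shrinks to whatever changed since the snapshot was taken.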

Page 29: Applying Hyper-scale Design Patterns to Routing

Open Source Development & Test

Page 30: Applying Hyper-scale Design Patterns to Routing

Open Development

• Open Source
  • 100 eyes better than 4 eyes, network effects
  • Long-term maintenance
• Open Source means sharing of not just code:
  • Code
  • Test
  • Build
  • Documentation

Page 31: Applying Hyper-scale Design Patterns to Routing

Open development (1)

• Use what is usable
  • No need to re-invent Linux, event loops, memory managers
• Kernel-based networking stacks are not usable for a router
  • Debugging is hard (GDB live attachment)
  • Experimental forwarding code with no fault domains in your kernel, really?
  • TCP snapshots / restart
• In 2016Q1 we did not have a packet forwarding core
  • Cisco did release fd.io / VPP
  • User-space DPDK design aligned with our (religious) beliefs
  • Most feature-complete open-source L3 forwarder
  • Engineered for performance and maintainability

Page 32: Applying Hyper-scale Design Patterns to Routing

Open development (2)

• Kick-ass VPP crew
  • Helped us to implement necessary core features (indirect next-hop) within two weeks
  • Good balance between stability and feature velocity
  • Excellent Continuous Integration & Test Automation (atypical for FLOSS projects)

Page 33: Applying Hyper-scale Design Patterns to Routing

Open Development (3)

VPP Internet stream generator

Page 34: Applying Hyper-scale Design Patterns to Routing

Conclusion

Page 35: Applying Hyper-scale Design Patterns to Routing

In Conclusion

• Network equipment design has got to be
  • Distributed, multi-level architecture
  • Micro-service based
  • Running on commodity hardware
  • “System” resilient
  • Open development / open test
• Cisco Vector Packet Processing (VPP)
  • Best code in the industry (why is this free?)
  • Good code governance
  • Establishment of an innovative ecosystem around VPP underway

Page 36: Applying Hyper-scale Design Patterns to Routing

RtBrick demo at fd.io booth

Page 37: Applying Hyper-scale Design Patterns to Routing


Demo hosted at EC2 instance

Page 38: Applying Hyper-scale Design Patterns to Routing

Demo SCALE

• 39 million IPv4 / 3 million IPv6 route entries
• BGP table snapshots from RIPE RIS server
• Route-processing / update / restart performance
  • 20x compared to JUNOS | IOS-XR
  • Full bring-up time 180s
  • Resync time 26s
• Full fault domain isolation
  • Blast radius within a protocol process of an address family
• Process restart
  • Preservation of TCP session
  • Fast, robust re-sync of state
  • Everything versioned

Page 39: Applying Hyper-scale Design Patterns to Routing

Process restart using snapshots (3)

Normal appd restart: resync time for 2.5M RIB entries is 28s

Snapshot appd restart: resync time for 2.5M RIB entries is 6s
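
As a quick back-of-the-envelope check, the two resync times can be turned into throughput and a speed-up factor. Only the three input numbers come from the slide; the variable names are arbitrary.

```python
RIB_ENTRIES = 2_500_000
NORMAL_RESYNC_S = 28
SNAPSHOT_RESYNC_S = 6

normal_rate = RIB_ENTRIES / NORMAL_RESYNC_S
snapshot_rate = RIB_ENTRIES / SNAPSHOT_RESYNC_S
speedup = NORMAL_RESYNC_S / SNAPSHOT_RESYNC_S

print(f"normal:   {normal_rate:,.0f} entries/s")    # normal:   89,286 entries/s
print(f"snapshot: {snapshot_rate:,.0f} entries/s")  # snapshot: 416,667 entries/s
print(f"speed-up: {speedup:.1f}x")                  # speed-up: 4.7x
```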

Page 40: Applying Hyper-scale Design Patterns to Routing

Thank you

