100Gb/s Highways: Innovation Required at Every...

UC Berkeley Casper Workshop, June 13 2014 Kevin Deierling, Mellanox Technologies – kevind at mellanox.com 100Gb/s Highways: Innovation Required at Every Layer

Transcript of 100Gb/s Highways: Innovation Required at Every...

Page 1: 100Gb/s Highways: Innovation Required at Every Layer (casper.berkeley.edu/2014_presentations/Friday/mellanox.pdf)

UC Berkeley Casper Workshop, June 13 2014

Kevin Deierling, Mellanox Technologies – kevind at mellanox.com

100Gb/s Highways: Innovation Required at Every Layer

Page 2

© 2014 Mellanox Technologies 2- Mellanox Technologies -

Leading Supplier of End-to-End Interconnect Solutions

[Figure: end-to-end interconnect diagram. Server / Compute, Switch / Gateway, Storage Front / Back-End and Metro / WAN, linked through Virtual Protocol Interconnect (VPI) over 56G InfiniBand & FCoIB or 10/40/56GbE & FCoE.]

Comprehensive End-to-End InfiniBand and Ethernet Portfolio (VPI): ICs, Adapter Cards, Switches/Gateways, Host/Fabric Software, Cables/Modules

Page 3

Bandwidth & Latency Roadmap

Heritage of High Speed, Low Latency Interconnect and Large Scale Systems

Scaling from two servers to 1000s of interconnected nodes

End-to-end connectivity solutions that deliver value for challenging workloads

Page 4

Moore’s Law

▪ Moore’s Law: Chip transistor count doubles roughly every two years

• Linear shrink of 30% results in half the area

• Keep the cost/area about constant while shrinking
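The arithmetic behind these bullets can be checked with a short sketch. The function names and the ten-year horizon are illustrative assumptions, not taken from the slides.

```python
def area_scale(linear_shrink: float) -> float:
    """Relative area after every linear dimension shrinks by `linear_shrink`."""
    return (1.0 - linear_shrink) ** 2


def transistor_growth(years: float, doubling_period_years: float = 2.0) -> float:
    """Growth factor in transistor count for a fixed doubling period."""
    return 2.0 ** (years / doubling_period_years)


if __name__ == "__main__":
    # A 30% linear shrink leaves 0.7 x 0.7 of the original area.
    print(f"30% linear shrink -> {area_scale(0.30):.2f}x the area")
    # Doubling every two years compounds to 2**5 over a decade.
    print(f"10 years -> {transistor_growth(10):.0f}x transistors")
```

At constant cost per area, halving the area per transistor is what lets the count double at roughly the same chip cost.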

Page 5

Moore’s Law & Dennard Scaling

▪ Moore’s Law: Chip transistor count doubles roughly every two years

• Linear shrink of 30% results in half the area

• Keep the cost/area about constant while shrinking

▪ Dennard Scaling: As a MOSFET transistor shrinks it gets:

• Faster

• Lower power (constant power density)

• Smaller/lighter

Robert Dennard et al., IEEE JSSC, Oct 1974

No compromises: Everything gets better as transistors shrink

Page 6

Now for the Bad News …

▪ Dennard scaling broke about a decade ago

• Both power density & performance stopped scaling

▪ Higher power & new process/fabs = higher costs

▪ The economic half of Moore’s law crumbling too

http://www.ni.com/white-paper/14565/en/

Page 7

But Half of Moore’s Law Still Going - Performance thru Parallelism

▪ So with only half of Moore’s law intact, what is a multi-billion-dollar fab to do?

▪ Not faster cores but more and more of them …

http://www.ni.com/white-paper/14565/en/


Page 8

Multi-Core Processors & Virtualization are a Perfect Match

▪ Hardware Multi-Core Parallelism

• Natural consequence of Moore’s Law continuing in conjunction with the end of Dennard scaling

• Multi-threaded programming is hard

• Difficult to achieve perfect parallelism

- Gustafson’s Law: Speedup = s’ + p*n, with serial fraction s’, parallel fraction p and n processors (ideally serial s’ = 0)

▪ Virtualization

• Multiple Virtual Machines on a single physical server

• Completely separate tasks are easy to parallelize

• The Virtual Machine is a software construct supported by a software hypervisor

- Software is easy to move – needs virtual IP addresses
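Gustafson's Law from this slide can be sketched as below. The function name, the assumption that p = 1 - s', and the 16-core example are illustrative choices, not taken from the slides.

```python
def gustafson_speedup(serial_fraction: float, n_processors: int) -> float:
    """Scaled speedup s' + p*n, assuming p = 1 - s' is the parallel fraction."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    parallel_fraction = 1.0 - serial_fraction
    return serial_fraction + parallel_fraction * n_processors


if __name__ == "__main__":
    # The ideal case s' = 0 gives a speedup equal to the core count.
    for s in (0.0, 0.05, 0.25):
        print(f"s'={s:.2f}: speedup on 16 cores = {gustafson_speedup(s, 16):.2f}")
```

Fully independent virtual machines sit close to the s' = 0 case, which is the sense in which the slide calls virtualization a good match for multi-core hardware.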

Page 9

100Gb/s Requires Innovating at Every Layer

▪ Application Layer

• Message format

▪ Presentation Layer

• Coding 1’s and 0’s

▪ Session Layer

• Authentication, Permissions, Persistence

▪ Transport Layer

• End-to-end error control

▪ Network Layer

• Addressing, routing

▪ Link Layer

• Error detection, flow control

▪ Physical Layer

• Bit stream, physical medium, mapping bits to analog symbols

HYBRID MODEL

Page 10

Parallel Processing of Big Data Problems & Decision Analytics

▪ Large search indexes now over 100 petabytes!

▪ If you are spending $10 million on a cluster, efficiency matters

▪ Better accelerate every part of Hadoop processing

• Direct access of processes to file system

• RDMA-enabled distributed file systems

- HDFS

- GPFS

- Ceph

• Accelerate Hadoop itself (map shuffle)

Page 11

Innovation Required @ 100Gb/s

▪ Transport Layer Innovation Required

• TCP/IP dropped packets are a non-starter

• Rear-ending someone is not the best way to figure out there is congestion

• Explicit notification required

• + RDMA, virtual NICs, virtual traffic steering, affinity

▪ Network Layer

• Virtual as well as physical routing (easy VM migration)

▪ Link Layer

• Lossless networks using flow control

- PFC (on/off) flow control is a blunt instrument

▪ Round trip latency of 1us requires ~40Kbytes buffering per logical link

- IETF starting to consider credit based flow control for Ethernet modeled after InfiniBand

▪ Physical Layer

• 100Gb/s signaling means 10ps symbol period!!

- 3 mm pulse of light in free space!

- Much less than 1 cm on FR4 … not feasible at this rate

• Lower symbol rate required through either:

- Parallel streams: ex: 4x25Gb/s

- Multi-bit/symbol: ex: PAM4, WDM

TCP/IP Implicit Congestion Notification: aka dropped packets and timeouts

PFC: Priority Flow Control
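The physical-layer numbers on this slide follow from simple arithmetic, sketched below. The function names are illustrative, line-coding overhead is ignored, and the 0.5 velocity factor used for FR4 is an assumed round number rather than a figure from the slides.

```python
C_M_PER_S = 299_792_458.0  # speed of light in vacuum


def symbol_period_s(symbol_rate_baud: float) -> float:
    """Duration of one symbol at the given symbol rate."""
    return 1.0 / symbol_rate_baud


def pulse_length_m(symbol_rate_baud: float, velocity_factor: float = 1.0) -> float:
    """Length one symbol occupies on the medium; velocity_factor is v / c."""
    return C_M_PER_S * velocity_factor * symbol_period_s(symbol_rate_baud)


def lane_symbol_rate_baud(total_bit_rate: float, lanes: int = 1,
                          bits_per_symbol: int = 1) -> float:
    """Per-lane symbol rate when bits are spread over lanes and symbol levels."""
    return total_bit_rate / (lanes * bits_per_symbol)


if __name__ == "__main__":
    serial = lane_symbol_rate_baud(100e9)
    print(f"serial 100Gb/s: {symbol_period_s(serial) * 1e12:.0f} ps per symbol")
    print(f"free space: {pulse_length_m(serial) * 1e3:.1f} mm per symbol")
    # Assumed velocity factor of about 0.5 for a signal on FR4.
    print(f"FR4 (assumed): {pulse_length_m(serial, 0.5) * 1e3:.1f} mm per symbol")
    four_lanes = lane_symbol_rate_baud(100e9, lanes=4)
    print(f"4 parallel streams: {four_lanes / 1e9:.1f} GBd per lane")
    pam4 = lane_symbol_rate_baud(100e9, lanes=4, bits_per_symbol=2)
    print(f"4 lanes with 2 bits/symbol: {pam4 / 1e9:.2f} GBd per lane")
```

Both remedies listed on the slide work the same way: they divide the total bit rate by the number of lanes or by the bits carried per symbol, which stretches the symbol period.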

Page 12

RDMA: Critical for Efficient use of Data Center Resources

ZERO Copy Remote Data Transfer

Low Latency, High Performance Data Transfers

InfiniBand: 56Gb/s; RoCE*: 40Gb/s

Kernel Bypass

Protocol Offload

* RDMA over Converged Ethernet

[Figure: two hosts, each with USER, KERNEL and HARDWARE layers and an Application with its Buffer.]

Page 13

RDMA: How it Works

RDMA over InfiniBand or Ethernet

[Figure: RACK 1 and RACK 2, each with USER, KERNEL and HARDWARE layers. Labels: Application 1, Application 2, OS, NIC, HCA, TCP/IP, and copies of Buffer 1 at the application, OS and NIC levels.]

Page 14

Need to Accelerate Virtual Overlay Networks!

Overlay Network Virtualization: Isolation, Simplicity, Scalability

NVGRE/VXLAN Overlay Networks

Virtual Overlay Networks Simplify Management and VM Migration

ConnectX-3 Pro Overlay Accelerators Enable Bare Metal Performance

[Figure: Physical View of two servers hosting VM1 to VM4 and VM5 to VM8, connected by Mellanox SDN Switches & Routers; Virtual View grouping the VMs into Virtual Domains 1, 2 and 3. Other labels: OpenFlow, Virtual Network Management API.]
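To make the per-packet cost of an overlay concrete, here is a minimal sketch of the 8-byte VXLAN header as laid out in RFC 7348. The slides do not describe the header format, so the layout comes from the RFC. The function names are hypothetical, and the outer Ethernet, IP and UDP headers are left out.

```python
import struct

VXLAN_UDP_PORT = 4789        # IANA-assigned destination port for VXLAN
VXLAN_FLAG_VNI_VALID = 0x08  # "I" flag: the VNI field is valid


def build_vxlan_header(vni: int) -> bytes:
    """Pack flags (8 bits), reserved (24), VNI (24), reserved (8)."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def parse_vni(header: bytes) -> int:
    """Return the 24-bit VNI from a VXLAN header, checking the I flag."""
    flags_word, vni_word = struct.unpack("!II", header[:8])
    if not (flags_word >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI flag not set")
    return vni_word >> 8


def encapsulate(inner_frame: bytes, vni: int) -> bytes:
    """Prefix an inner Ethernet frame with a VXLAN header (outer headers omitted)."""
    return build_vxlan_header(vni) + inner_frame


if __name__ == "__main__":
    packet = encapsulate(b"\x00" * 64, vni=5000)
    print(len(packet), parse_vni(packet))
```

Every packet between virtual domains carries this extra wrapping. That is the work the slide argues should be offloaded to the adapter instead of being done in software.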

Page 15

100Gb/s in QSFP28 Package

▪ To fit 100Gb/s in QSFP package requires:

• Low power electronics

• 4x25+ Gb/s modulators and detectors

• Silicon photonics integration:

- no lenses for the laser

- no isolators

- no TEC

[Figure: Mellanox QSFP Module. TX (Modulator) with Modulator Driver & CDR; RX (Photo Detector) with TIA & CDR.]

TIA: Transimpedance Amplifier

CDR: Clock Data Recovery

Page 16

Single Laser Reduces Power and Cost

Fully integrated parallel optical engine: Tx and Rx chips

[Figure: optical engine chips. Labels: Lasers, Splitter, Modulators, Ge PDs.]

Page 17

Franz-Keldysh Modulator scales to >50G

▪ 5dB ER with 2.8 Vpp

▪ 2 Vpp possible

▪ Integrates w/WDM

[Figure labels: 28GHz; 25Gb/s Eye]

Page 18

Wavelength Division Multiplexing

▪ Wavelength division multiplexing

• Many parallel channels over a single fiber, reducing cabling complexity and cost

▪ WDM: 100G Ethernet LR4 uses 4 wavelengths
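As a rough illustration of the LR4 example, the sketch below converts four lane center frequencies to wavelengths and adds up the payload rate. The frequencies, the 25.78125 Gb/s lane rate and the 64b/66b coding ratio are the nominal 100GBASE-LR4 values as commonly quoted. They do not appear on the slides and should be treated as assumptions to check against the IEEE 802.3 text.

```python
C_NM_THZ = 299_792.458  # speed of light expressed in nm * THz

# Assumed nominal 100GBASE-LR4 lane center frequencies (800 GHz spacing).
LR4_CENTER_FREQS_THZ = (231.4, 230.6, 229.8, 229.0)
LANE_LINE_RATE_GBPS = 25.78125   # assumed per-lane line rate
CODING_EFFICIENCY = 64.0 / 66.0  # assumed 64b/66b line coding


def wavelength_nm(frequency_thz: float) -> float:
    """Vacuum wavelength for a given optical frequency."""
    return C_NM_THZ / frequency_thz


def aggregate_payload_gbps(lanes: int,
                           lane_rate_gbps: float = LANE_LINE_RATE_GBPS,
                           efficiency: float = CODING_EFFICIENCY) -> float:
    """Total payload rate carried by `lanes` wavelengths on one fiber."""
    return lanes * lane_rate_gbps * efficiency


if __name__ == "__main__":
    for f in LR4_CENTER_FREQS_THZ:
        print(f"{f:.1f} THz -> {wavelength_nm(f):.2f} nm")
    total = aggregate_payload_gbps(len(LR4_CENTER_FREQS_THZ))
    print(f"aggregate payload: {total:.1f} Gb/s")
```

The same function shows why adding wavelengths scales a single fiber: the aggregate rate grows linearly with the channel count.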

Page 19

Echelle Gratings as Mux/Demux

▪ Echelle gratings scale from 4 to 40+ channels

▪ As much as 10x smaller than Arrayed Waveguide Gratings

▪ Better control of wavelength registration and very low cross talk

[Figure: Echelle grating with input, output, slab and reflective facet; SEM of etched facet; Tx and Rx spectra on a 12 channel multiplexer.]

Page 20

Echelle Gratings Enable 4x25G WDM over Single Fiber

▪ QSFP provides great density

▪ WDM link uses standard SMF duplex fiber (same as 10G today)

▪ 2 km reach

[Figure: laser bank, modulator bank and Mux on the transmit side; DeMux and Ge PIN detector bank on the receive side. Chip labels: Lasers, Modulators, VOA, Power Monitors, Ge Detectors.]

Page 21

WDM Provides Scalability

▪ Have demonstrated 1+Tb/s link

[Figure: TE/TM Spectrum, Loss (dB) from -35 to 0 versus Wavelength (nm) from 1525 to 1565.]

[Figure: Responsivity (A/W) from 0.3 to 1.1 versus Wavelength (nm) from 1530 to 1560, for TE and TM.]

[Figure: 40 channel WDM Receiver Layout, 16mm x 11.5mm]

Connecting Single Fiber is Easier and Cheaper than Connecting 40!

Page 22

Summary

▪ Commercial 100Gb/s is around the corner

• But really hard!

▪ Dennard Scaling has cracked

▪ And with it Moore’s Law will break. Not because of the laws of physics but rather the laws of economics

▪ So scaling will be at the cluster level, driving the requirement for low latency, high performance interconnects

▪ Innovation required at every level

• Application layer: IPC, Application direct access file systems

• Transport Layer: RDMA, Application Acceleration, Convergence

• Network: Better be thinking virtually

• Link: Lossless but not just stop lights

• Phy: 100Gb/s @ VCSEL limits. Silicon photonics required to move beyond 100Gb/s. WDM in the data center is not far behind.

▪ Market drivers will be efficiency and performance for Cloud, Web 2.0, Big Data

Page 23

Thank You! Questions?