400G: Deployment at a National Lab

Chris Tracy (ESnet)

*Jason R. Lee (NERSC)

June 30, 2016

- 1 -


Concept

- 2 -

Concept: Use case

This work originally began as a white paper in December 2013, in which ESnet was exploring new technologies to support rates above 100G.

One use case in particular was the problem of linking two separate data centers. NERSC was in the planning phase of a move from the Oakland Scientific Facility (OSF) in Oakland to Shyh Wang Hall (CRT) at LBL in Berkeley.

Oakland Scientific Facility [1] Oakland, CA

Berkeley Lab’s Shyh Wang Hall [2] Berkeley, CA

- 3 -

Concept: Proposal

By February 2014, interest in these new technologies had grown. This led to a draft proposal submitted to the DOE.

In collaboration with NERSC and Ciena, ESnet proposed a field trial of 400G technology on BayExpress, ESnet’s Bay Area production dark fiber ring.

ESnet5 BayExpress: Production system serving Bay Area laboratories between Sacramento and Sunnyvale.

- 4 -

400G

- 5 -

400G: Plan

• BayExpress ring is 450 km in length

• The National Energy Research Scientific Computing Center (NERSC) was moving to a new building only 11.5 km from the current site.

– Short way around the ring.

• NERSC needed to stay up and running, serving the large, diverse scientific community it supports: ~6000 scientists, ~900 projects, and 46 countries across the world.

– There is no time that the center is “lightly” loaded.

– As of June 27th we have a 10-day backlog of jobs to run.

- 6 -

400G: Plan

• We would create two alien waves, each carrying 200G.

• These waves would then be combined to form a “SuperChannel” with 400G of total bandwidth.

• Wavelength selective switches (WSS) are in the path, but they are limited to 50 GHz granularity.

• In the production circuit, this took up 100 GHz of spectral bandwidth.

– 2 x 50 GHz channels

- 7 -

400G: Execution

● Network-wide upgrade had to be performed for new h/w and optical control plane

● 4x100 GigE circuits brought up and fully production quality between OSF and CRT

● On line (DWDM) side, provisioned on BayExpress as two adjacent 50 GHz channels

● Each 50 GHz channel contains one DP-16QAM signal (2x100GigE payload)

● DP-16QAM signal line rate => 275.75 Gbit/s (incl. G.709/FEC overhead)
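
The line-rate figure above can be related to the payload with a little arithmetic. The sketch below is illustrative only: it treats the payload as a nominal 200 Gbit/s per carrier and uses the standard properties of DP-16QAM (4 bits per symbol, two polarizations). The variable names are made up for this example.

```python
# Figures taken from the slide.
payload_gbps = 2 * 100.0     # 2 x 100GigE clients per carrier (nominal)
line_rate_gbps = 275.75      # DP-16QAM line rate incl. G.709/FEC overhead

# Standard properties of DP-16QAM.
bits_per_symbol = 4          # 16QAM carries log2(16) = 4 bits per symbol
polarizations = 2            # dual polarization

# Overhead relative to the nominal payload, and the implied symbol rate.
overhead_fraction = line_rate_gbps / payload_gbps - 1.0
symbol_rate_gbaud = line_rate_gbps / (bits_per_symbol * polarizations)

print(f"Overhead over nominal payload: {overhead_fraction:.1%}")
print(f"Implied symbol rate: {symbol_rate_gbaud:.2f} GBd")
```

Under these assumptions the symbol rate comes out near 34.5 GBd, which is consistent with one carrier fitting inside a 50 GHz channel.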

- 8 -

NERSC

- 9 -

NERSC: Physical topology

[Figure: physical topology diagram showing the CRT and OSF sites]

- 10 -

NERSC: Synchronizing File systems

• Synchronize file systems between sites while keeping jobs running on the supercomputers.

– In total, ~10 PB of file system data

• GPFS restripe, keeping both sites live.

• Achieved a sustained rate of ~250 Gbps over the link:

– Limited by the number of sinks / sources we could allocate to the transfer.

– We did push 400 Gbps during acceptance of the link.

• Path that the data took was:

– Disk -> 10G Ethernet -> 400G Superchannel -> Ethernet/InfiniBand routers -> Disk

– All the disk at CRT was IB connected.
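
As a rough sanity check on these numbers, the sketch below estimates how long one pass over ~10 PB would take at the sustained and peak rates quoted above. It assumes decimal petabytes and a perfectly steady rate, so it is an idealized lower bound rather than a model of the actual migration; `transfer_days` is a helper invented for this example.

```python
# Figures from the slide; decimal units are an assumption.
data_bytes = 10e15        # ~10 PB of file system data
sustained_bps = 250e9     # ~250 Gbps sustained
peak_bps = 400e9          # 400 Gbps seen during link acceptance

def transfer_days(n_bytes: float, rate_bps: float) -> float:
    """Idealized time in days to move n_bytes at a constant rate_bps."""
    seconds = n_bytes * 8 / rate_bps
    return seconds / 86400

print(f"At 250 Gbps: {transfer_days(data_bytes, sustained_bps):.1f} days")
print(f"At 400 Gbps: {transfer_days(data_bytes, peak_bps):.1f} days")
```

A single pass at the sustained rate works out to a few days. The real move also had to keep both sites live and resynchronize changing data, so the wall-clock time was naturally longer.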

- 11 -

NERSC: File system Transfers

[Figure: file system transfers between the CRT and OSF sites]

- 12 -

NERSC: WAN

Key component: 200G 16QAM transponder

- 13 -

400G: Production (Sept ‘15)

LBNL Ciena node Berkeley, CA

400G service Termination point (4 x 100G Ethernet client)

- 14 -

Test Bed

- 15 -

Testbed: Sept ‘15

● Demonstrated 400G super-channel in lab at LBNL

● 37.5 GHz spacing using 80 km fiber spool

● Better utilization of the spectral bandwidth

● Using Raman amplification w/ integrated OTDR

● Validating next-gen ROADM technology:

– Flexible (gridless), colorless mux/demux

● Level 3's acquisition of TW Telecom last year has caused some delay in bringing up the dark fiber for this project

● Goal is to characterize next-gen ROADM architecture in the real world and gain operational experience
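
The "better utilization" point can be made concrete by comparing spectral efficiency at the two carrier spacings. This is a sketch, assuming the lab super-channel also uses two carriers and that occupied spectrum is simply carriers times spacing; `spectral_efficiency` is a helper invented for this example.

```python
def spectral_efficiency(payload_gbps: float, carriers: int, spacing_ghz: float):
    """Return (occupied spectrum in GHz, payload bit/s/Hz) for a super-channel."""
    occupied_ghz = carriers * spacing_ghz
    # Gbit/s divided by GHz is numerically bit/s/Hz.
    return occupied_ghz, payload_gbps / occupied_ghz

# Production circuit: two adjacent 50 GHz channels (from the slides).
prod_ghz, prod_se = spectral_efficiency(400, 2, 50.0)

# Testbed: 37.5 GHz spacing; two carriers is an assumption here.
lab_ghz, lab_se = spectral_efficiency(400, 2, 37.5)

saving = 1 - lab_ghz / prod_ghz

print(f"Production: {prod_ghz:.0f} GHz, {prod_se:.2f} bit/s/Hz")
print(f"Testbed:    {lab_ghz:.0f} GHz, {lab_se:.2f} bit/s/Hz")
print(f"Spectrum saved: {saving:.0%}")
```

Under these assumptions the tighter spacing carries the same 400G payload in a quarter less spectrum.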

- 16 -

Testbed: 400G May 2016

- 17 -

Testbed: Industry Partner:

● Provided hands-on technical assistance

● Loaned four 40 km single-mode fiber (SMF) spools

● Donated equipment: two colorless mux/demux, two Raman amplifiers, two switchable line amplifiers

[Photos: 1 colorless mux/demux, 1 Raman amp, 1 switchable line amp, four 40 km SMF-28 fiber spools]

- 18 -

Final Thoughts

- 19 -

Summary: Project Timeline

2013 Dec White paper on “Moving ESnet Beyond 100G”.

2014 Feb Draft proposal.

2014 May FWP submitted to PAMS. Ciena presents SC’13 400G superchannel at TNC2014 [4].

2014 Sept Receive FY14 guidance.

2015 Jan CR ends. Receive FY15 guidance, project kick-off. Ciena and Brocade equipment procurement.

2015 Feb ALU equipment procurement. Level 3 and ESnet complete Ciena code upgrade.

2015 Mar Level 3 splicing procurement.

2015 May 400G testbed running in lab, super-channel PoC with Ciena spools (80 km). 2nd Ciena equipment procurement.

2015 Jul 400G link across BayExpress (11.5 km) put into production, ready for upcoming NERSC relocation to Berkeley.

2015 Nov Press releases (right before SC15), Shyh Wang Hall building dedication.

2015 Dec NERSC relocates to Berkeley facility. Level 3 splices and delivers dark fiber.

2016 May Field trial: 400G super-channel across 93.3 km (dark fiber plus spools).

- 20 -

Summary: Filesystem Transfers

- 21 -

Summary: Final Thoughts

• Took almost two years to deploy.

• Worked almost flawlessly

– For the 11.5 km length.

– Doesn’t work around the entire 450 km ring (required OSNR too high)

• Took less than a month to move all the data from OSF to CRT

• No apparent “down-time” to the users.

– Took about an hour per file system to ‘remount’ after a final sync

• Still in production today as NERSC moves out of OSF

- 22 -

Thank you!

- 23 -

Contact Info:

• PI: Chris Tracy: [email protected]

• Co-PI: Jason Lee: [email protected]

- 24 -

National Energy Research Scientific Computing Center

- 25 -

NERSC: WAN Topology

- 26 -

WAN: Topology (cont)

- 27 -

NERSC: WAN Fiber

• Fiber provided by:

• Loaned 11.5 km dark fiber BCXN6956 between Oakland and Berkeley

• Supported Ciena code upgrade to support new hardware from this project

• ESnet contributed funds for fiber splicing work

[Figure: fiber path map; fiber path is approximate]

- 28 -