
1

The Evolution of ESnet (Summary)

William E. Johnston ESnet Manager and Senior Scientist

Lawrence Berkeley National Laboratory

2

Summary – 1

• ESnet’s mission to support the large-scale science of the DOE Office of Science results in a unique network
  o The top 100 data flows each month account for about 25-40% of the total monthly network traffic – that is, 100-150 Terabytes out of about 450 Terabytes (450,000,000 Megabytes)
  o These top 100 flows represent massive data flows from science experiments to analysis sites and back

• At the same time ESnet supports all of the other DOE collaborative science and the Lab operations
  o The other 60-75% of the ESnet monthly traffic is in 6,000,000,000 flows
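The arithmetic behind these figures can be checked directly. A minimal sketch using only the constants quoted above (illustrative, not ESnet measurement code):

```python
# Sanity check of the traffic figures quoted above (sketch, not ESnet data)
MB, TB = 1e6, 1e12

total_monthly = 450 * TB                      # ~450 Terabytes/month total
top100_low, top100_high = 100 * TB, 150 * TB  # top-100 flow volume range

# Unit conversion quoted in the text: 450 TB = 450,000,000 MB
assert total_monthly / MB == 450_000_000

# Share of monthly traffic carried by the top 100 flows
low = top100_low / total_monthly
high = top100_high / total_monthly
print(f"top-100 share: {low:.0%}-{high:.0%}")   # 22%-33%
```

Note that 100-150 TB of 450 TB is 22%-33%; the quoted 25-40% range presumably reflects month-to-month variation in the totals.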

3

Summary – 2

• ESnet must have an architecture that provides very high capacity and high reliability at the same time
  o Science demands both
  o Lab operations demand high reliability

• To meet the challenge of DOE science, ESnet has developed a new architecture that has two national core rings and many metropolitan area rings
  o One national core is specialized for massive science data
  o The other core is for general science and Lab operations
  o Each core is designed to provide backup for the other
  o The metropolitan rings reliably connect the Labs at high speed to the two cores and connect the cores together

4

Summary – 3

• ESnet is designed, built, and operated as a collaboration among ESnet, the DOE science community, and the Labs
  o ESnet planning, configuration, and even operation have input and active participation from the DOE science community and the DOE Labs

• ESnet also provides a collection of value-added services (“science services”) that support the process of DOE collaborative science
  o National and international trust management services for strong user authentication across all DOE science collaborators (Lab, US university, and international research and education institutions)
  o Audio and video conferencing that can be scheduled worldwide

5

Summary – 4

• Taken together, these points demonstrate that ESnet is an evolving creation that is uniquely tailored to meet the needs of the large-scale science of the Office of Science
  o This is not a network that can be purchased from a commercial telecom – and if it were, it would be very expensive
  o A very specialized set of services has been combined into a unique facility to support the Office of Science mission

6

DOE Office of Science Drivers for Networking

• The large-scale science that is the mission of the Office of Science is dependent on networks for
  o Sharing of massive amounts of data
  o Supporting thousands of collaborators world-wide
  o Distributed data processing
  o Distributed simulation, visualization, and computational steering
  o Distributed data management

• These issues were explored in two Office of Science workshops that formulated networking requirements to meet the needs of the science programs (see refs.)

7

Science Requirements for Networking

The network and middleware requirements to support DOE science were developed by the OSC science community, representing major DOE science disciplines:
  o Climate simulation
  o Spallation Neutron Source facility
  o Macromolecular Crystallography
  o High Energy Physics experiments
  o Magnetic Fusion Energy Sciences
  o Chemical Sciences
  o Bioinformatics
(The major supercomputing facilities and Nuclear Physics were considered separately.)

Available at www.es.net/#research

Conclusions: the network is essential for
  o long-term (final stage) data analysis and collaboration
  o “control loop” data analysis (influencing an experiment in progress)
  o distributed, multidisciplinary simulation

August 2002 Workshop, organized by the Office of Science: Mary Anne Scott (Chair), Dave Bader, Steve Eckstrand, Marvin Frazier, Dale Koelling, Vicky White

Workshop Panel Chairs: Ray Bair, Deb Agarwal, Bill Johnston, Mike Wilde, Rick Stevens, Ian Foster, Dennis Gannon, Linda Winkler, Brian Tierney, Sandy Merola, and Charlie Catlett

8

Evolving Quantitative Science Requirements for Networks

Science areas considered in the workshop (not Nuclear Physics and Supercomputing):

Science Area                 | Today End2End Throughput    | 5-Year End2End Documented Requirement | 5-10 Year End2End Estimated Requirement | Remarks
High Energy Physics          | 0.5 Gb/s                    | 100 Gb/s                              | 1000 Gb/s                               | high bulk throughput
Climate (Data & Computation) | 0.5 Gb/s                    | 160-200 Gb/s                          | N x 1000 Gb/s                           | high bulk throughput
SNS NanoScience              | Not yet started             | 1 Gb/s                                | 1000 Gb/s + QoS for control channel     | remote control and time critical throughput
Fusion Energy                | 0.066 Gb/s (500 MB/s burst) | 0.198 Gb/s (500 MB/20 sec. burst)     | N x 1000 Gb/s                           | time critical throughput
Astrophysics                 | 0.013 Gb/s (1 TBy/week)     | N*N multicast                         | 1000 Gb/s                               | computational steering and collaborations
Genomics Data & Computation  | 0.091 Gb/s (1 TBy/day)      | 100s of users                         | 1000 Gb/s + QoS for control channel     | high throughput and steering
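The compound growth these requirements imply can be made concrete. A small sketch using the High Energy Physics figures (0.5 Gb/s today, 100 Gb/s at 5 years, 1000 Gb/s at 10):

```python
# Annual growth factor implied by the requirements (sketch, HEP figures)

def annual_factor(start_gbps: float, end_gbps: float, years: int) -> float:
    """Constant yearly multiplier that turns `start_gbps` into `end_gbps`."""
    return (end_gbps / start_gbps) ** (1 / years)

print(f"5-year requirement:  {annual_factor(0.5, 100, 5):.2f}x per year")
print(f"10-year requirement: {annual_factor(0.5, 1000, 10):.2f}x per year")
```

The 5-year requirement works out to roughly 2.9x per year – close to tripling annually, well above simple doubling.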

9

Observed Drivers for the Evolution of ESnet

ESnet Monthly Accepted Traffic, Feb. 1990 – Feb. 2005

ESnet is currently transporting about 430 Terabytes/mo. (= 430,000 Gigabytes/mo. = 430,000,000 Megabytes/mo.), and this volume is increasing exponentially.

[Figure: monthly accepted traffic in TBytes/month, Feb. 1990 – Feb. 2005]

10

Observed Drivers for the Evolution of ESnet

[Figure: ESnet monthly accepted traffic (TBytes/month) with 10x milestones at Aug. 1990, Oct. 1993, Jul. 1998, and Dec. 2001 – intervals of 39, 57, and 42 months]

ESnet traffic has increased by 10X every 46 months, on average, since 1990.
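The 10X-per-46-months trend converts to an annual growth factor as follows (a sketch of the arithmetic, using the interval lengths quoted above):

```python
# Growth rate implied by "10X every 46 months" (sketch of the arithmetic)
intervals_months = [39, 57, 42]     # the three observed 10x intervals
avg = sum(intervals_months) / len(intervals_months)
print(avg)                          # 46.0 months per 10x, as quoted

annual_factor = 10 ** (12 / avg)    # compound growth per 12 months
print(f"{annual_factor:.2f}x per year")   # ~1.82x: traffic nearly doubles yearly
```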

11

ESnet Science Traffic

• Since SLAC- and FNAL-based high energy physics experiment data analysis started, the top 100 ESnet flows have consistently accounted for 25%-40% of ESnet’s monthly total traffic
  o Much of this data goes to sites in Europe for analysis

• As LHC (the CERN high energy physics accelerator) data starts to move, the large science flows will increase dramatically (200-2000 times)
  o Both LHC US tier 1 data centers are at DOE Labs – Fermilab and Brookhaven
    - All of the data from the two major LHC experiments – CMS and Atlas – will be stored at these centers for analysis by groups at US universities
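To give the 200-2000x projection a sense of scale, a quick sketch; the 125 TB/month starting point is an assumed midpoint of the 100-150 TB/month quoted earlier, used only for illustration:

```python
# Scale of the projected LHC increase (sketch; 125 TB/mo is an assumed midpoint)
TB_PER_PB = 1000
current_top_flows_tb = 125          # midpoint of today's 100-150 TB/month

for factor in (200, 2000):
    pb = current_top_flows_tb * factor / TB_PER_PB
    print(f"{factor}x -> {pb:.0f} PB/month")   # 25 and 250 PB/month
```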

12

Source and Destination of the Top 30 Flows, Feb. 2005

[Figure: bar chart of the top 30 flows in TBytes/month (scale 0-12). The flows run between DOE Labs (Fermilab, SLAC, BNL, LBNL, LLNL, NERSC, JLab, DOE/GTN), international R&E sites (WestGrid (CA), INFN CNAF (IT), RAL (UK), IN2P3 (FR), Karlsruhe (DE), CERN (CH), U. Toronto (CA)), US universities and research sites (MIT, Johns Hopkins, Caltech, NCAR, SDSC, LIGO, U. Wisc., U. Texas Austin, UC Davis), and one commercial peer (Qwest). Legend: DOE Lab-International R&E; Lab-U.S. R&E (domestic); Lab-Lab (domestic); Lab-Comm. (domestic).]

Enabling Future OSC Science: ESnet’s Evolution over the Next 5-10 Years

• Based both on the
  o projections of the science programs
  o changes in observed network traffic and patterns over the past few years

  it is clear that the network must evolve substantially in order to meet the needs of OSC science

14

DOE Science Requirements for Networking - 1

The primary network requirements to come out of the Office of Science workshops were:

1) Network bandwidth must increase substantially, not just in the backbone but all the way to the sites and the attached computing and storage systems
  o The 5- and 10-year bandwidth requirements mean that the network bandwidth has to almost double every year
  o Upgrading ESnet to accommodate the anticipated increase from the current 100%/yr traffic growth to 300%/yr over the next 5-10 years is priority number 7 out of 20 in DOE’s “Facilities for the Future of Science – A Twenty Year Outlook”
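Doubling every year compounds quickly over the planning horizons; a one-liner sketch of what that implies:

```python
# What "almost double every year" compounds to (sketch of the arithmetic)
for years in (5, 10):
    print(f"{years} years at 2x/yr: {2 ** years}x total growth")
# -> 32x over 5 years, 1024x over 10 years
```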

15

DOE Science Requirements for Networking - 2

2) A highly reliable network is critical for science – when large-scale experiments depend on the network for success, the network must not fail

3) There must be network services that can guarantee various forms of quality-of-service (e.g., bandwidth guarantees)

4) A production, extremely reliable IP network with Internet services must support the process of science

• This network must have backup paths for high reliability

• This network must be able to provide backup paths for large-scale science data movement

16

ESnet Evolution

• With the old architecture (to 2004) ESnet cannot meet the new requirements

• The current core ring cannot handle the anticipated large science data flows at affordable cost

• The current point-to-point tail circuits to sites are neither reliable nor scalable to the required bandwidth

[Figure: the old ESnet core ring, with hubs at New York (AOA), Chicago (CHI), Sunnyvale (SNV), Atlanta (ATL), Washington, DC (DC), and El Paso (ELP), and tail circuits to the DOE sites]

17

ESnet’s Evolution – The Network Requirements

• Based on the growth of DOE large-scale science, and the resulting needs for remote data and experiment management, the architecture of the network must change in order to support the general requirements of both

1) High-speed, scalable, and reliable production IP networking for
   - University and international collaborator and general science connectivity
   - Highly reliable site connectivity to support Lab operations
   - Global Internet connectivity

2) High bandwidth data flows of large-scale science
   - Very high-speed network connectivity to specific sites
   - Scalable, reliable, and very high bandwidth site connectivity
   - Provisioned circuits with guaranteed quality of service (e.g. dedicated bandwidth) and traffic isolation
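A provisioned circuit with guaranteed bandwidth implies admission control: a request is granted only if unreserved capacity remains on the path. A minimal sketch of that bookkeeping for a single link (illustrative only, not ESnet’s actual provisioning system):

```python
# Illustrative sketch (not ESnet's implementation): admission control for
# guaranteed-bandwidth circuits on a single link.

class Link:
    def __init__(self, capacity_gbps: float):
        self.capacity = capacity_gbps
        self.reserved = 0.0

    def request_circuit(self, gbps: float) -> bool:
        """Grant a dedicated-bandwidth circuit only if capacity remains."""
        if self.reserved + gbps <= self.capacity:
            self.reserved += gbps
            return True
        return False

sdn_link = Link(capacity_gbps=10)     # one 10 Gb/s lambda, for illustration
print(sdn_link.request_circuit(6))    # True  -- 6 of 10 Gb/s reserved
print(sdn_link.request_circuit(5))    # False -- would exceed capacity
print(sdn_link.request_circuit(4))    # True  -- exactly fills the link
```

Rejected requests are what distinguish a guaranteed-bandwidth service from best-effort IP, where all traffic is admitted and simply contends.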

18

ESnet’s Evolution – The Network Requirements

• In order to meet these requirements, the capacity and connectivity of the network must increase to provide
  o Fully redundant connectivity for every site
  o High-speed access to the core for every site – at least 20 Gb/s, generally, and 40-100 Gb/s for some sites
  o 100 Gb/s national core/backbone bandwidth by 2008, in two independent backbones

19

Wide Area Network Technology

An optical fiber ring using “DWDM” – Dense Wave (frequency) Division Multiplexing – provides the circuits:
• today, typically 64 x 10 Gb/s optical channels per fiber
• channels (referred to as “lambdas”) are usually used in bi-directional pairs

Lambda (optical) channels are converted to electrical channels:
• usually SONET data framing or Ethernet data framing

A ring topology network is inherently reliable – all single point failures are mitigated by routing traffic in the other direction around the ring.

[Figure: the ESnet core as a ring of hub routers (hubs at e.g. Sunnyvale, Chicago, NYC, Washington, Atlanta, Albuquerque), with a 10GE “tail circuit” / “local loop” from an ESnet hub router to each ESnet site; the ESnet border router connects across the site–ESnet network policy demarcation (“DMZ”) to the site’s IP “gateway” router and site LAN]
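The single-failure claim can be checked mechanically: drop each ring link in turn and verify that all hubs remain mutually reachable. A small sketch (hub names taken from this deck; the check is illustrative, not ESnet’s routing):

```python
# Check why a ring survives any single link failure: removing one edge from
# a cycle leaves a path, so every hub can still reach every other.
from collections import deque

def connected(nodes, edges):
    """True if all nodes are reachable from the first via undirected BFS."""
    adj = {n: [] for n in nodes}
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    seen, queue = {nodes[0]}, deque([nodes[0]])
    while queue:
        for nbr in adj[queue.popleft()]:
            if nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    return len(seen) == len(nodes)

hubs = ["SNV", "CHI", "NYC", "DC", "ATL", "ELP"]
ring = [(hubs[i], hubs[(i + 1) % len(hubs)]) for i in range(len(hubs))]

survives_any_single_failure = all(
    connected(hubs, [e for e in ring if e != failed]) for failed in ring
)
print(survives_any_single_failure)   # True
```

A second simultaneous failure can partition a ring, which is one motivation for the two mutually backing cores described later.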

20

ESnet Strategy For A New Architecture

A three-part strategy for the evolution of ESnet:

1) Metropolitan Area Network (MAN) rings to provide
   - dual site connectivity for reliability
   - much higher site-to-core bandwidth
   - support for both production IP and circuit-based traffic

2) A Science Data Network (SDN) core for
   - provisioned, guaranteed-bandwidth circuits to support large, high-speed science data flows
   - very high total bandwidth
   - multiply connecting MAN rings for protection against hub failure
   - an alternate path for production IP traffic

3) A high-reliability IP core (e.g. the current ESnet core) to address
   - general science requirements
   - Lab operational requirements
   - backup for the SDN core
   - a vehicle for science services

ESnet Target Architecture: IP Core + Science Data Network + MANs

[Figure: national map of the target architecture – the ESnet IP core (Qwest) and the ESnet Science Data Network core (SDN, NLR circuits) as two national rings with IP core and SDN/NLR hubs at Seattle, Sunnyvale, LA, San Diego, El Paso (ELP), Albuquerque (ALB), Atlanta (ATL), Washington DC, New York, and Chicago (including possible new hubs); Metropolitan Area Rings connecting the primary DOE Labs to both cores; and international connections to GEANT (Europe), CERN, Asia-Pacific, and Australia. Legend: Production IP core; Science Data Network core; Metropolitan Area Networks; Lab supplied; International connections.]

22

First Two Steps in the Evolution of ESnet

1) The SF Bay Area MAN will provide to the five OSC Bay Area sites
   o Very high speed site access – 20 Gb/s
   o Fully redundant site access

2) The first two segments of the second national 10 Gb/s core – the Science Data Network – will be San Diego to Sunnyvale to Seattle

Science Data Network – Step One: SF BA MAN and West Coast SDN

[Figure: national map showing step one – the SF Bay Area MAN ring and the first Science Data Network segments on the west coast (San Diego – Sunnyvale – Seattle, NLR circuits) overlaid on the ESnet IP core (Qwest), with hubs at Seattle, Sunnyvale, LA, San Diego, El Paso (ELP), Albuquerque (ALB), Atlanta (ATL), Washington DC, New York, and Chicago, and international connections to GEANT (Europe), CERN, Asia-Pacific, and Australia]

24

ESnet SF Bay Area MAN Ring (Sept., 2005)

• 2 λs (2 x 10 Gb/s channels) in a ring configuration, delivered as 10 GigEther circuits

• Dual site connection (independent “east” and “west” connections) to each site

• Will be used as a 10 Gb/s production IP ring and 2 x 10 Gb/s paths (for circuit services) to each site

• Qwest contract signed for two lambdas 2/2005, with options on two more

• Project completion date is 9/2005

[Figure: the SF Bay Area MAN ring (Qwest circuits) connecting SLAC, the Joint Genome Institute, LBNL, NERSC, LLNL, SNLL, and NASA Ames via the Qwest/ESnet hub and the Level 3 hub, carrying four lambdas – λ1 production IP, λ2 SDN/circuits, λ3 and λ4 future – with links to the ESnet IP core ring (Qwest circuits, toward Chicago and El Paso), the ESnet SDN core (NLR circuits, toward Seattle, Chicago, LA, and San Diego), and DOE UltraScience Net]
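The per-site access bandwidth follows directly from the lambda count; a one-line sketch tying this slide back to the 20 Gb/s target stated earlier:

```python
# Per-site access bandwidth of the SF Bay Area MAN ring (sketch)
lambdas_in_service = 2        # lambda-1 production IP + lambda-2 SDN/circuits
gbps_per_lambda = 10          # delivered as 10 GigEther circuits

access_gbps = lambdas_in_service * gbps_per_lambda
print(access_gbps, "Gb/s site access")   # 20 Gb/s, matching the stated target
```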

25

References – DOE Network Related Planning Workshops

1) High Performance Network Planning Workshop, August 2002
   http://www.doecollaboratory.org/meetings/hpnpw

2) DOE Science Networking Roadmap Meeting, June 2003
   http://www.es.net/hypertext/welcome/pr/Roadmap/index.html

3) DOE Workshop on Ultra High-Speed Transport Protocols and Network Provisioning for Large-Scale Science Applications, April 2003
   http://www.csm.ornl.gov/ghpn/wk2003

4) Science Case for Large Scale Simulation, June 2003
   http://www.pnl.gov/scales/

5) Workshop on the Road Map for the Revitalization of High End Computing, June 2003
   http://www.cra.org/Activities/workshops/nitrd
   http://www.sc.doe.gov/ascr/20040510_hecrtf.pdf (public report)

6) ASCR Strategic Planning Workshop, July 2003
   http://www.fp-mcs.anl.gov/ascr-july03spw

7) Planning Workshops – Office of Science Data-Management Strategy, March & May 2004
   http://www-conf.slac.stanford.edu/dmw2004