
1

Some Techniques Used by ESnet to Mitigate Network Emergencies

JET Roadmap Workshop
April 14, 2004
Joe Burrescia
ESnet

2

What Is an Emergency?

An emergency is any unplanned event that can cause deaths or significant injuries to employees, customers or the public; or that can shut down your business, disrupt operations, cause physical or environmental damage, or threaten the facility's financial standing or public image. Obviously, numerous events can be "emergencies," including:

– Fire
– Hazardous materials incident
– Flood or flash flood
– Hurricane
– Tornado
– Winter storm
– Earthquake
– Communications failure
– Radiological accident
– Civil disturbance
– Loss of key supplier or customer
– Explosion

From FEMA Emergency Management Guide For Business & Industry

3

What Are Some Network Emergencies That Can Be Anticipated?

• Primary NOC incapacitated
• Massive DoS attacks against sites
• Disruption of backbone routers
• Self-recoverable communication outages
• Communication outages requiring external assistance

4

ESnet Connects DOE Facilities and Collaborators

[Figure: map of the ESnet IP network (labels: ESnet IP, QWEST ATM) connecting 42 end user sites, hubs, peering points, and international networks. Only the labels and legend below are recoverable.]

Hubs: SEA HUB, SNV HUB, ELP HUB, ALB HUB, CHI HUB, ATL HUB, DC HUB, AOA HUB

Site labels: TWC, JGI, SNLL, LBNL, SLAC, YUCCA MT, BECHTEL, PNNL, LIGO, INEEL, LANL, SNLA, KCP-Kirkland, PANTEX, KCP, NOAA, OSTI, ORAU, SRS, ORNL, JLAB, PPPL, ANL-DC, INEEL-DC, ORAU-DC, LLNL/LANL-DC, MIT, ANL, BNL, FNAL, AMES, 4xLAB-DC, NERSC, NREL, LLNL, GA, Lamont, SDSC, Japan, GTN&NNSA, DOE-ALB

Link types: International (high speed); OC192 (10 Gb/s optical); OC48 (2.5 Gb/s optical); Gigabit Ethernet (1 Gb/s); OC12 ATM (622 Mb/s); OC12; OC3 (155 Mb/s); T3 (45 Mb/s); T1-T3; T1 (1 Mb/s)

Site sponsorship: Office of Science Sponsored (22); NNSA Sponsored (12); Joint Sponsored (3); Other Sponsored (NSF LIGO, NOAA); Laboratory Sponsored (6)

Peering points: MAE-E, MAE-W, Starlight, Chi NAP, Fix-W, PAIX-W, PAIX-E, NY-NAP, PNWG, Equinix, Abilene (one connection marked future)

International connections: GEANT (Germany, France, Italy, UK, etc.); Sinet (Japan); Japan – Russia (BINP); CA*net4; CERN; MREN; Netherlands; Russia; StarTap; Taiwan (ASCC); KDDI (Japan); France; Switzerland; Taiwan (TANet2); Australia; Singaren

5

What Are Some Network Emergencies That Can Be Anticipated?

• Primary NOC incapacitated
    – Multiply distributed NOC hardware
    – Geographically dispersed personnel
• Massive DoS attacks against sites
• Failures of physical circuits
• Self-recoverable communication outages
• Communication outages requiring external assistance

6

The Visibility Problem

[Figure: the ESnet primary NOC]

7

Distributed NOC Functionalities

– Load sharing, always online
    • DNS
    • Network Monitoring System (Aprisma’s Spectrum)
    • Mail
    • Trouble Ticket System
    • Main Web Server
– Backed-up services, offline until the primary is unavailable
    • Engineering private web server
        – Router RANCID database
        – Circuit database
        – Contact database
        – DNS backend database
    • PKI server
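The two service classes above can be sketched as a small selection rule. This is a minimal illustration, not ESnet's tooling: the site names, the `Service` type, and the idea of passing in a set of healthy sites are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Service:
    name: str
    mode: str        # "load_shared" (always online) or "standby" (offline until needed)
    replicas: list   # site names in priority order (hypothetical names)


def active_replicas(service, healthy_sites):
    """Return the sites that should currently be serving this service."""
    up = [site for site in service.replicas if site in healthy_sites]
    if service.mode == "load_shared":
        return up          # every healthy replica answers requests
    return up[:1]          # standby: only the first healthy replica is brought online


# Hypothetical deployment: one load-shared and one backed-up service.
dns = Service("DNS", "load_shared", ["primary-noc", "remote-a", "remote-b"])
pki = Service("PKI server", "standby", ["primary-noc", "remote-a"])
```

With the primary NOC healthy, `pki` is served only from `primary-noc`; once the primary drops out of `healthy_sites`, the same call returns `remote-a`, while `dns` simply keeps running on the remaining replicas.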

8

Geographically Distributed Staff

• 1 primary location
    – Berkeley Lab
• 3 remote locations
    – Livermore Telework Center
    – Ames Lab in Iowa
    – Brookhaven National Lab in New York

9

Dispersed Network Resources

[Figure: map of the ESnet hubs (SEA HUB, SNV HUB, ELP HUB, ALB HUB, CHI HUB, ATL HUB, DC HUB, NYC HUBS) and the staffed locations LBNL, TWC, AMES, BNL, and PPPL, with the callouts below.]

• LBNL: engineers, 24x7 Network Operations Center, generator-backed power
    – Spectrum (network management system)
    – DNS (name to IP address translation)
    – Engineering databases
    – Public and private Web
    – E-mail (server and archive)
    – PKI certificate repository and revocation lists
    – Collaboratory authorization service
• Remote locations marked on the map: remote engineers, partial duplicate infrastructure, DNS
• Duplicate infrastructure: currently deploying full replication of the NOC databases and servers and Science Services databases in the NYC Qwest carrier hub

10

What Are Some Network Emergencies That Can Be Anticipated?

• Primary NOC incapacitated
• Massive DoS attacks against sites
    – Phased Security Architecture
• Disruption of backbone routers
• Self-recoverable communication outages
• Communication outages requiring external assistance

11

Maintaining Science Mission Critical Infrastructure in the Face of Cyberattack

• A Phased Security Architecture is being implemented to protect the network and the ESnet sites
• The phased response ranges from blocking certain site traffic to complete isolation of the network, which allows the sites to continue communicating among themselves in the face of the most virulent attacks
    – Separates ESnet core routing functionality from external Internet connections by means of a “peering” router that can have a policy different from the core routers
    – Provides a rate-limited path to the external Internet that will ensure site-to-site communication during an external denial of service attack
    – Provides “lifeline” connectivity with the external Internet for downloading patches, exchanging e-mail, and viewing web pages (e.g., e-mail, DNS, HTTP, HTTPS, SSH) prior to full isolation of the network
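The escalation described above can be sketched as a per-phase admission rule. This is a simplified model under stated assumptions, not the deployed router policy: the `Phase` names, the `permit` function, and the use of a source block list to stand in for the early filtering responses are all hypothetical. Only the lifeline service list is taken from the slide.

```python
from enum import IntEnum

# Lifeline services named on the slide (e-mail is modeled here as "smtp").
LIFELINE_SERVICES = {"smtp", "dns", "http", "https", "ssh"}


class Phase(IntEnum):
    NORMAL = 0
    FILTERED = 1   # targeted filters; modeled as a source block list (assumption)
    LIFELINE = 2   # main peering paths shut down, limited lifeline paths only
    ISOLATED = 3   # full isolation from the external Internet


def permit(phase, src_internal, dst_internal, service,
           src=None, blocked_sources=frozenset()):
    """Decide whether a flow is allowed in the given response phase."""
    if src_internal and dst_internal:
        return True                          # site-to-site traffic survives every phase
    if phase == Phase.ISOLATED:
        return False                         # nothing crosses the external boundary
    if phase == Phase.LIFELINE:
        return service in LIFELINE_SERVICES  # only lifeline services cross
    if phase == Phase.FILTERED and src in blocked_sources:
        return False                         # drop identified attack sources
    return True
```

The first check is the point of the design: whatever the phase, traffic between ESnet sites is never subject to the external restrictions. Rate limiting of the lifeline path is left out of this sketch.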

12

Phased Security Topology Thumbnail

13

Phased Response to Cyberattack

[Figure: diagram of Lab gateway routers (LBNL shown), ESnet routers, border routers, and peering routers, with X marks showing where attack traffic is blocked at each response.]

• Lab first response: filter incoming traffic at their ESnet gateway router
• ESnet first response: filters to assist a site
• ESnet second response: filter traffic from outside of ESnet
• ESnet third response: shut down the main peering paths and provide only limited bandwidth paths for specific “lifeline” services

14

Gartner WeeklyFLASH, 06 February 2004

*EVENT:* On 1 March 2004, MCI announced that it is now offering a denial-of-service (DoS) service-level agreement (SLA) designed to help customers defend against Internet attacks. The new SLA - believed to be the first offered by an ISP - guarantees that all MCI Internet customers will have immediate access to MCI's security staff, and that MCI Customer Support will respond to a suspected DoS attack within 15 minutes of a customer-generated trouble ticket. If MCI fails to meet the terms of the SLA, the customer will, at its request, be credited one day's prorated MCI charges for the affected service. The new SLA applies across all MCI IP services, and the performance guarantees are automatically extended to all customers at no additional cost.

15

What Are Some Network Emergencies That Can Be Anticipated?

• Primary NOC incapacitated
• Massive DoS attacks against sites
• Disruption of backbone routers
    – Additional router hardening
• Self-recoverable communication outages
• Communication outages requiring external assistance

16

Additional Router Hardening

• Protocols
    – Considering using authenticated protocols
        • IGP
            – ISIS
                » Side benefit of not running over IP
            – OSPFv2
            – iBGP with MD5 hash
        • EGP
            – eBGP with MD5 hash

17

Additional Router Hardening

• Limit accepted routes
    – Sites
        • Specific ACLs
    – Peers
        • Specific ACLs where possible
        • “Bogon” ACLs otherwise
• Limit accepted packets
    – Peers
        • Reverse Path Forwarding (RPF) checks
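The two peer-facing checks above can be illustrated in a few lines. This is a sketch only: the bogon list is deliberately abbreviated, the interface names and prefixes are hypothetical, and real routers implement both checks in their own configuration language rather than in Python.

```python
import ipaddress

# Abbreviated, illustrative bogon list (private, loopback, link-local,
# multicast and reserved space); a production list is longer and changes over time.
BOGONS = [ipaddress.ip_network(n) for n in (
    "0.0.0.0/8", "10.0.0.0/8", "127.0.0.0/8", "169.254.0.0/16",
    "172.16.0.0/12", "192.168.0.0/16", "224.0.0.0/3",
)]


def is_bogon(prefix):
    """True if an announced IPv4 prefix falls inside a bogon range."""
    net = ipaddress.ip_network(prefix)
    return any(net.subnet_of(bogon) for bogon in BOGONS)


def rpf_accept(src, in_iface, routes):
    """Strict RPF: accept a packet only if the longest-prefix route back
    to its source address points out the interface it arrived on."""
    addr = ipaddress.ip_address(src)
    matches = []
    for prefix, iface in routes:
        net = ipaddress.ip_network(prefix)
        if addr in net:
            matches.append((net.prefixlen, iface))
    if not matches:
        return False                 # no route back to the source: drop
    _, best_iface = max(matches)     # longest prefix wins
    return best_iface == in_iface


# Hypothetical routing table: (prefix, outgoing interface).
ROUTES = [("198.51.100.0/24", "peer0"), ("0.0.0.0/0", "upstream0")]
```

Here a packet claiming a source in `198.51.100.0/24` is accepted on `peer0` but rejected on `upstream0`, because the best route back to that source does not use `upstream0`.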

18

Additional Router Hardening

• Router configurations verified daily
    – Downloaded from routers and compared to known good versions
• Verify router OS (Juniper)
    – Get a baseline MD5 checksum on critical system files for each router. Then generate an MD5 checksum on the actual files on the router and compare.
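Both verification steps above reduce to simple comparisons once the data has been fetched. The sketch below assumes the configuration text and file contents have already been downloaded by some other means; the function names and the file path in the usage note are hypothetical, and MD5 is used only because it is the checksum the slide names.

```python
import difflib
import hashlib


def config_drift(known_good, downloaded):
    """Return unified-diff lines between two configs; an empty list means no drift."""
    return list(difflib.unified_diff(
        known_good.splitlines(), downloaded.splitlines(),
        fromfile="known_good", tofile="downloaded", lineterm="",
    ))


def md5_hex(data):
    """MD5 checksum of a file's bytes, as a hex string."""
    return hashlib.md5(data).hexdigest()


def changed_files(baseline, current):
    """Compare two {path: md5 hex digest} maps and return the sorted paths
    whose checksum differs, or that are missing from either side."""
    paths = set(baseline) | set(current)
    return sorted(p for p in paths if baseline.get(p) != current.get(p))
```

A daily job might record `{"/hypothetical/path/kernel": md5_hex(contents)}` per router as the baseline, then alert whenever `changed_files` or `config_drift` returns a non-empty list.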

19

Router Hardening Resources

• http://nsa2.www.conxion.com
• http://www.cymru.com/Documents/secure-ios-template.html
• http://www.qorbit.net/documents/junos-template.pdf
• http://www.cymru.com/Documents/secure-bgp-template.html

20

What Are Some Network Emergencies That Can Be Anticipated?

• Primary NOC incapacitated
• Massive DoS attacks against sites
• Disruption of backbone routers
• Self-recoverable communication outages
    – Redundant topology
• Communication outages requiring external assistance

21

New ESnet Architecture – Chicago MAN as Example

[Figure labels: ANL, FNAL, site gateway router, site equip., Site LAN, monitor, Qwest hub, vendor neutral telecom facility, ESnet core, ESnet production IP service, T320, StarLight, other international peerings, CERN (DOE funded link).]

• Current approach of point-to-point tail circuits from hub to site
• All interconnects from the sites back to the core ring are high bandwidth and have full module redundancy
• No single point failure can disrupt
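The "no single point of failure" property can be checked mechanically on a topology graph by removing each node and each link in turn and testing whether the rest stays connected. The sketch below uses two invented topologies (a four-node ring and a hub with tail circuits); the node names are hypothetical and do not reproduce the Chicago MAN diagram.

```python
def connected(nodes, links):
    """True if every node can reach every other node over the given links."""
    nodes = set(nodes)
    if not nodes:
        return True
    adjacency = {n: set() for n in nodes}
    for a, b in links:
        if a in nodes and b in nodes:   # ignore links to removed nodes
            adjacency[a].add(b)
            adjacency[b].add(a)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:                        # depth-first traversal
        for neighbor in adjacency[stack.pop()]:
            if neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return seen == nodes


def single_points_of_failure(nodes, links):
    """List every node or link whose loss disconnects the remaining network."""
    failures = []
    for node in nodes:
        if not connected([n for n in nodes if n != node], links):
            failures.append(("node", node))
    for link in links:
        if not connected(nodes, [l for l in links if l != link]):
            failures.append(("link", link))
    return failures


# Hypothetical ring: two hubs and two sites, each site attached to both hubs.
RING_NODES = ["hub-1", "site-a", "hub-2", "site-b"]
RING_LINKS = [("hub-1", "site-a"), ("site-a", "hub-2"),
              ("hub-2", "site-b"), ("site-b", "hub-1")]

# Hypothetical tail-circuit layout: each site hangs off one hub.
TAIL_NODES = ["hub", "site-a", "site-b"]
TAIL_LINKS = [("hub", "site-a"), ("hub", "site-b")]
```

The ring reports no single points of failure, while the tail-circuit layout reports the hub and both tail links, which is the contrast the slide draws between the current and new approaches.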

22

Production IP: Long-Term ESnet Connectivity Goal

• Connecting MANs with two cores to ensure against hub failure (for example, NLR is shown as the second core below)

[Figure: map of two cores (ESnet/Qwest and NLR) through hubs SEA, SNV, SDG, DEN, ELP, ALB, CHI, ATL, DC, and NYC. Legend: MANs; high-speed cross connects with Internet2/Abilene; major DOE Office of Science sites. Other labels: ORNL, Japan, AsiaPac, CERN/Europe, Europe.]

23

What Are Some Network Emergencies That Can Be Anticipated?

• Primary NOC incapacitated
• Massive DoS attacks against sites
• Disruption of backbone routers
• Self-recoverable communication outages
• Communication outages requiring external assistance
    – Networks helping networks

24

Network Cooperation During Emergencies

• Layer 1
    – Make use of (take over?) another network’s “spare” physical links
• Layer 2
    – “Tunnel” across another network’s infrastructure
• Layer 3
    – (Pre)configure backup route announcements

25

Layer 1: Using a “Spare” Circuit

26

Layer 2: “Tunneling” Across Another Network

[Figure labels: DISCOM Network, PVC]

27

Layer 3: Preconfigured Backup Routing

Agnès Pouélé

ESnet announces ESnet and Abilene routes to GTREN

Abilene announces Abilene and ESnet routes to GEANT
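The effect of these preconfigured announcements can be sketched with a toy path-selection rule. Real BGP best-path selection has many more steps; the sketch below assumes that only AS-path length matters and uses network names in place of AS numbers, both simplifications made for illustration.

```python
def best_path(heard_paths, origin):
    """Pick the shortest announced path that ends at the origin network.

    heard_paths: list of paths, each a list of network names from the
    announcing neighbor to the origin. Returns None if no path is heard.
    """
    candidates = [p for p in heard_paths if p and p[-1] == origin]
    return min(candidates, key=len) if candidates else None


# What a European peer might hear for ESnet prefixes in normal operation:
# the direct announcement plus the backup relayed by Abilene.
normal = [["ESnet"], ["Abilene", "ESnet"]]

# After the direct ESnet link fails, only the backup announcement remains.
after_failure = [["Abilene", "ESnet"]]
```

Because the backup announcement is already in place, the longer path through Abilene is selected as soon as the direct one is withdrawn, with no configuration change needed during the emergency.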