Networking
Shawn McKee
University of Michigan
DOE/NSF Review November 29, 2001
Why Networking?

• Since the early 1980's, physicists have depended upon leading-edge networks to enable ever larger international collaborations.
• Major HEP collaborations, such as ATLAS, require rapid access to event samples from massive data stores, not all of which can be locally stored at each computational site.
• Evolving integrated applications, i.e. Data Grids, rely on seamless, transparent operation of the underlying LANs and WANs.
• Networks are among the most basic Grid building blocks.
Hierarchical Computing Model

[Diagram: the tiered LHC computing model]
• Tier 0 (+1), CERN Computer Centre (~25 TIPS): the Online System feeds the Offline Farm at ~100 MBytes/sec; raw data arrives at ~PByte/sec
• Tier 1: national centers (BNL Center, France, Italy, UK), linked to CERN at ~2.5 Gbits/sec
• Tier 2: regional centers, linked to Tier 1s at ~2.5 Gbps
• Tier 3: institutes (~0.25 TIPS each)
• Tier 4: physicists' workstations, reading from a physics data cache at 100 - 1000 Mbits/sec
• CERN/Outside resource ratio ~1:2; Tier0 : (sum of Tier1s) : (sum of Tier2s) ~1:1:1
• Physicists work on analysis "channels"; each institute has ~10 physicists working on one or more channels
MONARC Simulations
• MONARC (Models of Networked Analysis at Regional Centres) has simulated Tier 0/ Tier 1/Tier 2 data processing for ATLAS.
• Networking implications: Tier 1 centers require ~ 140 Mbytes/sec to Tier 0 and ~200 Mbytes/sec to (each?) other Tier 1s, based upon 1/3 of ESD stored at each Tier 1.
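For orientation, these sustained rates translate into link speeds via simple unit arithmetic (the 140 and 200 MBytes/sec figures are the MONARC estimates above; the comparison to ~2.5 Gbps links reflects the computing-model diagram):

```python
# Convert MONARC's sustained-rate estimates (MBytes/sec) into link
# bandwidth (Gbits/sec) for comparison with the ~2.5 Gbps WAN links.
def mbytes_per_sec_to_gbps(rate_mb: float) -> float:
    return rate_mb * 8 / 1000  # 8 bits per byte, 1000 Mbits per Gbit

tier1_to_tier0 = mbytes_per_sec_to_gbps(140)  # ~1.12 Gbps sustained
tier1_to_tier1 = mbytes_per_sec_to_gbps(200)  # 1.6 Gbps sustained
print(tier1_to_tier0, tier1_to_tier1)
```

Both figures are a large fraction of a ~2.5 Gbps link, before any allowance for protocol overhead or competing traffic.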
TCP WAN Performance
Mathis et al. (Computer Communications Review, v27, 3, July 1997) demonstrated the dependence of achievable TCP bandwidth on network parameters:

    BW < 0.7 × MSS / (RTT × √PkLoss)

BW – Bandwidth
MSS – Max. Segment Size
RTT – Round Trip Time
PkLoss – Packet loss rate

To get 90 Mbps via TCP/IP on a WAN link from LBL to IU (~70 ms RTT), you need a packet loss rate < 1.8e-6!
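The quoted loss bound can be reproduced by inverting the Mathis formula for PkLoss; a small sketch, assuming a 1500-byte segment size:

```python
def max_loss_rate(bw_bps: float, mss_bytes: int, rtt_s: float,
                  c: float = 0.7) -> float:
    """Invert BW = c * MSS / (RTT * sqrt(PkLoss)) to find the largest
    tolerable packet loss rate for a target bandwidth."""
    mss_bits = mss_bytes * 8
    return (c * mss_bits / (rtt_s * bw_bps)) ** 2

# LBL -> IU example from the slide: 90 Mbps target over ~70 ms RTT.
p = max_loss_rate(90e6, 1500, 0.070)
print(f"required packet loss rate < {p:.1e}")  # ~1.8e-06
```

Note how unforgiving the square is: halving the tolerable loss rate only buys a ~40% bandwidth increase at fixed MSS and RTT.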
Network Monitoring: Iperf
• We have set up testbed network monitoring using Iperf (v1.2) (S. McKee (UMich), D. Yu (BNL))
• We test both UDP (90 Mbps sending) and TCP between all combinations of our 8 testbed sites.
• Globus is used to initiate both the client and server Iperf processes.
(http://atgrid.physics.lsa.umich.edu/~cricket/cricket/grapher.cgi)
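A sketch of the kind of all-pairs test driver this implies; the hostnames are placeholders, and building the command list stands in for the Globus machinery that actually launches the Iperf client and server on the testbed:

```python
import itertools

# Placeholder hostnames standing in for the 8 testbed sites.
SITES = ["anl.example.org", "bnl.example.org", "umich.example.org"]

def udp_test_cmd(server: str) -> list:
    # Iperf client command: UDP mode, 90 Mbps send rate, 10-second run.
    return ["iperf", "-c", server, "-u", "-b", "90M", "-t", "10"]

# Every ordered pair of sites gets tested, as described above.
pairs = list(itertools.permutations(SITES, 2))
for client, server in pairs:
    print(f"{client} -> {server}: {' '.join(udp_test_cmd(server))}")
    # On the real testbed, this command runs on `client` (with an
    # `iperf -s` server on `server`), both started via Globus.
```

TCP runs would drop the `-u`/`-b` flags and instead set the window size (`-w`) to match the table of per-site TCP windows below.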
USATLAS Grid Testbed

[Map: the eight testbed sites and their wide-area connectivity]
• Sites: UC Berkeley / LBNL-NERSC, Argonne National Laboratory, Brookhaven National Laboratory, Indiana University, Boston University, U Michigan, University of Texas at Arlington, University of Oklahoma
• Networks: ESnet, Abilene, CalREN, MREN, NTON, NPACI
• The map also marks the HPSS sites and the prototype Tier 2 centers
Testbed Network Measurements
Site  UDP (Mbps)  TCP (Mbps)  PkLoss (%)*  Jitter (ms)  TCP Wind, Bottleneck
ANL   65.4/81.3   17.7/20.9   0.24/0.03    1.1/0.1      2 M, 100
BNL   66.4/83.5   10.5/13.6   0.51/0.19    1.7/0.5      4 M, 100
BU    63.4/78.6   10.8/13.4   0.70/0.25    2.4/1.27     128 K, 100
IU    35.8/40.3   26.7/35.0   0.31/0.048   0.9/0.55     2 M, 45
LBL   70.4/88.4   15.7/20.8   0.16/0.014   1.6/0.7      2 M, 100
OU    72.1/90.8   21.5/27.8   0.89/0.020   1.7/0.4      2 M, 100
UM    69.7/87.3   27.5/36.0   0.26/0.018   1.8/0.6      2 M, 100
UTA   9.5         3.8         0.57         1.3          128 K, 10
Networking Requirements
USATLAS needs more than simply adequate network bandwidth. We need:
–A set of local, regional, national and international networks able to interoperate transparently, without bottlenecks.
–Application software that works together with the network to provide high throughput and bandwidth management.
–A suite of high-level collaborative tools that will enable effective data analysis between internationally distributed collaborators.
The ability of USATLAS to effectively participate at the LHC is closely tied to our underlying networking infrastructure!
Networking as a Common Project

• A new Internet2 working group has formed from the LHC Common Projects initiative: HENP (High Energy/Nuclear Physics), co-chaired by Harvey Newman (CMS) and Shawn McKee (ATLAS).
• The initial meeting was hosted by IU in June; the kick-off meeting was held in Ann Arbor on October 26th.
• The issues this group is focusing on are the same ones that USATLAS networking needs to address.
• USATLAS gains the advantage of a greater resource pool dedicated to solving network problems, a "louder" voice in standards setting, and a better chance to realize necessary networking changes.
Network Coupling to Software

• Our software and computing model will evolve as our network evolves…both are coupled.
• Very different computing models result from different assumptions about the capabilities of the underlying network (Distributed vs. Local).
• We must be careful to keep our software "network aware" while we work to ensure our networks will meet the needs of the computing model.
Achieving High Performance Networking

• Server and client CPU, I/O and NIC throughput must be sufficient
  • Must consider firmware, hard disk interfaces, bus type/capacity
  • Knowledge base of hardware: performance, tuning issues, examples
• TCP/IP stack configuration and tuning is absolutely required
  • Large windows, multiple streams
• No local infrastructure bottlenecks
  • Gigabit Ethernet "clear path" between selected host pairs
  • To 10 Gbps Ethernet by ~2003
• Careful router/switch configuration and monitoring
  • Enough router "horsepower" (CPUs, buffer size, backplane BW)
• Packet loss must be ~zero (well below 0.1%)
  • i.e. no "commodity" networks (need ESnet, I2 type networks)
• End-to-end monitoring and tracking of performance
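To illustrate the "large windows" point: the TCP window must cover the bandwidth-delay product (BDP) of the path to keep it full. A minimal Python sketch, reusing the 90 Mbps / ~70 ms LBL-IU figures from the Mathis slide:

```python
import socket

# The TCP window must be at least the bandwidth-delay product
# to keep a long fat pipe full: BDP = bandwidth * RTT.
bw_bps = 90e6   # 90 Mbps path (example figure from the Mathis slide)
rtt_s = 0.070   # ~70 ms round trip
bdp_bytes = int(bw_bps * rtt_s / 8)  # 787,500 bytes, ~770 KB

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Request send/receive buffers at least the size of the BDP; the OS
# may clamp these to its configured per-socket maximums, which is
# why stack tuning on both endpoints is required.
sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, bdp_bytes)
sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, bdp_bytes)
print(f"target window (BDP): {bdp_bytes} bytes")
sock.close()
```

The default window on many systems is far below this, which is one reason multiple parallel streams are used as a workaround.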
Local Networking Infrastructure
• LANs used to lead WANs in performance, capabilities and stability, but this is no longer true.
• WANs are deploying 10 Gigabit technology compared with 1 Gigabit on leading edge LANs.
• New protocols and services are appearing on backbones (DiffServ, IPv6, multicast) (ESnet, I2).
• Ensuring our ATLAS institutions have the required LOCAL level of networking infrastructure to effectively participate in ATLAS is a major challenge.
Estimating Site Costs

Site Costs                OC3 (155 Mbps)          OC12 (622 Mbps)       OC48 (2.4 Gbps)
Fiber/campus backbone     I2 req. (Sup. Gig)      I2 req. (Sup. Gig)    I2 req. (Sup. Gig)
Network interface         $100/conn. (Fast Eth.)  $1K/conn. (Gigabit)   $1K/conn. (Gigabit)
Routers                   $15-30K                 $40-80K               $60-120K
Telecom service provider  Variable (~$12K/y)      Variable (~$20K/y)    Variable (~$50K/y)
Network connection fee    $110K                   $270K                 $430K
Network Planning for US ATLAS Tier 2 Facilities, R. Gardner, G. Bernbom (IU)
Networking Plan of Attack

• Refine our requirements for the network
• Survey existing work and standards
• Estimate likely developments in networking and their timescales
• Focus on gaps between expectations and needs
• Adapt existing work for US ATLAS
• Provide clear, compelling cases to funding agencies about the critical importance of the network
Network Efforts

• Survey of current/future network related efforts
• Determine and document US ATLAS network requirements
• Problem isolation (finger-pointing tools)
• Protocols (achieving high bandwidth and reliable connections)
• Network testbed (implementation, Grid testbed upgrades)
• Services (QoS, multicast, encryption, security)
• Network configuration examples and recommendations
• End-to-end knowledge base
• Monitoring for both prediction and fault detection
• Liaison to network related efforts and funding agencies
Network Related FTEs/Costs

[Chart: US ATLAS networking FTEs (left axis, 0 - 4.5) and costs in K$ (right axis, 0 - 250) by year, 2002 - 2006; series: FTE(tot), FTE(need), K$. Total FTEs by year: 2.75, 4, 4, 4, 3.25]
Network related efforts to leverage and adapt existing efforts for ATLAS
Support for Networking?

• Traditionally, DOE and NSF have provided university networking support indirectly, through the overhead charged to grant recipients.
• National labs have network infrastructure provided by DOE, but not at the level we are finding we require.
• Unlike networking, computing for HEP has never been considered as simply infrastructure.
• The Grid is blurring the boundaries of computing, and the network is taking on a much more significant, fundamental role in HEP computing.
• It will be necessary for funding agencies to recognize the fundamental role the network plays in our computing model and to support it directly.
What Can We Conclude?

• Networks will be vital to the success of our USATLAS efforts.
• Network technologies and services are evolving, requiring us to test and develop with current networks while planning for the future.
• We must raise and maintain awareness of networking issues for our collaborators, network providers and funding agencies.
• We must clearly present network issues to the funding agencies to get the required support.
• We need to determine what deficiencies exist in network infrastructure, services and support, and work to ensure those gaps are closed before they adversely impact our program.
References

• US ATLAS Facilities Plan – http://www.usatlas.bnl.gov/computing/mgmt/dit/
• MONARC – http://monarc.web.cern.ch/MONARC/
• HENP Working Group – http://www.usatlas.bnl.gov/computing/mgmt/lhccp/henpnet/
• Iperf monitoring page – http://atgrid.physics.lsa.umich.edu/~cricket/cricket/grapher.cgi
Network FTE Breakdown

                   2002       2003       2004       2005       2006
Survey             0.25       0.25       0.25       0.25       0.25
Requirements       0.5/0.25   0.5/0.25   0.25       0.25       0.25
Protocols          0.25       0.25       0.25       0.25
Services           0.25       0.5/0.25   0.75/0.25  0.5/0.25   0.5/0.25
Configuration      0.25       0.5/0.25   0.5/0.25   0.5/0.25   0.5/0.25
Testbed            0.25/0.25  0.5        0.5        0.5
Monitoring         0.25/0.25  0.25/0.25  0.25/0.25  0.5/0.25   0.5/0.25
End-to-End KB      0.25       0.5/0.25   0.5/0.5    0.5/0.5    0.5/0.5
Problem Isolation  0.25       0.5        0.5        0.5        0.5
Liaison            0.25/0.25  0.25/0.25  0.25/0.25  0.25/0.25  0.25/0.25
Network K$ Breakdown

                   2002  2003  2004  2005  2006
Survey                4     4     4     2     2
Requirements          4     4     4     4     4
Protocols             5     5    10     5
Services              5    15    20    20    40
Configuration         5    10    10    15    20
Testbed              60    75   120    30
Monitoring           12    12    12    25    25
End-to-End KB        10    20    20    20    20
Problem Isolation     4     5     6     8     6
Liaison               7     7     8     8     8