Performance and Scalability of xrootd
Andrew Hanushevsky (SLAC), Wilko Kroeger (SLAC), Bill Weeks (SLAC),
Fabrizio Furano (INFN/Padova), Gerardo Ganis (CERN), Jean-Yves Nief (IN2P3), Peter Elmer (U Wisconsin),
Les Cottrell (SLAC), Yee Ting Li (SLAC)
Computing in High Energy Physics
13-17 February 2006
http://xrootd.slac.stanford.edu
xrootd is largely funded by the US Department of Energy
Contract DE-AC02-76SF00515 with Stanford University
CHEP 13-17 February 2006 2: http://xrootd.slac.stanford.edu
Outline
Architecture Overview
Performance & Scalability
  Single Server Performance
    Speed, latency, and bandwidth
    Resource overhead
  Scalability
    Server and administrative
Conclusion
xrootd Plugin Architecture
[Diagram components]
Protocol Driver (XRD)
Protocol, 1 of n (xrootd)
File System (ofs, sfs, alice, etc.)
authentication (gsi, krb5, etc.)
authorization (name based)
lfn2pfn (prefix encoding)
Storage System (oss, drm/srm, etc.)
Clustering (olbd)
Performance Aspects
Speed for large transfers
  MB/sec
  Random vs. sequential
  Synchronous vs. asynchronous
  Memory mapped (copy vs. “no-copy”)
Latency for small transfers
  sec round trip time
Bandwidth for scalability
  “your favorite unit”/sec vs. increasing load
Raw Speed I (sequential)
[Plot annotation: Disk Limit]
Test system: Sun V20z, 2 x 1.86GHz Opteron 244, 16GB RAM, Seagate ST373307LC 73GB 10K rpm SCSI
sendfile() anyone?
Raw Speed II (random I/O)
(file not preloaded)
Event Rate Bandwidth
NetApp FAS270: 1250 dual 650 MHz cpu, 1Gb NIC, 1GB cache, RAID 5 FC 140 GB 10k rpm
Apple Xserve: UltraSparc 3 dual 900MHz cpu, 1Gb NIC, RAID 5 FC 180 GB 7.2k rpm
Sun 280r, Solaris 8, Seagate ST118167FC
Cost factor: 1.45
Latency & Bandwidth
Latency & bandwidth are closely related
  Inversely proportional if linear scaling present
  The smaller the overhead the greater the bandwidth
Underlying infrastructure is critical
  OS and devices
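The inverse relationship between latency and bandwidth can be illustrated with a small calculation. This is only a sketch and is not taken from the slides: it assumes each client keeps exactly one request in flight and that per-request latency stays constant as load grows (the "linear scaling" condition). The function name and the numbers are hypothetical.

```python
def requests_per_second(latency_s, clients=1):
    """Aggregate request rate when every client has one request in flight.

    Assumes linear scaling: per-request latency does not grow with load.
    """
    if latency_s <= 0:
        raise ValueError("latency must be positive")
    return clients / latency_s


# Hypothetical latencies, chosen only to show the proportionality.
base = requests_per_second(latency_s=0.001)     # 1 ms round trip
halved = requests_per_second(latency_s=0.0005)  # half the latency
print(base, halved)
```

Under these assumptions, halving the per-request overhead doubles the achievable request rate, which is the sense in which smaller overhead means greater bandwidth.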
I/O Bandwidth (wide area network)
• SC2005 BW Challenge
  • Latency Bandwidth
• 8 xrootd Servers
  • 4@SLAC & 4@Seattle
  • Sun V20z w/ 10Gb NIC
  • Dual 1.8/2.6GHz Opterons
  • Linux 2.6.12
• 1,024 Parallel Clients
  • 128 per server
• 35Gb/sec peak
  • Higher speeds killed router
  • 2 full duplex 10Gb/s links
  • Provided 26.7% overall BW
• BW averaged 106Gb/sec
  • 17 Monitored links total
[Plot labels: ESnet routed; ESnet SDN layer 2 via USN; SLAC to Seattle; Seattle to SLAC; BW Challenge]
http://www-iepm.slac.stanford.edu/monitoring/bulk/sc2005/hiperf.html
xrootd Server Scaling
Linear scaling relative to load
  Allows deterministic sizing of server
    Disk
    NIC
    CPU
    Memory
Performance tied directly to hardware cost
  Underlying hardware & software are critical
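If load scales linearly, sizing a server reduces to finding which of the four resources runs out first. The sketch below shows that reasoning under stated assumptions; it is not the authors' sizing method. The function name, the resource keys, and every capacity and per-client figure are hypothetical and chosen only for illustration.

```python
def max_clients(capacity, per_client):
    """Return (client count, limiting resource) assuming linear scaling.

    Each resource supports capacity / per-client demand clients;
    the smallest of those quotients bounds the whole server.
    """
    limits = {name: capacity[name] / per_client[name] for name in per_client}
    bottleneck = min(limits, key=limits.get)
    return int(limits[bottleneck]), bottleneck


# Hypothetical capacities and per-client demands (not measured values).
capacity = {"disk_MBps": 60.0, "nic_MBps": 120.0, "cpu_pct": 100.0, "memory_MB": 16384.0}
per_client = {"disk_MBps": 0.5, "nic_MBps": 0.5, "cpu_pct": 0.25, "memory_MB": 8.0}
print(max_clients(capacity, per_client))
```

With these made-up figures the disk is the bottleneck, so adding NIC or CPU capacity would not raise the number of clients the server can carry.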
Device & File System Effects
[Plot annotations: CPU limited; I/O limited]
1 Event 2K
UFS good on small reads
VXFS good on big reads
Super Scaling
xrootd servers can be clustered
  Support for over 256,000 servers per cluster
  Open overhead of 100us*log64(number of servers)
Uniform deployment
  Same software and configuration file everywhere
  No inherent 3rd party software requirements
Linear administrative scaling
Effective load distribution
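The open-overhead formula on this slide can be evaluated directly. The sketch below reads the formula literally; the function name and parameter names are hypothetical, and treating 64 as a per-level fanout is an inference from the logarithm base, not something the slide states. Note that 64^3 = 262,144, which is consistent with the "over 256,000 servers" figure.

```python
import math


def open_overhead_us(num_servers, per_level_us=100.0, fanout=64):
    """Evaluate per_level_us * log_fanout(num_servers), in microseconds.

    The base-64 logarithm is computed as log2(n) / log2(64) so that
    exact powers of 64 give whole-number results.
    """
    if num_servers < 1:
        raise ValueError("need at least one server")
    return per_level_us * math.log2(num_servers) / math.log2(fanout)


for n in (64, 64**2, 64**3):
    print(n, "servers:", open_overhead_us(n), "us")
```

Read this way, each 64-fold increase in cluster size adds a constant 100us to the open path, so the cost grows logarithmically rather than linearly with the number of servers.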
Low Latency Opportunities
New programming paradigm
  Ultra-fast access to small random blocks
  Accommodate object data
  Memory I/O instead of CPU to optimize access
  Allows superior ad hoc object selection
Structured clustering to scale access to memory
  Multi-Terabyte memory systems at commodity prices
  PetaCache Project
  SCALLA: Structured Cluster Architecture for Low Latency Access
Increased data exploration opportunities
Memory Access Characteristics
[Plot: Block size effect on average overall latency per I/O (1 job - 100k I/O’s)]
[Plot: Scaling effect on average overall latency clients (5 - 40 jobs)]
[Plot series: Disk I/O; Mem I/O]
Conclusion
System performs far better than we anticipated
Why?
  Excruciating attention to details
    Protocols, algorithms, and implementation
  Effective software collaboration
    INFN/Padova: Fabrizio Furano, Alvise Dorigo
    Root: Fons Rademakers, Gerri Ganis
    Alice: Derek Feichtinger, Guenter Kickinger
    Cornell: Gregory Sharp
    SLAC: Jacek Becla, Tofigh Azemoon, Wilko Kroeger, Bill Weeks
    BaBar: Pete Elmer
  Critical operational collaboration
    BNL, CNAF, FZK, INFN, IN2P3, RAL, SLAC
  Commitment to “the science needs drive the technology”