
Performance and Scalability of xrootd

Andrew Hanushevsky (SLAC), Wilko Kroeger (SLAC), Bill Weeks (SLAC),

Fabrizio Furano (INFN/Padova), Gerardo Ganis (CERN), Jean-Yves Nief (IN2P3), Peter Elmer (U Wisconsin),

Les Cottrell (SLAC), Yee Ting Li (SLAC)

Computing in High Energy Physics

13-17 February 2006

http://xrootd.slac.stanford.edu

xrootd is largely funded by the US Department of Energy Contract DE-AC02-76SF00515 with Stanford University

Outline

Architecture Overview

Performance & Scalability
  Single Server Performance: speed, latency, and bandwidth; resource overhead
  Scalability: server and administrative

Conclusion

xrootd Plugin Architecture

[Diagram: layered plug-in stack]

Protocol Driver (XRD)
Protocol (1 of n) (xrootd)
  authentication (gsi, krb5, etc.)
File System (ofs, sfs, alice, etc.)
  authorization (name based)
Storage System (oss, drm/srm, etc.)
  Clustering (olbd)
  lfn2pfn prefix encoding
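Each of these layers is loaded as a shared-library plug-in selected at configuration time. As a rough, generic illustration of that mechanism (not the actual xrootd plug-in API: the library name, exported symbol, factory signature, and config path below are hypothetical), a module can be attached at run time with dlopen()/dlsym():

```c
#include <dlfcn.h>   /* dlopen, dlsym, dlerror */
#include <stdio.h>

/* Hypothetical factory signature a plug-in might export. */
typedef void *(*ProtocolFactory)(const char *configFile);

int main(void)
{
    /* Library name, symbol, and config path are illustrative only. */
    void *handle = dlopen("libMyProtocol.so", RTLD_NOW);
    if (!handle) {
        fprintf(stderr, "dlopen failed: %s\n", dlerror());
        return 1;
    }

    ProtocolFactory factory = (ProtocolFactory)dlsym(handle, "GetProtocol");
    if (!factory) {
        fprintf(stderr, "dlsym failed: %s\n", dlerror());
        return 1;
    }

    void *protocol = factory("/etc/xrootd.cf");  /* hand the module its configuration */
    (void)protocol;  /* the protocol driver would now dispatch client requests to it */
    return 0;
}
```

Link with -ldl. The point of the design is that the same driver can host any protocol, file system, or storage back end without recompilation.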

Performance Aspects

Speed for large transfers (MB/sec)
  Random vs. sequential; synchronous vs. asynchronous; memory mapped (copy vs. "no-copy", sketched below)

Latency for small transfers (round-trip time per request)

Bandwidth for scalability ("your favorite unit"/sec vs. increasing load)
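The "copy vs. no-copy" item is whether each read is copied from the kernel page cache into a user buffer (read()/pread()) or served straight from memory-mapped pages (mmap()). A minimal sketch of the two access styles, assuming an arbitrary pre-existing test file:

```c
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
    const char *path = "/tmp/testfile";   /* arbitrary test file, assumed non-empty */
    int fd = open(path, O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }

    /* Copy path: pread() moves data from the page cache into a user buffer. */
    char buf[8192];
    if (pread(fd, buf, sizeof(buf), 0) < 0) { perror("pread"); return 1; }

    /* "No-copy" path: mmap() exposes the page cache to the application directly. */
    char *map = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) { perror("mmap"); return 1; }

    printf("first byte: %d (copied) vs %d (mapped)\n", buf[0], map[0]);

    munmap(map, st.st_size);
    close(fd);
    return 0;
}
```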

Raw Speed I (sequential)

[Plot: raw sequential I/O speed, with the disk limit marked]

Test host: Sun V20z, 2 x 1.86GHz Opteron 244, 16GB RAM, Seagate ST373307LC 73GB 10K rpm SCSI

sendfile() anyone? (see the zero-copy sketch below)
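The sendfile() remark points at zero-copy transmission: the server can push file data from the page cache to the client socket without a detour through user space. A minimal Linux sketch, assuming the connected socket and open file descriptor already exist:

```c
#include <stdio.h>
#include <sys/types.h>
#include <sys/sendfile.h>   /* Linux-specific zero-copy file-to-socket transfer */

/* Send 'count' bytes of an open file to an already-connected socket. */
static int send_file(int sock_fd, int file_fd, size_t count)
{
    off_t offset = 0;                 /* the kernel advances this for us */
    while (count > 0) {
        ssize_t sent = sendfile(sock_fd, file_fd, &offset, count);
        if (sent <= 0) {              /* 0 or -1: stop and report */
            perror("sendfile");
            return -1;
        }
        count -= (size_t)sent;
    }
    return 0;
}
```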

Raw Speed II (random I/O)

(file not preloaded)
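The random-I/O figure can be illustrated with a simple microbenchmark (an assumption about the methodology, not the harness actually used for these plots): issue many small reads at random offsets against a file that is not already in the page cache and average the elapsed time.

```c
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    const char *path = argc > 1 ? argv[1] : "/tmp/testfile";   /* arbitrary large file */
    int fd = open(path, O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    char buf[8192];                                  /* 8KB reads, an arbitrary block size */
    off_t blocks = lseek(fd, 0, SEEK_END) / (off_t)sizeof(buf);
    if (blocks < 1) { fprintf(stderr, "file too small\n"); return 1; }

    const int reads = 10000;
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < reads; i++) {
        off_t off = (rand() % blocks) * (off_t)sizeof(buf);   /* random block offset */
        if (pread(fd, buf, sizeof(buf), off) < 0) { perror("pread"); return 1; }
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("average latency: %.1f us per read\n", secs * 1e6 / reads);
    close(fd);
    return 0;
}
```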

Latency Per Request

Event Rate Bandwidth

NetApp FAS270: 1250 dual 650 MHz cpu, 1Gb NIC, 1GB cache, RAID 5 FC 140 GB 10k rpm
Apple Xserve: UltraSparc 3 dual 900MHz cpu, 1Gb NIC, RAID 5 FC 180 GB 7.2k rpm
Sun 280r, Solaris 8, Seagate ST118167FC
Cost factor: 1.45

Latency & Bandwidth

Latency and bandwidth are closely related: inversely proportional when linear scaling is present.

The smaller the overhead, the greater the bandwidth, so the underlying infrastructure (OS and devices) is critical.
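A back-of-the-envelope way to see the inverse relation (the symbols are illustrative, not values taken from the plots): with a fixed number of requests in flight, throughput is concurrency divided by per-request latency, so every microsecond of overhead removed shows up directly as bandwidth.

\[
\text{request rate} \approx \frac{N_{\text{requests in flight}}}{\text{per-request latency}},
\qquad
\text{bandwidth} \approx \text{request size} \times \text{request rate}
\]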

Server Scaling (Capacity vs Load)

I/O Bandwidth (wide area network)

[Diagram: SLAC-to-Seattle and Seattle-to-SLAC traffic carried over ESnet routed and ESnet SDN layer 2 via USN paths]

• SC2005 BW Challenge: latency & bandwidth
• 8 xrootd servers: 4 @ SLAC & 4 @ Seattle; Sun V20z with 10Gb NIC, dual 1.8/2.6GHz Opterons, Linux 2.6.12
• 1,024 parallel clients: 128 per server
• 35Gb/sec peak; higher speeds killed the router; 2 full-duplex 10Gb/s links provided 26.7% of the overall BW
• Overall BW averaged 106Gb/sec over 17 monitored links in total

http://www-iepm.slac.stanford.edu/monitoring/bulk/sc2005/hiperf.html

xrootd Server Scaling

Linear scaling relative to load allows deterministic sizing of a server across disk, NIC, CPU, and memory (see the sketch below).

Performance is tied directly to hardware cost, so the underlying hardware and software are critical.
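A hedged sketch of what deterministic sizing means here (the symbols are generic per-client costs, not figures from the talk): with linear scaling, a server's client capacity is simply the tightest of its four resource limits.

\[
N_{\text{clients}} \;\le\; \min\!\left(
\frac{BW_{\text{disk}}}{bw_{\text{client}}},\;
\frac{BW_{\text{NIC}}}{bw_{\text{client}}},\;
\frac{CPU_{\text{total}}}{cpu_{\text{client}}},\;
\frac{Mem_{\text{total}}}{mem_{\text{client}}}
\right)
\]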

Overhead Distribution

OS Effects

Device & File System Effects

[Plot: CPU-limited vs. I/O-limited regions marked; 1 event = 2K]

UFS is good on small reads; VXFS is good on big reads.

NIC Effects

Super Scaling

xrootd servers can be clustered
  Support for over 256,000 servers per cluster
  Open overhead of 100us * log64(number of servers) (worked through below)

Uniform deployment
  Same software and configuration file everywhere
  No inherent 3rd-party software requirements

Linear administrative scaling
Effective load distribution
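The two numbers above fit together because the cluster is organized in 64-way cells (hence the base-64 logarithm): three levels already cover more than 256,000 servers, so locating a file for open in a full cluster costs roughly 300 microseconds.

\[
64^{3} = 262{,}144 > 256{,}000,
\qquad
t_{\text{open}} \approx 100\,\mu\text{s} \times \log_{64}\!\left(262{,}144\right)
= 100\,\mu\text{s} \times 3 = 300\,\mu\text{s}
\]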

Cluster Data Scattering (usage)

Cluster Data Scattering (utilization)

Low Latency Opportunities

New programming paradigm: ultra-fast access to small random blocks

Accommodate object data: memory I/O instead of CPU to optimize access

Allows superior ad hoc object selection

Structured clustering to scale access to memory

Multi-terabyte memory systems at commodity prices (PetaCache Project)

SCALLA: Structured Cluster Architecture for Low Latency Access

Increased data exploration opportunities

Memory Access Characteristics

[Plots: block size effect on average overall latency per I/O (1 job, 100k I/Os); scaling effect on average overall latency (5-40 client jobs); disk I/O vs. memory I/O]

Conclusion

The system performs far better than we anticipated. Why? Excruciating attention to detail: protocols, algorithms, and implementation.

Effective software collaboration:
  INFN/Padova: Fabrizio Furano, Alvise Dorigo
  ROOT: Fons Rademakers, Gerri Ganis
  Alice: Derek Feichtinger, Guenter Kickinger
  Cornell: Gregory Sharp
  SLAC: Jacek Becla, Tofigh Azemoon, Wilko Kroeger, Bill Weeks
  BaBar: Pete Elmer

Critical operational collaboration: BNL, CNAF, FZK, INFN, IN2P3, RAL, SLAC

Commitment to “the science needs drive the technology”