Xrootd, Andrew Hanushevsky, Stanford Linear Accelerator Center, 30-May-03.
May 30, 2003 2: xrootd
Goals
High Performance File-Based Access
  Scalable, extensible, usable
Fault tolerance
  Server failures handled in a natural way
  Servers may be dynamically added and removed
Flexible Security
  Allowing use of almost any protocol
Rootd Compatibility
May 30, 2003 3: xrootd
Achieving High Performance
Scalable request/response protocol
Multi-threaded multi-process architecture
Architecture sensitive polling
MRU scheduling
Sticky sockets
Adaptive reconfiguration
Versatile sfs layer (based on proven oofs)
May 30, 2003 4: xrootd
Scalable Protocol I
Connection multiplexing
  One connection per client/host
  Multiple logically independent streams
Request redirection supported
  Similar to http redirection
  Supports dynamic load balancing and fail-over
Uses an intentional request header
  Can better optimize request processing
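
A minimal sketch of how several logically independent streams could share one connection, assuming every request and response carries a stream identifier. The `Multiplexer` class and the `stream_id` field are illustrative names, not the xrootd wire format.

```python
from collections import defaultdict


class Multiplexer:
    """Routes responses arriving on one connection to per-stream inboxes."""

    def __init__(self):
        self._next_id = 0
        self._inbox = defaultdict(list)  # stream_id -> payloads received

    def open_stream(self):
        # Each logical stream gets its own identifier on the shared connection.
        self._next_id += 1
        return self._next_id

    def frame(self, stream_id, payload):
        # Tag the message so it can be matched to its stream (assumed layout).
        return {"stream_id": stream_id, "payload": payload}

    def deliver(self, response):
        # Responses may arrive in any order; the tag decides where they go.
        self._inbox[response["stream_id"]].append(response["payload"])

    def received(self, stream_id):
        return list(self._inbox[stream_id])


mux = Multiplexer()
a, b = mux.open_stream(), mux.open_stream()
# Simulate the server answering stream b before stream a.
mux.deliver(mux.frame(b, "data for b"))
mux.deliver(mux.frame(a, "data for a"))
```

Each stream sees only its own data even though the responses arrived out of order on the single connection.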
May 30, 2003 5: xrootd
Scalable Protocol II
Asynchronous mode allowed
  Multiple processing-order-independent requests
  Optional application-directed pre-read
I/O segmenting
  Able to naturally deal with very large transfers
  Better use of server resources
Request deferral
  Client waits for resources without using server resources
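
I/O segmenting can be pictured as splitting one large transfer into bounded pieces so that no single request ties up a large server buffer. The sketch below is only an illustration; the `segments` helper and the segment size are assumptions, not values from xrootd.

```python
def segments(offset, length, max_segment):
    """Yield (offset, length) pairs that cover the request in bounded pieces."""
    if max_segment <= 0:
        raise ValueError("max_segment must be positive")
    end = offset + length
    while offset < end:
        size = min(max_segment, end - offset)
        yield offset, size
        offset += size


# A 10-byte read served in pieces of at most 4 bytes (sizes chosen for illustration).
plan = list(segments(offset=0, length=10, max_segment=4))
```

The last segment is simply shorter than the others, so the pieces always add up to the original request.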
May 30, 2003 6: xrootd
Scalable Protocol III
Unsolicited Reverse Request Mode
  Allows server to manage client for recovery
  Asynchronous redirect, deferral, and messages
Protocol may be compatibly extended
  Mechanism to send opaque information
  Accommodate things that were “forgotten”
    Messaging interface
    Cache group
    Request priority
    And so on…
May 30, 2003 7: xrootd
MT/MP Architecture
Normally one multi-threaded server per host
  Should be able to utilize available resources
  Easy to administer
Optionally, multiple servers per host
  Fully utilize large machines
May 30, 2003 8: xrootd
Architecture Sensitive Polling
All POSIX systems support poll()
  Used by default
  Not always an efficient I/O “interrupt” mechanism
Alternate polling mechanisms allowed
  /dev/poll
    Available on Solaris and patched RH Linux
    Up to an order of magnitude reduction in CPU
  Essential to reduce latency
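
The idea of choosing the polling mechanism per platform can be illustrated with Python's standard `selectors` module, which picks the most efficient mechanism available and falls back to poll or select. This is an analogy in another language, not xrootd code; the connection label `client-1` is hypothetical.

```python
import selectors
import socket

# DefaultSelector chooses the best mechanism the platform offers
# (for example epoll, /dev/poll, or kqueue) and otherwise uses poll or select.
selector = selectors.DefaultSelector()
mechanism = type(selector).__name__

# A connected socket pair stands in for a client connection.
server_side, client_side = socket.socketpair()
selector.register(server_side, selectors.EVENT_READ, data="client-1")

client_side.sendall(b"ping")

# Wait up to one second for any registered connection to become readable.
ready = [(key.data, key.fileobj.recv(16)) for key, _ in selector.select(timeout=1.0)]

selector.close()
server_side.close()
client_side.close()
```

The calling code is the same whichever mechanism is selected; only the cost of waiting on many connections changes.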
May 30, 2003 9: xrootd
MRU Scheduling
Connections processed in most recently used order
  Gives priority to active connections
  Reduces polling overhead
  Essentially a fair scheduling algorithm
    Starvation cannot occur
    Longer running tasks tend to get started first
      Assuming all other things being equal
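
A minimal sketch of most-recently-used ordering, assuming the scheduler only needs to know which connection was active last. The `MruQueue` class is a hypothetical illustration and does not model the fairness or starvation properties claimed on the slide.

```python
from collections import OrderedDict


class MruQueue:
    """Keeps connections ordered by how recently they were active."""

    def __init__(self):
        self._order = OrderedDict()  # last key = most recently active

    def touch(self, conn):
        # Record activity by moving the connection to the most-recent end.
        self._order.pop(conn, None)
        self._order[conn] = True

    def service_order(self):
        # Most recently active connections are looked at first.
        return list(reversed(self._order))


queue = MruQueue()
for conn in ["a", "b", "c", "a"]:
    queue.touch(conn)
order = queue.service_order()
```

Connection `a` was idle for a while but moves to the front as soon as it becomes active again.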
May 30, 2003 10: xrootd
Sticky Sockets
Connection temporarily binds to a thread
  Avoids polling and scheduling overhead
  Significantly reduces latency
Connection automatically unbinds
  Client is not sufficiently active
  Number of other requests approaches available threads
May 30, 2003 11: xrootd
Adaptive Reconfiguration
Server dynamically adjusts configuration
  Number of threads
    Kept proportionate to number of active requests
  Pre-allocated buffers
    Sizes track actual usage profile
    Recomputed periodically
  Pre-allocated objects
    Number tracks recent needs
  High latency connections rescheduled
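
One way to picture the adjustment rules is as two small functions recomputed periodically. Everything here is an assumption made for illustration: the function names, the thread limits, the default buffer size, and the power-of-two rounding are not taken from xrootd.

```python
def target_threads(active_requests, minimum=4, maximum=64):
    """Keep the thread count proportionate to active requests, within limits."""
    # Assumed policy: one thread per active request, clamped to [minimum, maximum].
    return max(minimum, min(maximum, active_requests))


def target_buffer_size(recent_request_sizes, default=4096):
    """Size pre-allocated buffers from the recently observed request sizes."""
    if not recent_request_sizes:
        return default
    largest = max(recent_request_sizes)
    # Assumed policy: round the largest recent request up to a power of two.
    return 1 << (largest - 1).bit_length()


threads_when_idle = target_threads(0)
threads_when_busy = target_threads(10)
buffer_size = target_buffer_size([1000, 5000, 3000])
```

Calling these on a timer with fresh statistics gives the "recomputed periodically" behaviour described above.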
May 30, 2003 12: xrootd
Versatile sfs Layer I
Integrates multiple performance features
  Dynamic load balancing
    Client redirected to “best” server of the moment
  File descriptor partitioning
    Reduces socket polling overhead
  File system interface reuse
    Prevents open file proliferation and attendant overhead
    Same file opened in same mode shared by multiple clients
  File system interface timeout
    Reduces overhead caused by idle opened files
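
File system interface reuse can be sketched as a table keyed by path and mode with a reference count, so that a second client opening the same file in the same mode shares the existing handle. `SharedFileTable` is a hypothetical name, and `object()` stands in for a real file handle.

```python
class SharedFileTable:
    """Shares one underlying handle among clients opening the same file and mode."""

    def __init__(self):
        self._open = {}          # (path, mode) -> [handle, reference count]
        self.physical_opens = 0  # how many real opens were needed

    def open(self, path, mode):
        key = (path, mode)
        entry = self._open.get(key)
        if entry is None:
            # First opener pays for the real open; object() is a stand-in handle.
            self.physical_opens += 1
            entry = [object(), 0]
            self._open[key] = entry
        entry[1] += 1
        return entry[0]

    def close(self, path, mode):
        key = (path, mode)
        entry = self._open[key]
        entry[1] -= 1
        if entry[1] == 0:
            # Last user gone: release the underlying handle.
            del self._open[key]


table = SharedFileTable()
first = table.open("/databases/mydbfile", "r")
second = table.open("/databases/mydbfile", "r")
writer = table.open("/databases/mydbfile", "w")
```

Three client opens cost two physical opens here, because the two readers share a handle and the writer uses a different mode.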
May 30, 2003 14: xrootd
DLB Implementation
[Diagram: a client and xrootd/dlbd server pairs (any number). Labels: subscribe; open; who has the file?; I do; wait; open again; try host:port.]
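
The message labels in the diagram (subscribe, open, who has the file?, I do, wait, try host:port) suggest a lookup-and-redirect exchange. The sketch below simulates it in memory; the `Manager` and `DataServer` classes, the first-holder selection rule, and the addresses are assumptions for illustration, not the dlbd implementation.

```python
class DataServer:
    def __init__(self, address, files):
        self.address = address
        self.files = set(files)

    def has_file(self, path):
        # Answers the "who has the file?" query with "I do" (True) or silence (False).
        return path in self.files


class Manager:
    """Stand-in for the xrootd/dlbd pair that a client contacts first."""

    def __init__(self):
        self.subscribers = []

    def subscribe(self, server):
        self.subscribers.append(server)

    def open(self, path):
        holders = [s for s in self.subscribers if s.has_file(path)]
        if not holders:
            # Nobody answered: the client is told to wait and open again later.
            return ("wait", None)
        # Assumed selection rule: first holder; a real balancer would weigh load.
        return ("redirect", holders[0].address)


manager = Manager()
manager.subscribe(DataServer("host-a:5000", ["/databases/mydbfile"]))
manager.subscribe(DataServer("host-b:5000", ["/databases/other"]))
found = manager.open("/databases/mydbfile")
missing = manager.open("/databases/unknown")
```

A `redirect` reply corresponds to "try host:port"; a `wait` reply corresponds to the wait and open-again steps.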
May 30, 2003 15: xrootd
Versatile sfs Layer II
Dynamic disk cache integration
  Allows unlimited file system size
  Provides superior internal load balancing
Mass Storage Integration
  HPSS, Castor, Enstore, etc.
RFIO Integration
Scalable authorization
  From file sub-trees to single files
May 30, 2003 16: xrootd
Cache File System
[Diagram: paths /cache1/databases:mydbfile, /databases/mydbfile, /cache2, /cache3, joined by a symlink.]
Labels: Index Area; Optional data cache; Default data area
Data Area: any number, any size, chosen based on free space in LRU order
Multiple independent filesystems
Naming convention allows for audit and index recovery
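
The single example on the slide (/databases/mydbfile and /cache1/databases:mydbfile) suggests that the cache file name encodes the logical path, which is what would make index recovery possible. The sketch below infers a rule from that one example, replacing `/` with `:`; the rule, the helper names, and the choice of the cache with the most free space are all assumptions.

```python
def to_cache_name(cache_root, logical_path):
    """Build the data-area file name for a logical path (rule inferred from one example)."""
    name = logical_path.strip("/").replace("/", ":")
    return f"{cache_root}/{name}"


def to_logical_path(cache_file):
    """Recover the logical path from a data-area file name, as for index recovery."""
    name = cache_file.rsplit("/", 1)[1]
    return "/" + name.replace(":", "/")


def pick_cache(free_bytes):
    # Assumed policy: place new files in the cache with the most free space.
    return max(free_bytes, key=free_bytes.get)


cache_file = to_cache_name("/cache1", "/databases/mydbfile")
recovered = to_logical_path(cache_file)
chosen = pick_cache({"/cache1": 10, "/cache2": 30, "/cache3": 20})
```

Because the mapping is reversible, a lost index entry can be rebuilt by scanning the data areas and recreating the symlinks.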
May 30, 2003 17: xrootd
Fault Tolerance I
Servers may come and go
  Uses load balancing to effect recovery
  New servers can be added at any time
  Servers may be brought down for maintenance
  Files can be moved around in real-time
Client simply adjusts to the new configuration
  XTNetFile object handles recovery protocol
May 30, 2003 18: xrootd
Fault Tolerance II
Whenever client loses r/o connection
  Back to distinguished xrootd(s) for reselection
Whenever client loses r/w connection
  Limited wait/retry loop on the same server
    We will be working to improve this next year!
All handled in the XTNetFile class
  Disruptions merely delay the client
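
The two recovery rules can be sketched as a single function. This is not the XTNetFile code; `recover`, its callable parameters, and the retry limit are hypothetical, and the wait between retries is left out to keep the example short.

```python
def recover(mode, reconnect_same, reselect, max_retries=3):
    """Pick a server after a lost connection.

    mode: "r" for read-only files, "rw" for read/write files.
    reconnect_same: callable returning the server name, or None on failure.
    reselect: callable asking a distinguished server for a new data server.
    """
    if mode == "r":
        # Read-only: go back to a distinguished server and reselect.
        return reselect()
    # Read/write: stay with the same server, but only for a limited number of tries.
    for _ in range(max_retries):
        server = reconnect_same()
        if server is not None:
            return server
    raise ConnectionError("server did not come back within the retry limit")


attempts = []


def flaky_reconnect():
    # Hypothetical server that comes back on the third attempt.
    attempts.append(1)
    return "same-host" if len(attempts) == 3 else None


read_only_choice = recover("r", flaky_reconnect, lambda: "new-host")
read_write_choice = recover("rw", flaky_reconnect, lambda: "new-host")
```

In both cases the application only sees a delay, which matches the claim that disruptions merely delay the client.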
May 30, 2003 19: xrootd
Flexible Security
Negotiated Security Protocol
  Allows client/server to agree on protocol
    E.g., Kerberos, GSI, AFS Kerberos, etc.
  Can be easily extended
Multi-protocol authentication support
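
Protocol negotiation can be reduced to picking a protocol that both sides support. The sketch assumes the server's ordering expresses its preference; that rule, the `negotiate` function, and the lowercase protocol labels are illustrative assumptions.

```python
def negotiate(server_offers, client_supports):
    """Return the first server-offered protocol the client also supports, else None."""
    for protocol in server_offers:
        if protocol in client_supports:
            return protocol
    return None


# Labels are placeholders for the protocols named on the slide.
agreed = negotiate(["gsi", "kerberos"], {"kerberos", "afs-kerberos"})
no_match = negotiate(["gsi"], {"kerberos"})
```

Adding a protocol only means extending the two lists, which is one way to read "can be easily extended".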
May 30, 2003 20: xrootd
Security Architecture
[Diagram: login and authenticate requests, with security handled by libooseccl.so.]
Labels: Client-Specific Security Configuration; Protocol Selection; Self Configuration; Security Token
Multiple handshakes allowed during authentication phase (required by some PKI protocols)
May 30, 2003 21: xrootd
Heterogeneous Security Support
• Servers have one or more protocol objects
• Server protocol objects created at server initialization time
• Client selects which protocol to use when security context created
• Protocol object created based on configuration returned by xrootd
• One security context object per physical xrootd connection
• Protocol objects may be shared by one or more contexts
• Each “pass” through a security context object may generate credentials to be passed to xrootd
[Diagram label: protocols]
May 30, 2003 22: xrootd
Simple & Effective Interface
For each login that requires authentication
  XrdSecCreateSecurityContext(ipaddr, config)
    Returns security protocol object XrdSecClientSecurity
    Based on server ipaddr and server-supplied config
  XrdSecClientSecurity::getCredentials()
    Returns credentials to be sent to the server
    Done via authenticate request and possible authmore response
Based on well tested and documented oofs security
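
The authenticate/authmore exchange amounts to a loop that keeps asking the protocol object for credentials until the server stops asking for more. The classes below are Python stand-ins invented for this sketch; they do not reproduce the XrdSec C++ interface or its signatures.

```python
class TwoStepProtocol:
    """Hypothetical protocol object that produces fresh credentials on each pass."""

    def __init__(self):
        self._step = 0

    def get_credentials(self, server_data=None):
        self._step += 1
        return f"credentials-{self._step}"


class FakeServer:
    """Hypothetical server that requires a fixed number of handshake rounds."""

    def __init__(self, rounds):
        self._remaining = rounds

    def authenticate(self, credentials):
        self._remaining -= 1
        return "authmore" if self._remaining > 0 else "ok"


def login(protocol, server, max_rounds=8):
    response = None
    for rounds in range(1, max_rounds + 1):
        # Each pass may use what the server sent back in the previous round.
        credentials = protocol.get_credentials(response)
        response = server.authenticate(credentials)
        if response != "authmore":
            return response, rounds
    raise RuntimeError("too many authentication rounds")


result = login(TwoStepProtocol(), FakeServer(rounds=2))
```

A single-round protocol takes the same path and simply leaves the loop after the first pass.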
May 30, 2003 23: xrootd
Optional Scalable Authorization
u abh rw /slac/rootfiles/usr/abh r /cern/rootfiles
libooseccl.so
libooacc.so
Authentication, Authorization
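
The sample entry for user abh can be read as a capability list: rw under /slac/rootfiles/usr/abh and r under /cern/rootfiles. That reading of the flattened entry is an assumption, as are the `CAPABILITIES` table, the `allowed` function, and the longest-prefix rule used here.

```python
# Assumed interpretation of the slide's entry for user "abh".
CAPABILITIES = {
    "abh": [("/slac/rootfiles/usr/abh", "rw"), ("/cern/rootfiles", "r")],
}


def allowed(user, path, operation):
    """Check an operation ("r" or "w") against the most specific matching sub-tree."""
    best = None
    for prefix, privileges in CAPABILITIES.get(user, []):
        inside = path == prefix or path.startswith(prefix.rstrip("/") + "/")
        if inside and (best is None or len(prefix) > len(best[0])):
            best = (prefix, privileges)
    return best is not None and operation in best[1]


can_write_home = allowed("abh", "/slac/rootfiles/usr/abh/run1.root", "w")
can_write_cern = allowed("abh", "/cern/rootfiles/run1.root", "w")
```

Because entries are path prefixes, the same table can cover a whole sub-tree or a single file.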
May 30, 2003 24: xrootd
Security Summary
Multi-protocol Authentication
  Supports distributed heterogeneous environments
Scalable Authorization
  Open-ended capability based model
Integrated Auditing
  To keep the security hard hats happy
Well defined, proven interfaces
  Trivially replaceable for a plug & play architecture
May 30, 2003 25: xrootd
rootd Compatibility
Bilateral compatibility
  XTNetFile reverts to TNetFile for rootd servers
  xrootd reverts to rootd protocol for TNetFile
Allows for transparent introduction
  Can run mixed mode
  Binary is multi-environment compatible
May 30, 2003 26: xrootd
Compatibility Modes
[Diagram with two panels, Client-Side Compatibility and Server-Side Compatibility. Labels: Application, TNetFile, XTNetFile, xrootd, rootd, rootd compatibility.]
May 30, 2003 27: xrootd
xrootd Architecture
Protocol Layer
Filesystem Logical Layer
Filesystem Physical Layer
Filesystem Implementation
Protocol Manager
May 30, 2003 28: xrootd
Dynamically loaded (can also be static)
xrootd Internals
May 30, 2003 29: xrootd
Conclusion
xrootd provides high performance file access
  Improves over afs, ams, nfs, etc.
  Unique performance, usability, scalability, security, compatibility, and recoverability characteristics
xrootd can provide a firm server foundation for native file system implementations
  E.g. alienfs, gridfs, slashgrid, etc.
For now, aim is to support BaBar