TeraGrid: National Cyberinfrastructure for Terascale Science
Dane Skow, Deputy Director, TeraGrid
www.teragrid.org
The University of Chicago and Argonne National Laboratory
February 2007

Slides courtesy of Charlie Catlett (UC/ANL), Tony Rimovsky (NCSA), and Reagan Moore (SDSC). TeraGrid is supported by the National Science Foundation Office of Cyberinfrastructure.
Petascale: “NSF Cyberinfrastructure Vision for 21st Century Discovery”
1. Distributed, scalable up to petaFLOPS HPC
2. Data, data analysis, visualization
3. Collaboratories, observatories, virtual organizations
   – includes networking, middleware, systems software
   – “sophisticated” science application software
   – includes data to and from instruments
4. Education and Workforce
• Provide sustainable and evolving CI that is secure, efficient, reliable, accessible, usable, and interoperable
• Provide access to world-class tools and services
Draft 7.1 CI Plan at www.nsf.gov/oci/
Adapted from: Dan Atkins, NSF Office of Cyberinfrastructure
TeraGrid Mission
• TeraGrid provides integrated, persistent, and pioneering computational resources that will significantly improve our nation’s ability and capacity to gain new insights into our most challenging research questions and societal problems.
  – Our vision requires an integrated approach to the scientific workflow, including obtaining access, application development and execution, data analysis, collaboration, and data management.
  – These capabilities must be broadly accessible to the science, engineering, and education community.
TeraGrid Facility Partners
[Map of partner sites: SDSC, TACC, UC/ANL, NCSA, ORNL, PU, IU, PSC, NCAR, Caltech, USC/ISI, UNC/RENCI, UW, and NIU. Legend: Resource Provider (RP), Software Integration Partner, Grid Infrastructure Group (GIG).]
Networking
[Network diagram: hub sites in LA, CHI, and DEN interconnect SDSC, UC/ANL, PSC, TACC, ORNL, NCSA, NCAR, PU, IU, and Cornell, with links ranging from 1x10G to 3x10G and connectivity to Abilene (IP grid).]
TeraGrid Usage Growth
[Chart: monthly usage in normalized units (millions), split between specific allocations and roaming allocations.]
TeraGrid currently delivers to users an average of 400,000 CPU-hours per day, roughly the output of 20,000 CPUs running continuously.
TeraGrid User Community Growth
[Chart, log scale (1 to 10,000), October 2003 through December 2006: active users, all users ever, and new accounts per month. Annotations: TeraGrid production services begin (October 2004); NCSA and SDSC core (PACI) systems and users incorporated (April 2006).]
Decommissioning of systems typically causes slight reductions in active users; e.g., the December 2006 dip is due to the decommissioning of LeMieux (PSC).
Metric                       FY05    FY06
New User Accounts             948   2,692
Avg. New Users per Quarter    315    365*
Active Users                1,350   3,228
All Users Ever              1,799   4,491
(*FY06 new users per quarter excludes March/April 2006)
TeraGrid Projects by Institution
[Map: institutions colored by number of PIs. Blue: 10 or more; red: 5-9; yellow: 2-4; green: 1.]
1,000 projects, 3,200 users
TeraGrid allocations are available to researchers at any US educational institution by peer review. Exploratory allocations can be obtained through a biweekly review process. See www.teragrid.org.
FY06 Quarterly Usage by Discipline
[Stacked bar chart: percent of usage by discipline for each FY06 quarter.]
TeraGrid Science Gateways Initiative: Service-Oriented Approach
The science and engineering community has been building discipline-specific cyberinfrastructure in the form of portals, applications, and grids. Our objective is to enable these to use TeraGrid resources transparently as “back-ends” to their infrastructure.
The TeraGrid Science Gateways program has developed, in partnership with 20+ communities and multiple major Grid projects, an initial set of processes, policies, and services that enable these gateways to access TeraGrid (or other facilities) resources via web services.
[Diagram: science gateway portals connect through web services to TeraGrid, Grid-X, and Grid-Y as interchangeable back-ends, as sketched below.]
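To make the service-oriented pattern concrete, here is a minimal sketch of a gateway portal forwarding a user's job to whichever grid backs it. This is not a real TeraGrid API: the endpoint URLs, JSON shape, and function names are illustrative assumptions.

```python
# Hypothetical sketch of a science gateway using grids as back-ends.
# Endpoint URLs and the request format are invented for illustration.
import json
from urllib import request

BACKENDS = {
    "teragrid": "https://gateway.example.org/teragrid/jobs",  # assumed endpoint
    "grid-x": "https://gateway.example.org/grid-x/jobs",
}

def submit_job(backend: str, executable: str, cpus: int) -> str:
    """POST a job description to the chosen back-end and return its job id."""
    payload = json.dumps({"executable": executable, "cpus": cpus}).encode()
    req = request.Request(BACKENDS[backend], data=payload,
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.load(resp)["job_id"]

# A portal user never sees which grid ran the job:
# job_id = submit_job("teragrid", "namd2", cpus=64)
```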
TeraGrid User Community in 2006

Use Modality                                    Community Size (est. number of projects)
Batch Computing on Individual Resources          850
Exploratory and Application Porting              650
Workflow, Ensemble, and Parameter Sweep          160
Science Gateway Access                           100
Remote Interactive Steering and Visualization     35
Tightly-Coupled Distributed Computation           10
Data Storage Resources
• Local Cluster File System
• Global File System
  – GPFS-WAN (250 TB)
• Data Collections
• Archive Storage
Graphic courtesy of SDSC DataCentral
Local Cluster Storage
• Shared NFS-like file system within a single site
  – GPFS, Lustre, NFS, PVFS, QFS, CXFS, …
• Normal site user/group permissions apply
  – TeraGrid users typically have individual accounts connected with their project team via the usual uid/gid groups
  – Therefore normal containment/forensic tools work inside the system
• GridFTP transfer from one resource to another (see the sketch below)
  – Dedicated GridFTP mover nodes for parallel systems
  – Dynamic GridFTP mover “fleet” direct from applications
  – Central TeraGrid listener to gather aggregate system data
    • Modification to the standard set to lift the “veil of privacy” within TeraGrid
    • System metrics and diagnostics
    • Forensic analysis database
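As a concrete illustration of site-to-site staging, here is a minimal sketch that drives a parallel GridFTP transfer with the standard globus-url-copy client from Python. The host names and paths are placeholders; the sketch assumes a valid grid proxy is already in place.

```python
# Minimal sketch: stage a file between two TeraGrid sites with GridFTP.
# Host names and paths are placeholders; globus-url-copy is the standard
# Globus Toolkit client (assumes a grid proxy, e.g. from grid-proxy-init).
import subprocess

SRC = "gsiftp://tg-gridftp.site-a.example.edu/scratch/user/input.dat"
DST = "gsiftp://tg-gridftp.site-b.example.edu/scratch/user/input.dat"

def gridftp_copy(src: str, dst: str, streams: int = 4) -> None:
    """Run a third-party GridFTP transfer with `streams` parallel TCP streams."""
    subprocess.run(
        ["globus-url-copy", "-p", str(streams), "-vb", src, dst],
        check=True,  # raise if the transfer fails
    )

# gridftp_copy(SRC, DST)
```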
“Global” File System
• TeraGrid has a central GPFS-WAN server at SDSC, mounted by several clusters across the grid.
• Pros
  – Common namespace
  – POSIX syntax for remote file access
  – Single identity space (X.509) across the WAN
  – High-speed parallel systems available
• Cons
  – GPFS-WAN: IBM licensing and availability
  – Lustre-WAN: lack of a WAN security model
  – No group authorization (authZ) construct support
Archive Storage
• TeraGrid is just now beginning to treat archive storage as an allocated resource.
• Issues
  – Retention policy/guarantee
  – Media migration
  – Privacy/security/availability of abandoned files
  – Economic model (NCAR has a “Euro” approach with a common currency)
Using an SRB Data Grid - Details
• User asks for data
• Data request goes to an SRB server
• The server looks up information in the metadata catalog (MCAT database)
• The catalog tells which SRB server has the data
• The first server asks the second for the data
• The data is found and returned
[Diagram: client, two Storage Resource Broker servers, and the metadata catalog database exchanging these messages.]
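The redirect-style lookup above can be summarized in a few lines of code. This is a schematic sketch of the message flow only, not the real SRB wire protocol; the class, catalog, and server names are invented for illustration.

```python
# Schematic sketch of the SRB lookup flow described above.
# Not the real SRB protocol: names and structures are invented.

CATALOG = {  # metadata catalog (MCAT): logical name -> holding server
    "/home/collection/run42.dat": "srb-server-2",
}

class SrbServer:
    def __init__(self, name, store=None):
        self.name = name
        self.store = store or {}          # local physical storage

    def get(self, logical_name):
        holder = CATALOG[logical_name]    # 1. consult the catalog
        if holder == self.name:           # 2a. data is local: return it
            return self.store[logical_name]
        # 2b. otherwise ask the holding server on the client's behalf
        return SERVERS[holder].get(logical_name)

SERVERS = {
    "srb-server-1": SrbServer("srb-server-1"),
    "srb-server-2": SrbServer("srb-server-2",
                              {"/home/collection/run42.dat": b"...bytes..."}),
}

# The client contacts any server; the federation resolves the rest.
data = SERVERS["srb-server-1"].get("/home/collection/run42.dat")
```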
Lessons Learned
• The lesson from Stakkato was not (just) the scale of the attack, but the importance of being able to restore control. In a connected world with agents this means:
  – Virtual borders: ALL > collaborators > pair-wise trusts
  – Centralized logging for forensics/IDS
    • Use the same system for daily operations/metrics!
  – We must be able to (perhaps painfully) outpace attackers in cleaning systems
• Ease of use and ubiquity are essential to adoption
  – AFS’s change from file permissions to directory permissions carried a huge adoption-barrier cost
Lessons Learned
• Work is needed on distributed group authorization/management tooling
  – Group membership and roles are best maintained by the leaders of the group
  – Policy rules are best kept and enforced by the data store
• Security triad (see the sketch below):
  – Who you are
  – Where you can go
  – What you can do
• Some actions are so dangerous that they deserve two-person rule enforcement (e.g., archive tape erasure)
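To make the triad and the two-person rule concrete, here is a hedged sketch of an authorization check. The policy tables, names, and function are assumptions for illustration, not actual TeraGrid tooling.

```python
# Illustrative sketch: the security triad plus a two-person rule.
# The policy tables and names are invented, not actual TeraGrid tooling.

GROUPS = {"alice": {"ops"}, "bob": {"ops"}, "carol": {"astro"}}   # who you are
ZONES = {"ops": {"archive", "cluster"}, "astro": {"cluster"}}     # where you can go
ACTIONS = {"archive": {"read", "erase_tape"}, "cluster": {"run"}} # what you can do
TWO_PERSON = {"erase_tape"}  # actions needing a second authorizer

def authorize(user, zone, action, cosigner=None):
    groups = GROUPS.get(user, set())
    if not any(zone in ZONES[g] for g in groups):
        return False                         # fails "where you can go"
    if action not in ACTIONS.get(zone, set()):
        return False                         # fails "what you can do"
    if action in TWO_PERSON:
        # dangerous action: a distinct, authorized cosigner must approve
        return (cosigner is not None and cosigner != user
                and authorize(cosigner, zone, "read"))
    return True

assert authorize("alice", "archive", "erase_tape", cosigner="bob")
assert not authorize("alice", "archive", "erase_tape")  # no second person
```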
Lessons Learned
• Security is never “done”
  – The coordination team (and team building) from the Stakkato incident was THE most valuable result.
Security in Distributed Data Management Systems
Storage Resource Broker
Reagan W. Moore
Wayne Schroeder
Mike Wan
Arcot Rajasekar
{moore, schroede, mwan, sekar}@sdsc.edu
http://www.sdsc.edu/srb
http://irods.sdsc.edu/
Logical Name Spaces
• Logical user name
  – Unique identifier for each person accessing the system: {user-name, project-name}
  – User groups: aggregations of users; membership in multiple groups
  – Data grids (zones): {user-name, project-name, zone-name}
Authorization - SRB
• Assign access controls on each name space
  – Files
  – Metadata
  – Storage
• Assign roles that represent sets of allowed operations
  – Roles: administrator, curator, read, write, annotate
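Putting the logical name spaces and role assignments together, here is a hedged sketch of SRB-style authorization. The role names come from the slide; the data model, paths, and function are illustrative assumptions.

```python
# Hedged sketch of SRB-style authorization: per-name-space ACLs plus roles.
# Role names are from the slide; the data model is an illustrative assumption.

ROLES = {  # role -> set of allowed operations
    "administrator": {"read", "write", "annotate", "grant"},
    "curator": {"read", "write", "annotate"},
    "read": {"read"},
    "annotate": {"read", "annotate"},
}

# ACLs kept separately for each logical name space
ACLS = {
    "files": {("/zoneA/coll/data.dat", "moore"): "curator"},
    "metadata": {("/zoneA/coll/data.dat", "moore"): "read"},
    "storage": {("sdsc-disk", "moore"): "read"},
}

def allowed(namespace, obj, user, op):
    role = ACLS[namespace].get((obj, user))
    return role is not None and op in ROLES[role]

assert allowed("files", "/zoneA/coll/data.dat", "moore", "write")
assert not allowed("metadata", "/zoneA/coll/data.dat", "moore", "write")
```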
Rule-based Data Management
iRODS (integrated Rule-Oriented Data System)
• Map from management policies to rules controlling execution of remote micro-services
• Manage persistent state information for the results of micro-service execution
• Support an additional three logical name spaces:
  – Rules
  – Micro-services
  – Persistent state information
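iRODS has its own rule language, but the policy-to-micro-service mapping can be sketched in Python, consistent with the other examples here. The event names, micro-services, and state log below are invented for illustration.

```python
# Toy rule engine in the spirit of iRODS: a policy event fires a chain of
# micro-services, and each result is recorded as persistent state.
# Event and micro-service names are invented for illustration.

STATE_LOG = []  # persistent state information (results of micro-services)

def checksum(obj):
    STATE_LOG.append((obj, "checksum", "ok")); return True

def replicate(obj):
    STATE_LOG.append((obj, "replicate", "2 copies")); return True

RULES = {  # management policy -> ordered micro-service chain
    "on_ingest": [checksum, replicate],
}

def apply_policy(event, obj):
    """Run every micro-service attached to `event`, stopping on failure."""
    for microservice in RULES.get(event, []):
        if not microservice(obj):
            break

apply_policy("on_ingest", "/zoneA/coll/data.dat")
print(STATE_LOG)  # [('/zoneA/coll/data.dat', 'checksum', 'ok'), ...]
```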
Controlling Remote Operations
[Diagram: iRODS - integrated Rule-Oriented Data System. Data management environment: management policies, management functions, assessment criteria, conserved properties, control mechanisms, remote operations, capabilities. Data management infrastructure: rules, micro-services, persistent state. Physical infrastructure: database, rule engine, storage system.]
Rule-based Access
• Associate security policies with each digital entity (a time-dependent check is sketched below)
  – Redaction; access controls on structures within a file
  – Time-dependent access controls (how long to hold data proprietary)
• Associate access controls with each rule
  – Restrict the ability to modify or apply rules
• Associate access controls with each micro-service
  – Explicit control of operation execution within a given collection
  – Much finer control than provided by Unix r:w:x
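Here is a hedged sketch of the time-dependent case: data stays proprietary until an embargo date passes. The paths, tables, and function are illustrative assumptions.

```python
# Hedged sketch of a time-dependent access rule: data stays proprietary
# until an embargo date. Paths and field names are illustrative assumptions.
from datetime import date

EMBARGO = {  # digital entity -> date it becomes publicly readable
    "/zoneA/survey/raw.dat": date(2008, 1, 1),
}
OWNERS = {"/zoneA/survey/raw.dat": {"moore"}}

def may_read(user, obj, today=None):
    today = today or date.today()
    if user in OWNERS.get(obj, set()):
        return True                         # owners always have access
    release = EMBARGO.get(obj)
    return release is not None and today >= release

assert may_read("moore", "/zoneA/survey/raw.dat", date(2007, 2, 1))
assert not may_read("guest", "/zoneA/survey/raw.dat", date(2007, 2, 1))
assert may_read("guest", "/zoneA/survey/raw.dat", date(2008, 6, 1))
```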
For More Information
Reagan W. Moore
San Diego Supercomputer Center
http://www.sdsc.edu/srb/
http://irods.sdsc.edu/
Call for Participation
Papers, tutorials, posters, BOFs, and demonstrations are being accepted through February in three tracks: Science; Technology; and Education, Outreach, and Training.
Submissions are being accepted through April for three competitions for high school, undergraduate, and graduate students:
• Impact of Cyberinfrastructure
• Research posters
• On-site advancing scientific discovery