Science Gateways on the TeraGrid
Von Welch, NCSA
(with thanks to Nancy Wilkins-Diehr, SDSC for many slides)
The TeraGrid Strategy
• Building a distributed system of unprecedented scale
  – 40+ teraflops compute
  – 1+ petabyte storage
  – 10-40 Gb/s networking
• Creating a unified user environment across heterogeneous resources
  – User software environment, user support resources
  – Created an initial community of over 500 users, 80 PIs
• Integrating new partners to introduce new capabilities
  – Additional computing, visualization capabilities
  – New types of resources: data collections, instruments
Make it extensible!
The TeraGrid Team
• Two major components:
  – 9 Resource Providers (RPs) who provide resources and expertise
    • Seven universities
    • Two government laboratories
    • Expected to grow
  – The Grid Integration Group (GIG), which provides leadership in grid integration among the RPs
    • Led by a Director, who is assisted by the Executive Steering Committee, Area Directors, and Project Manager
    • Includes participation by staff at each RP
• Funding now provided for people, not just networks and hardware!
TeraGrid Resource Partners
TeraGrid Resources
Sites: ANL/UC, Caltech, IU, NCSA, ORNL, PSC, Purdue, SDSC, TACC
Compute resources
  ANL/UC: Itanium2 (0.5 TF), IA-32 (0.5 TF)
  Caltech: Itanium2 (0.8 TF)
  IU: Itanium2 (0.2 TF), IA-32 (2.0 TF)
  NCSA: Itanium2 (10 TF), SGI SMP (6.5 TF)
  ORNL: IA-32 (0.3 TF)
  PSC: XT3 (10 TF), TCS (6 TF), Marvel (0.3 TF)
  Purdue: Hetero (1.7 TF)
  SDSC: Itanium2 (4.4 TF), Power4+ (1.1 TF)
  TACC: IA-32 (6.3 TF), Sun (Vis)
Online storage (in site order, not every site listed): 20 TB, 155 TB, 32 TB, 600 TB, 1 TB, 150 TB, 540 TB, 50 TB
Mass storage (in site order, not every site listed): 1.2 PB, 3 PB, 2.4 PB, 6 PB, 2 PB
Data collections: Yes at five sites
Visualization: Yes at five sites
Instruments: Yes at three sites
Network (Gb/s, hub)
  ANL/UC: 30, CHI
  Caltech: 30, LA
  IU: 10, CHI
  NCSA: 30, CHI
  ORNL: 10, ATL
  PSC: 30, CHI
  Purdue: 10, CHI
  SDSC: 30, LA
  TACC: 10, CHI
Partners will add resources and TeraGrid will add partners!
Science Gateways: A new initiative for the TeraGrid
• Increasing investment by communities to build their own cyberinfrastructure (CI)
• Heterogeneity
  – Resources: different architectures at local, national and international levels
  – Users: from HPC expert to K-12 student… they should all benefit from CI
  – Software stacks, policies
• How can “centers/institutions” provide, operate, and maintain in this heterogeneous world?
• Working with Gateways, TeraGrid will start to answer that question by providing generic CI services to communities
• Integration and interoperability
What are Gateways?
• Gateways will
  – engage communities that are not traditional users of the supercomputing centers
• by
  – providing community-tailored access to TeraGrid services and capabilities
• Three examples:
  – Web-based portals that front-end Grid services providing TeraGrid-deployed applications used by a community
  – Coordinated access points enabling users to move seamlessly between TeraGrid and other grids
  – Application programs running on users' machines but accessing services in TeraGrid (and elsewhere)
• All take advantage of existing community investment in software, services, education, and other components of cyberinfrastructure
Grid Portal Gateways
• The Portal, accessed through a browser or desktop tools
  – Provides Grid authentication and access to services
  – Provides direct access to TeraGrid-hosted applications as services
• The required support services
  – Searchable metadata catalogs
  – Information space management
  – Workflow managers
  – Resource brokers
  – Application deployment services
  – Authorization services
• Builds on NSF & DOE software
  – Use NMI Portal Framework, GridPort
  – NMI Grid Tools: Condor, Globus, etc.
  – OSG, HEP tools: Clarens, MonALISA
Technical Approach
Biomedical and Biology, Building Biomedical Communities
[Figure: portal architecture built from open source tools. Components shown: OGCE Portlets with Container, Apache Jetspeed Internal Services, Service API, Local Portal Services, Grid Service Stubs, Remote Content Services, Grid Protocols, Grid Services, Remote Content Servers (HTTP), Grid Resources, Workflow Composer.]
• Build standard portals to meet the domain requirements of the biology communities
• Develop federated databases to be replicated and shared across TeraGrid
Initial Focus on 10 Gateways
Science Gateway Prototype | Discipline | Science Partner(s) | TeraGrid Liaison
Linked Environments for Atmospheric Discovery (LEAD) | Atmospheric | Droegemeier (OU) | Gannon (IU), Pennington (NCSA)
National Virtual Observatory (NVO) | Astronomy | Szalay (Johns Hopkins) | Williams (Caltech)
Network for Computational Nanotechnology (NCN) and “nanoHUB” | Nanotechnology | Lundstrom (PU) | Goasguen (PU)
National Microbial Pathogen Data Resource Center (NMPDR) | Biomedicine and Biology | Schneewind (UC), Osterman (Burnham/UCSD), DeLong (MIT), Dusko (INRA) | Stevens (UC/Argonne)
NSF National Evolutionary Biology Center (NESC), NIH Carolina Center for Exploratory Genetic Analysis, State of North Carolina Bioinformatics Portal project | Biomedicine and Biology | Cunningham (Duke), Magnuson (UNC) | Reed (UNC), Blatecky (UNC)
Neutron Science Instrument Gateway | Physics | Dunning (ORNL) | Cobb (ORNL)
Grid Analysis Environment | High-Energy Physics | Newman (Caltech) | Bunn (Caltech)
Transportation System Decision Support | Homeland Security | Stephen Eubanks (LANL) | Beckman (Argonne)
Groundwater/Flood Modeling | Environmental | Wells (UT-Austin), Engel (ORNL) | Boisseau (TACC)
Science Grid [GriPhyN/iVDGL/Grid3] | Multiple | Pordes (FNAL), Huth (Harvard), Avery (UFlorida) | Foster (UC/Argonne), Kesselman (USC-ISI), Livny (UW)
Expanding User Base
[Chart: users per science gateway by year, 2005 to 2009, vertical axis 0 to 6000. Series: OSG, Flood, HEP, SNS, NESC/CCEGA, OLSG, NCN, NVO, LEAD.]
A new generation of “users” that access TeraGrid via Science Gateways, scaling well beyond the traditional “user” with a shell login account.
Projected user community size by each science gateway project.
Impact on society from gateways enabling decision support is much larger!
So how will we meet all these needs?
• With RATS! (Requirements Analysis Teams)
• Organized RATS
• Collection, analysis and consolidation of requirements to jumpstart the work
• And milestones
Rats de Paris
Traditional HPC Model
• All users have accounts at each site/resource
  – NxN matrix of users and sites
• Users access resources through low-level interfaces
  – E.g. Unix shells, FTP sessions
• Resource takes care of all the security
  – AAAA: Authentication, Authorization, Auditing, Accounting
Traditional HPC Usage
[Diagram: a user runs shell commands (% ls, % foo) on the resource. Labels: AuthN, OS (AuthZ), Audit, Accounting.]
Science Gateway Motivation
• Shell-level access to resources is great for power users, but has a steep learning curve
  – Many SG users just need a domain-specific interface, e.g. they are not developing or deploying application codes
• Each resource/site has to maintain state about every user
  – Scalability problems for large/dynamic user communities
• No abstraction: users must adapt to all changes in resources
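The scalability concern can be made concrete with a little arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the figures (6000 users, 9 sites, 10 gateways) are assumptions chosen to echo numbers that appear elsewhere in these slides. It compares per-user accounts at every site with the community-account approach described on the next slide.

```python
def traditional_account_records(num_users: int, num_sites: int) -> int:
    """Traditional model: every user holds an account at every site,
    so the resources together keep users x sites account records."""
    return num_users * num_sites


def gateway_account_records(num_gateways: int, num_sites: int) -> int:
    """Gateway model (assumed): one community account per gateway per site.
    Per-user state is kept at the gateway, not at the resource."""
    return num_gateways * num_sites


if __name__ == "__main__":
    users, sites, gateways = 6000, 9, 10  # assumed example figures
    print(traditional_account_records(users, sites))  # 54000
    print(gateway_account_records(gateways, sites))   # 90
```

Under these assumptions, the state held at the resources no longer grows with the number of users, only with the number of gateways.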
SG Security Model
• SG acts as an interface between the community and its resources
• Much like a traditional ‘Grid Portal’, it provides a domain-specific interface
• However, unlike portals, it exists as a trusted entity in its own right, allowing the resource to “outsource” AAAA functionality to the SG
• Resource runs all commands in a community account, which constrains what the community can do: the account can be restricted to a few community applications
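One way to picture the community-account idea is a gateway that checks each request itself and hands the resource only a job to run under the shared account. This is a minimal sketch under stated assumptions, not a description of any TeraGrid implementation: the `Gateway` class, the account name `sg_community`, and the application paths are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical allow-list of community applications (illustrative paths).
ALLOWED_APPS = {"/community/bin/simulate", "/community/bin/analyze"}
# Hypothetical name of the shared community account on the resource.
COMMUNITY_ACCOUNT = "sg_community"


@dataclass
class Gateway:
    authorized_users: set
    audit_log: list = field(default_factory=list)

    def submit(self, user: str, app: str, args: list) -> dict:
        """Authorize the user at the gateway, then build the job request
        that the resource would run under the community account."""
        if user not in self.authorized_users:
            raise PermissionError(f"{user} is not a member of this community")
        if app not in ALLOWED_APPS:
            raise PermissionError(f"{app} is not an approved community application")
        # User-level audit stays at the gateway; the resource only ever
        # sees the community account.
        self.audit_log.append((user, app, tuple(args)))
        return {"run_as": COMMUNITY_ACCOUNT, "command": [app, *args]}
```

For example, `Gateway({"alice"}).submit("alice", "/community/bin/simulate", ["--steps", "10"])` returns a request to run as `sg_community`, while a request for `/bin/sh` raises `PermissionError`.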
Conceptual Model
[Diagram: three shell sessions, each running % ls and % foo.]
SG AAAA Model
• Security functions held by the resource are now split between resource and Science Gateway
• However, there is a strong need to communicate between the two
• Resource will want full audit information and user information to investigate suspicious activity
• SG needs accounting information to do allocations and reporting (e.g. who is using the SG)
[Diagram: shell commands (% ls, % foo) with the security functions labeled Authn, User-level Authz, Community-level Authz, User-Level Audit, Job-Level Audit, Accounting.]
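The need for the two sides to exchange information can be sketched as a join between the gateway's user-level audit records and the resource's job-level accounting records. This is an illustration only: the record layout, the field names (`job_id`, `user`, `cpu_hours`), and the existence of a job identifier shared by both sides are assumptions, not something these slides specify.

```python
from collections import defaultdict


def per_user_usage(gateway_audit, resource_accounting):
    """Combine user-level audit (held by the SG) with job-level
    accounting (held by the resource) to report usage per user.

    Returns (usage_by_user, unmatched_job_ids). Unmatched jobs are the
    ones a resource would want to investigate.
    """
    job_owner = {rec["job_id"]: rec["user"] for rec in gateway_audit}
    usage = defaultdict(float)
    unmatched = []
    for rec in resource_accounting:
        user = job_owner.get(rec["job_id"])
        if user is None:
            unmatched.append(rec["job_id"])
        else:
            usage[user] += rec["cpu_hours"]
    return dict(usage), unmatched
```

Neither side can produce this report alone: the gateway knows who asked for a job, and the resource knows what the job consumed.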
Outstanding Challenges
• How to identify a job between SG and resource?
  – “/bin/foo run at 15:38:13 (my time)” not very accurate
• Standard template for resource/SG agreement
  – Akin to certificate policy
• Acceptance of group accounts
  – Convince folks it's OK to outsource
• Restricted accounts
  – Cookbook to restrict account to certain applications
• Sandboxing of users from each other
• Community administrators
  – Those who set up group account
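For the job-identification challenge, one commonly used alternative to matching on command name and wall-clock time is to mint an identifier once at the gateway and have the resource log it alongside the job. The sketch below shows that idea under stated assumptions; it is not a method proposed in these slides. The function names and the `SG_JOB_ID` variable are hypothetical, and the resource would have to agree to record the value.

```python
import uuid


def new_job_request(user, command):
    """Gateway side: mint one ID per job and send it along with the job."""
    job_id = str(uuid.uuid4())
    gateway_record = {"job_id": job_id, "user": user, "command": command}
    # Hypothetical variable name carried in the job's environment.
    job_env = {"SG_JOB_ID": job_id}
    return gateway_record, job_env


def resource_log_entry(command, env, local_time):
    """Resource side: record the gateway's ID next to the local timestamp."""
    return {"job_id": env.get("SG_JOB_ID"), "command": command, "time": local_time}


def find_user(job_id, gateway_records):
    """Look up the user by ID, with no dependence on clock agreement."""
    for rec in gateway_records:
        if rec["job_id"] == job_id:
            return rec["user"]
    return None
```

With a shared ID, two users running the same `/bin/foo` in the same second remain distinguishable, and clock skew between gateway and resource no longer matters.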
Outstanding Challenges (cont)
• Each SG forms its own VO
  – TeraGrid provides resources
  – SG provides the users
• I’ve mostly talked about the SG/TeraGrid relationship
• But how SGs will manage their users is open
  – Authentication, Authorization, Contact information… (the whole list Jill just gave)
  – Users distributed over multiple domains
  – Wanting to get into the 1000s of users
  – Different communities for each SG
• TeraGrid would like to help as much as possible here as well
Questions?