Research Computing and Cyberinfrastructure - The Sustainability Model at Penn State
Vijay K. Agarwala
Senior Director, Research Computing and Cyberinfrastructure
The Pennsylvania State University, University Park, PA 16802 USA
May 3rd – 5th, 2010 at Cornell University Center for Advanced Computing
1
Organizational Structure of Research Computing and CI at Penn State

[Organization chart. Recoverable structure, from the top:]
• University President
• Provost and Executive Vice President; Senior Vice President for Finance and Business; Senior Vice President for Research and Dean of the Graduate School
• Vice Provost for Information Technology and Chief Information Officer
• Peer units: Associate Vice President, Office of Physical Plant; Director, Telecommunication and Networking Services; Director, Institute of Cyber Science
• Associate VP for Research; Executive Director, Research Computing and Cyberinfrastructure
• Advisory bodies: RCC Advisory Committee; RCC Executive Committee (up to seven faculty members and research administrators from the University Research Council)
• Four units of 4 staff members each: High Performance Computing Systems; Domain-Specific Consulting Support; Visualization and Telecollaborative Systems; Software Development and Programming Support
2
Research Computing and Cyberinfrastructure
• Provide systems services by researching current practices in operating systems, file systems, data storage, and job scheduling, as well as computational support related to compilers, parallel computations, libraries, and other software. Also support visualization of large datasets by innovative means to gain better insight from the results of simulations.
• Enable large-scale computations and data management by building and operating several state-of-the-art computational clusters and machines with a variety of architectures.
• Consolidate and thus significantly increase the research computing resources available to each faculty participant. Faculty members can frequently exceed their share of the machine to meet peak computing needs.
• Provide support and expertise for using programming languages, libraries, and specialized data and software for several disciplines.
• Investigate emerging visual computing technologies and implement leading‐edge solutions in a cost‐effective manner to help faculty better integrate data visualization tools and immersive facilities in their research and instruction.
• Investigate emerging architectures for numerically intensive computations and work with early-stage companies. For example: interconnects, networking, and graphics processors for computations.
• Help build inter- and intra-institutional research communities using cyberinfrastructure technologies.
• Maintain close contacts with NSF- and DoE-funded national centers, and help faculty members with porting and scaling of codes across different systems.
3
Programs, Libraries, and Application Codes in Support of Computational Research
• Compilers and Debuggers: DDT, GNU Compilers, Intel Compiler Suite, NVIDIA CUDA, PGI Compiler Suite, TotalView
• Computational Biology: BLAST, BLAST+, EEGLAB, FSL, MRIcro, MRICron, SPM5, SPM8, RepeatMasker, wuBlast
• Computational Chemistry and Materials Science: Accelrys Materials Studio, Amber, CCP4, CHARMM, CPMD, GAMESS, Gaussian 03, GaussView, Gromacs, LAMMPS, NAMD, NWChem, Rosetta, Schrödinger Suite, TeraChem, ThermoCalc, VASP, WIEN2k, WxDragon
• Finite Element Solvers: ABAQUS, LS‐DYNA, MD/Nastran and MD/Patran
• Fluid Dynamics: Fluent, GAMBIT, OpenFOAM, Pointwise
• Mathematical and Statistical Libraries and Applications: AMD ACML, ATLAS, BLAS, IMSL, LAPACK, GOTO, Intel MKL, Mathematica, MATLAB, Distributed MATLAB, NAG, PETSc, R, SAS, WSMP
• Multiphysics: ANSYS, Comsol
• Optimization: AMPL, CPLEX, GAMS, Matlab Optimization Toolbox, Matgams, OPL
• Parallel Libraries: OpenMPI, Parallel IMSL, ScaLAPACK
• Visualization Software: Avizo, Grace, IDL, Tecplot, VisIt, VMD
All software installations are driven by faculty. The software stack on every system is customized and entirely shaped by faculty needs.
4
Core Compute Engines

System      Nodes           Cores       Memory (GB)  Interconnect
Lion-XO        64              192          1280     SDR InfiniBand
Lion-XB        16              128           512     SDR InfiniBand
Lion-XC       140              560          1664     SDR InfiniBand
Lion-XJ       144             1152          5120     DDR InfiniBand
Lion-XI        64              512          4096     DDR InfiniBand
Lion-XK        64              512          4096     DDR InfiniBand
Lion-XH        48              384          2304     QDR InfiniBand
Lion-CLSF       1*              128           768     QDR InfiniBand
CyberStar     224             2048          7680     QDR InfiniBand
Hammer          8               64          1024     GigE
Tesla           2 servers       16            32     GigE
             1 NVIDIA S1070    960 (GPU)      16     GigE

* One virtual node using ScaleMP technology.

• A 48-port 10 GigE switch connects all compute engines and storage.
• A 1280-core system is coming online in Summer 2010.
• All compute engines together deliver 40 million core-hours in 2010.
• Emerging technologies in the hands of the user community: ScaleMP and a GPU cluster.

Storage

System                          Capacity   Performance
IBM GPFS Parallel File System   550 TB     Can sustain 5+ GB/s
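The 40-million-core-hour figure is roughly consistent with the table's CPU core counts. A quick sanity check (the ~80% annualized utilization factor here is our assumption, not a number from the slides):

```python
# CPU core counts from the table above (GPU cores on the S1070 excluded).
cores = {
    "Lion-XO": 192, "Lion-XB": 128, "Lion-XC": 560, "Lion-XJ": 1152,
    "Lion-XI": 512, "Lion-XK": 512, "Lion-XH": 384, "Lion-CLSF": 128,
    "CyberStar": 2048, "Hammer": 64, "Tesla": 16,
}
HOURS_PER_YEAR = 8760
UTILIZATION = 0.80  # assumed annualized utilization

total_cores = sum(cores.values())
delivered = total_cores * HOURS_PER_YEAR * UTILIZATION
print(total_cores)                    # 5696 CPU cores
print(round(delivered / 1e6, 1))      # ~39.9 million core-hours per year
```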
5
Job Scheduling

Scheduling Tools
• Torque is the resource manager
• Moab is the job scheduler

Participating Faculty Partners
• Each partner group gets its own queue in Torque
• Each partner queue is assigned a fair-share usage target in Moab equivalent to the group's percentage of ownership in the specific compute engine
• A 30-day fair-share window is used to track partner group usage
• When a partner group is below its target, the jobs in its queue are given a priority boost
• When a partner group is at or above its target, the jobs in its queue have no priority boost but are also not penalized (they are treated on par with general users)
• Within a partner group queue there is a smaller fair-share priority between members of that group to keep anyone from monopolizing the group queue
• Partner group queues do not limit the number of cores their members use
• Partner group queues have by default a two-week limit on the run time of jobs, but this limit can be adjusted as necessary

General Users
• General users have access to a public queue and a small fair-share priority
• General users are limited in the number of processor cores they can have in use (32) and the length of their jobs (24 hours)
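The partner-queue policy above can be sketched as a toy priority function. This is a minimal illustration of the described behavior, not Penn State's actual Moab configuration; the boost value and the example numbers are hypothetical:

```python
# Toy model of the fair-share boost described above. BOOST is a hypothetical
# priority bonus; real Moab fair-share weights are site-configured.
BOOST = 1000

def job_priority(group_usage_30d, total_usage_30d, ownership_fraction,
                 base_priority=0):
    """Boost jobs of partner groups running below their fair-share target.

    ownership_fraction is the group's share of the compute engine (its target).
    Groups at or above target get no boost, but are not penalized either.
    """
    if total_usage_30d == 0:
        return base_priority + BOOST  # nothing used yet: everyone is below target
    actual_share = group_usage_30d / total_usage_30d
    if actual_share < ownership_fraction:
        return base_priority + BOOST
    return base_priority  # on par with general users

# A group owning 25% of the machine but using only 10% of recent cycles:
print(job_priority(10_000, 100_000, 0.25))  # boosted: 1000
# The same group after exceeding its target:
print(job_priority(30_000, 100_000, 0.25))  # no boost: 0
```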
6
[Chart: Usage of the LION-XJ compute cluster, January 1, 2010 - April 22, 2010. Y-axis: ratio of actual to target usage; legend: group1 through group16, plus max usage.]

12 groups exceeded their "target share" at some point during the period. Only four groups remained below the "target share" for this period.
7
Visualization Services
Staff members provide consulting, teach seminars, assist faculty and support facilities for visualization and VR.
Recent support areas include:
• Visualization and VR system design and deployment
• 3D Modeling applications: FormZ, Maya
• Data Visualization applications: OpenDX, VTK, VisIt
• 3D VR development and device libraries: VRPN, VRML, JAVA3D, OpenGL, OpenSG, CaveLib
• Domain specific visualization tools: Avizo (Earth, Fire, Green, Wind), VMD, SCIRun
• Telecollaborative tools and facilities: Access Grid, inSORS, VNC
• Parallel graphics for scalable visualization: Paraview, DCV, VisIt
• Programming support for graphics (e.g. C/C++, JAVA3D, Tcl/Tk, Qt)
8
Visualization Facilities
Our goal is to foster more effective use of visualization and VR techniques in research and teaching across colleges and disciplines via strategic deployment of facilities and related support.
• Locating facilities strategically across campus for convenient access by targeted disciplines and user communities
• Leveraging existing applications and workflows so that visualization and VR can be natural extensions to existing work habits for the users being served
• Providing outreach seminars, end-user training and ongoing staff support in use of the facilities
• Working on an ongoing basis with academic partners to develop and adapt these resources more precisely to fit their needs
• Helping to identify and pursue funding opportunities for further enhancement of these efforts as they take root
9
Visualization Facilities
• Immersive Environments Lab: in partnership with School of Architecture and Landscape Architecture
• Immersive Construction Lab: in partnership with Architectural Engineering
• Visualization/VR Lab: in partnership with Materials Simulation Center
• Visualization/VR Lab: in partnership with Computer Science and Engineering
• Sports Medicine VR Lab: a partnership between Kinesiology, Athletics and Hershey Medical Center
10
Building a strong University-based Computing Center (UBCC): Top 10 practices
• Appeal to a broad constituency: science, engineering, medicine, business, humanities, liberal arts. Provision 25% of compute cycles with minimal requirements of proposal and review.
• Flexibility in system configuration and adaptability to faculty needs: work with partners on their queue requirements, accommodate temporary priority boost.
• Keep barriers to faculty participation low: as low as a $5000 investment for a contributing partner with priority access. Try-before-you-buy program and guest status for prospective faculty partners.
• Maximize system utilization: consistently above 90%, even contributing partners don't have instantaneous access.
• Extensive software stack, and rapid turnaround in installation of new software: rich set of tools, compilers, libraries, community codes and ISV codes.
• Provide consulting with subject matter expertise: differentiate the UBCC from "flop shops" and cloud providers.
• Strong commitment to training: teaching in classes, seminars, workshops.
• Provide accurate and daily system utilization data.
• Build strong partnerships with hardware and software vendors.
• Make emerging technologies and test beds available to faculty and students.
11
Cloud Computing – How will it impact Research CI at Universities?
Cloud computing: virtualized/remote computing resources from which users can purchase what they need, when they need it
[Logos of example providers shown on the slide, with captions:]
• A clearinghouse for buyers and sellers of Digital Analysis Computing (DAC) services
• Computing resources for the commercial and academic community
• Offers HPC services to industry, government, and academia; affiliated with [logo]
12
Cloud Computing – How will it impact Research CI at Universities?
• Google+IBM cloud - a platform for computer science researchers to build cloud applications. A combination of Google machines, IBM BladeCenter and System x servers
• NSF awarded $5.5 million to fourteen universities through its Cluster Exploratory (CluE) program
• A global open source test bed, Open Cirrus for the advancement of cloud computing research
• University of Illinois, KIT (Germany), iDA (Singapore), ETRI (Korea), MIMOS (Malaysia), Russian Academy of Sciences
• Computing in the Cloud (CiC) - NSF Solicitation 10-550 (~$5 million in FY 2010)
13
Cloud Computing – How will it impact Research CI at Universities?
• What impact will the purchase of commercial cloud services have on the future development of campus, regional, and national CI?
• How tightly should the campus CI be integrated with that of private companies in a geographic region? Would it promote stronger partnership between academia and industry?
• Would new policies and guidelines emerge from federal funding agencies on incorporating cloud services in the grants? Would cloud service providers have to be US‐based for data and security reasons?
Demand for cloud computing services by campus-based researchers will depend on how research-computing-related CI is funded and deployed in the future and, just as importantly, on how well university-based computing centers (UBCC) respond to it and stay competitive.
14
A model for University-based Computing Center (UBCC)

Assumes capital expenditures have been incurred for a data center as well as hardware and software. For example, a $10 million investment today in a green data center with a PUE of 1.25 and lower levels of redundancy can create a facility with: 3,200 sq. ft. of raised floor, 3,000 sq. ft. of office space, 1.5 - 2.0 megawatts of electrical power, limited-capacity UPS, and HVAC systems, all housed in a 12,000 sq. ft. building. A similar $10 million investment in hardware today can build aggregate peak computing capacity between 250 and 400 teraflops (25,000 - 40,000 cores) and 10 - 20 petabytes of storage.
Annual Operating Expenditure
Hardware: $2.50 million, replacing 25% of installed compute capacity each year
Software: $0.25 million annual licensing costs
Utility cost: $1.00 million (80 racks, avg. 15 kW/rack)
Staff (16 people): $2.00 million in salary and benefits
Other expenses: $0.25 million
Total: $6.00 million annual investment in support of research, teaching & outreach.
Once initial capital expenditures have been made, it is possible to deliver compute cycles, with a high level of system and computational staff support, at a cost between $0.025 and $0.04 per core-hour (2.4 - 3.0 GHz cores).
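Both dollar figures above can be cross-checked with quick arithmetic. The utilization figure and the assumption that racks draw the stated average power year-round are ours, not the slides':

```python
# Cross-check of the UBCC operating model above.
HOURS = 8760  # hours per year

# Implied electricity rate behind the $1.00M utility line:
it_load_kw = 80 * 15             # 80 racks at 15 kW average (assumed year-round)
facility_kw = it_load_kw * 1.25  # PUE of 1.25 from the facility description
rate = 1.00e6 / (facility_kw * HOURS)
print(round(rate, 3))            # ~$0.076 per kWh

# Cost per core-hour at the low end of the capacity range (25,000 cores),
# assuming 90% utilization:
delivered = 25_000 * HOURS * 0.90
print(round(6.00e6 / delivered, 4))  # ~$0.0304, inside the $0.025-$0.04 band
```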
Universities are more likely to commit such institutional funds if Federal funding agencies (NSF, NIH, DoE, NASA, DoC) provide incentives by way of targeted support for UBCC.
15
The case for UBCC: why Federal agencies need to provide direct support
• Incentive for institutional investment: research universities will leverage targeted federal funding and commit more funds to supporting computational and data-driven research and teaching.
• Efficient use of resources: UBCC, with its campus-level computing cloud, has proven to be highly optimal and cost-effective in meeting a large portion of computing needs across a range of disciplines. Breakthrough science is becoming possible on small-to-medium-sized systems. The research community at campuses frequently relies on campus-based HPC systems and staff to speed up their development and discovery cycle. Campus-based HPC systems are designed, and their software stacks frequently fine-tuned, in response to the computational needs of the local research community, with the sole goal of increasing the computational productivity of faculty and students.
• Seed computational science education: UBCC is a critical enabler by providing hands-on advanced training to students and faculty. UBCC nurtures sustained interest in research computing and HPC technologies amongst faculty, graduate students, and more significantly, reaches the undergraduate student community.
• Workforce Development: There is a widening skills gap, i.e., a shortage of people with disciplinary knowledge, skills in scalable algorithm and code development for large systems, and the ability to think across disciplinary boundaries and integrate modeling and computational techniques from different areas. There is an equally pronounced shortage of skilled people in system administration, in building and deploying various parts of the software stack, in code scaling, and in the adoption of large-scale computing in disciplinary areas. Support for UBCC will go a long way in addressing the shortage of skilled personnel and help with wider and deeper adoption of HPC technologies in both academia and industry. UBCC have a critical role to play in developing the next generation of HPC professionals, and also in moderating the outsourcing of engineering services when it is due to a shortage of skilled personnel in the US.
• Industrial outreach: UBCC is uniquely positioned and highly effective in forging partnerships between industry and academia to make regional businesses globally competitive. It can leverage its CI assets, couple them with what industry may have, and in effect build "local collaboratories" around which faculty, students, and industrial R&D personnel can collaborate. In doing so, UBCC will support development of advanced modeling technologies, help overcome the barrier of "high perceived cost" that is preventing widespread adoption by small and medium-sized businesses, and drive the adoption of modeling and simulation as an integral part of their research and product development cycles. UBCC can provide large-scale computing services and the continuity of contact between faculty/students and industry personnel.
• Build a healthier HPC ecosystem: If more UBCCs are funded, it will increase diversity and number of participants, and the rate of innovation and number of companies in this important industry sector where US companies are global leaders.
16