
Transcript of: SDSC RP Update, March 24, 2011

Page 1: SDSC RP Update

SAN DIEGO SUPERCOMPUTER CENTER
at the UNIVERSITY OF CALIFORNIA, SAN DIEGO

• Trestles
• Recent Dash results
• Gordon schedule
• SDSC’s broader HPC environment
• Recent EOT activity

March 24, 2011

Page 2: Trestles - System Description

AMD MAGNY-COURS COMPUTE NODE
  Sockets: 4
  Cores: 32
  Clock speed: 2.4 GHz
  Flop speed: 307 Gflop/s
  Memory capacity: 64 GB
  Memory bandwidth: 171 GB/s
  STREAM Triad bandwidth: 100 GB/s
  Flash memory (SSD): 120 GB

FULL SYSTEM
  Total compute nodes: 324
  Total compute cores: 10,368
  Peak performance: 100 Tflop/s
  Total memory: 20.7 TB
  Total memory bandwidth: 55.4 TB/s
  Total flash memory: 39 TB

QDR INFINIBAND INTERCONNECT
  Topology: Fat tree
  Link bandwidth: 8 GB/s (bidirectional)
  Peak bisection bandwidth: 5.2 TB/s (bidirectional)
  MPI latency: 1.3 us

DISK I/O SUBSYSTEM
  File systems: NFS, Lustre
  Storage capacity (usable): 150 TB (Dec 2010), 2 PB (June 2011), 4 PB (July 2012)
  I/O bandwidth: 50 GB/s
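As a consistency check, the full-system figures follow directly from the per-node values. A minimal sketch of that arithmetic (the 4 flops/cycle/core figure is an assumption inferred from the quoted 307 Gflop/s per 32-core, 2.4 GHz node):

    nodes = 324
    cores_per_node = 32
    clock_ghz = 2.4
    flops_per_cycle = 4          # assumption: 4 double-precision flops/cycle/core
    node_gflops = cores_per_node * clock_ghz * flops_per_cycle   # 307.2 Gflop/s per node
    peak_tflops = nodes * node_gflops / 1000      # ~99.5, rounded to the quoted 100 Tflop/s
    total_memory_tb = nodes * 64 / 1000           # ~20.7 TB
    total_mem_bw_tbs = nodes * 171 / 1000         # ~55.4 TB/s
    total_flash_tb = nodes * 120 / 1000           # ~38.9, rounded to the quoted 39 TB
    print(node_gflops, peak_tflops, total_memory_tb, total_mem_bw_tbs, total_flash_tb)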

Page 3: Trestles - Configuring for productivity for modest-scale and gateway users

• Allocation Plans
  • Target users that need <=1K cores
  • Plan to allocate ~70% of the theoretically available SUs
  • Cap allocation per project at 1.5M SUs/year (~2.5% of annual total); see the arithmetic sketch after this list
  • Allow new users to request up to 50,000 SUs in startup allocations, and front-load the SUs offered during the first few allocation cycles
  • Configure the job queues and resource schedulers for lower expansion factors and generally faster turnaround
  • Challenge will be to maintain fast turnaround as utilization goes up

• Services
  • Shared nodes
  • Long-running queue
  • Advance reservations
  • On-demand queue
    • ~20 nodes set aside in the on-demand queue. Users can run here at a 25% (TBR) discount.
    • Jobs may be pre-empted (killed) at any time for on-demand users (initial pathfinder is SCEC, for real-time earthquake analyses)
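A rough back-of-the-envelope check of the allocation numbers above, assuming 1 SU = 1 core-hour (an assumption; the exact SU definition is not stated on the slide):

    total_cores = 10_368
    theoretical_sus = total_cores * 24 * 365   # ~90.8M SUs/year theoretically available
    allocatable_sus = 0.70 * theoretical_sus   # ~63.6M SUs/year offered (the ~70% figure)
    project_cap = 1.5e6                        # per-project annual cap
    print(f"cap = {project_cap / allocatable_sus:.1%} of allocatable SUs")   # ~2.4%, roughly the quoted ~2.5%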

Page 4: Trestles Utilization and Expansion Factor

Page 5: Users to date

Page 6: CIPRES gateway growth

Page 7: Results from the CIPRES gateway

• Identify evolutionary relationships by comparing DNA
• To date, >2,000 scientists have run more than 35,000 analyses for 100 completed studies. These studies span a broad spectrum of biological and medical research. The following discoveries were made by scientists using the gateway over the past year:
  • Hepatitis C virus evolves quickly to defeat the natural human immune response, altering the responsiveness of the infection to interferon therapy.
  • Humans are much more likely to infect apes with malaria than the reverse.
  • Toxic elements in local soils influence the geographical distribution of related plants.
  • Red rice, a major crop weed in the US, did not arise from domestic rice stock.
  • Beetles and flowering plants adapt to each other over time, to the benefit of both species.
  • Viruses can introduce new functions into baker’s yeast in the wild.
  • A microbe called Naegleria gruberi, which can live with or without oxygen, provided new insights into the evolutionary transition from oxygen-free to oxygen-breathing life forms.

Page 8: Recent Dash Results: Flash Application Benchmarks

• LiDAR topographical database: representative query of a 100 GB topographical database
  • Test configuration: Gordon I/O nodes, one with 16 SSDs and one with 16 spinning disks, running single and concurrent instances of DB2 on the node.
• EM_BFS: solution of a 300M-node graph using flash for out-of-core storage
  • Test configuration: Gordon I/O nodes, one with 16 SSDs and one with 16 spinning disks.
• Abaqus: S4B, Cylinder Head Bolt Up. Static analysis that simulates bolting a cylinder head onto an engine block.
  • Test configuration: single Dash compute node run comparing local I/O on spinning disk and flash drive.
• Reverse Time Migration: acoustic imaging/seismic application
  • Test configuration: Dash compute nodes with local SSD and local spinning disk.
• Protein Data Bank: repository of 3D structures of molecules
  • Test configuration: Gordon I/O nodes, one with 16 SSDs and one with 16 spinning disks.
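Each of these comparisons comes down to timing the same I/O-bound workload on flash-backed and disk-backed storage. A minimal sketch of that kind of random-read timing, for illustration only (the file paths are hypothetical placeholders, and the actual benchmarks used DB2, Abaqus, RTM, and the other codes listed above):

    import os, random, time

    def random_read_seconds(path, block=4096, n_reads=10_000):
        # Time n_reads random block-sized reads from an existing file at `path`.
        # buffering=0 disables Python-level buffering; the OS page cache should also
        # be cleared between runs for an honest device comparison.
        size = os.path.getsize(path)
        with open(path, "rb", buffering=0) as f:
            start = time.time()
            for _ in range(n_reads):
                f.seek(random.randrange(0, size - block))
                f.read(block)
        return time.time() - start

    ssd_t = random_read_seconds("/ssd/scratch/testfile")        # flash-backed file (hypothetical path)
    hdd_t = random_read_seconds("/spinning/scratch/testfile")   # disk-backed file (hypothetical path)
    print(f"flash speedup on random reads: {hdd_t / ssd_t:.1f}x")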

Page 9: Flash Provides 2-4x Improvement in Run Times for LiDAR Query, MR-BFS, and Abaqus

Page 10: Gordon Schedule (Approximate)

• Sixteen production-level flash I/O nodes are already in-house for testing
  • Early results for a single I/O node, random I/O (4K blocks): read 420K IOPS, write 165K IOPS (see the conversion sketch below this list)
• Sandy Bridge availability early summer
• System delivery to SDSC late summer
• Friendly-user period late fall
• Production before end of CY11
• First allocation meeting: “Sept” cycle
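For scale, those single-node IOPS figures convert to sustained bandwidth as follows (a quick sketch, taking a 4K block as 4,096 bytes):

    block_bytes = 4096                         # "4K blocks"
    read_iops, write_iops = 420_000, 165_000
    print(f"read:  ~{read_iops * block_bytes / 1e9:.2f} GB/s")    # ~1.72 GB/s
    print(f"write: ~{write_iops * block_bytes / 1e9:.2f} GB/s")   # ~0.68 GB/s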

Page 11: SDSC’s Broader HPC Environment

• In addition to TeraGrid systems, SDSC operates:
  • Triton (Appro 256-node cluster + 28 Sun large-memory nodes with 256/512 GB)
    • SDSC system supporting staff, industrial partners, and UCSD/UC users
  • Thresher (IBM 256-node cluster)
    • UC-wide system, along with Mako at LBNL, operated for systemwide users as part of a UC-wide Shared Research Computing Services (ShaRCS) pilot for condo computing
  • Data Oasis – Lustre parallel file system
    • Shared by Triton, Trestles (and Gordon)
    • Phase 0 – 140 TB
    • Phase 1 – currently in procurement, ~2 PB (raw), ~50 GB/s bandwidth
    • Phase 2 – Summer 2012, expansion for Gordon to 4 PB, ~100 GB/s

Page 12: Recent EOT Activity

• Planning:
  • Spring vSMP training workshop
  • Track 2D Early User Symposium (in conjunction with TG-11)
  • SDSC Summer Institute on HPC and Data-Intensive Discovery in Environmental and Ecological Sciences, featuring TG resources and TG Science Gateways
• Presenting a poster and hosting the TG booth at the Tapia Conference in April; will promote TG-11, internship and job opportunities, and encourage new TG/XD users.
• Computational Research Experience for Undergraduates (CREU) program this spring, and REHS (Research Experiences for High School Students) in summer 2011
  • Last year's program was very successful. Applications this year are very strong.
• TeacherTECH and StudentTECH programs are continuing, 2-3 per week.
• Portal development continues for the Campus Champions and MSI-CIEC communities.
• Partnership with the San Diego County chapter of the Computer Science Teachers Association; the second joint meeting will be hosted in May (the first was in February).
• Engaging with a state-wide effort to bring CS education to all high schools