SAN DIEGO SUPERCOMPUTER CENTER
at the UNIVERSITY OF CALIFORNIA, SAN DIEGO
A New NSF TeraGrid Resource for Data-Intensive Science

Michael L. Norman, Principal Investigator; Director, SDSC
Allan Snavely, Co-Principal Investigator; Project Scientist
Coping with the data deluge
• Advances in computing technology have resulted in a “Moore’s Law for data”: the amount of digital data produced by instruments (DNA sequencers, CCD cameras, telescopes, MRIs, etc.) doubles every 18 months
• The density of storage media is keeping pace with Moore’s Law, but I/O rates are not, so the time needed to process these exponentially growing datasets is itself growing exponentially (see the sketch below)
• Latency for random access is limited by the speed of the disk read head
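A minimal sketch of that point, using assumed numbers (a 1 TB starting dataset and a fixed ~100 MB/s sequential stream; neither figure is from the slides): if data doubles every 18 months while bandwidth stays flat, the time for a full scan doubles on the same schedule.

```python
# Illustrative only: dataset size under an 18-month doubling time versus the
# time to scan it at a fixed sequential bandwidth. Starting size and
# bandwidth are assumptions, not Gordon or PTF figures.
initial_tb = 1.0              # assumed starting dataset size (TB)
doubling_months = 18          # doubling period quoted on the slide
bandwidth_tb_per_hr = 0.36    # ~100 MB/s HDD stream = 0.36 TB/hour (assumed)

for years in (0, 2, 4, 6):
    size_tb = initial_tb * 2 ** (years * 12 / doubling_months)
    scan_hours = size_tb / bandwidth_tb_per_hr
    print(f"year {years}: {size_tb:6.1f} TB, full scan ~ {scan_hours:6.1f} h")
```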
What is Gordon?
• A “data-intensive” supercomputer based on flash SSD memory and virtual shared-memory software
• Emphasizes memory capacity and IOPS over FLOPS
• A system designed to accelerate access to the massive databases being generated in all fields of science, engineering, medicine, and social science
• Random I/O to SSD is 10-100x faster than to HDD (see the estimate below)
• In production in 1Q2012
• A working prototype called Dash is available now for testing and evaluation (LSST, PTF)
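The 10-100x claim follows from device latencies alone. A back-of-the-envelope estimate with typical figures for the era (the latencies below are assumptions, not Dash measurements):

```python
# Rough IOPS from access latency: one random operation per seek/read.
hdd_seek_s = 0.007     # ~7 ms average seek + rotation for a disk head (assumed)
ssd_read_s = 0.0001    # ~100 us random read for flash (assumed)

hdd_iops = 1 / hdd_seek_s    # ~140 IOPS per spindle
ssd_iops = 1 / ssd_read_s    # ~10,000 IOPS per device
print(f"HDD ~{hdd_iops:.0f} IOPS, SSD ~{ssd_iops:.0f} IOPS "
      f"(~{ssd_iops / hdd_iops:.0f}x)")
```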
The Memory Hierarchy of a Typical HPC Cluster
[Diagram: memory hierarchy of a typical cluster; shared-memory programming within a node, message-passing programming across nodes, then a latency gap before disk I/O]
The Memory Hierarchy of Gordon
[Diagram: Gordon's memory hierarchy; flash SSD fills the latency gap between the shared-memory tier and disk I/O]
Gordon’s 3 Key Innovations
• Fill the latency gap with large amounts of flash SSD (totals cross-checked in the sketch below):
  • 256 TB
  • >35 million IOPS
• Aggregate CPU, DRAM, and SSD resources into 32 shared-memory supernodes for ease of use, each with:
  • 8 TFLOPS
  • 2 TB DRAM
  • 8 TB SSD
• High-performance parallel file system:
  • 4 PB
  • >100 GB/s sustained
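A quick cross-check that the machine-wide numbers follow from the per-supernode figures on this slide, plus the 560,000 IOPS per I/O node quoted in the reserve slides (arithmetic only, no outside assumptions):

```python
# Aggregate Gordon figures from per-supernode figures on this slide.
supernodes = 32
io_nodes_per_sn, iops_per_io_node = 2, 560_000   # from the reserve slides

print(f"peak:  ~{supernodes * 8} TFLOPS")    # 8 TF/supernode -> 256 TF peak
print(f"DRAM:   {supernodes * 2} TB")        # 64 TB
print(f"flash:  {supernodes * 8} TB")        # 256 TB, matching the total above
iops = supernodes * io_nodes_per_sn * iops_per_io_node
print(f"IOPS:   {iops / 1e6:.2f} million")   # 35.84M -> the ">35 million IOPS" above
```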
Results from Dash*: a working prototype of Gordon
*available as a TeraGrid resource
Palomar Transient Factory (PTF), in collaboration with Peter Nugent
• Nightly wide-field surveys using the Palomar Schmidt telescope
• Image data are sent to LBL for archiving and analysis
• 100 new transients are found every minute
• Identifying candidates requires large, random queries across multiple databases
PTF-DB Transient Search

               Forward Q1     Backward Q1
DASH-IO-SSD    11 s (145x)    100 s (24x)
Existing DB    1600 s         2400 s

Random queries requesting very small chunks of data about the candidate observations; speedups in parentheses are relative to the existing database (checked below).
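The parenthetical speedups are just the ratio of the existing-database time to the Dash SSD time for the same query; nothing below is assumed beyond the table:

```python
# Verify the speedup factors quoted in the table above.
existing = {"forward Q1": 1600, "backward Q1": 2400}   # seconds
dash_ssd = {"forward Q1": 11,   "backward Q1": 100}    # seconds

for q in existing:
    print(f"{q}: {existing[q] / dash_ssd[q]:.0f}x faster on Dash SSD")  # 145x, 24x
```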
MOPS Application Runs Faster Under vSMP than on Hardware SMP and ccNUMA

The Moving Object Pipeline System (MOPS) is used for asteroid tracking as part of the Large Synoptic Survey Telescope (LSST) project. Collaboration with Jonathan Myers, LSST Corp.
• The algorithm is serial (no MPI)
• 135 GB of memory is required for the test case (see the sketch below)
• Dash node: dual-socket, 8-core Nehalem with 48 GB of memory
• Triton PDAF: 8-socket, 32-core Shanghai node with 256 GB of memory
• Ember: an SGI Altix UV system with 384 Nehalem cores and 2 TB of RAM in a single system image (SSI)
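Why vSMP matters here, as a hypothetical sketch (this is not MOPS code, and the allocation only succeeds where enough memory is visible): the 135 GB working set is nearly three times one Dash node's 48 GB, but vSMP aggregates the DRAM of a supernode's nodes into a single address space, so an ordinary serial program can allocate it without MPI or manual sharding.

```python
# Hypothetical illustration: one flat allocation sized to the MOPS test case.
# On a single 48 GB Dash node this would fail; under vSMP the supernode's
# aggregated DRAM appears as one address space, so it can succeed unmodified.
import numpy as np

working_set_gb = 135                    # from the slide
n = working_set_gb * 2**30 // 8         # number of float64 elements
data = np.zeros(n)                      # serial code: no MPI, no sharding
print(f"allocated {data.nbytes / 2**30:.0f} GB in one array")
```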
Gordon Training Events (http://www.sdsc.edu/us/training/)
• Getting Ready for Gordon: Using vSMP
• May 10-11, 2011 (next week)
• Will be recorded for web download
• Getting Ready for Gordon: Summer Institute
• August 6-17, 2011
• Contact Susan Rathbun (susan@sdsc.edu)
How to get time
• NOW: Request a start-up allocation on Dash at https://www.teragrid.org/web/user-support/startup
• After Sept 2011: Request a start-up or large allocation on Gordon at https://www.teragrid.org/web/user-support/allocations
• For more information, see http://www.sdsc.edu/us/resources/dash/
• Or email me at MLNORMAN@UCSD.EDU
RESERVE SLIDES
Gordon Architecture: “Supernode”
• 32 Appro Extreme-X compute nodes, each:
  • Dual-processor Intel Sandy Bridge
  • 240 GFLOPS
  • 64 GB RAM
• 2 Appro Extreme-X I/O nodes with Intel SSD drives:
  • 4 TB each
  • 560,000 IOPS
• ScaleMP vSMP virtual shared memory (totals cross-checked below):
  • 2 TB RAM aggregate
  • 8 TB SSD aggregate

[Diagram: supernode schematic; 240 GF compute nodes with 64 GB RAM each and a 4 TB SSD I/O node, bound into one system by vSMP memory virtualization]
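The per-supernode totals follow directly from the component counts above (arithmetic only):

```python
# Supernode composition from this slide: 32 compute nodes + 2 I/O nodes.
compute_nodes, gflops_each, dram_gb_each = 32, 240, 64
io_nodes, ssd_tb_each = 2, 4

print(f"{compute_nodes * gflops_each / 1000:.2f} TFLOPS")  # 7.68 TF, quoted as 8 TF
print(f"{compute_nodes * dram_gb_each} GB DRAM")           # 2048 GB = 2 TB aggregate
print(f"{io_nodes * ssd_tb_each} TB SSD")                  # 8 TB aggregate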
Gordon Architecture: Full Machine
• 32 supernodes = 1024 compute nodes
• Dual-rail QDR InfiniBand network
• 3D torus (4x4x4); see the neighbor sketch below
• 4 PB rotating disk parallel file system
• >100 GB/s
[Diagram: 32 supernodes (SN) connected by the 3D torus network, with disk (D) storage along the bottom]
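A generic sketch of how a 3D torus is addressed (illustrative only; this is not Gordon's routing code): each switch position has six neighbors, with links that wrap around at the edges so the worst-case hop count stays low.

```python
# Neighbors of a position (x, y, z) in a 4x4x4 3D torus: one step along
# each axis in each direction, wrapping at the boundary.
def torus_neighbors(x, y, z, dim=4):
    steps = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    return [((x + dx) % dim, (y + dy) % dim, (z + dz) % dim) for dx, dy, dz in steps]

print(torus_neighbors(0, 0, 0))  # a corner position still has all 6 neighbors
```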
Gordon Aggregate Capabilities
Speed                 >200 TFLOPS
Memory (RAM)          64 TB
Memory (SSD)          256 TB
Memory (RAM + SSD)    320 TB
Ratio (memory/speed)  1.31 bytes/flop
I/O rate to SSDs      35 million IOPS
Network bandwidth     16 GB/s bidirectional
Network latency       1 µsec
Disk storage          4 PB
Disk I/O bandwidth    >100 GB/s
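The memory-to-speed ratio in the table follows from the figures above once the peak speed is taken unrounded (1,024 nodes x 240 GFLOPS, the basis of the “>200 TFLOPS” entry):

```python
# Derive the bytes/flop ratio from the aggregate figures in the table.
mem_tb = 64 + 256              # RAM + SSD from the table (TB)
peak_tflops = 1024 * 0.240     # 245.76 TF, quoted as ">200 TFLOPS"
print(f"{mem_tb / peak_tflops:.2f} bytes/flop")   # ~1.30, quoted as 1.31
```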