SAN DIEGO SUPERCOMPUTER CENTER
at the UNIVERSITY OF CALIFORNIA, SAN DIEGO
Michael L. Norman
Principal Investigator
Director, SDSC
Allan Snavely
Co-Principal Investigator
Project Scientist
A New NSF TeraGrid Resource for Data-Intensive Science
Slide 1
Coping with the data deluge
• Advances in computing technology have produced a “Moore’s Law for data”
• The amount of digital data from instruments doubles every 18 months (DNA sequencers, CCD cameras, telescopes, MRIs, etc.)
• The density of storage media keeps pace with Moore’s Law, but I/O rates do not
• The time to process these exponentially growing volumes of data is itself growing exponentially
• The latency of random access is limited by disk read-head speed
Slide 2
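As a rough illustration of the slide above (the baseline sizes and growth rates here are illustrative assumptions, not figures from the deck): if data volume doubles every 18 months while sequential I/O bandwidth grows more slowly, the time to scan a full dataset grows without bound.

```python
# Sketch: data doubling every 18 months vs. slower-growing I/O bandwidth.
# base_tb, base_gbps, and io_growth are assumed values for illustration.

def data_size(years, base_tb=100.0):
    """Data volume (TB) after `years`, doubling every 18 months (1.5 years)."""
    return base_tb * 2 ** (years / 1.5)

def scan_time_hours(years, base_tb=100.0, base_gbps=1.0, io_growth=1.2):
    """Hours to read the whole dataset at that year's I/O bandwidth."""
    tb = data_size(years, base_tb)
    gbps = base_gbps * io_growth ** years  # assumed ~20%/year bandwidth growth
    return (tb * 1000.0 / gbps) / 3600.0

for y in (0, 3, 6):
    print(y, round(data_size(y)), round(scan_time_hours(y), 1))
```

Even with I/O bandwidth compounding yearly, the scan time keeps rising, which is the “latency gap” Gordon targets.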
What is Gordon?
• A “data-intensive” supercomputer based on SSD flash memory and virtual shared-memory software
• Emphasizes memory capacity and IOPS over FLOPS
• Designed to accelerate access to the massive databases being generated in all fields of science, engineering, medicine, and social science
• Random I/O to SSD is 10–100x faster than to HDD
• In production in 1Q2012
• A working prototype, Dash, is available now for testing and evaluation (LSST, PTF)
Slide 3
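The 10–100x figure comes from random-access latency: each random read on a rotating disk pays a mechanical seek (typically several milliseconds), while an SSD read is tens of microseconds. A minimal microbenchmark sketch of the kind of access pattern involved (the file size, read count, and latency figures in the comments are illustrative assumptions, not Gordon measurements):

```python
# Sketch: average latency of random 4 KiB reads from a scratch file.
# On HDD each read pays a seek (~5-10 ms typical); on SSD it is usually
# tens of microseconds, which is where the 10-100x ratio comes from.
import os
import random
import tempfile
import time

def random_read_latency(path, n_reads=200, block=4096):
    """Average seconds per random `block`-byte read from `path`."""
    size = os.path.getsize(path)
    fd = os.open(path, os.O_RDONLY)
    try:
        t0 = time.perf_counter()
        for _ in range(n_reads):
            off = random.randrange(0, size - block)
            os.pread(fd, block, off)
        return (time.perf_counter() - t0) / n_reads
    finally:
        os.close(fd)

# Create a 16 MiB scratch file, measure, then clean up.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(os.urandom(16 * 1024 * 1024))
lat = random_read_latency(f.name)
print(f"avg random 4 KiB read: {lat * 1e6:.1f} us")
os.unlink(f.name)
```

On a cached file this mostly measures memory, so a real comparison would use a large file with caches dropped; the sketch only shows the access pattern.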
The Memory Hierarchy of a Typical HPC Cluster
[Figure: memory hierarchy of a typical HPC cluster — shared-memory programming within a node, message-passing programming across nodes, and a latency gap before disk I/O]
Slide 5
The Memory Hierarchy of Gordon
[Figure: the memory hierarchy of Gordon — shared-memory programming extends across the supernode, with flash SSD filling the latency gap before disk I/O]
Slide 6
Gordon’s 3 Key Innovations
• Fill the latency gap with large amounts of flash SSD:
  • 256 TB aggregate
  • >35 million IOPS
• Aggregate CPU, DRAM, and SSD resources into 32 shared-memory supernodes for ease of use; each supernode provides:
  • 8 TFLOPS
  • 2 TB DRAM
  • 8 TB SSD
• A high-performance parallel file system:
  • 4 PB
  • >100 GB/s sustained
Slide 7
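The per-supernode figures multiply out to the system-wide totals quoted in the first bullet (a quick consistency check using only numbers from the deck):

```python
# Consistency check: 32 supernodes x per-supernode DRAM/SSD capacities.
SUPERNODES = 32
DRAM_TB_PER_SN = 2
SSD_TB_PER_SN = 8

print(SUPERNODES * SSD_TB_PER_SN)   # 256 TB flash, matching the slide
print(SUPERNODES * DRAM_TB_PER_SN)  # 64 TB DRAM aggregate
```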
Results from Dash*:
a working prototype of Gordon
*available as a TeraGrid resource
Slide 8
Palomar Transient Factory (PTF) collab. with Peter Nugent
• Nightly wide-field surveys using Palomar Schmidt telescope
• Image data sent to LBL for archive/analysis
• 100 new transients every minute
• Large, random queries across multiple databases for IDs
Slide 9
PTF-DB Transient Search
|              | Forward Q1  | Backward Q1 |
|--------------|-------------|-------------|
| Dash I/O SSD | 11 s (145x) | 100 s (24x) |
| Existing DB  | 1600 s      | 2400 s      |

Random queries requesting very small chunks of data about the candidate observations.
Slide 10
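The speedup factors in the table follow directly from the raw query times (a quick check):

```python
# Speedup of the Dash SSD I/O node over the existing database, per query.
existing = {"forward": 1600, "backward": 2400}  # seconds, existing DB
dash_ssd = {"forward": 11, "backward": 100}     # seconds, Dash I/O SSD

for q in existing:
    print(q, round(existing[q] / dash_ssd[q]))  # forward 145, backward 24
```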
MOPS Application Runs Faster Under vSMP than on Hardware SMP and ccNUMA
The Moving Object Pipeline System (MOPS) is used in asteroid tracking, as part of the Large Synoptic Survey Telescope (LSST) project.
• The algorithm is serial (no MPI)
• 135 GB is required for the test case
• The Dash node is a dual-socket, 8-core Nehalem node with 48 GB of memory
• Triton PDAF is an 8-socket, 32-core Shanghai node with 256 GB of memory
• Ember is an SGI Altix UV system with 384 Nehalem cores and 2 TB RAM in a single system image (SSI)
Collab. with Jonathan Myers, LSST Corp.
Slide 11
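The point of the comparison is memory capacity: the serial 135 GB test case cannot run in core on a single 48 GB Dash node, but it can under vSMP aggregation or on the larger shared-memory machines. A sketch (the 16-node vSMP aggregate is an illustrative assumption, not a figure from the slide):

```python
# Which systems can hold the 135 GB MOPS working set in memory?
WORKING_SET_GB = 135
systems = {
    "Dash node (48 GB)": 48,
    "Dash vSMP, 16 nodes (assumed)": 16 * 48,
    "Triton PDAF (256 GB)": 256,
    "Ember Altix UV (2 TB)": 2048,
}
for name, gb in systems.items():
    print(name, "fits in core" if gb >= WORKING_SET_GB else "does not fit")
```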
Gordon Training Events http://www.sdsc.edu/us/training/
• Getting Ready for Gordon: Using vSMP
• May 10-11, 2011 (next week)
• Will be recorded for web download
• Getting Ready for Gordon: Summer Institute
• August 6-17, 2011
• Contact Susan Rathbun ([email protected])
Slide 12
How to get time
• NOW: Request a start-up allocation on Dash at https://www.teragrid.org/web/user-support/startup
• After Sept 2011: Request a start-up or large allocation on Gordon at https://www.teragrid.org/web/user-support/allocations
• For more information, see http://www.sdsc.edu/us/resources/dash/
• Or email me at [email protected]
Slide 13
RESERVE SLIDES
Gordon Architecture: “Supernode”
• 32 Appro Extreme-X compute nodes, each:
  • Dual-processor Intel Sandy Bridge
  • 240 GFLOPS
  • 64 GB RAM
• 2 Appro Extreme-X I/O nodes with Intel SSD drives:
  • 4 TB each
  • 560,000 IOPS
• ScaleMP vSMP virtual shared memory:
  • 2 TB RAM aggregate
  • 8 TB SSD aggregate
[Figure: supernode diagram — 240 GF compute nodes with 64 GB RAM each and a 4 TB SSD I/O node, bound together by vSMP memory virtualization]
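The supernode aggregates on this slide follow from the node-level figures (a quick consistency check; 7.68 TFLOPS is rounded to 8 on the earlier slide):

```python
# Per-supernode totals implied by the node-level figures.
COMPUTE_NODES = 32
GF_PER_NODE = 240
GB_RAM_PER_NODE = 64
IO_NODES = 2
TB_SSD_PER_IO_NODE = 4

print(COMPUTE_NODES * GF_PER_NODE / 1000)      # 7.68 TFLOPS ("8 TFLOPS" slide)
print(COMPUTE_NODES * GB_RAM_PER_NODE / 1024)  # 2.0 TB RAM aggregate
print(IO_NODES * TB_SSD_PER_IO_NODE)           # 8 TB SSD aggregate
```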
Gordon Architecture: Full Machine
• 32 supernodes = 1,024 compute nodes
• Dual-rail QDR InfiniBand network
  • 3D torus (4x4x4)
• 4 PB rotating-disk parallel file system
  • >100 GB/s
[Figure: 32 supernodes (SN) on the 3D torus interconnect, attached to the disk-based parallel file system (D)]
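On a 3D torus each dimension wraps around, which halves the worst-case hop count relative to a mesh. A sketch of minimal-hop distance on a 4x4x4 torus (the coordinate scheme is assumed for illustration; it is not taken from the deck):

```python
# Minimal hop count between two coordinates in a DIM x DIM x DIM torus:
# per axis, take the shorter of the direct path and the wrap-around path.
DIM = 4

def torus_hops(a, b):
    """Minimal hops between coordinate triples a and b in a DIM^3 torus."""
    return sum(min(abs(x - y), DIM - abs(x - y)) for x, y in zip(a, b))

print(torus_hops((0, 0, 0), (2, 2, 2)))  # 6: the diameter of a 4x4x4 torus
print(torus_hops((0, 0, 0), (3, 0, 0)))  # 1: the wrap-around link
```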
Gordon Aggregate Capabilities
| Speed              | >200 TFLOPS           |
| Mem (RAM)          | 64 TB                 |
| Mem (SSD)          | 256 TB                |
| Mem (RAM+SSD)      | 320 TB                |
| Ratio (MEM/SPEED)  | 1.31 bytes/flop       |
| I/O rate to SSDs   | 35 million IOPS       |
| Network bandwidth  | 16 GB/s bidirectional |
| Network latency    | 1 µs                  |
| Disk storage       | 4 PB                  |
| Disk I/O bandwidth | >100 GB/s             |
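The MEM/SPEED row is consistent with the rest of the table (a quick check: 320 TB of RAM+SSD at 1.31 bytes/flop implies roughly 244 TFLOPS, in line with the “>200 TFLOPS” speed row):

```python
# Back out the peak speed implied by the table's memory total and ratio.
mem_tb = 64 + 256   # RAM + SSD = 320 TB
ratio = 1.31        # bytes per flop, from the table

implied_tflops = mem_tb / ratio
print(mem_tb, round(implied_tflops, 1))  # ~244 TFLOPS implied
```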