SAN DIEGO SUPERCOMPUTER CENTER
at the UNIVERSITY OF CALIFORNIA, SAN DIEGO
Emerging Technology Trends in Data-Intensive Supercomputing
Dr. Richard Moore
Deputy Director, San Diego Supercomputer Center, University of California, San Diego
USA
HPC Advisory Council Meeting, October 28, 2012, Zhangjiajie, China
Data Crisis: Information Big Bang
• PCAST Digital Data report; NSF Experts Study; Wired; Nature; industry; "Data-Enabled Science"
• Storage Networking Industry Association (SNIA) 100 Year Archive Requirements Survey Report: "there is a pending crisis in archiving… we have to create long-term methods for preserving information, for making it available for analysis in the future."
• 80% of respondents: >50 yrs; 68%: >100 yrs
Gordon – An Innovative Data-Intensive Supercomputer
• Designed to accelerate access to massive amounts of data in areas of genomics, earth science, engineering, medicine, and others.
• Emphasizes memory and I/O over FLOPS.
• Appro-integrated 1,024-node Sandy Bridge cluster.
• 300 TB of high-performance Intel flash.
• Large-memory supernodes via vSMP Foundation from ScaleMP.
• 3D torus interconnect from Mellanox.
• Built from commodity hardware.
• In production since February 2012.
• Funded by the NSF and available through Extreme Science and Engineering Discovery Environment (XSEDE) allocations.
The Memory Hierarchy of a Typical Supercomputer
[Figure: memory hierarchy diagram spanning shared-memory programming within a single node, shared-memory programming across nodes via vSMP, and message-passing programming, with a latency gap between memory and disk I/O, where big data resides.]
vSMP Aggregation Software
Gordon 32-way Supernode
[Figure: one supernode aggregates 32 dual-socket Sandy Bridge compute nodes plus two I/O nodes, each I/O node providing 4.8 TB of flash SSD and dual Westmere I/O processors, under the vSMP aggregation software.]
Gordon Specifications

INTEL SANDY BRIDGE COMPUTE NODE
Sockets & cores: 2 & 16
Clock speed: 2.6 GHz
DRAM capacity and speed: 64 GB, 1,333 MHz

INTEL 710 eMLC FLASH I/O NODE
NAND flash SSD drives: 16
SSD capacity per drive & per node: 300 GB; 16 × 300 GB = 4.8 TB

SMP SUPERNODE (VIA vSMP)
Compute nodes / I/O nodes: 32 / 2
Addressable DRAM: 2 TB
Addressable memory including flash: 11.6 TB

GORDON (AGGREGATE)
Compute nodes: 1,024
Compute cores: 16,384
Peak performance: 341 Tflop/s
DRAM/SSD memory: 64 TB DRAM, 300 TB SSD

INFINIBAND INTERCONNECT
Architecture: dual-rail, 3D torus
Link bandwidth: QDR
Vendor: Mellanox

LUSTRE-BASED DISK I/O SUBSYSTEM (SHARED)
Total storage (current/planned): 4 PB / 6 PB (raw)
Total bandwidth: 100 GB/s
Exporting & Preserving Flash Performance
• Several layers of overhead reduce performance (SATA, Linux, network)
• I/O models need to be driven by the applications
• No one has really done this before
• iSCSI over RDMA (iSER) was the best protocol
• XFS performs well
• Continue to explore alternatives based on user needs
Gordon 3D Torus Interconnect Fabric: 4x4x4 3D Torus Topology
[Figure: each torus vertex comprises two 36-port fabric switches, each with 18 x 4X InfiniBand network connections, serving 16 compute nodes and 2 I/O nodes. The dual-rail network increases bandwidth and provides redundancy, with a single connection from each node to each network. The ends of the 4x4x4 mesh are folded in all three dimensions to form a 3D torus.]
Trestles System Description (Table 2.1. Trestles System Specification)

AMD MAGNY-COURS COMPUTE NODE
Sockets: 4
Cores: 32
Clock speed: 2.4 GHz
Flop speed: 307 Gflop/s
Memory capacity: 64 GB
Memory bandwidth: 171 GB/s
STREAM Triad bandwidth: 100 GB/s
Flash memory (SSD): 160 GB

FULL SYSTEM
Total compute nodes: 324
Total compute cores: 10,368
Peak performance: 100 Tflop/s
Total memory: 20.7 TB
Total memory bandwidth: 55.4 TB/s
Total flash memory: 52 TB

QDR INFINIBAND INTERCONNECT
Topology: fat tree
Link bandwidth: 8 GB/s (bidirectional)
Peak bisection bandwidth: 5.2 TB/s (bidirectional)
MPI latency: 1.3 µs

DISK I/O SUBSYSTEM
File systems: NFS, Lustre
Storage capacity (usable): 150 TB (Dec 2010), 2 PB (June 2011), 4 PB (July 2012)
I/O bandwidth: 50 GB/s
The Majority of TeraGrid/XD Projects Have Modest-Scale Resource Needs
• "80/20" at ~512 cores (FY09)
• ~80% of projects never run a job larger than this…
• …and those projects use <20% of resources
• Only ~1% of projects run jobs as large as 16K cores, and those consume >30% of TG resources
• Many projects/users only need modest-scale jobs/resources
• A modest-size resource can therefore serve a large number of these projects/users
Exceedance distributions of projects and usage as a function of the largest job (core count) run by a project over a full year (FY2009)
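The exceedance fractions behind these bullets can be computed directly from per-project job records. A minimal sketch follows, assuming a hypothetical list of (largest job core count, total core-hours) pairs per project; the data below are synthetic, not the FY2009 accounting records.

```python
# Sketch of the exceedance calculation behind the "80/20" observation.
# Input is hypothetical: one (largest_job_cores, total_core_hours) pair per project.
import numpy as np

def exceedance(projects, thresholds):
    """For each core-count threshold, return the fraction of projects whose largest
    job exceeds it and the fraction of total usage those projects consume."""
    largest = np.array([p[0] for p in projects], dtype=float)
    usage = np.array([p[1] for p in projects], dtype=float)
    total = usage.sum()
    proj_frac = [(largest > t).mean() for t in thresholds]
    usage_frac = [usage[largest > t].sum() / total for t in thresholds]
    return proj_frac, usage_frac

# Synthetic example: most projects peak at modest core counts.
rng = np.random.default_rng(0)
sample = list(zip(2 ** rng.integers(4, 15, size=1000),   # largest job: 16 to 16K cores
                  rng.lognormal(10, 1, size=1000)))       # core-hours charged
print(exceedance(sample, thresholds=[512, 16384]))
```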
SDSC is Deploying a New Repertoire of Storage Systems
SDSC Cloud
• Purpose: storage of digital data for ubiquitous access and high durability
• Access: multi-platform web interface, S3-type interfaces, backup software
Data Oasis (PFS)
• Purpose: high-performance parallel file system for HPC systems; partitioned for scratch and medium-term parking space
• Access: Lustre on HPC systems (Gordon, Trestles, Triton)
Project Storage
• Purpose: typical project / home directory / user file server storage needs
• Access: NFS/CIFS, iSCSI
Data Oasis Heterogeneous Architecture
[Figure: 64 OSS (Object Storage Servers, 72 TB each) provide 100 GB/s of performance and >4 PB of raw capacity; JBODs (Just a Bunch Of Disks, 90 TB each) provide capacity scale-out to an additional 5.8 PB. Redundant Arista 7508 10G switches provide reliability and performance. Three distinct network architectures connect the clusters: 64 Lustre LNET routers (100 GB/s) for the Gordon IB cluster, a Mellanox 5020 bridge (12 GB/s) for the Trestles IB cluster, and a Myrinet 10G switch (25 GB/s) for the Triton Myrinet cluster. Metadata servers (MDS) serve Gordon scratch, Trestles scratch, Triton scratch, and Gordon & Trestles project file systems.]
SDSC Cloud: A Paradigm Shift for Long-Term Storage: Focus on Access, Sharing & Collaboration
• Launched September 2011
• Largest, highest-performance known academic cloud
• 5.5 petabytes (raw), 8 GB/sec
• Automatic dual-copy and verification
• Capacity and performance scale linearly to 100's of petabytes
• Open-source platform based on NASA and Rackspace software
• http://cloud.sdsc.edu
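Because the cloud exposes object-storage REST interfaces (the NASA/Rackspace software line referenced above underlies OpenStack Swift), routine uploads can be scripted. The sketch below is illustrative only: the endpoint URL, account path, and token are placeholders, not actual SDSC Cloud values.

```python
# Hedged sketch: storing a file through a Swift-style object API over HTTPS.
# The base URL and token are placeholders; substitute values issued by the service.
import requests

BASE = "https://cloud.example.edu/v1/AUTH_myproject"   # placeholder account endpoint
TOKEN = "<auth token from the cloud's identity service>"

def upload(container, object_name, local_path):
    """PUT a local file into an object container."""
    with open(local_path, "rb") as f:
        r = requests.put(f"{BASE}/{container}/{object_name}",
                         headers={"X-Auth-Token": TOKEN}, data=f)
    r.raise_for_status()

# Example call (requires a real endpoint and token):
# upload("simulation-output", "run42/spectrum.nc", "spectrum.nc")
```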
Applications of SDSC Cloud
• Shared/published/curated data collections
• HPC simulation data storage and sharing
• Web/portal applications and site hosting
• Application integration using supported APIs
• Serving images/videos
• Backup services
Data Curation – Pilot Projects
• Project is mid-way through a two-year pilot phase
  • How do lab personnel work with librarians to curate their data?
  • How much work is required to curate data, and what are the options?
  • What is a sustainable business model that RCI should invest in?
• Five representative programs selected as pilots (http://rci.ucsd.edu/pilots)
  • The Brain Observatory (Annese)
  • Open Topography (Baru)
  • Levantine Archaeology Laboratory (Levy)
  • SIO Geological Collections (Norris)
  • Laboratory for Computational Astrophysics (Wagner)
• Using existing tools whenever possible
  • Storage at SDSC, campus high-speed networking, Digital Asset Management System (DAMS) at UCSD Libraries, Chronopolis digital preservation network
• Also developing Data Management Plan tools and providing training
• Anticipate production curation services in mid-2013
Center for Large-scale Data Systems Research (CLDS)
• New Industry-Academia partnership led by SDSC
• Research focus: technical and management challenges of large-scale data systems in business, government and academia
• “Big Data 2015” and “How Much Information? Phase 2” research projects
• How IT systems are used and valued in industry verticals
• POC: Chaitan Baru ([email protected])
Predictive Analytics Center of Excellence (PACE): Bringing Together Academia, Government, and Industry
[Figure: PACE activities include informing, educating and training; developing standards and methodology; scalable high-performance data mining; fostering research and collaboration; maintaining a data mining repository of very large data sets; providing predictive analytics services; and bridging the gap between industry and academia.]
PACE: Natasha Balac
Applications
Computational Style Code: Answering the Question "Why Gordon?"
V M F
C T L
V: Uses vSMP aggregation software
C: Computationally intensive, leverages Sandy Bridge
M: Uses larger memory/core on Gordon (4 GB/core)
T: Threaded
F: Uses flash
L: Lustre I/O intensive
Breadth First Search Comparison using SSD and HDD
Source: Sandeep Gupta, San Diego Supercomputer Center. Used by permission. 2011.
Graphs are mathematical and computational representations of relationships among objects in a network. Such networks occur in many natural and man-made scenarios, including communication, biological, and social contexts. Understanding the structure of these graphs is important for uncovering relationships among their members.
• Implementation of the breadth-first search (BFS) graph algorithm developed by Munagala and Ranade
• 134 million nodes
• Flash drives reduced I/O time by a factor of 6.5x
• Problem converted from I/O bound to compute bound
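A minimal sketch of the level-synchronous structure behind this kind of external-memory BFS is shown below: each frontier is spilled to a file so only one level lives in memory, and vertices seen in the current or previous level are removed before expanding the next one. The adjacency callback and file layout are illustrative assumptions, not the actual SDSC implementation.

```python
# Sketch of level-by-level BFS in the spirit of Munagala-Ranade external-memory BFS:
# frontiers are written to files; duplicates and recently visited vertices are pruned.
import os

def bfs_levels(neighbors, source, workdir="bfs_levels"):
    """neighbors(v) -> iterable of vertex ids; writes one file per BFS level."""
    os.makedirs(workdir, exist_ok=True)
    prev, curr, level = set(), {source}, 0
    while curr:
        with open(os.path.join(workdir, f"level_{level}.txt"), "w") as f:
            f.writelines(f"{v}\n" for v in sorted(curr))
        nxt = set()
        for v in curr:
            nxt.update(neighbors(v))
        nxt -= curr | prev          # duplicate elimination against the last two levels
        prev, curr, level = curr, nxt, level + 1
    return level                    # number of BFS levels written

# Toy example: an 8-vertex ring graph.
print(bfs_levels(lambda v: [(v - 1) % 8, (v + 1) % 8], source=0))
```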
Daphnia Genome Assembly using Velvet and vSMP
Source: Wayne Pfeiffer, San Diego Supercomputer Center. Used by permission.
Daphnia (a.k.a. the water flea) is a model species used for understanding mechanisms of inheritance and evolution, and as a surrogate species for studying human health responses to environmental changes.
De novo assembly of short DNA reads using the de Bruijn graph algorithm. Code parallelized using OpenMP directives.
Benchmark problem: Daphnia genome assembly from 44-bp and 75-bp reads using 35-mers
Photo: Dr. Jan Michels, Christian-Albrechts-University, Kiel
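For orientation, the sketch below shows the de Bruijn graph construction that Velvet-style assemblers perform: each read is decomposed into k-mers, and every k-mer adds an edge between its prefix and suffix (k-1)-mers. The reads and k value are toy examples, not the Daphnia data or Velvet's data structures.

```python
# Toy de Bruijn graph construction from short reads (illustrative, not Velvet's code).
from collections import defaultdict

def de_bruijn(reads, k):
    """Return adjacency map: (k-1)-mer prefix -> list of (k-1)-mer suffixes."""
    graph = defaultdict(list)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].append(kmer[1:])   # edge between overlapping (k-1)-mers
    return graph

reads = ["ACGTACGA", "GTACGATT"]
for prefix, suffixes in de_bruijn(reads, k=4).items():
    print(prefix, "->", suffixes)
```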
Foxglove Calculation using Gaussian 09 with vSMP - MP2 Energy Gradient Calculation
Source: Jerry Greenberg, San Diego Supercomputer Center. January 2012.
The Foxglove plant (Digitalis) is studied for its medicinal uses. Digoxin, an extract of the Foxglove, is used to treat a variety of conditions, including diseases of the heart. Recent research suggests it may also be a beneficial cancer treatment.
Time to solution: 43,000 s
Processor footprint: 4 nodes, 64 threads
Memory footprint: 10 nodes, 700 GB
(1 compute node = 16 cores, 64 GB)
Axial compression of caudal rat vertebra - Very large memory simulation using Abaqus and vSMP
Source: Matthew Goff, Chris Hernandez. Cornell University. Used by permission. 2012
The goal of the simulations is to analyze how small variances in boundary conditions affect high-strain regions in the model. The research goal is to understand the response of trabecular bone to mechanical stimuli. This is relevant to paleontologists inferring habitual locomotion of ancient people and animals, and to treatment strategies for populations with fragile bones, such as the elderly.
• 5 million quadratic, 8-noded elements
• Model created with a custom Matlab application that converts 253 micro-CT images into voxel-based finite element models
vSMP and flash provide a large memory capability and speed-up by allowing very large Abaqus FEM models to be run in-core, or mixed in-core/flash mode.
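To illustrate the voxel-based meshing step that the custom Matlab tool performs at much larger scale, the sketch below converts a segmented 3D image into shared-node, 8-node hexahedral elements. The array and threshold are toy stand-ins for the micro-CT data.

```python
# Toy voxel-to-hexahedral-mesh conversion (illustrative, not the Cornell Matlab tool).
import numpy as np

def voxel_mesh(image, threshold):
    """Return node coordinates and 8-node hexahedral elements for voxels above threshold."""
    corner_offsets = np.array([[0,0,0],[1,0,0],[1,1,0],[0,1,0],
                               [0,0,1],[1,0,1],[1,1,1],[0,1,1]])
    nodes, elements = {}, []
    for ijk in np.argwhere(image > threshold):        # indices of "bone" voxels
        elem = []
        for off in corner_offsets:
            key = tuple(ijk + off)                     # share corner nodes with neighbours
            elem.append(nodes.setdefault(key, len(nodes)))
        elements.append(elem)
    coords = np.array(sorted(nodes, key=nodes.get), dtype=float)
    return coords, np.array(elements)

coords, elems = voxel_mesh(np.random.default_rng(2).random((8, 8, 8)), threshold=0.7)
print(coords.shape, elems.shape)
```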
Massive Data Analysis of Large-eddy Simulation of Deep Convection in Atmosphere (Clouds) using vSMP
Simulation Details
• GigaLES model run dataset (partial)
• 40 time steps (24-hour simulation)
• 256 vertical layers
• 204.8 x 204.8 kilometers
• 100 m horizontal resolution
R Analysis
• 160 GB data set (40 netCDF files @ 4 GB each)
• 340 GB memory footprint
• ~3.5 hours for data input and analysis
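The analysis is essentially "read many netCDF time-step files, reduce them to summary statistics." A minimal Python sketch of that pattern follows (the actual analysis was done in R); the file-name pattern and the variable name "QC" are assumptions, not the real GigaLES schema.

```python
# Illustrative pattern: stream per-time-step netCDF files and reduce to a profile.
# File glob and variable name are hypothetical placeholders for the GigaLES output.
import glob
import numpy as np
from netCDF4 import Dataset

profiles = []
for path in sorted(glob.glob("gigales_step_*.nc")):      # hypothetical file pattern
    with Dataset(path) as nc:
        field = nc.variables["QC"][:]                     # assumed (layers, y, x) layout
        profiles.append(field.mean(axis=(1, 2)))          # horizontal mean per layer
if profiles:
    mean_profile = np.mean(profiles, axis=0)              # average over time steps
    print(mean_profile.shape)
```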
The Center for Multi-scale Modeling of Atmospheric Processes (CMMAP) is an NSF Science and Technology Center focused on improving the representation of cloud processes in climate models.
• System for Atmospheric Modeling: M. Kharoutdinov, SUNY Stonybrook
• Visualization: J. Helly, A. Chourasia
• Analysis: J. Helly, S. Strande
Cosmology simulation: Matter power spectrum measurement using vSMP
Source: Rick Wagner, Michael L. Norman. SDSC.
The goal is to measure the effect of the light from the first stars on the evolution of the universe. To quantitatively compare the matter distribution of each simulation, we use radially binned 3D power spectra.
• 2 simulations
• 3200³ uniform 3D grids
• 15k+ files each
[Figure: individual simulations, their difference, and the resulting power spectra]
• Existing OpenMP code
• ~256 GB memory used
• ~5.5 hours per field
• Zero development effort
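A radially binned 3D power spectrum is a forward FFT of the density contrast followed by averaging |δ(k)|² in spherical shells of |k|. The sketch below performs that calculation on a tiny random grid; it is illustrative only and is not the project's OpenMP code.

```python
# Illustrative radially binned 3D power spectrum (toy grid, not the 3200^3 fields).
import numpy as np

def power_spectrum(density, nbins=16):
    """Return bin-center wavenumbers and the shell-averaged power spectrum."""
    n = density.shape[0]
    delta = density / density.mean() - 1.0                 # density contrast
    power = np.abs(np.fft.fftn(delta)) ** 2 / delta.size
    k = np.fft.fftfreq(n) * n                               # wavenumbers in grid units
    kx, ky, kz = np.meshgrid(k, k, k, indexing="ij")
    kmag = np.sqrt(kx**2 + ky**2 + kz**2).ravel()
    bins = np.linspace(0.0, kmag.max(), nbins + 1)
    which = np.digitize(kmag, bins)
    pk = [power.ravel()[which == i].mean() for i in range(1, nbins + 1)]
    return 0.5 * (bins[1:] + bins[:-1]), np.array(pk)

k, pk = power_spectrum(np.random.default_rng(1).random((64, 64, 64)))
print(k[:4], pk[:4])
```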
Impact of high-frequency trading on financial markets
Source: Mao Ye, Dept. of Finance, U. Illinois. Used by permission. 6/1/2012.
To determine the impact of high-frequency trading activity on financial markets, it is necessary to construct nanosecond-resolution limit order books: records of all unexecuted orders to buy/sell stock at a specified price. Analysis provides evidence of quote stuffing, a manipulative practice that involves submitting a large number of orders with immediate cancellation to generate congestion.
Time to construct the limit order books is now under 15 minutes for the threaded application using 16 cores on a single Gordon compute node.
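Constructing a limit order book amounts to replaying the message stream: add messages create resting orders, and cancellations or executions remove them, while the book tracks outstanding shares at each price level. The sketch below is heavily simplified; the message format is a stand-in for the real nanosecond-resolution feed, not the study's code.

```python
# Simplified limit order book replay (illustrative message format, not a real feed).
from collections import defaultdict

def build_book(messages):
    """messages: (action, order_id, side, price, shares) tuples; returns the final book."""
    orders = {}                                              # order_id -> (side, price, shares)
    book = {"B": defaultdict(int), "S": defaultdict(int)}    # price -> outstanding shares
    for action, oid, side, price, shares in messages:
        if action == "add":
            orders[oid] = (side, price, shares)
            book[side][price] += shares
        elif action in ("cancel", "execute"):
            s, p, sh = orders.pop(oid)
            book[s][p] -= sh
    return book

msgs = [("add", 1, "B", 100.00, 500),
        ("add", 2, "S", 100.05, 300),
        ("cancel", 1, None, None, None),      # immediate cancellation, the quote-stuffing pattern
        ("add", 3, "B", 99.99, 200)]
print({side: dict(levels) for side, levels in build_book(msgs).items()})
```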
Protein Data Bank Query Comparisons with a DB2 Database on 2 Gordon I/O Nodes: HDDs vs. SSDs
Source: Vishwinath Nandigam, San Diego Supercomputer Center. 2011.
The Protein Data Bank (PDB) is the single worldwide repository of information about the 3D structures of large biological molecules. These are the molecules of life found in all organisms. Understanding the shape of a molecule helps to understand how it works.
• For single queries, HDDs and SSDs perform about the same.
• For concurrent queries, SSDs achieve a large speedup.
• Q5B is >10x faster, and performance varies by type of query.
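The concurrent-query comparison reduces to issuing the same query from many clients at once and timing the batch. The harness below sketches that pattern; run_query() is a placeholder for the actual DB2 client call, and its sleep is a synthetic stand-in for server-side work.

```python
# Sketch of a concurrent-query timing harness; run_query() is a placeholder, not a DB2 API.
import time
from concurrent.futures import ThreadPoolExecutor

def run_query(sql):
    time.sleep(0.1)            # stand-in for submitting sql and fetching results

def timed_concurrent(sql, nclients):
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=nclients) as pool:
        list(pool.map(lambda _: run_query(sql), range(nclients)))
    return time.perf_counter() - start

for n in (1, 4, 16):
    print(n, "concurrent clients:", round(timed_concurrent("SELECT ...", n), 3), "s")
```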
Classification of sensor time series data
Source: Ramon Huerta, UCSD Bio Circuits Institute. Used by permission. 6/1/2012.
Chemical sensors (e-noses) will be placed in the homes of elderly participants in an effort to continuously and non-intrusively monitor their living environments. Time series classification algorithms will then be applied to the sensor data to detect anomalous behavior that may suggest a change in health status.
After optimizing the code, linking Intel's MKL, and porting to Gordon, runtime was reduced from 15.5 hours to 8 minutes.
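Much of the classification cost is dense linear algebra, which is why linking against an optimized BLAS such as Intel's MKL pays off. The sketch below reduces a nearest-centroid style classification (not necessarily the study's algorithm) to one matrix product; the window length and class count are illustrative.

```python
# Illustrative dense-linear-algebra core of a time-series classifier; the speed of the
# matrix product depends on the BLAS (e.g., Intel MKL) that NumPy is linked against.
import numpy as np

rng = np.random.default_rng(0)
windows = rng.random((10000, 256))       # sensor windows: samples x features (toy sizes)
centroids = rng.random((8, 256))         # one template per behaviour class

scores = windows @ centroids.T           # all-pairs similarity in a single BLAS call
labels = scores.argmax(axis=1)           # nearest-centroid label per window
print(np.bincount(labels))
```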
Summary
• Data-intensive supercomputing requires new approaches, not just more storage capacity
• Gordon is targeted at new classes of data-intensive applications using SSDs and vSMP
• Data Oasis is a robust, heterogeneous, high-bandwidth file system
• SDSC Cloud facilitates data access, sharing, search, and discovery
• Curation pilots are bridging libraries and researchers
• After the hardware, it takes people & expertise to bring the impact to science applications!
Thank you!
谢谢 (Thank you!)
Richard Moore [email protected]