Clemson: Solving the HPC Data Deluge
Clemson HPC Storage
Dell Panel, SC13
Boyd Wilson, Software CTO, Clemson University
Outline
• Palmetto Cluster
• Wide Area Storage Across the Innovation Platform
• Collective Cluster (Real-Time Data Aggregation and Analytics Cluster)
• Performance Numbers
• Research DMZ/Network
Palmetto Storage
Primary Research Cluster at Clemson
• 1972 nodes
• 22,928 cores
• 998,400 CUDA cores
• 396 TF (only the newest GPU nodes were benchmarked)
• ~120+ TF additional, not benchmarked
• Condominium model
• Home storage: SAMQFS backed by an SL8500 (6PB)
• Scratch: OrangeFS
SAM QFS Home and Archive on SL8500
Palmetto Storage
Scratch
• 32 R510
• 16 R720
• 512TB OrangeFS (v2.8.8; mount sketch below)
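For orientation, here is a minimal sketch of how a client node might attach an OrangeFS 2.8.x scratch volume with the PVFS2 kernel client; the install paths, server hostname, and filesystem name are illustrative assumptions, not Palmetto's actual configuration.

# Load the PVFS2 kernel module and start the client daemon
# (OrangeFS 2.8.x still uses the pvfs2 naming; paths are placeholders).
insmod /opt/orangefs/lib/modules/pvfs2.ko
/opt/orangefs/sbin/pvfs2-client -p /opt/orangefs/sbin/pvfs2-client-core

# Mount the scratch volume; "ofs-server" and "orangefs" stand in for
# the real metadata server hostname and fs_name.
mount -t pvfs2 tcp://ofs-server:3334/orangefs /mnt/scratch

# Verify that every server in the volume responds.
pvfs2-ping -m /mnt/scratch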
[Diagram: Palmetto storage layout. 200 FDR IB nodes (400 Nvidia K20, 396 TF) and 1622 MX nodes (96 TF) connect over FDR IB and 10G MX; NFS home/archive (SAMQFS over NFS, 120TB disk, 6PB tape) is served over 10G Ethernet; 96 IB nodes provide Innovation Platform and campus data access.]
Palmetto Scratch Next Steps
• 32 Dell R720
• 520TB Scratch
• OrangeFS
• WebDAV to OrangeFS (see the sketch below)
• Hadoop over OrangeFS with myHadoop
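Because WebDAV exposes scratch over plain HTTP, any DAV-capable client can move data in and out without a cluster login. A minimal sketch with curl, where the gateway URL is a placeholder, not Clemson's actual endpoint:

# Upload a file to OrangeFS through the WebDAV gateway (PUT via -T);
# https://dav.example.edu/scratch is a placeholder URL.
curl -u "$USER" -T results.dat https://dav.example.edu/scratch/results.dat

# List one directory level with a WebDAV PROPFIND request.
curl -u "$USER" -X PROPFIND -H "Depth: 1" https://dav.example.edu/scratch/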
[Diagram: next-step scratch layout. 200 FDR IB nodes (400 Nvidia K20 GPUs, 396 TF) and 1622 MX nodes (96 TF) connect over FDR IPoIB and 10G IPoMX; WebDAV is served over multiple 10G Ethernet links through the ScienceDMZ, with multiple 10G Eth / 100G beyond.]
Clemson – USC 100Gb tests
• File write: 37Gb/s
• Server hardware problems and network packet loss occurred during the tests
• PerfSonar: 49Gb/s initially; a later retest reached ~70Gb/s with tuning
• Additional file testing is planned (the initial test systems had to move to production)
[Diagram: OrangeFS clients reaching 12 Dell R720 OrangeFS servers over the 100Gb link]
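For context on how numbers like these are gathered, a sketch of a perfSONAR-style path test plus the usual host tuning for long, fat pipes; the hostname and buffer sizes are illustrative assumptions, not the values used in the Clemson–USC tests.

# Memory-to-memory throughput test between perfSONAR hosts
# (bwctl ships with the perfSONAR toolkit; ps.remote.edu is a placeholder).
bwctl -c ps.remote.edu -t 30 -f m

# Raise socket buffer ceilings so TCP can fill a high
# bandwidth-delay-product path (example values).
sysctl -w net.core.rmem_max=536870912
sysctl -w net.core.wmem_max=536870912
sysctl -w net.ipv4.tcp_rmem="4096 87380 536870912"
sysctl -w net.ipv4.tcp_wmem="4096 65536 536870912"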
SC13 Demo
[Diagram: OrangeFS clients connected to 16 Dell R720 OrangeFS servers on the SC13 floor]
• Clemson
• USC
• I2
• Omnibond
[Diagram: Innovation Platform data access, campus data access, and social data input]
The “Collective” Cluster
• 12 R720
• 170TB
• D3-based vis toolkit called SocialTap
• Social media aggregation via GNIP
• Elasticsearch (query sketch below)
• Hadoop MapReduce
• OrangeFS
• WebDAV to OrangeFS
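To illustrate the read side of the pipeline, a minimal sketch of querying the aggregated social data through Elasticsearch's search API; the hostname, index name, and field are hypothetical, not the cluster's real schema.

# Match query against an assumed "tweets" index; es-node, tweets,
# and the "text" field are placeholders.
curl -s 'http://es-node:9200/tweets/_search?pretty' -d '{
  "query": { "match": { "text": "clemson" } },
  "size": 10
}'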
[Diagram: the Collective cluster linked to Palmetto, serving WebDAV over multiple 10G Ethernet links through the ScienceDMZ]
OrangeFS on Dell R720s
• 16 Dell R720 servers connected with 10Gb/s Ethernet
• 32 clients reached nearly 12GB/s read and 8GB/s write
# Write
iozone -i 0 -c -e -w -r $RS -s 4g -t $NUM_PROCESSES -+n -+m $CLIENT_LIST
# Read
iozone -i 1 -c -e -w -r $RS -s 4g -t $NUM_PROCESSES -+n -+m $CLIENT_LIST
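The variables above are set per run: $RS is the record size, $NUM_PROCESSES the number of parallel iozone processes, and $CLIENT_LIST the -+m file naming the participating clients. iozone's -+m format is one client per line: hostname, working directory, and path to the iozone binary. A sketch with illustrative values:

# Illustrative parameters for the distributed runs above.
RS=4m
NUM_PROCESSES=32
CLIENT_LIST=clients.txt

# -+m file format: <hostname> <workdir> <path-to-iozone>
cat > "$CLIENT_LIST" <<'EOF'
client01 /mnt/scratch/iozone /usr/local/bin/iozone
client02 /mnt/scratch/iozone /usr/local/bin/iozone
EOF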
MapReduce over OrangeFS
• 8 Dell R720 servers connected with 10Gb/s Ethernet
• The remote case adds an additional 8 identical servers and does all OrangeFS work remotely; only local work is done on the compute nodes (the traditional HPC model)
• *25% improvement with OrangeFS running on separate nodes from MapReduce
MapReduce over OrangeFS
• 16 Dell R720 servers connected with 10Gb/s Ethernet
• Remote clients are Dell R720s with single SAS disks for local data (vs. the 12-disk arrays in the previous test); a timing sketch follows below
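One way to compare the local and remote layouts is to time the same stock MapReduce job against each; teragen and terasort ship in the Hadoop examples jar, and the row count and paths here are illustrative assumptions, not the benchmark actually used.

# Generate 10^8 100-byte rows (~10GB) on the OrangeFS-backed store,
# then time the sort; /bench is a placeholder path.
hadoop jar "$HADOOP_HOME"/hadoop-examples-*.jar teragen 100000000 /bench/in
time hadoop jar "$HADOOP_HOME"/hadoop-examples-*.jar terasort /bench/in /bench/out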
Clemson Research Network
[Network diagram: Brocade MLx32 core router carrying a 100Gig tagged trunk; Science DMZ built from a Dell Z9000 and Dell S4810 switches with PerfSonar hosts at each edge; perimeter F/W, F/W (ACL) and route filter, and a peer link to the C-Light collaborator; uplinks to Internet/I2/NLR and the I2 Innovation Platform (CC-NIE); PalmettoNet, campus, and Internet connections with host firewalls; top-of-rack switches and SamQFS over Fibre Channel.]