Transcript
Page 1: Clemson: Solving the HPC Data Deluge

Clemson HPC Storage
Dell Panel, SC13
Boyd Wilson, Software CTO, Clemson University

Page 2: Clemson: Solving the HPC Data Deluge

Outline

• Palmetto Cluster
• Wide Area Storage Across the Innovation Platform
• Collective Cluster (Real-Time Data Aggregation and Analytics Cluster)
• Performance Numbers
• Research DMZ/Network

Page 3: Clemson: Solving the HPC Data Deluge

Palmetto Storage

Primary Research Cluster at Clemson
• 1972 nodes
• 22928 Cores
• 998400 CUDA Cores
• 396 TF (only benchmarked newest GPU nodes)
• ~120+ TF additional, not benchmarked
• Condominium Model
• Home Storage: SAMQFS backed by SL8500 (6PB)
• Scratch: OrangeFS

Page 4: Clemson: Solving the HPC Data Deluge

Palmetto Storage

SAM QFS Home and Archive on SL8500

Scratch
• 32 R510
• 16 R720
• 512TB OrangeFS (v2.8.8)

NFS Home/Archive
• SAMQFS over NFS
• 120TB Disk
• 6PB Tape

FDR IB Nodes
• 200 Nodes
• 400 Nvidia K20
• 396 TF

MX Nodes
• 1622 Nodes
• 96 TF

Network links (diagram labels): FDR IB, 10G MX, 10G Eth

96  IB  Nodes  with    
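
As a quick consistency check on the GPU figures, the sketch below divides the 998400 CUDA cores quoted for Palmetto by the 400 Nvidia K20 cards listed on this slide, and does the same for the 396 TF benchmark figure. The variable names are illustrative, and attributing every CUDA core and all 396 TF to these 400 cards is an assumption rather than something the slides state outright.

```python
# Figures quoted on the slides.
cuda_cores_total = 998_400   # "998400 CUDA Cores" (Palmetto overview slide)
k20_gpus = 400               # "400 Nvidia K20"
gpu_nodes = 200              # FDR IB Nodes: "200 Nodes"
benchmarked_tf = 396         # "396 TF"

# Assumption: every CUDA core and all 396 TF belong to the 400 K20 cards.
gpus_per_node = k20_gpus / gpu_nodes
cores_per_gpu = cuda_cores_total / k20_gpus
tf_per_gpu = benchmarked_tf / k20_gpus

print(f"GPUs per node:      {gpus_per_node:.0f}")
print(f"CUDA cores per GPU: {cores_per_gpu:.0f}")
print(f"TF per GPU:         {tf_per_gpu:.2f}")
```

Under that assumption the total works out to 2496 CUDA cores per card, which is the core count of a single K20, so the quoted total appears to count only the K20 cards.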

Page 5: Clemson: Solving the HPC Data Deluge

Palmetto Scratch Next Steps

• 32 Dell R720
• 520TB Scratch
• OrangeFS
• WebDAV to OrangeFS
• Hadoop over OrangeFS with MyHadoop

FDR IB Nodes
• 200 Nodes
• 400 Nvidia K20 GPU
• 396 TF

MX Nodes
• 1622 Nodes
• 96 TF

Diagram labels: Innovation Platform Data Access, Campus Data Access, FDR IPoIB, 10G IPoMX, WebDAV, Multiple 10G Eth, ScienceDMZ, Multiple 10G Eth / 100 G

Page 6: Clemson: Solving the HPC Data Deluge

Clemson – USC 100Gb tests

• OrangeFS Clients
• 12 Dell R720 OrangeFS Servers
• File Write 37Gb/s
• Server hardware problems and network packet loss during tests
• perfSONAR 49Gb/s initial
• Later retest ~70Gb/s with tuning
• Additional file testing planned (initial testing systems had to move to production)
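
The rates on this slide are quoted in gigabits per second against a 100Gb path. A small sketch, assuming a nominal 100 Gb/s line rate and decimal units (8 Gb per GB), converts each figure into link utilisation and gigabytes per second. The dictionary labels and the `summarize` helper are illustrative names, not part of any tool used in the tests.

```python
LINK_GBPS = 100.0  # nominal line rate of the Clemson-USC path (assumed exact)

# Rates quoted on the slide, in Gb/s.
measured_gbps = {
    "OrangeFS file write": 37.0,
    "perfSONAR initial": 49.0,
    "perfSONAR retest (approx.)": 70.0,
}

def summarize(rate_gbps, link_gbps=LINK_GBPS):
    """Return (fraction of the link used, rate in GB/s) for a rate in Gb/s."""
    return rate_gbps / link_gbps, rate_gbps / 8.0

for label, rate in measured_gbps.items():
    utilization, gbytes = summarize(rate)
    print(f"{label:28s} {rate:5.1f} Gb/s  {utilization:4.0%} of link  {gbytes:6.3f} GB/s")
```

Read this way, the file write test used a little over a third of the nominal link, and the tuned perfSONAR retest roughly 70% of it.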

Page 7: Clemson: Solving the HPC Data Deluge

SC13 Demo

• Clemson
• USC
• I2
• Omnibond

Diagram labels: OrangeFS Clients, OrangeFS Clients, 16 Dell R720 OrangeFS Servers, SC13 Floor

Page 8: Clemson: Solving the HPC Data Deluge

The “Collective” Cluster

• 12 R720
• 170TB
• D3 based Vis Toolkit called SocalTap
• Social Media Aggregation Via GNIP
• Elastic Search
• Hadoop MapReduce
• OrangeFS
• WebDAV to OrangeFS

Diagram labels: Innovation Platform Data Access, Campus Data Access, Social Data Input, Palmetto, WebDAV, Multiple 10G Eth, ScienceDMZ

Page 9: Clemson: Solving the HPC Data Deluge

OrangeFS on Dell R720s

• 16 Dell R720 Servers Connected with 10Gb/s Ethernet
• 32 Clients reached nearly 12GB/s read and 8GB/s write

# Write
iozone -i 0 -c -e -w -r $RS -s 4g -t $NUM_PROCESSES -+n -+m $CLIENT_LIST
# Read
iozone -i 1 -c -e -w -r $RS -s 4g -t $NUM_PROCESSES -+n -+m $CLIENT_LIST
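
To put the aggregate figures in context against the 10Gb/s Ethernet links, the sketch below divides them across the 16 servers and 32 clients. It assumes the load is spread evenly, that each server has a single 10Gb/s link, that "nearly 12GB/s" can be rounded to 12, and that units are decimal (8 Gb per GB). The `per_node` helper is an illustrative name.

```python
SERVERS = 16
CLIENTS = 32
LINK_GBPS = 10.0  # assumed: one 10Gb/s Ethernet link per server

# Aggregate throughput in GB/s, rounded from the slide.
aggregate_gbytes = {"read": 12.0, "write": 8.0}

def per_node(total_gbytes, nodes):
    """Split an aggregate GB/s figure evenly; return (GB/s, Gb/s) per node."""
    share = total_gbytes / nodes
    return share, share * 8.0

for op, total in aggregate_gbytes.items():
    srv_gbytes, srv_gbits = per_node(total, SERVERS)
    cli_gbytes, _ = per_node(total, CLIENTS)
    print(f"{op:5s}: {srv_gbytes:.2f} GB/s per server "
          f"({srv_gbits / LINK_GBPS:.0%} of a 10Gb/s link), "
          f"{cli_gbytes:.3f} GB/s per client")
```

Under these assumptions each server sustains about 6 Gb/s on reads and 4 Gb/s on writes, so neither result saturates a single 10Gb/s link per server.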

Page 10: Clemson: Solving the HPC Data Deluge

MapReduce over OrangeFS

• 8 Dell R720 Servers Connected with 10Gb/s Ethernet
• Remote Case adds an additional 8 Identical Servers and does all OrangeFS work Remotely and only Local work is done on Compute Node (Traditional HPC Model)
• *25% improvement with OrangeFS running on Separate nodes from MapReduce
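
The slide text does not say whether the 25% improvement refers to runtime or to throughput, and the two readings give different results. The sketch below works through both with a hypothetical 100-second baseline chosen only for illustration; it is not a measured value from the tests.

```python
IMPROVEMENT = 0.25          # "25% improvement" from the slide
baseline_seconds = 100.0    # hypothetical runtime, for illustration only

# Reading 1: the job finishes in 25% less time.
runtime_if_time_reduced = baseline_seconds * (1.0 - IMPROVEMENT)

# Reading 2: throughput rises by 25%, so runtime shrinks by a factor of 1.25.
runtime_if_throughput_gain = baseline_seconds / (1.0 + IMPROVEMENT)

print(f"25% less runtime:    {runtime_if_time_reduced:.1f} s")
print(f"25% more throughput: {runtime_if_throughput_gain:.1f} s")
```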

Page 11: Clemson: Solving the HPC Data Deluge

MapReduce over OrangeFS

• 16 Dell R720 Servers Connected with 10Gb/s Ethernet
• Remote Clients are Dell R720s with single SAS disks for local data (vs. 12 disk arrays in the previous test).

Page 12: Clemson: Solving the HPC Data Deluge

Clemson Research Network

[Network diagram; labels recovered from the figure:]
• 100Gig Tagged Trunk
• Brocade MLx32 Core Router
• Clemson
• CLight
• Collaborator
• F/W (ACL) and Route Filter
• Science DMZ
• Peer Link
• Perimeter F/W
• Dell Z9000
• Dell S4810 (3 instances)
• DMZ
• I2 Innovation Platform
• PerfSonar (5 instances)
• Host Firewall
• Internet/I2/NLR
• CC-NIE
• PalmettoNet
• Innovation Platform
• Internet
• Campus
• Top of Rack
• SamQFS
• Fibre Channel