Networked exascale Supercomputing: ASREN and the HPC community can make it happen, Yves Poppe, A*STAR Computational Resource Centre, Singapore. ASREN, December 1-11th 2014, Muscat, Oman


Networked exascale Supercomputing

ASREN and the HPC community can make it happen


Yves Poppe

A*STAR Computational Resource Centre

Singapore

ASREN

December 1-11th 2014

Muscat, Oman


Please, maximize my effective throughput

Connect HPC resources at Fusionopolis with the storage and genomics pipeline in the Biopolis Matrix Building


Tests started in early 2013 with the Mellanox MetroX and were followed by trials using the Obsidian Strategics Longbow C400. Today the two sites are connected by two 40 Gbps links running native InfiniBand, reaching approximately 98.4% of the maximum theoretical throughput.
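To put the 98.4% figure in context, it can be turned into an effective rate and a transfer time. This is a minimal sketch under assumptions the slide does not state: that each 40 Gbps link is 4x QDR InfiniBand with 8b/10b line encoding (32 Gbps of data per link), that 98.4% is measured against that post-encoding data rate, and that a 1 TB dataset is a representative genomics transfer between Biopolis and Fusionopolis (hypothetical size).

```python
# Sketch: effective throughput of the Biopolis-Fusionopolis InfiniBand links.
# ASSUMPTIONS (not stated on the slide): 4x QDR links with 8b/10b encoding,
# and the quoted 98.4% is relative to the post-encoding data rate.

QDR_SIGNALING_GBPS = 40.0      # nominal 4x QDR signalling rate per link
ENCODING_EFFICIENCY = 8 / 10   # 8b/10b: 8 data bits carried per 10 line bits
MEASURED_FRACTION = 0.984      # 98.4% of theoretical, as quoted
LINKS = 2                      # two parallel links between the sites


def effective_gbps(signaling_gbps, encoding_eff, fraction, links=1):
    """Aggregate effective throughput in Gbit/s across all links."""
    return signaling_gbps * encoding_eff * fraction * links


def transfer_seconds(size_bytes, gbps):
    """Seconds needed to move size_bytes at a sustained gbps (10**9 bit/s)."""
    return size_bytes * 8 / (gbps * 1e9)


if __name__ == "__main__":
    agg = effective_gbps(QDR_SIGNALING_GBPS, ENCODING_EFFICIENCY,
                         MEASURED_FRACTION, LINKS)
    print(f"aggregate effective throughput: {agg:.2f} Gbit/s")
    one_tb = 10**12  # hypothetical 1 TB dataset
    print(f"1 TB transfer: {transfer_seconds(one_tb, agg):.0f} s")
```

If the links were instead counted at their nominal 40 Gbps, only `ENCODING_EFFICIENCY` would change; keeping it as a separate factor makes that assumption easy to swap.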


The big picture: NSCC, Singapore’s National Supercomputing Centre

• Joint A*STAR, NUS, NTU, SUTD and NRF project; ready for service (RFS) in Q3 2015

• NSCC
  – Calls for a new 1-2+ PetaFLOP supercomputer
  – Recurrent investment every 3 to 5 years
  – Pooling of high-tier compute resources at A*STAR and the IHLs (Institutes of Higher Learning)
  – Co-investment from primary stakeholders

• Science, Technology and Research Network (STAR-N)
  – High bandwidth network to connect distributed compute resources
  – Provides high speed access to users, both public and private, anywhere
  – Supports transfer of large data sets both locally and internationally


The quest to maximize effective throughput

TCP/IP’s curse: CPU overhead


Source: IBTA, the InfiniBand Trade Association


InfiniBand’s magic potion: RDMA


Source: IBTA, the InfiniBand Trade Association


The undeniable virtues of RDMA


47% system CPU overhead and idle time in a TCP/IP environment versus 12% in an RDMA environment

In other words, 88% CPU efficiency in user space with RDMA versus 53% with TCP/IP.

Source: Mellanox
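The "in other words" step is a simple complement, assuming (as the slide implies) that system overhead plus idle time and user-space work together account for 100% of CPU time. A minimal sketch that reproduces both efficiency figures from the quoted overheads, plus the relative gain:

```python
# Derive user-space CPU efficiency from the quoted share of CPU time lost to
# system overhead and idle time (figures from the Mellanox slide).
# ASSUMPTION: overhead + user-space work = 100% of CPU time.
OVERHEAD = {"TCP/IP": 0.47, "RDMA": 0.12}


def user_space_efficiency(overhead_fraction):
    """Fraction of CPU time left for application (user-space) work."""
    return 1.0 - overhead_fraction


if __name__ == "__main__":
    eff = {stack: user_space_efficiency(o) for stack, o in OVERHEAD.items()}
    for stack, e in eff.items():
        print(f"{stack}: {e:.0%} user-space efficiency")
    # How much more useful CPU time RDMA leaves compared with TCP/IP.
    print(f"RDMA vs TCP/IP: {eff['RDMA'] / eff['TCP/IP']:.2f}x")
```

Read this way, the same CPU does roughly two thirds more application work per second with RDMA.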

With a fast interconnect and an SSD (solid-state drive), an input/output operation (IOP) takes 235 µs: 10 µs for the network, 200 µs for software and 25 µs for the read/write.

With RDMA and an SSD, an IOP takes 26 µs: 1 µs for the network, 5 µs for software and 25 µs for the read/write.

IOP speed matches SSD speed!
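The two latency budgets can be tallied directly. The figures below are copied from the slide; the RDMA components as listed add up to 31 µs rather than the quoted 26 µs, so this sketch simply sums the listed parts and does not try to reconcile the difference. In both readings the SSD's own 25 µs dominates the RDMA case, which is the point being made.

```python
# IOP latency budgets in microseconds, copied from the slide.
FAST_INTERCONNECT_US = {"network": 10, "software": 200, "read/write": 25}
RDMA_US = {"network": 1, "software": 5, "read/write": 25}


def total_us(budget):
    """Sum of the listed latency components."""
    return sum(budget.values())


def non_storage_share(budget):
    """Fraction of the IOP latency spent outside the SSD read/write itself."""
    return 1 - budget["read/write"] / total_us(budget)


if __name__ == "__main__":
    for name, budget in (("fast interconnect", FAST_INTERCONNECT_US),
                         ("RDMA", RDMA_US)):
        print(f"{name}: {total_us(budget)} us total, "
              f"{non_storage_share(budget):.0%} spent outside the SSD")
```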

RDMA also vastly improves datacenter operation.


HPC’s road to InfiniBand

• 1999: Intel, IBM, Sun, HP, Microsoft, Compaq and Dell agree on the original InfiniBand standard to solve a looming PCI (Peripheral Component Interconnect) bus bottleneck.

• 2003: Virginia Tech builds an InfiniBand cluster ranked number three on the TOP500 list at the time.

• InfiniBand (IB) becomes increasingly popular for cluster interconnects as it beats Ethernet on both price and latency.

• November 2014: 225 of the TOP500 systems use InfiniBand, up 8.7% year on year.

• The Ethernet camp tries to counter with RoCE (RDMA over Converged Ethernet), and now RoCEv2, for the data centre space.


HPC’s choice: InfiniBand link layer


The Ultimate InfiniBand Jailbreak

• HPC systems and InfiniBand were suffocating within the data centre walls.

• Range extenders such as the Mellanox MetroX gave InfiniBand, and consequently HPC systems and data centres themselves, more breathing room and ways to expand at metro level.

• Obsidian Strategics took the next step: it removed the data centre walls completely. InfiniBand connections can now cross continents and circle the globe.

• The ultimate step: BGFC makes InfiniBand routable and opens the possibility of permeating the globe, giving rise to an Infininet.


The Internet gave us classrooms without walls. Infininet will give us supercomputing without walls.


…proved itself in a spectacular way


TGN-IA and TGN-P: 100 Gbps


Galaxy14 Network Topology SC14


A 100 Gbps path links A*STAR in Singapore to the A*STAR booth at SC14 in New Orleans via Singaren, the Tata Communications cables TGN-IA and TGN-P across the Pacific to Seattle, CenturyLink to New Orleans, and SCinet on the SC14 conference grounds.
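Why long-haul InfiniBand needs special equipment can be seen from a bandwidth-delay estimate. InfiniBand uses credit-based link-level flow control, so a sender transmits only into buffer space the receiver has advertised; to keep a long link full, roughly one round trip's worth of data must be buffered. This sketch assumes about 200,000 km/s propagation in fibre and uses a hypothetical 20,000 km route length for the Singapore to New Orleans path (the real route length is not given); how the Longbow handles buffering internally is not described in the slides. The 80 km metro distance from the Singapore tests is included for contrast.

```python
# Rough bandwidth-delay (buffer) sizing for a long-haul InfiniBand link.
# ASSUMPTIONS: ~200,000 km/s in fibre; route lengths are hypothetical;
# only propagation delay is counted (no equipment or queuing latency).
FIBRE_KM_PER_S = 200_000.0


def rtt_seconds(route_km):
    """Round-trip propagation delay over a fibre route of route_km."""
    return 2 * route_km / FIBRE_KM_PER_S


def bdp_bytes(rate_gbps, route_km):
    """Bandwidth-delay product: bytes in flight during one round trip."""
    return rate_gbps * 1e9 * rtt_seconds(route_km) / 8


if __name__ == "__main__":
    cases = (("metro, 40 Gbps over 80 km", 40, 80),
             ("hypothetical 20,000 km route at 100 Gbps", 100, 20_000))
    for label, gbps, km in cases:
        print(f"{label}: RTT {rtt_seconds(km) * 1e3:.1f} ms, "
              f"buffer >= {bdp_bytes(gbps, km) / 1e6:,.1f} MB")
```

Under these assumptions the metro case needs about 4 MB in flight, while the transpacific case needs about 2.5 GB, which is why range extenders rather than ordinary switch buffers are required.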


A*STAR’s vision: Infinicortex, a supercomputer of supercomputers

Professor Tan Tin Wee and Dr. Marek Michalewicz proposed to demonstrate something entirely new, never done before: very high speed transcontinental transmission of native long-distance InfiniBand between high performance computing (HPC) centres continents apart, making them operate as one to tackle the biggest computational challenges and opening a possible avenue to exascale supercomputing, where the most vexing problems are power and heat generation. This is not cloud computing, and it is not grid computing.


The four elements that made this possible

• Very high speed transmission, as made possible by ACA-100
  – Asia connects America at 100 Gbps, a challenge issued by Yves Poppe, then at Tata Communications, at APAN 37 in Bandung, Indonesia.

• InfiniBand over trans-pacific distances
  – Made possible with Obsidian Strategics InfiniBand range extenders.

• Galaxy of Supercomputers
  – Supercomputer interconnect topology and graph theory work by Y. Deng, M. Michalewicz and L. Orlowski.
  – InfiniBand subnetting using the BGFC protocol and the new Obsidian Crossbow InfiniBand router.

• Application layer
  – File transfer optimization, ranging from simple file transfers based on the development of Dsync+ all the way to complex workflows with ADIOS (Adaptable I/O System), developed by Dr. Scott Klasky and his team at Oak Ridge National Laboratory.


InfiniBand range extender and router


Longbow and Crossbow devices, developed by Obsidian Strategics, based in Edmonton, Canada

Crossbow plus Longbows give rise to Galaxy and open the door to an Infininet


Galaxy of Supercomputers

• Supercomputers at different geographic locations are connected into a super-graph, as ‘nodes of a super-network’.

• The supercomputers may have arbitrary interconnect topologies.

• Galaxy is based on a topology with a small diameter and the lowest possible number of links.

• In graph terms, it is an embedding of the graphs representing the supercomputers’ topologies into a graph representing the Galaxy topology.
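The slide does not say which Galaxy topology was chosen, so the sketch below only illustrates the two properties it names (diameter and link count) and the idea of embedding each supercomputer's graph into a galaxy node. It is not the Deng, Michalewicz and Orlowski construction: the five site names, the 4-node ring used as every local topology, and the choice of node 0 as each site's gateway are all hypothetical.

```python
from collections import deque
from itertools import combinations


def from_edges(nodes, edges):
    """Undirected adjacency sets built from an edge list."""
    adj = {n: set() for n in nodes}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def ring(nodes):
    return from_edges(nodes, [(nodes[i], nodes[(i + 1) % len(nodes)])
                              for i in range(len(nodes))])


def star(hub, leaves):
    return from_edges([hub, *leaves], [(hub, leaf) for leaf in leaves])


def complete(nodes):
    return from_edges(nodes, combinations(nodes, 2))


def hop_distances(adj, src):
    """Breadth-first search: hop count from src to every reachable node."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def diameter(adj):
    """Longest shortest path; assumes the graph is connected."""
    return max(max(hop_distances(adj, s).values()) for s in adj)


def link_count(adj):
    return sum(len(nbrs) for nbrs in adj.values()) // 2


def embed(galaxy, local, gateway):
    """Put a copy of `local` at every galaxy node; galaxy links join gateways."""
    adj = {(site, u): {(site, v) for v in nbrs}
           for site in galaxy for u, nbrs in local.items()}
    for site, others in galaxy.items():
        for other in others:
            adj[(site, gateway)].add((other, gateway))
    return adj


if __name__ == "__main__":
    sites = ["SG", "JP", "AU", "US", "EU"]   # hypothetical member sites
    local = ring([0, 1, 2, 3])               # hypothetical local topology
    for name, g in (("ring", ring(sites)),
                    ("star", star(sites[0], sites[1:])),
                    ("complete", complete(sites))):
        big = embed(g, local, gateway=0)
        print(f"{name:8} galaxy: diameter {diameter(g)}, links {link_count(g)}"
              f" | embedded: diameter {diameter(big)}, links {link_count(big)}")
```

With five sites, the star reaches diameter 2 with only 4 inter-site links, while the complete graph lowers the diameter to 1 at the cost of 10 links; the embedded diameter adds each local graph's distance to its gateway on both ends.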


Making the vision a reality

• Testing within Singapore was completed using dark fibre between two A*CRC sites, and also with the National University of Singapore using Singaren’s new SLIX over 80 km. Convincing results led us to deploy two 40 Gbps InfiniBand connections between our Biopolis and Fusionopolis sites.

• InfiniBand over Ethernet testing with Tokyo Institute of Technology’s Tsubame-KFC was successfully completed using Singaren, APAN and JGN-X.

• InfiniBand over IP testing was completed with the NCI (National Computational Infrastructure) at the Australian National University in Canberra using existing Singaren, APAN and AARnet infrastructure.

• A dedicated 10 Gbps link between Singapore and the USA for layer 2 ‘native’ InfiniBand testing with ORNL and others starts at the end of October.

• The 100 Gbps trial and demos between Singapore and New Orleans at SC14 produced rather spectacular results.


Proving the point


Extract from the presentation prepared by Jakub Chrzeszczyk and Andrew Howard, NCI, Australia


Long Distance InfiniBand: a potential R&E networking game changer


So far, the global HPC community’s needs have been presented at TERENA, APAN and GLIF, and they resonate with the vision of the global R&E networking community.

Adopting native InfiniBand as a commonly used layer 2 transmission protocol would give NRENs a rare opportunity to regain the lead in innovation and clearly differentiate themselves from commercial networks.

The HPC community faces continuing exponential growth in the data it generates, and current NREN internetworking capacity is already insufficient even when considering only the needs of genomics data interchange.

To reach exascale computing, a distributed approach is probably required, if only to cope with power requirements and disaster recovery.


The HPC community’s call to ASREN

HPCs need NRENs and NRENs need HPCs
  – The majority of global R&E traffic originates from the HPC community.
  – Supercomputing is essential to economic development in all advanced industrial sectors, as well as to academic research and education.
  – The HPC community is by far the most demanding constituency globally, as it continues to push the bandwidth and switching capacity envelopes relentlessly at all scales. This is for the simple reason that the enormous growth of ‘big data’, with its associated hunger for computing power, storage, electrical power and cooling, will continue unabated.
  – Reaching the exaflop scale in supercomputing will very likely require a distributed approach, including the Infinicortex concept, to be sustainable.


Let us lead the world in building an Infininet! Let us lead the world towards exascale computing!


Circle the globe at 100gbps with ACE-100?

• Prof. Tan Tin Wee, Chairman of the A*STAR Computational Resource Centre, pointed out that with ANA-100 now a reality and ACA-100 coming, the only missing piece needed to circle the globe would be ACE-100: Asia connects Europe.

• I had a vision of bits racing around the world, 100,000,000,000 of them every second (100 Gbps), as fast as light can travel through fibre, transmitting a continuous stream of copies of Jules Verne’s ‘Around the World in Eighty Days’.
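For a sense of scale, the lap time of that stream can be compared with Phileas Fogg's eighty days. This sketch assumes an idealized equatorial path of 40,075 km and roughly 200,000 km/s propagation in fibre; real cable routes are longer and do not follow great circles, so actual lap times would be somewhat higher.

```python
# Lap time for light in fibre around an idealized equatorial route,
# compared with Phileas Fogg's eighty days.
# ASSUMPTIONS: 40,075 km path and ~200,000 km/s propagation in fibre;
# equipment latency along the way is ignored.
EQUATOR_KM = 40_075
FIBRE_KM_PER_S = 200_000.0
EIGHTY_DAYS_S = 80 * 24 * 3600


def lap_seconds(path_km=EQUATOR_KM, speed_km_s=FIBRE_KM_PER_S):
    """Propagation time for one lap of the path."""
    return path_km / speed_km_s


if __name__ == "__main__":
    lap = lap_seconds()
    print(f"one lap: {lap * 1e3:.0f} ms")
    print(f"laps in eighty days: {EIGHTY_DAYS_S / lap:,.0f}")
```

At about 0.2 s per lap, each bit would circle the globe roughly five times per second.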


SC15: the Phileas Fogg challenge


We hope to see you in Singapore at Supercomputing Frontiers 2015, an international conference on supercomputing, exascale and beyond in Singapore and Asia, organised by the A*STAR Computational Resource Centre (A*CRC), March 17-20, 2015, Singapore

http://supercomputingfrontiers2015.com/?page_id=23033


Thank You

Creativity requires the courage to let go of certainties. Erich Fromm