D3D WP2 050.020.010 SR 004 E HPC Technology Roadmap


Name, Designation, Affiliation, Date

Submitted by:
Ben Humphreys, ASKAP Computing Project Engineer, CSIRO, 2011-12-09
Chris Broekema, LOFAR Software Engineer, ASTRON, 2011-12-09

Approved for release as part of SKA Software and Computing CoDR:
K. Cloete, Project Manager, SPDO, 2012-01-27

SOFTWARE AND COMPUTING CODR
HPC TECHNOLOGY ROADMAP

Document number .................. WP2-050.020.010-SR-004
Revision ......................... E
Author ........................... Ben Humphreys and Chris Broekema
Date ............................. 2012-01-27
Status ........................... Approved for Release


DOCUMENT HISTORY

Revision   Date of Issue   Engineering Change Number   Comments
A          2011-09-29      -                           Draft posted 2011-09-30 for comment
B          2011-10-03      -                           Extrapolations from Top500 historical data added, other minor additions
C          2011-11-13      -                           Complete rewrite
D          2011-12-16      -                           Addressed review comments
E          2012-01-27      -                           References and title page updated, actioned by D. Hall

DOCUMENT SOFTWARE

Package: Wordprocessor
Version: MsWord (Word Mac 2008)
Filename: D3D_WP2-050.020.010-SR-004-E_HPC_Technology_Roadmap

ORGANISATION DETAILS

Name: SKA Program Development Office
Physical/Postal Address: Jodrell Bank Centre for Astrophysics, Alan Turing Building, The University of Manchester, Oxford Road, Manchester, UK, M13 9PL
Fax: +44 (0)161 275 4049
Website: www.skatelescope.org


TABLE OF CONTENTS

1 INTRODUCTION ............................................................................................. 5

1.1 Purpose of the document ....................................................................................... 5
1.2 Scope of the document ......................................................................................... 5

2 REFERENCES ................................................................................................ 6

2.1 Applicable documents ........................................................................................... 6
2.2 Reference documents ............................................................................................ 6

3 BACKGROUND AND CONTEXT ........................................................................... 7

4 HPC ROADMAP ANALYSIS .............................................................................. 8

5 HPC ROADMAP TRENDS AND DIRECTIONS ........................................................ 10

6 RESEARCH TO BE DONE ................................................................................. 15

7 COLLABORATION OPPORTUNITIES ................................................................... 15

8 CONCLUSIONS & RECOMMENDATIONS ............................................................. 16


LIST OF FIGURES

Figure 1 - Historical and projected performance of supercomputers on the Top500 list. ................ 8
Figure 2 - Total hardware concurrency of the Top 10 systems in the Top 500. ......................... 10
Figure 3 - Total microprocessor power budget (Source: Kogge et al. 2008 [5]) ........................ 11
Figure 4 - Distribution of energy used in a projected 2018 ExaFlop supercomputer .................... 14
Figure 5 - Energy required for accessing a word of data, compared to distance to the CPU ............ 14

LIST OF TABLES

Table 1 - Projected supercomputer specifications, compared to two current top ranking supercomputers. Note that, apart from raw floating-point performance and required energy per Flop, all secondary specifications lag behind. (Source: Kogge et al., 2008 [5] & www.top500.org) ............ 12

LIST OF ABBREVIATIONS

CoDR..........................Conceptual Design Review

CPU............................Central Processing Unit - a processor

Exascale....................Systems capable of reaching Exaflop/s performance

Flop.............................Floating Point Operation

HPC............................High Performance Computing

I/O...............................Input / Output

kWh............................KiloWatt Hour

MTTI...........................Mean Time to Interrupt

MW.............................MegaWatt

PEP............................Project Execution Plan

SKA........................... Square Kilometre Array

SPDO........................ SKA Program Development Office

SRR........................... Science Requirements Review


1 Introduction

1.1 Purpose of the document

The purpose of this document is to discuss the SKA project with respect to existing roadmaps for High-Performance Computing (HPC), a technology on which the project will be critically reliant. The primary aim of this document is two-fold:

1) To inform the Science Requirements Review (SRR) and related scoping activities of the viability of, and risk associated with, the cutting-edge HPC technologies the project relies on.

2) To inform the Project Execution Plan (PEP) [4] of the research, development and prototyping activities that will be required during the pre-construction phase.

1.2 Scope of the document

This document largely focuses on SKA Phase 1, and this particular revision is largely restricted to topics of importance in the near future, that is, during the software and computing conceptual design review (CoDR) and the pre-construction phase. The document concentrates on those parts of the system known to be reliant on HPC, including but not limited to calibration, imaging, and non-imaging processing (such as pulsar timing). The needs of sub-systems that may leverage HPC, such as beamformers and correlators, are discussed in less detail.


2 References

2.1 Applicable documents

The following documents are applicable to the extent stated herein. In the event of conflict between the contents of the applicable documents and this document, the applicable documents shall take precedence.

[1] S06: WP2-005.010.030-MP-001 - System Engineering Management Plan (Revision F)
[2] S08: WP2-001.010.010-PLA-002 - SKA Operational Concepts (Revision A)
[3] S07: WP2-005.030.000-SRS-002 - SKA Phase 1 System Requirements Specification (Revision B)

2.2 Reference documents

The following documents are referenced in this document. In the event of conflict between the contents of the referenced documents and this document, this document shall take precedence.

[4] S02: 130_Memo_Dewdney - SKA Memo 130: SKA Phase 1: Preliminary System Description
[5] Z02: TR-2008-13 - ExaScale Computing Study: Technology Challenges in Achieving Exascale Systems
[6] Z01: IESP-roadmap - International Journal of High Performance Computer Applications (2011), Volume 25, Number 1: The International Exascale Software Project Roadmap
[7] S03: 132_Memo_Humphreys - SKA Memo 132: Analysis of Convolutional Resampling Algorithm Performance
[8] D3A: WP2-050.020.010-SR-001 - Visibility Processing
[9] Z14: 20091120 sc09-exa-panel-kogge - Energy at Exaflops
[10] Z15: EESI__Update-HPC_Initiatives_EPSRC_D2 2_R1.0 - European Exascale Software Initiative, Deliverable D2.2: Update of Investigation Report on Existing HPC Initiatives
[11] Z16: Kogge_Dysart_Using_the_TOP500_to_Trace_and_Project_Trends - Proceedings of the 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, November 2011: 12-18: Using the TOP500 to Trace and Project Technology and Architecture Trends


3 Background and Context

The SKA Phase 1 preliminary system description [4] describes a processing requirement that is considered beyond the capabilities of modern-day supercomputers. This requirement relates to calibration, imaging processing and non-imaging processing (e.g. pulsar timing or searches). As a result, the success of the SKA is critically dependent on improvements in high-performance computing, and on the ability of the project to adequately leverage these improvements. As described in the summary, this document aims to identify the risks, opportunities and necessary areas of research surrounding this technology area on which the SKA project is critically reliant. This is done via an analysis of current HPC roadmaps and of the gaps that currently exist in the ability of the SKA project to leverage these technologies.

The precise HPC requirements are unclear and for many reasons will remain unclear for quite some time. The first and perhaps most obvious reason for this lack of certainty is that the HPC roadmaps are merely educated guesses, not guarantees as to available supercomputing technology. It is not clear exactly what technology will be the basis of exascale computers, when such a capability will be economically feasible, or what programming model will be used to exploit such a system. The second reason relates to uncertain science requirements. Even small changes in requirements, such as dynamic range or resolution, may have a profound impact on the processing requirements. An example is baseline length where, at least using current imaging techniques, the processing cost grows roughly as the cube of the baseline length. Finally, the techniques and algorithms implemented will inevitably change over the period of design and implementation, a period of approximately a decade. This change in techniques or algorithm implementation is true of most radio telescopes, and is absolutely necessary for SKA. This variability may increase the processing requirements, for example where some unexpected instrumental or physical effects must be accounted for. Likewise, improvements in algorithms or techniques may lead to significantly reduced processing requirements.

The actual processing requirements for SKA Phase 1 have been estimated, and estimates vary from ~8 petaflop/s to many hundreds of petaflop/s. Indeed it is clear that SKA is on a path to approximately an exaflop, as will be required for SKA Phase 2. At the low end of the estimates for Phase 1 is the ~8 petaflop/s figure described in [5]. That document, however, states: "overall compute requirement may be a serious underestimate. This figure assumes 100% efficiency". Various studies (e.g. [10]) have found that efficiency, with respect to peak floating-point performance, is perhaps as low as 5-10%. Furthermore, as will be explained below, this figure may decrease substantially in future as a result of the growing gap between memory bandwidth and floating-point compute performance. It should also be noted that this estimate does not incorporate correction for various instrumental effects, many of which are outlined in [11].

Until recently, practical experience with tera-scale and peta-scale processing in radio astronomy has been limited. However, the advent of SKA and the associated precursors and pathfinders has led to a substantial increase in the development of HPC capabilities. The two telescopes currently developing processing capabilities in the range of hundreds of teraflops are the Low Frequency Array (LOFAR) and the Australian Square Kilometre Array Pathfinder (ASKAP). Additionally, a number of institutes are researching and applying cluster and/or GPU processing techniques at various scales. These experiences will be related below to support the HPC roadmap analysis provided.
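To make the scale of these numbers concrete, the short Python sketch below combines only figures quoted above (the ~8 petaflop/s low-end estimate, the 5-10% efficiency range, and the roughly cubic growth of imaging cost with baseline length); it is purely illustrative and not an estimate in its own right.

# Illustrative arithmetic only, using figures quoted in the text above.
estimate_pflops = 8.0                 # low-end SKA1 estimate at 100% efficiency
for efficiency in (0.05, 0.10):       # realised fraction of peak performance, from [10]
    peak_required = estimate_pflops / efficiency
    print(f"At {efficiency:.0%} efficiency: ~{peak_required:.0f} PFlop/s peak required")
    # prints ~160 PFlop/s at 5% and ~80 PFlop/s at 10%

# Roughly cubic scaling of imaging cost with maximum baseline length:
baseline_factor = 2.0
print(f"Doubling the baseline multiplies cost by ~{baseline_factor**3:.0f}x")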


4 HPC Roadmap Analysis

The HPC roadmap analysis focuses on two specific HPC roadmaps: "ExaScale Computing Study: Technology Challenges in Achieving ExaScale Systems" [5] and "The International ExaScale Software Project Roadmap" [6].

Perhaps the most used (and abused) feasibility argument centres around the floating-point arithmetic performance of the Top500 list of the fastest supercomputers. Figure 1 shows the historical and projected performance of the fastest computer (blue points) and the 500th fastest (orange points). The projection indicates that the SKA Phase 1 requirement of a few hundred petaflop/s, if procured in 2017, would place such a system close to, or within, the ten fastest in the world. Looking further out to SKA Phase 2, and a requirement of perhaps an exaflop in 2022, the system would again most likely be ranked in the top 10; the projection indicates one exaflop would first be achieved in the 2019-2020 timeframe.

Figure 1 - Historical and projected performance of supercomputers on the Top500 list.

(Source: www.top500.org)

Both of these projections would seem to indicate that the SKA HPC goals are realistic; however, they are certainly at the cutting edge and as such pose high risks to the project. If SKA is able to constrain its processing requirements to some of the lower estimates, then what would, by 2017-2018, be "commodity" HPC systems may well suffice. The following subsections describe in more detail the specific requirements of the SKA and the associated roadmap analysis, focusing on the pain points identified in the pathfinders, that is, the areas which are known to be a source of concern.
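The projection in Figure 1 is essentially an exponential extrapolation. The Python sketch below reproduces the flavour of such an extrapolation; the ~13.5-month doubling time and the November 2011 starting point (K computer, ~10.5 PFlop/s) are assumptions based on the long-term Top500 trend, not parameters taken from the figure.

# Rough illustration of the kind of extrapolation behind Figure 1
# (not the actual fit used for that figure).
import math

start_year, start_pflops = 2011.9, 10.5     # November 2011 Top500 list
doubling_time_years = 13.5 / 12             # assumed doubling time

# 1 EFlop/s = 1000 PFlop/s, so count the doublings needed from the 2011 leader.
years_to_exaflop = math.log2(1000 / start_pflops) * doubling_time_years
print(f"Projected first 1 EFlop/s system: ~{start_year + years_to_exaflop:.0f}")
# ~2019, consistent with the 2019-2020 timeframe quoted above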

Memory Bandwidth Gap

Modern processors are best suited to algorithms with an arithmetic intensity in the range of 2 to 10 single-precision ops/byte. Calibration and imaging algorithms, however, generally have a much lower arithmetic intensity, closer to 1 single-precision op/byte and often below. This indicates that such algorithms are likely to be bound by memory bandwidth, not starved of floating-point arithmetic capability. Consequently, using floating-point instructions per second (or flops) as a measure is not particularly useful; memory bandwidth is likely to be a better measure of performance. Unfortunately, while the floating-point performance of a compute node is expected to increase by a factor of O(1000) in the period 2009-2018, memory bandwidth is only expected to increase O(100)-fold [6]. If this gap cannot be remedied by algorithm improvements, it is likely that HPC technologies will fall short of the SKA need by a factor of perhaps O(10).
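A roofline-style bound makes the argument concrete. The Python sketch below applies the standard relation, attainable throughput = min(peak, arithmetic intensity x memory bandwidth); the roofline reasoning and the per-node figures (roughly 2011-era accelerator values) are illustrative assumptions, not numbers from the roadmaps.

# Roofline-style sketch of why low arithmetic intensity implies a
# memory-bandwidth bound. Node figures below are rough assumptions.
peak_flops = 2.0e12     # assumed single-precision peak, Flop/s
mem_bw     = 0.2e12     # assumed memory bandwidth, bytes/s

for ai in (0.5, 1.0, 2.0, 10.0):     # arithmetic intensity, ops/byte
    attainable = min(peak_flops, ai * mem_bw)
    limit = "memory bandwidth" if ai * mem_bw < peak_flops else "compute"
    print(f"AI = {ai:>4} op/byte -> {attainable / 1e12:.1f} TFlop/s ({limit} limited)")
# Below ~10 ops/byte the assumed node is memory-bandwidth limited, which is
# the regime the calibration and imaging algorithms described above occupy.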

Power

As of November 2011, the fastest supercomputer on the Top500 list has a power consumption of 12.66 megawatts (MW). A politico-economic pain threshold of 25 MW has been suggested (by DARPA) as a working boundary [6]. A power consumption in the range of 10-25 MW is to be expected for any system in the top 20 over the next decade, as would be the case for both SKA Phase 1 and Phase 2. Assuming a nominal power cost of €0.20 per kWh, the yearly power expense for such a system would be between €17.5M and €43.8M. Furthermore, this does not include the energy cost of cooling, so the expense could be anywhere between 50% and 100% larger. The goal of 25 megawatts, however, is just that, a goal, and may not be realistic. Indeed, the projections described in [5] and [11] indicate the power requirement for an exascale-class system in the 2018-2020 timeframe would be upwards of 100 MW. This equates to over €175M per year given the power costs quoted above. This cost is significant and the SKA must take steps to ensure efficient processing. By leveraging low-power technologies and investing in algorithm development and HPC expertise, significant operational savings, perhaps on the order of €10+ million per year, are likely possible.
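For reference, the cost figures above follow directly from the stated assumptions; the Python lines below simply repeat the arithmetic (€0.20 per kWh, no cooling overhead).

# Worked check of the operating-cost figures quoted above.
price_eur_per_kwh = 0.20
hours_per_year = 24 * 365                    # 8760 h

for power_mw in (10, 25, 100):
    energy_kwh = power_mw * 1000 * hours_per_year
    cost_meur = energy_kwh * price_eur_per_kwh / 1e6
    print(f"{power_mw:>3} MW -> ~EUR {cost_meur:.1f}M per year")
# 10 MW -> ~17.5M, 25 MW -> ~43.8M, 100 MW -> ~175.2M, matching the text.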

Data Rates

The SKA Phase 1 preliminary system description [4] specifies a data rate of 320 GBytes/s from the SKA1 dish correlator and ~3 TBytes/s from the aperture array correlator. Such a data rate today (2011) would require approximately 700 x 40GbE connections into the science computing system (known as the common processor). However, in just a few years, and certainly well prior to the procurement of the SKA1 central processor, a more modest 280 x 100GbE connections can be expected to suffice. Thus the input data stream into the science computing system, while still challenging, is not a significant area of risk. Placing HPC technology in the role of beamformer or correlator is a different story: input data rates could be as high as 10 PB/s for SKA Phase 1. Such a system would be without peer in the HPC world.
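The link counts quoted above follow from simple arithmetic on the stated correlator output rates, ignoring protocol overhead and link efficiency; the sketch below repeats that calculation.

# Back-of-envelope check of the link counts quoted above.
input_bytes_per_s = 320e9 + 3e12          # dish + aperture-array correlator output
input_bits_per_s = input_bytes_per_s * 8  # ~26.6 Tbit/s aggregate

for link_gbps, label in ((40, "40GbE"), (100, "100GbE")):
    links = input_bits_per_s / (link_gbps * 1e9)
    print(f"~{links:.0f} x {label} links")
# ~664 x 40GbE (~700 quoted) and ~266 x 100GbE (~280 quoted)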


5 HPC Roadmap Trends and Directions

The SKA will require processing in the exaflop/s range. Exascale research efforts are underway to develop systems capable of reaching such performance figures by the end of the decade, but we can expect disruptive developments in the way these systems are designed. The ExaScale Computing Study [5] showed that, although exaflop/s performance should be possible by 2018, the resulting system will most likely be characterized by massive parallelism and relatively low memory and I/O bandwidth. The International Exascale Software Project (http://www.exascale.org/iesp) takes the predictions of the ExaScale Computing Study and builds on top of them a roadmap for the required advancements in operating systems, middleware, libraries, software development tools and algorithm design [6]. The disruptive developments described in these two roadmaps directly impact the way we design our systems and develop our software. The following subsections briefly describe the expected developments in hardware, middleware, software and algorithms, and how these may affect SKA system and software development.

Hardware

Moore's law, interpreted as the doubling of the number of components per unit area on a chip every 18-24 months, is expected to continue to hold for the next decade or so, which means that feature sizes in future processors will continue to decrease for the foreseeable future. Due to increased leakage power at small feature sizes, processor clock frequency has levelled off; future systems are expected to continue to run at clock frequencies on the order of one to several gigahertz. The trend is therefore that individual cores do not increase much in performance, certainly not sufficiently to follow Moore's law. Shrinking feature sizes, however, allow ever-increasing numbers of cores to be added to a CPU. This leads to a massive increase in the application concurrency required to use the available resources efficiently.

Figure 2 - Total hardware concurrency of the Top 10 systems in the Top 500.

(Source: Kogge et al. 2008 [5])


ExaScale systems will be characterized by massive parallelism on many levels. Huge numbers of nodes, possibly of various types, will be connected into a cohesive but highly complex system. Within a node, and even within a processor, we'll see various levels of parallelism. It is probable that processors will be heterogeneous, consisting of a smaller number of general-purpose, complex, superscalar, out-of-order cores alongside many much simpler cores optimized for floating-point operations. It is possible that these will be augmented by a number of special-purpose accelerators. The heterogeneous nature of these processors makes them relatively hard to program, but the potential performance and efficiency of such a system is tremendous.

The power budget available for a single processor socket has also levelled off. The practical limit for commodity cooling solutions is around 150 W per socket; water-cooling may raise this limit slightly. In the future we'll see aggressive and fine-grained power gating shutting down unused parts of a CPU, allowing the remaining components to scale dynamically in performance to fill the available thermal budget. It is likely that the available thermal budget per socket will be insufficient to allow all components in a processor to run at full power simultaneously.

Figure 3 - Total microprocessor power budget (Source: Kogge et al. 2008 [5])

Conventional HPC applications are often relatively compute intensive; the number of Flops per bit of I/O is very large. SKA processing, in contrast, contains a significant portion of operations with very low computational intensity. The streaming nature of the SKA central processor emphasises this. Although most applications will notice the significantly reduced memory bandwidth per Flop available in future systems, the I/O bound and streaming nature of SKA processing makes this a particularly significant problem for us.

                        2009 Jaguar     2011 'K' Computer   2018              2009 vs 2018
Rmax                    2 PFlop         10 PFlop            1 EFlop           O(1000)
Energy requirement      6 MW            10 MW               20 MW             O(10)
Energy/Flop             3 nJ/Flop       1 nJ/Flop           20 pJ/Flop        -O(100)
System memory           0.3 PB          1 PB                32 - 64 PB        O(100)
Memory/Flop             0.6 B/Flop      0.1 B/Flop          0.03 B/Flop       -O(10)
Node performance        125 GFlop       128 GFlop           1 - 15 TFlop      O(10) - O(100)
Node interconnect b/w   3.5 GB/s        10 GB/s             200 - 400 GB/s    O(100)
Memory bw/node          25 GB/s         64 GB/s             2 - 4 TB/s        O(100)
Memory bw/Flop          0.2 B/s/Flop    0.5 B/s/Flop        0.002 B/s/Flop    -O(100)
Concurrency             225,000         548,352             O(10^9)           O(10,000)
MTTI                    days            days                O(1 day)          -O(10)

Table 1 - Projected supercomputer specifications, compared to two current top-ranking supercomputers. Note that, apart from raw floating-point performance and required energy per Flop, all secondary specifications lag behind. (Source: Kogge et al., 2008 [5] & www.top500.org)

In short, exascale hardware will undergo dramatic changes in the next decade or so. The accepted vision for an exascale system is the so-called GigaKiloMega model, which stands for GigaHz, KiloCore, MegaNode. In other words, an exascale system will consist of 10^6 nodes, each consisting of 10^3 cores, running at a gigahertz. The risks of these developments for SKA are mainly concentrated in the limited amount of I/O, both external (i.e. interconnect and Ethernet) and internal (memory bandwidth), available in these systems. This is due to the relatively low computational intensity of SKA processing and the streaming character of the central processor. For real-time applications, dynamic power scaling may cause problems due to increased timing noise.
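Written out, the GigaKiloMega decomposition is simple arithmetic; the sketch below assumes one flop per core per clock cycle purely to show where the O(10^9) concurrency figure in Table 1 comes from.

# The "GigaKiloMega" decomposition written out. One flop per core per cycle
# is an idealisation used here only to illustrate the concurrency figure.
nodes, cores_per_node, clock_hz = 1e6, 1e3, 1e9
print(f"Peak: {nodes * cores_per_node * clock_hz:.0e} Flop/s")        # 1e+18 = 1 EFlop/s
print(f"Hardware concurrency: {nodes * cores_per_node:.0e} cores")    # 1e+09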

Operating Systems & Middleware

The tremendous increase in concurrency in future systems places a significant additional load on the memory infrastructure. Exposing the cache structure to the programmer, or to the programming tools available to the programmer, may significantly increase the efficiency of I/O-bound applications by allowing the programmer or compiler more control over the way data moves within the CPU. SKA is a largely I/O-bound application, which makes this a particularly attractive proposition for us. The highly complex nature of future HPC hardware, notably its heterogeneity and dynamic power gating, may decrease the predictability of node performance, especially with respect to real-time operations. Operating system noise, in other words, is expected to increase. Massively parallel applications therefore cannot afford synchronicity between large numbers of nodes, since there is no way to guarantee that these nodes run in lock-step.

Supercomputers in the 2018 timeframe will contain many millions of components, any one of which can break. The Mean Time To Interrupt (MTTI) of current supercomputers can be measured in days; in future systems it will be much shorter. In essence, a future supercomputer must be considered partly broken most of the time, and failures during operations will be frequent. Fortunately, most SKA processing is highly independent and the loss of a small part of an experiment is often acceptable. A significantly reduced MTTI should not impact SKA processing much, provided the operating system and communications middleware handle failures gracefully.


Compilers / Libraries / Development tools

The effort currently required to program a heterogeneous system is tremendous. All levels of parallelism have to be explicitly exploited, often using several different APIs. Future systems will almost certainly involve even more heterogeneity and more levels of parallelism. Although this is a well-recognized problem, not unique to SKA, we do need to develop and maintain the ability to use the exascale software development tools and hardware that will be developed in the next couple of years.

As mentioned before, future supercomputers will have to be considered permanently broken. Application middleware will have to handle missing nodes or communication links gracefully. Although we can usually accept the loss of part of an observation, we do need to be able to detect failed hardware. Future middleware needs to be able to handle failed hardware and provide feedback on observed or suspected failures to the application. The streaming nature of the SKA application also makes failure handling more time-critical than in most other applications.

This same streaming nature of SKA processing makes it likely to be the first extreme-scale application that may benefit from a highly scalable framework for the development of data-intensive applications. One may imagine a high-level language in which programs are expressed in terms of data transformations. One of the key features of such a framework should be that the programmer is aware of, and has control over, the locality of the data. In other words, the programming environment should allow explicit control over the way data moves, not only between nodes but also within a node. This fine-grained control over data locality and movement is key to an efficient data-intensive application, in flops, in bytes and in joules.
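As a purely illustrative sketch of the "programs as data transformations" idea, the Python fragment below expresses a toy pipeline as chained generators, so data streams through in chunks without global synchronisation and locality is set by the chunk size. No such SKA framework exists; all names and stages here are invented stand-ins.

# Toy streaming pipeline: each stage is a generator, data flows chunk by chunk.
def read_visibilities(source):
    for chunk in source:                    # one chunk per (time, channel) block
        yield chunk

def calibrate(stream, gain):
    for chunk in stream:
        yield [v * gain for v in chunk]     # stand-in for applying gain solutions

def grid(stream):
    for chunk in stream:
        yield sum(chunk)                    # stand-in for a gridding/accumulation step

source = ([1.0, 2.0, 3.0] for _ in range(4))
for cell in grid(calibrate(read_visibilities(source), gain=0.5)):
    print(cell)                             # prints 3.0 for each of the four chunks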

Algorithms

Astronomical software for HPC platforms is still relatively young. The SKA precursor and pathfinder projects have all had to develop significant portions of their software stack from scratch. This means that the astronomical community is relatively unburdened by legacy software and can build on a fresh and modern basis. Unfortunately, a large portion of the software for SKA will need to be rewritten from scratch again in order to scale to exaflop levels.

Astronomical data processing is by nature data parallel. Frequency channels are often treated completely independently of each other. In current systems the number of computational resources (cores, multiprocessors, vector units, etc.) is often of the same order as the available number of frequency channels. In SKA we'll be faced with several orders of magnitude more computational resources (cores, FPUs, stream processors) than frequency channels, which means novel parallelization strategies need to be developed. Automatic parallelization techniques and advanced simulation and modelling may aid us in this effort, but will probably require significant effort from the SKA community to be used (and developed) effectively.

A very large portion of the processing in precursor instruments depends on astronomical libraries like casacore and numerical libraries like BLAS. Although the HPC community can be expected to develop exascale versions of the most obvious numerical routines, we must develop exascale variants of the astronomical routines ourselves. This will require a significant investment in time and personnel, but importantly also an investment in the development of the tools on which to build these routines. Early involvement in the design of the numerical libraries for exascale systems may aid us here.
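The Python fragment below illustrates the decomposition problem with assumed numbers (neither the channel count nor the core count is an SKA figure): once cores outnumber channels, a second axis such as time or baseline must also be partitioned so that every core receives work.

# Illustrative decomposition only, not an SKA design; both counts are assumed.
n_channels = 16_384                 # assumed number of frequency channels
n_cores = 1_000_000                 # exascale-class core count (assumed)

chunks_per_channel = -(-n_cores // n_channels)   # ceil(n_cores / n_channels) = 62
n_work_items = n_channels * chunks_per_channel

print(f"Channel-only decomposition: {n_channels:,} work items for {n_cores:,} cores")
print(f"Two-level decomposition:    {n_work_items:,} work items")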

Operational cost / energy requirement

An exascale supercomputer in the 2018 timeframe is, optimistically, expected to consume around 25 MW. The dominant energy-consuming factor in such a system will be I/O, with over half the energy being spent on moving bits. Note that this is an estimate based on conventional, computationally intensive HPC applications. SKA processing is mostly I/O bound, which may significantly increase the percentage of energy spent on I/O.

Figure 4 - Distribution of energy used in a projected 2018 ExaFlop supercomputer

(Source: Kogge et al. [5])

The streaming nature of the SKA central processor introduces another problem. The energy required for I/O increases super-linearly with distance from the CPU. Since all our data is streamed, we have to pay the maximum price for every bit processed.

Figure 5 - Energy required for accessing a word of data, compared to distance to the CPU

(Source: Kogge [11])

The energy required to process the amount of data the SKA will produce demands an energy-focused approach to high-performance computing. Development efforts will have to focus on limiting power consumption to a minimum. The most effective way to achieve this is most likely to optimize for minimal I/O, even at the cost of additional Flops. A data-aware, streaming programming framework may be of particular value here, provided it allows the algorithm designer the flexibility required to optimize data flows efficiently.
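As a rough illustration of that trade-off, the sketch below uses picojoule figures that are assumptions in the spirit of Figure 5 rather than values taken from it: a floating-point operation costs orders of magnitude less energy than fetching a word from far away.

# Illustrative trade-off behind "optimise for minimal I/O, even at the cost
# of additional Flops". Both energy figures are assumptions.
energy_per_flop_pj = 20             # assumed cost of one floating-point operation
energy_per_remote_word_pj = 2000    # assumed cost of fetching one word across the system

break_even_flops = energy_per_remote_word_pj / energy_per_flop_pj
print(f"Recomputing locally pays off until ~{break_even_flops:.0f} extra flops "
      f"replace a single remote word fetch")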

Co-Design of ExaScale systems

An interesting development is the emerging trend to co-design HPC platforms. Close cooperation between the supercomputer vendor and the primary application owners allows for a system design that is optimally suited to the intended application. For SKA this is both a blessing and a curse. The development of systems optimized purely for compute-intensive, batch-processed scientific data may hurt the performance of astronomical codes for the SKA. On the other hand, if we get involved early and the SKA is accepted as a challenging and interesting co-design vehicle for future exascale platforms, we may end up with a system particularly suited to our application. This does require involvement from our end in the development of exascale systems, not only at a high level, but also by investing detailed technical domain expertise.

6 Research to be done

SKA processing differs from other exascale applications in its streaming nature. Research in the development of software frameworks and runtime environments for streaming processing is therefore essential. Although this scale of streaming processing may be unique to astronomy, this research has a much broader applicability. Related to this, we need to make sure future hardware and software systems can efficiently handle the I/O load SKA requires. Research in efficient operating systems, middleware and software development tools that facilitate I/O is necessary. Not only is efficient I/O essential to SKA, I/O is also expected to be the major contributing factor to central processing energy consumption. Optimization techniques focusing on minimal energy consumption, not on optimal FPU utilisation, may effectively reduce the operating cost of the SKA.

7 Collaboration Opportunities

The International ExaScale Software Project (http://www.exascale.org/iesp) is a collaboration of some of the biggest institutes in supercomputing research. The exascale software roadmap it produced [6] shows an excellent grasp of the challenges ahead. It has identified the need for a limited number of so-called co-design vehicles to focus research efforts. SKA was welcomed as an interesting potential co-design vehicle, especially considering its very data-intensive character. Even if SKA is not selected as an official co-design vehicle, we should still strive to remain involved in this project, as it is a convenient way to build relations with a large group of big players in HPC research. A similar project in Europe, the European Exascale Software Initiative (EESI), concluded in October 2011 and there is currently no known follow-on project. Europe does have several other projects, such as PRACE (http://www.prace-project.eu), PlanetHPC (http://www.planethpc.eu) and TEXT (http://project-text.eu), but these appear to be somewhat limited in scope or to focus more on the near future [10].

Apart from the research institutes in the US and Europe, we also need to work closely with the various vendors that have an exascale research programme (i.e. IBM, Intel, NVIDIA). Collaborations with these vendors are the best way to get early access to prototype systems for evaluation. This does pose a risk, since there is no way to predict which vendor will provide the most appropriate solution for SKA; we must avoid early vendor lock-in at all cost. Note that limited resources mean we have to restrict our more intensive collaboration efforts to the vendors most likely to produce a viable exascale system.

A somewhat less obvious partner for collaboration might be the other exascale application owners. We need to focus not only on how SKA differs from the conventional applications, but also leverage the knowledge of classic HPC applications. We may be able to reuse a portion of their know-how with little or no effort on our part. We should limit ourselves to the major, confirmed exascale applications, possibly only the co-design vehicles to be selected by the IESP. Whatever collaboration partners we end up selecting, we need to focus our efforts on the unique qualities of SKA: its data-intensive nature and the streaming character of the central processor. Real-time processing of high-bandwidth data, in other words software correlation, is also an excellent research vehicle.

8 Conclusions & Recommendations

By all measures it is clear that both SKA Phase 1 and Phase 2 are critically reliant on cutting-edge HPC technology. This has two obvious implications: 1) much work will be required to effectively exploit this technology, and 2) risk is high and steps must be taken towards mitigating this risk.

The first point, that much work is required, should not be addressed by SKA in isolation. Existing users of high-end HPC have similar exascale ambitions and are in the process of dealing with exascale challenges. It is therefore vital for the project to maintain close relationships with the wider HPC community in order to ensure such technology can be effectively leveraged.

Projections appear to indicate that both the Phase 1 and Phase 2 HPC requirements may be achievable, at least from a hardware perspective, on a schedule compatible with SKA. While the hardware challenge may be overcome by industry, in a manner SKA can directly exploit, the same cannot be said for software. While some software technologies will be developed towards exascale by existing HPC users, large gaps will most likely be left, limiting the ability of SKA to efficiently exploit these technologies. This is most true in the areas of data-intensive computing and the processing of real-time data streams (as opposed to the more common batch processing). As such, SKA must invest directly in overcoming these challenges.

So while exascale systems may be available somewhere in the 2018-2019 timeframe, the design of these systems will introduce disruptive changes in the way we design our systems and write our software. In order to scale to exaflop/s performance, nearly all software will need to be redesigned and rewritten from the ground up. This effort will most likely depend on a small number of people intimately familiar with the processing required. Leading with our HPC accomplishments, notably LOFAR and ASKAP, is essential. Much work has been done in the pathfinders, work that should be leveraged by SKA. However, current precursor and pathfinder instruments are unlikely to attempt scaling beyond a few petaflop/s. This leaves a scaling gap of several orders of magnitude to SKA, which needs to be addressed.

In order to effectively use the available compute resources, SKA needs to acquire and maintain expertise in extreme scaling of data-intensive applications. Since our own pathfinders and precursors are unlikely to scale far beyond the petaflop/s level, we need to partner with industry, large research facilities and other exascale code owners to do so. Pathfinders, notably ASKAP, have been very successful in leveraging existing national supercomputing facilities to support software development and scaling work. Most of the relevant projects in exascale research are located in the US, but we should also investigate the various projects in Europe [10] and Asia.

Fortunately SKA is a unique data-intensive application that has generated significant interest from the exascale community. Highly data-intensive applications are currently notably underrepresented. This is a risk, since it is likely nobody else will step up and solve these SKA-scale data-intensive challenges, but it also provides a clear opportunity for SKA contributors to get involved. It should be noted that every HPC application will sooner or later become data intensive, since I/O subsystems do not keep pace with the increase in available compute power. This means that the SKA may be considered a pathfinding application in HPC research.

While the above-mentioned activities will go a long way towards mitigating the risk of SKA's dependence on aggressive HPC roadmaps, other risk mitigation activities are suggested. Specifically, science requirements should be analysed as to their exposure to this risk, and mitigation strategies identified. This should at least include understanding the impact on science should processing capacity be somewhat constrained. Below is a summary of recommendations relevant to the pre-construction phase of the project:

Recommendations:

1. In the lead-up to the Science Requirements Review (SRR), science use-cases should be analysed with respect to their reliance on cutting-edge HPC. Those science cases with substantial exposure to this risk should describe risk mitigation strategies.

2. Continue and expand efforts to build partnerships with industry, research institutes and exascale code owners.

3. Build and maintain expertise in extreme scaling of data-intensive applications.

4. Invest in active and technical contributions to the various international exascale projects, especially with respect to data-intensive applications or the development of systems, whether hardware, middleware or software, particularly suited to these data-intensive applications.

5. Don't under-invest in algorithm development or HPC software expertise. The possible benefits of efficiency improvements, and their impact on operations costs, are potentially on the order of €10 million per year.

6. Educate the SKA community about developments in exascale research, e.g. add a recurring exascale computing session to the yearly technical SKA computing conferences or workshops.

7. Internships at the various exascale software initiatives should be considered, especially early in the pre-construction phase where the experience can benefit the system design.