HPC and Exascale for the Square Kilometer Array Telescope
www.skatelescope.org
Bill Boas, Cray, Business Development, SKA
510.375.8840
HPC Advisory Council, Stanford
Table of Contents – Exascale and SKA
● Overall Project Summary
● The Technology Opportunity
● Funding, Governance, Timeline and Structure
● The Countries participating
● Board of Directors
● Project Structure Pre-Construction Phase – SKA1
● Self-Funded Consortia under contract to SKAO
● Consortia Deliverables in 2016
● Acquisition and Construction SKA1 2017-2020
● SKA2 Scale Up and Schedule
● RARIC in US
● Known IBM Activities
Cray Inc. – Jan 2014 2
Overall Project Summary
● SKA is a very large, global, exascale radio astronomy observatory: a decade in the making with decades more ahead, a 24/7 enterprise, conceived and driven by astronomers
● 2/3 installed in Africa, 1/3 installed in Australia, Project Office in UK, 12 member countries now, USA backed out in 2010
● Three Phases going forward from now
  ● 2013-16 SKA1 (10%) Pre-Construction: requirements, architecture, design, specification; 12 Consortia awarded this phase
  ● 2017-18 Issue tenders, award and construct SKA1, including pre-cursors
  ● 2019-2021 Design and specify SKA2 (90%); SKA1 operational, and for 50 years thereafter
  ● 2022-24 Issue SKA2 tenders, award, construct, integrate into operations
● Budgets in Euros
  ● Pre-Construction 90M in-kind by members, 5M for computing prototypes in Architecture Lab at Cambridge University
  ● SKA1 650M cap from Project Office; SKA2 ~5000M
Cray Inc. – March 2013
Technology Opportunity – Architect, Design and Integrate Real-Time Data Handling, HPC and Big Data
# 2013 estimate by SKA South Africa
| | MeerKAT Pre-Cursor 2014-15 | SKA Phase 1 2017-19 | SKA Phase 2 Est. 2020-24 |
|---|---|---|---|
| Data into CSP | 2 Tbps | 50 Tbps | up to 5 Pbps |
| Data into SDP | 0.4 Tbps | 20 Tbps | up to 500 Tbps |
| Into Storage | 35 Gbps | 300+ Gbps | up to 2 Tbps |
| Computing load | 200 TFlops | 30+ PFlops | 3+ EFlops |
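As a rough illustration of what these estimates imply, the sketch below converts the quoted rates into an overall reduction factor (data into CSP versus data into storage) and a daily storage volume. The dictionary layout and function names are illustrative assumptions; only the numbers come from the estimate, and the "+" and "up to" qualifiers are dropped for the arithmetic.

```python
# Rates taken from the 2013 SKA South Africa estimate, in bits per second.
# Assumption: "+" and "up to" qualifiers are ignored, so these are point values.
TBPS = 1e12
GBPS = 1e9

RATES = {
    "MeerKAT":     {"csp": 2 * TBPS,    "sdp": 0.4 * TBPS, "storage": 35 * GBPS},
    "SKA Phase 1": {"csp": 50 * TBPS,   "sdp": 20 * TBPS,  "storage": 300 * GBPS},
    "SKA Phase 2": {"csp": 5000 * TBPS, "sdp": 500 * TBPS, "storage": 2 * TBPS},
}


def reduction_factor(rates):
    """How many times smaller the stored stream is than the stream entering the CSP."""
    return rates["csp"] / rates["storage"]


def daily_storage_petabytes(rates):
    """Volume written to storage in one day, in petabytes (1 PB = 1e15 bytes)."""
    seconds_per_day = 86_400
    bytes_per_day = rates["storage"] / 8 * seconds_per_day
    return bytes_per_day / 1e15


if __name__ == "__main__":
    for name, rates in RATES.items():
        print(f"{name}: reduction x{reduction_factor(rates):.0f}, "
              f"{daily_storage_petabytes(rates):.2f} PB/day to storage")
```

Under these assumptions the stored stream is a small fraction of what the dishes and arrays deliver, which is why most of the reduction has to happen in flight.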
[Figure: A No-Stop* Data Streaming, Analysis, Storage and Distribution Architecture. Blocks shown: Incoming Signals from Dishes and Arrays; Switch; Switch; Correlator; Beamformer; Science Processor; Science Archive; Researchers Worldwide. Stage captions: Analyse Signals to extract Data from Noise; Process Data to Create Visibilities; Archive Visibilities for Distribution; Open Skies Merit Based Distribution To Researchers. SKA: one observatory in South Africa (2/3), one observatory in Australia (1/3).]
* - No-Stop means the data never stops incoming; it must be either handled or dropped in the bit bucket
* Does not mean “Non-Stop”: h/w, s/w fail-over and silent error detection are not required
# - more information in further slides.
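The "No-Stop" rule (data that cannot be handled goes to the bit bucket) can be sketched as a bounded buffer that never blocks the producer. This is a minimal illustration only; the class name `NoStopBuffer`, its methods, and the choice to drop the newest sample when full are assumptions, not part of any SKA design.

```python
from collections import deque


class NoStopBuffer:
    """Bounded ingest buffer: the producer is never blocked, overflow is dropped."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.queue = deque()
        self.dropped = 0  # samples sent to the "bit bucket"

    def ingest(self, sample):
        """Accept a sample if there is room, otherwise count it as dropped."""
        if len(self.queue) >= self.capacity:
            self.dropped += 1
            return False
        self.queue.append(sample)
        return True

    def process(self, max_items):
        """Hand up to max_items buffered samples to the downstream stage."""
        count = min(max_items, len(self.queue))
        return [self.queue.popleft() for _ in range(count)]


if __name__ == "__main__":
    buf = NoStopBuffer(capacity=16)
    for tick in range(10):
        for i in range(10):      # 10 samples arrive per tick (assumed rate)
            buf.ingest((tick, i))
        buf.process(8)           # only 8 can be handled per tick (assumed rate)
    print("dropped:", buf.dropped, "still buffered:", len(buf.queue))
```

Dropping at ingest keeps the incoming stream moving; whether to discard the newest or the oldest data is a policy decision this sketch does not settle.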
SKA Organization and Governance Overview
Structure Diagram and Current Funding
(in kind) means member countries have funded their institutions,
companies and individuals within them (listed further in slides)
to work on SKA Pre-Construction Phase
SKA Governance
● SKA Board of Directors made siting decision in May 2012:
  ● SKA1-low – Australia
  ● SKA1-survey – Australia
  ● SKA1-mid – South Africa
● SKA Office is a UK Company Limited by Guarantee, i.e. non-profit
  ● Expedient solution to enable SKA project to proceed; long-term governance structure under review
● SKA Board has set a cost cap for SKA1 – 650 million Euro
  ● Imposes discipline on the design process
  ● Each Consortium given cost for Pre-Construction Phase
● SKAO has Element Engineers in each Consortium
  ● Role is to guide the Work Packages of each Consortium
● Board Members coordinate with each country's government
  ● Seeking construction funding
● Design guided by scientific and engineering assessments
  ● Re-baselining in ~1 year
  ● To incorporate precursors – MeerKAT in South Africa, ASKAP in Australia
  ● Re-use as much existing infrastructure as possible in SKA1
  ● Pathfinder – possibly LOFAR in Europe?
The Countries participating in SKA in 2014
● Australia (DIISRTE)
● Canada (NRC-Herzberg)
● China (MOST)
● Germany (BMBF)
● Italy (INAF)
● Netherlands (NWO)
● New Zealand (MED)
● South Africa (DST)
● Sweden (Chalmers)
● UK (STFC)
● India (Tata/DAE) – anticipate joining
The Project Structure
● Led by SKA Office
  ● Management
  ● Science
  ● System Design and System Engineering
  ● Maintenance & Support and Operations
● Carried out by Work Package Consortia
  ● Dish Array
  ● Aperture Arrays
  ● Signal and Data Transport (including synchronization and timing)
  ● Central Signal Processor
  ● Science Data Processor
  ● Telescope Manager
  ● Infrastructure, including power, etc.
  ● Assembly, Integration and Verification
● Advanced Instrumentation Programs
  ● Mid Frequency Aperture Array
  ● Wide Band Single Pixel Feeds
Pre-Construction Design and Engineering Structure
● Design of SKA to be by multinational global consortia
● Consortia as contractors to the central office
● Consortia leaders chosen on merit and peer acceptance
● SKA Office holds the design authority for the project.
● SKA Office will run system engineering, receive and review designs from consortia, monitor progress, analyze, assess merit and approve
● SKA Office issued a *baseline conceptual design to serve as starting point for design, based on previous work and CoDRs.
● 10 consortia formed to undertake the design.
* http://www.skatelescope.org/wp-content/uploads/2012/07/SKA-TEL-SKO-DD-001-1_BaselineDesign1.pdf
Project Timeline
Pre-Construction Phase Consortia and Leaders
● Dish Array – Mark McKinnon, CSIRO, Australia
● Low Frequency Aperture Arrays - Jan Geralt Bij de Vaate, ASTRON, Netherlands
● Mid Frequency Aperture Arrays - Jan Geralt Bij de Vaate, ASTRON, Netherlands
● Signal and Data Transport – Keith Grainge, Univ. Manchester, UK
● Central Signal Processor – David Stephens, MDA/NRC, Canada
● Science Data Processor – Paul Alexander, Univ. Cambridge UK
● Telescope Manager – Yashwant Gupta, NCRA, India
● Infrastructure, including power, etc. -
● Assembly, Integration and Verification – Richard Lord, SKA South Africa
DISH Consortium Members
● Commonwealth Scientific and Industrial Research Organisation (CSIRO), Australia
● RPC Technologies, Australia
● National Research Council, Canada
● Joint Laboratory for Radio Astronomy Technology (JLRAT), China
● Max Planck Institute for Radio Astronomy (MPIfR), Germany
● Vertex Antennentechnik, Germany
● IAF Fraunhofer, Germany
● National Institute of Astrophysics (INAF), Italy
● European Industrial Engineering (EIE), Italy
● Società Aerospaziale Mediterranea (SAM), Italy
● SKA South Africa, South Africa
● EM Software and Systems (EMSS), South Africa
● Spain University Group, Spain
● Chalmers University/Onsala Space Observatory, Sweden
● Omnisys Instruments AB, Sweden
Initial technical Solution
http://www.skatelescope.org/wp-content/uploads/2013/09/SKA-TEL_DSH_MGT-CSIRO-TS-004-1_DishTechSol.pdf
SKA Dishes
The Consortium is responsible for the design and
verification of the antenna structure, optics, feed
suites, receivers, and all supporting systems and
infrastructure for SKA1-mid and SKA1-survey.
The Consortium is sub-divided into work
elements, summarised below. The “optics” are
how the dishes are described, and whilst the
tolerances are not as tight as for their optical
counterparts, they still have to be built to a level
of precision unsurpassed in the field of radio
astronomy.
The main challenge is the mass production of several thousand 15m wide telescopes, all with
identical performance characteristics, all built with new design ideas, and built to last and tolerate
the harsh conditions of the deserts in which they will operate. Combine with that the overriding
element of cost, and getting the very best price to performance ratio, and the dish element of the
SKA is a formidable technical and engineering challenge.
The task of the Dish Structure work element is to deliver the construction-ready design for the
structure element of the SKA1-mid and SKA1-survey dishes. Three prototype antennas are being
built within the Consortium: DVA-1 in Canada, DVA-C in China, and MeerKAT-1 in South Africa.
Status of a Candidate Dish
● The Dish Verification Antenna (DVA-1)
● The DVA-1 project in Canada is progressing on many fronts. The foundation is complete and now sits beneath a 3 m pile of regolith to post-load the soil. Trenching for power and data is also complete.
● To improve surface accuracy the primary and secondary molds have been faired. Results on the secondary are extremely good, with an error of ~0.1 mm rms from the design shape.
● Measurements of the primary are underway and are expected to be < 0.5 mm rms. Fabrication of the secondary reflector will begin in mid-April, followed by a large scale infusion test panel for the primary reflector. Layup and infusion of the primary reflector should be complete by July.
● Steady progress is being made on the telescope pedestal by Matt Fleming’s team at Minex Engineering in California. With the major pieces complete, work is focused on integrating parts and measuring performance.
● Other subcontractors, Profile Composites, FormaShape, and Vectorworks Marine, are respectively fabricating sub-components: carbon feed legs, composite backing pieces, and dish rim connectors. Integration of DVA-1 assemblies will begin in early summer, with testing expected to begin in the fall.
Low Frequency Aperture Arrays
● International Centre for Radio Astronomy Research (ICRAR), Australia
● Key Lab of Aperture Array and Space Application (KLAASA), China
● German Long Wave Consortium (GLOW), Germany
● National Institute for Astrophysics (INAF), Italy
● University of Malta, Malta
● Netherlands Institute for Radio Astronomy (ASTRON), The Netherlands
● Joint Institute for VLBI in Europe (JIVE), The Netherlands
● University of Cambridge, UK
● University of Manchester, UK
● University of Oxford, UK
● Massachusetts Institute of Technology (MIT), USA
Mid Aperture Array Consortium Members
● Key Lab of Aperture Array and Space Application (KLAASA) , China
● University of Bordeaux, France
● Paris/Nançay Observatory, France
● University of Malta, Malta
● Netherlands Institute for Radio Astronomy (ASTRON), Netherlands
● Instituto de Telecomunicações (IT), Portugal
● SKA South Africa, South Africa
● University of Cambridge, UK
● University of Manchester , UK
● University of Oxford , UK
Aperture Arrays
• An aperture array is a large number of small,
fixed antenna elements coupled to appropriate
receiver systems which can be arranged in a
regular or random pattern on the ground.
• A signal “beam” is formed and steered by
combining all the received signals after
appropriate time delays have been introduced
to align the phases of the signals coming from a
particular direction.
• Innovative, efficient and low cost, aperture
array antennas provide a large field of view and
are capable of observing more than one part of
the sky at once.
• By simultaneously using different sets of timing
delays, this “beam forming” can be repeated
many times to create multiple independent
beams, yielding an enormous total field of view.
• The ability to configure numerous beams will
permit the system to look at multiple regions of
the sky simultaneously, massively increasing
the telescope survey speed.
• The number of useful beams produced, or total
field of view, is limited by the available
computing and communications capabilities.
• The SKA from the outset is challenging
academia, industry and technologies with
concepts and designs that, at this time have not
been developed and do not exist.
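The beam forming described in these bullets can be sketched for the simplest case: a narrowband signal on a one-dimensional line of elements, where each time delay reduces to a phase shift. Everything here is an illustrative assumption (element spacing, the 150 MHz frequency, function names); it is a textbook delay-and-sum sketch, not the SKA beamformer design.

```python
import cmath
import math

C = 299_792_458.0  # speed of light in m/s


def element_signals(positions_m, freq_hz, source_angle_rad):
    """Narrowband complex sample at each element for a plane wave from one direction."""
    wavelength = C / freq_hz
    return [cmath.exp(2j * math.pi * x * math.sin(source_angle_rad) / wavelength)
            for x in positions_m]


def beam_power(signals, positions_m, freq_hz, steer_angle_rad):
    """Delay-and-sum: undo the geometric delay for the steering direction, then add."""
    total = 0j
    for sample, x in zip(signals, positions_m):
        delay_s = x * math.sin(steer_angle_rad) / C
        # For a narrowband signal a time delay is equivalent to a phase rotation.
        total += sample * cmath.exp(-2j * math.pi * freq_hz * delay_s)
    return abs(total / len(signals)) ** 2


def form_beams(signals, positions_m, freq_hz, steer_angles_rad):
    """Several independent beams from the same samples, one per set of delays."""
    return [beam_power(signals, positions_m, freq_hz, angle)
            for angle in steer_angles_rad]


if __name__ == "__main__":
    positions = [0.5 * i for i in range(16)]   # 16 elements, 0.5 m apart (assumed)
    freq = 150e6                               # 150 MHz (assumed)
    signals = element_signals(positions, freq, math.radians(20))
    for deg in (-30, 0, 20):
        power = beam_power(signals, positions, freq, math.radians(deg))
        print(f"beam at {deg:+d} deg: relative power {power:.3f}")
```

`form_beams` reuses the same received samples for every steering direction, which is the sense in which the number of useful beams is bounded by available computing and communications rather than by the antennas.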
[Figure labels: Low Frequency in SKA1; Mid Frequency added in SKA2; Concept of Beamforming]
Signal and Data Transport Consortium Members
● Commonwealth Scientific and Industrial Research Organisation (CSIRO), Australia
● Australia Academic and Research Network (AARNet), Australia
● Tsinghua University/ Peking University, China
● National Centre for Radio Astrophysics (NCRA) / Tata Consulting, India
● Netherlands Institute for Radio Astronomy (ASTRON), The Netherlands
● Joint Institute for VLBI in Europe (JIVE), The Netherlands
● Instituto de Telecomunicações (IT), Portugal
● SKA South Africa, Nelson Mandela Metropolitan University (NMMU), South Africa
● University of Granada, Spain
● University of Manchester, UK
● National Physical Laboratory (NPL), UK
● DANTE, UK
SADT Overview
Central Signal Processor Consortium
● Commonwealth Scientific and Industrial Research Organisation (CSIRO), Australia
● International Centre for Radio Astronomy Research (ICRAR), Australia
● Swinburne University of Technology, Australia
● CISCO, Australia
● National Research Council of Canada (NRC), Canada
● Canadian Institute of Theoretical Astrophysics (CITA), Canada
● MDA Systems Ltd, Canada
● Key Lab of Aperture Array and Space Application (KLAASA), China
● Max Planck Institute for Radio Astronomy (MPIfR), Germany
● National Centre for Radio Astrophysics (NCRA), India
● National Institute for Astrophysics (INAF), Italy
● SELEX Electronic Systems, Italy
● University of Malta, Malta
● Netherlands Institute for Radio Astronomy (ASTRON), The Netherlands
● Joint Institute for VLBI in Europe (JIVE), The Netherlands
● Netherlands eScience Center (NLeSC), The Netherlands
● AUT University, New Zealand
● Massey University, New Zealand
● University of Auckland, New Zealand
● Compucon New Zealand, New Zealand
● Open Parallel Ltd, New Zealand
● SKA South Africa, South Africa
● Reutech Radar Systems (A Division of Reutech Limited), South Africa
● Ingeniería de Sistemas para la Defensa de España (ISDEFE), Spain
● Universidad Politécnica de Madrid (UPM), Spain
● IBM Zurich, Switzerland
● Science and Technology Facilities Council (STFC), UK
● University of Manchester, UK
● University of Oxford, UK
● Adaptative Array Systems Limited, UK
● NVIDIA, USA
● NASA JPL, USA
Correlators Approaches Summary
| Sub Element | Lead Institute or Person | Approaches | Technologies |
|---|---|---|---|
| LOW-AA correlator | Oxford / Zarb-Adami | “Hardware-based”, Custom PowerMX, Uniboard-2, or Redback, or other TBD. | Primarily FPGAs. |
| LOW-AA correlator | Curtin / Steve Ord | “Software-based” programming methods. Results could impact Stage 2 investigations in all correlator sub-elements | COTS NVIDIA GPUs as a starting point, using all COTS equipment. (SMART research) |
| MID-DISH correlator/BF | NRC+SKA-SA / Carlson+Kapp | PowerMX custom platform, with standards generation for use/contribution elsewhere. | FPGAs are the baseline. Will consider ASICs and SMART research |
| SUR-DISH correlator | NZ Alliance / Ensor | “Hardware-based”. Multi-facetted; will consider PowerMX, Redback, and others…whatever works best. | FPGAs, multi-core CPUs, ASICs, possibly mixed-and-matched on PowerMX boards or other platforms. |
| PSS Engine | UManchester / Stappers | Currently studied for COTS using existing software algorithms/methods | Primarily GPUs, but will consider FPGA and even ASIC accelerators to save cost and power |
| PST Engine | Swinburne / van Straten | COTS using existing software algorithms, methods, and research. | COTS GPUs. Cost and power here are negligible. |
Signal Processing Functions
Science Data Processor
● Commonwealth Scientific and Industrial Research Organisation (CSIRO), Australia
● International Centre for Radio Astronomy Research (ICRAR), Australia
● iVEC, Australia
● University of Melbourne, Australia
● Canadian Consortium, University of Calgary, Canada
● Canadian Astronomy Data Centre (CADC), Canada
● IBM Canada Limited, Canada
● Calgary Scientific Inc., Canada
● Rackforce, Canada
● University of Alberta, Canada
● University of British Columbia, Canada
● McGill University, Canada
● University of Chile, Chile
● Regensburg University, Germany
● Netherlands Institute for Radio Astronomy (ASTRON), The Netherlands
● Institute for Radio Astronomy and Space Research (IRASR), New Zealand
● Compucon New Zealand, New Zealand
● GreenButton, New Zealand
● AUT University, New Zealand
● Massey University, New Zealand
● University of Auckland, New Zealand
● University of Otago, New Zealand
● Callaghan Innovation, New Zealand
● Open Parallel, New Zealand
● Victoria University of Wellington, New Zealand
● Instituto de Telecomunicações (IT), Portugal
● University of Évora, Portugal
● Council for Scientific and Industrial Research (CSIR), South Africa
● Instituto de Astrofísica de Andalucía, Spain
● Barcelona Supercomputing Center, Spain
● Fundación Centro de Supercomputación (FCSCL), Spain
● Science and Technology Facilities Council (STFC), UK
● University of Manchester, UK
● University of Cambridge, UK
● Oxford University, UK
● University College London, UK
● University of Southampton, UK
● Google, USA
SDP Computing Tasks
The tasks are combined in iterative loops that typically involve refining estimates of array parameters – such as complex gains – while concurrently creating images that converge towards the transformed observed data in these steps.
● Removing data that has been corrupted by interference or faults in the system. This can include interference from mobile phones or any other Earth-based radio signal, errors in the signal transport, or problems with the hardware.
● Calibrating each antenna’s signal to remove the effects of instrumental variation and variations in the line-of-sight propagation of the radio signal. With so many radio telescopes, this requires huge amounts of processing power.
● Transforming the data onto a rectangular grid in what radio astronomers call the “u-v plane” – this is akin to interpolating a few randomly scattered altitude measurements onto a regular map grid to estimate the altitudes at all grid intersections.
● Applying a mathematical calculation called a Fourier transformation to convert the data into a representation of the object’s image in the sky.
● Applying a further calculation called “deconvolution of the point-response function of the array” to remove the radio equivalent of the spikes around bright stars in an optical image.
These steps must be done for thousands of separate frequencies: although the SKA radio telescopes work over low, mid and high frequency ranges, these are indeed ranges, and thousands of individual frequencies must be analysed within each one. The SKA’s computers are required to do all of this in real time. Buffer memory is required to store interim processing results while the processing loops are being executed. The science processing facility will also have large data storage sub-systems. The end results of the converged image processing form the basis of the final astronomical images that are distributed to astronomers and physicists around the world.
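The gridding and Fourier transformation steps can be sketched on a toy problem: a single point source observed at a handful of u-v points, for one frequency channel. This is a deliberately simplified illustration under stated assumptions: nearest-neighbour gridding instead of a convolution kernel, a direct DFT instead of an FFT, and no flagging, calibration or deconvolution. All function names and the sample points are hypothetical.

```python
import cmath
import math


def point_source_visibility(u, v, l0, m0, n):
    """Visibility of a unit point source at pixel (l0, m0) on an n x n sky grid."""
    return cmath.exp(-2j * math.pi * (u * l0 + v * m0) / n)


def grid_visibilities(samples, n):
    """Nearest-neighbour gridding of (u, v, visibility) samples onto an n x n u-v grid."""
    grid = [[0j] * n for _ in range(n)]
    for u, v, vis in samples:
        # Negative u-v coordinates wrap around, as in the usual DFT index convention.
        grid[round(v) % n][round(u) % n] += vis
    return grid


def dirty_image(grid):
    """Direct inverse 2-D DFT of the gridded visibilities; returns magnitudes."""
    n = len(grid)
    image = [[0.0] * n for _ in range(n)]
    for m in range(n):
        for l in range(n):
            acc = 0j
            for v in range(n):
                for u in range(n):
                    if grid[v][u] != 0:
                        acc += grid[v][u] * cmath.exp(2j * math.pi * (u * l + v * m) / n)
            image[m][l] = abs(acc)
    return image


def peak_pixel(image):
    """(l, m) coordinates of the brightest pixel."""
    n = len(image)
    return max(((l, m) for m in range(n) for l in range(n)),
               key=lambda lm: image[lm[1]][lm[0]])


N = 8
# Hypothetical sparse u-v coverage; a real array samples far more points.
UV_POINTS = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (2.0, 3.0), (-3.0, 1.0), (-1.0, -2.0)]
SAMPLES = [(u, v, point_source_visibility(u, v, 3, 5, N)) for u, v in UV_POINTS]
IMAGE = dirty_image(grid_visibilities(SAMPLES, N))

if __name__ == "__main__":
    print("brightest pixel (l, m):", peak_pixel(IMAGE))
```

Because the u-v plane is only sparsely sampled, the image around the peak carries sidelobes from the array's point-response function; removing those is the job of the deconvolution step, which this sketch omits.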
SDP Design Approach
● Adopt an Incremental and Iterative Design approach to the system engineering and prototyping
● Horizontal prototyping aims to provide a system-wide prototype; Vertical prototyping provides performance and functionality of individual components
● Open Architecture Lab Approach based on Lawrence Livermore Hyperion
● Emphasis on determining an appropriate scalable element
● Emphasis on system-level components of the open source software stack
● Create an evaluation and prototyping testbed: Petascale I/O technology scaling for SKA1 and future capacity to SKA2 - processor, memory, networking, storage, visualization, etc.
● Design for future technology refresh, expansion, and upgrades
Overall SDP Block Diagram
Multi-Tasking and Multifunction Processing
Estimated SDP Sizing
Telescope Manager
● Commonwealth Scientific and Industrial Research Organisation (CSIRO), Australia
● National Research Council of Canada (NRC), Canada
● GTD GmbH, Germany
● National Centre for Radio Astrophysics (NCRA), India
● Tata Research, Development and Design Centre (TRDDC), India
● National Institute for Astrophysics (INAF), Italy
● Instituto de Telecomunicações (IT), Portugal
● Geo-Space Sciences Research Center (CICGE-FCUP ), Portugal
● SKA South Africa, South Africa
● Science and Technology Facilities Council (STFC), UK
Telescope Manager Description
● This element includes all hardware and software necessary to control the telescope and associated infrastructure. The TM includes the co-ordination of the systems at observatory level and the software necessary for scheduling the telescope operations. It also includes the central monitoring of key performance metrics and the provision of central co-ordination of safety signals.
● The TM provides physical and software access to, and at, remote locations for transmission of diagnostic data and local control.
● When its design and development are complete, the TM will be responsible for monitoring the entire telescope, including the engineering and operational status of its component parts.
● The TM is also responsible for enabling control of various sub-systems and their associated components, as well as providing and supporting online and physical access.
● The TM will send control signals when needed, detect and manage faults if they arise, control associated infrastructure, and coordinate the handling of safety signals.
● The TM is also responsible for coordinating observations, including telescope operations, operator infrastructure, metadata collection, archiving of collected monitor and control data, and much more.
● The TM also links to a number of other work package elements through interfaces and provides the backbone for the functioning of the telescope arrays.
● In summary, the TM is responsible for the management of all astronomical observations, the management of all the telescope hardware and software systems that perform the observations, and facilitating communication across the primary stakeholders, in addition to ensuring safety.
Infrastructure, Power, etc.
● Two separate Consortia – Australia and South Africa
● INFRA-AU
  ● Aurecon, Australia
  ● Commonwealth Scientific and Industrial Research Organisation (CSIRO), Australia
  ● Woodhead, Australia
  ● Radio Quiet Zone (RQZ) Solutions, Australia
  ● Rider Levett Bucknall, Australia
● INFRA-SA
  ● SKA South Africa – Tracy Cheetham
Assembly Integration Verification
● Commonwealth Scientific and Industrial Research Organisation (CSIRO), Australia
● Netherlands Institute for Radio Astronomy (ASTRON),The Netherlands
● SKA South Africa, South Africa
Acquisition and Construction SKA1 2017-2020
● 2016 Consortia deliverables to SKAO are documents to enable preparation of necessary tenders
● 2017 an organization to enable the procurements should be formed by the Board and set up to issue the tenders
● 2017 expect that some teams (not current consortia) are encouraged to respond to tenders
● 2017 response to tenders and selection of successful bidding teams
● 2018 construction of SKA1 begins
● 2020 SKA1 operations begin
SKA2 Scale Up and Schedule
● Design and Engineering completed by 2020 based on experience and learnings from SKA1
● 2021 tenders issued
● 2022 build out begins
● 2025 SKA2 complete and operations begin
● SKA intended to be operational until 2075
Radio Astronomy Related Industry Consortium (RARIC)
● Missions
  ● Prototype, develop, document and advocate RA as an area of science exploration as valuable to US industry at extreme scale
  ● Guidance from astronomers collaborating with industry to create working groups and carry out projects that advance technologies for RA
  ● Add to industry and government understanding of RA’s economic and societal value
  ● Provide results of work to SKAO and other RA projects of extreme scale
● Initial Voting Members
  ● Industry – Cray, DDN, IBM, Intel*, Mellanox, Nvidia*, Xyratex
  ● Academic – Berkeley, Cornell*, Illinois/NCSA, JPL/Caltech, UNC/Renci
● Proposed Working Groups
  ● Data Management – Paul Grun Co-Chair with Caltech
  ● Software Correlation – Co-Chairs are NCSA and Berkeley
  ● Foundational Technologies – Bill Boas Co-Chair with Alan Benner
  ● Transient Streaming Analytics – proposed by JPL
* - not yet agreeing to join and assign people
SKA’s Analysis of HPC Roadmap by 2017
Historical Progress of Top500
• By 2017 SKA goals should be more than realistic for FLOPs
• May even be able to use a Top100 system
• Integer and number-of-threads perspective may be a challenge
• As may be the data streaming aspect
Memory Bandwidth
• Arithmetic Intensity is lower, so FLOPs is not a good measure
• Expected to fall short by O(10) by 2017
Data Rates
• Correlator preliminary projections lead to 700 40GbE or 280 100GbE
• Beamformer is different; need is higher by O(10)
Power
• Projections for Exascale of 20 MW, not including cooling in desert
• Cost in Euro may be in the range 45-100M per year
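Two of these figures can be checked with simple arithmetic. The sketch below assumes the "700 40GbE or 280 100GbE" counts refer to network links running at full line rate, and, for the cost figure, that the whole 45-100M Euro per year is attributed to running 20 MW continuously. Both are assumptions made only for illustration; function names are hypothetical.

```python
HOURS_PER_YEAR = 8760


def aggregate_tbps(link_count, link_rate_gbps):
    """Total bandwidth of link_count links, in terabits per second."""
    return link_count * link_rate_gbps * 1e9 / 1e12


def annual_energy_gwh(power_mw):
    """Energy used in one year of continuous operation, in gigawatt-hours."""
    return power_mw * HOURS_PER_YEAR / 1000


def implied_cost_per_kwh(annual_cost_eur, power_mw):
    """Cost per kWh if the whole annual cost were electricity (assumption)."""
    kwh_per_year = power_mw * 1000 * HOURS_PER_YEAR
    return annual_cost_eur / kwh_per_year


if __name__ == "__main__":
    print("700 x 40GbE :", aggregate_tbps(700, 40), "Tbps")
    print("280 x 100GbE:", aggregate_tbps(280, 100), "Tbps")
    print("20 MW for a year:", annual_energy_gwh(20), "GWh")
    for cost in (45e6, 100e6):
        print(f"{cost / 1e6:.0f}M Euro/year -> "
              f"{implied_cost_per_kwh(cost, 20):.3f} Euro per kWh")
```

Both link options come to the same aggregate of 28 Tbps, which suggests they are alternative ways of carrying one projected correlator data rate.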
Software and Applications – major needs
● Operating Systems and Middleware
  ● No-Stop Data Streaming at Extreme Scale
  ● Huge increase in concurrency
  ● Significant Increase in I/O efficiency
  ● Consistency across heterogeneous hardware
  ● Synchronicity across nodes and cores to run in lock-step
  ● No “failures” (h/w or s/w) impact continuous operation
● Compilers/Libraries/Development Tools
  ● Consistent across Hardware and Instruction Set Architectures
  ● Extreme levels of parallelism across all APIs
  ● Failure Handling does not interrupt No-Stop Data Streaming at Extreme Scale
  ● Programmer knowledge of data transformations and locality of data within streams
  ● Knowledge of flops, bytes and joules in real time
● Algorithms and Applications
  ● Fortunately nearly all codes today are “home grown” in RA
  ● To achieve the necessary scale nearly all current RA codes need re-writing
  ● No. of channels and computing resources are roughly in balance, not at extreme scale
● Co-Design
  ● Close collaboration between RA and Industry NECESSARY for all of the above
| | 2018 | 2009 vs 2018 |
|---|---|---|
| Rmax | 1 EFlop | O(1000) |
| Energy requirement | 20 MW | O(10) |
| Energy/Flop | 20 pJ/Flop | -O(100) |
| System memory | 32 - 64 PB | O(100) |
| Memory/Flop | 0.03 B/Flop | -O(10) |
| Node performance | 1 - 15 TFlop | O(10) - O(100) |
| Node interconnect b/w | 200-400 GB/s | O(100) |
| Memory bw/node | 2 - 4 TB/s | O(100) |
| Memory bw/Flop | 0.002 B/s/Flop | -O(100) |
| Concurrency | O(10^9) | O(10,000) |
| MTTI | O(1 day) | -O(10) |
Table 1 - Projected supercomputer specifications, compared to two current top ranking supercomputers.
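Some rows of Table 1 follow directly from others. As a quick consistency check, the sketch below derives energy per flop, memory per flop and an implied node count from the Rmax, power, memory and node-performance rows. The function names are illustrative, and the node count is a derived figure that does not appear in the table.

```python
def energy_per_flop_pj(power_mw, rmax_eflops):
    """Picojoules per floating-point operation at sustained Rmax."""
    joules_per_flop = (power_mw * 1e6) / (rmax_eflops * 1e18)
    return joules_per_flop * 1e12


def memory_bytes_per_flop(memory_pb, rmax_eflops):
    """Bytes of system memory per flop/s of peak performance."""
    return (memory_pb * 1e15) / (rmax_eflops * 1e18)


def nodes_required(rmax_eflops, node_tflops):
    """Number of nodes needed to reach Rmax at a given per-node performance."""
    return (rmax_eflops * 1e18) / (node_tflops * 1e12)


if __name__ == "__main__":
    print("Energy/Flop:", energy_per_flop_pj(20, 1), "pJ")
    print("Memory/Flop:", memory_bytes_per_flop(32, 1), "to",
          memory_bytes_per_flop(64, 1), "B/Flop")
    print("Nodes:", round(nodes_required(1, 15)), "to", round(nodes_required(1, 1)))
```

The 20 pJ/Flop row is exactly 20 MW divided by 1 EFlop, and the 0.03 B/Flop row matches the lower end (32 PB) of the system memory range.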