2020 and Beyond…. The ExaScale
What is ExaScale?
§ Exascale computing refers to computing systems capable of at least one exaFLOPS, or a billion billion calculations per second. Such capacity represents a thousandfold increase over the first petascale computer, which came into operation in 2008. (One exaFLOPS is a thousand petaFLOPS, or a quintillion, 10¹⁸, floating-point operations per second.)
§ Exascale computing would be considered a significant achievement in computer engineering, as it is believed to be of the order of the processing power of the human brain at the neural level. It is, for instance, the target power of the Human Brain Project.
Exa: Greek ἕξ, six. International unit prefix = 1000⁶ = (10³)⁶ = 10¹⁸ = 1 000 000 000 000 000 000 = a billion billion = a quintillion
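To make the thousandfold jump from petascale concrete, a minimal back-of-the-envelope sketch in C; the 10²¹-operation workload is an arbitrary assumption, not any particular application.

/* Back-of-the-envelope: how long a fixed workload takes at peta- vs exascale. */
#include <stdio.h>

int main(void) {
    const double workload  = 1e21;   /* assumed workload: 10^21 floating-point operations */
    const double petaflops = 1e15;   /* ~first petascale system (2008): 10^15 FLOPS */
    const double exaflops  = 1e18;   /* exascale: 10^18 FLOPS, a thousand times faster */

    printf("At 1 PFLOPS: %.0f s (about %.1f days)\n",
           workload / petaflops, workload / petaflops / 86400.0);
    printf("At 1 EFLOPS: %.0f s (about %.1f minutes)\n",
           workload / exaflops, workload / exaflops / 60.0);
    return 0;
}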
Who is pushing ExaScale?
§ The initiative has been endorsed by two US agencies: the Office of Science and the National Nuclear Security Administration, both of which are part of the US Department of Energy.
§ The technology would be useful in various computation-intensive research areas, including basic research, engineering, earth science, biology, materials science, energy issues, and national security.
§ The United States has set aside $126 million for exascale computing beginning in 2012.
Who is doing it?
§ USA, Europe, and Japan, with some activity from China & India
§ Three European projects aiming to develop technologies and software for exascale computing were started in 2011:
§ The CRESTA project (Collaborative Research into Exascale Systemware, Tools and Applications)
§ The DEEP project (Dynamical ExaScale Entry Platform)
§ Mont-Blanc
§ In Japan, the RIKEN Advanced Institute for Computational Science is planning an exascale system for 2020; it will consume less than 30 megawatts.
And in the US…..
§ On 29 July 2015, President Obama signed an executive order creating a National Strategic Computing Initiative, calling for the accelerated development of an exascale system and funding research into post-semiconductor computing.
Top500
Challenges
§ It has been recognized that enabling applications to fully exploit the capabilities of exascale computing systems is not straightforward.
§ In fact, in June 2014, the stagnation of the Top500 supercomputer list led observers to question the possibility of exascale systems by 2020. (The original prediction was 2018!)
National Strategic Computing Initiative (NSCI)
§ Five themes:
§ Create systems that can apply exaflops of compute to exabytes of data
§ Keep the US at the forefront of HPC capabilities
§ Improve HPC application developer productivity
§ Make HPC readily available
§ Establish hardware technology for future HPC systems
§ NSCI will be driven largely by the National Science Foundation and the Departments of Defense and Energy. NASA, the FBI, the Department of Homeland Security, the National Institutes of Health and the Commerce Department's National Oceanic and Atmospheric Administration are designated as the five "deployment agencies" that will put the planned computers to use and will take part in the planning and development efforts.
https://www.whitehouse.gov/sites/default/files/microsites/ostp/nsci_fact_sheet.pdf
NSCI…..
§ The fastest supercomputer in the world today is China's Tianhe-2, which runs at 33.86 petaflops.
§ In April, the Energy Department set aside $200 million for a system that is expected to reach a peak performance of 180 petaflops when it is delivered in 2018.
§ "Over the past six decades, U.S. compu=ng capabili=es have been maintained through con=nuous research and the development and deployment of new compu=ng systems," the execu=ve order states. "Maximizing the benefits of HPC in the coming decades will require ... a cohesive, strategic effort within the federal government and a close collabora=on between the public and private sectors."
11
Europe/EESI
The EESI ambition is to support Europe's exascale computing strategy by providing the appropriate guidance to build such capability and to keep Europe ahead of the competition. The objectives of this project are to focus on improving key software issues, advancing cross-cutting issues, and gap analysis.
Europe
§ EESI: The European Exascale Software Initiative (2008)
Europe/EESI Aims
Ø Contribute to the coordination and monitoring of European exascale open-source software production
Ø Produce a roadmap of exascale industrial applications, for Climate, Earth Sciences, Fundamental Physics, and Life Science, with a particular emphasis on breakthroughs and gap analysis.
Ø Produce a roadmap of Numerical Libraries, the software ecosystem, scientific software engineering, and programmability. Once again, the emergence of breakthroughs in linear or non-linear algebra or in particle simulation, for example, will be monitored (e.g. eigenvalues of tensors?).
Ø Follow up on research programs in massively parallel stochastic programming, Uncertainties, Power, Performance, Data Management, and Resilience, with a particular emphasis on breakthroughs and gap analysis.
Ø Exchange and dissemination
Ø Act as a pro-active European voice in the international community and propose an International Exascale Software Initiative
Ø Prepare periodic syntheses by key issue – recommendations for the EC, funding agencies, and R&D stakeholders
Europe/EESI
Europe/DEEP and DEEP-ER
Ø DEEP is a research project developing a novel architecture for next-generation supercomputers
Ø Hardware architecture
Ø A new type of "cluster-booster architecture" has been designed for the DEEP system.
Ø The final DEEP prototype system consists of a 128-node Eurotech Aurora Cluster and a 384-node Booster.
Europe/DEEP and DEEP-ER
Ø DEEP takes the concept of compute acceleration to a new level: it combines a standard InfiniBand™ Cluster using Intel® Xeon® nodes (Cluster Nodes) with a highly scalable Booster built from Intel® Xeon Phi™ co-processors (Booster Nodes) and the EXTOLL high-performance 3D torus network.
Ø Both interconnects are coupled by Intel® Core™ Booster Interface Nodes.
Ø Code parts with limited scalability (e.g. because of complex control flow or data dependencies) run with high efficiency on the Cluster side, and the transparent bridging of both interconnects facilitates high-speed data transfer between the two sides.
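As a rough illustration of the cluster-booster split, a hedged sketch using plain MPI dynamic process management (this is not the actual DEEP offload software stack): the cluster-side code spawns the highly scalable kernel onto booster nodes and exchanges data over the resulting intercommunicator. The executable name ./booster_kernel and the process count of 384 are placeholders.

/* Cluster side: complex control flow, limited scalability. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    double boundary_data[1024] = {0.0};   /* input prepared on the Cluster */
    double dummy = 0.0, result = 0.0;

    /* Offload the scalable kernel to 384 booster processes (placeholder count). */
    MPI_Comm booster;
    MPI_Comm_spawn("./booster_kernel", MPI_ARGV_NULL, 384, MPI_INFO_NULL,
                   0, MPI_COMM_SELF, &booster, MPI_ERRCODES_IGNORE);

    /* Broadcast input to the Booster, then collect its reduced result. */
    MPI_Bcast(boundary_data, 1024, MPI_DOUBLE, MPI_ROOT, booster);
    MPI_Reduce(&dummy, &result, 1, MPI_DOUBLE, MPI_SUM, MPI_ROOT, booster);

    printf("result returned from the Booster: %f\n", result);
    MPI_Comm_disconnect(&booster);
    MPI_Finalize();
    return 0;
}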
Europe/DEEP and DEEP-ER
Meanwhile in the US of A…..
§ DOE spending $200M on next-gen supercomputers
§ The Department of Energy is to deliver two next-generation supercomputers to the department's Argonne National Laboratory.
§ The contract is the third and final part of the DOE's $525 million Collaboration of Oak Ridge, Argonne and Lawrence Livermore (CORAL) initiative, which is developing systems that will be five to seven times more powerful than today's top supercomputers. Intel Federal LLC is the prime contractor and will deliver a system called Aurora, based on Cray "Shasta" supercomputers.
Aurora…..
§ The Aurora system is expected to have a peak performance of 180 petaflops when it is delivered in 2018, making it the most powerful system announced to date. Supercomputers used for weather forecasting by the National Oceanic and Atmospheric Administration, by comparison, are currently being upgraded to five petaflops, while Oak Ridge National Laboratory's Titan supercomputer is capable of 27 petaflops.
§ The DoE also announced two high-performance computing R&D programs, DesignForward and FastForward, led by DOE's Office of Science and National Nuclear Security Administration.
FastForward
§ In 2011, the FastForward RFP solicited innovative R&D proposals in the areas of processors, memory, and storage and I/O that would maximize energy and concurrency efficiency while increasing the performance, productivity, and reliability of key DOE extreme-scale applications from both the NNSA and the Office of Science.
§ The goal is to begin addressing long-lead-time items that will impact extreme-scale HPC systems later this decade.
§ Award winners: § NVIDIA, IBM, Intel, AMD, WhamCloud
AMD Fast Forward 2 Memory Technology
Investigators
• Mike Ignatowski (PI - Technical lead)
• Jay Owen (Director)
DOE representatives
• Jeanine Cook, SNL
• John May, LLNL
Total Funding
• Approximately $13M
Contract Period of Performance
• Sept 2014 – Dec 2016
Research and Development in:
• New Advanced Memory Interface
• Multi-level Memory Architecture and Software Support (see the sketch below)
• Processing-in-Memory Architecture and Software Support
• Processing-in-Memory Test Bed
Co-Design, Technology Transfers, and Interactions with System Integrators
• Driving Research Results into Future Products
Proposed Exascale Memory Architecture
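To illustrate what software support for a multi-level memory hierarchy can look like at the application level, here is a minimal sketch using the open-source memkind/hbwmalloc API as a stand-in; it is illustrative only and is not the interface developed under this contract.

/* Place the bandwidth-critical working set in the fast tier, bulk data in DRAM. */
#include <hbwmalloc.h>   /* from the memkind library; link with -lmemkind */
#include <stdlib.h>
#include <stdio.h>

int main(void) {
    size_t n = 1 << 20;
    int have_hbw = (hbw_check_available() == 0);   /* is a fast tier present? */

    /* Hot array: ask for high-bandwidth memory, fall back to ordinary DRAM. */
    double *hot = have_hbw ? hbw_malloc(n * sizeof *hot)
                           : malloc(n * sizeof *hot);

    /* Cold array: large and rarely touched, leave it in the capacity tier. */
    double *cold = malloc(16 * n * sizeof *cold);

    if (!hot || !cold) { fprintf(stderr, "allocation failed\n"); return 1; }

    /* ... compute intensively on 'hot', stream occasionally from 'cold' ... */

    if (have_hbw) hbw_free(hot); else free(hot);
    free(cold);
    return 0;
}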
NVIDIA Fast Forward 2 Node Architecture
Investigators
• Bill Dally (PI - Technical Lead)
• Sylvia Chanak (Project Manager)
DOE representative
• Robin Goldstone, LLNL
Total Funding
• Approximately $19M
Contract Period of Performance
• Oct 2014 – Dec 2016
Application co-design: Explore algorithm trade-offs, evaluate research concepts, and engage with DOE developers.
Node Architecture: Improve energy efficiency of throughput processors and total bandwidth to memory.
Resilience: Develop lightweight error detection and recovery mechanisms.
Circuits and VLSI: Design energy-efficient signaling, power delivery, and memory technologies.
NIC architecture: Study TOC integration and resilience issues.
Programming System: Portable, high-performance programming with tools to help automate target-specific tuning.
Co-design codes include: CoMD, LULESH, OpenMC, HPGMG-FV
System Sketch (figure)
Cray FastForward 2 Node Architecture
Investigators
• Michael Langer (Program Manager)
• Gregory Faanes (PI - Technical lead)
• Daniel Ernst (PI - Technical lead)
DOE representatives
• Josip Loncaric, LANL
• Bronis de Supinski, LLNL
Contract Period of Performance
• Nov 2014 – Jan 2017
Efforts in:
• ARM node technology
• High-efficiency core definitions
• System on Chip (SoC) investigations
• Next-generation ISA definition
• Architecture and performance simulation
• Core and SoC simulation environments
• Initial performance and software insight
• Memory/Power Mgmt/Resiliency investigations
• Exascale Compiler / Runtime investigations
• SoC concepts for 2020 and 2022
High Level FastForward 2 Plan (figure): Siml Platforms, ISA, Compiler Runtime, Core Arch, Memory Arch, NIC Arch, 2020/22 SoC Arch
IBM Fast Forward 2 Memory Technology
Investigators
• Hillery Hunter (PI - Technical lead), IBM
• Thomas Gray, nVIDIA
DOE representatives
• Andres Marquez, PNNL
• Maya Gokhale, LLNL
Total Funding
• Approximately $7M
Contract Period of Performance
• Sept 2014 – Dec 2016
Flexible Future Memory Interfaces
• Flexible memory controller implementation in a commercial server system
• Infrastructure for performance modeling, simulation and analysis
• Universal protocol and specification development for different memory technologies
Memory Power Efficiency
• Memory interface architecture enhancements
• Energy-efficient off-chip signaling
• Energy-efficient signaling within stacks
• Energy-efficient DRAM core design
• Enable reduced DRAM voltage without significant performance impact
• Improve DRAM row locality with DRAM-side caching
• Power management innovations
Advanced Stacked Memory Architectures
• Develop architecture and feature set for next-gen memories
• Explore a future paradigm in which composable stacked memory devices may be built using a single design to support different memory technologies, interfaces, and applications
Reliability for Large-Scale Memory Systems
• I/O reliability enhancement techniques
• Core reliability enhancement techniques
• Advanced error reduction, handling and repair
(Figure: stacked memory package showing DRAM dies, STT-MRAM dies, two base die types, TSVs, micro bumps, and C4 balls)
Intel Fast Forward 2 Node Architecture
Investigators
• Shekhar Borkar (PI - Technical lead)
DOE representatives
• Nick Wright, LBNL
• Matt Leininger, LLNL
Total Funding
• Approximately $20M
Contract Period of Performance
• Sept 2014 – Dec 2016
Node Architecture and Microarchitecture
• Complete node architecture; evaluate and validate using proxy applications
Prototype Processor Design
• Complete processor design, ready for fabrication
Prototype Node Board and System P1 Design
• Design memory subsystem, node, and conceptual P1 system cabinet
Software Stack
• Leverage the SW stack developed under X-Stack, the OCR runtime, and legacy support
Applications with Co-design Centers
• Evaluate the approach using both legacy proxy applications and refactored proxies using the X-Stack software stack
Memory hierarchy expected for exascale machines devised in Fast Forward 2
Design Forward
§ The objective of the DesignForward program is to initiate partnerships with multiple companies to accelerate the R&D of critical technologies needed for extreme-scale computing. It is recognized that the broader computing market will drive innovation in a direction that may not meet DOE's mission needs.
§ Winners: § AMD, Cray, IBM, Intel and nVIDIA
AMD DesignForward 2 System Integration
Investigators
• Wayne Burleson (PI - Technical lead)
• Bill Brantley (Technical co-lead)
• Andy Kegel (Senior Manager)
DOE representatives
• Josip Loncaric, LANL
• Jonathan Carter, LBL
Total Funding
• Approximately $2.5M
Contract Period of Performance
• Mar 2015 – Feb 2017
Conceptual System Design
• Develop and specify a high-level design for an exascale system that could be delivered in the 2023 timeframe; consult with system integrators to evaluate alternatives.
Execution Model
• Extend the Heterogeneous System Architecture (HSA) and AMD's eXtended Task Queuing (XTQ) research to the full software stack.
Metrics and their Evaluation
• Analyze power and compute capacity per rack, system scalability, resilience, communication, value to workflow, and cost of ownership.
Co-Design Activities
• Participate with DOE researchers in working groups, deep dives, and proxy application studies
Cray DesignForward 2 System Integration
Investigators
• Michael Langer (Program Manager)
• Larry Kaplan (PI - Technical lead)
• Bob Alverson (PI - Technical lead)
DOE representatives
• Al Geist, ORNL
• Paul Hargrove, LBL
Contract Period of Performance
• Feb 2015 – Mar 2017
System Component Study
• Examine exascale system architectures, including node architectures, networks, and system I/O
Execution Model
• Derive a runtime specification and apply it to a study of an evolutionary and a revolutionary programming model
System Organization
• Using proxy applications as guideposts, examine a selection of exascale system configurations using the studied components; articulate relative advantages and disadvantages of the component technologies
Node study to be based on Abstract Machine Models*
*Ang, J. A., et al. "Abstract Machine Models and Proxy Architectures for Exascale Computing." Sandia National Laboratories and Lawrence Berkeley National Laboratory, 2014.
IBM DesignForward 2 System Integration
Investigators
• Constantinos Evangelinos (Co-PI)
• Charles Johns (Co-PI)
DOE representatives
• Robin Goldstone, LLNL
• Nick Wright, LBL
Total Funding
• Approximately $2.5M
Contract Period of Performance
• 2015 – 2017
Execution Model
• Define the architecture and abstractions of the execution model and its interactions with programming models.
Advanced Memory Architectures
• Explore appropriate interfaces; examine possible approaches and architectures.
Execution Emulator and Performance Modeling
• Allow for performance estimation of exemplar workflows in different hardware configurations.
Metrics
• Define metrics for system evaluation in concert with the DF2 laboratories.
System Options
• Understand tradeoffs in design options based on workflows and (proxy) applications.
Data Centric Computing Model*
*Agerwala, T. and Perone, M., "Data Centric Systems: The Next Paradigm in Computing", ICPP 2014.
Intel DesignForward 2 System Integration
Investigators
• Tryggve Fossum (PI - Technical lead)
DOE representatives
• Ray Bair, ANL
• Jeff Broughton, LBL
Total Funding
• Approximately $2.5M
Contract Period of Performance
• Mar 2015 – Jan 2017
Workload Tools Development
• Develop tools and methodologies to convert application workloads to simulator-specific input formats.
Component Simulator Enhancements
• Extend existing simulators to operate within a multi-tiered simulation framework.
System Performance Studies
• Apply the simulation framework to non-trivial benchmarks and workloads.
Programming Model Evaluation
• Evaluate and compare programming and execution models via simulation.
System Architecture Design Study
• Utilize performance study results in a full system design study.
Multi-layer refinement and abstraction of simulation results
Further Reading…
§ IESP
§ International Exascale Software Project, exascale.org