
Perspective on Extreme Scale Computing

in China

Depei Qian

Beihang University

Co-Design 2014, Guangzhou

Nov. 6, 2014

A historical review

• Changes of the research priority on computing under the 863 program

– 1987: intelligent computer was the initial goal

• Developing Lisp machine and Prolog machine

• Supporting AI applications

• Influenced by the Japanese fifth generation computer plan

– 1990: changed to parallel computers

• Developing SMP and MPP for parallel computing

• Supporting compute-intensive applications

• To meet the needs of research community and industry

– 1998: emphasis on both high performance computers and HPC environment

• Pursuing ease of use of high performance computers

• Establishing a widely available computing facility

• Influenced by the PACI program in the US

A historical review

– 2002-2005: High Performance Computer and Core Software (863 key project)

• Emphasizing resource sharing and collaborative work

• Supporting applications in multiple areas with the Grid technology

• Successfully developed TF-scale computers and China National Grid (CNGrid) testbed

– 2006-2010: High Productivity Computer and Service Environment (863 key project)

• Emphasizing other features besides the peak performance

– Efficiency in program development

– Portability of programs

– Robustness of the system

• Emphasizing coordinated development of machine, environment, and applications

• Emphasizing the service features of the environment

• Successfully developed Peta-scale computers and upgraded CNGrid into the national HPC service environment

A historical review

– 2010-2016: High Productivity Computer and Application Service Environment (863 key project)

• Emphasizing new operation models and mechanisms of CNGrid

• Developing cloud-like application villages over CNGrid to promote applications

• Developing world-class computer systems

– Tianhe-2

– Sunway-NG

A historical review

• High performance computer development in the past 20 years in China

– 1993: Dawning-I, shared-memory SMP, 640 MIPS peak

• Dawning 1000: MPP, 2.5 GFlops (1995)

• Dawning 1000A: cluster (1996)

• Dawning 2000: 111 GFlops (1999)

• Dawning 3000: 400 GFlops (2000)

– 2003: Lenovo DeepComp 6800, 5.32 TFlops peak, cluster

– From 1993 to 2003: the performance of a single system increased more than 8000 times

A historical review

– 2004: Dawning 4000A, 11.2 TFlops peak, cluster

• Lenovo 7000, 150 TFlops peak, hybrid cluster, and Dawning 5000A, 230 TFlops, cluster (2008)

• TH-1A, 4.7 PFlops peak, 2.56 PFlops LinPack, CPU+GPU (2010)

• Dawning 6000, 3 PFlops peak, 1.27 PFlops LinPack, CPU+GPU (2010)

• Sunway-Bluelight, 1.07 PFlops peak, 796 TFlops LinPack, homogeneous, implemented with China's multicore processors (2011)

– 2013: Tianhe-2, 54 PFlops peak and 33.9 PFlops LinPack, CPU+MIC accelerated architecture

– From 2003 to 2013: the performance of a single system increased by 10,000 times

– In the past 20 years, the performance of the fastest systems developed by the 863 program increased 84,000,000 times

Experiences

• Coordination between the national research programs and the development plans of the local government/application organizations

• Matching money for developing the computers

• Joint effort in establishing national supercomputing centers

• Collaboration between industry, universities, research institutes, and application organizations

– HPC centers played an important role in the development of high performance computers

• Selecting the team

• Defining the system features

– Industry participated in system development

• Inspur, Sugon, and Lenovo actively participated in the development of PF- and 50PF-scale high performance computer systems

– Application organizations led the development of application software

• Balanced development of HPC machines, environment, and applications

Problems

• Lack of a national long-term plan for high performance computing

– the successful story of joint funding may not always be repeatable

– sustained funding is needed

• Lack of regular support for the operation of the HPC centers

– Charging end users sometimes prevents academic users from using the computing facilities

• Reliance on external key technologies for developing the systems

– Processors/accelerators, memory, interconnects

• The application software bottleneck

– Commercial application software relies on imports

– Limited parallel scale of the commercial software

– Must be addressed by

• Self-development

• Open source

• Resource sharing by new business model

• Shortage of talent

– improving university education and continuous training

• Lack of routine inter-disciplinary collaboration

– Must be encouraged by more effective measures

Challenges

• Can we maintain the speed of the last 20 years?

– Impossible without technical breakthroughs and significant investment

– The TOP500 curve shows a slowdown worldwide

– Defining a reasonable but challenging goal for the next 5 years will be critical

• The great technical challenges of exa-scale system development

– Power

– performance

– programmability

– resilience

• Exploring billion-thread parallelism in applications on exa-scale systems

• Improving QoS and user experiences in the context of cloud, big data, and mobile Internet

Challenges: exa-scale system

• Low power consumption

– The US DoE has defined 20 MW as the power consumption limit for an exa-scale computer

– the biggest and most difficult barrier to implementing exa-scale systems

• Reliability

– very short MTBF in a system with a huge number of components

– programs requiring long execution times are hard to complete

• Checkpointing may not work

– fault-tolerance measures at different layers are needed

• Fault detection, fault diagnosis, fault-isolation, and fault recovery
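The MTBF concern above can be made concrete with back-of-the-envelope arithmetic. The sketch below is illustrative, not from the talk: the component counts and failure rates are hypothetical, and Young's classical approximation is used to pick a checkpoint interval.

```python
# Hypothetical numbers for illustration; assuming independent component
# failures, system MTBF scales as component MTBF divided by component count.

def system_mtbf_hours(component_mtbf_hours, n_components):
    """MTBF of the whole system under independent, identical components."""
    return component_mtbf_hours / n_components

def young_interval_hours(mtbf_hours, checkpoint_cost_hours):
    """Young's approximation for the optimal checkpoint interval:
    T_opt = sqrt(2 * C * MTBF), with C the cost of writing one checkpoint."""
    return (2.0 * checkpoint_cost_hours * mtbf_hours) ** 0.5

# A hypothetical machine with 100,000 nodes, each with a 5-year MTBF:
node_mtbf = 5 * 365 * 24                      # 43,800 hours per node
mtbf = system_mtbf_hours(node_mtbf, 100_000)  # well under one hour
print(f"system MTBF: {mtbf:.2f} h")

# If one global checkpoint costs 6 minutes (0.1 h), the machine would have
# to checkpoint roughly every 18 minutes -- hence "checkpointing may not
# work" at exa-scale without fault tolerance at other layers.
print(f"checkpoint interval: {young_interval_hours(mtbf, 0.1):.2f} h")
```

The point of the arithmetic is that failure handling must move below the application layer once the system MTBF drops near the checkpoint cost itself.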

• Exa-scale programming and execution

– how to program large scale heterogeneous parallel systems

– explore locality and reduce data movement

• Performance obtained by applications

– co-design between applications and systems

– new evaluation system and benchmarks

Challenges: HPC applications

• Application software development lags behind parallel hardware development

– systems composed of 3-4 million cores

– software developed can use only 300,000 cores at most

• Need a holistic approach to application software development

– Joint efforts in problem definition, modeling, algorithms, and architecture-dependent algorithm implementation

• Need greater efforts on programming support and tools

– acceptance of parallel programming frameworks is still low

– adoption of performance tools is not a common practice
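One classical way (not given in the talk) to see why software stalls far below the full machine scale is Amdahl's law: even a tiny serial fraction caps usable parallelism. A minimal sketch with hypothetical numbers:

```python
# Amdahl's law: speedup S(n) = 1 / (s + (1 - s) / n), where s is the
# serial fraction of the work. All numbers below are hypothetical.

def amdahl_speedup(serial_fraction, n_cores):
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_cores)

# With just 0.001% of the work serial, going from 300,000 cores to
# 3,000,000 cores buys only about a 1.3x improvement:
s = 1e-5
print(round(amdahl_speedup(s, 300_000)))    # ~75,000x speedup
print(round(amdahl_speedup(s, 3_000_000)))  # ~97,000x speedup
```

This is one reason the holistic approach above matters: removing serial bottlenecks requires rethinking the model and algorithm, not just the implementation.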

Challenges: HPC environment

• A long way to go before being a mature national computing infrastructure

– Need to build up rich software resources

– the computing centers need improvement in functions, service modes, and quality of service

– a hierarchical infrastructure is needed to meet the requirements of different applications

• more efficient usage of different resources

• New operation models and mechanisms

– from batch to more interactive

– from supporting scientific computing only to supporting both computing and data

– pervasive use stimulated by mobile Internet

– business model for establishing computing service industry

Strategic studies

• A strategic study has been organized jointly by the 863 key project and the National Supercomputing Innovation Alliance

• Studies in three areas carried out

– Technology and system

– HPC Service and environment

– HPC applications

• Workshops organized in August

• Reports on each area have been drafted and are still being revised

General strategy

• Pursuing sustainable development based on new innovations and the experience gained in the last 3 key projects

– Emphasize technical innovations and self-developed technologies

– Develop HPC systems and environment based on application requirements

• true “application pulling”

• success measured by applications

– Continue the strategy of coordinated development of supercomputers, HPC environment, and HPC applications

• mutual promotion among the three aspects

• promote overall level of high performance computing in China

Task 1: High Performance Computers

• Strategy

– meet the nation's increasing requirements for computing

– establish a collaborative innovation mechanism

– make breakthroughs in key technologies, developing a leading-class supercomputer based on self-controllable technologies

• Emphasis

– co-design based on application features analysis

– exa-scale computer architectures

– exa-scale computer system software

– processor architecture

– high performance interconnects

– system infrastructure

• System development

– Based on technology innovations, developing a leading-class computing system

Co-design based on application features analysis

• What applications require exa-scale computing? How big is the set of exa-scale applications?

• Understand the workload characteristics of the exa-scale applications

– earth system modeling, fusion simulation, turbulence simulation, materials simulation, bio-informatics data analysis, …

• Co-design based on application characteristics

– propose architecture appropriate for major applications

– Look for architectural support to major algorithms

• Develop metrics and benchmarks to know how well the architecture adapts to the applications

Exa-scale computer architecture

• Multi-objective constrained design

– balance and optimization among performance, power consumption, cost and reliability by hardware and software coordination

• Making tradeoffs between homogeneous and heterogeneous, general purpose and special purpose, dynamic and static designs

• General purpose vs. special purpose?

– If there are only a few applications requiring exa-scale computers, why shouldn't we develop a more efficient special purpose machine?

– Could the general purpose machine satisfy the needs of different applications efficiently?

• Homogeneous vs. heterogeneous?

– To meet the requirements of a wide range of applications, CPU only, or CPU + accelerators?

– Should the accelerators be used as co-processors or stand-alone?

• Idle and wasted if not needed in the current system

• Static vs. dynamic?

– Could we use reconfigurable architectures to take advantage of both special purpose and general purpose designs?

– Static or dynamic reconfiguration?

– Languages and tools to support reconfiguration according to application characteristics?

Exa-scale computer architecture

• Address the memory wall issue

– hierarchical memory architecture

• depth, structure of the cache, coherence, data pre-fetching

– the impact of novel memory devices on the architecture

• 3D packaging of processor/memory

• novel non-volatile memory devices

– reduce data movement from the architecture point of view

• functions performed by memory

• moving code instead of data
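One standard locality technique behind the bullets above is loop tiling (blocking): operate on cache-sized sub-blocks so data is reused before it is evicted. A minimal sketch in Python (matrix and tile sizes are hypothetical; real HPC codes tile in C/Fortran against actual cache sizes):

```python
# Tiled matrix transpose: both the source and destination tiles fit in a
# cache-sized working set, reducing data movement between memory levels.
# N and B are hypothetical; the result is identical to the naive version.

N, B = 8, 4  # matrix dimension and tile (block) size

a = [[i * N + j for j in range(N)] for i in range(N)]

# Naive transpose: walks one matrix column-wise, which on real hardware
# touches a new cache line on almost every access once N is large.
naive = [[a[j][i] for j in range(N)] for i in range(N)]

# Tiled transpose: visit the matrix in B x B blocks.
tiled = [[0] * N for _ in range(N)]
for ii in range(0, N, B):
    for jj in range(0, N, B):
        for i in range(ii, min(ii + B, N)):
            for j in range(jj, min(jj + B, N)):
                tiled[j][i] = a[i][j]

assert tiled == naive  # same result; the win is locality, not arithmetic
```

The same blocking idea generalizes to the deeper memory hierarchies discussed here: one tile size per level of the hierarchy.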

Exa-scale computing software

• The exa-scale computing software environment includes

– node OS and runtime

– program development environment

– system resource management

– parallel program debugging, performance analysis and optimization

• Node OS and runtime

– supporting heterogeneity and in-node parallelism

– ensuring efficiency in programming, code execution, resource management, and reliability

Exa-scale computing software

• The program development environment must handle multi-level non-symmetric parallelism, data-driven features, code energy consumption, and code reliability

– parallel programming models

– parallel languages and compilers

– runtime optimization

• Exa-scale parallel program debugging and performance analysis

– acceptable time and storage overhead

– parallel program debugging

– program performance analysis and optimization

– program energy consumption optimization

• Resource management for large scale complex systems, achieving stable and efficient operation of exa-scale systems

• Novel hardware and software fault-tolerant mechanisms

Processor for exa-scale systems

• The processor is the key to achieving the performance and energy consumption goals

– a 20 MW system power budget implies a processor efficiency of roughly 100 GFlops/W

– very difficult to realize
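The 100 GFlops/W figure follows from simple arithmetic, under the assumption (not stated explicitly in the talk) that processors receive roughly half of the 20 MW system budget, with the rest going to memory, interconnect, and cooling:

```python
# Back-of-the-envelope check of the processor efficiency target.
# processor_share = 0.5 is an assumption, not a figure from the talk.

exaflops = 1e18         # 1 EFlops sustained
system_power_w = 20e6   # 20 MW system power limit (US DoE)
processor_share = 0.5   # hypothetical fraction of power for processors

gflops_per_watt = exaflops / (system_power_w * processor_share) / 1e9
print(gflops_per_watt)  # -> 100.0 GFlops/W required of the processors
```

For comparison with the talk's "very difficult to realize": this target is an order of magnitude beyond the most efficient processors of the 2014 era.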

• Processor micro-architecture

– heterogeneous many-core

• High bandwidth memory access

• On-chip memory and networks

• Processor reliability techniques

• Low power processor design

High performance interconnect

• Interconnects influence system scalability and power consumption

– determine the parallel scale of the exa-scale computer

• With the increase of communication rates, the power consumption of the interconnects is not negligible

– may consume >20% of system energy

• Highly scalable network structures

• Highly reliable communication mechanisms

• High radix interconnect chip design

• Large scale optical interconnect networks

– achieving both high performance and low power consumption

System infrastructure

• Important to system energy consumption, scalability, and reliability

• Efficient cooling techniques

– efficient chip-, board-, and system-level cooling

• Highly efficient power supplies

• High density assembly techniques

Developing leading-class computers

• Develop systems optimized for efficiency, energy consumption, and reliability

• Efficiently support a variety of grand challenge applications

• Adopt the newest technologies in the world, with co-design and innovation in different aspects and at different levels

• Core hardware and software developed in China

• Whether or not to develop an exa-scale computer by 2020 is still an open issue in China; further study is needed.

Task 2: HPC applications

• Strategy

– Developing applications for important areas of social and economic development

– establishing an innovative mechanism for application software development

• Emphasis

– Scientific and engineering computing middleware supporting the development of large scale application software on heterogeneous computers

• parallel programming framework

• parallel algorithm libraries

– Establish several centers for high performance computing application software development

• supporting the entire lifecycle of program development, verification, deployment, dissemination, and services

• form an eco-system for HPC applications

Task 2: HPC Applications

– Develop a set of capability-type applications

• Numerical reactor for nuclear energy development and utilization

• Numerical wind tunnel

• Numerical engines

• Global climate change and earth system modeling

• Big data analysis platform for gene sequencing

• Platform for drug discovery

• High throughput computing platform for material design

• …

– Develop a group of capacity-type applications and deploy them at national supercomputing centers, attracting a large population of faithful users

Task 3: HPC environment

• Strategy

– Establish a national HPC environment with world-class resources and service capabilities

• Emphasis

– In cooperation with the development of supercomputers and HPC applications, upgrade the scale of the environment, enrich the software resources, enhance the service capability

– Establish domain-oriented platforms for solving domain problems

• earth science

• environment protection and disaster monitoring

• …

– HPC cloud platform for industry

• domain application villages

Task 3: HPC environment

– Develop enabling technologies, improve quality of services

• data transfer tools

• visualization tools

• security mechanisms

• application performance analysis tools

– Develop key technologies and platforms for resource management and environment operation, supporting new operation models

– Improve network connection for better performance and user experiences

Supercomputing Innovation Alliance

• The China Supercomputing Innovation Alliance was established in September 2013

• The mission of the alliance is to provide a platform for collaborative innovation and a bridge for connecting industry, research, and user communities

• Five working groups

– technology and standard

– applications

– environment and services

– dissemination and industry promotion

– education and training

• We hope the alliance can play an important role in HPC development in China