
Combining the strengths of UMIST and The Victoria University of Manchester

Matthew Livesey, Hemanth John Jose and Yongping Men

COMP60611 – Future of large-scale Parallel Computing

Key Challenges with Exascale Machines

•Energy: Energy per operation must be low enough that the total power drawn at exascale stays within acceptable bounds (see the worked figures after this list).

•Memory: The cost and bandwidth of memory are such that an exascale machine would not be feasible with current technology.

•Extraordinary concurrency: It is assumed that the exascale will be reached by increasing the number of processing units, not the speed of each processing unit, so we may need billions of processing units. That is roughly 1000 times the total concurrency of today's best efforts, and some applications may simply not be suitable for this level of concurrency.

•Resiliency: As the technology gets smaller and more complicated, it becomes more sensitive to faults arising from noise, temperature and bit errors. In addition, increasing the number of cores increases the amount of synchronisation required.
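
As a rough illustration of the energy and concurrency points above, the sketch below works through the standard back-of-envelope arithmetic. The 20 MW power budget is the target discussed in the DARPA study [3]; the 1 GHz per-unit rate is an assumption chosen for illustration.

# Back-of-envelope exascale arithmetic (Python).
# Assumptions: a 20 MW system power budget, as targeted in [3], and
# roughly 1 GHz of useful operations per processing unit.

EXAFLOP = 1e18          # operations per second
POWER_BUDGET_W = 20e6   # 20 MW system power budget
UNIT_RATE_HZ = 1e9      # assumed ops/s per processing unit

# Maximum energy allowed per operation: power divided by throughput.
joules_per_op = POWER_BUDGET_W / EXAFLOP
print(f"Energy budget per operation: {joules_per_op * 1e12:.0f} pJ")  # 20 pJ

# Processing units needed if per-unit speed stays fixed.
units_needed = EXAFLOP / UNIT_RATE_HZ
print(f"Processing units required: {units_needed:.0e}")               # 1e+09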


Solutions

•3D stacking - By stacking memory dies on top of the processor, we may increase total memory bandwidth and address the memory wall problem.

•Transactional Memory - Processors speculatively update shared memory inside a transaction and commit the updates only on successful completion of the transaction (a sketch of the programming model follows this list).

•Heterogeneous many-core systems - For some problems, a greater speedup may be realised with heterogeneous cores than with uniform cores (see the speedup model after this list).

•Programming models based on psychological research
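
The transactional-memory bullet describes the model behind hardware proposals such as TCC [4]. The Python sketch below is a minimal software illustration of the same idea, not the authors' design; the names (VersionedStore, atomically) are hypothetical, and a real system would enforce this in hardware.

import threading

# Minimal sketch of transactional memory: buffer writes speculatively,
# validate the read set at commit, and retry on conflict.

class VersionedStore:
    def __init__(self):
        self._data = {}        # key -> (version, value)
        self._lock = threading.Lock()

    def read(self, key):
        return self._data.get(key, (0, None))   # (version, value)

    def try_commit(self, reads, writes):
        # Atomically publish writes, but only if no key in the read set
        # has been committed to since it was read (conflict detection).
        with self._lock:
            for key, version in reads.items():
                if self._data.get(key, (0, None))[0] != version:
                    return False                 # conflict: abort and retry
            for key, value in writes.items():
                old_version = self._data.get(key, (0, None))[0]
                self._data[key] = (old_version + 1, value)
            return True

def atomically(store, txn_body):
    # Run txn_body(read, write) speculatively until it commits cleanly.
    while True:
        reads, writes = {}, {}
        def read(key):
            if key in writes:                    # read your own writes
                return writes[key]
            version, value = store.read(key)
            reads.setdefault(key, version)       # remember observed version
            return value
        def write(key, value):
            writes[key] = value                  # buffered, not yet visible
        result = txn_body(read, write)
        if store.try_commit(reads, writes):
            return result

# Usage: a transfer between two accounts either commits entirely or
# retries; concurrent transactions never observe a partial update.
store = VersionedStore()
atomically(store, lambda read, write: (write("a", 100), write("b", 0)))
atomically(store, lambda read, write: (write("a", read("a") - 10),
                                       write("b", read("b") + 10)))
print(store.read("a")[1], store.read("b")[1])    # 90 10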
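
For the heterogeneous many-core point, the sketch below uses an Amdahl's-law-style model of an asymmetric chip, in the spirit of the arguments surveyed in [1]. The perf = sqrt(resources) scaling rule and the resource counts are assumptions chosen for illustration, not figures from the slides.

import math

# Compare a chip of n identical unit cores against an asymmetric chip
# that spends r of its n resource units on one big core. Assumed
# scaling rule: a core built from r units has performance sqrt(r).

def uniform_speedup(f, n):
    # f = parallel fraction of the program, n identical unit cores.
    return 1.0 / ((1.0 - f) + f / n)

def heterogeneous_speedup(f, n, r):
    # One big core (sqrt(r) perf) runs the serial fraction and joins
    # the n - r small cores for the parallel fraction.
    big = math.sqrt(r)
    return 1.0 / ((1.0 - f) / big + f / (big + (n - r)))

f, n = 0.95, 64
print(f"uniform cores:       {uniform_speedup(f, n):.1f}x")            # ~15.4x
print(f"heterogeneous cores: {heterogeneous_speedup(f, n, 16):.1f}x")  # ~32.5x

Under these assumptions the asymmetric chip roughly doubles the speedup, because the big core attacks the serial fraction that otherwise dominates.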


• RAMP

– To provide a general platform for measuring large-scale parallel success, avoiding the years-long wait between HW/SW iterations.

– RAMP design framework (RAMP Blue / RAMP Red / RAMP White)

[Figure: CMU Simics/RAMP simulator. The target, a 16-CPU shared-memory UltraSPARC III server (SunFire 3800) with memory, MMU, DMA, and PCI-attached graphics, NIC, SCSI and terminal devices, is modelled both by Simics running on a PC and on the BEE2 FPGA platform, where a Xilinx XC2VP70 hosts an interleaved pipeline of 16 CPU contexts, DDR2 memory, a PowerPC core, and simulated I/O devices.]



References

[1] Asanovic, K. et al. The Landscape of Parallel Computing Research: A View from Berkeley. EECS Department, University of California, Berkeley, December 18, 2006.

[2] Loh, G. H. 3D-Stacked Memory Architectures for Multi-Core Processors. International Symposium on Computer Architecture, 2008.

[3] Kogge, P. et al. ExaScale Computing Study: Technology Challenges in Achieving Exascale Systems. DARPA, 2008.

[4] Hammond, L. et al. Transactional Memory Coherence and Consistency. Proceedings of the 31st Annual International Symposium on Computer Architecture, 2004.

[5] Wawrzynek, J. et al. RAMP: A Research Accelerator for Multiple Processors. University of California, Berkeley, 2006.
