SPARC64 XIfx: Fujitsu's Next Generation Processor for HPC · Author: FUJITSU...

All Rights Reserved, Copyright© FUJITSU LIMITED 2014 SPARC64 XIfx: Fujitsu’s Next Generation Processor for HPC August 11, 2014 Toshio Yoshida Next Generation Technical Computing Unit Fujitsu Limited


Page 1: SPARC64 XIfx: Fujitsu's Next Generation Processor for HPC · Author: FUJITSU LIMITED · Created Date: 8/13/2014 12:12:40 AM

SPARC64™ XIfx: Fujitsu’s Next Generation Processor for HPC

August 11, 2014

Toshio Yoshida
Next Generation Technical Computing Unit, Fujitsu Limited

All Rights Reserved, Copyright© FUJITSU LIMITED 2014

Page 2: Agenda

• Fujitsu Processor Development
• SPARC64™ XIfx
  – Design Concept and Processor Overview
  – Node Architecture
  – HPC-ACE2: ISA enhancements
  – Microarchitecture
  – Enhanced VISIMPACT and Sector Cache
  – Assistant Core
  – Performance
  – RAS
• Summary

Page 3: Fujitsu Processor Development

[Roadmap figure with a 2011, 2012, 2013, 2014 time axis and three product lines]

• Mainframe
  – GS21 2600
• UNIX Server
  – SPARC64 X: 16 cores, SMT / SWoC, 3GHz
  – SPARC64 X+: 16 cores, SMT / SWoC+, 3.7GHz
• HPC
  – SPARC64 VIIIfx (K computer, 10 Peta scale): 8 cores, HPC-ACE, DIMM, Tofu interconnect
  – SPARC64 IXfx (FX10, 20 Peta scale): 16 cores, HPC-ACE, DIMM, Tofu interconnect
  – SPARC64 XIfx (Post-FX10, 100 Peta scale): 32 cores + 2 assistant cores, HPC-ACE2, HMC, Tofu interconnect 2
  – Next target: EXA scale


Page 5: Design Concept of SPARC64™ XIfx

• Designed for massively parallel supercomputer systems
  – High performance for a wide range of real applications
  – High scalability
  – Low power consumption
  – Groundwork for EXA scale computing
• Enhance and inherit K computer features
  – Stand-alone scalar many-core architecture
  – Enhanced VISIMPACT and Sector cache
  – On-chip integrated Tofu interconnect 2
• Introduce new technologies toward EXA scale
  – Wider SIMD enhancements: HPC-ACE2
  – Leading-edge memory technology: HMC
  – Cores dedicated to non-computation operations: Assistant cores

Page 6: SPARC64™ XIfx Chip Overview

Architecture Features
• 32 computing cores + 2 assistant cores
• HPC-ACE2
• 24 MB L2 cache
• HMC, Tofu2, PCI Gen3

20nm CMOS
• 3,750M transistors
• 1,001 signal pins
• 2.2GHz

Performance (peak)
• 1.1TFlops
• HMC 240GB/s x 2 (in/out)
• Tofu2 125GB/s x 2 (in/out)

[Die photo labels: 32 cores, 2 assistant cores, two L2 cache blocks, four MACs, two HMC interfaces, Tofu2 interface, Tofu2 controller, PCI interface, PCI controller]
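The 1.1TFlops peak figure can be reproduced from the core count and clock. The sketch below is a back-of-the-envelope check, not a Fujitsu formula: it assumes each computing core issues two 256-bit FMA instructions per cycle (the "2x 256-bit SIMD FMAs" of the core pipeline slide), that an FMA counts as two floating-point operations, and that the assistant cores are not counted. The function and parameter names are illustrative, and the single-precision value is derived under the same assumptions rather than quoted from the slides.

```python
def peak_gflops(cores=32, fma_pipes=2, simd_bits=256, elem_bits=64, ghz=2.2):
    """Peak GFLOPS = cores x FMA pipes x SIMD lanes x 2 flops per FMA x clock in GHz."""
    lanes = simd_bits // elem_bits                    # 4 lanes for 64-bit elements
    flops_per_cycle = cores * fma_pipes * lanes * 2   # an FMA counts as 2 flops
    return flops_per_cycle * ghz

dp_peak = peak_gflops()                # double precision, 4 lanes
sp_peak = peak_gflops(elem_bits=32)    # single precision, 8 lanes (derived, not quoted)
print(f"DP peak: {dp_peak:.1f} GFLOPS, SP peak: {sp_peak:.1f} GFLOPS")
```

With the defaults the double-precision result is about 1126 GFLOPS, which rounds to the 1.1TFlops on the slide.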


Page 8: Node Architecture

• Stand-alone scalar many-core with wider SIMD
  – No accelerator
• Non-hierarchical and high bandwidth memory
  – 8x HMCs (32GB, 240GB/s x 2 (in/out))
• Isolation of non-computation operations for jitter reduction
  – 32 computing cores
  – 2 assistant cores
    • Daemon, IO, MPI asynchronous communication, etc.
    • Sector cache is used for the assistant cores to avoid cache pollution
  – Computing cores and assistant cores keep cache coherency
• Single OS manages computing and assistant cores
  – Single OS minimizes memory management overhead


Page 10: HPC-ACE2: ISA enhancements

• Wider SIMD enhancements from K computer / FX10
  – 256-bit wide SIMD (64-bit x 4 / 32-bit x 8)
  – More integer operations
  – Stride load/store
  – Indirect load/store
  – Compress
  – Round
  – Permutation
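The slide names the compress and permutation operations without defining them. The sketch below models them on a 4-lane register using the usual SIMD meaning of the terms; the zero fill of unused lanes, the mask and index encodings, and the function names are assumptions for illustration, not the HPC-ACE2 definitions.

```python
LANES = 4  # 256-bit register holding 64-bit elements

def compress(src, mask):
    """Pack the lanes of src selected by mask toward lane 0; zero-fill the rest (assumed)."""
    kept = [x for x, m in zip(src, mask) if m]
    return kept + [0.0] * (LANES - len(kept))

def permute(src, idx):
    """Result lane k takes its value from source lane idx[k]."""
    return [src[i] for i in idx]

v = [10.0, 20.0, 30.0, 40.0]
packed = compress(v, [True, False, True, False])   # keeps lanes 0 and 2
reversed_v = permute(v, [3, 2, 1, 0])              # reverses the lane order
print(packed, reversed_v)
```

Compress is the operation that lets a loop with an `if` inside keep only the elements that passed the test, which is why it tends to appear next to masked and indirect accesses.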

Page 11: Wider SIMD Extensions

• 256-bit wide SIMD with 128 FPRs
  – 64-bit (DP: Double Precision) x 4 SIMD
  – 32-bit (SP: Single Precision) x 8 SIMD
• DP 3.2x, SP 6.1x faster than SPARC64™ IXfx in basic kernels
  – Improved L1 cache pipelines
  – Higher frequency: 1.848GHz -> 2.2GHz

[Chart: Basic Kernels Performance per Core (y-axis: Normalized Performance). Average: DP 3.23x / SP 6.18x]
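The per-core speedups combine a clock increase with per-cycle gains. The sketch below separates the two using only the numbers on the slide; attributing everything that remains after the clock ratio to "per-cycle" improvements (wider SIMD, L1 cache pipelines, and anything else) is a simplifying assumption, and the variable names are illustrative.

```python
freq_ratio = 2.2 / 1.848   # clock increase stated on the slide
dp_speedup = 3.23          # average speedup, double precision
sp_speedup = 6.18          # average speedup, single precision

# Speedup left over once the clock increase is factored out, i.e. the
# combined per-cycle gain from wider SIMD, the L1 pipelines and other changes.
dp_per_cycle = dp_speedup / freq_ratio
sp_per_cycle = sp_speedup / freq_ratio
print(f"clock: {freq_ratio:.2f}x, DP per cycle: {dp_per_cycle:.2f}x, SP per cycle: {sp_per_cycle:.2f}x")
```

The clock alone accounts for roughly a 1.19x gain, so most of the improvement in these kernels comes from work done per cycle.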

Page 12: Built-in Functions

• Built-in functions accelerated by
  – HPC-ACE2 instructions
    • 256-bit wide SIMD
    • Rounding / Bit manipulation / Exponential auxiliary instructions
  – Microarchitectural enhancements

[Chart: Built-in Functions Performance per Core (y-axis: Normalized Performance). Average: 3.64x. Annotations: Rounding, Exponential, Bit manipulation, 256-bit wide SIMD, L1 cache improvement]

Page 13: Stride Load/Store Instructions

• Stride access is frequently used in various HPC apps.
  – Supports stride widths from 2 to 7 elements
• 3.6x faster than SPARC64™ IXfx

E.g. Stride load @ stride width = 3

    lddst,s [%l0]@stride 3, %f0

[Figure: elements i[0] to i[3] are read from memory rows labeled %l0+0, +32 and +64 and packed into register %f0]

[Chart: Stride load Performance (y-axis: Normalized Performance). Bars labeled 3.67x and 3.63x]
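The effect of a stride load can be modeled in a few lines. This is a sketch of the access pattern only: it assumes the stride is counted in elements, that a register holds 4 double-precision lanes, and that memory can be treated as a flat list of elements. The function name and arguments are hypothetical.

```python
def stride_load(memory, base, stride, lanes=4):
    """Gather `lanes` elements starting at index `base`, `stride` elements apart."""
    if not 2 <= stride <= 7:
        raise ValueError("the slide lists stride widths from 2 to 7 elements")
    return [memory[base + k * stride] for k in range(lanes)]

memory = list(range(100, 116))              # 16 consecutive elements
f0 = stride_load(memory, base=0, stride=3)  # picks elements 0, 3, 6 and 9
print(f0)
```

A typical use is reading one field out of an array of 3-component records (for example the x coordinate of xyz triples) with a single instruction instead of four scalar loads.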

Page 14: Indirect Load/Store Instructions

• Indirect load and store instructions for list accesses
  – List accesses appear in a wide range of HPC apps.
• More than 1.6x faster than SPARC64™ IXfx

E.g. Indirect load

    lddid,s [%f0], %f2

[Figure: register %f0 holds i[0] to i[3], which select the elements A, B, C and D in memory; these are gathered into register %f2]

[Chart: Indirect Load/Store Performance (y-axis: Normalized Performance). Bars labeled 1.67x and 1.92x]
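Indirect load and store correspond to what is commonly called gather and scatter. The sketch below shows the access pattern on a flat list; the index values are made up for the example, the behavior on duplicate indices is an assumption, and the function names are hypothetical.

```python
def indirect_load(memory, indices):
    """Gather memory[i] for each index i held in the index register."""
    return [memory[i] for i in indices]

def indirect_store(memory, indices, values):
    """Scatter each value to memory[i]; on duplicate indices the later lane wins (assumed)."""
    for i, v in zip(indices, values):
        memory[i] = v

memory = ["A", "D", "B", "C"]     # memory order as drawn on the slide
f0 = [0, 2, 3, 1]                 # hypothetical index values
f2 = indirect_load(memory, f0)    # gathers A, B, C, D in lane order

restored = [None] * 4
indirect_store(restored, f0, f2)  # scattering with the same indices undoes the gather
print(f2, restored)
```

This is the pattern behind list-driven loops such as `y[k] = x[idx[k]]`, which appear in sparse matrix and unstructured mesh codes.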


Page 16: SPARC64™ XIfx Core Pipeline

[Pipeline block diagram]
• Stages: Fetch, Issue, Dispatch, Reg-Read, Execute, Cache and Memory, Commit
• Front end: L1 I$ 64KB 4-way, Branch Target Address, Pattern History Table, Local Pattern Table, Decode & Issue
• Reservation stations: RSE, RSA, RSF, RSBR
• Registers and buffers: GPR (188 registers), GUB, FPR (128 x 4 registers), FUB
• Execution units: EXA, EXB, EXC, EXD, EAGA, EAGB, FLA, FLB
• Cache and memory: Fetch Port, Store Port, Write Buffer, L1 D$ 64KB 4-way, L2$, MAC (HMC Controller), HMC
• Commit: CSE, PC, Control Registers
• Chip level (34 cores): PCI controller (PCI-GEN3), Tofu2 controller (CPU-CPU I/F)

• 2x 256-bit SIMD FMAs + 4x ALUs (shared with 2 AGENs)
• 2x 256-bit SIMD LOADs or 1x 256-bit SIMD STORE
• Fundamental pipelines are based on SPARC64™ X+
  – Superscalar, Out-of-Order, branch prediction, etc.
• No multithreading

Page 17: Many-Core Architecture

• SPARC64™ XIfx has 2 CMGs (Core Memory Groups)
  – A CMG consists of 17 cores, L2 cache and 2 memory controllers (MAC)
  – The two CMGs keep cache coherency by ccNUMA with an on-chip directory
• 32GB memory capacity
• Binding a process to a CMG is recommended

[CPU block diagram: CMG0 and CMG1, each with 17 cores (computing cores CC 0 to CC 15 plus one assistant core AC), a 12MB 24-way L2 cache, and MACs connected to HMCs; the PCI controller and Tofu2 controller sit beside the CMGs]
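The CMG organization can be written down as a small data model, which also makes the chip totals easy to cross-check. This is a sketch only: the class, its field names and the core numbering used by `cmg_of` (ids assigned CMG by CMG) are hypothetical and are not taken from the slides.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CMG:
    """One Core Memory Group as described on the slide."""
    computing_cores: int = 16
    assistant_cores: int = 1
    l2_mb: int = 12
    l2_ways: int = 24

    @property
    def cores(self):
        return self.computing_cores + self.assistant_cores

    @property
    def l2_kb_per_way(self):
        return self.l2_mb * 1024 // self.l2_ways

chip = [CMG(), CMG()]
total_cores = sum(c.cores for c in chip)    # 34 cores on the chip
total_l2_mb = sum(c.l2_mb for c in chip)    # 24 MB of L2 cache on the chip

def cmg_of(core_id, cores_per_cmg=17):
    """Map a chip-wide core id to its CMG, assuming ids are numbered CMG by CMG."""
    return core_id // cores_per_cmg

print(total_cores, total_l2_mb, chip[0].l2_kb_per_way)
```

Because each CMG has its own L2 cache and memory controllers, keeping all threads of a process inside one CMG avoids cross-CMG coherency traffic, which is the reason for the binding recommendation.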

Page 18

• High bandwidth cache, memory and Tofu2
  – 2x cache bandwidth per core
    • Compared to SPARC64™ IXfx
  – 8x HMC
    • 15 Gbps
    • 16 lanes
    • 8 ports
  – Tofu2
    • 25 Gbps
    • 4 lanes
    • 10 ports

[Bandwidth diagram: CMG0 and CMG1, 34 cores, HMCs on both sides]
• Performance: 1.1TFLOPS (256-bit SIMD FMA x2 per core)
• L1 cache: 4.4TB/s
• L2 cache: 2.2TB/s
• Memory: 240GB/s x 2 (in/out), drawn as four 120GB/s links to the HMCs
• Tofu2: 125GB/s x 2 (in/out), drawn as 12.5GB/s in each direction per port, x10 ports
• Per-core link labels: 140.8GB/s, 70.4GB/s and 35.2GB/s at the L1 cache; 70.4GB/s to each L2 cache
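The link figures follow from the lane rates listed on the slide. The sketch below multiplies them out, ignoring any encoding or protocol overhead. Two points are inferences rather than statements from the slides: that 140.8GB/s corresponds to two 32-byte SIMD loads per cycle at 2.2GHz, and that the TB/s totals count the 32 computing cores with 1 TB taken as 1024 GB. The names are illustrative.

```python
def link_gbytes_per_s(gbps_per_lane, lanes, ports):
    """One-direction bandwidth in GB/s from the per-lane signalling rate."""
    return gbps_per_lane * lanes * ports / 8   # 8 bits per byte

hmc = link_gbytes_per_s(15, 16, 8)         # 240.0 GB/s each way
tofu2 = link_gbytes_per_s(25, 4, 10)       # 125.0 GB/s each way
tofu2_port = link_gbytes_per_s(25, 4, 1)   # 12.5 GB/s per port each way

GHZ, CORES = 2.2, 32
l1_per_core = 2 * 32 * GHZ                 # two 32-byte loads per cycle, in GB/s
l1_total_tb = l1_per_core * CORES / 1024   # chip-wide L1 bandwidth in TB/s
l2_total_tb = 70.4 * CORES / 1024          # chip-wide L2 bandwidth in TB/s
print(hmc, tofu2, tofu2_port, round(l1_per_core, 1), round(l1_total_tb, 2), round(l2_total_tb, 2))
```

Under these readings every number in the diagram is reproduced: 240 and 125 GB/s for the links, 140.8 GB/s per core, and 4.4 and 2.2 TB/s for the caches.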


Page 20: Enhanced VISIMPACT

• Advantages of hybrid parallelization
  – Reduces communication cost in highly parallel programs
  – Increases user memory space by reducing communication buffers
• VISIMPACT* (introduced in FX1)
  – Automatic parallelization technology by Fujitsu’s compiler
  – Hardware barrier for fast synchronization
    • Enabling 8 sets of hardware barriers between 32 cores
  – The optimum combination of # Threads and # Processes depends on the application
  – Any combination of T (Threads) and P (Processes) is supported
    • 32 T x 1 P, 16 T x 2 P, 8 T x 4 P, etc.
  – The goal is heterogeneous hybrid parallelization for load imbalance and multi-physics

[Figure labels: MPI com, MPI com]

*Virtual Single Processor by Integrated Multi-core Parallel Architecture
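The thread and process combinations can be enumerated mechanically. The sketch below lists only the homogeneous layouts that fill all 32 computing cores with equally sized processes; the slide says any combination is supported, so this is an illustrative subset, and the function name is hypothetical.

```python
def hybrid_layouts(cores=32):
    """Return (threads, processes) pairs with threads * processes == cores."""
    return [(cores // p, p) for p in range(1, cores + 1) if cores % p == 0]

for threads, procs in hybrid_layouts():
    print(f"{threads:2d} T x {procs:2d} P")
```

The first three pairs are the ones named on the slide; the remaining ones run down to 1 thread per process, which is the flat MPI case.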

Page 21: Effect of VISIMPACT

• Lower memory usage
  – By reducing communication buffers for MPI
• Higher performance
  – By reducing MPI communication cost

[Chart: Memory usage and Performance of #Threads x #Processes. Series: Normalized Memory Usage, Normalized Performance. Annotations: Higher Performance, Lower Usage]

Page 22: Enhanced Sector Cache

• Sector Cache (introduced in K computer)
  – On a cache miss, the line to replace is chosen so that each sector keeps its specified size
• Like ‘Local Memory’
  – Keeps reusable data in the cache by dividing the cache into segments
• Unlike ‘Local Memory’
  – No need for a dedicated address
  – No penalty to save and restore in context switch
• SPARC64™ XIfx supports 4 sectors each in the L1 cache (per core) and the L2 cache (per CMG)
  – More usable than SPARC64™ IXfx, which has 2 sectors each in L1 and L2
  – Each sector size can be specified separately

[Figure: L2 cache divided into Sector 0 to Sector 3. One sector holds instruction fetch, normal data and streaming data; the other three hold reusable data 1, 2 and 3]
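The isolation that a sector cache provides can be illustrated with a toy model of one cache set. The sketch below is not the hardware algorithm: it assumes fixed per-sector way quotas, LRU replacement inside each sector, and that the sector is supplied by software with each access. The class and method names are hypothetical.

```python
from collections import OrderedDict

class SectorCacheSet:
    """One cache set whose ways are split between sectors by fixed quotas."""

    def __init__(self, quotas):
        self.quotas = quotas                            # ways allowed per sector
        self.lines = [OrderedDict() for _ in quotas]    # per-sector LRU order

    def access(self, tag, sector):
        """Return True on a hit. On a miss, fill and evict only within `sector`."""
        for lru in self.lines:
            if tag in lru:
                lru.move_to_end(tag)                    # mark as most recently used
                return True
        lru = self.lines[sector]
        if len(lru) >= self.quotas[sector]:
            lru.popitem(last=False)                     # evict this sector's LRU line
        lru[tag] = None
        return False

cache = SectorCacheSet(quotas=[2, 2])     # a 4-way set, 2 ways per sector
for tag in ("a", "b"):
    cache.access(tag, sector=1)           # reusable data goes to sector 1
for tag in range(100):
    cache.access(tag, sector=0)           # streaming data goes to sector 0
reusable_hits = [cache.access(tag, sector=1) for tag in ("a", "b")]
print(reusable_hits)
```

In a plain 4-way LRU set the 100 streaming lines would have evicted "a" and "b". With the quota in place the stream only recycles its own two ways, so the reusable lines still hit.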


Page 24: Assistant core

• Assistant cores serve daemons, IO and MPI asynchronous communication instead of computation
  – Each CMG has an assistant core allocated as its 17th core
  – Sector cache within the L2 cache allocates one sector to the assistant core to avoid cache pollution
• Minimizes performance degradation in large systems by reducing jitter

[Chart: Perf degradation ratio by jitter (model). Annotation: Performance improvement]

[CPU block diagram: CMG0 and CMG1, each with 17 cores (computing cores CC 0 to CC 15 plus one assistant core AC), a 12MB 24-way L2 cache, and MACs connected to HMCs; the PCI controller and Tofu2 controller sit beside the CMGs]


Page 26: Performance

• SPARC64™ XIfx boosts performance through ISA and microarchitectural enhancements
  – 97% execution efficiency for DGEMM
    • Sector cache realizes the same effect as a 2.5x larger L1 cache
  – 1.7x faster per core than SPARC64™ IXfx in real HPC applications such as fluid dynamics

[Chart: Real HPC Applications Performance per Core (y-axis: Normalized Performance). 40% up by uArch: Memory BW (HMC), Out-of-Order. 33% up by ISA: Indirect access & Integer SIMD, 256-bit SIMD]
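The headline numbers on this slide can be related to each other with simple arithmetic. The sketch below makes two assumptions that the slide does not state: that the per-core peak is 2 FMA pipes x 4 double-precision lanes x 2 flops at 2.2GHz, and that the chart's "40% up" and "33% up" are both measured against the SPARC64 IXfx baseline and therefore add. The variable names are illustrative.

```python
per_core_peak = 2 * 4 * 2 * 2.2          # FMA pipes x DP lanes x flops per FMA x GHz
dgemm_per_core = 0.97 * per_core_peak    # 97% execution efficiency, in GFLOPS

baseline = 1.0                           # SPARC64 IXfx, normalized
additive = baseline + 0.40 + 0.33        # gains stacked on the same baseline
multiplicative = 1.40 * 1.33             # alternative reading: gains compound
print(f"DGEMM: {dgemm_per_core:.1f} of {per_core_peak:.1f} GFLOPS per core")
print(f"additive: {additive:.2f}x, multiplicative: {multiplicative:.2f}x")
```

The additive reading gives about 1.73x, which matches the 1.7x quoted on the slide more closely than the compounded value of about 1.86x.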


Page 28: Reliability, Availability, Serviceability

• HPC systems require extensive RAS capability in the CPU and interconnect
• SPARC64™ XIfx inherits mainframe-level RAS features
  – Number of checkers in the CPU increased to ~92,900
  – Tofu2 buses support self-recovery and lane dynamic degradation

Units           Error Detection and Correction
Cache (Tags)    ECC, Parity & Duplicate
Cache (Data)    ECC, Parity
Registers       ECC (INT/FP), Parity (Others)
ALUs            Parity, Residue

Other RAS features
• Cache dynamic degradation
• Hardware Instruction Retry
• Lane dynamic degradation for Tofu2

[SPARC64™ XIfx RAS diagram. Legend: Green: 1-bit error Correctable; Yellow: 1-bit error Detectable; Gray: 1-bit error Harmless]
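The table lists parity and residue checking for the ALUs. The sketch below shows how the two techniques work in general, using integer addition as the example. It is a textbook illustration, not Fujitsu's circuit: the modulus 3, the fault injection argument and the function names are all assumptions.

```python
MOD = 3  # hypothetical check modulus; the slide does not state the one used

def checked_add(a, b, fault_mask=0):
    """Add two non-negative integers and verify the sum with a mod-3 residue check.

    fault_mask flips bits of the result to imitate a hardware fault.
    """
    result = (a + b) ^ fault_mask
    expected_residue = (a % MOD + b % MOD) % MOD   # computed by the independent checker
    ok = result % MOD == expected_residue
    return result, ok

def parity(word):
    """Parity bit of a word: 1 when the number of set bits is odd."""
    return bin(word).count("1") % 2

print(checked_add(1234, 5678))                 # fault-free addition passes the check
print(checked_add(1234, 5678, fault_mask=4))   # a flipped bit 2 fails the check
```

A single flipped bit changes the result by a power of two, which is never a multiple of 3, so a mod-3 residue check detects every single-bit error in the sum.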


Page 30: Summary

• SPARC64™ XIfx is Fujitsu’s latest SPARC processor, designed for massively parallel supercomputing systems
• Enhance and inherit K computer features
  – Stand-alone scalar many-core architecture
  – VISIMPACT and Sector Cache
  – On-chip integrated Tofu2
• Introduce new technologies toward EXA scale
  – HPC-ACE2
  – HMC
  – Assistant cores
• SPARC64™ XIfx has significantly improved the performance of real HPC applications
• As a next step, Fujitsu goes forward to EXA scale supercomputing

Page 31: Abbreviations

• SPARC64™ XIfx
  – RSA: Reservation Station for Address generation
  – RSE: Reservation Station for Execution
  – RSF: Reservation Station for Floating-point
  – RSBR: Reservation Station for Branch
  – GUB: General-purpose Update Buffer
  – FUB: Floating-point Update Buffer
  – GPR: General-Purpose Register
  – FPR: Floating-Point Register
  – CSE: Commit Stack Entry
  – EAG: Effective Address Generator
  – EX: Execution unit (Integer)
  – FL: Floating-point unit
  – HPC-ACE: High Performance Computing-Arithmetic Computational Extensions
  – HMC: Hybrid Memory Cube
  – Tofu: Torus-Fusion