Keynote (Mike Muller) - Is There Anything New in Heterogeneous Computing - by Mike Muller, Chief...

Keynote presentation, Is There Anything New in Heterogeneous Computing, by Mike Muller, Chief Technology Officer, ARM, at the AMD Developer Summit (APU13), Nov. 11-13, 2013.

Page 1: Keynote (Mike Muller) - Is There Anything New in Heterogeneous Computing - by Mike Muller, Chief Technology Officer, ARM

Mike Muller CTO

Is there anything new in heterogeneous computing?

Page 2:

Evolution

[Figure: timeline of computing from 1960 to 2020, with tracks for Embedded, PC, Mobile Computing, Consumer Smart Appliances, Cloud Server, IOT and Wearable Intelligence; year markers 77, 82, 89, 93, 97, 07, 10 and 13]

Page 3:

What’s the Innovation?

MEMS

CCD

Wireless 3G

Semiconductor Process?

Media GPS

Social Media?

Page 4:

Mobility Trends: CMOS

[Figure: log-scale plot of carrier mobility in cm2/(V·s), from 10 to 10,000, over 1990 to 2025, with NMOS and PMOS curves and process nodes 14nm, 10nm, 7nm, 5nm and 3.5nm. Technology roadmap rows: Patterning (LELE, SADP, LELELE, EUV, SAQP, EUV LELE, EUV + DWEB, EUV + DSA); Switches (Planar CMOS, HKMG, Strain, FinFET, HNW, III-V, Ge, VNW, 2D: C, MoS2, spintronics, NEMS); Interconnect (Al wires, Cu wires, 3DIC, Opto I/O, Opto int, Seq. 3D, Graphene wire, CNT via)]

Page 5:

Printing: Moore’s Law and Ink Jets

[Figure: log-scale plot over 1980 to 2020, axis from 1E-3 to 1E11, of ink-jet Drops/Second and 1/Size (pL-1); annotations: 10 nozzles, 10,000 nozzles, 100’s microns, 10’s microns]

Page 6:

Printing and Imprinting Thin Film Transistors (TFT)

Can be transparent, bio-degradable and even ingestible

Unit cost 1000x less than mainstream CMOS: CMOS @ $40,000/m2 vs. TFT @ $10/m2

Printing

  CAPEX can be less than $1,000
  350dpi = 200um @ 20 m/s
  Can print batteries, antenna
  Mainly organic at ~20 volts

Imprint

  CAPEX: a $2M DVD press is high volume
  Better controllability hence higher density and performance
  1um today, scale to 50nm features as used today for BluRay discs
  Mainly inorganic, NMOS only, at ~2 volts

Page 7:

Mobility Trends: CMOS & Thin Film Transistors

[Figure: log-scale plot of carrier mobility in cm2/(V·s), from 0.00001 to 10000, over 1990 to 2025; series: Conventional NMOS, Conventional PMOS, TFT; CPU annotations: ARM1 (3µ, 6MHz) and Cortex-M0 (2µ, 20kHz)]

Page 8:

Top Right and Bottom Left

Page 9:

Is There Anything New in Heterogeneous Computing?

1998 Manual Partitioning: C & Assembler (ARM + DSP)

2013 Manual Partitioning: C++ & OpenCL/RenderScript (ARM + GPU)

                       Vector Add   Reduction   Matrix Mul
GPU OpenCL on GPU      1.00         1.00        1.00
GPU OpenCL on FPGA     0.14         0.02        0.89
FPGA OpenCL on FPGA    1.71         1.62        31.85

Page 10:

How Do People Program?

Simple, old-school ray tracer

Start with C++ code and accelerate the code with Heterogeneous Systems

    // Serial version: trace one ray per pixel, row by row.
    void traceScreen() {
        for (int y = 0; y < height; ++y) {
            for (int x = 0; x < width; ++x) {
                Ray ray = generateRay(x, y);
                IntersectableObject *obj = traceRay(ray);
                framebuffer[y][x] = colorPixelForObject(obj);
            }
        }
    }

    // Parallel version: the same loop body, handed to par_for_2D as a lambda.
    void traceScreen() {
        par_for_2D(height, width, [&](int y, int x) {
            Ray ray = generateRay(x, y);
            IntersectableObject *obj = traceRay(ray);
            framebuffer[y][x] = colorPixelForObject(obj);
        });
    }
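The slide calls par_for_2D without defining it. As a minimal sketch only, one way such a helper could look on a multicore CPU is shown below. The OpenMP pragma, the by-value `Body` parameter and the `fillIndices` demo function are assumptions made for this example and do not come from the talk. If the code is compiled without OpenMP support, the pragma is ignored and the loops simply run serially.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: the slide uses par_for_2D but does not show it.
// Calls body(y, x) once for every 0 <= y < height and 0 <= x < width.
// Iterations may run concurrently, so they must not depend on each other.
template <typename Body>
void par_for_2D(int height, int width, Body body) {
    assert(height >= 0 && width >= 0);
    // Assumption: OpenMP is the parallel back end. collapse(2) lets the
    // runtime distribute the whole 2D iteration space across threads.
    #pragma omp parallel for collapse(2)
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            body(y, x);
        }
    }
}

// Demo (hypothetical): fill a row-major "framebuffer" in which every
// pixel stores its own linear index. Each iteration writes one distinct
// element, so there are no data races.
std::vector<int> fillIndices(int height, int width) {
    std::vector<int> framebuffer(static_cast<std::size_t>(height) * width);
    par_for_2D(height, width, [&](int y, int x) {
        framebuffer[static_cast<std::size_t>(y) * width + x] = y * width + x;
    });
    return framebuffer;
}
```

The lambda keeps the loop body textually identical to the serial version, which is the point the slide makes: only the loop construct changes, not the per-pixel code.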

[Figure: programmer populations; labels: Mobile Web, Embedded ~200k, Desktop, ~20M Programmers]

Page 11:

Moving the Code onto OpenCL 1.x

Need to make the following changes

a) Get rid of all the pointers, both in scene vector and internally in CSGObject

b) Rewrite the use of std::vector, as OpenCL C does not understand C++ data type internals

c) Get rid of the virtual function calls

d) Change the classes to structs

e) Get rid of recursion in CSGObject

f) Avoid accessing the global scene variable in accelerated code

g) Port the code base to OpenCL C
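To make items a), c), d) and e) concrete, here is a minimal sketch of what such a rewrite might look like for a CSG tree. The talk does not show CSGObject, so everything here is hypothetical: the `CSGNode` layout, the `NodeKind` tags, the one-dimensional interval used as a toy primitive, and the rule that children are stored before their parent. It is written as plain C++ for readability rather than as OpenCL C.

```cpp
#include <cassert>

// c) A tag replaces the virtual function call.
enum class NodeKind { Leaf, Union, Intersection, Difference };

// d) A plain struct instead of a class hierarchy.
// a) Children are array indices, not pointers.
struct CSGNode {
    NodeKind kind;
    int left;      // index of left child, -1 for a leaf
    int right;     // index of right child, -1 for a leaf
    float center;  // leaf only: centre of a 1-D interval (toy primitive)
    float radius;  // leaf only: half-width of that interval
};

constexpr int kMaxNodes = 64;  // assumed fixed upper bound, no dynamic allocation

// e) No recursion: nodes are assumed to be stored children-first, with
// the root last, so one forward pass evaluates every node after its
// children. Returns whether point p lies inside the root object.
bool contains(const CSGNode *nodes, int count, float p) {
    assert(count > 0 && count <= kMaxNodes);
    bool inside[kMaxNodes] = {};
    for (int i = 0; i < count; ++i) {
        const CSGNode n = nodes[i];
        switch (n.kind) {
        case NodeKind::Leaf:
            inside[i] = p >= n.center - n.radius && p <= n.center + n.radius;
            break;
        case NodeKind::Union:
            inside[i] = inside[n.left] || inside[n.right];
            break;
        case NodeKind::Intersection:
            inside[i] = inside[n.left] && inside[n.right];
            break;
        case NodeKind::Difference:
            inside[i] = inside[n.left] && !inside[n.right];
            break;
        }
    }
    return inside[count - 1];
}
```

For example, a scene made of the interval [-2, 2] minus the interval [0.5, 1.5] takes three nodes: two leaves followed by a Difference node that refers to them by index. Even in this toy form, the restructuring touches the data layout, the dispatch mechanism and the traversal all at once.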

Page 12:

Moving the Code onto OpenCL 2

Need to make the following changes

a) Get rid of all the pointers, both in scene vector and internally in CSGObject

b) Rewrite the use of std::vector, as OpenCL C does not understand C++ data type internals

c) Get rid of the virtual function calls

d) Change the classes to structs

e) Get rid of recursion in CSGObject

f) Avoid accessing the global scene variable in accelerated code

g) Port the code base to OpenCL C

OpenCL 2 solves point a) with shared address space, but not the rest

Page 13:

Moving the Code onto C++ AMP

Need to make the following changes

a) Get rid of all the pointers, both in scene vector and internally in CSGObject

b) Rewrite the use of std::vector, as C++ AMP cannot call into C++ standard library

c) Get rid of the virtual function calls

d) Change the classes to structs

e) Get rid of recursion in CSGObject

f) Avoid accessing the global scene variable in accelerated code

g) Port the code base to OpenCL C

C++ AMP solves points d), f) and g), but not the rest

Page 14:

Moving the Code onto HSA

Need to make the following changes

a) Get rid of all the pointers, both in scene vector and internally in CSGObject

b) Rewrite the use of std::vector, as HSAIL does not understand C++ data type internals

c) Get rid of the virtual function calls

d) Change the classes to structs

e) Get rid of recursion in CSGObject

f) Avoid accessing the global scene variable in accelerated code

g) Port the code base to a language on top of HSAIL

HSA solves points a), c), d), e) and soon f)

Page 15:

What Makes GPUs Good For Power Efficient Compute?

Relaxed single-threaded performance

  No dynamic scheduling
  No branch prediction
  No register renaming, no result forwarding
  Longer pipelines
  Lower clock frequencies

Multi-threading

  Tolerate long latencies to memory

Increasing the ALU/control ratio

  Short-vectors exposed to programmers
  SIMT/Warp/VLIW/Wavefront based execution
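As a rough illustration of the SIMT/warp style of execution, the sketch below simulates on the CPU how one instruction stream can serve several lanes. It covers how a branch is handled: both sides are issued in turn and lanes are masked off, instead of predicting the branch per thread. The warp width of 4, the function name `simtAbsOrDouble` and the pass counter are assumptions made for this example. Real hardware differs in many details.

```cpp
#include <array>
#include <cassert>

constexpr int kLanes = 4;  // assumed warp width, chosen only for illustration
using Warp = std::array<int, kLanes>;

// Computes, per lane, (x < 0 ? -x : 2 * x) the way a SIMT machine might:
// one shared control flow, each branch side issued once for the whole
// warp, and only the lanes whose mask bit matches allowed to write.
// issuedPasses reports how many branch sides were actually issued.
Warp simtAbsOrDouble(const Warp &x, int &issuedPasses) {
    std::array<bool, kLanes> takesThen{};
    bool anyThen = false;
    bool anyElse = false;
    for (int lane = 0; lane < kLanes; ++lane) {
        takesThen[lane] = x[lane] < 0;
        anyThen = anyThen || takesThen[lane];
        anyElse = anyElse || !takesThen[lane];
    }

    Warp out{};
    issuedPasses = 0;
    if (anyThen) {  // "then" side: skipped entirely if no lane needs it
        ++issuedPasses;
        for (int lane = 0; lane < kLanes; ++lane)
            if (takesThen[lane]) out[lane] = -x[lane];
    }
    if (anyElse) {  // "else" side, issued with the inverted mask
        ++issuedPasses;
        for (int lane = 0; lane < kLanes; ++lane)
            if (!takesThen[lane]) out[lane] = 2 * x[lane];
    }
    assert(issuedPasses >= 1);
    return out;
}
```

When all lanes agree on the branch, only one pass is issued. When they diverge, both passes run and some lanes sit idle in each. Control logic is shared across the lanes, which is one way to read the "ALU/control ratio" point in the list.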

Page 16:

Heterogeneous Compute, Homogeneous Architecture

How about a SIMTish ARM?

  Familiar programming model, C++ and OpenMP
  Fewer seams
  Sharing data structures and function pointers/vtables

[Figure: block diagram labelled big, LITTLE and Throughput, showing a SIMT Queue, Integer Pipe, FP Pipe, Load/Store Pipe and Write stage; marked RESEARCH]

Page 17:

Moving the Code onto a Warped ARM

Need to make the following changes

Get rid of all the pointers, both in scene vector and internally in CSGObject

Rewrite the use of std::vector, as OpenCL C does not understand C++ data type internals

Get rid of the virtual function calls

Change the classes to structs

Get rid of recursion in CSGObject

Avoid accessing the global scene variable in accelerated code

Port the code base to OpenCL C

Page 18:

Performance vs Effort

We’ve implemented SGEMM, a matrix-matrix multiplication benchmark, in various ways, to investigate the tradeoff between programmer effort and performance payoff.

SGEMM version                                 Speedup   Effort
ARM in C                                      1x        Low
ARM in C with NEON intrinsics, prefetching    15x       Medium - High
ARM in assembly with NEON, prefetching        26x       High
SIMTish ARM in C                              35x       Low
SIMTish ARM in C, unrolled                    44x       Low - Medium
Mali GPU x 4 way                              136x      High
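For orientation, a low-effort baseline of the kind the "ARM in C" row describes might look like the sketch below. This is not the benchmark code from the talk, which is not shown. The function name `sgemmNaive`, the row-major layout and the simplification to C = A * B (without the alpha and beta scaling of a full SGEMM) are assumptions for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Single-precision matrix multiply, C = A * B, all matrices row-major.
// A is m x k, B is k x n, and the returned C is m x n.
// Simplified: a full SGEMM computes C = alpha * A * B + beta * C.
std::vector<float> sgemmNaive(const std::vector<float> &a,
                              const std::vector<float> &b,
                              std::size_t m, std::size_t k, std::size_t n) {
    assert(a.size() == m * k && b.size() == k * n);
    std::vector<float> c(m * n, 0.0f);
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            float sum = 0.0f;  // dot product of row i of A and column j of B
            for (std::size_t p = 0; p < k; ++p) {
                sum += a[i * k + p] * b[p * n + j];
            }
            c[i * n + j] = sum;
        }
    }
    return c;
}
```

The faster rows in the table keep this arithmetic and change how it is expressed: vector intrinsics, prefetching, unrolling, or a port to another device, each at a different cost in programmer effort.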

Page 19:

Scale Needs Standards

Page 20:

IPv4

IPv6 Sonosnet

Works for geeks…

No proper orchestration

Battle for the apps platform

Needs home IT support

Or only single manufacturer

Imagine that there were 1,000 of these connected devices…

Page 21:

Functional Becomes the Internet of things

Functional Little Data

Page 22:

Mike

Life Insurance

Gym

Car Insurance

My Data

Their Data

Rob Curtis Haymakers Cambridge

Picture by Keith Jones

Page 23:

Sharing Needs Trust

Page 24:

IOT Medical Devices

First implantable pacemaker: 1958

Can a pacemaker be hacked to kill? Or is that just a plot line in a US TV series?

RF interface for adjusting settings

First hacked in 2008

  “Sustained effort by a team of specialists” – The New York Times
  Range: a few cm

Today

  MIT grad students
  One weekend
  Range: 50 feet

Page 25:

Trust Needs Security

Page 26:

It’s a Heterogeneous Future

[Figure: chart with axis labels Reach, Today and The future; stages shown: Fixed Telephony Networks, Mobile Telephony, Internet / broadband, Mobile internet, Sensors & Actuators Networks, Applications, SaaS, M2M, Open Data and Objects, Smart Everything]

Scale Needs Standards

Sharing Needs Trust

Trust Needs Security