Heterogeneous Systems Architecture: The Next Area of Computing Innovation

HETEROGENEOUS SYSTEMS ARCHITECTURE: THE NEXT AREA OF COMPUTING INNOVATION CASE STUDY: THE HOLODECK Dr. Lisa Su Senior Vice President and GM, Global Business Units, AMD ISSCC Conference February 18, 2013

Dr. Lisa Su, Senior Vice President and GM, Global Business Units, AMD keynote from ISSCC on Heterogeneous Systems Architecture: The Next Area of Computing Innovation - Case Study, The Holodeck.

Transcript of Heterogeneous Systems Architecture: The Next Area of Computing Innovation

Page 1: Heterogeneous Systems Architecture: The Next Area of Computing Innovation

HETEROGENEOUS SYSTEMS ARCHITECTURE:

THE NEXT AREA OF COMPUTING INNOVATION

CASE STUDY: THE HOLODECK

Dr. Lisa Su Senior Vice President and GM, Global Business Units,

AMD

ISSCC Conference

February 18, 2013

Page 2: Heterogeneous Systems Architecture: The Next Area of Computing Innovation

2 | ISSCC Keynote | February 18th, 2013

CHALLENGES TO MOORE’S LAW SCALING

Lithography challenges begin severely limiting area scaling at 20nm node

– Fewer 1X metals due to cost

– Less aggressive feature scaling due to lithography challenges

Compounded by rapidly increasing lithography costs

– 28nm to 20nm transition is inflection point with dual exposure

– No cost / transistor crossover for first time at 28nm to 20nm transition

[Figure: Area Scaling by Technology Generation. Normalized area (0.0 to 1.0) by node: 45nm, 40nm, 32nm, 28nm, 20nm, 20FinFET]

[Figure: Cost Per Transistor Scaling. Normalized cost/transistor (0.0 to 1.0) by node: 45nm, 40nm, 32nm, 28nm, 20nm, 20FinFET]

Page 3: Heterogeneous Systems Architecture: The Next Area of Computing Innovation

A PARADIGM SHIFT…

[Figure: CPU microprocessor advancement (Single-Core Era, Multi-Core Era, Heterogeneous Systems Era) and GPU advancement along a programmability axis (graphics driver-based programs, OpenCL/DX driver-based programs, high-level programmable). Other labels: Throughput Performance Accelerator, Homogeneous Computing, Heterogeneous Computing]

Page 4: Heterogeneous Systems Architecture: The Next Area of Computing Innovation

HETEROGENEOUS SYSTEMS ARCHITECTURE MEMORY MODEL

From 32 bit (yesterday) to 64 bit (today)

Page 5: Heterogeneous Systems Architecture: The Next Area of Computing Innovation

ARCHITECTURES – A HISTORICAL PERSPECTIVE

[Timeline figure, 1981 through the 1990s, 2000s, and 2010s. Eras: Legacy Processing Era, Surround Computing Era. Platform labels: Traditionally Optimized Platforms, Heterogeneous Architectures. Processor labels: Single Core CPUs, Multi-Core CPUs/GPUs, APUs and legacy SOC]

Page 6: Heterogeneous Systems Architecture: The Next Area of Computing Innovation

CHANGING THE THINKING, CHANGING THE GAME

HSA is designed to make the GPU hardware directly accessible to the software, using the high-level languages programmers already use on the CPU

C, C++, Java, Python…even JavaScript, HTML5

ISA agnostic – e.g., x86, 64-bit ARM, Radeon, Mali

GPU becomes a peer processor to the CPU in

terms of system integration

Full programming language features

Shared virtual memory: pointer is a pointer

Coherency

Context switching

HSA Foundation – an

industry-wide initiative

Page 7: Heterogeneous Systems Architecture: The Next Area of Computing Innovation

BENEFITS OF HETEROGENEOUS SYSTEM ARCHITECTURE

Page 8: Heterogeneous Systems Architecture: The Next Area of Computing Innovation

EFFECTIVE COMPUTE OFFLOAD

Made easy by HSA

Unleash the best compute elements depending on task

APU Accelerated

Software Applications

Serial and Task

Parallel Workloads

HSA Accelerated Processing Unit

Data Parallel Workloads

Page 9: Heterogeneous Systems Architecture: The Next Area of Computing Innovation

BRINGING IT ALL TOGETHER

MOTION DSP 720P

[Figure: Performance, CPU vs. CPU+GPU, in fps (axis 0 to 25 fps)]

[Figure: Power, CPU vs. CPU+GPU, in W (axis 0 to 35 W), broken down into CPU Cores, NB+GPU, and DRAM]

>4.0X Better Energy Efficiency (1)

(1) AMD internal testing: AMD E2-3200 APU (2 cores @ 2400 MHz, GPU: 2 CU @ 444 MHz), Windows 7 OS, MotionDSP vReveal application, 720P MP4 input (http://www.vreveal.com/stabilization)

Synergistic use of GPU compute

+ shared memory

=

lower power and higher performance

Page 10: Heterogeneous Systems Architecture: The Next Area of Computing Innovation

TODAY’S DISCUSSION: FROM SURROUND COMPUTING TO

ENABLING THE HOLODECK

1. A fully featured Holodeck is

still many years away

2. Today our discussion will:

Establish a Holodeck framework

Identify Holodeck enabling technologies

Discuss how Heterogeneous Systems

Architecture (HSA) accelerates these

technologies

Undertake an HSA deep dive on one of

these enabling technologies

Look at how new dedicated processors

will enable Holodeck functionality

Page 11: Heterogeneous Systems Architecture: The Next Area of Computing Innovation

WHAT IS A HOLODECK?

Page 12: Heterogeneous Systems Architecture: The Next Area of Computing Innovation

THE HOLODECK FRAMEWORK: AN EVOLUTION OF SURROUND COMPUTING

Natural User Interfaces

Context Computing

360 Degree Virtual

Environments

Page 13: Heterogeneous Systems Architecture: The Next Area of Computing Innovation

HOLODECK ENABLING TECHNOLOGIES: PROFOUND IMPLICATIONS FOR COMPUTER ARCHITECTURE

Computational Photography: Delivering seamless and immersive video environments

Directional Audio: Using audio to enhance immersion and realism of our environments

Natural User Interfaces: Enabling realistic, natural human communication

Context Computing: Delivering an intuitive understanding of the user’s needs in real time

Augmented Reality: Bringing it all together – combining the real and the virtual

Page 14: Heterogeneous Systems Architecture: The Next Area of Computing Innovation

COMPUTATIONAL PHOTOGRAPHY: 360 DEGREE VISUAL ENVIRONMENTS, PHOTOSTITCHING, PERIPHERAL VISION AND HSA

Mapping real life scenes through finite images

Photo stitching of tiled environments and

perceptual correction

Detect interest points & match features

Projecting geometry with point features

using algorithms like RANSAC

Image processing to account for

curved screen surfaces

Modulate brightness to account for

peripheral vision

HSA presents a unified view of the system with shared memory, so CPU and GPU acceleration can be applied across the entire process
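
As a rough illustration of the RANSAC step named above, the following Python sketch estimates the shift between two overlapping tiles from candidate feature matches while ignoring bad matches. It is a minimal sketch under assumptions that are not in the slides: a pure translation stands in for the projective model a real stitcher would fit, and the function name, tolerance, and iteration count are hypothetical.

```python
import random

def ransac_translation(matches, iters=200, tol=2.0, seed=0):
    """Estimate a 2D shift from point matches, tolerating outliers.

    matches: list of ((x1, y1), (x2, y2)) candidate feature pairs.
    Returns (best_shift, inliers). The translation-only model is a
    simplifying assumption made for this sketch.
    """
    if not matches:
        return None, []
    rng = random.Random(seed)
    best_shift, best_inliers = None, []
    for _ in range(iters):
        # Hypothesize a model from one randomly chosen match.
        (x1, y1), (x2, y2) = rng.choice(matches)
        dx, dy = x2 - x1, y2 - y1
        # Count the matches that agree with the hypothesis.
        inliers = [m for m in matches
                   if abs(m[1][0] - m[0][0] - dx) <= tol
                   and abs(m[1][1] - m[0][1] - dy) <= tol]
        if len(inliers) > len(best_inliers):
            best_shift, best_inliers = (dx, dy), inliers
    return best_shift, best_inliers

# Eight consistent matches shifted by (100, 5) plus three bad matches.
good = [((x, y), (x + 100, y + 5))
        for x, y in [(0, 0), (10, 20), (30, 5), (50, 60),
                     (70, 15), (90, 80), (15, 45), (60, 30)]]
bad = [((5, 5), (300, 2)), ((40, 40), (7, 90)), ((80, 10), (81, 400))]
shift, inliers = ransac_translation(good + bad)
print(shift, len(inliers))
```

The inlier count for each hypothesis is independent of the others, which is the kind of data-parallel work the slide assigns to GPU acceleration.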

Page 15: Heterogeneous Systems Architecture: The Next Area of Computing Innovation

DIRECTIONAL AUDIO

Couples computationally demanding 3D audio and spatialization effects with "always on" background processing like Voice Activity Detection (VAD)

Voice activity detection is best

implemented with special audio

processors and acceleration

techniques

Spatialization effects such as

“Convolution Reverb” are best

done with GPU acceleration

HSA enables seamless

integration of CPU and GPU

acceleration with other

independent accelerators
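
Convolution reverb amounts to convolving the dry signal with a room's impulse response. The Python sketch below shows the direct form; the function name and sample values are illustrative only, and production code would more likely use FFT-based convolution.

```python
def convolve_reverb(signal, impulse):
    """Direct-form convolution of a dry signal with an impulse response."""
    if not signal or not impulse:
        return []
    out = [0.0] * (len(signal) + len(impulse) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse):
            # Each input sample adds a scaled, delayed copy of the impulse.
            out[i + j] += s * h
    return out

# A single click followed by silence, through a two-tap "room".
wet = convolve_reverb([1.0, 0.0, 0.0], [1.0, 0.5])
print(wet)
```

Each output sample is an independent sum of products, which is one way to see why the slide points to GPU acceleration for this effect.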

Page 16: Heterogeneous Systems Architecture: The Next Area of Computing Innovation

NATURAL USER INTERFACES

Speech Recognition:

Background processing – echo cancellation & noise suppression

Audio feature extraction

Voice pattern recognition through Markov model or similar algorithm

Gesture Recognition:

Frame preprocessing & filtering

Optical flow or object tracking

Sophisticated computer vision algorithms to delineate the hand or body parts from the background

NUI algorithms all benefit from CPU/GPU and audio processors to efficiently perform these functions at the lowest power

Page 17: Heterogeneous Systems Architecture: The Next Area of Computing Innovation

CONTEXT COMPUTING: BIOMETRICS EXAMPLE

• Facial Recognition:

• Face detection (is there a face) –

GPU acceleration

• Face identification (pattern

matching through algorithms like

Haar face detection) – CPU and

GPU acceleration

• Validation through blink detection

(make sure it is a real face) –

GPU acceleration

HSA enables mix and match of the best

acceleration for each phase of the

process

Page 18: Heterogeneous Systems Architecture: The Next Area of Computing Innovation

AUGMENTED REALITY

• Image Registration:

• Relies on robust and fast feature

detection – benefits from

CPU/GPU acceleration

• Object Tracking:

• Relies on “optical flow” algorithm

– benefits from CPU/GPU

acceleration

• Image Composition:

• Once information exists from the

above, becomes a classic

graphics rendering use case

The building blocks of HSA enable the

augmented reality world.

Page 19: Heterogeneous Systems Architecture: The Next Area of Computing Innovation

THE WAY FORWARD

Many technologies required to

enable our vision

– Heterogeneous engines that

accelerate key client and server

workloads

– Datacenters optimized for

latency, scalability, and

efficiency

– Processors optimized for new

and emerging workloads

– Active research into new

algorithms

Page 20: Heterogeneous Systems Architecture: The Next Area of Computing Innovation

ENABLING TECHNOLOGY DEEP DIVE:

ACCELERATING NATURAL USER INTERFACES (HAAR

FACE DETECTION) WITH HETEROGENEOUS

SYSTEMS ARCHITECTURE

Page 21: Heterogeneous Systems Architecture: The Next Area of Computing Innovation

LOOKING FOR FACES IN ALL THE RIGHT PLACES

Page 22: Heterogeneous Systems Architecture: The Next Area of Computing Innovation

LOOKING FOR FACES IN ALL THE RIGHT PLACES

Quick HD Calculations

Search square = 21 x 21

Pixels = 1920 x 1080 = 2,073,600

Search squares = 1900 x 1060 = ~2 Million
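
The arithmetic on this slide can be checked with a few lines of Python. This sketch assumes the 21 x 21 search square moves one pixel at a time and must lie fully inside the frame, which is what the 1900 x 1060 figure implies; the function name is made up for illustration.

```python
def count_search_squares(width, height, window=21):
    """Positions a window x window square can occupy inside the frame."""
    if width < window or height < window:
        return 0
    return (width - window + 1) * (height - window + 1)

pixels = 1920 * 1080
squares = count_search_squares(1920, 1080)
print(pixels)   # 2073600
print(squares)  # 2014000, i.e. 1900 x 1060, roughly 2 million
```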

Page 23: Heterogeneous Systems Architecture: The Next Area of Computing Innovation

LOOKING FOR DIFFERENT SIZE FACES BY SCALING THE VIDEO FRAME

Page 24: Heterogeneous Systems Architecture: The Next Area of Computing Innovation

LOOKING FOR DIFFERENT SIZE FACES BY SCALING THE VIDEO FRAME

More HD Calculations

70% scaling in H and V

Total Pixels = 4.07 Million

Search squares = 3.8 Million
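
A short Python sketch can reproduce these totals approximately by shrinking the frame to 70% in each dimension until a 21 x 21 search square no longer fits. The stopping rule, the truncation of frame sizes to whole pixels, and the function names are assumptions of this sketch, so the results only approximate the slide's figures (about 4.06 million pixels and about 3.87 million search squares here).

```python
def pyramid_levels(width, height, scale=0.7, window=21):
    """Frame sizes from full resolution down to the smallest that fits one window."""
    levels = []
    w, h = float(width), float(height)
    while int(w) >= window and int(h) >= window:
        levels.append((int(w), int(h)))  # truncate to whole pixels
        w *= scale
        h *= scale
    return levels

def pyramid_totals(width, height, scale=0.7, window=21):
    """Total pixels and total search squares summed over all pyramid levels."""
    levels = pyramid_levels(width, height, scale, window)
    pixels = sum(w * h for w, h in levels)
    squares = sum((w - window + 1) * (h - window + 1) for w, h in levels)
    return pixels, squares

levels = pyramid_levels(1920, 1080)
total_pixels, total_squares = pyramid_totals(1920, 1080)
print(len(levels), "levels")
print(f"{total_pixels / 1e6:.2f} M pixels, {total_squares / 1e6:.2f} M search squares")
```

Because each level has 0.7 x 0.7 = 0.49 times the pixels of the one before, the total converges quickly to about twice the full-resolution frame.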

Page 25: Heterogeneous Systems Architecture: The Next Area of Computing Innovation

HAAR CASCADE STAGES

Feature l

Feature m

Feature p

Feature r

Feature q

Feature k

Stage N

Stage N+1

Face still possible? Yes

No

REJECT FRAME
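
The early-out structure in this diagram can be sketched in Python as follows. This is a toy model, not the classifier used in the talk: the features are arbitrary callables, the stage thresholds are invented, and the names are hypothetical. It only shows how a search square is dropped at the first stage it fails.

```python
def run_cascade(window, stages):
    """Evaluate a search square against a cascade of stages.

    stages: list of (features, threshold); each feature maps the window
    to a score. Returns (is_face, depth), where depth is the number of
    stages passed before the square was rejected or confirmed.
    """
    for depth, (features, threshold) in enumerate(stages):
        score = sum(feature(window) for feature in features)
        if score < threshold:
            return False, depth  # face no longer possible: reject early
    return True, len(stages)

# Toy two-stage cascade over a three-value "window".
toy_stages = [
    ([lambda w: w[0], lambda w: w[1]], 1.0),
    ([lambda w: w[2]], 0.5),
]
print(run_cascade([0.6, 0.6, 0.9], toy_stages))  # (True, 2)
print(run_cascade([0.1, 0.2, 0.9], toy_stages))  # (False, 0)
print(run_cascade([0.6, 0.6, 0.1], toy_stages))  # (False, 1)
```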

Page 26: Heterogeneous Systems Architecture: The Next Area of Computing Innovation

22 CASCADE STAGES, EARLY OUT BETWEEN EACH

[Diagram: STAGE 1, STAGE 2, ... STAGE 21, STAGE 22 in sequence; passing every stage gives FACE CONFIRMED, an early out at any stage gives NO FACE]

Final HD Calculations

Search squares = 3.8 million

Average features per square = 124

Calculations per feature = 100

Calculations per frame = 47 GCalcs

Calculation Rate

30 frames/sec = 1.4 TCalcs/second

60 frames/sec = 2.8 TCalcs/second

…and this only gets front-facing faces
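
The chain of multiplications on this slide can be reproduced directly. The sketch below uses only the slide's own inputs; the function name is hypothetical.

```python
def calcs_per_frame(search_squares, features_per_square, calcs_per_feature):
    """Total calculations needed to scan one frame."""
    return search_squares * features_per_square * calcs_per_feature

per_frame = calcs_per_frame(3_800_000, 124, 100)
print(f"{per_frame / 1e9:.2f} GCalcs per frame")             # 47.12
for fps in (30, 60):
    print(f"{fps} frames/sec = {per_frame * fps / 1e12:.2f} TCalcs/second")
# 30 frames/sec = 1.41 TCalcs/second
# 60 frames/sec = 2.83 TCalcs/second
```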

Page 27: Heterogeneous Systems Architecture: The Next Area of Computing Innovation

CASCADE DEPTH ANALYSIS

[Figure: Cascade Depth, scale 0 to 25; legend bins 0-5, 5-10, 10-15, 15-20, 20-25]

Page 28: Heterogeneous Systems Architecture: The Next Area of Computing Innovation

UNBALANCING DUE TO EXITS IN EARLIER CASCADE STAGES

Live

Dead

When running on the GPU, we run each search rectangle on a separate work item

Early-out algorithms, like Haar, exhibit divergence between work items

– Some work items exit early

– Their neighbors continue

– SIMD packing suffers as a result
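
To make the packing problem concrete, the Python sketch below estimates lane utilization when work items that exit early must wait for the deepest neighbor in their SIMD group. The group width of 64 and the simple cost model (one time step per cascade stage) are assumptions of this sketch, not figures from the talk.

```python
def simd_utilization(depths, width=64):
    """Fraction of issued SIMD lane-steps that do useful work.

    depths: cascade stages executed by each work item before it exits.
    A group of `width` lanes keeps running until its deepest lane is
    done, so lanes that exited early sit idle ("dead") meanwhile.
    """
    useful = 0
    issued = 0
    for start in range(0, len(depths), width):
        group = depths[start:start + width]
        useful += sum(group)
        issued += max(group) * width  # every lane is occupied until the last exit
    return useful / issued if issued else 1.0

uniform = simd_utilization([22] * 64)           # every lane runs all 22 stages
divergent = simd_utilization([1] * 63 + [22])   # one lane keeps 63 others waiting
print(f"uniform: {uniform:.2f}, divergent: {divergent:.2f}")
```

In the divergent case only about 6% of the lane-steps do useful work, which illustrates why the later, sparsely populated cascade stages are a poor fit for wide SIMD hardware.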

Page 29: Heterogeneous Systems Architecture: The Next Area of Computing Innovation

PROCESSING TIME/STAGE

[Figure: time in ms (axis 0 to 100) per cascade stage (1 to 8, then 9-22), GPU vs. CPU. Chart label: A10-4600M (6 CU @ 497 MHz, 4 cores @ 2700 MHz)]

AMD A10 4600M APU with Radeon™ HD Graphics; CPU: 4 cores @ 2.3 GHz (turbo 3.2 GHz); GPU: AMD Radeon HD 7660G,

6 compute units, 685MHz; 4GB RAM; Windows 7 (64-bit); OpenCL™ 1.1 (873.1)

Page 30: Heterogeneous Systems Architecture: The Next Area of Computing Innovation

PERFORMANCE CPU-VS-GPU

[Figure: images/sec (axis 0 to 12) vs. number of cascade stages on GPU (0 to 8, then 22), for CPU, HSA, and GPU. Chart label: AMD A10-4600M APU (6 CU @ 497 MHz, 4 cores @ 2700 MHz)]

AMD A10 4600M APU with Radeon™ HD Graphics; CPU: 4 cores @ 2.3 GHz (turbo 3.2 GHz); GPU: AMD Radeon HD 7660G,

6 compute units, 685MHz; 4GB RAM; Windows 7 (64-bit); OpenCL™ 1.1 (873.1)

Page 31: Heterogeneous Systems Architecture: The Next Area of Computing Innovation

HAAR SOLUTION: RUN DIFFERENT CASCADES ON GPU AND CPU

By seamlessly sharing data between CPU and GPU,

HSA allows the right processor to handle its appropriate

workload
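
One way to picture this split is a two-phase pass over the search squares: the first few stages run over every square, and only the survivors go on to the remaining stages. The Python sketch below simulates that on a single processor. Sending the early stages to the GPU is an assumption suggested by the "Number of Cascade Stages on GPU" chart axis, and the toy features, thresholds, and names are hypothetical.

```python
def passes(window, stages):
    """True if the window clears every stage; stops at the first failure."""
    return all(sum(feature(window) for feature in features) >= threshold
               for features, threshold in stages)

def split_cascade(windows, stages, gpu_stages):
    """Run the first gpu_stages on all windows, the rest on survivors only."""
    early, late = stages[:gpu_stages], stages[gpu_stages:]
    # Phase 1: uniform, data-parallel work; the GPU's share in an HSA system.
    survivors = [w for w in windows if passes(w, early)]
    # Phase 2: few, divergent work items; the CPU's share. With shared
    # memory the survivor list would not need to be copied between the two.
    faces = [w for w in survivors if passes(w, late)]
    return survivors, faces

toy_stages = [([lambda w: w[0]], 0.5),
              ([lambda w: w[1]], 0.5),
              ([lambda w: w[2]], 0.5)]
toy_windows = [(0.9, 0.9, 0.9), (0.9, 0.1, 0.9), (0.1, 0.9, 0.9), (0.9, 0.9, 0.1)]
survivors, faces = split_cascade(toy_windows, toy_stages, gpu_stages=2)
print(len(survivors), len(faces))  # 2 1
```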

+2.5x INCREASED PERFORMANCE

-2.5x DECREASED ENERGY PER FRAME

Page 32: Heterogeneous Systems Architecture: The Next Area of Computing Innovation

APPLICATION ACCELERATION USING HSA

[Figure: bar chart of acceleration vs. CPU (axis 0 to 14). Applications: Face detect, Video stabilization, Stereo vision, Audio search, Visual Search, Voice recognition, Photo indexing, Gesture recognition. Bar labels: 2x, 4x, 4x, 5x, 9x, 10x, 10x, 12x]

AMD estimates. Source: AMD Whitepaper, Accelerating Consumer/Prosumer Multimedia with HSA, June 2012

Page 33: Heterogeneous Systems Architecture: The Next Area of Computing Innovation

HSA EVOLUTION

Physical Integration (Llano)

– Integrate CPU & GPU in silicon

– Unified Memory Controller

– Common Manufacturing Technology

Optimized Platforms (Trinity)

– GPU Compute C++ support

– User mode scheduling

– Bi-Directional Power Mgmt between CPU and GPU

Architectural Integration (Kaveri)

– Unified Address Space for CPU and GPU

– GPU uses pageable system memory via CPU pointers

– Fully coherent memory between CPU & GPU

System Integration (Next Gen)

– GPU compute context switch

– GPU graphics pre-emption

– Quality of Service

Page 34: Heterogeneous Systems Architecture: The Next Area of Computing Innovation

HSA PROGRAMMABILITY ADVANTAGE

• Works with today’s programming models and languages

• Architected to enable CPU-like programmability

• Promotes development and adoption of extended standards

• Write Once Run Anywhere – with Performance

[Diagram labels: C, C++, Java …; OpenCL, C++ AMP, Java8 …; Domain-Specific Ext / APIs; DX11, OpenGL …; HSA Intermediate Language (HSAIL); Unified Programming Models; HSA Foundation; Compute Acceleration; Graphics Acceleration]

Page 35: Heterogeneous Systems Architecture: The Next Area of Computing Innovation

CONCLUSION

The age of traditional computing is

dead.

A paradigm shift in processing has

brought about the Heterogeneous

Systems Era

HSA will enable us to dramatically

scale processing power while

increasing power efficiency

The Holodeck is still years away, but HSA and dedicated hardware blocks will accelerate and enable technologies as they emerge

Page 36: Heterogeneous Systems Architecture: The Next Area of Computing Innovation

ACKNOWLEDGEMENTS

Bill Herz

Phil Rogers

Marty Johnson

Chris Hook

Sumant Subramanian

Page 37: Heterogeneous Systems Architecture: The Next Area of Computing Innovation

THANK YOU

Page 38: Heterogeneous Systems Architecture: The Next Area of Computing Innovation

DISCLAIMER The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and

typographical errors.

The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to

product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences

between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or

otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to

time to the content hereof without obligation of AMD to notify any person of such revisions or changes.

AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO

RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION.

AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN

NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES

ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF

SUCH DAMAGES.

ATTRIBUTION

© 2013 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo, Radeon, and combinations thereof

are trademarks of Advanced Micro Devices, Inc. Other names and logos are used for informational purposes only and may

be trademarks of their respective owners.