Apu13 cp lu-keynote-final-slideshare

How many cores will we need?

Chien-Ping Lu, PhD

Sr. Director, MediaTek Inc.

| How many cores will we need? | December 4, 2013 | Confidential

a group of hippos is called …

A Crash


a group of crows is called …

A Murder


a group of giraffes is called …

A Tower

From Wikipedia


So, it is not surprising that we use

“A Parade” of elephants

“An Army” of ants

“A Herd” of sheep


From frequency to multicore scaling

[Figure: performance vs. time, with regions labeled Serial Computing and Parallel Computing and the power wall marked at 2005; inset plots: power vs. frequency]


How many cores will we need?

[Figure: performance vs. time for moderate and massive parallelism; scaling steps labeled 2x, 4x, 3x and 8x, 4x, 16x, 4x]

Dark silicon (or dark cores)?


Light up the cores

[Figure: power vs. degree of parallelism (number of cores), bounded by a power ceiling and a parallelism wall; regions labeled Big cores, Little cores, and GPU-style “cores”; example workloads: body tracking, ray tracing]

Redefine the cores to be heterogeneous

Amdahl’s law: An argument against parallel computing

Dark Silicon: A concern on power


The elephants: CPU cores

For multiple-instruction-multiple-data (MIMD) execution

[Diagram: six CPU cores, each with its own front end and ALU]

A CPU core runs 1 iteration of the parallel loop

The same color means the same piece of code

Retrofitted for moderately parallel workloads, and not very efficient for massively parallel workloads

Parallel.For (…)

…

…

…

…

Else


Army of ants: SIMT cores

For SIMT (single-instruction-multiple-thread) execution

[Diagram: three clusters, each with one front end shared by many ALUs]

A branch is emulated through divergence

SIMT is the execution model of HSA and implemented in modern GPUs, with MIMD flexibility and SIMD efficiency

A cluster of SIMT cores shares one front end in a SIMD manner

Parallel.For (…)

…

…

…

…

Else

A SIMT core runs 1 iteration of the parallel loop

[Diagram labels: SFU 0, SFU 1]

Can achieve better power efficiency with more specialized function units given the right workload
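The way a branch is emulated through divergence can be illustrated with a small simulation. This is a minimal sketch in plain Python, not HSA or GPU code: the function name `simt_if_else` and the lane values are hypothetical, and a real SIMT cluster does the masking in hardware. One shared front end issues each side of the branch once, and lanes that did not take that side are masked off.

```python
def simt_if_else(lanes, cond, then_op, else_op):
    """Emulate an if/else on one SIMT cluster using per-lane masks.

    Hypothetical illustration: `lanes` holds one value per SIMT core.
    Returns the new lane values and the number of passes the shared
    front end had to issue (1 if all lanes agree, 2 if they diverge).
    """
    mask = [cond(x) for x in lanes]  # per-lane branch predicate
    out = list(lanes)
    passes = 0

    # Pass 1: issue the "then" path; lanes with a false predicate idle.
    if any(mask):
        out = [then_op(v) if m else v for v, m in zip(out, mask)]
        passes += 1

    # Pass 2: issue the "else" path; lanes with a true predicate idle.
    if not all(mask):
        out = [v if m else else_op(v) for v, m in zip(out, mask)]
        passes += 1

    return out, passes


is_even = lambda x: x % 2 == 0
times_ten = lambda x: x * 10
negate = lambda x: -x

# Divergent lanes: both paths are issued, one after the other.
print(simt_if_else([1, 2, 3, 4], is_even, times_ten, negate))
# prints ([-1, 20, -3, 40], 2)

# Uniform lanes: only one path is issued.
print(simt_if_else([2, 4, 6, 8], is_even, times_ten, negate))
# prints ([20, 40, 60, 80], 1)
```

The pass count is the cost of divergence in this toy model: when every lane takes the same side, the cluster keeps its SIMD efficiency, and when lanes disagree, both sides run back to back.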


Properties of massively data-parallel workloads

• Problem size N of the parallel workload can keep growing

• Visible serial workload s can be kept constant

• Communication overhead is proportional to log P (by a factor of r)

• The parallel workload is sped up linearly by P, the number of cores

• "Embarrassingly" parallel, when there is no communication overhead (r=0)

[Figure: run time on one core, s + N, compared with run time on P cores, s + r log P + N/P; the difference is the time saved by P cores]


Revisiting Amdahl's law for trend prediction

\[ \text{Speedup} = \frac{s + P}{s + r \log P + 1} \]

\[ \text{Speedup} = \frac{s + N}{s + r \log P + N/P} \]
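The speedup model on this slide, Speedup = (s + N) / (s + r log P + N/P), can be evaluated numerically to see the trend. The sketch below is an illustration only: the function name `speedup`, the base-2 logarithm, and the sample values of s, N, and r are assumptions, since the slides do not state them. Setting N = P gives the scaled form (s + P) / (s + r log P + 1).

```python
import math


def speedup(s, n, p, r=0.0):
    """Speedup of P cores over one core for the slide's workload model.

    s: visible serial workload, n: parallel problem size N,
    p: number of cores P, r: communication overhead factor.
    Assumption: log base 2 (the slides do not specify the base).
    """
    serial_time = s + n
    parallel_time = s + r * math.log2(p) + n / p
    return serial_time / parallel_time


s, r = 1.0, 0.1  # hypothetical values chosen for illustration
for p in (1, 16, 256, 4096):
    fixed = speedup(s, n=1000, p=p, r=r)  # problem size stays constant
    scaled = speedup(s, n=p, p=p, r=r)    # problem size grows with P
    print(f"P={p:5d}  fixed N=1000: {fixed:8.2f}   N=P: {scaled:8.2f}")
```

With a fixed N and r >= 0, the speedup can never exceed (s + N) / s, which is the classic Amdahl limit. When N keeps growing with P, the numerator grows while the denominator grows only with log P, so the model predicts continued benefit from more cores.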


MediaTek face beautification

When it comes to beauty, there seems to be no limit

[Images: Before; Skin tone adjustment; Wrinkle removal; Thinner face, bigger eyes]


Graphics keeps moving

Pac-Man, 1980

GLBenchmark 2.1 Egypt, 2011

GLBenchmark 2.5 Egypt, 2012

GFXBench 2.7 T-Rex, 2013

GFXBench 3.0 Manhattan, 2013

Mobile 3D Graphics

Recognized by 94% of American Consumers

Highest-grossing video game of all time


High-performance computing (HPC) keeps scaling out

HPC from 1993 to 2012
‒ GFLOPS ~ 130,000x
‒ Cores ~ 11,000x
‒ GHz ~ 10x

Higher grid resolution

More time steps

More atoms
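The HPC factors quoted for 1993 to 2012 (GFLOPS ~130,000x, cores ~11,000x, GHz ~10x) can be broken down with a little arithmetic. The sketch below rests on an assumed decomposition that the slides do not state: throughput = cores x clock rate x work per core per cycle. The helper name `annual_rate` is hypothetical.

```python
# Growth factors quoted on the slide for HPC, 1993 to 2012.
YEARS = 2012 - 1993
GFLOPS_GROWTH = 130_000
CORES_GROWTH = 11_000
GHZ_GROWTH = 10


def annual_rate(total_growth, years):
    """Compound growth per year that yields total_growth after `years`."""
    return total_growth ** (1.0 / years)


# Assumed decomposition: GFLOPS = cores * GHz * (work per core per cycle).
# Whatever cores and frequency do not explain is the per-core,
# per-cycle improvement.
residual = GFLOPS_GROWTH / (CORES_GROWTH * GHZ_GROWTH)

print(f"Residual per-core, per-cycle gain: {residual:.2f}x")
for name, growth in (("GFLOPS", GFLOPS_GROWTH),
                     ("Cores", CORES_GROWTH),
                     ("GHz", GHZ_GROWTH)):
    print(f"{name}: {annual_rate(growth, YEARS):.2f}x per year")
```

Under that assumption the residual is about 1.18x, so nearly all of the throughput growth in these figures comes from core count, with frequency a distant second.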


Parallel killer apps are just around the corner

[Diagram labels: Moore’s law; More cores; Higher frequency; More complex software; Bigger problems; Data; Better user experience]

What bigger problems to solve with bigger data?

How does solving bigger problems lead to better user experience?

Mining bigger data with Machine Learning

Bigger data-parallel workloads in Graphics and HPC

Completing the positive feedback loop


How to distinguish cat photos from dog ones?

ASIRRA: Animal Species Image Recognition for Restricting Access (from Microsoft Research)


Why is it hard?

Source: training set of Kaggle.com Dogs vs. Cats competition


Is there a solution to relate photos from the same dog?

Prancer, a 5-year-old toy poodle, before and after grooming


Mine the solutions from the data

Dog-Cat classifier

Theory of the differences between dogs and cats?

Learn from many (12,500) photos labeled as dogs or cats

Machine Learning


Machine learning: prediction with powerful models

More powerful models have more knobs, which need to be determined with a bigger data set

The explosive growth of data has made very powerful models feasible

6th-order polynomial over-fits the 4 samples
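Why a 6th-order polynomial over-fits 4 samples can be shown directly: it has 7 knobs (coefficients), so 4 samples cannot pin them down. The sketch below is an illustration with made-up samples, not the data plotted on the slide, and the helper names `lagrange` and `sixth_order` are hypothetical. It builds two different 6th-order polynomials that both pass exactly through the same 4 samples yet disagree wildly on a new input.

```python
def lagrange(xs, ys, x):
    """Evaluate the lowest-order polynomial through the samples at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def sixth_order(xs, ys, c):
    """Return a 6th-order polynomial through the 4 samples.

    The extra term c * x^2 * (x-x1)(x-x2)(x-x3)(x-x4) has order 6 and
    vanishes at every sample, so any c gives zero training error.
    """
    def p(x):
        vanish = 1.0
        for xk in xs:
            vanish *= (x - xk)
        return lagrange(xs, ys, x) + c * x * x * vanish
    return p


# Hypothetical training set: 4 samples drawn from y = x.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.0, 1.0, 2.0, 3.0]

p_a = sixth_order(xs, ys, c=1.0)
p_b = sixth_order(xs, ys, c=-1.0)

for x, y in zip(xs, ys):
    print(f"x={x}: target {y}, model A {p_a(x):.3f}, model B {p_b(x):.3f}")

# Both models fit the samples, but they disagree on an unseen input.
print(f"x=4.0: model A {p_a(4.0):.1f}, model B {p_b(4.0):.1f}")
```

Only more samples can decide between such models, which is the sense in which more knobs need a bigger data set.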


From data to user experience

[Diagram: in the cloud, Machine Learning over Web-scale Data $(x_n, y_n)$ sets the knobs $\{a_i\}$; on the client, the model $f$ with knobs $\{a_i\}$ maps an input $x$ to an output $y$]

Determine $\{a_i\}$ to minimize the error between $y_n$ and $f(x_n; \{a_i\})$

Examples (x → y):
‒ dog/cat photos → dog or cat
‒ Sensor readings → jogging, walking or climbing
‒ Depth images → body motion

Bigger data lead to more powerful models

Powerful models with more knobs lead to better user experience
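Determining the knobs so that the model's outputs match the labeled data can be sketched with the simplest possible model. The example below is an assumption-laden illustration, not the method used in the talk: it picks a two-knob linear model f(x) = a0 + a1 x, uses closed-form least squares as the error measure, and the names `fit_linear` and `f` plus the sample data are hypothetical. Fitting stands in for the cloud side, and evaluating `f` stands in for the client side.

```python
def fit_linear(samples):
    """Cloud side: choose knobs (a0, a1) minimizing the squared error
    between y_n and f(x_n) over all labeled samples (x_n, y_n)."""
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    a1 = sxy / sxx
    a0 = mean_y - a1 * mean_x
    return a0, a1


def f(x, knobs):
    """Client side: apply the trained model to a new input x."""
    a0, a1 = knobs
    return a0 + a1 * x


# Hypothetical labeled data (x_n, y_n), generated from y = 1 + 2x.
training_set = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]

knobs = fit_linear(training_set)
print("knobs:", knobs)               # prints knobs: (1.0, 2.0)
print("f(10) =", f(10.0, knobs))     # prints f(10) = 21.0
```

A model with more knobs follows the same pattern, only with a larger search over the knobs and a larger training set to constrain them.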


Smart clients in the era of data

[Diagram labels: Client; Sensing; Connectivity; Cloud; Data Mining; Big Training Set; Powerful Model; User Experience; Input data]

[Diagram labels: Smarter Client; Better Sensing; Better Connectivity; Bigger Data Mining; Bigger Training Set; More powerful Model; Better User Experience; Local Machine Learning]

In the cloud or the clients


Looking forward

The future is here
‒ There are already massively parallel heterogeneous processors

There is no shame in being data-parallel
‒ One of the smartest things achieved in computing is data parallel

Source: Le et al., Building High-level Features Using Large Scale Unsupervised Learning

The carbon footprint of US datacenters is at the same level as the airline industry

Go parallel and go heterogeneous to keep
‒ Mobile devices cool in our palms
‒ Data centers clean for our environment


Disclaimer & Attribution

The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors.

The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes.

AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION.

AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

ATTRIBUTION

© 2013 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo and combinations thereof are trademarks of Advanced Micro Devices, Inc. in the United States and/or other jurisdictions. SPEC is a registered trademark of the Standard Performance Evaluation Corporation (SPEC). Other names are for informational purposes only and may be trademarks of their respective owners.
