Emerging Computing Trends in the Datacenter Dileep Bhandarkar,...

Qualcomm Datacenter Technologies, Inc. Emerging Computing Trends in the Datacenter Dileep Bhandarkar, Ph. D. Vice President, Technology Linaro Connect Keynote – 23 March 2018, Hong Kong


Page 1: Emerging Computing Trends in the Datacenter Dileep Bhandarkar, …connect.linaro.org.s3.amazonaws.com/hkg18/presentations/... · 2018-04-03 · Outline • Historical Perspective

Qualcomm Datacenter Technologies, Inc.

Emerging Computing Trends in the Datacenter

Dileep Bhandarkar, Ph.D., Vice President, Technology

Linaro Connect Keynote – 23 March 2018, Hong Kong


Outline

• Historical Perspective on 40 Years of Moore’s Law
  – Single Core Era enabled by Dennard Scaling
• Post Dennard Scaling Drives Multi-Core Era
• The Shift to Energy Efficient Multi-Core Designs for the Cloud
• Heterogeneous Computing Era with Application Specific Accelerators


The First 50 Years after Shockley’s Transistor Invention


1958: Jack Kilby’s Integrated Circuit

Bob Noyce’s Integrated Circuit

My 40+ Year Journey From Mainframes to Smartphones https://www.youtube.com/watch?v=7ptXpNFY3XM


From 2300 to >1 Billion Transistors

Moore’s Law video at http://www.cs.ucr.edu/~gupta/hpca9/HPCA-PDFs/Moores_Law_Video_HPCA9.wmv


Dennard Scaling

Device or Circuit Parameter           Scaling Factor
Device dimension (tox, L, W)          1/K
Doping concentration (Na)             K
Voltage (V)                           1/K
Current (I)                           1/K
Capacitance (εA/t)                    1/K
Delay time per circuit (VC/I)         1/K
Power dissipation per circuit (VI)    1/K^2
Power density (VI/A)                  1

The benefits of scaling: as transistors get smaller, they can switch faster and use less power. Each new generation of process technology was expected to reduce minimum feature size by approximately 0.7x (K ~1.4). A 0.7x reduction in linear feature size provided roughly a 2x increase in transistor density.

Dennard scaling broke down around 2004 with unscaled interconnect delays and our inability to scale the voltage and current due to reliability concerns.

But increasing transistor density (Moore’s Law) has continued to enable multicore designs.
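The scaling table can be worked through numerically. The sketch below is illustrative only: the function names are hypothetical, and it assumes ideal Dennard scaling with K taken as 1/0.7 for a 0.7x linear shrink.

```python
def dennard_scaling(k):
    """Ideal Dennard scaling factor for each parameter, given scaling constant K."""
    return {
        "device dimension (tox, L, W)": 1 / k,
        "doping concentration (Na)": k,
        "voltage (V)": 1 / k,
        "current (I)": 1 / k,
        "capacitance (eA/t)": 1 / k,
        "delay time per circuit (VC/I)": 1 / k,
        "power dissipation per circuit (VI)": 1 / k**2,
        "power density (VI/A)": 1.0,
    }


def density_gain(linear_shrink):
    """Transistor density gain when linear feature size is multiplied by linear_shrink."""
    return 1 / linear_shrink**2


if __name__ == "__main__":
    k = 1 / 0.7  # assumption: a 0.7x linear shrink, so K is about 1.43
    for name, factor in dennard_scaling(k).items():
        print(f"{name:36s} x{factor:.2f}")
    print(f"density gain per node: x{density_gain(0.7):.2f}")
```

Under these assumptions power per circuit falls to about 0.49x while density rises by about 2.04x, so power density stays constant. That balance is what stopped holding once voltage and current could no longer scale.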


THE MULTICORE ERA

SINGLE THREAD PERFORMANCE IMPROVEMENT SLOWING DOWN

PERFORMANCE DRIVEN BY HIGHER CORE COUNT

Post Dennard Scaling


Now Performance Improvement Comes from Higher Core Count at Similar Frequency with Each New Process Node

[Chart annotations: Transistor Count Increasing; Slower Improvement; No Improvement; Power Going Up With Performance; Core count increasing to drive Performance]


The last 5 Generations of ~135W Xeon Processors

Slow Improvement in IPC but per thread performance constrained by power

• 8 cores, Mar 2012
• 10 cores, Sep 2013
• 12 cores, Sep 2014
• 14 cores, Apr 2016
• 18 cores, Jul 2017

Performance data from www.spec.org
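As a rough illustration of how quickly core count grew across these five generations, the sketch below computes an annualized growth rate from the first and last entries. The slide gives only launch months, so the day of the month is an assumption, and the helper name is hypothetical.

```python
from datetime import date

# (cores, launch month) as listed on the slide; day-of-month is an assumption
GENERATIONS = [
    (8, date(2012, 3, 1)),
    (10, date(2013, 9, 1)),
    (12, date(2014, 9, 1)),
    (14, date(2016, 4, 1)),
    (18, date(2017, 7, 1)),
]


def annual_core_growth(generations):
    """Compound annual growth rate of core count between first and last entry."""
    first_cores, first_date = generations[0]
    last_cores, last_date = generations[-1]
    years = (last_date - first_date).days / 365.25
    return (last_cores / first_cores) ** (1 / years) - 1


if __name__ == "__main__":
    print(f"core count growth: {annual_core_growth(GENERATIONS):.1%} per year")
```

Going from 8 to 18 cores in a little over five years works out to roughly 16% per year at a similar power envelope.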


No Improvement in Perf/Watt per Core even with higher power

Performance data from www.spec.org


Era of Energy Efficient Cores


Looking ahead from edge to cloud
The future requires a new approach to CPU design

• Safe and autonomous
• Hyper-efficient
• Secure private compute
• Cortex beyond mobile
• Mixed reality

© 2017 Arm Limited

Presented by Peter Greenhalgh at Hot Chips 2017


Server Industry is shifting to the Cloud

[Chart: % of total datacenter server revenue (0% to 100%), 2013 to 2020, Cloud versus Traditional Enterprise IT]


Disruptions Come from Below!

[Chart: Performance versus Volume, in order: Mainframes, Minicomputers, RISC Systems, Desktop PCs, Notebooks, Smart Phones]

Bell’s Law: hardware technology, networks, and interfaces allow new, smaller, more specialized computing devices to be introduced to serve a computing need.


Mobile Technology Disrupting the Cloud Datacenter

Qualcomm Datacenter Technologies: uniquely positioned to leverage mobile growth and drive datacenter process leadership

A new world in datacenter: manufacturing process

• Then: fab process tech driven by PC
• Now: fab process tech driven by mobile phones

[Timeline, 2008 to 2018. Mobile driven: 65nm, 45nm, 28nm, 20nm, 14nm, 10nm (1st in the industry). PC driven: 45nm, 32nm, 22nm, 14nm, 10nm. Unit volumes: 1.5B smartphone units; 256M PC units.]


What Cloud means for Processor Architecture

Qualcomm Centriq™ 2400
• Throughput performance
• Thread Density
• Quality of Service
• Energy Efficiency

Key metrics
• Perf / thread
• Perf / Watt
• Perf / mm²

The future requires a new approach to CPU design


Computational + server growth fuel datacenter energy efficiency considerations

• 2014: US datacenters consumed 70 billion kilowatt-hours of electricity
• Datacenters can cost between $10M and $20M per megawatt
• Unused datacenter capacity can be expensive
• 1W of server power can cost $1 per year in energy costs at 10 cents per kWh
• Server power related costs can be 30-50% of overall datacenter operating costs
• Servers need to be designed for average power consumption (not just max peak output)
• Hyper-efficient designs necessary to improve server energy efficiency
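The per-watt figures on this slide can be checked with simple arithmetic. The sketch below is illustrative: the function names are hypothetical, and it assumes continuous operation for 8,760 hours a year with no cooling or distribution overhead.

```python
HOURS_PER_YEAR = 24 * 365  # assumption: the server draws power continuously


def annual_energy_cost(watts, dollars_per_kwh=0.10):
    """Yearly electricity cost in dollars for a constant load of `watts`."""
    kwh_per_year = watts * HOURS_PER_YEAR / 1000
    return kwh_per_year * dollars_per_kwh


def provisioning_cost(watts, dollars_per_megawatt):
    """Datacenter build cost in dollars attributable to `watts` of capacity."""
    return watts * dollars_per_megawatt / 1_000_000


if __name__ == "__main__":
    print(f"1 W for a year: ${annual_energy_cost(1):.2f}")
    low = provisioning_cost(1, 10_000_000)
    high = provisioning_cost(1, 20_000_000)
    print(f"1 W of provisioned capacity: ${low:.0f} to ${high:.0f}")
```

One watt comes to about $0.88 per year at 10 cents per kWh, in line with the slide's round figure of $1. At $10M to $20M per megawatt, each provisioned watt also carries $10 to $20 of build cost, which is why unused capacity is expensive.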


Qualcomm Centriq 2400: Built for The Cloud

[Block diagram: 24 Falkor duplexes on a coherent segmented ring interconnect, with 12 L3 blocks, 6 memory controllers (MC) each attached to a DDR channel, 8-lane SerDes blocks, PCIe, SATA, USB, and other I/O and system blocks]

• 48 custom Armv8 cores at 2.6 GHz peak frequency
• Large 60 MB L3 cache
• 6 DDR4 memory channels at 2667 MT/s
• High bandwidth coherent ring
• Low average power under typical load
• Ultra low idle power
• Cache Quality of Service
• Inline memory bandwidth compression
• Security rooted in hardware
• Leading performance and energy efficiency

Details at https://www.qualcomm.com/products/qualcomm-centriq-2400-processor


Qualcomm Centriq 2400 Drives Perf/W and Perf/Thread Leadership

SKU             Cores   TDP     SIR2006   Price     Note
QDF 2460        48      120 W   657       $1,995
Gold 6138       20      125 W   504       $2,612    IsoPower
Platinum 8170   26      165 W   653       $7,405    IsoPerf
Platinum 8180   28      205 W   775       $10,009   Top Bin E7 Price
Platinum 8160   24      150 W   612       $4,702    Top Bin E5 Price SKU

Relative to QDF 2460 (= 1):

Metric            QDF 2460   Platinum 8180   Gold 6138   Platinum 8160   Platinum 8170
Power             1          1.71            1.04        1.25            1.38
SPECintrate2006   1          1.18            0.77        0.93            0.99
Perf/Watt         1          0.69            0.74        0.75            0.72
Perf/Core         1          2.02            1.84        1.86            1.70
Perf/Thread       1          1.01            0.92        0.93            0.85
Perf/$            1          0.24            0.59        0.40            0.27

Performance based on internal tests for SPECintrate2006 (SIR) estimates using gcc O2
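The relative Power, SPECintrate2006, Perf/Watt, and Perf/$ bars can be reproduced from the per-SKU figures quoted on the slide. The sketch below is illustrative: the helper name is hypothetical, and the pairing of each SKU name with its TDP, score, and price is an assumption inferred from the power ratios in the chart.

```python
# (TDP watts, SIR2006 estimate, price in USD) as quoted on the slide.
# Assumption: the SKU-to-figure pairing is inferred, not stated in the source.
SKUS = {
    "QDF 2460": (120, 657, 1995),
    "Platinum 8180": (205, 775, 10009),
    "Gold 6138": (125, 504, 2612),
    "Platinum 8160": (150, 612, 4702),
    "Platinum 8170": (165, 653, 7405),
}


def relative_metrics(name, baseline="QDF 2460"):
    """Metrics for `name`, normalized so that the baseline SKU equals 1."""
    watts, perf, price = SKUS[name]
    base_watts, base_perf, base_price = SKUS[baseline]
    return {
        "Power": watts / base_watts,
        "SPECintrate2006": perf / base_perf,
        "Perf/Watt": (perf / watts) / (base_perf / base_watts),
        "Perf/$": (perf / price) / (base_perf / base_price),
    }


if __name__ == "__main__":
    for sku in SKUS:
        row = relative_metrics(sku)
        cells = "  ".join(f"{key}={value:.2f}" for key, value in row.items())
        print(f"{sku:14s} {cells}")
```

For example, the 205 W part delivers about 1.18x the throughput at about 1.71x the power, which is where the 0.69 Perf/Watt figure comes from.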


Qualcomm Centriq 2460 Lowers Average and Idle Power to Improve Cloud Server Density in Datacenters

[Chart: average power (Watts, 0 to 120) for each SPECint®_rate2006 subtest: 400.perlbench, 401.bzip2, 403.gcc, 429.mcf, 445.gobmk, 456.hmmer, 462.libquantum, 464.h264ref, 471.omnetpp, 473.astar, 458.sjeng, 483.xalancbmk. Annotations: 120W TDP; Median = 65W; 8W idle power]


Many Questions to Ponder?

• Are we really serious about energy efficiency?
• What should the Cost and Power constraints be?
• How many instruction sets is too many?
  – X86, ARM, MIPS, Power, RISC V
• Have we reached the limit of high core count? SW Scalability?
• Do we need to improve single thread general purpose performance?
• What should the power limit be for a single socket?
• How much performance are we willing to sacrifice for better security?
• Is there a fundamental conflict between multi-tenancy and security?
• Cost and convenience vs extreme security?
• When does device scaling end? Will there be a sub pico nm era?


Heterogeneous Computing Era


Lessons from Mobile Computing

• Energy efficiency must be an implicit design target
• Desktop PC CPU cores are too power hungry and not energy efficient
• Wimpy cores are not good enough for servers
• Servers can be designed by scaling up energy efficient mobile core design philosophy
• Many workloads run best on different kinds of specialized processing engines
• Each processing engine has its own strengths


The Age of Application Specific Accelerators

• Order of Magnitude higher computational efficiency than general purpose processors
• Can accept inefficient implementation to reduce time to market
• Many potential applications
  – Machine Learning
  – Encryption
  – Data Compression
  – Video processing
• Need reasonable volume for business case
• Algorithms need to be stable
• Can they be programmable? Where do FPGAs fit?


The Emergence of Deep Neural Networks

Deep Neural Networks are becoming Pervasive

Before the emergence of DNNs
• Algorithms and rule based systems were laboriously hand-coded

But by 2012, the ingredients for change were available
• Sufficiently powerful GPUs
• Readily available large data sets on the internet

The turning point - ImageNet Competition 2012
• “ImageNet Classification with Deep Convolutional Neural Networks”, Neural Information Processing Systems Conference (NIPS 2012)
• Deep Neural Net enabled a performance breakthrough

Now - DNNs are simpler to develop and deploy, ushering in radical change in many fields and entire industries


Deep Learning is Growing Exponentially

Source: Google



Devices, machines, and things are becoming more intelligent


Offering new capabilities to enrich our lives

• Perception: hear, see, monitor, observe
• Reasoning: learn, infer context, anticipate
• Action: act intuitively, interact naturally, protect privacy


Where does compute need to be and why?

[Diagram: Devices, Edge Cloud, Central Cloud]

• Bandwidth / Backhaul traffic
• Compute Resources
• Power/Thermal Envelope
• Privacy & Security
• Latency
• Reliability


What is “Edge”?

Customer devices
◦ Smartphones, connected cars, drones, IoT sensors/devices
◦ < 2 ms latency; millions of devices

Customer premises
◦ Enterprises, homes, stadiums, cars
◦ < 5 ms latency; 1000s of devices

Cloudlets / edge nodes / edge gateways
◦ 5-20ms latency
◦ Optionally co-located with access networks
◦ Few server racks per site

Centralized clouds
◦ > 100 ms latency
◦ 5-100 per operator or cloud service provider
◦ 100s-1000s of server racks per site

EDGE
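One way to read the latency figures for these tiers is as a placement rule: run a workload in the most centralized tier that still meets its latency budget. The sketch below is a simplification under stated assumptions: the function name is hypothetical, each tier is reduced to a single latency figure taken from the slide (100 ms is used as the floor for centralized clouds), and latency is the only criterion considered.

```python
# (tier name, latency figure in ms), ordered from most to least centralized.
# Assumption: each tier is reduced to the single latency figure on the slide.
TIERS = [
    ("Centralized clouds", 100.0),
    ("Cloudlets / edge nodes / edge gateways", 20.0),
    ("Customer premises", 5.0),
    ("Customer devices", 2.0),
]


def most_central_tier(latency_budget_ms):
    """Most centralized tier whose latency figure fits the budget, else None."""
    for name, latency_ms in TIERS:
        if latency_ms <= latency_budget_ms:
            return name
    return None


if __name__ == "__main__":
    for budget in (150, 50, 10, 3, 1):
        print(f"{budget:>4} ms budget -> {most_central_tier(budget)}")
```

A real placement decision would also weigh bandwidth, power and thermal envelope, privacy, and reliability, which this sketch leaves out.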


AI is Increasingly Everywhere

Server/Cloud
• Training
• Execution/Inference

Devices
• Execution/Inference

Inference: on device, on the edge cloud, or centralized cloud depending on use case characteristics (latency, bandwidth, context)


CPU
• Free cycles available
• ISA enhancements
• Complementary with other accelerators

GPU
• Over-design (cost, power) for AI

FPGA
• Offers flexibility
• Typically hard to program & expensive

ASIC
• Purpose-built
• Energy and cost efficient
• Expensive to design
• Least flexible


Training tends toward concentrated, centralized computation

Inference tends toward wide distribution

[Diagram labels: GPUs, Large DPU; CPUs, Small DPU; CPUs, Small DPU, Low cost; GPUs, Large DPU, Higher Cost]


Thoughts on Future Silicon for Deep Learning

• CPUs are not powerful enough for training, but have free cycles available for inference – opportunity for add-in accelerator cards
  – Instruction Set enhancements can improve performance
• GPUs have too much “extra baggage” that adds cost and power for features not needed for AI – opportunity for domain specific accelerators
• FPGAs offer more flexibility, but are difficult to program and expensive
• ASICs are energy and product cost efficient, but less flexible

Deep neural networks are making significant strides in many areas: speech, vision, language, search, robotics, medical imaging & treatment, drug discovery …

We have an opportunity to dramatically reshape our computing devices to better serve this emerging and growing market

Expect to see lots of innovation and excitement in the years to come


Concluding Remarks

• Single thread general purpose performance improvement is slowing down
• Energy efficiency is extremely important in datacenters
• ARM architecture enables energy efficient designs with good performance
• Typical-use efficiency is becoming more important than peak output efficiency in enterprise data centers
• Idle mode power will become more important for servers
• Smart power management can dynamically optimize server operation to improve efficiency in normal use
• Security improvements are needed even if they cost performance
• There is plenty of opportunity for innovation on new application specific architectures targeted for specific workloads

Speculation Can Lead to a Meltdown!


For more information, visit us at: www.qualcomm.com & www.qualcomm.com/blog

Nothing in these materials is an offer to sell any of the components or devices referenced herein.

©2018 Qualcomm Technologies, Inc. and/or its affiliated companies. All Rights Reserved.

Qualcomm is a trademark of Qualcomm Incorporated, registered in the United States and other countries. Qualcomm Centriq and Falkor are trademarks of Qualcomm Incorporated. Other products and brand names may be trademarks or registered trademarks of their respective owners.

References in this presentation to “Qualcomm” may mean Qualcomm Incorporated, Qualcomm Technologies, Inc., and/or other subsidiaries or business units within the Qualcomm corporate structure, as applicable.

Qualcomm Incorporated includes Qualcomm’s licensing business, QTL, and the vast majority of its patent portfolio. Qualcomm Technologies, Inc., a wholly-owned subsidiary of Qualcomm Incorporated, operates, along with its subsidiaries, substantially all of Qualcomm’s engineering, research and development functions, and substantially all of its product and services businesses, including its semiconductor business, QCT.

Thank you