Accelerating algorithmic and hardware advancements for …...2 By 2025, the data center sector could...
Transcript of Accelerating algorithmic and hardware advancements for …...2 By 2025, the data center sector could...
![Page 1: Accelerating algorithmic and hardware advancements for …...2 By 2025, the data center sector could be using 20% of all available electricity in the world1 A cloud provider used the](https://reader034.fdocuments.in/reader034/viewer/2022050515/5f9f7c176e23a03d03154dbe/html5/thumbnails/1.jpg)
Accelerating algorithmic and hardware advancements for power efficient on-device AI
Qualcomm Technologies, Inc.
@qualcomm_techJuly 2018
![Page 2: Accelerating algorithmic and hardware advancements for …...2 By 2025, the data center sector could be using 20% of all available electricity in the world1 A cloud provider used the](https://reader034.fdocuments.in/reader034/viewer/2022050515/5f9f7c176e23a03d03154dbe/html5/thumbnails/2.jpg)
2
By 2025, the data center sector could be using 20% of all available electricity in the world1
A cloud provider used the equivalent energy consumption of ~366,000 US households in 20142
Bitcoin mining in 2017 used the same energy as did all of Ireland3
Computers are consuming an increasing amount of energy
1. Andrae, Anders (2017) Total Consumer PowerConsumption Forecast; 2. The Verge (2014); 3.The Guardian (2017)
Given the economic potential of AI, these numbers will only be increasing
![Page 3: Accelerating algorithmic and hardware advancements for …...2 By 2025, the data center sector could be using 20% of all available electricity in the world1 A cloud provider used the](https://reader034.fdocuments.in/reader034/viewer/2022050515/5f9f7c176e23a03d03154dbe/html5/thumbnails/3.jpg)
33
Will we have reached the capacity of the human brain?
Energy efficiency of a brain is 100x better than current hardware
Source: Welling
2025
En
erg
y c
on
su
mp
tio
n
1940 1950 1960 1970 1980 1990 2000 2010 2020 2030
1943: First NN (+/- N=10)
1988: NetTalk
(+/- N=20K)
2009: Hinton’s Deep
Belief Net (+/- N=10M)
2013: Google/Y!
(N=+/- 1B)
2025:
N = 100T = 1014
2017: Extremely large
neural networks (N = 137B)
1012
1010
108
106
1014
104
102
100
Deep neural networks are energy hungry and growing fastAI is being powered by the explosive growth of deep neural networks
![Page 4: Accelerating algorithmic and hardware advancements for …...2 By 2025, the data center sector could be using 20% of all available electricity in the world1 A cloud provider used the](https://reader034.fdocuments.in/reader034/viewer/2022050515/5f9f7c176e23a03d03154dbe/html5/thumbnails/4.jpg)
44
Value created by AI must exceed the cost to run the service
Broad economic viability requires energy efficient AI
Economic feasibility per transaction may require cost as low as a micro-dollar (1/10,000th of a cent)
• Personalized advertisements and recommendations
• Smart security monitoring based on image and sound recognition
• Efficiency improvements for smart cities and factories
![Page 5: Accelerating algorithmic and hardware advancements for …...2 By 2025, the data center sector could be using 20% of all available electricity in the world1 A cloud provider used the](https://reader034.fdocuments.in/reader034/viewer/2022050515/5f9f7c176e23a03d03154dbe/html5/thumbnails/5.jpg)
5
The AI power and thermal ceiling
The challenge of AI workloads
Very compute intensive
Complex concurrencies
Real-time
Always-on
Constrained mobile environment
Must be thermally efficient for sleek, ultra-light designs
Requires long battery life for all-day use
Storage/memory bandwidth limitations
![Page 6: Accelerating algorithmic and hardware advancements for …...2 By 2025, the data center sector could be using 20% of all available electricity in the world1 A cloud provider used the](https://reader034.fdocuments.in/reader034/viewer/2022050515/5f9f7c176e23a03d03154dbe/html5/thumbnails/6.jpg)
66
Soon, AI algorithms will be
measured by the amount
of intelligence they provide
per watt hour.
![Page 7: Accelerating algorithmic and hardware advancements for …...2 By 2025, the data center sector could be using 20% of all available electricity in the world1 A cloud provider used the](https://reader034.fdocuments.in/reader034/viewer/2022050515/5f9f7c176e23a03d03154dbe/html5/thumbnails/7.jpg)
77
Weight parameter
Activation node
Edges
Parts Objects
OutputPixels
Dog?
Cat?
Deep learningThe good
Convolutional
neural networks
(CNNs) have been
very successful
• Extract learnable features with state-of-the art results
• Encode location invariance, namely that the same object may appear anywhere inthe image
• Share parameters, making them “data efficient”
• Execute quickly on modern hardware with parallel processing
![Page 8: Accelerating algorithmic and hardware advancements for …...2 By 2025, the data center sector could be using 20% of all available electricity in the world1 A cloud provider used the](https://reader034.fdocuments.in/reader034/viewer/2022050515/5f9f7c176e23a03d03154dbe/html5/thumbnails/8.jpg)
88
Weight parameter
Activation node
Edges
Parts Objects
OutputPixels
Dog?
Cat?
Deep learningThe bad and the ugly
CNNs use too much
memory, compute,
and energy (today)
• CNNs do not encode additional symmetries, such as rotation invariance (object may appear in any orientation1)
• CNNs do not reliably quantify the confidence in a prediction
• CNNs are easy to fool by changing the input only slightly, such as adversarial examples
1. Cohen & Welling 2016
![Page 9: Accelerating algorithmic and hardware advancements for …...2 By 2025, the data center sector could be using 20% of all available electricity in the world1 A cloud provider used the](https://reader034.fdocuments.in/reader034/viewer/2022050515/5f9f7c176e23a03d03154dbe/html5/thumbnails/9.jpg)
99
Edges
Parts Objects
OutputPixels
Dog?
Cat?
Bayesian deep-learningaddresses these challenges
Inspired by brain functionality, introducing noise to neural networks is beneficial
Noise can be a good thing for AI
Compressionand quantization
• Reduce complexityof the neural network model
• Reduce bit-width of the parameters and activations
• Save power andimprove efficiency
Introduce noise to weights Noise propagates to activations
![Page 10: Accelerating algorithmic and hardware advancements for …...2 By 2025, the data center sector could be using 20% of all available electricity in the world1 A cloud provider used the](https://reader034.fdocuments.in/reader034/viewer/2022050515/5f9f7c176e23a03d03154dbe/html5/thumbnails/10.jpg)
1010
Edges
Parts Objects
OutputPixels
Dog
Cat
w1
w2 (Y)
w2
w1 Initial weight values Apply Bayesiandeep-learning to shrink the model
Compression
and quantization
• Quantize weights:Use lower precision (bit-width)
• Prune activations:Reduce number of activation nodes
![Page 11: Accelerating algorithmic and hardware advancements for …...2 By 2025, the data center sector could be using 20% of all available electricity in the world1 A cloud provider used the](https://reader034.fdocuments.in/reader034/viewer/2022050515/5f9f7c176e23a03d03154dbe/html5/thumbnails/11.jpg)
1111
Edges
Parts Objects
OutputPixels
Introduce noise to parameters
Dog
Cat
w1
w2 (Y)
Prior P(W)—distribution before data (X)
w2
w1 Initial weight values Apply Bayesiandeep-learning to shrink the model
Compression
and quantization
• Quantize weights:Use lower precision (bit-width)
• Prune activations:Reduce number of activation nodes
![Page 12: Accelerating algorithmic and hardware advancements for …...2 By 2025, the data center sector could be using 20% of all available electricity in the world1 A cloud provider used the](https://reader034.fdocuments.in/reader034/viewer/2022050515/5f9f7c176e23a03d03154dbe/html5/thumbnails/12.jpg)
1212
Edges
Parts Objects
OutputPixels
Introduce noise to parameters
Add data (X)
Dog
Cat
w1
w2 (Y)
Prior P(W)—distribution before data (X)
w2
Posterior P(W | X,Y) —distribution after data
w1 Initial weight values Apply Bayesiandeep-learning to shrink the model
Compression
and quantization
• Quantize weights:Use lower precision (bit-width)
• Prune activations:Reduce number of activation nodes
![Page 13: Accelerating algorithmic and hardware advancements for …...2 By 2025, the data center sector could be using 20% of all available electricity in the world1 A cloud provider used the](https://reader034.fdocuments.in/reader034/viewer/2022050515/5f9f7c176e23a03d03154dbe/html5/thumbnails/13.jpg)
1313
Edges
Parts Objects
OutputPixels
Introduce noise to parameters
Add data (X)
Dog
Cat
w1
w2
Prune activations(similar method as
quantization)
X
Quantize weights(or even remove)
(Y)
Prior P(W)—distribution before data (X)
w2
Posterior P(W | X,Y) —distribution after data
w1 is pinned down
w2 is still highly uncertain
w1 Initial weight values Apply Bayesiandeep-learning to shrink the model
Compression
and quantization
• Quantize weights:Use lower precision (bit-width)
• Prune activations:Reduce number of activation nodes
![Page 14: Accelerating algorithmic and hardware advancements for …...2 By 2025, the data center sector could be using 20% of all available electricity in the world1 A cloud provider used the](https://reader034.fdocuments.in/reader034/viewer/2022050515/5f9f7c176e23a03d03154dbe/html5/thumbnails/14.jpg)
14
Bayesian deep learning provides broad benefitsA powerful tool to address a variety of deep learning challenges
Compression and quantization
Quantize parameters and
activations, prune model
components
Regularization and generalization
Avoid overfitting data; choose
the simplest model to explain
observations (Occam’s razor)
Confidence estimation
Generate the confidence
intervals of the predictions
Privacy/adversarial robustness
Avoid storing personal
information in parameters, be
less sensitive to adversarial
attacks
![Page 15: Accelerating algorithmic and hardware advancements for …...2 By 2025, the data center sector could be using 20% of all available electricity in the world1 A cloud provider used the](https://reader034.fdocuments.in/reader034/viewer/2022050515/5f9f7c176e23a03d03154dbe/html5/thumbnails/15.jpg)
1515
83%
84%
85%
86%
87%
88%
89%
90%
1 1.5 2 2.5 3 3.5
To
p 5
Accu
racy
Compression ratio
ResNet-18 on a large set of labeled images
Baseline Channel pruning SVD Bayesian pruning
Applying Bayesian deep learning to real use casesImage classification
Zhang, Zou, He, & Sun 2015 (SVD);
He, Zhang, & Sun 2017 (channel pruning)
compression ratio while maintaining close to the same accuracy3X
![Page 16: Accelerating algorithmic and hardware advancements for …...2 By 2025, the data center sector could be using 20% of all available electricity in the world1 A cloud provider used the](https://reader034.fdocuments.in/reader034/viewer/2022050515/5f9f7c176e23a03d03154dbe/html5/thumbnails/16.jpg)
16
What does the future look like for AI hardware?
Distributed computing architectures
AI accelerationresearch
![Page 17: Accelerating algorithmic and hardware advancements for …...2 By 2025, the data center sector could be using 20% of all available electricity in the world1 A cloud provider used the](https://reader034.fdocuments.in/reader034/viewer/2022050515/5f9f7c176e23a03d03154dbe/html5/thumbnails/17.jpg)
1717
AI acceleration research
Balance AI acceleration capabilities across engines (CPU, GPU, DSP)
Focused on fundamentals to accelerate deep learning workloads at low power
Memory hierarchyOptimize the memory hierarchy to reduce
the power consumption of data movement
while ensuring performance
Compute architectureOptimize instruction type, parallelism, and precision of the functional units
UtilizationExploit sparsity to improve utilization and reduce power consumption
![Page 18: Accelerating algorithmic and hardware advancements for …...2 By 2025, the data center sector could be using 20% of all available electricity in the world1 A cloud provider used the](https://reader034.fdocuments.in/reader034/viewer/2022050515/5f9f7c176e23a03d03154dbe/html5/thumbnails/18.jpg)
18
AlgorithmicadvancementsNeural network algorithm design
optimized for hardware
Optimization for space and
runtime efficiency
EfficienthardwareEfficient architecture design
Selecting the right compute
engine for the right task
SoftwaretoolsSoftware-accelerated
run-time for deep learning
Neural processing SDK for
model compression and
optimization
The approach for makingpower efficient on-device AI
Focusing on the joint
design of algorithms and
hardware to achieve
high performance
![Page 19: Accelerating algorithmic and hardware advancements for …...2 By 2025, the data center sector could be using 20% of all available electricity in the world1 A cloud provider used the](https://reader034.fdocuments.in/reader034/viewer/2022050515/5f9f7c176e23a03d03154dbe/html5/thumbnails/19.jpg)
19
AI algorithms and hardware
need to be energy efficient
We are a leader in Bayesian
deep learning and its
applications to model
compression and quantization
We are doing fundamental
research on AI algorithms,
software, and hardware to
maximize power efficiency
![Page 20: Accelerating algorithmic and hardware advancements for …...2 By 2025, the data center sector could be using 20% of all available electricity in the world1 A cloud provider used the](https://reader034.fdocuments.in/reader034/viewer/2022050515/5f9f7c176e23a03d03154dbe/html5/thumbnails/20.jpg)
Follow us on:
For more information, visit us at:
www.qualcomm.com & www.qualcomm.com/blog
Thank you
Nothing in these materials is an offer to sell any of the
components or devices referenced herein.
©2018 Qualcomm Technologies, Inc. and/or its affiliated
companies. All Rights Reserved.
Qualcomm is a trademark of Qualcomm Incorporated,
registered in the United States and other countries. Other
products and brand names may be trademarks or registered
trademarks of their respective owners.
References in this presentation to “Qualcomm” may mean Qualcomm
Incorporated, Qualcomm Technologies, Inc., and/or other subsidiaries
or business units within the Qualcomm corporate structure, as
applicable. Qualcomm Incorporated includes Qualcomm’s licensing
business, QTL, and the vast majority of its patent portfolio. Qualcomm
Technologies, Inc., a wholly-owned subsidiary of Qualcomm
Incorporated, operates, along with its subsidiaries, substantially all of
Qualcomm’s engineering, research and development functions, and
substantially all of its product and services businesses, including its
semiconductor business, QCT.