Transcript: What is the State of Neural Network Pruning?
What is the State of Neural Network Pruning?
Davis Blalock*, Jose Javier Gonzalez*, Jonathan Frankle, John V. Guttag
*equal contribution
Blalock&Gonzalez 2
Overview
• Meta-analysis of neural network pruning: we aggregated results across 81 pruning papers and pruned hundreds of networks in controlled conditions
• Some surprising findings…
• ShrinkBench: open-source library to facilitate development and standardized evaluation of neural network pruning methods
Part 0: Background
Neural Network Pruning
• Neural networks are often accurate but large
• Pruning: systematically removing parameters from a network
Typical Pruning Pipeline
Data + Model → Pruning Algorithm → Finetuning → Evaluation

Many design choices:
• Scoring importance of parameters
• Schedule of pruning, training / finetuning
• Structure of induced sparsity
• Finetuning details: optimizer, duration, hyperparameters
Evaluating Neural Network Pruning
• Goal: increase efficiency of the network as much as possible with minimal drop in quality
• Metrics:
  • Quality = accuracy
  • Efficiency = FLOPs, compression, latency…
• Must use comparable tradeoffs

[Plot: Accuracy of Pruned Network]
Part 1: Meta-Analysis
Overview of Meta-Analysis
• We aggregated results across 81 pruning papers
• Mostly published in top venues
• Corpus is closed under experimental comparison (any paper a corpus paper compares against is also in the corpus)

Venue        # of Papers
arXiv only   22
NeurIPS      16
ICLR         11
CVPR         9
ICML         4
ECCV         4
BMVC         3
IEEE Access  2
Other        10
Robust Findings
• Pruning works
  • Almost any heuristic improves efficiency with little performance drop
  • Many methods are better than random pruning
• Don't prune all layers uniformly
• For a fixed number of parameters, sparse (pruned) models outperform dense ones
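The "don't prune all layers uniformly" finding can be illustrated with a toy sketch (the layer names, sizes, and weight scales here are made up for illustration, not from the talk): a single global magnitude threshold prunes a layer with small-magnitude weights far more aggressively than a layer with large-magnitude weights, so per-layer sparsity ends up highly non-uniform.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy layers with different weight scales (hypothetical)
layers = {"conv1": rng.normal(0.0, 1.0, 1000), "fc": rng.normal(0.0, 0.1, 1000)}

# Global magnitude pruning: one threshold across all layers' weights
all_magnitudes = np.concatenate([np.abs(w) for w in layers.values()])
threshold = np.quantile(all_magnitudes, 0.80)  # prune 80% of weights globally

# Fraction of each layer's weights that survive the global threshold
kept = {name: float(np.mean(np.abs(w) > threshold)) for name, w in layers.items()}
print(kept)  # the small-scale "fc" layer is pruned far more heavily than "conv1"
```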
Better Pruning vs Better Architecture
Ideal Results Over Time
(Dataset, Architecture, X metric, Y metric, Hyperparameters) → Curve

[Plot: accuracy vs. Compression Ratio, with curves colored by year, 2015–2019]
Ideal Results Over Time
[Plots: VGG-16, AlexNet, and ResNet-50 on ImageNet; accuracy vs. Compression Ratio and vs. Theoretical Speedup, colored by year, 2015–2019]
Actual Results Over Time
[Plots: VGG-16, AlexNet, and ResNet-50 on ImageNet; accuracy vs. Compression Ratio and vs. Theoretical Speedup, colored by year, 2015–2019]
Quantifying the Problem
• Among 81 papers:
  • 49 datasets
  • 132 architectures
  • 195 (dataset, architecture) pairs
Dataset    Architecture   # of Papers Using Pair
ImageNet   VGG-16         22
CIFAR-10   ResNet-56      14
ImageNet   ResNet-50      14
ImageNet   CaffeNet       11
ImageNet   AlexNet        9
CIFAR-10   CIFAR-VGG      8
ImageNet   ResNet-34      6
ImageNet   ResNet-18      6
CIFAR-10   ResNet-110     5
CIFAR-10   PreResNet-164  4
CIFAR-10   ResNet-32      4

All (dataset, architecture) pairs used in at least 4 papers
• Vicious cycle: extreme burden to compare to existing methods
Dearth of Reported Comparisons
• Presence of comparisons:
  • Most papers compare to at most 1 other method
  • 40% of papers have never been compared to
  • Pre-2010s methods almost completely ignored
• Reinventing the wheel:
  • Magnitude-based pruning: Janowsky (1989)
  • Gradient times magnitude: Mozer & Smolensky (1989)
  • “Reviving” pruned weights: Tresp et al. (1997)
Pop quiz!
• Alice’s network has 10 million parameters. She prunes 8 million of them. What compression ratio might she report in her paper?
  A. 80%
  B. 20%
  C. 5x
  D. No reported compression ratio
Pop quiz!
• Alice’s network has 10 million parameters. She prunes 8 million of them. What compression ratio might she report in her paper?
  A. 80%
  B. 20%
  C. 5x
  D. No reported compression ratio
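The ambiguity the quiz pokes at can be made concrete: the same result can legitimately be stated under three different reporting conventions. A quick sketch (the function name is hypothetical, for illustration only):

```python
def reporting_conventions(total_params, pruned_params):
    """The same pruning result, under three conventions seen in the literature."""
    remaining = total_params - pruned_params
    return {
        "fraction pruned": pruned_params / total_params,   # "80%"
        "fraction remaining": remaining / total_params,    # "20%"
        "compression ratio": total_params / remaining,     # "5x"
    }

# Alice's network: 10 million parameters, 8 million pruned
print(reporting_conventions(10_000_000, 8_000_000))
# {'fraction pruned': 0.8, 'fraction remaining': 0.2, 'compression ratio': 5.0}
```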
Pop quiz!
• According to the literature, how many FLOPs does it take to run inference using AlexNet on ImageNet?
  A. 371 million
  B. 500 million
  C. 724 million
  D. 1.5 billion
Pop quiz!
• According to the literature, how many FLOPs does it take to run inference using AlexNet on ImageNet?
  A. 371 million
  B. 500 million
  C. 724 million
  D. 1.5 billion
Part 2: ShrinkBench
Why ShrinkBench?
• Want to hold everything but the pruning algorithm constant
• Improves rigor and reduces development time
Data + Model → Pruning Algorithm → Finetuning → Evaluation
Potential confounding factors: every stage other than the pruning algorithm
Masking API
[Diagram: Model (+ Data) → pruning method returns binary masks over each weight tensor → masked weights → Accuracy Curve]
• Lets the algorithm return arbitrary masks for weight tensors
• Standardizes all other aspects of training and evaluation
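The masking idea can be mimicked in a few lines of numpy. This is an illustrative sketch, not ShrinkBench's actual API: the pruning method scores weights and returns a binary mask per tensor, and the framework applies the mask elementwise so all other training details stay standardized.

```python
import numpy as np

def magnitude_mask(weights, fraction):
    """Binary mask that prunes the lowest-magnitude `fraction` of entries."""
    k = int(round(fraction * weights.size))      # number of weights to zero out
    if k == 0:
        return np.ones_like(weights, dtype=np.uint8)
    cutoff = np.sort(np.abs(weights), axis=None)[k - 1]
    return (np.abs(weights) > cutoff).astype(np.uint8)

# Toy weight tensor
w = np.array([[-2.1, 4.6, 0.8, -0.1],
              [ 0.2, 1.5, -4.9, 2.3]])
mask = magnitude_mask(w, fraction=0.5)
masked = w * mask    # pruned weights are held at zero during finetuning
print(mask)
```

Returning a mask rather than a modified model keeps the pruning method's contract minimal: everything else (data, initial weights, finetuning) lives in the benchmark.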
Crucial to Vary Amount of Pruning & Architecture
[Plots: CIFAR-VGG and ResNet-56]
Compression and Speedup are not Interchangeable
[Plot: ResNet-18 on ImageNet]
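The point can be seen with a toy calculation (the layer profile below is made up for illustration, not the talk's data): parameters and FLOPs are distributed differently across layers, so pruning a parameter-heavy but FLOP-light layer yields high compression with almost no theoretical speedup.

```python
# Hypothetical layer profile: conv layers have few parameters but many FLOPs;
# fully connected layers are the reverse.
layers = {
    "conv": {"params": 1_000_000, "flops": 2_000_000_000},
    "fc":   {"params": 50_000_000, "flops": 50_000_000},
}

def compression_and_speedup(kept):
    """Compression ratio and theoretical speedup for per-layer kept fractions."""
    total_p = sum(l["params"] for l in layers.values())
    total_f = sum(l["flops"] for l in layers.values())
    kept_p = sum(l["params"] * kept[n] for n, l in layers.items())
    kept_f = sum(l["flops"] * kept[n] for n, l in layers.items())
    return total_p / kept_p, total_f / kept_f

# Pruning 90% of the fc layer: large compression, negligible speedup
compression, speedup = compression_and_speedup({"conv": 1.0, "fc": 0.1})
print(f"{compression:.1f}x compression, {speedup:.2f}x theoretical speedup")
```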
Using Identical Initial Weights Is Essential
[Plot: ResNet-56 on CIFAR-10]
Conclusion
• Pruning works
  • But not as well as improving the architecture
• But we have no idea which methods work best
  • Field suffers from extreme fragmentation in experimental setups
• We introduce a library/benchmark to address this
  • Faster progress in the future; interesting findings already

https://github.com/jjgo/shrinkbench
Questions?