Stratos Idreos - DASlab (daslab.seas.harvard.edu/classes/cs265/files/...)
Stratos Idreos
Talks at the lab this week:
11:30am today, Yifan Wu from Berkeley: Real-Time Interactive Data Analytic Interfaces
11:30am tomorrow, K. Karanasos from Microsoft (big data/ML)
Discussion papers as of Monday. Review submission every class.
Deep learning
Academic interest
Market share
Wealth of applications
Some input Some output
Image recognition: Cat, Dog, Table
Machine translation: Γατα (Greek) → Cat
Auto-complete: "What happens in Vegas…"
…
cat = f( image of a cat )
How do we do this mapping?
Perceptron: the data (after clean/transform) is multiplied by weights (e.g., 3.4), summed, and passed through an activation function
Multi-layer perceptron: “can represent a wide variety of interesting functions”
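To make that claim concrete, here is a two-layer perceptron computing XOR, a function no single perceptron can represent because its two classes are not linearly separable. The weights are hand-picked for this sketch, not learned.

```python
def step(z):
    """Hard-threshold activation."""
    return 1 if z > 0 else 0

def layer(inputs, weights, biases):
    """One fully connected layer: each row of `weights` is one perceptron."""
    return [step(sum(x * w for x, w in zip(inputs, row)) + b)
            for row, b in zip(weights, biases)]

def mlp_xor(x1, x2):
    # Hidden layer: unit 1 computes OR, unit 2 computes AND (hand-picked weights)
    h = layer([x1, x2], weights=[[1, 1], [1, 1]], biases=[-0.5, -1.5])
    # Output layer: "OR and not AND", which is XOR
    (out,) = layer(h, weights=[[1, -1]], biases=[-0.5])
    return out

for a in (0, 1):
    for b in (0, 1):
        print(a, b, mlp_xor(a, b))
```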
Deep neural networks
Data to features, then features to labels
Some weird neural network!
Multiple layers → Cat
Training phase: pass labeled data through the network until we reach an acceptable accuracy.
Inference: new data → result.
How do we train these networks?
Forward pass on labeled data to compute a prediction (e.g., "Fish")
Backward pass to 'slightly' nudge the weights
Repeat until happy/convergence
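The forward/backward/repeat loop can be sketched on the smallest possible network, a single sigmoid neuron trained with stochastic gradient descent on the log loss. The toy data set (logical AND), the learning rate and the epoch count are assumptions chosen so the example converges quickly; real DNN training backpropagates the error through many layers.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical labeled data: (features, label); label is 1 only when both features are 1
data = [([0.0, 0.0], 0), ([0.0, 1.0], 0), ([1.0, 0.0], 0), ([1.0, 1.0], 1)]

w, b, lr = [0.0, 0.0], 0.0, 0.5

for epoch in range(2000):                      # "repeat until happy/convergence"
    for x, y in data:
        # Forward pass: compute a prediction
        p = sigmoid(w[0] * x[0] + w[1] * x[1] + b)
        # Backward pass: for sigmoid + log loss, d(loss)/d(pre-activation) = p - y
        grad = p - y
        # Nudge the weights slightly against the gradient
        w = [wi - lr * grad * xi for wi, xi in zip(w, x)]
        b -= lr * grad

predictions = [round(sigmoid(w[0] * x[0] + w[1] * x[1] + b)) for x, _ in data]
print(predictions)
```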
Labeled data → PERFORMANCE (TRAINING/INFERENCE): READING/WRITING DATA + COMPUTATION
Labeled data → ACCURACY
Layers in state-of-the-art NN: 7 layers (2012), 19 layers (2014), 152 layers (2015), 264 layers (2017)
Deep Residual Learning for Image Recognition [He et al., 2016]
Single model: one DNN → Cat
Ensemble model: several DNNs whose predictions (Cat, Cat, Catastrophe, CAT) are combined
Why ensembles: they are representationally richer, and they benefit from the wisdom of the crowd.
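One common way to turn the members' outputs into a single answer is voting. The sketch shows hard (majority) voting and soft voting (averaging class probabilities); the class names and probabilities are hypothetical, and the slides do not say which combination rule is used here.

```python
from collections import Counter

CLASSES = ["cat", "dog", "fish"]

# Hypothetical class probabilities from three ensemble members for one image
member_probs = [
    [0.70, 0.20, 0.10],
    [0.40, 0.50, 0.10],
    [0.60, 0.10, 0.30],
]

def argmax(values):
    return max(range(len(values)), key=values.__getitem__)

def hard_vote(probs):
    """Each member votes for its top class; the most common vote wins."""
    votes = [CLASSES[argmax(p)] for p in probs]
    return Counter(votes).most_common(1)[0][0]

def soft_vote(probs):
    """Average the members' probabilities, then pick the top class."""
    avg = [sum(col) / len(probs) for col in zip(*probs)]
    return CLASSES[argmax(avg)]

print(hard_vote(member_probs), soft_vote(member_probs))
```

The second member alone would answer "dog", but the crowd overrules it under both rules.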
finding the first principles of neural networks
VERY EXPENSIVE TO TRAIN
How expensive is it?
Setup: 100 variants of VGG-16 (different structures) trained on CIFAR-10 with two training approaches, full-data and bagging.
[Figure: training time (hrs.) vs. number of neural networks (1 to 100), full-data vs. bagging]
Time to add one neural network (Bagging, Full-Data):
CIFAR-10: 35 min. and 27 min.
CIFAR-100 (complex data): 67 min. and 35 min.
CIFAR-10 | ResNet (complex model): 516 min. and 206 min.
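Because each added network costs roughly the same amount of training, total ensemble training time grows linearly with ensemble size. A back-of-the-envelope sketch, assuming a constant per-network cost and ignoring parallelism, using the two CIFAR-10 per-network times listed above:

```python
def ensemble_training_hours(n_networks, minutes_per_network):
    """Total training time (hours) when every network costs the same to add."""
    return n_networks * minutes_per_network / 60.0

for per_net in (35, 27):  # CIFAR-10 per-network times from the slide
    hours = ensemble_training_hours(100, per_net)
    print(f"{per_net} min/network -> {hours:.1f} hrs for 100 networks")
```

At 35 min. per network, 100 networks take roughly 58 hours; at 27 min., 45 hours.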
Ensemble size: NN ensembles vs. other models ensembles (~250 and ~10)
Complex problems need larger ensembles
MotherNets: Rapid Deep Ensemble Learning
It is all about data movement and computation
MotherNets
DNN architectures
MotherNets
i. Capture structural similarity
ii. Train once (the MotherNet)
iii. Transfer learned function (the expanded networks compute the same function as the MotherNet)
iv. Train incrementally
I. Capture structural similarity: find the largest network structure common amongst all networks
[Figure: Neural Network Specifications: layer sizes of NN1, NN2 and NN3 across layers L1 to L4, and the MotherNet derived from them]
The MotherNet is smaller than any of the ensemble networks.
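A minimal sketch of step I for plain fully connected networks, assuming layers are aligned by position: the shared structure keeps as many layers as the shallowest network and, at each position, the smallest width found in any network. The widths below are hypothetical (the slide's exact table is not recoverable from the text), and convolutional architectures would need a richer notion of alignment.

```python
def mothernet_spec(specs):
    """Largest layer-aligned structure shared by every network in a cluster.

    specs: per-network lists of layer widths (hypothetical representation).
    """
    depth = min(len(s) for s in specs)              # shallowest network
    return [min(s[i] for s in specs) for i in range(depth)]

# Hypothetical layer widths for three ensemble networks
nn1 = [20, 15, 10, 5]
nn2 = [13, 8, 18]
nn3 = [10, 22, 10, 12]

mother = mothernet_spec([nn1, nn2, nn3])
print(mother)
```

By construction every layer of the result fits inside the corresponding layer of each ensemble network, which is why the MotherNet ends up smaller than any of them.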
II. Transfer learned function
Function-preserving transformations: increase the capacity (expressivity) of the networks while preserving their function (and therefore their accuracy)
Two transformations: deepen the network, widen the network
Morph MotherNets to ensemble networks
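The slides do not spell out the transformations, so this sketch assumes Net2Net-style operations: widening duplicates a hidden unit and splits its outgoing weights in half, and deepening inserts an identity layer after a ReLU layer (ReLU of a non-negative vector is the vector itself). Both grow a tiny ReLU network while leaving its output unchanged up to floating-point rounding. The network sizes and random weights are made up, and the new parameters are left unperturbed here; Step 2 of the excerpt below additionally perturbs them to add diversity.

```python
import random

def relu(v):
    return [max(0.0, x) for x in v]

def dense(W, b, x):
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def forward(layers, x):
    """layers: list of (W, b); ReLU after every layer except the last."""
    for k, (W, b) in enumerate(layers):
        x = dense(W, b, x)
        if k < len(layers) - 1:
            x = relu(x)
    return x

def widen(layers, k, j):
    """Duplicate hidden unit j of layer k (k must not be the last layer)."""
    (W, b), (W_next, b_next) = layers[k], layers[k + 1]
    W, b = W + [list(W[j])], b + [b[j]]
    # The original and the copy each pass on half of unit j's outgoing weight
    W_next = [row[:j] + [row[j] / 2] + row[j + 1:] + [row[j] / 2] for row in W_next]
    return layers[:k] + [(W, b), (W_next, b_next)] + layers[k + 2:]

def deepen(layers, k):
    """Insert an identity layer after hidden layer k (k must not be the last layer)."""
    n = len(layers[k][1])
    identity = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
    return layers[:k + 1] + [(identity, [0.0] * n)] + layers[k + 1:]

random.seed(0)
net = [([[random.uniform(-1, 1) for _ in range(3)] for _ in range(4)], [0.1] * 4),
       ([[random.uniform(-1, 1) for _ in range(4)] for _ in range(2)], [0.0] * 2)]
x = [0.5, -0.2, 0.9]
bigger = deepen(widen(net, 0, 1), 0)   # 3 -> 5 -> 5 -> 2 instead of 3 -> 4 -> 2
print(forward(net, x))
print(forward(bigger, x))
```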
In step iv (train incrementally), the hatched networks are trained with full data or with bagging.
How does it behave?
Same setup: 100 variants of VGG-16 (different structures) on CIFAR-10; training approaches: full-data and bagging.
[Figure: training time (hrs.) vs. number of neural networks (1 to 100) for full-data, bagging and MotherNets]
Time to add one neural network:
CIFAR-10: MotherNets 7 min.; Bagging and Full-Data 35 min. and 27 min.
CIFAR-100 (complex data): MotherNets 14 min.; Bagging and Full-Data 67 min. and 35 min.
CIFAR-10 | ResNet (complex model): MotherNets 155 min.; Bagging and Full-Data 516 min. and 206 min.
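Dividing each baseline's per-network time by MotherNets' per-network time gives the speedup for adding one more network. The numbers are copied from the slide in the slide's order; the sketch compares only these per-network times and leaves out the one-off cost of training the MotherNet itself.

```python
# Minutes to add one network: (first baseline, second baseline, MotherNets)
TIMES = {
    "CIFAR-10": (35, 27, 7),
    "CIFAR-100": (67, 35, 14),
    "CIFAR-10 | ResNet": (516, 206, 155),
}

def speedups(table):
    """Ratio of each baseline's per-network time to MotherNets' per-network time."""
    return {setup: tuple(round(base / v[-1], 1) for base in v[:-1])
            for setup, v in table.items()}

for setup, ratios in speedups(TIMES).items():
    print(setup, ratios)
```

For CIFAR-10 this gives 5.0x and 3.9x; with the larger ResNet model the gap narrows to 3.3x and 1.3x.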
Accuracy vs Performance?
[Figure 3 panels: (a) Ensemble test error rate (%) of bagging, full-data, KD and MotherNets for the combination methods Vote, EA, SL and O; (b) Training time (min) of FD, Bag., KD and MNet for MN, V13, V16, V16A, V16B and V19; (c) Tradeoff: training time (min) vs. test error rate (%) for MotherNets and full-data; (d) Varying the number of clusters (1 to 5): error rate (%) and time (min)]
Figure 3: MotherNets achieves comparable individual and ensemble accuracy to the full-data approach but in a fraction of the training time, striking a better time-accuracy tradeoff.
networks in a function-preserving manner, we build a separate MotherNet for each class (or a set of MotherNets if each class also consists of networks of diverse sizes). This enables MotherNets to closely capture the architecture class of every neural network in the diverse ensemble.
2.4. Training
Step 1: Training the MotherNet. First, the MotherNet for every cluster is trained from scratch using the entire data set until convergence. This allows the MotherNet to learn a good core representation of the data. Since the MotherNet has fewer parameters than any of the ensemble networks in its cluster (by construction), it is expected to take less time to converge than any of the ensemble networks.
Step 2: Hatching Ensemble Networks. Once the MotherNet is trained, the next step is to generate every ensemble network by expanding the MotherNet through a sequence of function-preserving transformations. We call this process hatching. Hatching yields ensemble networks of possibly larger size (more parameters) than the MotherNet but representing the same function as learned by the MotherNet. The hatching process is instantaneous, as generating every ensemble network requires a single pass on the MotherNet. Every hatched ensemble network with size (number of parameters) larger than the MotherNet now has additional untrained parameters that we need to further train. To explicitly add diversity to the hatched networks, we randomly perturb the newly introduced parameters. This is a standard technique to create diversity when training ensemble networks.
Step 3: Training Ensemble Networks. The hatched ensemble networks are trained using bootstrap aggregation (bagging) (Breiman, 1996). Bagging is an effective and widely used method to create diversity and to reduce the variance of ensembles (Hansen & Salamon, 1990; Guzman-Rivera et al., 2014; Lee et al., 2015b). This is because different models in the ensemble are trained using different overlapping subsets of the data. However, recent work has shown that training neural network ensembles through bagging results in decreased generalization accuracy, as it reduces the number of unique data items seen by individual neural networks. Since neural networks have a large number of parameters, they are affected more by this reduction in unique data items than ensembles of other classifiers such as decision trees or SVMs (Domingos, 1999; Lee et al., 2015a). Hatching the networks from the MotherNet and then further training them using bagging overcomes this problem.
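The "fewer unique data items" effect of bagging is easy to quantify: a bootstrap sample of size n drawn with replacement contains on average a fraction 1 - (1 - 1/n)^n ≈ 1 - 1/e ≈ 63.2% of the distinct items. A quick simulation, with an arbitrary data set size and seed:

```python
import random

def bootstrap_sample(n, rng):
    """Draw n indices uniformly with replacement (one bagging training set)."""
    return [rng.randrange(n) for _ in range(n)]

rng = random.Random(42)
n = 50_000
sample = bootstrap_sample(n, rng)
unique_fraction = len(set(sample)) / n
print(f"unique items seen: {unique_fraction:.3f}")  # close to 1 - 1/e
```

Each bagged network therefore never sees roughly a third of the training items, which is the reduction the excerpt refers to.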
We explain in supplementary material Section B how MotherNets’ training process can be parallelized and how state-of-the-art general network training optimizations can be incorporated into MotherNets.
2.5. Optimizing for Inference
While our focus is on reducing training costs, the MotherNets approach can also help during inference. To optimize an ensemble trained through MotherNets for inference cost (time and memory requirement), we share the incremental training of MotherNet parameters. To achieve this, we construct a shared-MotherNet from the hatched networks and train the shared-MotherNet incrementally instead of the hatched networks. In shared-MotherNet, the hatched networks are combined together such that they have a single copy of MotherNet parameters, which is jointly trained instead of being trained independently by each ensemble network. This yields an ensemble with fewer parameters that is more efficient to infer from and more compact to deploy. We explain this approach to optimize ensemble inference, and provide experimental results, in supplementary material Section C.
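A rough parameter count shows why a shared-MotherNet is cheaper to store and deploy. The sketch assumes each hatched network's parameters can be counted as the MotherNet's parameters plus some extra ones, and that the shared version keeps the MotherNet part once; the function name and the sizes are hypothetical.

```python
def ensemble_param_counts(mothernet_params, extra_params):
    """Parameter totals: independently stored hatched networks vs. a shared MotherNet.

    mothernet_params: parameters in the MotherNet.
    extra_params: parameters each hatched network adds on top of the MotherNet.
    """
    independent = sum(mothernet_params + e for e in extra_params)
    shared = mothernet_params + sum(extra_params)
    return independent, shared

# Hypothetical sizes (millions of parameters) for a 5-network ensemble
independent, shared = ensemble_param_counts(10.0, [2.0, 3.0, 4.0, 5.0, 6.0])
print(independent, shared)
```

With these numbers the shared version stores 30M parameters instead of 70M; the savings grow with ensemble size because the MotherNet part is stored once rather than once per network.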
3. Experiments
We show how MotherNets can be used to accelerate the training of diverse ensembles of neural networks.
Training Setup. We adopt stochastic gradient descent with
Accuracy vs Performance?
Accelerating inference in MotherNets
Inference cost
Training: maintain common MotherNets parameters (MN), incremental training
Inference: full-pass, partial passes
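The slide's "full-pass" and "partial passes" suggest that shared computation is done once and reused by every ensemble member. The Python sketch below illustrates that idea under a loudly simplifying assumption: the shared MotherNet parameters are modeled as a common block of layers whose output every member reuses, and the toy functions stand in for real layers. The names `standard_inference` and `shared_inference` are hypothetical and not from the paper.

```python
from typing import Callable, List

Layer = Callable[[float], float]


def standard_inference(x: float, members: List[Layer]) -> float:
    """Baseline: every ensemble member runs its own full forward pass; outputs are averaged."""
    outputs = [member(x) for member in members]
    return sum(outputs) / len(outputs)


def shared_inference(x: float, shared: Layer, member_parts: List[Layer]) -> float:
    """Shared variant (simplified): one full pass through the shared layers,
    then only a partial pass through each member's own layers."""
    h = shared(x)  # computed once and reused by every member
    outputs = [part(h) for part in member_parts]
    return sum(outputs) / len(outputs)


if __name__ == "__main__":
    calls = {"shared": 0}

    def shared(x: float) -> float:
        calls["shared"] += 1  # count how often the shared layers run
        return 2.0 * x        # toy stand-in for the shared MotherNet layers

    # Toy stand-ins for five members' network-specific layers.
    member_parts = [lambda h, b=b: h + b for b in range(5)]
    # In a standard ensemble, each member re-runs the shared layers itself.
    members = [lambda x, p=p: p(shared(x)) for p in member_parts]

    print(standard_inference(1.0, members), calls["shared"])             # 4.0 5
    calls["shared"] = 0
    print(shared_inference(1.0, shared, member_parts), calls["shared"])  # 4.0 1
```

Both variants return the same averaged prediction, but the shared layers run once instead of once per member, which is where the inference-time saving would come from in this simplified model.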
Accelerating inference in MotherNets
Initial results
CIFAR-10, ensemble of 5 VGGNets (between 13 and 34 layers each)
Accelerating inference in MotherNets
Initial results (chart comparing Standard and Share-MN)
Inference time: 1.95 min., 3.71 min.
Test error rate: 7.9%, 7.5%
finding the first principles of neural networks
Stratos Idreos