NetTailor: Tuning the Architecture, Not Just the Weights (morgado/nettailor/poster.pdf · 2019)


Transcript of NetTailor: Tuning the Architecture, Not Just the Weights (morgado/nettailor/poster.pdf · 2019)

NetTailor: Tuning the Architecture, Not Just the Weights
Pedro Morgado, Nuno Vasconcelos
University of California, San Diego

Abstract

Real-world applications of object recognition often require the solution of multiple tasks in a single platform. We propose a transfer learning procedure, denoted NetTailor, in which layers of a pre-trained CNN are used as universal blocks that can be combined with small task-specific layers to generate new networks. In this way, NetTailor can adapt the network architecture, not just its weights, to the target task.

Experiments show that networks adapted to simpler tasks become significantly smaller than those adapted to complex tasks. Due to the modular nature of the procedure, this reduction is achieved without compromising either parameter sharing across tasks or classification accuracy.

Complexity Constraints

Conditions for removing block B_ij, and the probability of each:
• Self exclusion: the block's own coefficient α_ij is small.  P(R_self^ij) = 1 − α_ij
• Input exclusion: all incoming paths have been removed.  P(R_input^ij) = ∏_k (1 − α_{I_k}), over the incoming blocks I_k
• Output exclusion: all departing paths have been removed.  P(R_output^ij) = ∏_k (1 − α_{O_k}), over the departing blocks O_k

Expected complexity:
E[C] = Σ_{i,j} C_ij (1 − P(R_self^ij ∪ R_input^ij ∪ R_output^ij))
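To make the removal probabilities and the resulting penalty concrete, here is a minimal PyTorch sketch, not the released implementation: the (source, destination) indexing of blocks, the container names, and the independence assumption used to evaluate the union of the three events are illustrative choices.

```python
# Sketch of the expected-complexity penalty E[C]; the indexing scheme and the
# independence assumption for the union of removal events are assumptions,
# not the authors' implementation.
import torch

def expected_complexity(alphas, cost):
    """alphas: {(src, dst): attention coefficient (0-dim tensor in [0, 1])}
    cost:   {(src, dst): complexity C_ij of that block (e.g. its FLOPs)}."""

    def all_removed(paths):
        # Probability that every listed path is removed; with no candidate
        # paths the block cannot be excluded this way (probability 0).
        return torch.stack(paths).prod() if paths else torch.zeros(())

    total = torch.zeros(())
    for (src, dst), a in alphas.items():
        p_self = 1.0 - a                                    # self exclusion
        p_input = all_removed(                              # all incoming paths gone
            [1.0 - b for (s, d), b in alphas.items() if d == src])
        p_output = all_removed(                             # all departing paths gone
            [1.0 - b for (s, d), b in alphas.items() if s == dst])
        # P(kept) = 1 - P(R_self ∪ R_input ∪ R_output); treating the three
        # events as independent, this is the product of the complements.
        p_kept = (1.0 - p_self) * (1.0 - p_input) * (1.0 - p_output)
        total = total + cost[(src, dst)] * p_kept
    return total
```

In the first learning stage this scalar is added to the training loss with weight λ2.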

Goals
• High performance on all tasks (HP)
• Models share parameters and computation so they can be supported on the same device (SP)
• Models can be trained independently of other tasks/datasets, enabling distributed development (DD)
• Adaptive model complexity: easier tasks → simpler models, complex tasks → complex models (AC)

Prior Work

[Figure: diagrams of prior transfer approaches (Finetuning, Feature Extraction, Universal classifier, Residual Adapters), each built on a pretrained model shared across Task 1 through Task 4 and scored against the four goals (HP, SP, DD, AC). Captions note that some approaches train the feature extractor jointly for all tasks, while others keep the feature extractor frozen after pre-training.]

Idea

[Figure: three diagrams of the network as a graph of universal blocks (U), small task-specific blocks (P), and the task classifier (C), one for each of the three steps below.]

1. Start with a pre-trained network that remains unchanged.
2. Augment the original network (the universal blocks, U) with a set of small task-specific blocks (P).
3. Find the smallest combination of layers and connections that solves the task.
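As an illustration of steps 1 and 2, the sketch below freezes a pre-trained backbone and attaches small task-specific proxy blocks between nearby layers. The function name, the 1x1-convolution proxies, and the three-layer window are assumptions made here for concreteness, not details taken from the poster; step 3 is carried out by the path-selection and pruning machinery described later.

```python
# Sketch of steps 1 and 2: freeze the pre-trained (universal) blocks and add
# small task-specific proxies between nearby layers.  All names and the choice
# of 1x1 convolutions are illustrative; spatial resolution changes between
# layers are ignored for brevity.
import torch.nn as nn

def add_task_specific_blocks(backbone_blocks, channels, window=3):
    """channels[j] is the number of channels of activation x_j
    (len(channels) == len(backbone_blocks) + 1, including the stem output x_0)."""
    # Step 1: the universal blocks remain unchanged (frozen).
    for block in backbone_blocks:
        for p in block.parameters():
            p.requires_grad = False
    # Step 2: a low-cost proxy P_k^i from layer k to a later layer i.
    proxies = nn.ModuleDict()
    for i in range(1, len(backbone_blocks) + 1):
        for k in range(max(0, i - window), i):
            proxies[f"p{k}_{i}"] = nn.Conv2d(channels[k], channels[i], kernel_size=1)
    return proxies
```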

Evaluation

Learned architectures

[Table: architectures learned for SVHN, Flowers, and VOC, reporting the number of universal layers, FLOPs, number of parameters, and accuracy for the pruned and unpruned models.]

Complex tasks require larger models; simpler tasks require smaller models.

Path Selection

[Figure: layer output x_i is computed from the frozen universal block U_i applied to x_{i−1} and small task-specific proxies P_{i−1}^i, P_{i−2}^i, P_{i−3}^i applied to x_{i−1}, x_{i−2}, x_{i−3}, each path weighted by its coefficient α.]

An attention coefficient α ∈ [0, 1] is attached to each block:
  x_i = α_ii U_i(x_{i−1}) + Σ_k α_ki P_k^i(x_k)

Blocks are removed from the inference path when their α is small. The coefficients α are parameterized by a softmax distribution to force network paths to compete.
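Read as code, the gated combination above might look like the following minimal PyTorch sketch; the class name, the `skips` argument, and the single softmax over all candidate paths into layer i are illustrative assumptions rather than the released implementation.

```python
# Sketch of x_i = a_ii * U_i(x_{i-1}) + sum_k a_ki * P_k^i(x_k).
# `universal` is the frozen pre-trained block U_i; `proxies[k]` is the small
# task-specific block P_k^i fed by the earlier activation skips[k].
import torch
import torch.nn as nn

class TailoredLayer(nn.Module):
    def __init__(self, universal, proxies):
        super().__init__()
        self.universal = universal
        self.proxies = nn.ModuleList(proxies)
        # One logit per candidate path; the softmax keeps the coefficients
        # in [0, 1] and summing to one, so paths compete for mass.
        self.logits = nn.Parameter(torch.zeros(1 + len(proxies)))

    def forward(self, x_prev, skips):
        alpha = torch.softmax(self.logits, dim=0)
        out = alpha[0] * self.universal(x_prev)             # a_ii * U_i(x_{i-1})
        for a, proxy, x_k in zip(alpha[1:], self.proxies, skips):
            out = out + a * proxy(x_k)                      # a_ki * P_k^i(x_k)
        return out
```

At pruning time, paths whose coefficient stays small would simply be dropped from this sum.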

Teacher Network

[Figure: teacher and student networks side by side, with distances d measured between corresponding layer activations.]

To preserve performance, a teacher network finetuned on the target task is used to supervise the student at each stage:
  L_t(x) = Σ_l ||x_l^t − x_l||²
where x_l^t and x_l denote the teacher and student activations at layer l.

Learning

1st stage: tuning the architecture. The student model is optimized by SGD under the loss
  ℒ = L_c(x, y) + λ1 L_t(x) + λ2 E[C]

2nd stage: pruning and retraining. Blocks with small α are removed, and the remaining parameters are retrained with λ2 = 0.
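A minimal sketch of the first-stage objective, assuming cross-entropy for the task loss L_c, matched lists of teacher and student activations for L_t, and a precomputed E[C] term; the default weights and all names are illustrative.

```python
# First-stage loss  L = L_c(x, y) + lam1 * L_t(x) + lam2 * E[C]  (sketch).
import torch.nn.functional as F

def stage1_loss(logits, labels, student_feats, teacher_feats,
                complexity_penalty, lam1=1.0, lam2=1.0):
    l_c = F.cross_entropy(logits, labels)                   # task loss L_c
    l_t = sum(((s - t.detach()) ** 2).sum()                 # teacher loss L_t
              for s, t in zip(student_feats, teacher_feats))
    return l_c + lam1 * l_t + lam2 * complexity_penalty

# Second stage (sketch): remove blocks whose alpha fell below a threshold,
# then retrain the remaining task-specific parameters with lam2 = 0.
```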

Future Work

❖ Study different block pruning strategies
❖ Extend the NetTailor path-selection framework to tasks beyond recognition (e.g. detection, semantic segmentation, multi-task learning)
❖ Study the benefits of NetTailor for general neural architecture search

NetTailor on different datasets

[Bar charts: parameters [M] and FLOPs [B] of the pruned vs. unpruned models on ImageNet, Flowers, CIFAR100, Dped, UCF101, SVHN, DTD, Omniglot, Aircraft, and GTSR.]

Compression rates of up to 80% of parameters and 66% of FLOPs.

NetTailor on different base networks

[Plot: number of parameters [M] vs. accuracy [%] for the original and pruned networks on SVHN, Flowers, and VOC.]

ResNet50 is pruned to the size of ResNet18 while preserving its performance.

Support

Website: https://pedro-morgado.github.io/nettailor
Repository: https://github.com/pedro-morgado/nettailor
