
Page 1

AdaNet: Adaptive Structural Learning of Artificial Neural Networks

Corinna Cortes (1), Xavier Gonzalvo (1), Vitaly Kuznetsov (1), Mehryar Mohri (2,1), Scott Yang (2)

(1) Google Research
(2) Courant Institute

ICML 2017 / Presenter: Anant Kharkar

Page 2

Outline

1 Introduction
    Motivation & Contributions
2 Theoretical Background
    Regularizer Derivation
    Generalization Bounds
3 AdaNet
    Basics
    Algorithm
    Results
    Variations


Page 4

Motivation & Contributions

Motivation

Lack of theory behind model architecture design

Architecture design usually requires domain knowledge

Contributions

New regularizer that learns parameters & architecture simultaneously

Additive model

Context

Strongly convex optimization problems

Generic network structure (not just MLP)



Page 6

Regularizer Derivation

Function families:

H_1 = {x ↦ u · Ψ(x) : u ∈ R^{n_0}, ‖u‖_p ≤ Λ_{1,0}}

H_k = {x ↦ ∑_{s=1}^{k−1} u_s · (ϕ_s ◦ h_s)(x) : u_s ∈ R^{n_s}, ‖u_s‖_p ≤ Λ_{k,s}, h_s ∈ H_s}

Definitions:

x: input

u: connections to previous layers

Ψ: feature vector

ϕ: activation function
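As a concrete illustration of these nested families, the sketch below evaluates one unit of H_k as a weighted combination of activations of units from the lower layers, enforcing the p-norm bound ‖u_s‖_p ≤ Λ_{k,s} by rescaling. This is a hypothetical sketch (all function and parameter names are illustrative, not from the paper's code), and the rescaling is a simple way to satisfy the constraint rather than a true projection.

```python
import numpy as np

def bound_p_norm(u, p, radius):
    """Rescale u so that ||u||_p <= radius (simple rescaling; enough to
    enforce the constraint, though not a Euclidean projection for p != 2)."""
    norm = np.linalg.norm(u, ord=p)
    return u if norm <= radius else u * (radius / norm)

def unit_in_Hk(x, layers, weights, p=1, radius=1.0, act=np.tanh):
    """Evaluate one unit of H_k as sum_s u_s . (phi_s o h_s)(x).

    layers:  list of functions; layers[s](x) returns the vector of outputs
             of the lower-layer units h_s at input x (s = 1 .. k-1).
    weights: list of connection vectors u_s, one per lower layer.
    """
    total = 0.0
    for h_s, u_s in zip(layers, weights):
        u_s = bound_p_norm(np.asarray(u_s, dtype=float), p, radius)
        total += u_s @ act(h_s(x))   # u_s . (phi_s o h_s)(x)
    return total
```

For example, a unit in H_2 over a single linear H_1 unit combines that unit's tanh activation with a weight clipped to the unit l1 ball.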



Page 8

Generalization Bounds

Rademacher Complexity:

R̂_S(G) = (1/m) E_σ[ sup_{h∈G} ∑_{i=1}^{m} σ_i h(x_i) ]

Definitions:

x: input

h: function expressed by single unit

σ_i: i.i.d. Rademacher random variables, uniform on {−1, +1}
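The empirical Rademacher complexity above can be approximated by Monte Carlo for a finite hypothesis class: draw sign vectors σ, take the supremum of the correlation over the class, and average. A minimal sketch, assuming the hypotheses' predictions on the sample are precomputed (names are illustrative):

```python
import numpy as np

def empirical_rademacher(preds, n_draws=2000, seed=0):
    """Monte Carlo estimate of R_S(G) = (1/m) E_sigma[ sup_h sum_i sigma_i h(x_i) ].

    preds: (num_hypotheses, m) array with preds[j, i] = h_j(x_i) on sample S.
    """
    rng = np.random.default_rng(seed)
    _, m = preds.shape
    total = 0.0
    for _ in range(n_draws):
        sigma = rng.choice([-1.0, 1.0], size=m)   # Rademacher draw
        total += np.max(preds @ sigma)            # sup over the finite class
    return total / (n_draws * m)
```

For the two constant hypotheses h ≡ +1 and h ≡ −1 on m = 4 points, the supremum is |∑ σ_i|, whose expectation is 1.5, so the complexity is 1.5/4 = 0.375.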


Page 9

Generalization Bounds

Theorem 1:

R(f) ≤ R̂_{S,ρ}(f) + (4/ρ) ∑_{k=1}^{l} ‖w_k‖_1 R_m(H̃_k) + (2/ρ) √(log l / m) + C(ρ, l, m, δ)

C(ρ, l, m, δ) = √( ⌈(4/ρ²) log(ρ²m / log l)⌉ · (log l / m) + log(2/δ) / (2m) )

∑_{k=1}^{l} ‖w_k‖_1 R_m(H̃_k) is a weighted sum of the Rademacher complexities of the layer families.


Page 10

Generalization Bounds

Corollary 1:

R(f) ≤ R̂_{S,ρ}(f) + (2/ρ) ∑_{k=1}^{l} ‖w_k‖_1 [ r̄_∞ Λ_k N_k^{1/q} √(2 log(2n_0) / m) ] + (2/ρ) √(log l / m) + C(ρ, l, m, δ)

C(ρ, l, m, δ) = √( ⌈(4/ρ²) log(ρ²m / log l)⌉ · (log l / m) + log(2/δ) / (2m) )



Page 12

Basics

Adaptively grows network structure

Objective function:

F(w) = (1/m) ∑_{i=1}^{m} Φ(1 − y_i ∑_{j=1}^{N} w_j h_j(x_i)) + ∑_{j=1}^{N} Γ_j |w_j|

Definitions:

Φ: nondecreasing convex surrogate function (e.g., e^x)

Γ_j = λ r_j + β

r_j = R_m(H_{k_j}), the Rademacher complexity of the family containing h_j

The objective is convex by construction.
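A minimal sketch of evaluating this objective, with the exponential surrogate Φ(u) = e^u and the complexity-scaled penalties Γ_j = λ r_j + β; the function and argument names here are illustrative, not the paper's code:

```python
import numpy as np

def adanet_objective(w, H, y, r, lam=0.1, beta=0.01):
    """Evaluate F(w) = (1/m) sum_i Phi(1 - y_i sum_j w_j h_j(x_i))
                       + sum_j Gamma_j |w_j|,
    with Phi(u) = exp(u) and Gamma_j = lam * r_j + beta.

    H: (m, N) matrix with H[i, j] = h_j(x_i)
    y: (m,) labels in {-1, +1}
    r: (N,) complexity estimates r_j, one per unit's family
    """
    margins = y * (H @ w)                      # y_i * f(x_i)
    loss = np.mean(np.exp(1.0 - margins))      # convex surrogate loss
    gamma = lam * np.asarray(r) + beta         # Gamma_j = lam * r_j + beta
    penalty = np.sum(gamma * np.abs(w))        # complexity-weighted l1 term
    return loss + penalty
```

Because Φ is convex and the penalty is a weighted l1 norm, F is convex in w, which is what makes the greedy weight updates on the next slides tractable.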



Page 14

Algorithm
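A rough sketch of the greedy growth loop: at each round a few candidate subnetworks are tried, each is fit with the rest of the model fixed, and the candidate that most decreases the objective F is kept; the loop stops when no candidate improves F. All names below are illustrative placeholders (the candidate generator, trainer, and objective are supplied by the caller), not the paper's implementation.

```python
def adanet_grow(candidates_fn, fit_fn, objective_fn, T=10):
    """Greedy AdaNet-style growth loop (illustrative sketch).

    candidates_fn(model) -> candidate subnetworks to try this round
                            (e.g., one at the current depth, one deeper)
    fit_fn(model, cand)  -> weight for cand that minimizes the objective
                            with the rest of the model held fixed
    objective_fn(model)  -> value of the regularized objective F
    """
    model = []                                  # list of (subnetwork, weight)
    best = objective_fn(model)
    for _ in range(T):
        trials = []
        for cand in candidates_fn(model):
            w = fit_fn(model, cand)
            trials.append((objective_fn(model + [(cand, w)]), cand, w))
        f_new, cand, w = min(trials, key=lambda t: t[0])
        if f_new >= best:                       # no improvement: stop early
            break
        model.append((cand, w))
        best = f_new
    return model
```

With a toy objective (2 − ∑ w)² and a single candidate of weight 1 per round, the loop adds two subnetworks and then stops, since a third would increase F.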



Page 16

Results



Page 18

Variations

AdaNet.R: adds the regularization term R(w, h) = Γ_h ‖w‖_1 to the objective

AdaNet.P: each new subnetwork is connected only to the previously added subnetwork

AdaNet.D: applies dropout to the connections between subnetworks

AdaNet.SD: uses the standard deviation of the final-layer outputs in place of the Rademacher complexity
