Deep Declarative Networks: A New Hope

A/Prof. Stephen Gould, Research School of Computer Science, The Australian National University, 2019

Transcript of "Deep Declarative Networks: A New Hope" (users.cecs.anu.edu.au/~sgould/papers/declarativeNetworks..., 53 slides)

Page 1:

Deep Declarative Networks: A New Hope

A/Prof. Stephen Gould

Research School of Computer Science

The Australian National University

2019

Page 2:

slide credit: Dylan Campbell

Page 3:

What did we gain?

✓ Better-than-human performance on closed-world classification tasks

✓ Very fast inference (with the help of GPU acceleration), versus very slow iterative optimization procedures

✓ Common tools and software frameworks for sharing research code

✓ Robustness to variations in real-world data, if the training set is sufficiently large and diverse

What did we lose?

• Clear mathematical models; separation between algorithm and objective (loss function)

• Theoretical performance guarantees

• Interpretability and robustness to adversarial attacks

• Ability to enforce hard constraints

• Intuition guided by physical models

• Parsimony: capacity consumed learning what we already know

S Gould | 3

Page 4:

What if we could have the best of both worlds?

S Gould | 4

Page 5:

Deep learning models

β€’ Linear transforms (i.e., convolutions)

β€’ Elementwise non-linear transforms

β€’ Spatial/global pooling

S Gould | 5

Page 6:

Deep learning layer

x: input
y: output
θ: local parameters
f: forward function
J: global error / loss

[Diagram: a processing node computes y = f(x; θ) from input x and parameters θ; on the backward pass it receives dJ/dy and produces dJ/dx and dJ/dθ.]

S Gould | 6

Page 7:

End-to-end computation graph

[Diagram: input x feeds f_1(x; θ); one branch applies f_2(z_1; θ), f_3(z_2; θ), f_4(z_3; θ), the other f_5(z_1; θ), f_6(z_5; θ), f_7(z_6; θ); the branches merge at f_8(z_4, z_7; θ) to produce the output y.]

y = f_8(f_4(f_3(f_2(f_1(x)))), f_7(f_6(f_5(f_1(x)))))

S Gould | 7

Page 8:

End-to-end learning

• Learning is about finding parameters that maximize performance, argmax_θ performance(model(θ))

• To do so we need to understand how the model output changes as a function of its input and parameters

• Gradient-based learning incrementally updates parameters based on a signal back-propagated from the output of the network

• This requires calculation of gradients

For a node y = f(x; θ), the chain rule gives

dJ/dx = (dJ/dy)(dy/dx)  and  dJ/dθ = (dJ/dy)(dy/dθ)

S Gould | 8

Page 9:

Example: back-propagation through a node

Consider the following implementation of a node:

fwd_fcn(x):
    y_0 = (1/2) x
    for t = 1, …, T do
        y_t ← (1/2)(y_{t-1} + x / y_{t-1})
    return y_T

We can back-propagate gradients as

∂y_t/∂y_{t-1} = (1/2)(1 − x / y_{t-1}²)

∂y_t/∂x = 1/(2 y_{t-1}) + (∂y_t/∂y_{t-1})(∂y_{t-1}/∂x)

It turns out that this node implements the Babylonian algorithm, which computes

y = √x

As such we can compute its derivative directly as

∂y/∂x = 1/(2√x) = 1/(2y)

giving the backward function

bck_fcn(x, y):
    return 1/(2y)

The chain rule then gives ∂J/∂x from ∂J/∂y (input) and ∂y/∂x (computed).

S Gould | 9
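The pseudocode above can be written as a short runnable Python sketch (function names follow the slide; the iteration count T = 20 is an assumed default):

```python
def fwd_fcn(x, T=20):
    """Babylonian (Heron's) iteration; converges to sqrt(x)."""
    y = 0.5 * x
    for _ in range(T):
        y = 0.5 * (y + x / y)
    return y

def bck_fcn(x, y):
    """Analytic derivative at the fixed point: dy/dx = 1/(2*sqrt(x)) = 1/(2*y)."""
    return 1.0 / (2.0 * y)

y = fwd_fcn(4.0)   # converges to sqrt(4) = 2
```

Note that the backward pass needs only the converged value y, not the trajectory of the forward iterations.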

Page 10:

Separation of forward and backward operations

[Diagram: the processing node maps x to y; dJ/dy flows into the backward function and dJ/dx flows out.]

forward function:
    y_0 = (1/2) x
    for t = 1, …, T do
        y_t ← (1/2)(y_{t-1} + x / y_{t-1})
    return y_T

backward function:
    return 1/(2y)

S Gould | 10

Page 11:

Deep declarative networks (DDNs)

In an imperative node the implementation of the forward processing function f̃ is explicitly defined. The output is then

y = f̃(x; θ)

where x is the input and θ are the parameters of the node.

In a declarative node the input–output relationship is specified as the solution to an optimization problem

y ∈ argmin_{u ∈ C} f(x, u; θ)

where f is the objective and C is the constraint set.

[Gould et al., 2019]

[Diagram: an imperative node y = f̃(x; θ) and a declarative node y = argmin_{u ∈ C} f(x, u; θ); both map x to y, receive D_y J on the backward pass, and produce D_x J and D_θ J.]

S Gould | 11

Page 12:

Imperative vs. declarative node example: global average pooling

Imperative specification:

y = (1/n) Σ_{i=1}^{n} x_i

Declarative specification:

y = argmin_{u ∈ ℝ^m} Σ_{i=1}^{n} ‖u − x_i‖²

{x_i ∈ ℝ^m ∣ i = 1, …, n} → ℝ^m

"the vector u that is the minimum distance to all input vectors x_i"

S Gould | 12
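As a quick numeric sanity check (my own example, not from the slides), the minimizer of the declarative objective is exactly the mean, so the two specifications agree:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))        # n = 5 input vectors in R^3

u_imperative = x.mean(axis=0)      # imperative specification: the average

def objective(u):
    # declarative objective: sum_i ||u - x_i||^2
    return ((u - x) ** 2).sum()

# Perturbing the mean in any direction increases the objective,
# so the mean is the (unique) minimizer of the declarative problem.
for _ in range(100):
    d = rng.normal(size=3)
    assert objective(u_imperative + 1e-3 * d) > objective(u_imperative)
```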

Page 13:

Deep declarative nodes: special cases

Unconstrained (e.g., robust pooling):

y(x) ∈ argmin_{u ∈ ℝ^m} f(x, u)

Equality constrained (e.g., projection onto the L_p-sphere):

y(x) ∈ argmin_{u ∈ ℝ^m} f(x, u) subject to h(x, u) = 0

Inequality constrained (e.g., projection onto the L_p-ball):

y(x) ∈ argmin_{u ∈ ℝ^m} f(x, u) subject to h(x, u) ≤ 0

[Gould et al., 2019]

S Gould | 13

Page 14:

Imperative and declarative nodes can co-exist

[Diagram: the computation graph from before, with f_3 and f_7 replaced by declarative (argmin) nodes.]

y = f_8(f_4(argmin_u f_3(f_2(f_1(x)), u)), argmin_u f_7(f_6(f_5(f_1(x))), u))

S Gould | 14

Page 15:

Learning as bi-level optimization

learning problem:
    minimize (over x) objective(x)
    subject to constraints(x)

declarative node problem:
    minimize (over y) objective(x, y)
    subject to constraints(y)

bi-level learning problem:
    minimize (over x) objective(x, y)
    subject to constraints(x)
    where y solves the declarative node problem

S Gould | 15

Page 16:

A game theoretic perspective

• Consider two players, a leader and a follower

• The market dictates the price it is willing to pay for some goods based on supply, i.e., the quantity produced by both players, P(q_1 + q_2)

• Each player has a cost structure associated with producing goods, C_i(q_i), and wants to maximize profits, q_i P(q_1 + q_2) − C_i(q_i)

• The leader picks a quantity of goods to produce knowing that the follower will respond optimally. In other words, the leader solves

maximize (over q_1)  q_1 P(q_1 + q_2) − C_1(q_1)
subject to  q_2 ∈ argmax_q q P(q_1 + q) − C_2(q)

[Stackelberg, 1930s]

S Gould | 16

Page 17:

Solving bi-level optimization problems

• Closed-form lower-level problem: substitute for y in the upper problem

• May result in a difficult (single-level) optimization problem

[Bard, 1998; Dempe and Franke, 2015]

minimize (over x) J(x, y) subject to y ∈ argmin_u f(x, u)

becomes

minimize (over x) J(x, y(x))

S Gould | 17

Page 18:

Solving bi-level optimization problems

• Convex lower-level problem: replace the lower problem with sufficient conditions (e.g., KKT conditions)

• May result in a non-convex problem if the KKT conditions are not convex

[Bard, 1998; Dempe and Franke, 2015]

minimize (over x) J(x, y) subject to y ∈ argmin_u f(x, u)

becomes

minimize (over x, y) J(x, y) subject to h(y) = 0

S Gould | 18

Page 19:

Solving bi-level optimization problems

• Gradient descent: compute the gradient with respect to x

• But this requires computing the gradient of y (itself a function of x)

[Bard, 1998; Dempe and Franke, 2015]

minimize (over x) J(x, y) subject to y ∈ argmin_u f(x, u)

x ← x − η (∂J(x, y)/∂x + (∂J(x, y)/∂y)(dy/dx))

S Gould | 19

Page 20:

Algorithm for solving bi-level optimization

SolveBiLevelOptimization:
    initialize x
    repeat until convergence:
        solve y ∈ argmin_u f(x, u)
        compute J(x, y)
        compute dJ/dx = ∂J(x, y)/∂x + (∂J(x, y)/∂y)(dy/dx)
        update x ← x − η dJ/dx
    return x

[Bard, 1998; Dempe and Franke, 2015; Gould et al., 2019]
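The algorithm can be exercised on a tiny illustrative problem (my own toy example, not from the slides): take lower-level objective f(x, u) = (u − x)², so y(x) = x and, by implicit differentiation, dy/dx = 1, with upper-level loss J(x, y) = (y − 3)².

```python
def solve_lower(x):
    # y(x) = argmin_u (u - x)^2 has the closed-form solution y = x
    return x

def dy_dx():
    # implicit function theorem: dy/dx = -(d2f/du2)^-1 (d2f/dxdu) = -(-2)/2 = 1
    f_uu, f_xu = 2.0, -2.0
    return -f_xu / f_uu

x, eta = 0.0, 0.1
for _ in range(200):
    y = solve_lower(x)
    # dJ/dx = partial dJ/dx (here zero) + (dJ/dy) * (dy/dx)
    dJ_dx = 0.0 + 2.0 * (y - 3.0) * dy_dx()
    x -= eta * dJ_dx

# x converges to 3, the minimizer of the upper-level objective
```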

S Gould | 20

Page 21:

[Gould et al., 2019]

How do we compute (d/dx) argmin_{u ∈ C} f(x, u)?

S Gould | 21

Page 22:

Implicit differentiation

Let f: ℝ × ℝ → ℝ be a twice-differentiable function and let

y(x) = argmin_u f(x, u)

The derivative of f with respect to u vanishes at (x, y). By Dini's implicit function theorem (1878),

dy(x)/dx = −(∂²f/∂y²)⁻¹ (∂²f/∂x∂y)

The result extends to vector-valued functions, vector-argument functions, and (equality) constrained problems. See [Gould et al., 2019].

[Krantz and Parks, 2013; Dontchev and Rockafellar, 2014; Gould et al., 2016; Gould et al., 2019]

S Gould | 22

Page 23:

Proof sketch

y ∈ argmin_u f(x, u)  ⟹  ∂f(x, y)/∂y = 0

Differentiate both sides of this stationarity condition with respect to x.

LHS: (d/dx) ∂f(x, y)/∂y = ∂²f(x, y)/∂x∂y + (∂²f(x, y)/∂y²)(dy/dx)

RHS: (d/dx) 0 = 0

Rearranging gives dy/dx = −(∂²f/∂y²)⁻¹ (∂²f/∂x∂y).

[Figure: a one-dimensional objective f over u, with minimizer y.]

[Gould et al., 2019]
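The formula is easy to check numerically on a toy objective (my own example, not from the slides): for f(x, u) = exp(u) − x·u the stationarity condition exp(y) − x = 0 gives y(x) = log x, so dy/dx should come out as 1/x.

```python
import math

def dy_dx(x):
    y = math.log(x)          # minimizer: solves exp(y) - x = 0
    f_yy = math.exp(y)       # d2f/dy2 evaluated at (x, y)
    f_xy = -1.0              # d2f/dxdy
    return -f_xy / f_yy      # implicit function theorem: equals 1/x

# dy_dx(2.0) agrees with d(log x)/dx = 1/x at x = 2
```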

S Gould | 23

Page 24:

Deep declarative nodes: what do we need?

• Forward pass:
    • a method to solve the optimization problem

• Backward pass:
    • specification of the objective and constraints
    • (and the cached result y from the forward pass)
    • we do not need to know how the problem was solved

[Gould et al., 2019]

S Gould | 24

Page 25:

examples

S Gould | 25

Page 26:

Global average pooling

[Diagram: a data generating network produces features x; global average pooling computes y = (1/n) Σ_{i=1}^{n} x_i, equivalently y = argmin_u Σ_{i=1}^{n} (1/2)‖u − x_i‖², mapping {x_i ∈ ℝ^m ∣ i = 1, …, n} → ℝ^m; the result feeds further processing (e.g., classification) under loss J.]

[Gould et al., 2019]

S Gould | 26

Page 27:

Robust penalty functions, φ

• Quadratic: (1/2)z²; closed-form, convex, smooth, unique solution

• Pseudo-Huber: α²(√(1 + (z/α)²) − 1); convex, smooth, unique solution

• Huber: (1/2)z² for |z| ≤ α, else α(|z| − α/2); convex, non-smooth, non-isolated solutions

• Welsch: 1 − exp(−z²/(2α²)); non-convex, smooth, isolated solutions

• Truncated quadratic: (1/2)z² for |z| ≤ α, else (1/2)α²; non-convex, non-smooth, isolated solutions

[Gould et al., 2019]
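The five penalties can be written directly in NumPy (a sketch; the default α = 1 is my assumption, not from the slide):

```python
import numpy as np

def quadratic(z, alpha=1.0):
    return 0.5 * z**2

def pseudo_huber(z, alpha=1.0):
    return alpha**2 * (np.sqrt(1.0 + (z / alpha)**2) - 1.0)

def huber(z, alpha=1.0):
    return np.where(np.abs(z) <= alpha,
                    0.5 * z**2,
                    alpha * (np.abs(z) - 0.5 * alpha))

def welsch(z, alpha=1.0):
    return 1.0 - np.exp(-z**2 / (2.0 * alpha**2))

def truncated_quad(z, alpha=1.0):
    return np.where(np.abs(z) <= alpha, 0.5 * z**2, 0.5 * alpha**2)
```

Near z = 0 the pseudo-Huber penalty behaves like the quadratic, while for large |z| the Huber penalty grows linearly and the truncated quadratic saturates.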

S Gould | 27

Page 28:

Example: robust pooling

[Diagram: a data generating network produces x; a declarative node computes y = argmin_u Σ_{i=1}^{n} φ(u − x_i; α); the loss is J = (1/2)‖y‖².]

minimize (over x)  J(x, y) ≜ (1/2)‖y‖²
subject to  y ∈ argmin_u Σ_{i=1}^{n} φ(u − x_i; α)

[Gould et al., 2019]

S Gould | 28

Page 29:

Page 30:

Example: Euclidean projection

L1: non-smooth, isolated solutions
L2: closed-form, smooth, unique solution*
L∞: non-smooth, isolated solutions

[Gould et al., 2019]

S Gould | 30

Page 31:

Example: quadratic programs

argmin_{u ∈ ℝ^m} (1/2)uᵀPu + qᵀu + r
subject to Au = b, Gu ≤ h

Can be differentiated with respect to its parameters:

P ∈ ℝ^{m×m}, q ∈ ℝ^m, A ∈ ℝ^{n×m}, b ∈ ℝ^n

[Amos and Kolter, 2017]

S Gould | 31

Page 32:

Example: convex programs

argmin_{u ∈ ℝ^m} cᵀu
subject to b − Au ∈ K

Can be differentiated with respect to its parameters:

A ∈ ℝ^{n×m}, b ∈ ℝ^n, c ∈ ℝ^m

[Agrawal et al., 2019]

S Gould | 32

Page 33:

Implementing deep declarative nodes

• Need: the objective and constraint functions, and a solver to obtain y

• Gradient by automatic differentiation, using

dy(x)/dx = −(∂²f/∂y²)⁻¹ (∂²f/∂x∂y)

import autograd.numpy as np
from autograd import grad, jacobian

def gradient(x, y, f):
    fY = grad(f, 1)
    fYY = jacobian(fY, 1)
    fXY = jacobian(fY, 0)
    return -1.0 * np.linalg.solve(fYY(x, y), fXY(x, y))

[Gould et al., 2019]

S Gould | 33

Page 34:

cvxpylayers

• Disciplined convex optimization
• Subset of optimization problems
• Write the problem using cvxpy
• Solver and gradient computed automatically!

[Agrawal et al., 2019]

import cvxpy as cp
from cvxpylayers.torch import CvxpyLayer

x = cp.Parameter(n)
y = cp.Variable(n)
obj = cp.Minimize(cp.sum_squares(x - y))
cons = [y >= 0]
prob = cp.Problem(obj, cons)
layer = CvxpyLayer(prob, parameters=[x], variables=[y])

S Gould | 34

Page 35:

applications

S Gould | 35

Page 36:

Robust point cloud classification

[Gould et al., 2019]

S Gould | 36

Page 37:

Robust point cloud classification

[Gould et al., 2019]

S Gould | 37

Page 38:

Video activity recognition

Stand Up Sit Down

[Marszalek et al., 2009; Carreira et al., 2019]

S Gould | 38

Page 39:

Frame encoding

[Plot: the frames of a "stand up" clip traced through a 2D space of abstract feature 1 vs. abstract feature 2.]

[Fernando and Gould, 2016]

S Gould | 39

Page 40:

[Plot: trajectories of several clips ("stand up", "sit down", "stand up", "answerphone", "answerphone") in the space of abstract feature 1 vs. abstract feature 2.]

[Fernando and Gould, 2016]

S Gould | 40

Page 41:

Video clip classification pipeline

[Diagram: frames 1, 2, …, t are each passed through a shared per-frame network (Conv-1, ReLU-1, Pool-1, Conv-2, ReLU-2, Pool-2, Conv-3, ReLU-3, Pool-3, Conv-4, ReLU-4, Pool-4, Conv-5, ReLU-5, Pool-5, FC-6, ReLU-6, FC-7, ReLU-7, L2-norm); the per-frame features are combined by temporal pooling (T-Pool), L2-normalized, and classified with a softmax.]

[Fernando and Gould, 2016]

S Gould | 41

Page 42:

Temporal pooling

• Max/avg/robust pooling summarizes an unstructured set of objects:

{x_i ∣ i = 1, …, n} → ℝ^m

• Rank pooling summarizes a structured sequence of objects:

⟨x_i ∣ i = 1, …, n⟩ → ℝ^m

[Fernando et al., 2015; Fernando and Gould, 2016]

S Gould | 42

Page 43:

Rank Pooling

• Find a ranking function r: ℝ^n → ℝ such that r(x_t) < r(x_s) for t < s

• In our case we assume that r: x ↦ uᵀx is a linear function

• Use u as the representation

[Fernando et al., 2015; Fernando and Gould, 2016]
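A minimal rank-pooling sketch (an illustrative simplification of my own: fit u by least squares so that uᵀx_t tracks the frame index t, which induces the required ordering on the training clip; the published method uses a ranking machine rather than plain least squares):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 20, 5
# synthetic clip: per-frame features that drift over time
frames = rng.normal(size=(n, m)) + np.linspace(0.0, 3.0, n)[:, None]

targets = np.arange(n, dtype=float)               # desired ranking scores
u, *_ = np.linalg.lstsq(frames, targets, rcond=None)

scores = frames @ u                               # r(x_t) = u^T x_t
# u, not the scores, is kept as the fixed-length representation of the clip
```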

S Gould | 43

Page 44:

Experimental results

Dataset: 150 video clips from BBC and ESPN footage, 10 sports actions [Rodriguez et al., 2008]

Method | Accuracy (%)
Max-Pool + SVM | 66
Avg-Pool + SVM | 67
Rank-Pool + SVM | 66
Max-Pool-CNN (end-to-end) | 71
Avg-Pool-CNN (end-to-end) | 70
Rank-Pool-CNN (end-to-end) | 87
Improved trajectory features + Fisher vectors + rank pooling | 87

21% improvement!

[Fernando and Gould, 2016]

S Gould | 44

Page 45:

Visual attribute ranking

1. Order a collection of images according to a given attribute

2. Recover the original image from shuffled image patches

[Santa Cruz et al., 2017]

S Gould | 45

Page 46:

Birkhoff polytope

• Permutation matrices form discrete points in Euclidean space, which imposes difficulties for gradient-based optimizers

• The Birkhoff polytope B_n is the convex hull of the set of n × n permutation matrices

• This coincides exactly with the set of n × n doubly stochastic matrices

• We relax our visual permutation learning problem over permutation matrices to a problem over doubly stochastic matrices

{x_1, …, x_n} → B_n

[Santa Cruz et al., 2017]

S Gould | 46

Page 47:

End-to-end visual permutation learning

[Diagram: an image x is permuted by a permutation matrix P, e.g.

P = [0 1 0 0; 0 0 1 0; 0 0 0 1; 1 0 0 0],

to give X; its patches are processed by shared CNNs, producing a positive matrix A, which is normalized to a doubly stochastic matrix Q, e.g.

Q = [0 0.7 0.2 0.1; 0 0.1 0.8 0; 0.1 0.1 0 0.8; 0.9 0.1 0 0],

from which the permutation is inferred.]

[Santa Cruz et al., 2017]

S Gould | 47

Page 48:

Sinkhorn normalization or projection onto B_n

sinkhorn_fcn(A):
    Q = A
    for t = 1, …, T do
        Q_{ij} ← Q_{ij} / Σ_k Q_{ik}    (row normalization)
        Q_{ij} ← Q_{ij} / Σ_k Q_{kj}    (column normalization)
    return Q

Alternatively, define a deep declarative module.

[Santa Cruz et al., 2017]
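The iteration above is straightforward to implement in NumPy (a sketch; the iteration count T = 100 is an assumed default):

```python
import numpy as np

def sinkhorn_fcn(A, T=100):
    """Alternately normalize the rows and columns of a positive matrix
    until it is (approximately) doubly stochastic."""
    Q = np.array(A, dtype=float)
    for _ in range(T):
        Q /= Q.sum(axis=1, keepdims=True)   # row normalization
        Q /= Q.sum(axis=0, keepdims=True)   # column normalization
    return Q

Q = sinkhorn_fcn(np.array([[1.0, 2.0], [3.0, 4.0]]))
# rows and columns of Q now each sum to (approximately) one
```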

S Gould | 48

Page 49:

Visual attribute learning results

[Santa Cruz et al., 2017]

S Gould | 49

Page 50:

Blind perspective-n-point

[Campbell et al., 2019 (unpublished)]

S Gould | 50

Page 51:

Blind perspective-n-point

[Campbell et al., 2019 (unpublished)]

[Diagram: 2D feature extraction maps image features F to Z_F, and 3D feature extraction maps points P to Z_P; pairwise distances between Z_F and Z_P feed Sinkhorn normalization (a declarative node); weighted RANSAC and BPnP (a declarative node) then recover the pose (R, t).]

S Gould | 51

Page 52:

Blind perspective-n-point

[Campbell et al., 2019 (unpublished)]

S Gould | 52

Page 53:

code and tutorials at http://deepdeclarativenetworks.com

CVPR 2020 Workshop (http://cvpr2020.deepdeclarativenetworks.com)

S Gould | 53