DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity...

112
Slope and geometry in variational mathematics Dmitriy Drusvyatskiy School of ORIE, Cornell University Joint work with Aris Daniilidis (Barcelona), Alex D. Ioffe (Technion), Martin Larsson (Lausanne), and Adrian S. Lewis (Cornell) January 29, 2013

Transcript of DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity...

Page 1: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

Slope and geometry in variational mathematics

Dmitriy DrusvyatskiySchool of ORIE, Cornell University

Joint work withAris Daniilidis (Barcelona), Alex D. Ioffe (Technion), Martin Larsson (Lausanne),

and Adrian S. Lewis (Cornell)

January 29, 2013

Page 2: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

Fix a metric space (X , d) and a function f : X → R.

Slope: “Fastest instantaneous rate of decrease”

|∇f |(x) := limsupx→x

f (x)− f (x)d(x, x) .

• If f : Rn → R is smooth, then |∇f |(x) = |∇f (x)|.

Critical points:

x is critical for f ⇐⇒ |∇f |(x) = 0.

Eg:

Origin is critical

2/35

Page 3: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

Fix a metric space (X , d) and a function f : X → R.

Slope: “Fastest instantaneous rate of decrease”

|∇f |(x) := limsupx→x

f (x)− f (x)d(x, x) .

• If f : Rn → R is smooth, then |∇f |(x) = |∇f (x)|.

Critical points:

x is critical for f ⇐⇒ |∇f |(x) = 0.

Eg:

Origin is critical

2/35

Page 4: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

Fix a metric space (X , d) and a function f : X → R.

Slope: “Fastest instantaneous rate of decrease”

|∇f |(x) := limsupx→x

f (x)− f (x)d(x, x) .

• If f : Rn → R is smooth, then |∇f |(x) = |∇f (x)|.

Critical points:

x is critical for f ⇐⇒ |∇f |(x) = 0.

Eg:

Origin is critical

2/35

Page 5: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

Fix a metric space (X , d) and a function f : X → R.

Slope: “Fastest instantaneous rate of decrease”

|∇f |(x) := limsupx→x

f (x)− f (x)d(x, x) .

• If f : Rn → R is smooth, then |∇f |(x) = |∇f (x)|.

Critical points:

x is critical for f ⇐⇒ |∇f |(x) = 0.

Eg:

Origin is critical

2/35

Page 6: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

Fix a metric space (X , d) and a function f : X → R.

Slope: “Fastest instantaneous rate of decrease”

|∇f |(x) := limsupx→x

f (x)− f (x)d(x, x) .

• If f : Rn → R is smooth, then |∇f |(x) = |∇f (x)|.

Critical points:

x is critical for f ⇐⇒ |∇f |(x) = 0.

Eg:

Origin is critical

2/35

Page 7: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

Fix a metric space (X , d) and a function f : X → R.

Slope: “Fastest instantaneous rate of decrease”

|∇f |(x) := limsupx→x

f (x)− f (x)d(x, x) .

• If f : Rn → R is smooth, then |∇f |(x) = |∇f (x)|.

Critical points:

x is critical for f ⇐⇒ |∇f |(x) = 0.Eg:

Origin is critical 2/35

Page 8: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

Method of alternating projectionsCommon problem:

Given sets A,B ⊂ Rn , find some point x ∈ A ∩ B.

A

Bx

Distance and projection:

dB(x) = miny∈B|x − y| and PB(x) = {nearest points of B to x}.

Finding points in PA and PB is often easy!

Eg 1 (simple example): Linear programming:

{x : x ≥ 0}⋂{x : Ax = b}.

Eg 2 (more interesting): Low-order control:

{X � 0 : rankX ≤ r}⋂{X : A(X) = b}.

3/35

Page 9: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

Method of alternating projectionsCommon problem:

Given sets A,B ⊂ Rn , find some point x ∈ A ∩ B.

A

Bx

Distance and projection:

dB(x) = miny∈B|x − y| and PB(x) = {nearest points of B to x}.

Finding points in PA and PB is often easy!

Eg 1 (simple example): Linear programming:

{x : x ≥ 0}⋂{x : Ax = b}.

Eg 2 (more interesting): Low-order control:

{X � 0 : rankX ≤ r}⋂{X : A(X) = b}.

3/35

Page 10: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

Method of alternating projectionsCommon problem:

Given sets A,B ⊂ Rn , find some point x ∈ A ∩ B.

A

Bx

Distance and projection:

dB(x) = miny∈B|x − y| and PB(x) = {nearest points of B to x}.

Finding points in PA and PB is often easy!

Eg 1 (simple example): Linear programming:

{x : x ≥ 0}⋂{x : Ax = b}.

Eg 2 (more interesting): Low-order control:

{X � 0 : rankX ≤ r}⋂{X : A(X) = b}.

3/35

Page 11: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

Method of alternating projectionsCommon problem:

Given sets A,B ⊂ Rn , find some point x ∈ A ∩ B.

A

Bx

Distance and projection:

dB(x) = miny∈B|x − y| and PB(x) = {nearest points of B to x}.

Finding points in PA and PB is often easy!

Eg 1 (simple example): Linear programming:

{x : x ≥ 0}⋂{x : Ax = b}.

Eg 2 (more interesting): Low-order control:

{X � 0 : rankX ≤ r}⋂{X : A(X) = b}.

3/35

Page 12: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

Method of alternating projectionsCommon problem:

Given sets A,B ⊂ Rn , find some point x ∈ A ∩ B.

A

Bx

Distance and projection:

dB(x) = miny∈B|x − y| and PB(x) = {nearest points of B to x}.

Finding points in PA and PB is often easy!

Eg 1 (simple example): Linear programming:

{x : x ≥ 0}⋂{x : Ax = b}.

Eg 2 (more interesting): Low-order control:

{X � 0 : rankX ≤ r}⋂{X : A(X) = b}.

3/35

Page 13: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

Method of alternating projections

Method of alternating projections (von Neumann ’33):

xk+1 ∈ PB(xk)xk+2 ∈ PA(xk+1)

A

B

x0

The “angle” between A and B drives the convergence!

Quantifying the angle:

ψ(x, y) :={|x − y| if x ∈ A, y ∈ B+∞ otherwise

Comparison of |∇ψy|(x) and |∇ψx |(y) quantifies the angle!

4/35

Page 14: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

Method of alternating projections

Method of alternating projections (von Neumann ’33):

xk+1 ∈ PB(xk)xk+2 ∈ PA(xk+1)

A

B

x0

The “angle” between A and B drives the convergence!

Quantifying the angle:

ψ(x, y) :={|x − y| if x ∈ A, y ∈ B+∞ otherwise

Comparison of |∇ψy|(x) and |∇ψx |(y) quantifies the angle!

4/35

Page 15: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

Method of alternating projections

Method of alternating projections (von Neumann ’33):

xk+1 ∈ PB(xk)xk+2 ∈ PA(xk+1)

A

B

x0

The “angle” between A and B drives the convergence!

Quantifying the angle:

ψ(x, y) :={|x − y| if x ∈ A, y ∈ B+∞ otherwise

Comparison of |∇ψy|(x) and |∇ψx |(y) quantifies the angle!

4/35

Page 16: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

Outline

• Variational geometry:— Subdifferentials— The semi-algebraic case— Slope and error bounds

• Applications:— Alternating projections and transversality— Steepest descent trajectories

• To smooth or not to smooth?

• Active sets in optimization

5/35

Page 17: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

SubdifferentialsFréchet subdifferential:

v ∈ ∂f (x) ⇐⇒ x is critical for f − 〈v, ·〉,

or equivalentlyv ∈ ∂f (x) ⇐⇒ f (x) ≥ f (x) + 〈v, x − x〉+ o(|x − x|).

• Eg: If f is smooth ∂f = {∇f }.Example:

∂(−|·|)(x) =

1 if x < 0∅ if x = 0−1 if x > 0

y = −|x|

Subdifferential graph:

gph ∂f := {(x, v) ∈ Rn ×Rn : v ∈ ∂f (x)}.

6/35

Page 18: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

SubdifferentialsFréchet subdifferential:

v ∈ ∂f (x) ⇐⇒ x is critical for f − 〈v, ·〉,or equivalently

v ∈ ∂f (x) ⇐⇒ f (x) ≥ f (x) + 〈v, x − x〉+ o(|x − x|).

• Eg: If f is smooth ∂f = {∇f }.Example:

∂(−|·|)(x) =

1 if x < 0∅ if x = 0−1 if x > 0

y = −|x|

Subdifferential graph:

gph ∂f := {(x, v) ∈ Rn ×Rn : v ∈ ∂f (x)}.

6/35

Page 19: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

SubdifferentialsFréchet subdifferential:

v ∈ ∂f (x) ⇐⇒ x is critical for f − 〈v, ·〉,or equivalently

v ∈ ∂f (x) ⇐⇒ f (x) ≥ f (x) + 〈v, x − x〉+ o(|x − x|).• Eg: If f is smooth ∂f = {∇f }.

Example:

∂(−|·|)(x) =

1 if x < 0∅ if x = 0−1 if x > 0

y = −|x|

Subdifferential graph:

gph ∂f := {(x, v) ∈ Rn ×Rn : v ∈ ∂f (x)}.

6/35

Page 20: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

SubdifferentialsFréchet subdifferential:

v ∈ ∂f (x) ⇐⇒ x is critical for f − 〈v, ·〉,or equivalently

v ∈ ∂f (x) ⇐⇒ f (x) ≥ f (x) + 〈v, x − x〉+ o(|x − x|).• Eg: If f is smooth ∂f = {∇f }.

Example:

∂(−|·|)(x) =

1 if x < 0∅ if x = 0−1 if x > 0

y = −|x|

Subdifferential graph:

gph ∂f := {(x, v) ∈ Rn ×Rn : v ∈ ∂f (x)}.

6/35

Page 21: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

SubdifferentialsFréchet subdifferential:

v ∈ ∂f (x) ⇐⇒ x is critical for f − 〈v, ·〉,or equivalently

v ∈ ∂f (x) ⇐⇒ f (x) ≥ f (x) + 〈v, x − x〉+ o(|x − x|).• Eg: If f is smooth ∂f = {∇f }.

Example:

∂(−|·|)(x) =

1 if x < 0∅ if x = 0−1 if x > 0

y = −|x|

Subdifferential graph:

gph ∂f := {(x, v) ∈ Rn ×Rn : v ∈ ∂f (x)}.6/35

Page 22: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

Subdifferentials

Fundamental theoretical and computational hurdle:1. |∇f | is not lsc,2. gph ∂f is not closed.

Limiting slope:

|∇f |(x) := liminfx→x

|∇f |(x).

Limiting subdifferential:

gph ∂f := cl (gph ∂f ).

Relationship (Ioffe ’00):

|∇f |(x) = dist (0, ∂f (x)).

7/35

Page 23: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

Subdifferentials

Fundamental theoretical and computational hurdle:1. |∇f | is not lsc,2. gph ∂f is not closed.

Limiting slope:

|∇f |(x) := liminfx→x

|∇f |(x).

Limiting subdifferential:

gph ∂f := cl (gph ∂f ).

Relationship (Ioffe ’00):

|∇f |(x) = dist (0, ∂f (x)).

7/35

Page 24: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

Subdifferentials

Fundamental theoretical and computational hurdle:1. |∇f | is not lsc,2. gph ∂f is not closed.

Limiting slope:

|∇f |(x) := liminfx→x

|∇f |(x).

Limiting subdifferential:

gph ∂f := cl (gph ∂f ).

Relationship (Ioffe ’00):

|∇f |(x) = dist (0, ∂f (x)).

7/35

Page 25: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

Subdifferentials

Are these notions adequate?

Pathology: gph ∂f can be very large!

What do we expect the size of gph ∂f to be?

• If f : Rn → R is smooth, then gph ∂f is n-dimensionalsmooth manifold.

• If f : Rn → R is convex, then gph ∂f is n-dimensionalLipschitz manifold (Minty ’62).

Multiple authors (Rockafellar, Borwein, Wang,. . .):There are functions f : Rn → R with “2n-dimensional” gph ∂f .

8/35

Page 26: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

Subdifferentials

Are these notions adequate?

Pathology: gph ∂f can be very large!

What do we expect the size of gph ∂f to be?

• If f : Rn → R is smooth, then gph ∂f is n-dimensionalsmooth manifold.

• If f : Rn → R is convex, then gph ∂f is n-dimensionalLipschitz manifold (Minty ’62).

Multiple authors (Rockafellar, Borwein, Wang,. . .):There are functions f : Rn → R with “2n-dimensional” gph ∂f .

8/35

Page 27: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

Subdifferentials

Are these notions adequate?

Pathology: gph ∂f can be very large!

What do we expect the size of gph ∂f to be?

• If f : Rn → R is smooth, then gph ∂f is n-dimensionalsmooth manifold.

• If f : Rn → R is convex, then gph ∂f is n-dimensionalLipschitz manifold (Minty ’62).

Multiple authors (Rockafellar, Borwein, Wang,. . .):There are functions f : Rn → R with “2n-dimensional” gph ∂f .

8/35

Page 28: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

Subdifferentials

Are these notions adequate?

Pathology: gph ∂f can be very large!

What do we expect the size of gph ∂f to be?

• If f : Rn → R is smooth, then gph ∂f is n-dimensionalsmooth manifold.

• If f : Rn → R is convex, then gph ∂f is n-dimensionalLipschitz manifold (Minty ’62).

Multiple authors (Rockafellar, Borwein, Wang,. . .):There are functions f : Rn → R with “2n-dimensional” gph ∂f .

8/35

Page 29: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

Semi-algebraic geometry

Q ⊂ Rn is semi-algebraic if it is a finite union of

solution sets to finitely many polynomial inequalities.

Semi-algebraicity is robust (Tarski-Seidenberg theorem).Eg:f semi-algebraic =⇒ gph ∂f and |∇f | are semi-algebraic.

Semi-algebraic Q “stratify” into finitely many manifolds {Mi}.Dimension:

dimQ := maxi=1,...,k

{dimMi}.

9/35

Page 30: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

Semi-algebraic geometry

Q ⊂ Rn is semi-algebraic if it is a finite union of

solution sets to finitely many polynomial inequalities.

Semi-algebraicity is robust (Tarski-Seidenberg theorem).Eg:f semi-algebraic =⇒ gph ∂f and |∇f | are semi-algebraic.

Semi-algebraic Q “stratify” into finitely many manifolds {Mi}.Dimension:

dimQ := maxi=1,...,k

{dimMi}.

9/35

Page 31: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

Semi-algebraic geometry

Q ⊂ Rn is semi-algebraic if it is a finite union of

solution sets to finitely many polynomial inequalities.

Semi-algebraicity is robust (Tarski-Seidenberg theorem).Eg:f semi-algebraic =⇒ gph ∂f and |∇f | are semi-algebraic.

Semi-algebraic Q “stratify” into finitely many manifolds {Mi}.Dimension:

dimQ := maxi=1,...,k

{dimMi}.

9/35

Page 32: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

Semi-algebraic geometry

Theorem (D-Ioffe-Lewis)For semi-algebraic f : Rn → R, we have

dim gph ∂f = n,

even locally around any point in gph ∂f .

Conclusion: Criticality is meaningful for concrete variationalproblems!

Semi-algebraic f have only finitely many critical values(cf. Sard’s Theorem) ⇒ intervals (a, b) of non-critical values.

What can we learn from non-criticality?

10/35

Page 33: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

Semi-algebraic geometry

Theorem (D-Ioffe-Lewis)For semi-algebraic f : Rn → R, we have

dim gph ∂f = n,

even locally around any point in gph ∂f .

Conclusion: Criticality is meaningful for concrete variationalproblems!

Semi-algebraic f have only finitely many critical values(cf. Sard’s Theorem) ⇒ intervals (a, b) of non-critical values.

What can we learn from non-criticality?

10/35

Page 34: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

Semi-algebraic geometry

Theorem (D-Ioffe-Lewis)For semi-algebraic f : Rn → R, we have

dim gph ∂f = n,

even locally around any point in gph ∂f .

Conclusion: Criticality is meaningful for concrete variationalproblems!

Semi-algebraic f have only finitely many critical values(cf. Sard’s Theorem)

⇒ intervals (a, b) of non-critical values.

What can we learn from non-criticality?

10/35

Page 35: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

Semi-algebraic geometry

Theorem (D-Ioffe-Lewis)For semi-algebraic f : Rn → R, we have

dim gph ∂f = n,

even locally around any point in gph ∂f .

Conclusion: Criticality is meaningful for concrete variationalproblems!

Semi-algebraic f have only finitely many critical values(cf. Sard’s Theorem) ⇒ intervals (a, b) of non-critical values.

What can we learn from non-criticality?

10/35

Page 36: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

Semi-algebraic geometry

Theorem (D-Ioffe-Lewis)For semi-algebraic f : Rn → R, we have

dim gph ∂f = n,

even locally around any point in gph ∂f .

Conclusion: Criticality is meaningful for concrete variationalproblems!

Semi-algebraic f have only finitely many critical values(cf. Sard’s Theorem) ⇒ intervals (a, b) of non-critical values.

What can we learn from non-criticality?

10/35

Page 37: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

Slope and error boundsCommon problem: Estimate

dist (x, [f ≤ r ]) (difficult).

“The residual”:f (x)− r (easy).

Desirable quality: Exists κ with

dist (x, [f ≤ r ]) ≤ κ(f (x)− r).

Restrict f : Rn → R to a “slice” f−1(a, b).

Lemma (Error bound)The following are equivalent.Non-criticality:

|∇f | ≥ 1κ.

Error-bound:

dist (x, [f ≤ r ]) ≤ κ(f (x)− r), when r ∈ (a, f (x)).

11/35

Page 38: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

Slope and error boundsCommon problem: Estimate

dist (x, [f ≤ r ]) (difficult).

“The residual”:f (x)− r (easy).

Desirable quality: Exists κ with

dist (x, [f ≤ r ]) ≤ κ(f (x)− r).

Restrict f : Rn → R to a “slice” f−1(a, b).

Lemma (Error bound)The following are equivalent.Non-criticality:

|∇f | ≥ 1κ.

Error-bound:

dist (x, [f ≤ r ]) ≤ κ(f (x)− r), when r ∈ (a, f (x)).

11/35

Page 39: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

Slope and error boundsCommon problem: Estimate

dist (x, [f ≤ r ]) (difficult).

“The residual”:f (x)− r (easy).

Desirable quality: Exists κ with

dist (x, [f ≤ r ]) ≤ κ(f (x)− r).

Restrict f : Rn → R to a “slice” f−1(a, b).

Lemma (Error bound)The following are equivalent.Non-criticality:

|∇f | ≥ 1κ.

Error-bound:

dist (x, [f ≤ r ]) ≤ κ(f (x)− r), when r ∈ (a, f (x)).

11/35

Page 40: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

Slope and error boundsCommon problem: Estimate

dist (x, [f ≤ r ]) (difficult).

“The residual”:f (x)− r (easy).

Desirable quality: Exists κ with

dist (x, [f ≤ r ]) ≤ κ(f (x)− r).

Restrict f : Rn → R to a “slice” f−1(a, b).

Lemma (Error bound)The following are equivalent.Non-criticality:

|∇f | ≥ 1κ.

Error-bound:

dist (x, [f ≤ r ]) ≤ κ(f (x)− r), when r ∈ (a, f (x)).

11/35

Page 41: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

Slope and error boundsCommon problem: Estimate

dist (x, [f ≤ r ]) (difficult).

“The residual”:f (x)− r (easy).

Desirable quality: Exists κ with

dist (x, [f ≤ r ]) ≤ κ(f (x)− r).

Restrict f : Rn → R to a “slice” f−1(a, b).

Lemma (Error bound)The following are equivalent.Non-criticality:

|∇f | ≥ 1κ.

Error-bound:

dist (x, [f ≤ r ]) ≤ κ(f (x)− r), when r ∈ (a, f (x)).

11/35

Page 42: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

Alternating projections & transversality

12/35

Page 43: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

Convergence of alternating projections

Indicator function:

δA(x) ={

0 if x ∈ A∞ if x /∈ A

Coupling function:

ψ(x, y) = δA(x) + |x − y|+ δB(y).

A

B

x0

Sufficient condition: Exists κ > 0 withmax {|∇ψx |(y), |∇ψy|(x)} ≥ κ

for all x ∈ A and y ∈ B, not in A ∩ B. (Proof: Error bound)

Equivalently: For any v = x−y|x−y| , either

dist(v, ∂δB(y)

)≥ κ or dist

(− v, ∂δA(x)

)≥ κ.

Normal cone: NB(y) := ∂δB(y).

13/35

Page 44: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

Convergence of alternating projectionsIndicator function:

δA(x) ={

0 if x ∈ A∞ if x /∈ A

Coupling function:

ψ(x, y) = δA(x) + |x − y|+ δB(y).

A

B

x0

Sufficient condition: Exists κ > 0 withmax {|∇ψx |(y), |∇ψy|(x)} ≥ κ

for all x ∈ A and y ∈ B, not in A ∩ B. (Proof: Error bound)

Equivalently: For any v = x−y|x−y| , either

dist(v, ∂δB(y)

)≥ κ or dist

(− v, ∂δA(x)

)≥ κ.

Normal cone: NB(y) := ∂δB(y).

13/35

Page 45: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

Convergence of alternating projectionsIndicator function:

δA(x) ={

0 if x ∈ A∞ if x /∈ A

Coupling function:

ψ(x, y) = δA(x) + |x − y|+ δB(y).

A

B

x0

Sufficient condition: Exists κ > 0 withmax {|∇ψx |(y), |∇ψy|(x)} ≥ κ

for all x ∈ A and y ∈ B, not in A ∩ B. (Proof: Error bound)

Equivalently: For any v = x−y|x−y| , either

dist(v, ∂δB(y)

)≥ κ or dist

(− v, ∂δA(x)

)≥ κ.

Normal cone: NB(y) := ∂δB(y).

13/35

Page 46: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

Convergence of alternating projectionsIndicator function:

δA(x) ={

0 if x ∈ A∞ if x /∈ A

Coupling function:

ψ(x, y) = δA(x) + |x − y|+ δB(y).

A

B

x0

Sufficient condition: Exists κ > 0 withmax {|∇ψx |(y), |∇ψy|(x)} ≥ κ

for all x ∈ A and y ∈ B, not in A ∩ B.

(Proof: Error bound)

Equivalently: For any v = x−y|x−y| , either

dist(v, ∂δB(y)

)≥ κ or dist

(− v, ∂δA(x)

)≥ κ.

Normal cone: NB(y) := ∂δB(y).

13/35

Page 47: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

Convergence of alternating projectionsIndicator function:

δA(x) ={

0 if x ∈ A∞ if x /∈ A

Coupling function:

ψ(x, y) = δA(x) + |x − y|+ δB(y).

A

B

x0

Sufficient condition: Exists κ > 0 withmax {|∇ψx |(y), |∇ψy|(x)} ≥ κ

for all x ∈ A and y ∈ B, not in A ∩ B. (Proof: Error bound)

Equivalently: For any v = x−y|x−y| , either

dist(v, ∂δB(y)

)≥ κ or dist

(− v, ∂δA(x)

)≥ κ.

Normal cone: NB(y) := ∂δB(y).

13/35

Page 48: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

Convergence of alternating projectionsIndicator function:

δA(x) ={

0 if x ∈ A∞ if x /∈ A

Coupling function:

ψ(x, y) = δA(x) + |x − y|+ δB(y).

A

B

x0

Sufficient condition: Exists κ > 0 withmax {|∇ψx |(y), |∇ψy|(x)} ≥ κ

for all x ∈ A and y ∈ B, not in A ∩ B. (Proof: Error bound)

Equivalently: For any v = x−y|x−y| , either

dist(v, ∂δB(y)

)≥ κ or dist

(− v, ∂δA(x)

)≥ κ.

Normal cone: NB(y) := ∂δB(y).

13/35

Page 49: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

Convergence of alternating projectionsIndicator function:

δA(x) ={

0 if x ∈ A∞ if x /∈ A

Coupling function:

ψ(x, y) = δA(x) + |x − y|+ δB(y).

A

B

x0

Sufficient condition: Exists κ > 0 withmax {|∇ψx |(y), |∇ψy|(x)} ≥ κ

for all x ∈ A and y ∈ B, not in A ∩ B. (Proof: Error bound)

Equivalently: For any v = x−y|x−y| , either

dist(v, ∂δB(y)

)≥ κ or dist

(− v, ∂δA(x)

)≥ κ.

Normal cone: NB(y) := ∂δB(y).13/35

Page 50: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

Convergence of alternating projections

dist(v,NB(y)

)≥ κ or dist

(v,−NA(x)

)≥ κ

x1

x4

x3x2

x5

x6x7Q

Figure: Normal cones

Transversality: NA(x) ∩ −NB(x) = {0}.

Local convergence (D-Ioffe-Lewis ’13):

A and B transverse at x =⇒ local R-linear convergence.

14/35

Page 51: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

Convergence of alternating projections

dist(v,NB(y)

)≥ κ or dist

(v,−NA(x)

)≥ κ

x1

x4

x3x2

x5

x6x7Q

Figure: Normal cones

Transversality: NA(x) ∩ −NB(x) = {0}.

Local convergence (D-Ioffe-Lewis ’13):

A and B transverse at x =⇒ local R-linear convergence.

14/35

Page 52: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

Convergence of alternating projections

dist(v,NB(y)

)≥ κ or dist

(v,−NA(x)

)≥ κ

x1

x4

x3x2

x5

x6x7Q

Figure: Normal cones

Transversality: NA(x) ∩ −NB(x) = {0}.

Local convergence (D-Ioffe-Lewis ’13):

A and B transverse at x =⇒ local R-linear convergence.

14/35

Page 53: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

Convergence of alternating projections

dist(v,NB(y)

)≥ κ or dist

(v,−NA(x)

)≥ κ

x1

x4

x3x2

x5

x6x7Q

Figure: Normal cones

Transversality: NA(x) ∩ −NB(x) = {0}.

Local convergence (D-Ioffe-Lewis ’13):

A and B transverse at x =⇒ local R-linear convergence.

14/35

Page 54: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

Steepest descent trajectories

15/35

Page 55: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

Steepest descent trajectoriesNotation: X is a complete metric space.

Bounded speed: A curve x : [0,T ]→ X is 1-Lipschitz if

dist (x(t), x(s)) ≤ |t − s|.

What are trajectories of steepest descent?

Motivation: 1-Lipschitz curves x : [0,T ]→ X satisfy

|∇(f ◦ x)| ≤ |∇f |(x).

Definition (Trajectories of near-steepest descent)A curve x : [0,T ]→ X is a trajectory of near-steepest descent if• x is 1-Lipschitz,• f ◦ x is decreasing,• |(f ◦ x)′| ≥ |∇f |(x), a.e. on [0,T ].

16/35

Page 56: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

Steepest descent trajectoriesNotation: X is a complete metric space.Bounded speed: A curve x : [0,T ]→ X is 1-Lipschitz if

dist (x(t), x(s)) ≤ |t − s|.

What are trajectories of steepest descent?

Motivation: 1-Lipschitz curves x : [0,T ]→ X satisfy

|∇(f ◦ x)| ≤ |∇f |(x).

Definition (Trajectories of near-steepest descent)A curve x : [0,T ]→ X is a trajectory of near-steepest descent if• x is 1-Lipschitz,• f ◦ x is decreasing,• |(f ◦ x)′| ≥ |∇f |(x), a.e. on [0,T ].

16/35

Page 57: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

Steepest descent trajectoriesNotation: X is a complete metric space.Bounded speed: A curve x : [0,T ]→ X is 1-Lipschitz if

dist (x(t), x(s)) ≤ |t − s|.

What are trajectories of steepest descent?

Motivation: 1-Lipschitz curves x : [0,T ]→ X satisfy

|∇(f ◦ x)| ≤ |∇f |(x).

Definition (Trajectories of near-steepest descent)A curve x : [0,T ]→ X is a trajectory of near-steepest descent if• x is 1-Lipschitz,• f ◦ x is decreasing,• |(f ◦ x)′| ≥ |∇f |(x), a.e. on [0,T ].

16/35

Page 58: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

Steepest descent trajectoriesNotation: X is a complete metric space.Bounded speed: A curve x : [0,T ]→ X is 1-Lipschitz if

dist (x(t), x(s)) ≤ |t − s|.

What are trajectories of steepest descent?

Motivation: 1-Lipschitz curves x : [0,T ]→ X satisfy

|∇(f ◦ x)| ≤ |∇f |(x).

Definition (Trajectories of near-steepest descent)A curve x : [0,T ]→ X is a trajectory of near-steepest descent if• x is 1-Lipschitz,• f ◦ x is decreasing,• |(f ◦ x)′| ≥ |∇f |(x), a.e. on [0,T ].

16/35

Page 59: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

Steepest descent trajectoriesNotation: X is a complete metric space.Bounded speed: A curve x : [0,T ]→ X is 1-Lipschitz if

dist (x(t), x(s)) ≤ |t − s|.

What are trajectories of steepest descent?

Motivation: 1-Lipschitz curves x : [0,T ]→ X satisfy

|∇(f ◦ x)| ≤ |∇f |(x).

Definition (Trajectories of near-steepest descent)A curve x : [0,T ]→ X is a trajectory of near-steepest descent if• x is 1-Lipschitz,• f ◦ x is decreasing,• |(f ◦ x)′| ≥ |∇f |(x), a.e. on [0,T ].

16/35

Page 60: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

Example

Figure: f (x, y) = max{x + y, |x − y|}+ x(x + 1) + y(y + 1) + 10017/35

Page 61: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

Subgradient dynamical systems

Theorem (D-Ioffe-Lewis)For reasonable f : Rn → R and a curve x : [0,T ]→ Rn the followingare equivalent.1. x is a curve of near-steepest descent,2. f ◦ x is decreasing and (after reparametrizing)

x ∈ −∂f (x), a.e. on [0,T ].

Remark: When 2. holds,

x is the shortest element of −∂f (x), a.e. on [0,T ].

Reasonable conditions: f is smooth, convex, or semi-algebraic.

18/35

Page 62: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

Subgradient dynamical systems

Theorem (D-Ioffe-Lewis)For reasonable f : Rn → R and a curve x : [0,T ]→ Rn the followingare equivalent.1. x is a curve of near-steepest descent,2. f ◦ x is decreasing and (after reparametrizing)

x ∈ −∂f (x), a.e. on [0,T ].

Remark: When 2. holds,

x is the shortest element of −∂f (x), a.e. on [0,T ].

Reasonable conditions: f is smooth, convex, or semi-algebraic.

18/35

Page 63: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

Subgradient dynamical systems

Theorem (D-Ioffe-Lewis)For reasonable f : Rn → R and a curve x : [0,T ]→ Rn the followingare equivalent.1. x is a curve of near-steepest descent,2. f ◦ x is decreasing and (after reparametrizing)

x ∈ −∂f (x), a.e. on [0,T ].

Remark: When 2. holds,

x is the shortest element of −∂f (x), a.e. on [0,T ].

Reasonable conditions: f is smooth, convex, or semi-algebraic.

18/35

Page 64: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

Existence

Theorem (Ambrosio et al. ’05, De Giorgi ’93)For reasonable f : X → R, there exist curves of near-steepestdescent starting from any point.

Proof ingredients:• Moreau-Yosida approximation:

xk+1 = argminx∈X

{f (x) + 12τ d

2(x, xk)}.

• Extraneous topologies ⇒ existence of minimizers andconvergence.

Proof is opaque and uses heavy machinery!

19/35

Page 65: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

Existence

Theorem (Ambrosio et al. ’05, De Giorgi ’93)For reasonable f : X → R, there exist curves of near-steepestdescent starting from any point.

Proof ingredients:• Moreau-Yosida approximation:

xk+1 = argminx∈X

{f (x) + 12τ d

2(x, xk)}.

• Extraneous topologies ⇒ existence of minimizers andconvergence.

Proof is opaque and uses heavy machinery!

19/35

Page 66: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

Existence

Theorem (Ambrosio et al. ’05, De Giorgi ’93)For reasonable f : X → R, there exist curves of near-steepestdescent starting from any point.

Proof ingredients:• Moreau-Yosida approximation:

xk+1 = argminx∈X

{f (x) + 12τ d

2(x, xk)}.

• Extraneous topologies ⇒ existence of minimizers andconvergence.

Proof is opaque and uses heavy machinery!

19/35

Page 67: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

New proof idea: Error bound lemma

Equipartition: For η > 0,

f (x0)− η = τ0 < . . . < τk = f (x0).

Initialize: j = 0;while i ≤ k do

xj+1 ← P[f≤τj+1](xj);end

Consider resulting trajectories as k →∞.

20/35

Page 68: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

New proof idea: Error bound lemma

Equipartition: For η > 0,

f (x0)− η = τ0 < . . . < τk = f (x0).

Initialize: j = 0;while i ≤ k do

xj+1 ← P[f≤τj+1](xj);end

Consider resulting trajectories as k →∞.

20/35

Page 69: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

New proof idea: Error bound lemma

Equipartition: For η > 0,

f (x0)− η = τ0 < . . . < τk = f (x0).

Initialize: j = 0;while i ≤ k do

xj+1 ← P[f≤τj+1](xj);end

Consider resulting trajectories as k →∞.

20/35

Page 70: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

Lengths of steepest descent curves

One motivation: Algorithm complexity (Eg: Attouch et al. ’11).

Theorem (D-Ioffe-Lewis)If f is semi-algebraic, then any bounded trajectory ofnear-steepest descent (with maximal domain) has bounded lengthand converges to a critical point.

Historic remarks:• Lojasiewicz famously proved it for real analytic functions.• True for convex f (Daniilidis et al. ’12).• Not true for C∞ functions (Palis, de Melo).

21/35

Page 71: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

Lengths of steepest descent curves

One motivation: Algorithm complexity (Eg: Attouch et al. ’11).

Theorem (D-Ioffe-Lewis)If f is semi-algebraic, then any bounded trajectory ofnear-steepest descent (with maximal domain) has bounded lengthand converges to a critical point.

Historic remarks:• Lojasiewicz famously proved it for real analytic functions.• True for convex f (Daniilidis et al. ’12).• Not true for C∞ functions (Palis, de Melo).

21/35

Page 72: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

Lengths of steepest descent curves

One motivation: Algorithm complexity (Eg: Attouch et al. ’11).

Theorem (D-Ioffe-Lewis)If f is semi-algebraic, then any bounded trajectory ofnear-steepest descent (with maximal domain) has bounded lengthand converges to a critical point.

Historic remarks:• Lojasiewicz famously proved it for real analytic functions.• True for convex f (Daniilidis et al. ’12).• Not true for C∞ functions (Palis, de Melo).

21/35

Page 73: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

To smooth or not to smooth?

22/35

Page 74: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

Approximation of functionsMollification:

fε(x) := (f ∗ φε)(x) =∫

Rnφε(x − y)f (y) dy

where φε is a scaled standard bump function φ

• Downside: Not much control on derivatives.

23/35

Page 75: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

Approximation of functionsMollification:

fε(x) := (f ∗ φε)(x) =∫

Rnφε(x − y)f (y) dy

where φε is a scaled standard bump function φ

• Downside: Not much control on derivatives.

23/35

Page 76: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

Approximation of functionsSet-up: Q ↪−→ Rn f−−→ R.

Q

Assume Q is a disjoint union of C∞ manifolds

Q = M1 ∪M2 ∪ . . . ∪Mk−1 ∪Mk

24/35

Page 77: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

Approximation of functionsMotivation: Integration by parts∫

Qg∆f dx

=∫∂Q

g〈∇f , n〉dS −∫

Q〈∇g,∇f 〉 dx.

Goal: approximate f by a C2−smooth f so that the ∇f ⊥ n.

Theorem (D-Larsson)Given a continuous f : Rn → R and any ε > 0, there exists aC1-smooth f satisfying1. Closeness: |f (x)− f (x)| < ε for all x ∈ Rn,2. Neumann Boundary condition:

x ∈ Mi =⇒ ∇f (x) ∈ TxMi .

provided {Mi} is a Whitney stratification of Q.

• For semi-algebraic Q, Whitney stratifications always exist!

25/35

Page 78: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

Approximation of functionsMotivation: Integration by parts∫

Qg∆f dx =

∫∂Q

g〈∇f , n〉dS −∫

Q〈∇g,∇f 〉 dx.

Goal: approximate f by a C2−smooth f so that the ∇f ⊥ n.

Theorem (D-Larsson)Given a continuous f : Rn → R and any ε > 0, there exists aC1-smooth f satisfying1. Closeness: |f (x)− f (x)| < ε for all x ∈ Rn,2. Neumann Boundary condition:

x ∈ Mi =⇒ ∇f (x) ∈ TxMi .

provided {Mi} is a Whitney stratification of Q.

• For semi-algebraic Q, Whitney stratifications always exist!

25/35

Page 79: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

Approximation of functionsMotivation: Integration by parts∫

Qg∆f dx =

∫∂Q

g〈∇f , n〉dS −∫

Q〈∇g,∇f 〉 dx.

Goal: approximate f by a C2−smooth f so that the ∇f ⊥ n.

Theorem (D-Larsson)Given a continuous f : Rn → R and any ε > 0, there exists aC1-smooth f satisfying1. Closeness: |f (x)− f (x)| < ε for all x ∈ Rn,2. Neumann Boundary condition:

x ∈ Mi =⇒ ∇f (x) ∈ TxMi .

provided {Mi} is a Whitney stratification of Q.

• For semi-algebraic Q, Whitney stratifications always exist!

25/35

Page 80: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

Approximation of functionsMotivation: Integration by parts∫

Qg∆f dx =

∫∂Q

g〈∇f , n〉dS −∫

Q〈∇g,∇f 〉 dx.

Goal: approximate f by a C2−smooth f so that the ∇f ⊥ n.

Theorem (D-Larsson)Given a continuous f : Rn → R and any ε > 0, there exists aC1-smooth f satisfying1. Closeness: |f (x)− f (x)| < ε for all x ∈ Rn,2. Neumann Boundary condition:

x ∈ Mi =⇒ ∇f (x) ∈ TxMi .

provided {Mi} is a Whitney stratification of Q.

• For semi-algebraic Q, Whitney stratifications always exist!

25/35

Page 81: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

Approximation of functionsMotivation: Integration by parts∫

Qg∆f dx =

∫∂Q

g〈∇f , n〉dS −∫

Q〈∇g,∇f 〉 dx.

Goal: approximate f by a C2−smooth f so that the ∇f ⊥ n.

Theorem (D-Larsson)Given a continuous f : Rn → R and any ε > 0, there exists aC1-smooth f satisfying1. Closeness: |f (x)− f (x)| < ε for all x ∈ Rn,2. Neumann Boundary condition:

x ∈ Mi =⇒ ∇f (x) ∈ TxMi .

provided {Mi} is a Whitney stratification of Q.

• For semi-algebraic Q, Whitney stratifications always exist!

25/35

Page 82: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

Approximation of functionsMotivation: Integration by parts∫

Qg∆f dx =

∫∂Q

g〈∇f , n〉dS −∫

Q〈∇g,∇f 〉 dx.

Goal: approximate f by a C2−smooth f so that the ∇f ⊥ n.

Theorem (D-Larsson)Given a continuous f : Rn → R and any ε > 0, there exists aC1-smooth f satisfying1. Closeness: |f (x)− f (x)| < ε for all x ∈ Rn,2. Neumann Boundary condition:

x ∈ Mi =⇒ ∇f (x) ∈ TxMi .

provided {Mi} is a Whitney stratification of Q.

• For semi-algebraic Q, Whitney stratifications always exist!25/35

Page 83: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

Approximation of functionsWhat about higher order of smoothness?

Need a more stringentstratification.

Definition (Normally flat stratification)The stratification {Mi} is normally flat if Mi ⊂ clMj implies

PMi (x) = PMi ◦ PMj (x) for all x near Mi and Mj .

Mi

Mj

x

PMj(x)

PMi(x) = PMi

◦ PMj(x)

• If {Mi} is normally flat, can guarantee f is C∞−smooth!

Example: partition of polyhedra into open faces is normally flat.

26/35

Page 84: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

Approximation of functionsWhat about higher order of smoothness? Need a more stringentstratification.

Definition (Normally flat stratification)The stratification {Mi} is normally flat if Mi ⊂ clMj implies

PMi (x) = PMi ◦ PMj (x) for all x near Mi and Mj .

Mi

Mj

x

PMj(x)

PMi(x) = PMi

◦ PMj(x)

• If {Mi} is normally flat, can guarantee f is C∞−smooth!

Example: partition of polyhedra into open faces is normally flat.

26/35

Page 85: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

Approximation of functionsWhat about higher order of smoothness? Need a more stringentstratification.

Definition (Normally flat stratification)The stratification {Mi} is normally flat if Mi ⊂ clMj implies

PMi (x) = PMi ◦ PMj (x) for all x near Mi and Mj .

Mi

Mj

x

PMj(x)

PMi(x) = PMi

◦ PMj(x)

• If {Mi} is normally flat, can guarantee f is C∞−smooth!

Example: partition of polyhedra into open faces is normally flat.

26/35

Page 86: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

Approximation of functionsWhat about higher order of smoothness? Need a more stringentstratification.

Definition (Normally flat stratification)The stratification {Mi} is normally flat if Mi ⊂ clMj implies

PMi (x) = PMi ◦ PMj (x) for all x near Mi and Mj .

Mi

Mj

x

PMj(x)

PMi(x) = PMi

◦ PMj(x)

• If {Mi} is normally flat, can guarantee f is C∞−smooth!

Example: partition of polyhedra into open faces is normally flat.

26/35

Page 87: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

Approximation of functionsWhat about higher order of smoothness? Need a more stringentstratification.

Definition (Normally flat stratification)The stratification {Mi} is normally flat if Mi ⊂ clMj implies

PMi (x) = PMi ◦ PMj (x) for all x near Mi and Mj .

Mi

Mj

x

PMj(x)

PMi(x) = PMi

◦ PMj(x)

• If {Mi} is normally flat, can guarantee f is C∞−smooth!

Example: partition of polyhedra into open faces is normally flat.26/35

Page 88: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

Approximation of functions

Any other examples?

Notation:• Sn are n × n symmetric matrices.• λ : Sn → Rn is the eigenvalue map

λ(A) = (λ1(A), . . . , λn(A)).

Normally flat stratifications “lift” (D-Larsson ’12):

A symmetric stratification {Mi} of Q ⊂ Rn is normally flat⇐⇒ stratification {λ−1(Mi)} of λ−1(Q) ⊂ Sn is normally flat.

27/35

Page 89: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

Approximation of functions

Any other examples?

Notation:• Sn are n × n symmetric matrices.• λ : Sn → Rn is the eigenvalue map

λ(A) = (λ1(A), . . . , λn(A)).

Normally flat stratifications “lift” (D-Larsson ’12):

A symmetric stratification {Mi} of Q ⊂ Rn is normally flat⇐⇒ stratification {λ−1(Mi)} of λ−1(Q) ⊂ Sn is normally flat.

27/35

Page 90: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

Approximation of functions

Any other examples?

Notation:• Sn are n × n symmetric matrices.• λ : Sn → Rn is the eigenvalue map

λ(A) = (λ1(A), . . . , λn(A)).

Normally flat stratifications “lift” (D-Larsson ’12):

A symmetric stratification {Mi} of Q ⊂ Rn is normally flat⇐⇒ stratification {λ−1(Mi)} of λ−1(Q) ⊂ Sn is normally flat.

27/35

Page 91: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

Approximation of functions

Examples: Sn+ = {X ∈ Sn : X � 0},

Rn×n+ = {X ∈ Rn×n : det(X) ≥ 0},Bn

2 = {X ∈ Rn×n : σmax ≤ 1}.

=⇒ Strongest approximation results become available!

Applications:

1. Probability: properties of the matrix-valued Bessel Process(Larsson ’12)

2. PDEs: Geometry of Sobolev spaces.3. More forthcoming...

28/35

Page 92: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

Approximation of functions

Examples: Sn+ = {X ∈ Sn : X � 0},

Rn×n+ = {X ∈ Rn×n : det(X) ≥ 0},Bn

2 = {X ∈ Rn×n : σmax ≤ 1}.

=⇒ Strongest approximation results become available!

Applications:

1. Probability: properties of the matrix-valued Bessel Process(Larsson ’12)

2. PDEs: Geometry of Sobolev spaces.3. More forthcoming...

28/35

Page 93: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

Approximation of functions

Examples: Sn+ = {X ∈ Sn : X � 0},

Rn×n+ = {X ∈ Rn×n : det(X) ≥ 0},Bn

2 = {X ∈ Rn×n : σmax ≤ 1}.

=⇒ Strongest approximation results become available!

Applications:1. Probability: properties of the matrix-valued Bessel Process

(Larsson ’12)2. PDEs: Geometry of Sobolev spaces.3. More forthcoming...

28/35

Page 94: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

Active sets in optimization.

29/35

Page 95: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

Active sets in optimization

Figure: Q is 4× 4 Toeplitz spectrahedron

Definition (Partial Smoothness)A convex set Q is partly smooth relative toM⊂ Q if1. (Smoothness)M is a smooth manifold,2. (Sharpness) NM = spanNQ onM,3. (Continuity) NQ varies continuously onM.

30/35

Page 96: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

Active sets in optimization

Figure: Q is 4× 4 Toeplitz spectrahedron

Definition (Partial Smoothness)A convex set Q is partly smooth relative toM⊂ Q if1. (Smoothness)M is a smooth manifold,2. (Sharpness) NM = spanNQ onM,3. (Continuity) NQ varies continuously onM.

30/35

Page 97: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

Active sets in optimization

Why do optimizers care?

• Many optimization algorithms identifyM in finite time!

Eg: Gradient projection, Newton-like, proximal point.

=⇒ Acceleration strategies!

How to see this structure in eigenvalue optimization?

Partly smooth manifolds “lift” (Daniilidis-D-Lewis):

Q partly smooth relative toMm

λ−1(Q) partly smooth relative to λ−1(M)

provided Q is symmetric.

31/35

Page 98: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

Active sets in optimization

Why do optimizers care?

• Many optimization algorithms identifyM in finite time!

Eg: Gradient projection, Newton-like, proximal point.

=⇒ Acceleration strategies!

How to see this structure in eigenvalue optimization?

Partly smooth manifolds “lift” (Daniilidis-D-Lewis):

Q partly smooth relative toMm

λ−1(Q) partly smooth relative to λ−1(M)

provided Q is symmetric.

31/35

Page 99: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

Active sets in optimization

Why do optimizers care?

• Many optimization algorithms identifyM in finite time!

Eg: Gradient projection, Newton-like, proximal point.=⇒ Acceleration strategies!

How to see this structure in eigenvalue optimization?

Partly smooth manifolds “lift” (Daniilidis-D-Lewis):

Q partly smooth relative toMm

λ−1(Q) partly smooth relative to λ−1(M)

provided Q is symmetric.

31/35

Page 100: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

Active sets in optimization

Why do optimizers care?

• Many optimization algorithms identifyM in finite time!

Eg: Gradient projection, Newton-like, proximal point.=⇒ Acceleration strategies!

How to see this structure in eigenvalue optimization?

Partly smooth manifolds “lift” (Daniilidis-D-Lewis):

Q partly smooth relative toMm

λ−1(Q) partly smooth relative to λ−1(M)

provided Q is symmetric.

31/35

Page 101: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

Active sets in optimization

Why do optimizers care?

• Many optimization algorithms identifyM in finite time!

Eg: Gradient projection, Newton-like, proximal point.=⇒ Acceleration strategies!

How to see this structure in eigenvalue optimization?

Partly smooth manifolds “lift” (Daniilidis-D-Lewis):

Q partly smooth relative toMm

λ−1(Q) partly smooth relative to λ−1(M)

provided Q is symmetric.

31/35

Page 102: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

Active sets in optimization

Why do optimizers care?

• Many optimization algorithms identifyM in finite time!

Eg: Gradient projection, Newton-like, proximal point.=⇒ Acceleration strategies!

How to see this structure in eigenvalue optimization?

Partly smooth manifolds “lift” (Daniilidis-D-Lewis):

Q partly smooth relative toMm

λ−1(Q) partly smooth relative to λ−1(M)

provided Q is symmetric.

31/35

Page 103: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

Nonsmoothness under symmetry

Actions:permutation group Σ acts on Rn (permutation)

orthogonal group On acts on Sn (conjugation)

Correspondence:On-invariant (spectral)⇐⇒ Σ-invariant (symmetric).

Q is spectral ⇐⇒ Q = λ−1(M) withM symmetric.

Eg: {X : X � 0} = λ−1({x : x ≥ 0}

).

THE TRANSFER PRINCIPLE:Variational properties of Q andM are in 1-to-1 correspondence!

Eg: Convexity, stratifiability, Whitney conditions, normalflatness, partial smoothness, . . .

32/35

Page 104: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

Nonsmoothness under symmetry

Actions:permutation group Σ acts on Rn (permutation)orthogonal group On acts on Sn (conjugation)

Correspondence:On-invariant (spectral)⇐⇒ Σ-invariant (symmetric).

Q is spectral ⇐⇒ Q = λ−1(M) withM symmetric.

Eg: {X : X � 0} = λ−1({x : x ≥ 0}

).

THE TRANSFER PRINCIPLE:Variational properties of Q andM are in 1-to-1 correspondence!

Eg: Convexity, stratifiability, Whitney conditions, normalflatness, partial smoothness, . . .

32/35

Page 105: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

Nonsmoothness under symmetry

Actions:permutation group Σ acts on Rn (permutation)orthogonal group On acts on Sn (conjugation)

Correspondence:On-invariant (spectral)⇐⇒ Σ-invariant (symmetric).

Q is spectral ⇐⇒ Q = λ−1(M) withM symmetric.

Eg: {X : X � 0} = λ−1({x : x ≥ 0}

).

THE TRANSFER PRINCIPLE:Variational properties of Q andM are in 1-to-1 correspondence!

Eg: Convexity, stratifiability, Whitney conditions, normalflatness, partial smoothness, . . .

32/35

Page 106: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

Nonsmoothness under symmetry

Actions:permutation group Σ acts on Rn (permutation)orthogonal group On acts on Sn (conjugation)

Correspondence:On-invariant (spectral)⇐⇒ Σ-invariant (symmetric).

Q is spectral ⇐⇒ Q = λ−1(M) withM symmetric.

Eg: {X : X � 0} = λ−1({x : x ≥ 0}

).

THE TRANSFER PRINCIPLE:Variational properties of Q andM are in 1-to-1 correspondence!

Eg: Convexity, stratifiability, Whitney conditions, normalflatness, partial smoothness, . . .

32/35

Page 107: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

Nonsmoothness under symmetry

Actions:permutation group Σ acts on Rn (permutation)orthogonal group On acts on Sn (conjugation)

Correspondence:On-invariant (spectral)⇐⇒ Σ-invariant (symmetric).

Q is spectral ⇐⇒ Q = λ−1(M) withM symmetric.

Eg: {X : X � 0} = λ−1({x : x ≥ 0}

).

THE TRANSFER PRINCIPLE:Variational properties of Q andM are in 1-to-1 correspondence!

Eg: Convexity, stratifiability, Whitney conditions, normalflatness, partial smoothness, . . .

32/35

Page 108: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

Nonsmoothness under symmetry

Actions:permutation group Σ acts on Rn (permutation)orthogonal group On acts on Sn (conjugation)

Correspondence:On-invariant (spectral)⇐⇒ Σ-invariant (symmetric).

Q is spectral ⇐⇒ Q = λ−1(M) withM symmetric.

Eg: {X : X � 0} = λ−1({x : x ≥ 0}

).

THE TRANSFER PRINCIPLE:Variational properties of Q andM are in 1-to-1 correspondence!

Eg: Convexity, stratifiability, Whitney conditions, normalflatness, partial smoothness, . . .

32/35

Page 109: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

Nonsmoothness under symmetry

Actions:permutation group Σ acts on Rn (permutation)orthogonal group On acts on Sn (conjugation)

Correspondence:On-invariant (spectral)⇐⇒ Σ-invariant (symmetric).

Q is spectral ⇐⇒ Q = λ−1(M) withM symmetric.

Eg: {X : X � 0} = λ−1({x : x ≥ 0}

).

THE TRANSFER PRINCIPLE:Variational properties of Q andM are in 1-to-1 correspondence!

Eg: Convexity, stratifiability, Whitney conditions, normalflatness, partial smoothness, . . .

32/35

Page 110: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

Summary

Slope

Geometry

Analysis Algorithms

33/35

Page 111: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

Thank you.

34/35

Page 112: DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity ...ddrusv/UW_inter.pdfSlopeandgeometryinvariationalmathematics DmitriyDrusvyatskiy SchoolofORIE,CornellUniversity Jointworkwith ArisDaniilidis(Barcelona),AlexD.Ioffe(Technion

References• Optimality, identifiability, and sensitivity, D-Lewis,submitted to Math. Programming Ser. A.

• The dimension of semi-algebraic subdifferentialgraphs, D-Ioffe-Lewis. Nonlinear Analysis, 75(3),1231-1245, 2012.

• Semi-algebraic functions have small subdifferentials,D-Lewis, to appear in Math. Prog. Ser. B.

• Tilt stability, uniform quadratic growth, and strongmetric regularity of the subdifferential, D-Lewis, toappear in SIAM J. on Opt.

• Approximating functions on stratifiable domains,D-Larsson, submitted to the Trans. of the AMS.

• Generic nondegeneracy in convex optimization,D-Lewis, Proc. Amer. Soc. 139 (2011), 2519-2527.

• Trajectories of subgradient dynamical systems,D-Ioffe-Lewis, submitted to SIAM J. on Control and Opt.

• Spectral lifts of identifiable sets and partly smoothmanifolds, Daniilidis-D-Lewis, preprint.

Available at http://people.orie.cornell.edu/dd379/35/35