Slope and geometry in variational mathematics
Dmitriy Drusvyatskiy, School of ORIE, Cornell University
Joint work with Aris Daniilidis (Barcelona), Alex D. Ioffe (Technion), Martin Larsson (Lausanne),
and Adrian S. Lewis (Cornell)
January 29, 2013
Fix a metric space (X, d) and a function f : X → R.

Slope: “Fastest instantaneous rate of decrease”

    |∇f|(x̄) := limsup_{x → x̄} (f(x̄) − f(x)) / d(x, x̄).

• If f : Rⁿ → R is smooth, then |∇f|(x̄) = |∇f(x̄)|.

Critical points:

    x̄ is critical for f ⇐⇒ |∇f|(x̄) = 0.

Eg: [figure] the origin is critical.
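A quick numerical sanity check of the definition (my own illustration, not from the talk; the quadratic test function and the random sampling scheme are assumptions of the sketch):

```python
import numpy as np

def slope_estimate(f, x0, radius=1e-4, n_samples=50000, seed=0):
    """Crude estimate of |grad f|(x0) = limsup_{x -> x0} (f(x0) - f(x)) / d(x, x0)
    by sampling points in a small ball around x0."""
    rng = np.random.default_rng(seed)
    x0 = np.asarray(x0, dtype=float)
    best = 0.0
    for _ in range(n_samples):
        step = rng.normal(size=x0.shape)
        step *= radius * rng.random() / np.linalg.norm(step)   # small random step
        best = max(best, (f(x0) - f(x0 + step)) / np.linalg.norm(step))
    return best

f = lambda p: p[0] ** 2 + 3.0 * p[1] ** 2            # smooth test function
x0 = np.array([1.0, -0.5])
print(slope_estimate(f, x0))                          # roughly 3.6
print(np.linalg.norm([2 * x0[0], 6 * x0[1]]))         # |grad f(x0)| = sqrt(13) = 3.605...
```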
Method of alternating projections

Common problem:
    Given sets A, B ⊂ Rⁿ, find some point x ∈ A ∩ B.

[figure: sets A and B meeting at a point x]

Distance and projection:
    d_B(x) = min_{y ∈ B} |x − y|   and   P_B(x) = {nearest points of B to x}.

Finding points in P_A and P_B is often easy!

Eg 1 (simple example): Linear programming:
    {x : x ≥ 0} ∩ {x : Ax = b}.

Eg 2 (more interesting): Low-order control:
    {X ⪰ 0 : rank X ≤ r} ∩ {X : A(X) = b}.
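For instance (standard closed-form projections, not specific to the talk), both sets in Eg 1 are easy to project onto: assuming the matrix A has full row rank,

    P_{x ≥ 0}(x) = max(x, 0) componentwise,    P_{Ax = b}(x) = x − Aᵀ(AAᵀ)⁻¹(Ax − b).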
Method of alternating projections

Method of alternating projections (von Neumann ’33):
    x_{k+1} ∈ P_B(x_k),    x_{k+2} ∈ P_A(x_{k+1}).

[figure: the iterates zig-zagging between A and B, starting from x_0]

The “angle” between A and B drives the convergence!

Quantifying the angle:
    ψ(x, y) := |x − y|  if x ∈ A, y ∈ B;   +∞  otherwise.

Comparison of |∇ψ_y|(x) and |∇ψ_x|(y) quantifies the angle!
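A minimal sketch of the iteration on the linear-programming pair from the previous slide (my own illustrative code, not the talk's; here A, b denote the linear system data, A is assumed to have full row rank, and the two sets are assumed to intersect):

```python
import numpy as np

def proj_nonneg(x):
    """Nearest point of the set {x : x >= 0}: clamp negative entries."""
    return np.maximum(x, 0.0)

def proj_affine(x, A, b):
    """Nearest point of the set {x : Ax = b}; assumes A has full row rank."""
    return x - A.T @ np.linalg.solve(A @ A.T, A @ x - b)

def alternating_projections(x0, A, b, iters=200):
    x = x0
    for _ in range(iters):
        x = proj_affine(x, A, b)      # x_{k+1} in P_B(x_k)
        x = proj_nonneg(x)            # x_{k+2} in P_A(x_{k+1})
    return x

A = np.array([[1.0, 1.0, 1.0]])
b = np.array([1.0])
x = alternating_projections(np.array([3.0, -2.0, 0.5]), A, b)
print(x, A @ x - b)                   # a nonnegative point with Ax close to b
```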
Outline
• Variational geometry:
  — Subdifferentials
  — The semi-algebraic case
  — Slope and error bounds
• Applications:
  — Alternating projections and transversality
  — Steepest descent trajectories
• To smooth or not to smooth?
• Active sets in optimization
Subdifferentials

Fréchet subdifferential:
    v ∈ ∂f(x̄) ⇐⇒ x̄ is critical for f − ⟨v, ·⟩,
or equivalently
    v ∈ ∂f(x̄) ⇐⇒ f(x) ≥ f(x̄) + ⟨v, x − x̄⟩ + o(|x − x̄|).

• Eg: If f is smooth, then ∂f = {∇f}.

Example:
    ∂(−|·|)(x) = {1}  if x < 0,    ∅  if x = 0,    {−1}  if x > 0.

[figure: y = −|x|]

Subdifferential graph:
    gph ∂f := {(x, v) ∈ Rⁿ × Rⁿ : v ∈ ∂f(x)}.
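A complementary example (standard, added here for contrast with −|·|): for f = |·| on R, the defining inequality |x| ≥ |x̄| + v (x − x̄) + o(|x − x̄|) gives

    ∂|·|(0) = [−1, 1]   and   ∂|·|(x) = {sign(x)} for x ≠ 0,

so the Fréchet subdifferential can be a full interval, a singleton, or, as for −|·| at the origin, empty.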
Subdifferentials

Fundamental theoretical and computational hurdle:
  1. |∇f| is not lsc,
  2. gph ∂f is not closed.

Limiting slope:
    \overline{|∇f|}(x̄) := liminf_{x → x̄} |∇f|(x).

Limiting subdifferential:
    gph \overline{∂}f := cl (gph ∂f).

Relationship (Ioffe ’00):
    \overline{|∇f|}(x̄) = dist(0, \overline{∂}f(x̄)).
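To see why the closure matters, take f = −|·| from the previous slide (a standard example, spelled out here as an illustration): the Fréchet subdifferential at 0 is empty, yet

    \overline{∂}f(0) = {−1, 1}   and   \overline{|∇f|}(0) = 1 = dist(0, \overline{∂}f(0)),

since the points (x, 1) for x < 0 and (x, −1) for x > 0 lie in gph ∂f and converge to (0, ±1).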
Subdifferentials

Are these notions adequate?

Pathology: gph ∂f can be very large!

What do we expect the size of gph ∂f to be?

• If f : Rⁿ → R is smooth, then gph ∂f is an n-dimensional smooth manifold.
• If f : Rⁿ → R is convex, then gph ∂f is an n-dimensional Lipschitz manifold (Minty ’62).

Multiple authors (Rockafellar, Borwein, Wang, . . . ):
There are functions f : Rⁿ → R with “2n-dimensional” gph ∂f.
Semi-algebraic geometry

Q ⊂ Rⁿ is semi-algebraic if it is a finite union of
solution sets to finitely many polynomial inequalities.

Semi-algebraicity is robust (Tarski–Seidenberg theorem).
Eg: f semi-algebraic =⇒ gph ∂f and |∇f| are semi-algebraic.

Semi-algebraic Q “stratify” into finitely many manifolds {M_i}.

Dimension:
    dim Q := max_{i=1,...,k} dim M_i.
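A toy illustration (mine, not from the slides): the closed unit disk Q = {(x, y) : x² + y² ≤ 1} is semi-algebraic, and it stratifies into the open disk (a manifold of dimension 2) and the boundary circle (dimension 1), so dim Q = 2.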
Semi-algebraic geometry

Theorem (D-Ioffe-Lewis)
For semi-algebraic f : Rⁿ → R, we have
    dim gph ∂f = n,
even locally around any point in gph ∂f.

Conclusion: Criticality is meaningful for concrete variational problems!

Semi-algebraic f have only finitely many critical values (cf. Sard’s Theorem)
=⇒ intervals (a, b) of non-critical values.

What can we learn from non-criticality?
Slope and error bounds

Common problem: Estimate
    dist(x, [f ≤ r])      (difficult).

“The residual”:
    f(x) − r              (easy).

Desirable quality: There exists κ with
    dist(x, [f ≤ r]) ≤ κ (f(x) − r).

Restrict f : Rⁿ → R to a “slice” f⁻¹(a, b).

Lemma (Error bound)
The following are equivalent.
Non-criticality:
    |∇f| ≥ 1/κ.
Error bound:
    dist(x, [f ≤ r]) ≤ κ (f(x) − r),   whenever r ∈ (a, f(x)).
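A one-line worked instance (my illustration): for f(x) = α|x| on R with α > 0, the slope is |∇f| ≡ α away from the origin, so κ = 1/α, and indeed for 0 ≤ r < f(x),

    dist(x, [f ≤ r]) = |x| − r/α = (f(x) − r)/α = κ (f(x) − r).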
Alternating projections & transversality
Convergence of alternating projections

Indicator function:
    δ_A(x) = 0 if x ∈ A;   ∞ if x ∉ A.

Coupling function:
    ψ(x, y) = δ_A(x) + |x − y| + δ_B(y).

[figure: the alternating projection iterates between A and B, starting from x_0]

Sufficient condition: There exists κ > 0 with
    max { |∇ψ_x|(y), |∇ψ_y|(x) } ≥ κ
for all x ∈ A and y ∈ B, not in A ∩ B.   (Proof: Error bound)

Equivalently: For v = (x − y)/|x − y|, either
    dist(v, ∂δ_B(y)) ≥ κ   or   dist(−v, ∂δ_A(x)) ≥ κ.

Normal cone: N_B(y) := ∂δ_B(y).
Convergence of alternating projections

    dist(v, N_B(y)) ≥ κ   or   dist(v, −N_A(x)) ≥ κ.

[Figure: normal cones to a set Q at the points x_1, . . . , x_7]

Transversality: N_A(x̄) ∩ −N_B(x̄) = {0}.

Local convergence (D-Ioffe-Lewis ’13):
    A and B transverse at x̄ =⇒ local R-linear convergence.
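Two illustrative cases (mine, not the talk's): if A and B are distinct lines through the origin in R², their normal cones at 0 are distinct lines, so N_A(0) ∩ −N_B(0) = {0} and the theorem gives local R-linear convergence. If instead A is a circle and B a line tangent to it at x̄, then N_A(x̄) and −N_B(x̄) are the same normal line, transversality fails, and the linear-rate guarantee above does not apply.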
Steepest descent trajectories
Steepest descent trajectories

Notation: X is a complete metric space.

Bounded speed: A curve x : [0, T] → X is 1-Lipschitz if
    dist(x(t), x(s)) ≤ |t − s|.

What are trajectories of steepest descent?

Motivation: 1-Lipschitz curves x : [0, T] → X satisfy
    |∇(f ∘ x)| ≤ |∇f|(x).

Definition (Trajectories of near-steepest descent)
A curve x : [0, T] → X is a trajectory of near-steepest descent if
  • x is 1-Lipschitz,
  • f ∘ x is decreasing,
  • |(f ∘ x)′| ≥ |∇f|(x), a.e. on [0, T].
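In the smooth case the motivating inequality is just the chain rule and Cauchy–Schwarz (spelled out here for orientation): if f is C¹ on Rⁿ and x(·) is 1-Lipschitz, then for a.e. t,

    |(f ∘ x)′(t)| = |⟨∇f(x(t)), ẋ(t)⟩| ≤ |∇f(x(t))| · |ẋ(t)| ≤ |∇f|(x(t)),

so along a unit-speed curve the function can decrease no faster than the slope allows; trajectories of near-steepest descent are precisely the curves achieving this bound.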
Example
Figure: f(x, y) = max{x + y, |x − y|} + x(x + 1) + y(y + 1) + 100
Subgradient dynamical systems

Theorem (D-Ioffe-Lewis)
For reasonable f : Rⁿ → R and a curve x : [0, T] → Rⁿ the following are equivalent.
  1. x is a curve of near-steepest descent,
  2. f ∘ x is decreasing and (after reparametrizing)
         ẋ ∈ −∂f(x), a.e. on [0, T].

Remark: When 2. holds,
    ẋ is the shortest element of −∂f(x), a.e. on [0, T].

Reasonable conditions: f is smooth, convex, or semi-algebraic.
Existence

Theorem (Ambrosio et al. ’05, De Giorgi ’93)
For reasonable f : X → R, there exist curves of near-steepest descent
starting from any point.

Proof ingredients:
• Moreau-Yosida approximation:
      x_{k+1} = argmin_{x ∈ X} { f(x) + (1/(2τ)) d²(x, x_k) }.
• Extraneous topologies =⇒ existence of minimizers and convergence.

Proof is opaque and uses heavy machinery!
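A hedged numerical sketch of the resulting proximal-point iteration (the test function, the value of τ, and the use of a generic solver are illustrative choices, not part of the proof):

```python
import numpy as np
from scipy.optimize import minimize

def prox_step(f, x_k, tau):
    """Approximate argmin_x { f(x) + (1/(2*tau)) * |x - x_k|^2 }."""
    obj = lambda x: f(x) + np.sum((x - x_k) ** 2) / (2.0 * tau)
    return minimize(obj, x_k, method="Nelder-Mead").x

f = lambda x: abs(x[0] - 1.0) + 0.5 * x[1] ** 2      # nonsmooth test function
x = np.array([4.0, 3.0])
for _ in range(30):                                   # discrete descent trajectory
    x = prox_step(f, x, tau=0.2)
print(x)                                              # approaches the minimizer (1, 0)
```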
New proof idea: Error bound lemma

Equipartition: For η > 0, partition the interval of values
    f(x_0) − η = τ_k < · · · < τ_1 < τ_0 = f(x_0).

Initialize: j = 0;
while j < k do
    x_{j+1} ← P_{[f ≤ τ_{j+1}]}(x_j);
end

Consider the resulting trajectories as k → ∞.
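A rough numerical sketch of this construction (mine; it uses a smooth test function and a generic constrained solver in place of exact projections onto the sublevel sets):

```python
import numpy as np
from scipy.optimize import minimize

def proj_sublevel(f, x, tau):
    """Approximate nearest point of [f <= tau] to x."""
    res = minimize(lambda y: np.sum((y - x) ** 2), x, method="SLSQP",
                   constraints=[{"type": "ineq", "fun": lambda y: tau - f(y)}])
    return res.x

f = lambda y: y[0] ** 2 + 3.0 * y[1] ** 2
x0 = np.array([2.0, 1.5])
eta, k = f(x0) - 0.1, 20
levels = np.linspace(f(x0), f(x0) - eta, k + 1)       # tau_0 > tau_1 > ... > tau_k
x = x0
for tau in levels[1:]:
    x = proj_sublevel(f, x, tau)                       # x_{j+1} = P_{[f <= tau_{j+1}]}(x_j)
print(x, f(x))                                         # ends up near the 0.1-sublevel set
```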
Lengths of steepest descent curves

One motivation: Algorithm complexity (Eg: Attouch et al. ’11).

Theorem (D-Ioffe-Lewis)
If f is semi-algebraic, then any bounded trajectory of near-steepest descent
(with maximal domain) has bounded length and converges to a critical point.

Historic remarks:
• Łojasiewicz famously proved it for real analytic functions.
• True for convex f (Daniilidis et al. ’12).
• Not true for C∞ functions (Palis, de Melo).
To smooth or not to smooth?
Approximation of functions

Mollification:
    f_ε(x) := (f ∗ φ_ε)(x) = ∫_{Rⁿ} φ_ε(x − y) f(y) dy,
where φ_ε is a scaled standard bump function φ.

• Downside: Not much control on derivatives.
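A hedged one-dimensional illustration of the mollification on a grid (the bump, the grid, and f(x) = |x| are my choices):

```python
import numpy as np

def bump(t, eps):
    """Scaled standard bump phi_eps, supported on (-eps, eps)."""
    out = np.zeros_like(t)
    inside = np.abs(t) < eps
    out[inside] = np.exp(-1.0 / (1.0 - (t[inside] / eps) ** 2))
    return out

h, eps = 0.01, 0.2
x = np.arange(-2.0, 2.0, h)
phi = bump(np.arange(-eps, eps, h), eps)
phi /= phi.sum() * h                                   # normalize so that phi_eps integrates to 1
f_vals = np.abs(x)                                     # f(x) = |x|, nonsmooth at the origin
f_eps = np.convolve(f_vals, phi, mode="same") * h      # discrete version of f * phi_eps
print(np.max(np.abs(f_eps - f_vals)[50:-50]))          # uniformly small away from the grid edges
```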
Approximation of functions

Set-up: Q ↪ Rⁿ together with f : Rⁿ → R.

[figure: a stratified set Q]

Assume Q is a disjoint union of C∞ manifolds
    Q = M_1 ∪ M_2 ∪ . . . ∪ M_{k−1} ∪ M_k.
Approximation of functions

Motivation: Integration by parts
    ∫_Q g Δf dx = ∫_{∂Q} g ⟨∇f, n⟩ dS − ∫_Q ⟨∇g, ∇f⟩ dx.

Goal: approximate f by a C²-smooth f̃ so that ∇f̃ ⊥ n.

Theorem (D-Larsson)
Given a continuous f : Rⁿ → R and any ε > 0, there exists a C¹-smooth f̃ satisfying
  1. Closeness: |f̃(x) − f(x)| < ε for all x ∈ Rⁿ,
  2. Neumann boundary condition:
         x ∈ M_i =⇒ ∇f̃(x) ∈ T_x M_i,
provided {M_i} is a Whitney stratification of Q.

• For semi-algebraic Q, Whitney stratifications always exist!
Approximation of functions

What about higher order of smoothness? Need a more stringent stratification.

Definition (Normally flat stratification)
The stratification {M_i} is normally flat if M_i ⊂ cl M_j implies
    P_{M_i}(x) = P_{M_i} ∘ P_{M_j}(x)   for all x near M_i and M_j.

[figure: strata M_i and M_j, a point x, its projection P_{M_j}(x), and P_{M_i}(x) = P_{M_i} ∘ P_{M_j}(x)]

• If {M_i} is normally flat, one can guarantee that f̃ is C∞-smooth!

Example: the partition of a polyhedron into open faces is normally flat.
Approximation of functions

Any other examples?

Notation:
• Sⁿ are the n × n symmetric matrices.
• λ : Sⁿ → Rⁿ is the eigenvalue map
      λ(A) = (λ_1(A), . . . , λ_n(A)).

Normally flat stratifications “lift” (D-Larsson ’12):
    A symmetric stratification {M_i} of Q ⊂ Rⁿ is normally flat
    ⇐⇒ the stratification {λ⁻¹(M_i)} of λ⁻¹(Q) ⊂ Sⁿ is normally flat.
Approximation of functions

Examples:
    Sⁿ₊ = {X ∈ Sⁿ : X ⪰ 0},
    Rⁿˣⁿ₊ = {X ∈ Rⁿˣⁿ : det(X) ≥ 0},
    Bⁿ₂ = {X ∈ Rⁿˣⁿ : σ_max(X) ≤ 1}.

=⇒ Strongest approximation results become available!

Applications:
1. Probability: properties of the matrix-valued Bessel process (Larsson ’12).
2. PDEs: Geometry of Sobolev spaces.
3. More forthcoming...
Active sets in optimization.
Active sets in optimization

Figure: Q is the 4 × 4 Toeplitz spectrahedron

Definition (Partial Smoothness)
A convex set Q is partly smooth relative to M ⊂ Q if
  1. (Smoothness) M is a smooth manifold,
  2. (Sharpness) N_M = span N_Q on M,
  3. (Continuity) N_Q varies continuously on M.
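A simple instance (my illustration, not from the slides): Q = {x ∈ R² : x ≥ 0} is partly smooth relative to M = {(x_1, 0) : x_1 > 0}. Indeed, M is a smooth manifold; at any x ∈ M the normal cone N_Q(x) is the ray of nonpositive multiples of e_2, so span N_Q(x) = R e_2 = N_M(x) (sharpness); and N_Q is constant, hence continuous, along M.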
Active sets in optimization

Why do optimizers care?

• Many optimization algorithms identify M in finite time!
  Eg: Gradient projection, Newton-like, proximal point.
  =⇒ Acceleration strategies!

How to see this structure in eigenvalue optimization?

Partly smooth manifolds “lift” (Daniilidis-D-Lewis):
    Q partly smooth relative to M
    ⇕
    λ⁻¹(Q) partly smooth relative to λ⁻¹(M),
provided Q is symmetric.
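A toy illustration of finite identification (an assumed example, not from the talk): projected gradient descent over the nonnegative orthant pins down the active face, i.e. the coordinates that are exactly zero at the solution, after finitely many iterations.

```python
import numpy as np

def projected_gradient(grad, x0, step=0.1, iters=100):
    x = x0
    for _ in range(iters):
        x = np.maximum(x - step * grad(x), 0.0)    # gradient step, then project onto {x >= 0}
    return x

# minimize 0.5*|x - c|^2 over {x >= 0}; the solution max(c, 0) has x_2 = x_3 = 0
c = np.array([1.0, -2.0, -0.5])
x = projected_gradient(lambda x: x - c, np.array([3.0, 3.0, 3.0]))
print(x, x == 0.0)                                  # the zero coordinates are identified exactly
```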
Nonsmoothness under symmetry

Actions:
    the permutation group Σ acts on Rⁿ (permutation),
    the orthogonal group Oⁿ acts on Sⁿ (conjugation).

Correspondence:
    Oⁿ-invariant (spectral) ⇐⇒ Σ-invariant (symmetric).

Q is spectral ⇐⇒ Q = λ⁻¹(M) with M symmetric.

Eg: {X : X ⪰ 0} = λ⁻¹({x : x ≥ 0}).

THE TRANSFER PRINCIPLE:
Variational properties of Q and M are in 1-to-1 correspondence!

Eg: Convexity, stratifiability, Whitney conditions, normal flatness, partial smoothness, . . .
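A concrete instance of the transfer principle (this is the standard eigenvalue-clamping fact, stated here as an illustration): since {X : X ⪰ 0} = λ⁻¹({x : x ≥ 0}), projecting a symmetric matrix onto the PSD cone amounts to projecting its eigenvalues onto the nonnegative orthant.

```python
import numpy as np

def proj_psd(X):
    """Nearest positive semidefinite matrix to a symmetric X (Frobenius norm)."""
    w, V = np.linalg.eigh(X)                   # X = V diag(w) V^T
    return (V * np.maximum(w, 0.0)) @ V.T      # clamp eigenvalues at zero and reassemble

X = np.array([[1.0, 2.0],
              [2.0, -3.0]])
P = proj_psd(X)
print(np.linalg.eigvalsh(P))                   # eigenvalues of the projection are >= 0
```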
Summary
[diagram: slope at the center, linking geometry, analysis, and algorithms]
Thank you.
References
• Optimality, identifiability, and sensitivity, D-Lewis, submitted to Math. Programming Ser. A.
• The dimension of semi-algebraic subdifferential graphs, D-Ioffe-Lewis, Nonlinear Analysis, 75(3), 1231-1245, 2012.
• Semi-algebraic functions have small subdifferentials, D-Lewis, to appear in Math. Prog. Ser. B.
• Tilt stability, uniform quadratic growth, and strong metric regularity of the subdifferential, D-Lewis, to appear in SIAM J. on Opt.
• Approximating functions on stratifiable domains, D-Larsson, submitted to the Trans. of the AMS.
• Generic nondegeneracy in convex optimization, D-Lewis, Proc. Amer. Math. Soc. 139 (2011), 2519-2527.
• Trajectories of subgradient dynamical systems, D-Ioffe-Lewis, submitted to SIAM J. on Control and Opt.
• Spectral lifts of identifiable sets and partly smooth manifolds, Daniilidis-D-Lewis, preprint.

Available at http://people.orie.cornell.edu/dd379/