Prediction without probability: a PDE approach to a model problem from machine learning
Robert V. Kohn, Courant Institute, NYU
Joint work with Kangping Zhu (PhD 2014) and Nadejda Drenska (in progress)
Mathematics for Nonlinear Phenomena: Analysis and Computation
celebrating Yoshikazu Giga’s contributions and impact
Sapporo, August 2015
Robert V. Kohn Prediction without probability
Looking back
We met in Tokyo in July 1982, at a US-Japan seminar.
Giga came to Courant soon thereafter. We decided to study blowup of $u_t = \Delta u + u^p$. Over the next few years we had a lot of fun.
Asymptotically self-similar blowup of semilinear heat equations, CPAM (1985)
Characterizing blowup using similarity variables, IUMJ (1987)
Nondegeneracy of blowup for semilinear heat equations, CPAM (1989)
Over the years
Our paths have crossed many times, and in many ways.
Navier-Stokes
1983, Giga: Time & spatial analyticity of solutions of the Navier-Stokes equations
1983, Caffarelli-Kohn-Nirenberg: Partial regularity of suitable wk solns of the Navier-Stokes eqns
The Aviles-Giga functional
1987, Aviles-Giga: A math’l pbm related to the physical theory of liquid crystal configurations
2000, Jin-Kohn: Singular perturbation and the energy of folds
Over the years
Crystalline surface energies
1998, M-H Giga & Y Giga: Evolving graphs by singular weighted curvature (the first of many joint papers!)
1994, Girao-Kohn: Convergence of a crystalline algorithm for . . . the motion of a graph by weighted curvature
Level-set representations of interface motion
1991, Chen-Giga-Goto: Uniqueness and existence of viscosity solutions of generalized mean curvature flow equations
2005, Kohn-Serfaty: A deterministic-control-based approach to motion by curvature
Over the years
Hamilton-Jacobi approach to spiral growth
2013, Giga-Hamamuki: Hamilton-Jacobi equations with discontinuous source terms
1999, Kohn-Schulze: A geometric model for coarsening during spiral-mode growth of thin films
Finite-time flattening of stepped crystals
2011, Giga-Kohn: Scale-invariant extinction time estimates for some singular diffusion equations
Over the years
Many thanks for
- your huge impact on our field
- your leadership (both scientific and practical)
- helping our community grow and prosper
- a lot of fun in our joint projects
- your friendship over the years.
Today’s mathematical topic
Prediction without probability: a PDE approach to a model problem from machine learning
1. The problem (“prediction with expert advice”)
2. Two very simple experts
3. Two more realistic experts
4. Perspective
Prediction with expert advice
Basic idea: given
- a data stream
- a notion of prediction
- some experts
the overall goal is to beat the (retrospectively) best-performing expert, or at least not do too much worse.
Jargon: minimize regret.
Widely used paradigm in machine learning. Many variants, associated with different types of data, classes of experts, notions of prediction.
Note analogy to a common business news feature . . .
The stock prediction problem
A classic model problem (T Cover 1965, and many people since):
- A stock goes up or down (data stream is binary, no probability).
- Investor buys (or sells) $f$ shares of stock at each time step, $|f| \le 1$. Effectively, he is making a prediction.
- Two experts (to be specified soon). Regret wrt a given expert = (expert’s gain) − (investor’s gain).
- Typical goal: minimize the worst-case value of regret wrt the best-performing expert at a given future time $T$.
- More general goal: minimize the worst-case value of $\phi(\text{regret wrt expert 1}, \text{regret wrt expert 2})$ at time $T$. (The “typical goal” is $\phi(x_1, x_2) = \max\{x_1, x_2\}$.)
Very simple experts vs more realistic experts
Recall: stock goes up or down (data stream is binary, no probability).
Investor buys (or sells) $f$ shares of stock at each time step, $|f| \le 1$. Effectively, he is making a prediction.
Two experts, each using a public algorithm to make his choice.
FIRST PASS: Two very simple experts. One always expects the stock to go up (he chooses $f = 1$), the other always expects the stock to go down (he chooses $f = -1$).
SECOND PASS: Two more realistic experts. Each looks at the last $d$ moves, and makes a choice depending on this recent history.
First pass: Kangping Zhu. Second pass: Nadejda Drenska.
Getting started: two very simple experts
Essentially an optimal control problem:
- state space: $(x_1, x_2)$ = (regret wrt + expert, regret wrt − expert).
- control: investor’s stock purchase $f$, with $|f| \le 1$.
- value function: $v(x, t)$ = optimal (worst-case) time-$T$ result, starting from relative regrets $x = (x_1, x_2)$ at time $t$.
Dynamic programming principle:
$$
\begin{aligned}
v(x_1, x_2, t) &= \min_{|f| \le 1} \max_{b = \pm 1} v(\text{new position}, t + 1) \\
&= \min_{|f| \le 1} \max_{b = \pm 1} v\bigl(x_1 + b(1 - f),\; x_2 - b(1 + f),\; t + 1\bigr)
\end{aligned}
$$
for $t < T$, with final-time condition $v(x, T) = \phi(x)$.
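The recursion can be evaluated directly for a short horizon. The following is a minimal Python sketch, not the authors' code: it assumes $\phi(x_1, x_2) = \max\{x_1, x_2\}$, a horizon `T = 4`, and a finite grid `F_GRID` of candidate purchases standing in for the continuous interval $|f| \le 1$. All names are illustrative.

```python
from functools import lru_cache

T = 4  # number of time steps (illustrative choice)
F_GRID = [k / 4 - 1 for k in range(9)]  # candidate purchases f in [-1, 1]

def phi(x1, x2):
    # Final-time cost: regret with respect to the better expert.
    return max(x1, x2)

@lru_cache(maxsize=None)
def v(x1, x2, t):
    # Value function: investor minimizes over f, market maximizes over b.
    if t == T:
        return phi(x1, x2)
    return min(
        max(v(x1 + b * (1 - f), x2 - b * (1 + f), t + 1) for b in (1, -1))
        for f in F_GRID
    )

value_at_origin = v(0.0, 0.0, 0)
```

With these choices `value_at_origin` evaluates to 1.5, which coincides with $E|S_4|$ for a simple ±1 random walk $S$. In general a finite grid only gives an upper bound on the true value, since it restricts the investor's choices.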
The dynamic programming principle
Recall: $(x_1, x_2)$ = (regret wrt + expert, regret wrt − expert), where regret = (expert’s gain) − (investor’s gain).
If investor buys $f$ shares and market goes up, investor gains $f$, the + expert gains $1$, the − expert gains $-1$. So state moves from $(x_1, x_2)$ to $(x_1 + (1 - f),\, x_2 + (-1 - f))$.
Similarly, if investor buys $f$ shares and market goes down, state moves from $(x_1, x_2)$ to $(x_1 - (1 - f),\, x_2 - (-1 - f))$.
Hence the dynamic programming principle:
$$
\begin{aligned}
v(x_1, x_2, t) &= \min_{|f| \le 1} \max_{b = \pm 1} v(\text{new position}, t + 1) \\
&= \min_{|f| \le 1} \max_{b = \pm 1} v\bigl(x_1 + b(1 - f),\; x_2 - b(1 + f),\; t + 1\bigr)
\end{aligned}
$$
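The state update can be checked against direct bookkeeping of gains. This is a small Python sketch with hypothetical market moves and investor purchases. It tracks the state $(x_1, x_2)$ with the update rule and, separately, the three gains, then compares the two.

```python
def step(x1, x2, f, b):
    """One step of the regret dynamics; b = +1 (market up) or -1 (market down)."""
    return x1 + b * (1 - f), x2 - b * (1 + f)

moves = [1, -1, -1, 1, 1]                  # hypothetical market moves b
purchases = [0.5, -0.25, 0.0, 1.0, -1.0]   # hypothetical investor choices, |f| <= 1

x1 = x2 = 0.0
investor_gain = plus_gain = minus_gain = 0.0
for b, f in zip(moves, purchases):
    x1, x2 = step(x1, x2, f, b)
    investor_gain += b * f   # investor holds f shares
    plus_gain += b           # "+" expert always holds +1 share
    minus_gain -= b          # "-" expert always holds -1 share

regret_plus = plus_gain - investor_gain
regret_minus = minus_gain - investor_gain
```

After the loop, `x1` agrees with `regret_plus` and `x2` with `regret_minus`, which is the content of the two state-update formulas.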
Scaling
In machine learning, one is interested in how regret accumulates over many time steps.
To address this question, it is natural to rescale the problem and look for a continuum limit.
Our problem has no probability. But our rescaling is like the passage from random walk to diffusion.
Our problem shares many features with the two-person-game interpretation of motion by curvature (work with Sylvia Serfaty, CPAM 2006).
So: consider a scaled version of the problem: stock moves are $\pm\varepsilon$ and time steps are $\varepsilon^2$. The value function is still the optimal worst-case time-$T$ result. The principle of dynamic programming becomes
$$
w^\varepsilon(x_1, x_2, t) = \min_{|f| \le 1} \max_{b = \pm 1} w^\varepsilon\bigl(x_1 + \varepsilon b(1 - f),\; x_2 - \varepsilon b(1 + f),\; t + \varepsilon^2\bigr).
$$
We expect that $w(x, t) = \lim_{\varepsilon \to 0} w^\varepsilon$ should solve a PDE.
The PDE
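The PDE is, roughly speaking, the Hamilton-Jacobi-Bellman eqn associated with our optimal control problem. Sketch of (formal) derivation:
(1) Use Taylor expansion to estimate $w(x_1 + \varepsilon b(1 - f),\, x_2 - \varepsilon b(1 + f),\, t + \varepsilon^2)$.
(2) Investor chooses $f$ to make the $O(\varepsilon)$ terms vanish, since otherwise they kill him; this gives $f = (\partial_1 w - \partial_2 w)/(\partial_1 w + \partial_2 w)$.
(3) The $O(\varepsilon^2)$ terms are insensitive to $b = \pm 1$; they give the nonlinear PDE
$$
w_t + 2\left\langle D^2 w\, \frac{\nabla^\perp w}{\partial_1 w + \partial_2 w},\; \frac{\nabla^\perp w}{\partial_1 w + \partial_2 w} \right\rangle = 0
\quad \text{with } \nabla^\perp w = (-\partial_2 w, \partial_1 w).
$$
This final-value problem is to be solved with $w = \phi$ at $t = T$.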
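As a sanity check on the form of the PDE, one can test it numerically on an explicit function. The Python sketch below uses a hypothetical solution $w = \tfrac12\bigl(x_1 + x_2 + (x_1 - x_2)^2 + 4(T - t)\bigr)$, chosen here for illustration and not taken from the slides, and evaluates the PDE residual with central finite differences at a point where $\partial_1 w > 0$ and $\partial_2 w > 0$.

```python
T = 1.0  # final time (illustrative)

def w(x1, x2, t):
    # Hypothetical explicit solution; its final data is
    # phi(x) = (x1 + x2 + (x1 - x2)**2) / 2 at t = T.
    return 0.5 * (x1 + x2 + (x1 - x2) ** 2 + 4.0 * (T - t))

def pde_residual(x1, x2, t, h=1e-3):
    # Central finite differences for the derivatives of w.
    wt = (w(x1, x2, t + h) - w(x1, x2, t - h)) / (2 * h)
    w1 = (w(x1 + h, x2, t) - w(x1 - h, x2, t)) / (2 * h)
    w2 = (w(x1, x2 + h, t) - w(x1, x2 - h, t)) / (2 * h)
    w11 = (w(x1 + h, x2, t) - 2 * w(x1, x2, t) + w(x1 - h, x2, t)) / h**2
    w22 = (w(x1, x2 + h, t) - 2 * w(x1, x2, t) + w(x1, x2 - h, t)) / h**2
    w12 = (w(x1 + h, x2 + h, t) - w(x1 + h, x2 - h, t)
           - w(x1 - h, x2 + h, t) + w(x1 - h, x2 - h, t)) / (4 * h**2)
    # grad-perp p = (-w2, w1); quadratic form <D^2 w p, p>.
    p1, p2 = -w2, w1
    quad = w11 * p1 * p1 + 2 * w12 * p1 * p2 + w22 * p2 * p2
    return wt + 2 * quad / (w1 + w2) ** 2

residual = pde_residual(0.1, 0.0, 0.5)
```

For this quadratic test function the finite differences are exact up to rounding, so `residual` is zero to within floating-point error.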
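More detailed derivation of PDE
Dynamic programming principle:
$$
w^\varepsilon(x_1, x_2, t) = \min_{|f| \le 1} \max_{b = \pm 1} w^\varepsilon\bigl(x_1 + \varepsilon b(1 - f),\; x_2 - \varepsilon b(1 + f),\; t + \varepsilon^2\bigr)
$$
Taylor expansion:
$$
\begin{aligned}
w\bigl(x_1 + \varepsilon b(1 - f),\, x_2 - \varepsilon b(1 + f),\, t + \varepsilon^2\bigr)
\approx{}& w(x_1, x_2, t) + \varepsilon b(1 - f) w_1 - \varepsilon b(1 + f) w_2 \\
&+ \tfrac12 w_{11} \varepsilon^2 b^2 (1 - f)^2 - w_{12} \varepsilon^2 b^2 (1 - f)(1 + f) + \tfrac12 w_{22} \varepsilon^2 b^2 (1 + f)^2 + w_t \varepsilon^2
\end{aligned}
$$
After substitution and reorganization:
$$
\begin{aligned}
0 \approx \min_{|f| \le 1} \max_{b = \pm 1} \Bigl\{ & \varepsilon b \bigl[(1 - f) w_1 - (1 + f) w_2\bigr] \\
&+ \varepsilon^2 b^2 \bigl[\tfrac12 w_{11} (1 - f)^2 - w_{12} (1 - f)(1 + f) + \tfrac12 w_{22} (1 + f)^2 + w_t\bigr] \Bigr\}
\end{aligned}
$$
Order-$\varepsilon$ term vanishes when
$$
f = \frac{\partial_1 w - \partial_2 w}{\partial_1 w + \partial_2 w}.
$$
Note: we expect $\partial_1 w > 0$ and $\partial_2 w > 0$. Also: condition $|f| \le 1$ is automatic.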
Unexpected properties of the PDE
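Our PDE is geometric. In fact,
$$
\partial_t w + 2\left\langle D^2 w\, \frac{\nabla^\perp w}{\partial_1 w + \partial_2 w},\; \frac{\nabla^\perp w}{\partial_1 w + \partial_2 w} \right\rangle = 0
$$
can be rewritten as
$$
\frac{\partial_t w}{|\nabla w|} = \frac{2\kappa\, |\nabla w|^2}{(\partial_1 w + \partial_2 w)^2},
$$
where
$$
\kappa = -\operatorname{div}\left(\frac{\nabla w}{|\nabla w|}\right)
$$
is the curvature of a level set of $w$. Thus the normal velocity of each level set is
$$
v_{\mathrm{nor}} = \frac{2\kappa}{(n_1 + n_2)^2}
$$
where $\kappa$ is its curvature and $n$ is its unit normal.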
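The rewriting rests on a standard two-dimensional identity. A short verification, using only the definitions on this slide and writing $w_i = \partial_i w$, $w_{ij} = \partial_i \partial_j w$, with $\nabla^\perp w = (-w_2, w_1)$:

```latex
\begin{aligned}
\langle D^2 w\, \nabla^\perp w,\ \nabla^\perp w \rangle
  &= w_{11} w_2^2 - 2 w_{12} w_1 w_2 + w_{22} w_1^2, \\
\operatorname{div}\!\left( \frac{\nabla w}{|\nabla w|} \right)
  &= \frac{w_{11} w_2^2 - 2 w_{12} w_1 w_2 + w_{22} w_1^2}{|\nabla w|^3}, \\
\text{so} \quad
\langle D^2 w\, \nabla^\perp w,\ \nabla^\perp w \rangle
  &= -\kappa\, |\nabla w|^3 .
\end{aligned}
```

Substituting this into the PDE and dividing by $|\nabla w|$ gives the geometric form.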
Unexpected properties of the PDE – cont’d
Our PDE is the linear heat eqn in disguise. In fact, in the rotated (and scaled) coordinate system
$$
\xi = x_1 - x_2, \qquad \eta = x_1 + x_2,
$$
each level set of $w$ is an evolving graph over the $\xi$ axis. Moreover, the function $\eta(\xi, t)$ associated with this graph, defined by
$$
w(\xi, \eta(\xi, t), t) = \text{const},
$$
solves the linear heat eqn
$$
\eta_t + 2\eta_{\xi\xi} = 0 \quad \text{for } t < T.
$$
The proof is elementary: one checks that for the evolving graph, the normal velocity is what our PDE says it should be.
Corollary: existence, regularity, and (more or less) explicit solutions for a broad class of final-time data.
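As a worked illustration (not taken from the slides), take the final-time data $\phi(x_1, x_2) = \max\{x_1, x_2\}$. Its level set $\{\phi = c\}$ is the graph $\eta = 2c - |\xi|$. Solving the heat equation backward from $t = T$ with the Gaussian kernel, and assuming $Z$ denotes a standard normal variable, gives the value at the origin:

```latex
\begin{aligned}
\eta(\xi, T) &= 2c - |\xi|, \\
\eta(\xi, t) &= 2c - \mathbb{E}\,\bigl| \xi + 2\sqrt{T - t}\, Z \bigr|, \qquad Z \sim N(0, 1), \\
\eta(0, t) = 0 \;&\Longrightarrow\; c = \sqrt{T - t}\;\mathbb{E}|Z| = \sqrt{\tfrac{2 (T - t)}{\pi}}, \\
w(0, 0, t) &= \sqrt{\tfrac{2 (T - t)}{\pi}} .
\end{aligned}
```

The last line uses that the point $x = (0, 0)$ corresponds to $\xi = 0$, $\eta = 0$, so $w(0, 0, t)$ is the level $c$ whose graph passes through the origin at time $t$.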
Thanks to Y. Giga for this observation.
Convergence as ε → 0
Main result: If $\phi(x) = w(x, T)$ is smooth, then
$$
w(x, t) - C\varepsilon \le w^\varepsilon(x, t) \le w(x, t) + C\varepsilon
$$
where $C$ is independent of $\varepsilon$. (It grows linearly with $T - t$.)
Method: A verification argument. One inequality is obtained by considering the particular strategy
$$
f = (\partial_1 w - \partial_2 w)/(\partial_1 w + \partial_2 w).
$$
The other involves showing (as seen in the formal argument) that no other strategy can do better.
For the most standard regret-minimization problem, $\phi(x_1, x_2) = \max\{x_1, x_2\}$ is not smooth. In this case our result is a bit weaker; the errors are of order $\varepsilon |\log \varepsilon|$.
Sketch of one inequality
Goal: show that
$$
w^\varepsilon(x, t) \le w(x, t) + C\varepsilon.
$$
Strategy: Estimate $w^\varepsilon(z_0, t_0)$ by finding a sequence $(z_1, t_1), (z_2, t_2), \dots, (z_N, t_N)$ such that
- $t_{j+1} = t_j + \varepsilon^2$ for each $j$, and $t_N = T$.
- $w^\varepsilon(z_{j+1}, t_{j+1}) \ge w^\varepsilon(z_j, t_j)$ for each $j$.
- $w(z_{j+1}, t_{j+1}) = w(z_j, t_j) + O(\varepsilon^3)$.
Since $N = (T - t_0)/\varepsilon^2$, it follows easily that
$$
w(z_0, t_0) = w(z_N, t_N) + O(\varepsilon).
$$
Since $w^\varepsilon = w$ at the final time $T$, we get
$$
w^\varepsilon(z_0, t_0) \le w^\varepsilon(z_N, t_N) = w(z_N, t_N) \le w(z_0, t_0) + C\varepsilon.
$$
Sketch of one inequality, cont’d
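The sequence: Recall the dynamic programming principle (written here with $b$ replaced by $-b$, which leaves the max over $b = \pm 1$ unchanged)
$$
w^\varepsilon(z_0, t_0) = \min_{|f| \le 1} \max_{b = \pm 1} w^\varepsilon\left( z_0 + \varepsilon b \begin{pmatrix} f - 1 \\ f + 1 \end{pmatrix},\; t_0 + \varepsilon^2 \right)
$$
A specific choice of $f$ gives an inequality; the choice from the formal argument gives
$$
\begin{pmatrix} f - 1 \\ f + 1 \end{pmatrix} = 2\, \frac{\nabla^\perp w}{\partial_1 w + \partial_2 w}
$$
evaluated at $(z_0, t_0)$. Call this $v_0$. Then
$$
w^\varepsilon(z_0, t_0) \le \max_{b = \pm 1} w^\varepsilon(z_0 + \varepsilon b v_0,\; t_0 + \varepsilon^2).
$$
Let $b_0$ achieve the max, and set $z_1 = z_0 + \varepsilon b_0 v_0$, $t_1 = t_0 + \varepsilon^2$; we have
$$
w^\varepsilon(z_0, t_0) \le w^\varepsilon(z_1, t_1).
$$
Iterate to find $(z_j, t_j)$, $j = 2, 3, \dots$.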
Sketch of one inequality – cont’d
Proof that $w(z_{j+1}, t_{j+1}) = w(z_j, t_j) + O(\varepsilon^3)$: use the PDE. (Note: since $w$ is smooth, Taylor expansion is honest.)
Using the specific choice of $(z_1, t_1)$ we get
$$
w(z_1, t_1) = w(z_0, t_0) + (\text{terms of order } \varepsilon \text{ vanish})
+ \varepsilon^2 \left( \partial_t w + 2\left\langle D^2 w\, \frac{\nabla^\perp w}{\partial_1 w + \partial_2 w},\; \frac{\nabla^\perp w}{\partial_1 w + \partial_2 w} \right\rangle \right) + O(\varepsilon^3)
$$
in which the RHS is evaluated at $(z_0, t_0)$.
Using the PDE for $w$ this becomes the desired estimate
$$
w(z_1, t_1) = w(z_0, t_0) + (\text{terms of order } \varepsilon^2 \text{ vanish}) + O(\varepsilon^3).
$$
The argument applies for any $j$. Error terms come from $O(\varepsilon^3)$ terms in Taylor expansion; so the implicit constant is uniform if $D^3 w$ and $w_{tt}$ are uniformly bounded for all $x \in \mathbb{R}^2$ and all $t < T$.
More realistic experts
So far our experts were very simple (independent of history). Now let’s consider two history-dependent experts.
Keep $d$ days of history. Typical state is thus $m = (0001011)_2 \in \{0, 1, \dots, 2^d - 1\}$. It is updated each day.
Each expert’s prediction is a (known) function of history. The $q$ expert buys $f = q(m)$ shares; the $r$ expert buys $f = r(m)$ shares.
Otherwise no change: the goal is to optimize the (worst-case) time-$T$ value of regret wrt best-performing expert, or more generally
$$
\phi(\text{regret wrt } q \text{ expert},\ \text{regret wrt } r \text{ expert})
$$
Dynamic programming becomes a mess
Problem: Dynamic programming doesn’t work so well any more. Apparently
state = (regret wrt $q$ expert, regret wrt $r$ expert, history)
so we’re looking for $2^d$ distinct functions of space and time, $w_m(x_1, x_2, t)$. Dynamic programming principle can be formulated (coupling all $2^d$ functions). We seem headed for a system of PDEs.
However:
(a) Regret accumulates slowly while states change rapidly; so value function should be approx indep of state.
(b) Investor should choose $f$ to achieve market indifference (at leading order in Taylor expansion).
(c) Accumulation of regret occurs at order $\varepsilon^2$ (in Taylor expansion).
Using these ideas, we will again get a scalar PDE in the limit $\varepsilon \to 0$.
Identifying the PDE
Formal derivation: ignore dependence of value function $w(x_1, x_2, t)$ on $\varepsilon$ and $m$; now
$$
x_1 = \text{regret wrt } q \text{ expert}, \qquad x_2 = \text{regret wrt } r \text{ expert}.
$$
If investor chooses $f$ and market goes up/down ($b = \pm 1$),
$$
x_1 \text{ changes by } b\varepsilon (q(m) - f), \qquad x_2 \text{ changes by } b\varepsilon (r(m) - f).
$$
So market indifference at order $\varepsilon$ requires
$$
w_1 (q(m) - f) + w_2 (r(m) - f) = -w_1 (q(m) - f) - w_2 (r(m) - f).
$$
Solve for $f$: if current state is $m$, then investor should choose
$$
f = \frac{w_1 q(m) + w_2 r(m)}{w_1 + w_2}.
$$
Accumulation of regret is at order $\varepsilon^2$. With $f$ set by market indifference, change in $w$ is $\varepsilon^2$ times
$$
w_t + \tfrac12 (q(m) - r(m))^2 \left\langle D^2 w\, \frac{\nabla^\perp w}{\partial_1 w + \partial_2 w},\; \frac{\nabla^\perp w}{\partial_1 w + \partial_2 w} \right\rangle.
$$
Worst-case scenario is the one that makes regret accumulate fastest.
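The step from the market-indifference choice of $f$ to the $\varepsilon^2$ coefficient can be spelled out as follows (a routine substitution, with $\nabla^\perp w = (-w_2, w_1)$ and $q = q(m)$, $r = r(m)$):

```latex
\begin{aligned}
q - f &= \frac{w_2\,(q - r)}{w_1 + w_2}, \qquad
r - f = -\frac{w_1\,(q - r)}{w_1 + w_2}, \\
\begin{pmatrix} q - f \\ r - f \end{pmatrix}
  &= -\,(q - r)\, \frac{\nabla^\perp w}{w_1 + w_2}, \\
\tfrac12 \Bigl\langle D^2 w \begin{pmatrix} q - f \\ r - f \end{pmatrix},
  \begin{pmatrix} q - f \\ r - f \end{pmatrix} \Bigr\rangle
  &= \tfrac12\,(q - r)^2
  \Bigl\langle D^2 w\, \frac{\nabla^\perp w}{w_1 + w_2},\
  \frac{\nabla^\perp w}{w_1 + w_2} \Bigr\rangle .
\end{aligned}
```

For the two simple experts, $q = 1$ and $r = -1$, the factor $\tfrac12 (q - r)^2$ equals 2, which recovers the coefficient in the PDE for the simple case.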
Identifying the PDE, cont’d
Recall: accumulation of regret per step at state $m$ is
$$\tfrac12 (q(m) - r(m))^2 \left\langle D^2 w \, \frac{\nabla^\perp w}{\partial_1 w + \partial_2 w}, \frac{\nabla^\perp w}{\partial_1 w + \partial_2 w} \right\rangle.$$
Essentially a problem from graph theory: seek
$$\lim_{N\to\infty} \max_{\text{paths of length } N} \frac{1}{N} \sum_{j=1}^{N} (q(m_j) - r(m_j))^2.$$
In fact:
It suffices to consider cycles.
There are good algorithms for finding optimal cycles.
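The talk does not say which algorithm is meant. One standard choice is Karp's dynamic program for the maximum mean-weight cycle, sketched below. The sketch assumes that the state $m$ is the tuple of the last `d` market moves, so the states form a de Bruijn-type graph, and the experts `q` and `r` are hypothetical examples, not the ones from the talk.

```python
from itertools import product


def max_cycle_mean(num_nodes, edges):
    """Karp-style dynamic program for the largest mean edge weight over all cycles.

    edges is a list of (u, v, weight); every node is assumed reachable from node 0.
    """
    neg_inf = float("-inf")
    # best[k][v] = largest total weight of a walk with exactly k edges from node 0 to v
    best = [[neg_inf] * num_nodes for _ in range(num_nodes + 1)]
    best[0][0] = 0.0
    for k in range(1, num_nodes + 1):
        for u, v, weight in edges:
            if best[k - 1][u] > neg_inf:
                best[k][v] = max(best[k][v], best[k - 1][u] + weight)
    answer = neg_inf
    for v in range(num_nodes):
        if best[num_nodes][v] == neg_inf:
            continue
        ratios = [
            (best[num_nodes][v] - best[k][v]) / (num_nodes - k)
            for k in range(num_nodes)
            if best[k][v] > neg_inf
        ]
        answer = max(answer, min(ratios))
    return answer


def history_graph(d, q, r):
    """States are the last d moves; each outgoing edge carries (q(m) - r(m))^2."""
    states = list(product((1, -1), repeat=d))
    index = {m: i for i, m in enumerate(states)}
    edges = []
    for m in states:
        weight = (q(m) - r(m)) ** 2
        for b in (1, -1):  # next market move
            edges.append((index[m], index[m[1:] + (b,)], weight))
    return states, edges


# Hypothetical experts: q follows the most recent move, r the older one.
q = lambda m: m[-1]
r = lambda m: m[0]
states, edges = history_graph(2, q, r)
c_star = max_cycle_mean(len(states), edges)
```

For these two hypothetical experts the optimal cycle is the alternating market (up, down, up, down, ...), along which the experts always disagree. Vertex weights are placed on outgoing edges so that a standard edge-weighted routine applies unchanged.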
Identifying the PDE, cont’d
Thus finally: the PDE is
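$$w_t + \tfrac12 C^* \left\langle D^2 w \, \frac{\nabla^\perp w}{\partial_1 w + \partial_2 w}, \frac{\nabla^\perp w}{\partial_1 w + \partial_2 w} \right\rangle = 0,$$
where
$$C^* = \max_{\text{cycles}} \frac{1}{\text{cycle length}} \sum_j (q(m_j) - r(m_j))^2.$$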
Summarizing: for two history-dependent experts,
Investor’s choice of $f$ depends on the state as well as on $\nabla w(x)$; it achieves leading-order market indifference.
The value function $w$ solves (almost) the same eqn as before (still reducible to the linear heat eqn!). All that changes is the “diffusion coefficient.”
Rigorous analysis still uses a verification argument (though there are some new subtleties).
What about many history-dependent experts?
Can something similar be done for many history-dependent experts?
If there are $K$ experts then $w = w(x_1, \ldots, x_K, t)$.
Market indifference at order $\varepsilon$ still gives a formula for $f$.
Accumulation of regret sees $D^2 w$ and $Dw$ (not just a scalar quantity built from them); so the graph problem depends nontrivially on $D^2 w$ and $Dw$.
The formal PDE is much more nonlinear than for two experts. (Analysis: in progress.)
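The formula for $f$ is not written out on the slide. A sketch of what the same order-$\varepsilon$ argument would give for $K$ experts follows. It assumes that expert $i$ predicts $q_i(m)$ (notation introduced here, not in the talk), that $x_i$ changes by $b\varepsilon(q_i(m) - f)$, and that $\sum_i \partial_i w \neq 0$.

```latex
% Order-epsilon change in w when the market moves by b = +1 or -1:
%   b * epsilon * sum_i (d_i w) (q_i(m) - f).
% Indifference between b = +1 and b = -1 forces this sum to vanish:
\[
  \sum_{i=1}^{K} \partial_i w \,\bigl(q_i(m) - f\bigr) = 0
  \quad\Longrightarrow\quad
  f = \frac{\sum_{i=1}^{K} \partial_i w \; q_i(m)}{\sum_{i=1}^{K} \partial_i w}.
\]
```

For $K = 2$ with $q_1 = q$ and $q_2 = r$ this reduces to the two-expert formula. The displacement vector $(q_i(m) - f)_i$ is then orthogonal to $Dw$ but is no longer confined to a single direction, which is consistent with the remark that the graph problem depends on $D^2 w$ and $Dw$.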
Stepping back
Mathematical messages
Stock prediction problem has a continuous-time limit. Reduction to linear heat eqn provides a rather explicit solution.
It provides another example where a deterministic two-person game leads to 2nd order nonlinear PDE. (For earlier examples, see Kohn-Serfaty CPAM 2006 and CPAM 2010.)
Our analysis was elementary, since PDE is linked to linear heat eqn. In other settings, when PDE solution is not smooth, convergence has been proved (without a rate) using viscosity methods.
Stepping back
Comparison to the machine learning literature
ML is mostly discrete. It was known that for the unscaled game, worst-case regret after $N$ steps is of order $\sqrt{N}$ (compare: our parabolic scaling). Our analysis gives the prefactor.
ML guys are smart. For the classic problem of minimizing worst-case regret, Andoni & Panigrahy found the same strategies that come from our analysis (arXiv:1305.1359), but didn’t have the tools to prove they’re optimal.
The link to a linear heat eqn gives surprisingly explicit solutions in the continuum limit.
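The link between the $\sqrt{N}$ law and the parabolic scaling can be made explicit with a short calculation. This is a sketch under stated assumptions: each step changes regret by order $\varepsilon$ and advances time by $\varepsilon^2$, as the $\varepsilon^2$ bookkeeping in the derivation suggests. The symbols $T$, $R_N$ and $X$ are introduced here for illustration.

```latex
% N steps of the scaled game cover the time interval T = N * epsilon^2.
% Scaled regret X and unscaled regret R_N differ by the factor epsilon.
\[
  N = \frac{T}{\varepsilon^{2}}, \qquad
  X = \varepsilon R_N
  \quad\Longrightarrow\quad
  R_N = \frac{X}{\varepsilon} = X \sqrt{\frac{N}{T}} .
\]
```

If the scaled worst-case regret $X$ stays of order one for $T$ of order one, the unscaled regret $R_N$ is of order $\sqrt{N}$, and the constant multiplying $\sqrt{N}$ is what the continuum analysis determines.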
Stepping back
Is this just a curiosity?
Key point: since behavior over many time steps is of interest, continuous time viewpoint should be useful.
But: the stock prediction problem is very simple: a binary time series and a linear “loss function.” What about other examples?
One might ask: when is worst-case regret minimization a good idea? Not obvious...
Happy Birthday, Yoshi!