Convergence analysis for gradient descent optimization methods in the training of ReLU neural networks


Transcript of Convergence analysis for gradient descent optimization ...

Page 1: Convergence analysis for gradient descent optimization ...

Convergence analysis for gradient descent optimization methods in the training of ReLU neural networks

Arnulf Jentzen (CUHK Shenzhen, China & University of Münster, Germany)
Funded by the German Research Foundation through the Cluster of Excellence Mathematics Münster

Joint works with

Patrick Cheridito (ETH Zurich, Switzerland)
Benjamin Fehrman (University of Oxford, UK)
Benjamin Gess (University of Bielefeld, Germany)
Martin Hutzenthaler (University of Duisburg-Essen, Germany)
Benno Kuckuck (University of Münster, Germany)
Thomas Kruse (University of Giessen, Germany)
Tuan Anh Nguyen (University of Duisburg-Essen, Germany)
Joshua Lee Padgett (University of Arkansas, USA)
Florian Rossmannek (ETH Zurich, Switzerland)
Adrian Riekert (University of Münster, Germany)

Wednesday, September 27th, 2021

Pages 2–30: Convergence analysis for gradient descent optimization ...

Theorem (Hutzenthaler, J, Kruse, Nguyen 2019 SN PDE; Kuckuck, J, Padgett 2021)

Let $T, p, \kappa > 0$, let $f \colon \mathbb{R} \to \mathbb{R}$ be Lipschitz, for every $d \in \mathbb{N}$ let $g_d \in C^1(\mathbb{R}^d, \mathbb{R})$ and let $u_d \colon [0,T] \times \mathbb{R}^d \to \mathbb{R}$ be an at most polynomially growing solution of

\[ \tfrac{\partial u_d}{\partial t} = \Delta_x u_d + f(u_d) \quad \text{with} \quad u_d(0,\cdot) = g_d, \]

assume $|g_d(x)| + \|(\nabla g_d)(x)\| \leq \kappa d^{\kappa}(1 + \|x\|^{\kappa})$, let $A_l \colon \mathbb{R}^l \to \mathbb{R}^l$, $l \in \mathbb{N}$, satisfy $A_l(x_1, \ldots, x_l) = (\max\{x_1, 0\}, \ldots, \max\{x_l, 0\})$, let

\[ \mathbf{N} = \textstyle\bigcup_{L \in \mathbb{N}} \bigcup_{l_0, \ldots, l_L \in \mathbb{N}} \bigl( \times_{n=1}^{L} (\mathbb{R}^{l_n \times l_{n-1}} \times \mathbb{R}^{l_n}) \bigr), \]

let $\mathcal{R} \colon \mathbf{N} \to \bigcup_{a,b=1}^{\infty} C(\mathbb{R}^a, \mathbb{R}^b)$ satisfy for all $L \in \mathbb{N}$, $l_0, \ldots, l_L \in \mathbb{N}$, $\Phi = ((W_1, B_1), \ldots, (W_L, B_L)) \in \times_{n=1}^{L} (\mathbb{R}^{l_n \times l_{n-1}} \times \mathbb{R}^{l_n})$, $x_0 \in \mathbb{R}^{l_0}, \ldots, x_L \in \mathbb{R}^{l_L}$ with $\forall\, n \in \{1, \ldots, L\} \colon x_n = A_{l_n}(W_n x_{n-1} + B_n)$ that

\[ (\mathcal{R}\Phi)(x_0) = W_L x_{L-1} + B_L, \]

let $\mathcal{P} \colon \mathbf{N} \to \mathbb{N}$ be the number of parameters, and let $(\mathbf{G}_{d,\varepsilon})_{d \in \mathbb{N},\, \varepsilon \in (0,1]} \subseteq \mathbf{N}$ satisfy $\mathcal{P}(\mathbf{G}_{d,\varepsilon}) \leq \kappa d^{\kappa} \varepsilon^{-\kappa}$ and $|g_d(x) - (\mathcal{R}\mathbf{G}_{d,\varepsilon})(x)| \leq \varepsilon \kappa d^{\kappa}(1 + \|x\|^{\kappa})$. Then there exist $(\mathbf{U}_{d,\varepsilon})_{d \in \mathbb{N},\, \varepsilon \in (0,1]} \subseteq \mathbf{N}$ and $c > 0$ such that for all $d \in \mathbb{N}$, $\varepsilon \in (0,1]$:

\[ \Bigl[ \int_{[0,T] \times [0,1]^d} |u_d(y) - (\mathcal{R}\mathbf{U}_{d,\varepsilon})(y)|^p \, dy \Bigr]^{1/p} \leq \varepsilon \qquad \text{and} \qquad \mathcal{P}(\mathbf{U}_{d,\varepsilon}) \leq c\, d^c \varepsilon^{-c}. \]

Linear PDEs: Grohs, Hornung, J, von Wurstemberger 2018 Mem. AMS; Berner, Grohs, J 2018 SIAM J. Math. Data Sci.; Elbrächter, Grohs, J, Schwab 2018; ...
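
To make the objects in this statement concrete, here is a minimal NumPy sketch of the realization map $\mathcal{R}$ and the parameter count $\mathcal{P}$ for a ReLU network $\Phi = ((W_1,B_1),\ldots,(W_L,B_L))$; the function names and the toy architecture are illustrative assumptions, not part of the talk.

```python
import numpy as np

def realization(phi, x0):
    """(R Phi)(x0) for phi = [(W1, B1), ..., (WL, BL)]: ReLU after every affine
    map except the last one, i.e. x_n = A_{l_n}(W_n x_{n-1} + B_n) for n < L."""
    x = np.asarray(x0, dtype=float)
    for W, B in phi[:-1]:
        x = np.maximum(W @ x + B, 0.0)   # componentwise ReLU activation A_l
    W_L, B_L = phi[-1]
    return W_L @ x + B_L                 # affine output layer, no activation

def num_parameters(phi):
    """P(Phi): total number of real entries in all weight matrices and bias vectors."""
    return sum(W.size + B.size for W, B in phi)

# Toy architecture l0 = 3, l1 = 5, l2 = 1, so P = (5*3 + 5) + (1*5 + 1) = 26.
rng = np.random.default_rng(0)
phi = [(rng.standard_normal((5, 3)), rng.standard_normal(5)),
       (rng.standard_normal((1, 5)), rng.standard_normal(1))]
print(realization(phi, np.ones(3)), num_parameters(phi))
```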


Pages 31–51: Convergence analysis for gradient descent optimization ...

Let $d, H, P \in \mathbb{N}$, $a \in \mathbb{R}$, $b > a$, $u \in C([a,b]^d, \mathbb{R})$ satisfy $P = dH + 2H + 1$, let $\mu \colon \mathcal{B}([a,b]^d) \to [0,\infty)$ be a measure, and let $\mathcal{L}_H \colon \mathbb{R}^P \to \mathbb{R}$ satisfy for all $\theta \in \mathbb{R}^P$:

\[ \mathcal{L}_H(\theta) = \int_{[a,b]^d} \Bigl[ u(x) - \theta_P - \sum_{i=1}^{H} \theta_{H(d+1)+i} \max\Bigl\{ \theta_{Hd+i} + \sum_{j=1}^{d} \theta_{(i-1)d+j}\, x_j,\ 0 \Bigr\} \Bigr]^2 \mu(dx), \]

let $\mathcal{G}_H \colon \mathbb{R}^P \to \mathbb{R}^P$ be an appropriately generalized gradient of $\mathcal{L}_H$, and let $\Theta \in C([0,\infty), \mathbb{R}^P)$ satisfy for all $t \in [0,\infty)$ that $\Theta_t = \Theta_0 - \int_0^t \mathcal{G}_H(\Theta_s)\, ds$.

Theorem (Cheridito, J, Riekert, Rossmannek 2021; J, Riekert 2021)

Assume for all $x, y \in [a,b]^d$ that $u(x) = u(y)$. Then there exist no non-global local minima and no saddle points of $\mathcal{L}_H$, and $\lim_{t \to \infty} \mathcal{L}_H(\Theta_t) = 0$.

Theorem (Cheridito, J, Rossmannek 2021; J, Riekert 2021)

Assume $\mu = \lambda_{[a,b]}$, let $\alpha, \beta \in \mathbb{R}$ satisfy $u(x) = \alpha x + \beta$, and assume $\liminf_{t \to \infty} \|\Theta_t\| < \infty$ and $\mathcal{L}_H(\Theta_0) < \frac{\alpha^2 (b-a)^3}{12\,(2\lfloor H/2 \rfloor + 1)^4}$. Then there exist infinitely many non-global local minima and infinitely many saddle points of $\mathcal{L}_H$, and $\lim_{t \to \infty} \mathcal{L}_H(\Theta_t) = 0$.

Theorem (Eberle, J, Riekert, Weiss 2021)

Assume $\mu \ll \lambda_{[a,b]^d}$ with a piecewise polynomial density, assume that $u$ is piecewise polynomial, and assume $\liminf_{t \to \infty} \|\Theta_t\| < \infty$. Then there exist $\vartheta \in (\mathcal{G}_H)^{-1}(0)$ and $c, \beta > 0$ such that $\|\Theta_t - \vartheta\| \leq c(1+t)^{-\beta}$ and $0 \leq \mathcal{L}_H(\Theta_t) - \mathcal{L}_H(\vartheta) \leq c(1+t)^{-1}$.
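
For concreteness, here is a minimal NumPy sketch of this shallow-network setting, using the slide's parameter indexing for $\theta \in \mathbb{R}^P$. It assumes $\mu$ is the uniform distribution on $[a,b]^d$, approximates $\mathcal{L}_H$ by Monte Carlo, and uses a forward-difference quotient plus an explicit Euler step as a crude stand-in for the gradient flow driven by the generalized gradient $\mathcal{G}_H$; all function names, sample sizes, and step sizes are illustrative assumptions, not part of the talk.

```python
import numpy as np

def shallow_relu(theta, x, d, H):
    """Shallow ReLU network with the slide's indexing of theta in R^P, P = d*H + 2*H + 1:
    input weights theta_{(i-1)d+j}, inner biases theta_{Hd+i},
    output weights theta_{H(d+1)+i}, outer bias theta_P."""
    w = theta[:d * H].reshape(H, d)          # row i holds the input weights of neuron i
    b = theta[d * H:d * H + H]               # inner biases
    v = theta[d * H + H:d * H + 2 * H]       # output weights
    c = theta[-1]                            # outer bias theta_P
    return v @ np.maximum(w @ x + b, 0.0) + c

def loss_mc(theta, u, d, H, a, b, n=1024):
    """Monte Carlo approximation of L_H(theta) with mu = uniform distribution on [a,b]^d."""
    xs = np.random.default_rng(0).uniform(a, b, size=(n, d))   # fixed samples, deterministic
    res = np.array([u(x) - shallow_relu(theta, x, d, H) for x in xs])
    return float(np.mean(res ** 2))

def euler_step(theta, u, d, H, a, b, dt=1e-2, h=1e-5):
    """One explicit Euler step for Theta_t = Theta_0 - int_0^t G_H(Theta_s) ds, with a
    forward-difference quotient standing in for the generalized gradient G_H."""
    grad = np.zeros_like(theta)
    base = loss_mc(theta, u, d, H, a, b)
    for k in range(theta.size):
        bump = np.zeros_like(theta)
        bump[k] = h
        grad[k] = (loss_mc(theta + bump, u, d, H, a, b) - base) / h
    return theta - dt * grad

d, H = 1, 4
P = d * H + 2 * H + 1
theta = np.random.default_rng(1).standard_normal(P)
u = lambda x: 2.0 * x[0] + 1.0               # affine target, as in the second theorem
for _ in range(10):
    theta = euler_step(theta, u, d, H, 0.0, 1.0)
print(loss_mc(theta, u, d, H, 0.0, 1.0))
```

At kinks of the ReLU the difference quotient need not agree with the generalized gradient used in the talk; the sketch is only meant to make the parameterization and the flow explicit.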

Page 32: Convergence analysis for gradient descent optimization ...

Let d,H,P ∈ N, a ∈ R, b > a, u ∈ C([a,b]d ,R) satisfy P = dH + 2H + 1, letµ : B([a,b]d )→ [0,∞) be measure, let LH : RP → R satisfy ∀ θ ∈ RP :

LH(θ)=∫

[a,b]d

[u(x)−θP−

∑Hi=1θH(d+1)+i maxθHd+i +

∑dj=1 θ(i−1)d+jxj , 0

]2µ(dx),

let GH : RP → RP be an appropriately generalized gradient of LH , and letΘ ∈ C([0,∞),RP) satisfy for all t ∈ [0,∞) that Θt = Θ0 −

∫ t0 GH(Θs) ds.

Theorem (Cheridito, J, Riekert, Rossmannek 2021; J, Riekert 2021)

Assume for all x, y ∈ [a,b]d that u(x) = u(y). Then there exist no non-global localminima and no saddle points of LH and limt→∞ LH(Θt) = 0.

Theorem (Cheridito, J, Rossmannek 2021; J, Riekert 2021)

Assume µ = λ[a,b], let α, β ∈ R satisfy u(x) = αx + β, and assume

lim inft→∞‖Θt‖ <∞ and LH(Θ0) < α2(b−a)3

12(2bH/2c+1)4 . Then there exist∞-many

non-global local minima and∞-many saddle points of LH and limt→∞ LH(Θt) = 0.

Theorem (Eberle, J, Riekert, Weiss 2021)

Assume µ λ[a,b]d with a piecewise polynomial density, assume that u is piecewisepolynomial, and assume lim inft→∞‖Θt‖ <∞. Then there exist ϑ ∈ (GH)−1(0),c, β > 0 such that ‖Θt −ϑ‖ ≤ c(1 + t)−β and 0 ≤ LH(Θt)−L(ϑ) ≤ c(1 + t)−1.

Page 33: Convergence analysis for gradient descent optimization ...

Let d,H,P ∈ N, a ∈ R, b > a, u ∈ C([a,b]d ,R) satisfy P = dH + 2H + 1, letµ : B([a,b]d )→ [0,∞) be measure, let LH : RP → R satisfy ∀ θ ∈ RP :

LH(θ)=∫

[a,b]d

[u(x)−θP−

∑Hi=1θH(d+1)+i maxθHd+i +

∑dj=1 θ(i−1)d+jxj , 0

]2µ(dx),

let GH : RP → RP be an appropriately generalized gradient of LH , and letΘ ∈ C([0,∞),RP) satisfy for all t ∈ [0,∞) that Θt = Θ0 −

∫ t0 GH(Θs) ds.

Theorem (Cheridito, J, Riekert, Rossmannek 2021; J, Riekert 2021)

Assume for all x, y ∈ [a,b]d that u(x) = u(y). Then there exist no non-global localminima and no saddle points of LH and limt→∞ LH(Θt) = 0.

Theorem (Cheridito, J, Rossmannek 2021; J, Riekert 2021)

Assume µ = λ[a,b], let α, β ∈ R satisfy u(x) = αx + β, and assume

lim inft→∞‖Θt‖ <∞ and LH(Θ0) < α2(b−a)3

12(2bH/2c+1)4 . Then there exist∞-many

non-global local minima and∞-many saddle points of LH and limt→∞ LH(Θt) = 0.

Theorem (Eberle, J, Riekert, Weiss 2021)

Assume µ λ[a,b]d with a piecewise polynomial density, assume that u is piecewisepolynomial, and assume lim inft→∞‖Θt‖ <∞. Then there exist ϑ ∈ (GH)−1(0),c, β > 0 such that ‖Θt −ϑ‖ ≤ c(1 + t)−β and 0 ≤ LH(Θt)−L(ϑ) ≤ c(1 + t)−1.

Page 34: Convergence analysis for gradient descent optimization ...

Let d,H,P ∈ N, a ∈ R, b > a, u ∈ C([a,b]d ,R) satisfy P = dH + 2H + 1, letµ : B([a,b]d )→ [0,∞) be measure, let LH : RP → R satisfy ∀ θ ∈ RP :

LH(θ)=∫

[a,b]d

[u(x)−θP−

∑Hi=1θH(d+1)+i maxθHd+i +

∑dj=1 θ(i−1)d+jxj , 0

]2µ(dx),

let GH : RP → RP be an appropriately generalized gradient of LH , and letΘ ∈ C([0,∞),RP) satisfy for all t ∈ [0,∞) that Θt = Θ0 −

∫ t0 GH(Θs) ds.

Theorem (Cheridito, J, Riekert, Rossmannek 2021; J, Riekert 2021)

Assume for all x, y ∈ [a,b]d that u(x) = u(y). Then there exist no non-global localminima and no saddle points of LH and limt→∞ LH(Θt) = 0.

Theorem (Cheridito, J, Rossmannek 2021; J, Riekert 2021)

Assume µ = λ[a,b], let α, β ∈ R satisfy u(x) = αx + β, and assume

lim inft→∞‖Θt‖ <∞ and LH(Θ0) < α2(b−a)3

12(2bH/2c+1)4 . Then there exist∞-many

non-global local minima and∞-many saddle points of LH and limt→∞ LH(Θt) = 0.

Theorem (Eberle, J, Riekert, Weiss 2021)

Assume µ λ[a,b]d with a piecewise polynomial density, assume that u is piecewisepolynomial, and assume lim inft→∞‖Θt‖ <∞. Then there exist ϑ ∈ (GH)−1(0),c, β > 0 such that ‖Θt −ϑ‖ ≤ c(1 + t)−β and 0 ≤ LH(Θt)−L(ϑ) ≤ c(1 + t)−1.

Page 35: Convergence analysis for gradient descent optimization ...

Let d,H,P ∈ N, a ∈ R, b > a, u ∈ C([a,b]d ,R) satisfy P = dH + 2H + 1, letµ : B([a,b]d )→ [0,∞) be measure, let LH : RP → R satisfy ∀ θ ∈ RP :

LH(θ)=∫

[a,b]d

[u(x)−θP−

∑Hi=1θH(d+1)+i maxθHd+i +

∑dj=1 θ(i−1)d+jxj , 0

]2µ(dx),

let GH : RP → RP be an appropriately generalized gradient of LH , and letΘ ∈ C([0,∞),RP) satisfy for all t ∈ [0,∞) that Θt = Θ0 −

∫ t0 GH(Θs) ds.

Theorem (Cheridito, J, Riekert, Rossmannek 2021; J, Riekert 2021)

Assume for all x, y ∈ [a,b]d that u(x) = u(y). Then there exist no non-global localminima and no saddle points of LH and limt→∞ LH(Θt) = 0.

Theorem (Cheridito, J, Rossmannek 2021; J, Riekert 2021)

Assume µ = λ[a,b], let α, β ∈ R satisfy u(x) = αx + β, and assume

lim inft→∞‖Θt‖ <∞ and LH(Θ0) < α2(b−a)3

12(2bH/2c+1)4 . Then there exist∞-many

non-global local minima and∞-many saddle points of LH and limt→∞ LH(Θt) = 0.

Theorem (Eberle, J, Riekert, Weiss 2021)

Assume µ λ[a,b]d with a piecewise polynomial density, assume that u is piecewisepolynomial, and assume lim inft→∞‖Θt‖ <∞. Then there exist ϑ ∈ (GH)−1(0),c, β > 0 such that ‖Θt −ϑ‖ ≤ c(1 + t)−β and 0 ≤ LH(Θt)−L(ϑ) ≤ c(1 + t)−1.

Page 36: Convergence analysis for gradient descent optimization ...

Let d,H,P ∈ N, a ∈ R, b > a, u ∈ C([a,b]d ,R) satisfy P = dH + 2H + 1, letµ : B([a,b]d )→ [0,∞) be measure, let LH : RP → R satisfy ∀ θ ∈ RP :

LH(θ)=∫

[a,b]d

[u(x)−θP−

∑Hi=1θH(d+1)+i maxθHd+i +

∑dj=1 θ(i−1)d+jxj , 0

]2µ(dx),

let GH : RP → RP be an appropriately generalized gradient of LH , and letΘ ∈ C([0,∞),RP) satisfy for all t ∈ [0,∞) that Θt = Θ0 −

∫ t0 GH(Θs) ds.

Theorem (Cheridito, J, Riekert, Rossmannek 2021; J, Riekert 2021)

Assume for all x, y ∈ [a,b]d that u(x) = u(y). Then there exist no non-global localminima and no saddle points of LH and limt→∞ LH(Θt) = 0.

Theorem (Cheridito, J, Rossmannek 2021; J, Riekert 2021)

Assume µ = λ[a,b], let α, β ∈ R satisfy u(x) = αx + β, and assume

lim inft→∞‖Θt‖ <∞ and LH(Θ0) < α2(b−a)3

12(2bH/2c+1)4 . Then there exist∞-many

non-global local minima and∞-many saddle points of LH and limt→∞ LH(Θt) = 0.

Theorem (Eberle, J, Riekert, Weiss 2021)

Assume µ λ[a,b]d with a piecewise polynomial density, assume that u is piecewisepolynomial, and assume lim inft→∞‖Θt‖ <∞. Then there exist ϑ ∈ (GH)−1(0),c, β > 0 such that ‖Θt −ϑ‖ ≤ c(1 + t)−β and 0 ≤ LH(Θt)−L(ϑ) ≤ c(1 + t)−1.

Page 37: Convergence analysis for gradient descent optimization ...

Let d,H,P ∈ N, a ∈ R, b > a, u ∈ C([a,b]d ,R) satisfy P = dH + 2H + 1, letµ : B([a,b]d )→ [0,∞) be measure, let LH : RP → R satisfy ∀ θ ∈ RP :

LH(θ)=∫

[a,b]d

[u(x)−θP−

∑Hi=1θH(d+1)+i maxθHd+i +

∑dj=1 θ(i−1)d+jxj , 0

]2µ(dx),

let GH : RP → RP be an appropriately generalized gradient of LH , and letΘ ∈ C([0,∞),RP) satisfy for all t ∈ [0,∞) that Θt = Θ0 −

∫ t0 GH(Θs) ds.

Theorem (Cheridito, J, Riekert, Rossmannek 2021; J, Riekert 2021)

Assume for all x, y ∈ [a,b]d that u(x) = u(y). Then there exist no non-global localminima and no saddle points of LH and limt→∞ LH(Θt) = 0.

Theorem (Cheridito, J, Rossmannek 2021; J, Riekert 2021)

Assume µ = λ[a,b], let α, β ∈ R satisfy u(x) = αx + β, and assume

lim inft→∞‖Θt‖ <∞ and LH(Θ0) < α2(b−a)3

12(2bH/2c+1)4 . Then there exist∞-many

non-global local minima and∞-many saddle points of LH and limt→∞ LH(Θt) = 0.

Theorem (Eberle, J, Riekert, Weiss 2021)

Assume µ λ[a,b]d with a piecewise polynomial density, assume that u is piecewisepolynomial, and assume lim inft→∞‖Θt‖ <∞. Then there exist ϑ ∈ (GH)−1(0),c, β > 0 such that ‖Θt −ϑ‖ ≤ c(1 + t)−β and 0 ≤ LH(Θt)−L(ϑ) ≤ c(1 + t)−1.

Page 38: Convergence analysis for gradient descent optimization ...

Let d,H,P ∈ N, a ∈ R, b > a, u ∈ C([a,b]d ,R) satisfy P = dH + 2H + 1, letµ : B([a,b]d )→ [0,∞) be measure, let LH : RP → R satisfy ∀ θ ∈ RP :

LH(θ)=∫

[a,b]d

[u(x)−θP−

∑Hi=1θH(d+1)+i maxθHd+i +

∑dj=1 θ(i−1)d+jxj , 0

]2µ(dx),

let GH : RP → RP be an appropriately generalized gradient of LH , and letΘ ∈ C([0,∞),RP) satisfy for all t ∈ [0,∞) that Θt = Θ0 −

∫ t0 GH(Θs) ds.

Theorem (Cheridito, J, Riekert, Rossmannek 2021; J, Riekert 2021)

Assume for all x, y ∈ [a,b]d that u(x) = u(y). Then there exist no non-global localminima and no saddle points of LH and limt→∞ LH(Θt) = 0.

Theorem (Cheridito, J, Rossmannek 2021; J, Riekert 2021)

Assume µ = λ[a,b], let α, β ∈ R satisfy u(x) = αx + β, and assume

lim inft→∞‖Θt‖ <∞ and LH(Θ0) < α2(b−a)3

12(2bH/2c+1)4 . Then there exist∞-many

non-global local minima and∞-many saddle points of LH and limt→∞ LH(Θt) = 0.

Theorem (Eberle, J, Riekert, Weiss 2021)

Assume µ λ[a,b]d with a piecewise polynomial density, assume that u is piecewisepolynomial, and assume lim inft→∞‖Θt‖ <∞. Then there exist ϑ ∈ (GH)−1(0),c, β > 0 such that ‖Θt −ϑ‖ ≤ c(1 + t)−β and 0 ≤ LH(Θt)−L(ϑ) ≤ c(1 + t)−1.

Page 39: Convergence analysis for gradient descent optimization ...

Let d,H,P ∈ N, a ∈ R, b > a, u ∈ C([a,b]d ,R) satisfy P = dH + 2H + 1, letµ : B([a,b]d )→ [0,∞) be measure, let LH : RP → R satisfy ∀ θ ∈ RP :

LH(θ)=∫

[a,b]d

[u(x)−θP−

∑Hi=1θH(d+1)+i maxθHd+i +

∑dj=1 θ(i−1)d+jxj , 0

]2µ(dx),

let GH : RP → RP be an appropriately generalized gradient of LH , and letΘ ∈ C([0,∞),RP) satisfy for all t ∈ [0,∞) that Θt = Θ0 −

∫ t0 GH(Θs) ds.

Theorem (Cheridito, J, Riekert, Rossmannek 2021; J, Riekert 2021)

Assume for all x, y ∈ [a,b]d that u(x) = u(y). Then there exist no non-global localminima and no saddle points of LH and limt→∞ LH(Θt) = 0.

Theorem (Cheridito, J, Rossmannek 2021; J, Riekert 2021)

Assume µ = λ[a,b], let α, β ∈ R satisfy u(x) = αx + β, and assume

lim inft→∞‖Θt‖ <∞ and LH(Θ0) < α2(b−a)3

12(2bH/2c+1)4 . Then there exist∞-many

non-global local minima and∞-many saddle points of LH and limt→∞ LH(Θt) = 0.

Theorem (Eberle, J, Riekert, Weiss 2021)

Assume µ λ[a,b]d with a piecewise polynomial density, assume that u is piecewisepolynomial, and assume lim inft→∞‖Θt‖ <∞. Then there exist ϑ ∈ (GH)−1(0),c, β > 0 such that ‖Θt −ϑ‖ ≤ c(1 + t)−β and 0 ≤ LH(Θt)−L(ϑ) ≤ c(1 + t)−1.

Page 40: Convergence analysis for gradient descent optimization ...

Let d,H,P ∈ N, a ∈ R, b > a, u ∈ C([a,b]d ,R) satisfy P = dH + 2H + 1, letµ : B([a,b]d )→ [0,∞) be measure, let LH : RP → R satisfy ∀ θ ∈ RP :

LH(θ)=∫

[a,b]d

[u(x)−θP−

∑Hi=1θH(d+1)+i maxθHd+i +

∑dj=1 θ(i−1)d+jxj , 0

]2µ(dx),

let GH : RP → RP be an appropriately generalized gradient of LH , and letΘ ∈ C([0,∞),RP) satisfy for all t ∈ [0,∞) that Θt = Θ0 −

∫ t0 GH(Θs) ds.

Theorem (Cheridito, J, Riekert, Rossmannek 2021; J, Riekert 2021)

Assume for all x, y ∈ [a,b]d that u(x) = u(y). Then there exist no non-global localminima and no saddle points of LH and limt→∞ LH(Θt) = 0.

Theorem (Cheridito, J, Rossmannek 2021; J, Riekert 2021)

Assume µ = λ[a,b], let α, β ∈ R satisfy u(x) = αx + β, and assume

lim inft→∞‖Θt‖ <∞ and LH(Θ0) < α2(b−a)3

12(2bH/2c+1)4 . Then there exist∞-many

non-global local minima and∞-many saddle points of LH and limt→∞ LH(Θt) = 0.

Theorem (Eberle, J, Riekert, Weiss 2021)

Assume µ λ[a,b]d with a piecewise polynomial density, assume that u is piecewisepolynomial, and assume lim inft→∞‖Θt‖ <∞. Then there exist ϑ ∈ (GH)−1(0),c, β > 0 such that ‖Θt −ϑ‖ ≤ c(1 + t)−β and 0 ≤ LH(Θt)−L(ϑ) ≤ c(1 + t)−1.

Page 41: Convergence analysis for gradient descent optimization ...

Let d,H,P ∈ N, a ∈ R, b > a, u ∈ C([a,b]d ,R) satisfy P = dH + 2H + 1, letµ : B([a,b]d )→ [0,∞) be measure, let LH : RP → R satisfy ∀ θ ∈ RP :

LH(θ)=∫

[a,b]d

[u(x)−θP−

∑Hi=1θH(d+1)+i maxθHd+i +

∑dj=1 θ(i−1)d+jxj , 0

]2µ(dx),

let GH : RP → RP be an appropriately generalized gradient of LH , and letΘ ∈ C([0,∞),RP) satisfy for all t ∈ [0,∞) that Θt = Θ0 −

∫ t0 GH(Θs) ds.

Theorem (Cheridito, J, Riekert, Rossmannek 2021; J, Riekert 2021)

Assume for all x, y ∈ [a,b]d that u(x) = u(y). Then there exist no non-global localminima and no saddle points of LH and limt→∞ LH(Θt) = 0.

Theorem (Cheridito, J, Rossmannek 2021; J, Riekert 2021)

Assume µ = λ[a,b], let α, β ∈ R satisfy u(x) = αx + β, and assume

lim inft→∞‖Θt‖ <∞ and LH(Θ0) < α2(b−a)3

12(2bH/2c+1)4 . Then there exist∞-many

non-global local minima and∞-many saddle points of LH and limt→∞ LH(Θt) = 0.

Theorem (Eberle, J, Riekert, Weiss 2021)

Assume µ λ[a,b]d with a piecewise polynomial density, assume that u is piecewisepolynomial, and assume lim inft→∞‖Θt‖ <∞. Then there exist ϑ ∈ (GH)−1(0),c, β > 0 such that ‖Θt −ϑ‖ ≤ c(1 + t)−β and 0 ≤ LH(Θt)−L(ϑ) ≤ c(1 + t)−1.

Page 42: Convergence analysis for gradient descent optimization ...

Let d,H,P ∈ N, a ∈ R, b > a, u ∈ C([a,b]d ,R) satisfy P = dH + 2H + 1, letµ : B([a,b]d )→ [0,∞) be measure, let LH : RP → R satisfy ∀ θ ∈ RP :

LH(θ)=∫

[a,b]d

[u(x)−θP−

∑Hi=1θH(d+1)+i maxθHd+i +

∑dj=1 θ(i−1)d+jxj , 0

]2µ(dx),

let GH : RP → RP be an appropriately generalized gradient of LH , and letΘ ∈ C([0,∞),RP) satisfy for all t ∈ [0,∞) that Θt = Θ0 −

∫ t0 GH(Θs) ds.

Theorem (Cheridito, J, Riekert, Rossmannek 2021; J, Riekert 2021)

Assume for all x, y ∈ [a,b]d that u(x) = u(y). Then there exist no non-global localminima and no saddle points of LH and limt→∞ LH(Θt) = 0.

Theorem (Cheridito, J, Rossmannek 2021; J, Riekert 2021)

Assume µ = λ[a,b], let α, β ∈ R satisfy u(x) = αx + β, and assume

lim inft→∞‖Θt‖ <∞ and LH(Θ0) < α2(b−a)3

12(2bH/2c+1)4 . Then there exist∞-many

non-global local minima and∞-many saddle points of LH and limt→∞ LH(Θt) = 0.

Theorem (Eberle, J, Riekert, Weiss 2021)

Assume µ λ[a,b]d with a piecewise polynomial density, assume that u is piecewisepolynomial, and assume lim inft→∞‖Θt‖ <∞. Then there exist ϑ ∈ (GH)−1(0),c, β > 0 such that ‖Θt −ϑ‖ ≤ c(1 + t)−β and 0 ≤ LH(Θt)−L(ϑ) ≤ c(1 + t)−1.

Page 43: Convergence analysis for gradient descent optimization ...

Let d,H,P ∈ N, a ∈ R, b > a, u ∈ C([a,b]d ,R) satisfy P = dH + 2H + 1, letµ : B([a,b]d )→ [0,∞) be measure, let LH : RP → R satisfy ∀ θ ∈ RP :

LH(θ)=∫

[a,b]d

[u(x)−θP−

∑Hi=1θH(d+1)+i maxθHd+i +

∑dj=1 θ(i−1)d+jxj , 0

]2µ(dx),

let GH : RP → RP be an appropriately generalized gradient of LH , and letΘ ∈ C([0,∞),RP) satisfy for all t ∈ [0,∞) that Θt = Θ0 −

∫ t0 GH(Θs) ds.

Theorem (Cheridito, J, Riekert, Rossmannek 2021; J, Riekert 2021)

Assume for all x, y ∈ [a,b]d that u(x) = u(y). Then there exist no non-global localminima and no saddle points of LH and limt→∞ LH(Θt) = 0.

Theorem (Cheridito, J, Rossmannek 2021; J, Riekert 2021)

Assume µ = λ[a,b], let α, β ∈ R satisfy u(x) = αx + β, and assume

lim inft→∞‖Θt‖ <∞ and LH(Θ0) < α2(b−a)3

12(2bH/2c+1)4 . Then there exist∞-many

non-global local minima and∞-many saddle points of LH and limt→∞ LH(Θt) = 0.

Theorem (Eberle, J, Riekert, Weiss 2021)

Assume µ λ[a,b]d with a piecewise polynomial density, assume that u is piecewisepolynomial, and assume lim inft→∞‖Θt‖ <∞. Then there exist ϑ ∈ (GH)−1(0),c, β > 0 such that ‖Θt −ϑ‖ ≤ c(1 + t)−β and 0 ≤ LH(Θt)−L(ϑ) ≤ c(1 + t)−1.

Page 44: Convergence analysis for gradient descent optimization ...

Let d,H,P ∈ N, a ∈ R, b > a, u ∈ C([a,b]d ,R) satisfy P = dH + 2H + 1, letµ : B([a,b]d )→ [0,∞) be measure, let LH : RP → R satisfy ∀ θ ∈ RP :

LH(θ)=∫

[a,b]d

[u(x)−θP−

∑Hi=1θH(d+1)+i maxθHd+i +

∑dj=1 θ(i−1)d+jxj , 0

]2µ(dx),

let GH : RP → RP be an appropriately generalized gradient of LH , and letΘ ∈ C([0,∞),RP) satisfy for all t ∈ [0,∞) that Θt = Θ0 −

∫ t0 GH(Θs) ds.

Theorem (Cheridito, J, Riekert, Rossmannek 2021; J, Riekert 2021)

Assume for all x, y ∈ [a,b]d that u(x) = u(y). Then there exist no non-global localminima and no saddle points of LH and limt→∞ LH(Θt) = 0.

Theorem (Cheridito, J, Rossmannek 2021; J, Riekert 2021)

Assume µ = λ[a,b], let α, β ∈ R satisfy u(x) = αx + β, and assume

lim inft→∞‖Θt‖ <∞ and LH(Θ0) < α2(b−a)3

12(2bH/2c+1)4 . Then there exist∞-many

non-global local minima and∞-many saddle points of LH and limt→∞ LH(Θt) = 0.

Theorem (Eberle, J, Riekert, Weiss 2021)

Assume µ λ[a,b]d with a piecewise polynomial density, assume that u is piecewisepolynomial, and assume lim inft→∞‖Θt‖ <∞. Then there exist ϑ ∈ (GH)−1(0),c, β > 0 such that ‖Θt −ϑ‖ ≤ c(1 + t)−β and 0 ≤ LH(Θt)−L(ϑ) ≤ c(1 + t)−1.

Page 45: Convergence analysis for gradient descent optimization ...

Let d,H,P ∈ N, a ∈ R, b > a, u ∈ C([a,b]d ,R) satisfy P = dH + 2H + 1, letµ : B([a,b]d )→ [0,∞) be measure, let LH : RP → R satisfy ∀ θ ∈ RP :

LH(θ)=∫

[a,b]d

[u(x)−θP−

∑Hi=1θH(d+1)+i maxθHd+i +

∑dj=1 θ(i−1)d+jxj , 0

]2µ(dx),

let GH : RP → RP be an appropriately generalized gradient of LH , and letΘ ∈ C([0,∞),RP) satisfy for all t ∈ [0,∞) that Θt = Θ0 −

∫ t0 GH(Θs) ds.

Theorem (Cheridito, J, Riekert, Rossmannek 2021; J, Riekert 2021)

Assume for all x, y ∈ [a,b]d that u(x) = u(y). Then there exist no non-global localminima and no saddle points of LH and limt→∞ LH(Θt) = 0.

Theorem (Cheridito, J, Rossmannek 2021; J, Riekert 2021)

Assume µ = λ[a,b], let α, β ∈ R satisfy u(x) = αx + β, and assume

lim inft→∞‖Θt‖ <∞ and LH(Θ0) < α2(b−a)3

12(2bH/2c+1)4 . Then there exist∞-many

non-global local minima and∞-many saddle points of LH and limt→∞ LH(Θt) = 0.

Theorem (Eberle, J, Riekert, Weiss 2021)

Assume µ λ[a,b]d with a piecewise polynomial density, assume that u is piecewisepolynomial, and assume lim inft→∞‖Θt‖ <∞. Then there exist ϑ ∈ (GH)−1(0),c, β > 0 such that ‖Θt −ϑ‖ ≤ c(1 + t)−β and 0 ≤ LH(Θt)−L(ϑ) ≤ c(1 + t)−1.

Page 46: Convergence analysis for gradient descent optimization ...

Let d,H,P ∈ N, a ∈ R, b > a, u ∈ C([a,b]d ,R) satisfy P = dH + 2H + 1, letµ : B([a,b]d )→ [0,∞) be measure, let LH : RP → R satisfy ∀ θ ∈ RP :

LH(θ)=∫

[a,b]d

[u(x)−θP−

∑Hi=1θH(d+1)+i maxθHd+i +

∑dj=1 θ(i−1)d+jxj , 0

]2µ(dx),

let GH : RP → RP be an appropriately generalized gradient of LH , and letΘ ∈ C([0,∞),RP) satisfy for all t ∈ [0,∞) that Θt = Θ0 −

∫ t0 GH(Θs) ds.

Theorem (Cheridito, J, Riekert, Rossmannek 2021; J, Riekert 2021)

Assume for all x, y ∈ [a,b]d that u(x) = u(y). Then there exist no non-global localminima and no saddle points of LH and limt→∞ LH(Θt) = 0.

Theorem (Cheridito, J, Rossmannek 2021; J, Riekert 2021)

Assume µ = λ[a,b], let α, β ∈ R satisfy u(x) = αx + β, and assume

lim inft→∞‖Θt‖ <∞ and LH(Θ0) < α2(b−a)3

12(2bH/2c+1)4 . Then there exist∞-many

non-global local minima and∞-many saddle points of LH and limt→∞ LH(Θt) = 0.

Theorem (Eberle, J, Riekert, Weiss 2021)

Assume µ λ[a,b]d with a piecewise polynomial density, assume that u is piecewisepolynomial, and assume lim inft→∞‖Θt‖ <∞. Then there exist ϑ ∈ (GH)−1(0),c, β > 0 such that ‖Θt −ϑ‖ ≤ c(1 + t)−β and 0 ≤ LH(Θt)−L(ϑ) ≤ c(1 + t)−1.

Page 64: Convergence analysis for gradient descent optimization ...

Theorem (J, Riekert 2021)

Assume µ ≪ λ_{[a,b]^d} and lim inf_{t→∞} ‖Θ_t‖ < ∞. Then there exists ϑ ∈ (G_H)^{−1}(0) such that lim_{t→∞} L_H(Θ_t) = L_H(ϑ).

Theorem (J, Riekert 2021)

Assume µ ∼ λ_{[a,b]} with a Lipschitz continuous density, assume that u is piecewise affine linear, let (Ω, F, P) be a probability space, for every H, k ∈ N, γ ∈ R let Θ^{H,k,γ}_n: Ω → R^{3H+1}, n ∈ N_0, and k^{H,k,γ}_n: Ω → N, n ∈ N_0, be random variables satisfying for all n ∈ N_0 that

Θ^{H,k,γ}_{n+1} = Θ^{H,k,γ}_n − γ G_H(Θ^{H,k,γ}_n)   and   k^{H,k,γ}_n ∈ argmin_{ℓ ∈ {1,...,k}} L_H(Θ^{H,ℓ,γ}_n),

and assume for all H ∈ N, γ ∈ R that Θ^{H,k,γ}_0, k ∈ N, are i.i.d. standard normal. Then

∃ g > 0: ∀ γ ∈ (0, g]: lim inf_{H→∞} lim inf_{K→∞} P( lim sup_{n→∞} L_H( Θ^{H, k^{H,K,γ}_n, γ}_n ) = 0 ) = 1.

Proof based on Fehrman, Gess, J 2020 JMLR.
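
The random-initialization scheme in the second theorem can be mimicked numerically as follows. This sketch assumes the helpers risk_MC and grad_MC from the previous code block, takes d = 1 (so P = 3H + 1 as in the theorem), an illustrative piecewise affine linear target, uniform sampling in place of µ, and specific values for γ, K, and the number of steps. It illustrates the objects Θ^{H,k,γ}_n and the selector k^{H,K,γ}_n for a single random draw; it does not reproduce the probabilistic statement itself.

import numpy as np

# Assumes risk_MC and grad_MC from the previous sketch; d = 1, so P = 3H + 1.
d, H, a, b = 1, 16, 0.0, 1.0
P = 3 * H + 1
gamma, K, steps = 0.01, 8, 3000          # step size gamma and K random initializations
rng = np.random.default_rng(1)
u = lambda xs: np.abs(xs[:, 0] - 0.5)    # illustrative piecewise affine linear target
xs = rng.uniform(a, b, size=(2000, d))

# Theta^{H,k,gamma}_0, k = 1,...,K: i.i.d. standard normal initializations.
thetas = [rng.standard_normal(P) for _ in range(K)]
for n in range(steps):
    thetas = [th - gamma * grad_MC(th, u, xs, d, H) for th in thetas]

risks = [risk_MC(th, u, xs, d, H) for th in thetas]
k_n = int(np.argmin(risks))              # k^{H,K,gamma}_n: index of the currently best run
print("best run:", k_n, "with approximate risk", risks[k_n])
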

Page 84: Convergence analysis for gradient descent optimization ...

Appendix

Page 85: Convergence analysis for gradient descent optimization ...

Let d, H, P ∈ N, a ∈ R, b ∈ (a,∞), f ∈ C([a,b]^d, R) satisfy P = dH + 2H + 1, let R_r ∈ C(R,R), r ∈ N ∪ {∞}, satisfy for all x ∈ R that {R_r : r ∈ N} ⊆ C¹(R,R), R_∞(x) = max{x, 0}, sup_{r∈N} sup_{y∈[−|x|,|x|]} |(R_r)′(y)| < ∞, and

lim sup_{r→∞} ( |R_r(x) − R_∞(x)| + |(R_r)′(x) − 1_{(0,∞)}(x)| ) = 0,

let µ: B([a,b]^d) → [0,∞] be a measure, let L_r: R^P → R, r ∈ N ∪ {∞}, satisfy for all r ∈ N ∪ {∞}, θ = (θ_1, ..., θ_P) ∈ R^P that

L_r(θ) = ∫_{[a,b]^d} ( f(x_1, ..., x_d) − θ_P − Σ_{i=1}^H θ_{H(d+1)+i} [ R_r( θ_{Hd+i} + Σ_{j=1}^d θ_{(i−1)d+j} x_j ) ] )² µ(d(x_1, ..., x_d)),

let G: R^P → R^P satisfy for all θ ∈ {ϑ ∈ R^P : ((∇L_r)(ϑ))_{r∈N} is convergent} that G(θ) = lim_{r→∞} (∇L_r)(θ), and let Θ ∈ C([0,∞), R^P) satisfy

∀ t ∈ [0,∞): Θ_t = Θ_0 − ∫_0^t G(Θ_s) ds.

Lemma

There exists an open U ⊆ R^P such that ∫_{R^P\U} 1 dx = 0, (L_∞)|_U ∈ C¹(U, R), and ∇((L_∞)|_U) = G|_U.
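
One concrete family R_r satisfying the conditions above is a clipped-quadratic smoothing of the ReLU; this particular choice is an assumption of the sketch below and not necessarily the one used in the cited works. The snippet checks numerically that R_r converges to max{x, 0} and that (R_r)′ converges to 1_(0,∞) pointwise, which is the mechanism behind the definition of the limiting gradient G.

import numpy as np

def R(r, x):
    """A C^1 smoothing of ReLU satisfying the stated conditions (one possible choice):
    0 on (-inf, 0], r*x^2/2 on [0, 1/r], x - 1/(2r) on [1/r, inf)."""
    x = np.asarray(x, dtype=float)
    return np.where(x <= 0, 0.0,
           np.where(x <= 1.0 / r, r * x ** 2 / 2.0, x - 1.0 / (2.0 * r)))

def dR(r, x):
    """Derivative of R(r, .): 0, then r*x, then 1; converges pointwise to 1_(0,inf)."""
    x = np.asarray(x, dtype=float)
    return np.where(x <= 0, 0.0, np.where(x <= 1.0 / r, r * x, 1.0))

relu = lambda x: np.maximum(np.asarray(x, dtype=float), 0.0)
xs = np.array([-1.0, -0.1, 0.0, 0.05, 0.5, 2.0])
for r in [1, 10, 100, 1000]:
    err = np.max(np.abs(R(r, xs) - relu(xs)) + np.abs(dR(r, xs) - (xs > 0)))
    print(f"r = {r:5d}: max pointwise error of (R_r, R_r') on the sample = {err:.4f}")
# The error at each fixed x tends to 0 as r -> infinity, matching the assumptions on R_r;
# gradients of the smoothed risks L_r then converge (where they converge) to G.
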

Page 101: Convergence analysis for gradient descent optimization ...

Theorem (Cheridito J Riekert Rossmannek 2021, J Riekert 2021)

Assume for all x, y ∈ [a,b]^d that f(x) = f(y). Then

1. there exist no non-global local minima of L_∞,

2. there exist no saddle points of L_∞,

3. it holds for all θ ∈ G^{−1}(0) that θ is a global minimum of L_∞, and

4. it holds that lim sup_{t→∞} L_∞(Θ_t) = 0.

Theorem (Cheridito J Rossmannek 2021, J Riekert 2021)

Assume µ = λ_{[a,b]}, let α, β ∈ R satisfy for all x ∈ [a,b] that f(x) = αx + β, and assume sup_{t∈[0,∞)} (H − 1)‖Θ_t‖ < ∞ and L_∞(Θ_0) < α²(b−a)³ / [12(2⌊H/2⌋+1)⁴]. Then

1. there exist infinitely many non-global local minima of L_∞,

2. there exist infinitely many saddle points of L_∞, and

3. it holds that lim sup_{t→∞} L_∞(Θ_t) = 0.

Theorem (J Riekert 2021)

Assume sup_{t∈[0,∞)} ‖Θ_t‖ < ∞ and µ ≪ λ_{[a,b]^d}. Then there exists ϑ ∈ G^{−1}(0) such that lim sup_{t→∞} L_∞(Θ_t) = L_∞(ϑ).
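
For orientation, the smallness condition on the initial risk in the second theorem is fully explicit. The short snippet below evaluates the bound α²(b−a)³ / [12(2⌊H/2⌋+1)⁴] for a few widths H; the specific values of α, a, b are illustrative and not taken from the slides.

import math

def risk_threshold(alpha: float, a: float, b: float, H: int) -> float:
    """The bound alpha^2 (b - a)^3 / (12 (2*floor(H/2) + 1)^4) from the second theorem:
    trajectories starting with risk below this value (and suitably bounded) still reach
    risk 0, even though infinitely many non-global local minima exist."""
    return alpha ** 2 * (b - a) ** 3 / (12.0 * (2 * math.floor(H / 2) + 1) ** 4)

# Illustrative values: target f(x) = 2x + 1 on [0, 1], i.e. alpha = 2, a = 0, b = 1.
for H in [1, 2, 4, 8, 16]:
    print(f"H = {H:2d}: threshold = {risk_threshold(2.0, 0.0, 1.0, H):.3e}")
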

Page 113: Convergence analysis for gradient descent optimization ...

Theorem (Cheridito J Riekert Rossmannek 2021, J Riekert 2021)

Assume for all x, y ∈ [a,b]d that f(x) = f(y). Then

1 there exist no non-global local minima of L∞,

2 there exist no saddle points of L∞,

3 it holds for all θ ∈ G−1(0) that θ is a global minima of L∞, and

4 it holds that lim supt→∞ L∞(Θt) = 0.

Theorem (Cheridito J Rossmannek 2021, J Riekert 2021)

Assume µ = λ[a,b], let α, β ∈ R, satisfy for all x ∈ [a,b] that f(x) = αx + β, and

assume supt∈[0,∞)(H − 1)‖Θt‖ <∞, and L∞(Θ0) < α2(b−a)3

12(2bH/2c+1)4 . Then

1 there exist infinitely many non-global local minima of L∞,

2 there exist infinitely many saddle points of L∞, and

3 it holds that lim supt→∞ L∞(Θt) = 0.

Theorem (J Riekert 2021)

Assume supt∈[0,∞)‖Θt‖ <∞ and µ λ[a,b]d . Then there exists ϑ ∈ G−1(0)such that lim supt→∞ L∞(Θt) = L∞(ϑ).

Page 114: Convergence analysis for gradient descent optimization ...

Theorem (Cheridito J Riekert Rossmannek 2021, J Riekert 2021)

Assume for all x, y ∈ [a,b]d that f(x) = f(y). Then

1 there exist no non-global local minima of L∞,

2 there exist no saddle points of L∞,

3 it holds for all θ ∈ G−1(0) that θ is a global minima of L∞, and

4 it holds that lim supt→∞ L∞(Θt) = 0.

Theorem (Cheridito J Rossmannek 2021, J Riekert 2021)

Assume µ = λ[a,b], let α, β ∈ R, satisfy for all x ∈ [a,b] that f(x) = αx + β, and

assume supt∈[0,∞)(H − 1)‖Θt‖ <∞, and L∞(Θ0) < α2(b−a)3

12(2bH/2c+1)4 . Then

1 there exist infinitely many non-global local minima of L∞,

2 there exist infinitely many saddle points of L∞, and

3 it holds that lim supt→∞ L∞(Θt) = 0.

Theorem (J Riekert 2021)

Assume supt∈[0,∞)‖Θt‖ <∞ and µ λ[a,b]d . Then there exists ϑ ∈ G−1(0)such that lim supt→∞ L∞(Θt) = L∞(ϑ).

Page 115: Convergence analysis for gradient descent optimization ...

Theorem (Cheridito J Riekert Rossmannek 2021, J Riekert 2021)

Assume for all x, y ∈ [a,b]d that f(x) = f(y). Then

1 there exist no non-global local minima of L∞,

2 there exist no saddle points of L∞,

3 it holds for all θ ∈ G−1(0) that θ is a global minima of L∞, and

4 it holds that lim supt→∞ L∞(Θt) = 0.

Theorem (Cheridito J Rossmannek 2021, J Riekert 2021)

Assume µ = λ[a,b], let α, β ∈ R, satisfy for all x ∈ [a,b] that f(x) = αx + β, and

assume supt∈[0,∞)(H − 1)‖Θt‖ <∞, and L∞(Θ0) < α2(b−a)3

12(2bH/2c+1)4 . Then

1 there exist infinitely many non-global local minima of L∞,

2 there exist infinitely many saddle points of L∞, and

3 it holds that lim supt→∞ L∞(Θt) = 0.

Theorem (J Riekert 2021)

Assume supt∈[0,∞)‖Θt‖ <∞ and µ λ[a,b]d . Then there exists ϑ ∈ G−1(0)such that lim supt→∞ L∞(Θt) = L∞(ϑ).

Page 116: Convergence analysis for gradient descent optimization ...

Hamilton-Jacobi-Bellman equations

Consider

    ∂u/∂t = ∆_x u − ‖∇_x u‖²

with u(0, x) = √‖x‖ for t ∈ [0, 1], x ∈ R^d.

d      Mean      Std. dev.   Ref. value  rel. L1-error  Std. dev. rel. error  avg. runtime
10     2.07017   0.00634850  2.04629     0.01167        0.00310245            58.200
50     3.15098   0.00275839  3.13788     0.00417        0.00087906            58.359
100    3.75329   0.00136920  3.74471     0.00229        0.00036564            58.329
200    4.46734   0.00079688  4.46172     0.00126        0.00017860            58.159
300    4.94586   0.00087736  4.94105     0.00097        0.00017756            58.819
500    5.62126   0.00045092  5.61735     0.00070        0.00008027            57.670
1000   6.68594   0.00040334  6.68335     0.00039        0.00006035            66.546
5000   9.97266   0.00047098  9.99835     0.00257        0.00004711            393.894
10000  11.87860  0.00022705  11.89099    0.00104        0.00001909            1687.680

Approximations for u(1, 0); Time steps: 24; SGD steps: 500 (d < 10^4), 600 (d = 10^4); Learning rates: 1/10, 1/100, 1/1000. Beck, Becker, Cheridito, J, Neufeld 2019 SISC (to appear)
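For this particular equation, reference values of u(1, 0) can be cross-checked independently of the deep-learning approximation: the Cole-Hopf transform v = exp(−u) turns the PDE into the heat equation ∂v/∂t = ∆_x v, so u(t, x) = −ln E[exp(−√‖x + √2 W_t‖)] for a standard Brownian motion W. The sketch below is a plain Monte Carlo estimate of this expression (an illustration, not necessarily the procedure behind the table; sample size and seed are arbitrary choices).

    import numpy as np

    def hjb_reference(d, t=1.0, samples=10**6, seed=0):
        # Monte Carlo estimate of u(t, 0) = -ln E[ exp( -sqrt(||sqrt(2) W_t||) ) ],
        # obtained via the Cole-Hopf transform of the HJB equation above.
        rng = np.random.default_rng(seed)
        w = rng.normal(size=(samples, d)) * np.sqrt(t)            # W_t ~ N(0, t I_d)
        vals = np.exp(-np.sqrt(np.linalg.norm(np.sqrt(2.0) * w, axis=1)))
        return -np.log(np.mean(vals))

    print(hjb_reference(d=10))   # close to the table's reference value 2.04629 for d = 10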

Page 133: Convergence analysis for gradient descent optimization ...

Full history recursive Multilevel-Picard method: Let T > 0, L, p ≥ 0, Θ = ∪_{n=1}^∞ Z^n, ∀ d ∈ N let g_d ∈ C(R^d, R) satisfy ∀ x ∈ R^d: |g_d(x)| ≤ L(1 + ‖x‖^p), let f : R → R be Lipschitz, let (Ω, F, P) be a probability space, let W^{d,θ} : [0, T] × Ω → R^d, d ∈ N, θ ∈ Θ, be i.i.d. Brownian motions, let S^θ : [0, T] × Ω → R, θ ∈ Θ, be i.i.d. continuous processes satisfying ∀ t ∈ [0, T], θ ∈ Θ that S^θ_t is U_{[t,T]}-distributed, assume that (S^θ)_{θ∈Θ} and (W^{d,θ})_{θ∈Θ, d∈N} are independent, let U^{d,θ}_{n,M} : [0, T] × R^d × Ω → R, d, n, M ∈ Z, θ ∈ Θ, satisfy ∀ d, M ∈ N, n ∈ N_0, θ ∈ Θ, t ∈ [0, T], x ∈ R^d:

    U^{d,θ}_{n,M}(t, x) = (1_N(n) / M^n) Σ_{m=1}^{M^n} g_d( x + W^{d,(θ,0,−m)}_{T−t} )

        + Σ_{l=0}^{n−1} ((T−t) / M^{n−l}) Σ_{m=1}^{M^{n−l}} [ f( U^{d,(θ,l,m)}_{l,M}( S^{(θ,l,m)}_t, x + W^{d,(θ,l,m)}_{S^{(θ,l,m)}_t − t} ) )
                                                              − 1_N(l) f( U^{d,(θ,l,m)}_{l−1,M}( S^{(θ,l,m)}_t, x + W^{d,(θ,l,m)}_{S^{(θ,l,m)}_t − t} ) ) ]

and ∀ d, n ∈ N let Cost_{d,n} ∈ N be the computational cost of U^{d,0}_{n,n}(0, 0).
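Read as an algorithm, the recursion above can be implemented directly. The Python sketch below mirrors the displayed formula for the terminal value problem stated on Page 151 (∂u_d/∂t + ½∆_x u_d + f(u_d) = 0 with u_d(T, ·) = g_d); unlike the formal definition it draws fresh independent randomness at every recursive call rather than reusing the (θ, l, m)-indexed families, and all names are illustrative.

    import numpy as np

    def mlp(n, M, t, x, T, f, g, rng):
        # Monte Carlo MLP estimate U_{n,M}(t, x) of the displayed recursion.
        d = x.shape[0]
        if n == 0:
            return 0.0                                   # 1_N(0) = 0 and the l-sum is empty
        # terminal-condition term: M^{-n} * sum_m g(x + W_{T-t})
        u = 0.0
        for _ in range(M ** n):
            u += g(x + rng.normal(size=d) * np.sqrt(T - t))
        u /= M ** n
        # multilevel correction terms over levels l = 0, ..., n-1
        for l in range(n):
            for _ in range(M ** (n - l)):
                s = rng.uniform(t, T)                    # S_t ~ U_[t,T]
                y = x + rng.normal(size=d) * np.sqrt(s - t)
                v = f(mlp(l, M, s, y, T, f, g, rng))
                if l > 0:                                # the 1_N(l) f(U_{l-1,M}(...)) term
                    v -= f(mlp(l - 1, M, s, y, T, f, g, rng))
                u += (T - t) / (M ** (n - l)) * v
        return u

With n = M, the call mlp(n, n, 0.0, np.zeros(d), T, f, g, rng) plays the role of U^{d,0}_{n,n}(0, 0), whose computational cost is the Cost_{d,n} referenced above.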

Page 151: Convergence analysis for gradient descent optimization ...

Then

(i) ∀ d ∈ N: there exists an at most polynomially growing solution u_d : [0, T] × R^d → R of

    ∂u_d/∂t + ½ ∆_x u_d + f(u_d) = 0   with   u_d(T, ·) = g_d

and

(ii) ∀ δ > 0: there exist n : N × (0,∞) → N, (d, ε) ↦ n_{d,ε}, and C > 0 such that ∀ d ∈ N, ε > 0:

    ( E[ |u_d(0, 0) − U^{d,0}_{n_{d,ε},n_{d,ε}}(0, 0)|² ] )^{1/2} ≤ ε   and   Cost_{d,n_{d,ε}} ≤ C d^{1+p(1+δ)} ε^{−(2+δ)}.

Extensions: Algorithms/Simulations/Proofs: Fully nonlinear PDEs (Beck, E, J 2018 JNS), Optimal stopping (Becker, Cheridito, J 2018 JMLR), Uniform errors (Beck, Becker, Grohs, Jaafari, J 2018), Semilinear PDEs/CVA (Hutzenthaler, J, von Wurstemberger 2019 EJP; Hutzenthaler, J, Kruse, Nguyen 2020), Non-Lipschitz nonlinearities (Beck, Hornung, Hutzenthaler, Jentzen, Kruse 2019 J. Numer. Math.), Gradient-dependent nonlinearities (Hutzenthaler, J, Kruse 2019), Elliptic PDEs (Beck, Gonon, J 2020), Discrete problems (Beck, J, Kruse 2020), . . .
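As a quick usage illustration of the estimator U^{d,0}_{n,n}(0, 0) from item (ii), the mlp sketch given after Page 133 can be called with small, purely hypothetical choices of d, T, f, and g (any Lipschitz f and at most polynomially growing g fit the stated assumptions); n = M = 3 keeps the recursion cheap.

    # Reuses numpy and the mlp sketch from after Page 133; all concrete choices are illustrative.
    rng = np.random.default_rng(0)
    d, T = 10, 1.0
    f = np.sin                                    # a Lipschitz nonlinearity
    g = lambda y: float(np.log(1.0 + y @ y))      # an at most polynomially growing terminal condition
    estimate = mlp(n=3, M=3, t=0.0, x=np.zeros(d), T=T, f=f, g=g, rng=rng)
    print(estimate)                               # Monte Carlo estimate of u_d(0, 0)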
