Entropy numbers of nonlinear systems - Master's thesis … · 2021. 7. 26. · Entropy numbers of...

Entropy numbers of nonlinear systems

Master’s thesis presentation

Guillaume Wang

ETHZ MINS

March 15, 2021


Motivation

"What is a nonlinear system?"

Goal: learn $S$ from observations $(x_i, y_i)$

Motivation: System identification

Input: $i(t)$, output: $u(t)$. $u(t) = S[i(t)]$?

Motivation: Signal-to-signal tasks in ML

Text-to-text

Super-resolution imaging

Motivation

Classical (regression/classification):

learn $f : \mathbb{R}^d \to \mathbb{R}$ or $\{0, 1\}$

Nonlinear system identification:

learn (e.g.) $S : L^\infty(\mathbb{R}) \to L^\infty(\mathbb{R})$

«How difficult is it to learn a mapping?»

1 Framework for this thesis

2 "Parametrize": LTI systems case

3 "Parametrize": Volterra series

4 Generalize classical techniques

Metric entropy

« How difficult is it to learn a set of objects C? »

• Learning theory tool: ε-covering number

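$$N_\varepsilon(C; \|\cdot\|) = \min\Big\{ n;\ \exists (p_1, \dots, p_n) \subset C \text{ s.t. } C \subset \bigcup_{i=1}^{n} B_{\|\cdot\|}(p_i, \varepsilon) \Big\}$$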

An ε-covering
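The covering-number definition can be made concrete on a finite point set. The following is a minimal sketch, assuming a hypothetical random cloud in the unit square as a stand-in for C and the Euclidean norm; the greedy rule is an illustrative choice, so the number of centers it returns is only an upper bound on the covering number of the cloud, not its exact value.

```python
import numpy as np

def greedy_cover(points, eps):
    """Greedily pick centers from `points` until every point is within eps of a center."""
    points = np.asarray(points, dtype=float)
    uncovered = np.ones(len(points), dtype=bool)
    centers = []
    while uncovered.any():
        idx = int(np.flatnonzero(uncovered)[0])   # first uncovered point becomes a center
        centers.append(points[idx])
        dist = np.linalg.norm(points - points[idx], axis=1)
        uncovered &= dist > eps                   # everything within eps is now covered
    return np.array(centers)

rng = np.random.default_rng(0)
cloud = rng.uniform(0.0, 1.0, size=(500, 2))      # hypothetical finite stand-in for C
centers = greedy_cover(cloud, eps=0.25)

# Distance from each point to its nearest center (at most eps by construction)
gaps = np.linalg.norm(cloud[:, None, :] - centers[None, :, :], axis=2).min(axis=1)
```

Since the centers are taken from the cloud itself, `len(centers)` upper-bounds the ε-covering number of the cloud with centers restricted to the set, as in the definition above.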

Metric entropy

ε-covering number: $N_\varepsilon(C; \|\cdot\|)$

Metric entropy: $\log_2 N_\varepsilon(C; \|\cdot\|)$

Why metric entropy? Intuition:

Proposition

For any "bitstring length" $\ell \in \mathbb{N}$, consider an encoder/decoder scheme

$$E : C \to \{0, 1\}^\ell, \qquad D : \{0, 1\}^\ell \to C$$

$\log_2 N_\varepsilon(C; \|\cdot\|)$, rounded up to an integer, is the minimum $\ell$ s.t.

$$\inf_{E, D}\ \sup_{c \in C} \|c - D(E(c))\| \le \varepsilon$$

("best-obtainable worst-case error")

→ quantifies "massiveness"

→ fundamental bound on compressibility / learnability
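The encoder/decoder reading of the proposition can be illustrated on a toy set. This is a sketch under stated assumptions: C = [0, 1] with the absolute value as norm, a uniform grid as the covering, and the hypothetical helper name `make_codec`; none of these come from the slides.

```python
import math

def make_codec(eps):
    """Encoder/decoder for C = [0, 1] with |.|, built from an eps-covering by a uniform grid."""
    n = math.ceil(1.0 / (2.0 * eps))                      # number of cells of half-width <= eps
    centers = [(2 * i + 1) / (2 * n) for i in range(n)]   # midpoints of n equal cells
    bits = max(1, math.ceil(math.log2(n)))                # bitstring length ell

    def encode(c):
        i = min(int(c * n), n - 1)                        # index of the cell containing c
        return format(i, f"0{bits}b")

    def decode(word):
        return centers[int(word, 2)]

    return encode, decode, bits

encode, decode, bits = make_codec(eps=0.05)

# Worst-case reconstruction error over a fine grid of test points in C
worst = max(abs(c - decode(encode(c))) for c in [k / 1000 for k in range(1001)])
```

The encoder sends an element to the index of its covering ball and the decoder returns the ball's center, so the worst-case error stays within ε while using about log2 of the number of balls many bits.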

Entropy numbers

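$N_\varepsilon(C; \|\cdot\|)$: ε-covering number, a map $\mathbb{R}_+^* \to \mathbb{N}$

$\varepsilon_n(C; \|\cdot\|)$: n-th entropy number, a map $\mathbb{N} \to \mathbb{R}_+^*$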

"Metric entropy" ≡ "Entropy number"

Framework

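$\mathcal{X}$: space of input signals $x(t)$ (e.g. $L^2(\mathbb{R})$, $C(\mathbb{R})$, ...)

$\mathcal{Y}$: space of output signals $y(t)$

$\mathcal{S}$: space of systems $S$

Variants

1 Worst-case error

$$\mathcal{S} = \{S : \mathcal{X} \to \mathcal{Y}\}, \qquad \|S - \hat{S}\|_{\infty} = \sup_{x} \|S[x] - \hat{S}[x]\|_{\mathcal{Y}}$$

2 Worst-case error over a subset $U \subset \mathcal{X}$

$$\mathcal{S} = \{S : \mathcal{X} \to \mathcal{Y}\}, \qquad \|S - \hat{S}\|_{\infty U} = \sup_{x \in U} \|S[x] - \hat{S}[x]\|_{\mathcal{Y}}$$

3 Average error over a distribution

$$\mathcal{S} = \{S : (\mathcal{X}, \Sigma, P) \to \mathcal{Y}\}, \qquad \|S - \hat{S}\|_{L^1_P} = \mathbb{E}_{x}\, \|S[x] - \hat{S}[x]\|_{\mathcal{Y}}$$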

Framework

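$\mathcal{X}$: space of input signals $x(t)$ (e.g. $L^2(\mathbb{R})$, $C(\mathbb{R})$, ...)

$\mathcal{Y}$: space of output signals $y(t)$

$\mathcal{S}$: space of systems $S$

1 Worst-case error

$$\mathcal{S} = \{S : \mathcal{X} \to \mathcal{Y}\}, \qquad \|S\|_{\infty} = \sup_{x} \|S[x]\|_{\mathcal{Y}}$$

2 Worst-case error over a subset $U \subset \mathcal{X}$

$$\mathcal{S} = \{S : \mathcal{X} \to \mathcal{Y}\}, \qquad \|S\|_{\infty U} = \sup_{x \in U} \|S[x]\|_{\mathcal{Y}}$$

3 Average error over a distribution

$$\mathcal{S} = \{S : (\mathcal{X}, \Sigma, P) \to \mathcal{Y}\}, \qquad \|S\|_{L^1_P} = \mathbb{E}_{x}\, \|S[x]\|_{\mathcal{Y}}$$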
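The worst-case and average error variants can be compared numerically. The sketch below assumes two hypothetical systems on length-64 discrete signals (a static tanh nonlinearity and its linearization), the discrete sup norm on outputs, and uniformly sampled inputs; a maximum over samples only gives a lower estimate of the true supremum over U.

```python
import numpy as np

def sup_and_mean_error(S, S_hat, inputs):
    """Errors ||S[x] - S_hat[x]||_Y over sampled inputs, with Y carrying the discrete sup norm."""
    errs = np.array([np.max(np.abs(S(x) - S_hat(x))) for x in inputs])
    return errs.max(), errs.mean()

# Hypothetical systems (not from the slides)
S = lambda x: np.tanh(x)        # static nonlinearity
S_hat = lambda x: x             # its linearization around 0

rng = np.random.default_rng(1)
U_samples = rng.uniform(-0.5, 0.5, size=(200, 64))   # samples from U = {x : |x(t)| <= 0.5}

# Sampled analogues of the sup-over-U error and the average error under P
worst_case, average = sup_and_mean_error(S, S_hat, U_samples)
```

Here the average error is never larger than the sampled worst-case error, which is why a covering in the worst-case norm is the more demanding requirement.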

Framework

Worst-case error over a subset U ⊂ X :

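$$\mathcal{S} = \{S : \mathcal{X} \to \mathcal{Y}\}, \qquad \|S\|_{\infty U} = \sup_{x \in U} \|S[x]\|_{\mathcal{Y}}$$

Goal: given $\mathcal{S}_+ \subset \mathcal{S}$, estimate $N_\varepsilon(\mathcal{S}_+; \|\cdot\|_{\infty U})$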

LTI systems

LTI = Linear Time-Invariant

$$S(\lambda x_1 + x_2) = \lambda S x_1 + S x_2$$

Convolution representation

Theorem (Schwartz kernel theorem)

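For all LTI $S : L^2(\mathbb{R}) \to L^2(\mathbb{R})$, there exists $k \in \mathcal{D}'(\mathbb{R})$ s.t.

$$Sx(t) = \int_{\mathbb{R}} d\tau\, k(t - \tau)\, x(\tau) = (k * x)(t)$$

Proof:

$$Sx(t) = S \int_{\mathbb{R}} d\tau\, x(\tau)\, \delta_\tau(t) = \int_{\mathbb{R}} d\tau\, x(\tau)\, \underbrace{S\delta_\tau(t)}_{=:\, k(t, \tau)} = \int_{\mathbb{R}} d\tau\, x(\tau)\, k(t, \tau)$$

Time-invariance $\implies k(t, \tau) = k(t - \tau)$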
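A discrete-time analogue makes the convolution representation easy to check. This is a sketch assuming finite sequences and a hypothetical three-tap impulse response; it verifies linearity, time invariance, and that the response to a unit impulse recovers the kernel, mirroring the steps of the proof.

```python
import numpy as np

def lti(k, x):
    """Discrete analogue of (k * x)(t): full linear convolution of kernel k with signal x."""
    return np.convolve(k, x)

k = np.array([0.5, 0.3, 0.2])                 # hypothetical impulse response
x1 = np.array([1.0, -2.0, 0.0, 4.0])
x2 = np.array([0.5, 0.5, -1.0, 3.0])
lam = 2.5

# Linearity: S(lam*x1 + x2) = lam*S(x1) + S(x2)
lin_gap = np.max(np.abs(lti(k, lam * x1 + x2) - (lam * lti(k, x1) + lti(k, x2))))

# Time invariance: delaying the input by one sample delays the output by one sample
delayed_in = np.concatenate(([0.0], x1))
delayed_out = np.concatenate(([0.0], lti(k, x1)))
shift_gap = np.max(np.abs(lti(k, delayed_in) - delayed_out))

# The response to a unit impulse recovers the kernel: S[delta] = k
impulse = np.array([1.0, 0.0, 0.0])
recovered = lti(k, impulse)[: len(k)]
```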

Convolution representation and norm

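$$\mathcal{S} = \{S : L^2(\mathbb{R}) \to L^2(\mathbb{R}) \text{ LTI}\} = \{L_k : x \mapsto (k * x),\ k \in \mathcal{K}\} \simeq \mathcal{K} \qquad (\mathcal{K} = \mathcal{D}'(\mathbb{R}))$$

$$\mathcal{S}_+ = \{L_k : x \mapsto (k * x),\ k \in \mathcal{K}_+\} \simeq \mathcal{K}_+$$

Want:

$$\|L_k\|_{\infty U} = \|k\|_{\mathcal{K}} \implies N_\varepsilon(\mathcal{S}_+; \|\cdot\|_{\infty U}) = N_\varepsilon(\mathcal{K}_+; \|\cdot\|_{\mathcal{K}})$$

Theorem

Suppose $U = B(L^2(\mathbb{R}))$, the unit ball of $L^2(\mathbb{R})$.

$$\forall k \in L^1(\mathbb{R}), \qquad \|L_k\|_{\infty U} = |||L_k||| = \|\hat{k}\|_{L^\infty(\mathbb{R})}$$

where $\hat{k}$ denotes the Fourier transform of $k$.

Conclusion: reduced to metric entropy in function space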
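The norm identity has a finite-dimensional counterpart that can be checked directly. The sketch below is a discrete analogue only, assuming periodic signals of length n, circular convolution, and a hypothetical random kernel: the operator norm of the convolution matrix on the Euclidean unit ball equals the largest modulus of the discrete Fourier transform of the kernel.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 32
k = rng.normal(size=n)                                   # hypothetical kernel on Z/nZ

# Matrix of x -> circular convolution with k: column j is k cyclically shifted by j
L_k = np.column_stack([np.roll(k, j) for j in range(n)])

# Sanity check of the matrix against FFT-based circular convolution
x = rng.normal(size=n)
circ = np.real(np.fft.ifft(np.fft.fft(k) * np.fft.fft(x)))

op_norm = np.linalg.norm(L_k, 2)                         # sup of ||L_k x|| over the unit ball
fourier_sup = np.max(np.abs(np.fft.fft(k)))              # sup norm of the DFT of k
```

The two quantities agree because a circulant matrix is diagonalized by the discrete Fourier basis, which is the finite-dimensional version of the argument behind the theorem.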

Volterra series

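LTI system: kernel $k(t)$

$$(k * x)(t) = \int_{\mathbb{R}} d\tau\, k(\tau)\, x(t - \tau)$$

Volterra series: sum of monomials, $k = (k_0, k_1, \dots)$ with kernels $k_n(t_1, \dots, t_n)$

$$V_k[x](t) = \sum_{n=0}^{\infty} \int_{\mathbb{R}^n} d\tau\, k_n(\tau)\, x(t - \tau_1) \cdots x(t - \tau_n)$$

Rk: Taylor series of $f : \mathbb{R}^d \to \mathbb{R}$,

$$f(x) = \sum_{n=0}^{\infty} \sum_{i \in \{1, \dots, d\}^n} a_i\, x_{i_1} \cdots x_{i_n}$$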
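A truncated, discrete-time version of the Volterra series shows how the monomials combine. This is a minimal sketch assuming order 2, finite memory, and zero input before time 0; the function name `volterra2` and all kernels are hypothetical.

```python
import numpy as np

def volterra2(k0, k1, k2, x):
    """Order-2 discrete Volterra series with memory M = len(k1); x is zero before time 0."""
    M = len(k1)
    xp = np.concatenate((np.zeros(M - 1), x))        # zero-pad the past
    y = np.empty(len(x))
    for t in range(len(x)):
        window = xp[t : t + M][::-1]                 # (x[t], x[t-1], ..., x[t-M+1])
        # constant term + linear (convolution) term + quadratic term
        y[t] = k0 + k1 @ window + window @ k2 @ window
    return y

k1 = np.array([0.5, 0.3, 0.2])                       # hypothetical first-order kernel
x = np.array([1.0, -2.0, 0.0, 4.0, 1.5])

# With k0 = 0 and k2 = 0 the series reduces to the LTI convolution
linear_only = volterra2(0.0, k1, np.zeros((3, 3)), x)

# A purely quadratic monomial is homogeneous of degree 2 in the input
quad_only = lambda s: volterra2(0.0, np.zeros(3), 0.1 * np.eye(3), s)
```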

Volterra series

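$$V_k[x](t) = \sum_{n=0}^{\infty} \int_{\mathbb{R}^n} d\tau\, k_n(\tau)\, x(t - \tau_1) \cdots x(t - \tau_n)$$

Where do $x$, $k$, $V_k[x]$ live? i.e. $\mathcal{X}$, $\mathcal{K}$, $\mathcal{Y}$?

Theorem

$$\|V_k[x]\|_{\mathcal{Y}} \le \sum_{n=0}^{\infty} \|k_n\|_{\mathcal{K}_n} \|x\|_{\mathcal{X}}^{n}$$

for $\mathcal{X} = L^p(\mathbb{R})$, $\mathcal{K}_n = L^q(\mathbb{R}^n)$, $\mathcal{Y} = L^\infty(\mathbb{R})$ ($1/p + 1/q = 1$)

or $\mathcal{X} = C_b(\mathbb{R})$, $\mathcal{K}_n = \mathrm{ba}(\mathbb{R}^n)$, $\mathcal{Y} = C_b(\mathbb{R})$

(Convergence of $\sum_{n=0}^{\infty}$?)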
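For the first choice of spaces, the bound on a single monomial follows from Hölder's inequality. The derivation below is a sketch for the case $1 \le p < \infty$, with $n \ge 1$ and $t$ fixed; the $n = 0$ term is simply the constant $|k_0|$.

```latex
% Case X = L^p(R), K_n = L^q(R^n), 1/p + 1/q = 1, with 1 <= p < infinity
\begin{align*}
\Big| \int_{\mathbb{R}^n} d\tau\, k_n(\tau)\, x(t-\tau_1) \cdots x(t-\tau_n) \Big|
  &\le \|k_n\|_{L^q(\mathbb{R}^n)}
       \Big( \int_{\mathbb{R}^n} d\tau\, \prod_{i=1}^{n} |x(t-\tau_i)|^p \Big)^{1/p}
       && \text{(H\"older)} \\
  &= \|k_n\|_{L^q(\mathbb{R}^n)}
       \prod_{i=1}^{n} \Big( \int_{\mathbb{R}} d\tau_i\, |x(t-\tau_i)|^p \Big)^{1/p}
       && \text{(Fubini)} \\
  &= \|k_n\|_{L^q(\mathbb{R}^n)}\, \|x\|_{L^p(\mathbb{R})}^{n}
       && \text{(translation invariance)}
\end{align*}
```

Taking the supremum over $t$ and summing over $n$ gives the stated bound with $\mathcal{Y} = L^\infty(\mathbb{R})$.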

Volterra series

For simplicity assume finite order N

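$$V_k[x](t) = \sum_{n=0}^{N} \int_{\mathbb{R}^n} d\tau\, k_n(\tau)\, x(t - \tau_1) \cdots x(t - \tau_n)$$

$$\|V_k[x]\|_{\mathcal{Y}} \le \sum_{n=0}^{N} \|k_n\|_{\mathcal{K}_n} \|x\|_{\mathcal{X}}^{n} \le \mathrm{cst}(N, U)\, \|k\|_{\mathcal{K}} \qquad (\mathcal{K} = \mathcal{K}_0 \times \mathcal{K}_1 \times \dots)$$

$$N_{\mathrm{cst}(N, U) \cdot \varepsilon}(\mathcal{S}_+; \|\cdot\|_{\infty U}) \le N_\varepsilon(\mathcal{K}_+; \|\cdot\|_{\mathcal{K}}) \qquad (\mathcal{S}_+ = \{V_k;\ k \in \mathcal{K}_+\})$$

Conclusion: again reduced to metric entropy in function space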
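The covering-number inequality follows from the norm bound because the map $k \mapsto V_k$ is linear. The steps can be sketched as follows; the explicit constant in the last display is an illustrative assumption (norm on $\mathcal{K}$ taken as the maximum of the component norms, and $U$ contained in a ball of radius $r$), not a choice stated on the slides.

```latex
% k -> V_k is linear, so V_k - V_{k'} = V_{k - k'}
\begin{align*}
\|V_k - V_{k'}\|_{\infty U}
  = \sup_{x \in U} \|V_{k-k'}[x]\|_{\mathcal{Y}}
  \le \mathrm{cst}(N, U)\, \|k - k'\|_{\mathcal{K}} .
\end{align*}
% Hence if k^1, ..., k^m is an eps-covering of K_+ in ||.||_K, then
% V_{k^1}, ..., V_{k^m} is a (cst * eps)-covering of S_+ in ||.||_{infty U}.

% Illustrative constant (assumption): ||k||_K := max_n ||k_n||_{K_n}
% and sup_{x in U} ||x||_X <= r
\begin{align*}
\sum_{n=0}^{N} \|k_n\|_{\mathcal{K}_n} \|x\|_{\mathcal{X}}^{n}
  \le \Big( \sum_{n=0}^{N} r^{n} \Big) \max_{0 \le n \le N} \|k_n\|_{\mathcal{K}_n},
\qquad \text{so } \mathrm{cst}(N, U) = \sum_{n=0}^{N} r^{n} \text{ works.}
\end{align*}
```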

"Parametrize" path

Summary: if

$$\mathcal{S} = \{\Phi_k,\ k \in \mathcal{K}\}, \qquad \mathcal{S}_+ = \{\Phi_k,\ k \in \mathcal{K}_+\}, \qquad \|\Phi_k\|_{\infty U} \le c\, \|k\|_{\mathcal{K}}$$

then the problem reduces to metric entropy in the parameter space $\mathcal{K}_+$.

Problems:

Upper estimate may be too loose

Given $\mathcal{S}_+$ specified differently, how to find $\mathcal{K}_+$?

$$\mathcal{S}_+ = \{[i(t) \mapsto u(t)];\ R \in [R_{\min}, R_{\max}],\ L \in [L_{\min}, L_{\max}]\}, \qquad \mathcal{K}_+ = \ ??$$

Generalize classical techniques

Different approach: adapt techniques from the case $\mathcal{F}_+ \subset \mathcal{F} = \{f : \mathbb{R}^d \to \mathbb{R}\}$

Illustrate the "sample and quantize" technique

Proof of metric entropy estimate for the set of Lipschitz-continuous functions $f : \mathbb{R}^d \to \mathbb{R}$

(Lipschitz)-continuous functions

Lipschitz-continuous functions over a compact interval

$$\mathcal{F}_+ \subset \{f : [0, 1] \to \mathbb{R};\ \forall t, t',\ |f(t) - f(t')| \le L\, |t - t'|\}$$
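The "sample and quantize" technique can be illustrated on a single Lipschitz function. This is a sketch under assumptions not taken from the slides: samples at cell midpoints, rounding to multiples of a step `eta`, a piecewise-constant decoder, and a hypothetical 1-Lipschitz test function.

```python
import numpy as np

def sample_and_quantize(f, m, eta):
    """Describe f on [0, 1] by m integers: samples at cell midpoints, rounded to multiples of eta."""
    mids = (np.arange(m) + 0.5) / m
    return np.round(f(mids) / eta).astype(int)

def reconstruct(codes, eta, t):
    """Piecewise-constant decoder: use the quantized sample of the cell containing t."""
    m = len(codes)
    cell = np.minimum((np.asarray(t) * m).astype(int), m - 1)
    return codes[cell] * eta

L = 1.0
f = lambda t: np.abs(t - 0.3)          # hypothetical 1-Lipschitz test function
m, eta = 50, 0.01
codes = sample_and_quantize(f, m, eta)

t = np.linspace(0.0, 1.0, 2001)
err = np.max(np.abs(f(t) - reconstruct(codes, eta, t)))
bound = L / (2 * m) + eta / 2          # sampling error + rounding error
```

Every function in the class is described by a finite integer sequence with uniform error at most `bound`, so counting the admissible sequences gives an upper estimate of the covering number.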

(Lipschitz)-continuous systems

Lipschitz-continuous system over a compact metric space

$$\mathcal{S}_+ \subset \{S : (U, d) \to (\mathcal{Y}, \|\cdot\|_{\mathcal{Y}});\ \forall x, x',\ \|S[x] - S[x']\|_{\mathcal{Y}} \le L\, d(x, x')\}$$

Quantize the output set $\{S[x];\ S \in \mathcal{S}_+,\ x \in U\}$?

Theorem (Banach-valued Arzelà-Ascoli)

$\mathcal{S}_+$ is relatively compact in $C(U; \mathcal{Y})$ iff

$\mathcal{S}_+$ is equicontinuous ($\Leftarrow$ $L$-Lipschitz), and

$\mathcal{S}_+$ is "equicompact": $\{S[x];\ S \in \mathcal{S}_+,\ x \in U\}$ is relatively compact

Informally: $\mathcal{S}_+$ can be quantized iff it is equicontinuous and its output set $\{S[x];\ S \in \mathcal{S}_+,\ x \in U\}$ can be quantized

Conclusion

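Framework: want $N_\varepsilon(\mathcal{S}_+; \|\cdot\|_{\infty U})$ where

$$\mathcal{S}_+ \subset \mathcal{S} = \{S : \mathcal{X} \to \mathcal{Y}\}, \qquad \|S - \hat{S}\|_{\infty U} = \sup_{x \in U} \|S[x] - \hat{S}[x]\|_{\mathcal{Y}}$$

Two paths:

"Parametric": $\mathcal{S}_+ = \{\Phi_k,\ k \in \mathcal{K}_+\}$, $N_\varepsilon(\mathcal{S}_+; \|\cdot\|_{\infty U}) \simeq N_\varepsilon(\mathcal{K}_+; \|\cdot\|_{\mathcal{K}})$

Generalize classical techniques: from $f : \mathbb{R}^d \to \mathbb{R}$ to $S : \mathcal{X} \to \mathcal{Y}$

Direction for future work: "non-parametric" path?

Kernel methods for nonlinear system identification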

Illustrations adapted from

"Nonlinear system modeling based on the Wiener theory" (Schetzen 1981)

https://commons.wikimedia.org/wiki/File:Circuit_L_R_parall%C3%A8le_-_courant_en_entr%C3%A9e_et_tension_en_sortie.png

https://xkcd.com/730/

https://towardsdatascience.com/8a3fbfdc5e9b

"'Zero-Shot' Super-Resolution using Deep Internal Learning" (Shocher et al. 2017)

Neural Network Theory lecture notes HS2019 ETHZ

Introduction to Digital Communications, Chapter 3 (Grami 2016)

"ε-Entropy and ε-Capacity of Sets In Functional Spaces" (Kolmogorov and Tikhomirov 1959)

Appendix

5 Volterra series as elements of a polynomial RKBS

6 Misc


The idea

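Time-invariant system → scalar-valued functional (cf. report)

Volterra monomial with fixed $n$

$$V_{k_n}[x](t) = \int_{\mathbb{R}^n} d\tau\, k_n(\tau)\, x(t - \tau_1) \cdots x(t - \tau_n)$$

$$F_{\theta_n}[x] = \int_{\mathbb{R}^n} d\tau\, \theta_n(\tau)\, x(\tau_1) \cdots x(\tau_n) = \int_{\mathbb{R}^n} d\tau\, \theta_n(\tau)\, x^{\times n}(\tau)$$

Idea: view as linear combination of feature map

$$F_{\theta_n}[x] = \int_{\mathbb{R}^n} d\tau\, \theta_n(\tau)\, \varphi(x)(\tau) = \langle \varphi(x), \theta_n \rangle$$

$x \in L^p(\mathbb{R}) \mapsto \varphi(x) = x^{\times n} \in L^p_{\mathrm{Sym}}(\mathbb{R}^n)$

So $F_{\theta_n} \in$ RKHS with

$$K(x, \tilde{x}) = \langle \varphi(x), \varphi(\tilde{x}) \rangle = \Big( \int_{\mathbb{R}} x\, \tilde{x} \Big)^{n}$$

Problem: ill-defined when $x, \tilde{x} \in L^p(\mathbb{R})$...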
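In finite dimensions the feature-map identity is well defined and can be checked numerically. The sketch below assumes vectors in R^5 in place of signals, n = 3, and hypothetical random data: the inner product of the n-fold tensor powers equals the n-th power of the plain inner product, and a monomial is a linear functional of the feature map.

```python
import numpy as np
from functools import reduce

def feature_map(x, n):
    """n-fold tensor power of x: entry (i1, ..., in) equals x[i1] * ... * x[in]."""
    return reduce(np.multiply.outer, [x] * n)

rng = np.random.default_rng(3)
x, x_tilde = rng.normal(size=5), rng.normal(size=5)
n = 3

lhs = np.sum(feature_map(x, n) * feature_map(x_tilde, n))   # <phi(x), phi(x_tilde)>
rhs = np.dot(x, x_tilde) ** n                               # (sum_i x_i * x_tilde_i)^n

theta = rng.normal(size=(5,) * n)                           # hypothetical coefficient tensor
F = np.sum(theta * feature_map(x, n))                       # F_theta[x] = <phi(x), theta>
F_direct = np.einsum("ijk,i,j,k->", theta, x, x, x)         # the same monomial, written out
```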

RKBS = Reproducing Kernel Banach Space

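Definition (Lin et al. 2019)

Pair of RKBS: a tuple $(\mathcal{B}_1, \mathcal{B}_2, \langle \cdot, \cdot \rangle_{\mathcal{B}_1 \times \mathcal{B}_2})$ s.t.

$\mathcal{B}_i$ is a Banach space of (real-valued) functions on $\Omega_i$

$\langle \cdot, \cdot \rangle_{\mathcal{B}_1 \times \mathcal{B}_2}$ is a continuous bilinear form on $\mathcal{B}_1 \times \mathcal{B}_2$

$\exists K : \Omega_1 \times \Omega_2 \to \mathbb{R}$, called reproducing kernel, s.t.

$$K(x, \cdot) \in \mathcal{B}_2, \qquad \forall f \in \mathcal{B}_1,\ f(x) = \langle f, K(x, \cdot) \rangle_{\mathcal{B}_1 \times \mathcal{B}_2}$$

$$K(\cdot, y) \in \mathcal{B}_1, \qquad \forall g \in \mathcal{B}_2,\ g(y) = \langle K(\cdot, y), g \rangle_{\mathcal{B}_1 \times \mathcal{B}_2}$$

($K$ is unique)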

RKBS from feature maps

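Proposition (Lin et al. 2019)

$\varphi_i : \Omega_i \to \mathcal{W}_i$ Banach ($i = 1, 2$)

$\langle \cdot, \cdot \rangle_{\mathcal{W}_1 \times \mathcal{W}_2}$ continuous bilinear form on $\mathcal{W}_1 \times \mathcal{W}_2$

"span $\varphi_i(\Omega_i)$ are dense":

$$\{v \in \mathcal{W}_2;\ \forall x \in \Omega_1,\ \langle \varphi_1(x), v \rangle_{\mathcal{W}_1 \times \mathcal{W}_2} = 0\} = \{0\}$$

$$\{u \in \mathcal{W}_1;\ \forall x \in \Omega_2,\ \langle u, \varphi_2(x) \rangle_{\mathcal{W}_1 \times \mathcal{W}_2} = 0\} = \{0\}$$

This induces a pair of RKBS

$$\mathcal{B}_1 := \{F_v = \langle \varphi_1(\cdot), v \rangle_{\mathcal{W}_1 \times \mathcal{W}_2};\ v \in \mathcal{W}_2\}, \qquad \|F_v\|_{\mathcal{B}_1} := \|v\|_{\mathcal{W}_2}$$

$$\mathcal{B}_2 := \{G_u = \langle u, \varphi_2(\cdot) \rangle_{\mathcal{W}_1 \times \mathcal{W}_2};\ u \in \mathcal{W}_1\}, \qquad \|G_u\|_{\mathcal{B}_2} := \|u\|_{\mathcal{W}_1}$$

$$\langle F_v, G_u \rangle_{\mathcal{B}_1 \times \mathcal{B}_2} := \langle u, v \rangle_{\mathcal{W}_1 \times \mathcal{W}_2}$$

and $K(x, \tilde{x}) = \langle \varphi_1(x), \varphi_2(\tilde{x}) \rangle_{\mathcal{W}_1 \times \mathcal{W}_2}$.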

Volterra series as polynomial RKBS

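Proposition

$$\varphi_1 : L^p(\mathbb{R}) \to L^p_{\mathrm{Sym}}(\mathbb{R}^n), \qquad x(t) \mapsto x^{\times n}(t)$$

$$\varphi_2 : L^q(\mathbb{R}) \to L^q_{\mathrm{Sym}}(\mathbb{R}^n), \qquad x(t) \mapsto x^{\times n}(t)$$

$$\langle u_n, \tilde{u}_n \rangle_{L^p} = \int_{\mathbb{R}^n} dt\, u_n(t)\, \tilde{u}_n(t) \qquad (\text{Rk: } \equiv \text{ duality bracket})$$

"span $\varphi_i(\Omega_i)$ are dense" (cf. report):

$$\{u_n \in L^q_{\mathrm{Sym}}(\mathbb{R}^n);\ \forall x \in L^p(\mathbb{R}),\ \langle x^{\times n}, u_n \rangle_{L^p} = 0\} = \{0\}$$

$$\{u_n \in L^p_{\mathrm{Sym}}(\mathbb{R}^n);\ \forall x \in L^q(\mathbb{R}),\ \langle u_n, x^{\times n} \rangle_{L^p} = 0\} = \{0\}$$

This induces $(\mathcal{B}_1, \mathcal{B}_2, \langle \cdot, \cdot \rangle_{\mathcal{B}_1 \times \mathcal{B}_2})$ and

$$\mathcal{B}_1 = \Big\{ F_{\theta_n} : \Big[ x \mapsto \int_{\mathbb{R}^n} d\tau\, \theta_n(\tau)\, x^{\times n}(\tau) \Big];\ \theta_n \in L^q_{\mathrm{Sym}}(\mathbb{R}^n) \Big\}$$

$\|F_{\theta_n}\|_{\mathcal{B}_1} = \|\theta_n\|_{L^q_{\mathrm{Sym}}(\mathbb{R}^n)}$ and $K(x, \tilde{x}) = \big( \int_{\mathbb{R}} x\, \tilde{x} \big)^{n}$.

→ $\mathcal{B}_1$ exactly describes Volterra monomials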

Volterra monomials are exactly elements of a RKBS

Volterra series of finite order: idem (just concatenate)

Volterra series of infinite order? → yes, cf. report

$$K(x, \tilde{x}) = \sum_{n=0}^{\infty} \Big( \int_{\mathbb{R}} x\, \tilde{x} \Big)^{n}$$

Possible future directions:

Learning in RKBS vs. classical Volterra-series-based system identification?

What if we want $K(x, \tilde{x}) = \sum_{n=0}^{\infty} a_n \big( \int_{\mathbb{R}} x\, \tilde{x} \big)^{n}$ for some $a_n \ge 0$?
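Both kernel series have familiar closed forms once the pairing is abbreviated. The following is a short worked computation, assuming the pairing $s = \int_{\mathbb{R}} x\, \tilde{x}$ is finite; the choice $a_n = 1/n!$ is only an illustrative example of nonnegative coefficients, not one proposed on the slides.

```latex
% Abbreviate s := \int_R x \tilde{x}
\begin{align*}
\sum_{n=0}^{\infty} s^{n} &= \frac{1}{1 - s}
  && \text{(geometric series, converges iff } |s| < 1\text{)} \\
\sum_{n=0}^{\infty} \frac{s^{n}}{n!} &= e^{s}
  && \text{(illustrative choice } a_n = 1/n!\text{, converges for all } s\text{)}
\end{align*}
```

The first line shows that the unweighted infinite-order kernel only makes sense for pairs of inputs with $|s| < 1$, while decaying coefficients $a_n$ can remove that restriction.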

5 Volterra series as elements of a polynomial RKBS

6 Misc

Entropy numbers

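$$N_\varepsilon(C; \|\cdot\|) = \min\Big\{ n;\ \exists (p_1, \dots, p_n) \subset C \text{ s.t. } C \subset \bigcup_{i} B_{\|\cdot\|}(p_i, \varepsilon) \Big\}$$

$$\varepsilon_n(C; \|\cdot\|) = \inf\Big\{ \varepsilon;\ \exists (p_1, \dots, p_n) \subset C \text{ s.t. } C \subset \bigcup_{i} B_{\|\cdot\|}(p_i, \varepsilon) \Big\}$$

$$N_{\varepsilon_0}(C; \|\cdot\|) \le n_0 \iff \varepsilon_{n_0}(C; \|\cdot\|) \le \varepsilon_0$$

"Metric entropy" ≡ "Entropy number"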
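A one-dimensional example shows how the two quantities mirror each other. It is a sketch assuming $C = [0, 1]$ with the absolute value and closed balls of radius $\varepsilon$ centered in $C$, each of which covers a segment of length at most $2\varepsilon$.

```latex
% Assumed example: C = [0, 1], norm |.|, closed balls centered in C
\begin{align*}
N_\varepsilon([0,1]; |\cdot|) &= \Big\lceil \frac{1}{2\varepsilon} \Big\rceil,
&
\varepsilon_n([0,1]; |\cdot|) &= \frac{1}{2n}, \\
N_{\varepsilon_0} \le n_0
\iff \frac{1}{2\varepsilon_0} \le n_0
&\iff \varepsilon_{n_0} = \frac{1}{2 n_0} \le \varepsilon_0 .
\end{align*}
```

In this example the metric entropy $\log_2 N_\varepsilon$ grows like $\log_2(1/\varepsilon)$, and the entropy numbers decay like $1/n$: the same information read along the two axes.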
