Entropy numbers of nonlinear systems - Master's thesis … · 2021. 7. 26. · Entropy numbers of...
Transcript of Entropy numbers of nonlinear systems - Master's thesis … · 2021. 7. 26. · Entropy numbers of...
Entropy numbers of nonlinear systemsMaster’s thesis presentation
Guillaume Wang
ETHZ MINS
March 15, 2021
Framework LTI case Volterra series Generalize classical
Motivation
"What is a nonlinear system?"
Goal: learn S from observations (xi , yi )
2
Framework LTI case Volterra series Generalize classical
Motivation
"What is a nonlinear system?"
Goal: learn S from observations (xi , yi )
2
Framework LTI case Volterra series Generalize classical
MotivationSystem identification
Input: i(t), output: u(t). u(t) = S [ i(t) ]?
3
Framework LTI case Volterra series Generalize classical
MotivationSystem identification
Input: i(t), output: u(t). u(t) = S [ i(t) ]?
3
Framework LTI case Volterra series Generalize classical
MotivationSignal-to-signal tasks in ML
Text-to-text
Super-resolution imaging
4
Framework LTI case Volterra series Generalize classical
MotivationSignal-to-signal tasks in ML
Text-to-text
Super-resolution imaging
4
Framework LTI case Volterra series Generalize classical
Motivation
Classical (regression/classification):
learn f : Rd → R or 0, 1
Nonlinear system identification:
learn (e.g) S : L∞(R)→ L∞(R)
«How difficult is it to learn a mapping?»
5
Framework LTI case Volterra series Generalize classical
Motivation
Classical (regression/classification):
learn f : Rd → R or 0, 1
Nonlinear system identification:
learn (e.g) S : L∞(R)→ L∞(R)
«How difficult is it to learn a mapping?»
5
1 Framework for this thesis
2 "Parametrize": LTI systems case
3 "Parametrize": Volterra series
4 Generalize classical techniques
1 Framework for this thesis
2 "Parametrize": LTI systems case
3 "Parametrize": Volterra series
4 Generalize classical techniques
Framework LTI case Volterra series Generalize classical
Metric entropy
« How difficult is it to learn a set of objects C? »
• Learning theory tool: ε-covering number
Nε(C; ‖·‖) = min
n; ∃(p1, ..., pn) ⊂ C s.t C ⊂
⋃i
B‖·‖pi ,ε
An ε-covering
7
Framework LTI case Volterra series Generalize classical
Metric entropy
« How difficult is it to learn a set of objects C? »
• Learning theory tool: ε-covering number
Nε(C; ‖·‖) = min
n; ∃(p1, ..., pn) ⊂ C s.t C ⊂
⋃i
B‖·‖pi ,ε
An ε-covering7
Framework LTI case Volterra series Generalize classical
Metric entropy
ε-covering number Nε(C; ‖·‖)Metric entropy log2 Nε(C; ‖·‖)
Why metric entropy? Intuition:
Proposition
For any "bitstring length" ` ∈ N, consider encoder/decoder scheme
E : C → 0, 1` D : 0, 1` → C
log2 Nε(C; ‖·‖) is the minimum ` s.t
infE ,D
supc∈C‖c − D(E (c))‖ ≤ ε
("best-obtainable worst-case error")
→ quantifies "massiveness"→ fundamental bound on compressibility / learnability
8
Framework LTI case Volterra series Generalize classical
Metric entropy
ε-covering number Nε(C; ‖·‖)Metric entropy log2 Nε(C; ‖·‖)
Why metric entropy? Intuition:
Proposition
For any "bitstring length" ` ∈ N, consider encoder/decoder scheme
E : C → 0, 1` D : 0, 1` → C
log2 Nε(C; ‖·‖) is the minimum ` s.t
infE ,D
supc∈C‖c − D(E (c))‖ ≤ ε
("best-obtainable worst-case error")
→ quantifies "massiveness"→ fundamental bound on compressibility / learnability
8
Framework LTI case Volterra series Generalize classical
Metric entropy
ε-covering number Nε(C; ‖·‖)Metric entropy log2 Nε(C; ‖·‖)
Why metric entropy? Intuition:
Proposition
For any "bitstring length" ` ∈ N, consider encoder/decoder scheme
E : C → 0, 1` D : 0, 1` → C
log2 Nε(C; ‖·‖) is the minimum ` s.t
infE ,D
supc∈C‖c − D(E (c))‖ ≤ ε
("best-obtainable worst-case error")
→ quantifies "massiveness"→ fundamental bound on compressibility / learnability 8
Framework LTI case Volterra series Generalize classical
Entropy numbers
Nε(C; ‖·‖) ε-covering number R∗+ → N
εn(C; ‖·‖) n-th entropy number N→ R∗+
"Metric entropy" ≡ "Entropy number"
9
Framework LTI case Volterra series Generalize classical
Entropy numbers
Nε(C; ‖·‖) ε-covering number R∗+ → Nεn(C; ‖·‖) n-th entropy number N→ R∗+
"Metric entropy" ≡ "Entropy number"
9
Framework LTI case Volterra series Generalize classical
Entropy numbers
Nε(C; ‖·‖) ε-covering number R∗+ → Nεn(C; ‖·‖) n-th entropy number N→ R∗+
"Metric entropy" ≡ "Entropy number"
9
Framework LTI case Volterra series Generalize classical
Framework
X space of input signals x(t) (e.g L2(R),C (R), ...)Y space of output signals y(t)S space of systems S
Variants
1 Worst-case error
S = S : X → Y∥∥∥S − S
∥∥∥∞
= supx
∥∥∥S [x ]− S [x ]∥∥∥Y
2 Worst-case error over a subset U ⊂ XS = S : X → Y
∥∥∥S − S∥∥∥∞U
= supx∈U
∥∥∥S [x ]− S [x ]∥∥∥Y
3 Average error over a distribution
S = S : (X ,Σ,P)→ Y∥∥∥S − S
∥∥∥L1P
= Ex
∥∥∥S [x ]− S [x ]∥∥∥Y
10
Framework LTI case Volterra series Generalize classical
Framework
X space of input signals x(t) (e.g L2(R),C (R), ...)Y space of output signals y(t)S space of systems S
Variants
1 Worst-case error
S = S : X → Y∥∥∥S − S
∥∥∥∞
= supx
∥∥∥S [x ]− S [x ]∥∥∥Y
2 Worst-case error over a subset U ⊂ XS = S : X → Y
∥∥∥S − S∥∥∥∞U
= supx∈U
∥∥∥S [x ]− S [x ]∥∥∥Y
3 Average error over a distribution
S = S : (X ,Σ,P)→ Y∥∥∥S − S
∥∥∥L1P
= Ex
∥∥∥S [x ]− S [x ]∥∥∥Y
10
Framework LTI case Volterra series Generalize classical
Framework
X space of input signals x(t) (e.g L2(R),C (R), ...)Y space of output signals y(t)S space of systems S
Variants1 Worst-case error
S = S : X → Y∥∥∥S − S
∥∥∥∞
= supx
∥∥∥S [x ]− S [x ]∥∥∥Y
2 Worst-case error over a subset U ⊂ XS = S : X → Y
∥∥∥S − S∥∥∥∞U
= supx∈U
∥∥∥S [x ]− S [x ]∥∥∥Y
3 Average error over a distribution
S = S : (X ,Σ,P)→ Y∥∥∥S − S
∥∥∥L1P
= Ex
∥∥∥S [x ]− S [x ]∥∥∥Y
10
Framework LTI case Volterra series Generalize classical
Framework
X space of input signals x(t) (e.g L2(R),C (R), ...)Y space of output signals y(t)S space of systems S
Variants1 Worst-case error
S = S : X → Y∥∥∥S − S
∥∥∥∞
= supx
∥∥∥S [x ]− S [x ]∥∥∥Y
2 Worst-case error over a subset U ⊂ XS = S : X → Y
∥∥∥S − S∥∥∥∞U
= supx∈U
∥∥∥S [x ]− S [x ]∥∥∥Y
3 Average error over a distribution
S = S : (X ,Σ,P)→ Y∥∥∥S − S
∥∥∥L1P
= Ex
∥∥∥S [x ]− S [x ]∥∥∥Y
10
Framework LTI case Volterra series Generalize classical
Framework
X space of input signals x(t) (e.g L2(R),C (R), ...)Y space of output signals y(t)S space of systems S
Variants1 Worst-case error
S = S : X → Y∥∥∥S − S
∥∥∥∞
= supx
∥∥∥S [x ]− S [x ]∥∥∥Y
2 Worst-case error over a subset U ⊂ XS = S : X → Y
∥∥∥S − S∥∥∥∞U
= supx∈U
∥∥∥S [x ]− S [x ]∥∥∥Y
3 Average error over a distribution
S = S : (X ,Σ,P)→ Y∥∥∥S − S
∥∥∥L1P
= Ex
∥∥∥S [x ]− S [x ]∥∥∥Y
10
Framework LTI case Volterra series Generalize classical
Framework
X space of input signals x(t) (e.g L2(R),C (R), ...)Y space of output signals y(t)
S space of systems S
Variants1 Worst-case error
S = S : X → Y ‖S‖∞ = supx‖S [x ]‖Y
2 Worst-case error over a subset U ⊂ X
S = S : X → Y ‖S‖∞U = supx∈U‖S [x ]]‖Y
3 Average error over a distribution
S = S : (X ,Σ,P)→ Y ‖S‖L1P
= Ex ‖S [x ]‖Y
11
Framework LTI case Volterra series Generalize classical
Framework
Worst-case error over a subset U ⊂ X :
S = S : X → Y ‖S‖∞U = supx∈U‖S [x ]]‖Y
Goal:Given S+ ⊂ S, estimate Nε(S+; ‖·‖∞U)
12
Framework LTI case Volterra series Generalize classical
Framework
Worst-case error over a subset U ⊂ X :
S = S : X → Y ‖S‖∞U = supx∈U‖S [x ]]‖Y
Goal:Given S+ ⊂ S, estimate Nε(S+; ‖·‖∞U)
12
1 Framework for this thesis
2 "Parametrize": LTI systems case
3 "Parametrize": Volterra series
4 Generalize classical techniques
Framework LTI case Volterra series Generalize classical
LTI systems
LTI = Linear Time-Invariant
S(λx1 + x2) = λSx1 + Sx2
13
Framework LTI case Volterra series Generalize classical
Convolution representation
Theorem (Schwartz kernel theorem)
For all S : L2(R)→ L2(R) LTI, there exists k ∈ D′(R) s.t
Sx(t) =
∫Rdτ k(t − τ)x(τ) = (k ∗ x)(t)
Proof:
Sx(t) = S
∫Rdτ x(τ) δτ (t)
=
∫Rdτ x(τ) Sδτ (t)︸ ︷︷ ︸ =
∫Rdτ x(τ) k(t, τ)
Time-invariance =⇒ k(t, τ) = k(t − τ)
14
Framework LTI case Volterra series Generalize classical
Convolution representation
Theorem (Schwartz kernel theorem)
For all S : L2(R)→ L2(R) LTI, there exists k ∈ D′(R) s.t
Sx(t) =
∫Rdτ k(t − τ)x(τ) = (k ∗ x)(t)
Proof:
Sx(t) = S
∫Rdτ x(τ) δτ (t)
=
∫Rdτ x(τ) Sδτ (t)︸ ︷︷ ︸ =
∫Rdτ x(τ) k(t, τ)
Time-invariance =⇒ k(t, τ) = k(t − τ)
14
Framework LTI case Volterra series Generalize classical
Convolution representation and norm
S =S : L2(R)→ L2(R) LTI
= Lk : x 7→ (k ∗ x), k ∈ K ' K
S+ = Lk : x 7→ (k ∗ x), k ∈ K+ ' K+
Want
(K = D′(R))
‖Lk‖∞U = ‖k‖K=⇒ Nε(S+; ‖·‖∞U) = Nε(K+; ‖·‖K)
Theorem
Suppose U = B(L2(R)).
∀k ∈ L1(R), ‖Lk‖∞U = |||Lk ||| =∥∥∥k∥∥∥
L∞(R)
Conclusion: reduced to metric entropy in function space ,
15
Framework LTI case Volterra series Generalize classical
Convolution representation and norm
S =S : L2(R)→ L2(R) LTI
= Lk : x 7→ (k ∗ x), k ∈ K ' K
S+ = Lk : x 7→ (k ∗ x), k ∈ K+ ' K+
Want
(K = D′(R))
‖Lk‖∞U = ‖k‖K=⇒ Nε(S+; ‖·‖∞U) = Nε(K+; ‖·‖K)
Theorem
Suppose U = B(L2(R)).
∀k ∈ L1(R), ‖Lk‖∞U = |||Lk ||| =∥∥∥k∥∥∥
L∞(R)
Conclusion: reduced to metric entropy in function space ,
15
Framework LTI case Volterra series Generalize classical
Convolution representation and norm
S =S : L2(R)→ L2(R) LTI
= Lk : x 7→ (k ∗ x), k ∈ K ' K
S+ = Lk : x 7→ (k ∗ x), k ∈ K+ ' K+
Want (K = D′(R))
‖Lk‖∞U = ‖k‖K=⇒ Nε(S+; ‖·‖∞U) = Nε(K+; ‖·‖K)
Theorem
Suppose U = B(L2(R)).
∀k ∈ L1(R), ‖Lk‖∞U = |||Lk ||| =∥∥∥k∥∥∥
L∞(R)
Conclusion: reduced to metric entropy in function space ,
15
Framework LTI case Volterra series Generalize classical
Convolution representation and norm
S =S : L2(R)→ L2(R) LTI
= Lk : x 7→ (k ∗ x), k ∈ K ' K
S+ = Lk : x 7→ (k ∗ x), k ∈ K+ ' K+
Want (K = D′(R))
‖Lk‖∞U = ‖k‖K=⇒ Nε(S+; ‖·‖∞U) = Nε(K+; ‖·‖K)
Theorem
Suppose U = B(L2(R)).
∀k ∈ L1(R), ‖Lk‖∞U = |||Lk ||| =∥∥∥k∥∥∥
L∞(R)
Conclusion: reduced to metric entropy in function space ,
15
Framework LTI case Volterra series Generalize classical
Convolution representation and norm
S =S : L2(R)→ L2(R) LTI
= Lk : x 7→ (k ∗ x), k ∈ K ' K
S+ = Lk : x 7→ (k ∗ x), k ∈ K+ ' K+
Want (K = D′(R))
‖Lk‖∞U = ‖k‖K=⇒ Nε(S+; ‖·‖∞U) = Nε(K+; ‖·‖K)
Theorem
Suppose U = B(L2(R)).
∀k ∈ L1(R), ‖Lk‖∞U = |||Lk ||| =∥∥∥k∥∥∥
L∞(R)
Conclusion: reduced to metric entropy in function space ,
15
Framework LTI case Volterra series Generalize classical
Convolution representation and norm
S =S : L2(R)→ L2(R) LTI
= Lk : x 7→ (k ∗ x), k ∈ K ' K
S+ = Lk : x 7→ (k ∗ x), k ∈ K+ ' K+
Want (K = D′(R))
‖Lk‖∞U = ‖k‖K=⇒ Nε(S+; ‖·‖∞U) = Nε(K+; ‖·‖K)
Theorem
Suppose U = B(L2(R)).
∀k ∈ L1(R), ‖Lk‖∞U = |||Lk ||| =∥∥∥k∥∥∥
L∞(R)
Conclusion: reduced to metric entropy in function space , 15
1 Framework for this thesis
2 "Parametrize": LTI systems case
3 "Parametrize": Volterra series
4 Generalize classical techniques
Framework LTI case Volterra series Generalize classical
Volterra series
LTI system: k(t)
Lkx =
∫Rdτ k(τ) x(t − τ)
Volterra series: sum of monomialsk = (k0, k1, ...)
kn(t1, ..., tn)
Vk [x ] =∞∑n=0
∫Rn
dτ kn(τ ) x(t − τ1)...x(t − τn)
Rk: Taylor series f : Rd → R,
f (x) =∞∑n=0
∑i∈1...dn
ai xi1 ...xin
16
Framework LTI case Volterra series Generalize classical
Volterra series
LTI system: k(t)
Lkx =
∫Rdτ k(τ) x(t − τ)
Volterra series: sum of monomialsk = (k0, k1, ...)
kn(t1, ..., tn)
Vk [x ] =∞∑n=0
∫Rn
dτ kn(τ ) x(t − τ1)...x(t − τn)
Rk: Taylor series f : Rd → R,
f (x) =∞∑n=0
∑i∈1...dn
ai xi1 ...xin
16
Framework LTI case Volterra series Generalize classical
Volterra series
LTI system: k(t)
Lkx =
∫Rdτ k(τ) x(t − τ)
Volterra series: sum of monomialsk = (k0, k1, ...)
kn(t1, ..., tn)
Vk [x ] =∞∑n=0
∫Rn
dτ kn(τ ) x(t − τ1)...x(t − τn)
Rk: Taylor series f : Rd → R,
f (x) =∞∑n=0
∑i∈1...dn
ai xi1 ...xin
16
Framework LTI case Volterra series Generalize classical
Volterra series
LTI system: k(t)
Lkx =
∫Rdτ k(τ) x(t − τ)
Volterra series: sum of monomialsk = (k0, k1, ...)
kn(t1, ..., tn)
Vk [x ] =∞∑n=0
∫Rn
dτ kn(τ ) x(t − τ1)...x(t − τn)
Rk: Taylor series f : Rd → R,
f (x) =∞∑n=0
∑i∈1...dn
ai xi1 ...xin
17
Framework LTI case Volterra series Generalize classical
Volterra series
Vk [x ] =∞∑n=0
∫Rn
dτ kn(τ ) x(t − τ1)...x(t − τn)
Where do x , k , Vk [x ] live? i.e: X ,K,Y?
Theorem
‖Vk [x ]‖Y ≤∞∑n=0
‖kn‖Kn‖x‖nX
X = Lp(R), Kn = Lq(Rn), Y = L∞(R) (1/p + 1/q = 1)
orX = Cb(R), Kn = ba(Rn), Y = Cb(R)
(Convergence of∑∞
n=0 ?)
18
Framework LTI case Volterra series Generalize classical
Volterra series
Vk [x ] =∞∑n=0
∫Rn
dτ kn(τ ) x(t − τ1)...x(t − τn)
Where do x , k , Vk [x ] live? i.e: X ,K,Y?
Theorem
‖Vk [x ]‖Y ≤∞∑n=0
‖kn‖Kn‖x‖nX
X = Lp(R), Kn = Lq(Rn), Y = L∞(R) (1/p + 1/q = 1)
orX = Cb(R), Kn = ba(Rn), Y = Cb(R)
(Convergence of∑∞
n=0 ?)
18
Framework LTI case Volterra series Generalize classical
Volterra series
Vk [x ] =∞∑n=0
∫Rn
dτ kn(τ ) x(t − τ1)...x(t − τn)
Where do x , k , Vk [x ] live? i.e: X ,K,Y?
Theorem
‖Vk [x ]‖Y ≤∞∑n=0
‖kn‖Kn‖x‖nX
X = Lp(R), Kn = Lq(Rn), Y = L∞(R) (1/p + 1/q = 1)
orX = Cb(R), Kn = ba(Rn), Y = Cb(R)
(Convergence of∑∞
n=0 ?)18
Framework LTI case Volterra series Generalize classical
Volterra series
For simplicity assume finite order N
Vk [x ] =N∑
n=0
∫Rn
dτ kn(τ ) x(t − τ1)...x(t − τn)
‖Vk [x ]‖Y ≤N∑
n=0
‖kn‖Kn‖x‖nX
≤ cst(N,U) ‖k‖K(K = K0 ×K1 × ...)
Ncst(N,U)·ε(S+; ‖·‖∞U) ≤ Nε(K+; ‖·‖K)
(S+ = Vk ; k ∈ K+)
Conclusion: again reduced to metric entropy in function space ,
19
Framework LTI case Volterra series Generalize classical
Volterra series
For simplicity assume finite order N
Vk [x ] =N∑
n=0
∫Rn
dτ kn(τ ) x(t − τ1)...x(t − τn)
‖Vk [x ]‖Y ≤N∑
n=0
‖kn‖Kn‖x‖nX
≤ cst(N,U) ‖k‖K(K = K0 ×K1 × ...)
Ncst(N,U)·ε(S+; ‖·‖∞U) ≤ Nε(K+; ‖·‖K)
(S+ = Vk ; k ∈ K+)
Conclusion: again reduced to metric entropy in function space ,
19
Framework LTI case Volterra series Generalize classical
Volterra series
For simplicity assume finite order N
Vk [x ] =N∑
n=0
∫Rn
dτ kn(τ ) x(t − τ1)...x(t − τn)
‖Vk [x ]‖Y ≤N∑
n=0
‖kn‖Kn‖x‖nX
≤ cst(N,U) ‖k‖K(K = K0 ×K1 × ...)
Ncst(N,U)·ε(S+; ‖·‖∞U) ≤ Nε(K+; ‖·‖K)
(S+ = Vk ; k ∈ K+)
Conclusion: again reduced to metric entropy in function space ,
19
Framework LTI case Volterra series Generalize classical
Volterra series
For simplicity assume finite order N
Vk [x ] =N∑
n=0
∫Rn
dτ kn(τ ) x(t − τ1)...x(t − τn)
‖Vk [x ]‖Y ≤N∑
n=0
‖kn‖Kn‖x‖nX
≤ cst(N,U) ‖k‖K(K = K0 ×K1 × ...)
Ncst(N,U)·ε(S+; ‖·‖∞U) ≤ Nε(K+; ‖·‖K)
(S+ = Vk ; k ∈ K+)
Conclusion: again reduced to metric entropy in function space ,
19
Framework LTI case Volterra series Generalize classical
"Parametrize" path
Summary: ifS = Φk , k ∈ K
S+ = Φk , k ∈ K+‖Φk‖∞U ≤c ‖k‖K
then ,
Problems:
Upper estimate may be too looseGiven S+ specified differently, how to find K+?
S+ = [i(t) 7→ u(t)] ; R ∈ [Rmin,Rmax ], L ∈ [Lmin, Lmax ] K+??
20
Framework LTI case Volterra series Generalize classical
"Parametrize" path
Summary: ifS = Φk , k ∈ K
S+ = Φk , k ∈ K+‖Φk‖∞U ≤c ‖k‖K
then ,
Problems:
Upper estimate may be too looseGiven S+ specified differently, how to find K+?
S+ = [i(t) 7→ u(t)] ; R ∈ [Rmin,Rmax ], L ∈ [Lmin, Lmax ] K+??
20
Framework LTI case Volterra series Generalize classical
"Parametrize" path
Summary: ifS = Φk , k ∈ K
S+ = Φk , k ∈ K+‖Φk‖∞U ≤c ‖k‖K
then ,
Problems:Upper estimate may be too loose
Given S+ specified differently, how to find K+?
S+ = [i(t) 7→ u(t)] ; R ∈ [Rmin,Rmax ], L ∈ [Lmin, Lmax ] K+??
20
Framework LTI case Volterra series Generalize classical
"Parametrize" path
Summary: ifS = Φk , k ∈ K
S+ = Φk , k ∈ K+‖Φk‖∞U ≤c ‖k‖K
then ,
Problems:Upper estimate may be too looseGiven S+ specified differently, how to find K+?
S+ = [i(t) 7→ u(t)] ; R ∈ [Rmin,Rmax ], L ∈ [Lmin, Lmax ] K+??
20
Framework LTI case Volterra series Generalize classical
"Parametrize" path
Summary: ifS = Φk , k ∈ K
S+ = Φk , k ∈ K+‖Φk‖∞U ≤c ‖k‖K
then ,
Problems:Upper estimate may be too looseGiven S+ specified differently, how to find K+?
S+ = [i(t) 7→ u(t)] ; R ∈ [Rmin,Rmax ], L ∈ [Lmin, Lmax ] K+??
20
1 Framework for this thesis
2 "Parametrize": LTI systems case
3 "Parametrize": Volterra series
4 Generalize classical techniques
Framework LTI case Volterra series Generalize classical
Generalize classical techniques
Different approach: adapt techniques from the caseF+ ⊂ F =
f : Rd → R
Illustrate the "sample and quantize" technique
Proof of metric entropy estimate for the set of lipschitz-continuous functions
21
Framework LTI case Volterra series Generalize classical
Generalize classical techniques
Different approach: adapt techniques from the caseF+ ⊂ F =
f : Rd → R
Illustrate the "sample and quantize" technique
Proof of metric entropy estimate for the set of lipschitz-continuous functions
21
Framework LTI case Volterra series Generalize classical
Generalize classical techniques
Different approach: adapt techniques from the caseF+ ⊂ F =
f : Rd → R
Illustrate the "sample and quantize" technique
Proof of metric entropy estimate for the set of lipschitz-continuous functions
21
Framework LTI case Volterra series Generalize classical
(Lipschitz)-continuous functions
Lipschitz-continuous functions over a compact interval
F+ ⊂f : [0, 1]→ R; ∀t, t ′,
∣∣f (t)− f (t ′)∣∣ ≤ L
∣∣t − t ′∣∣
22
Framework LTI case Volterra series Generalize classical
(Lipschitz)-continuous functions
Lipschitz-continuous functions over a compact interval
F+ ⊂f : [0, 1]→ R; ∀t, t ′,
∣∣f (t)− f (t ′)∣∣ ≤ L
∣∣t − t ′∣∣
22
Framework LTI case Volterra series Generalize classical
(Lipschitz)-continuous functions
Lipschitz-continuous functions over a compact interval
F+ ⊂f : [0, 1]→ R; ∀t, t ′,
∣∣f (t)− f (t ′)∣∣ ≤ L
∣∣t − t ′∣∣
22
Framework LTI case Volterra series Generalize classical
(Lipschitz)-continuous systems
Lipschitz-continuous system over a compact metric space
S+ ⊂S : (U, d)→ (Y, ‖·‖Y); ∀x , x ′,
∥∥S [x ]− S [x ′]∥∥Y ≤ Ld(x , x ′)
23
Framework LTI case Volterra series Generalize classical
(Lipschitz)-continuous systems
Lipschitz-continuous system over a compact metric space
S+ ⊂S : (U, d)→ (Y, ‖·‖Y); ∀x , x ′,
∥∥S [x ]− S [x ′]∥∥Y ≤ Ld(x , x ′)
Quantize the output set S [x ]; S ∈ S+, x ∈ U ?
23
Framework LTI case Volterra series Generalize classical
(Lipschitz)-continuous systems
Lipschitz-continuous system over a compact metric space
S+ ⊂S : (U, d)→ (Y, ‖·‖Y); ∀x , x ′,
∥∥S [x ]− S [x ′]∥∥Y ≤ Ld(x , x ′)
Theorem (Banach-valued Arzela-Ascoli)
S+ relatively compact in C (U;Y) iffS+ equicontinuous (⇐ L-lipschitz)S+ "equicompact": S [x ]; S ∈ S+, x ∈ U relatively compact 23
Framework LTI case Volterra series Generalize classical
(Lipschitz)-continuous systems
Lipschitz-continuous system over a compact metric space
S+ ⊂S : (U, d)→ (Y, ‖·‖Y); ∀x , x ′,
∥∥S [x ]− S [x ′]∥∥Y ≤ Ld(x , x ′)
Theorem (Banach-valued Arzela-Ascoli)
S+ can be quantized iffS+ equicontinuous (⇐ L-lipschitz)S+ "equicompact": S [x ]; S ∈ S+, x ∈ U can be quantized 24
Conclusion
Framework LTI case Volterra series Generalize classical
Conclusion
Framework: want Nε(S+; ‖·‖∞U) where
S+ ⊂ S = S : X → Y∥∥∥S − S
∥∥∥∞U
= supx∈U
∥∥∥S [x ]− S [x ]∥∥∥Y
Two paths:"Parametric"
S+ = Φk , k ∈ K+Nε(S+; ‖·‖∞U) ' Nε(K+; ‖·‖K)
Generalize classical techniques
f : Rd → R S : X → Y
Direction for future work: "non-parametric" path?Kernel methods for nonlinear system identification
25
Framework LTI case Volterra series Generalize classical
Conclusion
Framework: want Nε(S+; ‖·‖∞U) where
S+ ⊂ S = S : X → Y∥∥∥S − S
∥∥∥∞U
= supx∈U
∥∥∥S [x ]− S [x ]∥∥∥Y
Two paths:"Parametric"
S+ = Φk , k ∈ K+Nε(S+; ‖·‖∞U) ' Nε(K+; ‖·‖K)
Generalize classical techniques
f : Rd → R S : X → Y
Direction for future work: "non-parametric" path?Kernel methods for nonlinear system identification
25
Framework LTI case Volterra series Generalize classical
Conclusion
Framework: want Nε(S+; ‖·‖∞U) where
S+ ⊂ S = S : X → Y∥∥∥S − S
∥∥∥∞U
= supx∈U
∥∥∥S [x ]− S [x ]∥∥∥Y
Two paths:"Parametric"
S+ = Φk , k ∈ K+Nε(S+; ‖·‖∞U) ' Nε(K+; ‖·‖K)
Generalize classical techniques
f : Rd → R S : X → Y
Direction for future work: "non-parametric" path?Kernel methods for nonlinear system identification
25
Framework LTI case Volterra series Generalize classical
Illustrations adapted from"Nonlinear system modeling based on the Wiener theory"(Schetzen 1981)https://commons.wikimedia.org/wiki/File:Circuit_L_R_parall%
C3%A8le_-_courant_en_entr%C3%A9e_et_tension_en_sortie.png
https://xkcd.com/730/
https://towardsdatascience.com/8a3fbfdc5e9b
"’Zero-Shot’ Super-Resolution using Deep Internal Learning"(Shocher et al. 2017)Neural Network Theory lecture notes HS2019 ETHZIntroduction to Digital Communications, Chapter 3 (Grami2016)"ε-Entropy and ε-Capacity of Sets In Functional Spaces"(Kolmogorov and Tikhomirov 1959)
26
Appendix
5 Volterra series as elements of a polynomial RKBS
6 Misc
Polynomial RKBS Misc
The idea
Time-invariant system → scalar-valued functional (cf report)
Volterra monomial with fixed n
Vkn [x ](t) =
∫Rn
dτ kn(τ ) x(t − τ1)...x(t − τn)
Fθn [x ] =
∫Rn
dτ θn(τ )x(τ1)...x(τn) =
∫Rn
dτ θn(τ )x×n(τ )
Idea: view as linear combination of feature map
Fθn [x ] =
∫Rn
dτ θn(τ )φ(x)(τ ) = 〈φ(x), θn〉
x ∈ Lp(R) 7→ φ(x) = x×n ∈ LpSym(Rn)So Fθn ∈ RKHS with
K (x , x) = 〈φ(x), φ(x)〉 =
(∫Rx x
)n
Problem: ill-defined when x , x ∈ Lp(R)...
1
Polynomial RKBS Misc
The idea
Time-invariant system → scalar-valued functional (cf report)Volterra monomial with fixed n
Vkn [x ](t) =
∫Rn
dτ kn(τ ) x(t − τ1)...x(t − τn)
Fθn [x ] =
∫Rn
dτ θn(τ )x(τ1)...x(τn) =
∫Rn
dτ θn(τ )x×n(τ )
Idea: view as linear combination of feature map
Fθn [x ] =
∫Rn
dτ θn(τ )φ(x)(τ ) = 〈φ(x), θn〉
x ∈ Lp(R) 7→ φ(x) = x×n ∈ LpSym(Rn)So Fθn ∈ RKHS with
K (x , x) = 〈φ(x), φ(x)〉 =
(∫Rx x
)n
Problem: ill-defined when x , x ∈ Lp(R)...
1
Polynomial RKBS Misc
The idea
Time-invariant system → scalar-valued functional (cf report)Volterra monomial with fixed n
Vkn [x ](t) =
∫Rn
dτ kn(τ ) x(t − τ1)...x(t − τn)
Fθn [x ] =
∫Rn
dτ θn(τ )x(τ1)...x(τn) =
∫Rn
dτ θn(τ )x×n(τ )
Idea: view as linear combination of feature map
Fθn [x ] =
∫Rn
dτ θn(τ )φ(x)(τ ) = 〈φ(x), θn〉
x ∈ Lp(R) 7→ φ(x) = x×n ∈ LpSym(Rn)
So Fθn ∈ RKHS with
K (x , x) = 〈φ(x), φ(x)〉 =
(∫Rx x
)n
Problem: ill-defined when x , x ∈ Lp(R)...
1
Polynomial RKBS Misc
The idea
Time-invariant system → scalar-valued functional (cf report)Volterra monomial with fixed n
Vkn [x ](t) =
∫Rn
dτ kn(τ ) x(t − τ1)...x(t − τn)
Fθn [x ] =
∫Rn
dτ θn(τ )x(τ1)...x(τn) =
∫Rn
dτ θn(τ )x×n(τ )
Idea: view as linear combination of feature map
Fθn [x ] =
∫Rn
dτ θn(τ )φ(x)(τ ) = 〈φ(x), θn〉
x ∈ Lp(R) 7→ φ(x) = x×n ∈ LpSym(Rn)So Fθn ∈ RKHS with
K (x , x) = 〈φ(x), φ(x)〉 =
(∫Rx x
)n
Problem: ill-defined when x , x ∈ Lp(R)...1
Polynomial RKBS Misc
RKBS
RKBS = Reproducing Kernel Banach Space
Definition (Lin et al. 2019)
Pair of RKBS: a tuple (B1,B2, 〈·, ·〉B1×B2) s.t
Bi Banach space of (real-valued) functions on Ωi
〈·, ·〉B1×B2continuous bilinear form on B1 × B2
∃K : Ω1 × Ω2 → R, called reproducing kernel, s.t
K (x , ·) ∈ B2 ∀f ∈ B1, f (x) = 〈f ,K (x , ·)〉B1×B2
K (·, y) ∈ B1 ∀g ∈ B2, g(y) = 〈K (·, y), g〉B1×B2
(K is unique)
2
Polynomial RKBS Misc
RKBS
RKBS = Reproducing Kernel Banach Space
Definition (Lin et al. 2019)
Pair of RKBS: a tuple (B1,B2, 〈·, ·〉B1×B2) s.t
Bi Banach space of (real-valued) functions on Ωi
〈·, ·〉B1×B2continuous bilinear form on B1 × B2
∃K : Ω1 × Ω2 → R, called reproducing kernel, s.t
K (x , ·) ∈ B2 ∀f ∈ B1, f (x) = 〈f ,K (x , ·)〉B1×B2
K (·, y) ∈ B1 ∀g ∈ B2, g(y) = 〈K (·, y), g〉B1×B2
(K is unique)
2
Polynomial RKBS Misc
RKBS from feature maps
Proposition (Lin et al. 2019)
φi : Ωi →Wi Banach (i = 1, 2)
〈·, ·〉W1×W2continuous bilinear form on W1 ×W2
"spanφi (Ωi ) are dense":v ∈ W2; ∀x ∈ Ω1, 〈φ1(x), v〉W1×W2
= 0
= 0u ∈ W1; ∀x ∈ Ω2, 〈u, φ2(x)〉W1×W2
= 0
= 0
This induces a pair of RKBS
B1 :=Fv = 〈φ1(·), v〉W1×W2
; v ∈ W2
‖Fv‖B1:= ‖v‖W2
B2 :=Gu = 〈u, φ2(·)〉W1×W2
; u ∈ W1
‖Gu‖B2:= ‖u‖W1
〈Fv ,Gu〉B1×B2:= 〈u, v〉W1×W2
and K (x , x) = 〈φ1(x), φ2(x)〉W1×W2.
3
Polynomial RKBS Misc
RKBS from feature maps
Proposition (Lin et al. 2019)
φi : Ωi →Wi Banach (i = 1, 2)〈·, ·〉W1×W2
continuous bilinear form on W1 ×W2
"spanφi (Ωi ) are dense":v ∈ W2; ∀x ∈ Ω1, 〈φ1(x), v〉W1×W2
= 0
= 0u ∈ W1; ∀x ∈ Ω2, 〈u, φ2(x)〉W1×W2
= 0
= 0
This induces a pair of RKBS
B1 :=Fv = 〈φ1(·), v〉W1×W2
; v ∈ W2
‖Fv‖B1:= ‖v‖W2
B2 :=Gu = 〈u, φ2(·)〉W1×W2
; u ∈ W1
‖Gu‖B2:= ‖u‖W1
〈Fv ,Gu〉B1×B2:= 〈u, v〉W1×W2
and K (x , x) = 〈φ1(x), φ2(x)〉W1×W2.
3
Polynomial RKBS Misc
RKBS from feature maps
Proposition (Lin et al. 2019)
φi : Ωi →Wi Banach (i = 1, 2)〈·, ·〉W1×W2
continuous bilinear form on W1 ×W2
"spanφi (Ωi ) are dense":v ∈ W2; ∀x ∈ Ω1, 〈φ1(x), v〉W1×W2
= 0
= 0u ∈ W1; ∀x ∈ Ω2, 〈u, φ2(x)〉W1×W2
= 0
= 0
This induces a pair of RKBS
B1 :=Fv = 〈φ1(·), v〉W1×W2
; v ∈ W2
‖Fv‖B1:= ‖v‖W2
B2 :=Gu = 〈u, φ2(·)〉W1×W2
; u ∈ W1
‖Gu‖B2:= ‖u‖W1
〈Fv ,Gu〉B1×B2:= 〈u, v〉W1×W2
and K (x , x) = 〈φ1(x), φ2(x)〉W1×W2.
3
Polynomial RKBS Misc
RKBS from feature maps
Proposition (Lin et al. 2019)
φi : Ωi →Wi Banach (i = 1, 2)〈·, ·〉W1×W2
continuous bilinear form on W1 ×W2
"spanφi (Ωi ) are dense":v ∈ W2; ∀x ∈ Ω1, 〈φ1(x), v〉W1×W2
= 0
= 0u ∈ W1; ∀x ∈ Ω2, 〈u, φ2(x)〉W1×W2
= 0
= 0
This induces a pair of RKBS
B1 :=Fv = 〈φ1(·), v〉W1×W2
; v ∈ W2
‖Fv‖B1:= ‖v‖W2
B2 :=Gu = 〈u, φ2(·)〉W1×W2
; u ∈ W1
‖Gu‖B2:= ‖u‖W1
〈Fv ,Gu〉B1×B2:= 〈u, v〉W1×W2
and K (x , x) = 〈φ1(x), φ2(x)〉W1×W2.
3
Polynomial RKBS Misc
Volterra series as polynomial RKBS
Proposition
φ1 : Lp(R)→ LpSym(Rn), x(t) 7→ x×n(t)
φ2 : Lq(R)→ LqSym(Rn), x(t) 7→ x×n(t)
〈un(t), un(t)〉Lp =∫Rn dt un(t)un(t) (Rk: ≡ duality bracket)
"spanφi (Ωi ) are dense": (cf report)un ∈ LqSym(Rn); ∀x ∈ Lp(R),
⟨x×n, un
⟩Lp
= 0
= 0un ∈ LpSym(Rn); ∀x ∈ Lq(R),
⟨un, x
×n)⟩Lp
= 0
= 0
This induces (B1,B2, 〈·, ·〉B1×B2) and
B1 =
Fθn :
[x 7→
∫Rn
dτ θn(τ )x×n(τ )
]; θn ∈ LqSym(Rn)
‖Fθn‖B1
= ‖θn‖LqSym(Rn) and K (x , x) =(∫
R x x)n .
→ B1 exactly describes Volterra monomials ,
4
Polynomial RKBS Misc
Volterra series as polynomial RKBS
Proposition
φ1 : Lp(R)→ LpSym(Rn), x(t) 7→ x×n(t)
φ2 : Lq(R)→ LqSym(Rn), x(t) 7→ x×n(t)
〈un(t), un(t)〉Lp =∫Rn dt un(t)un(t) (Rk: ≡ duality bracket)
"spanφi (Ωi ) are dense": (cf report)un ∈ LqSym(Rn); ∀x ∈ Lp(R),
⟨x×n, un
⟩Lp
= 0
= 0un ∈ LpSym(Rn); ∀x ∈ Lq(R),
⟨un, x
×n)⟩Lp
= 0
= 0
This induces (B1,B2, 〈·, ·〉B1×B2) and
B1 =
Fθn :
[x 7→
∫Rn
dτ θn(τ )x×n(τ )
]; θn ∈ LqSym(Rn)
‖Fθn‖B1
= ‖θn‖LqSym(Rn) and K (x , x) =(∫
R x x)n .
→ B1 exactly describes Volterra monomials ,
4
Polynomial RKBS Misc
Volterra series as polynomial RKBS
Proposition
φ1 : Lp(R)→ LpSym(Rn), x(t) 7→ x×n(t)
φ2 : Lq(R)→ LqSym(Rn), x(t) 7→ x×n(t)
〈un(t), un(t)〉Lp =∫Rn dt un(t)un(t) (Rk: ≡ duality bracket)
"spanφi (Ωi ) are dense": (cf report)un ∈ LqSym(Rn); ∀x ∈ Lp(R),
⟨x×n, un
⟩Lp
= 0
= 0un ∈ LpSym(Rn); ∀x ∈ Lq(R),
⟨un, x
×n)⟩Lp
= 0
= 0
This induces (B1,B2, 〈·, ·〉B1×B2) and
B1 =
Fθn :
[x 7→
∫Rn
dτ θn(τ )x×n(τ )
]; θn ∈ LqSym(Rn)
‖Fθn‖B1
= ‖θn‖LqSym(Rn) and K (x , x) =(∫
R x x)n .
→ B1 exactly describes Volterra monomials ,
4
Polynomial RKBS Misc
Volterra series as polynomial RKBS
Volterra monomials are exactly elements of a RKBS
Volterra series of finite order: idem (just concatenate)Volterra series of infinite order? → yes, cf report
K (x , x) =∞∑n=0
(∫Rx x
)n
Possible future directions:
Learning in RKBS vs. classical Volterra-series-based systemidentification?What if we want K (x , x) =
∑∞n=0 an
(∫R x x
)n for somean ≥ 0?
5
Polynomial RKBS Misc
Volterra series as polynomial RKBS
Volterra monomials are exactly elements of a RKBSVolterra series of finite order: idem (just concatenate)
Volterra series of infinite order? → yes, cf report
K (x , x) =∞∑n=0
(∫Rx x
)n
Possible future directions:
Learning in RKBS vs. classical Volterra-series-based systemidentification?What if we want K (x , x) =
∑∞n=0 an
(∫R x x
)n for somean ≥ 0?
5
Polynomial RKBS Misc
Volterra series as polynomial RKBS
Volterra monomials are exactly elements of a RKBSVolterra series of finite order: idem (just concatenate)Volterra series of infinite order? → yes, cf report
K (x , x) =∞∑n=0
(∫Rx x
)n
Possible future directions:
Learning in RKBS vs. classical Volterra-series-based systemidentification?What if we want K (x , x) =
∑∞n=0 an
(∫R x x
)n for somean ≥ 0?
5
Polynomial RKBS Misc
Volterra series as polynomial RKBS
Volterra monomials are exactly elements of a RKBSVolterra series of finite order: idem (just concatenate)Volterra series of infinite order? → yes, cf report
K (x , x) =∞∑n=0
(∫Rx x
)n
Possible future directions:
Learning in RKBS vs. classical Volterra-series-based systemidentification?What if we want K (x , x) =
∑∞n=0 an
(∫R x x
)n for somean ≥ 0?
5
Polynomial RKBS Misc
Volterra series as polynomial RKBS
Volterra monomials are exactly elements of a RKBSVolterra series of finite order: idem (just concatenate)Volterra series of infinite order? → yes, cf report
K (x , x) =∞∑n=0
(∫Rx x
)n
Possible future directions:Learning in RKBS vs. classical Volterra-series-based systemidentification?
What if we want K (x , x) =∑∞
n=0 an(∫
R x x)n for some
an ≥ 0?
5
Polynomial RKBS Misc
Volterra series as polynomial RKBS
Volterra monomials are exactly elements of a RKBSVolterra series of finite order: idem (just concatenate)Volterra series of infinite order? → yes, cf report
K (x , x) =∞∑n=0
(∫Rx x
)n
Possible future directions:Learning in RKBS vs. classical Volterra-series-based systemidentification?What if we want K (x , x) =
∑∞n=0 an
(∫R x x
)n for somean ≥ 0?
5
5 Volterra series as elements of a polynomial RKBS
6 Misc
Polynomial RKBS Misc
Entropy numbers
Nε(C; ‖·‖) = minn; ∃(p1, ..., pn) ⊂ C s.t C ⊂
⋃i B‖·‖pi ,ε
εn(C; ‖·‖) = infε; ∃(p1, ..., pn) ⊂ C s.t C ⊂
⋃i B‖·‖pi ,ε
Nε(C; ‖·‖) ≤ n0 ⇐⇒ εn(C; ‖·‖) ≤ ε0"Metric entropy" ≡ "Entropy number"
6
Polynomial RKBS Misc
Entropy numbers
Nε(C; ‖·‖) = minn; ∃(p1, ..., pn) ⊂ C s.t C ⊂
⋃i B‖·‖pi ,ε
εn(C; ‖·‖) = inf
ε; ∃(p1, ..., pn) ⊂ C s.t C ⊂
⋃i B‖·‖pi ,ε
Nε(C; ‖·‖) ≤ n0 ⇐⇒ εn(C; ‖·‖) ≤ ε0"Metric entropy" ≡ "Entropy number"
6
Polynomial RKBS Misc
Entropy numbers
Nε(C; ‖·‖) = minn; ∃(p1, ..., pn) ⊂ C s.t C ⊂
⋃i B‖·‖pi ,ε
εn(C; ‖·‖) = inf
ε; ∃(p1, ..., pn) ⊂ C s.t C ⊂
⋃i B‖·‖pi ,ε
Nε(C; ‖·‖) ≤ n0 ⇐⇒ εn(C; ‖·‖) ≤ ε0"Metric entropy" ≡ "Entropy number"
6