
Discrete Denoising with Shifts

Taesup Moon

Yahoo! Labs

EE477 Guest Lecture, November 10, 2011


Discrete Denoising with Shifts

1 Prediction with Experts’ Advice

2 Discrete Denoising with Shifts
    Recap of DUDE
    Motivation
    New algorithm: S-DUDE
    Results

Recap of DUDE

Discrete denoising

$X_t$ (clean source), $Z_t$ (noisy observation), $\hat{X}_t$ (reconstruction) take values in finite alphabets

Choose $\hat{X}_1^n$ as close as possible to $X_1^n$, based on the entire $Z_1^n$

Ex) text correction, image denoising, DNA sequence analyses, etc.

Performance metric: per-symbol average loss

DUDE is the first universal discrete denoiser

DUDE [Weissman et al. 05]

For location t to be denoised, do :

1 fix the window size k

2 find the left k-context $(\ell_1, \ldots, \ell_k)$ and right k-context $(r_1, \ldots, r_k)$ of $z_t$ :

$\ell_1\, \ell_2 \cdots \ell_k \;\; z_t \;\; r_1\, r_2 \cdots r_k$

3 count all occurrences of symbols in $z^n$ with the same context

4 decide on $\hat{x}_t$ according to

$\hat{x}_t(z_{t-k}^{t+k}) = \text{simple rule}\big(\Pi, \Lambda, \text{count vector}[z^n, z_{t-k}^{t-1}, z_{t+1}^{t+k}], z_t\big)$

Whenever DUDE sees $z_{t-k}^{t-1}\, z_t\, z_{t+1}^{t+k}$, it makes the same decision for $z_t$

DUDE is a “sliding window” denoiser
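To make the sliding-window rule above concrete, here is a minimal Python sketch of a binary DUDE for BSC(δ) under Hamming loss. It is an illustrative reconstruction, not the lecture's code: the function name is mine, and the flip threshold 2δ(1-δ)/(δ^2 + (1-δ)^2) is the standard binary-DUDE rule obtained by specializing the "simple rule" to the BSC matrix Π and Hamming loss Λ.

```python
import numpy as np

def dude_bsc(z, k, delta):
    """Binary DUDE for a BSC(delta) under Hamming loss (illustrative sketch).

    z     : 1-D numpy array of 0/1 noisy symbols
    k     : one-sided context length
    delta : assumed channel crossover probability
    """
    n = len(z)
    x_hat = z.copy()

    # First pass: for every two-sided k-context, count how often the
    # center symbol is 0 or 1.
    counts = {}
    for t in range(k, n - k):
        c = (tuple(z[t - k:t]), tuple(z[t + 1:t + k + 1]))
        m = counts.setdefault(c, [0, 0])
        m[z[t]] += 1

    # Standard flip threshold for BSC + Hamming loss.
    thresh = 2 * delta * (1 - delta) / (delta ** 2 + (1 - delta) ** 2)

    # Second pass: flip z_t iff its own symbol is too rare within its context.
    for t in range(k, n - k):
        c = (tuple(z[t - k:t]), tuple(z[t + 1:t + k + 1]))
        m = counts[c]
        same, other = m[z[t]], m[1 - z[t]]
        if same < thresh * other:
            x_hat[t] = 1 - z[t]
    return x_hat
```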

Ex 1 : stationary bit stream gets corrupted

$X^n$ : 00000011111110000000000111111111100000001111111110000
$Z^n$ : 00100011101110010001000111110111100000011110111110001

source : binary Markov chain with $p = 0.1$, sequence length $n = 10^6$
[Figure: two-state Markov chain over {0, 1} with transition probability $p$ and self-loop probability $1-p$]

noise : BSC($\delta = 0.1$)
[Figure: binary symmetric channel with crossover probability $\delta$]

⇒ optimal BER attained by the Forward-Backward Recursion
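For readers who want to reproduce a toy version of this experiment, the sketch below generates a symmetric binary Markov source and corrupts it with a BSC; the helper names and the use of NumPy are my own choices and are not part of the lecture.

```python
import numpy as np

def markov_source(n, p, rng):
    """Symmetric binary Markov chain: flips its state with probability p."""
    x = np.empty(n, dtype=np.int8)
    x[0] = rng.integers(2)
    flips = rng.random(n - 1) < p
    for t in range(1, n):
        x[t] = x[t - 1] ^ flips[t - 1]
    return x

def bsc(x, delta, rng):
    """Binary symmetric channel: flips each bit independently with prob. delta."""
    return x ^ (rng.random(len(x)) < delta).astype(np.int8)

rng = np.random.default_rng(0)
n, p, delta = 10**6, 0.1, 0.1
x = markov_source(n, p, rng)
z = bsc(x, delta, rng)
print("raw BER of z:", np.mean(x != z))   # should be close to delta = 0.1
```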

DUDE achieves the optimal BER as the window size grows

[Figure: bit error rate (normalized by $\delta$) vs. window size $k$]
Bayes Optimum = 0.558
DUDE = 0.561

Window size k is a design parameter for a given sequence length n
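The Bayes-optimal benchmark quoted on these slides is ordinary HMM smoothing: compute the posterior P(x_t | z^n) by the forward-backward recursion and output the more probable symbol, which is optimal under Hamming loss. Below is an illustrative implementation of that standard recursion for the symmetric binary chain and BSC of Ex 1; the function name and normalization details are mine.

```python
import numpy as np

def forward_backward_denoise(z, p, delta):
    """Bayes-optimal bitwise denoiser for a symmetric binary Markov source
    observed through a BSC(delta), via the forward-backward recursion."""
    n = len(z)
    A = np.array([[1 - p, p], [p, 1 - p]])                   # source transitions
    B = np.array([[1 - delta, delta], [delta, 1 - delta]])   # channel likelihoods

    # Forward pass: alpha[t, x] proportional to P(x_t = x, z_1..z_t), normalized.
    alpha = np.zeros((n, 2))
    alpha[0] = 0.5 * B[:, z[0]]
    alpha[0] /= alpha[0].sum()
    for t in range(1, n):
        alpha[t] = (alpha[t - 1] @ A) * B[:, z[t]]
        alpha[t] /= alpha[t].sum()

    # Backward pass: beta[t, x] proportional to P(z_{t+1}..z_n | x_t = x).
    beta = np.ones((n, 2))
    for t in range(n - 2, -1, -1):
        beta[t] = A @ (B[:, z[t + 1]] * beta[t + 1])
        beta[t] /= beta[t].sum()

    # Posterior P(x_t | z^n) proportional to alpha * beta; the MAP symbol
    # minimizes the expected Hamming loss.
    post = alpha * beta
    return (post[:, 1] > post[:, 0]).astype(np.int8)
```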

DUDE attains the optimum performance for stationary sources

For a denoiser $\hat{X}^n = \{\hat{X}_t(z^n)\}_{t=1}^n$,

$L_{\hat{X}^n}(x^n, z^n) = \frac{1}{n} \sum_{t=1}^{n} \Lambda\big(x_t, \hat{X}_t(z^n)\big)$

is the performance measure

Main results of DUDE : when $k = k_n < \lceil \tfrac{1}{2} \log_{|\mathcal{Z}|} n \rceil$,

1 For any stationary process $\mathbf{X}$,

$\lim_{n\to\infty} \Big[ E\big(L_{\hat{X}^n_{\mathrm{DUDE}}}(X^n, Z^n)\big) - \min_{\hat{X}^n \in \mathcal{D}_n} E\big(L_{\hat{X}^n}(X^n, Z^n)\big) \Big] = 0$

$\mathcal{D}_n$ is the set of all denoisers in the world
DUDE attains the Bayes optimal performance

2 For all $x \in \mathcal{X}^\infty$,

$\lim_{n\to\infty} \Big[ L_{\hat{X}^n_{\mathrm{DUDE}}}(x^n, Z^n) - D_k(x^n, Z^n) \Big] = 0 \quad \text{w.p.1}$

$D_k(x^n, z^n)$ : the best performance among $\mathcal{S}_k$, the k-th order sliding window denoisers
DUDE is as good as the best sliding window denoiser

Motivation

Ex 2 : piecewise stationary bit stream gets corrupted

$X^n$ : 00000011111110000000000111111101100011011011011010110
$Z^n$ : 00100011101110010001000111110101100011111011010010100

source : binary Markov chain with $p_1 = 0.01 \to p_2 = 0.2$ at $t^* = n/2$
[Figure: two-state Markov chain over {0, 1} with transition probability $p$]

noise : BSC($\delta = 0.1$)
[Figure: binary symmetric channel with crossover probability $\delta$]

⇒ optimal BER attained by the Forward-Backward Recursion

Does DUDE achieve the optimal BER?

[Figure: bit error rate (normalized by $\delta$) vs. window size $k$]
Bayes Optimum = 0.487
DUDE = 0.574 (+18%)

DUDE applies the same rule “regardless of the location”
DUDE has a limitation for time- (space-) varying sources

In practice, many sources are time- (space-) varying

text : English → Spanish → German . . .
voice : [audio waveform figure]
image : [image figure]

New algorithm: S-DUDE

Can we do better than the DUDE when the source varies?

Questions

1 Can we perform as if we knew the source including its change points?

2 If so, can we do it efficiently?

Answers

1 Yes. S-DUDE can do essentially as well as if it knows the source and its change points

2 Yes. S-DUDE is a linear complexity algorithm

[M and Weissman, IEEE Trans. Info. Theory, Nov 09]

Take a closer look at the binary example

Binary, BSC($\delta$). Suppose DUDE with window size k = 3 decided as follows :

$z_{t-3}^{t+3} = 0100110 \;\Rightarrow\; \hat{x}_t = 0$, and $z_{t-3}^{t+3} = 0101110 \;\Rightarrow\; \hat{x}_t = 1$

010 • 110 defined a “say-what-you-see” mapping in the middle
DUDE employs the same mapping whenever it sees 010 • 110

Only 4 single-letter mappings in the binary example :
“say-what-you-see”, “flip-what-you-see”, “always-say-0”, “always-say-1”

DUDE counts $n_0$ and $n_1$ for 010 • 110 and
  if $n_0 \approx n_1$ → “say-what-you-see”
  if $n_0 \gg n_1$ → “always-say-0”
  if $n_0 \ll n_1$ → “always-say-1”
where the threshold depends on $\delta$
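As a sanity check on the δ-dependent threshold, here is a short derivation (mine, not shown on the slide) for the BSC(δ) / Hamming-loss case, with $n_0, n_1$ the counts at the context and $z_t = 0$:

```latex
% Channel-inverted estimates of the clean-symbol counts at the context:
\[
\hat{n}_0 \;\propto\; (1-\delta)\,n_0 - \delta\,n_1,
\qquad
\hat{n}_1 \;\propto\; (1-\delta)\,n_1 - \delta\,n_0 .
\]
% With z_t = 0, the expected Hamming losses of the two candidate outputs are
% proportional to
\[
\hat{x}_t = 0:\;\; \hat{n}_1\,\delta,
\qquad
\hat{x}_t = 1:\;\; \hat{n}_0\,(1-\delta),
\]
% so DUDE outputs 1 (flips z_t) exactly when
\[
n_0 \;<\; \frac{2\,\delta(1-\delta)}{\delta^{2} + (1-\delta)^{2}}\; n_1 .
\]
```

The case $z_t = 1$ is symmetric, which recovers the qualitative rule above: say what you see when $n_0 \approx n_1$, and switch to always-say-0 or always-say-1 when one count dominates.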

Employing shifting single-letter mappings will be helpful

Suppose the 0’s and 1’s at 010 • 110 looked like

$\underbrace{00001000110000}_{\text{all-0}}\ \underbrace{11111111011101}_{\text{all-1}}$

“always-say-0” → “always-say-1” may be better than a fixed “say-what-you-see”

Generally, if single-letter mappings have some freedom to shift, they can attain smaller loss

How can we decide when to shift to what?

$\mathcal{S}_m^n$ is a class of shifting single-letter mappings

Ideally, shifting every time to the correct mapping would be the best
  equivalent to knowing the source sequence ⇒ impossible!

We limit the number of shifts to m

$\mathcal{S}_m^n$ : class of single-letter mapping sequences $\{s_1, \cdots, s_n\}$ shifting at most m times for sequence length n
[Figure: a mapping sequence along $z^n$ shifting among swys, all-0, all-1]

$|\mathcal{S}_m^n| \le \binom{n}{m} \cdot |\mathcal{S}|^m$, where $|\mathcal{S}| = |\mathcal{X}|^{|\mathcal{Z}|}$ (the number of single-letter mappings)

Deciding when to shift to what m times ⇔ selecting the best combination in $\mathcal{S}_m^n$

The key tool is to devise an estimate of the loss Λ

Focus on the single-letter setting ($s(\cdot) : \mathcal{Z} \to \mathcal{X}$)

$x \;\to\; \Pi \;\to\; Z$, and $\hat{X} = s(Z)$

$\Lambda(x, s(Z))$ : loss between x and s(Z), not observable

But, from the knowledge of Π, we can devise $\ell(Z, s)$ such that

$E_x\big(\ell(Z, s)\big) = E_x\big(\Lambda(x, s(Z))\big)$

$\ell(Z, s)$ is an unbiased estimate of $E_x\big(\Lambda(x, s(Z))\big)$

$\ell(Z, s)$ : loss between Z and $s(\cdot)$, observable

[Weissman et al., Universal filtering via prediction, IEEE IT 07]
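One standard way to obtain such an unbiased estimate, consistent with the channel-inversion idea behind DUDE, is to solve $\Pi\,\ell(\cdot, s) = h_s$, where $h_s(x) = \sum_z \Pi(x,z)\Lambda(x, s(z))$ is the conditional expected loss. The sketch below is my own rendering of this construction for a square, invertible Π; the exact formulation in the cited paper may differ in details.

```python
import numpy as np

def estimated_loss(Pi, Lambda, s):
    """Return the vector ell(., s) with E_x[ell(Z, s)] = E_x[Lambda(x, s(Z))].

    Pi     : |X| x |Z| channel matrix, Pi[x, z] = P(Z = z | X = x) (square, invertible)
    Lambda : |X| x |Xhat| loss matrix
    s      : array of length |Z|, s[z] = reconstruction chosen by the mapping for z
    """
    # h[x] = sum_z Pi[x, z] * Lambda[x, s(z)]  (expected loss given clean symbol x)
    h = np.einsum("xz,xz->x", Pi, Lambda[:, s])
    # Choose ell so that Pi @ ell = h, i.e. unbiasedness holds for every clean x.
    return np.linalg.solve(Pi, h)

# Example: BSC(0.1), Hamming loss, the four binary single-letter mappings.
delta = 0.1
Pi = np.array([[1 - delta, delta], [delta, 1 - delta]])
Lam = np.array([[0.0, 1.0], [1.0, 0.0]])
mappings = {"all-0": [0, 0], "all-1": [1, 1], "swys": [0, 1], "fwys": [1, 0]}
for name, s in mappings.items():
    print(name, estimated_loss(Pi, Lam, np.array(s)))
```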

S-DUDE is defined by minimizing the sum of the estimated losses

For each context c (e.g., 010 • 110), S-DUDE finds

$\hat{S} \triangleq \arg\min_{S \in \mathcal{S}_m^{n_c}} \sum_{i \,\in\, \text{context } c} \ell(z_i, s_i)$

vs. $\arg\min_{S \in \mathcal{S}_m^{n_c}} \sum_{i \,\in\, \text{context } c} \Lambda(x_i, s_i(z_i))$, which would require knowing $x^n$

and applies them

Question : how can we get $\hat{S} = \{s_1, \cdots, s_{n_c}\} \in \mathcal{S}_m^{n_c}$ efficiently?

S-DUDE can be implemented with a two-pass algorithm

Again binary, BSC($\delta$) example

Problem : find the best $\{s_1, \cdots, s_n\} \in \mathcal{S}_m^n$ that minimizes $\sum_{t=1}^n \ell(z_t, s_t)$, with $s_i \in$ {all-0, all-1, swys, fwys}

To solve :

1 allocate $M_t \in \mathbb{R}^{m \times 4}$ for each $1 \le t \le n$

2 first pass : scan $(z_1, \cdots, z_n)$ and update $\{M_t\}_{t=1}^n$ by dynamic programming

3 second pass : from $M_n$, extract the best $\{s_1, \cdots, s_n\}$ by a backward recursion

$M_t$ stores the minimum sum of estimated losses up to t

Again binary, BSC($\delta$) example. Problem : find the best $\{s_1, \cdots, s_n\} \in \mathcal{S}_m^n$ that minimizes $\sum_{t=1}^n \ell(z_t, s_t)$, with $s_i \in$ {all-0, all-1, swys, fwys}

Elements of $M_t$ (rows indexed by the number of shifts $i \le m$, columns by the current mapping all-0, all-1, swys, fwys) are defined to be the minimum sum up to t, e.g.,

$M_t(i, \text{swys}) = \min_{\{s_1,\cdots,s_t\} \in \mathcal{S}_i^t} \Big\{ \ell(z_t, s_t = \text{swys}) + \sum_{r=1}^{t-1} \ell(z_r, s_r) \Big\}$

First pass uses dynamic programming

Only two possible cases to attain $M_t(i, \text{swys})$ :

1 the i-th shift occurred at t : $\min_{1 \le j \le |\mathcal{S}|} M_{t-1}(i-1, j) + \ell(z_t, \text{swys})$

2 the i-th shift occurred before t : $M_{t-1}(i, \text{swys}) + \ell(z_t, \text{swys})$

Hence

$M_t(i, \text{swys}) = \ell(z_t, \text{swys}) + \min\Big\{ M_{t-1}(i, \text{swys}),\ \min_{1 \le j \le |\mathcal{S}|} M_{t-1}(i-1, j) \Big\}$

and the same recursion holds for all other elements

Second pass extracts $\hat{S}$ and denoises

When t = n :

$\hat{s}_n = \arg\min_{j \in \{\text{all-0}, \text{all-1}, \text{swys}, \text{fwys}\}} M_n(m, j)$, and $\hat{x}_n = \hat{s}_n(z_n)$

(the minimum entry of the last row of $M_n$ equals $\min_{S \in \mathcal{S}_m^n} \sum_{t=1}^n \ell(z_t, s_t)$)

For $t = n-1, \cdots, 1$ : follow the optimal path backward and denoise!
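A compact sketch of the two-pass procedure for a single context stream (my own code, not the lecture's): the table M is indexed by shifts-used and current mapping as on the slides, and the estimated losses ell[t, j] are taken as given, e.g. produced by the estimator sketched earlier.

```python
import numpy as np

def best_shifting_mappings(ell, m):
    """Two-pass search for the mapping sequence {s_1..s_n} with at most m
    shifts that minimizes sum_t ell[t, s_t] (forward DP + backward trace).

    ell : (n, S) array, ell[t, j] = estimated loss of single-letter mapping j at time t
    m   : maximum number of shifts
    Returns an array of length n with the chosen mapping index at each time.
    """
    n, S = ell.shape
    # M[t, i, j]: min cumulative estimated loss over z_1..z_t using at most i
    # shifts and with s_t = j (rows i, columns j, as in the M_t table).
    M = np.full((n, m + 1, S), np.inf)
    M[0, :, :] = ell[0]                       # any initial mapping, no shift yet

    # First pass: forward dynamic programming.
    for t in range(1, n):
        stay = M[t - 1]                                           # shift happened before t
        switch = np.full((m + 1, S), np.inf)
        switch[1:] = M[t - 1, :-1].min(axis=1, keepdims=True)     # shifting at t uses one more
        M[t] = ell[t] + np.minimum(stay, switch)

    # Second pass: backtrace from the best entry of the last row of M_n.
    s = np.empty(n, dtype=int)
    i, j = m, int(M[n - 1, m].argmin())
    s[-1] = j
    for t in range(n - 1, 0, -1):
        stay = M[t - 1, i, j]
        prev_best = M[t - 1, i - 1].min() if i > 0 else np.inf
        if stay > prev_best:                    # a shift occurred at time t
            i -= 1
            j = int(M[t - 1, i].argmin())       # mapping in force before the shift
        s[t - 1] = j
    return s
```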

The complexity of S-DUDE is linear in n and m

Complexity
  space : $O(mn|\mathcal{Z}|^{2k})$
  time : $O(mn|\mathcal{Z}|^{2k})$
  practical

Summary of S-DUDE

S-DUDE (Shifting DUDE)

For location t to be denoised, do :

1 fix the window size k, set the number of shifts m

2 find the left k-context $(\ell_1, \ldots, \ell_k)$ and right k-context $(r_1, \ldots, r_k)$ of $z_t$ :

$\ell_1\, \ell_2 \cdots \ell_k \;\; z_t \;\; r_1\, r_2 \cdots r_k$

3 on all positions that share the same context c with $z_t$, find

$\hat{S} = \arg\min_{S \in \mathcal{S}_m^{n_c}} \sum_{t \,\in\, \text{context } c} \ell(z_t, s_t)$

4 decide on $\hat{x}_t$ according to

$\hat{x}_t = \hat{s}_t(z_t)$, where $\hat{s}_t(\cdot)$ comes from $\hat{S}$

We can also show that if we set m = 0, S-DUDE coincides with DUDE
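To assemble the full denoiser from the per-context searches, the positions of $z^n$ are first grouped by their two-sided k-context. Below is a minimal sketch of that bookkeeping step (the names and the random test sequence are mine); any per-context solver, such as the dynamic program sketched above, would then be run on each group.

```python
from collections import defaultdict
import numpy as np

def group_by_context(z, k):
    """Group positions k <= t < n-k of z by their two-sided k-context."""
    groups = defaultdict(list)
    for t in range(k, len(z) - k):
        context = (tuple(z[t - k:t]), tuple(z[t + 1:t + k + 1]))
        groups[context].append(t)
    return groups

rng = np.random.default_rng(1)
z = rng.integers(0, 2, size=10_000)
groups = group_by_context(z, k=2)
print(len(groups), "distinct contexts;",
      max(len(v) for v in groups.values()), "positions in the largest one")
# For each group, S-DUDE would run the shifting-mapping search on the
# estimated losses of the symbols at those positions (in order), then apply
# the selected mapping s_t to z_t at each position t.
```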

Results

S-DUDE achieves the optimum loss for time- (space-) varying sources

When $k = k_n < \tfrac{1}{2} \log_{|\mathcal{Z}|} n$,

Theorem 1 (stochastic setting)

For all piecewise stationary processes $\mathbf{X}$,

$\lim_{n\to\infty} \Big[ E\big(L_{\hat{X}^n_{\text{S-DUDE}}}(X^n, Z^n)\big) - \min_{\hat{X}^n \in \mathcal{D}_n} E\big(L_{\hat{X}^n}(X^n, Z^n)\big) \Big] = 0,$

provided that the number of stationary segments is m = o(n) w.p.1

Theorem 2 (individual sequence setting)

When m = o(n), for all $x \in \mathcal{X}^\infty$,

$\lim_{n\to\infty} \Big[ L_{\hat{X}^n_{\text{S-DUDE}}}(x^n, Z^n) - D_{k,m}(x^n, Z^n) \Big] = 0 \quad \text{w.p.1}$

where $D_{k,m}(x^n, z^n)$ is the best performance attained by k-th order sliding window denoisers that can shift at most m times

No denoiser is better than S-DUDE

Strong converse

If m = Θ(n), no denoiser can achieve the guarantees of the previous theorems.

m = o(n) is a necessary and sufficient condition for the previous theorems!

Ex 2 : piecewise stationary bit stream (revisited)

$X^n$ : 00000011111110000000000111111111100000001111111110000
$Z^n$ : 00100011101110010001000111110111100000011110111110001

source : binary Markov chain with $p_1 = 0.01 \to p_2 = 0.2$ at $t^* = n/2$
[Figure: two-state Markov chain over {0, 1} with transition probability $p$]

noise : flips bits with probability $\delta = 0.1$
[Figure: binary symmetric channel with crossover probability $\delta$]

⇒ optimal BER attained by the Forward-Backward Recursion

Can S-DUDE achieve the Bayes optimal performance?

[Figure: bit error rate (normalized by $\delta$) vs. window size $k$]
Bayes Optimum = 0.487
DUDE = 0.574
S-DUDE (m=1) = 0.498 (+2.3%)

⇒ m can be regarded as another design parameter in devising a discrete denoiser