
Discrete Denoising with Shifts

Taesup Moon

Yahoo! Labs

EE477 Guest Lecture, November 10, 2011


Discrete Denoising with Shifts

1 Prediction with Experts’ Advice

2 Discrete Denoising with Shifts
    Recap of DUDE
    Motivation
    New algorithm: S-DUDE
    Results

Recap of DUDE

Discrete denoising

$X_t$ (clean source), $Z_t$ (noisy observation), $\hat{X}_t$ (reconstruction) take values in finite alphabets

Choose $\hat{X}_1^n$ as close as possible to $X_1^n$, based on the entire $Z_1^n$

Ex) text correction, image denoising, DNA sequence analyses, etc.

Performance metric: per-symbol average loss

DUDE is the first universal discrete denoiser

DUDE [Weissman et al. 05]

For location t to be denoised, do :

1 fix the window size k

2 find the left k-context $(\ell_1, \ldots, \ell_k)$ and right k-context $(r_1, \ldots, r_k)$ of $z_t$ :

$\ell_1\, \ell_2 \cdots \ell_k \;\; z_t \;\; r_1\, r_2 \cdots r_k$

3 count all occurrences of symbols in $z^n$ with the same context

4 decide on $\hat{x}_t$ according to

$\hat{x}_t(z_{t-k}^{t+k}) = \text{simple rule}\big(\Pi, \Lambda, \text{count vector}[z^n, z_{t-k}^{t-1}, z_{t+1}^{t+k}], z_t\big)$

Whenever DUDE sees $z_{t-k}^{t-1}\, z_t\, z_{t+1}^{t+k}$, it makes the same decision for $z_t$

DUDE is a “sliding window” denoiser
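To make the sliding-window rule above concrete, here is a minimal Python sketch of a binary DUDE for BSC(δ) under Hamming loss. It is an illustrative reconstruction, not the lecture's code: the function name is mine, and the flip threshold 2δ(1-δ)/(δ^2 + (1-δ)^2) is the standard binary-DUDE rule obtained by specializing the "simple rule" to the BSC matrix Π and Hamming loss Λ.

```python
import numpy as np

def dude_bsc(z, k, delta):
    """Binary DUDE for a BSC(delta) under Hamming loss (illustrative sketch).

    z     : 1-D numpy array of 0/1 noisy symbols
    k     : one-sided context length
    delta : assumed channel crossover probability
    """
    n = len(z)
    x_hat = z.copy()

    # First pass: for every two-sided k-context, count how often the
    # center symbol is 0 or 1.
    counts = {}
    for t in range(k, n - k):
        c = (tuple(z[t - k:t]), tuple(z[t + 1:t + k + 1]))
        m = counts.setdefault(c, [0, 0])
        m[z[t]] += 1

    # Standard flip threshold for BSC + Hamming loss.
    thresh = 2 * delta * (1 - delta) / (delta ** 2 + (1 - delta) ** 2)

    # Second pass: flip z_t iff its own symbol is too rare within its context.
    for t in range(k, n - k):
        c = (tuple(z[t - k:t]), tuple(z[t + 1:t + k + 1]))
        m = counts[c]
        same, other = m[z[t]], m[1 - z[t]]
        if same < thresh * other:
            x_hat[t] = 1 - z[t]
    return x_hat
```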

Ex 1 : stationary bit stream gets corrupted

$X^n$ : 00000011111110000000000111111111100000001111111110000
$Z^n$ : 00100011101110010001000111110111100000011110111110001

source : binary Markov chain with $p = 0.1$, sequence length $n = 10^6$
[Figure: two-state Markov chain over {0, 1} with transition probability $p$ and self-loop probability $1-p$]

noise : BSC($\delta = 0.1$)
[Figure: binary symmetric channel with crossover probability $\delta$]

⇒ optimal BER attained by the Forward-Backward Recursion
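For readers who want to reproduce a toy version of this experiment, the sketch below generates a symmetric binary Markov source and corrupts it with a BSC; the helper names and the use of NumPy are my own choices and are not part of the lecture.

```python
import numpy as np

def markov_source(n, p, rng):
    """Symmetric binary Markov chain: flips its state with probability p."""
    x = np.empty(n, dtype=np.int8)
    x[0] = rng.integers(2)
    flips = rng.random(n - 1) < p
    for t in range(1, n):
        x[t] = x[t - 1] ^ flips[t - 1]
    return x

def bsc(x, delta, rng):
    """Binary symmetric channel: flips each bit independently with prob. delta."""
    return x ^ (rng.random(len(x)) < delta).astype(np.int8)

rng = np.random.default_rng(0)
n, p, delta = 10**6, 0.1, 0.1
x = markov_source(n, p, rng)
z = bsc(x, delta, rng)
print("raw BER of z:", np.mean(x != z))   # should be close to delta = 0.1
```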

DUDE achieves the optimal BER as the window size grows

[Figure: bit error rate (normalized by $\delta$) vs. window size $k$]
Bayes Optimum = 0.558
DUDE = 0.561

Window size k is a design parameter for a given sequence length n
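The Bayes-optimal benchmark quoted on these slides is ordinary HMM smoothing: compute the posterior P(x_t | z^n) by the forward-backward recursion and output the more probable symbol, which is optimal under Hamming loss. Below is an illustrative implementation of that standard recursion for the symmetric binary chain and BSC of Ex 1; the function name and normalization details are mine.

```python
import numpy as np

def forward_backward_denoise(z, p, delta):
    """Bayes-optimal bitwise denoiser for a symmetric binary Markov source
    observed through a BSC(delta), via the forward-backward recursion."""
    n = len(z)
    A = np.array([[1 - p, p], [p, 1 - p]])                   # source transitions
    B = np.array([[1 - delta, delta], [delta, 1 - delta]])   # channel likelihoods

    # Forward pass: alpha[t, x] proportional to P(x_t = x, z_1..z_t), normalized.
    alpha = np.zeros((n, 2))
    alpha[0] = 0.5 * B[:, z[0]]
    alpha[0] /= alpha[0].sum()
    for t in range(1, n):
        alpha[t] = (alpha[t - 1] @ A) * B[:, z[t]]
        alpha[t] /= alpha[t].sum()

    # Backward pass: beta[t, x] proportional to P(z_{t+1}..z_n | x_t = x).
    beta = np.ones((n, 2))
    for t in range(n - 2, -1, -1):
        beta[t] = A @ (B[:, z[t + 1]] * beta[t + 1])
        beta[t] /= beta[t].sum()

    # Posterior P(x_t | z^n) proportional to alpha * beta; the MAP symbol
    # minimizes the expected Hamming loss.
    post = alpha * beta
    return (post[:, 1] > post[:, 0]).astype(np.int8)
```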

DUDE attains the optimum performance for stationary sources

For a denoiser $\hat{X}^n = \{\hat{X}_t(z^n)\}_{t=1}^n$,

$L_{\hat{X}^n}(x^n, z^n) = \frac{1}{n} \sum_{t=1}^{n} \Lambda\big(x_t, \hat{X}_t(z^n)\big)$

is the performance measure

Main results of DUDE : when $k = k_n < \lceil \tfrac{1}{2} \log_{|\mathcal{Z}|} n \rceil$,

1 For any stationary process $\mathbf{X}$,

$\lim_{n\to\infty} \Big[ E\big(L_{\hat{X}^n_{\mathrm{DUDE}}}(X^n, Z^n)\big) - \min_{\hat{X}^n \in \mathcal{D}_n} E\big(L_{\hat{X}^n}(X^n, Z^n)\big) \Big] = 0$

$\mathcal{D}_n$ is the set of all denoisers in the world
DUDE attains the Bayes optimal performance

2 For all $x \in \mathcal{X}^\infty$,

$\lim_{n\to\infty} \Big[ L_{\hat{X}^n_{\mathrm{DUDE}}}(x^n, Z^n) - D_k(x^n, Z^n) \Big] = 0 \quad \text{w.p.1}$

$D_k(x^n, z^n)$ : the best performance among $\mathcal{S}_k$, the k-th order sliding window denoisers
DUDE is as good as the best sliding window denoiser

Motivation

Ex 2 : piecewise stationary bit stream gets corrupted

$X^n$ : 00000011111110000000000111111101100011011011011010110
$Z^n$ : 00100011101110010001000111110101100011111011010010100

source : binary Markov chain with $p_1 = 0.01 \to p_2 = 0.2$ at $t^* = n/2$
[Figure: two-state Markov chain over {0, 1} with transition probability $p$]

noise : BSC($\delta = 0.1$)
[Figure: binary symmetric channel with crossover probability $\delta$]

⇒ optimal BER attained by the Forward-Backward Recursion

Does DUDE achieve the optimal BER?

[Figure: bit error rate (normalized by $\delta$) vs. window size $k$]
Bayes Optimum = 0.487
DUDE = 0.574 (+18%)

DUDE applies the same rule “regardless of the location”
DUDE has a limitation for time- (space-) varying sources

In practice, many sources are time- (space-) varying

text : English → Spanish → German . . .
voice : [audio waveform figure]
image : [image figure]

New algorithm: S-DUDE

Can we do better than the DUDE when the source varies?

Questions

1 Can we perform as if we knew the source including its change points?

2 If so, can we do it efficiently?

Answers

1 Yes. S-DUDE can do essentially as well as if it knows the source and its change points

2 Yes. S-DUDE is a linear complexity algorithm

[M and Weissman, IEEE Trans. Info. Theory, Nov 09]

Take a closer look at the binary example

Binary, BSC($\delta$). Suppose DUDE with window size k = 3 decided as follows :

$z_{t-3}^{t+3} = 0100110 \;\Rightarrow\; \hat{x}_t = 0$, and $z_{t-3}^{t+3} = 0101110 \;\Rightarrow\; \hat{x}_t = 1$

010 • 110 defined a “say-what-you-see” mapping in the middle
DUDE employs the same mapping whenever it sees 010 • 110

Only 4 single-letter mappings in the binary example :
“say-what-you-see”, “flip-what-you-see”, “always-say-0”, “always-say-1”

DUDE counts $n_0$ and $n_1$ for 010 • 110 and
  if $n_0 \approx n_1$ → “say-what-you-see”
  if $n_0 \gg n_1$ → “always-say-0”
  if $n_0 \ll n_1$ → “always-say-1”
where the threshold depends on $\delta$
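As a sanity check on the δ-dependent threshold, here is a short derivation (mine, not shown on the slide) for the BSC(δ) / Hamming-loss case, with $n_0, n_1$ the counts at the context and $z_t = 0$:

```latex
% Channel-inverted estimates of the clean-symbol counts at the context:
\[
\hat{n}_0 \;\propto\; (1-\delta)\,n_0 - \delta\,n_1,
\qquad
\hat{n}_1 \;\propto\; (1-\delta)\,n_1 - \delta\,n_0 .
\]
% With z_t = 0, the expected Hamming losses of the two candidate outputs are
% proportional to
\[
\hat{x}_t = 0:\;\; \hat{n}_1\,\delta,
\qquad
\hat{x}_t = 1:\;\; \hat{n}_0\,(1-\delta),
\]
% so DUDE outputs 1 (flips z_t) exactly when
\[
n_0 \;<\; \frac{2\,\delta(1-\delta)}{\delta^{2} + (1-\delta)^{2}}\; n_1 .
\]
```

The case $z_t = 1$ is symmetric, which recovers the qualitative rule above: say what you see when $n_0 \approx n_1$, and switch to always-say-0 or always-say-1 when one count dominates.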

Employing shifting single-letter mappings will be helpful

Suppose the 0’s and 1’s at 010 • 110 looked like

$\underbrace{00001000110000}_{\text{all-0}}\ \underbrace{11111111011101}_{\text{all-1}}$

“always-say-0” → “always-say-1” may be better than a fixed “say-what-you-see”

Generally, if single-letter mappings have some freedom to shift, they can attain smaller loss

How can we decide when to shift to what?

$\mathcal{S}_m^n$ is a class of shifting single-letter mappings

Ideally, shifting every time to the correct mapping would be the best
  equivalent to knowing the source sequence ⇒ impossible!

We limit the number of shifts to m

$\mathcal{S}_m^n$ : class of single-letter mapping sequences $\{s_1, \cdots, s_n\}$ shifting at most m times for sequence length n
[Figure: a mapping sequence along $z^n$ shifting among swys, all-0, all-1]

$|\mathcal{S}_m^n| \le \binom{n}{m} \cdot |\mathcal{S}|^m$, where $|\mathcal{S}| = |\mathcal{X}|^{|\mathcal{Z}|}$ (the number of single-letter mappings)

Deciding when to shift to what m times ⇔ selecting the best combination in $\mathcal{S}_m^n$

The key tool is to devise an estimate of the loss Λ

Focus on the single-letter setting ($s(\cdot) : \mathcal{Z} \to \mathcal{X}$)

$x \;\to\; \Pi \;\to\; Z$, and $\hat{X} = s(Z)$

$\Lambda(x, s(Z))$ : loss between x and s(Z), not observable

But, from the knowledge of Π, we can devise $\ell(Z, s)$ such that

$E_x\big(\ell(Z, s)\big) = E_x\big(\Lambda(x, s(Z))\big)$

$\ell(Z, s)$ is an unbiased estimate of $E_x\big(\Lambda(x, s(Z))\big)$

$\ell(Z, s)$ : loss between Z and $s(\cdot)$, observable

[Weissman et al., Universal filtering via prediction, IEEE IT 07]
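One standard way to obtain such an unbiased estimate, consistent with the channel-inversion idea behind DUDE, is to solve $\Pi\,\ell(\cdot, s) = h_s$, where $h_s(x) = \sum_z \Pi(x,z)\Lambda(x, s(z))$ is the conditional expected loss. The sketch below is my own rendering of this construction for a square, invertible Π; the exact formulation in the cited paper may differ in details.

```python
import numpy as np

def estimated_loss(Pi, Lambda, s):
    """Return the vector ell(., s) with E_x[ell(Z, s)] = E_x[Lambda(x, s(Z))].

    Pi     : |X| x |Z| channel matrix, Pi[x, z] = P(Z = z | X = x) (square, invertible)
    Lambda : |X| x |Xhat| loss matrix
    s      : array of length |Z|, s[z] = reconstruction chosen by the mapping for z
    """
    # h[x] = sum_z Pi[x, z] * Lambda[x, s(z)]  (expected loss given clean symbol x)
    h = np.einsum("xz,xz->x", Pi, Lambda[:, s])
    # Choose ell so that Pi @ ell = h, i.e. unbiasedness holds for every clean x.
    return np.linalg.solve(Pi, h)

# Example: BSC(0.1), Hamming loss, the four binary single-letter mappings.
delta = 0.1
Pi = np.array([[1 - delta, delta], [delta, 1 - delta]])
Lam = np.array([[0.0, 1.0], [1.0, 0.0]])
mappings = {"all-0": [0, 0], "all-1": [1, 1], "swys": [0, 1], "fwys": [1, 0]}
for name, s in mappings.items():
    print(name, estimated_loss(Pi, Lam, np.array(s)))
```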

S-DUDE is defined by minimizing the sum of the estimated losses

For each context c (e.g., 010 • 110), S-DUDE finds

$\hat{S} \triangleq \arg\min_{S \in \mathcal{S}_m^{n_c}} \sum_{i \,\in\, \text{context } c} \ell(z_i, s_i)$

vs. $\arg\min_{S \in \mathcal{S}_m^{n_c}} \sum_{i \,\in\, \text{context } c} \Lambda(x_i, s_i(z_i))$, which would require knowing $x^n$

and applies them

Question : how can we get $\hat{S} = \{s_1, \cdots, s_{n_c}\} \in \mathcal{S}_m^{n_c}$ efficiently?

S-DUDE can be implemented with a two-pass algorithm

Again binary, BSC($\delta$) example

Problem : find the best $\{s_1, \cdots, s_n\} \in \mathcal{S}_m^n$ that minimizes $\sum_{t=1}^n \ell(z_t, s_t)$, with $s_i \in$ {all-0, all-1, swys, fwys}

To solve :

1 allocate $M_t \in \mathbb{R}^{m \times 4}$ for each $1 \le t \le n$

2 first pass : scan $(z_1, \cdots, z_n)$ and update $\{M_t\}_{t=1}^n$ by dynamic programming

3 second pass : from $M_n$, extract the best $\{s_1, \cdots, s_n\}$ by a backward recursion

$M_t$ stores the minimum sum of estimated losses up to t

Again binary, BSC($\delta$) example. Problem : find the best $\{s_1, \cdots, s_n\} \in \mathcal{S}_m^n$ that minimizes $\sum_{t=1}^n \ell(z_t, s_t)$, with $s_i \in$ {all-0, all-1, swys, fwys}

Elements of $M_t$ (rows indexed by the number of shifts $i \le m$, columns by the current mapping all-0, all-1, swys, fwys) are defined to be the minimum sum up to t, e.g.,

$M_t(i, \text{swys}) = \min_{\{s_1,\cdots,s_t\} \in \mathcal{S}_i^t} \Big\{ \ell(z_t, s_t = \text{swys}) + \sum_{r=1}^{t-1} \ell(z_r, s_r) \Big\}$

First pass uses dynamic programming

Only two possible cases to attain $M_t(i, \text{swys})$ :

1 the i-th shift occurred at t : $\min_{1 \le j \le |\mathcal{S}|} M_{t-1}(i-1, j) + \ell(z_t, \text{swys})$

2 the i-th shift occurred before t : $M_{t-1}(i, \text{swys}) + \ell(z_t, \text{swys})$

Hence

$M_t(i, \text{swys}) = \ell(z_t, \text{swys}) + \min\Big\{ M_{t-1}(i, \text{swys}),\ \min_{1 \le j \le |\mathcal{S}|} M_{t-1}(i-1, j) \Big\}$

and the same recursion holds for all other elements

Second pass extracts $\hat{S}$ and denoises

When t = n :

$\hat{s}_n = \arg\min_{j \in \{\text{all-0}, \text{all-1}, \text{swys}, \text{fwys}\}} M_n(m, j)$, and $\hat{x}_n = \hat{s}_n(z_n)$

(the minimum entry of the last row of $M_n$ equals $\min_{S \in \mathcal{S}_m^n} \sum_{t=1}^n \ell(z_t, s_t)$)

For $t = n-1, \cdots, 1$ : follow the optimal path backward and denoise!
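A compact sketch of the two-pass procedure for a single context stream (my own code, not the lecture's): the table M is indexed by shifts-used and current mapping as on the slides, and the estimated losses ell[t, j] are taken as given, e.g. produced by the estimator sketched earlier.

```python
import numpy as np

def best_shifting_mappings(ell, m):
    """Two-pass search for the mapping sequence {s_1..s_n} with at most m
    shifts that minimizes sum_t ell[t, s_t] (forward DP + backward trace).

    ell : (n, S) array, ell[t, j] = estimated loss of single-letter mapping j at time t
    m   : maximum number of shifts
    Returns an array of length n with the chosen mapping index at each time.
    """
    n, S = ell.shape
    # M[t, i, j]: min cumulative estimated loss over z_1..z_t using at most i
    # shifts and with s_t = j (rows i, columns j, as in the M_t table).
    M = np.full((n, m + 1, S), np.inf)
    M[0, :, :] = ell[0]                       # any initial mapping, no shift yet

    # First pass: forward dynamic programming.
    for t in range(1, n):
        stay = M[t - 1]                                           # shift happened before t
        switch = np.full((m + 1, S), np.inf)
        switch[1:] = M[t - 1, :-1].min(axis=1, keepdims=True)     # shifting at t uses one more
        M[t] = ell[t] + np.minimum(stay, switch)

    # Second pass: backtrace from the best entry of the last row of M_n.
    s = np.empty(n, dtype=int)
    i, j = m, int(M[n - 1, m].argmin())
    s[-1] = j
    for t in range(n - 1, 0, -1):
        stay = M[t - 1, i, j]
        prev_best = M[t - 1, i - 1].min() if i > 0 else np.inf
        if stay > prev_best:                    # a shift occurred at time t
            i -= 1
            j = int(M[t - 1, i].argmin())       # mapping in force before the shift
        s[t - 1] = j
    return s
```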

The complexity of S-DUDE is linear in n and m

Complexity
  space : $O(mn|\mathcal{Z}|^{2k})$
  time : $O(mn|\mathcal{Z}|^{2k})$
  practical

Summary of S-DUDE

S-DUDE (Shifting DUDE)

For location t to be denoised, do :

1 fix the window size k, set the number of shifts m

2 find the left k-context $(\ell_1, \ldots, \ell_k)$ and right k-context $(r_1, \ldots, r_k)$ of $z_t$ :

$\ell_1\, \ell_2 \cdots \ell_k \;\; z_t \;\; r_1\, r_2 \cdots r_k$

3 on all positions that share the same context c with $z_t$, find

$\hat{S} = \arg\min_{S \in \mathcal{S}_m^{n_c}} \sum_{t \,\in\, \text{context } c} \ell(z_t, s_t)$

4 decide on $\hat{x}_t$ according to

$\hat{x}_t = \hat{s}_t(z_t)$, where $\hat{s}_t(\cdot)$ comes from $\hat{S}$

We can also show that if we set m = 0, S-DUDE coincides with DUDE
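To assemble the full denoiser from the per-context searches, the positions of $z^n$ are first grouped by their two-sided k-context. Below is a minimal sketch of that bookkeeping step (the names and the random test sequence are mine); any per-context solver, such as the dynamic program sketched above, would then be run on each group.

```python
from collections import defaultdict
import numpy as np

def group_by_context(z, k):
    """Group positions k <= t < n-k of z by their two-sided k-context."""
    groups = defaultdict(list)
    for t in range(k, len(z) - k):
        context = (tuple(z[t - k:t]), tuple(z[t + 1:t + k + 1]))
        groups[context].append(t)
    return groups

rng = np.random.default_rng(1)
z = rng.integers(0, 2, size=10_000)
groups = group_by_context(z, k=2)
print(len(groups), "distinct contexts;",
      max(len(v) for v in groups.values()), "positions in the largest one")
# For each group, S-DUDE would run the shifting-mapping search on the
# estimated losses of the symbols at those positions (in order), then apply
# the selected mapping s_t to z_t at each position t.
```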

Results

S-DUDE achieves the optimum loss for time- (space-) varying sources

When $k = k_n < \tfrac{1}{2} \log_{|\mathcal{Z}|} n$,

Theorem 1 (stochastic setting)

For all piecewise stationary processes $\mathbf{X}$,

$\lim_{n\to\infty} \Big[ E\big(L_{\hat{X}^n_{\text{S-DUDE}}}(X^n, Z^n)\big) - \min_{\hat{X}^n \in \mathcal{D}_n} E\big(L_{\hat{X}^n}(X^n, Z^n)\big) \Big] = 0,$

provided that the number of stationary segments is m = o(n) w.p.1

Theorem 2 (individual sequence setting)

When m = o(n), for all $x \in \mathcal{X}^\infty$,

$\lim_{n\to\infty} \Big[ L_{\hat{X}^n_{\text{S-DUDE}}}(x^n, Z^n) - D_{k,m}(x^n, Z^n) \Big] = 0 \quad \text{w.p.1}$

where $D_{k,m}(x^n, z^n)$ is the best performance attained by k-th order sliding window denoisers that can shift at most m times

No denoiser is better than S-DUDE

Strong converse

If m = Θ(n), no denoiser can achieve the guarantees of the previous theorems.

m = o(n) is a necessary and sufficient condition for the previous theorems!

Ex 2 : piecewise stationary bit stream (revisited)

$X^n$ : 00000011111110000000000111111111100000001111111110000
$Z^n$ : 00100011101110010001000111110111100000011110111110001

source : binary Markov chain with $p_1 = 0.01 \to p_2 = 0.2$ at $t^* = n/2$
[Figure: two-state Markov chain over {0, 1} with transition probability $p$]

noise : flips bits with probability $\delta = 0.1$
[Figure: binary symmetric channel with crossover probability $\delta$]

⇒ optimal BER attained by the Forward-Backward Recursion

Can S-DUDE achieve the Bayes optimal performance?

[Figure: bit error rate (normalized by $\delta$) vs. window size $k$]
Bayes Optimum = 0.487
DUDE = 0.574
S-DUDE (m=1) = 0.498 (+2.3%)

⇒ m can be regarded as another design parameter in devising a discrete denoiser