DIC12 3. Arithmetic coding - mat-web.upc.edu · DIC12 . 3. Arithmetic coding . 26.9.2012 SXD . 3.1....

DIC12
3. Arithmetic coding
26.9.2012 SXD

3.1. Segment coding
3.2. Binary representation of segments
3.3. Arithmetic coding
3.4. Arithmetic decoding
3.5. The scaling technique


3.1. Segment coding

Consider a source alphabet 𝒜 = {𝑎1, … ,𝑎𝑛} with probabilities {𝑝1, … ,𝑝𝑛}. Let 𝑀 = 𝑎𝑗1 ⋯𝑎𝑗𝑁 be a source message of length 𝑁. In segment coding (the first step in what will be arithmetic coding) a segment (or interval)

𝑆𝑀 = [𝑙𝑀, ℎ𝑀) ⊂ [0,1)

is assigned to any such 𝑀. Moreover, this assignment has the following properties:

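1. ℎ𝑀 − 𝑙𝑀 = 𝑃(𝑀), where 𝑃(𝑀) = 𝑝𝑗1 ⋯ 𝑝𝑗𝑁 is the probability of 𝑀.

2. 𝑆𝑀 ∩ 𝑆𝑀′ = ∅ for any pair of distinct messages 𝑀,𝑀′ of length 𝑁.

3. ⋃𝑀 𝑆𝑀 = [0,1), where the union runs over all messages 𝑀 of length 𝑁.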

4. If 𝑀 is a prefix of 𝑀′, then 𝑆𝑀′ ⊂ 𝑆𝑀.

So the length of 𝑆𝑀 is 𝑃(𝑀); the different 𝑆𝑀, for fixed 𝑁, form a partition of [0,1), and the partition for 𝑁′ > 𝑁 is a refinement of that for 𝑁.


The case 𝑁 = 1

For messages of length 1 (any one of the symbols 𝑎𝑗 of 𝒜), we set

𝑆𝑎𝑗 = [𝑝1 + ⋯+ 𝑝𝑗−1, 𝑝1 + ⋯+ 𝑝𝑗−1 + 𝑝𝑗) = [𝜎𝑗−1,𝜎𝑗),

where we define 𝜎0 = 0 and 𝜎𝑗 = 𝑝1 + ⋯+ 𝑝𝑗−1 + 𝑝𝑗 , 𝑗 = 1, … ,𝑛 (we say that (𝜎1, … ,𝜎𝑛) is the cumulative probability distribution). Thus we have

𝑙𝑎𝑗 = 𝜎𝑗−1, ℎ𝑎𝑗 = 𝜎𝑗, ℎ𝑎𝑗 − 𝑙𝑎𝑗 = 𝑝𝑗

From the definitions it follows that 0 < 𝑝1 = 𝜎1 < ⋯ < 𝜎𝑛 = 1 and hence that the segments 𝑆𝑎𝑗 cover [0,1) and are pairwise disjoint. See the left column of the illustration for an example in which 𝑛 = 3.

Example. Before considering the general construction, we first describe it in a simple example. We will see how to obtain the segment 𝑆𝑀 for the message 𝑏𝑎𝑏𝑐 produced by the source {𝑎 → 0.2, 𝑏 → 0.5, 𝑐 → 0.3}.


By our stipulation of the case 𝑁 = 1, we know that

𝑆𝑏 = [0.2, 0.7).

Let us proceed now to assign an interval to 𝑏𝑎 (second column of the illustration), then to 𝑏𝑎𝑏 (third column) and finally to 𝑏𝑎𝑏𝑐 (fourth column). The interval 𝑆𝑏𝑎 is obtained by subdividing

𝑆𝑏 = [𝑙𝑏 = 0.2,ℎ𝑏 = 0.7)

into segments of relative length 𝑝(𝑎) = 0.2, 𝑝(𝑏) = 0.5 and 𝑝(𝑐) = 0.3 and choosing the 𝑎-segment as 𝑆𝑏𝑎. So the division points are

𝑙𝑏𝑎 = 0.2, ℎ𝑏𝑎 = 𝑙𝑏𝑏 = 0.2 + 0.5 × 𝜎(𝑎) = 0.30,

ℎ𝑏𝑏 = 𝑙𝑏𝑐 = 0.2 + 0.5 × 𝜎(𝑏) = 0.55, ℎ𝑏𝑐 = 0.2 + 0.5 × 𝜎(𝑐) = 0.70

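and hence

𝑆𝑏𝑎 = [0.20, 0.30), 𝑆𝑏𝑏 = [0.30, 0.55), 𝑆𝑏𝑐 = [0.55, 0.70).

[Figure: four columns, each split into an 𝑎-, 𝑏- and 𝑐-segment. Column 1 divides [0, 1) at 0.2 and 0.7; column 2 divides 𝑆𝑏 = [0.2, 0.7) at 0.3 and 0.55; column 3 divides 𝑆𝑏𝑎 = [0.2, 0.3) at 0.22 and 0.27; column 4 divides 𝑆𝑏𝑎𝑏 = [0.22, 0.27) at 0.23 and 0.255. Caption: Example 𝒜 = {𝑎, 𝑏, 𝑐}, 𝑝 = (0.2, 0.5, 0.3), 𝑀 = 𝑏𝑎𝑏𝑐.]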

Now the interval 𝑆𝑏𝑎𝑏 is obtained in a similar way: subdivide 𝑆𝑏𝑎 into intervals that are proportional to 𝑝(𝑎), 𝑝(𝑏) and 𝑝(𝑐) and choose as 𝑆𝑏𝑎𝑏 the segment corresponding to 𝑏. Actually we have

ℎ𝑏𝑎𝑎 = 𝑙𝑏𝑎𝑏 = 0.2 + 0.1 × 𝜎(𝑎) = 0.22,

ℎ𝑏𝑎𝑏 = 𝑙𝑏𝑎𝑐 = 0.2 + 0.1 × 𝜎(𝑏) = 0.27,

ℎ𝑏𝑎𝑐 = 0.2 + 0.1 × 𝜎(𝑐) = 0.30,

and hence 𝑆𝑏𝑎𝑏 = [𝑙𝑏𝑎𝑏 ,ℎ𝑏𝑎𝑏) = [0.22,0.27).

Finally we get, following the same procedure with 𝑆𝑏𝑎𝑏,

𝑆𝑏𝑎𝑏𝑐 = [0.22 + 0.05 × 𝜎(𝑏), 0.22 + 0.05 × 𝜎(𝑐)) = [0.255,0.270).


The general case

Suppose that we already know the interval 𝑆𝑀 = [𝑙𝑀,ℎ𝑀) of a message 𝑀 of length 𝑁 and that ℎ𝑀 − 𝑙𝑀 = 𝑃(𝑀). Then the interval of 𝑀′ = 𝑀𝑎𝑗 is defined as follows:

𝑆𝑀′ = [𝑙𝑀′ ,ℎ𝑀′) = [𝑙𝑀 + 𝑃(𝑀) × 𝜎𝑗−1, 𝑙𝑀 + 𝑃(𝑀) × 𝜎𝑗).

We note that

ℎ𝑀′ − 𝑙𝑀′ = 𝑃(𝑀) × (𝜎𝑗 − 𝜎𝑗−1) = 𝑃(𝑀)𝑝𝑗 = 𝑃(𝑀′).

Remark. The conditions 1-4 at the beginning are a direct consequence of the definitions.


Computations

cumulative(S):=
begin
  local Z={}, z=0
  for s in S do
    z=z+s.1; Z=Z|{s.2->z}
  end
  Z
end

S={{0.25,a},{0.4,b},{0.15,c},{0.1,d},{0.1,e}};
cumulative(S)
{a→0.25, b→0.65, c→0.8, d→0.9, e→1.0}

This function and the next have been added to CC. See the example AE.cc.


# Interval encoding of a message M (given as a list) emitted by source S
IE(M,S):=
begin
  local P, Z, l=0, h=1, u
  P={s.2->s.1 with s in S}
  Z=cumulative(S)
  for x in M do
    u=h-l
    h=l+Z(x)*u; l=h-P(x)*u
  end
  {l,h}
end;

S={{0.2,a},{0.5,b},{0.3,c}}; M={b,a,b,c}
IE(M,S)
{0.255,0.27}
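For readers without access to CC, the same interval computation can be sketched in Python. This is only an illustration under stated assumptions: the name interval_encode and the representation of the source as (symbol, probability) pairs are hypothetical choices, and exact rationals from the standard fractions module replace the floating-point numbers of the CC session.

```python
from fractions import Fraction

def interval_encode(message, source):
    """Return (low, high) of the segment S_M assigned to `message`.

    `source` is a list of (symbol, probability) pairs (a hypothetical
    layout); their order fixes the cumulative distribution.
    """
    lower_cum, prob = {}, {}
    acc = Fraction(0)
    for sym, p in source:
        lower_cum[sym] = acc          # sigma_{j-1}
        prob[sym] = Fraction(p)       # p_j
        acc += prob[sym]
    low, width = Fraction(0), Fraction(1)
    for sym in message:
        low += width * lower_cum[sym]  # l_M' = l_M + P(M) * sigma_{j-1}
        width *= prob[sym]             # P(M') = P(M) * p_j
    return low, low + width

source = [("a", "0.2"), ("b", "0.5"), ("c", "0.3")]
low, high = interval_encode("babc", source)
print(float(low), float(high))  # 0.255 0.27
```

Probabilities are passed as strings so that Fraction stores 0.2 as exactly 1/5; the result agrees with the segment [0.255, 0.27) obtained above.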


3.2. Binary representation of segments

Binary representation of numbers in the unit segment

Examples. 0.5→0.1, 0.25→0.01, 0.125→0.001, 0.75→0.11;

0.255→0.010000…, 0.270→0.010001… (truncated to 6 bits);

0.011001→1/4+1/8+1/64=25/64=0.390625.

[Figure: The binary unit segment. The interval [0,1) is halved repeatedly, each half labelled 0 or 1 at every level. The points 0.255 and 0.270 are marked with their 6-bit truncations 0.010000 and 0.010001, together with the point 0.011001.]

Computations

# binary expression of a real number in [0,1)
binary(x: Real, bitprecision: Integer) check (bitprecision>=0) :=
begin
  local e=floor(x), b, r=[], j=0
  x=x-e
  e=reverse(base_change(e,2))
  while (j<bitprecision) do
    j=j+1
    x=2*x
    b=floor(x)
    if b>=1 then x=x-b; r=r|[1] else r=r|[0] end
  end
  {e,r}
end;

binary(x:Real):=binary(x,6);


binary(0.255) {[0],[0,1,0,0,0,0]}

binary(0.270) {[0],[0,1,0,0,0,1]}

binary(0.255,8) {[0],[0,1,0,0,0,0,0,1]}

binary(0.270,8) {[0],[0,1,0,0,0,1,0,1]}
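The doubling procedure used by binary can be sketched in Python as follows. The function name binary_digits is a hypothetical choice, only the fractional part is handled, and exact rationals are used so that the digits are not affected by floating-point rounding.

```python
from fractions import Fraction

def binary_digits(x, bits=6):
    """First `bits` binary digits of the fractional part of x."""
    frac = Fraction(x) % 1           # drop the integer part
    digits = []
    for _ in range(bits):
        frac *= 2                    # shift one binary place to the left
        bit = int(frac >= 1)         # the digit that crossed the point
        digits.append(bit)
        frac -= bit
    return digits

print(binary_digits("0.255", 8))  # [0, 1, 0, 0, 0, 0, 0, 1]
print(binary_digits("0.270", 8))  # [0, 1, 0, 0, 0, 1, 0, 1]
```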


3.3. Arithmetic coding

The basic idea of arithmetic coding is to select an element in the segment 𝑆𝑀 = [𝑙𝑀 ,ℎ𝑀) that requires the minimal number of bits. Then we encode 𝑀 using the binary word formed with those bits.

This can be accomplished as follows. Suppose that the first bit that is different in the binary representations of 𝑙𝑀 and ℎ𝑀 is the 𝑟-th, so that we will have

𝑙𝑀 = 0. 𝑏1𝑏2 ··· 𝑏𝑟−1 0 ···, ℎ𝑀 = 0. 𝑏1𝑏2 ··· 𝑏𝑟−1 1 ···

Then the number 0. 𝑏1𝑏2 ··· 𝑏𝑟−1 1, or the word 𝑏1𝑏2 ··· 𝑏𝑟−1 1, satisfies the requirements, since any other number in the interval will require more bits. Thus we encode 𝑀 as 𝑏1𝑏2 ··· 𝑏𝑟−1 1.


Computations

AE(a:Real, b:Real) check a<b :=
begin
  local r=1, b1, b2, equal=true
  a=binary(a,32).2; b=binary(b,32).2
  while (equal) do
    b1=a.r; b2=b.r
    if (b1==b2) then r=r+1; continue
    else (equal=false)
    end
  end
  take(b,r)
end;

AE(0.255,0.270)

[0,1,0,0,0,1]
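The rule "common prefix of 𝑙𝑀 and ℎ𝑀, followed by 1" can be sketched in Python. The name arithmetic_codeword and the max_bits guard are hypothetical; the sketch assumes 0 ≤ low < high ≤ 1 and, like the rule itself, does not treat specially the boundary case in which the resulting number coincides with the upper end of the interval.

```python
from fractions import Fraction

def arithmetic_codeword(low, high, max_bits=64):
    """Shared leading bits of low and high, followed by a final 1."""
    low, high = Fraction(low), Fraction(high)
    bits = []
    for _ in range(max_bits):
        low *= 2
        high *= 2
        bit_low, bit_high = int(low >= 1), int(high >= 1)
        low -= bit_low
        high -= bit_high
        if bit_low != bit_high:      # first position where they differ
            return bits + [1]
        bits.append(bit_low)
    raise ValueError("no differing bit within max_bits")

print(arithmetic_codeword("0.255", "0.270"))  # [0, 1, 0, 0, 0, 1]
```

The word 010001 represents 1/4 + 1/64 = 0.265625, which indeed lies in [0.255, 0.270).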


AE(M: List, S: List) :=
begin
  local I
  I=IE(M,S)
  {length(M), AE(I.1,I.2)}
end;

# Example I
S={{0.2,a},{0.5,b},{0.3,c}}; M={b,a,b,c} #
AE(M,S) # {4, [0,1,0,0,0,1]}

# Example II
S={{0.25,a},{0.4,b},{0.15,c},{0.1,d},{0.1,e}}; M={b,b,a,e,e,d,e,a} #
IE(M,S) #
AE(M,S) # {8, [0,1,1,0,0,0,1,1,1,1,0,1,0,1,0,0,0,0,1,1]}


3.4. Arithmetic decoding

In this case the quickest approach is to comment on the simplest implementation.

AD(C,S):=
begin
  local P, Z, l=0, h=1, u, M={}, N=C.1, x
  P={s.2->s.1 with s in S}
  Z=cumulative(S)
  C=C.2
  # real number in [0,1) whose binary digits are the code bits
  C=sum({C.j/2^j with j in range(C)})*1.0
  for j in 1..N do
    u=h-l
    # symbol whose subsegment of [l,h) contains C
    x=get_symbol(C,l,h,Z)
    M=M|{x}
    h=l+Z(x)*u; l=h-P(x)*u
  end
  M
end;

get_symbol(x,a,b,Z):=
  for z in Z do
    if a+z.2*(b-a)<=x then continue else return z.1 end
  end;
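The decoding loop can be sketched in Python under the same conventions as AD: the code is a pair (number of symbols, bit list), and at each step the symbol chosen is the first one whose subsegment has its upper end above the code value. The name arithmetic_decode and the (symbol, probability) layout of the source are hypothetical.

```python
from fractions import Fraction

def arithmetic_decode(n_symbols, bits, source):
    """Recover n_symbols symbols from the code bits."""
    # Value in [0,1) whose binary digits are the code bits.
    value = sum(Fraction(b, 2 ** (i + 1)) for i, b in enumerate(bits))
    low, width = Fraction(0), Fraction(1)
    message = []
    for _ in range(n_symbols):
        upper = Fraction(0)                      # running sigma_j
        for sym, p in source:
            lower = upper                        # sigma_{j-1}
            upper += Fraction(p)
            if value < low + width * upper:      # value lies in this subsegment
                message.append(sym)
                low, width = low + width * lower, width * Fraction(p)
                break
    return message

source = [("a", "0.2"), ("b", "0.5"), ("c", "0.3")]
print(arithmetic_decode(4, [0, 1, 0, 0, 0, 1], source))  # ['b', 'a', 'b', 'c']
```

The message length has to be transmitted together with the bits, because the code value alone lies in the segment of every extension of the message's prefixes.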

In order to experiment with the decoder, it is convenient to be able to generate, given a source 𝑆, messages of any length and having a distribution ruled by the symbol probabilities. This can be accomplished with the following auxiliary functions.

urn(S,N):=
begin
  local U={}, A, P
  P={ceiling(N*s.1) with s in S}
  A={s.2 with s in S}
  for j in range(S) do
    U=U|constant_list(P.j,A.j)
  end
  U
end;

The function urn(S,N) constructs a list U of length approximately N (each count is rounded up) with the source symbols, in such a way that the number of occurrences of each symbol is determined by its probability in the source.

source_simulator(S,n):=
begin
  local M={}, N=100*length(S), U=urn(S,N), r=length(U)
  for j in 1..n do
    M=M|{U.random(1,r)}
  end
  M
end;

This function produces a message of length n in such a way that the frequency of each source symbol is (randomly) ruled by its probability.
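A comparable generator can be sketched in Python. Note that this is a different technique from the urn construction: it samples each symbol directly with the given weights using random.choices from the standard library. The name simulate_source and the optional seed parameter are hypothetical.

```python
import random

def simulate_source(source, n, seed=None):
    """Draw n independent symbols with the probabilities given in `source`."""
    rng = random.Random(seed)                 # seed only for reproducibility
    symbols = [sym for sym, _ in source]
    weights = [p for _, p in source]
    return rng.choices(symbols, weights=weights, k=n)

source = [("a", 0.25), ("b", 0.4), ("c", 0.15), ("d", 0.1), ("e", 0.1)]
message = simulate_source(source, 12, seed=2012)
print(message)
```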


Example. Let us see an example of how it works.

S={{0.25,a},{0.4,b},{0.15,c},{0.1,d},{0.1,e}}; M=source_simulator(S,12) #

{b,a,d,b,b,d,c,b,a,b,e,a}

X=AE(M,S) #

{12,[0,1,0,1,0,1,0,1,1,0,1,1,1,0,1,1,0,1,1,1,0,0,1,0,1]}

AD(X,S)
{b,a,d,b,b,d,c,b,a,b,e,a}


3.5. The scaling technique to avoid overflows

Again, the simplest way is to explain the main ideas by looking at an implementation:

S={{0.25,a},{0.4,b},{0.15,c},{0.1,d},{0.1,e}};

M=source_simulator(S,12) #

#S={{0.4,a},{0.133333333333333,b},{0.133333333333333,d},

#{0.266666666666667,x},{0.066666666666667,´}};

#

#m={b,a,d,x,d,a,b,´} #

#c=AE(m,S) #

#AD(c,S) #


SAE(M: List, S: List) :=
begin
  local C=[], P, Z, l=0, h=1, Bl, bl, Bh, bh, BP=32
  P={s.2->s.1 with s in S}
  Z=cumulative(S)
  for x in M do
    u=h-l; h=l+Z(x)*u; l=h-P(x)*u
    Bl=binary(l,BP).2; Bh=binary(h,BP).2;
    # look for the first bit in which l and h differ
    for j in 1..BP do
      bl=Bl.j; bh=Bh.j
      if (bl==bh) then show("continue j="|j); continue end
      # emit the j-1 common bits and rescale l and h with the remaining ones
      C=C|take(Bl,j-1)
      Bl=take(Bl,-(BP-j+1)); Bh=take(Bh,-(BP-j+1));
      l=sum{Bl.j/2^j with j in range(Bl)}*1.0
      h=sum{Bh.j/2^j with j in range(Bh)}*1.0
      break
    end
  end
  C|[1]
end;


Example

S={{0.25,a},{0.4,b},{0.15,c},{0.1,d},{0.1,e}}; M=source_simulator(S,30) #

{a,b,c,b,a,a,d,a,a,b,e,d,b,a,e,d,a,a,a,b,b,a,b,a,a,a,a,b,b,a}

SAE(M,S) #
{30,[0,0,1,0,0,0,0,1,1,0,1,0,1,1,0,1,0,1,0,1,1,1,0,0,0,0,0,1,0,0,0,0,0,0,1,0,1,0,0,0,1,0,0,1,1,1,0,1,0,1,0,1,1,1,0,0,1,0,0,0,1]}
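The idea of emitting the bits that 𝑙 and ℎ already share and then rescaling can be sketched in Python. This is a simplified variant, not a transcription of SAE: it works with exact rationals instead of 32-bit expansions, it rescales whenever the whole interval lies in one half of [0,1), and the name scaled_encode is hypothetical. A practical coder would use fixed-width integers and would also have to handle intervals that stay around 1/2 for a long time, which is not addressed here.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def scaled_encode(message, source):
    """Encode `message`, emitting each bit as soon as it is determined."""
    cum, prob, acc = {}, {}, Fraction(0)
    for sym, p in source:
        cum[sym], prob[sym] = acc, Fraction(p)   # sigma_{j-1} and p_j
        acc += prob[sym]
    low, high, bits = Fraction(0), Fraction(1), []
    for sym in message:
        width = high - low
        high = low + width * (cum[sym] + prob[sym])
        low = low + width * cum[sym]
        # While [low, high) lies in one half, its next bit is known:
        # emit it and stretch that half back onto [0, 1).
        while high <= HALF or low >= HALF:
            bit = int(low >= HALF)
            bits.append(bit)
            low, high = 2 * low - bit, 2 * high - bit
    return bits + [1]      # 1/2 lies inside the final rescaled interval

source = [("a", "0.2"), ("b", "0.5"), ("c", "0.3")]
print(scaled_encode("babc", source))  # [0, 1, 0, 0, 0, 1]
```

For the message 𝑏𝑎𝑏𝑐 the result coincides with the word 010001 obtained without scaling, while the numbers handled at each step never need more precision than the current rescaled interval.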
