062 Information Theory: Arithmetic Coding

INFORMATION THEORY. Željko Jeričević, PhD, Department of Computer Engineering, Faculty of Engineering & Department of Biology and Medical Genetics, Faculty of Medicine, 51000 Rijeka, Croatia. Phone: (+385) 51-651 594 E-mail: [email protected] http://www.riteh.uniri.hr/~zeljkoj/Zeljko_Jericevic.html



  • 10 February 2012 [email protected]

    Information theory

    From the material covered so far we know that information has to be prepared before it is sent through a channel. This is done by transforming the information into a form whose entropy is close to the maximum, which brings the transmission efficiency close to the maximum. This can be achieved by lossless compression, e.g. by arithmetic coding. The second transformation concerns the reliability of transmission: the information is translated into a form in which automatic correction is possible for a certain type of error (e.g. Hamming coding).

  • Compression

  • Entropy coding: the Kraft inequality (in Huffman and Shannon-Fano coding)

    1.4.1 The Kraft inequality

    We shall prove the existence of efficient source codes by actually constructing some codes that are important in applications. However, getting to these results requires some intermediate steps.

    A binary variable-length source code is described as a mapping from the source alphabet A to a set of finite strings, C, from the binary code alphabet, which we always denote {0, 1}. Since we allow the strings in the code to have different lengths, it is important that we can carry out the reverse mapping in a unique way. A simple way of ensuring this property is to use a prefix code, a set of strings chosen in such a way that no string is also the beginning (prefix) of another string. Thus, when the current string belongs to C, we know that we have reached the end, and we can start processing the following symbols as a new code string. In Example 1.5 an example of a simple prefix code is given.

    If $c_i$ is a string in C and $l(c_i)$ its length in binary symbols, the expected length of the source code per source symbol is

    $$L(C) = \sum_{i=1}^{N} P(c_i)\, l(c_i).$$

    If the set of lengths of the code is $\{l(c_i)\}$, any prefix code must satisfy the following important condition, known as the Kraft inequality:

    $$\sum_i 2^{-l(c_i)} \le 1. \tag{1.10}$$
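    The Kraft sum and the prefix condition are easy to check mechanically. The Python sketch below is only an illustration (the names `kraft_sum` and `is_prefix_code` are hypothetical): it computes $\sum_i 2^{-l(c_i)}$ exactly with fractions and tests the prefix condition by brute force. The second code satisfies the inequality but is not a prefix code, because (1.10) constrains only the lengths; the third violates it, so no prefix code with lengths {1, 1, 2} exists.

```python
# Illustrative sketch: kraft_sum and is_prefix_code are hypothetical helper names.
from fractions import Fraction


def kraft_sum(lengths):
    """Exact value of sum_i 2^(-l(c_i))."""
    return sum(Fraction(1, 2 ** l) for l in lengths)


def is_prefix_code(codewords):
    """True if no codeword is the beginning (prefix) of another codeword."""
    for i, a in enumerate(codewords):
        for j, b in enumerate(codewords):
            if i != j and b.startswith(a):
                return False
    return True


for code in (["0", "10", "110", "111"], ["0", "01", "11"], ["0", "1", "10"]):
    s = kraft_sum(len(c) for c in code)
    print(code, "Kraft sum =", s, "| prefix code:", is_prefix_code(code))
```

    The three lines report Kraft sums 1, 1 and 5/4, and only the first code is prefix-free.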

  • Entropy coding: the Kraft inequality

    1.4.1 The Kraft inequality

    The code can be described as a binary tree: starting from the root, two branches are labelled 0 and 1, and each node is either a leaf that corresponds to the end of a string, or a node that can be assumed to have two continuing branches. Let $l_m$ be the maximal length of a string. If a string has length $l(c)$, it follows from the prefix condition that none of the $2^{l_m - l(c)}$ extensions of this string are in the code. Also, two extensions of different code strings are never equal, since this would violate the prefix condition. Thus by summing over all codewords we get

    $$\sum_i 2^{l_m - l(c_i)} \le 2^{l_m}$$

    and the inequality follows. It may further be proven that any uniquely decodable code must satisfy (1.10) and that if this is the case there exists a prefix code with the same set of code lengths. Thus restriction to prefix codes imposes no loss in coding performance.
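    The converse, that any lengths satisfying (1.10) can be realized by a prefix code, can be sketched by walking the code tree from the shortest length to the longest. The assignment below is the canonical one used in many Huffman coders, chosen here as an illustration rather than taken from the quoted text; `prefix_code_from_lengths` is a hypothetical name. Each codeword is the next unused node at its depth, and incrementing the counter skips the whole subtree of extensions below the leaf just used.

```python
# Illustrative sketch: prefix_code_from_lengths is a hypothetical helper name.
from fractions import Fraction


def prefix_code_from_lengths(lengths):
    """Build a prefix code with the given codeword lengths (canonical assignment).

    Raises ValueError if the lengths violate the Kraft inequality (1.10).
    Codewords are returned in the same order as `lengths`.
    """
    if sum(Fraction(1, 2 ** l) for l in lengths) > 1:
        raise ValueError("lengths violate the Kraft inequality")
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    codewords = [None] * len(lengths)
    value = 0                                   # next free node, as an integer
    prev_len = lengths[order[0]] if order else 0
    for i in order:
        l = lengths[i]
        value <<= l - prev_len                  # descend to depth l in the tree
        codewords[i] = format(value, f"0{l}b")  # l-digit binary string
        value += 1                              # skip the subtree below this leaf
        prev_len = l
    return codewords


print(prefix_code_from_lengths([3, 1, 3, 2]))   # ['110', '0', '111', '10']
```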

  • Entropy coding: the Kraft inequality

    1.4.1 The Kraft inequality

    Example 1.5 (A simple code). The code {0, 10, 110, 111} is a prefix code for an alphabet of four symbols. If the probability distribution of the source is (1/2, 1/4, 1/8, 1/8), the average length of the code strings is $1 \cdot \tfrac{1}{2} + 2 \cdot \tfrac{1}{4} + 3 \cdot \tfrac{1}{4} = \tfrac{7}{4}$ (the last term covers the two codewords of length 3, each with probability 1/8), which is also the entropy of the source.
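    The arithmetic of Example 1.5 can be checked in a few lines of Python (a sketch; the variable names are our own). Because every probability is a power of 1/2, each length equals $-\log_2 P(c_i)$ exactly, so the average length and the entropy coincide.

```python
# Illustrative check of Example 1.5 (variable names are our own, not from the text).
import math

probs = [1/2, 1/4, 1/8, 1/8]
code = ["0", "10", "110", "111"]

avg_len = sum(p * len(c) for p, c in zip(probs, code))   # L = sum_i P(c_i) l(c_i)
entropy = -sum(p * math.log2(p) for p in probs)          # H(X) in bits

print(avg_len, entropy)                                   # 1.75 1.75
```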

  • Entropy coding: the Kraft inequality

    1.4.1 The Kraft inequality

    If all the numbers $-\log P(c_i)$ were integers, we could choose these as the lengths $l(c_i)$. In this way the Kraft inequality would be satisfied with equality, and furthermore

    $$L = \sum_i P(c_i)\, l(c_i) = -\sum_i P(c_i) \log P(c_i) = H(X),$$

    and thus the expected code length would equal the entropy. Such a case is shown in Example 1.5. However, in general we have to select code strings that only approximate the optimal values. If we round $-\log P(c_i)$ to the nearest larger integer $\lceil -\log P(c_i) \rceil$, the lengths satisfy the Kraft inequality, and by summing we get an upper bound on the code lengths

    $$l(c_i) = \lceil -\log P(c_i) \rceil \le -\log P(c_i) + 1. \tag{1.11}$$

    The difference between the entropy and the average code length may be evaluated from

    $$H(X) - L = \sum_i P(c_i)\bigl(-\log P(c_i) - l_i\bigr) = \sum_i P(c_i) \log \frac{2^{-l_i}}{P(c_i)} \le \log \sum_i 2^{-l_i} \le 0,$$

    where the inequalities are those established by Jensen and Kraft, respectively. This gives

    $$H(X) \le L \le H(X) + 1, \tag{1.12}$$

    where the right-hand side is given by taking the average of (1.11). The loss due to the integer rounding may give a disappointing result when the coding is done on single source symbols. However, if we apply the result to strings of $N$ symbols, we find an expected code length of at most $NH + 1$, and the result per source symbol becomes at most $H + 1/N$. Thus, for sources with independent symbols, we can get an expected code length close to the entropy by encoding sufficiently long strings of source symbols.
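    The bound $H + 1/N$ can be observed numerically. The sketch below assumes a binary source with probabilities (0.9, 0.1), an example chosen here and not taken from the text; it assigns the rounded lengths of (1.11) to all $2^N$ blocks of $N$ independent symbols and reports the average length per source symbol. The helper names are hypothetical.

```python
# Illustrative sketch: helper names and the source distribution are assumptions.
import math
from itertools import product


def entropy(probs):
    """H(X) = -sum p log2 p, in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def shannon_lengths(probs):
    """Lengths l_i = ceil(-log2 P_i), as in eq. (1.11)."""
    return [math.ceil(-math.log2(p)) for p in probs]


def per_symbol_length(probs, n):
    """Average code length per source symbol when whole blocks of n symbols are coded."""
    blocks = [math.prod(t) for t in product(probs, repeat=n)]  # independent symbols
    lengths = shannon_lengths(blocks)
    assert sum(2.0 ** -l for l in lengths) <= 1.0              # Kraft inequality (1.10)
    return sum(q * l for q, l in zip(blocks, lengths)) / n


source = [0.9, 0.1]          # assumed example distribution
H = entropy(source)
for n in (1, 2, 4, 8):
    L = per_symbol_length(source, n)
    print(f"N={n}: H={H:.3f}  L/N={L:.3f}  H+1/N={H + 1 / n:.3f}")
```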

  • Arithmetic coding

    Suppose we want to send a message made up of 3 letters, A, B and C, that occur with equal probability. Using 2 bits per symbol is inefficient: one of the four bit combinations is never used. A better idea is to use real numbers between 0 and 1 written in base 3, where each digit represents one symbol. For example, the sequence ABBCAB becomes 0.011201 (with A=0, B=1, C=2).
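    The mapping from symbols to ternary digits is direct. A minimal sketch, assuming the slide's assignment A=0, B=1, C=2 (`to_base3` and `DIGIT` are names introduced here), returns both the digit string and the exact value of the fraction:

```python
# Illustrative sketch: to_base3 and DIGIT are hypothetical names.
from fractions import Fraction

DIGIT = {"A": 0, "B": 1, "C": 2}      # symbol-to-digit assignment from the slide


def to_base3(message):
    """Digit string 0.d1d2...dn (base 3) and its exact value as a fraction."""
    digits = [DIGIT[s] for s in message]
    value = sum(Fraction(d, 3 ** (k + 1)) for k, d in enumerate(digits))
    return "0." + "".join(map(str, digits)), value


print(to_base3("ABBCAB"))             # ('0.011201', Fraction(127, 729))
```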

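  • Arithmetic coding

    Converting the real number 0.011201 from base 3 to binary gives 0.001011001 (truncated to 9 bits). Using 2 bits per symbol requires 12 bits for the sequence ABBCAB, while the binary representation of 0.011201 (base 3) requires 9 bits, a saving of 25%. In general, however, six ternary digits carry $6 \log_2 3 \approx 9.51$ bits, and $2^9 = 512 < 3^6 = 729$, so 10 bits are needed to distinguish every possible six-symbol message (a saving of 2 of 12 bits, about 17%); for ABBCAB in particular 9 bits suffice because rounding the result back to six ternary digits recovers the sequence. The method relies on efficient in-place algorithms for converting from one base to another.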
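    The conversion and the bit count can be reproduced with exact fractions. In the sketch below (`frac_to_binary` is a hypothetical helper) the binary digits are produced by repeated doubling and truncated, as bc does; the last line computes the number of bits that guarantees unique recovery of any six-symbol ternary message.

```python
# Illustrative sketch: frac_to_binary is a hypothetical helper name.
import math
from fractions import Fraction


def frac_to_binary(x, nbits):
    """First nbits binary digits of 0 <= x < 1, truncated (not rounded)."""
    bits = []
    for _ in range(nbits):
        x *= 2
        bit = int(x)              # integer part is the next binary digit
        bits.append(str(bit))
        x -= bit
    return "0." + "".join(bits)


x = Fraction(127, 729)                 # 0.011201 in base 3, i.e. ABBCAB
print(frac_to_binary(x, 9))            # 0.001011001
print(math.ceil(6 * math.log2(3)))     # 10
```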

  • Fast conversion from one base to another

    The Linux/Unix bc program. Examples:

        echo "ibase=2; 0.1" | bc
        .5
        echo "ibase=3; 0.1000000" | bc
        .3333333
        echo "ibase=3; obase=2; 0.011201" | bc
        .00101100100110010001
        echo "ibase=2; obase=3; .001011001" | bc
        .0112002011101011210

    The last result, rounded to six ternary digits (the original length), is .011201.
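    The round trip in the last bc example can also be written out in Python. This is a sketch with hypothetical helper names: it reads the 9-bit binary fraction back exactly and rounds it to the nearest multiple of $3^{-6}$, which restores the original six ternary digits.

```python
# Illustrative sketch: binary_to_fraction and round_to_base3 are hypothetical names.
from fractions import Fraction


def binary_to_fraction(bits):
    """Exact value of a binary fraction written as '0.b1b2...'."""
    digits = bits.split(".")[1]
    return sum(Fraction(int(b), 2 ** (k + 1)) for k, b in enumerate(digits))


def round_to_base3(x, ndigits):
    """Round 0 <= x < 1 to the nearest multiple of 3**-ndigits and return its digits."""
    n = round(x * 3 ** ndigits)        # nearest integer; Fraction supports round()
    digits = []
    for _ in range(ndigits):
        n, d = divmod(n, 3)            # peel off ternary digits, least significant first
        digits.append(str(d))
    return "0." + "".join(reversed(digits))


x = binary_to_fraction("0.001011001")  # 89/512
print(round_to_base3(x, 6))            # 0.011201
```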

  • Arithmetic decoding

    With arithmetic coding we can get a result close to the optimum (the optimum is $-\log_2 p$ bits for each symbol of probability $p$). Example with four symbols, the arithmetic code 0.538 and the following probability distribution (D marks the end of the message):

        Symbol         A     B     C     D
        Probability    0.6   0.2   0.1   0.1

  • The arithmetic code of the sequence is 0.538 (ACD)

    First step: divide the initial interval [0, 1) into subintervals proportional to the probabilities. 0.538 falls in the first subinterval (symbol A).

        Symbol      A           B             C             D
        Interval    [0, 0.6)    [0.6, 0.8)    [0.8, 0.9)    [0.9, 1)
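    Each decoding step partitions the current interval in proportion to the symbol probabilities. The sketch below (`subintervals` is a hypothetical name) uses exact fractions so that boundaries such as 0.6 are not blurred by floating-point rounding; it prints the intervals of the first step.

```python
# Illustrative sketch: subintervals is a hypothetical helper name.
from fractions import Fraction as F

# Symbol probabilities from the table (D marks the end of the message)
PROBS = {"A": F(6, 10), "B": F(2, 10), "C": F(1, 10), "D": F(1, 10)}


def subintervals(low, high, probs=PROBS):
    """Split [low, high) into consecutive pieces proportional to the probabilities."""
    width = high - low
    pieces, start = {}, low
    for sym, p in probs.items():        # dicts keep insertion order: A, B, C, D
        pieces[sym] = (start, start + width * p)
        start += width * p
    return pieces


for sym, (a, b) in subintervals(F(0), F(1)).items():
    print(f"{sym}: [{float(a)}, {float(b)})")
```

    Calling `subintervals(F(0), F(3, 5))` gives the partition used in the second step.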

  • The arithmetic code of the sequence is 0.538 (ACD)

    Second step: divide the interval [0, 0.6) chosen in the first step into subintervals proportional to the probabilities. 0.538 falls in the third subinterval (symbol C).

        Symbol      A            B              C              D
        Interval    [0, 0.36)    [0.36, 0.48)   [0.48, 0.54)   [0.54, 0.6)

  • The arithmetic code of the sequence is 0.538 (ACD)

    Third step: divide the interval [0.48, 0.54) chosen in the second step into subintervals proportional to the probabilities. 0.538 falls in the fourth subinterval (symbol D, which is also the end-of-sequence symbol).

        Symbol      A                B                C                D
        Interval    [0.48, 0.516)    [0.516, 0.528)   [0.528, 0.534)   [0.534, 0.54)
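    The three steps are one loop: find the subinterval that contains the code value, output its symbol, make it the new interval, and stop at the end symbol D. A minimal sketch with exact fractions (`decode` and its `max_symbols` guard are hypothetical names introduced here):

```python
# Illustrative sketch: decode and max_symbols are hypothetical names.
from fractions import Fraction as F

SYMBOLS = ["A", "B", "C", "D"]
PROBS = [F(6, 10), F(2, 10), F(1, 10), F(1, 10)]
END = "D"                                   # end-of-message symbol


def decode(code, max_symbols=100):
    """Arithmetic decoding by repeated subdivision of [low, high)."""
    low, high = F(0), F(1)
    message = []
    for _ in range(max_symbols):
        width, start = high - low, low
        for sym, p in zip(SYMBOLS, PROBS):
            end = start + width * p
            if start <= code < end:         # the code value lies in this subinterval
                message.append(sym)
                low, high = start, end
                break
            start = end
        else:
            raise ValueError("code value is not in [0, 1)")
        print(message[-1], f"[{float(low)}, {float(high)})")
        if message[-1] == END:
            return "".join(message)
    raise ValueError("no end symbol within max_symbols steps")


print(decode(F("0.538")))                   # ACD
```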

  • The arithmetic code of the sequence is 0.538 (ACD)

    [Figure: graphical representation of arithmetic decoding]

  • The arithmetic code of the sequence is 0.538 (ACD)

    (Non)uniqueness: the same sequence could also have been represented as 0.534, 0.535, 0.536, 0.537 or 0.539. Using decimal instead of binary digits introduces inefficiency. The information content of three decimal digits is about 9.966 bits (why?). The same message can be coded in binary as 0.10001010, which corresponds to 0.5390625 in decimal and requires 8 bits.
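    Both numbers can be checked with a short search for the fewest binary digits that land inside the final interval [0.534, 0.54). The helper `shortest_binary_in` is a hypothetical illustration. It returns 0.1000101, which is the slide's 0.10001010 without the redundant trailing zero, so 7 bits already suffice; the last line evaluates $3 \log_2 10$.

```python
# Illustrative sketch: shortest_binary_in is a hypothetical helper name.
import math
from fractions import Fraction as F


def shortest_binary_in(low, high):
    """Binary fraction with the fewest digits k such that low <= m / 2**k < high.

    Assumes 0 <= low < high <= 1.
    """
    k = 0
    while True:
        k += 1
        m = math.ceil(low * 2 ** k)           # smallest k-digit fraction >= low
        if F(m, 2 ** k) < high:
            return "0." + format(m, f"0{k}b"), F(m, 2 ** k)


bits, value = shortest_binary_in(F("0.534"), F("0.54"))
print(bits, float(value))                     # 0.1000101 0.5390625
print(round(3 * math.log2(10), 3))            # 9.966
```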

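  • The arithmetic code of the sequence is 0.538 (ACD)

    8 bits is more than the actual entropy of the message ($\log_2 3 \approx 1.58$ bits per symbol, about 4.75 bits for the three symbols) because the message is short and the assumed distribution is wrong. If the actual distribution of the symbols in the message (A, C and D, each with probability 1/3) is taken into account, the message can be coded using the intervals [0, 1/3), [1/9, 2/9) and [5/27, 6/27), which in binary is the interval [0.001011110…, 0.001110001…). The result of coding is 0.00111: its significant digits are 111, but the two leading zeros must also be sent, so the code is 5 bits long.

    Correct message statistics are crucial for coding efficiency!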
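    The intervals for the actual statistics can be reproduced by running the encoder over "ACD" with three equiprobable symbols. This sketch (`encode_interval` is a hypothetical name) narrows [0, 1) one symbol at a time; the width of the final interval, 1/27, corresponds to $\log_2 27 \approx 4.75$ bits, and the last line checks that 0.00111 in binary (7/32) lies inside it. Note that 6/27 prints in lowest terms as 2/9.

```python
# Illustrative sketch: encode_interval is a hypothetical helper name.
from fractions import Fraction as F


def encode_interval(message, symbols, probs):
    """Final interval [low, high) after narrowing [0, 1) once per symbol."""
    low, high = F(0), F(1)
    for s in message:
        i = symbols.index(s)
        width = high - low
        low = low + width * sum(probs[:i])    # skip the symbols ordered before s
        high = low + width * probs[i]
    return low, high


low, high = encode_interval("ACD", ["A", "C", "D"], [F(1, 3)] * 3)
print(low, high)                              # 5/27 2/9
print(low <= F(7, 32) < high)                 # True: 0.00111 (binary) = 7/32
```

    With the four-symbol model (0.6, 0.2, 0.1, 0.1) the same function returns [0.534, 0.54), the interval found in the decoding example.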

  • Arithmetic coding: iterative decoding of a message

  • Arithmetic coding: iterative encoding of a message

  • Arithmetic coding: two symbols with probabilities of occurrence $p_x = 2/3$ and $p_y = 1/3$

  • Arithmetic coding: three symbols with probabilities of occurrence $p_x = 2/3$ and $p_y = 1/3$

  • Arithmetic coding
