PhD Dissertation
International Doctorate School in Information and
Communication Technologies
DIT - University of Trento
Design principles for embedded
multimedia bitstreams transmission over
wireless links
Cristina E. Costa
Advisor:
Prof. Francesco G.B. De Natale
Università degli Studi di Trento
Co-Advisor:
Prof. Aggelos Katsaggelos
Northwestern University
February 22, 2005
Abstract
Applications supported by new wireless communication systems are evolving from voice and/or pure data transmission to multimedia. These services are characterized by the transmission of large amounts of data under real-time constraints, which demand a significant increase in the complexity and capability of transmitting devices. Wireless networks impose severe limitations on multimedia transmission, due to channel losses and the variability of channel characteristics; combined with limited resources such as energy and computational power, this motivates the search for different approaches to multimedia data transmission.
Multimedia bitstreams have unique characteristics that can be exploited to advantage if taken into account during the design phase of the system. In order to increase transmission robustness and flexibility, new approaches to compression and coding have been studied.
Among them, progressive coding is one of the most interesting techniques, because it allows the creation of embedded bitstreams. Such bitstreams can be used to implement SNR scalability, since they can be truncated and still decoded, generating a lower quality version of the original data.
In this thesis we investigate the use of embedded bitstreams in wireless transmission and propose various approaches. In particular, cross-layering techniques are considered for implementing energy-efficient coding and transmission.
Keywords: embedded multimedia coding, wireless transmission, cross-
layer, energy efficient coding, MPEG-4 FGS, JPEG2000, region of interest.
Contents

1 Introduction
1.1 Video source coding techniques
1.2 Scalability in image and video coding
1.3 Region of interest and non uniform compression
1.4 Transmission of multimedia bitstreams
1.5 Unequal Error Protection
1.6 Joint source and channel coding
1.7 Scope and main contributions

2 Progressive coding in video and image compression
2.1 Progressive scalability in the JPEG2000 image coding standard
2.1.1 Rate-Distortion information
2.2 Progressive scalability in the MPEG-4 video coding standard
2.2.1 FGS decoder simplification using post-clipping
2.2.2 FGS Advanced Features
2.2.3 Rate-Distortion model of the FGS bitstream
2.3 Applications of progressive coding

3 Wireless video and embedded bitstreams transmission
3.1 Joint source and channel coding in wireless transmission
3.2 Unequal error protection in progressive and scalable bitstreams
3.3 Cross-layer approaches
3.4 Joint source coding and power control
3.5 Modulation based UEP

4 Non uniform compression in image and video transmission
4.1 Nonuniform compression of geometrically distorted images
4.2 Evaluation of spatial distortion
4.3 Adaptive compression of geometrically distorted images
4.3.1 Adaptive compression using a JPEG-like scheme and QDM
4.3.2 Adaptive compression using JPEG2000 and QDM
4.4 Experimental results
4.4.1 Quality measurement
4.4.2 Non uniform compression using JPEG
4.4.3 Non uniform compression using JPEG2000
4.5 Conclusions

5 Interactive RoI selection using FGS in MPEG-4 video transmission
5.1 The use of RoI in video browsing
5.2 The proposed approach
5.3 Application testbed
5.4 Experimental results
5.5 Conclusions

6 Energy efficient transmission
6.1 Distortion in progressive and scalable bitstreams
6.2 A general optimization approach to energy constrained problems
6.3 Channel model
6.4 Application to image transmission
6.4.1 Simulation results for JPEG2000 transmission
6.5 Application to video transmission
6.5.1 Simulation results for FGS MPEG-4 video transmission
6.5.2 Rate-Distortion model of the FGS bitstream
6.6 Conclusions

7 Study of the effects of the modulation scheme choice
7.1 AWGN channel model
7.2 Solution for the AWGN channel model
7.2.1 Modulation scheme comparison
7.3 Combined use of energy based UEP and channel coding
7.4 Solution with RS coding
7.5 Modulation comparison with error correcting codes
7.5.1 Conclusions

8 Conclusions

Bibliography

A Detailed procedure
A.1 Defining the channel model
A.2 Dual problem
List of Tables

4.1 QDM statistics for JPEG encoder without adaptation, CR=10
4.2 QDM statistics for JPEG encoder without adaptation, CR=20
4.3 QDM statistics for JPEG2000 encoder without adaptation, CR=10
4.4 QDM statistics for JPEG2000 encoder without adaptation, CR=20
6.1 Image PSNR
6.2 General parameter settings
6.3 Parameter settings for the three experiments
7.1 Parameters a, α, and the spectral efficiency rb/BT for different modulations
List of Figures

2.1 FGS encoder block schema
2.2 Bit-plane encoding of the Enhancement Layer
2.3 FGS bitstream
2.4 FGS bitplane truncation
2.5 Comparison between the measured data and the R-D curve calculated from the BP data
4.1 Graphical representation of a fish-eye distorted image
4.2 Compression and transmission schemes considered
4.3 Example of application of QDM
4.4 Conceptual scheme of the proposed approach
4.5 QDM maps of a test image: (a) original achieved by patch repetition of the Baboon image; (b) QDM map for semi-spherical mirror; (c) parabolic mirror
4.6 Identification of a RoI from the QDM of a distorted image
4.7 Comparison between compression schemes at increasing compression ratio for JPEG
4.8 Comparison between compression schemes at increasing compression ratio for JPEG2000
4.9 Performance comparison on Blood image
4.10 Performance comparison on Tiled Baboon image
4.11 Performance comparison on Mobile and Calendar image
4.12 Performance comparison on a synthetic image generated by PovRay
5.1 Block scheme of the proposed method
5.2 Mobile and Calendar sequence frame with and without RoI enhancement layer
5.3 AIDER sequence frame with and without RoI enhancement layer
6.1 Total energy Etot and frame distortion (MSE) versus the power of the last packet
6.2 Probability of packet loss ρj versus the assigned power PL for the last 4 packets (j = L−3, .., L) for a frame in an FGS-coded video sequence
6.3 PSNR gain in dB vs. interference plus noise
6.4 Assigned power vs. packet number
6.5 Average size of the bit-planes for the Foreman sequence (QCIF)
6.6 Experiment A results
6.7 Experiment B results
6.8 Experiment C results
6.9 PSNR: (a) experiment A, (b) B and (c) C
6.10 Comparison between the PSNR obtained using measured data and the R-D model
7.1 PSNR comparison of the equal energy distribution method and the proposed scheme for different modulations
7.2 Average PSNR comparison of the equal energy distribution method and the proposed scheme for different modulations and energy budgets
7.3 Reed-Solomon RS(n, k) code
7.4 Random symbol block error performance for the RS(255, k) code
7.5 Performance curves for RS codes with CR = 0.92
7.6 Performance comparison between the proposed approach and equal energy distribution
7.7 Impact of different levels of error protection
7.8 Comparison among BPSK, FSK, and MSK modulation schemes in terms of PSNR at the receiver
7.9 Performance improvement deriving from the introduction of the RS(110, 100) code (BPSK, BT = 200 kHz)
Chapter 1
Introduction
Multimedia real-time transmission can be very challenging, due to variations in throughput, delay, and packet loss, and to limited resources. Video transmission in particular is very resource demanding, because of the joint effect of real-time requirements and the high volume of data to be transmitted.
Most concerns about multimedia transmission relate to compressing the data, transmitting it over the limited bandwidth offered by the channel, and protecting it in such a way that the decoded sequence is acceptable to the user. Indeed, video produces a huge amount of data that, without compression, is unmanageable, due to limitations in storage size and transmission bandwidth. Several compression approaches are currently available, but without appropriate countermeasures the generated bitstream can become very sensitive to errors. For this reason the newest video standards cover not only compression efficiency but also transmission related issues such as error resilience and scalability.
In section 1.1 a brief introduction to video and image compression techniques is given, while multimedia transmission issues are covered in depth in section 1.4.
1.1 Video source coding techniques
In video data, a certain amount of redundancy is present both in the spatial domain (as for still images) and in the temporal domain. Compression algorithms reduce this redundancy with various methods and eliminate information that the human eye cannot perceive. In video compression, spatial redundancy is reduced through methods similar to those used for image compression, based on the DCT or on wavelets. Temporal redundancy can be reduced by encoding the residual image resulting from the difference between the original image and its prediction, obtained by processing nearby decoded frames. The difference image is then encoded and transmitted, together with the side information necessary for generating the prediction image at the decoder side. This operation is usually referred to as motion compensation, because it tries to compensate for the motion between two or more frames, in order to find an image that can be considered a good prediction of the original one.
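To make the operation concrete, the following is a minimal sketch of how a motion-compensated residual could be formed for one block, using full-search block matching. It is an illustration under simplified assumptions (random placeholder frames, a hypothetical search window), not the scheme of any particular standard.

import numpy as np

def best_match(ref, block, by, bx, search=8):
    """Full-search block matching: find the motion vector (dy, dx) within
    +/-search pixels that minimizes the SAD against `block`."""
    h, w = block.shape
    best, mv = np.inf, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = by + dy, bx + dx
            if y < 0 or x < 0 or y + h > ref.shape[0] or x + w > ref.shape[1]:
                continue
            sad = np.abs(ref[y:y+h, x:x+w].astype(int) - block.astype(int)).sum()
            if sad < best:
                best, mv = sad, (dy, dx)
    return mv

# Placeholder frames: the previously decoded frame and the current original.
prev = np.random.randint(0, 256, (64, 64), dtype=np.uint8)
curr = np.random.randint(0, 256, (64, 64), dtype=np.uint8)

by, bx, B = 16, 16, 16                      # one 16x16 macroblock
block = curr[by:by+B, bx:bx+B]
dy, dx = best_match(prev, block, by, bx)    # side information (motion vector)
pred = prev[by+dy:by+dy+B, bx+dx:bx+dx+B]   # motion-compensated prediction
residual = block.astype(int) - pred.astype(int)  # the difference image that is coded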
In the algorithms implemented by video standards, frames are typically encoded in three different modes, usually called I-frame, P-frame and B-frame. I-frames, also called Intra frames, are coded without motion compensation; they allow synchronization and random access to the media, and are a reference point for motion compensated frames. P-frames, also called Inter frames, are encoded by applying motion compensation to the previous I- or P-frame. B-frames, also called Bidirectional frames, are encoded by applying motion compensation to both the previous and the successive I- or P-frames. This last type of frame has the interesting property that no other frame is encoded from its data.
More complex encoding algorithms also exist, such as 3-D wavelets and 3-D DCT, which take into account a number of frames at a time, performing in this way both spatial and temporal compression at the same time.
Currently various video standards exist:
• ISO MPEG-1, ISO MPEG-2, ISO MPEG-4
• H.263, H.26L, H.263+
• H.264/AVC
1.2 Scalability in image and video coding
The need for scaling arises in various situations, especially in multicast or non-live streaming, where the same content must be used by different users, each with different available resources. Indeed, in these cases the characteristics of the user's device or the bandwidth offered by the channel are not known in advance. To cope with these situations, it may be necessary to scale the transmission bitrate, the spatial resolution, or the computational complexity.
Even if scalability is mainly used to cope with variable channel bitrate, it can also be used to accommodate different display resolutions, computing capabilities, etc. Scalability techniques can be found for all types of multimedia data, and can be provided in various ways.
In video coding several forms of scalability exist, depending on which aspect of the decoded sequence is affected. The main types are temporal, spatial and SNR scalability. Hybrid approaches, which combine more than one type of scalability, also exist.
Scalability is usually implemented during the compression process. In the traditional approach, also called Layered Scalability, the encoder generates more than one bitstream: the most important is the Base Layer (BL), which can be decoded independently from the other layers, generating a low resolution version of the original sequence. The BL contains critical information because it is needed for the decoding of the subsequent
layers. These add information to the BL, and are called Enhancement
Layers (ELs). The number of ELs depends on how many scalability layers
are desired. When the BL is jointly decoded with one or more ELs, a higher resolution version of the video is generated (in terms of quality, spatial or temporal resolution, according to the scalable coding technique).
Temporal scalability generates different layers with increasing frame
rate. It is the most immediate form of scalability, since it can be per-
formed by dropping B-frames from the original bitstream. Another type of
scalability is based on spatial resolution and is useful in cases where the res-
olution of the display device is not known in advance. Finally, SNR (PSNR) scalability is also possible: the video quality increases with the number of layers decoded.
Scalability avoids encoding and maintaining different copies of the same video at the server side, and transmitting the same data multiple times: it is thus a valid alternative to simulcast transmission, since it removes the need to transmit multiple versions of the original data.
A more recent approach is embedded coding (also known as progressive scalability). Instead of using distinct layers, embedded coding implements scalability progressively within a single bitstream. Information is added as the bitstream is decoded, gradually increasing the resolution of the reconstructed data.
Forms of progressive scalability are present in image, video and even audio coding. It can be achieved using wavelet transforms or bit-plane coding. An introduction to progressive coding for both image and video is given in chapter 2.
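The defining property of an embedded bitstream is that any prefix of it is itself decodable, at a lower quality. This can be illustrated with a toy bit-plane coder; the sketch below is deliberately simplified and does not correspond to any standard algorithm.

import numpy as np

def encode_bitplanes(coeffs, n_planes=8):
    """Toy embedded coder: emit coefficient bits plane by plane, MSB first."""
    bits = []
    for p in range(n_planes - 1, -1, -1):         # most significant plane first
        bits.extend(((coeffs >> p) & 1).tolist())
    return bits

def decode_prefix(bits, n_coeffs, n_planes=8):
    """Decode any prefix of the stream: missing bits are treated as zero."""
    rec = np.zeros(n_coeffs, dtype=int)
    for i, b in enumerate(bits):
        plane = n_planes - 1 - i // n_coeffs      # bit-plane this bit belongs to
        rec[i % n_coeffs] |= b << plane
    return rec

coeffs = np.array([200, 13, 97, 5, 180, 44, 2, 160])
stream = encode_bitplanes(coeffs)
for cut in (8, 24, 64):                           # truncate at arbitrary points
    approx = decode_prefix(stream[:cut], len(coeffs))
    print(cut, np.mean((coeffs - approx) ** 2))   # MSE shrinks as the prefix grows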
Multimedia standards address scalability in various ways. For still image scalability, the JPEG2000 and MPEG-4 VTC standards offer wavelet-based embedded scalability.
For video, some form of scalability exists in all the most recent image and video standards:
• MPEG-2 and H.262 implement temporal, spatial and SNR layered scalability.
• MPEG-4 includes coding modes that allow layered scalability (temporal, spatial, SNR), object scalability and progressive SNR scalability, also known as Fine Granular Scalability (FGS).
• H.263+ implements temporal scalability using B-frames.
• H.264 implements temporal scalability using B-frames.
• MPEG-21 video, currently under development, should in the future include scalability features.
Only the MPEG-1, H.261 and H.263 video standards do not include any form of scalability.
A technique similar to scalability is multiple description (MD) coding. It has been included in the H.263+ standard and allows the sequence to be encoded into two equally important bitstreams. Each bitstream, when decoded independently, generates a low resolution version of the encoded sequence; when the two are decoded together, a high quality version is obtained. MD coding is often proposed for transmission over error prone networks where data can take different paths to its destination: if one path fails, part of the data can still be received and decoded.
1.3 Region of interest and non uniform compression
Visual data represented in an image or video sequence may not be equally important to the user. One reason is that the human eye usually focuses on the part of the visual information that is most relevant from a semantic point of view. Consider, for example, an anchorman speaking in a TV news program: the user is more interested in the face of the person, and in particular in the mouth and eyes, than in the background. Another example is environmental imagery, where areas with intense weather activity, such as a hurricane's eye, are certainly the most important for the interpretation of the image.
Commonly, a region of an image or video that contains more information for the user is called a Region of Interest (RoI). A RoI usually delimits an area containing information necessary for the user to correctly interpret the visual data. Rectangular RoIs are often preferred because they are easier to encode, but a RoI can be of any shape; in practice, its shape is limited by the characteristics of the coding algorithm. In the same image or video more than one RoI can exist, with various degrees of importance, and a RoI can be segmented into sub-RoIs if some areas are more important than others.
Since some regions are more visually important than others, loss and inaccuracy during coding and transmission are better tolerated outside the RoI. This can be exploited through non-uniform lossy compression, a technique used in video/image compression to reserve more coding resources for the RoI while allowing a worse quality in the background. The idea is to obtain a non-uniform quality in the image through non-uniform compression, thereby reducing the amount of data to be transmitted or improving the perceived quality of the data.
Various techniques in video/image coding allow RoI definition, and they have been introduced in the most recent standards. JPEG2000, a recent image compression standard based on the Discrete Wavelet Transform (DWT), was the first standard to introduce RoI definition: it is possible to specify a RoI during the coding phase in order to implement non-uniform compression. Another possibility
is to specify the RoI during the decoding phase, allowing the user to specify
the RoI a posteriori. This feature, in combination with the JPIP communication protocol, allows a selective retrieval of the image and extends the use of the RoI to a scalability tool.
RoIs can also be implemented in video coding, for example through the object concept in the MPEG-4 video coding standard or through scalability tools like Selective Enhancement in MPEG-4 FGS.
In chapters 4 and 5 examples of the use of RoI and non uniform com-
pression for transmission are presented.
1.4 Transmission of multimedia bitstreams
As far as the channel is concerned, video transmission requires a stable and robust channel, high bandwidth (even compressed, the amount of data is still significant), and low delay and jitter.
Compressed multimedia data, and in particular video, is highly sensitive to transmission errors. In compressed sequences, temporal and spatial predictive coding allows errors to propagate, and VBR coding generates peaks of data where the encoder finds the content hard to compress (for example, in the presence of rapid movement or complex scenes).
Real-time video transmission, or streaming, is quite different from file transfer for several reasons. In file transfer, a file must be transmitted over the network in its entirety, because it can be used only when it is completely received; if even a minimal part of the file is missing or damaged, the whole file is compromised. In video streaming, the user can start decoding and viewing the data before the entire encoded sequence has been transmitted, or even, in the case of live streaming, encoded. This approach to data transmission imposes tighter constraints on transfer rate, delay and error resilience.
Unlike ordinary data, video data (as well as other multimedia data) can still be used even if some data is lost or missing. Transmission losses do not always compromise the entire sequence, and the eye can compensate for and tolerate some errors. Moreover, error control tools that allow resynchronization, error recovery and concealment help to ease the task.
Various error control tools exist to cope with channel fading, packet losses and transmission errors. These can be implemented at the encoder or at the decoder side. So-called error resilient encoding falls into the first case; such techniques follow different approaches, including adding redundancy to the bitstream, allowing resynchronization, or dividing the data into independently decodable sections.
For example, they can be based on the spatial position of the errors,
trying to isolate them in a limited portion of the image or of the bitstream.
This can be achieved with resynchronization markers or data partitioning.
In data partitioning, data is organized in the bitstream so that important
data is grouped together and isolated. Other techniques are based on temporal characteristics of the encoded sequence, and involve the insertion of intra coded blocks or frames, either at random or driven by a criterion such as minimum distortion. Tools for error resilient encoding are present in the most recent standards, such as H.263, H.264 and MPEG-4.
At the decoder side it is always possible to use error detection and concealment techniques to recover from transmission errors [72]. The aim of error concealment is to exploit knowledge of the human visual system and common properties of visual data to reconstruct the missing bits, reducing as much as possible the perceived effects of losses. Concealment algorithms have to mediate between computational complexity and effectiveness, since video has strict timing constraints and does not tolerate delays. Visual standards do not define how to conceal transmission
errors, but give the decoder designer the freedom to choose a concealment approach appropriate to the system resources and requirements.
Finally, it is always possible to use mixed techniques that involve both encoder and decoder, for example including some form of interactivity between the two, based on feedback messages about received or lost data. Techniques that require the exchange of control messages between encoder and decoder are usually suitable for point-to-point transmission, but not always for point-to-multipoint scenarios.
In [32], the authors present a review of several channel-adaptive video streaming techniques that, employed in different components of the system, provide efficient, robust, scalable and low-latency streaming video.
A review of the technical challenges of video streaming and of approaches to solving them is given in [46], while Zhang et al., in [74], give a good overview of challenges and approaches in transporting real-time video over the Internet.
1.5 Unequal Error Protection
Unequal Error Protection (UEP) of bitstreams is implemented when different error resilience strategies are used to protect different parts of the same multimedia bitstream.
UEP approaches can be implemented using different techniques. Typically they consider the characteristics of the encoded video when deciding the protection strategy to adopt, because in multimedia bitstreams not all data is equally important. It is then a good idea to give more protection to the data that is more important for the decoding process, or whose protection minimizes distortion. By combining UEP with encoding strategies such as data partitioning or scalable coding, different solutions can be found.
In video coding, for example, I-frame reception is critical, because predictive coding techniques cause errors to propagate, while the loss of a B-frame creates an isolated error, not visible in subsequent frames. A UEP strategy can be defined by differentiating the transmission of I- and P-frames from that of B-frames, for example by adding error correction codes of variable strength to the transmitted data. Data partitioning can be combined with UEP to protect important data, such as motion vector information, more heavily. It is also possible to apply UEP to layered scalable bitstreams, using techniques that differentiate the protection applied to different layers [35]. Another possibility is to combine RoI based encoding with UEP.
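As a toy illustration of this idea, the sketch below maps each frame type to a different block code rate, spending more redundancy on the frames whose loss propagates. The (n, k) pairs and the GOP pattern are hypothetical, chosen only to show the bookkeeping, not taken from any experiment in this thesis.

# Minimal sketch of frame-type based UEP, assuming (n, k) block codes whose
# redundancy we are free to choose per frame type; the rates are illustrative.
RS_PARAMS = {
    "I": (255, 191),   # strongest protection: I-frame errors propagate through the GOP
    "P": (255, 223),   # intermediate: P-frame errors propagate until the next I-frame
    "B": (255, 251),   # weakest: a lost B-frame is an isolated error
}

def protection_overhead(frame_types):
    """Total parity overhead (code symbols per data symbol) implied by the
    per-frame-type protection choices."""
    overhead = 0.0
    for ftype in frame_types:
        n, k = RS_PARAMS[ftype]
        overhead += (n - k) / k
    return overhead

gop = ["I", "B", "B", "P", "B", "B", "P", "B", "B", "P"]  # hypothetical GOP pattern
print(f"average parity per data symbol: {protection_overhead(gop) / len(gop):.3f}")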
1.6 Joint source and channel coding
Even if encoders include a great number of tools, including error resilience ones, the main task of source coding is to reduce the bit size of the data, using techniques that eliminate spatial and, for video sequences, temporal redundancy. On the other side, channel coding introduces redundancy to protect data from channel errors and packet losses. Forward Error Correction (FEC) codes are mainly used for this purpose, and among them the most popular are the Reed-Solomon codes, which allow the correction of up to a certain number of errors within a block of symbols. When the transmission is packet based, the problem is to cope with packet losses, and FEC is usually applied across packets.
Shannon's source and channel coding theorem (C. E. Shannon, "A mathematical theory of communication," Bell System Technical Journal, vol. 27, pp. 379-423, 623-656, 1948) states that, under certain conditions, source and channel coding in a communication system can be optimized independently. This important theorem is the foundation of the design of many communication systems; however, the hypotheses needed for its validity can become very restrictive for recent communication systems, and especially for video transmission. First, the theorem assumes that it is possible to use codewords of infinite length, which implies allowing infinite delay in transmission, a restrictive hypothesis for real-time transmission. A second requirement is to consider only point-to-point transmissions.
Methods that jointly consider source and channel coding can be used when Shannon's theorem does not hold. Instead of applying source and channel coding as two independent steps, they are considered and optimized together, in order to better exploit during transmission the knowledge arising from the coding process. This approach, commonly known as joint source and channel coding (JSCC), is usually implemented at the application layer. A great number of JSCC schemes have been studied for both image and video transmission; in chapter 3 an introduction to the use of these techniques in wireless transmission is given.
1.7 Scope and main contributions
In this thesis we present some approaches for embedded image and video transmission over wireless links.
An introduction to progressive coding is given in chapter 2, while in chapter 3 existing techniques for the transmission of progressively coded bitstreams over wireless networks are presented.
In chapter 4 we introduce the use of non uniform compression for image transmission, while the use of RoI and embedded coding for interactive transmission of video is discussed in chapter 5.
In chapter 6 we introduce a general approach to unequal error protection of embedded bitstreams based on energy management. The proposed method allows the optimization of the energy distribution among the packets, in
order to minimize the distortion or the energy consumption, while in chap-
ter 7 the efficiency of the method is compared for different modulation
schemes, and with or without channel coding.
Chapter 2
Progressive coding in video and image compression
Multimedia data can be coded with techniques that generate embedded bitstreams. Scalability allows a video sequence to be encoded in such a way that the compressed video can accommodate different bitrates. The progressive coding approach differs from traditional layered methods because the set of possible rates varies in a nearly continuous way.
The main characteristic of progressive scalability is its capability to achieve a smooth transition between different bit rates, since the enhancement information can be truncated at any point in order to achieve the desired target rate and still be decoded correctly.
Indeed, in pure embedded bitstreams there are no distinct layers, as there are in traditional layered coding. In the traditional approach, scalability is achieved by coding the data into separate layers, starting from the Base Layer (BL), which contains the essential information, and then generating one or more Enhancement Layers (ELs) with additional data. In progressive coding, scalability is instead achieved through the direct truncation of a single bitstream.
When decoded, these bitstreams progressively add resolution data to
the recovered image or sequence. During the decoding, the process can be
interrupted at any point, and the data decoded up to that point can be
interpreted as a low resolution version of the fully decoded data.
Progressive scalability allows the quality of the decoded video to improve gradually, making it useful in applications like video browsing or remote access to video servers, particularly when dealing with narrowband channels (as in mobile applications).
This encoding method can be employed with success in the field of
video communication, allowing real-time stream processing able to adapt
the bitstream to the channel bandwidth. In the context of rate control,
progressively coded bitstreams can be used for obtaining fine granular data
representations at lower bitrates, since these bitstreams have the property
of allowing different spatial/quality resolutions depending on the amount
of data being transmitted and decoded.
Progressive coding also inherently allows complexity scalability and easy
resource adaptation depending on the capabilities of video devices. From
the transmitter’s point of view, this means that the same bitstream can
accommodate the different bitrates needed for sending video data to users
on networks with heterogeneous capacity. The receiver can decide to de-
code only the amount of data supported by its own resources (i.e. memory,
computation power etc.).
The most popular progressive coding implementations are based on
wavelet transforms and/or bitplane coding. These techniques enable the
progressive coding of image, video and even audio data. For image cod-
ing, wavelet-based coding techniques, like those used in SPIHT [56] and EBCOT [62], can be used. These techniques differ in how the compression
is achieved, but all of them can generate progressively coded bitstreams.
In particular, wavelets were exploited by the newest image compression
standard, JPEG2000 [4][64], which is based on the EBCOT paradigm and not only delivers state-of-the-art compression performance, but is also flexible enough to accommodate tools for the implementation of regions of interest (RoI), perception-based quality optimization, and quality layers.
Wavelets can also be used in video compression: 3-D wavelet coding schemes, such as 3-D SPIHT [39], can be used to obtain embedded bitstreams of video data. These techniques group together a sequence of frames and apply the 3-D wavelet transform to them, allowing both temporal and quality scalability.
Another important approach is Fine Granular Scalability (FGS), which was recently included in the streaming profile of the MPEG-4 standard, Part 2 [45][5].
The most recent video standard, H.264/AVC, implements only temporal scalability, using B-frames, but a special committee (known as SVC, Scalable Video Coding) is evaluating the possibility of adding progressive coding.
Finally, progressive coding is also possible in audio coding [43][47], and progressive coding techniques have been added to the MPEG-4 Audio standard [3].
2.1 Progressive scalability in the JPEG2000 image coding standard
To create the embedded bitstream, the JPEG2000 baseline compression
scheme [64] starts from a partitioning of the image into rectangular regions
called tiles, to each of which a discrete wavelet transform (DWT) is applied.
The DWT generates several wavelet sub-bands, which are divided for cod-
ing purposes into several smaller blocks called codeblocks. Each codeblock
is then independently quantized and bitplane encoded, thus achieving an
embedded bitstream at codeblock level. Codeblocks are then grouped to-
gether to form precincts.
From the error resilience point of view, when the decoder detects an
error in the codeblock data, it typically discards all the successive data
related to this codeblock. This produces a decoded codeblock equivalent
to the one generated by an encoder using a coarser quantization parameter.
As far as quality scalability is concerned, JPEG2000 creates, at encoding time, a certain number of Quality Layers (QLs). They are formed in such a way as to accommodate different coding rates and qualities in the same bitstream. The user can decide how many QLs to implement and the coding rates they must achieve. Each QL progressively accommodates a given number of bits from each precinct. The contributions from each precinct are chosen by the encoder so as to minimize the distortion at the target rate. Each quality layer thus progressively reduces the distortion of the decoded image in an optimal way in the rate-distortion sense. If the number of layers is large enough, the distortion associated with the bitstream truncated at an arbitrary point will be close to the optimal one. In general, a layer is completely decodable only if all the preceding layers have been received; the first layer is therefore fundamental for the decoding of the entire bitstream, and the importance of a layer decreases as we go from lower to higher layers.
Rate-distortion statistics for each QL are generated by the encoder during encoding. If the set of QLs is sufficiently dense, the rate-distortion curve of the original image can be constructed from the statistics obtained for each QL.
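As an illustration of how such statistics can be used, the sketch below selects the longest affordable prefix of quality layers for a given byte budget; since a layer is decodable only if all the preceding layers are available, only prefixes are considered. The per-layer (rate, distortion) figures are invented.

# Sketch of budget-driven quality-layer selection, assuming the encoder has
# reported cumulative (rate_in_bytes, mse) pairs per quality layer.
quality_layers = [
    (2_000, 180.0),   # QL 1: fundamental, needed by all the others
    (6_500, 60.0),    # QL 2
    (15_000, 22.0),   # QL 3
    (40_000, 7.5),    # QL 4
]

def layers_within_budget(layers, budget_bytes):
    """Keep the longest prefix of layers whose cumulative rate fits the budget."""
    chosen = []
    for rate, mse in layers:
        if rate > budget_bytes:
            break
        chosen.append((rate, mse))
    return chosen

kept = layers_within_budget(quality_layers, budget_bytes=16_000)
print(f"layers sent: {len(kept)}, expected MSE: {kept[-1][1] if kept else 'n/a'}")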
2.1.1 Rate-Distortion information
Most unequal error protection algorithms require the operational distortion-rate curve of the source coder for the original images. A general R-D model, valid for progressively coded images (e.g. JPEG2000), is presented in [14]. The authors propose the use of parametric models instead of the true D(R) curves for wavelet-based embedded image and video coders. This model is also used in [61].
In JPEG2000, rate-distortion data can also be collected directly during the encoding phase.
2.2 Progressive scalability in the MPEG-4 video coding standard
MPEG-4 FGS (Fine Granular Scalability) is a video coding approach that introduces quality scalability into the encoded video. It uses a mixed implementation of layered scalability and bit-plane coding to obtain two bitstreams, commonly called the Base and Enhancement Layers.
The Base Layer (BL) contains essential information about the sequence
and can be decoded independently from the Enhancement Layer (EL),
producing a low quality reconstruction of the video sequence. A higher
quality reconstruction can be then achieved by decoding both the Base
and Enhancement Layers together. Since the EL is progressively coded, it
can be used to gradually add information and detail to the BL.
Due to its structure, the EL can be truncated at any point and still be used to add information to the decoded BL. FGS's inherent scalability and flexibility also allow complexity scalability and easy resource adaptation depending on the capabilities of video devices. Thus FGS is suitable
for video conferencing and video multicast. An interesting overview of
applications enabled by FGS technology is given in [67].
Part 2 of the MPEG-4 standard includes FGS encoding and a hybrid method that combines FGS with temporal scalability (also called FGST). Advanced MPEG-4 FGS tools are Selective Enhancement, Frequency Weighting, and synchronization markers for improving error resilience.
In FGS the base layer (BL) behaves as a normal compressed bitstream (like MPEG-4 Simple Profile), while the difference between the decoded video sequence and the original video sequence is encoded in the Enhancement Layer (EL) (Fig. 2.1).
Figure 2.1: FGS encoder block schema (base layer encoding loop with DCT, quantization, inverse quantization, IDCT, motion estimation/compensation, frame memory and VLC; FGS enhancement layer encoding with clipping, DCT, find maximum, bit-plane shift and bit-plane VLC).
Progressive decoding is achieved by bit-plane coding of the DCT of the residual image: the frame data is transmitted starting from the most significant bit-plane (MSB) down to the last one (LSB). The DCT is performed on a block basis, as for the base layer, but the coefficients are bit-plane coded after zig-zag scanning (Fig. 2.2).
The data of each bit-plane (BP) is grouped on a macroblock basis and sent one MB at a time, starting from the upper left corner.
Usually the most significant BP is the smallest, and BP size increases going from MSB to LSB (Fig. 6.5).
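To make the scanning and "find maximum" steps concrete, the following sketch applies a JPEG-style zig-zag ordering to one 8x8 block of residual DCT coefficients and locates the most significant non-zero bit-plane; the coefficient values are random placeholders, not output of any encoder.

import numpy as np

def zigzag_indices(n=8):
    """(row, col) pairs in JPEG-style zig-zag order for an n x n block:
    anti-diagonals in turn, alternating the traversal direction."""
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda rc: (rc[0] + rc[1],
                                  rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))

def max_bitplane(coeffs):
    """'Find maximum' step: index of the most significant non-zero bit-plane."""
    peak = int(np.abs(coeffs).max())
    return peak.bit_length() - 1 if peak else 0

block = np.random.randint(-64, 64, (8, 8))          # placeholder residual DCT block
scanned = np.array([block[r, c] for r, c in zigzag_indices()])
msb = max_bitplane(scanned)
# Bit-planes of |coefficient| are then emitted from `msb` down to 0,
# with signs sent when a coefficient first becomes significant.
planes = [((np.abs(scanned) >> p) & 1) for p in range(msb, -1, -1)]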
Figure 2.2: Bit-plane encoding of the Enhancement Layer.
Figure 2.3: FGS bitstream.
Figure 2.4: FGS bitplane truncation.
It is likely that the first bit-plane mostly contains information from those MBs that the BL found more difficult to compress, such as those containing movement. The first BP will also contain a great number of zeros, due to the small residual error in the other MBs.
Not all the data in the EL has the same importance: data from the most significant bit-planes is necessary for the decoding of the following ones, and it also carries more information with respect to the last bit-planes.
Due to its structure, the EL can be truncated at any point and still be decoded. Since truncation can happen anywhere within the EL of a frame, half a frame may be coded with a better quality than the rest of it, when the EL is truncated in the middle of a BP for that frame. This is more likely to happen in the LSBs, since the MSBs contain a lot of zeros and implicitly give more importance to certain MBs with respect to others, providing a sort of quality priority.
FGS allows rate control on pre-encoded sequences simply by truncating the EL so as to satisfy the required bit budget.
While the Base Layer is compressed at a maximum bit rate RB, chosen so that it can always be transmitted over the channel, the Enhancement Layer can be cut in such a way that the FGS coded video can be transmitted at any bit rate greater than RB (and lower than a certain RE, which depends on the number of BPs used in the EL), fully utilizing the bandwidth available at transmission time. In this way it is possible to adapt the coded video to the time-varying conditions of the channel.
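The rate-control step then reduces to a per-frame byte budget computation. Below is a minimal sketch, with invented frame rate, base layer rate and channel estimates, of cutting the EL to fill whatever bandwidth exceeds RB.

# Sketch of FGS rate control by EL truncation. R_B is the (fixed) base layer
# rate; everything above it, up to the size of the EL, is filled with EL bytes.
# All numbers are illustrative, not measurements.
FPS = 25
R_B = 128_000      # base layer bit rate (bits/s), always transmittable

def el_bytes_for_frame(channel_rate_bps, el_size_bytes):
    """How many EL bytes of this frame fit in the current channel rate."""
    spare_bps = max(0, channel_rate_bps - R_B)        # bandwidth left after the BL
    budget = (spare_bps // FPS) // 8                  # per-frame EL byte budget
    return min(el_size_bytes, budget)                 # cannot send more than exists

el_stream = bytes(9000)                               # one frame's EL (placeholder)
for estimate in (160_000, 300_000, 2_000_000):        # time-varying channel estimates
    cut = el_bytes_for_frame(estimate, len(el_stream))
    sent = el_stream[:cut]                            # truncated EL is still decodable
    print(estimate, "->", cut, "EL bytes")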
FGS can also be used in multicast environments, allowing the transmis-
sion of the same compressed video to users with different requirements.
2.2.1 FGS decoder simplification using post-clipping
In Fine Granular Scalability, the residual image from which the enhancement layer is created can be computed using a pre-clipping or post-clipping structure [54].
In a pre-clipping structure the residue is computed directly in the DCT domain from the difference between the original DCT coefficients and the quantized ones (obtained during BL coding). In this case the EL, during decoding, depends on intermediate results of the BL decoder. In streaming, VOLs can arrive at different times, and cross dependencies between the BL and EL decoders (such as the use of intermediate data) may restrict decoder implementation options.
To decouple EL from BL decoding, a post-clipping approach must be used: the residue is calculated from the difference between the decoded BL VOP and the original one, without using any intermediate information.
In MPEG-4, FGS is implemented using a post-clipping coding scheme [54] because it presents implementation advantages. Indeed, in this kind of scheme the base and enhancement layers are decoupled, and the residue can be computed directly in the spatial domain. Various decoder implementations are possible: a sequential decoder, using the same hardware for both the BL and EL VOPs; a parallel decoder, using dual hardware that operates on different VOPs (base/enhancement); or a pipelined decoder.
2.2.2 FGS Advanced Features
Advanced features in FGS help to improve visual quality, usability, and
error resilience.
Fine-granular temporal scalability
Fine-granular temporal scalability (FGST) is a hybrid SNR-temporal scalability that allows a trade-off between individual frame quality and temporal resolution.
FGST can be implemented with little added complexity, allowing a lower transmission bitrate to be obtained both by slowing down the sequence frame rate and by reducing the quality in terms of PSNR.
In MPEG-4, FGST is implemented in two modes: as a single layer scalability structure, referred to as FGST, and as a two layer structure, referred to as FGS-FGST.
Selective Enhancement
Selective Enhancement (SE) is implemented at the frame level and allows the bit-plane coding order to be arranged based on region selection. The region of interest (RoI) considered can be arbitrarily shaped, with the macroblock as its shape unit. This method allows more bit-planes to be transmitted for the RoI macroblocks, as sketched below.
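A minimal sketch of the underlying mechanism follows, assuming SE is realized by shifting the residual data of RoI macroblocks up by a fixed number of bit-planes so that it migrates into more significant planes and is transmitted earlier; the macroblock values and the shift factor are placeholders.

import numpy as np

SHIFT = 2  # hypothetical selective-enhancement shift, in bit-planes

def apply_selective_enhancement(mb_residuals, roi_mask, shift=SHIFT):
    """Left-shift the residual magnitudes of RoI macroblocks so that their
    bits land in more significant bit-planes and are transmitted earlier."""
    enhanced = []
    for residual, in_roi in zip(mb_residuals, roi_mask):
        enhanced.append(residual << shift if in_roi else residual)
    return enhanced

# Four macroblocks' worth of (absolute) residual coefficients, invented.
mbs = [np.array([3, 1, 0, 2]), np.array([12, 5, 1, 0]),
       np.array([2, 0, 0, 1]), np.array([7, 3, 2, 1])]
roi = [False, True, False, True]            # the two RoI macroblocks
shifted = apply_selective_enhancement(mbs, roi)
# After the shift, the RoI macroblocks dominate the most significant
# bit-planes, so a truncated EL favors their quality.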
SE is a tool for encoder optimization and can be used for a number of operations, such as region-based quality adjustment or object tracking, and can be combined with frequency weighting operations (see the next paragraph).
Automated RoI selection combined with SE can be implemented in various ways. An example of how the SE tool can be used to improve the perceived quality of a video conference stream is given in [68]. In that work, SE is combined with a real-time face detection algorithm, improving the subjective visual quality of streaming video at various transmission bit-rates.
A more complex example is given in [36], where an automated coding mode selection is implemented. The encoder, based on the video content
and the currently available bandwidth, selects among the available coding schemes (FGS, FGST, and their SE variants) in order to achieve higher perceptual video quality. SE and background selection are based on the contents of the video sequences.
Even though the successor of MPEG-4, H.264, implements neither FGS nor SE, it is interesting to see how H.264 features are used to automatically select the visually important regions to be used as SE regions in a non-standard H.264 FGS implementation [66]. The method requires low computational complexity and can be used in real-time transmissions.
Frequency Weighting
The frequency weighting (FW) method has been included in the MPEG-4 standard to allow the prioritized transmission of low frequency DCT coefficients. When different frequencies are treated equally and the precision is limited by FGS bit-plane truncation, flickering artifacts can occur in certain sequences. This happens when high-frequency residues are added to a low quality, blocky BL.
Using the FW tool, DCT frequencies can be weighted on the basis of their different psychovisual importance. The approach is used to give more precision to low frequencies, and it is similar to the use of customized quantization matrices in the BL. To apply FW correctly, separate weighting matrices must be used for I-frames and P-frames to cope with their different statistics, since the former are applied to the residue of quantization, and the latter to the residue of motion estimation.
An example of the application of FW can be found in [53], where the authors use different FW matrices in order to improve the FGS visual quality. The FW matrix is chosen automatically, depending on the video sequence characteristics, and succeeds in improving the visual quality of the decoded sequence.
Error Resilience
Standing that no temporal error propagation can occur in the FGS EL,
since no prediction scheme is implemented in this layer, the more significant
problem is maintaining the synchronization and being able to decode as
much as possible of sequential data.
In FGS data partition is used as an error resilience tool in order to
improve the quality of transmission of EL over error prone channels. Re-
synchronization markers are used considering bit-plane relations.
The error resilience of FGS video streaming is studied in [70][78]. An
improvement attempt is described in [77] where an Header Extension Code
(HEC) is proposed.
2.2.3 Rate-Distortion model of the FGS bitstream
It is possible to define a rate-distortion (R-D) model of the EL based on the statistics collected during encoding. The rate-distortion model can be derived either from empirical considerations or from analytical calculations. An interesting analysis of the FGS EL is given by Loguinov and Radha in [26][24], where a distortion model is also defined.
Depending on the application, a simple R-D curve obtained from the R-D data measured at each bitplane can be used. Indeed, experimental measurements show that the R-D curve for the FGS EL is approximately linear within a bitplane [82][81]. This is reasonable if we consider that inside a bitplane the distortion improves gradually as bitplane information is added one MB at a time, and that the statistical properties of a bitplane are constant within the bitplane itself. From the R-D data measured for each BP, we can then obtain a good approximation of the R-D curve (Fig. 2.5). We recall that the R-D data can also be easily calculated in the frequency domain, as highlighted in [25].
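A piecewise-linear model of this kind is easy to build from the per-bitplane measurements. In the sketch below the (rate, MSE) breakpoints are invented for illustration; a real model would use the values measured at each BP boundary during encoding.

# Sketch of the piecewise-linear R-D model: interpolate linearly between the
# (cumulative rate, MSE) points measured at each bit-plane boundary.
bp_points = [(0, 110.0),        # BL only
             (4_000, 70.0),     # after BP 1
             (11_000, 38.0),    # after BP 2
             (22_000, 16.0),    # after BP 3
             (38_000, 6.0)]     # after BP 4

def distortion_at(rate):
    """Estimated MSE when the EL is truncated at `rate` bits."""
    for (r0, d0), (r1, d1) in zip(bp_points, bp_points[1:]):
        if r0 <= rate <= r1:
            t = (rate - r0) / (r1 - r0)
            return d0 + t * (d1 - d0)   # linear within the bit-plane
    return bp_points[-1][1]             # beyond the last BP: full-EL distortion

print(distortion_at(15_000))            # a truncation point inside BP 3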
Figure 2.5: Comparison between the measured data and the R-D curve calculated from the BP data (distortion in MSE versus rate in bits; curves for the R-D model, the measured data, and the measured data at each BP, with the BL and BPs 1-4 marked).
Cuetos et al. collected interesting FGS statistics in their work [27],
where they present a publicly available library of frame size and quality
traces of long MPEG-4 FGS encoded videos.
2.3 Applications of progressive coding
Progressive coding provides easy bandwidth adaptability because it allows the encoding process to be separated from the transmission. Indeed, the encoder does not need to know the bitrate at which the bitstream will be transmitted. Moreover, the receiver can, if necessary, decode only part of the transmitted data, according to its own computational capabilities. The same transmitted bitstream can thus be used by different users or appliances, according to their own needs and resources.
Progressive coding can be used as a rate control tool for pre-encoded video. Indeed, to cope with the bandwidth variations often present in wireless links, some sort of rate control must be adopted. Traditionally, in real-time non-scalable coding and transmission, the data bitrate is adapted on the fly to the available bandwidth during coding, in order to adapt the bitstream to the changing conditions of the channel. If the data is already compressed, this approach is not possible, and transcoding techniques may be adopted. These methods create a lower bitrate version of the video directly from the compressed bitstream, without going through a computationally intensive decompression and re-compression process. An alternative to this approach is switching at transmission time between pre-encoded bitstreams, using simulcast techniques.
In this context, scalable bitstreams represent a good solution for the rate control of pre-encoded sequences, since they allow the content to be encoded once but transmitted at different bitrates. Traditional layered bitstreams permit transmission at different, but fixed, bitrates. A further enhancement is given by the use of embedded bitstreams to perform fine granular rate control. In [52] Radha and Parthasarathy present two optimal (in a rate-distortion sense) rate-control algorithms for FGS scalable video transmission.
An interesting review of FGS applications is given by van der Schaar et al. in [67]. The paper refers to FGS coding, but its considerations apply as well to other scalable coding techniques.
Chapter 3
Wireless video and embedded bitstreams transmission
Wireless transmission is becoming popular in both home and office contexts, and the number and variety of emerging applications is growing. Residential WLANs are growing in popularity, and wireless hot spots can be found in major airports, hotel chains and conference rooms.
The term wireless transmission commonly covers frameworks related both to mobile/cellular services and to wireless data networks. WLAN standards include IEEE 802.11, HIPERLAN, Bluetooth, NMAC, etc. Mobile transmission includes GPRS, 3G, UMTS, EDGE and the coming 4G technologies. It is realistic to think that, in the future, these two frameworks will merge.
As Girod and Färber highlight in [31], the challenges of multimedia data transmission go well beyond the problems related to poor bandwidth. In wireless networks, errors during transmission cannot be avoided, due to the very nature of the medium, even when error correction techniques are implemented. The authors argue that the only practicable solution is to achieve a compromise between reliability, throughput and delay. They focus their work on cellular networks, but most of the presented strategies can be applied to the transmission of video over
WLANs.
A review of error concealment strategies for fine granularity scalable (FGS) video transmission is given in [11].
3.1 Joint source and channel coding in wireless transmission
Various JSCC approaches for wireless communication links exist in the literature; a general one is presented in [9] by Appadwedula et al. The authors propose a JSCC scheme based on a parametric distortion model. An advantage of the method is that it can be applied to most classes of source and channel coders, making it possible to obtain nearly all of the benefits of joint source-channel optimization by matching existing source and channel coding standards in a simple and general way.
3.2 Unequal error protection in progressive and scalable bitstreams
The application of UEP to the transmission of progressively coded images and video is not new. It has been implemented in various fashions, and some UEP techniques designed for other contexts can be adapted to the transmission of embedded bitstreams.
As highlighted before, in embedded bitstreams the data is implicitly sorted by its importance, and this characteristic can be used to implement error resilience techniques based on UEP. Traditional equal error protection (EEP) schemes consider all the data as having the same importance and assign the same amount of protection to the whole bitstream. By contrast, UEP schemes give more importance, hence more protection, to the most critical parts of the coded image.
Reed-Solomon codes were used by Natu and Taubman [48] for the protection of JPEG2000 bitstreams transmitted over wireless channels. In [75] channel coding is used to implement UEP of a JPEG2000 bitstream.
For video transmission, the application of UEP within the FGS EL bitstream was first considered by van der Schaar et al. in [70], where the fine-grained loss protection (FGLP) framework was introduced. Based on it, Yang et al. proposed in [78] a "degressive" protection algorithm (DEP) based on FEC for the optimal assignment of protection redundancy among bit-planes. In [71], Wang et al. studied the problem of rate-distortion optimized UEP for Progressive FGS (PFGS) over wireless channels, using prioritized FEC for the BL and EL. A similar problem was studied in [79], in which the objective was to minimize the processing power for PFGS video given bandwidth and distortion constraints.
3.3 Cross-layer approaches
Each network layer (physical, link and application) can individually apply error protection schemes that are independent of each other. This behavior is implicit in the layering paradigm commonly used in the definition of network architectures. Of course, independent strategies do not provide the overall optimal solution, since they ignore each other and do not create useful synergies.
The idea behind cross-layer approaches is to jointly consider the error protection strategies at the various layers, in order to improve the transmission efficiency in terms of protection, bandwidth and resource consumption. These techniques do not necessarily involve JSCC, but they are aware of the whole system.
Usually, cross-layer approaches use optimization strategies in order to
minimize the overall resource utilization or the video distortion, and
parameterize the characteristics of the different network layers. The
adaptation parameters that can be considered in a cross-layer scheme can
be found at any layer:
• physical layer: transmission power, antenna characteristics, modulation and equalization scheme;
• link layer: frame size, error correction coding strategy, ARQ, admission control and scheduling, packetization;
• transport and network layer: signaling and packetization;
• application layer: compression strategy, error concealment, rate control, error correction codes, ARQ, scheduling, packetization.
The number of parameters involved in this process influences the overall
complexity of the optimization, and should be limited if a feasible
approach is desired.
For complex problems, closed-form solutions are difficult to achieve, and
dynamic programming is often used to find solutions.
Resource optimization can involve one or more aspects of the transmission,
and can aim at optimizing the consumption of a single resource (for example
the transmission energy or the bandwidth) or a single outcome (overall
distortion, video quality).
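As an illustration of this kind of optimization, the following sketch
performs an exhaustive search over a small cross-layer parameter grid
(transmit power, FEC rate, packet size) under an energy budget; all
layer models and constants below are toy assumptions, not measured
characteristics of any real system.

```python
# Minimal sketch of cross-layer optimization by exhaustive search over a
# small parameter grid (all models below are illustrative assumptions).
import itertools
import math

def ber(tx_power):
    """Toy physical-layer model: bit error rate falls with transmit power."""
    return 0.5 * math.exp(-tx_power)

def packet_loss(tx_power, code_rate, pkt_bits):
    """Toy link-layer model: a packet survives if all its bits do; stronger
    FEC (lower code_rate) is modeled as a crude discount on the bit errors."""
    p_bit = ber(tx_power) * code_rate   # crude FEC benefit, not a real code
    return 1 - (1 - p_bit) ** pkt_bits

def expected_distortion(tx_power, code_rate, pkt_bits, d0=100.0):
    """Application-layer objective: distortion grows with the loss rate."""
    return d0 * packet_loss(tx_power, code_rate, pkt_bits)

def cross_layer_search(energy_budget):
    grid = itertools.product([2.0, 4.0, 6.0],    # tx power (arbitrary units)
                             [1/2, 2/3, 3/4],    # FEC code rate
                             [256, 512, 1024])   # packet size (bits)
    feasible = [(p, r, n) for p, r, n in grid
                if p * n / r <= energy_budget]   # energy ~ power * airtime
    return min(feasible, key=lambda x: expected_distortion(*x))

if __name__ == "__main__":
    print(cross_layer_search(energy_budget=6000.0))
```

Even in this toy form, the search makes visible how quickly the parameter
space grows, which is why the text above recommends limiting the number of
adaptation parameters or resorting to dynamic programming.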
Shakkottai et al. give in [60] an interesting overview of the issues
related to cross-layer design for wireless networks.
Cross-layer approaches to embedded bitstream transmission have been
studied. In [80], a hybrid UEP and ARQ scheme is used for the transmission
of scalable video over wireless channels. FEC and ARQ are also used for
the transmission of FGS streams over 802.11 channels in [76] and [41]. The
approach considers the characteristics of IEEE 802.11 WLANs for evaluating
the transmission parameters.
In [33], a cross-layer optimization of OFDM transmission systems for
MPEG-4 video streaming is presented. In [17], Radha and Cohen present
an efficient method for streaming FGS video over packet-based networks.
In [44], Li and van der Schaar present several heuristic algorithms for
the real-time transmission of layered video bitstreams over wireless LANs,
providing adaptive QoS through real-time retry-limit adaptation (RTRA).
In [38], Khayam et al. propose the MAC Lite strategy as a cross-layer
protocol design for real-time multimedia applications over 802.11b
networks.
3.4 Joint source coding and power control
In wireless networks, energy is an important and limited resource. In order
to optimize its consumption, source coding parameters and power control
can be jointly considered and optimized. This form of cross-layering is
commonly referred to as joint source coding and power control (JSCPC).
In [83], a joint FEC and transmission power allocation scheme for layered
video transmission over a multi-user CDMA network was proposed. In that
work, scalability was achieved using 3D-SPIHT (wavelet-based coding).
The objective was to minimize the end-to-end distortion through optimal
bit allocation among the source layers and power allocation among the
different CDMA channels.
The authors in [12] considered jointly adapting the source bit rate and
the transmission power in order to maximize the performance of a CDMA
system subject to a constraint on the equivalent bandwidth. In that work,
an H.263+ codec was used to generate the layered bitstream.
In [13], Chan considers a JSCPC approach for video transmission over
3G wireless CDMA cellular networks. In [59], Sehlstedt and Le Blanc
propose the use of alternate metrics to dynamically fine-tune the
performance optimization, together with a dynamically adjustable bit-energy
distribution.
For progressively coded video, the position of the first bit error within
a frame is more important than the overall bit error probability. In [30],
Fossorier, Xiong and Zeger numerically optimize the channel code rate and
the energy allocated per transmitted bit for the transmission of a
progressively coded bitstream.
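The role of the first-error position can be made explicit with a short
sketch: for independent bit errors with probability p, the first error
falls at position i with probability (1 − p)^i · p, and only the prefix
before it is decodable. The rate-distortion curve used below is a
hypothetical monotone model, not one derived from a real codec.

```python
# Sketch: for a progressively coded frame of n bits with independent bit
# error probability p, the decoder keeps only the prefix before the first
# error, so expected distortion depends on the first-error position rather
# than on the raw BER alone.

def expected_distortion(n, p, d_of_r):
    """Sum over first-error positions i (a prefix of i bits is decodable),
    plus the error-free case in which all n bits are usable."""
    exp_d = 0.0
    for i in range(n):
        prob_first_err_at_i = (1 - p) ** i * p
        exp_d += prob_first_err_at_i * d_of_r(i)
    exp_d += (1 - p) ** n * d_of_r(n)
    return exp_d

if __name__ == "__main__":
    d = lambda r: 100.0 / (1.0 + r)      # assumed monotone R-D model
    for p in (1e-2, 1e-3, 1e-4):
        print(p, expected_distortion(1000, p, d))
```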
In [15], the authors propose an energy-aware MPEG-4 FGS video streaming
system with client feedback.
3.5 Modulation based UEP
An interesting approach combining cross-layering and UEP involves the use
of multiple modulation channels.
In [10], Atzori proposes an approach for the robust transmission of
JPEG2000 images over wireless networks using a wavelet transmultiplexer.
In [69], van der Schaar and Meehan propose the use of an adaptive
modulation scheme in combination with FGS coded video, in order to obtain
UEP-based video transmission over wireless channels. The approach, termed
Adaptive Modulated FGS (AM-FGS), is able to cope with channel bandwidth
variations and degradation by exploiting the FGS structure and tailoring
the modulation scheme to the channel conditions and data characteristics.
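A minimal sketch of the underlying idea follows: map each portion of the
embedded bitstream to a constellation whose robustness matches its
importance and the current channel quality. The mapping rule and constants
are illustrative assumptions, not the published AM-FGS algorithm.

```python
# Sketch of modulation-based UEP in the spirit of AM-FGS (the mapping rule
# is an illustrative assumption): the base layer goes on the most robust
# constellation, and enhancement bit-planes move to denser constellations
# as their importance decreases.

MODES = [("BPSK", 1), ("QPSK", 2), ("16-QAM", 4), ("64-QAM", 6)]  # bits/symbol

def assign_modulation(layers, channel_quality):
    """layers: list of (name, importance in [0,1]); channel_quality in [0,1].
    Less important data and better channels allow denser constellations."""
    plan = []
    for name, importance in layers:
        # crude index: robustness for important data, throughput otherwise
        idx = min(int((1 - importance) * channel_quality * len(MODES)),
                  len(MODES) - 1)
        plan.append((name, MODES[idx][0]))
    return plan

if __name__ == "__main__":
    layers = [("BL", 1.0), ("EL bit-plane 0", 0.7),
              ("EL bit-plane 1", 0.4), ("EL bit-plane 2", 0.1)]
    print(assign_modulation(layers, channel_quality=0.9))
    # -> BL on BPSK, later bit-planes on progressively denser constellations
```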
Chapter 4
Non uniform compression in image
and video transmission
In wireless channels bandwidth is limited and must be used wisely. Non-
uniform compression using RoIs is an interesting option in transmission,
because it allows bits to be distributed according to data importance,
achieving a better perceived quality.
In this chapter, we consider the transmission of still images affected
by geometrical distortion, and analyse the effects of different lossy
compression strategies on their transmission. In this specific case, the
encoding-decoding process and the geometric correction together generate
a non-homogeneous image degradation, since a different amount of
information is associated with each resulting pixel. A distortion measure
named Quadtree Distortion Map (QDM), able to quantify this degradation,
is described in what follows. In order to ensure a uniform quality in the
final image, the QDM is exploited during compression. The resulting method
is able to reduce the total size of compressed geometrically distorted
pictures. Tests performed using the JPEG and JPEG2000 coding standards
show that it is possible to improve both the measured and the perceived
quality of the transmitted image.
In the next section, some background on the nonuniform compression of
geometrically distorted images is given, and the effects of non-linear
geometric distortions on co-decoded images are considered. A detailed
description of the concept of Quadtree Distortion Map (QDM) is given in
section 4.2. In section 4.3, it is shown how the QDM can be used to design
an adaptive image compressor able to achieve a uniform error distribution
over the decompressed and de-warped image. It is also shown how this
approach can be applied to the standard JPEG and JPEG2000 image compression
algorithms, while maintaining full standard compliance only in the latter
case. A selection of quantitative results demonstrating the viability and
effectiveness of the proposed approach is provided in section 4.4.¹

¹ This chapter was published in [21].
4.1 Nonuniform compression of geometrically distorted
images
Images acquired by optical sensors usually present some kind of geometrical
distortion, due to the characteristics of the lenses and sensors adopted
in the acquisition system, or to the physical structure of the object under
inspection, as in the case of textures projected onto non-planar
surfaces [29]. In specific applications such effects may become even more
significant, due to the nature of the acquisition system. This is the case,
for instance, of acquisition systems used in video surveillance or ambient
intelligence applications, where wide-angle lenses are commonly used to
acquire large areas with a single camera. In particular, fish-eye lenses
and panoramic lenses using omni-directional mirrors are adopted to grab
large portions of narrow indoor environments (a room, a car interior,
etc.) [58]. Another application that strongly suffers from geometrical
distortion is remote sensing [73].
In the projection of the real-world scene onto the image plane, the
geometrical distortion acts as a non-linear spatial compression and
expansion of the luminance function in the pixel plane. This may cause
problems in all the successive image processing stages, from low-level
processing to the interpretation of the scene, and can be partially solved
by applying geometrical correction techniques based on sensor models and
calibration processes. Unfortunately, the correction is only seldom
performed at the sensor level; it usually takes place at some remotely
connected unit, where the application software runs. The geometrical
correction may then happen to be carried out after important processing
steps have already been applied: in particular, compression and encoding
of images is often implemented on-board to attain a more efficient
transmission.
Some proposals to exploit knowledge of the acquisition process to improve
image processing have already been made, with applications to specific
domains such as medical tele-radiology [23]. In [51], a generic and very
simple acquisition model is studied, where the acquisition sensor is
modeled through a modulation transfer function that simply introduces
blurring. Another related work on the topic can be found in [57], where
the features of a retina-like sensor, associated with an omni-directional
mirror, are exploited for imaging purposes.
In this framework, we have investigated the impact of geometrical
distortion on image compression. The aim is to limit as much as possible
the amount of encoded data, in order to fit the available transmission
bandwidth. As a result, we verified that it is possible to improve the
compression performance when encoding is applied to the geometrically
distorted image.
The analysis was conducted both on distorted images produced by real
systems and on synthetic images obtained by warping algorithms that
simulate common distortion effects (fish-eye and mirrored lenses). For
this reason, in the following we will use the terms warping and distortion,
as well as the opposite terms de-warping and geometric correction, to
refer to the same concepts.
4.2 Evaluation of spatial distortion
The first goal of this work is to evaluate the impact of lossy
co-decompression followed by geometric distortion correction on the final
image quality.
The underlying assumption is that the image is compressed and decompressed
before any geometrical correction is applied. This hypothesis is reasonable
in many practical systems for several reasons, including: the necessity of
keeping the complexity of the acquisition system low, the use of sensors
with embedded compression tools, and frequent changes of optical lens or
environment preventing the use of an embedded de-warping algorithm.
On the other hand, compression is increasingly used in the early stages of
acquisition, in particular for applications where the sensor is remotely
connected to the processing unit through narrow-bandwidth channels (e.g.,
wireless cameras) or is attached to a limited-capacity local storage device.
A spatial distortion in the acquisition system introduces a non-uniform
distribution of the visual information in the acquired image. As a matter
of fact, given two image areas with equivalent frequency content in the
undistorted domain, the corresponding areas in the acquired picture will
show a higher frequency content where spatial compression occurred, and
vice-versa. Conversely, the coding algorithm usually operates in a
homogeneous way over the whole image. To achieve effective data compression
it must discard some information, especially at the higher frequencies, and
it tries to produce an information loss that is as uniform as possible over
the whole image, in order to avoid local peaks in the distortion.
Figure 4.1: Graphical representation of a fish-eye distorted image: (a)
before, and (b) after de-warping.

Consequently, the error introduced by the encoder in an image region
will be proportional to the local spatial deformation. Where spatial
compression is present, the error will affect a larger zone in the final
corrected image, and will be more severe due to the presence of
higher-frequency content. On the other side, in areas with low information
density, the error will be attenuated by the averaging effect introduced
by the geometrical correction algorithms. Figure 4.1 depicts an example of
this phenomenon for a fish-eye lens, where the above concepts are clearly
illustrated. It can be observed that two areas of equal dimension in the
undistorted (or corrected) domain, represented in dark and light gray in
Fig. 4.1.b, are associated in the distorted domain with areas containing
more or fewer samples, according to their spatial position and to the
geometry of the acquisition system.
In order to quantify this effect, the idea is to compare two schemes
(see Fig. 4.2): in the former, labeled "scheme A", the acquired image is
compressed and transmitted after the geometric correction; in the latter,
"scheme B", compression and transmission are performed prior to the
geometric correction of the image. In both cases the distortion is
measured by
comparing the final result (the decompressed, de-warped image) with the
uncompressed, de-warped image, since the real-world (undistorted) picture
is unavailable in real cases.
A commonly accepted metric to estimate the distortion introduced by a
processing system is the Peak Signal-to-Noise Ratio (PSNR), which treats
the distortion as a kind of noise introduced on the original data, indepen-
dently of its origin. The noise power is estimated through the computation
of the Mean Square Error (MSE), and the signal power is computed on the
basis of the maximum excursion of the luminance function, namely:
\[ \mathrm{PSNR\,(dB)} = 10 \cdot \log_{10} \frac{2^{2b}}{\mathrm{MSE}} \tag{4.1} \]
where b is the number of bits per pixel in the original image. Usually
the PSNR is calculated over the whole image, but we are mostly interested
in local measures that can highlight the non-homogeneous distortion
introduced by scheme B as compared to scheme A. For the purpose of
evaluating the local distortion introduced by the process, we propose a
method, called QDM, which uses a quadtree decomposition to generate a
local map of the distortion effects. It will be demonstrated that the QDM
can be used to evaluate the performance of compression schemes applied to
geometrically distorted images, as well as to design optimized compression
schemes able to improve the overall coding performance. It should be
pointed out that the concept of QDM is independent of the use of PSNR as
a quality measure: QDM-based approaches can also be implemented using more
sophisticated perceptual error models, at the price of an increased
complexity [65].
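For reference, a small helper implementing equation (4.1), both globally
and on a local block, could look as follows (a sketch using numpy; the
local variant anticipates the block-wise measurements used by the QDM).

```python
# PSNR of equation (4.1), computed either globally or on a local block.
import numpy as np

def psnr(ref, dist, bits_per_pixel=8):
    """PSNR in dB with peak power 2**(2*b) over the MSE, as in (4.1)."""
    mse = np.mean((ref.astype(np.float64) - dist.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10((2 ** (2 * bits_per_pixel)) / mse)

def local_psnr(ref, dist, y, x, size, b=8):
    """PSNR restricted to the size-by-size block at position (y, x)."""
    return psnr(ref[y:y+size, x:x+size], dist[y:y+size, x:x+size], b)
```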
The QDM is based on the application of the well-known quadtree
decomposition algorithm [5]. Quadtree segmentation has been demonstrated
to efficiently represent simple image partitions subject to rigid geometric
constraints. In our approach, the quadtree decomposition is applied to the
error image Ierr(x, y), defined as the absolute difference, computed on a
pixel-by-pixel basis, between the reference image Iref(x, y) (i.e., the
geometrically corrected uncompressed image) and the output image
Idist(x, y) (the image after co-decompression and de-warping, in either
order):

\[ I_{err}(x, y) = \left| I_{ref}(x, y) - I_{dist}(x, y) \right| \quad \forall (x, y) \tag{4.2} \]

Figure 4.2: The two alternative compression and transmission schemes
considered in the estimation of the impact of geometrical distortion on
compression performance: (a) co-decoding is applied after geometrical
correction, (b) vice-versa.
The aim is to obtain a map representing the spatial distribution of
the distortion through local measurements of the PSNR. The areas where the
PSNR is considered homogeneous are those identified by the leaves of the
quadtree decomposition. The QDM algorithm is a recursive process, and
proceeds as follows (a code sketch is given after the parameter discussion
below):

i. Compute the variance σ² of the error image Ierr(x, y).

ii. If σ² is greater than a given threshold Σth, split the image into four
sub-images, halving its size along the x and y directions; otherwise, stop
the recursion.

iii. Recursively apply steps (i) and (ii) to each sub-image until each block
fulfills the variance condition defined at point (ii) or reaches a minimum
size ∆min.
The stop condition in point (iii) also takes into account a minimum
allowed dimension ∆min for each sub-image, to avoid excessive splitting: in
our tests, we used ∆min = 8, corresponding to the typical block size used
in coding standards. Σth is set equal to α · σ²_A, where σ²_A is the
variance of the error image in scheme A, and α is a parameter in the range
1–2 taking into account the type of spatial distortion and the
characteristics of the compression algorithm. In more detail, the choice of
α is connected to the distortion introduced by the acquisition device,
which largely depends on the viewing angle. For instance, the effect of
fish-eye lenses can be approximated by a spherical transform, in which the
distortion is distributed over large image areas, while not reaching very
high values. In this case, a low value of α (e.g., 1.2–1.5) is required to
achieve a precise QDM map. On the other side, the parabolic or conic
projections typical of mirrored lenses produce heavier distortions, thus
requiring higher values of α (1.6–2) to focus on greatly distorted areas.
Consequently, it has been found that it is possible to heuristically set α
a priori on the basis of the type of geometrical distortion, independently
of the image content. Further considerations on the setting of α are
provided in section 4.4 (tables 4.1, 4.2, 4.3 and 4.4 and the relevant
discussion), where the impact of the coding algorithm is also considered.
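A minimal sketch of the decomposition described above follows; the
returned leaf structure and the stand-in value used for σ²_A in the demo
are illustrative choices.

```python
# Sketch of the QDM quadtree decomposition: recursively split the error
# image while the local variance exceeds sigma_th, down to a minimum block
# size (8, as in the text). Leaf handling is an illustrative choice.
import numpy as np

def qdm(err, sigma_th, d_min=8):
    """Return the quadtree leaves as (y, x, height, width, variance) tuples."""
    leaves = []

    def split(y, x, h, w):
        block = err[y:y+h, x:x+w]
        var = float(np.var(block))
        # stop if the error is homogeneous enough or the block is minimal
        if var <= sigma_th or min(h, w) <= d_min:
            leaves.append((y, x, h, w, var))
            return
        h2, w2 = h // 2, w // 2                 # halve along x and y
        split(y,      x,      h2,     w2)
        split(y,      x + w2, h2,     w - w2)
        split(y + h2, x,      h - h2, w2)
        split(y + h2, x + w2, h - h2, w - w2)

    split(0, 0, err.shape[0], err.shape[1])
    return leaves

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ref = rng.integers(0, 256, (256, 256)).astype(np.float64)
    dist = ref + rng.normal(0, 4, ref.shape)
    err = np.abs(ref - dist)                      # error image of (4.2)
    alpha = 1.4
    var_a = 0.5 * float(np.var(err))              # stand-in for sigma^2_A
    print(len(qdm(err, sigma_th=alpha * var_a)))  # number of leaves
```

The gray-level QDM map discussed next can then be rendered by assigning to
each leaf an intensity proportional to its variance (or local PSNR).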
As a consequence of the above procedure, the areas where the error
fluctuates most are split, thus achieving a subdivision of the error image
into areas with nearly constant distortion. The result of the decomposition
is a sparse matrix that indicates the subdivision of the error image into
blocks of various dimensions, associated with different error values.
In figure 4.3, an example of QDM is shown, applied to the "blood" test
image (256×256, 8 bpp). Here, the distortion introduced by the acquisition
system is simulated by a polar coordinate transform (Fig. 4.3.a), which
reproduces the behavior of a 360° mirrored lens. The error is computed
between the reference image (uncompressed, de-warped) in Fig. 4.3.c and
the outputs of schemes A and B, in Figs. 4.3.d-e, respectively. A standard
JPEG encoder with compression ratio CR = 10 was used in both cases (the
co-decoded images in the warped and de-warped domains are shown in Figs.
4.3.b and 4.3.d, respectively), while the parameter α was set to 1.4. The
compression ratio CR is defined as:

\[ CR = \frac{N_{b,o}}{N_{b,c}} \tag{4.3} \]

where N_{b,o} is the number of bits required to represent the original
image in canonical form and N_{b,c} is the number of bits after compression.
Since the variance threshold Σth turns out to be higher than the error
variance in scheme A, the corresponding output image does not produce any
split. As far as scheme B is concerned, the result of the splitting process
is represented in Fig. 4.3.f. In Fig. 4.3.g, called the QDM map, each leaf
of the quadtree is associated with a gray level proportional to the local
distortion (the higher the distortion, the darker the corresponding block).
The QDM map of scheme B makes it evident that compression in the distorted
domain generates an uneven distribution of the error. To better appreciate
this fact, in Fig. 4.3.h the QDM map associated with scheme B is
transformed back into polar coordinates, i.e., into the original
acquisition domain. The resulting map provides a convincing confirmation of
the above reasoning about the implications of lossy compression applied to
geometrically distorted images. As a matter of fact, it can be observed
that the quality degradation progressively increases toward the image
center, where the information density is higher (due to spatial
compression).
It is important to point out that in the compression of natural images the
distribution of the error can fluctuate even in the absence of geometrical
distortions, due to the non-stationarity of the input image and to the
characteristics and parameters of the encoder. Nevertheless, this effect
can be neglected for two reasons.
First, the image content is the same for both schemes A and B, thus
allowing a comparative assessment. The underlying assumption is that the
effects of non-stationary image content and of geometrical distortions on
the error distribution are uncorrelated and additive. This is not
completely true in general, since a geometrical deformation can alter not
only the magnitude but also the orientation of the spatial frequencies
(e.g., straight lines become curves when acquired through a wide-angle
lens). Therefore, due to the different treatment of the spatial frequencies
at the encoder, the distortion can have some "second-order" effects on the
final result.

Figure 4.3: Example of application of QDM: (a) original, uncompressed,
warped by polar transform; (b) compressed in the warped domain; (c)
original, uncompressed, de-warped; (d) output of scheme A; (e) output of
scheme B; (f) result of the split process for scheme B; (g) QDM map for
scheme B; (h) polar transform of the QDM map for scheme B. Note that when
the split process is applied to scheme A (with the same parameters used
for scheme B), there is no split at all, and the QDM map is a
constant-value image.
Nevertheless, these phenomena are related more to the perceptual quality
of the decompressed image than to its objective assessment, and can
therefore be neglected in the QDM, which is simply based on absolute error
estimation.
Second, and more important, in practical applications the QDM is meant to
be computed off-line, by presenting to the system some pre-defined
calibration images, designed to match the application for which the
acquisition system is targeted. For instance, in a fixed-camera
surveillance system the calibration set could be obtained by selecting some
shots acquired in typical operating conditions, thus also taking the local
image content into account. On the contrary, to achieve a general-purpose
system, the calibration image should have a frequency content as uniform as
possible, to ensure a uniform behavior independent of the application.
According to this last model, in our tests we used images containing
statistical or structural textures, as in the case of the "blood" image, or
synthetic patterns obtained by patch repetition.
A further consideration about system calibration concerns the possibility
of computing the distortion map a priori, simply based on the
characteristics of the acquisition system. For instance, it would be
possible to determine the local compression and expansion due to the
geometrical deformation, and to directly estimate its impact on the
compression distortion. Unfortunately, this is not a trivial task, since
the deformation produces in general a re-sampling of the picture over an
irregular sampling grid, which in turn generates very different spatial
frequencies (both in magnitude and orientation). Moreover, such spurious freq