PhD Dissertation
International Doctorate School in Information and
Communication Technologies
DIT - University of Trento
Design principles for embedded
multimedia bitstreams transmission over
wireless links
Cristina E. Costa
Advisor:
Prof. Francesco G.B. De Natale
Università degli Studi di Trento
Co-Advisor:
Prof. Aggelos Katsaggelos
Northwestern University
February 22, 2005
Abstract
Applications supported by new wireless communication systems are evolving from voice and/or pure data transmission to multimedia. These services are characterized by the transmission of large amounts of data under real-time constraints, which demand a significant increase in the complexity and capability of transmitting devices. Wireless networks impose severe limitations on multimedia transmission, due to channel losses and the variability of channel characteristics; combined with limited resources such as energy and computational power, this motivates the search for different approaches to multimedia data transmission.
Multimedia bitstreams have unique characteristics that can be exploited to advantage if taken into account during the design phase of the system. In order to increase transmission robustness and flexibility, new approaches to compression and coding have been studied.
Among them, progressive coding is one of the most interesting techniques, because it allows the creation of embedded bitstreams. Such bitstreams can be used to implement SNR scalability, since they can be truncated and still decoded, generating a lower quality version of the original data.
In this thesis we investigate the use of embedded bitstreams in wireless transmission and propose various approaches. In particular, cross-layering techniques are considered for implementing energy-efficient coding and transmission.
Keywords: embedded multimedia coding, wireless transmission, cross-
layer, energy efficient coding, MPEG-4 FGS, JPEG2000, region of interest.
Contents

1 Introduction
1.1 Video source coding techniques
1.2 Scalability in image and video coding
1.3 Region of interest and non uniform compression
1.4 Transmission of multimedia bitstreams
1.5 Unequal Error Protection
1.6 Joint source and channel coding
1.7 Scope and main contributions

2 Progressive coding in video and image compression
2.1 Progressive scalability in the JPEG2000 image coding standard
2.1.1 Rate-Distortion information
2.2 Progressive scalability in the MPEG-4 video coding standard
2.2.1 FGS decoder simplification using post-clipping
2.2.2 FGS Advanced Features
2.2.3 Rate-Distortion model of the FGS bitstream
2.3 Applications of progressive coding

3 Wireless video and embedded bitstreams transmission
3.1 Joint source and channel coding in wireless transmission
3.2 Unequal error protection in progressive and scalable bitstreams
3.3 Cross-layer approaches
3.4 Joint source coding and power control
3.5 Modulation based UEP

4 Non uniform compression in image and video transmission
4.1 Nonuniform compression of geometrically distorted images
4.2 Evaluation of spatial distortion
4.3 Adaptive compression of geometrically distorted images
4.3.1 Adaptive compression using a JPEG-like scheme and QDM
4.3.2 Adaptive compression using JPEG2000 and QDM
4.4 Experimental results
4.4.1 Quality measurement
4.4.2 Non uniform compression using JPEG
4.4.3 Non uniform compression using JPEG2000
4.5 Conclusions

5 Interactive RoI selection using FGS in MPEG-4 video transmission
5.1 The use of RoI in video browsing
5.2 The proposed approach
5.3 Application testbed
5.4 Experimental results
5.5 Conclusions

6 Energy efficient transmission
6.1 Distortion in progressive and scalable bitstreams
6.2 A general optimization approach to energy constrained problems
6.3 Channel model
6.4 Application to image transmission
6.4.1 Simulation results for JPEG2000 transmission
6.5 Application to video transmission
6.5.1 Simulation results for FGS MPEG-4 video transmission
6.5.2 Rate-Distortion model of the FGS bitstream
6.6 Conclusions

7 Study of the effects of the modulation scheme choice
7.1 AWGN channel model
7.2 Solution for the AWGN channel model
7.2.1 Modulation scheme comparison
7.3 Combined use of energy based UEP and channel coding
7.4 Solution with RS coding
7.5 Modulation comparison with error correcting codes
7.5.1 Conclusions

8 Conclusions

Bibliography

A Detailed procedure
A.1 Defining the channel model
A.2 Dual problem
List of Tables

4.1 QDM statistics for JPEG encoder without adaptation, CR=10
4.2 QDM statistics for JPEG encoder without adaptation, CR=20
4.3 QDM statistics for JPEG2000 encoder without adaptation, CR=10
4.4 QDM statistics for JPEG2000 encoder without adaptation, CR=20
6.1 Image PSNR
6.2 General parameter settings
6.3 Parameter settings for the three experiments
7.1 Parameters a, α, and the spectral efficiency rb/BT for different modulations
List of Figures

2.1 FGS encoder block schema
2.2 Bit-plane encoding of the Enhancement Layer
2.3 FGS bitstream
2.4 FGS bitplane truncation
2.5 Comparison between the measured data and the R-D curve calculated from the BP data
4.1 Graphical representation of a fish-eye distorted image
4.2 Compression and transmission schemes considered
4.3 Example of application of QDM
4.4 Conceptual scheme of the proposed approach
4.5 QDM maps of a test image: (a) original achieved by patch repetition of the Baboon image; (b) QDM map for semi-spherical mirror; (c) parabolic mirror
4.6 Identification of a RoI from the QDM of a distorted image
4.7 Comparison between compression schemes at increasing compression ratio for JPEG
4.8 Comparison between compression schemes at increasing compression ratio for JPEG2000
4.9 Performance comparison on Blood image
4.10 Performance comparison on Tiled Baboon image
4.11 Performance comparison on Mobile and Calendar image
4.12 Performance comparison on a synthetic image generated by PovRay
5.1 Block scheme of the proposed method
5.2 Mobile and Calendar sequence frame with and without RoI enhancement layer
5.3 AIDER sequence frame with and without RoI enhancement layer
6.1 Total energy Etot and frame distortion (MSE) versus the power of the last packet
6.2 Probability of packet loss ρj versus the assigned power PL for the last 4 packets (j = L−3, .., L) for a frame in an FGS-coded video sequence
6.3 PSNR gain in dB vs. interference plus noise
6.4 Assigned power vs. packet number
6.5 Average size of the bit-planes for the Foreman sequence (QCIF)
6.6 Experiment A results
6.7 Experiment B results
6.8 Experiment C results
6.9 PSNR: (a) experiment A, (b) B and (c) C
6.10 Comparison between the PSNR obtained using measured data and the R-D model
7.1 PSNR comparison of the equal energy distribution method and the proposed scheme for different modulations
7.2 Average PSNR comparison of the equal energy distribution method and the proposed scheme for different modulations and energy budgets
7.3 Reed-Solomon RS(n, k) code
7.4 Random symbol block error performance for the RS(255, k) code
7.5 Performance curves for RS codes with CR = 0.92
7.6 Performance comparison between the proposed approach and equal energy distribution
7.7 Impact of different levels of error protection
7.8 Comparison among BPSK, FSK, and MSK modulation schemes in terms of PSNR at the receiver
7.9 Performance improvement deriving from the introduction of the RS(110, 100) code (BPSK, BT = 200 kHz)
Chapter 1
Introduction
Multimedia real-time transmission can be very challenging, due to variations in throughput, delay, and packet loss, and to limited resources. Video transmission in particular is very resource demanding, because of the joint effect of real-time requirements and the high volume of data to be transmitted.
Most concerns about multimedia transmission relate to compressing the data, transmitting it over the limited bandwidth offered by the channel, and protecting it in such a way that the decoded sequence is acceptable to the user. Indeed, video produces a huge amount of data that, without compression, is unmanageable, due to limitations in storage size and transmission bandwidth. Several compression approaches are currently available, but without appropriate countermeasures the generated bitstream can become very sensitive to errors. For this reason the newest video standards cover not only compression efficiency but also transmission related issues such as error resilience and scalability.
In section 1.1 a brief introduction to video and image compression techniques is given, while multimedia transmission issues are covered in depth in section 1.4.
1.1 Video source coding techniques
In video data, a certain amount of redundancy is present both in the spatial domain (as for still images) and in the temporal domain. Compression algorithms reduce this redundancy with various methods and eliminate information that the human eye cannot perceive. In video compression, spatial redundancy is reduced through methods similar to those used for image compression, based on the DCT or on wavelets. Temporal redundancy can be reduced by encoding the residual image resulting from the difference between the original image and its prediction, obtained by processing nearby decoded frames. The difference image is then encoded and transmitted, together with the side information necessary for generating the prediction image at the decoder side. This operation is usually referred to as motion compensation, because it tries to compensate for the motion between two or more frames, in order to find an image that can be considered a good prediction of the original one.
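To make the operation concrete, the following is a minimal sketch of how a motion-compensated residual could be formed for one block, using full-search block matching. It is an illustration under simplified assumptions (random placeholder frames, a hypothetical search window), not the scheme of any particular standard.

import numpy as np

def best_match(ref, block, by, bx, search=8):
    """Full-search block matching: find the motion vector (dy, dx) within
    +/-search pixels that minimizes the SAD against `block`."""
    h, w = block.shape
    best, mv = np.inf, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = by + dy, bx + dx
            if y < 0 or x < 0 or y + h > ref.shape[0] or x + w > ref.shape[1]:
                continue
            sad = np.abs(ref[y:y+h, x:x+w].astype(int) - block.astype(int)).sum()
            if sad < best:
                best, mv = sad, (dy, dx)
    return mv

# Placeholder frames: the previously decoded frame and the current original.
prev = np.random.randint(0, 256, (64, 64), dtype=np.uint8)
curr = np.random.randint(0, 256, (64, 64), dtype=np.uint8)

by, bx, B = 16, 16, 16                      # one 16x16 macroblock
block = curr[by:by+B, bx:bx+B]
dy, dx = best_match(prev, block, by, bx)    # side information (motion vector)
pred = prev[by+dy:by+dy+B, bx+dx:bx+dx+B]   # motion-compensated prediction
residual = block.astype(int) - pred.astype(int)  # the difference image that is coded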
In the algorithms implemented by video standards, frames are typically encoded in three different modes, usually called I-frame, P-frame and B-frame. I-frames, also called Intra frames, are coded without motion compensation; they allow synchronization and random access to the media, and are a reference point for motion compensated frames. P-frames, also called Inter frames, are encoded by applying motion compensation to the previous I- or P-frame. B-frames, also called Bidirectional frames, are encoded by applying motion compensation to both the previous and the successive I- or P-frames. This last type of frame has the interesting property that no other frame is encoded from its data.
More complex encoding algorithms also exist, such as 3-D wavelets and 3-D DCT, which take into account a number of frames at a time, performing in this way both spatial and temporal compression at the same time.
Currently various video standards exist:
• ISO MPEG-1, ISO MPEG-2, ISO MPEG-4
• H.263, H.26L, H.263+
• H.264/AVC
1.2 Scalability in image and video coding
The need for scaling arises in various situations, especially in multicast or non-live streaming, where the same content must be used by different users, each with different available resources. Indeed, in these cases the characteristics of the user's device or the bandwidth offered by the channel are not known in advance. To cope with these situations, it may be necessary to scale the transmission bitrate, the spatial resolution, or the computational complexity.
Even if scalability is mainly used to cope with variable channel bitrate, it can also be used to accommodate different display resolutions, computing capabilities, etc. Scalability techniques can be found for all types of multimedia data, and can be provided in various ways.
In video coding several forms of scalability exist, depending on which aspect of the decoded sequence is affected. The main types are temporal, spatial and SNR scalability. Hybrid approaches, which combine more than one type of scalability, also exist.
Scalability is usually implemented during the compression process. In the traditional approach, also called Layered Scalability, the encoder generates more than one bitstream: the most important is the Base Layer (BL), which can be decoded independently from the other layers, generating a low resolution version of the original sequence. The BL contains critical information because it is needed for the decoding of the subsequent
layers. These add information to the BL, and are called Enhancement
Layers (ELs). The number of ELs depends on how many scalability layers
are desired. When the BL is jointly decoded with one or more ELs, a higher resolution version of the video is generated (in terms of quality, spatial or temporal resolution, according to the scalable coding technique).
Temporal scalability generates different layers with increasing frame
rate. It is the most immediate form of scalability, since it can be per-
formed by dropping B-frames from the original bitstream. Another type of
scalability is based on spatial resolution and is useful in cases where the res-
olution of the display device is not known in advance. Finally, SNR (PSNR) scalability is also possible: the video quality increases with the number of layers decoded.
Scalability avoids encoding and maintaining different copies of the same video at the server side, and transmitting the same data multiple times: it is thus a valid alternative to simulcast transmission, since it removes the need to transmit multiple versions of the original data.
A more recent approach is embedded coding (also known as progressive scalability). Instead of using distinct layers, embedded coding implements scalability progressively within a single bitstream. Information is added as the bitstream is decoded, gradually increasing the resolution of the reconstructed data.
Forms of progressive scalability are present in image, video and even audio coding. It can be achieved using wavelet transforms or bit-plane coding. An introduction to progressive coding for both image and video is given in chapter 2.
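The defining property of an embedded bitstream is that any prefix of it is itself decodable, at a lower quality. This can be illustrated with a toy bit-plane coder; the sketch below is deliberately simplified and does not correspond to any standard algorithm.

import numpy as np

def encode_bitplanes(coeffs, n_planes=8):
    """Toy embedded coder: emit coefficient bits plane by plane, MSB first."""
    bits = []
    for p in range(n_planes - 1, -1, -1):         # most significant plane first
        bits.extend(((coeffs >> p) & 1).tolist())
    return bits

def decode_prefix(bits, n_coeffs, n_planes=8):
    """Decode any prefix of the stream: missing bits are treated as zero."""
    rec = np.zeros(n_coeffs, dtype=int)
    for i, b in enumerate(bits):
        plane = n_planes - 1 - i // n_coeffs      # bit-plane this bit belongs to
        rec[i % n_coeffs] |= b << plane
    return rec

coeffs = np.array([200, 13, 97, 5, 180, 44, 2, 160])
stream = encode_bitplanes(coeffs)
for cut in (8, 24, 64):                           # truncate at arbitrary points
    approx = decode_prefix(stream[:cut], len(coeffs))
    print(cut, np.mean((coeffs - approx) ** 2))   # MSE shrinks as the prefix grows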
Multimedia standards address scalability in various ways. For still image scalability, the JPEG2000 and MPEG-4 VTC standards offer wavelet-based embedded scalability.
For video, some form of scalability exists in all the most recent image and video standards:
• MPEG-2 and H.262 implement temporal, spatial and SNR layered scalability.
• MPEG-4 includes coding modes that allow layered scalability (temporal, spatial, SNR), object scalability and progressive SNR scalability, also known as Fine Granular Scalability (FGS).
• H.263+ implements temporal scalability using B-frames.
• H.264 implements temporal scalability using B-frames.
• MPEG-21 video, currently under development, should in the future include scalability features.
Only the MPEG-1, H.261 and H.263 video standards do not include any form of scalability.
A technique similar to scalability is multiple description (MD) coding. It has been included in the H.263+ standard and allows the sequence to be encoded into two equally important bitstreams. Each bitstream, when decoded independently, generates a low resolution version of the encoded sequence; when the two are decoded together, a high quality version is obtained. MD coding is often proposed for transmission over error prone networks where data can take different paths to its destination: if one path fails, part of the data can still be received and decoded.
1.3 Region of interest and non uniform compression
Visual data represented in an image or video sequence may not be equally important to the user. One reason is that the human eye usually focuses on the part of the visual information that is most relevant from a semantic point of view. Consider, for example, an anchorman speaking in a TV news program: the user is more interested in the face of the person, and in particular in the mouth and eyes, than in the background. Another example is environmental imagery, where areas with intense weather activity, such as a hurricane's eye, are certainly the most important for the interpretation of the image.
Commonly, a region of an image or video that contains more information for the user is called a Region of Interest (RoI). A RoI usually delimits an area containing information necessary for the user to correctly interpret the visual data. Rectangular RoIs are often preferred because they are easier to encode, but a RoI can be of any shape; in practice, its shape is limited by the characteristics of the coding algorithm. In the same image or video more than one RoI can exist, with various degrees of importance, and a RoI can be segmented into sub-RoIs if some areas are more important than others.
Since some regions are more visually important than others, loss and inaccuracy during coding and transmission are better tolerated outside the RoI. This can be exploited through non-uniform lossy compression, a technique used in video/image compression to reserve more coding resources for the RoI while allowing a worse quality in the background. The idea is to obtain a non-uniform quality in the image through non-uniform compression, thereby reducing the amount of data to be transmitted or improving the perceived quality of the data.
Various techniques in video/image coding allow RoI definition, and they have been introduced in the most recent standards. JPEG2000, a recent image compression standard based on the Discrete Wavelet Transform (DWT), was the first standard to introduce RoI definition: it is possible to specify a RoI during the coding phase in order to implement non-uniform compression. Another possibility
is to specify the RoI during the decoding phase, allowing the user to specify
the RoI a posteriori. This feature, in combination with the JPIP communication protocol, allows a selective retrieval of the image and extends the use of the RoI to a scalability tool.
RoIs can also be implemented in video coding, for example through the object concept in the MPEG-4 video coding standard or through scalability tools like Selective Enhancement in MPEG-4 FGS.
In chapters 4 and 5 examples of the use of RoI and non uniform com-
pression for transmission are presented.
1.4 Transmission of multimedia bitstreams
As far as the channel is concerned, video transmission requires a stable and robust channel, high bandwidth (even compressed, the amount of data is still significant), and low delay and jitter.
Compressed multimedia data, and in particular video, is highly sensitive to transmission errors. In compressed sequences, temporal and spatial predictive coding allows errors to propagate, and VBR coding generates peaks of data where the encoder finds the content hard to compress (for example, in the presence of rapid movement or complex scenes).
Real-time video transmission, or streaming, is quite different from file transfer for several reasons. In file transfer, a file must be transmitted over the network in its entirety, because it can be used only when it is completely received; if even a minimal part of the file is missing or damaged, the whole file is compromised. In video streaming, the user can start decoding and viewing the data before the entire encoded sequence has been transmitted, or even, in the case of live streaming, encoded. This approach to data transmission imposes tighter constraints on transfer rate, delay and error resilience.
Unlike ordinary data, video data (as well as other multimedia data) can still be used even if some data is lost or missing. Transmission losses do not always compromise the entire sequence, and the eye can compensate for and tolerate some errors. Moreover, error control tools that allow resynchronization, error recovery and concealment help to ease the task.
Various error control tools exist to cope with channel fading, packet losses and transmission errors. These can be implemented at the encoder or at the decoder side. So-called error resilient encoding falls into the first case; such techniques follow different approaches, including adding redundancy to the bitstream, allowing resynchronization, or dividing the data into independently decodable sections.
For example, they can be based on the spatial position of the errors,
trying to isolate them in a limited portion of the image or of the bitstream.
This can be achieved with resynchronization markers or data partitioning.
In data partitioning, data is organized in the bitstream so that important
data is grouped together and isolated. Other techniques are based on temporal characteristics of the encoded sequence, and involve the insertion of intra coded blocks or frames, either at random or driven by a criterion such as minimum distortion. Tools for error resilient encoding are present in the most recent standards, such as H.263, H.264 and MPEG-4.
At the decoder side it is always possible to use error detection and concealment techniques to recover from transmission errors [72]. The aim of error concealment is to exploit knowledge of the human visual system and common properties of visual data to reconstruct the missing bits, reducing as much as possible the perceived effects of losses. Concealment algorithms have to mediate between computational complexity and effectiveness, since video has strict timing constraints and does not tolerate delays. Visual standards do not define how to conceal transmission
errors, but give the decoder designer the freedom to choose a concealment approach appropriate to the system resources and requirements.
Finally, it is always possible to use mixed techniques that involve both encoder and decoder, for example including some form of interactivity between the two, based on feedback messages about received or lost data. Techniques that require the exchange of control messages between encoder and decoder are usually suitable for point-to-point transmission, but not always for point-to-multipoint scenarios.
In [32], the authors present a review of several channel-adaptive video streaming techniques that, employed in different components of the system, provide efficient, robust, scalable and low-latency streaming video.
A review of the technical challenges of video streaming and of approaches to solving them is given in [46], while Zhang et al., in [74], give a good overview of challenges and approaches in transporting real-time video over the Internet.
1.5 Unequal Error Protection
Unequal Error Protection (UEP) of bitstreams is implemented when different error resilience strategies are used to protect different parts of the same multimedia bitstream.
UEP approaches can be implemented using different techniques. Typically they consider the characteristics of the encoded video when deciding the protection strategy to adopt, because in multimedia bitstreams not all data is equally important. It is then a good idea to give more protection to the data that is more important for the decoding process, or whose protection minimizes distortion. By combining UEP with encoding strategies such as data partitioning or scalable coding, different solutions can be found.
In video coding, for example, I-frame reception is critical, because predictive coding techniques cause errors to propagate, while the loss of a B-frame creates an isolated error, not visible in subsequent frames. A UEP strategy can be defined by differentiating the transmission of I- and P-frames from that of B-frames, for example by adding error correction codes of variable strength to the transmitted data. Data partitioning can be combined with UEP to protect important data, such as motion vector information, more heavily. It is also possible to apply UEP to layered scalable bitstreams, using techniques that differentiate the protection applied to different layers [35]. Another possibility is to combine RoI based encoding with UEP.
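As a toy illustration of this idea, the sketch below maps each frame type to a different block code rate, spending more redundancy on the frames whose loss propagates. The (n, k) pairs and the GOP pattern are hypothetical, chosen only to show the bookkeeping, not taken from any experiment in this thesis.

# Minimal sketch of frame-type based UEP, assuming (n, k) block codes whose
# redundancy we are free to choose per frame type; the rates are illustrative.
RS_PARAMS = {
    "I": (255, 191),   # strongest protection: I-frame errors propagate through the GOP
    "P": (255, 223),   # intermediate: P-frame errors propagate until the next I-frame
    "B": (255, 251),   # weakest: a lost B-frame is an isolated error
}

def protection_overhead(frame_types):
    """Total parity overhead (code symbols per data symbol) implied by the
    per-frame-type protection choices."""
    overhead = 0.0
    for ftype in frame_types:
        n, k = RS_PARAMS[ftype]
        overhead += (n - k) / k
    return overhead

gop = ["I", "B", "B", "P", "B", "B", "P", "B", "B", "P"]  # hypothetical GOP pattern
print(f"average parity per data symbol: {protection_overhead(gop) / len(gop):.3f}")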
1.6 Joint source and channel coding
Even if encoders include a great number of tools, including error resilience ones, the main task of source coding is to reduce the bit size of the data, using techniques that eliminate spatial and, for video sequences, temporal redundancy. On the other side, channel coding introduces redundancy to protect data from channel errors and packet losses. Forward Error Correction (FEC) codes are mainly used for this purpose, and among them the most popular are the Reed-Solomon codes, which allow the correction of up to a certain number of errors within a block of symbols. When the transmission is packet based, the problem is to cope with packet losses, and FEC is usually applied across packets.
Shannon's source and channel coding theorem (C. E. Shannon, "A mathematical theory of communication," Bell System Technical Journal, vol. 27, pp. 379-423, 623-656, 1948) states that, under certain conditions, source and channel coding in a communication system can be optimized independently. This important theorem is the foundation of the design of many communication systems; however, the hypotheses needed for its validity can become very restrictive for recent communication systems, and especially for video transmission. First, the theorem assumes that it is possible to use codewords of infinite length, which implies allowing infinite delay in transmission, a restrictive hypothesis for real-time transmission. A second requirement is to consider only point-to-point transmissions.
Methods that jointly consider source and channel coding can be used when Shannon's theorem does not hold. Instead of applying source and channel coding as two independent steps, they are considered and optimized together, in order to better exploit during transmission the knowledge arising from the coding process. This approach, commonly known as joint source and channel coding (JSCC), is usually implemented at the application layer. A great number of JSCC schemes have been studied for both image and video transmission; in chapter 3 an introduction to the use of these techniques in wireless transmission is given.
1.7 Scope and main contributions
In this thesis we present some approaches for embedded image and video transmission over wireless links.
An introduction to progressive coding is given in chapter 2, while in chapter 3 existing techniques for the transmission of progressively coded bitstreams over wireless networks are presented.
In chapter 4 we introduce the use of non uniform compression for image transmission, while the use of RoI and embedded coding for interactive transmission of video is discussed in chapter 5.
In chapter 6 we introduce a general approach to unequal error protection of embedded bitstreams based on energy management. The proposed method allows the optimization of the energy distribution among the packets, in
order to minimize the distortion or the energy consumption, while in chap-
ter 7 the efficiency of the method is compared for different modulation
schemes, and with or without channel coding.
Chapter 2
Progressive coding in video and image compression
Multimedia data can be coded with techniques that generate embedded bitstreams. Scalability allows a video sequence to be encoded in such a way that the compressed video can accommodate different bitrates. The progressive coding approach differs from traditional layered methods because the set of possible rates varies in a nearly continuous way.
The main characteristic of progressive scalability is its capability to achieve a smooth transition between different bit rates, since the enhancement information can be truncated at any point in order to achieve the desired target rate and still be decoded correctly.
Indeed, in pure embedded bitstreams there are no distinct layers, as there are in traditional layered coding. In the traditional approach, scalability is achieved by coding the data into separate layers, starting from the Base Layer (BL), which contains the essential information, and then generating one or more Enhancement Layers (ELs) with additional data. In progressive coding, scalability is instead achieved through the direct truncation of a single bitstream.
When decoded, these bitstreams progressively add resolution data to
the recovered image or sequence. During the decoding, the process can be
interrupted at any point, and the data decoded up to that point can be
interpreted as a low resolution version of the fully decoded data.
Progressive scalability allows the quality of the decoded video to improve gradually, making it useful in applications like video browsing or remote access to video servers, particularly when dealing with narrowband channels (as in mobile applications).
This encoding method can be employed with success in the field of
video communication, allowing real-time stream processing able to adapt
the bitstream to the channel bandwidth. In the context of rate control,
progressively coded bitstreams can be used for obtaining fine granular data
representations at lower bitrates, since these bitstreams have the property
of allowing different spatial/quality resolutions depending on the amount
of data being transmitted and decoded.
Progressive coding also inherently allows complexity scalability and easy
resource adaptation depending on the capabilities of video devices. From
the transmitter’s point of view, this means that the same bitstream can
accommodate the different bitrates needed for sending video data to users
on networks with heterogeneous capacity. The receiver can decide to de-
code only the amount of data supported by its own resources (i.e. memory,
computation power etc.).
The most popular progressive coding implementations are based on
wavelet transforms and/or bitplane coding. These techniques enable the
progressive coding of image, video and even audio data. For image cod-
ing, wavelet-based coding techniques, like those used in SPIHT [56] and EBCOT [62], can be used. These techniques differ in how the compression
is achieved, but all of them can generate progressively coded bitstreams.
In particular, wavelets were exploited by the newest image compression
standard, JPEG2000 [4][64], which is based on the EBCOT paradigm and not only delivers state-of-the-art compression performance, but is also flexible enough to accommodate tools for the implementation of regions of interest (RoI), perception-based quality optimization, and quality layers.
Wavelets can also be used in video compression: 3-D wavelet coding schemes, such as 3-D SPIHT [39], can be used to obtain embedded bitstreams of video data. These techniques group together a sequence of frames and apply the 3-D wavelet transform to them, allowing both temporal and quality scalability.
Another important approach is Fine Granular Scalability (FGS), which was recently included in the streaming profile of the MPEG-4 standard, Part 2 [45][5].
The most recent video standard, H.264/AVC, implements only temporal scalability, using B-frames, but a special committee (known as SVC, Scalable Video Coding) is evaluating the possibility of adding progressive coding.
Finally, progressive coding is also possible in audio coding [43][47], and progressive coding techniques have been added to the MPEG-4 Audio standard [3].
2.1 Progressive scalability in the JPEG2000 image coding standard
To create the embedded bitstream, the JPEG2000 baseline compression
scheme [64] starts from a partitioning of the image into rectangular regions
called tiles, to each of which a discrete wavelet transform (DWT) is applied.
The DWT generates several wavelet sub-bands, which are divided for cod-
ing purposes into several smaller blocks called codeblocks. Each codeblock
is then independently quantized and bitplane encoded, thus achieving an
embedded bitstream at codeblock level. Codeblocks are then grouped to-
gether to form precincts.
From the error resilience point of view, when the decoder detects an
error in the codeblock data, it typically discards all the successive data
related to this codeblock. This produces a decoded codeblock equivalent
to the one generated by an encoder using a coarser quantization parameter.
As far as quality scalability is concerned, JPEG2000 creates, at encoding time, a certain number of Quality Layers (QLs). They are formed in such a way as to accommodate different coding rates and qualities in the same bitstream. The user can decide how many QLs to implement and the coding rates they must achieve. Each QL progressively accommodates a given number of bits from each precinct. The contributions from each precinct are chosen by the encoder so as to minimize the distortion at the target rate. Each quality layer thus progressively reduces the distortion of the decoded image in an optimal way in the rate-distortion sense. If the number of layers is large enough, the distortion associated with the bitstream truncated at an arbitrary point will be close to the optimal one. In general, a layer is completely decodable only if all the preceding layers have been received; the first layer is therefore fundamental for the decoding of the entire bitstream, and the importance of a layer decreases as we go from lower to higher layers.
Rate-distortion statistics for each QL are generated by the encoder during encoding. If the set of QLs is sufficiently dense, the rate-distortion curve of the original image can be constructed from the statistics obtained for each QL.
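As an illustration of how such statistics can be used, the sketch below selects the longest affordable prefix of quality layers for a given byte budget; since a layer is decodable only if all the preceding layers are available, only prefixes are considered. The per-layer (rate, distortion) figures are invented.

# Sketch of budget-driven quality-layer selection, assuming the encoder has
# reported cumulative (rate_in_bytes, mse) pairs per quality layer.
quality_layers = [
    (2_000, 180.0),   # QL 1: fundamental, needed by all the others
    (6_500, 60.0),    # QL 2
    (15_000, 22.0),   # QL 3
    (40_000, 7.5),    # QL 4
]

def layers_within_budget(layers, budget_bytes):
    """Keep the longest prefix of layers whose cumulative rate fits the budget."""
    chosen = []
    for rate, mse in layers:
        if rate > budget_bytes:
            break
        chosen.append((rate, mse))
    return chosen

kept = layers_within_budget(quality_layers, budget_bytes=16_000)
print(f"layers sent: {len(kept)}, expected MSE: {kept[-1][1] if kept else 'n/a'}")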
2.1.1 Rate-Distortion information
Most unequal error protection algorithms require the operational distortion-rate curve of the source coder for the original images. A general R-D model, valid for progressively coded images (e.g. JPEG2000), is presented in [14]. The authors propose the use of parametric models instead of the true D(R) curves for wavelet-based embedded image and video coders. This model is also used in [61].
In JPEG2000, rate-distortion data can also be collected directly during the encoding phase.
2.2 Progressive scalability in the MPEG-4 video coding standard
MPEG-4 FGS (Fine Granular Scalability) is a video coding approach that introduces quality scalability into the encoded video. It uses a mixed implementation of layered scalability and bit-plane coding to obtain two bitstreams, commonly called the Base and Enhancement Layers.
The Base Layer (BL) contains essential information about the sequence
and can be decoded independently from the Enhancement Layer (EL),
producing a low quality reconstruction of the video sequence. A higher
quality reconstruction can be then achieved by decoding both the Base
and Enhancement Layers together. Since the EL is progressively coded, it
can be used to gradually add information and detail to the BL.
Due to its structure, the EL can be truncated at any point and still be used to add information to the decoded BL. FGS's inherent scalability and flexibility also allow complexity scalability and easy resource adaptation depending on the capabilities of video devices. Thus FGS is suitable
for video conferencing and video multicast. An interesting overview of
applications enabled by FGS technology is given in [67].
Part 2 of the MPEG-4 standard includes FGS encoding and a hybrid method that combines FGS with temporal scalability (also called FGST). Advanced MPEG-4 FGS tools are Selective Enhancement, Frequency Weighting, and synchronization markers for improving error resilience.
In FGS the base layer (BL) behaves as a normal compressed bitstream (like MPEG-4 Simple Profile), while the difference between the decoded video sequence and the original video sequence is encoded in the Enhancement Layer (EL) (Fig. 2.1).
Figure 2.1: FGS encoder block schema (base layer encoding loop with DCT, quantization, inverse quantization, IDCT, motion estimation/compensation, frame memory and VLC; FGS enhancement layer encoding with clipping, DCT, find maximum, bit-plane shift and bit-plane VLC).
Progressive decoding is achieved by bit-plane coding of the DCT of the residual image: the frame data is transmitted starting from the most significant bit-plane (MSB) down to the last one (LSB). The DCT is performed on a block basis, as for the base layer, but the coefficients are bit-plane coded after zig-zag scanning (Fig. 2.2).
The data of each bit-plane (BP) is grouped on a macroblock basis and sent one MB at a time, starting from the upper left corner.
Usually the most significant BP is the smallest, and BP size increases going from MSB to LSB (Fig. 6.5).
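To make the scanning and "find maximum" steps concrete, the following sketch applies a JPEG-style zig-zag ordering to one 8x8 block of residual DCT coefficients and locates the most significant non-zero bit-plane; the coefficient values are random placeholders, not output of any encoder.

import numpy as np

def zigzag_indices(n=8):
    """(row, col) pairs in JPEG-style zig-zag order for an n x n block:
    anti-diagonals in turn, alternating the traversal direction."""
    return sorted(((r, c) for r in range(n) for c in range(n)),
                  key=lambda rc: (rc[0] + rc[1],
                                  rc[0] if (rc[0] + rc[1]) % 2 else rc[1]))

def max_bitplane(coeffs):
    """'Find maximum' step: index of the most significant non-zero bit-plane."""
    peak = int(np.abs(coeffs).max())
    return peak.bit_length() - 1 if peak else 0

block = np.random.randint(-64, 64, (8, 8))          # placeholder residual DCT block
scanned = np.array([block[r, c] for r, c in zigzag_indices()])
msb = max_bitplane(scanned)
# Bit-planes of |coefficient| are then emitted from `msb` down to 0,
# with signs sent when a coefficient first becomes significant.
planes = [((np.abs(scanned) >> p) & 1) for p in range(msb, -1, -1)]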
Figure 2.2: Bit-plane encoding of the Enhancement Layer.
Figure 2.3: FGS bitstream.
Figure 2.4: FGS bitplane truncation.
It is likely that the first bit-plane mostly contains information from those MBs that the BL found more difficult to compress, such as those containing movement. The first BP will also contain a great number of zeros, due to the small residual error in the other MBs.
Not all the data in the EL has the same importance: data from the most significant bit-planes is necessary for the decoding of the following ones, and it also carries more information with respect to the last bit-planes.
Due to its structure, the EL can be truncated at any point and still be decoded. Since truncation can happen anywhere within the EL of a frame, half a frame may be coded with a better quality than the rest of it, when the EL is truncated in the middle of a BP for that frame. This is more likely to happen in the LSBs, since the MSBs contain a lot of zeros and implicitly give more importance to certain MBs with respect to others, providing a sort of quality priority.
FGS allows rate control on pre-encoded sequences simply by truncating the EL so as to satisfy the required bit budget.
While the Base Layer is compressed at a maximum bit rate RB, chosen so that it can always be transmitted over the channel, the Enhancement Layer can be cut in such a way that the FGS coded video can be transmitted at any bit rate greater than RB (and lower than a certain RE, which depends on the number of BPs used in the EL), fully utilizing the bandwidth available at transmission time. In this way it is possible to adapt the coded video to the time-varying conditions of the channel.
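The rate-control step then reduces to a per-frame byte budget computation. Below is a minimal sketch, with invented frame rate, base layer rate and channel estimates, of cutting the EL to fill whatever bandwidth exceeds RB.

# Sketch of FGS rate control by EL truncation. R_B is the (fixed) base layer
# rate; everything above it, up to the size of the EL, is filled with EL bytes.
# All numbers are illustrative, not measurements.
FPS = 25
R_B = 128_000      # base layer bit rate (bits/s), always transmittable

def el_bytes_for_frame(channel_rate_bps, el_size_bytes):
    """How many EL bytes of this frame fit in the current channel rate."""
    spare_bps = max(0, channel_rate_bps - R_B)        # bandwidth left after the BL
    budget = (spare_bps // FPS) // 8                  # per-frame EL byte budget
    return min(el_size_bytes, budget)                 # cannot send more than exists

el_stream = bytes(9000)                               # one frame's EL (placeholder)
for estimate in (160_000, 300_000, 2_000_000):        # time-varying channel estimates
    cut = el_bytes_for_frame(estimate, len(el_stream))
    sent = el_stream[:cut]                            # truncated EL is still decodable
    print(estimate, "->", cut, "EL bytes")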
FGS can also be used in multicast environments, allowing the transmis-
sion of the same compressed video to users with different requirements.
2.2.1 FGS decoder simplification using post-clipping
In Fine Granular Scalability, the residual image from which the enhancement layer is created can be computed using a pre-clipping or post-clipping structure [54].
In a pre-clipping structure the residue is computed directly in the DCT domain from the difference between the original DCT coefficients and the quantized ones (obtained during BL coding). In this case the EL, during decoding, depends on intermediate results of the BL decoder. In streaming, VOLs can arrive at different times, and cross dependencies between the BL and EL decoders (such as the use of intermediate data) may restrict decoder implementation options.
To decouple EL from BL decoding, a post-clipping approach must be used: the residue is calculated from the difference between the decoded BL VOP and the original one, without using any intermediate information.
In MPEG-4, FGS is implemented using a post-clipping coding scheme [54] because it presents implementation advantages. Indeed, in this kind of scheme the base and enhancement layers are decoupled, and the residue can be computed directly in the spatial domain. Various decoder implementations are possible: a sequential decoder, using the same hardware for both the BL and EL VOPs; a parallel decoder, using dual hardware that operates on different VOPs (base/enhancement); or a pipelined decoder.
2.2.2 FGS Advanced Features
Advanced features in FGS help to improve visual quality, usability, and
error resilience.
Fine-granular temporal scalability
Fine-granular temporal scalability (FGST) is a hybrid SNR-temporal scalability that allows a trade-off between individual frame quality and temporal resolution.
FGST can be implemented with little added complexity, allowing a lower transmission bitrate to be obtained both by slowing down the sequence frame rate and by reducing the quality in terms of PSNR.
In MPEG-4, FGST is implemented in two modes: as a single layer scalability structure, referred to as FGST, and as a two layer structure, referred to as FGS-FGST.
Selective Enhancement
Selective Enhancement (SE) is implemented at the frame level and allows the bit-plane coding order to be arranged based on region selection. The region of interest (RoI) considered can be arbitrarily shaped, with the macroblock as its shape unit. This method allows more bit-planes to be transmitted for the RoI macroblocks, as sketched below.
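A minimal sketch of the underlying mechanism follows, assuming SE is realized by shifting the residual data of RoI macroblocks up by a fixed number of bit-planes so that it migrates into more significant planes and is transmitted earlier; the macroblock values and the shift factor are placeholders.

import numpy as np

SHIFT = 2  # hypothetical selective-enhancement shift, in bit-planes

def apply_selective_enhancement(mb_residuals, roi_mask, shift=SHIFT):
    """Left-shift the residual magnitudes of RoI macroblocks so that their
    bits land in more significant bit-planes and are transmitted earlier."""
    enhanced = []
    for residual, in_roi in zip(mb_residuals, roi_mask):
        enhanced.append(residual << shift if in_roi else residual)
    return enhanced

# Four macroblocks' worth of (absolute) residual coefficients, invented.
mbs = [np.array([3, 1, 0, 2]), np.array([12, 5, 1, 0]),
       np.array([2, 0, 0, 1]), np.array([7, 3, 2, 1])]
roi = [False, True, False, True]            # the two RoI macroblocks
shifted = apply_selective_enhancement(mbs, roi)
# After the shift, the RoI macroblocks dominate the most significant
# bit-planes, so a truncated EL favors their quality.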
SE is a tool for encoder optimization and can be used for a number of operations, such as region-based quality adjustment or object tracking, and can be combined with frequency weighting operations (see the next paragraph).
Automated RoI selection combined with SE can be implemented in various ways. An example of how the SE tool can be used to improve the perceived quality of a video conference stream is given in [68]. In that work, SE is combined with a real-time face detection algorithm, improving the subjective visual quality of streaming video at various transmission bit-rates.
A more complex example is given in [36], where an automated coding mode selection is implemented. The encoder, based on the video content
and the currently available bandwidth, selects among the available coding schemes (FGS, FGST, and their SE variants) in order to achieve higher perceptual video quality. SE and background selection are based on the contents of the video sequences.
Even though the successor of MPEG-4, H.264, implements neither FGS nor SE, it is interesting to see how H.264 features are used to automatically select the visually important regions to be used as SE regions in a non-standard H.264 FGS implementation [66]. The method requires low computational complexity and can be used in real-time transmissions.
Frequency Weighting
The frequency weighting (FW) method has been included in the MPEG-4 standard to allow the prioritized transmission of low frequency DCT coefficients. When different frequencies are treated equally and the precision is limited by FGS bit-plane truncation, flickering artifacts can occur in certain sequences. This happens when high-frequency residues are added to a low quality, blocky BL.
Using the FW tool, DCT frequencies can be weighted on the basis of their different psychovisual importance. The approach is used to give more precision to low frequencies, and it is similar to the use of customized quantization matrices in the BL. To apply FW correctly, separate weighting matrices must be used for I-frames and P-frames to cope with their different statistics, since the former are applied to the residue of quantization, and the latter to the residue of motion estimation.
An example of the application of FW can be found in [53], where the authors use different FW matrices in order to improve the FGS visual quality. The FW matrix is chosen automatically, depending on the video sequence characteristics, and succeeds in improving the visual quality of the decoded sequence.
Error Resilience
Standing that no temporal error propagation can occur in the FGS EL,
since no prediction scheme is implemented in this layer, the more significant
problem is maintaining the synchronization and being able to decode as
much as possible of sequential data.
In FGS data partition is used as an error resilience tool in order to
improve the quality of transmission of EL over error prone channels. Re-
synchronization markers are used considering bit-plane relations.
The error resilience of FGS video streaming is studied in [70][78]. An
improvement attempt is described in [77] where an Header Extension Code
(HEC) is proposed.
2.2.3 Rate-Distortion model of the FGS bitstream
It is possible to define a rate-distortion (R-D) model of the EL based on the statistics collected during encoding. The rate-distortion model can be derived either from empirical considerations or from analytical calculations. An interesting analysis of the FGS EL is given by Loguinov and Radha in [26][24], where a distortion model is also defined.
Depending on the application, a simple R-D curve obtained from the R-D data measured at each bitplane can be used. Indeed, experimental measurements show that the R-D curve for the FGS EL is approximately linear within a bitplane [82][81]. This is reasonable if we consider that inside a bitplane the distortion improves gradually as bitplane information is added one MB at a time, and that the statistical properties of a bitplane are constant within the bitplane itself. From the R-D data measured for each BP, we can then obtain a good approximation of the R-D curve (Fig. 2.5). We recall that the R-D data can also be easily calculated in the frequency domain, as highlighted in [25].
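A piecewise-linear model of this kind is easy to build from the per-bitplane measurements. In the sketch below the (rate, MSE) breakpoints are invented for illustration; a real model would use the values measured at each BP boundary during encoding.

# Sketch of the piecewise-linear R-D model: interpolate linearly between the
# (cumulative rate, MSE) points measured at each bit-plane boundary.
bp_points = [(0, 110.0),        # BL only
             (4_000, 70.0),     # after BP 1
             (11_000, 38.0),    # after BP 2
             (22_000, 16.0),    # after BP 3
             (38_000, 6.0)]     # after BP 4

def distortion_at(rate):
    """Estimated MSE when the EL is truncated at `rate` bits."""
    for (r0, d0), (r1, d1) in zip(bp_points, bp_points[1:]):
        if r0 <= rate <= r1:
            t = (rate - r0) / (r1 - r0)
            return d0 + t * (d1 - d0)   # linear within the bit-plane
    return bp_points[-1][1]             # beyond the last BP: full-EL distortion

print(distortion_at(15_000))            # a truncation point inside BP 3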
Figure 2.5: Comparison between the measured data and the R-D curve calculated from the BP data (distortion in MSE versus rate in bits; curves for the R-D model, the measured data, and the measured data at each BP, with the BL and BPs 1-4 marked).
Cuetos et al. collected interesting FGS statistics in their work [27],
where they present a publicly available library of frame size and quality
traces of long MPEG-4 FGS encoded videos.
2.3 Applications of progressive coding
Progressive coding provides easy bandwidth adaptability because it allows the encoding process to be separated from the transmission. Indeed, the encoder does not need to know the bitrate at which the bitstream will be transmitted. Moreover, the receiver can, if necessary, decode only part of the transmitted data, according to its own computational capabilities. The same transmitted bitstream can thus be used by different users or appliances, according to their own needs and resources.
Progressive coding can be used as a rate control tool for pre-encoded video. Indeed, to cope with the bandwidth variations often present in wireless links, some sort of rate control must be adopted. Traditionally, in real-time non-scalable coding and transmission, the data bitrate is adapted on the fly to the available bandwidth during coding, in order to adapt the bitstream to the changing conditions of the channel. If the data is already compressed, this approach is not possible, and transcoding techniques may be adopted. These methods create a lower bitrate version of the video directly from the compressed bitstream, without going through a computationally intensive decompression and re-compression process. An alternative to this approach is switching at transmission time between pre-encoded bitstreams, using simulcast techniques.
In this context, scalable bitstreams represent a good solution for the rate control of pre-encoded sequences, since they allow the content to be encoded once but transmitted at different bitrates. Traditional layered bitstreams permit transmission at different, but fixed, bitrates. A further enhancement is given by the use of embedded bitstreams to perform fine granular rate control. In [52] Radha and Parthasarathy present two optimal (in a rate-distortion sense) rate-control algorithms for FGS scalable video transmission.
An interesting review of FGS applications is given by van der Schaar et al. in [67]. The paper refers to FGS coding, but its considerations apply as well to other scalable coding techniques.
Chapter 3
Wireless video and embedded bitstreams transmission
Wireless transmission is becoming popular in both home and office contexts, and the number and variety of emerging applications is growing. Residential WLANs are growing in popularity, and wireless hot spots can be found in major airports, hotel chains and conference rooms.
The term wireless transmission commonly covers frameworks related both to mobile/cellular services and to wireless data networks. WLAN standards include IEEE 802.11, HIPERLAN, Bluetooth, NMAC, etc. Mobile transmission includes GPRS, 3G, UMTS, EDGE and the coming 4G technologies. It is realistic to think that, in the future, these two frameworks will merge.
As Girod and Färber highlight in [31], the challenges of multimedia data transmission go well beyond the problems related to poor bandwidth. In wireless networks, errors during transmission cannot be avoided, due to the very nature of the medium, even when error correction techniques are implemented. The authors argue that the only practicable solution is to achieve a compromise between reliability, throughput and delay. They focus their work on cellular networks, but most of the presented strategies can be applied to the transmission of video over
WLANs.
A review of error concealment strategies for fine granularity scalable (FGS) video transmission is given in [11].
3.1 Joint source and channel coding in wireless transmission
Various JSCC approaches for wireless communication links exist in the literature; a general one is presented in [9] by Appadwedula et al. The authors propose a JSCC scheme based on a parametric distortion model. An advantage of the method is that it can be applied to most classes of source and channel coders, making it possible to obtain nearly all of the benefits of joint source-channel optimization by matching existing source and channel coding standards in a simple and general way.
3.2 Unequal error protection in progressive and scalable bitstreams
The application of UEP to the transmission of progressively coded images and video is not new. It has been implemented in various fashions, and some UEP techniques designed for other contexts can be adapted to the transmission of embedded bitstreams.
As highlighted before, in embedded bitstreams the data is implicitly sorted by its importance, and this characteristic can be used to implement error resilience techniques based on UEP. Traditional equal error protection (EEP) schemes consider all the data as having the same importance and assign the same amount of protection to the whole bitstream. By contrast, UEP schemes give more importance, hence more protection, to the most critical parts of the coded image.
Reed-Solomon codes were used by Natu and Taubman [48] for the protection of JPEG2000 bitstreams transmitted over wireless channels. In [75] channel coding is used to implement UEP of a JPEG2000 bitstream.
For video transmission, the application of UEP within the FGS EL bitstream was first considered by van der Schaar et al. in [70], where the fine-grained loss protection (FGLP) framework was introduced. Based on it, Yang et al. proposed in [78] a "degressive" protection algorithm (DEP) based on FEC for the optimal assignment of protection redundancy among bit-planes. In [71], Wang et al. studied the problem of rate-distortion optimized UEP for Progressive FGS (PFGS) over wireless channels, using prioritized FEC for the BL and EL. A similar problem was studied in [79], in which the objective was to minimize the processing power for PFGS video given bandwidth and distortion constraints.
3.3 Cross-layer approaches
Each network layer (physical, link and application) can individually apply error protection schemes that are independent of each other. This behavior is implicit in the layering paradigm commonly used in the definition of network architectures. Of course, independent strategies do not provide the overall optimal solution, since they ignore each other and do not create useful synergies.
The idea behind cross-layer approaches is to jointly consider the error protection strategies at the various layers, in order to improve the transmission efficiency in terms of protection, bandwidth and resource consumption. These techniques do not necessarily involve JSCC, but they are aware of the whole system.
Usually, cross-layer approaches use optimization strategies in order to
minimize the overall resource utilization or the video distortion, and
parameterize the characteristics of the different network layers. The
adaptation parameters that can be considered in a cross-layer scheme can
be found at any layer:
• physical layer: transmission power, antenna characteristics, modulation and equalization scheme;
• link layer: frame size, error correction coding strategy, ARQ, admission control and scheduling, packetization;
• transport and network layer: signaling and packetization;
• application layer: compression strategy, error concealment, rate control, error correction codes, ARQ, scheduling, packetization.
The number of parameters involved in this process influences the overall
complexity of the optimization, and should be limited if a feasible
approach is desired.
For complex problems, closed-form solutions are difficult to achieve, and
dynamic programming is often used to find solutions.
Resource optimization can involve one or more aspects of the transmission,
and can aim at optimizing the consumption of a single resource (for example
the transmission energy or the bandwidth) or a single outcome (overall
distortion, video quality).
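As an illustration of this kind of optimization, the following sketch
performs an exhaustive search over a small cross-layer parameter grid
(transmit power, FEC rate, packet size) under an energy budget; all
layer models and constants below are toy assumptions, not measured
characteristics of any real system.

```python
# Minimal sketch of cross-layer optimization by exhaustive search over a
# small parameter grid (all models below are illustrative assumptions).
import itertools
import math

def ber(tx_power):
    """Toy physical-layer model: bit error rate falls with transmit power."""
    return 0.5 * math.exp(-tx_power)

def packet_loss(tx_power, code_rate, pkt_bits):
    """Toy link-layer model: a packet survives if all its bits do; stronger
    FEC (lower code_rate) is modeled as a crude discount on the bit errors."""
    p_bit = ber(tx_power) * code_rate   # crude FEC benefit, not a real code
    return 1 - (1 - p_bit) ** pkt_bits

def expected_distortion(tx_power, code_rate, pkt_bits, d0=100.0):
    """Application-layer objective: distortion grows with the loss rate."""
    return d0 * packet_loss(tx_power, code_rate, pkt_bits)

def cross_layer_search(energy_budget):
    grid = itertools.product([2.0, 4.0, 6.0],    # tx power (arbitrary units)
                             [1/2, 2/3, 3/4],    # FEC code rate
                             [256, 512, 1024])   # packet size (bits)
    feasible = [(p, r, n) for p, r, n in grid
                if p * n / r <= energy_budget]   # energy ~ power * airtime
    return min(feasible, key=lambda x: expected_distortion(*x))

if __name__ == "__main__":
    print(cross_layer_search(energy_budget=6000.0))
```

Even in this toy form, the search makes visible how quickly the parameter
space grows, which is why the text above recommends limiting the number of
adaptation parameters or resorting to dynamic programming.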
Shakkottai et al. give in [60] an interesting overview of the issues
related to cross-layer design for wireless networks.
Cross-layer approaches to embedded bitstream transmission have been
studied. In [80], a hybrid UEP and ARQ scheme is used for the transmission
of scalable video over wireless channels. FEC and ARQ are also used for
the transmission of FGS streams over 802.11 channels in [76] and [41]. The
approach considers the characteristics of IEEE 802.11 WLANs for evaluating
the transmission parameters.
In [33], a cross-layer optimization of OFDM transmission systems for
MPEG-4 video streaming is presented. In [17], Radha and Cohen present
an efficient method for streaming FGS video over packet-based networks.
In [44], Li and van der Schaar present several heuristic algorithms for
the real-time transmission of layered video bitstreams over wireless LANs,
providing adaptive QoS through real-time retry-limit adaptation (RTRA).
In [38], Khayam et al. propose the MAC Lite strategy as a cross-layer
protocol design for real-time multimedia applications over 802.11b
networks.
3.4 Joint source coding and power control
In wireless networks, energy is an important and limited resource. In order
to optimize its consumption, source coding parameters and power control
can be jointly considered and optimized. This form of cross-layering is
commonly referred to as joint source coding and power control (JSCPC).
In [83], a joint FEC and transmission power allocation scheme for layered
video transmission over a multi-user CDMA network was proposed. In that
work, scalability was achieved using 3D-SPIHT (wavelet-based coding).
The objective was to minimize the end-to-end distortion through optimal
bit allocation among the source layers and power allocation among the
different CDMA channels.
The authors in [12] considered jointly adapting the source bit rate and
the transmission power in order to maximize the performance of a CDMA
system subject to a constraint on the equivalent bandwidth. In that work,
an H.263+ codec was used to generate the layered bitstream.
In [13], Chan considers a JSCPC approach for video transmission over
3G wireless CDMA cellular networks. In [59], Sehlstedt and Le Blanc
propose the use of alternate metrics to dynamically fine-tune the
performance optimization, together with a dynamically adjustable bit-energy
distribution.
For progressively coded video, the position of the first bit error within
a frame is more important than the overall bit error probability. In [30],
Fossorier, Xiong and Zeger numerically optimize the channel code rate and
the energy allocated per transmitted bit for the transmission of a
progressively coded bitstream.
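The role of the first-error position can be made explicit with a short
sketch: for independent bit errors with probability p, the first error
falls at position i with probability (1 − p)^i · p, and only the prefix
before it is decodable. The rate-distortion curve used below is a
hypothetical monotone model, not one derived from a real codec.

```python
# Sketch: for a progressively coded frame of n bits with independent bit
# error probability p, the decoder keeps only the prefix before the first
# error, so expected distortion depends on the first-error position rather
# than on the raw BER alone.

def expected_distortion(n, p, d_of_r):
    """Sum over first-error positions i (a prefix of i bits is decodable),
    plus the error-free case in which all n bits are usable."""
    exp_d = 0.0
    for i in range(n):
        prob_first_err_at_i = (1 - p) ** i * p
        exp_d += prob_first_err_at_i * d_of_r(i)
    exp_d += (1 - p) ** n * d_of_r(n)
    return exp_d

if __name__ == "__main__":
    d = lambda r: 100.0 / (1.0 + r)      # assumed monotone R-D model
    for p in (1e-2, 1e-3, 1e-4):
        print(p, expected_distortion(1000, p, d))
```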
In [15], the authors propose an energy-aware MPEG-4 FGS video streaming
system with client feedback.
3.5 Modulation based UEP
An interesting approach combining cross-layering and UEP involves the use
of multiple modulation channels.
In [10], Atzori proposes an approach for the robust transmission of
JPEG2000 images over wireless networks using a wavelet transmultiplexer.
In [69], van der Schaar and Meehan propose the use of an adaptive
modulation scheme in combination with FGS coded video, in order to obtain
UEP-based video transmission over wireless channels. The approach, termed
Adaptive Modulated FGS (AM-FGS), is able to cope with channel bandwidth
variations and degradation by exploiting the FGS structure and tailoring
the modulation scheme to the channel conditions and data characteristics.
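A minimal sketch of the underlying idea follows: map each portion of the
embedded bitstream to a constellation whose robustness matches its
importance and the current channel quality. The mapping rule and constants
are illustrative assumptions, not the published AM-FGS algorithm.

```python
# Sketch of modulation-based UEP in the spirit of AM-FGS (the mapping rule
# is an illustrative assumption): the base layer goes on the most robust
# constellation, and enhancement bit-planes move to denser constellations
# as their importance decreases.

MODES = [("BPSK", 1), ("QPSK", 2), ("16-QAM", 4), ("64-QAM", 6)]  # bits/symbol

def assign_modulation(layers, channel_quality):
    """layers: list of (name, importance in [0,1]); channel_quality in [0,1].
    Less important data and better channels allow denser constellations."""
    plan = []
    for name, importance in layers:
        # crude index: robustness for important data, throughput otherwise
        idx = min(int((1 - importance) * channel_quality * len(MODES)),
                  len(MODES) - 1)
        plan.append((name, MODES[idx][0]))
    return plan

if __name__ == "__main__":
    layers = [("BL", 1.0), ("EL bit-plane 0", 0.7),
              ("EL bit-plane 1", 0.4), ("EL bit-plane 2", 0.1)]
    print(assign_modulation(layers, channel_quality=0.9))
    # -> BL on BPSK, later bit-planes on progressively denser constellations
```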
Chapter 4
Non uniform compression in image
and video transmission
In wireless channels bandwidth is limited and must be used wisely. Non-
uniform compression using RoIs is an interesting option in transmission,
because it allows bits to be distributed according to data importance,
achieving a better perceived quality.
In this chapter, we consider the transmission of still images affected
by geometrical distortion, and analyse the effects of different lossy
compression strategies on their transmission. In this specific case, the
encoding-decoding process and the geometric correction together generate
a non-homogeneous image degradation, since a different amount of
information is associated with each resulting pixel. A distortion measure
named Quadtree Distortion Map (QDM), able to quantify this degradation,
is described in what follows. In order to ensure a uniform quality in the
final image, the QDM is exploited during compression. The resulting method
is able to reduce the total size of compressed geometrically distorted
pictures. Tests performed using the JPEG and JPEG2000 coding standards
show that it is possible to improve both the measured and the perceived
quality of the transmitted image.
In the next section, some background on the nonuniform compression of
geometrically distorted images is given, and the effects of non-linear
geometric distortions on co-decoded images are considered. A detailed
description of the concept of Quadtree Distortion Map (QDM) is given in
section 4.2. In section 4.3, it is shown how the QDM can be used to design
an adaptive image compressor able to achieve a uniform error distribution
over the decompressed and de-warped image. It is also shown how this
approach can be applied to the standard JPEG and JPEG2000 image compression
algorithms, while maintaining full standard compliance only in the latter
case. A selection of quantitative results demonstrating the viability and
effectiveness of the proposed approach is provided in section 4.4.¹

¹ This chapter was published in [21].
4.1 Nonuniform compression of geometrically distorted
images
Images acquired by optical sensors usually present some kind of geometrical
distortion, due to the characteristics of the lenses and sensors adopted
in the acquisition system, or to the physical structure of the object under
inspection, as in the case of textures projected onto non-planar
surfaces [29]. In specific applications such effects may become even more
significant, due to the nature of the acquisition system. This is the case,
for instance, of acquisition systems used in video surveillance or ambient
intelligence applications, where wide-angle lenses are commonly used to
acquire large areas with a single camera. In particular, fish-eye lenses
and panoramic lenses using omni-directional mirrors are adopted to grab
large portions of narrow indoor environments (a room, a car interior,
etc.) [58]. Another application that strongly suffers from geometrical
distortion is remote sensing [73].
In the projection of the real-world scene onto the image plane, the
geometrical distortion acts as a non-linear spatial compression and
expansion of the luminance function in the pixel plane. This may cause
problems in all the successive image processing stages, from low-level
processing to the interpretation of the scene, and can be partially solved
by applying geometrical correction techniques based on sensor models and
calibration processes. Unfortunately, the correction is only seldom
performed at the sensor level; it usually takes place at some remotely
connected unit, where the application software runs. The geometrical
correction may then happen to be carried out after important processing
steps have already been applied: in particular, compression and encoding
of images is often implemented on-board to attain a more efficient
transmission.
Some proposals to exploit knowledge of the acquisition process to improve
image processing have already been made, with applications to specific
domains such as medical tele-radiology [23]. In [51], a generic and very
simple acquisition model is studied, where the acquisition sensor is
modeled through a modulation transfer function that simply introduces
blurring. Another related work on the topic can be found in [57], where
the features of a retina-like sensor, associated with an omni-directional
mirror, are exploited for imaging purposes.
In this framework, we have investigated the impact of geometrical
distortion on image compression. The aim is to limit as much as possible
the amount of encoded data, in order to fit the available transmission
bandwidth. As a result, we verified that it is possible to improve the
compression performance when encoding is applied to the geometrically
distorted image.
The analysis was conducted both on distorted images produced by real
systems and on synthetic images obtained by warping algorithms that
simulate common distortion effects (fish-eye and mirrored lenses). For
this reason, in the following we will use the terms warping and distortion,
as well as the opposite terms de-warping and geometric correction, to
refer to the same concepts.
4.2 Evaluation of spatial distortion
The first goal of this work is to evaluate the impact of lossy
co-decompression followed by geometric distortion correction on the final
image quality.
The underlying assumption is that the image is compressed and decompressed
before any geometrical correction is applied. This hypothesis is reasonable
in many practical systems for several reasons, including: the necessity of
keeping the complexity of the acquisition system low, the use of sensors
with embedded compression tools, and frequent changes of optical lens or
environment preventing the use of an embedded de-warping algorithm.
On the other hand, compression is increasingly used in the early stages of
acquisition, in particular for applications where the sensor is remotely
connected to the processing unit through narrow-bandwidth channels (e.g.,
wireless cameras) or is attached to a limited-capacity local storage device.
A spatial distortion in the acquisition system introduces a non-uniform
distribution of the visual information in the acquired image. As a matter
of fact, given two image areas with equivalent frequency content in the
undistorted domain, the corresponding areas in the acquired picture will
show a higher frequency content where spatial compression occurred, and
vice-versa. Conversely, the coding algorithm usually operates in a
homogeneous way over the whole image. To achieve effective data compression
it must discard some information, especially at the higher frequencies, and
it tries to produce an information loss that is as uniform as possible over
the whole image, in order to avoid local peaks in the distortion.
Figure 4.1: Graphical representation of a fish-eye distorted image: (a)
before, and (b) after de-warping.

Consequently, the error introduced by the encoder in an image region
will be proportional to the local spatial deformation. Where spatial
compression is present, the error will affect a larger zone in the final
corrected image, and will be more severe due to the presence of
higher-frequency content. On the other side, in areas with low information
density, the error will be attenuated by the averaging effect introduced
by the geometrical correction algorithms. Figure 4.1 depicts an example of
this phenomenon for a fish-eye lens, where the above concepts are clearly
illustrated. It can be observed that two areas of equal dimension in the
undistorted (or corrected) domain, represented in dark and light gray in
Fig. 4.1.b, are associated in the distorted domain with areas containing
more or fewer samples, according to their spatial position and to the
geometry of the acquisition system.
In order to quantify this effect, the idea is to compare two schemes
(see Fig. 4.2): in the former, labeled "scheme A", the acquired image is
compressed and transmitted after the geometric correction; in the latter,
"scheme B", compression and transmission are performed prior to the
geometric correction of the image. In both cases the distortion is
measured by
comparing the final result (the decompressed, de-warped image) with the
uncompressed, de-warped image, since the real-world (undistorted) picture
is unavailable in real cases.
A commonly accepted metric to estimate the distortion introduced by a
processing system is the Peak Signal-to-Noise Ratio (PSNR), which treats
the distortion as a kind of noise introduced on the original data, indepen-
dently of its origin. The noise power is estimated through the computation
of the Mean Square Error (MSE), and the signal power is computed on the
basis of the maximum excursion of the luminance function, namely:
\[ \mathrm{PSNR\,(dB)} = 10 \cdot \log_{10} \frac{2^{2b}}{\mathrm{MSE}} \tag{4.1} \]
where b is the number of bits per pixel in the original image. Usually
the PSNR is calculated over the whole image, but we are mostly interested
in local measures that can highlight the non-homogeneous distortion
introduced by scheme B as compared to scheme A. For the purpose of
evaluating the local distortion introduced by the process, we propose a
method, called QDM, which uses a quadtree decomposition to generate a
local map of the distortion effects. It will be demonstrated that the QDM
can be used to evaluate the performance of compression schemes applied to
geometrically distorted images, as well as to design optimized compression
schemes able to improve the overall coding performance. It should be
pointed out that the concept of QDM is independent of the use of PSNR as
a quality measure: QDM-based approaches can also be implemented using more
sophisticated perceptual error models, at the price of an increased
complexity [65].
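For reference, a small helper implementing equation (4.1), both globally
and on a local block, could look as follows (a sketch using numpy; the
local variant anticipates the block-wise measurements used by the QDM).

```python
# PSNR of equation (4.1), computed either globally or on a local block.
import numpy as np

def psnr(ref, dist, bits_per_pixel=8):
    """PSNR in dB with peak power 2**(2*b) over the MSE, as in (4.1)."""
    mse = np.mean((ref.astype(np.float64) - dist.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10((2 ** (2 * bits_per_pixel)) / mse)

def local_psnr(ref, dist, y, x, size, b=8):
    """PSNR restricted to the size-by-size block at position (y, x)."""
    return psnr(ref[y:y+size, x:x+size], dist[y:y+size, x:x+size], b)
```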
The QDM is based on the application of the well-known quadtree
decomposition algorithm [5]. Quadtree segmentation has been demonstrated
to efficiently represent simple image partitions subject to rigid geometric
constraints. In our approach, the quadtree decomposition is applied to the
error image Ierr(x, y), defined as the absolute difference, computed on a
pixel-by-pixel basis, between the reference image Iref(x, y) (i.e., the
geometrically corrected uncompressed image) and the output image
Idist(x, y) (the image after co-decompression and de-warping, in either
order):

\[ I_{err}(x, y) = \left| I_{ref}(x, y) - I_{dist}(x, y) \right| \quad \forall (x, y) \tag{4.2} \]

Figure 4.2: The two alternative compression and transmission schemes
considered in the estimation of the impact of geometrical distortion on
compression performance: (a) co-decoding is applied after geometrical
correction, (b) vice-versa.
The aim is to obtain a map representing the spatial distribution of
the distortion through local measurements of the PSNR. The areas where the
PSNR is considered homogeneous are those identified by the leaves of the
quadtree decomposition. The QDM algorithm is a recursive process, and
proceeds as follows (a code sketch is given after the parameter discussion
below):

i. Compute the variance σ² of the error image Ierr(x, y).

ii. If σ² is greater than a given threshold Σth, split the image into four
sub-images, halving its size along the x and y directions; otherwise, stop
the recursion.

iii. Recursively apply steps (i) and (ii) to each sub-image until each block
fulfills the variance condition defined at point (ii) or reaches a minimum
size ∆min.
The stop condition in point (iii) also takes into account a minimum
allowed dimension ∆min for each sub-image, to avoid excessive splitting: in
our tests, we used ∆min = 8, corresponding to the typical block size used
in coding standards. Σth is set equal to α · σ²_A, where σ²_A is the
variance of the error image in scheme A, and α is a parameter in the range
1–2 taking into account the type of spatial distortion and the
characteristics of the compression algorithm. In more detail, the choice of
α is connected to the distortion introduced by the acquisition device,
which largely depends on the viewing angle. For instance, the effect of
fish-eye lenses can be approximated by a spherical transform, in which the
distortion is distributed over large image areas, while not reaching very
high values. In this case, a low value of α (e.g., 1.2–1.5) is required to
achieve a precise QDM map. On the other side, the parabolic or conic
projections typical of mirrored lenses produce heavier distortions, thus
requiring higher values of α (1.6–2) to focus on greatly distorted areas.
Consequently, it has been found that it is possible to heuristically set α
a priori on the basis of the type of geometrical distortion, independently
of the image content. Further considerations on the setting of α are
provided in section 4.4 (tables 4.1, 4.2, 4.3 and 4.4 and the relevant
discussion), where the impact of the coding algorithm is also considered.
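A minimal sketch of the decomposition described above follows; the
returned leaf structure and the stand-in value used for σ²_A in the demo
are illustrative choices.

```python
# Sketch of the QDM quadtree decomposition: recursively split the error
# image while the local variance exceeds sigma_th, down to a minimum block
# size (8, as in the text). Leaf handling is an illustrative choice.
import numpy as np

def qdm(err, sigma_th, d_min=8):
    """Return the quadtree leaves as (y, x, height, width, variance) tuples."""
    leaves = []

    def split(y, x, h, w):
        block = err[y:y+h, x:x+w]
        var = float(np.var(block))
        # stop if the error is homogeneous enough or the block is minimal
        if var <= sigma_th or min(h, w) <= d_min:
            leaves.append((y, x, h, w, var))
            return
        h2, w2 = h // 2, w // 2                 # halve along x and y
        split(y,      x,      h2,     w2)
        split(y,      x + w2, h2,     w - w2)
        split(y + h2, x,      h - h2, w2)
        split(y + h2, x + w2, h - h2, w - w2)

    split(0, 0, err.shape[0], err.shape[1])
    return leaves

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ref = rng.integers(0, 256, (256, 256)).astype(np.float64)
    dist = ref + rng.normal(0, 4, ref.shape)
    err = np.abs(ref - dist)                      # error image of (4.2)
    alpha = 1.4
    var_a = 0.5 * float(np.var(err))              # stand-in for sigma^2_A
    print(len(qdm(err, sigma_th=alpha * var_a)))  # number of leaves
```

The gray-level QDM map discussed next can then be rendered by assigning to
each leaf an intensity proportional to its variance (or local PSNR).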
As a consequence of the above procedure, the areas where the error
fluctuates most are split, thus achieving a subdivision of the error image
into areas with nearly constant distortion. The result of the decomposition
is a sparse matrix that indicates the subdivision of the error image into
blocks of various dimensions, associated with different error values.
In figure 4.3, an example of QDM is shown, applied to the "blood" test
image (256×256, 8 bpp). Here, the distortion introduced by the acquisition
system is simulated by a polar coordinate transform (Fig. 4.3.a), which
reproduces the behavior of a 360° mirrored lens. The error is computed
between the reference image (uncompressed, de-warped) in Fig. 4.3.c and
the outputs of schemes A and B, in Figs. 4.3.d-e, respectively. A standard
JPEG encoder with compression ratio CR = 10 was used in both cases (the
co-decoded images in the warped and de-warped domains are shown in Figs.
4.3.b and 4.3.d, respectively), while the parameter α was set to 1.4. The
compression ratio CR is defined as:

\[ CR = \frac{N_{b,o}}{N_{b,c}} \tag{4.3} \]

where N_{b,o} is the number of bits required to represent the original
image in canonical form and N_{b,c} is the number of bits after compression.
Since the variance threshold Σth turns out to be higher than the error
variance in scheme A, the corresponding output image does not produce any
split. As far as scheme B is concerned, the result of the splitting process
is represented in Fig. 4.3.f. In Fig. 4.3.g, called the QDM map, each leaf
of the quadtree is associated with a gray level proportional to the local
distortion (the higher the distortion, the darker the corresponding block).
The QDM map of scheme B makes it evident that compression in the distorted
domain generates an uneven distribution of the error. To better appreciate
this fact, in Fig. 4.3.h the QDM map associated with scheme B is
transformed back into polar coordinates, i.e., into the original
acquisition domain. The resulting map provides a convincing confirmation of
the above reasoning about the implications of lossy compression applied to
geometrically distorted images. As a matter of fact, it can be observed
that the quality degradation progressively increases toward the image
center, where the information density is higher (due to spatial
compression).
It is important to point out that in the compression of natural images the
distribution of the error can fluctuate even in the absence of geometrical
distortions, due to the non-stationarity of the input image and to the
characteristics and parameters of the encoder. Nevertheless, this effect
can be neglected for two reasons.
First, the image content is the same for both schemes A and B, thus
allowing a comparative assessment. The underlying assumption is that the
effects of non-stationary image content and of geometrical distortions on
the error distribution are uncorrelated and additive. This is not
completely true in general, since a geometrical deformation can alter not
only the magnitude but also the orientation of the spatial frequencies
(e.g., straight lines become curves when acquired through a wide-angle
lens). Therefore, due to the different treatment of the spatial frequencies
at the encoder, the distortion can have some "second-order" effects on the
final result.

Figure 4.3: Example of application of QDM: (a) original, uncompressed,
warped by polar transform; (b) compressed in the warped domain; (c)
original, uncompressed, de-warped; (d) output of scheme A; (e) output of
scheme B; (f) result of the split process for scheme B; (g) QDM map for
scheme B; (h) polar transform of the QDM map for scheme B. Note that when
the split process is applied to scheme A (with the same parameters used
for scheme B), there is no split at all, and the QDM map is a
constant-value image.
Nevertheless, these phenomena are related more to the perceptual quality
of the decompressed image than to its objective assessment, and can
therefore be neglected in the QDM, which is simply based on absolute error
estimation.
Second, and more important, in practical applications the QDM is meant to
be computed off-line, by presenting to the system some pre-defined
calibration images, designed to match the application for which the
acquisition system is targeted. For instance, in a fixed-camera
surveillance system the calibration set could be obtained by selecting some
shots acquired in typical operating conditions, thus also taking the local
image content into account. On the contrary, to achieve a general-purpose
system, the calibration image should have a frequency content as uniform as
possible, to ensure a uniform behavior independent of the application.
According to this last model, in our tests we used images containing
statistical or structural textures, as in the case of the "blood" image, or
synthetic patterns obtained by patch repetition.
A further consideration about system calibration concerns the possibility
of computing the distortion map a priori, simply based on the
characteristics of the acquisition system. For instance, it would be
possible to determine the local compression and expansion due to the
geometrical deformation, and to directly estimate its impact on the
compression distortion. Unfortunately, this is not a trivial task, since
the deformation produces in general a re-sampling of the picture over an
irregular sampling grid, which in turn generates very different spatial
frequencies (both in magnitude and orientation). Moreover, such spurious freq