ECC Video: An Active Second Error Control Approach
for Error Resilience in Video Coding
Bing Bing Du
Submitted in fulfillment of the requirements for the degree of
Doctor of Philosophy
In the School of Electrical and Electronics Systems Engineering
Queensland University of Technology
Brisbane
Australia
September 2003
Abstract
Supporting video communication in mobile environments has long been an objective of
telecommunication network engineers, and it has become a basic requirement of third
generation mobile communication systems. This dissertation explores the possibility of
optimizing the utilization of scarce shared radio channels for live video transmission
over a GSM (Global System for Mobile communications) network, and of realizing
error resilient video communication in unfavorable channel conditions, especially in
mobile radio channels.
The main contribution describes the adoption of a SEC (Second Error Correction)
approach using ECC (Error Correction Coding) based on a Punctured Convolutional
Coding scheme, to cope with residual errors at the application layer and enhance the
error resilience of a compressed video bitstream. The approach is developed further for
improved performance in different circumstances, with some additional enhancements
involving Intra Frame Relay and Interleaving, and the combination of the approach with
Packetization.
Simulation results of applying the various techniques to the test video sequences Akiyo
and Salesman are presented and analyzed for performance comparison with the
conventional video coding standard. The proposed approach shows consistent
improvements under these conditions. For instance, to cope with random residual errors,
the simulation results show that when the residual BER (Bit Error Rate) reaches 10^-4,
the video output reconstructed from a video bitstream protected using the standard
resynchronization approach is of unacceptable quality, while the proposed scheme can
deliver a completely error free video output more efficiently. When the residual BER
reaches 10^-3, the standard approach fails to deliver a recognizable video output, while
the SEC scheme can still correct all the residual errors with a modest bit rate increase. In
bursty residual error conditions, the proposed scheme also outperforms the
resynchronization approach. Future work to extend the scope and applicability of the
research is suggested in the last chapter of the thesis.
Acknowledgements
I would like to acknowledge the excellent guidance, continuous help and generous
support from my PhD principal supervisor Prof. Anthony Maeder and associate
supervisor Prof. Miles Moody. It is their insights and encouragement that made this
research successful.
I would also like to express my gratitude for the financial assistance given to me by the
Cooperative Research Centre for Satellite Systems during my time at the Queensland
University of Technology.
Contents
Contents ______________________________________________________________i
List of Figures ________________________________________________________ix
List of Tables _________________________________________________________xi
1 INTRODUCTION ________________________________________________ 1
1.1 Mobile Video System __________________________________________ 1
1.2 Challenges on Networking Aspects _______________________________ 2
1.2.1 Optimized utilization of scarce radio channel resources ____________ 2
1.2.2 Effective error control schemes _______________________________ 3
1.3 Challenges on Source Video Coding ______________________________ 5
1.4 State of the Art of the Current Error Resilience Tools _______________ 6
1.5 Second Error Control and ECC video ____________________________ 6
1.6 Organization of the Thesis ______________________________________ 7
1.7 Contributions and Publications from the Research _________________ 9
References ________________________________________________________ 11
2 OVERVIEW of GSM SYSTEM ____________________________________ 15
2.1 Architecture and functions of the GSM network___________________ 15
2.1.1 Mobile station ____________________________________________ 16
2.1.2 The Base Station Subsystem_________________________________ 16
2.1.2.1 The Base Transceiver Station ______________________________ 16
2.1.2.2 The Base Station Controller _______________________________ 16
2.1.3 The Network and Switching Subsystem________________________ 17
2.1.3.1 The Mobile services Switching Center (MSC)_________________ 17
2.1.3.2 The Gateway Mobile services Switching Center (GMSC)________ 17
2.1.3.3 Home Location Register (HLR) ____________________________ 17
2.1.3.4 Visitor Location Register (VLR) ___________________________ 18
2.1.3.5 The Authentication Center (AuC)___________________________ 18
2.1.3.6 The Equipment Identity Register (EIR) ______________________ 19
2.1.3.7 The GSM Interworking Unit (GIWU) _______________________ 19
2.1.4 The Operation and Support Subsystem (OSS) ___________________ 19
2.1.5 Additional Functional Elements ______________________________ 20
2.1.5.1 Message Center_________________________________________ 20
2.1.5.2 Mobile Service Node ____________________________________ 20
2.1.6 The geographical areas of the GSM network ____________________ 20
2.2 Signalling system in GSM _____________________________________ 21
2.2.1 GSM Radio Channels ______________________________________ 21
2.2.1.1 Dedicated Channels _____________________________________ 21
2.2.1.2 CCCH (Common Control Channels) ________________________ 21
2.2.2 Signalling Interfaces and Protocols ___________________________ 22
2.2.2.1 Um interface ___________________________________________ 23
2.2.2.2 A Interface ____________________________________________ 23
2.2.2.3 A-bis Interface _________________________________________ 24
2.2.2.4 MAP interfaces _________________________________________ 25
2.2.2.5 X.25 Interface System____________________________________ 26
2.3 The Multiple Access Scheme ___________________________________ 26
2.3.1 FDMA__________________________________________________ 26
2.3.1.1 Primary GSM __________________________________________ 26
2.3.1.2 E-GSM _______________________________________________ 26
2.3.1.3 DCS-1800 _____________________________________________ 27
2.3.2 TDMA__________________________________________________ 27
2.3.2.1 Traffic channel Frame Structure (26-Multiframe) ______________ 27
2.3.2.2 Signalling Frame Structure ________________________________ 28
2.3.2.3 Structure of a TDMA Slot within a Frame ____________________ 28
2.3.3 Frequency Hopping________________________________________ 29
2.4 Source coding and channel coding ______________________________ 29
2.4.1 Speech coding ____________________________________________ 30
2.4.1.1 Full Rate speech Coding __________________________________ 30
2.4.1.2 Half Rate Speech Coding _________________________________ 30
2.4.1.3 Multirate Speech Coding _________________________________ 31
2.4.1.4 Enhanced Speech Coding _________________________________ 31
2.4.2 Channel coding ___________________________________________ 31
2.4.2.1 CRC__________________________________________________ 31
2.4.2.2 Block Code ____________________________________________ 32
2.4.2.3 Convolutional Code _____________________________________ 32
2.4.3 Interleaving ______________________________________________ 32
2.4.4 Encryption_______________________________________________ 32
References ________________________________________________________ 33
3 VIDEO OVER GPRS NETWORK__________________________________ 35
3.1 Data services in GSM networks_________________________________ 35
3.1.1 PDS and SMS ____________________________________________ 36
3.1.2 HSCSD _________________________________________________ 36
3.1.3 GPRS___________________________________________________ 37
3.2 Possibilities for video over GSM networks________________________ 38
3.2.1 Video over HSCSD________________________________________ 38
3.2.2 Video over GPRS _________________________________________ 39
3.2.3 Dynamic channel allocation _________________________________ 41
3.2.4 Example ________________________________________________ 43
3.2.5 EDGE __________________________________________________ 44
3.3 Conclusion __________________________________________________ 45
References ________________________________________________________ 45
4 OVERVIEW OF VIDEO CODING TECHNIQUES AND THE CURRENT
VIDEO CODING STANDARDS ___________________________________ 47
4.1 Waveform based video coding __________________________________ 47
4.1.1 Motion estimation _________________________________________ 48
4.1.1.1 Optical flow techniques __________________________________ 48
4.1.1.2 Block matching techniques________________________________ 49
4.1.1.3 Pel-recursive techniques __________________________________ 49
4.1.2 Transforms ______________________________________________ 50
4.2 Model based video coding _____________________________________ 50
4.2.1 3D model coding__________________________________________ 51
4.2.2 2D model coding__________________________________________ 51
4.3 Current Video Standards______________________________________ 52
4.3.1 Core video coding techniques in the current video coding standard __ 53
4.4 Overview of error resilience techniques __________________________ 56
4.4.1 Error resilient encoding_____________________________________ 56
4.4.1.1 Robust Entropy encoding _________________________________ 56
4.4.1.2 Error Resilient prediction _________________________________ 57
4.4.1.3 Layered Coding with Unequal Error Protection ________________ 58
4.4.1.4 Multiple Description Coding ______________________________ 59
4.4.2 Decoder Error Concealment _________________________________ 60
4.4.2.1 Recovery of Texture Information ___________________________ 60
4.4.2.2 Recovery of Coding Modes and Motion Vectors _______________ 62
4.4.3 Encoder and Decoder Interactive Error Control __________________ 62
4.4.3.1 Reference Picture Selection (RPS) Based on Feedback Information 63
4.4.3.2 Error Tracking Based on Feedback information________________ 63
4.5 Error resilience tools in the current video coding standards _________ 64
4.5.1 Error resilience tools in H.263 _______________________________ 64
4.5.1.1 Forward Error Correction Mode (FEC) (Annex H) _____________ 64
4.5.1.2 Slice Structure Mode (Annex K) ___________________________ 65
4.5.1.3 Independent Segment Decoding Mode (Annex R)______________ 65
4.5.1.4 Reference Picture Selection (RPS - Annex N) _________________ 66
4.5.2 Error resilience tools in MPEG-4 _____________________________ 68
4.5.2.1 Packetization___________________________________________ 68
4.5.2.2 Data Partitioning ________________________________________ 70
4.5.2.3 Reversible VLC ________________________________________ 71
4.5.2.4 Adaptive Intra Refresh for Error Resilience ___________________ 71
4.5.2.5 NEWPRED ____________________________________________ 72
References ________________________________________________________ 73
5 OVERVIEW OF ERROR CORRECTION TECHNIQUES _____________ 79
5.1 Introduction_________________________________________________ 79
5.2 Block codes _________________________________________________ 80
5.2.1 Linear Cyclic Codes _______________________________________ 81
5.3 Convolutional codes __________________________________________ 82
5.3.1 Convolutional Encoding ____________________________________ 82
5.3.2 Viterbi Decoding__________________________________________ 84
5.3.3 Performance of Convolutional codes __________________________ 86
5.3.3.1 Performance of Hard-decision Viterbi decoding algorithm _______ 86
5.3.3.2 Performance of Soft-decision Viterbi decoding algorithm________ 87
5.3.3.3 Advantages of soft-decision over hard-decision decoding ________ 88
5.3.4 Punctured Convolutional code _______________________________ 90
References ________________________________________________________ 91
6 SECOND ERROR CONTROL AND ECC VIDEO______________________ 93
6.1 Introduction_________________________________________________ 93
6.2 Second Error Control_________________________________________ 95
6.3 ECC video – the SEC approach_________________________________ 96
6.4 Simulation Results ___________________________________________ 99
6.4.1 Experiment conditions _____________________________________ 99
6.4.2 Results_________________________________________________ 100
6.5 Discussion__________________________________________________ 103
References _______________________________________________________ 105
7 ECC VIDEO WITH IFR _________________________________________ 115
7.1 Introduction________________________________________________ 115
7.2 ECC with IFR ______________________________________________ 116
7.3 Simulation results ___________________________________________ 118
7.4 Delay analysis due to the employment of IFR ____________________ 120
7.5 Conclusion _________________________________________________ 122
References _______________________________________________________ 122
8 ECC VIDEO WITH SOFT-DECISION VITERBI DECODING ________ 127
8.1 Introduction________________________________________________ 127
8.2 ECC Video with Soft-Decision Viterbi Decoding__________________ 128
8.3 Simulation results ___________________________________________ 129
8.4 Discussion__________________________________________________ 132
Reference ________________________________________________________ 134
9 ECC VIDEO IN BURSTY CHANNEL ERRORS AND PACKET LOSS _ 145
9.1 Performance of the original ECC approach in Bursty residual Errors 146
9.2 ECC Video with Interleaving__________________________________ 148
9.3 Simulation Results __________________________________________ 150
9.3.1 ECC video in bursty errors _________________________________ 150
9.3.2 ECC video with burst lost in GPRS network ___________________ 152
9.4 Discussion__________________________________________________ 154
References _______________________________________________________ 157
10 ECC WITH PACKETIZATION___________________________________ 165
10.1 Combination of ECC and Packetization_________________________ 165
10.2 Simulation Results __________________________________________ 165
11 CONCLUSIONS AND FUTURE WORK ___________________________ 169
11.1 Optimized utilization of radio channel __________________________ 169
11.2 The proposed error resilience video coding tools in this thesis ______ 169
11.3 Future Research Directions ___________________________________ 173
References _______________________________________________________ 175
List of Figures
FIGURE 1.1 BLOCK DIAGRAM OF VIDEO TRANSMISSION SYSTEM OVER MOBILE CHANNELS...................... 1
FIGURE 2.1 GENERAL ARCHITECTURE OF A GSM NETWORK ................................................. 15
FIGURE 2.2 GSM NETWORK AREAS ..................................................................................... 20
FIGURE 2.3 UM AND A INTERFACE ...................................................................................... 22
FIGURE 2.4 A-BIS INTERFACE ............................................................................................. 24
FIGURE 2.5 MAP INTERFACES ............................................................................................ 25
FIGURE 2.6 TRAFFIC CHANNEL FRAME STRUCTURE .............................................................. 28
FIGURE 2.7 SPEECH SIGNAL PROCESSING............................................................................. 30
FIGURE 4.1 DCT BASED VIDEO CODING ..................................................................................... 53
FIGURE 4.2 ZIGZAG SCAN OF DCT COEFFICIENTS........................................................................ 54
FIGURE 4.3 VRC WITH TWO THREADS AND THREE FRAMES PER THREAD.......................................... 66
FIGURE 4.4 FRAME LOSS WITH VRC...................................................................................... 67
FIGURE 4.5 PACKET STRUCTURE .............................................................................................. 69
FIGURE 4.6 STRUCTURE OF DATA PARTITIONING......................................................................... 71
FIGURE 5.1 CONVOLUTIONAL ENCODER .................................................................................... 82
FIGURE 5.2 STATE DIAGRAM OF A 4-STATE CONVOLUTIONAL ENCODER ........................................... 83
FIGURE 5.3 TRELLIS DIAGRAM OF A 4-STATE CONVOLUTIONAL ENCODER ......................................... 84
FIGURE 5.4 BASIC PROCEDURE OF PUNCTURED CODING FROM RATE ½ CONVOLUTIONAL CODE.............. 90
FIGURE 6.1 VIDEO COMMUNICATION SYSTEM WITH ECC.............................................................. 97
FIGURE 6.2 PSNR OF SALESMAN THROUGH ERROR FREE CHANNEL ............................................... 107
FIGURE 6.3 PSNR OF SALESMAN WITH BER OF 1 X 10^-5 ............................................................. 107
FIGURE 6.4 PSNR OF SALESMAN WITH BER OF 4 X 10^-5 ............................................................. 108
FIGURE 6.5 PSNR OF SALESMAN WITH BER OF 1.7 X 10^-4 .......................................................... 108
FIGURE 6.6 PSNR OF AKIYO THROUGH ERROR FREE CHANNEL .................................................... 109
FIGURE 6.7 PSNR OF AKIYO WITH BER OF 1 X 10^-5................................................................... 109
FIGURE 6.8 PSNR OF AKIYO WITH BER OF 4 X 10^-5................................................................... 110
FIGURE 6.9 PSNR OF AKIYO WITH BER OF 1.7 X 10^-4 ............................................................... 110
FIGURE 6.10 PSNR OF AKIYO AT BER OF 10^-4 .......................................................................... 111
FIGURE 6.11 PSNR OF SALESMAN WITH BER OF 10^-4 ............................................................... 111
FIGURE 7.1 PSNR OF SALESMAN WITH BER OF 1 X 10^-4............................................................. 124
FIGURE 7.2 PSNR OF AKIYO WITH BER OF 1 X 10^-4........................................................... 124
FIGURE 8.1 PERFORMANCE OF ECC(11/12) FOR SALESMAN WITH RANDOM ERRORS ...................... 136
FIGURE 8.2 PERFORMANCE OF ECC(11/12) FOR AKIYO WITH RANDOM ERRORS ............................ 137
FIGURE 8.3 SALESMAN WITH BER OF 10^-2 ............................................................................ 138
FIGURE 8.4 AKIYO WITH BER OF 10^-2.................................................................................. 138
FIGURE 9.1 PSNR OF I PICTURE WITH BURSTY ERRORS ............................................................. 147
FIGURE 9.2 PSNR OF P FRAME WITH BURSTY ERRORS............................................................... 148
FIGURE 9.3 VIDEO COMMUNICATION SYSTEM WITH ECC AND INTERLEAVING.................................. 149
FIGURE 9.4 INTERLEAVER FOR CODED DATA ............................................................................ 149
FIGURE 9.5 PERFORMANCE OF SALESMAN WITH BURSTY ERRORS................................................ 151
FIGURE 9.6 PERFORMANCE OF SALESMAN WITH BURST LOSS...................................................... 153
FIGURE 9.7 PERFORMANCE OF SALESMAN WITH BURSTY ERRORS (THE INTERLEAVING IS BASED ON
FRAME) .................................................................................................................... 159
FIGURE 9.8 PERFORMANCE OF SALESMAN WITH BURST LOSS (THE INTERLEAVING IS BASED ON
FRAME) .................................................................................................................... 159
FIGURE 10.1 PSNR OF ECC COMBINED WITH PACKETIZATION............................................ 166
List of Tables
TABLE 6-1 BIT NUMBER COMPARISON BETWEEN PACKETIZATION(600) AND ECC(13/14) FOR
AKIYO .................................................................................................................... 112
TABLE 6-2 BIT NUMBER COMPARISON BETWEEN PACKETIZATION(600) AND ECC(13/14) FOR
SALESMAN .............................................................................................................. 113
TABLE 7-1 BIT NUMBER COMPARISON BETWEEN ECC ALONE AND ECC PLUS IFR FOR
AKIYO .................................................................................................................... 125
TABLE 7-2 BIT NUMBER COMPARISON BETWEEN ECC ALONE AND ECC PLUS IFR FOR
SALESMAN .............................................................................................................. 126
TABLE 8-1 BIT NUMBER COMPARISON BETWEEN PACKETIZATION(600) AND ECC(11/12) FOR
SALESMAN .............................................................................................................. 139
TABLE 8-2 BIT NUMBER COMPARISON BETWEEN PACKETIZATION(600) AND ECC(11/12)
FOR AKIYO ............................................................................................................ 140
TABLE 8-3 BIT NUMBER COMPARISON BETWEEN PACKETIZATION(600) AND ECC(9/10) FOR
SALESMAN .............................................................................................................. 141
TABLE 8-4 BIT NUMBER COMPARISON BETWEEN PACKETIZATION(600) AND ECC(9/10) FOR
AKIYO .................................................................................................................... 142
TABLE 8-5 BIT NUMBER COMPARISON BETWEEN PACKETIZATION(600) AND ECC(7/8) FOR
SALESMAN .............................................................................................................. 143
TABLE 8-6 BIT NUMBER COMPARISON BETWEEN PACKETIZATION(600) AND ECC(7/8) FOR
AKIYO .................................................................................................................... 144
TABLE 9-1 GPRS CHANNEL CODING SCHEMES ............................................................ 152
TABLE 9-2 BIT NUMBER COMPARISON BETWEEN PACKETIZATION(450) AND ECC (9/10) FOR
SALESMAN .............................................................................................................. 160
TABLE 9-3 BIT NUMBER COMPARISON BETWEEN PACKETIZATION(450) AND ECC (7/8) FOR
SALESMAN .............................................................................................................. 161
TABLE 9-4 BIT NUMBER COMPARISON BETWEEN PACKETIZATION(380) AND ECC(7/8) FOR
SALESMAN .............................................................................................................. 162
TABLE 9-5 BIT NUMBER COMPARISON BETWEEN PACKETIZATION(450) AND ECC(5/6) FOR
SALESMAN .............................................................................................................. 163
TABLE 9-6 BIT NUMBER COMPARISON BETWEEN PACKETIZATION(250) AND ECC(5/6) FOR
SALESMAN .............................................................................................................. 164
TABLE 10-1 BIT NUMBER COMPARISON WHEN ECC AND PACKETIZATION ARE
COMBINED.............................................................................................................. 168
TABLE 11-1 PERFORMANCE COMPARISON FOR SALESMAN ............................................ 171
TABLE 11-2 BIT NUMBER COMPARISON BETWEEN BASIC AND RVLC FOR SALESMAN... 176
TABLE 11-3 BIT NUMBER COMPARISON BETWEEN BASIC AND RVLC FOR AKIYO.......... 177
1 INTRODUCTION
1.1 Mobile Video System
Supporting video communication in mobile environments has long been an objective of
telecommunication network engineers, and it has become a basic requirement of third
generation mobile communication systems. Advances in
video compression and mobile computing techniques have provided the possibility
[22,24] of transmitting video sequences over band-limited wireless channels.
Figure 1.1 Block diagram of video transmission system over mobile channels
The general block diagram of a video transmission system over a radio channel is
depicted in Fig 1.1. The digital video sequence is compressed by a video source
encoder and passed to a channel encoder, which adds appropriate redundancy for error
protection. After some transport processing and modulation, the video data packets are
sent through the radio channel. At the receiver side, the demodulated data packets are
passed to the channel decoder for error detection and correction. The decoded packets
are reassembled to a bitstream and delivered to the video source decoder. The
decompressed video sequence is sent out for display. In a two-way communication
system, a return channel is available for the receiver to send back acknowledgements
about the receiving states.
To make this system practical, many challenges need to be addressed in both
networking and video compression. This work focuses on transmitting video sequences
over the GSM (Global System for Mobile communications) [3] network, because it is
widely deployed in more than 80 countries across Europe, Asia and Australia.
1.2 Challenges on Networking Aspects
The main characteristic of a mobile network, compared with a wired network, is that its
radio channel resources are scarce and error prone. Two aspects need to be addressed
before a mobile video system can be put into commercial operation.
1.2.1 Optimized utilization of scarce radio channel resources
Though HSCSD (high speed circuit switched data services) [1,22] and GPRS (general
packet radio service) [6] have paved the way for a more realistic evolution of the GSM
system toward 3G (third generation) systems, optimized utilization of the scarce
radio channels still poses a major challenge for a live mobile video system, because of
the variable bit rate of a video bitstream compressed using the current video coding
standards, including the MPEG and H.26x series.
Both HSCSD and GPRS make it possible to transmit low bit rate video over the
GSM network, but neither is ideal, because it is difficult to decide how to allocate a
radio channel. The bit rate for I pictures (see Chapter 4 for the definitions of I, P and
B pictures) is much higher than that for P or B pictures, so if we allocate a channel
based on I picture size, the channel resource is wasted most of the time when
transmitting P or B pictures. On the other hand, if we allocate the channel according to
the bit rate for P or B pictures, we will be unable to transmit complete I pictures, which
are the most important pictures for the video decoding process.
One intuitive solution might be to repeatedly acquire and release channels to cater for I
and P pictures. However, the current channel allocation protocol is based on a
contention scheme, which can introduce unacceptable delay for real-time video
applications. Obviously, some modification to the current channel allocation scheme is
needed.
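The channel allocation dilemma can be made concrete with a back-of-the-envelope calculation. The picture sizes, group-of-pictures structure and frame rate below are assumed purely for illustration, not measured values from this thesis:

```python
# Hypothetical figures, chosen only to illustrate the allocation dilemma.
I_BITS = 20_000                   # assumed size of one I picture (bits)
P_BITS = 2_000                    # assumed size of one P picture (bits)
GOP = [I_BITS] + [P_BITS] * 11    # one I picture followed by 11 P pictures
FPS = 10                          # assumed frame rate (frames per second)

# Worst-case allocation: reserve enough for an I picture every frame period.
channel_rate = I_BITS * FPS                 # bit/s reserved on the channel
used_rate = sum(GOP) / (len(GOP) / FPS)     # bit/s actually carried
utilization = used_rate / channel_rate

print(f"reserved: {channel_rate} bit/s, used: {used_rate:.0f} bit/s "
      f"({utilization:.0%} utilization)")
```

With these assumed numbers the reserved channel sits mostly idle between I pictures, which is exactly the waste described above; allocating for P pictures instead would leave I pictures unable to fit at all.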
1.2.2 Effective error control schemes
A typical wireless channel has limited bandwidth and is error prone and very unreliable
because of fading, noise, delay spread and interference from other users sharing the band
[36]. These limitations present a hostile communication environment, especially for
video applications. The quality of service (QoS) requirements for video are quite
different from those for voice and data services. Compared to voice, video transmission
requires more reliable channels with a low end-to-end bit error rate, which should be
less than 10^-5 [1]. The transmission data rate for video, even after compression, is
much higher than that for voice. Unlike data services, real-time video must ensure a
small bounded delay, which should be less than 300 ms [1]. All these requirements call
for a proper error control mechanism.
The design of an appropriate error control scheme rests on several considerations. First,
the objective of the error control scheme is to improve the end-to-end bit error rate as
much as possible; note, however, that its capability strongly depends on the channel
characteristics and error patterns. Second, the overhead imposed by the error control
scheme should be as low as possible to increase the system throughput. Third, the delay
incurred by the error control scheme should be as small as possible, especially for
real-time services. Finally, the error control scheme should be simple, to minimize the
design and implementation cost.
Traditionally, error control is mainly implemented at the data link layer of a network,
where forward error correction (FEC) [2] schemes and automatic repeat request (ARQ)
[2] techniques are employed to combat the bit errors in the communication channels.
These techniques have achieved great success for data and speech communication
services in both wired and wireless environments.
FEC uses error correction codes for reliable data transmission. With FEC alone, a
communication system has constant throughput and a low bounded delay, which are
quite important for real-time services. For a non-stationary wireless fading channel, the
most serious disadvantage of FEC is that it is static: it must be designed and
implemented for the worst-case channel conditions. This is clearly inefficient, since
when the channel condition is good, the FEC designed for the worst channel condition
wastes a great deal of channel resource.
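The fixed cost of worst-case FEC can be illustrated with a deliberately simple code. The sketch below uses a rate-1/3 repetition code, invented for illustration (not the convolutional codes discussed later in this thesis); the point is that the redundancy is transmitted whether or not the channel actually introduces errors:

```python
import random

def fec_encode(bits):
    """Rate-1/3 repetition code: transmit every bit three times."""
    return [b for bit in bits for b in (bit, bit, bit)]

def fec_decode(coded):
    """Majority vote over each triple; corrects any single error per triple."""
    return [1 if sum(coded[i:i + 3]) >= 2 else 0
            for i in range(0, len(coded), 3)]

random.seed(1)
data = [random.randint(0, 1) for _ in range(100)]
coded = fec_encode(data)          # 200% overhead, paid unconditionally

# Clean channel: the redundancy buys nothing, but is transmitted anyway.
clean = fec_decode(coded)

# Noisy channel: a single flipped bit inside a triple is corrected.
noisy = list(coded)
noisy[30] ^= 1
recovered = fec_decode(noisy)

print(clean == data, recovered == data)
```

Designing the code for the worst case means the clean-channel run carries the same three-fold expansion as the noisy one, which is the inefficiency the text describes.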
To improve on plain FEC, layered source coding and unequal error protection schemes
have been developed. In these schemes, a compressed video sequence is divided into
layers with different priorities according to their importance. The higher priority layers
receive more error protection while the lower priority layers receive less [37]. This
method certainly improves the efficiency of the error correction schemes. However,
since a compressed video bitstream has a variable length structure, extra overhead is
needed to indicate which layer each part of the bitstream belongs to. This overhead may
not be significant in high bit-rate video such as MPEG-1 [13] or MPEG-2 [21], but it
occupies a large portion of a low bit-rate video bitstream, which is not efficient [37].
On the other hand, ARQ techniques require retransmission of packets that are detected
as having errors. ARQ can achieve high system reliability, but it reduces throughput and
causes long and variable delay because of the retransmissions. A better method is to
combine FEC and ARQ, in the so-called hybrid ARQ schemes. Hybrid ARQ is reliable,
efficient and adaptive, and offers much better performance, especially for time-varying
fading channels [38]. However, for real-time video services, hybrid ARQ still has its
limitations.
Theoretically, FEC can correct most errors if the error correction code is properly
designed, and if FEC fails, ARQ can always be employed to ensure correct data
delivery. However, for real-time video applications the number of packet
retransmissions is limited, so the power of ARQ is limited compared with data
applications, where the delay requirement is not so strict. Moreover, in most
telecommunication systems the supported FEC schemes are limited. For
instance, GSM [3,4] and GPRS [5,6] networks support only four channel coding
schemes. It is thus unavoidable that the transmission system of a telecommunication
network will leave some bits in error in the final video bitstream delivered to the
application layer. This creates a strong demand for the source data to have some kind of
error resilience features.
1.3 Challenges on Source Video Coding
The state of the art of video compression is to use the DCT (Discrete Cosine
Transform) [14] and motion compensation [15] to exploit spatial and temporal
redundancy. With the employment of the variable-length Huffman coding technique
[16], the coding efficiency is further improved. While these techniques do achieve
high coding efficiency, they leave a compressed video bitstream very vulnerable to the
errors inherent in mobile transmission channels. Because of the use of VLC (Variable
Length Code), and because the only synchronization point in the encoded video
bitstream is the PSC (Picture Start Code) if no packetization is employed, a single
residual error bit can cause the video decoder to lose synchronization with the encoder
until the next encoded video frame. In such cases, the rest of the bitstream for the
frame containing the error is undecodable. Here a residual error refers to an error
delivered with the source data to the application layer by the transmission system of a
network, after the first error control has taken place in the data link layer of the
network. Due to motion compensation, the effects of an error within an encoded video
frame can propagate to the following frames until the next Intra frame is encountered
in the video bitstream. It is not difficult to see how much degradation a single error bit
can cause.
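The loss of synchronisation caused by a single bit error in a VLC stream can be demonstrated with a toy prefix code (the code table below is invented for illustration; real coders use Huffman tables derived from symbol statistics):

```python
# Toy variable-length (prefix) code: one flipped bit shifts the parse of
# every following codeword, so all symbols after the error are suspect.
CODE = {'a': '0', 'b': '10', 'c': '110', 'd': '111'}
DECODE = {v: k for k, v in CODE.items()}

def encode(symbols):
    return ''.join(CODE[s] for s in symbols)

def decode(bitstring):
    out, buf = [], ''
    for bit in bitstring:
        buf += bit
        if buf in DECODE:           # a complete codeword has been seen
            out.append(DECODE[buf])
            buf = ''
    return out

msg = list('abacad')
clean = encode(msg)
# Flip the second bit: every codeword boundary after it is misjudged.
corrupt = clean[0] + ('1' if clean[1] == '0' else '0') + clean[2:]
assert decode(clean) == msg
assert decode(corrupt) != msg       # decoder has lost synchronisation
```

In a real bitstream the decoder cannot even detect where the parse went wrong until it finds the next PSC, which is why one residual bit can cost a whole frame.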
Now we can see what the challenge is: transmitting an extremely vulnerable encoded
video bitstream over an extremely hostile mobile environment. Both the network and
the encoded video bitstream call for some kind of error resilience feature to realize
video transmission over a mobile network.
1.4 State of the Art of the Current Error Resilience Tools
To cope with the requirements of error resilient video coding, diverse error resilience
techniques have been developed [11,12], and some of them have been incorporated
into the MPEG-4 [7,8] and H.263 [9,10] video coding standards. However, all these
error resilience video coding tools are passive in the sense that they do not have the
capability to correct the error bits in a video bitstream; what they can do is limit the
effect and influence of the errors to a certain degree. For instance, packetization
(which has been adopted by both the H.263 and MPEG-4 video coding standards)
simply places resynchronization markers in an encoded video bitstream, letting the
decoder regain synchronization when an error occurs by searching for these markers,
thereby limiting the error effects to the packet where the error occurs. Obviously,
these passive error resilience techniques are not satisfactory when employed in mobile
environments. Even when errors are limited to one or several packets, the information
contained in those packets still has to be discarded, and this discarded information is
unrecoverable. With the inter-frame error propagation caused by motion estimation
and motion compensation, the quality of subsequent decoded video frames rapidly
declines to an unrecognizable level if no proper measures are taken. The conclusion
we reach from observation and practice is that passive error resilience techniques are
not enough. Active error resilience techniques, with the capability to correct the errors
in the bitstream before video decoding, need to be developed.
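The marker mechanism described above can be sketched as a toy packetiser; the marker byte pattern is illustrative, and the emulation-prevention rules that real standards use to keep markers out of payload data are deliberately omitted:

```python
# Sketch of marker-based resynchronisation.  On a corrupt packet the
# decoder skips to the next marker: only that packet's data is lost,
# instead of everything up to the next picture start code, but the lost
# packet itself remains unrecoverable (the "passive" limitation).
MARKER = b'\x00\x00\x01'    # illustrative marker value

def packetize(packets):
    return b''.join(MARKER + p for p in packets)

def depacketize(stream, is_corrupt):
    chunks = stream.split(MARKER)[1:]    # leading split element is empty
    return [p for i, p in enumerate(chunks) if not is_corrupt(i)]

pkts = [b'slice0', b'slice1', b'slice2', b'slice3']
stream = packetize(pkts)
kept = depacketize(stream, is_corrupt=lambda i: i == 2)
# Packet 2 is discarded and unrecoverable; packets 0, 1 and 3 survive.
```

The sketch makes the passivity concrete: the damage is contained, but nothing is corrected.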
1.5 Second Error Control and ECC Video
As stated in the last section, the main disadvantage of current error resilience
techniques is that they do not correct the errors in the video bitstream; instead they try
to reduce the error effects to a certain degree, using some kind of error concealment
technique. Rather than accepting the residual errors passively and trying to ‘repair’
their effects, a much more satisfactory result can be expected if we apply a ‘Second
Error Control’ (SEC) to an encoded video bitstream, recovering the corrupted video
data by actively correcting the errors in the bitstream. The SEC copes with the
residual errors in the source data at the application layer, after the ‘First Error Control’
in the data link layer has dealt with the original error bits in the transmission channel.
Here First Error Control is achieved by FEC (Forward Error Correction) and the
associated ARQ technique, as already described. Employing ARQ again in the SEC is
clearly unrealistic, since the first error control has probably used up the entire time
budget allowed for retransmission; the better choice is to use FEC. Since even
compressed video data imposes a large data rate on any transmission system, the
requirement on the SEC is very demanding, as the data rate allowed for SEC overhead
is limited. In this research we use a punctured convolutional code to realize SEC,
which achieves a very high coding rate while retaining very strong error correction
capability. Simulation results from this research have shown the potential success of
the proposed SEC scheme in applications involving video transmission in mobile
environments. When applied to video transmission, it is simpler, more easily
implemented, more efficient and more effective than the current error resilient video
coding tools in the MPEG-4 and H.263 video coding standards.
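The rate gain that makes punctured codes attractive for SEC can be illustrated with a toy rate-1/2 mother code; the generator polynomials and the puncturing pattern below are textbook-style examples in the spirit of [20], not necessarily those used in this thesis:

```python
# Rate-1/2 convolutional encoder (constraint length 3, generators 7 and 5
# octal) followed by puncturing: deleting 6 of every 14 mother-code bits
# raises the code rate from 1/2 to 7/8 at the cost of correction power.
G1, G2 = 0b111, 0b101

def conv_encode(bits):
    state, out = 0, []
    for b in bits:
        state = ((state << 1) | b) & 0b111
        out.append(bin(state & G1).count('1') % 2)  # parity of taps G1
        out.append(bin(state & G2).count('1') % 2)  # parity of taps G2
    return out

# Puncturing map over one period of 7 input bits (14 coded bits):
# 1 = transmit, 0 = delete.
P = [1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0]

def puncture(coded):
    return [b for b, keep in zip(coded, P * (len(coded) // len(P) + 1)) if keep]

info = [1, 0, 1, 1, 0, 0, 1]     # 7 information bits
mother = conv_encode(info)       # 14 coded bits at rate 1/2
sent = puncture(mother)          # 8 bits transmitted -> rate 7/8
```

At the decoder, the deleted positions are filled with neutral (erasure) metrics and the ordinary Viterbi algorithm for the mother code is run unchanged, which is what makes puncturing so flexible.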
1.6 Organization of the Thesis
Following the present introductory chapter, Chapter 2 presents a brief overview of GSM
networks, which lays the foundation for discussion on possibilities and scenarios for
live video transmission over GSM based networks. In Chapter 3 the potential and
possibility for video transmission over GPRS networks are explored, and some
proposals and suggestions are given to overcome the disadvantages inherent in GPRS
networks for live video communications.
Chapter 4 provides an extensive overview of the state of the art of video coding and
error resilient video coding techniques. The low bit rate video coding techniques are
described first, followed by an introduction to the current video coding standards.
Although not widely employed in practice, model based video coding techniques [17]
are also introduced in this chapter, because of their extremely high coding efficiency
and because they have been incorporated in the MPEG-4 video coding standard. As
they are used in all current video coding standards, motion estimation and motion
compensation techniques occupy a large portion of this chapter, and another important
aspect, the DCT (discrete cosine transform), is also described in detail. The main
emphasis of the chapter, however, is on error resilient video coding techniques: first
the general error resilience video coding techniques are outlined, and then the error
resilient video coding tools in the H.263 and MPEG-4 standards are described in
detail.
In Chapter 5, current error correction coding techniques are reviewed. The chapter
starts with a brief introduction to block codes [18], followed by a description of
convolutional codes [19]. The focus is on convolutional coding techniques, because
this is the error correction technique used in the proposed error resilient video coding
scheme. More specifically, the basic encoding structure of a convolutional code, the
maximum likelihood decoding algorithm (the Viterbi algorithm) and the performance
of convolutional codes are described in detail. The core techniques in the proposed
scheme, applied to source coding for error resilience at the application layer, are
punctured convolutional codes [20], which form an important subclass of
convolutional codes. These are described at length, followed by a discussion of their
attractiveness and flexibility.
In Chapter 6, the passiveness and disadvantages of the error resilience techniques in
the current standards are identified, along with the need for active error resilience
tools. One possible solution, the SEC approach, is given to overcome the problems
inherent in the error resilience tools of the current video coding standards. The ECC
(Error Correction Coding) scheme, as an implementation of the SEC approach, is
described in detail in this chapter; in the proposal it is achieved with a punctured
convolutional code. The simulation results, which prove the success of SEC, are
given, followed by some discussion.
Though the proposed new scheme shows very high performance, it does have a
drawback. Because the only synchronization point in an encoded bitstream is the
picture start code [8] if no packetization is employed, any single error bit that escapes
the protection of the ECC scheme will cause the decoder to lose synchronization with
the encoder until the start of the next frame (though it is very rare for errors to escape
the ECC protection if the ECC rate matches the residual error conditions). To solve
this problem, a new scheme using back channel messages is proposed in Chapter 7.
The new scheme is named IFR (Intra Frame Relay): when an Intra frame is decoded
with errors, the corresponding area in the following P frame is encoded in Intra mode,
to increase the possibility that the following picture frames have decent reference
frames. The simulation results in this chapter give positive support to the IFR scheme.
In Chapter 8, the ECC approach is further enhanced by using the soft decision Viterbi
decoding algorithm to decode the punctured convolutional code. Simulation results
based on 100 tests show that ECC with coding rate 7/8 corrects all the residual errors
when the BER (bit error rate) of the residual errors is less than 10^-3. More
significantly, when IFR is employed, decent video communication can be realized
even in residual error conditions as poor as a BER of 10^-2. In Chapter 6 it is shown
that the packetization approach fails to deliver a satisfactory output once the BER of
the final video bitstream reaches 10^-4.
In Chapter 9, the ECC scheme is extended to more challenging situations, where
bursty errors and packet loss occur frequently. To combat these, the final video
bitstream (after convolutional coding is performed) is interleaved before being sent to
the channel [28,29]. Simulation results show that ECC with interleaving is an
effective approach to cope with both bursty residual errors and packet loss.
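The interleaving step can be sketched as a classical block interleaver; the array dimensions below are illustrative, not those used in the thesis simulations:

```python
# Block interleaver sketch: write symbols row-wise into a depth x width
# array and read them out column-wise.  A channel burst of up to `depth`
# consecutive errors is spread into isolated errors at least `width`
# apart after de-interleaving, which a convolutional decoder copes with
# far better than with a contiguous burst.
def interleave(seq, depth, width):
    assert len(seq) == depth * width
    rows = [seq[r * width:(r + 1) * width] for r in range(depth)]
    return [rows[r][c] for c in range(width) for r in range(depth)]

def deinterleave(seq, depth, width):
    return interleave(seq, width, depth)   # inverse: swap the dimensions

data = list(range(24))
tx = interleave(data, depth=4, width=6)
assert deinterleave(tx, depth=4, width=6) == data
# A burst hitting tx positions 0-3 corrupts data positions 0, 6, 12, 18:
burst_hits = sorted(tx[0:4])
```

The price of the technique is latency: both ends must buffer depth × width symbols before anything can be released, which is why the array dimensions must be chosen against the delay budget of live video.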
The possibility of combining the advantages of ECC and packetization is explored in
Chapter 10. The conclusion from simulation is that packetization can increase the
performance of ECC when a back channel is not available in practice, even though
packetization is much less effective and much less efficient than the ECC scheme
when used alone. The thesis is concluded in Chapter 11 with suggested future
research directions.
1.7 Contributions and Publications from the Research
The first research contribution of this work is the proposal of a method to update the
channel capacity to accommodate the different data rate requirements of the different
frame types of a compressed video bitstream during live video communication. The
reconfiguration of the multi-slot allocation to exceed the current set of active channels
should be achieved through communication between the MS (Mobile Station) and the
BSS (Base Station Subsystem), rather than by re-accessing the PRACH (Packet
Random Access Channel) during real time transmission, which would involve further
contention and introduce delays. The content of this communication should be
embedded in the video data transmitted from the MS to the BSS (see Chapter 2 and
Chapter 3 for details).
The most important contribution of the work included in this thesis is the introduction
of the SEC scheme and three error resilience tools based on the SEC approach. In the
proposed application of SEC to video transmission in mobile environments, SEC
achieved with ECC has outperformed the error resilience tools represented by
resynchronization in the MPEG-4 standard, and has opened a new direction for the
development of error resilience techniques. It is expected that these proposals might
be considered for incorporation in future video coding standards, to augment the error
resilience tools in the current standards.
The research included in this thesis has resulted in a number of publications, which are
listed as References [22-33]. The key publications and the further contributions
contained in them are outlined in the following paragraphs.
[22] explores the possibility of video transmission over a GSM network. The
introduction of HSCSD and GPRS into the GSM network makes it possible to
transmit a video bitstream encoded with the H.261 or H.263 video coding standards
through a GSM network. However, to make the application realistic, a compromise is
needed between the wide variations in bit rate needed to cater for all picture types,
modeled by an unbounded VBR scheme, and the inflexibility imposed by the network
in allowing only quantum channel allocation, modeled by step-function bit-rate
performance variations or variable constant bit rate (VCBR). The resolution of this
dilemma relies on better system integration and interoperation between the network
behavior and the video coding process, by extracting useful bit-rate information over
many successive frames and exerting careful, intelligent control throughout the
transmission.
[23] describes the aspects associated with video communication over mobile networks
for medical applications, identifies the conceptual and operational problems with these
applications, and gives some suggestions to solve them.
[24] builds on the earlier work in [22]. The advantage of using GPRS rather than
HSCSD to transmit a low bit rate video bitstream is identified. An elementary
dynamic radio channel allocation scheme, based on coordination between base station
and mobile station, is proposed, and some typical more complex situations which
would need further development are described.
The ECC approach first appeared in [27,30]. The simulation results in those papers,
using the hard decision Viterbi decoding algorithm, give the first challenge to the
traditional approaches in the standard in terms of coding efficiency and reconstructed
video output quality.
The IFR scheme, which copes with the disadvantage of the ECC approach, is proposed
in [31,32] with the support of simulation results. The ECC scheme is further improved
in [34] using the soft-decision Viterbi decoding algorithm for convolutional decoding.
The ECC scheme is extended to bursty error and packet loss situations in [28,29],
where the video bitstream generated after ECC is interleaved before being sent to a
radio channel. Simulation results show that ECC enhanced with interleaving is
effective in coping with both bursty errors and packet loss.
The ECC approach is generalized into second error control for error resilience in [33,
35]. These papers also demonstrate that ECC enhanced with IFR can realize video
communication in residual error conditions where the BER of the residual errors
reaches 10^-2.
References
[1] Jari Hamalainen, “Design of GSM High Speed Data Services” Ph.D. Thesis, Nokia
Mobile Phones Ltd, Tampere, Finland, 1996.
[2] S. Lin and D. J. Costello, “Error Control Coding: Fundamentals and Applications”,
Prentice-Hall, Inc. 1983.
[3] Asha Mehrotra, “GSM System Engineering”, Artech House Publishers, 1997.
[4] Siegmund M. Redl, Matthias K. Weber and Malcolm W. Oliphant, “An
Introduction to GSM”, Artech House Publishers, 1995.
[5] J. Hamalainen, “General Packet Radio Service”, in Z. Zvonar, P. Jung and
K. Kammerlander (eds.), “GSM Evolution Towards 3rd Generation Systems”, Kluwer
Academic Publishers, 1999, pp.65-80.
[6] J. Cai and D. J. Goodman, “General Packet Radio Service in GSM”, IEEE
Communications Magazine, October 1997, pp.122-131.
[7] R. Talluri, “Error-Resilient Video Coding in the ISO MPEG-4 Standard”, IEEE
Communications Magazine, June 1998, pp.112-119.
[8] ISO/IEC 14496-2, “Information Technology – Coding of Audio-Visual Objects:
Visual”.
[9] ITU – T Recommendation H.263, “Video coding for low bit rate communication”,
February 1998.
[10] J. Ott, Stephan Wenger and Gerd Knorr, “Application of H.263+ Video Coding
Modes in Lossy Packet Network Environments”, Journal of Visual Communication
and Image Representation, Vol. 10, 1999, pp.12-38.
[11] Y. Wang and Q. Zhu, “Error Control and Concealment for Video Communication:
A Review”, Proceedings of the IEEE, Vol. 86, No. 5, May 1998. pp.974 – 997.
[12] Y. Wang, S. Wenger, J. Wen and A. Katsaggelos, “Error Resilient Video Coding
Techniques – Real-time Video Communications over Unreliable Networks”, IEEE
Signal Processing Magazine, July 2000. pp.61-82.
[13] ISO/IEC 11172-2, “Information technology - Coding of moving pictures and
associated audio for digital storage media at up to about 1.5 Mbit/s: Part 2 Video”,
Aug. 1993.
[14] N. Ahmed, T. Natarajan and K. R. Rao, “Discrete Cosine Transform”, IEEE
Trans. on Computers, 1974, pp.90-93.
[15] B. Furht, J. Greenberg and R. Westwater, “Motion Estimation Algorithms for
Video Compression”, Kluwer Academic Publishers, November 1996.
[16] D. Salomon, “Data Compression”, Springer Verlag, December 1997.
[17] L. Torres and M. Kunt, “Second generation video coding techniques”, in
L. Torres and M. Kunt (eds.), “Video Coding: The Second Generation Approach”,
Kluwer Academic Publishers, 1996, pp.1-31.
[18] J. G. Proakis, “Digital Communication”, McGraw – Hill, 1995.
[19] R. Johannesson and K. Sh. Zigangirov, “Fundamentals of Convolutional Coding”,
IEEE Press, 1999.
[20] Y. Yasuda, K. Kashiki and Y. Hirata, “High-Rate Punctured Convolutional Codes
for Soft Decision Viterbi Decoding”, IEEE Trans. on Comm., Vol. Com-32, No. 3,
March 1984. pp. 315-319.
[21] ISO/IEC: 13818 (MPEG-2). “Information technology – Generic Coding of Moving
Pictures and Associated Audio Information”.
[22] Bing Du and Anthony Maeder, “Approaches to Video Transmission over GSM
Networks”, Proceedings of SAICSIT 99, South Africa, pp. 28-31.
[23] Bing Du, Anthony Maeder and Miles Moody, “Televideo Transmission over
Mobile Channels for Medical Applications”, ARC Special Research Workshop on
Aspects of Telemedicine, Gold Coast, Australia, 24 October 1999.
[24] Bing Du, A. Maeder and M. Moody, “A framework for live video delivery over
GPRS networks”, Proceedings of AMOC 2000, November 2000, Penang,
Malaysia, pp. 97-101.
[25] Bing Du, A. Maeder and M. Moody, “Video delivery over mobile communication
channels”, CRC-SS annual conference, Adelaide, Australia, 2000.
[26] Bing Du, A. Maeder and M. Moody, “Dynamic hybrid ARQ scheme for video over
GPRS network”, CRC-SS annual conference, Newcastle, Australia, 2001.
[27] Bing Du, M. Ghanbari, “MPEG-4 Video with Error Correction Coding”, Internal
technical report, University of Essex, June 2002.
[28] Bing Du, M. Ghanbari, “ECC video and its performance in bursty channel errors”,
Proceedings of Iranian Conference on Electrical Engineering (ICEE) 2003, 6-8
May, 2003, Shiraz, Iran.
[29] Bing Du, M. Ghanbari, “ECC video in bursty channel errors and packet loss”,
Proceedings of Picture Coding Symposium 2003, Saint-Malo, France, 23 - 25 April
2003. pp.99-103.
[30] Bing Du, Anthony Maeder and Miles Moody, “A new approach for error
resilience in video transmission using ECC”, Proceedings of the International
Workshop on Very Low Bit-rate Video, Madrid, Spain, 18-19 September 2003,
pp.275-282.
[31] Bing Du, A. Maeder and M. Moody, “Intra Frame Relay in ECC video”, Image
and Vision Computing New Zealand (IVCNZ) 2003, Palmerston North, New Zealand,
24-25 November 2003, pp.193-198.
[32] Bing Du, Anthony Maeder and Miles Moody, “ECC video with Intra Frame
Relay”, Proceedings of the IADIS International Conference WWW/Internet 2003,
ICWI 2003, Algarve, Portugal, November 5-8, 2003. IADIS 2003, ISBN 972-
98947-1-X, pp.1007 - 1012.
[33] Bing Du, A. Maeder and M. Moody, “Second Error Control for Error Resilience
Video Coding”, Proceedings of Digital Image Computing - Techniques and
Applications (DICTA) Conference, Sydney, Australia, 10-12 December 2003,
pp.1027-1036.
[34] Bing Du, Anthony Maeder and Miles Moody, “ECC Approach with Soft-Decision
Viterbi Decoding for Error Resilience in Video Communications”, submitted to
27th Australasian Computer Science Conference (ACSC 2004), Dunedin, New
Zealand, 18-22 January 2004.
[35] Bing Du, Anthony Maeder and Miles Moody, “Second Error Control for Live
Video Communication”, submitted to IEEE Transactions on Circuits and Systems
for Video Technology.
[36] W. C. Jakes, “Microwave mobile communications”, May 1994, Wiley-IEEE Press.
[37] S. R. McCanne, “Scalable compression and transmission of Internet multicast
video”, Ph.D. thesis, University of California, Berkeley, CA, December 1996.
[38] Q. Zhang, S. Kassam, “Hybrid ARQ with Selective Combining for Fading
Channels” IEEE Journal on Selected Areas in Comm. Vol. 17 Num. 5, May 1999.
2 OVERVIEW OF GSM SYSTEM
2.1 Architecture and functions of the GSM network
The GSM network can be divided into four main parts [1,2,3,4]:
• The Mobile Station (MS).
• The Base Station Subsystem (BSS).
• The Network and Switching Subsystem (NSS).
• The Operation and Support Subsystem (OSS).
The architecture of the GSM network is presented in Figure 2.1.
Figure 2.1 General architecture of a GSM network
2.1.1 Mobile station
The mobile station (MS) consists of the mobile equipment (the terminal) and a smart
card called the Subscriber Identity Module (SIM). The SIM provides personal mobility,
so that the user can have access to subscribed services irrespective of a specific
terminal. By inserting the SIM card into another GSM terminal, the user is able to
receive calls at that terminal, make calls from that terminal, and receive other
subscribed services. The SIM card may be protected against unauthorized use by a
password or personal identity number. The mobile equipment is uniquely identified by
the International Mobile Equipment Identity (IMEI).
The SIM card contains the following information:
• IMSI, the International Mobile Subscriber Identity, used to identify the subscriber to
the system. The IMSI is independent of the terminal, thereby allowing personal
mobility.
• TMSI, the Temporary Mobile Subscriber Identity, used together with the LAI to
identify the MS while it is served by the VLR that covers the location area.
• LAI, location area identity.
• Ki, a permanent key for authentication.
• Kc, a cipher key.
2.1.2 The Base Station Subsystem
All radio-related functions are performed in the Base Station Subsystem (BSS). The
BSS consists of base station controllers (BSCs) and base transceiver stations (BTSs).
2.1.2.1 The Base Transceiver Station
The base transceiver station (BTS) handles the radio interface to the mobile station. The
BTS is the radio equipment (transceivers and antennas) needed to service each cell in
the network. A group of BTSs are controlled by a BSC.
2.1.2.2 The Base Station Controller
The base station controller (BSC) provides all the control functions and physical links
between the MSC and BTS. It is a high-capacity switch that provides functions such as
handover, cell configuration data, and control of radio frequency power levels in base
transceiver stations. A number of BSCs are served by an MSC.
2.1.3 The Network and Switching Subsystem (NSS)
The main role of the NSS is to manage the communications between mobile users and
other users, such as other mobile users, ISDN users and fixed telephony users. It also
includes the databases needed to store information about the subscribers and to
manage their mobility. The different components of the NSS are described below.
2.1.3.1 The Mobile services Switching Center (MSC)
The MSC is the central component of the NSS. It performs the telephony switching
functions of the system and controls calls to and from other telephone and data systems.
It also performs such functions as toll ticketing, network interfacing, common channel
signaling, and others.
2.1.3.2 The Gateway Mobile services Switching Center (GMSC)
A gateway is a node interconnecting two networks. The GMSC is the interface between
the mobile cellular network and the PSTN. It is in charge of routing calls from the fixed
network towards a GSM user. The GMSC is often implemented in the same machines
as the MSC.
2.1.3.3 Home Location Register (HLR)
The HLR is a database used for storage and management of subscriptions. It is
considered the most important database since it stores permanent data on subscribers,
including a subscriber’s service profile, location information, and activity status. When
an individual buys a subscription from one of the PCS operators, he or she is registered
in the HLR of that operator.
The HLR contains the following information:
• MSISDN of MS, the mobile station’s ISDN number which is dialled by a subscriber
when calling the MS. It is used by the fixed network to route calls for MS to a
nearby gateway MSC in the home PLMN of MS.
• IMSI of MS.
• Originating and terminating service profile of MS.
• Address of the VLR associated with MSC that is currently serving the MS.
2.1.3.4 Visitor Location Register (VLR)
The VLR is a database that contains temporary information about subscribers that is
needed by the MSC in order to service visiting subscribers. The VLR is always
integrated with the MSC. When a mobile station roams into a new MSC area, the VLR
connected to that MSC will request data about the mobile station from the HLR. Later,
if the mobile station makes a call, the VLR will have the information needed for call
set-up without having to interrogate the HLR each time.
The VLR stores the following information:
• MSISDN.
• Originating and terminating service profile of MS.
• IMSI.
• TMSI.
• LAC, the location area code of the current MS location area.
• MSRN, the mobile station roaming number (the equivalent of a temporary location
directory number).
2.1.3.5 The Authentication Center (AuC)
The AuC provides authentication and encryption parameters that verify the user’s
identity and ensure the confidentiality of each call. The AuC protects network operators
from different types of fraud found in today’s cellular world. The authentication
procedure involves the SIM card and the AuC. A secret key, Ki, stored in the SIM
card and the AuC, and an authentication algorithm called A3 are used to verify the
authenticity of the user. The mobile station and the AuC each compute an SRES
(signed result) using the secret key, the algorithm A3 and a random number generated
by the AuC. If the two computed SRES values are the same, the subscriber is
authenticated. The
different services to which the subscriber has access are also checked. Another
security procedure is to check the equipment identity: if the IMEI number of the
mobile is authorized in the EIR, the mobile station is allowed to connect to the
network. In order to assure user confidentiality, the user is registered with a
Temporary Mobile Subscriber Identity (TMSI) after the first location update
procedure. Enciphering, using Kc, is a further option guaranteeing very strong
security.
The information stored in AuC is:
• MSISDN.
• Ki.
• Kc.
2.1.3.6 The Equipment Identity Register (EIR)
The EIR is also used for security purposes. It is a register containing information
about the mobile equipment; more particularly, it contains a list of all valid terminals.
A terminal is identified by its International Mobile Equipment Identity (IMEI). The
EIR thus makes it possible to bar calls from stolen or unauthorized terminals (e.g., a
terminal which does not respect the specifications concerning output RF power). The
AuC and EIR are implemented as stand-alone nodes or as a combined AuC/EIR node.
2.1.3.7 The GSM Interworking Unit (GIWU)
The GIWU corresponds to an interface to various networks for data communications.
During these communications, the transmission of speech and data can be alternated.
2.1.4 The Operation and Support Subsystem (OSS)
The OSS is connected to the different components of the NSS and to the BSC, in
order to control and monitor the GSM system. It is also in charge of controlling the
traffic load of the BSS. However, the growing number of base stations that
accompanies the development of cellular radio networks has led to some maintenance
tasks being transferred to the BTS, which considerably decreases the cost of
maintaining the system.
2.1.5 Additional Functional Elements
2.1.5.1 Message Center
The message center (MXE) is a node that provides integrated voice, fax, and data
messaging. Specifically, the MXE handles short message service, cell broadcast, voice
mail, fax mail, email, and notification.
2.1.5.2 Mobile Service Node
The mobile service node (MSN) is the node that handles the mobile intelligent network
(IN) services.
2.1.6 The geographical areas of the GSM network
Figure 2.2 presents the different areas that form a GSM network. A cell,
identified by its Cell Global Identity number (CGI), corresponds to the radio coverage
of a base transceiver station. A Location Area (LA), identified by its Location Area
Identity (LAI) number, is a group of cells served by a single MSC/VLR. A group of
location areas under the control of the same MSC/VLR defines the MSC/VLR area. A
Public Land Mobile Network (PLMN) is the area served by one network operator.
Figure 2.2 GSM network areas
2.2 Signalling system in GSM
2.2.1 GSM Radio Channels
2.2.1.1 Dedicated Channels
Dedicated Channels include TCHs (traffic channels) and DCCHs (dedicated control
channels). The DCCH, used for message transfers between the network and the
mobile station, includes the SDCCH (standalone dedicated control channel), the
SACCH (slow associated control channel) and the FACCH (fast associated control
channel). A BSS has a pool of SDCCHs and a pool of TACHs (traffic and associated
control channels). The details of these channels are described below.
• SDCCH is allocated to an MS for call set-up signalling and released when this
signalling is complete.
• TCH is used for carrying speech and data.
• SACCH is always used in association with either a traffic channel or SDCCH. The
purpose of the SACCH is channel maintenance. The SACCH carries control and
measurement parameters or routing data needed to maintain a link between the
mobile and the base station.
• FACCH is associated with a TCH. It can carry the same information as the SDCCH.
The difference is that the SDCCH exists on its own, whereas the FACCH replaces all
or part of a traffic channel. If during a call there is a need for heavy-duty signalling
with the system, at a rate much higher than the SACCH can handle, the FACCH
appears in place of the traffic channel.
• TACH is the combination of a TCH and its SACCH as well as FACCH.
2.2.1.2 Common Control Channels (CCCH)
One RF carrier in each cell contains a CCCH which is time-divided into a number of
common (point-to-multipoint, unidirectional) channels, for signalling between a BSS
and all mobiles in the cell that are active, but not involved in a call:
• BCCH (Broadcast Control Channel) informs the mobile station about specific
system parameters it needs to identify the network or to gain access to the network.
The parameters include, among others, the LAC (location area code), the MNC
(mobile network code, identifying a GSM network within a country), information on
which frequencies the neighbouring cells may be found, different cell options and
access parameters.
• FCCH (frequency control channel) contains information for mobiles concerning
frequency synchronisation with the RF carrier.
• SCH (synchronisation channel) supplies the mobile station with information
enabling it to acquire frame and time synchronisation with the BSS.
• PAGCH (paging and access grant channel) broadcasts paging messages. Also when
the network (MSC) allocates a SDCCH to a MS, it informs the MS with a message
on PAGCH.
• RACH (random access channel), the only uplink (from MS to NSS) common
control channel, is used by mobiles to request an SDCCH from the network.
2.2.2 Signalling Interfaces and Protocols
The interfaces and protocols for signalling between a MS and the PLMN are shown in
Fig.2.3. The Um (radio) interface is between a MS and the BSS; the A (cable) interface is
between the BSS and the MSC. A further interface, A-bis, between BTS and BSC is shown in
Fig.2.4. The GSM-MAP (mobile application part) interfaces between the equipment
entities of the network and switching system are shown in Fig.2.5.
Figure 2.3 Um and A interface
2.2.2.1 Um Interface
The signalling protocol on this interface has three layers.
• Physical Layer (Layer 1) consists of those parts of the RF channels that contain
signalling channels (SACCH, FACCH, BCCH, SCH, FCCH, PAGCH, RACH and
SDCCH).
• Data Link Layer (Layer 2) [8], known as LAPDm, is a modified version of the ISDN
[5] link access protocol for D-channels.
• Message Layer (Layer 3). In MS this layer consists of three parts:
RR (radio resource management) sublayer at a MS communicates with its peer
in the BSS. For example, when RR at the BSS allocates a TACH or a SDCCH
channel to a MS, it informs the MS with a RR message.
MM (mobility management) sublayer messages support MS location updating
and authentication.
CM sublayer which has a further three parts:
(a) CC (call control) contains the messages for the set-up and release of
connections to the MS.
(b) SS (supplementary services) concerns the management of the
supplementary services. MS and HLR are the only entities involved in SS
management.
(c) SMS (short message service) is a service by which subscribers can send
short (text) messages to a MS.
2.2.2.2 A Interface
The signalling protocol on this interface is adapted from Signalling System 7. The MTP
(message transfer part) shown in Fig.2.3 actually comprises 3 sublayers: MTP1, MTP2
and MTP3. MTP1 and MTP2 correspond respectively to the Physical Layer and Data Link
Layer of the OSI model. MTP3 and the SCCP (signalling connection control part) fulfil
the function of the Network Layer of OSI. The user of the SCCP, BSSAP (BSS application
part), which comprises DTAP (direct transfer application part) and BSSMAP (BS
system management application part), actually passes only the RR and O&M (operations
and maintenance) messages. DTAP is used by the BSS to transfer RR messages
transparently between MS and MSC. BSSMAP is the process within the BSS that controls
RR in response to instructions from the MSC; it is used in the assignment and switching
of RR at both call setup and handover.
2.2.2.3 A-bis Interface
As a number of BTSs can be served by one BSC, as shown in Fig. 2.4, there is a need for
communication between BTS and BSC.
Figure 2.4 A-bis interface
The protocol used at this interface contains three layers: a physical layer (layer 1),
signalling links (layer 2), and an upper layer (layer 3) of signalling.
The physical layer transmits either at 2.048 Mbps or at 64 kbps. Four coded
speech channels at 13 kbps may be multiplexed to form a 64 kbps data channel after being
padded with extra bits.
Layer 2 uses the standard LAPD. The main distinction between LAPD and LAPDm
is that LAPDm is only used for the unacknowledged mode of operation, which
applies to BCCHs and CCCHs. Both FCCH and SCH under BCCH do not require
acknowledgment. Similarly, no acknowledgment is needed for PCH and AGCH.
Layer 3 deals with the messages transferred from the OMC to the BTS, as there is no direct
link between BTS and OMC. All messages from the OMC go first to the BSC and then
are routed to the BTS.
2.2.2.4 MAP Interfaces
The interfaces between the different network parts are shown in Fig.2.5. These interfaces
are designated MAP/B through MAP/H. The SS7 protocol [6,7] is used at all these
interfaces. The MAP protocol is used for remote database access, performed by the
exchange of messages that are grouped into simple dialogues, mostly in the form of
query and response.
Figure 2.5 MAP interfaces
2.2.2.5 X.25 Interface System
Communication between the MSC and the OMC, including billing centre information, is
accomplished by deploying the X.25 protocol [9].
2.3 The Multiple Access Scheme
The radio interface of GSM uses a combination of FDMA (Frequency Division
Multiple Access) and TDMA (Time Division Multiple Access) with some frequency
hopping.
2.3.1 FDMA
The use of frequency resources in the GSM development has followed three stages:
Primary GSM, E-GSM and DCS-1800.
2.3.1.1 Primary GSM
The Primary GSM system refers to the first generation of GSM systems, in which two
25 MHz frequency bands in the 900 MHz range are used. The mobile station transmits in
the 890 to 915 MHz frequency range while the base station transmits in the 935 to
960 MHz range. The frequency bands are divided into 125 channels of 200
kHz each, numbered from 0 to 124. However, only the 124 channels
from number 1 to 124 are used, and these channels are usually referred to by their ARFCN
(absolute radio frequency channel number). Channel number 0 is used as a guard band
between GSM and other services on lower frequencies.
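The mapping from channel number to carrier frequency follows directly from these figures: channels are 200 kHz wide starting at the 890 MHz band edge, and the 935 to 960 MHz downlink band implies a 45 MHz duplex spacing. A minimal sketch (function names are illustrative):

```python
def uplink_mhz(arfcn: int) -> float:
    """Primary GSM uplink carrier frequency for ARFCN 1..124."""
    if not 1 <= arfcn <= 124:
        raise ValueError("Primary GSM uses ARFCNs 1 to 124")
    # Channels are 200 kHz wide, counted up from the 890 MHz band edge.
    return 890.0 + 0.2 * arfcn

def downlink_mhz(arfcn: int) -> float:
    """Downlink carrier: the uplink frequency plus the 45 MHz duplex spacing."""
    return uplink_mhz(arfcn) + 45.0

print(round(uplink_mhz(1), 1), round(downlink_mhz(1), 1))  # 890.2 and 935.2 MHz
```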
2.3.1.2 E-GSM
With the further development of the GSM standard, an additional range of frequencies
has been made available to the system. For each of the two duplex frequency ranges,
one for the forward direction and the other for the reverse direction, an additional 10
MHz has been added to the bottom end of the band, extending the frequency range to
cover another 50 channels. These additional channels are numbered from 974 to
1023. Channel 0 is returned to use in the extended GSM system; instead, the lowest
channel (number 974) serves as the guard band.
2.3.1.3 DCS-1800
As the evolution of GSM progressed towards use as a personal communication network,
the official name of the system became DCS-1800 when ETSI completed its
specification. In DCS-1800 the frequency ranges of 1710 to 1785 MHz in
the uplink direction and 1805 to 1880 MHz in the downlink are used, and the duplex
spacing is 95 MHz with 374 channels of 200 kHz each.
2.3.2 TDMA
As stated above, each carrier frequency has a width of 200 kHz. The TDMA scheme
splits each frame of about 4.615 ms on this carrier into 8 timeslots of about 0.577 ms
each. Each of these timeslots is a physical channel occupied by an individual user. The
timeslots within a frame are numbered from 0 to 7. In traffic channel combinations, a
structure of 26 frames is defined as a multiframe. Similarly, in signalling channel
combinations, a multiframe is defined as the combination of 51 frames.
2.3.2.1 Traffic channel Frame Structure (26-Multiframe)
The traffic channel frame structure is shown in Fig.2.6. The length of a 26-frame
multiframe is 120 ms, which is how the length of a burst period is defined (120 ms
divided by 26 frames, divided by 8 burst periods per frame). Of the 26 frames, 24 are
used for traffic, 1 is used for the Slow Associated Control Channel (SACCH) and 1 is
currently unused. TCHs for the uplink and downlink are separated in time by 3 burst
periods, so that the mobile station does not have to transmit and receive simultaneously,
thus simplifying the electronics.
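The timing figures quoted here are mutually consistent with those in Section 2.3.2, as a few lines of arithmetic confirm (a sketch; variable names are illustrative):

```python
MULTIFRAME_MS = 120.0          # duration of the 26-frame traffic multiframe
FRAMES_PER_MULTIFRAME = 26
SLOTS_PER_FRAME = 8

frame_ms = MULTIFRAME_MS / FRAMES_PER_MULTIFRAME   # one TDMA frame
burst_ms = frame_ms / SLOTS_PER_FRAME              # one burst period

print(round(frame_ms, 3))  # 4.615 ms, as quoted in Section 2.3.2
print(round(burst_ms, 3))  # 0.577 ms
```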
In addition to these full-rate TCHs, there are also half-rate TCHs defined, although they
are not yet implemented. Half-rate TCHs will effectively double the capacity of a
system once half-rate speech coders are specified (i.e., speech coding at around 7 kbps,
instead of 13 kbps).
Figure 2.6 Traffic channel frame structure
2.3.2.2 Signalling Frame Structure
Just as the TCHs are always combined with an ACCH in traffic channel multiframes, the
signalling channels are always grouped together to form signalling multiframes. There
are 4 different combinations, listed below:
FCCH + SCH + CCCH + BCCH.
FCCH + SCH + CCCH + BCCH + SDCCH/4 + SACCH/4.
CCCH + BCCH.
SDCCH/8 + SACCH/8.
The different combinations have different multiframe structures, details of which can be
found in [1].
2.3.2.3 Structure of a TDMA Slot within a Frame
There are five different types of bursts (the contents of the timeslot) used to carry
information on the TCH and on the control channels: the normal burst, synchronisation
burst, frequency correction burst, access burst and dummy burst.
The normal burst is used to carry data and most signalling. It has a total length of
156.25 bits, made up of two 57-bit information sequences, a 26-bit training sequence
used for equalization, 1 stealing bit for each information block (used for the FACCH), 3 tail
bits at each end, and an 8.25-bit guard sequence, as shown in Figure 6. The 156.25 bits
are transmitted in 0.577 ms, giving a gross bit rate of 270.833 kbps.
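The 156.25-bit total can be checked directly from the field sizes listed above; the exact 270.833 kbps figure follows from the burst period of 15/26 ms (a sketch):

```python
# Field sizes of the GSM normal burst, in bits.
info = 2 * 57       # two information sequences
training = 26       # training sequence used for equalization
stealing = 2 * 1    # one stealing bit per information block
tail = 2 * 3        # tail bits at each end
guard = 8.25        # guard period, expressed in bit durations

total_bits = info + training + stealing + tail + guard
print(total_bits)                     # 156.25

burst_period_ms = 15 / 26             # exactly 120 ms / 26 / 8, about 0.577 ms
gross_kbps = total_bits / burst_period_ms
print(round(gross_kbps, 3))           # 270.833
```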
The F burst used on the FCCH and the S burst used on the SCH have the same length as
a normal burst, but a different internal structure which differentiates them from normal
bursts (thus allowing synchronization). The access burst is shorter than the normal burst
and is used only on the RACH. The dummy burst is sent from BTS on some occasions
and carries no information.
2.3.3 Frequency Hopping
The propagation conditions, and therefore the multipath fading, depend on the radio
frequency. In order to avoid significant differences in the quality of the channels, slow
frequency hopping is introduced. Slow frequency hopping changes the frequency with
every TDMA frame. Fast frequency hopping, which changes the frequency many times
per frame, is not used in GSM. Frequency hopping also reduces the effects of co-channel
interference.
There are different types of frequency hopping algorithms. The algorithm selected is
sent through the Broadcast Control Channel. Even though frequency hopping can be very
useful for the system, a base station does not necessarily have to support it. On the other
hand, a mobile station has to accept frequency hopping when a base station decides to
use it.
2.4 Source coding and channel coding
Fig. 2.7 presents the different operations that have to be performed in order to pass from
the speech source to radio waves and vice versa.
Figure 2.7 Speech signal processing
2.4.1 Speech coding
Speech coding is basically the process of speech compression using digital techniques.
In poor radio conditions, the performance of the GSM speech coder has been shown to be
superior to that of analog cellular systems [4]. The mathematical operation of the GSM speech
coder is completely standardised in every detail. The following speech coding schemes
are supported in GSM systems.
2.4.1.1 Full Rate speech Coding
The standard digital signal used in most wire telephone systems to represent an audio
channel requires 64 kbps. The standard GSM speech coder compresses this data rate to
13 kbps.
2.4.1.2 Half Rate Speech Coding
The use of higher data compression rates reduces the amount of data required per user
and this increases the number of users that can share a radio channel. The half rate coder
allows a single carrier frequency to support 16 conversations instead of the 8
conversations in the full rate case.
2.4.1.3 Multirate Speech Coding
The GSM speech coder can vary its data transmission rate depending on speech activity.
The speech coder can reduce or stop transmitting the digital voice signals when speech
activity is low. When the speech coder senses no speech activity (i.e. silence), it
digitally encodes a 20 ms window of background noise to prevent sudden disturbing
changes in perceived sound characteristics when the caller stops talking. Then it shuts
off the radio transmitter until the microphone picks up some sounds again. This process
is called Discontinuous Transmission (DTX). It allows the mobile to save battery
life and the base station to reduce co-channel interference.
2.4.1.4 Enhanced Speech Coding
This scheme uses the same bit rate as full rate speech coding, but has much better
quality, comparable to that of a standard wired telephone connection. The cost is a
much more sophisticated encoding and decoding process.
2.4.2 Channel coding
Channel coding is the process of adding extra data bits along with transmitted data bits
that can be used to determine if some or all of the bits have been successfully received
without error. Three basic types of error protection coding are used in GSM: cyclic
redundancy check (CRC), block code and convolutional code.
2.4.2.1 CRC
When a call processing message or some other selected group of data bits is to be
transmitted, the entire message group of bits is first treated as a large binary number. It is
divided, in a special way, by a pre-arranged constant, and the remainder is found. The
remainder (the CRC checksum) is appended to the data and transmitted along with it. At
the receiving end, the data is again divided in the same special way. If the remainder
computed at the receiver does not match the CRC received, then errors have occurred. In
some cases, the CRC check bits can be used to help correct, by retransmission, some bits
that were received in error.
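The "division in a special way" is polynomial division over GF(2), which a few lines of bit manipulation make concrete. This is a minimal sketch using an illustrative generator polynomial (x⁴ + x + 1), not the polynomials actually specified for GSM:

```python
def crc_remainder(data_bits: list[int], poly_bits: list[int]) -> list[int]:
    """Remainder of GF(2) polynomial long division: the CRC checksum.

    data_bits and poly_bits are MSB-first lists of 0s and 1s.
    """
    r = len(poly_bits) - 1                 # degree of the generator
    work = data_bits + [0] * r             # append r zero bits, then divide
    for i in range(len(data_bits)):
        if work[i]:                        # XOR-subtract the generator
            for j, p in enumerate(poly_bits):
                work[i + j] ^= p
    return work[-r:]                       # the remainder is the checksum

# The sender appends the remainder; the receiver divides the whole
# codeword and checks for a zero remainder.
msg = [1, 0, 1, 1, 0, 0, 1]
gen = [1, 0, 0, 1, 1]                      # illustrative: x^4 + x + 1
crc = crc_remainder(msg, gen)
assert crc_remainder(msg + crc, gen) == [0, 0, 0, 0]
```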
2.4.2.2 Block Code
The GSM system uses a particular type of block code known as a Fire code. A block
code is generated by “adding” a sum of products computed over a fixed-size block of
digits. More details on block codes can be found in Chapter 5.2.
2.4.2.3 Convolutional Code
A convolutional code is calculated by “multiplying” the input data value by a pre-arranged
constant value. At the receiving end, the received value is divided by the same pre-
arranged constant value. If the remainder is zero, it is reasonable to assume that the
data was received correctly, and the quotient is the data. If the remainder is not zero,
the error can be corrected (by adding or subtracting the remainder to or from the
quotient) in certain special cases; in other cases, where the errors are too numerous
or widespread, there is at least an awareness of the errors. Please refer to Chapter 5.3
for more details about convolutional codes.
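In practice this "multiplication" is performed by a shift register: each input bit is combined with the previous few bits under fixed generator polynomials. Below is a minimal rate-1/2 encoder sketch; the taps G0 = 1 + D³ + D⁴ and G1 = 1 + D + D³ + D⁴ are those commonly cited for GSM full-rate channel coding, though this code is an illustration rather than the standard's exact procedure:

```python
def conv_encode(bits: list[int]) -> list[int]:
    """Rate-1/2 convolutional encoder, constraint length 5.

    Taps: G0 = 1 + D^3 + D^4, G1 = 1 + D + D^3 + D^4.
    Returns two coded bits per input bit (no tail-bit flushing here).
    """
    s = [0, 0, 0, 0]                 # shift register: s[0] is the newest past bit
    out = []
    for b in bits:
        c0 = b ^ s[2] ^ s[3]         # G0 taps at D^0, D^3, D^4
        c1 = b ^ s[0] ^ s[2] ^ s[3]  # G1 taps at D^0, D^1, D^3, D^4
        out += [c0, c1]
        s = [b] + s[:3]              # shift the register
    return out

# The impulse response reproduces the generator coefficients:
# the c0 stream is 1,0,0,1,1 (1 + D^3 + D^4) and the c1 stream
# is 1,1,0,1,1 (1 + D + D^3 + D^4).
print(conv_encode([1, 0, 0, 0, 0]))  # [1, 1, 0, 1, 0, 0, 1, 1, 1, 1]
```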
2.4.3 Interleaving
Interleaving is the reordering of the data to be transmitted so that consecutive bits of
data are distributed over a larger sequence of data, to reduce the effect of burst errors.
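A simple way to achieve this reordering is a block interleaver: write the bits into a matrix row by row and read them out column by column, so that a burst of channel errors lands on widely separated positions after de-interleaving. A minimal sketch (the 4×4 size is illustrative; the interleaving actually used in GSM is more elaborate):

```python
def interleave(bits, rows, cols):
    """Write row-by-row, read column-by-column."""
    assert len(bits) == rows * cols
    return [bits[r * cols + c] for c in range(cols) for r in range(rows)]

def deinterleave(bits, rows, cols):
    # Reading columns of a rows x cols matrix equals writing rows
    # of a cols x rows one, so the inverse just swaps the dimensions.
    return interleave(bits, cols, rows)

data = list(range(16))
sent = interleave(data, 4, 4)
assert deinterleave(sent, 4, 4) == data
# A 4-bit burst in 'sent' corrupts positions that are 4 apart after
# de-interleaving, turning one burst into isolated single errors.
```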
2.4.4 Encryption
Encryption is the process of protecting voice or data information from
eavesdropping. It involves the use of a data processing algorithm (a formula or program)
with one or more secret keys (number values) that both the sender and receiver use to
encrypt and decrypt the information.
References
[1] Asha Mehrotra, “GSM System Engineering”, Artech House Publishers, 1997.
[2] Siegmund M. Redl, Matthias K. Weber and Malcolm W. Oliphant, “An Introduction
to GSM”, Artech House Publishers, 1995.
[3] Michel Mouly and Marie-B. Pautet, “The GSM System for Mobile Communications”,
Palaiseau, France: M. Mouly & M.-B. Pautet, 1992.
[4] Lawrence Harte, Richard Levine and Geoff Livingston, “GSM Superphones”,
McGraw-Hill, 1999.
[5] Gary C. Kessler, “ISDN”, Second Edition, McGraw-Hill Series on Computer
Communications, 1993.
[6] Richard J. Manterfield, “Common-channel Signalling”, Peter Peregrinus Ltd, 1991.
[7] John G. van Bosse, “Signaling in Telecommunication Networks”, John Wiley &
Sons, Inc., 1998.
[8] Andrew S. Tanenbaum, “Computer Networks”, Third Edition, Prentice Hall, 1996.
[9] Uyless D. Black, “X.25 and Related Protocols”, IEEE Computer Society Press, Los
Alamitos, California, 1991.
3 VIDEO OVER GPRS NETWORK
Building on the brief description of the GSM system in Chapter 2, in this chapter we are
able to explore how aspects of the GSM mobile telecommunications network might be
used to provide video delivery in real-time, taking into account the dynamic channel
bandwidth usage capabilities within that system, and marrying these with the variable
bit rate (VBR) characteristics of compressed video. The scope of this objective is large,
and impacts on many associated areas such as commercial provision of services, traffic
management and modelling, intelligent monitoring and control, systems integration,
picture quality and human factors issues. Here we will present only the fundamental
concepts of how the video transmission might be achieved, via careful matching of the
coding and delivery systems.
3.1 Data services in GSM networks
Four kinds of data services are supported in GSM (Global System for Mobile
communications) Phase 2+ system, as listed below:
Packet data on signalling channels service (PDS) [1].
Short message service (SMS) [2].
High speed circuit switched data services (HSCSD) [3].
General packet radio service (GPRS) [4,7].
Each of these services is described briefly in the following paragraphs, in order to
identify whether their protocols and operating characteristics support video delivery or
not.
3.1.1 PDS and SMS
The GSM standard defines PDS as follows:
PDS is a bearer service enabling circuit oriented point to point transfer in GSM
networks of very small data packets on radio interface signalling channels for
applications using short dialogues with a data throughput rate capability in the range of
600 to 9200 bps and with a duration in the range of a few seconds [1].
As an alternative service, SMS [2] provides a means to transfer short message packets
(of up to 140 octets) between a GSM mobile system and an SME (Short Message
Entity) via a SC (Service Centre), through a signalling channel (SDCCH or SACCH). In
Phase 2+ the standard enhances SMS by allowing multiple SMS packets to be
concatenated, using a flag indicating more information to follow. Obviously PDS and
SMS are not suitable for transferring video over GSM networks, due to their extremely
limited packet and message sizes, as video communication requires at least 32 kbps of
bandwidth for a QCIF format video sequence; therefore they are not considered any further here.
3.1.2 HSCSD
HSCSD is a feature enabling the co-allocation of multiple (up to 8) Full Rate Traffic
Channels (TCH/F) into a multi-slot configuration, consisting of one or several full rate
traffic channels intended expressly for data transmission [3,5].
Although a TCH (Traffic Channel) is optimised to be able to carry 13 kbps speech
information, for data transmission the data rate is adapted to the standard V.32 bit rate
of 9.6 kbps. In implementing HSCSD, a higher air interface user rate of 14.4 kbps per
TCH is supported, so the basic GSM circuit data service is extended to higher speed (up
to 115 kbps). This data rate is sufficient to support real-time compressed video
transmission applications like videophone or videoconferencing.
Both transparent and non-transparent HSCSD connections are supported, with
symmetric and asymmetric configurations. In an asymmetric configuration, the network
gives priority to fulfilling the air interface user rate requirement in the downlink
direction. For a non-transparent HSCSD connection the network can use dynamic
allocation of resources (i.e. TCH/F), as long as the configuration does not contradict
the limiting values defined by the Mobile System, and the actual mobile equipment
is capable of handling the allocated channel configuration.
For a transparent HSCSD connection, dynamic resource allocation is applicable,
provided the air interface user rate is kept constant. The change of channel
configuration within the limits of minimum and maximum channel requirements is done
with resource upgrading and resource downgrading procedures during the call. The
Mobile System may request a service level up- or downgrading during the call,
negotiated at the beginning of the call. This modification of channel requirements
and/or desired air interface user rate is applicable to non-transparent HSCSD
connections only.
3.1.3 GPRS
For bursty data communication applications, circuit allocation is a wasteful use of the
radio link. As an alternative, GPRS (General Packet Radio Service) [4,7,8] optimises
the use of network and radio resources by using a packet-mode technique to transfer
high-speed and low-speed data and signalling in an efficient manner. The highest
supported bit rate in GPRS is 170 kbps, which lays the foundation to support
videophone or videoconferencing applications (e.g. based on H.261, H.263 and MPEG-
4: see Chapter 4.3). In a GPRS network two types of services are supported:
Point-to-point (PTP).
Point-to-multipoint (PTM).
Based on the existing GSM network, this enhancement introduces two new network
nodes in the GSM PLMN: the Serving GPRS Support Node (SGSN) and the Gateway
GSN (GGSN). The SGSN, being at the same hierarchical level as the MSC and
connected to the base station system with Frame Relay, keeps track of the individual
Mobile System location and performs security functions and access control while
GGSN provides interworking with external packet-switched networks, and is connected
with SGSNs via an IP-based GPRS backbone network. In addition, the HLR (Home
Location Register) is enhanced with GPRS subscriber data and routing information.
The GPRS air interface protocol is concerned with communications between the Mobile
System and BSS at the physical, MAC (Medium Access Control) and RLC (Radio Link
Control) protocol layers. The RLC/MAC sublayers allow efficient multiuser
multiplexing on the shared packet data channels and utilise a selective ARQ protocol for
reliable transmissions across the air interface.
The MAC layer, derived from a slotted ALOHA protocol [9,10], is responsible for
the access signalling procedures for the radio channel, governing the attempts by Mobile
Systems to access the channel and the control of this access by the network. It follows
that whether the network is able to accommodate a variety of service types, including
speech, data and video, depends mainly on the MAC.
3.2 Possibilities for video over GSM networks
3.2.1 Video over HSCSD
The following discussion requires some knowledge of the video coding algorithms,
further details of which can be found in Chapter 4. From the above description, we can
see that HSCSD is undoubtedly a significant enhancement of air interface user rates and can
achieve much higher data transmission speeds for ftp or constant bit rate video
applications. HSCSD thus offers the potential to transmit H.263 video
over a HSCSD connection. However, for live video delivery it has several limitations.
First, the current world video coding standards produce variable bit rate data streams.
This results in very poor utilisation of radio channels while being transmitted using
HSCSD, as the network has to allocate radio channels according to the highest bit rate
in the entire session, in order to guarantee the required QoS. Though the network can
use dynamic allocation of resources, this is only applicable to non-transparent HSCSD.
It is preferable to use transparent HSCSD, as a non-transparent connection will
introduce delay and jitter which is unacceptable for real-time video applications. For a
transparent connection, the dynamic resource allocation is possible only if the air
interface user rate is kept constant, which is meaningless for VBR video applications.
Second, though it is possible to use 8 TCH/F channels on the radio interface, the end-to-
end communication is limited to 64 kbps on the A interface (between the base station
controller and the mobile services switching centre). The highest bit rate of an I-frame
(coded independently of adjacent frames) can be much higher than this. Moreover,
even if it is possible to allocate radio channels dynamically, it is not so easy to do the
same thing in the GSM backbone network, as the current GSM system is based on
circuit-switched technology.
3.2.2 Video over GPRS
On the other hand, because of its packet switched nature, GPRS will give more
flexibility and efficiency than HSCSD for the following reasons:
• Packet video has become the main trend for new uses of video communications, such
as Internet access, i.e. video communication through the Internet.
• All the worldwide video coding standards are inherently suited to packet structure as
they delimit sections of the compressed data stream according to parts of a frame or
sequence of frames, such as groups of blocks.
• It is more flexible to allocate video channels per application dynamically, based on
video content, to improve the video channel's utilization. The GPRS backbone
networks are based on Internet Protocol (IP), in which extensive research activities
have been carried out to support diverse traffic transmission, including the proposal
of a new version IPv6.
• GPRS has strong potential to integrate different traffic types, including speech, data
and video, into one network.
The bottleneck of video over GPRS lies in the MAC (Medium Access Control)
protocol, since it has been designed mainly for data (non real-time) applications in the
current GPRS system (though it can support speech communication quite well). The
MAC is used to share the radio channels among mobile stations in the cell and to
allocate the physical radio channel for a mobile station (MS) when needed for
transmission or reception.
An MS initiates a packet transfer by making a Packet Channel Request on the Packet
Random Access Channel (PRACH) on a contention basis with other MSs. If the
contention is successful, the network responds on Packet Access Grant Channel
(PAGCH). It is possible to use either one- or two-phase packet access methods.
In one-phase access, the Packet Channel Request message contains all the information
needed for establishment of the channel including multislot related information and
quality of the requested service. As the response, a Packet Immediate Assignment
reserving the resources on Packet Data Channels (PDCH) for uplink transfer of user
information is sent to the MS. The MS then starts sending information to BTS for
transmission.
In two-phase access, the Packet Channel Request is responded to with a Packet Uplink
Assignment to reserve the uplink resources for transmitting the Packet Resource
Request, which carries the complete description of the requested resources for the
uplink transfer. A two-phase access can be initiated by either the network or a MS. The
network can order the MS to send a Packet Resource Request message by setting a
parameter in a Packet Uplink Assignment message. A mobile station can request two-
phase access in a Packet Channel Request message. In this case, the network may order
the MS to send a Packet Resource Request or continue with one-phase access
procedure.
From the description above it is clear that the bandwidth assigned to one MS can be varied
dynamically. This works well for constant bit rate transmission, or for variable-rate non-real-time
applications, but is not effective for variable bit rate real-time video applications.
During transmission of live video, the variable bit rate requires the
dynamic allocation of bandwidth with acceptable delay, and every reallocation of the
radio channel requires access to the PRACH. However, the contention mechanism of
PRACH access does not guarantee the delay requirement, which is crucial in real-time
video applications.
One possible solution is to reconfigure the multislot allocation to include more than the
current set of active channels by means of communication between the MS and BSS, rather
than by re-accessing the PRACH during the real-time transmission, which
would involve further contention. Though the compressed video has a variable bitrate, the
temporal frame structure (i.e. the appearance of I, P or B pictures: see Chapter 4.3) is
periodic, so the arrival of an I picture or P picture for transmission can be anticipated.
Therefore the allocation of multislot channels according to the picture type can be
realised. Moreover, this scheme requires classes of different video types to be defined,
based on statistical modelling of the video sources, such that every class corresponds to a
certain bitrate level, so that the bitrate for an I, P or B picture can be
estimated for Packet Channel Request purposes.
3.2.3 Dynamic channel allocation
As described in the last section, the key issue in delivering live video over a GPRS network
lies in the capability of the network to allocate packet data channels to the MS (Mobile
Station) dynamically. This section provides more detailed discussion on dynamic
channel allocation schemes. In the current standard, three medium access modes are
supported, namely Dynamic Allocation, Extended Dynamic Allocation and Fixed
Allocation.
In Dynamic Allocation, the Packet Uplink Assignment message includes the list of
PDCHs (Packet Data Channels) and the corresponding USF (Uplink State Flag) value
per PDCH. A unique TFI (Temporary Flow Identity) is allocated and is thereafter
included in each RLC (Radio Link Control) data and control block related to that TBF
(Temporary Block Flow). The MS monitors the USFs on the allocated PDCHs and
transmits radio blocks on those that currently bear the USF value reserved for the
usage of the MS.
The Extended Dynamic Allocation medium access method extends Dynamic Allocation
to allow higher uplink throughput. In Extended Dynamic Allocation, the MS monitors
its assigned PDCHs starting with the lowest numbered PDCH, then the next lowest
numbered PDCH, and so on. Whenever the MS detects its assigned USF value on an assigned
PDCH, in the next block period it transmits an RLC/MAC (Medium Access
Control) block on the same PDCH and on all higher numbered assigned PDCHs, without
looking for the assigned USF on the higher numbered PDCHs. If the number of PDCHs
allocated to a MS per block period is reduced, the network does not allocate any
resources to the MS for one block period following the block period with the higher
number of PDCHs allocated.
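The uplink-permission rule just described can be captured in a few lines. This is an illustrative sketch of the decision only (the names are mine, and the real procedure in the GPRS RLC/MAC specification carries considerably more state):

```python
def allowed_pdchs(assigned, usf_per_pdch, my_usf):
    """Extended Dynamic Allocation: scan the assigned PDCHs in increasing
    order; on the first one bearing our USF, transmit on it and on all
    higher-numbered assigned PDCHs in the next block period.

    assigned      -- sorted list of PDCH numbers assigned to this MS
    usf_per_pdch  -- USF value currently broadcast on each PDCH
    my_usf        -- the USF value reserved for this MS
    """
    for i, pdch in enumerate(assigned):
        if usf_per_pdch[pdch] == my_usf:
            return assigned[i:]       # this PDCH and all higher ones
    return []                         # no permission this block period

usf = {1: 3, 2: 7, 3: 1, 4: 5}
print(allowed_pdchs([1, 2, 3, 4], usf, my_usf=7))  # [2, 3, 4]
```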
In Dynamic and Extended Dynamic Allocation, the MS may be allowed to use the
uplink resources as long as there is queued data on the RLC/MAC layer to be sent from
the MS. This can comprise a number of LLC (Logical Link Control) frames, in the sense
that the radio resources are assigned initially on an “unlimited” time basis.
Alternatively, the uplink assignment for each MS may be limited to a number of radio
blocks, in order to offer fairer access to the medium at higher loads.
Fixed Allocation uses the Packet Uplink Assignment message to communicate a
detailed fixed uplink resource allocation to the MS. The fixed allocation consists of a
start frame, slot assignment, and block assignment bitmap representing the assigned
blocks per timeslot. The MS waits until the start frame is indicated and then transmits
radio blocks on those blocks indicated in the block assignment bitmap. The fixed
allocation does not include the USF and the MS is free to transmit on the uplink without
monitoring the downlink for the USF. If the current allocation is not sufficient, the MS
may request additional resources in one of the assigned uplink blocks. A unique TFI is
allocated and is thereafter included in each RLC data and control block related to that
TBF. Because each Radio Block includes an identifier (TFI), all received Radio Blocks
are correctly associated with a particular LLC frame and a particular MS.
Fixed Allocation is good for bursty applications, but does not provide enough blocks in
advance for longer-duration live video applications; therefore it will not be considered any
further in this project. Extended Dynamic Allocation is more suitable than Dynamic
Allocation for live video delivery because of its flexibility and because the MS does not
have to monitor every PDCH for its use, and can therefore save some channel bit rate
capacity. For live video applications, the decision as to when to allocate more channels
needs to be made either by the base station, with some kind of memorization mechanism,
or by coordination between the network and MS, based on the radio environment and
traffic circumstances.
If I-frame refreshment is not used for error control, the network will need this
memorization mechanism. For instance, if the application transmits one I-frame every
30 frames, then after 29 frames the network needs to allocate more channels for the
next I-frame automatically. In the case that the arrival of I-frames is not totally periodic,
a request for additional channels for unexpected I-frames needs to be sent in a radio data
block by the MS, so that the network can allocate more channels for the I-frame
transmission.
Another modification to the current standard possibly needed for video applications is
that the MS should not need to monitor the USF during an I-frame transmission. This is a
reasonable modification because a single application is likely to occupy all assigned
PDCHs during I-frame transmission, given its heavy demand for packet data channels.
It should be noted that for PB-frame transmission the use of the USF remains
necessary, because a PDCH needs to be shared by several MSs.
In realizing this scheme, an effective CAC (Call Admission Control) algorithm needs to
be designed. This is a complex matter requiring detailed analysis and will not be
addressed in this work.
3.2.4 Example
The coding of QCIF-4:2:0 Miss America at a frame rate of 10 fps, based on H.263 with
the Advanced Prediction and PB-frame modes enabled, produces an I-frame bit rate of
90.2 kbps and a smoothed PB-frame bit rate of 1.3 kbps. Thus, to transmit this video,
7 channels for I-frames and about 1/8 of a channel for PB-frames need to be allocated.
If another separate video stream starts exactly one frame period after the start of the
previous one, theoretically 8 such streams can be supported simultaneously on one carrier.
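The channel counts in this example follow directly from the per-PDCH data rate of 13.3 kbps stated later in this section. A minimal sketch of the arithmetic, assuming 8 PDCHs per carrier (the constant and function names are illustrative):

```python
import math

# Assumed figures from the example: 13.3 kbps per packet data channel
# (PDCH) and 8 timeslots (PDCHs) per carrier.
CHANNEL_RATE_KBPS = 13.3
SLOTS_PER_CARRIER = 8

def channels_needed(bitrate_kbps):
    """Fractional number of PDCHs required to carry a given bit rate."""
    return bitrate_kbps / CHANNEL_RATE_KBPS

# I-frames at 90.2 kbps need 7 whole channels; PB-frames at 1.3 kbps
# need only about 1/8 of a channel.
i_channels = math.ceil(channels_needed(90.2))
pb_share = channels_needed(1.3)
```

Under these figures, one carrier's 8 slots can absorb the 7-channel I-frame burst of one stream plus the small PB-frame shares of the others, which is what makes the 8-stream staggering argument work.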
Based on this simple example, we can consider some variations that demand more
complex allocation and control of channels during transmission, such as those caused by
variations in the I-frame and PB-frame bit rates and their mixture.
If the bit rate of an I-frame is more than one carrier can cater for in the limited time, part
of it can be transmitted in the following PB-frame period, causing a
corresponding delay of two frame periods. This delay (about 200 ms if the frame
rate is 10 Hz) is within the usual limits of acceptable tolerance. However, this will result
in lower utilisation of the radio channels and make any consequential dynamic channel
allocation scheme much more complicated.
Another option to address this problem is to shape the bit rate into the range of one
carrier frame by adjusting the quantisation steps, at the expense of reconstructed visual
quality. For example, if the I-frame bit rate in the above case were 1.5 times the carrier
bit rate, the encoder would multiply the default quantisation matrix values by a known
constant chosen to bring the bit rate below 1.0 times the carrier rate, i.e. within
the capacity of the carrier.
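A minimal sketch of this rate-shaping step, under the crude assumption (not from the text) that the I-frame bit rate is roughly inversely proportional to the quantiser scale; `scale_quantiser` and the `headroom` parameter are illustrative names, not part of any encoder API:

```python
# Crude rate model (an assumption, not an encoder's actual behaviour):
# bit rate is roughly inversely proportional to the quantiser step size,
# so scaling the quantisation matrix up by k divides the rate by about k.
def scale_quantiser(rate_kbps, carrier_kbps, q_matrix, headroom=0.9):
    """Scale quantisation matrix values so the modelled rate fits the carrier."""
    if rate_kbps <= carrier_kbps:
        return q_matrix                      # already fits, leave untouched
    factor = rate_kbps / (carrier_kbps * headroom)
    return [[int(round(q * factor)) for q in row] for row in q_matrix]

# Example from the text: an I-frame rate at 1.5x the carrier rate needs
# the matrix scaled by about 1.67 (with 10% headroom) to fall under 1.0x.
default_q = [[16] * 8 for _ in range(8)]
scaled_q = scale_quantiser(150.0, 100.0, default_q)
```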
In the above examples, it has been assumed that a data channel rate of 13.3
kbps is maintained, and that a single carrier is the maximum bandwidth allowed
for the connection. These values could also change, either statically or even
during the progress of the call, adding further complexity to the control algorithm.
3.2.5 EDGE
Before concluding this chapter, it is necessary to give a brief introduction to EDGE
(Enhanced Data Rate for GSM and TDMA/136 Evolution) [11]. EDGE is an
enhancement to GSM that aims to increase data rates to over 384 kbps. This rate
increase is achieved by introducing a higher-level modulation format, namely 8-phase
shift keying (8-PSK), which transmits 3 bits per symbol, instead of the current GSM
modulation, Gaussian minimum shift keying (GMSK), which
transmits 1 bit per symbol [12].
The benefit is that the overall available channel capacity is increased, which gives more
potential for video communication over mobile networks. The penalty incurred by a
higher modulation format is an increase in the frame error rate (FER) at the physical
layer, especially at low SNR (signal to noise ratio) or C/I (carrier to interference ratio).
The FER may be reduced to acceptable levels by employing an FEC code. Residual
frame errors are corrected at the link layer using a selective automatic repeat request
scheme. Because live video transmission demands minimal delay, all
these aspects imply a strong demand for error resilience features in the video bitstream.
3.3 Conclusion
Although GPRS lays the foundation for real-time video applications, before such
applications can be put into practice, more work needs to be done on optimising the
utilisation of shared scarce radio channels with guaranteed bandwidth for I picture
transmission. The fundamental issue is the compromise that is needed between wide
variations in bitrate needed to cater for all picture types, modelled by an unbounded
VBR scheme, and the inflexibility imposed by the network in allowing only quantum
channel allocation, modelled by step-function bit-rate performance variations or
variable constant-bit-rate (VCBR). The resolution of this dilemma relies on better
system integration and interoperation between the network behaviour and the video
coding process, by extracting useful bit-rate information over many successive frames
and exerting careful, intelligent control throughout the transmission. Realising this
possibility requires further layers of complexity beyond those that exist at present, to
create a more compliant and flexible protocol that matches system capabilities to these
demanding user needs.
References
[1] GSM 03.63, "Packet Data on Signalling Channels Service (PDS), Service
Description, Stage 2".
[2] GSM 03.40, "Technical Realization of the Short Message Service (SMS); Point-to-
Point (PP)".
[3] GSM 03.34, "High Speed Circuit Switched Data (HSCSD) – Stage 2".
[4] GSM 03.60, "General Packet Radio Service (GPRS), Service Description, Stage 2",
1997.
[5] J. Hamalainen, "High Speed Circuit Switched Data", in Z. Zvonar, P. Jung and
K. Kammerlander (eds.), GSM Evolution Towards 3rd Generation Systems, Kluwer
Academic Publishers, 1999, pp. 81-91.
[6] J. Hamalainen, "General Packet Radio Service", in Z. Zvonar, P. Jung and
K. Kammerlander (eds.), GSM Evolution Towards 3rd Generation Systems, Kluwer
Academic Publishers, 1999, pp. 65-80.
[7] J. Cai and D. J. Goodman, "General Packet Radio Service in GSM", IEEE
Communications Magazine, October 1997, pp. 122-131.
[8] G. Brasche and B. Walke, "Concepts, Services, and Protocols of the New GSM
Phase 2+ General Packet Radio Service", IEEE Communications Magazine, August
1997, pp. 94-104.
[9] D. J. Goodman, R. A. Valenzuela, K. T. Gayliard and B. Ramamurthi, "Packet
Reservation Multiple Access for Local Wireless Communications", IEEE Transactions
on Communications, vol. 37, no. 8, August 1989, pp. 886-890.
[10] S. Nanda, D. J. Goodman and U. Timor, "Performance of PRMA: A Packet Voice
Protocol for Cellular Systems", IEEE Transactions on Vehicular Technology, vol. 40,
no. 3, August 1991, pp. 584-598.
[11] R. van Nobelen, N. Seshadri, J. Whitehead and S. Timiri, "An Adaptive Radio
Link Protocol with Enhanced Data Rates for GSM Evolution", IEEE Personal
Communications, February 1999, pp. 54-63.
[12] D. J. Goodman, Wireless Personal Communications Systems, Addison-Wesley,
1997.
4 OVERVIEW OF VIDEO CODING TECHNIQUES AND THE CURRENT VIDEO CODING STANDARDS
The video coding techniques reviewed here mainly address low bit-rate video coding,
because the intended video applications target mobile situations. By low
bit-rate we mean a bitstream suitable for transmission over mobile channels, which
usually means below 64 kbit/s. State-of-the-art very low bit-rate video coding techniques
can be divided into waveform-based coding and model-based coding. A detailed
review of these techniques is given in the following sections.
4.1 Waveform based video coding
In waveform-based coding, image sequences are treated as a 3-D signal waveform
whose inherent statistical or deterministic properties are exploited, and compression is
performed directly on a two-dimensional, discrete distribution of light intensities. A basic
problem in waveform-based compression is to achieve the minimum possible waveform
distortion for a given encoding rate or, equivalently, to achieve a given acceptable level
of waveform distortion with the least possible encoding rate. Most image and video
coding techniques, including transform coding, subband/wavelet coding [1], VQ coding
[3] and fractal coding [2], can be classified into this group. Experience with video
coding at low bit rates shows that motion estimation/compensation operations to exploit
temporal redundancy, together with some kind of transformation to exploit spatial
redundancy, are necessary for an efficient very low bit-rate video coding scheme. The
reasons for this are quite simple. Most image sequences exhibit very strong spatial and
temporal correlation, or redundancy. The spatial redundancy can be reduced by
exploiting the spatial correlation through transformations, so that compression is
realised in the spatial domain. By exploiting the temporal correlation through
inter-frame prediction, using motion estimation and motion compensation techniques,
compression can be achieved in the temporal domain.
4.1.1 Motion estimation
Motion compensation refers to the use of motion displacements in the coding and
decoding of the sequence. In the encoder the difference between source picture and
prediction is coded; in the decoder this difference is decoded and added to the
prediction to get the decoded output. Both encoder and decoder use the same motion
displacements in determining where to obtain the prediction. However the encoder
must estimate the displacements before encoding them in the bitstream; the decoder
merely decodes them. The process of determining the motion displacements,
represented by motion vectors, is called motion estimation. Motion estimation
techniques can be loosely divided into three main groups: optical flow techniques,
block matching techniques and pel-recursive techniques.
4.1.1.1 Optical flow techniques
The optical flow techniques rely on the hypothesis that the image luminance is invariant
along motion trajectories and the direct result from this hypothesis is the optical flow
constraint equation, or spatio-temporal constraint equation, given below:

∇I(r, t) · v + ∂I(r, t)/∂t = 0

where I(r, t) denotes the continuous space-time intensity distribution, v = dr/dt is the
velocity, and ∇I(r, t) = [∂I(r, t)/∂x, ∂I(r, t)/∂y]^T is the spatial intensity gradient.
As the image intensity change at a point due to motion gives only one constraint, while
the motion vector at the same point has two components, the motion field cannot be
computed without an additional constraint. Various second constraints have been
introduced to solve this problem. Among them, Horn and Schunck introduced a
smoothness constraint [4], which minimizes the square of the magnitude of the gradient
of the optical flow velocity. This is based on the assumption that video contains only
opaque objects of finite size, commonly undergoing rigid motion, which means that
neighboring points on the objects have similar velocities and the velocity field of the
brightness patterns in the image varies smoothly almost everywhere. This approach
results in a dense motion field. In video compression applications these techniques
suffer from two serious drawbacks. First, direct adoption of the dense motion field
would result in an immense bit rate for motion information. Second, the smoothness
constraint is not very realistic in many situations, especially at moving object
boundaries.
4.1.1.2 Block matching techniques
In block matching techniques, the image is partitioned into rectangular blocks and the
same motion vector is assigned to all pixels within the block [5]. The motion vector is
obtained by minimising the disparity measure between the block in the current frame
and the block in the reference frame. Obviously the inherent motion model is quite
restrictive, as it assumes the image is composed of rigid objects in translational motion.
The direct results of this restriction are motion fields that are unreliable with respect to
the true motion in the scene, block artifacts, and poor motion-compensated prediction
along moving edges. However, because of their ease of implementation and the small
overhead for motion information in video coding, rectangular block matching techniques
have been widely used and adopted in the current video standards, including H.261
[6], H.263 [56], MPEG-1 [7], MPEG-2 [8] and MPEG-4 [57]. Recently, more accurate
and complicated motion models and motion estimation techniques based on spatial
transformations, such as triangular meshes [9] and quadrilateral meshes [10], have been
proposed, but their computational complexity and higher overhead for motion
information have so far prevented wide acceptance.
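As a concrete illustration of rectangular block matching, the following sketch performs an exhaustive full search minimising the sum of absolute differences (SAD); the default block size and search range are illustrative choices, not taken from any standard:

```python
# A hedged sketch of exhaustive (full-search) block matching using the sum
# of absolute differences (SAD). Frames are lists of rows of luminance
# values; the block size and search range are illustrative defaults.
def block_match(cur, ref, bx, by, block=16, search=7):
    """Find the motion vector (dx, dy) minimising the SAD between the block
    at (bx, by) in the current frame and candidates in the reference frame."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            x, y = bx + dx, by + dy
            if x < 0 or y < 0 or x + block > w or y + block > h:
                continue  # candidate block falls outside the reference frame
            sad = sum(abs(cur[by + j][bx + i] - ref[y + j][x + i])
                      for j in range(block) for i in range(block))
            if best is None or sad < best[0]:
                best = (sad, (dx, dy))
    return best[1], best[0]
```

The cost of the full search grows with the square of the search range, which is why practical encoders use fast suboptimal searches; the principle of minimising a disparity measure is the same.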
4.1.1.3 Pel-recursive techniques
Pel-recursive techniques recursively minimize the prediction error and are carried out
on a pixel-by-pixel basis, leading to a dense motion vector field [11]. Due to the
increased computational complexity at the decoder and other inherent drawbacks, this
technique is not commonly used in video compression.
4.1.2 Transforms
Transforms represent video in a different domain, for example the frequency domain.
They reduce the number of variables, or coefficients, needed to represent the video,
and in this way compression is realised. Among transform methods for exploiting
spatial redundancy, the discrete cosine transform (DCT) [12] has been the most
successful so far, and it has been incorporated into all the image and video coding
standards.
In all image and video coding standards, the DCT is applied to blocks of 8 x 8 samples,
as defined below:

F(µ, ν) = (C(µ)/2)(C(ν)/2) Σ_{x=0..7} Σ_{y=0..7} f(x, y) cos[(2x+1)µπ/16] cos[(2y+1)νπ/16]

where µ and ν are the horizontal and vertical frequency indices, respectively, and the
constants C(µ) and C(ν) are given by:

C(µ) = 1/√2 if µ = 0
C(µ) = 1 if µ > 0

The original samples can be recreated by the Inverse DCT (IDCT), defined as:

f(x, y) = Σ_{µ=0..7} Σ_{ν=0..7} (C(µ)/2)(C(ν)/2) F(µ, ν) cos[(2x+1)µπ/16] cos[(2y+1)νπ/16]
Though research on other kinds of transforms, such as wavelet/subband and fractal
transforms, has been very active, no result has been reported that consistently and
universally beats the DCT in overall video coding performance when combined with
motion estimation.
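The 8x8 DCT and IDCT defined above can be transcribed directly. The following is a literal, unoptimised sketch of those two formulas; real codecs use fast factorisations rather than the quadruple sum:

```python
import math

def C(k):
    """Normalisation constant from the DCT definition."""
    return 1 / math.sqrt(2) if k == 0 else 1.0

def dct2(f):
    """Forward 8x8 DCT, term for term as in the formula above."""
    return [[(C(u) / 2) * (C(v) / 2) * sum(
                f[x][y] * math.cos((2 * x + 1) * u * math.pi / 16)
                        * math.cos((2 * y + 1) * v * math.pi / 16)
                for x in range(8) for y in range(8))
             for v in range(8)] for u in range(8)]

def idct2(F):
    """Inverse 8x8 DCT, recovering the original samples."""
    return [[sum((C(u) / 2) * (C(v) / 2) * F[u][v]
                 * math.cos((2 * x + 1) * u * math.pi / 16)
                 * math.cos((2 * y + 1) * v * math.pi / 16)
                 for u in range(8) for v in range(8))
             for y in range(8)] for x in range(8)]
```

Applying `idct2(dct2(f))` reproduces the input block to within floating-point rounding, confirming that the transform pair is lossless before quantisation; it is the quantisation step that discards information.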
4.2 Model based video coding
In model-based video coding, models of some kind are used to exploit special
features of the video. Model-based video coding can be classified into two categories:
3D model-based coding and 2D model-based coding. 3D model-based coding uses
models of the real-world objects that appear in the video, while 2D model-based coding
uses two-dimensional motion models of the video sequences. A detailed description of
these two coding schemes follows.
4.2.1 3D model coding
In 3D model-based coding, often referred to as 3D knowledge-based or 3D object-based
coding in literature, both the encoder and decoder contain a 3D model of the object to
be coded, based on a priori knowledge of the object [13,14,15]. The model can be
downloaded to the decoder at the beginning of the transmission session. At the
transmitting side, the images are analysed; this includes scaling of the 3D wireframe
model, global and local motion estimation, and extraction of the surface color and
texture. As the image object (e.g. the head of a person) moves, the motion parameters
defining the coordinates of the wireframe model (the global motion of the head and the
local motion due to facial expressions) and the texture information are updated and
transmitted. At the receiving side, the image is synthesised using these estimated
motion parameters.
3D model-based coding opens up the possibility of image coding at extremely low
bitrates, but several problems need to be solved before it can be applied to more general
situations. First, modelling objects is one of the most important issues in 3D model-based
coding. So far no successful results have been reported for this method except in the
specialized case where the input always consists of a moving head and shoulders;
dealing with unknown objects remains an extremely difficult problem. The second
problem is the presence of analysis and synthesis errors, due to mismatch of the
wireframe, inaccurate motion estimation and rapidly changing texture information,
which can cause serious artifacts in the decoded images. Consequently, some authors
suggest 2D deformable mesh and triangular models, as described below.
4.2.2 2D model coding
In 2D model-based coding [16], the following steps are implemented,
• Segment the image into semantically meaningful regions that should coincide
with real objects, to guarantee that the modeling and description of the motion will
be efficient.
• Build a mesh model, which can be triangular or quadrilateral, for each object.
Estimate the motion vectors at the vertices.
• Determine the transformation mapping parameters for each mesh element given the
displacement vectors at its vertices. Synthesize the present frame by mapping the
intensity or color information from the previous reconstructed frame onto the
corresponding patches in the present frame. Compute the synthesis error.
• Encode both the motion vectors at the vertices and the synthesis error.
The key technique in 2D model-based coding is thus mesh-based motion estimation,
which overcomes the intrinsic artifact problem of translational block-based motion
estimation [17,18]. Nevertheless, it has several drawbacks that need to be
addressed before it can be applied more generically. First, the occlusion problem
has not been solved. Second, segmenting the image intelligently into semantically
meaningful regions cannot be done automatically, as segmentation itself is an ill-
posed problem.
Segmentation-based video coding is in fact one of the key techniques in so-called
"second generation" video coding schemes [19]. The whole MPEG-4 philosophy is
based on the assumption that the content of the images can be segmented into
meaningful objects. However, until the segmentation problem can be solved
automatically with reasonable results, the utilization of MPEG-4 will have to rely on
manual intervention for segmentation.
4.3 Current Video Standards
So far, five international standards for video coding have been created. H.261 addresses
videophone and videoconference applications at bit rates that are multiples of 64 kbps.
H.263 is intended for similar applications to H.261, but at bit rates below 64
kbps. MPEG-1 aims at digital storage media applications at up to about 1.5 Mbps.
MPEG-2 targets broadcast television at bit rates of 3-30 Mbps, while MPEG-4 targets
multimedia applications at 5 kbps to 4 Mbps. Among these standards the smallest video
format is sub-QCIF, supported by H.263, with 96 lines and 128 pixels per line.
4.3.1 Core video coding techniques in the current video coding standards
All the above-mentioned video coding standards support encoding methods that
exploit both the spatial and temporal redundancies inherent in a video
sequence. Spatial redundancies are exploited by using block-based Discrete Cosine
Transform (DCT) coding of 8 by 8 pixel blocks, followed by quantization, zigzag
scanning, and variable length coding of the runs of zero quantized indices and the
amplitudes of the non-zero indices. Temporal redundancies are exploited by using
motion compensation, in which the difference between the current frame and its
prediction from the reference frame is coded using the DCT scheme.
[Figure 4.1 shows a block diagram of DCT-based video coding: the input passes
through an inter/intra mode switch to the DCT, quantiser, VLC and buffer, with a
feedback path through the inverse quantiser and inverse DCT, frame store and motion
estimator.]
Figure 4.1 DCT based video coding
As shown in Figure 4.1, each video frame is divided into blocks of a fixed size and each
block is more or less processed independently. A block is first predicted from a
matching block in a previously coded reference frame through motion estimation. The
prediction error block is spatially de-correlated, by converting it into the frequency
domain using the discrete cosine transform (DCT); further compression is realized by
quantizing the resulting coefficients and converting them into binary code words using
variable length code (VLC).
After the DCT and quantization are applied, coefficients representing high
spatial frequencies are often zero, whereas low-frequency coefficients are often
nonzero. To exploit this behavior, the coefficients are arranged approximately from low
to high spatial frequency, following the zigzag scan order shown in Figure 4.2.
0  1  5  6  14 15 27 28
2  4  7  13 16 26 29 42
3  8  12 17 25 30 41 43
9  11 18 24 31 40 44 53
10 19 23 32 39 45 52 54
20 22 33 38 46 51 55 60
21 34 37 47 50 56 59 61
35 36 48 49 57 58 62 63
Figure 4.2 Zigzag scan of DCT coefficients
Each nonzero AC coefficient is coded using a run-level symbol structure, where each
symbol is encoded using a variable length Huffman code. Run refers to the number of
zero coefficients before the next nonzero coefficient; level refers to the amplitude of the
nonzero coefficient. Variable length Huffman coding is also applied to the coding of
motion vectors.
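The zigzag scan of Figure 4.2 and the run-level symbol formation can be sketched as follows. This is a simplified illustration that ignores the end-of-block code and the actual Huffman tables:

```python
def zigzag_order(n=8):
    """Generate (row, col) positions in zigzag scan order for an n x n block,
    matching the index pattern of Figure 4.2 for n = 8."""
    order = []
    for s in range(2 * n - 1):
        diag = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        # Odd anti-diagonals run down-left, even ones up-right.
        order.extend(diag if s % 2 else diag[::-1])
    return order

def run_level(scanned):
    """Form (run, level) symbols from zigzag-ordered quantised coefficients,
    skipping the DC coefficient at index 0 (coded separately)."""
    symbols, run = [], 0
    for c in scanned[1:]:
        if c == 0:
            run += 1          # count zeros preceding the next nonzero value
        else:
            symbols.append((run, c))
            run = 0
    return symbols
```

For a block whose only nonzero values are DC = 50, coefficient (0,1) = 3 and coefficient (1,1) = -2, the scan yields the symbols (0, 3) and (2, -2): the second symbol records the two zeros at scan positions (1,0) and (2,0) that precede the -2.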
The above discussion assumes that temporal prediction is successful, in that the
prediction error block requires fewer bits to code than the original image block. This
represents the P-mode of coding. When this is not the case, the original block is
coded directly using the DCT and run-length coding. This is known as intra or I-mode.
Instead of using a single reference frame for prediction, bi-directional prediction can be
used, which finds two best matching blocks, one in a previous frame and another in a
following frame, and uses a weighted average of the two matches as the prediction for
the current block. In this case, two MVs (motion vectors) are associated with each
block. This is known as B-mode. Both P-mode and B-mode are generally referred to as
inter-mode. The mode information, the MVs and other side information regarding
picture format, block location, etc. are also coded using VLC.
In practice, the block size for motion estimation may not be the same as that used for
transform coding. Typically, motion estimation is done on a larger block known as
macroblock (MB), which is subdivided into several blocks. In the current standards, the
MB size is 16x16 pixels and the block size is 8x8 pixels. The coding mode is decided at
MB level. Because MVs of adjacent MBs are usually similar, the MV of a current MB
is predictively coded, using the MV of the previous MB for prediction. Similarly, the
DC coefficient of a block is predictively coded, with respect to the DC value of the
previous block.
The encoded video bitstream can include I frames (Intra coded), P frames (Predictive
coded) and B frames (Bi-directionally predictive coded). An I frame is coded entirely
in Intra-mode. A P frame is coded using motion compensated prediction from a past
reference frame. Depending on the prediction accuracy, an MB in a P frame can be
coded in either Intra-mode or P-mode. A B frame is coded using motion compensated
prediction from a past and a future reference frame. An MB in a B frame can be coded
in I-, P- or B-mode. I frames and P frames can be used as references by other pictures,
while B frames are never used to predict another picture. From an error resilience
point of view, the I frame is the most robust, as it does not reference other pictures and
thus stops error propagation, but it is the least efficient coding mode as it produces a
huge number of bits. P frames and B frames achieve high coding efficiency, with B
frames the most efficient, but they are vulnerable because they need other pictures as
references. If an error occurs in a reference picture, its effect will propagate to the
current picture and to all subsequent pictures that take the erroneous picture as a
reference.
The use of variable length codes improves the coding efficiency; however, its main
disadvantage is that it introduces vulnerability into the encoded video bitstream.
When an error occurs in the bitstream, the decoder is unable to locate the next code
word and therefore loses synchronization with the encoder. This creates the need for
the encoded video bitstream to have some error resilience features.
4.4 Overview of error resilience techniques
To address the need to make the video bitstream more error resilient, diverse error
resilience techniques [20,21,22] have been developed. Depending on the role that the
encoder, decoder and the network layers play in the process, error resilience techniques
can be divided into three categories: error resilient encoding, decoder error concealment
and encoder and decoder interactive error control.
4.4.1 Error resilient encoding
In this approach, the encoder adds redundancy bits to the video bitstream to protect
the video quality when the bitstream is corrupted by transmission errors. The
redundancy bits should be inserted so as to achieve the maximum gain for the smallest
amount of redundancy.
4.4.1.1 Robust Entropy encoding
As described in previous sections, one major cause for the vulnerability of the
compressed video bitstream is that a video coder uses VLC to represent various
symbols. Any bit errors or lost bits in the middle of a code word will make the code
word undecodable and also make it impossible for the decoder to locate the next code
word, thus causing loss of synchronization with the encoder until the next
resynchronisation point. To tackle this problem, the following techniques have been
developed.
Resynchronisation Markers: One simple and effective approach to the problems
associated with the use of VLC is to insert resynchronisation markers [26] periodically.
These markers are specially designed in such a way that they can be easily distinguished
from all other code words and from small perturbations of those code words. Usually
some header information necessary to decode the remaining part of the picture is
attached immediately after the resynchronisation marker. This way, instead of
discarding all of the remaining bitstream up to the following picture start code, the
decoder can resume proper decoding upon the detection of a resynchronisation marker.
Reversible Variable Length Coding (RVLC): RVLC is a specially designed VLC that
can be decoded in both forward and backward directions [48]. Without the use of
RVLC, the decoder discards all the bits until a resynchronisation code word is identified
after an error occurs. With RVLC the decoder can not only decode bits forward from a
resynchronisation marker, but can also decode bits backward from the next
resynchronisation code word. Thus with RVLC, fewer correctly received bits are
discarded compared with situations where no RVLC is used. Intelligently designed
RVLC and corresponding decoding methods can significantly improve the error
robustness of the bitstream, with little or no loss of coding efficiency [23,24].
Provisions for Syntax-Based Repairs: Because of the syntax constraint present in
compressed video bitstreams, it is possible to recover data from a corrupted bitstream
by making the corrected stream conform to the right syntax [25]. Obviously, such
techniques are very much dependent on the particular coding scheme. The use of
synchronization codes, RVLC, and other sophisticated entropy coding means such as
error resilient entropy coding can all make such repairs more feasible and effective.
4.4.1.2 Error Resilient prediction
Another major contribution to the sensitivity of compressed video to transmission errors
is the use of temporal prediction. Once an error occurs, the reconstructed frame at the
decoder differs from that assumed at the encoder and the reference frames used at the
decoder from there onward will differ from those used at the encoder. Consequently all
subsequent reconstructed frames will be in error: this process is usually referred to as
error propagation. The use of spatial prediction for the DC coefficients and MVs will
also cause error propagation. Two techniques are used to address this need, as follows.
Insertion of Intra-Blocks or Frames: A simple way to stop temporal error propagation is
to encode entire frames in Intra mode [28] more often. For real-time applications,
frequent use of Intra frames is typically not practical due to delay constraints. Instead
of entire frames, the use of a sufficiently high number of intra-MBs is more realistic.
When employing intra-MBs for error resilience, both the number of such MBs and their
spatial placement have to be determined. The number of necessary intra-MBs obviously
depends on the quality of the connection. For the spatial placement of I-mode blocks,
several schemes have been proposed. Random placement has been shown to be
efficient, as has placement in the areas of highest activity, determined by the average
MV magnitude. Hybrid schemes that additionally consider the time of the last
intra-update of a given MB have also been considered. None of these schemes
significantly outperforms the others. The currently best-known way to determine both
the correct number and the placement of intra-MBs for error resilience is to use a loss-
aware rate-distortion optimization scheme. Finally, if a back channel from decoder to
encoder is available, information about missing or damaged MB data can be sent to
the encoder to trigger intra-coding at the encoder.
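A minimal sketch of the random placement scheme mentioned above, the simplest of the alternatives. Tying the refresh probability to the estimated loss rate is an illustrative choice here, not the loss-aware rate-distortion optimization the text refers to, and the function name is hypothetical:

```python
import random

# A hedged sketch of random intra-MB placement: each macroblock in a
# frame is forced to intra mode with a probability tied to the estimated
# loss rate. The probability rule is illustrative, not a standard scheme.
def choose_intra_mbs(num_mbs, refresh_prob, seed=0):
    """Return the indices of the MBs to code in intra mode for this frame."""
    rng = random.Random(seed)  # seeded for reproducibility in this sketch
    return [i for i in range(num_mbs) if rng.random() < refresh_prob]

# For a QCIF frame (11 x 9 = 99 macroblocks), a refresh probability of
# about 0.1 intra-codes roughly ten macroblocks per frame, so on average
# every macroblock is refreshed within about ten frames.
```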
Independent Segment Prediction: The other approach to limit the extent of error
propagation is to split the data domain into several segments and perform
temporal/spatial prediction only within the same segment. This way, the error in one
segment will not affect another segment. One such approach is to include even-indexed
frames into one segment, and odd-indexed frames into another segment [29,30]. This
way, even frames are only predicted from even frames. Another approach is to divide a
frame into multiple regions (e.g. a region can be a GOB or slice), and a region can only
be predicted from the same region in the previous frame.
4.4.1.3 Layered Coding with Unequal Error Protection
Layered coding or scalable coding refers to coding a video into a base layer and one or
several enhancement layers [31]. The base layer provides a low but acceptable level of
quality, and each additional enhancement layer will incrementally improve the quality.
Layered coding also enables users with different bandwidth capacities or decoding
power to access the same video at different quality levels. To serve as an error
resilience tool, layered coding needs to be paired with unequal error protection
(UEP) [32, 33] in the transport system, so that the base layer receives the most
protection, using more channel resources, while the enhancement layers receive less.
The philosophy of this approach is that when the channel condition deteriorates, at
least the base-layer video quality is guaranteed.
There are many ways to divide video data into more than one layer. Depending on the
choice, scalable video can be classified into data partitioning, SNR scalability, spatial
scalability and temporal scalability. These scalability schemes can also be combined to
form a hybrid scalability scheme.
Data partitioning: In this approach, the video bitstream is split so that one layer
contains all of the key headers, motion vectors and low-frequency DCT coefficients.
The second layer contains less critical information such as high frequency DCT
coefficients, possibly with less error protection.
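The split described above can be sketched at the level of a single block's zigzag-ordered coefficients. The breakpoint value is illustrative; real data partitioning also places headers and motion vectors in the first layer:

```python
# A hedged sketch of data partitioning for one block: the zigzag-ordered
# quantised coefficients are split at a breakpoint, with the base layer
# keeping the DC and low-frequency coefficients and the second layer the
# high-frequency remainder. The breakpoint of 6 below is illustrative.
def partition_block(zigzag_coeffs, breakpoint):
    """Split one block's coefficients into base and enhancement parts."""
    return zigzag_coeffs[:breakpoint], zigzag_coeffs[breakpoint:]

coeffs = [52, 12, -6, 4, 0, 3] + [0] * 58   # 64 coefficients, mostly zero
base, enhancement = partition_block(coeffs, 6)
```

Because high-frequency coefficients of typical blocks are mostly zero, losing the enhancement part costs only fine detail, which is why it can be sent with less error protection.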
SNR scalability: mainly used in applications that support video transmission at multiple
qualities. All layers have the same spatial resolution but different video quality. The
lower layer provides the basic video quality. The enhancement layers are coded so as to
enhance the basic quality by providing refinement data for the DCT coefficients of the
lower layer.
Spatial scalability: the input source video is preprocessed to create the lower-resolution
image. This is independently coded. In the enhancement layer the differences between
an interpolated version of the base layer and the source image are coded.
Temporal scalability: the lower temporal rate pictures are coded as the basic temporal
rate; the additional pictures are coded with temporal prediction relative to the base
layer.
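The temporal scalability scheme described above can be illustrated with a minimal
Python sketch; the two-layer split by frame-index parity is an illustrative
assumption, not actual standard syntax:

```python
def split_temporal_layers(frames):
    """Split a picture sequence into a base layer (lower temporal rate)
    and one enhancement layer holding the remaining pictures."""
    base = frames[::2]          # coded as the basic temporal rate
    enhancement = frames[1::2]  # coded with temporal prediction from the base
    return base, enhancement

base, enh = split_temporal_layers(list(range(10)))
print(base)  # [0, 2, 4, 6, 8]
print(enh)   # [1, 3, 5, 7, 9]
```

With two layers, the base layer carries half the overall frame rate; decoding both
layers restores the full rate.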
4.4.1.4 Multiple Description Coding
Similar to layered coding, multiple description coding (MDC) [34, 35, 36, 37] also
codes a source into several sub-streams, known as descriptions, but the decomposition
is such that the resulting descriptions are correlated and have similar importance. Any
single description should provide a basic level of quality, and more descriptions
together will provide improved quality. For each description to provide a certain degree
of quality, all the descriptions must share some fundamental information about the
source, and thus must be correlated. This correlation enables the decoder to estimate a
missing description from a received one and thus provide an acceptable quality level
from any description. On the other hand, this correlation is also the source of
redundancy in MDC. An advantage of MDC over layered coding is that it does not
require special provisions in the network to provide a reliable sub-channel. For
example, in a very lossy network, many retransmissions have to be invoked or a lot of
redundancy has to be added in FEC to realize error free transmission. In this case, it
may be more effective to use MDC.
To accomplish their respective goals, layered coding uses a hierarchical, un-correlating
decomposition, whereas MDC uses a non-hierarchical, correlating decomposition.
Some approaches that have been proposed for accomplishing such decomposition
include overlapping quantization, correlated predictors, correlating linear transforms,
correlating filter-banks and interleaved spatial-temporal sampling.
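A toy version of a pairwise correlating transform in the spirit of [37] can sketch
the MDC idea: each description mixes both samples of a pair, so a lost description
can be estimated from the received one. The sum/difference transform and the
zero-difference concealment rule are illustrative assumptions:

```python
def mdc_encode(a, b):
    """Correlating transform: each description carries a mix of both samples."""
    return (a + b) / 2.0, (a - b) / 2.0   # description 1, description 2

def mdc_decode(d1, d2):
    """Both descriptions received: exact reconstruction of the pair."""
    return d1 + d2, d1 - d2

def mdc_conceal(d1):
    """Description 2 lost: assume the difference is zero, so both samples
    are estimated by the pair average carried in description 1."""
    return d1, d1

d1, d2 = mdc_encode(10.0, 6.0)
print(mdc_decode(d1, d2))  # (10.0, 6.0)
print(mdc_conceal(d1))     # (8.0, 8.0)
```

The correlation between d1 and d2 is exactly the redundancy the text refers to:
it costs compression efficiency but buys graceful degradation.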
4.4.2 Decoder Error Concealment
Decoder error concealment [38, 39] refers to the recovery or estimation of lost
information due to transmission errors. For a block-based hybrid coding paradigm,
there are three types of information that may need to be estimated in a damaged MB:
the texture information, including the pixels or DCT coefficient values for either an
original image block or a prediction error block; the motion information, consisting of
MVs for MBs coded in either P-mode or B-mode; and finally the coding mode of the
MB. Most of the error concealment techniques utilize some kind of spatial or temporal
interpolation based on the proposition that the colour values of spatially and temporally
adjacent pixels vary smoothly, except in the regions with edges.
4.4.2.1 Recovery of Texture Information
Motion Compensated Temporal Prediction: A simple and yet very effective approach to
recover a damaged MB in the decoder is by copying the corresponding MB in the
previously decoded frame based on the MV for this MB. The performance of this
approach depends critically on the availability of the MV. When the MV is also
missing, it must first be estimated. To reduce the impact of the error in the estimated
MVs, temporal prediction may be combined with spatial interpolation.
Spatial Interpolation: Another simple approach is to interpolate pixels [40, 41] in a
damaged block from pixels in adjacent correctly received blocks. Usually, because all
blocks or MBs in the same row are put into the same packet, the only available
neighboring blocks are those in the current row and the row above. Because most
pixels in these blocks are too far away from the missing samples, usually only the
boundary pixels in neighboring blocks are used for interpolation. Instead of
interpolating individual pixels, a simpler approach is to estimate the DC coefficient of a
damaged block and replace the damaged block by a constant equal to the estimated DC
value. The DC value can be estimated by averaging the DC values of surrounding
blocks. One approach to facilitate such spatial interpolation is by an interleaved
packetization mechanism so that the loss of one packet will damage only every alternate
block or MB.
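The DC-based concealment just described amounts to averaging the DC values of the
surrounding blocks and painting the damaged block with that constant. A minimal
sketch (the 8x8 block size and plain averaging are assumptions for illustration):

```python
def conceal_block_dc(neighbor_dc_values, block_size=8):
    """Replace a damaged block by a constant block at the estimated DC value,
    taken as the average DC of the surrounding, correctly received blocks."""
    dc = sum(neighbor_dc_values) / len(neighbor_dc_values)
    return [[dc] * block_size for _ in range(block_size)]

block = conceal_block_dc([100, 120, 110, 130])
print(block[0][0])  # 115.0
```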
Spatial and Temporal Interpolation by Maximizing the Smoothness of Resulting Video:
A problem with spatial interpolation is how to determine an appropriate interpolation
filter. Another shortcoming is that any received DCT coefficients are ignored. These
problems can be solved by requiring the recovered pixels in a damaged block to be
smoothly connected with the neighboring pixels, both spatially in the same frame and
temporally in the previous and following frames [42, 43]. If some but not all DCT
coefficients are received for the current block, then the estimation should be such that
the recovered block be as smooth as possible, subject to the constraint that the DCT on
the recovered block would produce the same value for the received coefficients. These
objectives can be formulated as an unconstrained optimization problem, and the
solutions under different loss patterns correspond to different interpolation filters in the
spatial, temporal and frequency domains.
Spatial Interpolation Using Projection onto Convex Sets (POCS) Technique: The
general idea behind POCS-based estimation methods [44, 45] is to formulate each
constraint about the unknowns as a convex set. The optimal solution is the intersection
of all the convex sets, which can be obtained by recursively projecting a previous
solution onto individual convex sets. When applying POCS for recovering an image
block, the spatial smoothness criterion is formulated in the frequency domain, by
requiring the discrete Fourier transform (DFT) of the recovered block to have energy
only in several low frequency coefficients. If the damaged block is believed to contain
an edge in a particular direction, then one can require the DFT coefficients to be
distributed along a narrow strip orthogonal to the edge direction, i.e., low-pass along the
edge direction, and all-pass in the orthogonal direction. The requirement on the range
of each DFT coefficient magnitude can also be converted into a convex set, as can the
constraint imposed by any received DCT coefficient. Because the solution can only be
obtained through an iterative procedure, this approach may not be suitable for real-time
applications.
4.4.2.2 Recovery of Coding Modes and Motion Vectors
Coding modes and motion vectors are fundamental information needed to decode
a compressed video bitstream under the current video coding standards. One way to
estimate the coding mode for a damaged MB is by collecting statistics of the coding
mode patterns of adjacent MBs and finding the most likely mode given the modes of
surrounding MBs. A simple and conservative approach is to assume that the MB is
coded in the intra-mode and use only spatial interpolation for recovering the underlying
blocks.
For estimating lost MVs, there are several possible simple operations [49]:
• Assuming the lost MVs to be zeros, which works well for video sequences with
relatively small motion.
• Using the MVs of the corresponding block in the previous frame.
• Using the average of the MVs from the spatially adjacent blocks.
• Using the median of MVs from the spatially adjacent blocks.
• Reestimating the MVs.
Typically when an MB is damaged, its horizontally adjacent MBs are also damaged,
and hence the average or median is taken over the MVs above and below. It has been
shown that the last two methods produce the best reconstruction results [46].
Instead of estimating one MV for a damaged MB, one can use different MVs for
different pixel regions in the MB for a better result.
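The average and median estimators from the list above can be sketched as follows;
motion vectors are assumed to be (x, y) tuples and the candidate set is taken from
the MBs above and below, as discussed:

```python
import statistics

def estimate_lost_mv(neighbor_mvs, method="median"):
    """Estimate a lost motion vector component-wise from the MVs of
    spatially adjacent blocks (typically those above and below)."""
    xs = [mv[0] for mv in neighbor_mvs]
    ys = [mv[1] for mv in neighbor_mvs]
    if method == "median":
        return (statistics.median(xs), statistics.median(ys))
    return (sum(xs) / len(xs), sum(ys) / len(ys))

print(estimate_lost_mv([(1, 2), (3, 8), (5, 4)]))  # (3, 4)
```

Taking the median component-wise (rather than the median of whole vectors) is one
common simplification; it is robust to a single outlier neighbor MV.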
4.4.3 Encoder and Decoder Interactive Error Control
In all the techniques described in the previous sections, the encoder and decoder operate
independently to combat transmission errors. When a feedback channel from decoder
to encoder is available, better performance can be achieved if the encoder and decoder
cooperate in the process of error concealment [47]. For real-time applications it is not
realistic to employ error control techniques used in data link layer, e.g. ARQ. However
it is possible to limit or stop the error propagation effect by employing intra-mode
coding or dynamic reference picture selection according to the back channel message;
in this way we can reduce the coding inefficiency inherent with periodic intra mode
coding.
4.4.3.1 Reference Picture Selection (RPS) Based on Feedback Information
A simple way to take advantage of an available feedback channel is to employ RPS. If
the encoder learns through a feedback channel about a damaged part of a previously
coded frame, it can use a previous picture other than the last and damaged one as a
reference picture for encoding the next P-frame. Of course this reference picture should
be also available to the decoder. The disadvantage is that both encoder and decoder
need to have a large buffer to store several past decoded pictures as possible reference
pictures. Information about the reference picture to be used is conveyed in the bit
stream. Compared to coding the current picture as an I-frame, the penalty for using the
older reference picture is significantly lower, if the reference picture is not too far away.
4.4.3.2 Error Tracking Based on Feedback information
Instead of using an earlier and undamaged frame as the reference frame, the encoder can
track how the damaged areas in frame n would have affected decoded blocks in frames
n+1 to n+d-1, and then perform one of the following [50, 51, 52]:
• Code in intra-mode the blocks in frame n+d that would have used damaged
pixels in frame n+d-1 for prediction.
• Avoid using the affected area in frame n+d-1 for prediction in coding frame n+d.
• Perform the same type of error concealment at the encoder as at the decoder for
frame n+1 to n+d-1, so that the encoder’s reference picture matches that at the
decoder, when coding frame n+d.
The first two approaches only require the encoder to track the locations of damaged
pixels or blocks, whereas the last approach requires the duplication of the decoder
operation for frame n+1 to n+d-1, which is more complicated. In either approach, the
decoder will recover from errors completely at frame n+d.
4.5 Error resilience tools in the current video coding standards
Among all of the current video coding standards, H.263 and MPEG-4 have been created
with the intention of possible use in mobile environments. Because the work
completed in this thesis has been mainly targeted at mobile environments, only the
error resilience features of these two standards are reviewed.
4.5.1 Error resilience tools in H.263
H.263 follows the general ideas of block-based hybrid coding. Beyond the baseline
syntax, H.263 offers a variety of optional operation modes [53] that adjust various
tradeoffs. Some of these modes typically allow adjusting the tradeoff between
computational complexity and compression efficiency, while others are intended to
improve error resilience by adding redundancy bits to the bitstream, which will be
discussed in more detail in the following sections.
H.263 contains four error resilience tools: block-based FEC, flexible synchronization
points (slices), independent segment decoding (ISD) and reference picture selection
(RPS). The temporal, spatial, and SNR scalability modes can also be used to support
error resilient applications. An appropriate combination of these tools along with means
available in the baseline syntax, such as intra-MB refresh, is typically chosen adaptively
by the application according to the network characteristics and conditions.
4.5.1.1 Forward Error Correction Mode (FEC) (Annex H)
The FEC mode divides the H.263 bitstream into FEC frames of 492 bits each. A 19-bit
BCH forward error correction checksum [54] is calculated over all the bits of such an
FEC frame, along with an additional bit to allow for resynchronization of the
resulting 512-bit block structure. This FEC coding allows the correction of a single
bit error in each FEC frame and the detection of two bit errors, for an approximately
4% increase in bit rate.
The FEC mechanism of Annex H is designed for ISDN, which is a very low error rate
network.
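The frame arithmetic of Annex H works out as follows; this is a direct restatement
of the figures above, not additional standard detail:

```python
# Annex H frame layout: 492 payload bits + 19-bit BCH checksum + 1 framing
# bit = 512-bit block, giving the quoted ~4% rate increase.
DATA_BITS = 492
BCH_PARITY_BITS = 19
FRAMING_BITS = 1

frame_bits = DATA_BITS + BCH_PARITY_BITS + FRAMING_BITS
overhead_percent = 100.0 * (frame_bits - DATA_BITS) / DATA_BITS
print(frame_bits)                  # 512
print(round(overhead_percent, 1))  # 4.1
```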
4.5.1.2 Slice Structure Mode (Annex K)
When the slice structure mode is used, the original group of block (GOB) structure is
replaced by a slice structure. Slices consist of a number of macroblocks belonging to
the same picture. These macroblocks might be arranged either in scanning order or in a
rectangular shape. In both cases, any macroblock of a picture belongs to exactly one
slice. All macroblocks of one slice can be decoded independently from the content of
other slices because no dependencies such as prediction of motion vectors are allowed
across slice boundaries. The main difference between a GOB and a slice is that a GOB
always has a rectangular shape, while a slice has a more flexible shape and usage than a
GOB.
The picture header information must be available to decode a slice, because it is
not repeated in the slice headers. Scan order slices are often more useful if small
packet sizes are needed,
whereas rectangular slices are helpful in achieving packet loss resilience and low codec
delay at higher bit rates. Each of the two slice structures can be used either with a fixed
scan-ordered or an arbitrarily ordered transmission of the slices. The latter makes
decoder implementation more difficult, but minimizes latency in lossy environments.
The former is more appropriate for heavily pipelined hardware architectures, which
might not allow random decoding of data.
4.5.1.3 Independent Segment Decoding Mode (Annex R)
The independent segment decoding mode enforces the treatment of segment boundaries
as if they are picture boundaries. A segment is defined as a slice, a GOB, or a number
of consecutive GOBs with empty GOB headers. This mode allows the independent
decoding of picture parts, if and only if, the shape of the independently decodable
segments remains identical between two I frames. In such a case, the import of
previously corrupted picture data outside the segment boundaries (due to motion
compensation) during the reconstruction process can be avoided. The independent
segment decoding mode can be used for special effects like spatial video mixing, but it
can also achieve error resilience by eliminating error propagation between well-defined
spatial parts of a picture.
4.5.1.4 Reference Picture Selection (RPS - Annex N)
The RPS mode allows a picture earlier than the last transmitted one to serve as
the reference picture for inter picture prediction. It is also possible to apply RPS to
individual segments rather than full pictures. The temporal reference of the reference
picture to be used is conveyed in the picture/segments header to inform the decoder
which of its several reference pictures should be used.
The RPS mode may be used with or without a back channel. In multi-party video
applications, back channels are obviously not realistic. For these scenarios, one
possible method of using RPS mode is known as video redundancy coding (VRC).
VRC can be used in conjunction with the spatial error resilience mechanisms of Annex
R and Annex K to achieve spatial and temporal error resilience.
Figure 4.3 VRC with two threads and three frames per thread
The principle of the VRC method is to divide the sequence of pictures into two or more
threads in such a way that all camera pictures are assigned to one of the threads in a
round-robin fashion. Each thread is coded independently. Figure 4.3 shows that the
pictures have been divided into two threads. Obviously, the frame rate within one
thread is much lower than the overall frame rate: half in case of two threads, a third in
case of three threads, and so on. This leads to a substantial coding penalty because of
the generally larger scene changes in the picture sequence and longer motion vectors
typically required to represent accurately the motion related changes between two P
frames within a thread. At regular intervals, all threads converge into a so-called Sync
frame as shown in Fig. 4.3.
If one of these threads is damaged because of transmission errors, the remaining threads
stay intact and can be used to predict the next Sync frame. It is possible to continue the
decoding of the damaged thread, which leads to slight picture degradation, or to stop its
decoding, which leads to a drop of the frame rate. If the length of the threads is kept
reasonably small, however, both degradation forms will persist only for a very short
time, until the next Sync frame is reached.
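The round-robin thread assignment described above can be sketched as follows; the
frame numbering matches the two-thread example of Fig. 4.3:

```python
def assign_vrc_threads(frame_numbers, num_threads):
    """Assign camera pictures to independently coded threads round-robin."""
    threads = [[] for _ in range(num_threads)]
    for i, frame in enumerate(frame_numbers):
        threads[i % num_threads].append(frame)
    return threads

print(assign_vrc_threads([1, 2, 3, 4, 5, 6], 2))  # [[1, 3, 5], [2, 4, 6]]
```

Each thread sees only every num_threads-th picture, which is exactly why the
per-thread frame rate drops and the coding penalty described above arises.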
Figure 4.4 illustrates the workings of VRC when one of the two threads is damaged.
Sync frames are always predicted based on one of the undamaged threads. This means
that the number of transmitted I frames can be kept small because there is no need for
complete resynchronisation. The dotted box in Fig. 4.4 indicates that the frame has been
corrupted by the transmission error and was not decoded successfully. Consequently
the frame rate from the corrupted frame to the Sync point will be lower because these
frames are no longer decodable. From the Sync point a new cycle will start again,
because the frame at Sync point can still be decoded from the other thread, which has
been successfully decoded.
Figure 4.4 Frame loss with VRC
A correct Sync frame prediction is no longer possible only if all threads between two
Sync frames are damaged. In this situation, annoying artifacts will be present until the
next I frame is decoded correctly, as would have been the case without employing VRC.
If a back channel is available, the decoder can send the encoder messages containing
positive or negative acknowledgements of decoded pictures, along with the temporal
reference of each picture. By using this information, the encoder can keep track of
the last correctly decoded picture at the decoder. Once the encoder learns about an
incorrectly decoded picture through a back channel message, it can react accordingly
by using a correct reference picture for further prediction.
4.5.2 Error resilience tools in MPEG-4
In MPEG-4, five error resilience tools have been incorporated into the standard, as
listed below:
Video Packetization or Resynchronisation
Data Partitioning (DP)
Reversible Variable-Length Codes (RVLCs)
Adaptive Intra Refresh (AIR)
NEWPRED
The first two approaches and AIR try to confine the influence of the errors to within
one packet, while RVLC attempts to recover some data that would otherwise be
discarded. NEWPRED brings encoder and decoder into cooperation to conceal
the error effect.
4.5.2.1 Packetization
Basically the video packet resynchronisation is very similar to the Group of Blocks
(GOB) or slice structure mode in Annex K of H.263+. Resynchronization or
packetization attempts to stop error propagation after errors have been detected, by
inserting resynchronization markers into the bitstream. When errors occur in the
encoded bitstream without using resynchronization markers, the decoder will not be
able to locate the next code word, and therefore will lose synchronization with the
encoder. When resynchronization markers are inserted in the bitstream, the decoder can
regain synchronization by looking for the next resynchronization marker after losing
synchronization due to the errors in the bitstream. Generally, the data between the
synchronization point prior to the error and the first point where synchronization is re-
established is discarded.
The main difference between GOB and Packetization is that the GOB approach to
resynchronization is based on spatial resynchronization while the video packet approach
adopted by MPEG-4 is based on providing periodic resynchronization markers
throughout the bitstream. In the GOB approach, once a particular macroblock location
is reached in the encoding process, a resynchronization marker is inserted into the
bitstream. A potential problem with this approach is that since the encoding process is
variable rate, these resynchronization markers will most likely be unevenly spaced
throughout the bitstream. Therefore, certain portions of the scene, such as high motion
areas, will be more susceptible to errors, which will also be more difficult to conceal. In
the video packet approach, the length of the video packets is not based on the number
of macroblocks, but instead on the number of bits contained in that packet. If the
number of bits contained in the current video packet exceeds a predetermined threshold,
then a new video packet is created at the start of the next macroblock.
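The bit-count rule in the last sentence can be sketched directly; per-MB bit counts
and the threshold value are illustrative inputs:

```python
def packetize_by_bits(mb_bit_counts, threshold):
    """Close the current video packet and start a new one at the next MB once
    the accumulated bit count exceeds the threshold (cf. MPEG-4 video packets)."""
    packets, current, bits = [], [], 0
    for mb, size in enumerate(mb_bit_counts):
        current.append(mb)
        bits += size
        if bits > threshold:
            packets.append(current)
            current, bits = [], 0
    if current:
        packets.append(current)
    return packets

print(packetize_by_bits([100, 200, 300, 50, 400], 300))  # [[0, 1, 2], [3, 4]]
```

Unlike the GOB approach, packet boundaries here track bits rather than macroblock
positions, so high-motion regions get resynchronization markers more often.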
Figure 4.5 Packet structure
Figure 4.5 shows a typical video packet (VP) structure. A resynchronization marker is
used to distinguish the start of a new video packet. This marker is distinguishable from
all possible VLC code words as well as the VOP (Video Object Plane) start code.
Header information is also provided at the start of a video packet. Contained in this
header is the information necessary to restart the decoding process and includes the
macroblock address (number) of the first macroblock contained in this packet and the
quantization parameter (quant_scale) necessary to decode that first macroblock. The
macroblock number provides the necessary spatial resynchronization while the
quantization parameter allows the differential decoding process to be resynchronized.
Following the quant_scale is the Header Extension Code (HEC). As the name implies,
HEC is a single bit used to indicate whether additional information will be available in
this header. If the HEC is equal to 1 then the following additional information is
available in the packet header: modulo_time_base, vop_time_increment,
vop_coding_type, intra_dc_vlc_thr, vop_fcode_forward, vop_fcode_backward. In this
case the HEC makes it possible to decode each VP independently, as all the necessary
information to decode the VP is included in the header extension code field.
If the VOP header information is corrupted by a transmission error, it can be corrected
by the HEC information. The decoder can detect the error in the VOP header, if the
decoded information is inconsistent with its semantics.
In conjunction with the video packet approach to resynchronization, a second method
called fixed interval synchronization has also been adopted by MPEG-4. This method
requires that VOP start codes and resynchronization markers appear only at legal fixed
locations in the bitstream. This helps to avoid the problems associated with errors
present in the bitstream which can emulate a VOP start code. In this case, when fixed
interval synchronization is utilized, the decoder is only required to search for a
VOP start code at the beginning of each fixed interval. The fixed interval
synchronization method allows this interval to be any predetermined length.
Fixed interval synchronization is achieved by first inserting a bit with the value 0 and
then, if necessary, inserting bits with value 1 before the start code and the Sync marker.
The decoder can determine if errors are incurred in a video packet by detecting the
incorrect number of these stuffing bits.
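The stuffing rule can be sketched as follows; bit strings stand in for the actual
bitstream, and the interval value is an illustrative choice:

```python
def stuffing_bits(bit_position, interval):
    """Bits appended before the next marker: a single 0 followed by as many
    1s as needed to reach the next interval boundary."""
    pad = "0"
    while (bit_position + len(pad)) % interval != 0:
        pad += "1"
    return pad

def stuffing_is_valid(pad):
    """The decoder flags a packet error if the stuffing pattern is malformed."""
    return len(pad) >= 1 and pad[0] == "0" and set(pad[1:]) <= {"1"}

print(stuffing_bits(509, 512))   # 011
print(stuffing_is_valid("011"))  # True
```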
4.5.2.2 Data Partitioning (DP)
Different from resynchronisation, DP [55] is an error concealment tool, which is
achieved by separating the motion and macroblock header information away from the
texture information. If the texture information is lost, this approach utilises the motion
information to conceal these errors. That is, the corrupted texture information is
discarded, while the motion information is used to motion-compensate the previously
decoded VOP.
The syntactic structure of the DP mode is depicted in Fig. 4.6.
Figure 4.6 Structure of Data Partitioning
Error concealment is an extremely important component of any error robust video codec.
Similar to the error resilience tools, the effectiveness of an error concealment strategy is
highly dependent on the performance of the resynchronisation scheme. Basically, if the
resynchronisation method can effectively localize the error, then the error concealment
problem becomes much more tractable.
4.5.2.3 Reversible Variable Length Coding (RVLC)
RVLC is the only error resilience tool in MPEG-4 which has some kind of data
recovery mechanism. The use of variable length codes, although it achieves a high
compression ratio, is the main contributor to the vulnerability of compressed video
under the current standards. During the decoding process, if the decoder detects an
error while decoding VLC data, it loses synchronization with the encoder. As a
consequence, the decoder typically discards all the data up to the next
resynchronization point. RVLC alleviates this problem and enables the decoder to
better isolate the errors, thus improving data recovery in the presence of errors.
RVLC is designed so as to be instantaneously decoded both in forward and reverse
directions. A part of a bitstream which cannot be decoded in the forward direction due
to the presence of errors can often be decoded in the backward direction, and so recover
some information which would otherwise have been discarded. However RVLC is only
applied to TCOEF coding in MPEG-4 at this stage.
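A toy reversible code can illustrate the two-way decodability. The three-codeword
table below is an invented example (not the MPEG-4 TCOEF tables): the codewords are
palindromes and form a prefix-free, hence also suffix-free, set, so the same table
decodes the bitstream in both directions:

```python
RVLC_TABLE = {"0": "A", "101": "B", "111": "C"}  # palindromic codewords

def decode_forward(bits):
    symbols, buf = [], ""
    for b in bits:
        buf += b
        if buf in RVLC_TABLE:
            symbols.append(RVLC_TABLE[buf])
            buf = ""
    return symbols, buf  # a non-empty buf marks an undecodable tail

def decode_backward(bits):
    symbols, _ = decode_forward(bits[::-1])  # palindromes: same table applies
    return symbols[::-1]

bits = "0" + "101" + "111"      # A, B, C
print(decode_forward(bits)[0])  # ['A', 'B', 'C']
print(decode_backward(bits))    # ['A', 'B', 'C']
```

When an error corrupts the middle of a packet, forward decoding recovers symbols up
to the error and backward decoding from the next marker recovers symbols after it,
so only the corrupted middle is discarded.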
4.5.2.4 Adaptive Intra Refresh (AIR) for Error Resilience
In the current video coding standards, some form of error refreshment is essential.
When an error occurs in an I or a P frame, all subsequent frames are degraded unless
an error refreshment technique is adopted. However, encoding entire pictures in Intra
mode to avoid this reduces the coding efficiency greatly, so the AIR approach is a
compromise.
(Figure 4.6 packet layout: Resync Marker | MB Number | Quant Scale | HEC |
Motion & header information | Motion Marker | Texture Info | Resync Marker)
In AIR, the motion area is encoded frequently in Intra mode and the number of Intra
MBs in a VOP is fixed and predetermined, depending on bit rate and frame rate. The
encoder estimates motion for each MB and the motion area is encoded in Intra mode.
The results of this estimation are recorded to the Refresh Map. The encoder refers to
the Refresh Map and decides whether to encode the current MB in Intra mode or not.
The decision is performed by the comparison between SAD and a threshold value.
SAD is the Sum of Absolute Differences between the current MB and the MB in the
same location of the previous VOP. Since the SAD has already been calculated in
the Motion Estimation part, additional calculation for the AIR is not necessary. If the
SAD of the current MB exceeds the threshold it is regarded as a high motion area and it
is encoded in Intra mode.
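The mode decision just described can be sketched as below; the flat pixel lists and
threshold value are illustrative inputs, not values from the standard:

```python
def sad(current_mb, previous_mb):
    """Sum of absolute differences between co-located macroblock pixels."""
    return sum(abs(c - p) for c, p in zip(current_mb, previous_mb))

def air_coding_mode(current_mb, previous_mb, threshold):
    """High-motion MBs (SAD above the threshold) are refreshed in Intra mode."""
    return "intra" if sad(current_mb, previous_mb) > threshold else "inter"

print(air_coding_mode([10, 60, 20], [10, 10, 20], threshold=40))  # intra
```

Because the same SAD is already computed during motion estimation, the decision
adds essentially no extra computation, as the text notes.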
4.5.2.5 NEWPRED
Similar to the RPS mode (Annex N) and Slice Structure Mode (Annex K) of H.263 in
principle, when the NEWPRED mode is turned on in MPEG-4, the reference used for
inter-prediction by the encoder will be updated adaptively according to feedback from
the decoder via feedback messages. These upstream messages indicate which
NEWPRED (NP) segments (which can either be an entire frame, or the content of a
packet) have been successfully decoded and which NP segments have not. Based on the
feedback information the encoder will either use the most recent NP segment, or a
spatially corresponding but older NP segment for prediction. In the latter case the
coding efficiency is reduced, as long motion vectors and additional texture information
will typically have to be used.
References
[1] Martin Vetterli and Jelena Kovacevic, “Wavelets and Subband Coding”, Prentice
Hall 1995.
[2] Haobo Li, Mirek Novak and Robert Forchheimer, “Fractal-based image sequence
compression scheme”, Optical Engineering, 32(7), July 1993, pp. 1588-95.
[3] Katherine S. Wang, James O. Normile and Hsi Jung Wu, “Software decodable
video compression algorithm based on vector quantization and classification”. In:
Proceedings of IEEE Workshop on Visual Signal Processing and Communication,
Melbourne, September 1993.
[4] Berthold K.P. Horn and Brian G. Schunck, “Determining Optical flow”, Artificial
Intelligence 17, 1981, pp.319-331.
[5] A. Murat Teckalp, “Digital Video Processing”, Prentice Hall PTR 1995.
[6] Recommendation H.261: Video Codec for Audiovisual Services at p×64 kbit/s.
ITU-T (CCITT), March 1993.
[7] ISO/IEC 11172-2, “Information technology - Coding of moving pictures and
associated audio for digital storage media at up to about 1.5 Mbit/s: Part 2 Video”,
August 1993.
[8] ISO/IEC: 13818 (MPEG-2). “Information technology – Generic Coding of Moving
Pictures and Associated Audio Information”.
[9] Y. Nakaya and H. Harashima, “Motion compensation based on spatial
transformations,” IEEE Trans. Circ. and Syst.: Video Tech., Vol. 4, June 1994,
pp.339-56, 366-7.
[10] Yucel Altunbasak, “Object-Scalable, Content-Based Video Representation and
Motion Tracking for Visual Communications and Multimedia”, Ph.D thesis,
Department of Electrical Engineering, University of Rochester. 1996.
[11] J. Biemond, L. Looijenga, D. E. Boekee, and R. H. J.M. Plompen, “A pel-recursive
Wiener-based displacement estimation algorithm,” Sign. Proc. Vol. 13, December
1987, pp. 399-412.
[12] N. Ahmed, T. Natarajan and K. R. Rao, “Discrete Cosine Transform”, IEEE Trans.
On Computers, 1974, pp.90-93.
[13] Bill Welsh, “Model-based coding of images”, Ph.D thesis, British Telecom
Research laboratories, January 1991.
[14] Haobo Li, “Low Bitrate Image Sequence Coding”, Ph.D thesis, Linkoping
University, 1993.
[15] Jorn Ostermann, “Object-based analysis-synthesis coding based on the source
model of moving rigid 3D objects”, Signal Processing: Image Communication 6,
1994, pp.143-161.
[16] Candemir Toklu, “Object-based Digital Video Processing Using 2D Meshes”, Ph.D
thesis, Department of Electrical Engineering, University of Rochester, 1998.
[17] Yucel Altunbasak and A. Murat Tekalp, “Closed-form connectivity-preserving
solutions for motion compensation using 2-D meshes”, IEEE Trans. Image Proc.,
Vol. 6, No. 9, September 1997, pp.1255-1269.
[18] Yucel Altunbasak and A. Murat Tekalp, “Occlusion-adaptive, content-based mesh
design and forward tracking”, IEEE Trans. Image Proc., Vol. 6, No. 9, September
1997, pp. 1270-1280.
[19] L. Torres and M. Kunt, “Second generation video coding techniques”, in L. Torres
and M. Kunt, “Video Coding: The Second Generation Approach”, Kluwer Academic
Publishers, 1996, pp.1-30.
[20] Y. Wang and Q. F. Zhu, “Error Control and Concealment for Video
Communication: A Review”, Proceedings of the IEEE, vol. 86, No. 5, May 1998.
pp.974 – 997.
[21] Y. Wang, S. Wenger, J. Wen and A. K. Katsaggelos, “Error Resilient Video Coding
Techniques”, IEEE Signal Processing Magazine, July 2000, pp.61-82.
[22] J. D. Villasenor, Y. Q. Zhang and J. Wen, “Robust Video Coding Algorithms and
Systems”, Proceedings of the IEEE, vol. 87, no. 10, October 1999, pp.1724-1733.
[23] J. Wen and J. D. Villasenor, “A class of reversible variable length codes for robust
image and video coding”, Proc. 1997 IEEE Int. Conf. Image Processing, vol. 2,
Santa Barbara, CA., Oct. 1997, pp. 65-68.
[24] Description of Error Resilient Core Experiments, ISO/IEC JTC1/SC29/WG11
N1383, Nov. 1996.
[25] D. W. Redmill and N. G. Kingsbury, “The EREC: an error resilient technique for
coding variable-length blocks of data”, IEEE Trans. Image Processing, Vol. 5, No.
4, April 1996, pp. 565-574.
[26] R. Talluri, “Error-resilient video coding in ISO MPEG-4 standard”, IEEE
Commun. Mag., vol. 36, no.6, June 1998, pp.112-119.
[27] J.Ott, Stephan Wenger and Gerd Knorr, “Application of H.263+ Video Coding
Modes in Lossy Packet network Environments”, Journal of Visual Communication
and Image Representation 10, 1999, pp.12-38.
[28] P. Haskell and D. Messerschmitt, “Resynchronisation of motion compensated
video affected by ATM cell loss”, Proc. ICASSP 92, San Francisco, CA, Vol. 3,
1992, pp.545-548.
[29] S. Wenger, “Video redundancy coding in H.263+”, in Proceedings of AVSPN,
Aberdeen, UK, September 1997.
[30] S. Wenger, G. Knorr, J. Ott and F. Kossentini, “Error resilience support in
H.263+”, IEEE Trans. Circuit Syst. Video Technol., Vol. 8, No.6, November
1998, pp. 867-877.
[31] J. F. Arnold, M. R. Frater and J. Zhang, “Error resilience in the MPEG-2 video
coding standard for cell based networks – A review”, Signal Processing: Image
Communication 14, No. 6-8, May 1999, pp. 607-633.
[32] W. Rabiner, M. Budagavi and R. Talluri, “Proposed extensions to DMIF for
supporting unequal error protection of MPEG-4 video over H.324 mobile
networks”, ISO/IEC JTC 1/SC 29/WG 11, Doc. M4135, MPEG Atlantic City
meeting, October 1998.
76
[33] A. Cellatoglu, S. Fabri, S. Worrall, A. Kondoz, “Use of Prioritized Object-Oriented
Video Coding for the Provision of Multiparty Video Communications in Error-
Prone Environments”, IEEE VTC, Amsterdam, 1999-Fall, pp. 401-405.
[34] V. A. Vaishampayan, “Design of multiple description scalar quantizers”, IEEE
Trans. Inform. Theory, Vol. 39, No. 3, May 1993, pp. 821-834.
[35] V. A. Vaishampayan and J. Domaszewicz, “Design of entropy constrained multiple
description scalar quantizer”, IEEE Trans. Inform. Theory, vol. 40, January 1994,
pp. 245-250.
[36] Y. Wang, M. T. Orchard and A. R. Reibman, “Multiple description image coding
for noisy channels by pairing transform coefficients”, in Proc. 1997 IEEE 1st
Workshop Multimedia Signal Processing, Princeton, NJ, June 1997, pp. 419-424.
[37] M. T. Orchard, Y. Wang, V. A. Vaishampayan and A. R. Reibman, “Redundancy
rate-distortion analysis of multiple description coding using pairwise correlating
transforms”, IEEE International Conference on Image Processing (ICIP97), (Santa
Barbara, CA), October 1997. Vol. 1, pp. 608-611.
[38] Q. Zhu and Y. Wang, “Error concealment in visual communications”, in
Compressed video over Networks, A. R. Reibman and M. T. Sun, Eds. New York,
Marcel Dekker, 2000.
[39] A. K. Katsaggelos and N/ P. Galatsanos, Eds., “Signal Recovery Techniques for
Image and Video Compression and Transmission”, Norwell, MA: Kluwer, 1998.
[40] S. S. Hemami and T. H.-Y. Meng, “Transform coded image reconstruction
exploiting interblock correlation”, IEEE Trans. Image Processing, Vol. 4, July
1995, pp. 1023-1027.
[41] S. Aign and K. Fazel, “Temporal & spatial error concealment techniques for
hierarchical MPEG-2 video codec”, in Proceedings of IEEE International
Conference on Communications, ICC'95, Seattle, June 1995, pp. 1778-1783.
[42] M. C. Hong, L. Kondi, H. Scwab and A. K. Katsaggelos, “Video error concealment
techniques”, Signal Processing: Image Communications, Vol. 14, No. 68, 1999,
pp.437-492.
77
[43] Q. F. Zhu, Y. Wang and I. Shaw, “Coding and cell loss recovery for DCT-based
packet video”, IEEE Trans. Circuits Syst. Video Technol., Vol. 3, No. 3, June
1993, pp. 248-258.
[44] H. Sun and W. Kwok, “Concealment of damaged block transform coded images
using projections onto convex sets”, IEEE Trans. Image Processing, Vol. 4, April
1995, pp.470-477.
[45] G. S. Yu, M. M. Liu and M. W. Marcellin, “POCS-based error concealment for
packet video using multiframe overlap information”, IEEE Trans. Circuits Syst.
Video Technol., Vol. 8, August 1998, pp. 422-434.
[46] A. Narula and J. S. Jim, “Error concealment techniques for an all-digital high-
definition television system”, in Proc. SPIE Conf. Visual Communication Image
Processing, Cambridge, MA, 1993, pp. 304-315.
[47] B. Girod and N.Harber, “Feedback-based error control for mobile video
transmission”, Proc. IEEE, Vol. 87, October 1999, pp. 1707-1723.
[48] J.Wen and J.D.Villasenor, “Reversible Variable length Codes for Efficient and
Robust Image and Video Coding”, Proceedings of the 1998 IEEE Data
Compression Conference, Snowbird, Utah, March 30 – April 1, 1998, pp471-480.
[49] W.-m.Lam, A.R.Reibman and B.Liu, “Recovery of lost or erroneously received
motion vectors”, Proc. ICASSP ’93, Minneapolis, April 1993, pp.V-417-420.
[50] M. Wada, “Selective recovery of video packet loss using error concealment”, IEEE
J. Select. Areas Commun., Vol. 7, June 1989, pp. 807-814.
[51] E. Steinbach, N. Farber and B. Girod, “Standard compatible extension of H.263 for
robust video transmission in mobile environments” IEEE Trans. Circuits Syst.
Video Technol., Vol. 7, December 1997, pp. 872-881.
[52] Y. Tomita, T. Kimura and T.Ichikawa, “Error resilient modified inter-frame coding
system for limited reference picture memories”, In Proc. Int. Picture Coding Symp.
(PCS), Berlin, Germany, Sept. 1997, pp. 743- 748.
78
[53] G.Coto, B.Erol, M.Gallant and F.Kossentini, “H.263+: Video Coding at Low Bit
Rates”,IEEE Transactions on Circuit and Systems for Video Technology, Vol.8,
No.7, Novermber 1998, pp.849-866.
[54] D. G. Hoffman, D. A. Leonard, C. C. Lindner, K. T. Phelps, C. A. Rodger and J. R.
Wall, “Coding Theory: the Essentials”, Marcel Dekker, Inc., 1991.
[55] R. Talluri, I. Moccagattaq, Y. Nag and G. Cheung, “Error concealment by data
partitioning”, Signal Processing: Image Communcation, Vol. 14, May 1999,
pp. 505-518.
[56] ITU-T H.263 “Video coding for low bit rate communication”, 1998.
[57] ISO/IEC 14496-2, “Information Technology – Coding of Audio-Visual Objects:
Visual”, 2001.
79
5 OVERVIEW OF ERROR CORRECTION TECHNIQUES
5.1 Introduction
There are basically two error control techniques at the data link layer of a
telecommunication network: forward error correction (FEC) and automatic repeat
request (ARQ). FEC employs error correction codes to correct errors detected at the
receiver while ARQ uses error detection and retransmissions to combat transmission
errors in two-way communication systems. Both FEC and ARQ have their advantages
and limitations. For a stable channel condition, FEC schemes maintain a constant
system throughput and have a low time delay, which is very important for real-time
applications. However, when the channel condition deteriorates, the performance of
FEC decreases dramatically. For a one-way transmission system, FEC is the only
choice. For a good channel condition, ARQ schemes are simple and are able to achieve
a high throughput with high reliability. However, when the channel error rates increase, the
system throughput decreases rapidly and long, variable delays, which are unacceptable,
are expected. In wireless environments it is seldom feasible to use pure FEC or ARQ
due to the unstable channel conditions. In most situations a combination of basic FEC
and ARQ schemes, which is often called hybrid ARQ, is used due to its capability in
combining the advantages of pure FEC and ARQ. However, for error control in the
application layer, FEC is the only choice. Because we are only concerned here with
error control in the application layer, only FEC techniques will be reviewed further.
Shannon published his pioneering work in 1948 [1], in which he showed that, as long as
the rate at which information is transmitted is less than the channel capacity, there exist
error control codes that can provide arbitrarily high levels of reliability at the receiver
output. Since then, a great deal of effort has been expended on the problem of devising
efficient encoding and decoding methods for error control in a noisy environment. The
output of this research on error control coding can be roughly classified into two
categories, namely block codes and convolutional codes.
5.2 Block Codes
In a block code [2,3], the information sequence is divided into message blocks of k
information bits, and each block is mapped independently into a block of n bits, called
a code word, with n > k. Corresponding to the 2^k different possible messages, there
are 2^n different possible words of length n, from which M = 2^k code words are
selected to form the code. The set of code words of length n is called an (n, k) block
code. Obviously, the encoder is memoryless, and the rate of this block code is R = k/n.
In practice, linear block codes are the most commonly used due to their easy synthesis
and implementation, and are constructed according to the definition given below.
Let the message m = (m_0, m_1, ..., m_{k-1}) be an arbitrary k-tuple from a Galois
field GF(q). The linear (n, k) code C over GF(q) is the set of q^k codewords of row-
vector form c = (c_0, c_1, ..., c_{n-1}), where c_j ∈ GF(q), which is defined by the
following linear transformation:

c = m · G

Here G, called the generator matrix, is a k × n matrix of rank k with elements from
GF(q).
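The mapping c = m · G can be illustrated with a small sketch (an illustration, not part of the thesis; the generator matrix shown is one systematic form of the (7, 4) Hamming code over GF(2), but any full-rank binary k × n matrix defines a linear code):

```python
import itertools

# Generator matrix in systematic form [I_4 | P] for a (7, 4) binary code.
G = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 0, 1, 1],
    [0, 0, 1, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 0, 1],
]

def encode(m, G):
    """Map a k-bit message m to an n-bit codeword c = m.G (arithmetic mod 2)."""
    n = len(G[0])
    return [sum(m[i] * G[i][j] for i in range(len(G))) % 2 for j in range(n)]

# Linearity: the sum (XOR) of any two codewords is again a codeword.
codewords = {tuple(encode(list(m), G)) for m in itertools.product([0, 1], repeat=4)}
c1, c2 = encode([1, 0, 1, 1], G), encode([0, 1, 1, 0], G)
c3 = [(a + b) % 2 for a, b in zip(c1, c2)]
assert tuple(c3) in codewords  # closure under addition, hence "linear"
```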
From the definition, it can be seen that any linear combination of two or more code
words is also a code word in a linear block code, hence their name. Among the linear
block codes, linear cyclic codes are most commonly used, including BCH codes and
Reed-Solomon codes, which are elaborated below.
5.2.1 Linear Cyclic Codes
Linear cyclic codes form a very important subclass of linear block codes. An
(n, k) linear code C is called a cyclic code if every codeword
c = (c_0, c_1, ..., c_{n-1}) ∈ C has its cyclic shift
(c_{n-1}, c_0, c_1, ..., c_{n-2}) ∈ C as well. The special algebraic and geometric
structure of cyclic codes ensures that their implementation is relatively easy. A number
of efficient encoding and decoding algorithms have been derived for cyclic codes by the
use of shift-register circuits. These algorithms make it possible to implement long
block codes with a large number of codewords in practical communication applications.
Almost all block codes employed in modern digital practice are either linear cyclic
codes or closely related to them.
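As an illustrative sketch (not from the thesis), the cyclic-shift property can be checked numerically for the (7, 4) cyclic code generated by g(x) = 1 + x + x^3, a divisor of x^7 + 1 over GF(2):

```python
import itertools

def poly_mul_mod2(a, b):
    """Multiply two GF(2) polynomials given as bit lists (index = power of x)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] ^= ai & bj
    return out

def poly_mod(a, m):
    """Reduce polynomial a modulo m over GF(2); result has deg < deg(m)."""
    a = a[:]
    for i in range(len(a) - 1, len(m) - 2, -1):
        if a[i]:
            shift = i - (len(m) - 1)
            for j, mj in enumerate(m):
                a[shift + j] ^= mj
    return a[:len(m) - 1]

g = [1, 1, 0, 1]                      # g(x) = 1 + x + x^3
x7_plus_1 = [1, 0, 0, 0, 0, 0, 0, 1]  # x^7 + 1

# Codewords are c(x) = m(x) g(x) mod (x^7 + 1) for all 16 messages m(x).
code = {tuple(poly_mod(poly_mul_mod2(list(m), g), x7_plus_1))
        for m in itertools.product([0, 1], repeat=4)}

# Cyclic property: every cyclic shift of a codeword is again a codeword.
for c in code:
    assert (c[-1],) + c[:-1] in code
```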
BCH codes form a large class of powerful random error-correcting cyclic codes,
discovered by Bose and Ray-Chaudhuri in 1960 [4,5] and independently by
Hocquenghem in 1959 [6]. This class of codes is a remarkable generalization of the
Hamming codes for multiple-error correction. BCH codes provide a wide variety of
block lengths and corresponding code rates. They are important not only because of
their flexibility in the choice of their code parameters, but also because, at block lengths
of a few hundred or less, many of these codes are among the most used codes of the
same lengths and code rates. Another advantage is that they are capable of correcting
all random patterns of t errors by a decoding algorithm that is both simple and easily
realized in a reasonable amount of equipment. Among the non-binary BCH codes, the
most important subclass is the class of Reed-Solomon codes [7]. Reed-Solomon codes
have particularly good distance properties and burst error correction capabilities since
bursts of errors cause only a few symbol errors in a Reed-Solomon code, which can be
easily corrected. Reed-Solomon codes also can be concatenated with a binary code to
provide higher levels of error protection.
5.3 Convolutional codes
Convolutional codes [8,9] differ from block codes in that the encoder contains memory
and the n encoder outputs at any given time unit depend not only on the k inputs at that
time unit but also on the m previous input blocks.
5.3.1 Convolutional Encoding
Fig.5.1 shows the structure of a typical convolutional encoder. Convolutional codes are
usually described using two parameters: the code rate and the constraint length. The
code rate, k/n, is expressed as a ratio of the number of bits fed into the convolutional
encoder (k) to the number of channel symbols output by the convolutional encoder (n)
in a given encoder cycle. The constraint length parameter, K, denotes the “length” of
the convolutional encoder, i.e. how many k-bit stages are available to feed the
combinatorial logic that produces the output symbols.
Figure 5.1 Convolutional Encoder
The input data to the encoder, which is assumed to be binary, is shifted into and along
the shift register k bits at a time. The n-bit output sequence for each k-bit input is
generated by the n linear algebraic function generators. Closely related to K is the
parameter, m, which indicates how many encoder cycles an input bit is retained and
used for encoding after it first appears at the input to the convolutional encoder.
The parameter m can be thought of as the memory length of the encoder. In practice,
codes with k = 1 and n = 2 are most often used; in these cases m = K - 1. Increasing K
or m usually improves the performance of convolutional codes.
Unlike a block code, which has a fixed length n, a convolutional encoder is basically a
finite-state machine whose state is determined by its memory elements. It is this
state which determines the mapping between the next set of input and output bits.
As with most finite-state machines, the convolutional encoder can only move between
states in a limited manner, which can be represented by a state-transition diagram. The
state diagram of the (7,5) convolutional code with K = 3, k = 1 and n = 2 is shown in
Fig.5.2. The octal numbers 7 and 5 represent the code generator polynomials, which
when read in binary (111, 101) correspond to the shift register connections to the
modulo-two adders.
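A minimal sketch of this (7,5) encoder (an illustration with our own bit ordering and helper names, not code from the thesis):

```python
# Rate-1/2, K = 3 convolutional encoder with generator polynomials 7 and 5
# (octal), i.e. taps 111 and 101 on the K-stage shift register.
G1, G2 = 0b111, 0b101

def conv_encode(bits):
    """Encode a bit list; two output symbols per input bit; state = m = 2 bits."""
    state = 0                        # the two previous input bits
    out = []
    for b in bits:
        reg = (b << 2) | state       # current bit plus the two stored bits
        out.append(bin(reg & G1).count("1") % 2)  # modulo-two adder, taps 111
        out.append(bin(reg & G2).count("1") % 2)  # modulo-two adder, taps 101
        state = reg >> 1             # shift: the newest bit enters the memory
    return out

# Input 1 0 1 1 from the all-zero state yields the output pairs 11 10 00 01.
assert conv_encode([1, 0, 1, 1]) == [1, 1, 1, 0, 0, 0, 0, 1]
```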
Figure 5.2 State Diagram of a 4-state convolutional encoder
In the state-transition diagram, nodes represent states and branches represent transitions.
Each branch in the state diagram has a label of the form XX/Y, where XX is the output
pair corresponding to the input bit Y. When the evolution of the state transitions is
depicted over time, the trellis diagram of Fig. 5.3 is obtained. In the diagram, the
four states (00, 01, 10, 11) are shown at the left-hand side, and the two digit numbers
represent the output as the encoder transitions from one state to another state. A solid
line in the diagram represents a ‘zero’ input bit and a dashed line represents a ‘one’
input bit.
Figure 5.3 Trellis diagram of a 4-state convolutional encoder
5.3.2 Viterbi Decoding
In the decoding of a block code for a memoryless channel, the distances between the
received code word and the 2^k possible transmitted code words are computed. Then the
code word that is closest in distance to the received code word is selected. This
decision rule, which requires the computation of 2^k metrics, is optimum in the sense that
it results in a minimum probability of error for the binary symmetric channel and the
additive white Gaussian noise channel.
Different from block code decoding, the optimum decoding of a convolutional code
involves a search through the trellis for the most probable sequence. Depending on
whether the detector following the demodulator performs hard or soft decisions, the
corresponding metric in the trellis search may be either a Hamming metric or a
Euclidean metric respectively. A metric is defined for the jth branch of the ith path
through the trellis as the logarithm of the joint probability of the sequence conditioned
on the transmitted sequence for the ith path. That is,
μ_j^(i) = log P(Y_j | C_j^(i)),   j = 1, 2, 3, ...

Furthermore, a metric PM^(i) for the ith path consisting of B branches through the trellis
is defined as

PM^(i) = Σ_{j=1}^{B} μ_j^(i)
The criterion for deciding between two paths through the trellis is to select the one
having the larger metric. This rule maximizes the probability of a correct decision or,
equivalently, it minimizes the probability of error for the sequence of information bits.
Based on this criterion Viterbi introduced a decoding algorithm [10,11] for
convolutional codes in 1967.
In the Viterbi algorithm it is assumed that the code begins and ends at the all-zero state.
For an (n, k, m) convolutional code, the input information sequence of kL bits is padded
with km all-zero bits, called tail bits, which flush the encoder memory so that the last
information bits exert their influence on the last output symbols of the convolutional
encoder. The received code word contains n(L + m) bits. With this assumption the
algorithm can be summarized as follows.
1. Draw a trellis of L + m stages. For the last m stages of the trellis, draw only
paths corresponding to the all-zero input bits.
2. Initialisation: Set l = 1 and the metric of the initial all-zero state equal to 0.
3. Recursion: Find the distance of the lth block of n bits in the received sequence to all
branches connecting the states at the lth stage to the states at the (l + 1)th stage
of the trellis.
4. Add these distances to the metrics of the states at the lth stage to obtain the
metric candidates for the states at the (l + 1)th stage. For each state at the
(l+1)th stage, there are 2k paths entering the state and thus there are 2k metric
candidates. For each state at the (l+1)th stage, find the minimum of the metric
candidates and label the corresponding branch as the survivor. Store the
survivor path and assign its metric as the metric of the state at the (l+1)th stage
and eliminate all other paths.
5. If l = L + m, go to the next step; otherwise increase l by 1 and go to step 3.
6. For the all-zero state at the last (L + m)th stage, the survivor path is the optimum
path and the input sequence associated with this path is the maximum likelihood
decoded information sequence. Remove the last km bits from the estimated k(L
+ m)-bit sequence and thus obtain the estimated kL information bits.
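The steps above can be sketched as a hard-decision decoder for the (7,5) code of Fig. 5.2 (a simplified illustration, not the thesis implementation; survivor bookkeeping uses dictionaries rather than fixed arrays):

```python
G1, G2 = 0b111, 0b101  # generator polynomials of the (7,5) code

def step(state, b):
    """One encoder transition: returns (next_state, [two output bits])."""
    reg = (b << 2) | state
    out = [bin(reg & G1).count("1") % 2, bin(reg & G2).count("1") % 2]
    return reg >> 1, out

def viterbi_decode(received):
    """Decode 2(L + m) hard bits; assumes the encoder was flushed to state 0."""
    INF = float("inf")
    metrics = {0: 0}                 # step 2: start in the all-zero state
    paths = {0: []}
    for l in range(0, len(received), 2):
        r = received[l:l + 2]
        new_metrics, new_paths = {}, {}
        for state, metric in metrics.items():
            for b in (0, 1):         # steps 3-4: extend and add branch distances
                nxt, out = step(state, b)
                d = metric + sum(x != y for x, y in zip(out, r))
                if d < new_metrics.get(nxt, INF):   # keep only the survivor
                    new_metrics[nxt] = d
                    new_paths[nxt] = paths[state] + [b]
        metrics, paths = new_metrics, new_paths
    return paths[0]                  # step 6: the survivor ending in state 0

# Encode 1 0 1 1, append m = 2 tail zeros, flip one channel bit, then decode.
msg = [1, 0, 1, 1]
coded, s = [], 0
for b in msg + [0, 0]:
    s, out = step(s, b)
    coded += out
coded[3] ^= 1                        # a single channel error
assert viterbi_decode(coded) == msg + [0, 0]   # error corrected
```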
The complexity of the Viterbi algorithm is proportional to the number of states and
paths in the trellis diagram. The complexity of the algorithm increases with the memory
length m and the input block length k. In addition, the decoding delay and the amount
of memory required for storage become unacceptable when decoding a long
information sequence. A solution to this problem is the path memory truncation
approach, where the decoder at each stage only searches δ stages back in the trellis
instead of to the start of the trellis. The parameter δ is called the trellis depth. Simulations
have shown that when δ ≥ 5(m + 1), the performance degradation caused by this
suboptimal decoding is negligible.
5.3.3 Performance of Convolutional codes
5.3.3.1 Performance of Hard-decision Viterbi decoding algorithm
Unlike block codes, it is difficult to give a closed-form expression for the performance
of convolutional codes; usually it is given by bounds. Let S be the set of all paths that
diverge from the all-zero path at a fixed time instant t, say t = 0, and remerge into the
all-zero path exactly once at some later time. The performance analysis of
convolutional codes is based on the first-event error probability, denoted P_ev, which
is defined as the probability that any path in the set S accumulates a higher metric than
the all-zero path, given correct decoding up to t = 0 and assuming the all-zero path to
be correct, without loss of essential generality. The more useful measure is the bit error
probability, denoted P_b, which is defined as the expected number of bit errors in a
given sequence of received bits normalized by the total number of bits in the sequence.
It can be shown that for memoryless channels these probabilities are bounded by [13]

P_ev ≤ Σ_{d=d_free}^{∞} a_d P_d

P_b ≤ (1/k) Σ_{d=d_free}^{∞} b_d P_d
In the expressions, d_free is the minimum free distance of the code, a_d is the number of
paths in S of Hamming weight d, and b_d is the total number of nonzero information bits
in all paths of Hamming weight d in S. As for P_d, it is the pair-wise probability that a
path in S of Hamming weight d is chosen instead of the correct path. The parameters a_d
and b_d depend only on the code parameters and are commonly calculated from the
code's transfer function, while P_d is channel-dependent. For an additive white Gaussian
noise (AWGN) channel,

P_d = Q(√(2E_b/N_0))

where

Q(x) = (1/√(2π)) ∫_x^∞ exp(-z²/2) dz

and E_b/N_0 is the energy-per-bit to noise-power-density ratio.
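For numerical work, the Gaussian tail function Q(x) is conveniently evaluated through the complementary error function, Q(x) = 0.5 erfc(x/√2); a small sketch (the E_b/N_0 value below is illustrative, not from the thesis):

```python
import math

def Q(x):
    """Standard normal tail probability: Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

# Evaluate the pairwise error probability at, e.g., E_b/N_0 = 4 dB.
ebno = 10 ** (4 / 10)
p_d = Q(math.sqrt(2 * ebno))
assert 0.0 < p_d < 0.5   # a proper tail probability, below the coin-flip rate
```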
5.3.3.2 Performance of Soft-decision Viterbi decoding algorithm
If Euclidean metric is used in the Viterbi decoding algorithm instead of the Hamming
metric, the Viterbi decoding becomes soft-decision Viterbi decoding. In many practical
applications, one wishes to use digital rather than analog circuits to implement the
Viterbi decoder. This means that the signal must be processed though an analog-to-
digital converter. If the received video data is quantized to one-bit precision before
being sent to the Viterbi decoder, the result is conventional hard decision data. If the
received symbols are quantized with more than one bit of precision, the result becomes
soft decision. A Viterbi decoder with soft decision data inputs quantized to three or
four bits of precision can perform about 2 dB better than one working with hard-
decision inputs in terms of coding gain [14]. The selection of the quantizing levels is an
important design decision because it can have a significant effect on the reconstructed
video quality. It has been widely observed that using five or six bits in the
analog-to-digital converter usually gives performance results extremely close to those
of an analog soft-decision decoder. An
analysis of the effect of quantization can be found in [15].
In general, for coherent BPSK signals with AWGN channels and unquantized received
signals, it can be shown that P_d should be replaced by the following equation, while
all other equations from the previous section still hold:

P_d = Q(√(2d(k/n)E_b/N_0))
5.3.3.3 Advantages of soft-decision over hard-decision decoding
To understand the advantage of soft-decision decoding over hard-decision decoding, we
need to understand the inherent drawback of hard-decision decoding.
Let x be the true transmitted sequence, y its adversary sequence and z the observed
received sequence. Also suppose the Hamming distance between x and y is d, i.e.

d_H(x, y) = d

If x really is the original transmitted code word, the received vector z must be the sum
of x plus some error vector e_x. Under hard-decision decoding, e_x is a binary vector,
i.e.

z = x + e_x

In a similar fashion, if the actual transmitted code word was y, then

z = y + e_y

If

z = y + e_y = x + e_x

the Viterbi decoding will select y instead of x and the error event of distance d will
occur. If the Hamming weight of e_x (the number of 1s in the vector e_x) is w_H(e_x),
clearly the error event will occur only when

w_H(e_x) > d/2

If d is an odd number, the probability of the trellis error is

Pr(E|d) = Σ_{j=(d+1)/2}^{d} C(d, j) p^j (1 - p)^{d-j},   d odd
When d is an even number, there is a slight complication, since it is possible that
w_H(e_x) = d/2. In this case, we would have a tie between the adversary paths. A tie
implies that we have no statistically valid way to pick one sequence over the other. The
Viterbi decoder must, however, pick one or the other of the two paths. Since this is a
pure guess, the decoder has, at best, only a 50% chance of picking the correct path.
Therefore, if d is an even number,

Pr(E|d) = (1/2) C(d, d/2) p^{d/2} (1 - p)^{d/2} + Σ_{j=d/2+1}^{d} C(d, j) p^j (1 - p)^{d-j},   d even
When the Euclidean distance is used in Viterbi decoding instead of the Hamming
distance, the probability of having two real-valued squared Euclidean distances that are
exactly equal is zero for all practical purposes. This eliminates ties, which improves
the error rate of the Viterbi decoder.

However, the error rate of the decoder improves by much more than is accounted for
simply by eliminating ties. When the Euclidean distance is used in the Viterbi
decoding, the received sequence becomes

z = x + η

where η is a sample of an AWGN process. If η is large enough to
cause d_E(z, y) < d_E(z, x), an error will occur, where d_E(z, x) is the Euclidean
distance between the received sequence and the transmitted sequence and d_E(z, y) is
the Euclidean distance between the received sequence and the adversary sequence.
Since the η_i are statistically independent zero-mean Gaussian random variables, it can
be shown that the probability of this event is

P(E|d) = ∫_{d/2}^{∞} (1/(σ√(2π))) exp(-u²/(2σ²)) du = Q(d/(2σ))

where σ² is the variance of the zero-mean Gaussian random process and

d ≡ d_E(x, y) = (Σ_{i=0}^{L-1} (x_i - y_i)²)^{1/2}

where L is the length of the sequence.
If the implications of these formulas are not easy to see, concrete examples in [14]
show that soft-decision Viterbi decoding can be more than two orders of magnitude
better than hard-decision Viterbi decoding.
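The two hard-decision expressions for Pr(E|d) can be evaluated numerically; the sketch below (the function name is ours) implements both the odd-d and even-d cases for a binary symmetric channel with crossover probability p:

```python
import math

def pr_error_given_d(d, p):
    """Hard-decision pairwise error probability for a path at Hamming distance d."""
    def term(j):
        return math.comb(d, j) * p**j * (1 - p)**(d - j)
    if d % 2 == 1:
        # odd d: more than half the d positions must be in error
        return sum(term(j) for j in range((d + 1) // 2, d + 1))
    # even d: half of the tie cases at j = d/2, plus all cases with j > d/2
    return 0.5 * term(d // 2) + sum(term(j) for j in range(d // 2 + 1, d + 1))

# At p = 1/2 the decoder is reduced to pure guessing, for any d.
assert abs(pr_error_given_d(5, 0.5) - 0.5) < 1e-9
# For small p, a larger distance d gives a much smaller error probability.
assert pr_error_given_d(5, 0.01) < pr_error_given_d(3, 0.01)
```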
5.3.4 Punctured Convolutional code
Figure 5.4 Basic procedure of punctured coding from rate ½ convolutional code
A punctured convolutional code [12,13] is a high-rate code obtained by the periodic
elimination of specific code symbols from the output of a low-rate encoder. Fig.5.4
shows the basic procedure for a rate ½ code. Specific m bits among l blocks (2l bits) of
the original code sequence are periodically deleted according to a map which
indicates the positions of the deleted bits. When m is chosen to be l - 1, a punctured
code of rate (n - 1)/n, where n = l + 1, is obtained.
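A sketch of the puncturing and dummy-insertion (depuncturing) steps, using a hypothetical period-4 map that deletes one bit in four, turning a rate-½ stream into a rate-2/3 stream:

```python
# 1: transmit, 0: delete.  The period covers 2 input bits (4 coded bits);
# deleting one leaves 3 channel bits per 2 input bits, i.e. rate 2/3.
PUNCTURE_MAP = [1, 1, 1, 0]

def puncture(coded):
    """Drop the coded bits at the positions marked 0 in the map."""
    return [b for i, b in enumerate(coded)
            if PUNCTURE_MAP[i % len(PUNCTURE_MAP)]]

def depuncture(punctured):
    """Re-insert erasures (None) so the decoder can inhibit their metrics."""
    out, it, i = [], iter(punctured), 0
    while True:
        if PUNCTURE_MAP[i % len(PUNCTURE_MAP)]:
            try:
                out.append(next(it))
            except StopIteration:
                return out
        else:
            out.append(None)   # dummy symbol: metric contribution inhibited
        i += 1

coded = [1, 1, 1, 0, 0, 0, 0, 1]   # 8 coded bits (4 input bits at rate 1/2)
sent = puncture(coded)             # 6 bits on the channel: rate 4/6 = 2/3
assert len(sent) == 6
assert depuncture(sent) == [1, 1, 1, None, 0, 0, 0, None]
```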
For punctured high-rate convolutional codes, Viterbi decoding is hardly more complex
than for the original code from which the punctured codes are derived. The decoding is
performed on the trellis of the original code where the only modification consists of
discarding the metric increments corresponding to the punctured code symbols.
Given the perforation pattern of the code, this can be readily performed by inserting
dummy data into the positions corresponding to the deleted code symbols. In the
decoding process this dummy data is discarded by assigning the same metric value
regardless of the code symbol, 0 or 1. This procedure in effect inhibits the
convolutional metric calculation for the punctured symbols. In addition to the metric
inhibition, the only coding rate dependent modification in a variable-rate codec is the
truncation path length, or the trellis depth, which must be increased with the coding rate.
All other operations of the decoder remain essentially unchanged.
It is not difficult to see that the performance of punctured convolutional codes is
degraded compared with the original codes; however, the degradation is rather gentle as
the coding rate increases from ½ to 7/8 or even to 15/16.
From previous works [12,13] the following can be summarized:
1. For punctured codes of the same rate, the coding gain increases by 0.2 - 0.5 dB with
each increase of the constraint length K by 1.
2. Although the coding gain of punctured codes decreases as the coding rate becomes
higher, the coding gain is still high even for high-rate punctured codes. For
example, the rate 13/14 code provides a coding gain of more than 3 dB for K ≥ 7.
These properties of punctured convolutional codes make them an attractive option for
efficient implementation.
References
[1] C. E. Shannon, “A Mathematical Theory of Communication”, Bell System
Technical Journal, vol. 27, pp. 379-423 and pp. 623-656, July and October, 1948.
[2] J.H. van Lint, “Introduction to Coding Theory”, Springer-Verlag, 1982.
[3] George C. Clark, Jr. and J. Bibb Cain, “Error-Correction Coding for Digital
Communications”, Plenum Press, 1981.
[4] R. C. Bose and D. K. Ray-Chaudhuri, “On a class of error correcting binary group
codes”, Information and Control, 3(1), March 1960, pp.68-79.
[5] R. C. Bose and D. K. Ray-Chaudhuri, “Further results on error correcting binary
group code”, Information and Control, 3(3), September 1960, pp. 279-290.
[6] A. Hocquenghem, “Codes correcteurs d’erreurs”, Chiffres, 2, 1959.
92
[7] I. S. Reed and G. Solomon, “Polynomial codes over certain finite fields”, J. Soc.
Indust. Appl. Math, Vol. 8, 1960, pp. 300-304.
[8] A. J. Viterbi, “Convolutional Codes and Their Performance in Communication
Systems”, IEEE Transactions on Communications Technology, Vol. COM-19, No.
5, October 1971, pp. 751-772.
[9] J. G. Proakis, “Digital Communications”, McGraw-Hill, 1995.
[10] A. J. Viterbi, “Error bounds for convolutional codes and an asymptotically
optimum decoding algorithm”, IEEE Trans. Inform. Theory, Vol. IT-13, No. 2,
April 1967, pp. 260 – 269.
[11] A. J. Viterbi and J. K. Omura, “Principles of Digital Communication and
Coding”, McGraw-Hill Book Company, 1979.
[12] Y. Yasuda, K. Kashiki and Y. Hirata, “High-Rate Punctured Convolutional
Codes for Soft Decision Viterbi Decoding”, IEEE Transactions on Communications,
Vol. Com-32, No. 3, March 1984, pp. 315- 319.
[13] D. Haccoun and G. Begin, “High-Rate Punctured Convolutional Codes for
Viterbi and Sequential Decoding”, IEEE Transactions on Communications, Vol. 37,
No. 11, November 1989, pp. 1113-1125.
[14] R. B. Wells, “Applied Coding and Information Theory for Engineers”, Prentice
Hall, 1999.
[15] R. Wells and G. Bartles, “Simplified calculation of likelihood metrics for Viterbi
decoding in partial response systems”, IEEE Trans. Magnetics, vol. 32, no. 5, Pt. III,
Sept. 1996.
6 SECOND ERROR CONTROL AND ECC VIDEO
6.1 Introduction
As described in Chapter 4, diverse error resilience techniques have been introduced and
some of them have been incorporated into MPEG-4 [1] or H.263 [2] video coding
standards to address the need for error resilient video transmission. The key technique
among the error resilience tools in the MPEG-4 standard is
resynchronization/packetization. With this technique, a compressed video bitstream is
packetized by inserting resynchronisation markers in the bitstream, which let the
decoder regain synchronization after an error occurs by looking for the next
resynchronisation point, thereby limiting the error effects to the packet where
the error occurs.
It needs to be emphasized that although resynchronization is often referred to as
packetization in the literature, the packetization process for error resilience at the
application layer is different from the packetization process for channel coding at the
data link layer. The packetization operation for error resilience simply means that
resynchronization markers are inserted periodically into a video bitstream. The packet
size (for error resilience) usually means the number of bits between two
resynchronization markers, and the packet size may vary slightly from packet to packet,
as the start and the end of a packet need to be aligned with the start and the end of a
macroblock. In this thesis both resynchronization and packetization are used
interchangeably to refer to the resynchronization operation for error resilience at the
application layer.
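As a purely illustrative sketch (the marker pattern, packet size and macroblock codes below are hypothetical, not taken from the MPEG-4 syntax), the packetization just described can be written as:

```python
RESYNC_MARKER = "0" * 16 + "1"   # hypothetical unique start code

def packetize(macroblock_bits, target=600):
    """Group whole macroblock codes into packets of roughly `target` bits,
    each prefixed by a resynchronisation marker; boundaries always fall on
    macroblock boundaries, so packet sizes vary slightly from packet to packet.
    (A macroblock longer than `target` simply gets its own oversized packet.)"""
    packets, current = [], ""
    for mb in macroblock_bits:       # each mb is one macroblock's bitstring
        if current and len(current) + len(mb) > target:
            packets.append(RESYNC_MARKER + current)
            current = ""
        current += mb
    if current:
        packets.append(RESYNC_MARKER + current)
    return packets

# Fake variable-length macroblock codes of 150, 240, 240 and 180 bits.
mbs = ["101" * 50, "0110" * 60, "111" * 80, "00" * 90]
pkts = packetize(mbs, target=300)
assert all(p.startswith(RESYNC_MARKER) for p in pkts)
```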
With a packetization approach combined with RVLC (reversible variable length code)
and data partitioning, a decoder is also able to partially recover some data within the
packet containing errors, which would otherwise be totally discarded. Obviously, there
are several disadvantages with these error resilience tools.
Firstly, while these techniques bring error resilience, they also introduce
vulnerability. If an error happens to fall within a marker (a resynchronization
marker, DC marker or motion marker), the decoder will lose synchronization with the
encoder; the packet containing the error, or even several packets, will have to be
discarded.
Secondly, there is an associated loss of coding efficiency with these schemes. From our
simulation results (to be discussed later) it is clear that with CIF format video sequences
Salesman and Akiyo, when the packet size is set to 600 bits, an increase in bit rate of
more than 9.9% occurs when resynchronization, Data Partitioning and RVLC are
employed. It may be argued that employing only a packetization approach, without
combining Data Partitioning and RVLC, can reduce the increase in the bit rate, but it is
really necessary to combine RVLC and Data Partitioning with the packetization scheme
to fully exploit its potential. Another way to reduce the bit
budget for overhead due to packetization is to increase the size of the packet, but this
will bring a longer decoding delay, as the packetization approach introduces a decoding
delay of one packet (or slice). Increasing the packet size will also reduce the
effectiveness of the packetization scheme. It should be noted that RVLC reduces the
coding efficiency too.
Thirdly, these techniques are passive in the sense that they do not have the capability to
recover a bitstream from errors actively and completely by correcting the error bits in
the bitstream. Instead, the packets containing any errors are simply discarded, though
some information in the packets in error can be partially recovered through the
employment of RVLC and Data Partitioning. The loss of information caused by
discarding the packets is unrecoverable; with the inter-frame error propagation effects,
the reconstructed video output rapidly degrades to an unrecognizable level if no other
measure is taken. Also, the partial recovery of the corrupted data through the
employment of Data Partitioning and RVLC is only possible when the corresponding
motion vectors are available. If the packet header or the motion information, which is
located in the first part of the packet, is corrupted, then the use of RVLC becomes
meaningless, as the data recovered by RVLC is only the difference between the blocks
in the current frame and the corresponding blocks in the previous frame located by the
motion vectors. The original texture coefficients in the block cannot be recovered
without the motion information. While Data Partitioning makes error concealment
easier to realize when motion information is available, even the best error concealment
techniques only reduce the influence of errors to a certain degree. In fact, nearly all the
error resilience tools currently available, either inside or outside the video coding
standards, are passive in the sense that they do not have the capability to correct the
errors in the final video bitstream before video decoding.
Lastly, AIR [1] increases the bit rate of an encoded bitstream significantly, while
NEWPRED [1] needs upstream messaging from the decoder, which may not be
practical in some situations, especially in a multi-party video communication system.
Generally, reducing packet size can increase the robustness of an encoded video
bitstream at the cost of decreased coding efficiency. However, for the reasons stated
above, there is a limit on how much robustness can be gained by reducing the packet
size. In some extreme channel conditions, acceptable quality in video communication
becomes impossible by simply employing the error resilience tools in MPEG-4. One
extreme example: when the packet is so small that one packet contains only one
macroblock, the bitstream becomes more vulnerable than one not using the
packetization scheme at all, as the markers occupy a larger portion of the bitstream.
Obviously other tools are needed.
6.2 Second Error Control
Taking a further look inside the currently available error resilient video coding tools,
it can be seen that their most fundamental disadvantage is that they passively accept
the residual errors delivered to the application layer by the transmission system of the
network. As stated in Chapter 1, it is unavoidable that some residual errors will be
delivered to the video decoder by the transmission network. But one question can be
asked: do we have to accept these error bits in the application layer for a real-time
application? If the answer is yes, then the currently available error resilience
techniques are the only choices, which means we will have to accept the poor quality
of real-time video transmission associated with these techniques. If the answer is no,
a mechanism is needed to correct these errors, and a form of error control in the
application layer is necessary.
To apply error control in the application layer after the first error control has taken
place in the data link layer seems unrealistic because of the huge overhead a usual
error control scheme would cause. Employing ARQ (Automatic Repeat reQuest) at
the application layer as SEC (Second Error Control) is clearly not realistic, since the
first error control at the data link layer has probably used up all the time allowed for
retransmission with ARQ. Directly applying a usual FEC (Forward Error Correction)
approach of the kind commonly employed in the data link layer is equally unrealistic
and cannot be justified because of its huge overhead. Now another question can be
asked: is there an effective error correction code with extremely high coding
efficiency? Given the increase of around 9.9% in the final bitstream for error
resilience overhead, is it possible to do better than the resynchronization approach in
MPEG-4? Recalling the capability a punctured convolutional code can provide, the
answer may be yes. If another ECC (Error Correction Coding) layer is applied at the
application layer to correct residual errors, that means we are using a second error
control for real-time applications. Does this work, and is punctured convolutional
coding efficient enough? These questions are addressed in the sections below.
6.3 ECC video – the SEC approach
In an ECC scheme, a compressed video bitstream is not packetized using
resynchronization markers; instead it is protected with an error correction code, i.e.
the compressed video bitstream is further encoded using the error correction code.
The basic requirements for the error correction code are high coding efficiency and
strong error correction capability. In this work the error correction code is realized
with a punctured convolutional code [3,4,7]. There are three reasons for choosing a
convolutional code. Firstly, convolutional coding is well suited to mobile channels;
enhanced with interleaving, a punctured convolutional code is also very good at
coping with both bursty errors and packet loss in addition to correcting random
errors. Secondly, with a punctured convolutional code [5,6] it is easy to adapt the
rate of the error correction code to match the residual error conditions. Thirdly, when
punctured, some convolutional codes can achieve very high coding efficiency while
still retaining very good error correction capability. After each video frame is
compressed, the compressed bitstream is further encoded using a punctured
convolutional code. The picture start code serves as the synchronization point, so the
decoder is able to locate the portion of the bitstream belonging to each frame.
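The per-frame encoding step can be sketched as follows. This is a simplified illustration, not the thesis implementation: the layout of the flat 26-entry puncturing pattern (given in Section 6.4.1) as a 2×13 matrix is an assumption about how it applies to the two generator outputs, and tail-bit flushing at the end of each frame is omitted:

```python
G = (0o561, 0o753)   # generator polynomials (octal) of the rate-1/2 base code
K = 9                # constraint length

# 2x13 puncturing matrix for rate 13/14: row 0 applies to the first
# generator's output, row 1 to the second (an assumed layout of the flat
# 26-entry pattern listed in the experimental conditions).
P = [[1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1],
     [1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0]]

def encode_punctured(bits):
    """Rate-1/2 convolutional encoding followed by 13/14 puncturing."""
    reg = 0
    out = []
    for t, b in enumerate(bits):
        # Shift the new input bit into the K-bit register.
        reg = ((reg << 1) | b) & ((1 << K) - 1)
        for row, g in enumerate(G):
            if P[row][t % 13]:                     # keep this coded bit?
                out.append(bin(reg & g).count("1") & 1)   # parity tap
    return out

coded = encode_punctured([1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0])
print(len(coded))   # 13 input bits -> 14 coded bits, i.e. rate 13/14
```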
Before the video decoder starts video decoding, it first decodes the punctured
convolutional code using the Viterbi decoding algorithm. Data Partitioning and RVLC
can still be employed in an ECC scheme because of their error resilience and
concealment capability. If Data Partitioning and RVLC are used without employing
packetization, the whole frame can be considered as containing only one packet. The
main difference from the conventional MPEG-4 approach is that the markers,
including DC markers and motion markers, in the ECC video bitstream are protected
by the convolutional code as well, while the markers in a packetized video bitstream
are exposed to errors. Also, in an ECC video bitstream each frame contains only one
motion marker or DC marker, while in a packetized video bitstream each frame can
contain multiple resynchronization markers, DC markers or motion markers,
depending on the packet size.
Figure 6.1 Video Communication System with ECC
(block diagram: Source → Source Encoder → ECC Encoder → Channel Encoder →
Channel → Channel Decoder → ECC Decoder → Source Decoder → Display)
A video communication system employing ECC is shown in Figure 6.1. It needs to be
emphasized that, although some punctured convolutional codes have been widely used
as channel coding schemes in the data link layer, a comparison with Fig. 1 makes it
clear that ECC here is not a form of channel coding; instead it is part of source coding
for error resilience purposes. More precisely, it is a SEC approach in addition to the
first error control (conventional error control) in the data link layer. The operation of
ECC on a compressed video bitstream for error resilience differs from the ordinary
FEC technique commonly employed in the data link layer of the network, though in
principle they play a similar role in correcting errors in a bitstream. First, FEC is
usually employed as a channel coding mechanism to improve the capacity of the
channel, and is often combined with ARQ (automatic repeat request). From the point
of view of the layered structure of a telecommunication network, FEC usually resides
in the second (data link) layer, while ECC applied to an encoded video bitstream is
part of the application layer and is therefore treated as source data by FEC. Second,
the design and choice of FEC usually depend on the channel conditions and the
associated ARQ mechanism, while the design and choice of ECC depend on the
capability of the network to combat the errors in the telecommunication channels.
Third, FEC works on the original errors introduced by the unfavorable channel
conditions, while ECC works on the residual errors left in the source data by the
network. In other words, FEC belongs to first error control while ECC belongs to SEC.
In some telecommunication networks (for example, some networks based on UDP/IP
protocols), packets containing errors are simply discarded after they have passed
through the data link layer at the receiving side. If such networks are to be employed
for video communications with bitstreams protected by ECC, their protocols must be
modified so that packets are delivered to the application layer even when they still
contain errors. The same requirement applies when the conventional error resilience
tools in the MPEG-4 standard are employed.
6.4 Simulation Results
To evaluate the effectiveness of the proposed ECC scheme, two widely used video
sequences, Akiyo with relatively slow motion and Salesman with fast movement, are
chosen as the test sequences. The goal is to compare the PSNR (Peak Signal-to-Noise
Ratio) of the video sequences reconstructed from the bitstreams protected with ECC
and from the bitstreams protected with packetization.
Following convention, in this thesis the PSNR is defined as [12]

PSNR = 10 \log_{10} \left( \frac{255^2}{\frac{1}{N}\sum_i \sum_j \left( Y_{ref}(i,j) - Y_{prc}(i,j) \right)^2} \right)
where Y_{ref}(i,j) and Y_{prc}(i,j) are the pixel values of the reference and processed
images respectively, N is the total number of pixels in the image, and i, j are the pixel
indices in the image. In this equation the peak signal with 8-bit resolution is 255, and
the noise is the square of the pixel-to-pixel difference (error) between the reference
image and the image under study. Though it has been claimed that in some cases the
accuracy of PSNR is doubtful because colour information is not taken into
consideration, its relative simplicity makes it a very popular choice. If accuracy is a
main concern, perceptual error models more sophisticated than simple pixel
differences might be used [13].
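The definition above translates directly into code; a minimal sketch for 8-bit greyscale images represented as nested lists:

```python
import math

def psnr(ref, prc):
    """PSNR in dB between two 8-bit greyscale images (nested lists of
    pixel values), following the definition used in the text."""
    n = 0
    sse = 0.0
    for row_ref, row_prc in zip(ref, prc):
        for y_ref, y_prc in zip(row_ref, row_prc):
            sse += (y_ref - y_prc) ** 2   # squared pixel-to-pixel error
            n += 1
    mse = sse / n                          # mean squared error
    return 10 * math.log10(255 ** 2 / mse)

# One pixel off by 5 in a 2x2 image -> MSE = 25/4 = 6.25.
ref = [[255, 255], [255, 255]]
prc = [[250, 255], [255, 255]]
print(round(psnr(ref, prc), 2))   # roughly 40 dB
```

(Identical images give zero MSE and thus an infinite PSNR; a real implementation would guard against that case.)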
6.4.1 Experimental conditions
The tests are conducted based on the following conditions:
1. 50 frames of each video sequence are encoded with the first frame coded as I frame
followed by all P frames without rate control.
2. Packet size of both video sequences is set to 600 bits when the packetization scheme
is used.
3. When the ECC scheme is employed, the rate-1/2 base convolutional code (561, 753)
is chosen, which has a constraint length of K = 9. This base code is punctured to
rate 13/14, which means that every 13 information bits produce 14 bits in the
encoded bitstream, i.e. only one redundant bit is added for every 13 data bits. The
puncturing pattern is shown below.
1 1 0 0 0 0 0 1 0 0 0 0 1 1 0 1 1 1 1 1 0 1 1 1 1 0
4. After transmission, the convolutionally encoded bitstream is decoded using the
hard-decision Viterbi decoding algorithm with a trellis depth of 21 × K.
5. Data partitioning and RVLC are employed in both experiments with the ECC
scheme and the packetization approach.
6. The same quantization parameters are used in all experiments, which means that
correctly decoded bitstreams protected using ECC or packetization should have the
same visual quality on the same video sequence in the error free environments.
7. In each test, the residual errors are simulated as random errors with a Gaussian
distribution, with the Bit Error Rate (BER) of the residual errors set at 1×10^-5,
4×10^-5, 1×10^-4 and 1.7×10^-4 respectively.
8. After the corrupted bitstreams are decoded, erroneous motion vectors and texture
information are replaced by 0. This means that when the motion vectors are not
available, motion compensation is implemented using the motion vectors at exactly
the same position in the previous frame, and when the texture information is not
available, the block in question is reconstructed using the texture information of
the blocks located by the motion vectors.
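The residual-error injection in condition 7 can be sketched as independent bit flips at a target BER. This is a simplification of the error model stated above, shown only to make the test procedure concrete:

```python
import random

def inject_errors(bits, ber, seed=None):
    """Flip each bit independently with probability `ber`, a simplified
    stand-in for the random residual errors used in the experiments."""
    rng = random.Random(seed)
    return [b ^ (rng.random() < ber) for b in bits]

stream = [0, 1] * 50000                        # 100 000-bit test stream
corrupted = inject_errors(stream, ber=1e-4, seed=42)
n_errors = sum(a != b for a, b in zip(stream, corrupted))
print(n_errors)   # typically around 10 flips for BER = 1e-4
```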
6.4.2 Results
To express the simulation results, some notation is needed. ECC(7/8) means that an
ECC scheme is used with the ECC rate set to 7/8. Similarly, Packetization(600) means
that a packetization scheme is used with the packet size set to 600 bits. The final
results, obtained by averaging over 100 individual tests, are shown in Figure 6.2 to
Figure 6.11. The numbers of bits used to encode each frame of the video sequences
with each scheme are listed in Table 6-1 and Table 6-2.
The advantage of using ECC instead of packetization is clearly seen. ECC(13/14)
produces less overhead in the bitstream than Packetization(600). The average number
of bits per frame used for encoding Akiyo is 4896.64 when ECC(13/14) is employed
and 4959.52 when Packetization(600) is used. For Salesman the average number of
bits used for encoding each frame becomes 11674.88 and 11768.48 when ECC(13/14)
or Packetization(600) is employed respectively. The PSNRs of the video output
reconstructed from the bitstreams employing ECC(13/14) are much higher than the
PSNRs of the video output employing Packetization(600) for both video sequences,
Salesman and Akiyo. The PSNR gains range between 1 dB and 4 dB, as shown in
Figure 6.3 to Figure 6.4, Figure 6.7 to Figure 6.8 and Figure 6.10 to Figure 6.11, when
the BER of the final bitstream varies from 1×10^-5 to 1×10^-4.
Generally, the PSNRs of the reconstructed video output degrade as the BER of the
residual errors increases. In the extreme residual error condition, for example when the
BER reaches 1.7×10^-4, for the video sequence Akiyo with moderately slow motion
the bitstream employing ECC(13/14) still delivers viewable (though not very good)
reconstructed images, while the bitstream employing Packetization(600) produces an
unrecognizable output, as shown in Figure 6.9. For the video sequence Salesman with
fast motion, both the ECC and packetization approaches fail to deliver decent
reconstructed video outputs under the specified test conditions. For the purpose of
comparison, in another experiment the first frame (I frame) is transmitted error free.
Again the video output employing ECC(13/14) has a better PSNR than the video
output employing Packetization(600), as shown in Figure 6.5.
When the BER of the residual errors is relaxed from 1.7×10^-4 to 1×10^-4, the
simulation results are as shown in Figure 6.10 and Figure 6.11. The advantage of ECC
over packetization is clearly seen again, as packetization produces uniformly
unacceptable results.
It can be seen that the PSNR gain with Akiyo is much higher than with Salesman when
ECC is used instead of packetization. The reason is simple: the bitstream for Salesman
has a much higher data rate than the bitstream for Akiyo. For random errors, a higher
data rate means more bits are exposed to errors and hence more opportunities for the
bitstream to be corrupted.
Two conclusions can be drawn from these experiments. First, ECC is superior to
packetization in terms of coding efficiency and effectiveness. Second, in extreme
residual error conditions, for instance when the BER of the final bitstream is higher
than 1×10^-4, both ECC(13/14) and Packetization(600) are insufficient. ECC needs to
be, and can be, improved so that it can correct most errors in the video bitstream,
especially in I frames, because in P frames at least some basic error concealment
operations can reduce the error effects in the reconstructed images.
The investigation of how to improve the power of ECC is given in Chapter 8. At
present there is no way to further improve the robustness of a conventionally encoded
video bitstream in some extreme situations once the packet size has reached its
saturation point, i.e. once further reducing the packet size no longer improves the
quality of the video transmission when packetization is employed.
Another important characteristic to note is that the ECC approach produces much less
overhead for an I frame than the packetization approach, as can easily be seen from
Table 6-1 and Table 6-2. The number of bits for the I frame of Akiyo is 47480 when
ECC(13/14) is used and 50352 when Packetization(600) is employed, while the
number of bits for the I frame of Salesman becomes 79432 and 81168 when
ECC(13/14) or Packetization(600) is employed respectively. This can be a big
advantage favoring the ECC approach for video transmission, as the smaller bit count
for I frames means a relaxed requirement on the peak channel capacity for transmitting
I frames.
According to convolutional coding theory, increasing the constraint length K of a
convolutional code and the trellis depth of Viterbi decoding will increase the power of
the convolutional code. However, the computational requirement of the Viterbi
decoding algorithm grows exponentially with the constraint length K, so K is usually
limited in practice to 9 or less. As computing techniques advance, it is reasonable to
expect that convolutional codes with constraint lengths longer than 9 will become
practical, which will make the ECC approach more efficient and effective.
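The exponential growth can be made concrete: a Viterbi decoder for a code of constraint length K maintains 2^(K-1) trellis states and evaluates 2^K branch metrics per decoded bit. A quick illustration:

```python
# Number of trellis states (2^(K-1)) and branch metrics evaluated per
# decoded bit (2^K: two incoming branches per state) for a few constraint
# lengths, illustrating the exponential growth discussed above.
for K in (7, 9, 11, 15):
    states = 2 ** (K - 1)
    branches = 2 ** K
    print(f"K={K:2d}: {states:6d} states, {branches:6d} branch metrics/bit")
```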
Though the experiments are conducted based on the MPEG-4 video coding standard,
the proposed scheme can be applied to all other video coding standards, including
MPEG-1 [9], MPEG-2 [10], H.261 [8] and H.263 [2], as all of them use basically the
same techniques based on DCT and motion estimation. The proposed error resilience
technique can also be applied to video coding schemes that are not based on DCT and
motion estimation, such as wavelets.
6.5 Discussion
In this chapter, the new concept of Second Error Correction is introduced. In this
work, SEC is realized with Error Correction Coding; for video applications, ECC
accomplished with punctured convolutional coding has achieved success.
The SEC approach provides a fresh view of both error control and error resilience
coding. Traditionally implemented at the data link layer, error control is now not only
a technique to improve the capacity of a channel; it is also an active error resilience
tool that can be implemented in the application layer. As a result, convolutional codes
expand into a new field of application. With the introduction of SEC, several aspects
of network operation can be integrated into a generic and extended framework, which
we can still represent with the term "Error Control", but now the term has a broader
meaning.
Under the concept of "Error Control" in this broader meaning, source coding, channel
coding and error resilience are not separate operations; they are different aspects of an
integrated functionality for error resilient real-time video delivery. Within this
integrated functionality, the distribution of error control between first error control
and SEC needs to be optimized, as does the distribution of the available radio channel
bandwidth among source coding, first error control and SEC. A generic rate control
algorithm based on these optimizations would be more effective and efficient. These
can be part of future work.
The proposed algorithm requires more effort at the decoder, as Viterbi convolutional
decoding is quite demanding of computing power. If the decoder does not have
enough computing capacity, a longer decoding delay will be introduced. However,
with commercial Viterbi decoding hardware and software widely available, this
should not be a problem.
In our experiments, the punctured convolutional code rate is 13/14, resulting in a 7.7%
increase in the data rate of a base MPEG-4 encoded bitstream. It needs to be pointed
out that when a convolutional code with a longer constraint length is used, a higher
punctured code rate, which results in less ECC overhead, can be used to achieve
similar or better results. However, this will further increase the computing complexity
of the decoder.
To achieve the best coding efficiency, the puncturing rate can be adjusted to match the
bit error rate of the bitstream. For example, when the bit error rate in the bitstream is
not very high, the 16/17 code rate from the same base convolutional code may be
chosen to give satisfactory protection to the video bitstream, while the 9/10 code can
be selected when the bit error rate of the bitstream is higher. More discussion of
different ECC rates is provided in Chapter 8 and Chapter 9.
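Such rate adaptation might be sketched as a simple lookup. The BER thresholds below are illustrative assumptions, not values from the thesis; the overhead figures, however, follow directly from the code rates (a rate r/(r+1) code adds one redundant bit per r data bits, i.e. 1/r overhead, which for 13/14 gives the 7.7% mentioned above):

```python
def choose_code_rate(residual_ber):
    """Pick a puncturing rate from the same rate-1/2 base code to match the
    residual BER, per the policy sketched in the text.  The thresholds are
    illustrative assumptions only."""
    if residual_ber < 1e-5:
        return (16, 17)        # very clean channel: least overhead
    if residual_ber < 1e-4:
        return (13, 14)        # the rate used in the experiments
    return (9, 10)             # harsher channel: stronger protection

for num, den in ((16, 17), (13, 14), (9, 10)):
    overhead = (den - num) / num
    print(f"rate {num}/{den}: {overhead:.1%} overhead")
```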
To cope with a wide range of residual error conditions, the optimum puncturing
patterns of the base convolutional codes need to be explored further. At this stage the
highest reported punctured code rate for the base code (171,133) is 16/17 [11], while
the highest punctured code rate for the base code (561,753) is 13/14 [7]. Puncturing
patterns for higher code rates of 14/15, 15/16, 16/17, 17/18, 18/19, etc. need to be
found for the base code (561,753) or other good base codes, including those with
constraint lengths longer than 9, as a higher rate code will make the ECC approach
more efficient in favorable residual error conditions.
It is worth mentioning that a philosophically similar approach has been introduced in
the H.263 [2] video coding standard. In Annex H of H.263, forward error correction
(FEC) for the coded video signal is realized using the BCH (511, 493) block code.
This allows 492 bits of coded data to be appended with 2 bits of framing information
and 18 bits of parity information to form a FEC frame. The FEC coding allows the
correction of single bit errors in each FEC frame and the detection of two-bit errors,
for an approximately 4% increase in bit rate. The FEC mechanism of Annex H is
designed for ISDN, which is an isochronous, very low error rate network. There is no
doubt that this FEC's capability of correcting errors is very limited compared with
ECC in a harsher environment. First, the number of bit errors ECC can correct is not
limited to one in a chunk of 492 bits of coded data. Second, ECC has the capability to
cope with bursty errors and packet loss [14], while this FEC does not. Lastly, this FEC
is not flexible enough to cope with varying residual error conditions, while ECC can
adapt to residual error conditions (see Chapter 8). It needs to be pointed out that the
video decoding process with ECC is totally compatible with the MPEG-4 standard
once the bitstream has been convolutionally decoded.
The simulation results have given a positive answer to the questions raised in Section
6.2, at least in random error situations. But in some extreme residual error situations,
ECC needs to be further enhanced to achieve more satisfactory results, which is
investigated in the following chapters.
References
[1] ISO/IEC 14496-2, “Information Technology – Coding of Audio-Visual Objects:
Visual”, 2001.
[2] ITU-T H.263 “Video coding for low bit rate communication”, 1998.
[3] J. G. Proakis, “Digital Communications”, McGraw-Hill, 1995.
[4] A. J. Viterbi, “Convolutional Codes and Their Performance in Communication
Systems”, IEEE Trans. on Comm. Technology, Vol. COM-19, No. 5, October 1971, pp.
751-772.
[5] J. Hagenauer, "Rate-Compatible Punctured Convolutional Codes (RCPC Codes) and
their Applications", IEEE Trans. on Comm., Vol. 36, No. 4, April 1988, pp. 389-400.
[6] J. Hagenauer, N. Seshadri and C. W. Sundberg, “The Performance of Rate-
Compatible Punctured Convolutional Codes for Digital Mobile Radio”, IEEE Trans. on
Comm., Vol. 38, No. 7, July 1990, pp. 966-980.
[7] Y. Yasuda, K. Kashiki and Y. Hirata, “High-Rate Punctured Convolutional Codes
for Soft Decision Viterbi Decoding”, IEEE Trans. on Comm., Vol. Com-32, No. 3,
March 1984, pp. 315-319.
[8] Recommendation H.261: “Video Codec for Audiovisual Services at p×64 kbit/s”.
ITU-T (CCITT), Mar. 1993.
[9] ISO/IEC 11172-2, “Information technology-coding of moving picture and
associated audio for digital storage media at up to about 1.5 mbit/s: Part 2 video”,
August 1993.
[10] ISO/IEC: 13818 (MPEG-2). “Information technology – Generic Coding of Moving
Pictures and Associated Audio Information”.
[11] Yutaka Yasuda, Yasuo Hirata, Katsuhiro Nakamura and Susumu Otani,
"Development of variable-rate Viterbi decoder and its performance characteristics",
Proc. 6th Int. Conf. Digital Satellite Commun., Phoenix, AZ, September 1983,
pp. XII-24-31.
[12] M. Ghanbari, "Video Coding – an Introduction to Standard Codecs", The
Institution of Electrical Engineers, 1999.
[13] K.T. Tan, M. Ghanbari and D.E. Pearson, “An objective measurement tool for
MPEG video quality”, Signal Processing, 7, 1998, pp. 279-294.
[14] Bing Du, M. Ghanbari, “ECC video in bursty channel errors and packet loss”, Proc.
Picture Coding Symp. 2003, Saint-Malo, France, 23 - 25 April 2003, pp.99-101.
Figure 6.2 PSNR of Salesman through error free channel
Figure 6.3 PSNR of Salesman with BER of 1×10^-5 (curves: ECC(13/14), Packetisation(600))
Figure 6.4 PSNR of Salesman with BER of 4×10^-5 (curves: ECC(13/14), Packetisation(600))
Figure 6.5 PSNR of Salesman with BER of 1.7×10^-4 (first frame transmitted error free to allow comparison; curves: ECC(13/14), Packetisation(600))
Figure 6.6 PSNR of Akiyo through error free channel
Figure 6.7 PSNR of Akiyo with BER of 1×10^-5 (curves: ECC(13/14), Packetisation(600))
Figure 6.8 PSNR of Akiyo with BER of 4×10^-5 (curves: ECC(13/14), Packetisation(600))
Figure 6.9 PSNR of Akiyo with BER of 1.7×10^-4 (curves: ECC(13/14), Packetisation(600))
Figure 6.10 PSNR of Akiyo with BER of 1×10^-4 (curves: ECC(13/14), Packetization(600))
Figure 6.11 PSNR of Salesman with BER of 1×10^-4 (curves: ECC(13/14), Packetization(600))
Table 6-1 Bit number comparison between Packetization(600) and ECC(13/14) for Akiyo.

Frame    ECC(13/14)   Packetisation(600)
0        47480        50352
1        536          488
2        672          616
3        672          616
4        1128         1112
5        1064         1040
6        1112         1088
7        1304         1272
8        1552         1552
9        1744         1736
10       1632         1632
11       1456         1472
12       2064         2024
13       2008         1976
14       2896         2912
15       4176         4208
16       5816         5840
17       7008         7000
18       7848         7896
19       7640         7704
20       6824         6880
21       6304         6352
22       5232         5232
23       4352         4368
24       4296         4312
25       3200         3184
26       3104         3104
27       3832         3848
28       4808         4816
29       5528         5512
30       6032         6048
31       6112         6112
32       5176         5176
33       4328         4344
34       4688         4688
35       4984         5008
36       5232         5256
37       5088         5128
38       5272         5272
39       5376         5408
40       4840         4840
41       4824         4872
42       4312         4328
43       3752         3752
44       3776         3784
45       4408         4416
46       4768         4760
47       5008         5040
48       5080         5096
49       4488         4504
Total    244832       247976
Average  4896.64      4959.52
Table 6-2 Bit number comparison between Packetization(600) and ECC(13/14) for Salesman.

Frame    ECC(13/14)   Packetisation(600)
0        79432        81168
1        8616         8744
2        14136        14120
3        16200        16256
4        15520        15464
5        12496        12568
6        8832         8872
7        8440         8520
8        9240         9312
9        9232         9328
10       10248        10344
11       9440         9480
12       8424         8456
13       9392         9496
14       10064        10088
15       10120        10080
16       8592         8640
17       9792         9920
18       14320        14320
19       16640        16624
20       14176        14224
21       10712        10904
22       10528        10696
23       9016         9032
24       6416         6440
25       5656         5712
26       6976         7032
27       7944         8024
28       6792         6840
29       5064         5080
30       5888         5912
31       6824         6848
32       7512         7624
33       8272         8416
34       11016        11120
35       13840        13848
36       15904        15920
37       14104        14088
38       14792        14880
39       13352        13416
40       11032        11120
41       11272        11328
42       10608        10648
43       10128        10248
44       10816        10840
45       11448        11600
46       9432         9520
47       8928         8976
48       8352         8432
49       7768         7856
Total    583744       588424
Average  11674.88     11768.48
7 ECC VIDEO WITH IFR
7.1 Introduction
To address the passiveness and disadvantages of the current error resilience tools, the
ECC scheme was proposed in the last chapter and in [13]. Basically, in an ECC
approach a video bitstream encoded using a current video coding standard (including
all the MPEG series and H.26x series) is not packetized; instead it is further encoded
using an error correction code. This is an active error protection approach in the sense
that it can recover a corrupted bitstream by correcting the errors in the bitstream. The
ECC in our proposal is achieved with punctured convolutional coding [3,4,5,7].
Because of the very efficient and effective error correction capability that punctured
convolutional codes can achieve, the proposed scheme shows significant improvement
over the packetization approach in the current MPEG-4 [10,11] and H.263 [2,8,9]
video coding standards in terms of reconstructed video quality and coding efficiency.
However, the proposed ECC scheme has its own disadvantage. The only
synchronization point in an ECC video bitstream is the Picture Start Code when
packetization is not employed. When a single error bit within a frame escapes
correction by the ECC (even though this is very rare if the error correction code is
properly designed to match the residual error conditions), the decoder can lose
synchronization with the encoder. Consequently, the macroblock within which the
error occurs and all the following macroblocks within the frame will be undecodable,
resulting in a "half image" effect (whereas a decoding failure within a packet results in
empty strips in the frame when packetization is employed), and the quality of the
reconstructed frame with this error and of all subsequent frames will suffer
significantly until the next I frame, due to the inter-frame error propagation effects.
This can happen more frequently when the residual error condition changes following
changes in the channel condition, because the change of the ECC rate always lags
behind the change of the residual error conditions. It needs to be pointed out that
NEWPRED [1,6] will not work when the basic I frame collapses. To address this
problem, a new error resilience tool, Intra Frame Relay (IFR), is proposed in this
chapter. Simulation results show a significant improvement over the original ECC
scheme.
7.2 ECC with IFR
In the IFR scheme, when an I frame is transmitted, the number of the first corrupted
macroblock is sent back to the encoder by the decoder through a back channel. The
encoder then knows that the picture area in the I frame from that macroblock to the
end of the frame has not been successfully decoded; therefore, in the next frame, all
the macroblocks associated with the corrupted macroblocks (including the
macroblocks from the reported number to the end of the frame and macroblocks using
the corrupted macroblocks as reference for motion estimation) can be encoded in Intra
mode. This increases the likelihood that all the subsequent frames will have decent
reference frames.
There are two reasons for applying the proposed scheme only to I frames. First, the
I frame is the most important frame for decoding the subsequent sequence of
frames: if errors occur within an I frame, all the subsequent P (predicted) and B
(bi-directionally predicted) frames will be affected due to inter-frame error
propagation. Second, encoding macroblocks of P or B frames in Intra mode can
reduce the coding efficiency significantly; if errors occur within P frames, using
NEWPRED [1,6] is both more efficient and more effective.
To make the proposal realistic, a video decoder must have some capability to detect
errors in the bitstream after the ECC operation. The error detection process used in this
work is based on the following mechanisms.
During the decoding process, if one of the following events occurs, the bitstream will
become undecodable and the decoder will know that an error or errors have occurred.
The decoder then applies error concealment to the rest of the macroblocks within
the frame and sends the number of the first broken macroblock back to the encoder.
• Invalid VLC (MCBPC, CBPY, MVD and TCOEF) code is detected.
• Quantizing information goes out of range.
• Invalid INTRA DC code is detected.
• Escaped TCOEF with level 0 is detected.
• Coefficient overrun occurred.
• A motion vector points outside the picture or beyond the maximum search range
(for P frame error detection).
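The detection loop implied by these checks can be sketched as follows. This is a simplified illustration, not the thesis implementation; the check functions are hypothetical placeholders for the real syntax tests itemized above.

```python
# Simplified sketch of the error-detection loop; the checks list holds
# hypothetical placeholders for the syntax tests itemized above
# (invalid VLC, out-of-range quantizer, coefficient overrun, ...).

def first_broken_macroblock(macroblocks, checks):
    """Return the index of the first macroblock that fails any syntax
    check, or None if the whole frame decodes cleanly.  The decoder
    conceals from this index onward and reports it on the back channel."""
    for i, mb in enumerate(macroblocks):
        if not all(check(mb) for check in checks):
            return i
    return None

# Toy usage: macroblocks as dicts, one check validating the QP range.
qp_in_range = lambda mb: 1 <= mb["qp"] <= 31
mbs = [{"qp": 10}, {"qp": 12}, {"qp": 99}, {"qp": 5}]
print(first_broken_macroblock(mbs, [qp_in_range]))  # -> 2
```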
However, errors can occur in such a way that the bitstream remains decodable even
though it contains errors. In this case, error detection can be conducted after the
decoding process, using the redundancy inherent in the neighboring macroblocks.
A more detailed discussion of error detection after decoding can be found in [12].
It should be emphasized that, in the first P frame following the I frame, the encoder
not only needs to encode in Intra mode the macroblocks from the starting number to
the end of the frame, but also those macroblocks that may use part or all of the
corrupted macroblocks as references for motion estimation. For instance, if the
maximum search range of motion estimation is 16 pixels, Intra mode should start
from the macroblock immediately above and to the left of the starting macroblock
in the P frame.
It also needs to be pointed out that transmission delay, on both the downlink and
the back channel, can occur in telecommunication networks. For instance, by the
time the back channel message arrives at the encoder, the encoder may already be
starting to encode the second P frame following the I frame. In this case, if the
maximum search range of motion estimation is 16 pixels, the encoder should start
encoding in Intra mode from the macroblock two rows above and two columns to
the left of the starting macroblock reported by the decoder, in that second P frame.
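The rule for choosing where forced Intra coding must begin can be summarized in a short sketch. This is an illustrative fragment, not the thesis code; the macroblock-grid geometry is an assumption.

```python
# Illustrative sketch (not the thesis code) of where forced Intra
# coding should begin.  Motion estimation can drag corruption at most
# ceil(search_range / mb_size) macroblocks up/left per frame, so the
# start point moves up and left by that amount per frame of delay.

def intra_refresh_start(start_mb, mb_per_row, search_range=16,
                        frame_lag=1, mb_size=16):
    reach = -(-search_range // mb_size) * frame_lag  # ceil division
    row = max(start_mb // mb_per_row - reach, 0)
    col = max(start_mb % mb_per_row - reach, 0)
    return row * mb_per_row + col

# QCIF: 176 pixels wide -> 11 macroblocks per row (assumed geometry).
print(intra_refresh_start(38, 11))               # one frame late -> 26
print(intra_refresh_start(38, 11, frame_lag=2))  # two frames late -> 14
```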
As stated in the last chapter, both Data Partitioning and RVLC can still be used with
ECC video without employing packetization. If Data Partitioning and RVLC are
employed, both the first and the last numbers of the corrupted macroblocks can be
transmitted to the encoder. Consequently, fewer macroblocks need to be encoded in
Intra mode in the next frame than when RVLC and Data Partitioning are not
employed, and the coding efficiency is improved while coding the next frame and
subsequent frames.
7.3 Simulation results
To evaluate the effectiveness of the proposed algorithm, again Salesman and Akiyo are
chosen as the test sequences. The goal is to compare the PSNR of ECC video with and
without IFR. The experiments are conducted based on the following conditions.
1. 50 frames of each video sequence are encoded, with the first frame coded as an I
frame followed by all P frames, without rate control.
2. When ECC is employed, the ½ rate base convolutional code (561, 753) is chosen,
which has a constraint length of K = 9. This base code is punctured to rate 13/14,
which means that for every 13 bits of the video bitstream, one additional bit is
added by the convolutional encoding.
3. After transmission, the convolutionally encoded video bitstream is decoded using
the hard-decision Viterbi decoding algorithm with a trellis depth of 15xK.
4. Data partitioning and RVLC are employed in both ECC video and ECC video plus
IFR.
5. The same quantization parameters are used in all experiments, which means that
correctly decoded bitstreams protected using ECC and ECC plus IFR should have
the same visual quality on the same video sequence in error-free environments.
6. In each test, the residual errors are simulated as random errors with a Gaussian
distribution. The BER of the residual errors is set to 1x10-4. Back channel messages
are transmitted error free. In most situations this assumption is realistic: if the back
channel message only contains the acknowledgement, which can be positive or
negative and is usually short, a strong error protection scheme can be applied to the
back channel message.
7. After the corrupted bitstreams are decoded, erroneous motion vectors and texture
information are replaced by 0. This means that when the motion vectors are not
available, motion compensation is implemented using the motion vectors in exactly
the same position in the previous frame, and when the texture information is not
available, the block in question is reconstructed using the texture information in the
blocks located by the motion vectors.
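The rate-13/14 puncturing in condition 2 above can be illustrated with a minimal sketch of the rate-½, K = 9 mother code (generators 561 and 753 octal). The puncturing map used here is illustrative only, since the exact pattern is not listed; it simply keeps 14 of every 26 mother-code bits.

```python
# Sketch of rate-13/14 puncturing of the rate-1/2, K = 9 mother code
# (generators 561 and 753 octal) from condition 2.  The puncturing
# map below is an illustrative assumption, not the thesis pattern.

G1, G2 = 0o561, 0o753   # generator polynomials, constraint length K = 9

def conv_encode(bits):
    """Rate-1/2 convolutional encoder: two coded bits per input bit."""
    state = 0
    out = []
    for b in bits:
        state = ((state << 1) | b) & 0x1FF          # 9-bit shift register
        out.append(bin(state & G1).count("1") & 1)  # parity of G1 taps
        out.append(bin(state & G2).count("1") & 1)  # parity of G2 taps
    return out

# Keep both outputs for the first info bit of each 13-bit block and only
# the first output for the remaining twelve: 2 + 12 = 14 coded bits per
# 13 info bits, i.e. rate 13/14.
PATTERN = [1, 1] + [1, 0] * 12          # over 26 mother-code bits

def puncture(coded):
    return [c for c, keep in
            zip(coded, PATTERN * (len(coded) // 26 + 1)) if keep]

info = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0]   # 13 info bits
sent = puncture(conv_encode(info))
print(len(sent))  # 14 transmitted bits for 13 info bits
```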
The final results, obtained by averaging the results of 100 individual tests, are
shown in Figure 7.1 and Figure 7.2. The numbers of bits required to encode each
frame of the sequences are listed in Table 7-1 and Table 7-2. The advantage of
using IFR is clearly seen. The PSNRs of the first frames (I frames) of both
sequences are not very good due to the residual errors, but the PSNRs of all the
subsequent P frames are lifted when IFR is employed, with a PSNR gain of about
7 dB for Salesman and 9 dB for Akiyo, while the PSNRs of the video output
without IFR remain low. The cost is an increase in the coding rate of the first P
frame following the I frame, while the coding rate of all other frames remains
similar. For both video sequences, it was shown in the last chapter that the
packetization approach fails to deliver decent reconstructed video quality under the
given conditions (i.e. when the BER of the final bitstream reaches 10-4), and so no
PSNR results are repeated here for the sequences employing packetization.
In a wired network, where the residual error conditions are stable, it is easy to
design an ECC scheme matching the residual error conditions, so the employment
of IFR is less important. When the ECC scheme matches the residual error
conditions, it can be expected to correct nearly all the residual errors in the
bitstream. In other words, the probability that a residual error escapes the protection
of ECC can be extremely low if the ECC scheme matches the residual error
condition. In wireless situations the residual error conditions vary, so errors escape
ECC protection more frequently; therefore employing the IFR technique is more
advisable in wireless situations.
It should be noted that IFR can only be effective when employed together with
ECC, and it does not support packetization. The reason is quite straightforward: the
packetization approach and the associated RVLC and Data Partitioning in the
current MPEG-4 coding standard are passive, and they do not have the capability to
correct error bits in the bitstream.
7.4 Delay analysis due to the employment of IFR
One obvious problem with IFR is that the data rate of the first P frame following an
Intra frame is increased, compared with not using IFR.
From Table 7-1 it can be seen that the employment of IFR increases the bit count of
the P frame following the first Intra frame of the sequence Akiyo from 536 to 7456,
when ECC(13/14) is used in the residual error condition where the BER of the
bitstream is 10-4. However, the IFR-induced data rate of this first P frame is still
less than the peak data rate among all the P frames of the 50 frames, which occurs
at frames 18 and 19. So, for this particular sequence, IFR poses no special difficulty
at all, provided the transmission channel for P frames is allocated for the peak
P-frame data rate of the sequence.
For the sequence Salesman, the employment of IFR increases the bit count of the
first P frame following the Intra frame from 8616 to 27538, which is about twice
the peak rate of the P frames. This will introduce one frame of transmission delay.
One solution is to drop the following P frame, i.e. to transmit only one frame (the
first P frame) instead of two P frames if the channel allocation is fixed. Another
solution is to modify the transmission protocol to allocate more channel capacity
for the first P frame following an Intra frame. This can be easily implemented
because of the periodic data structure of an encoded video bitstream: we can treat
the first P frame as a "half I frame", and if the channel allocation for an I frame can
be updated periodically, it is not difficult to accommodate the periodic "half I frame".
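The delay argument above amounts to counting how many fixed-size channel slots the enlarged first P frame occupies. A back-of-envelope sketch, using the bit counts quoted from Tables 7-1 and 7-2 and assuming the channel is allocated for the peak P-frame rate:

```python
# Back-of-envelope version of the delay argument, assuming a fixed
# per-frame channel allocation sized for the peak P-frame rate.
import math

def extra_frame_delay(first_p_bits, per_frame_channel_bits):
    """Extra whole frames of delay needed to transmit the enlarged
    first P frame through a fixed per-frame allocation."""
    return max(math.ceil(first_p_bits / per_frame_channel_bits) - 1, 0)

# Salesman (Table 7-2): 27538 bits vs a peak P-frame rate of 16702
# bits (frame 19) -> one extra frame of delay.
print(extra_frame_delay(27538, 16702))  # -> 1
# Akiyo (Table 7-1): 7456 bits fits under the 7848-bit peak (frame 18).
print(extra_frame_delay(7456, 7848))    # -> 0
```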
The bit numbers mentioned above are the averages (per frame) of the results of 100
tests. More generally, the transmission delay to the P frame following an Intra
frame caused by the employment of IFR depends on the following conditions.
First, the residual error conditions have a significant influence on the data rate
increase of the first P frame. If the residual error conditions are good, the delay is
small; if they are poor, the delay will be longer.
Second, the content of the video sequence is also an important factor if the ECC
rate does not completely match the residual error condition. The more complex the
content, the more bits it produces after compression; consequently, there is a
greater chance that the bitstream is corrupted during transmission over a Gaussian
channel, and a greater likelihood that the data rate of the P frame is increased.
However, if the ECC rate matches the residual error condition perfectly, the content
of the video has little influence on the reconstructed video quality, as shown in
Chapter 8, because the ECC operation corrects all the residual errors.
Third, the ECC scheme itself plays a crucial role in determining the data rate of the
P frame following an Intra frame. If the ECC is powerful enough to correct all the
errors in the Intra frames, there will be no increase in the data rate of the first P
frame following an I frame; if the ECC is weak, the data rate of that P frame will
increase dramatically. The first two factors are closely related to the ECC scheme
itself: the ECC rate needs to be increased when the residual error conditions are
poor and the content of the video sequence is more complex, to combat the
unfavorable conditions. In the next chapter it will be seen that by increasing the
ECC rate from ECC(13/14) to ECC(11/12), the capability of the ECC scheme is
increased to such a degree that it corrects all the error bits in an I frame for Akiyo
when the soft-decision Viterbi algorithm is used. Consequently, the employment of
IFR causes no data rate increase for the sequence Akiyo, because IFR is never
invoked, while for Salesman the data rate increase of the first P frame due to IFR is
negligible.
It should be emphasized that the final data rate of the bitstream employing ECC(11/12)
is still less than the data rate of the bitstream employing Packetization(600) if no RVLC,
Data Partitioning and packetization are employed in the ECC video bitstream.
7.5 Conclusion
Following the novel, efficient, effective and active SEC approach achieved with ECC to
combat residual errors [6], a new improved version of the scheme is introduced in this
chapter. The new error resilience tool is IFR, which uses back channel messages to
further improve the performance of ECC video. Simulation results have given positive
support. To stop inter-frame error propagation caused by Intra frame errors, IFR is very
effective for ECC video. To stop inter-frame error propagation caused by P frame
errors, NEWPRED is an effective alternative [1,6].
Future work will include the design and implementation of dynamic ECC for video
communication in mobile environments, with which the ECC coding rate can
follow the change of the residual error condition dynamically. If both channel
coding and ECC use convolutional coding, this obviously provides an excellent
opportunity to design a generic and integrated rate control scheme taking source
coding, channel coding and ECC into consideration, which should be more efficient
and effective; this is an interesting direction for future work. More accurate error
detection after ECC will improve the performance of the ECC approach with IFR,
and so error detection techniques applicable after ECC are also an interesting
direction for future work.
References
[1] ISO/IEC 14496-2, “Information Technology – Coding of Audio-Visual Objects:
Visual”, 2001.
[2] ITU-T Recommendation H.263, "Video coding for low bit rate communication", 1998.
[3] J. G. Proakis, "Digital Communications", McGraw-Hill, 1995.
[4] A. J. Viterbi, "Convolutional Codes and Their Performance in Communication
Systems", IEEE Trans. on Comm. Technology, Vol. COM-19, No. 5, October 1971, pp.
751-772.
[5] Y. Yasuda, K. Kashiki and Y. Hirata, “High-Rate Punctured Convolutional Codes
for Soft Decision Viterbi Decoding”, IEEE Trans. on Comm., Vol. Com-32, No. 3,
March 1984, pp. 315-319.
[6] ISO/IEC JTC1/SC29/WG11 N3908, “MPEG-4 Video Verification Model” version
18.0, January 2001/Pisa.
[7] Y. Yasuda, Y. Hirata, K. Nakamura and S. Otani, "Development of variable-rate
Viterbi decoder and its performance characteristics", Proc. 6th Conf. Digital Satellite
Commun., Phoenix, AZ, Sept. 1983, pp. XII-24-31.
[8] S. Wenger, G. Knorr, J. Ott and F. Kossentini, "Error Resilience Support in
H.263+", IEEE Transactions on Circuits and Systems for Video Technology, Vol. 8, No.
7, November 1998, pp. 867-877.
[9] J. Ott, S. Wenger and G. Knorr, "Application of H.263+ Video Coding Modes in
Lossy Packet Network Environments", Journal of Visual Communication and Image
Representation, Vol. 10, 1999, pp. 12-38.
[10] I. Moccagatta, S. Soudagar, J. Liang and H. Chen, "Error-Resilient Coding in
JPEG-2000 and MPEG-4", IEEE Journal on Selected Areas in Communications, Vol. 18,
No. 6, June 2000, pp. 899-914.
[11] Y. Wang, S. Wenger, J. Wen and A. K. Katsaggelos, “Error Resilient Video
Coding Techniques – Real-time Video Communications over Unreliable Networks”,
IEEE Signal Processing Magazine, July 2000, pp. 61-82.
[12] E. Khan, H. Gunji, S. Lehmann and M. Ghanbari, “Error Detection and Correction
in H.263 coded video over wireless network”, The 12th International Packet Video
Workshop (PV 2002), April 2002 Pittsburgh PA, USA.
[13] Bing Du, Anthony Maeder and Miles Moody, "A new approach for error resilience
in video transmission using ECC", accepted by International Workshop on Very Low
Bit-rate Video, 18-19 September 2003, Madrid, Spain.
[Figure: PSNR (dB) versus Frame Number (1-50); curves: Error Free, ECC plus IFR, ECC only]
Figure 7.1 PSNR of Salesman at BER of 1x10-4
[Figure: PSNR (dB) versus Frame Number (1-50); curves: Error Free, ECC plus IFR, ECC only]
Figure 7.2 PSNR of Akiyo at BER of 1x10-4
Table 7-1 Bit number comparison between ECC alone and ECC plus IFR for Akiyo

Frame No  ECC(13/14)  ECC&IFR(13/14)
0   47480  47480
1   536    7456
2   672    664
3   672    666
4   1128   1116
5   1064   1064
6   1112   1106
7   1304   1293
8   1552   1547
9   1744   1748
10  1632   1631
11  1456   1458
12  2064   2026
13  2008   1999
14  2896   2849
15  4168   4162
16  5816   5766
17  7008   7008
18  7848   7833
19  7632   7637
20  6824   6845
21  6296   6298
22  5232   5208
23  4352   4359
24  4288   4268
25  3200   3212
26  3104   3117
27  3824   3827
28  4808   4822
29  5520   5533
30  6032   6026
31  6112   6122
32  5168   5180
33  4328   4377
34  4688   4679
35  4984   4981
36  5232   5257
37  5088   5072
38  5272   5286
39  5376   5343
40  4832   4862
41  4824   4806
42  4312   4295
43  3752   3759
44  3776   3792
45  4400   4394
46  4768   4774
47  5008   5030
48  5072   5073
49  4488   4486
Table 7-2 Bit number comparison between ECC alone and ECC plus IFR for Salesman

Frame No  ECC(13/14)  ECC&IFR(13/14)
0   79424  79424
1   8616   27538
2   14128  13734
3   16200  16188
4   15520  15567
5   12496  12545
6   8832   8819
7   8432   8446
8   9240   9174
9   9232   9273
10  10248  10220
11  9432   9379
12  8416   8514
13  9392   9363
14  10064  10137
15  10112  10181
16  8592   8604
17  9792   9930
18  14320  14237
19  16640  16702
20  14176  14077
21  10712  10589
22  10528  10460
23  9016   8977
24  6408   6367
25  5656   5634
26  6968   6946
27  7944   8023
28  6792   6847
29  5056   5004
30  5888   5888
31  6824   6777
32  7512   7569
33  8272   8159
34  11008  10805
35  13840  13821
36  15904  15893
37  14104  14161
38  14792  14924
39  13344  13318
40  11032  11051
41  11272  11316
42  10608  10715
43  10128  10211
44  10816  10817
45  11440  11328
46  9424   9451
47  8928   8952
48  8352   8265
49  7760   7736
8 ECC VIDEO WITH SOFT-DECISION VITERBI DECODING
8.1 Introduction
In the previous chapters and in [7,8] it has been shown that the proposed ECC
scheme can achieve what the packetization approach cannot. However, in the
original ECC scheme, accomplished with a punctured convolutional code, only the
hard-decision Viterbi decoding algorithm is used for the convolutional decoding,
which does not fully exploit the potential of the punctured convolutional code.
According to the theory of convolutional coding, it is reasonable to expect that the
performance of the ECC approach can be further improved if the soft-decision
Viterbi decoding algorithm is used in the convolutional decoding process.
Also, there is a hidden problem with the original ECC scheme: the PSCs (Picture
Start Codes) are not protected by ECC, because the ECC operation is based on each
frame of the encoded video bitstream and the PSC serves as the synchronization
point. If an error happens to fall within a PSC, the frame of the video bitstream
before the PSC and the frame after it will not be decoded correctly; consequently,
if no other measure is taken, the subsequent video frames will not be decoded
properly until the next Intra frame, because all these frames lack decent reference
frames. The NEWPRED [5,6] tool can be employed to recover video
communication if a PSC is corrupted, when a back channel from decoder to
encoder is available, but using a previous frame other than the last frame as
reference for motion estimation reduces the coding efficiency.
Though the simulation results in Chapter 6 and Chapter 7 still show that the original
ECC scheme is much better than the packetization approach, one reason for this is
that the probability of a PSC being corrupted is very low.
The other drawback of the frame-based ECC operation is that it introduces a
transmission delay of one frame at the encoding side and a decoding delay of one
frame at the decoding side. This can be a significant disadvantage for real-time
video communications with strict delay requirements.
In this chapter, the soft-decision Viterbi decoding algorithm for convolutional decoding
replaces the hard-decision Viterbi decoding algorithm in the original ECC scheme, to
improve the performance of ECC video. To bring the PSCs under ECC protection
as well, instead of performing the ECC operation on each frame of the encoded
video bitstream, the ECC operation in this chapter is performed on a segment basis,
i.e. the video bitstream is decomposed into segments and each segment is further
encoded with ECC. Thus the PSCs in a video bitstream are also protected by the
ECC operation. The simulation results show that ECC video based on this
segmentation, accomplished with soft-decision Viterbi convolutional decoding, can
work in a residual error condition where the BER of the encoded video bitstream
reaches 10-2, without the need for channel coding. Of course, in reality channel
coding is an integral part of any telecommunication network, and so the quality of
ECC video will be even more satisfactory.
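The segment-based operation described above amounts to cutting the bitstream, PSCs included, into fixed-length segments before ECC encoding. A minimal sketch, where the segment length and the bit-string representation are illustrative assumptions:

```python
# Minimal sketch of the segment-based decomposition; the segment
# length and the bit-string representation are assumptions made for
# illustration (the thesis uses the average encoded frame length).

def split_into_segments(bitstream, segment_len):
    """Cut a bit string into segment_len-bit pieces; the last piece may
    be shorter.  Each piece would then be ECC-encoded on its own, so
    any PSC inside it is protected too."""
    return [bitstream[i:i + segment_len]
            for i in range(0, len(bitstream), segment_len)]

segments = split_into_segments("0" * 2500, 600)
print([len(s) for s in segments])  # -> [600, 600, 600, 600, 100]
```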
8.2 ECC with Soft-Decision Viterbi Decoding
The main difference between the hard-decision Viterbi decoding algorithm and the soft-
decision Viterbi decoding algorithm is that the Euclidean metric [2] is used in the soft-
decision Viterbi decoding algorithm instead of the Hamming metric in the hard-decision
decoding algorithm.
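The two metrics can be contrasted in a few lines. This is a toy illustration, assuming a BPSK-style mapping of bit 0 to +1 and bit 1 to -1 for the soft values; it is not the decoder used in this work.

```python
# Toy contrast of the two branch metrics.  Hard decisions compare bit
# patterns (Hamming distance); soft decisions compare received
# analogue-like values against ideal symbols (squared Euclidean
# distance), assuming the BPSK-style mapping 0 -> +1.0, 1 -> -1.0.

def hamming_metric(received_bits, branch_bits):
    return sum(r != b for r, b in zip(received_bits, branch_bits))

def euclidean_metric(received_vals, branch_bits):
    ideal = [1.0 if b == 0 else -1.0 for b in branch_bits]
    return sum((r - s) ** 2 for r, s in zip(received_vals, ideal))

# A confidently received 0 followed by a very unreliable 1: the soft
# metric keeps the per-bit reliability that hard decisions discard.
print(hamming_metric([0, 0], [0, 1]))         # -> 1
print(euclidean_metric([0.9, -0.1], [0, 1]))  # -> 0.82
```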
When the soft-decision Viterbi decoding algorithm is used in an ECC scheme at the
application layer, the network needs to deliver the soft-decision output [4] to the
application layer after channel coding. This can be accomplished by monitoring the
difference between the survivor path and the path that has the next best metric in the
channel decoding process if channel coding is also achieved with convolutional coding
[3]. By monitoring the metric difference of different paths, the channel decoder
(convolutional decoder) produces reliability or confidence information assigned to each
decoded bit.
One important issue inherent in using the soft-decision decoding algorithm at the
application layer is that a careful decision is needed in selecting the confidence
levels. Increasing the number of confidence levels can improve the performance of
the convolutional coding, but at the cost of an increased computational requirement.
Here the confidence level is different from the quantization level, which is usually
applied to the "raw" data from the channel to convert analogue data into digital
data. The confidence level is produced during the channel decoding process and
applies to digital data. However, the impact of the selection of the confidence level
on punctured convolutional decoding is similar to that of the selection of the
quantization level. An analysis of the effect of quantization can be found in [1]; in
this work 3-bit precision is used.
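The idea of quantizing the channel decoder's per-bit reliability values to 3-bit confidence levels can be sketched as follows; the clipping range is an assumption for illustration only.

```python
# Sketch of mapping the channel decoder's per-bit reliability values
# onto 3-bit confidence levels (8 levels).  The clipping range max_rel
# is an illustrative assumption, not a value from this work.

def to_confidence_level(reliability, bits=3, max_rel=8.0):
    """Quantize a non-negative reliability value to an integer level
    in [0, 2**bits - 1]."""
    levels = (1 << bits) - 1
    r = min(max(reliability, 0.0), max_rel)   # clip to the known range
    return round(r / max_rel * levels)

print([to_confidence_level(r) for r in (0.0, 1.0, 4.0, 8.0, 12.0)])
# -> [0, 1, 4, 7, 7]
```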
8.3 Simulation results
To evaluate the effectiveness of the proposed algorithm, the same video sequences,
Akiyo and Salesman, are chosen again as the test sequences. The goal is to compare
the PSNRs of the video output reconstructed from the bitstreams protected with an ECC
scheme with the soft decision Viterbi decoding algorithm and the bitstreams protected
with a packetization approach. The experiment conditions are similar to the ones used
in Chapter 6. To make this chapter more complete and independent the conditions are
repeated below.
1. 50 frames of each video sequence are encoded, with the first frame coded as an I
frame followed by all P frames, without rate control.
2. Packet size of both video sequences is 600 bits when the packetization scheme is
used. Data partitioning and RVLC are employed with the packetization scheme,
while they are not employed with the ECC approach.
3. When the ECC scheme is employed, the ½ rate base convolutional code (561, 753)
is chosen, which has a constraint length of K = 9. This base code is punctured to
rates 11/12, 9/10 and 7/8 for the residual error conditions where the BER of the
residual errors reaches 10-4 or 10-3.
4. The segment length for ECC coding is chosen as average frame length of the 50
encoded video frames.
5. After transmission, the convolutionally encoded video bitstream is first decoded
using the soft-decision Viterbi decoding algorithm with a trellis depth of 19xK.
6. The same quantization parameters are used in all experiments, which means that
correctly decoded video bitstreams protected with ECC or packetization should
have the same visual quality as the video sequence in error free environments.
7. After the corrupted video bitstream is decoded, erroneous motion vectors and
texture information are replaced by 0, which means that when the motion vectors
are not available, motion compensation is implemented using the motion vectors in
exactly the same position in the previous frame, and when the texture information is
not available, the block in question is reconstructed using the texture information in
the blocks located by the motion vectors.
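The concealment rule in condition 7 can be illustrated with a toy one-dimensional sketch (not the actual decoder): a lost motion vector is replaced by 0, and lost texture means a zero residual, so the block reduces to its prediction from the previous frame.

```python
# Toy 1-D sketch of the concealment rule in condition 7: a lost motion
# vector becomes 0 (copy the co-located area), and lost texture means
# the residual is taken as 0, so the block is just its prediction.

def conceal_block(prev_frame, pos, size, mv=None, residual=None):
    mv = 0 if mv is None else mv             # lost MV -> zero vector
    pred = prev_frame[pos + mv: pos + mv + size]
    if residual is None:                      # lost texture -> zero residual
        residual = [0] * size
    return [p + r for p, r in zip(pred, residual)]

prev = list(range(10, 30))                   # previous reconstructed frame
print(conceal_block(prev, pos=4, size=4))    # -> [14, 15, 16, 17]
```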
The final results, obtained by averaging the per-frame results of 100 individual
tests, are shown in Fig. 8.1 and Fig. 8.2. The coding rate comparisons between the
ECC schemes and the packetization approach are shown in Table 8-1 to Table 8-6.
The advantage of using the ECC scheme with soft-decision Viterbi decoding
instead of the packetization approach is clearly seen. ECC(11/12) with
soft-decision Viterbi decoding delivers excellent reconstructed video output even
when the BER of the residual errors reaches 10-3, whereas it was shown in
Chapter 6 that Packetization(600) is totally incapable of delivering a decent
reconstructed video output from the corrupted video bitstream when the BER of the
final video bitstream reaches 10-4. When the BER of the residual errors is relaxed
from 10-3 to 10-4, ECC(11/12) delivers video output with PSNR nearly the same as
in the transmission-error-free situation for the sequence Salesman, while for Akiyo
it delivers video output with PSNR exactly the same as in the
transmission-error-free situation (care is needed when looking at Fig. 8.2, as the
PSNR curve for a BER of 10-4 coincides with the transmission-error-free PSNR
curve). It is not surprising that in 99 tests out of 100, ECC(11/12) corrects all of the
residual errors in the bitstreams, leaving only one test with 5 bits in
error for Salesman and 3 bits in error for Akiyo for the 50 frames after ECC decoding.
Here one question can be raised: how can a bitstream containing transmission errors
still deliver a video output whose PSNR is the same as in the transmission-error-free
situation? The answer is quite straightforward: when the erroneous bits correspond
to reconstructed picture areas in which no movement occurs in the video content, a
basic error concealment operation (e.g. copying the corresponding area from the
previous frame) conceals the error effect completely.
With a further negligible increase in the ECC overhead, these experiments also
reveal that when ECC(9/10) is used, the ECC operation corrects all the residual
errors in the bitstreams of both sequences, Akiyo and Salesman, in all 100 tests
when the BER of the residual errors is set to 10-4, delivering
transmission-error-free reconstructed video output. Comparing this result with the
performance of Packetization(600), the contrast is very obvious. Based on 100
tests, our experiments also reveal that ECC(7/8) corrects all the residual errors in
the bitstreams of these two video sequences when the BER of the residual errors
reaches 10-3. In these cases, the PSNRs of the video outputs are identical to those
of the video sequences transmitted in an error-free situation, and there is no point in
depicting a separate PSNR curve.
The 11/12 puncturing rate of the convolutional code results in a 9.2% increase in
the final bit rate. From Table 8-1 and Table 8-2 it can be seen that, without
employing RVLC, the number of bits for the final bitstream employing ECC(11/12)
is still less than that for the final bitstream employing Packetization(600), which
results in a 9.9% bit rate increase of the final bitstream.
Table 8-3 and Table 8-4 reveal that ECC(9/10) produces marginally more bits than
Packetization(600) for the sequence Salesman, while for Akiyo the final bitstreams
employing ECC(9/10) and Packetization(600) contain equal numbers of bits.
Table 8-5 and Table 8-6 show that ECC(7/8) produces 4% more bits than
Packetization(600) for Salesman and 2.8% more for Akiyo. Taking into
consideration what ECC can achieve and packetization cannot, these additional bit
rate increases are negligible.
8.4 Discussion
It has been shown that if the residual error condition is stable and the ECC rate is
properly designed to match it, it is unnecessary to employ Data Partitioning and
RVLC, because the ECC operation corrects all the errors in the bitstream. If the
residual error condition varies, as in mobile situations, employing Data Partitioning
and RVLC is helpful. However, employing RVLC in the ECC approach reduces the
coding efficiency, and for a data rate increase equivalent to that of RVLC, the ECC
power can instead be increased significantly; it has been shown how much the ECC
power increases by employing ECC(11/12) instead of ECC(13/14). Which
technique to choose in practice needs further investigation, and a wise decision
must be based on several factors, including the residual error conditions, video
content, ECC choice and networking protocols.
Obviously, employing IFR with an ECC scheme using soft-decision Viterbi
decoding will further improve the performance of the ECC scheme. Another set of
simulation results, shown in Figure 8.3 and Figure 8.4 and obtained by averaging
the results of 100 individual tests conducted in a residual error condition where the
BER of the residual errors is set to 10-2, further demonstrates how powerful the
proposed ECC enhanced with IFR can be. The soft-decision Viterbi decoding
algorithm is used in this simulation. All the other experiment conditions are the
same as listed in Chapter 7, except that the BER of the residual errors is changed to
10-2. In this simulation, the ECC(7/8) scheme is designed for a situation where the
BER of the residual errors stays at 10-3 most of the time and occasionally increases
to 10-2; this can also result from interleaving at the application layer to cope with
bursty errors and packet loss [9,10]. When the BER of the video bitstreams
increases to 10-2, ECC(7/8) enhanced with IFR still delivers decent video output
for both Salesman and Akiyo, as shown in Figure 8.3 and Figure 8.4, while the
bitstreams protected with packetization/resynchronization are simply undecodable
no matter how large the packet size is set. There is no point in drawing the PSNR
curves of the video outputs reconstructed from the bitstreams protected by the
packetization schemes in these figures, as they drop below 10 dB and are
meaningless. ECC(7/8) increases the bit rate of the final bitstream by 14.29% for all frames
except the first P frame following the I frames, if no Data Partitioning and RVLC are
employed in the ECC scheme. But it does make video transmission possible when the
BER of the residual errors increases to 10-2, which would be otherwise impossible if
resynchronization is employed instead of the ECC scheme.
Another benefit of the segment-based ECC approach is that it can relieve the
channel capacity requirement for the P frame following an Intra frame when IFR is
employed with the ECC scheme. When an ECC scheme is based on frames, as in
Chapter 6 and Chapter 7, the ECC operation is conducted after each video frame is
compressed. When an error occurs within an I frame, the system resources for
encoding and transmitting the picture area from the macroblock in which the error
occurs to the end of the frame are wasted, as this part of the picture cannot be used
by the decoder. As stated in Chapter 7, the decoder needs to send a message
through a back channel to inform the encoder of the start number of the macroblock
in which an error occurred during the decoding process. The encoder then knows
that decoding has not been successful from this start number to the end of the
frame, and so it can encode all the macroblocks associated with those broken
macroblocks in the following frame in Intra mode. However, encoding macroblocks
of a P frame in Intra mode increases the bit rate of that frame. When an ECC
scheme is based on segments, it is possible that the encoding of the Intra frame has
not yet finished when the back channel message arrives at the encoder. In this case
the encoder can stop encoding the rest of the Intra frame and start encoding the
following frame right away, using the channel capacity allocated for the Intra frame
to transmit the bitstream of the next frame, which is a P frame, thus relieving the
channel capacity requirement that the IFR technique imposes on the first P frame
following an I frame.
The fundamental difference between the SEC approach realized with ECC and
traditional schemes is that the SEC approach operates before video decoding, while
traditional approaches operate after it. The SEC operation removes the errors
in the video bitstream before video decoding, whereas traditional approaches accept the
errors and try to conceal, hide or "repair" their effects after
video decoding. That is why SEC is termed an active approach while traditional
approaches are passive. Even without seeing the simulation results, one would expect
the SEC approach to offer a significant advantage over the traditional schemes.
These different approaches toward error resilience also create another feature that
distinguishes an ECC scheme from the packetization approach and other error resilience
approaches: provided the ECC code is properly designed to match the residual error
condition, the performance of ECC depends mainly on the residual error conditions and
on the capability of the ECC to correct the residual errors in the bitstream, and not on
the content of the video sequence. This has been confirmed by the experiments
in this chapter. In contrast, the performance of packetization depends not only
on the packet size and the residual error conditions but also on the
content of the particular video sequence, because the packetization approach
relies heavily on error concealment techniques. That is the main
reason why we have not conducted experiments with video sequences other than
Akiyo and Salesman to demonstrate ECC's superiority over packetization: the
characteristics of Akiyo, with slow movement, and Salesman, with fast movement, are
fairly representative.
References
[1] R. Wells and G. Bartles, “Simplified calculation of likelihood metrics for Viterbi
decoding in partial response systems”, IEEE Trans. Magnetics, vol. 32, no. 5, Pt.
III, September 1996.
[2] R. B. Wells, “Applied Coding and Information Theory for Engineers”, Prentice
Hall, 1999.
[3] B. Vucetic, “An Adaptive Coding Scheme for Time-Varying Channels”, IEEE
transactions on communications, Vol. 39, No. 5, May 1991, pp.653-663.
[4] J. Hagenauer and P. Hoher, “A Viterbi algorithm with soft-decision output and its
applications”, in Proc. IEEE Global Telecommunications Conf. (GLOBECOM),
Dallas, TX, November 1989, pp. 47.1.1-47.1.7.
[5] ISO/IEC 14496-2, “Information Technology – Coding of Audio-Visual Objects:
Visual”, 2001.
[6] ISO/IEC JTC1/SC29/WG11 N3908, “MPEG-4 Video Verification Model” version
18.0, January 2001/Pisa.
[7] Bing Du, Anthony Maeder and Miles Moody, "A new approach for error resilient
in video transmission using ECC", accepted by International Workshop on Very
Low Bit-rate Video, 18-19 September 2003, Madrid, Spain.
[8] Bing Du, Anthony Maeder and Miles Moody, “ECC video with Intra Frame
Relay”, accepted by IADIS International WWW/Internet 2003 Conference,
Algarve, Portugal, November 2003.
[9] Bing Du, M. Ghanbari, “ECC video and its performance in bursty channel errors”,
Proceedings of Iranian Conference on Electrical Engineering (ICEE) 2003, May 6-
8, 2003, Shiraz, Iran.
[10] Bing Du, M. Ghanbari, “ECC video in bursty channel errors and packet loss”,
Proceedings of Picture Coding Symposium (PCS) 2003, Saint-Malo, France, 23 -
25 April 2003. pp.99-103.
[Figure 8.1 Performance of ECC(11/12) for Salesman with random errors: PSNR versus frame number (1-49) for transmission-error free, ECC(11/12) with BER of 10^-4, ECC(11/12) with BER of 10^-3, Packetization(600) with BER of 10^-4, and Packetization(600) with BER of 10^-3.]
[Figure 8.2 Performance of ECC(11/12) for Akiyo with random errors: PSNR versus frame number (1-49) for transmission-error free, ECC(11/12) with BER of 10^-4 and 10^-3, and Packetization(600) with BER of 10^-4 and 10^-3.]
[Figure 8.3 Salesman with BER of 10^-2: PSNR versus frame number (1-49) for the error-free and ECC cases.]
[Figure 8.4 Akiyo with BER of 10^-2: PSNR versus frame number (1-49) for the error-free and ECC cases.]
Table 8-1 Bit number comparison between Packetization(600) and ECC(11/12) for Salesman

Frame    ECC(11/12)   Packetization(600)
0        77896        81168
1        8600         8744
2        14176        14120
3        16320        16256
4        15592        15464
5        12536        12568
6        8888         8872
7        8488         8520
8        9280         9312
9        9272         9328
10       10272        10344
11       9480         9480
12       8472         8456
13       9432         9496
14       10120        10088
15       10200        10080
16       8688         8640
17       9880         9920
18       14472        14320
19       16744        16624
20       14232        14224
21       10712        10904
22       10536        10696
23       9048         9032
24       6440         6440
25       5720         5712
26       7032         7032
27       7976         8024
28       6848         6840
29       5096         5080
30       5928         5912
31       6824         6848
32       7496         7624
33       8280         8416
34       11040        11120
35       13872        13848
36       16000        15920
37       14216        14088
38       14888        14880
39       13384        13416
40       11096        11120
41       11352        11328
42       10728        10648
43       10216        10248
44       10856        10840
45       11488        11600
46       9464         9520
47       8968         8976
48       8360         8432
49       7784         7856
Total    584688       588424
Average  11693.76     11768.48
Table 8-2 Bit number comparison between Packetization(600) and ECC(11/12) for Akiyo

Frame    ECC(11/12)   Packetization(600)
0        46792        50352
1        528          488
2        664          616
3        664          616
4        1120         1112
5        1048         1040
6        1112         1088
7        1296         1272
8        1536         1552
9        1728         1736
10       1624         1632
11       1448         1472
12       2064         2024
13       2016         1976
14       2880         2912
15       4168         4208
16       5760         5840
17       6984         7000
18       7752         7896
19       7616         7704
20       6824         6880
21       6296         6352
22       5224         5232
23       4344         4368
24       4296         4312
25       3176         3184
26       3096         3104
27       3808         3848
28       4808         4816
29       5496         5512
30       6008         6048
31       6056         6112
32       5160         5176
33       4320         4344
34       4664         4688
35       4976         5008
36       5216         5256
37       5080         5128
38       5280         5272
39       5352         5408
40       4832         4840
41       4808         4872
42       4296         4328
43       3736         3752
44       3784         3784
45       4384         4416
46       4760         4760
47       4984         5040
48       5072         5096
49       4480         4504
Total    243416       247976
Average  4868.32      4959.52
Table 8-3 Bit number comparison between Packetization(600) and ECC(9/10) for Salesman

Frame    ECC(9/10)    Packetization(600)
0        79336        81168
1        8760         8744
2        14440        14120
3        16624        16256
4        15880        15464
5        12768        12568
6        9048         8872
7        8640         8520
8        9448         9312
9        9440         9328
10       10464        10344
11       9656         9480
12       8624         8456
13       9600         9496
14       10304        10088
15       10384        10080
16       8848         8640
17       10064        9920
18       14744        14320
19       17048        16624
20       14488        14224
21       10912        10904
22       10728        10696
23       9208         9032
24       6560         6440
25       5824         5712
26       7160         7032
27       8128         8024
28       6968         6840
29       5192         5080
30       6040         5912
31       6952         6848
32       7640         7624
33       8432         8416
34       11248        11120
35       14128        13848
36       16296        15920
37       14480        14088
38       15168        14880
39       13632        13416
40       11304        11120
41       11560        11328
42       10928        10648
43       10400        10248
44       11064        10840
45       11704        11600
46       9640         9520
47       9128         8976
48       8520         8432
49       7928         7856
Total    595480       588424
Average  11909.6      11768.48
Table 8-4 Bit number comparison between Packetization(600) and ECC(9/10) for Akiyo

Frame    ECC(9/10)    Packetization(600)
0        47656        50352
1        536          488
2        680          616
3        680          616
4        1144         1112
5        1072         1040
6        1128         1088
7        1320         1272
8        1568         1552
9        1760         1736
10       1656         1632
11       1480         1472
12       2104         2024
13       2056         1976
14       2936         2912
15       4240         4208
16       5872         5840
17       7112         7000
18       7896         7896
19       7752         7704
20       6952         6880
21       6408         6352
22       5320         5232
23       4424         4368
24       4376         4312
25       3240         3184
26       3152         3104
27       3880         3848
28       4904         4816
29       5592         5512
30       6120         6048
31       6168         6112
32       5256         5176
33       4400         4344
34       4752         4688
35       5072         5008
36       5312         5256
37       5176         5128
38       5384         5272
39       5448         5408
40       4920         4840
41       4904         4872
42       4376         4328
43       3808         3752
44       3848         3784
45       4464         4416
46       4848         4760
47       5080         5040
48       5168         5096
49       4560         4504
Total    247960       247976
Average  4959.2       4959.52
Table 8-5 Bit number comparison between Packetization(600) and ECC(7/8) for Salesman

Frame    ECC(7/8)     Packetization(600)
0        81600        81168
1        9008         8744
2        14848        14120
3        17096        16256
4        16328        15464
5        13128        12568
6        9312         8872
7        8888         8520
8        9720         9312
9        9712         9328
10       10760        10344
11       9928         9480
12       8872         8456
13       9872         9496
14       10600        10088
15       10680        10080
16       9096         8640
17       10352        9920
18       15160        14320
19       17536        16624
20       14904        14224
21       11216        10904
22       11040        10696
23       9472         9032
24       6752         6440
25       5992         5712
26       7360         7032
27       8360         8024
28       7168         6840
29       5344         5080
30       6208         5912
31       7152         6848
32       7856         7624
33       8672         8416
34       11568        11120
35       14528        13848
36       16760        15920
37       14896        14088
38       15600        14880
39       14016        13416
40       11624        11120
41       11888        11328
42       11240        10648
43       10696        10248
44       11376        10840
45       12032        11600
46       9912         9520
47       9392         8976
48       8760         8432
49       8160         7856
Total    612440       588424
Average  12248.8      11768.48
Table 8-6 Bit number comparison between Packetization(600) and ECC(7/8) for Akiyo

Frame    ECC(7/8)     Packetization(600)
0        49016        50352
1        552          488
2        696          616
3        696          616
4        1168         1112
5        1096         1040
6        1160         1088
7        1352         1272
8        1608         1552
9        1808         1736
10       1704         1632
11       1520         1472
12       2160         2024
13       2112         1976
14       3016         2912
15       4360         4208
16       6032         5840
17       7312         7000
18       8120         7896
19       7976         7704
20       7152         6880
21       6592         6352
22       5472         5232
23       4544         4368
24       4496         4312
25       3328         3184
26       3240         3104
27       3984         3848
28       5040         4816
29       5752         5512
30       6288         6048
31       6344         6112
32       5408         5176
33       4528         4344
34       4880         4688
35       5216         5008
36       5456         5256
37       5320         5128
38       5536         5272
39       5608         5408
40       5056         4840
41       5040         4872
42       4496         4328
43       3912         3752
44       3960         3784
45       4592         4416
46       4984         4760
47       5224         5040
48       5312         5096
49       4688         4504
Total    254912       247976
Average  5098.24      4959.52
9 ECC VIDEO IN BURSTY CHANNEL ERRORS AND PACKET LOSS
It has been shown in previous chapters and in [18,19,20,21] that the ECC approach can
realize excellent video transmission in residual error conditions as poor as a BER of
10^-2, when ECC(7/8) enhanced with IFR is employed instead of the packetization
approach of the MPEG-4 video coding standard. However, these results were obtained
for Gaussian channels, so the performance of the proposed ECC approach in bursty
residual error and packet loss situations remains untested. In this chapter, the final
video bitstream is interleaved at the application layer to combat bursty channel errors
and packet losses.
To make the description clearer, we define two concepts here: bursty error and burst
loss. By bursty error, we mean an error condition in which a chunk (or burst) of bits of
a video bitstream is corrupted by errors at a very high BER (for instance 10^-1) during
the burst. The length of a bursty error refers to the length of the burst. By burst loss
(or packet loss), we mean a chunk (or burst) of bits of an encoded video bitstream that
is lost during transmission. The length of a burst loss (or packet loss) refers to the
length of the packet that is lost. If a burst loss occurs, the network needs to insert
dummy data into the bitstream where the loss happened. A burst loss can occur in a
frame or in a segment of a video bitstream.
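The two error conditions just defined can be sketched as simple channel models. The sketch below is illustrative only: the function names and the zero-valued dummy data are our choices, not prescribed by the text.

```python
import random

def apply_bursty_error(bits, burst_len, ber=0.1, start=None):
    """Bursty error: flip each bit inside a burst of length burst_len
    with probability ber (e.g. ber = 0.1 for a BER of 10^-1)."""
    if start is None:
        start = random.randrange(len(bits) - burst_len + 1)
    out = list(bits)
    for i in range(start, start + burst_len):
        if random.random() < ber:
            out[i] ^= 1
    return out

def apply_burst_loss(bits, loss_len, start=None):
    """Burst loss: a run of loss_len bits is lost in transit; the network
    substitutes dummy data (zeros here) so the stream keeps its length."""
    if start is None:
        start = random.randrange(len(bits) - loss_len + 1)
    out = list(bits)
    out[start:start + loss_len] = [0] * loss_len
    return out
```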
9.1 Performance of the original ECC approach with Bursty Residual Errors
To evaluate the performance of the original ECC schemes under bursty error conditions,
compared with the packetization approaches of the MPEG-4 standard [14,15], two
experiments are conducted using the video sequence Salesman. Because the goal of the
experiments is to test the performance of the ECC scheme with bursty errors, and
inter-frame error propagation effects can complicate the analysis, the effects of
inter-frame error propagation have been excluded. The first experiment is conducted
on the I frame only, while the other is conducted on the P frame following the first I
frame. In the first experiment, only one frame is encoded, in Intra mode, with the
length of the bursty error set to different values; in the second experiment only two
frames are encoded, the first in Intra mode followed by a P frame, and only the P frame
is corrupted by errors, again with the bursty error length set to different values. The
final results, represented by PSNRs, are the averages over 100 individual tests for each
bursty error length. The experiments are based on the following conditions:
1. The packet size of the encoded video sequences is 450 bits when packetization is used.
2. When ECC is employed, the rate-1/2 base convolutional code (561, 752), which has a
   constraint length of K = 9, is chosen. This base code is punctured to rate 9/10,
   which means that for every 9 bits in the encoded bitstream, one additional bit is
   added after convolutional encoding.
3. The error-corrupted convolutionally encoded bitstream is decoded using the
   soft-decision [9,10,11] Viterbi decoding algorithm with a trellis depth of 11 x K.
4. Data Partitioning and RVLC are employed in both the ECC and packetization
   experiments.
5. The same quantization parameters are used in all experiments, which means that
   correctly decoded bitstreams protected using ECC or packetization should have the
   same visual quality for the same video sequence in error-free environments.
6. In each test, the I or P frame of the encoded bitstream is randomly corrupted by a
   single burst of bursty errors (the start position of the burst within the frame is
   uniformly distributed). The BER within the burst is 10^-1.
7. After the corrupted bitstreams are decoded, the erroneous motion vectors and texture
   information are replaced by zeros, which means that when the motion vectors are not
   available, motion compensation is implemented using the motion vectors at the same
   position in the previous frame, and when the texture information is not available,
   the block in question is reconstructed using the texture information of the blocks
   located by the motion vectors.
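Condition 2 can be illustrated with a small sketch of the rate-1/2 (561, 752) encoder followed by a puncturing step. The particular puncturing pattern below is illustrative only; the exact pattern used in the experiments is not specified in the text.

```python
from itertools import cycle

G1, G2, K = 0o561, 0o752, 9   # generator polynomials (octal) and constraint length

def conv_encode(bits):
    """Rate-1/2 convolutional encoder: two coded bits per input bit."""
    state, out = 0, []
    for b in bits:
        state = ((state << 1) | b) & ((1 << K) - 1)
        out.append(bin(state & G1).count("1") % 2)
        out.append(bin(state & G2).count("1") % 2)
    return out

def puncture(coded, pattern):
    """Delete the coded bits marked 0 in the periodically repeated pattern."""
    return [c for c, keep in zip(coded, cycle(pattern)) if keep]

# Rate 9/10: 9 data bits produce 18 coded bits, of which 10 are kept.
PATTERN_9_10 = [1, 1] + [1, 0] * 8   # illustrative: 10 ones in 18 positions
```

Applied to 90 data bits, the encoder emits 180 coded bits and the puncturer keeps 100 of them, giving the 9/10 rate of condition 2.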
The final results are shown in Figure 9.1 and Figure 9.2. From the figures it can be
seen that, for the I frame, the performance of both the ECC and packetization schemes
is satisfactory when the length of the bursty errors is less than 40 bits. When the
length of the bursty errors increases further, the performance of the ECC scheme
declines rapidly, while the packetization approach still delivers quite good output.
For the P frame, however, the results are reversed; i.e. the ECC scheme is marginally
better than the packetization approach. But the performance of both the ECC approach
and the packetization scheme decreases rapidly as the length of the bursty errors
increases.
[Figure 9.1 PSNR of I picture with bursty errors: PSNR versus bursty error length (0-100 bits) for ECC(9/10) and Packetisation(480).]
[Figure 9.2 PSNR of P frame with bursty errors: PSNR versus bursty error length (0-100 bits) for ECC(9/10) and Packetisation(480).]
Obviously, for normal video communication, where most picture frames are encoded as P
frames following a small number of I frames, neither the original ECC approach nor the
packetization approach is good enough to cope with bursty errors. This conclusion calls
for a new error resilience tool to cope with bursty errors and packet loss. In the
following sections, the encoded video bitstream is interleaved after ECC is performed.
The results obtained by simulation are quite promising.
9.2 ECC Video with Interleaving
In this new scheme, an additional operation called interleaving is performed after the
compressed video bitstream has been further encoded using the punctured convolutional
code. A video communication system employing ECC with interleaving is shown in Figure
9.3, and the operation performed by the interleaver is shown in Figure 9.4. The
principle of interleaving is to spread a bursty error over a wide range by reordering
the video data that has gone through compression and ECC encoding, making it easier for
the convolutional decoder to correct the errors in the bitstream. More detailed
discussion of interleaving for convolutional decoding can be found in [12,13]. In our
experiments, m is chosen to be equal to the length of the bursty errors or the burst loss.
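The reordering of Figure 9.4 amounts to a standard n x m block interleaver: bits are written into the array column by column and read out row by row, so a burst of up to m consecutive channel errors is separated into single errors n positions apart after deinterleaving. A minimal sketch:

```python
def interleave(bits, n, m):
    """Write n*m bits into n rows column by column, read out row by row,
    as in Figure 9.4: row r carries bits r, n+r, 2n+r, ..., (m-1)n+r."""
    assert len(bits) == n * m
    return [bits[c * n + r] for r in range(n) for c in range(m)]

def deinterleave(bits, n, m):
    """Invert interleave(): write row by row, read out column by column."""
    assert len(bits) == n * m
    out = [None] * (n * m)
    it = iter(bits)
    for r in range(n):
        for c in range(m):
            out[c * n + r] = next(it)
    return out
```

After deinterleaving, a burst of m consecutive corrupted bits lands on positions n apart in the coded stream, which is exactly the kind of scattered error pattern a convolutional decoder handles well.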
[Figure 9.3 Video communication system with ECC and interleaving: Source -> Source Encoder -> ECC Encoder -> Interleaver -> Channel Encoder -> Channel -> Channel Decoder -> Deinterleaver -> ECC Decoder -> Source Decoder -> Display.]
Obviously, the interleaving operation for coping with bursty errors and burst loss can
only be effective when applied to a bitstream protected with ECC. It can only make
things worse if applied to the packetization approach: packetization has no capability
to correct errors in a bitstream, and spreading bursty errors over a wide range simply
means that more picture area will be affected by the errors when a packetization
approach is used.
[Figure 9.4 Interleaver for coded data: an array of n rows and m columns. Coded bits from the convolutional encoder are written in column by column, so that row r holds bits r, n+r, 2n+r, ..., (m-1)n+r; the array is then read out to channel coding row by row.]
9.3 Simulation Results
9.3.1 ECC video with bursty errors
To evaluate the effectiveness of the new proposal, tests are conducted with the video
sequence Salesman. The encoded video bitstreams are protected using ECC enhanced with
interleaving and packetization respectively. The test conditions are the same as those
listed in Section 9.1, except that the ECC base code is also punctured to rate 7/8 in
addition to the 9/10 rate when the ECC approaches are employed. The packet size is set
to 380 bits and 450 bits respectively when the packetization schemes are employed. The
bit rate of the bitstream employing Packetization(380) is roughly equal to that of the
bitstream employing ECC(7/8), as shown in Table 9-4.

The length of the segments for ECC coding is set to the average frame length of the
encoded bitstream over the 50 frames. The overhead in the final bitstream employing
Packetization(450) is more than that caused by ECC(9/10); see Table 9-2 and Table 9-3.
In each test, each segment is randomly corrupted by one burst of bursty errors after
the compressed video bitstream is either further encoded using an ECC scheme or
packetized with a resynchronization approach. The length of the bursty errors is fixed
at 360 bits with the BER set to 10^-1 during the burst, which roughly corresponds to a
10 ms transmission interval for this particular video sequence Salesman if the final
bit rate of the bitstream is 36 kbps; this is the toughest test condition used to
evaluate the performance of error resilience tools in MPEG-4 during its standardization
process [15]. The results are shown in Figure 9.5.
In the bursty error situation, the superiority of the ECC scheme enhanced with
interleaving over the packetization approach is clearly seen in Figure 9.5. Throughout
the sequence, the bitstream employing ECC(9/10) delivers a very good video output while
producing fewer bits than the bitstream employing Packetization(450). With a marginal
increase in overhead, ECC(7/8) delivers an excellent reconstructed video output, only
1 dB lower than in the transmission-error-free situation.
[Figure 9.5 Performance of Salesman with bursty errors: PSNR versus frame number (1-49) for error-free, ECC(7/8), ECC(9/10) and Packetization(380).]
Note: Because the packetization approach is unable to deliver a recognizable video output
under the same test conditions, we have set different test conditions for ECC and
packetization respectively. The bitstream employing ECC is corrupted by a burst in every
segment, which means there are around 8 bursts in the first frame, while the first frame
(I frame) of the bitstream employing packetization is corrupted by only one burst.
Under the test conditions specified above, neither Packetization(380) nor
Packetization(450) is able to deliver a viewable video output. To make a comparison
biased in favor of the packetization approach, another test is conducted in which the
first frame (an I frame) is corrupted by only one burst of bursty errors when
packetization is employed, whereas the I frame is corrupted by 8 bursts when the ECC
scheme is used. The results with packetization are still disappointing, as shown in
Figure 9.5: only the first two frames are viewable, with the rest rapidly declining to
unrecognizable. Though not depicted in Figure 9.5, our results also reveal that
reducing the packet size from 450 bits to 380 bits does not bring much improvement in
combating the bursty residual errors. The reason, as stated in Chapter 6, is that there
is more chance of the packet header and the DC or motion markers being corrupted as the
packet size gets smaller. Because the length of the bursty error is 360 bits, there is
no point in reducing the packet size below 360 bits; otherwise it is certain that one
of the packet headers will be corrupted, and consequently the whole packet will have to
be discarded.
9.3.2 ECC video with burst loss in a GPRS network
GPRS [1,2,3,4] is an end-to-end mobile packet radio communication system based on the
same radio architecture as GSM. The capability of multiple timeslot allocation in GPRS
networks effectively increases the throughput of a single terminal, which makes video
transmission over GPRS networks realistic [5,6,7].

GPRS radio blocks are arranged into GSM bursts for transmission across the radio
interface, where the Physical Link Layer is responsible for forward error detection and
correction. GPRS data is transmitted over the Packet Data Traffic Channel (PDTCH) and
is protected by four different channel coding schemes. CS-1, CS-2 and CS-3 [8] use
convolutional codes and block check sequences of differing strengths so as to give
different rates. CS-4 [8], on the other hand, provides only error detection
functionality and is certainly not good enough to be employed for video transmission.
The details of the four channel coding schemes are listed in Table 9-1.
Table 9-1 GPRS Channel Coding Schemes

Scheme   Code Rate   Radio Block (bits)   Data Rate (kb/s)
CS-1     1/2         181                  9.05
CS-2     2/3         268                  13.4
CS-3     3/4         312                  15.6
CS-4     1           428                  21.4
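The data rates in Table 9-1 follow from the radio block sizes, given that GPRS transmits one radio block every 20 ms (50 blocks per second); the block period is standard GPRS timing and is our addition, not stated in the table itself.

```python
BLOCKS_PER_SECOND = 50   # one GPRS radio block every 20 ms

radio_block_bits = {"CS-1": 181, "CS-2": 268, "CS-3": 312, "CS-4": 428}
data_rate_kbps = {cs: bits * BLOCKS_PER_SECOND / 1000
                  for cs, bits in radio_block_bits.items()}
# reproduces the Data Rate column of Table 9-1
```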
In our experiment, we assume that CS-1 is used. The goal of the experiment is to test
the performance of the ECC approach enhanced with interleaving when there is a burst
loss in every segment, corresponding to a loss of 181 bits per segment. Again, the
length of the segment for ECC coding is set to the average frame length of the final
video bitstream over the 50 frames. When the network detects a burst loss, dummy data
is inserted into the bitstream where the loss occurs. In each test, the same base
convolutional code (561, 752) is chosen. To obtain decent reconstructed video output,
the base convolutional code has to be punctured to rate 5/6, which means that one bit
is added for every 5 bits in the bitstream. When the packetization schemes are
employed, the packet size is chosen as 250, 350 and 450 bits respectively.
[Figure 9.6 Performance of Salesman with burst loss: PSNR versus frame number (1-49) for error-free, ECC(5/6) and Packetization(250).]
Note: Because the packetization approach is unable to deliver a recognizable video output
under the same test conditions as the ECC scheme, we have set different test conditions
for ECC and packetization respectively. Only one burst is lost in the first frame of the
bitstream employing the packetization scheme, while 8 bursts are lost during transmission
in the bitstream employing the ECC approach.
The results are shown in Figure 9.6. Table 9-5 and Table 9-6 compare the number of bits
between the different ECC schemes and the packetization approaches with different packet
sizes. Again, the packetization approaches fail to deliver a viewable reconstructed
video output regardless of the packet size under the specified test conditions. As in
the bursty error situations, to make the comparison biased in favor of the packetization
approaches, more favorable test conditions are set for packetization: only one burst is
lost in the first frame (the I frame) during transmission when packetization is
employed, while nearly 8 bursts are lost in the first frame when ECC and interleaving
are employed. The results are depicted in Figure 9.6. Still the packetization approach
is disappointing, as only the first two frames are viewable, with the rest rapidly
declining to unrecognizable.
In such poor residual error conditions, ECC(5/6) results in a 20% increase in overhead
in the final bitstream compared with the basic bitstream (here we define the basic
bitstream as one in which no error resilience tool, including packetization, Data
Partitioning, RVLC and ECC, is employed). This does not look very efficient, but under
such tough transmission conditions the ECC scheme does make video communication
possible, whereas it is impossible with the packetization approaches. The simulations
also reveal that reducing the packet size from 450 bits to 250 bits, which results in
the same bit rate as ECC(5/6), does not bring much improvement in the reconstructed
video output when the packetization schemes are employed. The reason is the same as in
the bursty error situation: reducing the packet size introduces too many markers, which
introduce too much vulnerability.
It is interesting to note that the packetized video output with burst loss in the last
experiment performs better than the packetized video output with bursty errors in the
previous experiments, at least for the first few picture frames. This should not be
surprising when one recalls that the burst loss has a length of 181 bits while the
bursty errors have a length of 360 bits, and that the packet size in the previous
experiment is 380 bits while in the last experiment it is 250 bits.
9.4 Discussion
In this chapter, interleaving following the ECC operation has been proposed to combat
bursty errors and packet loss in the final video bitstream. The interleaving is
performed within each segment of the final bitstream. It has been shown that SEC
achieved with ECC and enhanced with interleaving is more effective than the
packetization approach, not only in a Gaussian channel but also in more challenging
bursty error channels. It also copes with packet loss very well. It can therefore be
used to realize video communication in harsh environments where it would be impossible
with the packetization approach.
Unlike first error control, where diverse, mature hybrid ARQ schemes are available, it
is not realistic for SEC to employ an ARQ technique when ECC fails. SEC therefore
depends mainly on the proper design of an ECC scheme for the worst-case channel
conditions. Designing the ECC for the worst residual error conditions does not seem
very efficient when the residual error condition is good. But in our experience, a
modest increase in ECC redundancy improves the error-correcting capability
significantly. For instance, when the punctured convolutional code (561, 752) is
changed from rate 13/14 to rate 11/12, its capability to correct errors increases
dramatically. So even when the ECC is designed for the worst residual error conditions,
it is still quite efficient.
When a back channel from decoder to encoder is available, employing the IFR technique
can significantly improve the performance of video transmission under both bursty
errors and burst loss. More detail on IFR can be found in Chapter 7. From the
simulation results in Chapter 7, it is reasonable to expect that the PSNRs of the
Salesman video outputs reported in the last section, under both bursty errors and packet
loss, could be further improved to over 30 dB if IFR were employed in the simulation.
Another possibility for achieving more efficiency is to design a dynamic ECC scheme
that follows changes in the residual error condition at the application layer, if the
fading period of the channel lasts longer than one segment. This requires the
availability of back channel messages from the decoder.
One disadvantage of ECC with interleaving is that it introduces a decoding delay of one
segment, because the ECC operation and interleaving are based on each segment of the
final encoded video bitstream. To reduce the decoding delay, the segment needs to be
small; however, to increase the effectiveness of interleaving, the segment needs to be
longer. In practice a compromise must be made between these two contrary effects. In
our experiments the segment length has been set to the average length of a video frame.
It should be emphasized that ECC alone, based on segments, does not introduce any
transmission or decoding delay; it is only the interleaving that introduces the delay.
A packetization scheme also introduces a decoding delay of one packet. To reduce that
decoding delay, the packet should not be long; however, reducing the packet size
reduces the coding efficiency significantly. A compromise is also needed if a
packetization scheme is employed in practice.
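To put the delay trade-off in concrete terms, here is a rough back-of-the-envelope calculation combining two figures that appear in this work (the average Salesman frame length from Table 8-1 and the 36 kbps rate of Section 9.3.1); pairing them this way is our illustration, not a measurement from the text.

```python
# Interleaving delay of one segment when the segment length equals the
# average frame length and the channel runs at 36 kbps.
segment_bits = 11694        # ~ average Salesman frame length (Table 8-1)
bit_rate_bps = 36_000       # final bit rate assumed in Section 9.3.1
delay_ms = segment_bits / bit_rate_bps * 1000
print(f"per-segment interleaving delay: {delay_ms:.0f} ms")  # about 325 ms
```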
It has been identified that ECC based on video frames has its disadvantages (see
Chapter 8). However, it has advantages too: the interleaving operation based on each
frame of the bitstream is more effective than one based on segments, especially for an
I frame, if the delay requirement is not crucial. How the technique is employed in
practice needs to be flexible. When the main concern is bursty errors or packet loss
and the delay requirement is not critical, as in a real-time downloading application in
an ATM network [16,17], the ECC operation can be conducted on each video frame to
stretch the effectiveness of interleaving to its limit. Of course, IFR and NEWPRED
should also be employed with ECC and interleaving when a back channel is available, to
recover the decoding operation if an error occurs in the bitstream, especially when the
error occurs in a PSC (Picture Start Code). The simulation results shown in Figure 9.7
and Figure 9.8 demonstrate how powerful a different interleaving scheme can be. In
these simulations the interleaving is conducted on each frame instead of on each
segment, under the same test conditions as in Section 9.3 except that here there is a
bursty error or packet loss in each frame. Compared with the performance of
packetization under bursty errors and packet loss, the achievement of ECC with
interleaving is substantial.
The length of the bursty errors and the length of the packet loss are set to 360 bits
and 181 bits respectively in the simulations in this chapter. If these lengths are
increased further, the packetization approaches may become more competitive with the
ECC schemes. Theoretically, a packetization approach may be able to regain
synchronization within a frame when an error occurs, while the ECC approaches may
fail as the burst length or loss length grows. However, by the time the burst or
loss is long enough that the decoder can only recover by regaining synchronization
within a frame through a packetization scheme, the video output is already
unacceptable: few viewers would accept an output with large empty stripes across the
screen. In this sense, regaining synchronization within a frame does not help much
once the burst or loss is sufficiently long. It has already been shown that even
with a burst length of only 360 bits, the reconstructed video output rapidly
degrades to the point of being unrecognizable.
References
[1] J. Cai and D. J. Goodman, “General Packet Radio Service in GSM”, IEEE
Communications Magazine, vol.35, no.10, October 1997, pp. 122-131.
[2] G. Brasche and B. Walke, “Concepts, Services, and Protocols of the New GSM
Phase 2+ General Packet Radio Service”, IEEE Communications Magazine, vol.35,
no.8, August 1997, pp. 94-104.
[3] GSM 03.60 Digital Cellular Telecommunications System, General Packet Radio
Service (GPRS), Service description, Stage 2, 1997.
[4] GSM 03.64 Digital Cellular Telecommunications System, General Packet Radio
Service (GPRS), Overall description of the GPRS radio interface, Stage 2, 1997.
[5] Bing Du, A. Maeder and M. Moody, “A framework for live video delivery over
GPRS networks”, Proc. AMOC 2000, November 2000, Penang, Malaysia.
[6] Bing Du, A. Maeder and M. Moody, “Video delivery over mobile communication
channels”, Presentation at CRC-SS annual conference, Adelaide, Australia, 2000.
[7] Bing Du and Anthony Maeder, “Approaches to Video Transmission over GSM
Networks”, Proc. SAICSIT 99, South Africa.
[8] GSM 05.03 Digital Cellular Telecommunications System; Channel Coding, 1999.
[9] J. G. Proakis, “Digital Communications”, McGraw-Hill, 1995.
[10] L. H. Charles Lee, “Convolutional Coding – Fundamentals and applications”,
Artech House, 1997.
[11] R. B. Wells, “Applied Coding and Information Theory for Engineers”, Prentice-
Hall, 1999.
[12] J. L. Ramsey, “Realization of Optimum Interleavers”, IEEE Trans. Inform. Theory,
Vol. IT-16, 1970, pp. 338-345.
[13] G. D. Forney, Jr., “Burst Correcting Codes for the Classic Bursty Channel”, IEEE
Trans. Commun. Tech., vol. COM-19, October 1971, pp. 772-781.
[14] ISO/IEC 14496-2, “Information Technology – Coding of Audio-Visual Objects:
Visual”, 2001.
[15] ISO/IEC JTC1/SC29/WG11 N3908, “MPEG-4 Video Verification Model” version
18.0, January 2001/Pisa.
[16] Marc Boisseau, “An Introduction to ATM Technology”, International Thomson
Publishing, October 1995.
[17] Uyless Black, “ATM: Foundation for Broadband Networks”, Prentice Hall,
December 1998.
[18] Bing Du, Anthony Maeder and Miles Moody, “A new approach for error resilient
in video transmission using ECC”, accepted by International Workshop on Very
Low Bit-rate Video, 18-19 September 2003, Madrid, Spain.
[19] Bing Du, Anthony Maeder and Miles Moody, “ECC video with Intra Frame
Relay”, accepted by IADIS International WWW/Internet 2003 Conference,
Algarve, Portugal, November 2003.
[Figure: PSNR (dB) versus frame number for Salesman in bursty errors; curves: ECC with Interleaving, Packetisation, Error Free]
Figure 9.7 Performance of Salesman with bursty errors (the interleaving is based on frame)
[Figure: PSNR (dB) versus frame number for Salesman with burst loss; curves: ECC, Packetisation, Error Free]
Figure 9.8 Performance of Salesman with burst loss (the interleaving is based on frame)
Table 9-2 Bit number comparison between Packetization(450) and ECC (9/10) for Salesman
Frame    ECC (9/10)   Packetization (450)
0        79336        82880
1        8760         8856
2        14440        14344
3        16624        16536
4        15880        15760
5        12768        12848
6        9048         9088
7        8640         8704
8        9448         9480
9        9440         9504
10       10464        10504
11       9656         9640
12       8624         8640
13       9600         9704
14       10304        10376
15       10384        10272
16       8848         8792
17       10064        10032
18       14744        14552
19       17048        16848
20       14488        14464
21       10912        11080
22       10728        10824
23       9208         9200
24       6560         6544
25       5824         5784
26       7160         7160
27       8128         8112
28       6968         6984
29       5192         5208
30       6040         6048
31       6952         7008
32       7640         7728
33       8432         8488
34       11248        11344
35       14128        14056
36       16296        16224
37       14480        14424
38       15168        15048
39       13632        13648
40       11304        11360
41       11560        11504
42       10928        10784
43       10400        10392
44       11064        11040
45       11704        11848
46       9640         9776
47       9128         9144
48       8520         8616
49       7928         7968
Total    595480       599168
Average  11909.6      11983.36
Table 9-3 Bit number comparison between Packetization(450) and ECC (7/8) for Salesman
Frame    ECC (7/8)    Packetization (450)
0        81600        82880
1        9008         8856
2        14848        14344
3        17096        16536
4        16328        15760
5        13128        12848
6        9312         9088
7        8888         8704
8        9720         9480
9        9712         9504
10       10760        10504
11       9928         9640
12       8872         8640
13       9872         9704
14       10600        10376
15       10680        10272
16       9096         8792
17       10352        10032
18       15160        14552
19       17536        16848
20       14904        14464
21       11216        11080
22       11040        10824
23       9472         9200
24       6752         6544
25       5992         5784
26       7360         7160
27       8360         8112
28       7168         6984
29       5344         5208
30       6208         6048
31       7152         7008
32       7856         7728
33       8672         8488
34       11568        11344
35       14528        14056
36       16760        16224
37       14896        14424
38       15600        15048
39       14016        13648
40       11624        11360
41       11888        11504
42       11240        10784
43       10696        10392
44       11376        11040
45       12032        11848
46       9912         9776
47       9392         9144
48       8760         8616
49       8160         7968
Total    612440       599168
Average  12248.8      11983.36
Table 9-4 Bit number comparison between Packetization(380) and ECC(7/8) for Salesman
Frame    ECC (7/8)    Packetization (380)
0        81600        84248
1        9008         9032
2        14848        14704
3        17096        16944
4        16328        16072
5        13128        13096
6        9312         9232
7        8888         8920
8        9720         9656
9        9712         9736
10       10760        10704
11       9928         9824
12       8872         8856
13       9872         9856
14       10600        10560
15       10680        10512
16       9096         8968
17       10352        10272
18       15160        14800
19       17536        17312
20       14904        14744
21       11216        11320
22       11040        11112
23       9472         9512
24       6752         6640
25       5992         5928
26       7360         7360
27       8360         8256
28       7168         7144
29       5344         5288
30       6208         6248
31       7152         7184
32       7856         7880
33       8672         8712
34       11568        11656
35       14528        14304
36       16760        16480
37       14896        14696
38       15600        15416
39       14016        13888
40       11624        11592
41       11888        11832
42       11240        11072
43       10696        10632
44       11376        11296
45       12032        12104
46       9912         10000
47       9392         9344
48       8760         8752
49       8160         8192
Total    612440       611888
Average  12248.8      12237.76
Table 9-5 Bit number comparison between Packetization(450) and ECC(5/6) for Salesman
Frame    ECC (5/6)    Packetization (450)
0        85680        82880
1        9456         8856
2        15592        14344
3        17952        16536
4        17144        15760
5        13784        12848
6        9776         9088
7        9328         8704
8        10208        9480
9        10192        9504
10       11296        10504
11       10424        9640
12       9312         8640
13       10368        9704
14       11128        10376
15       11216        10272
16       9552         8792
17       10864        10032
18       15920        14552
19       18416        16848
20       15648        14464
21       11776        11080
22       11584        10824
23       9944         9200
24       7088         6544
25       6288         5784
26       7728         7160
27       8776         8112
28       7528         6984
29       5608         5208
30       6520         6048
31       7504         7008
32       8248         7728
33       9104         8488
34       12144        11344
35       15256        14056
36       17600        16224
37       15640        14424
38       16376        15048
39       14720        13648
40       12200        11360
41       12480        11504
42       11800        10784
43       11232        10392
44       11944        11040
45       12632        11848
46       10408        9776
47       9856         9144
48       9200         8616
49       8560         7968
Total    643000       599168
Average  12860        11983.36
Table 9-6 Bit number comparison between Packetization(250) and ECC(5/6) for Salesman
Frame    ECC (5/6)    Packetization (250)
0        85680        88376
1        9456         9576
2        15592        15432
3        17952        17600
4        17144        16968
5        13784        13704
6        9776         9704
7        9328         9368
8        10208        10176
9        10192        10216
10       11296        11256
11       10424        10328
12       9312         9304
13       10368        10312
14       11128        11080
15       11216        10984
16       9552         9416
17       10864        10720
18       15920        15544
19       18416        18008
20       15648        15592
21       11776        11968
22       11584        11600
23       9944         9976
24       7088         7056
25       6288         6264
26       7728         7688
27       8776         8632
28       7528         7472
29       5608         5576
30       6520         6552
31       7504         7560
32       8248         8336
33       9104         9160
34       12144        12160
35       15256        14984
36       17600        17408
37       15640        15448
38       16376        16184
39       14720        14584
40       12200        12336
41       12480        12376
42       11800        11608
43       11232        11168
44       11944        11784
45       12632        12640
46       10408        10480
47       9856         9880
48       9200         9264
49       8560         8616
Total    643000       642424
Average  12860        12848.48
10 ECC WITH PACKETIZATION
So far we have seen that in some extreme situations, the packetization approach is
unable to deliver a decent video output when decoding an error-corrupted video
bitstream, while the ECC scheme can always do so, albeit with a certain amount of
overhead. However, it may be unrealistic to increase the ECC power simply by
lowering the ECC rate, due to the limitation of the available channel capacity. It
is therefore interesting to investigate the possibility of combining the advantages
of the ECC and packetization approaches.
10.1 Combination of ECC and Packetization
The scheme combining ECC and packetization is quite straightforward: a
packetization scheme is employed first, followed by an ECC scheme applied to
segments of the final compressed video bitstream. The allocation of bits between
ECC and packetization should be optimized. The packet size should not be too small,
otherwise the overhead in the bitstream increases dramatically and reduces the
coding efficiency; nor should it be too big, otherwise the advantage of
packetization cannot be realized.
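The ordering described above can be sketched as follows; the helper names and the toy repetition "encoder" are hypothetical stand-ins for illustration only, not the thesis's actual rate-7/8 punctured convolutional encoder.

```python
# Hypothetical sketch of the combined scheme's ordering: packetization is
# applied inside the encoder first, and the resulting bitstream is then
# split into segments that are each protected with ECC.

def split_segments(bitstream, seg_len):
    """Split the packetized bitstream into fixed-length segments
    (the last segment may be shorter)."""
    return [bitstream[i:i + seg_len] for i in range(0, len(bitstream), seg_len)]

def protect(bitstream, seg_len, ecc_encode):
    """Apply segment-based ECC to an already-packetized bitstream.
    `ecc_encode` is a placeholder for, e.g., a punctured convolutional
    encoder."""
    return [ecc_encode(seg) for seg in split_segments(bitstream, seg_len)]

# Toy stand-in for an ECC encoder: a repetition code, used here only so
# the sketch runs end to end.
segments = protect([1, 0, 1, 1, 0, 0], seg_len=3, ecc_encode=lambda s: s + s)
```

The key design choice is that packetization adds resynchronization points before ECC, so any residual errors the ECC cannot correct are still confined by the packet structure.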
10.2 Simulation Results
In this experiment, the compressed video sequence Salesman is protected with three
different schemes in a packet loss situation: ECC(5/6), ECC(7/8), and ECC(7/8) plus
Packetization(5000). The packet loss is the same as in the last chapter, i.e. there
is one burst loss in every segment, corresponding to a loss of 181 consecutive bits
per segment. The segment length is set to the average frame length of the 50
compressed frames.
The packet size is set to 5000 bits, which means there are 2 or 3 packets in a P
frame on average, and about 17 packets in an I frame. The result is shown in
Figure 10.1, and the bit budget for each scheme is listed in Table 10-1. We already
know from the last chapter that packetization alone fails to deliver a viewable
video output. Figure 10.1 shows clearly that ECC(7/8) alone is also unsatisfactory.
Both ECC(5/6) and the combination of ECC(7/8) with packetization deliver
satisfactory output, but the combination approach achieves a marginally better PSNR
than ECC(5/6) alone while using fewer bits, which means the combination approach is
more efficient and effective than ECC(5/6) alone. This result provides a fresh view
of the packetization approach.
[Figure: PSNR (dB) versus frame number for Salesman with ECC plus packetization; curves: ECC (7/8) plus Packetization (5k), ECC (5/6), ECC (7/8)]
Figure 10.1 PSNR of ECC combined with packetization
From previous chapters it can be concluded that the ECC approach alone is better
than the packetization approach alone. However, the simulation above shows that
although packetization is not as effective as ECC, it can be used to increase the
performance of ECC at some cost in coding rate.
This raises two questions: when is it beneficial to use the combination of ECC and
packetization rather than ECC alone, and how should the available bit rate be
distributed between packetization and ECC when a combination is employed? These
questions remain as future work.
Adding redundancy to the video bitstream at the application layer in different ways
(resynchronization/packetization versus ECC) yields different degrees of error
resilience. The results in this work were obtained only by simulation; the
performance of these two fundamentally different approaches to error resilience
still needs to be tested and compared in a real-world field implementation.
When ECC is combined with packetization, IFR should not be used, as IFR is based on
the assumption that a whole frame contains only one packet, i.e. that packetization
is not used. In fact, when a back channel is available, ECC enhanced with IFR is
much more effective and efficient than ECC combined with packetization, as shown in
previous chapters and this chapter. When a back channel is not available, ECC with
packetization is an option.
So far, three error resilience tools, ECC, IFR and Interleaving, have been developed
based on the SEC approach, in addition to the error resilience tools in the MPEG-4
standard. Each tool has its own advantages and disadvantages. Generally the ECC
approach is superior to the packetization approach, as the former is active and the
latter passive. The tools can be used alone or in combination with others. How, and
in which combination, they are employed in practice needs to be flexible and
optimized according to each application's characteristics and the transmission error
patterns.
Table 10-1 Bit number comparison when ECC and Packetization are combined
Frame    ECC(7/8)+Pack(5K)  Packetization(450)  ECC(5/6)  ECC(7/8)
0        86568              82880               85680     81600
1        9248               8856                9456      9008
2        15184              14344               15592     14848
3        17376              16536               17952     17096
4        16552              15760               17144     16328
5        13384              12848               13784     13128
6        9464               9088                9776      9312
7        9032               8704                9328      8888
8        9888               9480                10208     9720
9        9888               9504                10192     9712
10       10952              10504               11296     10760
11       10104              9640                10424     9928
12       9024               8640                9312      8872
13       10080              9704                10368     9872
14       10744              10376               11128     10600
15       10816              10272               11216     10680
16       9200               8792                9552      9096
17       10488              10032               10864     10352
18       15296              14552               15920     15160
19       17904              16848               18416     17536
20       15184              14464               15648     14904
21       11456              11080               11776     11216
22       11272              10824               11584     11040
23       9640               9200                9944      9472
24       6888               6544                7088      6752
25       6072               5784                6288      5992
26       7480               7160                7728      7360
27       8528               8112                8776      8360
28       7296               6984                7528      7168
29       5368               5208                5608      5344
30       6320               6048                6520      6208
31       7328               7008                7504      7152
32       8048               7728                8248      7856
33       8872               8488                9104      8672
34       11840              11344               12144     11568
35       14856              14056               15256     14528
36       17032              16224               17600     16760
37       15080              14424               15640     14896
38       15824              15048               16376     15600
39       14336              13648               14720     14016
40       11840              11360               12200     11624
41       12104              11504               12480     11888
42       11336              10784               11800     11240
43       10832              10392               11232     10696
44       11528              11040               11944     11376
45       12304              11848               12632     12032
46       10112              9776                10408     9912
47       9552               9144                9856      9392
48       8944               8616                9200      8760
49       8312               7968                8560      8160
Total    626776             599168              643000    612440
Average  12535.52           11983.36            12860     12248.8
11 CONCLUSIONS AND FUTURE WORK
This dissertation has explored the possibility of optimizing the utilization of shared
scarce radio channels for live video transmission over a GSM network in the first three
chapters, then concentrated on realizing error resilient video communication in
unfavorable channel conditions, especially in mobile radio channels.
11.1 Optimized utilization of radio channels
To improve the utilization of the scarce radio channel resources for live video
transmission over a GPRS network, the necessary modifications to the current network
protocols have been identified and several suggestions on how they can be achieved
have been proposed. The most important contribution is the proposed new method of
updating the channel capacity to accommodate the different data rate requirements of
the different frame types of the compressed video bitstream during a live video
communication session. The reconfiguration of the multi-slot allocation to exceed
the current set of active channels should be achieved by communication between the
MS and the BSS, rather than by re-accessing the PRACH during real-time transmission,
which would involve further contention. The content of this communication should be
embedded in the video data transmitted from the MS to the BSS.
11.2 The proposed error resilience video coding tools
To cope with residual transmission errors, a new concept for error resilience
employing Second Error Control (SEC) has been introduced. The simulation results
from our research have demonstrated the success of the SEC concept, which opens a
new direction in the field of video coding and transmission and other real-time
communications.
Throughout the previous chapters, three error resilient video coding tools, ECC, IFR
and Interleaving, have been proposed based on the SEC approach. ECC can be used in
random error situations; enhanced with Interleaving, it can also be used in bursty
error and packet loss situations. IFR can be used in either case, when a back
channel is available, to further improve the performance of ECC. Compared with
traditional error resilience techniques, these three tools are very effective at
protecting I frames, which is very important and often ignored by many researchers.
The original error resilience tools in the MPEG-4 standard, including Data
Partitioning and RVLC, can still be used with these new tools. However, if Data
Partitioning is used with an ECC scheme without employing packetization, a decoding
delay of one frame is introduced. Also, although RVLC can improve the error
robustness of a video bitstream, its cost is not small, as can be seen clearly from
Table 11-2 and Table 11-3: the use of RVLC increases the bit rate by 1.13% for
Salesman and 1.89% for Akiyo. This contradicts the claim made by some researchers
that RVLC achieves its goal with little or no loss of coding efficiency.
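These percentages follow directly from the bit totals in Tables 11-2 and 11-3, as a quick check confirms (values copied from those tables):

```python
# Bit totals copied from Table 11-2 (Salesman) and Table 11-3 (Akiyo).
salesman_rvlc, salesman_basic = 541480, 535424
akiyo_rvlc, akiyo_basic = 226816, 222624

salesman_increase = (salesman_rvlc - salesman_basic) / salesman_basic * 100
akiyo_increase = (akiyo_rvlc - akiyo_basic) / akiyo_basic * 100
# approximately 1.13 % and 1.9 % respectively
```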
In practice, the employment of these three tools needs to be flexible and optimized.
In random error situations (e.g. ISDN), ECC alone should be enough, while in bursty
error situations such as wireless environments, ECC should be enhanced with
interleaving. In stable residual error conditions the use of IFR is less important,
as a properly designed ECC scheme matched to the residual error condition can
correct almost all residual errors in the bitstream; in unstable residual error
conditions, however, IFR is crucial to reduce the 'half image' effect. If the errors
are mainly bursty or caused by packet loss, interleaving should be based on frames
rather than on segments.
Employing these new tools, this research relaxes the traditional end-to-end bit
error rate requirement for video communication from 10^-5 to 10^-2 without the need
for channel coding. Of course, channel coding is an integral part of any network, so
it is reasonable to expect that in reality the new approaches will perform even
better. The proposed schemes are also very effective at coping with both bursty
errors and packet loss.
Most importantly, when the BER of a final bitstream deteriorates to 10^-3 in random
error situations, the ECC approach can still deliver a video output that erases all
residual error effects in the final bitstream, while packetization fails to deliver
a recognizable video output.
Now it is time for a more general comparison between the tools proposed in this
thesis and the tools in the standards. In circuit switched networks such as ISDN,
the FEC [3] in H.263 is more efficient and effective, as it can correct one error
bit and detect two error bits in a block of 492 bits with only a 4% increase in bit
rate. Packetization combined with Data Partitioning and RVLC adds more than 9.9% to
the data rate when the packet size is set to 600 bits, yet is less effective than
the FEC in H.263, because the error resilience tools in MPEG-4 [2] are fundamentally
passive and have no capability to correct errors in the encoded video bitstream.
The superiority of ECC over FEC for error resilience has been identified [5] (also
see Chapter 6). In packet switched networks, where packet loss can happen often, it
has been shown [1,4] that the ECC approach enhanced with interleaving is more
effective at combating packet loss. In mobile networks, it has been shown in
Chapter 6 and Chapter 8 that when the BER of the encoded video bitstream reaches
10^-4, the reconstructed video quality with packetization is generally unacceptable
[5], while ECC video still delivers decent reconstructed output even when the BER of
the final video bitstream reaches 10^-2. For bursty channel errors, [1,4] also show
that the ECC approach enhanced with interleaving is much more effective than the
packetization approach.
Table 11-1 Performance comparison for Salesman
Channel Conditions   Resilience Scheme (ECC / Pack)   Performance (ECC / Pack)   Average Number of Bits (ECC / Pack)   Increase of Number of Bits (ECC / Pack)
Random 10^-4         11/12 / 600                      32.53 / 23.31              11693 / 11768                         9.2% / 9.9%
Random 10^-3         11/12 / 600                      32.06 / 9.27               11693 / 11768                         9.2% / 9.9%
Bursty error         7/8 / 380                        31.32 / 16.93              12248 / 12237                         14.38% / 14.28%
Burst loss           5/6 / 250                        26.52 / 17.42              12860 / 12848                         20.09% / 20%
Note:
1. 'Pack' in the table represents packetization; 'Resilience Scheme' represents
the error resilience scheme.
2. The performance of the schemes is evaluated in terms of PSNR, taken as the
average over the 50 frames.
3. The average number of bits is the average, over the 50 frames, of the number of
bits per frame.
4. The average PSNR of the 50 frames of Salesman with error-free transmission is
32.63.
5. The PSNRs with bursty errors and burst loss are obtained on a segment basis,
i.e. there is one burst error or burst loss in every segment. Each segment with
ECC is further encoded with a punctured convolutional code, while each segment
with packetization is left as it is before being exposed to the channel errors.
6. The increase in the number of bits is calculated relative to the basic
bitstream.
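The frame-averaged PSNR used throughout these comparisons follows the standard definition for 8-bit video; the code below is a generic illustration, not taken from the thesis's simulation software.

```python
import math

def psnr(original, reconstructed, peak=255):
    """PSNR in dB between two equal-length 8-bit pixel sequences:
    10 * log10(peak^2 / MSE)."""
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return float('inf')                 # identical frames
    return 10 * math.log10(peak ** 2 / mse)

def average_psnr(frame_pairs, peak=255):
    """Sequence-level figure, as in Table 11-1: the mean of the
    per-frame PSNRs."""
    values = [psnr(o, r, peak) for o, r in frame_pairs]
    return sum(values) / len(values)
```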
For a more detailed, concrete and direct comparison of the different approaches, the
results achieved with the newly proposed error resilience tools in the previous
chapters are summarized in Table 11-1.
From Table 11-1 it can be seen that in all situations the ECC approaches outperform
the packetization schemes in terms of the quality of the reconstructed video output.
In the random error situations the ECC approaches are also more efficient than the
packetization approaches, in the sense that fewer bits are used to encode the video
sequences. In bursty error or burst loss situations the packetization schemes
produce approximately the same final bit rates as the corresponding ECC schemes, but
the reconstructed video outputs delivered by the packetization approaches are
unrecognizable, while ECC always delivers excellent output in both situations with a
negligible sacrifice of coding efficiency.
Another important fact is that when the ECC code rate is lowered, for instance from
11/12 to 9/10 or from 7/8 to 5/6, the capability of the ECC schemes to combat both
random and bursty errors improves dramatically. In contrast, when the packet size is
reduced from 450 bits to as little as 250 bits, the capability of the packetization
schemes to combat bursty errors or burst loss does not improve much. As mentioned in
Chapter 6, once the packet size has been reduced to a saturation point, reducing it
further does not improve the effectiveness of packetization, since the additional
markers (resynchronization markers, DC markers and motion markers) introduce more
vulnerability. Even with random errors, once the packet size reaches this saturation
point, further reduction only increases the bit count without increasing the
effectiveness of the packetization schemes. Although no experiments have been done
on ECC rates stronger than 5/6, it is reasonable to expect that lowering the rate
further to 3/4 or 2/3 would increase the effectiveness of ECC significantly; the
only problem is that the overhead in the final video bitstream would increase
significantly as well. In some extreme situations, however, this might be the only
option.
The final conclusion based on the simulation results of the previous chapters can be
summarized as follows. The active error resilience SEC approach, realized with ECC,
is much more effective and efficient at combating random residual errors in the
final bitstream for real-time applications than the passive error resilience tools,
represented by the packetization or resynchronization approach, in the current video
coding standards. Enhanced with interleaving, ECC is also very effective against
both bursty errors and packet loss, and the ECC schemes can be further improved with
IFR. More importantly, the SEC approach is simpler and more easily implemented than
current error resilience techniques. This conclusion also calls for a re-examination
of MPEG-4 and other video coding standards.
Some simple and direct applications of the research output will include mobile video
telephony, wireless video surveillance, remote video conferencing, remote medical
imaging and wireless multimedia communication.
11.3 Future Research Directions
Future work can be focused in the following directions.
The MAC (Medium Access Control) protocol for GPRS or other networks needs to be
modified to make live video communication over them more efficient. Instead of
allocating radio resources through contention among applications, the network should
have a mechanism to update the channel resource allocation for live video
communication automatically and periodically, to accommodate the transmission needs
of the I frames of the video bitstream.
For ECC video, the coding efficiency can be further improved if a dynamic ECC scheme
is designed and implemented for video communication in mobile environments, so that
the ECC rate can track changes in the residual error conditions.
The distribution of error control between the first error control and the SEC needs
to be optimized, as does the distribution of the available radio channel bandwidth
among source coding, first error control and SEC. A generic rate control algorithm
based on these optimal distributions would be more effective and efficient. More
effective first error control schemes, including FEC and ARQ at the data link layer,
need to be further improved and investigated, taking the second error control at the
application layer into consideration.
More accurate and effective error detection techniques during the video decoding
process need to be investigated to improve the effectiveness of the IFR and error
concealment.
To cope with a wide range of residual error conditions, optimal puncturing patterns
for good base convolutional codes need to be explored further. As stated in
Chapter 4, the highest reported punctured code rate for the base code (171,133) is
16/17, while the highest for the base code (561,752) is 13/14. Puncturing patterns
for higher code rates of 14/15, 15/16, 16/17, 17/18, 18/19, etc. need to be found
for the base code (561,752) or other good base codes, including those with
constraint length longer than 9, as higher rate codes will make ECC video more
efficient in favorable channel conditions.
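As a sketch of how a puncturing pattern raises the rate of a rate-1/2 mother code (a generic illustration; the actual patterns for (171,133) and (561,752) are those reported in Chapter 4): the serialized encoder output is masked with a periodic keep/drop pattern, and keeping n of the 2k mother-code bits per k input bits yields rate k/n.

```python
def puncture(coded_bits, pattern):
    """`coded_bits` is the serialized rate-1/2 encoder output (the two
    output streams alternating); `pattern` is a flat 0/1 keep-mask of
    length 2 * period. Only the kept bits are transmitted."""
    return [b for i, b in enumerate(coded_bits)
            if pattern[i % len(pattern)] == 1]

# Example: rate 2/3 from rate 1/2 with the pattern [1, 1, 1, 0]
# (transmit both bits of the first output pair, only the first bit of
# the next pair): 2 input bits -> 4 mother bits -> 3 transmitted bits.
kept = puncture([1, 0, 1, 1, 0, 0, 1, 1], [1, 1, 1, 0])
```

The decoder reinserts erasures at the punctured positions before Viterbi decoding, which is what makes a single mother code serve a whole family of rates.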
The optimum combination of ECC and packetization in some extreme situations
(mainly in bursty channel errors and packet loss) needs to be further investigated,
although in most situations ECC alone should be a first choice, especially for random
channel errors.
Convolutional codes are not the only way to realize SEC; the possibility of using
other error correction codes, such as Turbo codes [6], needs to be explored as well.
Actually, ECC video is only one application of SEC. Applying the SEC approach to
other applications, both real-time and non-real-time, can also be considered.
Another crucial research direction is the design of high speed chips performing the
computation for punctured convolutional coding, to make convolutional codes with
constraint length longer than 9 realistic and economic for real-time applications.
This would improve the performance of ECC and SEC significantly and make the
application of SEC to other real-time communications more efficient and effective.
References
[1] Bing Du, M. Ghanbari, “ECC video and its performance in bursty channel errors”,
Proceedings of Iranian Conference on Electrical Engineering (ICEE) 2003, May 6-
8, 2003, Shiraz, Iran.
[2] ISO/IEC 14496-2, “Information Technology – Coding of Audio-Visual Objects:
Visual”, 2001.
[3] ITU-T H.263 “Video coding for low bit rate communication”, 1998.
[4] Bing Du, M. Ghanbari, “ECC video in bursty channel errors and packet loss”,
Proceedings of Picture Coding Symposium 2003, Saint-Malo, France, 23 - 25 April
2003, pp.99-103.
[5] Bing Du, Anthony Maeder and Miles Moody, “A new approach for error resilient
in video transmission using ECC”, accepted by International Workshop on Very
Low Bit-rate Video, 18-19 September 2003, Madrid, Spain.
[6] L. Hanzo, T. H. Liew, B. L. Yeap, “Turbo Coding, Turbo Equalisation and Space-
Time Coding for Transmission over Fading Channels”, Wiley Europe, July 2002.
Table 11-2 Bit number comparison between basic and RVLC for Salesman
Frame    RVLC     Basic
0        73744    71392
1        7992     7872
2        13112    12984
3        15032    14952
4        14400    14280
5        11592    11480
6        8192     8136
7        7824     7768
8        8568     8496
9        8560     8488
10       9504     9408
11       8752     8680
12       7808     7752
13       8712     8632
14       9336     9264
15       9384     9336
16       7968     7952
17       9080     9048
18       13288    13256
19       15440    15336
20       13152    13032
21       9936     9808
22       9768     9648
23       8360     8280
24       5944     5896
25       5240     5232
26       6464     6432
27       7368     7304
28       6296     6264
29       4688     4664
30       5456     5424
31       6328     6248
32       6968     6864
33       7672     7576
34       10216    10112
35       12840    12704
36       14760    14656
37       13088    13024
38       13728    13640
39       12384    12256
40       10232    10160
41       10456    10392
42       9840     9824
43       9392     9352
44       10032    9944
45       10616    10520
46       8744     8664
47       8280     8208
48       7744     7656
49       7200     7128
Total    541480   535424
Average  10829.6  10708.48
Table 11-3 Bit number comparison between basic and RVLC for Akiyo
Frame    RVLC     Basic
0        44080    42880
1        488      472
2        616      600
3        616      600
4        1040     1016
5        976      952
6        1024     1008
7        1200     1176
8        1432     1400
9        1608     1576
10       1504     1480
11       1344     1320
12       1904     1880
13       1856     1840
14       2680     2632
15       3864     3808
16       5392     5272
17       6496     6392
18       7280     7096
19       7080     6968
20       6328     6248
21       5840     5760
22       4848     4776
23       4032     3968
24       3976     3928
25       2960     2904
26       2872     2824
27       3544     3480
28       4456     4400
29       5120     5024
30       5592     5496
31       5664     5544
32       4792     4720
33       4008     3952
34       4344     4264
35       4616     4552
36       4848     4768
37       4712     4648
38       4888     4832
39       4984     4896
40       4480     4416
41       4472     4400
42       3992     3928
43       3472     3416
44       3496     3456
45       4080     4008
46       4416     4352
47       4640     4560
48       4704     4640
49       4160     4096
Total    226816   222624
Average  4536.32  4452.48