ECC Video: An Active Second Error Control Approach
for Error Resilience in Video Coding
Bing Bing Du
Submitted in fulfillment of the requirements for the degree of
Doctor of Philosophy
In the School of Electrical and Electronics Systems Engineering
Queensland University of Technology
Brisbane
Australia
September 2003
Abstract
Supporting video communication in mobile environments has long been an objective of
telecommunication network engineers, and it has become a basic requirement of third
generation mobile communication systems. This dissertation explores the possibility of
optimizing the utilization of scarce shared radio channels for live video transmission
over a GSM (Global System for Mobile communications) network, and of realizing
error resilient video communication in unfavorable channel conditions, especially in
mobile radio channels.
The main contribution describes the adoption of a SEC (Second Error Correction)
approach using ECC (Error Correction Coding) based on a Punctured Convolutional
Coding scheme, to cope with residual errors at the application layer and enhance the
error resilience of a compressed video bitstream. The approach is developed further for
improved performance in different circumstances, with some additional enhancements
involving Intra Frame Relay and Interleaving, and the combination of the approach with
Packetization.
Simulation results of applying the various techniques to the test video sequences Akiyo
and Salesman are presented and analyzed for performance comparison with the
conventional video coding standard. The proposed approach shows consistent
improvements under these conditions. For instance, to cope with random residual errors,
the simulation results show that when the residual BER (Bit Error Rate) reaches 10^-4,
the video output reconstructed from a video bitstream protected using the standard
resynchronization approach is of unacceptable quality, while the proposed scheme can
deliver a completely error free video output more efficiently. When the residual BER
reaches 10^-3, the standard approach fails to deliver a recognizable video output, while
the SEC scheme can still correct all the residual errors with a modest bit rate increase. In
bursty residual error conditions, the proposed scheme also outperforms the
resynchronization approach. Future work to extend the scope and applicability of the
research is suggested in the last chapter of the thesis.
Acknowledgements
I would like to acknowledge the excellent guidance, continuous help and generous
support from my PhD principal supervisor Prof. Anthony Maeder and associate
supervisor Prof. Miles Moody. It is their insights and encouragement that made this
research successful.
I would also like to express my gratitude for the financial assistance given to me by the
Cooperative Research Centre for Satellite Systems during my time at the Queensland
University of Technology.
Contents
Contents ______________________________________________________________i
List of Figures ________________________________________________________ix
List of Tables _________________________________________________________xi
1 INTRODUCTION ________________________________________________ 1
1.1 Mobile Video System __________________________________________ 1
1.2 Challenges on Networking Aspects _______________________________ 2
1.2.1 Optimized utilization of scarce radio channel resources ____________ 2
1.2.2 Effective error control schemes _______________________________ 3
1.3 Challenges on Source Video Coding ______________________________ 5
1.4 State of the Art of the Current Error Resilience Tools _______________ 6
1.5 Second Error Control and ECC video ____________________________ 6
1.6 Organization of the Thesis ______________________________________ 7
1.7 Contributions and Publications from the Research _________________ 9
References ________________________________________________________ 11
2 OVERVIEW of GSM SYSTEM ____________________________________ 15
2.1 Architecture and functions of the GSM network___________________ 15
2.1.1 Mobile station ____________________________________________ 16
2.1.2 The Base Station Subsystem_________________________________ 16
2.1.2.1 The Base Transceiver Station ______________________________ 16
2.1.2.2 The Base Station Controller _______________________________ 16
2.1.3 The Network and Switching Subsystem________________________ 17
2.1.3.1 The Mobile services Switching Center (MSC)_________________ 17
2.1.3.2 The Gateway Mobile services Switching Center (GMSC)________ 17
2.1.3.3 Home Location Register (HLR) ____________________________ 17
2.1.3.4 Visitor Location Register (VLR) ___________________________ 18
2.1.3.5 The Authentication Center (AuC)___________________________ 18
2.1.3.6 The Equipment Identity Register (EIR) ______________________ 19
2.1.3.7 The GSM Interworking Unit (GIWU) _______________________ 19
2.1.4 The Operation and Support Subsystem (OSS) ___________________ 19
2.1.5 Additional Functional Elements ______________________________ 20
2.1.5.1 Message Center_________________________________________ 20
2.1.5.2 Mobile Service Node ____________________________________ 20
2.1.6 The geographical areas of the GSM network ____________________ 20
2.2 Signalling system in GSM _____________________________________ 21
2.2.1 GSM Radio Channels ______________________________________ 21
2.2.1.1 Dedicated Channels _____________________________________ 21
2.2.1.2 CCCH (Common Control Channels) ________________________ 21
2.2.2 Signalling Interfaces and Protocols ___________________________ 22
2.2.2.1 Um interface ___________________________________________ 23
2.2.2.2 A Interface ____________________________________________ 23
2.2.2.3 A-bis Interface _________________________________________ 24
2.2.2.4 MAP interfaces _________________________________________ 25
2.2.2.5 X.25 Interface System____________________________________ 26
2.3 The Multiple Access Scheme ___________________________________ 26
2.3.1 FDMA__________________________________________________ 26
2.3.1.1 Primary GSM __________________________________________ 26
2.3.1.2 E-GSM _______________________________________________ 26
2.3.1.3 DCS-1800 _____________________________________________ 27
2.3.2 TDMA__________________________________________________ 27
2.3.2.1 Traffic channel Frame Structure (26-Multiframe) ______________ 27
2.3.2.2 Signalling Frame Structure ________________________________ 28
2.3.2.3 Structure of a TDMA Slot within a Frame ____________________ 28
2.3.3 Frequency Hopping________________________________________ 29
2.4 Source coding and channel coding ______________________________ 29
2.4.1 Speech coding ____________________________________________ 30
2.4.1.1 Full Rate speech Coding __________________________________ 30
2.4.1.2 Half Rate Speech Coding _________________________________ 30
2.4.1.3 Multirate Speech Coding _________________________________ 31
2.4.1.4 Enhanced Speech Coding _________________________________ 31
2.4.2 Channel coding ___________________________________________ 31
2.4.2.1 CRC__________________________________________________ 31
2.4.2.2 Block Code ____________________________________________ 32
2.4.2.3 Convolutional Code _____________________________________ 32
2.4.3 Interleaving ______________________________________________ 32
2.4.4 Encryption_______________________________________________ 32
References ________________________________________________________ 33
3 VIDEO OVER GPRS NETWORK__________________________________ 35
3.1 Data services in GSM networks_________________________________ 35
3.1.1 PDS and SMS ____________________________________________ 36
3.1.2 HSCSD _________________________________________________ 36
3.1.3 GPRS___________________________________________________ 37
3.2 Possibilities for video over GSM networks________________________ 38
3.2.1 Video over HSCSD________________________________________ 38
3.2.2 Video over GPRS _________________________________________ 39
3.2.3 Dynamic channel allocation _________________________________ 41
3.2.4 Example ________________________________________________ 43
3.2.5 EDGE __________________________________________________ 44
3.3 Conclusion __________________________________________________ 45
References ________________________________________________________ 45
4 OVERVIEW OF VIDEO CODING TECHNIQUES AND THE CURRENT
VIDEO CODING STANDARDS ___________________________________ 47
4.1 Waveform based video coding __________________________________ 47
4.1.1 Motion estimation _________________________________________ 48
4.1.1.1 Optical flow techniques __________________________________ 48
4.1.1.2 Block matching techniques________________________________ 49
4.1.1.3 Pel-recursive techniques __________________________________ 49
4.1.2 Transforms ______________________________________________ 50
4.2 Model based video coding _____________________________________ 50
4.2.1 3D model coding__________________________________________ 51
4.2.2 2D model coding__________________________________________ 51
4.3 Current Video Standards______________________________________ 52
4.3.1 Core video coding techniques in the current video coding standard __ 53
4.4 Overview of error resilience techniques __________________________ 56
4.4.1 Error resilient encoding_____________________________________ 56
4.4.1.1 Robust Entropy encoding _________________________________ 56
4.4.1.2 Error Resilient prediction _________________________________ 57
4.4.1.3 Layered Coding with Unequal Error Protection ________________ 58
4.4.1.4 Multiple Description Coding ______________________________ 59
4.4.2 Decoder Error Concealment _________________________________ 60
4.4.2.1 Recovery of Texture Information ___________________________ 60
4.4.2.2 Recovery of Coding Modes and Motion Vectors _______________ 62
4.4.3 Encoder and Decoder Interactive Error Control __________________ 62
4.4.3.1 Reference Picture Selection (RPS) Based on Feedback Information 63
4.4.3.2 Error Tracking Based on Feedback information________________ 63
4.5 Error resilience tools in the current video coding standards _________ 64
4.5.1 Error resilience tools in H.263 _______________________________ 64
4.5.1.1 Forward Error Correction Mode (FEC) (Annex H) _____________ 64
4.5.1.2 Slice Structure Mode (Annex K) ___________________________ 65
4.5.1.3 Independent Segment Decoding Mode (Annex R)______________ 65
4.5.1.4 Reference Picture Selection (RPS - Annex N) _________________ 66
4.5.2 Error resilience tools in MPEG-4 _____________________________ 68
4.5.2.1 Packetization___________________________________________ 68
4.5.2.2 Data Partitioning ________________________________________ 70
4.5.2.3 Reversible VLC ________________________________________ 71
4.5.2.4 Adaptive Intra Refresh for Error Resilience ___________________ 71
4.5.2.5 NEWPRED ____________________________________________ 72
References ________________________________________________________ 73
5 OVERVIEW OF ERROR CORRECTION TECHNIQUES _____________ 79
5.1 Introduction_________________________________________________ 79
5.2 Block codes _________________________________________________ 80
5.2.1 Linear Cyclic Codes _______________________________________ 81
5.3 Convolutional codes __________________________________________ 82
5.3.1 Convolutional Encoding ____________________________________ 82
5.3.2 Viterbi Decoding__________________________________________ 84
5.3.3 Performance of Convolutional codes __________________________ 86
5.3.3.1 Performance of Hard-decision Viterbi decoding algorithm _______ 86
5.3.3.2 Performance of Soft-decision Viterbi decoding algorithm________ 87
5.3.3.3 Advantages of soft-decision over hard-decision decoding ________ 88
5.3.4 Punctured Convolutional code _______________________________ 90
References ________________________________________________________ 91
6 SECOND ERROR CONTROL AND ECC VIDEO______________________ 93
6.1 Introduction_________________________________________________ 93
6.2 Second Error Control_________________________________________ 95
6.3 ECC video – the SEC approach_________________________________ 96
6.4 Simulation Results ___________________________________________ 99
6.4.1 Experiment conditions _____________________________________ 99
6.4.2 Results_________________________________________________ 100
6.5 Discussion__________________________________________________ 103
References _______________________________________________________ 105
7 ECC VIDEO WITH IFR _________________________________________ 115
7.1 Introduction________________________________________________ 115
7.2 ECC with IFR ______________________________________________ 116
7.3 Simulation results ___________________________________________ 118
7.4 Delay analysis due to the employment of IFR ____________________ 120
7.5 Conclusion _________________________________________________ 122
References _______________________________________________________ 122
8 ECC VIDEO WITH SOFT-DECISION VITERBI DECODING ________ 127
8.1 Introduction________________________________________________ 127
8.2 ECC Video with Soft-Decision Viterbi Decoding__________________ 128
8.3 Simulation results ___________________________________________ 129
8.4 Discussion__________________________________________________ 132
Reference ________________________________________________________ 134
9 ECC VIDEO IN BURSTY CHANNEL ERRORS AND PACKET LOSS _ 145
9.1 Performance of the original ECC approach in Bursty residual Errors 146
9.2 ECC Video with Interleaving__________________________________ 148
9.3 Simulation Results __________________________________________ 150
9.3.1 ECC video in bursty errors _________________________________ 150
9.3.2 ECC video with burst lost in GPRS network ___________________ 152
9.4 Discussion__________________________________________________ 154
References _______________________________________________________ 157
10 ECC WITH PACKETIZATION___________________________________ 165
10.1 Combination of ECC and Packetization_________________________ 165
10.2 Simulation Results __________________________________________ 165
11 CONCLUSIONS AND FUTURE WORK ___________________________ 169
11.1 Optimized utilization of radio channel __________________________ 169
11.2 The proposed error resilience video coding tools in this thesis ______ 169
11.3 Future Research Directions ___________________________________ 173
References _______________________________________________________ 175
List of Figures
FIGURE 1.1 BLOCK DIAGRAM OF VIDEO TRANSMISSION SYSTEM OVER MOBILE CHANNELS...................... 1
FIGURE 2.1 GENERAL ARCHITECTURE OF A GSM NETWORK ................................................. 15
FIGURE 2.2 GSM NETWORK AREAS ..................................................................................... 20
FIGURE 2.3 UM AND A INTERFACE ...................................................................................... 22
FIGURE 2.4 A-BIS INTERFACE ............................................................................................. 24
FIGURE 2.5 MAP INTERFACES ............................................................................................ 25
FIGURE 2.6 TRAFFIC CHANNEL FRAME STRUCTURE .............................................................. 28
FIGURE 2.7 SPEECH SIGNAL PROCESSING............................................................................. 30
FIGURE 4.1 DCT BASED VIDEO CODING ..................................................................................... 53
FIGURE 4.2 ZIGZAG SCAN OF DCT COEFFICIENTS........................................................................ 54
FIGURE 4.3 VRC WITH TWO THREADS AND THREE FRAMES PER THREAD.......................................... 66
FIGURE 4.4 FRAME LOSS WITH VRC...................................................................................... 67
FIGURE 4.5 PACKET STRUCTURE .............................................................................................. 69
FIGURE 4.6 STRUCTURE OF DATA PARTITIONING......................................................................... 71
FIGURE 5.1 CONVOLUTIONAL ENCODER .................................................................................... 82
FIGURE 5.2 STATE DIAGRAM OF A 4-STATE CONVOLUTIONAL ENCODER ........................................... 83
FIGURE 5.3 TRELLIS DIAGRAM OF A 4-STATE CONVOLUTIONAL ENCODER ......................................... 84
FIGURE 5.4 BASIC PROCEDURE OF PUNCTURED CODING FROM RATE ½ CONVOLUTIONAL CODE.............. 90
FIGURE 6.1 VIDEO COMMUNICATION SYSTEM WITH ECC.............................................................. 97
FIGURE 6.2 PSNR OF SALESMAN THROUGH ERROR FREE CHANNEL ............................................... 107
FIGURE 6.3 PSNR OF SALESMAN WITH BER OF 1 X 10^-5 ............................................................. 107
FIGURE 6.4 PSNR OF SALESMAN WITH BER OF 4 X 10^-5 ............................................................. 108
FIGURE 6.5 PSNR OF SALESMAN WITH BER OF 1.7 X 10^-4 .......................................................... 108
FIGURE 6.6 PSNR OF AKIYO THROUGH ERROR FREE CHANNEL .................................................... 109
FIGURE 6.7 PSNR OF AKIYO WITH BER OF 1 X 10^-5................................................................... 109
FIGURE 6.8 PSNR OF AKIYO WITH BER OF 4 X 10^-5................................................................... 110
FIGURE 6.9 PSNR OF AKIYO WITH BER OF 1.7 X 10^-4 ............................................................... 110
FIGURE 6.10 PSNR OF AKIYO AT BER OF 10^-4 .......................................................................... 111
FIGURE 6.11 PSNR OF SALESMAN WITH BER OF 10^-4 ............................................................... 111
FIGURE 7.1 PSNR OF SALESMAN WITH BER OF 1 X 10^-4............................................................. 124
FIGURE 7.2 PSNR OF AKIYO WITH BER OF 1 X 10^-4........................................................... 124
FIGURE 8.1 PERFORMANCE OF ECC(11/12) FOR SALESMAN WITH RANDOM ERRORS ...................... 136
FIGURE 8.2 PERFORMANCE OF ECC(11/12) FOR AKIYO WITH RANDOM ERRORS ............................ 137
FIGURE 8.3 SALESMAN WITH BER OF 10^-2 ............................................................................ 138
FIGURE 8.4 AKIYO WITH BER OF 10^-2.................................................................................. 138
FIGURE 9.1 PSNR OF I PICTURE WITH BURSTY ERRORS ............................................................. 147
FIGURE 9.2 PSNR OF P FRAME WITH BURSTY ERRORS............................................................... 148
FIGURE 9.3 VIDEO COMMUNICATION SYSTEM WITH ECC AND INTERLEAVING.................................. 149
FIGURE 9.4 INTERLEAVER FOR CODED DATA ............................................................................ 149
FIGURE 9.5 PERFORMANCE OF SALESMAN WITH BURSTY ERRORS................................................ 151
FIGURE 9.6 PERFORMANCE OF SALESMAN WITH BURST LOSS...................................................... 153
FIGURE 9.7 PERFORMANCE OF SALESMAN WITH BURSTY ERRORS (THE INTERLEAVING IS BASED ON
FRAME) .................................................................................................................... 159
FIGURE 9.8 PERFORMANCE OF SALESMAN WITH BURST LOSS (THE INTERLEAVING IS BASED ON
FRAME) .................................................................................................................... 159
FIGURE 10.1 PSNR OF ECC COMBINED WITH PACKETIZATION............................................ 166
List of Tables
TABLE 6-1 BIT NUMBER COMPARISON BETWEEN PACKETIZATION(600) AND ECC(13/14) FOR
AKIYO .................................................................................................................... 112
TABLE 6-2 BIT NUMBER COMPARISON BETWEEN PACKETIZATION(600) AND ECC(13/14) FOR
SALESMAN .............................................................................................................. 113
TABLE 7-1 BIT NUMBER COMPARISON BETWEEN ECC ALONE AND ECC PLUS IFR FOR
AKIYO .................................................................................................................... 125
TABLE 7-2 BIT NUMBER COMPARISON BETWEEN ECC ALONE AND ECC PLUS IFR FOR
SALESMAN .............................................................................................................. 126
TABLE 8-1 BIT NUMBER COMPARISON BETWEEN PACKETIZATION(600) AND ECC(11/12) FOR
SALESMAN .............................................................................................................. 139
TABLE 8-2 BIT NUMBER COMPARISON BETWEEN PACKETIZATION(600) AND ECC(11/12)
FOR AKIYO ............................................................................................................ 140
TABLE 8-3 BIT NUMBER COMPARISON BETWEEN PACKETIZATION(600) AND ECC(9/10) FOR
SALESMAN .............................................................................................................. 141
TABLE 8-4 BIT NUMBER COMPARISON BETWEEN PACKETIZATION(600) AND ECC(9/10) FOR
AKIYO .................................................................................................................... 142
TABLE 8-5 BIT NUMBER COMPARISON BETWEEN PACKETIZATION(600) AND ECC(7/8) FOR
SALESMAN .............................................................................................................. 143
TABLE 8-6 BIT NUMBER COMPARISON BETWEEN PACKETIZATION(600) AND ECC(7/8) FOR
AKIYO .................................................................................................................... 144
TABLE 9-1 GPRS CHANNEL CODING SCHEMES ............................................................ 152
TABLE 9-2 BIT NUMBER COMPARISON BETWEEN PACKETIZATION(450) AND ECC (9/10) FOR
SALESMAN .............................................................................................................. 160
TABLE 9-3 BIT NUMBER COMPARISON BETWEEN PACKETIZATION(450) AND ECC (7/8) FOR
SALESMAN .............................................................................................................. 161
TABLE 9-4 BIT NUMBER COMPARISON BETWEEN PACKETIZATION(380) AND ECC(7/8) FOR
SALESMAN .............................................................................................................. 162
TABLE 9-5 BIT NUMBER COMPARISON BETWEEN PACKETIZATION(450) AND ECC(5/6) FOR
SALESMAN .............................................................................................................. 163
TABLE 9-6 BIT NUMBER COMPARISON BETWEEN PACKETIZATION(250) AND ECC(5/6) FOR
SALESMAN .............................................................................................................. 164
TABLE 10-1 BIT NUMBER COMPARISON WHEN ECC AND PACKETIZATION ARE
COMBINED.............................................................................................................. 168
TABLE 11-1 PERFORMANCE COMPARISON FOR SALESMAN ............................................ 171
TABLE 11-2 BIT NUMBER COMPARISON BETWEEN BASIC AND RVLC FOR SALESMAN... 176
TABLE 11-3 BIT NUMBER COMPARISON BETWEEN BASIC AND RVLC FOR AKIYO.......... 177
1 INTRODUCTION
1.1 Mobile Video System
Supporting video communication in mobile environments has long been an objective of
telecommunication network engineers, and it has become a basic requirement of third
generation mobile communication systems. Advances in
video compression and mobile computing techniques have provided the possibility
[22,24] of transmitting video sequences over band-limited wireless channels.
Figure 1.1 Block diagram of video transmission system over mobile channels
The general block diagram of a video transmission system over a radio channel is
depicted in Fig 1.1. The digital video sequence is compressed by a video source
encoder and passed to a channel encoder, which adds appropriate redundancy for error
protection. After some transport processing and modulation, the video data packets are
sent through the radio channel. At the receiver side, the demodulated data packets are
passed to the channel decoder for error detection and correction. The decoded packets
are reassembled to a bitstream and delivered to the video source decoder. The
decompressed video sequence is sent out for display. In a two-way communication
system, a return channel is available for the receiver to send back acknowledgements
about the receiving states.
To make this system practical, many challenges need to be addressed in both
networking and video compression. This work focuses on transmitting video sequences
over the GSM (Global System for Mobile communications) [3] network, because it is
widely deployed in more than 80 countries across Europe, Asia and Australia.
1.2 Challenges on Networking Aspects
The main characteristic of a mobile network, compared with a wired network, is that its
radio channel resources are scarce and error prone. Two aspects need to be addressed
before a mobile video system can be put into commercial operation.
1.2.1 Optimized utilization of scarce radio channel resources
Though HSCSD (high speed circuit switched data services) [1,22] and GPRS (general
packet radio service) [6] have paved the way for a more realistic evolution of the GSM
system toward 3G (third generation) systems, optimized utilization of the scarce
radio channels still poses a major challenge for a live mobile video system, because of
the variable bit rate of a video bitstream compressed using the current video coding
standards, including the MPEG and H.26x series.
Both HSCSD and GPRS make it possible to transmit low bit rate video over the
GSM network, but neither is ideal, because it is difficult to decide how to allocate a
radio channel. The bit rate for I pictures (see Chapter 4 for the definitions of I, P and
B pictures) is much higher than that for P or B pictures, so if we allocate a channel
based on I picture size, the channel resource is wasted most of the time when
transmitting P or B pictures. On the other hand, if we allocate the channel according to
the bit rate for P or B pictures, we will be unable to transmit complete I pictures, which
are the most important pictures for the video decoding process.
One intuitive solution might be to repeatedly acquire and release channels to cater for I
and P pictures. However, the current channel allocation protocol is based on a
contention scheme, which can introduce unacceptable delay for real-time video
applications. Obviously, some modification to the current channel allocation scheme is
needed.
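The channel allocation dilemma can be made concrete with a back-of-the-envelope calculation. The picture sizes, group-of-pictures structure and frame rate below are assumed purely for illustration, not measured values from this thesis:

```python
# Hypothetical figures, chosen only to illustrate the allocation dilemma.
I_BITS = 20_000                   # assumed size of one I picture (bits)
P_BITS = 2_000                    # assumed size of one P picture (bits)
GOP = [I_BITS] + [P_BITS] * 11    # one I picture followed by 11 P pictures
FPS = 10                          # assumed frame rate (frames per second)

# Worst-case allocation: reserve enough for an I picture every frame period.
channel_rate = I_BITS * FPS                 # bit/s reserved on the channel
used_rate = sum(GOP) / (len(GOP) / FPS)     # bit/s actually carried
utilization = used_rate / channel_rate

print(f"reserved: {channel_rate} bit/s, used: {used_rate:.0f} bit/s "
      f"({utilization:.0%} utilization)")
```

With these assumed numbers the reserved channel sits mostly idle between I pictures, which is exactly the waste described above; allocating for P pictures instead would leave I pictures unable to fit at all.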
1.2.2 Effective error control schemes
A typical wireless channel has limited bandwidth and is error prone and very unreliable
because of fading, noise, delay spread and interference from other users sharing the band
[36]. These limitations present a hostile communication environment, especially for
video applications. The quality of service (QoS) requirements for video are quite
different from those for voice and data services. Compared to voice, video transmission
requires more reliable channels with a low end-to-end bit error rate, which should be
less than 10^-5 [1]. The transmission data rate for video, even after compression, is
much higher than that for voice. Unlike data services, real-time video must ensure a
small bounded delay, which should be less than 300 ms [1]. All these requirements call
for a proper error control mechanism.
The design of an appropriate error control scheme rests on several considerations. First,
the objective of the error control scheme is to improve the end-to-end bit error rate as
much as possible; note, however, that its capability strongly depends on the channel
characteristics and error patterns. Second, the overhead imposed by the error control
scheme should be as low as possible to increase the system throughput. Third, the delay
incurred by the error control scheme should be as small as possible, especially for
real-time services. Finally, the error control scheme should be simple, to minimize the
design and implementation cost.
Traditionally, error control is mainly implemented at the data link layer of a network,
where forward error correction (FEC) [2] schemes and automatic repeat request (ARQ)
[2] techniques are employed to combat the bit errors in the communication channels.
These techniques have achieved great success for data and speech communication
services in both wired and wireless environments.
FEC uses error correction codes for reliable data transmission. With FEC alone, a
communication system has constant throughput and a low bounded delay, which are
quite important for real-time services. For a non-stationary wireless fading channel, the
most serious disadvantage of FEC is that it is static: it must be designed and
implemented for the worst-case channel conditions. This is clearly inefficient, since
when the channel condition is good, the FEC designed for the worst channel condition
wastes a great deal of channel resource.
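The fixed cost of worst-case FEC can be illustrated with a deliberately simple code. The sketch below uses a rate-1/3 repetition code, invented for illustration (not the convolutional codes discussed later in this thesis); the point is that the redundancy is transmitted whether or not the channel actually introduces errors:

```python
import random

def fec_encode(bits):
    """Rate-1/3 repetition code: transmit every bit three times."""
    return [b for bit in bits for b in (bit, bit, bit)]

def fec_decode(coded):
    """Majority vote over each triple; corrects any single error per triple."""
    return [1 if sum(coded[i:i + 3]) >= 2 else 0
            for i in range(0, len(coded), 3)]

random.seed(1)
data = [random.randint(0, 1) for _ in range(100)]
coded = fec_encode(data)          # 200% overhead, paid unconditionally

# Clean channel: the redundancy buys nothing, but is transmitted anyway.
clean = fec_decode(coded)

# Noisy channel: a single flipped bit inside a triple is corrected.
noisy = list(coded)
noisy[30] ^= 1
recovered = fec_decode(noisy)

print(clean == data, recovered == data)
```

Designing the code for the worst case means the clean-channel run carries the same three-fold expansion as the noisy one, which is the inefficiency the text describes.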
To improve on plain FEC, layered source coding and unequal error protection schemes
have been developed. In these schemes, a compressed video sequence is divided into
layers with different priorities according to their importance. The higher priority layers
receive more error protection while the lower priority layers receive less [37]. This
method certainly improves the efficiency of the error correction schemes. However,
since a compressed video bitstream has a variable length structure, extra overhead is
needed to indicate which layer each part of the bitstream belongs to. This overhead may
not be significant in high bit-rate video such as MPEG-1 [13] or MPEG-2 [21], but it
occupies a large portion of a low bit-rate video bitstream, which is not efficient [37].
On the other hand, ARQ techniques require retransmission of packets that are detected
as having errors. ARQ can achieve high system reliability, but it reduces throughput and
causes long and variable delay because of the retransmissions. A better method is to
combine FEC and ARQ, in the so-called hybrid ARQ schemes. Hybrid ARQ is reliable,
efficient and adaptive, and offers much better performance, especially for time-varying
fading channels [38]. However, for real-time video services, hybrid ARQ still has its
limitations.
Theoretically, FEC can correct most errors if the error correction code is properly
designed, and if FEC fails, ARQ can always be employed to ensure correct data
delivery. However, for real-time video applications the number of packet
retransmissions is limited, so the power of ARQ is limited compared with data
applications, where the delay requirement is not so strict. Moreover, in most
telecommunication systems the supported FEC schemes are limited. For
instance, GSM [3,4] and GPRS [5,6] networks support only four channel coding
schemes. It is thus unavoidable that the transmission system of a telecommunication
network will leave some bits in error in the final video bitstream delivered to the
application layer. This creates a strong demand for the source data to have some kind of
error resilience features.
1.3 Challenges on Source Video Coding
The state of the art of video compression is to use the DCT (Discrete Cosine
Transform) [14] and motion compensation [15] to exploit spatial and temporal
redundancy. With the employment of the variable-length Huffman coding technique
[16], the coding efficiency is further improved. While these techniques do achieve
high coding efficiency, they leave a compressed video bitstream very vulnerable to the
errors inherent in mobile transmission channels. Because of the use of VLC (Variable
Length Code), and because the only synchronization point in the encoded video
bitstream is the PSC (Picture Start Code) if no packetization is employed, a single
residual error bit can cause the video decoder to lose synchronization with the encoder
until the next encoded video frame. In such cases, the rest of the bitstream for the
frame containing the error is undecodable. Here a residual error refers to an error
delivered with the source data to the application layer by the transmission system of a
network, after the first error control has taken place in the data link layer of the
network. Due to motion compensation, the effects of an error within an encoded video
frame can propagate to the following frames until the next Intra frame is encountered
in the video bitstream. It is not difficult to see how much degradation a single error bit
can cause.
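The loss of synchronisation caused by a single bit error in a VLC stream can be demonstrated with a toy prefix code (the code table below is invented for illustration; real coders use Huffman tables derived from symbol statistics):

```python
# Toy variable-length (prefix) code: one flipped bit shifts the parse of
# every following codeword, so all symbols after the error are suspect.
CODE = {'a': '0', 'b': '10', 'c': '110', 'd': '111'}
DECODE = {v: k for k, v in CODE.items()}

def encode(symbols):
    return ''.join(CODE[s] for s in symbols)

def decode(bitstring):
    out, buf = [], ''
    for bit in bitstring:
        buf += bit
        if buf in DECODE:           # a complete codeword has been seen
            out.append(DECODE[buf])
            buf = ''
    return out

msg = list('abacad')
clean = encode(msg)
# Flip the second bit: every codeword boundary after it is misjudged.
corrupt = clean[0] + ('1' if clean[1] == '0' else '0') + clean[2:]
assert decode(clean) == msg
assert decode(corrupt) != msg       # decoder has lost synchronisation
```

In a real bitstream the decoder cannot even detect where the parse went wrong until it finds the next PSC, which is why one residual bit can cost a whole frame.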
Now we can see what the challenge is: transmitting an extremely vulnerable encoded
video bitstream over an extremely hostile mobile environment. Both the network and
the encoded video bitstream call for some kind of error resilience feature to realize
video transmission over a mobile network.
1.4 State of the Art of the Current Error Resilience Tools
To cope with the requirements of error resilient video coding, diverse error resilience
techniques have been developed [11,12], and some of them have been incorporated
into the MPEG-4 [7,8] and H.263 [9,10] video coding standards. However, all these
error resilience video coding tools are passive in the sense that they do not have the
capability to correct the error bits in a video bitstream; what they can do is limit the
effect and influence of the errors to a certain degree. For instance, packetization
(which has been adopted by both the H.263 and MPEG-4 video coding standards)
simply places resynchronization markers in an encoded video bitstream, letting the
decoder regain synchronization when an error occurs by searching for these markers,
thereby limiting the error effects to the packet where the error occurs. Obviously,
these passive error resilience techniques are not satisfactory when employed in mobile
environments. Even when errors are limited to one or several packets, the information
contained in those packets still has to be discarded, and this discarded information is
unrecoverable. With the inter-frame error propagation caused by motion estimation
and motion compensation, the quality of subsequent decoded video frames rapidly
declines to an unrecognizable level if no proper measures are taken. The conclusion
we reach from observation and practice is that passive error resilience techniques are
not enough. Active error resilience techniques, with the capability to correct the errors
in the bitstream before video decoding, need to be developed.
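The marker mechanism described above can be sketched as a toy packetiser; the marker byte pattern is illustrative, and the emulation-prevention rules that real standards use to keep markers out of payload data are deliberately omitted:

```python
# Sketch of marker-based resynchronisation.  On a corrupt packet the
# decoder skips to the next marker: only that packet's data is lost,
# instead of everything up to the next picture start code, but the lost
# packet itself remains unrecoverable (the "passive" limitation).
MARKER = b'\x00\x00\x01'    # illustrative marker value

def packetize(packets):
    return b''.join(MARKER + p for p in packets)

def depacketize(stream, is_corrupt):
    chunks = stream.split(MARKER)[1:]    # leading split element is empty
    return [p for i, p in enumerate(chunks) if not is_corrupt(i)]

pkts = [b'slice0', b'slice1', b'slice2', b'slice3']
stream = packetize(pkts)
kept = depacketize(stream, is_corrupt=lambda i: i == 2)
# Packet 2 is discarded and unrecoverable; packets 0, 1 and 3 survive.
```

The sketch makes the passivity concrete: the damage is contained, but nothing is corrected.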
1.5 Second Error Control and ECC Video
As stated in the last section, the main disadvantage of current error resilience
techniques is that they do not correct the errors in the video bitstream; instead they try
to reduce the error effects to a certain degree, using some kind of error concealment
technique. Rather than accepting the residual errors passively and trying to ‘repair’
their effects, a much more satisfactory result can be expected if we apply a ‘Second
Error Control’ (SEC) to an encoded video bitstream, recovering the corrupted video
data by actively correcting the errors in the bitstream. The SEC copes with the
residual errors in the source data at the application layer, after the ‘First Error Control’
in the data link layer has dealt with the original error bits in the transmission channel.
Here First Error Control is achieved by FEC (Forward Error Correction) and the
associated ARQ technique, as already described. Employing ARQ again in the SEC is
clearly unrealistic, since the first error control has probably used up the entire time
budget allowed for retransmission; the better choice is to use FEC. Since even
compressed video data imposes a large data rate on any transmission system, the
requirement on the SEC is very demanding, as the data rate allowed for SEC overhead
is limited. In this research we use a punctured convolutional code to realize SEC,
which achieves a very high coding rate while retaining very strong error correction
capability. Simulation results from this research have shown the potential success of
the proposed SEC scheme in applications involving video transmission in mobile
environments. When applied to video transmission, it is simpler, more easily
implemented, more efficient and more effective than the current error resilient video
coding tools in the MPEG-4 and H.263 video coding standards.
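The rate gain that makes punctured codes attractive for SEC can be illustrated with a toy rate-1/2 mother code; the generator polynomials and the puncturing pattern below are textbook-style examples in the spirit of [20], not necessarily those used in this thesis:

```python
# Rate-1/2 convolutional encoder (constraint length 3, generators 7 and 5
# octal) followed by puncturing: deleting 6 of every 14 mother-code bits
# raises the code rate from 1/2 to 7/8 at the cost of correction power.
G1, G2 = 0b111, 0b101

def conv_encode(bits):
    state, out = 0, []
    for b in bits:
        state = ((state << 1) | b) & 0b111
        out.append(bin(state & G1).count('1') % 2)  # parity of taps G1
        out.append(bin(state & G2).count('1') % 2)  # parity of taps G2
    return out

# Puncturing map over one period of 7 input bits (14 coded bits):
# 1 = transmit, 0 = delete.
P = [1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0]

def puncture(coded):
    return [b for b, keep in zip(coded, P * (len(coded) // len(P) + 1)) if keep]

info = [1, 0, 1, 1, 0, 0, 1]     # 7 information bits
mother = conv_encode(info)       # 14 coded bits at rate 1/2
sent = puncture(mother)          # 8 bits transmitted -> rate 7/8
```

At the decoder, the deleted positions are filled with neutral (erasure) metrics and the ordinary Viterbi algorithm for the mother code is run unchanged, which is what makes puncturing so flexible.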
1.6 Organization of the Thesis
Following the present introductory chapter, Chapter 2 presents a brief overview of GSM
networks, which lays the foundation for discussion on possibilities and scenarios for
live video transmission over GSM based networks. In Chapter 3 the potential and
possibility for video transmission over GPRS networks are explored, and some
proposals and suggestions are given to overcome the disadvantages inherent in GPRS
networks for live video communications.
Chapter 4 provides an extensive overview of the state of the art of video coding and
error resilient video coding techniques. The low bit rate video coding techniques are
described first, followed by an introduction to the current video coding standards.
Although not widely employed in practice, model based video coding techniques [17]
are also introduced in this chapter, because of their extremely high coding efficiency
and because they have been incorporated in the MPEG-4 video coding standard. As
they are used in all current video coding standards, motion estimation and motion
compensation techniques occupy a large portion of this chapter, and another important
aspect, the DCT (discrete cosine transform), is also described in detail. The main
emphasis of the chapter, however, is on error resilient video coding techniques: first
the general error resilience video coding techniques are outlined, and then the error
resilient video coding tools in the H.263 and MPEG-4 standards are described in
detail.
In Chapter 5, current error correction coding techniques are reviewed. The chapter
starts with a brief introduction to block codes [18], followed by a description of
convolutional codes [19]. The focus is on convolutional coding techniques, because
this is the error correction technique used in the proposed error resilient video coding
scheme. More specifically, the basic encoding structure of a convolutional code, the
maximum likelihood decoding algorithm (the Viterbi algorithm) and the performance
of convolutional codes are described in detail. The core techniques in the proposed
scheme, applied to source coding for error resilience at the application layer, are
punctured convolutional codes [20], which form an important subclass of
convolutional codes. These are described at length, followed by a discussion of their
attractiveness and flexibility.
In Chapter 6, the passiveness and disadvantages of the error resilience techniques in
the current standards are identified, along with the need for active error resilience
tools. One possible solution, the SEC approach, is given to overcome the problems
inherent in the error resilience tools of the current video coding standards. The ECC
(Error Correction Coding) scheme, as an implementation of the SEC approach, is
described in detail in this chapter; in the proposal it is achieved with a punctured
convolutional code. The simulation results, which prove the success of SEC, are
given, followed by some discussion.
Though the proposed new scheme shows very high performance, it does have a
drawback. Because the only synchronization point in an encoded bitstream is the
picture start code [8] if no packetization is employed, any single error bit that escapes
the protection of the ECC scheme will cause the decoder to lose synchronization with
the encoder until the start of the next frame (though it is very rare for errors to escape
the ECC protection if the ECC rate matches the residual error conditions). To solve
this problem, a new scheme using back channel messages is proposed in Chapter 7.
The new scheme is named IFR (Intra Frame Relay): when an Intra frame is decoded
with errors, the corresponding area in the following P frame is encoded in Intra mode,
to increase the possibility that the following picture frames have decent reference
frames. The simulation results in this chapter give positive support to the IFR scheme.
In Chapter 8, the ECC approach is further enhanced by using the soft decision Viterbi
decoding algorithm to decode the punctured convolutional code. Simulation results
based on 100 tests show that ECC with coding rate 7/8 corrects all the residual errors
when the BER (bit error rate) of the residual errors is less than 10^-3. More
significantly, when IFR is employed, decent video communication can be realized
even in residual error conditions as poor as a BER of 10^-2. In Chapter 6 it is shown
that the packetization approach fails to deliver a satisfactory output once the BER of
the final video bitstream reaches 10^-4.
In Chapter 9, the ECC scheme is extended to more challenging situations, where
bursty errors and packet loss occur frequently. To combat these, the final video
bitstream (after convolutional coding is performed) is interleaved before being sent to
the channel [28,29]. Simulation results show that ECC with interleaving is an
effective approach to cope with both bursty residual errors and packet loss.
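The interleaving step can be sketched as a classical block interleaver; the array dimensions below are illustrative, not those used in the thesis simulations:

```python
# Block interleaver sketch: write symbols row-wise into a depth x width
# array and read them out column-wise.  A channel burst of up to `depth`
# consecutive errors is spread into isolated errors at least `width`
# apart after de-interleaving, which a convolutional decoder copes with
# far better than with a contiguous burst.
def interleave(seq, depth, width):
    assert len(seq) == depth * width
    rows = [seq[r * width:(r + 1) * width] for r in range(depth)]
    return [rows[r][c] for c in range(width) for r in range(depth)]

def deinterleave(seq, depth, width):
    return interleave(seq, width, depth)   # inverse: swap the dimensions

data = list(range(24))
tx = interleave(data, depth=4, width=6)
assert deinterleave(tx, depth=4, width=6) == data
# A burst hitting tx positions 0-3 corrupts data positions 0, 6, 12, 18:
burst_hits = sorted(tx[0:4])
```

The price of the technique is latency: both ends must buffer depth × width symbols before anything can be released, which is why the array dimensions must be chosen against the delay budget of live video.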
The possibility of combining the advantages of ECC and packetization is explored in
Chapter 10. The conclusion from simulation is that packetization can increase the
performance of ECC when a back channel is not available in practice, even though
packetization is much less effective and much less efficient than the ECC scheme
when used alone. The thesis is concluded in Chapter 11 with suggested future
research directions.
1.7 Contributions and Publications from the Research
The first research contribution of this work is the proposal of a method to update the
channel capacity to accommodate the different data rate requirements of the different
frame types of a compressed video bitstream during live video communication. The
reconfiguration of the multi-slot allocation to exceed the current set of active channels
should be achieved through communication between the MS (Mobile Station) and the
BSS (Base Station Subsystem), rather than by re-accessing the PRACH (Packet
Random Access Channel) during real time transmission, which would involve further
contention and introduce delays. The content of this communication should be
embedded in the video data transmitted from the MS to the BSS (see Chapter 2 and
Chapter 3 for details).
The most important contribution of the work included in this thesis is the introduction
of the SEC scheme and three error resilience tools based on the SEC approach. In the
proposed application of SEC to video transmission in mobile environments, SEC
achieved with ECC has outperformed the error resilience tools represented by
resynchronization in the MPEG-4 standard, and has opened a new direction for the
development of error resilience techniques. It is expected that these proposals might
be considered for incorporation in future video coding standards, to augment the error
resilience tools in the current standards.
The research included in this thesis has resulted in a number of publications, which are
listed as References [22-33]. The key publications and the further contributions
contained in them are outlined in the following paragraphs.
[22] explores the possibility of video transmission over a GSM network. The
introduction of HSCSD and GPRS into the GSM network makes it possible to
transmit a video bitstream encoded with the H.261 or H.263 video coding standards
through a GSM network. However, to make the application realistic, a compromise is
needed between the wide variations in bit rate needed to cater for all picture types,
modeled by an unbounded VBR scheme, and the inflexibility imposed by the network
in allowing only quantum channel allocation, modeled by step-function bit-rate
performance variations or variable constant bit rate (VCBR). The resolution of this
dilemma relies on better system integration and interoperation between the network
behavior and the video coding process, by extracting useful bit-rate information over
many successive frames and exerting careful, intelligent control throughout the
transmission.
[23] describes the aspects associated with video communication over mobile networks
for medical applications, identifies the conceptual and operational problems with these
applications, and gives some suggestions to solve them.
[24] builds on the earlier work in [22]. The advantage of using GPRS rather than
HSCSD to transmit a low bit rate video bitstream is identified. An elementary
dynamic radio channel allocation scheme, based on coordination between base station
and mobile station, is proposed, and some typical more complex situations which
would need further development are described.
The ECC approach first appeared in [27,30]. The simulation results in those papers,
using the hard decision Viterbi decoding algorithm, give the first challenge to the
traditional approaches in the standard in terms of coding efficiency and reconstructed
video output quality.
The IFR scheme, which copes with the disadvantage of the ECC approach, is proposed
in [31,32] with the support of simulation results. The ECC scheme is further improved
in [34] using the soft-decision Viterbi decoding algorithm for convolutional decoding.
The ECC scheme is extended to bursty error and packet loss situations in [28,29],
where the video bitstream generated after ECC is interleaved before being sent to a
radio channel. Simulation results show that ECC enhanced with interleaving is
effective in coping with both bursty errors and packet loss.
The ECC approach is generalized into second error control for error resilience in [33,
35]. These papers also demonstrate that ECC enhanced with IFR can realize video
communication in residual error conditions where the BER of the residual errors
reaches 10^-2.
References
[1] Jari Hamalainen, “Design of GSM High Speed Data Services” Ph.D. Thesis, Nokia
Mobile Phones Ltd, Tampere, Finland, 1996.
[2] S. Lin and D. J. Costello, “Error Control Coding: Fundamentals and Applications”,
Prentice-Hall, Inc. 1983.
[3] Asha Mehrotra, “GSM System Engineering”, Artech House Publishers, 1997.
[4] Siegmund M. Redl, Matthias K. Weber and Malcolm W. Oliphant, “An
Introduction to GSM”, Artech House Publishers, 1995.
[5] J. Hamalainen, “General Packet Radio Service”, in Z. Zvonar, P. Jung and
K. Kammerlander (eds.), “GSM Evolution Towards 3rd Generation Systems”, Kluwer
Academic Publishers, 1999, pp.65-80.
[6] J. Cai and D. J. Goodman, “General Packet Radio Service in GSM”, IEEE
Communications Magazine, October 1997, pp.122-131.
[7] R. Talluri, “Error-Resilient Video Coding in the ISO MPEG-4 Standard”, IEEE
Communications Magazine, June 1998, pp.112-119.
[8] ISO/IEC 14496-2, “Information Technology – Coding of Audio-Visual Objects:
Visual”.
[9] ITU – T Recommendation H.263, “Video coding for low bit rate communication”,
February 1998.
[10] J. Ott, Stephan Wenger and Gerd Knorr, “Application of H.263+ Video Coding
Modes in Lossy Packet Network Environments”, Journal of Visual Communication
and Image Representation, Vol. 10, 1999, pp.12-38.
[11] Y. Wang and Q. Zhu, “Error Control and Concealment for Video Communication:
A Review”, Proceedings of the IEEE, Vol. 86, No. 5, May 1998. pp.974 – 997.
[12] Y. Wang, S. Wenger, J. Wen and A. Katsaggelos, “Error Resilient Video Coding
Techniques – Real-time Video Communications over Unreliable Networks”, IEEE
Signal Processing Magazine, July 2000. pp.61-82.
[13] ISO/IEC 11172-2, “Information technology - Coding of moving pictures and
associated audio for digital storage media at up to about 1.5 Mbit/s: Part 2 Video”,
Aug. 1993.
[14] N. Ahmed, T. Natarajan and K. R. Rao, “Discrete Cosine Transform”, IEEE
Trans. on Computers, 1974, pp.90-93.
[15] B. Furht, J. Greenberg and R. Westwater, “Motion Estimation Algorithms for
Video Compression”, Kluwer Academic Publishers, November 1996.
[16] D. Salomon, “Data Compression”, Springer Verlag, December 1997.
[17] L. Torres and M. Kunt, “Second generation video coding techniques”, in
L. Torres and M. Kunt (eds.), “Video Coding: The Second Generation Approach”,
Kluwer Academic Publishers, 1996, pp.1-31.
[18] J. G. Proakis, “Digital Communication”, McGraw – Hill, 1995.
[19] R. Johannesson and K. Sh. Zigangirov, “Fundamentals of Convolutional Coding”,
IEEE Press, 1999.
[20] Y. Yasuda, K. Kashiki and Y. Hirata, “High-Rate Punctured Convolutional Codes
for Soft Decision Viterbi Decoding”, IEEE Trans. on Comm., Vol. Com-32, No. 3,
March 1984. pp. 315-319.
[21] ISO/IEC: 13818 (MPEG-2). “Information technology – Generic Coding of Moving
Pictures and Associated Audio Information”.
[22] Bing Du and Anthony Maeder, “Approaches to Video Transmission over GSM
Networks”, Proceedings of SAICSIT 99, South Africa, pp. 28-31.
[23] Bing Du, Anthony Maeder and Miles Moody, “Televideo Transmission over
Mobile Channels for Medical Applications”, ARC Special Research Workshop on
Aspects of Telemedicine, Gold Coast, Australia, 24 October 1999.
[24] Bing Du, A. Maeder and M. Moody, “A framework for live video delivery over
GPRS networks”, Proceedings of AMOC 2000, November 2000, Penang,
Malaysia, pp. 97-101.
[25] Bing Du, A. Maeder and M. Moody, “Video delivery over mobile communication
channels”, CRC-SS annual conference, Adelaide, Australia, 2000.
[26] Bing Du, A. Maeder and M. Moody, “Dynamic hybrid ARQ scheme for video over
GPRS network”, CRC-SS annual conference, Newcastle, Australia, 2001.
[27] Bing Du, M. Ghanbari, “MPEG-4 Video with Error Correction Coding”, Internal
technical report, University of Essex, June 2002.
[28] Bing Du, M. Ghanbari, “ECC video and its performance in bursty channel errors”,
Proceedings of Iranian Conference on Electrical Engineering (ICEE) 2003, 6-8
May, 2003, Shiraz, Iran.
[29] Bing Du, M. Ghanbari, “ECC video in bursty channel errors and packet loss”,
Proceedings of Picture Coding Symposium 2003, Saint-Malo, France, 23 - 25 April
2003. pp.99-103.
[30] Bing Du, Anthony Maeder and Miles Moody, “A new approach for error
resilience in video transmission using ECC”, Proceedings of the International
Workshop on Very Low Bit-rate Video, Madrid, Spain, 18-19 September 2003,
pp.275-282.
[31] Bing Du, A. Maeder and M. Moody, “Intra Frame Relay in ECC video”, Image
and Vision Computing New Zealand (IVCNZ) 2003, Palmerston North, New Zealand,
24-25 November 2003, pp.193-198.
[32] Bing Du, Anthony Maeder and Miles Moody, “ECC video with Intra Frame
Relay”, Proceedings of the IADIS International Conference WWW/Internet 2003,
ICWI 2003, Algarve, Portugal, November 5-8, 2003. IADIS 2003, ISBN 972-
98947-1-X, pp.1007 - 1012.
[33] Bing Du, A. Maeder and M. Moody, “Second Error Control for Error Resilience
Video Coding”, Proceedings of Digital Image Computing - Techniques and
Applications (DICTA) Conference, Sydney, Australia, 10-12 December 2003,
pp.1027-1036.
[34] Bing Du, Anthony Maeder and Miles Moody, “ECC Approach with Soft-Decision
Viterbi Decoding for Error Resilience in Video Communications”, submitted to
27th Australasian Computer Science Conference (ACSC 2004), Dunedin, New
Zealand, 18-22 January 2004.
[35] Bing Du, Anthony Maeder and Miles Moody, “Second Error Control for Live
Video Communication”, submitted to IEEE Transactions on Circuits and Systems
for Video Technology.
[36] W. C. Jakes, “Microwave mobile communications”, May 1994, Wiley-IEEE Press.
[37] S. R. McCanne, “Scalable compression and transmission of Internet multicast
video”, Ph.D. thesis, University of California, Berkeley, CA, December 1996.
[38] Q. Zhang, S. Kassam, “Hybrid ARQ with Selective Combining for Fading
Channels” IEEE Journal on Selected Areas in Comm. Vol. 17 Num. 5, May 1999.
2 OVERVIEW OF GSM SYSTEM
2.1 Architecture and functions of the GSM network
The GSM network can be divided into four main parts [1,2,3,4]:
• The Mobile Station (MS).
• The Base Station Subsystem (BSS).
• The Network and Switching Subsystem (NSS).
• The Operation and Support Subsystem (OSS).
The architecture of the GSM network is presented in Figure 2.1.
Figure 2.1 General architecture of a GSM network
2.1.1 Mobile station
The mobile station (MS) consists of the mobile equipment (the terminal) and a smart
card called the Subscriber Identity Module (SIM). The SIM provides personal mobility,
so that the user can have access to subscribed services irrespective of a specific
terminal. By inserting the SIM card into another GSM terminal, the user is able to
receive calls at that terminal, make calls from that terminal, and receive other
subscribed services. The SIM card may be protected against unauthorized use by a
password or personal identity number. The mobile equipment is uniquely identified by
the International Mobile Equipment Identity (IMEI).
The SIM card contains the following information:
• IMSI, the International Mobile Subscriber Identity, used to identify the subscriber to
the system. The IMSI is independent of the terminal, thereby allowing personal
mobility.
• TMSI, the Temporary Mobile Subscriber Identity, used together with the LAI to
identify the MS while it is served by the VLR that covers the location area.
• LAI, location area identity.
• Ki, a permanent key for authentication.
• Kc, a cipher key.
2.1.2 The Base Station Subsystem
All radio-related functions are performed in the Base Station Subsystem (BSS). The
BSS consists of base station controllers (BSCs) and base transceiver stations (BTSs).
2.1.2.1 The Base Transceiver Station
The base transceiver station (BTS) handles the radio interface to the mobile station. The
BTS is the radio equipment (transceivers and antennas) needed to service each cell in
the network. A group of BTSs are controlled by a BSC.
2.1.2.2 The Base Station Controller
The base station controller (BSC) provides all the control functions and physical links
between the MSC and BTS. It is a high-capacity switch that provides functions such as
handover, cell configuration data, and control of radio frequency power levels in base
transceiver stations. A number of BSCs are served by an MSC.
2.1.3 The Network and Switching Subsystem (NSS)
The main role of the NSS is to manage the communications between mobile users and
other users, such as other mobile users, ISDN users and fixed telephony users. It also
includes the databases needed to store information about the subscribers and to
manage their mobility. The different components of the NSS are described below.
2.1.3.1 The Mobile services Switching Center (MSC)
The MSC is the central component of the NSS. It performs the telephony switching
functions of the system and controls calls to and from other telephone and data systems.
It also performs such functions as toll ticketing, network interfacing, common channel
signaling, and others.
2.1.3.2 The Gateway Mobile services Switching Center (GMSC)
A gateway is a node interconnecting two networks. The GMSC is the interface between
the mobile cellular network and the PSTN. It is in charge of routing calls from the fixed
network towards a GSM user. The GMSC is often implemented in the same machines
as the MSC.
2.1.3.3 Home Location Register (HLR)
The HLR is a database used for storage and management of subscriptions. It is
considered the most important database since it stores permanent data on subscribers,
including a subscriber’s service profile, location information, and activity status. When
an individual buys a subscription from one of the PCS operators, he or she is registered
in the HLR of that operator.
The HLR contains the following information:
• MSISDN of MS, the mobile station’s ISDN number which is dialled by a subscriber
when calling the MS. It is used by the fixed network to route calls for MS to a
nearby gateway MSC in the home PLMN of MS.
• IMSI of MS.
• Originating and terminating service profile of MS.
• Address of the VLR associated with MSC that is currently serving the MS.
2.1.3.4 Visitor Location Register (VLR)
The VLR is a database that contains temporary information about subscribers that is
needed by the MSC in order to service visiting subscribers. The VLR is always
integrated with the MSC. When a mobile station roams into a new MSC area, the VLR
connected to that MSC will request data about the mobile station from the HLR. Later,
if the mobile station makes a call, the VLR will have the information needed for call
set-up without having to interrogate the HLR each time.
The VLR stores the following information:
• MSISDN.
• Originating and terminating service profile of MS.
• IMSI.
• TMSI.
• LAC, the location area code of the current MS location area.
• MSRN, the mobile station roaming number (the equivalent of a temporary location
directory number).
2.1.3.5 The Authentication Center (AuC)
The AuC provides authentication and encryption parameters that verify the user’s
identity and ensure the confidentiality of each call. The AuC protects network operators
from different types of fraud found in today’s cellular world. The authentication
procedure involves the SIM card and the AuC. A secret key, Ki, stored in the SIM
card and the AuC, and an authentication algorithm called A3 are used to verify the
authenticity of the user. The mobile station and the AuC each compute an SRES
(signed result) using the secret key, the algorithm A3 and a random number generated
by the AuC. If the two computed SRES values are the same, the subscriber is
authenticated. The
different services to which the subscriber has access are also checked. Another
security procedure is to check the equipment identity: if the IMEI number of the
mobile is authorized in the EIR, the mobile station is allowed to connect to the
network. In order to assure user confidentiality, the user is registered with a
Temporary Mobile Subscriber Identity (TMSI) after the first location update
procedure. Enciphering, using Kc, is a further option guaranteeing very strong
security.
The information stored in AuC is:
• MSISDN.
• Ki.
• Kc.
2.1.3.6 The Equipment Identity Register (EIR)
The EIR is also used for security purposes. It is a register containing information
about the mobile equipment; more particularly, it contains a list of all valid terminals.
A terminal is identified by its International Mobile Equipment Identity (IMEI). The
EIR thus makes it possible to bar calls from stolen or unauthorized terminals (e.g., a
terminal which does not respect the specifications concerning output RF power). The
AuC and EIR are implemented as stand-alone nodes or as a combined AuC/EIR node.
2.1.3.7 The GSM Interworking Unit (GIWU)
The GIWU corresponds to an interface to various networks for data communications.
During these communications, the transmission of speech and data can be alternated.
2.1.4 The Operation and Support Subsystem (OSS)
The OSS is connected to the different components of the NSS and to the BSC, in
order to control and monitor the GSM system. It is also in charge of controlling the
traffic load of the BSS. However, the growing number of base stations that
accompanies the development of cellular radio networks has led to some maintenance
tasks being transferred to the BTS, which considerably decreases the cost of
maintaining the system.
2.1.5 Additional Functional Elements
2.1.5.1 Message Center
The message center (MXE) is a node that provides integrated voice, fax, and data
messaging. Specifically, the MXE handles short message service, cell broadcast, voice
mail, fax mail, email, and notification.
2.1.5.2 Mobile Service Node
The mobile service node (MSN) is the node that handles the mobile intelligent network
(IN) services.
2.1.6 The geographical areas of the GSM network
Figure 2.2 presents the different areas that form a GSM network. A cell,
identified by its Cell Global Identity number (CGI), corresponds to the radio coverage
of a base transceiver station. A Location Area (LA), identified by its Location Area
Identity (LAI) number, is a group of cells served by a single MSC/VLR. A group of
location areas under the control of the same MSC/VLR defines the MSC/VLR area. A
Public Land Mobile Network (PLMN) is the area served by one network operator.
Figure 2.2 GSM network areas
2.2 Signalling system in GSM
2.2.1 GSM Radio Channels
2.2.1.1 Dedicated Channels
Dedicated Channels include TCHs (traffic channels) and DCCHs (dedicated control
channels). The DCCH, used for message transfers between the network and the
mobile station, includes the SDCCH (standalone dedicated control channel), the
SACCH (slow associated control channel) and the FACCH (fast associated control
channel). A BSS has a pool of SDCCHs and a pool of TACHs (traffic and associated
control channels). The details of these channels are described below.
• SDCCH is allocated to an MS for call set-up signalling and released when this
signalling is complete.
• TCH is used for carrying speech and data.
• SACCH is always used in association with either a traffic channel or SDCCH. The
purpose of the SACCH is channel maintenance. The SACCH carries control and
measurement parameters or routing data needed to maintain a link between the
mobile and the base station.
• FACCH is associated with a TCH. It can carry the same information as the SDCCH.
The difference is that the SDCCH exists on its own, whereas the FACCH replaces all
or part of a traffic channel. If during a call there is a need for heavy-duty signalling
with the system, at a rate much higher than the SACCH can handle, the FACCH
appears in place of the traffic channel.
• TACH is the combination of a TCH and its SACCH as well as FACCH.
2.2.1.2 Common Control Channels (CCCH)
One RF carrier in each cell contains a CCCH which is time-divided into a number of
common (point-to-multipoint, unidirectional) channels, for signalling between a BSS
and all mobiles in the cell that are active, but not involved in a call:
• BCCH (Broadcast Control Channel) informs the mobile station about specific
system parameters it needs to identify the network or to gain access to the network.
The parameters include, among others, the LAC (location area code), the MNC
(mobile network code, identifying a GSM network within a country), information on
which frequencies the neighbouring cells may be found, different cell options and
access parameters.
• FCCH (frequency control channel) contains information for mobiles concerning
frequency synchronisation with the RF carrier.
• SCH (synchronisation channel) supplies the mobile station with information
enabling it to acquire frame and time synchronisation with the BSS.
• PAGCH (paging and access grant channel) broadcasts paging messages. Also when
the network (MSC) allocates a SDCCH to a MS, it informs the MS with a message
on PAGCH.
• RACH (random access channel), the only uplink (from MS to NSS) common
control channel, is used by mobiles to request an SDCCH from the network.
2.2.2 Signalling Interfaces and Protocols
The interfaces and protocols for signalling between a MS and the PLMN are shown in
Fig.2.3. The Um (radio) interface is between a MS and the BSS; the A (cable) interface is
between the BSS and the MSC. A further interface, A-bis, between BTS and BSC is shown in
Fig.2.4. The GSM-MAP (mobile application part) interfaces between the equipment
entities of the network and switching system are shown in Fig.2.5.
Figure 2.3 Um and A interface
2.2.2.1 Um Interface
The signalling protocol on this interface has three layers.
• Physical Layer (Layer 1) consists of those parts of the RF channels that contain
signalling channels (SACCH, FACCH, BCCH, SCH, FCCH, PAGCH, RACH and
SDCCH).
• Data Link Layer (Layer 2) [8], known as LAPDm, is a modified version of the ISDN
[5] link access protocol for D-channels.
• Message Layer (Layer 3). In MS this layer consists of three parts:
RR (radio resource management) sublayer at a MS communicates with its peer
in the BSS. For example, when RR at the BSS allocates a TACH or a SDCCH
channel to a MS, it informs the MS with a RR message.
MM (mobility management) sublayer messages support MS location updating
and authentication.
CM sublayer which has a further three parts:
(a) CC (call control) contains the messages for the set-up and release of
connections to the MS.
(b) SS (supplementary services) concerns the management of the
supplementary services. MS and HLR are the only entities involved in SS
management.
(c) SMS (short message service) is a service by which subscribers can send
short (text) messages to a MS.
2.2.2.2 A Interface
The signalling protocol on this interface is adapted from Signalling System 7. The MTP
(message transfer part) shown in Fig.2.3 actually comprises 3 sublayers: MTP1, MTP2
and MTP3. MTP1 and MTP2 correspond respectively to the Physical Layer and Data Link
Layer of the OSI model. MTP3 and the SCCP (signalling connection control part) fulfil
the function of the Network Layer of OSI. The user of the SCCP, BSSAP (BSS application
part), which comprises DTAP (direct transfer application part) and BSSMAP (BS
system management application part), actually passes only the RR and O&M (operations
and maintenance) messages. DTAP is used by the BSS to transfer RR messages
transparently between MS and MSC. BSSMAP is the process within the BSS that controls
RR in response to instructions from the MSC; it is used in the assignment and switching
of RR at both call setup and handover.
2.2.2.3 A-bis Interface
As a number of BTSs can be served by one BSC, as shown in Fig. 2.4, there is a need for
communication between BTS and BSC.
Figure 2.4 A-bis interface
The protocol used at this interface contains three layers: a physical layer (layer 1),
signalling links (layer 2), and an upper layer (layer 3) of signalling.
The physical layer transmits either at 2.048 Mbps or at 64 kbps. Four coded
speech channels at 13 kbps may be multiplexed to form a 64 kbps data channel after being
padded with extra bits.
Layer 2 uses the standard LAPD. The main distinction between LAPD and LAPDm
is that LAPDm is only used for the unacknowledged mode of operation, which
applies to BCCHs and CCCHs. Both FCCH and SCH under BCCH do not require
acknowledgment. Similarly, no acknowledgment is needed for PCH and AGCH.
Layer 3 deals with the messages transferred from the OMC to the BTS, as there is no direct
link between BTS and OMC. All messages from the OMC go first to the BSC and then
are routed to the BTS.
2.2.2.4 MAP Interfaces
The interfaces between the different network parts are shown in Fig.2.5. These interfaces
are designated MAP/B through MAP/H. The SS7 protocol [6,7] is used at all these
interfaces. The MAP protocol is used for remote database access, performed by the
exchange of messages that are grouped into simple dialogues, mostly in the form of
query and response.
Figure 2.5 MAP interfaces
2.2.2.5 X.25 Interface System
Communication between the MSC and the OMC, including billing centre information, is
accomplished by deploying the X.25 protocol [9].
2.3 The Multiple Access Scheme
The radio interface of GSM uses a combination of FDMA (Frequency Division
Multiple Access) and TDMA (Time Division Multiple Access) with some frequency
hopping.
2.3.1 FDMA
The use of frequency resources in the GSM development has followed three stages:
Primary GSM, E-GSM and DCS-1800.
2.3.1.1 Primary GSM
The Primary GSM system refers to the first generation of GSM systems, in which two
25 MHz frequency bands in the 900 MHz range are used. The mobile station transmits in
the 890 to 915 MHz frequency range while the base station transmits in the 935 to
960 MHz range. The frequency bands are divided into 125 channels of 200
kHz each, numbered from 0 to 124. However, only the 124 channels
from number 1 to 124 are used, and these channels are usually referred to by their ARFCN
(absolute radio frequency channel number). Channel number 0 is used as a guard band
between GSM and other services on lower frequencies.
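The mapping from channel number to carrier frequency follows directly from these figures: channels are 200 kHz wide starting at the 890 MHz band edge, and the 935 to 960 MHz downlink band implies a 45 MHz duplex spacing. A minimal sketch (function names are illustrative):

```python
def uplink_mhz(arfcn: int) -> float:
    """Primary GSM uplink carrier frequency for ARFCN 1..124."""
    if not 1 <= arfcn <= 124:
        raise ValueError("Primary GSM uses ARFCNs 1 to 124")
    # Channels are 200 kHz wide, counted up from the 890 MHz band edge.
    return 890.0 + 0.2 * arfcn

def downlink_mhz(arfcn: int) -> float:
    """Downlink carrier: the uplink frequency plus the 45 MHz duplex spacing."""
    return uplink_mhz(arfcn) + 45.0

print(round(uplink_mhz(1), 1), round(downlink_mhz(1), 1))  # 890.2 and 935.2 MHz
```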
2.3.1.2 E-GSM
With the further development of the GSM standard, an additional range of frequencies
has been made available to the system. For each of the two duplex frequency ranges,
one for the forward direction and the other for the reverse direction, an additional 10
MHz has been added to the bottom end of the band, extending the frequency range to
cover another 50 channels. These additional channels are numbered from 974 to
1023. Channel 0 is returned to use in the extended GSM system; instead, the lowest
channel (number 974) serves as the guard band.
2.3.1.3 DCS-1800
As the evolution of GSM progressed towards use as a personal communication network,
the official name of the system became DCS-1800 when ETSI completed its
specification. In DCS-1800 the frequency ranges of 1710 to 1785 MHz in
the uplink direction and 1805 to 1880 MHz in the downlink are used, and the duplex
spacing is 95 MHz with 374 channels of 200 kHz each.
2.3.2 TDMA
As stated above, each carrier frequency has a width of 200 kHz. The TDMA scheme
splits each frame of about 4.615 ms on this carrier into 8 timeslots of about 0.577 ms
each. Each of these timeslots is a physical channel occupied by an individual user. The
timeslots within a frame are numbered from 0 to 7. In traffic channel combinations, a
structure of 26 frames is defined as a multiframe. Similarly, in signalling channel
combinations, a multiframe is defined as the combination of 51 frames.
2.3.2.1 Traffic channel Frame Structure (26-Multiframe)
The traffic channel frame structure is shown in Fig.2.6. The length of a 26-frame
multiframe is 120 ms, which is how the length of a burst period is defined (120 ms
divided by 26 frames, divided by 8 burst periods per frame). Of the 26 frames, 24 are
used for traffic, 1 is used for the Slow Associated Control Channel (SACCH) and 1 is
currently unused. TCHs for the uplink and downlink are separated in time by 3 burst
periods, so that the mobile station does not have to transmit and receive simultaneously,
thus simplifying the electronics.
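The timing figures quoted here are mutually consistent with those in Section 2.3.2, as a few lines of arithmetic confirm (a sketch; variable names are illustrative):

```python
MULTIFRAME_MS = 120.0          # duration of the 26-frame traffic multiframe
FRAMES_PER_MULTIFRAME = 26
SLOTS_PER_FRAME = 8

frame_ms = MULTIFRAME_MS / FRAMES_PER_MULTIFRAME   # one TDMA frame
burst_ms = frame_ms / SLOTS_PER_FRAME              # one burst period

print(round(frame_ms, 3))  # 4.615 ms, as quoted in Section 2.3.2
print(round(burst_ms, 3))  # 0.577 ms
```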
In addition to these full-rate TCHs, there are also half-rate TCHs defined, although they
are not yet implemented. Half-rate TCHs will effectively double the capacity of a
system once half-rate speech coders are specified (i.e., speech coding at around 7 kbps,
instead of 13 kbps).
Figure 2.6 Traffic channel frame structure
2.3.2.2 Signalling Frame Structure
Just as the TCHs are always combined with an ACCH in traffic channel multiframes, the
signalling channels are always grouped together to form signalling multiframes. There
are 4 different combinations, listed below:
FCCH + SCH + CCCH + BCCH.
FCCH + SCH + CCCH + BCCH + SDCCH/4 + SACCH/4.
CCCH + BCCH.
SDCCH/8 + SACCH/8.
The different combinations have different multiframe structures, details of which can be
found in [1].
2.3.2.3 Structure of a TDMA Slot within a Frame
There are five different types of bursts (the contents of the timeslot) used to carry
information on the TCH and on the control channels: the normal burst, synchronisation
burst, frequency correction burst, access burst and dummy burst.
The normal burst is used to carry data and most signalling. It has a total length of
156.25 bits, made up of two 57-bit information sequences, a 26-bit training sequence
used for equalization, 1 stealing bit for each information block (used for the FACCH), 3 tail
bits at each end, and an 8.25-bit guard sequence, as shown in Figure 6. The 156.25 bits
are transmitted in 0.577 ms, giving a gross bit rate of 270.833 kbps.
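The 156.25-bit total can be checked directly from the field sizes listed above; the exact 270.833 kbps figure follows from the burst period of 15/26 ms (a sketch):

```python
# Field sizes of the GSM normal burst, in bits.
info = 2 * 57       # two information sequences
training = 26       # training sequence used for equalization
stealing = 2 * 1    # one stealing bit per information block
tail = 2 * 3        # tail bits at each end
guard = 8.25        # guard period, expressed in bit durations

total_bits = info + training + stealing + tail + guard
print(total_bits)                     # 156.25

burst_period_ms = 15 / 26             # exactly 120 ms / 26 / 8, about 0.577 ms
gross_kbps = total_bits / burst_period_ms
print(round(gross_kbps, 3))           # 270.833
```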
The F burst used on the FCCH and the S burst used on the SCH have the same length as
a normal burst, but a different internal structure which differentiates them from normal
bursts (thus allowing synchronization). The access burst is shorter than the normal burst
and is used only on the RACH. The dummy burst is sent from BTS on some occasions
and carries no information.
2.3.3 Frequency Hopping
The propagation conditions, and therefore the multipath fading, depend on the radio
frequency. In order to avoid significant differences in the quality of the channels, slow
frequency hopping is introduced. Slow frequency hopping changes the frequency with
every TDMA frame. Fast frequency hopping, which changes the frequency many times
per frame, is not used in GSM. Frequency hopping also reduces the effects of co-channel
interference.
There are different types of frequency hopping algorithms. The algorithm selected is
sent through the Broadcast Control Channel. Even though frequency hopping can be very
useful for the system, a base station does not necessarily have to support it. On the other
hand, a mobile station has to accept frequency hopping when a base station decides to
use it.
2.4 Source coding and channel coding
Fig. 2.7 presents the different operations that have to be performed in order to pass from
the speech source to radio waves and vice versa.
Figure 2.7 Speech signal processing
2.4.1 Speech coding
Speech coding is basically the process of speech compression using digital techniques.
In poor radio conditions, the performance of the GSM speech coder has been shown to be
superior to that of analog cellular systems [4]. The mathematical operation of the GSM speech
coder is completely standardised in every detail. The following speech coding schemes
are supported in GSM systems.
2.4.1.1 Full Rate speech Coding
The standard digital signal used in most wire telephone systems to represent an audio
channel requires 64 kbps. The standard GSM speech coder compresses this data rate to
13 kbps.
2.4.1.2 Half Rate Speech Coding
The use of higher data compression rates reduces the amount of data required per user
and this increases the number of users that can share a radio channel. The half rate coder
allows a single carrier frequency to support 16 conversations instead of the 8
conversations in the full rate case.
2.4.1.3 Multirate Speech Coding
The GSM speech coder can vary its data transmission rate depending on speech activity.
The speech coder can reduce or stop transmitting the digital voice signals when speech
activity is low. When the speech coder senses no speech activity (i.e. silence), it
digitally encodes a 20 ms window of background noise to prevent sudden disturbing
changes in perceived sound characteristics when the caller stops talking. Then it shuts
off the radio transmitter until the microphone picks up some sounds again. This process
is called Discontinuous Transmission (DTX). It allows the mobile to save battery
life and the base station to reduce co-channel interference.
2.4.1.4 Enhanced Speech Coding
This scheme uses the same bit rate as full rate speech coding, but has much better
quality, comparable to that of a standard wired telephone connection. The cost is a
much more sophisticated encoding and decoding process.
2.4.2 Channel coding
Channel coding is the process of adding extra data bits along with transmitted data bits
that can be used to determine if some or all of the bits have been successfully received
without error. Three basic types of error protection coding are used in GSM: cyclic
redundancy check (CRC), block code and convolutional code.
2.4.2.1 CRC
When a call processing message or some other selected group of data bits is to be
transmitted, the entire message group of bits is first treated as a large binary number. It is
divided, in a special way, by a pre-arranged constant, and the remainder is found. The
remainder (the CRC checksum) is appended to the data and transmitted along with it. At
the receiving end, the data is again divided in the same special way. If the remainder
computed at the receiver does not match the CRC received, then errors have occurred. In
some cases, the CRC check bits can be used to help correct, by retransmission, some bits
that were received in error.
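The "division in a special way" is polynomial division over GF(2), which a few lines of bit manipulation make concrete. This is a minimal sketch using an illustrative generator polynomial (x⁴ + x + 1), not the polynomials actually specified for GSM:

```python
def crc_remainder(data_bits: list[int], poly_bits: list[int]) -> list[int]:
    """Remainder of GF(2) polynomial long division: the CRC checksum.

    data_bits and poly_bits are MSB-first lists of 0s and 1s.
    """
    r = len(poly_bits) - 1                 # degree of the generator
    work = data_bits + [0] * r             # append r zero bits, then divide
    for i in range(len(data_bits)):
        if work[i]:                        # XOR-subtract the generator
            for j, p in enumerate(poly_bits):
                work[i + j] ^= p
    return work[-r:]                       # the remainder is the checksum

# The sender appends the remainder; the receiver divides the whole
# codeword and checks for a zero remainder.
msg = [1, 0, 1, 1, 0, 0, 1]
gen = [1, 0, 0, 1, 1]                      # illustrative: x^4 + x + 1
crc = crc_remainder(msg, gen)
assert crc_remainder(msg + crc, gen) == [0, 0, 0, 0]
```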
2.4.2.2 Block Code
The GSM system uses a particular type of block code known as a Fire code. A block
code is generated by “adding” a sum of products computed over a fixed-size block of
digits. More details on block codes can be found in Chapter 5.2.
2.4.2.3 Convolutional Code
A convolutional code is calculated by “multiplying” the input data value by a pre-arranged
constant value. At the receiving end, the received value is divided by the same pre-
arranged constant value. If the remainder is zero, it is reasonable to assume that the
data was received correctly, and the quotient is the data. If the remainder is not zero,
the error can be corrected (by adding or subtracting the remainder to or from the
quotient) in certain special cases; in other cases, where the errors are too numerous
or widespread, there is at least an awareness of the errors. Please refer to Chapter 5.3
for more details about convolutional codes.
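In practice this "multiplication" is performed by a shift register: each input bit is combined with the previous few bits under fixed generator polynomials. Below is a minimal rate-1/2 encoder sketch; the taps G0 = 1 + D³ + D⁴ and G1 = 1 + D + D³ + D⁴ are those commonly cited for GSM full-rate channel coding, though this code is an illustration rather than the standard's exact procedure:

```python
def conv_encode(bits: list[int]) -> list[int]:
    """Rate-1/2 convolutional encoder, constraint length 5.

    Taps: G0 = 1 + D^3 + D^4, G1 = 1 + D + D^3 + D^4.
    Returns two coded bits per input bit (no tail-bit flushing here).
    """
    s = [0, 0, 0, 0]                 # shift register: s[0] is the newest past bit
    out = []
    for b in bits:
        c0 = b ^ s[2] ^ s[3]         # G0 taps at D^0, D^3, D^4
        c1 = b ^ s[0] ^ s[2] ^ s[3]  # G1 taps at D^0, D^1, D^3, D^4
        out += [c0, c1]
        s = [b] + s[:3]              # shift the register
    return out

# The impulse response reproduces the generator coefficients:
# the c0 stream is 1,0,0,1,1 (1 + D^3 + D^4) and the c1 stream
# is 1,1,0,1,1 (1 + D + D^3 + D^4).
print(conv_encode([1, 0, 0, 0, 0]))  # [1, 1, 0, 1, 0, 0, 1, 1, 1, 1]
```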
2.4.3 Interleaving
Interleaving is the reordering of the data to be transmitted so that consecutive bits of
data are distributed over a larger sequence of data, to reduce the effect of burst errors.
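A simple way to achieve this reordering is a block interleaver: write the bits into a matrix row by row and read them out column by column, so that a burst of channel errors lands on widely separated positions after de-interleaving. A minimal sketch (the 4×4 size is illustrative; the interleaving actually used in GSM is more elaborate):

```python
def interleave(bits, rows, cols):
    """Write row-by-row, read column-by-column."""
    assert len(bits) == rows * cols
    return [bits[r * cols + c] for c in range(cols) for r in range(rows)]

def deinterleave(bits, rows, cols):
    # Reading columns of a rows x cols matrix equals writing rows
    # of a cols x rows one, so the inverse just swaps the dimensions.
    return interleave(bits, cols, rows)

data = list(range(16))
sent = interleave(data, 4, 4)
assert deinterleave(sent, 4, 4) == data
# A 4-bit burst in 'sent' corrupts positions that are 4 apart after
# de-interleaving, turning one burst into isolated single errors.
```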
2.4.4 Encryption
Encryption is the process of protecting voice or data information from
eavesdropping. It involves the use of a data processing algorithm (a formula or program)
with one or more secret keys (number values) that both the sender and receiver use to
encrypt and decrypt the information.
References
[1] Asha Mehrotra, “GSM System Engineering”, Artech House Publishers, 1997.
[2] Siegmund M. Redl, Matthias K. Weber and Malcolm W. Oliphant, “An Introduction
to GSM”, Artech House Publishers, 1995.
[3] Michel Mouly and Marie-B. Pautet, “The GSM System for Mobile Communications”,
Palaiseau, France: M. Mouly & M.-B. Pautet, 1992.
[4] Lawrence Harte, Richard Levine and Geoff Livingston, “GSM Superphones”,
McGraw-Hill, 1999.
[5] Gary C. Kessler, “ISDN”, Second Edition, McGraw-Hill Series on Computer
Communications, 1993.
[6] Richard J. Manterfield, “Common-channel Signalling”, Peter Peregrinus Ltd, 1991.
[7] John G. van Bosse, “Signaling in Telecommunication Networks”, John Wiley &
Sons, Inc., 1998.
[8] Andrew S. Tanenbaum, “Computer Networks”, Third Edition, Prentice Hall, 1996.
[9] Uyless D. Black, “X.25 and Related Protocols”, IEEE Computer Society Press, Los
Alamitos, California, 1991.
3 VIDEO OVER GPRS NETWORK
Building on the brief description of the GSM system in Chapter 2, in this chapter we are
able to explore how aspects of the GSM mobile telecommunications network might be
used to provide video delivery in real-time, taking into account the dynamic channel
bandwidth usage capabilities within that system, and marrying these with the variable
bit rate (VBR) characteristics of compressed video. The scope of this objective is large,
and impacts on many associated areas such as commercial provision of services, traffic
management and modelling, intelligent monitoring and control, systems integration,
picture quality and human factors issues. Here we will present only the fundamental
concepts of how the video transmission might be achieved, via careful matching of the
coding and delivery systems.
3.1 Data services in GSM networks
Four kinds of data services are supported in GSM (Global System for Mobile
communications) Phase 2+ system, as listed below:
Packet data on signalling channels service (PDS) [1].
Short message service (SMS) [2].
High speed circuit switched data services (HSCSD) [3].
General packet radio service (GPRS) [4,7].
Each of these services is described briefly in the following paragraphs, in order to
identify whether their protocols and operating characteristics support video delivery or
not.
3.1.1 PDS and SMS
The GSM standard defines PDS as follows:
PDS is a bearer service enabling circuit oriented point to point transfer in GSM
networks of very small data packets on radio interface signalling channels for
applications using short dialogues with a data throughput rate capability in the range of
600 to 9200 bps and with a duration in the range of a few seconds [1].
As an alternative service, SMS [2] provides a means to transfer short message packets
(of up to 140 octets) between a GSM mobile system and an SME (Short Message
Entity) via a SC (Service Centre), through a signalling channel (SDCCH or SACCH). In
Phase 2+ the standard enhances SMS by allowing multiple SMS packets to be
concatenated, using a flag indicating more information to follow. Obviously PDS and
SMS are not suitable for transferring video over GSM networks, due to their extremely
limited packet and message sizes, as video communication requires at least 32 kbps of
bandwidth for a QCIF format video sequence; therefore they are not considered any further here.
3.1.2 HSCSD
HSCSD is a feature enabling the co-allocation of multiple (up to 8) Full Rate Traffic
Channels (TCH/F) into a multi-slot configuration, consisting of one or several full rate
traffic channels intended expressly for data transmission [3,5].
Although a TCH (Traffic Channel) is optimised to be able to carry 13 kbps speech
information, for data transmission the data rate is adapted to the standard V.32 bit rate
of 9.6 kbps. In implementing HSCSD, a higher air interface user rate of 14.4 kbps per
TCH is supported, so the basic GSM circuit data service is extended to higher speed (up
to 115 kbps). This data rate is sufficient to support real-time compressed video
transmission applications like videophone or videoconferencing.
Both transparent and non-transparent HSCSD connections are supported, with
symmetric and asymmetric configurations. In an asymmetric configuration, the network
gives priority to fulfilling the air interface user rate requirement in the downlink
direction. For a non-transparent HSCSD connection the network can use dynamic
allocation of resources (i.e. TCH/F), as long as the configuration does not contradict
the limiting values defined by the Mobile System, and the actual mobile equipment
is capable of handling the allocated channel configuration.
For a transparent HSCSD connection, dynamic resource allocation is applicable,
provided the air interface user rate is kept constant. The change of channel
configuration within the limits of minimum and maximum channel requirements is done
with resource upgrading and resource downgrading procedures during the call. The
Mobile System may request a service level up- or downgrading during the call,
negotiated at the beginning of the call. This modification of channel requirements
and/or desired air interface user rate is applicable to non-transparent HSCSD
connections only.
3.1.3 GPRS
For bursty data communication applications, circuit allocation is a wasteful use of the
radio link. As an alternative, GPRS (General Packet Radio Service) [4,7,8] optimises
the use of network and radio resources by using a packet-mode technique to transfer
high-speed and low-speed data and signalling in an efficient manner. The highest
supported bit rate in GPRS is 170 kbps, which lays the foundation to support
videophone or videoconferencing applications (e.g. based on H.261, H.263 and MPEG-
4: see Chapter 4.3). In a GPRS network two types of services are supported:
Point-to-point (PTP).
Point-to-multipoint (PTM).
Based on the existing GSM network, this enhancement introduces two new network
nodes in the GSM PLMN: the Serving GPRS Support Node (SGSN) and the Gateway
GSN (GGSN). The SGSN, being at the same hierarchical level as the MSC and
connected to the base station system with Frame Relay, keeps track of the individual
Mobile System location and performs security functions and access control while
GGSN provides interworking with external packet-switched networks, and is connected
with SGSNs via an IP-based GPRS backbone network. In addition, the HLR (Home
Location Register) is enhanced with GPRS subscriber data and routing information.
The GPRS air interface protocol is concerned with communications between the Mobile
System and BSS at the physical, MAC (Medium Access Control) and RLC (Radio Link
Control) protocol layers. The RLC/MAC sublayers allow efficient multiuser
multiplexing on the shared packet data channels and utilise a selective ARQ protocol for
reliable transmissions across the air interface.
The MAC layer, derived from a slotted ALOHA protocol [9,10], is responsible for
the access signalling procedures for the radio channel, governing the attempts by Mobile
Systems to access the channel and the control of this access by the network. It follows
that whether the network is able to accommodate a variety of service types, including
speech, data and video, depends mainly on the MAC.
3.2 Possibilities for video over GSM networks
3.2.1 Video over HSCSD
The following discussion requires some knowledge of the video coding algorithms,
further details of which can be found in Chapter 4. From the above description, we can
see that HSCSD is undoubtedly a significant enhancement of air interface user rates and can
achieve much higher data transmission speeds for ftp or constant bit rate video
applications. HSCSD thus offers the potential to transmit H.263 video
over a HSCSD connection. However, for live video delivery it has several limitations.
First, the current world video coding standards produce variable bit rate data streams.
This results in very poor utilisation of radio channels while being transmitted using
HSCSD, as the network has to allocate radio channels according to the highest bit rate
in the entire session, in order to guarantee the required QoS. Though the network can
use dynamic allocation of resources, this is only applicable to non-transparent HSCSD.
It is preferable to use transparent HSCSD, as a non-transparent connection will
introduce delay and jitter which is unacceptable for real-time video applications. For a
transparent connection, the dynamic resource allocation is possible only if the air
interface user rate is kept constant, which is meaningless for VBR video applications.
Second, though it is possible to use 8 TCH/F channels on the radio interface, the end-to-
end communication is limited to 64 kbps on the A interface (between the base station
controller and the mobile services switching centre). The highest bit rate of an I-frame
(coded independently of adjacent frames) can be much higher than this. Moreover,
even if it is possible to allocate radio channels dynamically, it is not so easy to do the
same thing in the GSM backbone network, as the current GSM system is based on
circuit-switched technology.
3.2.2 Video over GPRS
On the other hand, because of its packet switched nature, GPRS will give more
flexibility and efficiency than HSCSD for the following reasons:
• Packet video has become the main trend for new uses of video communications, such
as Internet access, i.e. video communication through the Internet.
• All the worldwide video coding standards are inherently suited to packet structure as
they delimit sections of the compressed data stream according to parts of a frame or
sequence of frames, such as groups of blocks.
• It is more flexible to allocate video channels per application dynamically, based on
video content, to improve the video channel's utilization. The GPRS backbone
networks are based on Internet Protocol (IP), in which extensive research activities
have been carried out to support diverse traffic transmission, including the proposal
of a new version IPv6.
• GPRS has strong potential to integrate different traffic types, including speech, data
and video, into one network.
The bottleneck of video over GPRS lies in the MAC (Medium Access Control)
protocol, since it has been designed mainly for data (non real-time) applications in the
current GPRS system (though it can support speech communication quite well). The
MAC is used to share the radio channels among mobile stations in the cell and to
allocate the physical radio channel for a mobile station (MS) when needed for
transmission or reception.
An MS initiates a packet transfer by making a Packet Channel Request on the Packet
Random Access Channel (PRACH) on a contention basis with other MSs. If the
contention is successful, the network responds on Packet Access Grant Channel
(PAGCH). It is possible to use either one- or two-phase packet access methods.
In one-phase access, the Packet Channel Request message contains all the information
needed for establishment of the channel including multislot related information and
quality of the requested service. As the response, a Packet Immediate Assignment
reserving the resources on Packet Data Channels (PDCH) for uplink transfer of user
information is sent to the MS. The MS then starts sending information to BTS for
transmission.
In two-phase access, the Packet Channel Request is responded to with a Packet Uplink
Assignment to reserve the uplink resources for transmitting the Packet Resource
Request, which carries the complete description of the requested resources for the
uplink transfer. A two-phase access can be initiated by either the network or a MS. The
network can order the MS to send a Packet Resource Request message by setting a
parameter in a Packet Uplink Assignment message. A mobile station can request two-
phase access in a Packet Channel Request message. In this case, the network may order
the MS to send a Packet Resource Request or continue with one-phase access
procedure.
From the description above it is clear that the bandwidth assigned to one MS can be varied
dynamically. This works well for constant bit rate transmission, or for variable-rate non-real-time
applications, but is not effective for variable bit rate real-time video applications.
During transmission of live video, the variable bit rate requires the
dynamic allocation of bandwidth with acceptable delay, and every reallocation of the
radio channel requires access to the PRACH. However, the contention mechanism of
PRACH access does not guarantee the delay requirement, which is crucial in real-time
video applications.
One possible solution is to reconfigure the multislot allocation to include more than the
current set of active channels by means of communication between the MS and BSS, rather
than by re-accessing the PRACH during the real-time transmission, which
would involve further contention. Though the compressed video has a variable bitrate, the
temporal frame structure (i.e. the appearance of I, P or B pictures: see Chapter 4.3) is
periodic, so the arrival of an I picture or P picture for transmission can be anticipated.
Therefore the allocation of multislot channels according to the picture type can be
realised. Moreover, this scheme requires classes of different video types to be defined,
based on statistical modelling of the video sources, such that every class corresponds to a
certain bitrate level, so that the bitrate for an I, P or B picture can be
estimated for Packet Channel Request purposes.
3.2.3 Dynamic channel allocation
As described in the last section, the key issue in delivering live video over a GPRS network
lies in the capability of the network to allocate packet data channels to the MS (Mobile
Station) dynamically. This section provides more detailed discussion on dynamic
channel allocation schemes. In the current standard, three medium access modes are
supported, namely Dynamic Allocation, Extended Dynamic Allocation and Fixed
Allocation.
In Dynamic Allocation, the Packet Uplink Assignment message includes the list of
PDCHs (Packet Data Channels) and the corresponding USF (Uplink State Flag) value
per PDCH. A unique TFI (Temporary Flow Identity) is allocated and is thereafter
included in each RLC (Radio Link Control) data and control block related to that TBF
(Temporary Block Flow). The MS monitors the USFs on the allocated PDCHs and
transmits radio blocks on those that currently bear the USF value reserved for the
usage of the MS.
The Extended Dynamic Allocation medium access method extends Dynamic Allocation
to allow higher uplink throughput. In Extended Dynamic Allocation, the MS monitors
its assigned PDCHs starting with the lowest numbered PDCH, then the next lowest
numbered PDCH, and so on. Whenever the MS detects its assigned USF value on an assigned
PDCH, in the next block period it transmits an RLC/MAC (Medium Access
Control) block on the same PDCH and on all higher numbered assigned PDCHs, without
looking for the assigned USF on the higher numbered PDCHs. If the number of PDCHs
allocated to a MS per block period is reduced, the network does not allocate any
resources to the MS for one block period following the block period with the higher
number of PDCHs allocated.
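The uplink-permission rule just described can be captured in a few lines. This is an illustrative sketch of the decision only (the names are mine, and the real procedure in the GPRS RLC/MAC specification carries considerably more state):

```python
def allowed_pdchs(assigned, usf_per_pdch, my_usf):
    """Extended Dynamic Allocation: scan the assigned PDCHs in increasing
    order; on the first one bearing our USF, transmit on it and on all
    higher-numbered assigned PDCHs in the next block period.

    assigned      -- sorted list of PDCH numbers assigned to this MS
    usf_per_pdch  -- USF value currently broadcast on each PDCH
    my_usf        -- the USF value reserved for this MS
    """
    for i, pdch in enumerate(assigned):
        if usf_per_pdch[pdch] == my_usf:
            return assigned[i:]       # this PDCH and all higher ones
    return []                         # no permission this block period

usf = {1: 3, 2: 7, 3: 1, 4: 5}
print(allowed_pdchs([1, 2, 3, 4], usf, my_usf=7))  # [2, 3, 4]
```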
In Dynamic and Extended Dynamic Allocation, the MS may be allowed to use the
uplink resources as long as there is queued data on the RLC/MAC layer to be sent from
the MS. This can comprise a number of LLC (Logical Link Control) frames, in the sense
that the radio resources are assigned initially on an “unlimited” time basis.
Alternatively, the uplink assignment for each MS may be limited to a number of radio
blocks, in order to offer fairer access to the medium at higher loads.
Fixed Allocation uses the Packet Uplink Assignment message to communicate a
detailed fixed uplink resource allocation to the MS. The fixed allocation consists of a
start frame, slot assignment, and block assignment bitmap representing the assigned
blocks per timeslot. The MS waits until the start frame is indicated and then transmits
radio blocks on those blocks indicated in the block assignment bitmap. The fixed
allocation does not include the USF and the MS is free to transmit on the uplink without
monitoring the downlink for the USF. If the current allocation is not sufficient, the MS
may request additional resources in one of the assigned uplink blocks. A unique TFI is
allocated and is thereafter included in each RLC data and control block related to that
TBF. Because each Radio Block includes an identifier (TFI), all received Radio Blocks
are correctly associated with a particular LLC frame and a particular MS.
Fixed Allocation is good for bursty applications, but does not provide enough blocks in
advance for longer-duration live video applications; therefore it will not be considered any
further in this project. Extended Dynamic Allocation is more suitable than Dynamic
Allocation for live video delivery because of its flexibility and because the MS does not
have to monitor every PDCH for its use, and can therefore save some channel bit rate
capacity. For live video applications, the decision as to when to allocate more channels
needs to be made either by the base station, with some kind of memorization mechanism,
or by coordination between the network and MS, based on the radio environment and
traffic circumstances.
If I-frame refreshment is not used for error control, the network will need this
memorization mechanism. For instance, if the application transmits one I-frame every
30 frames, then after 29 frames the network needs to allocate more channels for the
next I-frame automatically. In the case that the arrival of I-frames is not totally periodic,
a request for additional channels for unexpected I-frames needs to be sent in a radio data
block by the MS, so that the network can allocate more channels for the I-frame
transmission.
Another modification to the current standard possibly needed for video applications is
that the MS should not need to monitor the USF during an I-frame transmission. This is a
reasonable modification because a single application is likely to occupy all assigned
PDCHs during I-frame transmission, given its heavy demand for packet data channels.
It should be noted that for PB-frame transmission the use of the USF remains
necessary, because a PDCH needs to be shared by several MSs.
In realizing this scheme, an effective CAC (Call Admission Control) algorithm needs to
be designed. This is a complex matter requiring detailed analysis and will not be
addressed in this work.
3.2.4 Example
The coding of QCIF-4:2:0 Miss America at a frame rate of 10 fps, based on H.263 with
the Advanced Prediction and PB-frame modes enabled, produces an I-frame bit rate of
90.2 kbps and a smoothed PB-frame bit rate of 1.3 kbps. Thus, to transmit this video,
7 channels for I-frames and about 1/8 of a channel for PB-frames need to be allocated.
If another separate video stream starts exactly one frame period after the start of the
previous one, theoretically 8 such streams can be supported simultaneously on one carrier.
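The channel counts in this example follow directly from the per-PDCH data rate of 13.3 kbps stated later in this section. A minimal sketch of the arithmetic, assuming 8 PDCHs per carrier (the constant and function names are illustrative):

```python
import math

# Assumed figures from the example: 13.3 kbps per packet data channel
# (PDCH) and 8 timeslots (PDCHs) per carrier.
CHANNEL_RATE_KBPS = 13.3
SLOTS_PER_CARRIER = 8

def channels_needed(bitrate_kbps):
    """Fractional number of PDCHs required to carry a given bit rate."""
    return bitrate_kbps / CHANNEL_RATE_KBPS

# I-frames at 90.2 kbps need 7 whole channels; PB-frames at 1.3 kbps
# need only about 1/8 of a channel.
i_channels = math.ceil(channels_needed(90.2))
pb_share = channels_needed(1.3)
```

Under these figures, one carrier's 8 slots can absorb the 7-channel I-frame burst of one stream plus the small PB-frame shares of the others, which is what makes the 8-stream staggering argument work.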
Based on this simple example, we can consider some variations that demand more
complex allocation and control of channels during transmission, such as those caused by
variations in the I-frame and PB-frame bit rates and their mixture.
If the bit rate of an I-frame is more than one carrier can cater for in the limited time, part
of it can be transmitted in the following PB-frame period, causing a
corresponding delay of two frame periods. This delay (about 200 ms if the frame
rate is 10 Hz) is within the usual limits of acceptable tolerance. However, this will result
in lower utilisation of the radio channels and make any consequential dynamic channel
allocation scheme much more complicated.
Another option to address this problem is to shape the bit rate into the range of one
carrier frame by adjusting the quantisation steps, at the expense of reconstructed visual
quality. For example, if the I-frame bit rate in the above case were 1.5 times the carrier
bit rate, the encoder would multiply the default quantisation matrix values by a known
constant chosen to bring the bit rate below 1.0 times the carrier rate, i.e. within
the capacity of the carrier.
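A minimal sketch of this rate-shaping step, under the crude assumption (not from the text) that the I-frame bit rate is roughly inversely proportional to the quantiser scale; `scale_quantiser` and the `headroom` parameter are illustrative names, not part of any encoder API:

```python
# Crude rate model (an assumption, not an encoder's actual behaviour):
# bit rate is roughly inversely proportional to the quantiser step size,
# so scaling the quantisation matrix up by k divides the rate by about k.
def scale_quantiser(rate_kbps, carrier_kbps, q_matrix, headroom=0.9):
    """Scale quantisation matrix values so the modelled rate fits the carrier."""
    if rate_kbps <= carrier_kbps:
        return q_matrix                      # already fits, leave untouched
    factor = rate_kbps / (carrier_kbps * headroom)
    return [[int(round(q * factor)) for q in row] for row in q_matrix]

# Example from the text: an I-frame rate at 1.5x the carrier rate needs
# the matrix scaled by about 1.67 (with 10% headroom) to fall under 1.0x.
default_q = [[16] * 8 for _ in range(8)]
scaled_q = scale_quantiser(150.0, 100.0, default_q)
```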
In the above examples, it has been assumed that a data channel rate of 13.3
kbps is maintained, and that a single carrier is the maximum bandwidth allowed
for the connection. These values could also change, either statically or even
during the progress of the call, adding further complexity to the control algorithm.
3.2.5 EDGE
Before concluding this chapter, it is necessary to give a brief introduction to EDGE
(Enhanced Data Rate for GSM and TDMA/136 Evolution) [11]. EDGE is an
enhancement to GSM that aims to increase data rates to over 384 kbps. This rate
increase is achieved by introducing a higher-level modulation format, namely 8-phase
shift keying (8-PSK), which transmits 3 bits per symbol, instead of the current GSM
modulation, Gaussian minimum shift keying (GMSK), which
transmits 1 bit per symbol [12].
The benefit is that the overall available channel capacity is increased, which gives more
potential for video communication over mobile networks. The penalty incurred by a
higher modulation format is an increase in the frame error rate (FER) at the physical
layer, especially at low SNR (signal to noise ratio) or C/I (carrier to interference ratio).
The FER may be reduced to acceptable levels by employing an FEC code. Residual
frame errors are corrected at the link layer using a selective automatic repeat request
scheme. Because live video transmission demands minimal delay, all
these aspects imply a strong demand for error resilience features in the video bitstream.
3.3 Conclusion
Although GPRS lays the foundation for real-time video applications, before such
applications can be put into practice, more work needs to be done on optimising the
utilisation of shared scarce radio channels with guaranteed bandwidth for I picture
transmission. The fundamental issue is the compromise that is needed between wide
variations in bitrate needed to cater for all picture types, modelled by an unbounded
VBR scheme, and the inflexibility imposed by the network in allowing only quantum
channel allocation, modelled by step-function bit-rate performance variations or
variable constant-bit-rate (VCBR). The resolution of this dilemma relies on better
system integration and interoperation between the network behaviour and the video
coding process, by extracting useful bit-rate information over many successive frames
and exerting careful, intelligent control throughout the transmission. Realising this
possibility requires further layers of complexity beyond those that exist at present, to
create a more compliant and flexible protocol that matches system capabilities to these
demanding user needs.
References
[1] GSM 03.63, "Packet Data on Signalling Channels Service (PDS), Service
Description, Stage 2".
[2] GSM 03.40, "Technical Realization of the Short Message Service (SMS); Point-to-
Point (PP)".
[3] GSM 03.34, "High Speed Circuit Switched Data (HSCSD) – Stage 2".
[4] GSM 03.60, "General Packet Radio Service (GPRS), Service Description, Stage 2",
1997.
[5] J. Hamalainen, "High Speed Circuit Switched Data", in Z. Zvonar, P. Jung and
K. Kammerlander (eds.), GSM Evolution Towards 3rd Generation Systems, Kluwer
Academic Publishers, 1999, pp. 81-91.
[6] J. Hamalainen, "General Packet Radio Service", in Z. Zvonar, P. Jung and
K. Kammerlander (eds.), GSM Evolution Towards 3rd Generation Systems, Kluwer
Academic Publishers, 1999, pp. 65-80.
[7] J. Cai and D. J. Goodman, "General Packet Radio Service in GSM", IEEE
Communications Magazine, October 1997, pp. 122-131.
[8] G. Brasche and B. Walke, "Concepts, Services, and Protocols of the New GSM
Phase 2+ General Packet Radio Service", IEEE Communications Magazine, August
1997, pp. 94-104.
[9] D. J. Goodman, R. A. Valenzuela, K. T. Gayliard and B. Ramamurthi, "Packet
Reservation Multiple Access for Local Wireless Communications", IEEE Transactions
on Communications, vol. 37, no. 8, August 1989, pp. 886-890.
[10] S. Nanda, D. J. Goodman and U. Timor, "Performance of PRMA: A Packet Voice
Protocol for Cellular Systems", IEEE Transactions on Vehicular Technology, vol. 40,
no. 3, August 1991, pp. 584-598.
[11] R. van Nobelen, N. Seshadri, J. Whitehead and S. Timiri, "An Adaptive Radio
Link Protocol with Enhanced Data Rates for GSM Evolution", IEEE Personal
Communications, February 1999, pp. 54-63.
[12] D. J. Goodman, Wireless Personal Communications Systems, Addison-Wesley,
1997.
4 OVERVIEW OF VIDEO CODING TECHNIQUES AND THE CURRENT VIDEO CODING STANDARDS
The video coding techniques reviewed here mainly address low bit-rate video coding,
because the intended video applications target mobile situations. By low
bit-rate we mean a bitstream suitable for transmission over mobile channels, which
usually means below 64 kbit/s. State-of-the-art very low bit-rate video coding techniques
can be divided into waveform-based coding and model-based coding. A detailed
review of these techniques is given in the following sections.
4.1 Waveform based video coding
In waveform-based coding, image sequences are treated as a 3-D signal waveform
whose inherent statistical or deterministic properties are exploited, and compression is
performed directly on a two-dimensional, discrete distribution of light intensities. A basic
problem in waveform-based compression is to achieve the minimum possible waveform
distortion for a given encoding rate or, equivalently, to achieve a given acceptable level
of waveform distortion with the least possible encoding rate. Most image and video
coding techniques, including transform coding, subband/wavelet coding [1], VQ coding
[3] and fractal coding [2], can be classified into this group. Experience with video
coding at low bit rates shows that motion estimation/compensation operations to exploit
temporal redundancy, together with some kind of transformation to exploit spatial
redundancy, are necessary for an efficient very low bit-rate video coding scheme. The
reasons for this are quite simple. Most image sequences exhibit very strong spatial and
temporal correlation, or redundancy. The spatial redundancy can be reduced by
exploiting the spatial correlation through transformations, so that compression is
realised in the spatial domain. By exploiting the temporal correlation through
inter-frame prediction, using motion estimation and motion compensation techniques,
compression can be achieved in the temporal domain.
4.1.1 Motion estimation
Motion compensation refers to the use of motion displacements in the coding and
decoding of the sequence. In the encoder the difference between source picture and
prediction is coded; in the decoder this difference is decoded and added to the
prediction to get the decoded output. Both encoder and decoder use the same motion
displacements in determining where to obtain the prediction. However the encoder
must estimate the displacements before encoding them in the bitstream; the decoder
merely decodes them. The process of determining the motion displacements,
represented by motion vectors, is called motion estimation. Motion estimation
techniques can be loosely divided into three main groups: optical flow techniques,
block matching techniques and pel-recursive techniques.
4.1.1.1 Optical flow techniques
The optical flow techniques rely on the hypothesis that the image luminance is invariant
along motion trajectories and the direct result from this hypothesis is the optical flow
constraint equation, or spatio-temporal constraint equation, given below:

∇I(r, t) · v + ∂I(r, t)/∂t = 0

where I(r, t) denotes the continuous space-time intensity distribution, v = dr/dt is the
velocity, and ∇I(r, t) = [∂I(r, t)/∂x, ∂I(r, t)/∂y]^T is the spatial intensity gradient.
As the image intensity change at a point due to motion gives only one constraint, while
the motion vector at the same point has two components, the motion field cannot be
computed without an additional constraint. Various second constraints have been
introduced to solve this problem. Among them, Horn and Schunck introduced a
smoothness constraint [4], which minimizes the square of the magnitude of the gradient
of the optical flow velocity. This is based on the assumption that video contains only
opaque objects of finite size, commonly undergoing rigid motion, which means that
neighboring points on the objects have similar velocities and the velocity field of the
brightness patterns in the image varies smoothly almost everywhere. This approach
results in a dense motion field. In video compression applications these techniques
suffer from two serious drawbacks. First, direct adoption of the dense motion field
would result in an immense bit rate for motion information. Second, the smoothness
constraint is not very realistic in many situations, especially at moving object
boundaries.
4.1.1.2 Block matching techniques
In block matching techniques, the image is partitioned into rectangular blocks and the
same motion vector is assigned to all pixels within the block [5]. The motion vector is
obtained by minimising the disparity measure between the block in the current frame
and the block in the reference frame. Obviously the inherent motion model is quite
restrictive, as it assumes the image is composed of rigid objects in translational motion.
The direct results of this restriction are motion fields that are unreliable with respect to
the true motion in the scene, block artifacts, and poor motion-compensated prediction
along moving edges. However, because of their ease of implementation and the small
overhead for motion information in video coding, rectangular block matching techniques
have been widely used and adopted in the current video standards, including H.261
[6], H.263 [56], MPEG-1 [7], MPEG-2 [8] and MPEG-4 [57]. Recently, more accurate
and complicated motion models and motion estimation techniques based on spatial
transformations, such as triangular meshes [9] and quadrilateral meshes [10], have been
proposed, but their computational complexity and higher overhead for motion
information have so far prevented wide acceptance.
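As a concrete illustration of rectangular block matching, the following sketch performs an exhaustive full search minimising the sum of absolute differences (SAD); the default block size and search range are illustrative choices, not taken from any standard:

```python
# A hedged sketch of exhaustive (full-search) block matching using the sum
# of absolute differences (SAD). Frames are lists of rows of luminance
# values; the block size and search range are illustrative defaults.
def block_match(cur, ref, bx, by, block=16, search=7):
    """Find the motion vector (dx, dy) minimising the SAD between the block
    at (bx, by) in the current frame and candidates in the reference frame."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            x, y = bx + dx, by + dy
            if x < 0 or y < 0 or x + block > w or y + block > h:
                continue  # candidate block falls outside the reference frame
            sad = sum(abs(cur[by + j][bx + i] - ref[y + j][x + i])
                      for j in range(block) for i in range(block))
            if best is None or sad < best[0]:
                best = (sad, (dx, dy))
    return best[1], best[0]
```

The cost of the full search grows with the square of the search range, which is why practical encoders use fast suboptimal searches; the principle of minimising a disparity measure is the same.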
4.1.1.3 Pel-recursive techniques
Pel-recursive techniques recursively minimize the prediction error and are carried out
on a pixel-by-pixel basis, leading to a dense motion vector field [11]. Due to the
increased computational complexity at the decoder and other inherent drawbacks, this
technique is not commonly used in video compression.
4.1.2 Transforms
Transforms represent video in a different domain, for example the frequency domain.
They reduce the number of variables, or coefficients, needed to represent the video,
and in this way compression is realised. Among transform methods for exploiting
spatial redundancy, the discrete cosine transform (DCT) [12] has been the most
successful so far, and it has been incorporated into all the image and video coding
standards.
In all image and video coding standards, the DCT is applied to blocks of 8 x 8 samples,
as defined below:

F(µ, ν) = (C(µ)/2)(C(ν)/2) Σ_{x=0..7} Σ_{y=0..7} f(x, y) cos[(2x+1)µπ/16] cos[(2y+1)νπ/16]

where µ and ν are the horizontal and vertical frequency indices, respectively, and the
constants C(µ) and C(ν) are given by:

C(µ) = 1/√2 if µ = 0
C(µ) = 1 if µ > 0

The original samples can be recreated by the Inverse DCT (IDCT), defined as:

f(x, y) = Σ_{µ=0..7} Σ_{ν=0..7} (C(µ)/2)(C(ν)/2) F(µ, ν) cos[(2x+1)µπ/16] cos[(2y+1)νπ/16]
Though research on other kinds of transforms, such as wavelet/subband and fractal
transforms, has been very active, no result has been reported that consistently and
universally beats the DCT in overall video coding performance when combined with
motion estimation.
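The 8x8 DCT and IDCT defined above can be transcribed directly. The following is a literal, unoptimised sketch of those two formulas; real codecs use fast factorisations rather than the quadruple sum:

```python
import math

def C(k):
    """Normalisation constant from the DCT definition."""
    return 1 / math.sqrt(2) if k == 0 else 1.0

def dct2(f):
    """Forward 8x8 DCT, term for term as in the formula above."""
    return [[(C(u) / 2) * (C(v) / 2) * sum(
                f[x][y] * math.cos((2 * x + 1) * u * math.pi / 16)
                        * math.cos((2 * y + 1) * v * math.pi / 16)
                for x in range(8) for y in range(8))
             for v in range(8)] for u in range(8)]

def idct2(F):
    """Inverse 8x8 DCT, recovering the original samples."""
    return [[sum((C(u) / 2) * (C(v) / 2) * F[u][v]
                 * math.cos((2 * x + 1) * u * math.pi / 16)
                 * math.cos((2 * y + 1) * v * math.pi / 16)
                 for u in range(8) for v in range(8))
             for y in range(8)] for x in range(8)]
```

Applying `idct2(dct2(f))` reproduces the input block to within floating-point rounding, confirming that the transform pair is lossless before quantisation; it is the quantisation step that discards information.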
4.2 Model based video coding
In model-based video coding, models of some kind are used to exploit special
features of the video. Model-based video coding can be classified into two categories:
3D model-based coding and 2D model-based coding. 3D model-based coding uses
models of the real-world objects that appear in the video, while 2D model-based coding
uses two-dimensional motion models of the video sequences. A detailed description of
these two coding schemes follows.
4.2.1 3D model coding
In 3D model-based coding, often referred to as 3D knowledge-based or 3D object-based
coding in literature, both the encoder and decoder contain a 3D model of the object to
be coded, based on a priori knowledge of the object [13,14,15]. The model can be
downloaded to the decoder at the beginning of the transmission session. At the
transmitting side, the images are analysed; this includes scaling of the 3D wireframe
model, global and local motion estimation, and extraction of the surface color and
texture. As the image object (e.g. the head of a person) moves, the motion parameters
defining the coordinates of the wireframe model (the global motion of the head and the
local motion due to facial expressions) and the texture information are updated and
transmitted. At the receiving side, the image is synthesised using these estimated
motion parameters.
3D model-based coding opens up the possibility of image coding at extremely low
bitrates, but several problems need to be solved before it can be applied to more general
situations. First, modelling objects is one of the most important issues in 3D model-based
coding. So far no successful results have been reported for this method except in the
specialized case where the input always consists of a moving head and shoulders;
dealing with unknown objects remains an extremely difficult problem. The second
problem is the presence of analysis and synthesis errors, due to mismatch of the
wireframe, inaccurate motion estimation and rapidly changing texture information,
which can cause serious artifacts in the decoded images. Consequently, some authors
suggest 2D deformable mesh and triangular models, as described below.
4.2.2 2D model coding
In 2D model-based coding [16], the following steps are implemented,
• Segment the image into semantically meaningful regions that should coincide
with real objects, to guarantee that the modeling and description of the motion will
be efficient.
• Build a mesh model, which can be triangular or quadrilateral, for each object.
Estimate the motion vectors at the vertices.
• Determine the transformation mapping parameters for each mesh element given the
displacement vectors at its vertices. Synthesize the present frame by mapping the
intensity or color information from the previous reconstructed frame onto the
corresponding patches in the present frame. Compute the synthesis error.
• Encode both the motion vectors at the vertices and the synthesis error.
The key technique in 2D model-based coding is thus mesh-based motion estimation,
which overcomes the intrinsic artifact problem of translational block-based motion
estimation [17,18]. Nevertheless, it has several drawbacks that need to be
addressed before it can be applied more generically. First, the occlusion problem
has not been solved. Second, segmenting the image intelligently into semantically
meaningful regions cannot be done automatically, as segmentation itself is an ill-
posed problem.
Segmentation-based video coding is in fact one of the key techniques in so-called
"second generation" video coding schemes [19]. The whole MPEG-4 philosophy is
based on the assumption that the content of the images can be segmented into
meaningful objects. However, until the segmentation problem can be solved
automatically with reasonable results, the utilization of MPEG-4 will have to rely on
manual intervention for segmentation.
4.3 Current Video Standards
So far, five international standards for video coding have been created. H.261 addresses
videophone and videoconference applications at bit rates that are multiples of 64 kbps.
H.263 is intended for similar applications to H.261, but at bit rates below 64
kbps. MPEG-1 aims at digital storage media applications at up to about 1.5 Mbps.
MPEG-2 targets broadcast television at bit rates of 3-30 Mbps, while MPEG-4 targets
multimedia applications at 5 kbps to 4 Mbps. Among these standards the smallest video
format is sub-QCIF, supported by H.263, with 96 lines and 128 pixels per line.
4.3.1 Core video coding techniques in the current video coding standards
All the above-mentioned video coding standards support encoding methods that
exploit both the spatial and temporal redundancies inherent in a video
sequence. Spatial redundancies are exploited by using block-based Discrete Cosine
Transform (DCT) coding of 8 by 8 pixel blocks, followed by quantization, zigzag
scanning, and variable length coding of the runs of zero quantized indices and the
amplitudes of the non-zero indices. Temporal redundancies are exploited by using
motion compensation, in which the difference between the current frame and its
prediction from the reference frame is coded using the DCT scheme.
[Figure 4.1 shows a block diagram of DCT-based video coding: the input passes
through an inter/intra mode switch to the DCT, quantiser, VLC and buffer, with a
feedback path through the inverse quantiser and inverse DCT, frame store and motion
estimator.]
Figure 4.1 DCT based video coding
As shown in Figure 4.1, each video frame is divided into blocks of a fixed size and each
block is more or less processed independently. A block is first predicted from a
matching block in a previously coded reference frame through motion estimation. The
prediction error block is spatially de-correlated, by converting it into the frequency
domain using the discrete cosine transform (DCT); further compression is realized by
quantizing the resulting coefficients and converting them into binary code words using
variable length code (VLC).
After the DCT and quantization are applied, coefficients representing high
spatial frequencies are often zero, whereas low-frequency coefficients are often
nonzero. To exploit this behavior, the coefficients are arranged approximately from low
to high spatial frequency, following the zigzag scan order shown in Figure 4.2.
0  1  5  6  14 15 27 28
2  4  7  13 16 26 29 42
3  8  12 17 25 30 41 43
9  11 18 24 31 40 44 53
10 19 23 32 39 45 52 54
20 22 33 38 46 51 55 60
21 34 37 47 50 56 59 61
35 36 48 49 57 58 62 63
Figure 4.2 Zigzag scan of DCT coefficients
Each nonzero AC coefficient is coded using a run-level symbol structure, where each
symbol is encoded using a variable length Huffman code. Run refers to the number of
zero coefficients before the next nonzero coefficient; level refers to the amplitude of the
nonzero coefficient. Variable length Huffman coding is also applied to the coding of
motion vectors.
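The zigzag scan of Figure 4.2 and the run-level symbol formation can be sketched as follows. This is a simplified illustration that ignores the end-of-block code and the actual Huffman tables:

```python
def zigzag_order(n=8):
    """Generate (row, col) positions in zigzag scan order for an n x n block,
    matching the index pattern of Figure 4.2 for n = 8."""
    order = []
    for s in range(2 * n - 1):
        diag = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        # Odd anti-diagonals run down-left, even ones up-right.
        order.extend(diag if s % 2 else diag[::-1])
    return order

def run_level(scanned):
    """Form (run, level) symbols from zigzag-ordered quantised coefficients,
    skipping the DC coefficient at index 0 (coded separately)."""
    symbols, run = [], 0
    for c in scanned[1:]:
        if c == 0:
            run += 1          # count zeros preceding the next nonzero value
        else:
            symbols.append((run, c))
            run = 0
    return symbols
```

For a block whose only nonzero values are DC = 50, coefficient (0,1) = 3 and coefficient (1,1) = -2, the scan yields the symbols (0, 3) and (2, -2): the second symbol records the two zeros at scan positions (1,0) and (2,0) that precede the -2.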
The above discussion assumes that temporal prediction is successful, in that the
prediction error block requires fewer bits to code than the original image block. This
represents the P-mode of coding. When this is not the case, the original block is
coded directly using the DCT and run-length coding. This is known as intra or I-mode.
Instead of using a single reference frame for prediction, bi-directional prediction can be
used, which finds two best matching blocks, one in a previous frame and another in a
following frame, and uses a weighted average of the two matches as the prediction for
the current block. In this case, two MVs (motion vectors) are associated with each
block. This is known as B-mode. Both P-mode and B-mode are generally referred to as
inter-mode. The mode information, the MVs and other side information regarding
picture format, block location, etc. are also coded using VLC.
In practice, the block size for motion estimation may not be the same as that used for
transform coding. Typically, motion estimation is done on a larger block known as
macroblock (MB), which is subdivided into several blocks. In the current standards, the
MB size is 16x16 pixels and the block size is 8x8 pixels. The coding mode is decided at
MB level. Because MVs of adjacent MBs are usually similar, the MV of a current MB
is predictively coded, using the MV of the previous MB for prediction. Similarly, the
DC coefficient of a block is predictively coded, with respect to the DC value of the
previous block.
The encoded video bitstream can include I frames (Intra coded), P frames (Predictive
coded) and B frames (Bi-directionally predictive coded). An I frame is coded entirely
in Intra-mode. A P frame is coded using motion compensated prediction from a past
reference frame. Depending on the prediction accuracy, an MB in a P frame can be
coded in either Intra-mode or P-mode. A B frame is coded using motion compensated
prediction from a past and a future reference frame. An MB in a B frame can be coded
in I-, P- or B-mode. I frames and P frames can be used as references by other pictures,
while B frames are never used to predict another picture. From an error resilience
point of view, the I frame is the most robust, as it does not reference other pictures and
thus stops error propagation, but it is the least efficient coding mode as it produces a
huge number of bits. P frames and B frames achieve high coding efficiency, with B
frames the most efficient, but they are vulnerable because they need other pictures as
references. If an error occurs in a reference picture, its effect will propagate to the
current picture and to all subsequent pictures that take the erroneous picture as a
reference.
The use of variable length codes improves the coding efficiency; however, its main
disadvantage is that it introduces vulnerability into the encoded video bitstream.
When an error occurs in the bitstream, the decoder is unable to locate the next code
word and therefore loses synchronization with the encoder. This creates the need for
the encoded video bitstream to have some error resilience features.
4.4 Overview of error resilience techniques
To address the need to make the video bitstream more error resilient, diverse error
resilience techniques [20,21,22] have been developed. Depending on the role that the
encoder, decoder and the network layers play in the process, error resilience techniques
can be divided into three categories: error resilient encoding, decoder error concealment
and encoder and decoder interactive error control.
4.4.1 Error resilient encoding
In this approach, the encoder adds redundancy bits to the video bitstream to protect
the video quality when the bitstream is corrupted by transmission errors. The
redundancy bits should be inserted so as to achieve the maximum gain for the smallest
amount of redundancy.
4.4.1.1 Robust Entropy encoding
As described in previous sections, one major cause for the vulnerability of the
compressed video bitstream is that a video coder uses VLC to represent various
symbols. Any bit errors or lost bits in the middle of a code word will make the code
word undecodable and also make it impossible for the decoder to locate the next code
word, thus causing loss of synchronization with the encoder until the next
resynchronisation point. To tackle this problem, the following techniques have been
developed.
Resynchronisation Markers: One simple and effective approach to the problems
associated with the use of VLC is to insert resynchronisation markers [26] periodically.
These markers are specially designed in such a way that they can be easily distinguished
from all other code words and from small perturbations of those code words. Usually
some header information necessary to decode the remaining part of the picture is
attached immediately after the resynchronisation marker. This way, instead of
discarding all of the remaining bitstream up to the following picture start code, the
decoder can resume proper decoding upon the detection of a resynchronisation marker.
Reversible Variable Length Coding (RVLC): RVLC is a specially designed VLC that
can be decoded in both forward and backward directions [48]. Without the use of
RVLC, the decoder discards all the bits until a resynchronisation code word is identified
after an error occurs. With RVLC the decoder can not only decode bits forward from a
resynchronisation marker, but can also decode bits backward from the next
resynchronisation code word. Thus with RVLC, fewer correctly received bits are
discarded compared with situations where no RVLC is used. Intelligently designed
RVLC and corresponding decoding methods can significantly improve the error
robustness of the bitstream, with little or no loss of coding efficiency [23,24].
Provisions for Syntax-Based Repairs: Because of the syntax constraint present in
compressed video bitstreams, it is possible to recover data from a corrupted bitstream
by making the corrected stream conform to the right syntax [25]. Obviously, such
techniques are very much dependent on the particular coding scheme. The use of
synchronization codes, RVLC, and other sophisticated entropy coding means such as
error resilient entropy coding can all make such repairs more feasible and effective.
4.4.1.2 Error Resilient prediction
Another major contribution to the sensitivity of compressed video to transmission errors
is the use of temporal prediction. Once an error occurs, the reconstructed frame at the
decoder differs from that assumed at the encoder and the reference frames used at the
decoder from there onward will differ from those used at the encoder. Consequently all
subsequent reconstructed frames will be in error: this process is usually referred to as
error propagation. The use of spatial prediction for the DC coefficients and MVs will
also cause error propagation. Two techniques are used to address this need, as follows.
Insertion of Intra-Blocks or Frames: A simple way to stop temporal error propagation is
to encode entire frames in Intra mode [28] more often. For real-time applications,
frequent use of Intra frames is typically not practical due to delay constraints. Instead
of entire frames, the use of a sufficiently high number of intra-MBs is more realistic.
When employing intra-MBs for error resilience, both the number of such MBs and their
spatial placement have to be determined. The number of necessary intra-MBs obviously
depends on the quality of the connection. For the spatial placement of I-mode blocks,
several schemes have been proposed. Random placement has been shown to be
efficient, as has placement in the areas of highest activity, determined by the average
MV magnitude. Hybrid schemes that additionally consider the time of the last
intra-update of a given MB have also been considered. None of these schemes
significantly outperforms the others. The currently best-known way to determine both
the correct number and the placement of intra-MBs for error resilience is to use a loss-
aware rate-distortion optimization scheme. Finally, if a back channel from decoder to
encoder is available, information about missing or damaged MB data can be sent to
the encoder to trigger intra-coding at the encoder.
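A minimal sketch of the random placement scheme mentioned above, the simplest of the alternatives. Tying the refresh probability to the estimated loss rate is an illustrative choice here, not the loss-aware rate-distortion optimization the text refers to, and the function name is hypothetical:

```python
import random

# A hedged sketch of random intra-MB placement: each macroblock in a
# frame is forced to intra mode with a probability tied to the estimated
# loss rate. The probability rule is illustrative, not a standard scheme.
def choose_intra_mbs(num_mbs, refresh_prob, seed=0):
    """Return the indices of the MBs to code in intra mode for this frame."""
    rng = random.Random(seed)  # seeded for reproducibility in this sketch
    return [i for i in range(num_mbs) if rng.random() < refresh_prob]

# For a QCIF frame (11 x 9 = 99 macroblocks), a refresh probability of
# about 0.1 intra-codes roughly ten macroblocks per frame, so on average
# every macroblock is refreshed within about ten frames.
```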
Independent Segment Prediction: The other approach to limit the extent of error
propagation is to split the data domain into several segments and perform
temporal/spatial prediction only within the same segment. This way, the error in one
segment will not affect another segment. One such approach is to include even-indexed
frames into one segment, and odd-indexed frames into another segment [29,30]. This
way, even frames are only predicted from even frames. Another approach is to divide a
frame into multiple regions (e.g. a region can be a GOB or slice), and a region can only
be predicted from the same region in the previous frame.
4.4.1.3 Layered Coding with Unequal Error Protection
Layered coding or scalable coding refers to coding a video into a base layer and one or
several enhancement layers [31]. The base layer provides a low but acceptable level of
quality, and each additional enhancement layer will incrementally improve the quality.
Layered coding also enables users with different bandwidth capacities or decoding
power to access the same video at different quality levels. To serve as an error
resilience tool, layered coding needs to be paired with unequal error protection
(UEP) [32, 33] in the transport system, so that the base layer receives the most
protection, using more channel resources, while the enhancement layers receive less.
The philosophy of this approach is that when the channel condition deteriorates, at
least the base-layer video quality is guaranteed.
There are many ways to divide video data into more than one layer. Depending on the
choice, scalable video can be classified into data partitioning, SNR scalability, spatial
scalability and temporal scalability. These scalability schemes can also be combined to
form a hybrid scalability scheme.
Data partitioning: In this approach, the video bitstream is split so that one layer
contains all of the key headers, motion vectors and low-frequency DCT coefficients.
The second layer contains less critical information such as high frequency DCT
coefficients, possibly with less error protection.
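The split described above can be sketched at the level of a single block's zigzag-ordered coefficients. The breakpoint value is illustrative; real data partitioning also places headers and motion vectors in the first layer:

```python
# A hedged sketch of data partitioning for one block: the zigzag-ordered
# quantised coefficients are split at a breakpoint, with the base layer
# keeping the DC and low-frequency coefficients and the second layer the
# high-frequency remainder. The breakpoint of 6 below is illustrative.
def partition_block(zigzag_coeffs, breakpoint):
    """Split one block's coefficients into base and enhancement parts."""
    return zigzag_coeffs[:breakpoint], zigzag_coeffs[breakpoint:]

coeffs = [52, 12, -6, 4, 0, 3] + [0] * 58   # 64 coefficients, mostly zero
base, enhancement = partition_block(coeffs, 6)
```

Because high-frequency coefficients of typical blocks are mostly zero, losing the enhancement part costs only fine detail, which is why it can be sent with less error protection.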
SNR scalability: mainly used in applications that support video transmission at multiple
qualities. All layers have the same spatial resolution but different video quality. The
lower layer provides the basic video quality. The enhancement layers are coded so as to
enhance the basic quality by providing refinement data for the DCT coefficients of the
lower layer.
Spatial scalability: the input source video is preprocessed to create the lower-resolution
image. This is independently coded. In the enhancement layer the differences between
an interpolated version of the base layer and the source image are coded.
Temporal scalability: the lower temporal rate pictures are coded as the basic temporal
rate; the additional pictures are coded with temporal prediction relative to the base
layer.
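The temporal scalability scheme described above can be illustrated with a minimal
Python sketch; the two-layer split by frame-index parity is an illustrative
assumption, not actual standard syntax:

```python
def split_temporal_layers(frames):
    """Split a picture sequence into a base layer (lower temporal rate)
    and one enhancement layer holding the remaining pictures."""
    base = frames[::2]          # coded as the basic temporal rate
    enhancement = frames[1::2]  # coded with temporal prediction from the base
    return base, enhancement

base, enh = split_temporal_layers(list(range(10)))
print(base)  # [0, 2, 4, 6, 8]
print(enh)   # [1, 3, 5, 7, 9]
```

With two layers, the base layer carries half the overall frame rate; decoding both
layers restores the full rate.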
4.4.1.4 Multiple Description Coding
Similar to layered coding, multiple description coding (MDC) [34, 35, 36, 37] also
codes a source into several sub-streams, known as descriptions, but the decomposition
is such that the resulting descriptions are correlated and have similar importance. Any
single description should provide a basic level of quality, and more descriptions
together will provide improved quality. For each description to provide a certain degree
of quality, all the descriptions must share some fundamental information about the
source, and thus must be correlated. This correlation enables the decoder to estimate a
missing description from a received one and thus provide an acceptable quality level
from any description. On the other hand, this correlation is also the source of
redundancy in MDC. An advantage of MDC over layered coding is that it does not
require special provisions in the network to provide a reliable sub-channel. For
example, in a very lossy network, many retransmissions have to be invoked or a lot of
redundancy has to be added in FEC to realize error free transmission. In this case, it
may be more effective to use MDC.
To accomplish their respective goals, layered coding uses a hierarchical, un-correlating
decomposition, whereas MDC uses a non-hierarchical, correlating decomposition.
Some approaches that have been proposed for accomplishing such decomposition
include overlapping quantization, correlated predictors, correlating linear transforms,
correlating filter-banks and interleaved spatial-temporal sampling.
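A toy version of a pairwise correlating transform in the spirit of [37] can sketch
the MDC idea: each description mixes both samples of a pair, so a lost description
can be estimated from the received one. The sum/difference transform and the
zero-difference concealment rule are illustrative assumptions:

```python
def mdc_encode(a, b):
    """Correlating transform: each description carries a mix of both samples."""
    return (a + b) / 2.0, (a - b) / 2.0   # description 1, description 2

def mdc_decode(d1, d2):
    """Both descriptions received: exact reconstruction of the pair."""
    return d1 + d2, d1 - d2

def mdc_conceal(d1):
    """Description 2 lost: assume the difference is zero, so both samples
    are estimated by the pair average carried in description 1."""
    return d1, d1

d1, d2 = mdc_encode(10.0, 6.0)
print(mdc_decode(d1, d2))  # (10.0, 6.0)
print(mdc_conceal(d1))     # (8.0, 8.0)
```

The correlation between d1 and d2 is exactly the redundancy the text refers to:
it costs compression efficiency but buys graceful degradation.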
4.4.2 Decoder Error Concealment
Decoder error concealment [38, 39] refers to the recovery or estimation of lost
information due to transmission errors. For a block-based hybrid coding paradigm,
there are three types of information that may need to be estimated in a damaged MB:
the texture information, including the pixels or DCT coefficient values for either an
original image block or a prediction error block; the motion information, consisting of
MVs for MBs coded in either P-mode or B-mode; and finally the coding mode of the
MB. Most of the error concealment techniques utilize some kind of spatial or temporal
interpolation based on the proposition that the colour values of spatially and temporally
adjacent pixels vary smoothly, except in the regions with edges.
4.4.2.1 Recovery of Texture Information
Motion Compensated Temporal Prediction: A simple and yet very effective approach to
recover a damaged MB in the decoder is by copying the corresponding MB in the
previously decoded frame based on the MV for this MB. The performance of this
approach depends critically on the availability of the MV. When the MV is also
missing, it must first be estimated. To reduce the impact of the error in the estimated
MVs, temporal prediction may be combined with spatial interpolation.
Spatial Interpolation: Another simple approach is to interpolate pixels [40, 41] in a
damaged block from pixels in adjacent correctly received blocks. Usually, because all
blocks or MBs in the same row are put into the same packet, the only available
neighboring blocks are those in the current row and the row above. Because most
pixels in these blocks are too far away from the missing samples, usually only the
boundary pixels in neighboring blocks are used for interpolation. Instead of
interpolating individual pixels, a simpler approach is to estimate the DC coefficient of a
damaged block and replace the damaged block by a constant equal to the estimated DC
value. The DC value can be estimated by averaging the DC values of surrounding
blocks. One approach to facilitate such spatial interpolation is by an interleaved
packetization mechanism so that the loss of one packet will damage only every alternate
block or MB.
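The DC-based concealment just described amounts to averaging the DC values of the
surrounding blocks and painting the damaged block with that constant. A minimal
sketch (the 8x8 block size and plain averaging are assumptions for illustration):

```python
def conceal_block_dc(neighbor_dc_values, block_size=8):
    """Replace a damaged block by a constant block at the estimated DC value,
    taken as the average DC of the surrounding, correctly received blocks."""
    dc = sum(neighbor_dc_values) / len(neighbor_dc_values)
    return [[dc] * block_size for _ in range(block_size)]

block = conceal_block_dc([100, 120, 110, 130])
print(block[0][0])  # 115.0
```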
Spatial and Temporal Interpolation by Maximizing the Smoothness of Resulting Video:
A problem with spatial interpolation is how to determine an appropriate interpolation
filter. Another shortcoming is that any received DCT coefficients are ignored. These
problems can be solved by requiring the recovered pixels in a damaged block to be
smoothly connected with the neighboring pixels, both spatially in the same frame and
temporally in the previous and following frames [42, 43]. If some but not all DCT
coefficients are received for the current block, then the estimation should be such that
the recovered block be as smooth as possible, subject to the constraint that the DCT on
the recovered block would produce the same value for the received coefficients. These
objectives can be formulated as an unconstrained optimization problem, and the
solutions under different loss patterns correspond to different interpolation filters in the
spatial, temporal and frequency domains.
Spatial Interpolation Using Projection onto Convex Sets (POCS) Technique: The
general idea behind POCS-based estimation methods [44, 45] is to formulate each
constraint about the unknowns as a convex set. The optimal solution is the intersection
of all the convex sets, which can be obtained by recursively projecting a previous
solution onto individual convex sets. When applying POCS for recovering an image
block, the spatial smoothness criterion is formulated in the frequency domain, by
requiring the discrete Fourier transform (DFT) of the recovered block to have energy
only in several low frequency coefficients. If the damaged block is believed to contain
an edge in a particular direction, then one can require the DFT coefficients to be
distributed along a narrow strip orthogonal to the edge direction, i.e., low-pass along the
edge direction, and all-pass in the orthogonal direction. The requirement on the range
of each DFT coefficient magnitude can also be converted into a convex set, as can the
constraint imposed by any received DCT coefficient. Because the solution can only be
obtained through an iterative procedure, this approach may not be suitable for real-time
applications.
4.4.2.2 Recovery of Coding Modes and Motion Vectors
Coding modes and motion vectors are fundamental information needed to decode
a compressed video bitstream under the current video coding standards. One way to
estimate the coding mode for a damaged MB is by collecting statistics of the coding
mode patterns of adjacent MBs and finding the most likely mode given the modes of
surrounding MBs. A simple and conservative approach is to assume that the MB is
coded in the intra-mode and use only spatial interpolation for recovering the underlying
blocks.
For estimating lost MVs, there are several possible simple operations [49]:
• Assuming the lost MVs to be zeros, which works well for video sequences with
relatively small motion.
• Using the MVs of the corresponding block in the previous frame.
• Using the average of the MVs from the spatially adjacent blocks.
• Using the median of MVs from the spatially adjacent blocks.
• Reestimating the MVs.
Typically when an MB is damaged, its horizontally adjacent MBs are also damaged,
and hence the average or median is taken over the MVs above and below. It has been
shown that the last two methods produce the best reconstruction results [46].
Instead of estimating one MV for a damaged MB, one can use different MVs for
different pixel regions in the MB for a better result.
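The average and median estimators from the list above can be sketched as follows;
motion vectors are assumed to be (x, y) tuples and the candidate set is taken from
the MBs above and below, as discussed:

```python
import statistics

def estimate_lost_mv(neighbor_mvs, method="median"):
    """Estimate a lost motion vector component-wise from the MVs of
    spatially adjacent blocks (typically those above and below)."""
    xs = [mv[0] for mv in neighbor_mvs]
    ys = [mv[1] for mv in neighbor_mvs]
    if method == "median":
        return (statistics.median(xs), statistics.median(ys))
    return (sum(xs) / len(xs), sum(ys) / len(ys))

print(estimate_lost_mv([(1, 2), (3, 8), (5, 4)]))  # (3, 4)
```

Taking the median component-wise (rather than the median of whole vectors) is one
common simplification; it is robust to a single outlier neighbor MV.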
4.4.3 Encoder and Decoder Interactive Error Control
In all the techniques described in the previous sections, the encoder and decoder operate
independently to combat transmission errors. When a feedback channel from decoder
to encoder is available, better performance can be achieved if the encoder and decoder
cooperate in the process of error concealment [47]. For real-time applications it is not
realistic to employ error control techniques used in data link layer, e.g. ARQ. However
it is possible to limit or stop the error propagation effect by employing intra-mode
coding or dynamic reference picture selection according to the back channel message;
in this way we can reduce the coding inefficiency inherent with periodic intra mode
coding.
4.4.3.1 Reference Picture Selection (RPS) Based on Feedback Information
A simple way to take advantage of an available feedback channel is to employ RPS. If
the encoder learns through a feedback channel about a damaged part of a previously
coded frame, it can use a previous picture other than the last and damaged one as a
reference picture for encoding the next P-frame. Of course this reference picture should
be also available to the decoder. The disadvantage is that both encoder and decoder
need to have a large buffer to store several past decoded pictures as possible reference
pictures. Information about the reference picture to be used is conveyed in the bit
stream. Compared to coding the current picture as an I-frame, the penalty for using the
older reference picture is significantly lower, if the reference picture is not too far away.
4.4.3.2 Error Tracking Based on Feedback information
Instead of using an earlier and undamaged frame as the reference frame, the encoder can
track how the damaged areas in frame n would have affected decoded blocks in frames
n+1 to n+d-1, and then perform one of the following [50, 51, 52]:
• Code in intra-mode the blocks in frame n+d that would have used damaged
pixels in frame n+d-1 for prediction.
• Avoid using the affected area in frame n+d-1 for prediction in coding frame n+d.
• Perform the same type of error concealment at the encoder as at the decoder for
frame n+1 to n+d-1, so that the encoder’s reference picture matches that at the
decoder, when coding frame n+d.
The first two approaches only require the encoder to track the locations of damaged
pixels or blocks, whereas the last approach requires the duplication of the decoder
operation for frame n+1 to n+d-1, which is more complicated. In either approach, the
decoder will recover from errors completely at frame n+d.
4.5 Error resilience tools in the current video coding standards
Among all of the current video coding standards, H.263 and MPEG-4 have been created
with the intention of possible use in mobile environments. Because the work
completed in this thesis has been mainly targeted at mobile environments, only the
error resilience features of these two standards are reviewed.
4.5.1 Error resilience tools in H.263
H.263 follows the general ideas of block-based hybrid coding. Beyond the baseline
syntax, H.263 offers a variety of optional operation modes [53] that adjust various
tradeoffs. Some of these modes typically allow adjusting the tradeoff between
computational complexity and compression efficiency, while others are intended to
improve error resilience by adding redundancy bits to the bitstream, which will be
discussed in more detail in the following sections.
H.263 contains four error resilience tools: block-based FEC, flexible synchronization
points (slices), independent segment decoding (ISD) and reference picture selection
(RPS). The temporal, spatial, and SNR scalability modes can also be used to support
error resilient applications. An appropriate combination of these tools along with means
available in the baseline syntax, such as intra-MB refresh, is typically chosen adaptively
by the application according to the network characteristics and conditions.
4.5.1.1 Forward Error Correction Mode (FEC) (Annex H)
The FEC mode divides the H.263 bitstream into FEC frames of 492 bits each. A 19-bit
BCH forward error correction checksum [54] is calculated over all the bits of such an
FEC frame, along with an additional bit to allow for resynchronization of the
resulting 512-bit block structure. This FEC coding allows the correction of a single
bit error in each FEC frame and the detection of two bit errors, for an approximately
4% increase in bit rate.
The FEC mechanism of Annex H is designed for ISDN, which is a very low error rate
network.
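The frame arithmetic of Annex H works out as follows; this is a direct restatement
of the figures above, not additional standard detail:

```python
# Annex H frame layout: 492 payload bits + 19-bit BCH checksum + 1 framing
# bit = 512-bit block, giving the quoted ~4% rate increase.
DATA_BITS = 492
BCH_PARITY_BITS = 19
FRAMING_BITS = 1

frame_bits = DATA_BITS + BCH_PARITY_BITS + FRAMING_BITS
overhead_percent = 100.0 * (frame_bits - DATA_BITS) / DATA_BITS
print(frame_bits)                  # 512
print(round(overhead_percent, 1))  # 4.1
```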
4.5.1.2 Slice Structure Mode (Annex K)
When the slice structure mode is used, the original group of block (GOB) structure is
replaced by a slice structure. Slices consist of a number of macroblocks belonging to
the same picture. These macroblocks might be arranged either in scanning order or in a
rectangular shape. In both cases, any macroblock of a picture belongs to exactly one
slice. All macroblocks of one slice can be decoded independently from the content of
other slices because no dependencies such as prediction of motion vectors are allowed
across slice boundaries. The main difference between a GOB and a slice is that a GOB
always has a rectangular shape, while a slice has a more flexible shape and usage than a
GOB.
The picture header information must be available to decode a slice, because it is
not repeated in the slice headers. Scan order slices are often more useful if small
packet sizes are needed,
whereas rectangular slices are helpful in achieving packet loss resilience and low codec
delay at higher bit rates. Each of the two slice structures can be used either with a fixed
scan-ordered or an arbitrarily ordered transmission of the slices. The latter makes
decoder implementation more difficult, but minimizes latency in lossy environments.
The former is more appropriate for heavily pipelined hardware architectures, which
might not allow random decoding of data.
4.5.1.3 Independent Segment Decoding Mode (Annex R)
The independent segment decoding mode enforces the treatment of segment boundaries
as if they are picture boundaries. A segment is defined as a slice, a GOB, or a number
of consecutive GOBs with empty GOB headers. This mode allows the independent
decoding of picture parts, if and only if, the shape of the independently decodable
segments remains identical between two I frames. In such a case, the import of
previously corrupted picture data outside the segment boundaries (due to motion
compensation) during the reconstruction process can be avoided. The independent
segment decoding mode can be used for special effects like spatial video mixing, but it
can also achieve error resilience by eliminating error propagation between well-defined
spatial parts of a picture.
4.5.1.4 Reference Picture Selection (RPS - Annex N)
The RPS mode allows a picture earlier than the last transmitted one to serve as
the reference picture for inter picture prediction. It is also possible to apply RPS to
individual segments rather than full pictures. The temporal reference of the reference
picture to be used is conveyed in the picture/segments header to inform the decoder
which of its several reference pictures should be used.
The RPS mode may be used with or without a back channel. In multi-party video
applications, back channels are obviously not realistic. For these scenarios, one
possible method of using RPS mode is known as video redundancy coding (VRC).
VRC can be used in conjunction with the spatial error resilience mechanisms of Annex
R and Annex K to achieve spatial and temporal error resilience.
Figure 4.3 VRC with two threads and three frames per thread
The principle of the VRC method is to divide the sequence of pictures into two or more
threads in such a way that all camera pictures are assigned to one of the threads in a
round-robin fashion. Each thread is coded independently. Figure 4.3 shows that the
pictures have been divided into two threads. Obviously, the frame rate within one
thread is much lower than the overall frame rate: half in case of two threads, a third in
case of three threads, and so on. This leads to a substantial coding penalty because of
the generally larger scene changes in the picture sequence and longer motion vectors
typically required to represent accurately the motion related changes between two P
frames within a thread. At regular intervals, all threads converge into a so-called Sync
frame as shown in Fig. 4.3.
If one of these threads is damaged because of transmission errors, the remaining threads
stay intact and can be used to predict the next Sync frame. It is possible to continue the
decoding of the damaged thread, which leads to slight picture degradation, or to stop its
decoding, which leads to a drop of the frame rate. If the length of the threads is kept
reasonably small, however, both degradation forms will persist only for a very short
time, until the next Sync frame is reached.
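The round-robin thread assignment described above can be sketched as follows; the
frame numbering matches the two-thread example of Fig. 4.3:

```python
def assign_vrc_threads(frame_numbers, num_threads):
    """Assign camera pictures to independently coded threads round-robin."""
    threads = [[] for _ in range(num_threads)]
    for i, frame in enumerate(frame_numbers):
        threads[i % num_threads].append(frame)
    return threads

print(assign_vrc_threads([1, 2, 3, 4, 5, 6], 2))  # [[1, 3, 5], [2, 4, 6]]
```

Each thread sees only every num_threads-th picture, which is exactly why the
per-thread frame rate drops and the coding penalty described above arises.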
Figure 4.4 illustrates the workings of VRC when one of the two threads is damaged.
Sync frames are always predicted based on one of the undamaged threads. This means
that the number of transmitted I frames can be kept small because there is no need for
complete resynchronisation. The dotted box in Fig. 4.4 indicates that the frame has been
corrupted by the transmission error and was not decoded successfully. Consequently
the frame rate from the corrupted frame to the Sync point will be lower because these
frames are no longer decodable. From the Sync point a new cycle will start again,
because the frame at Sync point can still be decoded from the other thread, which has
been successfully decoded.
Figure 4.4 Frame loss with VRC
A correct Sync frame prediction is no longer possible only if all threads between two
Sync frames are damaged. In this situation, annoying artifacts will be present until the
next I frame is decoded correctly, as would have been the case without employing VRC.
If a back channel is available, the decoder can send the encoder messages containing
positive or negative acknowledgements of decoded pictures, along with the temporal
reference of each picture. By using this information, the encoder can keep track of
the last correctly decoded picture at the decoder. Once the encoder learns about an
incorrectly decoded picture through a back channel message, it can react accordingly
by using a correct reference picture for further prediction.
4.5.2 Error resilience tools in MPEG-4
In MPEG-4, five error resilience tools have been incorporated into the standard, as
listed below:
Video Packetization or Resynchronisation
Data Partitioning (DP)
Reversible Variable-Length Codes (RVLCs)
Adaptive Intra Refresh (AIR)
NEWPRED
The first two approaches and AIR try to confine the influence of the errors to within
one packet, while RVLC attempts to recover some data that would otherwise be
discarded. NEWPRED brings encoder and decoder into cooperation to conceal
the error effect.
4.5.2.1 Packetization
Basically the video packet resynchronisation is very similar to the Group of Blocks
(GOB) or slice structure mode in Annex K of H.263+. Resynchronization or
packetization attempts to stop error propagation after errors have been detected, by
inserting resynchronization markers into the bitstream. When errors occur in the
encoded bitstream without using resynchronization markers, the decoder will not be
able to locate the next code word, and therefore will lose synchronization with the
encoder. When resynchronization markers are inserted in the bitstream, the decoder can
regain synchronization by looking for the next resynchronization marker after losing
synchronization due to the errors in the bitstream. Generally, the data between the
synchronization point prior to the error and the first point where synchronization is re-
established is discarded.
The main difference between GOB and Packetization is that the GOB approach to
resynchronization is based on spatial resynchronization while the video packet approach
adopted by MPEG-4 is based on providing periodic resynchronization markers
throughout the bitstream. In the GOB approach, once a particular macroblock location
is reached in the encoding process, a resynchronization marker is inserted into the
bitstream. A potential problem with this approach is that since the encoding process is
variable rate, these resynchronization markers will most likely be unevenly spaced
throughout the bitstream. Therefore, certain portions of the scene, such as high motion
areas, will be more susceptible to errors, which will also be more difficult to conceal. In
the video packet approach, the length of the video packets is not based on the number
of macroblocks, but instead on the number of bits contained in that packet. If the
number of bits contained in the current video packet exceeds a predetermined threshold,
then a new video packet is created at the start of the next macroblock.
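The bit-count rule in the last sentence can be sketched directly; per-MB bit counts
and the threshold value are illustrative inputs:

```python
def packetize_by_bits(mb_bit_counts, threshold):
    """Close the current video packet and start a new one at the next MB once
    the accumulated bit count exceeds the threshold (cf. MPEG-4 video packets)."""
    packets, current, bits = [], [], 0
    for mb, size in enumerate(mb_bit_counts):
        current.append(mb)
        bits += size
        if bits > threshold:
            packets.append(current)
            current, bits = [], 0
    if current:
        packets.append(current)
    return packets

print(packetize_by_bits([100, 200, 300, 50, 400], 300))  # [[0, 1, 2], [3, 4]]
```

Unlike the GOB approach, packet boundaries here track bits rather than macroblock
positions, so high-motion regions get resynchronization markers more often.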
Figure 4.5 Packet structure
Figure 4.5 shows a typical video packet (VP) structure. A resynchronization marker is
used to distinguish the start of a new video packet. This marker is distinguishable from
all possible VLC code words as well as the VOP (Video Object Plane) start code.
Header information is also provided at the start of a video packet. Contained in this
header is the information necessary to restart the decoding process and includes the
macroblock address (number) of the first macroblock contained in this packet and the
quantization parameter (quant_scale) necessary to decode that first macroblock. The
macroblock number provides the necessary spatial resynchronization while the
quantization parameter allows the differential decoding process to be resynchronized.
Following the quant_scale is the Header Extension Code (HEC). As the name implies,
HEC is a single bit used to indicate whether additional information will be available in
this header. If the HEC is equal to 1 then the following additional information is
available in the packet header: modulo_time_base, vop_time_increment,
vop_coding_type, intra_dc_vlc_thr, vop_fcode_forward, vop_fcode_backward. In this
case the HEC makes it possible to decode each VP independently, as all the necessary
information to decode the VP is included in the header extension code field.
If the VOP header information is corrupted by a transmission error, it can be corrected
by the HEC information. The decoder can detect the error in the VOP header, if the
decoded information is inconsistent with its semantics.
In conjunction with the video packet approach to resynchronization, a second method
called fixed interval synchronization has also been adopted by MPEG-4. This method
requires that VOP start codes and resynchronization markers appear only at legal fixed
locations in the bitstream. This helps to avoid the problems associated with errors
present in the bitstream which can emulate a VOP start code. In this case, when fixed
interval synchronization is utilized, the decoder is only required to search for a
VOP start code at the beginning of each fixed interval. The fixed interval
synchronization method allows this interval to be any predetermined length.
Fixed interval synchronization is achieved by first inserting a bit with the value 0 and
then, if necessary, inserting bits with value 1 before the start code and the Sync marker.
The decoder can determine if errors are incurred in a video packet by detecting the
incorrect number of these stuffing bits.
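The stuffing rule can be sketched as follows; bit strings stand in for the actual
bitstream, and the interval value is an illustrative choice:

```python
def stuffing_bits(bit_position, interval):
    """Bits appended before the next marker: a single 0 followed by as many
    1s as needed to reach the next interval boundary."""
    pad = "0"
    while (bit_position + len(pad)) % interval != 0:
        pad += "1"
    return pad

def stuffing_is_valid(pad):
    """The decoder flags a packet error if the stuffing pattern is malformed."""
    return len(pad) >= 1 and pad[0] == "0" and set(pad[1:]) <= {"1"}

print(stuffing_bits(509, 512))   # 011
print(stuffing_is_valid("011"))  # True
```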
4.5.2.2 Data Partitioning (DP)
Different from resynchronisation, DP [55] is an error concealment tool, which is
achieved by separating the motion and macroblock header information away from the
texture information. If the texture information is lost, this approach utilises the motion
information to conceal these errors. That is, the corrupted texture information is
discarded, while the motion information is used to motion-compensate the previously
decoded VOP.
The syntactic structure of the DP mode is depicted in Fig. 4.6.
Figure 4.6 Structure of Data Partitioning
Error concealment is an extremely important component of any error robust video codec.
Similar to the error resilience tools, the effectiveness of an error concealment strategy is
highly dependent on the performance of the resynchronisation scheme. Basically, if the
resynchronisation method can effectively localize the error, then the error concealment
problem becomes much more tractable.
4.5.2.3 Reversible Variable Length Coding (RVLC)
RVLC is the only error resilience tool in MPEG-4 which has some kind of data
recovery mechanism. The use of variable length codes, although it achieves a high
compression ratio, is the main contributor to the vulnerability of compressed video
under the current standards. During the decoding process, if the decoder detects an
error while decoding VLC data, it loses synchronization with the encoder. As a
consequence, the decoder typically discards all the data up to the next
resynchronization point. RVLC alleviates this problem and enables the decoder to
better isolate the errors, thus improving data recovery in the presence of errors.
RVLC is designed so as to be instantaneously decoded both in forward and reverse
directions. A part of a bitstream which cannot be decoded in the forward direction due
to the presence of errors can often be decoded in the backward direction, and so recover
some information which would otherwise have been discarded. However RVLC is only
applied to TCOEF coding in MPEG-4 at this stage.
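A toy reversible code can illustrate the two-way decodability. The three-codeword
table below is an invented example (not the MPEG-4 TCOEF tables): the codewords are
palindromes and form a prefix-free, hence also suffix-free, set, so the same table
decodes the bitstream in both directions:

```python
RVLC_TABLE = {"0": "A", "101": "B", "111": "C"}  # palindromic codewords

def decode_forward(bits):
    symbols, buf = [], ""
    for b in bits:
        buf += b
        if buf in RVLC_TABLE:
            symbols.append(RVLC_TABLE[buf])
            buf = ""
    return symbols, buf  # a non-empty buf marks an undecodable tail

def decode_backward(bits):
    symbols, _ = decode_forward(bits[::-1])  # palindromes: same table applies
    return symbols[::-1]

bits = "0" + "101" + "111"      # A, B, C
print(decode_forward(bits)[0])  # ['A', 'B', 'C']
print(decode_backward(bits))    # ['A', 'B', 'C']
```

When an error corrupts the middle of a packet, forward decoding recovers symbols up
to the error and backward decoding from the next marker recovers symbols after it,
so only the corrupted middle is discarded.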
4.5.2.4 Adaptive Intra Refresh (AIR) for Error Resilience
In the current video coding standards, some form of error refreshment is essential.
When an error occurs in an I or a P frame, all subsequent frames are degraded unless
an error refreshment technique is adopted. However, encoding entire pictures in Intra
mode to avoid this reduces the coding efficiency greatly, so the AIR approach is a
compromise.
(Figure 4.6 packet layout: Resync Marker | MB Number | Quant Scale | HEC |
Motion & header information | Motion Marker | Texture Info | Resync Marker)
In AIR, the motion area is encoded frequently in Intra mode and the number of Intra
MBs in a VOP is fixed and predetermined, depending on bit rate and frame rate. The
encoder estimates motion for each MB and the motion area is encoded in Intra mode.
The results of this estimation are recorded to the Refresh Map. The encoder refers to
the Refresh Map and decides whether to encode the current MB in Intra mode or not.
The decision is performed by the comparison between SAD and a threshold value.
SAD is the Sum of Absolute Differences between the current MB and the MB in the
same location of the previous VOP. Since the SAD has already been calculated in
the Motion Estimation part, additional calculation for the AIR is not necessary. If the
SAD of the current MB exceeds the threshold it is regarded as a high motion area and it
is encoded in Intra mode.
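The mode decision just described can be sketched as below; the flat pixel lists and
threshold value are illustrative inputs, not values from the standard:

```python
def sad(current_mb, previous_mb):
    """Sum of absolute differences between co-located macroblock pixels."""
    return sum(abs(c - p) for c, p in zip(current_mb, previous_mb))

def air_coding_mode(current_mb, previous_mb, threshold):
    """High-motion MBs (SAD above the threshold) are refreshed in Intra mode."""
    return "intra" if sad(current_mb, previous_mb) > threshold else "inter"

print(air_coding_mode([10, 60, 20], [10, 10, 20], threshold=40))  # intra
```

Because the same SAD is already computed during motion estimation, the decision
adds essentially no extra computation, as the text notes.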
4.5.2.5 NEWPRED
Similar to the RPS mode (Annex N) and Slice Structure Mode (Annex K) of H.263 in
principle, when the NEWPRED mode is turned on in MPEG-4, the reference used for
inter-prediction by the encoder will be updated adaptively according to feedback from
the decoder via feedback messages. These upstream messages indicate which
NEWPRED (NP) segments (which can either be an entire frame, or the content of a
packet) have been successfully decoded and which NP segments have not. Based on the
feedback information the encoder will either use the most recent NP segment, or a
spatially corresponding but older NP segment for prediction. In the latter case the
coding efficiency is reduced, as long motion vectors and additional texture information
will typically have to be used.
References
[1] Martin Vetterli and Jelena Kovacevic, “Wavelets and Subband Coding”, Prentice
Hall 1995.
[2] Haobo Li, Mirek Novak and Robert Forchheimer, “Fractal-based image sequence
compression scheme”, Optical Engineering, 32(7), July 1993, pp. 1588-95.
[3] Katherine S. Wang, James O. Normile and Hsi Jung Wu, “Software decodable
video compression algorithm based on vector quantization and classification”. In:
Proceedings of IEEE Workshop on Visual Signal Processing and Communication,
Melbourne, September 1993.
[4] Berthold K.P. Horn and Brian G. Schunck, “Determining Optical flow”, Artificial
Intelligence 17, 1981, pp.319-331.
[5] A. Murat Teckalp, “Digital Video Processing”, Prentice Hall PTR 1995.
[6] Recommendation H.261: Video Codec for Audiovisual Services at p×64 kbit/s.
ITU-T (CCITT), March 1993.
[7] ISO/IEC 11172-2, “Information technology - Coding of moving pictures and
associated audio for digital storage media at up to about 1.5 Mbit/s: Part 2 Video”,
August 1993.
[8] ISO/IEC: 13818 (MPEG-2). “Information technology – Generic Coding of Moving
Pictures and Associated Audio Information”.
[9] Y. Nakaya and H. Harashima, “Motion compensation based on spatial
transformations,” IEEE Trans. Circ. and Syst.: Video Tech., Vol. 4, June 1994,
pp.339-56, 366-7.
[10] Yucel Altunbasak, “Object-Scalable, Content-Based Video Representation and
Motion Tracking for Visual Communications and Multimedia”, Ph.D thesis,
Department of Electrical Engineering, University of Rochester. 1996.
[11] J. Biemond, L. Looijenga, D. E. Boekee, and R. H. J.M. Plompen, “A pel-recursive
Wiener-based displacement estimation algorithm,” Sign. Proc. Vol. 13, December
1987, pp. 399-412.
[12] N. Ahmed, T. Natarajan and K. R. Rao, “Discrete Cosine Transform”, IEEE Trans.
On Computers, 1974, pp.90-93.
[13] Bill Welsh, “Model-based coding of images”, Ph.D thesis, British Telecom
Research laboratories, January 1991.
[14] Haobo Li, “Low Bitrate Image Sequence Coding”, Ph.D thesis, Linkoping
University, 1993.
[15] Jorn Ostermann, “Object-based analysis-synthesis coding based on the source
model of moving rigid 3D objects”, Signal Processing: Image Communication 6,
1994, pp.143-161.
[16] Candemir Toklu, “Object-based Digital Video Processing Using 2D Meshes”, Ph.D
thesis, Department of Electrical Engineering, University of Rochester, 1998.
[17] Yucel Altunbasak and A. Murat Tekalp, “Closed-form connectivity-preserving
solutions for motion compensation using 2-D meshes”, IEEE Trans. Image Proc.,
Vol. 6, No. 9, September 1997, pp.1255-1269.
[18] Yucel Altunbasak and A. Murat Tekalp, “Occlusion-adaptive, content-based mesh
design and forward tracking”, IEEE Trans. Image Proc., Vol. 6, No. 9, September
1997, pp. 1270-1280.
[19] L. Torres and M. Kunt, “Second generation video coding techniques”, in L. Torres
and M. Kunt, “Video Coding: The Second Generation Approach”, Kluwer Academic
Publishers, 1996, pp.1-30.
[20] Y. Wang and Q. F. Zhu, “Error Control and Concealment for Video
Communication: A Review”, Proceedings of the IEEE, vol. 86, No. 5, May 1998.
pp.974 – 997.
[21] Y. Wang, S. Wenger, J. Wen and A. K. Katsaggelos, “Error Resilient Video Coding
Techniques”, IEEE Signal Processing Magazine, July 2000, pp.61-82.
[22] J. D. Villasenor, Y. Q. Zhang and J. Wen, “Robust Video Coding Algorithms and
Systems”, Proceedings of the IEEE, vol. 87, no. 10, October 1999, pp.1724-1733.
[23] J. Wen and J. D. Villasenor, “A class of reversible variable length codes for robust
image and video coding”, Proc. 1997 IEEE Int. Conf. Image Processing, vol. 2,
Santa Barbara, CA., Oct. 1997, pp. 65-68.
[24] Description of Error Resilient Core Experiments, ISO/IEC JTC1/SC29/WG11
N1383, Nov. 1996.
[25] D. W. Redmill and N. G. Kingsbury, “The EREC: an error resilient technique for
coding variable-length blocks of data”, IEEE Trans. Image Processing, Vol. 5, No.
4, April 1996, pp. 565-574.
[26] R. Talluri, “Error-resilient video coding in ISO MPEG-4 standard”, IEEE
Commun. Mag., vol. 36, no.6, June 1998, pp.112-119.
[27] J.Ott, Stephan Wenger and Gerd Knorr, “Application of H.263+ Video Coding
Modes in Lossy Packet network Environments”, Journal of Visual Communication
and Image Representation 10, 1999, pp.12-38.
[28] P. Haskell and D. Messerschmitt, “Resynchronisation of motion compensated
video affected by ATM cell loss”, Proc. ICASSP 92, San Francisco, CA, Vol. 3,
1992, pp.545-548.
[29] S. Wenger, “Video redundancy coding in H.263+”, in Proceedings of AVSPN,
Aberdeen, UK, September 1997.
[30] S. Wenger, G. Knorr, J. Ott and F. Kossentini, “Error resilience support in
H.263+”, IEEE Trans. Circuit Syst. Video Technol., Vol. 8, No.6, November
1998, pp. 867-877.
[31] J. F. Arnold, M. R. Frater and J. Zhang, “Error resilience in the MPEG-2 video
coding standard for cell based networks – A review”, Signal Processing: Image
Communication 14, No. 6-8, May 1999, pp. 607-633.
[32] W. Rabiner, M. Budagavi and R. Talluri, “Proposed extensions to DMIF for
supporting unequal error protection of MPEG-4 video over H.324 mobile
networks”, ISO/IEC JTC 1/SC 29/WG 11, Doc. M4135, MPEG Atlantic City
meeting, October 1998.
76
[33] A. Cellatoglu, S. Fabri, S. Worrall, A. Kondoz, “Use of Prioritized Object-Oriented
Video Coding for the Provision of Multiparty Video Communications in Error-
Prone Environments”, IEEE VTC, Amsterdam, 1999-Fall, pp. 401-405.
[34] V. A. Vaishampayan, “Design of multiple description scalar quantizers”, IEEE
Trans. Inform. Theory, Vol. 39, No. 3, May 1993, pp. 821-834.
[35] V. A. Vaishampayan and J. Domaszewicz, “Design of entropy constrained multiple
description scalar quantizer”, IEEE Trans. Inform. Theory, vol. 40, January 1994,
pp. 245-250.
[36] Y. Wang, M. T. Orchard and A. R. Reibman, “Multiple description image coding
for noisy channels by pairing transform coefficients”, in Proc. 1997 IEEE 1st
Workshop Multimedia Signal Processing, Princeton, NJ, June 1997, pp. 419-424.
[37] M. T. Orchard, Y. Wang, V. A. Vaishampayan and A. R. Reibman, “Redundancy
rate-distortion analysis of multiple description coding using pairwise correlating
transforms”, IEEE International Conference on Image Processing (ICIP97), (Santa
Barbara, CA), October 1997. Vol. 1, pp. 608-611.
[38] Q. Zhu and Y. Wang, “Error concealment in visual communications”, in
Compressed video over Networks, A. R. Reibman and M. T. Sun, Eds. New York,
Marcel Dekker, 2000.
[39] A. K. Katsaggelos and N/ P. Galatsanos, Eds., “Signal Recovery Techniques for
Image and Video Compression and Transmission”, Norwell, MA: Kluwer, 1998.
[40] S. S. Hemami and T. H.-Y. Meng, “Transform coded image reconstruction
exploiting interblock correlation”, IEEE Trans. Image Processing, Vol. 4, July
1995, pp. 1023-1027.
[41] S. Aign and K. Fazel, “Temporal & spatial error concealment techniques for
hierarchical MPEG-2 video codec”, in Proceedings of IEEE International
Conference on Communications, ICC'95, Seattle, June 1995, pp. 1778-1783.
[42] M. C. Hong, L. Kondi, H. Scwab and A. K. Katsaggelos, “Video error concealment
techniques”, Signal Processing: Image Communications, Vol. 14, No. 68, 1999,
pp.437-492.
77
[43] Q. F. Zhu, Y. Wang and I. Shaw, “Coding and cell loss recovery for DCT-based
packet video”, IEEE Trans. Circuits Syst. Video Technol., Vol. 3, No. 3, June
1993, pp. 248-258.
[44] H. Sun and W. Kwok, “Concealment of damaged block transform coded images
using projections onto convex sets”, IEEE Trans. Image Processing, Vol. 4, April
1995, pp.470-477.
[45] G. S. Yu, M. M. Liu and M. W. Marcellin, “POCS-based error concealment for
packet video using multiframe overlap information”, IEEE Trans. Circuits Syst.
Video Technol., Vol. 8, August 1998, pp. 422-434.
[46] A. Narula and J. S. Jim, “Error concealment techniques for an all-digital high-
definition television system”, in Proc. SPIE Conf. Visual Communication Image
Processing, Cambridge, MA, 1993, pp. 304-315.
[47] B. Girod and N.Harber, “Feedback-based error control for mobile video
transmission”, Proc. IEEE, Vol. 87, October 1999, pp. 1707-1723.
[48] J.Wen and J.D.Villasenor, “Reversible Variable length Codes for Efficient and
Robust Image and Video Coding”, Proceedings of the 1998 IEEE Data
Compression Conference, Snowbird, Utah, March 30 – April 1, 1998, pp471-480.
[49] W.-m.Lam, A.R.Reibman and B.Liu, “Recovery of lost or erroneously received
motion vectors”, Proc. ICASSP ’93, Minneapolis, April 1993, pp.V-417-420.
[50] M. Wada, “Selective recovery of video packet loss using error concealment”, IEEE
J. Select. Areas Commun., Vol. 7, June 1989, pp. 807-814.
[51] E. Steinbach, N. Farber and B. Girod, “Standard compatible extension of H.263 for
robust video transmission in mobile environments” IEEE Trans. Circuits Syst.
Video Technol., Vol. 7, December 1997, pp. 872-881.
[52] Y. Tomita, T. Kimura and T.Ichikawa, “Error resilient modified inter-frame coding
system for limited reference picture memories”, In Proc. Int. Picture Coding Symp.
(PCS), Berlin, Germany, Sept. 1997, pp. 743- 748.
78
[53] G.Coto, B.Erol, M.Gallant and F.Kossentini, “H.263+: Video Coding at Low Bit
Rates”,IEEE Transactions on Circuit and Systems for Video Technology, Vol.8,
No.7, Novermber 1998, pp.849-866.
[54] D. G. Hoffman, D. A. Leonard, C. C. Lindner, K. T. Phelps, C. A. Rodger and J. R.
Wall, “Coding Theory: the Essentials”, Marcel Dekker, Inc., 1991.
[55] R. Talluri, I. Moccagattaq, Y. Nag and G. Cheung, “Error concealment by data
partitioning”, Signal Processing: Image Communcation, Vol. 14, May 1999,
pp. 505-518.
[56] ITU-T H.263 “Video coding for low bit rate communication”, 1998.
[57] ISO/IEC 14496-2, “Information Technology – Coding of Audio-Visual Objects:
Visual”, 2001.
79
5 OVERVIEW OF ERROR CORRECTION TECHNIQUES
5.1 Introduction
There are basically two error control techniques at the data link layer of a
telecommunication network: forward error correction (FEC) and automatic repeat
request (ARQ). FEC employs error correction codes to correct errors detected at the
receiver while ARQ uses error detection and retransmissions to combat transmission
errors in two-way communication systems. Both FEC and ARQ have their advantages
and limitations. For a stable channel condition, FEC schemes maintain a constant
system throughput and have a low time delay, which is very important for real-time
applications. However, when the channel condition deteriorates, the performance of
FEC decreases dramatically. For a one-way transmission system, FEC is the only
choice. For a good channel condition, ARQ schemes are simple and are able to achieve
a high throughput with high reliability. However, when the channel error rates increase, the
system throughput decreases rapidly and long, variable delays, which are unacceptable,
are expected. In wireless environments it is seldom feasible to use pure FEC or ARQ
due to the unstable channel conditions. In most situations a combination of basic FEC
and ARQ schemes, which is often called hybrid ARQ, is used due to its capability in
combining the advantages of pure FEC and ARQ. However, for error control in the
application layer, FEC is the only choice. Because we are only concerned here with
error control in the application layer, only FEC techniques will be reviewed further.
Shannon published his pioneering work in 1948 [1], in which he showed that, as long as
the rate at which information is transmitted is less than the channel capacity, there exist
error control codes that can provide arbitrarily high levels of reliability at the receiver
output. Since then, a great deal of effort has been expended on the problem of devising
efficient encoding and decoding methods for error control in a noisy environment. The
output of this research on error control coding can be roughly classified into two
categories, namely block codes and convolutional codes.
5.2 Block Codes
In a block code [2,3], the information sequence is divided into message blocks of k
information bits, and each block is mapped independently into a block of n bits, called
a code word, with n > k. Corresponding to the 2^k different possible messages, there
are 2^n different possible words of length n, from which M = 2^k code words are
selected to form the code. The set of code words of length n is called an (n, k) block
code. Obviously, the encoder is memoryless, and the rate of this block code is R = k/n.
In practice, linear block codes are the most commonly used due to their easy synthesis
and implementation, and are constructed according to the definition given below.
Let the message m = (m_0, m_1, ..., m_{k-1}) be an arbitrary k-tuple from a Galois
field GF(q). The linear (n, k) code C over GF(q) is the set of q^k codewords of row-
vector form c = (c_0, c_1, ..., c_{n-1}), where c_j ∈ GF(q), which is defined by the
following linear transformation:

c = m · G

Here G, called the generator matrix, is a k × n matrix of rank k with elements from
GF(q).
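The mapping c = m · G can be illustrated with a small sketch (an illustration, not part of the thesis; the generator matrix shown is one systematic form of the (7, 4) Hamming code over GF(2), but any full-rank binary k × n matrix defines a linear code):

```python
import itertools

# Generator matrix in systematic form [I_4 | P] for a (7, 4) binary code.
G = [
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 0, 1, 1],
    [0, 0, 1, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 0, 1],
]

def encode(m, G):
    """Map a k-bit message m to an n-bit codeword c = m.G (arithmetic mod 2)."""
    n = len(G[0])
    return [sum(m[i] * G[i][j] for i in range(len(G))) % 2 for j in range(n)]

# Linearity: the sum (XOR) of any two codewords is again a codeword.
codewords = {tuple(encode(list(m), G)) for m in itertools.product([0, 1], repeat=4)}
c1, c2 = encode([1, 0, 1, 1], G), encode([0, 1, 1, 0], G)
c3 = [(a + b) % 2 for a, b in zip(c1, c2)]
assert tuple(c3) in codewords  # closure under addition, hence "linear"
```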
From the definition, it can be seen that any linear combination of two or more code
words is also a code word in a linear block code, hence their name. Among the linear
block codes, linear cyclic codes are most commonly used, including BCH codes and
Reed-Solomon codes, which are elaborated below.
5.2.1 Linear Cyclic Codes
Linear cyclic codes form a very important subclass of linear block codes. An
(n, k) linear code C is called a cyclic code if every codeword
c = (c_0, c_1, ..., c_{n-1}) ∈ C has its cyclic shift
(c_{n-1}, c_0, c_1, ..., c_{n-2}) ∈ C as well. The special algebraic and geometric
structure of cyclic codes ensures that their implementation is relatively easy. A number
of efficient encoding and decoding algorithms have been derived for cyclic codes by the
use of shift-register circuits. These algorithms make it possible to implement long
block codes with a large number of codewords in practical communication applications.
Almost all block codes employed in modern digital practice are either linear cyclic
codes or closely related to them.
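As an illustrative sketch (not from the thesis), the cyclic-shift property can be checked numerically for the (7, 4) cyclic code generated by g(x) = 1 + x + x^3, a divisor of x^7 + 1 over GF(2):

```python
import itertools

def poly_mul_mod2(a, b):
    """Multiply two GF(2) polynomials given as bit lists (index = power of x)."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] ^= ai & bj
    return out

def poly_mod(a, m):
    """Reduce polynomial a modulo m over GF(2); result has deg < deg(m)."""
    a = a[:]
    for i in range(len(a) - 1, len(m) - 2, -1):
        if a[i]:
            shift = i - (len(m) - 1)
            for j, mj in enumerate(m):
                a[shift + j] ^= mj
    return a[:len(m) - 1]

g = [1, 1, 0, 1]                      # g(x) = 1 + x + x^3
x7_plus_1 = [1, 0, 0, 0, 0, 0, 0, 1]  # x^7 + 1

# Codewords are c(x) = m(x) g(x) mod (x^7 + 1) for all 16 messages m(x).
code = {tuple(poly_mod(poly_mul_mod2(list(m), g), x7_plus_1))
        for m in itertools.product([0, 1], repeat=4)}

# Cyclic property: every cyclic shift of a codeword is again a codeword.
for c in code:
    assert (c[-1],) + c[:-1] in code
```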
BCH codes form a large class of powerful random error-correcting cyclic codes,
discovered by Bose and Ray-Chaudhuri in 1960 [4,5] and independently by
Hocquenghem in 1959 [6]. This class of codes is a remarkable generalization of the
Hamming codes for multiple-error correction. BCH codes provide a wide variety of
block lengths and corresponding code rates. They are important not only because of
their flexibility in the choice of their code parameters, but also because, at block lengths
of a few hundred or less, many of these codes are among the most used codes of the
same lengths and code rates. Another advantage is that they are capable of correcting
all random patterns of t errors by a decoding algorithm that is both simple and easily
realized in a reasonable amount of equipment. Among the non-binary BCH codes, the
most important subclass is the class of Reed-Solomon codes [7]. Reed-Solomon codes
have particularly good distance properties and burst error correction capabilities since
bursts of errors cause only a few symbol errors in a Reed-Solomon code, which can be
easily corrected. Reed-Solomon codes also can be concatenated with a binary code to
provide higher levels of error protection.
5.3 Convolutional codes
Convolutional codes [8,9] differ from block codes in that the encoder contains memory
and the n encoder outputs at any given time unit depend not only on the k inputs at that
time unit but also on the m previous input blocks.
5.3.1 Convolutional Encoding
Fig.5.1 shows the structure of a typical convolutional encoder. Convolutional codes are
usually described using two parameters: the code rate and the constraint length. The
code rate, k/n, is expressed as a ratio of the number of bits fed into the convolutional
encoder (k) to the number of channel symbols output by the convolutional encoder (n)
in a given encoder cycle. The constraint length parameter, K, denotes the “length” of
the convolutional encoder, i.e. how many k-bit stages are available to feed the
combinatorial logic that produces the output symbols.
Figure 5.1 Convolutional Encoder
The input data to the encoder, which is assumed to be binary, is shifted into and along
the shift register k bits at a time. The n-bit output sequence for each k-bit input is
generated by the n linear algebraic function generators. Closely related to K is the
parameter, m, which indicates how many encoder cycles an input bit is retained and
used for encoding after it first appears at the input to the convolutional encoder.
The parameter m can be thought of as the memory length of the encoder. In practice,
codes with k = 1 and n = 2 are most often used; in these cases m = K - 1. Increasing K
or m usually improves the performance of convolutional codes.
Unlike a block code, which has a fixed length n, a convolutional encoder is basically a
finite-state machine whose state is determined by its memory elements. It is this
state which determines the mapping between the next set of input and output bits.
As with most finite-state machines, the convolutional encoder can only move between
states in a limited manner, which can be represented by a state-transition diagram. The
state diagram of the (7,5) convolutional code with K = 3, k = 1 and n = 2 is shown in
Fig.5.2. The octal numbers 7 and 5 represent the code generator polynomials, which
when read in binary (111, 101) correspond to the shift register connections to the
modulo-two adders.
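A minimal sketch of this (7,5) encoder (an illustration with our own bit ordering and helper names, not code from the thesis):

```python
# Rate-1/2, K = 3 convolutional encoder with generator polynomials 7 and 5
# (octal), i.e. taps 111 and 101 on the K-stage shift register.
G1, G2 = 0b111, 0b101

def conv_encode(bits):
    """Encode a bit list; two output symbols per input bit; state = m = 2 bits."""
    state = 0                        # the two previous input bits
    out = []
    for b in bits:
        reg = (b << 2) | state       # current bit plus the two stored bits
        out.append(bin(reg & G1).count("1") % 2)  # modulo-two adder, taps 111
        out.append(bin(reg & G2).count("1") % 2)  # modulo-two adder, taps 101
        state = reg >> 1             # shift: the newest bit enters the memory
    return out

# Input 1 0 1 1 from the all-zero state yields the output pairs 11 10 00 01.
assert conv_encode([1, 0, 1, 1]) == [1, 1, 1, 0, 0, 0, 0, 1]
```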
Figure 5.2 State Diagram of a 4-state convolutional encoder
In the state-transition diagram, nodes represent states and branches represent transitions.
Each branch in the state diagram has a label of the form XX/Y, where XX is the output
pair corresponding to the input bit Y. When the evolution of the state transitions is
depicted over time, the trellis diagram of Fig. 5.3 is obtained. In the diagram, the
four states (00, 01, 10, 11) are shown at the left-hand side, and the two digit numbers
represent the output as the encoder transitions from one state to another state. A solid
line in the diagram represents a ‘zero’ input bit and a dashed line represents a ‘one’
input bit.
Figure 5.3 Trellis diagram of a 4-state convolutional encoder
5.3.2 Viterbi Decoding
In the decoding of a block code for a memoryless channel, the distances between the
received code word and the 2^k possible transmitted code words are computed. Then the
code word that is closest in distance to the received code word is selected. This
decision rule, which requires the computation of 2^k metrics, is optimum in the sense that
it results in a minimum probability of error for the binary symmetric channel and the
additive white Gaussian noise channel.
Different from block code decoding, the optimum decoding of a convolutional code
involves a search through the trellis for the most probable sequence. Depending on
whether the detector following the demodulator performs hard or soft decisions, the
corresponding metric in the trellis search may be either a Hamming metric or a
Euclidean metric respectively. A metric is defined for the jth branch of the ith path
through the trellis as the logarithm of the joint probability of the sequence conditioned
on the transmitted sequence for the ith path. That is,
μ_j^(i) = log P(Y_j | C_j^(i)),   j = 1, 2, 3, ...

Furthermore, a metric PM^(i) for the ith path consisting of B branches through the trellis
is defined as

PM^(i) = Σ_{j=1}^{B} μ_j^(i)
The criterion for deciding between two paths through the trellis is to select the one
having the larger metric. This rule maximizes the probability of a correct decision or,
equivalently, it minimizes the probability of error for the sequence of information bits.
Based on this criterion Viterbi introduced a decoding algorithm [10,11] for
convolutional codes in 1967.
In the Viterbi algorithm it is assumed that the code begins and ends at the all-zero state.
For an (n, k, m) convolutional code, the input information sequence of kL bits is padded
with km all-zero bits, called tail bits, which flush the encoder memory so that the last
information bits exert their influence on the last output symbols of the convolutional
encoder. The received code word contains n(L + m) bits. With this assumption the
algorithm can be summarized as follows.
1. Draw a trellis of L + m stages. For the last m stages of the trellis, draw only
paths corresponding to the all-zero input bits.
2. Initialisation: Set l = 1 and the metric of the initial all-zero state equal to 0.
3. Recursion: Find the distance of the lth block of n bits in the received sequence to all
branches connecting the states at the lth stage to the states at the (l + 1)th stage
of the trellis.
4. Add these distances to the metrics of the states at the lth stage to obtain the
metric candidates for the states at the (l + 1)th stage. For each state at the
(l+1)th stage, there are 2k paths entering the state and thus there are 2k metric
candidates. For each state at the (l+1)th stage, find the minimum of the metric
candidates and label the corresponding branch as the survivor. Store the
survivor path and assign its metric as the metric of the state at the (l+1)th stage
and eliminate all other paths.
5. If l = L + m, go to the next step; otherwise increase l by 1 and go to step 3.
6. For the all-zero state at the last (L + m)th stage, the survivor path is the optimum
path and the input sequence associated with this path is the maximum likelihood
decoded information sequence. Remove the last km bits from the estimated k(L
+ m)-bit sequence and thus obtain the estimated kL information bits.
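The steps above can be sketched as a hard-decision decoder for the (7,5) code of Fig. 5.2 (a simplified illustration, not the thesis implementation; survivor bookkeeping uses dictionaries rather than fixed arrays):

```python
G1, G2 = 0b111, 0b101  # generator polynomials of the (7,5) code

def step(state, b):
    """One encoder transition: returns (next_state, [two output bits])."""
    reg = (b << 2) | state
    out = [bin(reg & G1).count("1") % 2, bin(reg & G2).count("1") % 2]
    return reg >> 1, out

def viterbi_decode(received):
    """Decode 2(L + m) hard bits; assumes the encoder was flushed to state 0."""
    INF = float("inf")
    metrics = {0: 0}                 # step 2: start in the all-zero state
    paths = {0: []}
    for l in range(0, len(received), 2):
        r = received[l:l + 2]
        new_metrics, new_paths = {}, {}
        for state, metric in metrics.items():
            for b in (0, 1):         # steps 3-4: extend and add branch distances
                nxt, out = step(state, b)
                d = metric + sum(x != y for x, y in zip(out, r))
                if d < new_metrics.get(nxt, INF):   # keep only the survivor
                    new_metrics[nxt] = d
                    new_paths[nxt] = paths[state] + [b]
        metrics, paths = new_metrics, new_paths
    return paths[0]                  # step 6: the survivor ending in state 0

# Encode 1 0 1 1, append m = 2 tail zeros, flip one channel bit, then decode.
msg = [1, 0, 1, 1]
coded, s = [], 0
for b in msg + [0, 0]:
    s, out = step(s, b)
    coded += out
coded[3] ^= 1                        # a single channel error
assert viterbi_decode(coded) == msg + [0, 0]   # error corrected
```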
The complexity of the Viterbi algorithm is proportional to the number of states and
paths in the trellis diagram. The complexity of the algorithm increases with the memory
length m and the input block length k. In addition, the decoding delay and the amount
of memory required for storage become unacceptable when decoding a long
information sequence. A solution to this problem is the path memory truncation
approach, where the decoder at each stage only searches δ stages back in the trellis
instead of to the start of the trellis. The parameter δ is called the trellis depth. Simulations
have shown that when δ ≥ 5(m + 1), the performance degradation caused by this
suboptimal decoding is negligible.
5.3.3 Performance of Convolutional codes
5.3.3.1 Performance of Hard-decision Viterbi decoding algorithm
Unlike block codes, it is difficult to give a closed-form expression for the performance
of convolutional codes; usually it is given by bounds. Let S be the set of all paths that
diverge from the all-zero path at a fixed time instant t, say t = 0, and remerge into the
all-zero path exactly once at some later time. The performance analysis of
convolutional codes is based on the first-event error probability, denoted P_ev, which
is defined as the probability that any path in the set S accumulates a higher metric than
the all-zero path, given correct decoding up to t = 0 and assuming the all-zero path to
be correct, without loss of essential generality. The more useful measure is the bit error
probability, denoted P_b, which is defined as the expected number of bit errors in a
given sequence of received bits normalized by the total number of bits in the sequence.
It can be shown that for memoryless channels these probabilities are bounded by [13]

P_ev ≤ Σ_{d=d_free}^{∞} a_d P_d

P_b ≤ (1/k) Σ_{d=d_free}^{∞} b_d P_d
In the expressions, d_free is the minimum free distance of the code, a_d is the number of
paths in S of Hamming weight d, and b_d is the total number of nonzero information bits
in all paths of Hamming weight d in S. As for P_d, it is the pair-wise probability that a
path in S of Hamming weight d is chosen instead of the correct path. The parameters a_d
and b_d depend only on the code parameters and are commonly calculated from the
code's transfer function, while P_d is channel-dependent. For an additive white Gaussian
noise (AWGN) channel,

P_d = Q(√(2E_b/N_0))

where

Q(x) = (1/√(2π)) ∫_x^∞ exp(-z²/2) dz

and E_b/N_0 is the energy-per-bit to noise-power-density ratio.
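For numerical work, the Gaussian tail function Q(x) is conveniently evaluated through the complementary error function, Q(x) = 0.5 erfc(x/√2); a small sketch (the E_b/N_0 value below is illustrative, not from the thesis):

```python
import math

def Q(x):
    """Standard normal tail probability: Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

# Evaluate the pairwise error probability at, e.g., E_b/N_0 = 4 dB.
ebno = 10 ** (4 / 10)
p_d = Q(math.sqrt(2 * ebno))
assert 0.0 < p_d < 0.5   # a proper tail probability, below the coin-flip rate
```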
5.3.3.2 Performance of Soft-decision Viterbi decoding algorithm
If Euclidean metric is used in the Viterbi decoding algorithm instead of the Hamming
metric, the Viterbi decoding becomes soft-decision Viterbi decoding. In many practical
applications, one wishes to use digital rather than analog circuits to implement the
Viterbi decoder. This means that the signal must be processed though an analog-to-
digital converter. If the received video data is quantized to one-bit precision before
being sent to the Viterbi decoder, the result is conventional hard decision data. If the
received symbols are quantized with more than one bit of precision, the result becomes
soft decision. A Viterbi decoder with soft decision data inputs quantized to three or
four bits of precision can perform about 2 dB better than one working with hard-
decision inputs in terms of coding gain [14]. The selection of the quantizing levels is an
important design decision because it can have a significant effect on the reconstructed
video quality. It has been widely observed that using five or six bits in the
analog-to-digital converter usually gives performance results extremely close to those
of an analog soft-decision decoder. An
analysis of the effect of quantization can be found in [15].
In general, for coherent BPSK signals with AWGN channels and unquantized received
signals, it can be shown that P_d should be replaced by the following equation, while
all other equations from the previous section still hold:

P_d = Q(√(2d(k/n)E_b/N_0))
5.3.3.3 Advantages of soft-decision over hard-decision decoding
To understand the advantage of soft-decision decoding over hard-decision decoding, we
need to understand the inherent drawback of hard-decision decoding.
Let x be the true transmitted sequence, y its adversary sequence and z the observed
received sequence. Also suppose the Hamming distance between x and y is d, i.e.

d_H(x, y) = d

If x really is the original transmitted code word, the received vector z must be the sum
of x plus some error vector e_x. Under hard-decision decoding, e_x is a binary vector,
i.e.

z = x + e_x

In a similar fashion, if the actual transmitted code word was y, then

z = y + e_y

If

z = y + e_y = x + e_x

the Viterbi decoding will select y instead of x and the error event of distance d will
occur. If the Hamming weight of e_x (the number of 1s in the vector e_x) is w_H(e_x),
clearly the error event will occur only when

w_H(e_x) > d/2

If d is an odd number, the probability of the trellis error is

Pr(E|d) = Σ_{j=(d+1)/2}^{d} C(d, j) p^j (1 - p)^{d-j},   d odd
When d is an even number, there is a slight complication, since it is possible that
w_H(e_x) = d/2. In this case, we would have a tie between the adversary paths. A tie
implies that we have no statistically valid way to pick one sequence over the other. The
Viterbi decoder must, however, pick one or the other of the two paths. Since this is a
pure guess, the decoder has, at best, only a 50% chance of picking the correct path.
Therefore, if d is an even number,

Pr(E|d) = (1/2) C(d, d/2) p^{d/2} (1 - p)^{d/2} + Σ_{j=d/2+1}^{d} C(d, j) p^j (1 - p)^{d-j},   d even
When the Euclidean distance is used in Viterbi decoding instead of the Hamming
distance, the probability of having two real-valued squared Euclidean distances that are
exactly equal is zero for all practical purposes. This eliminates ties, which improves
the error rate of the Viterbi decoder.

However, the error rate of the decoder improves by much more than is accounted for
simply by eliminating ties. When the Euclidean distance is used in the Viterbi
decoding, the received sequence becomes

z = x + η

where η is a sample of an AWGN process. If η is large enough to
cause d_E(z, y) < d_E(z, x), an error will occur, where d_E(z, x) is the Euclidean
distance between the received sequence and the transmitted sequence and d_E(z, y) is
the Euclidean distance between the received sequence and the adversary sequence.
Since the η_i are statistically independent zero-mean Gaussian random variables, it can
be shown that the probability of this event is

P(E|d) = ∫_{d/2}^{∞} (1/(σ√(2π))) exp(-u²/(2σ²)) du = Q(d/(2σ))

where σ² is the variance of the zero-mean Gaussian random process and

d ≡ d_E(x, y) = (Σ_{i=0}^{L-1} (x_i - y_i)²)^{1/2}

where L is the length of the sequence.
If the implications of these formulas are not easy to see, concrete examples in [14]
show that soft-decision Viterbi decoding can be more than two orders of magnitude
better than hard-decision Viterbi decoding.
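The two hard-decision expressions for Pr(E|d) can be evaluated numerically; the sketch below (the function name is ours) implements both the odd-d and even-d cases for a binary symmetric channel with crossover probability p:

```python
import math

def pr_error_given_d(d, p):
    """Hard-decision pairwise error probability for a path at Hamming distance d."""
    def term(j):
        return math.comb(d, j) * p**j * (1 - p)**(d - j)
    if d % 2 == 1:
        # odd d: more than half the d positions must be in error
        return sum(term(j) for j in range((d + 1) // 2, d + 1))
    # even d: half of the tie cases at j = d/2, plus all cases with j > d/2
    return 0.5 * term(d // 2) + sum(term(j) for j in range(d // 2 + 1, d + 1))

# At p = 1/2 the decoder is reduced to pure guessing, for any d.
assert abs(pr_error_given_d(5, 0.5) - 0.5) < 1e-9
# For small p, a larger distance d gives a much smaller error probability.
assert pr_error_given_d(5, 0.01) < pr_error_given_d(3, 0.01)
```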
5.3.4 Punctured Convolutional code
Figure 5.4 Basic procedure of punctured coding from rate ½ convolutional code
A punctured convolutional code [12,13] is a high-rate code obtained by the periodic
elimination of specific code symbols from the output of a low-rate encoder. Fig.5.4
shows the basic procedure for a rate ½ code. Specific m bits among l blocks (2l bits) of
the original code sequence are periodically deleted according to a map which
indicates the positions of the deleted bits. When m is chosen to be l - 1, a punctured
code of rate (n - 1)/n, where n = l + 1, is obtained.
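A sketch of the puncturing and dummy-insertion (depuncturing) steps, using a hypothetical period-4 map that deletes one bit in four, turning a rate-½ stream into a rate-2/3 stream:

```python
# 1: transmit, 0: delete.  The period covers 2 input bits (4 coded bits);
# deleting one leaves 3 channel bits per 2 input bits, i.e. rate 2/3.
PUNCTURE_MAP = [1, 1, 1, 0]

def puncture(coded):
    """Drop the coded bits at the positions marked 0 in the map."""
    return [b for i, b in enumerate(coded)
            if PUNCTURE_MAP[i % len(PUNCTURE_MAP)]]

def depuncture(punctured):
    """Re-insert erasures (None) so the decoder can inhibit their metrics."""
    out, it, i = [], iter(punctured), 0
    while True:
        if PUNCTURE_MAP[i % len(PUNCTURE_MAP)]:
            try:
                out.append(next(it))
            except StopIteration:
                return out
        else:
            out.append(None)   # dummy symbol: metric contribution inhibited
        i += 1

coded = [1, 1, 1, 0, 0, 0, 0, 1]   # 8 coded bits (4 input bits at rate 1/2)
sent = puncture(coded)             # 6 bits on the channel: rate 4/6 = 2/3
assert len(sent) == 6
assert depuncture(sent) == [1, 1, 1, None, 0, 0, 0, None]
```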
For punctured high-rate convolutional codes, Viterbi decoding is hardly more complex
than for the original code from which the punctured codes are derived. The decoding is
performed on the trellis of the original code where the only modification consists of
discarding the metric increments corresponding to the punctured code symbols.
Given the perforation pattern of the code, this can be readily performed by inserting
dummy data into the positions corresponding to the deleted code symbols. In the
decoding process this dummy data is discarded by assigning the same metric value
regardless of the code symbol, 0 or 1. This procedure in effect inhibits the
convolutional metric calculation for the punctured symbols. In addition to the metric
inhibition, the only coding rate dependent modification in a variable-rate codec is the
truncation path length, or the trellis depth, which must be increased with the coding rate.
All other operations of the decoder remain essentially unchanged.
It is not difficult to see that the performance of punctured convolutional codes is
degraded compared with the original codes; however, the degradation is rather gentle as
the coding rate increases from ½ to 7/8 or even to 15/16.
From previous works [12,13] the following can be summarized:
1. For punctured codes of the same rate, the coding gain increases by 0.2 - 0.5 dB with
each increase of the constraint length K by 1.
2. Although the coding gain of punctured codes decreases as the coding rate becomes
higher, the coding gain is still high even for high-rate punctured codes. For
example, the rate 13/14 code provides a coding gain of more than 3 dB for K ≥ 7.
These properties of punctured convolutional codes make them an attractive option for
efficient implementation.
References
[1] C. E. Shannon, “A Mathematical Theory of Communication”, Bell System
Technical Journal, vol. 27, pp. 379-423 and pp. 623-656, July and October, 1948.
[2] J.H. van Lint, “Introduction to Coding Theory”, Springer-Verlag, 1982.
[3] George C. Clark, Jr. and J. Bibb Cain, “Error-Correction Coding for Digital
Communications”, Plenum Press, 1981.
[4] R. C. Bose and D. K. Ray-Chaudhuri, “On a class of error correcting binary group
codes”, Information and Control, 3(1), March 1960, pp.68-79.
[5] R. C. Bose and D. K. Ray-Chaudhuri, “Further results on error correcting binary
group code”, Information and Control, 3(3), September 1960, pp. 279-290.
[6] A. Hocquenghem, “Codes correcteurs d’erreurs”, Chiffres, 2, 1959.
92
[7] I. S. Reed and G. Solomon, “Polynomial codes over certain finite fields”, J. Soc.
Indust. Appl. Math, Vol. 8, 1960, pp. 300-304.
[8] A. J. Viterbi, “Convolutional Codes and Their Performance in Communication
Systems”, IEEE Transactions on Communications Technology, Vol. COM-19, No.
5, October 1971, pp. 751-772.
[9] J. G. Proakis, “Digital Communications”, McGraw-Hill, 1995.
[10] A. J. Viterbi, “Error bounds for convolutional codes and an asymptotically
optimum decoding algorithm”, IEEE Trans. Inform. Theory, Vol. IT-13, No. 2,
April 1967, pp. 260 – 269.
[11] A. J. Viterbi and J. K. Omura, “Principles of Digital Communication and
Coding”, McGraw-Hill Book Company, 1979.
[12] Y. Yasuda, K. Kashiki and Y. Hirata, “High-Rate Punctured Convolutional
Codes for Soft Decision Viterbi Decoding”, IEEE Transactions on Communications,
Vol. Com-32, No. 3, March 1984, pp. 315- 319.
[13] D. Haccoun and G. Begin, “High-Rate Punctured Convolutional Codes for
Viterbi and Sequential Decoding”, IEEE Transactions on Communications, Vol. 37,
No. 11, November 1989, pp. 1113-1125.
[14] R. B. Wells, “Applied Coding and Information Theory for Engineers”, Prentice
Hall, 1999.
[15] R. Wells and G. Bartles, “Simplified calculation of likelihood metrics for Viterbi
decoding in partial response systems”, IEEE Trans. Magnetics, vol. 32, no. 5, Pt. III,
Sept. 1996.
6 SECOND ERROR CONTROL AND ECC VIDEO
6.1 Introduction
As described in Chapter 4, diverse error resilience techniques have been introduced and
some of them have been incorporated into MPEG-4 [1] or H.263 [2] video coding
standards to address the need for error resilient video transmission. The key technique
among the error resilience tools in the MPEG-4 standard is
resynchronization/packetization. With this technique, a compressed video bitstream is
packetized by inserting resynchronisation markers in the bitstream, which let the
decoder regain synchronization after an error occurs by looking for the next
resynchronisation point, thereby limiting the error effects to the packet where
the error occurs.
It needs to be emphasized that although resynchronization is often referred to as
packetization in the literature, the packetization process for error resilience at the
application layer is different from the packetization process for channel coding at the
data link layer. The packetization operation for error resilience simply means that
resynchronization markers are inserted periodically into a video bitstream. The packet
size (for error resilience) usually means the number of bits between two
resynchronization markers, and the packet size may vary slightly from packet to packet,
as the start and the end of a packet need to be aligned with the start and the end of a
macroblock. In this thesis both resynchronization and packetization are used
interchangeably to refer to the resynchronization operation for error resilience at the
application layer.
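As a purely illustrative sketch (the marker pattern, packet size and macroblock codes below are hypothetical, not taken from the MPEG-4 syntax), the packetization just described can be written as:

```python
RESYNC_MARKER = "0" * 16 + "1"   # hypothetical unique start code

def packetize(macroblock_bits, target=600):
    """Group whole macroblock codes into packets of roughly `target` bits,
    each prefixed by a resynchronisation marker; boundaries always fall on
    macroblock boundaries, so packet sizes vary slightly from packet to packet.
    (A macroblock longer than `target` simply gets its own oversized packet.)"""
    packets, current = [], ""
    for mb in macroblock_bits:       # each mb is one macroblock's bitstring
        if current and len(current) + len(mb) > target:
            packets.append(RESYNC_MARKER + current)
            current = ""
        current += mb
    if current:
        packets.append(RESYNC_MARKER + current)
    return packets

# Fake variable-length macroblock codes of 150, 240, 240 and 180 bits.
mbs = ["101" * 50, "0110" * 60, "111" * 80, "00" * 90]
pkts = packetize(mbs, target=300)
assert all(p.startswith(RESYNC_MARKER) for p in pkts)
```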
With a packetization approach combined with RVLC (reversible variable length code)
and data partitioning, a decoder is also able to partially recover some data within the
packet containing errors, which would otherwise be totally discarded. Obviously, there
are several disadvantages with these error resilience tools.
Firstly, while these techniques bring error resilience, they also introduce
vulnerability. If an error happens to fall within a marker (a resynchronization
marker, DC marker or motion marker), the decoder will lose synchronization with the
encoder; the packet containing the error, or even several packets, will have to be
discarded.
Secondly, there is an associated loss of coding efficiency with these schemes. From our
simulation results (to be discussed later) it is clear that with CIF format video sequences
Salesman and Akiyo, when the packet size is set to 600 bits, an increase in bit rate of
more than 9.9% occurs when resynchronization, Data Partitioning and RVLC are
employed. It may be argued that employing only a packetization approach, without
combining Data Partitioning and RVLC, can reduce the increase in the bit rate, but it is
really necessary to combine RVLC and Data Partitioning with the packetization scheme
to fully exploit its potential. Another way to reduce the bit
budget for overhead due to packetization is to increase the size of the packet, but this
will bring a longer decoding delay, as the packetization approach introduces a decoding
delay of one packet (or slice). Increasing the packet size will also reduce the
effectiveness of the packetization scheme. It should be noted that RVLC reduces the
coding efficiency too.
Thirdly, these techniques are passive in the sense that they do not have the capability to
recover a bitstream from errors actively and completely by correcting the error bits in
the bitstream. Instead, the packets containing any errors are simply discarded, though
some information in the packets in error can be partially recovered through the
employment of RVLC and Data Partitioning. The loss of information caused by
discarding the packets is unrecoverable; with the inter-frame error propagation effects,
the reconstructed video output rapidly degrades to an unrecognizable level if no other
measure is taken. Also, the partial recovery of the corrupted data through the
employment of Data Partitioning and RVLC is only possible when the corresponding
motion vectors are available. If the packet header or the motion information, which is
located in the first part of the packet, is corrupted, then the use of RVLC becomes
meaningless, as the data recovered by RVLC is only the difference between the blocks
in the current frame and the corresponding blocks in the previous frame located by the
motion vectors. The original texture coefficients in the block cannot be recovered
without the motion information. While Data Partitioning makes error concealment
easier to realize when motion information is available, even the best error concealment
techniques only reduce the influence of errors to a certain degree. In fact, nearly all the
error resilience tools currently available, either inside or outside the video coding
standards, are passive in the sense that they do not have the capability to correct the
errors in the final video bitstream before video decoding.
Lastly, AIR [1] increases the bit rate of an encoded bitstream significantly, while
NEWPRED [1] needs upstream messaging from the decoder, which may not be
practical in some situations, especially in a multi-party video communication system.
Generally, reducing packet size can increase the robustness of an encoded video
bitstream at the cost of decreased coding efficiency. However, for the reasons stated
above, there is a limit on how much robustness can be gained by reducing the packet
size. In some extreme channel conditions, acceptable quality in video communication
becomes impossible by simply employing the error resilience tools in MPEG-4. One
extreme example: when the packet is so small that one packet contains only one
macroblock, the bitstream becomes more vulnerable than one not using the
packetization scheme at all, as the markers occupy a larger portion of the bitstream.
Obviously other tools are needed.
6.2 Second Error Control
Taking a further look inside the currently available error resilient video coding tools,
it can be seen that their most fundamental disadvantage is that they passively accept
the residual errors delivered to the application layer by the transmission system of the
network. As stated in Chapter 1, it is unavoidable that some residual errors will be
delivered to the video decoder by the transmission network. But one question can be
asked: do we have to accept these error bits in the application layer for a real-time
application? If the answer is yes, then the currently available error resilience
techniques are the only choices, which means we will have to accept the poor quality
of real-time video transmission associated with these techniques. If the answer is no,
a mechanism is needed to correct these errors, and a form of error control in the
application layer is necessary.
To apply error control in the application layer after the first error control has taken
place in the data link layer seems unrealistic because of the huge overhead a usual
error control scheme would cause. Employing ARQ (Automatic Repeat reQuest) at
the application layer as SEC (Second Error Control) is clearly not realistic, since the
first error control at the data link layer has probably used up all the time allowed for
retransmission with ARQ. Directly applying a usual FEC (Forward Error Correction)
approach of the kind commonly employed in the data link layer is equally unrealistic
and cannot be justified because of its huge overhead. Now another question can be
asked: is there an effective error correction code with extremely high coding
efficiency? Given the increase of around 9.9% in the final bitstream for error
resilience overhead, is it possible to do better than the resynchronization approach in
MPEG-4? Recalling the capability a punctured convolutional code can provide, the
answer may be yes. If another ECC (Error Correction Coding) layer is applied at the
application layer to correct residual errors, that means we are using a second error
control for real-time applications. Does this work, and is punctured convolutional
coding efficient enough? These questions are addressed in the sections below.
6.3 ECC video – the SEC approach
In an ECC scheme, a compressed video bitstream is not packetized using
resynchronization markers; instead it is protected with an error correction code, i.e.
the compressed video bitstream is further encoded using the error correction code.
The basic requirements for the error correction code are high coding efficiency and
strong error correction capability. In this work the error correction code is realized
with a punctured convolutional code [3,4,7]. There are three reasons for choosing a
convolutional code. Firstly, convolutional coding is well suited to mobile channels;
enhanced with interleaving, a punctured convolutional code is also very good at
coping with both bursty errors and packet loss in addition to correcting random
errors. Secondly, with a punctured convolutional code [5,6] it is easy to adapt the
rate of the error correction code to match the residual error conditions. Thirdly, when
punctured, some convolutional codes can achieve very high coding efficiency while
still retaining very good error correction capability. After each video frame is
compressed, the compressed bitstream is further encoded using a punctured
convolutional code. The picture start code serves as the synchronization point, so the
decoder is able to locate the portion of the bitstream belonging to each frame.
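The per-frame encoding step can be sketched as follows. This is a simplified illustration, not the thesis implementation: the layout of the flat 26-entry puncturing pattern (given in Section 6.4.1) as a 2×13 matrix is an assumption about how it applies to the two generator outputs, and tail-bit flushing at the end of each frame is omitted:

```python
G = (0o561, 0o753)   # generator polynomials (octal) of the rate-1/2 base code
K = 9                # constraint length

# 2x13 puncturing matrix for rate 13/14: row 0 applies to the first
# generator's output, row 1 to the second (an assumed layout of the flat
# 26-entry pattern listed in the experimental conditions).
P = [[1, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1],
     [1, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 0]]

def encode_punctured(bits):
    """Rate-1/2 convolutional encoding followed by 13/14 puncturing."""
    reg = 0
    out = []
    for t, b in enumerate(bits):
        # Shift the new input bit into the K-bit register.
        reg = ((reg << 1) | b) & ((1 << K) - 1)
        for row, g in enumerate(G):
            if P[row][t % 13]:                     # keep this coded bit?
                out.append(bin(reg & g).count("1") & 1)   # parity tap
    return out

coded = encode_punctured([1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0])
print(len(coded))   # 13 input bits -> 14 coded bits, i.e. rate 13/14
```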
Before the video decoder starts video decoding, it first decodes the punctured
convolutional code using the Viterbi decoding algorithm. Data Partitioning and RVLC
can still be employed in an ECC scheme because of their error resilience and
concealment capability. If Data Partitioning and RVLC are used without employing
packetization, the whole frame can be considered as containing only one packet. The
main difference from the conventional MPEG-4 approach is that the markers,
including DC markers and motion markers, in the ECC video bitstream are protected
by the convolutional code as well, while the markers in a packetized video bitstream
are exposed to errors. Also, in an ECC video bitstream each frame contains only one
motion marker or DC marker, while in a packetized video bitstream each frame can
contain multiple resynchronization markers, DC markers or motion markers,
depending on the packet size.
Figure 6.1 Video Communication System with ECC
(block diagram: Source → Source Encoder → ECC Encoder → Channel Encoder →
Channel → Channel Decoder → ECC Decoder → Source Decoder → Display)
A video communication system employing ECC is shown in Figure 6.1. It needs to be
emphasized that, although some punctured convolutional codes have been widely used
as channel coding schemes in the data link layer, a comparison with Fig. 1 makes it
clear that ECC here is not a form of channel coding; instead it is part of source coding
for error resilience purposes. More precisely, it is a SEC approach in addition to the
first error control (conventional error control) in the data link layer. The operation of
ECC on a compressed video bitstream for error resilience differs from the ordinary
FEC technique commonly employed in the data link layer of the network, though in
principle they play a similar role in correcting errors in a bitstream. First, FEC is
usually employed as a channel coding mechanism to improve the capacity of the
channel, and is often combined with ARQ (automatic repeat request). From the point
of view of the layered structure of a telecommunication network, FEC usually resides
in the second (data link) layer, while ECC applied to an encoded video bitstream is
part of the application layer and is therefore treated as source data by FEC. Second,
the design and choice of FEC usually depend on the channel conditions and the
associated ARQ mechanism, while the design and choice of ECC depend on the
capability of the network to combat the errors in the telecommunication channels.
Third, FEC works on the original errors introduced by the unfavorable channel
conditions, while ECC works on the residual errors left in the source data by the
network. In other words, FEC belongs to first error control while ECC belongs to SEC.
In some telecommunication networks (for example, some networks based on UDP/IP
protocols), packets containing errors are simply discarded after they have passed
through the data link layer at the receiving side. If such networks are to be employed
for video communications with bitstreams protected by ECC, their protocols must be
modified so that packets are delivered to the application layer even when they still
contain errors. The same requirement applies when the conventional error resilience
tools in the MPEG-4 standard are employed.
6.4 Simulation Results
To evaluate the effectiveness of the proposed ECC scheme, two widely used video
sequences, Akiyo with relatively slow motion and Salesman with fast movement, are
chosen as the test sequences. The goal is to compare the PSNR (Peak Signal-to-Noise
Ratio) of the video sequences reconstructed from the bitstreams protected with ECC
and from the bitstreams protected with packetization.
Following convention, in this thesis the PSNR is defined as [12]

PSNR = 10 \log_{10} \left( \frac{255^2}{\frac{1}{N}\sum_i \sum_j \left( Y_{ref}(i,j) - Y_{prc}(i,j) \right)^2} \right)
where Y_{ref}(i,j) and Y_{prc}(i,j) are the pixel values of the reference and processed
images respectively, N is the total number of pixels in the image, and i, j are the pixel
indices in the image. In this equation the peak signal with 8-bit resolution is 255, and
the noise is the square of the pixel-to-pixel difference (error) between the reference
image and the image under study. Though it has been claimed that in some cases the
accuracy of PSNR is doubtful because colour information is not taken into
consideration, its relative simplicity makes it a very popular choice. If accuracy is a
main concern, perceptual error models more sophisticated than simple pixel
differences might be used [13].
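The definition above translates directly into code; a minimal sketch for 8-bit greyscale images represented as nested lists:

```python
import math

def psnr(ref, prc):
    """PSNR in dB between two 8-bit greyscale images (nested lists of
    pixel values), following the definition used in the text."""
    n = 0
    sse = 0.0
    for row_ref, row_prc in zip(ref, prc):
        for y_ref, y_prc in zip(row_ref, row_prc):
            sse += (y_ref - y_prc) ** 2   # squared pixel-to-pixel error
            n += 1
    mse = sse / n                          # mean squared error
    return 10 * math.log10(255 ** 2 / mse)

# One pixel off by 5 in a 2x2 image -> MSE = 25/4 = 6.25.
ref = [[255, 255], [255, 255]]
prc = [[250, 255], [255, 255]]
print(round(psnr(ref, prc), 2))   # roughly 40 dB
```

(Identical images give zero MSE and thus an infinite PSNR; a real implementation would guard against that case.)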
6.4.1 Experimental conditions
The tests are conducted based on the following conditions:
1. 50 frames of each video sequence are encoded with the first frame coded as I frame
followed by all P frames without rate control.
2. Packet size of both video sequences is set to 600 bits when the packetization scheme
is used.
3. When the ECC scheme is employed, the rate-1/2 base convolutional code (561, 753)
is chosen, which has a constraint length of K = 9. This base code is punctured to
rate 13/14, which means that every 13 information bits produce 14 bits in the
encoded bitstream, i.e. only one redundant bit is added for every 13 data bits. The
puncturing pattern is shown below.
1 1 0 0 0 0 0 1 0 0 0 0 1 1 0 1 1 1 1 1 0 1 1 1 1 0
4. After transmission, the convolutionally encoded bitstream is decoded using the
hard-decision Viterbi decoding algorithm with a trellis depth of 21 × K.
5. Data partitioning and RVLC are employed in both experiments with the ECC
scheme and the packetization approach.
6. The same quantization parameters are used in all experiments, which means that
correctly decoded bitstreams protected using ECC or packetization should have the
same visual quality on the same video sequence in the error free environments.
7. In each test, the residual errors are simulated as random errors with a Gaussian
distribution, with the Bit Error Rate (BER) of the residual errors set at 1×10^-5,
4×10^-5, 1×10^-4 and 1.7×10^-4 respectively.
8. After the corrupted bitstreams are decoded, erroneous motion vectors and texture
information are replaced by 0. This means that when the motion vectors are not
available, motion compensation is implemented using the motion vectors at exactly
the same position in the previous frame, and when the texture information is not
available, the block in question is reconstructed using the texture information of
the blocks located by the motion vectors.
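The residual-error injection in condition 7 can be sketched as independent bit flips at a target BER. This is a simplification of the error model stated above, shown only to make the test procedure concrete:

```python
import random

def inject_errors(bits, ber, seed=None):
    """Flip each bit independently with probability `ber`, a simplified
    stand-in for the random residual errors used in the experiments."""
    rng = random.Random(seed)
    return [b ^ (rng.random() < ber) for b in bits]

stream = [0, 1] * 50000                        # 100 000-bit test stream
corrupted = inject_errors(stream, ber=1e-4, seed=42)
n_errors = sum(a != b for a, b in zip(stream, corrupted))
print(n_errors)   # typically around 10 flips for BER = 1e-4
```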
6.4.2 Results
To express the simulation results, some notation is needed. ECC(7/8) means that an
ECC scheme is used with the ECC rate set to 7/8. Similarly, Packetization(600) means
that a packetization scheme is used with the packet size set to 600 bits. The final
results, obtained by averaging over 100 individual tests, are shown in Figure 6.2 to
Figure 6.11. The numbers of bits used to encode each frame of the video sequences
with each scheme are listed in Table 6-1 and Table 6-2.
The advantage of using ECC instead of packetization is clearly seen. ECC(13/14)
produces less overhead in the bitstream than Packetization(600). The average number
of bits per frame used for encoding Akiyo is 4896.64 when ECC(13/14) is employed
and 4959.52 when Packetization(600) is used. For Salesman the average number of
bits used for encoding each frame becomes 11674.88 and 11768.48 when ECC(13/14)
or Packetization(600) is employed respectively. The PSNRs of the video output
reconstructed from the bitstreams employing ECC(13/14) are much higher than the
PSNRs of the video output employing Packetization(600) for both video sequences,
Salesman and Akiyo. The PSNR gains range between 1 dB and 4 dB, as shown in
Figure 6.3 to Figure 6.4, Figure 6.7 to Figure 6.8 and Figure 6.10 to Figure 6.11, when
the BER of the final bitstream varies from 1×10^-5 to 1×10^-4.
Generally, the PSNRs of the reconstructed video output degrade as the BER of the
residual errors increases. In the extreme residual error condition, for example when the
BER reaches 1.7×10^-4, for the video sequence Akiyo with moderately slow motion
the bitstream employing ECC(13/14) still delivers viewable (though not very good)
reconstructed images, while the bitstream employing Packetization(600) produces an
unrecognizable output, as shown in Figure 6.9. For the video sequence Salesman with
fast motion, both the ECC and packetization approaches fail to deliver decent
reconstructed video outputs under the specified test conditions. For the purpose of
comparison, in another experiment the first frame (I frame) is transmitted error free.
Again the video output employing ECC(13/14) has a better PSNR than the video
output employing Packetization(600), as shown in Figure 6.5.
When the BER of the residual errors is relaxed from 1.7×10^-4 to 1×10^-4, the
simulation results are as shown in Figure 6.10 and Figure 6.11. The advantage of ECC
over packetization is clearly seen again, as packetization produces uniformly
unacceptable results.
It can be seen that the PSNR gain with Akiyo is much higher than with Salesman when
ECC is used instead of packetization. The reason is simple: the bitstream for Salesman
has a much higher data rate than the bitstream for Akiyo. For random errors, a higher
data rate means more bits are exposed to errors and hence more opportunities for the
bitstream to be corrupted.
Two conclusions can be drawn from these experiments. First, ECC is superior to
packetization in terms of coding efficiency and effectiveness. Second, in extreme
residual error conditions, for instance when the BER of the final bitstream is higher
than 1×10^-4, both ECC(13/14) and Packetization(600) are insufficient. ECC needs to
be, and can be, improved so that it can correct most errors in the video bitstream,
especially in I frames, because in P frames at least some basic error concealment
operations can reduce the error effects in the reconstructed images.
The investigation of how to improve the power of ECC is given in Chapter 8. At
present there is no way to further improve the robustness of a conventionally encoded
video bitstream in some extreme situations once the packet size has reached its
saturation point, i.e. once further reducing the packet size no longer improves the
quality of the video transmission when packetization is employed.
Another important characteristic to note is that the ECC approach produces much less
overhead for an I frame than the packetization approach, as can easily be seen from
Table 6-1 and Table 6-2. The number of bits for the I frame of Akiyo is 47480 when
ECC(13/14) is used and 50352 when Packetization(600) is employed, while the
number of bits for the I frame of Salesman becomes 79432 and 81168 when
ECC(13/14) or Packetization(600) is employed respectively. This can be a big
advantage favoring the ECC approach for video transmission, as the smaller bit count
for I frames means a relaxed requirement on the peak channel capacity for transmitting
I frames.
According to convolutional coding theory, increasing the constraint length K of a
convolutional code and the trellis depth of Viterbi decoding will increase the power of
the convolutional code. However, the computational requirement of the Viterbi
decoding algorithm grows exponentially with the constraint length K, so K is usually
limited in practice to 9 or less. As computing techniques advance, it is reasonable to
expect that convolutional codes with constraint lengths longer than 9 will become
practical, which will make the ECC approach more efficient and effective.
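The exponential growth can be made concrete: a Viterbi decoder for a code of constraint length K maintains 2^(K-1) trellis states and evaluates 2^K branch metrics per decoded bit. A quick illustration:

```python
# Number of trellis states (2^(K-1)) and branch metrics evaluated per
# decoded bit (2^K: two incoming branches per state) for a few constraint
# lengths, illustrating the exponential growth discussed above.
for K in (7, 9, 11, 15):
    states = 2 ** (K - 1)
    branches = 2 ** K
    print(f"K={K:2d}: {states:6d} states, {branches:6d} branch metrics/bit")
```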
Though the experiments are conducted based on the MPEG-4 video coding standard,
the proposed scheme can be applied to all other video coding standards, including
MPEG-1 [9], MPEG-2 [10], H.261 [8] and H.263 [2], as all of them use basically the
same techniques based on DCT and motion estimation. The proposed error resilience
technique can also be applied to video coding schemes that are not based on DCT and
motion estimation, such as wavelets.
6.5 Discussion
In this chapter, the new concept of Second Error Correction is introduced. In this
work, SEC is realized with Error Correction Coding; for video applications, ECC
accomplished with punctured convolutional coding has achieved success.
The SEC approach provides a fresh view of both error control and error resilience
coding. Traditionally implemented at the data link layer, error control is now not only
a technique to improve the capacity of a channel; it is also an active error resilience
tool that can be implemented in the application layer. As a result, convolutional codes
expand into a new field of application. With the introduction of SEC, several aspects
of network operation can be integrated into a generic and extended framework, which
we can still represent with the term "Error Control", but now the term has a broader
meaning.
Under the concept of "Error Control" in this broader meaning, source coding, channel
coding and error resilience are not separate operations; they are different aspects of an
integrated functionality for error resilient real-time video delivery. Within this
integrated functionality, the distribution of error control between first error control
and SEC needs to be optimized, as does the distribution of the available radio channel
bandwidth among source coding, first error control and SEC. A generic rate control
algorithm based on these optimizations would be more effective and efficient. These
can be part of future work.
The proposed algorithm requires more effort at the decoder, as Viterbi convolutional
decoding is quite demanding of computing power. If the decoder does not have
enough computing capacity, a longer decoding delay will be introduced. However,
with commercial Viterbi decoding hardware and software widely available, this
should not be a problem.
In our experiments, the punctured convolutional code rate is 13/14, resulting in a 7.7%
increase in the data rate of a base MPEG-4 encoded bitstream. It needs to be pointed
out that when a convolutional code with a longer constraint length is used, a higher
punctured code rate, which results in less ECC overhead, can be used to achieve
similar or better results. However, this will further increase the computing complexity
of the decoder.
To achieve the best coding efficiency, the puncturing rate can be adjusted to match the
bit error rate of the bitstream. For example, when the bit error rate in the bitstream is
not very high, the 16/17 code rate from the same base convolutional code may be
chosen to give satisfactory protection to the video bitstream, while the 9/10 code can
be selected when the bit error rate of the bitstream is higher. More discussion of
different ECC rates is provided in Chapter 8 and Chapter 9.
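Such rate adaptation might be sketched as a simple lookup. The BER thresholds below are illustrative assumptions, not values from the thesis; the overhead figures, however, follow directly from the code rates (a rate r/(r+1) code adds one redundant bit per r data bits, i.e. 1/r overhead, which for 13/14 gives the 7.7% mentioned above):

```python
def choose_code_rate(residual_ber):
    """Pick a puncturing rate from the same rate-1/2 base code to match the
    residual BER, per the policy sketched in the text.  The thresholds are
    illustrative assumptions only."""
    if residual_ber < 1e-5:
        return (16, 17)        # very clean channel: least overhead
    if residual_ber < 1e-4:
        return (13, 14)        # the rate used in the experiments
    return (9, 10)             # harsher channel: stronger protection

for num, den in ((16, 17), (13, 14), (9, 10)):
    overhead = (den - num) / num
    print(f"rate {num}/{den}: {overhead:.1%} overhead")
```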
To cope with a wide range of residual error conditions, the optimum puncturing
patterns of the base convolutional codes need to be explored further. At this stage the
highest reported punctured code rate for the base code (171,133) is 16/17 [11], while
the highest punctured code rate for the base code (561,753) is 13/14 [7]. Puncturing
patterns for higher code rates of 14/15, 15/16, 16/17, 17/18, 18/19, etc. need to be
found for the base code (561,753) or other good base codes, including those with
constraint lengths longer than 9, as a higher rate code will make the ECC approach
more efficient in favorable residual error conditions.
It is worth mentioning that a philosophically similar approach has been introduced in
the H.263 [2] video coding standard. In Annex H of H.263, forward error correction
(FEC) for the coded video signal is realized using the BCH (511, 493) block code.
This allows 492 bits of coded data to be appended with 2 bits of framing information
and 18 bits of parity information to form a FEC frame. The FEC coding allows the
correction of single bit errors in each FEC frame and the detection of two-bit errors,
for an approximately 4% increase in bit rate. The FEC mechanism of Annex H is
designed for ISDN, which is an isochronous, very low error rate network. There is no
doubt that this FEC's capability of correcting errors is very limited compared with
ECC in a harsher environment. First, the number of bit errors ECC can correct is not
limited to one in a chunk of 492 bits of coded data. Second, ECC has the capability to
cope with bursty errors and packet loss [14], while this FEC does not. Lastly, this FEC
is not flexible enough to cope with varying residual error conditions, while ECC can
adapt to residual error conditions (see Chapter 8). It needs to be pointed out that the
video decoding process with ECC is totally compatible with the MPEG-4 standard
once the bitstream has been convolutionally decoded.
The simulation results have given a positive answer to the questions raised in Section
6.2, at least in random error situations. But in some extreme residual error situations,
ECC needs to be further enhanced to achieve more satisfactory results, which is
investigated in the following chapters.
References
[1] ISO/IEC 14496-2, “Information Technology – Coding of Audio-Visual Objects:
Visual”, 2001.
[2] ITU-T H.263 “Video coding for low bit rate communication”, 1998.
[3] J. G. Proakis, “Digital Communications”, McGraw-Hill, 1995.
[4] A. J. Viterbi, “Convolutional Codes and Their Performance in Communication
Systems”, IEEE Trans. on Comm. Technology, Vol. COM-19, No. 5, October 1971, pp.
751-772.
[5] J. Hagenauer, "Rate-Compatible Punctured Convolutional Codes (RCPC Codes) and
their Applications", IEEE Trans. on Comm., Vol. 36, No. 4, April 1988, pp. 389-400.
[6] J. Hagenauer, N. Seshadri and C. W. Sundberg, “The Performance of Rate-
Compatible Punctured Convolutional Codes for Digital Mobile Radio”, IEEE Trans. on
Comm., Vol. 38, No. 7, July 1990, pp. 966-980.
[7] Y. Yasuda, K. Kashiki and Y. Hirata, “High-Rate Punctured Convolutional Codes
for Soft Decision Viterbi Decoding”, IEEE Trans. on Comm., Vol. Com-32, No. 3,
March 1984, pp. 315-319.
[8] Recommendation H.261: “Video Codec for Audiovisual Services at p×64 kbit/s”.
ITU-T (CCITT), Mar. 1993.
[9] ISO/IEC 11172-2, “Information technology-coding of moving picture and
associated audio for digital storage media at up to about 1.5 mbit/s: Part 2 video”,
August 1993.
[10] ISO/IEC: 13818 (MPEG-2). “Information technology – Generic Coding of Moving
Pictures and Associated Audio Information”.
[11] Yutaka Yasuda, Yasuo Hirata, Katsuhiro Nakamura and Susumu Otani,
"Development of variable-rate Viterbi decoder and its performance characteristics",
Proc. 6th Int. Conf. Digital Satellite Commun., Phoenix, AZ, September 1983,
pp. XII-24-31.
[12] M. Ghanbari, "Video Coding – an Introduction to Standard Codecs", The
Institution of Electrical Engineers, 1999.
[13] K.T. Tan, M. Ghanbari and D.E. Pearson, “An objective measurement tool for
MPEG video quality”, Signal Processing, 7, 1998, pp. 279-294.
[14] Bing Du, M. Ghanbari, “ECC video in bursty channel errors and packet loss”, Proc.
Picture Coding Symp. 2003, Saint-Malo, France, 23 - 25 April 2003, pp.99-101.
Figure 6.2 PSNR of Salesman through error free channel
Figure 6.3 PSNR of Salesman with BER of 1×10^-5 (curves: ECC(13/14), Packetisation(600))
Figure 6.4 PSNR of Salesman with BER of 4×10^-5 (curves: ECC(13/14), Packetisation(600))
Figure 6.5 PSNR of Salesman with BER of 1.7×10^-4 (first frame transmitted error free to allow comparison; curves: ECC(13/14), Packetisation(600))
Figure 6.6 PSNR of Akiyo through error free channel
Figure 6.7 PSNR of Akiyo with BER of 1×10^-5 (curves: ECC(13/14), Packetisation(600))
Figure 6.8 PSNR of Akiyo with BER of 4×10^-5 (curves: ECC(13/14), Packetisation(600))
Figure 6.9 PSNR of Akiyo with BER of 1.7×10^-4 (curves: ECC(13/14), Packetisation(600))
Figure 6.10 PSNR of Akiyo with BER of 1×10^-4 (curves: ECC(13/14), Packetization(600))
Figure 6.11 PSNR of Salesman with BER of 1×10^-4 (curves: ECC(13/14), Packetization(600))
Table 6-1 Bit number comparison between Packetization(600) and ECC(13/14) for Akiyo.

Frame    ECC(13/14)   Packetisation(600)
0        47480        50352
1        536          488
2        672          616
3        672          616
4        1128         1112
5        1064         1040
6        1112         1088
7        1304         1272
8        1552         1552
9        1744         1736
10       1632         1632
11       1456         1472
12       2064         2024
13       2008         1976
14       2896         2912
15       4176         4208
16       5816         5840
17       7008         7000
18       7848         7896
19       7640         7704
20       6824         6880
21       6304         6352
22       5232         5232
23       4352         4368
24       4296         4312
25       3200         3184
26       3104         3104
27       3832         3848
28       4808         4816
29       5528         5512
30       6032         6048
31       6112         6112
32       5176         5176
33       4328         4344
34       4688         4688
35       4984         5008
36       5232         5256
37       5088         5128
38       5272         5272
39       5376         5408
40       4840         4840
41       4824         4872
42       4312         4328
43       3752         3752
44       3776         3784
45       4408         4416
46       4768         4760
47       5008         5040
48       5080         5096
49       4488         4504
Total    244832       247976
Average  4896.64      4959.52
Table 6-2 Bit number comparison between Packetization(600) and ECC(13/14) for Salesman.

Frame    ECC(13/14)   Packetisation(600)
0        79432        81168
1        8616         8744
2        14136        14120
3        16200        16256
4        15520        15464
5        12496        12568
6        8832         8872
7        8440         8520
8        9240         9312
9        9232         9328
10       10248        10344
11       9440         9480
12       8424         8456
13       9392         9496
14       10064        10088
15       10120        10080
16       8592         8640
17       9792         9920
18       14320        14320
19       16640        16624
20       14176        14224
21       10712        10904
22       10528        10696
23       9016         9032
24       6416         6440
25       5656         5712
26       6976         7032
27       7944         8024
28       6792         6840
29       5064         5080
30       5888         5912
31       6824         6848
32       7512         7624
33       8272         8416
34       11016        11120
35       13840        13848
36       15904        15920
37       14104        14088
38       14792        14880
39       13352        13416
40       11032        11120
41       11272        11328
42       10608        10648
43       10128        10248
44       10816        10840
45       11448        11600
46       9432         9520
47       8928         8976
48       8352         8432
49       7768         7856
Total    583744       588424
Average  11674.88     11768.48
7 ECC VIDEO WITH IFR
7.1 Introduction
To address the passiveness and disadvantages of the current error resilience tools, the
ECC scheme was proposed in the last chapter and in [13]. Basically, in an ECC
approach a video bitstream encoded using a current video coding standard (including
all the MPEG series and H.26x series) is not packetized; instead it is further encoded
using an error correction code. This is an active error protection approach in the sense
that it can recover a corrupted bitstream by correcting the errors in the bitstream. The
ECC in our proposal is achieved with punctured convolutional coding [3,4,5,7].
Because of the very efficient and effective error correction capability that punctured
convolutional codes can achieve, the proposed scheme shows significant improvement
over the packetization approach in the current MPEG-4 [10,11] and H.263 [2,8,9]
video coding standards in terms of reconstructed video quality and coding efficiency.
However, the proposed ECC scheme has its own disadvantage. The only
synchronization point in an ECC video bitstream is the Picture Start Code when
packetization is not employed. When a single error bit within a frame escapes
correction by the ECC (even though this is very rare if the error correction code is
properly designed to match the residual error conditions), the decoder can lose
synchronization with the encoder. Consequently, the macroblock within which the
error occurs and all the following macroblocks within the frame will be undecodable,
resulting in a "half image" effect (whereas a decoding failure within a packet results in
empty strips in the frame when packetization is employed), and the quality of the
reconstructed frame with this error and of all subsequent frames will suffer
significantly until the next I frame, due to the inter-frame error propagation effects.
This can happen more frequently when the residual error condition changes following
changes in the channel condition, because the change of the ECC rate always lags
behind the change of the residual error conditions. It needs to be pointed out that
NEWPRED [1,6] will not work when the basic I frame collapses. To address this
problem, a new error resilience tool, Intra Frame Relay (IFR), is proposed in this
chapter. Simulation results show a significant improvement over the original ECC
scheme.
7.2 ECC with IFR
In the IFR scheme, when an I frame is transmitted, the number of the first corrupted
macroblock is sent back to the encoder by the decoder through a back channel. The
encoder then knows that the picture area in the I frame from that macroblock to the
end of the frame has not been successfully decoded; therefore, in the next frame, all
the macroblocks associated with the corrupted macroblocks (including the
macroblocks from the reported number to the end of the frame and macroblocks using
the corrupted macroblocks as reference for motion estimation) can be encoded in Intra
mode. This increases the likelihood that all the subsequent frames will have decent
reference frames.
There are two reasons for applying the proposed scheme only to I frames. First, the
I frame is the most important frame for decoding the subsequent sequence of
frames: if errors occur within an I frame, all the subsequent P (predicted) and B
(bi-directionally predicted) frames will be affected due to inter-frame error
propagation. Second, encoding macroblocks of P or B frames in Intra mode can
reduce the coding efficiency significantly; if errors occur within P frames, using
NEWPRED [1,6] is both more efficient and more effective.
To make the proposal realistic, a video decoder must have some capability to detect
errors in the bitstream after the ECC operation. The error detection process used in this
work is based on the following mechanisms.
During the decoding process, if one of the following events occurs, the bitstream will
become undecodable and the decoder will know that an error or errors have occurred.
The decoder then applies error concealment to the rest of the macroblocks within
the frame and sends the number of the first broken macroblock back to the encoder.
• Invalid VLC (MCBPC, CBPY, MVD and TCOEF) code is detected.
• Quantizing information goes out of range.
• Invalid INTRA DC code is detected.
• Escaped TCOEF with level 0 is detected.
• Coefficient overrun occurred.
• A motion vector points outside the picture or beyond the maximum search range
(for P frame error detection).
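The detection loop implied by these checks can be sketched as follows. This is a simplified illustration, not the thesis implementation; the check functions are hypothetical placeholders for the real syntax tests itemized above.

```python
# Simplified sketch of the error-detection loop; the checks list holds
# hypothetical placeholders for the syntax tests itemized above
# (invalid VLC, out-of-range quantizer, coefficient overrun, ...).

def first_broken_macroblock(macroblocks, checks):
    """Return the index of the first macroblock that fails any syntax
    check, or None if the whole frame decodes cleanly.  The decoder
    conceals from this index onward and reports it on the back channel."""
    for i, mb in enumerate(macroblocks):
        if not all(check(mb) for check in checks):
            return i
    return None

# Toy usage: macroblocks as dicts, one check validating the QP range.
qp_in_range = lambda mb: 1 <= mb["qp"] <= 31
mbs = [{"qp": 10}, {"qp": 12}, {"qp": 99}, {"qp": 5}]
print(first_broken_macroblock(mbs, [qp_in_range]))  # -> 2
```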
However, errors can occur in such a way that the bitstream remains decodable even
though it contains errors. In this case, error detection can be conducted after the
decoding process, using the redundancy inherent in the neighboring macroblocks.
A more detailed discussion of error detection after decoding can be found in [12].
It should be emphasized that, in the first P frame following the I frame, the encoder
not only needs to encode in Intra mode the macroblocks from the starting number to
the end of the frame, but also those macroblocks that may use part or all of the
corrupted macroblocks as references for motion estimation. For instance, if the
maximum search range of motion estimation is 16 pixels, Intra mode should start
from the macroblock immediately above and to the left of the starting macroblock
in the P frame.
It also needs to be pointed out that transmission delay, on both the downlink and
the back channel, can occur in telecommunication networks. For instance, by the
time the back channel message arrives at the encoder, the encoder may already be
starting to encode the second P frame following the I frame. In this case, if the
maximum search range of motion estimation is 16 pixels, the encoder should start
encoding in Intra mode from the macroblock two rows above and two columns to
the left of the starting macroblock reported by the decoder, in that second P frame.
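The rule for choosing where forced Intra coding must begin can be summarized in a short sketch. This is an illustrative fragment, not the thesis code; the macroblock-grid geometry is an assumption.

```python
# Illustrative sketch (not the thesis code) of where forced Intra
# coding should begin.  Motion estimation can drag corruption at most
# ceil(search_range / mb_size) macroblocks up/left per frame, so the
# start point moves up and left by that amount per frame of delay.

def intra_refresh_start(start_mb, mb_per_row, search_range=16,
                        frame_lag=1, mb_size=16):
    reach = -(-search_range // mb_size) * frame_lag  # ceil division
    row = max(start_mb // mb_per_row - reach, 0)
    col = max(start_mb % mb_per_row - reach, 0)
    return row * mb_per_row + col

# QCIF: 176 pixels wide -> 11 macroblocks per row (assumed geometry).
print(intra_refresh_start(38, 11))               # one frame late -> 26
print(intra_refresh_start(38, 11, frame_lag=2))  # two frames late -> 14
```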
As stated in the last chapter, both Data Partitioning and RVLC can still be used with
ECC video without employing packetization. If Data Partitioning and RVLC are
employed, both the first and the last numbers of the corrupted macroblocks can be
transmitted to the encoder. Consequently, fewer macroblocks need to be encoded in
Intra mode in the next frame than when RVLC and Data Partitioning are not
employed, and the coding efficiency is improved while coding the next frame and
subsequent frames.
7.3 Simulation results
To evaluate the effectiveness of the proposed algorithm, again Salesman and Akiyo are
chosen as the test sequences. The goal is to compare the PSNR of ECC video with and
without IFR. The experiments are conducted based on the following conditions.
1. 50 frames of each video sequence are encoded, with the first frame coded as an I
frame followed by all P frames, without rate control.
2. When ECC is employed, the ½ rate base convolutional code (561, 753) is chosen,
which has a constraint length of K = 9. This base code is punctured to rate 13/14,
which means that for every 13 bits of the video bitstream, one additional bit is
added by the convolutional encoding.
3. After transmission, the convolutionally encoded video bitstream is decoded using
the hard-decision Viterbi decoding algorithm with a trellis depth of 15xK.
4. Data partitioning and RVLC are employed in both ECC video and ECC video plus
IFR.
5. The same quantization parameters are used in all experiments, which means that
correctly decoded bitstreams protected using ECC and ECC plus IFR should have
the same visual quality on the same video sequence in error-free environments.
6. In each test, the residual errors are simulated as random errors with a Gaussian
distribution. The BER of the residual errors is set to 1x10-4. Back channel messages
are transmitted error free. In most situations this assumption is realistic: if the back
channel message only contains the acknowledgement, which can be positive or
negative and is usually short, a strong error protection scheme can be applied to the
back channel message.
7. After the corrupted bitstreams are decoded, erroneous motion vectors and texture
information are replaced by 0. This means that when the motion vectors are not
available, motion compensation is implemented using the motion vectors in exactly
the same position in the previous frame, and when the texture information is not
available, the block in question is reconstructed using the texture information in the
blocks located by the motion vectors.
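The rate-13/14 puncturing in condition 2 above can be illustrated with a minimal sketch of the rate-½, K = 9 mother code (generators 561 and 753 octal). The puncturing map used here is illustrative only, since the exact pattern is not listed; it simply keeps 14 of every 26 mother-code bits.

```python
# Sketch of rate-13/14 puncturing of the rate-1/2, K = 9 mother code
# (generators 561 and 753 octal) from condition 2.  The puncturing
# map below is an illustrative assumption, not the thesis pattern.

G1, G2 = 0o561, 0o753   # generator polynomials, constraint length K = 9

def conv_encode(bits):
    """Rate-1/2 convolutional encoder: two coded bits per input bit."""
    state = 0
    out = []
    for b in bits:
        state = ((state << 1) | b) & 0x1FF          # 9-bit shift register
        out.append(bin(state & G1).count("1") & 1)  # parity of G1 taps
        out.append(bin(state & G2).count("1") & 1)  # parity of G2 taps
    return out

# Keep both outputs for the first info bit of each 13-bit block and only
# the first output for the remaining twelve: 2 + 12 = 14 coded bits per
# 13 info bits, i.e. rate 13/14.
PATTERN = [1, 1] + [1, 0] * 12          # over 26 mother-code bits

def puncture(coded):
    return [c for c, keep in
            zip(coded, PATTERN * (len(coded) // 26 + 1)) if keep]

info = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 0, 0]   # 13 info bits
sent = puncture(conv_encode(info))
print(len(sent))  # 14 transmitted bits for 13 info bits
```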
The final results, obtained by averaging the results of 100 individual tests, are
shown in Figure 7.1 and Figure 7.2. The numbers of bits required to encode each
frame of the sequences are listed in Table 7-1 and Table 7-2. The advantage of
using IFR is clearly seen. The PSNRs of the first frames (I frames) of both
sequences are not very good due to the residual errors, but the PSNRs of all the
subsequent P frames are lifted when IFR is employed, with a PSNR gain of about
7 dB for Salesman and 9 dB for Akiyo, while the PSNRs of the video output
without IFR remain low. The cost is an increase in the coding rate of the first P
frame following the I frame, while the coding rate of all other frames remains
similar. For both video sequences, it was shown in the last chapter that the
packetization approach fails to deliver decent reconstructed video quality under the
given conditions (i.e. when the BER of the final bitstream reaches 10-4), and so no
PSNR results are repeated here for the sequences employing packetization.
In a wired network, where the residual error conditions are stable, it is easy to
design an ECC scheme matching the residual error conditions, so the employment
of IFR is less important. When the ECC scheme matches the residual error
conditions, it can be expected to correct nearly all the residual errors in the
bitstream. In other words, the probability that a residual error escapes the protection
of ECC can be extremely low if the ECC scheme matches the residual error
condition. In wireless situations the residual error conditions vary, so errors escape
ECC protection more frequently; therefore employing the IFR technique is more
advisable in wireless situations.
It should be noted that IFR can only be effective when employed together with
ECC, and it does not support packetization. The reason is quite straightforward: the
packetization approach and the associated RVLC and Data Partitioning in the
current MPEG-4 coding standard are passive, and they do not have the capability to
correct error bits in the bitstream.
7.4 Delay analysis due to the employment of IFR
One obvious problem with IFR is that the data rate of the first P frame following an
Intra frame is increased, compared with not using IFR.
From Table 7-1 it can be seen that the employment of IFR increases the bit count of
the P frame following the first Intra frame of the sequence Akiyo from 536 to 7456,
when ECC(13/14) is used in the residual error condition where the BER of the
bitstream is 10-4. However, the IFR-induced data rate of this first P frame is still
less than the peak data rate among all the P frames of the 50 frames, which occurs
at frames 18 and 19. So, for this particular sequence, IFR poses no special difficulty
at all, provided the transmission channel for P frames is allocated for the peak
P-frame data rate of the sequence.
For the sequence Salesman, the employment of IFR increases the bit count of the
first P frame following the Intra frame from 8616 to 27538, which is about twice
the peak rate of the P frames. This will introduce one frame of transmission delay.
One solution is to drop the following P frame, i.e. to transmit only one frame (the
first P frame) instead of two P frames if the channel allocation is fixed. Another
solution is to modify the transmission protocol to allocate more channel capacity
for the first P frame following an Intra frame. This can be easily implemented
because of the periodic data structure of an encoded video bitstream: we can treat
the first P frame as a "half I frame", and if the channel allocation for an I frame can
be updated periodically, it is not difficult to accommodate the periodic "half I frame".
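The delay argument above amounts to counting how many fixed-size channel slots the enlarged first P frame occupies. A back-of-envelope sketch, using the bit counts quoted from Tables 7-1 and 7-2 and assuming the channel is allocated for the peak P-frame rate:

```python
# Back-of-envelope version of the delay argument, assuming a fixed
# per-frame channel allocation sized for the peak P-frame rate.
import math

def extra_frame_delay(first_p_bits, per_frame_channel_bits):
    """Extra whole frames of delay needed to transmit the enlarged
    first P frame through a fixed per-frame allocation."""
    return max(math.ceil(first_p_bits / per_frame_channel_bits) - 1, 0)

# Salesman (Table 7-2): 27538 bits vs a peak P-frame rate of 16702
# bits (frame 19) -> one extra frame of delay.
print(extra_frame_delay(27538, 16702))  # -> 1
# Akiyo (Table 7-1): 7456 bits fits under the 7848-bit peak (frame 18).
print(extra_frame_delay(7456, 7848))    # -> 0
```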
The bit numbers mentioned above are the averages (per frame) of the results of 100
tests. More generally, the transmission delay to the P frame following an Intra
frame caused by the employment of IFR depends on the following conditions.
First, the residual error conditions have a significant influence on the data rate
increase of the first P frame. If the residual error conditions are good, the delay is
small; if they are poor, the delay will be longer.
Second, the content of the video sequence is also an important factor if the ECC
rate does not completely match the residual error condition. The more complex the
content, the more bits it produces after compression; consequently, there is a
greater chance that the bitstream is corrupted during transmission over a Gaussian
channel, and a greater likelihood that the data rate of the P frame is increased.
However, if the ECC rate matches the residual error condition perfectly, the content
of the video has little influence on the reconstructed video quality, as shown in
Chapter 8, because the ECC operation corrects all the residual errors.
Third, the ECC scheme itself plays a crucial role in determining the data rate of the
P frame following an Intra frame. If the ECC is powerful enough to correct all the
errors in the Intra frames, there will be no increase in the data rate of the first P
frame following an I frame; if the ECC is weak, the data rate of that P frame will
increase dramatically. The first two factors are closely related to the ECC scheme
itself: the ECC rate needs to be increased when the residual error conditions are
poor and the content of the video sequence is more complex, to combat the
unfavorable conditions. In the next chapter it will be seen that by increasing the
ECC rate from ECC(13/14) to ECC(11/12), the capability of the ECC scheme is
increased to such a degree that it corrects all the error bits in an I frame for Akiyo
when the soft-decision Viterbi algorithm is used. Consequently, the employment of
IFR causes no data rate increase for the sequence Akiyo, because IFR is never
invoked, while for Salesman the data rate increase of the first P frame due to IFR is
negligible.
It should be emphasized that the final data rate of the bitstream employing ECC(11/12)
is still less than the data rate of the bitstream employing Packetization(600) if no RVLC,
Data Partitioning and packetization are employed in the ECC video bitstream.
7.5 Conclusion
Following the novel, efficient, effective and active SEC approach achieved with ECC to
combat residual errors [6], a new improved version of the scheme is introduced in this
chapter. The new error resilience tool is IFR, which uses back channel messages to
further improve the performance of ECC video. Simulation results have given positive
support. To stop inter-frame error propagation caused by Intra frame errors, IFR is very
effective for ECC video. To stop inter-frame error propagation caused by P frame
errors, NEWPRED is an effective alternative [1,6].
Future work will include the design and implementation of dynamic ECC for video
communication in mobile environments, with which the ECC coding rate can
follow the change of the residual error condition dynamically. If both channel
coding and ECC use convolutional coding, this obviously provides an excellent
opportunity to design a generic and integrated rate control scheme taking source
coding, channel coding and ECC into consideration, which should be more efficient
and effective; this is an interesting direction for future work. More accurate error
detection after ECC will improve the performance of the ECC approach with IFR,
and so error detection techniques applicable after ECC are also an interesting
direction for future work.
References
[1] ISO/IEC 14496-2, “Information Technology – Coding of Audio-Visual Objects:
Visual”, 2001.
[2] ITU-T Recommendation H.263, "Video coding for low bit rate communication", 1998.
[3] J. G. Proakis, "Digital Communications", McGraw-Hill, 1995.
[4] A. J. Viterbi, "Convolutional Codes and Their Performance in Communication
Systems", IEEE Trans. on Comm. Technology, Vol. COM-19, No. 5, October 1971, pp.
751-772.
[5] Y. Yasuda, K. Kashiki and Y. Hirata, “High-Rate Punctured Convolutional Codes
for Soft Decision Viterbi Decoding”, IEEE Trans. on Comm., Vol. Com-32, No. 3,
March 1984, pp. 315-319.
[6] ISO/IEC JTC1/SC29/WG11 N3908, “MPEG-4 Video Verification Model” version
18.0, January 2001/Pisa.
[7] Y. Yasuda, Y. Hirata, K. Nakamura and S. Otani, "Development of variable-rate
Viterbi decoder and its performance characteristics", Proc. 6th Conf. Digital Satellite
Commun., Phoenix, AZ, Sept. 1983, pp. XII-24-31.
[8] S. Wenger, G. Knorr, J. Ott and F. Kossentini, "Error Resilience Support in
H.263+", IEEE Transactions on Circuits and Systems for Video Technology, Vol. 8, No.
7, November 1998, pp. 867-877.
[9] J. Ott, S. Wenger and G. Knorr, "Application of H.263+ Video Coding Modes in
Lossy Packet Network Environments", Journal of Visual Communication and Image
Representation, Vol. 10, 1999, pp. 12-38.
[10] I. Moccagatta, S. Soudagar, J. Liang and H. Chen, "Error-Resilient Coding in
JPEG-2000 and MPEG-4", IEEE Journal on Selected Areas in Communications, Vol. 18,
No. 6, June 2000, pp. 899-914.
[11] Y. Wang, S. Wenger, J. Wen and A. K. Katsaggelos, “Error Resilient Video
Coding Techniques – Real-time Video Communications over Unreliable Networks”,
IEEE Signal Processing Magazine, July 2000, pp. 61-82.
[12] E. Khan, H. Gunji, S. Lehmann and M. Ghanbari, “Error Detection and Correction
in H.263 coded video over wireless network”, The 12th International Packet Video
Workshop (PV 2002), April 2002 Pittsburgh PA, USA.
[13] Bing Du, Anthony Maeder and Miles Moody, "A new approach for error resilience
in video transmission using ECC", accepted by International Workshop on Very Low
Bit-rate Video, 18-19 September 2003, Madrid, Spain.
[Figure: PSNR (dB) versus Frame Number (1-50); curves: Error Free, ECC plus IFR, ECC only]
Figure 7.1 PSNR of Salesman at BER of 1x10-4
[Figure: PSNR (dB) versus Frame Number (1-50); curves: Error Free, ECC plus IFR, ECC only]
Figure 7.2 PSNR of Akiyo at BER of 1x10-4
Table 7-1 Bit number comparison between ECC alone and ECC plus IFR for Akiyo

Frame No  ECC(13/14)  ECC&IFR(13/14)
0   47480  47480
1   536    7456
2   672    664
3   672    666
4   1128   1116
5   1064   1064
6   1112   1106
7   1304   1293
8   1552   1547
9   1744   1748
10  1632   1631
11  1456   1458
12  2064   2026
13  2008   1999
14  2896   2849
15  4168   4162
16  5816   5766
17  7008   7008
18  7848   7833
19  7632   7637
20  6824   6845
21  6296   6298
22  5232   5208
23  4352   4359
24  4288   4268
25  3200   3212
26  3104   3117
27  3824   3827
28  4808   4822
29  5520   5533
30  6032   6026
31  6112   6122
32  5168   5180
33  4328   4377
34  4688   4679
35  4984   4981
36  5232   5257
37  5088   5072
38  5272   5286
39  5376   5343
40  4832   4862
41  4824   4806
42  4312   4295
43  3752   3759
44  3776   3792
45  4400   4394
46  4768   4774
47  5008   5030
48  5072   5073
49  4488   4486
Table 7-2 Bit number comparison between ECC alone and ECC plus IFR for Salesman

Frame No  ECC(13/14)  ECC&IFR(13/14)
0   79424  79424
1   8616   27538
2   14128  13734
3   16200  16188
4   15520  15567
5   12496  12545
6   8832   8819
7   8432   8446
8   9240   9174
9   9232   9273
10  10248  10220
11  9432   9379
12  8416   8514
13  9392   9363
14  10064  10137
15  10112  10181
16  8592   8604
17  9792   9930
18  14320  14237
19  16640  16702
20  14176  14077
21  10712  10589
22  10528  10460
23  9016   8977
24  6408   6367
25  5656   5634
26  6968   6946
27  7944   8023
28  6792   6847
29  5056   5004
30  5888   5888
31  6824   6777
32  7512   7569
33  8272   8159
34  11008  10805
35  13840  13821
36  15904  15893
37  14104  14161
38  14792  14924
39  13344  13318
40  11032  11051
41  11272  11316
42  10608  10715
43  10128  10211
44  10816  10817
45  11440  11328
46  9424   9451
47  8928   8952
48  8352   8265
49  7760   7736
8 ECC VIDEO WITH SOFT-DECISION VITERBI DECODING
8.1 Introduction
In the previous chapters and in [7,8] it has been shown that the proposed ECC
scheme can achieve what the packetization approach cannot. However, in the
original ECC scheme, accomplished with a punctured convolutional code, only the
hard-decision Viterbi decoding algorithm is used for the convolutional decoding,
which does not fully exploit the potential of the punctured convolutional code.
According to the theory of convolutional coding, it is reasonable to expect that the
performance of the ECC approach can be further improved if the soft-decision
Viterbi decoding algorithm is used in the convolutional decoding process.
Also, there is a hidden problem with the original ECC scheme: the PSCs (Picture
Start Codes) are not protected by ECC, because the ECC operation is based on each
frame of the encoded video bitstream and the PSC serves as the synchronization
point. If an error happens to fall within a PSC, the frame of the video bitstream
before the PSC and the frame after it will not be decoded correctly; consequently,
if no other measure is taken, the subsequent video frames will not be decoded
properly until the next Intra frame, because all these frames lack decent reference
frames. The NEWPRED [5,6] tool can be employed to recover video
communication if a PSC is corrupted, when a back channel from decoder to
encoder is available, but using a previous frame other than the last frame as
reference for motion estimation reduces the coding efficiency.
Though the simulation results in Chapter 6 and Chapter 7 still show that the original
ECC scheme is much better than the packetization approach, one reason for this is
that the probability of a PSC being corrupted is very low.
The other drawback of the frame-based ECC operation is that it introduces a
transmission delay of one frame at the encoding side and a decoding delay of one
frame at the decoding side. This can be a significant disadvantage for real-time
video communications with strict delay requirements.
In this chapter, the soft-decision Viterbi decoding algorithm for convolutional decoding
replaces the hard-decision Viterbi decoding algorithm in the original ECC scheme, to
improve the performance of ECC video. To bring the PSCs under ECC protection
as well, instead of performing the ECC operation on each frame of the encoded
video bitstream, the ECC operation in this chapter is performed on a segment basis,
i.e. the video bitstream is decomposed into segments and each segment is further
encoded with ECC. Thus the PSCs in a video bitstream are also protected by the
ECC operation. The simulation results show that ECC video based on this
segmentation, accomplished with soft-decision Viterbi convolutional decoding, can
work in a residual error condition where the BER of the encoded video bitstream
reaches 10-2, without the need for channel coding. Of course, in reality channel
coding is an integral part of any telecommunication network, and so the quality of
ECC video will be even more satisfactory.
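The segment-based operation described above amounts to cutting the bitstream, PSCs included, into fixed-length segments before ECC encoding. A minimal sketch, where the segment length and the bit-string representation are illustrative assumptions:

```python
# Minimal sketch of the segment-based decomposition; the segment
# length and the bit-string representation are assumptions made for
# illustration (the thesis uses the average encoded frame length).

def split_into_segments(bitstream, segment_len):
    """Cut a bit string into segment_len-bit pieces; the last piece may
    be shorter.  Each piece would then be ECC-encoded on its own, so
    any PSC inside it is protected too."""
    return [bitstream[i:i + segment_len]
            for i in range(0, len(bitstream), segment_len)]

segments = split_into_segments("0" * 2500, 600)
print([len(s) for s in segments])  # -> [600, 600, 600, 600, 100]
```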
8.2 ECC with Soft-Decision Viterbi Decoding
The main difference between the hard-decision Viterbi decoding algorithm and the soft-
decision Viterbi decoding algorithm is that the Euclidean metric [2] is used in the soft-
decision Viterbi decoding algorithm instead of the Hamming metric in the hard-decision
decoding algorithm.
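The two metrics can be contrasted in a few lines. This is a toy illustration, assuming a BPSK-style mapping of bit 0 to +1 and bit 1 to -1 for the soft values; it is not the decoder used in this work.

```python
# Toy contrast of the two branch metrics.  Hard decisions compare bit
# patterns (Hamming distance); soft decisions compare received
# analogue-like values against ideal symbols (squared Euclidean
# distance), assuming the BPSK-style mapping 0 -> +1.0, 1 -> -1.0.

def hamming_metric(received_bits, branch_bits):
    return sum(r != b for r, b in zip(received_bits, branch_bits))

def euclidean_metric(received_vals, branch_bits):
    ideal = [1.0 if b == 0 else -1.0 for b in branch_bits]
    return sum((r - s) ** 2 for r, s in zip(received_vals, ideal))

# A confidently received 0 followed by a very unreliable 1: the soft
# metric keeps the per-bit reliability that hard decisions discard.
print(hamming_metric([0, 0], [0, 1]))         # -> 1
print(euclidean_metric([0.9, -0.1], [0, 1]))  # -> 0.82
```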
When the soft-decision Viterbi decoding algorithm is used in an ECC scheme at the
application layer, the network needs to deliver the soft-decision output [4] to the
application layer after channel coding. This can be accomplished by monitoring the
difference between the survivor path and the path that has the next best metric in the
channel decoding process if channel coding is also achieved with convolutional coding
[3]. By monitoring the metric difference of different paths, the channel decoder
(convolutional decoder) produces reliability or confidence information assigned to each
decoded bit.
One important issue inherent in using the soft-decision decoding algorithm at the
application layer is that a careful decision is needed in selecting the confidence
levels. Increasing the number of confidence levels can improve the performance of
the convolutional coding, but at the cost of an increased computational requirement.
Here the confidence level is different from the quantization level, which is usually
applied to the "raw" data from the channel to convert analogue data into digital
data. The confidence level is produced during the channel decoding process and
applies to digital data. However, the impact of the selection of the confidence level
on punctured convolutional decoding is similar to that of the selection of the
quantization level. An analysis of the effect of quantization can be found in [1]; in
this work 3-bit precision is used.
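The idea of quantizing the channel decoder's per-bit reliability values to 3-bit confidence levels can be sketched as follows; the clipping range is an assumption for illustration only.

```python
# Sketch of mapping the channel decoder's per-bit reliability values
# onto 3-bit confidence levels (8 levels).  The clipping range max_rel
# is an illustrative assumption, not a value from this work.

def to_confidence_level(reliability, bits=3, max_rel=8.0):
    """Quantize a non-negative reliability value to an integer level
    in [0, 2**bits - 1]."""
    levels = (1 << bits) - 1
    r = min(max(reliability, 0.0), max_rel)   # clip to the known range
    return round(r / max_rel * levels)

print([to_confidence_level(r) for r in (0.0, 1.0, 4.0, 8.0, 12.0)])
# -> [0, 1, 4, 7, 7]
```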
8.3 Simulation results
To evaluate the effectiveness of the proposed algorithm, the same video sequences,
Akiyo and Salesman, are chosen again as the test sequences. The goal is to compare
the PSNRs of the video output reconstructed from the bitstreams protected with an ECC
scheme with the soft decision Viterbi decoding algorithm and the bitstreams protected
with a packetization approach. The experiment conditions are similar to the ones used
in Chapter 6. To make this chapter more complete and independent the conditions are
repeated below.
1. 50 frames of each video sequence are encoded, with the first frame coded as an I
frame followed by all P frames, without rate control.
2. Packet size of both video sequences is 600 bits when the packetization scheme is
used. Data partitioning and RVLC are employed with the packetization scheme,
while they are not employed with the ECC approach.
3. When the ECC scheme is employed, the ½ rate base convolutional code (561, 753)
is chosen, which has a constraint length of K = 9. This base code is punctured to
rates 11/12, 9/10 and 7/8 for the residual error conditions where the BER of the
residual errors reaches 10-4 or 10-3.
4. The segment length for ECC coding is chosen as average frame length of the 50
encoded video frames.
5. After transmission, the convolutionally encoded video bitstream is first decoded
using the soft-decision Viterbi decoding algorithm with a trellis depth of 19xK.
6. The same quantization parameters are used in all experiments, which means that
correctly decoded video bitstreams protected with ECC or packetization should
have the same visual quality as the video sequence in error free environments.
7. After the corrupted video bitstream is decoded, erroneous motion vectors and
texture information are replaced by 0, which means that when the motion vectors
are not available, motion compensation is implemented using the motion vectors in
exactly the same position in the previous frame, and when the texture information is
not available, the block in question is reconstructed using the texture information in
the blocks located by the motion vectors.
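The concealment rule in condition 7 can be illustrated with a toy one-dimensional sketch (not the actual decoder): a lost motion vector is replaced by 0, and lost texture means a zero residual, so the block reduces to its prediction from the previous frame.

```python
# Toy 1-D sketch of the concealment rule in condition 7: a lost motion
# vector becomes 0 (copy the co-located area), and lost texture means
# the residual is taken as 0, so the block is just its prediction.

def conceal_block(prev_frame, pos, size, mv=None, residual=None):
    mv = 0 if mv is None else mv             # lost MV -> zero vector
    pred = prev_frame[pos + mv: pos + mv + size]
    if residual is None:                      # lost texture -> zero residual
        residual = [0] * size
    return [p + r for p, r in zip(pred, residual)]

prev = list(range(10, 30))                   # previous reconstructed frame
print(conceal_block(prev, pos=4, size=4))    # -> [14, 15, 16, 17]
```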
The final results, obtained by averaging the per-frame results of 100 individual
tests, are shown in Fig. 8.1 and Fig. 8.2. The coding rate comparisons between the
ECC schemes and the packetization approach are shown in Table 8-1 to Table 8-6.
The advantage of using the ECC scheme with soft-decision Viterbi decoding
instead of the packetization approach is clearly seen. ECC(11/12) with
soft-decision Viterbi decoding delivers excellent reconstructed video output even
when the BER of the residual errors reaches 10-3, whereas it was shown in
Chapter 6 that Packetization(600) is totally incapable of delivering a decent
reconstructed video output from the corrupted video bitstream when the BER of the
final video bitstream reaches 10-4. When the BER of the residual errors is relaxed
from 10-3 to 10-4, ECC(11/12) delivers video output with PSNR nearly the same as
in the transmission-error-free situation for the sequence Salesman, while for Akiyo
it delivers video output with PSNR exactly the same as in the
transmission-error-free situation (care is needed when looking at Fig. 8.2, as the
PSNR curve for a BER of 10-4 coincides with the transmission-error-free PSNR
curve). It is not surprising that in 99 tests out of 100, ECC(11/12) corrects all of the
residual errors in the bitstreams, leaving only one test with 5 bits in
error for Salesman and 3 bits in error for Akiyo for the 50 frames after ECC decoding.
Here one question can be raised: how can a bitstream containing transmission errors
still deliver a video output whose PSNR is the same as in the transmission-error-free
situation? The answer is quite straightforward: when the erroneous bits correspond
to reconstructed picture areas in which no movement occurs in the video content, a
basic error concealment operation (e.g. copying the corresponding area from the
previous frame) conceals the error effect completely.
With a further negligible increase in the ECC overhead, these experiments also
reveal that when ECC(9/10) is used, the ECC operation corrects all the residual
errors in the bitstreams of both sequences, Akiyo and Salesman, in all 100 tests
when the BER of the residual errors is set to 10-4, delivering
transmission-error-free reconstructed video output. Comparing this result with the
performance of Packetization(600), the contrast is very obvious. Based on 100
tests, our experiments also reveal that ECC(7/8) corrects all the residual errors in
the bitstreams of these two video sequences when the BER of the residual errors
reaches 10-3. In these cases, the PSNRs of the video outputs are identical to those
of the video sequences transmitted in an error-free situation, and there is no point in
depicting a separate PSNR curve.
The 11/12 puncturing rate of the convolutional code results in a 9.2% increase in
the final bit rate. From Table 8-1 and Table 8-2 it can be seen that, without
employing RVLC, the number of bits for the final bitstream employing ECC(11/12)
is still less than that for the final bitstream employing Packetization(600), which
results in a 9.9% bit rate increase of the final bitstream.
Table 8-3 and Table 8-4 reveal that ECC(9/10) produces marginally more bits than
Packetization(600) for the sequence Salesman, while for Akiyo the final bitstreams
employing ECC(9/10) and Packetization(600) contain equal numbers of bits.
Table 8-5 and Table 8-6 show that ECC(7/8) produces 4% more bits than
Packetization(600) for Salesman and 2.8% more for Akiyo. Taking into
consideration what ECC can achieve and packetization cannot, these additional bit
rate increases are negligible.
8.4 Discussion
It has been shown that if the residual error condition is stable and the ECC rate is
properly designed to match it, it is unnecessary to employ Data Partitioning and
RVLC, because the ECC operation corrects all the errors in the bitstream. If the
residual error condition varies, as in mobile situations, employing Data Partitioning
and RVLC is helpful. However, employing RVLC in the ECC approach reduces the
coding efficiency, and for a data rate increase equivalent to that of RVLC, the ECC
power can instead be increased significantly; it has been shown how much the ECC
power increases by employing ECC(11/12) instead of ECC(13/14). Which
technique to choose in practice needs further investigation, and a wise decision
must be based on several factors, including the residual error conditions, video
content, ECC choice and networking protocols.
Obviously, employing IFR with an ECC scheme using soft-decision Viterbi
decoding will further improve the performance of the ECC scheme. Another set of
simulation results, shown in Figure 8.3 and Figure 8.4 and obtained by averaging
the results of 100 individual tests conducted in a residual error condition where the
BER of the residual errors is set to 10-2, further demonstrates how powerful the
proposed ECC enhanced with IFR can be. The soft-decision Viterbi decoding
algorithm is used in this simulation. All the other experiment conditions are the
same as listed in Chapter 7, except that the BER of the residual errors is changed to
10-2. In this simulation, the ECC(7/8) scheme is designed for a situation where the
BER of the residual errors stays at 10-3 most of the time and occasionally increases
to 10-2; this can also result from interleaving at the application layer to cope with
bursty errors and packet loss [9,10]. When the BER of the video bitstreams
increases to 10-2, ECC(7/8) enhanced with IFR still delivers decent video output
for both Salesman and Akiyo, as shown in Figure 8.3 and Figure 8.4, while the
bitstreams protected with packetization/resynchronization are simply undecodable
no matter how large the packet size is set. There is no point in drawing the PSNR
curves of the video outputs reconstructed from the bitstreams protected by the
packetization schemes in these figures, as they drop below 10 dB and are
meaningless. ECC(7/8) increases the bit rate of the final bitstream by 14.29% for all frames
except the first P frame following the I frames, if no Data Partitioning and RVLC are
employed in the ECC scheme. But it does make video transmission possible when the
BER of the residual errors increases to 10-2, which would be otherwise impossible if
resynchronization is employed instead of the ECC scheme.
Another benefit of the segment-based ECC approach is that it can relieve the
channel capacity requirement for the P frame following an Intra frame when IFR is
employed with the ECC scheme. When an ECC scheme is based on frames, as in
Chapter 6 and Chapter 7, the ECC operation is conducted after each video frame is
compressed. When an error occurs within an I frame, the system resources for
encoding and transmitting the picture area from the macroblock in which the error
occurs to the end of the frame are wasted, as this part of the picture cannot be used
by the decoder. As stated in Chapter 7, the decoder needs to send a message
through a back channel to inform the encoder of the start number of the macroblock
in which an error occurred during the decoding process. The encoder then knows
that decoding has not been successful from this start number to the end of the
frame, and so it can encode all the macroblocks associated with those broken
macroblocks in the following frame in Intra mode. However, encoding macroblocks
of a P frame in Intra mode increases the bit rate of that frame. When an ECC
scheme is based on segments, it is possible that the encoding of the Intra frame has
not yet finished when the back channel message arrives at the encoder. In this case
the encoder can stop encoding the rest of the Intra frame and start encoding the
following frame right away, using the channel capacity allocated for the Intra frame
to transmit the bitstream of the next frame, which is a P frame, thus relieving the
channel capacity requirement that the IFR technique imposes on the first P frame
following an I frame.
The fundamental difference between the SEC approach realized with ECC and
traditional schemes is that the SEC approach operates before video decoding, while
traditional approaches operate after it. The SEC operation removes the errors
in the video bitstream before video decoding, whereas traditional approaches accept the
errors and try to conceal, hide or "repair" their effects after
video decoding. That is why SEC is termed an active approach while traditional
approaches are passive. Even without seeing the simulation results, one would expect
the SEC approach to offer a significant advantage over the traditional schemes.
These different approaches toward error resilience also create another feature that
distinguishes an ECC scheme from the packetization approach and other error resilience
approaches: provided the ECC code is properly designed to match the residual error
condition, the performance of ECC depends mainly on the residual error conditions and
on the capability of the ECC to correct the residual errors in the bitstream, and not on
the content of the video sequence. This has been confirmed by the experiments
in this chapter. In contrast, the performance of packetization depends not only
on the packet size and the residual error conditions but also on the
content of the particular video sequence, because the packetization approach
relies heavily on error concealment techniques. That is the main
reason why we have not conducted experiments with video sequences other than
Akiyo and Salesman to demonstrate ECC's superiority over packetization: the
characteristics of Akiyo, with slow movement, and Salesman, with fast movement, are
fairly representative.
References
[1] R. Wells and G. Bartles, “Simplified calculation of likelihood metrics for Viterbi
decoding in partial response systems”, IEEE Trans. Magnetics, vol. 32, no. 5, Pt.
III, September 1996.
[2] R. B. Wells, “Applied Coding and Information Theory for Engineers”, Prentice
Hall, 1999.
[3] B. Vucetic, “An Adaptive Coding Scheme for Time-Varying Channels”, IEEE
transactions on communications, Vol. 39, No. 5, May 1991, pp.653-663.
[4] J. Hagenauer and P. Hoher, “A Viterbi algorithm with soft-decision output and its
applications”, in Proc. IEEE Global Telecommunications Conf. (GLOBECOM),
Dallas, TX, November 1989, pp. 47.1.1-47.1.7.
[5] ISO/IEC 14496-2, “Information Technology – Coding of Audio-Visual Objects:
Visual”, 2001.
[6] ISO/IEC JTC1/SC29/WG11 N3908, “MPEG-4 Video Verification Model” version
18.0, January 2001/Pisa.
[7] Bing Du, Anthony Maeder and Miles Moody, "A new approach for error resilient
in video transmission using ECC", accepted by International Workshop on Very
Low Bit-rate Video, 18-19 September 2003, Madrid, Spain.
[8] Bing Du, Anthony Maeder and Miles Moody, “ECC video with Intra Frame
Relay”, accepted by IADIS International WWW/Internet 2003 Conference,
Algarve, Portugal, November 2003.
[9] Bing Du, M. Ghanbari, “ECC video and its performance in bursty channel errors”,
Proceedings of Iranian Conference on Electrical Engineering (ICEE) 2003, May 6-
8, 2003, Shiraz, Iran.
[10] Bing Du, M. Ghanbari, “ECC video in bursty channel errors and packet loss”,
Proceedings of Picture Coding Symposium (PCS) 2003, Saint-Malo, France, 23 -
25 April 2003. pp.99-103.
[Figure 8.1 Performance of ECC(11/12) for Salesman with random errors: PSNR versus frame number (1-49) for transmission-error free, ECC(11/12) with BER of 10^-4, ECC(11/12) with BER of 10^-3, Packetization(600) with BER of 10^-4, and Packetization(600) with BER of 10^-3.]
[Figure 8.2 Performance of ECC(11/12) for Akiyo with random errors: PSNR versus frame number (1-49) for transmission-error free, ECC(11/12) with BER of 10^-4 and 10^-3, and Packetization(600) with BER of 10^-4 and 10^-3.]
[Figure 8.3 Salesman with BER of 10^-2: PSNR versus frame number (1-49) for the error-free and ECC cases.]
[Figure 8.4 Akiyo with BER of 10^-2: PSNR versus frame number (1-49) for the error-free and ECC cases.]
Table 8-1 Bit number comparison between Packetization(600) and ECC(11/12) for Salesman

Frame    ECC(11/12)   Packetization(600)
0        77896        81168
1        8600         8744
2        14176        14120
3        16320        16256
4        15592        15464
5        12536        12568
6        8888         8872
7        8488         8520
8        9280         9312
9        9272         9328
10       10272        10344
11       9480         9480
12       8472         8456
13       9432         9496
14       10120        10088
15       10200        10080
16       8688         8640
17       9880         9920
18       14472        14320
19       16744        16624
20       14232        14224
21       10712        10904
22       10536        10696
23       9048         9032
24       6440         6440
25       5720         5712
26       7032         7032
27       7976         8024
28       6848         6840
29       5096         5080
30       5928         5912
31       6824         6848
32       7496         7624
33       8280         8416
34       11040        11120
35       13872        13848
36       16000        15920
37       14216        14088
38       14888        14880
39       13384        13416
40       11096        11120
41       11352        11328
42       10728        10648
43       10216        10248
44       10856        10840
45       11488        11600
46       9464         9520
47       8968         8976
48       8360         8432
49       7784         7856
Total    584688       588424
Average  11693.76     11768.48
Table 8-2 Bit number comparison between Packetization(600) and ECC(11/12) for Akiyo

Frame    ECC(11/12)   Packetization(600)
0        46792        50352
1        528          488
2        664          616
3        664          616
4        1120         1112
5        1048         1040
6        1112         1088
7        1296         1272
8        1536         1552
9        1728         1736
10       1624         1632
11       1448         1472
12       2064         2024
13       2016         1976
14       2880         2912
15       4168         4208
16       5760         5840
17       6984         7000
18       7752         7896
19       7616         7704
20       6824         6880
21       6296         6352
22       5224         5232
23       4344         4368
24       4296         4312
25       3176         3184
26       3096         3104
27       3808         3848
28       4808         4816
29       5496         5512
30       6008         6048
31       6056         6112
32       5160         5176
33       4320         4344
34       4664         4688
35       4976         5008
36       5216         5256
37       5080         5128
38       5280         5272
39       5352         5408
40       4832         4840
41       4808         4872
42       4296         4328
43       3736         3752
44       3784         3784
45       4384         4416
46       4760         4760
47       4984         5040
48       5072         5096
49       4480         4504
Total    243416       247976
Average  4868.32      4959.52
Table 8-3 Bit number comparison between Packetization(600) and ECC(9/10) for Salesman

Frame    ECC(9/10)    Packetization(600)
0        79336        81168
1        8760         8744
2        14440        14120
3        16624        16256
4        15880        15464
5        12768        12568
6        9048         8872
7        8640         8520
8        9448         9312
9        9440         9328
10       10464        10344
11       9656         9480
12       8624         8456
13       9600         9496
14       10304        10088
15       10384        10080
16       8848         8640
17       10064        9920
18       14744        14320
19       17048        16624
20       14488        14224
21       10912        10904
22       10728        10696
23       9208         9032
24       6560         6440
25       5824         5712
26       7160         7032
27       8128         8024
28       6968         6840
29       5192         5080
30       6040         5912
31       6952         6848
32       7640         7624
33       8432         8416
34       11248        11120
35       14128        13848
36       16296        15920
37       14480        14088
38       15168        14880
39       13632        13416
40       11304        11120
41       11560        11328
42       10928        10648
43       10400        10248
44       11064        10840
45       11704        11600
46       9640         9520
47       9128         8976
48       8520         8432
49       7928         7856
Total    595480       588424
Average  11909.6      11768.48
Table 8-4 Bit number comparison between Packetization(600) and ECC(9/10) for Akiyo

Frame    ECC(9/10)    Packetization(600)
0        47656        50352
1        536          488
2        680          616
3        680          616
4        1144         1112
5        1072         1040
6        1128         1088
7        1320         1272
8        1568         1552
9        1760         1736
10       1656         1632
11       1480         1472
12       2104         2024
13       2056         1976
14       2936         2912
15       4240         4208
16       5872         5840
17       7112         7000
18       7896         7896
19       7752         7704
20       6952         6880
21       6408         6352
22       5320         5232
23       4424         4368
24       4376         4312
25       3240         3184
26       3152         3104
27       3880         3848
28       4904         4816
29       5592         5512
30       6120         6048
31       6168         6112
32       5256         5176
33       4400         4344
34       4752         4688
35       5072         5008
36       5312         5256
37       5176         5128
38       5384         5272
39       5448         5408
40       4920         4840
41       4904         4872
42       4376         4328
43       3808         3752
44       3848         3784
45       4464         4416
46       4848         4760
47       5080         5040
48       5168         5096
49       4560         4504
Total    247960       247976
Average  4959.2       4959.52
Table 8-5 Bit number comparison between Packetization(600) and ECC(7/8) for Salesman

Frame    ECC(7/8)     Packetization(600)
0        81600        81168
1        9008         8744
2        14848        14120
3        17096        16256
4        16328        15464
5        13128        12568
6        9312         8872
7        8888         8520
8        9720         9312
9        9712         9328
10       10760        10344
11       9928         9480
12       8872         8456
13       9872         9496
14       10600        10088
15       10680        10080
16       9096         8640
17       10352        9920
18       15160        14320
19       17536        16624
20       14904        14224
21       11216        10904
22       11040        10696
23       9472         9032
24       6752         6440
25       5992         5712
26       7360         7032
27       8360         8024
28       7168         6840
29       5344         5080
30       6208         5912
31       7152         6848
32       7856         7624
33       8672         8416
34       11568        11120
35       14528        13848
36       16760        15920
37       14896        14088
38       15600        14880
39       14016        13416
40       11624        11120
41       11888        11328
42       11240        10648
43       10696        10248
44       11376        10840
45       12032        11600
46       9912         9520
47       9392         8976
48       8760         8432
49       8160         7856
Total    612440       588424
Average  12248.8      11768.48
Table 8-6 Bit number comparison between Packetization(600) and ECC(7/8) for Akiyo

Frame    ECC(7/8)     Packetization(600)
0        49016        50352
1        552          488
2        696          616
3        696          616
4        1168         1112
5        1096         1040
6        1160         1088
7        1352         1272
8        1608         1552
9        1808         1736
10       1704         1632
11       1520         1472
12       2160         2024
13       2112         1976
14       3016         2912
15       4360         4208
16       6032         5840
17       7312         7000
18       8120         7896
19       7976         7704
20       7152         6880
21       6592         6352
22       5472         5232
23       4544         4368
24       4496         4312
25       3328         3184
26       3240         3104
27       3984         3848
28       5040         4816
29       5752         5512
30       6288         6048
31       6344         6112
32       5408         5176
33       4528         4344
34       4880         4688
35       5216         5008
36       5456         5256
37       5320         5128
38       5536         5272
39       5608         5408
40       5056         4840
41       5040         4872
42       4496         4328
43       3912         3752
44       3960         3784
45       4592         4416
46       4984         4760
47       5224         5040
48       5312         5096
49       4688         4504
Total    254912       247976
Average  5098.24      4959.52
9 ECC VIDEO IN BURSTY CHANNEL ERRORS AND PACKET LOSS
It has been shown in previous chapters and in [18,19,20,21] that the ECC approach can
realize excellent video transmission in residual error conditions as poor as a BER of
10^-2, when ECC(7/8) enhanced with IFR is employed instead of the packetization
approach of the MPEG-4 video coding standard. However, these results were obtained
for Gaussian channels, so the performance of the proposed ECC approach in bursty
residual error and packet loss situations remains untested. In this chapter, the final
video bitstream is interleaved at the application layer to combat bursty channel errors
and packet losses.
To make the description clearer, we define two concepts here: bursty error and burst
loss. By bursty error, we mean an error condition in which a chunk (or burst) of bits of
a video bitstream is corrupted by errors at a very high BER (for instance 10^-1) during
the burst. The length of a bursty error refers to the length of the burst. By burst loss
(or packet loss), we mean a chunk (or burst) of bits of an encoded video bitstream that
is lost during transmission. The length of a burst loss (or packet loss) refers to the
length of the packet that is lost. If a burst loss occurs, the network needs to insert
dummy data into the bitstream where the loss happened. A burst loss can occur in a
frame or in a segment of a video bitstream.
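The two error conditions just defined can be sketched as simple channel models. The sketch below is illustrative only: the function names and the zero-valued dummy data are our choices, not prescribed by the text.

```python
import random

def apply_bursty_error(bits, burst_len, ber=0.1, start=None):
    """Bursty error: flip each bit inside a burst of length burst_len
    with probability ber (e.g. ber = 0.1 for a BER of 10^-1)."""
    if start is None:
        start = random.randrange(len(bits) - burst_len + 1)
    out = list(bits)
    for i in range(start, start + burst_len):
        if random.random() < ber:
            out[i] ^= 1
    return out

def apply_burst_loss(bits, loss_len, start=None):
    """Burst loss: a run of loss_len bits is lost in transit; the network
    substitutes dummy data (zeros here) so the stream keeps its length."""
    if start is None:
        start = random.randrange(len(bits) - loss_len + 1)
    out = list(bits)
    out[start:start + loss_len] = [0] * loss_len
    return out
```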
9.1 Performance of the original ECC approach with Bursty Residual Errors
To evaluate the performance of the original ECC schemes under bursty error conditions,
compared with the packetization approaches of the MPEG-4 standard [14,15], two
experiments are conducted using the video sequence Salesman. Because the goal of the
experiments is to test the performance of the ECC scheme with bursty errors, and
inter-frame error propagation effects can complicate the analysis, the effects of
inter-frame error propagation have been excluded. The first experiment is conducted
on the I frame only, while the other is conducted on the P frame following the first I
frame. In the first experiment, only one frame is encoded, in Intra mode, with the
length of the bursty error set to different values; in the second experiment only two
frames are encoded, the first in Intra mode followed by a P frame, and only the P frame
is corrupted by errors, again with the bursty error length set to different values. The
final results, represented by PSNRs, are the averages over 100 individual tests for each
bursty error length. The experiments are based on the following conditions:
1. The packet size of the encoded video sequences is 450 bits when packetization is used.
2. When ECC is employed, the rate-1/2 base convolutional code (561, 752), which has a
   constraint length of K = 9, is chosen. This base code is punctured to rate 9/10,
   which means that for every 9 bits in the encoded bitstream, one additional bit is
   added after convolutional encoding.
3. The error-corrupted convolutionally encoded bitstream is decoded using the
   soft-decision [9,10,11] Viterbi decoding algorithm with a trellis depth of 11 x K.
4. Data Partitioning and RVLC are employed in both the ECC and packetization
   experiments.
5. The same quantization parameters are used in all experiments, which means that
   correctly decoded bitstreams protected using ECC or packetization should have the
   same visual quality for the same video sequence in error-free environments.
6. In each test, the I or P frame of the encoded bitstream is randomly corrupted by a
   single burst of bursty errors (the start position of the burst within the frame is
   uniformly distributed). The BER within the burst is 10^-1.
7. After the corrupted bitstreams are decoded, the erroneous motion vectors and texture
   information are replaced by zeros, which means that when the motion vectors are not
   available, motion compensation is implemented using the motion vectors at the same
   position in the previous frame, and when the texture information is not available,
   the block in question is reconstructed using the texture information of the blocks
   located by the motion vectors.
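Condition 2 can be illustrated with a small sketch of the rate-1/2 (561, 752) encoder followed by a puncturing step. The particular puncturing pattern below is illustrative only; the exact pattern used in the experiments is not specified in the text.

```python
from itertools import cycle

G1, G2, K = 0o561, 0o752, 9   # generator polynomials (octal) and constraint length

def conv_encode(bits):
    """Rate-1/2 convolutional encoder: two coded bits per input bit."""
    state, out = 0, []
    for b in bits:
        state = ((state << 1) | b) & ((1 << K) - 1)
        out.append(bin(state & G1).count("1") % 2)
        out.append(bin(state & G2).count("1") % 2)
    return out

def puncture(coded, pattern):
    """Delete the coded bits marked 0 in the periodically repeated pattern."""
    return [c for c, keep in zip(coded, cycle(pattern)) if keep]

# Rate 9/10: 9 data bits produce 18 coded bits, of which 10 are kept.
PATTERN_9_10 = [1, 1] + [1, 0] * 8   # illustrative: 10 ones in 18 positions
```

Applied to 90 data bits, the encoder emits 180 coded bits and the puncturer keeps 100 of them, giving the 9/10 rate of condition 2.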
The final results are shown in Figure 9.1 and Figure 9.2. From the figures it can be
seen that, for the I frame, the performance of both the ECC and packetization schemes
is satisfactory when the length of the bursty errors is less than 40 bits. When the
length of the bursty errors increases further, the performance of the ECC scheme
declines rapidly, while the packetization approach still delivers quite good output.
For the P frame, however, the results are reversed; i.e. the ECC scheme is marginally
better than the packetization approach. But the performance of both the ECC approach
and the packetization scheme decreases rapidly as the length of the bursty errors
increases.
[Figure 9.1 PSNR of I picture with bursty errors: PSNR versus bursty error length (0-100 bits) for ECC(9/10) and Packetisation(480).]
[Figure 9.2 PSNR of P frame with bursty errors: PSNR versus bursty error length (0-100 bits) for ECC(9/10) and Packetisation(480).]
Obviously, for normal video communication, where most picture frames are encoded as P
frames following a small number of I frames, neither the original ECC approach nor the
packetization approach is good enough to cope with bursty errors. This conclusion calls
for a new error resilience tool to cope with bursty errors and packet loss. In the
following sections, the encoded video bitstream is interleaved after ECC is performed.
The results obtained by simulation are quite promising.
9.2 ECC Video with Interleaving
In this new scheme, an additional operation called interleaving is performed after the
compressed video bitstream has been further encoded using the punctured convolutional
code. A video communication system employing ECC with interleaving is shown in Figure
9.3, and the operation performed by the interleaver is shown in Figure 9.4. The
principle of interleaving is to spread a bursty error over a wide range by reordering
the video data that has gone through compression and ECC encoding, making it easier for
the convolutional decoder to correct the errors in the bitstream. More detailed
discussion of interleaving for convolutional decoding can be found in [12,13]. In our
experiments, m is chosen to be equal to the length of the bursty errors or the burst loss.
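The reordering of Figure 9.4 amounts to a standard n x m block interleaver: bits are written into the array column by column and read out row by row, so a burst of up to m consecutive channel errors is separated into single errors n positions apart after deinterleaving. A minimal sketch:

```python
def interleave(bits, n, m):
    """Write n*m bits into n rows column by column, read out row by row,
    as in Figure 9.4: row r carries bits r, n+r, 2n+r, ..., (m-1)n+r."""
    assert len(bits) == n * m
    return [bits[c * n + r] for r in range(n) for c in range(m)]

def deinterleave(bits, n, m):
    """Invert interleave(): write row by row, read out column by column."""
    assert len(bits) == n * m
    out = [None] * (n * m)
    it = iter(bits)
    for r in range(n):
        for c in range(m):
            out[c * n + r] = next(it)
    return out
```

After deinterleaving, a burst of m consecutive corrupted bits lands on positions n apart in the coded stream, which is exactly the kind of scattered error pattern a convolutional decoder handles well.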
[Figure 9.3 Video communication system with ECC and interleaving: Source -> Source Encoder -> ECC Encoder -> Interleaver -> Channel Encoder -> Channel -> Channel Decoder -> Deinterleaver -> ECC Decoder -> Source Decoder -> Display.]
Obviously, the interleaving operation for coping with bursty errors and burst loss can
only be effective when applied to a bitstream protected with ECC. It can only make
things worse if applied to the packetization approach: packetization has no capability
to correct errors in a bitstream, and spreading bursty errors over a wide range simply
means that more picture area will be affected by the errors when a packetization
approach is used.
[Figure 9.4 Interleaver for coded data: an array of n rows and m columns. Coded bits from the convolutional encoder are written in column by column, so that row r holds bits r, n+r, 2n+r, ..., (m-1)n+r; the array is then read out to channel coding row by row.]
9.3 Simulation Results
9.3.1 ECC video with bursty errors
To evaluate the effectiveness of the new proposal, tests are conducted with the video
sequence Salesman. The encoded video bitstreams are protected using ECC enhanced with
interleaving and packetization respectively. The test conditions are the same as those
listed in Section 9.1, except that the ECC base code is also punctured to rate 7/8 in
addition to the 9/10 rate when the ECC approaches are employed. The packet size is set
to 380 bits and 450 bits respectively when the packetization schemes are employed. The
bit rate of the bitstream employing Packetization(380) is roughly equal to that of the
bitstream employing ECC(7/8), as shown in Table 9-4.

The length of the segments for ECC coding is set to the average frame length of the
encoded bitstream over the 50 frames. The overhead in the final bitstream employing
Packetization(450) is more than that caused by ECC(9/10); see Table 9-2 and Table 9-3.
In each test, each segment is randomly corrupted by one burst of bursty errors after
the compressed video bitstream is either further encoded using an ECC scheme or
packetized with a resynchronization approach. The length of the bursty errors is fixed
at 360 bits with the BER set to 10^-1 during the burst, which roughly corresponds to a
10 ms transmission interval for this particular video sequence Salesman if the final
bit rate of the bitstream is 36 kbps; this is the toughest test condition used to
evaluate the performance of error resilience tools in MPEG-4 during its standardization
process [15]. The results are shown in Figure 9.5.
In the bursty error situation, the superiority of the ECC scheme enhanced with
interleaving over the packetization approach is clearly seen in Figure 9.5. Throughout
the sequence, the bitstream employing ECC(9/10) delivers a very good video output while
producing fewer bits than the bitstream employing Packetization(450). With a marginal
increase in overhead, ECC(7/8) delivers an excellent reconstructed video output, only
1 dB lower than in the transmission-error-free situation.
[Figure 9.5 Performance of Salesman with bursty errors: PSNR versus frame number (1-49) for error-free, ECC(7/8), ECC(9/10) and Packetization(380).]
Note: Because the packetization approach is unable to deliver a recognizable video output
under the same test conditions, we have set different test conditions for ECC and
packetization respectively. The bitstream employing ECC is corrupted by a burst in every
segment, which means there are around 8 bursts in the first frame, while the first frame
(I frame) of the bitstream employing packetization is corrupted by only one burst.
Under the test conditions specified above, neither Packetization(380) nor
Packetization(450) is able to deliver a viewable video output. To make a comparison
biased in favor of the packetization approach, another test is conducted in which the
first frame (an I frame) is corrupted by only one burst of bursty errors when
packetization is employed, whereas the I frame is corrupted by 8 bursts when the ECC
scheme is used. The results with packetization are still disappointing, as shown in
Figure 9.5: only the first two frames are viewable, with the rest rapidly declining to
unrecognizable. Though not depicted in Figure 9.5, our results also reveal that
reducing the packet size from 450 bits to 380 bits does not bring much improvement in
combating the bursty residual errors. The reason, as stated in Chapter 6, is that there
is more chance of the packet header and the DC or motion markers being corrupted as the
packet size gets smaller. Because the length of the bursty error is 360 bits, there is
no point in reducing the packet size below 360 bits; otherwise it is certain that one
of the packet headers will be corrupted, and consequently the whole packet will have to
be discarded.
9.3.2 ECC video with burst loss in a GPRS network
GPRS [1,2,3,4] is an end-to-end mobile packet radio communication system based on the
same radio architecture as GSM. The capability of multiple timeslot allocation in GPRS
networks effectively increases the throughput of a single terminal, which makes video
transmission over GPRS networks realistic [5,6,7].

GPRS radio blocks are arranged into GSM bursts for transmission across the radio
interface, where the Physical Link Layer is responsible for forward error detection and
correction. GPRS data is transmitted over the Packet Data Traffic Channel (PDTCH) and
is protected by four different channel coding schemes. CS-1, CS-2 and CS-3 [8] use
convolutional codes and block check sequences of differing strengths so as to give
different rates. CS-4 [8], on the other hand, provides only error detection
functionality and is certainly not good enough to be employed for video transmission.
The details of the four channel coding schemes are listed in Table 9-1.
Table 9-1 GPRS Channel Coding Schemes

Scheme   Code Rate   Radio Block (bits)   Data Rate (kb/s)
CS-1     1/2         181                  9.05
CS-2     2/3         268                  13.4
CS-3     3/4         312                  15.6
CS-4     1           428                  21.4
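The data rates in Table 9-1 follow from the radio block sizes, given that GPRS transmits one radio block every 20 ms (50 blocks per second); the block period is standard GPRS timing and is our addition, not stated in the table itself.

```python
BLOCKS_PER_SECOND = 50   # one GPRS radio block every 20 ms

radio_block_bits = {"CS-1": 181, "CS-2": 268, "CS-3": 312, "CS-4": 428}
data_rate_kbps = {cs: bits * BLOCKS_PER_SECOND / 1000
                  for cs, bits in radio_block_bits.items()}
# reproduces the Data Rate column of Table 9-1
```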
In our experiment, we assume that CS-1 is used. The goal of the experiment is to test
the performance of the ECC approach enhanced with interleaving when there is a burst
loss in every segment, corresponding to a loss of 181 bits per segment. Again, the
length of the segment for ECC coding is set to the average frame length of the final
video bitstream over the 50 frames. When the network detects a burst loss, dummy data
is inserted into the bitstream where the loss occurs. In each test, the same base
convolutional code (561, 752) is chosen. To obtain decent reconstructed video output,
the base convolutional code has to be punctured to rate 5/6, which means that one bit
is added for every 5 bits in the bitstream. When the packetization schemes are
employed, the packet size is chosen as 250, 350 and 450 bits respectively.
[Figure 9.6 Performance of Salesman with burst loss: PSNR versus frame number (1-49) for error-free, ECC(5/6) and Packetization(250).]
Note: Because the packetization approach is unable to deliver a recognizable video output
under the same test conditions as the ECC scheme, we have set different test conditions
for ECC and packetization respectively. Only one burst is lost in the first frame of the
bitstream employing the packetization scheme, while 8 bursts are lost during transmission
in the bitstream employing the ECC approach.
The results are shown in Figure 9.6. Table 9-5 and Table 9-6 compare the number of bits
between the different ECC schemes and the packetization approaches with different packet
sizes. Again, the packetization approaches fail to deliver a viewable reconstructed
video output regardless of the packet size under the specified test conditions. As in
the bursty error situations, to make the comparison biased in favor of the packetization
approaches, more favorable test conditions are set for packetization: only one burst is
lost in the first frame (the I frame) during transmission when packetization is
employed, while nearly 8 bursts are lost in the first frame when ECC and interleaving
are employed. The results are depicted in Figure 9.6. Still the packetization approach
is disappointing, as only the first two frames are viewable, with the rest rapidly
declining to unrecognizable.
In such poor residual error conditions, ECC(5/6) results in a 20% increase in overhead
in the final bitstream compared with the basic bitstream (here we define the basic
bitstream as one in which no error resilience tool, including packetization, Data
Partitioning, RVLC and ECC, is employed). This does not look very efficient, but under
such tough transmission conditions the ECC scheme does make video communication
possible, whereas it is impossible with the packetization approaches. The simulations
also reveal that reducing the packet size from 450 bits to 250 bits, which results in
the same bit rate as ECC(5/6), does not bring much improvement in the reconstructed
video output when the packetization schemes are employed. The reason is the same as in
the bursty error situation: reducing the packet size introduces too many markers, which
introduce too much vulnerability.
It is interesting to note that the packetized video output with burst loss in the last
experiment performs better than the packetized video output with bursty errors in the
previous experiments, at least for the first few picture frames. This should not be
surprising when one recalls that the burst loss has a length of 181 bits while the
bursty errors have a length of 360 bits, and that the packet size in the previous
experiment is 380 bits while in the last experiment it is 250 bits.
9.4 Discussion
In this chapter, interleaving following the ECC operation has been proposed to combat
bursty errors and packet loss in the final video bitstream. The interleaving is
performed within each segment of the final bitstream. It has been shown that SEC
achieved with ECC and enhanced with interleaving is more effective than the
packetization approach, not only in a Gaussian channel but also in more challenging
bursty error channels. It also copes with packet loss very well. It can therefore be
used to realize video communication in harsh environments where it would be impossible
with the packetization approach.
Unlike first error control, where diverse, mature hybrid ARQ schemes are available, it
is not realistic for SEC to employ an ARQ technique when ECC fails. SEC therefore
depends mainly on the proper design of an ECC scheme for the worst-case channel
conditions. Designing the ECC for the worst residual error conditions does not seem
very efficient when the residual error condition is good. But in our experience, a
modest increase in ECC redundancy improves the error-correcting capability
significantly. For instance, when the punctured convolutional code (561, 752) is
changed from rate 13/14 to rate 11/12, its capability to correct errors increases
dramatically. So even when the ECC is designed for the worst residual error conditions,
it is still quite efficient.
When a back channel from decoder to encoder is available, employing the IFR technique
can significantly improve the performance of video transmission under both bursty
errors and burst loss. More detail on IFR can be found in Chapter 7. From the
simulation results in Chapter 7, it is reasonable to expect that the PSNRs of the
Salesman video outputs reported in the last section, under both bursty errors and packet
loss, could be further improved to over 30 dB if IFR were employed in the simulation.
Another possibility for achieving more efficiency is to design a dynamic ECC scheme
that follows changes in the residual error condition at the application layer, if the
fading period of the channel lasts longer than one segment. This requires the
availability of back channel messages from the decoder.
One disadvantage of ECC with interleaving is that it introduces a decoding delay of one
segment, because the ECC operation and interleaving are based on each segment of the
final encoded video bitstream. To reduce the decoding delay, the segment needs to be
small; however, to increase the effectiveness of interleaving, the segment needs to be
longer. In practice a compromise must be made between these two contrary effects. In
our experiments the segment length has been set to the average length of a video frame.
It should be emphasized that ECC alone, based on segments, does not introduce any
transmission or decoding delay; it is only the interleaving that introduces the delay.
A packetization scheme also introduces a decoding delay of one packet. To reduce that
decoding delay, the packet should not be long; however, reducing the packet size
reduces the coding efficiency significantly. A compromise is also needed if a
packetization scheme is employed in practice.
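To put the delay trade-off in concrete terms, here is a rough back-of-the-envelope calculation combining two figures that appear in this work (the average Salesman frame length from Table 8-1 and the 36 kbps rate of Section 9.3.1); pairing them this way is our illustration, not a measurement from the text.

```python
# Interleaving delay of one segment when the segment length equals the
# average frame length and the channel runs at 36 kbps.
segment_bits = 11694        # ~ average Salesman frame length (Table 8-1)
bit_rate_bps = 36_000       # final bit rate assumed in Section 9.3.1
delay_ms = segment_bits / bit_rate_bps * 1000
print(f"per-segment interleaving delay: {delay_ms:.0f} ms")  # about 325 ms
```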
It has been identified that ECC based on video frames has its disadvantages (see
Chapter 8). However, it has advantages too: the interleaving operation based on each
frame of the bitstream is more effective than one based on segments, especially for an
I frame, if the delay requirement is not crucial. How the technique is employed in
practice needs to be flexible. When the main concern is bursty errors or packet loss
and the delay requirement is not critical, as in a real-time downloading application in
an ATM network [16,17], the ECC operation can be conducted on each video frame to
stretch the effectiveness of interleaving to its limit. Of course, IFR and NEWPRED
should also be employed with ECC and interleaving when a back channel is available, to
recover the decoding operation if an error occurs in the bitstream, especially when the
error occurs in a PSC (Picture Start Code). The simulation results shown in Figure 9.7
and Figure 9.8 demonstrate how powerful a different interleaving scheme can be. In
these simulations the interleaving is conducted on each frame instead of on each
segment, under the same test conditions as in Section 9.3 except that here there is a
bursty error or packet loss in each frame. Compared with the performance of
packetization under bursty errors and packet loss, the achievement of ECC with
interleaving is substantial.
The length of the bursty errors and the length of the packet loss are set to 360 bits
and 181 bits respectively in the simulations in this chapter. If these lengths are
increased further, the packetization approaches may become more competitive with the
ECC schemes. Theoretically, a packetization approach may be able to regain
synchronization within a frame when an error occurs, while the ECC approaches may
fail as the burst length or loss length grows. However, by the time the burst or
loss is long enough that the decoder can only recover by regaining synchronization
within a frame through a packetization scheme, the video output is already
unacceptable: few viewers would accept an output with large empty stripes across the
screen. In this sense, regaining synchronization within a frame does not help much
once the burst or loss is sufficiently long. It has already been shown that even
with a burst length of only 360 bits, the reconstructed video output rapidly
degrades to the point of being unrecognizable.
References
[1] J. Cai and D. J. Goodman, “General Packet Radio Service in GSM”, IEEE
Communications Magazine, vol.35, no.10, October 1997, pp. 122-131.
[2] G. Brasche and B. Walke, “Concepts, Services, and Protocols of the New GSM
Phase 2+ General Packet Radio Service”, IEEE Communications Magazine, vol.35,
no.8, August 1997, pp. 94-104.
[3] GSM 03.60 Digital Cellular Telecommunications System, General Packet Radio
Service (GPRS), Service description, Stage 2, 1997.
[4] GSM 03.64 Digital Cellular Telecommunications System, General Packet Radio
Service (GPRS), Overall description of the GPRS radio interface, Stage 2, 1997.
[5] Bing Du, A. Maeder and M. Moody, “A framework for live video delivery over
GPRS networks”, Proc. AMOC 2000, November 2000, Penang, Malaysia.
[6] Bing Du, A. Maeder and M. Moody, “Video delivery over mobile communication
channels”, Presentation at CRC-SS annual conference, Adelaide, Australia, 2000.
[7] Bing Du and Anthony Maeder, “Approaches to Video Transmission over GSM
Networks”, Proc. SAICSIT 99, South Africa.
[8] GSM 05.03 Digital Cellular Telecommunications System; Channel Coding, 1999.
[9] J. G. Proakis, “Digital Communications”, McGraw-Hill, 1995.
[10] L. H. Charles Lee, “Convolutional Coding – Fundamentals and applications”,
Artech House, 1997.
[11] R. B. Wells, “Applied Coding and Information Theory for Engineers”, Prentice-
Hall, 1999.
[12] J. L. Ramsey, “Realization of Optimum Interleavers”, IEEE Trans. Inform. Theory,
Vol. IT-16, 1970, pp. 338-345.
[13] G. D. Forney, Jr., “Burst Correcting Codes for the Classic Bursty Channel”, IEEE
Trans. Commun. Tech., vol. COM-19, October 1971, pp. 772-781.
[14] ISO/IEC 14496-2, “Information Technology – Coding of Audio-Visual Objects:
Visual”, 2001.
[15] ISO/IEC JTC1/SC29/WG11 N3908, “MPEG-4 Video Verification Model” version
18.0, January 2001/Pisa.
[16] Marc Boisseau, “An Introduction to ATM Technology”, International Thomson
Publishing, October 1995.
[17] Uyless Black, “ATM: Foundation for Broadband Networks”, Prentice Hall,
December 1998.
[18] Bing Du, Anthony Maeder and Miles Moody, “A new approach for error resilient
in video transmission using ECC”, accepted by International Workshop on Very
Low Bit-rate Video, 18-19 September 2003, Madrid, Spain.
[19] Bing Du, Anthony Maeder and Miles Moody, “ECC video with Intra Frame
Relay”, accepted by IADIS International WWW/Internet 2003 Conference,
Algarve, Portugal, November 2003.
[Figure: PSNR (dB) versus frame number for Salesman in bursty errors; curves: ECC with Interleaving, Packetisation, Error Free]
Figure 9.7 Performance of Salesman with bursty errors (the interleaving is based on frame)
[Figure: PSNR (dB) versus frame number for Salesman with burst loss; curves: ECC, Packetisation, Error Free]
Figure 9.8 Performance of Salesman with burst loss (the interleaving is based on frame)
Table 9-2 Bit number comparison between Packetization(450) and ECC (9/10) for Salesman
Frame    ECC (9/10)   Packetization (450)
0        79336        82880
1        8760         8856
2        14440        14344
3        16624        16536
4        15880        15760
5        12768        12848
6        9048         9088
7        8640         8704
8        9448         9480
9        9440         9504
10       10464        10504
11       9656         9640
12       8624         8640
13       9600         9704
14       10304        10376
15       10384        10272
16       8848         8792
17       10064        10032
18       14744        14552
19       17048        16848
20       14488        14464
21       10912        11080
22       10728        10824
23       9208         9200
24       6560         6544
25       5824         5784
26       7160         7160
27       8128         8112
28       6968         6984
29       5192         5208
30       6040         6048
31       6952         7008
32       7640         7728
33       8432         8488
34       11248        11344
35       14128        14056
36       16296        16224
37       14480        14424
38       15168        15048
39       13632        13648
40       11304        11360
41       11560        11504
42       10928        10784
43       10400        10392
44       11064        11040
45       11704        11848
46       9640         9776
47       9128         9144
48       8520         8616
49       7928         7968
Total    595480       599168
Average  11909.6      11983.36
Table 9-3 Bit number comparison between Packetization(450) and ECC (7/8) for Salesman
Frame    ECC (7/8)    Packetization (450)
0        81600        82880
1        9008         8856
2        14848        14344
3        17096        16536
4        16328        15760
5        13128        12848
6        9312         9088
7        8888         8704
8        9720         9480
9        9712         9504
10       10760        10504
11       9928         9640
12       8872         8640
13       9872         9704
14       10600        10376
15       10680        10272
16       9096         8792
17       10352        10032
18       15160        14552
19       17536        16848
20       14904        14464
21       11216        11080
22       11040        10824
23       9472         9200
24       6752         6544
25       5992         5784
26       7360         7160
27       8360         8112
28       7168         6984
29       5344         5208
30       6208         6048
31       7152         7008
32       7856         7728
33       8672         8488
34       11568        11344
35       14528        14056
36       16760        16224
37       14896        14424
38       15600        15048
39       14016        13648
40       11624        11360
41       11888        11504
42       11240        10784
43       10696        10392
44       11376        11040
45       12032        11848
46       9912         9776
47       9392         9144
48       8760         8616
49       8160         7968
Total    612440       599168
Average  12248.8      11983.36
Table 9-4 Bit number comparison between Packetization(380) and ECC(7/8) for Salesman
Frame    ECC (7/8)    Packetization (380)
0        81600        84248
1        9008         9032
2        14848        14704
3        17096        16944
4        16328        16072
5        13128        13096
6        9312         9232
7        8888         8920
8        9720         9656
9        9712         9736
10       10760        10704
11       9928         9824
12       8872         8856
13       9872         9856
14       10600        10560
15       10680        10512
16       9096         8968
17       10352        10272
18       15160        14800
19       17536        17312
20       14904        14744
21       11216        11320
22       11040        11112
23       9472         9512
24       6752         6640
25       5992         5928
26       7360         7360
27       8360         8256
28       7168         7144
29       5344         5288
30       6208         6248
31       7152         7184
32       7856         7880
33       8672         8712
34       11568        11656
35       14528        14304
36       16760        16480
37       14896        14696
38       15600        15416
39       14016        13888
40       11624        11592
41       11888        11832
42       11240        11072
43       10696        10632
44       11376        11296
45       12032        12104
46       9912         10000
47       9392         9344
48       8760         8752
49       8160         8192
Total    612440       611888
Average  12248.8      12237.76
Table 9-5 Bit number comparison between Packetization(450) and ECC(5/6) for Salesman
Frame    ECC (5/6)    Packetization (450)
0        85680        82880
1        9456         8856
2        15592        14344
3        17952        16536
4        17144        15760
5        13784        12848
6        9776         9088
7        9328         8704
8        10208        9480
9        10192        9504
10       11296        10504
11       10424        9640
12       9312         8640
13       10368        9704
14       11128        10376
15       11216        10272
16       9552         8792
17       10864        10032
18       15920        14552
19       18416        16848
20       15648        14464
21       11776        11080
22       11584        10824
23       9944         9200
24       7088         6544
25       6288         5784
26       7728         7160
27       8776         8112
28       7528         6984
29       5608         5208
30       6520         6048
31       7504         7008
32       8248         7728
33       9104         8488
34       12144        11344
35       15256        14056
36       17600        16224
37       15640        14424
38       16376        15048
39       14720        13648
40       12200        11360
41       12480        11504
42       11800        10784
43       11232        10392
44       11944        11040
45       12632        11848
46       10408        9776
47       9856         9144
48       9200         8616
49       8560         7968
Total    643000       599168
Average  12860        11983.36
Table 9-6 Bit number comparison between Packetization(250) and ECC(5/6) for Salesman
Frame    ECC (5/6)    Packetization (250)
0        85680        88376
1        9456         9576
2        15592        15432
3        17952        17600
4        17144        16968
5        13784        13704
6        9776         9704
7        9328         9368
8        10208        10176
9        10192        10216
10       11296        11256
11       10424        10328
12       9312         9304
13       10368        10312
14       11128        11080
15       11216        10984
16       9552         9416
17       10864        10720
18       15920        15544
19       18416        18008
20       15648        15592
21       11776        11968
22       11584        11600
23       9944         9976
24       7088         7056
25       6288         6264
26       7728         7688
27       8776         8632
28       7528         7472
29       5608         5576
30       6520         6552
31       7504         7560
32       8248         8336
33       9104         9160
34       12144        12160
35       15256        14984
36       17600        17408
37       15640        15448
38       16376        16184
39       14720        14584
40       12200        12336
41       12480        12376
42       11800        11608
43       11232        11168
44       11944        11784
45       12632        12640
46       10408        10480
47       9856         9880
48       9200         9264
49       8560         8616
Total    643000       642424
Average  12860        12848.48
10 ECC WITH PACKETIZATION
So far we have seen that in some extreme situations, the packetization approach is
unable to deliver a decent video output when decoding an error-corrupted video
bitstream, while the ECC scheme can always do so, albeit with a certain amount of
overhead. However, it may be unrealistic to increase the ECC power simply by
lowering the ECC rate, due to the limitation of the available channel capacity. It
is therefore interesting to investigate the possibility of combining the advantages
of the ECC and packetization approaches.
10.1 Combination of ECC and Packetization
The scheme combining ECC and packetization is quite straightforward: a
packetization scheme is employed first, followed by an ECC scheme applied to
segments of the final compressed video bitstream. The allocation of bits between
ECC and packetization should be optimized. The packet size should not be too small,
otherwise the overhead in the bitstream increases dramatically and reduces the
coding efficiency; nor should it be too big, otherwise the advantage of
packetization cannot be realized.
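The ordering described above can be sketched as follows; the helper names and the toy repetition "encoder" are hypothetical stand-ins for illustration only, not the thesis's actual rate-7/8 punctured convolutional encoder.

```python
# Hypothetical sketch of the combined scheme's ordering: packetization is
# applied inside the encoder first, and the resulting bitstream is then
# split into segments that are each protected with ECC.

def split_segments(bitstream, seg_len):
    """Split the packetized bitstream into fixed-length segments
    (the last segment may be shorter)."""
    return [bitstream[i:i + seg_len] for i in range(0, len(bitstream), seg_len)]

def protect(bitstream, seg_len, ecc_encode):
    """Apply segment-based ECC to an already-packetized bitstream.
    `ecc_encode` is a placeholder for, e.g., a punctured convolutional
    encoder."""
    return [ecc_encode(seg) for seg in split_segments(bitstream, seg_len)]

# Toy stand-in for an ECC encoder: a repetition code, used here only so
# the sketch runs end to end.
segments = protect([1, 0, 1, 1, 0, 0], seg_len=3, ecc_encode=lambda s: s + s)
```

The key design choice is that packetization adds resynchronization points before ECC, so any residual errors the ECC cannot correct are still confined by the packet structure.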
10.2 Simulation Results
In this experiment, the compressed video sequence Salesman is protected with three
different schemes in a packet loss situation: ECC(5/6), ECC(7/8), and ECC(7/8) plus
Packetization(5000). The packet loss is the same as in the last chapter, i.e. there
is one burst loss in every segment, corresponding to a loss of 181 consecutive bits
per segment. The segment length is set to the average frame length of the 50
compressed frames.
The packet size is set to 5000 bits, which means there are 2 or 3 packets in a P
frame on average, and about 17 packets in an I frame. The result is shown in
Figure 10.1, and the bit budget for each scheme is listed in Table 10-1. We already
know from the last chapter that packetization alone fails to deliver a viewable
video output. Figure 10.1 shows clearly that ECC(7/8) alone is also unsatisfactory.
Both ECC(5/6) and the combination of ECC(7/8) with packetization deliver
satisfactory output, but the combination approach achieves a marginally better PSNR
than ECC(5/6) alone while using fewer bits, which means the combination approach is
more efficient and effective than ECC(5/6) alone. This result provides a fresh view
of the packetization approach.
[Figure: PSNR (dB) versus frame number for Salesman with ECC plus packetization; curves: ECC (7/8) plus Packetization (5k), ECC (5/6), ECC (7/8)]
Figure 10.1 PSNR of ECC combined with packetization
From previous chapters it can be concluded that the ECC approach alone is better
than the packetization approach alone. However, the simulation above shows that
although packetization is not as effective as ECC, it can be used to increase the
performance of ECC at some cost in coding rate.
This raises two questions: when is it beneficial to use the combination of ECC and
packetization rather than ECC alone, and how should the available bit rate be
distributed between packetization and ECC when a combination is employed? These
questions remain as future work.
Adding redundancy to the video bitstream at the application layer in different ways
(resynchronization/packetization versus ECC) yields different degrees of error
resilience. The results in this work were obtained only by simulation; the
performance of these two fundamentally different approaches to error resilience
still needs to be tested and compared in a real-world field implementation.
When ECC is combined with packetization, IFR should not be used, as IFR is based on
the assumption that a whole frame contains only one packet, i.e. that packetization
is not used. In fact, when a back channel is available, ECC enhanced with IFR is
much more effective and efficient than ECC combined with packetization, as shown in
previous chapters and this chapter. When a back channel is not available, ECC with
packetization is an option.
So far, three error resilience tools, ECC, IFR and Interleaving, have been developed
based on the SEC approach, in addition to the error resilience tools in the MPEG-4
standard. Each tool has its own advantages and disadvantages. Generally the ECC
approach is superior to the packetization approach, as the former is active and the
latter passive. The tools can be used alone or in combination with others. How, and
in which combination, they are employed in practice needs to be flexible and
optimized according to each application's characteristics and the transmission error
patterns.
Table 10-1 Bit number comparison when ECC and Packetization are combined
Frame    ECC(7/8)+Pack(5K)  Packetization(450)  ECC(5/6)  ECC(7/8)
0        86568              82880               85680     81600
1        9248               8856                9456      9008
2        15184              14344               15592     14848
3        17376              16536               17952     17096
4        16552              15760               17144     16328
5        13384              12848               13784     13128
6        9464               9088                9776      9312
7        9032               8704                9328      8888
8        9888               9480                10208     9720
9        9888               9504                10192     9712
10       10952              10504               11296     10760
11       10104              9640                10424     9928
12       9024               8640                9312      8872
13       10080              9704                10368     9872
14       10744              10376               11128     10600
15       10816              10272               11216     10680
16       9200               8792                9552      9096
17       10488              10032               10864     10352
18       15296              14552               15920     15160
19       17904              16848               18416     17536
20       15184              14464               15648     14904
21       11456              11080               11776     11216
22       11272              10824               11584     11040
23       9640               9200                9944      9472
24       6888               6544                7088      6752
25       6072               5784                6288      5992
26       7480               7160                7728      7360
27       8528               8112                8776      8360
28       7296               6984                7528      7168
29       5368               5208                5608      5344
30       6320               6048                6520      6208
31       7328               7008                7504      7152
32       8048               7728                8248      7856
33       8872               8488                9104      8672
34       11840              11344               12144     11568
35       14856              14056               15256     14528
36       17032              16224               17600     16760
37       15080              14424               15640     14896
38       15824              15048               16376     15600
39       14336              13648               14720     14016
40       11840              11360               12200     11624
41       12104              11504               12480     11888
42       11336              10784               11800     11240
43       10832              10392               11232     10696
44       11528              11040               11944     11376
45       12304              11848               12632     12032
46       10112              9776                10408     9912
47       9552               9144                9856      9392
48       8944               8616                9200      8760
49       8312               7968                8560      8160
Total    626776             599168              643000    612440
Average  12535.52           11983.36            12860     12248.8
11 CONCLUSIONS AND FUTURE WORK
This dissertation has explored the possibility of optimizing the utilization of shared
scarce radio channels for live video transmission over a GSM network in the first three
chapters, then concentrated on realizing error resilient video communication in
unfavorable channel conditions, especially in mobile radio channels.
11.1 Optimized utilization of radio channels
To improve the utilization of the scarce radio channel resources for live video
transmission over a GPRS network, the necessary modifications to the current network
protocols have been identified and several suggestions on how they can be achieved
have been proposed. The most important contribution is the proposed new method of
updating the channel capacity to accommodate the different data rate requirements of
the different frame types of the compressed video bitstream during a live video
communication session. The reconfiguration of the multi-slot allocation to exceed
the current set of active channels should be achieved by communication between the
MS and the BSS, rather than by re-accessing the PRACH during real-time transmission,
which would involve further contention. The content of this communication should be
embedded in the video data transmitted from the MS to the BSS.
11.2 The proposed error resilience video coding tools
To cope with residual transmission errors, a new concept for error resilience
employing Second Error Control (SEC) has been introduced. The simulation results
from our research have demonstrated the success of the SEC concept, which opens a
new direction in the field of video coding and transmission and other real-time
communications.
Throughout the previous chapters, three error resilient video coding tools, ECC, IFR
and Interleaving, have been proposed based on the SEC approach. ECC can be used in
random error situations; enhanced with Interleaving, it can also be used in bursty
error and packet loss situations. IFR can be used in either case, when a back
channel is available, to further improve the performance of ECC. Compared with
traditional error resilience techniques, these three tools are very effective at
protecting I frames, which is very important and often ignored by many researchers.
The original error resilience tools in the MPEG-4 standard, including Data
Partitioning and RVLC, can still be used with these new tools. However, if Data
Partitioning is used with an ECC scheme without employing packetization, a decoding
delay of one frame is introduced. Also, although RVLC can improve the error
robustness of a video bitstream, its cost is not small, as can be seen clearly from
Table 11-2 and Table 11-3: the use of RVLC increases the bit rate by 1.13% for
Salesman and 1.89% for Akiyo. This contradicts the claim made by some researchers
that RVLC achieves its goal with little or no loss of coding efficiency.
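These percentages follow directly from the bit totals in Tables 11-2 and 11-3, as a quick check confirms (values copied from those tables):

```python
# Bit totals copied from Table 11-2 (Salesman) and Table 11-3 (Akiyo).
salesman_rvlc, salesman_basic = 541480, 535424
akiyo_rvlc, akiyo_basic = 226816, 222624

salesman_increase = (salesman_rvlc - salesman_basic) / salesman_basic * 100
akiyo_increase = (akiyo_rvlc - akiyo_basic) / akiyo_basic * 100
# approximately 1.13 % and 1.9 % respectively
```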
In practice, the employment of these three tools needs to be flexible and optimized.
In random error situations (e.g. ISDN), ECC alone should be enough, while in bursty
error situations such as wireless environments, ECC should be enhanced with
interleaving. In stable residual error conditions the use of IFR is less important,
as a properly designed ECC scheme matched to the residual error condition can
correct almost all residual errors in the bitstream; in unstable residual error
conditions, however, IFR is crucial to reduce the 'half image' effect. If the errors
are mainly bursty or caused by packet loss, interleaving should be based on frames
rather than on segments.
Employing these new tools, this research relaxes the traditional end-to-end bit
error rate requirement for video communication from 10^-5 to 10^-2 without the need
for channel coding. Of course, channel coding is an integral part of any network, so
it is reasonable to expect that in reality the new approaches will perform even
better. The proposed schemes are also very effective at coping with both bursty
errors and packet loss.
Most importantly, when the BER of a final bitstream deteriorates to 10^-3 in random
error situations, the ECC approach can still deliver a video output that erases all
residual error effects in the final bitstream, while packetization fails to deliver
a recognizable video output.
Now it is time for a more general comparison between the tools proposed in this
thesis and the tools in the standards. In circuit switched networks such as ISDN,
the FEC [3] in H.263 is more efficient and effective, as it can correct one error
bit and detect two error bits in a block of 492 bits with only a 4% increase in bit
rate. Packetization combined with Data Partitioning and RVLC adds more than 9.9% to
the data rate when the packet size is set to 600 bits, yet is less effective than
the FEC in H.263, because the error resilience tools in MPEG-4 [2] are fundamentally
passive and have no capability to correct errors in the encoded video bitstream.
The superiority of ECC over FEC for error resilience has been identified [5] (also
see Chapter 6). In packet switched networks, where packet loss can happen often, it
has been shown [1,4] that the ECC approach enhanced with interleaving is more
effective at combating packet loss. In mobile networks, it has been shown in
Chapter 6 and Chapter 8 that when the BER of the encoded video bitstream reaches
10^-4, the reconstructed video quality with packetization is generally unacceptable
[5], while ECC video still delivers decent reconstructed output even when the BER of
the final video bitstream reaches 10^-2. For bursty channel errors, [1,4] also show
that the ECC approach enhanced with interleaving is much more effective than the
packetization approach.
Table 11-1 Performance comparison for Salesman
Channel Conditions   Resilience Scheme (ECC / Pack)   Performance (ECC / Pack)   Average Number of Bits (ECC / Pack)   Increase of Number of Bits (ECC / Pack)
Random 10^-4         11/12 / 600                      32.53 / 23.31              11693 / 11768                         9.2% / 9.9%
Random 10^-3         11/12 / 600                      32.06 / 9.27               11693 / 11768                         9.2% / 9.9%
Bursty error         7/8 / 380                        31.32 / 16.93              12248 / 12237                         14.38% / 14.28%
Burst loss           5/6 / 250                        26.52 / 17.42              12860 / 12848                         20.09% / 20%
Note:
1. 'Pack' in the table represents packetization; 'Resilience Scheme' represents
the error resilience scheme.
2. The performance of the schemes is evaluated in terms of PSNR, taken as the
average over the 50 frames.
3. The average number of bits is the average, over the 50 frames, of the number of
bits per frame.
4. The average PSNR of the 50 frames of Salesman with error-free transmission is
32.63.
5. The PSNRs with bursty errors and burst loss are obtained on a segment basis,
i.e. there is one burst error or burst loss in every segment. Each segment with
ECC is further encoded with a punctured convolutional code, while each segment
with packetization is left as it is before being exposed to the channel errors.
6. The increase in the number of bits is calculated relative to the basic
bitstream.
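The frame-averaged PSNR used throughout these comparisons follows the standard definition for 8-bit video; the code below is a generic illustration, not taken from the thesis's simulation software.

```python
import math

def psnr(original, reconstructed, peak=255):
    """PSNR in dB between two equal-length 8-bit pixel sequences:
    10 * log10(peak^2 / MSE)."""
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return float('inf')                 # identical frames
    return 10 * math.log10(peak ** 2 / mse)

def average_psnr(frame_pairs, peak=255):
    """Sequence-level figure, as in Table 11-1: the mean of the
    per-frame PSNRs."""
    values = [psnr(o, r, peak) for o, r in frame_pairs]
    return sum(values) / len(values)
```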
For a more detailed, concrete and direct comparison of the different approaches, the
results achieved with the newly proposed error resilience tools in the previous
chapters are summarized in Table 11-1.
From Table 11-1 it can be seen that in all situations the ECC approaches outperform
the packetization schemes in terms of the quality of the reconstructed video output.
In the random error situations the ECC approaches are also more efficient than the
packetization approaches, in the sense that fewer bits are used to encode the video
sequences. In bursty error or burst loss situations the packetization schemes
produce approximately the same final bit rates as the corresponding ECC schemes, but
the reconstructed video outputs delivered by the packetization approaches are
unrecognizable, while ECC always delivers excellent output in both situations with a
negligible sacrifice of coding efficiency.
Another important fact is that when the ECC code rate is lowered, for instance from
11/12 to 9/10 or from 7/8 to 5/6, the capability of the ECC schemes to combat both
random and bursty errors improves dramatically. In contrast, when the packet size is
reduced from 450 bits to as little as 250 bits, the capability of the packetization
schemes to combat bursty errors or burst loss does not improve much. As mentioned in
Chapter 6, once the packet size has been reduced to a saturation point, reducing it
further does not improve the effectiveness of packetization, since the additional
markers (resynchronization markers, DC markers and motion markers) introduce more
vulnerability. Even with random errors, once the packet size reaches this saturation
point, further reduction only increases the bit count without increasing the
effectiveness of the packetization schemes. Although no experiments have been done
on ECC rates stronger than 5/6, it is reasonable to expect that lowering the rate
further to 3/4 or 2/3 would increase the effectiveness of ECC significantly; the
only problem is that the overhead in the final video bitstream would increase
significantly as well. In some extreme situations, however, this might be the only
option.
The final conclusion based on the simulation results of the previous chapters can be
summarized as follows. The active error resilience SEC approach, realized with ECC,
is much more effective and efficient at combating random residual errors in the
final bitstream for real-time applications than the passive error resilience tools,
represented by the packetization or resynchronization approach, in the current video
coding standards. Enhanced with interleaving, ECC is also very effective against
both bursty errors and packet loss, and the ECC schemes can be further improved with
IFR. More importantly, the SEC approach is simpler and more easily implemented than
current error resilience techniques. This conclusion also calls for a re-examination
of MPEG-4 and other video coding standards.
Some simple and direct applications of the research output will include mobile video
telephony, wireless video surveillance, remote video conferencing, remote medical
imaging and wireless multimedia communication.
11.3 Future Research Directions
Future work can be focused in the following directions.
The MAC (Medium Access Control) protocol for GPRS or other networks needs to be
modified to make live video communication over them more efficient. Instead of
allocating radio resources through contention among applications, the network should
have a mechanism to update the channel resource allocation for live video
communication automatically and periodically, to accommodate the transmission needs
of the I frames of the video bitstream.
For ECC video, the coding efficiency can be further improved if a dynamic ECC scheme
is designed and implemented for video communication in mobile environments, so that
the ECC rate can track changes in the residual error conditions.
The distribution of error control between the first error control and the SEC needs
to be optimized, as does the distribution of the available radio channel bandwidth
among source coding, first error control and SEC. A generic rate control algorithm
based on these optimal distributions would be more effective and efficient. More
effective first error control schemes, including FEC and ARQ at the data link layer,
need to be further improved and investigated, taking the second error control at the
application layer into consideration.
More accurate and effective error detection techniques during the video decoding
process need to be investigated to improve the effectiveness of the IFR and error
concealment.
To cope with a wide range of residual error conditions, optimal puncturing patterns
for good base convolutional codes need to be explored further. As stated in
Chapter 4, the highest reported punctured code rate for the base code (171,133) is
16/17, while the highest for the base code (561,752) is 13/14. Puncturing patterns
for higher code rates of 14/15, 15/16, 16/17, 17/18, 18/19, etc. need to be found
for the base code (561,752) or other good base codes, including those with
constraint length longer than 9, as higher rate codes will make ECC video more
efficient in favorable channel conditions.
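As a sketch of how a puncturing pattern raises the rate of a rate-1/2 mother code (a generic illustration; the actual patterns for (171,133) and (561,752) are those reported in Chapter 4): the serialized encoder output is masked with a periodic keep/drop pattern, and keeping n of the 2k mother-code bits per k input bits yields rate k/n.

```python
def puncture(coded_bits, pattern):
    """`coded_bits` is the serialized rate-1/2 encoder output (the two
    output streams alternating); `pattern` is a flat 0/1 keep-mask of
    length 2 * period. Only the kept bits are transmitted."""
    return [b for i, b in enumerate(coded_bits)
            if pattern[i % len(pattern)] == 1]

# Example: rate 2/3 from rate 1/2 with the pattern [1, 1, 1, 0]
# (transmit both bits of the first output pair, only the first bit of
# the next pair): 2 input bits -> 4 mother bits -> 3 transmitted bits.
kept = puncture([1, 0, 1, 1, 0, 0, 1, 1], [1, 1, 1, 0])
```

The decoder reinserts erasures at the punctured positions before Viterbi decoding, which is what makes a single mother code serve a whole family of rates.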
The optimum combination of ECC and packetization in some extreme situations
(mainly in bursty channel errors and packet loss) needs to be further investigated,
although in most situations ECC alone should be a first choice, especially for random
channel errors.
Convolutional codes are not the only way to realize SEC; the possibility of using
other error correction codes, such as Turbo codes [6], needs to be explored as well.
Actually, ECC video is only one application of SEC. Applying the SEC approach to
other applications, both real-time and non-real-time, can also be considered.
Another crucial research direction is the design of high speed chips performing the
computation for punctured convolutional coding, to make convolutional codes with
constraint length longer than 9 realistic and economic for real-time applications.
This would improve the performance of ECC and SEC significantly and make the
application of SEC to other real-time communications more efficient and effective.
References
[1] Bing Du, M. Ghanbari, “ECC video and its performance in bursty channel errors”,
Proceedings of Iranian Conference on Electrical Engineering (ICEE) 2003, May 6-
8, 2003, Shiraz, Iran.
[2] ISO/IEC 14496-2, “Information Technology – Coding of Audio-Visual Objects:
Visual”, 2001.
[3] ITU-T H.263 “Video coding for low bit rate communication”, 1998.
[4] Bing Du, M. Ghanbari, “ECC video in bursty channel errors and packet loss”,
Proceedings of Picture Coding Symposium 2003, Saint-Malo, France, 23 - 25 April
2003, pp.99-103.
[5] Bing Du, Anthony Maeder and Miles Moody, “A new approach for error resilient
in video transmission using ECC”, accepted by International Workshop on Very
Low Bit-rate Video, 18-19 September 2003, Madrid, Spain.
[6] L. Hanzo, T. H. Liew, B. L. Yeap, “Turbo Coding, Turbo Equalisation and Space-
Time Coding for Transmission over Fading Channels”, Wiley Europe, July 2002.
Table 11-2 Bit number comparison between basic and RVLC for Salesman
Frame    RVLC     Basic
0        73744    71392
1        7992     7872
2        13112    12984
3        15032    14952
4        14400    14280
5        11592    11480
6        8192     8136
7        7824     7768
8        8568     8496
9        8560     8488
10       9504     9408
11       8752     8680
12       7808     7752
13       8712     8632
14       9336     9264
15       9384     9336
16       7968     7952
17       9080     9048
18       13288    13256
19       15440    15336
20       13152    13032
21       9936     9808
22       9768     9648
23       8360     8280
24       5944     5896
25       5240     5232
26       6464     6432
27       7368     7304
28       6296     6264
29       4688     4664
30       5456     5424
31       6328     6248
32       6968     6864
33       7672     7576
34       10216    10112
35       12840    12704
36       14760    14656
37       13088    13024
38       13728    13640
39       12384    12256
40       10232    10160
41       10456    10392
42       9840     9824
43       9392     9352
44       10032    9944
45       10616    10520
46       8744     8664
47       8280     8208
48       7744     7656
49       7200     7128
Total    541480   535424
Average  10829.6  10708.48
Table 11-3 Bit number comparison between basic and RVLC for Akiyo
Frame    RVLC     Basic
0        44080    42880
1        488      472
2        616      600
3        616      600
4        1040     1016
5        976      952
6        1024     1008
7        1200     1176
8        1432     1400
9        1608     1576
10       1504     1480
11       1344     1320
12       1904     1880
13       1856     1840
14       2680     2632
15       3864     3808
16       5392     5272
17       6496     6392
18       7280     7096
19       7080     6968
20       6328     6248
21       5840     5760
22       4848     4776
23       4032     3968
24       3976     3928
25       2960     2904
26       2872     2824
27       3544     3480
28       4456     4400
29       5120     5024
30       5592     5496
31       5664     5544
32       4792     4720
33       4008     3952
34       4344     4264
35       4616     4552
36       4848     4768
37       4712     4648
38       4888     4832
39       4984     4896
40       4480     4416
41       4472     4400
42       3992     3928
43       3472     3416
44       3496     3456
45       4080     4008
46       4416     4352
47       4640     4560
48       4704     4640
49       4160     4096
Total    226816   222624
Average  4536.32  4452.48