
The International Journal on Advances in Telecommunications is published by IARIA.

ISSN: 1942-2601

journals site: http://www.iariajournals.org

contact: [email protected]

Responsibility for the contents rests upon the authors and not upon IARIA, nor on IARIA volunteers,

staff, or contractors.

IARIA is the owner of the publication and of editorial aspects. IARIA reserves the right to update the

content for quality improvements.

Abstracting is permitted with credit to the source. Libraries are permitted to photocopy or print,

providing the reference is mentioned and that the resulting material is made available at no cost.

Reference should mention:

International Journal on Advances in Telecommunications, issn 1942-2601

vol. 3, no. 1 & 2, year 2010, http://www.iariajournals.org/telecommunications/

The copyright for each included paper belongs to the authors. Republishing of same material, by authors

or persons or organizations, is not allowed. Reprint rights can be granted by IARIA or by the authors, and

must include proper reference.

Reference to an article in the journal is as follows:

<Author list>, “<Article title>”

International Journal on Advances in Telecommunications, issn 1942-2601

vol. 3, no. 1 & 2, year 2010, <start page>:<end page> , http://www.iariajournals.org/telecommunications/

IARIA journals are made available for free, providing the appropriate references are made when their content is used.

Sponsored by IARIA

www.iaria.org

Copyright © 2010 IARIA


International Journal on Advances in Telecommunications

Volume 3, Number 1 & 2, 2010

Editor-in-Chief

Tulin Atmaca, IT/Telecom&Management SudParis, France

Editorial Advisory Board

Michael D. Logothetis, University of Patras, Greece

Jose Neuman De Souza, Federal University of Ceara, Brazil

Eugen Borcoci, University "Politehnica" of Bucharest (UPB), Romania

Reijo Savola, VTT, Finland

Haibin Liu, Aerospace Engineering Consultation Center-Beijing, China

Advanced Telecommunications

Tulin Atmaca, IT/Telecom&Management SudParis, France

Rui L.A. Aguiar, Universidade de Aveiro, Portugal

Eugen Borcoci, University "Politehnica" of Bucharest (UPB), Romania

Symeon Chatzinotas, University of Surrey, UK

Denis Collange, Orange-ftgroup, France

Todor Cooklev, Indiana-Purdue University - Fort Wayne, USA

Jose Neuman De Souza, Federal University of Ceara, Brazil

Sorin Georgescu, Ericsson Research, Canada

Paul J. Geraci, Technology Survey Group, USA

Christos Grecos, University of Central Lancashire - Preston, UK

Manish Jain, Microsoft Research – Redmond

Michael D. Logothetis, University of Patras, Greece

Natarajan Meghanathan, Jackson State University, USA

Masaya Okada, ATR Knowledge Science Laboratories - Kyoto, Japan

Jacques Palicot, SUPELEC- Rennes, France

Gerard Parr, University of Ulster in Northern Ireland, UK

Maciej Piechowiak, Kazimierz Wielki University - Bydgoszcz, Poland

Dusan Radovic, TES Electronic Solutions - Stuttgart, Germany

Matthew Roughan, University of Adelaide, Australia

Sergei Semenov, Nokia Corporation, Finland

Carlos Becker Westphal, Federal University of Santa Catarina, Brazil

Rong Zhao, Detecon International GmbH - Bonn, Germany

Piotr Zwierzykowski, Poznan University of Technology, Poland

Digital Telecommunications


Bilal Al Momani, Cisco Systems, Ireland

Tulin Atmaca, IT/Telecom&Management SudParis, France

Claus Bauer, Dolby Systems, USA

Claude Chaudet, ENST, France

Gerard Damm, Alcatel-Lucent, France

Michael Grottke, Universitat Erlangen-Nurnberg, Germany

Yuri Ivanov, Movidia Ltd. – Dublin, Ireland

Ousmane Kone, UPPA - University of Bordeaux, France

Wen-hsing Lai, National Kaohsiung First University of Science and Technology, Taiwan

Pascal Lorenz, University of Haute Alsace, France

Jan Lucenius, Helsinki University of Technology, Finland

Dario Maggiorini, University of Milano, Italy

Pubudu Pathirana, Deakin University, Australia

Mei-Ling Shyu, University of Miami, USA

Communication Theory, QoS and Reliability

Eugen Borcoci, University "Politehnica" of Bucharest (UPB), Romania

Piotr Cholda, AGH University of Science and Technology - Krakow, Poland

Michel Diaz, LAAS, France

Ivan Gojmerac, Telecommunications Research Center Vienna (FTW), Austria

Patrick Gratz, University of Luxembourg, Luxembourg

Axel Kupper, Ludwig Maximilians University Munich, Germany

Michael Menth, University of Wuerzburg, Germany

Gianluca Reali, University of Perugia, Italy

Joel Rodrigues, University of Beira Interior, Portugal

Zary Segall, University of Maryland, USA

Wireless and Mobile Communications

Tommi Aihkisalo, VTT Technical Research Center of Finland - Oulu, Finland

Zhiquan Bai, Shandong University - Jinan, P. R. China

David Boyle, University of Limerick, Ireland

Bezalel Gavish, Southern Methodist University - Dallas, USA

Xiang Gui, Massey University-Palmerston North, New Zealand

David Lozano, Telefonica Investigacion y Desarrollo (R&D), Spain

D. Manivannan (Mani), University of Kentucky - Lexington, USA

Himanshukumar Soni, G H Patel College of Engineering & Technology, India

Radu Stoleru, Texas A&M University, USA

Jose Villalon, University of Castilla La Mancha, Spain

Natalija Vlajic, York University, Canada

Xinbing Wang, Shanghai Jiaotong University, China

Ossama Younis, Telcordia Technologies, USA


Systems and Network Communications

Fernando Boronat, Integrated Management Coastal Research Institute, Spain

Anne-Marie Bosneag, Ericsson Ireland Research Centre, Ireland

Huaqun Guo, Institute for Infocomm Research, A*STAR, Singapore

Jong-Hyouk Lee, INRIA, France

Elizabeth I. Leonard, Naval Research Laboratory – Washington DC, USA

Sjouke Mauw, University of Luxembourg, Luxembourg

Reijo Savola, VTT, Finland

Multimedia

Dumitru Dan Burdescu, University of Craiova, Romania

Noel Crespi, Institut TELECOM SudParis-Evry, France

Mislav Grgic, University of Zagreb, Croatia

Christos Grecos, University of Central Lancashire, UK

Atsushi Koike, Seikei University, Japan

Polychronis Koutsakis, McMaster University, Canada

Chung-Sheng Li, IBM Thomas J. Watson Research Center, USA

Artur R. Lugmayr, Tampere University of Technology, Finland

Parag S. Mogre, Technische Universitat Darmstadt, Germany

Chong Wah Ngo, University of Hong Kong, Hong Kong

Justin Zhan, Carnegie Mellon University, USA

Yu Zheng, Microsoft Research Asia - Beijing, China

Space Communications

Emmanuel Chaput, IRIT-CNRS, France

Alban Duverdier, CNES (French Space Agency) Paris, France

Istvan Frigyes, Budapest University of Technology and Economics, Hungary

Michael Hadjitheodosiou, ITT AES & University of Maryland, USA

Mark A Johnson, The Aerospace Corporation, USA

Massimiliano Laddomada, Texas A&M University-Texarkana, USA

Haibin Liu, Aerospace Engineering Consultation Center-Beijing, China

Elena-Simona Lohan, Tampere University of Technology, Finland

Gerard Parr, University of Ulster-Coleraine, UK

Cathryn Peoples, University of Ulster-Coleraine, UK

Michael Sauer, Corning Incorporated/Corning R&D division, USA

Additional reviewers

Vasilis Stilianakis, University of Patras, Greece


International Journal on Advances in Telecommunications

Volume 3, Numbers 1 & 2, 2010

CONTENTS

Narrowband Interference Suppression for MIMO MB-OFDM UWB Communication

Systems

Georgi Iliev, Technical University of Sofia, Bulgaria

Zlatka Nikolova, Technical University of Sofia, Bulgaria

Vladimir Poulkov, Technical University of Sofia, Bulgaria

Miglen Ovtcharov, Technical University of Sofia, Bulgaria

1 - 8

Efficient Variable Block Size Selection for AVC Low Bitrate Applications

Ihab Amer, Advanced Technology Information Processing Systems, Canada

Graham Jullien, Advanced Technology Information Processing Systems, Canada

Wael Badawy, IntelliView Technologies Inc., Canada

Adrian Chirila-Rus, Xilinx Inc., USA

Robert Turney, Xilinx Inc., USA

Rana Hamed, German University in Cairo (GUC), Egypt

9 - 27

Exploiting Concatenation in the Design of Low-Density Parity-Check Codes

Marco Baldi, Polytechnic University of Marche, Italy

Giovanni Cancellieri, Polytechnic University of Marche, Italy

Franco Chiaraluce, Polytechnic University of Marche, Italy

28 - 38

An Iterative Algorithm for Compression of Correlated Sources at Rates Approaching the

Slepian-Wolf Bound: Theory and Analysis

F. Daneshgaran, California State University, USA

M. Laddomada, Texas A&M University-Texarkana, USA

M. Mondin, Politecnico di Torino, Italy

39 - 48

Scalable and Robust Wireless JPEG 2000 Images and Video Transmission with Adaptive

Bandwidth Estimation

Max Agueh, LACSC - ECE, France

Stefan Ataman, LACSC - ECE, France

Cristina Mairal, LACSC - ECE, France

Henoc Soude, LIASD - Université Paris 8, France

49 - 58

Modelling of Mobile Workflows with UML

Michael Decker, University of Karlsruhe (TH), Germany

59 - 71

Network Prediction for Energy-Aware Transmission in Mobile Applications

Ramya Sri Kalyanaraman, Helsinki Institute for Information Technology HIIT, Finland

Yu Xiao, Aalto University, Finland

Antti Ylä-Jääski, Aalto University, Finland

72 - 82

Highly Accurate Location-Aware Information Delivery With PosPush

Zhao Junhui, NEC Laboratories, China

Wang Yongcai, NEC Laboratories, China

83 - 92

Micro-Mobility Solution Based on Intra-domain multicast and Congestion Avoiding for

Two-Nodes Mobile IP Network

Yacine Benallouche, Université de Versailles Saint-Quentin, France

Dominique Barth, Université de Versailles Saint-Quentin, France

93 - 103

Analysis of Communication Overhead for a Clustering-Based Security Protocol in Ad Hoc

Networks

C. Maghmoumi, University of Haute-Alsace, France

H. Abouaissa, University of Haute-Alsace, France

J. Gaber, Belfort University, France

P. Lorenz, University of Haute-Alsace, France

104 - 113


Narrowband Interference Suppression for MIMO MB-OFDM UWB Communication Systems

Georgi Iliev
Department of Telecommunications
Technical University of Sofia, Sofia, Bulgaria
e-mail: [email protected]

Zlatka Nikolova
Department of Telecommunications
Technical University of Sofia, Sofia, Bulgaria
e-mail: [email protected]

Vladimir Poulkov
Department of Telecommunications
Technical University of Sofia, Sofia, Bulgaria
e-mail: [email protected]

Miglen Ovtcharov
Department of Telecommunications
Technical University of Sofia, Sofia, Bulgaria
e-mail: [email protected]

Abstract—Ultrawideband (UWB) systems show excellent potential benefits when used in the design of high-speed digital wireless home networks. The constantly-increasing demand for higher data transmission rates can be satisfied by exploiting both multipath- and spatial-diversity, using multiple-input multiple-output (MIMO) together with proper modulation and coding techniques. Unlike conventional MIMO OFDM systems, the performance of MIMO MB-OFDM UWB systems does not depend on the temporal correlation of the propagation channel, although narrowband interference (NBI) is still a problem. In this paper we put forward a technique for suppressing NBI by the use of adaptive narrowband filtering. The method is compared experimentally with other algorithms for the identification and cancellation/suppression of complex NBI in OFDM single-band and multiband (MB) UWB systems. The study shows that the different schemes offer slightly differing performances, depending on the parameters of the MIMO UWB system. The proposed complex adaptive narrowband filtering technique is an optimal solution which offers a good balance between NBI suppression efficiency and computational complexity.

Keywords - Narrowband interference (NBI); Multiband orthogonal frequency-division multiplexing (MB-OFDM); Multiple-input multiple-output (MIMO); Ultrawideband (UWB); Variable complex filters; Adaptive complex filter banks.

I. INTRODUCTION

In recent years, ultrawideband (UWB) signals have been employed extensively in communications and ranging applications. Depending on how the available bandwidth of the system is used, UWB can be divided into two groups: single-band and multiband.

Conventional UWB technology is based on single-band systems and employs carrier-free communications [1]–[3]. It is implemented by directly modulating information into a sequence of impulse-like waveforms; support for multiple users is by means of time-hopping or direct-sequence spreading approaches.

The UWB frequency band of multiband UWB systems is divided into several sub-bands, with the bandwidth of each of them being at least 500 MHz [4][5]. By interleaving the symbols across sub-bands, multiband UWB can maintain the power of the transmission as though a wide bandwidth were being utilized. The advantage of the multiband approach is that it allows the information to be processed over a much smaller bandwidth, thereby reducing overall design complexity as well as improving spectral flexibility and worldwide adherence to the relevant standards. In order to capture the multipath energy efficiently, the orthogonal frequency division multiplexing (OFDM) technique is used to modulate the information in each sub-band. The major difference with MB-OFDM, as opposed to more traditional OFDM schemes, is that the MB-OFDM symbols are not continuously sent on one frequency band; instead they are interleaved over different sub-bands across both time and frequency. Multiple access of multiband UWB is enabled by the use of suitably-designed frequency-hopping sequences over the set of sub-bands.
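To make the time-frequency interleaving concrete, the sketch below maps successive OFDM symbols onto sub-bands with a hopping pattern. The TFC pattern and band-group-1 center frequencies follow the WiMedia/ECMA-368 convention; they are illustrative assumptions, since the text above does not fix specific values.

```python
# Illustrative MB-OFDM time-frequency hopping (assumed ECMA-368-style values):
# successive OFDM symbols are interleaved over three 528 MHz sub-bands.

TFC1 = [1, 2, 3, 1, 2, 3]  # time-frequency code: sub-band index per OFDM symbol

def subband_for_symbol(n, tfc=TFC1):
    """Sub-band carrying the n-th OFDM symbol under the hopping pattern."""
    return tfc[n % len(tfc)]

def band_center_hz(band, first_center=3432e6, spacing=528e6):
    """Center frequency of a band-group-1 sub-band (assumed values)."""
    return first_center + (band - 1) * spacing

# the first six symbols visit bands 1, 2, 3, 1, 2, 3 at their respective centers
schedule = [(n, band_center_hz(subband_for_symbol(n))) for n in range(6)]
```

Because consecutive symbols land on different sub-bands, a narrowband interferer corrupts only a fraction of them, which is part of the robustness argument for the multiband approach.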

Most UWB applications are used indoors, thus providing an excellent transmission environment for MIMO implementation. Moreover, the GHz center frequency of UWB systems makes the spacing between antenna array elements less critical. In consequence, the combination of UWB and MIMO technology becomes an effective method of achieving the very high data rates required for high-speed short-range communications. Multi-antenna UWB technology has been well-explored in conventional single-band UWB systems [6]–[8]. Conversely, research into multiband UWB systems employing multiple antennae is as yet not complete, so the full benefits of UWB-MIMO communications systems have therefore not been entirely explored.


The main difference between MIMO OFDM and MIMO MB-OFDM lies in the channel characteristics. The block diagram for MIMO MB-OFDM is shown in Figure 1.

[Figure 1 block diagram: an encoder feeds Nt transmit chains (IFFT, add prefix, D/A, mixing by exp(j2πf_k t)); Nr receive chains (mixing by exp(j2πf_k t), A/D, remove prefix, FFT) feed a decoder.]

Figure 1. MIMO MB-OFDM UWB system.

In this paper, we look at UWB MIMO systems with MB-OFDM, a leading technology for many broadband communication systems [9]. Due to their relatively low transmission power, such systems are very sensitive to narrowband interference (NBI). Because of the spectral leakage effect caused by Discrete Fourier Transform (DFT) demodulation at the OFDM receiver, many subcarriers near the interference frequency suffer from serious Signal-to-Interference Ratio (SIR) degradation, which can adversely affect or even block communications [10].

The issue of NBI suppression in wideband OFDM systems is of primary importance and has been studied extensively in the last few years [11]. Two main types of approach are generally adopted. The first involves various frequency excision methods, whereby the affected frequency bins of the OFDM symbol are excised or their usage avoided [12]. The second approach is related to “cancellation techniques”, which are aimed at eliminating or mitigating the effect of the NBI on the received OFDM signal [13]. In most cases, the degradation in a MIMO MB-OFDM-based receiver is beyond the reach of the frequency excision method when the SIR is less than 0 dB. Thus, mitigation techniques employing cancellation methods, one of which is based on complex adaptive filtering and the other on NBI identification and cancellation, are recommended as an alternative [14]-[17].

In this paper, a method for NBI cancellation based on adaptive complex digital filtering, using the Least Mean Squares (LMS) algorithm to adapt to the central frequency of the NBI [18], is presented and the method is compared to other mitigation techniques. The study shows that all schemes give different performances, depending on the parameters of the MB-OFDM MIMO UWB system, but the proposed method offers considerable benefits, including low computational complexity.

The rest of the paper is organized as follows: In Section II the adaptive filtering scheme using a variable complex adaptive narrowband filter is considered. An adaptive complex notch filter bank for the cancellation/enhancement of multiple complex signals is also proposed. The simulation model for NBI suppression in UWB systems is described in Section III. Section IV presents the simulation results for NBI suppression in UWB channels and MIMO MB-OFDM systems. Finally, Section V concludes the paper.

II. NBI SUPPRESSION USING ADAPTIVE COMPLEX FILTERING

In comparison with the wideband information signal, the interference occupies a much narrower frequency band but has a higher power spectral density [19]. On the other hand, the wideband signal usually has autocorrelation properties quite similar to those of AWGN (Additive White Gaussian Noise), so filtering in the frequency domain is possible.

A. Variable Complex Narrowband Digital Filter

The filtering process is carried out at the input of the OFDM demodulator by the use of a variable complex filter with independent tuning of the central frequency and bandwidth. This is then turned into an adaptive narrowband filter to be implemented in an OFDM receiver.

A variable complex bandpass (BP) first-order digital filter designated LS1 (Low Sensitivity) is designed [20] (Figure 2). The transfer functions of the LS1 section, all of BP type, are:

$$H_{RR}(z) = H_{II}(z) = \frac{\hat{\beta} + 2\hat{\beta}^{2}\cos\theta\,z^{-1} - \hat{\beta}(2\beta-1)\,z^{-2}}{1 - 2(2\beta-1)\cos\theta\,z^{-1} + (2\beta-1)^{2}\,z^{-2}},$$

$$H_{RI}(z) = -H_{IR}(z) = \frac{2\hat{\beta}(1-\hat{\beta})\sin\theta\,z^{-1}}{1 - 2(2\beta-1)\cos\theta\,z^{-1} + (2\beta-1)^{2}\,z^{-2}}. \quad (1)$$

The composed multiplier is $\hat{\beta} = 1 - \beta$.

The bandwidth can be tuned by trimming the single coefficient β, whereas θ controls the central frequency ω0.

This design of variable complex digital filter has two very important advantages: firstly, an extremely low passband sensitivity, which offers resistance to quantization effects; secondly, independent control of central frequency and filter bandwidth over a wide frequency range.
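As a numerical sanity check on the bandpass behavior, the sketch below evaluates a first-order complex BP section in a compact complex form with β setting the bandwidth and θ the center frequency. This parameterization is a reconstruction assumed for illustration, not a verbatim copy of the paper's Eq. (1).

```python
import cmath

def ls1_response(omega, theta, beta):
    """Frequency response at z = exp(j*omega) of an assumed LS1-style section:
    H(z) = (1-beta)(1 + e^{j*theta} z^-1) / (1 - (2*beta-1) e^{j*theta} z^-1).
    beta close to 1 gives a narrow passband centered on theta (assumption)."""
    zinv = cmath.exp(-1j * omega)
    rot = cmath.exp(1j * theta)
    return (1 - beta) * (1 + rot * zinv) / (1 - (2 * beta - 1) * rot * zinv)

# unity gain at the tuned frequency, strong attenuation away from it
peak = abs(ls1_response(0.3, 0.3, 0.95))   # evaluated at omega = theta
skirt = abs(ls1_response(0.8, 0.3, 0.95))  # 0.5 rad/sample off-center
```

Note how the two controls stay independent: changing θ moves the peak without altering β, which is the property the section is designed for.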

[Figure 2 structure: cross-coupled real and imaginary paths (In Re/In Im to Out Re/Out Im) built from z^-1 delays, adders, and sinθ/cosθ multipliers.]

Figure 2. Variable complex BP LS1 filter.

B. Adaptive Complex Filtering

In Figure 3, an adaptive complex notch/BP narrowband system based on the LS1 variable complex filter is shown [18].

In the following, we consider the input/output relations for the corresponding BP/notch filters, Eqs. (2)-(9). For the BP filter we have the following real output:

$$y_R(n) = y_{R1}(n) + y_{R2}(n), \quad (2)$$


where

$$y_{R1}(n) = 2(2\beta-1)\cos\theta(n)\,y_{R1}(n-1) - (2\beta-1)^{2}\,y_{R1}(n-2) + \hat{\beta}\,x_R(n) + 2\hat{\beta}^{2}\cos\theta(n)\,x_R(n-1) - \hat{\beta}(2\beta-1)\,x_R(n-2), \quad (3)$$

$$y_{R2}(n) = 2(2\beta-1)\cos\theta(n)\,y_{R2}(n-1) - (2\beta-1)^{2}\,y_{R2}(n-2) - 2\hat{\beta}(1-\hat{\beta})\sin\theta(n)\,x_I(n-1). \quad (4)$$

y_R is the real output and x_R is the real input.

[Figure 3 structure: a variable complex filter with an adaptive algorithm block; inputs x_R(n), x_I(n); BP outputs y_R(n), y_I(n); notch outputs e_R(n), e_I(n) formed by subtraction.]

Figure 3. Block-diagram of a BP/notch adaptive complex filter section.

The imaginary output is given by the following equation:

$$y_I(n) = y_{I1}(n) + y_{I2}(n), \quad (5)$$

$$y_{I1}(n) = 2(2\beta-1)\cos\theta(n)\,y_{I1}(n-1) - (2\beta-1)^{2}\,y_{I1}(n-2) + 2\hat{\beta}(1-\hat{\beta})\sin\theta(n)\,x_R(n-1), \quad (6)$$

and

$$y_{I2}(n) = 2(2\beta-1)\cos\theta(n)\,y_{I2}(n-1) - (2\beta-1)^{2}\,y_{I2}(n-2) + \hat{\beta}\,x_I(n) + 2\hat{\beta}^{2}\cos\theta(n)\,x_I(n-1) - \hat{\beta}(2\beta-1)\,x_I(n-2), \quad (7)$$

where y_I is the imaginary output and x_I is the imaginary input.

For the notch filter we have a real output:

$$e_R(n) = x_R(n) - y_R(n) \quad (8)$$

and an imaginary output:

$$e_I(n) = x_I(n) - y_I(n). \quad (9)$$

The cost function is the power of the notch filter output signal:

$$J = E[e(n)\,e^{*}(n)],$$

where $e(n) = e_R(n) + j\,e_I(n)$.

The LMS algorithm is then applied to update the filter coefficient responsible for the central frequency as follows:

$$\theta(n+1) = \theta(n) - \mu\,\mathrm{Re}[e(n)\,y'^{*}(n)],$$

where μ is the step size controlling the speed of convergence, (*) denotes the complex conjugate, and y'(n) is the derivative of $y(n) = y_R(n) + j\,y_I(n)$ with respect to the coefficient subject to adaptation, where

$$y'_R(n) = -2(2\beta-1)\sin\theta(n)\,y_{R1}(n-1) - 2\hat{\beta}^{2}\sin\theta(n)\,x_R(n-1) - 2(2\beta-1)\sin\theta(n)\,y_{R2}(n-1) - 2\hat{\beta}(1-\hat{\beta})\cos\theta(n)\,x_I(n-1)$$

and

$$y'_I(n) = -2(2\beta-1)\sin\theta(n)\,y_{I1}(n-1) + 2\hat{\beta}(1-\hat{\beta})\cos\theta(n)\,x_R(n-1) - 2(2\beta-1)\sin\theta(n)\,y_{I2}(n-1) - 2\hat{\beta}^{2}\sin\theta(n)\,x_I(n-1).$$

In order to ensure the stability of the adaptive algorithm, the range of the step size μ should be set according to [21]:

$$0 < \mu < \frac{2}{L\,P\,\sigma^{2}}.$$

In this case L is the filter order, σ² is the power of the signal y(n), and P is a constant which depends on the statistical characteristics of the input signal. In most practical situations P is approximately equal to 0.1.
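The adaptation loop can be illustrated end-to-end with a deliberately simplified stand-in: a one-pole complex resonator instead of the LS1 section, with the same LMS idea of descending on the notch-output power. Everything below (the pole radius r, the step size mu, the tone frequency) is an assumption chosen for the demo, not the paper's configuration.

```python
import cmath

def adapt_notch(x, theta0=0.0, r=0.95, mu=0.01):
    """Track a complex NBI tone: y is a bandpass estimate of the tone,
    e = x - y is the notch output, and theta descends on |e|^2 via LMS."""
    theta, y_prev = theta0, 0j
    for xn in x:
        pole = r * cmath.exp(1j * theta)
        y = (1 - r) * xn + pole * y_prev             # bandpass (tone) estimate
        e = xn - y                                   # notch output
        dy = 1j * pole * y_prev                      # dy/dtheta, instantaneous approx.
        theta += 2 * mu * (e.conjugate() * dy).real  # since d|e|^2/dtheta = -2 Re[e* dy]
        y_prev = y
    return theta

# a complex NBI tone at 1.2 rad/sample; the estimate starts at 1.0 and locks on
theta_hat = adapt_notch([cmath.exp(1j * 1.2 * n) for n in range(4000)], theta0=1.0)
```

Once the estimate locks, the notch output power (and hence the gradient) goes to zero, which is why the update stays quiet after convergence.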

This approach can easily be extended to the complexadaptive narrow-band filter bank (Figure 4) [22].

The notch filter bank output signal is described by the following formulae:

$$e_{R_{FB}}(n) = x_R(n) - \sum_{i=1}^{M} y_{Ri}(n),$$

$$e_{I_{FB}}(n) = x_I(n) - \sum_{i=1}^{M} y_{Ii}(n),$$

$$e_{FB}(n) = e_{R_{FB}}(n) + j\,e_{I_{FB}}(n),$$

where M = N_R is the number of the receiver's antennae.

The LMS algorithm is applied to adapt the filter bank coefficients [21]:

$$\theta_i(n+1) = \theta_i(n) - \mu\,\mathrm{Re}[e_{FB}(n)\,y_i'^{*}(n)], \quad i = 1, \dots, M.$$

The main advantages of both the adaptive structure and the filter bank lie in their low computational complexity and fast convergence. The very low sensitivity of the variable complex filter section ensures a high tuning accuracy, even with severely quantized multiplier coefficients, and the general efficiency of the adaptation [22].


[Figure 4 structure: M variable complex bandpass filters, all driven by x_R(n) and x_I(n); their outputs y_R1(n)..y_RM(n) and y_I1(n)..y_IM(n) are summed and subtracted from the input to form e_RFB(n) and e_IFB(n), with a shared adaptive algorithm block.]

Figure 4. Adaptive complex notch filter bank for the cancellation/enhancement of NBI.

III. SIMULATION MODEL FOR NBI SUPPRESSION IN UWB SYSTEMS

In order to compare the proposed method with other NBI suppression methods, such as Frequency Excision (FE) [23] and Frequency Identification and Cancellation (FIC) [17], a number of simulations relative to complex baseband presentation are performed.

The FE method is applied to the OFDM signal with a complex NBI at the input of the demodulator. The signal is then converted into the frequency domain by FFT, oversampled by 8, and the noise peaks in the spectra of the signal are limited to the determined threshold. The signal is subsequently converted back into the time domain and applied to the input of the demodulator. It should be noted that, for more precise frequency excision, an FFT of a higher order than the one in the demodulator is applied.
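A minimal sketch of the excision idea, under stated assumptions: a toy O(n²) DFT keeps it self-contained, and simple magnitude clipping stands in for the oversample-by-8 pipeline described above.

```python
import cmath

def excise(x, threshold):
    """Clip spectral peaks above `threshold`, then return to the time domain."""
    n = len(x)
    X = [sum(x[k] * cmath.exp(-2j * cmath.pi * m * k / n) for k in range(n)) / n
         for m in range(n)]
    # limit any bin whose magnitude exceeds the threshold, keeping its phase
    X = [Xm if abs(Xm) <= threshold else Xm * threshold / abs(Xm) for Xm in X]
    return [sum(X[m] * cmath.exp(2j * cmath.pi * m * k / n) for m in range(n))
            for k in range(n)]

# a strong tone in bin 3 (amplitude 5) is limited to the threshold amplitude
nbi = [5 * cmath.exp(2j * cmath.pi * 3 * k / 16) for k in range(16)]
cleaned = excise(nbi, 1.0)
```

A real receiver would combine this with the OFDM payload, so the threshold must sit above the payload's own spectral level; here the input is the interferer alone.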

The FIC method is implemented as a two-stage algorithm. First, the complex NBI frequency is estimated by finding the maximum in the oversampled signal spectrum. Next, using the ML approach, the NBI amplitude and phase are estimated. The second stage realizes the NLS optimization algorithm, where precise estimations of the NBI complex amplitude, phase and frequency are performed.
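The two-stage structure can be sketched as follows; the ML/NLS refinement stage is replaced here by a simple least-squares amplitude fit at the coarse frequency estimate, so this illustrates the shape of the algorithm, not the paper's estimator.

```python
import cmath

def cancel_tone(x, n_freqs=256):
    """Stage 1: coarse NBI frequency from the peak of a dense DFT grid.
    Stage 2: complex amplitude by least squares, then subtract the tone."""
    n = len(x)
    grid = [2 * cmath.pi * m / n_freqs for m in range(n_freqs)]
    power = [abs(sum(xk * cmath.exp(-1j * w * k) for k, xk in enumerate(x)))
             for w in grid]
    w_hat = grid[power.index(max(power))]        # coarse frequency estimate
    a_hat = sum(xk * cmath.exp(-1j * w_hat * k)  # least-squares complex amplitude
                for k, xk in enumerate(x)) / n
    return [xk - a_hat * cmath.exp(1j * w_hat * k) for k, xk in enumerate(x)]

# an on-grid complex tone is cancelled essentially to zero
residual = cancel_tone([2 * cmath.exp(1j * cmath.pi / 8 * k) for k in range(128)])
```

For off-grid tones the coarse estimate leaves a residual, which is exactly what the NLS refinement stage in the paper is there to remove.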

For the realization of the NBI filtering method, the complex adaptive notch filter is connected at the receiver's input. The adaptation algorithm tunes the filter in such a way that its central frequency and bandwidth match the NBI signal spectrum. In the simulations, the central frequency of the notch filter is chosen to be equal to the NBI central frequency, while its bandwidth is equal to 20% of the bandwidth between two adjacent OFDM sub-carriers.

In the OFDM demodulator, the prefix and suffix guard intervals are removed and a 256-point FFT is applied. The pilot tones are removed and a channel equalization of the OFDM symbol is performed. Finally, the corresponding 64-QAM demodulation and decoding is carried out.

The information source is modeled by a generator of uniformly distributed random integers based on a modified version of Marsaglia's “subtract with borrow” algorithm [24]. The method can generate all the double-precision values in the closed interval [2^-53, 1-2^-53]. Theoretically, this method can generate over 2^1492 values before repeating itself.

The channel encoder is implemented as a convolutional encoder. In the simulation, the code rate RC = 1/2 is chosen. In the receiver, a Viterbi hard-threshold convolutional decoder is implemented.

A block interleaver-deinterleaver is used in the simulation. The algorithm chooses a permutation table randomly, using the initial state input that is provided.

The digital modulator is implemented as a 256-point IFFT. In the OFDM block demodulator, the prefix and suffix guard intervals are removed from each channel and a 256-point FFT is also applied. The OFDM symbol consists of 128 data bins and 2 pilot tones. Each discrete piece of OFDM data can use different modulation formats. In the experiments, a Gray-encoded 64-QAM modulation format is used. After the IFFT process, the prefix and suffix guard intervals are added.
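The modulator/demodulator round trip described above can be sketched as follows. The 256-point transform and cyclic prefix match the text, while the 4-point constellation and the prefix length of 32 are stand-ins for the Gray-coded 64-QAM and the paper's exact guard intervals (both assumptions).

```python
# Round-trip sketch of the OFDM chain: 256-point transform plus cyclic prefix.
import cmath

N, CP = 256, 32                               # CP length is an assumed value
QPSK = [1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j]     # toy constellation, not 64-QAM

def idft(X):
    n = len(X)
    return [sum(X[m] * cmath.exp(2j * cmath.pi * m * k / n) for m in range(n)) / n
            for k in range(n)]

def dft(x):
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * m * k / n) for k in range(n))
            for m in range(n)]

def ofdm_mod(bins):
    x = idft(bins)
    return x[-CP:] + x                        # prepend cyclic prefix

def ofdm_demod(x):
    return dft(x[CP:])                        # drop prefix, transform back to bins

tx_bins = [QPSK[k % 4] for k in range(N)]
rx_bins = ofdm_demod(ofdm_mod(tx_bins))       # recovers tx_bins up to rounding
```

With a channel in between, the cyclic prefix is what turns the channel's linear convolution into a circular one, so each bin sees a flat complex gain that the equalizer mentioned above can undo.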

For the wireless channel, a multi-ray model with direct and delayed (reflected) components is used. The delayed components are subject to fading, while the direct ones are not. To preserve total signal energy, the direct and delayed signal components are scaled by the square roots of K/(K+1) and 1/(K+1), respectively. To simplify simulations, a complex baseband representation of the system is used [25] [26].
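The energy-preserving split can be checked directly. Note one added assumption: the text scales the delayed components by sqrt(1/(K+1)) collectively, and the sketch divides that diffuse power equally across the delayed rays.

```python
# Energy-preserving ray gains for the multi-ray (Ricean) channel model:
# direct ray carries K/(K+1) of the power, the faded delayed rays share 1/(K+1).
import cmath, math, random

def ricean_ray_gains(k_factor, n_delayed):
    direct = math.sqrt(k_factor / (k_factor + 1))
    diffuse = math.sqrt(1.0 / ((k_factor + 1) * n_delayed))  # equal split: assumption
    delayed = [diffuse * cmath.exp(2j * cmath.pi * random.random())
               for _ in range(n_delayed)]                    # random-phase faded rays
    return direct, delayed

direct_gain, delayed_gains = ricean_ray_gains(k_factor=4, n_delayed=3)
```

Whatever the phases drawn, the squared gains always sum to one, which is the "preserve total signal energy" property the text asks for.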

The NBI is modeled as a single complex tone, the frequency of which is located centrally between two adjacent OFDM sub-carriers.

IV. EXPERIMENTS AND SIMULATION RESULTS

A. NBI Suppression for UWB Channels

Using the above general simulation model, different experiments were performed, estimating the Bit Error Ratio (BER) as a function of the SIR. Four types of channels are considered, i.e., AWGN, CM1, CM2 and CM3 [27] [28]. The CM1, CM2 and CM3 channels are subject to strong fading, and additionally background AWGN is applied, so that the signal to AWGN ratio at the input of the OFDM receiver is 20 dB. In Figure 5, a complex AWGN channel is considered. The SIR is varied from -20 dB to 0 dB. It can be seen that for high NBI, where the SIR is less than 0 dB, all methods lead to a significant improvement in performance. The complex adaptive filtering scheme gives better performance than the FE method. This could be explained by the NBI spectral leakage effect caused by DFT demodulation at the OFDM receiver, when many sub-carriers near the interference frequency suffer degradation. Thus, filtering out the NBI before demodulation is better than frequency excision. The FIC algorithm achieves the best result because there is no spectrum leakage, as happens with frequency excision, and there is no amplitude and phase distortion as seen in the adaptive filtering case.


[Plot: BER (10^-4 to 10^-1, log scale) versus SIR (-20 to 0 dB); curves: FIC method, FE method, NBI filtering method, without NBI suppression.]

Figure 5. BER as a function of SIR for AWGN channel

In the case of the CM1, CM2 and CM3 IEEE UWB channels (Figures 6, 7 and 8), it can be seen that similar results are obtained.

[Plot: BER (10^-4 to 10^-1, log scale) versus SIR (-20 to 0 dB); curves: without NBI suppression, NBI filtering method, FE method, FIC method.]

Figure 6. BER as a function of SIR for CM1 channel

[Plot: BER (10^-4 to 10^-1, log scale) versus SIR (-20 to 0 dB); curves: without NBI suppression, NBI filtering method, FE method, FIC method.]

Figure 7. BER as a function of SIR for CM2 channel

[Plot: BER (10^-3 to 10^-1, log scale) versus SIR (-20 to 0 dB); curves: without NBI suppression, NBI filtering method, FE method, FIC method.]

Figure 8. BER as a function of SIR for CM3 channel

It should be noted that the adaptive filtering scheme and the frequency cancellation scheme lead to a degradation of the overall performance when SIR > 0. This is due either to the amplitude and phase distortion of the adaptive notch filter, or to a wrong estimation of the NBI parameters during the identification. The degradation can be reduced by the implementation of a higher-order notch filter or by using more sophisticated identification algorithms. The degradation effect can be avoided by simply switching off the filtering when SIR > 0. Such a scheme is easily realizable, as the amplitude of the NBI can be monitored at the BP output of the filter (Figure 3).

In Figure 9, the results of applying a combination of methods are presented. A multi-tone NBI (a 5 sine-wave interfering signal) is added to the OFDM signal. One of the NBI tones is 10 dB stronger than the others. The NBI filter is adapted to track the strongest NBI tone, thus preventing the loss of resolution and AGC saturation. It can be seen that the combination of frequency excision plus adaptive filtering improves the performance, and the combination of frequency cancellation plus adaptive filtering is even better.

[Plot: BER (10^-3 to 10^-1, log scale) versus SIR (-20 to 0 dB); curves: without NBI suppression, FIC method, FE method, NBI filtering + FIC method, NBI filtering + FE method.]

Figure 9. BER as a function of SIR for CM3 channel – multi-tone NBI

5

International Journal on Advances in Telecommunications, vol 3 no 1 & 2, year 2010, http://www.iariajournals.org/telecommunications/

2010, © Copyright by authors, Published under agreement with IARIA - www.iaria.org


Figure 10 shows the BER as a function of SIR for the CM3 channel when the NBI is QPSK modulated rather than modeled as a complex sine wave. The relative performance of the different NBI suppression methods is similar to that in Figure 6, but the BER is higher because the NBI is QPSK modulated.

[Plot: BER (10^-4 to 10^-1) vs. SIR (−20 to 0 dB); curves: without NBI suppression, FE method, FIC method, NBI filtering method]

Figure 10. BER as a function of SIR for CM3 channel – QPSK modulated NBI

B. NBI Suppression for MIMO MB-OFDM systems

To evaluate the performance of the three NBI suppression methods (FE, FIC, and our proposed NBI filtering method), simulations of the complex baseband representation are conducted. A standard MIMO OFDM receiver is assumed, and the suppression methods are applied independently at each receiver input to the MIMO OFDM signal with a complex NBI.

For the NBI filtering method, the complex adaptive notch filter bank is used. The adaptation algorithm tunes the filter at each receiver input so that its central frequency and bandwidth match the NBI signal spectrum. The Frequency Identification and Cancellation method estimates the complex NBI frequency by locating the maximum of the oversampled signal spectrum for each channel.
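As an illustration of the identification step, here is a minimal single-tone FIC sketch: it locates the spectral peak in a zero-padded (oversampled) spectrum, fits the tone's complex amplitude by least squares, and subtracts it. This is a simplification of the ML/NLS identification used in the paper, with illustrative function names:

```python
import numpy as np

def cancel_single_tone(y, oversample=4):
    """Illustrative frequency identification and cancellation (FIC) sketch
    for a single complex sinusoidal NBI in the complex baseband signal y."""
    n = len(y)
    spectrum = np.fft.fft(y, n * oversample)          # oversampled spectrum
    k = np.argmax(np.abs(spectrum))                   # spectral peak -> NBI bin
    f_hat = k / (n * oversample)                      # normalized frequency
    tone = np.exp(2j * np.pi * f_hat * np.arange(n))  # unit-amplitude tone
    a_hat = np.vdot(tone, y) / n                      # LS complex amplitude
    return y - a_hat * tone                           # cancel the estimated NBI
```

The oversampling factor plays the role of the paper's oversampled spectrum search: it refines the frequency grid on which the maximum is located.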

The OSTBC model for complex signals is realized using the methods described in [29], [30]. The number of transmit antennae NT can be set from 1 to 4, as can the number of receive antennae NR. For the 2×2 MIMO system the code rate is Rc = 1, whereas for the 3×3 and 4×4 MIMO systems the code rate is Rc = 1/2.
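For the two-transmit-antenna case, the rate-1 OSTBC is the Alamouti code [29]; a minimal encoding sketch (the helper name is illustrative, and an even number of symbols is assumed):

```python
import numpy as np

def alamouti_encode(s):
    """Rate-1 OSTBC (Alamouti) encoding for 2 transmit antennas.

    Maps each symbol pair (s1, s2) onto two time slots:
        slot 1: antenna 1 -> s1,         antenna 2 -> s2
        slot 2: antenna 1 -> -conj(s2),  antenna 2 -> conj(s1)
    Returns an array of shape (2, len(s)): one row per antenna.
    """
    s = np.asarray(s, dtype=complex)
    s1, s2 = s[0::2], s[1::2]
    ant1 = np.empty(len(s), dtype=complex)
    ant2 = np.empty(len(s), dtype=complex)
    ant1[0::2], ant1[1::2] = s1, -np.conj(s2)
    ant2[0::2], ant2[1::2] = s2, np.conj(s1)
    return np.vstack([ant1, ant2])
```

Two symbols are sent over two slots, hence Rc = 1; the orthogonal designs for 3 and 4 antennas need more slots per symbol block, which is why their rate drops to 1/2.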

The MIMO wireless flat fading channel is realized as given in [31]. A system with NT transmit antennae and NR receive antennae is considered. The complex channel gain hi,j is assumed to be a complex Gaussian random variable: hi,j ~ Nc(0,1). As the MIMO channel matrix H is not known to the receiver, it must be estimated before the decoding process starts. The channel estimation method based on the optimal training preamble [31] is adopted. OSTB decoding, 64-QAM demodulation and error correction decoding are then carried out.
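Drawing such a flat-fading channel matrix is straightforward; a minimal sketch (the helper name and RNG handling are illustrative):

```python
import numpy as np

def rayleigh_mimo_channel(n_t, n_r, rng=None):
    """Draw an NR x NT flat-fading MIMO channel matrix H with i.i.d.
    entries h_{i,j} ~ CN(0, 1), as assumed in the simulations."""
    rng = np.random.default_rng() if rng is None else rng
    # CN(0,1): real and imaginary parts are each N(0, 1/2)
    return (rng.standard_normal((n_r, n_t)) +
            1j * rng.standard_normal((n_r, n_t))) / np.sqrt(2)
```

A received block is then modeled as y = H @ s + noise, with H redrawn per channel realization.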

Using the above simulation model of the Orthogonal Space-Time Block Coding (OSTBC) MIMO system, different experiments were performed to estimate the BER as a function of the SIR. Four system types are considered: SISO (1×1), MIMO (2×2), MIMO (3×3) and MIMO (4×4). The MIMO channels are subject to flat fading and, in addition, background AWGN is applied, so that the signal-to-AWGN ratio at the input of the OFDM receiver is 15 dB. In Figure 11, a complex flat fading AWGN channel without NBI suppression is considered. The SIR is varied from −20 dB to 0 dB. It can be seen that the 4×4 OSTBC MIMO system gives the best performance.

[Plot: BER (10^-4 to 10^-1) vs. SIR (−20 to 0 dB); curves: SISO 1×1, MIMO 2×2, MIMO 3×3, MIMO 4×4]

Figure 11. BER as a function of SIR for MIMO channel

Three NBI suppression techniques are then applied: FE,FIC, and the new NBI adaptive filtering method.

For the 2×2, 3×3 and 4×4 MIMO channels (Figures 12, 13 and 14), better NBI filtering results are obtained for higher degrees of antenna diversity. The FE method shows good performance for high SIR.

[Plot: BER (10^-4 to 10^-1) vs. SIR (−20 to 0 dB); curves: without NBI suppression, FE method, FIC method, NBI filtering method]

Figure 12. BER as a function of SIR for 2x2 MIMO channel


[Plot: BER (10^-5 to 10^-2) vs. SIR (−20 to 0 dB); curves: without NBI suppression, NBI filtering method, FIC method, FE method]

Figure 13. BER as a function of SIR for 3x3 MIMO channel

[Plot: BER (10^-5 to 10^-2) vs. SIR (−20 to 0 dB); curves: without NBI suppression, NBI filtering method, FIC method, FE method]

Figure 14. BER as a function of SIR for 4x4 MIMO channel

The experimental results show that the frequency identification and cancellation method achieves the highest performance. On the other hand, its extremely high computational complexity limits its application in terms of hardware resources. In this respect, the adaptive notch filter turns out to be the optimal NBI suppression scheme, as it offers very good performance at reasonable computational complexity. The frequency excision method shows relatively good results, and its main advantage is its computational efficiency.

TABLE I. COMPUTATIONAL COMPLEXITY COMPARISON

Suppression Method                        | Number of Additions | Number of Multiplications | Complexity
Frequency Excision                        | 6MN                 | 4MN log(N)                | ~O(N log(N))
NBI Filtering                             | KMN                 | 28MN + KMN^2              | ~O(N^2)
Frequency Identification and Cancellation | 2MN^2               | M(N+2)N^3                 | ~O(N^4)

The computational complexity per iteration of the modeled NBI suppression methods is listed in Table I, where K is a positive integer constant, N is the number of samples to be processed, M is the number of receiver antennas, and O(·) denotes the complexity estimate.

V. CONCLUSIONS

In this paper a method for NBI suppression in MIMO MB-OFDM UWB communication systems, using adaptive complex narrowband filtering, is proposed. A comparison with two other schemes for suppression of complex NBI is performed: the first is a frequency excision method, while the second employs frequency identification and cancellation based on ML and NLS algorithms. The experiments show that for strong NBI, where the SIR is below 0 dB, all three suppression methods lead to a significant improvement in performance.

An optimal solution is the adaptive narrowband filtering method, which offers a trade-off between outstanding NBI suppression efficiency and computational complexity. An alternative approach is to combine the NBI filtering and frequency cancellation methods, thus improving overall performance.

ACKNOWLEDGMENT

This work was supported by the Bulgarian National Science Fund – Grant No. ДО-02-135/2008 “Research on Cross Layer Optimization of Telecommunication Resource Allocation”.

REFERENCES

[1] M. L. Welborn, “System considerations for ultra-wideband wireless networks,” in IEEE Radio Wireless Conf., Aug. 2001, pp. 5–8.

[2] J. R. Foerster, “The performance of a direct-sequence spread ultrawideband system in the presence of multipath, narrowband interference, and multiuser interference,” in IEEE Conf. Ultra Wideband Systems Tech., May 2002, pp. 87–91.

[3] N. Boubaker and K. B. Letaief, “Ultra wideband DSSS for multiple access communications using antipodal signaling,” in IEEE Int. Conf. Commun., vol. 3, May 2003, pp. 11–15.

[4] E. Saberinia and A. H. Tewfik, “Pulsed and nonpulsed OFDM ultra wideband wireless personal area networks,” in IEEE Conf. Ultra Wideband Systems Tech., Nov. 2003, pp. 275–279.

[5] A. Batra, J. Balakrishnan, G. R. Aiello, J. R. Foerster and A. Dabak, “Design of a multiband OFDM system for realistic UWB channel environments,” IEEE Trans. Microwave Theory and Techniques, vol. 52, no. 9, pp. 2123–2138, Sep. 2004.

[6] W. P. Siriwongpairat, M. Olfat, and K. J. R. Liu, “Performance analysis and comparison of time hopping and direct sequence UWB-MIMO systems,” EURASIP J. Appl. Signal Process., Special Issue on “UWB – State of the Art”, vol. 2005, Mar. 2005, pp. 328–345.

[7] L. Yang and G. B. Giannakis, “Analog space-time coding for multiantenna ultra-wideband transmissions,” IEEE Trans. Commun., vol. 52, no. 3, pp. 507–517, Mar. 2004.

[8] M. Weisenhorn and W. Hirt, “Performance of binary antipodal signaling over the indoor UWB MIMO channel,” in IEEE Int. Conf. Commun., vol. 4, May 2003, pp. 2872–2878.

[9] A. Batra, J. Balakrishnan, A. Dabak, R. Gharpurey and J. Lin, “Multi-band OFDM physical layer proposal,” IEEE P802.15-04/0493r1-TG3a, Sept. 2004.

[10] A. Giorgetti, M. Chiani and M. Z. Win, “The effect of narrowband interference on wideband wireless communication systems,” IEEE Trans. Commun., vol. 53, no. 12, pp. 2139–2149, Dec. 2005.

[11] Z. Li, A. M. Haimovich, and H. Grebel, “Performance of ultra-wideband communications in the presence of interference,” in Proc. IEEE Int. Conf. Commun., vol. 10, June 2001, pp. 2948–2952.

[12] A. J. Coulson, “Narrowband interference in pilot symbol assisted OFDM systems,” IEEE Trans. Wireless Commun., vol. 3, no. 6, pp. 2277–2287, Nov. 2004.

[13] M. Biagi, A. Buccini and L. Taglione, “Advanced methods for ultra wide band receiver design,” in Proc. First International Workshop “Networking with UWB”, Rome, Italy, 21 Dec. 2001.

[14] C. Carlemalm, H. V. Poor and A. Logothetis, “Suppression of multiple narrowband interferers in a spread-spectrum communication system,” IEEE J. Select. Areas Commun., vol. 3, no. 5, pp. 1431–1436, Sept. 2004.

[15] L. B. Milstein, “Interference rejection techniques in spread spectrum communications,” Proc. IEEE, vol. 76, no. 6, pp. 657–671, June 1988.

[16] J. D. Laster and J. H. Reed, “Interference rejection in digital wireless communications,” IEEE Signal Process. Mag., vol. 14, no. 3, pp. 37–62, May 1997.

[17] E. Baccarelli, M. Biagi and L. Taglione, “A novel approach to in-band interference mitigation in ultra wide band radio systems,” in IEEE Conf. Ultra Wideband Systems and Technologies, 2002.

[18] G. Iliev, Z. Nikolova, G. Stoyanov and K. Egiazarian, “Efficient design of adaptive complex narrowband IIR filters,” in Proc. XII European Signal Processing Conference (EUSIPCO 2004), Vienna, Austria, 6–10 Sept. 2004, pp. 1597–1600.

[19] S. Y. Park, G. Shor, and Y. S. Kim, “Interference resilient transmission scheme for multi-band OFDM system in UWB channels,” in Proc. IEEE Int. Circuits and Systems Symp., vol. 5, Vancouver, BC, Canada, May 2004, pp. 373–376.

[20] G. Stoyanov, M. Kawamata and Z. Valkova, “New first and second-order very low-sensitivity bandpass/bandstop complex digital filter sections,” in Proc. IEEE Region 10 Annual Conf. (TENCON’97), Brisbane, Australia, vol. 1, Dec. 2–4, 1997, pp. 61–64.

[21] S. Douglas, “Adaptive filtering,” in Digital Signal Processing Handbook, D. Williams and V. Madisetti, Eds., Boca Raton: CRC Press LLC, 1999, pp. 451–619.

[22] Z. Nikolova, G. Iliev, G. Stoyanov and K. Egiazarian, “Design of adaptive complex IIR notch filter bank for detection of multiple complex sinusoids,” in Proc. International Workshop SMMSP’2002, Toulouse, France, Sept. 2002, pp. 155–158.

[23] J.-C. Juang, C.-L. Chang and Y.-L. Tsai, “An interference mitigation approach against pseudolites,” in The 2004 International Symposium on GNSS/GPS, Sydney, Australia, 6–8 Dec. 2004.

[24] S. Tezuka, P. L'Ecuyer and R. Couture, “On the lattice structure of the add-with-carry and subtract-with-borrow random number generators,” ACM Transactions on Modeling and Computer Simulation (TOMACS), vol. 3, no. 4, pp. 315–331, Oct. 1993.

[25] R. van Nee and R. Prasad, OFDM for Wireless Multimedia Communications, Artech House, 2000.

[26] H. Schulze and C. Lüders, Theory and Applications of OFDM and CDMA, John Wiley, 2005.

[27] M.-O. Wessman, A. Svensson and E. Agrell, “Frequency diversity performance of coded multiband-OFDM systems on IEEE UWB channels,” COST 289 Workshop, Gothenburg, Sweden, April 2007.

[28] A. F. Molisch and J. R. Foerster, “Channel models for ultra wideband personal area networks,” IEEE Wireless Communications, pp. 524–531, Dec. 2003.

[29] S. Alamouti, “A simple transmit diversity technique for wireless communications,” IEEE J. Select. Areas Commun., vol. 16, no. 8, pp. 1451–1458, Oct. 1998.

[30] I. Berenguer and X. Wang, “Space-time coding and signal processing for MIMO communications,” J. Comput. Sci. & Technol., vol. 18, no. 6, pp. 689–702, Nov. 2003.

[31] M. Ovtcharov, V. Poulkov, G. Iliev and Z. Nikolova, “Narrowband interference suppression for IEEE UWB channels,” in Proc. ICDT 2009, Colmar, France, pp. 43–47.


Efficient Variable Block Size Selection for AVC Low Bitrate Applications

Ihab Amer and Graham Jullien
Advanced Technology Information Processing Systems (ATIPS)
2500 University Drive, NW, Calgary, AB, Canada, T2N 1N4
e-mails: {amer, jullien}@atips.ca

Wael Badawy
IntelliView Technologies Inc.
808-55 Avenue NE, Calgary, AB, Canada, T2E 6Y4
e-mail: [email protected]

Adrian Chirila-Rus and Robert Turney
DSP Systems Engineering, DSP Division, Xilinx Inc.
115 South 4th Street, Watertown, WI, USA, 53094
e-mails: {adrian.chirila-rus, robert.turney}@xilinx.com

Rana Hamed
German University in Cairo (GUC)
Main Entrance, Fifth District, New Cairo City, Egypt
e-mail: [email protected]

Abstract— The Advanced Video Coding (AVC) standard proposes the use of Variable Block Size (VBS) motion-compensated prediction and mode decision, aiming for optimized Rate-Distortion (R-D) performance. Unlike Fixed Block Size (FBS) motion-compensated prediction, where all regions of a picture are treated identically in terms of temporal prediction, VBS increases encoding efficiency by allowing more active regions to be represented with more bits than less active ones. The main concern with VBS motion-compensated prediction is the dramatic increase in encoder computational requirements, which not only prevents the encoder from satisfying real-time constraints but also makes it impractical for hardware implementation. This paper presents an efficient VBS selection scheme that can be applied to any VBS Motion Estimation (ME) module, leading to a significant reduction in its computational requirements with a minor loss in reconstructed picture quality. The reduction is achieved by minimizing the number of required ME searches and simplifying the Mode Decision (MD) operation. To meet the demands of different applications, the proposed algorithm can be adjusted to operate at any of three operating points, trading off computational requirements against R-D performance. The paper describes the algorithm in detail, focusing on the theoretical computational savings, and supports this analysis with simulation results on three benchmark video sequences with various types of motion.

Keywords- H.264/AVC, motion estimation, variable block size.

I. INTRODUCTION

To achieve high coding efficiency, AVC deploys a set of new features in addition to enhancing previously used ones. For inter-frame motion prediction, AVC allows variable block size motion estimation/compensation supporting block sizes of 16×16, 16×8, 8×16 and 8×8, resulting in a significant performance improvement. Moreover, when the 8×8 mode is chosen, smaller blocks of sizes 8×4, 4×8 and 4×4 can be used. This improves the motion tracking capabilities of the encoder by allowing inactive regions and regular movements to be represented with an optimal amount of motion information (e.g., the 16×16 mode is represented by a single motion vector), while for fast and highly irregular movements the finer blocks can be used, at the cost of increased motion information but with optimal error representation. In terms of computational and memory requirements, motion estimation is by far the most complex module in the entire AVC encoder: as will be shown later, recent studies show that it represents between 70% and 90% of the entire encoder's computational requirements. The increase in computational and operational requirements brought by such new features calls for algorithmic and implementation enhancements so that the compression algorithm can become useful in real applications. As a step towards the design and implementation of an entire computationally-efficient AVC encoder, this paper proposes a solution to the increased computational requirements of the AVC variable block size motion estimation/compensation (and mode decision) module. A novel, computationally-efficient algorithm for variable block size motion estimation and mode decision, operating at three points that meet various applications' requirements, is developed and described. The


obtained simulation results and analysis show that, for the variety of tested benchmark sequences, the encoder computational requirements can be reduced to less than half, at the expense of a “minor” degradation in quality. This paper extends the work introduced in [1].

The remainder of this paper is organized as follows: Section II describes the key concept behind the adoption of VBS ME/MD, showing its necessity and the computationally expensive way it is implemented in the direct approach of the AVC software reference model; a survey of recent efforts to address this computational issue is also given. In Section III, the proposed VBS selection scheme is described, accompanied by a theoretical analysis of its computational savings. Section IV presents and discusses the experimental results obtained by encoding various benchmark video sequences. Finally, Section V concludes the paper.

II. VBS MOTION ESTIMATION AND MODE DECISION

AVC defines seven block sizes for inter-prediction; the seven modes are shown in Figure 1. During encoding, an ideal encoder has to examine all of these modes to achieve the best inter-prediction, which requires performing all seven modes of search and then examining all 259 possible combinations of macroblock (MB) partitioning schemes to choose the best one. This VBS motion estimation and mode selection process results in a significant performance improvement in terms of rate-distortion. In contrast to Fixed Block Size Motion Estimation (FBSME), where all sub-blocks in a MB have the same size, Variable Block Size (VBS) ME improves the motion tracking capabilities of the encoder by allowing it to give more “attention” to highly active sub-blocks (representing an active region with more sub-blocks describes it better). However, this leads to a dramatic increase in the computational requirements of the encoder compared to traditional FBSME-based encoders.
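The 259 figure follows from a simple count, sketched here for illustration:

```python
# The 259 macroblock partitioning combinations: three "large" modes
# (16x16, 16x8, 8x16) plus the 8x8 mode, in which each of the four
# 8x8 blocks independently picks one of four sub-modes
# (8x8, 8x4, 4x8, 4x4), giving 4**4 = 256 combinations.
large_modes = 3          # 16x16, 16x8, 8x16
sub_modes_per_8x8 = 4    # 8x8, 8x4, 4x8, 4x4
combinations = large_modes + sub_modes_per_8x8 ** 4
assert combinations == 259
```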

This section overviews the main concept of VBS ME/MD. It starts with Section A, where the necessity of VBS ME/MD is investigated. Then in Section B, a description of the AVC reference software implementation approach is given. Finally, Section C summarizes some recent efforts to reduce the computational requirements of VBS ME/MD.

A. Necessity of VBS ME/MD

In this section the necessity of VBS ME is evaluated by first demonstrating its usage when encoding sequences with different types of motion, and then by comparing the R-D performance of the reconstructed sequences when VBS ME is fully or partially disabled.

Figure 2 shows the occurrence percentage of different block sizes as chosen by the AVC software reference model (“Joint Model” JM 10.2), running with all seven block mode types enabled. Full search is used for motion estimation. The test was performed on a wide range of sequences, from QCIF (176×144) up to HD 1080p (1920×1080), and for different motion types.

Table 1 summarizes the characteristics of the sequences that were used in this experiment.

The impact of choosing variable block sizes can be clearly seen. For example, the traditional 16×16 mode was chosen as optimal in almost 50% of the cases for the “Shields” sequence, which has a regular pan movement, but in as few as 5% of the cases for the highly irregular movement of the “Football” sequence, for which the smallest block size (4×4) was the best option in 40% of the cases. In addition, a more informative test was performed by running the JM several times for each sequence, disabling one or more inter search/decision modes in each run, and then comparing the R-D curves generated by all runs. Figure 3 shows the impact of disabling some of the search modes on the overall encoder efficiency. The sequence used for this test was Foreman CIF (30 fps). For simplicity, the encoder runs were restricted to a single reference frame and to the “Low Complexity Mode” of rate-distortion optimization. Each curve was generated with a different inter-search/decision-mode configuration; Table 2 describes the cases used to create Figure 3. Any single curve in the graph represents the typical relation between rate and distortion in digital video coding: as expected, the two parameters are inversely proportional. This is reasonable, since the better the quality to be preserved in an encoded video sequence, the more bits are required to represent the generated bitstream.

Rate is typically represented by the minimum number of bits per second required to transmit the generated bitstream without affecting the continuity of the reconstructed sequence. It depends on two parameters: the size of the generated bitstream, and the resolution/frame rate of the specific sequence. The quality of the reconstructed sequences is measured by the widely used Peak Signal to Noise Ratio (PSNR). Because it is easy to calculate, PSNR is the most common objective quality measure in video coding. It is measured on a logarithmic scale and depends on the Mean Squared Error (MSE) between an original and an impaired image or video frame, as shown in (1) and (2). F(i,j) and f(i,j) are the values of pixel (i,j) in the original and reconstructed images or video frames of resolution N×M, while (2^n − 1) is the highest possible signal value in the video frame, where n is the number of bits per frame sample.

MSE = Σ_{i,j} [f(i,j) − F(i,j)]² / (N×M)   (1)

PSNR = 10 log10 [ (2^n − 1)² / MSE ]   (2)
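Equations (1) and (2) translate directly into code; a minimal sketch (the function name and NumPy usage are illustrative):

```python
import numpy as np

def psnr(original, reconstructed, bits_per_sample=8):
    """MSE and PSNR as in (1) and (2); frames are equal-shape arrays."""
    f = original.astype(np.float64)
    g = reconstructed.astype(np.float64)
    mse = np.mean((g - f) ** 2)             # (1): mean squared pixel error
    peak = (2 ** bits_per_sample - 1) ** 2  # (2^n - 1)^2 for n-bit samples
    return 10 * np.log10(peak / mse)        # (2), in dB
```

For identical frames MSE is zero and the PSNR is unbounded, which is why PSNR is only reported for impaired reconstructions.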


Figure 3 shows that the blue curve (Case 0, which uses all the variable block sizes) is the best in terms of R-D performance. The brown curve (Case 5, where only a fixed 4×4 search/decision mode is used) performs relatively poorly, especially at low bitrates (around 1 Mb/s), since most of the residual data is cut off by the coarse quantization (high QP value), so the size of the encoded motion information has more influence on the output bitrate. The light-blue curve (Case 3, where only a fixed 16×16 search/decision mode is used) also performs poorly, especially at high bitrates (around 6 Mb/s), where the savings achieved by representing each macroblock with a single motion vector become negligible compared to the overhead of encoding the residuals, which grows for macroblocks with more “activity” inside.

Figure 4 shows that the generated R-D curves for all the tested sequences tend to demonstrate a relatively similar behaviour to what has been discussed above at different bitrates (the higher the sequence resolution, the higher the required bitrate).

In conclusion, VBS ME/MD is an important tool that has been used in AVC to optimize the R-D performance of the encoder. It allows inactive regions and regular movements to be represented with an optimal amount of motion information (e.g., 16×16 blocks), while for fast and highly irregular movements, the finer blocks are used (e.g., 4×4 blocks), achieving optimal error representation, at the cost of increased motion information.

B. Direct Implementation of VBS ME/MD

The AVC reference software (JM) adopts Rate-Distortion Optimization (RDO) by exhaustively searching for the “best” partitioning scheme (the one that gives the lowest R-D cost) among all 259 possible combinations: 4^4 = 256 combinations of 8×8-or-smaller sub-modes, plus the 8×16, 16×8 and 16×16 modes. This is shown in Figure 5 [2]. Hence, the JM searches exhaustively for the minimum distortion achievable subject to a specific rate constraint: find min(D) subject to R < Rc, which can be elegantly solved using Lagrangian optimization, where a distortion term is weighted against a rate term as follows [3]:

Find min(J), where J = D + λR   (3)

where J is the R-D cost to be minimized and D is the distortion. The JM uses the Sum of Absolute Differences (SAD) as the distortion metric because of its lower computational requirements compared to other metrics. SAD is calculated using (4), where F(i,j) and f(i,j) are the values of pixel (i,j) in the reference and the candidate block, respectively. R represents the rate; how it is estimated depends on the type of rate-distortion optimization scheme being used. For the high-complexity mode, the JM uses the exact number of bits for the header, motion information, and transform coefficients (as they appear after the last encoding stage) as the rate metric, as shown in (5). λ (the Lagrange parameter that determines the importance of rate with respect to distortion) is exponentially related to the Quantization Parameter (QP) after multiplication by a double-precision weight, as shown in (6).

SAD = Σ_{i,j} |f(i,j) − F(i,j)|   (4)

R = Rheader(exact) + Rmotion(exact) + Rcoefficients(exact)   (5)

λ = weight × 2^((QP−12)/3)   (6)

For the low-complexity mode, the motion cost is calculated with a less complex scheme. A lookup table (LUT) is used to roughly estimate the number of bits needed to encode the difference between the motion vectors and their predictors, as shown in (7), and λ is likewise estimated roughly from QP via a lookup table, as shown in (8).

R = LUT_MV_COST[MVcand − MVpred]   (7)

λ = LUT_QP2QUANT[max(0, QP−12)]   (8)

For simplicity, and for practical hardware implementation, any reference to rate-distortion optimization in the remainder of this paper means the low-complexity mode, unless clearly stated otherwise. A recent complexity analysis in [4] estimated the distribution of complexity among the modules of the AVC encoder. A typical profile of the encoder complexity (by files), measured on an Intel® Pentium™ III 1.0 GHz general-purpose processor with 512 MB of memory, is shown in the pie chart of Figure 6(a). Another run-time complexity analysis was performed in [6], where the reference software (Baseline profile) was executed on a Sun® Blade™ 1000 with an UltraSPARC™ III 1 GHz processor. The run-time percentage (rounded to the nearest integer) of each module is shown in the pie chart of Figure 6(b). The figure shows that motion estimation/compensation typically accounts for 70% to 90% of the overall computational requirements of a typical AVC encoder, which makes it a primary candidate for hardware acceleration and the main target of most research on reducing encoder computational requirements. Most of these efforts aim at accepting a “reasonable” quality degradation in exchange for computational savings compared to the optimal, yet very complex, exhaustive-search solution. The next section surveys some of those efforts.
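The cost computations above can be sketched directly; the 0.85 weight below is illustrative, not a value stated in this paper:

```python
import numpy as np

def sad(candidate, reference):
    """Sum of Absolute Differences, the JM distortion metric of (4)."""
    return int(np.sum(np.abs(candidate.astype(np.int64) -
                             reference.astype(np.int64))))

def high_complexity_lambda(qp, weight=0.85):
    """Lagrange multiplier of (6): weight * 2^((QP-12)/3).

    The double-precision `weight` (0.85 here) is an illustrative
    assumption; the low-complexity mode replaces this with a LUT, (8).
    """
    return weight * 2 ** ((qp - 12) / 3)
```

With these two pieces, the low- or high-complexity cost of a candidate mode is J = sad(...) + λ × R for whichever rate estimate is in use.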

C. Recent Efforts to Reduce Complexity of VBS ME/MD

Many algorithms for fast ME and MD have been proposed in the literature. Most rely on the fact that the human visual system is typically insensitive to PSNR degradations of less than 0.2 dB; hence, they reduce the computational requirements at the expense of minor, “unrecognizable” quality degradation. This section gives an overview of recent efforts to reduce the complexity of VBS ME/MD in AVC. Some of the algorithms introduced recently rely on exploiting video


features such as texture and edge information to predict the best possible mode. As an example, Wu et al [7] proposed to adjust the block sizes based on the homogeneity of the region in the MB. They observed that homogenous regions, which are determined by using the magnitude of the edge vector, tend to move together, and hence should not be split into smaller blocks. The results they provided indicate that the technique is not as effective for dynamic sequences as it is for inactive ones. They achieved a maximum computational requirements reduction of 45% for inactive sequences, but it did not exceed 10% reduction for active sequences. The tests they performed were on low resolution sequences only (QCIF and CIF), which leaves the effectiveness of their algorithm with high resolution sequences questionable. Lin et al [2] developed a combined algorithm for fast motion estimation and mode decision by exploiting the motion and cost information available from blocks that have been processed prior to the current block. The authors proposed to predict the mode of a current MB based on previously coded MBs, then perform a fast ME for this predicted mode. If the resulting cost is less than an adaptively calculated threshold, the algorithm skips any further calculations for this block. Although the authors did not investigate the potential of their algorithm for HW implementation, the way they described it makes it less likely to be implemented in hardware. A possible candidate HW design should be able to handle the worst case scenario, which is to calculate all the complex R-D costs for all the seven modes. This task is too complex to be executed within a reasonable number of clock cycles that fits typical encoder systems requirements unless a huge (non realistic) design is implemented. The authors did not provide any computational requirements analysis to justify the time savings they are claiming. 
They also reported a maximum PSNR degradation of 0.4 dB in their experiments on CIF sequences, which is large enough to generate visually noticeable defects. Yu et al. [8] proposed a strategy that incorporates temporal similarity detection as well as the detection of different moving features within a macroblock. Their experiments were performed on QCIF resolution only, and they showed up to 9% bitrate overhead with highly dynamic sequences. The authors did not provide any visual justification of the quality they claimed, nor did they show any potential for hardware implementation. In [9], the author proposed an algorithm that relies mainly on two predictive factors: the intrinsic complexity of the MB and mode knowledge from the previous frame. The algorithm added more than 5% bitrate overhead for dynamic sequences, and the maximum time saving achieved over all the tested QCIF and CIF sequences was 30%.

Tourapis et al. [10] proposed to extend and adapt the concept of the Enhanced Predictive Zonal Search (EPZS) motion estimation algorithms within the AVC standard. They proposed additional modifications to the predictor consideration, and introduced a new refinement pattern and a new iterative refinement process to improve the efficiency of the algorithms. The results of their experiments on selected CIF sequences showed a speedup in the motion estimation process (without providing enough computational requirements analysis to justify this speedup). However, in that paper, the authors did not introduce any enhancements in the mode decision process. In [11], the authors focused more on mode decision. They proposed a fast mode decision mechanism by considering that the mode decision error surface is monotonic with partition block size. Their approach depends on searching specific block sizes first; then, based on the values of the resulting cost parameters, a decision is made whether to perform an early termination or not. If not, all possible search modes are performed. This means that for highly dynamic sequences, the speedup of the mode decision process is expected to be negligible. The authors did not provide any computational requirements analysis or comparisons, and they reported a speedup in the encoding process of the examined CIF sequences without describing the configuration of the reference (JM) they compared with. In [12], the authors proposed fast mode decision and motion estimation processes with a main focus on MPEG–2/AVC transcoding. The algorithm utilizes the motion information from MPEG–2 for AVC encoding using EPZS. The target application for this work had limited processing power; hence the authors ignored the small sub-partitions that the AVC standard offers. This had its effect on the results obtained for the tested CIF sequences: highly dynamic sequences showed more than 0.3 dB degradation in PSNR, with a speedup of an order of magnitude in the encoding time.

Lee et al. [13] proposed an early skip mode decision as well as a selective intra mode decision. They reported a maximum improvement of 30% in the required encoding time. Ahmad et al. [14] proposed a scheme that is based on a 3D recursive search algorithm and takes into consideration the motion vector cost and previous frame information. They also reported a maximum decrease of 30% in the required encoding time. However, they did not provide any quality assessments, nor any computational requirements analysis of their algorithm. In [15], Kim et al. proposed an algorithm that uses the property of the All Zero Coefficients Block (AZCB), obtained by quantization and coefficient thresholding, to skip unnecessary modes. The authors reported an average speedup of two times for the tested QCIF and CIF sequences. However, based on the algorithm flow they provided, the algorithm is expected to lose its computational efficiency when applied to highly dynamic sequences, as no early skipping will be performed in this case and all the inter-search modes will be required. Jiang et al. [16] proposed a low-complexity VBS ME algorithm for video telephony communication. Their technique classifies the macroblocks in a frame into types and applies a different searching strategy for each type. They based their cost calculations on SAD only, ignoring the rate component, and did not propose any modification to the MD module; hence, the efficiency of the algorithm is limited to the usage of the proposed ME algorithm only. In spite of the time savings achieved compared to the full search approach, the reported results showed noticeably longer encoding times when compared to other fast ME algorithms. Also, the authors' experiments on QCIF and CIF sequences showed that the


International Journal on Advances in Telecommunications, vol 3 no 1 & 2, year 2010, http://www.iariajournals.org/telecommunications/

2010, © Copyright by authors, Published under agreement with IARIA - www.iaria.org


algorithm is confined to applications with sequences of low motion and high temporal correlation.
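The AZCB-based early skipping of [15] can be illustrated with a minimal sketch. The predicate below is a hypothetical simplification (the exact quantization and thresholding of [15] are not reproduced here): a block of transform coefficients quantizes to all zeros when every coefficient magnitude falls below the quantization step.

```python
def is_azcb(coeffs, qstep):
    # A block is an all-zero-coefficient block (AZCB) when every
    # transform coefficient is swallowed by the quantizer, i.e.
    # |c| < qstep for all coefficients c (illustrative criterion).
    return all(abs(c) < qstep for c in coeffs)

def candidate_inter_modes(coeffs, qstep, all_modes):
    # Early skipping in the spirit of [15]: if the residual of the
    # initial test quantizes to all zeros, only the skip/large-block
    # path is examined; otherwise every inter mode is still searched,
    # which is why the speedup vanishes for highly dynamic sequences.
    if is_azcb(coeffs, qstep):
        return ["SKIP", "16x16"]
    return list(all_modes)
```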

Tanizawa et al. [17] proposed a fast rate-distortion optimization method targeting low-complexity mode decision. The proposed method is based on a two-step hierarchical mode decision. In the first step, a simple R-D cost without tentative coding is calculated, and one or more coding candidates are selected using the obtained R-D cost; the number of candidates varies with the value of the quantization parameter (QP). In the second step, the conventional RDO method is applied to the candidates chosen in the first step. The authors' timing comparisons did not include the motion estimation operation. A disadvantage of this method is its unsuitability for hardware implementation due to the extremely complex RDO calculation that is applied to the successful candidates of the first step. Dai et al. proposed an algorithm in [18] that limits the candidate modes to a small subset by pre-encoding a down-sampled small version of the image. They reported a 50% reduction in encoding time for the three tested CIF sequences. Kuo et al. [19] also proposed a multi-resolution scheme and an adaptive rate-distortion model with early termination rules to accelerate the search process. In addition, the authors derived a rule to estimate the bits resulting from residual coding. The reported results showed significant improvement in the encoding time for the tested CIF sequences when compared with the exhaustive full search technique. However, the algorithm showed poor R-D performance when applied to highly dynamic sequences. Chen et al. [20] proposed a fast bits estimation method for entropy coding to be used in RDO instead of calculating the actual bitrate, which is an extremely complex operation if performed for each candidate location. The algorithm is based on simplifying the CAVLC only, which makes it inapplicable to the Main- and High-profile encoders that are intended to be configured to use CABAC. The authors did not propose any simplification to the motion search and mode decision strategy, which makes a hardware implementation of their method impractical.

Rhee et al. [21] introduced one of the earliest proposals to use VBS ME. In their work, they proposed two algorithms: one is computationally intensive and was proposed as a reference for subsequent research in the field, while the other is a simpler version based on heuristics. Their work relied on local motion information. A set of candidate motion vectors is first obtained for each fixed-size small block by a full search, keeping the vectors whose matching error is less than a prescribed threshold. Neighbouring blocks are then merged in a quad-tree manner if they have at least one motion vector in common. The only modes the algorithm supported were the square modes (4×4, 8×8, and 16×16). Tu et al. extended this work in [22] by taking the quantization parameter and the Lagrange cost function into consideration to determine the threshold for comparing the distance between the motion vectors of two blocks. The experimentation was performed on QCIF and CIF sequences; no timing or computational requirements analysis was introduced. The authors proposed to start with 8×8 searches and then merge all the way up the tree to obtain larger block sizes; hence the supported modes were (8×8, 16×8, 8×16, and 16×16) only. The authors then proposed in [23] a merge-and-split scheme, which gave the algorithm the ability to represent all search modes. They proposed to perform a “simple” refinement search after every merge or split operation. This extra search would introduce a relatively large complexity overhead to the motion estimation module if its complexity is comparable to that of the main motion estimation search. The results for the tested QCIF and CIF sequences were obtained at QP = 30 only, and no rate-distortion curves were given. In [24], the authors used the same merging scheme, with the possibility of excluding some low-probability modes in the mode decision process. Again, the extra search required after each merging or splitting would introduce a relatively large complexity overhead to the motion estimation module if its complexity is comparable to that of the main motion estimation search.
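The quad-tree merging criterion of [21] (merge neighbouring blocks that share at least one candidate motion vector) can be sketched as follows; the function name and data layout are illustrative, not taken from the paper.

```python
def merge_candidates(candidate_sets):
    # Quad-tree merge test in the spirit of [21]: child blocks are
    # merged into their parent only if their candidate motion-vector
    # sets share at least one common vector, which the merged block
    # then inherits as its own candidate set.
    common = set(candidate_sets[0])
    for s in candidate_sets[1:]:
        common &= set(s)
    return (True, common) if common else (False, set())
```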

Ates et al. [25] proposed a hierarchical block-matching-based motion estimation algorithm that uses a common set of SAD computations for the motion estimation of different block sizes. Based on the hierarchical prediction and the median motion vector predictor of AVC, the algorithm defines a limited set of candidate vectors, and the optimal motion vectors are chosen from this common set. The authors showed their complexity analysis; however, they did not support it with experimental timing measurements. The algorithm showed acceptable PSNR degradation and bitrate increase for the tested QCIF, CIF, and SIF sequences. The next section describes the proposed VBS selection scheme, accompanied by a theoretical computational requirements analysis to show its effectiveness.
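Sharing SAD computations across block sizes, as in [25], rests on the fact that the SAD is additive over sub-blocks: for a given motion vector, the SAD of a larger partition is the sum of the SADs of its 4×4 constituents. A minimal sketch (the frame layout and function names are illustrative):

```python
def sad4x4(cur, ref, bx, by, mvx, mvy):
    # Sum of absolute differences of one 4x4 block displaced by (mvx, mvy).
    return sum(
        abs(cur[by + y][bx + x] - ref[by + y + mvy][bx + x + mvx])
        for y in range(4) for x in range(4)
    )

def block_sads(cur, ref, mv):
    # Compute the sixteen 4x4 SADs of a macroblock once, then reuse
    # them for every larger partition: SAD is additive over sub-blocks.
    mvx, mvy = mv
    s4 = [[sad4x4(cur, ref, 4 * j, 4 * i, mvx, mvy) for j in range(4)]
          for i in range(4)]
    s8 = [[sum(s4[2 * i + di][2 * j + dj] for di in (0, 1) for dj in (0, 1))
           for j in range(2)] for i in range(2)]
    s16 = sum(sum(row) for row in s8)
    return s4, s8, s16
```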

III. THE PROPOSED VBS MODE DECISION ALGORITHM

The proposed variable block size motion estimation and mode decision algorithm is based on a merging scheme. It is a bottom-up approximation method that exploits the correlation of the smaller blocks' motion vectors in a uniform or close motion vector field to build up the larger blocks. The algorithm uses the typically available motion estimation information such as the predictor, the refinement displacement, and the cost. The cost here may be directly taken as the R-D cost described in (3), assuming that it is pre-calculated in the motion estimation engine and passed as an input to the algorithm, or it may be taken as the SAD, requiring the calculation of the R-D cost within the algorithm module. A description of the main data structure of the algorithm is given in Section A. In Section B, a theoretical computational requirements analysis is introduced to emphasise the computational efficiency of the proposed algorithm. Finally, Section C describes the flow of the proposed algorithm.
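Equation (3) is not reproduced in this excerpt; the sketch below assumes the standard Lagrangian motion cost of AVC encoders, J = D + λ·R, with the SAD as the distortion and a magnitude-dependent proxy for the motion-vector rate. The bit-count function is a hypothetical stand-in for the real exp-Golomb tables.

```python
def mv_bits(dx, dy):
    # Hypothetical proxy for the bits needed to code a motion-vector
    # difference: grows with magnitude, loosely exp-Golomb-shaped.
    def bits(v):
        return 2 * abs(v).bit_length() + 1
    return bits(dx) + bits(dy)

def rd_cost(sad, mv, predictor, lam):
    # Lagrangian motion cost J = D + lambda * R: distortion D is the
    # SAD, and R is the rate of coding (mv - predictor).
    dx, dy = mv[0] - predictor[0], mv[1] - predictor[1]
    return sad + lam * mv_bits(dx, dy)
```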


A. The Main Data Structure of the Proposed Algorithm
The main data structure of the algorithm is shown in Figure 7. It is mainly a decision tree that is used to decide which “suitable” neighbouring nodes to approximately merge upwards into a parent node. The algorithm is based on the observation that, if the cost of a parent block is higher than the sum of the costs of its children blocks, then the even larger block-size modes can be excluded. Each node in the tree represents a “legal” block partition. A node is represented by its best-representing motion vector, which has been calculated by block-based motion estimation (or interpolated by merging), the predictor used as the anchor to start searching around, and the R-D cost accompanying this motion vector and motion vector predictor.
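The node described above can be sketched as a small record; the field names and the merge rule below are illustrative, following the stated observation that a parent is excluded when its cost exceeds the sum of its children's costs.

```python
from dataclasses import dataclass

@dataclass
class Node:
    # One "legal" block partition in the merging tree of Figure 7.
    mv: tuple          # best-representing motion vector
    predictor: tuple   # MV predictor used as the search anchor
    cost: float        # R-D cost accompanying (mv, predictor)
    children: tuple = ()

def try_merge(parent_cost, children):
    # A parent node survives only if its own cost does not exceed the
    # summed costs of its children; otherwise larger block sizes are
    # excluded for this sub-tree (illustrative rule).
    if parent_cost <= sum(c.cost for c in children):
        return Node(mv=children[0].mv, predictor=children[0].predictor,
                    cost=parent_cost, children=tuple(children))
    return None
```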

B. Computational Complexity Analysis
The tree consists of four decision levels and five node levels, as shown in Figure 7. Roughly, it can be assumed that each of the even node levels of the tree (level 0, level 2, and level 4) requires approximately the same computational effort (1x per level) to find the best motion vectors for all the nodes of the level and to calculate their accompanying costs, while each of the two odd node levels (level 1 and level 3) requires approximately 2x to find the best motion vectors and costs.
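Under this metric, searching every level of the tree costs 1 + 2 + 1 + 2 + 1 = 7x, which matches the exhaustive JM baseline discussed below. The helper simply tallies the weights of whichever levels are actually searched (the dictionary encoding is illustrative):

```python
# Per-level search weight from the text: even node levels (0, 2, 4)
# cost roughly 1x each, odd node levels (1, 3) roughly 2x each.
LEVEL_COST = {0: 1, 1: 2, 2: 1, 3: 2, 4: 1}

def search_cost(searched_levels):
    # Total motion-search requirement for the levels that are actually
    # searched; the remaining levels are interpolated by merging.
    return sum(LEVEL_COST[level] for level in searched_levels)
```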

The overall ME/MD module computational requirements can be viewed as the sum of the VBS best motion vector search computational requirements and the VBS mode decision computational requirements. This is shown in (9).

Comp_Req_Total = Comp_Req_MV_Searches + Comp_Req_VBS_MD    (9)

Hence, the overall inter-related computational requirements of the ME/MD module in the JM can be estimated using Figure 5 and Equation (9). Due to the exhaustive nature of the direct JM implementation, all types of searches must be performed for all possible modes, which is equivalent to 7x, in order to find the best match for each and every block type. It is also required to find the best partitioning scheme among all 259 possible combinations, a relatively complex operation whose computational requirements will not be described in detail due to their dependence on the way the operation is implemented. Due to the dominance of the search computational requirements over those of the mode decision, any reference to (JM 7x) in the remainder of this paper is a reference to the exhaustive-search JM implementation of the ME/MD module.

By nature, the proposed merging tree makes the ME/MD module simpler than the JM implementation due to its hierarchical mode decision approximation scheme (a simpler mode decision operation). However, as mentioned above, the dominant part that determines the overall module computational requirements is the influence it has on the computational requirements of the motion vector search part (the first term of (9)). Therefore, the most effective cost reduction for the overall module comes from reducing the required number of searches. This led to the idea of allowing the algorithm to meet a wider base of market demands by making it adjustable to work at three different operating points, trading off computational requirements against R-D performance. The key concept is to reduce the required searches as much as possible, while still keeping the R-D degradation within the different allowance windows offered by different customers. Figure 8 describes the difference between the three operating points. The encircled levels refer to levels where the motion vectors of all their nodes are found by performing block-based searches. The arrows refer to the interpolated levels and where they are interpolated from. For operating point 1, the algorithm performs the 4×4, 8×8, and 16×16 types of searches. This corresponds to a search computational requirement of 3x (based on the metric mentioned above), which is less than half the computational requirements of the JM exhaustive implementation (JM 7x). The motion vectors and the predictors of the other types of modes (4×8, 8×4, 8×16, and 16×8) are “interpolated” using the actually calculated motion data of the child nodes in the levels that directly precede those levels in the merging tree structure. The algorithm working at this operating point would fit the requirements of high-resolution applications, such as SD and HD broadcasting. Operating point 2 is the second mode of the algorithm. It requires performing the 4×4 and 8×8 types of searches only, reducing the search computational requirement to 2x, while the motion vectors and the predictors of the other types of modes (4×8, 8×4, 8×16, 16×8, and 16×16) are “interpolated” using the tree structure.
Operating point 3 is introduced to meet the requirements of applications such as simple handheld devices, where reduced computational requirements (and power consumption) are the main priority. This mode requires performing 4×4 searches only (level 0 of the tree), reducing the search computational requirement to 1x, while the motion vectors and the predictors of all the other types of modes (4×8, 8×4, 8×8, 8×16, 16×8, and 16×16) are “interpolated” in a bottom-up fashion all the way through the tree structure.
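The three operating points can be summarized as a table of searched versus merged (“interpolated”) block types; the encoding below is illustrative, and the sanity check simply confirms that each operating point still covers all seven AVC inter partitions.

```python
# Searched vs. merged ("interpolated") block types per operating point.
OPERATING_POINTS = {
    1: {"searched": ("4x4", "8x8", "16x16"),
        "merged":   ("4x8", "8x4", "8x16", "16x8")},
    2: {"searched": ("4x4", "8x8"),
        "merged":   ("4x8", "8x4", "8x16", "16x8", "16x16")},
    3: {"searched": ("4x4",),
        "merged":   ("4x8", "8x4", "8x8", "8x16", "16x8", "16x16")},
}

def covered_modes(point):
    # Every operating point still decides among all seven inter modes;
    # only how the motion data is obtained (searched or merged) changes.
    cfg = OPERATING_POINTS[point]
    return sorted(cfg["searched"] + cfg["merged"])
```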

Table 3 shows the schemes that will be referred to in the next section, with a brief description of each. Note that JM 1x, JM 2x, and JM 3x are generated by running the JM with some of the inter search/decision modes disabled. They are mainly included to evaluate the performance of VBS 1x, VBS 2x, and VBS 3x (operating points 3, 2, and 1, respectively) against their corresponding almost-equal-computational-requirements JM implementations (in addition to the main comparison with the optimum R-D performance of JM 7x). In Table 3, a ‘√’ refers to an actual MV search and mode decision, a ‘–’ refers to the absence of this block-type search and mode decision, and an ‘M’ refers to a block type that is interpolated by merging child nodes. The number of comparisons required for VBS 1x, 2x, and 3x mode decision is derived assuming that each merging rule requires a single comparison. The MD computational requirements of VBS 2x and 3x also include the additions and comparisons required for early mode skipping, while the computational requirements of the control logic are not stated. The MD computational requirements for JM 2x and 3x are not given in detail because they are implementation-dependent.

C. Description of the Proposed Algorithm
In this section, a description of the algorithm that has been implemented is provided, and its coverage of the three operating points mentioned above is discussed. Figure 9 shows the flowchart of the algorithm. As can be seen, at operating point 3, the tree structure is parsed (checked for merging) starting from its level 0 (L0) all the way up to its level 4 (L4). At operating point 2, the sum of the costs (using (3)) of each four neighbouring 4×4 nodes (each L0 quad) is compared with the corresponding 8×8 cost to decide whether to start with the 4×4 quads followed by merge-checking one level up, or to start with the 8×8 level (L2) followed by merge-checking two levels up. Operating point 1 differs from operating point 2 by having an additional check for a possible early decision to choose the 16×16 mode, which makes the last-level merge-checking stage unnecessary. In Figure 9, it is assumed that the node costs are calculated on the spot; however, they can be passed as inputs to the VBS module, as they have already been calculated during the search for the best motion vectors. A key element that determines the efficiency of the above merging scheme is how accurate the merging rules are. On the other hand, their computational requirements are very influential in determining the second term in (9). Note that, based on Figure 7, different rules may be used for merge-checking different nodes. However, for simplicity, the same merging rule is initially presumed to be applicable to all pairs of blocks. Other varieties of the algorithm are possible, such as changing the rule according to the level it is located in, or giving some nodes more priority than others by subjecting them to more accurate merging rules. It is clear that a suitable rule would be an accurate, yet simple one. The steps of the merge-checking rule are provided in Figure 10.
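The starting-level decision described above can be sketched as follows; the function names are illustrative, and the early 16×16 check of operating point 1 is shown as a simple cost comparison.

```python
def start_level(quad_4x4_costs, cost_8x8):
    # Operating point 2: if an L0 quad of 4x4 nodes is already cheaper
    # than the corresponding 8x8 node, start merge-checking from the
    # 4x4 quads (level 0); otherwise start from the 8x8 level (L2).
    return 0 if sum(quad_4x4_costs) < cost_8x8 else 2

def early_16x16(cost_16x16, sum_child_costs):
    # Operating point 1 adds one extra check: choose 16x16 directly
    # when its searched cost already beats the summed child costs,
    # making the last merge-checking stage unnecessary.
    return cost_16x16 <= sum_child_costs
```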

It is worth mentioning that, for generality, the shown pseudo-code is based on the assumption that each searched node has been searched using N_MV different motion vector predictors (N_MV is assumed to be 4 as an example), or that N_MV successful candidates are elected by the ME search engine rather than only the best one per search. This means that each searched node will initially be marked by 4 best motion vectors and 4 motion vector predictors, which helps avoid falling into local minima. For simplicity, all the testing results mentioned in the next section have been generated with N_MV set to 1. It was also found that using Th_x = Th_y = 0 is a reasonable choice. The next section shows the experimental analysis and results obtained by encoding various benchmark video sequences.
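A hedged sketch of the merge-checking idea (Figure 10 is not reproduced in this excerpt): two sibling nodes pass the check if any pair of their N_MV candidate motion vectors agree to within the thresholds (Th_x, Th_y). With Th_x = Th_y = 0, the choice reported above, this requires an exact common motion vector.

```python
def merge_check(mvs_a, mvs_b, th_x=0, th_y=0):
    # Illustrative merge-checking rule: merge two sibling nodes if any
    # pair of their candidate motion vectors differ by at most th_x
    # horizontally and th_y vertically. With th_x = th_y = 0 this
    # degenerates to requiring one exact common motion vector.
    for ax, ay in mvs_a:
        for bx, by in mvs_b:
            if abs(ax - bx) <= th_x and abs(ay - by) <= th_y:
                return True
    return False
```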

IV. EXPERIMENTAL ANALYSIS AND RESULTS

In this section, the results of the experiments performed to evaluate the proposed algorithm are introduced. The main goal is to compare the algorithm, at its three operating points, to the different JM references mentioned in the previous section. Section A describes the method and experimentation environment, while Section B shows the detailed comparisons with the brute-force method adopted by the reference software in terms of time complexity and R-D performance.

A. Method and Experimentation Environment
The evaluation is done by performing a comprehensive test in which various video sequences are encoded by seven versions of the encoder (see Table 3) throughout a specific range of the Quantization Parameter (QP). The goal is to plot the seven generated R-D curves for each sequence to gauge the closeness of the R-D performance of the encoder using the proposed VBS ME/MD algorithm to that of the optimum exhaustive solution. In addition, the encoding times of the various sequences with the tested versions of the reference software are given and compared to the “theoretical” computational requirements analysis introduced in Section B of the previous section. The sequences were selected to represent a variety of motion types, which helps in creating a wide range of input stimuli to test the algorithm's behaviour. In order to cover a wider range of rate and quality requirements during the testing process, QP was set to vary throughout a broad range of values.

B. Obtained Results
This section starts by showing execution-time measurements and results in subsection 1, followed by the rate-distortion results in subsection 2.

1) Execution Time Measurements and Results

All the results discussed in this section have been obtained by running the seven versions of the AVC encoder (defined in Table 3) on a unified platform. Table 4 shows the time spent by each of the seven versions of the software on encoding ten frames of each of the tested sequences (with QP = 30). The values between parentheses represent the savings in computational time with respect to the JM 7x version. The table shows that using VBS 1x, VBS 2x, or VBS 3x reduces the encoding time of all the tested sequences by more than half when compared to the encoding time required by the pure JM 7x version. This conforms to the theoretical computational requirements analysis discussed in Section B of the previous section. The total encoding time for VBS 1x, VBS 2x, and VBS 3x is almost the same as for JM 1x, JM 2x, and JM 3x, respectively (with a minor increase due to the extra comparisons and additions). However, the next section shows that the improvement of the VBS nx algorithms over the corresponding JM nx ones in terms of R-D performance is clear enough to neglect this minor complexity overhead, especially when targeting low-bitrate applications. The table also shows that most of the encoding time is spent on the motion estimation calculation, which conforms to the encoder complexity profile shown in Figure 6. Figure 11 shows a graphical representation of the obtained results. The results show that the more motion a sequence contains, the more time is spent in motion estimation, which in turn translates to an overall increase in encoding time.

2) Rate-Distortion Measurements and Results
Figure 12 to Figure 14 show the R-D behaviour of the tested versions of the encoder, with an emphasis on the low-bitrate region. The figures show that the generated graphs are consistent with the expected behaviour of the algorithm. It is clear that, for all the sequences, the lower the bitrate, the more effective the algorithm appears to be (at all operating points). This is because at lower bitrates the motion data have an effect on the bitrate comparable to that of the residual data; hence any savings are highly noticeable.

For all sequences, VBS 1x may be used as an optimized version of JM 1x. Though its performance is relatively poor when compared to VBS 2x, VBS 3x, or JM 7x, it can be a reasonable compromise when the target application requires a low-complexity, low-power system with reasonable R-D behaviour. For all the examined sequences, JM 7x does not outperform VBS 3x by more than 0.2 dB at any bitrate (around the target bitrate that suits the sequence resolution).

VBS 1x, VBS 2x, and VBS 3x introduce enhancements over JM 1x, JM 2x, and JM 3x, respectively. The enhancements become more noticeable at low bitrates. For example, VBS 1x introduces a huge enhancement (around 8 dB) over JM 1x when encoding the sequence “Mobile and Calendar QCIF 30 fps” (Figure 14) at a bitrate as low as 29.5 Kbps. Also, the merging operation performed in VBS 2x boosted the curve of JM 2x for the sequence “Miss America QCIF 30 fps” (Figure 13) by 6 dB at a bitrate of 26.5 Kbps.

In summary, the experimental analysis and results demonstrate the main contribution of the proposed algorithm: the ability to exhibit acceptable R-D behaviour for different sequences with various types of motion, while keeping the ME/MD computational requirements below half those of the ME/MD of JM 7x. This leads to faster encoding times on software platforms, as well as smaller (hence less expensive) implementations on hardware platforms.

V. CONCLUSION

Having the VBS ME tool in the AVC standard improves its coding efficiency significantly. However, it also introduces extreme computational requirements to the encoder. The JM (the AVC software reference model) takes an exhaustive approach to implementing VBS ME/MD: all seven types of motion estimation searches are performed, and then, in the mode decision step, an exhaustive search follows to choose the best partitioning scheme among all possible combinations. Knowing that VBS ME and MD typically represent from 70% to 90% of the entire encoder's computational requirements, many research efforts have been introduced in the literature to reduce them. However, most of the solutions were local to specific types of simplified motion estimation searches.

In this paper, a computationally efficient VBS selection scheme was introduced. The scheme is applicable to any VBS ME module, leading to a significant reduction in its computational requirements with a minor loss in the quality of the reconstructed picture. Three versions of the proposed algorithm have been introduced in order to meet different applications' demands. Evaluation experiments were performed on three benchmark video sequences with various spatial and temporal characteristics, ranging from smooth slow motion up to random fast motion. Timing analysis of the performed experiments showed that the proposed algorithm (in all three versions) reduces the encoding time of all the tested sequences by at least half when compared to the encoding time required by the pure brute-force solution. Objective quality, measured by R-D performance, showed that, for all the performed tests, VBS 1x, VBS 2x, and VBS 3x introduce enhancements over JM 1x, JM 2x, and JM 3x, respectively, with minor computational overhead. In general, the proposed algorithm is most effective for low-power encoding devices with reduced computational resources, especially when targeting low-bitrate video applications.

ACKNOWLEDGMENT

The authors would like to thank the Advanced Technology Information Processing Systems (ATIPS) Laboratory, Alberta Informatics Circle of Research Excellence (iCORE), the Alberta Ingenuity Fund (AIF), the Natural Sciences and Engineering Research Council of Canada (NSERC), CMC Microsystems, Micronet R&D, Canada Foundation for Innovation (CFI), the Department of Electrical and Computer Engineering at the University of Calgary, Xilinx Inc., and the German University in Cairo (GUC) for supporting this research.

REFERENCES

[1] I. Amer, R. Hamed, W. Badawy, G. Jullien, and J. W. Haslett, “An Enhanced Lenient Merging Scheme for H.264 Variable Block Size Selection,” Proceedings of the International Conference on Advances in Multimedia (MMEDIA), Colmar, France, pp. 136-139, July 2009.

[2] S. Lin, C. Chang, C. Su, Y. Lin, C. Pan, and H. Chen, “Fast Multi-Frame Motion Estimation and Mode Decision for H.264 Encoders,” Proceedings of IEEE International Conference on Wireless Networks, Communications and Mobile Computing, Vol. 2, pp. 1237-1242, June 2005.


[3] G.J. Sullivan and T. Wiegand, “Rate-Distortion Optimization for Video Compression,” IEEE Signal Processing Magazine, Vol. 15, pp. 74 - 90, November 1998.

[4] W. Chung, “Implementing the H.264/AVC Video Coding Standard on FPGAs,” A white paper. [Online]. Available:

[5] www.xilinx.com/publications/solguides/be_01/xc_pdf/p18-21_be1-dsp4.pdf

[6] Y.-W. Huang, B.-Y. Hsieh, S.-Y. Chien, S.-Y. Ma, and L.-G. Chen, “Analysis and Complexity Reduction of Multiple Reference Frames Motion Estimation in H.264/AVC,” IEEE Transactions on Circuits and Systems for Video Technology, Vol. 16, No. 4, pp. 507-522, April 2006.

[7] D. Wu, S. Wu, K. Lim, F. Pan, Z. Li, and X. Lin, “Block INTER Mode Decision for Fast Encoding of H.264,” Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, Vol. 3, pp. 181-184, May 2004.

[8] A. Yu and G. Martin, “Advanced Block Size Selection Algorithm For Inter Frame Coding in H.264/MPEG–4 AVC,” Proceedings of IEEE International Conference on Image Processing, Vol. 1, pp. 95-98, October 2004.

[9] A. Yu, “Efficient Block-Size Selection Algorithm for Inter-Frame Coding in H.264/MPEG–4 AVC,” Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, Vol. 3, pp. 169-172, May 2004.

[10] H. Tourapis and A. Tourapis, “Fast Motion Estimation within the H.264 CODEC,” Proceedings of IEEE International Conference on Multimedia and Expo, Vol. 3, pp. 517-520, July 2003.

[11] P. Yin, H. Tourapis, A. Tourapis, and J. Boyce, “Fast Mode Decision and Motion Estimation for JVT/H.264,” Proceedings of IEEE International Conference on Image Processing, Vol. 3, pp. 853-856, September 2003.

[12] X. Lu, A. Tourapis, P. Yin, and J. Boyce, “Fast Mode Decision and Motion Estimation for H.264 with a Focus on MPEG–2/H.264 Transcoding,” Proceedings of IEEE International Symposium on Circuits and Systems, Vol. 2, pp. 1246-1249, May 2005.

[13] J. Lee and B. Jeon, “Fast Mode Decision for H.264,” Proceedings of IEEE International Conference on Multimedia and Expo, Vol. 2, pp. 1131-1134, June 2004.

[14] A. Ahmad, N. Khan, S. Masud, and M.A. Maud, “Efficient Block Size Selection in H.264 Video Coding Standard,” IEE Electronics Letters, Vol. 40, No. 1, pp. 19-21, January 2004.

[15] Y.-H. Kim, J.-W. Yoo, S.-W. Lee, J. Shin, J. Paik, and H.-K. Jung, “Adaptive Mode Decision for H.264 Encoder,” IEE Electronics Letters, Vol. 40, No. 19, pp. 1172-1173, September 2004.

[16] Y. Jiang, S. Li, and S. Goto, “A Low Complexity Variable Block Size Motion Estimation Algorithm for Video Telephony Communication,” Proceedings of IEEE International Midwest Symposium on Circuits and Systems, Vol. 2, pp. 465-468, July 2004.

[17] A. Tanizawa, S. Koto, T. Chujoh, and Y. Kikuchi, “A Study on Fast Rate-Distortion Optimized Coding Mode Decision for H.264,” Proceedings of IEEE International Conference on Image Processing, Vol. 2, pp. 793-796, October 2004.

[18] Q. Dai, D. Zhu, and R. Ding, “Fast Mode Decision for Inter Prediction in H.264,” Proceedings of IEEE International Conference on Image Processing, Vol. 1, pp. 119-122, October 2004.

[19] C.-H. Kuo, M. Shen, and C.-C. Kuo, “Fast Inter-Prediction Mode Decision and Motion Search for H.264,” Proceedings of IEEE International Conference on Multimedia and Expo, Vol. 1, pp. 663-666, June 2004.

[20] Q. Chen and Y. He, “A Fast Bits Estimation Method for Rate-Distortion Optimization in H.264/AVC,” Proceedings of Picture Coding Symposium, December 2004.

[21] I. Rhee, G.R. Martin, S. Muthukrishnan, and R.A. Packwood, “Quadtree-Structured Variable-Size Block-Matching Motion Estimation with Minimal Error,” IEEE Transactions on Circuits and Systems for Video Technology, Vol. 10, No. 1, pp. 42-50, February 2000.

[22] Y.-K. Tu, J.-F. Yang, Y.-N. Shen, and M.-T. Sun, “Fast Variable-Size Block Motion Estimation Using Merging Procedure with an Adaptive Threshold,” Proceedings of IEEE International Conference on Multimedia and Expo, Vol. 2, pp. 789-792, July 2003.

[23] Z. Zhou, M.-T. Sun, and Y.-F. Hsu, “Fast Variable Block-Size Motion Estimation Algorithms Based on Merge and Split Procedures for H.264/MPEG–4 AVC,” Proceedings of IEEE International Symposium on Circuits and Systems, Vol. 3, pp. 725-728, May 2004.

[24] Z. Zhou and M.-T. Sun, “Fast Macroblock Inter Mode Decision and Motion Estimation for H.264/MPEG–4 AVC,” Proceedings of International Conference on Image Processing, Vol. 2, pp. 789-792, October 2004.

[25] H.F. Ates and Y. Altunbasak, “SAD Reuse in Hierarchical Motion Estimation for the H.264 Encoder,” Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, Vol. 2, pp. 905-908, March 2005.

[26] Y.-W. Huang, T.-C. Wang, B.-Y. Hsieh, and L.-G. Chen, “Hardware Architecture Design for Variable Block Size Motion Estimation in MPEG–4 AVC/JVT/ITU-T H.264,” Proceedings of IEEE International Symposium on Circuits and Systems, Vol. 2, pp. 796-799, May 2003.


Figure 1. Variable block sizes defined in AVC

[Stacked bar chart: for each sequence (Foreman QCIF, Foreman CIF, Football CIF, City 4CIF, Shields 720p, Tractor 1080p), the percentage share (0%-100%) of Inter MB 4×4, 4×8, 8×4, 8×8, 8×16, 16×8, 16×16 and Intra MB modes.]

Figure 2. Optimal distribution of the various block sizes in the inter-predicted frames


Table 1. Summary of the characteristics of the tested sequences

Sequence | Type       | Resolution | Frame Rate | Description of the first ten frames | Type of motion
Foreman  | QCIF       | 176×144    | 30 fps | A man talking to a still camera | Slow, limited motion
Foreman  | CIF        | 352×288    | 30 fps | A man talking to a still camera | Slow, limited motion
Football | CIF        | 352×288    | 30 fps | A part of a football game | Extensive motion
City     | SD (4CIF)  | 704×576    | 30 fps | A scene of a city taken with a panning camera | Regular motion
Shields  | HD (720p)  | 1280×720   | 60 fps | A person pointing at a group of shields while the camera is panning | Camera shooting of highly textured scenes
Tractor  | HD (1080p) | 1920×1080  | 60 fps | A tractor working in a field | Camera shooting at very high resolution

[R-D plot: PSNR (dB), 35-44 dB, versus bitrate, 0-7000 kbit/s, with curves for Case 0 through Case 5 and an annotation marking the direction of better performance.]

Figure 3. R-D behaviour of Foreman CIF (30 fps) with various inter searches/decision-modes enabled

Table 2. Description of the different curves in Figure 3

Curve  | Description
Case 0 | 16×16, 16×8, 8×16, 8×8, 8×4, 4×8, and 4×4 search/decision modes are enabled
Case 1 | 16×16, 8×8, 8×4, 4×8, and 4×4 search/decision modes are enabled
Case 2 | 16×16, 8×8, and 4×4 search/decision modes are enabled
Case 3 | Only the 16×16 search/decision mode is enabled
Case 4 | Only the 8×8 search/decision mode is enabled
Case 5 | Only the 4×4 search/decision mode is enabled


[Four R-D plots, PSNR (dB) versus bitrate (kbit/s), each with curves for Case 0 through Case 5.]

Figure 4. R-D behaviour of different sequences with different inter searches/decision-modes enabled (a) Foreman QCIF (30 fps) (b) Mobile CIF (30 fps) (c) Football CIF (30 fps) (d) Shields HD 720p (60 fps)

Figure 5. Exhaustive search for best partition scheme as adopted in AVC reference software

[Flowchart: for every possible block size, perform full/fast-search ME and calculate the RD cost; once all modes have been tested, determine the best mode out of the 259 possible ones.]


[Two pie charts. (a) Complexity by source file: mv_search.c 68%, block.c 8%, refbuf.c 7%, cabac.c 3%, biariencode.c 3%, rdopt.c 3%, macroblock.c 3%, memcpy.asm 3%, abc.c 1%, image.c 1%, rdopt_coding_state.c 0%, loopFilter.c 0%. (b) Complexity by functional module: Integer ME 52%, Fractional ME 37%, Interpolation 8%, Mode Decision 2%, Intra Prediction 1%, Exp-Golomb VLC+CAVLC 0%, Deblocking Filter 0%, DCT+Q+IQ+IDCT 0%.]

Figure 6. AVC encoder computational complexity (a) profiled by files (from [4]) (b) profiled by functional modules (from [6])

Figure 7. The main data structure of the algorithm: The merging tree


[Diagrams (a)-(d): the merging tree and the searches and interpolations required at each operating point, annotated with the merge rules applied at each level: A1-A16, B1-B4, C1-C4, and D1.]

Figure 8. Required searches and interpolations for the algorithm’s three operating points (a) The merging tree (b) Operating point 1 (VBS 3x) (c) Operating point 2 (VBS 2x) (d) Operating point 3 (VBS 1x)

Table 3. Required searches and computational complexities for the tested references

Ref.   | 4×4 | 8×4 | 4×8 | 8×8 | 16×8 | 8×16 | 16×16 | Computational Requirements
JM 7x  |  √  |  √  |  √  |  √  |  √   |  √   |  √    | 7x MV search + searching 259 combs for MD
JM 3x  |  √  |  –  |  –  |  √  |  –   |  –   |  √    | 3x MV search + searching 17 combs for MD
JM 2x  |  √  |  –  |  –  |  √  |  –   |  –   |  –    | 2x MV search + searching 16 combs for MD
JM 1x  |  √  |  –  |  –  |  –  |  –   |  –   |  –    | 1x MV search + no mode decision
VBS 3x |  √  |  M  |  M  |  √  |  M   |  M   |  √    | 3x MV search + 26 comps + 18 adds (for MD)
VBS 2x |  √  |  M  |  M  |  √  |  M   |  M   |  M    | 2x MV search + 25 comps + 12 adds (for MD)
VBS 1x |  √  |  M  |  M  |  M  |  M   |  M   |  M    | 1x MV search + 25 comps (for MD)


Figure 9. Flowchart of the proposed algorithm


Figure 10. Steps of the merge-checking rule

1. If one of the nodes is “unavailable”, then the parent node is also “unavailable”; otherwise:

2. Calculate the number of identical (or semi-identical) MVs in the pools of candidate MVs of the two nodes (4×4 = 16 possible pairs of MVs). A simple rule to identify a semi-identical pair would be:

D_MV_x = abs(MV_x1 - MV_x2);
D_MV_y = abs(MV_y1 - MV_y2);
if ((D_MV_x <= Th_x) && (D_MV_y <= Th_y)) {
    semi_identical = true;
    n_semi_iden_MV++;
} else {
    semi_identical = false;
}

with n_semi_iden_MV initialized to zero. This test is repeated over all pairs to count the number of semi-identical MVs between the two nodes under test (n_semi_iden_MV).

3. Decide whether to merge the two nodes under test based on the following rule:

if (0 < QP && QP < 12)
    merge = (n_semi_iden_MV > 12);
else if (13 < QP && QP < 25)
    merge = (n_semi_iden_MV > 9);
else if (26 < QP && QP < 38)
    merge = (n_semi_iden_MV > 6);
else
    merge = (n_semi_iden_MV > 3);

4. If the two nodes are chosen to be merged, then the parent node is marked as “available”. The averages of the best 4 pairs of semi-identical MVs are assigned to the parent node; these will be used to decide whether this parent node is to be merged with its neighbour next-level node.

5. If the two nodes are not to be merged, then each of them is marked with the best of its 4 candidate MVs, and the parent node is marked as “unavailable”.
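For concreteness, the counting-and-threshold logic of steps 2 and 3 can be sketched as follows. This is a Python illustration of ours, not code from the reference software; the threshold values Th_x = Th_y = 1 and the function names are assumptions, and the printed QP ranges are kept verbatim (boundary values fall through to the final branch):

```python
def count_semi_identical(pool1, pool2, th_x=1, th_y=1):
    """Count semi-identical MV pairs between two candidate-MV pools (step 2)."""
    n = 0
    for (x1, y1) in pool1:
        for (x2, y2) in pool2:
            if abs(x1 - x2) <= th_x and abs(y1 - y2) <= th_y:
                n += 1
    return n

def merge_threshold(qp):
    """QP-dependent merge threshold from step 3 (ranges as printed in the paper)."""
    if 0 < qp < 12:
        return 12
    elif 13 < qp < 25:
        return 9
    elif 26 < qp < 38:
        return 6
    return 3

def merge_check(pool1, pool2, qp, th_x=1, th_y=1):
    """True if the two nodes under test should be merged."""
    return count_semi_identical(pool1, pool2, th_x, th_y) > merge_threshold(qp)
```

With two nearly identical 4-MV pools, all 16 pairs are semi-identical and the nodes merge even at the lowest QP range; pools that are far apart never merge.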


Table 4. Encoding time of the tested sequences via the seven versions of the reference software

Version | Metric                 | Carphone QCIF (30 fps) | Miss America QCIF (30 fps) | Mobile QCIF (30 fps)
JM 7x   | Total enc. time (sec)  | 7.11           | 6.83           | 7.2
JM 7x   | Time spent on ME (sec) | 6.794          | 6.5            | 6.9
VBS 3x  | Total enc. time (sec)  | 3.302 (53.56%) | 3.01 (59.93%)  | 3.11 (56.81%)
VBS 3x  | Time spent on ME (sec) | 2.861 (59.76%) | 2.7 (58.46%)   | 3.0 (56.52%)
JM 3x   | Total enc. time (sec)  | 3.049 (57.12%) | 2.99 (56.22%)  | 3.1 (56.94%)
JM 3x   | Time spent on ME (sec) | 2.877 (57.65%) | 2.71 (58.31%)  | 2.9 (57.97%)
VBS 2x  | Total enc. time (sec)  | 2.172 (69.45%) | 2.153 (68.48%) | 2.2 (69.44%)
VBS 2x  | Time spent on ME (sec) | 1.952 (72.55%) | 1.9 (70.77%)   | 2.1 (69.57%)
JM 2x   | Total enc. time (sec)  | 2.172 (69.45%) | 2.14 (68.67%)  | 2.18 (69.72%)
JM 2x   | Time spent on ME (sec) | 1.892 (72.15%) | 1.85 (71.54%)  | 2.0 (71.01%)
VBS 1x  | Total enc. time (sec)  | 1.061 (85.08%) | 0.95 (86.09%)  | 1.2 (83.33%)
VBS 1x  | Time spent on ME (sec) | 0.813 (88.33%) | 0.75 (88.46%)  | 0.9 (86.96%)
JM 1x   | Total enc. time (sec)  | 1.046 (85.29%) | 0.9 (86.83%)   | 1.1 (84.72%)
JM 1x   | Time spent on ME (sec) | 0.797 (88.27%) | 0.74 (88.62%)  | 0.8 (84.06%)


[Two bar charts comparing JM 7x, VBS 3x, JM 3x, VBS 2x, JM 2x, VBS 1x, and JM 1x on the Carphone QCIF, Miss America QCIF, and Mobile QCIF sequences.]

Figure 11. Time spent by the seven versions of the software on encoding the tested sequences (a) Total encoding time (b) ME encoding time

[R-D plot: PSNR (dB) versus bitrate (bps), with curves for JM 7x, VBS 3x, JM 3x, VBS 2x, JM 2x, VBS 1x, and JM 1x.]

Figure 12. R-D behaviour for Carphone QCIF (30 fps) at low bitrate


[R-D plot: PSNR (dB) versus bitrate (bps), with curves for JM 7x, VBS 3x, JM 3x, VBS 2x, JM 2x, VBS 1x, and JM 1x.]

Figure 13. R-D behaviour for Miss America QCIF (30 fps) at low bitrate

[R-D plot: PSNR (dB) versus bitrate (bps), with curves for JM 7x, VBS 3x, JM 3x, VBS 2x, JM 2x, VBS 1x, and JM 1x.]

Figure 14. R-D behaviour for Mobile and Calendar QCIF (30 fps) at low bitrate


Exploiting Concatenation in the Design of Low-Density Parity-Check Codes

Marco Baldi, Giovanni Cancellieri, and Franco Chiaraluce
DIBET, Polytechnic University of Marche, Ancona, Italy
Email: {m.baldi, g.cancellieri, f.chiaraluce}@univpm.it

Abstract—This paper describes a method that exploits the concatenation of very simple component codes to obtain good low-density parity-check codes. It makes it possible to design codes with a number of benefits, such as high flexibility in length and rate and low encoding complexity. We focus on two kinds of concatenation: the former is classic serial concatenation, in which redundancy is progressively added to the encoded word, whereas the latter is the special case of concatenation coinciding with a bi-dimensional product code. The proposed design technique yields codes characterized by parity-check matrices with a low density of 1 symbols and free of short cycles in their associated Tanner graphs, so efficient algorithms based on the belief propagation principle can be adopted for their decoding. In addition, the systematic form of the component codes can ensure rate compatibility, so the proposed codes can be adopted in type-II hybrid automatic repeat request schemes. We analyze their properties through theoretical arguments and provide some design examples to assess their performance.

Keywords-LDPC codes; concatenated codes; product codes.

I. INTRODUCTION

This paper deals with the design of a family of structured low-density parity-check (LDPC) codes based on serial concatenation [1]. The introduction of concatenation in the design of efficient schemes for forward error correction (FEC) is due to David Forney in 1966 [2]. Since then, several forms of concatenation have been exploited in the design of codes, providing better and better performance, until the introduction of turbo codes [3], based on concatenated convolutional codes and Bahl, Cocke, Jelinek and Raviv (BCJR) decoders [4].

In recent years, LDPC codes [5] have become the state of the art in FEC techniques, due to their capacity-approaching performance under belief propagation decoding [6]. Since their rediscovery [7], many design techniques for LDPC codes have been proposed, which can be classified into structured and non-structured approaches. Structured approaches permit the design of codes with rather low implementation complexity that, however, must obey a number of constraints in terms of length and rate, as in the case of quasi-cyclic (QC) LDPC codes [8]. Non-structured designs, instead, are able to produce good LDPC codes with very fine length and rate granularity, as occurs with the progressive edge growth technique [9]. However, non-structured techniques generally produce codes that are less amenable to hardware implementation, due to the lack of structure in their characteristic matrices.

As a first aim of this paper, we study how to design structured LDPC codes that can be represented as the serial concatenation of very simple components [10], [11]. We adopt, as component codes, a class of polynomial codes having a binomial generator polynomial. This approach permits us to design codes with a very simple intrinsic structure, which allows the adoption of low-complexity encoder circuits. The proposed codes can also be shortened arbitrarily, thus obtaining fine length and rate granularity.

We show that the proposed technique can be used to design sets of rate-compatible codes [12]. Rate compatibility is important, for example, in the implementation of Type-II Hybrid Automatic Repeat-reQuest (T-II HARQ) schemes, where packets are initially encoded with a high-rate code, and redundancy is then transmitted incrementally until successful decoding is achieved. T-II HARQ schemes are particularly useful in packet-switched communication networks, since they allow the achievement of capacity-approaching unequal error correction. Serial concatenation is a consolidated procedure for designing rate-compatible codes [13], [14].

When the code length increases, the advantages of adopting serial concatenation can be limited by the need for rather large component codes. For this reason, we consider a further form of concatenation, coinciding with the structure of a bi-dimensional product code. At the cost of some loss in performance, the adoption of a product structure keeps the component codes small while allowing the design of concatenated codes with large blocks. We will show that both the serial concatenation and the product structure preserve the LDPC nature of the codes when the special class of component codes we consider is adopted. We will also develop some examples aimed at assessing the effect of different design choices in the combination of serial concatenation with the product structure.

The paper is organized as follows. Section II introduces the code design approach based on the serial concatenation of codes with binomial generator polynomial. In Section III, the characteristics of the proposed codes are studied. Section IV reports some design examples of serially concatenated codes and their simulated performance, whereas Section V gives some examples of usage of such codes


in bi-dimensional product structures. Finally, Section VI concludes the paper.

II. CODE DESIGN

A. Component codes

In the considered scheme, the $i$-th component code ($i = 1, \ldots, M$) is a polynomial code with generator polynomial

$$g^{(i)}(x) = 1 + x^{r_i}, \quad (1)$$

where $r_i$, a suitably chosen positive integer, represents the code redundancy. We denote by $n_i$ the length of the $i$-th component code and by $k_i = n_i - r_i$ its dimension. Each component code can be seen as a shortened version of a binary cyclic code with length $N_i = \lceil n_i / r_i \rceil \cdot r_i \ge n_i$, where $\lceil \cdot \rceil$ returns the smallest integer greater than or equal to its argument. It can be easily verified that

$$1 + x^{N_i} = \left(1 + x^{r_i}\right)\left(1 + x^{r_i} + x^{2 r_i} + \ldots + x^{N_i - r_i}\right), \quad (2)$$

so a valid parity polynomial for the cyclic code is

$$h^{(i)}(x) = 1 + x^{r_i} + x^{2 r_i} + \ldots + x^{N_i - r_i}. \quad (3)$$

Starting from the coefficients of the parity polynomial, it is easy to obtain a valid parity-check matrix for any binary cyclic code in its standard form [15]. For the considered cyclic codes, the parity-check matrix $H_i$ has a very regular structure: a single row of $\lceil n_i / r_i \rceil$ identity blocks of size $r_i \times r_i$. It follows that $H_i$ has size $r_i \times N_i$.

The $i$-th cyclic code has dimension $K_i = N_i - r_i \ge n_i - r_i = k_i$. Each $K_i$-bit information vector can be associated to an information polynomial $m^{(i)}(x)$ as follows:

$$m^{(i)}(x) = m_0 + \ldots + m_{k_i - 1} x^{k_i - 1} + \ldots + m_{K_i - 1} x^{K_i - 1}, \quad (4)$$

where $m_0, \ldots, m_{K_i - 1} \in \{0, 1\}$ are the information bits. The codeword corresponding to $m^{(i)}(x)$ can be expressed, in polynomial terms, as

$$t^{(i)}(x) = t_0 + \ldots + t_{n_i - 1} x^{n_i - 1} + \ldots + t_{N_i - 1} x^{N_i - 1} = m^{(i)}(x)\, g^{(i)}(x). \quad (5)$$

We shorten the cyclic code by imposing $m_{k_i} = m_{k_i + 1} = \ldots = m_{K_i - 1} = 0$. This implies $t_{n_i} = t_{n_i + 1} = \ldots = t_{N_i - 1} = 0$, and the parity-check matrix can be accordingly shortened by eliminating its first $N_i - n_i$ columns. Figure 1 shows the structure of the parity-check matrix of the cyclic code and of its shortened version. Black diagonals represent 1 symbols, whereas the other symbols are null. The shortened matrix corresponds to the sub-matrix marked in grey.


Figure 1. Parity-check matrix of the i-th component code.
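To make this construction concrete, the following sketch (a Python illustration of ours, assuming numpy; the function name is not from the paper) builds $H_i$ as a row of $r_i \times r_i$ identity blocks and shortens it by dropping the first $N_i - n_i$ columns:

```python
import numpy as np

def component_parity_check(n_i, r_i):
    """Parity-check matrix of the shortened cyclic code with g(x) = 1 + x^{r_i}.

    The parent cyclic code of length N_i = ceil(n_i / r_i) * r_i has
    H = [I I ... I], a single row of r_i x r_i identity blocks; shortening
    drops the first N_i - n_i columns, leaving an r_i x n_i matrix."""
    N_i = -(-n_i // r_i) * r_i                    # ceil(n_i / r_i) * r_i
    H_cyclic = np.hstack([np.eye(r_i, dtype=int)] * (N_i // r_i))
    return H_cyclic[:, N_i - n_i:]

H = component_parity_check(n_i=7, r_i=3)          # N_i = 9, two columns dropped
print(H.shape)                                    # (3, 7)
```

Note that every column of the shortened matrix carries exactly one 1 symbol, which is what keeps the concatenated matrices of the next subsection sparse.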

B. Serial concatenation

We consider a first family of codes obtained as the serial concatenation of M component codes of the type described in Section II-A. We call this family Multiple Serially Concatenated Multiple Parity-Check (M-SC-MPC) codes [16]. Each component code is in systematic form, so we obtain a systematic serial concatenation, in which redundancy is incrementally appended at the end of the information vector.

The serially concatenated code has dimension $k$ and length $n$. If we set $n_0 = k$, the $i$-th component code has dimension $k_i = n_{i-1}$, redundancy $r_i$ and length $n_i = k_i + r_i$, with $i = 1, \ldots, M$. The following relations hold:

$$\begin{aligned} n_1 &= n_0 + r_1 = k + r_1 \\ n_2 &= n_1 + r_2 = k + r_1 + r_2 \\ &\;\;\vdots \\ n_M &= n_{M-1} + r_M = k + \sum_{i=1}^{M} r_i \end{aligned} \quad (6)$$

and the overall code has length $n = n_M$ and redundancy $r = \sum_{i=1}^{M} r_i$. The parity-check matrix of each component code, in the form of Figure 1, can be used to obtain a valid parity-check matrix $H$ for the serially concatenated code. $H$ is in lower triangular form, and it is shown in Figure 2 for the case $M = 3$. Each column of $H$ has maximum density $M/r = M/\sum_{i=1}^{M} r_i$ (the density of its leftmost $n_1$ columns); the values $r_i$, $i = 1, \ldots, M$, must be chosen high enough to make $H$ sparse, thus obtaining an LDPC code. Furthermore, we will see in the following that, under suitable conditions, $H$ corresponds to a Tanner graph free of short cycles, which allows the adoption of efficient LDPC decoding algorithms.

It should be noted that the proposed scheme can be seen as a generalization of the multiple serially concatenated single parity-check (M-SC-SPC) approach [17]. The latter, however, assuming $r_1 = r_2 = \ldots = r_M = 1$, does not yield LDPC codes. Moreover, the performance of M-SC-MPC codes can be better than that of M-SC-SPC codes [16].

A first requirement, when designing serially concatenated codes with the proposed technique, consists in making their parity-check matrix suitable for the application of decoding



Figure 2. Parity-check matrix of the serially concatenated code.
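The lower-triangular structure of Figure 2 can be sketched as follows (our Python illustration, assuming numpy; function names are not from the paper): the $i$-th block of rows applies the $i$-th component check to the first $n_i$ codeword bits and is zero-padded up to length $n$.

```python
import numpy as np

def component_parity_check(n_i, r_i):
    """H of the component code with g(x) = 1 + x^{r_i}: identity blocks, shortened."""
    N_i = -(-n_i // r_i) * r_i
    return np.hstack([np.eye(r_i, dtype=int)] * (N_i // r_i))[:, N_i - n_i:]

def msc_mpc_parity_check(k, r):
    """Lower-triangular parity-check matrix of the M-SC-MPC code, as in Figure 2.

    The i-th block of rows checks the first n_i = k + r_1 + ... + r_i bits
    (information plus the redundancy appended so far)."""
    n = k + sum(r)
    blocks, n_i = [], k
    for r_i in r:
        n_i += r_i
        blocks.append(np.hstack([component_parity_check(n_i, r_i),
                                 np.zeros((r_i, n - n_i), dtype=int)]))
    return np.vstack(blocks)

H = msc_mpc_parity_check(k=20, r=[3, 4, 5])   # M = 3, n = 32, r = 12
print(H.shape)                                # (12, 32)
print(H[:, :23].sum(axis=0).max())            # 3: the leftmost n_1 columns have M ones each
```

Each component matrix has exactly one 1 per column, so the leftmost $n_1$ columns of $H$ have $M$ ones and the density decreases toward the right, matching the $M/r$ maximum column density stated in the text.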

algorithms based on the Belief Propagation (BP) principle. This can be achieved only if the associated Tanner graph is free of short cycles. For the considered codes, this condition can be easily ensured by following the rules established in Lemma 1 and Corollary 1, reported next.

Lemma 1: In order to obtain a Tanner graph representation free of length-4 cycles, the M-SC-MPC code must have length $n \le n_{\max}$, with

$$n_{\max} = \min_{\substack{i, j \in [1, M] \\ i < j}} \left\{ \operatorname{lcm}(r_i, r_j) + \sum_{l = i + 1}^{M} r_l \right\}. \quad (7)$$

Proof: Let us focus on the parity-check matrix for $M = 3$ (see Figure 2), and consider the first two blocks of rows (i.e., the first $r_1 + r_2$ rows). A length-4 cycle exists between two rows if they have 1 symbols in the same two columns. It can be easily verified that this cannot occur when $n_1 \le \operatorname{lcm}(r_1, r_2)$, that is, $n \le \operatorname{lcm}(r_1, r_2) + r_2 + r_3$. If we consider the first and third blocks of rows, length-4 cycles are avoided among their rows when $n_1 \le \operatorname{lcm}(r_1, r_3)$, that is, $n \le \operatorname{lcm}(r_1, r_3) + r_2 + r_3$. Finally, in the last two blocks of rows (i.e., the last $r_2 + r_3$ rows), length-4 cycles are absent for $n_2 \le \operatorname{lcm}(r_2, r_3)$, that is, $n \le \operatorname{lcm}(r_2, r_3) + r_3$. Hence, in order to avoid length-4 cycles in the matrix of Figure 2, it must be $n \le \min\left[\operatorname{lcm}(r_1, r_2) + r_2 + r_3, \operatorname{lcm}(r_1, r_3) + r_2 + r_3, \operatorname{lcm}(r_2, r_3) + r_3\right]$. This confirms the validity of (7) for the case $M = 3$. The same reasoning can be easily extended to the general case of $M$ component codes, thus proving the assertion.

Corollary 1: For a set of distinct, coprime and increasingly ordered $r_i$'s, $i = 1, \ldots, M$, the Tanner graph of the M-SC-MPC code is free of length-4 cycles for code length $n \le n'_{\max}$, with

$$n'_{\max} = r_1 r_2 + \sum_{j=2}^{M} r_j. \quad (8)$$

Proof: If we refer again to the case $M = 3$ (see Figure 2), by Lemma 1, length-4 cycles are avoided when $n \le \operatorname{lcm}(r_1, r_2) + r_2 + r_3 = r_1 r_2 + r_2 + r_3 = r_2(r_1 + 1) + r_3$, $n \le \operatorname{lcm}(r_1, r_3) + r_2 + r_3 = r_1 r_3 + r_2 + r_3$, and $n \le \operatorname{lcm}(r_2, r_3) + r_3 = r_2 r_3 + r_3$. But $r_1 r_2 < r_1 r_3$ and $r_1 + 1 < r_3$, so the first condition is the most stringent one. This result can be extended to a generic value of $M$, in the sense that the condition set by the first two blocks of rows is always the most stringent one. So, under the hypotheses of the corollary, Eq. (8) follows.

It is important to observe that the proposed design technique achieves very fine granularity in the code length. In fact, provided that $n \le n_{\max}$, each value of $n$ is feasible and ensures a Tanner graph representation free of length-4 cycles.

By comparing (7) and (8), we can observe that choosing $r_i$'s that are all distinct and coprime yields the highest values for the code length, i.e., the highest flexibility in the choice of $n$. For this reason, in the following we will consider distinct, coprime and increasingly ordered $r_i$'s, in such a way as to apply Eq. (8).
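These bounds are easy to check numerically. The sketch below (a Python illustration of ours, assuming numpy and Python 3.9+ for math.lcm) evaluates (7) and (8) for $r = (3, 4, 5)$, then verifies directly on a parity-check matrix built as in Figure 2 that, at $n = n'_{\max}$, no two rows share more than one column:

```python
import numpy as np
from math import lcm

def n_max(r):
    """Eq. (7): the largest n for which length-4 cycles are avoided."""
    M = len(r)
    return min(lcm(r[i], r[j]) + sum(r[i + 1:])
               for i in range(M) for j in range(i + 1, M))

def n_max_coprime(r):
    """Eq. (8): closed form for distinct, coprime, increasingly ordered r_i."""
    return r[0] * r[1] + sum(r[1:])

def component_H(n_i, r_i):
    N_i = -(-n_i // r_i) * r_i
    return np.hstack([np.eye(r_i, dtype=int)] * (N_i // r_i))[:, N_i - n_i:]

def build_H(k, r):
    n, blocks, n_i = k + sum(r), [], k
    for r_i in r:
        n_i += r_i
        blocks.append(np.hstack([component_H(n_i, r_i),
                                 np.zeros((r_i, n - n_i), dtype=int)]))
    return np.vstack(blocks)

r = [3, 4, 5]                          # distinct, pairwise coprime, increasing
print(n_max(r), n_max_coprime(r))      # 21 21: (7) and (8) agree

H = build_H(n_max(r) - sum(r), r)      # code of length n = n_max = 21
overlaps = H @ H.T                     # (i, j): columns where rows i and j both have a 1
np.fill_diagonal(overlaps, 0)
print(overlaps.max())                  # 1: no two rows share two columns, no length-4 cycles
```

A maximum overlap of 1 between distinct rows is exactly the absence of length-4 cycles in the Tanner graph, so the bound of Corollary 1 is tight in this example.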

C. Design of product codes

A particular form of serial concatenation can be realized by constructing a product code, which can be seen as an N-dimensional polytope in which each component code works along one dimension.

We focus on the simplest form of product codes, that is, bi-dimensional codes. In this case, the overall code results from two component codes working on the two dimensions of a rectangular matrix. An example of such a matrix is reported in Figure 3; we denote by $(n_a, k_a, r_a)$ and $(n_b, k_b, r_b)$ the length, dimension and redundancy of the two component codes. The information bits are written in the inner $k_b \times k_a$ matrix by following a fixed order (for example, row-wise from top left to bottom right). When the inner matrix is filled, the first component code acts on its rows, producing a set of $k_b r_a$ checks that fill the light grey rectangular region marked as “Checks a”. Then, the second component code acts on all $n_a$ columns, producing $k_a r_b$ checks on the information symbols and a further $r_a r_b$ checks on checks. So, the encoding process of a product code can be seen as the serially concatenated application of two component codes. A special feature of this particular case of serial concatenation is that inverting the order of application of the two component codes does not change the encoded word.

A product code increases the minimum distance in a multiplicative way: if the two component codes have minimum distances $d_a$ and $d_b$, respectively, the product code has minimum distance $d = d_a \cdot d_b$. Several types of component codes have been used in the design of product codes. SPC codes are often used because of their simplicity, but they can impose severe constraints on the overall code length and rate. Better results can be obtained with product codes based on Hamming codes, which can achieve very good



Figure 3. Encoding scheme of a bi-dimensional product code.

performance under soft-input/soft-output iterative decoding [18]. Furthermore, it has been demonstrated that product codes are potentially able to achieve error-free coding with a nonzero code rate (as the number of dimensions increases to infinity) [19], [20].
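The multiplicative minimum-distance property can be checked by brute force on a toy case (our construction, not an example from the paper): taking both component codes as (3, 2) single parity-check codes with $d_a = d_b = 2$, the resulting 9-bit product code has minimum distance 4.

```python
import itertools
import numpy as np

def spc_encode(bits):
    """(3, 2) single parity-check code: append the modulo-2 sum of two bits."""
    return np.append(bits, bits.sum() % 2)

# Exhaustive search over the 2^4 nonzero information patterns of the product code.
min_weight = 9
for info in itertools.product([0, 1], repeat=4):
    if not any(info):
        continue                                      # skip the all-zero codeword
    inner = np.array(info).reshape(2, 2)
    rows = np.array([spc_encode(r) for r in inner])       # 2 x 3: row encoding
    full = np.array([spc_encode(c) for c in rows.T]).T    # 3 x 3: column encoding
    min_weight = min(min_weight, int(full.sum()))

print(min_weight)  # 4 = d_a * d_b
```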

We are interested in designing bi-dimensional product codes that exploit, as component codes, two serially concatenated codes having the form described in Section II-B. In fact, when large block codes are needed, the form of serial concatenation in Section II-B can require rather large component codes, which can become impractical for hardware implementation. Bi-dimensional product codes, on the other hand, can be designed to obtain large blocks while still exploiting component codes of rather small size. Next, we will show that the parity-check matrix of the bi-dimensional product code can be easily obtained from the parity-check matrices of the two component codes, and that the product code is still an LDPC code [1].

Let us suppose that the two component codes have the following parity-check matrices, where hi,j represents the j-th column of matrix Hi:

$$H_a = \begin{bmatrix} h_{a,1} & h_{a,2} & \cdots & h_{a,n_a} \end{bmatrix}, \quad H_b = \begin{bmatrix} h_{b,1} & h_{b,2} & \cdots & h_{b,n_b} \end{bmatrix}. \qquad (9)$$

It follows that Ha has size ra × na, while Hb has size rb × nb. A valid parity-check matrix for the product code having such components can be expressed in the following form:

$$H_p = \begin{bmatrix} H_{p1} \\ H_{p2} \end{bmatrix}, \qquad (10)$$

where Hp1 has size ra·nb × na·nb, and Hp2 has size rb·na × na·nb. Hp1 can be obtained as a block-diagonal matrix formed by nb repetitions of Ha:

$$H_{p1} = \begin{bmatrix} H_a & 0 & \cdots & 0 \\ 0 & H_a & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & H_a \end{bmatrix}, \qquad (11)$$


Figure 4. Scheme of the concatenated systematic encoder.

where 0 represents an ra × na null matrix. Hp2 is given by (12): it consists of a row of blocks, in which the i-th block contains, along the main diagonal, na copies of hb,i (that is, the i-th column of Hb), while all the remaining symbols are null. Hp is redundant, since it includes two sets of parity-check constraints representing checks on checks through both the component codes. For this reason, Hp cannot have full rank. When both components are in systematic form, a full-rank parity-check matrix for the product code can be obtained by eliminating the last ra·rb rows from Hp1 or Hp2, so as to avoid a doubled representation of the checks on checks.

If we suppose that the densities of 1 symbols in Ha and Hb are δa and δb, respectively, it is easy to prove that the density of Hp1 is δa/nb, while that of Hp2 is δb/na. So, even starting from two component codes whose parity-check matrices are not very sparse, the resulting product code can still be an LDPC code. Alternative representations of the parity-check matrix can be found that achieve even lower density [21]. For our purposes, however, the density of the parity-check matrix in the form (10), with Hp1 and Hp2 as expressed by (11) and (12), is low enough.

Furthermore, it is easy to verify that matrix (10) is free of length-4 cycles, provided that the same holds for the component matrices Ha and Hb. So, the codes obtained as bi-dimensional product codes can be effectively decoded by means of LDPC decoding algorithms. We will give some examples in this sense in the next section.
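The structure of (10)–(12) is simple enough to build programmatically. The sketch below is ours (the function name and test values are assumptions, not from the paper): it assembles Hp1 and Hp2 for two (3, 2) SPC component codes and verifies that a product codeword, read out row-wise from the code array, satisfies every check.

```python
def build_Hp(Ha, Hb, na, nb):
    # Hp1 is block-diagonal with nb copies of Ha, as in (11).
    ra, rb = len(Ha), len(Hb)
    Hp1 = [[Ha[r][c - b * na] if b * na <= c < (b + 1) * na else 0
            for c in range(na * nb)]
           for b in range(nb) for r in range(ra)]
    # Hp2, as in (12): for a row-wise flattened codeword, array column j
    # occupies positions c with c % na == j, and the i-th block carries
    # the i-th column of Hb; here r indexes the rb rows of Hb.
    Hp2 = [[Hb[r][c // na] if c % na == j else 0
            for c in range(na * nb)]
           for j in range(na) for r in range(rb)]
    return Hp1, Hp2

# (3, 2) SPC components: Ha = Hb = [1 1 1].
Ha = Hb = [[1, 1, 1]]
Hp1, Hp2 = build_Hp(Ha, Hb, 3, 3)

# A valid product codeword (all rows and columns of the 3x3 array
# have even parity), read out row-wise.
c = [1, 0, 1,
     1, 1, 0,
     0, 1, 1]
syndrome = [sum(h * x for h, x in zip(row, c)) % 2 for row in Hp1 + Hp2]
# syndrome == [0, 0, 0, 0, 0, 0]
```

With δa = 1 for this all-ones Ha, the density of Hp1 comes out as 9/27 = 1/3 = δa/nb, matching the density claim above.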

III. CODE CHARACTERISTICS

A. Encoding and Decoding

The serially concatenated codes we consider, described in Section II-B, can be encoded by using a very simple concatenated encoder structure, like that shown in Figure 4. Each component code, described in Section II-A, is in systematic form; so, the i-th component encoder simply appends ri redundancy bits to the input vector. The overall codeword results from the concatenation of the input vector and the redundancy vectors added by the cascade of encoders.

Alternatively, the serially concatenated code can be encoded by using the standard "back substitution" technique. For the proposed concatenated code, the low-density parity-check matrix is in lower triangular form; so, the standard encoding algorithm has very low complexity, since it works on a sparse matrix. For generic LDPC


$$H_{p2} = \begin{bmatrix}
h_{b,1} & 0 & \cdots & 0 & h_{b,2} & 0 & \cdots & 0 & \cdots & h_{b,n_b} & 0 & \cdots & 0 \\
0 & h_{b,1} & \cdots & 0 & 0 & h_{b,2} & \cdots & 0 & \cdots & 0 & h_{b,n_b} & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \ddots & \vdots & \cdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & h_{b,1} & 0 & 0 & \cdots & h_{b,2} & \cdots & 0 & 0 & \cdots & h_{b,n_b}
\end{bmatrix}. \qquad (12)$$

Figure 5. Parallel implementation of the encoder for the i-th component code.

codes, instead, a matrix pre-processing step through Gaussian elimination may be needed in order to put the parity-check matrix in lower triangular form, and this usually does not preserve its sparse character, thus yielding increased complexity.

On the other hand, when adopting the encoder circuit shown in Figure 4, each component code can be encoded by means of a very simple circuit, based on a linear feedback shift register (LFSR) that implements the polynomial multiplication expressed by Eq. (5). Encoding of each component code can also be implemented by using a parallel encoder architecture together with serial-to-parallel and parallel-to-serial converters [22]. For the proposed component codes, the parallel encoder coincides with a bank of SPC encoders, as shown in Figure 5. The parallel encoder for the i-th component code can be represented as a binary matrix with ri rows and ⌈ki/ri⌉ + 1 columns. For encoding, its cells are filled in column-wise order, from top left to bottom right. The first ri − si cells, with si = ki mod ri, are unused, while the others are filled in the order reported in Figure 5, until the first ⌈ki/ri⌉ columns are completed (white cells in the figure). When the j-th row is filled, j = 1 . . . ri, the parity bit pj is calculated by XORing the elements of the row, and its value is stored in the last column of the same row. When all the parity bits have been calculated, the encoder outputs the codeword by reading the matrix content in the same order used for the input, but including the parity bits.
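Under our reading of the filling order in Figure 5 (bit t of the input lands in row (ri − si + t) mod ri, and parity bit pj is the XOR of row j), the whole cascade of Figure 4 reduces to a few lines. This is a sketch under that assumption, not the authors' reference implementation:

```python
def mpc_encode(u, r):
    # One systematic MPC component: append r parity bits, where parity j
    # is the XOR of the input bits that fall in row j of the Figure 5
    # matrix (the first r - s cells, s = len(u) mod r, are left unused).
    s = len(u) % r
    parity = [0] * r
    for t, bit in enumerate(u):
        parity[(r - s + t) % r] ^= bit
    return u + parity

def cascade_encode(u, r_list):
    # Figure 4 cascade: each systematic component encoder re-encodes
    # the full output (information + previous parities) of its predecessor.
    word = list(u)
    for r in r_list:
        word = mpc_encode(word, r)
    return word

# The SCC(840, 702, M = 4) small code: k = 702, r_i = [29, 31, 35, 43].
cw = cascade_encode([0] * 702, [29, 31, 35, 43])
# len(cw) == 702 + 29 + 31 + 35 + 43 == 840
```

With this interpretation, a weight-1 information vector produces a codeword of weight at most 2^M, consistent with Lemma 2 below.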

Due to the LDPC nature of the serially concatenated codes we consider, and to the absence of short cycles in their associated Tanner graph, their decoding can be accomplished through the standard Sum-Product Algorithm with Log-Likelihood Ratios (LLR-SPA) [23], or through its low-complexity versions, like the normalized min-sum (NMS) algorithm [24]. These are BP-based decoding algorithms and, hence, they are able to exploit the length-4 cycle-free Tanner graph representation of the code to achieve capacity-approaching error-correction performance.
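As an illustration of the low-complexity option, here is a minimal normalized min-sum sketch of our own (the scaling factor α = 0.8 and the toy parity-check matrix are assumptions for the example, not values from the paper):

```python
def nms_decode(H, llr, iterations=20, alpha=0.8):
    # Normalized min-sum on the Tanner graph of H, with a layered
    # (check-by-check) schedule; llr[n] > 0 favors bit n being 0.
    checks = [[n for n, h in enumerate(row) if h] for row in H]
    nbrs = [[m for m, row in enumerate(checks) if n in row]
            for n in range(len(llr))]
    C = {(m, n): 0.0 for m, row in enumerate(checks) for n in row}
    hard = [0 if l >= 0 else 1 for l in llr]
    for _ in range(iterations):
        for m, row in enumerate(checks):
            # Variable-to-check messages for this check...
            V = {n: llr[n] + sum(C[m2, n] for m2 in nbrs[n] if m2 != m)
                 for n in row}
            # ...then the normalized check-to-variable update.
            for n in row:
                others = [V[n2] for n2 in row if n2 != n]
                sign = -1 if sum(v < 0 for v in others) % 2 else 1
                C[m, n] = alpha * sign * min(abs(v) for v in others)
        posterior = [llr[n] + sum(C[m, n] for m in nbrs[n])
                     for n in range(len(llr))]
        hard = [0 if l >= 0 else 1 for l in posterior]
        if all(sum(hard[n] for n in row) % 2 == 0 for row in checks):
            break  # zero syndrome: a valid codeword has been found
    return hard

# Toy (6, 3) code; codeword [1, 1, 0, 0, 1, 1], with bit 0 received
# unreliably (small positive LLR pointing the wrong way).
H = [[1, 1, 0, 1, 0, 0],
     [0, 1, 1, 0, 1, 0],
     [1, 0, 1, 0, 0, 1]]
decoded = nms_decode(H, [0.5, -2.0, 2.0, 2.0, -2.0, -2.0])
# decoded == [1, 1, 0, 0, 1, 1]
```

The layered schedule shown here is one common variant; a flooding schedule, as in the standard LLR-SPA description, works equally well on this toy example.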

B. Minimum Distance

An upper bound on the minimum distance of the serially concatenated codes can be obtained by exploiting their concatenated nature and the structure of the component codes.

Lemma 2: Serially concatenated codes in the proposed family have minimum distance

$$d_{\min} \leq 2^M. \qquad (13)$$

Proof: Let us consider the concatenated encoder shown in Figure 4 with systematic component encoders as shown in Figure 5, and let us focus on the first code, which has redundancy r1. Its minimum weight codewords have Hamming weight 2, and correspond to input vectors having weight 1. Due to the encoder structure, the two 1 symbols in each minimum weight codeword are spaced by an integer multiple of r1, say a1·r1, with 1 ≤ a1 ≤ ⌊k1/r1⌋, where ⌊·⌋ returns the greatest integer smaller than or equal to its argument.

When such a codeword is given as input to the subsequent encoder, the two 1 symbols may or may not fall in the same row of the second encoder. In the first case, which occurs for a1r1 = a2r2, 1 ≤ a2 ≤ ⌊k2/r2⌋, the output codeword has weight 2; otherwise, it has weight 4. The latter case occurs for k2 = n1 < lcm(r1, r2), and produces a weight-4 codeword whose 1 symbols can be spaced by integer multiples of r1, r2 and linear combinations of them. When such a weight-4 codeword is given as input to the third component encoder, its four 1 symbols can fall in four different rows or not. In the first case, the Hamming weight is doubled again, thus reaching 8. The same procedure can be generalized by induction, obtaining that, for M component codes, the minimum Hamming weight cannot be greater than 2^M, which proves the assertion.

The proof of Lemma 2 gives an implicit rule for approaching the upper bound on the minimum distance: the number of coincidences among linear combinations of the ri values, i = 1 . . . M, must be reduced as much as possible in the range [1, n]. The choice of coprime ri's is also favorable from this viewpoint.
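The effect of this rule can be checked numerically. In the sketch below (our own counting script, not from the paper), the pairwise coprime ri values of the small-code example in Section IV produce no coincidences among multiples in [1, 840], while a non-coprime set of similar magnitude produces many:

```python
def coincidences(r_list, n):
    # Count positions in [1, n] that are a multiple of more than one r_i.
    return sum(1 for x in range(1, n + 1)
               if sum(x % r == 0 for r in r_list) > 1)

few = coincidences([29, 31, 35, 43], 840)   # pairwise coprime choice
many = coincidences([28, 32, 36, 42], 840)  # shared factors: many clashes
# few == 0, since the smallest pairwise lcm, 29*31 = 899, already exceeds 840
```

Fewer coincidences mean fewer opportunities for the 1 symbols of a low-weight codeword to merge in a later encoding stage, keeping codeword weights close to the 2^M bound.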


When serially concatenated codes are used as components in bi-dimensional product codes, the minimum distance of the overall code can be easily upper bounded starting from Lemma 2 and considering that the product code structure increases the minimum distance in a multiplicative way. So, the following corollary immediately follows.

Corollary 2: A bi-dimensional product code having, as components, two serially concatenated codes of the considered family has minimum distance

$$d_{\min} \leq 2^{M_a + M_b}, \qquad (14)$$

where Ma and Mb represent the number of serially concatenated codes in the two components of the product code.

C. Rate Compatibility

Rate compatibility consists in designing a set of "compatible" codes with different rates, in such a way as to allow the implementation of schemes with variable error correction capability.

For the serially concatenated scheme we consider, rate compatibility is ensured by systematic encoding, since redundancy is incrementally appended to the information vector. By considering each component code progressively, a set of rate compatible codes is simply obtained, with code rates

$$\frac{k}{k + r_1} > \frac{k}{k + r_1 + r_2} > \cdots > \frac{k}{k + \sum_{j=1}^{M} r_j}. \qquad (15)$$

An example of rate compatible serially concatenated codes is reported in the following section. Rate compatibility can also be ensured when the serially concatenated codes we consider are used as components in bi-dimensional product codes. In this case, a simple way to obtain a family of rate compatible codes is to fix one of the two components of the product code, while the other employs a variable number of its sub-components in order to change the overall code rate.
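The rate chain in (15) can be checked against the small-code example of Section IV (our own quick script; the ri values are taken from that example):

```python
k = 702
r = [29, 31, 35, 43, 59, 89]

# Code rate after including the first m components, as in (15).
codes = [(m, k + sum(r[:m]), k / (k + sum(r[:m]))) for m in (4, 5, 6)]
for m, n, rate in codes:
    print(f"M={m}: n={n}, R={rate:.2f}")
# M=4: n=840, R=0.84 / M=5: n=899, R=0.78 / M=6: n=988, R=0.71
```

The three (n, R) pairs match the SCC(840, 702), SCC(899, 702) and SCC(988, 702) codes discussed below.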

IV. EXAMPLES OF SERIALLY CONCATENATED CODES

In this section, we give some design examples of serially concatenated codes obtained through the considered technique. We show that the technique is able to produce small, medium and large codes with very fine granularity in code length and rate, and very good error correction performance. Together with their low complexity encoding, these features justify possible interest in such codes for practical applications.

All simulations have been performed with Binary Phase Shift Keying (BPSK) modulation over the Additive White Gaussian Noise (AWGN) channel. Coded transmission has been simulated through suitable software, written in C++, that performs a Monte Carlo evaluation of the bit error rate (BER) and frame error rate (FER) for each value of

Figure 6. Simulated BER for small codes (uncoded; SCC(840, 702, M=4); SCC(899, 702, M=5); SCC(988, 702, M=6)).

Figure 7. Simulated FER for small codes (SCC(840, 702, M=4); SCC(899, 702, M=5); SCC(988, 702, M=6)).

energy per bit to noise power spectral density ratio (Eb/N0). In order to provide a sufficient level of confidence for the Monte Carlo simulations, each BER and FER point has been estimated after the occurrence of 100 erroneous frames.

A. Small Size Codes

We consider a first set of rate compatible serially concatenated codes obtained from the following choice of ri values: [29, 31, 35, 43, 59, 89]. All codes in the family have dimension k = 702, but different length and rate, depending on the number of component codes. The first code adopts M = 4 components, corresponding to the first four values of ri; thus it has redundancy r = 138, length n = 840 and rate R = 0.84. The second code is obtained by including the fifth component code; so it has redundancy r = 197, length


n = 899 and rate R = 0.78. The last code is obtained by considering all M = 6 values of ri; thus, it has redundancy r = 286, length n = 988 and rate R = 0.71. Their simulated performance in terms of BER and FER, as a function of the signal-to-noise ratio Eb/N0, is reported in Figures 6 and 7, respectively.

Rate compatibility of these codes can be exploited in a Type-II HARQ scheme, by initially transmitting each packet encoded through the first code and then, if requested, sending r5 and, finally, r6 bits of further redundancy.

B. Medium Size Codes

In order to provide a design example of codes with moderate length, we have considered code parameters very similar to those proposed for application in near-Earth space missions by the Consultative Committee for Space Data Systems (CCSDS) [25]. The designed code has length n = 8208, dimension k = 7182 and, hence, rate R = 0.875. Its redundancy (r = 1026) corresponds to the following choice of ri values for the M = 5 component codes: [177, 181, 214, 221, 233]. Due to the very fine length granularity achievable through the proposed design approach, the code could be arbitrarily shortened in order to make its dimension coincide with an integer multiple of 32, as suggested in [25].

The error correction performance of the proposed code, reported in Figure 8, looks very good: its curves are almost overlaid with those of an optimized code with very similar parameters (length n = 8176 and dimension k = 7156) recommended by the CCSDS. The performance of the latter code in terms of BER and FER, shown in Figure 8 (and derived from [25]), refers to an FPGA implementation adopting a maximum of 50 decoding iterations.

It should be observed that the proposed code, besides achieving almost the same performance as the CCSDS code, allows the implementation of very simple encoder circuits, due to its concatenated nature. Therefore, it provides a valid alternative to the CCSDS code that, in turn, is characterized by low encoding complexity thanks to its quasi-cyclic nature.

C. Large Size Codes

As a further example, we have considered the design of large codes with high rate. We have adopted the following choice of ri values for the M = 5 component codes: [313, 569, 577, 641, 643], which yields redundancy r = 2743. With this choice of the parameters, we have designed a first code with length n = 27430, dimension k = 24687 and rate R = 0.9. Subsequently, this code has been shortened to length n = 10972, thus producing a code with dimension k = 8229 and rate R = 0.75.

In order to assess the effect of a different value of M, we have also considered an alternative code design that gives the same value of redundancy (r = 2743), but exploiting

Figure 8. Performance of medium codes (BER and FER for SCC(8208, 7182, M=5) and CCSDS(8176, 7156), plus uncoded BER).

Figure 9. Performance of large codes with rate 0.9 (BER and FER for SCC(27430, 24687, M=5) and SCC(27430, 24687, M=6), plus uncoded BER).

M = 6 serially concatenated codes. In this case, the chosen ri values are: [313, 349, 401, 479, 563, 638].

Figure 9 shows the simulated performance of the two codes with n = 27430 and rate 0.9. As we notice from the figure, the adoption of a higher number of component codes, in this case, does not give any advantage. In particular, neither code shows an error floor in the explored region. The serially concatenated code with M = 5 components, however, has better performance in the waterfall region, and its BER curve intersects that of an uncoded transmission at a lower signal-to-noise ratio. So, it can be concluded that there is no need to increase the number of components beyond 5 for large codes with such a high rate.

However, when the code is shortened so as to achieve a lower rate, this conclusion is no longer valid.


Figure 10. Performance of large codes with rate 0.75 (BER and FER for SCC(10972, 8229, M=5) and SCC(10972, 8229, M=6), plus uncoded BER).

The simulated performance of the two codes with length n = 10972 and rate 0.75 is reported in Figure 10. As we notice from the figure, an error floor appears in the curves of the code with M = 5 components, though at BER < 10−7; on the other hand, the waterfall behavior of this code is very good. The error floor can be avoided, at the cost of a worse waterfall performance, by increasing the number of serially concatenated components up to M = 6. In this case, the error rate curves do not exhibit any change in their slope; so, they can intersect those of the code with M = 5 components. The FER curve for M = 6 intersects that for M = 5 at FER ≈ 10−4; an intersection is also expected for the BER curves.

V. EXAMPLES OF PRODUCT CODES

In this section, we provide some examples of bi-dimensional product codes that exploit, as components, two serially concatenated codes of the type described in Section II-B. For conciseness, we denote them as "product M-SC-MPC codes". The first example we consider is a product code with equal components. More precisely, each component code is a serially concatenated code having length na = nb = 64, dimension ka = kb = 49 and r1 = 7, r2 = 8. Hence, the product code has length n = 4096, dimension k = 2401 and rate 0.586. Figure 11 shows its simulated performance by using the log-likelihood version of the SPA decoding algorithm. For the sake of comparison, the figure also shows the performance of a 4D-TPC based on (8, 7) SPC components [26], which has exactly the same parameters.

We observe that the bi-dimensional product code outperforms the 4D-SPC-TPC when the latter is decoded through algorithm 1 in [26]. Instead, when adopting decoding algorithm 2 in the same reference, the performance of the two

Figure 11. Comparison between the (4096, 2401) SPC-TPC (algorithms 1 and 2 of [26]) and a product code with the same parameters.

codes is almost the same. However, an important difference between the two codes is that our product code is bi-dimensional, while the SPC-TPC is quadri-dimensional and, therefore, has larger decoding latency and complexity.

Two further examples are given in Figures 12 (for the BER) and 13 (for the FER), where two larger product M-SC-MPC codes are considered. The first code has (n, k) = (10000, 5670) and has been obtained as the product of two different serially concatenated codes. Their parameters are as follows: na = 100, ka = 81 and raj = [9, 10]; nb = 100, kb = 70 and rbj = [7, 11, 12].

The second code has (n, k) = (12544, 6400); so, its rate is slightly reduced with respect to the former one. It adopts the same M-SC-MPC code for both components. The latter has length na = nb = 112, dimension ka = kb = 80 and raj = rbj = [8, 11, 13].

From the simulation results we see that, as expected, by using larger component codes, product M-SC-MPC codes can achieve better performance. The higher number of serially concatenated codes in each component makes it possible to increase the minimum distance and improve performance in the error floor region. This is evident for the (12544, 6400) code, whose components are both based on an order-3 serial concatenation. Together with its slightly lower rate, this makes the performance of this code better than that of the (10000, 5670) code. It should also be noted that, differently from SPC-TPCs, in our codes the increase in minimum distance is achieved while preserving the bi-dimensional nature of the product code.

In order to better assess the effect on performance of the choice of Ma and Mb, that is, the number of component codes in each of the two serially concatenated codes forming the product code, we have designed three further


Figure 12. BER performance for the (10000, 5670) and (12544, 6400) product codes.

Figure 13. FER performance for the (10000, 5670) and (12544, 6400) product codes.

product codes with rate around 1/2 and comparable size. The first code has length 3195 and dimension 1504; its two components have na = 71, ka = 47, raj = [7, 8, 9] and nb = 45, kb = 32, rbj = [6, 7], respectively. So, in this case, we have fixed Ma = 3 and Mb = 2. In the design of a second product code, we have adopted a different distribution of the serially concatenated components, that is, Ma = 4 and Mb = 1. This product code has length n = 3003 and dimension k = 1536. The first of its two component codes has na = 91, ka = 48 and raj = [9, 10, 11, 13]. The second component, instead, is a single parity-check code with nb = 33 and kb = 32. As a further example, we have designed a third product code with Ma = Mb = 3. It uses the same serially concatenated code for both components, which has

Figure 14. BER performance for the (3195, 1504), (3003, 1536) and (5329, 2401) product codes.

Figure 15. FER performance for the (3195, 1504), (3003, 1536) and (5329, 2401) product codes.

length na = nb = 73, dimension ka = kb = 49 and raj = rbj = [7, 8, 9]. Thus, the product code has length n = 5329 and dimension k = 2401.

The simulated performance is reported in Figure 14, for the BER, and in Figure 15, for the FER. Numerical simulations show that the code with Ma = 3 and Mb = 2 outperforms the code having Ma = 4 and Mb = 1, especially in the waterfall region. This suggests that a balanced choice of Ma and Mb can yield a performance improvement with respect to an unbalanced choice.

If both Ma and Mb are increased up to 3, performance can be further improved, at the cost of a larger code block, due to the constraints of the product structure. However, the


Figure 16. BER performance for codes with DVB-RCS compliant parameters (uncoded; TC(3008, 1504); SCC(3008, 1504, M=5); PC(3195, 1504, Ma=3, Mb=2)).

(5329, 2401) code is based on two very small component codes, so its complexity could still be reasonably low for many practical applications.

In comparison with single serially concatenated codes, it is reasonable to expect that the adoption of smaller component codes in a product code is paid for in terms of worse error rate performance. In order to verify this conclusion, we have considered a set of codes having parameters compliant with the DVB-RCS standard [27]. We have focused on the option of MPEG-2 frames, which are 188 bytes long, and code rate R = 1/2. For this choice of the parameters, the standard recommends the use of a (3008, 1504) double binary turbo code with an optimized interleaver.

We have designed a serially concatenated code with exactly the same parameters. It has length 3008 and adopts the following choice of the ri's for its components: [277, 281, 293, 313, 340]. For the sake of comparison, we have also considered one of the product codes introduced in the previous section. It has dimension 1504, like in the standard, but length 3195, because of the constraints due to the product code design. Its components have na = 71, ka = 47 and raj = [7, 8, 9]; nb = 45, kb = 32 and rbj = [6, 7].

The simulated performance of the serially concatenated code and the product code is reported in Figure 16, for the BER, and in Figure 17, for the FER. The simulated performance of the standard turbo code is also reported as a benchmark.

From the figures we see that the product code requires a higher SNR per bit than the single serially concatenated code with M = 5; the loss at BER ≈ 10−6 and FER ≈ 10−4 is in the order of 1.5 dB. This is the price to pay for reducing complexity by adopting a product code structure. In this example, where we have focused on low rate codes, the

Figure 17. FER performance for codes with DVB-RCS compliant parameters (TC(3008, 1504); SCC(3008, 1504, M=5); PC(3195, 1504, Ma=3, Mb=2)).

standard turbo code achieves the best performance, also in comparison with the M-SC-MPC code. On the contrary, at higher code rates, M-SC-MPC codes and, more generally, structured LDPC codes can outperform turbo codes [28].

VI. CONCLUSION

We have shown that the concatenation of very simple component codes can be effectively exploited in the design of structured LDPC codes. This yields a family of LDPC codes characterized by fine length and rate granularity, low complexity and rate compatibility. They ensure good error correction performance, and compete with codes optimized for specific applications. When large block codes are needed, a solution for continuing to exploit small components is to use serially concatenated codes in bi-dimensional product code structures. We have considered bi-dimensional product codes obtained as the direct product of two serially concatenated codes. We have shown that such product codes are still LDPC codes, and that their associated Tanner graph is suitable for LDPC decoding. Our simulations show that LDPC product codes in the considered class are able to achieve rather good performance in spite of their very low complexity. However, the advantage of using very small components in product code structures is paid for in terms of coding gain: simulation results show a loss in the order of 1.5 dB for LDPC product codes, compared with single LDPC codes with the same parameters.

As further work, the introduction of an interleaver within the product code structure could be considered, in order to optimize the decoder performance in the waterfall region while preserving the multiplicative effect on the minimum distance due to the product code structure.


REFERENCES

[1] M. Baldi, G. Cancellieri, and F. Chiaraluce, "A class of low-density parity-check product codes," in Proc. SPACOMM 2009, Colmar, France, Jul. 2009, pp. 107–112.

[2] G. D. Forney, Jr., "Concatenated codes," Ph.D. dissertation, MIT Press, Cambridge, MA, 1966.

[3] C. Berrou, A. Glavieux, and P. Thitimajshima, "Near Shannon limit error-correcting coding and decoding: Turbo-codes," in Proc. IEEE International Conference on Communications (ICC '93), vol. 2, Geneva, Switzerland, May 1993, pp. 1064–1070.

[4] L. Bahl, J. Cocke, F. Jelinek, and J. Raviv, "Optimal decoding of linear codes for minimizing symbol error rate," IEEE Trans. Inform. Theory, vol. 20, no. 2, pp. 284–287, Mar. 1974.

[5] R. G. Gallager, "Low-density parity-check codes," IRE Trans. Inform. Theory, vol. IT-8, pp. 21–28, Jan. 1962.

[6] C. Sae-Young, G. Forney, T. Richardson, and R. Urbanke, "On the design of low-density parity-check codes within 0.0045 dB of the Shannon limit," IEEE Commun. Lett., vol. 5, no. 2, pp. 58–60, Feb. 2001.

[7] D. J. C. MacKay and R. M. Neal, "Good codes based on very sparse matrices," in Cryptography and Coding. 5th IMA Conference, ser. Lecture Notes in Computer Science, C. Boyd, Ed. Berlin: Springer, 1995, no. 1025, pp. 100–111.

[8] M. P. C. Fossorier, "Quasi-cyclic low-density parity-check codes from circulant permutation matrices," IEEE Trans. Inform. Theory, vol. 50, no. 8, pp. 1788–1793, Aug. 2004.

[9] X. Y. Hu, E. Eleftheriou, and D. M. Arnold, "Regular and irregular progressive edge-growth Tanner graphs," IEEE Trans. Inform. Theory, vol. 51, no. 1, pp. 386–398, Jan. 2005.

[10] M. Baldi, G. Cancellieri, and F. Chiaraluce, "New LDPC codes based on serial concatenation," in Proc. AICT '09, Venice/Mestre, Italy, May 2009, pp. 31–315.

[11] ——. (2009) Good LDPC codes based on very simple component codes. [Online]. Available: http://www.gtti.it/GTTI09/Programma.php

[12] J. Hagenauer, "Rate-compatible punctured convolutional codes (RCPC codes) and their applications," IEEE Trans. Commun., vol. 36, no. 4, pp. 389–400, Apr. 1988.

[13] F. Babich, G. Montorsi, and F. Vatta, "Rate-compatible punctured serial concatenated convolutional codes," in Proc. IEEE Globecom '03, vol. 4, San Francisco, CA, Dec. 2003, pp. 2062–2066.

[14] ——, "Performance enhancement of partially systematic rate-compatible SCCCs through puncturing design," in Proc. IEEE Globecom '04, vol. 1, Dallas, TX, Nov. 2004, pp. 167–171.

[15] S. B. Wicker, Error Control Systems for Digital Communication and Storage. Prentice Hall, 1994.

[16] M. Baldi, G. Cancellieri, A. Carassai, and F. Chiaraluce, "LDPC codes based on serially concatenated multiple parity-check codes," IEEE Commun. Lett., vol. 13, no. 2, pp. 142–144, Feb. 2009.

[17] J. S. K. Tee, D. P. Taylor, and P. A. Martin, "Multiple serial and parallel concatenated single parity-check codes," IEEE Trans. Commun., vol. 51, no. 10, pp. 1666–1675, Oct. 2003.

[18] F. Chiaraluce and R. Garello, "Extended Hamming product codes analytical performance evaluation for low error rate applications," IEEE Trans. Wireless Commun., vol. 3, no. 6, pp. 2353–2361, Nov. 2004.

[19] P. Elias, "Error free coding," IRE Trans. Inf. Theory, vol. 4, no. 4, pp. 29–37, Sep. 1954.

[20] D. M. Rankin, T. A. Gulliver, and D. P. Taylor, "Asymptotic performance of single parity-check product codes," IEEE Trans. Inform. Theory, vol. 49, no. 9, pp. 2230–2235, Sep. 2003.

[21] M. Esmaeili, "Construction of binary minimal product parity-check matrices," Applicable Algebra in Engineering, Communication and Computing, vol. 19, no. 4, pp. 339–348, Aug. 2008.

[22] O. Gazi and A. O. Yilmaz, "On parallelized serially concatenated codes," in Proc. WCNC 2007, Hong Kong, Mar. 2007, pp. 714–718.

[23] J. Hagenauer, E. Offer, and L. Papke, "Iterative decoding of binary block and convolutional codes," IEEE Trans. Inform. Theory, vol. 42, no. 2, pp. 429–445, Mar. 1996.

[24] J. Chen and M. Fossorier, "Near optimum universal belief propagation based decoding of low-density parity check codes," IEEE Trans. Commun., vol. 50, pp. 406–414, Mar. 2002.

[25] CCSDS, "Low Density Parity Check Codes for use in Near-Earth and Deep Space Applications," Consultative Committee for Space Data Systems (CCSDS), Washington, DC, USA, Tech. Rep. Orange Book, Issue 2, Sep. 2007, CCSDS 131.1-O-2.

[26] D. M. Rankin and T. A. Gulliver, “Single parity check productcodes,” IEEE Trans. Commun., vol. 49, no. 8, pp. 1354–1362,Aug. 2001.

[27] Digital Video Broadcasting (DVB); Interaction channel forsatellite distribution systems, ETSI EN Std. 301 790 (v1.4.1),Sep. 2005.

[28] M. Baldi, F. Chiaraluce, and G. Cancellieri, “Finite-precisionanalysis of demappers and decoders for LDPC-coded M-QAM systems,” IEEE Trans. Broadcast., vol. 55, no. 2, pp.239–250, Jun. 2009.

38

International Journal on Advances in Telecommunications, vol 3 no 1 & 2, year 2010, http://www.iariajournals.org/telecommunications/

2010, © Copyright by authors, Published under agreement with IARIA - www.iaria.org


An Iterative Algorithm for Compression of Correlated Sources at Rates Approaching the Slepian-Wolf Bound: Theory and Analysis

F. Daneshgaran, M. Laddomada, and M. Mondin

Abstract—This paper proposes a novel iterative algorithm based on Low Density Parity Check codes for compression of correlated sources at rates approaching the Slepian-Wolf bound. The setup considered in the paper looks at the problem of compressing one source without employing the source correlation, and employing the other correlated source as side information at the decoder which decompresses the first source. We demonstrate that, depending on the extent of the source correlation estimated through an iterative paradigm, significant compression can be obtained relative to the case in which the decoder does not use the implicit knowledge of the existence of correlation. Two stages of iterative decoding are employed. During global iterations, updated estimates of the source correlation are obtained and passed on to the belief-propagation decoder, which performs local iterations with a pre-defined stopping criterion and/or a maximum number of local decoding iterations. A detailed description of the iterative decoding algorithm with embedded cross-correlation estimation is provided in the paper, in addition to simulation results confirming the potential gains of the approach.

Keywords-Correlated sources; compression; iterative decoding; low density parity check codes; Slepian-Wolf.

I. INTRODUCTION

Consider two independent identically distributed (i.i.d.) discrete binary memoryless sequences of length k, X = [x1, x2, . . . , xk] and Y = [y1, y2, . . . , yk], where pairs of components (xi, yi) have joint probability mass function p(x, y). Assume that the two sequences are generated by two transmitters which do not communicate with each other, and that both sequences have to be jointly decoded at a common receiver. Slepian and Wolf demonstrated that the achievable rate region for this problem (i.e., for perfect recovery of both sequences at a joint decoder) is the one identified by the following set of equations imposing constraints on the rates RX and RY at which the two correlated sequences are transmitted:

RX ≥ H(X|Y),
RY ≥ H(Y|X),
RX + RY ≥ H(X,Y)    (1)

whereby H(X|Y) is the conditional entropy of source X given source Y, H(Y|X) is the conditional entropy of source Y given source X, and H(X,Y) is the joint entropy of the sources. A pictorial representation of this achievable region is given in Fig. 1.

Fred Daneshgaran is with the ECE Dept., Calif. State Univ., Los Angeles, USA. E-mail: [email protected]

Massimiliano Laddomada is with the Electrical Engineering Department of Texas A&M University-Texarkana, USA. E-mail: [email protected].

Marina Mondin is with the Dipartimento di Elettronica, Politecnico di Torino, Italy. E-mail: [email protected]

[Figure] Fig. 1. Rate region for Slepian-Wolf encoding: axes RX and RY, corner points A and B on the boundary RX + RY = H(X,Y); axis labels H(X|Y) and H(X) on the RX axis, H(Y|X) and H(Y) on the RY axis.

In this paper, which is an extended and thorough version of [1], we focus on trying to achieve the corner points A and B in Fig. 1, since any other point between these can be achieved with a time-sharing approach [2]. In particular, we focus on the architecture shown in Fig. 2, in which we assume that one of the two sequences, namely X in our framework, is independently encoded with a source encoder that does not employ the correlation between the sources X and Y. We assume that sequence Y is compressed up to its source entropy H(Y) and is known at the joint decoder as side information, and our aim is to compress sequence X with a rate R1 ≤ RX as close as possible to its conditional entropy (i.e., R1 ≥ H(X|Y)) in order to achieve the corner point A in Fig. 1. The decoder tries to decompress the sequence X, in order to obtain an estimate X̂, by employing Y as side information and without prior knowledge of the amount of correlation between the sequences. Indeed, it estimates this correlation through an iterative algorithm which improves the decoding reliability of X.

Obviously, our solution to joint source coding at point A is directly applicable to point B by symmetry. The overall rate of transmission of both sequences is greater than H(Y) + H(X|Y) = H(X,Y).

With this background, let us provide a quick survey of the recent literature related to the problem addressed in this paper. This survey is by no means exhaustive and is meant to simply provide a sampling of the literature in this area.

In [3], [4], [5], the authors deal with applications of sensor networks for tracking and processing of information to be sent to a common destination. In particular, in [4], the authors, inspired by information theory concepts and in particular the Slepian-Wolf theorem [2], present the problem of distributed source coding using syndromes. The approach is based on the use of coset codes, and significant compression gains are achieved using the proposed technique.

[Figure] Fig. 2. Architecture of the encoder and joint decoder for the Slepian-Wolf problem: the source encoder maps X to the parity sequence Z at rate R1 over a perfect channel; the joint decoder uses Y as side information to produce the estimate X̂.

In [6], the authors show that turbo codes make it possible to come close to the Slepian-Wolf bound in lossless distributed source coding. In [7], [8], [9], the authors propose a practical coding scheme for separate encoding of the correlated sources for the Slepian-Wolf problem. In [10], the authors propose the use of punctured turbo codes for compression of correlated binary sources, whereby compression is achieved via puncturing. The proposed source decoder utilizes an iterative scheme to estimate the correlation between two different sources. In [11], punctured turbo codes have been applied to the compression of non-binary sources.

The article [12] focuses on the problem of reducing the transmission rate in a distributed environment of correlated sources, and the authors propose a source coding scheme for correlated images exploiting modulo encoding of pixel values and compression of the resulting symbols with binary and non-binary turbo codes.

Paper [13] deals with the use of irregular repeat-accumulate codes as source-channel codes for the transmission of a memoryless binary sequence with side information at the decoder, while in [14] parallel and serial concatenated convolutional codes are considered for the same problem. In [15], [16], the authors propose a practical coding scheme based on Low Density Parity Check (LDPC) codes for separate encoding of the correlated sources for the Slepian-Wolf problem.

The dual problem of Slepian-Wolf correlated source coding over noisy channels has been dealt with in [17]-[21].

In [22], the authors consider the problem of encoding distributed correlated sources, demonstrating that separate source-channel coding is optimal in any network in which correlated sources do not interfere with each other, and that the information on a network behaves as a flow. Work [23] proposes a distributed joint source-channel coding scheme for multiple correlated sources.

Finally, in [24] the authors consider the design of joint distributed source and network coding.

Almost all works proposed in the literature use soft metrics for improving the decoding performance. An excellent work describing the "mathematics of soft decoding" is the paper by Hagenauer et al. [25].

Relative to the cited articles, the main novelty of the present work may be summarized as follows: 1) in [10] and [16], the encoder and decoder must both know the correlation between the two sources. We assume knowledge of the mean correlation at the encoder. The decoder has implicit knowledge of this via observation of the length of the encoded message. It iteratively estimates the actual correlation observed and uses it during decoding; 2) our algorithm can be used with any pair of systematic encoder/decoder without modifying the encoding and decoding algorithms; 3) the proposed algorithm is very efficient in terms of the required number of LDPC decoding iterations. We use quantized integer LLR values (LLRQ), and the loss of our algorithm for using integer LLRQ metrics is quite negligible in light of the fact that it is able to guarantee performance better than that reported in [10] and [16] (where, to the best of our knowledge, the authors use floating-point metrics), as exemplified by the results shown in Table II below; 4) we utilize post-detection correlation estimates to generate extrinsic information, which can be applied to any already employed decoder without any modification; and 5) we do not use any interleaver between the sources at the transmitter. Using the approach of [10] in a network, information about the interleavers used by different nodes must be communicated and managed. This is not trivial in a distributed network such as the Internet. Furthermore, there is a penalty in terms of delay that is incurred.

This paper is an extended version of [1] and is organized as follows. Section II deals with the definition of the encoding algorithm for correlated sources, and presents the class of LDPC codes which is the focus of our current work. In Section III, we present a modification of the belief-propagation algorithm for decoding of LDPC codes, and follow up with details of our algorithm for iterative joint source decoding of correlated sources with side information and iterative correlation estimation. Section IV deals with the issue of sensitivity of the cross-correlation function between two sequences to residual errors after channel decoding, demonstrating the relative robustness of the empirical cross-correlation measure to channel-induced errors. In Section V, we present simulation results and comparisons confirming the potential gains that can be obtained from the proposed iterative algorithm. Finally, we present the conclusion in Section VI.

II. ARCHITECTURE OF THE LDPC-BASED SOURCE ENCODER

This section focuses on the source encoder used for source compression. LDPC coding is essential to achieve performance close to the theoretical limit in [2].

The LDPC matrix [26] for encoding each source is considered as a systematic (n, k) code. The codes used need to be systematic for the decoder to exploit the estimated correlation between X and Y directly.

Each codeword C is composed of a systematic part X and a parity part Z, which together form C = [X, Z]. With this setup, and given the parity check matrix Hn−k,n of the LDPC code, it is possible to decompose Hn−k,n as follows:

Hn−k,n = (HX, HZ)    (2)

whereby HX is an (n−k) × k matrix specifying the source bits participating in the check equations, and HZ is an (n−k) × (n−k) matrix of the form:

HZ = | 1 0 0 . . . 0 0 |
     | 1 1 0 . . . 0 0 |
     | 0 1 1 0 . . . 0 |
     | . . . . . . . . |
     | 0 0 . . . 0 1 1 |    (3)

The choice of this structure for H, also called staircase LDPC (for the double diagonal of ones in HZ), has been motivated by the fact that, aside from being systematic, we obtain an LDPC code which is encodable in linear time in the codeword length n. In particular, with this structure, the encoding operation is as follows:

z_i = [ Σ_{j=1}^{k} x_j · H^X_{i,j} ] (mod 2),  i = 1
z_i = [ z_{i−1} + Σ_{j=1}^{k} x_j · H^X_{i,j} ] (mod 2),  i = 2, . . . , n−k    (4)

where H^X_{i,j} represents the element (i, j) of the matrix HX, and x_j is the j-th bit of the source sequence X.

Source compression takes place as follows: considering the scheme shown in Fig. 2, we encode the length-k sequence belonging to the source X and transmit on a perfect channel only the parity sequence Z, whose bits are evaluated as in (4). The rate guaranteed by such an encoder is

R1 = (n − k) / k.

In relation to the setup shown in Fig. 2, the Slepian-Wolf problem reduces to that of encoding the source X with a rate R1 as close to H(X|Y) as possible (i.e., R1 ≥ H(X|Y)). Note that source correlation is not employed at the encoder.
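Thanks to the double diagonal of HZ, the recursion in (4) is just a running mod-2 sum of the per-row check sums over the source bits, which is what makes encoding linear-time. A minimal sketch (the small random HX below is purely illustrative, not a code from the paper):

```python
import numpy as np

def staircase_encode(x, HX):
    """Eq. (4): z_1 = sum_j x_j H^X_{1,j} (mod 2), and
    z_i = z_{i-1} + sum_j x_j H^X_{i,j} (mod 2) for i = 2, ..., n-k."""
    s = HX.dot(x) % 2          # per-row mod-2 sums over the source bits
    return np.cumsum(s) % 2    # running mod-2 sum realizes the staircase

# Illustrative (3 x 5) HX, i.e., n = 8, k = 5.
rng = np.random.default_rng(0)
HX = rng.integers(0, 2, size=(3, 5))
x = rng.integers(0, 2, size=5)
z = staircase_encode(x, HX)    # parity sequence sent over the perfect channel
```

By construction, the resulting codeword [x, z] satisfies all parity checks of H = (HX, HZ).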

The objective of the joint decoder, as explained in the next section, is to recover sequence X by employing the correlated source Y (considered as side information at the decoder) and the estimates of the correlation between the sources X and Y, obtained in an iterative fashion.

Before explaining the iterative decoding algorithm, let us briefly discuss the correlation model adopted in this framework. We consider the following model in order to follow the same framework pursued in the literature [10], [15]:

P(xj ≠ yj) = p, ∀j = 1, . . . , k    (5)

In light of the considered correlation model, and noting that the sequence Y is available losslessly at the joint decoder, the theoretical limit for lossless compression of X is

R1 ≥ H(X|Y) = H(p),

whereby H(p) is the binary entropy function defined as

H(p) = −p log2(p) − (1 − p) log2(1 − p).

With this setup, the following holds:

R1 + RY ≥ H(X,Y) → R1 + 1 ≥ H(p) + 1.    (6)
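As a quick numerical check of this bound, the binary entropy H(p) can be compared directly with the encoder rate R1 = (n − k)/k; the (n, k) pair below is an arbitrary illustration, not one of the paper's designed codes:

```python
import math

def binary_entropy(p):
    """H(p) = -p*log2(p) - (1-p)*log2(1-p), with H(0) = H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# Correlation P(x_j != y_j) = p; lossless compression of X needs R1 >= H(p).
p = 0.1
h = binary_entropy(p)            # about 0.469 bits per source bit
n, k = 24600, 16400              # hypothetical choice giving R1 = 0.5
R1 = (n - k) / k
feasible = R1 >= h               # rate above the Slepian-Wolf limit H(p)
```

The encoder tunes R1 toward H(p) by keeping k fixed and choosing n, exactly as discussed below Eq. (6).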

Note that the encoder needs to know the mean correlation so as to choose a rate close to H(p). It does so by keeping k constant while choosing n appropriately. We use the term mean correlation because, in any actual setting, the exact correlation between the sequences may vary about the mean value. Hence, it is beneficial if the decoder estimates the actual correlation value from the observations itself. While no side information about the rate is communicated to the decoder, the decoder knows the mean correlation implicitly from the knowledge of the block length n.

[Figure] Fig. 3. Architecture of the iterative joint decoder of correlated sources: the LDPC decoder is fed by Z and the LLRs L(X); a hard estimator produces X̂(i), and a correlation estimator compares X̂(i) with Y to update α(i).

III. JOINT ITERATIVE LDPC-DECODING OF CORRELATED SOURCES

The architecture of the iterative joint decoder for the Slepian-Wolf problem is depicted in Fig. 3. Its goal is to determine the best estimate X̂ of the source k-sequence X, starting from the received parity bit (n − k)-sequence Z. We note that, aside from the fact that the receiver may a priori presume that some correlation between the two sequences exists, no side information on this correlation is communicated to the joint decoder. Indeed, this correlation is estimated in an iterative fashion.

With reference to the architecture of the joint decoder depicted in Fig. 3, we note that there are two stages of iterative decoding. Index i denotes a global iteration, whereby during each global iteration the updated estimate of the source correlation obtained during the previous global iteration is passed on to the belief-propagation decoder, which performs local iterations with a pre-defined stopping criterion and/or a maximum number of local decoding iterations.

Based on the notation above, we can now develop the algorithm for exploiting the source correlation in the LDPC decoder. Consider an (n, k) LDPC code identified by the matrix H(n−k,n) as expressed in (2). Note that we only make reference to a maximum-rank matrix H, since the particular structure assumed for H ensures this. In particular, the double diagonal on the parity side of the H matrix always guarantees that the rank of H is equal to the number of its rows, i.e., n − k.

It is well known that the parity check matrix H can be described by a bipartite graph with two types of nodes, as shown in Fig. 4: n bit-nodes corresponding to the LDPC code bits, and n − k check-nodes corresponding to the parity checks expressed by the rows of the matrix H. Following the notation employed in [29], let B(m) denote the set of bit-nodes connected to the m-th check-node, and C(n) denote the set of check-nodes adjacent to the n-th bit-node. With this setup, B(m) corresponds to the set of positions of the 1's in the m-th row of H, while C(n) is the set of positions of the 1's in the n-th column of H. In addition, let us use the notation C(n) \ m and B(m) \ n to mean the sets C(n) and B(m) in which the m-th check-node and the n-th bit-node, respectively, are excluded. Furthermore, let us identify with λn,m(un) the log-likelihood of the message that the n-th bit-node sends to the m-th check-node, that is, the LLR of the probability that the n-th bit-node is 1 or 0 based on all checks involving the n-th bit except the m-th check, and with Λm,n(un) the log-likelihood of the message that the m-th check-node sends to the n-th bit-node, that is, the LLR of the probability that the n-th bit-node is 1 or 0 based on all the bit-nodes checked by the m-th check except the information coming from the n-th bit-node.

[Figure] Fig. 4. Pictorial representation of the Tanner graph of a staircase LDPC code: systematic bit-nodes u1, . . . , uk, parity bit-nodes uk+1, . . . , un, and check-nodes z1, . . . , zn−k.

With this setup, we have the following steps of the belief-propagation algorithm:

Initialization Step: each bit-node is assigned an a-posteriori LLR L(uj) as follows:

L(uj) = log( P(xj = 1|yj) / P(xj = 0|yj) ) = (2yj − 1) α(i),  j = 1, . . . , k
L(uj) = 2zj − 1,  j = k + 1, . . . , n    (7)

whereby

α(i) = log( p(i) / (1 − p(i)) )

is the correction factor taking into account the estimated correlation between sequences X and Y at global iteration i. Note that this term derives from the correlation model adopted in this paper, as expressed in (5), in which the correlation between any bit in the same position in the two sequences X and Y is seen as produced by an equivalent binary symmetric channel with transition probability p.

In our setup, before the first global iteration, α(0) = 1, since the degree of correlation is not known a priori.

In summary, for any position (m, n) such that Hm,n = 1, set:

λn,m(un) = L(un),    (8)

and

Λm,n(un) = 0.    (9)

(1) Check-node update: for each m = 1, . . . , n − k, and for each n ∈ B(m), compute:

Λm,n(un) = 2 tanh⁻¹( ∏_{p∈B(m)\n} tanh( λp,m(up) / 2 ) )    (10)

(2) Bit-node update: for each t = 1, . . . , n, and for each m ∈ C(t), compute:

λt,m(ut) = L(ut) + Σ_{p∈C(t)\m} Λp,t(ut).    (11)

(3) Decision: for each bit-node ut with t = 1, . . . , n, compute:

λt(ut) = L(ut) + Σ_{p∈C(t)} Λp,t(ut),    (12)

and quantize the results such that ut = 0 if λt(ut) < 0, and ut = 1 otherwise.

If H · Uᵀ = 0, then halt the algorithm and output x̂t = ut, t = 1, . . . , k, as the estimate of the decompressed source bits X̂ corresponding to the first source. Otherwise, if the number of iterations is less than a predefined maximum number, iterate the process starting from step (1).
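The three steps above can be sketched as a compact floating-point belief-propagation loop. This is an illustrative sketch only: the toy parity-check matrix and LLR values are hypothetical (chosen with even-degree checks so the tanh rule is sign-consistent with the LLR convention of (7)), and the paper's actual decoder works on integer metrics as described later in Section III-A:

```python
import numpy as np

def bp_decode(H, L, max_iter=50):
    """Belief propagation on parity-check matrix H with input LLRs L
    (convention log P(u=1)/P(u=0) as in (7)): check-node update (10),
    bit-node update (11), decision (12) with u_t = 0 iff lambda_t < 0."""
    m, n = H.shape
    edges = [(i, j) for i in range(m) for j in range(n) if H[i, j]]
    lam = {(j, i): float(L[j]) for i, j in edges}   # Eq. (8): bit -> check
    Lam = {(i, j): 0.0 for i, j in edges}           # Eq. (9): check -> bit
    u = np.zeros(n, dtype=int)
    for _ in range(max_iter):
        for i, j in edges:                          # Eq. (10)
            prod = np.prod([np.tanh(lam[(p, i)] / 2.0)
                            for p in range(n) if H[i, p] and p != j])
            Lam[(i, j)] = 2.0 * np.arctanh(np.clip(prod, -0.999999, 0.999999))
        for i, j in edges:                          # Eq. (11)
            lam[(j, i)] = L[j] + sum(Lam[(p, j)] for p in range(m)
                                     if H[p, j] and p != i)
        total = [L[j] + sum(Lam[(p, j)] for p in range(m) if H[p, j])
                 for j in range(n)]                 # Eq. (12)
        u = np.array([0 if t < 0 else 1 for t in total])
        if not np.any(H.dot(u) % 2):                # H * U^T = 0: halt
            break
    return u

# Toy staircase code H = [HX | HZ] (hypothetical), codeword [1,0,1,1,0,0];
# bit 0 arrives weak with the wrong sign, and BP corrects it.
H = np.array([[1, 1, 1, 0, 1, 0],
              [0, 0, 1, 1, 1, 1]])
L = np.array([-1.0, -4.0, 4.0, 4.0, -4.0, -4.0])
u_hat = bp_decode(H, L)
```

The dictionary-of-edges layout mirrors the message-passing description directly; a production decoder would vectorize these updates.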

The architecture of the iterative joint decoder is depicted in Fig. 3. Let us elaborate on the signal processing involved. In particular, as before, let x and y be two correlated binary random variables which can take on the values {0, 1}, and let r = x ⊕ y. Let us assume that the random variable r takes on the values {0, 1} with probabilities P(r = 1) = pr and P(r = 0) = 1 − pr.

The correction factor α(i) at global iteration i is evaluated as

α(i) = log( p̂r / (1 − p̂r) ),    (13)

by counting the number of places in which X̂(i) and Y differ, or equivalently by evaluating the Hamming weight wH(·) of the sequence R̂(i) = X̂(i) ⊕ Y, whereby, in the previous equation,

p̂r = wH(R̂(i)) / k.

In the latter case, by assuming that the sequence R = X ⊕ Y is i.i.d., we have:

α(i) = log( wH(R̂(i)) / (k − wH(R̂(i))) )    (14)

where k is the source block size. Above, letters highlighted with a hat (ˆ) are used to mean that the respective parameters have been estimated.
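Equations (13)-(14) amount to counting disagreements between the current hard decisions and the side information. A small sketch (the vectors are illustrative; the end-point guard for w = 0 or w = k is an implementation detail assumed here, not specified in the paper):

```python
import numpy as np

def alpha_hat(x_hat, y):
    """Eq. (14): alpha = log( w_H(R) / (k - w_H(R)) ) with R = x_hat XOR y,
    i.e., the LLR implied by the empirical estimate p_r = w_H(R)/k of (13)."""
    k = len(y)
    w = int(np.sum(np.bitwise_xor(x_hat, y)))   # Hamming weight of R
    w = min(max(w, 1), k - 1)                   # guard against log(0) (assumption)
    return float(np.log(w / (k - w)))

# Example: k = 1000, the sequences differ in 10% of the positions.
x_hat = np.zeros(1000, dtype=int)
y = x_hat.copy()
y[:100] = 1
a = alpha_hat(x_hat, y)    # log(0.1 / 0.9), i.e., about -2.197
```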

Formally, the iterative decoding algorithm can be stated as follows:

1) Set α(0) to one (see Fig. 3). Compute the log-likelihood ratios for every bit-node using (7).
2) For each global iteration i = 1, . . . , q, do the following:
   a) perform a belief-propagation decoding on the parity bit sequence Z by using a predefined maximum number of local iterations, and the side information represented by the correlated sequence Y along with the correction factor α(i−1);
   b) evaluate α(i) using (14);
   c) if |α(i) − α(i−1)| ≥ 10⁻⁴, go back to (a) and continue iterating; else exit.

Point c) in the previous code fragment is used in order to speed up the overall iterative algorithm. Extensive tests we conducted suggest that the threshold value of 10⁻⁴ may be used for this purpose. Obviously, one can keep iterating until the last global iteration as well.
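The global loop above can be written as a thin skeleton around any systematic decoder; `ldpc_decode` and `estimate_alpha` below are placeholder callables standing in for the belief-propagation stage and Eq. (14):

```python
def joint_decode(z, y, ldpc_decode, estimate_alpha, q=10, tol=1e-4):
    """Two-stage iterative decoder skeleton: alpha(0) = 1, then up to q
    global iterations, exiting early once |alpha(i) - alpha(i-1)| < tol."""
    alpha = 1.0                                  # step 1: alpha(0) = 1
    x_hat = None
    for _ in range(q):                           # step 2: global iterations
        x_hat = ldpc_decode(z, y, alpha)         # step 2a: local BP iterations
        new_alpha = estimate_alpha(x_hat, y)     # step 2b: Eq. (14)
        if abs(new_alpha - alpha) < tol:         # step 2c: correlation settled
            return x_hat, new_alpha
        alpha = new_alpha
    return x_hat, alpha
```

The early exit mirrors point c): decoding continues only while the correlation estimate keeps moving by at least the threshold.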


A. Overview of Integer-Metrics Belief-Propagation Decoder

In this section, we briefly describe the LDPC decoder working with integer LLRs. This approach leads to efficient belief-propagation decoding. Following the rationale described in [27], we begin by quantizing any real LLR (denoted LLRQ after quantization) employed in the initialization phase of the belief-propagation decoder in (7), using the following transformation:

LLRQ = ⌊2^q L(uj) + 0.5⌋,  j = 1, . . . , k
LLRQ = (2zj − 1) · S,  j = k + 1, . . . , n    (15)

whereby ⌊·⌋ stands for rounding down to the nearest integer, L(uj) is the real LLR, S is a suitable scaling factor, and q is the precision chosen to represent the LLR with integer metrics. In our belief-propagation decoder, we use q = 3, which guarantees a good trade-off between BER performance and complexity of the decoder implementation [27]. The scaling factor S is the greatest integer metric processed by the iterative decoder. In our set-up, we use S = 10000. Note that such a scaling factor depends on the practical implementation of the belief-propagation decoder. Suffice it to say that, in our setup, S gives high likelihood to the parity bits zj, ∀j = k + 1, . . . , n, since they are transmitted through a perfect channel to the decoder.
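Equation (15) can be implemented directly; with q = 3 the systematic LLRs become integers on a 1/8 grid, and parity bits map to ±S. A minimal sketch using the paper's q = 3 and S = 10000 (the input LLR values are illustrative):

```python
import math

def quantize_llrs(L_sys, z, q=3, S=10000):
    """Eq. (15): systematic bit-nodes get floor(2^q * L + 0.5), i.e.,
    round-to-nearest with q fractional bits; parity bit-nodes get
    (2z - 1) * S, saturated since they arrive over a perfect channel."""
    llrq_sys = [math.floor((1 << q) * L + 0.5) for L in L_sys]
    llrq_par = [(2 * zj - 1) * S for zj in z]
    return llrq_sys + llrq_par

llrq = quantize_llrs([0.3, -1.2], [1, 0])   # -> [2, -10, 10000, -10000]
```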

All the summations shown in equations (11) and (12) are performed on integer LLRs. Finally, (10) has to be modified for working with integer metrics. To this end, we follow the schedule proposed in [29] in order to avoid real calculations and only use look-up tables and integer additions. This way, the basic operation L(x ⊕ y) in the check-node update is approximated as follows:

L(x ⊕ y) ≈ sign(L(x)) · sign(L(y)) · min(|L(x)|, |L(y)|)
+ log(1 + e^−|L(x)+L(y)|) − log(1 + e^−|L(x)−L(y)|)    (16)

whereby sign(·) is the function which evaluates the sign of the enclosed LLR. The latter two log terms are evaluated through look-up tables containing samples of the log functions approximated with a piecewise linear approximation, as suggested in [29].
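The check-node operation (16) is the classic min-sum core plus two correction terms; with the corrections evaluated exactly (rather than via look-up tables) it coincides with the exact L(x ⊕ y). A numerical sketch of that identity:

```python
import math

def boxplus_exact(a, b):
    """Exact check-node combination: L(x XOR y) = log((1 + e^(a+b)) / (e^a + e^b))."""
    return math.log(1.0 + math.exp(a + b)) - math.log(math.exp(a) + math.exp(b))

def boxplus_eq16(a, b):
    """Eq. (16): sign-sign-min core plus the two log correction terms
    (the quantities tabulated in the decoder's look-up tables)."""
    core = math.copysign(1.0, a) * math.copysign(1.0, b) * min(abs(a), abs(b))
    return (core + math.log(1.0 + math.exp(-abs(a + b)))
                 - math.log(1.0 + math.exp(-abs(a - b))))
```

Replacing the two log terms with piecewise-linear table lookups, as the paper does, trades a small accuracy loss for integer-only arithmetic.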

IV. SENSITIVITY OF THE CROSS-CORRELATION TO RESIDUAL ERRORS AFTER CHANNEL DECODING

In this section, we wish to demonstrate the relative robustness of the empirical cross-correlation between sequences X and Y to residual errors after channel decoding. To this end, let X and Y be two binary vectors of length k. Correlation between two binary streams can be measured in two ways. One technique is to map a logic-0 to the integer +1 and a logic-1 to −1, take the inner product of the two vectors defining our data packets, and divide by the vector length (assumed to be the same for both packets). It is well known that the resulting empirical correlation estimate lies in the range −1 ≤ ρ ≤ +1. A positive value of ρ signifies a similarity between X and Y, while a negative ρ signifies a similarity between X and Ȳ, where Ȳ is the complement of vector Y. This traditional measure of correlation is not of direct interest to us, since we wish to relate a correlation measure to an empirical probability which is strictly positive.

To this end, let us define rn = xn ⊕ yn as the XOR of the n-th components of the vectors X and Y. Similarly, we define R = X ⊕ Y, whereby R is obtained via componentwise XOR of the components of the vectors X and Y. Let the number of places in which X and Y agree be γ, and define the empirical cross-correlation between these two vectors as ρ = γ/k. Note that with this definition we may take ρ as the empirical probability that rn = 0, and values of, for instance, ρ = 0.7 and ρ = 0.3 would have the same information content as measured by the value of the binary entropy function. Notice that, since intrinsic soft information is generated from probability estimates, this measure of correlation is more useful to us than the traditional definition.
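The two measures are related by an affine map: with γ agreements out of k positions, the ±1 inner-product measure equals 2(γ/k) − 1. A small sketch with illustrative vectors:

```python
import numpy as np

def rho_traditional(x, y):
    """Map logic-0 -> +1 and logic-1 -> -1, then take the normalized
    inner product; the result lies in [-1, +1]."""
    return float(np.dot(1 - 2 * x, 1 - 2 * y)) / len(x)

def rho_empirical(x, y):
    """rho = gamma / k: the fraction of positions where x and y agree,
    i.e., the empirical probability that r_n = x_n XOR y_n equals 0."""
    return float(np.mean(np.bitwise_xor(x, y) == 0))

x = np.array([0, 0, 1, 1, 0])
y = np.array([0, 1, 1, 0, 0])
# the vectors agree in 3 of 5 positions, so rho_empirical = 0.6
# and rho_traditional = 2 * 0.6 - 1 = 0.2
```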

Let us consider the Slepian-Wolf problem discussed above, and suppose that what is available at the receiver is a noisy version of X, denoted X̂. For instance, X̂ could be an erroneous version of X obtained after transmission through a noisy channel modelled as a Binary Symmetric Channel (BSC) with transition probability pe. We assume that the error events affecting the sequence X are independent identically distributed (i.i.d.). The joint decoder generates an empirical estimate of the cross-correlation based on the use of the sequences X̂ and Y by forming the vector R̂ = X̂ ⊕ Y and counting the number of places where R̂ is zero. Let us denote this count as γ̂. Clearly, γ̂ is a random variable. The question is, what is the Probability Mass Function (PMF) of γ̂? Knowledge of this PMF allows us to assess the sensitivity of our estimate of the cross-correlation to errors in the original sequence X.

It is relatively straightforward to express the probabilities that (r̂n = rn) and (r̂n ≠ rn) as P(r̂n = rn) = P(x̂n = xn) = 1 − pe and P(r̂n ≠ rn) = P(x̂n ≠ xn) = pe. Consider applying a permutation to the sequences X and Y so that the permuted sequences agree in the first γ locations and disagree in the remaining (k − γ) locations. The permutation is applied to simplify the explanation of how we may go about obtaining the PMF of γ̂, and by no means impacts the results. It is evident that the permuted sequence π(R) contains γ zeros in the first γ locations and (k − γ) ones in the remaining locations. Now consider evaluation of the probability P(γ̂ = γ + ν) for ν = 0, 1, . . . , (k − γ). We define π(R)γ to represent the first γ bits of π(R), and π(R)k−γ the remaining (k − γ) bits. Similarly, we define π(R̂)γ and π(R̂)k−γ. For a fixed ν, the event {γ̂ = γ + ν} corresponds to the union of events of the type: π(R̂)k−γ differs from π(R)k−γ in (ν + l) positions for some l ∈ {0, 1, . . . , γ}, π(R̂)γ differs from π(R)γ in l positions, and the remaining bits of π(R̂) and π(R) are identical. The probability of each such elementary event is given by:

C(γ, l) · C(k − γ, ν + l) · (1 − pe)^(k−ν−2l) · (pe)^(ν+2l),    (17)

where C(a, b) denotes the binomial coefficient.

It can be shown that the probability of the event {γ̂ = γ + ν}, for ν = 0, 1, . . . , (k − γ), is given by:

P(γ̂ = γ + ν) = Σ_{l=0}^{γ} [ C(γ, l) · C(k − γ, ν + l) · (1 − pe)^(k−ν−2l) · (pe)^(ν+2l) ],    (18)

while, for m = 1, 2, . . . , γ, we have:

P(γ̂ = γ − m) = Σ_{l=m}^{γ} [ C(γ, l) · C(k − γ, l − m) · (1 − pe)^(k−2l+m) · (pe)^(2l−m) ].    (19)

[Figure] Fig. 5. Probability mass function (PMF) of ρ̂ for five different values of the raw error rate pe (10⁻¹ down to 10⁻⁵) when the true cross-correlations between the data packets are ρ = 0.8 and 0.9, at data block length k = 16400.
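Equations (18)-(19) can be evaluated directly; equivalently, γ̂ is the sum of a Binomial(γ, 1 − pe) count of surviving agreements and a Binomial(k − γ, pe) count of new ones, so the PMF must sum to one with mean γ(1 − pe) + (k − γ)pe. A small sketch (the (k, γ, pe) values below are illustrative, far smaller than the paper's k = 16400):

```python
import math

def pmf_gamma_hat(k, gamma, pe):
    """PMF of the agreement count gamma_hat per Eqs. (18)-(19);
    C(a, b) is the binomial coefficient math.comb(a, b)."""
    pmf = {}
    for nu in range(0, k - gamma + 1):               # Eq. (18)
        pmf[gamma + nu] = sum(
            math.comb(gamma, l) * math.comb(k - gamma, nu + l)
            * (1 - pe) ** (k - nu - 2 * l) * pe ** (nu + 2 * l)
            for l in range(0, gamma + 1) if nu + l <= k - gamma)
    for m in range(1, gamma + 1):                    # Eq. (19)
        pmf[gamma - m] = sum(
            math.comb(gamma, l) * math.comb(k - gamma, l - m)
            * (1 - pe) ** (k - 2 * l + m) * pe ** (2 * l - m)
            for l in range(m, gamma + 1) if l - m <= k - gamma)
    return pmf

pmf = pmf_gamma_hat(k=6, gamma=4, pe=0.2)
```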

In the next section, we shall present simulation results for iterative joint decoding of correlated sources for a data packet size of k = 16400. Hence, let us look at representative results associated with the estimation of ρ for this block length:

1) Fig. 5 depicts the PMF of ρ̂ for five different values of the raw error rate pe when the true cross-correlations between the data packets are ρ = 0.8 and 0.9, at a data block length of k = 16400. The following observation is in order: there is a bias in the empirical estimate of ρ, as measured by the difference between the most probable value of ρ̂ and the true value of ρ. This bias is a strong function of pe.

2) As noted above, the most likely value of ρ̂, denoted M(ρ̂) (i.e., the mode), obtained from evaluation of the empirical cross-correlation from noisy received vectors, is not necessarily the true value ρ. This is particularly so at larger values of pe and for small and large values of ρ. Fig. 6 captures this behavior for three values of pe = 10−1, 10−2 and 10−3 as a function of ρ for block length k = 16400. In particular, this figure shows the difference (ρ − M(ρ̂)) versus ρ obtained from empirical evaluation of the cross-correlation from noisy received vectors.

3) The standard deviation of ρ̂ is a strong function of pe itself. Fig. 7 depicts the standard deviation of ρ̂ as a function of pe for k = 16400 for three different values of ρ. This figure shows that the standard deviation increases slowly with increasing pe. Note that even at values of pe as large as pe = 0.1, this standard deviation is still relatively small for ρ in the range ρ = 0.1 to ρ = 0.9.

Fig. 6. The difference (ρ − M(ρ̂)) versus ρ, whereby M(ρ̂) denotes the most probable value of ρ̂ obtained from empirical evaluation of the cross-correlation from noisy received vectors (block length k = 16400).

Fig. 7. Standard deviation of ρ̂ as a function of pe (block length k = 16400).

V. SIMULATION RESULTS AND COMPARISONS

We have simulated the performance of our proposed iterative joint source decoder. In order to compare our results with others proposed in the literature, we follow the same framework as in [10], [15], [16].

In the following, we provide sample simulation results associated with various (n, k) LDPC codes designed with the technique proposed in [28]. In particular, for a fair comparison with the results provided in [16], we designed various LDPC codes with k = 16400 and k = 16000. The details and the parameters of the designed LDPCs are given in Tables I-II.

Parameters given in Table I are the source block length k, the codeword length n, the rate R1 of the source, expressed as (n − k)/k (i.e., the inverse of the compression ratio), the average degree dv of the bit nodes, and the average degree dc of the check nodes of the designed LDPCs. Note that the encoding procedure adopted in our approach is different from the one proposed in [16] in that we source encode k bits at a time


TABLE I
PARAMETERS OF THE DESIGNED LDPCS.

LDPC   k      n      R1     dv    dc
L1     16400  30750  0.875  2.8    6
L2     16400  26200  0.597  3      8
L3     16400  22400  0.365  3.21  12
L4     16000  20000  0.25   3.59  18
L5     16400  20300  0.237  3.45  18
L6     16400  20000  0.219  3.23  18
L7     16400  19300  0.176  3.0   20
L8     16400  19500  0.189  3.0   19
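The R1 column can be cross-checked against R1 = (n − k)/k; the following snippet (ours, for verification only) confirms the tabulated values, which are reported to three decimals:

```python
# Cross-check R1 = (n - k) / k for the codes in Table I.
codes = {"L1": (16400, 30750), "L2": (16400, 26200), "L3": (16400, 22400),
         "L4": (16000, 20000), "L5": (16400, 20300), "L6": (16400, 20000),
         "L7": (16400, 19300), "L8": (16400, 19500)}
table_r1 = {"L1": 0.875, "L2": 0.597, "L3": 0.365, "L4": 0.25,
            "L5": 0.237, "L6": 0.219, "L7": 0.176, "L8": 0.189}
for name, (k, n) in codes.items():
    r1 = (n - k) / k
    # tabulated values agree with (n - k) / k to within one unit
    # in the third decimal place
    assert abs(r1 - table_r1[name]) < 1e-3
```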

and transmit only n − k bits. In [16], the authors proposed a source compression scheme which encodes n source bits at a time and transmits n − k syndrome bits.

The degree distributions of the designed LDPC codes, labelled L1 to L8 in Table I, are shown in Table II, whereby the integer number multiplying any power of the indeterminate x^w identifies the number of bit nodes in λ_Lr(x), and check nodes in ρ_Lr(x), having degree w + 1 in the r-th designed LDPC, with r = 1, ..., 8.

For local decoding of the LDPC codes, the maximum number of local iterations has been set to 50, while the maximum number of global iterations is 5, even though the stopping criterion discussed in the previous section has been adopted.

The results on the compression achieved with the proposed algorithm are shown in Table III. The first row shows the correlation parameter assumed, that is, the probability p = P(xj ≠ yj), ∀j = 1, ..., k. The second row shows the joint entropy limit as expressed in (6) for various values of the correlation parameter p. The third and fourth rows show the results on source compression presented in papers [10], [16], while the last row presents the results on compression achieved with the proposed algorithm employing a maximum of 5 global iterations in conjunction with the stopping criterion noted in the previous section. As in [16], we assume error-free compression for a target Bit Error Rate (BER) of 10−6. Note that the statistics of the results shown have been obtained by counting 30 erroneous frames.

From Table III it is evident that significant compression gains with respect to the theoretical limits can be achieved as the correlation between sequences X and Y increases. Notice that, as opposed to the algorithm proposed in [16], we do not assume p is known at the decoder. Indeed, this parameter is iteratively estimated as discussed in the previous section. A similar approach has been pursued in [10], whereby an iterative approach is employed for the estimation of the correlation between the two correlated sequences, but employing turbo codes.

Finally, note that we employ integer soft-metrics as explained in the previous section, while in [10], [16], to the best of our knowledge, the authors employ real metrics. The algorithm working on integer metrics is very fast and considerably reduces the complexity burden required by the two-stage iterative algorithm (i.e., the local-global combination).

Fig. 8 shows the BER performance of the proposed iterative decoding algorithm for various global iterations and as a

TABLE II
DEGREE DISTRIBUTIONS OF THE DESIGNED LDPCS.

λ_L1(x)  1 + 14349x + 15400x^2 + 200x^3 + 800x^12
ρ_L1(x)  x^4 + 14349x^5
λ_L2(x)  1 + 9799x + 11000x^2 + 4700x^3 + 700x^9
ρ_L2(x)  x^6 + 9799x^7
λ_L3(x)  1 + 5999x + 13700x^2 + 1800x^3 + 900x^12
ρ_L3(x)  x^10 + 5999x^11
λ_L4(x)  1 + 3999x + 12600x^2 + 1300x^3 + 2100x^9
ρ_L4(x)  x^16 + 3999x^17
λ_L5(x)  1 + 3899x + 12800x^2 + 2400x^3 + 1200x^11
ρ_L5(x)  x^16 + 3899x^17
λ_L6(x)  1 + 3599x + 14400x^2 + 1200x^3 + 800x^11
ρ_L6(x)  x^16 + 3599x^17
λ_L7(x)  1 + 2899x + 15400x^2 + 750x^3 + 250x^11
ρ_L7(x)  x^18 + 2899x^19
λ_L8(x)  1 + 3099x + 15900x^2 + 500x^9
ρ_L8(x)  x^17 + 3099x^18
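A useful consistency check (our observation, not stated in the paper): in each designed code, the bit-node counts in λ(x) sum to the codeword length n, while the check-node counts in ρ(x) sum to the number of parity checks n − k. For instance, for L1: 1 + 14349 + 15400 + 200 + 800 = 30750 and 1 + 14349 = 14350 = 30750 − 16400. A short script verifies this for two representative codes:

```python
# Consistency check for Tables I-II: node counts in the degree
# distributions must match the code dimensions (k, n).
lam = {"L1": [1, 14349, 15400, 200, 800], "L8": [1, 3099, 15900, 500]}
rho = {"L1": [1, 14349], "L8": [1, 3099]}
dims = {"L1": (16400, 30750), "L8": (16400, 19500)}
for name, (k, n) in dims.items():
    assert sum(lam[name]) == n          # one bit node per codeword bit
    assert sum(rho[name]) == n - k      # one check node per parity check
```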

Fig. 8. BER performance of the proposed iterative decoding algorithm for various global iterations as a function of the joint entropy between sources X and Y, when the stopping criterion for global iterations is applied. Results refer to the LDPC labelled L8 in Tables I-II, guaranteeing a compression rate of R1 = 0.189. For a target BER = 10−6, we also show the theoretical limit as expressed by the Slepian-Wolf bound achieved by the iterative algorithm applied to the LDPC L8 for increasing values of the maximum number of global iterations.

function of the joint entropy between sources X and Y, when the stopping criterion for global iterations is applied and the maximum number of global iterations specified in the legend is used. The LDPC used for encoding is the one labelled L8 in Tables I-II, which guarantees a compression rate of R1 = 0.189. At BER = 10−6, the overall rate R = R1 + RY = 1.189 is within 0.075 of the theoretical limit of 1.1175.

In order to assess the effects of the global iterations on BER performance, we simulated the same LDPC L8 without the stopping criterion on the global iterations. The resulting BER performance is shown in Fig. 9. Notice that 3 or 4 global iterations suffice to get almost all that can be gained from the knowledge of the cross-correlation.

In Table IV, we show the average number of local iterations


TABLE III
COMPRESSION RATE PERFORMANCE OF THE ITERATIVE ALGORITHM FOR VARIOUS JOINT ENTROPIES.

p            0.01      0.02      0.025     0.02875  0.05      0.1       0.2
H(p) + 1     1.08      1.141     1.169     1.188    1.286     1.469     1.722
R [10]       -         -         1.31      -        1.435     1.63      1.89
R [16]       -         -         1.276     -        1.402     1.60      1.875
R = R1 + RY  1.176-L7  1.219-L6  1.237-L5  1.25-L4  1.365-L3  1.597-L2  1.875-L1
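The second row of Table III follows from H(X, Y) = H(p) + 1, i.e., the binary entropy of the correlation parameter plus one bit for the uniformly distributed source. A quick check (our script, not part of the paper) reproduces the tabulated values:

```python
from math import log2

def joint_entropy(p):
    """H(X,Y) = H(p) + 1: binary entropy of the correlation noise
    plus one bit for the uniform source X."""
    h = -p * log2(p) - (1 - p) * log2(1 - p)
    return h + 1.0

# Values from the second row of Table III (reported to three decimals).
for p, hxy in [(0.01, 1.08), (0.02, 1.141), (0.025, 1.169), (0.02875, 1.188),
               (0.05, 1.286), (0.1, 1.469), (0.2, 1.722)]:
    assert abs(joint_entropy(p) - hxy) < 1e-3
```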

Fig. 9. BER performance of the proposed iterative decoding algorithm for various global iterations as a function of the joint entropy between sources X and Y, without stopping criterion on global iterations. Results refer to the LDPC labelled L8 in Tables I-II, guaranteeing a compression rate of R1 = 0.189. For a target BER = 10−6, we also show the theoretical limit as expressed by the Slepian-Wolf bound achieved by the iterative algorithm applied to the LDPC L8 for increasing values of the maximum number of global iterations.

performed by the joint decoder at the end of a given global iteration, for various values of correlation between the sources, when the stopping criterion on the global iterations is not applied. Finally, we evaluated the average number of global iterations performed by the iterative algorithm when the stopping criterion on global iterations is employed during decoding. Simulation results show that when the LDPC decoder works at BER levels less than or equal to 10−5, the average number of global iterations equals 1.2, thus guaranteeing a very efficient iterative approach to the co-decompression problem. In other words, an overall average number of 80 LDPC decoding iterations suffices to obtain good BER performance.

In the following set of simulation results, we adopted another approach in order to test the proposed algorithm for varying actual correlation levels. In particular, for any given value of the mean correlation p, we generate a uniform random variable having mean value equal to the mean correlation itself and with a maximum variation of ∆p around this mean value. We used the following maximum variations: ∆p = 0.5, 0.2, 0.1%, and ∆p = 0.0%, which refers to the case in which the correlation value is not variable, but fixed.
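This perturbation model can be sketched as follows. Note that this is our reading of the text: we treat ∆p as an absolute deviation expressed in percentage points, and the function and parameter names are ours, not the authors':

```python
import random

def per_block_correlation(p_mean, delta_p_percent, rng=random):
    """Per-block correlation parameter: uniform around p_mean with a
    maximum deviation of delta_p_percent (in percentage points)."""
    delta = delta_p_percent / 100.0
    return p_mean + rng.uniform(-delta, delta)
```

For example, with p_mean = 0.015 and ∆p = 0.5%, each block draws its actual correlation uniformly from [0.010, 0.020]; with ∆p = 0.0% the correlation is simply fixed at p_mean.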

For each data block, we set the actual correlation equal to the mean correlation plus this perturbation. The decoder

TABLE IV
AVERAGE NUMBER OF LOCAL ITERATIONS FOR LDPC L8 DECODING WITH MAXIMUM LOCAL ITERATIONS PARAMETER SET TO 50, FOR DIFFERENT GLOBAL ITERATION INDICES.

p        H(X,Y)   1-st it.  2-nd it.  3-rd it.  4-th it.
0.0213   1.1487   50.00     50.00     50.00     50.00
0.0204   1.1437   50.00     50.00     50.00     50.00
0.0195   1.1386   50.00     46.16     46.12     46.11
0.0186   1.1335   50.00     39.51     38.45     34.58
0.0177   1.1283   42.47     23.65     23.35     22.31
0.0168   1.1231   34.73     16.16     16.09     15.17
0.0159   1.1178   22.97     12.59     12.58     12.57

Fig. 10. BER performance of the proposed iterative decoding algorithm for a maximum of 5 global iterations as a function of the joint entropy between sources X and Y, when the stopping criterion for global iterations is applied. Results refer to the LDPCs labelled L8 and L5 in Table I. The legend shows the mean correlation value p and the maximum value of the correlation variation with respect to the mean value. Curves labelled with * refer to the ones obtained without the iterative paradigm, using the mean correlation value.

iterates to estimate the actual correlation value, which varies around its mean value from one block to the next. In effect, the parameter p is iteratively estimated as discussed in the previous section. A similar approach has been pursued in [10] for a fixed correlation level, whereby an iterative approach is used for the estimation of the correlation between the two correlated sequences, but employing turbo codes.

Fig. 10 shows the BER performance of the proposed iterative decoding algorithm for a maximum of 5 global iterations and as a function of the joint entropy between sources X and Y, when the stopping criterion for global iterations is applied. The LDPCs used for encoding are the ones labelled L5 and L8 in


Fig. 11. BER performance of the proposed iterative decoding algorithm for a maximum of 5 global iterations as a function of the joint entropy between sources X and Y, when the stopping criterion for global iterations is applied. Results refer to the LDPCs labelled L3 (left subplot) and L2 (right subplot) in Table I. The legend shows the mean correlation value p and the maximum value of the correlation variation with respect to the mean value.

Table I, which guarantee compression rates of RX = 0.237 and RX = 0.189, respectively. The LDPC labelled L5 is used at a mean value of p equal to 0.025, while LDPC L8 is adopted for a mean correlation of 0.015. From Fig. 10 one clearly sees that LDPC decoding does not converge when the decoder does not iterate to estimate the actual value of p, but uses only its mean value for setting the extrinsic information. Notice also that the performance of the iterative decoder when the correlation value is fixed (curves labelled ∆p = 0.0 in Fig. 10) is very close to the case in which the actual correlation value varies within ∆p = 0.1% of the mean value.

Similar considerations can be deduced from Fig. 11, which shows the BER performance of the proposed iterative decoding algorithm when using the LDPCs labelled L2 and L3 in Table I, which guarantee compression rates of RX = 0.597 and RX = 0.365, respectively. The LDPC labelled L2 is used at a mean correlation equal to 0.1, while LDPC L3 is used with a mean correlation of 0.05. Note that the performance degrades as ∆p increases, since the encoder works further away from its optimal operating point.

The results on the compression achieved with the proposed algorithm are shown in Table V for the case in which the correlation value is fixed. The first row shows the fixed correlation parameter assumed, namely, p = P(xj ≠ yj), ∀j = 1, ..., k in our model. The second row shows the joint entropy limit for the fixed correlation parameter p. The third row presents the results on compression achieved with the proposed algorithm employing a maximum of 5 global iterations in conjunction with the stopping criterion noted in the previous section. As in [16], we assume error-free compression for a target Bit Error Rate (BER) of 10−6. Note that the statistics of the results shown have been obtained by counting 30 erroneous frames.

From Table V it is evident that significant compression gains

TABLE V
COMPRESSION RATE PERFORMANCE OF THE ITERATIVE ALGORITHM FOR H(p) = 0.112.

p            0.015
H(p) + 1     1.112
R = RX + RY  1.189-L8

with respect to the theoretical limits can be achieved as the correlation between sequences X and Y increases.

VI. CONCLUSION

In this paper, we have presented a novel iterative joint decoding algorithm based on LDPC codes for the Slepian-Wolf problem of compression of correlated information sources.

In the considered scenario, two correlated sources transmit toward a common receiver. The first source is compressed by transmitting the parity check bits of a systematic LDPC-encoded codeword, without knowledge of the correlation among the sources. The correlated information of the second source is employed as side information at the receiver and used for decompressing and decoding the first source. The correlation is not known a priori, but is estimated through an iterative paradigm. Both the iterative decoding algorithm and the cross-correlation estimation procedure have been described in detail.

Simulation results suggest that relatively large compression gains are achievable with a relatively small number of global iterations, especially when the sources are highly correlated.

VII. ACKNOWLEDGEMENTS

This work was partially supported by the PRIN 2007 project "Feasibility study of an optical Earth-satellite quantum communication channel" and by the CAPANINA project (FP6-IST-2003-506745) as part of the EU VI Framework Programme.

REFERENCES

[1] F. Daneshgaran, M. Laddomada, and M. Mondin, "LDPC-based iterative algorithm for compression of correlated sources at rates approaching the Slepian-Wolf bound," in Proc. of the Int. Conf. SPACOMM 2009, pp. 74-79, Colmar, France, July 2009.

[2] D. Slepian and J. Wolf, "Noiseless coding of correlated information sources," IEEE Trans. on Inform. Theory, vol. IT-19, no. 4, pp. 471-480, July 1973.

[3] L. J. Guibas, "Sensing, tracking, and reasoning with relations," IEEE Signal Processing Magazine, no. 3, pp. 73-85, March 2002.

[4] S. S. Pradhan, J. Kusuma, and K. Ramchandran, "Distributed compression in a dense microsensor network," IEEE Signal Processing Magazine, no. 3, pp. 51-60, March 2002.

[5] F. Zhao, J. Shin, and J. Reich, "Information-driven dynamic sensor collaboration," IEEE Signal Processing Magazine, no. 3, pp. 61-72, March 2002.

[6] A. Aaron and B. Girod, "Compression with side information using turbo codes," in Proc. of Data Compress. Conf., DCC'02, pp. 252-261, April 2002.

[7] J. Bajcsy and P. Mitran, "Coding for the Slepian-Wolf problem with turbo codes," in Proc. of Global Telecomm. Conf., GLOBECOM'01, vol. 2, pp. 1400-1404, Nov. 2001.

[8] J. Bajcsy and P. Mitran, "Coding for the Wyner-Ziv problem with turbo-like codes," in Proc. of Inter. Symposium on Infor. Theory, ISIT'02, p. 91, 2002.

[9] J. Bajcsy and I. Deslauriers, "Serial turbo coding for data compression and the Slepian-Wolf problem," in Proc. of Inform. Theory Workshop, pp. 296-299, March 31-April 4, 2003.

[10] J. Garcia-Frias and Y. Zhao, "Compression of correlated binary sources using turbo codes," IEEE Communications Letters, vol. 5, no. 10, pp. 417-419, October 2001.

[11] Y. Zhao and J. Garcia-Frias, "Joint estimation and compression of correlated nonbinary sources using punctured turbo codes," IEEE Transactions on Communications, vol. 53, no. 3, pp. 385-390, March 2005.

[12] A. D. Liveris, Z. Xiong, and C. N. Georghiades, "A distributed source coding technique for correlated images using turbo-codes," IEEE Communications Letters, vol. 6, no. 9, pp. 379-381, September 2002.

[13] A. D. Liveris, Z. Xiong, and C. N. Georghiades, "Joint source-channel coding of binary sources with side information at the decoder using IRA codes," in Proc. of IEEE 2002 Multimedia Signal Processing Workshop, December 2002.

[14] A. D. Liveris, Z. Xiong, and C. N. Georghiades, "Distributed compression of binary sources using conventional parallel and serial concatenated convolutional codes," in Proc. of IEEE Data Compression Conference, 2003.

[15] A. D. Liveris, Z. Xiong, and C. N. Georghiades, "Compression of binary sources with side information at the decoder using LDPC codes," IEEE Communications Letters, vol. 6, no. 10, pp. 440-442, October 2002.

[16] A. D. Liveris, Z. Xiong, and C. N. Georghiades, "Compression of binary sources with side information using low-density parity-check codes," in Proc. of IEEE GLOBECOM 2002, vol. 2, pp. 1300-1304, 17-21 Nov. 2002.

[17] J. Garcia-Frias, "Joint source-channel decoding of correlated sources over noisy channels," in Proc. of DCC 2001, pp. 283-292, March 2001.

[18] J. Garcia-Frias, W. Zhong, and Y. Zhao, "Iterative decoding schemes for source and joint source-channel coding of correlated sources," in Proc. of Asilomar Conference on Signals, Systems, and Computers, November 2002.

[19] F. Daneshgaran, M. Laddomada, and M. Mondin, "Iterative joint channel decoding of correlated sources employing serially concatenated convolutional codes," IEEE Transactions on Information Theory, vol. 51, no. 7, pp. 2721-2731, July 2005.

[20] F. Daneshgaran, M. Laddomada, and M. Mondin, "Iterative joint channel decoding of correlated sources," IEEE Transactions on Wireless Communications, vol. 5, no. 10, pp. 2659-2663, October 2006.

[21] F. Daneshgaran, M. Laddomada, and M. Mondin, "LDPC-based channel coding of correlated sources with iterative joint decoding," IEEE Transactions on Communications, vol. 54, no. 4, pp. 577-582, April 2006.

[22] J. Barros and S. D. Servetto, "The sensor reachback problem," submitted to IEEE Transactions on Information Theory, available online at http://cn.ece.cornell.edu/publications/papers/20031112, November 2003.

[23] X. Zhu, L. Zhang, and Y. Liu, "A distributed joint source-channel coding scheme for multiple correlated sources," in Proc. of the Fourth International Conference on Digital Communications and Networking in China, pp. 1-6, 2009.

[24] W. Yunnan, V. Stankovic, Z. Xiong, and S. Kung, "On practical design for joint distributed source and network coding," IEEE Transactions on Information Theory, vol. 55, no. 4, pp. 1709-1720, April 2009.

[25] J. Hagenauer, E. Offer, and L. Papke, "Iterative decoding of binary block and convolutional codes," IEEE Trans. on Inform. Theory, vol. 42, no. 2, pp. 429-445, March 1996.

[26] T. J. Richardson and R. L. Urbanke, "Efficient encoding of low-density parity-check codes," IEEE Transactions on Information Theory, vol. 47, no. 2, pp. 638-656, Feb. 2001.

[27] G. Montorsi and S. Benedetto, "Design of fixed-point iterative decoders for concatenated codes with interleavers," IEEE Journal on Selected Areas in Communications, vol. 19, no. 5, pp. 871-882, May 2001.

[28] Xiao-Yu Hu, E. Eleftheriou, and D. M. Arnold, "Progressive edge-growth Tanner graphs," in Proc. of IEEE Global Telecommunications Conference, vol. 2, pp. 995-1001, 25-29 Nov. 2001.

[29] Xiao-Yu Hu, E. Eleftheriou, D.-M. Arnold, and A. Dholakia, "Efficient implementations of the sum-product algorithm for decoding LDPC codes," in Proc. of IEEE Global Telecommunications Conference, vol. 2, 25-29 Nov. 2001.


Scalable and Robust Wireless JPEG 2000 Images and Video Transmission with Adaptive Bandwidth Estimation

Max Agueh, Stefan Ataman, and Cristina Mairal

LACSC – ECE, 37 Quai de Grenelle, 75015 Paris, France

{agueh, ataman, mairal}@ece.fr

Henoc Soude
LIASD – Université Paris 8
2, Rue de la liberté, 93526 Saint Denis
[email protected]

Abstract— This paper presents a scalable and dynamic scheme for robust JPEG 2000 images/video transmission over wireless channels. The proposed system relies on an adaptive bandwidth estimation tool to select the suitable JPEG 2000 streams, layers, and resolution. A Wireless JPEG 2000 (JPWL) compliant Forward Error Correction (FEC) scheme is introduced. An optimal layer-oriented Unequal Error Protection FEC rate allocation scheme is proposed to ensure codestream protection against transmission errors. The main advantages of the proposed scheme are its optimality, its compliance with Wireless JPEG 2000 (the 11th part of the JPEG 2000 standard), and its low time consumption. We demonstrate that our proposed scheme outperforms the layer-oriented FEC scheme proposed by Guo et al. and other existing layer-based FEC schemes. We also show that, due to its low run time, our layer-based scheme is a good candidate for highly time-constrained Motion JPEG 2000 video streaming applications; thus, in this sense, it overcomes the limitation of optimal packet-oriented FEC rate allocation schemes. We then validate the effectiveness of our proposed layer-oriented scheme with a Wireless Motion JPEG 2000 client/server application.

Keywords: layer-oriented FEC; unequal error protection; layer scalability; wireless JPEG 2000; video streaming

I. INTRODUCTION

In high error rate environments such as wireless channels, data protection is mandatory for efficient transmission of images and video. In this context, JPEG 2000, the newest image representation standard [1], proposes in its 11th part (Wireless JPEG 2000 – JPWL) [2] different techniques, such as data interleaving and FEC with Reed-Solomon (RS) codes, in order to enhance the protection of JPEG 2000 codestreams against transmission errors.

Since wireless channels' characteristics depend on the transmission environment, the packet loss rate in the system also changes dynamically. Thus, a priori FEC rate allocation schemes such as the one proposed in [3] are less efficient.

Moreover, in wireless multimedia systems such as the one considered in this paper (see Figure 1), a straightforward FEC methodology is often used, applying FEC uniformly over the entire stream (Equal Error Protection - EEP). However, in [4], Guo et al. suggest that for hierarchical codes, such as JPEG 2000, Unequal Error Protection (UEP), which assigns different FEC to different portions of the codestream, is a more suitable protection scheme.

Figure 1. Wireless video streaming system

Two families of data protection schemes address this issue by taking the wireless channel characteristics into account in order to dynamically assign the FEC rate for JPEG 2000 based images/video. The first family is based on a dynamic layer-oriented unequal error protection methodology, whereas the second relies on a dynamic packet-oriented unequal error protection methodology. Hence, in the first case, powerful RS codes are assigned to the most important layers and less robust codes are used for the protection of less important layers. It is worth noting that in this case, all the JPEG 2000 packets belonging to the same layer are protected with the same selected RS code. Examples of layer-oriented FEC rate allocation schemes are available in [4] and [5]. On the other side, in packet-oriented FEC rate allocation schemes such as the one presented in [6], RS codes are assigned in descending order of packet importance. In [6], we demonstrate that the proposed optimal packet-oriented FEC rate allocation is more efficient than the layer-oriented FEC rate allocation schemes presented in [4] and [5]. However, layer-based FEC rate allocation schemes have low complexity, while packet-oriented FEC allocation methodologies are complex, especially when the number of packets in the codestream is high. In this case, packet-oriented FEC schemes are impractical for highly time-constrained images/video streaming applications. Therefore, switching to a layer-oriented FEC rate allocation scheme is more interesting. The smart FEC rate allocation scheme proposed in [7] addresses this issue by allowing switching from a packet-oriented FEC scheme to a layer-oriented scheme such as the ones proposed in [4] and [5]. However, to our knowledge, existing layer-oriented FEC rate allocation schemes are based on heuristics and thus are suboptimal, because they rely on parameters which are empirically fixed. For example, the layer-oriented FEC scheme proposed in [4]

relies on the permissible error rate, whereas the layer-oriented FEC scheme proposed in [5] relies on a Quality of Service (QoS) metric.



In this paper, based on the optimal packet-oriented FEC methodology presented in [6], we propose an optimal layer-oriented FEC rate allocation scheme for robust JPEG 2000 codestream transmission over wireless channels.

The paper is organized as follows. In Section II, an overview of the video streaming system and a presentation of layer-based FEC rate allocation schemes are provided. In Sections III and IV, we describe the optimal layer-oriented FEC scheme and the available bandwidth estimation scheme. We show the performance achieved by our proposed layer-based FEC scheme in Section V. Finally, some conclusions are drawn in Section VI.

II. OVERVIEW OF THE STREAMING SYSTEM AND OF THE JPEG 2000 BASED FEC RATE ALLOCATION SCHEMES

This section is dedicated to the description of the considered wireless JPEG 2000 video streaming system and to the presentation of existing FEC rate allocation schemes.

A. The Wireless JPEG 2000 video streaming system and the wireless channel

Unlike the system described in [6], where the FEC rate allocation scheme is packet-oriented, in the current paper we consider a layer-oriented FEC rate allocation scheme. In other words, the difference between both systems is the FEC rate allocation module. In the packet-oriented scheme, the redundancy is added by taking the packet importance into account (see Figure 2), while in our layer-oriented scheme, we rely on layer importance in order to allocate the adequate RS codes (see Figure 3).

Figure 2. A JPEG 2000 codestream transmission through the JPWL packet-oriented FEC rate system

Figure 3. A JPEG 2000 codestream transmission through the JPWL layer-oriented FEC rate system

The wireless channel is emulated by real wireless channel traces available in [13]. At the encoder side, the Gilbert model [14] is used to derive application-level models of error occurrences in the considered traces. A detailed description of the platform used to generate the loss patterns, along with an analysis of wireless channel modeling with the Gilbert model, is provided in [6].
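The Gilbert model referenced here is a two-state Markov chain (Good/Bad) in which losses cluster in the Bad state, capturing the correlated error bursts typical of wireless channels. As a rough illustration only (the function and parameter names are ours, not those of the platform in [6]), a loss pattern can be generated as follows:

```python
import random

def gilbert_loss_pattern(n, p_gb, p_bg, p_loss_bad=1.0, seed=0):
    """Generate an n-packet loss pattern (1 = lost) with a two-state
    Gilbert model. p_gb: Good -> Bad transition probability;
    p_bg: Bad -> Good. In the Bad state a packet is lost with
    probability p_loss_bad (classic Gilbert: losses only in Bad)."""
    rng = random.Random(seed)
    state_bad = False
    pattern = []
    for _ in range(n):
        # state transition, then emission
        if state_bad:
            if rng.random() < p_bg:
                state_bad = False
        elif rng.random() < p_gb:
            state_bad = True
        pattern.append(1 if (state_bad and rng.random() < p_loss_bad) else 0)
    return pattern
```

The long-run loss rate approaches p_gb / (p_gb + p_bg) · p_loss_bad, while the mean burst length is 1 / p_bg, which is what makes the model suitable for bursty wireless traces.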

B. Overview of JPEG 2000 based FEC rate allocation schemes

In this section, we present an overview of JPEG 2000 based FEC schemes.

1) Packet-oriented FEC rate allocation scheme

In this FEC rate allocation scheme, the correct decoding of packet i at the receiver yields a reduction of the distortion on the transmitted image. Since RD_i is considered as the reduction of distortion achieved when packet i is protected at a given level, the gain is the ratio between the image quality improvement RD_i and the associated cost in terms of bandwidth consumption.

For each JPEG 2000 image, the optimal packet-oriented algorithm models the channel with a Gilbert model and, for each possible protection level, evaluates the probability of incorrect word decoding. Then, for each packet i, based on the estimated probability of decoding a word, the reduction of distortion associated with the decoding of packet i, protected from the current level up to the maximum level, is estimated. The gain yielded by the associated increment of quality is also computed. After ordering those gain values in decreasing order of importance, the optimal protection rate for each packet is selected so as to meet the rate constraint.

2) Layer-oriented FEC rate allocation scheme

In layer-oriented FEC rate allocation schemes ([4] and [5]), unequal error protection is applied by taking the importance of each layer into account. These families of FEC rate allocation schemes can be divided into two groups: non JPWL compliant schemes and JPWL compliant schemes.

a) Non JPWL compliant layer-oriented FEC rate allocation scheme

In non JPWL compliant layer-oriented FEC rate allocation schemes, such as the one presented in [4], the slope of the Rate-Distortion curve is used to select a code for each quality layer. The slope S_j corresponding to layer j is expressed as S_j = ΔD_j / R_j, where S_j represents the contribution of layer j to the improvement of image quality, R_j is its length (in bytes) and ΔD_j corresponds to the distortion decrement measured by MSE (Mean Square Error).
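As a sketch of the slope computation above, assuming hypothetical layer lengths and MSE decrements (the RS code table itself is not reproduced here):

```python
def layer_slopes(layers):
    """Compute S_j = dD_j / R_j for each quality layer.

    layers: list of (R_j, dD_j) tuples, where R_j is the layer length
    in bytes and dD_j the MSE distortion decrement it contributes.
    Layers are then protected in decreasing slope order, i.e., the
    strongest RS codes go to the layers with the largest slopes.
    """
    return [dD / R for (R, dD) in layers]

# Hypothetical three-layer codestream, base layer first.
slopes = layer_slopes([(4000, 120.0), (8000, 60.0), (16000, 15.0)])
ranking = sorted(range(len(slopes)), key=lambda j: -slopes[j])
# ranking → [0, 1, 2]: the base layer receives the most powerful code.
```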


50

International Journal on Advances in Telecommunications, vol 3 no 1 & 2, year 2010, http://www.iariajournals.org/telecommunications/

2010, © Copyright by authors, Published under agreement with IARIA - www.iaria.org


Hence, powerful non JPWL compliant RS codes are assigned to the most important layers, such as the base layer, and the other layers are protected in decreasing order of importance. However, this algorithm is not JPWL compliant and was designed under the assumption that the channel is a memoryless Binary Symmetric Channel (uncorrelated error occurrences), which is not realistic because wireless channels exhibit correlated error sequences. Moreover, the algorithm proposed by Z. Guo et al. was not adaptive. In this work, we overcome this limitation by adapting the Z. Guo algorithm for a dynamic selection of RS codes thanks to a dynamic Bit Error Rate (BER) estimation scheme.

b) JPWL compliant dynamic layer-oriented FEC rate allocation scheme

JPWL compliant FEC rate allocation schemes, such as the one presented in [5], are based on the assumption that the transmitted JPEG 2000 image quality is linked to the amount of correctly decoded packets at the receiver. Hence, the goal of this scheme is to maximize the overall throughput of the system under a fixed Quality of Service (QoS) parameter. The dynamic layer-oriented FEC rate allocation scheme improves the performance by about 10% compared to an a priori selection of channel coding.

The layer oriented FEC rate allocation schemes presented in this section are characterized by heuristics whose parameters are set in order to achieve a desired Quality of Service. Hence, a high QoS level means more data protection and thus more bandwidth consumption. Since wireless client/server systems are bandwidth constrained, the desired high QoS level may not be achieved because of the risk of exceeding the available bandwidth. In this sense, traditional layer based FEC rate schemes may be viewed as suboptimal, as they achieve a less effective codestream protection. The proposed optimal layer oriented FEC rate allocation scheme overcomes these limitations by selecting the best RS codes, thereby yielding the maximum achievable image/video quality given the available resources in the system.

III. OPTIMAL LAYER ORIENTED FORWARD ERROR CORRECTION RATE ALLOCATION

A. Layer based FEC rate allocation problem formalization

Considering that JPEG 2000 codestreams are constituted of a set of L layers, the optimal FEC allocation problem can be summarized by the question: how to optimally protect each layer in order to minimize the transmitted image distortion under a rate constraint determined by the available bandwidth in the system?

Let B_av be the budget constraint in bytes corresponding to the available bandwidth in the system. Let lay_i be the length in bytes of the i-th of the L layers available in the codestream and RS(n, k) the Reed-Solomon code used for its protection. The corresponding protection level is ℓ and the FEC coding rate is R = k/n. We define fec_ℓ = n/k as the inverse of the channel coding rate, so lay_i · fec_ℓ represents, in bytes, the length of the i-th layer once it is protected at level ℓ.

Unlike the packet oriented FEC scheme, where all 16 default RS codes defined by the JPWL standard are considered in the FEC rate allocation process, here we restrict the considered RS codes to those with fec_ℓ ≤ 2. In other words, we consider only the first 10 default codes. This assumption makes sense in the context of layer oriented FEC rate allocation, because adding redundant data amounting to more than twice the original layer may overload the network and drastically increase the losses instead of reducing them.

Let ℓ be a layer protection level selected in the range 0 ≤ ℓ ≤ ℓ_max^lay, each protection level corresponding to a specific RS code selected among the 10 JPWL default RS codes (ℓ = 0 means that the layer is not transmitted, ℓ = 1 means transmission with protection level 1, higher values imply increasing channel code capacity with ℓ, and ℓ_max^lay = 10).

Each quality layer i of a JPEG 2000 codestream is constituted of a number of data packets. Let RD_{lay_i}^0 and RD_{lay_i}^ℓ be respectively the reduction of distortion associated with the correct decoding of layer i and the reduction of distortion associated with the correct decoding of layer i protected at level ℓ. The reduction of distortion metric associated with the correct decoding of the packets of a JPEG 2000 codestream is extracted from a codestream index file; this metric is further presented in [6]. We rely on this codestream index file to derive RD_{lay_i}^0, and we apply the decoding error probability estimation process presented in [14] in order to derive RD_{lay_i}^ℓ. Hence, the layer oriented FEC rate allocation problem is formalised by:

Maximize   Σ_{i=1}^{L} RD_{lay_i}^{ℓ_i} / (lay_i · fec_{ℓ_i})    (3-1)

Subject to   Σ_{i=1}^{L} lay_i · fec_{ℓ_i} ≤ B_av    (3-2)

We addressed this problem in [18] by proposing an optimal layer oriented FEC rate allocation scheme. In the following we present the proposed algorithm.


B. Optimization

We define G_{lay_i}^ℓ as the gain in quality of the transmitted image obtained at the receiver side when layer i is decoded. We derive RD_{lay_i}^1 and RD_{lay_i}^ℓ, the reductions of distortion obtained when layer i is transmitted with protection level 1 and with protection level ℓ respectively. We have:

RD_{lay_i}^1 = (1 − P_{lay_i}^1) · RD_{lay_i}^0
RD_{lay_i}^ℓ = (1 − P_{lay_i}^ℓ) · RD_{lay_i}^0    (3-3)

where P_{lay_i}^1 and P_{lay_i}^ℓ are the decoding error probabilities obtained when layer i is protected at level 1 and at level ℓ respectively. The gain at the first protection level is:

G_{lay_i}^1 = RD_{lay_i}^1 / (lay_i · fec_1) = (1 − P_{lay_i}^1) · RD_{lay_i}^0 / (lay_i · fec_1)    (3-4)

Similarly, any transition between two consecutive protection levels (ℓ − 1 and ℓ) yields an improvement in terms of reduction of distortion but has a budget cost equal to (fec_ℓ − fec_{ℓ−1}) · lay_i; hence we have:

G_{lay_i}^ℓ = (RD_{lay_i}^ℓ − RD_{lay_i}^{ℓ−1}) / ((fec_ℓ − fec_{ℓ−1}) · lay_i)
            = (P_{lay_i}^{ℓ−1} − P_{lay_i}^ℓ) · RD_{lay_i}^0 / ((fec_ℓ − fec_{ℓ−1}) · lay_i)    (3-5)

For each layer, a set of gain values is computed, ordered in decreasing order of importance and stored in a vector. Then, the FEC rate associated with the first gain value of each vector is applied for the protection of the corresponding layer, without exceeding the available bandwidth. It is worth noting that all the packets belonging to the same layer are protected at the same FEC rate.

C. Contribution of the proposed optimal layer oriented FEC rate allocation scheme

Even if the gain metrics presented in the previous section seem close to the ones used in [6], they hold a fundamental difference, because they rely on the contribution of each layer to the reduction of distortion instead of just taking into account the contribution of a specific packet. Actually, during the source coding process, the incremental contributions from the set of image codeblocks are collected in quality layers. Because the rate-distortion compromises derived during the JPEG 2000 truncation process are the same for all the codeblocks, for any quality layer index i the contributions of quality layer 1 through quality layer i constitute a rate-distortion optimal representation of the entire image. Hence, at the layer level, the reduction of distortion values are strictly decreasing. In contrast, the selection of a specific JPEG 2000 packet does not guarantee that the contributions of packet 1 through the selected packet index are monotonically decreasing. In this case, as confirmed by A. Descampe et al. [15], some additional restrictions have to be added to the considered convex-hull in order to ensure rate-distortion and cost-distortion optimality. This justifies the necessity of a merging step in the packet oriented FEC scheme [6] (it ensures that the convex-hull is always convex). In the layer oriented FEC scheme this step is skipped, because the reduction of distortion curve is already monotonically decreasing, significantly reducing the complexity and thus the time consumption of the FEC rate allocation algorithm. Moreover, in the proposed optimal layer oriented FEC scheme, we only consider the first 10 RS codes instead of all the 16 default RS codes defined by the JPWL standard as is the case in [6]. This also considerably reduces the FEC scheme time consumption, as it leads to fewer gain value computations, which makes the proposed optimal layer FEC rate allocation scheme a good candidate for real time image/video streaming applications.

In addition, the number of layers available in the codestreams is another criterion which contributes to the reduction of the time consumption of our proposed FEC scheme. Actually, a JPEG 2000 image extracted from a Motion JPEG 2000 video sequence is defined by (L, R, C), where L is the number of quality layers of the considered image, R is its resolution level corresponding to the decomposition levels of the Discrete Wavelet Transform and C is the number of components. Assuming that the considered JPEG 2000 image is not spatially divided and thus is described by a unique tile, the number of data packets available in the considered JPEG 2000 codestreams is given by S = L × R × C. In this context, the complexity of packet oriented FEC schemes [6] scales with the S data packets while the complexity of the optimal layer based FEC scheme scales with the L layers available in the codestreams. In scalable JPEG 2000 images, since the number of layers is significantly lower than the number of data packets, the time consumption of our proposed layer oriented FEC scheme is significantly lower than that of the packet oriented scheme.

Algorithm:
For each JPEG 2000 image
- Model the channel with a Gilbert model and, for each possible protection level ℓ (0 ≤ ℓ ≤ 10), evaluate the probability of incorrect word decoding P_{lay_i}^ℓ
- For i = 1 to L (number of JPEG 2000 layers)
    For ℓ = 1 to 10
      Estimate RD_{lay_i}^ℓ = (1 − P_{lay_i}^ℓ) · RD_{lay_i}^0
      Compute G_{lay_i}^ℓ = (RD_{lay_i}^ℓ − RD_{lay_i}^{ℓ−1}) / ((fec_ℓ − fec_{ℓ−1}) · lay_i)
    End For
  End For
- Order gain values in decreasing order of importance
- Select each gain value, corresponding to a specific protection level, up to meeting the rate constraint
- Optimally protect JPEG 2000 layers with the corresponding RS codes
End For
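The greedy selection described above can be sketched in Python. The layer lengths, RD values, error probabilities and fec rates in the example are illustrative, and the incremental-gain bookkeeping uses a heap rather than the per-layer gain vectors of the paper; it is a sketch, not the authors' implementation.

```python
import heapq

def allocate_fec(layers, fec, P, B_av):
    """Greedy layer-oriented FEC rate allocation (sketch).

    layers: list of (lay_i, RD0_i): layer length in bytes and reduction
            of distortion RD^0 for its correct decoding.
    fec:    fec[l] = n/k for protection level l >= 1 (fec[0] is a
            placeholder); levels are ordered so fec is increasing.
    P:      P[i][l] = decoding error probability of layer i at level l.
    B_av:   bandwidth budget in bytes.
    Returns the selected protection level for each layer (0 = dropped).
    """
    def gain(i, l):
        lay, RD0 = layers[i]
        RD_l = (1 - P[i][l]) * RD0                        # eq. (3-3)
        RD_prev = (1 - P[i][l - 1]) * RD0 if l > 1 else 0.0
        cost = (fec[l] - fec[l - 1]) * lay if l > 1 else fec[1] * lay
        return (RD_l - RD_prev) / cost, cost              # eqs. (3-4)/(3-5)

    level, spent, heap = [0] * len(layers), 0.0, []
    for i in range(len(layers)):
        g, c = gain(i, 1)
        heapq.heappush(heap, (-g, i, 1, c))
    while heap:
        neg_g, i, l, c = heapq.heappop(heap)
        if spent + c > B_av:
            continue                       # this increment does not fit
        level[i], spent = l, spent + c
        if l + 1 < len(fec):               # queue the next protection level
            g, c = gain(i, l + 1)
            heapq.heappush(heap, (-g, i, l + 1, c))
    return level

# Two layers, three protection levels with illustrative RS(n,32)-style rates.
fec = [1.0, 37 / 32, 40 / 32, 43 / 32]
P = [[1.0, 0.2, 0.05, 0.01],
     [1.0, 0.3, 0.10, 0.02]]
layers = [(1000, 100.0), (2000, 40.0)]
best = allocate_fec(layers, fec, P, 10000)   # ample budget: both layers at level 3
```

With a tight budget the heap pops the cheap, high-gain increments of the first (most important) layer before the bulky second layer is considered, which mirrors the "select gains in decreasing order up to the rate constraint" rule.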


IV. BANDWIDTH ESTIMATION AND IMAGES/VIDEO SCALABILITY SCHEME

A. Available Bandwidth estimation

In the literature many authors investigate classifications of bandwidth estimation tools. In [7], R. Prasad et al. define four types of bandwidth estimation tools. The first one, the Variable Packet Size Technique (VPST), measures the capacity of individual hops. It uses ICMP (Internet Control Message Protocol) packets to measure the RTT (Round-Trip Time) and assumes that the minimum RTT implies no queuing delays; hence queuing delays are not taken into account to estimate capacity. The result is a straight line whose slope equals 1/C.

The second bandwidth estimation tool presented in [7] is the Probing Packet Pair Dispersion Technique (PPPDT), which measures the end-to-end capacity using a packet pair with the same length and rate. The capacity is found using the formula:

C = L / Δ_out

where L is the probe packet length and Δ_out is the dispersion of the packet pair at the receiver.

The third bandwidth estimation tool is the Self-Loading Periodic Streams Technique (SLoPST), which measures the available bandwidth by sending packets with the same length but at different rates and calculating their one-way delay. The moment the delay starts to increase, the available bandwidth has been exceeded.

The last bandwidth estimation tool is the Probing Packet Pair Trains Dispersion Technique (PPPTDT), which measures both the available bandwidth and the capacity of an end-to-end path. It sends a train of packet pairs at gradually increasing packet rate and estimates its average dispersion rate at the receiver. The result is a curve whose slope is 1/C and whose inflexion point is the available bandwidth value.
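The packet-pair capacity formula C = L / Δ_out can be sketched as follows; the probe size and dispersion samples are illustrative, and the median is used as the robust statistic (as in WBest's first step, described below) against cross-traffic expanding or compressing individual pairs.

```python
def pair_capacity(packet_len_bits, dispersions):
    """Packet-pair capacity estimate: C = L / delta_out.

    packet_len_bits: probe packet size in bits.
    dispersions: per-pair dispersions measured at the receiver, in
    seconds. The median dispersion is used instead of the mean to
    resist outliers caused by cross-traffic.
    Returns the capacity estimate in bits per second.
    """
    d = sorted(dispersions)
    mid = len(d) // 2
    median = d[mid] if len(d) % 2 else (d[mid - 1] + d[mid]) / 2
    return packet_len_bits / median

# 1500-byte probes with dispersions around 1 ms -> C of about 12 Mbit/s.
C = pair_capacity(1500 * 8, [0.0011, 0.0010, 0.0009, 0.0010, 0.0012])
```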

In our work, we focus on a fast bandwidth estimation technique because it is more effective at tracking fast varying wireless channels like the ones considered in our scenarios.

B. Available Bandwidth estimation tool

The available bandwidth estimation tool implemented in our system is called WBest (Bandwidth Estimation Tool for Wireless Networks) and is presented in [8] by M. Li and C. Chang. It is a two-step, fast converging and accurate bandwidth estimation tool. It uses the effective capacity (C_e) to estimate the available bandwidth and also takes the impact of cross-traffic into account. First, the effective capacity is measured using an improved Probing Packet Pair Technique: the median of the dispersion of the train of packet pairs sent is calculated. Then, another packet train is sent at the rate of the effective capacity in order to evaluate the real available bandwidth. This procedure avoids the additional delay yielded by the incremental packet rates generated while seeking the rate which congests the path.

In this context, the available bandwidth (AB) is derived as follows:

AB = C_e · (2 − C_e / R_out)   if there is no congestion
AB = 0                         if congestion is detected

where R_out is the dispersion rate of the packet train measured at the receiver. It is worth noting that, during the second step, congestion is detected by analyzing the dispersion of the packet train: if the dispersion rate is lower than C_e / 2, congestion has occurred. In this case, the bandwidth estimation process is cancelled because packets are queuing.

The authors of this work pointed out that finding the optimal lengths of the trains used in both stages is a difficult issue, and in [8] they propose a methodology to address it. Relying on the methodology proposed in [8], we empirically derive that, in our scenarios, 6 packet pairs for the first train and 30 packets for the second train is a good tradeoff which yields sufficiently accurate bandwidth estimation results.
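The two cases of the estimate above can be sketched as a WBest-style computation (not the authors' code; R_out denotes the measured dispersion rate of the second train):

```python
def wbest_available_bandwidth(Ce, R_out):
    """Second step of a WBest-style available-bandwidth estimate.

    Ce: effective capacity from the packet-pair step (bit/s).
    R_out: dispersion rate of the probe train at the receiver after
    sending it at rate Ce (bit/s).
    A dispersion rate below Ce/2 signals congestion: the estimate is
    discarded (AB = 0) because packets are queuing.
    """
    if R_out < Ce / 2:
        return 0.0
    return Ce * (2 - Ce / R_out)

# Illustrative values around a 12 Mbit/s effective capacity.
no_cross_traffic = wbest_available_bandwidth(12e6, 12e6)    # full capacity
with_cross_traffic = wbest_available_bandwidth(12e6, 10e6)  # reduced AB
congested = wbest_available_bandwidth(12e6, 5e6)            # estimate discarded
```

When the train comes back undisturbed (R_out = C_e) the whole effective capacity is available; the more cross-traffic stretches the train, the smaller the estimate, until the C_e/2 threshold cancels it.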

Once the available bandwidth is estimated in our system, the following step consists in adapting the image and video streams to the channel conditions.

In the present work we implement an adaptive bandwidth estimation tool and propose an additional scalability tool at the encoder which dynamically and efficiently selects the best resolution and layer for each JPEG 2000 frame before transmission through the wireless channel. Hence, according to the estimated bandwidth, refinement layers can be added to or removed from the JPEG 2000 codestreams. We present in the following the processes which are implemented at the encoder.

Algorithm: Once connected, the server starts the WBest [8] process in order to obtain the initial available bandwidth. At this step the goal is to send images and video with maximum detail (highest resolution and all refinement layers) matching the estimated bandwidth. The original resolution and number of layers of the considered video are found using an indexer like the one available in [9].

In our work, the default number of resolutions is 6, the length and the width of the image must be a power of 2 (here 352x288) and the number of layers is 3.

Let l be a layer of a JPEG 2000 image and SE_rate^l its corresponding source encoding rate. Let fec_rate^l = n/k be the inverse rate of the Reed-Solomon code RS(n, k) selected by the layer oriented FEC rate allocation scheme to protect layer l against transmission errors. Let frame_length be the amount of data needed to transmit layer l once protected; we have:

frame_length = W × H × SE_rate^l × fec_rate^l    (4-1)

where W and H are the width and height of the image.


Since quality is the most important parameter for a JPEG 2000 transmission system, in our algorithm, for a given resolution we try to transmit a maximum number of refinement layers.

The proposed scheme is able to adapt to channel variations thanks to the bandwidth estimation tool. Hence, when the channel experiences good conditions, our heuristic algorithm selects the highest resolution with the highest quality (all the refinement layers are transmitted). If the channel experiences harsh conditions, image layers and resolution are decreased down to defined thresholds. We empirically derive the thresholds l_max/2 for layer removal and resol_max/l_max for resolution reduction, because we notice that image quality starts to show a huge degradation when we remove more than half of the original image layers, and image visualization becomes impossible below this resolution reduction threshold. Hence, when the channel experiences bad conditions, image layers are incrementally reduced while maintaining the original resolution of the JPEG 2000 frame at the highest level. However, if the corresponding frame length does not match the available bandwidth, image resolution downscaling is done. It is worth noting that our fixed thresholds are valid for our scenarios and may change in different environments.

Once the resolution and the number of layers are chosen, the server sends the video stream to the client.

The available bandwidth estimation tool is launched every 10 frames, but this frequency could be changed according to the application requirements. An interesting extension to this work could be to optimally adapt the frequency of the bandwidth estimation tool to the channel conditions.

The efficiency of the proposed heuristic algorithm is demonstrated using a wireless client/server video streaming application. In the following section, we present the results derived from different video streaming scenarios.

C. Results of Available Bandwidth estimation

The video streaming scenarios considered in this work are derived from wireless transmission trials used in the literature for bandwidth estimation purposes.

The video sequence is speedway.mj2 [16], a 352x288 Motion JPEG 2000 sequence constituted of 200 JPEG 2000 frames with six resolutions and three layers each.

1) Scenario 1

In the first scenario, the wireless channel considered is derived from the BART tool [10], which estimates the available bandwidth on an end-to-end path where the bottleneck is a wireless hop. It uses the Probing Packet Pair Trains Dispersion Technique and improves the estimation using Kalman filters to measure and track the changes.

In this scenario, we focus on the fast varying part of the estimated bandwidth. Moreover, we divide the bandwidth estimated by the BART tool [10] by a parameter δ=2 in order to show that our scheme is efficient even when the wireless channel experiences harsh conditions.

________________________________________________
Heuristic
________________________________________________
1- Wait for client connection request
2- Estimate available bandwidth (B_av) using the WBest tool
3- Derive the original images/video maximum number of layers (l_max) and maximum resolution (resol_max) from the indexer
For each JPEG 2000 image:
4- Initialize the current layer count (cur_lay = l_max) and the current resolution (cur_resol = resol_max)
5- Calculate the needed bandwidth B_needed:
      B_needed = cur_resol × Σ_{i=1}^{cur_lay} SE_rate^i × fec_rate^i
6- While (cur_lay ≥ l_max / 2)
   {
     If (B_av ≥ B_needed) Send image
     Else
     {
       cur_lay = cur_lay − 1
       Recalculate B_needed
     }
   }
   Else
   {
     While (cur_resol ≥ resol_max / l_max)
     {
       cur_resol = cur_resol / 2
       Recalculate B_needed
       If (B_av ≥ B_needed) Send image
       Else
       {
         cur_resol = cur_resol / 2
         cur_lay = cur_lay − 1
       }
     }
   }
________________________________________________
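A Python sketch of the heuristic, under illustrative layer sizes and FEC rates: halving the resolution is approximated here as a 4x reduction in source bytes (one fewer DWT decomposition level), and the l_max/2 and resol_max/l_max thresholds follow the text; it is an interpretation of the pseudocode, not the authors' implementation.

```python
def adapt_frame(B_av, l_max, resol_max, layer_bytes, fec_rate):
    """Drop refinement layers first, then downscale resolution, until
    the protected frame fits the estimated available bandwidth B_av.

    layer_bytes[i]: source bytes of layer i at full resolution
    (illustrative); fec_rate[i] = n/k expansion chosen by the layer
    oriented FEC allocator for layer i.
    Returns (layers kept, resolution level) for the frame to send.
    """
    cur_lay, cur_resol, scale = l_max, resol_max, 1.0

    def needed():
        # Bandwidth needed for the first cur_lay protected layers.
        return scale * sum(layer_bytes[i] * fec_rate[i] for i in range(cur_lay))

    while needed() > B_av:
        if cur_lay - 1 >= l_max / 2:              # layer-removal threshold
            cur_lay -= 1
        elif cur_resol - 1 >= resol_max / l_max:  # resolution threshold
            cur_resol -= 1
            scale /= 4.0                          # one fewer DWT level
        else:
            break                                 # thresholds reached; best effort
    return cur_lay, cur_resol

# Hypothetical 3-layer frame at 6 resolution levels.
layer_bytes = [30000, 20000, 10000]
fec_rate = [1.25, 1.16, 1.16]
full = adapt_frame(100000, 3, 6, layer_bytes, fec_rate)   # ample bandwidth
tight = adapt_frame(60000, 3, 6, layer_bytes, fec_rate)   # layer drop, then downscale
```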


Figure 4.a Available bandwidth versus time – BART Tool scenario

Figure 4.b Images processed at points (1) and (2) - BART Tool scenario

Scenario 1 | Image length | Image width | Layers of transmitted images
(1)        | 352          | 288         | 3
(2)        | 352          | 288         | 2

Table 1: Scalability parameters - BART Tool scenario

In Figure 4.a, point (1) indicates that the estimated bandwidth is higher than the needed bandwidth; hence the initial JPEG 2000 frames are transmitted. Point (2) shows that the estimated bandwidth has decreased and become insufficient to send the original images. Hence, the algorithm maintains the resolution at the highest level (initial value) but one layer is removed from the original frames, as shown in Table 1.

In Figure 4.b we randomly select and present images speedway10.j2k and speedway25.j2k, which have been processed at points (1) and (2) respectively.

2) Scenario 2

The second scenario is derived from the work of Gupta et al. [11]. In their work, they compare a passive technique, which uses packets containing useful data as probing packets, with the pathchirp tool, a general bandwidth estimation tool designed using Self-Loading Periodic Streams (SLoPS). They demonstrate that the passive tool follows the changes of the available bandwidth more accurately. Moreover, they show that saturating the network to estimate the bandwidth is not the best choice.

In this scenario, the network is a single link between two computers with two additional cross-traffic flows. The estimated bandwidth has been divided by δ=7 in our tests.

In Figure 5.a we observe three points corresponding to different available bandwidths along with the adapted data rate. Point (1) indicates that the available bandwidth is significantly low compared to the needed bandwidth; in this case the resolution is downscaled without removing a quality layer. Then, point (2) shows that when the available bandwidth increases, the resolution is up-scaled while removing a quality layer. Finally, the original frames are transmitted at point (3).

Figure 5.a Available bandwidth versus time – Pathchirp tool scenario

Figure 5.b Images processed at points (1), (2) and (3) - Pathchirp tool scenario



Pathchirp tool scenario | Image length | Image width | Layers of transmitted images
(1)                     | 176          | 144         | 0
(2)                     | 352          | 288         | 1
(3)                     | 352          | 288         | 0

Table 2: Scalability parameters - Pathchirp tool scenario

In the following, instead of using the pathchirp tool for bandwidth estimation, we use a passive tool, which is more accurate [11]. Results are presented in Figure 6.a. For the same scenario, we notice that one additional bandwidth state is detected at point (4), which demonstrates that the passive tool is more accurate than the pathchirp tool.

Figure 6.a Available bandwidth versus time – Passive tool scenario

Figure 6.b Images processed at points (1), (2), (3) and (4) - Passive tool scenario

In comparison to the pathchirp tool, using the passive tool yields image quality layer downscaling at point (4) instead of maintaining the transmission of unmodified original images (see Table 3). Hence, using the pathchirp tool may lead to network overloading.

Passive tool scenario | Image length | Image width | Layers of transmitted images
(1)                   | 176          | 144         | 1
(2)                   | 352          | 288         | 1
(3)                   | 352          | 288         | 0
(4)                   | 352          | 288         | 1

Table 3: Scalability parameters - Passive tool scenario

Figure 6.b presents speedway5.j2k, speedway10.j2k, speedway14.j2k and speedway20.j2k, which have been processed at points (1), (2), (3) and (4) respectively.

V. PERFORMANCES OF THE OPTIMAL LAYER ORIENTED FEC RATE ALLOCATION SCHEME

A. Performance of the layer based FEC scheme in terms of time consumption

In Figure 7 the run time of the proposed layer based FEC rate allocation scheme is plotted versus the number of data packets available in the JPEG 2000 codestreams. This curve is compared to the one achieved using the optimal packet oriented FEC rate allocation scheme [6]. These results are achieved using an Intel Core Duo CPU 2.9 GHz workstation.

As packet oriented and layer oriented schemes are linked by the number of layers available in each image, we vary this parameter in order to derive comparable results. In the considered scenario, the number of resolutions and components of the JPEG 2000 frames are fixed (resolution = 10 and component = 1) because these parameters do not impact the time performance of layer oriented FEC rate allocation schemes. In Figure 7, each packet count (i) corresponds to a specific JPEG 2000 frame (with a specific number of quality layers).

In this scenario, the available bandwidth in the system is set to 18 Mbit/s (B_av = 18 Mbit/s). It is worth noting that in practice few existing JPEG 2000 codecs allow high quality scalability and, to our knowledge, none of them can handle more than 50 quality layers. Hence, the considered scenario allows generalization to future high quality layer scalable FEC rate allocation systems.

Speedway5.j2k

Speedway10.j2k

Speedway14.j2k

Speedway20.j2k

(2)

(4)(1)

(3)

Avai

lable

Ban

dw

idth

(Mbps)


Figure 7. Time versus packets: Fixed image resolution (R=10) – Varying quality layers (0 to 100) – One component (C=1)

In Figure 7 we notice that both the layer and the packet oriented schemes have a run time increasing linearly with the number of packets available in the codestreams. However, the optimal layer based FEC scheme is significantly less time consuming than the packet based FEC scheme. For codestreams containing fewer than 1000 packets (quality layers ≤ 10), the packet oriented FEC scheme is 3 times more time consuming than our optimal layer based FEC scheme. For JPEG 2000 codestreams whose number of packets is between 1000 and 5000 (quality layers between 10 and 50), the packet oriented scheme takes up to 5 times the run time of the layer based FEC scheme. Since existing JPEG 2000 codecs handle fewer than 50 quality layers, our proposed optimal layer based scheme is a good candidate for real-time JPEG 2000 codestream transmission over wireless channels, as it yields low time consumption. For codestreams with a number of packets between 5000 and 10000 (quality layers between 50 and 100 – high layer scalability), the packet oriented scheme takes 6 times the run time of the layer based FEC scheme. Hence, our proposed layer based scheme, due to its low time consumption, could be viewed as a good candidate for future high quality layer scalable wireless JPEG 2000 based image and video streaming applications. Although our proposed scheme achieves good performance in terms of time consumption in comparison to packet oriented FEC rate allocation schemes, the latter present better performance in terms of visualization quality, especially for highly noisy channels. It is worth noting that the advantages of packet oriented and layer oriented FEC schemes could be combined in a smart switching FEC rate allocation scheme such as the one proposed in [12].

In the following section we demonstrate the effectiveness of our proposed layer based FEC scheme thanks to a client/server application streaming Motion JPEG 2000 video over real ad-hoc network traces.

B. Layer oriented FEC rate allocation for Motion JPEG 2000 video streaming over real ad-hoc network traces

In this section we present the results achieved while streaming Motion JPEG 2000 video over real ad-hoc network channel traces [13], and we demonstrate that the proposed optimal layer based scheme outperforms existing layer oriented FEC schemes, even if for highly noisy channels it is less efficient than the packet oriented FEC scheme. The comparison is handled both in terms of Structural Similarity (SSIM) [17] and in terms of successful decoding rate. We derive the Mean SSIM metric of the Motion JPEG 2000 video sequence by averaging the SSIM metrics of the JPEG 2000 images contained in the considered video sequence. It is worth noting that each SSIM measure derived is associated with a successful decoding rate metric, which corresponds to decoder crash avoidance on the basis of 1000 transmission trials.

The considered wireless channel traces are available in [13] and the video sequence used is speedway.mj2 [16], containing 200 JPEG 2000 frames generated with an overall compression ratio of 20 for the base layer, 10 for the second layer and 5 for the third layer.

Figure 8 presents the successful decoding rate of the motion JPEG 2000 video sequence speedway.mj2 [16] transmitted over real ad-hoc network channel traces [13]. We observe that for highly noisy channels (C/N < 15 dB), the proposed optimal layer scheme outperforms the other layer based FEC schemes but is less efficient than the packet oriented scheme. For noisy channels (15 dB ≤ C/N < 18 dB), we notice that all layer based UEP schemes exhibit similar performances in terms of successful decoding rate. However, for low noise channels (C/N ≥ 18 dB), all the FEC schemes yield the same improvement in terms of successful decoding rate.

Figure 8. Successful decoding rate versus Carrier to Noise Ratio

57

International Journal on Advances in Telecommunications, vol 3 no 1 & 2, year 2010, http://www.iariajournals.org/telecommunications/

2010, © Copyright by authors, Published under agreement with IARIA - www.iaria.org


In Figure 9 we show that our proposed optimal layer based FEC rate allocation scheme still outperforms other layer based schemes in terms of Mean SSIM. This is due to the fact that the base layer, which is the most important part of the codestream, is highly protected in our proposed scheme in comparison to other layer based schemes, guaranteeing this way a good quality for the visualization.

Figure 9. Mean Structural Similarity versus Carrier to Noise Ratio

It is worth noting that, for highly noisy channels, our optimal layer oriented FEC scheme is less efficient than the optimal packet oriented FEC scheme presented in [6]. However, the latter is impractical for real time streaming applications when the number of packets in the codestream is high. In contrast, our proposed layer oriented scheme efficiently overcomes this limitation. In this context, instead of being used to replace packet oriented FEC rate allocation schemes, our proposed optimal layer based FEC scheme should be used to complement them. Thus, an interesting extension to this work could be to unify both families (packet oriented and layer oriented) and derive an optimal combined packet/layer oriented FEC rate allocation scheme for robust transmission of JPEG 2000 images/video over wireless channels.

VI. CONCLUSION

In this paper we presented an optimal layer oriented FEC rate allocation scheme for robust JPEG 2000 images/video streaming over wireless channels. Compared to packet oriented FEC schemes, our proposed layer based scheme is significantly less time consuming while offering similar performances in terms of Structural Similarity for low noisy channels. However, for highly noisy channels, the packet oriented scheme is more efficient than the proposed scheme, which leads us to the conclusion that packet oriented and layer oriented schemes should not be viewed as antagonistic FEC schemes, but should be combined in a new framework of combined packet/layer based FEC rate allocation. In this context, our proposed optimal layer oriented scheme is a good candidate both for highly constrained video streaming applications and for future unified packet/layer based FEC rate allocation schemes.

REFERENCES

[1] D. S. Taubman and M. W. Marcellin, “JPEG 2000 Image Compression Fundamentals, Standards and Practice,” Kluwer Academic Publishers, The Netherlands, 2001

[2] JPEG 2000 part 11 Final Draft International Standard, ISO/IEC JTC 1/SC 29/WG 1 N3797

[3] M. Agueh, F. O. Devaux, and J. F. Diouris, “A Wireless Motion JPEG 2000 video streaming scheme with a priori channel coding,” 13th European Wireless 2007 (EW-2007), April 2007, Paris, France

[4] Z. Guo, Y. Nishikawa, R. Y. Omaki, T. Onoye, and I. Shirakawa, “A Low-Complexity FEC Assignment Scheme for Motion JPEG 2000 over Wireless Network,” IEEE Transactions on Consumer Electronics, Vol. 52, Issue 1, Pages: 81-86, Feb. 2006

[5] M. Agueh, J. F. Diouris, M. Diop, and F. O. Devaux, “Dynamic channel coding for efficient Motion JPEG 2000 streaming over MANET,” Proc. Mobimedia 2007 conf., Aug. 2007, Greece

[6] M. Agueh, J. F. Diouris, M. Diop, F. O. Devaux, C. De Vleeschouwer, and B. Macq, “Optimal JPWL Forward Error Correction rate allocation for robust JPEG 2000 images and video streaming over Mobile Ad-hoc Networks,” EURASIP Journal on Advances in Signal Processing, Special Issue on Wireless Video, Vol. 2008, ID 192984

[7] R. Prasad, C. Dovrolis, M. Murray, and kc claffy, “Bandwidth estimation: metrics, measurement techniques and tools,” IEEE Network, vol. 17, pp. 27-35, 2003

[8] M. Li, M. Claypool, and R. Kinicki, “WBest: a bandwidth estimation tool for IEEE 802.11 wireless networks,” Proc. of 33rd IEEE Conf. on Local Computer Networks, 2008

[9] OpenJPEG, website: www.openjpeg.org

[10] A. Johnsson and M. Björkman, “On measuring available bandwidth in wireless networks,” Proc. of 33rd IEEE Conf. on Local Computer Networks, 2008

[11] D. Gupta, D. Wu, P. Mohapatra, and C. Chuah, “Experimental comparison of bandwidth estimation tools for wireless mesh networks,” Proc. of IEEE Infocom mini-conference, April 2009

[12] M. Agueh, S. Ataman, and H. Soude, “A low time-consuming smart FEC rate allocation scheme for robust wireless JPEG 2000 images and video transmission,” 4th International Conf. on Communications and Networking, 2009, Xi’an (China)

[13] Loss patterns acquired during the WCAM Annecy 2004 measurement campaigns, IST-2003-507204, “Wireless Cameras and Audio-Visual Seamless Networking,” website: <http://wcam.epfl.ch/> 01-04-2009

[14] J. R. Yee and E. J. Weldon, “Evaluation of the performance of error-correcting codes on a Gilbert channel,” IEEE Transactions on Communications, vol. 43, no. 8, Pages: 2316-2323, 1995

[15] A. Descampe, C. De Vleeschouwer, C. Iregui, B. Macq, and F. Marques, “Pre-fetching and caching strategies for remote and interactive browsing of JPEG 2000 images,” IEEE Transactions on Image Processing, vol. 16, no. 5, Pages: 1339-1354, 2006

[16] Speedway video sequences have been generated by UCL: <http://euterpe.tele.ucl.ac.be/WCAM/public/Speedway%20Sequence> 01-04-2009

[17] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: From error visibility to structural similarity,” IEEE Trans. on Image Proc., vol. 13, no. 4, Pages: 600-612, Apr. 2004

[18] M. Agueh and H. Soude, “Optimal Layer-Based Unequal Error Protection for Robust JPEG 2000 Images and Video Transmission over Wireless Channels,” Proc. 2009 First International Conference on Advances in Multimedia (MMEDIA), pp. 104-109, 2009


Modelling of Mobile Workflows with UML

Michael Decker
Institute AIFB, University of Karlsruhe (TH)
Karlsruhe, Germany
Email: [email protected]

Abstract—Thanks to the advances in the field of mobile computing it is nowadays possible to realize workflow systems which provide support for mobile activities, i.e., activities which are usually performed in situations where no stationary computer is available. It is obvious that the integration of mobile computers into workflow systems has a great potential, especially for companies with a high portion of mobile workers. However, so far there is a lack of modelling techniques that allow one to express mobile-specific aspects of workflows. In the following article we therefore introduce an extension to activity diagrams from the Unified Modelling Language which allows one to model different types of so called location constraints. Such a location constraint is a statement that defines where a particular activity of a workflow has to be performed or is not allowed to be performed. The proposed technique in particular supports dynamic location constraints, i.e., constraints that are defined during the runtime of a workflow instance. Further, different classes of anomalies that may occur in diagrams according to the proposed extension are also discussed.

Keywords-Mobile Workflow Systems, Location-based Services, Business Processes, Activity Diagrams

I. INTRODUCTION

A workflow is a set of partially ordered activities that is performed with the support of a special computer system (a so called Workflow Management System) to reach a particular business goal [2]. This goal could be the fulfillment of an order to perform some kind of on-site repair (e.g., fixing a machine in a factory or in a private apartment). The individual activities required to reach this goal include receive customer’s call, dispatch service technician, perform on-site repair and post-processing (e.g., writing the bill, evaluation). There is also a partial order defined for these activities that states which activities have to be finished before a particular activity can be started or whether there are optional activities. Such a description of a workflow is also called a workflow schema. An individual invocation of that schema (e.g., order no. 123 by customer A., also termed a case) is a workflow instance of that schema. A workflow schema is a mobile workflow if there are instances that have activities that are typically performed using mobile devices like Personal Digital Assistants (PDA), smartphones or notebooks. The workflow sketched above as an example is such a mobile workflow, because during on-site repair activities the concerned actors (mobile workers) do not have access to stationary desktop computers, so they have to use mobile computers to query technical data about the components to be repaired, to order replacement parts or to write on-site reports about which work had to be performed.
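The schema/instance distinction and the partial order of activities can be sketched as a toy model (the class names, the dictionary-based representation of prerequisites, and the example activities follow the repair scenario above; everything else is our own assumption, not an API of any workflow system):

```python
from dataclasses import dataclass, field

@dataclass
class WorkflowSchema:
    """A partially ordered set of activities: each activity maps to the
    set of activities that must be finished before it can start."""
    name: str
    prerequisites: dict

@dataclass
class WorkflowInstance:
    """One case created from a schema, e.g., 'order no. 123'."""
    schema: WorkflowSchema
    case_id: str
    finished: set = field(default_factory=set)

    def enabled(self):
        """Activities whose prerequisites are all finished and that
        have not been performed yet."""
        return {a for a, pre in self.schema.prerequisites.items()
                if pre <= self.finished and a not in self.finished}

repair = WorkflowSchema("on-site repair", {
    "receive customer's call": set(),
    "dispatch service technician": {"receive customer's call"},
    "perform on-site repair": {"dispatch service technician"},
    "post-processing": {"perform on-site repair"},
})
case = WorkflowInstance(repair, "order no. 123")
print(case.enabled())  # {"receive customer's call"}
case.finished.add("receive customer's call")
print(case.enabled())  # {"dispatch service technician"}
```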

It is unquestionable that the employment of mobile devices for the enactment of workflows provides a great potential [3][4]: even while staying in the field, mobile workers have access to the latest information from backend systems (e.g., technical documentation, customer data, state of orders, availability of items). Further, it is possible to avoid media disruptions, because mobile workers don’t have to print every piece of information they might need on-site and don’t have to write reports and gathered data (e.g., orders for spare parts, measurement readings) onto paper forms that have to be entered into a computer system later. This helps to avoid costs (for paper as well as working time needed for data gathering) and errors (e.g., typos, double entries, lost forms), and the time delay between printing and information consumption, or between writing down the information and entering it into the backend information system, is reduced.

The contribution of the article at hand is the introduction of a UML profile for activity diagrams that allows one to cover an important mobile-specific aspect, namely the constantly changing location of the mobile actors. To enable this, so called location constraints are introduced: such a constraint is a statement attached to an activity that says at which locations the respective activity has to be performed or is not allowed to be performed. Not only static constraints are possible, i.e., constraints that are defined at the design time of a workflow schema, but also dynamic constraints, which are derived automatically during the runtime of a workflow instance. To actually enforce location constraints it is necessary that the workflow system knows the mobile user’s current location. For this several locating technologies are available: the most prominent one is the satellite-based Global Positioning System (GPS), but there are many other technologies; see [5] or [6] for a good introduction and overview of this topic.

Location constraints for workflows are a variant of Location-aware Access Control (LAAC). Access control is concerned with the decision whether a user’s request to perform a particular operation on a resource of an information system has to be granted or denied. Examples for pairs of operation and resource are “read on report.doc” or “write on database table1”. When LAAC is employed the access decision is not only based on the user’s identity and properties of the


resource but also on the current location of the user as determined by a locating technology. The novel feature of our extension for activity diagrams from the perspective of LAAC is that it supports process-aware location constraints, i.e., it is possible to formulate location constraints based on the order of activities within a workflow. Further, it is even supported that activities of a workflow instance create location constraints for subsequent activities of the same workflow instance.

The work at hand is the extended version of a paper that was presented at the IARIA UbiComm conference 2009 [1]. In addition to the original paper we also introduce some shortcut notations (Section V-B) and a notation for derived constraints (Section V-C). Further, we discuss the problem of anomalies with location constraints (Section VIII) and sketch how our concept of location constraints can also be applied to UML usecase diagrams (Section IX).

The following sections are organized as follows: In Section II the basics necessary for the understanding of the paper are covered. The underlying location model for our modelling approach is introduced in Section III. Based on these preparations the concept of location constraints is introduced in Section IV. Section V is devoted to the visual representation of the different types of location constraints. A detailed example of an activity diagram annotated with several location constraints can be found in Section VI. In Section VII it is elicited how UML’s metamodel has to be extended to obtain a UML profile for location constraints. There are different types of anomalies that may occur if location constraints are used; this is the subject of Section VIII. Location constraints can also be used to annotate usecase diagrams, which is explained in Section IX. A short survey on related work can be found in Section X before we conclude with a summary and an outlook on future work in the final Section XI.

II. PRELIMINARIES

In this section we describe the basics from the domains of Location-Aware Access Control and the Unified Modelling Language that are required for the understanding of the discussion in this paper.

A. Location-Aware Access Control

When Location-Aware Access Control (LAAC) is applied, the user’s current position is considered by the component of the system that makes the access control decision [7]. For example, it could be enforced that a user is not allowed to access confidential data with his mobile computer when he stays at places that are deemed unsafe, e.g., public places, countries where espionage has to be feared or regions where no secured wireless data transmission is available. So LAAC is a means to tackle specific security challenges that come along with the employment of mobile computers. These challenges arise from the fact that due to their portability and size mobile computers often get lost or stolen. There is also the danger that someone looks over the shoulder of the legitimate user to gather data that is classified (“shoulder sniffing” or “shoulder surfing”). Further, wireless data transmission could be eavesdropped (passive attack) or even manipulated (active attack).
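As a minimal illustration of such a location-aware access decision: the identity check, the zone names, and the set of "unsafe" places below are all hypothetical assumptions, not part of the paper; the point is only that the decision combines identity with the location reported by a locating technology.

```python
from dataclasses import dataclass

# Hypothetical set of locations deemed unsafe for confidential data access.
UNSAFE_ZONES = {"public_place", "unsecured_wlan_region"}

@dataclass
class Request:
    user: str
    operation: str   # e.g., "read"
    resource: str    # e.g., "report.doc"
    location: str    # zone reported by the locating technology

def identity_allows(req: Request) -> bool:
    # Placeholder for a conventional identity-based access control check.
    return req.user == "alice" and req.operation == "read"

def laac_decision(req: Request) -> bool:
    """Grant only if the identity check passes AND the user's current
    location is not classified as unsafe."""
    return identity_allows(req) and req.location not in UNSAFE_ZONES

print(laac_decision(Request("alice", "read", "report.doc", "company_premises")))  # True
print(laac_decision(Request("alice", "read", "report.doc", "public_place")))      # False
```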

Another serious challenge in mobile computing are usability problems, since mobile devices only have a small display of limited quality and rudimentary means for data input (e.g., no full keyboard). Because of this the mobile user will appreciate it if data items and elements of the graphical user interface are hidden when they are not relevant for him at his current location. For example, if a mobile service technician stays in a particular city, all files concerning customers in other cities could be hidden.

B. Unified Modelling Language

The Unified Modelling Language (UML, [8]) is the result of the consolidation of several independently developed modelling languages (e.g., Object Modelling Technique (OMT) and Object Oriented Software Engineering (OOSE)) from the domain of software engineering. Nowadays UML is maintained by an industry consortium, namely the Object Management Group (OMG).

UML comprises 13 different diagram types. On the uppermost level we can discern diagrams describing structural aspects of software systems and diagrams describing behavioural aspects. A well-known example of the former type are class diagrams, which are commonly used to describe data structures in object oriented programming languages (e.g., Figure 1). Activity diagrams belong to the latter type of diagrams. In the next section we show how these diagrams can be extended to enable expressing mobile-specific aspects.

The purpose of UML’s activity diagrams is to describe workflows, i.e., to represent a set of activities and the relations between these activities. An individual activity is depicted as a box with rounded edges that contains the name of the activity. The initial node is a black circle and the final node is represented as a black circle within a bigger white circle. Arrows with solid lines connect these boxes to indicate which activities have to be finished before another one can be started; this way the control flow of the workflow is defined. To enable depicting conditional flows of activities there are control nodes (shown as diamonds) that are used to connect arrows. Further concepts in activity diagrams are swimlanes to group activities together (e.g., all activities that have to be performed by the same actor or organisation) or the forking/joining of the control flow for parallel activities.

Many other graphical languages for the modelling of workflows and business processes can be found in the literature, e.g., Petri-nets, Event-driven Process Chains (EPC) or


Flowcharts. The distinguishing feature of UML activity diagrams for our purpose is that UML provides an explicitly defined meta-model and thus explicitly supports the definition of extensions.

III. LOCATION MODEL

Before the actual extension of UML activity diagrams can be described we have to introduce a simple location model. For the sake of simplicity we consider only two dimensions, but it is simple to extend the model to support three dimensions, which might be necessary to cover indoor scenarios where, for example, offices at different floors of a building have to be distinguished.

As geometric primitives there are Point and Polygon, which both are extensions of the abstract superclass AbstractGeometry (see Figure 1). For polygons it is demanded that the lines of a given polygon don’t cross each other. If a class is abstract this means that it is not possible to directly create instances of that class; to obtain an instance one has to create a concrete (i.e., non-abstract) subclass. A circle area is associated with one point that represents the center point of the circle; the radius of that circle is defined as a member variable of class CircleArea and holds a double value that defines the radius of that circle in meters. LocationInstances are associated with exactly one polygon and belong to exactly one LocationClass. Location classes are used to group location instances that conceptually belong together, e.g., there might be location classes that represent cities, countries, districts or rooms within buildings. The concept of location classes and instances can also be found in the Geographic Markup Language (GML) in the form of feature types and features [9][10]. It is demanded that two polygons that belong to location instances of the same location class do not overlap spatially. Examples of location instances for location class city are Valletta or Berlin. To exemplify this model some instances of location instances and classes are also drawn in the figure (surrounded by the box with the dotted line). It is not demanded that all the instances of one particular location class cover the whole reference space, i.e., that a given class provides an exhaustive classification. This is not the case for class city, because not every point in a country can be assigned to a city; there are also rural areas. But a location class country could provide an exhaustive classification if the reference space is the whole area of a given continent; each point on that continent can be assigned to one instance of that class which represents a particular country.
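The location model above can be sketched in code. The class and attribute names follow Figure 1; the ray-casting point-in-polygon test and the example coordinates are our own assumptions added for illustration.

```python
from abc import ABC
from dataclasses import dataclass, field

class AbstractGeometry(ABC):
    """Abstract superclass; only concrete subclasses are instantiated."""

@dataclass
class Point(AbstractGeometry):
    x: float
    y: float

@dataclass
class Polygon(AbstractGeometry):
    vertices: list  # list of Point; the edges must not cross each other

    def contains(self, p: Point) -> bool:
        """Ray-casting test: count how often a horizontal ray from p
        crosses a polygon edge; an odd count means 'inside'."""
        inside = False
        n = len(self.vertices)
        for i in range(n):
            a, b = self.vertices[i], self.vertices[(i + 1) % n]
            if (a.y > p.y) != (b.y > p.y):
                x_cross = a.x + (p.y - a.y) * (b.x - a.x) / (b.y - a.y)
                if p.x < x_cross:
                    inside = not inside
        return inside

@dataclass
class CircleArea(AbstractGeometry):
    center: Point
    radius: float  # in meters

@dataclass
class LocationClass:
    name: str                     # e.g., "city" or "country"
    instances: list = field(default_factory=list)

@dataclass
class LocationInstance:
    name: str
    spatial_extent: Polygon       # exactly one polygon
    location_class: LocationClass # belongs to exactly one class

# Hypothetical example: a square "Berlin" polygon in location class "city".
city = LocationClass("city")
berlin = LocationInstance("Berlin",
                          Polygon([Point(0, 0), Point(4, 0),
                                   Point(4, 4), Point(0, 4)]), city)
city.instances.append(berlin)
print(berlin.spatial_extent.contains(Point(2, 2)))  # True
print(berlin.spatial_extent.contains(Point(9, 9)))  # False
```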

IV. LOCATION CONSTRAINTS

In this section the concepts of Location Constraints and Location Rules are introduced.

A. Definition and Classification of Location Constraints

A location constraint is a statement about the location where one or more activities of a workflow schema or

[Figure 1 shows the location model as a UML class diagram: Point and Polygon extend AbstractGeometry; CircleArea has a Radius:Double and one center Point; each LocationInstance has one Polygon as spatial extent and belongs to exactly one LocationClass; example instances (in a dotted box) include the classes City and Country with the instances Valletta, Berlin and Malta.]

Figure 1. Location Model as UML-Class Model

instance have to be performed or are not allowed to be performed. Constraints of the former type are called positive constraints, while constraints of the latter type are negative constraints. These two types of constraints can be found in the classification of location constraints depicted in Figure 2.

A positive constraint could be demanded if a company wants to enforce that particular workflow activities dealing with confidential customer data are not performed outside the company’s premises. Negative constraints will usually be employed if it is easier to express where something is not allowed than to enumerate the places where something is allowed; for example, if there are only a few countries in the world where certain software functions of a mobile workflow system should not be used (e.g., because of license or export restrictions), then a negative constraint will be defined that enumerates all these countries.
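The positive/negative distinction can be captured by a small data structure. This is a sketch under our own naming assumptions (the enum, the set-based evaluation, and the example constraints are not notation from the paper):

```python
from dataclasses import dataclass
from enum import Enum

class Mode(Enum):
    POSITIVE = "+"   # activity must be performed at one of the locations
    NEGATIVE = "-"   # activity must NOT be performed at any of the locations

@dataclass
class LocationConstraint:
    activity: str
    mode: Mode
    locations: frozenset  # names of location instances

    def satisfied_by(self, current_location: str) -> bool:
        """Check the constraint against the user's current location."""
        if self.mode is Mode.POSITIVE:
            return current_location in self.locations
        return current_location not in self.locations

on_premises = LocationConstraint("edit customer data", Mode.POSITIVE,
                                 frozenset({"company_premises"}))
export_ban = LocationConstraint("use licensed module", Mode.NEGATIVE,
                                frozenset({"country_x", "country_y"}))
print(on_premises.satisfied_by("company_premises"))  # True
print(export_ban.satisfied_by("country_x"))          # False
```

Representing the locations as a set already anticipates the shortcut notation of Section V-B, where one constraint may list several locations.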

There is another dimension to classify location constraints which can also be found in Figure 2 and which is orthogonal to the distinction of positive and negative constraints:

Static constraints: These constraints are defined for the workflow schema before the runtime of the individual workflow instances. This implies that these constraints (which can also be called schema constraints) are enforced for all workflow instances that are created from that schema.

Dynamic constraints: They are defined during the runtime of a workflow instance and are only valid for that instance. Further, we distinguish external and internal dynamic constraints: For external constraints the actual locations are not calculated by the mobile workflow system itself; rather, they are defined manually by a human operator during runtime (e.g., a dispatcher or manager) or they are queried from another information system, e.g., a customer relationship management database that stores the addresses of all the customers of a company. When an internal constraint is defined, then all the information needed by the mobile workflow system to calculate the actual location during the


[Figure 2 shows the classification tree: Location Constraints are Positive or Negative, and Static (schema level) or Dynamic (instance level); dynamic constraints split into Internal (Location Rules) and External, the latter being defined manually or coming from other systems.]

Figure 2. Different types of location constraints depicted as a classification tree

runtime of the workflow instance has to be specified in the workflow graph. For this we propose so called location rules, which are elicited in the subsequent subsection.

B. Location Rules

The basic concept behind so called location rules is that, based on the user’s current location where he performs a particular activity called the trigger activity, a location constraint for another activity called the target activity can be derived. If the location constraint created by a rule is a positive one, this means that both the trigger and the target activity have to be performed at the same location, so this is called binding of (same) location. If a negative constraint is created, then the target activity cannot be performed at the same location as the trigger activity; this case can also be called prohibition of same location or separation of locations.

These two types of location rules were greatly inspired by the work of Bertino et al. [11]: they applied the well known security principles Separation of Duties (SoD) and Binding of Duties (BoD) [12] to the workflow domain. SoD for workflow management means that if a given user performed a particular activity of a workflow instance he is not allowed to also perform a particular other activity of the same workflow instance. The standard example for SoD is an approval workflow where the submitter of a proposal is not allowed to decide over his own proposal. An example for BoD is the policy one face to the customer, which says that the actor who had the initial contact with a customer has to perform all other activities of the same workflow instance that involve communication with the customer. Bertino et al. even consider the case of inter-instance constraints, e.g., if Alice had to perform the activity make decision in a proposal workflow instance started by Bob, then Bob isn’t allowed to perform that activity in a subsequent proposal instance initiated by Alice, to prevent collusion.
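A binding/prohibition location rule can be sketched like this. It is an illustration only: the dictionary stands in for the location-class lookup of Section III, and all activity names, positions, and the tuple returned by `fire` are our assumptions.

```python
from dataclasses import dataclass

# Hypothetical lookup standing in for a location class "city": maps a raw
# position to the city (location instance) that contains it.
CITY_OF = {"alexanderplatz": "Berlin", "republic_street": "Valletta"}

@dataclass
class LocationRule:
    trigger_activity: str
    target_activity: str
    location_class: dict  # position -> covering location instance
    binding: bool         # True: binding of location, False: prohibition

    def fire(self, position):
        """When the trigger activity is performed at `position`, derive a
        dynamic constraint for the target activity (or None if no
        location instance of the class covers the position)."""
        instance = self.location_class.get(position)
        if instance is None:
            return None  # no covering instance -> no constraint generated
        mode = "must be performed in" if self.binding \
            else "must NOT be performed in"
        return (self.target_activity, mode, instance)

rule = LocationRule("visit customer", "write report", CITY_OF, binding=True)
print(rule.fire("alexanderplatz"))  # ('write report', 'must be performed in', 'Berlin')
print(rule.fire("open_sea"))        # None
```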

To define what constitutes the same location the location model has to be employed. A location rule can refer to a location class; the instance of that class that contains the user’s location when performing the trigger activity is the location that is used for the target activity. It is also possible to define a

Table I
EXAMPLE FOR AN IMPLICATION LIST WHICH MAPS LOCATION INSTANCES OF CLASS country TO LOCATION INSTANCES OF CLASS city

Trigger Location    Target Location
Germany             Munich
France              Munich
Malta               Valletta
. . .               . . .

radius that is used to calculate a circle around the user’s location during the activation of the trigger activity.

Another approach is to have rules that create constraints pointing to a location different from the location where the trigger activity was performed; for these rules with implied locations it is necessary to have lookup tables that map trigger locations to target locations. As an example such an implication list is sketched in Table I: this table maps different countries to the city where the headquarters of an international company is located. It is allowed that different countries are mapped to the same location, e.g., all the orders from either France or Germany have to be handled in Munich, because the number of orders from France is so small that it wouldn’t make sense to operate a headquarters in France.
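An implication list like Table I is essentially a lookup table; the dictionary below simply mirrors the table rows (the helper function and its None convention for missing entries are our assumptions):

```python
# Implication list mapping trigger locations (class country)
# to target locations (class city), as in Table I.
IMPLICATION_LIST = {
    "Germany": "Munich",
    "France": "Munich",   # several triggers may map to the same target
    "Malta": "Valletta",
}

def implied_target(trigger_location):
    """Return the implied target location, or None if the list has no
    entry (in which case no constraint would be generated)."""
    return IMPLICATION_LIST.get(trigger_location)

print(implied_target("France"))  # Munich
print(implied_target("Spain"))   # None
```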

V. VISUAL REPRESENTATION

In this section first the basic elements of the visual representation are introduced. After this, shortcut notations are covered as well as the depiction of constraints derived from location rules during the runtime of a workflow instance.

A. Basic Elements

As visual representation for the different types of location constraints and rules in UML activity diagrams we propose the one sketched in the table in Figure 3. All the constraints in the left column are positive constraints or produce positive constraints, while the right column is devoted to negative constraints. The uppermost row shows static location constraints while the other three rows show different kinds of dynamic location constraints.

• All location constraints and rules are attached as dotted arrows to the boxes that represent the activities. On the dotted arrow there is a circle that holds a symbol which indicates the mode of the constraint: Positive Static Constraint and Binding of same Location are symbolized by “=”, Negative Static Constraint and Prohibition of same Location are symbolized by “≠”, rules for Implied Binding of Location are symbolized by “⇒” and Implied Prohibition of Location by “⇏”.

• Static constraints are shown as arrows pointing from a parallelogram to the constrained activity. The parallelogram stands for a location instance and holds the name of that location.


[Figure 3 is a table of the notation elements, with columns for positive and negative constraints and rows for static constraints and the kinds of dynamic constraints: same location rules (“Binding of same Location” with “=” and a location class or radius such as 100m, “Prohibition of same Location”), implied location rules (“Implied Binding of Location” and “Implied Prohibition of Location”, each referring to an implication list), and external constraints, defined manually by an operator or retrieved from another system such as a CRM database.]

Figure 3. Different types of location constraints

• There are two types of external dynamic constraints, namely those where the actual constraint is entered manually by a human operator and those where it is retrieved from another information system connected to the workflow system. Human operators are represented by a symbol that shows a little man. This symbol is also used to represent actors in UML usecase diagrams. An external information system as source for a dynamic location constraint is depicted by the symbol for a software component that was used in the deployment diagrams of UML 1.x. An external constraint is also depicted by a dotted arrow which connects two activities. The activity where that arrow starts is the activity which triggers the retrieval of the dynamic constraint. It is possible that the trigger activity is solely devoted to the retrieval of the location constraint; but the triggering could also be just a secondary effect of an activity. The target of the dotted arrow is the activity to which the retrieved location constraint will be assigned. On the line of the arrow the symbol for the source of the constraint as well as a circle holding the mode (positive or negative) and the granularity (location class or radius) are drawn.

[Figure 4 shows a swimlane containing several activities, with a single positive (“=”) location constraint listing the two locations London and Berlin attached to the whole swimlane.]

Figure 4. Location constraint for all activities in a swimlane

• A rule for the creation of dynamic constraints is shown as an arrow pointing from the trigger activity to the target activity that will be bound to the derived location. For same location rules the granularity of what constitutes the same location is annotated by a box attached to the circle; the box can hold either a location class or a numeric value as radius; for implied location rules the box holds the name of the implication list. For example, if the location class city is used for the rule, then the city that covers the user’s current position is assigned as location constraint to the target activity. If such a city cannot be found then no constraint is generated.

B. Shortcut Notations

We also devised some shortcuts which help to reduce the number of elements required to depict location constraints in some scenarios.

In Figure 4 several activities are contained within a swimlane (rectangular box). In UML, swimlanes are usually employed to group activities together that are performed by the same actor. It is also possible to assign a location constraint to a swimlane, which then has to be enforced for all activities within that swimlane. This example also shows that it is possible to assign more than just one location to a static location constraint. If two or more locations are assigned as positive location constraints this means that the constrained activities have to be performed in one of these locations; in the example in Figure 4 this would mean that all the activities within the swimlane have to be performed either in London or in Berlin.

When several locations are assigned to a negative location constraint, the activity can be performed anywhere except at the assigned locations.

The fragment in Figure 5 represents the case where two different activities share the same location constraint. For these fragments the meaning is that both activities have to be performed somewhere in Canada; however, it would be possible to perform the two activities in different Canadian cities. Such a constraint could occur in practice if the pre- and post-processing of a workflow should be performed within a particular country, while all the other activities can be performed anywhere else in the world.
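The semantics of positive and negative multi-location constraints described above can be sketched as a small check; the pair-based constraint representation is a hypothetical simplification that tests membership by name instead of real spatial containment:

```python
def satisfies(location, constraints):
    """Check whether performing an activity at `location` complies with a
    list of static constraints. Each constraint is a pair
    (positive, locations): a positive constraint requires the location to
    be one of the assigned locations, a negative constraint forbids all
    of them. Every constraint in the list must hold."""
    for positive, locations in constraints:
        inside = location in locations
        if positive and not inside:
            return False
        if not positive and inside:
            return False
    return True

# Swimlane constraint from Figure 4: activities may run in London or Berlin.
swimlane = [(True, {"London", "Berlin"})]
assert satisfies("Berlin", swimlane)
assert not satisfies("Paris", swimlane)

# Negative constraint with several locations: anywhere but these.
blocked = [(False, {"London", "Berlin"})]
assert satisfies("Paris", blocked)
assert not satisfies("London", blocked)
```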

International Journal on Advances in Telecommunications, vol 3 no 1 & 2, year 2010, http://www.iariajournals.org/telecommunications/

2010, © Copyright by authors, Published under agreement with IARIA - www.iaria.org

[Figure 5. Single location constraint shared by two separate activities: two activities attached to the positive static constraint Canada]

[Figure 6. Location rule with two trigger activities: activities A and B both trigger a rule of granularity Block that constrains activity C]

There are also shortcuts for location rules, as shown in Figure 6: in this example one location rule has two trigger activities. The location instance of class Block (short for block of flats) which contains the user's location when he performs activity A or B defines a negative constraint that forbids the execution of activity C in the same block. Analogously, it is possible to have more than one target activity for one location rule; this means that the location constraint created when the rule is triggered is attached to all the target activities at the same time.

C. Visualisation of Derived Constraints

While not necessary for the actual workflow modelling, for the purpose of demonstration it is useful to be able to depict the static location constraint generated by a dynamic constraint; the generated constraint can also be called a derived constraint.

A static constraint generated by a dynamic constraint is depicted as a static constraint (as parallelogram) attached with a dotted line to the box that indicates the granularity of the constraint. In Figure 7 this is shown for the case of a location rule, but the notation can also be used for external constraints. The rule in the figure creates a positive constraint that restricts the execution of the target activity to the city where the trigger activity was performed. In the depicted example the trigger activity was performed somewhere in Italy, so a corresponding static constraint was created.

[Figure 7. Depiction of derived constraint: a location rule of granularity Country whose derived static constraint is Italy]

VI. EXAMPLE SCENARIO

To exemplify the application of our UML profile, the workflow already sketched in the introductory section is elaborated in more detail and shown as a diagram with location constraints in Figure 8. This workflow could be found in a company that provides technical maintenance services and employs many technicians who are sent to the customers' premises to perform on-site repair work on technical components.

An instance of the depicted workflow schema is created when a call center agent receives a telephone call from a customer. This activity has a static location constraint that points to all the location instances that represent the premises of call centers operated by the company. The incoming calls are routed to the call center that is nearest to the origin of the call, so the local service center that has to send a mobile worker to the customer's premises (dispatching) is determined by a dynamic location constraint: the district in which the dispatching activity has to be performed is determined by an external application that makes this decision by evaluating the caller's phone number. The service center first sends an inspector to the site (e.g., factory, private residence, places with components of public infrastructure like pipes, electric transformer stations, junction boxes) where the allegedly defective component can be found. If the inspector cannot find a technical defect, the mobile part of the workflow is aborted and Follow-up Office Work is performed as the final activity. However, if there is indeed a defect, the inspector will order a mobile repair team to the location of the component using his PDA. The location where that team can perform the actual repair work is restricted with a dynamic constraint in the form of a location rule (binding of same location), which says that the repair activities have to be performed no farther than 150 meters from the point that was determined by the inspector's GPS device when he placed the order for the mobile repair team.

Since some components cannot be repaired on-site, a possible branch of the workflow is Shop Floor Repair at a special repair shop operated by the company. To minimize transportation distances, the location of this activity is constrained with another dynamic constraint that is based on an implication list; this implication list maps each region where the company provides its service to the nearest repair shop. When the activity on the shop floor is finished, another invocation of the activity On-Site Work has to follow, since the repaired component has to be brought back and reinstalled. It is possible that the sequence of on-site work and shop-floor work is performed several times, e.g., if the defect is not resolved after the first component repaired on the shop floor is brought back.

The final activity is called Follow-up Office Work and deals with writing the bill, ordering new spare parts and writing a report for the customer and the manufacturer of the component that had to be repaired. Since this activity also includes some kind of evaluation (e.g., how fast the customer's problem was solved, level of customer satisfaction), a dynamic constraint (prohibition of same location) is used to demand that the office center where this activity is performed is not in the same district as the one that initiated the workflow instance. The purpose of this rule is to ensure that employees who know each other cannot collude to cover up mistakes that were made.

[Figure 8. Example workflow with different kinds of location constraints: Answer Customer Call (static constraint CallCenter1, CallCenter2), Dispatching (external dynamic constraint, granularity District), On-Site-Inspection, On-Site-Work (same-location rule, radius 150 m), Shop-Floor-Repair (implication list Impl-List2), Follow-up Office Work (negative same-location rule, granularity District)]

VII. UML PROFILE

We are now prepared to show how the proposed extension to UML activity diagrams fits into the meta-model of UML. For the sake of brevity only the most important new constructs are covered. In Figure 9 the package MobileWorkflowProfile is shown, which represents the new UML profile. Outside of the package some important classes from the UML meta-model are shown.

The profile also contains the location model which was introduced in Section III. Due to space restrictions only those parts of the location model are drawn that are necessary for understanding the relation between the location model and the rest of the model. There are also two new classes to model implication lists for rules with implied locations: ImplList and ImplPair. Each instance of ImplPair stands in association with two location instances that represent the trigger location and the target location. These instances of ImplPair (implication pair) represent the individual entries of an ImplList (implication list).
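A minimal sketch of the ImplList/ImplPair structure, assuming a simple name-based lookup; the list contents are invented for illustration:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ImplPair:
    trigger_loc: str   # location observed at the trigger activity
    target_loc: str    # location to assign to the target activity

@dataclass
class ImplList:
    name: str
    pairs: list

    def lookup(self, trigger_loc):
        """Return the implied target location for a trigger location,
        or None if the list has no matching entry (in which case no
        constraint would be generated)."""
        for p in self.pairs:
            if p.trigger_loc == trigger_loc:
                return p.target_loc
        return None

# Hypothetical list mapping service regions to the nearest repair shop,
# in the spirit of the example scenario.
impl = ImplList("Impl-List2", [ImplPair("RegionNorth", "ShopA"),
                               ImplPair("RegionSouth", "ShopB")])
assert impl.lookup("RegionSouth") == "ShopB"
assert impl.lookup("RegionWest") is None
```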

The main connection between the UML meta-model and our profile is the extension relationship between the classes Activity and ConstrainedActivity. Instances of ConstrainedActivity have to be used if any kind of location constraint is to be assigned to an activity. For this purpose, each ConstrainedActivity stands in association with at least one instance of class AConstraint, which is the short form for Abstract Constraint, i.e., it is not possible to obtain direct instances of that class. A flag named isPositive indicates whether the AConstraint represents a positive or a negative constraint. There are two direct subclasses of AConstraint, namely ADynConstraint (short for Abstract Dynamic Constraint, another abstract class) and StaticConstraint. Each instance of class StaticConstraint stands in association with at least one instance of LocationInstance. ADynConstraint as the second subclass of AConstraint has two direct subclasses: ImplLocDynConstraint and SameLocDynConstraint. ImplLocDynConstraint is associated with exactly one ImplList. Class SameLocDynConstraint stands in association with class LocClass. However, this association is an optional one, because what is considered as "the same location" cannot only be defined based on a location class but also based on a radius. For the latter case there is a member variable radius in class SameLocDynConstraint that is set to a value greater than 0 if the modeller thinks it more appropriate to derive the "same location" using a radius than by looking up a location class. For a given instance of SameLocDynConstraint it is not allowed to use both methods to define the "same location", i.e., either the radius is greater than 0 or there is an associated location class. Using UML's Object Constraint Language (OCL) it is possible to express this formally [13]:

context SameLocDynConstr
inv: (not targetClass->isEmpty()) xor (radius > 0.0)

The keyword context is followed by the name of the class from whose perspective the following statement has to be viewed. For the given statement this means that targetClass has to be the name of an end of an association assigned to class SameLocDynConstr and that radius has to be the name of a member variable of that class. The keyword inv stands for invariant, i.e., the following boolean expression has to evaluate to true for all instances of the model. isEmpty() is a function that returns true if for the considered instance there is no association to an object of LocClass.
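The invariant can also be rendered as an executable check; the function name and parameter types below are hypothetical stand-ins for the model elements:

```python
def invariant_holds(target_class, radius):
    """Rendering of the OCL invariant on SameLocDynConstr: exactly one
    of the two mechanisms for defining 'the same location' must be
    used - an associated location class XOR a radius greater than 0."""
    has_class = target_class is not None
    has_radius = radius > 0.0
    return has_class != has_radius   # != on two booleans is XOR

assert invariant_holds("City", 0.0)        # location class only: valid
assert invariant_holds(None, 150.0)        # radius only: valid
assert not invariant_holds("City", 150.0)  # both defined: invalid
assert not invariant_holds(None, 0.0)      # neither defined: invalid
```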

Class ActivityEdge from the UML meta-model could not be used to represent the arrows introduced in our profile to assign location constraints to activities, because the edges in activity diagrams can carry tokens; we also opted for a different graphical representation (dotted arrows instead of solid arrows for the sake of better readability).

[Figure 10. Different types of anomalies, classified by severity (redundancy vs. contradiction), certainty (certain vs. potential, the latter either location-dependent or sequence-dependent) and the involved constraints (static, dynamic or both)]

VIII. ANOMALIES OF LOCATION CONSTRAINTS

When an activity diagram is annotated with location constraints according to the UML profile introduced in this paper, it is possible that anomalies occur. In this section we describe different types of such anomalies in terms of pairs of location constraints. The different dimensions considered to classify anomalies of location constraints are depicted in Figure 10.

The first dimension used to classify the different types of anomalies is the severity level:

• A location constraint could make a redundant statement, i.e., the constraint could be removed without altering the behaviour of the mobile workflow system for any thinkable workflow instance. There are two subtypes of redundancy: either the constrained activity is never reached, or another constraint makes the same restriction or an even stronger one. The first case would occur if a static constraint is assigned to an activity that is never reached; this means that there is an inconsistency in the underlying workflow schema. Another case is a location rule whose target activity cannot be performed anymore once the trigger activity is reached; this means the location constraint generated by the trigger activity will never have any effect.

• Two location constraints could make contradictory statements for a given activity, i.e., there is no place where that activity could be performed by an actor according to both constraints. A straightforward approach to deal with such contradictions would be to assign priorities to the individual constraints, so that in case of a conflict only the constraint with the higher priority is considered.

Redundant statements do not cause problems during the execution of a workflow instance but unnecessarily bloat the model. Further, the existence of a redundant statement might be a hint that there was an error during the elicitation of the required constraints.

Most workflow schemas have decision points, i.e., there are different sets of activities that are actually performed for a given workflow instance. For example, there might be a branch of a workflow schema that isn't executed for every instance. Further, the order of the activities can also differ between two workflow instances of the same workflow schema. Some anomalies may occur depending on the location of the actor while performing a particular activity. This leads to the next dimension for the classification of anomalies, called certainty:

• A potential anomaly will not appear in every workflow instance of a given workflow schema. We distinguish two subcases of potential anomalies: the anomaly appears depending on the order and/or the set of actually performed activities (sequence-dependent), or merely based on the location where one or more activities were performed (location-dependent), regardless of the sequence of performed activities.

• A certain anomaly will appear in every workflow instance of a given workflow schema. So to detect this type of anomaly we just have to look at one potential workflow instance of that schema.

One can also consider between which types of location constraints the anomalies arise:

• If two static constraints produce an anomaly, this is called an intra-static anomaly. For example, a positive static constraint assigned directly to an activity could demand that the activity is performed in London. However, this activity is contained in a swimlane which also has a static constraint saying that the activities are not allowed to be performed in England. Since London lies within England, there is no place that satisfies both contradictory constraints.

• If both constraints involved are dynamic constraints, the anomaly is named an intra-dynamic anomaly. As an example we consider an activity that is the target of two positive location rules. At runtime the first rule creates a constraint that says that this activity has to be performed in Berlin; subsequently, the second rule creates a constraint that says that the same activity has to be performed in Spain. Since Berlin isn't a city in Spain, this is a contradiction.

• The last case is that of inter-static-dynamic anomalies, where both types of constraints are involved, i.e., the pair of conflicting constraints consists of one dynamic and one static constraint.
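A sketch of how a pair of constraints could be classified along these dimensions; the hard-coded containment table, the dictionary representation and the restriction to single-location constraints are hypothetical simplifications:

```python
def contains(outer, inner):
    # Assumed spatial containment relation over a few named locations.
    REL = {("U.K.", "London"), ("England", "London"), ("Europe", "Germany")}
    return outer == inner or (outer, inner) in REL

def analyse_pair(c1, c2):
    """Return (anomaly or None, dimension) for a pair of constraints,
    each a dict with keys 'kind' ('static'/'dynamic'), 'positive' and
    'location'."""
    dim = ("intra-static" if c1["kind"] == c2["kind"] == "static" else
           "intra-dynamic" if c1["kind"] == c2["kind"] == "dynamic" else
           "inter-static-dynamic")
    # Order the pair so a positive constraint (if any) comes first.
    a, b = sorted((c1, c2), key=lambda c: not c["positive"])
    if a["positive"] and not b["positive"]:
        # Required location fully inside the forbidden one: no legal place.
        if contains(b["location"], a["location"]):
            return "contradiction", dim
    elif a["positive"] and b["positive"]:
        # One positive constraint subsumed by a stricter one is redundant.
        if contains(a["location"], b["location"]) or \
           contains(b["location"], a["location"]):
            return "redundancy", dim
    return None, dim

# Intra-static contradiction from the text: activity confined to London,
# swimlane forbids England.
assert analyse_pair(
    {"kind": "static", "positive": True, "location": "London"},
    {"kind": "static", "positive": False, "location": "England"},
) == ("contradiction", "intra-static")

# Redundancy: a positive U.K. constraint next to a stricter London one.
assert analyse_pair(
    {"kind": "static", "positive": True, "location": "U.K."},
    {"kind": "static", "positive": True, "location": "London"},
) == ("redundancy", "intra-static")
```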

In Figure 11 three activity diagrams of tiny workflows are depicted which cover the three cases of intra-static, inter-static-dynamic and intra-dynamic anomalies:


[Figure 9. Extension of the meta-model of UML Activity Diagrams: the package MobileWorkflowProfile contains ConstrainedActivity, AConstraint (isPositive:bool) with subclasses StaticConstraint and ADynConstraint, the latter with subclasses SameLocDynConstr (radius:double) and ImplLocDynConstr, plus ImplList, ImplPair (with triggerLoc and targetLoc associations) and the location model classes LocClass and LocInstance; it is connected to the UML meta-model classes Activity, ActivityNode, ActivityEdge, Classifier and Class]

[Figure 11. Examples for anomalies: a) a swimlane with constraint LON containing A1 (static constraint U.K.) and A2; b) a location rule of granularity Cities Europe from A1 to A2, which also carries the static constraint U.K.; c) two location rules with granularities Cities Worldwide and Countries, triggered by A2 and A1, with common target A3]

• Diagram a) shows an intra-static anomaly. The swimlane that contains the two activities A1 and A2 has a constraint which confines the execution of these activities to the location London. Further, activity A1 has an individual constraint that restricts the execution of this activity to the location United Kingdom (U.K.). This represents a redundancy, because A1 is also confined by the constraint assigned to the swimlane, which is even stricter, since the location London is a subset of the U.K. If the constraint assigned to activity A1 were omitted, the semantics of the diagram would not change. This case could occur if the constraint on A1 was assigned long before the swimlane or the constraint for the swimlane was created, and it was forgotten to remove the then redundant individual constraint for A1.

• Diagram b) shows an inter-static-dynamic anomaly: activity A2 is the target activity of a location rule and also has a static constraint. The location rule's trigger activity is A1 and the derived location constraint will be an instance of the location class Cities of Europe. A2's static constraint points to the location United Kingdom (U.K.), but not all European cities are located within the U.K., so this location rule represents a potential anomaly in the form of a contradiction.

• Diagram c) shows an intra-dynamic anomaly: there are two location rules with the same target activity A3. The upper location rule (with trigger activity A2) assigns a location constraint that confines A3 to a location instance of the class Cities Worldwide; the lower location rule (with trigger activity A1) assigns a location constraint that confines A3 to a location instance of the class Countries. Since it is possible that A2 is performed in a city that doesn't lie in the country where A1 was performed, this represents a potential anomaly in the form of a contradiction. For example, it could happen that A3 has to be performed in Germany and in Lisbon according to the derived location constraints, which is obviously impossible to satisfy, since Lisbon is not a German city.

In Figure 12 three more activity diagrams with potential anomalies are shown:

• In diagram a) the location constraint assigned to activity A1 represents a redundancy (see also the first diagram in Figure 11). However, since the branch with the swimlane isn't executed in every run of this workflow, this anomaly is a sequence-dependent anomaly.

• Diagram b) shows a workflow schema with two concurrent branches. The potential anomaly is a contradiction if A2 is performed before B2 in a European city outside the U.K. But if the execution of B2 is started before A2 is executed, then the derived location constraint won't affect the execution of B2 and so no contradiction occurs.

• The third diagram c) in this figure again has two branches of which only one is executed. If the branch with activity A is executed in a European city, this will lead to a contradiction, because the final activity C can only be executed in the USA according to the static constraint.

So all the anomalies in this figure are sequence-dependent. As an example of a location-dependent anomaly we can consider diagram b) in Figure 12: in this case a contradiction occurs if A1 is executed outside the U.K.

IX. LOCATION CONSTRAINTS FOR USECASE DIAGRAMS

Another diagram type which can be found in UML is the usecase diagram [8]. The purpose of a usecase diagram is to show which functions (usecases) a software system should provide. Such a usecase is depicted as an ellipse that contains a short textual description. It further shows different user roles and how these roles are connected to individual usecases. A role is depicted as a human operator (the same symbol that was used for manual location constraints) and is connected by a line to each usecase it is allowed to perform. If the system to be developed consists of several subsystems (e.g., a distributed system), then each subsystem is represented by a rectangular box that contains the usecases that are operated by this subsystem. Usecase diagrams also have the ability to show how individual usecases stand in relation to each other, e.g., if one usecase always invokes another usecase. However, this feature isn't relevant for our considerations.

Usecase diagrams are usually employed at an early stage of the software development cycle. Based on the results of the usecase analysis, activity diagrams can be developed.

Location constraints as defined in Section IV can also be used to annotate usecase diagrams. An example of such a diagram can be found in Figure 13: the left subsystem contains two usecases, namely Create new order and Finalize order; the right subsystem also has two usecases, which are named Create new account and Edit master data. There are three roles in the diagram: Travelling Salesman, Manager and Accountant. It is possible to assign static location constraints to roles, usecases, system boundaries and the association between a role and a usecase. Further, usecases can also be the trigger or the target of a location rule: the part of town where the usecase Create new order is performed defines where the usecase Finalize order has to be performed. The rationale behind this rule is that a travelling salesman should only be allowed to finalize an order where he started it. Further, the role Travelling Salesman is restricted to Spain by a location constraint. This means that users with that role can only invoke usecases when they are within Spain. However, users with the role Manager are allowed to invoke the usecases they are assigned to without any spatial restriction. The third role, Accountant, also has a location constraint that prevents users with that role from invoking usecases when they are outside the rooms of the company's accounting department.

The subsystem on the right side of the figure has a static location constraint that confines all the usecases provided by that subsystem to the location company premises.

X. RELATED WORK

In [14], an extension for UML activity diagrams is proposed that aims at modelling security requirements for business processes. The model introduces stereotypes to attach security requirements (e.g., privacy, access control, non-repudiation) to different elements in UML activity diagrams. For example, using this notation it is possible to assign a stereotype «privacy» to an activity partition (swimlane) to express that the disclosure of sensitive personal data has to be prevented during the activities contained within the swimlane.

Another profile for UML activity diagrams can be found in [15]. The purpose of this profile is to annotate workflows with concepts from the domain of business intelligence. Using this profile the modeller can express which activities require input from a data warehouse or data mart.

Baumeister et al. devised another extension of activity diagrams for modelling mobility [16]. However, this approach is aimed at expressing how physical objects are moved by activities from one location to another and does not allow location constraints to be formulated.

[Figure 12. Examples for potential anomalies: a) a conditional branch containing activities A1 (static constraint U.K.) and A2 in a swimlane with constraint LON; b) two concurrent branches A1-A2-A3 and B1-B2-B3 with a location rule of granularity Cities Europe and a static constraint U.K.; c) two alternative branches with activities A and B joining in C, involving a location rule of granularity Cities Europe and a static constraint USA]

[Figure 13. Usecase diagram with different types of location constraints: the roles Travelling Salesman (constraint Spain), Manager and Accountant (constraint Accounting Office); the usecases Create new order and Finalize order (connected by a location rule of granularity Part of town) in the left subsystem, and Create new account and Edit master data in the right subsystem, which carries the static constraint Company's Premises]

In the literature some articles can be found that propose location-aware access control models. Notably, most of them are extensions of Role-Based Access Control (RBAC, [17]), like GEO-RBAC [18], LoT-RBAC [19] or S-RBAC [20]. The most prominent aspect that distinguishes these models is the subset of RBAC components that can be restricted with a location constraint. For example, in GEO-RBAC the location constraints are assigned to the roles themselves, so depending on the mobile user's current location individual roles are switched on or off; in the S-RBAC model, location constraints can be assigned to the association between roles and permissions, so individual permissions of a role are switched on or off depending on the location. We surveyed these models in another article [7].

If access control decisions are based on the determination of a mobile user's location, this leads to the question of how trustworthy the employed locating system is. An attack with the intent to manipulate the location delivered by a locating system is called location spoofing. Such attacks can be performed by the possessor of the mobile computer (internal spoofing) or by a third party (external spoofing). In [21] we give an overview of different technical approaches for the prevention of location spoofing.

To the best of our knowledge, there is only one further paper by another group that deals with location constraints for workflows [22]. However, this model only allows individual locations to be assigned to activity nodes to state where the respective activity is allowed to be performed. There are also location constraints assigned to the individual actors in the model. The focus of this work is the development of an algorithm to check whether there are enough employees who can perform the individual activities of a workflow under consideration of the respective location constraints.

In a recent paper we considered the problem of detecting inconsistencies in location-aware access control policies [23]. However, this work focused solely on RBAC and didn't take any workflow-specific aspects into account. The basic idea is that it is possible to assign location constraints to several components of different types in a model at the same time. For example, a role service technician could have a location constraint so that this role can only be enabled in a certain region. Further, there could also be a location constraint assigned to a user that restricts his usage of the mobile information system to the city where he has to work. If the role service technician is assigned to that user, it could occur that the intersection area of the two location constraints is empty, i.e., the user could never activate the newly acquired role. Such an "empty assignment" is a strong hint that the administrator of the system made a mistake and should be informed about it.

Further, the concept of spatial coverage is introduced as a way to check the validity of location-aware access control policies. For example, the coverage of a given role with respect to the entity user is the spatial extent of all the points where at least one user could activate that role according to the user's and the role's location constraints. If the coverage for the role service technician doesn't cover locations that should be served by the technicians, then this again is a strong hint that something is wrong with the configuration of the access control model. In [23] we also describe several other types of spatial coverage for the purpose of model checking.

XI. CONCLUSION

In this article, a UML profile was introduced that extends activity diagrams to express spatial constraints, so-called location constraints, concerning where individual activities have to be performed or are not allowed to be performed. Using our profile, location constraints can be defined at the design time of a workflow schema, but it is also possible to define rules to automatically generate constraints during the runtime of a workflow instance.

Ideas for future work include introducing further stereotypes to express mobile-specific constraints, e.g., to state minimum capabilities of the mobile computers needed to perform an activity, for example a minimum screen size or the availability of tamper-proof memory. Further, we envision the development of a graphical editor that supports drawing UML diagrams according to our profile; this editor should integrate functionalities to work with geographic data in order to define the spatial extents of location constraints.

Further, it would also be worthwhile to extend the location model with security labels in the sense of mandatory access control (MAC). To the best of our knowledge there are only two publications by other authors which deal with MAC-based location-aware access control [24][25]. However, these publications do not cover workflow-specific aspects. We also published an article on MAC-based location constraints for database management systems [26], but again this work isn't workflow-specific. The basic idea of introducing security labels into our modelling approach would be to assign clearance levels to locations, e.g., a particular country or building could have the clearance Top Secret, while other locations only have the clearance for the level Secret. Based on this classification, location constraints could be assigned to a UML activity diagram. Such a constraint could state that a particular activity can only be performed while the mobile actor stays at a location that has a clearance of Top Secret, but not below; or a location rule could be used to assign a derived constraint so that the target activity's location has a clearance level not lower than that of the trigger activity's location.

In Section VIII several types of anomalies were discussed that can be found in activity diagrams when location constraints are applied. Since models should be free of such anomalies, we are working towards an algorithmic approach to automatically detect them.

UML activity diagrams are by far not the only graphical language for the modelling of workflows. Another popular language for workflow modelling is the Business Process Modelling Notation (BPMN), which is also maintained by the OMG [27]. We are also working on a BPMN profile for the expression of location constraints similar to the profile presented in this article [28].

Further, we are currently working on the application of the concept of location constraints to business processes from the domain of agriculture [29].

REFERENCES

[1] M. Decker, "Modelling Location-Aware Access Control Constraints for Mobile Workflows with UML Activity Diagrams," in The Third International Conference on Mobile Ubiquitous Computing, Systems, Services and Technologies (UbiComm 2009). Sliema, Malta: IEEE, October 2009, pp. 263–268.

[2] A. Oberweis, Process-Aware Information Systems. Bridging People and Software Through Process Technology. New York, USA, et al.: John Wiley & Sons, 2005, ch. Person-to-Application Processes: Workflow Management, pp. 21–36.

[3] M. Perry, K. O'Hara, A. Sellen, B. A. T. Brown, and R. H. R. Harper, "Dealing with mobility: understanding access anytime, anywhere," ACM Transactions on Computer-Human Interaction, vol. 8, no. 4, pp. 323–347, December 2001.


International Journal on Advances in Telecommunications, vol 3 no 1 & 2, year 2010, http://www.iariajournals.org/telecommunications/

2010, © Copyright by authors, Published under agreement with IARIA - www.iaria.org


[4] M. Decker, "A Security Model for Mobile Processes," in Proceedings of the International Conference on Mobile Business (ICMB 08). Barcelona, Spain: IEEE, July 2008.

[5] A. Küpper, Location-based Services – Fundamentals and Operation. Chichester, U.K.: John Wiley & Sons, 2007, reprint.

[6] J. Hightower and G. Borriello, "Location Systems for Ubiquitous Computing," IEEE Computer, vol. 34, no. 8, pp. 57–66, 2001.

[7] M. Decker, "Location-Aware Access Control: An Overview," in Proceedings of Informatics 2009 – Special Session on Wireless Applications and Computing (WAC '09), Carvoeiro, Portugal, 2009, pp. 75–82.

[8] Unified Modeling Language (OMG UML), Superstructure, V2.1.2, Object Management Group, 2007.

[9] R. Lake, D. S. Burggraf, M. Trninic, and L. Rae, GML. Geography Mark-Up Language. Foundation for the Geo-Web. Chichester, U.K.: John Wiley & Sons, 2004.

[10] D. S. Burggraf, "Geography Markup Language," Data Science Journal, vol. 5, pp. 178–204, October 2006.

[11] E. Bertino, E. Ferrari, and V. Atluri, "The Specification and Enforcement of Authorization Constraints in Workflow Management Systems," ACM Transactions on Information and System Security, vol. 2, no. 1, pp. 65–104, 1999.

[12] R. S. Sandhu, "Separation of duties in computerized information systems," in Results of the IFIP WG 11.3 Workshop on Database Security (DBSec), Halifax, U.K., 1990, pp. 179–190.

[13] J. B. Warmer and A. Kleppe, The Object Constraint Language: Getting Your Models Ready for MDA, 2nd ed., ser. The Addison-Wesley Object Technology Series. Boston, Massachusetts, USA: Addison-Wesley, 2003.

[14] A. Rodriguez, E. Fernandez-Medina, and M. Piattini, "Towards a UML 2.0 Extension for the Modeling of Security Requirements in Business Processes," in Third International Conference on Trust and Privacy in Digital Business (TrustBus), Krakow, Poland, 2006, pp. 51–61.

[15] V. Stefanov, B. List, and B. Korherr, "Extending UML 2 Activity Diagrams with Business Intelligence Objects," in Proceedings of Data Warehousing and Knowledge Discovery (DaWaK 2005), Copenhagen, Denmark, 2005, pp. 53–63.

[16] H. Baumeister, N. Koch, P. Kosiuczenko, and M. Wirsing, "Extending Activity Diagrams to Model Mobile Systems," in Proceedings of NetObjectDays (NOD), Erfurt, Germany, 2002, pp. 278–293.

[17] D. F. Ferraiolo, R. Sandhu, S. Gavrila, D. R. Kuhn, and R. Chandramouli, "Proposed NIST Standard for Role-Based Access Control," ACM Transactions on Information and System Security, vol. 4, no. 3, pp. 224–274, 2001.

[18] M. L. Damiani, E. Bertino, and P. Perlasca, "Data Security in Location-Aware Applications: An Approach Based on RBAC," International Journal of Information and Computer Security, vol. 1, no. 1/2, pp. 5–38, 2007.

[19] S. M. Chandran and J. Joshi, "LoT-RBAC: A Location and Time-Based RBAC Model," in Proceedings of the 6th International Conference on Web Information Systems Engineering (WISE '05). New York, USA: Springer, 2005, pp. 361–375.

[20] F. Hansen and V. Oleshchuk, "SRBAC: A Spatial Role-Based Access Control Model for Mobile Systems," in Proceedings of the 7th Nordic Workshop on Secure IT Systems (NORDSEC). Gjøvik, Norway: NTNU, 2003, pp. 129–141.

[21] M. Decker, "Prevention of Location-Spoofing. A Survey on Different Methods to Prevent the Manipulation of Locating-Technologies," in Proceedings of the International Conference on e-Business (ICE-B). Milan, Italy: INSTICC, 2009, pp. 109–114.

[22] R. Hewett and P. Kijsanayothin, "Location contexts in role-based security policy enforcement," in Proceedings of the 2009 International Conference on Security and Management (SAM'09), Las Vegas, Nevada, USA, 2009, pp. 404–410.

[23] M. Decker, "An Access-Control Model for Mobile Computing with Spatial Constraints – Location-aware Role-based Access Control with a Method for Consistency Checks," in Proceedings of the International Conference on e-Business (ICE-B 2008). Porto, Portugal: INSTICC, July 2008, pp. 185–190.

[24] U. Leonhardt and J. Magee, "Security Considerations for a Distributed Location Service," Journal of Network and Systems Management, vol. 6, no. 1, pp. 51–70, 1998.

[25] I. Ray and M. Kumar, "Towards a Location-based Mandatory Access Control Model," Computers & Security, vol. 25, no. 1, pp. 36–44, 2006.

[26] M. Decker, "Mandatory and Location-Aware Access Control for Relational Databases," in Proceedings of the International Conference on Communication Infrastructure, Systems and Applications in Europe (EuropeComm 2009), ser. LNICST, R. M. et al., Ed., no. 16. London, U.K.: Springer, August 2009, pp. 217–228.

[27] OMG, Business Process Model and Notation (BPMN) v. 1.2, Object Management Group, January 2007.

[28] M. Decker, H. Che, A. Oberweis, P. Stürzel, and M. Vogel, "Modeling Mobile Workflows with BPMN," in Proceedings of the Ninth International Conference on Mobile Business (ICMB 2010)/Ninth Global Mobility Roundtable (GMR 2010). Athens, Greece: IEEE, 2010, pp. 272–279.

[29] M. Decker, D. Eichhorn, E. Georgiew, A. Oberweis, J. Plaßmann, T. Steckel, and P. Stürzel, "Modelling and enforcement of location constraints in an agricultural application scenario," in Proceedings of the Conference on Wireless Applications and Computing 2010 (WAC 2010), Freiburg, Germany, 2010, accepted, to appear.


Network Prediction for Energy-Aware Transmission in Mobile Applications

Ramya Sri Kalyanaraman

Helsinki Institute for Information Technology HIIT

P.O. Box 15400, FI - 00076 Aalto

Espoo, Finland

[email protected]

Yu Xiao, Antti Ylä-Jääski

Aalto University

P.O. Box 15400, FI-00076 Aalto

Espoo, Finland

{yu.xiao, antti.yla-jaaski}@tkk.fi

Abstract—Network parameters such as signal-to-noise ratio (SNR), throughput, and packet loss rate can be used for measuring wireless network performance, which highly depends on the wireless network conditions. Previous works on energy consumption have shown that the performance of wireless networks has an impact on the energy efficiency of data transmission. Hence, it is possible to gain energy savings by adapting the data transmission to the changing network conditions. This adaptation requires accurate and energy-efficient prediction of the network performance parameters. In this paper, we focus on the prediction of SNR and on prediction-based network adaptations for energy savings. Based on SNR data sets collected from diverse real-life networks, we first evaluate three prediction algorithms, namely, Autoregressive Integrated Moving Average, Newton's Forward Interpolation, and Markov Chain. We compare these three algorithms in terms of prediction accuracy and energy overhead. We then propose a threshold-based adaptive policy which controls the data transmission based on the predicted SNR values. To evaluate the effectiveness of using network prediction in adaptation, we use FTP as a case study and compare the network goodput and energy consumption under different network conditions. The experimental results show that the usage of adaptations improves the network goodput. Furthermore, the adaptations using prediction can save up to 40% energy under specific network conditions when compared to the adaptation without prediction.

Keywords-prediction; adaptation; SNR; power; context-awareness; policy-based.

I. INTRODUCTION

Mobile hand-held devices are increasingly used for accessing rich multimedia content and various social media services. These new applications and services often call for more transmission capacity, processing power and high-quality displays, which increases the energy usage. The improved user experience might be compromised by the short battery lifetime of the device. Unfortunately, improvements in battery technology are so slow that there is a strong motivation to invest in software techniques to reduce the energy usage of mobile devices.

In network applications, a major part of the energy is consumed by data transmission. The cost incurred during data transmission depends not only on the amount of data being transferred, but also on the network goodput [2]. There is an indirect connection between the network goodput and other network performance parameters such as the signal-to-noise ratio (SNR). For example, in wireless networks with a higher SNR, the network goodput is usually higher, and therefore the energy consumption per unit of transmitted data is lower. Hence, it is possible to save energy by transferring data only under conditions where the SNR values are high.

In this paper we focus on the prediction of SNR and its potential for improving the efficiency of energy-aware network adaptation. With the help of predicted future SNR values, it is possible to make a priori decisions on when to schedule data transmission over a wireless link. We propose a general solution for prediction-based adaptive network transmission, including SNR prediction algorithms and an adaptive policy. We evaluate our solution using FTP as a case study; the solution can be extended to other network applications, such as real-time streaming, by taking application-specific performance constraints into account in the adaptive policy.

Although the prediction of network parameters [3, 4] and their use in context-aware mobile applications have been discussed in the literature [5, 6, 7, 8, 9, 10], using predicted network parameters for energy efficiency has not been widely studied, especially the usage of SNR for adaptive network transmission. Our contributions include the following aspects.

a) Comparing the performance of the prediction algorithms, namely, Auto Regressive Integrated Moving Average (ARIMA), Newton's Forward Interpolation (NFI) and Markov Chain (MC). We analyze the performance in terms of prediction accuracy, processing load and energy overhead.

b) Proposing and evaluating a prediction-based power adaptation. The results show that the potential energy saving under favorable network conditions can be up to 40% with the help of the predicted SNR.

This work is an extended version of our earlier publication [1]. Major portions of the experimental results have been updated, with enhancements in the offline training and online computation. In this work, we refine the evaluation by classifying the network conditions and analyzing the effectiveness of energy savings under different network conditions. In addition, the Linear Regression method used in [1] is replaced by MC, as MC shows better performance in short-term prediction.

The remainder of this paper is structured as follows. Section 2 reviews the related work in network prediction and


prediction-based adaptations. The prediction algorithms used in this study are introduced in Section 3, and the model fitting using these algorithms is described in Section 4. Section 5 describes a prediction-based adaptation. In addition, Section 5 includes the evaluation of the adaptation using FTP as a case study. In Section 6, we discuss the obtained experimental results. Finally, we present the concluding remarks and potential future work in Section 7.

II. RELATED WORK

Adaptive mobile applications are aware of and able to adapt to the changes in context such as the network signal strength, residual battery lifetime, and the user’s location. The key for providing such adaptive applications lies in the decision-making of adaptations based on the available context data and predefined policies.

The context information includes the past, present, and future states of the contexts. Obviously, the past and present states of the contexts can be collected during runtime, whereas the future states are not available and have to be predicted based on the past and/or present states. The predicted states of the contexts are often used in context-aware applications for enhancing the effectiveness of adaptations, such as improving the resource utility and system performance. In [3] prediction of long range wireless signal strength is presented, where prediction is utilized for efficient use of resources in dynamic route selection in multi-hop networks.

For network-aware adaptive applications, network conditions are the most important contexts and can be described using signal strength, SNR, goodput, bit error rate and other contexts that reflect the status of the network transmissions. The so-called Box-Jenkins approach is the most widely used technique for network prediction. The basic idea is to train models with different parameters to fit the pre-collected data sets. This approach is adopted by many linear and non-linear models, such as ARIMA [13] and Generalized Auto Regressive Conditional Heteroskedasticity (GARCH) [16]. In addition, Linear Regression [3], the MC [4], the Hidden Markov Model [7], and the Artificial Neural Network [17] have also been proposed for network prediction. In this paper, we apply ARIMA, MC and NFI [14] to the prediction of network SNR.

As mentioned before, network conditions can be described using various parameters. Different prediction algorithms might fit different contexts better, depending on the characteristics of the contexts and of the algorithms themselves. Previous researchers have applied different prediction algorithms to the prediction of different contexts. For example, [4] and [20] predict the status of network connectivity and the future location of a mobile device using the MC, [3] utilized a linear regression approach for long-range signal strength prediction, and [21] used the Hidden Markov Model to predict the number of wireless devices connected to an access point at a given time. Link prediction and path analysis using Markov chains in the World Wide Web are presented in [22].

Different prediction techniques are compared in terms of k-step-ahead predictability and performance overhead. For example, one-step-ahead predictability can be measured by the Mean Square Error (MSE), the Normalized Mean Square Error (NMSE) [3], the Root Mean Square Error (RMSE) [11] and the Signal-to-Error Ratio (SER). The performance overhead includes the computational cost, such as CPU cycles and memory access counts, and also the energy consumption caused by the prediction itself.

In most of the research work focusing on prediction for context-aware and adaptive applications, the motivation lies in increasing and analyzing the prediction accuracy of various prediction methods. In [11], various prediction methods such as neural networks, Bayesian networks, and state and Markov predictors are modeled for next-location prediction using sequences of previously visited locations; prediction accuracy is considered there as the main indicator for benchmarking the performance of the various prediction techniques. To the best of our knowledge, our paper is the first one presenting the energy overhead of different prediction algorithms.

Adaptive applications purely based on policies consist of condition-action pairs for different contexts. Here, the possibility of anticipating the future trend of the context is not utilized. With the support of prediction-based adaptation, the system is given a chance to know the future trend of the context and to act accordingly.

Most of the existing adaptive network applications make adaptive decisions according to predefined policies. Predicted network contexts can be used as conditions of adaptive policies. In [18], the feasibility of implementing policy-based network adaptations in a middleware architecture is presented. However, prediction-based network adaptations emphasizing possible energy conservation have not been widely studied. In this paper, we propose an adaptive file download in WLAN based on the prediction of the network SNR, and compare the efficiency of prediction and adaptations among three different prediction techniques.

III. NETWORK PREDICTION ALGORITHMS

Dynamic adaptation to the environment and proactive behavior are key requirements for mobile adaptive applications. Prediction algorithms play a major role in providing a proactive nature for such applications [5]. In this paper we focus on predicting the network SNR that will be used for adaptive network applications.

In our approach, the input data we observe to make the network prediction follows a time series pattern [11]. We choose ARIMA [12], the most general model for forecasting a time series. Selecting NFI [14] allows us to fit non-linear data over a curve. In addition, we experiment with MC, representing a discrete-time stochastic process. While evaluating the best-suited algorithm for our application, a trade-off is made between prediction accuracy, processing load and energy consumption, instead of considering only the highest accuracy or the lowest energy consumption.


A. Auto Regressive Integrated Moving Average (ARIMA)

Classical linear time series models are based on the idea that the current value of the series can be explained as a function of the past values of the series and some other independent variables. ARIMA is one of the most famous linear predictive models used for network prediction. It is an integration of three models, namely, an autoregressive model of order p, a moving average model of order q, and a differencing model of order d. According to the definition in [18], a process $Y_t$ is said to be ARIMA(p, d, q) if

$\phi(B)(1-B)^d Y_t = \theta(B)\varepsilon_t$, (1)

where B is the backward shift operator, $\phi(B)$ is the autoregressive operator, $\theta(B)$ is the moving average operator, and $\varepsilon_t$ is assumed to be Gaussian white noise. The three operators can be expressed as below:

$B^k Y_t = Y_{t-k}$, (2)

$\phi(B) = 1 - \phi_1 B - \phi_2 B^2 - \cdots - \phi_p B^p$, (3)

$\theta(B) = 1 + \theta_1 B + \theta_2 B^2 + \cdots + \theta_q B^q$, (4)

where $\phi_1, \ldots, \phi_p$ and $\theta_1, \ldots, \theta_q$ are constants.

Let $\hat{Y}(t)$ be the predicted SNR value at time t, and $Y(t-i)$ be the observed SNR value at time $t-i$. The time series of SNR is considered to be an ARIMA(p, d, q) model if

$\hat{Y}(t) = c + \sum_{i=1}^{p}\phi_i Y(t-i) + \sum_{j=1}^{q}\theta_j \varepsilon(t-j)$, (5)

where $\varepsilon(t-j)$ is the residual of prediction at time $t-j$, and c is the intercept.

The values of p, d, q are identified based on the analysis of the Autocorrelation Function (ACF) and the Partial Autocorrelation Function (PACF), while the estimation of the parameters $\phi_i$, $\theta_j$, and c is based on the least squares method, which aims at minimizing the MSE of the residuals as shown in Equation (6). The procedure of model fitting will be detailed in Section 4.

$MSE = \frac{1}{N}\sum_{t=1}^{N}\left(Y(t) - \hat{Y}(t)\right)^2$ (6)
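As a step-by-step illustration of the recursion in Equation (5) for the (1, 0, 1) case used later in the paper, the following minimal Python sketch applies one-step-ahead prediction to a series of SNR observations. This is not the authors' implementation; the coefficients c, phi and theta are placeholder values standing in for the offline-estimated parameters.

```python
# Minimal sketch of one-step-ahead ARIMA(1, 0, 1)-style prediction.
# The coefficients c, phi and theta are placeholders; in the paper they
# are estimated offline by the least squares method.
def arima_101_predict(y, c=0.0, phi=0.9, theta=0.1):
    """Return one-step-ahead predictions for the observation series y."""
    preds = [y[0]]   # no history before the first sample: repeat it
    resid = 0.0      # epsilon(0) is unknown, so start from zero
    for t in range(1, len(y)):
        y_hat = c + phi * y[t - 1] + theta * resid
        preds.append(y_hat)
        resid = y[t] - y_hat   # residual feeds the MA(1) term
    return preds
```

For example, with c = 1.0, phi = 0.9 and theta = 0.0, a constant series of 10 dB observations is predicted as 10.0 dB at every step.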

B. Newton's Forward Interpolation (NFI)

The observation at time x in a time series is defined as a function f(x), and the values of x are tabulated at interval h as shown in Equation (7):

$x_i = x_0 + ih$, (7)

where the first value of x is $x_0$, and i is the number of intervals between x and $x_0$. When only a few discrete values of f(x) are known, NFI aims at finding the general form of f(x) based on the known values by using the finite difference formulas.

Let h be the tabulation interval; the first-order forward difference of a function f is defined by $\Delta f(x) = f(x+h) - f(x)$. Higher orders, such as the jth-order forward difference $\Delta^j f(x)$, can be obtained by repeating the operation of the forward difference operator j times. For example,

$\Delta^2 f(x) = \Delta f(x+h) - \Delta f(x) = f(x+2h) - 2f(x+h) + f(x)$. (8)

The NFI model fits the observation at time x with an ith-degree polynomial as in Equation (9). With $u = (x - x_0)/h$,

$f(x) \approx f(x_0) + u\,\Delta f(x_0) + \frac{u(u-1)}{2!}\Delta^2 f(x_0) + \cdots + \frac{u(u-1)\cdots(u-i+1)}{i!}\Delta^i f(x_0)$. (9)

When only the past i SNR observations are known, forward differences up to order (i-1) can be developed. The prediction of SNR at time x, corresponding to the (i+1)th observation, is defined as a function g(x) in Equation (10):

$g(x) = \sum_{k=0}^{i-1} \frac{u(u-1)\cdots(u-k+1)}{k!}\Delta^k f(x_0)$. (10)

Compared to f(x), the error term e(x) is approximated by the first truncated term, as in Equation (11):

$e(x) \approx \frac{u(u-1)\cdots(u-i+1)}{i!}\Delta^i f(x_0)$. (11)

Different from ARIMA, NFI can predict future values online directly based on past observations, without offline training. Since the prediction accuracy of NFI depends on the size of N, N can be selected by applying the least squares method to the training data sets, as used in ARIMA model fitting. The offline training and online implementation are described in Section 4.

C. Markov Chain (MC)

MC is a discrete-time stochastic process with the Markov property: given the present state, the future and past states are independent. To determine the future state from the present state, a probabilistic approach is used. A state space is defined in which all the possible states of the system are represented. The change from one state to another is called a state transition. Let us consider a discrete-time process $\{X_n : n > 0\}$ with discrete state space $\{i, j\}$. The transition probability from i to j is computed as given in Equation (12):

$p_{ij} = \Pr\{X_{n+1} = j \mid X_n = i\}$. (12)


The state transitions involved in our model, and the online prediction of the state transition, are explained in Section 4.

IV. NETWORK PREDICTION

Prediction of network-related information adds value to ubiquitous applications, as the need for dynamic adaptation in a fluctuating network environment becomes a major requirement in such cases. In addition to network traffic and mobility prediction, predicting the value of the SNR itself can provide a timely hint about possible changes in network conditions, so that the application can prepare itself for adaptation.

In this section, we present the steps involved in offline training and online computation of the prediction algorithms. The workflow is described in Figure 1. To perform offline training we carry out data collection; the spots chosen for data collection involve physical interferences and disturbances. Offline training enables us to estimate the parameters required for online computation, and the performance of the prediction algorithms can also be analyzed during offline training. In online computation, the prediction algorithms are applied during runtime and the outcomes are matched against the adaptation policy.

A. Data Collection

The offline training used in our methodology significantly depends on the data collection. The data sets were gathered considering the following factors.

a) The user walked around a defined path, where the path involved different types of physical obstacles, interferences and noise.

b) A total of 12 data traces were collected at different times during three days, where the collection of each set lasted for 17 minutes.

c) The user walked at a normal walking speed, and the signal strength and the noise level were recorded every 100 milliseconds.

We performed the data collection in the Computer Science department building of the university. Figure 2 shows the route map of the data collection. The defined path covered the first and second floors of the Computer Science department, including the corridor of the Data Communications Software lab, the department library, the café and the stairs to the second floor. To cover different network environments, we chose the path to include different types of physical obstacles and interference. For example, the department corridor contains physical obstacles such as wall structures and glass doors, and the library has metallic book shelves and wooden structures. The open area in the café remains a place with possible interference due to multiple access points and the significant presence of Bluetooth-enabled devices.

B. Data Analysis and Pre-processing

By observing the collected data sets, we noticed that the SNR values remain unchanged for several milliseconds, as shown in Figure 3. We therefore re-sampled the collected data sets at 1 Hz, and used the new data sets for further processing. We chose 80% of the samples as the training data set, and the rest for testing.
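The re-sampling and 80/20 split can be sketched as below; the 10:1 down-sampling factor is an assumption derived from the 10 Hz collection rate and the 1 Hz re-sampling rate mentioned in the text.

```python
# Illustrative pre-processing helper: down-sample a 10 Hz SNR trace to
# 1 Hz and split it 80/20 into training and testing sets.
def resample_and_split(samples_10hz, factor=10, train_fraction=0.8):
    resampled = samples_10hz[::factor]   # keep every 10th sample -> 1 Hz
    cut = int(len(resampled) * train_fraction)
    return resampled[:cut], resampled[cut:]
```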

Data smoothing is often used for figuring out future trends, such as the trend in market size or stock prices. However, for short-term prediction like the SNR prediction in our case, it remained an interesting question whether data smoothing still helps to improve the prediction accuracy. We therefore applied data smoothing over the training data set, and compared the prediction accuracy between the models using the smoothed data set and the original training data set.

Figure 1: Workflow of offline training and online computation.

Figure 2: Route map for data collection.



Figure 3: Samples of SNR values collected at 10Hz.

Figure 4: Residuals of smoothing with window size 20.

Table 1: Comparison of prediction accuracy among different models.

Model          | MSE      | NMSE     | SER
ARIMA(0, 1, 1) | 97.58624 | 0.425374 | 7.456893
ARIMA(1, 0, 1) | 89.26294 | 0.389093 | 7.844067
ARIMA(2, 1, 0) | 174.4522 | 0.760430 | 4.934015
NFI (N=3)      | 692.8996 | 3.015758 | -1.0487
NFI (N=4, 5)   | 99.3080  | 0.432879 | 7.380937

To perform data smoothing, we apply the simple Moving Average algorithm [23], which is used for time series smoothing in the field of statistics. According to the moving average algorithm, from the collected N data points we perform averaging over every W data points, where W is often termed the window size. The general expression for the moving average is given in Equation (13):

$MA_t = \frac{1}{W}\sum_{k=0}^{W-1} x_{t-k}$ (13)

In general, as the window size increases, the trend of the smoothed data set becomes clearer, but with a risk of a shift in the function. To choose an optimal window size we performed smoothing with varied window sizes, and selected the best one with the minimum MSE of the residuals. The residuals are the differences between the original values and the smoothed values. According to our experimental results, smoothing with a window size of 20 has the minimum MSE, which is 9.5, as shown in Figure 4.
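The smoothing step and the residual-MSE window-selection rule can be sketched as below. A trailing moving-average window is assumed (the paper does not state the alignment), and the candidate window sizes are illustrative.

```python
def moving_average(data, w):
    """Trailing simple moving average with window size w."""
    out = []
    for t in range(len(data)):
        lo = max(0, t - w + 1)          # shorter windows at the head
        out.append(sum(data[lo:t + 1]) / (t + 1 - lo))
    return out

def residual_mse(data, w):
    """MSE of the residuals between original and smoothed values."""
    smoothed = moving_average(data, w)
    return sum((a - b) ** 2 for a, b in zip(data, smoothed)) / len(data)

def best_window(data, candidates=(5, 10, 20, 40)):
    """Pick the candidate window size with the minimum residual MSE."""
    return min(candidates, key=lambda w: residual_mse(data, w))
```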

C. Offline training/Model Fitting

1) ARIMA

ARIMA offline training fits an ARIMA(p, d, q) model to the collected data sets. The outputs of the model fitting include the orders of the autoregressive, differencing and moving average parts, p, d, q, and the estimated values of the autoregressive parameters, the moving average parameters, and the intercept, $\phi_i$, $\theta_j$, and c, as defined in Section 3.1. The model fitting includes the following five basic steps [18].

a) Data transform. We plotted the data and observed possible data transforms. As introduced in Section 4.2, we applied smoothing to the raw data, trained the models using both data sets, and compared the results.

b) Model identification. The orders of autoregressive, differencing and moving average are identified by observing the ACF and PACF of the residuals. For the original data sets, both ARIMA(0, 1, 1) and ARIMA(1, 0, 1) fit the data well, and the differences in terms of the ACF and PACF of the residuals are negligible. Hence we trained both models and chose the better one after diagnostics. Similarly, for the smoothed data sets with window size 20, ARIMA(2, 1, 0) fits better than the other models.

c) Parameter estimation. We estimated the parameters of the candidate models so as to obtain the minimum MSE.

d) Diagnostics. We applied the models obtained from the last step to the testing data sets and compared the accuracy among the models. First, we calculated the MSE, NMSE, and SER for each model. NMSE is the MSE normalized by the variance of the actual data, as defined in Equation (17):

$NMSE = \frac{MSE}{\sigma^2}$, (17)

where $\sigma^2$ is the variance of the actual data. SER is defined as in Equation (18):

$SER = 10\log_{10}\left(\frac{\frac{1}{N}\sum_{t=1}^{N}Y(t)^2}{MSE}\right)$ (18)

The smaller the MSE and NMSE are, the higher the accuracy is. Conversely, the bigger the SER is, the higher the accuracy is. The results of MSE, NMSE and SER for the three ARIMA models are listed in Table 1.
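The three diagnostics can be computed as below. MSE and NMSE follow Equation (17); for SER we assume the mean signal power normalized by the MSE, expressed in dB, which is consistent with the relationships between the columns of Table 1.

```python
import math

def accuracy_metrics(actual, predicted):
    """Return (MSE, NMSE, SER) for a testing series and its predictions."""
    n = len(actual)
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n
    mean = sum(actual) / n
    var = sum((a - mean) ** 2 for a in actual) / n
    power = sum(a * a for a in actual) / n    # mean signal power
    nmse = mse / var                          # Equation (17)
    ser = 10 * math.log10(power / mse)        # assumed form of Eq. (18), in dB
    return mse, nmse, ser
```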

e) Model selection. According to Table 1, ARIMA(2, 1, 0), trained on the smoothed data set, showed the lowest accuracy compared to the models obtained from the original data sets. ARIMA(1, 0, 1) fits the testing data sets better than the others due to the smallest MSE and NMSE, as well as the highest SER. Hence we chose ARIMA(1, 0, 1), shown in Equation (14), for online SNR prediction:

$\hat{Y}(t) = c + \phi_1 Y(t-1) + \theta_1 \varepsilon(t-1)$ (14)



Figure 5: Transition probabilities between states S1 and S2.

The implementation and online performance evaluation are presented in Section 4.3.

2) NFI

The accuracy of NFI prediction depends on the order of

the forward difference N as explained in Section 3.3. We

chose the size of N based on the least square method as used

for ARIMA model fitting. As shown in Table 1, we

compared the MSE when N is set to 3, 4, and 5 respectively.

When N is 3, the MSE is much larger than for the others, while
identical predicted values, and hence identical MSE, NMSE, and
SER values, are obtained for N = 4 and N = 5. Hence, we

selected the order of the forward difference as 4 for online

prediction.

3) Markov Chain

We designed a two-state Markov Chain model as shown

in Figure 5, where S1 and S2 are the two possible states.

Here S1 represents an SNR value less than or equal to the
threshold, and S2 corresponds to an SNR greater than the

threshold. Either S1 or S2 can represent the initial state

depending upon the value of the SNR at the start of

application.

There are four possible state transitions involved in the

state machine, with four different transition probabilities.

Let St and St+1 represent the states of the SNR at times t and t+1.
The four possible state transitions are given below:

SNR low, low (t) = Pr {St+1 = low | St = low},

SNR low, high (t) = Pr {St+1 = low | St = high},

SNR high, high (t) = Pr {St+1 = high | St = high},

SNR high, low (t) = Pr {St+1 = high | St = low}.

The values of the four probabilities are obtained from

the training of the data sets. During the training, we set a

threshold for discretizing the samples into low and high

states to 19, and calculated the probability distribution. The

values of the four probabilities are shown in Figure 5. The

precision of predicted outcomes using the Markov chain is

0.8191 as calculated according to Equation (19).

Precision = (number of correctly predicted states) / (total number of predictions). (19)
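The training step described above, discretizing a trace at the threshold (19 in the paper) and counting transitions, can be sketched as follows; function and variable names are illustrative, not from the original implementation:

```python
def train_markov(snr_trace, threshold=19):
    """Estimate the four transition probabilities from an SNR trace."""
    # Discretize each sample into a low/high state at the threshold.
    states = ['low' if s <= threshold else 'high' for s in snr_trace]
    counts = {('low', 'low'): 0, ('low', 'high'): 0,
              ('high', 'low'): 0, ('high', 'high'): 0}
    # Count consecutive state pairs.
    for prev, nxt in zip(states, states[1:]):
        counts[(prev, nxt)] += 1
    # Normalize each row of the transition matrix.
    probs = {}
    for prev in ('low', 'high'):
        total = counts[(prev, 'low')] + counts[(prev, 'high')]
        for nxt in ('low', 'high'):
            probs[(prev, nxt)] = counts[(prev, nxt)] / total if total else 0.0
    return probs
```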

D. Online prediction

In this section we present the online SNR prediction with the
same monitoring and prediction frequencies, and set the
initial values of both frequencies to 1 Hz.

1) ARIMA(1, 0, 1)

The monitored and predicted SNR values are recorded in

arrays s[] and p[] respectively. The pseudo code of ARIMA

(1, 0, 1) online prediction is described as below.

//initialize the parameters of ARIMA(1,0,1);
Set ar=0.7997, ma=-0.0052, intercept=22.1650;
Initialize s[], p[] to 0;
i=0;
Repeat every 1 second{
    Monitor SNR and write into s[i];
    if (i>0){
        err = s[i-1]-p[i-1];
        p[i] = intercept + ar*(s[i-1]-intercept) + ma*err;
    }
    i++;
}
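A runnable rendering of this pseudo code, with monitoring replaced by a list of samples (a sketch, not the authors' implementation):

```python
# Fitted ARIMA(1,0,1) parameters from the paper.
AR, MA, INTERCEPT = 0.7997, -0.0052, 22.1650

def arima_predict(s, p, i):
    """One-step-ahead prediction for sample i from the histories s, p."""
    err = s[i - 1] - p[i - 1]                  # last prediction error
    return INTERCEPT + AR * (s[i - 1] - INTERCEPT) + MA * err

def arima_run(samples):
    """Replay the 'repeat every 1 second' loop over recorded samples."""
    s, p = [], []
    for i, snr in enumerate(samples):
        s.append(snr)                          # "Monitor SNR"
        p.append(arima_predict(s, p, i) if i > 0 else 0.0)
    return p
```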

2) NFI

Set the order of forward difference to be 4. The

monitored and predicted SNR values are recorded in arrays
s[] and p[] respectively. The 1st-, 2nd-, and 3rd-order forward

differences are calculated during runtime, and are written

into arrays delta[], delta2[], and delta3[] respectively. The

pseudo code of the online NFI prediction is described as

below.

Initialize delta[], delta2[], delta3[], p[], s[] to 0;
i=0;
Repeat every 1 second{
    Monitor SNR and write into s[i];
    if (i>0)
        delta[i] = s[i]-s[i-1];
    if (i>1)
        delta2[i] = delta[i]-delta[i-1];
    if (i>2)
        delta3[i] = delta2[i]-delta2[i-1];
    if (i>3)
        p[i] = s[i-4]+3*delta[i-3]+3*delta2[i-2]+delta3[i-1];
    i++;
}
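The NFI pseudo code can likewise be rendered as runnable code (a sketch that mirrors the pseudo code exactly; note that p[i] only uses samples up to s[i-1], so the current measurement never feeds its own prediction):

```python
def nfi_run(samples):
    """Replay the NFI loop over recorded samples and return predictions."""
    n = len(samples)
    s, p = samples, [0.0] * n
    delta  = [0.0] * n   # 1st-order forward differences
    delta2 = [0.0] * n   # 2nd-order forward differences
    delta3 = [0.0] * n   # 3rd-order forward differences
    for i in range(n):
        if i > 0:
            delta[i] = s[i] - s[i - 1]
        if i > 1:
            delta2[i] = delta[i] - delta[i - 1]
        if i > 2:
            delta3[i] = delta2[i] - delta2[i - 1]
        if i > 3:
            p[i] = s[i - 4] + 3 * delta[i - 3] + 3 * delta2[i - 2] + delta3[i - 1]
    return p
```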

3) Markov Chain

The Markov chain SNR predictor computes the probable

next state (St+1), given the current state (St) of SNR,

according to the probability of different transitions obtained

[Figure 5: two-state diagram with P(S1→S1)=0.983, P(S1→S2)=0.017, P(S2→S2)=0.978, P(S2→S1)=0.022.]


Table 2: Comparison of performance and power overhead among different models.

                           ARIMA   NFI     MC
CPU_CYCLES:100000          445     432     623
DCACHE_ACCESS_ALL:100000   31      25      73
Power (W)                  0.365   0.366   0.474

Table 3: Classification of the network conditions.

Type 1: SNR mean is smaller than 15; SNR threshold is set to 15.
Type 2: SNR mean is between 15 and 20; SNR threshold is set to 15 and 20.
Type 3: SNR mean is bigger than 20 and the standard deviation of SNR is smaller than 5; SNR threshold is set to 20.
Type 4: SNR mean is bigger than 20 and the standard deviation of SNR is not smaller than 5; SNR threshold is set to 15 and 20.

from offline training. The pseudo code for predicting the

transition probability of the future state of SNR is given as

follows:

//Monitor SNR and discretize the received value.

Get current_SNR;

if (current_SNR <= Threshold)

Current_state = S1;

else current_state = S2;

//Determine next state. SNRt[] is the transition matrix
obtained from offline training. SNRt[0] and SNRt[1] hold the
probabilities of remaining in state S1 and in state S2,
respectively.

Generate a random number, X.

if (current_state == S1)

{

if (X ≤ SNRt[0])

next_state = S1;

else next_state = S2;

}

else

{

if (X ≤ SNRt[1])

next_state = S2;

else next_state = S1;

}
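A runnable sketch of this step, assuming SNRt[] holds the self-transition probabilities from Figure 5 (which matches how the comparisons against SNRt[0] and SNRt[1] are used above; the rng parameter exists only to make the sketch testable):

```python
import random

# Self-transition probabilities (stay in S1, stay in S2) from Figure 5.
SNR_T = [0.983, 0.978]

def next_state(current_snr, threshold=19, rng=random.random):
    """Sample the probable next state given the current SNR reading."""
    current = 'S1' if current_snr <= threshold else 'S2'
    x = rng()                      # random number X in [0, 1)
    if current == 'S1':
        return 'S1' if x <= SNR_T[0] else 'S2'
    return 'S2' if x <= SNR_T[1] else 'S1'
```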

E. Model Evaluation

We measured the overhead of online SNR prediction in

terms of CPU cycle count, data cache access rate and power

consumption. We ran oprofile [26] on the Nokia N810 while

running the online prediction algorithms for 1 minute, and

monitored the events, namely, CPU_CYCLES and

DCACHE_ACCESS_ALL. The CPU cycle count and data

cache access counts caused by the SNR prediction are listed

in Table 2.

We used Nokia power measurement software to measure

the power consumption during runtime. During

measurements the display was turned off and the WLAN

interface was turned on. The average power is 0.361 W

when the system is idle and the power saving mode of

WLAN is turned on, and 0.362 W when SNR monitoring is

running at 1Hz at the same time. We measured the average

power consumption when different online SNR predictors

were running, and the comparisons are listed in Table 2. The
power overhead is calculated as the difference between the
power consumption when a prediction algorithm is

running and the power consumption of the device when it is

in IDLE state. The power overhead caused by online

prediction is between 0.004W and 0.113W depending on the

prediction algorithm used.
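The quoted overhead range follows directly from Table 2 and the 0.361 W idle baseline (a worked check, not part of the measurement setup):

```python
IDLE_POWER = 0.361  # W, idle with the WLAN power saving mode on

def power_overhead(model_power):
    """Overhead of a predictor: measured power minus the idle baseline."""
    return model_power - IDLE_POWER

# Measured average power per predictor, from Table 2.
overheads = {m: round(power_overhead(p), 3)
             for m, p in {'ARIMA': 0.365, 'NFI': 0.366, 'MC': 0.474}.items()}
# ARIMA: 0.004 W, NFI: 0.005 W, MC: 0.113 W
```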

V. PREDICTION-BASED NETWORK ADAPTATIONS

Wireless transmission is considered to cost much more energy than local processing on mobile devices [15], and the energy consumed by wireless transmission varies with the performance of the wireless network, which in turn depends heavily on the network conditions. Generally, wireless transmission under better network conditions is more energy-efficient. Hence, adaptive mobile applications are expected to schedule wireless transmission so that it takes place under relatively good network conditions in order to save energy. In this section, we describe an application performing power adaptation using network prediction.

A. Overview

Based on the comparison and analysis of prediction algorithms made in Section 4, we are able to distinguish the pros and cons of the different prediction methods, especially in terms of prediction accuracy, performance, and energy overhead. After performing the offline training and the analysis of the different prediction methods, we continue to the second phase of network prediction: online computation. In this section we present a real-time user scenario that demonstrates the need for adaptation in ubiquitous applications. Recognizing the requirements for network adaptation from the presented scenario, we apply network prediction in the application.

Let us consider a practical scenario showcasing the need for prediction-enabled adaptive applications. Alice's mobile phone is connected to a WLAN network and she is downloading a file. The network quality experienced by the device fluctuates between poor and strong. With the help of prediction, the application foresees the status of the network and chooses an adaptive action accordingly in order to obtain the optimal power saving. The adaptive action can be pausing, stopping, or continuing the file download.

78

International Journal on Advances in Telecommunications, vol 3 no 1 & 2, year 2010, http://www.iariajournals.org/telecommunications/

2010, © Copyright by authors, Published under agreement with IARIA - www.iaria.org

Page 86: The - IARIA Journals · Libraries are permitted to photocopy or print, ... Rana Hamed, German University in Cairo (GUC), Egypt 9 - 27 Exploiting Concatenation in the Design of Low-Density

Figure 6: Flow chart representing online adaptation

B. Adaptation Policy

Adaptation policy defines the actions to be performed based on the changing context data. Adaptation is triggered in mobile applications when the outcome of the prediction matches a simple predefined adaptation policy. The adaptation policy can be defined by either the application developer or the end-user. In the case of the end-user, policies can take the form of user preferences for adaptive applications. The priority and conflict management between prediction results and the current network condition, however, are outside the scope of this paper.

In the file download scenario, we designed an adaptive

policy which enables the application to pause or continue

the file download depending on the predicted future SNR.

The basic idea is to transfer the data only when the SNR is

above a threshold. We classify the network condition into

four types using the mean and standard deviation of SNR as

described in Table 3. The threshold value is set depending

on the type of the network condition. It is described as

below.

if ((predicted_SNR < threshold) && (current_state == downloading))
    Pause download;
else if ((predicted_SNR ≥ threshold) && (current_state == hold))
    Continue download;
else
    // Keep the current action state.
    Continue download | Pause download;

While applying the above adaptive policy to applications such as real-time streaming, which are not delay-tolerant, the setting of the threshold value must take the tolerable delay into account. For example, the delay caused by the pause operation can be predicted based on the predicted SNR value; only when the delay is tolerable can the download be paused. The prediction-based adaptation continues until the file download is completed. Figure 6 describes the online adaptation performed in the file download scenario using the online prediction and the defined policy. The implementation for online computation of the prediction forecast is combined with the adaptation code. Thus the prediction outcome is used directly to choose the adaptive operation executed in the file download application.
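The per-prediction policy decision can be sketched as a pure function (a minimal sketch; the state names are illustrative):

```python
def adapt(predicted_snr, threshold, state):
    """Return the next download state for one prediction output."""
    if predicted_snr < threshold and state == 'downloading':
        return 'hold'          # pause the download
    if predicted_snr >= threshold and state == 'hold':
        return 'downloading'   # continue the download
    return state               # keep the current action state
```

In the real application this function would be evaluated once per second, matching the 1 Hz prediction frequency.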

C. Experimental results

The prediction-based power adaptation is evaluated in terms of energy consumption, network goodput, and prediction accuracy under different network conditions. For different network conditions, we test the adaptations with an SNR threshold of 15 and/or 20, with an exception for the MC method. For the MC method we set the threshold to 19, as the offline training and online computation have to use the same threshold value. Hence, our definitions of test cases include the network conditions in terms of the mean and standard deviation of SNR values, the SNR threshold value, and the possible network goodput range. Though the MC method follows a different threshold, the measurements obtained using MC as the prediction method can still be accommodated within the test cases described. Based on the analysis of our experimental network conditions, we divide our test cases into four scenarios, as listed in Table 3.

The adaptation code is implemented in the C language. The power measurements for the adaptive file download with and without prediction were carried out on a public 802.11g network on our campus with the power saving mode on. A TCP client, wget (http://www.gnu.org/software/wget/), running on the N810 downloaded a MySQL software package from a remote file server [27]. The file name is MySQL-shared-compat-5.1.42-0.rhel3.x86_64.rpm with a size of 5072982 bytes. We set both the prediction and monitoring frequency to 1 Hz and performed one-step-ahead prediction. Adaptation is performed for each prediction output, which means the adaptation policy is executed every second.

We measured the power consumption in the different scenarios without adaptation, and used the values as the baseline for comparison. The power measurement results are listed in Table 4. Since the energy consumption depends on the network goodput, and the network goodput can vary even within the same scenario, we list multiple samples in each scenario together with the observed network goodput. The testing results of each scenario are listed in Table 5.

The download duration is the time from the beginning to the end of the file download, and the total energy is the energy consumption of the mobile device during the download duration. The pause duration is the total time for which the file download is paused due to power adaptation, and the actual download duration is the download duration minus the pause duration. We calculate the network goodput as the downloaded file size divided by the actual download duration. The prediction accuracy is the ratio of the number of correctly predicted samples to the total number of predicted samples. For example, if the predicted SNR is higher than the SNR threshold and the real SNR is lower than the SNR threshold, or if the predicted SNR is lower than the SNR threshold while the real one is higher, it

[Figure 6 flow chart: Start → Monitor SNR → Compute predicted SNR → if predicted_SNR < threshold and current_state == Downloading, pause download; if predicted_SNR ≥ threshold and current_state == HOLD, continue download; repeat until the download is completed → Stop.]


Table 4: Baseline of energy consumption in different network scenarios.

Type  SNR mean  SNR std. dev.  Energy (J)  Network goodput (KB/s)
1     14.245    5.806          92.075      54.590
1     11.391    5.454          118.945     36.765
2     19.9      3.872          52.903      158.521
2     19.346    6.425          43.943      178.526
2     15.660    5.010          45.918      165.136
3     24.273    4.548          38.990      225.196
3     26.690    4.635          55.213      147.878
3     26.952    3.290          63.745      120.828
4     22.325    9.024          62.775      60.047
4     21.900    8.110          45.813      176.925
4     24.204    5.500          194.58      18.264

Table 5: Experimental results for test cases under different network conditions.

Type  SNR mean  SNR std. dev.  Download dur. (s)  Pause dur. (s)  Goodput (KB/s)  Energy (J)  Accuracy (%)  Model  Threshold
1     13.450    4.625          78.751             44              142.559         68.463      85.00         ARIMA  15
1     13.310    6.468          101.750            37              76.511          55.200      82.76         ARIMA  15
1     14.926    7.387          54.001             25              170.825         59.296      77.78         NFI    15
1     12.615    6.446          64.748             38              185.213         60.523      87.02         NFI    15
2     17.567    6.026          40.499             12              173.834         49.098      78.38         ARIMA  15
2     16.514    8.862          35.249             15              244.658         45.47       83.78         NFI    15
2     17.283    4.662          133.999            90              115.298         108.733     92.54         ARIMA  20
2     18.113    5.106          81.999             53              170.836         59.590      86.25         NFI    20
2     18.901    4.867          60.000             31              170.800         59.718      78.43         MC     19
3     27.864    4.560          22.001             1               235.898         39.030      90.91         ARIMA  20
3     25.519    2.513          22.000             0               225.186         39.970      100           NFI    20
3     24.296    3.831          30.501             4               190.538         46.87       74.07         MC     19
4     34.206    7.014          32.251             0               153.61          38.213      100           ARIMA  15
4     30.842    8.772          37.999             3               141.549         44.068      89.47         NFI    15
4     28.926    11.198         30.251             9               233.122         38.423      96.43         ARIMA  20
4     26.758    6.083          30.748             3               178.538         47.648      100           NFI    20

Table 6: Experimental results with low network goodput.

SNR mean  SNR std. dev.  Download dur. (s)  Pause dur. (s)  Goodput (KB/s)  Energy (J)  Accuracy (%)  Model  Threshold
28.436    5.543          276.250            1               17.998          189.645     98.91         ARIMA  15
21.089    8.971          216.509            90              40.1            137.753     87.68         ARIMA  20
24.5      7.104          195                50              34.165          137.617     70.97         MC     19

is considered to be a wrongly predicted sample.
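The goodput and accuracy definitions above can be sketched as follows (a sketch; the sample values in the test are illustrative, not taken from the tables):

```python
def goodput_kb_s(file_size_bytes, download_duration_s, pause_duration_s):
    """Goodput: file size over the actual (non-paused) download time."""
    actual = download_duration_s - pause_duration_s
    return (file_size_bytes / 1000) / actual

def prediction_accuracy(predicted, real, threshold):
    """Fraction of samples where prediction and reality fall on the
    same side of the threshold (other samples are wrongly predicted)."""
    correct = sum((p >= threshold) == (r >= threshold)
                  for p, r in zip(predicted, real))
    return correct / len(predicted)
```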

In the network scenarios of Type 1, the SNR fluctuates between 0 and 20 with a mean of less than 15. When the threshold is set to 15, the file download is paused when the predicted values are lower than 15. According to Table 5, the pause durations take around 36% to 59% of the download duration, whereas the network goodput increases by more than 100% compared to the baseline. The energy consumption with adaptation is reduced by 40% on average.

For Type 2, when the threshold is set to 20, the pause duration becomes very long. The overhead caused by the adaptation includes the energy overhead of the prediction itself and the energy cost during the pause duration. This total overhead is larger than the energy savings achieved by the adaptations. Hence, when the SNR is relatively stable, with a standard deviation of less than 5 and a mean value greater than 15, it is not energy-efficient to adopt adaptation with a threshold of 20 even though the prediction accuracy is relatively high. When the threshold is set to 15, the download duration decreases, which leads to energy savings compared with the corresponding Type 2 scenario.

For Type 3, the SNR values are high and the standard deviation is very small. The pause duration is very short, and the impact on network goodput is negligible. The energy consumption depends on the network goodput, and the energy overhead is caused by the prediction algorithm, which is around 0.1 W. For Type 4, when the threshold is set to 15, adaptation leads to energy savings, whereas when the threshold is set to 20 the energy saving is comparatively smaller.


Due to the workload of the file server and network overload, the network goodput sometimes becomes much smaller under the same SNR conditions. As shown in Table 6, for Type 4, when the network goodput is relatively low, the energy consumption is around 190 J or less. Compared to the similar case in Table 4, the adaptation can help save energy, especially when the threshold is set to 20.

VI. DISCUSSION

It is well known that network transmission at a higher data rate results in less energy consumption. Our adaptation aims to adapt the network transmission to the network conditions, in terms of SNR values, by controlling the transfer operations depending on the one-step-ahead SNR prediction. In our prediction-based adaptation, if the predicted SNR value is lower than a predefined threshold, transmission is paused; otherwise it continues. Through such adaptation, network goodput can be increased or maintained, since the download speed is relatively higher under better network conditions, i.e., higher SNR values. In addition, it is possible to avoid some of the unnecessary retransmissions caused by the high loss rate of a noisy network environment.

The effectiveness of our adaptation depends on the trade-off between the energy overhead caused by the SNR prediction and the energy savings made possible by increased network goodput. The former is considered stable and independent of network conditions, whereas the latter varies with the network scenario. To determine the impact of different network scenarios on the effectiveness of our adaptation, we divide the experimental network scenarios into four types based on the mean and standard deviation of SNR values. The type classification is mainly due to the fact that in real-time network measurements it is hard to get a stable SNR value.

According to our evaluation results presented in Section 5, when SNR values are generally low, as in Type 1, the increase in network goodput is significant, and hence the adaptation is profitable. In scenarios where the SNR values fluctuate heavily even though the mean SNR is high, such as Type 4, our adaptation can also save energy to a certain extent. If the network conditions are relatively stable, for example in Types 2 and 3 when the standard deviation is less than 5, there is not much advantage in the threshold-based adaptation: energy is not saved but wasted, due to the energy overhead caused by the network adaptations. In addition to the SNR range, the selection of the threshold has an impact on the effectiveness of our adaptation. For example, in the network conditions of Type 2, when the threshold is set to 20, the energy saving is close to zero. However, when the threshold is changed to 15, the adaptation becomes more energy-efficient. Hence, in summary, threshold-based adaptation is energy-efficient when network conditions fluctuate over a wide range or remain in a relatively bad state. In other words, when SNR values are generally low, such as lower than 15, or when the standard deviation of SNR values is large, network adaptations help save up to 40% of energy.

In this paper, we compare the prediction accuracy of the different prediction algorithms in offline and online experiments. Our experimental results show that all three algorithms can attain reasonably high prediction accuracy, and that the energy savings depend more on the network conditions than on the prediction accuracy in our case. When the predicted values are smaller or larger than the real values, it does not always follow that the results of the adaptations are affected. A wrong prediction that causes an unnecessary pause of the download, or that fails to pause the download when the SNR value is very low, may increase the download duration and waste energy; however, when both the predicted and real values are above, or both below, the threshold, the adaptive operation does not change.

Among the three prediction algorithms, ARIMA has relatively higher prediction accuracy, and NFI performs better than MC. However, ARIMA requires offline learning. Hence, in a new network environment on which the model has not been trained, the accuracy might be lower and the energy savings might be reduced as well. When choosing a prediction algorithm, it is preferable to apply ARIMA if the user's movement paths can be predefined. For application scenarios where the network conditions are totally new to the mobile device, it is wiser to choose NFI, which can adjust itself to new network scenarios quickly.

VII. CONCLUSION AND FUTURE WORK

To summarize our contribution in this paper, we have explored the possibility of applying network prediction to network-based power adaptation, and implemented three prediction algorithms, ARIMA, NFI, and MC, in an adaptive file transfer on mobile devices. We compared the effectiveness of the algorithms while taking prediction characteristics such as prediction accuracy and performance overhead into account. The experiments were conducted in a number of settings in public WLANs.

From the lessons learned through the prototype-level implementation and experimental evaluation, we have outlined a roadmap for future work that will help improve the prediction and adaptation techniques used. The temporal difference (TD) method, which is often used in reinforcement learning, is a potential direction for enhancing our prediction techniques. The advantages of the TD method are higher prediction accuracy and lower memory requirements for computation [24]. In addition, TD is applicable to multi-step prediction [25]. As indicated in Section VI, threshold-based adaptation can lead to further energy savings, and we will explore the possibility of using dynamic threshold values. A dynamic threshold could be especially helpful in situations where the network behavior fluctuates.

ACKNOWLEDGMENTS

This work was supported by TEKES as part of the Future

Internet program of TIVIT (Finnish Strategic Centre for

Science, Technology, and Innovation in the Field of ICT).

We also extend our thanks to Chengyu Liu for


implementing the prediction algorithms, and

P. V. Gopalacharyulu for providing valuable suggestions

while choosing prediction algorithms.

REFERENCES

[1] R. Sri Kalyanamaran, Y. Xiao, and A. Ylä-Jääski, "Network Prediction for Adaptive Mobile Applications", UBICOMM'09: In Proceedings of the International Conference on Mobile Ubiquitous Computing, Systems, Services, and Technologies, pp. 141 – 146, Malta, October 2009.

[2] Y. Xiao, P. Savolainen, A. Karpanen, M. Siekkinen, and A. Ylä-Jääski, "Practical power modeling of data transmission over 802.11g for wireless applications", e-Energy'10: In Proceedings of the 1st International Conference on Energy-Efficient Computing and Networking, pp. 75 – 84, Passau, Germany, April 2010.

[3] X. Long and B. Sikdar, "A Real-time Algorithm for Long Range Signal Strength Prediction in Wireless Networks", WCNC 2008: In Proceedings of the IEEE Wireless Communications and Networking Conference, pp. 1120 – 1125, 2008.

[4] Y. Vanrompay, P. Rigole, and Y. Berbers, "Predicting network connectivity for context-aware pervasive systems with localized network availability", WoSSIoT'07: In Proceedings of the 1st International Workshop on System Support for the Internet of Things, Lisbon, Portugal, March 2007.

[5] R. Mayrhofer, "An Architecture for Context Prediction", In Advances in Pervasive Computing, Volume 176, Austrian Computer Society (OCG), pp. 65 – 72, 2004.

[6] C. Anagnostopoulos, P. Mpougiouris, and S. Hadjiefthymiades, "Prediction intelligence in context-aware applications", MDM'05: In Proceedings of the 6th International Conference on Mobile Data Management, pp. 137 – 141, 2005.

[7] Y. Wen, R. Wolski, and C. Krintz, "Online Prediction of Battery Lifetime for Embedded and Mobile Devices", In Proceedings of the 3rd International Workshop on Power-Aware Computer Systems, pp. 57 – 72, December 2003.

[8] A. Gupta and P. Mohapatra, "Power Consumption and Conservation in WiFi Based Phones: A Measurement-Based Study", SECON'07: In Proceedings of the 4th Annual IEEE Communications Society Conference on Sensor, Mesh, and Ad Hoc Communications and Networks, pp. 122 – 131, June 2007.

[9] M. Vukovic, I. Lovrek, and D. Jevtic, "Predicting user movement for advanced location-aware services", SoftCOM: In Proceedings of the 15th International Conference on Software, Telecommunications, and Computer Networks, pp. 1 – 5, 2007.

[10] D. Narayanan, J. Flinn, and M. Satyanarayanan, "Using History to Improve Mobile Application Adaptation", WMCSA'00: In Proceedings of the Third IEEE Workshop on Mobile Computing Systems and Applications, pp. 31 – 40, 2000.

[11] J. Petzold, F. Bagci, W. Trumler, and T. Ungerer, "Next Location Prediction Within a Smart Office Building", ECHISE'05: In Proceedings of the 1st International Workshop on Exploiting Context Histories in Smart Environments at the 3rd International Conference on Pervasive Computing, Munich, Germany, May 2005.

[12] A. S. Weigend and N. A. Gershenfeld, "Time Series Prediction: Forecasting the Future and Understanding the Past", SFI Studies in the Sciences of Complexity, Proc. Vol. XV, Addison-Wesley, 1993.

[13] G. E. P. Box and G. M. Jenkins, "Time Series Analysis: Forecasting and Control", San Francisco: Holden-Day, 1970, 1976.

[14] F. B. Hildebrand, "Introduction to Numerical Analysis" (2nd edition), McGraw-Hill, ISBN 0-070-28761-9, pp. 43 – 59, 1974.

[15] K. C. Barr and K. Asanovic, "Energy-aware Lossless Data Compression", ACM Transactions on Computer Systems, Vol. 24, No. 3, pp. 250 – 291, August 2006.

[16] Y. Hao, L. Chuang, S. Berton, L. Bo, and M. Geyong, "Network traffic prediction based on a new time series model", International Journal of Communication Systems, Volume 18, Issue 8, pp. 711 – 729, October 2005.

[17] N. Sadek and A. Khotanzad, "Multi-scale network traffic prediction using k-factor Gegenbauer ARMA and MLP models", AICCSA'05: In Proceedings of the ACS/IEEE 2005 International Conference on Computer Systems and Applications, 2005.

[18] J. Z. Sun, J. Tenhunen, and J. Sauvola, "CME: a middleware architecture for network-aware adaptive applications", PIMRC'03: In Proceedings of the 14th International Symposium on Personal, Indoor, and Mobile Radio Communications, pp. 839 – 849, Beijing, China, September 2003.

[19] R. H. Shumway and D. S. Stoffer, "Time Series Analysis and Its Applications: With R Examples", Springer Texts in Statistics, 2nd Edition, pp. 85 – 154, 2006.

[20] D. Katsaros and Y. Manolopoulos, "Prediction in wireless networks by Markov chains", IEEE Wireless Communications, Volume 16, No. 2, pp. 56 – 63, 2009.

[21] Md. O. Gani, H. Sarwar, and C. M. Rahman, "Prediction of state of wireless network using Markov and hidden Markov model", Journal of Networks, Volume 4, No. 10, pp. 976 – 984, 2009.

[22] R. R. Sarukkai, "Link prediction and path analysis using Markov chains", In Proceedings of the 9th International World Wide Web Conference on Computer Networks, pp. 337 – 386, 2000.

[23] Y. L. Chou, "Statistical Analysis", Holt International, 1975.

[24] R. S. Sutton, "Learning to Predict by the Methods of Temporal Differences", Machine Learning, Volume 3, Number 1, pp. 9 – 44, Springer Netherlands, August 1988.

[25] X. Z. Gao, S. J. Ovaska, and A. V. Vasilakos, "Temporal difference method-based multi-step ahead prediction of long term deep fading in mobile networks", Computer Communications, Volume 25, Issue 16, pp. 1477 – 1486, October 2002.

[26] OProfile, System profiler for Linux, http://maemo.org/development/tools/doc/chinook/oprofile/

[27] MySQL Software Download Site, http://vesta.informatik.rwth-aachen.de/mysql/Downloads/MySQL-5.1


Highly Accurate Location-Aware Information Delivery With PosPush

Zhao Junhui, Wang Yongcai
NEC Laboratories, China

zhao [email protected]

Abstract—This paper proposes PosPush, a highly accurate location-based information delivery system, which utilizes the high-resolution 3D locations obtained from ultrasonic positioning devices to efficiently deliver location-based information to users. This system is designed especially for applications where a 3D space is partitioned into a set of closely neighboring small zones, and as a user moves into one of the zones, the corresponding information will be transferred to the user in a timely manner. In order to identify precise zones, PosPush defines a zone model by a set of key location points extracted by a location clustering algorithm, and the zone model is used for online zone identification based on hierarchical searching. In order to determine the appropriate delivery time, an Adaptive Window Change Detection (AWCD) method is proposed to detect the fast change along the location stream. To evaluate the system performance, a MV (Music Video) shelf demo prototype was built in which the information of commodities on the shelf can be delivered based on PosPush. We verified the feasibility and effectiveness of our proposed system from objective and subjective perspectives.

Keywords-Location Based Service, Ultrasonic Positioning, Ubiquitous Computing

I. INTRODUCTION

Location data can serve as contextual indexing information that a system uses to provide services to users; such services are called Location Based Services (LBS) [1-6]. A Location based Information Delivery System (LIDS) is one typical example of LBS, in which relevant information is predefined for different zones so that when a user enters one of these zones, the related information is delivered automatically to the user. A considerable amount of research and engineering effort has already focused on LIDS. Conventional LIDS rely on very coarse user location data obtained from Bluetooth [7,8], WiFi [9-11], RFID [12-14], or GPS [15,16] to provide proximity-based information delivery. Generally, the location zones in conventional LIDS are relatively large (over tens of square meters). Because of the location coarseness, these zones are roughly defined and have fuzzy boundaries between each other. In addition, the delivery time in conventional LIDS depends only on the instant location of the user, i.e., the location-based information is sent to the user only when the user is inside one of the location zones.

In this paper, we propose PosPush, a highly accurate location-based information delivery system that utilizes the 3D locations obtained from an ultrasonic positioning device [17] to efficiently deliver location-based information to users. In PosPush, a 3D space is partitioned into a set of closely neighboring small zones; a user holds a stick-style ultrasonic tag and moves it into one of the zones to get related information. PosPush can be used in many application scenarios. For example, as shown in Figure 1, in a shopping mall the commodities on a shelf are placed so closely that the distance between two commodities may be just several tens of centimeters. The vicinity of each commodity forms a small zone. If a customer is interested in one specific commodity and wants to know more about it, he/she may approach the commodity and point at it with a stick-style ultrasonic tag, and the information related to that commodity is then pushed to the public display near the commodity shelf. Another example is the exhibition scenario, where many exhibits are placed very closely on a booth table. Visitors can get more information about an exhibit by moving the ultrasonic stick close to it. This is an engaging experience, because users get what they want simply by pointing a stick at whatever interests them; we therefore call the ultrasonic stick the Magic Stick.


Figure 1. PosPush for Shopping Mall Scenario

PosPush is a novel LIDS with fine-grained location zones and highly accurate location information. Compared with conventional LIDS, a more sophisticated mechanism is required in PosPush. Specifically, there are two problems that conventional LIDS leave unaddressed:

1) Precise zone modeling and identification. Since the zones in PosPush are small and closely neighboring, precise zone identification is essential, which places high demands on both offline zone modeling and online zone identification.

2) Appropriate delivery time determination. Considering that there are many neighboring small



Figure 2. System Overview of PosPush

zones in PosPush, a user may unconsciously pass through these small zones. If information delivery is triggered whenever a user is detected within a small zone, the user will receive much unwanted information. An appropriate delivery time determination method is therefore highly desirable.

To address these problems, two crucial mechanisms are designed into PosPush. For precise zone modeling and identification, in the offline phase we use a location clustering algorithm to precisely calibrate each small zone: it extracts a set of key location points from a randomly collected location trajectory inside the zone. In the online phase, a hierarchical searching method accurately and quickly determines which small zone the target is located in. To determine the appropriate delivery time, we propose the Adaptive Window Change Detection (AWCD) method, which tolerates zone identification errors and optimizes the information delivery time by monitoring fast changes along the location stream. We implemented PosPush in a prototypical application for commodity information delivery in a shopping mall. Based on this prototype, we carried out a number of experiments to evaluate the performance of PosPush.

The rest of this paper is organized as follows. We present the system overview in Section 2. The zone calibration and identification algorithm and the delivery time determination algorithm are introduced in Sections 3 and 4, respectively. Section 5 presents the prototype implementation, including a privacy-preserving approach. The performance evaluation, covering both objective and subjective evaluation, is in Section 6. Finally, Section 7 concludes the paper.

II. SYSTEM OVERVIEW

The system architecture of PosPush is illustrated in Figure 2. The architecture contains several main components:

• Ultrasonic Positioning System

In PosPush, an ultrasonic positioning system is deployed to locate the mobile targets; it is composed of the Accurate Positioning Device and the Magic Stick. The positioning device is named Positioning on One Device (POD) and integrates multiple ultrasonic receivers. The POD is mounted, facing downwards, in the room to be covered. The Magic Stick is an ultrasonic transmitter carried by the user for tracking. It works in active transmission mode and can be located by the POD with an accuracy of less than 10 centimeters. For more information about the ultrasonic positioning system, please refer to our prior work [17,18].

• Location Server
The function of the Location Server is to collect the accurate location of the Magic Stick. In the offline phase, the model for each small zone is calibrated by clustering a sequence of location data samples. In the online phase, the real-time location of the Magic Stick is used to determine which small zone the Magic Stick is located in by searching the location models.

• Delivery Engine
The Delivery Engine efficiently determines the appropriate time for information delivery. In particular, it applies an AWCD based method to detect the delivery time by sequentially evaluating the location stream of the Magic Stick. At the delivery time, the Delivery Engine sends a query to retrieve the location-based information from the Information Server and then forwards the information to the Public Display.

• Information Server
The Information Server contains a location-based information database that stores the various information associated with the small zones.

Based on this architecture, the work flow of PosPush is as follows.

1) The Magic Stick carried by the user emits ranging signals to the POD.


2) The accurate location of the Magic Stick is calculated by the POD and sent to the Location Server.

3) The Location Server identifies which small zone the Magic Stick belongs to and sends the corresponding zone index to the Delivery Engine.

4) The Delivery Engine determines the appropriate delivery time along the location stream and sends a query to the Information Server to retrieve the location-based information.

5) The location-based information is delivered to the Public Display for timely rendering.

III. ZONE CALIBRATION AND IDENTIFICATION

In PosPush, the Location Server is an important component that performs both offline zone model calibration and online zone identification. A zone in PosPush is a small 3D space containing a specific object; for example, in a shopping mall, the vicinity of a commodity on the shelf is regarded as a zone. Qualitatively, if the distance between two neighboring objects is d cm, such a small zone can be described as a cubic space with an edge length of d cm. Figure 3 gives an illustration.


Figure 3. Cubic Space Description for Small Zone

These zones are characteristically small and closely neighboring. Treating each zone as a cubic space, there are two basic methods for zone calibration and identification.

1) Center based method. The coordinate of the central point is recorded in the offline phase and taken as the zone model. In the online phase, the distance between the real-time location of the Magic Stick and the central point of each zone is calculated; the most probable zone is selected if this distance is less than d cm.

2) Boundary based method. In the offline phase, the Magic Stick is moved along the boundary of the cubic space, and the moving trajectory is recorded as the zone model. In the online phase, the real-time location is compared with the zone boundary to determine whether the tag is located in the zone.
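As an illustration, the Center based baseline can be sketched as a nearest-center test; the helper below is a hypothetical sketch (the spacing d and any coordinates are assumptions, not values from the prototype):

```python
import math

def center_based_zone(location, zone_centers, d=35.0):
    """Center based baseline: return the index of the nearest zone
    center if it lies within d cm of the location, else None.
    `d` is the assumed inter-object spacing in cm."""
    best_zone, best_dist = None, d
    for idx, center in enumerate(zone_centers):
        dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(location, center)))
        if dist < best_dist:
            best_zone, best_dist = idx, dist
    return best_zone
```

For instance, with two centers at (0, 0, 0) and (40, 0, 0), a reading at (5, 2, 0) maps to the first zone, while a reading far from both centers maps to no zone at all, which is exactly where this baseline struggles near shared boundaries.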

However, these methods are not very effective, because it is difficult to make a correct judgment when the Magic Stick moves near the boundaries between neighboring zones; the experiments in Section 6 confirm this. In addition, since the Boundary based method requires the user to move the Magic Stick strictly along the boundary during calibration, substantial manual effort is needed.

To solve the problems of the above methods, we propose a new method for precise zone calibration and identification.

A. Offline Calibration


Figure 4. Random Location Collection Inside a Zone

During the offline phase, we apply a location clustering method to compute the location model for each small zone. First, we collect a sequence of 3D location data for a small zone simply by moving the Magic Stick along a random trajectory inside the zone, as shown in Figure 4.

[Figure 5 sketch: starting from the location data set, the KLP are initialized, then distance calculation, location data classification, and KLP updating are iterated until the KLP converge.]

Figure 5. Block Diagram of Zone Calibration

For users, such a random collection approach is flexible and convenient. Afterwards, we apply a clustering algorithm to extract a set of key location points (KLP) from the collected location data to represent the precise model of each zone. A fair number of research papers


describe various clustering algorithms [19-21]. However, we choose the classical k-means algorithm [22] to build the location model, since it is a simple but efficient way to classify the collected location data into n clusters in which each location datum belongs to the cluster with the nearest center. The whole process is shown in Figure 5. After the iterative process, n KLP are refined from the convergent clusters and taken as the location model of the small zone, denoted P_m = (x_{mj}, y_{mj}, z_{mj}), j = 1, 2, ..., n, m = 1, 2, ..., M, where n is the number of KLP in one model and M is the total number of zones. Note that in the case n = 1, i.e., a single KLP, the proposed clustering based method reduces to the Center based method described above.
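The offline calibration described above can be sketched with plain k-means (Lloyd's algorithm). The function below is an illustrative sketch, not the paper's Java software; the seed, iteration cap, and sample data are assumptions:

```python
import random

def squared_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans_klp(points, n, iters=100, seed=0):
    """Extract n key location points (KLP) from a list of 3D samples
    with classical k-means: assign each sample to its nearest center,
    recompute centers as cluster means, iterate until convergent."""
    centers = random.Random(seed).sample(points, n)
    for _ in range(iters):
        clusters = [[] for _ in range(n)]
        for p in points:
            j = min(range(n), key=lambda k: squared_dist(p, centers[k]))
            clusters[j].append(p)
        new_centers = [
            tuple(sum(c[d] for c in cl) / len(cl) for d in range(3)) if cl
            else centers[j]                      # keep an empty cluster's old center
            for j, cl in enumerate(clusters)
        ]
        if new_centers == centers:               # convergent KLP
            break
        centers = new_centers
    return centers
```

Given a trajectory that dwells near two spots, e.g. samples around (0, 0, 0) and (10, 0, 0) with n = 2, the returned KLP settle at the two cluster means, which then serve as the zone model.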

B. Online Identification

During the online phase, a hierarchical searching method is used to quickly find the most probable small zone in which the Magic Stick is located. The method has two search stages, shown in Figure 6.

[Figure 6 sketch: real-time location data passes through Candidate Zone Selection (first stage) and Precise Zone Selection (second stage), yielding the recognized location zone.]

Figure 6. Block Diagram of Zone Identification

The first stage is Candidate Zone Selection: for each small zone, only one KLP is randomly selected, and its distance to the real-time location data is calculated. The distances are filtered against a predefined threshold so that only the M_c zones (out of all M zones, with M_c << M) closest to the real-time location are selected as candidates for the next stage. The second stage is Precise Zone Selection. For each candidate zone m among the M_c candidates, the average distance between its location model and the real-time location data X_t = (x_t, y_t, z_t) is calculated as

Dist_m = \sqrt{ \frac{1}{n} \sum_{j=1}^{n} \left[ (x_t - x_{mj})^2 + (y_t - y_{mj})^2 + (z_t - z_{mj})^2 \right] }, \quad m = 1, ..., M_c    (1)

Finally, the index of the most probable zone is obtained by the minimum criterion over the M_c distances:

Loc_t = \arg\min_m Dist_m    (2)

By applying online zone identification to the real-time location data of the Magic Stick, its location stream can be mapped into a matching zone index stream.
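The two-stage search can be sketched as follows, assuming each zone model is a list of KLP triples produced by the offline calibration; the M_c value, random selection seed, and sample models are illustrative assumptions:

```python
import math
import random

def rms_distance(location, klps):
    """Eq. (1): root of the mean squared distance from the real-time
    location to the n KLP of one zone model."""
    return math.sqrt(sum(
        sum((a - b) ** 2 for a, b in zip(location, p)) for p in klps
    ) / len(klps))

def identify_zone(location, zone_models, mc=2, seed=0):
    """Two-stage hierarchical identification.
    Stage 1 (Candidate Zone Selection): rank all M zones by the distance
    to one randomly chosen KLP each and keep the mc nearest.
    Stage 2 (Precise Zone Selection): return the candidate that
    minimizes the full distance of Eq. (1), i.e. Eq. (2)."""
    rng = random.Random(seed)
    coarse = sorted(
        (math.dist(location, rng.choice(klps)), m)
        for m, klps in enumerate(zone_models)
    )
    candidates = [m for _, m in coarse[:mc]]
    return min(candidates, key=lambda m: rms_distance(location, zone_models[m]))
```

The coarse stage touches only one KLP per zone, so its cost is O(M); the exact n-point distance of Eq. (1) is then computed only for the M_c survivors, which is the point of the hierarchy.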

IV. DELIVERY TIME DETERMINATION

To provide a friendly user experience, it is important to find an appropriate and reasonable time for information delivery. This is especially true for PosPush, where the small zones are closely neighboring, so a user may unconsciously pass through some of them. If we applied the conventional instant-location based delivery method, many unnecessary information deliveries would be triggered, which would be rather annoying to users. We therefore define the times for starting and stopping information delivery:

Information Delivery Time (IDT): the moment when the location-based information should be delivered to the user client, as the user is close to or inside one of the location zones.

Information Stop Time (IST): the moment when the location-based information delivery should be terminated, as the user is leaving the location zone.

IDT and IST are detected by sequentially evaluating the newest location data in the location stream to find an adaptive window whose accumulate departing significance is compared against an IDT threshold and an IST threshold, respectively. The IDT/IST determination problem is illustrated in Figure 7: the upper stream is the location stream and the lower stream is the corresponding zone index stream.

[Figure 7 sketch: the location data stream x_1, ..., x_w, ..., x_{t-1}, x_t and the corresponding zone index stream, with an adaptive window (AW) spanning from the current IDT at x_w to the newest sample x_t.]

Figure 7. IDT Determination from Streaming Data

The M small zones are denoted {A_i, i = 1, 2, ..., M}. Suppose the current IDT is the time of the location datum X_w, which is located in zone A_I; at this IDT, the information related to A_I is delivered to the user. Our objective is to find an IST at which to stop the delivery of A_I's information as the Magic Stick leaves A_I. After the delivery stops at the IST, we need to find the next IDT at which to


[Figure 8 sketch: the zone index stream feeds a behavior change detection block; on an IDT, a request is sent to the Information Server and the retrieved location-based information is forwarded to the user client, while on an IST, any ongoing information delivery is terminated.]

Figure 8. The Block Diagram of IDT and IST Detection

start a new information delivery process as the Magic Stick enters another zone. In particular, we focus on detecting the moving trend within a time window that starts at X_w and ends at the newly received X_t. Since the window size always adapts to the newly received location data, we call the proposed IST and IDT determination method Adaptive Window Change Detection (AWCD).

The key idea of AWCD is to calculate the Accumulate Departing Significance (ADS) over the adaptive window, which serves as an online measure of the trend of the Magic Stick departing from the current IDT zone A_I. Since the last IDT location, A_I, is known, the AWCD algorithm tolerates noise by the following scheme:

1) If the subsequent location data in the stream depart from A_I only a little, we predict that the zone of the Magic Stick has not changed.

2) If the trend of the data in the adaptive window departing from A_I exceeds a user-defined IST threshold, an IST event is detected and the location-based information delivery is stopped.

3) If the trend of the data in the adaptive window departing from A_I exceeds a user-defined IDT threshold, a new IDT is detected, triggering a new information delivery.

In detail, the AWCD algorithm includes the following steps.

1) Reset Step. At the time of the current IDT (that is, X_w located in A_I, as shown in Figure 7), the ADS of all zones except A_I are reset to zero:

ADS(A_i, X_w) = 0, \quad \forall A_i, \; i = 1, ..., M, \; i \neq I    (3)

2) Update Step. Each time a new location datum X_t is received, the ADS of all zones except A_I are updated over the adaptive window from X_w to X_t by a recursive rule:

ADS(A_i, X_t) = ADS(A_i, X_{t-1}) + DS(A_i | A_I, X_t), \quad \forall A_i, \; i = 1, ..., M, \; i \neq I    (4)

where DS(A_i | A_I, X_t) is the Departing Significance, defined as

DS(A_i | A_I, X_t) = \begin{cases} -\alpha, & \text{if } X_t \in A_I \\ \beta, & \text{if } X_t \in A_i \\ 0, & \text{if } X_t \notin A_i \text{ and } X_t \notin A_I \end{cases}    (5)

where \alpha and \beta are positive values. From these equations, if X_t is located in A_I, all ADS are decreased by \alpha; if X_t is located in some other zone A_i, the ADS of A_i is increased by \beta and all other ADS remain unchanged.

3) Maximum Step. As real-time location data are received continuously, we obtain the spatial-temporal distribution of the ADS:


\begin{bmatrix} ADS(A_1, X_w) & ADS(A_1, X_{w+1}) & \cdots & ADS(A_1, X_t) \\ ADS(A_2, X_w) & ADS(A_2, X_{w+1}) & \cdots & ADS(A_2, X_t) \\ \vdots & \vdots & & \vdots \\ ADS(A_M, X_w) & ADS(A_M, X_{w+1}) & \cdots & ADS(A_M, X_t) \end{bmatrix}    (6)

Each row represents the temporal trajectory of the ADS of one zone, and each column represents the ADS distribution over all zones at one time. By the update step, the ADS of A_i increases monotonically if the Magic Stick shows a clear trend of leaving A_I and entering A_i; conversely, all ADS stay relatively small if the location stream of the Magic Stick remains concentrated in the vicinity of A_I. Therefore, at each time the maximum ADS value is picked out:

M(X_t) = \max_z ADS(A_z, X_t)    (7)

4) Judgment Step. Finally, the maximum ADS value is checked against the predefined thresholds for IST and IDT. The IST threshold is T_IST: when the accumulated departure trend exceeds T_IST, the current information delivery should be stopped. The IDT threshold is T_IDT: when the accumulated departure trend towards another zone is large enough, a new information delivery process should begin. In general, T_IST is smaller than T_IDT.

Figure 8 shows the block diagram of IDT and IST detection. If M(X_t) exceeds the IST threshold T_IST, an IST is detected and the current information delivery stops. If M(X_t) exceeds the IDT threshold T_IDT, an IDT is detected, and the new IDT zone is found accordingly:

A_z = \arg\max_z ADS(A_z, X_t)    (8)
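The four AWCD steps can be sketched as a small detector over the zone index stream. The sketch below is illustrative; the \alpha, \beta, and threshold values are assumptions, not the paper's settings:

```python
class AWCD:
    """Adaptive Window Change Detection over a zone index stream.
    Events: ('IDT', zone) starts a delivery, 'IST' stops the current one."""

    def __init__(self, num_zones, alpha=1.0, beta=1.0, t_ist=1.5, t_idt=2.5):
        self.num_zones = num_zones
        self.alpha, self.beta = alpha, beta      # Eq. (5) step sizes (assumed)
        self.t_ist, self.t_idt = t_ist, t_idt    # judgment thresholds (assumed)
        self.current = None                      # current IDT zone A_I
        self.delivering = False
        self.ads = [0.0] * num_zones

    def _reset(self, zone):                      # Reset step, Eq. (3)
        self.current, self.delivering = zone, True
        self.ads = [0.0] * self.num_zones

    def step(self, zone):
        """Feed the next zone index X_t; return ('IDT', z), 'IST', or None."""
        if self.current is None:                 # first sample starts a delivery
            self._reset(zone)
            return ('IDT', zone)
        for i in range(self.num_zones):          # Update step, Eqs. (4)-(5)
            if i == self.current:
                continue
            if zone == self.current:
                self.ads[i] -= self.alpha        # DS = -alpha: back in A_I
            elif zone == i:
                self.ads[i] += self.beta         # DS = +beta: seen in A_i
        m = max(self.ads)                        # Maximum step, Eq. (7)
        if m > self.t_idt:                       # Judgment step
            z = self.ads.index(m)                # Eq. (8)
            self._reset(z)
            return ('IDT', z)
        if self.delivering and m > self.t_ist:
            self.delivering = False
            return 'IST'
        return None
```

With the assumed defaults, feeding the stream 0, 1, 1, 1 yields ('IDT', 0), None, 'IST', ('IDT', 1): the second sighting of zone 1 crosses T_IST and stops the zone-0 delivery, and the third crosses T_IDT and starts the zone-1 delivery, while a single stray sighting would have been tolerated.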

V. PROTOTYPE IMPLEMENTATION - INFORMATION DELIVERY OF MUSIC VIDEO DISCS

Based on PosPush, we implemented a prototypical application for commodity information delivery in a shopping-mall scenario. Considering that Music Video (MV) is popular with most people, we chose MV discs as the commodity in our prototype: if a customer is interested in an MV disc, he/she moves the Magic Stick close to it, and soon a video clip from that disc is rendered on the Public Display. Figure 9(a) illustrates the prototype.

Basically, the prototype includes:

• Commodity Shelf and MV Discs: The prototype uses a commodity shelf with eight grids. Each grid contains one MV disc, so eight discs in total are placed

[Figure 9(a) shows the demonstration setup with the POD, the Public Display, and the Magic Stick; Figure 9(b) shows the layout of Zones 1 through 8, each 35cm wide, 35cm high, and 25cm deep.]

Figure 9. PosPush Prototype

on the shelf. The MV discs we used are from currently popular singers, to attract visitors' attention. The grids on the shelf are closely neighboring, and each grid is 35cm high and 35cm wide. Since the ultrasonic signal may be obstructed by the grid board, we define a vicinity area outside the shelf but close to each MV disc as the small zone for information delivery. Specifically, the zone size is 35cm×35cm×25cm, as shown in Figure 9(b), where the zone indices are also given.

• POD and Magic Stick: A POD is fixed on top of the commodity shelf to cover the zones of all eight MV discs. It has a hexagonal shape containing six surrounding ultrasonic receivers. A stick-style ultrasonic tag held by the user serves as the Magic Stick. As the user moves the Magic Stick towards an MV disc of interest, the POD calculates the accurate 3D location of the Magic Stick and sends the location data to a server computer. Note that the update rate of the Magic


Stick's location is 5Hz, i.e., the transmission interval of the Magic Stick is 200ms.

• Server Computer: On the server computer, the video clips of all MV discs are stored in an information database, and Java-based PosPush software performs both precise zone calibration and identification and IDT determination. The PosPush software also provides a user-friendly GUI for ease of use. In the offline phase, PosPush aggregates a sequence of location data in the vicinity of an MV disc and extracts a set of KLP as the zone model; the location model is then associated with the corresponding video clip of the disc. In the online phase, PosPush obtains the appropriate IDT from the location stream of the Magic Stick. At the IDT, the video clip is retrieved from the information database and played on the Public Display.

A. User Privacy

A potential problem with the Magic Stick mode is user privacy. Since a public display is used to render the video clips and there are always many people in a shopping mall, the video clip reflecting a user's interest will inevitably be seen by others, which leads to an uncomfortable experience. Another problem is that only one public display is placed at a shelf, so two or more users cannot view their information simultaneously. To solve these problems, we further designed a small USB-style ultrasonic transmitter that can easily be plugged into a user's personal mobile device, such as a PDA or mobile phone. When the user wants more information about a commodity of interest, he/she simply moves the personal mobile device close to that commodity, and the information is soon delivered to the personal mobile device via a wireless connection, e.g., WiFi or Bluetooth. Figure 10 illustrates this scenario.


Figure 10. Prototype for User Privacy Preservation

VI. PERFORMANCE EVALUATION

Based on the prototype, a number of experiments were carried out to evaluate the performance of PosPush. In addition, we showed the MV demo to visitors at a formal exhibition for subjective evaluation.

A. Evaluation of Zone Calibration&Identification

In PosPush, zone models are calibrated offline by applying k-means clustering to a randomly collected location sequence of length L, from which n KLP are extracted as the zone model. The choice of the parameters L and n affects both the complexity of zone calibration and the accuracy of zone identification. On one hand, larger values of L and n create a more precise zone model; on the other hand, larger L means more time consumed in collecting locations. We therefore seek an optimal combination of L and n that yields a precise location model while minimizing the calibration effort as much as possible.

In our experiment, the length L of the location sequence collected during the offline calibration phase is varied over 30, 60, and 90. Since the update frequency of the Magic Stick is 5Hz, the corresponding collection times are 6s, 12s, and 18s, respectively. The number of KLP, n, ranges from 1 to 20. For each pair of parameters L and n, zone models are built for all eight zones of the prototype. In the online phase, we collect a set of real-time locations as test data; most of the test data are collected near the zone boundaries. The ground-truth zone index stream is manually labeled, and we use the zone identification error rate as the performance measure. The results are shown in Figure 11. From this figure we can see, firstly, that the length of the collected location sequence has little effect on the zone identification accuracy: when n > 10, the identification accuracy is almost identical regardless of the sequence length. Secondly, the identification accuracy improves as the number of KLP increases, but only very slightly once n > 10.

Therefore, according to the experimental results, we select L = 30 and n = 11 as the optimal parameters. With this setting, a precise zone model can be built for online identification by moving the Magic Stick in a specific zone for only about 6s.

Next, we perform an experiment to compare the error rate of our proposed clustering based method against the two baseline methods described in Section 3. In the online identification phase, we use the same test data as in the above experiment to evaluate the three methods. Figure 12 shows the comparison. It is clear that our proposed method achieves the best accuracy of the three. We also find that the error rate of the Center based method is very close to


[Figure 11 plots the error rate in zone determination (0 to 0.2) against the number of key location points (2 to 20) for L = 30, 60, and 90.]

Figure 11. Effect of L and n to the Zone Determination Accuracy

that of the n = 1 case of the clustering based method shown in Figure 11.

[Figure 12 compares the error rates of the Center based, Boundary based, and Clustering based methods.]

Figure 12. Comparison between Different Methods

B. Evaluation of AWCD

To evaluate the performance of the proposed AWCD method, we need a metric that checks how well the IDT determined by AWCD matches the user's true interest. We propose the Matching Rate between the user's true interest and the actually delivered information as the evaluation metric. At each time instance, the user's true interest is manually labeled. If the delivered information matches the user's true interest, a "match" is counted; otherwise, a "mismatch" is counted. After the user has used the PosPush system for a period, a sequence of matches and mismatches is recorded, and the Matching Rate is defined as the fraction of the sequence occupied by matches. For convenience of evaluation, we index all zones: the four zones in the left column are Zone 1 to Zone 4 from top to bottom, and the four zones in the right column are Zone 5 to Zone 8 from top to bottom.
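The Matching Rate reduces to a per-instant comparison of two labeled sequences; the sketch below (with made-up label sequences) illustrates the computation:

```python
def matching_rate(true_interest, delivered):
    """Fraction of time instances at which the delivered zone index
    equals the manually labeled true interest."""
    if len(true_interest) != len(delivered):
        raise ValueError("sequences must be labeled at the same instants")
    matches = sum(t == d for t, d in zip(true_interest, delivered))
    return matches / len(true_interest)
```

For example, matching_rate([1, 1, 6, 6, 3, 3, 8, 8], [1, 1, 6, 6, 3, 4, 8, 8]) gives 0.875, since one of the eight instants is a mismatch.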

The result of one test is shown in Figure 13; the Y axis is the zone index and the X axis is time. A user moves the Magic Stick around the shelf area. His true interest is the commodities placed in Zone 1, Zone 6, Zone 3, and Zone 8. The green dotted line shows the zone determination result: the trace starts in Zone 1 for a while, then passes Zone 5 and stays in Zone 6 for a while, next quickly passes Zone 7 and stays in Zone 3 for a while, and finally quickly passes Zone 4 and ends in Zone 8. The true interest, labeled by the user himself, is shown as the dotted black curve. According to the zone determination result, the Magic Stick visited all of the zones, counting fast-pass results and erroneous determinations. If the delivery time depended only on the instant location, as in conventional LIDS, the information related to all of these zones would be pushed to the user, giving an uncomfortable experience. For comparison, the delivery result generated by our proposed AWCD is plotted as the solid purple curve. The information delivered by AWCD matches the user's true interest well: AWCD tolerates both zone identification errors and fast-pass results. In this test, the Matching Rate of the AWCD based method is as high as 96.5%.

[Figure 13 plots the location zone index (1 to 8) over time (0 to 80) for three curves: the true interest of the user, the location zone determination result, and the delivered result of AWCD.]

Figure 13. Experiment of AWCD based Delivery

We carried out 50 repetitions of the above test to compare the Matching Rate of AWCD with that of the conventional instant-location based method. The average Matching Rates are summarized in Table I: the AWCD based method improves the Matching Rate by more than 10 percentage points over the conventional method.

Table I
MATCHING RATE COMPARISON

Method                          Matching Rate
Instant-Location based Method   86.1%
AWCD based Method               96.7%


C. Subjective Evaluation

To better understand users' subjective response to PosPush, we showed our MV shelf prototype at NEC Solution Fair 2008 (as shown in Figure 14).

During this exhibition, we presented the MV information delivery demo to many visitors who had never seen PosPush before. It is worth noting that almost all of the visitors were from the information technology domain, which means they may have had a stronger willingness to try new applications. We first introduced PosPush to the visitors and then explained the MV shelf demo. The live demonstration immediately attracted their attention: most visitors took the Magic Stick or a PDA in their hands and experienced the PosPush system themselves. After experiencing the functionality of PosPush, they gave us feedback. On one hand, we received praise such as:

1) "It is really the first time for me to see such a technology."
2) "It is an interesting experience for me."
3) "It is definitely a highly accurate locating system."

On the other hand, we also received some advice, such as

1) "If I use a barcode reader to scan the goods ID and send it to the server via wireless, I can also get more information about the goods I am interested in."
2) "PosPush may just be applied to the MV shop, but it is more common in shopping stores that commodities (such as shampoo or cookies) are placed very close together, so it is difficult to realize PosPush in such a real scenario."

VII. CONCLUSION

In this paper, we designed, implemented, and evaluated PosPush, a highly accurate location-based information delivery system that utilizes the 3D locations obtained from an ultrasonic positioning device to efficiently deliver location-based information to users. To meet the requirements of practical application, we proposed two key mechanisms: 1) precise zone modeling and identification, and 2) AWCD-based delivery time determination. Based on PosPush, we implemented a prototypical application of commodity information delivery in a shopping mall scenario, which verified the feasibility and effectiveness of our proposed system. In the future, there are several directions to pursue to further improve PosPush. First, in this paper we investigated the PosPush system in zones with sizes of tens of centimeters. However, it is common in current supermarkets that commodities (such as cookies, shampoo, etc.) are placed side by side, so that a zone has a smaller size of less than 10 cm. We therefore plan to apply the PosPush system to such smaller zones for performance evaluation. Second, the current PosPush is based on an ultrasonic positioning system. We will investigate the use of other 3D positioning systems [23-25] for PosPush in the near future.

VIII. ACKNOWLEDGEMENT

The authors would like to thank Wang Yanzhe for his effort and assistance in the testing experiments and for helpful discussions regarding system development. The authors would also like to thank Toshikazu Fukushima and Zhang Hongming for their valuable support.

REFERENCES

[1] E. Beinat, Location-Based Services: Market and Business Drivers, GeoInformatics, April, pp. 6-9, 2001.

[2] ABI Research, Location Based Services, Allied Business Intelligence, 2004.

[3] R. Bharat and M. Louis, Evolution of Mobile Location-based Services, Communications of the ACM, vol. 46, pp. 61-65, 2003.

[4] R. Jos et al., An Open Architecture for Developing Mobile Location-Based Applications over the Internet, 6th IEEE Symposium on Computers and Communications, July, pp. 3-5, 2001.

[5] L. Kagal et al., A Highly Adaptable Infrastructure for Service Discovery and Management in Ubiquitous Computing, Technical Report TR CS-01-06, Department of Computer Science and Electrical Engineering, University of Maryland, Baltimore, United States, 2001.

[6] S. J. Barnes, Known By the Network: The Emergence of Location-Based Mobile Commerce, Advances in Mobile Commerce Technologies, pp. 171-189, 2003.

[7] G. Anastasi, Bandelloni, et al., Experimenting an Indoor Bluetooth-Based Positioning Service, ICDCSW'03: 23rd International Conference on Distributed Computing Systems Workshops, pp. 480-483, 2003.

[8] L. Aalto, N. Gothlin, J. Korhonen, and T. Ojala, Bluetooth and WAP Push based Location-aware Mobile Advertising System, Second International Conference on Mobile Systems, Applications and Services, pp. 49-59, Boston, MA, 2004.

[9] S. Yamada, Y. Watanabe, H. Kitagawa, and T. Amagasa, Location-Based Information Delivery Using Stream Processing Engine, International Conference on Mobile Data Management (MDM06), pp. 57-61, 2006.

[10] A. LaMarca, Y. Chawathe, et al., Place Lab: Device Positioning Using Radio Beacons in the Wild, in Proceedings of the Third International Conference on Pervasive Computing, May 2005.

[11] P. Bahl and V. Padmanabhan, RADAR: An In-Building RF-based User Location and Tracking System, in Proc. IEEE INFOCOM, Tel-Aviv, Israel, March 2000.

[12] N. Michiko, O. Naoya, and T. Ryuji, Just-in-Time Information Delivery System for Passenger Assistance, Quarterly Report of RTRI, vol. 47, no. 4, pp. 187-191, 2006.

[13] J. Hightower, C. Vakili, G. Borriello, and R. Want, Design and Calibration of the SpotON Ad-Hoc Location Sensing System, August 2001.


Figure 14. User Experience in Exhibition

[14] L. M. Ni, Y. Liu, Y. C. Lau, and A. P. Patil, LANDMARC: Indoor Location Sensing Using Active RFID, in Proceedings of IEEE PerCom 2003, Dallas, TX, USA, March 2003.

[15] N. Marmasse and C. Schmandt, Location-Aware Information Delivery with ComMotion, International Symposium on Handheld and Ubiquitous Computing, pp. 361-370, London, UK, 2000.

[16] A. Hinze and A. Voisard, Location- and Time-based Information Delivery in Tourism, International Symposium on Advances in Spatial and Temporal Databases (SSTD 2003), pp. 489-507, July 2003.

[17] J. Zhao and Y. Wang, Autonomous Ultrasonic Indoor Tracking System, IEEE International Symposium on Parallel and Distributed Processing with Applications, pp. 532-539, Sydney, Australia, 2008.

[18] Y. Wang, J. Zhao, and T. Fukushima, LOCK: A Highly Accurate, Easy-to-Use Location-based Access Control System, 4th International Symposium on Location and Context Awareness (LOCA'09), Tokyo, Japan, 2009.

[19] J. Hartigan, Clustering Algorithms, Wiley, 1975.

[20] D. MacKay, Chapter 20: An Example Inference Task: Clustering, in Information Theory, Inference and Learning Algorithms, Cambridge University Press, pp. 284-292, 2003.

[21] G. Luger and W. Stubblefield, Artificial Intelligence: Structures and Strategies for Complex Problem Solving (5th ed.), The Benjamin/Cummings Publishing Company, Inc., ISBN 0-8053-4780-1, 2004.

[22] J. Hartigan and M. Wong, Algorithm AS 136: A K-Means Clustering Algorithm, Applied Statistics, vol. 28, no. 1, pp. 100-108, 1979.

[23] R. Fontana, Recent System Applications of Short-pulse Ultra-wideband (UWB) Technology, IEEE Trans. on Microwave Theory and Techniques, 52(9), pp. 2087-2104, 2004.

[24] W. Chung and D. Ha, An Accurate Ultra Wideband (UWB) Ranging for Precision Asset Location, in Proceedings of IEEE Conference on Ultra Wideband Systems and Technologies (UWBST'03), pp. 389-393, Reston, VA, USA, November 2003.

[25] R. Fleming, C. Kushner, G. Roberts, and U. Nandiwada, Rapid Acquisition for Ultra-wideband Localizers, in Proceedings of IEEE Conference on Ultra Wideband Systems and Technologies (UWBST'02), pp. 245-249, Baltimore, MD, USA, May 2002.


Micro-Mobility Solution Based on Intra-domain Multicast and Congestion Avoiding for Two-Nodes Mobile IP Network

Yacine Benallouche and Dominique Barth
PRiSM-UMR 8144, Université de Versailles Saint-Quentin
45, Avenue des Etats-Unis, 78035 Versailles Cedex, France
Email: {bey, dominique.barth}@prism.uvsq.fr

Abstract—In this paper, we deal with micro-mobility in the TWINBOARD network, a two-node mobile network architecture based on an all-IP infrastructure. The two nodes are the Base Station (BS) and the Access Gateway (AG). To manage micro-mobility, we propose a new approach providing efficient and smooth handover, while being able to coexist and inter-operate with existing technologies. Specifically, we propose an intra-domain multicast-based handover approach combined with an Alert mechanism. The Alert approach is a distributed mechanism that provides routers with information regarding the congestion state of other routers without any modification of the existing routing protocol. Our solution achieves an efficient intra-domain handover and avoids flooding the network. The simulations used to evaluate our scheme and compare it to another multicast scheme, DVMRP (Distance Vector Multicast Routing Protocol), show that our solution performs well and outperforms the DVMRP scheme. Our main contribution is an efficient new approach to manage IP micro-mobility using intra-domain multicast with an alert mechanism.

Keywords - Micro-mobility, intra-domain handover, multicast algorithm, alert algorithm.

I. INTRODUCTION

Current mobile networks are composed of several network elements interconnected by specific network infrastructures, leading to significant development, deployment, and maintenance costs. For example, in the data path of 3G networks defined by 3GPP there are at least four types of interconnected nodes: Node B, RNC, SGSN, and GGSN. The architecture under definition at the WiMAX Forum is a step towards simplification, defining an IP-based infrastructure connecting three nodes [23]: Base Station (BS), Access Gateway (AG), and Anchor Point/Home Agent (HA).

Considering the topics and objectives of the Next Generation Mobile Networks (NGMN) initiative launched by major European and North American mobile network operators [20], the European CELTIC project TWINBOARD¹ investigates the performance and cost of a two-node IP architecture for a mobile network (see Figure 1). In this project, we consider a mobile network composed of BSs connected to one or several enhanced AGs through a dedicated network that we call the Access Aggregation Network (A2N). The major objective of the TWINBOARD project is to propose a novel A2N architecture considering specific features of IP networks, especially in terms of load distribution, reliability, and flexibility. Here, all mobility-related functions and associated features are ensured by the BS-AG tandem.

[Figure 1 diagram: BSs connected to AGs through an aggregation network with dynamic control]

Fig. 1. TWINBOARD network Architecture

The target architecture defined by the TWINBOARD recommendations is an optimized Packet Switched (PS) network architecture, which will provide a smooth migration of existing 2G and 3G networks towards an IP network with improved cost competitiveness and broadband performance. The A2N network is an IP-based network, loosely meshed with a tree-like traffic

¹TWINBOARD European CELTIC project, see http://www.celtic-initiative.org/Projects/TWINBOARD/default.asp.


pattern (mostly from/to the GW and BSs) that changes due to mobility. Due to these peculiarities, load sharing and resilience mechanisms known from the Internet are expected to yield suboptimal results.

Related works: Several studies on micro-mobility [7], [16] show that Mobile IP (MIP) [11], the proposed standard, has several drawbacks stemming from its network overhead and its end-to-end delays due to the triangle routing problem. Many micro-mobility approaches attempt to improve MIP [13], [14] in current IP mobile networks. However, such approaches suffer from complexity and poor handover performance [6]. To the best of our knowledge, the use of multicast combined with alert message diffusion to manage intra-domain handover in a two-node Mobile IP network has never been studied.

The rest of the paper is organized as follows. In Section II, we give an overview of multicast protocols. In Section III, we describe our proposed algorithms and we prove the NP-completeness of the related problem. Section IV presents our TWINBOARD simulator and its environment (topologies used, multicast group, and traffic model). Section V gives the evaluation and simulation results. In Section VI, we present conclusions and outline future work.

In our case, we propose two distributed algorithms, mainly implemented on the GW and BS nodes, to guarantee QoS while dealing with the real-time constraints of mobility. The paradox we deal with is to guarantee moving mobile connections by ensuring sufficiently flexible resource use over IP routing without relying on overly expensive mechanisms on each router of the aggregation network. With these aims, the solution we propose focuses in particular on optimizing the routing tables and on traffic load balancing techniques between BS and GW, without using resource allocation mechanisms in the IP interconnection network.

The architecture we propose is based on two collaborative algorithmic mechanisms. First, to avoid congestion in the network and to ensure flexibility in the use of the network's bandwidth, we adapt the routing alert algorithm proposed in [19] for inter-domain networks by introducing a hierarchy concept in the IP network based on the particular structure of the considered traffic (from BS to GW and from GW to BS). Second, we propose a distributed process to control multicast functionalities in each IP router so as to obtain a delay-constrained multicast tree compatible with the embedded IP routing. Here, multicast is used to anticipate a handover when it is considered probable. Traffic to the target mobile user is then multicasted to the BS to which it is connected and to the geographically neighboring BSs. The aim of the proposed process, which uses only basic IP mechanisms, is to limit the congestion overhead due to the multicast.

These algorithmic solutions take advantage of datagram forwarding and routing and present several natural advantages:

• Cheap installation and exploitation on an adapted and inexpensive existing IP infrastructure.
• Good load distribution (Alert mechanism).
• Ability to coexist and to inter-operate with existing technologies.

II. MULTICAST OVERVIEW

We categorize algorithms for multicast tree construction into two categories [17]:

1) Source-Based Algorithms (SBA).
2) Core-Based Algorithms (CBA).

In an SBA algorithm, the tree's root is the source node and the leaves are the multicast group's members. SBA is currently used as the tree construction algorithm for the Distance Vector Multicast Routing Protocol (DVMRP) [18], Protocol Independent Multicast Dense Mode (PIM-DM) [3], and Multicast Open Shortest Path First (MOSPF) [10].

A CBA, or core-based algorithm, selects a core node as the multicast tree's root. A tree rooted at the core node is then constructed to reach all the multicast group's members. In this case, the core node is different from the source, and it is very important to select the best possible one. The source sends messages to the core node, which distributes those messages to the destinations. Among the protocols that use a CBA we can cite Protocol Independent Multicast Sparse Mode (PIM-SM) [4] and the Core-Based Tree (CBT) protocol [2]. Core-based algorithms are highly suitable for sparse groups and for large networks; indeed, they provide excellent bandwidth conservation for receivers. With multicast technology, multimedia applications such as videoconferences require efficient QoS management. An essential factor of these real-time strategies is to optimize the DVBMT problem [12].

The multicast delay variation is the difference between the maximum end-to-end delay and the minimum end-to-end delay among the paths from the source node to all the destination nodes. Minimizing this parameter


allows all the destination nodes to receive the same data as nearly simultaneously as possible. One formulation of the DVBMT problem is to minimize the multicast delay variation under a multicast end-to-end delay constraint. In [12], the authors propose a heuristic solution called the Delay Variation Multicast Algorithm (DVMA), which first constructs the tree by considering only the end-to-end delay constraints and then enhances the tree by considering the multicast delay variation constraint. Nevertheless, DVMA has a high time complexity, which does not fit modern applications.
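In our notation (a formalization we add for clarity; the symbols are assumptions, with d_T(s, v) the end-to-end delay from the source s to destination v along tree T, and Δ the delay bound), the quantity minimized in the DVBMT problem can be written as:

```latex
\min_{T} \;\Big( \max_{v \in D} d_T(s,v) \;-\; \min_{v \in D} d_T(s,v) \Big)
\quad \text{subject to} \quad \max_{v \in D} d_T(s,v) \le \Delta ,
```

where T ranges over the multicast trees spanning the destination set D.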

Another heuristic solution, with lower time complexity than DVMA, is the Delay and Delay Variation Constraint Algorithm (DDVCA). DDVCA is based on the Core-Based Tree (CBT), where the core node is selected as the node with minimum delay variation with respect to all other multicast group nodes. However, DDVCA exhibits a high network load around the core node: all the multicast packets transit through the core node, which then resends these packets to the leaves.

Our multicast algorithm overcomes these limitations and is used in wireless networks to manage handovers by constructing an optimized tree from an Access Gateway (AG) to some Base Stations (BSs) on which mobility can be predicted. Unlike DDVCA, where the tree construction is based on only one core node, our distributed solution extends this construction to several core nodes. This allows us to minimize bandwidth consumption, as we spread the load over different core nodes.

III. MULTICAST & ALERT-BASED MICRO-MOBILITY

In this section, we present our solution to manage micro-mobility with congestion avoidance. We propose two algorithms: a multicast algorithm and an alert algorithm.

A. Problem Context

Handover performance and router congestion are significant factors in evaluating the performance of an IP mobile network. With the growth of the Internet, it becomes crucial to design efficient, scalable, and robust handover protocols. We propose a new architecture providing efficient and smooth handover with congestion avoidance. Our approach consists of two distributed algorithms:

• A Multicast Algorithm used to construct an optimized multicast tree from an Access Gateway (AG) to some Base Stations (BSs) on which mobility can be predicted [1].

• An Alert Algorithm used to avoid IP router congestion [19].

Note that the multicast occurs only during the period in which the BS communicates the candidate cells to the moving Mobile Node. Once this information is acquired by the Mobile Node and one cell is selected, unicast routing resumes and the multicast is interrupted. The handover trigger is based on a simple power signal comparison between the current BS and the candidate one. In our case, a Handover Trigger event is invoked when the received signal level in the current cell becomes lower than a pre-defined threshold. A handover_trigger notification message is sent by the current cell to the AG in order to launch the multicast to the candidate cells.
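The trigger condition described above can be sketched as a simple threshold test. This is a minimal illustration, not the TWINBOARD implementation; the function name, the threshold value, and the hysteresis margin (added to avoid ping-pong handovers) are all our assumptions:

```python
def should_trigger_handover(rssi_current_dbm, rssi_candidate_dbm,
                            threshold_dbm=-85.0, hysteresis_db=3.0):
    """Trigger when the serving cell's received signal drops below the
    pre-defined threshold AND a candidate cell is stronger by a hysteresis
    margin. The hysteresis term is our addition, not stated in the paper."""
    return (rssi_current_dbm < threshold_dbm and
            rssi_candidate_dbm > rssi_current_dbm + hysteresis_db)
```

When this predicate becomes true, the serving cell would send the handover_trigger notification to the AG.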

We consider a single domain, as shown in Figure 2. The Border Router (BR) connects the network to the Internet, and one Access Router (AR) serves a number of BSs. When a mobile node moves from one BS to another without changing its AR, we speak of an intra-AR handover, a case not considered in this paper because it is specific to the AR implementation. Each BS on which mobility can be predicted is assigned a multicast address, and it sends a notification to its attached AG. We do not focus on mobility prediction, so we assume that we know the set of Base Stations on which mobility can be predicted [8], [15]. Considering this set to be the multicast group Mi, it is natural to match it to the local group given by the neighboring set NSi of each Base Station BSi in the planar graph PG = (Vp, Ep), where Vp is the set of Base Stations and Ep is a set of directed links. Therefore, for a given BSi, a multicast group Mi ⊂ Vp is constructed from m Base Stations belonging to the set NSi and distributed geographically over the TWINBOARD access network.

In practice, the multicast is triggered at the AG level, the AG being considered the intelligent entity responsible for processing handovers and other functions in the TWINBOARD network. In the simulated handover scenario (Figure 2), a mobile node attached to Base Station BSi is in communication with another mobile node attached to Base Station BSj that may roam outside its serving cell. Before the handover is triggered, unicast packets are sent by BSi to its attached AG across the IP router network by using a unicast routing protocol combined with the Alert mechanism. Once the handover is detected, and according to the address of BSj obtained from the unicast packet, the AG selects the multicast group members mj from the NSj set. It then constructs the multicast packets to be forwarded to each multicast group member using the proposed Multicast Algorithm combined with the Alert Algorithm.


[Figure 2 diagram: Base Stations BSi and BSj with Mobile Nodes i and j, Access Routers, Border Routers, the Access Gateway, and the Internet; Unicast + Alert traffic from BSi, mobility prediction yielding the NSj set, and a Multicast Trigger with input i, j, and NSj launching Multicast + Alert towards the candidate cells]

Fig. 2. TWINBOARD Architectural view

B. Multicast Algorithm

When a mobile user moves from one point of access to another within a domain, a handover event takes place between the two points of access. A handover involves redirecting the incoming traffic flow to the new access point. In a proactive handover, the link between the mobile user and the new access point is established prior to its disconnection from the old access point. Hence a smooth handover, i.e., a handover with low packet loss, can take place by exploiting the fact that the new access router is known a priori and that multicasting allows proactive path setup to the new access router. The packets are multicasted to the mobile nodes within the domain where the handover can be predicted. Mobility prediction need not be part of the micro-mobility algorithm, as it can be better achieved with additional information from lower layers.

Problem definition:
We define the related multicast problem from a graph theory point of view (see Figure 3 for an example). We then show that it is an NP-complete and non-approximable problem. We consider a symmetric digraph G = (V, E) with a coherent routing function R, a vertex s ∈ V (called the transmitter), and a vertex subset D (called the destination set).

Definition 1: We define a Broadcast scheme from s to D as a subtree of G rooted in s, with depth equal to max_{v∈D} |R(s, v)|, with leaves in the set D, and whose edge set can be decomposed into a set of paths such that:

1) The initial extremity of each path is s or an intermediate vertex with outgoing degree > 1 in the subtree.
2) The final extremity of each path is a leaf.
3) Each path <i, f> corresponds to the route R(i, f).

The number of edges in the subtree defines the size of the broadcast scheme.

[Figure 3 diagram: transmitter s, vertices in D, routes from R]

Fig. 3. An example of multicast scheme

[Figure 4 diagram: transmitter s; subset vertices S1, ..., Sm and S'1, ..., S'm; element vertices c1, ..., cn forming the destination set]

Fig. 4. Polynomial reduction

The problem of finding a multicast tree minimizing the number of edges under an IP routing constraint can be defined as follows:

Problem: Destination_Mobility (DeMo)
Given: A symmetric digraph G = (V, E), a coherent routing R, a transmitter s ∈ V, a destination set D, and an integer k.
Question: Does there exist a Broadcast scheme of size at most k?

Complexity:
Theorem 1: For some c > 0, Problem DeMo is NP-complete and not approximable within c log(n), where n is the number of vertices of the graph.

To prove this theorem, we propose a polynomial reduction from the SET-COVER problem.

96

International Journal on Advances in Telecommunications, vol 3 no 1 & 2, year 2010, http://www.iariajournals.org/telecommunications/

2010, © Copyright by authors, Published under agreement with IARIA - www.iaria.org

Page 104: The - IARIA Journals · Libraries are permitted to photocopy or print, ... Rana Hamed, German University in Cairo (GUC), Egypt 9 - 27 Exploiting Concatenation in the Design of Low-Density

5

Proof: Problem DeMo is clearly in NP. Indeed, it is easy to check in polynomial time, for a given structured tree, whether it is a broadcast scheme with at most k edges. Counting the number of edges requires at most |E| elementary operations. The generation of the trees can be done in time polynomial in the number of structured trees by choosing recursively the initial extremity for each element of D. To prove that Problem DeMo is NP-complete, we propose a polynomial reduction from the SET-COVER problem, defined as follows:

Problem: Set Cover (SET-COVER)
Given: A set C = {c1, ..., cn} of n elements, a set S = {S1, ..., Sm} of m subsets of C such that ⋃_{i=1..m} Si = C, and an integer k'.
Question: Does there exist a subset S' of S (called a cover of the set C: ⋃_{i | Si ∈ S'} Si = C) such that |S'| ≤ k'?

The problem SET-COVER has been shown to be NP-complete in [5]. Consider any instance I = (C, S, k') of Problem SET-COVER. We transform this instance into the instance I' = (G, R, D, s, k) of Problem DeMo.

1) G is defined as follows:
   • We define the sets of vertices ζ = {c1, ..., cn} and ϕ = {S1, ..., Sm}, the vertex s, and the set ϕ' = {S'1, ..., S'm}.
   • We connect the vertex s by a symmetric edge to each vertex of the set ϕ.
   • We connect the vertex cj to the vertex Si by an edge if and only if cj ∈ Si.
   • So, the vertex set of the graph G is V = {s} ∪ ζ ∪ ϕ ∪ ϕ'.
   • We consider a complete bipartite graph between ϕ and ϕ'.

2) The set of destinations is D = ζ ∪ ϕ', and the transmitter is the vertex s.

3) The routing R is defined as follows:
   • For i ∈ {1, ..., k'}, the route R(s, S'i) is the shortest path crossing first the arc to Si ∈ ϕ, then the arc from Si to S'i;
   • For each couple (i, j), the route R(Si, S'j) is the arc that connects them in the complete bipartite graph;
   • For j ∈ {1, ..., n}, if and only if cj ∈ Si, the arc from Si to cj is the route R(Si, cj);
   • For every other couple of vertices in the graph G, we consider a shortest path routing.
4) We choose k = k' + m + n.

It is clear that the number of vertices in the graph G is polynomial in |C|. Thus, we obtain a polynomial reduction SET-COVER → DeMo. This construction is shown in Figure 4.

Consider now that the answer to Problem SET-COVER for I is positive. Then there exists a cover C* ⊂ S of k' elements that covers the set C. We define a Broadcast scheme where the tree is induced by the union of the edges of the following routes:

1) The routes of length 1 from s to each vertex Sj such that Sj ∈ C*.
2) For each i such that Si ∉ C*, a route of length 1 from a vertex Sj ∈ C* to S'i.
3) For each ci ∈ ζ, a route of length 1 from a vertex Sj ∈ C* to ci such that ci ∈ Sj.

First, it is clear that this set of routes induces a subtree of G rooted in s with k' + m + n edges, depth 2, and leaf set D. Secondly, by considering the routes from s to the vertices S'j such that Sj ∈ C*, and all other edges as routes, we can conclude that this tree defines a broadcast scheme.

Conversely, consider now that the answer to Problem DeMo for I' is positive. Then there exists in the graph G a Broadcast scheme with at most k edges, with s as transmitter and where each destination (final extremity) is in D. Since ϕ is a vertex set disconnecting s from D, and since the maximal delay is m + n + 1, the tree can only consist of some paths from s to a subset Sϕ of ϕ and then one arc from a vertex in Sϕ to each vertex in D. Thus, |Sϕ| = k', and since Sϕ is also a subset of S, the answer to Problem SET-COVER is positive for instance I.

We conclude that Problem DeMo is NP-complete.
Now, let us demonstrate that Problem DeMo is not approximable within c log(n), for some c > 0, with n the number of vertices. We consider Problem DeMo as a minimization problem. We have seen that any tree in a broadcast scheme for any instance I' described above consists of some paths from s to a subset Sϕ of ϕ and then one arc from a vertex in Sϕ to each vertex in D. Thus, the number of arcs in such a tree is equal to a + m + n for some a, and at the optimum this number is equal to opt + m + n, where opt is the minimum size of a solution for instance I to Problem SET-COVER considered as a minimization problem.

Consider that for some constant c, Problem DeMo is c-approximable, i.e., there exists a polynomial algorithm providing a solution of size a + m + n for I', and hence a solution of size a for I, such that


(a + m + n)/(opt + m + n) ≤ c. Thus, a/opt ≤ c(1 + 1/opt) ≤ 2c, i.e., Problem SET-COVER is 2c-approximable, which contradicts the fact that, for some c > 0, this problem is not approximable within c log(m + n) [9]. We conclude the proof of Theorem 1.

Distributed algorithm to solve Problem DeMo:
We now give more details about our proposed algorithm to achieve an efficient intra-domain handover. A multicast datagram is composed of two portions: a fixed-length header and a data field. The header contains a main destination address, considered by the routing function, and a set of secondary destination addresses. An IP router receiving such a multicast packet has to decide whether it creates and sends new multicast packets through all the outgoing links or stays on the main destination route. If the destination set for a link is empty, no packet is sent on that link. Note that the complexity of this distributed algorithm is low and that no large memory size is required in the routers. Moreover, the algorithm simply pilots the usual IP multicast functions of these routers.

The header of a multicast packet P consists of:

• A set of destinations Dest(P) and a main destination Maindest(P) ∈ Dest(P).

• A delay Delay(P), the maximal number of remaining steps for each destination in Dest(P) to receive the packet.

For each couple of nodes u and v in the graph, let us denote by succR(u, v) the successor of u on the route R(u, v). Consider now a node u receiving a multicast packet P at a given step. Any distributed algorithm to solve Problem DeMo consists, for u, in sending a multicast packet Pw to each w ∈ Γ+G(u) at the next step. Such a distributed algorithm has to respect the following rules. For each node w = succR(u, v), let us first define:

Sent(w, P) = {v ∈ Dest(P) s.t. |R(u, v)| = Delay(P)}.

Then,

1) Sent(w, P) ⊂ Dest(Pw),
2) Delay(Pw) = Delay(P) − 1,
3) if w = succR(u, Maindest(P)) then Maindest(Pw) = Maindest(P),
4) the set of subsets {Dest(Pw) | w ∈ Γ+G(u) and Dest(Pw) ≠ ∅} is a partition of Dest(P).

If Dest(Pw) = ∅ for some w, then no packet is sent by u to w. Considering u′ as the neighbor of u having sent Pu to u, we define for any v ∈ Dest(Pw): Last(v) = |R(u′, v)|.

We consider w0 = succR(u, Maindest(P)). Then, for any w ∈ Γ+G(u) and respecting the previous rules, the distributed multicast algorithm consists of the following instructions:

Algorithm 1 Node u receiving a multicast packet P

Require: Packet P; w = succR(u, v).
Ensure:
1: if w ≠ w0 then
2:   Dest(Pw) = Sent(w, P)
3:   for all v ∈ Dest(P) − ⋃_{w ∈ Γ+_G(u)} Sent(w, P) do
4:     if |R(u, v)| ≥ Last(v) then
5:       Put v in Dest(P_succR(u,v))
6:     end if
7:   end for
8:   for all w ∈ Γ+_G(u) and any v ∈ Dest(Pw) do
9:     Set Last(v) = |R(u, v)|
10:  end for
11: end if
12: Maindest(Pw) = the least-numbered vertex in Dest(Pw).
13: Dest(Pw0) = Dest(P) − ⋃_{w ∈ Γ+_G(u) − {w0}} Dest(Pw)

Step number 1 guarantees the respect of the previous rules. Step 2 is the optimization step of the algorithm, consisting in identifying, on the current branch of the tree, whether for any destination v ∈ Dest(P) − ⋃_{w ∈ Γ+_G(u)} Sent(w, P) the current vertex u is or is not the closest vertex to v.

To initiate the process, we consider that the transmitter s has received a packet P with a node in D such that the distance from s to this node is maximum (this distance initializes Delay(P)). This node will be the main destination, and all the other nodes in D compose the secondary destinations. For any v ∈ Dest(P), we also set Last(v) = |R(s, v)|.
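As one possible reading of the per-node forwarding step above, the following sketch (not the authors' implementation) partitions the destinations of a packet among successors. Routes R(u, v) are assumed precomputed and stored as node lists; the packet fields and the `last` dictionary mirror Dest(P), Maindest(P), Delay(P) and Last(v).

```python
# Sketch of the forwarding step of Algorithm 1, under the assumption that
# route[(u, v)] is the node list of R(u, v), starting at u and ending at v.

def forward(u, packet, route, last):
    """Split packet among u's successors following rules 1-4.

    packet: {'dest': set of destinations, 'main': Maindest(P), 'delay': Delay(P)}
    last:   dict v -> Last(v), the |R(u', v)| recorded at the previous hop
    Returns {successor w: child packet Pw}; also refreshes last.
    """
    def succ(v):                      # succ_R(u, v)
        return route[(u, v)][1]
    def dist(v):                      # |R(u, v)|, the remaining hop count
        return len(route[(u, v)]) - 1

    w0 = succ(packet['main'])         # successor on the main route
    dest_of = {w0: set()}
    # Sent(w, P): destinations whose remaining distance equals Delay(P).
    sent = {v for v in packet['dest'] if dist(v) == packet['delay']}
    for v in sent:
        dest_of.setdefault(succ(v), set()).add(v)
    for v in packet['dest'] - sent:
        # Optimization step: branch v off at u only if u is no closer
        # to v than the previous hop was (|R(u, v)| >= Last(v)).
        if succ(v) != w0 and dist(v) >= last.get(v, dist(v)):
            dest_of.setdefault(succ(v), set()).add(v)
        else:
            dest_of[w0].add(v)        # leftovers follow the main route
    out = {}
    for w, dests in dest_of.items():
        if not dests:
            continue                  # empty part: no packet on this link
        main = packet['main'] if w == w0 else min(dests)
        out[w] = {'dest': dests, 'main': main,
                  'delay': packet['delay'] - 1}
        for v in dests:
            last[v] = dist(v)         # refresh Last(v) for the next hop
    return out
```

The choice of `min(dests)` as the new main destination stands in for "the least-numbered vertex" of line 12; with node labels that sort differently, a dedicated ordering would be needed.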

C. Alert Algorithm

This algorithm is based on the work proposed in [19] for inter-domain networks. We adapt this routing alert algorithm for an intra-domain network by introducing a hierarchy concept in the TWINBOARD network similar to the one used in inter-domain networks. The Alert algorithm uses the existing intra-domain routing protocol functionalities. It acts directly on routing tables by disabling, activating or replacing routes. Our goal is to provide routers with information regarding the congestion state of other routers without any change in the routing protocol. We say that a router is:


International Journal on Advances in Telecommunications, vol 3 no 1 & 2, year 2010, http://www.iariajournals.org/telecommunications/

2010, © Copyright by authors, Published under agreement with IARIA - www.iaria.org


• Perturbed, or in red state: if the total amount of traffic transiting through it, emitted by it, and sent to it exceeds its capacity.

• Stable, or in green state: if its capacity exceeds its traffic load.

Each router informs its neighbors when it becomes perturbed, in order to allow them to change their routing, and it also keeps them informed when it returns to an operational state. Each router is provided with:

• A routing table RTi containing the next hop for the intra-domain routing protocol.

• A table LTi containing a list of the next hops towards every destination. This table is altered by the classical intra-domain routing mechanism only.

• A priority table PTi storing a list of potential congestion-free routes. It is the same as the RTi table if there are no perturbed routers.

• A state table sti containing the states of its neighbors. This table is updated by alerts sent by neighboring routers when their states change. The default values in this table are green.

The alert message is composed of an identifier of the router (ID), a new state, and a delay d. Once in the router scheduler, the received message will be processed after a delay d (units of time) and will replace any older message that arrived from the same emitting router. The delay is set according to an exponential distribution in order to avoid synchronization in the network. The mean of the distribution is small for a red alert and large for a green alert. In this way, we limit the emergence of oscillations that could occur through exchanges of green and red messages.
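The randomized alert delay can be sketched as follows. The two mean values are illustrative assumptions; the paper only states that the mean is small for red alerts and large for green ones.

```python
import random

# Hypothetical mean delays (time units): red alerts propagate faster.
MEAN_RED, MEAN_GREEN = 0.1, 1.0

def make_alert(router_id, state, rng=random):
    """Return (ID, state, delay): the delay is drawn from an exponential
    distribution to desynchronize routers and damp red/green oscillations."""
    mean = MEAN_RED if state == "red" else MEAN_GREEN
    return (router_id, state, rng.expovariate(1.0 / mean))
```

A receiving router would enqueue the message in its scheduler for `delay` time units, replacing any pending message from the same emitter.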

Algorithm 2 details the behavior of a router i treating a message p received from node j. First, router i selects the destinations for which router j (in red state) is the next hop. Router i then chooses uniformly an alternative next hop among the routers in green state stored in the set LTi[dest]. If no node can be chosen, node i selects the routing path stored in the RTi table in spite of its state.

IV. SIMULATOR DESCRIPTION

In this section we simulate the behavior of the TWINBOARD network in case of intra-domain handovers by combining the two algorithms: the Multicast and Alert Algorithms. The two proposed algorithms are compatible with the embedded IP routing. The purpose of our simulation is to show how the proposed scheme can be adopted and is compatible with the existing IP routing protocols.

Algorithm 2 Node i treating a message p

Require: message p = (j, state, delay); tables RTi, PTi, LTi, sti
Ensure:
1: delete p from the scheduler
2: sti[j] ⇐ state
3: if state = red then
4:   for all dest ∈ V do
5:     if PTi[dest] = j then
6:       let S = {x ∈ LTi[dest] / sti[x] = green}
7:       if S = ∅ then
8:         PTi[dest] ⇐ RTi[dest]
9:       else
10:        PTi[dest] ⇐ choose_uniformly_in(S)
11:      end if
12:    end if
13:  end for
14: else
15:  for all dest ∈ V do
16:    if sti[PTi[dest]] = red then
17:      if j ∈ LTi[dest] then
18:        PTi[dest] ⇐ j
19:      end if
20:      if RTi[dest] = j then
21:        PTi[dest] ⇐ j
22:      end if
23:    end if
24:  end for
25: end if
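Algorithm 2 can be sketched in a few lines; this is one possible reading, with the table layout (a dict of dicts) being our assumption rather than the paper's data structure.

```python
import random

def treat_alert(tables, msg, V, rng=random):
    """Router i updates its priority table PT on an alert (j, state)
    from neighbor j; mirrors Algorithm 2.

    tables: {'RT': dest -> next hop, 'PT': dest -> preferred next hop,
             'LT': dest -> list of candidate next hops,
             'st': neighbor -> 'green'/'red'}
    """
    RT, PT, LT, st = tables["RT"], tables["PT"], tables["LT"], tables["st"]
    j, state = msg
    st[j] = state
    if state == "red":
        for dest in V:
            if PT[dest] == j:                      # j was the preferred hop
                S = [x for x in LT[dest] if st.get(x) == "green"]
                # No green alternative: fall back to the base route RT.
                PT[dest] = rng.choice(S) if S else RT[dest]
    else:                                          # j came back to green
        for dest in V:
            if st.get(PT[dest]) == "red" and (j in LT[dest] or RT[dest] == j):
                PT[dest] = j
```

Drawing the replacement hop uniformly from the green candidates matches the `choose_uniformly_in(S)` step of the pseudocode.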

A. Simulation environment

In order to prove our concepts and the advantages of our algorithms, we have developed a simulator using OMNeT++ [21], which is a free, open-source discrete event simulation tool, similar to other tools like PARSEC and NS, or commercial products like OPNET. Moreover, OMNeT++ contains definitions of many popular protocols (UDP, TCP, IPv4, IPv6) as well as models of basic network nodes (routers, hubs, access points, gateways, base stations, etc.) that helped us model intra-domain mobility without worrying about the underlying mechanisms.

B. Topology and Traffic Assumptions

The simulated network has been modelled according to the TWINBOARD architecture, as shown in Figure 2. The main elements are the Base Stations (BS) and the Access Gateways (AG), interconnected by a network of basic IP routers. We used topologies with 80 nodes where the number of gateways, the number of base stations and their degree of connectivity are considered as simulation parameters:

• The degree of the gateway is chosen greater than 1.
• The degree of the BS is equal to 1 (Access Points).


• The topology of the IP router network is generated by the BRITE generator [22].

The traffic is generated by the set of BSs (sources); it then flows through the IP routers to reach the gateway and is finally multicast to the Base Stations that are members of the multicast group. TVi(j) represents the traffic amount to be delivered from the source BSi to the destination BSj, a member of the multicast group. The traffic amount

TVi(j) = (α / |Vp|²) · N(10, 8)

generated by BSi is a random quantity given by a normal distribution with mean equal to 10 and standard deviation equal to 8.

The parameter α is the flow's desired traffic parameter used to vary the level of the traffic in the network. To simulate intra-domain handover events, we vary the traffic matrix periodically as follows:

• We choose randomly, with a certain probability Pij, two traffic matrix elements TVi(j) and TVj(i).

• We reduce the value of the traffic matrix element TVi(j) by a certain quantity q.

• We increase the value of the traffic matrix element TVj(i) by the same quantity q.

Indeed, we simulate user handovers through tolerated traffic matrix fluctuations. Although networks are engineered to tolerate some variation in the traffic matrix, large changes can lead to congested links and break the assumptions used in most designs. To simulate realistic scenarios, the cellular network (the set of BSs) is represented by a random planar graph with degree equal to 6.
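The traffic model and the handover perturbation above can be sketched as follows. Parameter names (`alpha`, `n_bs` for |Vp|) are our assumptions; each entry follows the paper's N(10, 8) distribution scaled by α/|Vp|².

```python
import random

def traffic_matrix(n_bs, alpha, rng=random):
    """Build TV with TV_i(j) = alpha / |Vp|^2 * N(10, 8), zero diagonal."""
    return [[0.0 if i == j else alpha / n_bs ** 2 * rng.gauss(10, 8)
             for j in range(n_bs)] for i in range(n_bs)]

def handover_step(tv, q, p, rng=random):
    """Move a quantity q of traffic from TV_i(j) to TV_j(i) with
    probability p, mimicking one user handover event."""
    n = len(tv)
    i, j = rng.randrange(n), rng.randrange(n)
    if i != j and rng.random() < p:
        tv[i][j] -= q
        tv[j][i] += q
    return tv
```

Note that a handover step conserves the total traffic volume; only its distribution between the two base stations changes.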

C. Simulator Modules

Our simulator deals with the IPv6 Mobility Extension. We can easily build different network scenarios by providing a few simple parameters, from which the simulator constructs the network automatically.

According to OMNeT++, the structure of our simulator is modular. We defined the modules and implemented their functions in C++. The main modules are:

• Gateway: the intelligent component that implements the mobile extension management (Multicast and Alert Algorithms).

• Base Station: these elements represent all physical radio access points belonging to the same network.

• Router: this component stands for the whole wired network between Base Stations, servers and Gateways. It is responsible for routing packets and simulates network delays as well.

V. PERFORMANCE ANALYSIS

With the limited charge capacity of routers and the network overhead, the optimization of the multicast and the Alert mechanism are very important. We simulated three scenarios by varying the level of the traffic flowing through the network. This controlled-load function is invoked by specifying the parameter α (described above):

• The first scenario depicts an unloaded network, obtained with the value α = 0.06. This value engenders minor traffic fluctuations.

• The second scenario is given by an average network load (α = 0.13). This value increases the network load by 200% compared to the unloaded network.

• The third scenario corresponds to a network loaded heavily enough (α = 0.5) to saturate 20% of the routers in the network.

Note that these three values of α were obtained experimentally.

For each scenario, we investigate the following statistics to evaluate the performance of our approach and compare it to DVMRP (Distance Vector Multicast Routing Protocol) combined with the Alert algorithm:

• Number of perturbed routers: a router is perturbed if its traffic load exceeds its capacity.

• Number of perturbed routes: a perturbed route is a route containing at least one perturbed node.

• Volume of perturbed traffic: the sum of the traffic passing over perturbed routes. Traffic from i to j is perturbed if the route from i to j is perturbed.

We chose the DVMRP protocol because it is an Interior Gateway Protocol (IGP), suitable for use within an Autonomous System (AS) but not between different ASs. DVMRP provides an efficient mechanism for connectionless message multicast to a group of hosts.

To verify the functioning of our approach, we performed three series of simulation runs on the same network topologies (50 routers) for three values of α. All the graphs follow a common format. Each one shows the simulation statistics obtained by the proposed and DVMRP algorithms (combined with the Alert algorithm) for the three scenarios mentioned above. In what follows, we denote by standard approach the combination of the two algorithms DVMRP and Alert, while proposed approach denotes the combination of the multicast algorithm with the Alert mechanism.

Figure 5 illustrates simulation results for a heavily


Fig. 5. Simulation results for an 80-node network and high network charge (one simulation run, α = 0.5). Panels: (a) number of perturbed routers, (b) number of perturbed routes, (c) perturbed flow [TU], each versus time [s], for the Standard and Proposed Approaches.

loaded network (α = 0.5). Figure 5.a shows that the number of perturbed routers obtained with the standard approach reaches eleven routers when the traffic matrix is filled up, and it oscillates around this value. Our approach reduces this value to oscillate around eight routers, and it reaches six routers in the best cases. We observe that for the traffic value α = 0.5 almost 20% of the routers are perturbed. In this case the performance of our algorithms approaches that of DVMRP. The same analysis can be done for Figures 5.b and 5.c. Indeed, because of the high network load, our algorithms cannot find substitute routes to achieve

Fig. 6. Simulation results for an 80-node network and average network charge (one simulation run, α = 0.13). Panels: (a) number of perturbed routers, (b) number of perturbed routes, (c) perturbed flow [TU], each versus time [s], for the Standard and Proposed Approaches.

an efficient load-balancing, and consequently they cannot reduce the number of perturbed routers, the number of perturbed routes and the quantity of perturbed flow.

For an average network charge (α = 0.13), Figure 6 shows that our approach's results are significantly better than those obtained by the standard approach. The numbers of perturbed routers and perturbed routes and the quantity of perturbed flow, presented in Figures 6.a, 6.b and 6.c, show that our algorithm performs four times better than DVMRP. In these figures, the spikes correspond to the fluctuations


Fig. 7. Simulation results for an 80-node network and low network charge (one simulation run, α = 0.06). Panels: (a) number of perturbed routers, (b) number of perturbed routes, (c) perturbed flow [TU], each versus time [s], for the Standard and Proposed Approaches.

in the traffic that provoke additional router saturation. Our algorithms react by finding new (better) routes to immediately reduce the number of perturbed routers. In Figure 6.c, we observe that our algorithms reduce the quantity of perturbed flow to a third.

The graphs in Figure 7 show that even in an unloaded network (α = 0.06), the standard approach shows its limits, since we register two perturbed nodes in some cases. On the opposite, our proposed approach quickly finds an efficient routing with no perturbed routers, even with the traffic fluctuations.

Fig. 8. The number of perturbed routers depending on the traffic load, averaged over series of simulation runs (α ∈ {0.06, 0.09, 0.13, 0.50, 1.00}; curves: DVMRP-Multicast+Alert and Opt-Multicast+Alert).

To investigate the influence of the traffic load on the performance of our approach, we averaged series of simulation runs (100 network topologies). Each series had a different load specified by the parameter α, which varies from a light load α = 0.06 to a saturating load α = 1. Figure 8 shows that for the saturating load value almost half of the network routers are perturbed. In this case the performance of our algorithms approaches that of the standard approach. The average values in Figure 8 confirm those shown in Figures 5, 6 and 7. Clearly, the performance of our approach decreases when the parameter α increases.

VI. CONCLUSION

We have presented a novel approach to manage IP micro-mobility using intra-domain multicast with an alert mechanism. Our algorithms efficiently achieve two major functions in the mobile network: mobility management and network congestion avoidance. In terms of multicast performance, our algorithm achieves an optimized multicast in terms of the number of used links. It also provides a minimal break in service, since it is based on handover prediction.

In this paper, and within the context and goals of the TWINBOARD project, our simulation results show that, with good a priori established traffic engineering, our proposed approach performs a reliable intra-domain handover with congestion avoidance. Our multicast algorithm outperforms DVMRP because of its minimal number of perturbed routers.

In the future, we plan to conduct further simulations to investigate real handover scenarios. We would also like to investigate the extension of our approach to support dynamic multicast group membership.


REFERENCES

[1] Y. Benallouche and D. Barth, "Optimized multicast tree for handover in a two-nodes mobile network architecture based on an all-IP infrastructure," Proceedings of the Fourth International Conference on Wireless and Mobile Communications (ICWMC'08), Athens, 2008, pp. 179-184.

[2] A. Ballardie, "Core Based Trees (CBT Version 2) Multicast Routing: Protocol Specification," IETF, RFC 2189, September 1997.

[3] S. Deering, "Protocol Independent Multicast-Dense Mode (PIM-DM): Protocol Specification," IETF, RFC 2365, July 1998.

[4] D. Estrin, "Protocol Independent Multicast-Sparse Mode (PIM-SM): Protocol Specification," IETF, RFC 2362, June 1998.

[5] M. R. Garey and D. S. Johnson, "Computers and Intractability: A Guide to the Theory of NP-Completeness," San Francisco, W. H. Freeman, 1979.

[6] A. Helmy, M. Jaseemuddin and G. Bhaskara, "Efficient Micro-Mobility using Intra-domain Multicast-based Mechanisms (M&M)," ACM SIGCOMM Computer Communication Review, Vol. 32, No. 5, 2002, pp. 61-72.

[7] A. Helmy, "A Multicast-based Protocol for IP Mobility Support," ACM SIGCOMM Second International Workshop on Networked Group Communication (NGC 2000), November 2000.

[8] N. K. Karthirkeyab and P. Narayanasamy, "A Novel Mobility Based Bandwidth Reduction Algorithm in Cellular Mobile Network," International Journal of Computer Science and Network Security, Vol. 8, No. 5, 2008, pp. 62-69.

[9] C. Lund and M. Yannakakis, "On the hardness of approximating minimization problems," Journal of the ACM (JACM), Vol. 41, No. 5, pp. 960-981, September 1994.

[10] J. Moy, "Multicast Extensions to OSPF," IETF, RFC 1584, March 1994.

[11] C. Perkins, "IP Mobility Support," IETF, RFC 2002, October 1996.

[12] G. N. Rouskas and I. Baldine, "Multicast Routing with End-to-End Delay and Delay Variation Constraints," IEEE JSAC, April 1997, pp. 346-356.

[13] T. C. Schmidt and M. Wählisch, "Analysis of Handover Frequencies for Predictive, Reactive and Proxy Schemes and their Implications on IPv6 and Multicast Mobility," Proceedings of the Fourth International Conference on Networking (ICN 2005), Reunion Island, 2005, Part II, Lecture Notes in Computer Science 3421, pp. 1039-1046.

[14] T. C. Schmidt and M. Wählisch, "Performance Analysis of Multicast Mobility in a Hierarchical Mobile IP Proxy Environment," Proceedings of the TERENA Networking Conference, 2004.

[15] W. Su, S. J. Lee and M. Gerla, "Mobility Prediction and Routing in Ad Hoc Wireless Networks," International Journal of Network Management, Vol. 11, No. 1, 2001, pp. 3-30.

[16] K. Suh, D.-H. Kwon, Y.-J. Suh, and Y. Park, "Fast Multicast Protocol for Mobile IPv6 in the Fast Handovers Environments," IETF, Internet Draft, February 2004.

[17] B. Wang and J. C. Hou, "Multicast Routing and its QoS Extension: Problems, Algorithms, and Protocols," IEEE Network, January/February 2000.

[18] D. Waitzman and C. Partridge, "Distance Vector Multicast Routing Protocol," IETF, RFC 1075, November 1988.

[19] M. Weisser, J. Tomasik and D. Barth, "Congestion Avoiding Mechanism Based on Inter-domain Hierarchy," IFIP Networking, LNCS Vol. 4982, Springer Berlin/Heidelberg, 2008, pp. 470-481.

[20] The Next Generation Mobile Networks project web pages: http://www.ngmn-cooperation.com.

[21] The OMNeT++ Simulator: http://www.omnetpp.org.

[22] The BRITE Generator: http://www.cs.bu.edu/brite/.

[23] WiMAX Forum, WiMAX Forum Mobile System Profile, 06-07.


Abstract— In this paper, we study and evaluate the overhead of a security algorithm based on clustering in MANET networks. The analysis of the communication overhead in ad hoc networks is an important issue because it affects the energy consumption and the limited battery lifetime of the mobile nodes. The algorithm partitions the network into clusters based on affinity relationships between nodes and on two types of keys, which are generated by a clusterhead. The first one is shared by a clusterhead and its local members, and the second one is shared by the clusterhead and its parent cluster. The performance evaluation and communication overhead analysis of the proposed protocol are presented using simulation.

Keywords: ad hoc networks, clustering, security, key management.

I. INTRODUCTION

Moving from wired networks to wireless mobile networks leads to pervasive networks with new network-computing challenges. Ubiquitous Computing (UC) is a concept that deals with providing available services in a network by giving users the ability to access services anytime and irrespective of their location. Pervasive Computing (PC) is often considered the same as ubiquitous computing, but the main objective in PC is to offer spontaneous services created on the fly by mobile nodes that interact through ad hoc connections [1].

Mobile ad hoc networks (MANET) are autonomous systems created by mobile nodes without any infrastructure. Currently, MANET has gained popularity in mobile pervasive applications, such as electronic business, emergency teams, etc. These applications support group communications, auto-adaptive discovery and service composition. Most research on pervasive communications in MANET has focused mainly on admission control and resource management (bandwidth, energy consumption, interference, etc.) to perform the communication in these mobile networks. However, in these types of applications, secure group communication is very critical and is a major concern.

Clustering in MANET is a challenging issue because of the dynamic network topology changes. A clustering algorithm partitions a network into different clusters, creating a hierarchy in the network. In general, clustering algorithms can be divided into a cluster formation stage and a cluster maintenance stage. A particular node elected in a cluster to manage the cluster information is known as the clusterhead, and the other nodes are its members.

In MANET, to ensure confidential communication between two or several mobile nodes, traffic can be encrypted so that only receivers can decrypt data [2, 3]. Furthermore, MANET may be highly versatile, involving short-lived communications between nodes that may never have met before, thus complicating initial trust establishment and trust maintenance. Thus, new solutions should be introduced to support efficient and secure group communication in mobile pervasive networks with respect to the dynamic network topology induced by node mobility and unreliable communication. Also, this type of network does not have any trusted node for key management, like a central reference, to ensure message encryption/decryption. This cannot actually satisfy MANET dynamic environments. To solve this problem, one of the approaches is to share a secret key called a "group key" [4]. When a member joins a group, the group key is re-keyed to ensure that the new member cannot decrypt previous messages, a security requirement known as backward secrecy [5]. When a member leaves the group, the group key is re-keyed to ensure that future communications cannot be decrypted by the leaving member, a security requirement known as forward secrecy.
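The join/leave re-keying rule can be sketched minimally as follows. This illustrates backward and forward secrecy only; it is not a complete key management protocol, the class name is ours, and the secure distribution of the new TEK to members is omitted.

```python
import hashlib
import os

class SecureGroup:
    """Toy group with one Traffic Encryption Key (TEK), re-keyed on every
    membership change."""

    def __init__(self):
        self.members = set()
        self.tek = os.urandom(16)          # current TEK

    def _rekey(self):
        # Fresh randomness: the new TEK is independent of the old one.
        self.tek = hashlib.sha256(os.urandom(32)).digest()[:16]

    def join(self, node):
        self.members.add(node)
        self._rekey()   # joiner cannot read past traffic (backward secrecy)

    def leave(self, node):
        self.members.discard(node)
        self._rekey()   # leaver cannot read future traffic (forward secrecy)
```

Because each TEK is drawn independently, knowledge of one key gives no information about its predecessors or successors, which is exactly what the two secrecy requirements demand.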

In MANET, it is not easy to control the mobile members of a cluster and the frequency of their adhesions. The security algorithm must support the mobility problem and the clusters' scalability. Therefore, to solve these problems and ensure trusted communications in a MANET environment, the main solution is to introduce an efficient key management algorithm, adequate to manage and distribute keys to cluster members in order to encrypt/decrypt multicast

Analysis of Communication Overhead for a Clustering-Based Security Protocol in Ad Hoc Networks

C. Maghmoumi, University of Haute-Alsace, F-68000 Colmar, France, [email protected]
H. Abouaissa, University of Haute-Alsace, F-68000 Colmar, France, [email protected]
J. Gaber, Belfort University, F-90000 Belfort, France, [email protected]
P. Lorenz, University of Haute-Alsace, F-68000 Colmar, France, [email protected]


data. An efficient security algorithm should provide a rapid re-keying process and be adaptive to frequent topology changes.

The analysis of communication overhead in an ad hoc network is related to different parameters, e.g., network size, node mobility, node transmission range and network density. An efficient clustering and key management algorithm must support all these network parameters in order to minimize the message overhead.

The rest of the paper is organized as follows. Section II overviews the related work. In Section III, we present an overview of security in ad hoc networks. Section IV introduces the proposed key management protocol. Section V specifies proofs for the proposed protocol. In Section VI, we implement and evaluate the proposed protocol, and Section VII draws conclusions and future work.

II. RELATED WORK

Recently, many clustering algorithms have been proposed for mobile ad hoc networks in order to improve the efficiency of routing protocols and save energy, or to implement efficient flooding and broadcasting mechanisms. Haddad and Kheddouci presented in [6] a classification of topology-based approaches to define an efficient organization over the network to optimize communication protocols for routing, service discovery, resource sharing and management.

Many group security algorithms or protocols have been proposed for MANET in the literature. They can be divided into two categories: centralized and distributed protocols [11]. In a centralized protocol, only a single node controls all the other nodes. Therefore, the re-encryption process is managed only by this node. This protocol can optimize network resources. However, since there is only one key manager in the group, it is probable that this node breaks down [12, 13]. In [14], the authors proposed two key agreement protocols based on threshold cryptography using the Lagrange interpolation theorem. This approach seems theoretically efficient; however, it focuses only on a special scenario. In [7], a hierarchical protocol based on a multicast source key is proposed. The source node provides keys to its local members and to group leaders. A new node that wants to join a group should negotiate with the group's leader; then, the latter informs the source node to get a new key. The only role of leaders is to manage the keys received from the source node. Although the protocol secures the network, its complexity is high due to the multiple key generations needed to maintain group communication security. Luo et al. [17], [18] chose a different method to distribute the certification process. They use a specially crafted key sharing algorithm distributing the key amongst all nodes instead of only a subset. Upon this, Luo et al. build an access control system based on signed tickets issued by neighbors of the node seeking access.

In [15], the authors proposed an analysis of the overhead involved in clustering for one-hop clustered ad hoc networks. This analysis captures the effects of different network parameters, e.g., node mobility, node transmission range, and network density, on the amount of overhead that clustering algorithms may incur in different network environments. However, the authors focused mainly on the cluster maintenance stage, and only one-hop clustering algorithms are considered. In [16], the authors introduced a cluster-based architecture for a distributed public key infrastructure. This architecture is adapted to the highly dynamic topology and varying link qualities in ad hoc networks, but the overhead is very high.

The proposed protocol in this paper differs from previous studies in three ways:

1) We don't require any centralized key control component to manage and distribute keys. Encryption keys are generated by the clusterhead and re-encrypted by participating sub-clusterheads.

2) A dynamic clustering algorithm that is adaptive to frequent topology changes is used.

3) Since the key distribution process is totally decentralized and the keys are shared by different communication groups, the proposed protocol can be used to build a generic security service for multicast communication.

III. SECURITY IN AD HOC NETWORKS

The security of a multicast group requires that only group members can access the data transmitted by the source, even if these data are diffused in the network. To ensure this confidentiality, a symmetric key is used by the source to encrypt data, and by the members to decrypt data. This key is called the TEK (Traffic Encryption Key). The life of a session of a secure group is represented by a set of time intervals, each interval being defined by a change in the status of the group (a member joining or leaving), as shown in Figure 1. To preserve the data confidentiality of the group, it is necessary to renew the encryption key after each event (join or leave). A member who leaves the group should no longer be able to decrypt data.

Figure 1. Life changes of a secured group: join and leave events (Join x, Leave y, Join z, Leave t) at successive times ti, ti+1, ti+2, ti+3.


A. Classification of group key management approaches

Several architectures for group key management in

networks have been proposed and developed; we can

classify them into three approaches according to the number

of TEKs used as shown in figure 2.

Figure 2. Classification of group key management protocols.

In the p TEKs approach, each subgroup shares a local TEK (Traffic Encryption Key) generated by a local controller, where p is the number of subgroups of the multicast group. This hierarchical approach addresses the "1 affects n" problem: at each arrival/departure to/from the group, only the subgroup affected by the change has to change its local TEK. In the (1 to p TEKs) approach, the protocol begins with a centralized approach (1 affects n) and dynamically switches towards a subdivision of the group into subgroups (p TEKs). The (1 TEK) approach shares a single encryption key TEK, used by the source and the members to encrypt/decrypt multicast data; this approach can be used in a centralized architecture (no clustering) or a hierarchical one (static or dynamic clustering). Table 1 shows the classification of different protocols from the literature.

Table 1. Comparison of group key management protocols

IV. CLUSTERING BASED SECURITY PROTOCOL

A. Clustering approach

As the topology of a MANET changes, clustering messages are generated by nodes to notify their cluster members or clusterhead of the changes. The execution of clustering algorithms can usually be divided into a cluster formation stage and a cluster maintenance stage

[20, 21]. Different clustering protocols may use different

schemes and generally there are three types of clustering

messages:

a) Join message, for nodes to learn the neighbors' identities. The HELLO message is often used.

b) Acknowledgement message to accept new node in

the cluster.

c) Leave message to remove a node from a cluster.

In what follows, for the clustering overhead analysis, we denote the network size by N, a cluster size by NCi (the number of members in cluster Ci), the network density by ρ, and the transmission range by r. The average cluster size is NCi = N/n, where n is the number of clusters in the network.

Two properties should be ensured for clustered networks, and any violation will trigger clustering messages at the relevant nodes [15]:

P1. No two clusterheads are directly connected to each other.

P2. Each node should belong to only one cluster.

The main idea underlying this protocol is to divide the ad hoc network into clusters according to affinity relationships between the involved nodes [8] and to use the key management protocol initially proposed in [9]. Once the clusterhead is selected, it handles two KEKs (Key Encryption Keys): one shared by the clusterhead and its local members, and the second shared by the clusterhead and the parent cluster.
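The decrypt-and-re-encrypt role of a clusterhead can be sketched as below; the XOR cipher is a toy stand-in for a real symmetric cipher, used only to keep the sketch self-contained:

```python
import secrets

def xor_encrypt(key, data):
    # Toy XOR cipher standing in for a real symmetric cipher (sketch only).
    return bytes(a ^ b for a, b in zip(data, key * (len(data) // len(key) + 1)))

xor_decrypt = xor_encrypt  # XOR is its own inverse

class Clusterhead:
    def __init__(self):
        self.kek_parent = secrets.token_bytes(16)  # shared with the parent cluster
        self.kek_local = secrets.token_bytes(16)   # shared with local members

    def forward(self, ciphertext):
        # Decrypt with the parent-side KEK, re-encrypt with the local KEK.
        plain = xor_decrypt(self.kek_parent, ciphertext)
        return xor_encrypt(self.kek_local, plain)

ch = Clusterhead()
msg = b"TEK payload"
from_parent = xor_encrypt(ch.kek_parent, msg)
to_members = ch.forward(from_parent)
assert xor_decrypt(ch.kek_local, to_members) == msg
```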

Affinity relationships between the nodes can be

determined according to the services they provide. A

service can be described by four main parts [8]:

Protocol            Number of TEKs    Clustering
IOLUS               p                 static
AKMP                1 to p            dynamic
SAKM                1 to p            dynamic
Chu et al.          1                 no clustering
DEP                 p                 static
Baal                1                 dynamic
BALADE              1 to p            dynamic
Maghmoumi et al.    1 to p            dynamic

[Figure 2 depicts the taxonomy: group key management without clustering (1 TEK, e.g., Chu et al.), with static clustering (p TEKs, e.g., IOLUS, DEP), and with dynamic or service-based clustering (1 to p TEKs, e.g., AKMP, SAKM, Baal, BALADE, Maghmoumi et al.).]


1) the attributes.

2) the capsule.

3) constraints and requirements.

4) set of relevant semantic keywords.

Attributes contain the characteristic of the service, such

as operations that can be invoked and their input and output

parameters. The capsule includes information about the

service localization, the invocation protocol and the port

number. The constraints and requirements give information

about the resources needed to execute the service. The set of semantic keywords is used for matching relevant keywords to each nearby service [1].
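A service description of this form might be modeled as follows; the Jaccard similarity used for MATCH is a hypothetical choice, since the cited work does not fix a particular similarity measure here:

```python
from dataclasses import dataclass, field

@dataclass
class Service:
    attributes: dict                             # operations, input/output parameters
    capsule: dict                                # localization, invocation protocol, port
    constraints: dict                            # resources needed to execute the service
    keywords: set = field(default_factory=set)   # relevant semantic keywords

def match(si, sj):
    # Jaccard similarity over semantic keywords (hypothetical MATCH measure).
    union = si.keywords | sj.keywords
    return len(si.keywords & sj.keywords) / len(union) if union else 0.0

a = Service({}, {}, {}, {"print", "color", "a4"})
b = Service({}, {}, {}, {"print", "a4"})
assert match(a, b) == 2 / 3   # 2 shared keywords out of 3 distinct ones
```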

Ad hoc networks are characterized by node mobility; nodes can move at different speeds. Our goal is to form stable clusters; we therefore set a given threshold to separate the clusters formed by high-speed nodes from those formed by slow nodes. In basic ad hoc networks, nodes can exchange [RTS, CTS, DATA, and ACK] messages over a complete virtual graph in order to guarantee group self-stability through the homogeneous mobility of nodes, and thus to ensure reliable communication between wireless mobile nodes.

In this protocol, the source node generates a TEK encrypted with a Key Encryption Key KEKi that is sent to its local members. Once each clusterhead receives the encrypted data, it decrypts it, re-encrypts it with its own KEKj, and then forwards it to its descendants. Join or leave events within each cluster result in the re-keying of KEKj by the clusterhead. Therefore, the proposed protocol belongs to the class of dynamic clustering algorithms with one TEK and 1 to p KEKs, where p is the number of clusters that constitute the ad hoc network. This makes it possible to optimize the cost of the data encryption and decryption processes and to reduce the "1 affects n" phenomenon [10].

The clustering-based key management protocol consists

of two tasks:

1) Cluster Formation

The cluster formation starts when a node Ni boots

up and sends a cluster join message JOINreq to its

neighbors. This message contains the description of

a service D (Si) and a number IDi that identifies this

service.

When a node Nj receives the message JOINreq (D

(Si), IDi), it examines the compatibility of the

service Si with a service Sj using MATCH (D (Si),

D (Sj)) and sends a response message Rep to Ni that

contains the rate of available energy f(Ej) expressed

by (1) [19]:

f(Ej) = (Emax − Econs) / Emax (1)

Econs = Econs + Ereq + ε ; ε ≥ 0

where Emax is the maximum energy of the node, Econs is the energy consumed, Ereq is the energy required to transmit a packet, and ε is the energy that can be lost in the environment due to unanticipated factors [22].
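Equation (1) can be computed directly; the numeric values below are illustrative:

```python
def energy_rate(e_max, e_cons, e_req, eps=0.0):
    # f(E) = (Emax - Econs) / Emax, after first accounting the cost of the
    # request, following equation (1); eps models unanticipated energy loss.
    e_cons = e_cons + e_req + eps
    return (e_max - e_cons) / e_max

# A node with 100 J capacity that has used 30 J and spends 5 J replying:
assert energy_rate(100.0, 30.0, 5.0) == 0.65
```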

Figure 3. Ad hoc network partitioned into 4 clusters (legend: clusterhead, sub-clusterhead, ordinary node).

When the node Ni receives the response message Rep (ID, f(Ej)), it verifies the RTT (Round Trip Time) and f(Ej), and sends back an acknowledgement message ACK containing a flag that indicates the validation of these parameters. Then, it adds the node Nj to the list of members of the cluster Ci that consider Ni as clusterhead (Ni is the first member of the list). Once the clusterhead is chosen, it generates a KEK that is sent to its local members.

If a member Nj is at two hops from the clusterhead, and if at least two nodes belong to the cluster via Nj, then Nj becomes the clusterhead of a sub-cluster that contains these nodes. It thus generates a KEKj for its own cluster, as shown in figure 3. The formal description of the protocol process is given in figure 4.



When a node Nj receives JOINreq (D(Si), IDs) then
    mij = MATCH(D(Si), D(Sj))
    if mij ≥ σ then
        send RepN (IDi, IDs, f(E))
    fi

When a node Ni receives RepN (IDi, IDs, f(E)) then
    if RTT ≤ β then
        if f(E) ≤ α then
            send ACK (IDj, IDs, non)
        else
            send ACK (IDj, IDs, ok)
            CLi = {CLi ∪ Nj}
            send ({CLi}, KEKi)
        fi
    fi

When a node Nj receives ({CLi}, KEKi) then CLj = CLi

When a node Nj is at 2 hops from the clusterhead then
    if H ≥ 2 then
        CLj = {Nj ∪ {H}}
        send ({CLj}, KEKj)
    fi
// Ni ∈ {H} => Ni is at least 3 hops from the clusterhead
// and Ni received the ACK via Nj

When a node Ni receives LeavNj (IDs) then
    CLi = {CLi \ Nj}
    send ({CLi}, KEKi)

When a node Nj receives LeavHi (IDs) then
    if Nj ∈ {CLi} then
        send JOINreq (D(Sj), IDs)
    else
        CLj = {CLj \ Ni}
        send ({CLj}, KEKj)
    fi

Figure 4. Clustering-based security protocol
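The clusterhead-side validation of a Rep message in figure 4 boils down to the following decision logic; parameter names mirror those of the figure, and the threshold values used in the assertions are illustrative:

```python
def handle_rep(rtt, f_e, beta, alpha):
    """Clusterhead-side validation of a Rep message (sketch of figure 4).

    An ACK is sent only when the round-trip time is within beta; the flag
    is "ok" only when the remaining-energy rate f(E) exceeds alpha.
    """
    if rtt > beta:
        return None          # no ACK: the link is too slow
    if f_e <= alpha:
        return "non"         # ACK with rejection flag
    return "ok"              # ACK ok: add the node to CLi, send KEKi

assert handle_rep(rtt=0.1, f_e=0.8, beta=0.5, alpha=0.3) == "ok"
assert handle_rep(rtt=0.1, f_e=0.2, beta=0.5, alpha=0.3) == "non"
assert handle_rep(rtt=0.9, f_e=0.8, beta=0.5, alpha=0.3) is None
```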

Consequently, each clusterhead handles two KEKs:

1. KEKi: shared between clusterhead and its local

members.

2. KEKj: shared between the clusterhead and its parent

cluster.

2) Cluster Maintenance

When a member leaves the cluster Ci, it sends a leave message to the clusterhead, which removes it from its list of nodes CLi, regenerates a new KEK, and transmits it to its local members except the departing member. In this step the number of messages sent is NCi − 1, where NCi is the number of nodes in the cluster Ci.

If a clusterhead Nj leaves the cluster, it sends a leave message to its members and to its parent cluster's members. When the clusterhead Ni of the parent cluster receives this message, it removes Nj from the list, regenerates a new KEKi, and transmits it to its local members except the departing member. Each member of the leaving clusterhead sends a join message JOINreq to its neighbors. In this step the number of messages sent is NCi + NCj − 2.

The total number of messages is thus:

Mleav = 2NCi + NCj − 3 (2)

When a new node Nj joins a cluster, it sends messages to its neighbors. The clusterhead that accepts this node regenerates a new KEK for its new list of nodes. The number of messages sent in this step is:

Mjoin = NCi + µ (3)

where µ is the expected number of network neighbors of a randomly chosen node, which depends on ρ, r, and N. Figure 5 shows the process of joining the cluster and figure 6 shows the process of leaving the cluster.


Figure 5. Process of joining the cluster

Figure 6. Process of leaving the cluster

B. Correctness proof

In this part, we derive the expected number of network neighbors µ of a randomly chosen node, and we analyze how our protocol ensures communication trust when a node joins or leaves the cluster.

Lemma 1:

The expected number of network neighbors of a randomly selected node is:

µ = O((N − 1) r² / S) (4)

Proof:

The expected number of network neighbors of a randomly selected node is given in [15]:

µ = (N − 1) [π r² (ρ/N) − (8/3) r³ (ρ/N)^(3/2) + (1/2) r⁴ (ρ/N)²]

Since ρ/N = 1/S within an area S with density ρ, we obtain equation (4).
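The expected-neighbor formula in the proof of Lemma 1 can be evaluated numerically; this sketch assumes the border-effect form with ρ/N = 1/S, and the numbers used are illustrative:

```python
import math

def expected_neighbors(n, r, s):
    # Expected number of neighbors of a random node, with border effects,
    # for N nodes in a square region of area S (Lemma 1's proof formula).
    x = 1.0 / s                      # rho / N = 1 / S
    return (n - 1) * (math.pi * r**2 * x
                      - (8.0 / 3.0) * r**3 * x**1.5
                      + 0.5 * r**4 * x**2)

# 95 nodes in a unit square with range r = 0.2 (normalized units):
mu = expected_neighbors(95, 0.2, 1.0)
assert 9 < mu < 11   # close to (N - 1) * pi * r^2, minus border losses
```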


Lemma 2:

The new node that joins the cluster cannot decrypt past

encrypted data.

Proof:

Assume that a new node Ni sends a cluster join message to the clusterhead. Ni cannot decrypt past messages, because no node can decrypt data as long as it has not received the acknowledgment from its clusterhead. In fact, when the clusterhead receives a new join message, it updates the list of members, regenerates a new key KEKi, and then sends it to its local members. Since every node must hold the current KEK to encrypt and decrypt data traffic, the proposed protocol guarantees backward secrecy.

Lemma 3:

The node which leaves the cluster cannot decrypt future data.

Proof:

The departure of an ordinary node from a cluster is straightforward. The node sends a leave message to the clusterhead, which removes this node from the list of members and regenerates a new KEK that is broadcast to all local members in the new list. However, when a clusterhead wants to leave the network, it must inform the upper clusterhead so that it can re-key and secure the data transmission of the upper cluster. Nodes belonging to the same cluster must also construct a new key. Therefore, forward secrecy is guaranteed.

Theorem:

In Ad Hoc networks, the security in multicast

communications is guaranteed.

Proof:

From lemma 2, it is proved that backward secrecy is

guaranteed. In lemma 3, we have proved that the forward

secrecy is guaranteed. Therefore, the security in multicast

communications is guaranteed.
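The backward/forward secrecy argument can be illustrated with a toy bookkeeping model that tracks which keys each node has ever received; this is purely illustrative and not part of the protocol itself:

```python
import secrets

# Toy secrecy check: a node only ever holds the keys distributed while it
# was a member, so it can read neither earlier nor later intervals.
history = []        # key generated at each interval, as seen by the source
member_keys = {}    # node -> set of keys it has received

def rekey(members):
    key = secrets.token_bytes(16)
    history.append(key)
    for m in members:
        member_keys.setdefault(m, set()).add(key)

members = {"a"}
rekey(members)                        # interval 0: only a is present
members.add("b"); rekey(members)      # b joins -> rekey (interval 1)
members.remove("a"); rekey(members)   # a leaves -> rekey (interval 2)

assert history[0] not in member_keys["b"]   # backward secrecy for b
assert history[2] not in member_keys["a"]   # forward secrecy for a
```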

V. SIMULATION RESULTS

From equations (2) and (3), we can calculate the total number of KEK messages sent during the clustering for one cluster (n = 1):

Mtotal = Mjoin + Mleav

Mtotal = 3NCi + NCj + µ − 3

The total number of KEK messages sent for n clusters is:

Mtotal = Σ(i = 1 to n) (3NCi + NCj + µ − 3) (5)
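Equations (2), (3), and (5) can be sketched as follows; taking the parent cluster size NCj equal to NCi is a simplifying assumption made only for this sketch:

```python
def m_leav(nc_i, nc_j):
    # Equation (2): messages triggered when a clusterhead leaves.
    return 2 * nc_i + nc_j - 3

def m_join(nc_i, mu):
    # Equation (3): messages triggered by a node joining cluster Ci.
    return nc_i + mu

def m_total(cluster_sizes, mu):
    # Equation (5): total KEK messages summed over all n clusters,
    # assuming each cluster's parent has the same size (sketch assumption).
    return sum(3 * nc + nc + mu - 3 for nc in cluster_sizes)

assert m_leav(20, 15) == 52
assert m_join(20, 6) == 26
assert m_total([19] * 5, 2) == 375   # 95 nodes split into 5 clusters of 19
```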

The aim of the simulations we have performed is to study the impact of the transmission range r and the density ρ on cluster formation and KEK message overhead. We are also interested in studying the impact of the number of clusters, with respect to the number of nodes, on the number of KEK messages sent in the ad hoc network.

Figure 7. Number of KEK messages sent with and without clustering for

N = 200 nodes and r = 30 m

The first simulation (figure 7) is performed with 95 nodes, a transmission range r = 22 meters, and a network density ρ varied from 0.2 to 0.6 (the number of nodes per unit area). In the second simulation (figure 8), we increased the number of nodes to N = 200, with a transmission range r = 30 meters and the same variation of density (ρ = 0.2, 0.3, 0.4, 0.5, 0.6). The two simulations are evaluated for different numbers of clusters (n = 3, 5, 8) and compared with the case of a single cluster (n = 1), i.e., no clustering in the network. The two figures (7, 8) show that when the number of clusters increases, the number of KEK messages decreases, because each node that joins or leaves the network affects a single cluster. The "1 affects n" case is avoided, because each node in this case affects only k nodes (where k is the number of nodes in a cluster).

Figure 8. Number of KEK messages sent with and without clustering for N = 95 nodes and r = 22 m


The simulation results also show the benefits of using clusters for the management and maintenance of keys. Increasing the density yields clusters with very high cardinality, which reduces the number of KEK messages in the ad hoc network and ensures efficient key management.

Figure 9. Number of KEK messages sent with and without clustering for

N = 95 nodes and r = 18 m

In this part of the simulation, the aim is to study the impact of the transmission range r and the density ρ on cluster formation and KEK message overhead. The number of nodes is fixed at 95, with a varied transmission range (r = 18, 20, 22). For each simulation we observed the change in the number of KEK messages sent during cluster formation for several values of the density ρ.

Figure 10. Number of KEK messages sent with and without clustering for

N = 95 nodes and r = 20 m

Figures 9, 10, and 11 show that when the transmission range r increases, the number of KEK messages decreases. Similarly, obtaining clusters with a large number of nodes and wide coverage increases the probability that these nodes stay within the cluster, which in turn decreases the number of KEK messages and increases the stability of the cluster.

Figure 11. Number of KEK messages sent with and without clustering for

N = 95 nodes and r = 22 m

In this part of the simulation, we fixed the transmission range of nodes at r = 20 m and increased the number of nodes, with the same variation of the number of clusters (n = 1, 3, 5, 8). The aim is to check the impact of the number of nodes, combined with the number of clusters, on the KEK message overhead.

Figures 12 and 13 show that with 95 and 100 nodes and different numbers of clusters (n = 3, 5, 8), the number of KEK messages decreased; but with 250 nodes and the same numbers of clusters, the number of KEK messages exceeded the number sent in the network without clusters (n = 1), as shown in figure 14. The number of clusters must therefore be compatible with the number of nodes in the ad hoc network.

Figure 12. Number of KEK messages sent with and without clustering for

N = 95 nodes and r = 20 m


Figure 13. Number of KEK messages sent with and without clustering for

N = 100 nodes and r = 20 m

Figure 14. Number of KEK messages sent with and without clustering for

N = 250 nodes and r = 20 m

VI. CONCLUSION

In this paper, we have presented an analysis of the communication overhead of a security protocol based on dynamic clustering in ad hoc networks. The main idea of this protocol is to use affinity relationships between the nodes for cluster formation; when a node is chosen as clusterhead, it generates two KEKs. The first key is shared by the clusterhead and its local members, and the second key is shared by the clusterhead and its parent cluster. The proposed protocol is scalable for large and dynamic multicast groups. To evaluate the performance of the proposed protocol, we calculated the number of KEK messages sent during the protocol steps and performed several simulations to analyze the communication overhead; we studied the impact of the transmission range of nodes r, the density ρ, and the number of clusters relative to the number of nodes on the KEK message overhead in ad hoc networks. The work presented provides a good basis for further analysis of the performance of clustering protocols for MANETs.

In future work, the communication overhead analysis will be extended and compared with different clustering-based protocols.

VII. REFERENCES

[1] J. Gaber, “Spontaneous Emergence Model for Pervasive

Environments,” Proc. IEEE GLOBECOM Workshop. Washington, November 2007.

[2] F. Stajano and R. Anderson, “The Resurrecting Duckling: Security

Issues for Ad-hoc Wireless Networks,” Proc. 7th International Workshop on Security Protocols, 1999.

[3] F. Stajano, “The Resurrecting Duckling: What Next?,” Proc. 8th

International Workshop on Security Protocols, B. Crispo, M. Roe, and B. Criso, Eds., Lecture Notes in Computer Science, Vol. 2133,

Berlin: Springer-Verlag, April 2000.

[4] C. K. Wong, M. Gouda, and S. S. Lam, “Secure group communications using key graphs,” Proc. IEEE/ACM Transactions

on Networking, vol. 8, 2000, pp.16–30.

[5] Y. Kim, A. Perrig, and G. Tsudik, “Tree-based group key agreement,” ACM Transactions on Information and System Security,

vol. 7, 2004, pp.60–96.

[6] M. Haddad and H. Kheddouci, “A survey on graph based service discovery approaches for ad hoc networks,” Proc. IEEE International

Conference on Pervasive Services. Istanbul, 2007.

[7] M. S. Bouassida, I. Chrisment, and O. Festor, “Validation de BALADE,” INRIA Research Report No. 5896, April 2006.

[8] M. Bakhouya and J. Gaber, “An affinity-driven clustering approach

for service discovery and composition for pervasive computing,” Proc. IEEE International Conference on Pervasive Services, Lyon,

France, 2006.

[9] N. Kettaf, A. Abouaissa, P. Lorenz, and H. Guyennet. “A self organizing algorithm for ad hoc networks, ” In Proceedings of the 10th IFIP Int. Conf. on Personal Wireless Communication, Colmar, France, August 2005.

[10] Y. M. Tseng, C. C. Yang, and D. R. Liao, “A Secure Group

Communication Protocol for Ad Hoc Wireless Networks,” Advances

in Wireless Ad Hoc and Sensor Networks and Mobile Computing, Book Series ”Network Theory and Applications,” Springer 2006.

[11] K. C. Chan and S. H. Chan, “Key Management Approaches to Offer

Data Confidentiality for Secure Multicast,” Proc. IEEE Journal on Network, pp. 30-39, October 2003.

[12] C. C. Chang and C. Y. Chung, “An Efficient Session Key Generation

Protocol,” Proc. IEEE International Conference on Communication Technology, Beijing, China, pp. 203-207, April 2003.

[13] I. R. Chen, J. H. Cho, and D. C. Wang, “Performance Characteristics

of Region-Based Group Key Management in Mobile Ad Hoc Networks,” Proc. IEEE International Conference on Sensor

Networks, Vol. 1, pp.411-419, June 2006.

[14] J. Pieprzyk and C. H. Li, “Multiparty Key Agreement Protocols,” IEE Journal on Computers and Digital Techniques, pp.229-236, July

2000.

[15] X. Mingqiang, E. R. Inn-Inn, and K. G. S. Winston, “Analysis of Clustering and Routing Overhead for Clustered Mobile Ad Hoc Networks,” Proc. 26th IEEE International Conference on Distributed Computing Systems (ICDCS'06), p. 46, 2006.

[16] M. Bechler, H. J. Hof, D. Kraft, F. Pahlke, and L. Wolf, “A Cluster-Based Security Architecture for Ad Hoc Networks,” Proc. Twenty-third Annual Joint Conference of the IEEE Computer and Communications Societies, Hong Kong, March 2004.

[17] H. Luo, P. Zerfos, J. Kong, S. Lu, and L. Zhang, “Self-securing ad

hoc wireless networks,” in Proc. 7th IEEE Symp. on Comp. and Communications (ISCC), Taormina, 2002.

[18] J. Kong, P. Zerfos, H. Luo, S. Lu, and L. Zhang, “Providing robust and ubiquitous security support for mobile ad-hoc networks,” in

Proc. 9th International Conference on Network Protocols (ICNP).

Riverside, California: IEEE, Nov. 2001, pp. 251–260.


[19] C. Maghmoumi, T. A. Andriatrimoson, J. Gaber, and P. Lorenz, “A

Service Based Clustering Approach for Pervasive Computing in Ad Hoc Networks,” in Proc. IEEE GLOBECOM 2008, December 2008.

[20] G. Venkataraman, S. Emmanuel and T. Srikanthan, “Size Restricted

Cluster formation and Cluster Maintenance Technique for Mobile Ad-hoc Networks”, International Journal of Network Management,

Wiley InterScience, 2007, Vol.17, pp. 171-194.

[21] N. S. Yadav, B. P. Deosarkar, and R. P. Yadav, “A Low Control Overhead Cluster Maintenance Scheme for Mobile Ad hoc NETworks (MANETs),” ACEEE International Journal on Network Security, Vol. 1, No. 1, May 2009.

[22] C. Maghmoumi, H. Abouaissa, J. Gaber, and P. Lorenz, “A Clustering-Based Scalable Key Management Protocol for Ad Hoc Networks,” Proc. Second International Conference on Communication Theory, Reliability, and Quality of Service, France, July 2009.
