Collision-free Interleavers using Latin Squares for Parallel Decoding
Collision-free Interleavers using Latin
Squares for Parallel Decoding of
Turbo Codes
Hyun-Young Oh
The Graduate School
Yonsei University
Department of Electrical and Electronic
Engineering
Collision-free Interleavers using Latin Squares for Parallel Decoding of
Turbo Codes
Hyun-Young Oh
A Thesis Submitted to the
Graduate School of Yonsei University
in Partial Fulfillment of the
Requirements for the Degree of
Master of Science
Supervised by
Professor Hong-Yeop Song, Ph.D.
Department of Electrical and Electronic Engineering
The Graduate School
YONSEI University
December 2006
This certifies that the thesis of Hyun-Young Oh is approved.
Thesis Supervisor: Hong-Yeop Song
Sanghoon Lee
Kwang Soon Kim
The Graduate School
Yonsei University
December 2006
[Acknowledgments (감사의 글), in Korean — December 2006]
Contents
List of Figures iv
List of Tables v
Abstract vi
1 Introduction 1
1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
2 Parallel Architecture of Turbo Codes 4
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.2 Encoding Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.3 Decoding Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.4 Computing Minimum Distance . . . . . . . . . . . . . . . . . . . . . . 9
2.5 Error-Rate Performance . . . . . . . . . . . . . . . . . . . . . . . . . . 11
3 Collision-free Interleavers 14
3.1 Collision-free . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
3.2 Some Reviews . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
3.2.1 2D Interleaver . . . . . . . . . . . . . . . . . . . . . . . . . . 16
3.2.2 Almost Regular Permutation . . . . . . . . . . . . . . . . . . . 17
4 Proposed Collision-free Interleaver 18
4.1 Construction 1: Using 3GPP interleaver and Latin square . . . . . . . . 18
4.1.1 Latin square type interleaver . . . . . . . . . . . . . . . . . . . 18
4.1.2 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4.1.3 Simulation Results and Discussions . . . . . . . . . . . . . . . 28
4.2 Construction 2: Using 3GPP interleaver and Kasami sequence set . . . 33
4.2.1 Kasami type 1 interleaver . . . . . . . . . . . . . . . . . . . . 33
4.2.2 Kasami type 2 interleaver . . . . . . . . . . . . . . . . . . . . 35
4.2.3 Simulation Results and Discussions . . . . . . . . . . . . . . . 36
4.3 Construction 3: Using S-random interleaver and Latin square . . . . . . 40
4.3.1 Collision-free S-random interleaver . . . . . . . . . . . . . . . 40
4.3.2 Simulation Results and Discussions . . . . . . . . . . . . . . . 41
5 Concluding Remarks 44
5.1 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
5.2 Future Directions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
Bibliography 45
Abstract (in Korean) 50
List of Figures
2.1 Zero-state-starting encoding of information bits (01010) without tail bits 6
2.2 Tail-biting encoding of information bits (01010) without tail bits . . . . 6
2.3 Encoding of parallel turbo codes . . . . . . . . . . . . . . . . . . . . . 7
2.4 Comparison BER and FER of block size 320 (with various parallelisms) 12
2.5 Comparison BER and FER of block size 640 (with various parallelisms) 13
3.1 Memory collision in parallel turbo code interleaving . . . . . . . . . . . 15
4.1 Intra subblock permutation . . . . . . . . . . . . . . . . . . . . . . . . 19
4.2 Inter subblock permutation . . . . . . . . . . . . . . . . . . . . . . . . 20
4.3 Comparison of column wise 2-tuple pattern of 4 by 4 Latin square . . . 21
4.4 Comparison of FER performance between good and bad groups . . . . 23
4.5 Example of the proposed interleaver of size 18 . . . . . . . . . . . . . . 27
4.6 Comparison BER and FER of block size 320 (Latin square type) . . . . 29
4.7 Comparison BER and FER of block size 640 (Latin square type and
semi-Latin square type) . . . . . . . . . . . . . . . . . . . . . . . . . . 31
4.8 Flowchart of making the semi-Latin structure . . . . . . . . . . . . . . 32
4.9 Pseudo code for extracting a hopping sequence in Kasami type 2 . . . . 36
4.10 Comparison BER and FER of block size 320 (Kasami type) . . . . . . . 38
4.11 Comparison BER and FER of block size 640 (Kasami type) . . . . . . . 39
4.12 Comparison BER and FER of block size 640 (Proposed S-random inter-
leaver) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
List of Tables
4.1 Simulation environment . . . . . . . . . . . . . . . . . . . . . . . . . . 22
4.2 Minimum distances of 24 Latin square types of blocksize 320, 640 . . . 24
4.3 Weight parameters of Latin square type . . . . . . . . . . . . . . . . . 28
4.4 Weight parameters of Kasami type 1 and type 2 . . . . . . . . . . . . . 36
4.5 Simulation environment of method 1 and method 2 . . . . . . . . . . . 42
ABSTRACT
Collision-free Interleavers using Latin Squares for Parallel Decoding of Turbo Codes
Hyun-Young Oh
Department of Electrical and Electronic Eng.
The Graduate School
Yonsei University
In many communication systems, turbo codes are widely used due to their powerful error-correcting capability. One of the problems of turbo codes is the decoding delay, which hinders very high speed communication. Thus much research on parallel architectures for turbo codes has been conducted to reduce the decoding delay.
In the parallel architecture of turbo codes, the constituent interleaver must avoid memory collisions, because resolving a collision requires additional delay. In this thesis, we first introduce previously studied interleavers that can be applied to the parallel architecture of turbo codes, such as the almost regular permutation (ARP) and the 2D interleaver constituted by a temporal permutation and a spatial permutation. Next, we analyze the characteristics of an interleaver that directly affect the performance of parallel turbo codes. We then propose collision-free interleavers that can easily be constructed for various block sizes.
The performance of the proposed interleavers is almost the same as that of ARP in the FER 10^−5 region with information block sizes 320 and 640, when the simulation environment is given by 3GPP standard turbo codes with 4-parallelism in an AWGN channel. The proposed interleavers using Latin squares can be constructed for various information block sizes by defining only a single mapping matrix, while ARP needs an exhaustive search process at every block size. We also propose another collision-free interleaver which uses the Kasami sequence set and thus has an irregular structure.
Finally, we apply the well-known S-random interleaver to the proposed interleaver structure. The proposed S-random interleaver uses a Latin-square-structured spatial permutation and is generated by modified constraints different from those of the conventional S-random interleaver. It shows better error-rate performance than ARP, a 0.1 dB improvement at FER 10^−5 with information block size 640.
Key words: Turbo codes, Parallel architecture, Collision-free, Interleaver, Temporal permutation, Spatial permutation
Chapter 1
Introduction
1.1 Motivation
Due to their outstanding error-correcting capability, turbo codes have been studied extensively [1]. One of the problems of turbo codes in communication system applications is the decoding delay, which is directly influenced by the information block size. In the soft-input soft-output (SISO) decoder of turbo codes, the log-likelihood ratios (LLRs) of the information bits can be estimated by forward and backward recursions [2]. Thus the parallel architecture of turbo codes can be a good solution for decreasing the decoding delay, where a block is divided into several subblocks and each of them is (encoded and) decoded separately. No tail bits are needed with the help of circular tail-biting encoding [3], so the starting and ending states are not a problem: in the iterative decoding procedure, the initial probability distributions of the starting and ending states in the BCJR algorithm are set to uniform [4].
In the parallel decoding of turbo codes, many processors must be able to operate simultaneously and must avoid memory collisions [6], [9]. If more than one SISO module tries to access the same memory bank to read bits through the constituent interleaver, the accesses cannot be completed at the same time and additional delay may occur. The collision-free constraints of the parallel architecture were studied in [6], [9]. In 2003, [10] proposed a collision-free interleaver organized by temporal and spatial permutations, which we call here the 2D interleaver. In 2004, [11] proposed another collision-free interleaver, the almost regular permutation (ARP), defined by several parameters so that it can be simply implemented and optimized by searching over those parameters. Recently, [12] proposed a collision-free S-random interleaver which can be applied to various block sizes by pruning.
The deterministic collision-free interleavers proposed in [10] and [11] have a complex optimizing process. In the case of the 2D interleaver, we have to decide the spatial and temporal permutations at every block size; we likewise have to search for the ARP parameters at every block size. Communication systems with turbo codes are generally required to support various block sizes, so the constituent interleaver must be defined at every possible block size, for example, from 40 to 5114 in 3GPP [16]. This thesis proposes collision-free interleaver structures which can be optimized easily over various information block sizes. Furthermore, we propose another collision-free interleaver based on the S-random interleaver, which shows better error-rate performance.
1.2 Overview
In Chapter 2, the parallel architecture of turbo codes is described: the encoding and decoding structures are presented, and the performance of the parallel architecture is compared with that of the conventional non-parallel architecture. In Chapter 3, the constraints on a constituent interleaver of the parallel architecture of turbo codes are analyzed, and previously studied interleavers that can be applied to the parallel architecture are introduced. In Chapter 4, we propose two kinds of interleavers which can be applied to the parallel architecture and easily constructed for various block sizes; a comparison with one of the interleavers of Chapter 3 is presented in terms of performance and complexity. We also propose another collision-free interleaver, generated under a constraint similar to that of the S-random interleaver, and compare its error-rate performance. Finally, the proposed interleavers of this thesis are summarized and some discussions follow.
Chapter 2
Parallel Architecture of Turbo Codes
2.1 Introduction
Due to the delay problem of turbo codes, especially the decoding delay, many attempts to reduce the decoding delay through parallel architectures have been studied [1], [5]. Assume a single processor performs L iterations. A pipeline structure can be applied to reduce the decoding delay, where each processor among W processors operates on the entire information block for L/W iterations before passing the resulting extrinsic information to the next processor in the pipeline and operating on the next information block [1]. A parallel structure was proposed in 1998 by the authors of [5]. In this scheme each information block is divided into W partially overlapping subblocks and each of the W processors performs all L iterations on its subblock in parallel. The reason each window overlaps partially is the initialization problem of the recursions [2], but the authors of [3], [4] solved this problem.
Although all the studies mentioned above reduce the decoding delay of turbo codes, they do not address the delay of interleaving: they only reduce the decoding delay by pipelining or windowing, while interleaving over the entire information block is still executed at every iteration. Attempts to solve this problem by executing the interleaving in a parallel fashion have been studied in [6], [9], [10], [11], [12]. All these interleaver issues are handled in detail in Chapter 3.
2.2 Encoding Structure
In the parallel architecture of turbo codes, the encoding is processed in each separate subblock. If the number of subblocks is n, we call n the degree of parallelism and call this structure n-parallelism, where the whole block is divided into n equal-sized subblocks. In conventional turbo codes (e.g., 3GPP [16]), tail bits must be added for trellis termination at the end of the information block; otherwise the reliability of the bits located at the end of the block becomes unstable. Similarly, tail bits must be added to each subblock in parallel turbo codes, so we need more tail bits in parallel turbo codes than in the conventional case: the more we divide the block, the more rate loss we suffer. Without tail bits we cannot always make the ending state identical to the zero (starting) state.
But using the algorithm in [3], which we call circular encoding or tail-biting encoding, we can make the ending state the same as the starting state without tail bits. Fig. 2.1 represents conventional convolutional encoding, i.e., starting from the zero state (00), where the ending state is not guaranteed to be the zero state (00). Fig. 2.2 represents the tail-biting encoding. In tail-biting encoding we can choose the starting state, which results in the
Figure 2.1: Zero-state-starting encoding of information bits (01010) without tail bits
Figure 2.2: Tail-biting encoding of information bits (01010) without tail bits
Figure 2.3: Encoding of parallel turbo codes
same ending state as the starting state. But this is not always possible: if the subblock size is a multiple of the period of the constituent convolutional encoder, we cannot apply tail-biting encoding [3]. For example, suppose the number of memory elements in the constituent convolutional encoder is 3; then the period is 2^3 − 1 = 7, and we must avoid subblock lengths that are multiples of 7.
Fig. 2.3 represents the encoding structure of parallel turbo codes. The information block is divided into several subblocks, and each subblock is encoded to generate the first parity bits. After interleaving by the constituent interleaver, the second parity bits are generated in the same way.
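The length constraint above is easy to check programmatically. Below is a minimal sketch (the helper name and interface are our own, not from the thesis) that assumes, as stated in the text, that the encoder period is 2^m − 1 for m memory elements:

```python
def tail_biting_applicable(subblock_len: int, num_memories: int) -> bool:
    """Tail-biting encoding fails exactly when the subblock length is a
    multiple of the encoder period 2^m - 1 (m memory elements)."""
    period = 2 ** num_memories - 1
    return subblock_len % period != 0

# With m = 3 the period is 7, so lengths 7, 14, 21, ... must be avoided.
```

For instance, a block of 320 bits split into 4 subblocks of length 80 passes this check, since 80 is not a multiple of 7.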
2.3 Decoding Structure
With the help of the tail-biting encoding, there is no rate loss due to tail bits. But the decoders do not know the starting and ending states exactly; they only know that the starting state and the ending state are identical. So the initial values of α and β in the BCJR algorithm of the iterative decoding process are set to be uniform [4], which means that the probability distribution over the states is uniform. After each iteration, the initial value is updated from the last value of the previous iteration. This tail-biting decoding algorithm can be represented by the following equations. The forward and backward recursions (2.1), (2.2) are executed conventionally in each separate subblock, where S is the set of states, a state transition is assumed to be from s′ to s, L-parallelism is considered and the subblock length is M.
α_k(s) = Σ_{s′∈S} α_{k−1}(s′) γ_k(s′, s)   (2.1)

β_{k−1}(s′) = Σ_{s∈S} β_k(s) γ_k(s′, s)   (2.2)
But due to the tail-biting encoding, the initialization must be set to uniform over all states. (2.3) is the initialization of the alpha values before the first iteration, and the initialization of the beta values is given by (2.4). Here α^(j)_{i,l,m}(s) is the alpha value at the m-th bit of the l-th subblock, in the i-th decoder at the j-th iteration, with 0 ≤ m ≤ M, 0 ≤ l ≤ L − 1 and i = 1, 2.

α^(1)_{i,l,0}(s) = 1/|S|,  ∀s ∈ S, l ∈ {0, · · · , L − 1}, i ∈ {1, 2}   (2.3)

β^(1)_{i,l,M}(s) = 1/|S|,  ∀s ∈ S, l ∈ {0, · · · , L − 1}, i ∈ {1, 2}   (2.4)
At the n-th iteration,

α^(n)_{i,l,0}(s) = α^(n−1)_{i,l,M}(s),  ∀s ∈ S, l ∈ {0, · · · , L − 1}, i ∈ {1, 2}   (2.5)

β^(n)_{i,l,M}(s) = β^(n−1)_{i,l,0}(s),  ∀s ∈ S, l ∈ {0, · · · , L − 1}, i ∈ {1, 2}   (2.6)

Since we must use the last values of the previous iteration to initialize α and β at the current iteration, we need additional memory: 2 · L · |S| storage locations for the α and β values.
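The initialization and wrap-around rules (2.3)–(2.6) can be sketched as follows. This is only an illustration of the boundary handling, not a BCJR decoder; the list-of-lists layout (one row per subblock) is our own assumption:

```python
def init_alpha_beta(num_states: int, L: int):
    """(2.3)-(2.4): before the first iteration, every state is equally
    likely at both boundaries of every subblock."""
    uniform = [1.0 / num_states] * num_states
    alpha_0 = [list(uniform) for _ in range(L)]  # alpha at bit index m = 0
    beta_M = [list(uniform) for _ in range(L)]   # beta at bit index m = M
    return alpha_0, beta_M

def next_iteration_init(alpha_at_M, beta_at_0):
    """(2.5)-(2.6): the next iteration starts from the previous iteration's
    boundary values (tail-biting wrap-around), which is why 2*L*|S| extra
    values must be stored between iterations."""
    return [row[:] for row in alpha_at_M], [row[:] for row in beta_at_0]
```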
2.4 Computing Minimum Distance
In this section we introduce an algorithm for computing the minimum distance d_min in the parallel architecture of turbo codes. The minimum distance of turbo codes directly influences the error-rate performance [13]. The bit error rate P_b(e) is upper-bounded as
P_b(e) ≤ (W / 2k) · erfc(√(d_min R_c E_b / N_0)) · e^{d_min R_c E_b / N_0} · ∂A^C(W, Z)/∂W |_{W = Z = e^{−R_c E_b / N_0}}.   (2.7)
Here W and Z are dummy variables, k is the block size, d_min is the minimum distance, R_c is the code rate and A^C(W, Z) is the input redundancy weight enumerating function representing the weight distribution of the turbo code. Thus the larger d_min is, the better the expected error-rate performance. If we check the d_min of candidate interleavers, we can avoid poor ones that give a small d_min.
The algorithm for finding d_min is based on the method proposed in [14], [15]. Since turbo codes are linear, the minimum codeword weight equals the minimum distance, so computing the minimum distance amounts to finding a minimum-weight codeword. Investigating all possible codewords would take a large amount of time; the following algorithm works efficiently by considering only the candidates likely to be minimum-weight codewords. The algorithm starts with the set of initial vectors {s^(0), s^(1), · · · , s^(N−1)} regarded as information bits, where
s^(j) = (s^(j)_i)_{i=0,1,··· ,N−1} and s^(j)_i is defined as

s^(j)_i = 0 if i < j;  1 if i = j;  −1 if j < i.   (2.8)
Here −1 denotes an undetermined bit. The algorithm determines each undetermined bit by investigating which of 0 or 1 results in the smaller codeword weight, in a manner similar to the Viterbi algorithm. For example, to determine the k-th element when the bits up to the (k − 1)-th position are already determined, compute the resulting codeword weight for the two cases where the k-th element is 0 and 1. The weight of the resulting codeword is computed as follows.
Step 1 Encode the given vector from bit index 0 to k − 1. Then compute the weight contribution of the remaining undetermined bits needed to terminate the trellis under the tail-biting constraint, for the cases where the k-th element is 0 and 1.
Step 2 Permute the given vector by the interleaver.
Step 3 Encode the permuted vector, determining the undetermined bits at the remaining places so as to yield the smallest weight.
Step 4 Compute the sum of the information weight, the first parity weight computed at Step 1 and the second parity weight computed at Step 3.
Step 5 Compare the values computed at Step 4 for the k-th element equal to 0 and 1, and set the k-th element to whichever results in the smaller weight.
Step 6 If the resulting weights of the two cases are the same, add both vectors to the list of candidates.
While carrying out the above process for all candidates, store the minimum-weight codewords and update d_min and its multiplicity N_d. To save computing time, a candidate is discarded from the list and the next candidate is processed as soon as the computed codeword weight exceeds the current estimate of d_min during the process of determining the undetermined bits.
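The initial candidate vectors of (2.8) are straightforward to build. The sketch below (a hypothetical helper, using −1 for undetermined positions as in the text) constructs them; the weight-comparison steps themselves depend on the constituent encoder and are omitted:

```python
def initial_vectors(N: int):
    """Construct s^(0), ..., s^(N-1) of (2.8): zeros before position j,
    a 1 at position j, and undetermined entries (-1) after it."""
    return [[0] * j + [1] + [-1] * (N - j - 1) for j in range(N)]
```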
2.5 Error-Rate Performance
In this section we present the error-rate performance when various parallelisms are applied to 3GPP standard turbo codes. Non-parallelism, 4-parallelism and 8-parallelism are considered with information block sizes 320 and 640, respectively. The tail-biting encoding and decoding methods of Sections 2.2 and 2.3 are applied. The degradation tends to be proportional to the degree of parallelism, as shown in Fig. 2.4 and Fig. 2.5, but the performance degradation due to the parallel structure is less than 0.1 dB at FER 10^−3 in both cases. In the case of 4-parallelism the error-rate performance is almost the same as non-parallelism. Note that if moderate parallelism (e.g., 4-parallelism) is applied to conventional non-parallel turbo codes, the performance degradation may be negligible, while the decoding delay is reduced by the degree of parallelism.
[Figure: BER and FER vs. E_b/N_0 (0.5–2.5 dB, error rates 10^0 down to 10^−7); K = 320; non-, 4- and 8-parallelism; Max. Iter. = 8; max-log-MAP]
Figure 2.4: Comparison BER and FER of block size 320 (with various parallelisms)
[Figure: BER and FER vs. E_b/N_0 (0.5–2 dB, error rates 10^0 down to 10^−7); K = 640; non-, 4- and 8-parallelism; Max. Iter. = 8; max-log-MAP]
Figure 2.5: Comparison BER and FER of block size 640 (with various parallelisms)
Chapter 3
Collision-free Interleavers
3.1 Collision-free
The constituent interleaver of parallel turbo codes must support parallel interleaving and parallel de-interleaving. This means that the processor operating on each subblock must be able to read bits in the permuted order without any memory collision [6]. If several processors attempt to read bits from the same memory bank, a collision occurs. Fig. 3.1 illustrates such a collision: the processors of the first and the third subblocks read bits in the second subblock simultaneously, so the two accesses can only be executed sequentially. Memory collisions thus introduce another delay problem: because we have to wait one or more clock cycles (according to the number of collisions) in order to write data, collisions delay the decoding process [6], [7]. Assume the block size is N, the parallelism is L, the maximum number of iterations is n, the interleaver is a uniform random interleaver and one clock cycle is needed to manage each memory collision. Then the average added decoding delay is n · (N/L) · (L^L − L!)/L^L clock cycles; in other words, the probability that a collision occurs at each access to the memory banks is (L^L − L!)/L^L.

[Figure: information bits mapped through the interleaver to permuted information bits, with two processors colliding on the same memory bank]
Figure 3.1: Memory collision in parallel turbo code interleaving

For example, the probability that a collision occurs is
90 percent with 4-parallelism and 99 percent with 8-parallelism. Clock delays due to collisions are relatively small compared with the whole decoding computation, less than 0.1 percent of the decoding time [8]. But since more clock delays occur at higher parallelism, memory collisions can hinder very fast decoding in highly parallel turbo codes. To avoid memory collisions, the processors must access bits in different subblocks at any given time. So we can define collision-freeness as in Definition 3.1.
Definition 3.1 Let the information block size be N, the number of subblocks be L and Π(·) be the constituent interleaver. Π(·) is collision-free if for all i ∈ {0, 1, · · · , N/L − 1} and all j, k ∈ {0, 1, · · · , L − 1} with j ≠ k,

⌊Π(i + (N/L) · j) / (N/L)⌋ ≠ ⌊Π(i + (N/L) · k) / (N/L)⌋.
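Definition 3.1 and the collision probability discussed above can both be checked numerically. The sketch below uses our own function names; `is_collision_free` tests the definition directly, and `collision_probability` evaluates (L^L − L!)/L^L:

```python
from math import factorial

def is_collision_free(pi, N, L):
    """Definition 3.1: at each tick i, the L processors must read from
    L distinct memory banks (subblocks of size M = N / L)."""
    M = N // L
    for i in range(M):
        banks = {pi[i + M * j] // M for j in range(L)}
        if len(banks) != L:
            return False
    return True

def collision_probability(L):
    """Probability that a uniformly random access pattern collides."""
    return (L ** L - factorial(L)) / L ** L
```

collision_probability(4) evaluates to 232/256 ≈ 0.906 and collision_probability(8) to about 0.998, consistent with the 90 and 99 percent figures above.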
3.2 Some Reviews
In this section we review the two collision-free interleavers proposed in [10] and [11]: the 2D interleaver and ARP, respectively.
3.2.1 2D Interleaver
A collision-free interleaver constituted by two permutations, called the temporal permutation and the spatial permutation, was proposed in [10]. First the temporal permutation permutes the bits within each subblock; then the spatial permutation permutes the bits among the subblocks. Let the number of information bits be K, the number of subblocks be L and the number of bits in each subblock be M. Then M = K/L, where L must divide K. The bit index k, where k ∈ {0, 1, · · · ,K − 1}, can be represented in a two-dimensional array with temporal index t and spatial index s via the relation k = s · M + t, where s ∈ {0, 1, · · · , L − 1} and t ∈ {0, 1, · · · ,M − 1}. Let the temporal permutation be denoted by ΠT (t, s) and the spatial permutation by ΠS(t, s). Then a collision-free 2D interleaver is defined as
Π(k) = Π(t, s) = ΠS(t, s) · M + ΠT (t, s). (3.1)
The bit of index k in the permuted order is read from the ΠT (t, s)-th position of the ΠS(t, s)-th subblock. In the parallel architecture, the permutation is completed by L processors, one allocated to each subblock, during M ticks, where a tick is the time for a processor to access or read one bit (or a soft decision value) from memory. For every processor to read bits from distinct subblocks, ΠS(t, s) must satisfy the condition that for every t ∈ {0, 1, · · · ,M − 1}, the values ΠS(t, 0), ΠS(t, 1), · · · , ΠS(t, L − 1) are in one-to-one correspondence with the subblocks 0, 1, · · · , L − 1. [10] defines ΠS(t, s) in a simple rotational form. As long as ΠS(t, s) satisfies the collision-free constraint, one can easily make a collision-free interleaver using any permutation as ΠT (t, s).
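As an illustration, the construction (3.1) can be sketched with the simple rotational spatial permutation ΠS(t, s) = (s + t) mod L; the exact rotational form used in [10] may differ, so treat this as an assumption of the sketch:

```python
def make_2d_interleaver(pi_T, L):
    """Collision-free 2D interleaver of (3.1): Pi(k) = Pi_S * M + Pi_T(t),
    with the rotational spatial permutation Pi_S(t, s) = (s + t) mod L.
    pi_T is any length-M temporal permutation applied inside subblocks."""
    M = len(pi_T)
    pi = [0] * (M * L)
    for s in range(L):
        for t in range(M):
            pi[s * M + t] = ((s + t) % L) * M + pi_T[t]
    return pi
```

Because (s + t) mod L runs over all subblocks for each fixed t, every tick reads from L distinct banks, so the result satisfies Definition 3.1 for any temporal permutation pi_T.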
3.2.2 Almost Regular Permutation
ARP was proposed in 2004 by C. Berrou, based on the relative prime interleaver [11]. Periodic fluctuation patterns are added as

Π(k) = (P · k + C · (α(k) · P + β(k)) + γ) (mod K). (3.2)

Here P is relatively prime to the information block size K, C is the period, α(k) and β(k) are nonnegative integer sequences of period C for 0 ≤ k ≤ K − 1, and γ is an initial offset. ARP shows an impressive performance improvement over the 3GPP interleaver, specifically 0.55 dB at FER 10^−5 with information block size 640 in an AWGN channel, where the simulation environment is given by 3GPP standard turbo codes with the max-log-MAP decoding algorithm and 8 iterations [17].
Although the performance of ARP is good, the optimizing process is complex: it requires an exhaustive search whenever the block size changes. For example, with 4-parallelism, i.e., C = 4, we must find 4 parameters. In [11], α repeats the pattern (0, 0, 1, 1) or (0, 1, 0, 1) and β repeats (β0 = 0, β1, β2, β3) with βi ∈ {0, 1, · · · , 8}. Thus for each candidate P we must investigate (8 choose 3) = 56 cases, assuming that γ and the α's are known and the βi's are distinct. Therefore the process of finding an optimal ARP over various information block sizes involves a huge amount of computing time and complexity.
Chapter 4
Proposed Collision-free Interleaver
4.1 Construction 1: Using 3GPP interleaver and Latin square
4.1.1 Latin square type interleaver
We define a collision-free interleaver by rewriting the spatial permutation in matrix form as
Π(k) = Π(s · M + t) = uts · M + ΠT (t). (4.1)
Here the M by L matrix U = {u_ts} indicates the mapping among subblocks, where t is the temporal index and s the spatial index. With L-parallelism, at the t-th tick the s-th processor reads the (u_ts · M + ΠT (t))-th bit (or the soft decision value). To avoid memory collisions, each row of U must be a permutation of the subblock indices 0, 1, · · · , L − 1. To define a collision-free interleaver easily for the various possible block sizes, we use a pre-structured interleaver as the temporal permutation ΠT (t); for example, the 3GPP standard interleaver is defined for block sizes from 40 to 5114 [16]. Since ΠT (t) is given, the optimizing process reduces to deciding the mapping matrix U, i.e., finding M permutations of {0, 1, · · · , L − 1}. One way to choose a permutation is to generate it randomly, but the collision-free interleaver made from a randomly generated U turned out not to be good enough: it only gives the average performance over all possible cases.
We must avoid bad patterns among the possible permutation patterns. One cause of performance degradation is the spreading characteristic of bits belonging to the same subblock before and after permuting. If two bits in the same subblock remain in the same subblock after permuting, as shown in Fig. 4.1, those bits form a cycle which restricts the propagation of messages in the iterative decoding process. Since every pair of bits of the form of Fig. 4.1 makes a cycle, L · (M/L choose 2) such cycles exist in each subblock when L divides M. If two bits in the same subblock are permuted to different subblocks, as shown in Fig. 4.2, or conversely two bits in different subblocks are permuted to the same subblock, they do not form a cycle. To decrease the number of cycles of the type shown in Fig. 4.1, we must move the bits of each subblock to different subblocks as much as possible; that is, the multiplicity of each subblock index in every column vector of U must be nearly or exactly M/L.
Figure 4.1: Intra subblock permutation
A Latin square is an L by L matrix over an alphabet of size L in which every row and every column is a permutation of the L symbols [19]. We define the matrix U as the columnwise repetition of an L by L Latin square and call this the Latin-square-structured U. The repetition guarantees that the multiplicity of each subblock index in every column vector of U is uniform, and the cycle length is guaranteed to be at least L + 1, where the cycle length is defined as |i − j| + |Π(i) − Π(j)| in Fig. 4.1.
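Construction 1 can be sketched as follows. U is the columnwise repetition of an L by L Latin square, so row t of U is row t mod L of the square; the temporal permutation pi_T below is an arbitrary stand-in for the 3GPP interleaver used in the text:

```python
def latin_u_interleaver(pi_T, latin, L):
    """Eq. (4.1): Pi(s*M + t) = u_ts * M + Pi_T(t), where the M x L
    mapping matrix U repeats an L x L Latin square columnwise."""
    M = len(pi_T)
    pi = [0] * (M * L)
    for s in range(L):
        for t in range(M):
            u_ts = latin[t % L][s]  # row t of U = row (t mod L) of the square
            pi[s * M + t] = u_ts * M + pi_T[t]
    return pi

# The "good spreading" 4 x 4 Latin square of Fig. 4.3(b):
GOOD = [[0, 1, 2, 3],
        [1, 0, 3, 2],
        [2, 3, 1, 0],
        [3, 2, 0, 1]]
```

Each row of the Latin square is a permutation of the subblock indices, so every tick reads from L distinct banks, and the columnwise repetition makes the multiplicity of each subblock index in every column of U exactly M/L.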
The Latin square structure also has an advantage in complexity. First, we only need to store the L by L matrix instead of the whole M by L matrix; for example, if K = 640 and L = 4, we store 4 · 4 elements instead of 160 · 4 elements. Second, the optimizing process has lower complexity. In the case of L = 4, the first row of U is initialized to (0, 1, 2, 3); the interleavers resulting from other initializations can be derived by permuting the subblock labels. Thus we only need to investigate 24 cases, instead of the 576 cases corresponding to all possible 4 by 4 Latin squares. Furthermore, we can reduce this number to 12 by picking out good ones among them. A criterion is the distribution of the two-tuple patterns in the column vectors of U. A bad case and a good case are shown in Fig. 4.3.

Figure 4.2: Inter subblock permutation

(a) Bad spreading:
0 1 2 3
1 0 3 2
2 3 0 1
3 2 1 0

(b) Good spreading:
0 1 2 3
1 0 3 2
2 3 1 0
3 2 0 1

Figure 4.3: Comparison of column-wise 2-tuple patterns of 4 by 4 Latin squares
Since each column vector gives the subblock permuting pattern of one decoding block, the distribution of the patterns in each column vector directly influences the performance. For example, observe the consecutive two-tuple patterns (0, y) along the (circular) columns, where y ∈ {1, 2, 3}. Fig. 4.3(a) contains only the patterns (0, 1) and (0, 3), twice each, while Fig. 4.3(b) contains (0, 2) and (0, 3) once and (0, 1) twice. Therefore, Fig. 4.3(b) is expected to give a better performance than Fig. 4.3(a), since its distribution is closer to uniform. The ideal distribution would have (0, 1), (0, 2) and (0, 3) occur exactly once and, furthermore, every pattern (x, y) with x, y ∈ {0, 1, 2, 3} and x ≠ y occur exactly once, which is equivalent to a circular Tuscan array [18], [20].
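The enumeration of the 24 first-row-fixed Latin squares and the circular 2-tuple statistic can be checked mechanically. The sketch below is ours (function names are assumptions, not the thesis's notation); it reproduces the (0, y) counts quoted above for the squares of Fig. 4.3:

```python
from itertools import permutations
from collections import Counter

def latin_squares_first_row_fixed(L=4):
    # Enumerate L x L Latin squares whose first row is (0, 1, ..., L-1).
    squares = []
    def extend(rows):
        if len(rows) == L:
            squares.append(tuple(rows))
            return
        for p in permutations(range(L)):
            # A new row may not repeat any symbol within a column.
            if all(p[c] != r[c] for r in rows for c in range(L)):
                extend(rows + [p])
    extend([tuple(range(L))])
    return squares

def circular_column_pairs(square):
    # Consecutive 2-tuples read down each column, wrapping around (circular).
    L = len(square)
    pairs = []
    for s in range(L):
        col = [square[t][s] for t in range(L)]
        pairs += [(col[t], col[(t + 1) % L]) for t in range(L)]
    return pairs
```

Running this confirms there are exactly 24 candidates for L = 4, that Fig. 4.3(a) has the pairs (0, 1) and (0, 3) twice each with (0, 2) absent, and that Fig. 4.3(b) has (0, 2) and (0, 3) once and (0, 1) twice.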
By this criterion we can divide the 24 cases into two groups, good and bad, of 12 cases each. The performance comparison of these two groups is shown in Fig. 4.4. We use the 3GPP interleaver of size 160 as the temporal interleaver. The simulation environment is the 3GPP standard turbo code with information block size 640 and parallelism 4. The decoding algorithm is max-log-MAP, and the maximum number of iterations is 8 with the Genie stopping rule, i.e., the iterations are stopped when no information bits are in error. The simulation environment is summarized in Table 4.1. The FER curves of the good group and of the bad group are clearly separated by the curve of the 3GPP interleaver of size 640 in Fig. 4.4.
The minimum distances of the 24 cases, computed by the algorithm described in section 2.4, are listed in Table 4.2 for block sizes 320 and 640, where dmin is the minimum distance, Nd is the number of codewords whose weight is dmin, and w is the sum of the weights of all minimum-weight codewords. U(good)i denotes a case with good spreading property, and U(bad)i the converse. dmin and Nd directly influence the upper bound on FER, and w also influences the upper bound on BER [13]. For block size 640, dmin is 28 in all cases; for block size 320, dmin is 28 or 29.

Table 4.1: Simulation environment
Decoding algorithm: Max-log-MAP algorithm
Maximum iteration: 8
Stopping criterion: Genie
Information block size: 640
Parallelism: 4
Temporal interleaver: 3GPP standard interleaver of size 160
Code rate: 1/3
Modulation: BPSK
Channel model: AWGN

The minimum-weight parameters (dmin, Nd, w) of the 12 good cases picked out of the 24 by the criterion of section 4.1.1 tend to be better than those of the bad cases, but the two groups are not clearly separated by these parameters. That is, dmin and its multiplicity are not the dominant factors affecting FER performance; the spreading property of the spatial permutation matters more, as shown in Fig. 4.4. We can, however, eliminate from the 12 good cases any candidate that yields a smaller dmin than the others. In fact, for block size 320, U(good)7 and U(good)9 are left out.
[FER vs. Eb/N0 plot, 1.4–2.1 dB, error rates 10−5 to 10−2; K=640, L=4, Max.Iter.=8, max-log-MAP; curves: bad group elements, 3GPP (FER), good group elements]

Figure 4.4: Comparison of FER performance between good and bad groups
Table 4.2: Minimum distances of the 24 Latin square types for block sizes 320 and 640. Rows of each Latin square are separated by semicolons; (dmin, Nd, w) is given for block size 640 and for block size 320.

U(bad)1   = [0 1 2 3; 1 0 3 2; 2 3 0 1; 3 2 1 0]   640: (28,3,12)   320: (28,2,8)
U(good)2  = [0 1 2 3; 1 0 3 2; 2 3 1 0; 3 2 0 1]   640: (28,2,8)    320: (29,2,6)
U(good)3  = [0 1 2 3; 1 0 3 2; 3 2 0 1; 2 3 1 0]   640: (28,2,8)    320: (29,1,3)
U(bad)4   = [0 1 2 3; 1 0 3 2; 3 2 1 0; 2 3 0 1]   640: (28,1,4)    320: (29,1,3)
U(bad)5   = [0 1 2 3; 1 2 3 0; 2 3 0 1; 3 0 1 2]   640: (28,4,16)   320: (28,1,4)
U(good)6  = [0 1 2 3; 1 2 3 0; 3 0 1 2; 2 3 0 1]   640: (28,1,4)    320: (29,2,6)
U(good)7  = [0 1 2 3; 1 3 0 2; 2 0 3 1; 3 2 1 0]   640: (28,3,12)   320: (28,2,8)
U(bad)8   = [0 1 2 3; 1 3 0 2; 3 2 1 0; 2 0 3 1]   640: (28,3,12)   320: (29,1,3)
U(good)9  = [0 1 2 3; 2 0 3 1; 1 3 0 2; 3 2 1 0]   640: (28,5,20)   320: (28,2,8)
U(bad)10  = [0 1 2 3; 2 0 3 1; 3 2 1 0; 1 3 0 2]   640: (28,2,8)    320: (29,2,6)
U(bad)11  = [0 1 2 3; 2 3 0 1; 1 0 3 2; 3 2 1 0]   640: (28,5,20)   320: (28,2,8)
U(good)12 = [0 1 2 3; 2 3 0 1; 1 2 3 0; 3 0 1 2]   640: (28,4,16)   320: (29,2,6)
U(good)13 = [0 1 2 3; 2 3 0 1; 3 0 1 2; 1 2 3 0]   640: (28,4,16)   320: (29,1,3)
U(bad)14  = [0 1 2 3; 2 3 0 1; 3 2 1 0; 1 0 3 2]   640: (28,3,12)   320: (29,2,6)
U(bad)15  = [0 1 2 3; 2 3 1 0; 1 0 3 2; 3 2 0 1]   640: (28,6,24)   320: (28,1,4)
U(good)16 = [0 1 2 3; 2 3 1 0; 3 2 0 1; 1 0 3 2]   640: (28,1,4)    320: (29,2,6)
U(good)17 = [0 1 2 3; 3 0 1 2; 1 2 3 0; 2 3 0 1]   640: (28,4,16)   320: (29,1,3)
U(bad)18  = [0 1 2 3; 3 0 1 2; 2 3 0 1; 1 2 3 0]   640: (28,4,16)   320: (28,1,4)
U(bad)19  = [0 1 2 3; 3 2 0 1; 1 0 3 2; 2 3 1 0]   640: (28,4,16)   320: (28,1,4)
U(good)20 = [0 1 2 3; 3 2 0 1; 2 3 1 0; 1 0 3 2]   640: (28,6,24)   320: (29,2,6)
U(bad)21  = [0 1 2 3; 3 2 1 0; 1 0 3 2; 2 3 0 1]   640: (28,4,16)   320: (29,1,3)
U(good)22 = [0 1 2 3; 3 2 1 0; 1 3 0 2; 2 0 3 1]   640: (28,4,16)   320: (29,2,6)
U(good)23 = [0 1 2 3; 3 2 1 0; 2 0 3 1; 1 3 0 2]   640: (28,4,16)   320: (29,1,3)
U(bad)24  = [0 1 2 3; 3 2 1 0; 2 3 0 1; 1 0 3 2]   640: (28,4,16)   320: (29,2,6)
4.1.2 Example
One of the merits of (4.1) is that we can exploit the spreading features of the short interleaver ΠT. After permuting, bits that were in the same subblock before permuting and remain in the same subblock afterwards are separated by the same distance as defined in ΠT. Fig. 4.5 shows an example of the proposed interleaver, where K = 18, L = 3, M = 6, and ΠT = (2, 0, 5, 3, 4, 1). The 6 by 3 mapping matrix U is the columnwise repetition, twice, of the Latin square

0 1 2
2 0 1
1 2 0     (4.2)
The permuted bits in Fig. 4.5(f) inherit the spreading feature of the interleaver ΠT. Each subblock contains 2 bits interleaved from the same subblock. Those bits are separated by the distance defined by ΠT, and their positions in the subblock are also the same as defined by ΠT. For example, the second and the third bits of the first subblock are permuted to the first and the fourth positions of the first subblock, respectively. For a good interleaver, the bits in the same subblock before permuting must be scattered as far as possible after permuting, as discussed in section 4.1.1. Since the bits in the same subblock before permuting are related to one another in the first decoding process, to gain as much information as possible in the second decoding we must scatter those related bits as far as possible. The proposed scheme exploits the spreading feature of the short interleaver ΠT in both the relative and the absolute position-scattering patterns.
[Fig. 4.5 shows the interleaver state at each tick 0 through 5 in panels (a)–(f); panel (f) gives the final permuted sequence read against positions 0–5 of each of the three subblocks.]
Figure 4.5: Example of the proposed interleaver of size 18
4.1.3 Simulation Results and Discussions
In this section we compare the performance of the proposed interleaver with the ARP and 3GPP standard interleavers for K = 320 and 640, with L = 4 in both cases. The parameters of ARP in [17] are defined as follows: α repeats (0, 0, 1, 1), β repeats (0, β1, β2, β3), and γ = 3; P = 197, β1 = 2, β2 = 5, β3 = 3 for information block size 320, and P = 201, β1 = 6, β2 = 3, β3 = 1 for information block size 640. In the proposed interleaver we use the 3GPP interleaver of size 80 or 160 as the temporal permutation, and the Latin square structured U matrix is given as
U1 =
0 1 2 3
1 0 3 2
3 2 0 1
2 3 1 0

U2 =
0 1 2 3
2 3 0 1
1 2 3 0
3 0 1 2     (4.3)
Among the 12 candidates, U1 shows the best performance for K = 320 and U2 for K = 640. The minimum distance and the other weight parameters are shown in Table 4.3. Fig. 4.6 and Fig. 4.7 show the BER and FER curves versus Eb/N0 for K = 320 and K = 640, respectively. The other simulation conditions are the same as those in Table 4.1.

Table 4.3: Weight parameters of the Latin square type
Interleaver | Block size 320 | Block size 640
3GPP standard interleaver | (25,1,3) | (30,1,2)
ARP | (33,2,6) | (30,1,6)
Latin square type | (29,1,3) | (28,4,16)

[BER and FER vs. Eb/N0 plot, 0.5–2.5 dB, error rates 10−8 to 100; K=320, L=4, Max.Iter.=8, max-log-MAP; curves: 3GPP, ARP, Latin square type]

Figure 4.6: Comparison of BER and FER for block size 320 (Latin square type)

The proposed interleaver shows almost the same performance as ARP for K = 320 in Fig. 4.6. There is a 0.09 dB degradation at FER 10−5 and a 0.06 dB degradation at BER 10−7 in Fig. 4.7.
One reason for the remaining performance gap is the regular structure of the U matrix, so we add irregularity to the Latin square structured U. We generate M permutations randomly as the row vectors of U, with the constraint that for s ∈ {0, 1, · · · , L − 1}, t ∈ {σ − 1, σ, · · · , M − 1} and σ ≤ L, the σ consecutive elements of each column vector, u_{t,s}, u_{t−1,s}, · · · , u_{t−σ+1,s}, are all distinct; σ = L corresponds to the Latin square structure. We call this the semi-Latin square structure; a flowchart for constructing it is shown in Fig. 4.8. The semi-Latin square structure may break the uniform distribution of subblock indices, but the irregularity can improve the error-rate performance. Fig. 4.7 shows the performance of the semi-Latin square structure with σ = 3: it is almost the same as ARP. The minimum distance of the semi-Latin type interleaver used in the simulation is 30, the same as ARP.

Note that the proposed interleaver performs almost the same as ARP while allowing a near-optimized interleaver to be found with much lower optimization complexity. Furthermore, we can make a collision-free interleaver of any length between 40 · L and 5114 · L by defining a single L by L mapping matrix U. One may use any pre-structured interleaver in the design of the proposed construction of interleavers.
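The semi-Latin row generation (random rows whose σ consecutive column entries are distinct) can be sketched as follows. This is a minimal sketch; the function name, seed, and restart policy are ours, not the thesis's:

```python
import random

def semi_latin_U(M, L, sigma, seed=0, max_tries=10000):
    # Rows are random permutations of {0, ..., L-1}; within each column,
    # any sigma consecutive entries must be distinct.  sigma = L gives the
    # Latin square structure; sigma < L relaxes it (semi-Latin).
    rng = random.Random(seed)
    U = []
    while len(U) < M:
        for _ in range(max_tries):
            row = rng.sample(range(L), L)
            recent = U[-(sigma - 1):] if sigma > 1 else []
            # New row must differ columnwise from the last sigma-1 rows.
            if all(all(r[s] != row[s] for r in recent) for s in range(L)):
                U.append(row)
                break
        else:
            U = []  # dead end: restart from scratch
    return U
```

By induction, checking only the previous σ − 1 rows is enough: those rows already satisfy the constraint among themselves, so adding a columnwise-distinct row keeps every window of σ consecutive column entries distinct.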
[BER and FER vs. Eb/N0 plot, 0.5–2 dB, error rates 10−8 to 100; K=640, L=4, Max.Iter.=8, max-log-MAP; curves: 3GPP, ARP, Latin square type, semi-Latin square type]

Figure 4.7: Comparison of BER and FER for block size 640 (Latin square type and semi-Latin square type)
[Flowchart: initialize; generate a candidate row as a random permutation; store the row if the σ consecutive column entries are all different, otherwise reject it; repeat until all M rows are stored.]

Figure 4.8: Flowchart of making the semi-Latin structure
4.2 Construction 2: Using 3GPP interleaver and Kasami sequence set
4.2.1 Kasami type 1 interleaver
In section 4.1 we saw that irregularity in the spatial permutation can improve the error-rate performance. In this section we define the spatial permutation using the Kasami small sequence set. Kasami small sequence sets have parameters (2^n − 1, 2^{n/2}, 2^{n/2} + 1): the period is 2^n − 1, there are 2^{n/2} sequences in the set, and the maximum absolute value of the correlation is 2^{n/2} + 1 [21]. We use the following notation: n = 2m, q = 2^n, α is a primitive element of F_{2^n}, v = 2^m − 1 and d = 2^m + 1. Then let s_λ = {s_{λ,i}} be a binary sequence whose elements are given by

s_{λ,i} = f_λ(α^i), i = 0, 1, · · · , (4.4)

where

f_λ(x) = Tr^m_1(Tr^n_m(x^2) + λx^d), λ ∈ F_{2^m}, x ∈ F_{2^n}. (4.5)
A sequence set S consists of s_λ for all λ ∈ F_{2^m}; that is,

S = {s_λ : λ ∈ F_{2^m}}. (4.6)

Then we make a hopping sequence set H = {h_λ} from the Kasami small sequence set S by converting m consecutive binary bits to a decimal number, so that h_{λ,i} can be represented as

h_{λ,i} = s_{λ,i} · 2^{m−1} + s_{λ,i+1} · 2^{m−2} + · · · + s_{λ,i+m−1}. (4.7)

We introduce the following lemma and theorem to show that the hopping sequence set H can be used for the spatial permutation. Lemma 4.1 states one of the well-known properties of M-sequences [22].
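The m-bit window conversion of (4.7) can be sketched as follows; the function name is ours, and the window is taken cyclically since s_λ is periodic:

```python
def hopping_sequence(s, m):
    # h_i = s_i * 2^(m-1) + s_(i+1) * 2^(m-2) + ... + s_(i+m-1)  (eq. 4.7);
    # indices wrap around because the underlying sequence is periodic.
    N = len(s)
    return [sum(s[(i + k) % N] << (m - 1 - k) for k in range(m))
            for i in range(N)]
```

For example, the binary sequence (0, 1, 1, 0) with m = 2 yields the hopping values (1, 3, 2, 0).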
Lemma 4.1 (Window Property) Let W be the set of all nonzero n-tuples over GF(p), and let B = {b(j)} be an M-sequence of length q = p^n − 1 over GF(p). Then for each w ∈ W there exists a unique index j, 0 ≤ j < q, such that w = (b(j), b(j + 1), · · · , b(j + n − 1)).
The proposed Kasami type 1 interleaver is based on the following theorem.
Theorem 4.1 Let S be the Kasami small sequence set defined by (4.6) and H the hopping sequence set defined by (4.7). Then, for any two distinct λ, λ′ ∈ F_{2^m}, the hopping sequences differ in every position, i.e., h_{λ,i} ≠ h_{λ′,i} for all i = 0, 1, · · · .

Proof: From (4.4) and (4.5),

s_{λ,i} = f_λ(α^i) (4.8)
        = Tr^m_1(Tr^n_m(α^{2i}) + λα^{di}) (4.9)
        = Tr^n_1(α^{2i}) + Tr^m_1(λα^{di}). (4.10)

Let β = α^d; then ord(β) = 2^m − 1, so any nonzero λ can be represented as a power of β. Let λ = β^r, r = 0, 1, · · · , 2^m − 2. Thus,

s_{λ,i} = Tr^n_1(α^{2i}) + Tr^m_1(β^{i+r}). (4.11)

That is, s_{λ,i} is the modulo-2 sum of the i-th element of an M-sequence of length 2^n − 1 and the i-th element of an r-shifted M-sequence of length 2^m − 1. Therefore the successive m-tuples of s_λ and s_{λ′} are distinct by Lemma 4.1, where λ ≠ λ′.
Set each column vector of the M by L matrix U = {u_{t,s}} in equation (4.1) to a hopping sequence h_λ defined in (4.7): let u_{t,s} = h_{λ_s,t}, where λ_s ∈ F_{2^m} and s = 0, 1, · · · , L − 1 (L = 2^m). The parameters of the Kasami small sequence set depend only on n, but in interleaver design we must handle various combinations of parallelism and block size. Thus we relax the assumption 'm = n/2' to 'm divides n' and replace 'd = 2^m + 1' by 'd = (2^n − 1)/(2^m − 1)'. This gives more degrees of freedom in length and parallelism.
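The generalized parameters can be computed directly; the helper below is ours, with d = (2^n − 1)/(2^m − 1) as stated above:

```python
def kasami_params(n, m):
    # Generalized Kasami-type setting: m divides n, d = (2^n - 1)/(2^m - 1).
    assert n % m == 0
    period = 2**n - 1            # period of the underlying M-sequence
    subperiod = 2**m - 1         # period of the short M-sequence
    d = period // subperiod      # decimation exponent
    return period, subperiod, d
```

Note that for n = 2m this recovers the original decimation, since (2^n − 1)/(2^m − 1) = 2^m + 1; e.g., n = 4, m = 2 gives d = 5, while n = 8, m = 2 gives d = 85.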
4.2.2 Kasami type 2 interleaver
Runs of various lengths are desirable from the standpoint of randomness, where a run means a repetition of an identical symbol or bit. But the run property of the hopping sequence h_λ can create short cycles of the kind shown in Fig. 4.1. A direct application of Kasami sequences to the spatial permutation therefore produces so many short cycles that the error-rate performance degrades. We thus propose to extract only the entries of h_λ that do not form runs. The following pseudo code generates a new hopping sequence set H^(2) that has no runs, at the cost of a shorter length. We must choose n large enough that the length of the extracted hopping sequence exceeds the desired length.
for each λ ∈ F_{2^m}:
    h^(2)_{λ,0} = h_{λ,0};
count = 0;
for i = 1, 2, · · · , 2^n − 1:
    if h_{λ,i} ≠ h^(2)_{λ,count} for all λ ∈ F_{2^m}:
        store h_{λ,i} in h^(2)_{λ,count+1} for every λ;
        count++;
    else
        continue;

Figure 4.9: Pseudo code for extracting a hopping sequence in Kasami type 2
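A direct Python rendering of the extraction above can look as follows; we represent H as a list of equal-length sequences (one per λ), and the function name is ours:

```python
def extract_no_runs(H):
    # H: list of equal-length hopping sequences (one per lambda).  Keep the
    # i-th entries only if, for EVERY sequence, the new value differs from
    # the last value kept for that sequence -- so no extracted sequence
    # contains a run, matching the pseudo code of Fig. 4.9.
    out = [[seq[0]] for seq in H]
    for i in range(1, len(H[0])):
        if all(seq[i] != kept[-1] for seq, kept in zip(H, out)):
            for seq, kept in zip(H, out):
                kept.append(seq[i])
    return out
```

The decision is joint across λ: a column of entries is dropped as soon as any single sequence would repeat its previous kept value, which is why the extracted length shrinks and n must be chosen large enough.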
4.2.3 Simulation Results and Discussions
In this section we compare the BER and FER performance of the 3GPP standard, ARP, Kasami type 1 and Kasami type 2 interleavers for block sizes 320 and 640 with parallelism 4. The other simulation conditions are the same as in Table 4.1. We set n = 8, m = 2 for Kasami type 1, and n = 10, m = 2 and n = 12, m = 2 for two Kasami type 2 interleavers. The weight parameters are shown in Table 4.4.

Table 4.4: Weight parameters of Kasami type 1 and type 2
Interleaver | Block size 320 | Block size 640
3GPP standard interleaver | (25,1,3) | (30,1,2)
ARP | (33,2,6) | (30,1,6)
Kasami type 1 (n = 8) | (18,1,2) | (20,1,4)
Kasami type 2 (n = 10) | (17,1,3) | (24,1,4)
Kasami type 2 (n = 12) | (22,3,6) | (26,2,4)

For block size 320, all Kasami types are worse than even the 3GPP standard, as shown in Fig. 4.10: the FER of Kasami type 1 is 0.35 dB worse than the 3GPP standard in the 10−4 region, Kasami type 2 with n = 10 is 0.2 dB worse, and Kasami type 2 with n = 12 is 0.15 dB worse in the same FER region. For block size 640, the FER of Kasami type 1 is 0.1 dB worse than the 3GPP standard in the 10−4 region, as shown in Fig. 4.11; the FER of Kasami type 2 with n = 10 is almost the same as ARP in the FER 10−5 region, and Kasami type 2 with n = 12 is slightly better than ARP, by 0.03 dB, in the same region.

As analyzed in section 4.1.1, long runs in the spatial permutation degrade the error-rate performance. Thus Kasami type 2, which has no runs, performs better than Kasami type 1. Moreover, a larger n gives slightly better FER performance than a smaller n for Kasami type 2, for both block sizes 320 and 640.
[BER and FER vs. Eb/N0 plot, 0.5–2.5 dB, error rates 10−8 to 100; K=320, L=4, Max.Iter.=8, max-log-MAP; curves: 3GPP, ARP, Kasami type 1 (n=8, m=2), Kasami type 2 (n=10, m=2), Kasami type 2 (n=12, m=2)]

Figure 4.10: Comparison of BER and FER for block size 320 (Kasami type)
[BER and FER vs. Eb/N0 plot, 0.5–2 dB, error rates 10−8 to 100; K=640, L=4, Max.Iter.=8, max-log-MAP; curves: 3GPP, ARP, Kasami type 1 (n=8, m=2), Kasami type 2 (n=10, m=2), Kasami type 2 (n=12, m=2)]

Figure 4.11: Comparison of BER and FER for block size 640 (Kasami type)
4.3 Construction 3: Using S-random interleaver and Latin square
4.3.1 Collision-free S-random interleaver
In this section we propose another collision-free interleaver based on the S-random interleaver. The design of an S-random interleaver guarantees that if two input bits to the interleaver Π are within distance S, they cannot be mapped to a distance less than S apart at the interleaver output [23]. An S-random interleaver can be made by generating random integers i, 1 ≤ i ≤ K [24]: each randomly selected integer is compared to the S previously selected integers, and if it lies within a distance of ±S of any of them, the current selection is rejected. This process is repeated until all K integers are selected. So, for any two indices i, j such that

0 < |i − j| ≤ S, (4.12)

the design imposes

|Π(i) − Π(j)| > S. (4.13)

We call constraints (4.12) and (4.13) the S-constraint. The S-random design tends to increase the length of the cycles defined in section 4.1.1, and it can introduce more irregularity than the proposed interleaver of section 4.1.1.
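The rejection procedure described above can be sketched as a greedy construction with restarts. This is a minimal sketch, not the thesis's implementation; the function name, seed, and restart limit are ours:

```python
import random

def s_random_interleaver(K, S, max_restarts=2000, seed=1):
    # At each output position, pick a not-yet-used value differing by more
    # than S from each of the S previously placed values; restart with a
    # fresh shuffle if the greedy choice gets stuck.
    rng = random.Random(seed)
    for _ in range(max_restarts):
        remaining = list(range(K))
        rng.shuffle(remaining)
        perm = []
        while remaining:
            for idx, v in enumerate(remaining):
                if all(abs(v - u) > S for u in perm[-S:]):
                    perm.append(remaining.pop(idx))
                    break
            else:
                break  # stuck: restart with a new shuffle
        if len(perm) == K:
            return perm
    raise RuntimeError("no S-random interleaver found; try a smaller S")
```

Checking a candidate against only the last S placed values is exactly the S-constraint: for output positions i < j with j − i ≤ S, the value at j was tested against the value at i when it was placed.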
We can consider the following two methods for applying the S-constraint to a collision-free interleaver.

1) Method 1

• Use an S-random interleaver of size M as the temporal permutation.

Method 1 differs from the interleaver of section 4.1.1 only in using an S-random interleaver as the temporal permutation instead of the 3GPP standard interleaver.
2) Method 2

• Make the interleaver of size K directly in accordance with the S-constraint.

First, fix the Latin square structured spatial permutation; this determines which subblock each SISO module accesses at each step. Next, decide which position within the subblock is accessed at each step, subject to the S-constraint. Since we consider the parallel structure, the S-constraint has to be modified: apply (4.12), (4.13) only if ⌊i/M⌋ = ⌊j/M⌋ and ⌊Π(i)/M⌋ = ⌊Π(j)/M⌋, because only pairs of bits of the type shown in Fig. 4.1 make cycles and affect the performance. We call this the modified S-constraint. The search time of this algorithm increases with S, and it is not guaranteed to finish successfully; however, choosing S < √(K/2) usually produces a realization in reasonable time, and the larger S is, the better the error-rate performance tends to be.
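The modified S-constraint can be verified mechanically; the checker below is our formalization of the condition just stated (the function name is an assumption):

```python
def satisfies_modified_s_constraint(perm, M, S):
    # Apply (4.12)-(4.13) only to index pairs i, j that lie in the same
    # subblock both before (i//M == j//M) and after
    # (perm[i]//M == perm[j]//M) permuting -- the Fig. 4.1 pairs that
    # create cycles.  M is the subblock size.
    K = len(perm)
    for i in range(K):
        for j in range(i + 1, min(i + S + 1, K)):
            if i // M == j // M and perm[i] // M == perm[j] // M:
                if abs(perm[i] - perm[j]) <= S:
                    return False
    return True
```

For instance, with K = 8, M = 4 and S = 1, the identity permutation fails (adjacent indices stay adjacent inside a subblock), while the permutation (0, 2, 4, 6, 1, 3, 5, 7) passes.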
4.3.2 Simulation Results and Discussions
We compare the two methods of making a collision-free S-random interleaver against ARP. The simulation conditions are given in Table 4.5. Since the S-random interleaver is not deterministic, we generate 100 realizations of each. The collision-free S-random interleaver made by Method 1 shows the worst performance, as seen in Fig. 4.12: it is 0.3 dB worse than ARP at FER 10−5. The collision-free S-random interleaver made by Method 2 performs better than ARP, by 0.1 dB at FER 10−5.
Good spreading features of the temporal permutation alone do not strongly carry over to the spreading features of the whole interleaver, so the FER performance of Method 1 is not improved and is even worsened. Method 2, which generates the random numbers under a constraint that maintains the spreading features of the whole interleaver within the designed S value, shows better FER performance.
Table 4.5: Simulation environment of Method 1 and Method 2
Decoding algorithm: Max-log-MAP algorithm
Maximum iteration: 8
Stopping criterion: Genie
Information block size: 640
Parallelism: 4
Temporal permutation: (Method 1) S-random interleaver of size 160 with S = 9; (Method 2) random numbers satisfying the modified S-constraint with S = 15
Spatial permutation: columnwise repetition of
0 1 2 3
2 3 0 1
3 0 1 2
1 2 3 0
Number of interleaver realizations: 100
Code rate: 1/3
Modulation: BPSK
Channel model: AWGN
[BER and FER vs. Eb/N0 plot, 0.5–2 dB, error rates 10−8 to 100; K=640, L=4, Max.Iter.=8, max-log-MAP; curves: ARP, S-random (Method 1), S-random (Method 2)]

Figure 4.12: Comparison of BER and FER for block size 640 (proposed S-random interleaver)
Chapter 5
Concluding Remarks
5.1 Summary
In this thesis, we have proposed collision-free interleavers composed of a temporal permutation and a spatial permutation, so that memory collisions are avoided.

First, we proposed to use a pre-structured interleaver as the temporal permutation and a Latin square structured spatial permutation, so as to reduce the complexity of the optimizing process. We call this the Latin square type. Interleavers of various block sizes can be made by defining a single mapping matrix, and the optimizing process is much less complex than that of ARP: in the case of parallelism 4, only 12 cases need to be investigated, regardless of block size. The proposed interleaver performs almost the same as ARP. Irregularity can be added as proposed in this thesis; the semi-Latin square type or the Kasami type can be used for slightly better performance.

Finally, we proposed a collision-free S-random interleaver. It also uses the temporal/spatial permutation structure, with the Latin square structure as the spatial permutation, but its temporal permutation is generated randomly under the modified S-constraint introduced in this thesis. It performs 0.1 dB better than ARP at FER 10−5 for information block size 640 and parallelism 4.
5.2 Future Directions
We have proposed two kinds of collision-free interleavers, deterministic and nondeterministic: the Latin square type and the Kasami type are deterministic, while the semi-Latin type and the collision-free S-random interleaver are nondeterministic. The following problems remain for future research.

• For the deterministic interleavers, although the optimizing process is simple, the regular structure can limit the error-rate performance. The ways of adding irregular patterns need further study.

• For the nondeterministic interleavers, especially the S-random interleaver, much research has been done on prunable S-random interleavers, which can serve various block sizes by pruning once a single prunable interleaver is designed. We therefore need to study pruning techniques for the proposed collision-free S-random interleaver.

• For finding L by L Latin squares with good spreading property when L is large, that is, for higher parallelism, we need an algorithm that can always generate a Latin square with good spreading property.
Bibliography
[1] C. Berrou, A. Glavieux and P. Thitimajshima, “Near Shannon limit error-correcting
coding and decoding: turbo-codes”, Proc. of IEEE ICC’93, Geneva, pp. 1064-1070,
May 1993.
[2] L. Bahl, J. Cocke, F. Jelinek, and J. Raviv, “Optimal decoding of linear codes for minimizing symbol error rate,” IEEE Transactions on Information Theory, Vol. 20, pp. 284-287, Mar. 1974.
[3] C. Weiß, C. Bettstetter, and S. Riedel, “Code Construction and Decoding of Parallel
Concatenated Tail-Biting Codes,” IEEE Transactions on Information Theory, vol. 47,
pp. 368-388, Jan. 2001
[4] J. B. Anderson and M. Hladik, “Tailbiting MAP decoders,” IEEE J. Select. Areas
Commun., vol. 16, pp. 297-302, Feb. 1998.
[5] J. Hsu and C. Wang., “A parallel decoding scheme for turbo codes,” IEEE Int. Conf.
on Circuits and Systems (ISCAS ’98), 4:445-448, Jun. 1998.
[6] A. Giulietti, L.Van der Perre, and M. Strum, “Parallel turbo coding interleavers:
Avoiding collisions in accesses to storage elements,” Electron. Lett., vol. 38, no. 5,
pp. 232-234, Feb. 2002.
[7] Jaeyoung Kwak, Sook Min Park, Sang-Sic Yoon, and Kwyro Lee, “Implementation
of a parallel turbo decoder with dividable interleaver,” IEEE International Symposium
on Circuits and Systems, Vol. 2, pp. 65-68, May 2003.
[8] Peter H. -Y. Wu,“On the complexity of turbo decoding algorithms,” VTC 2001
Spring, Vol. 2, pp. 1439-1443, 2001.
[9] A. Tarable, S. Benedetto, and G. Montorsi, “Mapping interleaving laws to parallel turbo and LDPC decoder architectures,” IEEE Trans. Inf. Theory, vol. 50, no. 9, pp. 2002-2009, Sep. 2004.
[10] D. Gnaedig, E. Boutillon, M. Jezequel, V. Gaudet, and P. Gulak, “On multiple
slice turbo codes,” in Proc. 3rd Int. Symp. on Turbo Codes and Related Topics, Brest,
France, pp. 343-346, Sep. 2003.
[11] C. Berrou, S. Kerouedan Y. Saouter, C. Douillard, and M. Jezequel, “Designing
good permutations for turbo codes: towards a single model,” in Proc. International
Conference on Communications, Paris, France, vol. 1, pp. 341-345, Jun. 2004.
[12] L. Dinoi and S. Benedetto, “Variable-size interleaver design for parallel turbo de-
coder architecture,” IEEE Transactions on Communications, vol. 53, no. 11, pp. 1833-
1840, Nov. 2005.
[13] S. Benedetto and G. Montorsi, “Unveiling turbo codes: some results on parallel
concatenated coding schemes,” IEEE Transactions on Information Theory, Vol. 42,
pp. 409-428, Mar. 1996.
[14] R. Garello, P. Pierleoni, and S. Benedetto, “Computing the free distance of turbo codes and serially concatenated codes with interleavers: Algorithms and applications,” IEEE J. Sel. Areas Commun., vol. 19, pp. 800-812, May 2001.
[15] E. Rosnes and Ø. Ytrehus, “Improved Algorithms for the Determination of Turbo-Code Weight Distributions,” IEEE Transactions on Communications, vol. 53, no. 1, pp. 20-26, Jan. 2005.
[16] “3rd generation partnership project (3GPP) technical specification group: Univer-
sal mobile telecommunications system (UMTS); multiplexing and channel coding
(FDD), TS 25.212 v3.4.0,” Sep. 2000.
[17] 3GPP TSG RAN WG1-43, “Enhancement of Rel. 6 Turbo Code,” Nov. 2005.
[18] Hong-Yeop Song and Jeffrey H. Dinitz, “Tuscan Squares,” Part IV, Chapter 48 of
The CRC Handbook of Combinatorial Designs, edited by Charles J. Colbourn and
Jeffrey H. Dinitz, CRC Press, pp. 480-484, 1996.
[19] Charles J. Colbourn and Jeffrey H. Dinitz, “Latin Squares,” Part II, Chapter 1 of
The CRC Handbook of Combinatorial Designs, edited by Charles J. Colbourn and
Jeffrey H. Dinitz, CRC Press, pp. 97-110, 1996
[20] Wensong Chu, Solomon W. Golomb and Hong-Yeop Song, “Tuscan Squares,” Part
IV, Chapter 63 of The CRC Handbook of Combinatorial Designs, 2nd edition, edited
by Charles J. Colbourn and Jeffrey H. Dinitz, CRC Press,to be published in Nov.
2006.
[21] Solomon W. Golomb and Guang Gong, “Signal sets with low correlation,” Chapter
10 of Signal Design for Good Correlation, Cambridge University Press, pp. 344-353,
2005
[22] A. Lempel, H. Greenberger, “Families of Sequences with Optimal Hamming Cor-
relation Properties,” IEEE Transactions on Information Theory, vol. IT-20, no. 1, pp.
90-94, Jan. 1974.
[23] L. Dinoi and S. Benedetto, “Variable-Size Interleaver Design for Parallel Turbo
Decoder Architectures,” IEEE Transactions on Communications, vol. 53, pp. 1833-
1840, Nov. 2005.
[24] S. Dolinar, D. Divsalar, “Weight Distributions for Turbo Codes Using Random and
Nonrandom Permutations,” JPL TDA Progr. Rep., vol. 42-122, Aug. 1995.
Abstract (in Korean)

[The Korean-language abstract is unrecoverable mojibake in this transcript. From the surviving fragments it mirrors the English summary: collision-free interleavers for parallel turbo decoding, built from a temporal permutation and a spatial permutation; comparison with ARP (almost regular permutation) for information block sizes 320 and 640, with about a 0.1 dB SNR gain over ARP at FER 10−5 for block size 640; Latin square, semi-Latin square type, Kasami-sequence-set-based, and S-random constructions; simulations with 3GPP standard turbo codes over an AWGN channel. The key-word line is likewise unrecoverable.]