Collision-free Interleavers using Latin Squares for Parallel Decoding
Collision-free Interleavers using Latin
Squares for Parallel Decoding of
Turbo Codes
Hyun-Young Oh
The Graduate School
Yonsei University
Department of Electrical and Electronic
Engineering
Collision-free Interleavers using Latin Squares for Parallel Decoding of
Turbo Codes
Hyun-Young Oh
A Thesis Submitted to the
Graduate School of Yonsei University
in Partial Fulfillment of the
Requirements for the Degree of
Master of Science
Supervised by
Professor Hong-Yeop Song, Ph.D.
Department of Electrical and Electronic Engineering
The Graduate School
YONSEI University
December 2006
This certifies that the thesis of Hyun-Young Oh is approved.
Thesis Supervisor: Hong-Yeop Song
Sanghoon Lee
Kwang Soon Kim
The Graduate School
Yonsei University
December 2006
[Acknowledgments (감사의 글), in Korean — December 2006]
Contents
List of Figures iv
List of Tables v
Abstract vi
1 Introduction 1
1.1 Motivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
2 Parallel Architecture of Turbo Codes 4
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.2 Encoding Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.3 Decoding Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.4 Computing Minimum Distance . . . . . . . . . . . . . . . . . . . . . . 9
2.5 Error-Rate Performance . . . . . . . . . . . . . . . . . . . . . . . . . . 11
3 Collision-free Interleavers 14
3.1 Collision-free . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
3.2 Some Reviews . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
3.2.1 2D Interleaver . . . . . . . . . . . . . . . . . . . . . . . . . . 16
3.2.2 Almost Regular Permutation . . . . . . . . . . . . . . . . . . . 17
4 Proposed Collision-free Interleaver 18
4.1 Construction 1: Using 3GPP interleaver and Latin square . . . . . . . . 18
4.1.1 Latin square type interleaver . . . . . . . . . . . . . . . . . . . 18
4.1.2 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4.1.3 Simulation Results and Discussions . . . . . . . . . . . . . . . 28
4.2 Construction 2: Using 3GPP interleaver and Kasami sequence set . . . 33
4.2.1 Kasami type 1 interleaver . . . . . . . . . . . . . . . . . . . . 33
4.2.2 Kasami type 2 interleaver . . . . . . . . . . . . . . . . . . . . 35
4.2.3 Simulation Results and Discussions . . . . . . . . . . . . . . . 36
4.3 Construction 3: Using S-random interleaver and Latin square . . . . . . 40
4.3.1 Collision-free S-random interleaver . . . . . . . . . . . . . . . 40
4.3.2 Simulation Results and Discussions . . . . . . . . . . . . . . . 41
5 Concluding Remarks 44
5.1 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
5.2 Future Directions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
Bibliography 45
Abstract (in Korean) 50
List of Figures
2.1 Zero-state-starting encoding of information bits (01010) without tail bits 6
2.2 Tail-biting encoding of information bits (01010) without tail bits . . . . 6
2.3 Encoding of parallel turbo codes . . . . . . . . . . . . . . . . . . . . . 7
2.4 Comparison BER and FER of block size 320 (with various parallelisms) 12
2.5 Comparison BER and FER of block size 640 (with various parallelisms) 13
3.1 Memory collision in parallel turbo code interleaving . . . . . . . . . . . 15
4.1 Intra subblock permutation . . . . . . . . . . . . . . . . . . . . . . . . 19
4.2 Inter subblock permutation . . . . . . . . . . . . . . . . . . . . . . . . 20
4.3 Comparison of column wise 2-tuple pattern of 4 by 4 Latin square . . . 21
4.4 Comparison of FER performance between good and bad groups . . . . 23
4.5 Example of the proposed interleaver of size 18 . . . . . . . . . . . . . . 27
4.6 Comparison BER and FER of block size 320 (Latin square type) . . . . 29
4.7 Comparison BER and FER of block size 640 (Latin square type and
semi-Latin square type) . . . . . . . . . . . . . . . . . . . . . . . . . . 31
4.8 Flowchart of making the semi-Latin structure . . . . . . . . . . . . . . 32
4.9 Pseudo code for extracting a hopping sequence in Kasami type 2 . . . . 36
4.10 Comparison BER and FER of block size 320 (Kasami type) . . . . . . . 38
4.11 Comparison BER and FER of block size 640 (Kasami type) . . . . . . . 39
4.12 Comparison BER and FER of block size 640 (Proposed S-random inter-
leaver) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
List of Tables
4.1 Simulation environment . . . . . . . . . . . . . . . . . . . . . . . . . . 22
4.2 Minimum distances of 24 Latin square types of blocksize 320, 640 . . . 24
4.3 Weight parameters of Latin square type . . . . . . . . . . . . . . . . . 28
4.4 Weight parameters of Kasami type 1 and type 2 . . . . . . . . . . . . . 36
4.5 Simulation environment of method 1 and method 2 . . . . . . . . . . . 42
ABSTRACT
Collision-free Interleavers using Latin Squares for Parallel Decoding of Turbo Codes
Hyun-Young Oh
Department of Electrical and Electronic Eng.
The Graduate School
Yonsei University
In many communication systems, turbo codes are widely used due to their powerful error-correcting capability. One of the problems of turbo codes is the decoding delay, which hinders very high speed communication. Thus much research on parallel architectures for turbo codes has been conducted to reduce the decoding delay.
In the parallel architecture of turbo codes, the constituent interleaver must avoid memory collisions, because resolving a collision requires additional delay. In this thesis, we first introduce previously studied interleavers that can be applied to the parallel architecture of turbo codes, such as the almost regular permutation (ARP) and the 2D interleaver constituted by a temporal permutation and a spatial permutation. Next, we analyze the characteristics of an interleaver that directly affect the performance of parallel turbo codes. We then propose collision-free interleavers that can easily be constructed for various block sizes.
The performance of the proposed interleavers is almost the same as that of ARP in the FER 10^−5 region with information block sizes 320 and 640, when the simulation environment is given by 3GPP standard turbo codes with 4-parallelism in an AWGN channel. The proposed interleavers using Latin squares can be constructed for various information block sizes by defining only a single mapping matrix, while ARP needs an exhaustive search process at every block size. We also propose another collision-free interleaver which uses the Kasami sequence set and thus has an irregular structure.
Finally, we apply the well-known S-random interleaver to the proposed interleaver structure. The proposed S-random interleaver uses a Latin-square-structured spatial permutation and is generated by modified constraints different from those of the conventional S-random interleaver. It shows better error-rate performance than ARP, a 0.1 dB improvement at FER 10^−5 with information block size 640.
Key words: Turbo codes, Parallel architecture, Collision-free, Interleaver, Temporal permutation, Spatial permutation
Chapter 1
Introduction
1.1 Motivation
Due to their outstanding error-correcting capability, turbo codes have been studied extensively [1]. One of the problems of turbo codes in communication system applications is the decoding delay, which is directly influenced by the information block size. In the soft-input soft-output (SISO) decoder of turbo codes, the log-likelihood ratios (LLRs) of the information bits can be estimated by forward and backward recursions [2]. Thus the parallel architecture of turbo codes can be a good solution for decreasing the decoding delay, where a block is divided into several subblocks and each of them is (encoded and) decoded separately. No tail bits are needed with the help of circular tail-biting encoding [3], so the starting and ending states are not a problem: in the iterative decoding procedure, the initial probability distributions of the starting and ending states in the BCJR algorithm are set to uniform [4].
In the parallel decoding of turbo codes, many processors must be able to operate simultaneously and must avoid memory collisions [6], [9]. If more than one SISO module tries to access the same memory bank to read bits through the constituent interleaver, the accesses cannot be completed at the same time and additional delay may occur. The collision-free constraints of the parallel architecture were studied in [6], [9]. In 2003, [10] proposed a collision-free interleaver organized by temporal and spatial permutations, which we call here the 2D interleaver. In 2004, [11] proposed another collision-free interleaver, the almost regular permutation (ARP), defined by several parameters so that it can be simply implemented and optimized by searching over those parameters. Recently, [12] proposed a collision-free S-random interleaver which can be applied to various block sizes by pruning.
The deterministic collision-free interleavers proposed in [10] and [11] have a complex optimizing process. In the case of the 2D interleaver, we have to decide the spatial and temporal permutations at every block size; we likewise have to search for the ARP parameters at every block size. Communication systems with turbo codes are generally required to support various block sizes, so the constituent interleaver must be defined at every possible block size, for example, from 40 to 5114 in 3GPP [16]. This thesis proposes collision-free interleaver structures which can be optimized easily over various information block sizes. Furthermore, we propose another collision-free interleaver based on the S-random interleaver, which shows better error-rate performance.
1.2 Overview
In Chapter 2, the parallel architecture of turbo codes is described: the encoding and decoding structures are presented, and the performance of the parallel architecture is compared with that of the conventional non-parallel architecture. In Chapter 3, the constraints on a constituent interleaver of the parallel architecture of turbo codes are analyzed, and previously studied interleavers that can be applied to the parallel architecture are introduced. In Chapter 4, we propose two kinds of interleavers which can be applied to the parallel architecture and easily constructed for various block sizes; a comparison with one of the interleavers of Chapter 3 is presented in terms of performance and complexity. We also propose another collision-free interleaver, generated under a constraint similar to that of the S-random interleaver, and compare its error-rate performance. Finally, the proposed interleavers of this thesis are summarized and some discussions follow.
Chapter 2
Parallel Architecture of Turbo Codes
2.1 Introduction
Due to the delay problem of turbo codes, especially the decoding delay, many attempts to reduce the decoding delay through parallel architectures have been studied [1], [5]. Assume a single processor performs L iterations. A pipeline structure can be applied to reduce the decoding delay, where each processor among W processors operates on the entire information block for L/W iterations before passing the resulting extrinsic information to the next processor in the pipeline and operating on the next information block [1]. A parallel structure was proposed in 1998 by the authors of [5]. In this scheme each information block is divided into W partially overlapping subblocks and each of the W processors performs all L iterations on its subblock in parallel. The reason each window overlaps partially is the initialization problem of the recursions [2], but the authors of [3], [4] solved this problem.
Although all the studies mentioned above reduce the decoding delay of turbo codes, they do not address the delay of interleaving: they only reduce the decoding delay by pipelining or windowing, while interleaving over the entire information block is still executed at every iteration. Attempts to solve this problem by executing the interleaving in a parallel fashion have been studied in [6], [9], [10], [11], [12]. All these interleaver issues are handled in detail in Chapter 3.
2.2 Encoding Structure
In the parallel architecture of turbo codes, the encoding is processed in each separate subblock. If the number of subblocks is n, we call n the degree of parallelism and call this structure n-parallelism, where the whole block is divided into n equal-sized subblocks. In conventional turbo codes (e.g., 3GPP [16]), tail bits must be added for trellis termination at the end of the information block; otherwise the reliability of the bits located at the end of the block becomes unstable. Similarly, tail bits must be added to each subblock in parallel turbo codes, so we need more tail bits in parallel turbo codes than in the conventional case: the more we divide the block, the more rate loss we suffer. Without tail bits we cannot always make the ending state identical to the zero (starting) state.
But using the algorithm in [3], which we call circular encoding or tail-biting encoding, we can make the ending state the same as the starting state without tail bits. Fig. 2.1 represents conventional convolutional encoding, i.e., starting from the zero state (00), where the ending state is not guaranteed to be the zero state (00). Fig. 2.2 represents the tail-biting encoding. In tail-biting encoding we can choose the starting state, which results in the
Figure 2.1: Zero-state-starting encoding of information bits (01010) without tail bits
Figure 2.2: Tail-biting encoding of information bits (01010) without tail bits
Figure 2.3: Encoding of parallel turbo codes
same ending state as the starting state. But this is not always possible: if the subblock size is a multiple of the period of the constituent convolutional encoder, we cannot apply tail-biting encoding [3]. For example, suppose the number of memory elements in the constituent convolutional encoder is 3; then the period is 2^3 − 1 = 7, and we must avoid subblock lengths that are multiples of 7.
Fig. 2.3 represents the encoding structure of parallel turbo codes. The information block is divided into several subblocks, and each subblock is encoded to generate the first parity bits. After interleaving by the constituent interleaver, the second parity bits are generated in the same way.
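The length constraint above is easy to check programmatically. Below is a minimal sketch (the helper name and interface are our own, not from the thesis) that assumes, as stated in the text, that the encoder period is 2^m − 1 for m memory elements:

```python
def tail_biting_applicable(subblock_len: int, num_memories: int) -> bool:
    """Tail-biting encoding fails exactly when the subblock length is a
    multiple of the encoder period 2^m - 1 (m memory elements)."""
    period = 2 ** num_memories - 1
    return subblock_len % period != 0

# With m = 3 the period is 7, so lengths 7, 14, 21, ... must be avoided.
```

For instance, a block of 320 bits split into 4 subblocks of length 80 passes this check, since 80 is not a multiple of 7.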
2.3 Decoding Structure
With the help of the tail-biting encoding, there is no rate loss due to tail bits. But the decoders do not know the starting and ending states exactly; they only know that the starting state and the ending state are identical. So the initial values of α and β in the BCJR algorithm of the iterative decoding process are set to be uniform [4], which means that the probability distribution over the states is uniform. After each iteration, the initial value is updated from the last value of the previous iteration. This tail-biting decoding algorithm can be represented by the following equations. The forward and backward recursions (2.1), (2.2) are executed conventionally in each separate subblock, where S is the set of states, a state transition is assumed to be from s′ to s, L-parallelism is considered and the subblock length is M.
α_k(s) = Σ_{s′∈S} α_{k−1}(s′) γ_k(s′, s)   (2.1)

β_{k−1}(s′) = Σ_{s∈S} β_k(s) γ_k(s′, s)   (2.2)
But due to the tail-biting encoding, the initialization must be set to uniform over all states. (2.3) is the initialization of the alpha values before the first iteration, and the initialization of the beta values is given by (2.4). Here α^(j)_{i,l,m}(s) is the alpha value at the m-th bit of the l-th subblock, in the i-th decoder at the j-th iteration, with 0 ≤ m ≤ M, 0 ≤ l ≤ L − 1 and i = 1, 2.

α^(1)_{i,l,0}(s) = 1/|S|,  ∀s ∈ S, l ∈ {0, · · · , L − 1}, i ∈ {1, 2}   (2.3)

β^(1)_{i,l,M}(s) = 1/|S|,  ∀s ∈ S, l ∈ {0, · · · , L − 1}, i ∈ {1, 2}   (2.4)
At the n-th iteration,

α^(n)_{i,l,0}(s) = α^(n−1)_{i,l,M}(s),  ∀s ∈ S, l ∈ {0, · · · , L − 1}, i ∈ {1, 2}   (2.5)

β^(n)_{i,l,M}(s) = β^(n−1)_{i,l,0}(s),  ∀s ∈ S, l ∈ {0, · · · , L − 1}, i ∈ {1, 2}   (2.6)

Since we must use the last values of the previous iteration to initialize α and β at the current iteration, we need additional memory: 2 · L · |S| storage locations for the α and β values.
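The initialization and wrap-around rules (2.3)–(2.6) can be sketched as follows. This is only an illustration of the boundary handling, not a BCJR decoder; the list-of-lists layout (one row per subblock) is our own assumption:

```python
def init_alpha_beta(num_states: int, L: int):
    """(2.3)-(2.4): before the first iteration, every state is equally
    likely at both boundaries of every subblock."""
    uniform = [1.0 / num_states] * num_states
    alpha_0 = [list(uniform) for _ in range(L)]  # alpha at bit index m = 0
    beta_M = [list(uniform) for _ in range(L)]   # beta at bit index m = M
    return alpha_0, beta_M

def next_iteration_init(alpha_at_M, beta_at_0):
    """(2.5)-(2.6): the next iteration starts from the previous iteration's
    boundary values (tail-biting wrap-around), which is why 2*L*|S| extra
    values must be stored between iterations."""
    return [row[:] for row in alpha_at_M], [row[:] for row in beta_at_0]
```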
2.4 Computing Minimum Distance
In this section we introduce an algorithm for computing the minimum distance d_min in the parallel architecture of turbo codes. The minimum distance of turbo codes directly influences the error-rate performance [13]. The bit error rate P_b(e) is upper-bounded as
P_b(e) ≤ (W / 2k) · erfc(√(d_min R_c E_b / N_0)) · e^{d_min R_c E_b / N_0} · ∂A^C(W, Z)/∂W |_{W = Z = e^{−R_c E_b / N_0}}.   (2.7)
Here W and Z are dummy variables, k is the block size, d_min is the minimum distance, R_c is the code rate and A^C(W, Z) is the input redundancy weight enumerating function representing the weight distribution of the turbo code. Thus the larger d_min is, the better the expected error-rate performance. If we check the d_min of candidate interleavers, we can avoid poor ones that give a small d_min.
The algorithm for finding d_min is based on the method proposed in [14], [15]. Since turbo codes are linear, the minimum codeword weight equals the minimum distance, so computing the minimum distance amounts to finding a minimum-weight codeword. Investigating all possible codewords would take a large amount of time; the following algorithm works efficiently by considering only the candidates likely to be minimum-weight codewords. The algorithm starts with the set of initial vectors {s^(0), s^(1), · · · , s^(N−1)} regarded as information bits, where
s^(j) = (s^(j)_i)_{i=0,1,··· ,N−1} and s^(j)_i is defined as

s^(j)_i = 0 if i < j;  1 if i = j;  −1 if j < i.   (2.8)
Here −1 denotes an undetermined bit. The algorithm determines each undetermined bit by investigating which of 0 or 1 results in the smaller codeword weight, in a manner similar to the Viterbi algorithm. For example, to determine the k-th element when the bits up to the (k − 1)-th position are already determined, compute the resulting codeword weight for the two cases where the k-th element is 0 and 1. The weight of the resulting codeword is computed as follows.
Step 1 Encode the given vector from bit index 0 to k − 1. Then compute the weight contribution of the remaining undetermined bits needed to terminate the trellis under the tail-biting constraint, for the cases where the k-th element is 0 and 1.
Step 2 Permute the given vector by the interleaver.
Step 3 Encode the permuted vector, determining the undetermined bits at the remaining places so as to yield the smallest weight.
Step 4 Compute the sum of the information weight, the first parity weight computed at Step 1 and the second parity weight computed at Step 3.
Step 5 Compare the values computed at Step 4 for the k-th element equal to 0 and 1, and set the k-th element to whichever results in the smaller weight.
Step 6 If the resulting weights of the two cases are the same, add both vectors to the list of candidates.
While carrying out the above process for all candidates, store the minimum-weight codewords and update d_min and its multiplicity N_d. To save computing time, a candidate is discarded from the list and the next candidate is processed as soon as the computed codeword weight exceeds the current estimate of d_min during the process of determining the undetermined bits.
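The initial candidate vectors of (2.8) are straightforward to build. The sketch below (a hypothetical helper, using −1 for undetermined positions as in the text) constructs them; the weight-comparison steps themselves depend on the constituent encoder and are omitted:

```python
def initial_vectors(N: int):
    """Construct s^(0), ..., s^(N-1) of (2.8): zeros before position j,
    a 1 at position j, and undetermined entries (-1) after it."""
    return [[0] * j + [1] + [-1] * (N - j - 1) for j in range(N)]
```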
2.5 Error-Rate Performance
In this section we present the error-rate performance when various parallelisms are applied to 3GPP standard turbo codes. Non-parallelism, 4-parallelism and 8-parallelism are considered with information block sizes 320 and 640, respectively. The tail-biting encoding and decoding methods of Sections 2.2 and 2.3 are applied. The degradation tends to be proportional to the degree of parallelism, as shown in Fig. 2.4 and Fig. 2.5, but the performance degradation due to the parallel structure is less than 0.1 dB at FER 10^−3 in both cases. In the case of 4-parallelism the error-rate performance is almost the same as non-parallelism. Note that if moderate parallelism (e.g., 4-parallelism) is applied to conventional non-parallel turbo codes, the performance degradation may be negligible, while the decoding delay is reduced by the degree of parallelism.
[Figure: BER and FER vs. E_b/N_0 (0.5–2.5 dB, error rates 10^0 down to 10^−7); K = 320; non-, 4- and 8-parallelism; Max. Iter. = 8; max-log-MAP]
Figure 2.4: Comparison BER and FER of block size 320 (with various parallelisms)
[Figure: BER and FER vs. E_b/N_0 (0.5–2 dB, error rates 10^0 down to 10^−7); K = 640; non-, 4- and 8-parallelism; Max. Iter. = 8; max-log-MAP]
Figure 2.5: Comparison BER and FER of block size 640 (with various parallelisms)
Chapter 3
Collision-free Interleavers
3.1 Collision-free
The constituent interleaver of parallel turbo codes must support parallel interleaving and parallel de-interleaving. This means that the processor operating on each subblock must be able to read bits in the permuted order without any memory collision [6]. If several processors attempt to read bits from the same memory bank, a collision occurs. Fig. 3.1 illustrates such a collision: the processors of the first and the third subblocks read bits in the second subblock simultaneously, so the two accesses can only be executed sequentially. Memory collisions thus introduce another delay problem: because we have to wait one or more clock cycles (according to the number of collisions) in order to write data, collisions delay the decoding process [6], [7]. Assume the block size is N, the parallelism is L, the maximum number of iterations is n, the interleaver is a uniform random interleaver and one clock cycle is needed to manage each memory collision. Then the average added decoding delay is n · (N/L) · (L^L − L!)/L^L clock cycles; in other words, the probability that a collision occurs at each access to the memory banks is (L^L − L!)/L^L.

[Figure: information bits mapped through the interleaver to permuted information bits, with two processors colliding on the same memory bank]
Figure 3.1: Memory collision in parallel turbo code interleaving

For example, the probability that a collision occurs is
90 percent with 4-parallelism and 99 percent with 8-parallelism. Clock delays due to collisions are relatively small compared with the whole decoding computation, less than 0.1 percent of the decoding time [8]. But since more clock delays occur at higher parallelism, memory collisions can hinder very fast decoding in highly parallel turbo codes. To avoid memory collisions, the processors must access bits in different subblocks at any given time. So we can define collision-freeness as in Definition 3.1.
Definition 3.1 Let the information block size be N, the number of subblocks be L and Π(·) be the constituent interleaver. Π(·) is collision-free if for all i ∈ {0, 1, · · · , N/L − 1} and all j, k ∈ {0, 1, · · · , L − 1} with j ≠ k,

⌊Π(i + (N/L) · j) / (N/L)⌋ ≠ ⌊Π(i + (N/L) · k) / (N/L)⌋.
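Definition 3.1 and the collision probability discussed above can both be checked numerically. The sketch below uses our own function names; `is_collision_free` tests the definition directly, and `collision_probability` evaluates (L^L − L!)/L^L:

```python
from math import factorial

def is_collision_free(pi, N, L):
    """Definition 3.1: at each tick i, the L processors must read from
    L distinct memory banks (subblocks of size M = N / L)."""
    M = N // L
    for i in range(M):
        banks = {pi[i + M * j] // M for j in range(L)}
        if len(banks) != L:
            return False
    return True

def collision_probability(L):
    """Probability that a uniformly random access pattern collides."""
    return (L ** L - factorial(L)) / L ** L
```

collision_probability(4) evaluates to 232/256 ≈ 0.906 and collision_probability(8) to about 0.998, consistent with the 90 and 99 percent figures above.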
3.2 Some Reviews
In this section we review the two collision-free interleavers proposed in [10] and [11]: the 2D interleaver and ARP, respectively.
3.2.1 2D Interleaver
A collision-free interleaver constituted by two permutations, called the temporal permutation and the spatial permutation, was proposed in [10]. First the temporal permutation permutes the bits within each subblock; then the spatial permutation permutes the bits among the subblocks. Let the number of information bits be K, the number of subblocks be L and the number of bits in each subblock be M. Then M = K/L, where L must divide K. The bit index k, where k ∈ {0, 1, · · · ,K − 1}, can be represented in a two-dimensional array with temporal index t and spatial index s via the relation k = s · M + t, where s ∈ {0, 1, · · · , L − 1} and t ∈ {0, 1, · · · ,M − 1}. Let the temporal permutation be denoted by ΠT (t, s) and the spatial permutation by ΠS(t, s). Then a collision-free 2D interleaver is defined as
Π(k) = Π(t, s) = ΠS(t, s) · M + ΠT (t, s). (3.1)
The bit of index k in the permuted order is read from the ΠT (t, s)-th position of the ΠS(t, s)-th subblock. In the parallel architecture, the permutation is completed by L processors, one allocated to each subblock, during M ticks, where a tick is the time for a processor to access or read one bit (or a soft decision value) from memory. For every processor to read bits from distinct subblocks, ΠS(t, s) must satisfy the condition that for every t ∈ {0, 1, · · · ,M − 1}, the values ΠS(t, 0), ΠS(t, 1), · · · , ΠS(t, L − 1) are in one-to-one correspondence with the subblocks 0, 1, · · · , L − 1. [10] defines ΠS(t, s) in a simple rotational form. As long as ΠS(t, s) satisfies the collision-free constraint, one can easily make a collision-free interleaver using any permutation as ΠT (t, s).
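As an illustration, the construction (3.1) can be sketched with the simple rotational spatial permutation ΠS(t, s) = (s + t) mod L; the exact rotational form used in [10] may differ, so treat this as an assumption of the sketch:

```python
def make_2d_interleaver(pi_T, L):
    """Collision-free 2D interleaver of (3.1): Pi(k) = Pi_S * M + Pi_T(t),
    with the rotational spatial permutation Pi_S(t, s) = (s + t) mod L.
    pi_T is any length-M temporal permutation applied inside subblocks."""
    M = len(pi_T)
    pi = [0] * (M * L)
    for s in range(L):
        for t in range(M):
            pi[s * M + t] = ((s + t) % L) * M + pi_T[t]
    return pi
```

Because (s + t) mod L runs over all subblocks for each fixed t, every tick reads from L distinct banks, so the result satisfies Definition 3.1 for any temporal permutation pi_T.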
3.2.2 Almost Regular Permutation
ARP was proposed in 2004 by C. Berrou, based on the relative prime interleaver [11]. Periodic fluctuation patterns are added as

Π(k) = (P · k + C · (α(k) · P + β(k)) + γ) (mod K). (3.2)

Here P is relatively prime to the information block size K, C is the period, α(k) and β(k) are nonnegative integer sequences of period C for 0 ≤ k ≤ K − 1, and γ is an initial offset. ARP shows an impressive performance improvement over the 3GPP interleaver, specifically 0.55 dB at FER 10^−5 with information block size 640 in an AWGN channel, where the simulation environment is given by 3GPP standard turbo codes with the max-log-MAP decoding algorithm and 8 iterations [17].
Although the performance of ARP is good, the optimizing process is complex: it requires an exhaustive search whenever the block size changes. For example, with 4-parallelism, i.e., C = 4, we must find 4 parameters. In [11], α repeats the pattern (0, 0, 1, 1) or (0, 1, 0, 1) and β repeats (β0 = 0, β1, β2, β3) with βi ∈ {0, 1, · · · , 8}. Thus for each candidate P we must investigate (8 choose 3) = 56 cases, assuming that γ and the α's are known and the βi's are distinct. Therefore the process of finding an optimal ARP over various information block sizes involves a huge amount of computing time and complexity.
Chapter 4
Proposed Collision-free Interleaver
4.1 Construction 1: Using 3GPP interleaver and Latin square
4.1.1 Latin square type interleaver
We define a collision-free interleaver by rewriting the spatial permutation in matrix form as
Π(k) = Π(s · M + t) = uts · M + ΠT (t). (4.1)
Here the M by L matrix U = {u_ts} indicates the mapping among subblocks, where t is the temporal index and s the spatial index. With L-parallelism, at the t-th tick the s-th processor reads the (u_ts · M + ΠT (t))-th bit (or the soft decision value). To avoid memory collisions, each row of U must be a permutation of the subblock indices 0, 1, · · · , L − 1. To define a collision-free interleaver easily for the various possible block sizes, we use a pre-structured interleaver as the temporal permutation ΠT (t); for example, the 3GPP standard interleaver is defined for block sizes from 40 to 5114 [16]. Since ΠT (t) is given, the optimizing process reduces to deciding the mapping matrix U, i.e., finding M permutations of {0, 1, · · · , L − 1}. One way to choose a permutation is to generate it randomly, but the collision-free interleaver made from a randomly generated U turned out not to be good enough: it only gives the average performance over all possible cases.
We must avoid bad patterns among the possible permutation patterns. One cause of performance degradation is the spreading characteristic of bits belonging to the same subblock before and after permuting. If two bits in the same subblock remain in the same subblock after permuting, as shown in Fig. 4.1, those bits form a cycle which restricts the propagation of messages in the iterative decoding process. Since every pair of bits of the form of Fig. 4.1 makes a cycle, L · (M/L choose 2) such cycles exist in each subblock when L divides M. If two bits in the same subblock are permuted to different subblocks, as shown in Fig. 4.2, or conversely two bits in different subblocks are permuted to the same subblock, they do not form a cycle. To decrease the number of cycles of the type shown in Fig. 4.1, we must move the bits of each subblock to different subblocks as much as possible; that is, the multiplicity of each subblock index in every column vector of U must be nearly or exactly M/L.
Figure 4.1: Intra subblock permutation
A Latin square is an L by L matrix over an alphabet of size L in which every row and every column is a permutation of the L symbols [19]. We define the matrix U as the columnwise repetition of an L by L Latin square and call this the Latin-square-structured U. The repetition guarantees that the multiplicity of each subblock index in every column vector of U is uniform, and the cycle length is guaranteed to be at least L + 1, where the cycle length is defined as |i − j| + |Π(i) − Π(j)| in Fig. 4.1.
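Construction 1 can be sketched as follows. U is the columnwise repetition of an L by L Latin square, so row t of U is row t mod L of the square; the temporal permutation pi_T below is an arbitrary stand-in for the 3GPP interleaver used in the text:

```python
def latin_u_interleaver(pi_T, latin, L):
    """Eq. (4.1): Pi(s*M + t) = u_ts * M + Pi_T(t), where the M x L
    mapping matrix U repeats an L x L Latin square columnwise."""
    M = len(pi_T)
    pi = [0] * (M * L)
    for s in range(L):
        for t in range(M):
            u_ts = latin[t % L][s]  # row t of U = row (t mod L) of the square
            pi[s * M + t] = u_ts * M + pi_T[t]
    return pi

# The "good spreading" 4 x 4 Latin square of Fig. 4.3(b):
GOOD = [[0, 1, 2, 3],
        [1, 0, 3, 2],
        [2, 3, 1, 0],
        [3, 2, 0, 1]]
```

Each row of the Latin square is a permutation of the subblock indices, so every tick reads from L distinct banks, and the columnwise repetition makes the multiplicity of each subblock index in every column of U exactly M/L.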
The Latin square structure also has an advantage in complexity. First, we only need to store the L by L matrix instead of the whole M by L matrix; for example, if K = 640 and L = 4, we store 4 · 4 elements instead of 160 · 4 elements. Second, the optimizing process has lower complexity. In the case of L = 4, the first row of U is initialized to (0, 1, 2, 3); the interleavers resulting from other initializations can be derived by permuting the subblock labels. Thus we only need to investigate 24 cases, instead of the 576 cases corresponding to all possible 4 by 4 Latin squares. Furthermore, we can reduce this number to 12 by picking out good ones among them. A criterion is the distribution of the two-tuple patterns in the column vectors of U. A bad case and a good case are shown in Fig. 4.3.

Figure 4.2: Inter subblock permutation

(a) Bad spreading:
0 1 2 3
1 0 3 2
2 3 0 1
3 2 1 0

(b) Good spreading:
0 1 2 3
1 0 3 2
2 3 1 0
3 2 0 1

Figure 4.3: Comparison of column-wise 2-tuple patterns of 4 by 4 Latin squares
Since each column vector gives the subblock permuting pattern of one decoding block, the distribution of the patterns in each column vector directly influences the performance. For example, observe the consecutive two-tuple patterns (0, y) along the (circular) columns, where y ∈ {1, 2, 3}. Fig. 4.3(a) contains only the patterns (0, 1) and (0, 3), twice each, while Fig. 4.3(b) contains (0, 2) and (0, 3) once and (0, 1) twice. Therefore, Fig. 4.3(b) is expected to give a better performance than Fig. 4.3(a), since its distribution is closer to uniform. The ideal distribution would have (0, 1), (0, 2) and (0, 3) occur exactly once and, furthermore, every pattern (x, y) with x, y ∈ {0, 1, 2, 3} and x ≠ y occur exactly once, which is equivalent to a circular Tuscan array [18], [20].
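The enumeration of the 24 first-row-fixed Latin squares and the circular 2-tuple statistic can be checked mechanically. The sketch below is ours (function names are assumptions, not the thesis's notation); it reproduces the (0, y) counts quoted above for the squares of Fig. 4.3:

```python
from itertools import permutations
from collections import Counter

def latin_squares_first_row_fixed(L=4):
    # Enumerate L x L Latin squares whose first row is (0, 1, ..., L-1).
    squares = []
    def extend(rows):
        if len(rows) == L:
            squares.append(tuple(rows))
            return
        for p in permutations(range(L)):
            # A new row may not repeat any symbol within a column.
            if all(p[c] != r[c] for r in rows for c in range(L)):
                extend(rows + [p])
    extend([tuple(range(L))])
    return squares

def circular_column_pairs(square):
    # Consecutive 2-tuples read down each column, wrapping around (circular).
    L = len(square)
    pairs = []
    for s in range(L):
        col = [square[t][s] for t in range(L)]
        pairs += [(col[t], col[(t + 1) % L]) for t in range(L)]
    return pairs
```

Running this confirms there are exactly 24 candidates for L = 4, that Fig. 4.3(a) has the pairs (0, 1) and (0, 3) twice each with (0, 2) absent, and that Fig. 4.3(b) has (0, 2) and (0, 3) once and (0, 1) twice.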
By this criterion we can divide the 24 cases into two groups, good and bad, of 12 cases each. The performance comparison of these two groups is shown in Fig. 4.4. We use the 3GPP interleaver of size 160 as the temporal interleaver. The simulation environment is the 3GPP standard turbo code with information block size 640 and parallelism 4. The decoding algorithm is max-log-MAP, and the maximum number of iterations is 8 with the Genie stopping rule, i.e., the iterations are stopped when no information bits are in error. The simulation environment is summarized in Table 4.1. The FER curves of the good group and of the bad group are clearly separated by the curve of the 3GPP interleaver of size 640 in Fig. 4.4.
The minimum distances of the 24 cases, computed by the algorithm described in section 2.4, are listed in Table 4.2 for block sizes 320 and 640, where dmin is the minimum distance, Nd is the number of codewords whose weight is dmin, and w is the sum of the weights of all minimum-weight codewords. U(good)i denotes a case with good spreading property, and U(bad)i the converse. dmin and Nd directly influence the upper bound on FER, and w also influences the upper bound on BER [13]. For block size 640, dmin is 28 in all cases; for block size 320, dmin is 28 or 29.

Table 4.1: Simulation environment
Decoding algorithm: Max-log-MAP algorithm
Maximum iteration: 8
Stopping criterion: Genie
Information block size: 640
Parallelism: 4
Temporal interleaver: 3GPP standard interleaver of size 160
Code rate: 1/3
Modulation: BPSK
Channel model: AWGN

The minimum-weight parameters (dmin, Nd, w) of the 12 good cases picked out of the 24 by the criterion of section 4.1.1 tend to be better than those of the bad cases, but the two groups are not clearly separated by these parameters. That is, dmin and its multiplicity are not the dominant factors affecting FER performance; the spreading property of the spatial permutation matters more, as shown in Fig. 4.4. We can, however, eliminate from the 12 good cases any candidate that yields a smaller dmin than the others. In fact, for block size 320, U(good)7 and U(good)9 are left out.
[FER vs. Eb/N0 plot, 1.4–2.1 dB, error rates 10−5 to 10−2; K=640, L=4, Max.Iter.=8, max-log-MAP; curves: bad group elements, 3GPP (FER), good group elements]

Figure 4.4: Comparison of FER performance between good and bad groups
Table 4.2: Minimum distances of the 24 Latin square types for block sizes 320 and 640. Rows of each Latin square are separated by semicolons; (dmin, Nd, w) is given for block size 640 and for block size 320.

U(bad)1   = [0 1 2 3; 1 0 3 2; 2 3 0 1; 3 2 1 0]   640: (28,3,12)   320: (28,2,8)
U(good)2  = [0 1 2 3; 1 0 3 2; 2 3 1 0; 3 2 0 1]   640: (28,2,8)    320: (29,2,6)
U(good)3  = [0 1 2 3; 1 0 3 2; 3 2 0 1; 2 3 1 0]   640: (28,2,8)    320: (29,1,3)
U(bad)4   = [0 1 2 3; 1 0 3 2; 3 2 1 0; 2 3 0 1]   640: (28,1,4)    320: (29,1,3)
U(bad)5   = [0 1 2 3; 1 2 3 0; 2 3 0 1; 3 0 1 2]   640: (28,4,16)   320: (28,1,4)
U(good)6  = [0 1 2 3; 1 2 3 0; 3 0 1 2; 2 3 0 1]   640: (28,1,4)    320: (29,2,6)
U(good)7  = [0 1 2 3; 1 3 0 2; 2 0 3 1; 3 2 1 0]   640: (28,3,12)   320: (28,2,8)
U(bad)8   = [0 1 2 3; 1 3 0 2; 3 2 1 0; 2 0 3 1]   640: (28,3,12)   320: (29,1,3)
U(good)9  = [0 1 2 3; 2 0 3 1; 1 3 0 2; 3 2 1 0]   640: (28,5,20)   320: (28,2,8)
U(bad)10  = [0 1 2 3; 2 0 3 1; 3 2 1 0; 1 3 0 2]   640: (28,2,8)    320: (29,2,6)
U(bad)11  = [0 1 2 3; 2 3 0 1; 1 0 3 2; 3 2 1 0]   640: (28,5,20)   320: (28,2,8)
U(good)12 = [0 1 2 3; 2 3 0 1; 1 2 3 0; 3 0 1 2]   640: (28,4,16)   320: (29,2,6)
U(good)13 = [0 1 2 3; 2 3 0 1; 3 0 1 2; 1 2 3 0]   640: (28,4,16)   320: (29,1,3)
U(bad)14  = [0 1 2 3; 2 3 0 1; 3 2 1 0; 1 0 3 2]   640: (28,3,12)   320: (29,2,6)
U(bad)15  = [0 1 2 3; 2 3 1 0; 1 0 3 2; 3 2 0 1]   640: (28,6,24)   320: (28,1,4)
U(good)16 = [0 1 2 3; 2 3 1 0; 3 2 0 1; 1 0 3 2]   640: (28,1,4)    320: (29,2,6)
U(good)17 = [0 1 2 3; 3 0 1 2; 1 2 3 0; 2 3 0 1]   640: (28,4,16)   320: (29,1,3)
U(bad)18  = [0 1 2 3; 3 0 1 2; 2 3 0 1; 1 2 3 0]   640: (28,4,16)   320: (28,1,4)
U(bad)19  = [0 1 2 3; 3 2 0 1; 1 0 3 2; 2 3 1 0]   640: (28,4,16)   320: (28,1,4)
U(good)20 = [0 1 2 3; 3 2 0 1; 2 3 1 0; 1 0 3 2]   640: (28,6,24)   320: (29,2,6)
U(bad)21  = [0 1 2 3; 3 2 1 0; 1 0 3 2; 2 3 0 1]   640: (28,4,16)   320: (29,1,3)
U(good)22 = [0 1 2 3; 3 2 1 0; 1 3 0 2; 2 0 3 1]   640: (28,4,16)   320: (29,2,6)
U(good)23 = [0 1 2 3; 3 2 1 0; 2 0 3 1; 1 3 0 2]   640: (28,4,16)   320: (29,1,3)
U(bad)24  = [0 1 2 3; 3 2 1 0; 2 3 0 1; 1 0 3 2]   640: (28,4,16)   320: (29,2,6)
4.1.2 Example
One of the merits of (4.1) is that we can exploit the spreading features of the short interleaver ΠT. After permuting, bits that were in the same subblock before permuting and remain in the same subblock afterwards are separated by the same distance as defined in ΠT. Fig. 4.5 shows an example of the proposed interleaver, where K = 18, L = 3, M = 6, and ΠT = (2, 0, 5, 3, 4, 1). The 6 by 3 mapping matrix U is the columnwise repetition, twice, of the Latin square

0 1 2
2 0 1
1 2 0     (4.2)
The permuted bits in Fig. 4.5(f) inherit the spreading feature of the interleaver ΠT. Each subblock contains 2 bits interleaved from the same subblock. Those bits are separated by the distance defined by ΠT, and their positions in the subblock are also the same as defined by ΠT. For example, the second and the third bits of the first subblock are permuted to the first and the fourth positions of the first subblock, respectively. For a good interleaver, the bits in the same subblock before permuting must be scattered as far as possible after permuting, as discussed in section 4.1.1. Since the bits in the same subblock before permuting are related to one another in the first decoding process, to gain as much information as possible in the second decoding we must scatter those related bits as far as possible. The proposed scheme exploits the spreading feature of the short interleaver ΠT in both the relative and the absolute position-scattering patterns.
[Fig. 4.5 shows the interleaver state at each tick 0 through 5 in panels (a)–(f); panel (f) gives the final permuted sequence read against positions 0–5 of each of the three subblocks.]
Figure 4.5: Example of the proposed interleaver of size 18
4.1.3 Simulation Results and Discussions
In this section we compare the performance of the proposed interleaver with the ARP and 3GPP standard interleavers for K = 320 and 640, with L = 4 in both cases. The parameters of ARP in [17] are defined as follows: α repeats (0, 0, 1, 1), β repeats (0, β1, β2, β3), and γ = 3; P = 197, β1 = 2, β2 = 5, β3 = 3 for information block size 320, and P = 201, β1 = 6, β2 = 3, β3 = 1 for information block size 640. In the proposed interleaver we use the 3GPP interleaver of size 80 or 160 as the temporal permutation, and the Latin square structured U matrix is given as
U1 =
0 1 2 3
1 0 3 2
3 2 0 1
2 3 1 0

U2 =
0 1 2 3
2 3 0 1
1 2 3 0
3 0 1 2     (4.3)
Among the 12 candidates, U1 shows the best performance for K = 320 and U2 for K = 640. The minimum distance and the other weight parameters are shown in Table 4.3. Fig. 4.6 and Fig. 4.7 show the BER and FER curves versus Eb/N0 for K = 320 and K = 640, respectively. The other simulation conditions are the same as those in Table 4.1.

Table 4.3: Weight parameters of the Latin square type
Interleaver | Block size 320 | Block size 640
3GPP standard interleaver | (25,1,3) | (30,1,2)
ARP | (33,2,6) | (30,1,6)
Latin square type | (29,1,3) | (28,4,16)

[BER and FER vs. Eb/N0 plot, 0.5–2.5 dB, error rates 10−8 to 100; K=320, L=4, Max.Iter.=8, max-log-MAP; curves: 3GPP, ARP, Latin square type]

Figure 4.6: Comparison of BER and FER for block size 320 (Latin square type)

The proposed interleaver shows almost the same performance as ARP for K = 320 in Fig. 4.6. There is a 0.09 dB degradation at FER 10−5 and a 0.06 dB degradation at BER 10−7 in Fig. 4.7.
One reason for the remaining performance gap is the regular structure of the U matrix, so we add irregularity to the Latin square structured U. We generate M permutations randomly as the row vectors of U, with the constraint that for s ∈ {0, 1, · · · , L − 1}, t ∈ {σ − 1, σ, · · · , M − 1} and σ ≤ L, the σ consecutive elements of each column vector, u_{t,s}, u_{t−1,s}, · · · , u_{t−σ+1,s}, are all distinct; σ = L corresponds to the Latin square structure. We call this the semi-Latin square structure; a flowchart for constructing it is shown in Fig. 4.8. The semi-Latin square structure may break the uniform distribution of subblock indices, but the irregularity can improve the error-rate performance. Fig. 4.7 shows the performance of the semi-Latin square structure with σ = 3: it is almost the same as ARP. The minimum distance of the semi-Latin type interleaver used in the simulation is 30, the same as ARP.

Note that the proposed interleaver performs almost the same as ARP while allowing a near-optimized interleaver to be found with much lower optimization complexity. Furthermore, we can make a collision-free interleaver of any length between 40 · L and 5114 · L by defining a single L by L mapping matrix U. One may use any pre-structured interleaver in the design of the proposed construction of interleavers.
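The semi-Latin row generation (random rows whose σ consecutive column entries are distinct) can be sketched as follows. This is a minimal sketch; the function name, seed, and restart policy are ours, not the thesis's:

```python
import random

def semi_latin_U(M, L, sigma, seed=0, max_tries=10000):
    # Rows are random permutations of {0, ..., L-1}; within each column,
    # any sigma consecutive entries must be distinct.  sigma = L gives the
    # Latin square structure; sigma < L relaxes it (semi-Latin).
    rng = random.Random(seed)
    U = []
    while len(U) < M:
        for _ in range(max_tries):
            row = rng.sample(range(L), L)
            recent = U[-(sigma - 1):] if sigma > 1 else []
            # New row must differ columnwise from the last sigma-1 rows.
            if all(all(r[s] != row[s] for r in recent) for s in range(L)):
                U.append(row)
                break
        else:
            U = []  # dead end: restart from scratch
    return U
```

By induction, checking only the previous σ − 1 rows is enough: those rows already satisfy the constraint among themselves, so adding a columnwise-distinct row keeps every window of σ consecutive column entries distinct.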
[BER and FER vs. Eb/N0 plot, 0.5–2 dB, error rates 10−8 to 100; K=640, L=4, Max.Iter.=8, max-log-MAP; curves: 3GPP, ARP, Latin square type, semi-Latin square type]

Figure 4.7: Comparison of BER and FER for block size 640 (Latin square type and semi-Latin square type)
[Flowchart: initialize; generate a candidate row as a random permutation; store the row if the σ consecutive column entries are all different, otherwise reject it; repeat until all M rows are stored.]

Figure 4.8: Flowchart of making the semi-Latin structure
4.2 Construction 2: Using 3GPP interleaver and Kasami sequence set
4.2.1 Kasami type 1 interleaver
In section 4.1 we saw that irregularity in the spatial permutation can improve the error-rate performance. In this section we define the spatial permutation using the Kasami small sequence set. Kasami small sequence sets have parameters (2^n − 1, 2^{n/2}, 2^{n/2} + 1): the period is 2^n − 1, there are 2^{n/2} sequences in the set, and the maximum absolute value of the correlation is 2^{n/2} + 1 [21]. We use the following notation: n = 2m, q = 2^n, α is a primitive element of F_{2^n}, v = 2^m − 1 and d = 2^m + 1. Then let s_λ = {s_{λ,i}} be a binary sequence whose elements are given by

s_{λ,i} = f_λ(α^i), i = 0, 1, · · · , (4.4)

where

f_λ(x) = Tr^m_1(Tr^n_m(x^2) + λx^d), λ ∈ F_{2^m}, x ∈ F_{2^n}. (4.5)
A sequence set S consists of s_λ for all λ ∈ F_{2^m}; that is,

S = {s_λ : λ ∈ F_{2^m}}. (4.6)

Then we make a hopping sequence set H = {h_λ} from the Kasami small sequence set S by converting m consecutive binary bits to a decimal number, so that h_{λ,i} can be represented as

h_{λ,i} = s_{λ,i} · 2^{m−1} + s_{λ,i+1} · 2^{m−2} + · · · + s_{λ,i+m−1}. (4.7)

We introduce the following lemma and theorem to show that the hopping sequence set H can be used for the spatial permutation. Lemma 4.1 states one of the well-known properties of M-sequences [22].
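The m-bit window conversion of (4.7) can be sketched as follows; the function name is ours, and the window is taken cyclically since s_λ is periodic:

```python
def hopping_sequence(s, m):
    # h_i = s_i * 2^(m-1) + s_(i+1) * 2^(m-2) + ... + s_(i+m-1)  (eq. 4.7);
    # indices wrap around because the underlying sequence is periodic.
    N = len(s)
    return [sum(s[(i + k) % N] << (m - 1 - k) for k in range(m))
            for i in range(N)]
```

For example, the binary sequence (0, 1, 1, 0) with m = 2 yields the hopping values (1, 3, 2, 0).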
Lemma 4.1 (Window Property) Let W be the set of all nonzero n-tuples over GF(p), and let B = {b(j)} be an M-sequence of length q = p^n − 1 over GF(p). Then for each w ∈ W there exists a unique index j, 0 ≤ j < q, such that w = (b(j), b(j + 1), · · · , b(j + n − 1)).
The proposed Kasami type 1 interleaver is based on the following theorem.
Theorem 4.1 Let S be the Kasami small sequence set defined by (4.6) and H the hopping sequence set defined by (4.7). Then, for any two distinct λ, λ′ ∈ F_{2^m}, the hopping sequences differ in every position, i.e., h_{λ,i} ≠ h_{λ′,i} for all i = 0, 1, · · · .

Proof: From (4.4) and (4.5),

s_{λ,i} = f_λ(α^i) (4.8)
        = Tr^m_1(Tr^n_m(α^{2i}) + λα^{di}) (4.9)
        = Tr^n_1(α^{2i}) + Tr^m_1(λα^{di}). (4.10)

Let β = α^d; then ord(β) = 2^m − 1, so any nonzero λ can be represented as a power of β. Let λ = β^r, r = 0, 1, · · · , 2^m − 2. Thus,

s_{λ,i} = Tr^n_1(α^{2i}) + Tr^m_1(β^{i+r}). (4.11)

That is, s_{λ,i} is the modulo-2 sum of the i-th element of an M-sequence of length 2^n − 1 and the i-th element of an r-shifted M-sequence of length 2^m − 1. Therefore the successive m-tuples of s_λ and s_{λ′} are distinct by Lemma 4.1, where λ ≠ λ′.
Set each column vector of the M by L matrix U = {u_{t,s}} in equation (4.1) to a hopping sequence h_λ defined in (4.7): let u_{t,s} = h_{λ_s,t}, where λ_s ∈ F_{2^m} and s = 0, 1, · · · , L − 1 (L = 2^m). The parameters of the Kasami small sequence set depend only on n, but in interleaver design we must handle various combinations of parallelism and block size. Thus we relax the assumption 'm = n/2' to 'm divides n' and replace 'd = 2^m + 1' by 'd = (2^n − 1)/(2^m − 1)'. This gives more degrees of freedom in length and parallelism.
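The generalized parameters can be computed directly; the helper below is ours, with d = (2^n − 1)/(2^m − 1) as stated above:

```python
def kasami_params(n, m):
    # Generalized Kasami-type setting: m divides n, d = (2^n - 1)/(2^m - 1).
    assert n % m == 0
    period = 2**n - 1            # period of the underlying M-sequence
    subperiod = 2**m - 1         # period of the short M-sequence
    d = period // subperiod      # decimation exponent
    return period, subperiod, d
```

Note that for n = 2m this recovers the original decimation, since (2^n − 1)/(2^m − 1) = 2^m + 1; e.g., n = 4, m = 2 gives d = 5, while n = 8, m = 2 gives d = 85.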
4.2.2 Kasami type 2 interleaver
Runs of various lengths are desirable from the standpoint of randomness, where a run means a repetition of an identical symbol or bit. But the run property of the hopping sequence h_λ can create short cycles of the kind shown in Fig. 4.1. A direct application of Kasami sequences to the spatial permutation therefore produces so many short cycles that the error-rate performance degrades. We thus propose to extract only the entries of h_λ that do not form runs. The following pseudo code generates a new hopping sequence set H^(2) that has no runs, at the cost of a shorter length. We must choose n large enough that the length of the extracted hopping sequence exceeds the desired length.
for each λ ∈ F_{2^m}:
    h^(2)_{λ,0} = h_{λ,0};
count = 0;
for i = 1, 2, · · · , 2^n − 1:
    if h_{λ,i} ≠ h^(2)_{λ,count} for all λ ∈ F_{2^m}:
        store h_{λ,i} in h^(2)_{λ,count+1} for every λ;
        count++;
    else
        continue;

Figure 4.9: Pseudo code for extracting a hopping sequence in Kasami type 2
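A direct Python rendering of the extraction above can look as follows; we represent H as a list of equal-length sequences (one per λ), and the function name is ours:

```python
def extract_no_runs(H):
    # H: list of equal-length hopping sequences (one per lambda).  Keep the
    # i-th entries only if, for EVERY sequence, the new value differs from
    # the last value kept for that sequence -- so no extracted sequence
    # contains a run, matching the pseudo code of Fig. 4.9.
    out = [[seq[0]] for seq in H]
    for i in range(1, len(H[0])):
        if all(seq[i] != kept[-1] for seq, kept in zip(H, out)):
            for seq, kept in zip(H, out):
                kept.append(seq[i])
    return out
```

The decision is joint across λ: a column of entries is dropped as soon as any single sequence would repeat its previous kept value, which is why the extracted length shrinks and n must be chosen large enough.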
4.2.3 Simulation Results and Discussions
In this section we compare the BER and FER performance of the 3GPP standard, ARP, Kasami type 1 and Kasami type 2 interleavers for block sizes 320 and 640 with parallelism 4. The other simulation conditions are the same as in Table 4.1. We set n = 8, m = 2 for Kasami type 1, and n = 10, m = 2 and n = 12, m = 2 for two Kasami type 2 interleavers. The weight parameters are shown in Table 4.4.

Table 4.4: Weight parameters of Kasami type 1 and type 2
Interleaver | Block size 320 | Block size 640
3GPP standard interleaver | (25,1,3) | (30,1,2)
ARP | (33,2,6) | (30,1,6)
Kasami type 1 (n = 8) | (18,1,2) | (20,1,4)
Kasami type 2 (n = 10) | (17,1,3) | (24,1,4)
Kasami type 2 (n = 12) | (22,3,6) | (26,2,4)

For block size 320, all Kasami types are worse than even the 3GPP standard, as shown in Fig. 4.10: the FER of Kasami type 1 is 0.35 dB worse than the 3GPP standard in the 10−4 region, Kasami type 2 with n = 10 is 0.2 dB worse, and Kasami type 2 with n = 12 is 0.15 dB worse in the same FER region. For block size 640, the FER of Kasami type 1 is 0.1 dB worse than the 3GPP standard in the 10−4 region, as shown in Fig. 4.11; the FER of Kasami type 2 with n = 10 is almost the same as ARP in the FER 10−5 region, and Kasami type 2 with n = 12 is slightly better than ARP, by 0.03 dB, in the same region.

As analyzed in section 4.1.1, long runs in the spatial permutation degrade the error-rate performance. Thus Kasami type 2, which has no runs, performs better than Kasami type 1. Moreover, a larger n gives slightly better FER performance than a smaller n for Kasami type 2, for both block sizes 320 and 640.
[BER and FER vs. Eb/N0 plot, 0.5–2.5 dB, error rates 10−8 to 100; K=320, L=4, Max.Iter.=8, max-log-MAP; curves: 3GPP, ARP, Kasami type 1 (n=8, m=2), Kasami type 2 (n=10, m=2), Kasami type 2 (n=12, m=2)]

Figure 4.10: Comparison of BER and FER for block size 320 (Kasami type)
[BER and FER vs. Eb/N0 plot, 0.5–2 dB, error rates 10−8 to 100; K=640, L=4, Max.Iter.=8, max-log-MAP; curves: 3GPP, ARP, Kasami type 1 (n=8, m=2), Kasami type 2 (n=10, m=2), Kasami type 2 (n=12, m=2)]

Figure 4.11: Comparison of BER and FER for block size 640 (Kasami type)
4.3 Construction 3: Using S-random interleaver and Latin square
4.3.1 Collision-free S-random interleaver
In this section we propose another collision-free interleaver based on the S-random interleaver. The design of an S-random interleaver guarantees that if two input bits to the interleaver Π are within distance S, they cannot be mapped to a distance less than S apart at the interleaver output [23]. An S-random interleaver can be made by generating random integers i, 1 ≤ i ≤ K [24]: each randomly selected integer is compared to the S previously selected integers, and if it lies within a distance of ±S of any of them, the current selection is rejected. This process is repeated until all K integers are selected. So, for any two indices i, j such that

0 < |i − j| ≤ S, (4.12)

the design imposes

|Π(i) − Π(j)| > S. (4.13)

We call constraints (4.12) and (4.13) the S-constraint. The S-random design tends to increase the length of the cycles defined in section 4.1.1, and it can introduce more irregularity than the proposed interleaver of section 4.1.1.
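The rejection procedure described above can be sketched as a greedy construction with restarts. This is a minimal sketch, not the thesis's implementation; the function name, seed, and restart limit are ours:

```python
import random

def s_random_interleaver(K, S, max_restarts=2000, seed=1):
    # At each output position, pick a not-yet-used value differing by more
    # than S from each of the S previously placed values; restart with a
    # fresh shuffle if the greedy choice gets stuck.
    rng = random.Random(seed)
    for _ in range(max_restarts):
        remaining = list(range(K))
        rng.shuffle(remaining)
        perm = []
        while remaining:
            for idx, v in enumerate(remaining):
                if all(abs(v - u) > S for u in perm[-S:]):
                    perm.append(remaining.pop(idx))
                    break
            else:
                break  # stuck: restart with a new shuffle
        if len(perm) == K:
            return perm
    raise RuntimeError("no S-random interleaver found; try a smaller S")
```

Checking a candidate against only the last S placed values is exactly the S-constraint: for output positions i < j with j − i ≤ S, the value at j was tested against the value at i when it was placed.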
We can consider the following two methods for applying the S-constraint to a collision-free interleaver.

1) Method 1

• Use an S-random interleaver of size M as the temporal permutation.

Method 1 differs from the interleaver of section 4.1.1 only in using an S-random interleaver as the temporal permutation instead of the 3GPP standard interleaver.
2) Method 2

• Make the interleaver of size K directly in accordance with the S-constraint.

First, fix the Latin square structured spatial permutation; this determines which subblock each SISO module accesses at each step. Next, decide which position within the subblock is accessed at each step, subject to the S-constraint. Since we consider the parallel structure, the S-constraint has to be modified: apply (4.12), (4.13) only if ⌊i/M⌋ = ⌊j/M⌋ and ⌊Π(i)/M⌋ = ⌊Π(j)/M⌋, because only pairs of bits of the type shown in Fig. 4.1 make cycles and affect the performance. We call this the modified S-constraint. The search time of this algorithm increases with S, and it is not guaranteed to finish successfully; however, choosing S < √(K/2) usually produces a realization in reasonable time, and the larger S is, the better the error-rate performance tends to be.
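The modified S-constraint can be verified mechanically; the checker below is our formalization of the condition just stated (the function name is an assumption):

```python
def satisfies_modified_s_constraint(perm, M, S):
    # Apply (4.12)-(4.13) only to index pairs i, j that lie in the same
    # subblock both before (i//M == j//M) and after
    # (perm[i]//M == perm[j]//M) permuting -- the Fig. 4.1 pairs that
    # create cycles.  M is the subblock size.
    K = len(perm)
    for i in range(K):
        for j in range(i + 1, min(i + S + 1, K)):
            if i // M == j // M and perm[i] // M == perm[j] // M:
                if abs(perm[i] - perm[j]) <= S:
                    return False
    return True
```

For instance, with K = 8, M = 4 and S = 1, the identity permutation fails (adjacent indices stay adjacent inside a subblock), while the permutation (0, 2, 4, 6, 1, 3, 5, 7) passes.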
4.3.2 Simulation Results and Discussions
We compare the two methods of making a collision-free S-random interleaver against ARP. The simulation conditions are given in Table 4.5. Since the S-random interleaver is not deterministic, we generate 100 realizations of each. The collision-free S-random interleaver made by Method 1 shows the worst performance, as seen in Fig. 4.12: it is 0.3 dB worse than ARP at FER 10−5. The collision-free S-random interleaver made by Method 2 performs better than ARP, by 0.1 dB at FER 10−5.
Good spreading features of the temporal permutation alone do not strongly carry over to the spreading features of the whole interleaver, so the FER performance of Method 1 is not improved and is even worsened. Method 2, which generates the random numbers under a constraint that maintains the spreading features of the whole interleaver within the designed S value, shows better FER performance.
Table 4.5: Simulation environment of Method 1 and Method 2
Decoding algorithm: Max-log-MAP algorithm
Maximum iteration: 8
Stopping criterion: Genie
Information block size: 640
Parallelism: 4
Temporal permutation: (Method 1) S-random interleaver of size 160 with S = 9; (Method 2) random numbers satisfying the modified S-constraint with S = 15
Spatial permutation: columnwise repetition of
0 1 2 3
2 3 0 1
3 0 1 2
1 2 3 0
Number of interleaver realizations: 100
Code rate: 1/3
Modulation: BPSK
Channel model: AWGN
[BER and FER vs. Eb/N0 plot, 0.5–2 dB, error rates 10−8 to 100; K=640, L=4, Max.Iter.=8, max-log-MAP; curves: ARP, S-random (Method 1), S-random (Method 2)]

Figure 4.12: Comparison of BER and FER for block size 640 (proposed S-random interleaver)
Chapter 5
Concluding Remarks
5.1 Summary
In this thesis, we have proposed collision-free interleavers composed of a temporal permutation and a spatial permutation, so that memory collisions are avoided.

First, we proposed to use a pre-structured interleaver as the temporal permutation and a Latin square structured spatial permutation, so as to reduce the complexity of the optimizing process. We call this the Latin square type. Interleavers of various block sizes can be made by defining a single mapping matrix, and the optimizing process is much less complex than that of ARP: in the case of parallelism 4, only 12 cases need to be investigated, regardless of block size. The proposed interleaver performs almost the same as ARP. Irregularity can be added as proposed in this thesis; the semi-Latin square type or the Kasami type can be used for slightly better performance.

Finally, we proposed a collision-free S-random interleaver. It also uses the temporal/spatial permutation structure, with the Latin square structure as the spatial permutation, but its temporal permutation is generated randomly under the modified S-constraint introduced in this thesis. It performs 0.1 dB better than ARP at FER 10−5 for information block size 640 and parallelism 4.
5.2 Future Directions
We have proposed two kinds of collision-free interleavers, deterministic and nondeterministic: the Latin square type and the Kasami type are deterministic, while the semi-Latin type and the collision-free S-random interleaver are nondeterministic. The following problems remain for future research.

• For the deterministic interleavers, although the optimizing process is simple, the regular structure can limit the error-rate performance. The ways of adding irregular patterns need further study.

• For the nondeterministic interleavers, especially the S-random interleaver, much research has been done on prunable S-random interleavers, which can serve various block sizes by pruning once a single prunable interleaver is designed. We therefore need to study pruning techniques for the proposed collision-free S-random interleaver.

• For finding L by L Latin squares with good spreading property when L is large, that is, for higher parallelism, we need an algorithm that can always generate a Latin square with good spreading property.
Bibliography
[1] C. Berrou, A. Glavieux and P. Thitimajshima, “Near Shannon limit error-correcting
coding and decoding: turbo-codes”, Proc. of IEEE ICC’93, Geneva, pp. 1064-1070,
May 1993.
[2] L. Bahl, J. Cocke, F. Jelinek, and J. Raviv, “Optimal decoding of linear codes for minimizing symbol error rate,” IEEE Transactions on Information Theory, Vol. 20, pp. 284-287, Mar. 1974.
[3] C. Weiß, C. Bettstetter, and S. Riedel, “Code Construction and Decoding of Parallel
Concatenated Tail-Biting Codes,” IEEE Transactions on Information Theory, vol. 47,
pp. 368-388, Jan. 2001
[4] J. B. Anderson and M. Hladik, “Tailbiting MAP decoders,” IEEE J. Select. Areas
Commun., vol. 16, pp. 297-302, Feb. 1998.
[5] J. Hsu and C. Wang., “A parallel decoding scheme for turbo codes,” IEEE Int. Conf.
on Circuits and Systems (ISCAS ’98), 4:445-448, Jun. 1998.
[6] A. Giulietti, L.Van der Perre, and M. Strum, “Parallel turbo coding interleavers:
Avoiding collisions in accesses to storage elements,” Electron. Lett., vol. 38, no. 5,
pp. 232-234, Feb. 2002.
[7] Jaeyoung Kwak, Sook Min Park, Sang-Sic Yoon, and Kwyro Lee, “Implementation
of a parallel turbo decoder with dividable interleaver,” IEEE International Symposium
on Circuits and Systems, Vol. 2, pp. 65-68, May 2003.
[8] Peter H. -Y. Wu,“On the complexity of turbo decoding algorithms,” VTC 2001
Spring, Vol. 2, pp. 1439-1443, 2001.
[9] A. Tarable, S. Benedetto, and G. Montorsi, “Mapping interleaving laws to parallel turbo and LDPC decoder architectures,” IEEE Trans. Inf. Theory, vol. 50, no. 9, pp. 2002-2009, Sep. 2004.
[10] D. Gnaedig, E. Boutillon, M. Jezequel, V. Gaudet, and P. Gulak, “On multiple
slice turbo codes,” in Proc. 3rd Int. Symp. on Turbo Codes and Related Topics, Brest,
France, pp. 343-346, Sep. 2003.
[11] C. Berrou, S. Kerouedan Y. Saouter, C. Douillard, and M. Jezequel, “Designing
good permutations for turbo codes: towards a single model,” in Proc. International
Conference on Communications, Paris, France, vol. 1, pp. 341-345, Jun. 2004.
[12] L. Dinoi and S. Benedetto, “Variable-size interleaver design for parallel turbo de-
coder architecture,” IEEE Transactions on Communications, vol. 53, no. 11, pp. 1833-
1840, Nov. 2005.
[13] S. Benedetto and G. Montorsi, “Unveiling turbo codes: some results on parallel
concatenated coding schemes,” IEEE Transactions on Information Theory, Vol. 42,
pp. 409-428, Mar. 1996.
[14] R. Garello, P. Pierleoni, and S. Benedetto, “Computing the free distance of turbo codes and serially concatenated codes with interleavers: Algorithms and applications,” IEEE J. Sel. Areas Commun., vol. 19, pp. 800-812, May 2001.
[15] E. Rosnes and Ø. Ytrehus, “Improved Algorithms for the Determination of Turbo-Code Weight Distributions,” IEEE Transactions on Communications, vol. 53, no. 1, pp. 20-26, Jan. 2005.
[16] “3rd generation partnership project (3GPP) technical specification group: Univer-
sal mobile telecommunications system (UMTS); multiplexing and channel coding
(FDD), TS 25.212 v3.4.0,” Sep. 2000.
[17] 3GPP TSG RAN WG1-43, “Enhancement of Rel. 6 Turbo Code,” Nov. 2005.
[18] Hong-Yeop Song and Jeffrey H. Dinitz, “Tuscan Squares,” Part IV, Chapter 48 of
The CRC Handbook of Combinatorial Designs, edited by Charles J. Colbourn and
Jeffrey H. Dinitz, CRC Press, pp. 480-484, 1996.
[19] Charles J. Colbourn and Jeffrey H. Dinitz, “Latin Squares,” Part II, Chapter 1 of
The CRC Handbook of Combinatorial Designs, edited by Charles J. Colbourn and
Jeffrey H. Dinitz, CRC Press, pp. 97-110, 1996
[20] Wensong Chu, Solomon W. Golomb and Hong-Yeop Song, “Tuscan Squares,” Part
IV, Chapter 63 of The CRC Handbook of Combinatorial Designs, 2nd edition, edited
by Charles J. Colbourn and Jeffrey H. Dinitz, CRC Press,to be published in Nov.
2006.
[21] Solomon W. Golomb and Guang Gong, “Signal sets with low correlation,” Chapter
10 of Signal Design for Good Correlation, Cambridge University Press, pp. 344-353,
2005
[22] A. Lempel, H. Greenberger, “Families of Sequences with Optimal Hamming Cor-
relation Properties,” IEEE Transactions on Information Theory, vol. IT-20, no. 1, pp.
90-94, Jan. 1974.
[23] L. Dinoi and S. Benedetto, “Variable-Size Interleaver Design for Parallel Turbo
Decoder Architectures,” IEEE Transactions on Communications, vol. 53, pp. 1833-
1840, Nov. 2005.
[24] S. Dolinar, D. Divsalar, “Weight Distributions for Turbo Codes Using Random and
Nonrandom Permutations,” JPL TDA Progr. Rep., vol. 42-122, Aug. 1995.
Abstract (in Korean)

[The Korean-language abstract is unrecoverable mojibake in this transcript. From the surviving fragments it mirrors the English summary: collision-free interleavers for parallel turbo decoding, built from a temporal permutation and a spatial permutation; comparison with ARP (almost regular permutation) for information block sizes 320 and 640, with about a 0.1 dB SNR gain over ARP at FER 10−5 for block size 640; Latin square, semi-Latin square type, Kasami-sequence-set-based, and S-random constructions; simulations with 3GPP standard turbo codes over an AWGN channel. The key-word line is likewise unrecoverable.]