
A Parallel Processing Algorithm for Schnorr-Euchner Sphere Decoder

Han-Wen Liang1,2, Wei-Ho Chung2,*, Hongke Zhang3, Sy-Yen Kuo1,3

Department of Electrical Engineering, National Taiwan University, Taipei, Taiwan1

Research Center for Information Technology Innovation, Academia Sinica, Taipei, Taiwan2

College of Electronics and Information Engineering, Beijing Jiaotong University, Beijing, China3

E-mail:*[email protected]

Abstract: This paper presents a category of detection schemes for Multiple-Input Multiple-Output (MIMO) systems called the Parallel Sphere Decoder (PSD). Compared to the conventional depth-first sphere decoder with Schnorr-Euchner enumeration (SE-SD), the proposed PSD algorithms use parallel computation and reduce the searching time by approximately 50% for the same amount of computation. In hardware implementation, the proposed work therefore provides a trade-off between computational time and the number of computing units. Simulations of the proposed algorithms in 4×4 16-QAM and 3×3 64-QAM MIMO systems show the searching time reductions of the proposed algorithms while maintaining maximum-likelihood (ML) performance.

I. INTRODUCTION

The Multiple-Input Multiple-Output (MIMO) system has attracted tremendous research interest since it was proposed [1]. The MIMO system exploits spatial diversity to increase channel capacity without demanding more spectrum. The receiver of a MIMO system observes a linear superposition of the symbols transmitted from each transmitting antenna. How to detect the coupled symbols has been widely investigated [2]. The optimal detection of the transmitted symbols can be viewed as a discrete closest point search problem [3]. The complexity of exhaustive optimal detection grows exponentially with the number of transmitting antennas. The sphere decoding algorithm achieves the same optimal solution with reduced complexity.

The sphere decoding algorithm applies a preprocessing step, QR decomposition, to convert MIMO detection into a tree search problem. According to the search rule used in the decision tree, sphere decoding algorithms can be categorized into three types.

Depth-First: The typical depth-first algorithm first searches the nodes in the tree vertically (from the top layer to the bottom layer) for the best solution. A top-to-bottom path is searched in one search cycle; the depth-first algorithm then moves horizontally and searches the other top-to-bottom paths until the best path is obtained [4]. The typical depth-first algorithm generates an optimal detection. However, it suffers from a long and uncertain termination time.

Breadth-First: In each layer of the decision tree, the best K nodes are kept as the candidates most likely to lead to the best path. The breadth-first algorithm searches for the best path among the kept nodes. When K is as large as the number of nodes in a layer, this algorithm achieves optimal detection. However, K is usually chosen to be small to reduce complexity [5]. This algorithm is preferable in terms of hardware implementation since it has fixed complexity and memory usage.

Best-First: The tree search rule of the best-first algorithm is to search the neighboring nodes around the current path [6]. Thus, this algorithm terminates earlier than the depth-first algorithm with high probability while keeping the optimal performance. However, the essential disadvantage of this algorithm is its need for large memories.

Acknowledgment: This research was supported by the National Science Council, Taiwan, under Grants NSC 99-2221-E-002-106-MY3 and NSC 100-2221-E-001-004, the National Natural Science Foundation of China (NSFC) under Grant 60833002, and the 111 Project under Grant B08002.

In this paper, we propose three algorithms that attain optimal detection with reduced searching time compared to the depth-first algorithm. The proposed algorithms are based on a better complex-to-real conversion, which allows parallel computation in the tree search for sphere decoding. The main contribution of this paper is to combine this complex-to-real conversion with the Schnorr-Euchner (SE) algorithm. This paper applies the proposed technique to the depth-first sphere decoder, called sphere decoder (SD) for simplicity in the succeeding paragraphs, to demonstrate the reduction in time. However, the proposed technique can also be applied to other tree search rules, such as the breadth-first or hybrid tree search rules [5]. In general, other tree search rules sacrifice performance to reduce complexity. We apply the proposed technique to SD to show that optimality is not sacrificed when the proposed technique is applied.

The rest of this paper is organized as follows. Sec. II models the MIMO system, defines the notation, and describes the related work. The preprocessing of the proposed parallel sphere decoders (PSD) is introduced in Sec. III. Subsections III-A, III-B, and III-C give the details of the three proposed algorithms, respectively. Sec. IV shows the simulation results of the proposed work compared to the conventional SD. Finally, the conclusion of this paper is given in Sec. V.

2012 IEEE Wireless Communications and Networking Conference: PHY and Fundamentals. 978-1-4673-0437-5/12/$31.00 ©2012 IEEE

II. SYSTEM MODEL AND PREVIOUS WORK

We consider a MIMO system with $N_t$ transmitting antennas and $N_r$ receiving antennas,

$$ y = Hx + n, \tag{1} $$

where $x = [x_1\ x_2\ \dots\ x_{N_t}]^T$ is the transmitted symbol vector. $H$ is an $N_r \times N_t$ channel matrix with elements $h_{ji}$, which are assumed to be independent and identically distributed (i.i.d.) zero-mean complex Gaussian random variables, each with unit variance. $n = [n_1\ n_2\ \dots\ n_{N_r}]^T \in \mathbb{C}^{N_r}$ is the additive white Gaussian noise (AWGN). The signal-to-noise ratio (SNR) is $E_s N_t / N_0$, where $E_s$ denotes the energy per transmitted symbol and $N_0$ denotes the noise power. $y = [y_1\ y_2\ \dots\ y_{N_r}]^T$ is the received symbol vector. Each transmitted symbol is assumed to be an $M^2$-ary symbol.

The optimal detection metric for the received symbol vector $y$ is

$$ \hat{x} = \arg\min_{x \in \Omega^{N_t}} \|y - Hx\|^2, \tag{2} $$

where $\Omega$ denotes the set of $M^2$-ary constellation points, $|\Omega^{N_t}| = M^{2N_t}$, and $\hat{x}$ represents the detected symbol vector.

The complexity of searching by (2) grows exponentially with the number of transmitting antennas. The SD achieves optimal detection with reduced complexity. The SD includes two steps:

1) Preprocessing: Apply QR decomposition to $H$, i.e., $H = QR$, where $Q$ is an orthonormal matrix ($Q^H Q = I$), $(\cdot)^H$ denotes the conjugate transpose, and $R$ is an upper triangular matrix.

2) Tree Search: The metric (2) is transformed to

$$ \hat{x} = \arg\min_{x \in \Omega^{N_t}} \|\tilde{y} - Rx\|^2, \tag{3} $$

where $\tilde{y} = Q^H y$. The upper triangular $R$ converts the exhaustive search in (2) into a tree search as in Fig. 1. In the decision tree, each node in layer $k$ represents an element of the constellation set $\Omega$. The branch metric in the decision tree is defined as

$$ T_b(x_{N_t}, x_{N_t-1}, \dots, x_k) \triangleq \left( \tilde{y}_k - \sum_{l=k}^{N_t} r_{k,l}\, x_l \right)^2, \tag{4} $$

where $\tilde{y}_k$ is the $k$-th element of $\tilde{y}$ and $r_{k,l}$ is the $(k,l)$-th element of $R$. The SD algorithm searches from the root of the decision tree to the leaf nodes. Before searching layer $k$, the SD algorithm computes $T_b(x_{N_t}, x_{N_t-1}, \dots, x_k)$ for all nodes in layer $k$ and sorts these nodes by their $T_b$. The sorted order gives the searching priority of the nodes in layer $k$. The cumulative metric in the decision tree is defined as

$$ T_c(x_{N_t}, x_{N_t-1}, \dots, x_k) \triangleq T_b(x_{N_t}) + T_b(x_{N_t}, x_{N_t-1}) + \cdots + T_b(x_{N_t}, x_{N_t-1}, \dots, x_k). \tag{5} $$

The SD algorithm first makes an initial decision of $x_{N_t}$ in layer $N_t$, then moves node by node until it arrives at a leaf node. The $T_c$ of this leaf node is kept as the initial radius. If the SD algorithm visits another leaf node whose $T_c$ is the smallest among the visited leaf nodes, the radius is replaced by the new $T_c$. When the SD algorithm visits a node whose $T_c$ exceeds the radius, it does not visit the nodes extended from this node. More details of SD can be found in [3].

Fig. 1. Decision tree while SD is used to achieve ML solution. (The tree runs from the root through layer $N_t$ down to layer 1 and the leaf nodes.)
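The tree search described above can be illustrated with a minimal sketch. It assumes a real-valued upper triangular `R`, a small real alphabet, and 0-indexed layers; the names `sphere_decode` and `brute_force_metric` are hypothetical helpers introduced here for illustration, not part of the paper. The sketch sorts the candidates of each layer by the branch metric (4), accumulates (5), and prunes against the current radius.

```python
import itertools
import random


def sphere_decode(R, y, alphabet):
    """Depth-first search over the tree defined by an upper triangular R.

    Layers are 0-indexed here: the search starts at layer n-1 (just below
    the root) and reaches a leaf node at layer 0.
    Returns (best_x, best_Tc).
    """
    n = len(y)
    x = [0.0] * n                      # current path, filled from the end
    best = {"x": None, "radius": float("inf")}

    def search(k, tc_parent):
        # Interference from the symbols already fixed in the upper layers.
        interference = sum(R[k][l] * x[l] for l in range(k + 1, n))
        # Branch metric (4) for every candidate, sorted in ascending order.
        candidates = sorted(
            ((y[k] - interference - R[k][k] * s) ** 2, s) for s in alphabet
        )
        for tb, s in candidates:
            tc = tc_parent + tb        # cumulative metric (5)
            if tc >= best["radius"]:
                break                  # later candidates have larger tb
            x[k] = s
            if k == 0:
                best["x"], best["radius"] = list(x), tc
            else:
                search(k - 1, tc)

    search(n - 1, 0.0)
    return best["x"], best["radius"]


def brute_force_metric(R, y, alphabet):
    """Smallest ||y - R x||^2 over all candidate vectors, used as a check."""
    n = len(y)
    return min(
        sum((y[i] - sum(R[i][j] * c[j] for j in range(n))) ** 2
            for i in range(n))
        for c in itertools.product(alphabet, repeat=n)
    )


random.seed(1)
n = 4
alphabet = [-3, -1, 1, 3]              # assumed: one real dimension of 16-QAM
R = [[random.gauss(0, 1) if j >= i else 0.0 for j in range(n)]
     for i in range(n)]
x_true = [random.choice(alphabet) for _ in range(n)]
y = [sum(R[i][j] * x_true[j] for j in range(n)) + random.gauss(0, 0.5)
     for i in range(n)]

x_sd, d_sd = sphere_decode(R, y, alphabet)
d_ml = brute_force_metric(R, y, alphabet)
print(x_sd, d_sd, d_ml)
```

In this sketch the list `y` plays the role of the rotated vector in (3). The comparison with the exhaustive search is only there to show that the pruning does not change the minimum metric.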

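The received complex-valued symbol vector is often converted to a real-valued one for the feasibility of hardware implementation of SD [7], [5]. One usual conversion, used in [7], [5] and many other works, is expressed as

$$ \begin{bmatrix} \Re(y) \\ \Im(y) \end{bmatrix} = \begin{bmatrix} \Re(H) & -\Im(H) \\ \Im(H) & \Re(H) \end{bmatrix} \begin{bmatrix} \Re(x) \\ \Im(x) \end{bmatrix} + \begin{bmatrix} \Re(n) \\ \Im(n) \end{bmatrix}, \tag{6} $$

where $\Re(\cdot)$ and $\Im(\cdot)$ represent the real and imaginary parts of $(\cdot)$, respectively. However, another form of complex-to-real conversion is proposed in [8] as

$$ \begin{bmatrix} \Re(y_1) \\ \Im(y_1) \\ \vdots \end{bmatrix} = \begin{bmatrix} \Re(h_{11}) & -\Im(h_{11}) & \cdots \\ \Im(h_{11}) & \Re(h_{11}) & \cdots \\ \vdots & \vdots & \ddots \end{bmatrix} \begin{bmatrix} \Re(x_1) \\ \Im(x_1) \\ \vdots \end{bmatrix} + \begin{bmatrix} \Re(n_1) \\ \Im(n_1) \\ \vdots \end{bmatrix}. \tag{7} $$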

The conversion using (7) leads to an upper triangular matrix with regularly placed zero elements, which will be detailed later. For simplicity, the real-valued system after applying conversion (7) is denoted as

$$ y_r = H_r x_r + n_r. \tag{8} $$

The authors of [8] adopted a preprocessing other than QR decomposition to generate an upper triangular matrix. That preprocessing decomposes $H_r$ into the product of an upper triangular matrix and a non-unitary matrix. However, the non-unitary matrix colors the noise, which complicates the detection problem.
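As a quick illustration of the interleaved conversion (7), the following sketch builds the real-valued quantities from a small random complex channel and checks that both sides of the noiseless model agree. The helper names `interleave`, `real_channel_interleaved`, and `matvec` are hypothetical, and the noiseless setting is an assumption made only to keep the check exact.

```python
import random


def interleave(v):
    """Stack a complex vector as [Re v1, Im v1, Re v2, Im v2, ...] as in (7)."""
    out = []
    for z in v:
        out += [z.real, z.imag]
    return out


def real_channel_interleaved(H):
    """Real-valued channel of (7): each complex entry h becomes the
    2x2 block [[Re h, -Im h], [Im h, Re h]]."""
    rows = []
    for row in H:
        top, bottom = [], []
        for h in row:
            top += [h.real, -h.imag]
            bottom += [h.imag, h.real]
        rows += [top, bottom]
    return rows


def matvec(A, x):
    """Plain matrix-vector product for lists of lists."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


random.seed(0)
Nt = Nr = 2
H = [[complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(Nt)]
     for _ in range(Nr)]
x = [complex(1, -1), complex(-3, 1)]                  # two 16-QAM symbols
y = [sum(H[j][i] * x[i] for i in range(Nt)) for j in range(Nr)]  # no noise

lhs = interleave(y)
rhs = matvec(real_channel_interleaved(H), interleave(x))
max_err = max(abs(a - b) for a, b in zip(lhs, rhs))
print(max_err)
```

The only difference from (6) is the ordering: (7) keeps the real and imaginary parts of each symbol adjacent, which is what produces the regular zero pattern after triangularization.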

III. PROPOSED ALGORITHM FOR PSD

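The preprocessing for the proposed PSD is divided into two steps:

1) Complex-to-real Conversion: PSD converts the received complex-valued symbol vector and channel matrix to the real-valued $y_r$ and $H_r$ by using (7).

2) QR Decomposition: Apply QR decomposition to $H_r$ such that $Q_r R_p = H_r$, where $Q_r$ is a real-valued orthonormal matrix and $R_p$ is a real-valued upper triangular matrix. Let $H_r = [h_1, h_2, \dots, h_{2N_t}]$. The QR decomposition can be expressed as

$$ u_1 = h_1, \quad e_1 = \frac{u_1}{\|u_1\|}, \qquad u_k = h_k - \sum_{l=1}^{k-1} \langle e_l, h_k \rangle e_l, \quad e_k = \frac{u_k}{\|u_k\|}, \quad \text{for } k = 2, \dots, 2N_t, \tag{9} $$

where $\langle \cdot, \cdot \rangle$ denotes the inner product. As a result, the real-valued upper triangular matrix is

$$ R_p = \begin{bmatrix} \langle e_1, h_1 \rangle & 0 & \langle e_1, h_3 \rangle & \langle e_1, h_4 \rangle & \cdots \\ 0 & \langle e_2, h_2 \rangle & \langle e_2, h_3 \rangle & \langle e_2, h_4 \rangle & \cdots \\ 0 & 0 & \langle e_3, h_3 \rangle & 0 & \cdots \\ 0 & 0 & 0 & \langle e_4, h_4 \rangle & \cdots \\ \vdots & \vdots & \vdots & \vdots & \ddots \end{bmatrix}, \tag{10} $$

where zeros regularly appear above the even diagonal elements (proved in [9]). This property is the same as the preprocessing result of [8], although QR decomposition is not adopted in [8].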
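The zero pattern in (10) can be checked numerically with a short sketch of the Gram-Schmidt procedure (9). The function names `interleaved_real_matrix` and `gram_schmidt_R` are hypothetical, and classical Gram-Schmidt is used here only because it mirrors (9) directly; a practical implementation might prefer a numerically more robust QR routine.

```python
import math
import random


def interleaved_real_matrix(H):
    """Real-valued H_r of (7): each complex h becomes
    the 2x2 block [[Re h, -Im h], [Im h, Re h]]."""
    Hr = []
    for row in H:
        Hr.append([v for h in row for v in (h.real, -h.imag)])
        Hr.append([v for h in row for v in (h.imag, h.real)])
    return Hr


def gram_schmidt_R(Hr):
    """Classical Gram-Schmidt as in (9); returns R with R[l][k] = <e_l, h_k>."""
    n_rows, n_cols = len(Hr), len(Hr[0])
    cols = [[Hr[i][k] for i in range(n_rows)] for k in range(n_cols)]

    def dot(a, b):
        return sum(p * q for p, q in zip(a, b))

    E = []                                   # orthonormal vectors e_1, e_2, ...
    R = [[0.0] * n_cols for _ in range(n_cols)]
    for k, h in enumerate(cols):
        u = list(h)
        for l, e in enumerate(E):
            R[l][k] = dot(e, h)              # projection coefficient <e_l, h_k>
            u = [ui - R[l][k] * ei for ui, ei in zip(u, e)]
        norm = math.sqrt(dot(u, u))
        R[k][k] = norm                       # equals <e_k, h_k>
        E.append([ui / norm for ui in u])
    return R


random.seed(0)
Nt = Nr = 2
H = [[complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(Nt)]
     for _ in range(Nr)]
Rp = gram_schmidt_R(interleaved_real_matrix(H))

# Entries above the even diagonal elements: (2k-1, 2k) in 1-indexed form,
# which is (2k, 2k+1) for k = 0, ..., Nt-1 with 0-indexed lists.
paired = [abs(Rp[2 * k][2 * k + 1]) for k in range(Nt)]
print(max(paired))
```

The paired entries vanish up to rounding because the two columns generated by one complex column are orthogonal to each other and remain so after projecting out the earlier column pairs.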

This preprocessing is used by all versions of the proposed PSD. Three versions of PSD are proposed, which differ in their tree search rules.

A. The first PSD

If (10) is used for SD, we observe that the $T_b$ in the $2k$-th layer is independent of the $T_b$ in the $(2k-1)$-th layer for $k = 1, 2, \dots, N_t$. In the decision tree, we call layer $2k$ and layer $2k-1$ layer pair $k$. Layer $2k$, layer $2k-1$, and layer pair $k$ are denoted as $L_k^e$, $L_k^o$, and $L_k$, respectively. Since the $T_b$ for $L_k^e$ and the $T_b$ for $L_k^o$ are independent, we can compute the two $T_b$s simultaneously. This arrangement enables parallel processing in SD. With parallel processing, the computational complexity is maintained at about the same level, but the computational time is reduced to approximately half of that of the original SD. Simulations of the increase in computational complexity and the reduction in time are given later.

The algorithm for the tree search of the first proposed PSD is given below:

1) The PSD searches from the root to the leaf nodes. It visits two nodes simultaneously; one node is in $L_k^o$ and the other is in $L_k^e$. The two nodes are defined as a node pair. The PSD moves from node pair to node pair. If the visited node pair includes a leaf node, the PSD compares the new $T_c$ with the radius and replaces the radius with the smaller one. If the PSD visits a node pair whose $T_c$ exceeds the radius, the PSD does not visit the node pairs extended from this node pair; it either visits the node pair with the next priority or goes back to the previous layer pair to visit the node pair with the next priority there. If the PSD visits an unvisited layer pair $L_k$, the visiting priority of the node pairs in $L_k$ is determined as detailed in the following steps. The PSD algorithm terminates when the minimum $T_c$ in the decision tree has been found.

2) When the PSD moves to an unvisited layer pair $L_k$, it computes the $T_b$s for $L_k^e$ and for $L_k^o$ simultaneously.

3) Sort the nodes in $L_k^e$ and in $L_k^o$ by their $T_b$s in ascending order, respectively; denote the sorted nodes in $L_k^e$ as $n_1, \dots, n_M$ and the sorted nodes in $L_k^o$ as $m_1, \dots, m_M$.

4) The visiting priority of the node pairs in $L_k$ is given in Fig. 2(a), i.e., $(n_1, m_1), (n_1, m_2), \dots, (n_1, m_M), (n_2, m_1), \dots, (n_2, m_M), \dots, (n_M, m_M)$. If the $T_c$ of a node pair $(n_p, m_k)$ exceeds the sphere radius, the node pairs $\{(n_i, m_j) : i \ge p,\ j \ge k\}$ are removed from the visiting list.
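Steps 3 and 4 can be sketched for a single layer pair as follows. This is a simplified illustration that assumes the radius stays fixed while the layer pair is scanned (in the full algorithm the radius shrinks whenever a better leaf node is found), and that the $T_c$ of a node pair is the parent's $T_c$ plus the two independent $T_b$s. The function name `first_psd_order` and the numeric values are hypothetical.

```python
def first_psd_order(tb_n, tb_m, tc_parent, radius):
    """Return node pairs (i, j), 1-indexed, in the row-major order of
    Fig. 2(a), skipping pairs removed by the pruning rule of step 4.

    tb_n, tb_m: branch metrics of the sorted nodes n_1..n_M and m_1..m_M.
    """
    assert tb_n == sorted(tb_n) and tb_m == sorted(tb_m)
    visited = []
    j_limit = len(tb_m)          # pairs with j > j_limit are already removed
    for i, tn in enumerate(tb_n, start=1):
        for j in range(1, j_limit + 1):
            if tc_parent + tn + tb_m[j - 1] > radius:
                # (n_i, m_j) exceeds the radius: every (n_i', m_j') with
                # i' >= i and j' >= j is removed from the visiting list.
                j_limit = j - 1
                break
            visited.append((i, j))
        if j_limit == 0:
            break                # nothing left to visit in this layer pair
    return visited


visited = first_psd_order(
    tb_n=[0.1, 0.5, 2.0], tb_m=[0.2, 0.4, 3.0], tc_parent=0.0, radius=1.0
)
print(visited)   # [(1, 1), (1, 2), (2, 1), (2, 2)]
```

Because both lists are sorted in ascending order, a pair that exceeds the radius dominates every pair with larger indices, which is why a single shrinking column limit is enough to represent the removed region in this sketch.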

Fig. 2. Visiting priorities. (a) The visiting priority of $(n_i, m_j)$ in the first PSD. (b) The visiting priority of $(n_i, m_j)$ in the second PSD. (c) The priority of additional node pairs $(n_u, m_v)$ for comparison, starting from $(n_t, m_t)$.

B. The second PSD

The SE algorithm implies that giving higher visiting priority to the node with the smaller $T_b$ can reduce the searching time of SD. The first PSD adopts this algorithm. The second PSD extends this algorithm to the layer pair; it gives higher visiting priority to the node pair with the smaller summation of the corresponding $T_b$s. The summation of the two $T_b$s need not be computed; instead, the statistical order of the summations is used.

A simple example of the second PSD is shown in Fig. 3, where the nodes in $L_k^e$ and in $L_k^o$ are sorted and labeled for $k = N_t$ and $M = 4$. The node pairs with the first two visiting priorities are $(n_1, m_1)$ and $(n_1, m_2)$. In contrast to the first PSD, which gives the next visiting priority to $(n_1, m_3)$, the second PSD gives the next visiting priority to $(n_2, m_1)$, whose summation of $T_b$s is statistically smaller.

The algorithm for the tree search of the second PSD is given below:

1) The first three steps are kept the same as in the first PSD.

2) The visiting priority of the node pairs in $L_k$ is given in Fig. 2(b), i.e., $(n_1, m_1), (n_1, m_2), (n_2, m_1), \dots$. If the $T_c$ of a node pair $(n_p, m_k)$ exceeds the sphere radius, the node pairs $\{(n_i, m_j) : i \ge p,\ j \ge k\}$ are removed from the visiting list.

Fig. 3. The node pairs with different visiting priorities in PSD and PSD2.
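The difference between the two visiting priorities can be sketched as below. The text only lists the first three pairs of Fig. 2(b), so the rule used here for the second PSD (increasing $i + j$, ties broken by the smaller $i$) is an assumption that reproduces those three pairs and may differ from the actual figure beyond them. Both function names are hypothetical.

```python
def first_psd_priority(M):
    """Row-major order of Fig. 2(a): (n1,m1), (n1,m2), ..., (nM,mM)."""
    return [(i, j) for i in range(1, M + 1) for j in range(1, M + 1)]


def second_psd_priority(M):
    """Assumed order for Fig. 2(b): increasing i + j, ties broken by smaller i.

    Only the first three pairs are confirmed by the text; the rest of the
    ordering is an illustrative guess.
    """
    pairs = first_psd_priority(M)
    return sorted(pairs, key=lambda p: (p[0] + p[1], p[0]))


order1 = first_psd_priority(4)
order2 = second_psd_priority(4)
print(order1[:3])   # [(1, 1), (1, 2), (1, 3)]
print(order2[:3])   # [(1, 1), (1, 2), (2, 1)]
```

No $T_b$ sums are computed in either ordering; only the sorted indices are used, in line with the remark that the statistical order of the summations is used instead of the summations themselves.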

C. The third PSD

In the decision tree, there are two searching directions for PSD: vertical search and horizontal search. When the PSD moves from an upper layer pair to an unvisited layer pair, it is called vertical search. In vertical search the PSD has to compute the $T_b$s for all nodes in the unvisited layer pair, so the computational complexity of vertical search is high. When the PSD visits a node pair whose $T_c$ exceeds the sphere radius, the PSD visits the node pair in the same layer pair with the next visiting priority; this is called horizontal search. The PSD only has to compare the $T_c$ of the visited node pair with the sphere radius to decide the next searching direction. The computational complexity of horizontal search is low since the $T_b$s of the visited node pair are already computed; there are only additions and comparisons in horizontal search.

Thus, the third PSD adds extra processing to the horizontal search. The design aims to reduce the termination time of PSD. The original processing in the horizontal search is to visit a node pair according to the visiting priority and compare the $T_c$ of this node pair with the sphere radius; in the additional processing, we compare the $T_c$ of one more node pair with the sphere radius. If this $T_c$ exceeds the sphere radius, the PSD will not visit the node pairs extended from this node pair; if this $T_c$ is less than the sphere radius, the PSD keeps this node pair and will visit it according to the visiting priority described in the second PSD. The algorithm for the tree search of the third PSD is given below:

1) The first two steps are kept the same as in the first PSD.

2) The unvisited layer pair is denoted as $L_k$, and the previous layer pair is denoted as $L_{k+1}$. Compare the summation of the $T_c$ of $L_{k+1}$ and the $T_b$ of $L_k^o$ with the sphere radius. Denote the node whose corresponding summation exceeds the sphere radius as $n_p$. Also compare the $T_c$ of $L_k^e$ with the sphere radius. Denote the node whose $T_c$ exceeds the sphere radius as $m_k$. The node pairs $\{(n_i, m_j) : i \ge p\}$ and $\{(n_i, m_j) : j \ge k\}$ and their corresponding extended node pairs are removed from the visiting list.

3) This step is the same as the third step in the first PSD, except that the nodes removed from the visiting list need not be sorted.

4) This step is the same as the second step in the second PSD.

5) When the third PSD moves horizontally, it compares the $T_c$ of one additional node pair with the sphere radius. Denote the sorted node pair to be compared additionally as $(n_u, m_v)$. The comparing priority of $(n_u, m_v)$ is determined by a two-phase rule:

a) Denote $(n_a, m_a)$ as the node pair in the visiting list such that the $T_b$ of $n_a$ is the largest among the $T_b$s of the $n_i$ and the $T_b$ of $m_a$ is the largest among the $T_b$s of the $m_j$. Further, denote $(n_b, m_b)$ as the node pair in the visiting list such that the $T_b$ of $n_b$ is the smallest among the $T_b$s of the $n_i$ and the $T_b$ of $m_b$ is the smallest among the $T_b$s of the $m_j$. The values $(u, v)$ of $(n_u, m_v)$ in this phase are determined by

$$ u = v = \left\lceil \frac{a + b}{2} \right\rceil, \tag{11} $$

where $\lceil \cdot \rceil$ is the ceiling function. If the $T_c$ of $(n_u, m_v)$ exceeds the sphere radius, remove the node pairs $\{(n_i, m_j) : i \ge u,\ j \ge v\}$ from the visiting list and replace $(n_a, m_a)$ with $(n_{u-1}, m_{v-1})$ in the next additional comparison; if the $T_c$ of $(n_u, m_v)$ does not exceed the sphere radius, replace $(n_b, m_b)$ with $(n_u, m_v)$ in the next additional comparison. Repeat this process until $a = b$; this phase then ends and the next phase begins.

b) Let $a = b = t$. The comparison priority of the next node pair $(n_u, m_v)$ in this phase is determined by the rule shown in Fig. 2(c). If the $T_c$ of the node pair $(n_p, m_k)$ exceeds the sphere radius, remove the node pairs $\{(n_i, m_j) : i \ge p,\ j \ge k\}$ from the visiting list.
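Phase a) behaves like a bisection along the diagonal pairs $(n_t, m_t)$. The sketch below runs all comparisons of this phase in one loop for a single layer pair; in the algorithm itself one such comparison is made per horizontal move, and the radius may change in between. It also assumes that $(n_1, m_1)$ lies within the radius and that the $T_c$ of a node pair is the parent's $T_c$ plus the two $T_b$s. The name `diagonal_bisection` and the numeric values are hypothetical.

```python
import math


def diagonal_bisection(tb_n, tb_m, tc_parent, radius):
    """Return t such that phase a) ends with a = b = t.

    tb_n, tb_m: ascending branch metrics of n_1..n_M and m_1..m_M.
    Indices a, b, u are 1-indexed as in (11).
    """
    a = min(len(tb_n), len(tb_m))    # pair with the largest Tb values
    b = 1                            # pair with the smallest Tb values
    while a != b:
        u = math.ceil((a + b) / 2)   # rule (11), with v = u
        if tc_parent + tb_n[u - 1] + tb_m[u - 1] > radius:
            a = u - 1                # (n_u, m_u) and beyond are removed
        else:
            b = u                    # (n_u, m_u) stays in the visiting list
    return a


t = diagonal_bisection(
    tb_n=[0.1, 0.5, 2.0], tb_m=[0.2, 0.4, 3.0], tc_parent=0.0, radius=1.0
)
print(t)   # 2
```

The loop always terminates: when $a > b$, the ceiling in (11) gives $b < u \le a$, so either update strictly shrinks the interval between $b$ and $a$.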

IV. SIMULATION RESULT

To compare the termination time of SD, the average number of visited node pairs is adopted as the measure. A visited node pair consists of the nodes whose $T_b$s can be computed simultaneously. In the conventional SD, only the $T_b$ of one node can be computed at a time. Thus, a node pair represents merely one node in the conventional SD. In our simulations, memory is used to record the computed $T_b$s and the computed priorities to avoid repeated computation. The conventional SD is converted to a real-valued system by (6), to which the parallel algorithm cannot be applied.

The average numbers of visited node pairs at different SNRs for the proposed PSD and the conventional SD are shown in Fig. 4 and Fig. 5 for 4×4 16-QAM MIMO and 3×3 64-QAM MIMO, respectively. The simulation results show that the termination time is greatly reduced in the proposed PSD as compared to the conventional SD.

To measure the computational complexity by the average number of real-valued multiplications, we assume that multiplication dominates the computational complexity and ignore addition and sorting. The simulations of computational complexity are shown in Fig. 6 for 4×4 16-QAM MIMO and in Fig. 7 for 3×3 64-QAM MIMO.

These results imply that the computational complexity of the proposed PSD is slightly higher than that of the conventional SD; however, the difference is not obvious. It is noticeable that the simulated computational complexity of the second PSD is the same as that of the third PSD. In fact, the third PSD has more additions and comparisons than the second PSD, but both additions and comparisons are ignored in the complexity measure.

Fig. 4. The average visited node pairs in the 4×4, 16-QAM system. (Plot of average visited node pairs versus SNR in dB for SE-SD, PSD, PSD2, and PSD3.)

Fig. 5. The average visited node pairs in the 3×3, 64-QAM system. (Plot of average visited node pairs versus SNR in dB for SE-SD, PSD, PSD2, and PSD3.)

V. CONCLUSION

In this paper, we proposed three PSDs, which have about the same computational complexity as the conventional SD but only about half of its termination time. The simulation results show that the reduction in termination time is significant, particularly at low SNR. Under the same computational complexity, the proposed PSD uses two computing units simultaneously to reduce the termination time of SD. This property is not achievable in the conventional SD, since the conventional SD is not suitable for parallel processing. In the second and third proposed PSDs, the priority of visited node pairs may be further optimized, which is a possible future direction.

Fig. 6. The computational complexity in the 4×4, 16-QAM system. (Plot of average number of multiplications versus SNR in dB for SE-SD, PSD, PSD2, and PSD3.)

Fig. 7. The computational complexity in the 3×3, 64-QAM system. (Plot of average number of multiplications versus SNR in dB for SE-SD, PSD, PSD2, and PSD3.)

REFERENCES

[1] G. Foschini, "Layered space-time architecture for wireless communication in a fading environment when using multi-element antennas," Bell Labs Technical Journal, vol. 1, no. 2, pp. 41-59, 1996.

[2] E. G. Larsson, "MIMO detection methods: How they work [lecture notes]," IEEE Signal Process. Mag., vol. 26, no. 3, pp. 91-95, May 2009.

[3] E. Agrell, T. Eriksson, A. Vardy, and K. Zeger, "Closest point search in lattices," IEEE Trans. Information Theory, vol. 48, no. 8, pp. 2201-2214, Aug. 2002.

[4] A. Burg, M. Borgmann, M. Wenk, M. Zellweger, W. Fichtner, and H. Bolcskei, "VLSI implementation of MIMO detection using the sphere decoding algorithm," IEEE J. Solid-State Circuits, vol. 40, no. 7, pp. 1566-1577, July 2005.

[5] Z. Guo and P. Nilsson, "Algorithm and implementation of the K-best sphere decoding for MIMO detection," IEEE J. Sel. Areas Commun., vol. 24, no. 3, pp. 491-503, Mar. 2006.

[6] C.-H. Liao, T.-P. Wang, and T.-D. Chiueh, "A 74.8 mW soft-output detector IC for 8×8 spatial-multiplexing MIMO communications," IEEE J. Solid-State Circuits, vol. 45, no. 2, pp. 411-421, Feb. 2010.

[7] G. Zhan and P. Nilsson, "Reduced complexity Schnorr-Euchner decoding algorithms for MIMO systems," IEEE Commun. Lett., vol. 8, no. 5, pp. 286-288, May 2004.

[8] M. Siti and M. P. Fitz, "A novel soft-output layered orthogonal lattice detector for multiple antenna communications," in Proc. IEEE Int. Conf. Commun. 2006 (ICC '06), vol. 4, 2006, pp. 1686-1691.

[9] L. Azzam and E. Ayanoglu, "Reduced complexity sphere decoding via a reordered lattice representation," IEEE Trans. Commun., vol. 57, no. 9, pp. 2564-2569, Sept. 2009.
