A Parallel Processing Algorithm for
Schnorr-Euchner Sphere Decoder
Han-Wen Liang1,2, Wei-Ho Chung2,*, Hongke Zhang3, Sy-Yen Kuo1,3
Department of Electrical Engineering, National Taiwan University, Taipei, Taiwan1
Research Center for Information Technology Innovation, Academia Sinica, Taipei, Taiwan2
College of Electronics and Information Engineering, Beijing Jiaotong University, Beijing, China3
E-mail:*[email protected]
Abstract: This paper presents a category of detection schemes for Multiple-Input Multiple-Output (MIMO) systems called the Parallel Sphere Decoder (PSD). Compared to the conventional depth-first sphere decoder with Schnorr-Euchner enumeration (SE-SD), the proposed PSD algorithms use parallel computations and reduce the searching time by approximately 50% with the same amount of computation. In hardware implementation, the proposed work therefore provides a trade-off between computational time and the number of computing units. Simulations of the proposed algorithms in 4×4 16-QAM and 3×3 64-QAM MIMO systems show the searching time reductions of the proposed algorithms while maintaining ML performance.
I. INTRODUCTION
The Multiple-Input Multiple-Output (MIMO) system has
attracted tremendous research interest since it was proposed
[1]. The MIMO system exploits spatial diversity to obtain
increased channel capacity without demanding more spectrum.
The receiver of a MIMO system observes a linear superposition
of the symbols transmitted from each transmitting antenna.
How to detect the coupled symbols has been widely investigated
[2]. The optimal detection of the transmitted symbols can
be viewed as a discrete closest point search problem [3].
The complexity of exhaustive optimal detection grows
exponentially with the number of transmitting antennas. The
optimal solution can be obtained with reduced complexity by
the sphere decoding algorithm.
The sphere decoding algorithm applies a preprocessing step,
QR decomposition, to convert MIMO detection into a tree
search problem. According to the search rules in the decision
tree, sphere decoding algorithms can be categorized into
three types.
Depth-First: The typical depth-first algorithm first searches the nodes in the tree vertically (from the top layer to the bottom layer) for the best solution. A top-to-bottom path is searched in one search cycle; the algorithm then moves horizontally and searches the other top-to-bottom paths until the best path is obtained [4]. The typical depth-first algorithm achieves optimal detection. However, it suffers from a long and uncertain termination time.
Breadth-First: In each layer of the decision tree, the best K nodes are kept as the candidates for the best path. The breadth-first algorithm searches for the best path among the kept nodes. If K is as large as the number of nodes in a layer, this algorithm achieves optimal detection. However, K is usually chosen to be small to reduce complexity [5]. This algorithm is preferable for hardware implementation since it has fixed complexity and memory usage.
Best-First: The tree search rule of the best-first algorithm is to search the neighboring nodes around the current path [6]. This algorithm therefore terminates earlier than the depth-first algorithm with high probability while keeping optimal performance. However, its essential disadvantage is the need for large memory.
Acknowledgment: This research was supported by National Science Council, Taiwan under Grant NSC 99-2221-E-002-106-MY3 and NSC 100-2221-E-001-004, National Natural Science Foundation of China (NSFC) under Grant 60833002, and 111 Project under Grant B08002.
In this paper, we propose three algorithms that attain optimal
detection with reduced searching time compared to the
depth-first algorithm. The proposed algorithms are based on an
alternative complex-to-real conversion, which allows parallel
computations in the tree search for sphere decoding. The main
contribution of this paper is to combine this complex-to-real
conversion with the Schnorr-Euchner (SE) algorithm. This
paper applies the proposed technique to the depth-first sphere
decoder, which is called sphere decoder (SD) for simplicity
in the succeeding paragraphs, to demonstrate the reduction of
time. However, the proposed technique can be applied to other
tree search rules as well, such as the breadth-first or hybrid tree
search rules [5]. In general, other tree search rules sacrifice
performance to reduce complexity. We apply the proposed
technique to SD to show that optimality is not sacrificed
when the proposed technique is applied.
The rest of this paper is organized as follows. Sec. II
models the MIMO system, defines the notation, and describes
the related work. The preprocessing of the proposed
parallel sphere decoders (PSD) is introduced in Sec. III.
Subsections III-A, III-B and III-C give the details of
the three proposed algorithms, respectively. Sec. IV shows
the simulation results of the proposed work compared to the
conventional SD. Finally, the conclusion of this paper is given
in Sec. V.
2012 IEEE Wireless Communications and Networking Conference: PHY and Fundamentals, 978-1-4673-0437-5/12/$31.00 ©2012 IEEE
II. SYSTEM MODEL AND PREVIOUS WORK
We consider $N_t$ transmitting antennas and $N_r$ receiving
antennas in the MIMO system
$$y = Hx + n, \tag{1}$$
where $x = [x_1\ x_2\ \ldots\ x_{N_t}]^T$ is the transmitted symbol vector.
$H$ is an $N_r \times N_t$ channel matrix with elements $h_{ji}$, which are assumed to be independent and identically distributed (i.i.d.)
zero-mean complex Gaussian random variables with unit variance.
$n = [n_1\ n_2\ \ldots\ n_{N_r}]^T$ is the additive white
Gaussian noise (AWGN) with $n \in \mathbb{C}^{N_r}$. The signal-to-noise ratio (SNR) is
$\rho = E_s N_t / N_0$, where $E_s$ denotes the energy
per transmitted symbol and $N_0$ denotes the noise power.
$y = [y_1\ y_2\ \ldots\ y_{N_r}]^T$ is the received symbol vector. Each
transmitted symbol is assumed to be an $M^2$-ary symbol.
The optimal detection metric for the received symbol vector $y$ is
$$\hat{x} = \arg\min_{x \in \Omega^{N_t}} \|y - Hx\|^2, \tag{2}$$
where $\Omega$ denotes the set of $M^2$-ary constellation points and
$|\Omega^{N_t}| = M^{2N_t}$. $\hat{x}$ represents the detected symbol vector.
The complexity of the search in (2) grows exponentially
with the number of transmitting antennas. To achieve optimal
detection with reduced complexity, the SD is adopted. The SD
includes two steps:
1) Preprocessing: Apply QR decomposition to $H$, i.e., $H = QR$, where $Q$ is an orthonormal matrix ($Q^H Q = I$), $(\cdot)^H$ denotes the conjugate transpose, and $R$ is an upper triangular matrix.
2) Tree Search: The problem (2) is transformed to
$$\hat{x} = \arg\min_{x \in \Omega^{N_t}} \|\tilde{y} - Rx\|^2, \tag{3}$$
where $\tilde{y} = Q^H y$ and $\Omega^{N_t}$ is the same candidate set as in (2).
The upper triangular $R$ converts the exhaustive search in (2) into a tree search as in Fig. 1. In the decision tree, each node in layer $k$ represents a candidate value of $x_k$. The branch metric in the decision tree is defined as
$$T_b(x_{N_t}, x_{N_t-1}, \ldots, x_k) \triangleq \left| \tilde{y}_k - \sum_{l=k}^{N_t} r_{k,l}\, x_l \right|^2, \tag{4}$$
where $\tilde{y}_k$ is the $k$-th element of $\tilde{y}$ and $r_{k,l}$ is the $(k,l)$-th element of $R$. The SD algorithm searches from the root of the decision tree to the leaf nodes. Before searching layer $k$, the SD algorithm computes $T_b(x_{N_t}, x_{N_t-1}, \ldots, x_k)$ for all nodes in layer $k$ and sorts these nodes by their $T_b$. This order gives the searching priority of the nodes in layer $k$. The cumulative metric in the decision tree is defined as
$$T_c(x_{N_t}, x_{N_t-1}, \ldots, x_k) \triangleq T_b(x_{N_t}) + T_b(x_{N_t}, x_{N_t-1}) + \cdots + T_b(x_{N_t}, x_{N_t-1}, \ldots, x_k). \tag{5}$$
Fig. 1. Decision tree while SD is used to achieve ML solution. (Diagram labels: root, layer $N_t$, layer 1, leaf nodes.)
The SD algorithm first makes an initial decision of $x_{N_t}$ in layer $N_t$, then moves node by node until it arrives at a leaf node. The $T_c$ of this leaf node is kept as the initial radius. If the SD algorithm visits another leaf node whose $T_c$ is smaller than the radius, the radius is replaced by the new $T_c$. When the SD algorithm visits a node whose $T_c$ exceeds the radius, it does not visit the nodes extended from this node. More details of SD can be found in [3].
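The search described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the implementation used in the paper: it assumes a real-valued upper triangular `R` given as a list of rows, and the names `sphere_decode` and `exhaustive` are hypothetical. The brute-force routine is included only to check the result on a tiny example.

```python
import itertools

def sphere_decode(R, y, symbols):
    """Depth-first tree search with candidates sorted by branch metric.

    R: real upper triangular matrix as a list of rows (assumed input format).
    y: rotated received vector. symbols: candidate values of each x_k.
    Returns (x_hat, metric), where metric is the final radius.
    """
    n = len(y)
    best = {"x": None, "radius": float("inf")}
    x = [0.0] * n

    def search(k, tc_parent):
        # Interference from the layers that are already decided.
        interference = sum(R[k][l] * x[l] for l in range(k + 1, n))
        # Branch metric (4) of every candidate, sorted in ascending order.
        scored = sorted(
            ((y[k] - interference - R[k][k] * s) ** 2, s) for s in symbols
        )
        for tb, s in scored:
            tc = tc_parent + tb  # cumulative metric (5)
            if tc >= best["radius"]:
                break  # the remaining candidates have larger metrics
            x[k] = s
            if k == 0:
                best["x"], best["radius"] = list(x), tc  # leaf: shrink radius
            else:
                search(k - 1, tc)

    search(n - 1, 0.0)
    return best["x"], best["radius"]

def exhaustive(R, y, symbols):
    """Brute-force evaluation of (3) over all candidates, for checking only."""
    n = len(y)

    def metric(x):
        return sum(
            (y[k] - sum(R[k][l] * x[l] for l in range(k, n))) ** 2
            for k in range(n)
        )

    return list(min(itertools.product(symbols, repeat=n), key=metric))

# Small example: x = [1, -3, 3] observed with a small perturbation.
R = [[2.0, 0.5, -0.3], [0.0, 1.5, 0.7], [0.0, 0.0, 1.0]]
y = [-0.3, -2.5, 3.1]
x_hat, radius = sphere_decode(R, y, [-3, -1, 1, 3])
```

In this example the first descent already reaches the best leaf, so every later branch is pruned by the radius test.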
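The received complex-valued symbol vector is often converted to a real-valued one for the feasibility of hardware implementation of SD [7], [5]. One of the usual conversions, used in [7], [5] and many other works, is expressed as
$$\begin{bmatrix} \Re(y) \\ \Im(y) \end{bmatrix} = \begin{bmatrix} \Re(H) & -\Im(H) \\ \Im(H) & \Re(H) \end{bmatrix} \begin{bmatrix} \Re(x) \\ \Im(x) \end{bmatrix} + \begin{bmatrix} \Re(n) \\ \Im(n) \end{bmatrix}, \tag{6}$$
where $\Re(\cdot)$ and $\Im(\cdot)$ represent the real and imaginary parts of $(\cdot)$, respectively. However, another form of complex-to-real conversion is proposed in [8] as
$$\begin{bmatrix} \Re(y_1) \\ \Im(y_1) \\ \vdots \end{bmatrix} = \begin{bmatrix} \Re(h_{11}) & -\Im(h_{11}) & \cdots \\ \Im(h_{11}) & \Re(h_{11}) & \cdots \\ \vdots & \vdots & \ddots \end{bmatrix} \begin{bmatrix} \Re(x_1) \\ \Im(x_1) \\ \vdots \end{bmatrix} + \begin{bmatrix} \Re(n_1) \\ \Im(n_1) \\ \vdots \end{bmatrix}. \tag{7}$$
The conversion using (7) leads to an upper triangular matrix
with regularly placed zero elements, which will be detailed later. For
simplicity, the real-valued quantities after applying conversion
(7) are denoted as
$$y_r = H_r x_r + n_r. \tag{8}$$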
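As an illustration of the element-wise layout in (7), the following Python sketch builds $H_r$ and $y_r$ from complex inputs. The function names `to_real_interleaved` and `interleave` are hypothetical, and the noise term is omitted so that the identity $y_r = H_r x_r$ can be checked directly.

```python
def to_real_interleaved(H, y):
    """Return (Hr, yr) following the per-element layout of (7).

    H: complex channel matrix as a list of rows; y: complex received vector.
    """
    Hr = []
    for row in H:
        top, bottom = [], []
        for h in row:
            top += [h.real, -h.imag]    # row producing Re(y_j)
            bottom += [h.imag, h.real]  # row producing Im(y_j)
        Hr += [top, bottom]
    return Hr, interleave(y)

def interleave(v):
    """Stack real and imaginary parts element by element."""
    out = []
    for c in v:
        out += [c.real, c.imag]
    return out

# Noise-free example with two transmit and two receive antennas.
H = [[1 + 2j, 0.5 - 1j], [-1j, 2 + 0j]]
x = [1 - 1j, -3 + 3j]
y = [sum(h * s for h, s in zip(row, x)) for row in H]

Hr, yr = to_real_interleaved(H, y)
xr = interleave(x)
yr_check = [sum(a * b for a, b in zip(row, xr)) for row in Hr]
```

With this layout, columns $2k-1$ and $2k$ of $H_r$ are orthogonal for every $k$, which is the source of the regular zeros discussed in Sec. III.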
The authors of [8] adopted a preprocessing other than
QR decomposition to generate an upper triangular matrix. That
preprocessing decomposes $H_r$ into the product of an upper
triangular matrix and a non-unitary matrix. However, the
non-unitary matrix colors the noise, which complicates the
detection problem.
III. PROPOSED ALGORITHM FOR PSD
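The preprocessing for the proposed PSD is divided into two
steps:
1) Complex-to-real Conversion: PSD converts the received complex-valued symbol vector and channel matrix to the real-valued $y_r$ and $H_r$ by using (7).
2) QR Decomposition: Apply QR decomposition to $H_r$ such that $Q_r R_p = H_r$, where $Q_r$ is a real-valued orthonormal matrix and $R_p$ is a real-valued upper triangular matrix. Let $H_r = [h_1, h_2, \ldots, h_{2N_t}]$. The QR decomposition can be expressed as
$$u_1 = h_1, \quad e_1 = \frac{u_1}{\|u_1\|}, \qquad u_k = h_k - \sum_{l=1}^{k-1} \langle e_l, h_k \rangle\, e_l, \quad e_k = \frac{u_k}{\|u_k\|}, \quad \text{for } k = 2, \ldots, 2N_t, \tag{9}$$
where $\langle \cdot, \cdot \rangle$ denotes the inner product. As a result, the real-valued upper triangular matrix is
$$R_p = \begin{bmatrix} \langle e_1, h_1 \rangle & 0 & \langle e_1, h_3 \rangle & \langle e_1, h_4 \rangle & \cdots \\ 0 & \langle e_2, h_2 \rangle & \langle e_2, h_3 \rangle & \langle e_2, h_4 \rangle & \cdots \\ 0 & 0 & \langle e_3, h_3 \rangle & 0 & \cdots \\ 0 & 0 & 0 & \langle e_4, h_4 \rangle & \cdots \\ \vdots & \vdots & \vdots & \vdots & \ddots \end{bmatrix}, \tag{10}$$
where zeros regularly appear above the even diagonal elements (proved in [9]). This property is the same as the preprocessing result of [8], although QR decomposition is not adopted in [8].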
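The two preprocessing steps can be checked numerically with a short Python sketch. It is a minimal illustration under assumptions, not the authors' code: `real_layout` and `gram_schmidt_qr` are hypothetical names, the Gram-Schmidt recursion follows (9) literally, and the channel matrix is an arbitrary full-rank example.

```python
import math

def real_layout(H):
    """Real-valued channel matrix in the element-wise layout of (7)."""
    Hr = []
    for row in H:
        Hr.append([v for h in row for v in (h.real, -h.imag)])
        Hr.append([v for h in row for v in (h.imag, h.real)])
    return Hr

def gram_schmidt_qr(Hr):
    """Classical Gram-Schmidt as in (9); Hr must have full column rank.

    Returns (e, R): e[k] is the k-th orthonormal vector, R is upper triangular.
    """
    rows, cols = len(Hr), len(Hr[0])
    h = [[Hr[i][k] for i in range(rows)] for k in range(cols)]  # columns h_k
    e = []
    R = [[0.0] * cols for _ in range(cols)]
    for k in range(cols):
        u = list(h[k])
        for l in range(k):
            R[l][k] = sum(a * b for a, b in zip(e[l], h[k]))  # <e_l, h_k>
            u = [ui - R[l][k] * el for ui, el in zip(u, e[l])]
        norm = math.sqrt(sum(ui * ui for ui in u))
        R[k][k] = norm  # equals <e_k, h_k>
        e.append([ui / norm for ui in u])
    return e, R

# Arbitrary 2x2 complex channel, giving a 4x4 real-valued system.
H = [[1 + 2j, 0.5 - 1j], [-1j, 2 + 0.5j]]
Hr = real_layout(H)
e, Rp = gram_schmidt_qr(Hr)

# Entries directly above the even diagonal elements (1-based rows 1 and 3).
zeros_above_even = [Rp[0][1], Rp[2][3]]
```

Up to rounding error, both entries in `zeros_above_even` vanish, matching the pattern in (10).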
This preprocessing applies to all of the proposed PSDs. According
to different tree search rules, three versions of PSD are
proposed.
A. The first PSD
If (10) is used for SD, we observe that the $T_b$ in the $2k$-th layer is independent of the $T_b$ in the $(2k-1)$-th layer for $k = 1, 2, \ldots, N_t$: since the entry of $R_p$ in row $2k-1$ and column $2k$ is zero, the branch metric (4) of layer $2k-1$ does not contain $x_{2k}$. In the decision tree, we call layer $2k$ and layer $2k-1$ layer pair $k$. Layer $2k$, layer $2k-1$ and layer pair $k$ are denoted as $L_k^e$, $L_k^o$ and $L_k$, respectively. Since the $T_b$ for $L_k^e$ and for $L_k^o$ are independent, we can compute the two $T_b$s simultaneously. This arrangement enables parallel
processing in SD. With parallel processing, the computational
complexity is maintained at about the same level, but the
computational time is reduced to approximately half of that of the original
SD. The simulation of the increase in computational
complexity and the reduction in time will be given later.
The algorithm for the tree search of the first proposed PSD is
given below:
1) The PSD searches from the root to the leaf nodes. It visits two nodes simultaneously; one node is in $L_k^o$ and the other is in $L_k^e$. The two nodes are defined as a node pair. The PSD moves from node pair to node pair. If the visited node pair includes a leaf node, the PSD compares the new $T_c$ with the radius and replaces the radius with the smaller one. If the PSD visits a node pair whose $T_c$ exceeds the radius, the PSD does not visit the node pairs extended from this node pair; it visits the node pair with the next priority, or goes back to the previous layer pair to visit the node pair with the next priority there. If the PSD visits an unvisited layer pair $L_k$, the visiting priority of the node pairs in $L_k$ is detailed in the following steps. The PSD algorithm terminates when the minimum $T_c$ in the decision tree is obtained.
2) When the PSD moves to an unvisited layer pair $L_k$, it computes the $T_b$s for $L_k^e$ and for $L_k^o$ simultaneously.
3) Sort the nodes in $L_k^e$ and in $L_k^o$ by their $T_b$s in ascending order, respectively; denote the sorted nodes in $L_k^e$ as $n_1, \ldots, n_M$ and the sorted nodes in $L_k^o$ as $m_1, \ldots, m_M$.
4) The visiting priority of node pairs in $L_k$ is given in Fig. 2(a), i.e., $(n_1, m_1), (n_1, m_2), \ldots, (n_1, m_M), (n_2, m_1), \ldots, (n_2, m_M), \ldots, (n_M, m_M)$. If the $T_c$ of a node pair $(n_p, m_k)$ exceeds the sphere radius, the node pairs $\{(n_i, m_j) : i \ge p,\ j \ge k\}$ are removed from the visiting list.
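The visiting order and the pruning rule of step 4 can be written compactly. The following Python sketch uses 1-based index pairs $(i, j)$ to stand for $(n_i, m_j)$; the function names `first_psd_order` and `prune` are hypothetical and the sketch covers only the bookkeeping of the visiting list, not the full search.

```python
def first_psd_order(M):
    """Visiting order of the first PSD: (n1,m1), (n1,m2), ..., (nM,mM)."""
    return [(i, j) for i in range(1, M + 1) for j in range(1, M + 1)]

def prune(visiting, p, k):
    """Remove every pair (n_i, m_j) with i >= p and j >= k.

    Both node lists are sorted by ascending branch metric, so such pairs
    cannot have a smaller cumulative metric than (n_p, m_k).
    """
    return [(i, j) for (i, j) in visiting if not (i >= p and j >= k)]

order = first_psd_order(3)
remaining = prune(order, 2, 2)  # (n2, m2) exceeded the radius
```

After pruning at $(n_2, m_2)$ with $M = 3$, only the pairs in the first row and first column of the grid remain in the list.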
Fig. 2. Visiting priorities. (a) The visiting priority of $(n_i, m_j)$ in the first PSD. (b) The visiting priority of $(n_i, m_j)$ in the second PSD. (c) The priority of additive node pairs $(n_u, m_v)$ for comparison.
B. The second PSD
The SE algorithm implies that giving higher visiting priority
to the node with smaller $T_b$ can reduce the searching time of SD. The
first PSD adopts this rule within each layer. The second PSD extends it
to the layer pair: it gives higher visiting priority to
the node pair with the smaller sum of the corresponding $T_b$s. The
sum of the two $T_b$s does not need to be computed; instead, the statistical
order of the sums of $T_b$s is used.
A simple example of the second PSD is shown in Fig. 3, where the nodes in $L_k^e$ and in $L_k^o$ are sorted and labeled for $k = N_t$ and $M = 4$. The node pairs with the first two visiting priorities are $(n_1, m_1)$ and $(n_1, m_2)$; in contrast to the
first PSD, which gives the next visiting priority to $(n_1, m_3)$, the
second PSD gives the next visiting priority to $(n_2, m_1)$, whose
sum of $T_b$s is statistically smaller.
The algorithm for the tree search of the second PSD is given
below:
1) The first three steps are kept the same as in the first PSD.
2) The visiting priority of node pairs in $L_k$ is given in Fig. 2(b), i.e., $(n_1, m_1), (n_1, m_2), (n_2, m_1), \ldots$. If the $T_c$ of a node pair $(n_p, m_k)$ exceeds the sphere radius, the node pairs $\{(n_i, m_j) : i \ge p,\ j \ge k\}$ are removed from the visiting list.
Fig. 3. The node pairs with different visiting priorities in PSD and PSD2.
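One ordering that is consistent with the first three pairs listed in step 2 is a sweep over anti-diagonals of the $(i, j)$ grid, i.e., sorting by the index sum $i + j$. The Python sketch below shows this ordering as an assumption only: the text specifies the order through Fig. 2(b), so the order beyond $(n_1, m_1), (n_1, m_2), (n_2, m_1)$ and the function name `second_psd_order` are hypothetical.

```python
def second_psd_order(M):
    """Assumed priority: smaller index sum i + j first, ties broken by i.

    Pairs are 1-based index pairs (i, j) standing for (n_i, m_j). No branch
    metric is added here; only the sorted positions are used.
    """
    pairs = [(i, j) for i in range(1, M + 1) for j in range(1, M + 1)]
    return sorted(pairs, key=lambda pair: (pair[0] + pair[1], pair[0]))

order = second_psd_order(4)
first_three = order[:3]
```

The pruning rule is unchanged from the first PSD: when $(n_p, m_k)$ exceeds the radius, all pairs with $i \ge p$ and $j \ge k$ are dropped from this list.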
C. The third PSD
In the decision tree, there are two searching directions for
PSD: vertical search and horizontal search. When the
PSD moves from an upper layer pair to an unvisited layer
pair, it is called vertical search. In vertical search, the PSD has to compute
the $T_b$s for all nodes in the unvisited layer pair, so the
computational complexity of vertical search is high. When the PSD visits a node pair whose $T_c$ exceeds the sphere radius, the PSD visits the node pair in
the same layer pair with the next visiting priority; this is called
horizontal search. The PSD only has to compare the $T_c$ of the
visited node pair with the sphere radius to decide the next searching
direction. The computational complexity of horizontal search
is low since the $T_b$s of the visited node pair are already computed;
there are only additions and comparisons in horizontal search.
Thus, the third PSD is proposed to add processing to the horizontal search.
The design aims to reduce the termination time of PSD. The
original processing in the horizontal search is to visit a node
pair according to the visiting priority and compare the $T_c$ of this
node pair with the sphere radius; in the additive processing, we
compare the $T_c$ of one more node pair with the sphere radius. If this $T_c$ exceeds the sphere radius, the PSD does not visit the node pairs
extended from this node pair; if the $T_c$ is less than the sphere
radius, the PSD keeps this node pair and visits it
according to the visiting priority described in the second PSD. The
algorithm for the tree search of the third PSD is given below:
1) The first two steps are kept the same as in the first PSD.
2) The unvisited layer pair is denoted as $L_k$, and the previous layer pair is denoted as $L_{k+1}$. Compare the sum of the $T_c$ of $L_{k+1}$ and the $T_b$ of $L_k^o$ with the sphere radius. Denote the node whose corresponding sum exceeds the sphere radius as $n_p$. Also compare the $T_c$ of $L_k^e$ with the sphere radius. Denote the node whose $T_c$ exceeds the sphere radius as $m_k$. The node pairs $\{(n_i, m_j) : i \ge p\}$ and $\{(n_i, m_j) : j \ge k\}$ and their corresponding extended node pairs are removed from the visiting list.
3) This step is the same as the third step in the first PSD, except that the nodes outside the visiting list do not need to be sorted.
4) This step is the same as the second step in the second PSD.
5) When the third PSD moves horizontally, it compares the $T_c$ of an additive node pair with the sphere radius. Denote the sorted node pair that is to be compared additively as $(n_u, m_v)$. The comparing priority of $(n_u, m_v)$ is determined by the following two-phase rule:
a) Denote $(n_a, m_a)$ as the node pair in the visiting list such that the $T_b$ of $n_a$ is the largest among the $T_b$s of the $n_i$ and the $T_b$ of $m_a$ is the largest among the $T_b$s of the $m_j$. Further, denote $(n_b, m_b)$ as the node pair in the visiting list such that the $T_b$ of $n_b$ is the smallest among the $T_b$s of the $n_i$ and the $T_b$ of $m_b$ is the smallest among the $T_b$s of the $m_j$. The values $(u, v)$ of $(n_u, m_v)$ in this phase are determined by
$$u = v = \left\lceil \frac{a + b}{2} \right\rceil, \tag{11}$$
where $\lceil \cdot \rceil$ is the ceiling function. If the $T_c$ of $(n_u, m_v)$ exceeds the sphere radius, remove the node pairs $\{(n_i, m_j) : i \ge u,\ j \ge v\}$ from the visiting list and replace $(n_a, m_a)$ with $(n_{u-1}, m_{v-1})$ in the next additive comparison; if the $T_c$ of $(n_u, m_v)$ does not exceed the sphere radius, replace $(n_b, m_b)$ with $(n_u, m_v)$ in the next additive comparison. Repeat this process until $a = b$; then this phase ends and the next phase begins.
b) Let $a = b = t$. The comparison priority of the next node pair $(n_u, m_v)$ in this phase is determined by the rule shown in Fig. 2(c). If the $T_c$ of the node pair $(n_p, m_k)$ exceeds the sphere radius, remove the node pairs $\{(n_i, m_j) : i \ge p,\ j \ge k\}$ from the visiting list.
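Phase a) of step 5 is a bisection along the diagonal pairs $(n_u, m_u)$ driven by rule (11). The Python sketch below illustrates only this phase, under two assumptions that are not spelled out in the text: the $T_c$ of a node pair is taken as the $T_c$ of the previous layer pair plus the two $T_b$s, and both sorted lists have the same length $M$. The name `diagonal_bisection` is hypothetical.

```python
import math

def diagonal_bisection(tc_prev, tb_n, tb_m, radius):
    """Phase a): bisect along the diagonal pairs (n_u, m_u).

    tb_n, tb_m: branch metrics sorted in ascending order (index 0 is n_1, m_1).
    Returns (t, removed): t is the index reached when a == b, and removed is
    the set of 1-based pairs (i, j) dropped from the visiting list.
    """
    M = len(tb_n)
    a, b = M, 1
    removed = set()
    while a != b:
        u = math.ceil((a + b) / 2)  # rule (11)
        tc = tc_prev + tb_n[u - 1] + tb_m[u - 1]
        if tc > radius:
            # Every pair with i >= u and j >= v (= u) also exceeds the radius.
            removed |= {(i, j) for i in range(u, M + 1) for j in range(u, M + 1)}
            a = u - 1
        else:
            b = u
    return a, removed

t, removed = diagonal_bisection(0.0, [1, 2, 3, 4], [1, 2, 3, 4], 5.0)
```

In this example the diagonal sums are 2, 4, 6 and 8, so the first comparison at $u = 3$ removes the lower-right block, the second at $u = 2$ passes, and the phase ends with $t = 2$. Only additions and comparisons are involved, which matches the low cost attributed to horizontal search.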
IV. SIMULATION RESULT
To compare the termination time of the decoders, the average
number of visited node pairs is adopted as the measure. A visited node
pair consists of the nodes whose $T_b$s can be computed simultaneously.
In the conventional SD, we can only compute the $T_b$ of one node
at a time. Thus, a node pair represents merely a single node in the
conventional SD. In our simulations, memory is used
to record the computed $T_b$s and the computed priorities to avoid
repeated computation. The conventional SD is converted to a
real-valued system by (6), where the parallel algorithm cannot
be applied.
The average numbers of visited node pairs at different SNRs for the
proposed PSD and the conventional SD are shown in Fig. 4 and
Fig. 5 for 4×4 16-QAM MIMO and 3×3 64-QAM MIMO, respectively. The simulation results show that the termination
time is greatly reduced in the proposed PSD as compared to the
conventional SD.
To measure the computational complexity by the average
number of real-valued multiplications, we assume that multiplications
dominate the computational complexity and ignore additions
and sorting. The simulations of computational
complexity are shown in Fig. 6 for 4×4 16-QAM MIMO and in Fig. 7 for 3×3 64-QAM MIMO.
These results imply that the computational complexity of the
proposed PSD is slightly higher than that of the conventional SD; however,
the difference is not obvious. It is noticeable that the simulated
computational complexity of the second PSD
is the same as that of the third PSD. In fact, the third PSD has
more additions and comparisons than the second PSD, but
both additions and comparisons are ignored in the computational
complexity.
Fig. 4. The average visited node pairs in 4×4, 16-QAM system. (Axes: SNR (dB) versus average visited node pairs; curves: SESD, PSD, PSD2, PSD3.)
Fig. 5. The average visited node pairs in 3×3, 64-QAM system. (Axes: SNR (dB) versus average visited node pairs; curves: SESD, PSD, PSD2, PSD3.)
V. CONCLUSION
In this paper we proposed three PSDs, which have about the
same computational complexity as the conventional SD but only
about half of its termination time. The simulation
results show that the reduction in termination time is significant,
particularly at low SNR. Under the same computational
complexity, the proposed PSD uses two computing units
simultaneously to reduce the termination time of SD. This property
is not achievable in the conventional SD since the conventional
SD is not suitable for parallel processing. In the second and
third proposed PSDs, the priority of visited node pairs may be
further optimized, which is a possible future direction.
Fig. 6. The computational complexity in 4×4, 16-QAM system. (Axes: SNR (dB) versus average number of multiplications; curves: SESD, PSD, PSD2, PSD3.)
Fig. 7. The computational complexity in 3×3, 64-QAM system. (Axes: SNR (dB) versus average number of multiplications; curves: SESD, PSD, PSD2, PSD3.)
REFERENCES
[1] G. Foschini, "Layered space-time architecture for wireless communication in a fading environment when using multi-element antennas," Bell Labs Technical Journal, vol. 1, no. 2, pp. 41-59, 1996.
[2] E. G. Larsson, "MIMO detection methods: How they work [lecture notes]," IEEE Signal Process. Mag., vol. 26, no. 3, pp. 91-95, May 2009.
[3] E. Agrell, T. Eriksson, A. Vardy, and K. Zeger, "Closest point search in lattices," IEEE Trans. Information Theory, vol. 48, no. 8, pp. 2201-2214, Aug. 2002.
[4] A. Burg, M. Borgmann, M. Wenk, M. Zellweger, W. Fichtner, and H. Bolcskei, "VLSI implementation of MIMO detection using the sphere decoding algorithm," IEEE J. Solid-State Circuits, vol. 40, no. 7, pp. 1566-1577, July 2005.
[5] Z. Guo and P. Nilsson, "Algorithm and implementation of the K-best sphere decoding for MIMO detection," IEEE J. Sel. Areas Commun., vol. 24, no. 3, pp. 491-503, Mar. 2006.
[6] C.-H. Liao, T.-P. Wang, and T.-D. Chiueh, "A 74.8 mW soft-output detector IC for 8×8 spatial-multiplexing MIMO communications," IEEE J. Solid-State Circuits, vol. 45, no. 2, pp. 411-421, Feb. 2010.
[7] G. Zhan and P. Nilsson, "Reduced complexity Schnorr-Euchner decoding algorithms for MIMO systems," IEEE Commun. Lett., vol. 8, no. 5, pp. 286-288, May 2004.
[8] M. Siti and M. P. Fitz, "A novel soft-output layered orthogonal lattice detector for multiple antenna communications," in Proc. IEEE Int. Conf. Commun. 2006 (ICC06), vol. 4, 2006, pp. 1686-1691.
[9] L. Azzam and E. Ayanoglu, "Reduced complexity sphere decoding via a reordered lattice representation," IEEE Trans. Commun., vol. 57, no. 9, pp. 2564-2569, Sept. 2009.