
    A Parallel Processing Algorithm for Schnorr-Euchner Sphere Decoder

    Han-Wen Liang 1,2, Wei-Ho Chung 2,*, Hongke Zhang 3, Sy-Yen Kuo 1,3

    1 Department of Electrical Engineering, National Taiwan University, Taipei, Taiwan

    2 Research Center for Information Technology Innovation, Academia Sinica, Taipei, Taiwan

    3 College of Electronics and Information Engineering, Beijing Jiaotong University, Beijing, China

    E-mail:*[email protected]

    Abstract: This paper presents a class of detection schemes for Multiple-Input Multiple-Output (MIMO) systems called the Parallel Sphere Decoder (PSD). Compared to the conventional depth-first sphere decoder with Schnorr-Euchner enumeration (SE-SD), the proposed PSD algorithms use parallel computations and reduce the searching time by approximately 50% for the same amount of computation. In a hardware implementation, the proposed work therefore provides a trade-off between computation time and the number of computing units. Simulations of the proposed algorithms in 4 × 4 16-QAM and 3 × 3 64-QAM MIMO systems show the searching time reductions while maintaining maximum-likelihood (ML) performance.

    I. INTRODUCTION

    The Multiple-Input Multiple-Output (MIMO) system has attracted tremendous research interest since it was proposed [1]. Its appeal is the ability to exploit spatial diversity to increase channel capacity without demanding more spectrum. The receiver of a MIMO system observes a linear superposition of the symbols transmitted from each transmitting antenna. How to detect these coupled symbols has been widely investigated [2]. The optimal detection of the transmitted symbols can be viewed as a discrete closest point search problem [3]. The complexity of exhaustive optimal detection grows exponentially with the number of transmitting antennas. The sphere decoding algorithm reaches the same optimal solution with reduced complexity.

    The sphere decoding algorithm applies a preprocessing step, QR decomposition, to convert MIMO detection into a tree search problem. According to the search rule used in the decision tree, sphere decoding algorithms can be categorized into three types.

    Depth-First: The typical depth-first algorithm first searches the nodes in the tree vertically (from the top layer to the bottom layer) for the best solution. One top-to-bottom path is searched per search cycle; the algorithm then moves horizontally and searches the other top-to-bottom paths until the best path is obtained [4]. The typical depth-first algorithm yields optimal detection. However, it suffers from a long and unpredictable termination time.

    Breadth-First: In each layer of the decision tree, the best K nodes are kept as the candidates for the best path, and the algorithm searches for the best path among the kept nodes. When K is as large as the number of nodes in a layer, this algorithm achieves optimal detection. However, K is usually chosen to be small to reduce complexity [5]. This algorithm is preferable for hardware implementation since it has fixed complexity and memory usage.

    Best-First: The tree search rule of the best-first algorithm is to search the neighboring nodes around the current path [6]. This algorithm therefore terminates earlier than the depth-first algorithm with high probability while keeping the optimal performance. However, its essential disadvantage is the need for a large memory.

    Acknowledgment: This research was supported by National Science Council, Taiwan under Grant NSC 99-2221-E-002-106-MY3 and NSC 100-2221-E-001-004, National Natural Science Foundation of China (NSFC) under Grant 60833002, and 111 Project under Grant B08002.

    In this paper, we propose three algorithms that attain optimal detection with reduced searching time compared to the depth-first algorithm. The proposed algorithms are based on a complex-to-real conversion that allows parallel computations in the tree search for sphere decoding. The main contribution of this paper is to combine this complex-to-real conversion with the Schnorr-Euchner (SE) algorithm. This paper applies the proposed technique to the depth-first sphere decoder, called the 'sphere decoder' (SD) for simplicity in the succeeding paragraphs, to demonstrate the reduction in time. The proposed technique can also be applied to other tree search rules, such as the breadth-first or hybrid tree search rules [5]. In general, other tree search rules sacrifice performance to reduce complexity. We apply the proposed technique to the SD to show that optimality is not sacrificed when the technique is applied.

    The rest of this paper is organized as follows. Sec. II models the MIMO system, defines the notation, and describes the related work. The preprocessing of the proposed parallel sphere decoders (PSD) is introduced in Sec. III. Subsections III-A, III-B and III-C give the details of the three proposed algorithms, respectively. Sec. IV shows the simulation results of the proposed work compared to the conventional SD. Finally, the conclusion is given in Sec. V.

    2012 IEEE Wireless Communications and Networking Conference: PHY and Fundamentals. 978-1-4673-0437-5/12/$31.00 ©2012 IEEE

    II. SYSTEM MODEL AND PREVIOUS WORK

    We consider a MIMO system with $N_t$ transmitting antennas and $N_r$ receiving antennas,

    $$\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{n}, \tag{1}$$

    where $\mathbf{x} = [x_1\ x_2\ \dots\ x_{N_t}]^T$ is the transmitted symbol vector. $\mathbf{H}$ is an $N_r \times N_t$ channel matrix with elements $h_{ji}$, which are assumed to be independent and identically distributed (i.i.d.) zero-mean complex Gaussian random variables with unit variance. $\mathbf{n} = [n_1\ n_2\ \dots\ n_{N_r}]^T$ is the additive white Gaussian noise (AWGN), with $\mathbf{n} \in \mathbb{C}^{N_r}$. The signal-to-noise ratio (SNR) is determined by $E_s$, $N_t$ and $N_0$, where $E_s$ denotes the energy per transmitted symbol and $N_0$ denotes the noise power. $\mathbf{y} = [y_1\ y_2\ \dots\ y_{N_r}]^T$ is the received symbol vector. Each transmitted symbol is assumed to be an $M^2$-ary symbol.
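    The model in (1) can be sketched numerically as follows. This is a minimal illustration under stated assumptions, not the simulation setup of the paper: the function names are hypothetical, the $M^2$-QAM symbols use unnormalized odd-integer amplitudes, and the noise standard deviation is passed in directly as a hypothetical parameter `noise_std` rather than derived from the SNR.

```python
import random


def qam_levels(M):
    """Real amplitude levels of an M^2-QAM constellation, e.g. M=4 -> [-3, -1, 1, 3]."""
    return [2 * i - (M - 1) for i in range(M)]


def simulate_mimo(n_t, n_r, M, noise_std, rng):
    """Draw one realization of y = H x + n (assumed, unnormalized scaling)."""
    levels = qam_levels(M)
    s = 0.5 ** 0.5  # per-dimension std so each complex entry of H has unit variance
    H = [[complex(rng.gauss(0, s), rng.gauss(0, s)) for _ in range(n_t)]
         for _ in range(n_r)]
    # Each transmitted symbol is one of M^2 points: M levels per real dimension.
    x = [complex(rng.choice(levels), rng.choice(levels)) for _ in range(n_t)]
    n = [complex(rng.gauss(0, noise_std), rng.gauss(0, noise_std))
         for _ in range(n_r)]
    y = [sum(H[j][i] * x[i] for i in range(n_t)) + n[j] for j in range(n_r)]
    return H, x, y
```

    For example, `simulate_mimo(4, 4, 4, 0.1, random.Random(0))` draws a 4 × 4 16-QAM realization; passing a seeded `random.Random` keeps the draw reproducible.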

    The optimal detection metric for the received symbol vector $\mathbf{y}$ is

    $$\hat{\mathbf{x}} = \arg\min_{\mathbf{x} \in \Omega^{N_t}} \|\mathbf{y} - \mathbf{H}\mathbf{x}\|^2, \tag{2}$$

    where $\Omega$ denotes the set of $M^2$-ary constellation points, $|\Omega^{N_t}| = M^{2N_t}$, and $\hat{\mathbf{x}}$ represents the detected symbol vector.

    The complexity of searching by (2) grows exponentially with the number of transmitting antennas. To achieve optimal detection with reduced complexity, the SD is adopted. The SD includes two steps:

    1) Preprocessing: Apply QR decomposition to $\mathbf{H}$, i.e., $\mathbf{H} = \mathbf{Q}\mathbf{R}$, where $\mathbf{Q}$ is an orthonormal matrix ($\mathbf{Q}^H\mathbf{Q} = \mathbf{I}$), $(\cdot)^H$ denotes the conjugate transpose, and $\mathbf{R}$ is an upper triangular matrix.

    2) Tree Search: Problem (2) is transformed to

    $$\hat{\mathbf{x}} = \arg\min_{\mathbf{x} \in \Omega^{N_t}} \|\tilde{\mathbf{y}} - \mathbf{R}\mathbf{x}\|^2, \tag{3}$$

    where $\tilde{\mathbf{y}} = \mathbf{Q}^H\mathbf{y}$.

    The upper triangular $\mathbf{R}$ converts the exhaustive search in (2) into a tree search as in Fig. 1. In the decision tree, each node in layer $k$ represents a candidate value of $x_k$ taken from $\Omega$. The branch metric in the decision tree is defined as

    $$T_b(x_{N_t}, x_{N_t-1}, \dots, x_k) \triangleq \left| \tilde{y}_k - \sum_{l=k}^{N_t} r_{k,l}\, x_l \right|^2, \tag{4}$$

    where $\tilde{y}_k$ is the $k$-th element of $\tilde{\mathbf{y}}$ and $r_{i,j}$ is the $(i,j)$-th element of $\mathbf{R}$. The SD algorithm searches from the root of the decision tree to the leaf nodes. Before searching layer $k$, the SD algorithm computes $T_b(x_{N_t}, x_{N_t-1}, \dots, x_k)$ for all nodes in layer $k$ and sorts these nodes by their $T_b$. This order gives the searching priority of the nodes in layer $k$. The cumulative metric in the decision tree is defined as

    $$T_c(x_{N_t}, x_{N_t-1}, \dots, x_k) \triangleq T_b(x_{N_t}) + T_b(x_{N_t}, x_{N_t-1}) + \dots + T_b(x_{N_t}, x_{N_t-1}, \dots, x_k). \tag{5}$$

    The SD algorithm first makes an initial decision on $x_{N_t}$ in layer $N_t$, then moves node by node until it arrives at a leaf node. The $T_c$ of this leaf node is kept as the initial radius. If the SD algorithm visits another leaf node whose $T_c$ is the smallest among the visited leaf nodes, the radius is replaced by the new $T_c$. When the SD algorithm visits a node whose $T_c$ exceeds the radius, it does not visit the nodes extended from this node. More details of the SD can be found in [3].

    [Fig. 1. Decision tree while SD is used to achieve ML solution. Labels: root, layer $N_t$, layer 1, leaf nodes.]
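    The search described by (3) to (5) can be sketched as a short recursive routine. This is a minimal real-valued illustration, not the authors' implementation: the names `sphere_decode` and `brute_force_ml` are hypothetical, `R` is assumed to be a real upper triangular matrix given as nested lists, and a node is pruned as soon as its cumulative metric reaches the current radius. A brute-force search over all candidates is included only as a reference for small examples.

```python
import itertools


def sphere_decode(R, y, levels):
    """Depth-first search with sorted siblings for min ||y - R x||^2, x_k in levels."""
    n = len(R)
    best = {"radius": float("inf"), "x": None}
    x = [0] * n

    def search(k, tc):
        # k is the current layer (n-1 is the top layer, 0 the leaf layer);
        # tc is the cumulative metric T_c of the path chosen above this layer.
        interference = sum(R[k][l] * x[l] for l in range(k + 1, n))
        # Branch metric T_b of every candidate in this layer, in ascending order.
        cands = sorted(((y[k] - interference - R[k][k] * s) ** 2, s) for s in levels)
        for tb, s in cands:
            new_tc = tc + tb
            if new_tc >= best["radius"]:
                break  # siblings are sorted, so the remaining ones are pruned too
            x[k] = s
            if k == 0:
                best["radius"] = new_tc  # a better leaf shrinks the radius
                best["x"] = list(x)
            else:
                search(k - 1, new_tc)

    search(n - 1, 0.0)
    return best["x"], best["radius"]


def brute_force_ml(R, y, levels):
    """Exhaustive reference search over all len(levels)**n candidates."""
    n = len(R)

    def metric(x):
        return sum((y[k] - sum(R[k][l] * x[l] for l in range(k, n))) ** 2
                   for k in range(n))

    best = min(itertools.product(levels, repeat=n), key=metric)
    return list(best), metric(best)
```

    Both routines return the same candidate on non-degenerate inputs; the sphere decoder simply visits far fewer nodes once the radius has shrunk.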

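    The received complex-valued symbol vector is often converted to a real-valued one to ease the hardware implementation of the SD [7], [5]. One common conversion, used in [7], [5] and many other works, is

    $$\begin{bmatrix} \Re(\mathbf{y}) \\ \Im(\mathbf{y}) \end{bmatrix} = \begin{bmatrix} \Re(\mathbf{H}) & -\Im(\mathbf{H}) \\ \Im(\mathbf{H}) & \Re(\mathbf{H}) \end{bmatrix} \begin{bmatrix} \Re(\mathbf{x}) \\ \Im(\mathbf{x}) \end{bmatrix} + \begin{bmatrix} \Re(\mathbf{n}) \\ \Im(\mathbf{n}) \end{bmatrix}, \tag{6}$$

    where $\Re(\cdot)$ and $\Im(\cdot)$ represent the real and imaginary parts of $(\cdot)$, respectively. Another form of complex-to-real conversion is proposed in [8]:

    $$\begin{bmatrix} \Re(y_1) \\ \Im(y_1) \\ \vdots \end{bmatrix} = \begin{bmatrix} \Re(h_{11}) & -\Im(h_{11}) & \cdots \\ \Im(h_{11}) & \Re(h_{11}) & \cdots \\ \vdots & \vdots & \ddots \end{bmatrix} \begin{bmatrix} \Re(x_1) \\ \Im(x_1) \\ \vdots \end{bmatrix} + \begin{bmatrix} \Re(n_1) \\ \Im(n_1) \\ \vdots \end{bmatrix}. \tag{7}$$

    The conversion using (7) leads to an upper triangular matrix with regularly placed zero elements, as detailed in Sec. III. For simplicity, the real-valued quantities after applying conversion (7) are denoted as

    $$\mathbf{y}_r = \mathbf{H}_r\mathbf{x}_r + \mathbf{n}_r. \tag{8}$$

    The authors of [8] adopted a preprocessing other than QR decomposition to generate an upper triangular matrix. That preprocessing decomposes $\mathbf{H}_r$ into the product of an upper triangular matrix and a non-unitary matrix. However, the non-unitary matrix colors the noise, which complicates the detection problem.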
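    The interleaved conversion (7) can be written out in a few lines. The sketch below uses hypothetical helper names (`to_real_matrix`, `to_real_vector`) and plain nested lists; it only illustrates the element ordering of (7), in which the real and imaginary parts of each complex entry stay next to each other.

```python
def to_real_matrix(H):
    """Expand each complex h into the 2x2 block [[Re h, -Im h], [Im h, Re h]] as in (7)."""
    n_r, n_t = len(H), len(H[0])
    H_r = [[0.0] * (2 * n_t) for _ in range(2 * n_r)]
    for j in range(n_r):
        for i in range(n_t):
            h = H[j][i]
            H_r[2 * j][2 * i] = h.real
            H_r[2 * j][2 * i + 1] = -h.imag
            H_r[2 * j + 1][2 * i] = h.imag
            H_r[2 * j + 1][2 * i + 1] = h.real
    return H_r


def to_real_vector(v):
    """Interleave real and imaginary parts: [Re v1, Im v1, Re v2, Im v2, ...]."""
    out = []
    for c in v:
        out.extend([c.real, c.imag])
    return out
```

    With this ordering, columns $2k-1$ and $2k$ of the real matrix are orthogonal to each other by construction, which is the property the QR decomposition in Sec. III builds on.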

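    III. PROPOSED ALGORITHM FOR PSD

    The preprocessing for the proposed PSD is divided into two steps:

    1) Complex-to-real Conversion: The PSD converts the received complex-valued symbol vector and channel matrix to the real-valued $\mathbf{y}_r$ and $\mathbf{H}_r$ by using (7).

    2) QR Decomposition: Apply QR decomposition to $\mathbf{H}_r$ such that $\mathbf{Q}_r\mathbf{R}_p = \mathbf{H}_r$, where $\mathbf{Q}_r$ is a real-valued orthonormal matrix and $\mathbf{R}_p$ is a real-valued upper triangular matrix. Let $\mathbf{H}_r = [\mathbf{h}_1, \mathbf{h}_2, \dots, \mathbf{h}_{2N_t}]$. The QR decomposition can be expressed as

    $$\mathbf{u}_1 = \mathbf{h}_1, \quad \mathbf{e}_1 = \frac{\mathbf{u}_1}{\|\mathbf{u}_1\|}, \qquad \mathbf{u}_k = \mathbf{h}_k - \sum_{l=1}^{k-1} \langle \mathbf{e}_l, \mathbf{h}_k \rangle\, \mathbf{e}_l, \quad \mathbf{e}_k = \frac{\mathbf{u}_k}{\|\mathbf{u}_k\|}, \quad \text{for } k = 2, \dots, 2N_t, \tag{9}$$

    where $\langle \cdot, \cdot \rangle$ denotes the inner product. As a result, the real-valued upper triangular matrix is

    $$\mathbf{R}_p = \begin{bmatrix} \langle \mathbf{e}_1, \mathbf{h}_1 \rangle & 0 & \langle \mathbf{e}_1, \mathbf{h}_3 \rangle & \langle \mathbf{e}_1, \mathbf{h}_4 \rangle & \cdots \\ 0 & \langle \mathbf{e}_2, \mathbf{h}_2 \rangle & \langle \mathbf{e}_2, \mathbf{h}_3 \rangle & \langle \mathbf{e}_2, \mathbf{h}_4 \rangle & \cdots \\ 0 & 0 & \langle \mathbf{e}_3, \mathbf{h}_3 \rangle & 0 & \cdots \\ 0 & 0 & 0 & \langle \mathbf{e}_4, \mathbf{h}_4 \rangle & \cdots \\ \vdots & \vdots & \vdots & \vdots & \ddots \end{bmatrix}, \tag{10}$$

    where zeros regularly appear above the even diagonal elements (proved in [9]). The preprocessing result of [8] has the same property, although QR decomposition is not adopted in [8].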
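    The Gram-Schmidt recursion (9) and the zero pattern of (10) can be checked on a small example. The sketch below is an illustration with a hypothetical function name (`gram_schmidt_qr`) and plain nested lists; it uses classical Gram-Schmidt exactly as written in (9), which is adequate for a small well-conditioned matrix but is not the numerically preferred way to compute a QR decomposition in general.

```python
def dot(a, b):
    """Inner product of two equal-length real vectors."""
    return sum(p * q for p, q in zip(a, b))


def gram_schmidt_qr(H):
    """Classical Gram-Schmidt as in (9).

    Returns (E, R): E[l] is the orthonormal vector e_{l+1} and
    R[l][k] = <e_{l+1}, h_{k+1}>, so that H = Q R with the e vectors as columns of Q.
    """
    rows, cols = len(H), len(H[0])
    E = []
    R = [[0.0] * cols for _ in range(cols)]
    for k in range(cols):
        h = [H[r][k] for r in range(rows)]  # column h_{k+1}
        u = list(h)
        for l, e in enumerate(E):
            R[l][k] = dot(e, h)
            u = [ui - R[l][k] * ei for ui, ei in zip(u, e)]
        norm = dot(u, u) ** 0.5
        R[k][k] = norm  # equals <e_{k+1}, h_{k+1}>
        E.append([ui / norm for ui in u])
    return E, R


# Real-valued matrix obtained from a 2 x 2 complex channel with the ordering of (7).
H_r = [[1.0, -2.0, 3.0, 1.0],
       [2.0, 1.0, -1.0, 3.0],
       [0.0, -0.5, -2.0, 0.0],
       [0.5, 0.0, 0.0, -2.0]]
E, R_p = gram_schmidt_qr(H_r)
# R_p[0][1] and R_p[2][3] are the entries above the even diagonal elements.
```

    In this example the entries `R_p[0][1]` and `R_p[2][3]` vanish up to rounding, while `R_p[0][2]` does not, matching the pattern shown in (10).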

    This preprocessing is common to all versions of the proposed PSD. Three versions of the PSD are proposed, which differ in their tree search rules.

    A. The first PSD

    If (10) is used for the SD, we observe that the $T_b$ in the $2k$-th layer is independent of the $T_b$ in the $(2k-1)$-th layer for $k = 1, 2, \dots, N_t$. In the decision tree, we call layer $2k$ and layer $2k-1$ layer pair $k$. Layer $2k$, layer $2k-1$ and layer pair $k$ are denoted as $L_k^e$, $L_k^o$ and $L_k$, respectively. Since the $T_b$ for $L_k^e$ and the $T_b$ for $L_k^o$ are independent, we can compute the two $T_b$s simultaneously. This arrangement enables parallel processing in the SD. With parallel processing, the computational complexity stays at about the same level, but the computation time is reduced to approximately half of that of the original SD. Simulations of the increase in computational complexity and the reduction in time are given in Sec. IV.

    The tree search algorithm of the first proposed PSD is given below:

    1) The PSD searches from the root to the leaf nodes. It visits two nodes simultaneously; one node is in $L_k^o$ and the other is in $L_k^e$. The two nodes are defined as a node pair. The PSD moves from node pair to node pair. If the visited node pair includes a leaf node, the PSD compares the new $T_c$ with the radius and replaces the radius with the smaller one. If the PSD visits a node pair whose $T_c$ exceeds the radius, the PSD does not visit the node pairs extended from this node pair; it visits the node pair with the next priority, or goes back to the previous layer pair to visit the next priority there. When the PSD visits an unvisited layer pair $L_k$, the visiting priority of the node pairs in $L_k$ is determined by the following steps. The PSD algorithm terminates when the minimum $T_c$ in the decision tree is obtained.

    2) When the PSD moves to an unvisited layer pair $L_k$, it computes the $T_b$s for $L_k^e$ and for $L_k^o$ simultaneously.

    3) Sort the nodes in $L_k^e$ and in $L_k^o$ by their $T_b$s in ascending order, respectively; denote the sorted nodes in $L_k^e$ as $n_1, \dots, n_M$ and the sorted nodes in $L_k^o$ as $m_1, \dots, m_M$.

    4) The visiting priority of the node pairs in $L_k$ is given in Fig. 2(a), i.e., $(n_1, m_1), (n_1, m_2), \dots, (n_1, m_M), (n_2, m_1), \dots, (n_2, m_M), \dots, (n_M, m_M)$. If the $T_c$ of a node pair $(n_p, m_k)$ exceeds the sphere radius, the node pairs $\{(n_i, m_j) : i \ge p,\ j \ge k\}$ are removed from the visiting list.
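    Steps 3 and 4 can be sketched for a single layer pair as follows. This is a simplified illustration with hypothetical names: the radius is held fixed instead of shrinking during the search, and the $T_c$ of a node pair is taken as the cumulative metric of the parent plus the two branch metrics of the pair, which is how (5) reads when two layers are added at once.

```python
def first_psd_order(M):
    """Visiting order listed in step 4: (n1,m1), (n1,m2), ..., (nM,mM), 1-based."""
    return [(i, j) for i in range(1, M + 1) for j in range(1, M + 1)]


def prune_layer_pair(tb_even, tb_odd, tc_parent, radius, order):
    """Return the node pairs of one layer pair that stay in the visiting list.

    tb_even / tb_odd hold the branch metrics of the nodes in the two layers;
    they are sorted here so that index 1 is the smallest (step 3).
    """
    n_sorted = sorted(tb_even)
    m_sorted = sorted(tb_odd)
    cut = []        # pairs (p, k) whose T_c exceeded the radius
    survivors = []
    for i, j in order:
        # Any pair with i >= p and j >= k has an even larger T_c: skip it.
        if any(i >= p and j >= k for p, k in cut):
            continue
        tc = tc_parent + n_sorted[i - 1] + m_sorted[j - 1]
        if tc > radius:
            cut.append((i, j))
        else:
            survivors.append((i, j, tc))
    return survivors
```

    Because both lists are sorted, one failed comparison removes a whole rectangle of node pairs, so most pairs with large indices are never evaluated.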

    [Fig. 2. Visiting priorities. (a) The visiting priority of $(n_i, m_j)$ in the first PSD. (b) The visiting priority of $(n_i, m_j)$ in the second PSD. (c) The priority of additive node pairs $(n_u, m_v)$ for comparison.]
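    The independence of the two branch metrics in a layer pair can be made explicit by writing (4) for layers $2k$ and $2k-1$ with the real-valued $\mathbf{R}_p$ of (10). The notation below is assumed for illustration only: $r_{i,j}$ denotes the entries of $\mathbf{R}_p$, $x_l$ the entries of $\mathbf{x}_r$, and $\tilde{y}_i$ the entries of $\mathbf{Q}_r^T\mathbf{y}_r$.

```latex
\begin{align*}
T_b^{(2k)}   &= \Big(\tilde{y}_{2k} - r_{2k,2k}\,x_{2k}
                - \sum_{l=2k+1}^{2N_t} r_{2k,l}\,x_l\Big)^2, \\
T_b^{(2k-1)} &= \Big(\tilde{y}_{2k-1} - r_{2k-1,2k-1}\,x_{2k-1}
                - \underbrace{r_{2k-1,2k}}_{=\,0}\,x_{2k}
                - \sum_{l=2k+1}^{2N_t} r_{2k-1,l}\,x_l\Big)^2 .
\end{align*}
```

    Because $r_{2k-1,2k} = 0$ in (10), the metric of layer $2k-1$ does not depend on the choice of $x_{2k}$. Both metrics depend only on the symbols already fixed in the upper layer pairs, so they can be evaluated at the same time.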

    B. The second PSD

    The SE algorithm implies that giving higher visiting priority to the node with the smaller $T_b$ can reduce the searching time of the SD. The first PSD adopts this rule within each layer. The second PSD extends it to the layer pair: it gives higher visiting priority to the node pair with the smaller sum of the corresponding $T_b$s. The sum of the two $T_b$s does not need to be computed; instead, the statistical order of the sum of the $T_b$s is used.

    A simple example of the second PSD is shown in Fig. 3, where the nodes in $L_k^e$ and in $L_k^o$ are sorted and labeled for $k = N_t$ and $M = 4$. The node pairs with the first two visiting priorities are $(n_1, m_1)$ and $(n_1, m_2)$. In contrast to the first PSD, which gives the next visiting priority to $(n_1, m_3)$, the second PSD gives the next visiting priority to $(n_2, m_1)$, whose sum of $T_b$s is statistically smaller.

    The tree search algorithm of the second PSD is given below:

    1) The first three steps are the same as in the first PSD.

    2) The visiting priority of the node pairs in $L_k$ is given in Fig. 2(b), i.e., $(n_1, m_1), (n_1, m_2), (n_2, m_1), \dots$. If the $T_c$ of a node pair $(n_p, m_k)$ exceeds the sphere radius, the node pairs $\{(n_i, m_j) : i \ge p,\ j \ge k\}$ are removed from the visiting list.

    [Fig. 3. The node pairs with different visiting priorities in PSD and PSD2.]
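    One way to generate a visiting order of this kind without adding any $T_b$s is to rank node pairs by the sum of their sorted indices. The sketch below is an assumption for illustration: it reproduces the first three pairs listed in step 2, but the text does not spell out the complete order of Fig. 2(b), so the anti-diagonal traversal and the tie-break used here are hypothetical.

```python
def second_psd_order(M):
    """Assumed order: ascending index sum i + j, ties broken by smaller i (1-based)."""
    pairs = [(i, j) for i in range(1, M + 1) for j in range(1, M + 1)]
    return sorted(pairs, key=lambda p: (p[0] + p[1], p[0]))
```

    The index sum acts as a proxy for the sum of the two branch metrics: a pair such as $(n_2, m_1)$ is visited before $(n_1, m_3)$ even though neither sum has been computed.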

    C. The third PSD

    In the decision tree, there are two searching directions for the PSD: vertical search and horizontal search. When the PSD moves from an upper layer pair to an unvisited layer pair, it is called vertical search. In vertical search the PSD has to compute the $T_b$s for all nodes in the unvisited layer pair, so the computational complexity of vertical search is high. When the PSD visits a node pair whose $T_c$ exceeds the sphere radius, it visits the node pair with the next visiting priority in the same layer pair; this is called horizontal search. The PSD only has to compare the $T_c$ of the visited node pair with the sphere radius to decide the next searching direction. The computational complexity of horizontal search is low since the $T_b$s of the visited node pair are already computed; only additions and comparisons are needed.

    Thus, the third PSD is proposed to add processing to the horizontal search. The design aims to reduce the termination time of the PSD. The original processing in the horizontal search is to visit a node pair according to the visiting priority and compare the $T_c$ of this node pair with the sphere radius; in the additive processing, we compare the $T_c$ of one more node pair with the sphere radius. If this $T_c$ exceeds the sphere radius, the PSD will not visit the node pairs extended from this node pair; if it is less than the sphere radius, the PSD keeps this node pair and visits it according to the visiting priority described in the second PSD. The tree search algorithm of the third PSD is given below:

    1) The first two steps are the same as in the first PSD.

    2) The unvisited layer pair is denoted as $L_k$, and the previous layer pair is denoted as $L_{k+1}$. Compare the sum of the $T_c$ of $L_{k+1}$ and the $T_b$ of $L_k^o$ with the sphere radius. Denote the node whose corresponding sum exceeds the sphere radius as $n_p$. Also compare the $T_c$ of $L_k^e$ with the sphere radius. Denote the node whose $T_c$ exceeds the sphere radius as $m_k$. The node pairs $\{(n_i, m_j) : i \ge p\}$ and $\{(n_i, m_j) : j \ge k\}$ and their corresponding extended node pairs are removed from the visiting list.

    3) This step is the same as the third step in the first PSD, except that the nodes outside the visiting list do not need to be sorted.

    4) This step is the same as the second step in the second PSD.

    5) When the third PSD moves horizontally, it compares the $T_c$ of an additive node pair with the sphere radius. Denote the sorted node pair to be compared additively as $(n_u, m_v)$. The comparing priority of $(n_u, m_v)$ is determined by a two-phase rule:

        a) Denote $(n_a, m_a)$ as the node pair in the visiting list such that the $T_b$ of $n_a$ is the largest among the $T_b$s of $n_i$ and the $T_b$ of $m_a$ is the largest among the $T_b$s of $m_j$. Further, denote $(n_b, m_b)$ as the node pair in the visiting list such that the $T_b$ of $n_b$ is the smallest among the $T_b$s of $n_i$ and the $T_b$ of $m_b$ is the smallest among the $T_b$s of $m_j$. The values $(u, v)$ of $(n_u, m_v)$ in this phase are determined by

        $$u = v = \left\lceil \frac{a + b}{2} \right\rceil, \tag{11}$$

        where $\lceil \cdot \rceil$ is the ceiling function. If the $T_c$ of $(n_u, m_v)$ exceeds the sphere radius, remove the node pairs $\{(n_i, m_j) : i \ge u,\ j \ge v\}$ from the visiting list and replace $(n_a, m_a)$ with $(n_{u-1}, m_{v-1})$ in the next additive comparison; if the $T_c$ of $(n_u, m_v)$ does not exceed the sphere radius, replace $(n_b, m_b)$ with $(n_u, m_v)$ in the next additive comparison. Repeat this process until $a = b$; then this phase ends and the next phase begins.

        b) Let $a = b = t$. The comparison priority of the next node pair $(n_u, m_v)$ in this phase is determined by the rule shown in Fig. 2(c). If the $T_c$ of the node pair $(n_p, m_k)$ exceeds the sphere radius, remove the node pairs $\{(n_i, m_j) : i \ge p,\ j \ge k\}$ from the visiting list.
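    Phase a) of step 5 is a bisection along the diagonal node pairs $(n_u, m_u)$. The sketch below is a simplified illustration with hypothetical names: it assumes the visiting list initially holds indices $1, \dots, M$, that the $T_c$ values of the diagonal pairs are given in ascending order, and it performs all additive comparisons in one loop, whereas in the algorithm above one such comparison accompanies each horizontal move.

```python
import math


def diagonal_bisection(tc_diag, radius):
    """Run phase a) on the diagonal pairs; tc_diag[i - 1] is T_c of (n_i, m_i).

    Returns (t, tested): the index t at which a == b, and the list of
    indices u whose T_c was compared with the radius.
    """
    a, b = len(tc_diag), 1
    tested = []
    while a != b:
        u = math.ceil((a + b) / 2)  # equation (11)
        tested.append(u)
        if tc_diag[u - 1] > radius:
            a = u - 1  # pairs with i >= u and j >= u leave the visiting list
        else:
            b = u      # (n_u, m_u) stays in the visiting list
    return a, tested
```

    Each comparison halves the remaining index range, so about $\log_2 M$ additive comparisons are enough to locate the diagonal boundary between kept and removed node pairs.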

    IV. SIMULATION RESULT

    To compare the termination time of the SDs, the average number of visited node pairs is adopted as the measure. A visited node pair consists of the nodes whose $T_b$s can be computed simultaneously. In the conventional SD, we can only compute the $T_b$ of one node at a time; thus, a node pair represents merely a single node in the conventional SD. In our simulations, memory is used to record the computed $T_b$s and the computed priority to avoid repeated computation. The conventional SD is converted to a real-valued system by (6), to which the parallel algorithm cannot be applied.

    The average numbers of visited node pairs at different SNRs for the proposed PSDs and the conventional SD are shown in Fig. 4 and Fig. 5 for 4 × 4 16-QAM MIMO and 3 × 3 64-QAM MIMO, respectively. The simulation results show that the termination time of the proposed PSDs is greatly reduced compared to the conventional SD.

    To measure the computational complexity by the average number of real-valued multiplications, we assume that multiplications dominate the computational complexity and ignore additions and sorting. The simulated computational complexity is shown in Fig. 6 for 4 × 4 16-QAM MIMO and in Fig. 7 for 3 × 3 64-QAM MIMO.

    These results imply that the computational complexity of the proposed PSDs is slightly higher than that of the conventional SD; however, the difference is small. Note that the simulated computational complexity of the second PSD is the same as that of the third PSD. In fact, the third PSD has more additions and comparisons than the second PSD, but both additions and comparisons are ignored in the computational complexity measure.

    [Fig. 4. The average visited node pairs in the 4 × 4, 16-QAM system. Horizontal axis: SNR (dB); vertical axis: average visited node pairs; curves: SESD, PSD, PSD2, PSD3.]

    [Fig. 5. The average visited node pairs in the 3 × 3, 64-QAM system. Horizontal axis: SNR (dB); vertical axis: average visited node pairs; curves: SESD, PSD, PSD2, PSD3.]

    V. CONCLUSION

    In this paper we proposed three PSDs, which have about the same computational complexity as the conventional SD but only about half the termination time. The simulation results show that the reduction in termination time is significant, particularly at low SNR. Under the same computational complexity, the proposed PSD uses two computing units simultaneously to reduce the termination time of the SD. This is not achievable in the conventional SD, which is not suitable for parallel processing. In the second and third proposed PSDs, the priority of the visited node pairs may be further optimized, which is a possible future direction.

    [Fig. 6. The computational complexity in the 4 × 4, 16-QAM system. Horizontal axis: SNR (dB); vertical axis: average number of multiplications; curves: SESD, PSD, PSD2, PSD3.]

    [Fig. 7. The computational complexity in the 3 × 3, 64-QAM system. Horizontal axis: SNR (dB); vertical axis: average number of multiplications; curves: SESD, PSD, PSD2, PSD3.]

    REFERENCES

    [1] G. Foschini, "Layered space-time architecture for wireless communication in a fading environment when using multi-element antennas," Bell Labs Technical Journal, vol. 1, no. 2, pp. 41–59, 1996.

    [2] E. G. Larsson, "MIMO detection methods: How they work [lecture notes]," IEEE Signal Process. Mag., vol. 26, no. 3, pp. 91–95, May 2009.

    [3] E. Agrell, T. Eriksson, A. Vardy, and K. Zeger, "Closest point search in lattices," IEEE Trans. Information Theory, vol. 48, no. 8, pp. 2201–2214, Aug. 2002.

    [4] A. Burg, M. Borgmann, M. Wenk, M. Zellweger, W. Fichtner, and H. Bölcskei, "VLSI implementation of MIMO detection using the sphere decoding algorithm," IEEE J. Solid-State Circuits, vol. 40, no. 7, pp. 1566–1577, July 2005.

    [5] Z. Guo and P. Nilsson, "Algorithm and implementation of the K-best sphere decoding for MIMO detection," IEEE J. Sel. Areas Commun., vol. 24, no. 3, pp. 491–503, Mar. 2006.

    [6] C.-H. Liao, T.-P. Wang, and T.-D. Chiueh, "A 74.8 mW soft-output detector IC for 8 × 8 spatial-multiplexing MIMO communications," IEEE J. Solid-State Circuits, vol. 45, no. 2, pp. 411–421, Feb. 2010.

    [7] G. Zhan and P. Nilsson, "Reduced complexity Schnorr-Euchner decoding algorithms for MIMO systems," IEEE Commun. Lett., vol. 8, no. 5, pp. 286–288, May 2004.

    [8] M. Siti and M. P. Fitz, "A novel soft-output layered orthogonal lattice detector for multiple antenna communications," in Proc. IEEE Int. Conf. Commun. 2006 (ICC'06), vol. 4, 2006, pp. 1686–1691.

    [9] L. Azzam and E. Ayanoglu, "Reduced complexity sphere decoding via a reordered lattice representation," IEEE Trans. Commun., vol. 57, no. 9, pp. 2564–2569, Sept. 2009.