[IEEE 2011 IEEE Workshop on Signal Processing Systems (SiPS) - Beirut, Lebanon...

AN ENERGY-EFFICIENT MULTIPLE-INPUT MULTIPLE-OUTPUT (MIMO) DETECTOR ARCHITECTURE

Eric P. Kim and Naresh R. Shanbhag

University of Illinois at Urbana-Champaign

Coordinated Science Laboratory / Department of Electrical and Computer Engineering

1308 W. Main St., Urbana, IL 61801

{epkim2, shanbhag}@illinois.edu

ABSTRACT

In this paper, a novel low-complexity, energy-efficient reconfigurable reduced dimension maximum likelihood (RRDML) multiple-input multiple-output (MIMO) detector is proposed. RRDML is based on RDML [1], in which maximum likelihood (ML) detection is applied to a sub-dimension of the received vector and linear detection is used for the remaining dimensions. The channel condition number is employed to configure RRDML. For a 4 × 4 MIMO system with 16-QAM modulation over a Rayleigh fading channel, Verilog simulations in a commercial 45nm, 1.2V CMOS process show that RRDML achieves up to 62.3% power savings with a BER loss of at most 3.7% compared to ML based receivers.

Index Terms— MIMO, Reconfigurable architectures,

low power

1. INTRODUCTION

Current mobile systems face great challenges in providing high speed communication to satisfy data intensive mobile applications. The 4th generation (4G) guidelines set by ITU-R IMT-Advanced specify a downlink of 1Gbps. Multiple-input multiple-output (MIMO) systems have been gaining interest as a solution to enable such high speed communication, because they significantly increase throughput without the use of additional bandwidth or transmit power. This is possible by utilizing the inherent spatial diversity of wireless channels. Current wireless standards, including fixed/mobile broadband wireless systems (IEEE 802.16e/m WiMAX), wireless local area networks (IEEE 802.11n WiFi), and all upcoming 4G systems (LTE), employ MIMO technology to achieve high speed communication.

Existing work on low power MIMO detection is mainly focused on reducing the complexity of maximum likelihood (ML) techniques, including tree based decoding, sphere decoding, and K-best decoding [2–4]. However, the complexity of such ML based MIMO detectors, and hence their power consumption, grows exponentially with increasing antenna count and modulation constellation size. This limits their application to MIMO systems with a small number of antennas (at most 4 for 802.11n and 802.16e) and a moderate constellation (16-QAM). Future wireless systems are expected to be at least 8 × 8 (no. of TX antennas × no. of RX antennas) with large constellations (64-QAM and higher). The current draft of 3GPP long term evolution-advanced (LTE-A) [5] specifies a downlink that supports up to 8 streams with a constellation size of 64, resulting in a receive space dimension of $64^8$. This rate of increase in detection complexity is expected to be greater than the growth of silicon capability (Moore's Law) and significantly greater than the scaling of battery capacity [6]. Thus, there is a strong need for a low complexity and low power MIMO detector.

Fig. 1. Block diagram of the reconfigurable reduced dimension maximum likelihood (RRDML) MIMO detector.

The MIMO detection process can be separated into two stages: a preprocessing stage, where the channel matrix (H), its sorted QR decomposition and corresponding permutation matrix (P), and a dimension reduction operator (Z) are calculated, and a detection stage, where the actual detection is carried out, as shown in Fig. 1. Typically the preprocessing stage can run at the rate at which the channel varies, while the detection stage needs to run at the higher transmit rate. Therefore, the detector is the power hungry block in a MIMO receiver, and the focus of our paper.

In this paper, we present a low complexity and energy efficient reconfigurable MIMO detector. The reconfigurable MIMO detector is based on a reduced dimension maximum likelihood (RDML) detector [1]. RDML is characterized by its dimension reduction parameter $n_1$. RDML reduces the detection complexity by performing ML detection over a reduced dimension $n_1$, and employing a linear detector for the remaining dimensions. RDML provides a tradeoff between computational complexity and BER performance via $n_1$. At the extreme ends, RDML reduces to a simple linear detector ($n_1 = 0$) or an ML detector ($n_1 = M_T$, the full dimension). In a reconfigurable RDML (RRDML), depicted in Fig. 1, the preprocessor assesses the channel quality based on the condition number of the channel ($C(H)$), and configures the dimension parameter $n_1$ based on a predefined threshold $\tau$. The goal is to use low-complexity MIMO detectors in good communication environments, where their performance will be adequate, and more complex detectors in adverse communication environments. The RRDML detector shows minimal loss in BER performance while achieving up to 62.3% power savings over conventional ML based detectors.

978-1-4577-1921-9/11/$26.00 ©2011 IEEE, SiPS 2011

The remainder of this paper is organized as follows. Sec-

tion 2 develops the communication model used in this paper.

Section 3 gives the necessary background needed for devel-

oping a reconfigurable detector. In Section 4, we show that

the condition number of H, the channel matrix, is a useful

metric to base RRDML on. Section 5 introduces the recon-

figurable reduced dimension ML detector (RRDML). Simu-

lation results are shown in Section 6 which include perfor-

mance, complexity and power comparisons. Section 7 con-

cludes the paper.

2. SYSTEM MODEL

Our wireless system has $M_T$ transmit antennas that transmit an $M_T \times 1$ symbol vector $s$ each channel use. Each element of $s$ is transmitted on one antenna and is chosen from a constellation of size $2^q$. Thus, a total of $qM_T$ bits are transmitted per channel use. The set of constellation points is denoted as $F$, and $s \in F^{M_T}$. The channel is modeled as a flat Rayleigh fading channel such that the received vector is $y = Hs + n$. Denoting $M_R$ as the number of receive antennas, $y$ is an $M_R \times 1$ vector, $H$ is the channel matrix of size $M_R \times M_T$ with channel coefficients obtained as i.i.d. complex Gaussian random variables, and $n$ is spatially white Gaussian noise, an $M_R \times 1$ complex Gaussian vector with zero mean and variance chosen for a specific SNR. This model is depicted in Fig. 2.
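The model above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`qam16`, `crandn`, `simulate_channel_use`) are hypothetical, the 16-QAM points are left unnormalized, and the noise variance is passed in directly because the text does not give the SNR-to-variance mapping.

```python
import random

def qam16():
    """Unnormalized 16-QAM constellation F (2^q points with q = 4)."""
    levels = (-3, -1, 1, 3)
    return [complex(i, q) for i in levels for q in levels]

def crandn(rng, var=1.0):
    """One circularly symmetric complex Gaussian sample with total variance var."""
    sigma = (var / 2) ** 0.5
    return complex(rng.gauss(0, sigma), rng.gauss(0, sigma))

def simulate_channel_use(m_t, m_r, noise_var, rng):
    """Draw s, H and n, and return (H, s, y) with y = H s + n."""
    F = qam16()
    s = [rng.choice(F) for _ in range(m_t)]                      # s in F^{M_T}
    H = [[crandn(rng) for _ in range(m_t)] for _ in range(m_r)]  # M_R x M_T, i.i.d.
    n = [crandn(rng, noise_var) for _ in range(m_r)]             # white noise
    y = [sum(h * x for h, x in zip(row, s)) + n_i for row, n_i in zip(H, n)]
    return H, s, y
```

For example, `simulate_channel_use(4, 4, 0.1, random.Random(0))` draws one channel use of a 4 × 4 system.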

3. BACKGROUND

In this section, we summarize MIMO detection algorithms

and architectures.

Fig. 2. Block diagram of a MIMO wireless link.

3.1. Linear Detection and Successive Interference Cancelation

Linear detection takes the general form $x = Gy$. After $x$ is estimated, quantization is performed to map $x$ to the closest constellation point. Depending on how the matrix $G$ is chosen, different detection schemes arise. In zero forcing (ZF), $G$ is chosen to be the Moore-Penrose pseudoinverse of the channel,

$$G_{ZF} = (H^H H)^{-1} H^H = H^{\dagger} \qquad (1)$$

where $(\cdot)^H$ denotes the conjugate transpose. Though ZF transforms the channel to identity, the effective noise, $n_{ZF} = Gn$, can be enhanced.

In MMSE, the following matrix $G$ is used, which accounts for the noise as well as the channel:

$$G_{MMSE} = (H^H H + \sigma_n^2 I)^{-1} H^H \qquad (2)$$

where $I$ is the identity matrix and $\sigma_n^2$ is the noise variance.
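A minimal pure-Python sketch of (1) and (2) follows. It is not the paper's implementation: the helper names are hypothetical, and instead of forming $G$ explicitly it solves the equivalent normal equations $(H^H H + \sigma_n^2 I)\,x = H^H y$, which yields the same $x = Gy$ (ZF is the special case $\sigma_n^2 = 0$).

```python
def hermitian(A):
    """Conjugate transpose of a matrix stored as a list of rows."""
    return [[A[i][j].conjugate() for i in range(len(A))] for j in range(len(A[0]))]

def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + [b[i]] for i in range(n)]        # augmented matrix [A | b]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0j] * n
    for r in range(n - 1, -1, -1):                     # back substitution
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def linear_detect(H, y, noise_var=0.0):
    """Return x = G y: ZF (Eq. 1) when noise_var == 0, MMSE (Eq. 2) otherwise."""
    Hh = hermitian(H)
    gram = [[sum(a * b for a, b in zip(row, col)) for col in zip(*H)] for row in Hh]
    for i in range(len(gram)):
        gram[i][i] += noise_var                        # H^H H + sigma_n^2 I
    rhs = [sum(h * v for h, v in zip(row, y)) for row in Hh]   # H^H y
    return solve(gram, rhs)

def quantize(x, constellation):
    """Map each entry of x to the closest constellation point."""
    return [min(constellation, key=lambda c: abs(c - xi)) for xi in x]
```

In the noise-free case, `quantize(linear_detect(H, y), F)` returns the transmitted vector whenever $H$ has full column rank.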

Both ZF and MMSE perform matrix multiplication in the

detection stage. A matrix multiplier is conventionally imple-

mented using a multiply and accumulate (MAC) unit. The

block diagram of a MAC unit, and the implementation of a

ZF detection stage is depicted in Fig. 3. Parallel MAC units

can be employed to speed up computation.
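The MAC-based matrix multiplication described above can be modeled behaviorally as follows. This is a software sketch with hypothetical names (`MAC`, `linear_stage`), not a description of the actual hardware: each output element gets its own accumulator, and one input sample is consumed per step.

```python
class MAC:
    """Multiply-accumulate unit: acc <- acc + a * b on every step."""

    def __init__(self):
        self.acc = 0j

    def step(self, a, b):
        self.acc += a * b
        return self.acc

def linear_stage(G, y):
    """Compute x = G y with one MAC per output element x_i."""
    macs = [MAC() for _ in G]
    for j, y_j in enumerate(y):            # one input sample y_j per step
        for mac, row in zip(macs, G):      # all MACs work in parallel on y_j
            mac.step(row[j], y_j)
    return [mac.acc for mac in macs]
```

For a 4 × 4 system this performs 16 multiplications and, not counting the first accumulation into an empty register, 12 additions per detected vector, which agrees with the 16/12 entry for MMSE in Table 1.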

Successive interference cancelation (SIC), unlike linear

schemes, detects the symbols one by one. It is an iterative

process with a total of MT iterations and each iteration de-

pendent on the previous iteration. SIC can suffer from error

propagation if a previous signal was detected incorrectly. A

detailed description of SIC can be found in [7].

Each iteration of SIC is implemented with a block sim-

ilar to linear detection, but the iterations cannot be done in

parallel. Several chains of iterations can be scheduled using

pipelining to match the throughput of a parallel implementa-

tion of linear detection. The architectural block diagram for a

single iteration, and for the complete detection stage is shown

in Fig. 4.

3.2. Maximum-Likelihood Detectors

The ML detector finds the solution to $\arg\min_{s \in F^{M_T}} \|y - Hs\|$. A straightforward implementation will perform an exhaustive search over all $2^{qM_T}$ possible transmit vectors, which is prohibitively costly for high dimension systems. ML achieves the optimal BER performance at the cost of exponential complexity. Sphere decoders (SD) are widely used as an alternative to ML due to their reduced complexity. Sphere decoding transforms the ML search problem into a tree traversal problem where branches can be pruned according to the sphere constraint. However, the expected number of elements in the search space is still exponential in the dimension of the MIMO system; thus, the application of SD to high dimensional systems is of great concern.

Fig. 3. Architecture of: (a) multiply-accumulate (MAC) unit, and (b) linear detection stage.

Fig. 4. Architecture diagram of: (a) a single iteration stage, and (b) complete detection stage for a successive interference cancelation receiver.
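The exhaustive ML search mentioned above can be written directly, which also makes its cost visible. The sketch below uses a hypothetical function name and is only practical for very small systems, since it enumerates all $|F|^{M_T} = 2^{qM_T}$ candidates.

```python
from itertools import product

def ml_detect(H, y, constellation):
    """Brute-force ML detection: minimize ||y - H s||^2 over every s in F^{M_T}."""
    m_t = len(H[0])

    def sq_dist(s):
        return sum(abs(y_i - sum(h * x for h, x in zip(row, s))) ** 2
                   for row, y_i in zip(H, y))

    # product(...) enumerates len(constellation) ** m_t candidate vectors.
    return min(product(constellation, repeat=m_t), key=sq_dist)
```

For the 4 × 4 16-QAM system considered in this paper, the loop would visit $16^4 = 65536$ candidates per received vector.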

In terms of implementation, SD has variable complexity

because the number of nodes visited varies with the input

data. This makes it hard to achieve efficient implementations.

One alternative to this is the K-best detector. In K-best, detec-

tion on the tree is performed breadth first, and for each level,

only the best K are retained. This gives a constant complex-

ity sub-optimal algorithm. However, its performance loss is

small for sufficiently large K. Simulations show that for a

4×4 16-QAM system, K = 5 is a good tradeoff between per-

formance and complexity, which is also verified in [8].
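The breadth-first K-best search described above can be sketched as follows. The sketch assumes the preprocessing has already produced an upper triangular $R$ from a QR decomposition of $H$ and the rotated observation $z = Q^H y$; the function name and the data layout are hypothetical.

```python
def k_best(R, z, constellation, K):
    """Breadth-first K-best search for z = R s + noise with R upper triangular.

    Returns up to K survivors as (distance, symbol tuple), best first.
    """
    n = len(R)
    survivors = [(0.0, ())]        # (accumulated distance, symbols s[i+1:], in order)
    for i in range(n - 1, -1, -1):                 # last row of R first
        children = []
        for dist, tail in survivors:
            # Interference from symbols already decided on this path.
            interference = sum(R[i][j] * tail[j - i - 1] for j in range(i + 1, n))
            for c in constellation:
                d = dist + abs(z[i] - interference - R[i][i] * c) ** 2
                children.append((d, (c,) + tail))
        children.sort(key=lambda child: child[0])
        survivors = children[:K]                   # keep only the K best per level
    return survivors
```

Each level expands at most $K \cdot |F|$ children, so the work per detected vector is fixed, in contrast to the data dependent node count of SD.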

3.3. Reduced Dimension Maximum Likelihood Detector

The reduced dimension ML (RDML) detector was first introduced in [1], and it forms the core of our proposed reconfigurable detector. The idea is to perform ML over a reduced dimensional space to reduce the size of the search. Linear techniques are then used to determine the symbols corresponding to the remaining dimensions. To reduce the dimension, the channel matrix is vertically divided into two matrices $H_1$ and $H_2$, which are $n_1$ and $n_2$ columns wide, respectively ($n_1 + n_2 = M_T$). By defining a dimension reduction operator $Z = \sigma^2 (H_2 H_2^H + \sigma^2 I)^{-1}$, the detection becomes a two step process:

$$\hat{s}_1 = \arg\min_{s_1 \in F^{n_1}} \|Zy - ZH_1 s_1\| \qquad (3)$$

$$\hat{s}_2 = H_2^H (H_2 H_2^H + \sigma^2 I)^{-1} (y - H_1 \hat{s}_1). \qquad (4)$$

The results of the two steps are concatenated to form the complete detected symbol vector $\hat{s}$. In practice, the performance of RDML can degrade significantly from ML due to the suboptimal performance of the ML stage. To combat this effect, a list of the best candidate symbols is obtained from the ML stage (a list-SD or K-best). Then, MMSE detection is performed on each candidate. The minimum distance solution among all candidates is chosen as the final estimate.
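The two step process in (3) and (4) can be sketched in pure Python as below. Several simplifications are assumptions of this sketch rather than features of the paper's design: the ML step is a brute-force search instead of list-SD or K-best, only the single best candidate is kept instead of a list, and the names `solve` and `rdml_detect` are hypothetical. With $A = H_2 H_2^H + \sigma^2 I$ and $w = A^{-1}(y - H_1 s_1)$, the metric in (3) equals $\sigma^2 \|w\|$ and the linear estimate in (4) equals $H_2^H w$, so one linear solve serves both steps.

```python
from itertools import product

def solve(A, b):
    """Solve the square system A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0j] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def rdml_detect(H, y, n1, noise_var, constellation):
    """ML over the first n1 symbols (Eq. 3), linear detection of the rest (Eq. 4)."""
    m_r, n2 = len(H), len(H[0]) - n1
    H1 = [row[:n1] for row in H]
    H2 = [row[n1:] for row in H]
    # A = H2 H2^H + sigma^2 I, an M_R x M_R matrix; Z = sigma^2 A^{-1}.
    A = [[sum(H2[i][k] * H2[j][k].conjugate() for k in range(n2))
          + (noise_var if i == j else 0.0) for j in range(m_r)] for i in range(m_r)]

    def residual(s1):
        return [y_i - sum(h * x for h, x in zip(row, s1)) for row, y_i in zip(H1, y)]

    def metric(s1):
        w = solve(A, residual(s1))                  # Z (y - H1 s1) = sigma^2 w
        return noise_var * sum(abs(v) ** 2 for v in w) ** 0.5

    s1 = min(product(constellation, repeat=n1), key=metric)
    w = solve(A, residual(s1))
    x2 = [sum(H2[i][k].conjugate() * w[i] for i in range(m_r)) for k in range(n2)]
    s2 = [min(constellation, key=lambda c: abs(c - v)) for v in x2]  # quantize
    return list(s1) + s2
```

The search in the first step visits $|F|^{n_1}$ candidates instead of $|F|^{M_T}$, which is where the complexity reduction comes from.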

4. CHANNEL CONDITION NUMBER

As depicted in Fig. 1, a readily available metric to assess channel quality is essential for a successful reconfigurable implementation. The condition number of the channel matrix $H$, $C(H)$, is ideal, as it can be computed by the preprocessor. The condition number $C(A)$ of a matrix $A$ is defined as

$$C(A) = \|A\| \, \|A^{-1}\| \qquad (5)$$

$$\|A\| = \max_{x \neq 0} \frac{\|Ax\|}{\|x\|}. \qquad (6)$$
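As a small worked illustration, assume the vector norm in (6) is the Euclidean norm (the text does not state this explicitly). The induced matrix norm is then the largest singular value, and the condition number becomes the ratio of the largest to the smallest singular value:

```latex
C(A) = \|A\|_2 \, \|A^{-1}\|_2 = \frac{\sigma_{\max}(A)}{\sigma_{\min}(A)},
\qquad
A = \begin{pmatrix} 4 & 0 \\ 0 & 1 \end{pmatrix}
\;\Rightarrow\;
\|A\|_2 = 4, \quad \|A^{-1}\|_2 = 1, \quad C(A) = 4 .
```

Under the same assumption, a unitary matrix has $C(A) = 1$, the smallest possible value.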

The condition number is the ratio of the maximum stretching to the maximum shrinking that the matrix $A$ applies to any vector. A high condition number indicates that a small perturbation of the input (due to $n$) results in a large deviation in the solution ($s$). Fig. 5(a) plots the cumulative distribution of the condition number for a Rayleigh fading channel for a 4 × 4 MIMO system. Most channels turn out to be good channels (have low $C(H)$): 55% of the channels have $C(H)$ below 8, while 80% of the channels have $C(H)$ below 13. Fig. 5(b) plots the BER vs. channel condition number for the same 4 × 4 MIMO system with 16-QAM modulation at an SNR of 15dB using various detection schemes. It can be seen that, for a specified BER of $10^{-3}$, MMSE can be used at low $C(H)$ in place of the more expensive K-best detector. Coupled with the distribution of the condition number in Fig. 5(a), we can see that RDML with low $n_1$ can be used for a large number of channels without sacrificing performance. The power consumption of the reconfigurable RDML detector can be expressed as:

$$P = \sum_{i=0}^{M_T} P_{D_i} \left[ CDF(\tau_{i+1}) - CDF(\tau_i) \right] \qquad (7)$$

where $P_{D_i}$ is the power consumption of RDML with $n_1 = i$, $CDF(\tau)$ is the cumulative distribution of $C(H)$ at point $\tau$, and $\tau_i$ is the condition number threshold (denoted as $\tau_{C(H)}$) used to choose $n_1$. Given that most channels are good, it is highly probable that a low complexity detector is used, resulting in significant power savings over an ML only detector. Thus, the availability of the condition number gives an opportunity to implement a low complexity reconfigurable detector. Low complexity detectors can be used for good channels while more sophisticated detectors are used in bad channels.
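Equation (7) is a weighted average of the per-configuration powers, with weights given by the probability that $C(H)$ falls between consecutive thresholds. The sketch below evaluates it; the function name is hypothetical, and the boundary convention (the CDF list starts at 0 and ends at 1) is an assumption, since the text does not define $\tau_0$ and $\tau_{M_T+1}$.

```python
def expected_power(config_power, cdf_at_threshold):
    """Evaluate Eq. (7): sum_i P_Di * [CDF(tau_{i+1}) - CDF(tau_i)].

    config_power[i]     -- power of the detector configured with n1 = i
    cdf_at_threshold[i] -- CDF(tau_i); assumed to start at 0.0 and end at 1.0,
                           so it has one more entry than config_power
    """
    if len(cdf_at_threshold) != len(config_power) + 1:
        raise ValueError("need exactly one more CDF value than configurations")
    return sum(p * (hi - lo)
               for p, lo, hi in zip(config_power, cdf_at_threshold, cdf_at_threshold[1:]))
```

As a usage example with made-up power values (the per-configuration powers are not listed in the paper), three configurations consuming 10, 50 and 100 units with thresholds at the two CDF points quoted above, 0.55 and 0.8, give `expected_power([10, 50, 100], [0.0, 0.55, 0.8, 1.0])`, which is $10 \cdot 0.55 + 50 \cdot 0.25 + 100 \cdot 0.2 = 38$ units.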

4.1. Impact of Channel Estimation Errors

A MIMO receiver typically estimates the channel matrix

through a channel estimator which is part of the preprocess-

ing stage. As the channel cannot be perfectly estimated this

will impact the performance. It has been shown in [9] that

the curves of various MIMO detection schemes show similar

degradation with increasing channel estimation error, and the

effects of estimation error can be neglected.

5. PROPOSED RECONFIGURABLE REDUCED DIMENSION ML MIMO DETECTOR

Reconfigurable RDML utilizes the condition number (as shown in Fig. 1). The RRDML($n_1$) detector adjusts the dimension parameter $n_1$ (ML dimension) based on the channel condition number, where the dimensionality parameter of the linear detector is $M_T - n_1$. For good channels ($C(H) \approx 1$), only linear detection is used (RRDML(0)), while in bad channels ($C(H) > 40$) only ML detection is performed (RRDML($M_T$)). For other channels, RRDML($n_1$) with $0 < n_1 < M_T$ is employed. It is the role of the preprocessor to compute the condition number, decide the value of $n_1$, and provide the appropriate control signals for the detector.

[Figure 5: (a) CDF(C(H)) vs. C(H), with marked points C(H) = 8, CDF(C(H)) = 0.55 and C(H) = 13, CDF(C(H)) = 0.8; (b) BER vs. C(H) for MMSE, ZF, K-best (K=5), and RDML-KB with n1 = 1, 2, 3.]

Fig. 5. Impact of condition number C(H) on detector performance: (a) cumulative distribution function (CDF) of C(H) for a Rayleigh fading channel, and (b) the impact of C(H) on various detection schemes.

5.1. The RRDML MIMO Detector

The block diagram for the RRDML detector is given in Fig. 6. The ML-based detection unit performs the detection of $s_1$ and provides a list of candidates. Two RRDML architectures are considered depending on the ML algorithm employed: the original sphere decoding algorithm (RRDML-SD) and K-best (RRDML-KB). The architecture for SD is based on the one-node-per-cycle architecture [10] and the architecture for K-best is based on [11]. As the ML detection stage for RDML is imperfect, the ML based architectures were modified to output a list of candidates. For SD, a list-SD is employed which tracks the 5 best candidates. For K-best, K = 5 was chosen, for the reasons mentioned in Section 3.2, and the final stage was adjusted to give 5 candidates instead of only the minimum distance candidate. The ML detection stage begins by estimating the MMSE solution to find a good initial value for the radius. If the number of candidates that satisfy this sphere constraint is less than K, fewer candidates are passed down to further reduce the MMSE-SIC computation. The ML detection stage is bypassed if no ML-based detection is needed.

The MMSE-SIC block implements the architecture depicted in Fig. 4(b). This block and all the blocks that follow are bypassed in case ML-only detection is performed. The concatenate and reorder block merges the candidate symbols $s_1$ and $s_2$ into one vector $\tilde{s}$ in the correct order (as the permutation of $H$ reorders the detection). This block consists of several multiplexers. The post processing block calculates the actual distance between the detected candidates ($\tilde{s}$) and the received signal $y$. It chooses the minimum distance candidate as the final detection estimate $\hat{s}$.

Fig. 6. High level block diagram of reconfigurable RDML detector.

Table 1. Comparison of operations (multiplications/additions) per symbol detection for a 4 × 4 16-QAM MIMO system at various SNR.

Detection scheme   10dB      13dB      16dB      19dB      22dB      25dB
MMSE               16/12     16/12     16/12     16/12     16/12     16/12
RRDML-SD           368/325   306/288   255/258   215/234   185/216   167/205
RRDML-KB           193/166   193/166   193/166   193/166   193/166   193/166
K-best             460/680   460/680   460/680   460/680   460/680   460/680
SD                 652/748   549/687   474/642   425/613   390/592   371/581

5.2. Complexity Analysis

The hardware gate complexity of RRDML is slightly higher than that of conventional ML based detectors due to the additional MMSE detector and auxiliary blocks described in Sec. 5.1. However, the expected number of nodes visited in the ML-based stage decreases exponentially with each dimension removed. This alone accounts for more than a 40% reduction in computational complexity. A comparison of the average number of multiplications and additions for various detectors is given in Table 1.

6. SIMULATION RESULTS

This section shows the performance achieved by RRDML

along with its power savings. A 4 × 4 MIMO system with

16-QAM modulation is simulated.

[Figure 7: BER vs. SNR (dB) for ZF, MMSE, RRDML-KB, RRDML-SD, SD, and K-best (K=5).]

Fig. 7. BER vs. SNR for various detector architectures.

6.1. BER Performance

The bit error rate (BER) performance of RRDML is given in Fig. 7 with comparisons to conventional SD, conventional K-best, ZF and MMSE. In Fig. 5(b), it can be seen that RRDML-KB with $n_1 = 3$ achieves almost the same performance as K-best. Based on this figure, the condition number thresholds ($\tau_{C(H)}$) for RRDML-KB were chosen as 4, 8, 15, and ∞ (K-best-only is not used). The thresholds for RRDML-SD were chosen using a similar plot and the values are 4, 6, 8, and 15. Within the SNR range of interest (15dB to 20dB), it can be seen that RRDML has minimal performance degradation compared to ML only detectors.
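The threshold lookup performed by the preprocessor can be sketched as below. The threshold values are the ones quoted above; the mapping from threshold interval to $n_1$ (the $i$-th interval selects $n_1 = i$) is an assumption of this sketch, as are the function and constant names.

```python
import math
from bisect import bisect_right

# Threshold values quoted in the text; the names are hypothetical.
RRDML_KB_THRESHOLDS = (4, 8, 15, math.inf)   # last threshold is infinite: n1 = 4 is never used
RRDML_SD_THRESHOLDS = (4, 6, 8, 15)

def select_n1(cond_number, thresholds):
    """Return the number of thresholds that are <= cond_number.

    Assumed mapping: C(H) below the first threshold gives n1 = 0 (linear only),
    and each threshold crossed adds one dimension to the ML stage.
    """
    return bisect_right(thresholds, cond_number)
```

Under this mapping, a channel with $C(H) = 10$ would be detected with $n_1 = 2$ in RRDML-KB and with $n_1 = 3$ in RRDML-SD.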

6.2. Power Savings

The power estimation is done by estimating the number of operations, as given in Table 1, and scaling it by the power consumption of each operation. The power consumption of a multiplier and an adder is obtained through HSPICE simulations using a 1.2V 45nm CMOS process at 150MHz and is estimated to be 460μW and 30.7μW, respectively. Fig. 8 shows the resulting power consumption. SD based architectures show a decrease in power as SNR increases due to their variable complexity. RRDML-SD achieves power savings of 44% to 55% over an SNR range of 10dB to 25dB compared to SD, while RRDML-KB achieves 59% savings compared to K-best.

Fig. 8. Power consumption of RRDML and ML-based detectors.

Fig. 9. Power vs. $\tau_{C(H)}$ for a restricted RRDML with $n_1 = 2$ or 4 at SNR of 20dB.
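The operation-count estimate can be reproduced from the Table 1 entries and the per-operation powers quoted above. The sketch below assumes the estimate is a plain weighted sum of the two operation counts; the function names are hypothetical.

```python
MULT_UW = 460.0   # multiplier power in microwatts, from the text
ADD_UW = 30.7     # adder power in microwatts, from the text

def power_mw(mults, adds):
    """Operation counts scaled by per-operation power, in milliwatts."""
    return (mults * MULT_UW + adds * ADD_UW) / 1000.0

def savings(proposed, baseline):
    """Fractional power savings of `proposed` over `baseline` (each is (mults, adds))."""
    return 1.0 - power_mw(*proposed) / power_mw(*baseline)

# (multiplications, additions) taken from Table 1.
rrdml_kb, k_best_ops = (193, 166), (460, 680)
rrdml_sd_10db, sd_10db = (368, 325), (652, 748)
rrdml_sd_25db, sd_25db = (167, 205), (371, 581)

kb_savings = savings(rrdml_kb, k_best_ops)
sd_savings_10db = savings(rrdml_sd_10db, sd_10db)
sd_savings_25db = savings(rrdml_sd_25db, sd_25db)
```

With these inputs the savings come out at roughly 0.60 for RRDML-KB over K-best, and roughly 0.44 and 0.56 for RRDML-SD over SD at 10dB and 25dB, in line with the percentages quoted in the text.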

The RRDML-KB MIMO receiver was implemented in Verilog with a precision of 15b for the received vector $y$ and 12b for the channel coefficients $H$, and synthesized using a 45nm standard cell library with Synopsys Design Compiler. PrimeTime was used to estimate the power consumption at a frequency of 150MHz. At an SNR of 20dB, the power consumption of RRDML-KB is 82.4mW while that of conventional K-best is 218.7mW, which represents a savings of 62.3%. There is a 6% difference from the values obtained in Fig. 8. The BER loss was only 3.7%.

To illustrate how the threshold $\tau_{C(H)}$ affects power consumption, a simplified RRDML was constructed which uses only a single threshold, with a two dimension ML-based decoding stage ($n_1 = 2$) for good channels and ML-only ($n_1 = 4$) for bad channels. Fig. 9 shows the power consumption vs. $\tau_{C(H)}$ at an SNR of 20dB. This figure shows the power/performance tradeoff that RRDML provides. The power graph resembles $CDF(C(H))$ in Fig. 5(a). This is because power depends linearly on the CDF, as in (7), so accurate power estimates can be predicted from $\tau_{C(H)}$ (using Fig. 5(a)). This can even be expected for SD, where the expected number of nodes that the SD algorithm visits depends on $C(H)$ as well as SNR.

7. CONCLUSION

We presented a reconfigurable MIMO detector based on a re-

duced dimension maximum likelihood (RDML) detector and

the condition number of a MIMO channel. Simulations show

that RRDML has minimal loss in BER while achieving up

to 62.3% power savings compared to an ML-only detector.

Future work includes error resilient RRDML receivers, which are expected to provide further power savings by trading off the increased resiliency for power (by means of voltage reduction). An integrated circuit implementation of RRDML is warranted as well.

8. REFERENCES

[1] J. W. Choi, B. Shim, A. Singer, and N. I. Cho, “Low-complexity decoding via reduced dimension maximum-likelihood search,” IEEE Trans. Signal Process., vol. 58, no. 3, pp. 1780–1793, Mar. 2010.

[2] S. Chen and T. Zhang, “Low power soft-output signal detector design for wireless MIMO communication systems,” in Proc. IEEE ISLPED, 2007, pp. 232–237.

[3] J.-H. Lin and K. Parhi, “Low complexity iterative joint detection, decoding, and channel estimation for wireless MIMO system,” in Proc. of IEEE SIPS, Oct. 2006, pp. 45–50.

[4] R. Jenkal and R. Davis, “An architecture for energy efficient sphere decoding,” in Proc. IEEE ISLPED, Aug. 2010, pp. 244–249.

[5] Evolved Universal Terrestrial Radio Access (E-UTRA); LTE physical layer; General description, The 3rd Generation Partnership Project (3GPP) Std. 3GPP TS 36.201, Rev. 10.0.0, Dec. 2010.

[6] J. M. Rabaey, “Low-power silicon architecture for wireless communications: embedded tutorial,” in Proc. of the 2000 ASP-DAC, ser. ASP-DAC ’00. New York, NY, USA: ACM, 2000, pp. 377–380.

[7] P. Wolniansky, G. Foschini, G. Golden, and R. Valenzuela, “V-BLAST: An architecture for realizing very high data rates over the rich-scattering wireless channel,” in Proc. URSI ISSSE, 1998, pp. 295–300.

[8] Z. Guo and P. Nilsson, “VLSI implementation issues of lattice decoders for MIMO systems,” in Proc. IEEE ISCAS, May 2004, pp. IV-477–480.

[9] M. Rupp, “On the influence of uncertainties in MIMO decoding algorithms,” in 36th Asilomar Conference on Signals, Systems and Computers, 2002, pp. 570–574.

[10] A. Burg, M. Borgmann, M. Wenk, M. Zellweger, W. Fichtner, and H. Bolcskei, “VLSI implementation of MIMO detection using the sphere decoding algorithm,” IEEE J. Solid-State Circuits, vol. 40, no. 7, pp. 1566–1577, Jul. 2005.

[11] K. Wong, C. Tsui, R. Cheng, and W. Mow, “A VLSI architecture of a K-best lattice decoding algorithm for MIMO channels,” in Proc. IEEE ISCAS, Aug. 2002.
