[IEEE 2011 IEEE Workshop on Signal Processing Systems (SiPS) - Beirut, Lebanon...
AN ENERGY-EFFICIENT MULTIPLE-INPUT MULTIPLE-OUTPUT (MIMO) DETECTOR ARCHITECTURE
Eric P. Kim and Naresh R. Shanbhag
University of Illinois at Urbana-Champaign
Coordinated Science Laboratory / Department of Electrical and Computer Engineering
1308 W. Main St., Urbana, IL 61801
{epkim2, shanbhag}@illinois.edu
ABSTRACT
In this paper, a novel low-complexity, energy-efficient reconfigurable reduced dimension maximum likelihood (RRDML) multiple-input multiple-output (MIMO) detector is proposed. RRDML is based on RDML [1], in which maximum likelihood (ML) detection is applied to a sub-dimension of the received vector and linear detection is used for the remaining dimensions. The channel condition number is employed to configure RRDML. For a 4 × 4 MIMO system with 16-QAM modulation over a Rayleigh fading channel, Verilog simulations in a commercial 45 nm, 1.2 V CMOS process show that RRDML achieves up to 62.3% power savings with a BER loss of at most 3.7% compared to ML-based receivers.
Index Terms— MIMO, Reconfigurable architectures,
low power
1. INTRODUCTION
Current mobile systems face great challenges in providing high-speed communication to satisfy data-intensive mobile applications. The 4th generation (4G) guidelines set by ITU-R IMT-Advanced specify a downlink rate of 1 Gbps. Multiple-input multiple-output (MIMO) systems have been gaining interest as a solution to enable such high-speed communication, because they significantly increase throughput without additional bandwidth or transmit power. This is possible by utilizing the inherent spatial diversity of wireless channels. Current wireless standards, including fixed/mobile broadband wireless systems (IEEE 802.16e/m WiMAX), wireless local area networks (IEEE 802.11n WiFi), and all upcoming 4G systems (LTE), employ MIMO technology to achieve high-speed communication.
Existing work on low-power MIMO detection focuses mainly on reducing the complexity of maximum likelihood (ML) techniques, including tree-based decoding, sphere decoding, and K-best decoding [2–4]. However, the complexity of such ML-based MIMO detectors, and hence their power consumption, grows exponentially with the number of antennas and the modulation constellation size. This limits their application to MIMO systems with a small number of antennas (at most 4 for 802.11n and 802.16e) and a moderate constellation (16-QAM). Future wireless systems are expected to be at least 8 × 8 (no. of TX antennas × no. of RX antennas) with large constellations (64-QAM and higher). The current draft of 3GPP long term evolution-advanced (LTE-A) [5] specifies a downlink that supports up to 8 streams with a constellation size of 64, resulting in a receive space dimension of 64^8. This rate of increase in detection complexity is expected to be greater than the growth of silicon capability (Moore’s Law) and significantly greater than the scaling of battery capacity [6]. Thus, there is a strong need for a low-complexity, low-power MIMO detector.
Fig. 1. Block diagram of the reconfigurable reduced dimension maximum likelihood (RRDML) MIMO detector.
The MIMO detection process can be separated into two stages: a preprocessing stage, where the channel matrix (H), its sorted QR decomposition and corresponding permutation matrix (P), and a dimension reduction operator (Z) are calculated, and a detection stage, where the actual detection is carried out, as shown in Fig. 1. Typically, the preprocessing stage can be run at the rate at which the channel varies, while the detection stage must run at the higher transmit rate. Therefore, the detector is the power-hungry block in a MIMO receiver and the focus of our paper.
In this paper, we present a low-complexity and energy-efficient reconfigurable MIMO detector. The reconfigurable MIMO detector is based on a reduced dimension maximum likelihood (RDML) detector [1]. RDML is characterized by its dimension reduction parameter n1. RDML reduces the detection complexity by performing ML detection over a reduced dimension n1 and employing a linear detector for the remaining dimensions. RDML provides a tradeoff between computational complexity and BER performance via n1. At the extreme ends, RDML converges to a simple linear detector (n1 = 0) or an ML detector (n1 = MT, the full dimension). In a reconfigurable RDML (RRDML), depicted in Fig. 1, the preprocessor assesses the channel quality based on the condition number of the channel (C(H)) and configures the dimension parameter n1 based on a predefined threshold τ. The goal is to use low-complexity MIMO detectors in good communication environments, where their performance is adequate, and more complex detectors in adverse communication environments. The RRDML detector shows minimal loss in BER performance while achieving up to 62.3% power savings over conventional ML-based detectors.
978-1-4577-1921-9/11/$26.00 ©2011 IEEE SiPS 2011
The remainder of this paper is organized as follows. Sec-
tion 2 develops the communication model used in this paper.
Section 3 gives the necessary background needed for devel-
oping a reconfigurable detector. In Section 4, we show that
the condition number of H, the channel matrix, is a useful
metric to base RRDML on. Section 5 introduces the recon-
figurable reduced dimension ML detector (RRDML). Simu-
lation results are shown in Section 6 which include perfor-
mance, complexity and power comparisons. Section 7 con-
cludes the paper.
2. SYSTEM MODEL
Our wireless system has MT transmit antennas that transmit an MT × 1 symbol vector s in each channel use. Each element of s is transmitted on one antenna and is chosen from a constellation of size 2^q. Thus, a total of qMT bits are transmitted per channel use. The set of constellation points is denoted as F, and s ∈ F^{MT}. The channel is modeled as a flat Rayleigh fading channel such that the received vector is y = Hs + n. Denoting MR as the number of receive antennas, y is an MR × 1 vector, H is the channel matrix of size MR × MT with channel coefficients obtained as i.i.d. complex Gaussian random variables, and n is spatially white Gaussian noise, an MR × 1 complex Gaussian vector with zero mean and variance chosen for a specific SNR. This model is depicted in Fig. 2.
3. BACKGROUND
In this section, we summarize MIMO detection algorithms
and architectures.
Fig. 2. Block diagram of a MIMO wireless link.
3.1. Linear Detection and Successive Interference Cancelation
Linear detection takes the general form x = Gy. After x is estimated, quantization is performed to map x to the closest constellation point. Depending on how the matrix G is chosen, different detection schemes arise. In zero forcing (ZF), G is chosen to be the Moore-Penrose pseudo-inverse of the channel,
$$G_{ZF} = (H^H H)^{-1} H^H = H^{\dagger} \qquad (1)$$
where (·)^H denotes the conjugate transpose. Though ZF transforms the channel to identity, the effective noise, n_ZF = Gn, can be enhanced.
In MMSE, the following matrix G is used, which accounts for the noise as well as the channel:
$$G_{MMSE} = (H^H H + \sigma_n^2 I)^{-1} H^H \qquad (2)$$
where I is the identity matrix and σ_n^2 is the noise variance.
Both ZF and MMSE perform a matrix multiplication in the detection stage. A matrix multiplier is conventionally implemented using a multiply and accumulate (MAC) unit. The block diagram of a MAC unit and the implementation of a ZF detection stage are depicted in Fig. 3. Parallel MAC units can be employed to speed up computation.
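The MAC-based matrix multiplication can be sketched in software as follows. This is a minimal illustration, not the hardware of Fig. 3: the function name `mac_matvec` and the numeric values of G and y are hypothetical, and it assumes each complex multiply and each accumulate counts as one operation.

```python
def mac_matvec(G, y):
    """Compute x = G y one row at a time, as a MAC unit would, and count operations."""
    mults = adds = 0
    x = []
    for row in G:
        acc = row[0] * y[0]          # first product initializes the accumulator
        mults += 1
        for g, yj in zip(row[1:], y[1:]):
            acc += g * yj            # one multiply and one accumulate per remaining term
            mults += 1
            adds += 1
        x.append(acc)
    return x, mults, adds


# Hypothetical 4 x 4 complex G and received vector y (values chosen only for illustration).
G = [[complex(i + 1, j) for j in range(4)] for i in range(4)]
y = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]
x, mults, adds = mac_matvec(G, y)
print(mults, adds)  # 16 12
```

Under these counting assumptions, a 4 × 4 linear detection stage needs 16 multiplications and 12 additions, which coincides with the MMSE row of Table 1.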
Successive interference cancelation (SIC), unlike linear
schemes, detects the symbols one by one. It is an iterative
process with a total of MT iterations and each iteration de-
pendent on the previous iteration. SIC can suffer from error
propagation if a previous signal was detected incorrectly. A
detailed description of SIC can be found in [7].
Each iteration of SIC is implemented with a block similar to linear detection, but the iterations cannot be done in parallel. Several chains of iterations can be scheduled using pipelining to match the throughput of a parallel implementation of linear detection. The architectural block diagrams for a single iteration and for the complete detection stage are shown in Fig. 4.
Fig. 3. Architecture of: (a) multiply-accumulate (MAC) unit, and (b) linear detection stage.
Fig. 4. Architecture diagram of: (a) a single iteration stage, and (b) complete detection stage for a successive interference cancelation receiver.
3.2. Maximum-Likelihood Detectors
The ML detector finds the solution to $\arg\min_{s \in F^{M_T}} \|y - Hs\|$. A straightforward implementation performs an exhaustive search over all $2^{qM_T}$ possible transmit vectors, which is prohibitively costly for high-dimensional systems. ML achieves the optimal BER performance at the cost of exponential complexity. Sphere decoders (SD) are widely used as an alternative to ML due to their reduced complexity. Sphere decoding transforms the ML search problem into a tree traversal problem where branches can be pruned according to the sphere constraint. However, the expected number of elements in the search space is still exponential in the dimension of the MIMO system; thus, the application of SD to high-dimensional systems is of great concern.
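The exhaustive ML search can be sketched as below. This is a toy illustration under stated assumptions, not the detector architecture: the function name `ml_detect`, the 2 × 2 channel values, and the QPSK constellation are all hypothetical, and noise is omitted so that the transmitted vector is recovered exactly.

```python
from itertools import product


def ml_detect(H, y, constellation):
    """Exhaustive ML search: return the s in F^MT minimizing ||y - H s||^2."""
    n_tx = len(H[0])
    best, best_dist = None, float("inf")
    for s in product(constellation, repeat=n_tx):
        dist = 0.0
        for row, yi in zip(H, y):
            residual = yi - sum(h * sj for h, sj in zip(row, s))
            dist += abs(residual) ** 2
        if dist < best_dist:
            best, best_dist = s, dist
    return best, best_dist


# Toy 2 x 2 system with QPSK (q = 2), so the search covers 2^(q*MT) = 16 vectors.
qpsk = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]
H = [[1.0 + 0.5j, 0.3 - 0.2j],
     [-0.4 + 0.1j, 0.9 + 0.7j]]
s_true = (1 - 1j, -1 + 1j)
y = [sum(h * sj for h, sj in zip(row, s_true)) for row in H]  # noiseless y = H s
s_hat, dist = ml_detect(H, y, qpsk)

print(s_hat == s_true)     # True in this noiseless case
print(16 ** 4, 64 ** 8)    # candidate counts: 4x4 16-QAM vs. 8 streams of 64-QAM
```

The second print makes the scaling concrete: a 4 × 4 16-QAM system has 65536 candidate vectors, while 8 streams of 64-QAM have 64^8 = 2^48.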
In terms of implementation, SD has variable complexity
because the number of nodes visited varies with the input
data. This makes it hard to achieve efficient implementations.
One alternative to this is the K-best detector. In K-best, detec-
tion on the tree is performed breadth first, and for each level,
only the best K are retained. This gives a constant complex-
ity sub-optimal algorithm. However, its performance loss is
small for sufficiently large K. Simulations show that for a
4×4 16-QAM system, K = 5 is a good tradeoff between per-
formance and complexity, which is also verified in [8].
3.3. Reduced Dimension Maximum Likelihood Detector
The reduced dimension ML (RDML) detector was first introduced in [1], and it forms the core of our proposed reconfigurable detector. The idea is to perform ML over a reduced dimensional space to reduce the size of the search. Linear techniques are then used to determine the symbols corresponding to the remaining dimensions. To reduce the dimension, the channel matrix is vertically divided into two matrices H1 and H2, which are n1 and n2 columns wide, respectively (n1 + n2 = MT). By defining a dimension reduction operator $Z = \sigma^2 (H_2 H_2^H + \sigma^2 I)^{-1}$, the detection becomes a two-step process:
$$\hat{s}_1 = \arg\min_{s_1 \in F^{n_1}} \| Z y - Z H_1 s_1 \| \qquad (3)$$
$$\hat{s}_2 = H_2^H (H_2 H_2^H + \sigma^2 I)^{-1} (y - H_1 \hat{s}_1). \qquad (4)$$
The results of the two steps are concatenated to form the complete detected symbol vector s. In practice, the performance of RDML can degrade significantly from ML due to the suboptimal performance of the ML stage. To combat this effect, a list of the best candidate symbols is obtained from the ML stage (using a list-SD or K-best). Then, MMSE detection is performed on each candidate. The minimum distance solution among all candidates is chosen as the final estimate.
4. CHANNEL CONDITION NUMBER
As depicted in Fig. 1, a readily available metric to assess channel quality is essential for a successful reconfigurable implementation. The condition number of the channel matrix H (C(H)) is ideal, as it can be computed by the preprocessor. The condition number C(A) of a matrix A is defined as
$$C(A) = \|A\| \, \|A^{-1}\| \qquad (5)$$
$$\|A\| = \max_{x \neq 0} \frac{\|Ax\|}{\|x\|}. \qquad (6)$$
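As a worked instance of (5) and (6), assume the vector norm in (6) is the Euclidean norm, which the text does not state explicitly. Under that assumption the induced norm equals the largest singular value, and the condition number reduces to a ratio of singular values. The diagonal matrix below is an illustrative choice, not a channel from the paper.

```latex
\|A\|_2 = \sigma_{\max}(A), \qquad \|A^{-1}\|_2 = \frac{1}{\sigma_{\min}(A)}
\quad\Longrightarrow\quad
C(A) = \frac{\sigma_{\max}(A)}{\sigma_{\min}(A)} \ge 1 .

% Illustrative example (values chosen for this sketch only):
A = \begin{bmatrix} 3 & 0 \\ 0 & 0.5 \end{bmatrix}
\;\Longrightarrow\;
\sigma_{\max} = 3,\quad \sigma_{\min} = 0.5,\quad C(A) = 6 .
```

A matrix that scales all directions equally gives C(A) = 1, the best case, which corresponds to the good channels with C(H) ≈ 1 discussed in Section 5.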
The condition number is the ratio of the maximum stretching to the maximum shrinking that the matrix A applies to any vector. A high condition number indicates that a small perturbation of the input (due to n) results in a large deviation in the solution (s). Fig. 5(a) plots the cumulative distribution of the condition number of a Rayleigh fading channel for a 4 × 4 MIMO system. Most channels turn out to be good channels (have low C(H)): 55% of the channels have C(H) below 8, while 80% of the channels have C(H) below 13. Fig. 5(b) plots the BER vs. channel condition number for the same 4 × 4 MIMO system with 16-QAM modulation at an SNR of 15 dB using various detection schemes. It can be seen that, for a specified BER of 10^{-3}, MMSE can be used at low C(H) in place of the more expensive K-best detector. Coupled with the distribution of the condition number in Fig. 5(a), we can see that RDML with low n1 can be used for a large number of channels without sacrificing performance. The power consumption of the reconfigurable RDML detector can be expressed as
$$P = \sum_{i=0}^{M_T} P_{D_i} \left[ CDF(\tau_{i+1}) - CDF(\tau_i) \right] \qquad (7)$$
where P_{D_i} is the power consumption of RDML with n1 = i, CDF(τ) is the cumulative distribution of C(H) at point τ, and τ_i is the condition number threshold (denoted as τ_{C(H)}) used to choose n1. Given that most channels are good, it is highly probable that a low-complexity detector is used, resulting in significant power savings over an ML-only detector. Thus, the availability of the condition number gives an opportunity to implement a low-complexity reconfigurable detector. Low-complexity detectors can be used for good channels, while more sophisticated detectors are used in bad channels.
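The weighted sum in (7) can be evaluated with a few lines of code. The sketch below is only an illustration: the function name `expected_power` is hypothetical, the three power values are placeholders rather than measured numbers, and the example assumes a simplified three-level configuration with thresholds at C(H) = 8 and C(H) = 13, whose CDF values (0.55 and 0.8) are the ones quoted from Fig. 5(a).

```python
def expected_power(power_by_n1, cdf_at_threshold):
    """Evaluate Eq. (7): sum_i P_Di * [CDF(tau_{i+1}) - CDF(tau_i)].

    power_by_n1[i]      : power of the detector configuration selected in band i
    cdf_at_threshold[i] : CDF(tau_i); needs len(power_by_n1) + 1 entries,
                          starting at CDF = 0 and ending at CDF = 1.
    """
    if len(cdf_at_threshold) != len(power_by_n1) + 1:
        raise ValueError("need one more CDF value than power values")
    total = 0.0
    for i, p in enumerate(power_by_n1):
        # Probability that C(H) falls in band i, times the power used in that band.
        total += p * (cdf_at_threshold[i + 1] - cdf_at_threshold[i])
    return total


# CDF values 0.55 and 0.8 come from Fig. 5(a); the powers (mW) are hypothetical placeholders.
cdf_values = [0.0, 0.55, 0.8, 1.0]
powers_mw = [20.0, 90.0, 230.0]
avg_power = expected_power(powers_mw, cdf_values)
print(round(avg_power, 1))
```

Because the cheapest configuration is weighted by the largest probability mass, the average lands well below the power of the most complex configuration, which is the effect the paragraph above describes.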
4.1. Impact of Channel Estimation Errors
A MIMO receiver typically estimates the channel matrix through a channel estimator, which is part of the preprocessing stage. Because the channel cannot be perfectly estimated, this will impact the performance. It has been shown in [9] that the curves of various MIMO detection schemes show similar degradation with increasing channel estimation error, and the effects of estimation error can be neglected.
5. PROPOSED RECONFIGURABLE REDUCED DIMENSION ML MIMO DETECTOR
Reconfigurable RDML utilizes the condition number (as shown in Fig. 1). The RRDML(n1) detector adjusts the dimension parameter n1 (the ML dimension) based on the channel condition number, where the dimensionality parameter of the linear detector is MT − n1. For good channels (C(H) ≈ 1), only linear detection is used (RRDML(0)), while in bad channels (C(H) > 40), only ML detection is performed (RRDML(MT)). For other channels, RRDML(n1) with 0 < n1 < MT is employed. It is the role of the preprocessor to compute the condition number, decide the value of n1, and provide the appropriate control signals for the detector.
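The preprocessor's choice of n1 can be sketched as a threshold lookup. This is a hedged illustration: the function name `select_n1` is hypothetical, the thresholds are the finite RRDML-KB values reported in Section 6.1, and the rule that each crossed threshold adds one ML dimension is an assumption, since the text does not spell out the exact mapping from thresholds to n1.

```python
from bisect import bisect_right


def select_n1(cond_number, thresholds):
    """Return the ML dimension n1 for a channel with condition number C(H).

    thresholds is an ascending list of finite condition-number thresholds.
    A channel below thresholds[0] gets n1 = 0 (linear detection only), and
    each threshold that C(H) reaches adds one ML dimension (assumed mapping).
    """
    return bisect_right(thresholds, cond_number)


# Finite RRDML-KB thresholds from Section 6.1. The last threshold there is
# infinite, so under this mapping n1 = 4 (K-best only) is never selected.
KB_THRESHOLDS = [4, 8, 15]
for c in (1.5, 6.0, 12.0, 40.0):
    print(c, select_n1(c, KB_THRESHOLDS))
```

A sorted-threshold lookup keeps the control logic to a handful of comparisons, which fits the idea that the decision is made once per channel update rather than once per symbol.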
Fig. 5. Impact of condition number C(H) on detector performance: (a) cumulative distribution function (CDF) of C(H) for a Rayleigh fading channel, and (b) the impact of C(H) on various detection schemes.
[Panel (a): CDF(C(H)) vs. C(H), with marked points C(H) = 8, CDF = 0.55 and C(H) = 13, CDF = 0.8. Panel (b): BER vs. C(H) for MMSE, ZF, K-best (K = 5), and RDML-KB with n1 = 1, 2, 3.]
5.1. The RRDML MIMO Detector
The block diagram for the RRDML detector is given in Fig. 6. The ML-based detection unit performs the detection of s1 and provides a list of candidates. Two RRDML architectures are considered depending on the ML algorithm employed: the original sphere decoding algorithm (RRDML-SD) and K-best (RRDML-KB). The architecture for SD is based on the one-node-per-cycle architecture [10], and the architecture for K-best is based on [11]. As the ML detection stage for RDML is imperfect, the ML-based architectures were modified to output a list of candidates. For SD, a list-SD is employed, which tracks the 5 best candidates. For K-best, K = 5 was chosen for the reasons mentioned in Section 3.2, and the final stage was adjusted to give 5 candidates instead of only the minimum distance candidate. The ML detection stage begins by estimating the MMSE solution to find a good initial value for the radius. If the number of candidates that satisfy this sphere constraint is less than K, fewer candidates are passed down, which further reduces the MMSE-SIC computation. The ML detection stage is bypassed if no ML-based detection is needed.
The MMSE-SIC block implements the architecture depicted in Fig. 4(b). This block and all the blocks that follow are bypassed when ML-only detection is performed. The concatenate and reorder block merges the candidate symbols s1 and s2 into one vector s in the correct order (as the permutation of H reorders the detection). This block consists of several multiplexers. The post-processing block calculates the actual distance between the detected candidates (s) and the received signal y. It chooses the minimum distance candidate as the final detection estimate s.
Fig. 6. High level block diagram of reconfigurable RDML detector.
Table 1. Comparison of operations (multiplications/additions) per symbol detection for a 4 × 4 16-QAM MIMO system at various SNRs.
Detection scheme   10 dB     13 dB     16 dB     19 dB     22 dB     25 dB
MMSE               16/12     16/12     16/12     16/12     16/12     16/12
RRDML-SD           368/325   306/288   255/258   215/234   185/216   167/205
RRDML-KB           193/166   193/166   193/166   193/166   193/166   193/166
K-best             460/680   460/680   460/680   460/680   460/680   460/680
SD                 652/748   549/687   474/642   425/613   390/592   371/581
5.2. Complexity Analysis
The hardware gate complexity of RRDML is slightly higher than that of conventional ML-based detectors due to the additional MMSE detector and auxiliary blocks described in Sec. 5.1. However, the expected number of nodes visited in the ML-based stage decreases exponentially with each dimension removed. This alone accounts for a reduction of more than 40% in computational complexity. The average numbers of multiplications and additions for various detectors are compared in Table 1.
6. SIMULATION RESULTS
This section shows the performance achieved by RRDML
along with its power savings. A 4 × 4 MIMO system with
16-QAM modulation is simulated.
Fig. 7. BER vs. SNR for various detector architectures.
[Plot: BER vs. SNR (dB) for ZF, MMSE, RRDML-KB, RRDML-SD, SD, and K-best (K = 5).]
6.1. BER Performance
The bit error rate (BER) performance of RRDML is given in Fig. 7, with comparisons to conventional SD, conventional K-best, ZF, and MMSE. In Fig. 5(b), it can be seen that RRDML-KB with n1 = 3 achieves almost identical performance to K-best. Based on this figure, the condition number thresholds (τ_{C(H)}) for RRDML-KB were chosen as 4, 8, 15, and ∞ (K-best-only is not used). The thresholds for RRDML-SD were chosen using a similar plot, and the values are 4, 6, 8, and 15. Within the SNR range of interest (15 dB to 20 dB), it can be seen that RRDML has minimal performance degradation compared to ML-only detectors.
6.2. Power Savings
The power is estimated by counting the operations given in Table 1 and scaling each count by the power consumption of the corresponding operation. The power consumption of a multiplier and an adder is obtained through HSPICE simulations using a 1.2 V, 45 nm CMOS process at 150 MHz and is estimated to be 460 μW and 30.7 μW, respectively. Fig. 8 shows the resulting power consumption. The power of SD-based architectures decreases as SNR increases because of their variable complexity. RRDML-SD achieves power savings of 44% to 55% compared to SD for SNRs from 10 dB to 25 dB, while RRDML-KB achieves 59% savings compared to K-best.
Fig. 8. Power consumption of RRDML and ML-based detectors.
[Plot: power vs. SNR (dB) for RRDML-KB, RRDML-SD, SD, and K-best (K = 5), with the 59%, 44%, and 55% savings annotated.]
Fig. 9. Power vs. τ_{C(H)} for a restricted RRDML with n1 = 2 or 4 at SNR of 20 dB.
The RRDML-KB MIMO receiver was implemented in Verilog with a precision of 15b for the received vector y and 12b for the channel coefficients H, and synthesized using a 45 nm standard cell library with Synopsys Design Compiler. PrimeTime was used to estimate the power consumption at a frequency of 150 MHz. At an SNR of 20 dB, the power consumption of RRDML-KB is 82.4 mW while that of conventional K-best is 218.7 mW, which represents a savings of 62.3%. There is a 6% difference from the values obtained in Fig. 8. The BER loss was only 3.7%.
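The operation-count estimate described in this section can be reproduced from Table 1 and the two per-operation power figures. The sketch below is an illustration only: the helper names `power_mw` and `savings` are hypothetical, and it assumes the estimate is simply the operation counts multiplied by the per-operation power, with no other overheads.

```python
MULT_UW = 460.0   # multiplier power in microwatts (HSPICE figure from Section 6.2)
ADD_UW = 30.7     # adder power in microwatts (HSPICE figure from Section 6.2)

# (multiplications, additions) per symbol detection, taken from Table 1.
OPS = {
    "RRDML-KB": (193, 166),
    "K-best": (460, 680),
    "RRDML-SD @ 10 dB": (368, 325),
    "SD @ 10 dB": (652, 748),
}


def power_mw(mults, adds):
    """Operation-count power estimate in milliwatts."""
    return (mults * MULT_UW + adds * ADD_UW) / 1000.0


def savings(p_new, p_ref):
    """Fractional power savings of p_new relative to p_ref."""
    return 1.0 - p_new / p_ref


est = {name: power_mw(*ops) for name, ops in OPS.items()}
kb_savings = savings(est["RRDML-KB"], est["K-best"])
sd_savings = savings(est["RRDML-SD @ 10 dB"], est["SD @ 10 dB"])
synth_savings = savings(82.4, 218.7)   # synthesized power values reported above

for name, p in est.items():
    print(f"{name}: {p:.1f} mW")
print(f"{kb_savings:.1%} {sd_savings:.1%} {synth_savings:.1%}")
```

Under these assumptions the estimates come out at roughly 59.6% savings for RRDML-KB over K-best and 44.5% for RRDML-SD over SD at 10 dB, in line with the 59% and 44% quoted from Fig. 8, while the synthesized values give the reported 62.3%.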
To illustrate how the threshold τ_{C(H)} affects power consumption, a simplified RRDML was constructed which uses only a single threshold, with a two-dimension ML-based decoding stage (n1 = 2) for good channels and ML-only (n1 = 4) for bad channels. Fig. 9 shows the power consumption vs. τ_{C(H)} at an SNR of 20 dB. This figure shows the power/performance tradeoff that RRDML provides. The power graph resembles CDF(C(H)) in Fig. 5(a). This is because power depends linearly on the CDF of C(H), as in (7), so accurate power estimates can be predicted from τ_{C(H)} (using Fig. 5(a)). This is expected even for SD, where the expected number of nodes that the SD algorithm visits depends on C(H) as well as SNR.
7. CONCLUSION
We presented a reconfigurable MIMO detector based on a reduced dimension maximum likelihood (RDML) detector and the condition number of a MIMO channel. Simulations show that RRDML has minimal loss in BER while achieving up to 62.3% power savings compared to an ML-only detector. Future work includes error-resilient RRDML receivers, which are expected to provide further power savings by trading increased resiliency for power (by means of voltage reduction). An integrated circuit implementation of RRDML is warranted as well.
8. REFERENCES
[1] J. W. Choi, B. Shim, A. Singer, and N. I. Cho, “Low-complexity decoding via reduced dimension maximum-likelihood search,” IEEE Trans. Signal Process., vol. 58, no. 3, pp. 1780–1793, Mar. 2010.
[2] S. Chen and T. Zhang, “Low power soft-output signal detector design for wireless MIMO communication systems,” in Proc. IEEE ISLPED, 2007, pp. 232–237.
[3] J.-H. Lin and K. Parhi, “Low complexity iterative joint detection, decoding, and channel estimation for wireless MIMO system,” in Proc. IEEE SIPS, Oct. 2006, pp. 45–50.
[4] R. Jenkal and R. Davis, “An architecture for energy efficient sphere decoding,” in Proc. IEEE ISLPED, Aug. 2010, pp. 244–249.
[5] Evolved Universal Terrestrial Radio Access (E-UTRA); LTE physical layer; General description, The 3rd Generation Partnership Project (3GPP) Std. 3GPP TS 36.201, Rev. 10.0.0, Dec. 2010.
[6] J. M. Rabaey, “Low-power silicon architecture for wireless communications: embedded tutorial,” in Proc. of the 2000 ASP-DAC, ser. ASP-DAC ’00. New York, NY, USA: ACM, 2000, pp. 377–380.
[7] P. Wolniansky, G. Foschini, G. Golden, and R. Valenzuela, “V-BLAST: An architecture for realizing very high data rates over the rich-scattering wireless channel,” in Proc. URSI ISSSE, 1998, pp. 295–300.
[8] Z. Guo and P. Nilsson, “VLSI implementation issues of lattice decoders for MIMO systems,” in Proc. IEEE ISCAS, May 2004, pp. IV-477–480.
[9] M. Rupp, “On the influence of uncertainties in MIMO decoding algorithms,” in 36th Asilomar Conference on Signals, Systems and Computers, 2002, pp. 570–574.
[10] A. Burg, M. Borgmann, M. Wenk, M. Zellweger, W. Fichtner, and H. Bolcskei, “VLSI implementation of MIMO detection using the sphere decoding algorithm,” IEEE J. Solid-State Circuits, vol. 40, no. 7, pp. 1566–1577, Jul. 2005.
[11] K. Wong, C. Tsui, R. Cheng, and W. Mow, “A VLSI architecture of a K-best lattice decoding algorithm for MIMO channels,” in Proc. IEEE ISCAS, Aug. 2002.