Advanced Wireless Receivers: Algorithmic and Architectural Optimizations
RICE UNIVERSITY
Advanced Wireless Receivers: Algorithmic and Architectural Optimizations
Suman Das
Rice University
Department of Electrical and Computer Engineering &
Center for Multimedia Communication
Introduction
Wireless is one of the fastest growing industries
[Figure: millions of cell-phone users per year, 1993 to 2001; vertical axis 0 to 700. Source: Ericsson]
“By 2002, a lot more cellular phones are going to have internet access than PCs.” Larry Ellison, CEO, Oracle.
Ubiquitous wireless connectivity
Wireless Cellular
Wireless LAN
Bluetooth/Home Networks
Ad-hoc Network
Why advanced receiver algorithms?
The number of wireless subscribers is growing
Multimedia data is replacing voice traffic
Higher and varied data rates (144 Kbps to 2 Mbps)
Stricter quality of service (QoS)
Wireless bandwidth remains a critical resource
Current generation receivers are suboptimal
Performance of advanced receivers
[Figure: bit error rate (log scale) versus SNR (dB) for the current receiver, the advanced receiver, and the theoretical limit]
Huge performance improvement
Computational requirements of advanced receivers
A 15-user system transmitting at 0.5 Mbps needs
~20 billion additions per second
~15 billion multiplications per second
Requires 32-bit floating-point precision
50 floating-point DSPs running at 200 MHz to sustain the computation!
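As a rough sanity check on these figures, the sketch below works out how many arithmetic operations each DSP would have to sustain per clock cycle. It assumes additions and multiplications are simply summed and spread evenly over the DSPs, which the slide does not state.

```python
# Figures quoted on the slide.
additions_per_s = 20e9
multiplications_per_s = 15e9
num_dsps = 50
clock_hz = 200e6

# Assumption: all operations count equally and are split evenly across DSPs.
total_ops_per_s = additions_per_s + multiplications_per_s
ops_per_dsp_per_cycle = total_ops_per_s / (num_dsps * clock_hz)
print(ops_per_dsp_per_cycle)  # operations each DSP must complete per cycle
```

Under these assumptions every DSP would need several parallel functional units kept busy on every cycle.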
My research
Receiver design
High performance
Low complexity
Approach
Algorithmic simplification
Efficient architectural mapping
Wireless channel model
[Figure: User 1 and User 2 reach the base station over a direct path and reflected paths, with added noise]
Channel Effects
Background noise
Fading
Multiple paths
Multiple Users
Multiple Access Interference (MAI)
Code Division Multiple Access (CDMA)
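[Figure: spreading waveform S(t) versus time; one bit spans seven chips (spreading gain = 7), with example chip sequence -1 -1 1 -1 1 1 1]
Wideband CDMA - technology of choice
Users distinguished by spreading sequence
Received signal:
$$r(t) = \sum_{k=1}^{K} \sum_{p=1}^{P} w_{k,p}\, s_k(t - iT - \tau_{k,p})\, b_k(i) + n(t)$$
K: number of users; P: number of paths; w: attenuation; \tau: delay; b: data bits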
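To make the spreading idea concrete, here is a minimal sketch for one user, one path, and no noise (all simplifying assumptions). It uses the seven-chip sequence shown on the slide; the function names `spread` and `despread` are illustrative only.

```python
CODE = [-1, -1, 1, -1, 1, 1, 1]  # spreading sequence from the slide (gain = 7)

def spread(bits, code=CODE):
    """Replace every data bit (+1/-1) by the bit times the chip sequence."""
    return [b * c for b in bits for c in code]

def despread(chips, code=CODE):
    """Correlate each block of len(code) chips with the code and take the sign."""
    n = len(code)
    bits = []
    for start in range(0, len(chips), n):
        corr = sum(x * c for x, c in zip(chips[start:start + n], code))
        bits.append(1 if corr >= 0 else -1)
    return bits

tx = spread([1, -1])   # 14 chips for two bits
rx = despread(tx)      # recovers the two bits in this noise-free case
print(tx, rx)
```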
CDMA system
Proposed advanced/multiuser receiver modules
Designed in isolation
Suboptimal design
[Block diagram. Transmitter: data -> ENCODING -> SPREADING -> MODULATION, combined with OTHER USERS. Receiver: DEMODULATION -> DETECTION -> DECODING, with CHANNEL ESTIMATION; output is the detected bits of all K users]
Integrated receiver design
Joint channel estimation and detection
Joint detection and decoding
[Block diagram. Receiver: DEMODULATION -> DETECTION -> DECODING, with CHANNEL ESTIMATION; output is the detected bits of all K users]
Why separate channel estimation and detection?
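[Block diagram: received signal -> chip-matched filter -> channel estimation; received signal -> code-matched filter -> detection]
[Timing diagram: observation r_i overlaps bits b_i and b_{i+1} because of the delay; processing window for channel estimation]
Operate on different statistics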
Towards an integrated solution
Reuse computation from channel estimation step
Use same discretized filter output
Avoid alignment to bit interval of each user
Reduce computation
Save hardware
Components of the observation vector
Transmitted chips: -1 -1 1 -1 1 1 1 | 1 1 -1 1 -1 -1 -1 (bit i = +1, bit i+1 = -1)
After attenuation w_{k,p} and delay, the observation window is
w_{k,p} [1 1 1 0 0 0 0] + (-w_{k,p}) [0 0 0 -1 -1 1 -1]
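The decomposition above can be reproduced in a few lines. This is a sketch only: the 3-chip delay is inferred from the positions of the zeros on the slide, the attenuation value 0.5 is an arbitrary stand-in for w_{k,p}, and the helper names are hypothetical.

```python
CODE = [-1, -1, 1, -1, 1, 1, 1]  # spreading sequence from the slide

def split_code(delay, code=CODE):
    """Return the two window components for a path delayed by `delay` chips."""
    n = len(code)
    tail = code[n - delay:] + [0] * (n - delay)  # end of bit i
    head = [0] * delay + code[:n - delay]        # start of bit i+1
    return tail, head

def observation_window(bit_i, bit_next, w, delay):
    """One bit-length window: w * (bit_i * tail + bit_next * head)."""
    tail, head = split_code(delay)
    return [w * (bit_i * t + bit_next * h) for t, h in zip(tail, head)]

tail, head = split_code(3)                    # assumed delay of 3 chips
window = observation_window(+1, -1, 0.5, 3)   # assumed attenuation 0.5
print(tail, head, window)
```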
Matrix representation
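Transmitted chips: -1 -1 1 -1 1 1 1 | -1 -1 1 -1 1 1 1 (bit i = +1, bit i+1 = +1)
Contribution of user k: U_k Z_k b_k(i) + other users
[Matrix diagram, labeled U_k Z_k: a column vector of six zeros and one entry w_{k,p}, and a 7 x 7 pattern of 0s and 1s:
0 0 0 0 0 0 1
0 0 0 0 0 1 1
0 0 0 0 1 1 1
0 0 0 1 1 1 1
0 0 1 1 1 1 1
0 1 1 1 1 1 1
1 1 1 1 1 1 1]
r = U Z b_preamble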
Efficient statistics
Parametric approach
Build channel model (number of paths)
Estimate delay, attenuation
Produce the code matched filter output
Our approach
Estimate effective spreading code (UZ)
Code-matched filter: y = (UZ)^T r
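The slide does not say how the effective spreading code is estimated, so the following is only one possible sketch. It assumes known preamble bits that are mutually orthogonal across users, one bit per observation window, and no noise, and it estimates each column of UZ by averaging r_i * b_k(i) over the preamble. All names and numbers are illustrative.

```python
def estimate_effective_codes(received, preamble):
    """received: L observation vectors (length N each).
    preamble: L known bit vectors (length K each, entries +1/-1).
    Returns an N x K estimate of UZ by averaging r_i * b_k(i)."""
    L, N, K = len(received), len(received[0]), len(preamble[0])
    return [[sum(received[i][n] * preamble[i][k] for i in range(L)) / L
             for k in range(K)] for n in range(N)]

def matched_filter(uz, r):
    """Code-matched filter output y = (UZ)^T r."""
    return [sum(uz[n][k] * r[n] for n in range(len(r)))
            for k in range(len(uz[0]))]

# Hypothetical 4-chip, 2-user effective codes (columns of UZ).
UZ_TRUE = [[0.5, 0.25], [0.5, -0.25], [-0.5, 0.25], [0.5, 0.25]]
PREAMBLE = [[1, 1], [1, -1], [1, 1], [1, -1]]  # orthogonal across users

def channel(bits):
    return [sum(row[k] * bits[k] for k in range(2)) for row in UZ_TRUE]

uz_hat = estimate_effective_codes([channel(b) for b in PREAMBLE], PREAMBLE)
y = matched_filter(uz_hat, channel([1, -1]))
print(uz_hat, y)
```

No delay or attenuation is extracted explicitly; the estimate of UZ feeds the matched filter directly, which is the point the slide makes.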
Simulation parameters
System parameters
15 users
3 paths
Spreading gain: 31
Hardware platform
TI C62 and C67 EVM boards
64 KB each of internal program and data memory
256 KB SBSRAM, 8 MB SDRAM (external)
Code Composer 1.0 to profile code
Effectiveness of integrated design
[Figure: bit error rate (10^-4 to 10^0) versus SNR (-4 to 16 dB). Curve labels: Multiuser, Parametric approach, UZ approach, Actual Parameters, Single User]
2 dB gain in performance
Computational savings
Avoid extraction of actual channel parameters
Avoid realignment of data for code-matched filtering
Reduce intermediate storage requirement
Avoid divisions (28 cycles) and square roots (38 cycles) on the DSP.
Fixed point behavior
Fixed point advantages
Speed, power, cost
Fixed point analysis
12 bits of precision required instead of 32 bits!
Pack two 16-bit operations into 32-bit registers
More packing with
• Saturation arithmetic
• User power control!
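The packing idea can be emulated in plain Python. This is a sketch of the concept only, not the DSP's actual instructions: two signed 16-bit values share one 32-bit word, and each half is added with saturation instead of wrap-around. Function names are hypothetical.

```python
def saturate16(x):
    """Clamp to the signed 16-bit range instead of wrapping around."""
    return max(-32768, min(32767, x))

def pack2x16(hi, lo):
    """Store two signed 16-bit values in one 32-bit word."""
    return ((hi & 0xFFFF) << 16) | (lo & 0xFFFF)

def unpack2x16(word):
    """Recover the two signed 16-bit halves of a 32-bit word."""
    def signed(v):
        return v - 0x10000 if v & 0x8000 else v
    return signed((word >> 16) & 0xFFFF), signed(word & 0xFFFF)

def add2_saturating(a, b):
    """Add both 16-bit halves of two packed words, saturating each half."""
    a_hi, a_lo = unpack2x16(a)
    b_hi, b_lo = unpack2x16(b)
    return pack2x16(saturate16(a_hi + b_hi), saturate16(a_lo + b_lo))

x = pack2x16(30000, -5)
y = pack2x16(10000, -7)
result = unpack2x16(add2_saturating(x, y))
print(result)  # upper half saturates, lower half adds normally
```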
Time requirement
[Figure: normalized time (axis 0 to 100). Original (baseline); Unified Synch + Detect: 68.5; 16-bit fixed-point: 41.8]
2.39x speedup
Integrated receiver design
Joint channel estimation and detection
Effective spreading code approach
Optimized detector design
Joint detection and decoding
[Block diagram. Receiver: DEMODULATION -> DETECTION -> DECODING, with CHANNEL ESTIMATION; output is the detected bits of all K users]
Linear multiuser detector
N: block length; K: number of users
Received signal: r = (UZ) b + n
Channel estimation: (UZ)
Matched filter output: y = (UZ)^T r
Linear detector: solve R b + n = y
R = (UZ)^T (UZ)
Size of the linear system: NK
Direct inverse takes O((NK)^3) operations
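A small worked instance of the linear detector, as a sketch: two users, a single bit interval, no noise, and made-up effective codes in which one user is much stronger than the other. The plain Gaussian elimination below stands in for the direct inverse; all names and values are assumptions for illustration.

```python
def solve(a, y):
    """Solve a x = y by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [y[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

# Hypothetical effective codes: user 1 is strong, user 2 is weak.
UZ = [[4, 1], [4, 1], [4, 1], [4, -1]]
b_true = [-1, 1]
r = [sum(row[k] * b_true[k] for k in range(2)) for row in UZ]

y = [sum(UZ[n][k] * r[n] for n in range(4)) for k in range(2)]   # (UZ)^T r
R = [[sum(UZ[n][i] * UZ[n][j] for n in range(4)) for j in range(2)]
     for i in range(2)]                                            # (UZ)^T UZ
b_hat = solve(R, y)
print(y, b_hat)
```

In this example the sign of the matched filter output for the weak user is wrong, while solving R b = y recovers both bits.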
Outline of the Kronecker algorithm
Kronecker representation
Isolates the structure and the matrix blocks
Fourier transform converts it to a block-diagonal system
Computationally optimal
Correlation matrix is block-Toeplitz
Approximate it as a block-circulant system
Solve N independent order-K systems iteratively
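Why the circulant approximation helps can be seen in the scalar case (K = 1, an assumption made here to keep the sketch short): the Fourier transform diagonalizes a circulant matrix, so the system reduces to N independent divisions. The naive DFT below is used for clarity; an FFT would give the N log N cost. This is not the author's Kronecker algorithm, only the underlying property.

```python
import cmath

def dft(x, inverse=False):
    """Naive discrete Fourier transform (O(n^2)), forward or inverse."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[k] * cmath.exp(sign * 2j * cmath.pi * m * k / n)
               for k in range(n)) for m in range(n)]
    return [v / n for v in out] if inverse else out

def circulant_matvec(c, x):
    """Multiply the circulant matrix with first column c by x."""
    n = len(c)
    return [sum(c[(i - j) % n] * x[j] for j in range(n)) for i in range(n)]

def solve_circulant(c, y):
    """Solve C x = y in the Fourier domain: one division per frequency."""
    C, Y = dft(c), dft(y)
    return [v.real for v in dft([yk / ck for yk, ck in zip(Y, C)],
                                inverse=True)]

c = [4.0, 1.0, 0.0, 1.0]          # hypothetical circulant correlation column
x_true = [1.0, -1.0, 1.0, 1.0]
x_hat = solve_circulant(c, circulant_matvec(c, x_true))
print(x_hat)
```

With K x K blocks in place of scalars, the same transform leaves N independent order-K systems, as stated on the slide.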
Speedup in detector
Complexity: O(N^2 K^3) vs. O(N K^2 + K N log N)
[Figure: achievable data rate (Kbps). Decorrelator: 10.4 Kbps; Kronecker: 83.1 Kbps]
Pipelining and parallelization
Mostly matrix based operations
Detector - iterative algorithm
Pipeline various iterations
Parallelize operations
Add more functional units
Distribute data across functional units
Distribute computations
Projected computation time
[Figure: achievable data rate (Kbps, axis 0 to 600) of the base multiuser algorithm on a DSP only and with DSP + coprocessor support (hardware pipelining; pipelining + parallelization with 30 adders and multipliers). Labeled values: 20.75 Kbps, 154.3 Kbps, 564.5 Kbps]
Integrated receiver design
Joint channel estimation and detection
Effective spreading code approach
Optimized detector design
Joint detection and decoding
[Block diagram. Receiver: DEMODULATION -> DETECTION -> DECODING, with CHANNEL ESTIMATION; output is the detected bits of all K users]
Maximum a-posteriori (MAP) decoding
Received signal: r = UZd + n
Optimum decoding rule:
$$\hat{d} = \arg\max_{d \in C} \log p(d \mid r)$$
Constrained optimization problem
Decode all users simultaneously
Exponential complexity in number of users
[Block diagram. Transmitter: data bits b -> ENCODING -> coded bits d -> SPREADING -> MODULATION, combined with OTHER USERS]
Single-user detection and decoding
Suboptimum alternatives
Isolate detection and decoding
[Block diagram: r feeds K matched filters; MF k -> y_k -> Decoder k -> \hat{b}_k, for k = 1, ..., K]
Decoding matched filter outputs
[Figure: BER (10^-4 to 10^0) versus SNR (1 to 8 dB) for MF + Viterbi and Optimal]
Huge performance loss!
Iterative detection and decoding
User of concern c, interfering users I
r = (UZ)_c d_c + (UZ)_I d_I + z
Estimate d_I
Eliminate interference: y_c = (UZ)_c^T (r - (UZ)_I d_I)
Estimate d_c for the next step:
$$\hat{d}_c = \arg\max_{d_c \in C_c} \log p(d_c \mid y_c) = \arg\max_{d_c \in C_c} \left[ \log p(y_c \mid d_c) + \log p(d_c) \right]$$
Complexity linear in number of users
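The interference-elimination step can be sketched for two users. This example covers only the cancellation, with hard decisions, no noise, and no code constraint or prior (the decoder part of the slide is left out); the effective codes are made-up values in which the interferer is much stronger than the user of concern.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sign(x):
    return 1 if x >= 0 else -1

# Hypothetical effective codes (columns of UZ) for the two users.
uz_c = [1, 1, 1, -1]   # user of concern (weak)
uz_i = [4, 4, 4, 4]    # interfering user (strong)
d_c, d_i = 1, -1       # transmitted symbols

r = [a * d_c + b * d_i for a, b in zip(uz_c, uz_i)]

# Without cancellation the matched filter for user c is swamped.
mf_only = sign(dot(uz_c, r))

# Estimate d_I, subtract its contribution, then matched-filter for user c.
d_i_hat = sign(dot(uz_i, r))
residual = [x - b * d_i_hat for x, b in zip(r, uz_i)]
y_c = dot(uz_c, residual)          # y_c = (UZ)_c^T (r - (UZ)_I d_I)
d_c_hat = sign(y_c)
print(mf_only, d_i_hat, y_c, d_c_hat)
```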
Reduction in decoding complexity
Convolutional code
Coded bits depend on past data bits
Performance improves with memory length
Viterbi algorithm for decoding
Complexity exponential in memory length
Our suboptimal approach
Maximal weight basis decoding
Complexity quadratic in memory length
Joint detection and decoding performance
[Figure: BER (10^-5 to 10^0) versus SNR (1 to 8 dB) for MF + Viterbi, Iter1 + Subopt, Iter1 + Viterbi, Iter3 + Subopt, and Optimal; rate = 1/2, memory length = 7]
Joint detection and decoding
Huge performance gain.
Suboptimal approximation -
Insignificant performance loss
Significant computational gain
Architecture for suboptimal decoding?
Viterbi algorithm - butterfly architecture
Have a sliding window implementation
Summary of contributions
Integrated channel estimation and detection model [WCNC]
Optimized detection algorithm [PIMRC, Tr. Com]
Fixed point implementation [ICASSP, SPIE]
Parallel architecture [Asilomar]
Joint detection and decoding [Globecom,Tr. Com]
Suboptimal decoding algorithm [Asilomar, Tr. Inf. Th.]
Future research
Wireless Cellular
Wireless LAN
Bluetooth/Home Networks
Ad-hoc Network
Universal wireless receiver
Reconfigurable solution
Power efficient
Automate design?
Network level interaction
Resource allocation
Quality of service guarantee
Application level interaction
Further details
http://www.ece.rice.edu/~suman
http://www.ece.rice.edu/CMC
Convolutional codes
[Encoder diagram: data bits b produce the coded bits d_odd and d_even]
Rate: 1/2, memory: 2
d2 = d1
d4 = d1 + d3
d6 = d1 + d3 + d5
d8 = d3 + d5 + d7
d10 = d5 + d7 + d9
d_odd: systematic bits
d_even: parity bits
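The parity equations above translate directly into an encoder. The sketch assumes bits are 0/1, "+" means addition modulo 2, and terms before the start of the block are zero; the function name `encode` is illustrative.

```python
def encode(b):
    """Rate-1/2 systematic encoder with memory 2.

    b: list of 0/1 data bits. Returns [d1, d2, ..., d_2n] where odd
    positions carry the data bits and each even position is the modulo-2
    sum of the current and the two previous data bits.
    """
    d = []
    for j, bit in enumerate(b):
        prev1 = b[j - 1] if j >= 1 else 0
        prev2 = b[j - 2] if j >= 2 else 0
        d.append(bit)                   # systematic bit (odd position)
        d.append(bit ^ prev1 ^ prev2)   # parity bit (even position)
    return d

codeword = encode([1, 0, 1, 1, 0])
print(codeword)
```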
Suboptimal single user channel decoder
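y = (y1, ..., yN)
d = (d1, ..., dN)
$$\hat{d}_c = \arg\max_{d_c \in C_c} \left[ \log p(y_c \mid d_c) + \log p(d_c) \right]$$
$$\hat{d} = \arg\max_{d \in C} \log p(y \mid d)$$
Viterbi algorithm:
Complexity grows exponentially with memory length
Without the codeword constraint, d = sgn(y)
Estimated d may not be a codeword!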
Maximum weight basis decoding
$$\hat{d} = \arg\max_{d \in C} \log p(y \mid d)$$
d2 = d1
d4 = d1 + d3
d6 = d1 + d3 + d5
d8 = d3 + d5 + d7
d10 = d5 + d7 + d9
More variables than equations
NR independent variables (N: block length, R: rate)
Choice depends on y_i
y = 7.5: d = 1
y = -4.5: d = -1
y = 0.5: d = ?
Want to choose a maximally independent subset with the largest total weight
Selection of maximally independent subset
Set I = ∅ (empty set)
Given y, sort the weights |y_i|, i = 1..N
While |I| < NR
Choose the location e ∉ I from {1..N} with the largest weight such that I ∪ {e} is still an independent subset of {1..N}
Set I = I ∪ {e}
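A sketch of this greedy selection for the rate-1/2, memory-2 code shown earlier. It assumes that "independent" means the corresponding generator-matrix columns are linearly independent over GF(2); columns are stored as bitmasks over the data bits. The soft values in `y` are made up, and all function names are hypothetical.

```python
def generator_columns(num_data_bits):
    """Column j of the generator matrix, as a bitmask over the data bits."""
    cols = []
    for j in range(num_data_bits):
        cols.append(1 << j)              # systematic position
        parity = 1 << j
        if j >= 1:
            parity |= 1 << (j - 1)
        if j >= 2:
            parity |= 1 << (j - 2)
        cols.append(parity)              # parity position
    return cols

def reduce_gf2(vec, basis):
    """Reduce vec against basis (dict: leading bit -> vector) over GF(2)."""
    while vec:
        lead = vec.bit_length() - 1
        if lead not in basis:
            return vec
        vec ^= basis[lead]
    return 0

def select_max_weight_basis(y, columns, size):
    """Greedily pick `size` independent positions (1-indexed) by |y_i|."""
    order = sorted(range(len(y)), key=lambda i: -abs(y[i]))
    basis, chosen = {}, []
    for i in order:
        v = reduce_gf2(columns[i], basis)
        if v:                            # still independent of the chosen set
            basis[v.bit_length() - 1] = v
            chosen.append(i + 1)
            if len(chosen) == size:
                break
    return chosen

y = [0.5, 7.5, -4.5, 3.0, 0.2, -6.0, 1.0, 2.5, -0.1, 5.0]
subset = select_max_weight_basis(y, generator_columns(5), 5)
print(subset)
```

Position 4 is skipped in this example even though its weight is the fifth largest, because d4 is already determined by positions 2 and 3.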
Suboptimal decoding algorithm
Choose M maximum-weight independent subsets
For each independent subset I
Compute the codeword d_I, with d_e = sgn(y_e) for e ∈ I
Compute the likelihood p(y | d_I)
Choose the codeword with the largest likelihood
Decoding complexity reduced from exponential to quadratic in memory length
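The steps above can be sketched for the small rate-1/2, memory-2 code. Several choices here are assumptions for illustration: bit 0 maps to +1 and bit 1 to -1, the likelihood is ranked by the correlation between y and the +/-1 codeword (which matches maximum likelihood for additive Gaussian noise), and the matching codeword is found by brute force over the data words, whereas a practical decoder would solve the NR equations directly. The two candidate subsets are supplied by hand.

```python
from itertools import product

def encode(b):
    """Rate-1/2, memory-2 systematic encoder (0/1 bits, modulo-2 sums)."""
    d = []
    for j, bit in enumerate(b):
        prev1 = b[j - 1] if j >= 1 else 0
        prev2 = b[j - 2] if j >= 2 else 0
        d += [bit, bit ^ prev1 ^ prev2]
    return d

def codeword_from_subset(y, subset, num_data_bits):
    """Codeword whose bits agree with sgn(y_e) on the 1-indexed subset."""
    target = {e: (0 if y[e - 1] >= 0 else 1) for e in subset}
    for b in product((0, 1), repeat=num_data_bits):   # toy brute force
        d = encode(list(b))
        if all(d[e - 1] == bit for e, bit in target.items()):
            return d
    return None

def correlation(y, d):
    """Metric used to rank candidates: sum of y_i times (+1 or -1)."""
    return sum(yi * (1 - 2 * di) for yi, di in zip(y, d))

def decode(y, subsets, num_data_bits):
    candidates = [codeword_from_subset(y, s, num_data_bits) for s in subsets]
    return max((c for c in candidates if c is not None),
               key=lambda c: correlation(y, c))

y = [0.5, 7.5, -4.5, 3.0, 0.2, -6.0, 1.0, 2.5, -0.1, 5.0]
subsets = [[2, 6, 10, 3, 8], [2, 6, 10, 3, 7]]   # M = 2 independent subsets
best = decode(y, subsets, 5)
print(best)
```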
Performance improvement
[Figure: BER (10^-4 to 10^0) versus SNR (1 to 8 dB) for MF + MAP, 2-stage + MAP, and Single User]
Performance approaches single-user bound