Advanced Wireless Receivers: Algorithmic and Architectural Optimizations

RICE UNIVERSITY


Suman Das

Rice University

Department of Electrical and Computer Engineering &

Center for Multimedia Communication


Introduction

Wireless is one of the fastest-growing industries

[Figure: millions of cell-phone users by year, 1993 to 2001 (y-axis 0 to 700 million). Source: Ericsson]

“By 2002, a lot more cellular phones are going to have internet access than PCs.” (Larry Ellison, CEO, Oracle)


Ubiquitous wireless connectivity

Wireless Cellular

Wireless LAN

Bluetooth/Home Networks

Ad-hoc Network


Why advanced receiver algorithms?

The number of wireless subscribers is growing

Multimedia data is replacing voice traffic

Higher and varied data rates (144 Kbps to 2 Mbps)

Stricter quality of service (QoS) requirements

Wireless bandwidth remains a critical resource

Current-generation receivers are suboptimal


Performance of advanced receivers

[Figure: bit error rate (10^-4 to 10^0) vs. SNR (dB, 6 to 16) for the current receiver, an advanced receiver, and the theoretical limit]

Huge performance improvement


Computational requirements of advanced receivers

A 15-user system transmitting at 0.5 Mbps needs

~20 billion additions per second

~15 billion multiplications per second

Requires 32-bit floating-point precision

50 floating-point DSPs running at 200 MHz would be needed to sustain the computation!
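A quick back-of-the-envelope check of these figures, assuming the work divides evenly across the 50 DSPs and that each addition or multiplication counts as one operation (both are assumptions, not stated on the slide):

```python
# Figures taken from the slide; even load balancing across DSPs is an assumption.
additions_per_s = 20e9        # ~20 billion additions per second
multiplications_per_s = 15e9  # ~15 billion multiplications per second
num_dsps = 50
clock_hz = 200e6              # 200 MHz

total_ops_per_s = additions_per_s + multiplications_per_s
ops_per_dsp_per_cycle = total_ops_per_s / (num_dsps * clock_hz)
print(f"{ops_per_dsp_per_cycle:.1f} floating-point operations per DSP per cycle")
```

That works out to roughly 3.5 floating-point operations on every clock cycle of every DSP, sustained.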


My research

Receiver design

High performance

Low complexity

Approach

Algorithmic simplification

Efficient architectural mapping


Wireless channel model

[Figure: User 1 and User 2 transmit to the base station over a direct path and reflected paths, with added noise]

Channel Effects

Background noise

Fading

Multiple paths

Multiple users: multiple access interference (MAI)


Code Division Multiple Access (CDMA)

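[Figure: spreading waveform S(t) over time; each data bit spans 7 chips of the spreading sequence -1 -1 1 -1 1 1 1, so the spreading gain is 7]

Wideband CDMA: the technology of choice

Users are distinguished by their spreading sequences

Received signal:

$$ r(t) = \sum_{k=1}^{K} \sum_{p=1}^{P} w_{k,p} \, b_k(i) \, s_k(t - iT - \tau_{k,p}) + n(t) $$

K: number of users; P: number of paths; w: attenuation; τ: delay; b: data bits; s_k: spreading waveform of user k; T: bit duration; n(t): noise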
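A minimal sketch of direct-sequence spreading and despreading, assuming synchronous users, a single path, and no noise. CODE_1 is the 7-chip sequence from the figure; CODE_2 is a hypothetical second user's sequence chosen only for illustration. Correlating with CODE_1 gives a peak of ±7 (the spreading gain) plus a small cross-correlation term from user 2, which is the multiple access interference.

```python
# Synchronous, single-path, noise-free toy model (simplifying assumptions).
CODE_1 = [-1, -1, 1, -1, 1, 1, 1]   # 7-chip sequence from the slide
CODE_2 = [1, 1, 1, -1, -1, 1, -1]   # hypothetical second user's sequence


def spread(bits, code):
    """Multiply each +/-1 data bit by every chip of the user's spreading sequence."""
    return [b * c for b in bits for c in code]


def despread(chips, code):
    """Correlate each bit-length block of chips with the user's spreading sequence."""
    g = len(code)  # spreading gain
    return [
        sum(x * c for x, c in zip(chips[i * g:(i + 1) * g], code))
        for i in range(len(chips) // g)
    ]


user1_bits = [1, -1, 1]
user2_bits = [1, 1, -1]
received = [a + b for a, b in zip(spread(user1_bits, CODE_1), spread(user2_bits, CODE_2))]

stats = despread(received, CODE_1)   # [6, -8, 8]: peak of +/-7 plus cross-correlation of -1
decisions = [1 if s >= 0 else -1 for s in stats]
print(stats, decisions)
```

Here the cross-correlation between the two sequences is only -1, so user 1's bits are still recovered; with more or stronger interferers this term grows, which is what multiuser detection addresses.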


CDMA system

Proposed advanced (multiuser) receiver modules are designed in isolation

The result is a suboptimal overall design

[Figure: block diagram. Transmitter: encoding → spreading → modulation, alongside other users' data. Receiver: demodulation → channel estimation → detection → decoding, producing the detected bits of all K users]


Integrated receiver design

Joint channel estimation and detection

Joint detection and decoding





Why separate channel estimation and detection?

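[Figure: the received signal feeds a chip-matched filter for channel estimation and a code-matched filter for detection. Because of the user's delay, the channel-estimation processing window r_i spans parts of bits b_i and b_{i+1} rather than lining up with one bit]

Channel estimation and detection operate on different statistics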


Towards an integrated solution

Reuse computation from the channel estimation step

Use the same discretized filter output

Avoid aligning to the bit interval of each user

Reduce computation

Save hardware


Components of the observation vector

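[Figure: user k sends bit i = +1 and bit i+1 = -1, giving the chip stream -1 -1 1 -1 1 1 1 1 1 -1 1 -1 -1 -1, received with attenuation w_{k,p} and a delay. The observation window is the sum of the tail of bit i, w_{k,p} × (1 1 1 0 0 0 0), and the head of bit i+1, -w_{k,p} × (0 0 0 -1 -1 1 -1)]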
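The decomposition in the figure can be reproduced numerically. This sketch assumes a single path, an integer chip delay, and no noise; `delay = 3` (the bit i+1 boundary falls three chips into the window) is read off the figure, `w = 1` is chosen for readability, and the function names are hypothetical.

```python
def window_components(code, delay):
    """Split one bit-length observation window into the chips contributed by
    bit i (tail of its spreading sequence) and bit i+1 (head of its sequence).

    delay: integer chip delay of the path within the window, 0 <= delay < len(code).
    """
    g = len(code)
    tail = list(code[g - delay:]) + [0] * (g - delay)  # last `delay` chips of bit i
    head = [0] * delay + list(code[:g - delay])        # first g - delay chips of bit i+1
    return tail, head


def observed_window(code, b_i, b_next, delay, w):
    """Noise-free window for a single path: w * (b_i * tail + b_next * head)."""
    tail, head = window_components(code, delay)
    return [w * (b_i * t + b_next * h) for t, h in zip(tail, head)]


code = [-1, -1, 1, -1, 1, 1, 1]
tail, head = window_components(code, delay=3)
print(tail)   # [1, 1, 1, 0, 0, 0, 0]
print(head)   # [0, 0, 0, -1, -1, 1, -1]
print(observed_window(code, b_i=+1, b_next=-1, delay=3, w=1))  # [1, 1, 1, 1, 1, -1, 1]
```

The window is linear in the bits, with one fixed shape per bit; that is what allows it to be written as a structured matrix times the data bits.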


Matrix representation

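[Figure: with bit i = +1 and bit i+1 = +1 (chip stream -1 -1 1 -1 1 1 1 -1 -1 1 -1 1 1 1), the observation window for user k is written as U_k Z_k b_k(i) plus the contributions of other users, where U_k is a known 0/1 structure matrix and Z_k contains the path attenuations w_{k,p}]

Stacking all users: r = UZb, with known preamble bits b used during estimation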


Efficient statistics

Parametric approach

Build a channel model (number of paths)

Estimate delays and attenuations

Produce the code-matched filter output

Our approach

Estimate the effective spreading code UZ directly

Code-matched filter: y = (UZ)^T r
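One way to estimate the effective spreading code without extracting delays and attenuations is ordinary least squares on a known preamble. The slides do not say which estimator is used, so least squares is an assumption here, as are the matrix shapes and the random test data; the sketch only illustrates estimating UZ as a whole and then applying y = (UZ)^T r.

```python
import numpy as np


def estimate_effective_code(R_obs, B):
    """Least-squares estimate of the effective spreading code UZ from a preamble.

    Assumed model: R_obs = UZ @ B + noise, with
      R_obs: (n_chips, M) received windows, one column per preamble symbol,
      B:     (n_bits, M) known preamble bits that affect each window.
    """
    # lstsq solves B.T @ X ~= R_obs.T for X = (UZ)^T
    UZ_T, *_ = np.linalg.lstsq(B.T, R_obs.T, rcond=None)
    return UZ_T.T


def code_matched_filter(UZ_hat, r):
    """Code-matched filter output y = (UZ)^T r."""
    return UZ_hat.T @ r


rng = np.random.default_rng(0)
UZ_true = rng.standard_normal((7, 4))          # hypothetical effective code matrix
B = rng.choice([-1.0, 1.0], size=(4, 50))      # hypothetical preamble bits
R_obs = UZ_true @ B + 0.01 * rng.standard_normal((7, 50))

UZ_hat = estimate_effective_code(R_obs, B)
y = code_matched_filter(UZ_hat, R_obs[:, 0])
print(np.max(np.abs(UZ_hat - UZ_true)))        # small estimation error
```

Because UZ is estimated as one matrix, no divisions into separate delay and attenuation estimates are needed, and the same estimate feeds the matched filter directly.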


Simulation parameters

System parameters: 15 users, 3 paths, spreading gain 31

Hardware platform: TI C62 and C67 EVM boards

64 KB each of internal program and data memory

256 KB SBSRAM and 8 MB SDRAM (external)

Code Composer 1.0 used to profile the code


Effectiveness of integrated design

[Figure: bit error rate (10^-4 to 10^0) vs. SNR (dB, -4 to 16) for multiuser detection with the parametric approach, the UZ approach, and actual channel parameters, and for a single-user receiver]

2 dB gain in performance


Computational savings

Avoid extraction of actual channel parameters

Avoid realignment of data for code-matched filtering

Reduce intermediate storage requirement

Avoid divisions (28 cycles) and square roots (38 cycles) on the DSP


Fixed point behavior

Fixed-point advantages: speed, power, cost

Fixed-point analysis

12 bits of precision are required instead of 32 bits!

Pack two 16-bit operations into 32-bit registers

More packing with

• Saturation arithmetic

• User power control!
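A software model of the packing idea, assuming two's-complement 16-bit lanes in a 32-bit word. It does not model any specific C62/C67 instruction; it only shows why overflow in a lane matters and what saturation arithmetic changes. The helper names are hypothetical.

```python
def pack2(hi, lo):
    """Pack two signed 16-bit values into one 32-bit word (hi in the upper half)."""
    return ((hi & 0xFFFF) << 16) | (lo & 0xFFFF)


def unpack2(word):
    """Recover the two signed 16-bit halves of a packed 32-bit word."""
    def to_signed(v):
        return v - 0x10000 if v & 0x8000 else v
    return to_signed((word >> 16) & 0xFFFF), to_signed(word & 0xFFFF)


def sat16(v):
    """Clamp to the signed 16-bit range instead of wrapping around."""
    return max(-32768, min(32767, v))


def add2(a, b, saturate=False):
    """Add two packed words lane by lane: two 16-bit additions per 32-bit operation."""
    (a_hi, a_lo), (b_hi, b_lo) = unpack2(a), unpack2(b)
    if saturate:
        return pack2(sat16(a_hi + b_hi), sat16(a_lo + b_lo))
    return pack2(a_hi + b_hi, a_lo + b_lo)   # masking in pack2 wraps each lane


x = pack2(30000, -5)
y = pack2(10000, 7)
print(unpack2(add2(x, y)))                  # (-25536, 2): upper lane wrapped around
print(unpack2(add2(x, y, saturate=True)))   # (32767, 2): upper lane clamped
```

With wraparound, an overflow silently flips the sign of a lane; with saturation it is bounded at the largest representable value, so fewer guard bits are needed per lane.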


Time requirement

[Figure: normalized time: original 100, unified synchronization + detection 68.5, 16-bit fixed-point 41.8]

2.39× speedup




Effective spreading code approach

Optimized detector design






Linear multiuser detector

N: block length; K: number of users

Received signal: r = (UZ) b + n

Channel estimate: UZ

Matched filter output: y = (UZ)^T r

Linear detector: y = R b + (UZ)^T n; solve R b = y for b

R = (UZ)^T (UZ)

Size of the linear system: NK

A direct inverse takes O((NK)^3) operations
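A small decorrelator sketch with NumPy. The toy UZ below (two users, one bit each, so N = 1) is hypothetical; it is chosen so that user 2 is much stronger than user 1, which makes the plain matched filter fail while solving R b = y succeeds. `np.linalg.solve` is a dense solver, so its cost grows as O((NK)^3).

```python
import numpy as np


def decorrelator(UZ, r):
    """Linear decorrelating detector: solve R b = y, R = (UZ)^T (UZ), y = (UZ)^T r."""
    y = UZ.T @ r                    # code-matched filter output
    R = UZ.T @ UZ                   # (NK x NK) correlation matrix
    b_soft = np.linalg.solve(R, y)  # dense solve: O((NK)^3) operations
    return np.where(b_soft >= 0, 1.0, -1.0)


# Toy example (hypothetical values): user 2 is much stronger than user 1.
UZ = np.array([[1.0, 3.0],
               [1.0, 3.0],
               [1.0, 3.0],
               [1.0, -3.0]])
b = np.array([1.0, -1.0])
r = UZ @ b                          # noise-free received signal

matched_filter = np.where(UZ.T @ r >= 0, 1.0, -1.0)
print(matched_filter)               # -1, -1: user 1 is decided wrongly
print(decorrelator(UZ, r))          # 1, -1: both users correct
```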


Outline of the Kronecker algorithm

Kronecker representation

Isolates the structure from the matrix blocks

Fourier transform converts it to a block-diagonal system

Computationally optimal

The correlation matrix is block-Toeplitz; approximate it as a block-circulant system

Solve N independent order-K systems iteratively
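The frequency-domain step can be sketched with NumPy's FFT. This assumes the matrix has already been approximated as block-circulant (that step is not shown) and solves each K × K system directly with `np.linalg.solve` rather than iteratively as on the slide. N, K, and the random blocks are hypothetical test values; the dense solve is included only to check the result.

```python
import numpy as np


def block_circulant_dense(blocks):
    """Dense NK x NK matrix whose (m, n) block is blocks[(m - n) % N]."""
    N, K, _ = blocks.shape
    C = np.zeros((N * K, N * K), dtype=blocks.dtype)
    for m in range(N):
        for n in range(N):
            C[m * K:(m + 1) * K, n * K:(n + 1) * K] = blocks[(m - n) % N]
    return C


def solve_block_circulant(blocks, y):
    """Solve C x = y for block-circulant C using an FFT over the block index.

    The DFT turns C into a block-diagonal matrix, so one NK x NK system
    becomes N independent K x K systems.
    """
    N, K, _ = blocks.shape
    Y = np.fft.fft(y.reshape(N, K), axis=0)           # (N, K)
    C_f = np.fft.fft(blocks, axis=0)                  # (N, K, K) diagonal blocks
    X_f = np.linalg.solve(C_f, Y[..., None])[..., 0]  # N independent K x K solves
    return np.fft.ifft(X_f, axis=0).real.reshape(N * K)


rng = np.random.default_rng(0)
N, K = 8, 3                                  # hypothetical block length and users
blocks = 0.1 * rng.standard_normal((N, K, K))
blocks[0] += 3.0 * np.eye(K)                 # keep the toy system well conditioned
y = rng.standard_normal(N * K)

x_fast = solve_block_circulant(blocks, y)
x_dense = np.linalg.solve(block_circulant_dense(blocks), y)
print(np.max(np.abs(x_fast - x_dense)))      # agrees up to rounding error
```

The key identity is that block-circulant multiplication is a circular convolution over the block index, so the DFT of the product is the product of the per-frequency K × K blocks.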


Speedup in detector

Complexity: O(N^2 K^3) for the decorrelator vs. O(NK^2 + KN log N) for the Kronecker algorithm

[Figure: achievable data rate (Kbps): decorrelator 10.4 Kbps, Kronecker 83.1 Kbps]


Pipelining and parallelization

Mostly matrix-based operations

Detector: iterative algorithm

Pipeline various iterations

Parallelize operations

Add more functional units

Distribute data across functional units

Distribute computations


Projected computation time

[Figure: projected achievable data rate (Kbps) of the base multiuser algorithm for DSP only, DSP + coprocessor support, hardware pipelining, and pipelining + parallelization (30 adders and multipliers); bar labels 20.75 Kbps, 154.3 Kbps, and 564.5 Kbps]




Effective spreading code approach

Optimized detector design






Maximum a-posteriori (MAP) decoding

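Received signal: r = UZd + n

Optimum decoding rule:

$$ \hat{d} = \arg\max_{d \in \mathcal{C}} \log p(d \mid r) $$

Constrained optimization problem: d must be a codeword of the code C

Decode all users simultaneously

Exponential complexity in the number of users

[Figure: transmitter chain: data bits b → encoding → coded bits d → spreading → modulation, alongside other users]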
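The exponential cost is easy to see in an exhaustive search. This sketch assumes white Gaussian noise and equally likely codewords, so maximizing log p(d | r) reduces to minimizing ||r - Σ_k A_k d_k||², where A_k is user k's slice of UZ. The repetition codebook and the random A_k are hypothetical toy values.

```python
import itertools

import numpy as np


def joint_ml_decode(A_list, r, codebook):
    """Exhaustive joint decoding of K users.

    Returns the tuple (d_1, ..., d_K), each d_k drawn from `codebook`, that
    minimizes ||r - sum_k A_k d_k||^2. For white Gaussian noise and equally
    likely codewords this is the MAP rule; the loop visits len(codebook)**K
    candidates, so its cost grows exponentially with the number of users K.
    """
    best, best_metric = None, np.inf
    for combo in itertools.product(codebook, repeat=len(A_list)):
        s = sum(A @ np.asarray(d, dtype=float) for A, d in zip(A_list, combo))
        metric = float(np.sum((r - s) ** 2))
        if metric < best_metric:
            best, best_metric = combo, metric
    return best


# Hypothetical toy setup: 2 users, length-3 repetition code, random 6x3 effective codes.
codebook = [(1, 1, 1), (-1, -1, -1)]
rng = np.random.default_rng(1)
A_list = [rng.standard_normal((6, 3)) for _ in range(2)]
true = ((1, 1, 1), (-1, -1, -1))
r = sum(A @ np.asarray(d, dtype=float) for A, d in zip(A_list, true))
print(joint_ml_decode(A_list, r, codebook))   # noise-free, so it recovers `true`
```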


Single-user detection and decoding

Suboptimum alternatives

Isolate detection and decoding

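[Figure: the received signal r feeds K matched filters MF 1, …, MF K; each output y_k goes to its own decoder (Decoder 1, …, Decoder K), producing the estimates b̂_1, …, b̂_K]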


Decoding matched filter outputs

[Figure: bit error rate (10^-4 to 10^0) vs. SNR (dB, 1 to 8) for MF + Viterbi decoding and optimal decoding]

Huge performance loss!


Iterative detection and decoding

User of concern c, interfering users I

r = (UZ)_c d_c + (UZ)_I d_I + z

Estimate d_I

Eliminate interference: y_c = (UZ)_c^T (r - (UZ)_I d̂_I)

Estimate d_c for the next step:

$$ \hat{d}_c = \arg\max_{d_c \in \mathcal{C}} \log p(d_c \mid y_c) = \arg\max_{d_c \in \mathcal{C}} \left[ \log p(y_c \mid d_c) + \log p(d_c) \right] $$

Complexity linear in the number of users
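A sketch of the cancellation loop, assuming real-valued signals and one column of UZ per user. The slides pass y_c to a single-user decoder; this sketch substitutes a hard sign decision as a stand-in, and the toy UZ is hypothetical (user 2 much stronger than user 1).

```python
import numpy as np


def iterative_cancellation(UZ, r, n_iter=3):
    """Refine each user's decision after subtracting the other users' estimates.

    UZ: (n_chips, K) effective spreading codes, one column per user.
    The per-user decision is a hard sign, a stand-in for the single-user
    channel decoder used in the slides.
    """
    UZ = np.asarray(UZ, dtype=float)
    d_hat = np.where(UZ.T @ r >= 0, 1.0, -1.0)        # start: matched-filter decisions
    K = UZ.shape[1]
    for _ in range(n_iter):
        for c in range(K):
            others = [k for k in range(K) if k != c]  # interfering users I
            residual = r - UZ[:, others] @ d_hat[others]
            y_c = UZ[:, c] @ residual                 # (UZ)_c^T (r - (UZ)_I d_I)
            d_hat[c] = 1.0 if y_c >= 0 else -1.0      # new estimate of d_c
    return d_hat


UZ = np.array([[1.0, 3.0], [1.0, 3.0], [1.0, 3.0], [1.0, -3.0]])  # hypothetical codes
d = np.array([1.0, -1.0])
r = UZ @ d
print(np.where(UZ.T @ r >= 0, 1.0, -1.0))   # matched filter alone: user 1 wrong
print(iterative_cancellation(UZ, r))        # both users recovered
```

Each pass touches each user once, which is where the linear (rather than exponential) growth in the number of users comes from.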


Reduction in decoding complexity

Convolutional code

Coded bits depend on past data bits

Performance improves with memory length

Viterbi algorithm for decoding

Complexity exponential in memory length

Our suboptimal approach

Maximal weight basis decoding

Complexity quadratic in memory length


Joint detection and decoding performance

[Figure: bit error rate (10^-5 to 10^0) vs. SNR (dB, 1 to 8) for MF + Viterbi, Iter1 + Subopt, Iter1 + Viterbi, Iter3 + Subopt, and optimal decoding; code rate 1/2, memory ν = 7]



Huge performance gain

Suboptimal approximation:

Insignificant performance loss

Significant computational gain

Architecture for suboptimal decoding?

Viterbi algorithm: butterfly architecture

We have a sliding-window implementation


Summary of contributions

Integrated channel estimation and detection model [WCNC]

Optimized detection algorithm [PIMRC, Tr. Com]

Fixed point implementation [ICASSP, SPIE]

Parallel architecture [Asilomar]

Joint detection and decoding [Globecom, Tr. Com]

Suboptimal decoding algorithm [Asilomar, Tr. Inf. Th.]


Future research

Wireless Cellular

Wireless LAN

Bluetooth/Home Networks

Ad-hoc Network


Future research

Universal wireless receiver

Reconfigurable solution

Power efficient

Automate design?

Network level interaction

Resource allocation

Quality of service guarantee

Application level interaction


Further details

http://www.ece.rice.edu/~suman

http://www.ece.rice.edu/CMC


Convolutional codes

[Figure: encoder with input bits b and output streams d_odd and d_even]

Rate: 1/2, memory ν = 2

d2 = d1

d4 = d1 + d3

d6 = d1 + d3 + d5

d8 = d3 + d5 + d7

d10 = d5 + d7 + d9

d_odd: systematic bits

d_even: parity bits
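An encoder that reproduces the parity equations above, assuming "+" is modulo-2 addition, 0/1 bits, and an encoder that starts in the all-zero state. Information bit u_j becomes the systematic bit d_{2j-1}, and the parity bit d_{2j} is u_j + u_{j-1} + u_{j-2}. The function name is hypothetical.

```python
def encode(u):
    """Rate-1/2, memory-2 systematic encoder for the equations on this slide.

    u: information bits (0/1). Output order is d1, d2, d3, ... with odd
    positions systematic (copies of u) and even positions parity.
    """
    d = []
    for j, bit in enumerate(u):
        prev1 = u[j - 1] if j >= 1 else 0   # encoder starts in the all-zero state
        prev2 = u[j - 2] if j >= 2 else 0
        d.append(bit)                        # systematic bit d_{2j+1} (1-based)
        d.append(bit ^ prev1 ^ prev2)        # parity bit d_{2j+2} (1-based)
    return d


print(encode([1, 0, 1, 1, 0]))   # [1, 1, 0, 1, 1, 0, 1, 0, 0, 0]
```

Each parity bit depends on the current and two previous information bits, which is what "memory 2" means here.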


Suboptimal single user channel decoder

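y = (y_1, …, y_N), d = (d_1, …, d_N)

Decoding rule for user c:

$$ \hat{d}_c = \arg\max_{d_c \in \mathcal{C}} \left[ \log p(y_c \mid d_c) + \log p(d_c) \right] $$

Viterbi algorithm: complexity grows exponentially with the memory length ν

With equally likely codewords the rule reduces to

$$ \hat{d} = \arg\max_{d \in \mathcal{C}} \log p(y \mid d) $$

Without the codeword constraint, d = sgn(y)

The estimated d may not be a codeword!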


Maximum weight basis decoding

More variables than equations: NR independent variables (N: block length, R: rate)

The choice depends on y_i

y = 7.5 → d = 1; y = -4.5 → d = -1; y = 0.5 → d = ?

$$ \hat{d} = \arg\max_{d \in \mathcal{C}} \log p(y \mid d) $$

d2 = d1

d4 = d1 + d3

d6 = d1 + d3 + d5

d8 = d3 + d5 + d7

d10 = d5 + d7 + d9

Choose the maximally independent subset with the largest total weight


Selection of maximally independent subset

Set I = ∅

Given y, sort the weights |y_i|, i ∈ {1, …, N}

While |I| < NR:

Choose the location e ∈ {1, …, N}, e ∉ I, with the largest weight such that I ∪ {e} is still an independent subset of {1, …, N}

Set I = I ∪ {e}
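The greedy loop above can be sketched in Python, assuming the code is described by a k × N generator matrix G over GF(2) (k = NR) and that "independent" means the chosen columns of G are linearly independent mod 2. G below is the generator for the first three information bits of the rate-1/2 code on the earlier slide, with positions 0 to 5 standing for d_1 to d_6; y is a hypothetical soft input in which d_3 is unreliable. The function name is hypothetical.

```python
def max_weight_independent_subset(y, G):
    """Greedy selection of an information set I with the largest total weight.

    y: soft values (weights |y_i|); G: k x N generator matrix over GF(2)
    given as 0/1 rows, k = NR. Position e may join I when column e of G is
    linearly independent (mod 2) of the columns already chosen.
    Returns the selected positions (0-based) in increasing order.
    """
    k, n = len(G), len(G[0])
    # Store each column as an integer bit mask (bit j = G[j][e]).
    cols = [sum(G[j][e] << j for j in range(k)) for e in range(n)]
    order = sorted(range(n), key=lambda e: abs(y[e]), reverse=True)
    basis, chosen = [], []     # basis vectors kept with distinct leading bits
    for e in order:
        if len(chosen) == k:
            break
        v = cols[e]
        for b in basis:        # reduce v against the current basis (mod 2)
            v = min(v, v ^ b)
        if v:                  # nonzero remainder: I U {e} is still independent
            basis.append(v)
            basis.sort(reverse=True)
            chosen.append(e)
    return sorted(chosen)


# Columns d1..d6 of the code d2 = d1, d4 = d1 + d3, d6 = d1 + d3 + d5.
G = [[1, 1, 0, 1, 0, 1],
     [0, 0, 1, 1, 0, 1],
     [0, 0, 0, 0, 1, 1]]
y = [7.5, 4.5, 0.5, -3.0, -2.0, 1.0]
print(max_weight_independent_subset(y, G))   # [0, 3, 4]
```

Position 1 is rejected because d_2 = d_1 carries no new information, and the unreliable d_3 (|y| = 0.5) is replaced by the parity bit d_4. This greedy rule is the standard way to find a maximum-weight basis of a matroid.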


Suboptimal decoding algorithm

Choose M maximally independent subsets

For each independent subset I:

Compute the codeword d_I, with d_e = sgn(y_e) for e ∈ I

Compute the likelihood p(y | d_I)

Choose the codeword with the largest likelihood

Decoding complexity is reduced from O(2^ν) to O(ν^2)
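A sketch of the decoding loop, assuming the code is given by a k × N generator matrix G over GF(2), a BPSK mapping of bit 0 to +1 and bit 1 to -1, and AWGN, so that log p(y | d) ranks codewords like the correlation Σ y_e x_e. How the M subsets are generated is not specified on the slide, so here they are simply passed in; subsets that are not independent are skipped. All names and values are hypothetical.

```python
def gf2_solve(A, b):
    """Solve A u = b over GF(2) (A square, lists of 0/1); return None if singular."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col]), None)
        if pivot is None:
            return None
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col and M[r][col]:
                M[r] = [x ^ z for x, z in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]


def decode_from_subsets(y, G, subsets):
    """For each information set I, fix d_e from the sign of y_e (e in I),
    complete the codeword through G, and keep the most likely codeword."""
    k, n = len(G), len(G[0])
    best, best_metric = None, float("-inf")
    for I in subsets:
        A = [[G[j][e] for j in range(k)] for e in I]   # one equation per e in I
        b = [1 if y[e] < 0 else 0 for e in I]          # d_e = sgn(y_e), as bits
        u = gf2_solve(A, b)
        if u is None:                                  # I is not independent
            continue
        d = [sum(u[j] & G[j][e] for j in range(k)) % 2 for e in range(n)]
        metric = sum(ye * (1 - 2 * de) for ye, de in zip(y, d))  # correlation
        if metric > best_metric:
            best, best_metric = d, metric
    return best


# Columns d1..d6 of the code d2 = d1, d4 = d1 + d3, d6 = d1 + d3 + d5.
G = [[1, 1, 0, 1, 0, 1],
     [0, 0, 1, 1, 0, 1],
     [0, 0, 0, 0, 1, 1]]
y = [7.5, 4.5, 0.5, -3.0, -2.0, 1.0]
subsets = [[0, 1, 2], [0, 3, 4], [0, 2, 4], [1, 2, 4]]   # [0, 1, 2] is dependent
print(decode_from_subsets(y, G, subsets))   # [0, 0, 1, 1, 1, 0]
```

Even though y_3 = 0.5 has the wrong sign for the transmitted bit, the subset that avoids it yields the codeword with the highest correlation.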


Performance improvement

[Figure: bit error rate (10^-4 to 10^0) vs. SNR (dB, 1 to 8) for MF + MAP, 2-stage + MAP, and the single-user bound]

Performance approaches the single-user bound
