High Design-Level Comparison of MIMO Baseband Hardware Architectures
Transcript of High Design-Level Comparison of MIMO Baseband Hardware Architectures
Slide 1
High Design-Level Comparison of MIMO Baseband Hardware Architectures
Steffen Paul
Infineon Technologies
Munich, Germany
Markus Rupp
TU Vienna, INTHFT
Vienna, Austria
Slide 2
Contents
Current situation in designing modem hardware and future architectures
Modem parameters for MIMO HSDPA
Modem architecture covering Release 4 to Release 6
Modem subblocks and their implementation issues and effort
Conclusions
Slide 3
Algorithms and DSPs
Computational complexity of wireless systems grows faster than processor performance
Gap gets larger and larger
Source: J. Rabaey (UCB) and R. Subramanian (Morphics)
Slide 4
Impact on System Architecture
Purely DSP based systems (as e.g. in GSM) can only be realized some time after the introduction of a wireless standard
Design of the receiver in dedicated hardware takes too much effort
Such solutions will hit the market early enough
Source: T. Noll, RWTH Aachen
Slide 5
Expected MIMO Modem Architectures
MIMO modems will be built by a mix of
dedicated HW blocks (HW accelerators) performing regular operations at high data rate
dedicated simple application-specific processors for specific signal processing tasks, e.g. pilot signal processing.
These processors interact only locally with HW blocks (e.g. HW 5, HW 6 with one being a processor) with the restriction not to occupy the bus.
[Figure: bus architecture with the blocks DSP, dDSP, HW 3, HW 4, HW 5 and a second dDSP]
Slide 6
Standard Evolution and Modem Development
Advantages of such an architecture
Early start with product development
Optimization of computationally demanding blocks
Late changes in block interaction and fine-tuning of algorithms
Evolution of a wireless standard
Basic parameters specified (e.g. CDMA system)
Step-by-step enhancements of features, definition of performance requirements, configurations etc.
Settling of performance requirements
Product development
Architecture and datapath definition
Implementation of algorithms
Simple fixes: SW update (e.g. task scheduling, parameter estimation)
Slide 7
A Future MIMO System
MIMO is part of UMTS in Rel. 6 together with HSDPA
Given parameters (based on 3GPP TR 25.876)
Number of antennas: up to four on Tx and Rx side
Sample rate: 7.68 Msamples/s with oversampling factor 2
Modulation: QPSK, 16 QAM
Spreading codes: up to 15 in parallel with fixed spreading factor 16
Channel delay spread: up to 3.7 μs (about 30 half chips)
Number of propagation paths between any two antennas not specified yet
Silicon parameters
Clock frequency of chip: 200 - 300 MHz
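The half-chip figure for the delay spread follows from the sample rate. The lines below are a minimal sketch of that conversion; the constant and function names are illustrative and not part of the slides.

```python
# Parameters quoted on the slide; names are illustrative only.
SAMPLE_RATE_HZ = 7.68e6   # 7.68 Msamples/s, i.e. two samples per chip
DELAY_SPREAD_S = 3.7e-6   # maximum channel delay spread


def delay_in_sample_periods(delay_s: float, sample_rate_hz: float) -> float:
    """Express a delay as a number of sample periods (half chips at oversampling factor 2)."""
    return delay_s * sample_rate_hz


half_chips = delay_in_sample_periods(DELAY_SPREAD_S, SAMPLE_RATE_HZ)
print(f"{half_chips:.1f} half chips")
```

The result is a little over 28 sample periods, which the slide rounds up to 30 half chips.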
Slide 8
Some Hardware Aspects beyond OP Count
Degrees of freedom
Mapping of operations onto DSP or dedicated HW
Degree of parallelism, loop unrolling
Time multiplexing (resource sharing)
Complexity metric
Number of arithmetic operations
Reuse of blocks / parallelism, loop unrolling
Memory amount and number of read/write operations
Amount of bus communication
Power consumption, chip area
Slide 9
Transmitter Structure
One proposal for UMTS Release 6 MIMO extension (PARC MIMO by Lucent)
Transmitter: 3GPP RAN WG1, R1-010941
[Figure: transmitter block diagram. The high-speed data stream is demultiplexed (DEMUX) into one branch per antenna (Antenna 1 ... Antenna T). Each branch performs coding, interleaving and mapping, is spread with Spreading Code 1 ... Spreading Code C, and is scrambled with the scrambling code.]
Slide 10
Receiver Algorithms
Matrix algebra based (zero forcing, MMSE equalization, BLAST techniques)
Pro:
- good performance
Cons:
- implementation complexity
- numerical stability of fixed-point implementations
Correlator based (RAKE based receiver)
Pro:
- proven technology in Rel. 4
- same HW for Rel. 4 and HSDPA MIMO extension
- flexible design
- lots of options for HW mapping
Cons:
- limited performance
Slide 11
Proposed Iterative Receiver for PARC
As suggested in 3GPP RAN WG1, R1-010941
[Figure: iterative receiver block diagram with the blocks: mux; MMSE detection for remaining antenna with highest SINR; Despread 1 ... Despread 10; collect and mux; detect, demap, deinterleave, decode; reconstruct signals for cancellation]
Slide 12
Low Complexity Receiver
Reduced complexity receiver based on RAKE (without MRC) and ML detection
Complexity proportional to the number of RAKE fingers even under ML!
[Figure: receiver chain with M transmit antennas; N receive antennas; RF to baseband; channel estimator, finger; RAKE; virtual antennas; ML detector; turbo decoder]
Slide 13
Low Complexity Receiver
Processing of all receive paths as individual fingers without MRC as in conventional RAKE
[Figure: receiver chain with N receive antennas; RF to baseband; four buffers; channel estimator, finger; RAKE with fingers Ant. 1, Finger 1 through Ant. 4, Finger 2 (two fingers per antenna); ML detector. Detail of one RAKE finger: descrambling (scrambling sequence generator), despreading (PN sequence generator), integrate and dump, code tracking (channel impulse response).]
Slide 14
RAKE HW Complexity
Per RAKE finger
Optimal sampling point reconstruction
Descrambling: 1-bit multiplication
Despreading: fixed spreading factor of 16, i.e. 16 complex adds
HW clock much higher than sample rate
Resource reuse possible
Parallel fingers: inefficient HW use
Logical fingers mapped onto one physical finger
[Figure: two timing diagrams showing Finger 1, Finger 2, Finger 3 ... Finger M over one symbol time]
Slide 15
RAKE HW Complexity
Resource reuse possibilities:
Parameters: spreading factor SF, HW clock frequency f_clock, slot duration T_slot, number of codes C
Gives reuse factor (C = 1):
$R = f_{clock} \cdot T_{slot} \cdot \frac{1}{SF \cdot 2 \cdot \mathrm{Symbols}/\mathrm{Slot}}$
f_clock = 200 MHz: R = 26
f_clock = 300 MHz: R = 39
i.e. up to R = 26 logical fingers can be mapped onto one physical finger (at 200 MHz)
Performance requirements suggest the combination of up to six paths
In total at maximum 4 x 6 x C logical fingers are required if one finger processes one code (C = 1)
However, C = 15 is also possible, which leads to prohibitively many fingers
Finger with multicode capability recommended (e.g. 3 codes in parallel), then 4 x 6 x C/3 logical fingers are required
With moderate clock frequency only 1-3 physical fingers needed
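The reuse factor can be checked numerically. The sketch below assumes the usual UMTS slot of 2560 chips at 3.84 Mchip/s (a value not stated on the slide) and an oversampling factor of 2; `reuse_factor` is a hypothetical helper name.

```python
import math

# Assumed UMTS slot structure (not given on the slide): 2560 chips at 3.84 Mchip/s.
CHIPS_PER_SLOT = 2560
T_SLOT_S = CHIPS_PER_SLOT / 3.84e6   # about 0.667 ms
OSR = 2                              # oversampling factor from the slides


def reuse_factor(f_clock_hz: float, sf: int = 16) -> int:
    """Clock cycles available per input sample, rounded down.

    Implements R = f_clock * T_slot / (SF * 2 * Symbols/Slot).
    """
    symbols_per_slot = CHIPS_PER_SLOT // sf
    samples_per_slot = sf * OSR * symbols_per_slot
    return math.floor(f_clock_hz * T_SLOT_S / samples_per_slot)


for f_mhz in (200, 300):
    print(f_mhz, "MHz ->", reuse_factor(f_mhz * 1e6))
```

Because SF times Symbols/Slot is always the number of chips per slot, R does not depend on the spreading factor; it is simply the clock frequency divided by the 7.68 MHz sample rate.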
Slide 16
RAKE HW Complexity
Number of logical multicode fingers for four antennas and QPSK only:
# codes   # propagation paths per antenna   Data rate (Mbit/s)
          2       4       6
1         8       16      24                1.9
6         16      32      48                11.5
12        32      64      96                23
15        40      80      120               28.8
Number of physical multicode fingers @ 300 MHz:
# codes   # propagation paths per antenna
          2       4       6
1         1       1       1
6         1       2       2
12        1       2       3
15        2       3       4
Physical implementation of a few fingers is required
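The logical finger counts and data rates in the first table can be reproduced with a few lines. This is a sketch under stated assumptions: each multicode finger handles 3 codes (the example value from Slide 15), the count is antennas x paths x ceil(codes / 3), and the chip rate is 3.84 Mchip/s (half the 7.68 Msamples/s sample rate). The function names are illustrative.

```python
import math


def logical_fingers(paths: int, codes: int, antennas: int = 4,
                    codes_per_finger: int = 3) -> int:
    """Logical multicode fingers: one group of fingers per antenna and path."""
    return antennas * paths * math.ceil(codes / codes_per_finger)


def data_rate_mbit(codes: int, antennas: int = 4, bits_per_symbol: int = 2,
                   sf: int = 16, chip_rate_hz: float = 3.84e6) -> float:
    """Raw bit rate in Mbit/s for QPSK (2 bits per symbol) on all antennas."""
    symbol_rate = chip_rate_hz / sf          # 240 ksymbols/s per code
    return antennas * codes * bits_per_symbol * symbol_rate / 1e6


for codes in (1, 6, 12, 15):
    row = [logical_fingers(paths, codes) for paths in (2, 4, 6)]
    print(codes, row, round(data_rate_mbit(codes), 2))
```

The second table (physical fingers) is left as given on the slide; it depends on how logical fingers are packed onto the available clock cycles, which the slide does not spell out.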
Slide 17
Rake HW Effort
Multicode fingers
Transmit signal is the sum of bit streams with different spreading codes
All components arrive at the receiver after passing through the same channel
Sample point reconstruction and descrambling are the same for all signal components (codes) of one propagation path
[Figure: a single-code RAKE finger (descrambling with scrambling sequence generator, despreading, code tracking from the channel impulse response) next to a multicode RAKE finger that shares the descrambling and feeds parallel despreaders for PN sequences 1, 2, 3, each followed by integrate-and-dump stages]
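The idea of a multicode finger, descrambling once and then despreading several codes in parallel, can be sketched as follows. This is an illustration only: it uses a Walsh-Hadamard construction for the orthogonal codes and a random complex scrambling sequence, models one symbol period on an ideal channel, and all names are hypothetical.

```python
import random

SF = 16  # fixed spreading factor from the slides


def ovsf_codes(sf):
    """Build sf mutually orthogonal +/-1 codes of length sf (Walsh-Hadamard construction)."""
    codes = [[1]]
    while len(codes) < sf:
        codes = [c + c for c in codes] + [c + [-x for x in c] for c in codes]
    return codes


def multicode_finger(chips, scrambling, codes):
    """Descramble once, then run one integrate-and-dump despreader per code."""
    # Shared step: remove the scrambling sequence chip by chip.
    descrambled = [x * s.conjugate() / abs(s) ** 2 for x, s in zip(chips, scrambling)]
    # Per-code step: correlate with each spreading code over one symbol.
    return [sum(d * c for d, c in zip(descrambled, code)) / len(code) for code in codes]


random.seed(0)
used_codes = ovsf_codes(SF)[1:4]               # three codes handled by one finger
symbols = [1 + 1j, -1 + 1j, 1 - 1j]            # one QPSK symbol per code
scrambling = [complex(random.choice((1, -1)), random.choice((1, -1))) for _ in range(SF)]

# Transmit side: sum of the spread symbols, then scrambling.
chips = [scrambling[i] * sum(sym * code[i] for sym, code in zip(symbols, used_codes))
         for i in range(SF)]

recovered = multicode_finger(chips, scrambling, used_codes)
print(recovered)
```

Only the despreading and integrate-and-dump stages are replicated per code; sample point reconstruction and descrambling are done once per propagation path.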
Slide 18
Data Buffering
Rather than forming subgroups of fingers assigned to individual antennas, all fingers are treated equally -> flexibility in assigning fingers to antennas
Sample streams of receive antennas write into the same buffer (pre-buffering a few samples)
Data rate: write 4 x 7.68 MHz = 30.72 MHz, read M x 7.68 MHz
Write 4 and read M samples
@ 300 MHz M = 35 could be supported
Split into two buffers doubles M
[Figure: samples from the RRC filters of Ant 1 to Ant 4 are written into a common buffer at Tc/2 spacing and read by Finger 1 ... M. Receiver chain: RF to baseband; buffer; channel estimator, finger; RAKE with fingers Ant. 1, Finger 1 through Ant. 4, Finger 2; ML detector.]
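The figure M = 35 follows from a simple access budget. The sketch below assumes a single-port buffer that serves one access per clock cycle, which is an assumption and not stated on the slide; `max_read_fingers` is a hypothetical name.

```python
import math


def max_read_fingers(f_clock_hz: float, sample_rate_hz: float = 7.68e6,
                     writes_per_sample: int = 4) -> int:
    """Number of finger reads that fit into one sample period.

    Assumes one buffer access per clock cycle; the four antenna writes
    are served first and the remaining cycles are left for finger reads.
    """
    accesses_per_sample = math.floor(f_clock_hz / sample_rate_hz)
    return accesses_per_sample - writes_per_sample


print(max_read_fingers(300e6))
```

At 300 MHz there are 39 accesses per sample period, of which 4 are used for writing, leaving 35 for reading.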
Slide 19
Sample Buffering: 2 Solutions
Solution one
Common buffer for all fingers
Fingers run synchronously
Simplifies operation of ML detector
Synchronous code generators
Buffer size (I, Q in separate buffers):
#Antenna x OSR x (SHO + DS)
= 4 x 2 x (296 + 120)
= 3328 samples @ 8 bit
= 26 kBit
[Figure: ring buffer memory read by Finger 1, Finger 2 and Finger 3 with code generators; memory address selected by finger placement]
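The buffer size for solution one can be verified directly. The sketch takes SHO = 296 and DS = 120 as given on the slide without interpreting them further, and assumes 1 kBit = 1024 bit; the function name is illustrative.

```python
def common_buffer_size(antennas: int = 4, osr: int = 2, sho: int = 296,
                       ds: int = 120, bits_per_sample: int = 8):
    """Return (samples, kBit) for one common ring buffer (I or Q)."""
    samples = antennas * osr * (sho + ds)
    kbit = samples * bits_per_sample / 1024   # assumes 1 kBit = 1024 bit
    return samples, kbit


samples, kbit = common_buffer_size()
print(samples, "samples,", kbit, "kBit")
```

Since I and Q are held in separate buffers, the total memory for both components is twice this value.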
Slide 20
Sample Buffering: 2 Solutions
Solution two
Symbol buffer at each finger
Code generator phase needs to be controlled
Total buffer size (I, Q in separate buffers):
#Finger x (SHO + DS)/SF
= 120 x (296 + 120)
= 49920 symbols @ 8 bit
= 390 kBit
#Finger x (SHO + DS)/SF
= 60 x (296 + 120)
= 24960 symbols @ 8 bit
= 195 kBit
Much more memory due to large number of fingers
Lower RW rate than solution one
[Figure: channel path profile determines the start of the code generators CG 1, CG 2, CG 3 for Finger 1, Finger 2, Finger 3; each finger writes into a symbol buffer that feeds the MIMO detector]
Slide 21
Basic Channel Estimation Structure
TX and RX structure (TR 25.869)
[Figure: pilot transmission and channel estimation structure. Transmitter: pilot symbol pattern #1 (P1: AA) and pilot symbol pattern #2 (P2: A-A or -AA) are spread with C_OVSF1 and C_OVSF2, weighted with gain g, scrambled with C_SC and sent on antennas #1 to #4. Receiver: descrambling with C_SC, despreading with C_OVSF1 and C_OVSF2, correlation with the pilot patterns giving ha, hb and hA ... hD, combination with signs +/- and scaling 1/g into the channel estimates h1 ... h4. Signal labels X1 ... X4.]
Slide 22
Channel Estimation HW Effort
Per antenna
1 descrambling
2 despreadings (code length 256 chips)
4 correlations with pilot pattern (AA, A-A)
4 smoothings
Split into two different tasks
Delay profile estimation as input information for finger placement
descrambling over pilot sequence and pilot modulation
Channel weight estimation
Use of multicode RAKE finger for channel weight estimation (depending on multicode capability up to 2) and postprocessing of output (symbol modulation AA, A-A etc.)
Slide 23
ML Detector
In principle
four antennas and QPSK: 256 possibilities
four antennas and 16 QAM: 65536 possibilities
$\hat{d} = \arg\min_{d_j} \| r - H d_j \|^2$
Rake finger output with N antennas and L_r resolvable paths: $r \in \mathbb{C}^{N L_r}$
But reduction is possible by operating on a reduced point set first and picking a number of n best candidates on which the ML search is done
Typically, n = 1…20
Reduction of l2 norm to
$|d| \approx \frac{5}{8} \left( |\mathrm{Re}(d)| + |\mathrm{Im}(d)| \right) + \frac{3}{8} \max\left( |\mathrm{Re}(d)|, |\mathrm{Im}(d)| \right)$
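The exhaustive ML search that the reduction starts from can be sketched in a few lines. This is an illustration with the exact squared l2 norm, not the reduced-complexity scheme: the channel matrix is random, the receive vector is noiseless, and 8 rows stand for N x L_r = 4 antennas x 2 paths. All names are hypothetical.

```python
import itertools
import random

# QPSK constellation; four transmit antennas give 4**4 = 256 candidates.
QPSK = [complex(a, b) for a in (1, -1) for b in (1, -1)]


def ml_detect(r, H, constellation=QPSK):
    """Return the candidate vector d that minimizes ||r - H d||^2."""
    n_tx = len(H[0])
    best, best_metric = None, float("inf")
    for cand in itertools.product(constellation, repeat=n_tx):
        metric = 0.0
        for row, r_i in zip(H, r):
            e = r_i - sum(h * d for h, d in zip(row, cand))
            metric += e.real ** 2 + e.imag ** 2
        if metric < best_metric:
            best, best_metric = cand, metric
    return best


random.seed(1)
n_rx, n_tx = 8, 4   # rows: stacked RAKE finger outputs, columns: transmit antennas
H = [[complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(n_tx)]
     for _ in range(n_rx)]
d_true = (1 + 1j, -1 - 1j, 1 - 1j, -1 + 1j)
r = [sum(h * d for h, d in zip(row, d_true)) for row in H]   # noiseless receive vector

d_hat = ml_detect(r, H)
print(d_hat == d_true)
```

With 16 QAM the same loop would visit 65536 candidates, which is why the slide proposes a pre-selection of the n best candidates and a cheaper norm.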
Slide 24
Conclusions
A Rake receiver based implementation for the specific requirements of MIMO HSDPA in UMTS Release 6 is possible with moderate hardware effort
Number of fingers grows rapidly
Special handling of parallel use of up to 15 codes requires the use of fingers with multicode capability
Hardware structure of Release 4 receiver is a subset of the RAKE MIMO architecture
More advanced concepts, e.g. interference cancellation, can be added to the Rake receiver
Slide 25
References
M. Rupp, G. Gritsch, H. Weinrichter: Approximate ML detection for MIMO systems with very low complexity. Proc. ICASSP 2004, Montreal.
D. Samardzija, P. Wolniansky, J. Ling: Performance evaluation of VBLAST algorithm in W-CDMA systems. Proc. Vehicular Technology Conf. Fall, 2001.
R. Van Nee, A. Van Zelst, G. Awater: Maximum likelihood decoding in space division multiplexing system. Proc. Vehicular Technology Conf. Spring, 2000.
A. Adjoudani, E. Beck et al.: Prototype experience for MIMO BLAST over third generation wireless system. IEEE Journal on Selected Areas in Communications, Vol. 21, No. 3, 2003.