Post on 21-Dec-2015
Bridging the Energy Gap in Size, Weight and Power Constrained Software Defined Radio:
Agile Baseband Processing as a Key Enabler Bruno Bougard, Min Li, David Novo, Liesbet Van der Perre and Francky Catthoor
Bruno Bougard et al.Athens, May 2008 2
The number of standards to implement in a single handset increases dramatically
Mobility
Sta
tio
na
ryW
alki
ng
Dri
vin
g
Data rate10 Mbps
IEEE802.16a,d
1 100
HSxPA
IEEE802.16e
GSMGPRS
DECT
EDGE
3G LTE
0.1
BlueTooth
UMTS
WLAN(IEEE 802.11b)
WLAN (IEEE 802.11a/g/n)
Bruno Bougard et al.Athens, May 2008 3
All cost factors direct towards high-volume programmable solutions everywhere possible
[source: ICERA]
Bruno Bougard et al.Athens, May 2008 4
Two barriers remain
The energy gap The exploding complexity
2G
3G
3G+
LTE?
.11b
.11g
.11n
MIMO
Bruno Bougard et al.Athens, May 2008 5
VLIW/DSPs
ASIPs
FPGAs
Most SWPC SDR research focuses on more energy efficient processor architectures
Flexibility
Efficienc
y
ASICs
VLIW/DSPs
ASIPs
FPGAs RISCs
GPPs
?VLIW/DSPs
ASIPs
FPGAs
• NXP onDSP, EVP
• Sandbridge Sandblaster SB3011
• SiliconHive CSP2200
• Infineon MUSIC
• Icera DXP
• Nokia VectorASIP
• UMich SODA
• ULinkoping/CORESONICS BBE2
• TUDresden SAMIRA
• …
Bruno Bougard et al.Athens, May 2008 6
Radio Baseband Platform Requirements
Low Cost
Energy Aware Spectrum Agile
Long HW lifespan
Short SW deployment time
Scalable HW/SW
Energy aware HW
Energy aware algorithms
Energy aware protocols
Techno-aware power managnt
Versatile RX digital front end
Versatile TX digital front end
Powerful MAC/RLC/QoS Ctrl
Bruno Bougard et al.Athens, May 2008 7
Outline
• IMEC SDR Baseband Platform
• Wanted: Platform Aware Signal Processing
• Case study 1: OFDMA transmitter
• Case study 2: OFDMA receiver
• Case study 3: Dynamic fixed-point format assignment
Bruno Bougard et al.Athens, May 2008 8
Outline
• IMEC SDR Baseband Platform
• Wanted: Platform Aware Signal Processing
• Case study 1: OFDMA transmitter
• Case study 2: OFDMA receiver
• Case study 3: Dynamic fixed-point format assignment
Bruno Bougard et al.Athens, May 2008 9
Where do you need flexibility?Where do you need energy efficiency?
Duty Cycle
Div
ers
ity/V
ers
ati
lity
FE steeringSignal detection
Synchronization
ModulationDemodulation(Inner Modem)
Forward Error Correction
(Outer Modem)
Bruno Bougard et al.Athens, May 2008 10
Where do you need flexibility?Where do you need energy efficiency?
Need in energy efficiency
Need
in
flexib
ility
FE steeringSignal detection
Synchronization
ModulationDemodulation(Inner Modem)
Forward Error Correction
(Outer Modem)
Bruno Bougard et al.Athens, May 2008 11
IMEC MIMO-capable SDR baseband platform
802.11n
and next gen.
802.16e and next gen.
• Up to 3 antennas
• Up to 200Mbps
• <500mW
Flexible platform
3GPPLTE
DVB-H/T
Bruno Bougard et al.Athens, May 2008 14
Two Programmable CGA Processor Cores at its heart
• 32KB I$• 128KB IMEM• 128-entries CMEM• 64KB L1 data scratchpad
• TSMC 90G • Dual VT and substrate biasing for leakage
reduction in sleep mode
• Clock rate 400MHz WCC
• Total Area: 6 sqmm• Power consumption
– Active TC VLIW 75mW
– Active TC CGA 300mW
– Leakage @ T=65C 25mW
– Leakage in standby <10mW
2.55 mm
2.27 mm
L1
scra
tch
padI$
CGA Config mem
CGA Config mem
AHB
Core logic(including registerfiles)
2.55 mm
2.27 mm
L1
scra
tch
padI$
CGA Config mem
CGA Config mem
AHB
Core logic(including registerfiles)
2.27 mm
L1
scra
tch
padI$
CGA Config mem
CGA Config mem
AHB
Core logic(including registerfiles)
• 4x4 64-bit 4-way SIMD CGA• VLIW and CGA mode of operations• C-programmable
• 25 (theoretical) GOPS• 46MOPS/mW
Bruno Bougard et al.Athens, May 2008 1515
200 Mbps+ SDR application driver
• IEEE 802.11 n digital inner modem receiver
– Channel bonding 40MHz
– 2 antennas MIMO SDM OFDM
# ant. mod. scheme
cod. rate
SNR [dB] BER = 10
1 BPSK 1/2 3.0
1 QPSK 1/2 6.5
1 16QAM 1/2 12.5
1 64QAM 3/4 22.3
2 BPSK 1/2 5.5
2 QPSK 1/2 11.5
2 16QAM 1/2 18.0
2 64QAM 3/4 34.0
-3
Bruno Bougard et al.Athens, May 2008 17
0 2 4 6 8 10 12 14 16 18
acorr
fshift
xcorr
fft (2x)
freq offset estim.
freq offset comp
SDM MMSE (2x)
tracking
QAM demap
Total preample processing
Total per symbol processing
execution time @ 400MHz (Us)
Profiling for SDR benchmarks and OFDM full application prove real time operations @100Mbps
2-antenna SDM-OFDM @100Mbps
Bruno Bougard et al.Athens, May 2008 18
Great benefit in area but power higher than dedicated hardware solutions
VLIW ctrl0%
FU VLIW4%
FU CGA25%
VLIW reg6%
CGA reg2%
CMEM13%
DMEM10%
I$1%
peripherals1%
CGA intercon - mux - pipeline
38%
Active Power VLIW: 75mWActive Power CGA: 300mWLeakage Power: 25mW
0
0.5
1
1.5
2
2.5
3
3.5
4
802.11n 802.16e DVB-H 11n&16e all
0
50
100
150
200
250
300
350
400
SDR(IMEC)
Reconf.(Intel)
ASIC(source:
Intel)
SDR(IMEC)
ASIC(Atheros)
Bruno Bougard et al.Athens, May 2008 19
The interconnection network dominates the power consumption in VLIW and CGA modes
VLIW ctrl0%
FU VLIW22%
FU CGA2%
VLIW reg21%
CGA reg2%
CMEM0%
DMEM13%
I$10%
peripherals2%
Interconnect + mux28%
VLIW ctrl0%
FU VLIW4%
FU CGA25%
VLIW reg6%
CGA reg2%
CMEM13%
DMEM10%
I$1%
peripherals1%
CGA intercon - mux - pipeline
38%
VLIW mode CGA mode
Active power: 75mWLeakage Power: 25mW
Active power: 300mWLeakage Power: 25mW
Bruno Bougard et al.Athens, May 2008 20
Outline
• IMEC SDR Baseband Platform
• Wanted: Platform Aware Signal Processing
• Case study 1: OFDMA transmitter
• Case study 2: OFDMA receiver
• Case study 3: Dynamic fixed-point format assignment
Bruno Bougard et al.Athens, May 2008 21
Wanted: SDR-Platform Aware Signal Processing
Elephant as Platform Horse as Platform
Bruno Bougard et al.Athens, May 2008 22
Dynamic signal processing implementation
Time
3GPPChannelresponse
Cycle Count onSoA processor
Bruno Bougard et al.Athens, May 2008 23
Wanted: SDR-Platform Aware Signal Processing
ASIC as platform
• Requires simple control flow
• Requires manifest and regular computation structures
• Maximum functional reuse is a must
• Minimum data wordwidth• Accommodates high
computation loads• Highest energy efficiency
SDR as platform
• Accommodates more complex control flows
• Accommodates complex and irregular computation structures
• Functional reuse not a must (reuse memory footprint only)
• Aligned data wordwidth• Limited maximum computation
load• Lower energy efficiency
Bruno Bougard et al.Athens, May 2008 24
Algorithm-Architecture Co-Design
• Make algorithm compatible with architecture/compiler constraints • Exploit opportunities of programmable architecture
Architecture/
Compiler
Algorithm/Software
Bruno Bougard et al.Athens, May 2008 25
Channel Channel
Observation
• Wireless baseband processing implies high dynamics • Wireless baseband processing tolerate inaccuracy• This is already considered at system level (X-layer), but what
about in the signal processing implementation?
Bruno Bougard et al.Athens, May 2008 26
The opportunity
• Two viewpoints toward complexity – Computation complexity and memory complexity
– Structure complexity (control flow, heterogeneity , etc.)
• Wireless system can cope with inaccuracy (“scalable” QoS)• On SDR
– Computation complexity is much more costly than in ASIC
– Memory complexity is as costly as in ASIC
– Structure complexity is much less costly than in ASIC
• What can we do ?Increase the structure complexity of baseband processing
to reduce the average computation and memory complexity
by enabling run-time adaptation of the algorithms implementation
to the dynamics in QoS requirement, environment (and platform)
Baseband ASIC Baseband ASIC
with Low Structurewith Low Structure
Complexity Complexity
SDR Baseband SDR Baseband
with High Structurewith High Structure
ComplexityComplexity
Bruno Bougard et al.Athens, May 2008 27
Outline
• IMEC SDR Baseband Platform
• Wanted: Platform Aware Signal Processing
• Case study 1: OFDMA transmitter
• Case study 2: OFDMA receiver
• Case study 3: Dynamic fixed-point format assignment
Bruno Bougard et al.Athens, May 2008 28
Motivation: OFDMA Modulation Error requirements vary
WiMAX Specification
Modulation accuracy can be relaxed for lower order modulation
Bruno Bougard et al.Athens, May 2008 29
RCE relaxation can be exploited by a scalable digital OFDMA Modulator
• Original: A large-size (e.g., 1024) IFFT based non-scalable modulator
• Transformed: An scalable OFDMA modulator with 3 cascaded components
Interpolation factor can be used as a knob to adjust the accuracy and computation load to the RCE requirement
Bruno Bougard et al.Athens, May 2008 30
Computation load scales smoothly with the interpolation factor
Norm
aliz
ed c
ycl
e c
ount
Interpolation factor
Bruno Bougard et al.Athens, May 2008 31
Outline
• IMEC SDR Baseband Platform
• Wanted: Platform Aware Signal Processing
• Case study 1: OFDMA transmitter
• Case study 2: OFDMA receiver
• Case study 3: adaptive fixed-point refinement
Bruno Bougard et al.Athens, May 2008 32
OFDMA mod./demod. requires (I)FFT with Partial input/output
The position and number of bins change dynamically
Bruno Bougard et al.Athens, May 2008 33
Efficient Partial FFT on ILP Architectures
• Exploit the partial input/output to reduce active instructions and memory accesses
• 30 years theoretical research on PFFT but few implementations
• We propose a generic and efficient scheme for PFFT on ILP architectures– Any pattern of bin-distribution can be implemented
Bruno Bougard et al.Athens, May 2008 34
The proposed scheme brings important gains in almost all implementation cost factors and scales smoothly with the number of sub-carriers to be processed
Bruno Bougard et al.Athens, May 2008 35
The prize to pay is an higher instruction cache miss rate (acceptable)
Bruno Bougard et al.Athens, May 2008 36
Outline
• IMEC SDR Baseband Platform
• Wanted: Platform Aware Signal Processing
• Case study 1: OFDMA transmitter
• Case study 2: OFDMA receiver
• Case study 3: Dynamic fixed-point format assignment
Bruno Bougard et al.Athens, May 2008 3737
State-of-the-art
• Automatic Floating point to fixed point conversion (>30 years of work)– Commercial products: Catalytic Inc. & Mathworks
– Recent academic contributions:
• Simulation-based: Seoul National Univ. (‘95)• Analytical methods: Aachen (‘98), Northwest Univ. (‘01)• Hybrid methods: Imperial College (‘03), Berkeley (‘04) and ENSSAT (‘05)
[Yoshizawa, S. et Al. ISCAS’06]
• Run-time word-length selection: Receiver VLSI architecture based in a control feedback loop. Hokkaido University (‘06)
Bruno Bougard et al.Athens, May 2008 3838
Modeling of the fixed-point communication system
• Performance of the communication system as a function of the receiver SNR
– BER = f(SNR)
+b
ac
+na
a
c
+nb
a nc
• Fixed-point refined system includes quantization noise
– BER = f(SNR, na, nb, …) = f’(SNR) ≈ f(SNR’)
0 5 10 15 20 25 30 35 400
20
40
60
80
100
120
SNR [dB]
Th
rou
gh
pu
t [M
bp
s]
BPSK 1/2QPSK 1/216QAM 1/264QAM 2/3
A B C D• Implementation-scenarios defined
and optimized at design time
Bruno Bougard et al.Athens, May 2008 39
RX
Opportunity: application dynamics and tolerance to inaccuracy can be propagated to the implementation
• Multiple link parameters trade off noise/interference robustness versus data rate
• Different system configurations have different requirements in [digital] signal processing accuracy use different implementations
0 5 10 15 20 25 30 35 400
20
40
60
80
100
120
SNR [dB]
Th
rou
gh
pu
t [M
bp
s]
BPSK 1/2QPSK 1/216QAM 1/264QAM 2/3
A B C D
AnalogFE
DigitalDSP
#bitsTX A +
noise
Channel
SNR
SY
ST
EM
LE
VE
LIM
PL
EM
EN
TA
TIO
N L
EV
EL
• We adapt the application fixed-point mapping at run-time• By switching between the “mappings”, the average load is reduced
Bruno Bougard et al.Athens, May 2008 40
SDR enables more agile signal processing implementations
DSP implementation
Run-timecontroller
Monitoring info
AdaptData format
QoS req.
TimeFreq
ChanAtt
Several sw implementation of the same functionality with different precision/computation load
Monotonic relation between precision/load
One can switch between sw implementation in a few cycles
Bruno Bougard et al.Athens, May 2008 41
Dynamic fixed-point format assignment increases energy efficiency in situation requiring lower performance
Bruno Bougard et al.Athens, May 2008 42
Dynamic fixed-point format assignment increases energy efficiency in situation requiring lower performance
Bruno Bougard et al.Athens, May 2008 43
Dynamic fixed-point format assignment increases energy efficiency in situation requiring lower performance
Bruno Bougard et al.Athens, May 2008 4444
Increase in scalability
• Energy efficiency increased at lower rate modes
• Average energy consumption is reduced
Bruno Bougard et al.Athens, May 2008 4545
Conclusions
• Energy efficiency of flexible implementation closer to their dedicated hardware counterparts:– Has the potential to continuously best-fit the dynamism.
• Does not rely on hypothetical provision in the standards:– Implementation centric
– Applicable to any functional-level algorithmic solutions
• Wireless systems context today but also other domains tomorrow: – Digital signal processing with an SNR type constraint and which has
dynamic data resolution variation
– biomedical signal processing, multimedia, etc.