© IMEC 2014
ASIP LDPC DESIGN FOR 802.11AD AND 802.11AC
MENG LI
CSI DEPARTMENT
3/NOV/2014
GDR-ISIS @ TELECOM BRETAGNE BREST FRANCE
OUTLINE
1. Introduction to IMEC and the CSI department
2. ASIP design flow
3. Template of the high-speed LDPC decoding processor
4. Instantiation for the 802.11ad and 802.11ac standards
BRIEF INTRODUCTION OF IMEC
in 2013:
2086 people
383 residents
289 PhD students
71 nationalities
DIGITAL BB LOW POWER ARCHITECTURES
DEMONSTRATION AND TEST
RF AND ANALOG IC AND MODULE DESIGN
SYSTEM AND ALGORITHMS
Strong expertise and track record in high-speed and low-power architectures and circuits for multi-Gbps communication
DIGITAL CASE: IMEC BASEBAND CORES: PROGRAMMABILITY WITH HIGH ENERGY EFFICIENCY
[Figure: flexibility versus energy efficiency, positioning ASICs, DSPs and the IMEC baseband cores (BOADRES, BLOX). Annotation: 5 to 10X higher energy efficiency.]
Low duty cycle and/or simple function ~ power-efficient DSP
High duty cycle and/or complex function ~ specialized data (co)-processor
FULL MODELING IN TARGET COMMERCIAL TOOL FLOW
Typical users: ASIC/SoC design teams
HIGH SPEED LDPC ASIP: TEMPLATE
[Figure: template block diagram. Blocks shown: eight BARRELROT + ADD/SUB pairs, MINSIGN, OUTPUT, syndrome detection, configurable memory. Labels: 8 parallel slices; cross-slice operations.]
Compiled firmware: for a given set of modes, compaction can be done through the synthesis tool.
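The BARRELROT blocks in the template perform cyclic rotations of a block of values, which is how a quasi-cyclic LDPC code applies its circulant permutations. The following is a minimal sketch of that operation in software, not the ASIP's datapath. The function names `barrel_rotate` and `delta_shift` are hypothetical, the left-rotation convention is an assumption, and the idea of applying only the difference between two shifts is one possible way to avoid rotating data back before the next use.

```python
def barrel_rotate(values, shift):
    """Cyclically rotate a non-empty list left by `shift` positions."""
    z = len(values)
    s = shift % z
    return values[s:] + values[:s]


def delta_shift(stored_shift, wanted_shift, z):
    """Extra rotation needed when data is already stored rotated by `stored_shift`.

    Hypothetical helper: it assumes the memory keeps values in the rotated
    order left by the previous access.
    """
    return (wanted_shift - stored_shift) % z


Z = 42  # sub-matrix size of the 802.11ad LDPC codes
llrs = list(range(Z))

# Rotating by 29 directly...
direct = barrel_rotate(llrs, 29)

# ...gives the same order as rotating data already stored with shift 11
# by the difference between the two shifts.
stored = barrel_rotate(llrs, 11)
via_delta = barrel_rotate(stored, delta_shift(11, 29, Z))
```

Because two cyclic rotations compose into one, `direct` and `via_delta` hold the same values in the same order.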
INSTRUCTION SET AND MAPPING
HIGH SPEED LDPC ASIP: TEMPLATE
Available instances: ad, ac
• Horizontal layered decoding
• Check node processing using normalized/offset min-sum
• Flexible quantization scheme
• Multiple-slice implementation
• Offset permutation (rotation only during read of data)
• Early stop & max iteration configuration
• Target toolsuite based
• Assembly programmable
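The first two features above (horizontal layered decoding with offset min-sum check node processing) can be sketched in a few lines of floating-point Python. This is an illustrative model under stated assumptions, not the processor's fixed-point implementation: the function names, the offset value 0.5 and the list-based data layout are all hypothetical.

```python
def offset_min_sum_check_node(msgs, offset=0.5):
    """Offset min-sum check-node update.

    For each edge, the output sign is the product of the other edges' signs
    and the output magnitude is the minimum of the other edges' magnitudes,
    reduced by `offset` and clamped at zero.
    """
    min1 = min2 = float("inf")  # smallest and second-smallest magnitude
    min_idx = -1                # position of the smallest magnitude
    sign_product = 1
    for i, m in enumerate(msgs):
        if m < 0:
            sign_product = -sign_product
        mag = abs(m)
        if mag < min1:
            min2, min1, min_idx = min1, mag, i
        elif mag < min2:
            min2 = mag

    out = []
    for i, m in enumerate(msgs):
        mag = min2 if i == min_idx else min1
        mag = max(mag - offset, 0.0)
        # Dividing out this edge's own sign equals multiplying by it.
        sign = sign_product * (-1 if m < 0 else 1)
        out.append(sign * mag)
    return out


def layered_row_update(app, row_vars, old_msgs, offset=0.5):
    """One check row of a layered schedule; updates `app` in place."""
    # Remove this row's previous contribution from the APP values.
    extrinsic = [app[v] - r for v, r in zip(row_vars, old_msgs)]
    new_msgs = offset_min_sum_check_node(extrinsic, offset)
    # Add the refreshed check-node messages back.
    for v, e, r in zip(row_vars, extrinsic, new_msgs):
        app[v] = e + r
    return new_msgs


app = [2.0, -3.0, 1.0, 4.0]
msgs = layered_row_update(app, [0, 1, 2, 3], [0.0, 0.0, 0.0, 0.0])
```

Tracking only the two smallest magnitudes and the overall sign is what makes a single MINSIGN-style unit sufficient per check node: every outgoing message can be derived from those three quantities.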
INSTANTIATION AND IMPLEMENTATION
CLAUDE DESSET, MENG LI - CSI
[Figure: one template/processor yields several high-throughput instances: 802.11ad (8 slices), 802.11ac (8 slices), 802.11ac (4 slices).]
Template flexibility: number of slices, LLR quantization, check node algorithm, max iteration number, early stop.
802.11AD REQUIREMENTS
[Figure: R3/4 QPSK (MCS8) BER performance with maximum iteration number 5, SC Multi indoor channel. X axis: Eb/N0 (dB), 2 to 14. Y axis: BER/PER, 10^-6 to 10^0. Curves: BER and PER for floating-point, Q5 and Q4 offset min-sum.]
Decoder input rate   Up to 8.316 Gbps (MCS 22-24)
Modes                MCS 0-24
Block size           672 bits
Coding rates         1/2, 5/8, 3/4, 13/16
Quantization         Min 4 bits
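The Q4 and Q5 labels refer to the number of bits used for each input LLR. A minimal sketch of what such a quantizer could look like is given below, assuming a uniform, symmetric, saturating scheme; the function name `quantize_llr` and the `step` parameter are hypothetical, and the slides do not state the actual step size or rounding rule.

```python
def quantize_llr(llr, bits=4, step=1.0):
    """Map a floating-point LLR to a signed integer level with saturation.

    With `bits` = 4 the levels span -7..+7 (symmetric range, so the most
    negative two's-complement code is left unused).
    """
    max_level = 2 ** (bits - 1) - 1
    level = round(llr / step)
    return max(-max_level, min(max_level, level))


q4 = [quantize_llr(x, bits=4) for x in (3.2, -1.4, 100.0, -100.0)]
q5 = [quantize_llr(x, bits=5) for x in (3.2, -1.4, 100.0, -100.0)]
```

Going from 5 to 4 bits halves the number of magnitude levels, which is the trade-off the BER/PER curves above quantify.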
802.11AD RESULTS – FACT SHEET
Feature                                   Value
Algorithm                                 Normalized/offset min-sum, layered decoding
Quantization                              APP: 7/6 bits & UPD: 5/4 bits
Parallelization                           Check-node FUs: 42; bit-node FUs: 42x8 (half layer in parallel)
Functional support                        1/2, 5/8, 3/4, 13/16 (all 802.11ad modes)
Technology                                28HPM, physical synthesis
Clock [MHz]                               440
Quantization                              4 bits
Average #cycles/layer                     4
Throughput [Gbps, 1 iteration, 1 core]    25.8 (R=13/16)
Latency [µs]                              0.126 @ 5 iterations
Area [mm^2]                               0.269 (4 bits) [core-level]
Power [mW]                                35.21 @ QPSK R13/16 [core-level]
Energy efficiency [pJ/bit/iter]           3.7-4.7 [core-level]; 4.5-5.5 [wrapper-level]
• Parallel layer decoding
• Early stop
• Back-rotation
• Demapper
Annotations: 2 cores + 3 iterations; HW/SW early stop; stimuli @ 0.1 dB
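Early stop ends decoding as soon as the hard decisions satisfy every parity check, instead of always running the maximum number of iterations. A minimal sketch of that control loop follows. The names `syndrome_is_zero` and `decode_with_early_stop`, the list-of-rows representation of the parity checks and the `iterate` callback are all hypothetical, and the split between hardware and software in the actual design is not described in the slides.

```python
def syndrome_is_zero(app, checks):
    """True when the hard decisions satisfy every parity check.

    `app` holds one LLR per bit (negative means bit 1); `checks` lists,
    for each parity check, the indices of the bits it involves.
    """
    bits = [1 if v < 0 else 0 for v in app]
    return all(sum(bits[i] for i in row) % 2 == 0 for row in checks)


def decode_with_early_stop(app, checks, iterate, max_iters=5):
    """Run `iterate` until the syndrome is zero or `max_iters` is reached.

    Returns the final LLRs and the number of iterations actually executed.
    """
    for done in range(max_iters):
        if syndrome_is_zero(app, checks):
            return app, done
        app = iterate(app)
    return app, max_iters


# Toy example: two parity checks over four bits, and a stand-in "decoder"
# that repairs the word in a single iteration.
checks = [[0, 1, 2], [1, 2, 3]]
received = [1.0, -1.0, 1.0, 1.0]
fixed, used = decode_with_early_stop(
    received, checks, iterate=lambda a: [1.0, -1.0, -1.0, 1.0]
)
```

In this toy run the loop stops after one iteration rather than five, which is where the throughput and energy benefit of early stop comes from.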
SOA COMPARISON
[1] 2011, ‘LDPC decoder architecture for high-data rate personal-area networks’, Weiner, M., et al., UC Berkeley
[2] 2013, ‘A parallelized layered QC-LDPC decoder for IEEE 802.11ad’, Alexios B., et al., EPFL
[3] 2013, ‘An area and energy efficient half-row-paralleled layer LDPC decoder for the 802.11ad standard’, Meng L., IMEC

                            Berkeley [1]   EPFL [2]   Imec's [3]   Imec's
Decoding algo.              Two-phase      layered    layered      layered
LLR quan.                   5 bits         5 bits     5 bits       4 bits
Parallelism                 42*16          42*2       42*8         42*8
Pipeline scheme             frame level    X          half layer   half layer
Early stop                  YES            NO         NO           YES
Number of cores             Single         Single     Single       Multi-cores
ASIC vs. ASIP               ASIC           ASIC       ASIC         ASIP
Tech node [nm]              65             40         40G          28HPM
Working freq. [MHz]         150            850        500          440
Throughput [Gbps]           3.08           3.12       5.6          8.3
Area [mm^2]                 1.3            0.18       0.16         0.269
Energy eff. [pJ/bit/iter]   6.18           N/A        3.53         4.5-5.5
802.11AC REQUIREMENTS
Decoder input rate   Up to 6.93 Gbps (8 spatial streams, 160 MHz, 256-QAM)
Modes                MCS 0-9
Block size           648, 1296 or 1944 bits
Coding rates         4 coding rates
Quantization         Min 5 bits
MAPPING TO 802.11AC
8-slice solution: average #cycles/iter 74
4-slice solution: average #cycles/iter 114
The 4-slice solution is more efficient than the 8-slice solution for low coding rates.
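The cycle counts above translate into throughput with simple arithmetic: bits per block, times clock frequency, divided by the cycles spent per block. The sketch below works this out under assumptions taken from other slides in this deck (a 1944-bit block and a 500 MHz clock) and assumes the 74 and 114 cycle averages apply to that block size; the function name `throughput_gbps` is hypothetical and the model ignores any load, store or control overhead.

```python
def throughput_gbps(block_bits, clock_mhz, cycles_per_iter, iterations=1):
    """Decoder throughput in Gbps for one core, ignoring I/O overhead."""
    cycles_per_block = cycles_per_iter * iterations
    blocks_per_second = clock_mhz * 1e6 / cycles_per_block
    return block_bits * blocks_per_second / 1e9


eight_slices = throughput_gbps(1944, 500, 74)   # about 13.1 Gbps per iteration
four_slices = throughput_gbps(1944, 500, 114)   # about 8.5 Gbps per iteration
```

Halving the number of slices raises the cycle count by a factor of about 1.5 rather than 2, which is consistent with the 4-slice instance making better use of its hardware.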
802.11AC RESULTS – FACT SHEET
Feature                                      Value
Algorithm                                    Normalized/offset min-sum, layered decoding
Quantization                                 APP: 7 bits & UPD: 5 bits
Parallelization                              Check-node FUs: Z = 81/54/27; bit-node FUs: Zx8, Zx4 or Zx2
Technology                                   28HPM, physical synthesis
Clock [MHz]                                  500

Instances                                    8 slices                     4 slices
Average #cycles/layer                        7                            10
Area [mm^2]                                  0.447                        0.383
Throughput [Gbps, 1 iteration, 1 core] @In   37.4 (R5/6) ~ 13.1 (R1/2)    23.7 (R5/6) ~ 8.4 (R1/2)
Latency [ns]                                 580 @ 10 iterations          940 @ 10 iterations
Energy efficiency [pJ/bit/iter]              6.95                         7.10

Functional support: all 802.11ac modes (1/2, 2/3, 3/4, 5/6)
Functional completeness:
• Parallel layer decoding
• Back-rotation
• Demapper
Annotation: (R5/6, Z81)
SOA COMPARISON
[1] 2010, ‘A 15.8 pJ/bit/iter quasi-cyclic LDPC decoder for IEEE 802.11n in 90 nm CMOS’, Roth C., et al., ETH Zurich
[2] 2011, ‘A 115mW 1Gbps QC-LDPC decoder ASIC for WiMAX in 65nm CMOS’, Xiao P., et al., Waseda University
[3] 2011, ‘Multi-layer parallel decoding algorithm and VLSI architecture for quasi-cyclic LDPC codes’, Yang S., et al., Rice University

                            Zurich [1]     Waseda [2]    Rice [3]         Imec's
Decoding algo.              layered        layered       multi-layered    layered
LLR quan.                   5 bits         5 bits        5 bits           5 bits
Parallelism                 Z              Z*2           Z*4              Z*4
Pipeline scheme             NO             NO            1/12 layer       1/6 layer
Early stop                  NO             NO            NO               NO
ASIC vs. ASIP               ASIC           ASIC          ASIC             ASIP
Tech node [nm]              90             65            45               28HPM
Size Z & rate               Z=81 & R=5/6   Z=48 & R=?    Z=81 & R=5/6     Z=81 & R=5/6
Working freq. [MHz]         346            110           815              500
Throughput [Mbps]           680            1056          3000 @ 15 it     2370 @ 10 it
Area [mm^2]                 1.77*          3.36*         0.81             0.383
Energy eff. [pJ/bit/iter]   15.8*          10.9*         N/A              7.1
* tape-out results
802.11AC REQUIREMENTS
[Figure labels: 8 slices solution; 4 slices solution; multi-layer decoding, 4 slices solution.]
Heading for ultra-high throughput FEC