[IEEE 2011 IEEE Custom Integrated Circuits Conference - CICC 2011 - San Jose, CA, USA...
Low Power and Error Resilient PN Code Acquisition Filter via Statistical Error Compensation
Eric P. Kim, Daniel J. Baker, Sriram Narayanan, Douglas L. Jones, and Naresh R. Shanbhag
Coordinated Science Laboratory / Department of Electrical and Computer Engineering
University of Illinois at Urbana-Champaign
1308 W Main St., Urbana, Illinois, 61801
Email:{epkim2, djbaker3, spnaraya, dl-jones, shanbhag}@illinois.edu
Abstract—We present a 256-tap PN code acquisition filter in a 180nm CMOS process employing statistical system-level error compensation. Under voltage overscaling (VOS), near constant detection probability (Pdet) above 90% with 5.8× reduction in energy is achieved at a supply voltage 27% below the point of first failure (PoFF) with an error rate (pe) of 0.868. This is an improvement of 5.8× in energy-efficiency over conventional error free designs and 3.79× in energy-efficiency and 2170× in error tolerance over existing error tolerant designs.
I. INTRODUCTION
As CMOS technology scales to the sub-45nm regime in
accordance with Moore’s law, nonidealities due to process,
temperature and voltage variations, and soft errors are
increasingly becoming commonplace. These variations often result
in uncertain gate delays and leakage currents, subsequently
causing intermittent errors in computation. This trend is
expected to worsen in the next decade [1]. Early approaches
sought to avoid these errors by designing for the worst case
through overprovisioning of resources. These methods are
often wasteful and unaffordable in many power-limited
applications. Therefore, modern IC systems need to be tolerant
of subcomponent errors.
Error-resiliency to delay/timing errors has been shown [2]–
[6] to be an effective approach for combating variations while
achieving energy-efficiency. Voltage overscaling (VOS) (see
Fig. 1) was employed in [2]–[6] to induce timing errors by
reducing the supply voltage Vdd below the point of first failure
(PoFF). The error-rate pe (percentage of clock cycles in which
the output is in error) increases as Vdd is reduced. Figure 1
shows HSPICE simulation results of a 4 tap filter in 45nm
CMOS subject to VOS. If the errors are fully compensated
for without additional overhead, energy reduction up to 9× can be achieved over a system operating at PoFF.
Razor I [3] employs VOS along with in situ local (FF level)
timing error-detection and local correction in order to reduce
energy while combating variations. Razor I demonstrated that
at an error-rate of 10⁻⁷, near the PoFF, the error-correction
overhead is minimal, and energy-efficiency gains are 14%-to-17%
compared to an architecture operating at the PoFF.
Razor II [4], [5] employs local error-detection and architectural
replay to operate at an error-rate of pe = 4 × 10⁻⁴, which is
also near PoFF, while achieving an energy savings of 33%-to-35%.
Fig. 1. Simulations of voltage overscaling (VOS) for a 4-tap correlation filter (a sensor) at 50 MHz in 45nm CMOS. [Plot: energy (pJ) and error rate pe vs. Vdd (V); annotations: PoFF, 9X, 2170X, "This work", "Past work", "Voltage overscaling (VOS)".]
In this work, we present a PN code acquisition filter that
employs statistical error compensation (SEC). SEC enables
operation at a voltage significantly less than at PoFF providing
extremely high reliability at very low power (as noted in
Fig. 1). The filter is implemented in an 180nm, 1.8V CMOS
process, operating at an error-rate of pe = 0.868 while
achieving a 5.8× improvement in energy-efficiency.
The remainder of this paper is organized as follows: Section
II introduces the PN-code acquisition application along with
conventional and SEC based architectures. The chip design
and architecture are described in Section III, and Section IV
presents measurement results. Section V concludes the paper.
II. PN CODE ACQUISITION FILTER
Pseudo-noise (PN) codes play an important role in
direct-sequence spread spectrum (DS/SS) systems. PN code
acquisition is required to be able to decode the received message.
PN codes have the property that the cross-correlation of
two different PN codes is zero, while the autocorrelation
has an impulse at lag zero. Reducing the power required to
perform PN code acquisition is essential for mobile wireless
communication [7].
The traditional architecture for a PN code acquisition system
is a simple matched filter such as the one in Fig. 2(a) [8]. This
architecture exploits the correlation characteristic of PN codes.
978-1-4577-0223-5/11/$26.00 ©2011 IEEE
Fig. 2. PN code acquisition systems: (a) conventional, and (b) SEC based. [Block diagrams: (a) a tapped delay line with input x[n], taps h0 through hN−1, and output ŷ[n]; (b) sensors S0 through S63, each with a PN code input, whose outputs y0 through y63 feed a fusion block that produces y.]
A 256-tap PN acquisition filter correlates the received
signal xj with the PN-code φj as:
$$y_o = \sum_{j=0}^{255} \phi_j x_j \tag{1}$$
where φj represents the 1b PN-code and xj is an 8b received
signal. A PN code is detected by correlating the received
signal against a locally generated PN code. A peak detector
or a thresholding block then declares a match by comparing
the result against a threshold τ, i.e. y = sgn(yo − τ).
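The correlation in (1) and the threshold decision can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 1b PN code is mapped to ±1, the samples are signed integers, a result exactly equal to τ is treated as no match, and the names `correlate_pn` and `detect` are hypothetical rather than taken from the chip design.

```python
import numpy as np

def correlate_pn(x, phi):
    """Eq. (1): y_o = sum over j of phi_j * x_j for one window of samples."""
    x = np.asarray(x, dtype=np.int64)
    phi = np.asarray(phi, dtype=np.int64)
    if x.shape != phi.shape:
        raise ValueError("x and phi must have the same length")
    return int(np.dot(phi, x))

def detect(y_o, tau):
    """Threshold decision y = sgn(y_o - tau): +1 for a match, -1 otherwise.

    Assumption: y_o == tau counts as no match.
    """
    return 1 if y_o > tau else -1

# Illustrative data (assumed, not from the paper): a random +/-1 code and a
# noiseless, aligned received signal of amplitude 4.
rng = np.random.default_rng(0)
phi = rng.choice([-1, 1], size=256)
x_aligned = 4 * phi

y_o = correlate_pn(x_aligned, phi)   # every product is +4, so y_o = 4 * 256
print(y_o, detect(y_o, tau=512))     # prints "1024 1"
```

With an aligned code every product φj·xj has the same sign, which is why the correlation peaks; a misaligned or different code gives products of mixed sign that largely cancel.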
The SEC implementation of the PN code acquisition system
is shown in Fig. 2(b). First, a parallel block decomposition of
the matched filter in Fig. 2(a) is obtained. The parallel outputs
are then combined by a fusion block. With a decomposition
factor of 64, the 256-tap correlator is decomposed into 64
4-tap sub-correlators with each output given by:
$$y_i = \sum_{j=0}^{3} \phi_{4i+j} x_{4i+j} \tag{2}$$
Fig. 3. Measured sensor outputs over time. [Plot: sensor output value vs. time index, showing yo, individual sensor outputs, and their median and mean; annotations note outliers due to η and that outliers can shift the mean while the median is unaffected.]
It should be noted that the sum of all sub-correlators, $\sum_{i=0}^{63} y_i$,
is equal to yo, the output of the conventional matched filter
in (1). Each sub-correlator is referred to as a sensor. As each
sensor performs correlation over a different subset of the full
PN sequence, sensor outputs exhibit spatially uncorrelated
estimation errors εi, i.e. yi = yo + εi. If, additionally, the
sensors are subject to VOS, timing errors ηi are induced as
well, resulting in sensor outputs given by yi = yo + εi + ηi.
These errors are compensated via a fusion block that combines
all sensor outputs to a single value using mean and median
operations. Figure 3 plots the correct output yo, measured
outputs of four sensors, and the mean and median of all
64 sensors. Most of the time the sensor outputs are close to yo,
indicating that εi is Gaussian distributed (relatively small
in magnitude). Occasionally, the sensor outputs deviate
significantly from yo, indicating that ηi is large in magnitude.
This is to be expected, as timing errors in LSB-first
computation fall in the MSBs. The median and mean fusion is shown to
compensate for errors effectively.
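A small numerical sketch may help make the decomposition and fusion concrete. It assumes a ±1 PN code, a noiseless aligned input, and timing errors modeled as a fixed −128 offset on three arbitrarily chosen sensors; none of these values come from the measurements, and `sensor_outputs` is a hypothetical helper name. No rescaling between a sensor output and the full correlation is applied here.

```python
import numpy as np

def sensor_outputs(x, phi, taps_per_sensor=4):
    """Eq. (2): y_i = sum_{j=0}^{3} phi_{4i+j} * x_{4i+j} for each sensor i."""
    products = np.asarray(phi, dtype=np.int64) * np.asarray(x, dtype=np.int64)
    return products.reshape(-1, taps_per_sensor).sum(axis=1)

# Illustrative data (assumed): random +/-1 code, aligned signal of amplitude 4.
rng = np.random.default_rng(1)
phi = rng.choice([-1, 1], size=256)
x = 4 * phi

y = sensor_outputs(x, phi)      # 64 sensor outputs, each equal to 16 here
full = int(y.sum())             # summing the sensors recovers Eq. (1): 1024

# Hypothetical VOS-induced MSB errors on three sensors: 16 becomes -112.
y_vos = y.copy()
y_vos[[3, 20, 41]] -= 128

mean_fused = float(np.mean(y_vos))      # pulled down to 10.0 by the outliers
median_fused = float(np.median(y_vos))  # stays at 16.0
print(full, mean_fused, median_fused)   # prints "1024 10.0 16.0"
```

The three outliers move the mean from 16 to 10 while the median is unchanged, which is the behavior annotated in Fig. 3.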
A more general approach to SEC has been proposed in [9].
The general framework is referred to as stochastic networked
computation and is applicable to any architecture that can be
decomposed in a statistically similar manner.
III. CHIP ARCHITECTURE
Figure 4 shows the high level block diagram of the PN code
acquisition chip. There are a total of 64 sensors, and a fusion
block implementing mean and median. An adaptive thresh-
olding block at the output determines whether a detection has
occurred.
The SEC chip architecture in Fig. 4 reduces filter energy
consumption by shifting 1b taps instead of the 8b data. The 64
sensor outputs y0, y1, ..., y63 are then processed by the fusion
block to generate the final output y. This is thresholded to
indicate the presence/absence of the PN code in the data. Past
work [9] has shown that the mean and median operations are
effective approximations of the optimal robust estimator. The
fusion block implements 3-stage hierarchical mean/median
functions to avoid global interconnect. The hierarchical median
is based on [10] and requires special attention. The first stage
has 8 sensors grouped together, with a 4-sensor overlap, to create
16 median outputs. These are passed to the next stage, and
grouped in a similar manner to produce 4 median outputs.
The final stage chooses the third largest among the four values.
The threshold is adaptively set to target a specific false alarm
rate. The fusion block was synthesized with a 33% more stringent
timing constraint than the sensors to enable it to operate
error-free at the supply voltages of interest. The sensors have
identical functionality, and were synthesized with identical
timing constraints. Thus, the error probability mass functions
(PMFs) for all sensors are expected to be statistically similar.
Fig. 4. High level architectural diagram of the chip. [Block diagram: sensor groups S0-S3 through S60-S63 feed first-stage fusion blocks F-A through F-P, which feed second-stage blocks F-AA through F-DD, followed by the final fusion and threshold blocks that produce y; inputs include data_in, code_in, PN_load, and PN_in.]
TABLE I
CHIP STATISTICS
Technology   Vdd    Cells   Area        Frequency
TSMC 180nm   1.8V   48440   2mm × 2mm   50MHz
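The three-stage hierarchical median described above can be sketched as follows. Several details are assumptions made only for illustration: the groups are taken to wrap around so that 64 inputs with a step of 4 yield exactly 16 groups, the median of an 8-element group is taken as its fourth smallest value, and the function names are hypothetical. The bit-sliced majority-gate implementation of [10] is not modeled.

```python
def overlapping_groups(values, size=8, step=4):
    """Split values into groups of `size`, advancing by `step`.

    Assumption: indices wrap around, so 64 inputs give 16 groups
    and 16 inputs give 4 groups.
    """
    n = len(values)
    return [[values[(start + k) % n] for k in range(size)]
            for start in range(0, n, step)]

def group_median(group):
    """Lower median of a group (assumed tie-break for even-sized groups)."""
    return sorted(group)[(len(group) - 1) // 2]

def hierarchical_median(sensors):
    """Three-stage median: 64 sensors -> 16 -> 4 -> one output."""
    stage1 = [group_median(g) for g in overlapping_groups(sensors)]
    stage2 = [group_median(g) for g in overlapping_groups(stage1)]
    # Final stage: third largest of the four second-stage values.
    return sorted(stage2, reverse=True)[2]

# Illustrative input (assumed): 64 sensors at 16 with three outliers.
sensors = [16] * 64
sensors[0] = -112
sensors[5] = 127
sensors[40] = -112
print(hierarchical_median(sensors))   # prints "16"
```

Because each group only needs its own 8 inputs, the comparison logic stays local, which is consistent with the stated goal of avoiding global interconnect.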
Figure 5 shows the chip microphotograph. The chip has a
total of 48440 cells, a total cell area of 1.871mm², and the
total chip area including the pad frame is 2.7mm × 2.7mm.
The core area is approximately 2mm× 2mm, and 10 IO pins
are placed on each side. Spacing from the power rings to the
pad frame is approximately 15 μm. This is summarized in
Table I.
IV. TEST RESULTS
The chip was fabricated in a 180nm, 1.8V CMOS process
and tested with an Agilent 16900A logic analysis system at a
frequency of fclk = 50MHz. At this frequency, the PoFF voltage
is 0.95V. Test vectors were generated by corrupting a length
256 PN code sequence with additive white Gaussian noise at
an SNR of -12dB. Test vectors of length 10⁶ containing
10³ detections were employed. The chip was tested at supply
voltages from 0.95V down to 0.6V. Figure 6 shows the
measured error PMFs at the output of sensor S0 (see Fig. 4) for
various Vdd. The PMFs were obtained by comparing measured
outputs with RTL simulations. Region R1 is at 0.85V, 10%
below the PoFF; here the error rate is
0.074 (still 100× higher than in [4]) and the errors are small
(single bit) in value. Region R2 is at 0.76V (20% below
the PoFF), where multi-bit errors begin to appear and the error
rate is 0.54. Region R3 is at 0.66V (30% below PoFF) with
an error rate of 0.95. This region shows Gaussian-like error
statistics (εi) overlaid with large magnitude errors (ηi), which
are still correctable with median or mean fusion. Region R4
(0.6V, or 37% below PoFF) is where the error rate is 0.97
and the system breaks down. Figure 7 plots Pdet and pe vs.
Vdd for a fixed false alarm rate of 10%. It can be seen that a
near constant Pdet ≥ 90% is achieved for Vdd ≥ 0.69V and
pe ≤ 0.868. This voltage is 27% below the PoFF voltage of
0.95V, indicating significant robustness to voltage variations.
These results are consistent with the simulation results in [9].
Fig. 5. Chip microphotograph.
Fig. 6. Measured sensor’s error PMFs. [Four histograms of occurrence vs. error magnitude: R1 (Vdd = 0.85V, pe = 0.074), R2 (Vdd = 0.76V, pe = 0.54), R3 (Vdd = 0.66V, pe = 0.95), and R4 (Vdd = 0.60V, pe = 0.97).]
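The error rate pe and the error PMF are both obtained by comparing measured outputs against a reference, which is easy to sketch. The function name `error_stats` and the sample values below are hypothetical and chosen only to show the bookkeeping; they are not measured data.

```python
from collections import Counter

def error_stats(measured, reference):
    """Return (p_e, pmf) from measured outputs and error-free reference outputs.

    p_e is the fraction of cycles whose output differs from the reference,
    and pmf maps each error magnitude (measured - reference) to its
    relative frequency.
    """
    if len(measured) != len(reference):
        raise ValueError("measured and reference must have the same length")
    errors = [m - r for m, r in zip(measured, reference)]
    p_e = sum(e != 0 for e in errors) / len(errors)
    counts = Counter(errors)
    pmf = {e: c / len(errors) for e, c in sorted(counts.items())}
    return p_e, pmf

# Made-up example: 8 cycles, three of them in error.
reference = [16] * 8
measured = [16, 17, 16, -112, 16, 15, 16, 16]
p_e, pmf = error_stats(measured, reference)
print(p_e)   # prints "0.375"
print(pmf)   # prints "{-128: 0.125, -1: 0.125, 0: 0.625, 1: 0.125}"
```

In this toy data the ±1 entries play the role of small single-bit errors, and the −128 entry stands in for a large MSB error of the kind seen in regions R3 and R4.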
Figure 8 shows the relation between energy and Vdd along
with pe. It can be seen that 5.8× energy savings and 2170×
error tolerance can be achieved at Vdd = 0.69V compared
to Vdd = 0.95V (PoFF) without any loss in system level
performance (probability of detection Pdet) in the presence
of very high error rates pe ≤ 0.868. Compared to simulation
results in Fig. 1, measurements in Fig. 8 indicate that the expected
error tolerance and up to half of the potential energy savings
have been realized. This represents a 3.79× greater energy
savings over [4] with a 2170× higher error rate tolerance.
Table II compares the results of our work with previously
published work. It can be seen that the SEC based design
achieves significantly better performance by operating at the
system level and utilizing statistical information.
Fig. 7. Detection probability Pdet and sensor probability of error pe vs. supply voltage Vdd. [Plot: Pdet and pe (log scale) vs. Vdd (V) from 0.55V to 0.95V with the PoFF marked; a second panel zooms in on Vdd from 0.66V to 0.74V.]
Fig. 8. Energy consumption and sensor probability of error pe vs. supply voltage Vdd. [Plot: energy (pJ) and pe (log scale) vs. Vdd (V); annotations mark the PoFF at 0.95V, a 5.8X energy reduction, and a 2170X error tolerance.]
TABLE II
COMPARISON WITH OTHER WORK
           Vdd            Tech.   pe      Energy Savings
[3]        1.2 − 1.8V     180nm   0.1%    14-17%
[4]        0.8 − 1.2V     130nm   0.04%   33-35%
[6]        0.9 − 1.0V     45nm    N/A     22%
Our work   0.69 − 0.95V   180nm   86.8%   82.8%
V. CONCLUSION
We have shown an implementation of a PN acquisition filter
utilizing statistical error compensation. This design operates
at 27% below the point of first failure (PoFF), in contrast to
the 10% droop for the near PoFF designs [3]–[6], and with an
error rate that is 2170× higher. At
the minimum voltage of 0.69V, where Pdet remains at or above 90%, the
energy consumed is 72.89 pJ, while at PoFF it is 422.94 pJ,
resulting in a 5.8× reduction in energy. This is 3.79× greater
energy savings than [4].
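The headline ratios follow directly from the reported numbers, as the short check below illustrates. All inputs are values quoted in the text (energies at 0.69V and at the PoFF, the two supply voltages, and the error rates of this work and of Razor II [4]); the variable names are only illustrative, and the 3.79× comparison with [4] is not re-derived here.

```python
# Values quoted in the paper.
E_POFF_PJ = 422.94    # energy at the PoFF (0.95V), in pJ
E_VOS_PJ = 72.89      # energy at 0.69V, in pJ
V_POFF = 0.95         # PoFF supply voltage, in V
V_MIN = 0.69          # lowest supply voltage with Pdet >= 90%, in V
PE_SEC = 0.868        # error rate tolerated by this work
PE_RAZOR_II = 4e-4    # error rate reported for Razor II [4]

energy_reduction = E_POFF_PJ / E_VOS_PJ
energy_savings_pct = 100 * (1 - E_VOS_PJ / E_POFF_PJ)
voltage_below_poff_pct = 100 * (1 - V_MIN / V_POFF)
error_tolerance_gain = PE_SEC / PE_RAZOR_II

print(f"energy reduction: {energy_reduction:.1f}x")        # 5.8x
print(f"energy savings:   {energy_savings_pct:.1f}%")      # 82.8%
print(f"below PoFF:       {voltage_below_poff_pct:.0f}%")  # 27%
print(f"error tolerance:  {error_tolerance_gain:.0f}x")    # 2170x
```

The 82.8% energy savings listed in Table II is the same result as the 5.8× reduction, expressed as a fraction of the PoFF energy.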
VI. ACKNOWLEDGMENTS
The authors acknowledge the support of the Gigascale
System Research Center (GSRC) under the Focus Center
Research Program (FCRP), a Semiconductor Research Cor-
poration program, and the National Science Foundation grant
CCF 0729092.
REFERENCES
[1] “International Technology Roadmap for Semiconductors,” Online: http://www.itrs.net.
[2] R. Hegde and N. Shanbhag, “A voltage overscaled low-power digital filter IC,” IEEE J. Solid-State Circuits, vol. 39, no. 2, pp. 388–391, Feb. 2004.
[3] S. Das, D. Roberts, S. Lee, S. Pant, D. Blaauw, T. Austin, K. Flautner, and T. Mudge, “A self-tuning DVS processor using delay-error detection and correction,” IEEE J. Solid-State Circuits, vol. 41, no. 4, pp. 792–804, Apr. 2006.
[4] D. Blaauw, S. Kalaiselvan, K. Lai, W.-H. Ma, S. Pant, C. Tokunaga, S. Das, and D. Bull, “Razor II: In situ error detection and correction for PVT and SER tolerance,” in Int. Solid-State Circuits Conf. (ISSCC), Feb. 2008, pp. 400–622.
[5] D. Bull, S. Das, K. Shivashankar, G. Dasika, K. Flautner, and D. Blaauw, “A power-efficient 32 bit ARM processor using timing-error detection and correction for transient-error tolerance and adaptation to PVT variation,” IEEE J. Solid-State Circuits, vol. 46, no. 1, pp. 18–31, Jan. 2011.
[6] J. Tschanz, K. Bowman, S.-L. Lu, P. Aseron, M. Khellah, A. Raychowdhury, B. Geuskens, C. Tokunaga, C. Wilkerson, T. Karnik, and V. De, “A 45nm resilient and adaptive microprocessor core for dynamic variation tolerance,” in Int. Solid-State Circuits Conf. (ISSCC), Feb. 2010, pp. 282–283.
[7] W. Namgoong and T. Meng, “Minimizing power consumption in direct sequence spread spectrum correlators by resampling IF samples-Part I: Performance analysis,” IEEE Trans. Circuits Syst. II, vol. 48, no. 5, pp. 450–459, May 2001.
[8] D. Senderowicz, S. Azuma, H. Matsui, K. Hara, S. Kawama, Y. Ohta, M. Miyamoto, and K. Iizuka, “A 23 mW 256-tap 8 MSample/s QPSK matched filter for DS-CDMA cellular telephony using recycling integrator correlators,” in Proc. Int. Solid-State Circuits Conf. (ISSCC), 2000, pp. 354–355.
[9] G. V. Varatkar, S. Narayanan, N. R. Shanbhag, and D. L. Jones, “Stochastic networked computation,” IEEE Trans. VLSI Syst., vol. 18, pp. 1421–1432, Oct. 2010.
[10] C. Lee and C. Jen, “Bit-sliced median filter design based on majority gate,” IEE Proc. G Circuits, Devices and Syst., vol. 139, no. 1, pp. 63–71, Feb. 1992.