[IEEE 2011 IEEE Custom Integrated Circuits Conference - CICC 2011 - San Jose, CA, USA...

Low Power and Error Resilient PN Code Acquisition Filter via Statistical Error Compensation

Eric P. Kim, Daniel J. Baker, Sriram Narayanan, Douglas L. Jones, and Naresh R. Shanbhag

Coordinated Science Laboratory / Department of Electrical and Computer Engineering

University of Illinois at Urbana-Champaign

1308 W Main St., Urbana, Illinois, 61801

Email: {epkim2, djbaker3, spnaraya, dl-jones, shanbhag}@illinois.edu

Abstract—We present a 256-tap PN code acquisition filter in a 180nm CMOS process employing statistical system-level error compensation. Under voltage overscaling (VOS), a near-constant detection probability (Pdet) above 90% with a 5.8× reduction in energy is achieved at a supply voltage 27% below the point of first failure (PoFF) with an error rate (pe) of 0.868. This is an improvement of 5.8× in energy efficiency over conventional error-free designs, and of 3.79× in energy efficiency and 2170× in error tolerance over existing error-tolerant designs.

I. INTRODUCTION

As CMOS technology scales to the sub-45nm regime in accordance with Moore’s law, nonidealities due to process, temperature and voltage variations, and soft errors are becoming increasingly commonplace. These variations often result in uncertain gate delays and leakage currents, which in turn cause intermittent errors in computation. This trend is expected to worsen in the next decade [1]. Early approaches seek to avoid these errors by designing for the worst case through overprovisioning of resources. These methods are wasteful and often unaffordable in power-limited applications. Therefore, modern IC systems need to be tolerant of subcomponent errors.

Error resiliency to delay/timing errors has been shown [2]–[6] to be an effective approach for combating variations while achieving energy efficiency. Voltage overscaling (VOS) (see Fig. 1) was employed in [2]–[6] to induce timing errors by reducing the supply voltage Vdd below the point of first failure (PoFF). The error rate pe (the fraction of clock cycles in which the output is in error) increases as Vdd is reduced. Figure 1 shows HSPICE simulation results of a 4-tap filter in 45nm CMOS subject to VOS. If the errors are fully compensated for without additional overhead, an energy reduction of up to 9× can be achieved over a system operating at the PoFF.

Razor I [3] employs VOS along with in situ local (flip-flop level) timing error detection and local correction in order to reduce energy while combating variations. Razor I demonstrated that at an error rate of $10^{-7}$, near the PoFF, the error-correction overhead is minimal, and energy-efficiency gains are 14% to 17% compared to an architecture operating at the PoFF. Razor II [4], [5] employs local error detection and architectural replay to operate at an error rate of pe = $4 \times 10^{-4}$, which is also near the PoFF, while achieving energy savings of 33% to 35%.

Fig. 1. Simulations of voltage overscaling (VOS) for a 4-tap correlation filter (a sensor) at 50 MHz in 45nm CMOS. [Plot: energy (pJ) and error rate pe versus Vdd (V) from 0.8V to 1.8V. Annotations: "PoFF", "9X", "This work", "Past work", "2170X", "Voltage overscaling (VOS)".]

In this work, we present a PN code acquisition filter that employs statistical error compensation (SEC). SEC enables operation at a voltage significantly below the PoFF, providing extremely high reliability at very low power (as noted in Fig. 1). The filter is implemented in a 180nm, 1.8V CMOS process and operates at an error rate of pe = 0.868 while achieving a 5.8× improvement in energy efficiency.

The remainder of this paper is organized as follows. Section II introduces the PN code acquisition application along with the conventional and SEC-based architectures. Section III describes the chip design and architecture, and Section IV presents measurement results. Section V concludes the paper.

II. PN CODE ACQUISITION FILTER

Pseudo-noise (PN) codes play an important role in direct-sequence spread spectrum (DS/SS) systems. PN code acquisition is required in order to decode the received message. PN codes have the property that the cross-correlation of two different PN codes is zero, while the autocorrelation has an impulse at lag zero. Reducing the power required to perform PN code acquisition is essential for mobile wireless communication [7].

978-1-4577-0223-5/11/$26.00 ©2011 IEEE

Fig. 2. PN code acquisition systems: (a) conventional, and (b) SEC based. [Block diagrams: (a) a tapped delay line with input $x[n]$, taps $h_0, h_1, \ldots, h_{N-1}$, and output $\hat{y}[n]$; (b) the statistical error compensated PN code acquisition filter, in which sensors S0 to S63 produce $y_0, y_1, \ldots, y_{63}$ that feed a fusion block with output $y$. Signal labels include PN in, PN out, Data in, Data out, Fusion init, Algorithm select, and Load enable.]

The traditional architecture for a PN code acquisition system is a simple matched filter such as the one in Fig. 2(a) [8]. This architecture exploits the correlation property of PN codes. A 256-tap PN acquisition filter correlates the received signal $x_j$ with the PN code $\phi_j$ as:

$$y_o = \sum_{j=0}^{255} \phi_j x_j \tag{1}$$

where $\phi_j$ represents the 1b PN code and $x_j$ is an 8b received signal. Detection of a PN code is done by correlating the received signal against a locally generated PN code. A peak detector or a thresholding block then detects a match by comparing the result against a threshold $\tau$: $y = \mathrm{sgn}(y_o - \tau)$.
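The correlation in (1) and the thresholding step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the 1b code bits are mapped to ±1, `sgn(0)` is taken as +1, and the code sequence, `amplitude`, and `tau` are hypothetical values rather than anything taken from the chip.

```python
import random

def correlate(pn_bits, samples):
    """Equation (1): y_o = sum_j phi_j * x_j, with each code bit mapped to +/-1 (assumption)."""
    return sum((1 if b else -1) * x for b, x in zip(pn_bits, samples))

def detect(y_o, tau):
    """Thresholding block: y = sgn(y_o - tau); +1 signals a match (sgn(0) taken as +1)."""
    return 1 if y_o - tau >= 0 else -1

rng = random.Random(0)                         # fixed seed, illustration only
pn = [rng.randint(0, 1) for _ in range(256)]   # random stand-in for a real PN code
amplitude = 10                                 # hypothetical signal amplitude
aligned = [amplitude if b else -amplitude for b in pn]
unrelated = [rng.choice((-amplitude, amplitude)) for _ in range(256)]

y_match = correlate(pn, aligned)       # every term adds +amplitude, so 256 * amplitude
y_other = correlate(pn, unrelated)     # terms largely cancel
tau = 128 * amplitude                  # hypothetical threshold at half the peak
print(detect(y_match, tau), detect(y_other, tau))
```

With an aligned, noise-free input every product contributes `+amplitude`, so the peak is `256 * amplitude`; an unrelated sequence produces a sum whose terms largely cancel.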

The SEC implementation of the PN code acquisition system is shown in Fig. 2(b). First, a parallel block decomposition of the matched filter in Fig. 2(a) is obtained. The parallel outputs are then combined by a fusion block. With a decomposition factor of 64, the 256-tap correlator is decomposed into 64 4-tap sub-correlators, with each output given by:

$$y_i = \sum_{j=0}^{3} \phi_{4i+j} x_{4i+j}. \tag{2}$$
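As a quick check of the decomposition in (2), the sketch below splits a 256-tap correlation into 64 4-tap partial sums and confirms that they add back up to the full correlation in (1). The ±1 code mapping, the signed 8b sample range, and the random data are assumptions made for illustration.

```python
import random

def sub_correlators(pn_signs, samples, taps_per_sensor=4):
    """Equation (2): y_i = sum_{j=0}^{3} phi_{4i+j} * x_{4i+j} for each sub-correlator i."""
    n = taps_per_sensor
    return [
        sum(pn_signs[n * i + j] * samples[n * i + j] for j in range(n))
        for i in range(len(pn_signs) // n)
    ]

rng = random.Random(1)
pn_signs = [rng.choice((-1, 1)) for _ in range(256)]    # stand-in PN code as +/-1
samples = [rng.randint(-128, 127) for _ in range(256)]  # 8b samples, assumed signed

y = sub_correlators(pn_signs, samples)
y_full = sum(p * x for p, x in zip(pn_signs, samples))  # equation (1)
print(len(y), sum(y) == y_full)  # prints: 64 True
```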

Fig. 3. Measured sensor outputs over time. [Plot: sensor output value versus time index, showing $y_o$, individual sensor outputs, and their median and mean. Annotations: outliers due to $\eta$; outliers can shift the mean while the median is unaffected.]

It should be noted that the sum of all sub-correlators, $\sum_{i=0}^{63} y_i$, is equal to $y_o$, the output of the conventional matched filter in (1). Each sub-correlator is referred to as a sensor. As each sensor performs correlation over a different subset of the full PN sequence, sensor outputs exhibit spatially uncorrelated estimation errors $\epsilon_i$, i.e., $y_i = y_o + \epsilon_i$. If, additionally, the sensors are subject to VOS, timing errors $\eta_i$ are induced as well, resulting in sensor outputs given by $y_i = y_o + \epsilon_i + \eta_i$. These errors are compensated via a fusion block that combines all sensor outputs into a single value using mean and median operations. Figure 3 plots the correct output $y_o$, the measured outputs of four sensors, and the mean and median of all 64 sensors. Most of the time the sensor outputs are close to $y_o$, indicating that $\epsilon_i$ is Gaussian distributed (relatively small in magnitude). Occasionally, the sensor outputs deviate significantly from $y_o$, indicating that $\eta_i$ is large in magnitude. This is to be expected, as MSB errors occur in LSB-first computation. The median and mean fusion is shown to compensate for errors effectively.
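The different sensitivity of the mean and the median to large timing errors can be illustrated with synthetic data. The sketch below uses the sensor model $y_i = y_o + \epsilon_i + \eta_i$ with made-up numbers: the value of `y_o`, the Gaussian spread, the outlier size, and the affected sensor indices are all assumptions, not measurements.

```python
from statistics import mean, median
import random

rng = random.Random(2)
y_o = 40                                     # hypothetical error-free output
# Small estimation errors (epsilon_i) on all 64 sensors.
sensors = [y_o + rng.gauss(0, 3) for _ in range(64)]
# Large timing errors (eta_i) on a few sensors, mimicking MSB errors.
for i in (5, 17, 42):
    sensors[i] += 128

fused_mean = mean(sensors)       # pulled upward by about 3 * 128 / 64 = 6
fused_median = median(sensors)   # stays close to y_o
print(round(fused_mean, 1), round(fused_median, 1))
```

Three outliers of size 128 spread over 64 sensors shift the mean by about 6, while the median barely moves because fewer than half of the sensors are affected.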

A more general approach to SEC has been proposed in [9]. The general framework is referred to as stochastic networked computation and is applicable to any architecture that can be decomposed in a statistically similar manner.

III. CHIP ARCHITECTURE

Figure 4 shows the high-level block diagram of the PN code acquisition chip. There are a total of 64 sensors and a fusion block implementing mean and median. An adaptive thresholding block at the output determines whether a detection has occurred.

Fig. 4. High level architectural diagram of the chip. [Block diagram: sensor groups S0-S3 through S60-S63 (4×8b outputs each) feed first-stage fusion blocks F-A through F-P; these feed second-stage blocks F-AA through F-DD, followed by the final fusion and threshold blocks that produce $y$. Inputs include data_in (8b), code_in, PN_load, and PN_in.]

TABLE I
CHIP STATISTICS

Technology    Vdd     Cells    Area         Frequency
TSMC 180nm    1.8V    48440    2mm × 2mm    50MHz

The SEC chip architecture in Fig. 4 reduces filter energy consumption by shifting the 1b taps instead of the 8b data. The 64 sensor outputs $y_0, y_1, \ldots, y_{63}$ are then processed by the fusion block to generate the final output $y$. This is thresholded to indicate the presence or absence of the PN code in the data. Past work [9] has shown that the mean and median operations are effective approximations of the optimal robust estimator. The fusion block implements 3-stage hierarchical mean/median functions to avoid global interconnect. The hierarchical median is based on [10] and requires special attention. The first stage has 8 sensors grouped together, with a 4-sensor overlap, to create 16 median outputs. These are passed to the next stage and grouped in a similar manner to produce 4 median outputs. The final stage chooses the third largest among the four values. The threshold is adaptively set to target a specific false alarm rate. The fusion block was synthesized with a 33% more stringent timing constraint than the sensors to enable it to operate error-free at the supply voltages of interest. The sensors have identical functionality and were synthesized with identical timing constraints. Thus, the error probability mass functions (PMFs) for all sensors are expected to be statistically similar.
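The three-stage grouping described above can be mimicked in software. The sketch below is not the bit-sliced majority-gate circuit of [10]; it uses plain sorting instead. Two further assumptions are made: the overlapped groups wrap around at the ends (so that 64 inputs yield exactly 16 outputs and 16 inputs yield 4), and the median of 8 values is taken as the lower median, which is consistent with choosing the third largest of four in the final stage.

```python
def lower_median(values):
    """Lower median: element at index (n - 1) // 2 of the sorted values.
    For 4 values this is the third largest, matching the final stage."""
    ordered = sorted(values)
    return ordered[(len(ordered) - 1) // 2]

def overlapped_stage(values, group=8, stride=4):
    """Median over windows of `group` values taken every `stride` values.
    Windows wrap around (an assumption), so 64 inputs give 16 outputs."""
    n = len(values)
    return [
        lower_median([values[(start + k) % n] for k in range(group)])
        for start in range(0, n, stride)
    ]

def hierarchical_median(sensor_outputs):
    stage1 = overlapped_stage(sensor_outputs)   # 64 -> 16
    stage2 = overlapped_stage(stage1)           # 16 -> 4
    return lower_median(stage2)                 # third largest of four

outputs = [40] * 64
for i in (3, 20, 21, 50):         # a few sensors hit by large timing errors
    outputs[i] = 40 + 128
print(hierarchical_median(outputs))  # prints: 40
```

Because each window of 8 contains at most two corrupted values in this example, every first-stage median is already clean, and the later stages simply pass the value through.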

Figure 5 shows the chip microphotograph. The chip has a total of 48440 cells, a total cell area of 1.871 mm², and a total chip area, including the pad frame, of 2.7mm × 2.7mm. The core area is approximately 2mm × 2mm, and 10 IO pins are placed on each side. Spacing from the power rings to the pad frame is approximately 15 μm. This is summarized in Table I.

IV. TEST RESULTS

Fig. 5. Chip microphotograph.

Fig. 6. Measured sensor’s error PMFs. [Four histograms of error occurrence versus error magnitude, from "Error PMF for R1 (Vdd = 0.85V)" to "Error PMF for R4 (Vdd = 0.60V)", labeled pe = 0.074, pe = 0.54, pe = 0.95, and pe = 0.97.]

The chip was fabricated in a 180nm, 1.8V CMOS process and tested with an Agilent 16900A logic analysis system at a frequency of fclk = 50MHz. At this frequency, the PoFF voltage is 0.95V. Test vectors were generated by corrupting a length-256 PN code sequence with additive white Gaussian noise at an SNR of -12dB. Test vectors of length $10^6$ containing $10^3$ detections were employed. The chip was tested at supply voltages from 0.95V down to 0.6V.

Figure 6 shows the measured error PMFs at the output of sensor S0 (see Fig. 4) for various Vdd. The PMFs were obtained by comparing measured outputs with RTL simulations. Region R1 is at 0.85V, which is near the PoFF (10% below it); the error rate is 0.074 (still 100× higher than in [4]) and consists of only small (single-bit) errors. Region R2 is at 0.76V (20% below the PoFF), where multi-bit errors begin to appear and the error rate is 0.54. Region R3 is at 0.66V (30% below the PoFF) with an error rate of 0.95. This region shows Gaussian-like error statistics ($\epsilon_i$) overlaid with large-magnitude errors ($\eta_i$), which are still correctable with median or mean fusion. Region R4 (0.6V, or 37% below the PoFF) is where the error rate is 0.97 and the system breaks down.

Figure 7 plots Pdet and pe vs. Vdd for a fixed false alarm rate of 10%. It can be seen that a near-constant Pdet ≥ 90% is achieved for Vdd ≥ 0.69V, where pe ≤ 0.868. This voltage is 27% below the PoFF voltage of 0.95V, indicating significant robustness to voltage variations. These results are consistent with the simulation results in [9].
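The test-vector setup and the fixed false-alarm operating point can be approximated in software. The sketch below models channel noise only, with no VOS timing errors, so it does not reproduce the measured Pdet curve. It assumes a random ±1 stand-in for the PN code, a per-sample SNR definition, a simple empirical-quantile threshold, and far fewer trials than the $10^6$-long vectors used on the chip.

```python
import math
import random

def noise_std_for_snr(signal_power, snr_db):
    """Noise standard deviation giving signal_power / noise_power = snr_db (in dB)."""
    return math.sqrt(signal_power / 10 ** (snr_db / 10))

def correlate(code_signs, samples):
    return sum(c * x for c, x in zip(code_signs, samples))

def threshold_for_false_alarm(noise_only_outputs, false_alarm_rate):
    """Pick tau so that roughly `false_alarm_rate` of noise-only outputs exceed it."""
    ordered = sorted(noise_only_outputs)
    index = min(len(ordered) - 1, int(len(ordered) * (1 - false_alarm_rate)))
    return ordered[index]

rng = random.Random(3)
code = [rng.choice((-1, 1)) for _ in range(256)]   # stand-in PN code, unit power
sigma = noise_std_for_snr(1.0, -12)                # about 3.98 at -12dB

trials = 1000                                      # illustration only
noise_only = [correlate(code, [rng.gauss(0, sigma) for _ in code])
              for _ in range(trials)]
with_code = [correlate(code, [c + rng.gauss(0, sigma) for c in code])
             for _ in range(trials)]

tau = threshold_for_false_alarm(noise_only, 0.10)  # 10% false alarm target
p_fa = sum(y > tau for y in noise_only) / trials
p_det = sum(y > tau for y in with_code) / trials
print(round(p_fa, 3), round(p_det, 3))
```

Even at -12dB per sample, correlating over 256 taps raises the aligned output well above the noise-only spread, which is why an error-free correlator detects reliably at a 10% false alarm rate.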

Fig. 7. Detection probability Pdet and sensor probability of error pe vs. supply voltage Vdd. [Plot: Pdet and pe on logarithmic axes versus Vdd (V) from 0.55V to 0.95V, with the PoFF marked and an inset covering 0.66V to 0.74V.]

Fig. 8. Energy consumption and sensor probability of error pe vs. supply voltage Vdd. [Plot: energy (pJ) and pe (logarithmic axis) versus Vdd (V) from 0.6V to 1V. Annotations: PoFF at 0.95V, 5.8X energy reduction, 2170X error tolerance.]

Figure 8 shows the relation between energy and Vdd along with pe. It can be seen that 5.8× energy savings and 2170× error tolerance can be achieved at Vdd = 0.69V compared to Vdd = 0.95V (PoFF), without any loss in system-level performance (probability of detection Pdet), in the presence of very high error rates pe ≤ 0.868. Compared to the simulation results in Fig. 1, the measurements in Fig. 8 indicate that the expected error tolerance and up to half of the potential energy savings have been realized. This represents 3.79× greater energy savings than [4], with a 2170× higher error-rate tolerance.

Table II compares the results of our work with previously published work. It can be seen that the SEC-based design achieves significantly better performance by operating at the system level and utilizing statistical information.

V. CONCLUSION

TABLE II
COMPARISON WITH OTHER WORK

Design      Vdd           Tech.    pe       Energy Savings
[3]         1.2–1.8V      180nm    0.1%     14-17%
[4]         0.8–1.2V      130nm    0.04%    33-35%
[6]         0.9–1.0V      45nm     N/A      22%
Our work    0.69–0.95V    180nm    86.8%    82.8%

We have shown an implementation of a PN acquisition filter utilizing statistical error compensation. This design operates at 27% below the point of first failure (PoFF), in contrast to the 10% droop for the near-PoFF designs [3]–[6], and with an error rate that is 2170× higher. At the minimum voltage of 0.69V, where Pdet remains at or above 90%, the energy consumed is 72.89 pJ, while at the PoFF it is 422.94 pJ, resulting in a 5.8× reduction in energy. This is 3.79× greater energy savings than [4].
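Several of the headline figures follow directly from numbers quoted in the text, as the short calculation below shows. The variable names are illustrative. The 3.79× comparison with [4] is not recomputed because its derivation is not spelled out in the text.

```python
e_poff_pj, e_vos_pj = 422.94, 72.89    # measured energies at 0.95V and 0.69V
v_poff, v_vos = 0.95, 0.69             # PoFF voltage and minimum operating voltage
pe_this, pe_ref4 = 0.868, 4e-4         # this work vs. the error rate quoted for [4]

energy_ratio = e_poff_pj / e_vos_pj            # energy reduction factor
energy_savings = 1 - e_vos_pj / e_poff_pj      # fractional energy savings
voltage_margin = 1 - v_vos / v_poff            # fraction below the PoFF voltage
error_ratio = pe_this / pe_ref4                # error-rate tolerance factor

print(f"{energy_ratio:.2f}x, {energy_savings:.1%}, {voltage_margin:.0%}, {error_ratio:.0f}x")
# prints: 5.80x, 82.8%, 27%, 2170x
```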

VI. ACKNOWLEDGMENTS

The authors acknowledge the support of the Gigascale System Research Center (GSRC) under the Focus Center Research Program (FCRP), a Semiconductor Research Corporation program, and the National Science Foundation grant CCF 0729092.

REFERENCES

[1] “International Technology Roadmap for Semiconductors,” Online: http://www.itrs.net.

[2] R. Hegde and N. Shanbhag, “A voltage overscaled low-power digital filter IC,” IEEE J. Solid-State Circuits, vol. 39, no. 2, pp. 388–391, Feb. 2004.

[3] S. Das, D. Roberts, S. Lee, S. Pant, D. Blaauw, T. Austin, K. Flautner, and T. Mudge, “A self-tuning DVS processor using delay-error detection and correction,” IEEE J. Solid-State Circuits, vol. 41, no. 4, pp. 792–804, Apr. 2006.

[4] D. Blaauw, S. Kalaiselvan, K. Lai, W.-H. Ma, S. Pant, C. Tokunaga, S. Das, and D. Bull, “Razor II: In situ error detection and correction for PVT and SER tolerance,” in Int. Solid-State Circuits Conf. (ISSCC), Feb. 2008, pp. 400–622.

[5] D. Bull, S. Das, K. Shivashankar, G. Dasika, K. Flautner, and D. Blaauw, “A power-efficient 32 bit ARM processor using timing-error detection and correction for transient-error tolerance and adaptation to PVT variation,” IEEE J. Solid-State Circuits, vol. 46, no. 1, pp. 18–31, Jan. 2011.

[6] J. Tschanz, K. Bowman, S.-L. Lu, P. Aseron, M. Khellah, A. Raychowdhury, B. Geuskens, C. Tokunaga, C. Wilkerson, T. Karnik, and V. De, “A 45nm resilient and adaptive microprocessor core for dynamic variation tolerance,” in Int. Solid-State Circuits Conf. (ISSCC), Feb. 2010, pp. 282–283.

[7] W. Namgoong and T. Meng, “Minimizing power consumption in direct sequence spread spectrum correlators by resampling IF samples-Part I: Performance analysis,” IEEE Trans. Circuits Syst. II, vol. 48, no. 5, pp. 450–459, May 2001.

[8] D. Senderowicz, S. Azuma, H. Matsui, K. Hara, S. Kawama, Y. Ohta, M. Miyamoto, and K. Iizuka, “A 23 mW 256-tap 8 MSample/s QPSK matched filter for DS-CDMA cellular telephony using recycling integrator correlators,” in Proc. Int. Solid-State Circuits Conf. (ISSCC), 2000, pp. 354–355.

[9] G. V. Varatkar, S. Narayanan, N. R. Shanbhag, and D. L. Jones, “Stochastic networked computation,” IEEE Trans. VLSI Syst., vol. 18, pp. 1421–1432, Oct. 2010.

[10] C. Lee and C. Jen, “Bit-sliced median filter design based on majority gate,” IEE Proc. G Circuits, Devices and Syst., vol. 139, no. 1, pp. 63–71, Feb. 1992.
