[Doi 10.1109%2Ficcad.2000.896492] Ki-Wook Kim, _ Kwang-Hyun-Baek, _ Shanbhag, N._ Liu, C.L._ Sung-...

8/17/2019 [Doi 10.1109%2Ficcad.2000.896492] Ki-Wook Kim, _ Kwang-Hyun-Baek, _ Shanbhag, N._ Liu, C.L._ Sung- -- [IEEE …

1/4

Coupling-Driven Signal Encoding Scheme for Low-Power Interface Design

K i - W o o k K i m ,

Kwang-Hyun

B a e k , N a r e s h S h a n b h a g , C.

L

L i u t a n d S u n g - M o

Kang

C o o r d i n a t e d S c i e n c e L a b o r a t o r y , U n iv .

of

I l li n o i s a t U r b a n a - C h a m p a i g n , USA

D e p t . of C o m p u t e r S c i e n c e , N a t io n a l T s i n g

Hua

Univers i ty , Ta iwan

Abstract

Coupling effects between on-chip interconnects must be addressed

in ultra deep submicron VLSI and system-on-a-chip (SoC)designs.

A new low-power bus encoding scheme is proposed to minimize

coupled switchings which dominate the on-chip bus power con-

sumption. The coupling-driven bus invert method use slim encoder

and decoder architecture to minimize the hardware overhead. Ex-

perimental results indicate that our encoding methods save effective

switchings as much as 30 in an 8-bit bus with one-cycle redun-

dancy.

1

Introduction

Increased coupling effect between interconnects in ultra deep sub-

micron technology not only aggravates the power-delay metrics but

also deteriorates the signal integrity due to capacitive and inductive

crosstalk noises. Conventional approaches to interconnect synthe-

sis aim at optimal interconnect structures in terms of interconnect

topology, wire width and spacing, and buffer location and sizes [3].

In this paper, we study a signal encoding scheme to minimize cou-

pling effects between interconnects.

Signal encoding schemes have been proposed to m inimize transi-

tion activities on buses w hile ignoring cross-coupled capacitances.

When statistical properties are unknown a priori, the bus-invert

method [14] and the on-line adaptive scheme

[ I ]

can be applied to

encode random ly distributed signals. On the other hand, highly cor-

related access patterns exhibit a spatio-temporal locality w hich can

be exploited for energy reduction [ I 11 in Gray code

[9,

161, the

TO

method [7], and the working-zone encoding [lo ]. Lower bounds

for minimum achievable transition activity have been derived for

noiseless buses in [12] and for noisy buses in [5].

In

[17], a seg-

mentation method was introduced to reduce power consumption.

Specification transformation approaches were used to reduce the

number

of

memory accesses at the behavioral level [2]. T he effec-

tiveness

of

various encoding schemes w as compared at the system

level in [4].

Most of the previous bus-encoding schemes were designed to

minimize transition activities

on

each signal line as if each line

were isolated from neighboring lines, hence ignoring coup ling ef-

fects. Such an assum ption may be valid for off-chip buses where the

impedances of transmission lines

are

appropriately adjusted. H ow-

ever, this is not the case for long on-chip buses which are particu-

‘Supported

in

part by National Science Foundation under grant NSF

MIP

96 12184,

y

Intel under

grant

5414

and by

SRC

under

98-HJ-641.

capacitances

Figure

1: A

tightly cross-coupled on-chip buses in

a

system-level chip design

larly prevalent in a system-on-a-chip. For example, the wire aspect

ratio is expected to be over 2.4 for intermediate wiring (namely,

third metal layer and fourth m etal layer) in 0.18

p n

seven-layered

metal process [131. Accordingly, coupling has become an important

issue with scaled supply voltage when we consider signal integrity

and power dissipated by coupling capacitances, referred to as cou-

pling power. Shielding can be a way to avoid crosstalk problem

with area overhead.

In this paper, we propose a new encoding scheme for static on-

chip bus structure to minimize coupling power. The key idea is

that coupling effects could be alleviated by transforming the signal

sequences traveling on-chip buses that are closely placed. Small

blocks of encoding and decoding logic are employed at the trans-

mitter and receiver of on-chip buses as shown in Figure 1. T he en-

coder and decoder (codec) should have a low-complexity architec-

ture so that the power and delay overhead due to the codec circuitry

can be com pensated by significant savings in sw itching activities

on tightly coupled buses.

2 Interconnect Power Characterist ics

The average energy consumed by a wire with clock frequency

=

&

can be [61

where n is the number of clock cycles observed and

I c ( t )

represents

the drawn current due to transitions in a clock period. AQaV

is

the

time-averaged charge provided by the power supply to all capaci-

tances of the interconnect and i s given by

where p denotes the switchin g probability, and Ctoidenotes the total

lumped capacitance. The effective capacitance C is defined by

both physical capacitances and switching activities. T he effective

capacitance accounts for time-averaged charge stored in physical

capacitances provided by the power supply.

0-7803-6445-7/00/ 10.00 000 IEEE

318


2/4

Rd adjacent net

*

* *

primary net

Figure

2 A

distributed

RC

model for the interconnects.

a)

TY

pe

(b) Type

L

-

L ; __

c x x

pJ C T Z-1

e) Type

(d)

Type

IV

Figure

3:

Transition types: a) Single line switching; b) both lines switching in oppo-

site direction; c) both lines switching n the same direction ; d)

no

switching.

One can model the physical capacitance of a wire as shown in

Figure

2.

The capacitive components are

self

capacitance

C,

and

coupling capacitance C,. The self capacitance C, represents the

total lumped capacitance including parallel-plate capacitance and

fringe capacitance. For each capacitive component, w e define the

effective capacitance as

c,

=

Y .c,

3)

E, = z.c, 4)

where and Z are the average number of effective transitions per

cycle for C and C,, espectively, which are com puted as follows.

First, we quantify the selftransition activity for the self capac-

itance C,.Let p: denote the transitional probability that the signal

value of

i

chang& from

x

toy , which can be represented by the sig-

nal probability p(i",) and the conditional probability p(iE:l i:,)

such that p&, =,p(i L,).p(i LtlIiz, ,where

x , y E

(0 ,l) . I f we as-

sume that there

is

no

glitch in the signals, the self transition activity

is given by

5 )

y = P0,l

since the capacitance C, will be charged up only when a low-to-high

transition takes place.

Next, the coupled transition activity

2

s computed according

to the correlated switchings between physically adjacent intercon-

nects. The re are four types of possible transitions when we con-

sider dynamic charge distribution over coupling capacitances as il-

lustrated in Figure 3. We suppose that there

are

two parallel wires

placed with minimum spacing. A type transition occurs when one

of the signals switches while the other stays unchanged such that

the coupling capacitance is then charged up to

klC,Vdd.

where the

coefficient

kl

is introduced as a reference for other types of transi-

tion. In a type

II

transition, one bus switches from low to high while

the other switches from high to low. Th e effective capacitance will

be larger than

kl

by a factor of kz the value of which is usually two.

In a type Ill transition, both signals switch simultaneously and C

will not be charged. Howev er, because of possible misalignment

of the two transitions, the amoun t of power consumption va ries ac-

cording to the dynamic characteristics by a factor of k3. In a type

IV transition, there is no dynamic charge d istribution over coupling

capacitance. Thus, we set k4 to zero.

denote the joint transitional probability defined byet

(0,l). Each ty pe of transition contributes to the effective coupling

capacitance between a wire i and a wire as follows.

= kl (P&l +P&l0 +Pylp, + P y l J 0 ) +

~ d P ~ I J 0

;O ,Ol ) +k3 (P& +P?l,Oo)

6)

The total lumped capacitance f or a bus can be computed a ccord-

ing to Equations 5 ) and (6). Accordingly, the dynamic power

consumed by the interconnects and d rivers is given by

We define the capacitance ratio

q

= or a terminated bus.

The capacitance ratio increases as the aspect ratio of the intercon-

nect increases.

3 Low Power Encoding Schemes

Decoder

..................................................................

ncoder

D.son*latw i Con lor

.................................................................

Figure

4

Low-pow er encoder-decoder codec) framework

3.1

General Codec Architecture

Figure

4

llustrates a generic codec architecture for tw o bit signals.

The encoder consists of three components: a predictor, an encod-

ing function block

E,

and a decorrelator. The prediction function

q n ) s a function of past inpu t values given by q n ) = x n - ,

codec architecture. Th e combinational function

E

reduces the aver-

age number of self transitions and cou pled switchings between Zi(n)

and

z , ( n ) .

The encoding function E differs from the architectures

proposed in

[

1,121 n that it takes as input the data on adjacent buses

to account for coupling effect. In ge nerat, for an input signal

x i ( n ) ,

the encoding function is given by

y i ( n )

=

E(xi(n) t

(n),

x i - l ( n ) ,

x i - l ( n ) , x i + l ( n ) , % + l ( n ) ) ,

wherexi- l(n) andx i+l(n) areneighbor-

ing signals. The input data

xi(n)

and the prediction function z(n)

account for the reduction of self transition activities in

z i ( n ) .

Signal

integrity and coupling power depend on both the current value of

neighboring signals xi-1 (n) and

xi+l

(n)) and the transition histo-

ries ( Z - l ( n ) and +I@)) . The decorrelator employs a transition

encoding scheme, whereby a transition is encoded with the logic

value and no transition is encoded with the logic value 0.

As

a mirror of the encoder, the decoder consists of three com po-

nents: a correlator, a decoding function block D, and

a

register to

keep the prediction function qn . The decoding logic function is

given byxi(n )

=

~ (y i (n ) i (n ) , Y i - l (n ) , ~ i - l (n ) , Y i+ l (n ) , Z + l (n ) ) .

The decoder D realizes the inverse function of the encoder E.

In choosing a codec scheme, we need to take into account two

major issues. The first criterion is a tradeoff between architecture

complexity and encoding efficiency. Rent's rule

[8]

states that there

is a simple power-law relationship between the number of YO er-

minals for a logic block and the number of gates contained in that

x ( n 2 ,

... x ( n

K)). We consider K = 1 for low-complexity

319


3/4

block for a given degree of parallelism. It means that considera-

tions for adjacent input pins are likely to add logic gates to a co dec

functional block. Increased number of logic gates in tum can in-

duce overhead power consumption and longer propagational delay.

Therefore, it should be ensured that benefits from data encoding

are large enough to compensate for the overhead of a codec archi-

tecture. Secondly, the codec system should guarantee the unique

decodability constraints, even in the presence of physical noises.

Th e signal integrity can be ensured by asserting spatial redundancy

(extra control lines) or temporal redundancy (extra clock cycles) or

by selecting appropriate supply voltage, the size of transmitter and

receiver, and the clock frequency.

3.2 The Coupling-Driven Bus-Invert Scheme

E n d e r

E)

Coupling mu nl y

Figure

5: An 8-bit bus encoder for the coupling-driven bus-invert scheme

The bus-invert method [14] is limited to reduce transition activ-

ities while assuming that coupling power contribution can be ig-

nored. However, coupling power becomes

a

dominant component

of dynamic power as wires become thinner and taller. To reflect the

technology trend appropriately, we propose a

coupling-driven bus-

invert

method to tackle the coupling power reduction problem. T he

bus-invert method flips the data signal when the number of sw itch-

ing bits is more than half of the number of signal bits. In the sam e

context, we invert the input vector, when the coupling effect of the

inverted signals is less than that of the original signals. Th e prob-

lems are then how to accurately account for the coupling effect, and

to effectively implement the schem e with low hardware overhead.

Before addressing these issues, we state our assumptions. First,

synchronous latches are located at the transmitter side, thus all the

transitions shall take place at the same time on the bus. The si-

multaneo us transitions exclude type transitions by setting k3 =

0.

It means that the results we achieve are on the lower end of

power saving. Second, statistics on the information source are not

given in advance. H ence this scheme is suitable for data bus encod-

ing, where it i s difficult to extract accurate proba bilistic information

off-line.

Enumeration method is employed to represent the coupling ef-

fect. If a bus line Bi is located between two other lines, a signal

transition on

Bi

can trigger charge shifts on both coupling capac-

itances connected to Bi-1 and Bi+l, respectively. In other words,

at most two couplings can be initiated by a signal transition. T hus,

2(N

bits are sufficient to represent the whole set of couplings

in an N-bit bus per bus cycle. T he encoder architecture is shown in

Figure 5.

According to the types of correlated transition between neigh-

boring buses, the coupling encoder generates a codeword as fol-

lows:

00

for a type

l l

or IV transition, 01 for a type

I

transition, and

1 1 for a type I transition. Th e reason that we assign 1 1 to a type

transition is that switchings in different directions require to change

the polarity of the charge stored in the coupling capacitance, hence

consuming about twice the amount of charge required for a type I

transition. T he codeword 1 1 , instead of 10 helps to make a deci-

sion on data inversion using a majority voter, because the majority

voter outputs high when at least eight input lines are high out of fif-

teen inputs. The majority voter can be implemented by using either

full-adder circuitry or resistors and a voltage comparator [14]. The

control signal inv can be transmitted to the receiver using extra

bus lines or extra transfer cycles. On e problem of additional bus

lines for control is the area overhead that may not be allowed due

to physical constraints. In some cases, widening the space between

signal bus lines can reduce the coupling effects more effectively

than introducing extra control lines, because the coupling capaci-

tance is inversely proportional to net space. Temporal redundancy

is an altemative using extra clock cycles to transfer control signals.

We assume that the input stream i s transmitted in burst m ode that

enables us to accommodate temporal redundancy

[

151.

The following theorem gives the expected number of couplings

per bus cycle with independent data source where each bit switches

independently with transitional probability

p.

Theorem 3.1 Let { B i } be an N-tuple of 0-1 valued random vari-

ables such that Bi

= <

bo, bl,

...

, bN-] >, in which a bit bk

0

5

k 5 N switches independently of other bits with transitional

probability

k = p

0

5 k 5 N .

Suppose

{ B i }

are encoded

with the coupling-driven bus-invert algorithm with one-cycle re-

dundancy. Then, the coupled transition activity

Z

per bus cycle

is given by

N - l N . (2 N - 3) N - 2

z =

2 2 N . ( N - 2 ) ‘ ( )

if

transitions

on {Bi}

ollow the binomial distributions.

Example 3.1: The average number of couplings per bus cycle for

an 8-bit bus is evaluated to be 2.484. The average number of cou-

plings per bus cycle for randomly distributed, independent data is

3.5

since

p

=

0.5

and

N

=

8.

Hence, the percentage reduction in

couplings over raw data transmission is 29 which is calculated as

3.5;;484

*

100 when w e use the coupling-driven bus-invert method.

The coupling-driven bus-invert scheme can em ploy multiple cy-

cles to notify the receiver whether the transmitted data are inverted

or not. Multiple extra cycles for cotrol signals imply that the granu-

larity of unit codeword to be processed becom es finer, hence less

couplings are likely to occur on the bus. However, this benefit

comes at the cost of extra clock cycles.

4 Experimental Results

The methods proposed were implemented and power consumptions

were measured using HSPICE. Applied data streams consist of

MPEG video files

(Vl,

2

and

V3

in Table

l),

MP3 audio files

(Al,

2 and

A3)

nd

PDF

format files

(Pl, 2

and

P3).

Table 1 presents percentage reduction in transition activities us-

ing the bus-invert (BI) method and proposed coupling-driven bus-

invert (CBI) method. To com pare fairly, we use one redundant cy-

cle to transfer control signals per eight data packets each of which

consists of 8 bits. The reason that we compare the CBI scheme

with the BI method is that the BI method does not require prob-

abilistic information in advance. The results under the column

SP

present percentage reduction in self transitions, calculated as:

320


4/4

c,

1.0

3.0

5.0

Red

BI

c, p(l) P(E) p(T)

0.25

111

66 20

86 22.5

0.75 319 191 19 210 34.2

1.25

526

315 21

336 36.1

I

Avg. 24.3

I

24.2 I 24.3 ]I 31.8 17.5 I

30.0

BI [I41

-1*100, where SP(X) enotes the number of self tran-

sitions for raw input streams and SP(X') enotes that for encoded

data. The results under the columns

XP

and TP resent percentage

reduction in coupled switchings and total switchings, respectively.

The total switchings account for both self transitions and coupled

switchings with a capacitance ratio q. The C BI scheme yields bet-

ter results compared to the BI method. This is because coupling

reductions due to the CBI encoding are larger than that due to the

BI encoding. It implies that the CBI schem e saves much more ef-

fective switchings by combatting coupled switchings while even

sacrificing self transitions in a degree.

The CBI results show 31.8 average coupling reduction which

is in accordance with our theoretical bound 29 in Example 3.1.

The deviation comes from locality and correlations in input data

streams, which are assumed to be none in theoretical analysis.

Table 2 presents the power consumed by both the codec circuitry

and interconnects fo r the CBI scheme. Us ing 0 . 1 8 ~echnol-

ogy, power is measured for random data streams using HSPICE

with 1.6V power supply and the capacitance ratio q

= 4

that is

a realistic value from [13]. Th e results under column P(1) cor-

respond to power in pW consumed by interconnects and the re-

sults under column

P ( E )

n pW correspond to the power overhead

due to encodeddecoder circuits. Without encoding circuits, all the

power consumption contributes to interconnect power P(1) . Mean-

while, with suggested codec circuits, total power consumption un-

der column

P ( T )

are the sum of interconnect power P(Z) and en-

codeddecoder power P ( E ) . It should be pointed out that within a

realistic range of coupling capacitance, the power overhead P ( E )

due to the codec are relatively small enough to be compensated by

significant reduction in interconnect power P(Z). The percentage

of power savings are shown in the column % Red by using the

coupling-driven bus-invert encoding scheme.

CBI

5

Conclus ions

PI 24.2 23.8 24.2

P2 23.3 21.7 23.1

~ ~

Tightly coupled on-chip buses in a system-on-a-chip impose new

requirements for interconnect power reduc tion and signal integrity.

A novel bus encoding scheme was proposed for reducing power

consumed by on-chip buses by decreasing coupled switchings.

The coupling-driven bus-invert scheme reduces power consump-

tion about 30 with one-cycle redundancy. S imulation results by

using HSPICE indicate that the portion of power consumed by

the encodeddeco der logic block for those schemes is fairly small

in state-of-the-art technology. Therefore, the overhead due to en-

codeddecoder circuitry is compensated for by significant intercon-

nect power savings.

32.4 I 16.9 30.5

30.6 I 16.0 28.8

O

32

References

[I] L. Benini, A. Macii,E. Macii, M. Poncino, and R. Scarsi. Synthesis of

low-overhead interfaces for power-efficient communication over wide

buses. In

Proc. ACM/IEEE Design Automation Con ,

pages 128-133,

1999.

[2] F. Catthoor, F. Franssen,

S.

Wuytack, L. N achtergaele, and

H.

D. Man.

Global communication and memory optimizing transformations for

low power signal processing systems. In VLSI Signal Processing VII,

pages 178-187.1994.

[3]

J.

Cong. An Interconnect-centric design flow for nanometer technolo-

gies.

In Int. Symp. VLSI Technology, Systems, and Applicat ions,

pages

54-57, June 1999.

[4] W. Fomaciari, D. Sciuto, and C. Silvano. Power estimation for ar-

chitectural exploration of HW/SW communication on system-level

buses. In Int. Workshopon HardwareLTojhvare Codesig n, pages 152-

156,1999.

[5] R. Hegde and N. R. Shanbhag. Energy-efficiency in presence of deep

submicron noise. In

Proc. IEEWACM Int. Con Compute r Aided De-

sign, pages 228-234, 1998.

[6]

S. M. Kang and Y. Leblebici. CMCS Digital Integrated Circuits:

Analysis and Design.

McGraw-Hill, 2nd edition, 1998.

[7] L. Benini, G. De Micheli,E. Macii, D. Sciuto and C. S ilvano. Asymp-

totic zero-transition activity encoding for address busses in low-power

microprocessor-based systems. In Proc. the Great akes Symp. VLSI,

pages 77-82, 1997.

[8]

S. Landman and R. L. Russo. On a pin versus block relationship for

partitions of logic paths.

IEEE Trans. Computers,

C-20: 1469-1479,

1971.

191 H.Mehta, R. M. Owens, and M. J. Irwin. Some issues in Gray code

addressing. In Proc. the Great akes Symp. VLSI, pages 178-180,

Mar. 1996.

[IO]

E.

Musoll, T. Lang, and

J.

Cortadella. Working-zone encoding for

reducing the energy in microprocessor address buses.

IEEE Trans.

on

VLSI Systems,

6(4):568-572, Dec. 1998.

[I

I]

P.

R. Panda and

N.

D. Dutt. Reducing address bus transitions for low

power memory mapping. In

Proc. European Design and Test

C o nf ,

pages 63-37, Mar. 1996.

[I21 S. Ramprasad,

N.

. Shanbhag, and

I.

N. Hajj. Information-theoretic

bounds on average signal transition activity. IEEE Trans. on VLSI,

Sept. 1999.

[131 Semiconductor Industry Association . Intema-

tional technology roadmap for semiconductors.

htp://notes.sematech. org/I 999-SIA Roadma pMo me. him,

1999.

[I41 M. R. Stan and W.

P.

Burleson. Bus-invert coding for low-power YO.

IEEE Trans. on V U I Systems,

pages 49-58, Mar. 1995.

[IS]

M. R. Stan and W. P. Burleson. Two-dimensional codes for low-power.

In

International Symposium

on

Low-Pow er Electronics and Design,

pages 335-340, Aug. 1996.

[I61 C. L.

Su,

C. Y Tsui, and A. M. Despain. Saving power in the control

path

of

embedded processors. IEEE Design and Test

of

Computers,

11(4):24-30, 1994.

[I71 Y Zhang, W. Ye, and M. J. Irwin. An alternative architecture for on-

chip global interconnect: segmented bus power modeling. In

Asilomar

Con

on Signals,

ystems, and Computers,

pages 1062-1065, 1998.

,/

[Doi 10.1109%2Ficcad.2000.896492] Ki-Wook Kim, _ Kwang-Hyun-Baek, _ Shanbhag, N._ Liu, C.L._ Sung-...

Documents

Transcript of [Doi 10.1109%2Ficcad.2000.896492] Ki-Wook Kim, _ Kwang-Hyun-Baek, _ Shanbhag, N._ Liu, C.L._ Sung-...