[Doi 10.1109%2Ficcad.2000.896492] Ki-Wook Kim, _ Kwang-Hyun-Baek, _ Shanbhag, N._ Liu, C.L._ Sung-...

download [Doi 10.1109%2Ficcad.2000.896492] Ki-Wook Kim, _ Kwang-Hyun-Baek, _ Shanbhag, N._ Liu, C.L._ Sung- -- [IEEE International Conference on Computer Aided Design (ICCAD) - San Jose, CA,

of 4

Transcript of [Doi 10.1109%2Ficcad.2000.896492] Ki-Wook Kim, _ Kwang-Hyun-Baek, _ Shanbhag, N._ Liu, C.L._ Sung-...

  • 8/17/2019 [Doi 10.1109%2Ficcad.2000.896492] Ki-Wook Kim, _ Kwang-Hyun-Baek, _ Shanbhag, N._ Liu, C.L._ Sung- -- [IEEE …

    1/4

    Coupling-Driven Signal Encoding Scheme for Low-Power Interface Design

    K i - W o o k K i m ,

    Kwang-Hyun

    B a e k , N a r e s h S h a n b h a g , C.

    L

    L i u t a n d S u n g - M o

    Kang

    C o o r d i n a t e d S c i e n c e L a b o r a t o r y , U n iv .

    of

    I l li n o i s a t U r b a n a - C h a m p a i g n , USA

    D e p t . of C o m p u t e r S c i e n c e , N a t io n a l T s i n g

    Hua

    Univers i ty , Ta iwan

    Abstract

    Coupling effects between on-chip interconnects must be addressed

    in ultra deep submicron VLSI and system-on-a-chip (SoC)designs.

    A new low-power bus encoding scheme is proposed to minimize

    coupled switchings which dominate the on-chip bus power con-

    sumption. The coupling-driven bus invert method use slim encoder

    and decoder architecture to minimize the hardware overhead. Ex-

    perimental results indicate that our encoding methods save effective

    switchings as much as 30 in an 8-bit bus with one-cycle redun-

    dancy.

    1

    Introduction

    Increased coupling effect between interconnects in ultra deep sub-

    micron technology not only aggravates the power-delay metrics but

    also deteriorates the signal integrity due to capacitive and inductive

    crosstalk noises. Conventional approaches to interconnect synthe-

    sis aim at optimal interconnect structures in terms of interconnect

    topology, wire width and spacing, and buffer location and sizes [3].

    In this paper, we study a signal encoding scheme to minimize cou-

    pling effects between interconnects.

    Signal encoding schemes have been proposed to m inimize transi-

    tion activities on buses w hile ignoring cross-coupled capacitances.

    When statistical properties are unknown a priori, the bus-invert

    method [14] and the on-line adaptive scheme

    [ I ]

    can be applied to

    encode random ly distributed signals. On the other hand, highly cor-

    related access patterns exhibit a spatio-temporal locality w hich can

    be exploited for energy reduction [ I 11 in Gray code

    [9,

    161, the

    TO

    method [7], and the working-zone encoding [lo ]. Lower bounds

    for minimum achievable transition activity have been derived for

    noiseless buses in [12] and for noisy buses in [5].

    In

    [17], a seg-

    mentation method was introduced to reduce power consumption.

    Specification transformation approaches were used to reduce the

    number

    of

    memory accesses at the behavioral level [2]. T he effec-

    tiveness

    of

    various encoding schemes w as compared at the system

    level in [4].

    Most of the previous bus-encoding schemes were designed to

    minimize transition activities

    on

    each signal line as if each line

    were isolated from neighboring lines, hence ignoring coup ling ef-

    fects. Such an assum ption may be valid for off-chip buses where the

    impedances of transmission lines

    are

    appropriately adjusted. H ow-

    ever, this is not the case for long on-chip buses which are particu-

    ‘Supported

    in

    part by National Science Foundation under grant NSF

    MIP

    96 12184,

    y

    Intel under

    grant

    5414

    and by

    SRC

    under

    98-HJ-641.

    capacitances

    Figure

    1: A

    tightly cross-coupled on-chip buses in

    a

    system-level chip design

    larly prevalent in a system-on-a-chip. For example, the wire aspect

    ratio is expected to be over 2.4 for intermediate wiring (namely,

    third metal layer and fourth m etal layer) in 0.18

    p n

    seven-layered

    metal process [131. Accordingly, coupling has become an important

    issue with scaled supply voltage when we consider signal integrity

    and power dissipated by coupling capacitances, referred to as cou-

    pling power. Shielding can be a way to avoid crosstalk problem

    with area overhead.

    In this paper, we propose a new encoding scheme for static on-

    chip bus structure to minimize coupling power. The key idea is

    that coupling effects could be alleviated by transforming the signal

    sequences traveling on-chip buses that are closely placed. Small

    blocks of encoding and decoding logic are employed at the trans-

    mitter and receiver of on-chip buses as shown in Figure 1. T he en-

    coder and decoder (codec) should have a low-complexity architec-

    ture so that the power and delay overhead due to the codec circuitry

    can be com pensated by significant savings in sw itching activities

    on tightly coupled buses.

    2 Interconnect Power Characterist ics

    The average energy consumed by a wire with clock frequency

    =

    &

    can be [61

    where n is the number of clock cycles observed and

    I c ( t )

    represents

    the drawn current due to transitions in a clock period. AQaV

    is

    the

    time-averaged charge provided by the power supply to all capaci-

    tances of the interconnect and i s given by

    where p denotes the switchin g probability, and Ctoidenotes the total

    lumped capacitance. The effective capacitance C is defined by

    both physical capacitances and switching activities. T he effective

    capacitance accounts for time-averaged charge stored in physical

    capacitances provided by the power supply.

    0-7803-6445-7/00/ 10.00 000 IEEE

    318

  • 8/17/2019 [Doi 10.1109%2Ficcad.2000.896492] Ki-Wook Kim, _ Kwang-Hyun-Baek, _ Shanbhag, N._ Liu, C.L._ Sung- -- [IEEE …

    2/4

    Rd adjacent net

    *

    * *

    primary net

    Figure

    2 A

    distributed

    RC

    model for the interconnects.

    a)

    TY

    pe

    (b) Type

    L

    -

    L ; __

    c x x

    pJ C T Z-1

    e) Type

    (d)

    Type

    IV

    Figure

    3:

    Transition types: a) Single line switching; b) both lines switching in oppo-

    site direction; c) both lines switching n the same direction ; d)

    no

    switching.

    One can model the physical capacitance of a wire as shown in

    Figure

    2. 

    The capacitive components are

    self

    capacitance

    C,

    and

    coupling capacitance C,. The self capacitance C, represents the

    total lumped capacitance including parallel-plate capacitance and

    fringe capacitance. For each capacitive component, w e define the

    effective capacitance as

    c,

    =

    Y .c,

    3)

    E, = z.c, 4)

    where and Z are the average number of effective transitions per

    cycle for C and C,, espectively, which are com puted as follows.

    First, we quantify the selftransition activity for the self capac-

    itance C,.Let p: denote the transitional probability that the signal

    value of

    i

    chang& from

    x

    toy , which can be represented by the sig-

    nal probability p(i",) and the conditional probability p(iE:l i:,)

    such that p&, =,p(i L,).p(i LtlIiz, ,where

    x , y E

    (0 ,l) . I f we as-

    sume that there

    is

    no

    glitch in the signals, the self transition activity

    is given by

    5 )

    y = P0,l

    since the capacitance C, will be charged up only when a low-to-high

    transition takes place.

    Next, the coupled transition activity

    2

    s computed according

    to the correlated switchings between physically adjacent intercon-

    nects. The re are four types of possible transitions when we con-

    sider dynamic charge distribution over coupling capacitances as il-

    lustrated in Figure 3. We suppose that there

    are

    two parallel wires

    placed with minimum spacing. A type transition occurs when one

    of the signals switches while the other stays unchanged such that

    the coupling capacitance is then charged up to

    klC,Vdd.

    where the

    coefficient

    kl

    is introduced as a reference for other types of transi-

    tion. In a type

    II

    transition, one bus switches from low to high while

    the other switches from high to low. Th e effective capacitance will

    be larger than

    kl

    by a factor of kz the value of which is usually two.

    In a type Ill transition, both signals switch simultaneously and C

    will not be charged. Howev er, because of possible misalignment

    of the two transitions, the amoun t of power consumption va ries ac-

    cording to the dynamic characteristics by a factor of k3. In a type

    IV transition, there is no dynamic charge d istribution over coupling

    capacitance. Thus, we set k4 to zero.

    denote the joint transitional probability defined byet

    (0,l). Each ty pe of transition contributes to the effective coupling

    capacitance between a wire i and a wire as follows.

    = kl (P&l +P&l0 +Pylp, + P y l J 0 ) +

    ~ d P ~ I J 0

    ;O ,Ol ) +k3 (P& +P?l,Oo)

    6)

    The total lumped capacitance f or a bus can be computed a ccord-

    ing to Equations 5 ) and (6). Accordingly, the dynamic power

    consumed by the interconnects and d rivers is given by

    We define the capacitance ratio

    q

    = or a terminated bus.

    The capacitance ratio increases as the aspect ratio of the intercon-

    nect increases.

    3 Low Power Encoding Schemes

    Decoder

    ..................................................................

    ncoder

    D.son*latw i Con lor

    .................................................................

    Figure

    4

    Low-pow er encoder-decoder codec) framework

    3.1

    General Codec Architecture

    Figure 

    4

    llustrates a generic codec architecture for tw o bit signals.

    The encoder consists of three components: a predictor, an encod-

    ing function block

    E,

    and a decorrelator. The prediction function

    q n ) s a function of past inpu t values given by q n ) = x n - ,

    codec architecture. Th e combinational function

    E

    reduces the aver-

    age number of self transitions and cou pled switchings between Zi(n)

    and

    z , ( n ) .

    The encoding function E differs from the architectures

    proposed in

    [

    1,121 n that it takes as input the data on adjacent buses

    to account for coupling effect. In ge nerat, for an input signal

    x i ( n ) ,

    the encoding function is given by

    y i ( n )

    =

    E(xi(n) t

    (n),

    x i - l ( n ) ,

    x i - l ( n ) , x i + l ( n ) , % + l ( n ) ) ,

    wherexi- l(n) andx i+l(n) areneighbor-

    ing signals. The input data

    xi(n)

    and the prediction function z(n)

    account for the reduction of self transition activities in

    z i ( n ) .

    Signal

    integrity and coupling power depend on both the current value of

    neighboring signals xi-1 (n) and

    xi+l

    (n)) and the transition histo-

    ries ( Z - l ( n ) and +I@)) . The decorrelator employs a transition

    encoding scheme, whereby a transition is encoded with the logic

    value and no transition is encoded with the logic value 0.

    As

    a mirror of the encoder, the decoder consists of three com po-

    nents: a correlator, a decoding function block D, and

    a

    register to

    keep the prediction function qn . The decoding logic function is

    given byxi(n )

    =

    ~ (y i (n ) i (n ) , Y i - l (n ) , ~ i - l (n ) , Y i+ l (n ) , Z + l (n ) ) .

    The decoder D realizes the inverse function of the encoder E.

    In choosing a codec scheme, we need to take into account two

    major issues. The first criterion is a tradeoff between architecture

    complexity and encoding efficiency. Rent's rule

    [8]

    states that there

    is a simple power-law relationship between the number of YO er-

    minals for a logic block and the number of gates contained in that

    x ( n 2 ,

    ... x ( n

    K)). We consider K = 1 for low-complexity

    319

  • 8/17/2019 [Doi 10.1109%2Ficcad.2000.896492] Ki-Wook Kim, _ Kwang-Hyun-Baek, _ Shanbhag, N._ Liu, C.L._ Sung- -- [IEEE …

    3/4

    block for a given degree of parallelism. It means that considera-

    tions for adjacent input pins are likely to add logic gates to a co dec

    functional block. Increased number of logic gates in tum can in-

    duce overhead power consumption and longer propagational delay.

    Therefore, it should be ensured that benefits from data encoding

    are large enough to compensate for the overhead of a codec archi-

    tecture. Secondly, the codec system should guarantee the unique

    decodability constraints, even in the presence of physical noises.

    Th e signal integrity can be ensured by asserting spatial redundancy

    (extra control lines) or temporal redundancy (extra clock cycles) or

    by selecting appropriate supply voltage, the size of transmitter and

    receiver, and the clock frequency.

    3.2 The Coupling-Driven Bus-Invert Scheme

    E n d e r

    E)

    Coupling mu nl y

    Figure

    5: An 8-bit bus encoder for the coupling-driven bus-invert scheme

    The bus-invert method [14] is limited to reduce transition activ-

    ities while assuming that coupling power contribution can be ig-

    nored. However, coupling power becomes

    a

    dominant component

    of dynamic power as wires become thinner and taller. To reflect the

    technology trend appropriately, we propose a

    coupling-driven bus-

    invert

    method to tackle the coupling power reduction problem. T he

    bus-invert method flips the data signal when the number of sw itch-

    ing bits is more than half of the number of signal bits. In the sam e

    context, we invert the input vector, when the coupling effect of the

    inverted signals is less than that of the original signals. Th e prob-

    lems are then how to accurately account for the coupling effect, and

    to effectively implement the schem e with low hardware overhead.

    Before addressing these issues, we state our assumptions. First,

    synchronous latches are located at the transmitter side, thus all the

    transitions shall take place at the same time on the bus. The si-

    multaneo us transitions exclude type transitions by setting k3 =

    0.

    It means that the results we achieve are on the lower end of

    power saving. Second, statistics on the information source are not

    given in advance. H ence this scheme is suitable for data bus encod-

    ing, where it i s difficult to extract accurate proba bilistic information

    off-line.

    Enumeration method is employed to represent the coupling ef-

    fect. If a bus line Bi is located between two other lines, a signal

    transition on

    Bi

    can trigger charge shifts on both coupling capac-

    itances connected to Bi-1 and Bi+l, respectively. In other words,

    at most two couplings can be initiated by a signal transition. T hus,

    2(N

    bits are sufficient to represent the whole set of couplings

    in an N-bit bus per bus cycle. T he encoder architecture is shown in

    Figure 5.

    According to the types of correlated transition between neigh-

    boring buses, the coupling encoder generates a codeword as fol-

    lows:

    00

    for a type

    l l

    or IV transition, 01 for a type

    I

    transition, and

    1 1 for a type I transition. Th e reason that we assign 1 1 to a type

    transition is that switchings in different directions require to change

    the polarity of the charge stored in the coupling capacitance, hence

    consuming about twice the amount of charge required for a type I

    transition. T he codeword 1 1 , instead of 10 helps to make a deci-

    sion on data inversion using a majority voter, because the majority

    voter outputs high when at least eight input lines are high out of fif-

    teen inputs. The majority voter can be implemented by using either

    full-adder circuitry or resistors and a voltage comparator [14]. The

    control signal inv can be transmitted to the receiver using extra

    bus lines or extra transfer cycles. On e problem of additional bus

    lines for control is the area overhead that may not be allowed due

    to physical constraints. In some cases, widening the space between

    signal bus lines can reduce the coupling effects more effectively

    than introducing extra control lines, because the coupling capaci-

    tance is inversely proportional to net space. Temporal redundancy

    is an altemative using extra clock cycles to transfer control signals.

    We assume that the input stream i s transmitted in burst m ode that

    enables us to accommodate temporal redundancy

    [

    151.

    The following theorem gives the expected number of couplings

    per bus cycle with independent data source where each bit switches

    independently with transitional probability

    p.

    Theorem 3.1 Let { B i } be an N-tuple of 0-1 valued random vari-

    ables such that Bi

    = <

    bo, bl,

    ...

    , bN-] >, in which a bit bk

    0

    5

    k 5 N switches independently of other bits with transitional

    probability

    k = p

    0

    5 k 5 N .

    Suppose

    { B i }

    are encoded

    with the coupling-driven bus-invert algorithm with one-cycle re-

    dundancy. Then, the coupled transition activity

    Z

    per bus cycle

    is given by

    N - l N . (2 N - 3) N - 2

    z =

    2 2 N . ( N - 2 ) ‘ ( )

    if

    transitions

    on {Bi}

    ollow the binomial distributions.

    Example 3.1: The average number of couplings per bus cycle for

    an 8-bit bus is evaluated to be 2.484. The average number of cou-

    plings per bus cycle for randomly distributed, independent data is

    3.5

    since

    p

    =

    0.5

    and

    N

    =

    8.

    Hence, the percentage reduction in

    couplings over raw data transmission is 29 which is calculated as

    3.5;;484

    *

    100 when w e use the coupling-driven bus-invert method.

    The coupling-driven bus-invert scheme can em ploy multiple cy-

    cles to notify the receiver whether the transmitted data are inverted

    or not. Multiple extra cycles for cotrol signals imply that the granu-

    larity of unit codeword to be processed becom es finer, hence less

    couplings are likely to occur on the bus. However, this benefit

    comes at the cost of extra clock cycles.

    4 Experimental Results

    The methods proposed were implemented and power consumptions

    were measured using HSPICE. Applied data streams consist of

    MPEG video files

    (Vl,

    2

    and

    V3

    in Table

    l),

    MP3 audio files

    (Al,

    2 and

    A3)

    nd

    PDF

    format files

    (Pl, 2

    and

    P3).

    Table 1 presents percentage reduction in transition activities us-

    ing the bus-invert (BI) method and proposed coupling-driven bus-

    invert (CBI) method. To com pare fairly, we use one redundant cy-

    cle to transfer control signals per eight data packets each of which

    consists of 8 bits. The reason that we compare the CBI scheme

    with the BI method is that the BI method does not require prob-

    abilistic information in advance. The results under the column

    SP

    present percentage reduction in self transitions, calculated as:

    320

  • 8/17/2019 [Doi 10.1109%2Ficcad.2000.896492] Ki-Wook Kim, _ Kwang-Hyun-Baek, _ Shanbhag, N._ Liu, C.L._ Sung- -- [IEEE …

    4/4

    c,

    1.0

    3.0

    5.0

    Red

    BI

    c, p(l) P(E) p(T)

    0.25

    111

    66 20

    86 22.5

    0.75 319  191 19 210 34.2

    1.25

    526

    315 21

    336 36.1

    I

    Avg. 24.3

    I

    24.2 I 24.3 ]I 31.8 17.5 I

    30.0

    BI [I41

    -1*100, where SP(X) enotes the number of self tran-

    sitions for raw input streams and SP(X') enotes that for encoded

    data. The results under the columns

    XP

    and TP resent percentage

    reduction in coupled switchings and total switchings, respectively.

    The total switchings account for both self transitions and coupled

    switchings with a capacitance ratio q. The C BI scheme yields bet-

    ter results compared to the BI method. This is because coupling

    reductions due to the CBI encoding are larger than that due to the

    BI encoding. It implies that the CBI schem e saves much more ef-

    fective switchings by combatting coupled switchings while even

    sacrificing self transitions in a degree.

    The CBI results show 31.8 average coupling reduction which

    is in accordance with our theoretical bound 29 in Example 3.1.

    The deviation comes from locality and correlations in input data

    streams, which are assumed to be none in theoretical analysis.

    Table 2 presents the power consumed by both the codec circuitry

    and interconnects fo r the CBI scheme. Us ing 0 . 1 8 ~echnol-

    ogy, power is measured for random data streams using HSPICE

    with 1.6V power supply and the capacitance ratio q

    = 4

    that is

    a realistic value from [13]. Th e results under column P(1) cor-

    respond to power in pW consumed by interconnects and the re-

    sults under column

    P ( E )

    n pW correspond to the power overhead

    due to encodeddecoder circuits. Without encoding circuits, all the

    power consumption contributes to interconnect power P(1) . Mean-

    while, with suggested codec circuits, total power consumption un-

    der column

    P ( T )

    are the sum of interconnect power P(Z) and en-

    codeddecoder power P ( E ) . It should be pointed out that within a

    realistic range of coupling capacitance, the power overhead P ( E )

    due to the codec are relatively small enough to be compensated by

    significant reduction in interconnect power P(Z). The percentage

    of power savings are shown in the column % Red by using the

    coupling-driven bus-invert encoding scheme.

    CBI

    5

    Conclus ions

    PI 24.2 23.8 24.2

    P2 23.3 21.7 23.1

    ~ ~

    Tightly coupled on-chip buses in a system-on-a-chip impose new

    requirements for interconnect power reduc tion and signal integrity.

    A novel bus encoding scheme was proposed for reducing power

    consumed by on-chip buses by decreasing coupled switchings.

    The coupling-driven bus-invert scheme reduces power consump-

    tion about 30 with one-cycle redundancy. S imulation results by

    using HSPICE indicate that the portion of power consumed by

    the encodeddeco der logic block for those schemes is fairly small

    in state-of-the-art technology. Therefore, the overhead due to en-

    codeddecoder circuitry is compensated for by significant intercon-

    nect power savings.

    32.4 I 16.9 30.5

    30.6 I 16.0 28.8

    O

    32

    References

    [I] L. Benini, A. Macii,E. Macii, M. Poncino, and R. Scarsi. Synthesis of

    low-overhead interfaces for power-efficient communication over wide

    buses. In

    Proc. ACM/IEEE Design Automation Con ,

    pages 128-133,

    1999.

    [2] F. Catthoor, F. Franssen,

    S.

    Wuytack, L. N achtergaele, and

    H.

    D. Man.

    Global communication and memory optimizing transformations for

    low power signal processing systems. In VLSI Signal Processing VII,

    pages 178-187.1994.

    [3]

    J.

    Cong. An Interconnect-centric design flow for nanometer technolo-

    gies.

    In Int. Symp. VLSI Technology, Systems, and Applicat ions,

    pages

    54-57, June 1999.

    [4] W. Fomaciari, D. Sciuto, and C. Silvano. Power estimation for ar-

    chitectural exploration of HW/SW communication on system-level

    buses. In Int. Workshopon HardwareLTojhvare Codesig n, pages 152-

    156,1999.

    [5] R. Hegde and N. R. Shanbhag. Energy-efficiency in presence of deep

    submicron noise. In

    Proc. IEEWACM Int. Con Compute r Aided De-

    sign, pages 228-234, 1998.

    [6]

    S. M. Kang and Y. Leblebici. CMCS Digital Integrated Circuits:

    Analysis and Design.

    McGraw-Hill, 2nd edition, 1998.

    [7] L. Benini, G. De Micheli,E. Macii, D. Sciuto and C. S ilvano. Asymp-

    totic zero-transition activity encoding for address busses in low-power

    microprocessor-based systems. In Proc. the Great akes Symp. VLSI,

    pages 77-82, 1997.

    [8]

    S. Landman and R. L. Russo. On a pin versus block relationship for

    partitions of logic paths.

    IEEE Trans. Computers,

    C-20: 1469-1479,

    1971.

    191 H.Mehta, R. M. Owens, and M. J. Irwin. Some issues in Gray code

    addressing. In Proc. the Great akes Symp. VLSI, pages 178-180,

    Mar. 1996.

    [IO]

    E.

    Musoll, T. Lang, and

    J.

    Cortadella. Working-zone encoding for

    reducing the energy in microprocessor address buses.

    IEEE Trans.

    on

    VLSI Systems,

    6(4):568-572, Dec. 1998.

    [I

    I]

    P.

    R. Panda and

    N.

    D. Dutt. Reducing address bus transitions for low

    power memory mapping. In

    Proc. European Design and Test

    C o nf ,

    pages 63-37, Mar. 1996.

    [I21 S. Ramprasad,

    N.

    . Shanbhag, and

    I.

    N. Hajj. Information-theoretic

    bounds on average signal transition activity. IEEE Trans. on VLSI,

    Sept. 1999.

    [131 Semiconductor Industry Association . Intema-

    tional technology roadmap for semiconductors.

    htp://notes.sematech. org/I 999-SIA Roadma pMo me. him,

    1999.

    [I41 M. R. Stan and W.

    P.

    Burleson. Bus-invert coding for low-power YO.

    IEEE Trans. on V U I Systems,

    pages 49-58, Mar. 1995.

    [IS]

    M. R. Stan and W. P. Burleson. Two-dimensional codes for low-power.

    In

    International Symposium

    on

    Low-Pow er Electronics and Design,

    pages 335-340, Aug. 1996.

    [I61 C. L.

    Su,

    C. Y Tsui, and A. M. Despain. Saving power in the control

    path

    of

    embedded processors. IEEE Design and Test

    of

    Computers,

    11(4):24-30, 1994.

    [I71 Y Zhang, W. Ye, and M. J. Irwin. An alternative architecture for on-

    chip global interconnect: segmented bus power modeling. In

    Asilomar

    Con

    on Signals,

    ystems, and Computers,

    pages 1062-1065, 1998.

    ,/