A Low Power Design of Gray and T0 Codecs for the Address Bus

A Low Power Design of Gray and T0 Codecs for the Address Bus Encoding for System Level Power Optimization

Prabhat K. Saraswat, Ghazal Haghani and Appiah Kubi Bernard

Advanced Learning and Research Institute, ALaRI, University of Lugano, Switzerland

ABSTRACT

This report describes our design of Gray and T0 codecs for encoding the bits sent on the processor-memory address bus. Since switching is one of the most important contributors to the power consumption of VLSI circuits, it is imperative to encode the bits in such a way that the switching activity on the bus is reduced. However, encoding does not always reduce power: the power consumed by the codec hardware must be traded off against the power saved by reducing switching transitions. Different codecs may also perform differently for different address sequences. We have generated address sequences of specified sequentiality and evaluated the performance of both codecs on them.

The codecs are designed in VHDL and synthesized using Synopsys tools. The VHDL models are then simulated in order to measure the dynamic power they consume when the bits are encoded and decoded. The total power, including the power consumed by the bus, is calculated, and various comparisons are made with the uncoded binary scheme. An optimum bus capacitance, above which the use of the codecs becomes beneficial, is also calculated. We have also tried another scheme in which the bus lines are interchanged in order to reduce the power consumption due to crosstalk. The results obtained are discussed and explained in the report.

Keywords: Gray Encoding, Zero Transition Encoding, Bus load

1. BUS ENCODING FUNDAMENTALS - GRAY AND T0 CODECS

The bandwidth of data transfers has increased considerably due to the high speed needed between microprocessors and system interfaces. A considerable amount of power is dissipated at the I/O pins of a microprocessor due to the intrinsic capacitance of the bus lines. By minimizing the switching transitions on the system-level bus lines, a significant reduction of average power consumption can be achieved. There are various bus encoding schemes that achieve this purpose, e.g. the Gray code and the T0 code [1].

1.1. Gray Encoding

It has been observed that the addresses generated by a program are often sequential in nature. The simplest way to encode the generated addresses is plain binary, which causes many bit transitions between consecutive addresses and therefore high switching activity. One of the often cited solutions, proposed by Su, Tsui and Despain [2], is to use Gray encoding to minimize the number of transitions. Gray encoding guarantees exactly one bit transition between consecutive addresses.
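The single-transition property can be checked with a few lines of Python. This is a minimal sketch assuming the standard binary-reflected Gray code; the function names are illustrative and are not taken from the VHDL models of this report.

```python
def gray_encode(b: int) -> int:
    """Binary-reflected Gray code: bit i of the result is B[i] xor B[i+1]."""
    return b ^ (b >> 1)


def gray_decode(g: int) -> int:
    """Invert gray_encode by XOR-ing together g, g >> 1, g >> 2, ..."""
    b = 0
    while g:
        b ^= g
        g >>= 1
    return b


def hamming(x: int, y: int) -> int:
    """Number of bit positions in which x and y differ."""
    return bin(x ^ y).count("1")


# Every pair of consecutive 16-bit addresses differs in exactly one Gray bit.
single_transition = all(
    hamming(gray_encode(a), gray_encode(a + 1)) == 1 for a in range(2**16 - 1)
)
```

Note that in this formulation the decoder is a chain of XORs running from the most significant bit downwards, whereas the encoder XORs only neighbouring bits.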

1.2. T0 Encoding

The sequentiality of the addresses is communicated to the memory subsystem by adding a redundant line to the bus, which avoids transferring consecutive addresses explicitly. The redundant line is asserted when two successive addresses on the bus are consecutive; the address lines are then frozen, which prevents unnecessary switching, and the receiver computes the new address itself. The T0 code therefore guarantees zero transitions as its asymptotic performance for in-sequence addresses [1].
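The behaviour described above can be sketched in Python as follows. This is an illustrative model only, under stated assumptions: the redundant line (called INC here) is taken to be 1 for an in-sequence address, the stride is 1, and the class and method names are hypothetical rather than those of the VHDL implementation.

```python
class T0Encoder:
    """Freezes the address lines and raises INC while addresses are in sequence."""

    def __init__(self, stride=1, width=16):
        self.stride = stride
        self.mask = (1 << width) - 1
        self.prev_addr = None  # last address presented to the encoder
        self.bus = 0           # value currently driven on the address lines

    def encode(self, addr):
        in_seq = (
            self.prev_addr is not None
            and addr == ((self.prev_addr + self.stride) & self.mask)
        )
        if not in_seq:
            self.bus = addr    # out of sequence: drive the new address
        self.prev_addr = addr
        return self.bus, int(in_seq)


class T0Decoder:
    """Rebuilds the address from the frozen bus value and the INC line."""

    def __init__(self, stride=1, width=16):
        self.stride = stride
        self.mask = (1 << width) - 1
        self.prev_addr = 0

    def decode(self, bus, inc):
        addr = ((self.prev_addr + self.stride) & self.mask) if inc else bus
        self.prev_addr = addr
        return addr


stream = [100, 101, 102, 103, 7, 8, 500]
enc, dec = T0Encoder(), T0Decoder()
coded = [enc.encode(a) for a in stream]
decoded = [dec.decode(bus, inc) for bus, inc in coded]
```

During the run 100, 101, 102, 103 the address lines stay at 100 and only INC is raised, so no further transitions occur until the sequence is broken.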

2. PROBLEM STATEMENT AND MOTIVATION

A 16-bit address bus is assumed. Two bus encoding schemes, Gray and T0, have to be implemented in VHDL. The program's accesses to memory have to be modeled by generating address streams of varying sequentiality. The codecs are evaluated on the basis of switching activity and power consumption. Synthesis and evaluation of the codec power consumption are done using Synopsys Power Compiler. The minimum bus load that makes bus encoding convenient has to be calculated. The main motivation for the project is to appreciate and understand the effectiveness of various encoding schemes for address streams of different sequentiality.

Further author information: (Send correspondence to Prabhat Kumar Saraswat)
Prabhat Kumar Saraswat: E-mail: [email protected], Telephone: 0041 786295106


3. DESIGN AND IMPLEMENTATION OF VHDL MODELS

The first and foremost step is to design the VHDL models efficiently, so as to reduce the overhead due to the codec hardware itself. Both the Gray codec and the T0 codec were implemented.

The Gray encoding algorithm is implemented by comparing each bit with the next bit of the generated bit stream. For example, if the generated bit stream is represented as B, the Gray code is the concatenation of B[i] xor B[i+1] for i = 0...n-1, with B[n] taken as 0. The advantage of applying the Gray algorithm before sending the data to the bus is that we have fewer switching transitions and consequently less power usage. As encoding and decoding are both built from the same XOR operation, similar hardware can be used to encode and decode the data. The hardware configuration of the Gray encoder and decoder implemented in VHDL is shown in Figure 1.

The zero transition (T0) codec algorithm is very efficient for purely sequential addressing. In this case we need one extra line in the connection path, used as a flag. When the system accesses sequential addresses in memory, we freeze the first address on the bus and, by setting the flag, inform the receiver that it should calculate the following addresses from the base address. The hardware configuration of the T0 codec implemented in VHDL is shown below:

Figure 1. Implemented Hardware for GRAY and T0 CODEC respectively

The corresponding VHDL code can be found in the appendix attached to this report.

4. ADDRESS SEQUENCE GENERATION WITH A SPECIFIED SEQUENTIALITY

We have attempted to generate numbers with a specified sequentiality value. The objective is to model the pattern of memory accesses made by a processor when software runs on it. The software may contain loops in which sequential memory locations are accessed. However, there are also cases (branches etc.) in which the memory accesses are not sequential. The problem to be addressed is how to model these non-sequential cases, and how to generate the resulting address streams when a specific value, say a sequentiality percentage, is given.

One primitive attempt to model this problem is described here. This method is certainly not the best way to generate such streams, but it is hoped that it raises questions and issues that allow further refinement and understanding of the problem.

Our approach starts from a basic argument that defines non-sequentiality. Non-sequentiality, we presume, is observed when a totally chaotic (random) distribution of numbers is generated, i.e. when there is no correlation between the numbers in the stream. Thus we need a random number generator function to generate such a stream.

Let the minimum address accessed be 0 and the maximum address that can be reached be NUM. We have coded a function which generates a random number according to a uniform distribution:

X is uniformly distributed over (0, NUM)

We now define two parameters a and b which determine the level of sequentiality of the generated numbers. The numbers a and b are related as:

\[ b = 1 - a \]

Both a and b range between 0 and 1, and the value of a defines the sequentiality percentage. The final number is calculated by a simple function. Let I(n) represent a sequential stream from 0 to NUM and X(n) a uniformly random stream from 0 to NUM, as described in the previous paragraph. The number to be generated is defined as:

\[ \mathrm{genNum} = a \times I(n) + b \times X(n) \]

The function was implemented in a C++ program, and the graphs were generated by varying the value of a from 1 to 0 in steps of 0.1, so that a purely sequential stream was gradually made non-sequential. A 50% sequential stream thus occurs at a = 0.5. Graphs of sample generated streams for various sequentiality values can be seen in the attached appendix.
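The generator can be sketched in a few lines of Python. This is not the C++ program used for the report; the function name, the truncation to an integer address, and the explicit seed argument are assumptions made for illustration.

```python
import random


def generate_addresses(a, num, seed=0):
    """Return genNum(n) = a * I(n) + b * X(n) for n = 0..num, with b = 1 - a.

    I(n) = n is the sequential stream and X(n) is drawn uniformly from
    0..num. The result is truncated to an integer address (an assumption).
    """
    b = 1.0 - a
    rng = random.Random(seed)  # fixed seed so a stream can be regenerated
    return [int(a * n + b * rng.randint(0, num)) for n in range(num + 1)]


purely_sequential = generate_addresses(1.0, 10)
half_sequential = generate_addresses(0.5, 1000)
```

Fixing the seed is a design choice of this sketch: it lets the same stream be regenerated for several consumers instead of having to produce all derived files in a single run.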

5. DESCRIPTION OF THE SIMULATION ENVIRONMENT AND METHODOLOGY

The simulation methodology can be broken down into the following steps, and is further illustrated by a figure showing the important components of the simulation and report generation. The steps are as follows:

1. VHDL files are written for the Gray encoder and decoder and for the T0 encoder and decoder.

2. Address streams are generated for the various sequentiality values, from a = 1 (purely sequential) to a = 0 (purely random) in steps of 0.1.

3. The address streams are assigned to the DIN signal of the encoder inputs of both the T0 and Gray codecs. This is achieved by automatically generating the corresponding do files. The outputs of the encoder modules form the inputs of the decoders. The outputs of the decoders are compared against the original stream to verify behavioral correctness.

4. Testbenches for the encoder and decoder of both the Gray and T0 codecs are also generated from the address stream. The testbenches are generated together with the do files in order to keep them coherent. This is necessary because the random numbers are calculated from a seed value which depends on the system time and various other parameters.

5. The VHDL models are synthesized using Design Compiler. The testbenches are executed outside the DC shell to produce a Switching Activity Interchange Format (SAIF) file. The switching activity is thus obtained by simulating the RTL model of the codec with the testbench, and is then applied to the synthesized design.

6. The reports on power consumption and switching activity are generated. The process is repeated for all values of a from 1 to 0. It is also repeated for address streams corresponding to both bit addressable and byte addressable memory.

The following figure shows the simulation environment and methodology:

Figure 2. Simulation Environment Description


6. RESULTS

This section explains the comparisons and results obtained from the simulations. The assumptions are explained where they are used.

6.1. Switching Activity Reductions

The Hamming distances between consecutive code words were calculated and summed in order to estimate the number of transitions for each codec. We have made three comparisons: between the binary coded bus and the Gray coded bus, between binary and T0, and between T0 and Gray. The graphs are presented below:

Figure 3. Total transitions for various address sequentialities comparing between (a)BIN-GRAY, (b)BIN-T0 and (c)T0-GRAY

The graphs clearly indicate the superiority of the coding schemes over the binary coded bus. The values on the x axis indicate the value of b defined in Section 4. Small values of b correspond to address streams of high sequentiality. We can see that the Gray transitions are nearly half the binary transitions for the first value, b = 0, which corresponds to a purely sequential address stream. It should also be noted that as the sequentiality decreases, the reduction in the number of transitions due to the Gray code also decreases. Towards the upper end of the x axis (high values of b) the reductions due to Gray encoding are small and often inconsistent. This may be attributed to the highly random nature of the address streams.

In the case of the T0 codec, the number of transitions for the totally sequential case is 1, which makes it the best choice for streams having a very high degree of sequentiality. It is interesting to note that when the sequentiality is 70% or lower, the T0 codec actually performs worse than the Gray code in terms of the number of transitions observed. This can be clearly observed from the third graph (c), which shows the comparison between the T0 and Gray codecs.

We have also computed the total number of transitions of each bit on the bus for the various encoding schemes, with respect to the sequentiality values. The graphs corresponding to the Gray and T0 coded buses are shown below. It should be noted that the 17th bit of the T0 coded bus represents the INC line, which does not change much, as it changes only when there is a change in sequentiality. Often the lines of a bus do not all have the same capacitance [2]. The bus lines towards the inner regions have higher capacitance, so it would be beneficial to reduce the transitions on those lines. In the relevant literature there is also a concern about power consumption due to crosstalk between lines, so it is advised that lines which do not have a high number of transitions be placed between lines with a lot of transitions. We have also tried to do something like this by pre-profiling various streams to observe which bits change the most. It should be noted that the higher-order bits do not change because the generated address streams did not span the full 16 bits.

Figure 4. Transition profiles of various bits in Gray and T0 encoded bus
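The transition counts used in this section can be reproduced with a short Python sketch. It sums Hamming distances between consecutive bus words and also builds a per-line transition profile; the function names and the small 0..7 example stream are illustrative assumptions, not the scripts used for the report.

```python
def total_transitions(words):
    """Sum of Hamming distances between consecutive bus words."""
    return sum(bin(x ^ y).count("1") for x, y in zip(words, words[1:]))


def per_line_transitions(words, width=16):
    """Number of toggles on each bus line; index 0 is the least significant bit."""
    counts = [0] * width
    for x, y in zip(words, words[1:]):
        diff = x ^ y
        for i in range(width):
            counts[i] += (diff >> i) & 1
    return counts


# Purely sequential stream 0..7 in plain binary and in Gray code.
binary_stream = list(range(8))
gray_stream = [n ^ (n >> 1) for n in binary_stream]

binary_total = total_transitions(binary_stream)
gray_total = total_transitions(gray_stream)
binary_profile = per_line_transitions(binary_stream, width=4)
```

For the T0 coded bus the INC line would be treated as a 17th bit of each word before profiling.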


6.2. Power Consumption results from Synopsys Power Compiler

The power consumption due to the codec hardware was evaluated using Power Compiler. The switching activity was generated using the pregenerated testbenches corresponding to the various address streams. We have done a simulation based estimation of the switching activity (SA), where a Switching Activity Interchange Format (SAIF) file is generated by simulating the RTL code with the testbench outside the DC shell, and the SA is then applied to the design. The switching power is measured for both the encoder and the decoder of the T0 and Gray codes. We have calculated the power for two variants of memory, bit addressable and byte addressable. This is done by adapting the address streams generated for each type of memory. The results below are for the encoder and decoder of Gray and T0 respectively. The quantitative total power calculation is shown in the next section.

Figure 5. Switching Power Consumption for enc and dec of Gray and T0 respectively.

It is clearly seen from the graphs that for highly sequential address streams the switching power consumption is lower. This agrees with the earlier observation that there are fewer transitions for sequential streams. When the streams are not sequential, the power consumption patterns are quite unpredictable, as can be observed from the graphs towards the higher values of b on the x axis. The power consumption of the T0 codec is higher than that of the Gray codec, largely owing to the increased complexity of the T0 circuit.

6.3. Power Consumption Calculations due to Codecs

We now present the equations derived to calculate the total power consumption of the codecs. Let $P_{Gtot}$ be the total power consumption for the Gray codec. The total power consumption is the sum of two factors: the power consumed by the codec and the power consumed by the bus lines. We refer to these values as $P_{Gcodec}$ and $P_{Gbus}$ from now on. $P_{Gcodec}$ is the sum of the contributions of the encoder and the decoder, thus:

\[ P_{Gcodec} = P_{Genc} + P_{Gdec} \]

\[ P_{Gcodec} = P_{Genc,dynamic} + P_{Genc,static} + P_{Gdec,dynamic} + P_{Gdec,static} \]

The values of these parameters are taken directly from the Power Compiler reports.

It is assumed that the number of bus lines is the same as the number of address bits. In general, the bus lines have different capacitances [2]. The power consumed can be modeled by a simple equation corresponding to the energy consumed in charging a capacitor, weighted by the switching activity factor of each line. The bus power consumption for all 16 lines is given by the following equation:

\[ P_{Gbus} = \frac{1}{2} V^2 F \sum_{i=0}^{15} C_i S_i \]

where
V = voltage of the binary levels
F = frequency of the circuit (here 100 MHz)
C_i = capacitance of bus line i
S_i = switching activity of bus line i

However, for ease of calculation and tractability, it is assumed that $\forall i,\ C_i = C_{eq}$, i.e. all bus lines have the same constant capacitance $C_{eq}$. The bus equation then becomes:

\[ P_{Gbus} = \frac{1}{2} V^2 F C_{eq} \sum_{i=0}^{15} S_i \]

thus

\[ P_{Gtot} = P_{Gcodec} + P_{Gbus} = P_{Genc,dynamic} + P_{Genc,static} + P_{Gdec,dynamic} + P_{Gdec,static} + \frac{1}{2} V^2 F C_{eq} \sum_{i=0}^{15} S_i \]


Similarly, for the T0 codec we get the same equation, except that the number of bus lines is 17. Thus the equation is:

\[ P_{T0tot} = P_{T0codec} + P_{T0bus} = P_{T0enc,dynamic} + P_{T0enc,static} + P_{T0dec,dynamic} + P_{T0dec,static} + \frac{1}{2} V^2 F C_{eq} \sum_{i=0}^{16} S_i \]
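The total-power equations of this section translate directly into code. The Python sketch below uses the equal-capacitance form; the function names and all numeric values except the 100 MHz frequency are hypothetical placeholders chosen only to show the arithmetic, not results from this report.

```python
def bus_power(switching, c_eq, v, f):
    """Bus power 0.5 * V^2 * F * C_eq * sum(S_i), in watts."""
    return 0.5 * v**2 * f * c_eq * sum(switching)


def total_power(p_codec, switching, c_eq, v, f):
    """Codec power (encoder + decoder, dynamic + static) plus bus power."""
    return p_codec + bus_power(switching, c_eq, v, f)


F = 100e6        # circuit frequency in Hz, as stated in the text
V = 1.0          # hypothetical supply voltage in volts
C_EQ = 10e-12    # hypothetical equivalent line capacitance in farads
P_CODEC = 1e-3   # hypothetical codec power in watts

# Hypothetical switching activity of 0.5 on each of the 16 Gray-coded lines.
p_gray_bus = bus_power([0.5] * 16, C_EQ, V, F)
p_gray_tot = total_power(P_CODEC, [0.5] * 16, C_EQ, V, F)
```

For the T0 codec the same functions apply with a 17-element switching list that includes the INC line.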

7. OPTIMUM BUS LOAD CALCULATIONS

The bus load is the capacitance of the bus lines. The higher the load, the higher the power consumed per transition. This means that with a high load, the power reduction obtained by encoding the information (and thus reducing the SA) is large. The savings with respect to the original design are then high, and there may be a net benefit even after adding an encoder and a decoder, which themselves consume power. On the other hand, if the bus has a very limited load, the power reduction obtained by minimizing the SA through encoding is small, and adding the power of the codec may result in an increase of the global power w.r.t. the original design.∗

Thus, for the encoding to be convenient, and keeping in mind the above considerations, the total power consumed with the codec (codec hardware plus encoded bus) should be less than the total power consumed without the codec hardware (with a simple binary scheme). Following the same derivation as above, the power consumed by the unencoded binary stream is given by:

\[ P_{uncoded} = \frac{1}{2} V^2 F C_{eq} \sum_{i=0}^{15} S_i^{un} \]

where $S_i^{un}$ is the switching activity of line i due to binary transitions, obtained from the simulations.

For the encoding to be convenient and profitable, the following inequalities should be solved.

For Gray Codec

\[ P_{Gtot} < P_{uncoded} \]

\[ P_{Gcodec} + \frac{1}{2} V^2 F C_{eq} \sum_{i=0}^{15} S_i < \frac{1}{2} V^2 F C_{eq} \sum_{i=0}^{15} S_i^{un} \]

For T0 Codec

\[ P_{T0tot} < P_{uncoded} \]

\[ P_{T0codec} + \frac{1}{2} V^2 F C_{eq} \sum_{i=0}^{16} S_i < \frac{1}{2} V^2 F C_{eq} \sum_{i=0}^{15} S_i^{un} \]

The only unknown in both inequalities is $C_{eq}$, which was calculated for both the Gray and T0 codecs; the resulting value is the minimum bus load above which encoding is beneficial. The calculation was done only for the bit addressable memory. The calculated values are shown below:

Codec    Opt Busload
Gray     42 pF
T0       57 pF
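The inequalities of this section can be rearranged to give the bus load explicitly. The derivation below is a sketch that assumes, as the equations above do, that the codec power does not itself depend on the bus capacitance, and that the encoded bus switches less than the uncoded one. Here $P_{codec}$ stands for either $P_{Gcodec}$ or $P_{T0codec}$, and the sum over $S_i$ runs over the encoded lines (16 for Gray, 17 for T0).

```latex
\[
P_{codec} + \frac{1}{2} V^2 F C_{eq} \sum_{i} S_i
  < \frac{1}{2} V^2 F C_{eq} \sum_{i=0}^{15} S_i^{un}
\quad\Longrightarrow\quad
C_{eq} > \frac{2\, P_{codec}}{V^2 F \left( \sum_{i=0}^{15} S_i^{un} - \sum_{i} S_i \right)}
\]
```

If the difference in the denominator is zero or negative, the encoding saves no transitions and no bus load makes it worthwhile.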

8. CONCLUSIONS AND FUTURE WORK

This report provided an evaluation of two bus encoding schemes, Gray and Zero Transition, for system level power optimization. The overheads due to the power consumed by the codec hardware itself were examined. Finally, an optimum bus load was calculated that makes encoding convenient.

The effect of interchanging bus lines on the crosstalk power consumption was also considered. One direction for future work is to model it in hardware and evaluate the effect quantitatively. It was also seen that address sequentiality plays a large role in the power consumption. A logic block could therefore be placed before the encoder to arrange the generated addresses so as to favor a particular encoding scheme.

ACKNOWLEDGMENTS

We would like to thank Prof. Enrico Macii and Prof. Alberto Macii for being so helpful both in and outside of class. The email conversations helped us to understand the task at hand well and allowed us to explore new avenues. We would also like to thank each other for complementing each other so well during the project.

REFERENCES

1. L. Benini, G. De Micheli, E. Macii, D. Sciuto, and C. Silvano, "Address bus encoding techniques for system-level power optimization," 1997.

2. C. L. Su, C. Y. Tsui, and A. M. Despain, "Saving power in the control path of embedded processors," IEEE Design and Test of Computers, Vol. 11, No. 4, pp. 24-30, Winter 1994.

∗ With reference to Prof. Macii's e-mail
