DIGIT SERIAL PROCESSING PIPELINING

  • 7/31/2019 DIGIT SERIAL PROCESSING PIPELINING

    SEMINAR REPORT: PIPELINING OF DIGIT-SERIAL PROCESSING ELEMENTS IN RECURSIVE DIGITAL FILTERS

    Dept. of ECE, Sree Buddha College of Engineering

    CHAPTER-2

    DIGITAL FILTERS

    There are several reasons why digital filters have become more common in electronic

    systems over the years. Like many digital systems today, digital filters are often implemented

    in a computer using a high-level programming language. This results in a short development

    time and makes them flexible and highly adaptable, since changing the filter characteristics

    simply implies changing some variables in the code. Analog filters on the other hand are

    implemented using analog components, such as inductors and capacitors, which must be

    carefully tuned. This makes analog filters harder to develop and modify. Another advantage

    with digital design is that the characteristics of digital components do not change over time.

    Digital systems are also unaffected by temperature variations. Advances in CMOS processes
    have resulted in higher packing density and lower threshold voltages, leading to a

    considerable decrease in power consumption, which further explains the increased interest in

    digital filters.

    Today, frequency-selective digital filters are important and common components in modern

    communication systems. Like their analog counterparts, digital filters are used to suppress

    unwanted frequency components. A linear, time-invariant, and causal filter can be described
    by the difference equation

    (1)

    Applying the z-transform, we get

    (2)

    The function described by Eq. (1) is recursive, since its computation requires the values of
    previous output samples. Since the impulse response of the filter described by Eq. (2) is
    infinite, these filters are known as infinite impulse response (IIR) filters. In the case where
    a_k = 0 for k = 1 to N, the function described by Eq. (2) is a finite impulse response (FIR) filter. FIR

    filter structures are non-recursive.

    The recursive nature of the IIR filter can cause these filters to become unstable. It is therefore

    necessary to perform stability analysis when designing IIR digital filters, especially under
    finite word length conditions. This is not the case for FIR filters, which cannot become unstable. FIR

    filters can also be designed with exact linear phase. The main drawback of FIR filters is that

    they require higher filter orders than IIR filters to achieve a certain filter specification. The

    higher filter order makes FIR filters larger to implement in hardware than the corresponding

    IIR filters.

    Recursive algorithms require extra attention due to potential problems with stability, finite
    word length effects, etc. A class of realizations of IIR filters that fulfills these requirements is
    Wave Digital Filters (WDFs), which are derived from low-sensitivity analog structures and
    inherit their low sensitivity in the transformation to digital structures. A special type
    of WDF is the Lattice Wave Digital Filter (LWDF), which consists of two allpass filters in
    parallel. This type of realization has several advantages: low passband sensitivity, large
    dynamic range, robustness, high modularity, and suitability for high-speed and low-power
    operation. The stopband sensitivity is high; however, this is not a problem since the
    coefficients are fixed.

    The allpass filters can be realized in many ways. A structure of interest is a
    cascade of first- and second-order Richards allpass sections connected with circulator
    structures. The first- and second-order Richards allpass sections, using the symmetric two-port
    adaptor with the corresponding signal-flow graph, are shown in Fig. 2.1, and the whole filter is
    shown in Fig. 2.2.

    Fig. 2.1 First- and second-order Richards allpass sections.

    2.1 LATTICE WAVE DIGITAL FILTERS

    Lattice wave digital filters have a regular structure. An example of a lattice wave digital filter

    is shown in Fig.2.3. The filter consists of two parallel all pass branches that are added or

    subtracted at the output of the filter. The all pass branches are often realized by cascading

    first- and second-order sections. This structure has good properties regarding dynamic range,
    stability, and coefficient sensitivity in the passband, but has very high sensitivity in the stop

    band. In practice, the coefficients can be truncated to a small number of bits, which is an

    important property, since it results in less complex processing elements.

    Fig 2.3 Lattice Wave Digital Filter

    The structure is regular, with a small number of adaptor operations. The regular structure

    makes it easy to implement large filters by reuse of the processing elements. Further, the

    recursive loops are contained within each first- and second-order section of the filter, making

    pipelining and interleaving of operations straightforward.

    Bireciprocal lattice wave digital filters are a subset of half-band filters with a certain
    symmetry in the attenuation function around ωT = π/2. A distinguishing feature is that half of the filter

    coefficients are zero. Hence, half of the adaptor operations are removed with a reduction of

    the workload by 50%, and moreover, the maximal sample frequency is increased by a factor

    of two since the critical loops now contain two delay elements. This subset of lattice wave

    digital filter is therefore particularly attractive from an implementation point of view. An

    example of a bireciprocal lattice wave digital filter is shown in Fig. 2.4.

    Fig 2.4 Bireciprocal Lattice Wave Digital Filter

    CHAPTER-3

    PROCESSING ELEMENTS

    The digit-serial approach is interesting for performing a trade-off between area, throughput,

    and power consumption. In recursive algorithms the critical loop limits the maximum

    throughput. Binary arithmetic can be classified into three groups based on the number of bits
    processed at a time. In the digit-serial approach, a number of bits equal to the digit-size, D,
    is processed concurrently. If D is unity, the arithmetic reduces to bit-serial arithmetic, while for
    D = W_d, where W_d is the data word length, it reduces to bit-parallel arithmetic. Hence, all
    arithmetic can be regarded as digit-serial, with bit-parallel and bit-serial as two special
    cases.

    Timing of the operations is conveniently defined in terms of clock cycles. The execution time,
    in clock cycles, for a digit-serial Processing Element (PE) is defined as

    $E_w = W_d / D$    (3)

    where W_d is required to be an integer multiple of the digit-size.

    The throughput, or sample rate, of a system is defined as the reciprocal of the time between
    two consecutive sample outputs. Introducing the clock period T_clk, we get

    $f_{sample} = 1 / (E_w T_{clk})$    (4)

    The minimum clock period is determined by the delay in the critical path, where the critical
    path is defined as the path with the longest delay between two registers. The latency of a
    system is defined as the time needed to produce an output value from the corresponding input
    value. For digit-serial arithmetic, it is defined as the time needed to produce an output digit
    from an input digit of the same significance. The actual latency is conveniently divided into
    algorithmic latency, L, and clock period, as in Eq. (5). The algorithmic latency is determined by the
    operation and pipeline level, which will be discussed in the following section.

    $\mathrm{Latency} = L \cdot T_{clk}$    (5)
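
    The timing definitions above can be tried out numerically. The sketch below assumes that
    the execution time of a PE is W_d / D clock cycles and that one PE delivers one sample
    every execution time; the 15-bit word length and the 5 ns clock period are hypothetical
    values chosen only for illustration.

```python
def execution_time_cycles(word_length, digit_size):
    """Execution time in clock cycles, assumed to be W_d / D."""
    if word_length % digit_size != 0:
        raise ValueError("W_d must be an integer multiple of the digit-size D")
    return word_length // digit_size


def sample_rate(word_length, digit_size, t_clk):
    """Sample rate of one PE, assumed to be 1 / (execution time * T_clk)."""
    return 1.0 / (execution_time_cycles(word_length, digit_size) * t_clk)


W_D = 15       # hypothetical data word length in bits
T_CLK = 5e-9   # hypothetical clock period of 5 ns

# D = 1 is bit-serial (15 cycles per word), D = 15 is bit-parallel (1 cycle).
cycles = {d: execution_time_cycles(W_D, d) for d in (1, 3, 5, 15)}
rate_d3 = sample_rate(W_D, 3, T_CLK)
```

    The sketch keeps the clock period fixed for all digit-sizes, which a real design would
    not: a larger digit-size normally lengthens the critical path and hence the clock period.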

    As an example of a recursive algorithm, a bireciprocal third-order Lattice Wave Digital
    Filter (LWDF) is used. The filter, shown in Fig. 3.1, has earlier been implemented using parallel
    carry-save arithmetic and bit-serial arithmetic. The filter coefficient is 0.375, i.e., 0.011 in
    binary. Hence, the number of fractional bits is W_f = 3. In digit-serial arithmetic, a sign-extension circuit is

    required in front of the multiplier to produce the most significant bits. The multiplication
    increases the word length by 3 bits, which are removed by the quantization block in order
    to keep the word length constant in the loop. By allowing the output y to be quantized and
    sign-extended, these operations can be located as shown in Fig. 3.1. The input word length is
    assumed to be 12 bits, which is sign-extended by W_f bits at the input, yielding an internal
    word length of 15 bits, in order to equalize the execution time for all operations. The W_f
    extra bits are also sufficient to prevent overflow in all nodes. Thus, no over/underflow
    saturation circuits are required, which otherwise would have reduced the throughput. The
    extra bits do not decrease the throughput, since it is independent of W_d. The maximum
    throughput for this slightly modified filter algorithm is

    (6)

    where the latency terms are those of the adder, the multiplier, and the combined
    quantization/sign-extension circuit, respectively.

    Fig.3.1 A third order bireciprocal LWDF

    3.1 PIPELINING AND INTERLEAVING

    Pipelining is a transformation that is used for increasing the throughput by increasing the

    parallelism and decreasing the critical path. It can be applied at all abstraction levels during

    the design process. In pipelining at the algorithm level, additional delay elements are

    introduced at the input or output and propagated into the non-recursive parts of the algorithm

    using retiming. Retiming is allowed in shift-invariant algorithms, i.e., a fixed amount of delay

    can be moved from the inputs to the outputs of an operation without changing the behavior

    of the algorithm. Ideally the critical path should be broken into parts of equal length. Hence,

    operations belonging to several sample periods are processed concurrently. The latency

    remains the same if the parts have equal length, otherwise it will increase. Pipelining always

    changes the properties of the algorithm, e.g., algorithm pipelining increases the group delay

    and increases the parallelism.

    Another approach to increase the throughput of sequential algorithms without increasing the

    latency is interleaving. Different samples are distributed onto different processing elements
    working in parallel. This relaxes the throughput requirements on each processing element,
    and the clock frequency can be reduced according to Eq. (7).

    (7)

    3.2 LATENCY MODELS

    Pipelining at the arithmetic or logic level will increase the throughput by decreasing the

    critical path, enabling an increased clock frequency. By inserting registers or D flip-flops, the
    critical path is split, ideally into parts of equal length. This decreases the

    minimum clock period, but at the same time increases the number of clock cycles before the

    result is available, i.e., the algorithmic latency increases. Since the pipeline registers are not

    ideal, i.e., the propagation time is non-zero, pipelining the operations beyond a certain limit

    will not increase the throughput but only increase the latency. The level of pipelining of

    processing elements is referred to as the Latency Model (LM) order. It was introduced for
    modelling the latency of different logic styles implementing bit-serial arithmetic. LM 0 is
    suitable for implementations in static CMOS using standard cells without pipelining. LM 1
    corresponds to an implementation with one pipeline register, or to dynamic logic styles with
    merged logic and latches. Finally, LM 2 corresponds to pipelining internally in the
    adder, which is suitable for standard-cell implementations. The concept was generalized to
    digit-serial arithmetic by introducing pipeline registers after each addition. We use the reciprocal, i.e., an
    extension of the LM concept that includes fractional latency model orders, defined by Eq. (8),

    $LM = 1 / n_{adder}$    (8)

    where n_adder is the number of adders between each pipeline register. By this we keep the
    relationship between logic style and LM order, and in addition we gain a tuning variable for
    the pipeline level.

    An LM order of zero, LM 0, corresponds to a non-pipelined adder, as shown in Fig. 3.2. Its
    algorithmic latency in terms of clock cycles equals zero, while the clock period, T_clk0, is
    determined by the critical path, denoted with dashed arrows in Fig. 3.2. A cascade of LM 0
    adders yields a longer critical path and thus a longer minimum clock period, and the total
    algorithmic latency becomes

    (9)

    LM 1 corresponds to a pipelined adder, according to Fig. 3.2, with an algorithmic latency of
    one clock cycle, while the clock period, T_clk1, is determined by the delay of one adder. A cascade
    of LM 1 adders yields an unchanged T_clk1, and the algorithmic latency of the cascade
    becomes

    (10)

    A fractional LM order is obtained by a cascade of LM 0 adders followed by one LM 1 adder at the
    end, as shown in Fig. 3.2. The algorithmic latency becomes one clock cycle,

    (11)

    and the clock period is determined by the critical path, given by

    (12)

    Fig 3.2 Adders with different LM orders, and adders in cascade with their corresponding critical path.
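
    The effect of the LM order on an adder cascade can be summarized with a small model. The
    sketch below is a simplification under stated assumptions: every adder has the same
    hypothetical delay t_add, a pipeline register adds a hypothetical delay t_reg, and a
    fractional LM order 1/k means one register after every k adders. The function name and
    parameters are illustrative and do not come from the report.

```python
from fractions import Fraction


def cascade_timing(n_adders, lm_order, t_add, t_reg=0.0):
    """Return (latency in clock cycles, minimum clock period) for a cascade.

    lm_order 0: no pipeline registers, all adders lie in one critical path.
    lm_order 1/k: one register after every k adders (k = 1 is LM 1).
    t_add and t_reg are hypothetical adder and register delays.
    """
    lm_order = Fraction(lm_order)
    if lm_order == 0:
        return 0, n_adders * t_add
    if lm_order.numerator != 1:
        raise ValueError("LM order must be 0 or of the form 1/k")
    adders_per_stage = lm_order.denominator
    if n_adders % adders_per_stage != 0:
        raise ValueError("cascade length must be a multiple of the stage size")
    stages = n_adders // adders_per_stage
    return stages, adders_per_stage * t_add + t_reg


lm0 = cascade_timing(3, 0, t_add=1.0)                     # no registers
lm1 = cascade_timing(3, 1, t_add=1.0)                     # register after each adder
lm_third = cascade_timing(3, Fraction(1, 3), t_add=1.0)   # one register at the end
```

    For three adders, LM 0 gives zero cycles of latency with a long clock period, LM 1 gives
    three cycles with a short clock period, and LM 1/3 gives one cycle with the long clock
    period, which is the trade-off the fractional orders are meant to tune.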

    3.2.1 Multiplication Latency

    In fixed-function DSP algorithms, the multiplications are usually performed by applying the data
    sequentially and the constants in parallel using serial/parallel multipliers. The result of the
    multiplication has an increased word length, i.e., the sum of the data word length, W_d, and the
    constant's number of fractional bits, W_f, yielding W_d + W_f. To maintain the word length, a
    truncation or rounding must be performed, discarding the W_f least significant bits. A digit-serial

    multiplier produces D bits each clock cycle. Hence, the result during the first W_f/D clock cycles is
    discarded, and the algorithmic latency of the multiplication with LM 0 becomes

    (13)

    It is not required that the number of fractional bits of the coefficient is an integer multiple of
    the digit-size. However, an alignment stage then has to be used in cascade for alignment of the
    digits. The amount of hardware is reduced, but the algorithmic latency increases to the nearest
    integer clock cycle. The execution time for the multiplier becomes

    (14)

    Introducing a pipeline register at the output yields an LM 1 multiplier, and the algorithmic
    latency increases by one clock cycle. However, the execution time is unchanged.
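
    The latency behaviour described in this subsection can be sketched as follows. The sketch
    rests on one reading of the text, which is an assumption: the W_f discarded bits cost
    W_f / D clock cycles, rounded up to a whole cycle when W_f is not a multiple of D, and an
    LM 1 multiplier adds one more cycle. The function name is hypothetical.

```python
def multiplier_latency(frac_bits, digit_size, lm_order=0):
    """Assumed algorithmic latency, in clock cycles, of a digit-serial multiplier.

    The W_f least significant product bits are discarded at D bits per
    clock cycle; a partial digit still costs a whole cycle (alignment
    stage). An LM 1 multiplier adds one cycle for its output register.
    """
    if lm_order not in (0, 1):
        raise ValueError("this sketch only models LM 0 and LM 1")
    cycles = (frac_bits + digit_size - 1) // digit_size  # integer ceiling
    return cycles + lm_order


# W_f = 3 as for the coefficient 0.375 used in this chapter.
lat_d3_lm0 = multiplier_latency(3, 3, lm_order=0)
lat_d3_lm1 = multiplier_latency(3, 3, lm_order=1)
lat_d2_lm0 = multiplier_latency(3, 2, lm_order=0)   # W_f not a multiple of D
lat_d1_lm0 = multiplier_latency(3, 1, lm_order=0)   # bit-serial case
```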

    (15)

    3.3 MAXIMAL SAMPLE FREQUENCY

    Non-recursive algorithms have no fundamental upper limit on the throughput. In contrast,
    recursive algorithms have, under hardware speed constraints, a maximum throughput, f_max,
    limited by the total latency in the recursive loops:

    $f_{max} = \min_i \{ N_i / T_i \}$    (16)

    where T_i is the total latency and N_i is the number of delay elements in recursive loop i.
    The reciprocal, T_min, is referred to as the minimum iteration period bound. It is convenient to
    divide T_min into L_min and T_clk, as shown in Eq. (17),

    $T_{min} = L_{min} T_{clk}, \quad L_{min} = \max_i \{ \sum_k L_{k,i} / N_i \}$    (17)

    where L_{k,i} is the algorithmic latency of operation k in recursive loop i. L_min is
    determined by the algorithm and the LM order, so it is known before the actual minimum
    iteration period bound. The loop that determines L_min is called the critical loop.
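
    As a rough illustration of the iteration period bound, the sketch below takes, for each
    recursive loop, the sum of the algorithmic latencies of its operations divided by its
    number of delay elements, and picks the largest ratio as L_min. The two loops and their
    latencies are made-up numbers, not values from the filter in this report.

```python
from fractions import Fraction


def min_iteration_latency(loops):
    """Return (L_min, index of the critical loop).

    loops is a list of (operation_latencies, n_delays) pairs with the
    latencies given in clock cycles. L_min is the largest ratio of
    total loop latency to the number of delay elements in the loop.
    """
    bounds = [Fraction(sum(lat), n_delays) for lat, n_delays in loops]
    l_min = max(bounds)
    return l_min, bounds.index(l_min)


def max_sample_frequency(loops, t_clk):
    """f_max = 1 / (L_min * T_clk)."""
    l_min, _ = min_iteration_latency(loops)
    return 1.0 / (float(l_min) * t_clk)


# Hypothetical example with two recursive loops.
loops = [
    ([1, 1, 2], 2),  # two adders and a multiplier around two delay elements
    ([1, 2], 1),     # one adder and a multiplier around one delay element
]
l_min, critical = min_iteration_latency(loops)
f_max = max_sample_frequency(loops, t_clk=1e-8)
```

    Here the second loop is critical even though it contains fewer operations, because it has
    only one delay element to absorb their latency.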

    To illustrate, the bireciprocal lattice wave digital filter is shown with its critical loop

    marked with dashed lines in Fig 3.3.

    Fig 3.3 Example of critical loop in IIR Filter

    (18)

    3.4 ALGORITHM TRANSFORMATIONS

    Algorithm transformations are used for increasing the throughput of algorithms or

    decreasing the power consumption. A great deal of work has been done in the past
    to increase the throughput. However, these results can be traded for low power by using the
    excess speed to lower the power supply voltage, i.e., voltage scaling.

    Pipelining, described earlier, is based on retiming. In non-recursive algorithms, which are

    limited by the critical path, pipelining at different design levels is easily applied to split the

    critical path and thereby increase the throughput. In recursive algorithms limited by the
    critical loop, pipelining at the algorithm level is not possible. Methods called Scattered
    Look-Ahead pipelining or Frequency Masking Techniques can be applied to introduce more delay
    elements in the loop and thus increase f_max. The former method introduces extra poles, which
    have to be cancelled by zeros in order to obtain an unaltered transfer function; this may cause
    problems under finite word length conditions. Hence, Frequency Masking Techniques are
    preferable. However, pipelining at the logic level, splitting the critical path by increasing
    the LM order, is easily applied.

    Another possibility is to use numerical equivalence transformations in order to reduce the

    number of operations in the critical loop. These transformations are based on the

    commutative, distributive, associative properties of shift-invariant arithmetic operations.

    Applying these transformations often yields a critical loop containing only one addition, one
    multiplication, and a quantization. By rewriting the multiplication as a sum of power-of-two
    coefficients, the latency can be reduced further, yielding, for bit-serial arithmetic, L = W_f,
    independent of the LM order. The multiplication is rewritten as a sum of shifted inputs,

    introducing several loops. The shifting is implemented using D flip-flops in bit-serial

    arithmetic. The power-of-two coefficient that requires the most shifting, i.e., has the largest
    latency, is placed in the inner loop, and the other coefficients are placed in outer loops in order of
    decreasing amount of shifting. This technique can be extended to digit-sizes larger than

    one.
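
    Rewriting a multiplication as a sum of shifted inputs can be illustrated in software for
    the coefficient 0.375 = 2^-2 + 2^-3 used in this chapter. The sketch below works on integer
    samples and models only the arithmetic, not the loop structure or the bit-serial timing;
    the function names are hypothetical.

```python
W_F = 3            # fractional bits of the coefficient
COEFF_INT = 0b011  # 0.375 = 0.011 in binary = 2**-2 + 2**-3


def multiply_shift_add(x):
    """Multiply the integer sample x by 0.375 using shifts and one addition.

    The full product has W_F extra bits; the final right shift discards
    them (truncation, i.e. rounding toward minus infinity).
    """
    full_product = (x << 1) + x  # x * 0b011 as a sum of two shifted inputs
    return full_product >> W_F


def multiply_reference(x):
    """Same result computed with an ordinary integer multiplication."""
    return (x * COEFF_INT) >> W_F
```

    For a 12-bit input range, both functions agree on every sample, so the adder-only version
    can stand in for the multiplier.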

    3.5 IMPLEMENTATION OF DSP ALGORITHMS

    The implementation of the DSP algorithms can be performed by a sequence of descriptions

    introducing more information at each step of the design.

    3.5.1 Precedence Graph

    The precedence graph shows the order in which the operations can be executed: which operations
    have to be computed in sequence, and which operations can be computed in parallel. It
    can also serve as a basis for writing executable code for implementation on signal
    processors or on a personal computer. As
    an example, the IIR filter in Fig. 3.3 yields the precedence graph shown in Fig. 3.4. The critical

    loop is denoted with a dashed line.

    Fig 3.4 Precedence Graph for IIR Filter

    3.5.2 Computation Graph

    By adding timing information for the operations to the precedence graph, a computation

    graph is obtained. This graph can serve as a base for the scheduling of the operations. The

    shaded area indicates the execution time, while the darker area indicates latency of the

    operations. The unit on the time axis is clock cycles. The timing information is then later used

    for the design of the control unit.

    At this level of the design the LM order has to be chosen. Here we use three different LM

    orders 0, 1/3, and 1. As an example, the computation graphs over a single sample interval

    for the IIR filter are shown in Fig. 3.5. The As Soon As Possible (ASAP) scheduling approach

    has been used for the scheduling of the IIR filter.

    Fig 3.5 Computation graphs for LM 0, 1/3 and 1 implementations of the IIR filter using D=3
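
    ASAP scheduling on a precedence graph can be sketched as follows: every operation starts
    as soon as all of its predecessors have produced their results. The graph and latencies
    below are hypothetical and are not the ones in Fig. 3.4 or Fig. 3.5; the sketch uses
    Python's standard graphlib module (Python 3.9 or later) to visit operations in
    precedence order.

```python
from graphlib import TopologicalSorter


def asap_schedule(predecessors, latency):
    """Return the earliest start time (in clock cycles) of every operation.

    predecessors maps each operation to the operations it depends on;
    latency maps each operation to its latency in clock cycles.
    """
    start = {}
    for op in TopologicalSorter(predecessors).static_order():
        start[op] = max(
            (start[p] + latency[p] for p in predecessors.get(op, ())),
            default=0,
        )
    return start


# Hypothetical graph: a chain of two adders, a multiplier and a quantizer,
# plus a side branch (add3) that only depends on the first adder.
predecessors = {
    "add1": [],
    "add2": ["add1"],
    "mult": ["add2"],
    "quant": ["mult"],
    "add3": ["add1"],
}
latency = {"add1": 1, "add2": 1, "mult": 2, "quant": 0, "add3": 1}
schedule = asap_schedule(predecessors, latency)
```

    Operations with the same start time, here add2 and add3, are the ones that can be
    computed in parallel.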

    3.6 OPERATION SCHEDULING

    Operation scheduling is generally a combinatorial optimization problem. The operations in
    data-independent DSP algorithms are known in advance, and a static schedule, which is
    optimal in some sense, can be found before the next design level. The contrary case is called
    dynamic scheduling, which is performed at execution time. For the recursive algorithm, we are
    interested in obtaining maximally fast and resource-minimal schedules, i.e., schedules that
    reach the minimum iteration bound with a minimum number of processing
    elements. For the non-recursive algorithm, the aim is to obtain a schedule that can be unfolded
    to an arbitrary degree.

    3.7 UNFOLDING AND CYCLIC SCHEDULING OF RECURSIVE ALGORITHMS

    To attain the minimum iteration bound, it is necessary to perform loop unfolding of the
    algorithm and cyclic scheduling of the operations belonging to several successive sample
    intervals if

    (a) the execution time of the processing elements is longer than the minimum iteration period, or
    (b) the critical loop(s) contain more than one delay element.

    For digit-serial arithmetic, a lower bound on m is derived in the following. The execution time
    for a digit-serial PE is given by Eq. (3). The ratio between the execution time and the minimum
    iteration period determines the number of digits that need to be processed concurrently, which
    equals the number of concurrent samples,

    (19)

    scheduling for several sample periods just by taking m sets of operations and delaying them in time.
    In the case of a non-integer L_min, a non-uniform delay, alternating between the two nearest
    integer numbers of clock cycles between the sets, can be used. This aligns all samples with the same
    clock phase. An example of a maximally fast schedule using D = 3 and LM 1 is shown in Fig. 3.6.

    Fig 3.6 Maximally Fast scheduling

    3.8 MAPPING TO HARDWARE

    Fig 3.7 Hardware Structure for Unfolded Filter

    An isomorphic mapping of the operations in the cyclic schedule to hardware yields a
    maximally fast and resource-minimal implementation. The branches in the computation graph
    with different start and stop times are mapped to shift registers; the time difference is

    referred to as slack or Shimming Delay (SD). The shimming delays are registers implemented
    using D flip-flops. A property of a maximally fast implementation is that the critical loop has
    no SD, i.e., the delay elements in the critical loop are totally absorbed by the latencies of the
    operations in the loop. Isomorphic mapping to hardware also yields low power consumption,
    since the amount of data movement without processing is low. The processing elements are
    well utilized; hence, power-down or clock gating would only increase the
    complexity without reducing the power consumption. Fig. 3.7 shows the IIR filter structure
    after unfolding.

    3.3 THE SUM PRODUCT ALGORITHM

    The sum-product algorithm is a generic algorithm that operates on a factor graph through a
    sequence of local computations at every factor-graph node. The computation rules consist
    only of multiplications and additions, hence the name sum-product algorithm. The local
    results are passed as messages along the edges of the factor graph. The algorithm can be used
    to compute the exact function summary in a factor graph that forms a tree, that is, has no
    loops. The sum-product algorithm can also be applied to factor graphs with cycles, where
    it results in an iterative algorithm without a natural termination. This makes the function
    summary non-exact. Nevertheless, decoding of turbo codes and low-density parity-check codes, some
    of the most exciting applications, reflects precisely this situation of a factor graph
    having cycles, and with some precautions the algorithm performs very well.

    A mathematical representation of the sum-product algorithm can be observed in the following
    example. Let us consider the case with the real-valued global function as in equation (1), which
    may represent a conditional joint probability mass function of a collection of discrete random
    variables, given some observation y. We are then interested in the function summary

    $p(x_1 \mid y) = \sum_{x_2} \sum_{x_3} \sum_{x_4} \sum_{x_5} g(x_1, x_2, x_3, x_4, x_5) = g_1(x_1)$    (4)

    $p(x_1 \mid y) = \sum_{x_2} \sum_{x_3} \sum_{x_4} \sum_{x_5} f_A(x_1) f_B(x_2) f_C(x_1, x_2, x_3) f_D(x_3, x_4) f_E(x_3, x_5)$    (5)

    Figure 3.3 Gathering separate product terms in factor graph to compute g1(x1)

    We observe immediately that g1(x1) can be calculated by only knowing fA and fBCDE. The
    latter can be computed by just knowing fB, fC and fDE. Finally, fDE can be calculated by just
    knowing fD and fE. The products can be assembled in the factor graph as shown in Fig.
    3.3. With each node in the factor graph we can now imagine an associated processor that is
    capable of computing local products and local function summaries. The processors communicate
    by sending messages to and receiving messages from neighbouring nodes. The messages are
    whole distributions, that is, the outcomes of the function nodes, which are passed from one
    factor-graph node to another connected by an edge. In general, they represent discrete
    probability mass functions, but continuous probability distributions are also included in the
    framework. Through the message-passing behaviour, all information needed to calculate
    g1(x1) becomes available at x1. If we calculate the function summary for all variables, the
    information is distributed fully bidirectionally on all branches of the network.
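
    The gathering of product terms described above can be checked numerically. The sketch
    below assumes the factorization fA(x1) fB(x2) fC(x1, x2, x3) fD(x3, x4) fE(x3, x5) with
    binary variables, and uses made-up local functions; it compares a direct summation over
    x2 to x5 with the nested local summaries fDE and fBCDE.

```python
from itertools import product

VALUES = (0, 1)  # every variable is binary in this illustration


# Hypothetical, strictly positive local functions (any tables would do).
def f_a(x1):
    return 0.6 if x1 == 0 else 0.4


def f_b(x2):
    return 0.3 if x2 == 0 else 0.7


def f_c(x1, x2, x3):
    return 1.0 + x1 + 2 * x2 + 3 * x1 * x3


def f_d(x3, x4):
    return 0.5 + x3 * x4


def f_e(x3, x5):
    return 0.8 if x3 == x5 else 0.2


def brute_force(x1):
    """Sum the global function over x2..x5 directly (16 terms per value)."""
    return sum(
        f_a(x1) * f_b(x2) * f_c(x1, x2, x3) * f_d(x3, x4) * f_e(x3, x5)
        for x2, x3, x4, x5 in product(VALUES, repeat=4)
    )


def message_passing(x1):
    """Compute the same summary from local sums passed along the tree."""
    def f_de(x3):
        # local summaries of f_D and f_E, multiplied at variable x3
        return (sum(f_d(x3, x4) for x4 in VALUES)
                * sum(f_e(x3, x5) for x5 in VALUES))

    f_bcde = sum(
        f_b(x2) * f_c(x1, x2, x3) * f_de(x3)
        for x2, x3 in product(VALUES, repeat=2)
    )
    return f_a(x1) * f_bcde
```

    Both functions return the unnormalized summary g1(x1); dividing by the sum over x1 would
    give a probability mass function.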

    CHAPTER-4

    PROBABILITY CALCULUS MODULES

    4.1 BUILDING BLOCK OF PROBABILITY PROPAGATION NETWORK

    In the previous sections we have introduced the basics of factor graphs and the sum-product

    algorithm which can be run on these graphs. The messages passed from one node to another

    often have the meaning of probabilities or probability density functions. To construct

    probability propagation networks, we consider the following building block.

    Figure 4.1 Building Block Of Probability Propagation Network

    The building block computes a discrete probability mass function p_Z from the discrete
    probability mass functions p_X and p_Y as follows.

    Let X, Y, and Z be finite sets. Let p_X and p_Y be the input probability mass functions defined on
    the sets X and Y, respectively. Let p_Z be the output probability mass function on Z defined by

    $p_Z(z) = \gamma \sum_{x \in X} \sum_{y \in Y} p_X(x) \, p_Y(y) \, f(x, y, z), \quad z \in Z$    (6)

    Using this equation we can calculate the output probability function. The scaling factor γ is
    included in order to make the output probabilities sum to 1. The {0,1}-valued functions f can
    be illustrated by trellis modules. Such a trellis module is a bipartite graph with labelled edges.
    The set of left-hand vertices is X, the set of right-hand vertices is Z, and an edge between
    x ∈ X and z ∈ Z with label y ∈ Y exists if and only if f(x, y, z) = 1. Conversely, the trellis
    module uniquely defines f. In the context of coding theory, the binary indicator functions f

    are known as local indicator functions of the factorized global code membership indicator

    functions.

    4.2 SOFT LOGIC GATES AND TRELLIS REPRESENTATIONS

    EQUAL GATE

    The function f(x, y, z) for this particular case is equal to 1 if and only if x = y = z, and
    f(x, y, z) = 0 otherwise. The corresponding trellis is shown below.

    Figure 4.2 EQUAL Gate Trellis

    The probability formulation of the output distribution, for binary variables, follows from Eq. (6):

    $p_Z(0) = \gamma \, p_X(0) \, p_Y(0), \quad p_Z(1) = \gamma \, p_X(1) \, p_Y(1)$

    The equal gate is simply a local function node that multiplies two probabilities
    together. The scaling factor γ normalizes the result so that the output probabilities sum to 1.

    SOFT XOR GATE

    Another common function f is defined as f(x, y, z) = 1 if and only if z = x xor y, and
    f(x, y, z) = 0 otherwise. The corresponding trellis is shown in Fig. 4.3.

    Figure 4.3 XOR Gate Trellis

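    The probability formulation of the output distribution is given by

    $p_Z(0) = p_X(0) \, p_Y(0) + p_X(1) \, p_Y(1), \quad p_Z(1) = p_X(0) \, p_Y(1) + p_X(1) \, p_Y(0)$

    Here, the soft XOR both multiplies and adds probability distributions. Each output value
    requires two multiply operations.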
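
    Both soft gates are instances of the same building block, which the sketch below makes
    explicit. It assumes the building-block rule p_Z(z) = γ Σ_x Σ_y p_X(x) p_Y(y) f(x, y, z)
    with binary symbols and a normalizing scale factor; the function names and the input
    probabilities are made up for illustration.

```python
from itertools import product


def soft_gate(p_x, p_y, f, n_out=2):
    """Compute p_z(z) = gamma * sum_x sum_y p_x(x) * p_y(y) * f(x, y, z).

    p_x and p_y are lists of probabilities indexed by symbol value,
    f is a {0,1}-valued indicator function, and gamma is chosen so that
    the output sums to 1 (the raw sum must be non-zero).
    """
    raw = [
        sum(p_x[x] * p_y[y] * f(x, y, z)
            for x, y in product(range(len(p_x)), range(len(p_y))))
        for z in range(n_out)
    ]
    total = sum(raw)
    return [value / total for value in raw]


def f_equal(x, y, z):
    """Indicator of the equal gate: 1 if and only if x = y = z."""
    return 1 if x == y == z else 0


def f_xor(x, y, z):
    """Indicator of the soft XOR gate: 1 if and only if z = x xor y."""
    return 1 if z == (x ^ y) else 0


p_x = [0.9, 0.1]  # hypothetical input distributions
p_y = [0.6, 0.4]
p_equal = soft_gate(p_x, p_y, f_equal)
p_xor = soft_gate(p_x, p_y, f_xor)
```

    For the equal gate the raw products 0.54 and 0.04 do not sum to 1, so the scaling matters;
    for the XOR gate the raw sums already form a distribution when the inputs do.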

    CHAPTER-5

    CIRCUIT IMPLEMENTATION

    This section is concerned with the hardware implementation of the soft logic gates discussed in the previous section. For local functions f, it was shown that the sum-product algorithm yields operations on probability distributions that behave similarly to Boolean logic gates. This section documents the building blocks and circuits that accomplish these probabilistic adds and multiplies.

    It is not obvious how these circuits will operate. Will they be voltage-mode or current-mode implementations? How can adds and multiplies be constructed? Adds are simple in analog circuitry; essentially, they are available for free. Recall that Kirchhoff's Current Law states that the sum of all currents into a node is zero. By connecting one wire to another (shorting the wires), the addition of currents is realized.

    Multiplication is not straightforward in analog CMOS. In 1975, Gilbert's current multiplier used both the exponential and logarithmic characteristics of BJTs to multiply currents. Performing multiplication in analog CMOS requires manipulating the operating range of the MOS transistor. The operating range of interest is known as the subthreshold region, and it is closely associated with the work of Carver Mead. Mead's work focused on exploiting the nonlinearities of MOS transistors operating in the subthreshold region. The goal was to create circuits that used little power and behaved like biological functions. Mead found that MOS transistors operating in the subthreshold region behave like bipolar junction transistors; that is, the CMOS implementations model the characteristics of BJTs necessary for current multiplication.

    5.1 SIGNAL SUMMATION

    Summing signals is easily accomplished in the current domain, that is, when signals are represented by currents. This is due to Kirchhoff's current law, which states that the sum of all currents along the incoming branches to a given node is equal to the sum of all currents of the outgoing branches. If only one outgoing branch exists, it automatically carries the sum of the currents of all incoming branches. This means that current summation is done simply by connecting wires.

    where UT is the thermal voltage, kT/q. As expected, the current response is an exponential function of Vgs. The deviation from exponential behavior occurs when the flow of Isat becomes restricted.

    Figure 5.1: Saturation current of an NMOS transistor as a function of gate voltage.

    We have just defined a critical operating parameter in our probability circuits. From the graph,

    P1 (z = 1) = Isat = 10 nA and P1 < 10 nA when P1 (z ≠ 1)    (9)

    P0 (z = 0) = Isat = 10 nA and P0 < 10 nA when P0 (z ≠ 0)    (10)

    under the constraint that

    P1 + P0 = 1    (11)

    so that the currents are complementary.
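    One way to read the constraint is that a binary probability is carried by a pair of complementary currents whose sum is fixed at the saturation level. The following sketch assumes exactly that reading; the helper names and the linear mapping are illustrative assumptions, and only the 10 nA figure is taken from the text.

```python
I_SAT = 10e-9  # full-scale current in amperes (10 nA, as stated in the text)


def probability_to_currents(p1):
    """Map P(z = 1) to a complementary current pair (I0, I1). Assumed mapping."""
    if not 0.0 <= p1 <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    return (1.0 - p1) * I_SAT, p1 * I_SAT


def currents_to_probability(i0, i1):
    """Recover P(z = 1) from the current pair; only the ratio matters."""
    return i1 / (i0 + i1)


i0, i1 = probability_to_currents(0.25)
print(i0, i1, currents_to_probability(i0, i1))
```

    Because the probability is recovered from a ratio, a common scaling of both currents leaves the represented value unchanged.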

    5.3 A SUBTHRESHOLD CMOS MULTIPLIER

    In order to mimic the functions described by the soft logic gates, a circuit must be

    created to allow for the multiplication. The multiplier circuit is nothing more than a

    collection of several current mirrors, and a differential pair - a simple, but extremely

    elegant solution. We begin by discussing several critical operation parameters of a

    diode connected transistor, show how it creates a current mirror, define the transfer

    characteristics of a differential pair, and then show how these circuits create a current

    multiplier.

    CURRENT MIRROR

    Figure 5.2 Current Mirror

    Current mirrors are a key element in analog integrated circuits. They are used to duplicate currents or to fold or cascade parts of the circuit in order to reduce the supply-voltage requirements. The simplest structure of an NMOS current mirror is shown in Figure 5.2. Both transistors have to be in saturation, and we rely on perfect matching for ideal operation. A brief analysis shows that the copy errors due to the finite output resistance of the mirror transistors are relatively large. The output resistance is improved by making the transistors longer. Note that the current mirror can be operated in both strong inversion and weak inversion. However, current mirrors operated in strong inversion match better than those operated in weak inversion. For a given W/L, where W is the width and L the length of the transistor, the matching of the transistors is best if the current mirror is designed to operate with a large VGS, that is, to force the transistor into deep strong inversion. Matching can be improved by increasing the active transistor area WL. This reduces the relative impact of random fabrication errors.

    The design of a current mirror generally starts by imposing a certain voltage swing on the current mirror at the nominal current. This leads to a W/L ratio in a given semiconductor technology. The minimum voltage between drain and source that still allows operation in the saturated region can be derived from the given gate-source voltage. Finally, the active transistor area WL is adapted until the desired level of matching is achieved. Note, however, that the parasitic gate capacitance is increased by the same amount. Hence, the increase of the parasitic capacitance will reduce the maximum operation speed. Generally, several parameters have to be traded off during the design process.

    In an NMOS device, the drain current Isat is an exponential function of Vg and Vs.

    $$I = I_0 \, e^{(V_g - V_s)/U_T} \qquad (12)$$

    If this equation is solved for Vg, the result is a logarithmic current-to-voltage converter

    $$V_g = V_s + U_T \ln\frac{I}{I_0} \qquad (13)$$

    In a current mirror, a diode-connected transistor shares its gate node with another transistor of the same type. If the current sources are fixed and the devices are in saturation, they act as current sources. Similarly, if both devices have the same geometry and share the same source potential, then they source the same current.

    This is illustrated by the current mirror of Figure 5.2. Current IIN sets the gate voltage Vg for both transistors. Consequently, Vg sets the current IOUT. In this example, Vs,T1 = Vs,T2 and Vd,T1 = Vg,T1 = Vg,T2. Using the above equations, it is easily verified that IOUT = IIN because of the logarithmic-exponential characteristics.
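    The log-then-exp argument can be checked numerically. The sketch below models the diode-connected input transistor with the logarithmic relation of equation (13) and the output transistor with the exponential relation of equation (12). The values chosen for `U_T` and `I_0` are illustrative assumptions (roughly room temperature and an arbitrary pre-exponential current), and the ideal exponential law without a slope factor is assumed.

```python
import math

U_T = 0.0259  # thermal voltage kT/q in volts (assumed, about room temperature)
I_0 = 1e-15   # hypothetical pre-exponential current in amperes


def drain_current(vg, vs):
    """Equation (12): exponential dependence of the current on Vg - Vs."""
    return I_0 * math.exp((vg - vs) / U_T)


def gate_voltage(i, vs):
    """Equation (13): the diode-connected transistor converts current to Vg."""
    return vs + U_T * math.log(i / I_0)


def mirror(i_in, vs1=0.0, vs2=0.0):
    """Input transistor sets Vg; output transistor turns Vg back into a current."""
    vg = gate_voltage(i_in, vs1)
    return drain_current(vg, vs2)


i_out = mirror(5e-9)
print(i_out)
```

    With equal source potentials the logarithm and the exponential cancel exactly, so the output current reproduces the input current.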

    Current mirrors allow the output current to be scaled, either by using different source potentials, in which case

    $$I_{OUT} = e^{(V_{s1} - V_{s2})/U_T} \, I_{IN} \qquad (14)$$

    or by varying the ratio of the transistor dimensions, given as

    $$I_{OUT} = \frac{W_2/L_2}{W_1/L_1} \, I_{IN} \qquad (15)$$

    The implementation of efficient and noise-tolerant current scaling is important when a decoding application is considered. Due to the multiplicative nature of the sum-product algorithm and the fact that the multiplied probabilities will not always be unity, the current level will tend to 0 if it is not rescaled from time to time.
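    The need for rescaling is easy to see numerically. The sketch below pushes a two-element current vector through a chain of unnormalized multiplications, once without and once with periodic rescaling. The function name `run_chain`, the 10 nA reference level, and the stage values are illustrative assumptions rather than figures from the report.

```python
def run_chain(stages, i_total=10e-9, rescale_every=None):
    """Multiply a current pair by a sequence of probability pairs.

    If rescale_every is set, the pair is scaled back to a total of
    i_total after that many stages, preserving the ratio of the two currents.
    """
    currents = [0.5 * i_total, 0.5 * i_total]
    for n, (p0, p1) in enumerate(stages, start=1):
        currents = [currents[0] * p0, currents[1] * p1]
        if rescale_every and n % rescale_every == 0:
            s = sum(currents)
            currents = [i_total * c / s for c in currents]
    return currents


stages = [(0.6, 0.4)] * 20
raw = run_chain(stages)                      # no rescaling
scaled = run_chain(stages, rescale_every=1)  # rescale after every stage
print(sum(raw), sum(scaled))
```

    The total current of the unscaled chain collapses by several orders of magnitude, while the ratio of the two currents, which carries the probability information, is the same in both runs.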

    THE DIFFERENTIAL PAIR

    As the final step towards building the Analog CMOS multiplier, the differential pair is

    introduced.

    Figure 5.3 DIFFERENTIAL PAIR

    Consider the circuit shown in Figure 5.3, comprising three NMOS transistors. The sources of M1 and M2, which are called the differential-pair transistors, are connected together at node V. The third transistor, Mb, which is called the bias transistor, is supposed to sink a constant bias current, Ib, from node V. The gate voltages of M1 and M2 are the inputs to this circuit, and the currents I1 and I2 are its outputs. For a circuit like the differential pair, we often express each of the two input voltages, V1 and V2, in terms of a common-mode input voltage and a differential-mode input voltage. Analogously, we also often express the two output currents, I1 and I2, in terms of a common-mode output current and a differential-mode output current. This circuit is quite powerful: by careful choice of the voltages applied at V1 and V2, the bias current at the sources of the differential pair can be steered to appear at either output.

    $$I_1 = I_b \, \frac{I_{IN1}}{I_{IN1} + I_{IN2}} \qquad (16)$$

    and

    $$I_2 = I_b \, \frac{I_{IN2}}{I_{IN1} + I_{IN2}} \qquad (17)$$
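    The current steering of the differential pair can be sketched numerically. The model below assumes ideal subthreshold transistors with a purely exponential gate characteristic (no slope factor), so that each branch receives a share of the bias current proportional to the exponential of its gate voltage. The constants `U_T` and `I_0` and the function names are illustrative assumptions.

```python
import math

U_T = 0.0259  # thermal voltage in volts (assumed, about room temperature)
I_0 = 1e-15   # hypothetical pre-exponential current in amperes


def differential_pair(v1, v2, i_b):
    """Split the bias current i_b between two branches according to v1 and v2."""
    m = max(v1, v2)  # subtract the maximum to keep the exponentials well scaled
    e1 = math.exp((v1 - m) / U_T)
    e2 = math.exp((v2 - m) / U_T)
    return i_b * e1 / (e1 + e2), i_b * e2 / (e1 + e2)


def input_voltage(i_in):
    """Gate voltage produced by a diode-connected input transistor (source at 0 V)."""
    return U_T * math.log(i_in / I_0)


# Input currents of 3 nA and 1 nA steer 75% and 25% of the bias current.
i1, i2 = differential_pair(input_voltage(3e-9), input_voltage(1e-9), 10e-9)
print(i1, i2)
```

    When the gate voltages are generated from input currents in this way, the split reduces to the current ratios of equations (16) and (17).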

    Core Circuit For Matrix Multiplications

    The fundamental circuit for matrix multiplication is as shown in Fig 5.4.

    FIGURE 5.4 CORE CIRCUIT FOR MATRIX MULTIPLICATION

    Its inputs are the currents Ix,i, i = 1, 2, ..., m, and the currents Iy,j, j = 1, 2, ..., n. Its outputs are the currents Ii,j. All transistors are assumed to be weakly inverted MOS transistors.

    The function of this circuit is given by

    $$I_{i,j} = I_z \, \frac{I_{x,i}}{I_x} \, \frac{I_{y,j}}{I_y} \qquad (18)$$

    $$I_x = \sum_{i=1}^{m} I_{x,i}, \qquad I_y = \sum_{j=1}^{n} I_{y,j} \qquad (19)$$

    Now recall our equation,

    $$p_Z(z) = \gamma \sum_{x \in X} \sum_{y \in Y} p_X(x) \, p_Y(y) \, f(x, y, z), \qquad z \in Z \qquad (20)$$

    The application of the circuit of Figure 5.4 to the computation of equation (20) is now straightforward. Let X = {x1, x2, ..., xm} and Y = {y1, y2, ..., yn}. The input terminals of the circuit are fed with the currents Ix,i = Ix px(xi) and Iy,j = Iy py(yj), respectively. The output currents are then equal to

    Ii,j = Iz px(xi) py(yj)    (21)

    The computation is completed by summing, for each z ∈ Z, the currents Ii,j for which f(xi, yj, z) = 1.
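    The two steps, forming all pairwise product currents and then summing those selected by f, can be sketched as follows. The function names, the 10 nA reference currents, and the use of the indices 0 and 1 as the binary symbol values are illustrative assumptions; the product rule is the one stated in equation (21).

```python
def core_circuit(ix, iy, iz):
    """Product currents I[i][j] = Iz * (Ix_i / Ix) * (Iy_j / Iy)."""
    sx, sy = sum(ix), sum(iy)
    return [[iz * (a / sx) * (b / sy) for b in iy] for a in ix]


def sum_selected(ix, iy, iz, f, z_values):
    """For each z, add up the product currents whose branch satisfies f = 1."""
    grid = core_circuit(ix, iy, iz)
    return {
        z: sum(grid[i][j]
               for i in range(len(ix))
               for j in range(len(iy))
               if f(i, j, z) == 1)
        for z in z_values
    }


def f_xor(x, y, z):
    return 1 if z == (x ^ y) else 0


I_REF = 10e-9  # assumed reference current for Ix, Iy and Iz
ix = [0.9 * I_REF, 0.1 * I_REF]  # Ix * p_x(x_i)
iy = [0.8 * I_REF, 0.2 * I_REF]  # Iy * p_y(y_j)
out = sum_selected(ix, iy, I_REF, f_xor, (0, 1))
print(out)
```

    For the XOR indicator, the summation step corresponds to wiring together the two product currents that belong to the same output value.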

    CMOS Multiplier

    There is not much work left to create this multiplier. In fact, all the necessary components have been discussed. All that remains is to add a diode-connected transistor to each of the inputs of the differential pair (Figure 5.3). Recall that the current Ib was steered by the dominant input voltage. By adding diode-connected transistors to inputs V1 and V2, creating current mirrors at the inputs of the differential pair, Ib is now multiplied by currents instead of voltages.

    Formally, this is given by the following equations

    $$I_1 = I_b \, \frac{I_{IN1}}{I_{IN1} + I_{IN2}} \qquad \text{and} \qquad I_2 = I_b \, \frac{I_{IN2}}{I_{IN1} + I_{IN2}}$$

    5.4 IMPLEMENTATION OF SOFT XOR GATE

    Let us start with the simplest module, the soft-XOR gate. It can be drawn directly from its butterfly trellis section, which has been derived from its binary indicator function f. Like all modules with binary input distributions, it consists of six core transistors forming the multiplication matrix and the characteristic connection pattern of the trellis. In fact, the trellis pattern will be directly visible on silicon if the devices are properly arranged. This fact may be helpful in creating automatic tools for generating such building blocks in a chip-design environment. All product terms are used to build the output probability distribution pz. The output terms are mirrored by the current mirrors on top of the kernel circuit. The input currents Ix,i are also passed through an input current mirror. By doing this, the module becomes freely cascadable by simply connecting the output current vector Iz to one of the input current vectors Ix and Iy, respectively, of the next circuit section. This is the simplest way of interconnecting several building blocks. Note also that all the current mirrors may equally well operate in the strong inversion region of MOS transistors, thereby having a standard quadratic behaviour.

    Figure 5.5 Soft XOR gate

    The six NMOS transistors in the middle section perform the multiplication operation. The PMOS transistors at the top two corners are the output transistors; they form current mirrors that source current rather than sink it, which makes it possible to cascade this block with other blocks in a larger circuit. The current mirrors are also necessary for scaling the current. The input probability currents are applied through diode-connected NMOS transistors. These diode-connected (gate tied to drain) NMOS transistors at each input have a low input impedance, and the logic gates obtained are current-mode circuits. Transforming the blocks to voltage-mode operation is easily achieved by shifting the diode-connected NMOS transistors from the input to the output side. From the trellis diagram and the equations discussed earlier, it is clear that the soft XOR gate also performs two additions. These are made possible by shorting the sources of T5 and T7, and also of T6 and T8, giving the following realizations of PZ(0) and PZ(1), respectively.

    $$P_Z(0) = P_X(0)\,P_Y(0) + P_X(1)\,P_Y(1), \qquad P_Z(1) = P_X(0)\,P_Y(1) + P_X(1)\,P_Y(0)$$

    The circuit operation is straightforward: the currents Ix P(x=0) and Ix P(x=1) bias Vg of T11 and T12, and of T13 and T14, respectively. T12 and T13 mirror the current. Transistors T9 and T10

    The diode-connected transistors at each input have a low input impedance, and the soft logic gates obtained are current-mode circuits. Transforming the blocks to voltage-mode operation is easily achieved by shifting the diode-connected NMOS transistors from the input to the output side. The current mirrors at the output make it easy to cascade several soft EQUAL gate blocks.

    The operation is simple, and there is only a small difference from the circuit of the soft XOR gate. Here, the transistors in the middle are not shorted to the gate terminals of the top transistors, because the EQUAL operation needs no addition; only the products are required. The wires are therefore not shorted; instead, the drains of T6 and T7 are connected to VDD. The voltage VBIAS is applied in order to normalize the probability currents, as in the equation.

    CHAPTER-6

    DESIGN CONSIDERATIONS

    Several questions come to mind when presented with the above schematics. In specifying parameters such as voltages and transistor sizes, it is important to understand the context in which the circuit will operate. As mentioned previously, these circuits are well suited for decoding applications. Why? First, they can indicate, at very little computational cost, the probability that a bit is a one or a zero. Secondly, the biggest criticism of analog circuits, the lack of accuracy, is not an issue in this application. This statement deserves careful consideration.

    Suppose that, after passing through several gates, we operate at 50% of the maximum current value. We then leave ourselves vulnerable to noise contamination, and we lose the meaning of what we wanted to compute. Now suppose that, after several probability computations, we scale our probability current using the W/L ratio or the source voltage. Because our circuits give indications of a 1 or a 0, whether or not we can accurately bring our current back to its maximum operating value is irrelevant. Instead, we can preserve our computations against the circuit's inherent noise by periodically "raising" the current level; this is analogous to the natural error correction of digital logic, where a signal is pulled high or set low after each stage of logic.

    6.1 DC OPERATING VOLTAGE

    The supply voltage VDD is a trade-off between power consumption and signal noise. For example, lowering VDD results in lower current draw from the supply and hence in power savings. However, this comes at the expense of degraded transistor operation: the output currents may not rise to the maximum operating level. Because some error is tolerated, a large supply voltage is not necessary. Instead, VDD should be chosen to maximize power savings while tolerating some percentage of error.

    In addition to VDD, the circuit diagrams of the soft logic gates use a bias voltage called VBIAS to assist in the probability calculation.

    VBIAS = 25% (VDD)    (22)

    Also, the output of a current mirror, IOUT, can be scaled according to the difference in Vs. The voltage VBIAS attempts to reduce this scaling effect and allows this second stage of current mirrors to operate more ideally, that is,

    IOUT = IIN

    Because we can never be sure of the voltage Vs at the transistors in the differential pair, a bias voltage is selected to reduce the scaling of IOUT.

    6.2 TRANSISTOR SIZING

    CMOS implementations have the benefit of occupying less die area than their BJT counterparts in layout. Therefore, the circuit must operate correctly but should occupy an area smaller than a BJT implementation. With this constraint in mind, sizing in these circuits employs, where possible, a minimum W/L ratio. In digital design, the sizing of the transistors (in particular, the transistor width) allows the designer to favour different types of input conditions. The designer does this to decrease the delay of that particular gate. In the case of these circuits, widening the transistors does not decrease the rise time. In fact, widening the transistors simply increases power consumption: current levels need to be raised, and a larger supply voltage is needed to compute the circuit's outputs. Therefore, digital techniques to improve gate delay do not apply. Our only other recourse is to bias the transistors at a level just below Vt, so that we operate at the maximum current density in the subthreshold region.

    Transistor length is also important. Short-channel effects, such as the Early effect, can modulate the transistor's output current. By increasing the transistor length, we can protect against short-channel effects. As a rule of thumb, in all of the circuits presented in this paper, the current mirror transistor length was set to

    Lcurrent_mirror = 6 Lmin

    This sizing rule is needed for proper operation.

    6.3 IMPLEMENTATION ISSUES

    Topology-Induced Problems

    Topology-induced problems arise, for example, when biasing a large probability propagation network. This may cause severe problems, since no local matching can be guaranteed

    for a distributed biasing network (e.g. distributed current mirrors for the current sources needed in each cell). Since the geometrical dimensions may get very large, additional effects such as the non-zero resistance of long metal wires show up. This may affect signal tracks as well as power-supply lines. It must be kept in mind during the design phase that a distributed bias network implemented with BJTs draws a considerable amount of base current, which causes large voltage drops on long metal tracks. In the extreme, these voltage drops may prevent the whole network from working correctly. Comparable problems arise in digital circuits for clock distribution. There, the solution is to balance the load in the different branches of a clock distribution tree instead of having one large single track. Adapted to our networks, this would mean using local repeaters for the biasing circuits. Errors introduced by these circuits are not critical, since all calculations rely on relative signal strength.

    Construction of Large Analog Networks

    A second, more general implementation issue is how to construct large analog computational networks. Up to a few hundred transistors, an analog system may be drawn very easily if a hierarchical design approach is chosen. But imagine a large factor graph of several hundreds or thousands of individual nodes. Drawing the interconnection lines between the individual building blocks by hand would be complex and error-prone. It is certainly a good idea not to rely on manual drawing if a schematic can be generated by a computer program. In general, it will not be possible to design a large analog network first-time-right without computer-aided design. By this we mean not only computer-aided drawing (CAD) but also computer-aided engineering (CAE), which includes much more than only the drawing of schematics.

    Testability

    Another big issue of such large networks is testability. How can we guarantee that a circuit leaving the wafer fab works as expected? Rudimentary tests such as checking the supply current or verifying individual test blocks are generally not sufficient to guarantee the overall functionality. Testing large digital circuits is much easier than doing the same thing for analog networks. Boundary scan and JTAG test access ports are commonly used today for looking inside a working digital circuit. They mostly rely on digital registers that can be addressed and read out serially on certain chip pins. Testing large analog systems is much more difficult. The measuring circuitry should not modify the overall behaviour (by creating additional loads on the nodes of interest). Additionally, the resolution of measured

    values should be better than the resolution of the actual circuit under test. This means that the circuits for measuring have to be more precise, and are thus in general also more complex and space-consuming, than the circuit to be verified. Even if a measurement circuit can be shared among many circuit nodes under test, it may add considerable overhead to the overall network. It would therefore be desirable for the circuit's functionality to be guaranteed by design. One way to achieve this may lie in an information-theoretic approach that tries to quantify the impact of individual error sources on an overall probability propagation network. Unfortunately, we have not yet had the time to investigate such an approach, but it will be the subject of future research.

    CHAPTER-7

    ISSUES AND FUTURE WORK

    The key question is how to apply these circuits in decoding applications. We know that any codeword or sequence of data can be represented by a corresponding trellis diagram. Each individual trellis section corresponds to a function variable on a factor graph. The output characteristics of that trellis depend on the local function f(x, y, z) that the variable is passed into. Therefore, we can create a factor graph that is a visual depiction of this trellis diagram. Furthermore, we can map our equal and XOR probability modules to this trellis and observe any given variable on that factor graph.

    What happens if the trellis diagram does not have a specific entry point and exit point? In this case, the factor graph has no clear entry and exit point and becomes cyclic. This model accurately describes codes that are "tail-biting." Like a trellis of arbitrary states, the possible paths of this project branch in every direction. In order to fully study the circuits presented in this paper, the next step would be to lay out these schematics and compare the performance of the ideal circuits presented with those simulated with interconnect delay. In addition, taping out these schematics for actual testing may show whether the MOSIS process allows for proper analog behaviour. A really aggressive schedule would seek to fully understand turbo codes, a high-performing channel coding scheme, and move to creating a tool to automate schematic design and layout.

    Besides the decoding applications, the probability propagation networks may be applied in various other related domains, such as the tracking of hidden Markov models, widely used for many pattern recognition tasks, and inference on Bayesian networks, which appear in the context of artificial intelligence problems.

    Application of the probability-propagation calculus to other problems

    Many problems can be described by factor graphs which in turn can be directly converted to

    an analog probability-propagation network. It would be very interesting to apply the design

    technique to other application fields such as artificial-intelligence problems that might appear

    in on-line fault-detection circuits of complex systems.

    Adaptive filters

    By changing the signal representation from a probability-based interpretation to a real-valued interpretation, the well-known equal gate and soft-XOR gate may be operated as real-value adders and real-value multipliers, respectively. Hence, they represent the basic operations of

    discrete-time filters. By making the filter taps adaptive, we could easily build adaptive FIR and IIR filters. Adaptive FIR filters are commonly used for equalizing wire-line channels.

    Joint channel-equalizer

    In the communications community, there exist several concepts for jointly equalizing a given channel and decoding the transmission code. But all of them work in the digital domain, and it may thus make no sense to do decoding in the analog way while the remaining part of the receiver works digitally. So why not build most of the receiver front-end using our analog probability networks? For example, the decision-feedback equalizer (DFE) is a good candidate for an analog network implementation, since all the basic operations can be implemented using our generic building blocks. By doing so, we get one step closer to the antenna or the line interface of a data communication system without flipping too much back and forth between analog and digital.

    All-analog receiver system

    Our experience so far is that many individual blocks of a receiver system can be

    implemented in analog electronics. Despite the fact that many renowned researchers postulate

    software radio, that is, a system that consists merely of an A/D converter as close as possible

    to the antenna and digital processors for signal processing, we think that for certain

    demanding applications analog signal processing in an intelligent manner is the way to go.

    Our long-term aim is an all-analog receiver system, that is, to have no digital signals before the decoder block, since the analog decoder can perform an inherent A/D conversion. This would potentially provide the very efficient, high-speed and ultra-low-power communication systems needed by today's e-society.

    CONCLUSIONS

    A technique for efficiently implementing the sum-product algorithm (or probability

    propagation algorithm) in analog VLSI technology has been discussed. The described new

    type of analog computing networks exhibits a natural match between probability theory and

    transistor physics. The elementary modules of which these networks are composed include

    probabilistic versions of standard logic gates. The obvious application of such networks is the

    decoding of error-correcting codes. However, any factor graph where all function nodes of

    degree larger than one can be mapped onto such analog networks.

    The transistor-level implementations of the building blocks are very simple current-mode

    vector multipliers and current-mode selective adders that process discrete probability

    distributions. Basically one transistor is needed to build the pair-wise product of two elements

    of discrete probability distributions.

    The presented networks follow a bio-inspired approach and therefore avoid many plagues of traditional analog circuit design, such as data-representation overflow, temperature dependence, linear approximations of non-linearities, component variations, and tedious manual design flow. The circuits exploit rather than fight the inherent non-linearities, namely the exponential characteristic of both bipolar junction transistors and weakly inverted MOS transistors. By building large, highly connected networks out of very simple, low-precision computation nodes, high precision and high processing throughput are reached at the system level. Due to their simplicity and computational efficiency, analog networks exhibit a distinct advantage in the speed-power ratio compared to their digital counterparts. According to (still limited) experience, this advantage amounts to at least two orders of magnitude.

    REFERENCES

    [1] C. A. Mead, "Neuromorphic electronic systems," Proc. IEEE, vol. 78, pp. 1629-1636, Oct. 1990.

    [2] B. Gilbert, "A precise four-quadrant multiplier with subnanosecond response," IEEE J. Solid-State Circuits, vol. 3, no. 4, pp. 365-373, Dec. 1968.

    [3] H.A. Loeliger, M. Helfenstein, F. Lustenberger, and F. Tarköy, "Probability propagation and decoding in analog VLSI," in Proc. IEEE Int. Symp. on Information Theory, Aug. 1998, p. 146.

    [4] N. Wiberg, H.A. Loeliger, and R. Kotter, "Codes and iterative decoding on general graphs," in Proc. IEEE Int. Symp. on Information Theory, 17-22 Sep. 1995, p. 468.

    [5] M. Frey, H.A. Loeliger, F. Lustenberger, P. Merkli, and P. Strebel, "Analog decoder experiments with subthreshold CMOS soft-gates," in Proc. IEEE Int. Symp. on Circuits and Systems, June 2003, pp. 85-88.

    [6] F. R. Kschischang, B. J. Frey, and H.A. Loeliger, "Factor graphs and the sum-product algorithm," IEEE Trans. on Information Theory, vol. 47, no. 2, Feb. 2001.

    [7] J. Hagenauer and M. Winkelhofer, "The analog decoder," in Proc. IEEE Int. Symp. on Information Theory, vol. 4, Aug. 1998, p. 145.

    [8] X.-A. Wang and S. B. Wicker, "An artificial neural net Viterbi decoder," IEEE Trans. on Communications, vol. 44, no. 2, pp. 165-171, Feb. 1996.

    [9] H.A. Loeliger, F. Lustenberger, F. Tarköy, and M. Helfenstein, "Decoding in analog VLSI," IEEE Communications Magazine, vol. 37, no. 4, pp. 99-101, April 1999.

    [10] G. D. Forney, "Codes on graphs: Normal realizations," in Proc. IEEE Int. Symp. on Information Theory, June 2000, pp. 9-19.

    [11] G. Han and E. Sanchez-Sinencio, "CMOS transconductance multipliers: A tutorial," IEEE Trans. on Circuits and Systems II: Analog and Digital Signal Processing, vol. 45, no. 12, Dec. 1998.

    [12] F. Lustenberger, "On the design of analog iterative VLSI decoders," Ph.D. dissertation, ETH Zurich, Switzerland, 2000.

    [13] F. Lustenberger and H.A. Loeliger, "On mismatch errors in analog-VLSI error correcting decoders," in Proc. IEEE Int. Symp. on Circuits and Systems, July 2001, pp. 198-201.