
Implementation of Low Power Consumption Convolution Encoder and Viterbi Decoder Using Verilog HDL

Department of ECE, GITAM UNIVERSITY

Chapter 1

INTRODUCTION

1.1 Overview of the project

Convolutional coding has been used in communication systems

including deep space communications and wireless communications. It offers an

alternative to block codes for transmission over a noisy channel. An advantage of

convolutional coding is that it can be applied to a continuous data stream as well as to

blocks of data. IS-95, a wireless digital cellular standard for CDMA (code division

multiple access), employs convolutional coding. A third generation wireless cellular

standard, under preparation, plans to adopt turbo coding, which stems from

convolutional coding.

The Viterbi decoding algorithm, proposed in 1967 by Viterbi, is a

decoding process for convolutional codes in memory-less noise. The algorithm can

be applied to a host of problems encountered in the design of communication

systems. The Viterbi decoding algorithm provides both a maximum-likelihood

and a maximum a posteriori algorithm. A maximum a posteriori algorithm

identifies a code word that maximizes the conditional probability of the decoded

code word given the received code word; in contrast, a maximum likelihood

algorithm identifies a code word that maximizes the conditional probability of

the received code word given the decoded code word. The two algorithms give the

same results when the source information has a uniform distribution.

Traditionally, performance and silicon area are the two most important
concerns in VLSI design. Recently, power dissipation has also become an important
concern, especially in battery-powered applications such as cellular phones, pagers
and laptop computers. Power dissipation can be classified into two categories: static
power dissipation and dynamic power dissipation. Typically, static power dissipation
is due to various leakage currents, while dynamic power dissipation is a result
of charging and discharging the parasitic capacitance of transistors and wires. Since
dynamic power dissipation accounts for about 80 to 90 percent of overall power
dissipation in CMOS circuits, numerous techniques have been proposed to reduce
it. These techniques can be applied at different levels of
digital design, such as the algorithmic level, the architectural level, the gate level and
the circuit level.
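The link between switched capacitance and dynamic power can be made concrete with the standard first-order CMOS model P = alpha * C * Vdd^2 * f. The sketch below is written in Python rather than Verilog, purely for illustration; the function name and the operating-point numbers are hypothetical and are not measurements from this project.

```python
def dynamic_power(alpha, c_load, v_dd, f_clk):
    """First-order CMOS switching power in watts: alpha * C * Vdd^2 * f.

    alpha  : switching activity factor (fraction of cycles a node toggles)
    c_load : switched capacitance in farads
    v_dd   : supply voltage in volts
    f_clk  : clock frequency in hertz
    """
    return alpha * c_load * v_dd ** 2 * f_clk


# Hypothetical operating point, chosen only to illustrate the scaling.
base = dynamic_power(alpha=0.2, c_load=1e-9, v_dd=1.2, f_clk=100e6)

# Halving the switching activity (for example by idling a block) halves the power.
gated = dynamic_power(alpha=0.1, c_load=1e-9, v_dd=1.2, f_clk=100e6)

print(base, gated)
```

Under this model, any technique that lowers the activity factor of a block reduces its dynamic power in direct proportion, which is the lever that architectural-level techniques typically target.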

A Viterbi decoder uses the Viterbi algorithm for decoding a bit
stream that has been encoded using forward error correction based on a
convolutional code. The Viterbi algorithm is commonly used in a wide range of
communications and data storage applications. It is used for decoding convolutional
codes, in baseband detection for wireless systems, and also for detection of recorded
data in magnetic disk drives. The requirements for a Viterbi decoder or Viterbi
detector, which is a processor that implements the Viterbi algorithm, depend on the
application in which it is used. This results in a very wide range of required data
throughputs and power or area budgets.

Viterbi detectors are used in cellular telephones with low data rates, on
the order of 1 Mb/s or below, but with very low energy dissipation requirements. They are
used for trellis code demodulation in telephone line modems, where the throughput is
in the range of tens of kb/s, with restrictive limits on power dissipation and the
area/cost of the chip. At the opposite end, very high speed Viterbi detectors are used
in magnetic disk drive read channels, with throughputs over 600 Mb/s. Even at these
high speeds, area and power remain constrained.


1.2 Motivation

Unlike wired digital networks, wireless digital networks are much

more prone to bit errors. Packets of bits that are received are more likely to be

damaged and considered unusable in a packetized system. Error detection and

correction mechanisms are vital and numerous techniques exist for reducing the effect

of bit-errors and trying to ensure that the receiver eventually gets an error free version

of the packet. The major techniques used are error detection with Automatic Repeat




Request (ARQ), Forward Error Correction (FEC) and hybrid forms of ARQ and FEC

(H-ARQ).

This project focuses on FEC techniques. Forward Error Correction

(FEC) is the method of transmitting error correction information along with the

message. At the receiver, this error correction information is used to correct any

bit-errors that may have occurred during transmission. The improved performance comes

at the cost of introducing a considerable amount of redundancy in the transmitted

code. There are various FEC codes in use today for the purpose of error correction.

Most codes fall into either of two major categories: block codes and convolutional

codes. Block codes work with fixed length blocks of code. Convolutional codes deal

with data sequentially (i.e. taken a few bits at a time) with the output depending on

both the present input as well as previous inputs.

In terms of implementation, block codes become very complex as
their length increases and are therefore harder to implement. Convolutional codes,
in comparison to block codes, are less complex and therefore easier to implement. In
packetized digital networks, convolutionally coded data would still be transmitted as
packets or blocks. However, these blocks would be much larger than those
used by block codes. The fact that convolutional codes are easier to implement,
coupled with the emergence of a very efficient convolutional decoding algorithm
known as the Viterbi algorithm, is one of the reasons convolutional codes have become
the preferred method for real-time communication technologies.

This project studies the use of various error detection and correction

techniques for mobile networks with a focus on non-recursive convolutional coding

and the Viterbi Algorithm. The constraint length of a non-recursive convolutional

code results from the number of stages present in the combinatorial logic of the

encoder. The error correction power of a convolutional code increases with its

constraint length. However, decoding complexity increases exponentially as the

constraint length increases. Fortunately, the efficiency of the Viterbi algorithm allows

the use of convolutional coding with quite reasonable constraint lengths in many

applications. Due to its high accuracy in finding the most likely sequence of states, the

Viterbi algorithm is used in many applications, ranging from communication

networks to optical character recognition and even DNA sequence analysis.
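The exponential growth in decoding complexity can be illustrated by counting trellis states. The Python sketch below assumes a binary rate-1/n non-recursive code whose constraint length K counts the current input bit plus the K-1 stored bits, so the trellis has 2^(K-1) states and 2^K branches per decoded bit. The function name is hypothetical, and other texts define constraint length slightly differently.

```python
def trellis_size(constraint_length):
    """Return (states, branches per step) for a binary rate-1/n code.

    Assumes the constraint length K counts the current input bit plus
    the K-1 bits held in the encoder's shift register.
    """
    states = 2 ** (constraint_length - 1)
    branches = 2 * states  # two outgoing branches (input 0 or 1) per state
    return states, branches


for k in (3, 5, 7, 9):
    states, branches = trellis_size(k)
    print(f"K={k}: {states} states, {branches} branch metrics per decoded bit")
```

Each increase of K by one doubles the work per decoded bit, which is why the energy cost of the decoder becomes a concern at the constraint lengths needed for good error correction.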

Recently, interest has grown in the use of certain error correction

codes that provide much superior performance. Two of these codes are Low Density

Parity Check codes and Turbo Codes. The ideas presented in this thesis are likely to




be relevant to these more advanced codes as well as non-recursive convolutional

codes, but this thesis will concentrate on convolutional codes. Since preservation of

battery energy is a major concern for mobile devices, it is desirable that the error

detection and correction mechanism take the minimum amount of energy to execute.

This project explores the possibility of improving the energy efficiency of the Viterbi

decoder and develops an algorithm to achieve this.

1.3 Outline and Context of the Report

This project focuses on the use of Viterbi Algorithm for forward error

correction in mobile networks. It is desirable to keep energy consumption at a

minimum in order to optimize use of available battery energy. In order to get good

error correcting capabilities, the constraint length must be kept high and since the

complexity of a convolutional decoder increases exponentially with its constraint

length, optimizing the decoding mechanism with respect to energy consumption

becomes a worthwhile goal. The growing need for improved energy efficiency of

decoders has resulted in several approaches being explored.

The main focus of the project is to explore an idea proposed by Barry

Cheetham, which is to switch off the Viterbi decoder and use a simpler decoder when

no bit-errors are occurring. It is possible that by doing this, a significant amount of

energy could be saved. When bit-errors are detected, the Viterbi decoder can be

switched back on to take advantage of its error correction functionality. This process

at the receiver depends on having a memory of previous bits received. Correctly

maintaining and using this previous memory (previous history) when switching

between the two decoders is one of the main technical challenges in the project. The

energy saving mechanism proposed by Barry Cheetham is based on an earlier idea

published by Wei Shao, though it is hoped that the new approach will be easier to

implement.
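One possible way to picture the switching idea is sketched below in Python. It is not taken from the proposals of Cheetham or Shao, whose details are not given here; it is an illustration under stated assumptions. It assumes a rate-1/2, constraint-length-3 non-recursive code with generators 111 and 101 and a zero-terminated block. For that particular code the two output bits XOR to the previous input bit, which gives a trivial error-free decoder, and a syndrome check flags blocks that need the full Viterbi decoder. All function names are hypothetical, and `viterbi_decode` is a caller-supplied fallback.

```python
def encode(bits):
    """Rate-1/2, K=3 encoder with generators 111 and 101 (assumed code)."""
    s1 = s2 = 0                       # s1 = u[t-1], s2 = u[t-2]
    out = []
    for u in list(bits) + [0, 0]:     # two flush bits return the encoder to state 0
        out.append((u ^ s1 ^ s2, u ^ s2))
        s1, s2 = u, s1
    return out


def syndrome_is_zero(received):
    """True when r1*g2 + r2*g1 is zero (mod 2), i.e. no error is detected."""
    r1 = [0, 0] + [a for a, _ in received] + [0, 0]
    r2 = [0, 0] + [b for _, b in received] + [0, 0]
    for t in range(2, len(r1)):
        if r1[t] ^ r1[t - 2] ^ r2[t] ^ r2[t - 1] ^ r2[t - 2]:
            return False
    return True


def simple_decode(received, n_bits):
    """Error-free shortcut: for this code, v1[t] XOR v2[t] equals u[t-1]."""
    return [a ^ b for a, b in received[1:n_bits + 1]]


def decode(received, n_bits, viterbi_decode):
    """Use the cheap decoder unless the syndrome signals an error."""
    if syndrome_is_zero(received):
        return simple_decode(received, n_bits)
    return viterbi_decode(received, n_bits)


message = [1, 0, 1, 1]
clean = encode(message)
print(syndrome_is_zero(clean), simple_decode(clean, len(message)))
```

A zero syndrome does not guarantee an error-free block, since an error pattern that is itself a codeword goes undetected, and this sketch sidesteps the decoder-history problem described above by working on whole terminated blocks.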

This algorithm can be developed using Verilog, though it will require a
custom designed version of the Viterbi algorithm to be developed from scratch and
then adapted to the new energy saving idea. Possible problems that may affect the
accuracy and energy saving capabilities of the algorithm must be analyzed, and
solutions to these problems must be developed. The performance of the resulting
algorithm must be studied in terms of bit-error performance, packet loss rates and
processing time.

In principle, evaluating the performance of the new technique requires
profiling the energy consumption of the two algorithms involved. To do this
accurately would require resources beyond the scope of the project. Verilog provides
some profiling facilities, but relating the information obtained to the energy consumption
that would be observed in a VLSI implementation of the code is a complex issue.
Nevertheless, it is believed that the execution times of particular parts of the
algorithms can give some idea of the likely relationship between the energy
consumption of these parts. Hence, in place of quoting estimates of the
likely energy consumption of different techniques, execution times will be quoted,
with an implicit assumption that this gives a first order approximation to the likely
energy consumption. By comparison with the standard Viterbi decoder available in
Verilog, an analysis will be made of whether this method provides a significant
improvement over existing mechanisms.

1.4 Contributions and main objectives

The main objectives of this project are as follows:

1. An understanding of the background literature relevant to error detection and
error control mechanisms as currently used in packetized digital communication
networks.

2. A detailed understanding of the concept of convolutional coding, and decoding
using the Viterbi algorithm.

3. An implementation of the Viterbi algorithm in Verilog to obtain a custom
designed version called My Viterbi, and a check that it is working correctly by
comparing its performance with that of the Viterbi decoder function provided by
Verilog. (A custom designed Viterbi decoder is needed because Verilog does not
provide access to the code.)

4. A resolution of questions that still need to be answered about the new
algorithm, including the correct initialization of component decoders and the stability
of the feedback mechanism.

5. An implementation in Verilog of the new algorithm as a modification of the
custom designed Viterbi algorithm.

6. An evaluation of the new algorithm in terms of its accuracy and capacity for
achieving energy saving. The analysis will be performed on the basis of bit-error
performance, packet loss rates and execution time (considered to provide a first order
approximation to energy).

1.5 Scope of the Project

This project is intended to further develop and implement the energy saving decoding
algorithm developed by Barry Cheetham, and to provide solutions to some issues that
remained unresolved at the beginning of this project. The main focus of this project is to
provide a working demonstration of the algorithm by implementation in Verilog and to
analyze its performance by comparison with the standard Viterbi decoder available in
Verilog. The system will be developed using a hard decision Viterbi decoder but may
be extended to use a soft decision decoder. The project does not consider the circuit
level design of the algorithm but uses a high level approach to test the proposed
algorithm. Circuit level design may be considered in future work if it is found that this
algorithm promises considerable benefits over existing mechanisms.




Chapter 2

2.1 Overview of VLSI

The first semiconductor chips held one transistor each. Subsequent

advances added more and more transistors, and, as a consequence, more individual

functions or systems were integrated over time. The first integrated circuits held only

a few devices, perhaps as many as ten diodes, transistors, resistors and capacitors,

making it possible to fabricate one or more logic gates on a single device. This level is now

known retrospectively as "small-scale integration" (SSI). Improvements in technique led to

devices with hundreds of logic gates, known as medium-scale integration (MSI), and then to

large-scale integration (LSI), i.e. systems with at least a thousand logic gates. Current technology has moved far past

this mark and today's microprocessors have many millions of gates and hundreds of

millions of individual transistors.

At one time, there was an effort to name and calibrate various levels of large-scale

integration above VLSI. Terms like Ultra-large-scale Integration (ULSI) were used.

But the huge number of gates and transistors available on common devices has

rendered such fine distinctions moot. Terms suggesting greater than VLSI levels of

integration are no longer in widespread use. Even VLSI is now somewhat quaint,

given the common assumption that all microprocessors are VLSI or better.

As of early 2008, billion-transistor processors are commercially available, an example

of which is Intel's Montecito Itanium chip. This is expected to become more

commonplace as semiconductor fabrication moves from the current generation of 65

nm processes to the next 45 nm generations (while experiencing new challenges such

as increased variation across process corners). Another notable example is NVIDIA's
280 series GPU.

This GPU is unusual in that its 1.4 billion transistors, capable
of a teraflop of performance, are almost entirely dedicated to logic (Itanium's transistor
count is largely due to its 24 MB L3 cache). Current designs, as opposed to the
earliest devices, use extensive design automation and automated logic synthesis to lay
out the transistors, enabling higher levels of complexity in the resulting logic
functionality. Certain high-performance logic blocks like the SRAM cell, however,
are still designed by hand to ensure the highest efficiency (sometimes by bending or
breaking established design rules to obtain the last bit of performance by trading
stability).

2.2 INTRODUCTION OF VLSI

Very-large-scale integration (VLSI) is the process of creating integrated

circuits by combining thousands of transistor-based circuits into a single chip. VLSI

began in the 1970s when complex semiconductor and communication technologies

were being developed. The microprocessor is a VLSI device. The term is no longer as

common as it once was, as chips have increased in complexity into the hundreds of

millions of transistors.


2.3 What is VLSI?

VLSI stands for "Very Large Scale Integration". This is the field which involves
packing more and more logic devices into smaller and smaller areas.

Put simply, an integrated circuit is many transistors on one chip.

Design and manufacturing of extremely small, complex circuitry using modified
semiconductor material.

An integrated circuit (IC) may contain millions of transistors, each a few µm in
size.

Applications are wide ranging: most electronic logic devices.

2.3.1 History of scale integration

late 40s    Transistor invented at Bell Labs

late 50s    First IC (JK-FF by Jack Kilby at TI)

early 60s   Small Scale Integration (SSI): 10s of transistors on a chip

late 60s    Medium Scale Integration (MSI): 100s of transistors on a chip

early 70s   Large Scale Integration (LSI): 1000s of transistors on a chip

early 80s   VLSI: 10,000s of transistors on a chip (later 100,000s and now 1,000,000s)

            Ultra LSI is sometimes used for 1,000,000s




SSI - Small-Scale Integration (0 to 10^2)

MSI - Medium-Scale Integration (10^2 to 10^3)

LSI - Large-Scale Integration (10^3 to 10^5)

VLSI - Very Large-Scale Integration (10^5 to 10^7)

ULSI - Ultra Large-Scale Integration (>= 10^7)

2.3.2 Advantages of ICs over discrete components

While we will concentrate on integrated circuits, the properties of integrated circuits
(what we can and cannot efficiently put in an integrated circuit) largely determine the
architecture of the entire system. Integrated circuits improve system characteristics in
several critical ways. ICs have three key advantages over digital circuits built from
discrete components:

Size. Integrated circuits are much smaller: both transistors and
wires are shrunk to micrometer sizes, compared to the millimeter or centimeter scales
of discrete components. Small size leads to advantages in speed and power
consumption, since smaller components have smaller parasitic resistances,
capacitances, and inductances.

Speed. Signals can be switched between logic 0 and logic 1
much quicker within a chip than they can between chips. Communication within a
chip can occur hundreds of times faster than communication between chips on a
printed circuit board. The high speed of circuits on-chip is due to their small size:
smaller components and wires have smaller parasitic capacitances to slow down the
signal.

Power consumption. Logic operations within a chip also take
much less power. Once again, lower power consumption is largely due to the small
size of circuits on the chip: smaller parasitic capacitances and resistances require less
power to drive them.

2.3.4 VLSI and systems

These advantages of integrated circuits translate into advantages at the system level:

Smaller physical size. Smallness is often an advantage in
itself; consider portable televisions or handheld cellular telephones.

Lower power consumption. Replacing a handful of standard
parts with a single chip reduces total power consumption. Reducing power
consumption has a ripple effect on the rest of the system: a smaller, cheaper power
supply can be used; since less power consumption means less heat, a fan may no
longer be necessary; a simpler cabinet with less electromagnetic
shielding may be feasible, too.

Reduced cost. Reducing the number of components, the
power supply requirements, cabinet costs, and so on, will inevitably reduce system
cost. The ripple effect of integration is such that the cost of a system built from
custom ICs can be less, even though the individual ICs cost more than the standard
parts they replace.

Understanding why integrated circuit technology has such profound influence on the

design of digital systems requires understanding both the technology of IC

manufacturing and the economics of ICs and digital systems.

Applications

Electronic system in cars.

Digital electronics control VCRs

Transaction processing system, ATM

Personal computers and Workstations

Medical electronic systems.

2.4 Applications of VLSI

Electronic systems now perform a wide variety of tasks in daily life. Electronic

systems in some cases have replaced mechanisms that operated mechanically,

hydraulically, or by other means; electronics are usually smaller, more flexible, and

easier to service. In other cases electronic systems have created totally new

applications. Electronic systems perform a variety of tasks, some of them visible,

some more hidden:

Personal entertainment systems such as portable MP3

players and DVD players perform sophisticated algorithms with remarkably little

energy.




Electronic systems in cars operate stereo systems and

displays; they also control fuel injection systems, adjust suspensions to varying

terrain, and perform the control functions required for anti-lock braking (ABS)

systems.

Digital electronics compress and decompress video, even at

high-definition data rates, on-the-fly in consumer electronics.

Low-cost terminals for Web browsing still require

sophisticated electronics, despite their dedicated function.

Personal computers and workstations provide word processing,

financial analysis, and games. Computers include both central processing

units (CPUs) and special-purpose hardware for disk access, faster screen display, etc.

Medical electronic systems measure bodily functions and

perform complex processing algorithms to warn about unusual conditions. The

availability of these complex systems, far from overwhelming consumers, only creates

demand for even more complex systems.

2.5 VERILOG HDL

Verilog HDL is a hardware description language that can be used to model a

digital system at many levels of abstraction ranging from the algorithmic-level to the

gate-level to the switch-level. The complexity of the digital system being modelled

could vary from that of a simple gate to a complete electronic digital system, or

anything in between. The digital system can be described hierarchically and timing

can be explicitly modelled within the same description.

The Verilog HDL language includes capabilities to describe the behavioural

nature of a design, the dataflow nature of a design, a design's structural composition,

delays and a waveform generation mechanism including aspects of response

monitoring and verification, all modelled using one single language. In addition, the

language provides a programming language interface through which the internals of a

design can be accessed during simulation including the control of a simulation run.

The language not only defines the syntax but also defines very clear simulation

semantics for each language construct. Therefore, models written in this language can

be verified using a Verilog simulator. The language inherits many of its operator

symbols and constructs from the C programming language. Verilog HDL provides an




extensive range of modelling capabilities, some of which are quite difficult to

comprehend initially. However, a core subset of the language is quite easy to learn

and use. This is sufficient to model most applications.

2.5.1 History:

The Verilog HDL language was first developed by Gateway Design Automation in
1983 as a hardware modelling language for their simulator product. At that time it was
a proprietary language. Because of the popularity of the simulator product, Verilog
HDL gained acceptance as a usable and practical language among a number of designers.
In an effort to increase the popularity of the language, it was placed in the
public domain in 1990. Open Verilog International (OVI) was formed to promote
Verilog. In 1992 OVI decided to pursue standardization of Verilog HDL as an IEEE
standard. This effort was successful and the language became an IEEE standard in
1995. The complete standard is described in the Verilog Hardware Description
Language Reference Manual. The standard is called IEEE Std 1364-1995.

2.5.2 Major Capabilities:

Listed below are the major capabilities of the Verilog hardware description language:

Primitive logic gates, such as and, or and nand, are built into the language.

Flexibility of creating a user-defined primitive (UDP). Such a primitive could
either be a combinational logic primitive or a sequential logic primitive.

Switch-level modelling primitive gates, such as pmos and nmos, are also built
into the language.

Explicit language constructs are provided for specifying pin-to-pin delays,
path delays and timing checks of a design.

A design can be modelled in three different styles or in a mixed style. These
styles are: behavioural style, modelled using procedural constructs; dataflow style,
modelled using continuous assignments; and structural style, modelled using gate and
module instantiations.

There are two data types in Verilog HDL: the net data type and the register
data type. The net type represents a physical connection between structural elements,
while a register type represents an abstract data storage element.

Figure 2.1 shows the mixed-level modelling capability of Verilog HDL, that is,
in one design, each module may be modelled at a different level.

Verilog HDL also has built-in logic functions such as & (bitwise-and) and |
(bitwise-or).

High-level programming language constructs such as conditionals, case
statements, and loops are available in the language.

The notions of concurrency and time can be explicitly modelled.

Powerful file read and write capabilities are provided.

The language is non-deterministic under certain situations, that is, a model
may produce different results on different simulators; for example, the ordering of
events on an event queue is not defined by the standard.

2.6 Verilog synthesis

Synthesis is the process of constructing a gate level netlist from a register-transfer
level model of a circuit described in Verilog HDL. Figure 2.2 shows such a process.
A synthesis system may, as an intermediate step, generate a netlist comprised of
register-transfer level blocks such as flip-flops, arithmetic-logic units, and
multiplexers, interconnected by wires. In such a case, a second program called the
RTL module builder is necessary. The purpose of this builder is to build, or acquire
from a library of predefined components, each of the required RTL blocks in the
user-specified target technology.

Fig 2.1: Mixed level modelling

Once a gate level netlist has been produced, a logic optimizer reads in the netlist and

optimizes the circuit for the user-specified area and timing constraints. These area and

timing constraints may also be used by the module builder for appropriate selection or

generation of RTL blocks. In this book, we assume that the target netlist is at the gate

level. The logic gates used in the synthesized netlists are described in Appendix B.

The module building and logic optimization phases are not described in this book.

The above figure shows the basic elements of Verilog HDL and the elements used in

hardware. A mapping mechanism or a construction mechanism has to be provided that

translates the Verilog HDL elements into their corresponding hardware elements as

shown in the figure.

Figure 2.2: Synthesis process




2.7 Tools used and explanation

Requirements:

Xilinx 9.1i

Modelsim 6.2

2.8 Introduction about the Software:

Xilinx ISE 8.2i software includes the new Xilinx SmartCompile
technology, which makes run times up to 6 times faster than the
previous version while maintaining exact design preservation of unchanged logic.

ModelSim SE 6.2C is a verification and simulation tool for VHDL, Verilog,
SystemVerilog, and mixed-language designs.




CHAPTER-3

3.1 THE VITERBI DECODER ALGORITHM

The Viterbi decoding algorithm is a decoding process for
convolutional codes over a memory-less channel. Figure 3.1 depicts the normal flow of
information over a noisy channel. For the purpose of error recovery, the encoder
adds redundant information to the original information, and the output is transmitted
through a channel. The input at the receiver end (r) is the information with redundancy and,
possibly, noise. The receiver tries to extract the original information through a
decoding algorithm and generates an estimate (e). A decoding algorithm that
maximizes the probability p(r|e) is a maximum likelihood (ML) algorithm. An
algorithm which maximizes p(e|r) through the proper selection of the estimate (e)
is called a maximum a posteriori (MAP) algorithm. The two algorithms have identical
results when the source information has a uniform distribution.
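The difference between the two criteria can be seen in a small numerical sketch. The Python below assumes a binary symmetric channel with bit-flip probability `p_flip` and a toy two-word codebook; the function names and numbers are illustrative only. ML picks the candidate maximizing p(r|e), while MAP weights that likelihood by the prior p(e), since p(e|r) is proportional to p(r|e) p(e).

```python
def likelihood(received, codeword, p_flip):
    """p(r|e) on a binary symmetric channel with bit-flip probability p_flip."""
    d = sum(a != b for a, b in zip(received, codeword))  # Hamming distance
    return p_flip ** d * (1 - p_flip) ** (len(received) - d)


def ml_estimate(received, codewords, p_flip):
    """Pick the codeword that maximizes p(r|e)."""
    return max(codewords, key=lambda c: likelihood(received, c, p_flip))


def map_estimate(received, prior, p_flip):
    """Pick the codeword that maximizes p(r|e) * p(e), proportional to p(e|r)."""
    return max(prior, key=lambda c: likelihood(received, c, p_flip) * prior[c])


codewords = [(0, 0, 0), (1, 1, 1)]      # toy repetition code
r = (0, 1, 1)                           # received word with one suspicious bit
uniform = {c: 0.5 for c in codewords}
skewed = {(0, 0, 0): 0.95, (1, 1, 1): 0.05}

print(ml_estimate(r, codewords, 0.1))   # closest codeword in Hamming distance
print(map_estimate(r, uniform, 0.1))    # same answer as ML under a uniform prior
print(map_estimate(r, skewed, 0.1))     # a strong prior can override the channel evidence
```

With the uniform prior both estimators agree, as stated above; with the skewed prior the MAP estimate changes while the ML estimate does not.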

Figure 3.1 The Convolutional Decoding

The Viterbi algorithm was developed by Andrew J. Viterbi and first published
in the IEEE Transactions on Information Theory in 1967. It is a maximum
likelihood decoding algorithm for convolutional codes. This algorithm provides a
method of finding the path in the trellis diagram that has the highest probability of
matching the actual transmitted sequence of bits. Since being discovered, it has
become one of the most popular algorithms in use for convolutional decoding. Apart
from being an efficient and robust error correction method, it has the advantage of having
a fixed decoding time. This makes it suitable for hardware implementation. The
algorithm has found universal application in decoding the convolutional codes used in
both CDMA and GSM digital cellular, dial-up modems, satellite, deep-space
communications, and 802.11 wireless LANs. It is now also commonly used in speech
recognition, speech synthesis, keyword spotting, computational linguistics,
and bioinformatics. For example, in speech-to-text (speech recognition), the acoustic
signal is treated as the observed sequence of events, and a string of text is considered
to be the "hidden cause" of the acoustic signal. The Viterbi algorithm finds the most
likely string of text given the acoustic signal. The terms Viterbi path and Viterbi
algorithm are also applied to related dynamic programming algorithms that discover
the single most likely explanation for an observation. For example, in statistical
parsing a dynamic programming algorithm can be used to discover the single most
likely context-free derivation (parse) of a string, which is sometimes called the Viterbi
parse.

3.2 Convolutional Encoders

Like any error-correcting code, a convolutional code works by adding structured redundant information to the user's data and then correcting errors using this information. A convolutional encoder is a linear system. A binary convolutional encoder can be represented as a shift register. The outputs of the encoder are modulo-2 sums of the values in certain register cells. The input to the encoder is either the unencoded sequence (for non-recursive codes) or the unencoded sequence added to the values of some register cells (for recursive codes).

In telecommunication, a convolutional code is a type of error-correcting code in which:

each m-bit information symbol (each m-bit string) to be encoded is transformed into an n-bit symbol, where m/n is the code rate (n ≥ m), and

the transformation is a function of the last k information symbols, where k is the constraint length of the code.

Convolutional codes can be systematic or non-systematic. Systematic codes are those in which the unencoded sequence is part of the output sequence. Systematic codes are almost always recursive; conversely, non-recursive codes are almost always non-systematic. Convolutional codes are used extensively in numerous applications in order to achieve reliable data transfer, including digital video, radio, mobile communication, and satellite communication. These codes are often implemented in concatenation with a hard-decision code, particularly Reed-Solomon. Prior to turbo codes, such constructions were the most efficient, coming closest to the Shannon limit.

To convolutionally encode data, start with k memory registers, each holding one input bit. Unless otherwise specified, all memory registers start with a value of 0. The encoder has n modulo-2 adders (a modulo-2 adder can be implemented with a single Boolean XOR gate, where the logic is: 0+0 = 0, 0+1 = 1, 1+0 = 1, 1+1 = 0) and n generator polynomials, one for each adder (see figure below). An input bit m_1 is fed into the leftmost register. Using the generator polynomials and the existing values in the remaining registers, the encoder outputs n bits. All register values are then shifted to the right (m_1 moves to m_0, m_0 moves to m_{-1}) and the encoder waits for the next input bit. If there are no remaining input bits, the encoder continues to output until all registers have returned to the zero state.

The figure below is a rate 1/3 (m/n) encoder with a constraint length (k) of 3. The generator polynomials are G1 = (1,1,1), G2 = (0,1,1), and G3 = (1,0,1). Therefore, the output bits are calculated (modulo 2) as follows:

n_1 = m_1 + m_0 + m_{-1}

n_2 = m_0 + m_{-1}

n_3 = m_1 + m_{-1}
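The three output equations above can be checked with a small software model. The following Python sketch is only a behavioural illustration, not the project's Verilog implementation; the function name encode_rate_third and the choice to append two flushing zeros are assumptions made for this example.

```python
def encode_rate_third(bits):
    """Behavioural model of the rate-1/3, constraint-length-3 encoder above."""
    m0 = m_minus1 = 0                      # all registers start at zero
    out = []
    for m1 in list(bits) + [0, 0]:         # two flush zeros return the registers to zero
        n1 = m1 ^ m0 ^ m_minus1            # G1 = (1,1,1)
        n2 = m0 ^ m_minus1                 # G2 = (0,1,1)
        n3 = m1 ^ m_minus1                 # G3 = (1,0,1)
        out.append((n1, n2, n3))
        m0, m_minus1 = m1, m0              # shift every register one place to the right
    return out


if __name__ == "__main__":
    for triple in encode_rate_third([1, 0, 1]):
        print(triple)
```

Each input bit produces one triple (n1, n2, n3), so a 3-bit message followed by two flushing zeros yields five triples.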

The combination of register cells that forms one of the output streams (or that is added to the input stream for recursive codes) is defined by a polynomial. Let m be the maximum degree of the polynomials constituting a code; then K = m + 1 is the constraint length of the code.

Figure 3.2 The Convolutional Encoder

Figure 3.2 shows a standard convolutional encoder with polynomials (171,133). For the encoder in Figure 3.2, the polynomials are:

g1(z) = 1 + z + z^2 + z^3 + z^6

g2(z) = 1 + z^2 + z^3 + z^5 + z^6

Encoder polynomials are usually denoted in octal notation. For the above example, these designations are 1111001 = 171 and 1011011 = 133. The constraint length of this code is 7. An example of a recursive convolutional encoder is shown in Figure 3.3.

Figure 3.3. A recursive convolutional encoder
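Returning to the octal notation used for the (171,133) code, the conversion from an octal designation to tap positions can be sketched as follows. This is an illustrative Python model; the helper names taps_from_octal and impulse_response are hypothetical, and it assumes the convention stated in the text, in which the leftmost binary digit is the coefficient of z^0 (the current input bit).

```python
def taps_from_octal(octal_str, K):
    """Return [g_0, ..., g_(K-1)], where g_i is the coefficient of z^i."""
    value = int(octal_str, 8)
    bits = format(value, "0{}b".format(K))   # e.g. "171" -> "1111001"
    return [int(c) for c in bits]


def impulse_response(generators, K):
    """Encoder output pairs for a single 1 followed by K-1 zeros."""
    taps = [taps_from_octal(g, K) for g in generators]
    register = [0] * K                       # register[i] holds the input delayed by i
    out = []
    for bit in [1] + [0] * (K - 1):
        register = [bit] + register[:-1]
        out.append(tuple(sum(t & r for t, r in zip(tp, register)) % 2 for tp in taps))
    return out


if __name__ == "__main__":
    print(taps_from_octal("171", 7))         # taps of g1
    print(taps_from_octal("133", 7))         # taps of g2
    print(impulse_response(["171", "133"], 7))
```

The impulse response simply reads the two tap lists out side by side, which is a quick way to confirm that a tap convention matches the polynomials g1(z) and g2(z).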

3.2.1 Trellis Diagram

A convolutional encoder is often seen as a finite state machine. Each state corresponds to some value of the encoder's register. Given the input bit value, the encoder can move from a given state to one of two other states. These state transitions constitute a diagram called a trellis diagram. A trellis diagram for the code in Figure 3.3 is depicted in Figure 3.4. A solid line corresponds to input 0 and a dotted line to input 1 (note that encoder states are designated in such a way that the rightmost bit is the newest one).

Each path on the trellis diagram corresponds to a valid sequence from the encoder's output. Conversely, any valid sequence from the encoder's output can be represented as a path on the trellis diagram. One of the possible paths is marked in red as an example. Note that each state transition on the diagram corresponds to a pair of output bits. There are only two allowed transitions for every state, so there are two allowed pairs of output bits, and the two other pairs are forbidden.

If an error occurs, it is very likely that the receiver will get a set of forbidden

pairs, which don't constitute a path on the trellis diagram. So, the task of the decoder

is to find a path on the trellis diagram which is the closest match to the received

sequence.




Figure 3.4 A trellis diagram corresponding to the encoder on the Figure 3.3

Let's define a free distance df as a minimal Hamming distance between two different

allowed binary sequences (a Hamming distance is defined as a number of differing

bits).
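For a linear convolutional code, the free distance equals the smallest Hamming weight of any non-zero code sequence, so it can be estimated by a bounded brute-force search. The Python sketch below is an illustration under that assumption; the function names, the max_len bound, and the two example codes, (7,5) with K=3 and (171,133) with K=7, are choices made for this example. Because the search is bounded, the result is in general only an upper bound on the free distance.

```python
from itertools import product


def encode(bits, gens, K):
    """Feedforward encoder; each generator is an integer whose K-bit form lists the taps."""
    window = 0                               # K-bit window, newest bit in the top position
    out = []
    for b in list(bits) + [0] * (K - 1):     # flush so every path returns to state 0
        window = (window >> 1) | (b << (K - 1))
        for g in gens:
            out.append(bin(window & g).count("1") % 2)
    return out


def free_distance(gens, K, max_len=8):
    """Minimum output weight over all inputs of length <= max_len that start with a 1."""
    best = None
    for length in range(1, max_len + 1):
        for tail in product((0, 1), repeat=length - 1):
            weight = sum(encode((1,) + tail, gens, K))
            if best is None or weight < best:
                best = weight
    return best


if __name__ == "__main__":
    print(free_distance((0o7, 0o5), 3))
    print(free_distance((0o171, 0o133), 7))
```

Only inputs that begin with a 1 are searched, since shifting a sequence in time does not change its weight.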




Chapter -4

4. Viterbi decoder

A Viterbi decoder uses the Viterbi algorithm for decoding a bit stream that

has been encoded using a convolutional code. There are other algorithms for decoding

a convolutionally encoded stream (for example, the Fano algorithm). The Viterbi

algorithm is the most resource-consuming, but it does the maximum

likelihood decoding. It is most often used for decoding convolutional codes with

constraint lengths k




For each state, the Hamming distance between the received bits and the expected bits is calculated. The Hamming distance between two symbols of the same length is the number of bit positions in which they differ. These branch metric values are passed to Block 2. If soft decision inputs were used, the branch metric would be calculated as the squared Euclidean distance between the received and expected symbols. The squared Euclidean distance is given as (a1 - b1)^2 + (a2 - b2)^2 + (a3 - b3)^2, where a1, a2, a3 and b1, b2, b3 are the three soft decision bits of the received and expected bits respectively.
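The two branch metrics described above can be written as short functions. This is a minimal Python sketch rather than the hardware branch metric unit; the function names are illustrative and the inputs are assumed to be equal-length sequences.

```python
def hamming_distance(received, expected):
    """Number of positions in which two equal-length bit sequences differ."""
    return sum(r != e for r, e in zip(received, expected))


def squared_euclidean(received, expected):
    """Sum of squared differences, as in (a1-b1)^2 + (a2-b2)^2 + (a3-b3)^2."""
    return sum((a - b) ** 2 for a, b in zip(received, expected))


if __name__ == "__main__":
    print(hamming_distance((1, 1), (1, 0)))          # hard-decision branch metric
    print(squared_euclidean((1, 0, 1), (1, 1, 0)))   # soft-decision branch metric
```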

Value   Meaning
000     strongest 0
001     relatively strong 0
010     relatively weak 0
011     weakest 0
100     weakest 1
101     relatively weak 1
110     relatively strong 1
111     strongest 1

Figure 4.2 A recursive convolutional encoder




4.2. Path Metric Computation and Add-Compare-Select (ACS) Unit

A path metric unit (PMU) summarizes branch metrics to get metrics for 2^(K-1) paths, where K is the constraint length of the code, one of which can eventually be chosen as optimal. Every clock cycle it makes one decision per state, discarding paths that are known to be non-optimal. The results of these decisions are written to the memory of a traceback unit.

The core elements of a PMU are ACS (Add-Compare-Select) units. The way in which they are connected to one another is defined by the specific code's trellis diagram. Since branch metrics are always non-negative, path metrics can only grow, so there must be an additional circuit preventing the metric counters from overflowing (it isn't shown in the image). An alternative method that eliminates the need to monitor path metric growth is to allow the path metrics to "roll over". To use this method, it is necessary to make sure the path metric accumulators contain enough bits to prevent the "best" and "worst" values from coming within 2^(n-1) of each other, where n is the accumulator width in bits. The compare circuit is essentially unchanged.

Figure 4.3 ACS Unit
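The "roll over" method mentioned above relies on comparing wrapped accumulators modulo 2^n. The following Python sketch shows one common way to do this comparison; the function names and the 8-bit width used in the example are assumptions for illustration, and the comparison is only valid while the true metrics differ by less than 2^(n-1).

```python
def accumulate(metric, branch_metric, n_bits):
    """Add a branch metric and let the n-bit accumulator roll over."""
    return (metric + branch_metric) & ((1 << n_bits) - 1)


def modular_less_than(a, b, n_bits):
    """True if wrapped metric a is smaller than wrapped metric b.

    Valid only while the unwrapped metrics differ by less than 2**(n_bits - 1).
    """
    mask = (1 << n_bits) - 1
    diff = (a - b) & mask                    # difference modulo 2**n_bits
    return diff >= (1 << (n_bits - 1))       # top bit set means a - b is negative


if __name__ == "__main__":
    worst = accumulate(250, 10, 8)           # true value 260 wraps around to 4
    best = 250
    print(worst, modular_less_than(best, worst, 8))
```

In the example the wrapped value 4 still compares as larger than 250, because only the sign of the modular difference is examined.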

It is possible to monitor the noise level on the incoming bit stream by monitoring the

rate of growth of the "best" path metric. A simpler way to do this is to monitor a

single location or "state" and watch it pass "upward" through say four discrete levels

within the range of the accumulator. As it passes upward through each of these

thresholds, a counter is incremented that reflects the "noise" present on the incoming

signal.

The path metric, or accumulated error, for each state at a particular time instant is the sum of the path metric of its preceding state and the branch metric between the previous state and the present state. The initial path metric at the first time instant is infinity for all states except state 0. For each state, there are two possible predecessors. The mechanism for calculating the predecessors (and successors) is described in Section 4.3. The path metrics from both these predecessors are compared and the one with the smallest path metric is selected. This is the most probable transition that occurred in the original message. In addition, a single bit is stored for each state, which specifies whether the lower or upper predecessor was selected.

Figure 4.4 A sample implementation of a path metric unit for a specific k=4 decoder

In cases where both paths result in the same path metric to the state, either the higher or lower state may consistently be chosen as the surviving predecessor. For the purpose of this project the higher state is consistently chosen as the surviving predecessor. Finally, the state with the least accumulated path metric at the current time instant is located. This state is called the global winner and is the state from which the traceback operation will begin. This method of starting the traceback operation from the global winner instead of an arbitrary state was described by Linda Brackenbury in her design of an asynchronous Viterbi decoder. It greatly improves the probability of finding the correct traceback path quickly and hence reduces the amount of history information that needs to be maintained. It also reduces the number of updates required to the surviving path. Both these measures result in energy savings. The values for the surviving predecessors (also called local winners) and the global winner are passed to Block 3.




Figure 4.5 A sample implementation of an ACS Unit
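The add, compare and select steps of Section 4.2 can be modelled in a few lines. The Python sketch below is a behavioural illustration only; the function name acs, its argument names, and the encoding of the decision bit (0 for the lower-numbered predecessor, 1 for the higher-numbered one) are assumptions made for this example. Ties go to the higher state, following the rule stated for this project.

```python
INF = float("inf")


def acs(pm_low, bm_low, pm_high, bm_high):
    """Add-compare-select for one state.

    pm_* are the path metrics of the lower- and higher-numbered predecessors,
    bm_* the branch metrics of the two incoming branches.
    Returns (surviving path metric, decision bit).
    """
    cand_low = pm_low + bm_low               # add
    cand_high = pm_high + bm_high
    if cand_high <= cand_low:                # compare; ties go to the higher state
        return cand_high, 1                  # select the higher predecessor
    return cand_low, 0                       # select the lower predecessor


if __name__ == "__main__":
    print(acs(3, 1, 2, 2))                   # tie between the two candidates
    print(acs(1, 0, 5, 1))                   # lower predecessor is better
    print(acs(INF, 1, 0, 2))                 # unreachable lower predecessor
```

The decision bit is the single bit per state that is stored for the traceback unit.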

4.3. Survivor memory unit or Trace back Unit

The traceback unit restores an (almost) maximum-likelihood path from the decisions made by the PMU. Since it does so in the reverse direction, a Viterbi decoder includes a FILO (first-in-last-out) buffer to reconstruct the correct order. Note that the implementation shown in the image requires double frequency. There are some tricks that eliminate this requirement.

The global winner for the current state is received from Block 2. Its predecessor is selected in the manner described above. In this way, working backwards through the trellis, the path with the minimum accumulated path metric is selected. This path is known as the traceback path. A diagrammatic description will help visualize this process. Figure 4.6 shows the trellis diagram for a K=3 (7, 5) coder with sample input taken as the received data.

The general approach to traceback is to accumulate path metrics for up to five times the constraint length (5 * (K - 1)), find the node with the largest accumulated cost, and begin traceback from this node.

However, computing the node which has accumulated the largest cost (either the largest or smallest integral path metric) involves finding the maxima or minima of several (usually 2^(K-1)) numbers, which may be time consuming when implemented on embedded hardware systems.

Most communication systems employ Viterbi decoding on data packets of fixed size, with a fixed bit/byte pattern at the beginning, at the end, or both. By using the known bit/byte pattern as a reference, the start node may be set to a fixed value, thereby obtaining a perfect maximum likelihood path during traceback.

Figure 4.6 Selected minimum error path for a k=3(7, 5) coder

The state having the minimum accumulated error at the last time instant is State 10, and traceback starts there. Moving backwards through the trellis, the minimum error path out of the two possible predecessors from that state is selected. This path is marked in blue. The actual received data is shown at the bottom, while the expected data is written in blue along the selected path. It is observed that at time slot three there was an error in the received data (11). This was corrected to (10) by the decoder.

Local winner information must be stored for five times the constraint length. For a K=7 decoder, this results in storing history for 7 x 5 = 35 time slots. The state of the decoder at the time instant 35 time slots prior can then be accurately determined. This state value is passed to Block 4. At the next time slot, all the trellis values are shifted left to the previous time slot. The path metric for the most recently received data is then calculated and the minimum error path is computed. If the global winner at this stage is not a child of the previous global winner, the traceback path has to be updated accordingly until the traceback state is a child of the previous state [22].




Figure 4.7 Trace back path unit

Multiple traceback paths are possible, and it may be thought that traceback up to the first bit is necessary to correctly determine the surviving path. However, it was found that all possible paths converge within a certain distance or depth of traceback. This information is useful as it allows the setting of a certain traceback depth beyond which it is neither necessary nor advantageous to store path metric and other information. This greatly reduces memory storage requirements and hence the energy consumption of the decoder. Empirical observations showed that a depth of five times the constraint length was sufficient to ensure merging of paths. Therefore, local winner information is stored for 35 slots (five times seven) in the decoder used for this project.

Block 4. Data Input Determination

Now going forwards through the traceback path, the state transitions at successive time intervals are studied and the data bit that would have caused each transition is determined. This represents the decoded output.
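The path metric update, global winner selection, traceback and data input determination described in this chapter can be put together in a compact software model. The Python sketch below uses the small K=3 (7,5) code rather than the project's K=7 code, traces back over the whole block instead of a fixed depth of 35 slots, and is not the Verilog design; all function names are hypothetical. States are numbered so that the newest bit is the top bit of the state.

```python
INF = float("inf")


def step(state, bit):
    """One trellis branch of the K=3 (7,5) code: (next_state, output_pair)."""
    s1, s0 = state >> 1, state & 1            # s1 is the newer stored bit
    out = (bit ^ s1 ^ s0, bit ^ s0)           # generators 7 = 111 and 5 = 101
    return (state >> 1) | (bit << 1), out     # right shift, new bit enters at the top


def encode(bits):
    state, out = 0, []
    for b in bits:
        state, pair = step(state, b)
        out.extend(pair)
    return out


def viterbi_decode(received):
    metrics = [0, INF, INF, INF]              # every state except state 0 starts at infinity
    history = []                              # surviving predecessors, one list per time slot
    for t in range(0, len(received), 2):
        r = received[t:t + 2]
        new_metrics = [INF] * 4
        survivors = [0] * 4
        for state in range(4):                # ascending order: "<=" lets the higher state win ties
            if metrics[state] == INF:
                continue
            for bit in (0, 1):
                nxt, out = step(state, bit)
                cand = metrics[state] + (out[0] != r[0]) + (out[1] != r[1])
                if cand <= new_metrics[nxt]:
                    new_metrics[nxt] = cand
                    survivors[nxt] = state
        history.append(survivors)
        metrics = new_metrics
    state = min(range(4), key=lambda s: metrics[s])   # global winner
    decoded = []
    for survivors in reversed(history):       # trace back through the local winners
        decoded.append(state >> 1)            # top bit of a state is the input that led to it
        state = survivors[state]
    return decoded[::-1]                      # restore transmission order


if __name__ == "__main__":
    message = [1, 0, 1, 1, 0, 0, 0]           # last two zeros flush the encoder
    received = encode(message)
    received[5] ^= 1                          # inject a single channel error
    print(viterbi_decode(received) == message)
```

The final reversal plays the role of the FILO buffer: bits are recovered last-to-first during traceback and then put back in order.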

Determining Successors to a Particular State

Each state is represented by the 6 bits of the shift register (in the case of a K=7 encoder or decoder). The next state can therefore be obtained by a right shift of the shift register values, with the first (leftmost) bit given a value of 0. The resulting state represents the next state of the coder if the input bit was 0. Adding 32 (1 x 2^5) to this value gives the next state of the coder if the input bit was 1.

Determining Predecessors to a Particular State

In a similar way, the first predecessor can be calculated, this time by a left shift of the shift register values (the bit shifted out of the 6-bit state is discarded). By adding one (1 x 2^0) to this value, the value of the second predecessor to the state is derived.
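The shift-based rules above translate directly into integer operations on a 6-bit state. This Python sketch is illustrative only; the constant and function names are assumptions, and states are taken to be integers from 0 to 63 for the K=7 case.

```python
K = 7
STATE_BITS = K - 1                       # 6 bits per state for K=7
MASK = (1 << STATE_BITS) - 1             # 63, keeps results within 6 bits


def successors(state):
    """Next states for input 0 and input 1: right shift, then optionally add 32."""
    base = state >> 1
    return base, base + (1 << (STATE_BITS - 1))


def predecessors(state):
    """Two possible previous states: left shift within 6 bits, then optionally add 1."""
    base = (state << 1) & MASK
    return base, base + 1


if __name__ == "__main__":
    print(successors(45))                # state 101101
    print(predecessors(22))              # state 010110
```

As a consistency check, every state appears among the predecessors of each of its own successors.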

4.3.1 State Metric Storage

The block stores the partial path metric of each state at the current stage.

4.3.2 Output Generator:

This block generates the decoded output sequence. In the traceback approach, the

block incorporates combinational logic, which traces back along the survivor path and

latches the path (equivalently the decoded output sequence) to a register.

Figure 4.8 the block diagram of a general Viterbi Decoder

4.4. Encoding Mechanism

Data is coded by using a convolutional encoder, as described. It consists of a series of shift registers and associated combinatorial logic. The combinatorial logic is usually a series of exclusive-or gates. The conventional K=7, (171,133) encoder is used for the purpose of this project. The octal numbers 171 and 133, when represented in binary form, correspond to the connections of the shift registers to the upper and lower exclusive-or gates respectively. Figure 4.9 shows the convolutional encoder that will be used for the project.

Figure 4.9: Rate=1/2 k=7, (171,133) Convolution Encoder

4.5. Decoding Mechanism

There are two main mechanisms by which Viterbi decoding may be carried out, namely the register exchange mechanism and the traceback mechanism. Register exchange mechanisms, as explained by Ranpara and Sam Ha, store the partially decoded output sequence along the path. The advantage of this approach is that it eliminates the need for traceback and hence reduces latency. However, at each stage the contents of each register need to be copied to the next stage. This makes the hardware more complex and more energy consuming than the traceback mechanism. Traceback mechanisms use a single bit to indicate whether the survivor branch came from the upper or lower path. This information is used to trace back the surviving path from the final state to the initial state. This path can then be used to obtain the decoded sequence. Traceback mechanisms prove to be less energy consuming and will hence be the approach followed in this project.

Decoding may be done using either hard decision inputs or soft

decision inputs. Inputs that arrive at the receiver may not be exactly zero or one.




Having been affected by noise, they will have values in between and even higher or

lower than zero and one. The values may also be complex in nature.

In the hard decision Viterbi decoder, each input that arrives at the

receiver is converted into a binary value (either 0 or 1). In the soft decision Viterbi

decoder, several levels are created and the arriving input is categorized into a level

that is closest to its value. If the possible values are split into 8 decision levels, these

levels may be represented by 3 bits and this is known as a 3 bit Soft decision.

This project uses a hard decision Viterbi decoder for the purpose of developing and verifying the new energy saving algorithm. Once the algorithm is verified, a soft decision Viterbi decoder may be used in place of the hard decision decoder. Figure 4.10 shows the various stages required to decode data using the Viterbi algorithm. The decoding mechanism comprises three major stages, namely the Branch Metric Computation Unit, the Path Metric Computation and Add-Compare-Select (ACS) Unit, and the Traceback Unit. A schematic representation of the decoder is shown below.

Figure 4.10: Schematic representation of the Viterbi decoding block




CHAPTER-5

METHODS AND TYPES OF VITERBI DECODER

5.1 REGISTER EXCHANGE METHOD

The register exchange (RE) method is conceptually the simplest and a commonly used technique. Because of the large power consumption and large area required in VLSI implementations of the RE method, the traceback (TB) method is preferred in the design of large constraint length, high performance Viterbi decoders. In the register exchange method, a register assigned to each state contains the information bits for the survivor path from the initial state to the current state. In effect, the register keeps the partially decoded output sequence along the path, as illustrated in Figure 5.1. The register of state S1 at t=3 contains '101'. This is the decoded output sequence along the bold path from the initial state.

Figure 5.1 Register Exchange Method

The register exchange method eliminates the need to trace back, since the register of the final state contains the decoded output sequence. However, this method results in complex hardware due to the need to copy the contents of all the registers in a stage to the next stage. The survivor path information is applied to the least significant bit of each register, and all the registers perform a shift left operation at each stage to make room for the next bits. Hence, each register fills in the survivor path information from the least significant bit toward the most significant bit. This scheme is called shift update.

The shift update method is simple to implement but causes high switching activity due to the shift operation and hence results in high power dissipation.
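The register exchange idea can be illustrated with a small software model in which every state carries its own register of decoded bits. The Python sketch below uses the K=3 (7,5) code for brevity and is not the project's implementation; the function names are hypothetical, and the list copy on every update stands in for the register-to-register copying that makes the hardware costly.

```python
INF = float("inf")


def step(state, bit):
    """One trellis branch of the K=3 (7,5) code: (next_state, output_pair)."""
    s1, s0 = state >> 1, state & 1
    return (state >> 1) | (bit << 1), (bit ^ s1 ^ s0, bit ^ s0)


def register_exchange_decode(received):
    metrics = [0, INF, INF, INF]
    registers = [[], [], [], []]              # one decoded-bit register per state
    for t in range(0, len(received), 2):
        r = received[t:t + 2]
        new_metrics = [INF] * 4
        new_registers = [None] * 4
        for state in range(4):
            if metrics[state] == INF:
                continue
            for bit in (0, 1):
                nxt, out = step(state, bit)
                cand = metrics[state] + (out[0] != r[0]) + (out[1] != r[1])
                if cand <= new_metrics[nxt]:
                    new_metrics[nxt] = cand
                    # copy the survivor's register and shift in the new bit
                    new_registers[nxt] = registers[state] + [bit]
        metrics, registers = new_metrics, new_registers
    best = min(range(4), key=lambda s: metrics[s])
    return registers[best]                    # no traceback needed


if __name__ == "__main__":
    received = [1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0, 0]   # one bit in error
    print(register_exchange_decode(received))
```

The decoded sequence is read directly from the register of the best final state, at the price of rewriting every register at every stage.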

5.2 Trace back mechanism

Unlike the register exchange mechanism described by Ranpara and Sam Ha, which stores the partially decoded output sequence along each path and must copy the contents of each register to the next stage, traceback mechanisms use a single bit to indicate whether the survivor branch came from the upper or lower path. This information is used to trace back the surviving path from the final state to the initial state. This path can then be used to obtain the decoded sequence. Register exchange has lower latency because no traceback is needed, but its hardware is more complex and consumes more energy. Traceback mechanisms prove to be less energy consuming and will hence be the approach followed in this project.

5.3 TYPES OF VITERBI DECODING

In order to realize a certain coding scheme, a suitable measure of similarity, or distance metric, between two code words is vital. The two important metrics used to measure the distance between two code words are the Hamming distance and the Euclidean distance. Which one the decoder adopts depends on the code scheme, required accuracy, channel characteristics and demodulator type.

5.3.1 HARD DECISION VITERBI DECODING

In hard-decision decoding, the path through the trellis is determined using the Hamming distance measure. Thus, the optimal path through the trellis is the path with the minimum Hamming distance. The Hamming distance can be defined as the number of bits that differ between the symbol observed at the decoder and the symbol sent from the encoder. Furthermore, hard decision decoding applies one-bit quantization to the received bits. Hard decision decoding takes a stream of bits, say from the 'threshold detector' stage of a receiver, where each bit is considered definitely one or zero. For example, for binary signalling, received pulses are sampled and the resulting voltages are compared with a single threshold. If a voltage is greater than the threshold it is considered to be definitely a 'one', regardless of how close it is to the threshold. If it is less, it is definitely a 'zero'.

5.3.2 SOFT DECISION VITERBI DECODING

Soft-decision decoding is applied for maximum likelihood decoding when the data is transmitted over a Gaussian channel. In contrast to hard decision decoding, soft-decision decoding uses multi-bit quantization for the received bits, and the Euclidean distance as a distance measure instead of the Hamming distance. The demodulator input is now an analog waveform and is usually quantized into different levels in order to help the decoder decide more easily.

A 3-bit quantization results in an 8-ary output. Soft decision decoding requires a stream of 'soft bits' where we get not only the 1 or 0 decision but also an indication of how certain we are that the decision is correct. One way of implementing this would be to make the threshold detector generate, instead of 0 or 1, say:

000 (definitely 0), 001 (probably 0), 010 (maybe 0), 011 (guess 0),

100 (guess 1), 101 (maybe 1), 110 (probably 1), 111 (definitely 1).

We may call the last two bits 'confidence' bits. This is easy to do with seven voltage thresholds rather than one. This helps when we anticipate errors and have some 'forward error correction' coding built into the transmission.
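The difference between the single-threshold detector and a 3-bit soft detector can be sketched in a few lines. The Python example below is illustrative; it assumes nominal signal levels of 0.0 for a 'zero' and 1.0 for a 'one' and equally spaced thresholds, neither of which is specified in the text, and the function names are hypothetical.

```python
import math


def hard_decision(voltage, threshold=0.5):
    """One-bit quantization: definitely 1 above the threshold, otherwise 0."""
    return 1 if voltage > threshold else 0


def soft_decision(voltage, low=0.0, high=1.0):
    """Three-bit quantization into levels 0..7 (equally spaced thresholds assumed)."""
    level = math.floor((voltage - low) / (high - low) * 8)
    return max(0, min(7, level))              # clamp samples outside the nominal range


if __name__ == "__main__":
    for v in (-0.3, 0.30, 0.49, 0.51, 0.95):
        print(v, hard_decision(v), format(soft_decision(v), "03b"))
```

A sample at 0.51 is a plain 1 for the hard detector but maps to 100 (guess 1) for the soft detector, so the decoder can give it little weight.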




CHAPTER-6

Applications

The Viterbi algorithm has a wide range of applications, ranging from satellite and space communications to DNA sequence analysis and optical character recognition.

An attempt to perform optical character recognition of text was investigated by Neuhoff. The initial approach considered was to create a dictionary which simulated vocabularies. Each time a character was read by the optical reader, it would search the dictionary for the most likely estimate. The huge computational and storage requirements of this approach made it impractical. However, another approach makes use of statistical information about the language, such as the relative frequency of letter pairs. A maximum a posteriori probability (MAP) estimate of a word is determined based on its probability as the output of the source model. The Viterbi algorithm may then be used to perform this MAP sequence estimation.

Metzner investigated, among other things, the use of Viterbi decoding with soft decision to increase the probability of successfully transmitting a data packet during a meteor burst. Since meteor trails are made up of ionized material, they can be used for reliable communications. Some

characteristics of such meteor burst communication and descriptions of its practical

applications are detailed in. Metzner showed that convolutional codes with soft

decision were considerably better for meteor burst applications as compared to Reed-

Solomon codes.

Low power applications of the Viterbi decoder are particularly relevant to many digital communication and recording systems today. As described by Kawokgy and Salama, systems like these are increasingly being used in wireless applications which, being battery operated, require low power consumption. In addition, these systems also require processing speeds of over 100 Mbps to allow multimedia transmission. Following this trend, many papers have been written on designing low power Viterbi decoding algorithms targeted at next generation wireless applications, particularly CDMA systems. Some of the energy saving ideas that have been investigated are described in the next section.




6.1 Research Work

In mobile networks, decoding capabilities are limited by the receiver which is

a mobile handset. As such, it has limited resources of energy and computation power.

Another factor that affects wireless communication is that bandwidth is expensive.

Therefore, there is a high demand for codes that can correct errors very efficiently

while at the same time utilizing minimum energy. Hence, a lot of the past research has

been focused on how this may be achieved.

The fixed T-algorithm is an optimization of the Viterbi algorithm which applies a pruning threshold to the accumulated path metrics of the Viterbi decoder. Instead of storing all the survivor paths for all 2^(K-1) states, only some of the most likely paths are kept at every trellis stage. This results in fewer paths being found and stored. Figure 6.1 shows the result of an experiment conducted by Henning and Chakrabarti [34], which compares normalized energy estimates for the Viterbi and the fixed T-algorithm decoders as they vary with signal to noise ratio (Eb/No) and code rate.

Figure 6.1: Normalised energy estimates for the Viterbi and fixed T-algorithm (Tf) decoders as code rate and signal to noise ratio (Eb/No) vary.

From the graph, it is estimated that a 33% to 83% reduction in energy consumption can be achieved when the signal to noise ratio is between 2.1 and 4 dB.

Another approach has been to develop an adaptive T-algorithm which adjusts parameters of the decoder based on real-time variations in signal to noise ratio (SNR), code rate and maximum acceptable bit-error rate. The parameters adjusted are the truncation length and pruning threshold of the T-algorithm, along with trace-back memory management. Henning and Chakrabarti demonstrate in their paper how this can achieve a potential energy reduction of 70% to 97.5% compared to Viterbi decoding. Truncation length refers to the number of bits a path is followed back before a decision is made on the bit that was encoded. By reducing the truncation length, more bits can be decoded per traceback. Similarly, lowering the pruning threshold means fewer paths need to be found and stored. Both of these measures can reduce the number of memory accesses required by the decoder and hence reduce energy consumption.

However, these measures may cause significant reduction in the error

correcting capability of the decoder.
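The pruning step of the T-algorithm can be sketched as a filter on the current path metrics. The Python example below is a simplified illustration in which a state survives only if its metric is within a threshold of the best metric; the function name prune, the dictionary representation and the numbers used are assumptions for this example and are not taken from Henning and Chakrabarti's work.

```python
def prune(path_metrics, threshold):
    """Keep only the states whose metric is within `threshold` of the best metric.

    path_metrics maps state number to accumulated path metric (smaller is better).
    """
    best = min(path_metrics.values())
    return {s: m for s, m in path_metrics.items() if m <= best + threshold}


if __name__ == "__main__":
    metrics = {0: 3, 1: 5, 2: 9, 3: 4}
    print(prune(metrics, 2))                 # a tighter threshold keeps fewer paths
    print(prune(metrics, 0))
```

Lowering the threshold reduces the number of survivor paths to update and store, which is where the energy saving comes from, but it also raises the chance of discarding the correct path.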

Nevertheless, adjusting these parameters based on real-time changes in the channel can optimize energy consumption. Figure 6.2 shows the results of an experiment conducted by Henning and Chakrabarti [34] in which the pruning threshold and truncation length are adapted to maintain the bit-error rate below 0.0037. From the graph, it is estimated that an energy consumption reduction of 70% to 97.5% compared to the Viterbi decoder can be achieved when the signal to noise ratio is between 2.1 and 4 dB.

However, the adaptive T-algorithm does require an additional overhead in

terms of monitoring the real-time variations and choosing the appropriate truncation

and threshold parameters from a lookup table. Since these operations are not complex

it is assumed that their energy consumption is negligible.




Figure 6.2: Normalised energy estimates for the Viterbi and adaptive T-

algorithm (Ta) decoders as code rate and signal to noise ratio (Eb/No) vary while

maintaining bit-error rate below 0.0037

Yet another approach, put forward by Jie Jin and Chi-Ying Tsui at the 2006 International Symposium on Low Power Electronics and Design, was to integrate the T-algorithm with a Scarce-State-Transition (SST) decoder structure. The SST structure first pre-decodes the received data (Rx) by performing an inverse operation of the encoder. The pre-decoded signal (Pre-Dec) contains the original message along with bit errors. Pre-Dec is re-encoded and XORed with Rx, the original received data. The operation results in an output which consists mainly of 0s together with the errors in the message. This output is then fed to the Viterbi decoder and the errors are corrected. In the end, the pre-decoded data (Pre-Dec) is added to the decoded output of the Viterbi decoder using modulo-2 addition. When channel bit errors are few, most of the Viterbi decoder output bits are zero, which reduces switching activity.

The SST structure was used to reduce the switching activity of the decoder and was combined with the T-algorithm to reduce the average number of Add-Compare-Select calculations. In their experiments, Jie Jin and Chi-Ying Tsui achieved a 30% to 76% reduction in power consumption over the traditional Viterbi design for a range of SNR values varying from 4 dB to 12 dB.

A different approach, investigated by Sherif Welsen Shaker, Salwa Hussein Elramly and Khaled Ali Shehata at a telecommunications forum held in Belgrade in 2009, was to use the traceback approach with clock gating. In clock gating, the clock of each register is enabled only when the register updates its survivor path information. This reduces power dissipation. Their simulations showed a 30% reduction in dynamic power dissipation, which gives a good indication of the power reduction to expect on implementation.

A similar approach, investigated by Ranpara and Sam Ha and presented at the
International ASIC conference in Washington in 1999, was the use of clock gating in
combination with a concept known as toggle filtering. Signals may arrive at the inputs
of a combinational block at different times, which causes the block to go through
several intermediate transitions before it stabilizes. By blocking early signals, the
number of intermediate transitions can be reduced and hence power dissipation can be
minimized. This mechanism of blocking early signals until all input signals arrive,
called toggle filtering, was used by Ranpara et al. to reduce the energy consumption of
the Viterbi decoder. More recently, a new approach targeted at wireless applications has
been introduced [38]; it involves a pre-traceback architecture for the survivor path
memory unit.
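
The toggle-filtering mechanism described above can be sketched as follows. This is only an illustration under assumptions: the adder, the `all_ready` signal and all names are hypothetical and are not taken from the design by Ranpara et al.

```verilog
// Sketch of toggle filtering in front of a combinational adder
// (hypothetical names; all_ready is assumed to come from control logic).
module toggle_filtered_adder #(parameter WIDTH = 8) (
    input  wire             all_ready,  // high once every operand has settled
    input  wire [WIDTH-1:0] a,          // early-arriving operand
    input  wire [WIDTH-1:0] b,          // late-arriving operand
    output wire [WIDTH:0]   sum
);
    reg [WIDTH-1:0] a_held, b_held;

    // Transparent latches keep the previous operands while all_ready is low,
    // so early changes on a or b do not ripple into the adder.
    always @(all_ready or a or b)
        if (all_ready) begin
            a_held <= a;
            b_held <= b;
        end

    assign sum = a_held + b_held;
endmodule
```

While `all_ready` is low the adder inputs are frozen, so the adder output changes once per evaluation instead of once per arriving operand.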

The start state of decoding is obtained directly through a pointer register pointing to
the target traceback state, instead of being estimated through a recursive traceback
operation. This approach exploits the similarity between the bit-write and
decode-traceback operations to introduce the pre-traceback operation. It effectively
becomes a trace-forward type of operation and yields a 50% reduction in survivor memory
read operations. Apart from improving latency by 25%, implementation results predict up
to 11.9% better energy efficiency compared to the conventional traceback architecture
for typical wireless applications.


6.3 Low power consumption

For the branch metric of the Viterbi decoder, our design employs a soft-decision method
to improve its error-correction capability. To find the survivor path efficiently, we
modify the classical Viterbi decoding algorithm. The modified algorithm is similar to
the register-exchange method with lower latency, but uses RAM instead of register banks
to record the output bit-stream of the survivor path. Hence, our design achieves low
power consumption. The chip occupies about 28.6 K gates in TSMC 0.18 µm CMOS
technology. The power consumption of our chip is about 19.5 mW at 100 MHz. The power
usage in the implementation is around 367 mW.
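
A soft-decision branch metric of the kind mentioned above can be sketched as follows. The 3-bit quantization (0 = strong 0, 7 = strong 1), the metric definition and all names are assumptions made for illustration; the text does not specify the quantization actually used in the design.

```verilog
// Sketch of a soft-decision branch metric for a rate-1/2 code, assuming
// 3-bit quantized received values (hypothetical names and scaling).
module soft_branch_metric (
    input  wire [2:0] r1,        // soft value of the first code bit (0 = strong 0, 7 = strong 1)
    input  wire [2:0] r0,        // soft value of the second code bit
    input  wire [1:0] expected,  // branch output bits {c1, c0}
    output wire [3:0] metric     // 0 (perfect match) to 14 (worst mismatch)
);
    // Distance of each soft value from the ideal level of the expected bit.
    wire [2:0] d1 = expected[1] ? (3'd7 - r1) : r1;
    wire [2:0] d0 = expected[0] ? (3'd7 - r0) : r0;

    assign metric = d1 + d0;
endmodule
```

Compared with a Hamming distance, this metric keeps the reliability of each received bit, so a weakly received bit contributes less to the path metric than a strongly received one.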

Figure 6.3: Demonstration Of Power Consumption


6.4 Summary

This chapter has explained the decoding mechanism of the Viterbi decoder in detail

and described a few of its applications. A number of energy-saving techniques that

have been investigated in the past have been discussed. The next chapter gives a

detailed description of the proposed energy saving algorithm that will be used in this

project.


CHAPTER-7

SYNTHESIS AND SIMULATION RESULTS

7.1 Sample code

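Code for D flip-flop

/******************************************************/
module pDFF(DATA,QOUT,CLOCK,RESET);
/******************************************************/
parameter WIDTH = 1;
input [WIDTH-1:0] DATA;
input CLOCK, RESET;
output [WIDTH-1:0] QOUT;
reg [WIDTH-1:0] QOUT;

// Asynchronous active-low reset; otherwise DATA is captured on the rising clock edge.
// Assumed completion: the listing is truncated after "QOUT", and the assignments
// below are the standard ones for such a flip-flop.
always @( posedge CLOCK or negedge RESET)
if (~RESET) QOUT <= 0;
else QOUT <= DATA;

endmodule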


wire [`WD_CODE-1:0]

wire [`WD_DIST-1:0] D0,D1,D2,D3,D4,D5,D6,D7;// output distances

reg [`WD_CODE-1:0] CodeRegister ;

always @( posedge Clock2 or negedge Reset)

begin

if (~Reset) CodeRegister


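Code for Hamming distance calculation

module HARD_DIST_CALC (InputSymbol , BranchOutput , OutputDistance) ;
// desc. : performs 2-bit Hamming distance calculation
/*-----------------------------------*/
input [`WD_CODE-1:0] InputSymbol , BranchOutput ;
output [`WD_DIST-1:0] OutputDistance;
reg [`WD_DIST-1:0] OutputDistance;
wire MS, LS;

// A bit of MS or LS is 1 where the received symbol differs from the branch output.
assign MS = (InputSymbol[1] ^ BranchOutput[1]) ;
assign LS = (InputSymbol[0] ^ BranchOutput[0]) ;

// Assumed completion: the listing is truncated after "OutputDistance[1]".
// The distance is taken as the 2-bit sum MS + LS.
always @(MS or LS)
begin
OutputDistance[1] = MS & LS;
OutputDistance[0] = MS ^ LS;
end

endmodule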


assign wA = PolyA & BranchID;

assign wB = PolyB&BranchID;

always @(wA or wB)

begin

EncOut[1] = (((wA[0]^wA[1]) ^ (wA[2]^wA[3]))^((wA[4]^wA[5] ) ^

(wA[6]^wA[7]))^wA[8]) ;

EncOut[0] = (((wB[0]^wB[1]) ^ (wB[2]^wB[3]))^((wB[4]^wB[5] ) ^

(wB[6]^wB[7]))^wB[8]) ;

end

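Code for Viterbi encoder

/******************************************************/
module viterbi_encode9(X,Y,Clock,Reset) ;
/******************************************************/
input X,Clock,Reset;
output [1:0] Y;
wire [1:0] Y;
wire [1:0] Yt;
wire X, Clock,Reset;
wire [8:0] PolyA, PolyB;
wire [8:0] wA, wB, ShReg;

// The listing gives two generator-polynomial pairs that are bit-reversed
// copies of each other. Only one pair may drive PolyA and PolyB; keeping
// the second pair active is an assumption.
// assign PolyA = 9'b111_101_011;
// assign PolyB = 9'b101_110_001;
assign PolyA = 9'b110_101_111;
assign PolyB = 9'b100_011_101;

assign wA = PolyA & ShReg;
assign wB = PolyB & ShReg;

// 9-stage shift register; the input bit enters at ShReg[8]. Stages dff6, dff5,
// dff3 and dff1 are missing from the listing and are filled in here from the
// pattern of the surviving instances.
assign ShReg[8] = X;
pDFF dff7(ShReg[8] , ShReg[7], Clock, Reset) ;
pDFF dff6(ShReg[7] , ShReg[6], Clock, Reset) ;
pDFF dff5(ShReg[6] , ShReg[5], Clock, Reset) ;
pDFF dff4(ShReg[5] , ShReg[4], Clock, Reset) ;
pDFF dff3(ShReg[4] , ShReg[3], Clock, Reset) ;
pDFF dff2(ShReg[3] , ShReg[2], Clock, Reset) ;
pDFF dff1(ShReg[2] , ShReg[1], Clock, Reset) ;
pDFF dff0(ShReg[1] , ShReg[0], Clock, Reset) ;

// Each output bit is the parity of the taps selected by its polynomial.
assign Yt[1] = wA[0] ^ wA[1] ^ wA[2] ^ wA[3] ^ wA[4] ^ wA[5] ^
wA[6] ^ wA[7] ^ wA[8];
assign Yt[0] = wB[0] ^ wB[1] ^ wB[2] ^ wB[3] ^ wB[4] ^ wB[5] ^
wB[6] ^ wB[7] ^ wB[8];

// Register the encoder outputs.
pDFF dffy1(Yt[1] , Y[1], Clock, Reset);
pDFF dffy0(Yt[0] , Y[0], Clock, Reset);

endmodule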

Code for Viterbi decoder

// Module : VITERBIDECODER
// File : decoder.v
// Description : Top Level Module of Viterbi Decoder
module VITERBIDECODER (Reset , CLOCK, Active, Code, DecodeOut) ;
input Reset , CLOCK, Active;
input [`WD_CODE-1:0] Code;
output DecodeOut;


wire [`WD_DIST*2*`N_ACS-1:0] Distance; // BMG Output
wire [`WD_FSM-1:0] ACSSegment;
wire [`WD_DEPTH-1:0] ACSPage; // Control Output
wire CompareStart , Hold, Init ;
wire [`N_ACS-1:0] Survivors; // ACS Output
wire [`WD_STATE-1:0] LowestState ;
wire TB_EN;
wire RAMEnable;
wire ReadClock, WriteClock, RWSelect;
wire [`WD_RAM_ADDRESS-1:0] AddressRAM; // RAM AddressBus, generated by TBU and ACSU
wire [`WD_RAM_DATA-1:0] DataRAM; // RAM Databus
wire [`WD_RAM_DATA-1:0] DataTB;
wire [`WD_RAM_ADDRESS-`WD_FSM-1:0] AddressTB;
wire Clock1, Clock2;

// for metric memory connection
wire [`WD_METR*2*`N_ACS-1:0] MMPathMetric ;
wire [`WD_METR*`N_ACS-1:0] MMMetric;
wire [`WD_FSM-2:0] MMReadAddress;
wire [`WD_FSM-1:0] MMWriteAddress ;
wire MMBlockSelect ;

// instantiation of Viterbi Decoder Modules
CONTROL ctl (Reset , CLOCK, Clock1, Clock2, ACSPage, ACSSegment ,
Active, CompareStart , Hold, Init , TB_EN);

BMU bmu (Reset , Clock2, ACSSegment , Code, Distance);

ACSUNIT acs (Reset , Clock1, Clock2, Active, Init , Hold, CompareStart ,
ACSSegment , Distance, Survivors , LowestState ,
MMReadAddress , MMWriteAddress , MMBlockSelect , MMMetric ,
MMPathMetric) ;

MMU mmu (CLOCK, Clock1, Clock2, Reset , Active, Hold, Init , ACSPage,
ACSSegment [`WD_FSM-1:1], Survivors ,
DataTB, AddressTB,
RWSelect , ReadClock, WriteClock, RAMEnable , AddressRAM,
DataRAM);

TBU tbu (Reset , Clock1, Clock2, TB_EN, Init , Hold, LowestState

Code for main memory unit

// Module : MMU
// Description : Description of MMU Unit in Viterbi Decoder
module MMU (CLOCK, Clock1, Clock2, Reset , Active, Hold, Init , ACSPage,
ACSSegment_minusLSB, Survivors ,
DataTB, AddressTB,
RWSelect , ReadClock, WriteClock,
RAMEnable , AddressRAM, DataRAM);

// connection from Control
input CLOCK, Clock1, Clock2, Reset , Active, Hold, Init ;
input [`WD_DEPTH-1:0] ACSPage;
input [`WD_FSM-2:0] ACSSegment_minusLSB;

// connection from ACS Unit
input [`N_ACS-1:0] Survivors;

// connection from/to TB Unit
output [`WD_RAM_DATA-1:0] DataTB;
input [`WD_RAM_ADDRESS-`WD_FSM-1:0] AddressTB;


// connection from/to RAM
output RWSelect , ReadClock, WriteClock, RAMEnable;
output [`WD_RAM_ADDRESS-1:0] AddressRAM;
inout [`WD_RAM_DATA-1:0] DataRAM;

wire [`WD_RAM_DATA-1:0] WrittenSurvivors ;
reg dummy, SurvRDY;
reg [`WD_RAM_ADDRESS-1:0] AddressRAM;
reg [`WD_DEPTH-1:0] TBPage;
wire [`WD_DEPTH-1:0] TBPage_;
wire [`WD_DEPTH-1:0] ACSPage;
wire [`WD_TB_ADDRESS-1:0] AddressTB;

// Read and Write clock
// Dummy variable used because Write Clock only occurs every 2 Clocks.

if (~Reset) dummy



// every negedge Clock2 : - TBPage is decreased by 1, OR

// - When Init is Active, TBPage equals ACSPage - 1

always @( negedge Clock2 or negedge Reset)

begin

if (~Reset) begin

TBPage




end

end

endmodule

module ACSSURVIVORBUFFER (Reset , Clock1, Active, SurvRDY, Survivors ,
WrittenSurvivors) ;
//
// To accommodate the use of 8 bit wide RAM DATA BUS, the Survivor
// (which is only 4 on every clock) must be buffered first .
/*-----------------------------------*/
input Reset , Clock1, Active, SurvRDY;
input [`N_ACS-1:0] Survivors;
output [`WD_RAM_DATA-1:0] WrittenSurvivors ;
wire[`WD_RAM_DATA-1:0] WrittenSurvivors ;
reg [`N_ACS-1:0] WrittenSurvivors_;

begin
if (~Reset) WrittenSurvivors_ = 0;
else if (Active)
WrittenSurvivors_ = Survivors;
end

Code for ACS unit (add, compare and select)

// Module : ACSUNIT
// File : acs.v
// Description : Description of ACS Unit in Viterbi Decoder
module ACSUNIT (Reset , Clock1, Clock2, Active, Init , Hold, CompareStart ,
ACSSegment , Distance, Survivors , LowestState ,
MMReadAddress , MMWriteAddress , MMBlockSelect , MMMetric ,
MMPathMetric) ;
/*-----------------------------------*/
// ACS UNIT consists of :
// - 4 ACS modules (ACS)
// - RAM Interface
// - State with smallest metric finder (LOWESTPICK)
/*-----------------------------------*/
input Reset , Clock1, Cloc
