Implementation Of Low Power Consumption Convolution Encoder And Viterbi Decoder Using
Verilog HDL
Department of ECE, GITAM UNIVERSITY 1
Chapter 1
INTRODUCTION
1.1 Overview of the project
Convolutional coding has been used in communication systems
including deep space communications and wireless communications. It offers an
alternative to block codes for transmission over a noisy channel. An advantage of
convolutional coding is that it can be applied to a continuous data stream as well as to
blocks of data. IS-95, a wireless digital cellular standard for CDMA (code division
multiple access), employs convolutional coding. A third generation wireless cellular
standard, under preparation, plans to adopt turbo coding, which stems from
convolutional coding.
The Viterbi decoding algorithm, proposed in 1967 by Viterbi, is a
decoding process for convolutional codes in memoryless noise. The algorithm can
be applied to a host of problems encountered in the design of communication
systems. The Viterbi decoding algorithm provides both a maximum-likelihood
and a maximum a posteriori algorithm. A maximum a posteriori algorithm
identifies the code word that maximizes the conditional probability of the decoded
code word given the received code word. In contrast, a maximum likelihood
algorithm identifies the code word that maximizes the conditional probability of
the received code word given the decoded code word. The two algorithms give the
same results when the source information has a uniform distribution.
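The equivalence of the two criteria under a uniform source distribution can be checked numerically. The following is a minimal sketch, assuming a hypothetical two-word repetition code sent over a binary symmetric channel with a bit-error probability of 0.1. The code words, prior values and function names are illustrative assumptions and are not taken from this project.

```python
def likelihood(received, codeword, p=0.1):
    """P(received | codeword) on a binary symmetric channel with error rate p."""
    prob = 1.0
    for r, c in zip(received, codeword):
        prob *= p if r != c else (1.0 - p)
    return prob


def ml_decode(received, codewords, p=0.1):
    """Maximum likelihood: pick the code word maximizing P(received | codeword)."""
    return max(codewords, key=lambda c: likelihood(received, c, p))


def map_decode(received, codewords, priors, p=0.1):
    """Maximum a posteriori: the posterior is proportional to likelihood * prior."""
    return max(codewords, key=lambda c: likelihood(received, c, p) * priors[c])


# Hypothetical 3-bit repetition code (assumption for illustration only).
CODEWORDS = [(0, 0, 0), (1, 1, 1)]
received = (0, 0, 1)

uniform = {c: 0.5 for c in CODEWORDS}
skewed = {(0, 0, 0): 0.01, (1, 1, 1): 0.99}

print(ml_decode(received, CODEWORDS))            # ML decision
print(map_decode(received, CODEWORDS, uniform))  # same decision as ML
print(map_decode(received, CODEWORDS, skewed))   # may differ from ML
```

With the uniform prior both decoders choose the same code word, while the strongly skewed prior pulls the MAP decision towards the more probable code word.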
Traditionally, performance and silicon area are the two most important
concerns in VLSI design. Recently, power dissipation has also become an important
concern, especially in battery-powered applications such as cellular phones, pagers
and laptop computers. Power dissipation can be classified into two categories: static
power dissipation and dynamic power dissipation. Typically, static power dissipation
is due to various leakage currents, while dynamic power dissipation is a result
of charging and discharging the parasitic capacitance of transistors and wires. Since
dynamic power dissipation accounts for about 80 to 90 percent of overall power
dissipation in CMOS circuits, numerous techniques have been proposed to reduce
it. These techniques can be applied at different levels of
digital design, such as the algorithmic level, the architectural level, the gate level
and the circuit level.
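The link between switching and dynamic power can be illustrated with the usual first-order CMOS switching power relation, P = a * C * Vdd^2 * f. This relation is not stated in this report and is used here as an assumption. The activity factors, capacitance, supply voltage and clock frequency below are hypothetical values, chosen only to show that halving the switching activity halves the dynamic power.

```python
def dynamic_power(activity, capacitance_f, vdd_v, freq_hz):
    """First-order switching power in watts: P = activity * C * Vdd^2 * f."""
    return activity * capacitance_f * vdd_v ** 2 * freq_hz


# Hypothetical operating point (assumption for illustration only):
# 1 nF of switched capacitance, 1.2 V supply, 100 MHz clock.
baseline = dynamic_power(activity=0.2, capacitance_f=1e-9, vdd_v=1.2, freq_hz=100e6)
reduced = dynamic_power(activity=0.1, capacitance_f=1e-9, vdd_v=1.2, freq_hz=100e6)

print(f"baseline: {baseline * 1e3:.1f} mW")
print(f"reduced activity: {reduced * 1e3:.1f} mW")
```

Under this model, techniques that reduce how often nodes toggle, such as disabling a block when it is not needed, reduce dynamic power in direct proportion to the reduction in activity.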
A Viterbi decoder uses the Viterbi algorithm for decoding a bit
stream that has been encoded using forward error correction based on a
convolutional code. The Viterbi algorithm is commonly used in a wide range of
communications and data storage applications. It is used for decoding convolutional
codes, in baseband detection for wireless systems, and also for detection of recorded
data in magnetic disk drives. The requirements for a Viterbi decoder or Viterbi
detector, which is a processor that implements the Viterbi algorithm, depend on the
application in which it is used. This results in a very wide range of required data
throughputs and power or area requirements.
Viterbi detectors are used in cellular telephones with low data rates, on
the order of less than 1 Mb/s, but with very low energy dissipation requirements. They are
used for trellis code demodulation in telephone line modems, where the throughput is
in the range of tens of kb/s, with restrictive limits on power dissipation and the
area/cost of the chip. At the opposite end, very high speed Viterbi detectors are used
in magnetic disk drive read channels, with throughputs over 600 Mb/s. Even at these
high speeds, area and power are still limited.
1.2 Motivation
Unlike wired digital networks, wireless digital networks are much
more prone to bit errors. Packets of bits that are received are more likely to be
damaged and considered unusable in a packetized system. Error detection and
correction mechanisms are vital and numerous techniques exist for reducing the effect
of bit-errors and trying to ensure that the receiver eventually gets an error free version
of the packet. The major techniques used are error detection with Automatic Repeat
Request (ARQ), Forward Error Correction (FEC) and hybrid forms of ARQ and FEC
(H-ARQ).
This project focuses on FEC techniques. Forward Error Correction
(FEC) is the method of transmitting error correction information along with the
message. At the receiver, this error correction information is used to correct any bit-
errors that may have occurred during transmission. The improved performance comes
at the cost of introducing a considerable amount of redundancy in the transmitted
code. There are various FEC codes in use today for the purpose of error correction.
Most codes fall into either of two major categories: block codes and convolutional
codes. Block codes work with fixed length blocks of code. Convolutional codes deal
with data sequentially (i.e. taken a few bits at a time) with the output depending on
both the present input as well as previous inputs.
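The dependence of each output on the present and previous inputs can be seen in a small software model of a non-recursive convolutional encoder. This is a minimal sketch, assuming a rate 1/2 code with constraint length 3 and generator polynomials 7 and 5 (octal). These parameters and the function name are illustrative assumptions and are not necessarily the ones used in this project.

```python
def conv_encode(bits, generators=(0b111, 0b101), constraint_length=3):
    """Rate 1/n non-recursive convolutional encoder (software model).

    Each generator is a bit mask over the shift register. The most
    significant register bit holds the newest input bit.
    """
    register = 0  # shift register contents, initially all zeros
    mask = (1 << constraint_length) - 1
    out = []
    for b in bits:
        # Shift the previous bits down and place the new bit at the top.
        register = ((register >> 1) | (b << (constraint_length - 1))) & mask
        for g in generators:
            # Each output bit is the parity (XOR) of the tapped register bits.
            out.append(bin(register & g).count("1") % 2)
    return out


print(conv_encode([1, 0, 1, 1]))  # [1, 1, 1, 0, 0, 0, 0, 1]
```

Each input bit produces two output bits, so the encoder doubles the amount of transmitted data. This is the redundancy that the decoder later uses to correct errors.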
In terms of implementation, block codes become very complex as
their length increases. Convolutional codes are less complex and therefore easier to implement. In
packetized digital networks, convolutionally coded data would still be transmitted as
packets or blocks. However, these blocks would be much larger in comparison to those
used by block codes. The fact that convolutional codes are easier to implement,
coupled with the emergence of a very efficient convolutional decoding algorithm
known as the Viterbi algorithm, is one of the reasons for convolutional codes becoming
the preferred method for real time communication technologies.
This project studies the use of various error detection and correction
techniques for mobile networks with a focus on non-recursive convolutional coding
and the Viterbi Algorithm. The constraint length of a non-recursive convolutional
code results from the number of stages present in the combinatorial logic of the
encoder. The error correction power of a convolutional code increases with its
constraint length. However, decoding complexity increases exponentially as the
constraint length increases. Fortunately, the efficiency of the Viterbi algorithm allows
the use of convolutional coding with quite reasonable constraint lengths in many
applications. Due to its high accuracy in finding the most likely sequence of states, the
Viterbi algorithm is used in many applications, ranging from communication
networks to optical character recognition and even DNA sequence analysis.
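The exponential growth in decoding complexity can be made concrete by counting trellis states. The sketch below assumes a binary rate 1/n code and the convention that a constraint length K covers the current input bit plus K-1 stored bits, so the trellis has 2^(K-1) states with two outgoing branches each. The function names are illustrative.

```python
def trellis_states(constraint_length):
    """Number of trellis states for a binary code with K-1 memory bits."""
    return 2 ** (constraint_length - 1)


def trellis_branches(constraint_length):
    """Branches evaluated per decoded bit: two leave every state."""
    return 2 * trellis_states(constraint_length)


for k in (3, 5, 7, 9):
    print(f"K={k}: {trellis_states(k)} states, {trellis_branches(k)} branches per bit")
```

Each increment of the constraint length doubles the number of states the decoder has to track, which is why the energy cost of decoding rises quickly with the error correcting power of the code.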
Recently, interest has grown in the use of certain error correction
codes that provide much superior performance. Two of these codes are Low Density
Parity Check codes and Turbo Codes. The ideas presented in this thesis are likely to
be relevant to these more advanced codes as well as non-recursive convolutional
codes, but this thesis will concentrate on convolutional codes. Since preservation of
battery energy is a major concern for mobile devices, it is desirable that the error
detection and correction mechanism take the minimum amount of energy to execute.
This project explores the possibility of improving the energy efficiency of the Viterbi
decoder and develops an algorithm to achieve this.
1.3 Outline and Context of the Report
This project focuses on the use of the Viterbi algorithm for forward error
correction in mobile networks. It is desirable to keep energy consumption at a
minimum in order to optimize use of available battery energy. In order to get good
error correcting capabilities, the constraint length must be kept high and since the
complexity of a convolutional decoder increases exponentially with its constraint
length, optimizing the decoding mechanism with respect to energy consumption
becomes a worthwhile goal. The growing need for improved energy efficiency of
decoders has resulted in several approaches being explored.
The main focus of the project is to explore an idea proposed by Barry
Cheetham, which is to switch off the Viterbi decoder and use a simpler decoder when
no bit-errors are occurring. It is possible that by doing this, a significant amount of
energy could be saved. When bit-errors are detected, the Viterbi decoder can be
switched back on to take advantage of its error correction functionality. This process
at the receiver depends on having a memory of previous bits received. Correctly
maintaining and using this previous memory (previous history) when switching
between the two decoders is one of the main technical challenges in the project. The
energy saving mechanism proposed by Barry Cheetham is based on an earlier idea
published by Wei Shao, though it is hoped that the new approach will be easier to
implement.
This algorithm can be developed using Verilog, though it will require a
custom designed version of the Viterbi algorithm to be developed from scratch, and
then adapted to the new energy saving idea. Possible problems that may affect the
accuracy and energy saving capabilities of the algorithm must be analyzed and
solutions to these problems must be developed. The performance of the resulting
algorithm must be studied in terms of bit-error performance, packet loss rates and
processing time.
In principle, evaluating the performance of the new technique requires
profiling the energy consumption of the two algorithms involved. To do this
accurately would require resources beyond the scope of the project. Verilog provides
some profiling facilities, but relating the information obtained to the energy consumption
that would be observed in a VLSI implementation of the code is a complex issue.
Nevertheless, it is believed that the execution times of particular parts of the
algorithms can give some idea of the likely relationship between the energy
consumption of these particular parts. Hence, in place of quoting estimates of the
likely energy consumption of different techniques, execution times will be quoted
with an implicit assumption that this gives a first order approximation to the likely
energy consumption. By comparison with the standard Viterbi decoder available in
Verilog, an analysis will be made of whether this method provides a significant
improvement over existing mechanisms.
1.4 Contributions and main objectives
The main objectives of this project are as follows:
1. An understanding of the background literature relevant to error detection and
error control mechanisms as currently used in packetized digital communication
networks.
2. A detailed understanding of the concept of convolutional coding, and decoding
using the Viterbi algorithm.
3. An implementation of the Viterbi algorithm in Verilog to obtain a custom
designed version called My Viterbi and check that it is working correctly by
comparing its performance with that of the Viterbi decoder function provided by
Verilog. (A custom designed Viterbi decoder is needed because Verilog does not
provide access to the code.)
4. A resolution of questions that still need to be answered about the new
algorithm, including the correct initialization of component decoders and the stability
of the feedback mechanism.
5. An implementation in Verilog of the new algorithm as a modification of the
custom designed Viterbi algorithm.
6. An evaluation of the new algorithm in terms of its accuracy and capacity for
achieving energy saving. The analysis will be performed on the basis of bit-error
performance, packet loss rates and execution time (considered to provide a first order
approximation to energy).
1.5 Scope of the Project
This project is intended to further develop and implement the energy saving decoding
algorithm developed by Barry Cheetham, and to find solutions to some issues that still remained
to be resolved at the beginning of this project. The main focus of this project is to
provide a working demonstration of the algorithm by implementation in Verilog and to
analyze its performance by comparison with the standard Viterbi decoder available in
Verilog. The system will be developed using a hard decision Viterbi decoder but may
be extended to use a soft decision decoder. The project does not consider the circuit
level design of the algorithm but uses a high level approach to test the proposed
algorithm. Circuit level design may be considered in future work if it is found that this algorithm
promises considerable benefits over existing mechanisms.
Chapter 2
2.1 Overview of VLSI
The first semiconductor chips held one transistor each. Subsequent
advances added more and more transistors, and, as a consequence, more individual
functions or systems were integrated over time. The first integrated circuits held only
a few devices, perhaps as many as ten diodes, transistors, resistors and capacitors,
making it possible to fabricate one or more logic gates on a single device. This stage is now known
retrospectively as "small-scale integration" (SSI). Improvements in technique led to
devices with hundreds of logic gates, known as medium-scale integration (MSI), and then
to large-scale integration (LSI), i.e.
systems with at least a thousand logic gates. Current technology has moved far past
this mark and today's microprocessors have many millions of gates and hundreds of
millions of individual transistors.
At one time, there was an effort to name and calibrate various levels of large-scale
integration above VLSI. Terms like Ultra-large-scale Integration (ULSI) were used.
But the huge number of gates and transistors available on common devices has
rendered such fine distinctions moot. Terms suggesting greater than VLSI levels of
integration are no longer in widespread use. Even VLSI is now somewhat quaint,
given the common assumption that all microprocessors are VLSI or better.
As of early 2008, billion-transistor processors are commercially available, an example
of which is Intel's Montecito Itanium chip. This is expected to become more
commonplace as semiconductor fabrication moves from the current generation of 65
nm processes to the next 45 nm generations (while experiencing new challenges such
as increased variation across process corners). Another notable example is NVIDIA's
280 series GPU.
This processor is unique in that its 1.4 billion transistors, capable
of a teraflop of performance, are almost entirely dedicated to logic (Itanium's transistor
count is largely due to the 24 MB L3 cache). Current designs, as opposed to the
earliest devices, use extensive design automation and automated logic synthesis to lay
out the transistors, enabling higher levels of complexity in the resulting logic
functionality. Certain high-performance logic blocks like the SRAM cell, however,
are still designed by hand to ensure the highest efficiency (sometimes by bending or
breaking established design rules to obtain the last bit of performance by trading
stability).
2.2 INTRODUCTION OF VLSI
Very-large-scale integration (VLSI) is the process of creating integrated
circuits by combining thousands of transistor-based circuits into a single chip. VLSI
began in the 1970s when complex semiconductor and communication technologies
were being developed. The microprocessor is a VLSI device. The term is no longer as
common as it once was, as chips have increased in complexity into the hundreds of
millions of transistors.
2.3 What is VLSI?
VLSI stands for "Very Large Scale Integration". This is the field which involves
packing more and more logic devices into smaller and smaller areas.
Put simply, an integrated circuit is many transistors on one chip.
VLSI involves the design and manufacturing of extremely small, complex circuitry using modified
semiconductor material.
An integrated circuit (IC) may contain millions of transistors, each a few µm in
size.
Applications are wide ranging: most electronic logic devices.
2.3.1 History of scale integration
late 40s: Transistor invented at Bell Labs
late 50s: First IC (JK-FF by Jack Kilby at TI)
early 60s: Small Scale Integration (SSI), 10s of transistors on a chip
late 60s: Medium Scale Integration (MSI), 100s of transistors on a chip
early 70s: Large Scale Integration (LSI), 1000s of transistors on a chip
early 80s: VLSI, 10,000s of transistors on a chip (later 100,000s and now 1,000,000s)
Ultra LSI is sometimes used for 1,000,000s
SSI - Small-Scale Integration (0 to 10^2)
MSI - Medium-Scale Integration (10^2 to 10^3)
LSI - Large-Scale Integration (10^3 to 10^5)
VLSI - Very Large-Scale Integration (10^5 to 10^7)
ULSI - Ultra Large-Scale Integration (>= 10^7)
2.3.2 Advantages of ICs over discrete components
While we will concentrate on integrated circuits, the properties of integrated circuits
(what we can and cannot efficiently put in an integrated circuit) largely determine the
architecture of the entire system. Integrated circuits improve system characteristics in
several critical ways. ICs have three key advantages over digital circuits built from
discrete components:
Size. Integrated circuits are much smaller: both transistors and
wires are shrunk to micrometer sizes, compared to the millimeter or centimeter scales
of discrete components. Small size leads to advantages in speed and power
consumption, since smaller components have smaller parasitic resistances,
capacitances, and inductances.
Speed. Signals can be switched between logic 0 and logic 1
much quicker within a chip than they can between chips. Communication within a
chip can occur hundreds of times faster than communication between chips on a
printed circuit board. The high speed of circuits on-chip is due to their small size:
smaller components and wires have smaller parasitic capacitances to slow down the
signal.
Power consumption. Logic operations within a chip also take
much less power. Once again, lower power consumption is largely due to the small
size of circuits on the chip: smaller parasitic capacitances and resistances require less
power to drive them.
2.3.4 VLSI and systems
These advantages of integrated circuits translate into advantages at the system level:
Smaller physical size. Smallness is often an advantage in
itself; consider portable televisions or handheld cellular telephones.
Lower power consumption. Replacing a handful of standard
parts with a single chip reduces total power consumption. Reducing power
consumption has a ripple effect on the rest of the system: a smaller, cheaper power
supply can be used; since less power consumption means less heat, a fan may no
longer be necessary; a simpler cabinet with less electromagnetic
shielding may be feasible, too.
Reduced cost. Reducing the number of components, the
power supply requirements, cabinet costs, and so on, will inevitably reduce system
cost. The ripple effect of integration is such that the cost of a system built from
custom ICs can be less, even though the individual ICs cost more than the standard
parts they replace.
Understanding why integrated circuit technology has such profound influence on the
design of digital systems requires understanding both the technology of IC
manufacturing and the economics of ICs and digital systems.
Applications
Electronic systems in cars
Digital electronics that control VCRs
Transaction processing systems, ATMs
Personal computers and workstations
Medical electronic systems
2.4 Applications of VLSI
Electronic systems now perform a wide variety of tasks in daily life. Electronic
systems in some cases have replaced mechanisms that operated mechanically,
hydraulically, or by other means; electronics are usually smaller, more flexible, and
easier to service. In other cases electronic systems have created totally new
applications. Electronic systems perform a variety of tasks, some of them visible,
some more hidden:
Personal entertainment systems such as portable MP3
players and DVD players perform sophisticated algorithms with remarkably little
energy.
Electronic systems in cars operate stereo systems and
displays; they also control fuel injection systems, adjust suspensions to varying
terrain, and perform the control functions required for anti-lock braking (ABS)
systems.
Digital electronics compress and decompress video, even at
high-definition data rates, on-the-fly in consumer electronics.
Low-cost terminals for Web browsing still require
sophisticated electronics, despite their dedicated function.
Personal computers and workstations provide word-
processing, financial analysis, and games. Computers include both central processing
units (CPUs) and special-purpose hardware for disk access, faster screen display, etc.
Medical electronic systems measure bodily functions and
perform complex processing algorithms to warn about unusual conditions. The
availability of these complex systems, far from overwhelming consumers, only creates
demand for even more complex systems.
2.5 VERILOG HDL
Verilog HDL is a hardware description language that can be used to model a
digital system at many levels of abstraction ranging from the algorithmic-level to the
gate-level to the switch-level. The complexity of the digital system being modelled
could vary from that of a simple gate to a complete electronic digital system, or
anything in between. The digital system can be described hierarchically and timing
can be explicitly modelled within the same description.
The Verilog HDL language includes capabilities to describe the behavioural
nature of a design, the dataflow nature of a design, a design's structural composition,
delays and a waveform generation mechanism including aspects of response
monitoring and verification, all modelled using one single language. In addition, the
language provides a programming language interface through which the internals of a
design can be accessed during simulation including the control of a simulation run.
The language not only defines the syntax but also defines very clear simulation
semantics for each language construct. Therefore, models written in this language can
be verified using a Verilog simulator. The language inherits many of its operator
symbols and constructs from the C programming language. Verilog HDL provides an
extensive range of modelling capabilities, some of which are quite difficult to
comprehend initially. However, a core subset of the language is quite easy to learn
and use. This is sufficient to model most applications.
2.5.1 History:
The Verilog HDL language was first developed by Gateway Design Automation in
1983 as a hardware modelling language for their simulator product. At that time it was
a proprietary language. Because of the popularity of the simulator product, Verilog
HDL gained acceptance as a usable and practical language by a number of designers.
In an effort to increase the popularity of the language, the language was placed in the
public domain in 1990. Open Verilog International (OVI) was formed to promote
Verilog. In 1992 OVI decided to pursue standardization of Verilog HDL as an IEEE
standard. This effort was successful and the language became an IEEE standard in
1995. The complete standard is described in the Verilog Hardware Description
Language Reference Manual. The standard is called IEEE Std 1364-1995.
2.5.2 Major Capabilities:
Listed below are the major capabilities of the Verilog hardware description language:
Primitive logic gates, such as and, or and nand, are built into the language.
Flexibility of creating a user-defined primitive (UDP). Such a primitive could
either be a combinational logic primitive or a sequential logic primitive.
Switch-level modelling primitive gates, such as pmos and nmos, are also built
into the language.
Explicit language constructs are provided for specifying pin-to-pin delays,
path delays and timing checks of a design.
A design can be modelled in three different styles or in a mixed style. These
styles are: behavioural style, modelled using procedural constructs; dataflow style,
modelled using continuous assignments; and structural style, modelled using gate and
module instantiations.
There are two data types in Verilog HDL: the net data type and the register
data type. The net type represents a physical connection between structural elements
while a register type represents an abstract data storage element.
Figure 2.1 shows the mixed-level modelling capability of Verilog HDL, that is,
in one design, each module may be modelled at a different level.
Verilog HDL also has built-in logic functions such as & (bitwise-and) and |
(bitwise-or).
High-level programming language constructs such as conditionals, case
statements, and loops are available in the language.
Notions of concurrency and time can be explicitly modelled.
Powerful file read and write capabilities are provided.
The language is non-deterministic under certain situations, that is, a model
may produce different results on different simulators; for example, the ordering of
events on an event queue is not defined by the standard.
Figure 2.1: Mixed-level modelling
2.6 Verilog synthesis
Synthesis is the process of constructing a gate level netlist from a register-transfer
level model of a circuit described in Verilog HDL. Figure 2.2 shows such a process.
A synthesis system may, as an intermediate step, generate a netlist that is comprised of
register-transfer level blocks such as flip-flops, arithmetic-logic-units, and
multiplexers, interconnected by wires. In such a case, a second program called the
RTL module builder is necessary. The purpose of this builder is to build, or acquire
from a library of predefined components, each of the required RTL blocks in the user-
specified target technology.
Having produced a gate level netlist, a logic optimizer reads in the netlist and
optimizes the circuit for the user-specified area and timing constraints. These area and
timing constraints may also be used by the module builder for appropriate selection or
generation of RTL blocks. In this report, we assume that the target netlist is at the gate
level. The logic gates used in the synthesized netlists are described in Appendix B.
The module building and logic optimization phases are not described in this report.
The above figure shows the basic elements of Verilog HDL and the elements used in
hardware. A mapping mechanism or a construction mechanism has to be provided that
translates the Verilog HDL elements into their corresponding hardware elements as
shown in the figure.
Figure 2.2: Synthesis process
2.7 Tools used and explanation
Requirements:
Xilinx 9.1i
Modelsim 6.2
2.8 Introduction about the Software:
Xilinx ISE 8.2i software includes the new Xilinx Smart Compile
technology, which makes run times up to 6 times faster than the
previous version, while maintaining exact design preservation of unchanged logic.
Modelsim SE 6.2C is a verification and simulation tool for VHDL, Verilog,
SystemVerilog, and mixed language designs.
CHAPTER-3
3.1 THE VITERBI DECODER ALGORITHM
The Viterbi decoding algorithm is a decoding process for
convolutional codes over a memoryless channel. Figure 3.1 depicts the normal flow of
information over a noisy channel. For the purpose of error recovery, the encoder
adds redundant information to the original information, and the output is transmitted
through a channel. The input at the receiver end (r) is the information with redundancy and,
possibly, noise. The receiver tries to extract the original information through a
decoding algorithm and generates an estimate (e). A decoding algorithm that
maximizes the probability p(r|e) is a maximum likelihood (ML) algorithm. An
algorithm that maximizes p(e|r) through the proper selection of the estimate (e)
is called a maximum a posteriori (MAP) algorithm. The two algorithms give identical
results when the source information has a uniform distribution.
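The relationship between the ML and MAP criteria can be illustrated with a small
sketch. It is not part of the project design. It assumes a toy code with only two
code words and a binary symmetric channel whose bit-flip probability (p_flip) is
chosen arbitrarily; the function names are hypothetical. Since p(e|r) is
proportional to p(r|e) * p(e), the two criteria pick the same estimate when the
prior p(e) is uniform and may differ when it is not.

```python
def likelihood(received, sent, p_flip):
    """p(r|e) on a memoryless binary symmetric channel (assumed channel model)."""
    prob = 1.0
    for r_bit, s_bit in zip(received, sent):
        prob *= p_flip if r_bit != s_bit else 1.0 - p_flip
    return prob


def ml_estimate(received, codewords, p_flip):
    """Pick the code word e that maximizes p(r|e)."""
    return max(codewords, key=lambda e: likelihood(received, e, p_flip))


def map_estimate(received, codewords, priors, p_flip):
    """Pick the code word e that maximizes p(e|r), proportional to p(r|e) * p(e)."""
    return max(codewords, key=lambda e: likelihood(received, e, p_flip) * priors[e])


codewords = ["000", "111"]             # toy code, for illustration only
uniform = {"000": 0.5, "111": 0.5}
skewed = {"000": 0.05, "111": 0.95}    # strongly non-uniform source

print(ml_estimate("001", codewords, 0.1))
print(map_estimate("001", codewords, uniform, 0.1))
print(map_estimate("001", codewords, skewed, 0.1))
```

With the uniform prior both estimates are "000"; with the skewed prior the MAP
estimate changes to "111" while the ML estimate is unaffected.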
Figure 3.1 The Convolutional Decoding
The Viterbi algorithm was developed by Andrew J. Viterbi and first published
in the IEEE Transactions on Information Theory in 1967. It is a maximum
likelihood decoding algorithm for convolutional codes. This algorithm provides a
method of finding the path through the trellis diagram that has the highest probability of
matching the actual transmitted sequence of bits. Since its discovery, it has
become one of the most popular algorithms in use for convolutional decoding. Apart
from being an efficient and robust error correction method, it has the advantage of having
a fixed decoding time. This makes it suitable for hardware implementation. The
algorithm has found universal application in decoding the convolutional codes used in
both CDMA and GSM digital cellular, dial-up modems, satellite, deep-space
communications, and 802.11 wireless LANs. It is now also commonly used in speech
recognition, speech synthesis, keyword spotting, computational linguistics,
and bioinformatics. For example, in speech-to-text (speech recognition), the acoustic
signal is treated as the observed sequence of events, and a string of text is considered
to be the "hidden cause" of the acoustic signal. The Viterbi algorithm finds the most
likely string of text given the acoustic signal. The terms Viterbi path and Viterbi
algorithm are also applied to related dynamic programming algorithms that discover
the single most likely explanation for an observation. For example, in statistical
parsing a dynamic programming algorithm can be used to discover the single most
likely context-free derivation (parse) of a string, which is sometimes called the Viterbi
parse.
3.2 Convolutional Encoders
Like any error-correcting code, a convolutional code works by
adding some structured redundant information to the user's data and then correcting
errors using this information. A convolutional encoder is a linear system. A binary
convolutional encoder can be represented as a shift register. The outputs of the
encoder are modulo-2 sums of the values in certain cells of the register. The input to the
encoder is either the unencoded sequence (for non-recursive codes) or the unencoded
sequence added modulo 2 to the values of some of the register's cells (for recursive codes).
In telecommunication, a convolutional code is a type of error-correcting
code in which:
each m-bit information symbol (each m-bit string) to be encoded is
transformed into an n-bit symbol, where m/n is the code rate (n >= m), and
the transformation is a function of the last k information symbols, where k is
the constraint length of the code.
Convolutional codes can be systematic or non-systematic.
Systematic codes are those where the unencoded sequence is a part of the output
sequence. Systematic codes are almost always recursive; conversely, non-recursive
codes are almost always non-systematic. Convolutional codes are used extensively in
numerous applications in order to achieve reliable data transfer, including digital
video, radio, mobile communication, and satellite communication. These codes are
often implemented in concatenation with a hard-decision code, particularly Reed
Solomon. Prior to turbo codes, such constructions were the most efficient, coming
closest to the Shannon limit.
To convolutionally encode data, start with k memory registers, each holding 1
input bit. Unless otherwise specified, all memory registers start with a value of 0. The
encoder has n modulo-2 adders (a modulo-2 adder can be implemented with a
single Boolean XOR gate, where the logic is: 0+0 = 0, 0+1 = 1, 1+0 = 1, 1+1 = 0),
and n generator polynomials, one for each adder (see figure below). An input
bit m_1 is fed into the leftmost register. Using the generator polynomials and the
existing values in the remaining registers, the encoder outputs n bits. Now bit shift all
register values to the right (m_1 moves to m_0, m_0 moves to m_{-1}) and wait for the next
input bit. If there are no remaining input bits, the encoder continues output until all
registers have returned to the zero state.
The figure below is a rate 1/3 (m/n) encoder with constraint length (k) of 3.
Generator polynomials are G1 = (1,1,1), G2 = (0,1,1), and G3 = (1,0,1). Therefore,
output bits are calculated (modulo 2) as follows:
n_1 = m_1 + m_0 + m_{-1}
n_2 = m_0 + m_{-1}
n_3 = m_1 + m_{-1}
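The encoding steps above can be sketched in software as follows. This is a
minimal illustration, not the Verilog design of the project. The function name
encode_rate_third and the flush argument are assumptions introduced here;
flushing appends k-1 = 2 zero bits so that the registers return to the zero
state, as described above.

```python
# Generator polynomials G1, G2, G3 from the text, applied to (newest, middle, oldest).
GENERATORS = [(1, 1, 1), (0, 1, 1), (1, 0, 1)]


def encode_rate_third(bits, flush=True):
    """Encode a list of 0/1 bits with the rate 1/3, k = 3 code described above."""
    middle = oldest = 0                      # both older registers start at zero
    stream = list(bits) + ([0, 0] if flush else [])
    out = []
    for newest in stream:
        taps = (newest, middle, oldest)
        for g in GENERATORS:
            # Modulo-2 sum (XOR) of the register cells selected by the polynomial.
            out.append(sum(t & c for t, c in zip(taps, g)) % 2)
        middle, oldest = newest, middle      # shift all register values to the right
    return out


print(encode_rate_third([1]))   # a single 1 followed by two flush zeros
```

Each input bit produces three output bits, so a message of L bits yields
3 * (L + 2) output bits when the registers are flushed.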
A combination of register cells that forms one of the output streams (or that is added
to the input stream for recursive codes) is defined by a polynomial. Let m be the
maximum degree of the polynomials constituting a code; then K = m+1 is the constraint
length of the code.
Figure 3.2 The Convolutional Encoder: a standard convolutional encoder with
polynomials (171,133)
For example, for the encoder in Figure 3.2, the polynomials are:
g1(z) = 1 + z + z^2 + z^3 + z^6
g2(z) = 1 + z^2 + z^3 + z^5 + z^6
Encoder polynomials are usually denoted in octal notation. For the above example,
these designations are 1111001 = 171 and 1011011 = 133. The constraint length
of this code is 7. An example of a recursive convolutional encoder is shown in Figure 3.3.
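The correspondence between the octal designations and the polynomials can be
checked with a short sketch. The helper names octal_to_taps and
taps_to_polynomial are hypothetical. The sketch follows the convention used in
the example above, where the leftmost binary digit is the coefficient of z^0.

```python
def octal_to_taps(octal_str, constraint_length):
    """Convert an octal generator (e.g. '171') into a list of tap bits.

    The first entry is the coefficient of z^0, matching the reading
    1111001 -> 1 + z + z^2 + z^3 + z^6 used in the text.
    """
    value = int(octal_str, 8)
    bits = format(value, "0{}b".format(constraint_length))
    return [int(b) for b in bits]


def taps_to_polynomial(taps):
    """Render a tap list as a polynomial string in z."""
    terms = []
    for power, tap in enumerate(taps):
        if tap:
            if power == 0:
                terms.append("1")
            elif power == 1:
                terms.append("z")
            else:
                terms.append("z^{}".format(power))
    return " + ".join(terms)


for name, octal in (("g1", "171"), ("g2", "133")):
    taps = octal_to_taps(octal, 7)
    print(name, taps, taps_to_polynomial(taps))
```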
Figure 3.3. A recursive convolutional encoder
3.2.1 Trellis Diagram
A convolutional encoder is often seen as a finite state machine. Each state
corresponds to some value of the encoder's register. Given the input bit value, from a
certain state the encoder can move to one of two other states. These state transitions
constitute a diagram which is called a trellis diagram. A trellis diagram for the code
in Figure 3.3 is depicted in Figure 3.4. A solid line corresponds to input 0, a
dotted line to input 1 (note that encoder states are designated in such a way that the
rightmost bit is the newest one).
Each path on the trellis diagram corresponds to a valid sequence from the
encoder's output. Conversely, any valid sequence from the encoder's output can be
represented as a path on the trellis diagram. One of the possible paths is drawn in
red as an example. Note that each state transition on the diagram corresponds to a
pair of output bits. There are only two allowed transitions for every state, so there are
two allowed pairs of output bits, and the two other pairs are forbidden.
If an error occurs, it is very likely that the receiver will get a set of forbidden
pairs, which do not constitute a path on the trellis diagram. So, the task of the decoder
is to find the path on the trellis diagram which is the closest match to the received
sequence.
Figure 3.4 A trellis diagram corresponding to the encoder in Figure 3.3
Let us define the free distance df as the minimal Hamming distance between two different
allowed binary sequences (the Hamming distance is defined as the number of differing
bits).
Chapter -4
4. Viterbi decoder
A Viterbi decoder uses the Viterbi algorithm for decoding a bit stream that
has been encoded using a convolutional code. There are other algorithms for decoding
a convolutionally encoded stream (for example, the Fano algorithm). The Viterbi
algorithm is the most resource-consuming, but it does the maximum
likelihood decoding. It is most often used for decoding convolutional codes with
constraint lengths k
For each state, the Hamming distance between the received bits and the expected bits
is calculated. The Hamming distance between two symbols of the same length is
the number of bits that differ between them. These branch metric
values are passed to Block 2. If soft decision inputs were to be used, the branch metric
would be calculated as the squared Euclidean distance between the received and expected symbols.
The squared Euclidean distance is given as (a1-b1)^2 + (a2-b2)^2 + (a3-b3)^2, where a1, a2, a3
and b1, b2, b3 are the three soft decision bits of the received and expected bits
respectively.
Value   Meaning
000     strongest 0
001     relatively strong 0
010     relatively weak 0
011     weakest 0
100     weakest 1
101     relatively weak 1
110     relatively strong 1
111     strongest 1
Figure 4.2 A recursive convolutional encoder
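The two branch metrics described above can be sketched as follows. This is an
illustration only. For the soft decision case it is assumed here that each
received symbol is a 3-bit value between 0 and 7, as in the table above, and
that an expected 0 maps to 0 and an expected 1 maps to 7; this mapping is an
assumption and is not stated in the text.

```python
def hamming_distance(received, expected):
    """Hard decision branch metric: number of differing bits."""
    return sum(r != e for r, e in zip(received, expected))


def squared_euclidean(received, expected):
    """Soft decision branch metric: sum of squared differences."""
    return sum((a - b) ** 2 for a, b in zip(received, expected))


# Hard decision: received pair 11 against the four possible expected pairs.
for expected in ((0, 0), (0, 1), (1, 0), (1, 1)):
    print(expected, hamming_distance((1, 1), expected))

# Soft decision: a "relatively strong 1" (6) and a "relatively strong 0" (1),
# compared with the expected pairs 10 and 01 mapped to the values 7 and 0.
print(squared_euclidean((6, 1), (7, 0)))
print(squared_euclidean((6, 1), (0, 7)))
```

The smaller the metric, the better the received symbol matches the branch.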
4.2. Path Metric Computation and Add-Compare-Select (ACS) Unit
A path metric unit (PMU) sums branch metrics to get metrics for 2^(K-1)
paths, where K is the constraint length of the code, one of which can eventually be
chosen as optimal. Every clock cycle it makes decisions, discarding paths that are known
to be nonoptimal. The results of these decisions are written to the memory of a
traceback unit.
The core elements of a PMU are ACS (Add-Compare-Select) units. The way in
which they are connected to each other is defined by the specific code's trellis
diagram. Since branch metrics are always non-negative, there must be an additional circuit
preventing the metric counters from overflowing (it is not shown in the figure). An alternate
method that eliminates the need to monitor the path metric growth is to allow the path
metrics to "roll over". To use this method it is necessary to make sure the path metric
accumulators contain enough bits to prevent the "best" and "worst" values from
coming within 2^(n-1) of each other, where n is the accumulator width in bits. The
compare circuit is essentially unchanged.
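The "roll over" comparison can be sketched as follows. This is one possible way
to model it and is not taken from the project design. The function name
wraps_less_than is hypothetical. It assumes the accumulators are n bits wide
and that the true difference between the two metrics is smaller than half the
accumulator range.

```python
def wraps_less_than(a, b, n_bits):
    """Return True if metric a is smaller than metric b.

    Both metrics are stored modulo 2**n_bits and are allowed to wrap around.
    The result is only meaningful while the true difference between the two
    metrics stays below 2**(n_bits - 1).
    """
    diff = (a - b) % (1 << n_bits)
    # A difference in the upper half of the range means a - b is negative.
    return diff >= (1 << (n_bits - 1))


# 8-bit accumulators: the true metrics are 250 and 259, but 259 wraps to 3.
print(wraps_less_than(250, 3, 8))
print(wraps_less_than(3, 250, 8))
```

In hardware this corresponds to subtracting the two accumulators and checking
the most significant bit of the result.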
Figure 4.3 ACS Unit
It is possible to monitor the noise level on the incoming bit stream by monitoring the
rate of growth of the "best" path metric. A simpler way to do this is to monitor a
single location or "state" and watch it pass "upward" through say four discrete levels
within the range of the accumulator. As it passes upward through each of these
thresholds, a counter is incremented that reflects the "noise" present on the incoming
signal.
The path metric or error probability for each transition state at a particular time
instant is measured as the sum of the path metric for its preceding state and the branch
metric between the previous state and the present state. The initial path metric at the
first time instant is infinity for all states except state 0. For each state, there are two
possible predecessors. The mechanism for calculating the predecessors (and
successors) is described later in this chapter. The path metrics from both these
predecessors are compared and the one
with the smaller path metric is selected. This is the most probable transition that
occurred in the original message. In addition, a single bit is also stored for each state
which specifies whether the lower or upper predecessor was selected.
Figure 4.4 A sample implementation of a path metric unit for a specific K=4 decoder
In cases where both paths result in the same path metric to the state, either the higher
or lower state may consistently be chosen as the surviving predecessor. For the
purpose of this project the higher state is consistently chosen as the surviving
predecessor. Finally, the state with the least accumulated path metric at the current
time instant is located. This state is called the global winner and is the state from
which traceback operation will begin. This method of starting the traceback operation
from the global winner instead of an arbitrary state was described by Linda
Brackenbury in her design of an asynchronous Viterbi decoder. This greatly improves
the probability of finding the correct traceback path quickly and hence reduces the
amount of history information that needs to be maintained. It also reduces the number
of updates required to the surviving path. Both these measures result in improved
energy savings. The values for the surviving predecessors (also called local winners)
and the global winner are passed to Block 3.
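The add-compare-select step and the choice of the global winner described above
can be sketched as follows. This is a software illustration only. The names acs
and global_winner are hypothetical, and the decision bit is assumed here to be 1
when the higher-numbered predecessor survives.

```python
INF = float("inf")   # initial path metric of every state except state 0


def acs(pm_low, bm_low, pm_high, bm_high):
    """Add-compare-select for one state.

    pm_low/pm_high are the path metrics of the lower and higher predecessor,
    bm_low/bm_high the branch metrics of the two incoming branches.
    Returns (new path metric, decision bit).
    """
    cand_low = pm_low + bm_low       # add
    cand_high = pm_high + bm_high
    if cand_high <= cand_low:        # compare; a tie goes to the higher state
        return cand_high, 1          # select the higher predecessor
    return cand_low, 0               # select the lower predecessor


def global_winner(path_metrics):
    """Index of the state with the least accumulated path metric."""
    return min(range(len(path_metrics)), key=path_metrics.__getitem__)


print(acs(2, 1, 3, 0))        # tie: the higher predecessor survives
print(acs(0, 1, 5, 0))        # the lower predecessor is better
print(global_winner([4, 1, 3, 2]))
```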
Figure 4.5 A sample implementation of an ACS Unit
4.3. Survivor memory unit or Trace back Unit
The traceback unit restores an (almost) maximum-likelihood path from the
decisions made by the PMU. Since it works in the reverse direction, a Viterbi decoder
includes a FILO (first-in-last-out) buffer to reconstruct the correct order. Note that the
implementation shown in the figure requires double the clock frequency. There are some tricks
that eliminate this requirement.
The global winner for the current state is received from Block 2. Its
predecessor is selected in the manner described above. In this way, working backwards
through the
trellis, the path with the minimum accumulated path metric is selected. This path is
known as the traceback path. A diagrammatic description will help visualize this
process. Figure 4.6 shows the trellis diagram for a K=3 (7, 5) coder with sample input
taken as the received data.
The general approach to traceback is to accumulate path metrics for up to five
times the constraint length (5 * (K - 1)), find the node with the largest accumulated
cost, and begin traceback from this node.
However, computing the node which has accumulated the largest cost (either
the largest or smallest integral path metric) involves finding the maximum or minimum of
several (usually 2^(K-1)) numbers, which may be time consuming when implemented on
embedded hardware systems.
Most communication systems employ Viterbi decoding involving data packets
of fixed sizes, with a fixed bit/byte pattern either at the beginning or/and at the end of
the data packet. By using the known bit/byte pattern as a reference, the start node may
be set to a fixed value, thereby obtaining a perfect maximum likelihood path during
traceback.
Figure 4.6 Selected minimum error path for a k=3(7, 5) coder
The state having the minimum accumulated error at the last time instant is
State 10, and traceback is started here. Moving backwards through the trellis, the
minimum error path out of the two possible predecessors from that state is selected.
This path is marked in blue. The actual received data is shown at the bottom, while
the expected data is written in blue along the selected path. It is observed that at time
slot three there was an error in the received data (11). This was corrected to (10) by the
decoder.
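The complete procedure for a K=3 (7, 5) coder, from path metric computation to
traceback from the global winner, can be sketched in software as follows. This
is a minimal hard decision model and not the Verilog implementation of the
project. The state numbering (newest bit in the most significant position), the
sample message and the position of the injected error are assumptions chosen
for illustration, so the data differ from those of Figure 4.6.

```python
INF = float("inf")
N_STATES = 4   # 2^(K-1) states for K = 3


def branch_output(pred, bit):
    """Encoder output pair for generators (7, 5) when `bit` enters state `pred`."""
    s1, s0 = (pred >> 1) & 1, pred & 1
    return (bit ^ s1 ^ s0, bit ^ s0)


def encode(bits):
    state, out = 0, []
    for bit in bits:
        out.append(branch_output(state, bit))
        state = (state >> 1) | (bit << 1)
    return out


def viterbi_decode(pairs):
    metrics = [0] + [INF] * (N_STATES - 1)   # only state 0 is possible at the start
    history = []                             # local winners, one list per time slot
    for received in pairs:
        new_metrics, winners = [], []
        for state in range(N_STATES):
            bit = state >> 1                       # input bit that leads into `state`
            low = (state << 1) & (N_STATES - 1)    # the two possible predecessors
            high = low | 1
            best_pred, best = high, INF
            for pred in (high, low):               # higher state first, so it wins ties
                expected = branch_output(pred, bit)
                bm = sum(r != e for r, e in zip(received, expected))
                if metrics[pred] + bm < best:
                    best_pred, best = pred, metrics[pred] + bm
            new_metrics.append(best)
            winners.append(best_pred)
        metrics = new_metrics
        history.append(winners)
    state = min(range(N_STATES), key=metrics.__getitem__)   # global winner
    decoded = []
    for winners in reversed(history):        # trace back through the local winners
        decoded.append(state >> 1)
        state = winners[state]
    return decoded[::-1]                     # restore the original bit order


message = [1, 0, 1, 1, 0, 0]                 # last two zeros flush the encoder
sent = encode(message)
received = list(sent)
received[2] = (1, 0)                         # single bit error in time slot three
print(viterbi_decode(received) == message)
```

The list reversal at the end plays the role of the FILO buffer: the bits are
recovered last-to-first during traceback and must be put back in order.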
Local winner information must be stored for five times the constraint
length. For a K=7 decoder, this results in storing history for 7 x 5 = 35 time slots. The
state of the decoder at the time instant 35 time slots earlier can then be accurately
determined. This state value is passed to Block 4. At the next time slot, all the trellis
values are shifted left to the previous time slot. The path metric for the last received
data is then calculated and the minimum error path is computed. If the global winner at
this stage is not a child of the previous global winner, the traceback path has to be
updated accordingly until the traceback state is a child of the previous state [22].
Figure 4.7 Trace back path unit
Multiple traceback paths are possible and it may be thought that traceback up to
the first bit is necessary to correctly determine the surviving path. However, it was
found that all possible paths converge within a certain distance or depth of traceback.
This information is useful as it allows the setting of a certain traceback depth beyond
which it is neither necessary nor advantageous to store path metric and other
information. This greatly reduces memory storage requirements and hence energy
consumption of the decoder. Empirical observations showed that a depth of five times
the constraint length was sufficient to ensure merging of paths. Therefore, local
winner information is stored for 35 slots (five times seven) in the decoder used for this
project.
Block 4. Data Input Determination
Now going forwards through the traceback path, the state transitions at successive
time intervals are studied and the
data bit that would have caused each transition is determined. This represents the
decoded output.
Determining Successors to a Particular State
Each state is represented by 6 shift register bits (in the case of a K=7 encoder or
decoder). The next state can
therefore be obtained by a right shift of the values of the shift registers. The first shift
register is given a value of 0. The resulting state represents the next state of the coder
if the input bit was 0. By adding 32 (1 x 2^5) to this value, the next state of the coder if
the input bit was 1 is obtained.
Determining Predecessors to a Particular State
In a similar way, the
first predecessor can be calculated, this time by a left shift of the values of the shift
registers. By adding one (1 x 2^0) to this value, the value of the second predecessor to
the state is derived.
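The successor and predecessor calculations for a K=7 coder can be sketched as
follows. The function names are hypothetical. It is assumed here that the bit
shifted out by the left shift is discarded, so that the result stays within the
6-bit state range.

```python
K = 7
STATE_BITS = K - 1              # each state is a 6-bit value
N_STATES = 1 << STATE_BITS      # 64 states


def successors(state):
    """Next states for input bit 0 and input bit 1."""
    base = state >> 1                                # right shift, first register = 0
    return base, base + (1 << (STATE_BITS - 1))      # adding 32 gives the input-1 case


def predecessors(state):
    """The two states from which `state` can be reached."""
    base = (state << 1) & (N_STATES - 1)             # left shift, drop the overflow bit
    return base, base + 1                            # adding 1 gives the second one


print(successors(0b000011))
print(predecessors(33))
```

Every state returned by predecessors(s) has s among its successors, which gives
a simple consistency check on the two functions.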
4.3.1 State Metric Storage
The block stores the partial path metric of each state at the current stage.
4.3.2 Output Generator:
This block generates the decoded output sequence. In the traceback approach, the
block incorporates combinational logic, which traces back along the survivor path and
latches the path (equivalently the decoded output sequence) to a register.
Figure 4.8 the block diagram of a general Viterbi Decoder
4.4. Encoding Mechanism
Data is coded by using a convolutional encoder, as described. It consists of a series of
shift registers and an associated combinatorial logic. The combinatorial logic is
usually a series of exclusive-or gates. The conventional encoder K=7, (171,133) is
used for the purpose of this project. The octal numbers 171 and 133 when represented
in binary form correspond to the connection of the shift registers to the upper and
lower exclusive-or gates respectively. Figure 4.9 represents this convolutional encoder
that will be used for the project. The encoder consists of a series of XOR gates for the
mechanism of encoding.
Figure 4.9: Rate=1/2 k=7, (171,133) Convolution Encoder
4.5. Decoding Mechanism
There are two main mechanisms by which Viterbi decoding may be
carried out namely, the Register Exchange mechanism and the Traceback mechanism.
Register exchange mechanisms, as explained by Ranpara and Sam Ha, store the
partially decoded output sequence along the path. The advantage of this approach is
that it eliminates the need for traceback and hence reduces latency. However, at each
stage, the contents of each register need to be copied to the next stage. This makes
the hardware complex and more energy consuming than the traceback mechanism.
Traceback mechanisms use a single bit to indicate whether the survivor branch came
from the upper or lower path. This information is used to trace back the surviving path
from the final state to the initial state. This path can then be used to obtain the
decoded sequence. Traceback mechanisms prove to be less energy consuming and
will hence be the approach followed in this project.
Decoding may be done using either hard decision inputs or soft
decision inputs. Inputs that arrive at the receiver may not be exactly zero or one.
Having been affected by noise, they will have values in between and even higher or
lower than zero and one. The values may also be complex in nature.
In the hard decision Viterbi decoder, each input that arrives at the
receiver is converted into a binary value (either 0 or 1). In the soft decision Viterbi
decoder, several levels are created and the arriving input is categorized into a level
that is closest to its value. If the possible values are split into 8 decision levels, these
levels may be represented by 3 bits and this is known as a 3 bit Soft decision.
This project uses a hard decision Viterbi decoder for the purpose of
developing and verifying the new energy saving algorithm. Once the algorithm is
verified, a soft decision Viterbi decoder may be used in place of the hard decision
decoder. Figure 4.10 shows the various stages required to decode data using the Viterbi
algorithm. The decoding mechanism comprises three major stages, namely the
Branch Metric Computation Unit, the Path Metric Computation and Add-Compare-Select
(ACS) Unit, and the Traceback Unit. A schematic representation of the decoder
is shown below.
Figure 4.10: Schematic representation of the Viterbi decoding block
CHAPTER-5
METHODS AND TYPES OF VITERBI DECODER
5.1 REGISTER EXCHANGE METHOD
The register exchange (RE) method is conceptually the simplest and a
commonly used technique. Because of the large power consumption and large area
required in VLSI implementations of the RE method, the traceback (TB)
method is the preferred method in the design of large constraint length, high
performance Viterbi decoders. In the register exchange method, a register assigned to each
state contains information bits for the survivor path from the initial state to the current
state. In fact, the register keeps the partially decoded output sequence along the path,
as illustrated in Figure 5.1. The register of state S1 at t=3 contains '101'. This is the
decoded output sequence along the bold path from the initial state.
Figure 5.1 Register Exchange Method
The register-exchange method eliminates the need to trace back
since the register of the final state contains the decoded output sequence. However,
this method results in complex hardware due to the need to copy the contents of all the
registers in a stage to the next stage. The survivor path information is applied to the
least significant bit of each register, and all the registers perform a shift left operation
at each stage to make room for the next bits. Hence, each register fills in the survivor
path information from the least significant bit toward the most significant bit. The
scheme is called shift update.
The shift update method is simple in implementation but causes high
switching activity due to the shift operation and, hence, results in high power
dissipation.
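The register exchange method can be sketched in software for a small K=3 (7, 5)
coder as follows. This is an illustration only and not the design used in the
project. The state numbering (newest bit in the most significant position), the
tie rule and the sample data are assumptions. Appending a bit to a Python list
stands in for the shift-left and least-significant-bit update of the hardware
register.

```python
INF = float("inf")


def register_exchange_decode(pairs):
    """Hard decision decoding of a K=3 (7, 5) code with one register per state."""
    metrics = [0, INF, INF, INF]
    registers = [[] for _ in range(4)]       # partially decoded sequence per state
    for received in pairs:
        new_metrics, new_registers = [], []
        for state in range(4):
            bit = state >> 1                 # input bit that leads into `state`
            low = (state << 1) & 3           # the two possible predecessors
            candidates = []
            for pred in (low | 1, low):      # higher predecessor first (wins ties)
                s1, s0 = pred >> 1, pred & 1
                expected = (bit ^ s1 ^ s0, bit ^ s0)
                bm = sum(r != e for r, e in zip(received, expected))
                candidates.append((metrics[pred] + bm, pred))
            best, survivor = min(candidates, key=lambda c: c[0])
            new_metrics.append(best)
            # Copy the survivor's register to this state and add the new bit.
            new_registers.append(registers[survivor] + [bit])
        metrics, registers = new_metrics, new_registers
    winner = min(range(4), key=metrics.__getitem__)
    return registers[winner]                 # no traceback is needed


# Code word for the message 1 0 1 1 0 0, with one bit error in the third pair.
received = [(1, 1), (1, 0), (1, 0), (0, 1), (0, 1), (1, 1)]
print(register_exchange_decode(received))
```

The sketch makes the cost visible: every register is rewritten at every stage,
which is the source of the high switching activity mentioned above.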
5.2 Trace back mechanism
Register exchange mechanisms, as explained by Ranpara and Sam Ha, store the
partially decoded output sequence along the path. The advantage of this approach is
that it eliminates the need for traceback and hence reduces latency. However, at each
stage, the contents of each register need to be copied to the next stage. This makes
the hardware complex and more energy consuming than the traceback mechanism.
Traceback mechanisms use a single bit to indicate whether the survivor branch came
from the upper or lower path. This information is used to trace back the surviving path
from the final state to the initial state. This path can then be used to obtain the
decoded sequence. Traceback mechanisms prove to be less energy consuming and
will hence be the approach followed in this project.
5.3 TYPES OF VITERBI DECODING
In order to realize a certain coding scheme, a suitable measure of similarity or
distance metric between two code words is vital. The two important metrics used to
measure the distance between two code words are the Hamming distance and the
Euclidean distance. The decoder adopts one or the other depending on the code scheme,
required accuracy, channel characteristics and demodulator type.
5.3.1 HARD DECISION VITERBI DECODING
In hard-decision decoding, the path through the trellis is determined using
the Hamming distance measure. Thus, the optimal path through the trellis is the
path with the minimum Hamming distance. The Hamming distance can be defined as
the number of bits that differ between the observed symbol at the decoder and the
sent symbol from the encoder. Furthermore, hard-decision decoding applies
one-bit quantization to the received bits. Hard decision decoding takes a stream of bits, say
from the 'threshold detector' stage of a receiver, where each bit is considered
definitely one or zero. For example, for binary signalling, received pulses are sampled and the
resulting voltages are compared with a single threshold. If a voltage is greater than
the threshold, it is considered to be definitely a 'one', regardless of how close it is to
the threshold. If it is less, it is definitely a 'zero'.
5.3.2 SOFT DECISION VITERBI DECODING
Soft-decision decoding is applied for maximum likelihood
decoding when the data is transmitted over a Gaussian channel. In contrast to
hard-decision decoding, soft-decision decoding uses multi-bit quantization for
the received bits, and the Euclidean distance as a distance measure instead of the
Hamming distance. The demodulator input is now an analog waveform and is usually
quantized into different levels in order to help the decoder decide more easily.
A 3-bit quantization results in an 8-ary output. Soft decision decoding requires a
stream of 'soft bits' where we get not only the 1 or 0 decision but also an indication of
how certain we are that the decision is correct. One way of implementing this would
be to make the threshold detector generate, instead of 0 or 1, say:
000 (definitely 0), 001 (probably 0), 010 (maybe 0), 011 (guess 0),
100 (guess 1), 101 (maybe 1), 110 (probably 1), 111 (definitely 1).
We may call the last two bits 'confidence' bits. This is easy to do with seven voltage
thresholds (giving eight levels) rather than one. This helps when we anticipate errors
and have some
'forward error correction' coding built into the transmission.
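The difference between the hard and soft threshold detectors can be sketched as
follows. The function names, the nominal signal range of 0 V to 1 V and the
uniform spacing of the levels are assumptions made for illustration; a real
receiver would choose its thresholds according to the modulation and channel.

```python
def hard_decision(voltage, threshold=0.5):
    """One-bit quantization: compare the sample with a single threshold."""
    return 1 if voltage > threshold else 0


def soft_decision_3bit(voltage, low=0.0, high=1.0):
    """Three-bit quantization into eight uniform levels between low and high.

    Samples outside the nominal range are clipped to the strongest 0 or 1.
    """
    step = (high - low) / 8
    level = int((voltage - low) // step)
    return max(0, min(7, level))


for v in (-0.3, 0.49, 0.51, 0.95, 1.4):
    print(v, hard_decision(v), format(soft_decision_3bit(v), "03b"))
```

The samples 0.49 and 0.51 illustrate the point made above: the hard detector
reports a definite 0 and a definite 1, while the soft detector reports 011 and
100, the two weakest levels.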
CHAPTER-6
Applications
The Viterbi algorithm has a wide range of applications, ranging from
satellite and space communications to DNA sequence analysis and optical character
recognition.
Optical character recognition of text was investigated
by Neuhoff. The initial approach considered was to create a dictionary which
simulated vocabularies. Each time a character was read by the optical reader, it would
search the dictionary for the most likely estimate. The huge computational
and storage requirements of this approach made it impractical. However,
another approach makes use of statistical information about the language, such as
the relative frequency of letter pairs. A maximum a posteriori probability (MAP)
estimate of a word is
determined based on its probability as the output of the source model. The Viterbi
algorithm may then be used to perform this MAP sequence estimation.
An interesting application discussed by Metzner investigated among others,
the use of Viterbi decoding with soft decision to increase the probability of
successfully transmitting a data packet during a meteor burst. Since meteor trails are
made up of ionized material, these can be used for reliable communications. Some
characteristics of such meteor burst communication and descriptions of its practical
applications are detailed in. Metzner showed that convolutional codes with soft
decision were considerably better for meteor burst applications as compared to Reed-
Solomon codes.
Low power applications of the Viterbi decoder are particularly relevant to
many digital communication and recording systems today. As described by Kawokgy
and Salama, systems like these are increasingly being used in wireless applications
which, being battery operated, require low power consumption. In addition, these
systems also require processing speeds of over 100 Mbps to allow multimedia
transmission. Following this trend, many papers have been written on designing low
power Viterbi decoding algorithms targeted at next generation wireless applications,
particularly CDMA systems. Some of the energy saving ideas that have been
investigated are described in the next section.
6.1 Research Work
In mobile networks, decoding capabilities are limited by the receiver which is
a mobile handset. As such, it has limited resources of energy and computation power.
Another factor that affects wireless communication is that bandwidth is expensive.
Therefore, there is a high demand for codes that can correct errors very efficiently
while at the same time utilizing minimum energy. Hence, a lot of the past research has
been focused on how this may be achieved.
The fixed T-algorithm is an optimization of the Viterbi algorithm
which applies a pruning threshold to the accumulated path metrics of the Viterbi
decoder. Instead of storing all the survivor paths for all 2^(K-1) states, only some of the
most-likely paths are kept at every trellis stage. This results in fewer paths being
found and stored. Figure 6.1 demonstrates the result of an experiment
conducted by Henning and Chakrabarti [34] which compares normalized energy
estimates for the Viterbi and the fixed T-algorithm decoders as they vary with signal to
noise ratio (Eb/No) and code rate.
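The pruning step can be sketched as follows. The sketch assumes the common
formulation of the T-algorithm, in which a path survives only if its
accumulated metric lies within a threshold T of the best metric at that trellis
stage; the exact rule used by Henning and Chakrabarti is not given in this
report. The function name prune_paths and the sample metrics are hypothetical.

```python
def prune_paths(path_metrics, threshold):
    """Keep only the states whose path metric is within `threshold` of the best.

    path_metrics maps a state number to its accumulated path metric
    (smaller is better). Pruned states are dropped and need no storage
    or further updates at this trellis stage.
    """
    best = min(path_metrics.values())
    return {state: metric
            for state, metric in path_metrics.items()
            if metric - best <= threshold}


metrics = {0: 3, 1: 5, 2: 9, 3: 4}
print(prune_paths(metrics, threshold=2))   # state 2 is pruned
print(prune_paths(metrics, threshold=0))   # only the best state survives
```

A lower threshold keeps fewer paths and saves more energy, at the risk of
discarding the correct path when the channel is noisy.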
Figure 6.1: Normalised energy estimates for the Viterbi and fixed T-algorithm
(Tf) decoders as code rate and signal to noise ratio (Eb/No) vary.
From the graph, it is estimated that a 33% to 83% reduction in energy
consumption can be achieved when the signal to noise ratio is between 2.1 and 4 dB.
One of the other approaches taken has been to develop an adaptive T-algorithm which
adjusts parameters of the decoder based on real-time variations in signal to noise ratio
(SNR), code rate and maximum acceptable bit-error rate. The parameters adjusted are
truncation length and pruning threshold of the T-algorithm along with trace-back
memory management. Henning and Chakrabarti demonstrate in their paper how this
can achieve a potential energy reduction of 70% to 97.5% compared to Viterbi
decoding. Truncation length refers to the number of bits a path is followed back
before a decision is made on the bit that was encoded. By reducing the truncation
length, more bits can be decoded per traceback. Similarly, lowering the pruning
threshold means fewer paths need to be found and stored. Both of these measures can
reduce the number of memory accesses required by the decoder and hence reduce
energy consumption.
However, these measures may significantly reduce the error-correcting
capability of the decoder. Nevertheless, adjusting these parameters based on
real-time changes in the channel can optimize energy consumption. Figure 6.2
shows the results of an experiment conducted by Henning and Chakrabarti [34]
in which pruning threshold and truncation length are adapted to maintain the bit-error
rate below 0.0037. From the graph, it is estimated that an energy consumption reduction of
70% to 97.5% compared to the Viterbi decoder can be achieved when the signal to
noise ratio is between 2.1 and 4 dB.
However, the adaptive T-algorithm does incur additional overhead for
monitoring the real-time variations and choosing the appropriate truncation
and threshold parameters from a lookup table. Since these operations are not complex,
their energy consumption is assumed to be negligible.
Figure 6.2: Normalised energy estimates for the Viterbi and adaptive T-algorithm
(Ta) decoders as code rate and signal to noise ratio (Eb/No) vary while
maintaining bit-error rate below 0.0037.
Yet another approach, put forward by Jie Jin and Chi-Ying Tsui at the
2006 International Symposium on Low Power Electronics and Design, was to
integrate the T-algorithm with a Scarce-State-Transition (SST) decoder structure. The
SST structure first pre-decodes the received data (Rx) by performing an inverse
operation of the encoder. The pre-decoded signal (Pre-Dec) contains the original
message along with bit errors. Pre-Dec is re-encoded and XORed
with Rx, the original received data. The result is an output which consists
mainly of 0s together with the errors in the message. This output is then fed to the
Viterbi decoder, which corrects the errors. Finally, the pre-decoded data (Pre-Dec) is
added to the decoded output of the Viterbi decoder using modulo-2 addition. When
channel bit errors are few, most of the Viterbi decoder output bits are zero, which
reduces switching activity.
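The front end of the SST structure (pre-decode, re-encode, XOR with Rx) can be
sketched for a small code. The sketch assumes the K = 3, rate-1/2 code with generators
g1 = 1 + D + D^2 and g2 = 1 + D^2 rather than the K = 9 code of this project, because
for that code the inverse of the encoder is a single XOR (g1 + g2 = D). The module name
sst_front_end and all signal names are hypothetical, and hard-decision input is assumed.

```verilog
// Hypothetical SST front end for the assumed K = 3 code (g1 = 1+D+D^2, g2 = 1+D^2).
module sst_front_end (
    input  wire       clk,
    input  wire       rst_n,    // asynchronous active-low reset
    input  wire [1:0] rx,       // hard-decision received symbol {r1, r2}
    output wire       pre_dec,  // pre-decoded bit (message bit delayed by one symbol)
    output wire [1:0] err_sym   // symbol handed to the Viterbi decoder
);
    reg       p1, p2;  // previous two pre-decoded bits (re-encoder memory)
    reg [1:0] rx_d;    // received symbol delayed by one clock

    // Inverse of the encoder: since g1 + g2 = D, r1 ^ r2 recovers x[n-1].
    assign pre_dec = rx[1] ^ rx[0];

    // Re-encode the pre-decoded stream with the same generators.
    wire z1 = pre_dec ^ p1 ^ p2;
    wire z2 = pre_dec ^ p2;

    // XOR with the delayed received symbol: all zeros on an error-free channel.
    assign err_sym = {z1, z2} ^ rx_d;

    always @(posedge clk or negedge rst_n)
        if (!rst_n) begin
            p1   <= 1'b0;
            p2   <= 1'b0;
            rx_d <= 2'b00;
        end else begin
            p1   <= pre_dec;
            p2   <= p1;
            rx_d <= rx;
        end
endmodule
```

On an error-free channel err_sym stays at 00, so the Viterbi decoder behind this block
remains on the all-zero path; a channel error appears as a short burst of 1s.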
The SST structure was used to reduce the switching activity of the
decoder and was combined with the T-algorithm to reduce the average number of
Add-Compare-Select calculations. In their experiments, Jie Jin and Chi-Ying Tsui achieved
a 30% to 76% reduction in power consumption over the traditional Viterbi design for
SNR values ranging from 4 dB to 12 dB.
A different approach, investigated by Sherif Welsen Shaker, Salwa Hussein Elramly
and Khaled Ali Shehata and presented at a telecommunications forum held in Belgrade
in 2009, was to use the traceback approach with clock gating. In clock gating, the clock
of each register is enabled only when the register updates its survivor path information.
This reduces power dissipation. Their simulations showed a 30% reduction in
dynamic power dissipation, which gives a good indication of the power reduction to
expect on implementation.
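Clock gating of a survivor register can be sketched as below. This is a generic
latch-based gating pattern, not the circuit of Shaker et al.; the module name
gated_survivor_reg, the update signal and the register width are assumptions made for
the example. In a real design flow the latch and AND gate would normally be replaced by
the integrated clock-gating cell of the target library.

```verilog
// Hypothetical clock-gated survivor register; names and widths are illustrative.
module gated_survivor_reg #(
    parameter WIDTH = 8
) (
    input  wire             clk,
    input  wire             rst_n,   // asynchronous active-low reset
    input  wire             update,  // high when this register has new survivor data
    input  wire [WIDTH-1:0] d,
    output reg  [WIDTH-1:0] q
);
    reg  en_latch;
    wire gclk;

    // The enable is held in a latch that is transparent only while clk is low,
    // so the gated clock cannot glitch if update changes while clk is high.
    always @(clk or update)
        if (!clk) en_latch <= update;

    assign gclk = clk & en_latch;

    // The register sees a clock edge only in cycles where update was asserted.
    always @(posedge gclk or negedge rst_n)
        if (!rst_n) q <= {WIDTH{1'b0}};
        else        q <= d;
endmodule
```

In cycles where update is low the register receives no clock edge at all, so neither
its clock pin nor its outputs toggle, which is where the dynamic power saving comes from.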
A similar approach, investigated by Ranpara and Sam Ha and presented
at the International ASIC Conference in Washington in 1999, was the use of clock
gating in combination with a concept known as toggle filtering. Signals may arrive at
the inputs of a combinational block at different times, which causes the block to go
through several intermediate transitions before it stabilizes. By blocking early signals,
the number of intermediate transitions can be reduced and hence power dissipation can
be minimized. This mechanism of blocking early signals until all input signals arrive,
called toggle filtering, was used by Ranpara et al. to reduce the energy consumption of
the Viterbi decoder. Recently, a new approach targeted towards wireless applications
has been introduced [38]; it involves a pre-traceback architecture for the survivor
path memory unit.
The start state of decoding is obtained directly through a pointer register
pointing to the target traceback state instead of estimating the start state through a
recursive traceback operation. This approach makes use of the similarity between the bit
write and decode traceback operations to introduce the pre-traceback operation.
This is effectively a trace-forward type of operation, and it gives a 50% reduction
in survivor memory read operations. Apart from improving latency by 25%,
implementation results predict up to 11.9% better energy efficiency compared to the
conventional traceback architecture for typical wireless applications.
6.3 Low power consumption
For the branch metric of the Viterbi decoder, our design employs a soft-decision
method to improve its error-correction capability. In order to find the survivor path
efficiently, we modify the classical Viterbi decoding algorithm. The modified
algorithm is similar to the register-exchange method with lower latency, but
uses RAM instead of register banks to record the output bit-stream of the
survivor path, which lowers the power consumption of the design. The chip
occupies about 28.6 K gates in TSMC 0.18 µm CMOS technology.
The power consumption of our chip is about 19.5 mW at 100 MHz. The power usage
in the implementation is around 367 mW.
Figure 6.3: Demonstration Of Power Consumption
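A soft-decision branch metric of the kind mentioned in this section can be sketched as
follows. The text does not give the quantisation used in the design, so the sketch
assumes 3-bit symbols in which 0 is the most confident '0' and 7 is the most confident
'1'; the module name soft_dist_calc and its ports are hypothetical. The metric is the
sum, over the two code bits, of the distance between the received level and the level
the branch expects.

```verilog
// Hypothetical soft-decision branch metric; 3-bit quantisation is an assumption.
module soft_dist_calc (
    input  wire [2:0] soft_msb,       // received level for code bit 1 (0 = strong '0', 7 = strong '1')
    input  wire [2:0] soft_lsb,       // received level for code bit 0
    input  wire [1:0] branch_output,  // code bits expected on this trellis branch
    output wire [3:0] distance        // branch metric, 0 (perfect match) to 14
);
    // Distance of each received level from the level the branch expects.
    wire [2:0] d_msb = branch_output[1] ? (3'd7 - soft_msb) : soft_msb;
    wire [2:0] d_lsb = branch_output[0] ? (3'd7 - soft_lsb) : soft_lsb;

    // The 4-bit sum cannot overflow because each term is at most 7.
    assign distance = d_msb + d_lsb;
endmodule
```

For received levels 6 and 1, the branch expecting 10 gets metric 2 and the branch
expecting 01 gets metric 12, whereas a 2-bit Hamming distance could only report 0 and 2
for the same pair and would discard how confident each received bit was.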
6.4 Summary
This chapter has explained the decoding mechanism of the Viterbi decoder in detail
and described a few of its applications. A number of energy saving techniques that
have been investigated in the past have also been discussed. The next chapter gives a
detailed description of the proposed energy saving algorithm that will be used in this
project.
CHAPTER-7
SYNTHESIS AND SIMULATION RESULTS
7.1 Sample code
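Code for D flip-flop
/******************************************************/
module pDFF(DATA, QOUT, CLOCK, RESET);
/******************************************************/
parameter WIDTH = 1;
input [WIDTH-1:0] DATA;
input CLOCK, RESET;
output [WIDTH-1:0] QOUT;
reg [WIDTH-1:0] QOUT;
// Register with asynchronous active-low reset.
// The listing breaks off after "QOUT"; the assignments below are the
// conventional completion and are an assumption.
always @(posedge CLOCK or negedge RESET)
  if (~RESET) QOUT <= 0;
  else QOUT <= DATA;
endmodule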
wire [`WD_CODE-1:0]
wire [`WD_DIST-1:0] D0,D1,D2,D3,D4,D5,D6,D7;// output distances
reg [`WD_CODE-1:0] CodeRegister ;
always @( posedge Clock2 or negedge Reset)
begin
if (~Reset) CodeRegister
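Code for Hamming distance calculation
module HARD_DIST_CALC (InputSymbol, BranchOutput, OutputDistance);
// desc. : performs 2-bit Hamming distance calculation
/*-----------------------------------*/
input [`WD_CODE-1:0] InputSymbol, BranchOutput;
output [`WD_DIST-1:0] OutputDistance;
reg [`WD_DIST-1:0] OutputDistance;
wire MS, LS;
// MS and LS are 1 where the received bit differs from the branch bit
assign MS = (InputSymbol[1] ^ BranchOutput[1]);
assign LS = (InputSymbol[0] ^ BranchOutput[0]);
always @(MS or LS)
begin
  // The listing breaks off after "OutputDistance[1]"; the two lines below
  // (distance = MS + LS as a 2-bit sum) are an assumed completion.
  OutputDistance[1] = MS & LS;
  OutputDistance[0] = MS ^ LS;
end
endmodule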
assign wA = PolyA & BranchID;
assign wB = PolyB & BranchID;
// Each output bit is the XOR (parity) of the tapped bits
always @(wA or wB)
begin
  EncOut[1] = (((wA[0]^wA[1]) ^ (wA[2]^wA[3])) ^ ((wA[4]^wA[5]) ^ (wA[6]^wA[7])) ^ wA[8]);
  EncOut[0] = (((wB[0]^wB[1]) ^ (wB[2]^wB[3])) ^ ((wB[4]^wB[5]) ^ (wB[6]^wB[7])) ^ wB[8]);
end
Code for convolutional encoder
/******************************************************/
module viterbi_encode9(X, Y, Clock, Reset);
/******************************************************/
input X, Clock, Reset;
output [1:0] Y;
wire [1:0] Y;
wire X, Clock, Reset;
wire [1:0] Yt;
wire [8:0] PolyA, PolyB;
wire [8:0] wA, wB, ShReg;
// Generator polynomials for K = 9. The listing assigns PolyA and PolyB twice,
// which would create conflicting drivers; the first pair is kept as a comment
// on the assumption that the second (bit-reversed) pair is the active one.
// assign PolyA = 9'b111_101_011;
// assign PolyB = 9'b101_110_001;
assign PolyA = 9'b110_101_111;
assign PolyB = 9'b100_011_101;
assign wA = PolyA & ShReg;
assign wB = PolyB & ShReg;
// 9-bit shift register: ShReg[8] is the newest input bit
assign ShReg[8] = X;
pDFF dff7(ShReg[8], ShReg[7], Clock, Reset);
pDFF dff6(ShReg[7], ShReg[6], Clock, Reset);
pDFF dff5(ShReg[6], ShReg[5], Clock, Reset);
pDFF dff4(ShReg[5], ShReg[4], Clock, Reset);
pDFF dff3(ShReg[4], ShReg[3], Clock, Reset);
pDFF dff2(ShReg[3], ShReg[2], Clock, Reset);
pDFF dff1(ShReg[2], ShReg[1], Clock, Reset);
pDFF dff0(ShReg[1], ShReg[0], Clock, Reset);
// Each output bit is the parity of the tapped shift-register bits
assign Yt[1] = wA[0] ^ wA[1] ^ wA[2] ^ wA[3] ^ wA[4] ^ wA[5] ^
               wA[6] ^ wA[7] ^ wA[8];
assign Yt[0] = wB[0] ^ wB[1] ^ wB[2] ^ wB[3] ^ wB[4] ^ wB[5] ^
               wB[6] ^ wB[7] ^ wB[8];
// Register the two encoder outputs
pDFF dffy1(Yt[1], Y[1], Clock, Reset);
pDFF dffy0(Yt[0], Y[0], Clock, Reset);
endmodule
Code for Viterbi decoder
// Module : VITERBIDECODER
// File : decoder.v
// Description : Top Level Module of Viterbi Decoder
module VITERBIDECODER (Reset, CLOCK, Active, Code, DecodeOut);
input Reset, CLOCK, Active;
input [`WD_CODE-1:0] Code;
output DecodeOut;
wire [`WD_DIST*2*`N_ACS-1:0] Distance; // BMG Output
wire [`WD_FSM-1:0] ACSSegment; //
wire [`WD_DEPTH-1:0] ACSPage; // Control Output
wire CompareStart, Hold, Init; //
wire [`N_ACS-1:0] Survivors; // ACS Output
wire [`WD_STATE-1:0] LowestState;
wire TB_EN;
wire RAMEnable;
wire ReadClock, WriteClock, RWSelect;
wire [`WD_RAM_ADDRESS-1:0] AddressRAM; // RAM Address Bus,
// generated by TBU and ACSU
wire [`WD_RAM_DATA-1:0] DataRAM; // RAM Data Bus
wire [`WD_RAM_DATA-1:0] DataTB;
wire [`WD_RAM_ADDRESS-`WD_FSM-1:0] AddressTB;
wire Clock1, Clock2;
// for metric memory connection
wire [`WD_METR*2*`N_ACS-1:0] MMPathMetric;
wire [`WD_METR*`N_ACS-1:0] MMMetric;
wire [`WD_FSM-2:0] MMReadAddress;
wire [`WD_FSM-1:0] MMWriteAddress;
wire MMBlockSelect;
// instantiation of Viterbi Decoder Modules
CONTROL ctl (Reset, CLOCK, Clock1, Clock2, ACSPage, ACSSegment,
  Active, CompareStart, Hold, Init, TB_EN);
BMU bmu (Reset, Clock2, ACSSegment, Code, Distance);
ACSUNIT acs (Reset, Clock1, Clock2, Active, Init, Hold, CompareStart,
  ACSSegment, Distance, Survivors, LowestState,
  MMReadAddress, MMWriteAddress, MMBlockSelect, MMMetric,
  MMPathMetric);
MMU mmu (CLOCK, Clock1, Clock2, Reset, Active, Hold, Init, ACSPage,
  ACSSegment[`WD_FSM-1:1], Survivors,
  DataTB, AddressTB,
  RWSelect, ReadClock, WriteClock, RAMEnable, AddressRAM,
  DataRAM);
TBU tbu (Reset, Clock1, Clock2, TB_EN, Init, Hold, LowestState
Code for main memory unit
// Module : MMU
// Description : Description of MMU Unit in Viterbi Decoder
module MMU (CLOCK, Clock1, Clock2, Reset, Active, Hold, Init, ACSPage,
  ACSSegment_minusLSB, Survivors,
  DataTB, AddressTB,
  RWSelect, ReadClock, WriteClock,
  RAMEnable, AddressRAM, DataRAM);
// connection from Control
input CLOCK, Clock1, Clock2, Reset, Active, Hold, Init;
input [`WD_DEPTH-1:0] ACSPage;
input [`WD_FSM-2:0] ACSSegment_minusLSB;
// connection from ACS Unit
input [`N_ACS-1:0] Survivors;
// connection from/to TB Unit
output [`WD_RAM_DATA-1:0] DataTB;
input [`WD_RAM_ADDRESS-`WD_FSM-1:0] AddressTB;
// connection from/to RAM
output RWSelect, ReadClock, WriteClock, RAMEnable;
output [`WD_RAM_ADDRESS-1:0] AddressRAM;
inout [`WD_RAM_DATA-1:0] DataRAM;
wire [`WD_RAM_DATA-1:0] WrittenSurvivors;
reg dummy, SurvRDY;
reg [`WD_RAM_ADDRESS-1:0] AddressRAM;
reg [`WD_DEPTH-1:0] TBPage;
wire [`WD_DEPTH-1:0] TBPage_;
wire [`WD_DEPTH-1:0] ACSPage;
wire [`WD_TB_ADDRESS-1:0] AddressTB;
// Read and Write clock
// Dummy variable used because Write Clock only occurs every 2 Clocks.
always @(posedge Clock2 or negedge Reset)
if (~Reset) dummy
// every negedge Clock2 : - TBPage is decreased by 1, OR
//                        - When Init is Active, TBPage equals ACSPage - 1
always @(negedge Clock2 or negedge Reset)
begin
if (~Reset) begin
TBPage
end
end
endmodule
module ACSSURVIVORBUFFER (Reset, Clock1, Active, SurvRDY, Survivors,
  WrittenSurvivors);
//
// To accommodate the use of the 8-bit wide RAM data bus, the Survivors
// (only 4 on every clock) must be buffered first.
/*-----------------------------------*/
input Reset, Clock1, Active, SurvRDY;
input [`N_ACS-1:0] Survivors;
output [`WD_RAM_DATA-1:0] WrittenSurvivors;
wire [`WD_RAM_DATA-1:0] WrittenSurvivors;
reg [`N_ACS-1:0] WrittenSurvivors_;
// Nonblocking assignments are used because this is a clocked register
always @(posedge Clock1 or negedge Reset)
begin
if (~Reset) WrittenSurvivors_ <= 0;
else if (Active)
WrittenSurvivors_ <= Survivors;
end
Code for ACS unit (add, compare and select)
// Module : ACSUNIT
// File : acs.v
// Description : Description of ACS Unit in Viterbi Decoder
module ACSUNIT (Reset, Clock1, Clock2, Active, Init, Hold, CompareStart,
  ACSSegment, Distance, Survivors, LowestState,
  MMReadAddress, MMWriteAddress, MMBlockSelect, MMMetric,
  MMPathMetric);
/*-----------------------------------*/
// ACS UNIT consists of :
// - 4 ACS modules (ACS)
// - RAM Interface
// - State with smallest metric finder (LOWESTPICK)
/*-----------------------------------*/
input Reset , Clock1, Cloc