04 32 bit loss less comp.doc
7/27/2019 04 32 bit loss less comp.doc
CHAPTER 1
Introduction to VLSI
1.1 Historical perspective:
The electronics industry has achieved phenomenal growth over the last two decades, mainly due to rapid advances in integration technologies and large-scale systems design; in short, due to the advent of VLSI. The number of applications of integrated circuits in high-performance computing, telecommunications, and consumer electronics has been rising steadily, and at a very fast pace.
Typically, the required computational power (or, in other words, the intelligence) of these applications
is the driving force for the fast development of this field. The current leading-edge technologies (such
as low bit-rate video and cellular communications) already provide the end-users a certain amount of
processing power and portability. This trend is expected to continue, with very important implications
on VLSI and systems design. One of the most important characteristics of information services is their
increasing need for very high processing power and bandwidth (in order to handle real-time video, for
example). The other important characteristic is that the information services tend to become more and
more personalized (as opposed to collective services such as broadcasting), which means that the
devices must be more intelligent to answer individual demands, and at the same time they must be
portable to allow more flexibility/mobility. As more and more complex functions are required in
various data processing and telecommunications devices, the need to integrate these functions in a
small system/package is also increasing. The level of integration, as measured by the number of logic gates in a monolithic chip, has been steadily rising for almost three decades, mainly due to rapid progress in processing and interconnect technology.
1.2 Advantages of IC:
The most important message here is that the logic complexity per chip has been (and still is) increasing
exponentially. The monolithic integration of a large number of functions on a single chip usually
provides
Less area/volume and, therefore, compactness
Less power consumption
Fewer testing requirements at system level
Higher reliability, mainly due to improved on-chip interconnects
Higher speed, due to significantly reduced interconnection length
Significant cost savings
1.3 Levels Of ICs:
Digital circuits are constructed with integrated circuits. An integrated circuit (IC) is a small
silicon semiconductor crystal, called a chip, containing the electronic components for the digital gates.
The various gates are interconnected inside the chip to form the circuit.
Digital ICs are categorized according to their circuit complexity, as measured by the number of logic
gates in a single package:
Small Scale Integration (SSI).
Medium Scale Integration (MSI)
Large Scale Integration (LSI)
Very Large Scale Integration (VLSI).
1.4 Classification of ICs by device count:
Nomenclature   Active Device Count   Functions                                  Technology
SSI            1-100                 Gates, op-amps, many linear applications   Bipolar
MSI            100-1,000             Registers, filters, etc.                   Bipolar (TTL, ECL)
LSI            1,000-10,000          Microprocessors                            MOS: NMOS, PMOS
VLSI           100,000-1,000,000     Memories, computers, signal processors     CMOS
Very Large Scale Integration:
A VLSI chip is a microelectronic chip which contains billions of physical components, or millions of logical
components, integrated (embedded) on an IC.
The feature size (physical dimension) of a component placed on a VLSI chip is measured in microns;
for example, a CMOS IC fabricated with Very Deep Sub-Micron (VDSM) technology has a feature size of 0.09 micron.
1.5 VLSI Design Flow:
1. Design Specification:
The first step in the high-level design flow is the design specification process, which involves
specifying the behavior expected of the final design. The specification describes the expected function
and behavior of the design using textual descriptions and graphic elements.
2. Behavioral Description:
A behavioral description is created to analyze the functionality and the algorithm; its performance
and compliance with standards are then verified.
VLSI Design Flow:
Design Specification
Behavioral Description
RTL Description (VHDL)
Functional Verification & Testing
Logic Synthesis
Gate-Level Netlist
Logic Verification & Testing
Floor Planning, Automatic Placement & Routing
Physical Layout
3. RTL Description (VHDL):
Once the algorithm is scrutinized, the RTL code is written with the required functionality and its
synthesizability in mind. The RTL description can be written at the gate, dataflow, or behavioral level. A
standard VHDL simulator can be used to read the RTL description and verify the correctness of the
design.
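As an illustration (not part of the original project), a minimal behavioral RTL description might look like the following sketch of a 2-to-1 multiplexer; the entity name mux2 and its ports are chosen for this example only:

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical example: a 2-to-1 multiplexer at the behavioral RTL level.
entity mux2 is
  port ( a, b : in  std_logic;
         sel  : in  std_logic;
         y    : out std_logic );
end entity mux2;

architecture rtl of mux2 is
begin
  -- Conditional signal assignment: y follows a when sel is '0', else b.
  y <= a when sel = '0' else b;
end architecture rtl;
```

A synthesis tool can map such a description directly to gates, while a simulator can execute it against test vectors.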
4. Functional Verifying & Testing:
The VHDL simulator reads the VHDL description, compiles it into an internal format, and then
executes the compiled format using test vectors. If any syntax errors are found during compilation, they have
to be removed and the design recompiled. After analyzing the results of the simulation, stimulus for the design has
to be added. This may be a file of input stimulus or a file of output stimulus; using a
waveform editor, the respective output waveforms are observed to test the functionality of the
design.
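A sketch of how such stimulus might be applied in a simple testbench, assuming the hypothetical 2-to-1 multiplexer entity mux2 (ports a, b, sel, y) is available in the work library:

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity mux2_tb is
end entity mux2_tb;

architecture sim of mux2_tb is
  signal a, b, sel, y : std_logic;
begin
  -- Instantiate the unit under test (assumed entity).
  uut : entity work.mux2 port map (a => a, b => b, sel => sel, y => y);

  stimulus : process
  begin
    a <= '1'; b <= '0';
    sel <= '0'; wait for 10 ns;
    assert y = '1' report "mux selected wrong input" severity error;
    sel <= '1'; wait for 10 ns;
    assert y = '0' report "mux selected wrong input" severity error;
    wait;  -- suspend the stimulus process forever
  end process stimulus;
end architecture sim;
```

The assert statements check the expected outputs automatically; the same waveforms can also be inspected in a waveform editor.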
5. Logic Synthesis:
Once the code is validated, VHDL synthesis tools are used to implement the design. The
goal of the VHDL synthesis step is to create a design that implements the required functionality and
meets the constraints provided. The logic synthesis tool converts the given RTL code into an optimized
gate-level netlist.
6. Gate Level:
A gate-level netlist is a description of the design (circuit) in terms of gates and the connections
between them. The gate-level netlist is the input to the automatic place and route tool.
7. Logic Verification &Testing:
The VHDL synthesis tool reports syntax and synthesis errors, and issues warnings if it finds
mismatches between the RTL simulation results and the output netlist simulation results. If the design is
error free, the next step is to map the design.
8. Floor Planning, Automatic Placement and Routing:
Place and route tools take the design netlist and implement the design on the target
technology device.
9. Physical Layout:
In this step, each component or primitive from the netlist is placed on the target device according to
the design architecture. The signals between modules are then routed to form the
physical layout.
1.6 INTRODUCTION TO VHDL
1.6.1 What is VHDL?
VHDL stands for VHSIC Hardware Description Language, where VHSIC stands for Very High
Speed Integrated Circuit. As the name implies, VHDL is a language for describing the behavior of
digital hardware: just another way of describing what outputs of a digital circuit are desired
when it is given certain inputs. The critical difference between VHDL and other design-description
methods is that VHDL can be readily interpreted by software, enabling the computer to accomplish much of
your design work for you.
As the size and complexity of digital systems increase, more computer-aided design tools are
introduced into the hardware design process. The early paper-and-pencil design methods have given
way to sophisticated design entry, verification, and automatic hardware generation tools. The newest
addition to this design methodology is the hardware description language (HDL). Although the
concept of HDLs is not new, their widespread use in digital system design is no more than a decade old.
Based on HDLs, new digital system CAD tools have been developed and are now being utilized by
hardware designers.
1.6.2 VHDL History:
In 1980 the US government developed the Very High Speed Integrated Circuit (VHSIC) project
to enhance the electronic design process, technology, and procurement, spawning development of many
advanced integrated circuit (IC) process technologies. This was followed by the arrival of VHSIC
Hardware Description Language (VHDL).
1.6.3 Why We Use VHDL?
There are many reasons why it makes good design sense to use VHDL:
1. Portability:
Technology changes so quickly in the digital industry that discrete digital devices require
constant rework in order to remain current. VHDL is designed to be device-independent, meaning that
if you describe your circuit in VHDL, as opposed to designing it with discrete devices, changing
hardware becomes a (relatively) trivial process.
2. Flexibility:
Most working engineers can recall a situation where they felt frustrated with their customer,
supervisor, or team members because the design specification that they were working with was
constantly changing. Sometimes these changes can't be helped. Design work is usually focused on
creating small, easily maintainable components and then integrating these components into a larger
device. On larger projects different teams of engineers will each design separate parts of the project at
the same time. This can mean that if one component in the project changes, all of the components must
change, even those being worked on by other engineering teams. Suppose you were told to design a
simple counter that set an output bit after it had counted to 100. However, the software engineer
working on this project discovered that the entire design could be radically simplified if your counter
could count down from 300 instead of up to 100. If you had implemented your design in discrete
circuits, you'd have to start over from scratch. But, if you'd designed using VHDL, all you'd have to do
is change your code.
1.6.4 VHDL Features:
General features:
VHDL can be used for design documentation, high level design, simulation, synthesis, and
testing of hardware and as a driver for a physical design tool.
1. Concurrency:
In VHDL the transfer statements, descriptions of components, and instantiations of gates or
logical units can all be executed such that in the end they appear to have been executed
simultaneously.
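As a brief illustration (an example added here, not from the original text), the two assignments in the full adder below are concurrent: each re-executes whenever a signal on its right-hand side changes, regardless of the order in which they are written.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity full_adder is
  port ( a, b, cin : in  std_logic;
         sum, cout : out std_logic );
end entity full_adder;

architecture concurrent of full_adder is
begin
  -- Concurrent signal assignments: textual order does not matter.
  sum  <= a xor b xor cin;
  cout <= (a and b) or (cin and (a xor b));
end architecture concurrent;
```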
2. Support for design hierarchy:
In VHDL the operation of a system can be specified based on its functionality, or it can be
specified structurally in terms of its smaller subcomponents.
3. Library support:
User and system defined primitives and descriptions reside in the library system. VHDL
provides a mechanism for accessing various libraries. Moreover different designers can access these libraries.
4. Sequential statement:
VHDL provides mechanism for executing sequential statements. These statements provide an
easy method for modeling hardware components based on their functionality. Sequential or
procedural capability is only for convenience, and the overall structure of the VHDL language
remains highly concurrent.
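A short sketch of sequential statements (an illustrative example, not from the original text): inside a process, statements execute in order, even though the process itself runs concurrently with the rest of the design. The D flip-flop below models storage behaviorally.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity dff is
  port ( clk, rst, d : in  std_logic;
         q           : out std_logic );
end entity dff;

architecture behav of dff is
begin
  -- The if/else statements inside this process execute sequentially.
  process (clk)
  begin
    if rising_edge(clk) then
      if rst = '1' then
        q <= '0';   -- synchronous reset
      else
        q <= d;     -- capture input on the rising clock edge
      end if;
    end if;
  end process;
end architecture behav;
```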
5. Type declaration and usage:
VHDL is not limited to just bit or boolean types, but it also supports integer, floating-point,
enumerated types and user-defined types. In addition, VHDL also allows array-type declarations
and composite-type definitions.
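The type system described above can be sketched as follows; the package and type names here are hypothetical, chosen only to illustrate enumerated, array, and composite declarations:

```vhdl
library ieee;
use ieee.std_logic_1164.all;

package example_types is
  -- User-defined enumerated type (e.g. for a state machine).
  type state_t is (IDLE, MATCH, OUTPUT);

  -- Array type: four 8-bit words.
  type byte_arr is array (0 to 3) of std_logic_vector(7 downto 0);

  -- Composite (record) type with constrained integer fields.
  type pixel_t is record
    r, g, b : integer range 0 to 255;
  end record;
end package example_types;
```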
6. Use of subprograms:
VHDL allows the use of functions and procedures which can be used in type conversions, logic
unit definitions, operator redefinitions, new operation definitions, and other applications.
7. Timing control:
VHDL allows the designer to schedule values to signals and delay the actual assignment of
values until a later time. It also allows the use of any number of explicitly defined clock signals. It
provides features for edge detection, delay specification, setup and hold time specification, pulse
width checking, and setting various time constraints.
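A minimal sketch of these timing features (illustrative only): the process below detects a clock edge and schedules the output assignment with an explicit delay.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity timing_demo is
  port ( clk, d : in  std_logic;
         q      : out std_logic );
end entity timing_demo;

architecture behav of timing_demo is
begin
  process (clk)
  begin
    if rising_edge(clk) then   -- edge detection on an explicit clock
      q <= d after 2 ns;       -- value scheduled now, assigned 2 ns later
    end if;
  end process;
end architecture behav;
```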
8. Structural specification:
VHDL allows the designer to describe a generic 1-bit design and use it when describing
multibit regular structures in one or more dimensions.
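For example, a generic 1-bit cell can be replicated across one dimension with a generate statement. This sketch assumes a hypothetical 1-bit storage entity dff (ports clk, rst, d, q) is available in the work library:

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity reg_n is
  generic ( WIDTH : natural := 8 );
  port ( clk : in  std_logic;
         d   : in  std_logic_vector(WIDTH-1 downto 0);
         q   : out std_logic_vector(WIDTH-1 downto 0) );
end entity reg_n;

architecture structural of reg_n is
begin
  -- Replicate the assumed 1-bit cell WIDTH times in one dimension.
  gen_bits : for i in 0 to WIDTH-1 generate
    bit_cell : entity work.dff
      port map (clk => clk, rst => '0', d => d(i), q => q(i));
  end generate gen_bits;
end architecture structural;
```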
1.7 Advantages of VHDL:
VHDL offers the following advantages for digital design:
1. Standard:
VHDL is an IEEE standard. Just like any standard (such as the X Window graphics standard,
bus communication interface standards, high-level programming languages, and so on), it reduces
confusion and makes interfaces between tools, companies, and products easier. Any development
based on the standard has a better chance of lasting longer and less chance of becoming
obsolete due to incompatibility with others.
2. Government support:
VHDL is a result of the VHSIC program; hence, it is clear that the US government supports
the VHDL standard for electronic procurement. The Department of Defense (DOD) requires
contractors to supply VHDL for all Application Specific Integrated Circuit (ASIC) designs.
3. Industry support:
With the advent of more powerful and efficient VHDL tools has come the growing support
of the electronic industry. Companies use VHDL tools not only with regard to defense contracts,
but also for their commercial designs.
4. Portability:
The same VHDL code can be simulated and used in many design tools and at different
stages of the design process. This reduces dependency on a set of design tools whose limited
capability may not be competitive in later markets. The VHDL standard also makes design
data much easier to transfer than the design database of a proprietary design tool.
5. Modeling capability:
VHDL was developed to model all levels of designs, from electronic boxes to transistors.
VHDL can accommodate behavioral constructs and mathematical routines that describe complex
models, such as queuing networks and analog circuits. It allows multiple architectures to be
associated with the same design during various stages of the design process. VHDL can describe
everything from low-level transistors up to very large systems.
6. Reusability:
Certain common designs can be described, verified, and modified slightly in VHDL for
future use. This eliminates reading and marking changes to schematic pages, which is time
consuming and subject to error. For example, a parameterized multiplier VHDL code can be
reused easily by changing the width parameter so that the same VHDL code can do either 16 by
16 or 12 by 8 multiplication.
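The parameterized multiplier mentioned above might be sketched as follows (an illustrative example, not the code of the original project); the entity name and generics are assumptions:

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity mult is
  generic ( A_WIDTH : natural := 16;
            B_WIDTH : natural := 16 );
  port ( a : in  unsigned(A_WIDTH-1 downto 0);
         b : in  unsigned(B_WIDTH-1 downto 0);
         p : out unsigned(A_WIDTH+B_WIDTH-1 downto 0) );
end entity mult;

architecture rtl of mult is
begin
  -- Changing the width generics reuses the same code
  -- for 16x16, 12x8, or any other operand sizes.
  p <= a * b;
end architecture rtl;
```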
7. Technology and foundry independence:
The functionality and behavior of the design can be described with VHDL and verified,
making it foundry and technology independent. This frees the designer to proceed without having
to wait for the foundry and technology to be selected.
8. Documentation:
VHDL allows the design and its documentation to be kept in a single place by embedding the
documentation in the code. Combining comments with the code that actually dictates what the design
should do reduces the ambiguity between specification and implementation.
9. New design methodology:
Using VHDL and synthesis creates a new methodology that increases the design
productivity, shortens the design cycle, and lowers costs. It amounts to a revolution comparable to
that introduced by the automatic semi-custom layout synthesis tools of the last few years.
Synthesis, in the domain of digital design, is a process of translation and optimization. For
example, layout synthesis is a process of taking a design netlist and translating it into a form of
data that facilitates placement and routing, optimizing timing and/or chip size. Logic
synthesis, on the other hand, is the process of taking a form of input (VHDL), translating it into an
intermediate form (Boolean equations, specific to the synthesis tool), and then optimizing in terms of
propagation delay and/or area. After the VHDL code is translated into an internal form, the
optimization process can be performed based on constraints such as speed, area, and power.
CHAPTER 2
INTRODUCTION TO LOSSLESS COMPRESSION
2.1. Objective
With the increase in silicon densities, it is becoming feasible for multiple compression
systems to be implemented in parallel on a single chip. A 32-bit system with distributed memory
architecture is based on having multiple data compression and decompression engines working
independently on different data at the same time. This data is stored in memory distributed to each
processor. The objective of the project is to design a lossless parallel data compression system which
operates at high speed to achieve a high compression rate. Using a parallel architecture of compressors
significantly improves the data compression rate, and the parallel architecture is inherently scalable.
The main parts of the system are the two XMatchPro-based data compressors in
parallel and the control blocks providing control signals for the data compressors, allowing
appropriate control of the routing of data into and from the system. Each data compressor can process
four bytes of data into and from a block of data every clock cycle. The data entering the system
needs to be clocked in at a rate of 4n bytes every clock cycle, where n is the number of
compressors in the system. This is to ensure that adequate data is present for all compressors to process
rather than being in an idle state.
2.2. Goal of the Thesis
To achieve higher compression rates using a 32-bit compression/decompression architecture with
the least increase in latency.
2.3. LITERATURE SURVEY
2.3.1. Compression Techniques
At present there is an insatiable demand for ever-greater bandwidth in communication networks
and for ever-greater storage capacity in computer systems. This has led to the need for efficient
compression techniques. Compression is the process required either to reduce the volume of
information to be transmitted (text, fax, and images) or to reduce the bandwidth required for its
transmission (speech, audio, and video). The compression technique is applied to the source
information prior to its transmission. Compression algorithms can be classified into two types, namely
Lossless Compression
Lossy Compression
2.3.1.1. Lossless Compression
In lossless compression, the aim is to reduce the amount of source
information to be transmitted in such a way that, when the compressed information is
decompressed, there is no loss of information. Lossless compression is therefore said to be
reversible; i.e., data is not altered or lost in the process of compression or decompression.
Decompression generates an exact replica of the original object. The various lossless compression
techniques are:
Packbits encoding
CCITT Group 3 1D
CCITT Group 3 2D
Lempel-Ziv-Welch (LZW) algorithm
Huffman
Arithmetic
Example applications of lossless compression are the transfer of data over a network as a
text file (since, in such applications, it is normally imperative that no part of the source
information is lost during either the compression or decompression operations), file storage
systems (tapes, hard disk drives, solid state storage, file servers), and communication networks
(LAN, WAN, wireless).
2.3.1.2. Lossy Compression
The aim of lossy compression algorithms is normally not to reproduce an exact copy
of the source information after decompression, but rather a version of it that is perceived by the recipient
as a true copy.
The Lossy compression algorithms are:
JPEG (Joint Photographic Expert Group)
MPEG (Moving Picture Experts Group)
CCITT H.261 (Px64)
Example applications of lossy compression are the transfer of digitized images and of audio
and video streams. In such cases, the sensitivity of the human eye or ear is such that any fine details
that may be missing from the original source signal after decompression are not detectable.
2.3.1.3. Text Compression
There are three different types of text (unformatted, formatted, and hypertext), and all are
represented as strings of characters selected from a defined set. The compression algorithms applied
to text must be lossless, since the loss of just a single character could modify the meaning of a
complete string. Text compression is restricted to the use of entropy encoding and, in practice,
statistical encoding methods. There are two types of statistical encoding methods used with
text: one which uses single characters as the basis for deriving an optimum set of codewords, and the
other which uses variable-length strings of characters. Two examples of the former are the Huffman and
arithmetic coding algorithms, and an example of the latter is the Lempel-Ziv (LZ) algorithm.
The majority of work on hardware approaches to lossless parallel data compression has used an
adapted form of the dictionary-based Lempel-Ziv algorithm, in which a large number of simple
processing elements are arranged in a systolic array [1], [2], [3], [4].
2.3.2. Previous work on Lossless Compression Methods
A second Lempel-Ziv method used a content-addressable memory (CAM) capable of
performing a complete dictionary search in one clock cycle [5], [6], [7]. The search for the most
common string in the dictionary (normally the most computationally expensive operation in the
Lempel-Ziv algorithm) can be performed by the CAM in a single clock cycle, while the systolic array
method uses a much slower deep pipelining technique to implement its dictionary search.
However, compared to the CAM solution, the systolic array method has advantages in terms of
reduced hardware costs and lower power consumption, which may be more important criteria in
some situations than having faster dictionary searching. In [8], the authors show that hardware
main memory data compression is both feasible and worthwhile. The authors also describe the
design and implementation of a novel compression method, the XMatchPro algorithm. The authors
exhibit the substantial impact such memory compression has on overall system performance. The
adaptation of compression code for parallel implementation is investigated by Jiang and Jones [9].
They recommended the use of a processing array arranged in a tree-like structure. Although
compression can be implemented in this manner, implementing the decompressor's
search and decode stages in parallel hardware would greatly increase the complexity of the
design, and it is likely that these aspects would need to be implemented sequentially. An FPGA
implementation of a parallel binary arithmetic coding architecture that is able to process 8 bits
per clock cycle compared to the standard 1 bit per cycle is described by Stefo et al [10].
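The CAM-based single-cycle dictionary search mentioned above can be sketched in VHDL as follows. This is an illustrative example only, not the XMatchPro design itself; the entity name, word width, and dictionary depth are assumptions, and the logic that loads the dictionary is omitted.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity cam_search is
  generic ( DEPTH : natural := 16 );
  port ( search_word : in  std_logic_vector(31 downto 0);
         match       : out std_logic_vector(DEPTH-1 downto 0) );
end entity cam_search;

architecture rtl of cam_search is
  type dict_t is array (0 to DEPTH-1) of std_logic_vector(31 downto 0);
  signal dictionary : dict_t;  -- dictionary contents (loading logic omitted)
begin
  -- Every location is compared against the search word in parallel,
  -- so a full dictionary search completes combinationally, within
  -- a single clock cycle.
  gen_cmp : for i in 0 to DEPTH-1 generate
    match(i) <= '1' when dictionary(i) = search_word else '0';
  end generate gen_cmp;
end architecture rtl;
```

The parallel comparators are what distinguish a CAM from a deeply pipelined systolic-array search, at the cost of extra hardware and power.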
Although little research has been performed on architectures involving several independent
compression units working in a concurrent cooperative manner, IBM has introduced the MXT
chip [11], which has four independent compression engines operating on a shared memory area.
The four Lempel-Ziv compression engines are used to provide data throughput sufficient for
memory compression in computer servers. Adaptation of software compression algorithms to make use
of multiple-CPU systems was demonstrated by the research of Penhorn [12] and of Simpson and Sabharwal
[13]. Penhorn used two CPUs to compress data using a technique based on the Lempel-Ziv
algorithm and showed that useful compression rate improvements can be achieved, but only at
the cost of increasing the learning time for the dictionary. Simpson and Sabharwal described the
software implementation of a compression system for a multiprocessor system based on the
parallel architecture developed by Gonzalez-Smith and Storer [14].
2.3.2.1. Statistical Methods
Statistical modeling of lossless data compression systems is based on assigning values to
events depending on their probability: the higher the value, the higher the probability. The accuracy
with which this frequency distribution reflects reality determines the efficiency of the model. In
Markov modeling, predictions are made based on the symbols that precede the current symbol.
Statistical methods in hardware are restricted either to simple higher-order modeling using binary
alphabets, which limits speed, or to simple multi-symbol alphabets using zeroth-order models, which
limit compression. Binary alphabets limit speed because only a few bits (typically a single bit)
are processed in each cycle, while zeroth-order models limit compression because they can only
provide an inexact representation of the statistical properties of the data source.
2.3.2.2. Dictionary Methods
Dictionary methods try to replace a symbol or group of symbols by a dictionary location code.
Some dictionary-based techniques use simple uniform binary codes to process the information
supplied. Both software- and hardware-based dictionary models achieve good throughput and
competitive compression.
The UNIX utility compress uses the Lempel-Ziv-2 (LZ2) algorithm, and the Data
Compression Lempel-Ziv (DCLZ) family of compressors, initially invented by Hewlett-
Packard [16] and currently being developed by AHA [17], [18], also use LZ2 derivatives. Bunton
and Borriello present another LZ2 implementation in [19] that improves on the Data
Compression Lempel-Ziv method. It uses a tag attached to each dictionary location to identify which
node should be eliminated once the dictionary becomes full.
2.4. XMatchPro Based System
The lossless data compression system is a derivative of the XMatchPro algorithm, which
originates from previous research of the authors [15] and from advances in FPGA technology. The
flexibility provided by this technology is of great interest, since the chip can easily be adapted to the
requirements of a particular application. The drawbacks of some of the previous methods are
overcome by using the XMatchPro algorithm in the design. The objective is to obtain better
compression ratios while still maintaining a high throughput, so that the compression/decompression
processes do not slow the original system down.
CHAPTER 3
FUNCTIONS OF LOSSLESS COMPRESSION
3.1. BASICS OF COMMUNICATION
A sender can compress data before transmitting it and a receiver can decompress the data after
receiving it, thus effectively increasing the data rate of the communication channel. Lossless data
compression is the process of encoding a body of data into a smaller body of data that can, at a
later time, be uniquely decoded back to the original data.
Lossless compression removes redundant information from the data while they are being
transmitted or before they are stored in memory, and lossless decompression reintroduces the
redundant information to recover fully the original data. In the same way, data is
compressed before it is stored and decompressed when it is retrieved, thus increasing the effective capacity of the storage device.
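The round trip described above can be illustrated in software (this uses Python's zlib purely as a stand-in for any lossless codec; it is not the XMatchPro algorithm):

```python
import zlib

# Lossless round trip: compress, store or transmit, decompress, and
# recover the original data exactly. The size reduction is the
# "effective capacity" gain described above.
original = b"ABCDABCDABCDABCD" * 64      # highly redundant sample data
compressed = zlib.compress(original)
restored = zlib.decompress(compressed)

assert restored == original              # lossless: bit-exact recovery
assert len(compressed) < len(original)   # smaller body of data
```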
3.2. Proposed Method
In [1], the author discusses a parallel algorithm that can be implemented for
high-speed data compression. The authors give the basic idea of how data compression is
carried out using the Lempel-Ziv algorithm and how it can be altered to parallelise the
algorithm. The author describes the Lempel-Ziv algorithm as a very efficient universal data
compression technique, based upon an incremental parsing technique, which maintains codebooks
of parsed phrases at the transmitter and at the receiver. An important feature of the algorithm is
that it is not necessary to determine a model of the source that generates the data. According to the
author, in an attempt to increase the speed of the algorithm on general-purpose processors, the
algorithm has been parallelised to run on two processors.
3.3. Background
The author explains a novel architecture for a high-performance lossless data compressor
that is organized around a selectively shiftable content-addressable memory, which permits full
matching; the processor offers very high performance with good compression of computer-based
data. The author also gives details about the operation, architecture and performance of
data compression techniques, and introduces the XMatchPro lossless data compressor. In [3],
the authors discuss parallelism in data compression techniques and
explain a parallel architecture for high-speed data compression. In this paper, the authors
present data compression as an essential component of high-speed data communication and
storage. In [4], the authors discuss the various methods of data compression, their
techniques and drawbacks, and propose a new methodology for high-speed parallel lossless
data compression. The authors describe the research and hardware implementation of a high-
performance parallel multi-compressor chip able to meet the intensive data-processing
demands of highly concurrent systems. The authors also investigate the performance of
alternative input and output routing strategies; results for realistic data sets demonstrate that the design
of parallel compression devices involves important trade-offs that affect compression performance,
latency and throughput. The compression ratio achieved by the proposed universal code uniformly
approaches the lower bounds on the compression ratios attainable by block-to-variable codes and
variable-to-block codes designed to match a completely specified source.
3.4. Usage of XMatchPro Algorithm
The Lossless Parallel Data Compression system designed uses the XMatchPro Algorithm.
The XMatchPro algorithm uses a fixed-width dictionary of previously seen data and attempts to
match the current data element with a match in the dictionary. It works by taking a 4-byte word and
trying to match or partially match this word with past data. This past data is stored in a dictionary,
which is constructed from a content addressable memory. As each entry is 4 bytes wide, several
types of matches are possible. If none of the bytes match any data present in the dictionary,
the tuple is transmitted in full with an additional miss bit. If all the bytes are matched, the match location
and match type are coded and transmitted; the match is then moved to the front of the dictionary.
The dictionary is maintained using a move-to-front strategy, whereby a new tuple is placed at the front
of the dictionary while the rest move down one position. When the dictionary becomes full, the
tuple in the last position is discarded, leaving space for a new one.
The coding function for a match is required to code several fields as follows: a zero followed
by:
1). Match location: the binary code associated with the matching location.
2). Match type: indicates which bytes of the incoming tuple have matched.
3). Literals: characters that did not match, transmitted in literal form.
A description of the XMatchPro algorithm in pseudo-code is given in the figure below.

clear the dictionary;
set the next free location (NFL) to 0;
DO
{
    read in a tuple T from the data stream;
    search the dictionary for tuple T;
    IF (full or partial hit)
    {
        determine the best match location ML and match type MT;
        output 0;
        output the codes for ML and MT;
        output any required literal characters of T;
    }
    ELSE
    {
        output 1;
        output tuple T;
    }
    IF (full hit)
    {
        move dictionary entries 0 to ML-1 down by one location;
    }
    ELSE
    {
        move all dictionary entries down by one location;
        increment NFL (if dictionary is not full);
    }
    copy tuple T to dictionary location 0;
}
WHILE (more data is to be compressed);

Fig.3.2. Pseudo-code for the XMatchPro algorithm
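The loop in Fig.3.2 can be sketched in software as follows (a behavioural model only, not the hardware design: the packed bit-level output is abstracted into symbolic records, the input length is assumed to be a multiple of 4 bytes, and "best match" is simplified to the candidate with the most matching bytes nearest the dictionary front):

```python
def xmatchpro_compress(data, dict_size=16):
    dictionary = []            # location 0 is the front (most recent)
    out = []
    for i in range(0, len(data), 4):
        t = data[i:i + 4]      # incoming 4-byte tuple
        best_loc, best_score = None, 0
        for loc, entry in enumerate(dictionary):
            score = sum(a == b for a, b in zip(t, entry))
            if score >= 2 and score > best_score:   # partial match: >= 2 bytes
                best_loc, best_score = loc, score
        if best_loc is not None:
            match_type = tuple(a == b for a, b in zip(t, dictionary[best_loc]))
            literals = bytes(a for a, m in zip(t, match_type) if not m)
            out.append(("hit", best_loc, match_type, literals))
            if all(match_type):        # full hit: dictionary does not grow
                dictionary.pop(best_loc)
        else:
            out.append(("miss", t))
        dictionary.insert(0, t)        # move-to-front
        if len(dictionary) > dict_size:
            dictionary.pop()           # discard the last tuple when full
    return out
```

For example, compressing `b"ABCDABCDABCE"` produces a miss, then a full hit at location 0, then a partial hit that carries the single literal byte `b"E"`.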
With the increase in silicon densities, it is becoming feasible for multiple XMatchPro engines to
be implemented in parallel on a single chip. A parallel system with distributed memory
architecture is based on having multiple data compression and decompression engines working
independently on different data at the same time. This data is stored in memory distributed to each
processor. There are several approaches in which data can be routed to and from the compressors that
will affect the speed, compression and complexity of the system. Lossless compression removes
redundant information from the data while they are transmitted or before they are stored in memory.
Lossless decompression reintroduces the redundant information to recover fully the original data.
There are two important contributions made by the current parallel compression and
decompression work, namely, improved compression rates and inherent scalability. Significant
improvements in data compression rates have been achieved by sharing the computational
requirement between compressors without significantly compromising the contribution made by
individual compressors. The scalability feature permits future bandwidth or storage demands to be
met by adding additional compression engines.
3.4.1. The XMatchPro based Compression system
Previous research on the lossless XMatchPro data compressor has focused on optimising
and implementing the XMatchPro algorithm for speed, complexity and compression in hardware.
The XMatchPro algorithm uses a fixed width dictionary of previously seen data and attempts to
match the current data element with a match in the dictionary. It works by taking a 4-byte word and
trying to match this word with past data. This past data is stored in a dictionary, which is constructed
from a content addressable memory.
Initially all the entries in the dictionary are empty, and 4 bytes are added to the front of the
dictionary, while the rest move one position down, if a full match has not occurred. The larger the
dictionary, the greater the number of address bits needed to identify each memory location, which can
reduce compression performance. Since the number of bits needed to code each location address is
made a function of the current dictionary size, greater compression is obtained in comparison to the case
where a fixed-size dictionary uses fixed-width address codes for a partially full dictionary.
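The address-cost argument above can be illustrated numerically (assuming simple binary location codes; the exact coding used in the hardware may differ):

```python
import math

# Bits needed to address a location in a dictionary that currently
# holds `used` entries, versus a fixed-width code sized for the
# full capacity.
def location_bits(used, capacity=64):
    adaptive = max(1, math.ceil(math.log2(max(used, 2))))
    fixed = math.ceil(math.log2(capacity))
    return adaptive, fixed
```

Early in a block only a few entries exist: with 4 entries in use, a location needs only 2 bits instead of the fixed 6 bits of a 64-entry dictionary.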
In the parallel XMatchPro system, the data stream to be compressed enters the
compression system, which is then partitioned and routed to the compressors. For parallel
compression systems, it is important to ensure all compressors are supplied with sufficient data by
managing the supply so that neither stall conditions nor data overflow occurs.
3.4.2. The Main Component- Content Addressable Memory
Dictionary-based schemes copy repetitive or redundant data into a lookup table (such as a
CAM) and output the dictionary address as a code to replace the data. The compression architecture is
based around a block of CAM to realize the dictionary. This is necessary since the search
operation must be done in parallel in all the entries in the dictionary to allow high and data-independent
throughput.
Fig.3.3. Conceptual view of CAM
The number of bits in a CAM word is usually large, with existing implementations
ranging from 36 to 144 bits. A typical CAM employs a table size ranging from a few
hundred entries to 32K entries, corresponding to an address space ranging from 7 bits to 15 bits. The
length of the CAM varies with three possible values of 16, 32 or 64 tuples trading complexity for
compression.
The number of tuples present in the dictionary has an important effect on compression. In principle,
the larger the dictionary, the higher the probability of having a match and improving
compression. On the other hand, a bigger dictionary uses more bits to code its locations, degrading
compression when processing small data blocks that only use a fraction of the dictionary length
available. The width of the CAM is fixed at 4 bytes/word. A content-addressable memory (CAM)
compares input search data against a table of stored data and returns the address of the matching
data. CAMs have a single-clock-cycle throughput, making them faster than other hardware- and
software-based search systems.
The input to the system is the search word, which is broadcast onto the search lines to the table
of stored data. Each stored word has a matchline that indicates whether the search word and
stored word are identical (the match case) or are different (a mismatch case, or miss). The matchlines
are fed to an encoder that generates a binary match location corresponding to the matchline that is
in the match state. An encoder is used in systems where only a single match is expected. The overall
function of a CAM is to take a search word and return the matching memory location.
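The search-and-encode behaviour can be sketched as follows (a toy software model: a real CAM compares all entries simultaneously in one cycle, and the priority encoder reports the match nearest the top):

```python
# Behavioural model of a CAM lookup: return the address of the
# highest-priority (topmost) matching entry, or None on a miss.
def cam_search(cam, search_word):
    for address, stored in enumerate(cam):   # top of dictionary first
        if stored == search_word:
            return address
    return None
```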
3.4.2.1. Managing Dictionary entries
Since the initialization of a compression CAM sets all words to zero, a possible input
word formed by zeros would generate multiple full matches in different locations. The XMatchPro
compression system simply selects the full match closest to the top. This operational mode initializes
the dictionary to a state where all the words with a location address greater than zero are declared
invalid, without the need for extra logic. The reason is that location x can never generate a match until
the data contents of location x-1 differ from zero, because locations closer to the top have
higher priority in generating matches. This also increases dictionary efficiency: at most one dictionary
position contains repeated information, and in the best case all the dictionary positions contain
different data.
CHAPTER 4
XMATCHPRO LOSSLESS COMPRESSION SYSTEM
4.1. DESIGN METHODOLOGY

The XMatchPro algorithm is efficient at compressing the small blocks of data associated
with the cache- and page-based memory hierarchies found in computer systems, and it is suitable for high-
performance hardware implementation. The XMatchPro hardware achieves a throughput 2-3
times greater than other high-performance hardware implementations. The core component of the
system is the XMatchPro-based compression/decompression system. XMatchPro is a high-
speed lossless dictionary-based data compressor. The algorithm works by taking an
incoming four-byte tuple of data and attempting to match, fully or partially, the tuple with
past data.
4.2. FUNCTIONAL DESCRIPTION
The XMatchPro algorithm maintains a dictionary of data previously seen and attempts to
match the current data element with an entry in the dictionary, replacing it with a shorter code
referencing the match location. Data elements that do not produce a match are transmitted in full
(literally), prefixed by a single bit. Each data element is exactly 4 bytes in width and is referred to
as a tuple. This feature gives a guaranteed input data rate during compression, and thus also guaranteed
data rates during decompression, irrespective of the data mix. The 4-byte tuple size also gives an
inherently higher throughput than other algorithms, which tend to operate on a byte stream.
The dictionary is maintained using a move-to-front strategy, whereby the current tuple is
placed at the front of the dictionary and the other tuples move down by one location as
necessary to make space. The move-to-front strategy aims to exploit locality in the input data. If the
dictionary becomes full, the tuple occupying the last location is simply discarded.
A full match occurs when all characters in the incoming tuple fully match a dictionary
entry. A partial match occurs when at least two of the characters in the incoming tuple
match exactly with a dictionary entry, with the characters that do not match being transmitted
literally.
The use of partial matching improves the compression ratio when compared with
allowing only 4-byte matches, but still maintains high throughput. If neither a full nor a
partial match occurs, then a miss is registered and a single miss bit of 1 is transmitted, followed by the
tuple itself in literal form. The only exception is the first tuple in any compression operation,
which will always generate a miss, as the dictionary begins in an empty state. In this case no miss bit is
required to prefix the tuple.
At the beginning of each compression operation, the dictionary size is reset to zero. The
dictionary then grows by one location for each incoming tuple being placed at the front of the
dictionary and all other entries in the dictionary moving down by one location. A full match
does not grow the dictionary, but the move-to-front rule is still applied. This growth of the
dictionary means that code words are short during the early stages of compressing a block. Because the
XMatchPro algorithm allows partial matches, a decision must be made about which of the
locations provides the best overall match, with the selection criterion being the shortest possible
number of output bits.
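The selection criterion can be made concrete with a hypothetical cost model (the field widths below, one miss/hit bit, a location code, a 4-bit match-type code and 8 bits per unmatched literal, are illustrative assumptions, not the exact hardware format):

```python
# Output bits for a candidate match: the smaller the cost, the
# better the candidate under the shortest-output-bits criterion.
def match_cost(match_type, location_bits, type_bits=4):
    literal_bytes = sum(1 for matched in match_type if not matched)
    return 1 + location_bits + type_bits + 8 * literal_bytes
```

Under these assumptions a full match at a 4-bit location costs 9 bits, while the same location with one unmatched byte costs 17 bits, so the full match is selected.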
4.3 Parallel Xmatchpro Compression
The input router of the system divides the data to be processed, and the output router concatenates
the data to form the output of the parallel compression system. The data split by the input
router is sent to the XMatchPro compression engines, where it is compressed, and then passed
to the output router, which merges the compressed streams and sends them out as
the compressed data.
For multiple compression systems, it is important to ensure all compressors are supplied
with sufficient data by managing the supply so that neither stall conditions nor data overflow occurs.
There are several approaches in which data can be routed in and out of the compressors. The
basic input-routing method used in this project takes an input twice the width of a single
XMatchPro compressor: the lower 32 bits are given to Compressor 0 and the upper 32 bits are
given to Compressor 1. A corresponding method is used for output routing, with additional output
pins assigned to both Compressor 0 and Compressor 1.
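The 64-bit split described above can be sketched as follows (assuming the input word is presented as an integer):

```python
MASK32 = 0xFFFFFFFF

# Route a 64-bit input word to the two 32-bit compressor inputs.
def split64(word):
    low = word & MASK32             # lower 32 bits -> Compressor 0
    high = (word >> 32) & MASK32    # upper 32 bits -> Compressor 1
    return low, high
```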
4.4. DATA FLOW FOR PARALLEL XMATCHPRO COMPRESSOR
The figure below shows graphically the general concept of this approach. The data
stream to be compressed enters the compression system, where it is partitioned and routed to
the compressors. Appropriate methods for routing the data are discussed below, but to achieve
good compression performance, it is important that the partitioning mechanism supplies the
compressors with sufficient data to keep them active for as great a proportion of the time that the stream
is entering the system as possible.
As the compressors operate independently, each producing its own compressed data
stream, a mechanism is required to merge these streams in such a way that subsequent
decompression can be performed correctly. Also, subsequent decompression needs to be capable of
operating in an appropriate parallel fashion, otherwise, a disparity in compression and decompression
speeds will occur, reducing overall throughput.
The data flow for the parallel compression system is given in Figure 3 below.
4.5. INPUT ROUTING

Since each XMatchPro engine can process four bytes of data per clock cycle, to ensure
that all engines are kept busy, data must enter the system at a rate of 4n bytes per clock cycle, where n is the
number of compressors in the system. This can be achieved by two methods:
1. Interleaved input method
2. Blocked input method
4.5.1. INTERLEAVED INPUT METHOD

In the interleaved input approach, the router divides the input data into 4-byte-wide
data streams that are fed into the compressors. This is illustrated in the figure below for two
compressors, but the technique can be extended to supply data to any required number of
compressors.
[Figure: the input router (IR) deals tuples 1, 3, 5, 7 to one XMatchPro engine and tuples 2, 4, 6, 8 to the other]
Fig.4.3. Interleaved Input Routing
The interleaved method avoids the need for input buffering as data are continuously fed
to the compressors and acted upon immediately on arrival. This minimization of latency is an
important advantage of the approach.
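The interleaved routing described above can be sketched as follows (assuming the input length is a multiple of 4n bytes; the function name is ours):

```python
# Deal 4-byte tuples round-robin: tuple i goes to compressor i mod n.
def interleave_route(data, n=2):
    tuples = [data[i:i + 4] for i in range(0, len(data), 4)]
    return [tuples[k::n] for k in range(n)]
```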
4.5.2. BLOCKED INPUT METHOD
In the blocked input approach, a fixed length block of data is sent from the incoming data
stream to each of the compressors in turn, as shown in the following figure. In this scheme, the
data has to arrive at the dedicated memory of the compressor at a rate slower than it can be processed,
thereby allowing the memory to be filled with data.
To minimize the latency introduced in blocked mode, compressors need to start
processing data as it arrives. It is also important to ensure that sufficient data are available for the
compressor to work on while data are being routed to the other compressors, as no new data can be
added to the dedicated memory until this process has been completed.
4.5.3. PROPOSED INPUT ROUTING
In this project, the blocked input routing method is used for inputting data to the compression
system, as it is more advantageous than the interleaved input approach. The advantage of
this method is that the complexity in design and coding is reduced, which helps in achieving a
superior compression ratio. At the same time, however, the number of input pins increases, as another
set of pins is assigned for the second XMatchPro compressor. The input data size for one
XMatchPro compressor is 32 bits, so another 32 bits are required for the second XMatchPro
compressor. To achieve this, the parallel compressor is designed with a 64-bit input: the
lower-order 32 bits are sent to one XMatchPro compressor and the higher-order 32 bits
are sent to the second XMatchPro compressor. Both XMatchPro compressors are thus
supplied with data simultaneously, which increases the speed of compression.
4.6. OUTPUT ROUTING
The lengths of the compressed data blocks output from an array of parallel compressors
will generally not be constant, due to the variability of redundancy in the data. Since, during
decompression, the system would not know the data boundaries of each block, these data cannot be sent
directly to the output bus, and additional manipulation is needed in order to guarantee that the original
data can be recovered.
This is achieved by three methods, namely:
1. Single Compressed Block
2. Multiple Compressed Block
3. Interleaved Compressed Block
4.6.1. SINGLE COMPRESSED BLOCK
In this method, it is assumed that the data enter the system using the blocked mode
technique and that the compressed data are collected in the compressors' output buffers. The
buffer outputs are routed in strict order of compressor number, and a boundary tag that
contains information on the block length is added so as to precede the data. As the tag enters the
decompression system first, the system will know the length of the compressed data input belonging to any
given decompression engine. The introduction of tags is detrimental to the compression ratio, but this
effect diminishes as the block length is increased, as the overhead of one tag per block of compressed
data is largely constant.
One of the drawbacks of this approach is that the data output may contain idle time.
This arises since a whole block of data needs to be compressed before the appropriate tag
values can be determined, and so a compressor may still be compressing its data when the router becomes
available.
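The single-block framing can be sketched as follows (the 2-byte length tag is an illustrative assumption; the tag width in the real system depends on the output bus):

```python
# Prepend each compressor's block with a length tag, in strict order
# of compressor number, so decompression can find block boundaries.
def frame_blocks(blocks, tag_bytes=2):
    out = bytearray()
    for block in blocks:
        out += len(block).to_bytes(tag_bytes, "big") + block
    return bytes(out)
```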
4.6.2. MULTIPLE COMPRESSED BLOCK
Figure 2.7 illustrates the format of an output data stream containing multiple blocks. This
is similar to the single-block scheme but, instead of waiting for each compressor to finish
processing its block of data, all compressors need to finish compressing their blocks before the data
are sent. In this technique, the tag provides information on the length of the compressed data to
enable correct decompression. As all compressors need to have completed their operations before an
output can be produced, this approach has a greater latency compared with the single compressed block
case but, as fewer tags are needed, the effect on the compression ratio is reduced. The combined tag is
shorter than the sum of the individual tags, as the output bus granularity is of fixed width. Output tags
are sized in accordance with the output bus width in order to simplify the routing architecture and
decoding operations, even though fewer bits are required to determine block-size boundaries.
4.6.3. INTERLEAVED COMPRESSED BLOCK
The figure illustrates the interleaved approach for routing multiple compressed blocks of
data to the output stream. Instead of waiting for a whole block to be compressed, a predefined fixed
length of compressed data is always sent to the output. If a compressor has not completed its
operations, the system must wait until the data block has been produced.
There are two benefits of this approach compared with the previously discussed methods.
First, there is a reduction in latency, since data can be sent to the output before the whole block is
compressed. Second, since no boundary tags are required, the compression ratio is improved.
At the end of a compression sequence, the interleaved approach needs to add dummy tags
to the output stream. On receipt of the stop signal, output routing continues until all compressors
have completed operations on their input blocks. It is likely that the final interleaved block from each
compressor will contain insufficient data to fill the required fixed output length, and so dummy
data tags are added as required in order to maintain the interleave length.
4.6.4. PROPOSED OUTPUT ROUTING
In this project, the interleaved technique was selected as the output routing method, as it
imparts no overhead to maintain compressed data boundaries and so has no detrimental effect on the
compression ratio. The advantage of this method is that the complexity in design and
coding is reduced. At the same time, however, the number of output pins increases, as another set of pins
is assigned for the second XMatchPro compressor. The output data size for one 32-bit compressor is either
7 bits (when a match is found) or 33 bits (when no match is found), so another set of 33 bits (no match) and 7 bits
(match) is required for the second compressor. To achieve this, the
parallel compressor is designed with two sets of 7-bit as well as two sets of 33-bit output pins.
Both compressors can thus deliver data simultaneously, and the output
data is transmitted via the external bus.
4.7. IMPLEMENTATION OF XMATCHPRO BASED COMPRESSOR
The block diagram gives the details of the components of a single 32-bit compressor.
There are three components, namely, the COMPARATOR, the ARRAY and the CAM COMPARATOR. The
comparator is used to compare two 32-bit data words and to set or reset the output bit: 1 for equal and 0 for
unequal. The CAM comparator searches the CAM dictionary entries for a full match of the given input
data.
The reason for choosing a full match is to obtain a prototype of the high-throughput XMatchPro
compressor with reduced complexity and high performance.
If a full match occurs, the match-hit signal is generated and the corresponding match
location is given as output by the CAM comparator. If no full match occurs, the data
given as input at that time is produced as output.
The array is 64x32 bits in size. It stores the unmatched incoming data; when
new data arrives, it is compared with all the data stored in this array. If a match
occurs, the corresponding match location is sent as output; otherwise the incoming data is stored in
the next free location of the array and is sent as output. The last component is the CAM comparator,
which outputs the match location in the CAM dictionary when a match has occurred. This is
done using the match information received from the comparator.
If the output of the comparator goes high for any input, a match has been found; the
corresponding address is retrieved and sent as output along with one bit indicating that a match was
found. If no match occurs, the incoming data is
stored in the array and sent as output. These are the functions of the three components of the
compressor. The hardware descriptions of these modules are written in VHDL. VHDL is
an acronym for VHSIC (Very High Speed Integrated Circuit) Hardware Description Language. It can be
used to model a digital system at many levels of abstraction, ranging from the algorithmic
level to the gate level.
The VHDL language can be regarded as an integrated amalgamation of the following
languages:
o Sequential language
o Concurrent language
o Net-list language
o Timing specifications
o Waveform generation language.
So the language has constructs that enable you to express the concurrent or sequential
behavior of a digital system, with or without timing. It also allows a system to be modeled as an
interconnection of components. Test waveforms can be generated using the same constructs. The
language defines not only the syntax but also very clear simulation semantics for each language
construct; therefore, models written in this language can be verified using a VHDL simulator.
VHDL is event-driven, allowing efficient simulation of hardware: computations are
performed only when some data has changed (an event has occurred).
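To make the compression step above concrete, here is a Python behavioral model of the three-component compressor. This is a software sketch of the behavior described, not the VHDL implementation; the class and method names are illustrative.

```python
class CompressorModel:
    """Behavioral model of the 32-bit XMatchPro-based compressor:
    a 64-entry dictionary searched for a full match of each word."""

    def __init__(self, size=64):
        self.size = size
        self.dictionary = []      # models the 64x32-bit array

    def compress_word(self, word):
        """Return (match_hit, payload): the match location on a hit,
        or the literal word on a miss (after storing it)."""
        if word in self.dictionary:            # comparator output goes high
            return 1, self.dictionary.index(word)
        if len(self.dictionary) < self.size:   # store in next free location
            self.dictionary.append(word)
        return 0, word
```

Repeated words compress to a dictionary location, while first occurrences pass through as literals, exactly as the text describes.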
CHAPTER 5
DESIGN OF PARALLEL LOSSLESS
COMPRESSION/DECOMPRESSION SYSTEM
5.1. DESIGN OF COMPRESSOR / DECOMPRESSOR
The block diagram [Fig. 5.1] gives the details about the components of a single 32-bit
compressor/decompressor. The same design approach is used for the 64-bit
compression/decompression system, which is essentially used for comparison of the increased
compression rates given by the 64-bit lossless parallel high-speed data compression system.
There are three components, namely the COMPRESSOR, the DECOMPRESSOR and the CONTROL. The
compressor has the following components: COMPARATOR, ARRAY and CAM COMPARATOR.
The comparator compares two 32-bit data words and sets its output bit to 1 if they are equal and 0
if they are unequal.
The array is 64x32 bits in size. It stores the unmatched incoming
data; when the next new data word arrives, it is compared with all the data stored in this array. If
the incoming data matches any of the data stored in the array, the comparator generates a match
signal and sends it to the CAM comparator. The last component is the CAM comparator, which
sends the incoming data and all the data stored in the array, one by one, to the comparator.
If the output of the comparator goes high for any input, a match has been found; the
corresponding address (match location) is retrieved and sent as output along with one bit indicating
that a match was found. If no match is found, the incoming data is stored in the
array and sent as output. These are the functions of the three components of the XMatchPro-based
compressor.
The decompressor has two components: the ARRAY and the PROCESSING UNIT. The array has
the same function and the same length as the array used in the compressor.
The processing unit checks the incoming match-hit bit. If it is 0, the data is not present
in the array, so the processing unit stores the data in the array; if the match-hit bit is 1, the data is
present in the array, so the processing unit fetches the data from the array with the help of the address
input and sends it to the data output.
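The processing-unit behavior can be modelled in the same hedged way. This Python sketch mirrors the description above; the names are illustrative, not taken from the VHDL.

```python
class DecompressorModel:
    """Behavioral model of the decompressor: an array mirroring the
    compressor's dictionary, plus the processing unit's hit/miss check."""

    def __init__(self):
        self.array = []               # mirrors the compressor's array

    def decompress(self, match_hit, payload):
        if match_hit:                 # payload is the match location
            return self.array[payload]
        self.array.append(payload)    # payload is a literal word: store it
        return payload
```

Because both sides store every missed word at the next free location, the two arrays stay in sync and the original stream is recovered exactly.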
Fig.5.1. Block Diagram of 32 bit Compressor/Decompressor
The control has an input bit called C/D (Compression/Decompression), which indicates
whether compression or decompression is to be done. If it has the value 0, compression is started;
when the value is 1, decompression is done.
5.2. DESIGN OF 64 BIT SINGLE COMPRESSION/DECOMPRESSION SYSTEM
The 64-bit single compression/decompression system is designed to compare the compression
rate and area with the parallel compression/decompression system, which gives higher throughput.
The design and functionality of the 64-bit single compression system are the same as those of
the 32-bit compressor/decompressor discussed above, except that the input is changed from 32 bits to
64 bits and, to accommodate more data in the CAM dictionary, the array size is increased from 64x32 to
128x64. The match location is now given by 7 bits for the fixed 128 locations of the memory.
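The 6-bit and 7-bit match-location widths follow directly from the dictionary depths. A small illustrative check (the function name is ours, not the report's):

```python
import math

def match_location_width(entries):
    """Address bits needed for a CAM dictionary of the given depth."""
    return math.ceil(math.log2(entries))
```

The 64-entry dictionary of the 32-bit design needs 6 address bits, and the 128-entry dictionary of the 64-bit design needs 7, matching the widths quoted in the text.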
In the compression system, the comparator compares the incoming 64-bit data with the
entries previously stored in the memory. If any of the dictionary entries matches the
incoming data, a match signal is generated and the corresponding match location is provided as
output along with it. If no match occurs, the incoming data is stored in a dictionary
entry and given as the output of the compressor.
The decompression system therefore receives either the 64-bit data, if no match occurred, or the 1-bit
match signal and 7-bit match location, which are processed by the 128x64 array in the decompressor to
give the data at the match location as output. The block diagram of the 64-bit compression/decompression
system is given below.
Fig.5.2. Block Diagram of 64 bit Compression / Decompression system
5.3. PARALLEL COMPRESSION / DECOMPRESSION SYSTEM
5.3.1 DESIGN OF PARALLEL COMPRESSION SYSTEM
The block diagram gives the details about the components of the parallel compression
system. Here the compressor is instantiated twice, once for each processor. The numbers of input
and output pins are twice those of the single compressor.
The components of each instantiated compressor are the same as those of the 32-bit compressor,
namely the COMPARATOR, the ARRAY and the CAM COMPARATOR.
The comparator compares two 32-bit data words and sets its output bit to 1 if they are equal
and 0 if they are unequal. The array is 64x32 bits in size.
It stores the unmatched incoming data; when new data arrives, it is
compared with all the data stored in this array for a match. If no match occurs, the
incoming data is stored in the next free location of the array. The last component is the CAM
comparator, which sends the incoming data and all the data stored in the array, one by one, to the
comparator.
If the output of the comparator goes high for any input, a match has been found; the corresponding
address is retrieved and sent as output along with one bit indicating that a match was found. If
no match is found, the incoming data is stored in the array and sent as output.
These are the functions of the three components of the 32-bit compressor.
5.3.2 DESIGN OF PARALLEL DECOMPRESSION SYSTEM
The parallel decompression system is implemented by concatenating the outputs of
the two compressors in the parallel architecture and giving those data as input to the parallel
decompression system, which comprises two of the 32-bit decompression systems discussed above for
the single compression system. The 32-bit decompressor has two components: the ARRAY and the
PROCESSING UNIT. The array has the same function and the same length as the array used in the
compressor. The processing unit checks the incoming match-hit bit. If it is 0, the data is not
present in the array, so the processing unit stores the data in the array. If the match-hit bit is 1,
the data is present in the array, so the processing unit fetches the data from the array with the
help of the address input (match location) and sends it to the data output.
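Putting the two lanes together, the parallel round trip can be sketched in Python. Each 64-bit word is treated as two independent 32-bit halves; the function names and the closure-based structure are illustrative assumptions, not the VHDL design.

```python
def make_lane():
    """One 32-bit lane: a compressor dictionary plus the mirrored
    decompressor array. They stay in sync because both sides store
    every missed word at the next free location."""
    cdict, darr = [], []

    def compress(word):
        if word in cdict:                 # full match: emit hit + location
            return 1, cdict.index(word)
        cdict.append(word)                # miss: store, emit literal
        return 0, word

    def decompress(hit, payload):
        if hit:                           # payload is the match location
            return darr[payload]
        darr.append(payload)              # payload is the literal word
        return payload

    return compress, decompress

# Two independent lanes process the upper and lower 32-bit halves.
comp_a, decomp_a = make_lane()
comp_b, decomp_b = make_lane()
words_a = [1, 2, 1, 3]
words_b = [9, 9, 8, 9]
out_a = [decomp_a(*comp_a(w)) for w in words_a]
out_b = [decomp_b(*comp_b(w)) for w in words_b]
assert out_a == words_a and out_b == words_b   # lossless round trip
```

The round trip recovers both halves exactly, which is the lossless property the parallel architecture must preserve.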
5.4. SIMULATION RESULTS
The design, coded in VHDL, is simulated using Mentor Graphics ModelSim. The obtained
waveforms are as follows:
Fig.5.4. Comparator
Fig.5.5. Cam Comparator
Fig.5.6. Content Addressable Memory
Fig.5.7. 32-bit Single Compression Top Module
Fig.5.8. 32-bit Single Compression Top Module Decimal inputs
Fig.5.9. 64-bit Single Compression System -Top module
Fig.5.10. 64-bit Single Compression System -Test bench Waveform
Fig.5.11. 32-bit Single Decompression Top Module
Fig.5.12. 32-bit Single Decompression- Test bench Waveform
Fig.5.13. Parallel Compression System - 64-bit input Top module
Fig.5.14. Parallel Compression System - 64-bit input Test bench
5.5. RTL SCHEMATIC
The RTL schematics for the VHDL code are generated using Xilinx Project Navigator 8.1i.
Fig.5.15. 32 bit Single Compression System
Fig.5.16. RTL Schematic for 32 bit Single Compression System
Fig.5.17. 64 bit Single Compression System
Fig.5.18. RTL Schematic for 64 bit Single Compression System
Fig5.19. 64 bit Parallel Compression System
Fig.5.20. RTL Schematic for 64 bit Parallel Compression System
5.6. Xilinx Synthesis Results for Target Device xc2v1500bg575-6
5.6.1. 32-bit Single Compression System
===============================================================
*                  Synthesis Options Summary                  *
===============================================================
---- Source Parameters
Input File Name                  : "xmatchpro.prj"
Input Format                     : mixed
Ignore Synthesis Constraint File : NO
---- Target Parameters
Output File Name                 : "xmatchpro"
Output Format                    : NGC
Target Device                    : xc2v1500-6-bg575
===============================================================
*                       HDL Compilation                       *
===============================================================
Compiling vhdl file "E:/proj/xilinx/s_comp32/s_comp32/comparator.vhd" in Library work.
Architecture arch_comp of Entity comparator is up to date.
Compiling vhdl file "E:/proj/xilinx/s_comp32/s_comp32/camcomp.vhd" in Library work.
Architecture arch_cam64 of Entity camcomp is up to date.
Compiling vhdl file "E:/proj/xilinx/s_comp32/s_comp32/cam.vhd" in Library work.
Architecture arch_cam of Entity cam is up to date.
Compiling vhdl file "E:/proj/xilinx/s_comp32/s_comp32/xmatchpro.vhd" in Library work.
Architecture arch_xmatch of Entity xmatchpro is up to date.
Table 5.1. 32-bit Single Compression System - HDL Synthesis Report

Macro Statistics              No.
# ROMs                        64
  4x1-bit ROM                 64
# Adders/Subtractors          1
  32-bit adder                1
# Registers                   68
  1-bit register              1
  32-bit register             66
  6-bit register              1
# Latches                     2
  1-bit latch                 1
  6-bit latch                 1
# Comparators                 64
  32-bit comparator equal     64
5.6.2. 64-bit Single Compression System
===============================================================
*                  Synthesis Options Summary                  *
===============================================================
---- Source Parameters
Input File Name                  : "xmatchpro.prj"
Input Format                     : mixed
Ignore Synthesis Constraint File : NO
---- Target Parameters
Output File Name                 : "xmatchpro"
Output Format                    : NGC
Target Device                    : xc2v1500-6-bg575
===============================================================
*                       HDL Compilation                       *
===============================================================
Compiling vhdl file "E:/proj/xilinx/s_comp64/s_comp64/comparator.vhd" in Library work.
Architecture arch_comp of Entity comparator is up to date.
Compiling vhdl file "E:/proj/xilinx/s_comp64/s_comp64/camcomp.vhd" in Library work.
Architecture arch_cam64 of Entity camcomp is up to date.
Compiling vhdl file "E:/proj/xilinx/s_comp64/s_comp64/cam.vhd" in Library work.
Architecture arch_cam of Entity cam is up to date.
Compiling vhdl file "E:/proj/xilinx/s_comp64/s_comp64/xmatchpro.vhd" in Library work.
Architecture arch_xmatchpro of Entity xmatchpro is up to date.
Table 5.2. 64-bit Single Compression System - HDL Synthesis Report

Macro Statistics              No.
# ROMs                        128
  4x1-bit ROM                 128
# Adders/Subtractors          1
  32-bit adder                1
# Registers                   132
  1-bit register              1
  32-bit register             1
  64-bit register             129
  7-bit register              1
# Latches                     2
  1-bit latch                 1
  7-bit latch                 1
# Comparators                 128
  64-bit comparator equal     128
5.6.4. 64-bit Parallel Decompression System
===============================================================
*                  Synthesis Options Summary                  *
===============================================================
---- Source Parameters
Input File Name                  : "LL_decomp.prj"
Input Format                     : mixed
Ignore Synthesis Constraint File : NO
---- Target Parameters
Output File Name                 : "LL_decomp"
Output Format                    : NGC
Target Device                    : xc2v1500-6-bg575
===============================================================
*                       HDL Compilation                       *
===============================================================
Compiling vhdl file "E:/proj/xilinx/dual_decomp/dual_decomp/de_xmatchpro.vhd" in Library work.
Architecture arch_de_camcomparator of Entity de_xmatchpro is up to date.
Compiling vhdl file "E:/proj/xilinx/dual_decomp/dual_decomp/LL_decomp.vhd" in Library work.
Architecture arch_dualdecomp of Entity ll_decomp is up to date.
Table 5.4. 64-bit Parallel Decompression System - HDL Synthesis Report
Macro Statistics              No.
# Adders/Subtractors          2
  32-bit adder                2
# Latches                     130
  32-bit latch                130
# Multiplexers                2
  32-bit 64-to-1 multiplexer  2
CHAPTER 6
ANALYSIS OF RESULTS
6.1. Device Utilization of Various Modules
Table 6.1.
Compression Device Utilization Summary for Selected Device: xc2v1500bg575-6

Resource                  32-bit Single Compression   64-bit Single Compression   64-bit Parallel Compression
Number of Slices          1756 out of 7680 (22%)      6819 out of 7680 (88%)      3560 out of 7680 (46%)
Number of Slice
Flip Flops                2064 out of 15360 (13%)     8168 out of 15360 (53%)     4206 out of 15360 (27%)
Number of 4-input LUTs    1368 out of 15360 (8%)      4776 out of 15360 (31%)     2930 out of 15360 (19%)
Number of bonded IOBs     74 out of 392 (18%)         139 out of 392 (35%)        145 out of 392 (36%)
IOB Flip Flops            39                          72                          78
Number of GCLKs           2 out of 16 (12%)           2 out of 16 (12%)           2 out of 16 (12%)
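The percentages in Table 6.1 are consistent with the raw counts if the synthesis tool truncates rather than rounds (1756/7680 is 22.9%, reported as 22%). A quick check of that assumption:

```python
def utilization_pct(used, total):
    """Truncated utilization percentage, matching the reported figures."""
    return (100 * used) // total
```

For example, 1756 of 7680 slices gives 22, and 2930 of 15360 LUTs gives 19, matching the table.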
6.2. CADENCE RTL Compiler Reports
The hardware designs are compiled in the Cadence RTL Compiler, and the results
are as follows:
6.2.1. 32-bit Single Compression System
6.2.1.1. Area Report
============================================================
Generated by:           Encounter(r) RTL Compiler v06.10-p003_1
Generated on:           Apr 17 2007 08:42:56 PM
Module:                 scomp_32
Technology libraries:   typical 1.3
                        tpz973gtc 230
                        ram_128x16A 0.0
                        ram_256x16A 0.0
                        rom_512x16A 0.0
                        pllclk 4.3
Operating conditions:   typical (balanced_tree)
Wireload mode:          segmented
============================================================
Instance   Cells   Cell Area   Net Area   Wireload
----------------------------------------------------------------------
scomp_32   5393    116863      0          TSMC32K_Conservative (S)
6.2.1.2. Power Report
============================================================
Generated by:           Encounter(r) RTL Compiler v06.10-p003_1
Generated on:           Apr 17 2007 08:43:13 PM
Module:                 scomp_32
Technology libraries:   typical 1.3
                        tpz973gtc 230
                        ram_128x16A 0.0
                        ram_256x16A 0.0
                        rom_512x16A 0.0
                        pllclk 4.3
Operating conditions:   typical (balanced_tree)
Wireload mode:          segmented
============================================================
                    Leakage     Internal      Net           Switching
Instance   Cells    Power(nW)   Power(nW)     Power(nW)     Power(nW)
-----------------------------------------------------------------------
scomp_32   5393     4.255       5832894.166   2001783.940   7834678.105
6.2.1.3. Timing Report
============================================================
Generated by:           Encounter(r) RTL Compiler v06.10-p003_1
Generated on:           Apr 17 2007 08:42:21 PM
Module:                 scomp_32
Technology libraries:   typical 1.3
                        tpz973gtc 230
                        ram_128x16A 0.0
                        ram_256x16A 0.0
                        rom_512x16A 0.0
                        pllclk 4.3
Operating conditions:   typical (balanced_tree)
Wireload mode:          segmented
============================================================
Pin                         Type    Fanout   Load   Slew   Delay   Arrival
                                             (fF)   (ps)   (ps)    (ps)
----------------------------------------------------------------------
(clock clk)                 launch                                  0       R
u3/Wr_addr_reg_reg[31]/CK   setup   0                       +365    7451    R
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
(clock clk)