8/8/2019 Final Project 112
1/32
Project Report
ON
VIDEO COMPRESSION
Submitted in partial fulfillment of the requirements for the degree of
B.Tech. in Computer Science & Engineering
Under the Guidance of: Mr. Roshan Singh (Assistant Professor)
MANAV RACHNA COLLEGE OF ENGINEERING
FARIDABAD
BATCH (2007-2011)
Submitted By: Amit Saini (Roll No.)
Raj Kamal Sharma (Roll No.)
Sandeep Yadav (Roll No.)
Table of Contents

Chapter 1. Introduction
  1.1 Objective(s) of the System/Tool
  1.2 Scope of the System/Tool
  1.3 Problem Definition of the System/Tool
  1.4 Hardware and Software Requirements

Chapter 2. Problem Analysis
  2.1 Literature Survey
    2.1.1 Introduction
    2.1.2 Video Compression Technology
    2.1.3 Compression Standards
    2.1.4 MPEG-1
    2.1.5 MPEG-2
    2.1.6 MPEG-4
    2.1.7 MPEG-7
    2.1.8 H.261
  2.2 Methodology Adopted

Chapter 3. Project Estimation and Implementation Plan
  3.1 Cost and Benefit Analysis
  3.2 Schedule Estimate
  3.3 PERT Chart / Gantt Chart

References
Chapter 1. Introduction
1.1. OBJECTIVE
1.1.1. NEED OF THE SYSTEM: Uncompressed video (and audio) data are huge. In
HDTV, the bit rate easily exceeds 1 Gbps, which poses big problems for storage and network
communications. For example: One of the formats defined for HDTV broadcasting
within the United States is 1920 pixels horizontally by 1080 lines vertically, at 30
frames per second. If these numbers are all multiplied together, along with 8 bits for
each of the three primary colors, the total data rate required would be approximately 1.5
Gb/sec. Because of the 6 MHz channel bandwidth allocated, each channel will only
support a data rate of 19.2 Mb/sec, which is further reduced to 18 Mb/sec by the fact
that the channel must also support audio, transport, and ancillary data information. As
can be seen, this restriction in data rate means that the original signal must be
compressed by a figure of approximately 83:1. This number seems all the more
impressive when it is realized that the intent is to deliver very high quality video to the
end user, with as few visible artifacts as possible.
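The 83:1 figure can be checked with a few lines of arithmetic (a sketch using only the numbers quoted above; 18 Mb/s is the payload left after audio, transport, and ancillary data):

```python
# Back-of-envelope check of the HDTV numbers quoted above.
width, height, fps, bits_per_pixel = 1920, 1080, 30, 24  # 8 bits x 3 primary colors

raw_bps = width * height * fps * bits_per_pixel  # uncompressed bit rate
channel_bps = 18e6                               # usable payload of the 6 MHz channel

print(f"raw rate: {raw_bps / 1e9:.2f} Gb/s")             # ~1.49 Gb/s
print(f"required ratio: {raw_bps / channel_bps:.0f}:1")  # ~83:1
```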
1.1.2. OUTCOME OF THE SYSTEM:
Video Compressor is multifunctional video compression software that helps you compress
video files to a smaller size. With comprehensive video format support and plentiful profiles
and handy tools provided, this Video Compressor is an ideal video file compressor and video
size compressor.
A digital video compression system, and its methods, compress digitized video signals in real
time. The system compressor receives digitized video frames divided into subframes, performs
in a single pass a two-dimensional spatial-domain-to-transform-domain transformation of the
picture elements of each subframe, and normalizes the resultant coefficients by a
normalization factor having a predetermined compression-ratio component and an adaptive
rate-buffer-capacity control feedback component, to provide compression. It then encodes the
coefficients and stores them in a first rate buffer memory asynchronously at a high data transfer
rate, from which they are put out at a slower, synchronous rate. The compressor adaptively
determines the rate-buffer-capacity control feedback component in relation to the instantaneous
data content of the rate buffer memory relative to its capacity, and it controls the absolute
quantity of data resulting from the normalization step so that the buffer memory is never
completely emptied and never completely filled. In expansion (decompression), the system
essentially mirrors the steps performed during compression. An efficient, high-speed decoder
forms an important aspect of the invention, and the compression system forms an important
element of a disclosed color broadcast compression system.
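The buffer-feedback idea described above can be illustrated with a toy simulation (every constant here is invented for illustration): a fuller buffer forces a coarser quantization step so fewer bits are produced, and the occupancy settles away from both empty and full.

```python
def feedback_scale(fullness, capacity, base=16):
    """Coarsen the quantisation step as the buffer fills: a fuller buffer
    forces fewer bits to be produced, an emptier one allows more."""
    return base * (0.5 + fullness / capacity)

capacity, fullness, out_rate = 10_000, 5_000, 400
for _ in range(200):
    q = feedback_scale(fullness, capacity)
    produced = int(8_000 / q)                    # coarser q -> fewer bits produced
    fullness += produced - out_rate              # buffer drains at a constant rate
    fullness = max(0, min(capacity, fullness))   # clamp, for the toy model

print(0 < fullness < capacity)  # True: never empty, never full
```

The negative feedback drives the occupancy toward an equilibrium where production matches the constant output rate, which is exactly the "never completely emptied and never completely filled" behaviour the text describes.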
1.2 Scope of the System/Tool
What will the tool be able to do? The system tool (MATLAB) covers the following areas:
Desktop Tools and Development Environment: startup and shutdown, arranging the desktop, and using tools to become more productive with MATLAB
Data Import and Export: retrieving and storing data, memory-mapping, and accessing Internet files
Mathematics: mathematical operations
Data Analysis: data analysis, including data fitting, Fourier analysis, and time-series tools
Programming Fundamentals: the MATLAB language and how to develop MATLAB applications
Object-Oriented Programming: designing and implementing MATLAB classes
Graphics: tools and techniques for plotting, graph annotation, printing, and programming with Handle Graphics objects
3-D Visualization: visualizing surface and volume data, transparency, and viewing and lighting techniques
Creating Graphical User Interfaces: GUI-building tools and how to write callback functions
External Interfaces: MEX-files, the MATLAB engine, and interfacing to Sun Microsystems Java software, Microsoft .NET Framework, COM, Web services, and the serial port

There is reference documentation for all MATLAB functions:

Function Reference: lists all MATLAB functions, by category or alphabetically
Handle Graphics Property Browser: provides easy access to descriptions of graphics object properties
C/C++ and Fortran API Reference: covers functions used by the MATLAB external interfaces, providing information on syntax in the calling language, description, arguments, return values, and examples.

The MATLAB application can read data in various file formats, discussed in the following sections: recommended methods for importing data, importing MAT-files, importing text data files, importing XML documents, importing Excel spreadsheets, importing scientific data files, importing images, importing audio and video, and importing binary data with low-level I/O.
1.4 Hardware and Software Requirements
1.4.1 HARDWARE REQUIREMENTS:
512 MB RAM
10 GB HARD DISK
1.4.2 SOFTWARE REQUIREMENTS:
1. OPERATING SYSTEM: WINDOWS or LINUX
2. MATLAB
Chapter 2. Problem Analysis
2.1 Literature Survey
2.1.1 Introduction (Background Work):
Video compression typically operates on square-shaped groups of neighboring pixels, often
called macroblocks. These blocks of pixels are compared from one frame to
the next and the video compression codec (encode/decode scheme) sends only the difference
within those blocks. This works extremely well if the video has no motion. A still frame of
text, for example, can be repeated with very little transmitted data. In areas of video with more
motion, more pixels change from one frame to the next. When more pixels change, the video
compression scheme must send more data to keep up with the larger number of pixels that are
changing. If the video content includes an explosion, flames, a flock of thousands of birds, or
any other image with a great deal of high-frequency detail, the quality will decrease, or the
variable bit rate must be increased to render this added information with the same level of
detail.
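The block-by-block differencing described above can be sketched as follows (a simplified illustration, not any particular codec's algorithm):

```python
import numpy as np

def block_diff(prev, cur, block=8, thresh=0):
    """Return coordinates of the blocks that changed between two frames.
    Only these blocks would need to be (re)transmitted."""
    changed = []
    h, w = cur.shape
    for y in range(0, h, block):
        for x in range(0, w, block):
            a = prev[y:y+block, x:x+block]
            b = cur[y:y+block, x:x+block]
            if np.abs(b.astype(int) - a.astype(int)).max() > thresh:
                changed.append((y, x))
    return changed

prev = np.zeros((16, 16), dtype=np.uint8)
cur = prev.copy()
cur[0:8, 8:16] = 200          # motion confined to a single block
print(block_diff(prev, cur))  # [(0, 8)]: 1 of 4 blocks changed
```

A still frame (no changed blocks) costs almost nothing to send, while fast-moving scenes flag many blocks, which is why the bit rate rises with motion.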
Video is basically a three-dimensional array of color pixels. Two dimensions serve as spatial
(horizontal and vertical) directions of the moving pictures, and one dimension represents the
time domain. A data frame is a set of all pixels that correspond to a single moment in time. Basically, a frame is the same as a still picture.
Some forms of data compression are lossless. This means that when the data is decompressed,
the result is a bit-for-bit perfect match with the original. While lossless compression of video is
possible, it is rarely used, as lossy compression results in far higher compression ratios at an
acceptable level of quality.
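The distinction can be demonstrated in a few lines (using zlib as a stand-in lossless coder and a crude quantizer as a stand-in lossy one):

```python
import zlib
import numpy as np

data = np.arange(256, dtype=np.uint8).tobytes()

# Lossless: decompression recovers a bit-for-bit copy of the input.
print(zlib.decompress(zlib.compress(data)) == data)   # True

# Lossy (toy): quantising to 16 levels discards the low bits permanently.
samples = np.frombuffer(data, dtype=np.uint8)
restored = (samples // 16) * 16
print(bool(np.array_equal(samples, restored)))        # False: information was lost
```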
One of the most powerful techniques for compressing video is interframe compression.
Interframe compression uses one or more earlier or later frames in a sequence to compress the
current frame, while intraframe compression uses only the current frame, which is effectively
image compression.
The most commonly used method works by comparing each frame in the video with the
previous one. If the frame contains areas where nothing has moved, the system simply issues a
other methods (DCT) that work on smaller pieces of the desired data. The result is a
hierarchical representation of an image, where each layer represents a frequency band.
2.1.3 Compression Standards (Techniques for solving the problem):
MPEG stands for the Moving Picture Experts Group. MPEG is an ISO/IEC working group,
established in 1988 to develop standards for digital audio and video formats. There are four
MPEG standards being used or in development. Each compression standard was designed with
a specific application and bit rate in mind, although MPEG compression scales well with
increased bit rates. They include:
2.1.3.1 MPEG-1
Designed for bit rates up to 1.5 Mbit/s, this is the standard for the compression of moving
pictures and audio. It was based on CD-ROM video applications, and is a popular standard for
video on the Internet, transmitted as .mpg files. In addition, layer 3 of MPEG-1 is the most
popular standard for digital compression of audio, known as MP3. MPEG-1 is the compression
standard for VideoCD, the most popular video distribution format throughout much of Asia.
2.1.3.2 MPEG-2
Designed for between 1.5 and 15 Mbit/s, this is the standard on which digital television set-top
boxes and DVD compression are based. It builds on MPEG-1, but is designed for the compression and
transmission of digital broadcast television. The most significant enhancement from MPEG-1
is its ability to efficiently compress interlaced video. MPEG-2 scales well to HDTV resolution
and bit rates, obviating the need for an MPEG-3.
2.1.3.4 MPEG-4
Standard for multimedia and Web compression. MPEG-4 is based on object-based
compression, similar in nature to the Virtual Reality Modeling Language. Individual objects
within a scene are tracked separately and compressed together to create an MPEG-4 file. This
results in very efficient compression that is very scalable, from low bit rates to very high. It
also allows developers to control objects independently in a scene, and therefore introduce
interactivity.
2.1.3.5 MPEG-7: This standard, currently under development, is also called the Multimedia
Content Description Interface. When released, the group hopes the standard will provide a
framework for multimedia content that will include information on content manipulation,
filtering and personalization, as well as the integrity and security of the content. Contrary to the
previous MPEG standards, which described actual content, MPEG-7 will represent information
about the content.
2.1.3.6 H.261: H.261 is an ITU standard designed for two-way communication over ISDN
lines (video conferencing) and supports data rates which are multiples of 64Kbit/s. The
algorithm is based on DCT and can be implemented in hardware or software and uses
intraframe and interframe compression. H.261 supports CIF and QCIF resolutions.
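As a quick illustration of the H.261 operating points (the standard's rates are p x 64 kbit/s with p from 1 to 30; CIF is 352x288 and QCIF is 176x144):

```python
# H.261 operates at p x 64 kbit/s (p = 1..30) on CIF or QCIF frames.
CIF, QCIF = (352, 288), (176, 144)

rates_kbps = [p * 64 for p in range(1, 31)]
print(rates_kbps[0], rates_kbps[-1])  # 64 1920: from 64 kbit/s up to ~2 Mbit/s

# A QCIF frame carries a quarter of the pixels of a CIF frame.
print((CIF[0] * CIF[1]) // (QCIF[0] * QCIF[1]))  # 4
```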
MPEG-4 advantages include high compression, low bit rate and motion compensation support.
Disadvantages are latency and blocking artifacts. JPEG, JPEG2000, and MPEG-4 have all been
used in video surveillance systems, with the choice depending on what is most important in
that particular application. H.264 is an advanced compression scheme which is also starting to
find its way into video surveillance systems. H.264 offers higher compression at the expense of
additional hardware complexity; it is not examined in this paper, although FPGA-based
solutions for H.264 exist.
2.1.3.7 MPEG-21:
The MPEG-21 standard, from the Moving Picture Experts Group, aims at defining an open
framework for multimedia applications. MPEG-21 is ratified in the standard ISO/IEC 21000,
Multimedia Framework (MPEG-21).
MPEG-21 is based on two essential concepts:
definition of a Digital Item (a fundamental unit of distribution and transaction)
users interacting with Digital Items
Digital Items can be considered the kernel of the Multimedia Framework, and users can be
considered as those who interact with them inside the Multimedia Framework. At its most basic
level, MPEG-21 provides a framework in which one user interacts with another, and the
object of that interaction is a Digital Item. We could therefore say that the main objective of
MPEG-21 is to define the technology needed to support users in exchanging, accessing,
consuming, trading or otherwise manipulating Digital Items in an efficient and transparent way.
MPEG-21 also specifies the storage of a Digital Item in a file format based on the ISO base
media file format, with some or all of the Digital Item's ancillary data (such as movies, images
or other non-XML data) within the same file.
2.1.3.8 H.263:
H.263 is a video compression standard originally designed as a low-bit-rate compressed format
for video conferencing. It was developed by the ITU-T Video Coding Experts Group (VCEG).
H.263 has since found many applications on the Internet: much Flash video content (as used
on sites such as YouTube, Google Video, MySpace, etc.) used to be encoded in Sorenson Spark
format, though many sites now use VP6 or H.264 encoding.
H.263 was developed as an evolutionary improvement based on experience from H.261, the
previous ITU-T standard for video compression, and the MPEG-1 and MPEG-2 standards. Its
first version was completed in 1995 and provided a suitable replacement for H.261 at all bit
rates. It was further enhanced in projects known as H.263v2; MPEG-4 Part 2 is H.263
compatible in the sense that a basic H.263 bit stream is correctly decoded by an MPEG-4
Video decoder.
2.1.3.9 H.264:
The next enhanced codec developed by ITU-T VCEG (in partnership with MPEG) after H.263
is the H.264 standard, also known as AVC and MPEG-4 part 10. As H.264 provides a
significant improvement in capability beyond H.263, the H.263 standard is now considered a
legacy design. Most new videoconferencing products now include H.264 as well as H.263 and
H.261 capabilities. H.264 is used in such applications as players for Blu-ray Discs, videos from
YouTube and the iTunes Store, web software such as the Adobe Flash Player and Microsoft
Silverlight, broadcast services for DVB and SBTVD, direct-broadcast satellite television
services, cable television services, and real-time videoconferencing.
2.1.4 MPEG-1
The Coding of Moving Pictures and Associated Audio for Digital Storage Media at up to
about 1.5 Mbit/s (ISO/IEC 11172), or MPEG-1 as it is more commonly known, standardizes
the storage and retrieval of moving pictures and audio on storage media, and forms the basis
for the Video CD and MP3 formats.
This part of the specification describes the coded representation for the compression of video
sequences.
The basic idea of MPEG video compression is to discard any unnecessary information. An
MPEG-1 encoder analyses:
how much movement there is in the current frame compared to the previous frame;
what changes of color have taken place since the last frame;
what changes in light or contrast have taken place since the last frame;
what elements of the picture have remained static since the last frame.
The encoder then looks at each individual pixel to see if movement has taken place. If there has
been no movement, the encoder stores an instruction to repeat the same frame, or to repeat the
same frame but move it to a different position.
1. I: intra frames
2. B: bidirectional frames
3. P: predicted frames
Audio, video and time codes are converted into one single stream.
625- and 525-line video
bit rates from 1 to 1.5 Mbit/s
24-30 frames per second
MPEG-1 compression treats video as a sequence of separate images. Picture elements, often
referred to as pixels, are the elements in the image. Each pixel consists of three components:
one for luminance (Y) and two for chrominance (Cb and Cr). MPEG-1 encodes Y pixels at
full resolution, as the Human Visual System (HVS) is most sensitive to luminance.
Quantization.
Predictive coding: the difference between the predicted pixel value and the real value is coded.
Motion compensation (MC): predicts the value of a neighboring block of pixels (1 block =
8x8 pixels) in an image from those of a known block of pixels. A vector describes the two-
dimensional movement; if no movement takes place, the vector is 0.
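A minimal sketch of the block matching behind motion compensation (an exhaustive search by sum of absolute differences; the search range and test data here are invented for illustration):

```python
import numpy as np

def best_motion_vector(ref, block, y, x, search=4):
    """Exhaustive block matching: find the displacement in `ref` that best
    matches an 8x8 `block` anchored at (y, x), by sum of absolute differences."""
    best, best_mv = None, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            yy, xx = y + dy, x + dx
            if yy < 0 or xx < 0 or yy + 8 > ref.shape[0] or xx + 8 > ref.shape[1]:
                continue
            sad = np.abs(ref[yy:yy+8, xx:xx+8].astype(int) - block.astype(int)).sum()
            if best is None or sad < best:
                best, best_mv = sad, (dy, dx)
    return best_mv

ref = np.zeros((24, 24), dtype=np.uint8)
ref[10:18, 12:20] = 255                      # a bright patch in the reference frame
cur_block = np.full((8, 8), 255, np.uint8)   # the same patch in the current frame
print(best_motion_vector(ref, cur_block, 8, 10))  # (2, 2): the patch moved by (2, 2)
```

Only the winning vector (and the usually small prediction error) needs to be transmitted, rather than the block's pixels.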
Other techniques used include interframe coding, sequential coding, variable length coding
(VLC), and image interpolation.
Intra (I frames) are coded independently of other images.
MPEG codes images progressively: interlaced images need to be converted into a de-interlaced
format before encoding; the video is then encoded, and the encoded video is converted back
into an interlaced form.
To achieve a high compression ratio, an appropriate spatial resolution for the signal is
chosen and the image is broken down into blocks of pixels; block-based motion compensation
is then used to reduce the temporal redundancy.
Motion compensation is used for causal prediction of the current picture from a
previous picture, for non-causal prediction of the current picture from a future picture, or for
interpolative prediction from past and future pictures.
The difference signal, the prediction error, is further compressed using the discrete cosine
transform (DCT) to remove spatial correlation and is then quantized. Finally, the motion
vectors are combined with the DCT information, and coded using variable length codes.
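The transform-quantize-reconstruct chain can be sketched for a single 8x8 block (a hand-rolled orthonormal DCT-II for illustration; a real encoder adds zig-zag scanning and the variable length coding on top of this):

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II basis as an n x n matrix (rows = frequencies)."""
    k, i = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    m = np.cos(np.pi * (2 * i + 1) * k / (2 * n)) * np.sqrt(2 / n)
    m[0, :] /= np.sqrt(2)
    return m

D = dct_matrix()
block = np.outer(np.arange(8.0), np.ones(8))  # a smooth vertical-ramp "prediction error"

coeffs = D @ block @ D.T             # 2-D DCT: energy compacts into few coefficients
q = np.round(coeffs / 4) * 4         # uniform quantisation with step 4 (the lossy stage)
recon = D.T @ q @ D                  # inverse transform, as the decoder would do

print(int(np.count_nonzero(q)))                # only 2 of 64 coefficients survive
print(float(np.abs(recon - block).max()) < 4)  # True: error bounded by the step size
```

The DCT does not itself discard anything; the savings come from quantization zeroing the small high-frequency coefficients, which the VLC then codes very cheaply.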
MPEG-1 is a standard in 4 parts:
Part 1 addresses the problem of combining one or more data streams from the video and audio
parts of the MPEG-1 standard with timing information to form a single stream. This is an
important function because, once combined into a single stream, the data are in a form well
suited to digital storage or transmission.
Part 2 specifies a coded representation that can be used for compressing video sequences -
both 625-line and 525-line - to bit rates around 1.5 Mbit/s. Part 2 was developed to operate
principally from storage media offering a continuous transfer rate of about 1.5 Mbit/s.
Nevertheless it can be used more widely than this because the approach taken is generic. A
number of techniques are used to achieve a high compression ratio. The first is to select an
appropriate spatial resolution for the signal. The algorithm then uses block-based motion
compensation to reduce the temporal redundancy. Motion compensation is used for causal
prediction of the current picture from a previous picture, for non-causal prediction of the
current picture from a future picture, or for interpolative prediction from past and future
pictures. The difference signal, the prediction error, is further compressed using the discrete
cosine transform (DCT) to remove spatial correlation and is then quantized. Finally, the motion
vectors are combined with the DCT information, and coded using variable length codes.
Part 3 specifies a coded representation that can be used for compressing audio sequences -
both mono and stereo. Input audio samples are fed into the encoder. The mapping creates a
filtered and subsampled representation of the input audio stream. A psychoacoustic model
creates a set of data to control the quantiser and coding. The quantiser and coding block creates
a set of coding symbols from the mapped input samples. The block 'frame packing' assembles
the actual bit stream from the output data of the other blocks, and adds other information (e.g.
error correction) if necessary.
Part 4 specifies how tests can be designed to verify whether bitstreams and decoders meet the
requirements as specified in parts 1, 2 and 3 of the MPEG-1 standard. These tests can be used
by:
manufacturers of encoders, and their customers, to verify whether the encoder produces
valid bitstreams;
manufacturers of decoders, and their customers, to verify whether the decoder meets the
requirements specified in parts 1, 2 and 3 of the standard for the claimed decoder
capabilities;
applications, to verify whether the characteristics of a given bitstream meet the
application requirements, for example whether the size of the coded picture does not
exceed the maximum value allowed for the application.
2.1.5 MPEG-2
MPEG-2 is an extension of the MPEG-1 international standard for digital compression of audio
and video signals. MPEG-2 is directed at broadcast formats at higher data rates; it provides
extra algorithmic 'tools' for efficiently coding interlaced video, supports a wide range of bit
rates and provides for multichannel surround sound coding.
1. INTRODUCTION:
The MPEG-2 standard [2] is capable of coding standard-definition television at bit rates from
about 3-15 Mbit/s and high-definition television at 15-30 Mbit/s. MPEG-2 extends the stereo
audio capabilities of MPEG-1 to multi-channel surround sound coding. MPEG-2 decoders will
also decode MPEG-1 bit streams.
2. VIDEO FUNDAMENTALS
Television services in Europe currently broadcast video at a frame rate of 25 Hz. Each frame
consists of two interlaced fields, giving a field rate of 50 Hz. The first field of each frame
contains only the odd numbered lines of the frame (numbering the top frame line as line 1).
The second field contains only the even numbered lines of the frame and is sampled in the
video camera 20 ms after the first field. It is important to note that one interlaced frame
contains fields from two instants in time. American television is similarly interlaced but with a
frame rate of just less than 30 Hz.
The red, green and blue (RGB) signals coming from a color television camera can be
equivalently expressed as luminance (Y) and chrominance (UV) components. The
chrominance bandwidth may be reduced relative to the luminance without significantly
affecting the picture quality. For standard definition video, CCIR Recommendation 601 [3]
defines how the component (YUV) video signals can be sampled and digitized to form discrete
pixels. The terms 4:2:2 and 4:2:0 are often used to describe the sampling structure of the
digital picture. 4:2:2 means the chrominance is horizontally subsampled by a factor of two
relative to the luminance; 4:2:0 means the chrominance is horizontally and vertically
subsampled by a factor of two relative to the luminance.
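A sketch of 4:2:0 subsampling (averaging each 2x2 chroma neighbourhood is one common way to do the downsampling; the 576x720 frame size is illustrative):

```python
import numpy as np

def subsample_420(chroma):
    """4:2:0 sampling: halve the chroma plane horizontally and vertically
    by averaging each 2x2 neighbourhood; luminance stays at full resolution."""
    h, w = chroma.shape
    c = chroma.astype(np.float64).reshape(h // 2, 2, w // 2, 2)
    return c.mean(axis=(1, 3))

luma = np.zeros((576, 720))  # luminance plane, kept in full resolution
cb = np.arange(576 * 720, dtype=np.float64).reshape(576, 720)

cb_420 = subsample_420(cb)
print(luma.shape, cb_420.shape)  # (576, 720) (288, 360)
```

Each chroma plane keeps only a quarter of its samples, halving the total raw data before any transform coding even begins.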
3. BIT RATE REDUCTION PRINCIPLES: A bit rate reduction system operates by
removing redundant information from the signal at the coder prior to transmission and
re-inserting it at the decoder. A coder and decoder pair is referred to as a 'codec'. In video
signals, two distinct kinds of redundancy can be identified.
Spatial and temporal redundancy: Pixel values are not independent, but are correlated with
their neighbors both within the same frame and across frames. So, to some extent, the value of
a pixel is predictable given the values of neighboring pixels.
Psycho visual redundancy:
The human eye has a limited response to fine spatial detail [4], and is less sensitive to detail
near object edges or around shot-changes. Consequently, controlled impairments introduced
into the decoded picture by the bit rate reduction process should not be visible to a human
observer.
Two key techniques employed in an MPEG codec are intra-frame Discrete Cosine Transform
(DCT) coding and motion-compensated inter-frame prediction. These techniques have been
successfully applied to video bit rate reduction prior to MPEG, notably for 625-line video
contribution standards at 34 Mbit/s and video conference systems at bit rates below 2 Mbit/s.
4. MPEG-2 DETAILS
In an MPEG-2 system, the DCT and motion-compensated interframe prediction are combined.
The coder subtracts the motion-compensated prediction from the source picture to form a
'prediction error' picture. The prediction error is transformed with the DCT, the coefficients are
quantized and these quantized values coded using a VLC. The coded luminance and
chrominance prediction error is combined with 'side information' required by the decoder, such
as motion vectors and synchronizing information, and formed into a bit stream for
transmission. In the decoder, the quantized DCT coefficients are reconstructed and inverse
transformed to produce the prediction error. This is added to the motion-compensated
prediction generated from previously decoded pictures to produce the decoded output.
Picture types
In MPEG-2, three 'picture types' are defined. The picture type defines which prediction modes
may be used to code each block.
'Intra' pictures (I-pictures) are coded without reference to other pictures. Moderate compression
is achieved by reducing spatial redundancy, but not temporal redundancy. They can be used
periodically to provide access points in the bitstream where decoding can begin.
'Predictive' pictures (P-pictures) can use the previous I- or P-picture for motion compensation
and may be used as a reference for further prediction. Each block in a P-picture can either be
predicted or intra-coded. By reducing spatial and temporal redundancy, P-pictures offer
increased compression compared to I-pictures.
'Bidirectionally-predictive' pictures (B-pictures) can use the previous and next I- or P-pictures
for motion-compensation, and offer the highest degree of compression. Each block in a B-
picture can be forward, backward or bidirectionally predicted or intra-coded. To enable
backward prediction from a future frame, the coder reorders the pictures from natural 'display'
order to 'bitstream' order so that the B-picture is transmitted after the previous and next pictures
it references. This introduces a reordering delay dependent on the number of consecutive B-
pictures.
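The display-to-bitstream reordering can be sketched as follows (a simplified model in which each B-picture is emitted after the next reference picture it depends on):

```python
def to_bitstream_order(display):
    """Reorder pictures so each B arrives after both reference pictures
    it predicts from (the previous and the next I/P picture)."""
    out, pending_b = [], []
    for pic in display:
        if pic.startswith("B"):
            pending_b.append(pic)       # hold B until its future reference is sent
        else:
            out.append(pic)             # send the I/P reference first...
            out.extend(pending_b)       # ...then the Bs that depended on it
            pending_b = []
    return out + pending_b

print(to_bitstream_order(["I0", "B1", "B2", "P3", "B4", "B5", "P6"]))
# ['I0', 'P3', 'B1', 'B2', 'P6', 'B4', 'B5']
```

The two consecutive B-pictures in this example are held back until P3 (and later P6) has been sent, which is the reordering delay the text refers to.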
Buffer control : By removing much of the redundancy from the source images, the coder
outputs a variable bit rate. The bit rate depends on the complexity and predictability of the
source picture and the effectiveness of the motion-compensated prediction.
MPEG-2 is a standard currently in six parts:
Part 1 of MPEG-2 addresses the combining of one or more elementary streams of video and
audio, as well as other data, into single or multiple streams which are suitable for storage or
transmission. This is specified in two forms: the Program Stream and the Transport Stream.
Each is optimized for a different set of applications. The Program Stream is similar to MPEG-1
Systems Multiplex. It results from combining one or more Packetized Elementary Streams
(PES), which have a common time base, into a single stream. The Program Stream is designed
for use in relatively error-free environments and is suitable for applications which may involve
software processing. Program stream packets may be of variable and relatively great length.
The Transport Stream combines one or more Packetized Elementary Streams (PES) with one
or more independent time bases into a single stream. Elementary streams sharing a common
time base form a program. The Transport Stream is designed for use in environments where
errors are likely, such as storage or transmission in lossy or noisy media. Transport stream
packets are 188 bytes long.
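The fixed packet size can be illustrated with a toy packetizer (the 4-byte header here is heavily simplified; a real TS header also carries continuity counters, flags, and optional adaptation fields):

```python
def packetize(pes, pid=0x100):
    """Split a PES payload into fixed 188-byte transport packets:
    a 4-byte header (sync byte 0x47 plus a simplified PID field) and a
    184-byte payload, padding the last packet with 0xFF stuffing bytes."""
    packets = []
    for i in range(0, len(pes), 184):
        chunk = pes[i:i+184]
        header = bytes([0x47, (pid >> 8) & 0x1F, pid & 0xFF, 0x10])
        packets.append(header + chunk + b"\xff" * (184 - len(chunk)))
    return packets

pkts = packetize(b"\x00" * 400)
print(len(pkts), all(len(p) == 188 for p in pkts))  # 3 True
```

The short fixed size is what makes the Transport Stream robust in error-prone channels: a corrupted packet loses at most 188 bytes and the decoder can resynchronize on the next 0x47 sync byte.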
Part 2 of MPEG-2 builds on the powerful video compression capabilities of the MPEG-1
standard to offer a wide range of coding tools. These have been grouped in profiles to offer
different functionalities. Since the final approval of MPEG-2 Video in November 1994, one
additional profile has been developed. This uses existing coding tools of MPEG-2 Video but is
capable of dealing with pictures having a color resolution of 4:2:2 and a higher bit rate. Even
though MPEG-2 Video was not developed with studio applications in mind, a set of
comparison tests carried out by MPEG confirmed that MPEG-2 Video was at least as good as,
and in many cases better than, standards or specifications developed for high-bit-rate or studio
applications.
The Multiview Profile (MVP) is an additional profile currently being developed. By using
existing MPEG-2 Video coding tools it is possible to encode, in an efficient way, two video
sequences issued from two cameras shooting the same scene with a small angle between them.
Part 3 of MPEG-2 - Digital Storage Media Command and Control (DSM-CC) is the
specification of a set of protocols which provides the control functions and operations specific
to managing MPEG-1 and MPEG-2 bitstreams. These protocols may be used to support
applications in both stand-alone and heterogeneous network environments. In the DSM-CC model, a stream is sourced by a Server and delivered to a Client. Both the Server and the Client
are considered to be Users of the DSM-CC network. DSM-CC defines a logical entity called
the Session and Resource Manager (SRM) which provides a (logically) centralized
management of the DSM-CC Sessions and Resources.
Part 4 of MPEG-2 is the specification of a multichannel audio coding algorithm not
constrained to be backwards-compatible with MPEG-1 Audio. The standard was approved
in April 1997.
Part 5 of MPEG-2 was originally planned to be coding of video when input samples are 10
bits. Work on this part was discontinued when it became apparent that there was insufficient
interest from industry for such a standard.
Part 6 of MPEG-2 is the specification of the Real-time Interface (RTI) to Transport Stream
decoders which may be utilised for adaptation to all appropriate networks carrying Transport
Streams.
2.1.6 MPEG-4
Introduction
The creation of the MPEG-4 specification arose as experts wanted higher compression than
MPEG-2, in a form which also worked well at low bit rates. Discussions began at the end of
1992 and work on the standard started in July 1993.
MPEG-4 provides a standardized method of:
1. Audio-visual coding at very low bit rates
2. Describing audio-visual objects in a scene
3. Multiplexing and synchronizing the information associated with the objects
4. Interacting with the audio-visual scene that is received by the end user
Elementary Streams:
Each encoded media object has its own Elementary Stream (ES), which is sent to the decoder
and decoded individually, before composition. The following streams are created in
MPEG-4:
1. Scene Description Stream
2. Object Description Stream
3. Visual Stream
4. Audio Stream
Once the data has been encoded, the streams can be transmitted or stored separately and must be composed at the receiving end. Media objects are organized in a hierarchical manner to form audio-visual scenes. Because of this organization, each media object can be described and encoded independently of the other objects in the scene, e.g. the background.
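This hierarchical organization can be sketched as a simple object tree. The sketch below is illustrative only; the class and object names are invented for the example and are not part of the MPEG-4 specification:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class MediaObject:
    """One node in a hypothetical MPEG-4-style scene tree."""
    name: str
    kind: str                      # e.g. "still image", "video object", "audio"
    children: List["MediaObject"] = field(default_factory=list)

def flatten(obj: MediaObject, depth: int = 0):
    """Walk the scene tree, yielding (depth, name) pairs."""
    yield depth, obj.name
    for child in obj.children:
        yield from flatten(child, depth + 1)

# A toy scene: the background and the presenter are independent objects,
# so each one could be encoded (and replaced) without touching the other.
scene = MediaObject("scene", "scene", [
    MediaObject("background", "still image"),
    MediaObject("presenter", "video object", [
        MediaObject("voice", "audio"),
    ]),
])
```

Because each node is self-contained, swapping the background object would leave the presenter's streams untouched, which is exactly the independence the text describes.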
MPEG-4/BiFS:
- Allows users to change their viewpoint in a 3D scene or to interact with media objects.
- Allows different objects in the same scene to be coded at different levels of quality.
MPEG-4 Systems also addresses:
1. A standard file format to enable the exchange and authoring of MPEG-4 content
2. Interactivity (both client-side and server-side)
3. MPEG-J (MPEG-4 & Java)
4. The FlexMux tool, which allows the interleaving of multiple streams into a single stream
Profiles have been developed to create conformance points for MPEG-4 tools and toolsets, so that interoperability of MPEG-4 products using the same Profiles and Levels can be assured.
A Profile is a subset of the MPEG-4 Systems, Visual or Audio tool set intended for specific applications. It limits the tool set a decoder has to implement, since many applications need only a portion of the MPEG-4 toolset. Profiles specified in the MPEG-4 standard include:
a. Visual Profile
b. Natural Profile
c. Synthetic & Natural/Synthetic Hybrid Profiles
d. Audio Profile
e. Graphic Profile
f. Scene Graph Profile
The Systems part of MPEG-4 addresses the description of the relationship between the audio-visual components that constitute a scene. The relationship is described at two main levels.
The Binary Format for Scenes (BIFS) describes the spatio-temporal arrangement of the objects in the scene. Viewers may have the possibility of interacting with the objects, e.g. by rearranging them on the scene or by changing their own point of view in a 3D virtual environment. The scene description provides a rich set of nodes for 2-D and 3-D composition operators and graphics primitives.
At a lower level, Object Descriptors (ODs) define the relationship between the Elementary Streams pertinent to each object (e.g. the audio and the video stream of a participant in a videoconference). ODs also provide additional information.
2.1.7 MPEG-7
Introduction
The MPEG standards are an evolving set of standards for video and audio compression. MPEG-7 covers the most recent developments in multimedia search and retrieval, and is designed to standardize the description of multimedia content, supporting a wide range of applications including DVD, CD and HDTV.
MPEG-7 is a seven-part specification, formally entitled Multimedia Content Description
Interface. It provides standardized tools for describing multimedia content, which will enable
searching, filtering and browsing of multimedia content.
ISO 15938-1 Systems
MPEG-7 descriptions exist in two formats:
Textual: XML, which allows editing, searching and filtering of a multimedia description. The description can be located anywhere, not necessarily with the content.
Binary: a format suitable for storing, transmitting and streaming delivery of the multimedia description.
The MPEG-7 Systems part provides the tools for:
a. Preparation of a binary coded representation of MPEG-7 descriptions, for efficient storage and transmission
b. Transmission techniques (both textual and binary formats)
c. Multiplexing of descriptions
d. Synchronization of descriptions with content
e. Intellectual property management and protection
f. Terminal architecture
g. Normative interface
Descriptions may be represented in two forms:
- Textual (XML)
- Binary (BiM, the Binary format for Metadata). The binary coded representation is useful for efficient storage and transmission of content.
MPEG-7 data is obtained from transport or storage and handed to the delivery layer. This layer extracts the elementary streams (consisting of individually accessible chunks called access units) by undoing the transport/storage-specific framing and multiplexing, and retains the timing information needed for synchronization.

The elementary streams are then forwarded to the compression layer, where the schema streams (schemas describing the structure of MPEG-7 data) and the partial or full description streams (streams describing the content) are decoded.
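The delivery-layer step of undoing the framing to recover timed access units can be sketched as follows. The framing format here (a 4-byte timestamp plus a 2-byte length before each payload) is purely hypothetical, invented for the example; MPEG-7 leaves the actual framing to the transport/storage layer:

```python
import struct
from dataclasses import dataclass
from typing import List

@dataclass
class AccessUnit:
    """An individually accessible chunk of an elementary stream."""
    timestamp: int
    payload: bytes

def extract_access_units(stream: bytes) -> List[AccessUnit]:
    """Undo the (hypothetical) framing: 4-byte timestamp, 2-byte length, payload."""
    units, pos = [], 0
    while pos + 6 <= len(stream):
        ts, length = struct.unpack_from(">IH", stream, pos)
        pos += 6
        units.append(AccessUnit(ts, stream[pos:pos + length]))
        pos += length
    return units
```

The retained timestamps are what the terminal later uses to synchronize the decoded descriptions with the content.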
MPEG-7 tools
MPEG-7 uses the following tools:
- Descriptor (D): a representation of a feature, defined syntactically and semantically. A single object may be described by several descriptors.
- Description Scheme (DS): specifies the structure and semantics of the relations between its components; these components can be descriptors (D) or other description schemes (DS).
- Description Definition Language (DDL): an XML-based language used to define the structural relations between descriptors. It allows the creation and modification of description schemes and also the creation of new descriptors (D).
- System tools: deal with the binarization, synchronization, transport and storage of descriptors.
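As a rough illustration of how these tools nest, the sketch below builds a tiny textual (XML) description in which a description scheme contains a descriptor. The element and attribute names are simplified stand-ins invented for the example, not the normative MPEG-7 schema:

```python
import xml.etree.ElementTree as ET

def build_description() -> str:
    """Build a toy MPEG-7-style textual description: a DS wrapping a D."""
    root = ET.Element("Mpeg7")
    # Description Scheme (DS): structures the relations between components.
    ds = ET.SubElement(root, "DescriptionScheme", name="VideoSegment")
    # Descriptor (D): a syntactic/semantic representation of one feature.
    d = ET.SubElement(ds, "Descriptor", name="DominantColor")
    d.text = "128 64 32"
    return ET.tostring(root, encoding="unicode")
```

The same tree could equally be serialized in a binary form (BiM) for storage or streaming; only the representation changes, not the description.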
2.1.8 H.261
H.261 is an ITU-T video coding standard, ratified in November 1988. It was originally designed for transmission over ISDN lines, on which data rates are multiples of 64 kbit/s. It is one member of the H.26x family of video coding standards developed by the ITU-T Video Coding Experts Group (VCEG). The coding algorithm was designed to operate at video bit rates between 40 kbit/s and 2 Mbit/s. The standard supports two video frame sizes, CIF (352x288 luma with 176x144 chroma) and QCIF (176x144 with 88x72 chroma), using a 4:2:0 sampling scheme. It also has a backward-compatible trick for sending still picture graphics with 704x576 luma resolution and 352x288 chroma resolution (added in a later revision in 1993).
The main steps followed by this standard are:
1. Loop filter
The prediction process may be modified by a two-dimensional spatial filter (FIL) which operates on pixels within a predicted 8 by 8 block. The filter is separable into one-dimensional horizontal and vertical functions. Both are non-recursive, with coefficients of 1/4, 1/2, 1/4 except at block edges, where one of the taps would fall outside the block; in such cases the 1-D filter is changed to have coefficients of 0, 1, 0. Full arithmetic precision is retained, with rounding to 8-bit integer values at the 2-D filter output. Values whose fractional part is one half are rounded up. The filter is switched on/off for all six blocks in a macro block according to the macro block type.
2. Transformer
Transmitted blocks are first processed by a separable two-dimensional discrete cosine transform of size 8 by 8. The output from the inverse transform ranges from -256 to +255 after clipping, so that it can be represented with 9 bits. The transfer function of the inverse transform is given by:

f(x, y) = (1/4) * sum over u, v = 0..7 of C(u) C(v) F(u, v) cos(pi(2x+1)u/16) cos(pi(2y+1)v/16),
where C(u) = 1/sqrt(2) for u = 0 and C(u) = 1 otherwise (and likewise for C(v)).

NOTE - Within the block being transformed, x = 0 and y = 0 refer to the pel nearest the left and top edges of the picture, respectively.

The arithmetic procedures for computing the transforms are not defined, but the inverse transform should meet the specified error tolerance.
3. Quantization
The number of quantizers is 1 for the INTRA dc coefficient and 31 for all other coefficients.
Within a macro block the same quantizer is used for all coefficients except the INTRA dc one.
The decision levels are not defined. The INTRA dc coefficient is nominally the transform
value linearly quantized with a step size of 8 and no dead-zone. Each of the other 31 quantizers
is also nominally linear but with a central dead-zone around zero and with a step size of an
even value in the range 2 to 62.
Clipping of reconstructed picture
To prevent quantization distortion of transform coefficient amplitudes causing arithmetic
overflow in the encoder and decoder loops, clipping functions are inserted. The clipping
function is applied to the reconstructed picture which is formed by summing the prediction and
the prediction error as modified by the coding process. This clipper operates on resulting pel
values less than 0 or greater than 255, changing them to 0 and 255, respectively.
A number of video compression tools have been developed already:
- Video Compressor
- AVI Compressor
- MP4 Compressor
- MPEG Compressor
- 3GP Compressor
- YouTube Compressor
- iPod Compressor
- Flash Video Compressor
- QuickTime Compressor
- WMV Compressor
- MKV Compressor
- VOB Compressor
- DVD Compressor
2.2 Methodology Adopted
H.261
H.261 is an ITU-T video coding standard, ratified in November 1988. It was originally designed for transmission over ISDN lines, on which data rates are multiples of 64 kbit/s. It is one member of the H.26x family of video coding standards developed by the ITU-T Video Coding Experts Group (VCEG). The coding algorithm was designed to operate at video bit rates between 40 kbit/s and 2 Mbit/s. The standard supports two video frame sizes, CIF (352x288 luma with 176x144 chroma) and QCIF (176x144 with 88x72 chroma), using a 4:2:0 sampling scheme. It also has a backward-compatible trick for sending still picture graphics with 704x576 luma resolution and 352x288 chroma resolution (added in a later revision in 1993).
The main steps followed by this standard are:

1. Loop filter
The prediction process may be modified by a two-dimensional spatial filter (FIL) which operates on pixels within a predicted 8 by 8 block. The filter is separable into one-dimensional horizontal and vertical functions. Both are non-recursive, with coefficients of 1/4, 1/2, 1/4 except at block edges, where one of the taps would fall outside the block; in such cases the 1-D filter is changed to have coefficients of 0, 1, 0. Full arithmetic precision is retained, with rounding to 8-bit integer values at the 2-D filter output. Values whose fractional part is one half are rounded up. The filter is switched on/off for all six blocks in a macro block according to the macro block type.
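The loop filter can be sketched directly from this description: taps of 1/4, 1/2, 1/4 inside the block, 0, 1, 0 at the edges, full precision kept until a final rounding in which halves round up. This is an illustrative reimplementation of the text, not the normative code:

```python
import math

def filter_1d(samples):
    """Apply the non-recursive (1/4, 1/2, 1/4) filter; (0, 1, 0) at the edges."""
    n = len(samples)
    out = []
    for i in range(n):
        if i == 0 or i == n - 1:
            out.append(float(samples[i]))   # edge taps 0, 1, 0: pass through
        else:
            out.append((samples[i - 1] + 2 * samples[i] + samples[i + 1]) / 4)
    return out

def loop_filter(block):
    """2-D separable filter over an 8x8 block, rounding halves up at the output."""
    rows = [filter_1d(row) for row in block]                              # horizontal pass
    cols = [filter_1d([rows[r][c] for r in range(8)]) for c in range(8)]  # vertical pass
    return [[int(math.floor(cols[c][r] + 0.5))                            # round half up
             for c in range(8)] for r in range(8)]
```

A flat block passes through unchanged, while an isolated spike is spread over its neighbours, which is the smoothing effect the filter is there to provide.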
2. Transformer
Transmitted blocks are first processed by a separable two-dimensional discrete cosine transform of size 8 by 8. The output from the inverse transform ranges from -256 to +255 after clipping, so that it can be represented with 9 bits. The transfer function of the inverse transform is given by:

f(x, y) = (1/4) * sum over u, v = 0..7 of C(u) C(v) F(u, v) cos(pi(2x+1)u/16) cos(pi(2y+1)v/16),
where C(u) = 1/sqrt(2) for u = 0 and C(u) = 1 otherwise (and likewise for C(v)).

NOTE - Within the block being transformed, x = 0 and y = 0 refer to the pel nearest the left and top edges of the picture, respectively.

The arithmetic procedures for computing the transforms are not defined, but the inverse transform should meet the specified error tolerance.
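The inverse-transform definition can be turned into a direct, unoptimized sketch. Real codecs use fast factorizations; this brute-force version exists only to make the formula concrete:

```python
import math

def c(k: int) -> float:
    """Normalization factor: 1/sqrt(2) for the DC index, 1 otherwise."""
    return 1.0 / math.sqrt(2.0) if k == 0 else 1.0

def dct8x8(f):
    """Forward 8x8 DCT: coefficients F(u, v) from pel values f(x, y)."""
    return [[0.25 * c(u) * c(v) * sum(
                f[x][y]
                * math.cos((2 * x + 1) * u * math.pi / 16)
                * math.cos((2 * y + 1) * v * math.pi / 16)
                for x in range(8) for y in range(8))
             for v in range(8)] for u in range(8)]

def idct8x8(F):
    """Inverse 8x8 DCT, matching the transfer function quoted in the text."""
    return [[0.25 * sum(
                c(u) * c(v) * F[u][v]
                * math.cos((2 * x + 1) * u * math.pi / 16)
                * math.cos((2 * y + 1) * v * math.pi / 16)
                for u in range(8) for v in range(8))
             for y in range(8)] for x in range(8)]
```

Running `idct8x8(dct8x8(block))` reproduces the input up to floating-point error, which is also how an implementation can be checked against the standard's error-tolerance requirement on the inverse transform.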
3. Quantization
The number of quantizers is 1 for the INTRA dc coefficient and 31 for all other coefficients.
Within a macro block the same quantizer is used for all coefficients except the INTRA dc one.
The decision levels are not defined. The INTRA dc coefficient is nominally the transform
value linearly quantized with a step size of 8 and no dead-zone. Each of the other 31 quantizers
is also nominally linear but with a central dead-zone around zero and with a step size of an
even value in the range 2 to 62.
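A minimal sketch of the two kinds of quantizer follows. Since the standard deliberately leaves the decision levels undefined, the dead-zone rule used here (zeroing everything with magnitude below one step) is just one plausible encoder choice, not the normative behaviour:

```python
def quantize_intra_dc(coef: int) -> int:
    """INTRA dc coefficient: linear quantizer, step size 8, no dead zone."""
    return round(coef / 8.0)

def quantize_ac(coef: int, step: int) -> int:
    """Other coefficients: nominally linear, even step size in 2..62, with a
    central dead zone around zero (the exact decision levels are a
    hypothetical encoder choice, not fixed by the standard)."""
    assert step % 2 == 0 and 2 <= step <= 62
    if abs(coef) < step:          # central dead zone: small values drop to 0
        return 0
    sign = 1 if coef > 0 else -1
    return sign * (abs(coef) // step)
```

The dead zone is what suppresses small (mostly noise-like) transform coefficients and gives the coder most of its bit savings.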
Clipping of reconstructed picture
To prevent quantization distortion of transform coefficient amplitudes causing arithmetic
overflow in the encoder and decoder loops, clipping functions are inserted. The clipping
function is applied to the reconstructed picture which is formed by summing the prediction and
the prediction error as modified by the coding process. This clipper operates on resulting pel
values less than 0 or greater than 255, changing them to 0 and 255, respectively.
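The clipping stage is simple enough to state exactly from the text; a short sketch (with invented helper names) is:

```python
def clip_pel(value: int) -> int:
    """Force a reconstructed pel into the representable range 0..255."""
    return 0 if value < 0 else 255 if value > 255 else value

def reconstruct_block(prediction, error):
    """Sum the prediction and the prediction error, clipping each resulting
    pel so that quantization distortion cannot cause arithmetic overflow
    in the encoder and decoder loops."""
    return [[clip_pel(p + e) for p, e in zip(prow, erow)]
            for prow, erow in zip(prediction, error)]
```

Because both encoder and decoder apply the same clip, their reconstruction loops stay in step even when the quantized error would otherwise push a pel out of range.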
Chapter -3 Project Estimation and Implementation Plan

3.1 Cost and Benefit Analysis
3.1.1 ECONOMICAL
Economic analysis is the most frequently used method for evaluating a candidate system. More commonly known as cost-benefit analysis, the procedure is to determine the benefits and savings that are expected from the candidate system and compare them with the costs. If the benefits outweigh the costs, the decision is made to design and implement the system; otherwise, further justification or alterations are made to the proposed system.
This project has few hardware requirements, so the overall cost of installing the software is low.

From the point of view of economy, manual handling of hardware components is cheaper than a computerized system, and this approach normally works well in an ordinary organization. The major problems start when the number of hardware components grows over time. A manual system needs various registers/books to record the daily complaint entries and hardware entries. In case of any misplacement of a hardware component, the concerned registers have to be searched to verify the status of that component. Maintaining all of this manually is a very cumbersome job, whereas it is easy to maintain in the proposed system.
3.1.2 COST ANALYSIS
The cost of conducting the investigation was negligible, as the center manager and teachers of the center provided most of the information. The cost of the essential hardware and software requirements is not very high.
Moreover, hardware such as a Pentium-class PC and software such as MATLAB are easily available in the market.
3.1.3 BENEFITS AND SAVINGS
- The cost of maintaining the proposed system is negligible.
- Money is saved as paperwork is minimized.
- Records are easily entered and retrieved.
- Time is saved as all the work can be done with a simple mouse click.
- The proposed system is fully automated and hence easy to use.
Since the benefits outweigh the costs, the project is economically feasible.
3.2 Schedule Estimate
This table lists each activity used to accomplish the project, with its estimated completion date, duration and effort.

Activity                                     Completion Date   Duration (Days)   Effort (Man-hours)
A) Introduction                              20 AUG 2010       20                250
B) Problem Analysis                          15 SEP 2010       25                370
C) Project Estimation & Implementation Plan  20 OCT 2010       35                520
D) Research Design                           10 NOV 2010       20                300
E) System Interface Design                   10 DEC 2010       30                450
F) Coding                                    20 FEB 2011       30                600
G) Experiments Specification                 10 MAR 2011       20                300
H) Conclusions                               25 MAR 2011       15                20
I) User Manual                               10 APR 2011       15                20
3.3 Gantt Chart
A Gantt chart is a horizontal bar chart developed as a production control tool in 1917 by Henry L. Gantt, an American engineer and social scientist. Frequently used in project management, a Gantt chart provides a graphical illustration of a schedule that helps to plan, coordinate, and track the specific tasks in a project.

Gantt charts may be simple versions created on graph paper or more complex automated versions created using project management applications such as Microsoft Project or Excel.

A Gantt chart is constructed with a horizontal axis representing the total time span of the project, broken down into increments (for example, days, weeks, or months), and a vertical axis representing the tasks that make up the project (for example, if the project is outfitting your computer with new software, the major tasks involved might be: conduct research, choose software, install software). Horizontal bars of varying lengths represent the sequences, timing, and time span of each task. Using the same example, you would put "conduct research" at the top of the vertical axis and draw a bar on the graph representing the amount of time you expect to spend on the research, then enter the other tasks below the first one with representative bars at the points in time when you expect to undertake them. The bar spans may overlap, as, for example, you may conduct research and choose software during the same time span. As the project progresses, secondary bars, arrowheads, or darkened bars may be added to indicate completed tasks or the portions of tasks that have been completed. A vertical line is used to represent the report date.
Scheduling of the SDLC (Gantt chart): the phases Analysis, Documentation, Design, Coding and Testing are plotted against a time axis running from 0 to 35 weeks.
References
[1] HUFFMAN, D. A. (1951). A method for the construction of minimum redundancy codes. Proceedings of the Institute of Radio Engineers, 40, pp. 1098-1101.
[2] CAPON, J. (1959). A probabilistic model for run-length coding of pictures. IRE Transactions on Information Theory, IT-5(4), pp. 157-163.
[3] APOSTOLOPOULOS, J. G. (2004). Video Compression. Streaming Media Systems Group.
[4] The Moving Picture Experts Group home page. (3 Feb. 2006)
[5] CLARKE, R. J. (1995). Digital compression of still images and video. London: Academic Press.
[6] http://www.irf.uka.de/seminare/redundanz/vortrag15/ (3 Feb. 2006)
[7] PEREIRA, F. The MPEG-4 Standard: Evolution or Revolution?
[8] MANNING, C. The digital video site.
[9] SEFERIDIS, V. E. and GHANBARI, M. (1993). General approach to block-matching motion estimation. Optical Engineering, (32), pp. 1464-1474.
[10] GHARAVI, H. and MILLS, M. (1990). Block matching motion estimation algorithms: new results. IEEE Transactions on Circuits and Systems, (37), pp. 649-651.
[11] CHOI, W. Y. and PARK, R. H. (1989). Motion vector coding with conditional transmission. Signal Processing, (18), pp. 259-267.
[12] Institut für Informatik, Universität Karlsruhe.