HEVC-SIM:
A simulation and analysis tool for H.265 packet
transmission
By
Mohammad Manjur Rashed Khan
B.Sc., Bangladesh University of Engineering and Technology (BUET), 2003
Project report Submitted in Partial Fulfillment of the
Requirements for the Degree of
Master of Engineering
In the
School of Engineering Science
Faculty of Applied Science
Mohammad Manjur Rashed Khan 2015
SIMON FRASER UNIVERSITY
Fall 2015
Approval
Name: Mohammad Manjur Rashed Khan
Degree: Master of Engineering
Title: HEVC-SIM: A simulation and analysis tool for H.265 packet transmission
Supervisory Committee: Chair: Dr. Jie Liang Professor
Dr. Ivan V. Bajić Senior Supervisor Associate Professor
Thomas Pang Supervisor Senior Manager Software Development Broadcom Canada.
Date Defended/Approved:
October 9, 2015
Abstract
Video compression and decompression are essential parts of multimedia
communications. Lowering the bit rate while maintaining good video quality remains a
considerable challenge with increasing video resolutions and frame rates. In particular,
the introduction of 4K and 8K video resolution has created a significant demand for
efficient video compression appropriate for these high resolutions. The International
Telecommunication Union (ITU) and the Moving Picture Experts Group (MPEG) have
introduced High-Efficiency Video Coding (HEVC, also known as H.265) that has better
compression efficiency at the same or higher video quality, compared to earlier video
coding standards. In this project, a video coding and simulation tool was developed based
on open-source implementations of HEVC encoders and decoders. The tool, named
HEVC-SIM, can be used to encode raw video, analyze the packet structure and simulate
packet loss. In addition, two error concealment methods were implemented to help the
video decoder recover from packet loss. Simulations were carried out to assess the
effectiveness of the various algorithms implemented in HEVC-SIM.
Dedication
I would like to dedicate this report to my wife Shifa and my daughter Aarisha. Aarisha was
born while I was working on this project. She came into our life with joy and happiness.
My wife Shifa gave me positive support to complete this project. I am also grateful
to my parents for giving me lots of encouragement.
I also dedicate this work to Thomas Pang, who always sees everything positively and always
tries to find the good in it. He is a perfect example for me in improving my people skills.
I also dedicate this work to my professor, Ivan V. Bajić, who helped me by giving clear
direction on this project and providing his suggestions and advice.
Finally, I dedicate this work to the Almighty, who creates every single thing for a purpose.
Acknowledgements
I would like to thank Dr. Ivan V. Bajić, Associate Professor of SFU, for supervising this
project and providing his guidance on improving it. He made me aware of the HM
reference code for HEVC, as well as x265, an efficient implementation of HEVC. He also
explained the 2-state Markov chain model, as well as various approaches for packet loss
concealment. I would also like to thank Thomas Pang, Sr. Manager at Broadcom, who
provided valuable feedback and suggestions.
I would like to thank my wife, Sayeeda Sharmin Shifa, for her co-operation and support in
completing this project while she had to look after our little daughter Aarisha. She also
provided suggestions for writing this report.
Table of Contents
Approval
Abstract
Dedication
Acknowledgements
Table of Contents
List of Tables
List of Figures
List of Acronyms
Chapter 1. Introduction
Chapter 2. HEVC-SIM Software Description
Chapter 3. Result and Analysis
Chapter 4. Conclusion
References
List of Tables
Table 1: The PSNR values in dB for the four frames in the example
Table 2: HM and x265 encoder execution time comparison
List of Figures
Figure 1: The YUV420 format, from [3]
Figure 2: Typical HEVC video encoder (with decoder modeling elements shaded in light gray) from [1]
Figure 3: HEVC bitstream structure
Figure 4: HM encoder user interface
Figure 5: x265 encoder user interface
Figure 6: Packet loss simulation user interface
Figure 7: Gilbert-Elliott (2-state Markov) model, from [10]
Figure 8: User-supplied packet loss file for packet loss simulation
Figure 9: Decoder user interface
Figure 10: Packet loss concealment and PSNR calculation
Figure 11: Illustration of “CTU MV copy” concealment
Figure 12: Motion vector file format
Figure 13: Video preview window
Figure 14: List window displays frame type (e.g., I or P)
Figure 15: Output window shows the contents of VPS NAL unit
Figure 16: Output window shows the contents of SPS NAL unit
Figure 17: Output window shows the contents of PPS NAL unit
Figure 18: Output window shows decoding progress of slice NAL units
Figure 19: Frame 3 decoded without any packet loss
Figure 20: Frame 3 with 8 CTUs lost and no concealment
Figure 21: Frame 3 with 8 CTUs missing and "CTU copy" concealment
Figure 22: Frame 3 with 8 CTUs lost and "CTU MV copy" concealment
Figure 23: PSNR (dB) vs. packet loss percentage for Foreman
Figure 24: PSNR (dB) near 2% packet loss percentage for Foreman
Figure 25: PSNR (dB) vs. packet loss percentage for Flower Garden
List of Acronyms
HEVC High Efficiency Video Coding
AVC Advanced Video Coding
CTU Coding Tree Unit
CTB Coding Tree Block
CU Coding Unit
CB Coding Block
PU Prediction Unit
PB Prediction Block
TU Transform Unit
TB Transform Block
AMVP Advanced Motion Vector Prediction
MV Motion Vector
CABAC Context Adaptive Binary Arithmetic Coding
NAL Network Abstraction Layer
SEI Supplemental Enhancement Information
VPS Video Parameter Set
SPS Sequence Parameter Set
PPS Picture Parameter Set
VCL Video Coding Layer
PSNR Peak Signal-to-Noise Ratio
RGB Red, Green and Blue
YUV Luminance (Y) and chrominance (U, V)
Chapter 1. Introduction
1.1 Background
Internet users have significantly increased their demand for video content.
YouTube and Netflix are some of the prime examples that demonstrate how people are
using the Internet to stream videos, thus adding to the Internet traffic. It is estimated [14]
that by 2018, video will account for 75% of the Internet traffic. Moreover, the 4K (Ultra-HD)
and 8K resolution video will become the broadcasting standards of the future. For these
reasons, there will be additional pressure to develop video codecs with higher coding
efficiency that deliver lower bit rates and higher video quality. As the current answer to this
increase in video demand, High Efficiency Video Coding (HEVC, also known as H.265)
comes as a new coding standard with 50% lower bit rate at the same video quality
compared to the earlier standard H.264 [1].
The HEVC standard has been designed to significantly reduce the bit rate for the
same video quality, especially at high resolutions. It can support video resolutions from
320×240 to 8192×4320 (width×height in pixels). The International Telecommunication
Union (ITU) and the Moving Picture Experts Group (MPEG) finalized this standard on
January 25, 2013. The scalable coding extension, multi-view extensions, and 3D
extensions have also been developed based on HEVC. The main coding tools of HEVC
are still based on inter/intra picture prediction, motion vector prediction and coding, spatial
transform, and loop and deblocking filters, as in its predecessor H.264. H.264 uses
fixed-size macroblocks (MBs) for each frame, whereas HEVC uses variable-size blocks
called coding tree units (CTUs). The variable CTU size helps to improve
coding efficiency by allowing the coding block to be tailored to the video content.
The primary objective of this project is to develop an HEVC simulation application
based on the HEVC open source code. This involves understanding the different parts of
the HEVC standard, its various coding tools, video formats and source code. An additional
goal was to implement packet loss concealment, which is not part of the standard. The
developed application is referred to as HEVC-SIM.
HEVC-SIM is a simulator application built upon the HM reference software for
HEVC encoding and decoding, as well as an efficient HEVC encoder implementation
called x265. The HM encoder/decoder is relatively slow since it was developed primarily
for research purposes and standardization record-keeping, and does not involve any
hardware acceleration. On the other hand, the x265 encoder uses x86 hardware-
accelerated encoding. HEVC-SIM enables users who are not familiar with video coding in
general, and HEVC in particular, to easily encode raw video, analyze the bitstream, and
then simulate packet loss. Several packet loss concealment methods are implemented so
that users can investigate their performance on various video sequences.
1.2 Video format
The input of the HEVC encoder is a sequence of raw video frames. In video coding
applications, the input video color space is usually YUV, where Y denotes the luminance
component, while U and V are chrominance components. Conversion formulae between
RGB and YUV color spaces are shown below [15].
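From RGB to YUV:
$$
\begin{aligned}
Y &= 0.299R + 0.587G + 0.114B \\
U &= 0.492\,(B - Y) \\
V &= 0.877\,(R - Y)
\end{aligned}
\tag{1}
$$
This can also be represented as:
$$
\begin{aligned}
Y &= 0.299R + 0.587G + 0.114B \\
U &= -0.147R - 0.289G + 0.436B \\
V &= 0.615R - 0.515G - 0.100B
\end{aligned}
\tag{2}
$$
From YUV to RGB:
$$
\begin{aligned}
R &= Y + 1.140V \\
G &= Y - 0.395U - 0.581V \\
B &= Y + 2.032U
\end{aligned}
\tag{3}
$$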
Depending on the sampling, the YUV format comes in several flavors such as
YUV444, YUV422 and YUV420. The default input format for the main profile of HEVC is
YUV420. This is indicated by the value 1 in the chroma_format_idc field of the Sequence
Parameter Set (SPS) packet. The resolution of U and V components in the YUV420 format
is one quarter of the Y component resolution, half vertically and half horizontally. The
following figure from [3] illustrates the YUV420 format.
Figure 1: The YUV420 format, from [3]
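The conversion formulae and the YUV420 plane sizes described above can be checked numerically. The following is a minimal Python sketch, not code from HEVC-SIM: the function names are illustrative, samples are treated as floating-point values without rounding or clipping, and the frame size assumes 8-bit samples and even frame dimensions.

```python
def rgb_to_yuv(r, g, b):
    """Convert one RGB sample triple to YUV using equation (1)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = 0.492 * (b - y)
    v = 0.877 * (r - y)
    return y, u, v


def yuv_to_rgb(y, u, v):
    """Convert one YUV sample triple back to RGB using equation (3)."""
    r = y + 1.140 * v
    g = y - 0.395 * u - 0.581 * v
    b = y + 2.032 * u
    return r, g, b


def yuv420_frame_bytes(width, height, bytes_per_sample=1):
    """Bytes in one YUV420 frame: a full-size Y plane plus two chroma
    planes, each subsampled by two horizontally and vertically."""
    luma = width * height
    chroma = (width // 2) * (height // 2)
    return (luma + 2 * chroma) * bytes_per_sample


# Round trip for one pixel; the result is only approximately equal to the
# input because the published coefficients are rounded to three decimals.
y, u, v = rgb_to_yuv(200, 120, 50)
r, g, b = yuv_to_rgb(y, u, v)

# Size of one CIF (352x288) frame in a raw YUV420 file.
cif_bytes = yuv420_frame_bytes(352, 288)
```

For a CIF frame this gives 352·288 luminance samples plus two planes of 176·144 chrominance samples, which is 1.5 bytes per pixel on average.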
1.3 High Efficiency Video Coding (HEVC)
The HEVC video coding layer has a number of processing blocks, as depicted in
Figure 2 below. The raw video frame is split into blocks, and each block region is passed
through various elements of the encoder. The first picture is encoded using intra-picture
prediction, and is referred to as an I-frame. The remaining pictures in the same group can
be encoded using inter-picture prediction, where prediction can be causal (P-frames) or
bi-directional (B-frames). The inter-picture prediction employs selected reference pictures
and motion vectors (MVs) that indicate the motion of various blocks.
Figure 2: Typical HEVC video encoder (with decoder modeling elements shaded in
light gray) from [1].
The following are some important HEVC terms.
Coding tree unit (CTU) and coding tree block (CTB): The CTU is the basic block
of pixels, similar to the macroblock in H.264. The frame is divided into CTUs of
size 64×64, 32×32 or 16×16. The CTU has three coding tree blocks (CTBs):
one for luminance and two for chrominance components.
Coding unit (CU) and coding block (CB): The CTU is divided into one luminance
coding block (CB) and two chrominance CBs, each formed as a quadtree.
These CBs together are called a coding unit (CU).
Prediction unit (PU) and prediction block (PB): The decision whether inter-picture
or intra-picture prediction will be used is made at the CU level. The
CU is partitioned into prediction units (PUs). The CBs are split into prediction blocks
(PBs) for luminance and chrominance components. The size of a PB varies from
64×64 to 4×4.
Transform unit (TU) and transform block (TB): A transform unit (TU) tree has its
root at the CU level and performs a spatial transform. The luminance and
chrominance parts of the TU are called transform blocks (TBs). The TB size can
be 4×4, 8×8, 16×16, or 32×32.
Motion vector signaling: A new algorithm called Advanced Motion Vector
Prediction (AMVP) is used on the nearest PBs for MV prediction.
Entropy coding: The entropy coding algorithm used in HEVC is Context
Adaptive Binary Arithmetic Coding (CABAC), similar to H.264.
In-loop deblocking filter: The in-loop deblocking filter is used within the
inter-picture prediction loop.
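To make the CTU partitioning concrete, the number of CTUs covering a frame can be computed from the frame and CTU sizes. The sketch below is illustrative only (the function name is hypothetical); it assumes that CTUs at the right and bottom borders may extend past the picture edge, so the counts are rounded up.

```python
import math


def ctu_grid(width, height, ctu_size=64):
    """Return (columns, rows, total) of CTUs needed to cover the frame."""
    cols = math.ceil(width / ctu_size)
    rows = math.ceil(height / ctu_size)
    return cols, rows, cols * rows


# CIF frame (352x288) with the largest and the smallest CTU size.
cif_64 = ctu_grid(352, 288, 64)
cif_16 = ctu_grid(352, 288, 16)
```

With 64×64 CTUs a CIF frame is covered by a 6×5 grid, so a loss of 8 CTUs affects roughly a quarter of the picture.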
The NAL units in the HEVC bitstream are separated by certain reserved bit
sequences, specifically 0x000001 (3 Bytes) or 0x00000001 (4 Bytes). The bitstream
contains various parameters, organized into parameter sets, and the encoded pixel values
organized into slices. The structure of the bitstream is illustrated in Figure 3.
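The start-code framing described above can be illustrated with a short Python sketch that splits a byte stream into NAL units and reads the unit type from the header. This is not code from HEVC-SIM or HM. It relies on the two-byte HEVC NAL unit header, in which the 6-bit nal_unit_type follows the first bit, and on the type values 32, 33 and 34 for VPS, SPS and PPS; the function names and the tiny test stream are made up for illustration.

```python
import re

START_CODE = re.compile(b"\x00\x00\x01")  # also matches the tail of 0x00000001
PARAMETER_SETS = {32: "VPS", 33: "SPS", 34: "PPS"}


def split_nal_units(stream):
    """Return the bytes of each NAL unit found between start codes."""
    starts = [m.end() for m in START_CODE.finditer(stream)]
    units = []
    for i, begin in enumerate(starts):
        # A unit ends where the next 3-byte start code begins.
        end = len(stream) if i + 1 == len(starts) else starts[i + 1] - 3
        # A NAL unit never ends in 0x00, so trailing zeros belong to the
        # framing (for example the extra zero of a 4-byte start code).
        units.append(stream[begin:end].rstrip(b"\x00"))
    return units


def nal_unit_type(unit):
    """Extract the 6-bit nal_unit_type from the first header byte."""
    return (unit[0] >> 1) & 0x3F


def describe(unit):
    """Name parameter sets; types below 32 carry coded slice data (VCL)."""
    t = nal_unit_type(unit)
    if t in PARAMETER_SETS:
        return PARAMETER_SETS[t]
    return "VCL" if t < 32 else "other"


# Hypothetical stream: three parameter-set headers with dummy payload bytes.
stream = (b"\x00\x00\x00\x01" + bytes([0x40, 0x01, 0xAA])
          + b"\x00\x00\x01" + bytes([0x42, 0x01, 0xBB])
          + b"\x00\x00\x00\x01" + bytes([0x44, 0x01, 0xCC]))
units = split_nal_units(stream)
types = [nal_unit_type(u) for u in units]
names = [describe(u) for u in units]
```

Matching only the 3-byte pattern is enough here, because the 4-byte start code ends with the same three bytes.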
VPS | SPS | PPS | Slice header | Slice data
Figure 3: HEVC bitstream structure.
The parameter sets are as follows:
Video parameter set (VPS) contains the information of various layers, for
example in scalable or 3D video coding extensions.
Sequence parameter set (SPS) contains the information of temporal sub-layers,
picture resolution, color bit depth, coding block size, and so on.
Picture parameter set (PPS) contains the information of the initial quantization
parameter (QP), number of tiles, and so on.
Slice header contains the frame index (a.k.a., picture order count), the address
of the first CTU, and so on.
These parameters enable the HEVC decoder to interpret and decode the video
bitstream.
Chapter 2. HEVC-SIM Software Description
The HEVC-SIM software application was built upon open-source
implementations of the HEVC encoder and decoder. For the encoder, the source code was
taken from HM 16.3 [4] and the API was taken from the x265 project [6]. The decoder
source code was taken from HM 16.3. The purpose of HEVC-SIM is to allow the user to
explore various HEVC encoding options and compare the speed of the HM and the x265
video encoders. In addition, packet loss concealment was implemented to allow one to
study the effects of packet loss on the decoded video quality. The remainder of this chapter
describes the various features of HEVC-SIM.
2.1 HM encoder
The HM encoder is used to compress YUV420 video. A user interface was
developed for the HM encoder to allow the user to control the following:
Video resolution (width and height)
Target bitrate
Quantization profile for quantization parameters
Intra period of I-frames
Number of frames to be encoded
Maximum coding unit size
Maximum depth of coding unit
Slice size determined by CTU, bytes, or tiles
Various options for motion estimation, such as full search, diamond, or PMVFAST using a search range
An illustration of the user interface for the HM encoder is given in Figure 4.
Figure 4: HM encoder user interface
2.2 x265 encoder
The x265 encoder is considerably faster than the HM encoder on x86 PCs,
because it uses hardware acceleration for heavy computations. The user interface
developed for the x265 encoder is illustrated in Figure 5 and allows the user to control the
following encoding options:
Video resolution (width and height)
Target bitrate
Quantization profile for quantization parameters
Number of frames to be encoded
Figure 5: x265 encoder user interface
2.3 Packet loss simulation
In practice, the encoded video is transmitted to the user through an imperfect
channel, often through a packet-switched network, where packets are prone to errors and
loss. To allow one to study the effects of packet loss on the decoded video, HEVC-SIM
provides packet loss simulation to emulate the channel, as well as packet loss
concealment for the video decoder.
To simulate packet loss, HEVC-SIM takes an encoded HEVC video file as input,
removes some slices from it, and writes the remaining slices into another file. The output
file has some slices missing, compared to the input file, which simulates the effects of
packet loss. The removal of the slices can be done with one of the following two options.
Randomly, using a Gilbert-Elliott [12] (2-state Markov) packet loss model.
With a user-supplied list of packets.
The user interface of the packet loss simulation module is shown in Figure 6 below.
Figure 6: Packet loss simulation user interface
The Gilbert-Elliott model is a 2-state Markov model, often used to simulate bursty
errors or losses [10]. It is shown in Figure 7. The two states are termed the good (G) state and
the bad (B) state. A packet is lost whenever the chain moves to, or stays in, the bad
state, and it is received whenever the chain moves to, or stays in, the good state.
If $p$ is the probability of transition from the good to the bad state, and $q$ is the probability of
transition from the bad to the good state, then the state transition matrix (rows: current
state G, B; columns: next state G, B) is given by
$$
\begin{bmatrix} 1-p & p \\ q & 1-q \end{bmatrix}
\tag{4}
$$
Figure 7: Gilbert-Elliott (2-state Markov) model, from [10]
Probabilities $p$ and $q$ can be set in the user interface (Figure 6). It is often useful to
translate the model's probabilities $p$ and $q$ to the parameters of a burst packet loss process.
Let P be the probability of packet loss and L be the average burst length. Then, following
[11], these can be related to the probabilities $p$ and $q$ as follows:
$$
P = \frac{p}{p+q}
\tag{5}
$$
and
$$
L = \frac{1}{q}
\tag{6}
$$
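The two-state model can be simulated in a few lines. The sketch below is an illustration rather than the HEVC-SIM implementation. It writes p for the good-to-bad and q for the bad-to-good transition probability (symbol names chosen here), uses the standard two-state results P = p/(p+q) and L = 1/q to derive p and q from a target loss probability and mean burst length, and assumes that the chain starts in the good state.

```python
import random


def ge_parameters(loss_prob, burst_len):
    """Solve P = p / (p + q) and L = 1 / q for the transition probabilities."""
    q = 1.0 / burst_len
    p = loss_prob * q / (1.0 - loss_prob)
    return p, q


def simulate_losses(num_packets, p, q, seed=0):
    """Return a list of booleans, True where the packet is lost."""
    rng = random.Random(seed)
    bad = False  # assumption: the chain starts in the good state
    lost = []
    for _ in range(num_packets):
        if bad:
            bad = rng.random() >= q  # stay in the bad state with probability 1 - q
        else:
            bad = rng.random() < p   # enter the bad state with probability p
        lost.append(bad)
    return lost


# Target: 10% packet loss with an average burst length of 5 packets.
p, q = ge_parameters(0.10, 5)
lost = simulate_losses(200_000, p, q, seed=1)
loss_rate = sum(lost) / len(lost)
```

Over a long run the empirical loss rate approaches P, while shorter runs can deviate noticeably because losses arrive in bursts.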
To simulate packet loss caused by a different, possibly more complicated loss
process, the user can also specify which slices/packets to drop. This can be accomplished
by providing packet indices (one per line) in a text file called pkt_drop_index.txt in
the same directory where the HEVC-SIM executable is located. When presented with such
a file, the simulator will read packet indices from the file and remove the corresponding
slices from the bitstream. An illustration of the user-supplied packet loss file is shown in
Figure 8.
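A user-supplied loss pattern of this kind is easy to produce and apply with a script. The following sketch is a hypothetical illustration of the file convention described above (one packet index per line in pkt_drop_index.txt). Whether HEVC-SIM counts indices from 0 or 1, and whether parameter-set NAL units are counted, is not stated here, so zero-based indexing over a plain list of packets is an assumption of this example.

```python
import os
import tempfile


def read_drop_indices(path):
    """Read one integer packet index per line, ignoring blank lines."""
    with open(path) as f:
        return {int(line) for line in f if line.strip()}


def drop_packets(packets, drop):
    """Keep every packet whose (assumed zero-based) index is not listed."""
    return [pkt for i, pkt in enumerate(packets) if i not in drop]


# Write a small example file in a temporary directory, then read it back.
with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, "pkt_drop_index.txt")
    with open(path, "w") as f:
        f.write("2\n5\n6\n")
    drop = read_drop_indices(path)

# Stand-in for the slices of a bitstream.
packets = [f"slice{i}" for i in range(8)]
kept = drop_packets(packets, drop)
```

The same file could be generated from a measured packet trace or from any other channel model.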
Figure 8: User-supplied packet loss file for packet loss simulation
2.4 HM decoder
The HEVC decoder for the HEVC-SIM application was built upon the HM decoder
[4]. In its basic form, it takes a loss-free HEVC file, decodes it, and generates a YUV420
formatted file that can be played using a YUV Player. The corresponding user interface is
shown in Figure 9. The more challenging case, when the encoded file has been subject
to packet loss, is described in the following section.
Figure 9: Decoder user interface
2.5 Packet loss concealment
The user interface for the packet loss concealment module is illustrated in Figure
10. Several options have been provided for decoding the video files that have been subject
to packet loss:
No concealment ("No copy" in the user interface)
Previous CTU repeat ("CTU copy" in the user interface)
CTU repeat with motion vector ("CTU MV copy" in the user interface)
Figure 10: Packet loss concealment and PSNR calculation
The “no concealment” option means that no attempt is made to reconstruct the
pixel values in the missing packets/slices. These pixel values are simply set to zero, which
will result in them being displayed as green color. The “previous CTU repeat” option means
that the pixel values in the lost part of the frame are recovered by copying them from the
same position in the previous frame. The “CTU repeat with motion vector” option means
that the pixel values in the lost part of the frame are recovered by copying the equal region
of CTU size that is offset from the current position by a suitably chosen MV. In the current
implementation, the MV is simply taken from the nearest available CTU in the vertical
direction, but other options could be examined in the future. Figure 11 shows how a CTU
from the previous frame, displaced by the MV, is copied to the current frame. Here, $(x, y)$
are the coordinates of the upper-left corner of a CTU, while $(v_x, v_y)$ are the motion
vector components.
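The two concealment options can be sketched on a single luminance plane stored as a list of rows. This is a simplified illustration, not the HM-based implementation: the function names are hypothetical, the MV is assumed to be in whole pixels and to point from the current block to its source in the previous frame, source coordinates are clamped to the frame, and ties between the CTU above and the CTU below are resolved in favour of the one above.

```python
def nearest_vertical_mv(ctu_mvs, row, col):
    """Pick the MV of the nearest available CTU in the same column.

    ctu_mvs[r][c] is an (mv_x, mv_y) tuple, or None for a lost CTU.
    """
    rows = len(ctu_mvs)
    for dist in range(1, rows):
        for r in (row - dist, row + dist):
            if 0 <= r < rows and ctu_mvs[r][col] is not None:
                return ctu_mvs[r][col]
    return (0, 0)  # nothing available: fall back to plain "CTU copy"


def conceal_ctu(cur, prev, x0, y0, size, mv=(0, 0)):
    """Fill the block at (x0, y0) in cur from prev, displaced by mv.

    mv = (0, 0) corresponds to "CTU copy"; any other value to "CTU MV copy".
    """
    height, width = len(prev), len(prev[0])
    dx, dy = mv
    for y in range(y0, min(y0 + size, height)):
        for x in range(x0, min(x0 + size, width)):
            sy = min(max(y + dy, 0), height - 1)
            sx = min(max(x + dx, 0), width - 1)
            cur[y][x] = prev[sy][sx]


# Toy 8x8 frame with 4x4 "CTUs"; the bottom-right CTU of the frame is lost.
prev_frame = [[10 * y + x for x in range(8)] for y in range(8)]
cur_frame = [[0] * 8 for _ in range(8)]
ctu_mvs = [[(1, 0), (-2, 1)],
           [(0, 0), None]]
mv = nearest_vertical_mv(ctu_mvs, row=1, col=1)
conceal_ctu(cur_frame, prev_frame, x0=4, y0=4, size=4, mv=mv)
```

Here the lost CTU borrows the MV (-2, 1) from the CTU directly above it, so its top-left pixel is taken from position (2, 5) of the previous frame.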
Figure 11: Illustration of “CTU MV copy” concealment. Orange marks the lost CTUs in the
current frame; each arrow shows a CTU copied from the previous frame. The MV is taken
from the nearest available CTU in the vertical direction.
After choosing the concealment option, the user can decode the video file either using
the decoder tab or using the analyzer. Also, the Peak Signal-to-Noise Ratio (PSNR, see
Chapter 3) of the decoded video can be calculated.
2.6 Motion vector output
Motion vectors from the compressed bitstream can be useful in a number of video
processing tasks, such as global motion estimation [16], motion segmentation [17],
tracking [18], visual saliency estimation [19, 20], and so on. HEVC-SIM allows the motion
vector of each PU to be saved into the MotionVector.txt file in the output directory.
The format of the file is given in Figure 12 below. The parameter m_ctuRsAddr shows the
CTU index, m_uiNumPartition shows the index of the partition within each CTU, and POC
is the picture order count. The vector format is $(v_x, v_y)$, where $v_x$ represents the
horizontal component and $v_y$ is the vertical component.
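A dump in this format can be read back for further processing. The sketch below is a hypothetical parser based only on the sample lines of Figure 12; the regular expressions, the record layout and the function name are assumptions of this example, and the real file may contain variations that it does not handle.

```python
import re

HEADER = re.compile(r"m_ctuRsAddr=(\d+),\s*m_uiNumPartition=(\d+),\s*POC=(-?\d+)")
VECTOR = re.compile(r"\(\s*(-?\d+)\s*,\s*(-?\d+)\s*\)")


def parse_motion_vectors(text):
    """Group the (horizontal, vertical) pairs under the CTU header before them."""
    records = []
    current = None
    for line in text.splitlines():
        header = HEADER.search(line)
        if header:
            ctu, partitions, poc = map(int, header.groups())
            current = {"ctu": ctu, "partitions": partitions, "poc": poc, "mvs": []}
            records.append(current)
        elif current is not None:
            # Lines without vectors simply contribute nothing.
            current["mvs"].extend((int(x), int(y)) for x, y in VECTOR.findall(line))
    return records


# Sample text modelled on Figure 12, truncated to three vectors.
sample = """TComCUMvField: m_ctuRsAddr=9, m_uiNumPartition=256, POC=1
nTComCUMvField: ref pic number=0
TComCUMvField:dumpMV m_uiNumPartition=256:
(-2,3) (-2,3 )(-2,3)
"""
records = parse_motion_vectors(sample)
```

Each record then holds the CTU index, the picture order count and the list of vectors, which is a convenient starting point for tasks such as global motion estimation.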
TComCUMvField: m_ctuRsAddr=9, m_uiNumPartition=256, POC=1
nTComCUMvField: ref pic number=0
TComCUMvField:dumpMV m_uiNumPartition=256:
(-2,3) (-2,3 )(-2,3)…
Figure 12: Motion vector file format
2.7 Preview using OpenGL
The video can be displayed frame by frame on the preview window within HEVC-SIM's
user interface, shown in Figure 13 below. This helps to visualize the effects of
compression, and possible packet loss and error concealment. From this window it is also
possible to initiate stream analysis that displays information about various parameter sets
such as VPS, PPS, SPS, as described next.
Figure 13: Video preview window
2.8 NAL unit description
HEVC-SIM can parse VPS, SPS and PPS NAL units and display the content
of their various fields on the output window. It also displays the first and last CTU index
for each slice or frame. Figures 15 – 18 show the information from these NAL units.
Figure 14: List window displays frame type (e.g., I or P)
Figure 15: Output window shows the contents of VPS NAL unit
Figure 16: Output window shows the contents of SPS NAL unit
Figure 17: Output window shows the contents of PPS NAL unit
Figure 18: Output window shows decoding progress of slice NAL units
Chapter 3. Result and Analysis
This chapter presents the results and analysis of packet loss concealment in
HEVC-SIM. The tests were done on an Intel Core i5 CPU running Windows 7. Packet
loss is introduced into the bitstream by the user-supplied list of lost packets, or by the
Gilbert-Elliott packet loss model. The concealment is performed using one of the options described
in Section 2.5.
Peak Signal-to-Noise Ratio (PSNR) is commonly used to measure the objective
quality of video. It is calculated based on the mean square error (MSE). The definition of
PSNR in dB is given in equation (8). Here, $W$ and $H$ are the width and height of the frame,
respectively, $I(i,j)$ is the original pixel value, and $\hat{I}(i,j)$ is the reconstructed pixel value, where the
reconstruction is accomplished by decoding and/or error concealment. MAX is the
maximum possible pixel value, which is 255 for 8-bit images.
$$
\mathrm{MSE} = \frac{1}{WH} \sum_{i=1}^{W} \sum_{j=1}^{H} \left( I(i,j) - \hat{I}(i,j) \right)^2
\tag{7}
$$
$$
\mathrm{PSNR} = 10 \cdot \log_{10} \frac{\mathrm{MAX}^2}{\mathrm{MSE}}
\tag{8}
$$
If the decoded and original pixel values ($\hat{I}(i,j)$ and $I(i,j)$) are the same for each pixel, the
MSE will be zero and the PSNR will be infinite; this would indicate a perfect match
between the original and decoded video. Lower PSNR values indicate degradation, due
to quantization and possibly packet loss. The analysis is done by comparing the PSNR of
decoded video in different scenarios, for example without packet loss, with packet loss but
no concealment, with “CTU copy” concealment and “CTU MV copy” concealment.
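The MSE and PSNR definitions translate directly into code. The sketch below is a plain-Python illustration for a single 8-bit plane stored as a list of rows; it is not the routine used in HEVC-SIM, and the function names are chosen here for illustration.

```python
import math


def mse(original, reconstructed):
    """Mean square error between two equally sized planes."""
    height, width = len(original), len(original[0])
    total = sum((original[i][j] - reconstructed[i][j]) ** 2
                for i in range(height) for j in range(width))
    return total / (width * height)


def psnr(original, reconstructed, max_value=255):
    """PSNR in dB; infinite when the two planes are identical."""
    err = mse(original, reconstructed)
    if err == 0:
        return math.inf
    return 10 * math.log10(max_value ** 2 / err)


# A 2x2 plane in which one sample differs by 10, so MSE = 100 / 4 = 25.
original = [[100, 100], [100, 100]]
decoded = [[100, 100], [100, 110]]
error = mse(original, decoded)
quality_db = psnr(original, decoded)
```

In this toy case the PSNR is 10·log10(255²/25), a little over 34 dB.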
3.1 Visual quality
The first test is performed on the CIF-resolution (352×288) Foreman sequence,
with a frame rate of 30 frames per second (fps), encoded at 800 kbps. Four figures are
shown below to illustrate the test. In Figure 19, decoded frame 3 is shown without any
packet loss. Figure 20 shows the same frame, decoded without 8 CTUs, which have been
assumed lost. No concealment has been applied, so the pixels belonging to the lost CTUs
are shown as green.
Figure 21 shows the same frame, but with "CTU copy" applied to recover pixel
values in the missing CTUs. Finally, Figure 22 shows the same frame with "CTU MV copy"
concealment applied, where the MV is taken from the nearest available vertically
neighbouring CTU. In this particular example, due to low motion intensity in the sequence,
both concealment methods produce visually good results. It is hard to distinguish the
concealed frames (Figures 21-22) from the error-free frame in Figure 19.
Figure 19: Frame 3 decoded without any packet loss.
Figure 20: Frame 3 with 8 CTUs lost and no concealment
Figure 21: Frame 3 with 8 CTUs missing and "CTU copy" concealment
Figure 22: Frame 3 with 8 CTUs lost and "CTU MV copy" concealment
3.2 PSNR results
The following table shows the PSNR values of video frames from the example
above. Frame 3 suffers from packet loss, which is evident in the table as the PSNR of this
frame is much lower than that of frame 2. Note that concealment (both with and without
MV) gives much better PSNR than no concealment. Also note that frame 4, which did not
suffer from packet loss, still has considerably lower PSNR than frames 1 and 2. This effect
is known as error propagation and is caused by predictive encoding of frames using earlier
frames as the reference.
Frame   Loss-free   No concealment   CTU copy   CTU MV copy
1       49.60       49.60            49.60      49.60
2       42.62       42.62            42.62      42.62
3       38.20        9.94            31.34      32.62
4       40.07       11.03            31.20      33.69
Table 1: The PSNR values in dB for the four frames in the example
To get a better idea of the average behavior of the error concealment schemes,
the Gilbert-Elliott packet loss model was applied to two video sequences, Foreman and Flower
Garden, both with CIF resolution at 30 fps. They were encoded at 800 kbps using the
IPPP… group of pictures (GOP) structure, with the first frame being intra-coded and all
subsequent frames being inter-coded as P-frames. The packet loss probability P was
taken in the range from 0 to 0.2 and the average burst length L was taken to be 5 packets.
Figures 23 and 25 show the average PSNR vs. packet loss percentage for these two
sequences. Error bars along the PSNR curves indicate the standard deviation of PSNR
for each packet loss percentage. This result was obtained by running 8 packet loss
realizations over 300 frames of each sequence.
The graphs show that PSNR decreases as the packet loss percentage increases,
as expected. It is also seen that concealment can give about a 5-10 dB advantage in PSNR
over no concealment. Figure 23 shows that the PSNR values of CTU copy and CTU MV copy are
very close. Figure 24 zooms into the region near 2% packet loss, where it is seen that on
average, CTU MV copy has a 0.1 dB advantage over CTU copy. This indicates that a
simple copying of the MV from the nearest available vertically neighboring CTU is not a
particularly successful strategy for exploiting motion information for error concealment,
and that other strategies should be explored to further improve the performance. Another
factor that brings the PSNR down is the IPPP… GOP structure. In practice, the system
would likely need to refresh the coding loop with an I-frame at regular intervals, or when
packet loss is detected.
[Plot "Packet loss rate vs PSNR (foreman cif.yuv)": x-axis, packet loss in percent (0 to 25); y-axis, PSNR (0 to 45); curves: no error, No concealment, CTU copy, CTU copy with MV]
Figure 23: PSNR (dB) vs. packet loss percentage for Foreman
[Plot "Packet loss rate vs PSNR (foreman cif.yuv)": x-axis, packet loss in percent (1.9 to 2.25); y-axis, PSNR (35.8 to 36.8); curves: no error, No concealment, CTU copy, CTU copy with MV]
Figure 24: PSNR (dB) near 2% packet loss percentage for Foreman
Figure 25 shows the corresponding PSNR vs. packet loss percentage curves for
Flower Garden. Similar conclusions can be drawn from this figure: concealment provides
significant improvement compared to no concealment, while CTU copy and CTU MV copy
offer similar performance.
[Plot "Packet loss rate vs PSNR (flower cif.yuv)": x-axis, packet loss in percent (0 to 25); y-axis, PSNR (0 to 30); curves: no error, No concealment, CTU copy, CTU copy with MV]
Figure 25: PSNR (dB) vs. packet loss percentage for Flower Garden
3.3 Encoder complexity
The HM encoder is rather slow, as it does not have hardware acceleration. On the other
hand, x265 incorporates x86 hardware acceleration, which makes it much faster than the
HM encoder. Below is a side-by-side comparison between the HM encoder and the x265
encoder for encoding 10 frames (YUV 4:2:0, CIF resolution) of three popular video
sequences. The target bitrate was 800 kbps for all three sequences.
Sequence          HM Encoder (seconds)   x265 Encoder (seconds)
Foreman           926.466                0.811
Mobile Calendar   717.088                1.168
News              758.475                0.729
Table 2: HM and x265 encoder execution time comparison
From the above table, we see that the encoding time is not the same for each
sequence: it varies depending on the motion and texture complexity of the sequence.
The results show that the x265 encoder is roughly three orders of magnitude faster than
the HM encoder, which is expected given its optimized implementation.
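The speed-up can be computed directly from the execution times in Table 2. The short sketch below only repeats that arithmetic; the variable names are illustrative.

```python
import math

# Execution times in seconds from Table 2: (HM, x265).
times = {
    "Foreman": (926.466, 0.811),
    "Mobile Calendar": (717.088, 1.168),
    "News": (758.475, 0.729),
}

# Ratio of HM time to x265 time, and the same ratio as a power of ten.
speedup = {name: hm / x265 for name, (hm, x265) in times.items()}
orders = {name: math.log10(ratio) for name, ratio in speedup.items()}
```

The ratios come out at about 1142 for Foreman, 614 for Mobile Calendar and 1040 for News, that is, between about 2.8 and 3.1 orders of magnitude.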
Chapter 4. Conclusion
HEVC-SIM is a software tool that can help to understand and explore HEVC
encoder and decoder capabilities. Its graphical interface allows users to easily change
various encoder parameters and analyze the encoded bitstream. The software can
emulate packet loss on the bitstream using the Gilbert-Elliott channel model and save the
results in an output file that can later be used in testing network-based video bitstream
transmission. Further, the software is able to accept any user-supplied packet loss
realization, which can come from more sophisticated channel models or even real
measured packet traces.
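To illustrate the kind of loss realization discussed above, the following is a minimal sketch of a two-state Gilbert-Elliott loss generator in Python. It is not the HEVC-SIM implementation. The parameter names (`p_gb`, `p_bg`, `loss_good`, `loss_bad`) and the choice of starting in the Good state are assumptions made for this example.

```python
import random


def gilbert_elliott_trace(num_packets, p_gb, p_bg,
                          loss_good=0.0, loss_bad=1.0, seed=None):
    """Generate a packet loss trace (True = packet lost).

    p_gb: probability of moving from the Good to the Bad state (assumed name).
    p_bg: probability of moving from the Bad to the Good state (assumed name).
    loss_good / loss_bad: loss probability while in each state.
    """
    rng = random.Random(seed)
    bad = False  # assumption: the channel starts in the Good state
    trace = []
    for _ in range(num_packets):
        # Decide the fate of the current packet from the current state.
        loss_prob = loss_bad if bad else loss_good
        trace.append(rng.random() < loss_prob)
        # Then draw the state for the next packet.
        if bad:
            if rng.random() < p_bg:
                bad = False
        elif rng.random() < p_gb:
            bad = True
    return trace


def expected_loss_rate(p_gb, p_bg, loss_good=0.0, loss_bad=1.0):
    """Long-run loss rate from the stationary state probabilities."""
    pi_bad = p_gb / (p_gb + p_bg)
    return (1.0 - pi_bad) * loss_good + pi_bad * loss_bad


trace = gilbert_elliott_trace(100000, p_gb=0.02, p_bg=0.4, seed=1)
print("empirical loss rate:", sum(trace) / len(trace))
print("expected loss rate: ", expected_loss_rate(0.02, 0.4))
```

A trace in this form, one flag per packet, is also a natural format for a user-supplied loss realization, since it does not depend on how the losses were generated.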
Beyond integrating open-source HEVC encoders and decoders under a common
user interface, two procedures for concealing the effects of packet loss are implemented
in HEVC-SIM. Both are based on utilizing the available CTU data to estimate pixel values
in lost CTUs. The two concealment methods were compared on several test sequences,
and the results showed that they can yield a PSNR gain of about 5-10 dB compared to no
concealment. Both of these methods represent simple temporal concealment. In future
work, more sophisticated concealment methods could be implemented in HEVC-SIM. For
example, smaller units could be employed in the concealment, which could allow better
handling of objects and structures that are smaller than a CTU. Also, rather than just using
an existing MV from a neighboring unit, an MV predictor could be employed to estimate the
motion of the lost CTU. Finally, spatio-temporal concealment may be able to provide
further improvement.
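The idea behind the two temporal concealment methods can be sketched on a toy example. The Python code below is an illustration under stated assumptions, not the HEVC-SIM code. Frames are plain 2-D lists of luma samples, the function names are hypothetical, and `mv` is assumed to be a pixel displacement pointing into the previous frame, with `(0, 0)` giving a plain co-located CTU copy.

```python
import math


def conceal_ctu(current, previous, ctu_row, ctu_col, ctu_size=64, mv=(0, 0)):
    """Fill one lost CTU of `current` in place from `previous`.

    mv = (dx, dy) is an assumed displacement in pixels; reference
    coordinates are clamped to the frame borders.
    """
    height, width = len(current), len(current[0])
    dx, dy = mv
    y0, x0 = ctu_row * ctu_size, ctu_col * ctu_size
    for y in range(y0, min(y0 + ctu_size, height)):
        ref_y = min(max(y + dy, 0), height - 1)
        for x in range(x0, min(x0 + ctu_size, width)):
            ref_x = min(max(x + dx, 0), width - 1)
            current[y][x] = previous[ref_y][ref_x]


def psnr(reference, test, peak=255):
    """PSNR in dB between two equally sized frames."""
    height, width = len(reference), len(reference[0])
    sq_err = sum((reference[y][x] - test[y][x]) ** 2
                 for y in range(height) for x in range(width))
    if sq_err == 0:
        return math.inf
    return 10 * math.log10(peak ** 2 / (sq_err / (height * width)))


# Toy 8x8 frames with a 4x4 "CTU": the content moves right by one pixel.
prev = [[10 * y + x for x in range(8)] for y in range(8)]
orig = [[prev[y][max(x - 1, 0)] for x in range(8)] for y in range(8)]


def damaged_frame():
    """Copy of `orig` with the CTU at row 0, column 1 lost (set to zero)."""
    frame = [row[:] for row in orig]
    for y in range(4):
        for x in range(4, 8):
            frame[y][x] = 0
    return frame


no_concealment = damaged_frame()
ctu_copy = damaged_frame()
conceal_ctu(ctu_copy, prev, 0, 1, ctu_size=4)
ctu_mv_copy = damaged_frame()
conceal_ctu(ctu_mv_copy, prev, 0, 1, ctu_size=4, mv=(-1, 0))

psnr_none = psnr(orig, no_concealment)
psnr_copy = psnr(orig, ctu_copy)
psnr_mv = psnr(orig, ctu_mv_copy)
print(psnr_none, psnr_copy, psnr_mv)
```

In this toy case the motion is a pure translation, so copying with the correct displacement restores the lost block exactly. Real sequences contain more complex motion, which is consistent with the two methods performing similarly in the experiments reported above.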
In other experiments, the execution times of the HM and x265 encoders were compared, and
it was found that x265 is about three orders of magnitude faster. Overall, HEVC-SIM is an
excellent learning tool for HEVC, allowing users to test and experience various HEVC
features.
References
[1] G. J. Sullivan, J.-R. Ohm, W.-J. Han, and T. Wiegand, “Overview of the high efficiency video coding (HEVC) standard,” IEEE Trans. Circuits Syst. Video Technol., vol. 22, no. 12, pp. 1649–1668, Dec. 2012.
[2] Wikipedia contributors. High Efficiency Video Coding, [Online] Available: http://en.wikipedia.org/wiki/High_Efficiency_Video_Coding, Accessed Mar. 6, 2015.
[3] V. Sze and M. Budagavi, Design and Implementation of Next Generation Video Coding Systems (H.265/HEVC Tutorial), ISCAS Tutorial 2014, http://www.rle.mit.edu/eems/wp-content/uploads/2014/06/H.265-HEVC-Tutorial-2014-ISCAS.pdf.
[4] HM 16.3 source code, https://hevc.hhi.fraunhofer.de/trac/hevc/browser/tags/HM-16.3, Jan. 2015.
[5] H. Hadizadeh, I. V. Bajić, and G. Cheung, “Video error concealment using a computation-efficient low saliency prior,” IEEE Trans. Multimedia, vol. 15, no. 8, pp. 2099-2113, Dec. 2013.
[6] x265 source code, https://bitbucket.org/multicoreware/x265, Apr, 2014.
[7] PSNR source code, https://code.google.com/p/webm/source/browse/src/psnr.c?repo=vpx-codec-comparison, Mar. 2015.
[8] Wikipedia contributors. Peak signal-to-noise ratio, [Online] Available: http://en.wikipedia.org/wiki/Peak_signal-to-noise_ratio, Accessed Mar. 10, 2015.
[9] OpenGL source code, http://trac.unfix.net/browser/yuvplayer/win/yuvplayer, Feb. 2014.
[10] G. Hasslinger and O. Hohlfeld, “The Gilbert-Elliott model for packet loss in real time services on the internet,” in Measuring, Modelling and Evaluation of Computer and Communication Systems (MMB), 2008 14th GI/ITG Conference, Mar. 31-Apr. 2, 2008, pp. 1–15.
[11] K. Stuhlmüller, N. Färber, M. Link, and B. Girod, "Analysis of video transmission over lossy channels," IEEE Journal on Selected Areas in Communications, vol. 18, no. 6, pp. 1012-1032, June 2000.
[12] H. Hadizadeh and I. V. Bajić, "Burst loss resilient packetization of video," IEEE Trans. Image Processing, vol. 20, no. 11, pp. 3195-3206, Nov. 2011.
[13] H. Hadizadeh and I. V. Bajić, "NAL-SIM: An interactive simulator for H.264/AVC video coding and transmission," Proc. IEEE CCNC'10, Las Vegas, NV, Jan. 2010.
[14] Cisco Forecast highlights, [Online] Available: http://www.cisco.com/web/solutions/sp/vni/vni_forecast_highlights/index.html, Accessed May 12, 2015.
[15] Definition of YUV and RGB, [Online] Available: http://www.pcmag.com/encyclopedia/term/55166/yuv-rgb-conversion-formulas, Accessed Mar. 14, 2015.
[16] Y.-M. Chen and I. V. Bajić, "Motion vector outlier rejection cascade for global motion estimation," IEEE Signal Processing Letters, vol. 17, no. 2, pp. 197-200, Feb. 2010.
[17] Y.-M. Chen and I. V. Bajić, "A joint approach to global motion estimation and motion segmentation from a coarsely sampled motion vector field," IEEE Trans. Circuits Syst. Video Technol., vol. 21, no. 9, pp. 1316-1328, Sep. 2011.
[18] S. H. Khatoonabadi and I. V. Bajić, "Video object tracking in the compressed domain using spatio-temporal Markov random fields," IEEE Trans. Image Processing, vol. 22, no. 1, pp. 300-313, Jan. 2013.
[19] S. H. Khatoonabadi, N. Vasconcelos, I. V. Bajić, and Y. Shan, “How many bits does it take for a stimulus to be salient?,” Proc. IEEE CVPR’15, pp. 5501-5510, Boston, MA, Jun. 2015.
[20] S. H. Khatoonabadi, I. V. Bajić, and Y. Shan, “Compressed-domain correlates of human fixations in dynamic scenes,” Multimedia Tools and Applications, 2015. http://dx.doi.org/10.1007/s11042-015-2802-3