THEJASWINI PURUSHOTHAM ELECTRICAL ENGINEERING GRADUATE STUDENT THE UNIVERSITY OF TEXAS AT ARLINGTON...
8 September 2010
THEJASWINI PURUSHOTHAM
ELECTRICAL ENGINEERING GRADUATE STUDENT
THE UNIVERSITY OF TEXAS AT ARLINGTON
ADVISOR: Dr. K. R. RAO, EE DEPT, UTA
Low Complexity H.264 Encoder using Machine Learning.
Agenda
Introduction
H.264/AVC
Machine learning
C4.5
Weka
Thesis approach
Results
Conclusions
video compression and standardization
Importance of video
Need for compression
  High bandwidth requirements
  Remove inherent redundancy
Need for standardization
  Ensures interoperability
[Figure: timeline of video coding standards and applications, 1992 to 2010, annotated with coding efficiency, network awareness and complexity. Labels: MPEG1, MPEG2, H.263, MPEG4, H.264, VC-1, SVC, H.265/HEVC/NGVC; video conferencing, HDTV, mobile phone, hand PC, mobile TV.]
MOTIVATION FOR THE RESEARCH
Motivation for a low complexity H.264 encoder
H.264 can achieve considerably higher coding efficiency than previous standards.
Motion estimation, the in-loop deblocking filter, sub-pel interpolation and mode decision account for most of the complexity.
The high computational complexity of H.264 and the real-time requirements of video systems are the main challenges.
OVERVIEW OF H.264/AVC
Design Features Highlights
Features for enhancement of prediction:
  Directional spatial prediction for intra coding: 9 intra 4x4 modes + 4 intra 16x16 modes + 9 intra 8x8 modes
  Variable block-size motion compensation with small block sizes: 16x16, 16x8, 8x16, 8x8, 8x4, 4x8, 4x4
  Quarter-sample-accurate motion compensation
  Multiple reference picture motion compensation
  In-the-loop deblocking filtering to remove blocky artifacts
Features for improved coding efficiency:
  Small block-size transform: 4x4 and 8x8 integer DCT
  Exact-match inverse transform
  Short word-length transform
  Hierarchical block transform
  Arithmetic entropy coding
  Context-adaptive entropy coding
H.264 - Encoder
H.264 Decoder
H.264 Decoder
OVERVIEW OF MACHINE LEARNING
Machine learning is a subfield of artificial intelligence.
It is the subject concerned with the design and development of algorithms and techniques that allow computers to learn.
The machine learning method used in this thesis extracts rules and patterns from massive data sets.
The major focus of machine learning research is to extract information from data automatically, by computational and statistical methods.
C4.5 CLASSIFIER
C4.5 was developed by Ross Quinlan.
C4.5 (known as J48 in Weka) is a system that constructs classifiers.
Classifiers are one of the commonly used tools in data mining.
Such systems take as input a collection of cases, each belonging to one of a small number of classes and described by its values for a fixed set of attributes.
From these cases, the system builds a classifier that predicts the class to which a new case belongs.
C4.5 uses the information gain of each attribute (normalized as the gain ratio) to choose how to split the data.
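The split criterion mentioned above can be illustrated with a small sketch. It computes the information gain and gain ratio of one binary threshold split on a numeric attribute. The function names, the toy variance values and the threshold are assumptions made for illustration; they are not taken from the thesis or from Weka's J48 source.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def gain_and_ratio(values, labels, threshold):
    """Information gain and gain ratio of the split `value <= threshold`."""
    left = [lab for v, lab in zip(values, labels) if v <= threshold]
    right = [lab for v, lab in zip(values, labels) if v > threshold]
    total = len(labels)
    # Weighted entropy that remains after the split.
    remainder = sum(len(part) / total * entropy(part) for part in (left, right) if part)
    gain = entropy(labels) - remainder
    # Split information penalises splits that scatter the data into many parts.
    split_info = entropy(["L"] * len(left) + ["R"] * len(right))
    ratio = gain / split_info if split_info > 0 else 0.0
    return gain, ratio


# Hypothetical toy data: block variance as the attribute, chosen mode as the class.
variances = [10.0, 12.0, 80.0, 95.0]
modes = ["8x8", "8x8", "4x4", "4x4"]
gain, ratio = gain_and_ratio(variances, modes, threshold=50.0)
```

In this toy case the threshold separates the two classes perfectly, so both the gain and the gain ratio reach their maximum of 1 bit.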
Illustration of C4.5 classification
Decision tree
WEKA
Weka is a collection of machine learning algorithms for data mining tasks.
The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization.
It is also well-suited for developing new machine learning schemes [25].
COMPLEXITY IN THE H.264 ENCODER
Figure 1: Multi-frame Motion Estimation.
The most computationally expensive process in H.264 is motion estimation.
For example, assuming full search (FS), P block types, Q reference frames and a search range of MxN, MxNxPxQ computations are needed.
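To make the MxNxPxQ estimate concrete, here is a minimal sketch that multiplies the four factors. The numeric values (a 32x32 search range, 7 block types, 5 reference frames) are assumed for illustration only and are not the settings used in the thesis.

```python
def full_search_computations(search_m, search_n, block_types, ref_frames):
    """Number of block-matching computations per macroblock: M x N x P x Q."""
    return search_m * search_n * block_types * ref_frames


# Assumed example values, not taken from the thesis.
all_modes = full_search_computations(32, 32, block_types=7, ref_frames=5)
one_mode = full_search_computations(32, 32, block_types=1, ref_frames=5)
print(all_modes, one_mode)
```

Since the cost is linear in P, every block type that can be ruled out before the search removes a proportional share of the work, which is what a fast mode decision aims for.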
APPROACH IN THIS THESIS
Approach
J48 (C4.5) analysis is used to reduce the complexity of determining mode decisions.
The statistics for each 16x16 macroblock of the first four frames of the video sequence are calculated.
The statistics are the mean, the variance, the variance of means for all the sub-macroblock sizes in the macroblock, the mean of the adjacent macroblocks, the variance of the adjacent macroblocks, and the variance of means for all the sub-macroblock sizes in the adjacent macroblocks.
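The per-macroblock statistics listed above could be computed along the following lines. This is only a sketch: the function names are hypothetical, population variance is assumed (the slides do not say which variance is used), and only the 8x8 and 4x4 sub-block sizes are shown.

```python
from statistics import fmean, pvariance


def block_means(mb, size):
    """Mean of every size x size sub-block of a square macroblock (list of rows)."""
    n = len(mb)
    means = []
    for r in range(0, n, size):
        for c in range(0, n, size):
            pixels = [mb[y][x] for y in range(r, r + size) for x in range(c, c + size)]
            means.append(fmean(pixels))
    return means


def macroblock_stats(mb, sub_sizes=(8, 4)):
    """Mean, variance and variance of sub-block means for one macroblock."""
    pixels = [p for row in mb for p in row]
    stats = {"mean": fmean(pixels), "variance": pvariance(pixels)}
    for s in sub_sizes:
        stats[f"var_of_means_{s}x{s}"] = pvariance(block_means(mb, s))
    return stats


# Toy 16x16 macroblock: left half dark (0), right half bright (100).
toy_mb = [[0] * 8 + [100] * 8 for _ in range(16)]
toy_stats = macroblock_stats(toy_mb)
```

The same function can be applied to the adjacent macroblocks to obtain the neighbour statistics.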
Figure 2: Flow chart of the process followed to achieve the low complexity encoder.
The modes for the same first four frames from the video sequences are determined from the H.264 encoder in the JM 16.2 software.
These modes and the determined statistics are collectively given as attributes for training in the WEKA tool.
This is an offline process.
The WEKA tool uses the C4.5 (J48) classifier algorithm to determine the mode decision tree.
A universal tree that can give relatively accurate mode decisions for any video sequence is developed.
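As an illustration of the offline training step, the statistics and encoder-chosen modes can be written to Weka's ARFF text format, which J48 accepts as training input. The relation name, attribute names and class labels below are hypothetical, and the two data rows are made-up placeholders rather than thesis data.

```python
def to_arff(relation, attribute_names, class_values, rows):
    """Build an ARFF document: numeric attributes plus one nominal class `mode`."""
    lines = [f"@relation {relation}", ""]
    for name in attribute_names:
        lines.append(f"@attribute {name} numeric")
    lines.append("@attribute mode {" + ",".join(class_values) + "}")
    lines += ["", "@data"]
    for features, mode in rows:
        lines.append(",".join(str(v) for v in features) + "," + mode)
    return "\n".join(lines) + "\n"


# Hypothetical attribute names and placeholder rows, for illustration only.
arff_text = to_arff(
    relation="mode_decision",
    attribute_names=["mean", "variance"],
    class_values=["mode_8x8", "mode_8x4", "mode_4x8", "mode_4x4"],
    rows=[((101.2, 40.5), "mode_4x4"), ((12.5, 3.1), "mode_8x8")],
)
```

The resulting text can be saved with a `.arff` extension and opened in the Weka Explorer, where J48 is listed among the tree classifiers.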
Different combinations of video sequences are used for training and later testing the mode decision trees.
Table 1 summarizes the results.
The attributes most commonly selected for mode decision across all the entries in the table are used to build the universal mode decision tree.
This tree is implemented as if-else statements in the motion estimation block of JM 16.2.
Hence, the mode decision process is reduced to if-else statements.
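A decision tree exported as nested if-else statements might look like the sketch below. The choice of attributes, the tree shape and every threshold are hypothetical placeholders; the actual tree learned in the thesis is not reproduced in these slides, and the real implementation lives in the C code of JM 16.2.

```python
def sub_macroblock_mode(variance_8x8, var_of_means_4x4, residual_mean):
    """Pick a sub-macroblock mode from block statistics.

    All thresholds are hypothetical placeholders, not values from the thesis.
    """
    if variance_8x8 <= 30.0:
        # Smooth block: keep the largest sub-macroblock partition.
        return "8x8"
    if var_of_means_4x4 <= 15.0:
        if residual_mean <= 4.0:
            return "8x4"
        return "4x8"
    # Highly textured block: use the smallest partition.
    return "4x4"


print(sub_macroblock_mode(10.0, 0.0, 0.0))
```

Evaluating such a tree costs a handful of comparisons per block, compared with a motion search over every candidate partition.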
Attributes in the thesis
The metrics used in the decision trees are the mean, variance, variance of means, residual absolute sum, residual mean, residual variance, residual variance of means and means of variance.
These metrics were calculated for the main MB shapes 16x16, 8x8 and 4x4.
Decision Tree for mode decision
Table 1: Classification rule accuracy
Training seq 1   % Accuracy* (seq 1)   Training seq 2   % Accuracy* (seq 2)   Test sequence    % Accuracy* (test)
Bus_cif          70.6861               Foreman_cif      80.7645               Mobile_cif       77.188
Stefan_cif       81.8182               Tempete_cif      82.8897               Container_cif    85.207
Container_cif    98.9268               -                -                     Waterfall_cif    93.358
Waterfall_cif    90.5636               -                -                     Stefan_cif       85.9583
Bus_cif          70.6861               -                -                     Container_cif    86.529
Bus_cif          75.4665               Foreman_cif      94.9495               Mobile_cif       82.0896
Stefan_cif       88.3838               Tempete_cif      85.0444               Container_cif    90.1812
Container_cif    98.131                -                -                     Waterfall_cif    95.00442
Waterfall_cif    92.1086               -                -                     Stefan_cif       88.952
Bus_cif          70.6861               -                -                     Bus_cif          74.8865
Waterfall_cif    90.5636               -                -                     Bus_cif          83.0469
Table 1 summarizes the WEKA tool results: the accuracy of the classification rules in determining the modes.
(All times in seconds. ML = machine learning.)
Sequence               Encoding time,       Encoding time,   ME time,             ME time,
                       JM 16.2 without ML   using ML         JM 16.2 without ML   using ML
Foreman_qcif           346.720              270.037          247.147              151.595
Coast_qcif             361.714              279.803          242.531              144.371
Carphone_qcif          347.85               269.674          249.081              152.576
Silent_qcif            368.155              253.006          254.297              139.053
Suzie_qcif             343.983              342.583          263.777              260.981
Miss-america_qcif      368.694              198.909          310.542              141.584
Bus_cif                1608.934             1346.542         1010.012             617.088
Container_cif          1542.106             1241.772         1109.672             686.165
Foreman_cif            1689.383             889.833          1316.543             537.128
Mobile_cif             2031.07              1695.243         1066.867             627.440
Tempete_cif            1808.560             1361.954         1078.435             590.689
Stefan_cif             1750.255             1267.813         1136.800             617.822
Waterfall_cif          1497.525             994.996          1017.974             529.557
Mother-daughter_qcif   422.332              360.371          322.011              276.212
Table 2: Results obtained using JM 16.2 and JM using machine learning for 4 frames.
Table 3: Speed up in encoding time and motion estimation time for 4 frames using machine learning compared to JM 16.2 encoder.
Sequence               Speed up in encoding time   Speed up in ME time
Foreman_qcif           22.11 %                     38.66 %
Coast_qcif             22.64 %                     40.47 %
Carphone_qcif          22.47 %                     38.74 %
Miss-america_qcif      15.772 %                    28.86 %
Bus_cif                16.30 %                     38.90 %
Container_cif          19.47 %                     38.16 %
Foreman_cif            47.32 %                     59.20 %
Mobile_cif             47.47 %                     62.99 %
Tempete_cif            40.370 %                    56.629 %
Stefan_cif             35.04 %                     51.268 %
Waterfall_cif          32.022 %                    46.778 %
Silent_qcif            30.9266 %                   45.039 %
Suzie_qcif             23.36779 %                  23.819 %
Mother-daughter_qcif   23.75 %                     23.353 %
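The slides do not state the speed-up formula. The Foreman_qcif and Bus_cif entries are consistent with a percentage reduction in time relative to the unmodified JM 16.2 encoder, so the sketch below assumes that definition. The function name is hypothetical; the timings are the Table 2 values for those two sequences.

```python
def speed_up_percent(baseline_seconds, ml_seconds):
    """Percentage reduction in time relative to the baseline encoder (assumed definition)."""
    return 100.0 * (baseline_seconds - ml_seconds) / baseline_seconds


# Timings copied from Table 2.
foreman_enc = speed_up_percent(346.720, 270.037)
foreman_me = speed_up_percent(247.147, 151.595)
bus_enc = speed_up_percent(1608.934, 1346.542)
bus_me = speed_up_percent(1010.012, 617.088)
print(round(foreman_enc, 2), round(foreman_me, 2), round(bus_enc, 2), round(bus_me, 2))
```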
Motion estimation time for 4 frames for sequences in Table 3.
[Bar chart: time in seconds (0 to 1400) against sequence number (1 to 14), with two series: ME (sec) and ME machine learning (sec).]
Table 4: Comparison of compressed file sizes for four frames for sequences in Table 2.
Sequence               Compressed file size (KB)   Compressed file size (KB)   % Increase in encoded file size
                       in JM 16.2 encoder          using machine learning      using machine learning
Foreman_qcif           4.34                        4.34                        0
Coast_qcif             5.68                        5.67                        +0.0017
Silent_qcif            4.0                         4.0                         0
Suzie_qcif             3.0                         3.0                         0
Carphone_qcif          4.52                        4.54                        0.0044
Bus_cif                31.9                        32.2                        0.0093
Container_cif          12.0                        12.0                        0
Foreman_cif            12.4                        12.7                        0.1903
Mobile_cif             50.4                        51.0                        0.0119
Stefan_cif             32.5                        34.0                        0.0462
Waterfall_cif          18.7                        19.0                        0.0160
Tempete_cif            36.7                        37.0                        0.0082
Miss-america_qcif      2.0                         2.0                         0
Mother-daughter_qcif   2.279                       2.279                       0.0
Compressed file sizes using machine learning for four frames for sequences in Table 4.
[Bar chart: compressed file size in KB (0 to 60) against sequence number (1 to 14), with two series: compressed file size using JM 16.2 and compressed file size using machine learning.]
Sequence               PSNR (dB) using   PSNR (dB) using    MSE using         MSE using
                       JM 16.2 encoder   machine learning   JM 16.2 encoder   machine learning
Foreman_qcif           37.389            37.324             11.881            12.068
Coast_qcif             35.24             35.21              19.539            19.681
Carphone_qcif          37.937            37.879             10.472            10.619
Miss-america_qcif      40.949            40.881             5.22970           5.31475
Bus_cif                35.961            35.932             16.518            16.633
Container_cif          37.162            37.153             12.517            12.544
Foreman_cif            37.833            37.85              10.371            10.684
Mobile_cif             35.541            35.512             18.2873           18.419
Tempete_cif            35.962            35.93              16.594            16.705
Stefan_cif             37.011            36.985             13.00572          13.08644
Waterfall_cif          35.912            35.906             16.6923           16.716
Mother-daughter_qcif   38.363            38.363             9.481             9.481
Silent_qcif            36.784            36.775             13.63795          13.6775
Suzie_qcif             37.749            37.741             10.938            10.381
Table 5: Comparison of PSNR and MSE for four frames.
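PSNR and MSE in Table 5 are linked by the standard definition PSNR = 10 log10(peak^2 / MSE). The sketch below assumes 8-bit samples (peak value 255), which the slides do not state explicitly; the function name is hypothetical. Applied to the Foreman_qcif MSE values from Table 5, it gives PSNR values close to the tabulated ones.

```python
import math


def psnr_from_mse(mse, peak=255.0):
    """PSNR in dB from mean squared error, assuming samples in [0, peak]."""
    if mse <= 0:
        raise ValueError("PSNR is undefined for zero or negative MSE")
    return 10.0 * math.log10(peak * peak / mse)


# MSE values for Foreman_qcif copied from Table 5.
psnr_jm = psnr_from_mse(11.881)
psnr_ml = psnr_from_mse(12.068)
print(round(psnr_jm, 2), round(psnr_ml, 2))
```

Because PSNR is logarithmic in MSE, the small MSE increase caused by the faster mode decision translates into a PSNR drop of well under 0.1 dB for this sequence.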
Comparison of PSNR and MSE for four frames in Table 5.
[Bar chart: PSNR (32 to 42) against sequence number (1 to 14), with two series: PSNR using JM 16.2 and PSNR using machine learning.]
Table 6: SSIM comparison for four frames.
Sequence               SSIM for JM 16.2   SSIM using machine learning   % decrease **
Foreman_qcif           0.95944            0.95910                       0.035
Coast_qcif             0.91793            0.91763                       0.032
Carphone_qcif          0.96670            0.96641                       0.029
Suzie_qcif             0.9555             0.9557                        0.0002
Bus_cif                0.94973            0.94941                       0.033
Container_cif          0.92827            0.92823                       0.0043
Foreman_cif            0.94302            0.94306                       +0.0042
Mobile_cif             0.9758             0.9755                        0.00003
Tempete_cif            0.9711             0.9709                        0.02
Stefan_cif             0.9807             0.9806                        0.0001
Waterfall_cif          0.9420             0.9420                        0.00
Silent_qcif            0.9600             0.9600                        0.00
Miss-america_qcif      0.9707             0.9706                        0.001
Mother-daughter_qcif   0.9663             0.9663                        0.00
Comparison of SSIM for four frames in Table 6.
[Bar chart: MSE (0 to 25) against sequence number (1 to 14), with two series: MSE using JM 16.2 and MSE using machine learning.]
CONCLUSIONS
It was observed that a single universal mode decision tree failed to preserve video fidelity when all the ME/MC modes were used in the machine learning algorithm.
So this thesis uses only the sub-macroblock modes, i.e., 8x8, 8x4, 4x8 and 4x4, for the machine learning. The function ‘submacroblock_mode_decision’ in JM 16.2 was replaced by the if-else statements.
The results are tabulated in Tables 7 through 11. From Table 8, the average speed-up in encoding time is 28.5%, and the average speed-up in motion estimation time is 42.846%.
From Table 9, the average percentage decrease in compressed file size is 0.36%. From Table 11, the average decrease in SSIM is less than 0.0107%.
When 100 frames are encoded, the average speed-up in encoding time is 8.5%, the average speed-up in motion estimation time is 18.346%, and the average decrease in SSIM is less than 0.0109%.
REFERENCES
[1] http://iphome.hhi.de/suehring/tml/ for JM software.
[2] Soon-kak Kwon, A. Tamhankar and K. R. Rao, “Overview of H.264 / MPEG-4 Part 10”, J. Visual Communication and Image Representation, vol. 17, pp. 186-216, April 2006.
[3] http://www.vcodex.com/files/h264_overview_orig.pdf, reference for H.264.
[4] http://iphome.hhi.de/suehring/tml/JM%20Reference%20Software%20Manual%20(JVT-AE010).pdf for JM reference software documentation manual.
[5] G. A. Davidson, et al., “ATSC video and audio coding”, Proceedings of IEEE, vol. 94, pp. 60-76, Jan. 2006.
[6] http://www.birds-eye.net/definition/c/cif-common_intermediate_format.shtml for information about CIF and QCIF formats.
[7] M. Fiedler, “Implementation of basic H.264/AVC Decoder”, seminar paper at Chemnitz University of Technology, June 2004.
[8] A. Puri, X. Chen and A. Luthra, “Video coding using H.264/MPEG-4 AVC compression standard”, Science Direct. Signal Processing: Image Communication, vol. 19, pp. 793-849, Oct. 2004.
[9] T. Wiegand, et al., “Overview of the H.264/AVC video coding standard”, IEEE Trans. CSVT, vol. 13, pp. 560-576, July 2003.
[10] T. Wiegand and G. J. Sullivan, “The H.264 video coding standard”, IEEE Signal Processing Magazine, vol. 24, pp. 148-153, March 2007.
[11] D. Marpe, T. Wiegand and G. J. Sullivan, “The H.264/MPEG-4 AVC standard and its applications”, IEEE Communications Magazine, vol. 44, pp. 134-143, Aug. 2006.
[12] R. Schäfer, T. Wiegand and H. Schwarz, “The emerging H.264/AVC standard”, EBU Technical Review, Jan. 2003.
[13] Video test sequences (YUV 4:2:0): http://trace.eas.asu.edu/yuv/index.html
[14] Z. Wang et al., “Image quality assessment: From error visibility to structural similarity,” IEEE Trans. on Image Processing, vol. 13, pp. 600-612, Apr. 2004.
[15] Z. Wang, L. Lu, and A. C. Bovik, “Video quality assessment based on structural distortion measurement,” Signal Processing: Image Communication, Special Issue on Objective Video Quality Metrics, vol. 19, pp. 122-124, Jan. 2004.
[16] Z. Wang, H. R. Sheikh, and A. C. Bovik, “Objective video quality assessment,” in The Handbook of Video Databases: Design and Applications (B. Furht and O. Marques, eds.), pp. 1041–1078, CRC Press, Sept. 2003.
[17] T. K. Tan, G. Sullivan and T. Wedi, “Recommended simulation conditions for coding efficiency experiments”, ITU-T SG16/Q6, 34th VCEG Meeting, Antalya, Turkey, Jan. 2008, Doc. VCEG-AH10r3.
[18] P. Carrillo, H. Kalva, and T. Pin, “Low complexity H.264 video encoding”, Applications of Digital Image Processing, Proc. of SPIE, vol. 7443, 74430A, Sept. 2009.
[19] G. Sullivan and T. Wiegand, “Video compression – From concepts to the H.264/AVC Standard,” Proc. IEEE, vol. 93, pp. 18-31, Jan. 2005.
[20] http://www.apple.com/quicktime/technologies/h264/ for H.264 codec reference.
[21] D. Kumar, P. Shastry and A. Basu, “Overview of the H.264 / AVC”, 8th Texas Instruments Developer Conference India, 30 Nov. – 1 Dec. 2005, Bangalore.
[22] http://wiki.multimedia.cx/index.php?title=Motion_Prediction for motion prediction.
[23] Zhi-Yi Mai, et al., “A new-rate distortion optimization using structural information in H.264 I-frame encoder”, ACIVS 2005, LNCS 3708, pp. 435–441, 2005.
[24] Z. Wang and A. C. Bovik, Modern Image Quality Assessment. Synthesis Lectures on Image, Video and Multimedia Processing. Morgan and Claypool, 2006.
[25] http://www.cs.waikato.ac.nz/ml/weka/ for WEKA tool download.
[26] I. Richardson, “The H.264 Advanced Video Compression Standard”, Wiley, 2006.
[27] I. E. Richardson, “The H.264 Advanced Video Compression Standard”, Wiley, II edition, 2010.
[28] http://iphome.hhi.de/suehring/tml/download/, JM reference software.
[29] http://trace.eas.asu.edu/yuv/index.html, Video sequences.
[30] E. Peixoto, R. L. de Queiroz, and D. Mukherjee, “Mobile video communications using a Wyner-Ziv transcoder,” Proc. SPIE 6822, VCIP, 68220R, Jan. 2008.
[31] A. Aaron, D. Varodayan, and B. Girod, “Wyner-Ziv residual coding of video,” Proc. International Picture Coding Symposium, Beijing, P. R. China, April 2006.
THANK YOU
H.264 - Profiles
Design Features Highlights
Features for enhancement of prediction:
  Directional spatial prediction for intra coding
  Variable block-size motion compensation with small block size
  Quarter-sample-accurate motion compensation
  Motion vectors over picture boundaries
  Multiple reference picture motion compensation
  Decoupling of referencing order from display order
  Decoupling of picture representation methods from picture referencing capability
  Weighted prediction
  Improved “skipped” and “direct” motion inference
  In-the-loop deblocking filtering
Features for improved coding efficiency:
  Small block-size transform
  Exact-match inverse transform
  Short word-length transform
  Hierarchical block transform
  Arithmetic entropy coding
  Context-adaptive entropy coding
Features for robustness to data errors/losses:
  Parameter set structure
  NAL unit syntax structure
  Flexible slice size
  Flexible macroblock ordering (FMO)
  Arbitrary slice ordering (ASO)
  Redundant pictures
  Data partitioning
  SP/SI synchronization/switching pictures
Directional spatial prediction for intra coding
Intra prediction predicts the texture in the current block using the pixel samples from neighboring blocks.
Intra prediction for 4x4 blocks (9 modes) and 16x16 blocks (4 modes) is supported in all H.264 profiles.
Intra prediction for 8x8 blocks (9 modes) is supported in the high profiles.
Luma prediction modes in H.264
Variable block-size motion compensation
Partitioned in 2 stages
In the 1st stage, determine the first 4 modes: 16x16, 16x8, 8x16, 8x8
If mode 4 (8x8) is chosen, further partition every 8x8 block into smaller blocks: 8x4, 4x8, 4x4
At most 16 motion vectors may be transmitted for a 16x16 macroblock
Sub-pixel accuracy
Large computational complexity to determine the modes, but efficient encoding
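The limit of 16 motion vectors follows from the two-stage partitioning: four 8x8 blocks, each split into four 4x4 blocks. The sketch below counts one motion vector per partition, which assumes prediction from a single reference list as in a P slice. The function and table names are hypothetical.

```python
# Partitions per macroblock for the first-stage modes other than 8x8.
MB_PARTITIONS = {"16x16": 1, "16x8": 2, "8x16": 2}
# Partitions per 8x8 block for the second-stage (sub-macroblock) modes.
SUB_PARTITIONS = {"8x8": 1, "8x4": 2, "4x8": 2, "4x4": 4}


def motion_vector_count(mb_mode, sub_modes=None):
    """Motion vectors for one 16x16 macroblock, one vector per partition."""
    if mb_mode in MB_PARTITIONS:
        return MB_PARTITIONS[mb_mode]
    if mb_mode == "8x8":
        if sub_modes is None or len(sub_modes) != 4:
            raise ValueError("mode 8x8 needs one sub-mode for each of the four 8x8 blocks")
        return sum(SUB_PARTITIONS[m] for m in sub_modes)
    raise ValueError(f"unknown macroblock mode: {mb_mode}")


worst_case = motion_vector_count("8x8", ["4x4"] * 4)
mixed = motion_vector_count("8x8", ["8x8", "8x4", "4x8", "4x4"])
```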
Variable block-size motion compensation
Multiple reference picture motion compensation
P slice
  More than one prior coded picture can be used as reference for MC prediction
  Reference index parameter is transmitted for each MC 16x16, 16x8, 8x16 or 8x8 block
  For smaller blocks within the 8x8, use 1 reference index
  P-Skip type is supported
B slice
  Utilizes two distinct lists of reference pictures
  Four different types of inter-picture prediction: list 0, list 1, bi-predictive, and direct
  Bi-predictive: weighted average of MC list 0 and list 1
  Direct prediction: inferred from previously transmitted syntax; either list 0 or list 1 prediction or bi-predictive
  Similar macroblock partitioning as P slices is utilized
  B-Skip mode is supported
[Figures: P frame; B frame]
Hierarchical block transform
4x4 and 8x8 (high profile only) multiplier-free integer DCT
Transform coefficients perfectly invertible
Hierarchical transform (integer DCT and Hadamard)
  For macroblocks coded in 16x16 intra mode and chrominance blocks, DC coefficients are further grouped and transformed
  Hadamard transform is used for chrominance blocks
[Figure labels: Integer DCT 4x4, Integer DCT 8x8, Hadamard 4x4, Hadamard 2x2]
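As an illustration of the 4x4 integer transform, the sketch below applies the widely published H.264 forward core transform matrix as Y = C X C^T using integer arithmetic only. The scaling that H.264 folds into quantization is deliberately left out, and plain matrix multiplication is used for clarity where a real encoder would use additions and shifts. The function names are hypothetical.

```python
# 4x4 forward core transform matrix of H.264 (integer entries only).
CF = [
    [1, 1, 1, 1],
    [2, 1, -1, -2],
    [1, -1, -1, 1],
    [1, -2, 2, -1],
]


def matmul(a, b):
    """Product of two 4x4 integer matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def transpose(m):
    return [list(row) for row in zip(*m)]


def forward_core_transform(block):
    """Unscaled 4x4 integer transform: CF * block * CF^T."""
    return matmul(matmul(CF, block), transpose(CF))


# A flat residual block puts all of its energy into the DC coefficient.
flat_block = [[3] * 4 for _ in range(4)]
coeffs = forward_core_transform(flat_block)
```

Because every entry of the matrix is a small integer, encoder and decoder compute exactly the same values, which is what the "exact-match inverse transform" feature refers to.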
In loop deblocking filter
Block based operations are responsible for blocking artifacts
The in-loop deblocking filter smooths blocky edges and increases rate-distortion performance.
Applied to all 4x4 blocks except at picture boundaries.
Filtering is adaptive at the slice level, block level and pixel level.
Vertical edges filtered first (left to right)
Followed by horizontal edges (top to bottom)
Entropy encoding
CAVLC (Context-based Adaptive Variable Length Coding).
CABAC (Context-based Adaptive Binary Arithmetic Coding).
CAVLC makes use of run-length encoding.
CABAC utilizes arithmetic coding; codes both MV and residual transform coefficients.
Typically CABAC provides 10-15 % reduction in bit rate compared to CAVLC, for the same PSNR.
All other syntax elements are encoded by Exp-Golomb codes (Universal Variable Length Codes (UVLC)).
[Figures: CAVLC; CABAC]
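The Exp-Golomb codes mentioned above are simple to illustrate for unsigned values: write the value plus one in binary and prefix it with one zero for every bit after the first. The sketch below handles bit strings rather than a real bitstream, and the function names are hypothetical.

```python
def exp_golomb_encode(code_num):
    """Unsigned Exp-Golomb codeword for a non-negative integer, as a bit string."""
    if code_num < 0:
        raise ValueError("code_num must be non-negative")
    binary = bin(code_num + 1)[2:]
    # The zero prefix tells the decoder how many bits follow the first 1.
    return "0" * (len(binary) - 1) + binary


def exp_golomb_decode(bits):
    """Decode a single unsigned Exp-Golomb codeword given as a bit string."""
    leading_zeros = len(bits) - len(bits.lstrip("0"))
    return int(bits[leading_zeros:2 * leading_zeros + 1], 2) - 1


codewords = [exp_golomb_encode(n) for n in range(5)]
print(codewords)
```

Small values get short codewords, which suits syntax elements whose most likely values are near zero.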
Computational Overhead
Entropy encoding
Multiple block size
Smaller block size
Integer transform
In-loop deblocking
H.264 Extensions Scalable video coding
Application scenario
H.264 Extensions Scalable video coding
Types of Scalability
H.264 Extensions Multi view coding
Applications
3-D Video
Stereoscopic TV
H.264 Extensions Multi view coding
Snapshots of video sequences considered in the thesis.