January 8, 2014 Sam Siewert
Digital Media and Interactive Systems
Deeper Dive into MPEG Digital Video Encoding
MPEG Encode/Decode
Tools
FFMPEG FAQ
Read It!! http://ffmpeg.org/faq.html
– You should know how to Decode Video (recorded from your camera or pre-recorded by someone else)
– You should know how to Encode Video (to turn in with your labs)
– On Ubuntu, do “apt-get install ffmpeg” to get it!
Ffmpeg (avconv) Notes
sudo apt-get install ffmpeg
ffmpeg -i movie.mpg -ss 30 -t 30 movie%d.ppm
– Extracts 30 seconds of frames, starting 30 seconds into the video
ssiewert@ssiewert-VirtualBox:~/a485/media$ ffmpeg -i big_buck_bunny_480p_surround-fix.avi -ss 30 -t 30 bbb%d.ppm
ffmpeg version 0.8.6-4:0.8.6-0ubuntu0.12.04.1, Copyright (c) 2000-2013 the Libav developers
  built on Apr 2 2013 17:02:36 with gcc 4.6.3
Input #0, avi, from 'big_buck_bunny_480p_surround-fix.avi':
  Duration: 00:09:56.45, start: 0.000000, bitrate: 2957 kb/s
    Stream #0.0: Video: mpeg4 (Simple Profile), yuv420p, 854x480 [PAR 1:1 DAR 427:240], 24 tbr, 24 tbn, 24 tbc
    Stream #0.1: Audio: ac3, 48000 Hz, 5.1, s16, 448 kb/s
Incompatible pixel format 'yuv420p' for codec 'ppm', auto-selecting format 'rgb24'
[buffer @ 0x907700] w:854 h:480 pixfmt:yuv420p
[avsink @ 0x9054c0] auto-inserting filter 'auto-inserted scaler 0' between the filter 'src' and the filter 'out'
[scale @ 0x905b60] w:854 h:480 fmt:yuv420p -> w:854 h:480 fmt:rgb24 flags:0x4
Output #0, image2, to 'bbb%d.ppm':
  Metadata:
    encoder : Lavf53.21.1
    Stream #0.0: Video: ppm, rgb24, 854x480 [PAR 1:1 DAR 427:240], q=2-31, 200 kb/s, 90k tbn, 24 tbc
Stream mapping:
  Stream #0.0 -> #0.0
Press ctrl-c to stop encoding
...
Last message repeated 719 times -0kB time=29.00 bitrate= -0.0kbits/s
frame= 720 fps= 38 q=0.0 Lsize= -0kB time=30.00 bitrate= -0.0kbits/s
video:864686kB audio:0kB global headers:0kB muxing overhead -100.000002%
ssiewert@ssiewert-VirtualBox:~/a485/media$
Now with PPM Frames
PPM is Simple, but No Compression
– Good for CV
– http://en.wikipedia.org/wiki/Netpbm_format - Read this!
– JPEG, PNG are Compressed
– TIFF is an Alternative, but More Complex
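To make "simple, but no compression" concrete, here is a minimal sketch of a binary (P6) PPM writer and reader using only the Python standard library. The function names `write_ppm` and `read_ppm` are illustrative, not from the course code, and the reader assumes a maxval of 255.

```python
def write_ppm(path, width, height, pixels):
    """Write a binary (P6) PPM: short text header, then raw RGB bytes."""
    if len(pixels) != width * height * 3:
        raise ValueError("pixels must hold width*height RGB triples")
    with open(path, "wb") as f:
        f.write(b"P6\n%d %d\n255\n" % (width, height))
        f.write(bytes(pixels))


def read_ppm(path):
    """Read a P6 PPM with maxval 255; returns (width, height, rgb_bytes)."""
    with open(path, "rb") as f:
        data = f.read()
    tokens, pos = [], 0
    while len(tokens) < 4:  # magic, width, height, maxval
        while data[pos:pos + 1].isspace():
            pos += 1
        if data[pos:pos + 1] == b"#":  # header comment runs to end of line
            while data[pos:pos + 1] not in (b"\n", b""):
                pos += 1
            continue
        start = pos
        while pos < len(data) and not data[pos:pos + 1].isspace():
            pos += 1
        tokens.append(data[start:pos])
    pos += 1  # exactly one whitespace byte separates maxval from pixel data
    width, height, maxval = int(tokens[1]), int(tokens[2]), int(tokens[3])
    if tokens[0] != b"P6" or maxval != 255:
        raise ValueError("only P6 with maxval 255 is handled in this sketch")
    return width, height, data[pos:pos + width * height * 3]
```

Because nothing is compressed, file size is just header plus 3 bytes per pixel. An 854x480 frame is 1,229,760 bytes plus a 15-byte header, so 720 frames come to about 864,686 kB, which agrees with the `video:864686kB` figure in the ffmpeg log above.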
Simple Re-encode
When Quality is not a Concern, Keep it Simple
ffmpeg -f image2 -i bbb%d.ppm bbbtrans.mpg
vlc bbbtrans.mpg
Quality Encoding is Tricky
Use MPEG4 HQ Settings, Encode 480p, AR=4:3
ffmpeg -f image2 -i bbb%d.ppm -maxrate 20000k -bufsize 32M -s 640x480 -vcodec mpeg4 -qscale 1 bbbtranshq.mp4
MPEG Encode/Decode
Theory and Algorithms
Overview
MPEG-2 Standards
– 13818-1: Transport Streams for Video & Audio
  Container for Program Streams (188 Byte Packets)
  Multiplexed Video and Audio Elementary Streams
  PSI – Program Specific Information
  System Clock (PCR, PTS/DTS)
– 13818-2: Elementary Video Stream Encode/Decode
  Color Format
  Macro-blocks, Video DCT
  GoP (I-Frame, B-Frame, P-Frame)
  Motion Compensation and Vector Quantization
Differences Between MPEG-2 and MPEG-4
MPEG-2: Order of Operations
#1: POINT (Pixel) Encoding
#2 A-C: Macro-Block Lossy Intra-Frame Compression
#3: Motion-Based Compression in Group of Pictures
(Figure: encode pipeline with stages labeled #1, #2A, #2B, #2C, #3)
Step #1 – RGB to YCrCb 4:4:4 24-bit (Lossless)
For every Y sample in a scan-line, there is also one CrCb sample
– Each Y (Y7:Y0), Cr (Cr7:Cr0), & Cb (Cb7:Cb0) Sample is 8 bits
– No compression between RGB and YCrCb 4:4:4 (both 24 bits/pixel)
Typically a Post Production, CEDIA or DCI format
(Figure: 320x240 sample grid, pixels 0 to 76,799; legend distinguishes Y, Cr, and Cb samples from Y-only samples)
Step #1 – RGB to YCrCb 4:2:2 (Lossy)
For every 2 Y samples in a scan-line, one CrCb sample
– Each Y (Y7:Y0), Cr (Cr7:Cr0), & Cb (Cb7:Cb0) Sample is 8 bits
– Two RGB Pixels = 48 bits, Whereas Two YCrCb Pixels = 32 bits, or 16 bits per pixel vs. 24 bits per pixel (33% smaller frame size)
(Figure: 320x240 sample grid, pixels 0 to 76,799; legend distinguishes Y, Cr, and Cb samples from Y-only samples)
Step #1 – RGB to YCrCb 4:2:0 (Lossy)
For every 4 Y samples (a 2x2 block spanning two scan-lines), one CrCb sample
– Each Y (Y7:Y0), Cr (Cr7:Cr0), & Cb (Cb7:Cb0) Sample is 8 bits
– Four RGB Pixels = 96 bits, Whereas Four YCrCb Pixels = 48 bits, or 12 bits per pixel on average vs. 24 bits per pixel (50% smaller)
(Figure: 320x240 sample grid, pixels 0 to 76,799; legend distinguishes Cr, Cb sample sites from Y-only samples)
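The bits-per-pixel figures for the three sub-sampling formats can be checked with a few lines of arithmetic. This is a sketch under the usual J:a:b reading (a region J pixels wide and 2 rows tall, with `a` chroma sites in the first row and `b` in the second); the function names are illustrative.

```python
def bits_per_pixel(j, a, b, bits_per_sample=8):
    """Average bits per pixel for J:a:b chroma sub-sampling."""
    luma_samples = 2 * j            # one Y per pixel in a J x 2 region
    chroma_samples = 2 * (a + b)    # (a + b) sites, each with a Cr and a Cb
    return bits_per_sample * (luma_samples + chroma_samples) / (2 * j)


def frame_bytes(width, height, j, a, b):
    """Uncompressed frame size in bytes for the given sub-sampling."""
    return width * height * bits_per_pixel(j, a, b) / 8


if __name__ == "__main__":
    for name, fmt in (("4:4:4", (4, 4, 4)), ("4:2:2", (4, 2, 2)), ("4:2:0", (4, 2, 0))):
        print(name, bits_per_pixel(*fmt), "bits/pixel,",
              frame_bytes(320, 240, *fmt), "bytes per 320x240 frame")
```

For the 320x240 grid in the figures (76,800 pixels), this gives 230,400 bytes at 4:4:4, 153,600 at 4:2:2 and 115,200 at 4:2:0.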
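Step #2 – Convert to 8x8 Macroblocks and Transform
– Aspect Ratios Designed to Fit 8x8 Macroblock, E.g. 640 x 480 => 80 x 60 Macroblocks
– Discrete Cosine Transform Applied to Each 8x8
  Spatial Intensity to Frequency Transform
  Applied on X Axis (Row)
  Applied on Y Axis (Column)
– Set up for Intra-frame (I-frame) Compression
– Terminology: 13818-2 itself calls the 8x8 DCT unit a block and uses macroblock for a 16x16 luma area (four 8x8 blocks plus chroma); these notes use macroblock for the 8x8 unit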
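The "rows, then columns" description can be sketched directly: apply a 1D DCT to each row of the 8x8 block, then to each column of the result. This is a plain, unoptimized illustration with an orthonormal scaling chosen for convenience; it is not the course's `dct2.c`, and the function names are assumptions.

```python
import math


def dct_1d(v):
    """Orthonormal 1D DCT-II of a sequence."""
    n = len(v)
    out = []
    for k in range(n):
        s = sum(v[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n))
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * s)
    return out


def dct_2d(block):
    """Separable 2D DCT: transform every row, then every column."""
    rows = [dct_1d(r) for r in block]               # X axis (rows)
    cols = [dct_1d(list(c)) for c in zip(*rows)]    # Y axis (columns)
    return [list(r) for r in zip(*cols)]            # transpose back to [row][col]


def block_grid(width, height, n=8):
    """Number of n x n blocks across and down a frame."""
    return width // n, height // n


if __name__ == "__main__":
    flat = [[100] * 8 for _ in range(8)]
    coeffs = dct_2d(flat)
    print(block_grid(640, 480))     # blocks across, blocks down
    print(round(coeffs[0][0], 6))   # all energy of a flat block lands in the DC term
```

A flat block has no spatial variation, so every coefficient except the one at 0,0 comes out as zero (to rounding error).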
Convolution Concepts
– Math operation on 2 functions that produces a 3rd
– Point Spread Function “Sharpen” meets this Definition
– So do Many Mask Operations applied to Pixel Neighborhoods
2 impulses, f(t), g(X – t)
Area inside intersection
f convolved with g over t
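As a small illustration of a mask operation on pixel neighborhoods, the sketch below convolves a grayscale image (a list of rows) with a 3x3 kernel. The sharpen kernel shown is one common choice and not necessarily the one used in the lecture; edge pixels are handled by replicating the border, which is an assumption of this sketch.

```python
SHARPEN = [[0, -1, 0],
           [-1, 5, -1],
           [0, -1, 0]]


def convolve2d(image, kernel):
    """Convolve a 2D list with a 3x3 kernel, replicating edge pixels."""
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0
            for ky in range(3):
                for kx in range(3):
                    sy = min(max(y + ky - 1, 0), h - 1)   # clamp to image bounds
                    sx = min(max(x + kx - 1, 0), w - 1)
                    # true convolution flips the kernel in both axes
                    acc += image[sy][sx] * kernel[2 - ky][2 - kx]
            out[y][x] = acc
    return out


if __name__ == "__main__":
    spot = [[0, 0, 0], [0, 10, 0], [0, 0, 0]]
    for row in convolve2d(spot, SHARPEN):
        print(row)
```

Because the kernel weights sum to 1, flat regions pass through unchanged while differences between a pixel and its neighbors are amplified.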
DCT – Discrete Cosine Transform
– Convolution of Image with Discrete Cosine
– See http://www.cse.uaa.alaska.edu/~ssiewert/a490dmis_code/example-dct1/
– De-convolved to restore image from Convolved Image
DCT
Inverse DCT
DCT Concepts
– F(x) is a sum of sinusoids (with frequency, amplitude)
– DCT operates on a discrete number of samples
– Can derive DC sum at any x, even where F(x) not known
– N x N Macro-block has Zero Frequency DC at 0,0
  Increasing Horizontal Frequency
  Increasing Vertical Frequency
– Can De-convolve (inverse DCT, or iDCT)
– Can Eliminate High Frequency Horizontal and Vertical Terms
  Minimal Losses from Truncation (otherwise lossless)
  Loss of High Frequency Image Features (What are These?)
Basic Concept of Waveforms
– Complex Waveform is Sum of Simple Fundamentals
– Simple Fundamentals Can Be Derived from Complex
Scanline DCT Example
– Small Losses Due to DCT, iDCT Numerical Truncation
– Larger Losses Due to H.O.T. (Higher-Order Term) Quantization and Truncation
– http://www.cse.uaa.alaska.edu/~ssiewert/a490dmis_doc/1D-DCT-N-Fundamentals.xlsx
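The two kinds of loss can be seen on a single scan-line. The sketch below is not the linked spreadsheet; it uses made-up sample values and an orthonormal DCT-II, runs the DCT and iDCT once with all terms kept, and once with the higher-order terms zeroed.

```python
import math


def dct(x):
    """Orthonormal 1D DCT-II."""
    n = len(x)
    return [
        (math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n))
        * sum(x[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n))
        for k in range(n)
    ]


def idct(c):
    """Inverse of dct() above (orthonormal DCT-III)."""
    n = len(c)
    return [
        c[0] * math.sqrt(1 / n)
        + sum(c[k] * math.sqrt(2 / n) * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
              for k in range(1, n))
        for i in range(n)
    ]


def truncate(c, keep):
    """Zero every coefficient at index >= keep (drop higher-order terms)."""
    return c[:keep] + [0.0] * (len(c) - keep)


if __name__ == "__main__":
    scanline = [10, 12, 15, 19, 24, 30, 37, 45]   # hypothetical pixel intensities
    coeffs = dct(scanline)
    exact = idct(coeffs)
    approx = idct(truncate(coeffs, 4))
    print("round-trip error:", max(abs(a - b) for a, b in zip(scanline, exact)))
    print("truncated error: ", max(abs(a - b) for a, b in zip(scanline, approx)))
```

With all eight terms kept, the only error is floating-point rounding. With four terms dropped, the squared error equals the energy of the dropped coefficients, which is small for a smooth scan-line like this one.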
What Is Lost with DCT Quantization?
– Noise More Than Anything Else
– Complex XY Variable Patterns (Real Science Data?)
Complex Tiling Higher Frequency X Higher Frequency Y Terms Can Still be Ignored
Complex Wood Texture Most Detail in X Far Less in Y
Randomized Texture Image High X Detail High Y Detail Most Loss of Detail, But Noisy
Step #2A: Macro-block Discrete Cosine Transform
8x8 Pixel Block – Macro-block
– SD NTSC 720x480 (90x60 Macro-blocks), 3:2 Aspect Ratio
– HD 720 1280x720 (160x90 Macro-blocks), 16:9 AR
– HD 1080 1920x1080 (240x135 Macro-blocks), 16:9 AR
Step #2B: Macro-block Quantization (Lossy)
– Apply an 8x8 Weighting and Scaling Matrix to the DCT Coefficients
– Produces Lots of Repeated Values (and Zeros) Compared to Original
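A rough sketch of the weighting and scaling step: each DCT coefficient is divided by a per-frequency weight and rounded to an integer level. The weight matrix and the sample coefficients below are hypothetical, chosen only to show the effect; they are not the default matrices from 13818-2.

```python
def make_weights(n=8, base=8, step=4):
    """Hypothetical weighting matrix: coarser steps at higher frequencies."""
    return [[base + step * (u + v) for v in range(n)] for u in range(n)]


def quantize(coeffs, weights, scale=1):
    """Divide each coefficient by weight*scale and round to an integer level."""
    return [[round(c / (w * scale)) for c, w in zip(crow, wrow)]
            for crow, wrow in zip(coeffs, weights)]


def dequantize(levels, weights, scale=1):
    """Decoder side: multiply levels back up; the rounding loss stays."""
    return [[lv * w * scale for lv, w in zip(lrow, wrow)]
            for lrow, wrow in zip(levels, weights)]


if __name__ == "__main__":
    # Hypothetical coefficients whose magnitude falls off with frequency
    coeffs = [[600 / (1 + u + v) ** 3 for v in range(8)] for u in range(8)]
    weights = make_weights()
    levels = quantize(coeffs, weights)
    zeros = sum(lv == 0 for row in levels for lv in row)
    print("zero levels:", zeros, "of 64")
    for row in levels[:3]:
        print(row)
```

The reconstruction error per coefficient is at most half the quantizer step, so raising `scale` trades quality for more zeros, which the run-length stage then exploits.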
Decode Process for #2A-B
How Lossy is the Decoded Macro-Block?
OpenCV Macroblock DCT Example
Same Cactus 320x240 with 80x80 DCT Macroblocks
DCT iDCT
Same Cactus 320x240 Again with 8x8 DCT Macroblocks
DCT iDCT
Mathematics for 2D DCT
– Frequency Variation on X and Y axes from top left to bottom right
– Straight-forward Algorithm Based on 2D Equation is O(n^2) per dimension
– Like Cooley-Tukey for DFT, a DCT Algorithm that is O(n*log2(n)) has been formulated (Arai, Y.; Agui, T.; Nakajima, M.; see also Numerical Recipes: The Art of Scientific Computing (3rd ed.))
– http://www.cse.uaa.alaska.edu/~ssiewert/a490dmis_code/dct2/dct2.c
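The 2D equation itself is not reproduced in this transcript. For reference, the commonly used DCT-II form for an N x N block is given below; the symbols f (pixel values), F (coefficients) and C (normalization) are notation chosen here and may differ from the slide's.

```latex
F(u,v) = \frac{2}{N}\, C(u)\, C(v)
  \sum_{x=0}^{N-1} \sum_{y=0}^{N-1} f(x,y)\,
  \cos\!\left[\frac{(2x+1)\,u\,\pi}{2N}\right]
  \cos\!\left[\frac{(2y+1)\,v\,\pi}{2N}\right],
\qquad
C(k) = \begin{cases} 1/\sqrt{2}, & k = 0 \\ 1, & k > 0 \end{cases}
```

For N = 8 the leading factor is 1/4. The double sum factors into a row pass followed by a column pass, and each pass evaluated directly costs on the order of N^2 operations per row or column of N samples.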
http://en.wikipedia.org/wiki/File:Dctjpeg.png
Step #2C: Macro-block Run-Length and Huffman Encoding
Zig-Zag Run-Length Encoding to Exploit Repeated Data and Zeros found in H.O.T. of Quantized DCT
– 86, 1, 7, -5, -1, 0, 1, 0, 0, 2, -1, 1, 0, -1, 0 , 0, 0, 0, -1, 0, 0, …
Becomes:
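The encoded result shown on the slide did not survive in this transcript. As a rough sketch of the idea only, the code below generates the zig-zag scan order for an 8x8 block and run-length encodes the sequence above as (zero-run, value) pairs followed by an end-of-block marker. That pair format is an illustrative assumption, not the exact variable-length code defined in 13818-2.

```python
EOB = "EOB"  # hypothetical end-of-block marker


def zigzag_order(n=8):
    """(row, col) positions of an n x n block in zig-zag scan order."""
    order = []
    for s in range(2 * n - 1):                      # s = row + col, one anti-diagonal
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:
            rows = reversed(rows)                   # even diagonals run bottom-left to top-right
        order.extend((r, s - r) for r in rows)
    return order


def rle_encode(seq):
    """Encode as (number of preceding zeros, nonzero value) pairs, then EOB."""
    pairs, run = [], 0
    for v in seq:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    pairs.append(EOB)                               # trailing zeros are implied
    return pairs


def rle_decode(pairs, length=64):
    out = []
    for p in pairs:
        if p == EOB:
            break
        run, v = p
        out.extend([0] * run)
        out.append(v)
    return out + [0] * (length - len(out))


if __name__ == "__main__":
    scanned = [86, 1, 7, -5, -1, 0, 1, 0, 0, 2, -1, 1, 0, -1, 0, 0, 0, 0, -1, 0, 0]
    scanned += [0] * (64 - len(scanned))
    print(zigzag_order()[:6])
    print(rle_encode(scanned))
```

In this representation the 64 values collapse to 11 pairs plus the marker, because the long tail of zeros costs nothing.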
Huffman Applied to RLE Data
– Huffman Tables for MPEG-2 Macro-Blocks Defined in 13818-2
– (Lossless) Compression Based on Probability of Occurrence
– Shannon’s Source Coding Theory: a Symbol with Probability of Occurrence P Needs About -log2(P) Bits in a Binary Encoding of Symbols
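To connect symbol probability to code length, here is a minimal sketch that builds a Huffman code from observed symbol counts and compares the average code length with the Shannon entropy. MPEG-2 uses fixed tables from the standard rather than building a code per stream, so this shows the principle only; the function names are illustrative.

```python
import heapq
import math
from collections import Counter


def huffman_code(symbols):
    """Build a prefix code {symbol: bitstring} from a sequence of symbols."""
    freq = Counter(symbols)
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # heap entries: (count, tie-breaker, partial code table)
    heap = [(count, i, {sym: ""}) for i, (sym, count) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        c1, _, t1 = heapq.heappop(heap)             # two least frequent subtrees
        c2, _, t2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in t1.items()}
        merged.update({s: "1" + code for s, code in t2.items()})
        heapq.heappush(heap, (c1 + c2, tie, merged))
        tie += 1
    return heap[0][2]


def entropy_bits(symbols):
    """Shannon entropy in bits per symbol: -sum(P * log2(P))."""
    freq, n = Counter(symbols), len(symbols)
    return -sum(c / n * math.log2(c / n) for c in freq.values())


def average_length(symbols, code):
    return sum(len(code[s]) for s in symbols) / len(symbols)


if __name__ == "__main__":
    data = ["a"] * 8 + ["b"] * 4 + ["c"] * 2 + ["d"] * 2
    code = huffman_code(data)
    print(code)
    print(entropy_bits(data), average_length(data, code))
```

When the probabilities are exact powers of 1/2, as in this example, each code length equals -log2(P) and the average length matches the entropy exactly; otherwise Huffman coding stays within one bit per symbol of it.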
Step #3: Group of Pictures Concept – Transmit Change-Only Data
– I-Frame Compressed Only Intra-Frame By Applying Methods #2A-2C to Macro-Blocks
– I-Frame Can Be Decoded Alone
– P-Frame is Differences Only, Predicted Forward from the Preceding I-Frame or P-Frame in the GoP
– B-Frame is Differences Only, Predicted from Both the Preceding and the Following Reference Frame (I-Frame or P-Frame)
– Difference Data Can be Further Encoded with Lossless Methods
– Without Steps 2A-C, Specifically Quantization, Difference Data for High Motion Video Could Blow Up in Size
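The "change-only" idea can be illustrated with plain frame differencing. This is a deliberate simplification: real MPEG encoders use block-based motion compensation rather than a pixel-by-pixel subtraction, and the tiny synthetic frames and function names below are made up for the example.

```python
def make_frame(width, height, obj_x, obj_y, background=50, obj=200):
    """Synthetic grayscale frame: flat background with a 2x2 bright object."""
    frame = [[background] * width for _ in range(height)]
    for y in range(obj_y, obj_y + 2):
        for x in range(obj_x, obj_x + 2):
            frame[y][x] = obj
    return frame


def frame_diff(reference, frame):
    """Per-pixel difference of a frame against its reference frame."""
    return [[p - r for p, r in zip(frow, rrow)] for frow, rrow in zip(frame, reference)]


def reconstruct(reference, diff):
    """Decoder side: reference plus difference restores the frame."""
    return [[r + d for r, d in zip(rrow, drow)] for rrow, drow in zip(reference, diff)]


def count_nonzero(diff):
    return sum(v != 0 for row in diff for v in row)


if __name__ == "__main__":
    i_frame = make_frame(8, 8, 2, 2)
    next_frame = make_frame(8, 8, 3, 2)     # object moved one pixel to the right
    diff = frame_diff(i_frame, next_frame)
    print(count_nonzero(diff), "of 64 values changed")
```

With little motion, nearly all difference values are zero and compress well. With high motion, most values change, which is why difference data can grow large without the lossy steps.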
Group of Pictures: High Level View
Overall MPEG YCrCb Compression Performance
Standard Definition 720x480x2 (675KB/frame) @ 30fps
– Requires 20MB/sec (200 Mbps) Uncompressed
– Typical MPEG-2 @ 3.75 Mbps, > 50x Compression
– Typical MPEG-4 @ 1.5 Mbps, > 100x Compression
– 10 to 20 Programs on QAM 256 (48 Mbps, 6 MHz/Ch)
– ≈10 MPEG-4 Programs on ATSC 8VSB (19.39 Mbps, 6 MHz/Ch)
HD 720p (1280x720x2, 1800KB/frame) @ 30fps
– Requires 53MB/sec (530 Mbps) Uncompressed
– Typical MPEG-2 @ 20 Mbps, > 25x Compression
– Typical MPEG-4 @ 10 Mbps, > 50x Compression
HD 1080p (1920x1080x2, 4050KB/frame) @ 30fps
– Requires 120MB/sec (1200 Mbps) Uncompressed
– Typical MPEG-2, VC-1 @ 45 Mbps, > 30x Compression
– Typical MPEG-4 @ 20 Mbps, > 60x Compression
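The uncompressed figures follow from width x height x 2 bytes per pixel x 30 frames per second. The sketch below recomputes them; the function names are illustrative. Note that the Mbps values in the list above are about ten times the MB/sec values, so a strict 8-bits-per-byte calculation gives somewhat lower raw bit rates and proportionally lower compression ratios.

```python
def raw_rate(width, height, fps=30, bytes_per_pixel=2):
    """Return (bytes per frame, bytes per second, bits per second) uncompressed."""
    frame_bytes = width * height * bytes_per_pixel
    bytes_per_sec = frame_bytes * fps
    return frame_bytes, bytes_per_sec, bytes_per_sec * 8


def compression_ratio(raw_bits_per_sec, coded_mbps):
    """Raw bit rate divided by the coded bit rate given in Mbps."""
    return raw_bits_per_sec / (coded_mbps * 1e6)


if __name__ == "__main__":
    cases = (("SD 720x480", 720, 480, 3.75),
             ("HD 720p", 1280, 720, 20),
             ("HD 1080p", 1920, 1080, 45))
    for name, w, h, mpeg2_mbps in cases:
        frame, per_sec, bits = raw_rate(w, h)
        print(name, frame // 1024, "KB/frame,",
              round(bits / 1e6, 1), "Mbps raw,",
              round(compression_ratio(bits, mpeg2_mbps), 1), "x at", mpeg2_mbps, "Mbps")
```

At 8 bits per byte the raw rates work out to roughly 166, 442 and 995 Mbps, so the ratios for the MPEG-2 rates listed come to about 44x, 22x and 22x.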
13818-2 Defines Elementary/Program Streams
13818-2: Elementary Video Stream Encode/Decode
– Defines Color Sub-Sampling Formats
– 8x8 Macro-Block Encoding
– Video DCT
– Post DCT Macro-Block Quantization Weighting and Scaling Coefficients
– RLE Zig-Zag Macro-Block Sampling
– Huffman Encoding Table
– Group of Pictures: I-Frame, B-Frame, P-Frame
– Presentation and Decode Time Stamps (PTS/DTS)
– Order of Encode and Decode Operations
Not Suitable for Transport over Networks, but Sufficient for Local Playback (DVD, PC HDD, Flash-Memory Media)
13818-1
13818-1: Transport Streams for Video & Audio
– Container for Program Streams (188 Byte Packets)
– Multiplexed Video and Audio Elementary Streams
– PSI – Program Specific Information
  PID – Packet ID (identifies which stream or table a packet belongs to)
  Guide Data
  Emergency Broadcast
– System Clock (PCR, PTS/DTS)
– Sequence Headers – Resolution and Format Information, Bit-Rate
  GoP Header, Frame Header
  Slices of Macro-Blocks for Resolution
  Decoder Information (Color, Quantization Tables)
– Can Be Multiple Programs or Combined Audio and Video as a Program
  MPEG-2 Video Elementary Stream
  AC-3 Audio Elementary Stream
  Secondary Audio Stream (Different Language)
  Up to 10 or More Audio+Video in One Transport Stream for Virtual Channels
Parsing an Elementary Video Stream
Many 188-Byte Packet Types; the Packet Header Allows for Multiplexing of Many Video and Audio Streams on a Carrier
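Multiplexing works because every 188-byte packet starts with a 4-byte header carrying a 13-bit PID. The sketch below decodes that fixed header and counts packets per PID. It handles only the 4-byte header (no adaptation field or PSI table parsing), and the function and dictionary key names are illustrative.

```python
from collections import Counter

TS_PACKET_SIZE = 188
SYNC_BYTE = 0x47


def parse_ts_header(packet):
    """Decode the 4-byte header of one 188-byte transport stream packet."""
    if len(packet) != TS_PACKET_SIZE or packet[0] != SYNC_BYTE:
        raise ValueError("not a 188-byte packet starting with sync byte 0x47")
    b1, b2, b3 = packet[1], packet[2], packet[3]
    return {
        "transport_error": bool(b1 & 0x80),
        "payload_unit_start": bool(b1 & 0x40),
        "pid": ((b1 & 0x1F) << 8) | b2,             # 13-bit packet identifier
        "scrambling": (b3 >> 6) & 0x3,
        "adaptation_field_control": (b3 >> 4) & 0x3,
        "continuity_counter": b3 & 0x0F,
    }


def count_pids(stream):
    """Count packets per PID in a byte string of back-to-back packets."""
    counts = Counter()
    for off in range(0, len(stream) - TS_PACKET_SIZE + 1, TS_PACKET_SIZE):
        counts[parse_ts_header(stream[off:off + TS_PACKET_SIZE])["pid"]] += 1
    return counts


if __name__ == "__main__":
    video = bytes([0x47, 0x41, 0x00, 0x10]) + bytes(184)   # synthetic packet, PID 0x100
    pat = bytes([0x47, 0x40, 0x00, 0x11]) + bytes(184)     # synthetic packet, PID 0
    print(parse_ts_header(video))
    print(count_pids(video + pat + video))
```

A demultiplexer routes each packet by its PID; PID 0 carries the Program Association Table, which is the starting point for finding the PIDs of each program's streams.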
MPEG-4 vs. MPEG-2
MPEG-2
– Defined by ISO 13818-1, 13818-2
– Leverages MPEG-1 (Moving Picture Experts Group – 1988)
– Widely Used for Digital Video – Digital Cable TV, DVD
– Transport Stream designed for Broadcast (Lossy, No Beginning or End of Stream)
  ATSC – Advanced Television Systems Committee (HDTV Broadcast)
  – 8VSB Modulation – 8 level Vestigial Sideband Modulation, 6 MHz channel, 19.39 Mbps, Reed-Solomon Error Correction
  – Up to 1080p (1920x1080) Video Resolution
  – AC-3 (Dolby) Audio
  DVB – Digital Video Broadcasting (Europe, Satellite)
– Program Stream designed for Playback Media (DVD, Flash, HDD, etc.)
MPEG-4
– Defined by ISO 14496 (1998)
– Leverages MPEG-2 Standards for Program/Transport, Encode/Decode
– Better Compression Rates (improved motion prediction for P,B frames), MPEG-4 Part-10 (H.264), e.g. Blu-Ray
– Extensions for Digital Rights Management
– Advanced Audio Coding (AAC)
– Becoming More Widely Deployed for HD and Because of Lower Bit-Rate Transport Streams