VIDEO TRANSCODING WITH AML - Home - AMDdeveloper.amd.com/wordpress/media/2013/06/2146_final.pdf ·...


Page 1: VIDEO TRANSCODING WITH AML - Home - AMDdeveloper.amd.com/wordpress/media/2013/06/2146_final.pdf · 2013-10-24 · – A thin SDK layer allows ISV applications to easily access the

VIDEO TRANSCODING WITH AML ON HETEROGENEOUS COMPUTE

Mike Schmit, AMD
Sr. Manager, Software Engineering, Office of the CTO, Multimedia


3 Video Transcoding with AML on Heterogeneous Compute | June 2011

AGENDA

AML Overview

Building a Binary Library of OpenCL™ Kernels

Decoder and Encoder Pipelines

Partitioning Workloads in a Heterogeneous Environment

Performance Measurements

Challenges

Q & A


AML OVERVIEW


AML (AMD MEDIA LIBRARY) OVERVIEW

AML is a collection of OpenCL media components for GPU shader assisted video compression, post processing and image-related processing

– OpenCL-based low-level building blocks
– Build your own standards-based codec or a proprietary codec
– Pick and choose from the dozens of “tools” within an encoder
– Full control over CPU-GPU pipelining
– Optimized OpenCL kernels (binaries) will be installed on end users' systems as part of the GPU drivers
– A thin SDK layer allows ISV applications to easily access the library of kernels (called a kernel database or KDB file)
– Current development
    MPEG-2 & H.264 encode and decode up to 1080p
    JPEG decode up to 64 MP


APPLICATIONS AND USE CASES FOR AML (A FEW EXAMPLES)

Ultra-fast decode (800+ fps on 1080p)
– Smart movie navigation
– Video tapestries and video narratives (UI with blends from many scenes)
– Index movie with object detection, recognition, search, etc.
– Face detection and recognition (need to leave lots of compute cycles after the decode)
– Video skimming (smart fast forward)

Video editing
– Many decoder tracks, with one encoder
– Fast/smooth scrubbing and full-quality preview of complex effects
– Clip cataloguing

Transcoding

Video conferencing with one to many participants


BUILDING A BINARY LIBRARY OF OPENCL KERNELS


Working with OpenCL kernels

OpenCL works with a JIT (just-in-time) model for compiling and execution
– The compiler goes through several stages from the OpenCL front end to the final shader compiler (SC) that creates the ISA (instruction set architecture) binary for the precise GPU hardware target
– The application passes a CL source file in a buffer to the run-time compiler to be executed on the GPU
– We present a methodology to store intermediate binaries (LLVM) in a KDB (kernel database) file at build time for mass distribution
– At install time these kernels are compiled for the specific GPU installed on the system
– At run time the KDB file is opened (via a thin SDK), which checks for proper installation and delivers each kernel binary for execution via the OpenCL run-time
– Upgrades or swapping of GPUs is fully supported via the app startup installation check, where a quick recompile may occur (this is a rare event and may take about a minute)
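
The build-time, install-time and run-time steps listed above can be modeled with a small sketch. This is not the KDB SDK: the class, the function names (`install`, `open_kdb`, `get_program`), the string "blobs" and the GPU target names are all hypothetical stand-ins, chosen only to show where the startup check triggers a recompile.

```python
from dataclasses import dataclass, field


@dataclass
class KernelDatabase:
    """Toy stand-in for a KDB file (hypothetical structure, not the real format)."""
    generic: dict                                   # kernel name -> intermediate blob
    installed_target: str = ""                      # GPU the binaries were built for
    binaries: dict = field(default_factory=dict)    # kernel name -> GPU-specific blob
    compile_count: int = 0                          # how many install compiles ran


def compile_for_target(blob, target):
    # Placeholder for the install-time compile to the GPU's ISA.
    return f"{blob}@{target}"


def install(kdb, target):
    # Install time: compile every generic kernel for the GPU that is present.
    kdb.binaries = {name: compile_for_target(blob, target)
                    for name, blob in kdb.generic.items()}
    kdb.installed_target = target
    kdb.compile_count += 1


def open_kdb(kdb, current_target):
    # App startup check: recompile only when the installed GPU has changed.
    if kdb.installed_target != current_target:
        install(kdb, current_target)
    return kdb


def get_program(kdb, name):
    # Run time: hand back the ready-made binary for one kernel.
    return kdb.binaries[name]


kdb = KernelDatabase(generic={"Mpeg2Recon": "llvm:recon"})
open_kdb(kdb, "gpu-a")                 # first start: compiles
open_kdb(kdb, "gpu-a")                 # same GPU: no recompile
print(get_program(kdb, "Mpeg2Recon"), kdb.compile_count)
open_kdb(kdb, "gpu-b")                 # GPU swapped: recompiles
print(get_program(kdb, "Mpeg2Recon"), kdb.compile_count)
```

The point of the design is that the expensive compile happens once per GPU change rather than on every application start.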


KDB Installation and Design Overview

[Diagram. Labels: ISV Application; KDBSDK.lib, KDBSDK.dll; GetProgram; Prog Obj; AML Kernel Database; Installed Image; OpenCL API; OpenCL Runtime; OpenCL Environment; AMD Graphics; Driver installation]


OPENCL COMPILER FLOW

[Diagram. GPU path: source.cl -> OpenCL front end -> LLVM (see llvm.org) -> OpenCL back end -> IL (GPU intermediate language) -> Shader Compiler (SC) -> ISA -> GPU H/W. CPU path: LLVM -> JIT -> x86 -> CPU. Grouping labels: OpenCL Implementation; GPU H/W specific]


KDB (KERNEL DATA BASE) CREATION PROCESS

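[Diagram, three stages. Make/Build: the source.cl files are compiled to LLVM / IL and packed into a generic KDB (the install download package). Install / OpenCL -> SC: IL is compiled to ISA, giving the ISA-specific KDB (post-install KDB on HDD). Runtime: the ISA-specific KDB feeds the GPU H/W]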


DECODER AND ENCODER PIPELINE EXAMPLES


VIDEO DECODER PIPELINE | With MPEG-2 as a Simple Example

[Diagram. Stages: VLD; Motion Compensation (Mpeg2McY8x8P Kernel, Mpeg2McC8x4P Kernel); Mpeg2Recon Kernel. Buffers: Block coeff info; Block coeff data; MPEG-2 MC Macroblock Info; MPEG-2 MC Motion Vector Info; Ref Frame 0; Ref Frame 1; Motion Comp Output; Decoded Frame. Legend: CPU, GPU, or]
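
As a rough illustration of the last two stages in this pipeline, the sketch below shows what motion compensation and reconstruction conventionally do in an MPEG-2 decoder: fetch a predicted block from a reference frame, average two predictions for a bidirectional block, then add the decoded residual and clip. It is a plain-Python model under simplifying assumptions (integer-pel vectors only, no half-pel interpolation, tiny made-up frames) and is not the code of the AML kernels named in the diagram; all function names are hypothetical.

```python
def fetch_block(ref, bx, by, mv, size):
    """Integer-pel motion compensation: copy the block that the vector points at."""
    dx, dy = mv
    return [[ref[by + dy + y][bx + dx + x] for x in range(size)]
            for y in range(size)]


def average(pred0, pred1):
    """Bidirectional prediction: rounded average of two predictions."""
    return [[(a + b + 1) >> 1 for a, b in zip(r0, r1)]
            for r0, r1 in zip(pred0, pred1)]


def reconstruct(pred, residual):
    """Add the decoded residual to the prediction and clip to the 8-bit range."""
    return [[max(0, min(255, p + r)) for p, r in zip(pr, rr)]
            for pr, rr in zip(pred, residual)]


# Made-up 4x4 reference frames and a 2x2 block, purely for illustration.
ref0 = [[10 * y + x for x in range(4)] for y in range(4)]
ref1 = [[200] * 4 for _ in range(4)]
pred = average(fetch_block(ref0, 0, 0, (1, 1), 2),
               fetch_block(ref1, 0, 0, (0, 0), 2))
frame_block = reconstruct(pred, [[-5, 5], [100, -200]])
print(frame_block)
```

Every output pixel depends only on its own inputs, which is why these stages map well onto data-parallel GPU kernels.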


VIDEO ENCODER PIPELINE | With MPEG-2 as a Simple Example

[Diagram. Stages: Frame Read; Mpeg2MotionSearchFullNxM*; Mpeg2MdCalcBasic Kernel; Mode Decision Kernels (Mpeg2MDRCI, Mpeg2MDRCP, Mpeg2MDRCB); MotionComp (Mpeg2McY8x8P Kernel, Mpeg2McC8x4P Kernel); Mpeg2Diff Kernel; Mpeg2Recon Kernel; VLE / entropy encode. Buffers: Frame buffer; Motion Search MV Info; Mode Decision MB Info; MPEG-2 MC Macroblock Info; MPEG-2 MC Motion Vector info; Motion Comp Output; Ref Frame 0; Ref Frame 1; Block coeff Info; Block Coeff Data. Legend: CPU, GPU, or]


MOTION ESTIMATION DETAILS

A motion search implementation might use the following 8 of 50+ motion search kernels:

– MotionMvScale
– Mpeg2MotionSearchFull16x16
– Mpeg2MotionSearchFull5x3Rect16x16
– Mpeg2MotionSearchFull5x3Rect16x16Avg
– Mpeg2MotionSearchFullNxMRect8x8
– Mpeg2MotionSearchHPel3x3Rect16x16
– Mpeg2MotionSearchHPel3x3Rect16x16Avg
– MotionMvSelect
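
For readers unfamiliar with what a "full" motion search computes, here is a minimal sketch of an exhaustive integer-pel block search using the sum of absolute differences (SAD) as the cost. It is a generic textbook formulation in plain Python, not the implementation of any kernel listed above; the function names, the block size and the search range are illustrative assumptions.

```python
def sad(cur, ref, bx, by, dx, dy, size):
    """Sum of absolute differences between a block in cur and a displaced block in ref."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, size, search_range):
    """Try every integer displacement in the window; return (best_sad, (dx, dy))."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            # Skip candidates that fall outside the reference frame.
            if not (0 <= bx + dx and bx + dx + size <= width):
                continue
            if not (0 <= by + dy and by + dy + size <= height):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, size)
            if best is None or cost < best[0]:
                best = (cost, (dx, dy))
    return best


# Synthetic 8x8 frames: cur is ref shifted by (2, 1), zero-filled at the border.
ref = [[7 * x + 13 * y for x in range(8)] for y in range(8)]
cur = [[ref[y + 1][x + 2] if (y + 1 < 8 and x + 2 < 8) else 0 for x in range(8)]
       for y in range(8)]
print(full_search(cur, ref, 2, 2, 4, 2))
```

Each candidate displacement is independent of the others, so the search parallelizes naturally; it is also the kind of loop that a hardware SAD instruction accelerates.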


SPLITTING THE WORKLOAD IN A HETEROGENEOUS COMPUTE ENVIRONMENT


PARTITIONING GPU LOAD AND CPU LOAD


PARTITIONING GPU LOAD AND CPU LOAD (AS A PERCENTAGE OF ORIGINAL CPU LOAD)


AML PERFORMANCE


MPEG-2 1080p VIDEO DECODER PERFORMANCE

[Chart: FPS (axis 0 to 1200) versus number of CPU threads (1 to 4). Series: HD 6970, HD 5670, Llano (2.4 GHz), 2.8 GHz CPU]


H.264 1080p VIDEO ENCODER PERFORMANCE

[Bar chart: FPS (axis 0 to 210). CPU software only: Llano (2.4 GHz), Phenom II (2.8 GHz). CPU + GPU: Llano (+GPU), Phenom II + HD 5670, Phenom II + HD 5770, Phenom II + HD 6970]


CHALLENGES

Memory bandwidth
– Need lots of CPU and GPU compute time relative to CPU->GPU and GPU->CPU data transfers
– Future platform directions: FSA (Fusion System Architecture) will address this
– But still a good practice to use lots of compute time per data load, just like for a CPU cache

Pipeline adds latency
– High throughput may be the tradeoff for longer latency

Pipeline with rate control
– Getting feedback on the bit consumption along with high parallelism/performance can be tricky

Motion estimation: lots of choices
– The biggest consumption of cycles in most algorithms
– Low-end GPUs do less; high-end GPUs do more
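
The throughput-versus-latency tradeoff mentioned under "Pipeline adds latency" can be made concrete with a small worked example. The model below is a simplification (a lock-step pipeline whose period is set by its slowest stage) and the stage times are invented numbers, not measurements from this talk.

```python
def sequential(stage_ms):
    """One frame at a time through all stages: returns (latency_ms, frames_per_second)."""
    total = sum(stage_ms)
    return total, 1000 / total


def pipelined(stage_ms):
    """Lock-step pipeline: every stage advances once per period, set by the slowest stage."""
    period = max(stage_ms)
    return period * len(stage_ms), 1000 / period


stages = [2, 5, 3]   # hypothetical per-frame stage times in milliseconds
print(sequential(stages))
print(pipelined(stages))
```

With these numbers, pipelining doubles the frame rate (100 to 200 frames per second) while the time for one frame to traverse the pipeline grows from 10 ms to 15 ms.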


H.264 SPECIFIC ISSUES

Intra-prediction mode and deblocking
– Both of these H.264 features have strong dependencies on the neighboring macroblocks that are above and to the left
– These dependencies limit the amount of parallelism obtainable
– Thus the more powerful a GPU is, the lower the GPU shader utilization will be

Solutions
– Encoding with multiple slices somewhat mitigates this
– Can encode multiple frames at once within the stream (B frames)
– Can encode multiple streams at once
– Future GPUs address this with the ability to schedule more than a single kernel at once
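
To see how the neighbor dependencies cap parallelism, the sketch below counts how many macroblocks can be processed at each step of a wavefront schedule, where a macroblock becomes ready only after the neighbors it depends on are done. The function name is hypothetical. The slide names the above and left neighbors; treating the above-right neighbor as a dependency as well (the default here) is an added assumption, and the flag lets you turn it off.

```python
from collections import Counter


def wavefront_widths(mb_cols, mb_rows, uses_above_right=True):
    """Count how many macroblocks become ready at each step of a wavefront schedule."""
    # With left/above dependencies the earliest step for (x, y) is x + y;
    # adding an above-right dependency delays each row by one more step: x + 2*y.
    skew = 2 if uses_above_right else 1
    return Counter(x + skew * y for y in range(mb_rows) for x in range(mb_cols))


widths = wavefront_widths(120, 68)      # 1080p: 1920/16 columns, 1088/16 rows
print(len(widths), max(widths.values()), sum(widths.values()))
```

Under these assumptions a 1080p frame never exposes more than 60 independent macroblocks at a time (68 without the above-right dependency), however many shader units the GPU has, which is why multiple slices, frames or streams are needed to fill a large GPU.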


INTRA-PREDICTION MODE DETAILS (16 X 16)

16 x 16 Luma modes (similar to 8 x 8 chroma modes)

0: vertical
1: horizontal
2: DC (Mean(H+V))
3: Plane
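
A minimal sketch of the first three 16 x 16 luma modes follows, assuming both the row above and the column to the left are available. The DC rounding follows the usual H.264 formulation; the plane mode is left out because it needs a longer gradient calculation. The function name and argument layout are illustrative, not taken from AML.

```python
def predict_16x16(mode, above, left):
    """Build a 16x16 prediction from the 16 pixels above and the 16 pixels to the left."""
    n = 16
    if mode == 0:                       # vertical: repeat the row above
        return [list(above) for _ in range(n)]
    if mode == 1:                       # horizontal: repeat the left column
        return [[left[y]] * n for y in range(n)]
    if mode == 2:                       # DC: rounded mean of all 32 neighbors
        dc = (sum(above) + sum(left) + 16) >> 5
        return [[dc] * n for _ in range(n)]
    raise ValueError("mode 3 (plane) is not covered by this sketch")


above = list(range(16))                 # made-up neighbor pixels
left = [100] * 16
print(predict_16x16(2, above, left)[0][0])
```

The prediction cannot be formed until the neighboring macroblocks have been reconstructed, which is the dependency that limits parallelism.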


INTRA-PREDICTION MODE DETAILS (4 X 4)

0: vertical
1: horizontal
2: DC
3: diagonal down-left
4: diagonal down-right
5: vertical right
6: horizontal down
7: vertical left
8: horizontal up


SUMMARY: ADVANTAGES OF THE AML MODEL

Pre-written, optimized OpenCL kernels provided for each AMD GPU family, including APUs
Performance designed to scale with more powerful GPUs for data parallel functions
Performance designed to scale from generation to generation
Specific compilation for the targeted GPU means the code automatically takes advantage of
– AMD instruction set improvements such as SAD (sum of absolute differences) instructions
– OpenCL compiler improvements
– SIMD engine (CU) design advancements, such as LDS (local data store)

ISVs do not have to become experts in data parallel programming optimization
Can get a high performance codec up and running quickly
ISVs can mix and match with their own custom kernels


ADDITIONAL SESSIONS

Other sessions of interest
– 1721: M-JPEG Decoding using OpenCL on Fusion
– 2112: A Methodology for Optimizing Data Transfer in OpenCL
– 2322: Fusion Enabled Video and Imaging Pipelines
– 1741: Optimizing Video Editing Software with OpenCL
– 2116: Video Post Processing
– 2904: 1) High Quality and Efficient Post Processing on GPU Compute
– 2904: 2) Real-time H.264 Video Enhancement Using AMD APP SDK
– 2904: 3) Using Fusion System Architecture for Broadcast Video


QUESTIONS


Disclaimer & Attribution

The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors.

The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. There is no obligation to update or otherwise correct or revise this information. However, we reserve the right to revise this information and to make changes from time to time to the content hereof without obligation to notify any person of such revisions or changes.

NO REPRESENTATIONS OR WARRANTIES ARE MADE WITH RESPECT TO THE CONTENTS HEREOF AND NO RESPONSIBILITY IS ASSUMED FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION.

ALL IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE ARE EXPRESSLY DISCLAIMED. IN NO EVENT WILL ANY LIABILITY TO ANY PERSON BE INCURRED FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.

AMD, the AMD arrow logo, and combinations thereof are trademarks of Advanced Micro Devices, Inc. All other names used in this presentation are for informational purposes only and may be trademarks of their respective owners.

OpenCL is a trademark of Apple Inc. used by permission by Khronos.

© 2011 Advanced Micro Devices, Inc.