High Performance Video Pipelining: A Flexible Architecture for … · 2014-04-07 · Title: High...

High Performance Video Pipelining: A Flexible Architecture for GPU Processing of Broadcast Video

Peter Walsh Chief Emerging Technology Engineer

Overview

• Real-time GPU processing of broadcast video

– Maximize GPU utilization

– Maintain flexibility

• High Performance Video Pipeline

– CPU and GPU buffers

– Data transfer

Monday Night Football production truck

NASCAR production truck

Studio (BCS championship “Film Room”)

GPU Processing

• Segmentation (generating chromakey)

• Inserting graphics (linear and chromakeying)

• Field (camera) tracking

• Object (player) tracking

[Figure: system block diagram spanning CPU and GPU, with Input Video, Segmentation, GFX insertion, Field Tracking, Object Tracking, Interop, Rendering, and Output Video]

Background

• “Best Practices in GPU-Based Video Processing,” Tom True, NVIDIA, GTC 2013

• “Topics in GPU-Based Video Processing,” Tom True, NVIDIA, GTC 2014

Naïve Sequential Implementation

• Acquire

• Upload

• Process

• Download

• Output

[Figure: the five stages run one after another, shown against a scale of 1 Frame Time]
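For the naïve sequential case, the frame-time constraint can be written out explicitly. The 59.94 frames/s rate is an illustrative assumption, not a figure from the talk: every stage runs on the same frame in turn, so their sum must fit inside one frame period.

$$
\begin{aligned}
T_{\text{frame}} &= \frac{1}{59.94\ \text{Hz}} \approx 16.68\ \text{ms} \\
t_{\text{acquire}} + t_{\text{upload}} + t_{\text{process}} + t_{\text{download}} + t_{\text{output}} &\le T_{\text{frame}}
\end{aligned}
$$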

Simultaneous Operations

• Acquire

• Upload

• Process

• Download

• Output

[Figure: the five stages run at the same time, shown against a scale of 1 Frame Time]
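The gain from overlapping the stages can be checked with a small timing model. This is a sketch under simplifying assumptions: each stage is a serial resource (for example one CUDA stream or one DMA engine), every frame is available at time 0 (the video clock pacing Acquire is ignored), and the function names and stage durations are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Flow-shop timing model of the chain (Acquire, Upload, Process, Download, Output).
// finish[f][s] is the time at which stage s finishes frame f. A stage starts frame f
// only when frame f has left stage s - 1 AND the stage itself has finished frame f - 1.
std::vector<std::vector<double>> overlappedFinishTimes(const std::vector<double>& stageMs,
                                                       std::size_t frames) {
    assert(!stageMs.empty());
    const std::size_t stages = stageMs.size();
    std::vector<std::vector<double>> finish(frames, std::vector<double>(stages, 0.0));
    for (std::size_t f = 0; f < frames; ++f) {
        for (std::size_t s = 0; s < stages; ++s) {
            const double frameReady = (s > 0) ? finish[f][s - 1] : 0.0;  // previous stage done with frame f
            const double stageFree = (f > 0) ? finish[f - 1][s] : 0.0;   // this stage done with frame f - 1
            finish[f][s] = std::max(frameReady, stageFree) + stageMs[s];
        }
    }
    return finish;
}

// Naive sequential model: the whole chain finishes one frame before the next one starts.
double sequentialFinishTime(const std::vector<double>& stageMs, std::size_t frames) {
    double perFrame = 0.0;
    for (double t : stageMs) perFrame += t;
    return perFrame * static_cast<double>(frames);
}
```

With hypothetical stage times of {4, 3, 8, 3, 4} ms, the sequential chain needs 22 ms per frame (66 ms for three frames). In the overlapped model the first frame still takes 22 ms end to end, but later frames finish every 8 ms, the duration of the slowest stage (three frames finish at 38 ms). Overlap buys throughput, not latency: each stage, rather than the sum, must fit inside one frame time.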

Techniques

• Avoid CPU memory copies

• Use pinned system memory

• DMA Video I/O using pinned memory

• DMA between CPU and GPU

• Asynchronous – using multiple CUDA streams

• Double buffers for simultaneous R/W

Frame Buffers

[Figure: frame buffers in System and Pinned System memory]

Buffer Allocation

• Device

• System

• Pinned System

• 1D

• 2D (pitch specified)

• 2D (pitch determined by CUDA allocation)
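A buffer descriptor covering the three memory kinds and the 1D/2D layouts might look like the sketch below. `MemoryKind`, `BufferDesc`, `alignedPitch`, and the alignment values used with it are hypothetical. When CUDA determines the pitch, cudaMallocPitch() reports it through its pitch output parameter, and that value should be read back rather than predicted.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical descriptor for the buffer kinds and layouts listed on the slide.
enum class MemoryKind { Device, System, PinnedSystem };

struct BufferDesc {
    MemoryKind kind;
    std::size_t widthBytes;  // bytes of image data per row
    std::size_t height;      // number of rows (1 for a 1D buffer)
    std::size_t pitchBytes;  // bytes from the start of one row to the start of the next
    std::size_t sizeBytes() const { return pitchBytes * height; }
};

// Round a row width up to a multiple of `alignment` (must be a power of two).
// Illustrative only: cudaMallocPitch() chooses its own pitch.
std::size_t alignedPitch(std::size_t widthBytes, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (widthBytes + alignment - 1) & ~(alignment - 1);
}

// 1D buffer: a single "row" whose pitch equals its size.
BufferDesc make1D(MemoryKind kind, std::size_t bytes) { return {kind, bytes, 1, bytes}; }

// 2D buffer whose pitch is specified by the caller (or read back from the allocator).
BufferDesc make2D(MemoryKind kind, std::size_t widthBytes, std::size_t height,
                  std::size_t pitchBytes) {
    assert(pitchBytes >= widthBytes);
    return {kind, widthBytes, height, pitchBytes};
}
```

For example, an 8-bit 4:2:2 row of 1920 pixels is 3840 bytes; with a hypothetical 512-byte alignment it would be padded to a 4096-byte pitch, while a 256-byte alignment needs no padding. Keeping the pitch in the descriptor is what lets a single copy call handle source and destination buffers with different pitches.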

CUDA API

• Allocation: cudaMalloc(), cudaHostAlloc(), cudaMallocPitch()

• Memory Copies: cudaMemcpy(), cudaMemcpy2D(), cudaMemcpyAsync(), cudaMemcpy2DAsync()
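The row-by-row semantics of cudaMemcpy2D() (destination pointer and pitch, source pointer and pitch, width in bytes, height in rows) can be modeled on the CPU as below. This is only a host-side model for illustration with a hypothetical name, `copy2D`: the real call also takes a cudaMemcpyKind, and cudaMemcpy2DAsync() additionally takes a stream.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Copy `height` rows of `widthBytes` bytes each. Consecutive rows start `srcPitch`
// bytes apart in the source and `dstPitch` bytes apart in the destination; padding
// bytes beyond widthBytes in each destination row are left untouched.
void copy2D(void* dst, std::size_t dstPitch, const void* src, std::size_t srcPitch,
            std::size_t widthBytes, std::size_t height) {
    assert(widthBytes <= dstPitch && widthBytes <= srcPitch);
    auto* d = static_cast<std::uint8_t*>(dst);
    const auto* s = static_cast<const std::uint8_t*>(src);
    for (std::size_t row = 0; row < height; ++row) {
        std::memcpy(d + row * dstPitch, s + row * srcPitch, widthBytes);
    }
}
```

Copying two 3-byte rows from a 4-byte-pitch buffer into an 8-byte-pitch buffer moves only the image bytes; the source padding is not copied and the destination padding keeps its old contents.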

Buffer Transfers

B.Copy(A, pStream)

• Source and destination buffers

– System, pinned system, device

– Different pitches

• Supports synchronous and asynchronous transfers
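One way a call like B.Copy(A, pStream) could decide how to perform a transfer is sketched below. The planner, its types, and the convention that a null pStream means a synchronous copy are assumptions, not the HPVP implementation; the returned strings simply name the matching cudaMemcpyKind enumerators. The pinned-memory rule reflects CUDA behavior: copies to or from pageable host memory are staged by the driver and may not overlap with other work.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

enum class MemoryKind { Device, System, PinnedSystem };

struct CopyPlan {
    std::string kind;  // spelled like the cudaMemcpyKind enumerator the copy would use
    bool async;        // true if the copy can overlap with other work
};

// Hypothetical planner for B.Copy(A, pStream). Pitches may differ between A and B,
// but the number of image bytes per row must match.
CopyPlan planCopy(MemoryKind src, MemoryKind dst,
                  std::size_t srcRowBytes, std::size_t dstRowBytes,
                  const void* pStream) {
    assert(srcRowBytes == dstRowBytes);
    const bool srcHost = src != MemoryKind::Device;
    const bool dstHost = dst != MemoryKind::Device;

    CopyPlan plan;
    if (srcHost && dstHost)  plan.kind = "cudaMemcpyHostToHost";
    else if (srcHost)        plan.kind = "cudaMemcpyHostToDevice";
    else if (dstHost)        plan.kind = "cudaMemcpyDeviceToHost";
    else                     plan.kind = "cudaMemcpyDeviceToDevice";

    // Assumption: null pStream = synchronous. With a stream, overlap also requires that
    // any host-side buffer be pinned. Host-to-host copies are treated as plain CPU copies.
    const bool hostSidePinned = (!srcHost || src == MemoryKind::PinnedSystem) &&
                                (!dstHost || dst == MemoryKind::PinnedSystem);
    plan.async = pStream != nullptr && !(srcHost && dstHost) && hostSidePinned;
    return plan;
}
```

Under these rules an upload from Pinned System memory with a stream can overlap, while the same upload from ordinary System memory cannot. This is why the Techniques slide pairs DMA with pinned system memory.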

CUDA Kernels

LaunchKernel(A, B, pStream, …)

• Buffers A and B are in device memory

• Sync/Async behavior controlled by pStream
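Kernels like LaunchKernel(A, B, pStream, …) commonly map one thread to one pixel, with the grid sized by ceiling division. The CPU emulation below walks every block and thread to show the index arithmetic and the bounds check on pitched buffers. The names `blocksFor` and `emulateInvertKernel` and the invert operation are hypothetical illustrations. On the GPU, the innermost body would be the kernel, and the stream would be passed in the launch's execution configuration (`<<<grid, block, sharedBytes, stream>>>`).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct Dim2 { unsigned x, y; };

// Blocks needed to cover `pixels` with `threadsPerBlock` threads (ceiling division).
unsigned blocksFor(unsigned pixels, unsigned threadsPerBlock) {
    assert(threadsPerBlock > 0);
    return (pixels + threadsPerBlock - 1) / threadsPerBlock;
}

// Serial emulation of a per-pixel kernel reading pitched buffer A and writing pitched buffer B.
void emulateInvertKernel(const std::uint8_t* a, std::size_t aPitch,
                         std::uint8_t* b, std::size_t bPitch,
                         unsigned width, unsigned height, Dim2 block) {
    const Dim2 grid{blocksFor(width, block.x), blocksFor(height, block.y)};
    for (unsigned by = 0; by < grid.y; ++by)
        for (unsigned bx = 0; bx < grid.x; ++bx)
            for (unsigned ty = 0; ty < block.y; ++ty)
                for (unsigned tx = 0; tx < block.x; ++tx) {
                    const unsigned x = bx * block.x + tx;  // blockIdx.x * blockDim.x + threadIdx.x
                    const unsigned y = by * block.y + ty;  // blockIdx.y * blockDim.y + threadIdx.y
                    if (x >= width || y >= height) continue;  // threads past the image edge do nothing
                    b[y * bPitch + x] = static_cast<std::uint8_t>(255 - a[y * aPitch + x]);
                }
}
```

For a 1920 × 1080 frame with 16 × 16 blocks this gives a 120 × 68 grid; the last row of blocks is only partly inside the image, which is why the bounds check is needed.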

Processing

• Acquire(A)

• B.Copy(A, pUploadStream)

• Process(B, C, pProcessingStream, params)

• D.Copy(C, pDownLoadStream)

• Output(D)
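The processing chain can be mimicked on the CPU to show why separate streams matter: each stage runs on its own std::thread and hands frames on through a bounded queue, so acquiring frame i + 1 overlaps processing of frame i. This is an analogue under stated assumptions, not HPVP code. Threads stand in for pUploadStream, pProcessingStream, and pDownLoadStream, the Upload and Download stages are omitted, the invert step is a placeholder for Process, and `BlockingQueue` and `runPipeline` are hypothetical names.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <optional>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Bounded FIFO shared by two pipeline stages; a capacity of 2 echoes double buffering.
template <typename T>
class BlockingQueue {
public:
    explicit BlockingQueue(std::size_t capacity) : capacity_(capacity) { assert(capacity > 0); }

    void push(T value) {
        std::unique_lock<std::mutex> lock(mutex_);
        notFull_.wait(lock, [&] { return queue_.size() < capacity_; });
        queue_.push(std::move(value));
        notEmpty_.notify_one();
    }

    // Blocks until a value is available; returns std::nullopt once closed and drained.
    std::optional<T> pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        notEmpty_.wait(lock, [&] { return !queue_.empty() || closed_; });
        if (queue_.empty()) return std::nullopt;
        T value = std::move(queue_.front());
        queue_.pop();
        notFull_.notify_one();
        return value;
    }

    void close() {
        std::lock_guard<std::mutex> lock(mutex_);
        closed_ = true;
        notEmpty_.notify_all();
    }

private:
    std::mutex mutex_;
    std::condition_variable notEmpty_, notFull_;
    std::queue<T> queue_;
    std::size_t capacity_;
    bool closed_ = false;
};

using Frame = std::vector<std::uint8_t>;

// Acquire -> Process -> Output, each stage on its own thread.
std::vector<Frame> runPipeline(int frameCount, std::size_t frameBytes) {
    BlockingQueue<Frame> acquired(2), processed(2);
    std::vector<Frame> output;

    std::thread acquire([&] {
        for (int i = 0; i < frameCount; ++i)
            acquired.push(Frame(frameBytes, static_cast<std::uint8_t>(i)));  // synthetic frame i
        acquired.close();
    });
    std::thread process([&] {
        while (auto frame = acquired.pop()) {
            for (auto& px : *frame) px = static_cast<std::uint8_t>(255 - px);  // placeholder for Process
            processed.push(std::move(*frame));
        }
        processed.close();
    });
    std::thread emit([&] {
        while (auto frame = processed.pop()) output.push_back(std::move(*frame));
    });

    acquire.join();
    process.join();
    emit.join();
    return output;
}
```

Frames leave in order because each queue is FIFO and each stage is a single thread, just as work issued to one CUDA stream executes in order.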

Double Buffering

[Figure: double buffering across frame “i” and frame “i + 1”, with paired Src and Dst buffers around each Processing step]
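A minimal ping-pong buffer captures the double-buffering idea: the writer fills one buffer with frame “i + 1” while the reader still sees frame “i” in the other, and the roles swap once per frame. `DoubleBuffer` is a hypothetical host-side class. In the pipeline, one such pair could serve the Src side and another the Dst side; on the GPU, each swap would also have to wait for the stream work touching those buffers to finish.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Ping-pong (double) buffer: swap() exchanges the write and read roles once per frame.
template <typename T>
class DoubleBuffer {
public:
    explicit DoubleBuffer(std::size_t elements)
        : buffers_{{std::vector<T>(elements), std::vector<T>(elements)}} {
        assert(elements > 0);
    }

    std::vector<T>& writeBuffer() { return buffers_[writeIndex_]; }
    const std::vector<T>& readBuffer() const { return buffers_[1 - writeIndex_]; }

    // Call only after the writer has finished the current frame and the reader has
    // released the previous one (on a GPU, after synchronizing the streams that use them).
    void swap() { writeIndex_ = 1 - writeIndex_; }

private:
    std::array<std::vector<T>, 2> buffers_;
    std::size_t writeIndex_ = 0;
};
```

Because the two roles never point at the same buffer between swaps, reads and writes can proceed simultaneously without copying frame data.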


Intel IPP: ippiFilter_8u_C1R(pSrcImgOffset, srcPitch, pDstImgOffset, dstPitch, roi, filterKernel, kernelSize, anchor, divisor);

NVIDIA NPP: nppiFilter_8u_C1R(pSrcImgOffset, srcPitch, pDstImgOffset, dstPitch, roi, filterKernel, kernelSize, anchor, divisor);

HPVP: Filter_8u_C1R(pSrc, pDest, roi, pFilterKernel);
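The appeal of the shorter HPVP signature is that buffer objects already know their pointers and pitches, so the ROI offset (row * pitch + column) is computed once inside the wrapper instead of at every call site. The sketch below shows that wrapping over a plain CPU reference filter. `Image8u`, `Roi`, `FilterKernel`, and `filterRef` are hypothetical, and this `Filter_8u_C1R` only mirrors the slide's argument shape. The reference applies the kernel as a straight correlation with one anchor for both axes and no border handling; IPP and NPP define their own coefficient order, anchor type, and border requirements in their documentation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Roi { int x, y, width, height; };

// Hypothetical 8-bit single-channel image that carries its own pitch.
struct Image8u {
    int width, height;
    std::size_t pitch;  // bytes per row, >= width
    std::vector<std::uint8_t> data;
    Image8u(int w, int h, std::size_t p) : width(w), height(h), pitch(p), data(p * h, 0) {}
    std::uint8_t* at(int x, int y) { return data.data() + y * pitch + x; }
    const std::uint8_t* at(int x, int y) const { return data.data() + y * pitch + x; }
};

struct FilterKernel {
    int size;                 // square kernel, size x size
    std::vector<int> coeffs;  // row-major
    int anchor;               // same anchor for x and y
    int divisor;
};

// Pointer/pitch-style reference filter (the shape of the IPP/NPP argument lists).
// pSrc and pDst point at the ROI origin; the caller keeps the kernel footprint in bounds.
void filterRef(const std::uint8_t* pSrc, std::size_t srcPitch, std::uint8_t* pDst,
               std::size_t dstPitch, int roiW, int roiH, const FilterKernel& k) {
    const auto sp = static_cast<std::ptrdiff_t>(srcPitch);
    for (int y = 0; y < roiH; ++y)
        for (int x = 0; x < roiW; ++x) {
            int sum = 0;
            for (int ky = 0; ky < k.size; ++ky)
                for (int kx = 0; kx < k.size; ++kx)
                    sum += k.coeffs[ky * k.size + kx] *
                           pSrc[(y + ky - k.anchor) * sp + (x + kx - k.anchor)];
            const int v = sum / k.divisor;
            pDst[y * dstPitch + x] = static_cast<std::uint8_t>(v < 0 ? 0 : (v > 255 ? 255 : v));
        }
}

// HPVP-style call: the buffers supply pointers and pitches; the caller passes only ROI and kernel.
void Filter_8u_C1R(const Image8u* pSrc, Image8u* pDest, Roi roi, const FilterKernel* pFilterKernel) {
    const FilterKernel& k = *pFilterKernel;
    assert(roi.x >= k.anchor && roi.y >= k.anchor);
    assert(roi.x + roi.width + (k.size - 1 - k.anchor) <= pSrc->width);
    assert(roi.y + roi.height + (k.size - 1 - k.anchor) <= pSrc->height);
    assert(roi.x + roi.width <= pDest->width && roi.y + roi.height <= pDest->height);
    filterRef(pSrc->at(roi.x, roi.y), pSrc->pitch, pDest->at(roi.x, roi.y), pDest->pitch,
              roi.width, roi.height, k);
}
```

With a 3 × 3 box kernel (all coefficients 1, anchor 1, divisor 9), each destination pixel in the ROI becomes the mean of its 3 × 3 neighborhood, and pixels outside the ROI are left unchanged.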

Live Filtering

• Acquire(A)

• B.Copy(A, pUploadStream)

• Filter_8u_C3R(B, C, roi, pFilterKernel) *

• D.Copy(C, pDownLoadStream)

• Output(D)

* The CUDA stream for processing is already defined, so it is not passed as an argument

References/Links

“Best Practices in GPU-Based Video Processing,” Tom True, NVIDIA, GTC 2013

“Topics in GPU-Based Video Processing,” Tom True, NVIDIA, GTC 2014

http://www.youtube.com/watch?v=QpEV-XVIxNw

http://frontrow.espn.go.com/2014/01/espns-advanced-replay-tool-art-graphically-enhances-sports-telecasts/

Questions

Peter Walsh ESPN pete.m.walsh@espn.com (860) 766-2908
