Data Compression Conference 2013 Chenggang Yan, Yongdong Zhang, Feng Dai and Liang Li 1.

Post on 19-Jan-2018


1

HIGHLY PARALLEL FRAMEWORK FOR HEVC MOTION ESTIMATION ON MANY-CORE PLATFORM

Data Compression Conference 2013

Chenggang Yan, Yongdong Zhang, Feng Dai and Liang Li

2

Outline

Introduction

Related Work

Proposed Method

Experimental Results

Conclusion

3

Introduction(1/2)

HEVC coding tree unit (CTU)

4

Introduction(2/2)

Local parallel method (LPM)

The maximum parallelism of LPM is less than or equal to 8.

Independent PUs (IPUs)

Directed acyclic graph (DAG)

5

Related Work(1/2)

Local parallel method (LPM) [16]

Motion estimation region (MER)

[16] Minhua Zhou, “AHG10: Configurable and CU-group level parallel merge/skip,” JCTVC-H0082, Feb. 2012

6

Related Work(2/2)

Local parallel method (LPM)

M = 16 or 8

8

7

Proposed Method

A. Data Dependency Analysis

B. DAG for CTUs

C. Highly Parallel Framework

8

Proposed Method.A(1/3)

Independent PUs (IPUs)

The IPU’s left boundary and the MER’s left boundary do not overlap.

The IPU’s upper boundary and the MER’s upper boundary do not overlap.
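The two boundary conditions can be sketched as a simple predicate. This is only an illustration: it assumes both conditions must hold together, that PU positions are given as pixel offsets, and that MERs are square and aligned to multiples of their size. The name `is_ipu` and its parameters are hypothetical and do not come from the slides.

```python
def is_ipu(pu_x, pu_y, mer_size):
    """Return True if a PU whose top-left corner is at (pu_x, pu_y) is an IPU.

    Assumption: MERs are mer_size x mer_size squares aligned to the pixel
    grid, so a PU edge coincides with an MER edge exactly when the
    coordinate is a multiple of mer_size.
    """
    touches_mer_left = pu_x % mer_size == 0
    touches_mer_top = pu_y % mer_size == 0
    # Both boundaries must lie strictly inside the MER.
    return not touches_mer_left and not touches_mer_top


# A PU in the interior of a 16x16 MER versus PUs on its left or upper edge.
interior = is_ipu(8, 8, 16)
on_left_edge = is_ipu(0, 8, 16)
on_upper_edge = is_ipu(8, 32, 16)
```

With M = 16, only the first PU qualifies, because the other two share an edge with their MER.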

9

Proposed Method.A(2/3)

10

Proposed Method.A(3/3)

Neighboring CTUs: left, upper, upper-left, upper-right
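As a small illustration of these four neighbors, the sketch below lists the CTUs that a given CTU depends on. It assumes coordinates are written (i, j) = (row, column), which matches the update rule in Step 4 of the framework, and it drops neighbors that fall outside the frame. The function name `ctu_dependencies` is hypothetical.

```python
def ctu_dependencies(i, j, rows, cols):
    """Return the in-frame neighbors that CTU (i, j) depends on.

    Assumption: i is the CTU row and j is the CTU column.
    """
    candidates = {
        "left": (i, j - 1),
        "upper": (i - 1, j),
        "upper-left": (i - 1, j - 1),
        "upper-right": (i - 1, j + 1),
    }
    return {
        name: (r, c)
        for name, (r, c) in candidates.items()
        if 0 <= r < rows and 0 <= c < cols
    }


# In a 3x4 CTU grid: a corner CTU, an interior CTU, and a right-edge CTU.
corner = ctu_dependencies(0, 0, 3, 4)
interior = ctu_dependencies(1, 1, 3, 4)
right_edge = ctu_dependencies(1, 3, 3, 4)
```

The first CTU of the frame has no dependencies, an interior CTU has all four, and a CTU in the last column has no upper-right neighbor.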

11

Proposed Method

A. Data Dependency Analysis

B. DAG for CTUs

C. Highly Parallel Framework

12

Proposed Method.B(1/4)

Generate a DAG to capture the dependency relationships of CTUs.

13

Proposed Method.B(2/4)

A DAG consists of a set of vertices V and a set of edges E.

Data dependency <=> an edge

Processed CTU <=> its vertex is removed
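A minimal sketch of such a DAG, assuming one vertex per CTU and one edge from each neighbor (left, upper, upper-left, upper-right) to the CTU that depends on it. The helper names `build_ctu_dag`, `remove_vertex` and `ready` are hypothetical; the slides do not prescribe a data structure.

```python
def build_ctu_dag(rows, cols):
    """Build (V, E) for a rows x cols CTU grid; an edge (a, b) means b depends on a."""
    offsets = [(0, -1), (-1, 0), (-1, -1), (-1, 1)]  # left, upper, upper-left, upper-right
    vertices = {(i, j) for i in range(rows) for j in range(cols)}
    edges = set()
    for i, j in vertices:
        for di, dj in offsets:
            r, c = i + di, j + dj
            if 0 <= r < rows and 0 <= c < cols:
                edges.add(((r, c), (i, j)))
    return vertices, edges


def remove_vertex(vertices, edges, v):
    """Model 'processed': drop the vertex and every edge that touches it."""
    return vertices - {v}, {(a, b) for a, b in edges if a != v and b != v}


def ready(vertices, edges):
    """Vertices with no incoming edge, i.e. CTUs whose dependencies are all done."""
    blocked = {b for _, b in edges}
    return sorted(vertices - blocked)


V, E = build_ctu_dag(2, 3)
first_ready = ready(V, E)
V2, E2 = remove_vertex(V, E, (0, 0))
next_ready = ready(V2, E2)
```

Only the top-left CTU is ready at the start. Once it is removed, the CTU to its right becomes ready, while the CTU below it still waits on its upper-right neighbor.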

14

Proposed Method.B(3/4)

Condition matrix (CM)
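One way to read the CM is as a matrix holding, for each CTU, the number of neighboring CTUs it still has to wait for. The sketch below builds the initial matrix under that interpretation, with (i, j) taken as (row, column); `init_condition_matrix` is a hypothetical name.

```python
def init_condition_matrix(rows, cols):
    """Initial CM: number of in-frame dependencies of each CTU.

    Assumption: each CTU depends on its left, upper, upper-left and
    upper-right neighbors, where they exist.
    """
    offsets = [(0, -1), (-1, 0), (-1, -1), (-1, 1)]
    cm = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            cm[i][j] = sum(
                1
                for di, dj in offsets
                if 0 <= i + di < rows and 0 <= j + dj < cols
            )
    return cm


cm = init_condition_matrix(3, 4)
for row in cm:
    print(row)
```

For a 3x4 grid this prints `[0, 1, 1, 1]` for the first row and `[2, 4, 4, 3]` for each of the other two. The single zero marks the only CTU that can start immediately.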

15

Proposed Method.B(4/4)

16

Proposed Method

A. Data Dependency Analysis

B. DAG for CTUs

C. Highly Parallel Framework

17

Proposed Method.C(1/5)

18

Proposed Method.C(2/5)

Step 1: Initialize DQ and CM. DQ is a waiting queue. CM is designed to record the number of related CTUs for each CTU.

Step 2: When some values in the CM become zero, get the corresponding coordinates and push them into DQ.

19

Proposed Method.C(3/5)

Step 3: Get coordinates from DQ and process the corresponding CTUs in parallel on the many-core platform.

Step 4: Update CM. When the CTU at coordinate (i, j) has been processed, the CM values at coordinates (i+1, j), (i+1, j-1), (i, j+1) and (i+1, j+1) are each decremented by one.

Step 5: Repeat Steps 2 to 4 until every CTU in the frame has been processed.
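The five steps can be sketched as a small scheduler. This is an illustrative reading rather than the authors' implementation: a thread pool stands in for the many-core platform, `process_ctu` is a placeholder for the per-CTU motion estimation, (i, j) is taken as (row, column), and each drained batch of DQ is treated as one parallel wave. All function and constant names are assumptions.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor

DEPENDS_ON = [(0, -1), (-1, 0), (-1, -1), (-1, 1)]  # left, upper, upper-left, upper-right
UNBLOCKS = [(1, 0), (1, -1), (0, 1), (1, 1)]        # offsets updated in Step 4


def schedule_frame(rows, cols, process_ctu):
    """Process all CTUs of one frame in DAG order; return the list of waves."""
    def inside(r, c):
        return 0 <= r < rows and 0 <= c < cols

    # Step 1: initialize CM (dependency counts) and DQ (waiting queue).
    cm = [[sum(inside(i + di, j + dj) for di, dj in DEPENDS_ON)
           for j in range(cols)] for i in range(rows)]
    dq = deque((i, j) for i in range(rows) for j in range(cols) if cm[i][j] == 0)

    waves = []
    with ThreadPoolExecutor() as pool:
        while dq:  # Step 5: repeat until no CTU is left
            # Step 3: take every ready CTU and process the batch in parallel.
            batch = list(dq)
            dq.clear()
            list(pool.map(process_ctu, batch))
            waves.append(batch)
            # Step 4: decrement the CM entries of the dependent CTUs.
            for i, j in batch:
                for di, dj in UNBLOCKS:
                    r, c = i + di, j + dj
                    if inside(r, c):
                        cm[r][c] -= 1
                        # Step 2: an entry that reaches zero is pushed into DQ.
                        if cm[r][c] == 0:
                            dq.append((r, c))
    return waves


waves = schedule_frame(3, 4, lambda ctu: None)
print([len(w) for w in waves])  # [1, 1, 2, 2, 2, 2, 1, 1]
```

Pushing a coordinate only at the moment its CM entry reaches zero ensures that each CTU enters DQ exactly once, so no separate "already processed" flag is needed.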

20

Proposed Method.C(4/5)

Maximum parallelism of CTU

Maximum parallelism of highly parallel framework

Average parallelism of highly parallel framework
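For the CTU level only, maximum and average parallelism can be measured empirically from the dependency pattern (left, upper, upper-left, upper-right). The sketch below does not reproduce the slide's formulas and ignores the additional parallelism of IPUs inside each CTU. It assigns every CTU the earliest wave in which it can run, then counts CTUs per wave. The function names and the 17x30 grid (a 1920x1080 frame with 64x64 CTUs) are illustrative assumptions.

```python
from collections import Counter


def ctu_wave_levels(rows, cols):
    """Earliest wave index of each CTU: one more than its latest dependency."""
    offsets = [(0, -1), (-1, 0), (-1, -1), (-1, 1)]
    level = [[0] * cols for _ in range(rows)]
    # Raster order guarantees every dependency is computed before its dependent.
    for i in range(rows):
        for j in range(cols):
            deps = [level[i + di][j + dj] for di, dj in offsets
                    if 0 <= i + di < rows and 0 <= j + dj < cols]
            level[i][j] = 1 + max(deps) if deps else 0
    return level


def ctu_parallelism(rows, cols):
    """Return (maximum, average) number of CTUs that can run in the same wave."""
    level = ctu_wave_levels(rows, cols)
    per_wave = Counter(v for row in level for v in row)
    return max(per_wave.values()), rows * cols / len(per_wave)


small_max, small_avg = ctu_parallelism(3, 4)
hd_max, hd_avg = ctu_parallelism(17, 30)
```

For the 3x4 grid this gives a maximum of 2 and an average of 1.5. For the 17x30 grid the maximum is 15 CTUs per wave, spread over 62 waves.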

21

Proposed Method.C(5/5)

22

Experimental Results(1/5)

23

Experimental Results(2/5)

24

Experimental Results(3/5)

25

Experimental Results(4/5)

26

Experimental Results(5/5)

27

Conclusion(1/1)

The highly parallel framework provides sufficient parallelism for many-core platforms.

A DAG-based order is used to parallelize CTUs.