Data Compression Conference 2013 Chenggang Yan, Yongdong Zhang, Feng Dai and Liang Li 1.

HIGHLY PARALLEL FRAMEWORK FOR HEVC MOTION ESTIMATION ON MANY-CORE PLATFORM Data Compression Conference 2013 Chenggang Yan, Yongdong Zhang, Feng Dai and Liang Li 1


Page 1:

HIGHLY PARALLEL FRAMEWORK FOR HEVC MOTION ESTIMATION ON MANY-CORE PLATFORM

Data Compression Conference 2013

Chenggang Yan, Yongdong Zhang, Feng Dai and Liang Li

Page 2:

Outline: Introduction, Related Work, Proposed Method, Experimental Results, Conclusion

Page 3:

Introduction(1/2)

HEVC coding tree unit (CTU)

Page 4:

Introduction(2/2)

Local parallel method (LPM): the maximum parallelism of LPM is less than or equal to 8.

Independent PUs (IPUs)

Directed acyclic graph (DAG)

Page 5:

Related Work(1/2)

Local parallel method (LPM) [16]: motion estimation region (MER)

[16] Minhua Zhou, “AHG10: Configurable and CU-group level parallel merge/skip,” JCTVC-H0082, Feb. 2012

Page 6:

Related Work(2/2)

Local parallel method (LPM)


M = 16 or 8

8

Page 7:

Proposed Method A. Data Dependency Analysis

B. DAG for CTUs

C. Highly Parallel Framework

Page 8:

Proposed Method.A(1/3)

Independent PUs (IPUs)

The IPU’s left boundary and the MER’s left boundary do not overlap.

The IPU’s upper boundary and the MER’s upper boundary do not overlap.
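The two boundary conditions above can be sketched as a simple predicate. This is only an illustration under stated assumptions: MERs are taken to be square blocks of side `mer_size` aligned to a grid that starts at the picture origin, a PU counts as independent only when both conditions hold, and the function name and parameters are hypothetical rather than taken from the slides.

```python
def is_independent_pu(pu_x: int, pu_y: int, mer_size: int) -> bool:
    """Return True if a PU's left and upper edges both lie strictly inside its MER.

    pu_x, pu_y : top-left luma sample position of the PU (assumed convention).
    mer_size   : side length M of the square MER (assumed grid-aligned).
    """
    # The PU's left edge coincides with the MER's left edge when x is a multiple of M.
    touches_left = pu_x % mer_size == 0
    # The PU's upper edge coincides with the MER's upper edge when y is a multiple of M.
    touches_top = pu_y % mer_size == 0
    return not touches_left and not touches_top


print(is_independent_pu(8, 8, 16))   # PU strictly inside a 16x16 MER
print(is_independent_pu(16, 8, 16))  # PU on the MER's left boundary
```

PUs that fail the test touch a MER boundary and may therefore depend on data outside the MER.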


Page 9:

Proposed Method.A(2/3)

Page 10:

Proposed Method.A(3/3)

Neighboring CTUs: left, upper, upper-left, upper-right
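As a rough sketch, the four neighboring CTUs can be enumerated for any position in the frame. The coordinate convention (i = CTU row, j = CTU column) is an assumption chosen to agree with the CM update rule given later in the slides; the function name is hypothetical.

```python
def ctu_dependencies(i: int, j: int, rows: int, cols: int) -> list:
    """List the neighboring CTUs that CTU (i, j) depends on.

    Assumed convention: i is the CTU row, j is the CTU column.
    Neighbors that fall outside the frame are dropped.
    """
    candidates = [
        (i, j - 1),      # left
        (i - 1, j),      # upper
        (i - 1, j - 1),  # upper-left
        (i - 1, j + 1),  # upper-right
    ]
    return [(r, c) for r, c in candidates if 0 <= r < rows and 0 <= c < cols]


print(ctu_dependencies(0, 0, 3, 4))  # first CTU has no dependencies
print(ctu_dependencies(1, 1, 3, 4))  # interior CTU has all four
```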

Page 11:

Proposed Method A. Data Dependency Analysis

B. DAG for CTUs

C. Highly Parallel Framework

Page 12:

Proposed Method.B(1/4)

Generate a DAG to capture the dependency relationships of CTUs.

Page 13:

Proposed Method.B(2/4)

A DAG consists of a set of vertices V and a set of edges E. Each data dependency between CTUs corresponds to an edge. When a CTU has been processed, it is removed from the DAG.
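One possible way to represent such a DAG is sketched here. It assumes one vertex per CTU, an edge from each neighboring CTU (left, upper, upper-left, upper-right) to the CTU that depends on it, and (row, column) coordinates; all function names are hypothetical.

```python
def build_ctu_dag(rows: int, cols: int):
    """Build vertices V (one per CTU) and edges E (dependency -> dependent)."""
    vertices = {(i, j) for i in range(rows) for j in range(cols)}
    edges = set()
    for (i, j) in vertices:
        # Assumed dependencies: left, upper, upper-left, upper-right.
        for dep in [(i, j - 1), (i - 1, j), (i - 1, j - 1), (i - 1, j + 1)]:
            if dep in vertices:
                edges.add((dep, (i, j)))
    return vertices, edges


def remove_processed(vertices, edges, ctu):
    """Remove a processed CTU's vertex together with the edges leaving it."""
    return vertices - {ctu}, {(u, v) for (u, v) in edges if u != ctu}


def ready_ctus(vertices, edges):
    """CTUs with no incoming edge have no unresolved dependency."""
    blocked = {v for (_, v) in edges}
    return sorted(vertices - blocked)


V, E = build_ctu_dag(2, 3)
print(len(V), len(E))
print(ready_ctus(V, E))
V, E = remove_processed(V, E, (0, 0))
print(ready_ctus(V, E))
```

Removing a processed vertex is what frees its dependents: a CTU becomes ready once no edge points to it any more.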


Page 14:

Proposed Method.B(3/4)

Condition matrix (CM)
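A minimal sketch of how a condition matrix might be initialized, assuming each entry counts the neighboring CTUs (left, upper, upper-left, upper-right) that exist inside the frame, with entries indexed as CM[row][column]. The function name is hypothetical.

```python
def init_condition_matrix(rows: int, cols: int) -> list:
    """Return CM where CM[i][j] is the number of CTUs that CTU (i, j) waits for."""
    cm = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            # Assumed dependencies: left, upper, upper-left, upper-right.
            deps = [(i, j - 1), (i - 1, j), (i - 1, j - 1), (i - 1, j + 1)]
            cm[i][j] = sum(0 <= r < rows and 0 <= c < cols for r, c in deps)
    return cm


for row in init_condition_matrix(3, 4):
    print(row)
```

Under these assumptions only the top-left CTU starts at zero, so it is the only CTU that can be processed first.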

Page 15:

Proposed Method.B(4/4)

Page 16:

Proposed Method A. Data Dependency Analysis

B. DAG for CTUs

C. Highly Parallel Framework

Page 17:

Proposed Method.C(1/5)

Page 18:

Proposed Method.C(2/5)

Step 1: Initialize DQ and CM. DQ is a waiting queue. CM records, for each CTU, the number of related CTUs it still has to wait for.

Step 2: When some values in CM become zero, get the corresponding coordinates and push them into DQ.

Page 19:

Proposed Method.C(3/5)

Step 3: Get coordinates from DQ and process the corresponding CTUs in parallel on the many-core platform.

Step 4: Update CM. When the CTU at coordinate (i, j) in CM has been processed, the values at coordinates (i+1, j), (i+1, j-1), (i, j+1) and (i+1, j+1) in CM are each decremented by one.

Step 5: Repeat Steps 2 to 4 until every CTU in the frame has been processed.
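Steps 1 to 5 can be simulated with a short sequential sketch. It is not the authors' implementation: it assumes (i, j) means (row, column), that CM initially counts the left, upper, upper-left and upper-right neighbors inside the frame, and that every CTU taken from DQ in one round finishes before the next round starts. The parallel processing of Step 3 is replaced by a plain loop, and the function name is hypothetical.

```python
from collections import deque


def schedule_ctus(rows: int, cols: int) -> list:
    """Simulate Steps 1-5 and return the CTUs handled in each round."""
    # Step 1: initialize the waiting queue DQ and the condition matrix CM.
    dq = deque()
    cm = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            deps = [(i, j - 1), (i - 1, j), (i - 1, j - 1), (i - 1, j + 1)]
            cm[i][j] = sum(0 <= r < rows and 0 <= c < cols for r, c in deps)

    queued = [[False] * cols for _ in range(rows)]
    rounds = []
    done = 0
    while done < rows * cols:  # Step 5: repeat until the frame is finished.
        # Step 2: push the coordinates whose CM value has reached zero into DQ.
        for i in range(rows):
            for j in range(cols):
                if cm[i][j] == 0 and not queued[i][j]:
                    dq.append((i, j))
                    queued[i][j] = True
        # Step 3: take coordinates from DQ (a real encoder would process
        # these CTUs concurrently; here they are only collected).
        batch = []
        while dq:
            batch.append(dq.popleft())
        # Step 4: update CM for the CTUs that depend on each processed CTU.
        for i, j in batch:
            for r, c in [(i + 1, j), (i + 1, j - 1), (i, j + 1), (i + 1, j + 1)]:
                if 0 <= r < rows and 0 <= c < cols:
                    cm[r][c] -= 1
        rounds.append(batch)
        done += len(batch)
    return rounds


for n, batch in enumerate(schedule_ctus(3, 4)):
    print(n, batch)
```

The length of each returned batch is the number of CTUs that could run concurrently in that round under the assumptions above.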

Page 20:

Proposed Method.C(4/5)

Maximum parallelism of CTU

Maximum parallelism of highly parallel framework

Average parallelism of highly parallel framework

Page 21:

Proposed Method.C(5/5)

Page 22:

Experimental Results(1/5)

Page 23:

Experimental Results(2/5)

Page 24:

Experimental Results(3/5)

Page 25:

Experimental Results(4/5)

Page 26:

Experimental Results(5/5)

Page 27:

Conclusion(1/1)

The highly parallel framework provides sufficient parallelism for many-core platforms.

Use the DAG-based order to parallelize CTUs.