COARSE GRAINED RECONFIGURABLE ARCHITECTURE FOR VARIABLE BLOCK SIZE MOTION ESTIMATION

03/26/2012

OUTLINE

• Introduction
• Motivation
• Network-on-Chip (NoC)
• ASIC based approaches
• Coarse grain architectures
• Proposed Architecture
• Results

INTRODUCTION

• Goal: Application specific hybrid coarse grained reconfigurable architecture using NoC
• Purpose: Support Variable Block Size Motion Estimation (VBSME)
• First approach: No ASIC and other coarse grained reconfigurable architectures
• Difference
  • Use of intelligent NoC routers
  • Support full and fast search algorithms

MOTIVATION

H.264

Motion Estimation

Ө(f)=

MOTION ESTIMATION

[Figure labels: Previous Frame, Current Frame, Current 16x16 Block, Motion Vector, Search Window]

Sum of Absolute Difference (SAD)
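The slide names SAD as the block matching cost but gives no formula. A minimal sketch of the usual definition, together with an exhaustive search over a window, might look as follows. The function names `sad` and `full_search`, the list-of-rows frame layout and the `(dy, dx)` displacement convention are assumptions made for illustration, not taken from the slides.

```python
def sad(current, reference, top, left):
    """Sum of absolute differences between `current` and the same-sized
    window of `reference` whose top-left corner is (top, left)."""
    return sum(
        abs(current[r][c] - reference[top + r][left + c])
        for r in range(len(current))
        for c in range(len(current[0]))
    )


def full_search(current, reference, origin, search_range):
    """Try every displacement within +/- search_range of `origin` that keeps
    the candidate block inside `reference`.
    Returns (best_sad, (dy, dx)), or None if no candidate fits."""
    n_rows, n_cols = len(current), len(current[0])
    oy, ox = origin
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            top, left = oy + dy, ox + dx
            if top < 0 or left < 0:
                continue  # candidate starts outside the frame
            if top + n_rows > len(reference) or left + n_cols > len(reference[0]):
                continue  # candidate ends outside the frame
            cost = sad(current, reference, top, left)
            if best is None or cost < best[0]:
                best = (cost, (dy, dx))
    return best
```

The displacement with the smallest SAD is taken as the motion vector of the block; ties are resolved in favour of the first candidate visited, which is an arbitrary choice in this sketch.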

SYSTEM-ON-CHIP (SOC)

• Single chip systems
• Common components
  • Microprocessor
  • Memory
  • Co-processor
  • Other blocks
• Increased processing power and data intensive applications
• Facilitating communication between individual blocks has become a challenge

TECHNOLOGY ADVANCEMENT

DELAY VS. PROCESS TECHNOLOGY

NETWORK-ON-CHIP (NOC)

• Efficient communication via use of transfer protocols
• Need to take into consideration the strict constraints of SoC environment
• Types of communication structure
  • Bus
  • Point-to-point
  • Network

COMMUNICATION STRUCTURES

BUS VS. NETWORK

Bus Pros & Cons | Bus | Network | Network Pros & Cons
Every unit attached adds parasitic capacitance | x | ✓ | Local performance not degraded with scaling
Bus timing is difficult | x | ✓ | Network wires can be pipelined
Bus arbitration can become a bottleneck | x | ✓ | Routing decisions are distributed
Bus testability problematic and slow | x | ✓ | Locally placed BIST is fast and easy
Bandwidth is limited and shared by all | x | ✓ | Bandwidth scales with network size
Bus latency is wire speed once granted | ✓ | x | Network contention may cause latency
Very compatible | ✓ | x | IPs need smart wrappers
Simple to understand | ✓ | x | Relatively complicated

EXAMPLE

EXAMPLE OF NOC

ROUTER ARCHITECTURE

BACKGROUND

• ME
  • General purpose processors, ASIC, FPGA and coarse grain
  • Only FBSME
  • VBSME with redundant hardware
• General purpose processors
  • Can exploit parallelism
  • Limited by the inherent sequential nature and data access via registers

CONTINUED…

• ASIC
  • No support for all block sizes of H.264
  • Support provided at the cost of high area overhead
• Coarse grained
  • Overcome the drawbacks of LUT based FPGAs
  • Elements with coarser granularity
  • Fewer configuration bits
  • Under-utilization of resources

ASIC Approaches

[Figure: classification of ASIC approaches]
• Topology: 1D systolic array, 2D systolic array
• SAD accumulation: Partial Sum, Parallel Sum

Figure annotations:

• Large number of registers
• Store partial SADs
• Area overhead
• High latency

• Mesh based architecture
• Store partial SADs
• Area overhead
• High latency
• No VBSME

• Reference pixels broadcast
• SAD computation for each 4x4 block pipelined
• Each processing element computes pixel difference, accumulates it to the previous partial SAD and sends the computed partial SAD to the next processing element
• Large number of registers

• All pixel differences of a 4x4 block computed in parallel
• Reference pixels are reused
• Direction of data transfer depends on search pattern

OU’S APPROACH

• 16 SAD modules to process 16 4x4 motion vectors
• VBSME processor
  • Chain of adders and comparators to compute larger SADs
• PE array
  • Basic computational element of SAD module
  • Cascade of 4 1D arrays
• 1D array
  • 1D systolic array of 4 PEs
  • Each PE computes a 1 pixel SAD
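The hierarchy above (PE, 1D array of 4 PEs, PE array as a cascade of 4 1D arrays) can be pictured with a purely functional model. The sketch below ignores clocking, delay lines and broadcasting; the function names and the choice to give each 1D array one column of the 4x4 sub-block are assumptions for illustration only.

```python
def pe(current_pixel, search_pixel, partial_sad):
    """One PE: add a single pixel's absolute difference to the incoming partial SAD."""
    return partial_sad + abs(current_pixel - search_pixel)


def array_1d(current_column, search_column, partial_sad=0):
    """Chain of 4 PEs: the partial SAD is passed from one PE to the next."""
    for cur, ref in zip(current_column, search_column):
        partial_sad = pe(cur, ref, partial_sad)
    return partial_sad


def pe_array(current_block, search_block):
    """Cascade of 4 1D arrays: accumulate the 4 columns of a 4x4 sub-block
    (assumed column-per-array mapping)."""
    partial_sad = 0
    for col in range(4):
        cur_col = [current_block[row][col] for row in range(4)]
        ref_col = [search_block[row][col] for row in range(4)]
        partial_sad = array_1d(cur_col, ref_col, partial_sad)
    return partial_sad
```

In this model one call to `pe_array` yields the SAD of one 4x4 sub-block against one candidate; sixteen such results per candidate would feed the adder and comparator chain described on the slide.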

[Figure: SAD Modules. Module 0 to Module 15; module i takes current_block_data_i and search_block_data_i and produces SAD_i and MV_i. Control signals: strip_sel, read_addr_A, read_addr_B, write_addr.]

[Figure: PE Array. MUX for SAD; 1D Array 0 to 1D Array 3; inputs block_strip_A, block_strip_B and current_block_data_i through delay elements (D); outputs SAD_i and MV_i. Widths shown: 4 bits, 1 bit, 32 bits.]

[Figure: 1D Array. Four PEs and an ACCM connected through delay elements (D); 32 bit inputs.]

PUTTING IT TOGETHER

• Clock cycle
  • Columns of current 4x4 sub-block scheduled using a delay line
  • Two sets of search block columns broadcast
  • 4 block matching operations executed concurrently per SAD module
• 4x4 SADs -> 4x4 motion vectors
  • Chain of adders and comparators
• 4x4 SADs -> 4x8 SADs -> … 16x16 SADs
  • Chain of adders and comparators
• Drawbacks
  • No reuse of search data between modules
  • Resource wastage
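The step from 4x4 SADs to the larger block sizes is plain addition: the sixteen 4x4 SADs of one macroblock, for one candidate position, are summed in groups to give the SADs of the seven H.264 partition sizes (41 values in total). The sketch below computes each sum directly rather than reusing intermediate sums as an adder chain would; `PARTITIONS` and `merge_sads` are illustrative names, and partition names are read as width x height.

```python
# Partition name -> (width, height) in units of 4x4 sub-blocks.
PARTITIONS = {
    "4x4": (1, 1),
    "8x4": (2, 1),
    "4x8": (1, 2),
    "8x8": (2, 2),
    "16x8": (4, 2),
    "8x16": (2, 4),
    "16x16": (4, 4),
}


def merge_sads(sad4x4):
    """sad4x4: 4 rows x 4 columns of 4x4 SADs for one macroblock and one candidate.
    Returns {partition name: list of SADs in raster order}."""
    merged = {}
    for name, (w, h) in PARTITIONS.items():
        merged[name] = [
            sum(
                sad4x4[r][c]
                for r in range(top, top + h)
                for c in range(left, left + w)
            )
            for top in range(0, 4, h)
            for left in range(0, 4, w)
        ]
    return merged
```

A comparator stage would then keep, for each of the 41 entries, the smallest value seen over all candidates along with its displacement; that part is omitted here.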

ALTERNATIVE SOLUTION: COARSE GRAIN ARCHITECTURES

Architecture | Performance (clock cycles)*
ChESS | (M x 0.8M)/256 x 17 x 17
MATRIX | (M x 0.8M)/256 x 17 x 17
RaPiD | 272 + 32M + 14.45M^2

* Frame Size: M x 0.8M

• Resource utilization
• Generic interconnect
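To get a feel for the cycle counts listed for ChESS, MATRIX and RaPiD, the expressions can be evaluated for a concrete frame width M. The sketch below reads the RaPiD entry as 272 + 32M + 14.45M^2 and simply transcribes the expressions; the function names are hypothetical, and the sample width is an arbitrary choice, not a figure from the slides.

```python
def cycles_chess_or_matrix(m):
    """(M x 0.8M) / 256 x 17 x 17, as listed for ChESS and MATRIX."""
    return (m * 0.8 * m) / 256 * 17 * 17


def cycles_rapid(m):
    """272 + 32M + 14.45M^2, as listed for RaPiD."""
    return 272 + 32 * m + 14.45 * m ** 2


if __name__ == "__main__":
    width = 320  # arbitrary sample frame width (frame is M x 0.8M)
    print("ChESS / MATRIX:", cycles_chess_or_matrix(width))
    print("RaPiD:", cycles_rapid(width))
```

One plausible reading, not stated on the slide, is that M x 0.8M / 256 counts 16x16 macroblocks in the frame and 17 x 17 counts candidate positions per macroblock.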

PROPOSED ARCHITECTURE

• 2D architecture
  • 16 CPEs
  • 4 PE2s
  • 1 PE3
  • Main Memory
  • Memory Interface
• CPE (Configurable Processing Element)
  • PE1
  • NoC router
  • Network Interface
  • Current and reference block from main memory

[Figure: 4x4 array of CPEs, CPE(1,1) to CPE(4,4), each with current block data c_d_(x,y) (32 bits) and reference block data r_d_(x,y) (32 bits); PE 2(1) to PE 2(4); PE 3; Main Memory and Memory Interface (MI); control signals data_load_control (16 bits) and reference_block_id (5 bits). Other widths shown: 32 bits, 14 bits, 12 bits.]

[Figure: processing element datapath. Sixteen 8 bit subtractors (1 to 16), each fed by a CPR and an RPR; 10 bit adders and 12 bit adder; COMP and REG. Inputs r_d and c_d; ports To/From NI, To/From East and To/From South; output 4x4 mv.]

NETWORK INTERFACE

[Figure: Network Interface. Control unit, packetization unit and depacketization unit; signals reference_block_id to MI and data_load_control to MI.]

NOC ROUTER

[Figure: NoC router. Ports PE 1, East, West, North and South on both the input and output side; input controller, ring buffer (slots 0 to 5, first index and last index), header decoder and output controller; request and ack signals.]

• Input controller: receives packets from NI/adjacent router
• Ring buffer: stores packets
• Header decoder
  • XY routing protocol
  • Extracts direction of data transfer from header packet
  • Updates number of hops
• Output controller: sends packets to NI or adjacent router
• request, ack: input/output control signals
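The router slide names XY routing but does not spell it out. In the usual form of XY routing, a packet first travels along the X dimension until its column matches the destination, then along Y. The sketch below assumes (x, y) coordinates with x growing to the East and y growing to the South; that orientation, the port name "Local" and the function names are assumptions, not details from the slides.

```python
# Coordinate change for each output port (assumed orientation).
STEP = {"East": (1, 0), "West": (-1, 0), "South": (0, 1), "North": (0, -1)}


def xy_route(current, destination):
    """Output port for a packet at router `current` heading to `destination`.
    The X offset is resolved first, then the Y offset."""
    cx, cy = current
    dx, dy = destination
    if dx > cx:
        return "East"
    if dx < cx:
        return "West"
    if dy > cy:
        return "South"
    if dy < cy:
        return "North"
    return "Local"  # packet has arrived; hand it to the attached PE


def trace(source, destination):
    """Follow XY routing hop by hop and return the list of ports taken."""
    path, node = [], source
    while True:
        port = xy_route(node, destination)
        if port == "Local":
            return path
        path.append(port)
        node = (node[0] + STEP[port][0], node[1] + STEP[port][1])
```

The length of the list returned by `trace` is the hop count, which corresponds to the value a header decoder would update at each router.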

[Figure: Router 1 and Router 2, each with an input controller and an output controller; signals req (1 bit), ack (1 bit) and packet (32 bit); checks "Busy?" and "Buffer space available?"]

Step 1: Send a message from Router 1 to Router 2
Step 2: Send a 1 bit request signal to Router 2
Step 3: Router 2 first checks if it is busy. If not, it checks for available buffer space
Step 4: Send ack if space available
Step 5: Send the packet
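The request/ack steps above can be mimicked in software. The sketch below is a behavioural model only: the `Router` class, its `busy` flag, the default of six buffer slots and the `send` helper are all illustrative assumptions, and no timing or retry policy is modelled.

```python
from collections import deque


class Router:
    """Receiving side of the handshake (assumed structure)."""

    def __init__(self, name, buffer_slots=6):
        self.name = name
        self.buffer = deque()
        self.buffer_slots = buffer_slots
        self.busy = False

    def request(self):
        """Steps 3 and 4: answer a request with an ack (True) or no ack (False)."""
        if self.busy:
            return False
        return len(self.buffer) < self.buffer_slots

    def receive(self, packet):
        """Store an accepted packet."""
        self.buffer.append(packet)


def send(receiver, packet):
    """Steps 2 to 5 seen from the sender. Returns True if the packet was delivered."""
    ack = receiver.request()  # step 2: raise req; steps 3 and 4: wait for ack
    if not ack:
        return False  # no ack: the caller may try again later
    receiver.receive(packet & 0xFFFFFFFF)  # step 5: transfer one 32 bit packet
    return True
```

A sender that gets `False` back keeps the packet and can repeat the request later; how the actual hardware handles that case is not described on the slide.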

PE2 AND PE3

• Adders
• Muxes
• De-muxes
• Comparators
• Registers

FAST SEARCH ALGORITHM

Diamond Search

• 9 candidate search points
• Numbers represent order of processing the reference frames
• Directed edges labeled with data transmission equations derived based on data dependencies
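For readers unfamiliar with diamond search, the following sketch shows the textbook form of the algorithm: a 9-point large diamond is re-centred on its best point until the centre itself is best, then a 5-point small diamond refines the result. It is a software illustration under stated assumptions, not the schedule used by the proposed hardware; `cost` is a hypothetical callable returning the SAD for a displacement `(dy, dx)`, and search-window bounds are not enforced.

```python
# Large diamond: centre plus 8 surrounding points (9 candidates).
LDSP = [(0, 0), (-2, 0), (-1, -1), (-1, 1), (0, -2), (0, 2), (1, -1), (1, 1), (2, 0)]
# Small diamond: centre plus its 4 nearest neighbours.
SDSP = [(0, 0), (-1, 0), (0, -1), (0, 1), (1, 0)]


def best_of(cost, center, pattern):
    """Evaluate `pattern` around `center`; return (cost, point) of the cheapest
    candidate. The centre is listed first, so it wins ties."""
    candidates = [(center[0] + oy, center[1] + ox) for oy, ox in pattern]
    return min(((cost(dy, dx), (dy, dx)) for dy, dx in candidates),
               key=lambda item: item[0])


def diamond_search(cost, max_steps=32):
    """Minimise cost(dy, dx) starting from (0, 0). Returns (best_cost, (dy, dx))."""
    center = (0, 0)
    for _ in range(max_steps):
        _, point = best_of(cost, center, LDSP)
        if point == center:
            break  # centre is the best large-diamond point
        center = point
    return best_of(cost, center, SDSP)
```

Because only a few points are tested per step, the search can settle on a local minimum when the cost surface is not well behaved, which is the usual trade-off against full search.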

EXAMPLE

[Figure labels: Frame, Macro-block, SAD]

CONTINUED…

DATA TRANSFER

Data Transfer between PE1(1,1) and PE1(1,3)

[Figure legend: Individual Points, Intersecting Points]

DATA LOAD SCHEDULE

OTHER FAST SEARCH ALGORITHMS

• Hexagon
• Big Hexagon
• Spiral

FULL SEARCH

CONTINUED…

RESULTS

CONTINUED…