A Power Efficient Architecture for 2-D Discrete Wavelet Transform


Rahul Jain, CoWare India

Preeti Ranjan Panda, IIT-Delhi


10 August 2006 10th IEEE VLSI Design And Test Symposium, 2006


Agenda

• Memory Power Optimization
• Existing Z-Scan based Schemes
• Low Power Z-Scan (Proposed Architecture)
• Results
• Conclusion


Memory Power Optimization

• Importance of Optimizing Memory System Energy
  - Many emerging applications like JPEG2000 are data intensive
  - The memory system can contribute up to 90% of the energy
• Concurrently Optimizing Memory Architecture and Accesses
  - Algorithm Level
    - Reduce memory requirement
    - Improve regularity of accesses
  - Build optimized memory architecture
    - Memory Partitioning
    - Custom Circuits


Z-Scan based Schemes [Chiu-SIPS’03]

• Suspending a DWT line computation
  - Store 4 intermediate values
• Z-Scan
  - Column processing starts early
  - On-Chip Buffer Required = 4*M, where M = image tile height
• Optimal Z-Scan
  - EBCOT Code-Block size (CW*CH) considered
  - On-Chip Buffer Required = 4*M + 4*2*CW
  - Usually CW = CH = 64 (values used in the experiments)

[Figure: scan pattern, with labelled dimensions 2*CW and 2*CH]


Low-Power Z-Scan (1)

• Generalize the Z-Scan
  - Compute r elements in a row
  - For Z-Scan, r = 2
  - For Optimal Z-Scan, r = 2*CW
  - On-Chip Buffer Required = 4*M + 4*r

[Figure: generalized scan pattern, with labelled dimensions r and 2*CH]


Low-Power Z-Scan (2)

• r will be a sub-multiple of 2*CW (2*CW is an integral multiple of r)
  - This takes the Code Block size into account
• 2 separate buffers used
  - Row Buffer (RB) = 4*M
  - Column Buffer (CB) = 4*r
• How to decide the value of r?
  - Size of CB ∝ r
  - RB sleep time ∝ r

[Figure labels: CB: r locations; RB in Low Power Mode; RB access]
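The buffer sizing on the two Low-Power Z-Scan slides can be tabulated with a few lines of Python. This is only a sketch: the function names are hypothetical, sizes are counted in stored intermediate values (the slides do not state a word width), and the candidate values of r are taken to be the integer divisors of 2*CW, which is one reading of the constraint on r.

```python
def buffer_sizes(m, r):
    """Return (row_buffer, column_buffer, total) in stored values.

    Follows the slide formulas RB = 4*M and CB = 4*r.
    """
    row_buffer = 4 * m
    column_buffer = 4 * r
    return row_buffer, column_buffer, row_buffer + column_buffer


def candidate_r_values(cw):
    """Divisors of 2*CW that are at least 2 (an assumed reading of the slide)."""
    limit = 2 * cw
    return [r for r in range(2, limit + 1) if limit % r == 0]


if __name__ == "__main__":
    M, CW = 512, 64  # CW = 64 is the value used in the experiments
    for r in candidate_r_values(CW):
        rb, cb, total = buffer_sizes(M, r)
        print(f"r={r:3d}  RB={rb}  CB={cb}  total={total}")
```

The row buffer size does not depend on r, so the choice of r trades column buffer size against how long the row buffer can stay asleep.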


Memory Power Analysis (1)

• Assume that each element is computed in unit time (so Energy and Power can be used interchangeably)
• For a memory of size 2^n, let
  - Pa(2^n): memory access power
  - Ps(2^n): sleep mode / data retention mode power
  - Pw(2^n): wakeup power for each state transition from sleep mode to active mode
• Let Ps(2^n) = s * Pa(2^n) and Pw(2^n) = w * Pa(2^n)
  - s = 0.1, w = 0.33 (assumed for the experiments)
• Buffer Accesses
  - Read at Resumption
  - Write at Suspension


Memory Power Analysis (2)

• Row Buffer Power
  - 2 accesses per r elements
  - RB in sleep mode for r-2 element computations
  - Wakeup RB once per row
  - Power per r-element computation:
    Prow_buffer(r, M) = 2*Pa(M) + (r-2)*Ps(M) + Pw(M)

[Figure labels: RB in Low Power Mode; Row Computation Suspends; Row Computation Resumes; Wakeup]


Memory Power Analysis (3)

• Column Buffer Power
  - 1 access per element
  - Power consumption per element computation:
    Pcol_buffer(r) = Pa(r)
• Power per 2-D DWT element computation:
    Prow_buffer(r, M)/r + Pcol_buffer(r)

[Figure labels: Col Computation Suspends; Col Computation Resumes]
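The unbanked power model from the three Memory Power Analysis slides can be written out directly. The sketch below transcribes the slide formulas with s = 0.1 and w = 0.33; the access-power function `placeholder_access_power` is purely hypothetical (the slides take Pa from an empirical SRAM energy model that is not reproduced here), so any numbers it yields are illustrative only.

```python
import math

S = 0.1   # Ps = s * Pa, value assumed in the slides
W = 0.33  # Pw = w * Pa, value assumed in the slides


def placeholder_access_power(size):
    """HYPOTHETICAL Pa(size) in arbitrary units; not the model used in the slides."""
    return 1.0 + math.sqrt(size)


def row_buffer_power(r, m, pa=placeholder_access_power):
    """Prow_buffer(r, M) = 2*Pa(M) + (r-2)*Ps(M) + Pw(M), per r elements."""
    p = pa(m)
    return 2 * p + (r - 2) * S * p + W * p


def col_buffer_power(r, pa=placeholder_access_power):
    """Pcol_buffer(r) = Pa(r), per element."""
    return pa(r)


def power_per_element(r, m, pa=placeholder_access_power):
    """Prow_buffer(r, M)/r + Pcol_buffer(r)."""
    return row_buffer_power(r, m, pa) / r + col_buffer_power(r, pa)


def best_r(m, candidates, pa=placeholder_access_power):
    """Return the candidate r with the lowest modelled power per element."""
    return min(candidates, key=lambda r: power_per_element(r, m, pa))
```

Passing a different `pa` function (for example one fitted to a real memory library) is enough to rerun the sweep for another technology.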


Variation of Power with r

[Plot: Energy (J), axis range 0.00E+00 to 6.00E-10, versus value of r (2, 4, 8, 16, 32, 64, 128), with one curve each for M = 32, 64, 128, 256 and 512; the points r=16 and r=32 are annotated]


Power Implications of Banking (1)

• Banked Buffer
  - Increases the average idleness of each buffer
  - Lower access power
  - Predictable state changes, no timing overheads
• Let there be b RB banks and c CB banks
• Average RB power per element:
    Prow = [Power of bank in use * M/b + Sleep Power * (M - M/b)] / M
         = [{Prow_buffer(r, M/b) / r} * M/b + Ps(M/b) * (M - M/b)] / M
• Each bank is woken up once for every M*r elements
  - Additional Row Buffer Wakeups per Element = b/(M*r)


Power Implications of Banking (2)

• Average column-buffer power per element:
    Pcol = [{Pcol_buffer(r/c)} * r/c + Ps(r/c) * (r - r/c)] / r
• Number of Column Buffer Wakeups per Element = c/r
• Additional Wakeup Power:
    Pwakeups = [Pw(M/b) * b/(M*r)] + [Pw(r/c) * c/r]
• MUX power considered
• Total Power per Element:
    Prow + Pcol + Pwakeups + Pmux
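The banked formulas from the two banking slides can be combined into one function. The sketch below is a literal transcription under stated assumptions: `pa` is a caller-supplied access-power function (the slides do not give it in closed form), the wakeup rate b/M*r is read as b/(M*r), and `p_mux` defaults to zero only because no expression for the MUX power appears on the slides.

```python
S = 0.1   # Ps = s * Pa, value assumed in the slides
W = 0.33  # Pw = w * Pa, value assumed in the slides


def banked_power_per_element(r, m, b, c, pa, p_mux=0.0):
    """Prow + Pcol + Pwakeups + Pmux, transcribed from the banking slides.

    pa(size) must return the access power of a memory of that size.
    """
    rb_bank = m / b  # size of one row-buffer bank
    cb_bank = r / c  # size of one column-buffer bank

    def ps(size):
        return S * pa(size)

    def pw(size):
        return W * pa(size)

    # Prow_buffer(r, M/b) = 2*Pa + (r-2)*Ps + Pw for the bank in use
    prow_buffer = 2 * pa(rb_bank) + (r - 2) * ps(rb_bank) + pw(rb_bank)
    p_row = ((prow_buffer / r) * rb_bank + ps(rb_bank) * (m - rb_bank)) / m

    # Pcol_buffer(r/c) = Pa(r/c) for the bank in use
    p_col = (pa(cb_bank) * cb_bank + ps(cb_bank) * (r - cb_bank)) / r

    # Extra wakeups: b/(M*r) per element for RB banks, c/r for CB banks
    p_wakeups = pw(rb_bank) * b / (m * r) + pw(cb_bank) * c / r

    return p_row + p_col + p_wakeups + p_mux
```

Sweeping this function over r, b and c with a technology-specific `pa` is one way to search for a minimum-power configuration of the kind reported in the results.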


r vs Power (Banked Case, M=512)

Min Power with r=64, c=4, b=8


Energy Consumption Comparison

M      Z-scan       Optimal Z-scan   Low-Power Z-scan   r    c   b   % imp
       (10^-11 J)   (10^-11 J)       (10^-11 J)
32     23.4         29.1             8.08               32   4   4   72.2
64     25.5         29.3             8.13               64   4   4   72.3
128    29.9         29.7             8.18               64   4   8   72.5
256    38.5         30.6             8.29               64   4   8   72.9
512    55.8         32.3             8.49               64   4   8   73.7
1024   90.3         35.8             8.89               64   4   8   75.2

Up to 90% and 75% improvement over Z-Scan and Optimal Z-Scan respectively
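The improvement figures can be rechecked from the table entries. In the sketch below the energies are copied from the comparison table (in units of 10^-11 J) and the helper name is hypothetical; the "% imp" column turns out to be consistent with taking Optimal Z-scan as the baseline.

```python
# (M, Z-scan, Optimal Z-scan, Low-Power Z-scan), energies in units of 1e-11 J,
# copied from the comparison table.
ROWS = [
    (32, 23.4, 29.1, 8.08),
    (64, 25.5, 29.3, 8.13),
    (128, 29.9, 29.7, 8.18),
    (256, 38.5, 30.6, 8.29),
    (512, 55.8, 32.3, 8.49),
    (1024, 90.3, 35.8, 8.89),
]


def improvement_percent(baseline, proposed):
    """Relative energy saving of `proposed` over `baseline`, in percent."""
    return 100.0 * (baseline - proposed) / baseline


if __name__ == "__main__":
    for m, z_scan, optimal, low_power in ROWS:
        vs_z = improvement_percent(z_scan, low_power)
        vs_opt = improvement_percent(optimal, low_power)
        print(f"M={m:5d}  vs Z-scan: {vs_z:.1f}%  vs Optimal Z-scan: {vs_opt:.1f}%")
```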


Energy Modelling

• Sequential Access Memory [Moon-CICC’02]
  - Configured as a circular buffer
  - Address sequencing logic and decoders are replaced with a row sequencer for low power and high speed
  - Banked implementation used for big memories
• Energy Modelling [Coumeri-TVLSI’00]
  - Empirical equations for modelling the energy of on-chip SRAM
  - Model parameters are Size, Bit Width, Access Mode
  - Individual equations for different memory components
  - To model the SAM, the Row Decoder, Column Decoder and Buffers are not considered


Conclusion

• A methodology to arrive at a low-power DWT architecture was proposed
• Memory architecture and access pattern were co-optimized
• Up to 90% energy saving achieved
• The derived architecture depends on the target memory technology
  - Would lead to different architectures for ASIC and FPGA implementations


References:

• [Chiu-SIPS’03]: Mu-Yu Chiu et al. (2003). Optimal data transfer and buffering schemes for JPEG2000 encoder. IEEE Workshop on SIPS, Aug. 2003, pp. 177-182.
• [Moon-CICC’02]: Joong-Seok Moon et al. (2002). Low-power sequential access memory design. Custom Integrated Circuits Conference, 2002, pp. 111-114.
• [Coumeri-TVLSI’00]: Coumeri, S. L. et al. (2000). Memory modelling for System Synthesis. IEEE Trans. VLSI Systems, June 2000, pp. 327-334.


Thank You

Questions!


Backup Slides


Discrete Wavelet Transform

• 2D wavelet transform:
  - 1st: 1D wavelet transform to all rows
  - 2nd: 1D wavelet transform to all columns
• Each Row/Column can be computed independently
• Store 4 values at line computation suspension

[Figure: data-flow diagram with nodes X(i), Y(2i), Y(2i+1), Z(2i), Z(2i+1) over sample indices 0 to 8. Colored arrows show multiplication by constants a, b, c, d defined in the JPEG2000 standard.]
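The diagram on this slide did not survive extraction. Assuming it depicts the usual lifting form of the JPEG2000 irreversible 9/7 filter, a single line can be transformed as in the sketch below. The mapping of the slide's a, b, c, d to the four lifting constants, the function names, the symmetric boundary handling and the omission of the final scaling step are all assumptions, not details taken from the slides.

```python
# Lifting constants of the JPEG2000 irreversible 9/7 filter.
# Mapping them to the slide's a, b, c, d is an assumption.
ALPHA = -1.586134342
BETA = -0.052980118
GAMMA = 0.882911075
DELTA = 0.443506852


def _lift(x, start, coeff):
    """Add coeff * (left + right neighbour) to every second sample from `start`.

    Neighbours outside the signal are mirrored (whole-sample symmetric extension).
    """
    n = len(x)
    for i in range(start, n, 2):
        left = x[i - 1] if i - 1 >= 0 else x[i + 1]
        right = x[i + 1] if i + 1 < n else x[i - 1]
        x[i] += coeff * (left + right)


def lifting_97_1d(signal):
    """Run the four lifting steps on one line of at least 2 samples.

    Returns (low, high) sub-bands without the final scaling step.
    """
    x = [float(v) for v in signal]
    _lift(x, 1, ALPHA)  # step 1: updates odd samples
    _lift(x, 0, BETA)   # step 2: updates even samples
    _lift(x, 1, GAMMA)  # step 3: updates odd samples
    _lift(x, 0, DELTA)  # step 4: updates even samples
    return x[0::2], x[1::2]
```

Each of the four steps depends on a value produced one sample earlier, which is presumably why four intermediate values have to be saved when a line computation is suspended.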


Buffer Structure

• The buffers are full at all times
• They are accessed like a circular FIFO
• General memory Row Decoder not required
  - Use a counter
  - Use a shift register loaded with a 1 initially
• Every write signal
  - Increments the counter
  - Shifts the register
• Store all 4 intermediate values in one column
  - No need for the Column Decoder
• This would be similar to Sequential Access Memory (SAM) [Moon-CICC’02]
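The decoder-free addressing described on this slide can be modelled behaviourally. The class below is a hypothetical sketch, not the circuit from the slides or from [Moon-CICC’02]: a one-hot shift register selects the active row, a counter tracks the same position, and each row holds the 4 intermediate values of one suspended line.

```python
class SequentialAccessBuffer:
    """Circular buffer whose row select is a one-hot shift register.

    Hypothetical behavioural model: each row stores the 4 intermediate
    values of one suspended line, so no column decoder is modelled.
    """

    def __init__(self, depth, values_per_row=4):
        self.rows = [[0] * values_per_row for _ in range(depth)]
        self.select = [1] + [0] * (depth - 1)  # shift register loaded with a 1
        self.counter = 0                       # equivalent counter-based pointer

    def read(self):
        """Read the currently selected row (done when a line resumes)."""
        return list(self.rows[self.select.index(1)])

    def write(self, values):
        """Overwrite the selected row (done when a line suspends), then advance."""
        self.rows[self.select.index(1)] = list(values)
        self.select = [self.select[-1]] + self.select[:-1]  # shift by one, wrapping
        self.counter = (self.counter + 1) % len(self.rows)
```

Because the pointer only ever moves to the next row, the order of accesses is fixed in advance, which matches the slide's point that a general row decoder is unnecessary.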