Matrix Factorizations for Parallel Integer Transforms


Page 1: Matrix Factorizations for Parallel Integer Transforms

Matrix Factorizations for Parallel Integer Transforms

Yiyuan She (1,2,3), Pengwei Hao (1,2), Yakup Paker (2)

(1) Center for Information Science, Peking University

(2) Queen Mary, University of London

(3) Department of Statistics, Stanford University

Page 2: Matrix Factorizations for Parallel Integer Transforms

Contents

1. Introduction

2. Point & block factorizations

3. Parallel ERM factorization (PERM)

4. Parallel computational complexity

5. Matrix blocking strategy

6. Conclusions

Page 3: Matrix Factorizations for Parallel Integer Transforms

Why a reversible integer transform?

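[Figure: block diagram of the encoding and decoding chains. Encoding: a B/W image, a color image (Color Space T), or a multi-component image (MCT) passes through the Spatial Transform and is then stored or communicated. Decoding: the Inverse Spatial T is followed by the Inverse Color T or the IMCT to recover the B/W, color, or multi-component image. Each stage is marked "Lossless?".]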

Page 4: Matrix Factorizations for Parallel Integer Transforms

How to implement?

• Wavelet construction

– S transform (Blume & Fand, 1989)

– TS transform (Zandi et al., 1995)

– S+P transform (Said & Pearlman, 1996)

• Ladder structure (Bruekers & van den Enden, 1992)

• Lifting scheme (2D, Sweldens, 1996)

• Approximated color transform (Gormish et al, 1997)

• General wavelet transform (2D, Daubechies et al, 1998)

Page 5: Matrix Factorizations for Parallel Integer Transforms

Matrix factorizations

P. Hao and Q. Shi, Invertible linear transforms implemented by integer mapping, Science in China, Series E (in Chinese), 2000, 30, pp. 132-141.

P. Hao and Q. Shi, Matrix factorizations for reversible integer mapping, IEEE Trans. Signal Processing, 2001, 49, pp. 2314-2324.

P. Hao and Q. Shi, Proposal of reversible integer implementation for multiple component transforms, ISO/IEC JTC1/SC29/WG1N1720, Arles, France, 2000.

Y. She and P. Hao, A block TERM factorization of nonsingular uniform block matrices, Science in China, Series E (in Chinese), 2004, 34(2).

Page 6: Matrix Factorizations for Parallel Integer Transforms

Can we make it more efficient?

• Fewer factor matrices

• Less rounding error

• Integer computation

• Parallel computing

How to increase the degree of parallelism?

Page 7: Matrix Factorizations for Parallel Integer Transforms

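Elementary reversible structure

[Figure: forward structure, in which x is multiplied by the integer factor j and the rounded value [b] is added to give y; inverse structure, in which [b] is subtracted from y and the result is multiplied by 1/j to give x.]

• Integer factor: j

• Flexible rounding: round(), floor(), ceil(), …

• Generalized lifting scheme: for j = 1, it is the same as the ladder structure and the lifting scheme

• Implementation: $y = jx + [b]$ and $x = (1/j)(y - [b])$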
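The forward and inverse steps above can be sketched in a few lines of Python. This is a minimal sketch, assuming the integer factor j is restricted to +1 or -1 (so that 1/j is again an integer) and that b is any real value the decoder can recompute from data it already has. The function names `forward` and `inverse` are illustrative and do not come from the slides.

```python
import math


def forward(x: int, b: float, j: int = 1, rnd=round) -> int:
    """Forward step y = j*x + [b]; [.] is the chosen rounding operator."""
    assert j in (1, -1), "sketch assumes a unit integer factor"
    return j * x + rnd(b)


def inverse(y: int, b: float, j: int = 1, rnd=round) -> int:
    """Inverse step x = (1/j) * (y - [b]); for j = +/-1, 1/j equals j."""
    assert j in (1, -1), "sketch assumes a unit integer factor"
    return j * (y - rnd(b))


if __name__ == "__main__":
    # Any rounding operator works, as long as both directions use the same one.
    for rnd in (round, math.floor, math.ceil):
        for j in (1, -1):
            for x in range(-5, 6):
                for b in (2.6, -1.5, 0.49):
                    assert inverse(forward(x, b, j, rnd), b, j, rnd) == x
    print("all round trips are exact")
```

Reversibility does not depend on how [b] is rounded, only on the decoder reproducing the same rounded value before subtracting it.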

Page 8: Matrix Factorizations for Parallel Integer Transforms

Elementary reversible matrix (ERM)

• Diagonal elements: integer factors

• Triangular ERM (TERM)

– Upper TERM

– Lower TERM

• Single-row ERM (SERM)

– Off-diagonal nonzeros in only one row

$S_m = J + e_m s_m^T$
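As an illustration of how a SERM acts on an integer vector, the following Python sketch applies $S_m = I + e_m s_m^T$ with rounding and then undoes it exactly. It assumes the diagonal matrix J is the identity and uses 0-based row indices. The names `serm_forward` and `serm_inverse` are hypothetical.

```python
def serm_forward(x, m, s, rnd=round):
    """Apply S_m = I + e_m s^T to an integer vector x with rounding.

    Only component m changes; s[m] is treated as 0, so the update
    depends solely on the other (unchanged) components.
    """
    b = sum(s[i] * x[i] for i in range(len(x)) if i != m)
    y = list(x)
    y[m] = x[m] + rnd(b)
    return y


def serm_inverse(y, m, s, rnd=round):
    """Undo serm_forward: recompute the same rounded sum and subtract it."""
    b = sum(s[i] * y[i] for i in range(len(y)) if i != m)
    x = list(y)
    x[m] = y[m] - rnd(b)
    return x


if __name__ == "__main__":
    x = [3, -7, 12, 5]
    s = [0.25, 0.0, -1.3, 0.8]
    y = serm_forward(x, 1, s)
    print(y, serm_inverse(y, 1, s))
```

Because the off-diagonal entries sit in a single row, the N-1 products feeding the sum are independent of one another, which is what the parallel schemes later in the talk exploit.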

Page 9: Matrix Factorizations for Parallel Integer Transforms

Point factorizations (PLUS)

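$$A = P L D_R U S_0 = P D_R S_N S_{N-1} \cdots S_1 S_0, \quad \text{if } \det(P^T A) = \det(D_R) \neq 0$$

$$S_0 = I + e_N s_0^T = I + e_N \cdot [s_1, s_2, \ldots, s_{N-1}, 0]$$

$$D_R = \mathrm{Diag}(1, 1, \ldots, 1, \det(P^T A))$$

$$S_m = I + e_m s_m^T$$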
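A small worked instance may help: for N = 2, a plane rotation has determinant 1 and factors into three unit SERMs, $S_2 S_1 S_0$, with no permutation or diagonal factor needed. The Python sketch below uses the standard three-shear decomposition of a rotation (it is not taken from the slides) and assumes $\sin\theta \neq 0$. Function names are illustrative.

```python
import math


def _coeffs(theta):
    """Off-diagonal entries: a for S_0 and S_2 (row 2), b for S_1 (row 1)."""
    c, s = math.cos(theta), math.sin(theta)
    if s == 0:
        raise ValueError("sketch assumes sin(theta) != 0")
    return (1 - c) / s, -s


def rotate_int(x0, x1, theta):
    """Integer-to-integer approximation of rotation by theta via three SERMs."""
    a, b = _coeffs(theta)
    x1 += round(a * x0)  # S_0: modifies row 2 only
    x0 += round(b * x1)  # S_1: modifies row 1 only
    x1 += round(a * x0)  # S_2: modifies row 2 only
    return x0, x1


def rotate_int_inverse(y0, y1, theta):
    """Exact inverse: undo the three steps in reverse order."""
    a, b = _coeffs(theta)
    y1 -= round(a * y0)
    y0 -= round(b * y1)
    y1 -= round(a * y0)
    return y0, y1


if __name__ == "__main__":
    theta = math.pi / 5
    y = rotate_int(10, 3, theta)
    print(y, rotate_int_inverse(*y, theta))
```

The output is only an approximation of the real-valued rotation, since each step rounds, but the mapping between integer inputs and integer outputs is exactly invertible.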

Page 10: Matrix Factorizations for Parallel Integer Transforms

Block factorizations (BLUS)

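$$A = P L D_R U S_0 = P D_R S_N S_{N-1} \cdots S_1 S_0, \quad \text{if } \mathrm{DET}(P^T A) = \mathrm{DET}(D_R) \text{ exists}$$

$$S_0 = I + e_N s_0^T = I + e_N \cdot [s_1, s_2, \ldots, s_{N-1}, 0]$$

$$D_R = \mathrm{Diag}(I, I, \ldots, I, \mathrm{DET}(P^T A))$$

$$S_m = I + e_m s_m^T$$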

Page 11: Matrix Factorizations for Parallel Integer Transforms

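Parallel factorizations (PERM)

$$N_1 = m^{(0)} \xrightarrow{n^{(1)}} m^{(1)} \xrightarrow{n^{(2)}} m^{(2)} \cdots m^{(K-1)} \xrightarrow{n^{(K)}} m^{(K)} = N_2$$

$$A = P^{(1)} P^{(2)} \cdots P^{(K)} D^{(K)} L^{(K)} U^{(K)} \cdots L^{(1)} U^{(1)} = P D \prod_{k=K}^{1} S^{(k)}_{n^{(k)}} \cdots S^{(k)}_{1}$$

Two variants: PERM(0) and PERM(1)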

Page 12: Matrix Factorizations for Parallel Integer Transforms

Parallel computing PERM(0)

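[Figure: data flow from x to y through the factors $S^{(1)}_1, S^{(1)}_2, S^{(1)}_3, S^{(1)}_4, S^{(2)}_1, S^{(2)}_2, S^{(2)}_3, S^{(2)}_4$ and the permutation P.]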

Page 13: Matrix Factorizations for Parallel Integer Transforms

Parallel computing PERM(1)

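[Figure: data flow from x to y through the factors $S^{(1)}_0, S^{(1)}_1, S^{(1)}_2, S^{(1)}_3, S^{(1)}_4, S^{(2)}_0, S^{(2)}_1, S^{(2)}_2, S^{(2)}_3, S^{(2)}_4$ and the permutation P.]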

Page 14: Matrix Factorizations for Parallel Integer Transforms

Parallel multiplication

For p processors multiplying n pairs of numbers, the computational time is:

$$T^{*} = n / p$$

Page 15: Matrix Factorizations for Parallel Integer Transforms

Parallel addition

[Figure: the addition stage of computing $S_1 x$, showing the entries $S_{(1,5)}$ through $S_{(1,16)}$.]

$$T^{+} = \begin{cases} [\log_2 n] & \text{if } n < 2p \\ n/p + C \log_2 p & \text{if } n \geq 2p \end{cases}$$
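To make the two regimes of parallel addition concrete, here is a minimal Python sketch that counts rounds for a greedy pairwise (tree) summation of n numbers on p processors. It is a counting model only: it ignores the communication constant C and is not the slide's formula, and the ceiling in `mult_rounds` is an assumption added so that step counts are whole numbers.

```python
import math


def mult_rounds(n: int, p: int) -> int:
    """Rounds needed for n independent multiplications on p processors."""
    return math.ceil(n / p)


def tree_sum_rounds(n: int, p: int) -> int:
    """Rounds needed to sum n numbers when each round performs at most
    p pairwise additions (one per processor)."""
    rounds = 0
    while n > 1:
        n -= min(p, n // 2)  # each addition merges two partial sums into one
        rounds += 1
    return rounds


if __name__ == "__main__":
    for p in (1, 4, 8, 64):
        print(p, mult_rounds(16, p), tree_sum_rounds(16, p))
```

With at least n/2 processors the count equals the tree depth, which is logarithmic in n; with fewer processors it grows roughly like n/p plus a logarithmic tail in p.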

Page 16: Matrix Factorizations for Parallel Integer Transforms

Computational complexity *

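$$N_1 = m^{(0)} \xrightarrow{n^{(1)}} m^{(1)} \xrightarrow{n^{(2)}} m^{(2)} \cdots m^{(K-1)} \xrightarrow{n^{(K)}} m^{(K)} = N_2$$

For $n^{(k)} m^{(k)} = m^{(k-1)}$, $m^{(0)} = N_1$, $m^{(K)} = N_2$, the parallel multiplication time is:

$$T^{*}_{\mathrm{PERM}(1)} = \sum_{k=1}^{K} (n^{(k)} + 1)\, m^{(k)} \left(m^{(k-1)} - m^{(k)}\right) / p \approx \frac{1}{p} \sum_{k=1}^{K} \left[ \left(m^{(k-1)}\right)^2 - \left(m^{(k)}\right)^2 \right] = \frac{N_1^2 - N_2^2}{p}$$

$$T^{*}_{\mathrm{PERM}(0)} = \sum_{k=1}^{K} n^{(k)} m^{(k)} \left(m^{(k-1)} - m^{(k)}\right) \frac{N_1}{m^{(k-1)}} / p \approx \frac{N_1}{p} \sum_{k=1}^{K} \left(m^{(k-1)} - m^{(k)}\right) = \frac{N_1 (N_1 - N_2)}{p}$$

The sums telescope because $n^{(k)} m^{(k)} = m^{(k-1)}$, so the multiplication time is independent of the blocking scheme.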
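The independence from the blocking scheme can be checked numerically. The Python sketch below assumes the per-level multiplication counts are $(n^{(k)}+1)\, m^{(k)} (m^{(k-1)} - m^{(k)})$ for PERM(1) and $n^{(k)} m^{(k)} (m^{(k-1)} - m^{(k)})\, N_1 / m^{(k-1)}$ for PERM(0), with $n^{(k)} m^{(k)} = m^{(k-1)}$. A blocking chain is passed as the list $[m^{(0)}, \ldots, m^{(K)}]$. Function names are illustrative.

```python
def mult_count_perm1(chain):
    """Total multiplications for PERM(1) over a blocking chain."""
    total = 0
    for m_prev, m in zip(chain, chain[1:]):
        assert m_prev % m == 0, "each level must divide the previous one"
        n = m_prev // m
        total += (n + 1) * m * (m_prev - m)
    return total


def mult_count_perm0(chain):
    """Total multiplications for PERM(0) over a blocking chain."""
    n1 = chain[0]
    total = 0
    for m_prev, m in zip(chain, chain[1:]):
        assert m_prev % m == 0, "each level must divide the previous one"
        n = m_prev // m
        # n * m == m_prev, so the division below is exact
        total += n * m * (m_prev - m) * n1 // m_prev
    return total


if __name__ == "__main__":
    for chain in ([64, 1], [64, 8, 1], [64, 16, 4, 1]):
        print(chain, mult_count_perm1(chain), mult_count_perm0(chain))
```

Dividing either total by p gives the corresponding parallel multiplication time under the same idealized model.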

Page 17: Matrix Factorizations for Parallel Integer Transforms

Computational complexity +

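$$N_1 = m^{(0)} \xrightarrow{n^{(1)}} m^{(1)} \xrightarrow{n^{(2)}} m^{(2)} \cdots m^{(K-1)} \xrightarrow{n^{(K)}} m^{(K)} = N_2$$

For $n^{(k)} m^{(k)} = m^{(k-1)}$, $m^{(0)} = N_1$, $m^{(K)} = N_2$, the parallel addition time is:

$$T^{+}_{\mathrm{PERM}(0)} = \sum_{k=1}^{K_p - 1} n^{(k)} \left( \frac{m^{(k)} \left(m^{(k-1)} - m^{(k)}\right) - p}{p} + C \left(\log_2 p - \log_2 m^{(k)}\right) \right) \frac{N_1}{m^{(k-1)}} + \sum_{k=K_p}^{K} n^{(k)} \log_2\!\left(m^{(k-1)} - m^{(k)}\right) \frac{N_1}{m^{(k-1)}}$$

$$T^{+}_{\mathrm{PERM}(1)} = \sum_{k=1}^{K_p - 1} (n^{(k)} + 1) \left( \frac{m^{(k)} \left(m^{(k-1)} - m^{(k)}\right) - p}{p} + C \left(\log_2 p - \log_2 m^{(k)}\right) \right) + \sum_{k=K_p}^{K} (n^{(k)} + 1) \log_2\!\left(m^{(k-1)} - m^{(k)}\right)$$

There is a turning point $K_p$, where $m^{(K_p)} \left(m^{(K_p - 1)} - m^{(K_p)}\right)$ is close to but less than $2p$.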

Page 18: Matrix Factorizations for Parallel Integer Transforms

Blocking strategy

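Since the parallel computational time has a turning point (ignoring factors such as communication time), we propose a three-phase blocking strategy:

$$N_1 = m^{(0)} \xrightarrow{n^{(1)}} m^{(1)} \xrightarrow{n^{(2)}} m^{(2)} \cdots m^{(K-1)} \xrightarrow{n^{(K)}} m^{(K)} = N_2$$

[Equation: three-case rule that selects the blocking chain from the size of N relative to p. The first case is N ≥ 2p, the second is an intermediate range with N < 2p, and the third is a lower range in which the chain ends at block size 1. The block sizes chosen within each case are not recoverable from this transcript.]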

Page 19: Matrix Factorizations for Parallel Integer Transforms

Computational complexity

$$N_1 = m^{(0)} \xrightarrow{n^{(1)}} m^{(1)} \xrightarrow{n^{(2)}} m^{(2)} \cdots m^{(K-1)} \xrightarrow{n^{(K)}} m^{(K)} = N_2$$

[Equations: closed-form parallel times $T^{*}_{\mathrm{PERM}(1)}(N, p)$ and $T^{+}_{\mathrm{PERM}(1)}(N, p)$ under the three-phase blocking strategy. Each is defined piecewise by three branches $f_1(N, p)$, $f_2(N, p)$, $f_3(N, p)$, selected by the size of p relative to N. The third multiplication branch appears as $5 \log N$ and the addition branches carry additional terms in $C$ and $\log p$. The remaining terms are not recoverable from this transcript.]

Page 20: Matrix Factorizations for Parallel Integer Transforms

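Complexity comparison

$$T^{*}_{\mathrm{pSERM}(1)}(N, p) = (N + 1)\, \frac{N - 1}{p}$$

$$T^{+}_{\mathrm{pSERM}(1)}(N, p) = (N + 1) \left( \frac{N - 1}{p} + C \log_2 p \right)$$

Operation         Factorization   p = O(N)      p = O(N^2)
Multiplications   SERM(1)         O(N)          O(N)
Multiplications   PERM(1)         O(N)          O(log N)
Additions         SERM(1)         O(N log N)    O(N log N)
Additions         PERM(1)         O(N)          O(log^2 N)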
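For a feel of the numbers, the parallel SERM times can be evaluated directly. This Python sketch assumes the formulas read $T^{*} = (N+1)(N-1)/p$ and $T^{+} = (N+1)\left((N-1)/p + C \log_2 p\right)$, and uses N = 64 and C = 1 as example values. It does not model the fact that the N+1 SERM factors must be applied one after another, so it should only be trusted for p up to about N. The function names are illustrative.

```python
import math


def t_mul_pserm(n_size: int, p: int) -> float:
    """Parallel multiplication time of the SERM(1) factorization (assumed form)."""
    return (n_size + 1) * (n_size - 1) / p


def t_add_pserm(n_size: int, p: int, c: float = 1.0) -> float:
    """Parallel addition time of the SERM(1) factorization (assumed form);
    c is the constant C attached to the log2(p) term."""
    return (n_size + 1) * ((n_size - 1) / p + c * math.log2(p))


if __name__ == "__main__":
    for p in (1, 4, 16, 64):
        print(p, t_mul_pserm(64, p), t_add_pserm(64, p))
```

The addition time stops improving once the $C \log_2 p$ term dominates, which matches the O(N log N) entries in the SERM(1) rows of the table.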

Page 21: Matrix Factorizations for Parallel Integer Transforms

PERM vs. parallel SERM

[Figure: computational complexity (N = 64, C = 1). x-axis: number of processors p, from 1 to 1024; y-axis: computational complexity, from 1 to 10000 on a logarithmic scale. Curves: PERM multiplications, PERM additions, SERM multiplications, SERM additions.]

Page 22: Matrix Factorizations for Parallel Integer Transforms

PERM vs. parallel SERM

[Figure: relative speedup (N = 64, C = 1). x-axis: number of processors p, from 1 to 1024; y-axis: speedup (PERM/SERM), from 0 to 10. Curves: PERM multiplication / SERM multiplication, PERM addition / SERM addition.]

Page 23: Matrix Factorizations for Parallel Integer Transforms

Conclusions

For parallel computing:

• Increases the degree of parallelism

• Accommodates more processors

For sequential computing:

• May be more efficient when combined with specialized matrix computation software such as BLAS

More factorization levels may result in greater rounding error.

Page 24: Matrix Factorizations for Parallel Integer Transforms

Thank You


http://www.dcs.qmul.ac.uk/~phao