PMLAB, IECS, FCU
Designing Efficient Matrix Transposition on Various Interconnection Networks
Using Tensor Product Formulation
Presented by Chin-Yi Tsai
Outline
• Introduction
• Tensor Product Notation
• Matrix Transposition
• Designing Matrix Transposition on Various Interconnection Networks
• Conclusions and Future Work
Introduction
• Matrix transposition is a simple but important computational problem.
• A matrix is a two-dimensional data structure that is stored in one-dimensional computer memory.
• A simple double-loop transposition program performs poorly on modern computer architectures with a memory hierarchy.
Introduction (cont’d)
• We develop matrix transposition algorithms on various interconnection networks, including omega, baseline, and hypercube networks.
• The tensor product has been used successfully for designing block recursive algorithms, such as the FFT, Strassen’s matrix multiplication, parallel prefix algorithms, the Hilbert space-filling curve, and Karatsuba’s multiplication.
• Tensor product formulas are also suitable for specifying interconnection networks.
Introduction (cont’d)
• Different interconnection networks have their own architectural characteristics and properties.
• Applications include distributed-memory algorithms and VLSI circuit design.
• A major goal of this study is to provide an effective way of designing VLSI circuits for DSP algorithms.
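Tensor Product Notation
• Let A and B be two matrices of size $m \times n$ and $p \times q$, respectively. Their tensor product is the block matrix
\[
A \otimes B =
\begin{bmatrix}
a_{0,0}B & \cdots & a_{0,n-1}B \\
\vdots & \ddots & \vdots \\
a_{m-1,0}B & \cdots & a_{m-1,n-1}B
\end{bmatrix}
\]
• Stride permutation, where $e^m_i$ is the unit vector of length $m$ with a 1 in position $i$:
\[
L^{mn}_n (e^m_i \otimes e^n_j) = e^n_j \otimes e^m_i
\]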
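The two definitions above can be checked numerically. The following is a minimal sketch in plain Python, assuming matrices are stored as lists of rows; the helper names `kron`, `unit`, `stride_matrix`, and `matmul` are illustrative and do not come from the slides.

```python
def kron(a, b):
    """Tensor (Kronecker) product of matrices stored as lists of rows."""
    return [[x * y for x in row_a for y in row_b]
            for row_a in a for row_b in b]


def unit(n, i):
    """e_i^n as a column matrix: n rows, one column, with a 1 in row i."""
    return [[1 if k == i else 0] for k in range(n)]


def stride_matrix(m, n):
    """Permutation matrix of L^{mn}_n: column i*n + j has its 1 in row j*m + i."""
    size = m * n
    mat = [[0] * size for _ in range(size)]
    for i in range(m):
        for j in range(n):
            mat[j * m + i][i * n + j] = 1
    return mat


def matmul(a, b):
    """Plain matrix product of two lists-of-rows matrices."""
    return [[sum(a[r][k] * b[k][c] for k in range(len(b)))
             for c in range(len(b[0]))] for r in range(len(a))]


# Check L^{mn}_n (e_i^m (x) e_j^n) == e_j^n (x) e_i^m for a small case.
m, n = 2, 3
L = stride_matrix(m, n)
ok = all(matmul(L, kron(unit(m, i), unit(n, j))) == kron(unit(n, j), unit(m, i))
         for i in range(m) for j in range(n))
```

Here `ok` is true exactly when the stride permutation swaps the two tensor factors of every basis vector, which is the defining property stated on the slide.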
Matrix Transposition
• Matrix transposition can be viewed as changing the order of the elements from row-major order to column-major order.
• When an $m \times n$ matrix A is stored in computer memory, the index scheme of element $A_{i,j}$ is:
– Row-major order: $e^m_i \otimes e^n_j = e^{mn}_{in+j}$
– Column-major order: $e^n_j \otimes e^m_i = e^{mn}_{jm+i}$
• Various matrix transposition algorithms can be designed by manipulating the stride permutation:
\[
L^{mn}_n (e^m_i \otimes e^n_j) = e^n_j \otimes e^m_i
\]
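As a small illustration of the index schemes above, the sketch below applies $L^{mn}_n$ directly to a row-major flat list and compares the result with an ordinary double-loop transposition. The function names are hypothetical and chosen only for this example.

```python
def transpose_double_loop(a):
    """Reference transposition of a list-of-rows matrix with two nested loops."""
    m, n = len(a), len(a[0])
    t = [[None] * m for _ in range(n)]
    for i in range(m):
        for j in range(n):
            t[j][i] = a[i][j]
    return t


def stride_permute(flat, m, n):
    """Apply L^{mn}_n: the element at index i*n + j moves to index j*m + i."""
    out = [None] * (m * n)
    for i in range(m):
        for j in range(n):
            out[j * m + i] = flat[i * n + j]
    return out


a = [[1, 2, 3],
     [4, 5, 6]]
flat = [x for row in a for x in row]            # row-major storage of A
flat_t = stride_permute(flat, 2, 3)             # row-major storage of A^T
reference = [x for row in transpose_double_loop(a) for x in row]
```

Both `flat_t` and `reference` hold the transposed matrix in row-major order, so transposing the stored matrix is the same as applying the stride permutation to its one-dimensional layout.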
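Matrix Transposition (cont’d)
With $m = pq$ and $n = rs$:
\[
L^{mn}_n = L^{pqrs}_{rs}
= (I_r \otimes L^{ps}_s \otimes I_q)(L^{pr}_r \otimes I_{qs})(I_{pr} \otimes L^{qs}_s)(I_p \otimes L^{qr}_r \otimes I_s)
\]
The factors are applied from right to left:
Step 1: form $pr$ blocks with $qs$ elements in each block ($I_p \otimes L^{qr}_r \otimes I_s$).
Step 2: transpose the $q \times s$ matrix inside each of the $pr$ blocks ($I_{pr} \otimes L^{qs}_s$).
Step 3: transpose the block matrix, treating each block of $qs$ elements as a unit ($L^{pr}_r \otimes I_{qs}$).
Step 4: convert the block-structured order, with $qs$ elements in each block, to the row-major order of the transposed matrix ($I_r \otimes L^{ps}_s \otimes I_q$).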
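The four steps can be sketched as index permutations on a flat list. This is a minimal illustration, assuming the factorization is applied right to left as Step 1 through Step 4; the helper `i_l_i` (which applies $I_a \otimes L^{mn}_n \otimes I_b$) and the function `block_transpose` are names made up for this example.

```python
def i_l_i(x, a, m, n, b):
    """Apply I_a (x) L^{mn}_n (x) I_b to the flat list x of length a*m*n*b."""
    out = [None] * len(x)
    for blk in range(a):
        base = blk * m * n * b
        for i in range(m):
            for j in range(n):
                src = base + (i * n + j) * b   # chunk (i, j) of size b
                dst = base + (j * m + i) * b   # moves to chunk (j, i)
                out[dst:dst + b] = x[src:src + b]
    return out


def block_transpose(x, p, q, r, s):
    """Transpose a (p*q) x (r*s) row-major matrix in four block steps."""
    x = i_l_i(x, p, q, r, s)          # Step 1: I_p (x) L^{qr}_r (x) I_s
    x = i_l_i(x, p * r, q, s, 1)      # Step 2: I_{pr} (x) L^{qs}_s
    x = i_l_i(x, 1, p, r, q * s)      # Step 3: L^{pr}_r (x) I_{qs}
    x = i_l_i(x, r, p, s, q)          # Step 4: I_r (x) L^{ps}_s (x) I_q
    return x


p, q, r, s = 2, 3, 2, 2
x = list(range(p * q * r * s))
direct = i_l_i(x, 1, p * q, r * s, 1)     # L^{pqrs}_{rs} applied in one pass
blocked = block_transpose(x, p, q, r, s)
```

For this choice of `p`, `q`, `r`, `s`, the lists `direct` and `blocked` agree, which is what the factorization asserts. Choosing the block size $q \times s$ to fit a level of the memory hierarchy is the usual motivation for such a decomposition.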
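Designing Matrix Transposition on Various Interconnection Networks
• We consider two kinds of networks:
– multistage interconnection networks,
– direct interconnection networks.
• The basic component of a multistage interconnection network is a switching element.
• A direct interconnection network is a set of processors connected by a set of links.
\[
I_2 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \qquad
S_2 = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix}
\]
[Figure: the two settings of a switching element with inputs x0, x1 and outputs y0, y1: straight ($I_2$) and exchange ($S_2$)]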
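Designing Matrix Transposition on Various Interconnection Networks
• Suppose that $N = 2^n$. Products are expanded with the factor for $i = 0$ rightmost, so that it is applied first.
• Omega network
\[
\Omega_N = \prod_{i=0}^{n-1} (I_{2^{n-1}} \otimes D_2)\, L^{2^n}_{2^{n-1}}
\]
• Baseline network
\[
B_N = \prod_{i=0}^{n-1} (I_{2^i} \otimes L^{2^{n-i}}_2)(I_{2^{n-1}} \otimes D_2)
\]
• Hypercube network
\[
H_N = \{\, I_{2^{n-i-1}} \otimes S_2 \otimes I_{2^i} \mid 0 \le i \le n-1 \,\}
\]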
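\[
\Omega_{16} = (I_8 \otimes D_2) L^{16}_8 \, (I_8 \otimes D_2) L^{16}_8 \, (I_8 \otimes D_2) L^{16}_8 \, (I_8 \otimes D_2) L^{16}_8
\]
[Figure: 16-input omega network; each of the four stages is a perfect shuffle $L^{16}_8$ followed by a column of switching elements]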
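\[
B_{16} = I_{16} (I_8 \otimes D_2)(I_4 \otimes L^4_2)(I_8 \otimes D_2)(I_2 \otimes L^8_2)(I_8 \otimes D_2) L^{16}_2 (I_8 \otimes D_2)
\]
[Figure: 16-input baseline network with inter-stage permutations labeled $I_4 \otimes L^4_2$, $I_2 \otimes L^8_2$, $I_{16}$, and $L^{16}_2$]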
\[
H_{16} = \{I_8 \otimes S_2\}\{I_4 \otimes S_2 \otimes I_2\}\{I_2 \otimes S_2 \otimes I_4\}\{S_2 \otimes I_8\}
\]
[Figure: two drawings of the 16-node hypercube with nodes labeled 0 to 15]
Derivation of Algorithm on Omega Interconnection Network
\[
\begin{aligned}
L^{2^{2n}}_{2^n} &= \left( L^{2^{2n}}_{2^{2n-1}} \right)^{n} \\
&= \prod_{i=0}^{n-1} L^{2^{2n}}_{2^{2n-1}} \\
&= \prod_{i=0}^{n-1} (I_{2^{2n-1}} \otimes I_2)\, L^{2^{2n}}_{2^{2n-1}}
\end{aligned}
\]
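Omega Interconnection Network
\[
L^{16}_4 = (I_{2^3} \otimes I_2) L^{16}_8 \, (I_{2^3} \otimes I_2) L^{16}_8
\]
\[
\Omega_{16} = (I_8 \otimes D_2) L^{16}_8 \, (I_8 \otimes D_2) L^{16}_8 \, (I_8 \otimes D_2) L^{16}_8 \, (I_8 \otimes D_2) L^{16}_8
\]
[Figure: 16-input omega network with shuffle stages labeled $L^{16}_8$. Data order at the switch inputs, read top to bottom:]
Input: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
After one $L^{16}_8$: 0 8 1 9 2 10 3 11 4 12 5 13 6 14 7 15
After two $L^{16}_8$ (transposed order): 0 4 8 12 1 5 9 13 2 6 10 14 3 7 11 15
After three $L^{16}_8$: 0 2 4 6 8 10 12 14 1 3 5 7 9 11 13 15
After four $L^{16}_8$: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15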
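A minimal sketch of the omega-network algorithm for a $2^n \times 2^n$ matrix, assuming every switching element is set straight ($D_2 = I_2$) so that each stage reduces to the perfect shuffle $L^{2^{2n}}_{2^{2n-1}}$. The function names and the boolean `settings` list are assumptions made for this illustration.

```python
def perfect_shuffle(x):
    """Apply L^{N}_{N/2}: the element at index (N/2)*i + j moves to index 2*j + i."""
    half = len(x) // 2
    return [x[(k % 2) * half + k // 2] for k in range(len(x))]


def switch_stage(x, settings):
    """Apply I_{N/2} (x) D_2: settings[k] is True for exchange (S_2), False for straight (I_2)."""
    out = list(x)
    for k, exchange in enumerate(settings):
        if exchange:
            out[2 * k], out[2 * k + 1] = out[2 * k + 1], out[2 * k]
    return out


def omega_transpose(x, n):
    """Transpose a 2^n x 2^n row-major matrix with n shuffle stages, all switches straight."""
    straight = [False] * (len(x) // 2)
    for _ in range(n):
        x = switch_stage(perfect_shuffle(x), straight)
    return x


data = list(range(16))                  # a 4 x 4 matrix in row-major order
after_one = perfect_shuffle(data)
transposed = omega_transpose(data, 2)
```

With `n = 2`, `after_one` and `transposed` reproduce the data orders listed for one and two applications of $L^{16}_8$.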
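Derivation of Algorithm on Baseline Interconnection Network
Bit-reversal operation:
\[
R_{2^n} = \prod_{i=0}^{n-1} (I_{2^i} \otimes L^{2^{n-i}}_2)
\]
Partial bit-reversal operation:
\[
R_{2^{2n},2^n} = \prod_{i=0}^{n-1} (I_{2^i} \otimes L^{2^{2n-i}}_2)
\]
In each product the factor with the smallest $i$ is rightmost, so it is applied first.
\[
\begin{aligned}
L^{2^{2n}}_{2^n} &= R_{2^{2n},2^n}\,(I_{2^n} \otimes R_{2^n}) \\
&= \Big[ \prod_{i=0}^{n-1} (I_{2^i} \otimes L^{2^{2n-i}}_2) \Big] \Big[ I_{2^n} \otimes \prod_{i=0}^{n-1} (I_{2^i} \otimes L^{2^{n-i}}_2) \Big] \\
&= \Big[ \prod_{i=0}^{n-1} (I_{2^i} \otimes L^{2^{2n-i}}_2) \Big] \Big[ \prod_{i=0}^{n-1} (I_{2^{n+i}} \otimes L^{2^{n-i}}_2) \Big] \\
&= \Big[ \prod_{i=0}^{n-1} (I_{2^i} \otimes L^{2^{2n-i}}_2) \Big] \Big[ \prod_{i=n}^{2n-1} (I_{2^i} \otimes L^{2^{2n-i}}_2) \Big] \\
&= \Big[ \prod_{i=0}^{n-1} \big( (I_{2^i} \otimes L^{2^{2n-i}}_2)(I_{2^{2n-1}} \otimes I_2) \big) \Big] \Big[ \prod_{i=n}^{2n-1} \big( (I_{2^i} \otimes L^{2^{2n-i}}_2)(I_{2^{2n-1}} \otimes I_2) \big) \Big]
\end{aligned}
\]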
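Baseline Interconnection Network
\[
L^{16}_4 = (I_2 \otimes L^8_2)(I_8 \otimes I_2)\, L^{16}_2 (I_8 \otimes I_2)\, I_{16} (I_8 \otimes I_2)\,(I_4 \otimes L^4_2)(I_8 \otimes I_2)
\]
\[
B_{16} = I_{16} (I_8 \otimes D_2)(I_4 \otimes L^4_2)(I_8 \otimes D_2)(I_2 \otimes L^8_2)(I_8 \otimes D_2) L^{16}_2 (I_8 \otimes D_2)
\]
[Figure: 16-input baseline network with permutation stages labeled $I_4 \otimes L^4_2$, $I_{16}$, $L^{16}_2$, and $I_2 \otimes L^8_2$. Data order at the switch inputs, read top to bottom:]
Input: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
After $I_4 \otimes L^4_2$: 0 2 1 3 4 6 5 7 8 10 9 11 12 14 13 15
After $I_{16}$: 0 2 1 3 4 6 5 7 8 10 9 11 12 14 13 15
After $L^{16}_2$: 0 1 4 5 8 9 12 13 2 3 6 7 10 11 14 15
After $I_2 \otimes L^8_2$ (transposed order): 0 4 8 12 1 5 9 13 2 6 10 14 3 7 11 15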
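The bit-reversal factorization can be checked with a short sketch. It assumes that each product is applied with its smallest-index factor first and that all switching elements are set straight, so only the stride permutations remain; `i_l_i` and `baseline_transpose` are hypothetical helper names.

```python
def i_l_i(x, a, m, n, b):
    """Apply I_a (x) L^{mn}_n (x) I_b to the flat list x of length a*m*n*b."""
    out = [None] * len(x)
    for blk in range(a):
        base = blk * m * n * b
        for i in range(m):
            for j in range(n):
                src = base + (i * n + j) * b
                dst = base + (j * m + i) * b
                out[dst:dst + b] = x[src:src + b]
    return out


def baseline_transpose(x, n):
    """Transpose a 2^n x 2^n row-major matrix as R_{2^{2n},2^n} (I_{2^n} (x) R_{2^n})."""
    # I_{2^n} (x) R_{2^n}: factors I_{2^(n+i)} (x) L^{2^(n-i)}_2 for i = 0 .. n-1.
    for i in range(n):
        x = i_l_i(x, 2 ** (n + i), 2 ** (n - i - 1), 2, 1)
    # Partial bit reversal: factors I_{2^i} (x) L^{2^(2n-i)}_2 for i = 0 .. n-1.
    for i in range(n):
        x = i_l_i(x, 2 ** i, 2 ** (2 * n - i - 1), 2, 1)
    return x


data = list(range(16))
result = baseline_transpose(data, 2)
direct = i_l_i(data, 1, 4, 4, 1)        # L^{16}_4 applied in one pass
```

For `n = 2` the stages applied are $I_4 \otimes L^4_2$, $I_{16}$, $L^{16}_2$, and $I_2 \otimes L^8_2$, in that order, matching the data orders listed above.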
Hypercube Interconnection Network
[Figure: 0/1 matrix form of $L^4_2$ and its realization on a four-node hypercube with nodes 0 to 3]
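Derivation of Algorithm on Hypercube Interconnection Network
\[
\begin{aligned}
L^{2^{2n}}_{2^n} &= \prod_{i=0}^{n-1} (I_{2^{n-i-1}} \otimes L^{2^{n+1}}_{2^n} \otimes I_{2^i}) \\
&= \prod_{i=0}^{n-1} \prod_{j=0}^{n-1} (I_{2^{n-i+j-1}} \otimes L^4_2 \otimes I_{2^{n+i-j-1}})
\end{aligned}
\]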
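Hypercube Interconnection Network (cont’d)
\[
H_{16} = \{I_8 \otimes S_2\}\{I_4 \otimes S_2 \otimes I_2\}\{I_2 \otimes S_2 \otimes I_4\}\{S_2 \otimes I_8\}
\]
\[
L^{16}_4 = (I_2 \otimes L^4_2 \otimes I_2)(L^4_2 \otimes I_4)(I_4 \otimes L^4_2)(I_2 \otimes L^4_2 \otimes I_2)
\]
[Figure: 16-node hypercube shown five times, once per step. Data held by nodes 0 to 15 as the factors are applied from right to left:]
Initial: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
After $I_2 \otimes L^4_2 \otimes I_2$: 0 1 4 5 2 3 6 7 8 9 12 13 10 11 14 15
After $I_4 \otimes L^4_2$: 0 4 1 5 2 6 3 7 8 12 9 13 10 14 11 15
After $L^4_2 \otimes I_4$: 0 4 1 5 8 12 9 13 2 6 3 7 10 14 11 15
After $I_2 \otimes L^4_2 \otimes I_2$ (transposed order): 0 4 8 12 1 5 9 13 2 6 10 14 3 7 11 15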
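The hypercube factorization into $L^4_2$ steps can be sketched as follows, assuming the factor with $i = 0$, $j = 0$ is applied first and the remaining factors follow in increasing $j$, then increasing $i$. The helper names are illustrative only, and the sketch works at the level of data positions rather than modelling individual hypercube links.

```python
def i_l_i(x, a, m, n, b):
    """Apply I_a (x) L^{mn}_n (x) I_b to the flat list x of length a*m*n*b."""
    out = [None] * len(x)
    for blk in range(a):
        base = blk * m * n * b
        for i in range(m):
            for j in range(n):
                src = base + (i * n + j) * b
                dst = base + (j * m + i) * b
                out[dst:dst + b] = x[src:src + b]
    return out


def hypercube_transpose_steps(x, n):
    """Return the data order after each I (x) L^4_2 (x) I step for a 2^n x 2^n matrix."""
    states = []
    for i in range(n):
        for j in range(n):
            left = 2 ** (n - i + j - 1)     # size of the left identity factor
            right = 2 ** (n + i - j - 1)    # size of the right identity factor
            x = i_l_i(x, left, 2, 2, right)
            states.append(x)
    return states


steps = hypercube_transpose_steps(list(range(16)), 2)
final = steps[-1]
```

For `n = 2` the four entries of `steps` are the data orders listed above, and `final` is the transposed order. Each step exchanges two adjacent bits of the node address, which is why $n^2$ such steps are needed in this formulation.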
Conclusions and Future Work
• We use the tensor product as the framework for designing matrix transposition algorithms on various interconnection networks.
• Stride permutation operations are manipulated to fit the structure of each network.
• VLSI circuit design for DSP and image processing algorithms on various interconnection networks.