P0 P1 P2 P3 P4 P5 P6 P7 - Argonne National Laboratory · 2009-02-07 · P0 P1 P2 P3 P4 P5 P6 P7 S...
[Figure: processes P0 to P7; communication in Step 1, Step 2, Step 3]
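Eight processes finishing in three steps is consistent with recursive doubling, one of the algorithms timed in this section's plots. In each step, a process exchanges everything it has gathered so far with the partner whose rank differs from its own in one bit, so after log2(p) steps every process holds all p blocks. The sketch below is a sequential simulation under that assumption. It tracks block ids rather than data, `recursive_doubling_allgather` is a hypothetical name, and it requires p to be a power of two.

```python
def recursive_doubling_allgather(p):
    """Simulate which blocks each rank holds after a recursive-doubling allgather.

    Assumes p is a power of two. Returns (held, steps), where held[i] is the
    set of block ids rank i ends up with.
    """
    if p < 1 or p & (p - 1):
        raise ValueError("this sketch assumes p is a power of two")
    held = [{i} for i in range(p)]  # rank i starts with only its own block i
    dist, steps = 1, 0
    while dist < p:
        # rank i exchanges everything it has with partner i XOR dist,
        # so the amount of data each rank holds doubles at every step
        held = [held[i] | held[i ^ dist] for i in range(p)]
        dist <<= 1
        steps += 1
    return held, steps


held, steps = recursive_doubling_allgather(8)
print(steps)  # 3
```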
[Figure: data blocks 0 to 5 on processes P0 to P5; panels: Initial data, After step 0, After step 1, After step 2, After local shift]
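The panel labels (Initial data, After step 0 to 2, After local shift) fit the Bruck allgather, which is also named in the 16-byte timing plot. Assuming that reading, in step k process i sends the blocks it holds to process (i - 2^k) mod p and receives the blocks of process (i + 2^k) mod p, appending them to its buffer. When p is not a power of two, the last step moves only the blocks still missing, and a final local shift puts the blocks in rank order. This is a sequential simulation with a hypothetical function name, not MPI code.

```python
def bruck_allgather(p):
    """Simulate a Bruck-style allgather on p ranks (any p >= 1).

    Block ids stand in for the data; bufs[i] is rank i's receive buffer.
    Returns (shifted buffers, number of communication steps).
    """
    bufs = [[i] for i in range(p)]  # rank i starts with its own block at the top
    dist, steps = 1, 0
    while dist < p:
        # every rank currently holds `dist` blocks; the last step only
        # needs the p - dist blocks that are still missing
        count = min(dist, p - dist)
        # rank i sends its first `count` blocks to (i - dist) % p and
        # receives the first `count` blocks of (i + dist) % p
        incoming = [bufs[(i + dist) % p][:count] for i in range(p)]
        for i in range(p):
            bufs[i].extend(incoming[i])
        dist *= 2
        steps += 1
    # before the shift, rank i holds blocks i, i+1, ..., i+p-1 (mod p);
    # rotating by i positions puts block j at index j
    shifted = [buf[p - i:] + buf[:p - i] for i, buf in enumerate(bufs)]
    return shifted, steps


out, steps = bruck_allgather(6)
print(steps)   # 3
print(out[2])  # [0, 1, 2, 3, 4, 5]
```

For the six-process case in the figure, the sketch takes three steps, and the last step moves only two blocks.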
[Plot: Myrinet Cluster, 16 bytes message size; time (microsec.) vs. number of processes (0 to 35); Recursive Doubling vs. Bruck Algorithm]
[Plot: Myrinet Cluster; time (microsec.) vs. message length (bytes), 0 to 9000; MPICH Old vs. MPICH New]
[Plot: Myrinet Cluster; time (microsec.) vs. message length (bytes), 0 to 9e+06; Recursive doubling vs. Ring]
[Plot: IBM SP; time (microsec.) vs. message length (bytes), 0 to 9e+06; Recursive doubling vs. Ring]
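These long-message plots compare recursive doubling against a ring. The figures do not spell out the ring variant, so this sketch assumes the common textbook form. In each of p - 1 steps, every process forwards one block to its right neighbor and receives one block from its left neighbor. That is more steps than the log2(p) of recursive doubling, but each process only communicates with its nearest neighbors. `ring_allgather` is a hypothetical name, and the code simulates the block movement sequentially rather than calling MPI.

```python
def ring_allgather(p):
    """Simulate a ring allgather on p ranks; bufs[i][j] is block j at rank i."""
    bufs = [[None] * p for _ in range(p)]
    for i in range(p):
        bufs[i][i] = i  # rank i starts with only its own block
    for k in range(p - 1):
        # in step k, rank i forwards block (i - k) % p to its right neighbor;
        # all ranks send simultaneously, so collect every message first
        sent = [bufs[i][(i - k) % p] for i in range(p)]
        for i in range(p):
            left = (i - 1) % p
            bufs[i][(left - k) % p] = sent[left]
    return bufs


print(ring_allgather(4)[3])  # [0, 1, 2, 3]
```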
[Plot: Myrinet Cluster; time (microsec.) vs. message length (bytes), 0 to 9e+06; MPICH Old vs. MPICH New]
[Plot: IBM SP; time (microsec.) vs. message length (bytes), 0 to 9e+06; IBM MPI vs. MPICH New]
[Plot: Myrinet Cluster, 64 nodes; time (microsec.) vs. message length (bytes), 0 to 300; MPICH Old vs. MPICH New]
[Figure: two-digit data blocks on processes P0 to P5; panels: Initial Data, After local rotation, After communication step 0, After communication step 1, After communication step 2, After local inverse rotation]
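The panel sequence (local rotation, communication steps 0 to 2, local inverse rotation) matches a Bruck-style alltoall. Under that assumption, each process first rotates its blocks so that index k holds the block destined for rank (i + k) mod p. In step k it sends every block whose index has bit k set to rank (i + 2^k) mod p and receives the same indices from (i - 2^k) mod p. A final inverse rotation then orders the received blocks by source. Reading the figure's two-digit labels as (source, destination) pairs is also an assumption. The function below is a hypothetical sequential simulation, not MPICH's code.

```python
def bruck_alltoall(p):
    """Simulate a Bruck-style alltoall on p ranks (any p >= 1).

    Block (s, d) stands for the data that rank s must deliver to rank d.
    Returns (out, steps), where out[d][s] is the block rank d receives from s.
    """
    # Phase 1, local rotation: index k of rank i holds block (i, (i + k) % p)
    bufs = [[(i, (i + k) % p) for k in range(p)] for i in range(p)]
    bit, steps = 1, 0
    while bit < p:
        # Phase 2: rank i sends every block whose index has this bit set to
        # (i + bit) % p and receives the same indices from (i - bit) % p,
        # overwriting the blocks it just sent
        idx = [k for k in range(p) if k & bit]
        new = [list(b) for b in bufs]
        for i in range(p):
            src = (i - bit) % p
            for k in idx:
                new[i][k] = bufs[src][k]
        bufs = new
        bit <<= 1
        steps += 1
    # Phase 3, local inverse rotation: index k of rank d now holds the block
    # from source (d - k) % p, so reorder by source rank
    out = [[bufs[d][(d - s) % p] for s in range(p)] for d in range(p)]
    return out, steps


out, steps = bruck_alltoall(6)
print(steps)   # 3
print(out[1])  # [(0, 1), (1, 1), (2, 1), (3, 1), (4, 1), (5, 1)]
```

Each block moves once for every set bit of its index and never changes position within the buffer, so a block starting at index k on rank i ends up on rank (i + k) mod p. This is why ceil(log2 p) steps suffice for any p.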
[Figure: processes P0 to P7; communication in Step 1, Step 2, Step 3]
[Plot: IBM SP; time (microsec.) vs. message length (bytes), 0 to 9000; IBM MPI vs. MPICH New]
[Plot: Myrinet Cluster; time (microsec.) vs. message length (bytes), 0 to 9e+06; MPICH Old vs. MPICH New]
[Plot: Myrinet Cluster; time (microsec.) vs. message length (bytes), 0 to 9e+06; MPICH Old vs. MPICH New]
[Figure: Fastest Protocol for Allreduce(sum,dbl); number of MPI processes (2 to 512) vs. buffersize [bytes] (8 to 8M); protocols: vendor, binary tree, pairwise + ring, halving + doubling, recursive doubling, binary blocks halving+doubling; break-even points: size=1k and 2k and min((size/256)^(9/16), ...)]
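The legend entry "halving + doubling" presumably refers to an allreduce built from a reduce-scatter by recursive halving, followed by an allgather by recursive doubling (Rabenseifner's algorithm). The sketch below simulates that combination sequentially for sum. It assumes a power-of-two number of processes, and the name `halving_doubling_allreduce` is hypothetical. The non-power-of-two variants in the legend, such as binary blocks halving+doubling, are not modeled.

```python
def halving_doubling_allreduce(vectors):
    """Simulate reduce-scatter (recursive halving) + allgather (recursive
    doubling) with sum as the reduction; vectors[i] is rank i's input.

    Assumes len(vectors) is a power of two.
    """
    p = len(vectors)
    if p < 1 or p & (p - 1):
        raise ValueError("this sketch assumes a power-of-two number of ranks")
    n = len(vectors[0])
    bufs = [list(v) for v in vectors]
    lo, hi = [0] * p, [n] * p  # segment [lo, hi) each rank is responsible for

    # Reduce-scatter by recursive halving: partners at distance p/2, p/4, ..., 1
    dist = p // 2
    while dist >= 1:
        new, new_lo, new_hi = [list(b) for b in bufs], lo[:], hi[:]
        for i in range(p):
            partner = i ^ dist
            mid = (lo[i] + hi[i]) // 2
            # the lower rank of each pair keeps (and reduces) the lower half
            start, stop = (lo[i], mid) if i < partner else (mid, hi[i])
            for j in range(start, stop):
                new[i][j] = bufs[i][j] + bufs[partner][j]
            new_lo[i], new_hi[i] = start, stop
        bufs, lo, hi = new, new_lo, new_hi
        dist //= 2

    # Allgather by recursive doubling: partners at distance 1, 2, ..., p/2
    dist = 1
    while dist < p:
        new, new_lo, new_hi = [list(b) for b in bufs], lo[:], hi[:]
        for i in range(p):
            partner = i ^ dist
            # copy the partner's fully reduced segment; the two segments are adjacent
            for j in range(lo[partner], hi[partner]):
                new[i][j] = bufs[partner][j]
            new_lo[i] = min(lo[i], lo[partner])
            new_hi[i] = max(hi[i], hi[partner])
        bufs, lo, hi = new, new_lo, new_hi
        dist *= 2
    return bufs


print(halving_doubling_allreduce([[1, 2], [3, 4], [5, 6], [7, 8]])[0])  # [16, 20]
```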
[Plot: buffersize = 32 kb, Allreduce(sum,dbl); bandwidth [Mb/s] vs. number of MPI processes (2 to 256); vendor, binary tree, pairwise + ring, halving + doubling, binary blocks halving + doubling, recursive doubling, chosen best]
[Figure: Allreduce(sum,dbl), ratio := best bandwidth of 4 new algo.s / vendor’s bandwidth; two panels plotting number of MPI processes (16 to 512, and 2 to 128) against buffersize [bytes] (8 to 8M); ratio bins: 100 <= ratio; 50 <= ratio < 100; 20 <= ratio < 50; 10 <= ratio < 20; 7.0 <= ratio < 10; 5.0 <= ratio < 7.0; 3.0 <= ratio < 5.0; 2.0 <= ratio < 3.0; 1.5 <= ratio < 2.0; 1.1 <= ratio < 1.5; 0.9 <= ratio < 1.1; 0.7 <= ratio < 0.9; 0.0 <= ratio < 0.7]
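Each cell of these maps encodes the ratio defined in the panel title, colored by the legend's thresholds. A small helper makes the binning explicit. `bandwidth_ratio`, `ratio_bin`, and the bandwidth values in the usage line are hypothetical and chosen only for illustration. Only the bin edges come from the legend.

```python
# Lower edges of the legend's ratio bins, highest first
BIN_EDGES = [100.0, 50.0, 20.0, 10.0, 7.0, 5.0, 3.0, 2.0, 1.5, 1.1, 0.9, 0.7, 0.0]


def bandwidth_ratio(new_bandwidths, vendor_bandwidth):
    """ratio := best bandwidth of the new algorithms / vendor's bandwidth."""
    return max(new_bandwidths) / vendor_bandwidth


def ratio_bin(ratio):
    """Return the legend label of the bin containing a non-negative ratio."""
    for i, lower in enumerate(BIN_EDGES):
        if ratio >= lower:
            if i == 0:
                return f"{lower:g} <= ratio"  # open-ended top bin
            return f"{lower:g} <= ratio < {BIN_EDGES[i - 1]:g}"
    raise ValueError("ratio must be non-negative")


# hypothetical bandwidths of four new algorithms vs. a vendor bandwidth of 50
print(ratio_bin(bandwidth_ratio([40.0, 60.0, 30.0, 52.0], 50.0)))  # 1.1 <= ratio < 1.5
```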
[Figure: Allreduce(sum,dbl), ratio := best bandwidth of 4 new algo.s / vendor’s bandwidth; two panels plotting number of MPI processes (4 to 512, and 2 to 256) against buffersize [bytes] (8 to 8M); ratio bins from 0.0 <= ratio < 0.7 up to 100 <= ratio]
[Figure: two panels plotting number of MPI processes (2 to 256) against buffersize [bytes] (8 to 8M): Allreduce(sum,dbl), ratio := best bandwidth of 5 new algo.s / vendor’s bandwidth; Reduce(sum,dbl), ratio := best bandwidth of 4 new algo.s / vendor’s bandwidth; ratio bins from 0.0 <= ratio < 0.7 up to 100 <= ratio]
[Figure: two panels plotting number of MPI processes (2 to 256) against buffersize [bytes] (8 to 8M): Allreduce(maxloc,dbl), ratio := best bandwidth of 5 new algo.s / vendor’s bandwidth; Reduce(maxloc,dbl), ratio := best bandwidth of 4 new algo.s / vendor’s bandwidth; ratio bins from 0.0 <= ratio < 0.7 up to 100 <= ratio]