On-line Parallel Tomography
Shava Smallen, UCSD
Talk Outline
I) Introduction to On-line Parallel Tomography
II) Tunable On-line Parallel Tomography
III) User-directed application-level scheduler
IV) Experiments
V) Conclusion
What is tomography?
• A method for reconstructing the interior of an object from its projections
• At the National Center for Microscopy and Imaging Research (NCMIR), tomography is applied to electron microscopy to study specimens at the cellular and subcellular level
Example
Tomogram of spiny dendrite (images courtesy of Steve Lamont)
Parallel Tomography at NCMIR
• Embarrassingly parallel
[Figure: specimen with X, Y, Z axes; each scanline of a projection corresponds to one slice of the specimen, so slices can be reconstructed independently]
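The slice independence in the figure can be made concrete with a small sketch. It assumes, for illustration only, that projections are held as nested lists indexed by tilt angle, then row, then column, and that row y of every projection is the scanline slice y needs; `sinogram_for_slice` and `assign_slices` are hypothetical helpers, not NCMIR code.

```python
from typing import List

Projection = List[List[float]]  # one tilt angle: rows x cols


def sinogram_for_slice(projections: List[Projection], y: int) -> List[List[float]]:
    """Gather scanline y from every projection: the only input slice y needs."""
    return [proj[y] for proj in projections]


def assign_slices(num_slices: int, num_workers: int) -> List[List[int]]:
    """Split slice indices into contiguous, nearly equal blocks, one per worker."""
    base, extra = divmod(num_slices, num_workers)
    blocks, start = [], 0
    for w in range(num_workers):
        size = base + (1 if w < extra else 0)  # first `extra` workers get one more
        blocks.append(list(range(start, start + size)))
        start += size
    return blocks
```

For example, `assign_slices(10, 3)` gives `[[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]`. Because no slice reads another slice's data, each block can go to a different ptomo process without coordination.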
NCMIR Usage Scenarios
• Off-line parallel tomography (off-line PT)
  – Data resides somewhere on secondary storage
  – Single, high-quality tomogram
  – Reduce turnaround time
  – Previous work (HCW'00)
• On-line parallel tomography (on-line PT)
  – Data streamed from the electron microscope
    • long makespan, configuration errors, etc.
  – Iteratively computed tomogram
  – Soft real-time execution
On-line PT
• Real-time feedback on quality of data acquisition
  1) First projection acquired from microscope
  2) Generate coarse tomogram
  3) Iteratively refine tomogram using subsequent projections (refresh)
     • Update each voxel value
     • Size of tomogram is constant
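Why a constant-size tomogram can be refreshed in place is easiest to see with an additive reconstruction. The sketch below assumes a simple unfiltered backprojection on a single 2-D slice, which is not necessarily the algorithm ptomo uses; `backproject_scanline` and `refresh` are hypothetical names. Each refresh adds the contribution of the newest scanlines to every pixel without changing the slice size.

```python
import math
from typing import List, Sequence, Tuple

Slice = List[List[float]]  # square 2-D slice, indexed [row][col]


def backproject_scanline(slice_img: Slice, scanline: Sequence[float], theta: float) -> None:
    """Add one scanline's contribution to every pixel of the slice, in place.

    Unfiltered backprojection: pixel (x, y), measured from the slice centre,
    receives the scanline sample nearest detector position
    t = x*cos(theta) + y*sin(theta).
    """
    n = len(slice_img)
    centre = (n - 1) / 2.0
    m = len(scanline)
    det_centre = (m - 1) / 2.0
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    for row in range(n):
        for col in range(n):
            x, y = col - centre, row - centre
            k = int(round(x * cos_t + y * sin_t + det_centre))
            if 0 <= k < m:  # samples falling off the detector contribute nothing
                slice_img[row][col] += scanline[k]


def refresh(slice_img: Slice, batch: List[Tuple[Sequence[float], float]]) -> None:
    """One refresh: fold in r new (scanline, tilt angle) pairs; size stays fixed."""
    for scanline, theta in batch:
        backproject_scanline(slice_img, scanline, theta)
```

Starting from a 3x3 zero slice, a refresh with `([0, 1, 0], 0.0)` raises the middle column; a second refresh with `([0, 1, 0], math.pi / 2)` adds the middle row, leaving `[[0, 1, 0], [1, 2, 1], [0, 1, 0]]`.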
NCMIR Target Platform
• Multi-user, heterogeneous resources
  – NCMIR cluster
    • SGI Indigo2, SGI Octane, SUN ULTRA, SUN Enterprise
    • IRIX, Solaris
  – Meteor cluster
    • Pentium III dual processors
    • Linux, PBS
  – Blue Horizon
    • AIX, LoadLeveler, Maui Scheduler
[Figure: the resources are connected by a network]
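On-line PT Architecture
[Figure: a preprocessor splits each projection into scanlines and sends them to several ptomo processes; each ptomo reconstructs its slices and sends them to a writer, which assembles the tomogram]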
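The dataflow in the architecture figure can be sketched with threads and queues. This is an illustration under stated assumptions: threads stand in for processes on different hosts, the round-robin assignment of scanlines and the `sum()` "reconstruction" are placeholders, and none of these function names come from the actual implementation.

```python
import queue
import threading
from typing import Dict, List


def preprocessor(projection: List[List[float]], inboxes: List[queue.Queue]) -> None:
    """Split a projection into scanlines; hand them to ptomo inboxes round robin."""
    for y, scanline in enumerate(projection):
        inboxes[y % len(inboxes)].put((y, scanline))
    for box in inboxes:
        box.put(None)  # end-of-projection marker for each ptomo


def ptomo(inbox: queue.Queue, to_writer: queue.Queue) -> None:
    """Placeholder reconstruction: turn each scanline into a slice update."""
    while (item := inbox.get()) is not None:
        y, scanline = item
        to_writer.put((y, sum(scanline)))  # stand-in for real per-slice work
    to_writer.put(None)  # tell the writer this ptomo is finished


def writer(to_writer: queue.Queue, n_ptomo: int, tomogram: Dict[int, float]) -> None:
    """Accumulate slice updates until every ptomo has reported completion."""
    finished = 0
    while finished < n_ptomo:
        item = to_writer.get()
        if item is None:
            finished += 1
        else:
            y, value = item
            tomogram[y] = tomogram.get(y, 0.0) + value


def run_pipeline(projection: List[List[float]], n_ptomo: int = 3) -> Dict[int, float]:
    inboxes = [queue.Queue() for _ in range(n_ptomo)]
    to_writer: queue.Queue = queue.Queue()
    tomogram: Dict[int, float] = {}
    threads = [threading.Thread(target=ptomo, args=(box, to_writer)) for box in inboxes]
    threads.append(threading.Thread(target=writer, args=(to_writer, n_ptomo, tomogram)))
    for t in threads:
        t.start()
    preprocessor(projection, inboxes)
    for t in threads:
        t.join()
    return tomogram
```

For instance, `run_pipeline([[1, 2], [3, 4], [5, 6]], n_ptomo=2)` returns one entry per slice, `{0: 3.0, 1: 7.0, 2: 11.0}`. Keeping the writer as a single collector mirrors the figure, where all ptomo output converges on one process.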
On-line PT Design
1) Frame on-line parallel tomography as a tunable application
   – Resources are limited and dynamic
   – Availability of alternate configurations [Chang, et al]
     • each configuration corresponds to different output quality and resource usage
2) Couple it with a user-directed application-level scheduler (AppLeS)
   – adaptive scheduler
   – promote application performance
On-line PT Configuration
• Triple: (f, r, su)
• Reduction factor (f)
  – Reduce resolution of data → reduce both computation and communication
• Projections per refresh (r)
  – Reduce refinement frequency → reduce communication
• Service units (su)
  – Increase cost of execution → increase computational power
User Preferences
• Best configuration: (f, r, su) = (1, 1, 0)
• Several possible configurations → user specifies bounds
  – projections should be at least 256x256
    • 1 ≤ f ≤ 4 or 1 ≤ f ≤ 8
  – user could tolerate up to a 10-minute wait
    • 1 ≤ r ≤ 13
  – reasonable upper bound
    • 0 ≤ su ≤ (50 × acquisition period × c)
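The user's bounds define the candidate space the scheduler searches. A minimal sketch, assuming integer f and r and service units sampled at a fixed step; `Triple`, `max_reduction_factor`, and `candidate_triples` are hypothetical names, and the step size is an illustrative choice rather than anything from the slides.

```python
from itertools import product
from typing import List, NamedTuple


class Triple(NamedTuple):
    f: int   # reduction factor
    r: int   # projections per refresh
    su: int  # service units


def max_reduction_factor(projection_side: int, min_side: int = 256) -> int:
    """Largest integer f that keeps reduced projections at least min_side per side."""
    return projection_side // min_side


def candidate_triples(f_max: int, r_max: int, su_max: int, su_step: int) -> List[Triple]:
    """All triples with 1 <= f <= f_max, 1 <= r <= r_max, 0 <= su <= su_max."""
    return [Triple(f, r, su)
            for f, r, su in product(range(1, f_max + 1),
                                    range(1, r_max + 1),
                                    range(0, su_max + 1, su_step))]
```

`max_reduction_factor(1024)` is 4 and `max_reduction_factor(2048)` is 8, which would reproduce the two f bounds above if the raw projections were 1024x1024 or 2048x2048 (the slide does not state the raw size). With f ≤ 4, r ≤ 13, and su in {0, 50, 100}, there are 4 × 13 × 3 = 156 candidates, starting with the best configuration (1, 1, 0).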
User-directed
• Feasible?
  – Use dynamic load information
  – feasible if a work allocation is found
• Better?
  – e.g.
    1. (1, 6, 4): best f
    2. (2, 2, 8): good su/r
    3. (2, 1, 20): best r
[Figure axes: reduction factor, projections per refresh, service units]
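User-directed AppLeS
[Flowchart: the user generates a request; the User-directed AppLeS processes it and tries to find a work allocation for candidate triples. If none is feasible, the user adjusts the request. Otherwise the feasible triples are displayed and the user reviews them: if the user rejects all, the request is adjusted; if the user accepts one, on-line PT is executed with it.]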
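The request/review loop in the flowchart can be sketched as a function that takes the scheduler and user steps as callbacks. The callback names and the `max_rounds` cap (added only so the sketch always terminates) are assumptions, not part of the AppLeS interface.

```python
from typing import Callable, List, Optional, Tuple

Triple = Tuple[int, int, int]  # (f, r, su)


def user_directed_session(
    request: dict,
    find_feasible: Callable[[dict], List[Triple]],
    review: Callable[[List[Triple]], Optional[Triple]],
    adjust: Callable[[dict], dict],
    max_rounds: int = 10,
) -> Optional[Triple]:
    """Repeat: process request, display feasible triples, accept one or adjust."""
    for _ in range(max_rounds):
        triples = find_feasible(request)  # scheduler: triple search + work allocation
        if triples:
            choice = review(triples)      # user reviews the displayed triples
            if choice is not None:
                return choice             # accepted: run on-line PT with this triple
        request = adjust(request)         # infeasible, or user rejected all
    return None                           # gave up after max_rounds
```

In a toy run where a request is infeasible until its hypothetical `f_max` reaches 2, the loop adjusts once and then returns the accepted triple `(2, 1, 20)`.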
Triple Search
• Search parameter space
  – If a triple satisfies the constraints → feasible
• Constrained optimization problem based on soft real-time execution
  – compute constraint
  – transfer constraint
• Heuristics to reduce search space
  – e.g. assume the user will always choose (1,2,1) over (1,2,4)
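The heuristic example is a dominance rule: (1,2,1) matches (1,2,4) in f and r and costs fewer service units. A minimal sketch of pruning dominated triples, assuming smaller is better in every coordinate (consistent with (1, 1, 0) being the best configuration); the scheduler's actual heuristics may differ.

```python
from typing import Iterable, List, Tuple

Triple = Tuple[int, int, int]  # (f, r, su); smaller is better in each coordinate


def dominates(a: Triple, b: Triple) -> bool:
    """True if a is no worse than b in every coordinate and differs somewhere."""
    return all(x <= y for x, y in zip(a, b)) and a != b


def prune_dominated(triples: Iterable[Triple]) -> List[Triple]:
    """Keep only triples that no other candidate dominates (quadratic scan)."""
    ts = list(triples)
    return [t for t in ts if not any(dominates(o, t) for o in ts)]
```

`prune_dominated([(1, 2, 1), (1, 2, 4)])` keeps only `(1, 2, 1)`, while the three triples from the "Better?" example, (1,6,4), (2,2,8), and (2,1,20), all survive because each wins in some coordinate; those are exactly the trade-offs worth showing the user.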
Work Allocation
[Figure: the work allocation is determined by compute constraints (cpu availability, processor availability), transfer constraints (ptomo-to-writer bandwidth, subnet-to-writer bandwidth), and user constraints (cost)]
• Multiple mixed-integer programs → approximate solution
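The slides solve mixed-integer programs approximately; the sketch below swaps in a much simpler greedy heuristic only to show how compute and transfer limits jointly bound each host's share. It assumes per-host capacities already expressed as slices per refresh period, and it ignores the shared subnet-to-writer bandwidth and cost; `greedy_allocation` is a hypothetical name.

```python
from typing import Dict, Optional


def greedy_allocation(num_slices: int,
                      compute_cap: Dict[str, int],
                      transfer_cap: Dict[str, int]) -> Optional[Dict[str, int]]:
    """Assign slices to hosts without exceeding either per-host limit.

    compute_cap[h]:  slices host h can reconstruct within one refresh period
    transfer_cap[h]: slices host h can send to the writer within that period
    Returns None when the limits cannot cover every slice (infeasible triple).
    """
    # A host's usable capacity is the tighter of its two limits.
    caps = {h: min(compute_cap[h], transfer_cap.get(h, 0)) for h in compute_cap}
    alloc: Dict[str, int] = {}
    remaining = num_slices
    for host in sorted(caps, key=caps.get, reverse=True):  # largest capacity first
        take = min(caps[host], remaining)
        if take > 0:
            alloc[host] = take
            remaining -= take
    return alloc if remaining == 0 else None
```

With 10 slices, compute limits {a: 6, b: 5, c: 2}, and transfer limits {a: 4, b: 5, c: 9}, host a is capped at 4 by bandwidth and the result is {b: 5, a: 4, c: 1}; asking for 20 slices returns None because the combined capacity is only 11.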
Experiments
• Impact of dynamic information on scheduler performance
• Usefulness of tunability in Grid environments
• Scheduling latency
Dynamic Information
• We fix the triple and let the schedulers determine the work allocation

                  Infinite bandwidth   Dynamic bandwidth
  Dedicated cpu   wwa                  wwa+bw
  Dynamic cpu     wwa+cpu              AppLeS
Simulation
• Evaluate schedulers
  – Repeatability
  – Long makespan
  – Several resource environments
• Simgrid (Casanova [CCGrid'2001])
  – API for evaluating scheduling algorithms
    • tasks
    • resources modeled using traces
  – e.g. Parameter sweep applications [HCW'00]
• Simtomo
Performance Metric
• Relative refresh lateness
[Figure: relative refresh lateness defined in terms of the expected refresh period and the actual refresh period]
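The exact formula for this metric was an image and is not recoverable here. The sketch below is a guess labeled as such: it assumes the lateness of one refresh is the amount, in seconds, by which the actual period exceeds the expected period (early refreshes count as zero), which is consistent with the seconds axis used in later plots. `mean_refresh_lateness` is a hypothetical name.

```python
from typing import Sequence


def mean_refresh_lateness(expected: float, actual: Sequence[float]) -> float:
    """Mean seconds by which actual refresh periods exceed the expected period.

    ASSUMPTION: lateness of a single refresh = max(0, actual - expected).
    """
    return sum(max(0.0, a - expected) for a in actual) / len(actual)
```

For an expected period of 60 s and actual periods of 60, 75, 45, and 90 s, the assumed metric is (0 + 15 + 0 + 30) / 4 = 11.25 s.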
NCMIR Experiments
• Traces (8 machines)
  – 8-hour workday on March 8th, 2001
• Ran simulations throughout the day at 10-minute intervals
[Figure: timeline from 8:00 am to 4:00 pm]
Perfect Load Predictions
[Plot: mean relative refresh lateness (log scale, 10^0 to 10^4) vs. hours since 3/8/2001 8:00 PST (0 to 8) for wwa, wwa+cpu, wwa+bw, and AppLeS]
Imperfect Load Predictions
[Plot: mean relative refresh lateness (log scale, 10^0 to 10^4) vs. hours since 3/8/2001 8:00 PST (0 to 8) for wwa, wwa+cpu, wwa+bw, and AppLeS]
Synthetic Grids
• Bandwidth predictability
  – Average prediction error
  – p_i ∈ {L, M, H}
  – p1 p2 p3
    • e.g. LMH
  – 27 types
  – 2510 Grids × 4 schedulers → 10,040 simulations
[Figure: three clusters (cluster1, cluster2, cluster3) connected to the writer by links with predictability p1, p2, p3]
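The count of grid types follows from labelling each of the three links with one of three levels, giving 3^3 = 27 combinations. A small sketch that enumerates them; the level letters are the slides' own, and `grid_types` is a hypothetical helper.

```python
from itertools import product
from typing import List

LEVELS = ("L", "M", "H")  # the three categories used on the slides


def grid_types(num_links: int) -> List[str]:
    """Every way to label num_links links with a level, e.g. 'LMH' for 3 links."""
    return ["".join(p) for p in product(LEVELS, repeat=num_links)]
```

`len(grid_types(3))` is 27, and the simulation count checks out as 2510 × 4 = 10,040. The same enumeration with five links gives 3^5 = 243 types, the number used in the tunability experiments.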
Relative Scheduler Performance
[Bar chart: number of runs (0 to 3000) in which each scheduler (wwa, wwa+cpu, wwa+bw, AppLeS) ranked 1st, 2nd, 3rd, and 4th]
Values shown on the chart: 705.89, 658.91, 127.10, 1.07
Partial Ordering
• Performance vs. bandwidth predictability
• Grid predictability
  – Partial orders using p1 p2 p3
  – Comparable / not comparable
    • e.g. HML is comparable to HLL
    • e.g. HLM is not comparable to LHM
• HHH, HHM, HMM, HLM, MLM, LLM, LLL
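Comparability here is the componentwise order on the three link labels. A minimal sketch, assuming the order L < M < H for each link (comparability does not depend on which end counts as "better"); the helper names are hypothetical.

```python
from typing import Sequence

RANK = {"L": 0, "M": 1, "H": 2}  # assumed ordering of the three levels


def leq(a: str, b: str) -> bool:
    """Componentwise order: a <= b iff every link level in a is <= that in b."""
    return all(RANK[x] <= RANK[y] for x, y in zip(a, b))


def comparable(a: str, b: str) -> bool:
    return leq(a, b) or leq(b, a)


def is_descending_chain(types: Sequence[str]) -> bool:
    """True if each type is >= the next, so by transitivity all are comparable."""
    return all(leq(b, a) for a, b in zip(types, types[1:]))
```

HML and HLL differ only in the middle link (M vs. L), so they are comparable; HLM and LHM each win on a different link, so they are not. The sequence HHH, HHM, HMM, HLM, MLM, LLM, LLL lowers exactly one link at each step, which makes it a chain in this order.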
Example Partial Order
[Plot: relative refresh lateness (seconds, log scale 10^0 to 10^4) for grid types HHH, HHM, HMM, HLM, MLM, LLM, LLL, for wwa, wwa+cpu, wwa+bw, and AppLeS]
Tunability Experiments
• How useful is tunability?
  – variability
• Fixed topology
  – categorized traces
    • L, M, H
  – v1 v2 v3 v4 v5
  – 243 Grid types
[Figure: writer connected to cluster1, cluster2, and a supercomputer; links labeled v1 to v5]
Tunability Experiments
• Run over a 2-day period
  – back-to-back
  – assume single user model
    • f, r, su
• Set of triples chosen
  – T = {1,…,61}
[Figure: 3-D plot of the chosen triples; axes f (0 to 8), r (0 to 15), su (0 to 6 × 10^4)]
Tunability Results
[Bar chart: fraction of changes (0 to 1) for each parameter f, r, su]
• Count how many times a triple changed per 2-day simulation
• e.g.
  – 12.9%
  – 25.7%
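Counting changes per parameter over a simulation's sequence of chosen triples can be sketched as follows. Normalizing by the number of consecutive scheduling decisions is an assumption; the slides do not say how the plotted fraction is normalized. `change_fractions` is a hypothetical name.

```python
from typing import Dict, List, Tuple

Triple = Tuple[int, int, int]  # (f, r, su)


def change_fractions(chosen: List[Triple]) -> Dict[str, float]:
    """Fraction of consecutive decisions in which each parameter changed."""
    steps = list(zip(chosen, chosen[1:]))
    counts = {"f": 0, "r": 0, "su": 0}
    if not steps:
        return {k: 0.0 for k in counts}
    for prev, cur in steps:
        for name, a, b in zip(("f", "r", "su"), prev, cur):
            if a != b:
                counts[name] += 1
    return {k: v / len(steps) for k, v in counts.items()}
```

For the sequence (1,2,0), (1,2,8), (2,2,8), (2,3,0), (2,3,0), there are four transitions; f and r each change once and su changes twice, giving fractions 0.25, 0.25, and 0.5.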
Scheduling Latency
[Histogram: number of experiments (0 to 7000) vs. seconds (0 to 10)]
• Time to search for feasible triples
• e.g.
  – 88% under 1 sec
  – 63% under 1 sec
Conclusions and Future Work
• Grid-enabled version of on-line parallel tomography
  – Tunable application
• Tunability is useful in Grid environments
  – User-directed AppLeS
• Importance of bandwidth predictability
  – e.g. rescheduling
• Scheduling latency is nominal
• Production use