1
Resolution of large symmetric eigenproblems on a world-wide grid
Laurent Choy, Serge Petiton, Mitsuhisa Sato
CNRS/LIFL
HPCS Lab., University of Tsukuba
2nd NEGST workshop at Tokyo
May 28-29th, 2007
2
Outline
Introduction
Distribution of the numerical method
Experiments
  Experiments on world-wide grids: platforms, numerical settings
  Experiments on Grid'5000: motivations, platforms, numerical settings
  Results
YML
  Progress of YML
  YvetteML workflow of the real symmetric eigenproblem
  First experiments
Conclusion
4
Introduction
Huge number of nodes connected to the Internet
  Clusters and NOWs of institutions, PCs of individual users
  Volunteer
Constant availability of nodes, on-demand access
HPC and large Grid Computing are complementary
  We do not target the highest performance
  We target a different community of users
Why the real symmetric eigenproblem?
  Requires a lot of resources on the nodes
  Communications, synchronization points
  Useful problem
  Few similar studies for very large Grid Computing
6
Distribution of the numerical method (1/2)
Real symmetric eigenproblem
  $Au = \lambda u$, A real symmetric
Main steps:
  Lanczos tridiagonalization
    $T = Q^t A Q$, T real symmetric tridiagonal
    Data accessed by means of MVP (matrix-vector product)
  Bisection and Inverse Iteration
    $Tv = \lambda v$; the eigenvalues of T (Ritz eigenvalues) approximate those of A
    Communication-free parallelism: task-farming
  Ritz eigenvectors computations (u)
  Accuracy tests $\|Au - \lambda u\|_2 < \epsilon$
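The tridiagonalization step can be illustrated with a minimal sketch in pure Python. It assumes a small dense matrix held in memory, a random start vector and full reorthogonalization; the names `matvec`, `lanczos` and `random_symmetric` are illustrative and are not taken from the authors' implementation, which is distributed, out-of-core and restarted. As on the slide, A is touched only through the MVP.

```python
import math
import random


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def matvec(A, x):
    # MVP: the only operation through which the solver accesses A.
    return [dot(row, x) for row in A]


def lanczos(mvp, n, m, seed=0):
    """Run up to m Lanczos steps on an n x n symmetric operator.

    Returns (alpha, beta, Q): the diagonal and off-diagonal of the
    tridiagonal matrix T = Q^t A Q, and the Lanczos vectors (columns of Q).
    """
    rng = random.Random(seed)
    q = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    norm = math.sqrt(dot(q, q))
    Q = [[x / norm for x in q]]
    alpha, beta = [], []
    for j in range(m):
        w = mvp(Q[j])
        alpha.append(dot(w, Q[j]))
        # Full reorthogonalization (two passes) against every vector so far;
        # this also removes the alpha_j q_j and beta_{j-1} q_{j-1} terms.
        for _ in range(2):
            for qi in Q:
                c = dot(w, qi)
                w = [wk - c * qk for wk, qk in zip(w, qi)]
        b = math.sqrt(dot(w, w))
        if j == m - 1 or b < 1e-12:  # done, or invariant subspace found
            break
        beta.append(b)
        Q.append([x / b for x in w])
    return alpha, beta, Q


def random_symmetric(n, seed=1):
    # Hypothetical test matrix, not the matrix used in the experiments.
    rng = random.Random(seed)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            A[i][j] = A[j][i] = rng.uniform(-1.0, 1.0)
    return A


A = random_symmetric(8)
alpha, beta, Q = lanczos(lambda x: matvec(A, x), n=8, m=5)
```

Here `alpha` and `beta` hold the diagonal and off-diagonal of the 5 x 5 matrix T, and `Q` holds the Lanczos vectors needed later to form the Ritz eigenvectors.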
7
Distribution of the numerical method (2/2)
Reducing the memory usage
  Out-of-core
  Restarted scheme
    Reorthogonalization, Bisection, Inverse Iteration
    Reduces the disk usage too
Volume of communications
  Data-persistence (A and Q)
Number of communications
  Task-farming
Other issue to be improved
  Distribution of A
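Task-farming suits bisection because each eigenvalue of the tridiagonal matrix T can be located independently of the others. The sketch below is one way to show this, assuming the classical Sturm-sequence count and a local thread pool standing in for remote workers; the function names are hypothetical and the code is not the authors' OmniRPC implementation.

```python
from concurrent.futures import ThreadPoolExecutor


def count_below(alpha, beta, x):
    """Number of eigenvalues of T strictly below x (Sturm sequence count)."""
    count = 0
    d = 1.0
    for i in range(len(alpha)):
        off = beta[i - 1] ** 2 if i > 0 else 0.0
        d = alpha[i] - x - off / d
        if d == 0.0:
            d = 1e-300  # avoid a division by zero at the next step
        if d < 0.0:
            count += 1
    return count


def kth_eigenvalue(alpha, beta, k, tol=1e-12):
    """k-th smallest eigenvalue of T (k = 0, 1, ...) by bisection."""
    n = len(alpha)
    # Gershgorin discs give an interval that encloses the whole spectrum.
    radius = [
        (abs(beta[i - 1]) if i > 0 else 0.0) + (abs(beta[i]) if i < n - 1 else 0.0)
        for i in range(n)
    ]
    lo = min(a - r for a, r in zip(alpha, radius))
    hi = max(a + r for a, r in zip(alpha, radius))
    while hi - lo > tol * max(1.0, abs(lo), abs(hi)):
        mid = 0.5 * (lo + hi)
        if count_below(alpha, beta, mid) > k:
            hi = mid  # at least k+1 eigenvalues lie below mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


def all_eigenvalues(alpha, beta, workers=4):
    # Each index k is an independent task: no communication between tasks.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda k: kth_eigenvalue(alpha, beta, k),
                             range(len(alpha))))


# Small example: T = tridiag(-1, 2, -1) of order 4.
eigs = all_eigenvalues([2.0, 2.0, 2.0, 2.0], [-1.0, -1.0, -1.0])
```

Every task needs only the two short arrays `alpha` and `beta`, so the data sent to each worker is small.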
9
World-wide grid experiments: Experimental platforms, numerical settings (1/2)
Computing and network resources
  University of Tsukuba
    Homogeneous dedicated clusters
    Dual Xeon ~3 GHz, 1 to 4 GB
  University of Lille 1
    Heterogeneous NOWs
    Celeron 1.4 GHz to P4 3.2 GHz
    128 MB to 1 GB
    Shared with students
  Internet
10
World-wide grid experiments: Experimental platforms, numerical settings (2/2)
4 platforms
  OmniRPC
  2 local platforms: 29 / 58 nodes, Lille
  2 world-wide platforms
    58 (29 Lille + 29 Tsukuba dual-proc.)
    116 (58 Lille + 58 Tsukuba dual-proc.)
Matrix
  N=47792
  2.5 million elements, avg 48 nnz/row
Parameters
  m=10, 15, 20, 25
  k=1, 2, 3, 4
11
Grid'5000 experiments: Presentation, motivations
Up to 9 sites distributed in France
Dedicated PCs with reservation policy
Fast and dedicated network
  RENATER (1 Gbit/s to 10 Gbit/s)
PCs are homogeneous (few exceptions)
Homogeneous environment (deployment strategy)
For those experiments
  Orsay: up to 300 single-CPU nodes
  Lille: up to 60 single-CPU nodes
  Nice (Sophia): up to 60 dual-CPU nodes
  Rennes: up to 70 dual-CPU nodes
12
Grid'5000 experiments: Platforms and numerical settings (1/2)
Step 1: Goal: improving previous analysis
Platforms
  29: Orsay, single-proc
  58: Orsay, single-proc
  58: Lille, Sophia dual-proc
  116: Orsay, Sophia dual-proc (1 core/proc)
  116: Orsay, Lille, Sophia dual-proc (1 core/proc), 1 process/dual-processor
Numerical settings
  Matrix: N=47792, 2.5 million elements, avg 48 nnz/row
  Parameters
    m=10, 15, 20, 25
    k=1, 2, 3, 4
13
Grid'5000 experiments: Platforms and numerical settings (2/2)
Step 2: Goal: increasing the size of the problem
In progress
  N=430128, 193 million elements
  7 OmniRPC relay nodes, 206 CPU, 3 sites
  11 OmniRPC relay nodes, 412 CPU, 4 sites
  k=1, m=15
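A rough storage estimate helps to see why the larger matrix calls for distribution and out-of-core techniques. The sketch below assumes that "elements" means stored nonzeros, a CSR layout with 8-byte values and 4-byte indices, and 8-byte entries for the Lanczos vectors; none of these storage details is stated in the slides, and the helper names are hypothetical.

```python
def csr_bytes(n_rows, nnz, value_bytes=8, index_bytes=4):
    """Bytes for an assumed CSR layout: values + column indices + row pointers."""
    return nnz * (value_bytes + index_bytes) + (n_rows + 1) * index_bytes


def lanczos_basis_bytes(n_rows, m, value_bytes=8):
    """Bytes for m Lanczos vectors (the columns of Q) of length n_rows."""
    return n_rows * m * value_bytes


MIB = 1024 ** 2

# Step 1 matrix: N=47792, 2.5 million elements.
step1_matrix = csr_bytes(47792, 2_500_000)
# Step 2 matrix: N=430128, 193 million elements.
step2_matrix = csr_bytes(430128, 193_000_000)
# Columns of Q for step 2 with m=15.
step2_basis = lanczos_basis_bytes(430128, 15)

for label, size in [("step 1 matrix", step1_matrix),
                    ("step 2 matrix", step2_matrix),
                    ("step 2 Q, m=15", step2_basis)]:
    print(f"{label}: {size / MIB:.1f} MiB")
```

Under these assumptions the step 1 matrix takes a few tens of MiB, while the step 2 matrix takes more than 2 GiB, which exceeds the 128 MB to 1 GB of memory quoted for the Lille machines.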
15
World-wide grid experiments: Results
[Plot not recovered. Legend, by number of nodes:]
  116: single-proc. Orsay, dual-proc. Tsukuba (all proc. used)
  58: single-proc. Lille, dual-proc. Tsukuba (all proc. used)
  58: single-proc. Lille
  29: single-proc. Lille
16
Grid'5000 experiments, step 1: Results
[Plot not recovered. Legend, by number of nodes:]
  116: single-proc. Orsay, single-proc. Lille, dual-proc. Sophia (1 proc. used)
  116: single-proc. Orsay, dual-proc. Sophia (all proc. used)
  58: single-proc. Lille, dual-proc. Sophia (all proc. used)
  58: single-proc. Orsay
  29: single-proc. Orsay
17
Grid'5000 experiments, step 2: Results
Details for N=430128, m=15, k=1 (wall-clock times in seconds)

Number of CPUs                                     206      412
Lanczos tridiagonalization
  Send new column of Q                              22       20
  MVP                                            10106    12311
  Reorthog.                                        129      159
Bisection + Inverse Iteration                       <1        9
Ritz eigenvector                                     9       11
Tests $\|Au - \lambda u\| < \epsilon$              691      810
Wall-clock time                                  10962    13150
Evaluation of the wall-clock time for 1 MVP with the matrix A
  In the tridiagonalization: 15 (m) * 5 (nb restarts) = 75 MVPs
    134 sec (206 cpu) and 164 sec (412 cpu) per MVP
  In the tests of convergence: 5 (nb restarts) MVPs
    138 sec (206 cpu) and 162 sec (412 cpu) per MVP
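The per-MVP figures follow from dividing the total MVP times reported on this slide by the number of MVPs. A short check of that arithmetic, using only numbers from the slide (the variable names are illustrative):

```python
m, restarts = 15, 5
mvps_tridiag = m * restarts  # MVPs performed during the tridiagonalization
mvps_tests = restarts        # one MVP per restart in the convergence tests

# Total wall-clock seconds from the slide, keyed by number of CPUs.
mvp_total = {206: 10106, 412: 12311}
test_total = {206: 691, 412: 810}

per_mvp_tridiag = {cpus: t / mvps_tridiag for cpus, t in mvp_total.items()}
per_mvp_tests = {cpus: t / mvps_tests for cpus, t in test_total.items()}

for cpus in (206, 412):
    print(f"{cpus} cpu: {per_mvp_tridiag[cpus]:.1f} s/MVP (tridiagonalization), "
          f"{per_mvp_tests[cpus]:.1f} s/MVP (tests)")
```

The quotients agree with the quoted 134/164 and 138/162 seconds once truncated to whole seconds, and the cost per MVP is higher on 412 CPUs than on 206.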
19
Progress of YML
YML 1.0.5
  Stability, error reporting
  Collections of data
    out-of-core
  Variable lists of parameters
  Parameters in/out of the Workflow
Mainly developed at the PRiSM laboratory, University of Versailles
  http://yml.prism.uvsq.fr/
  Olivier Delannoy, Nahid Emad
20
Resolution of the eigenproblem with YML
No data persistence
  Future work: binary cache
Re-usability / aggregation of components
21
Experiments with YML & OmniRPC back-end
[Table not recovered. Columns: YML + OmniRPC back-end (wall-clock times in min); OmniRPC (wall-clock times in min); Overhead (in %)]
Sources of overhead
  No computation in the YvetteML workflow
  Scheduler, (un)packing the parameters
  Transfers of binaries
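The slide does not define the overhead column. A natural reading, given here as an assumption, is the extra wall-clock time of the YML + OmniRPC run relative to the plain OmniRPC run. The function name and the timings below are hypothetical and serve only to illustrate the formula; they are not measured values.

```python
def overhead_percent(t_with_yml, t_omnirpc_only):
    # Assumed definition: extra wall-clock time relative to the OmniRPC-only run.
    return 100.0 * (t_with_yml - t_omnirpc_only) / t_omnirpc_only


# Hypothetical timings in minutes, chosen only to illustrate the formula.
example = overhead_percent(t_with_yml=110.0, t_omnirpc_only=100.0)
print(f"overhead: {example:.1f} %")
```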
23
Conclusion (1/3)
Reminder of the scope of this work
  Large grid computing and HPC: complementary tools
    Used by people who have no access to HPC
    Significant computations (size of the problem)
  We do not (cannot) target high performance
    The resources are not dedicated
    Slow networks, heterogeneous machines, external perturbations, etc.
  Linear algebra problems are useful for many general applications
Differences with HPC and cluster computing
  We must not take a "speed-up" approach to the computations
  Recommendations to save resources on nodes
24
Conclusion (2/3)
We propose
  Scalable real symmetric eigensolver for large grids
    Next expected limit: disk space for much larger or very dense matrices
  Before implementing the method, key choices must be made
    Numerical methods and programming paradigms
      Bisection (task-farming)
      Restarted scheme (memory and disk)
      Out-of-core (memory)
      Data persistence (communication)
  New version of YML
    Workflow of the eigensolver and re-usable components
    In progress
25
Conclusion (3/3)
Topics of study for the eigensolver
  Improving the distribution of A
  Testing more matrices
    Different kinds of matrices (e.g. sparse, dense)
    Larger matrices
  Scheduling level
    Adapting the workload balancing to the heterogeneity of the platforms
Current and future work on YML
  Finishing the multi back-end support
  Binary cache