1 Resolution of large symmetric eigenproblems on a world-wide grid Laurent Choy, Serge Petiton,...


Resolution of large symmetric eigenproblems on a world-wide grid

Laurent Choy, Serge Petiton, Mitsuhisa Sato
CNRS/LIFL, HPCS Lab., University of Tsukuba

2nd NEGST workshop, Tokyo, May 28-29, 2007


Outline

Introduction
Distribution of the numerical method
Experiments
- Experiments on world-wide grids: platforms, numerical settings
- Experiments on Grid'5000: motivations, platforms, numerical settings
Results
YML
- Progress of YML
- YvetteML workflow of the real symmetric eigenproblem
- First experiments
Conclusion


Introduction

Huge number of nodes connected to the Internet
- Clusters and NOWs (networks of workstations) of institutions, PCs of individual users, volunteer computing
- Constant availability of nodes, on-demand access
HPC and large-scale grid computing are complementary
- We do not target the highest performance
- We target a different community of users
Why the real symmetric eigenproblem?
- It requires a lot of resources on the nodes
- It involves communications and synchronization points
- It is a useful problem
- There are few similar studies for very large-scale grid computing


Distribution of the numerical method (1/2)

Real symmetric eigenproblem: $Au = \lambda u$, $A$ real symmetric
Main steps:
- Lanczos tridiagonalization: $T = Q^T A Q$, $T$ real symmetric tridiagonal; $A$ is accessed only by means of matrix-vector products (MVP)
- Bisection and inverse iteration: $Tv = \lambda v$; the eigenvalues of $T$ (Ritz eigenvalues) approximate those of $A$; communication-free parallelism through task farming
- Ritz eigenvector computation ($u = Qv$) and accuracy test $\|Au - \lambda u\|_2 < \varepsilon$
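The Lanczos step touches $A$ only through matrix-vector products, which is what allows $A$ to stay distributed on the nodes. Below is a minimal sequential sketch in pure Python, not the authors' distributed implementation: the names `lanczos` and `laplacian_mvp` are hypothetical, full reorthogonalization is an assumption, and a 1-D Laplacian stands in for the real sparse matrix. It returns the diagonal and off-diagonal of $T = Q^T A Q$ together with the Lanczos vectors (the columns of $Q$).

```python
import math
import random
from typing import Callable, List, Tuple

Vector = List[float]


def dot(x: Vector, y: Vector) -> float:
    return sum(a * b for a, b in zip(x, y))


def lanczos(mvp: Callable[[Vector], Vector], n: int, m: int,
            seed: int = 0) -> Tuple[List[float], List[float], List[Vector]]:
    """m Lanczos steps with full reorthogonalization (illustrative sketch).

    A is only accessed through mvp(x) = A x. Returns alpha (diagonal of T),
    beta (off-diagonal of T) and the Lanczos vectors (columns of Q).
    """
    rng = random.Random(seed)
    q = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    nrm = math.sqrt(dot(q, q))
    Q: List[Vector] = [[x / nrm for x in q]]
    alpha: List[float] = []
    beta: List[float] = []
    for j in range(m):
        w = mvp(Q[j])                      # the only access to A
        alpha.append(dot(w, Q[j]))
        if j == m - 1:
            break
        # three-term recurrence: w = A q_j - alpha_j q_j - beta_{j-1} q_{j-1}
        w = [wi - alpha[j] * qi for wi, qi in zip(w, Q[j])]
        if j > 0:
            w = [wi - beta[j - 1] * qi for wi, qi in zip(w, Q[j - 1])]
        # full reorthogonalization against every stored Lanczos vector
        for qk in Q:
            c = dot(w, qk)
            w = [wi - c * qi for wi, qi in zip(w, qk)]
        b = math.sqrt(dot(w, w))
        if b < 1e-12:                      # invariant subspace: stop early
            break
        beta.append(b)
        Q.append([wi / b for wi in w])
    return alpha, beta, Q


def laplacian_mvp(x: Vector) -> Vector:
    """Stand-in sparse matrix: 1-D Laplacian, tridiag(-1, 2, -1)."""
    n = len(x)
    return [2.0 * x[i] - (x[i - 1] if i > 0 else 0.0)
            - (x[i + 1] if i < n - 1 else 0.0) for i in range(n)]


alpha, beta, Q = lanczos(laplacian_mvp, n=50, m=10)
```

With full reorthogonalization the columns of $Q$ stay orthonormal and $Q^T A Q$ reproduces the tridiagonal $T$ up to rounding; a restarted scheme would rerun this with a small $m$ from an improved starting vector.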


Distribution of the numerical method (2/2)

Reducing the memory usage
- Out-of-core
- Restarted scheme: cheaper reorthogonalization, bisection and inverse iteration; also reduces the disk usage
Reducing the volume of communications
- Data persistence (A and Q)
Reducing the number of communications
- Task farming
Other issue to be improved
- Distribution of A
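Bisection suits task farming because each subinterval of the spectrum can be searched independently using Sturm-sequence counts, so the tasks exchange no data. The sketch below is illustrative only: the helper names are hypothetical, a local `ThreadPoolExecutor` stands in for remote calls such as OmniRPC tasks (with CPython threads it shows the task decomposition, not a real speed-up), and the inverse iteration step is omitted.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple


def sturm_count(d: List[float], e: List[float], x: float) -> int:
    """Number of eigenvalues of T strictly below x (negative pivots of T - xI)."""
    count, q = 0, 1.0
    for i in range(len(d)):
        q = d[i] - x - (e[i - 1] ** 2 / q if i > 0 else 0.0)
        if q < 0.0:
            count += 1
        elif q == 0.0:
            q = 1e-300              # tiny positive pivot avoids division by zero
    return count


def gershgorin(d: List[float], e: List[float]) -> Tuple[float, float]:
    """Interval containing the whole spectrum of the tridiagonal T."""
    n, lo, hi = len(d), math.inf, -math.inf
    for i in range(n):
        r = (abs(e[i - 1]) if i > 0 else 0.0) + (abs(e[i]) if i < n - 1 else 0.0)
        lo, hi = min(lo, d[i] - r), max(hi, d[i] + r)
    return lo, hi


def eigs_in_interval(d, e, a: float, b: float, tol: float = 1e-12) -> List[float]:
    """One farmed task: bisect every eigenvalue lying in [a, b)."""
    out = []
    for k in range(sturm_count(d, e, a), sturm_count(d, e, b)):
        lo, hi = a, b               # invariant: count(lo) <= k < count(hi)
        while hi - lo > tol * max(1.0, abs(lo), abs(hi)):
            mid = 0.5 * (lo + hi)
            if sturm_count(d, e, mid) > k:
                hi = mid
            else:
                lo = mid
        out.append(0.5 * (lo + hi))
    return out


def farm_bisection(d, e, n_tasks: int = 4, workers: int = 4) -> List[float]:
    """Split the Gershgorin interval into independent tasks and merge results."""
    gl, gu = gershgorin(d, e)
    pad = 1e-8 * max(1.0, abs(gl), abs(gu))   # keep end eigenvalues inside
    gl, gu = gl - pad, gu + pad
    edges = [gl + (gu - gl) * i / n_tasks for i in range(n_tasks + 1)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(lambda ab: eigs_in_interval(d, e, ab[0], ab[1]),
                              zip(edges[:-1], edges[1:])))
    return sorted(x for part in parts for x in part)


print(farm_bisection([2.0, 2.0, 2.0], [-1.0, -1.0], n_tasks=3))
```

For the 3 x 3 matrix tridiag(-1, 2, -1) this should print values close to 2 - √2, 2 and 2 + √2. Since a task only needs $T$ and its two interval bounds, the number of communications is one request and one reply per task.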


World-wide grid experiments: experimental platforms, numerical settings (1/2)

Computing and network resources
- University of Tsukuba: homogeneous dedicated clusters, dual Xeon ~3 GHz, 1 to 4 GB
- University of Lille 1: heterogeneous NOWs, Celeron 1.4 GHz to P4 3.2 GHz, 128 MB to 1 GB, shared with students
- Both sites connected through the Internet


World-wide grid experiments: experimental platforms, numerical settings (2/2)

4 platforms, using OmniRPC
- 2 local platforms in Lille: 29 and 58 nodes
- 2 world-wide platforms: 58 nodes (29 Lille + 29 Tsukuba dual-proc.) and 116 nodes (58 Lille + 58 Tsukuba dual-proc.)
Matrix: N = 47792, 2.5 million elements, average 48 nnz/row
Parameters: m = 10, 15, 20, 25; k = 1, 2, 3, 4


Grid'5000 experiments: presentation, motivations

Up to 9 sites distributed in France
Dedicated PCs with a reservation policy
Fast and dedicated network: RENATER (1 Gbit/s to 10 Gbit/s)
PCs are homogeneous (with few exceptions)
Homogeneous environment (deployment strategy)
For these experiments:
- Orsay: up to 300 single-CPU nodes
- Lille: up to 60 single-CPU nodes
- Nice: up to 60 dual-CPU nodes
- Rennes: up to 70 dual-CPU nodes


Grid'5000 experiments: platforms and numerical settings (1/2)

Step 1 goal: improving the previous analysis
Platforms:
- 29: Orsay, single-proc.
- 58: Orsay, single-proc.
- 58: Lille, Sophia dual-proc.
- 116: Orsay, Sophia dual-proc. (1 core/proc.)
- 116: Orsay, Lille, Sophia dual-proc. (1 core/proc.)
1 process per dual-processor
Numerical settings:
- Matrix: N = 47792, 2.5 million elements, average 48 nnz/row
- Parameters: m = 10, 15, 20, 25; k = 1, 2, 3, 4


Grid'5000 experiments: platforms and numerical settings (2/2)

Step 2 goal: increasing the size of the problem (in progress)
- N = 430128, 193 million elements
- 3 sites: 7 OmniRPC relay nodes, 206 CPUs
- 4 sites: 11 OmniRPC relay nodes, 412 CPUs
- k = 1, m = 15
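A back-of-the-envelope estimate of what the step 2 problem puts on disk when it is handled out-of-core. The storage layout is our assumption, not taken from the slides: the "193 million elements" are taken to be the nonzeros of $A$, stored in CSR format with 8-byte values and 4-byte indices, and $Q$ holds $m$ double-precision vectors per restart. The function name is hypothetical.

```python
def storage_estimate(n: int, nnz: int, m: int,
                     value_bytes: int = 8, index_bytes: int = 4) -> dict:
    """Rough sizes in bytes, ASSUMING CSR storage for A and dense columns for Q."""
    # values + column indices for each nonzero, plus n + 1 row pointers
    csr_bytes = nnz * (value_bytes + index_bytes) + (n + 1) * index_bytes
    # m Lanczos vectors of length n (one restart of the restarted scheme)
    q_bytes = n * m * value_bytes
    return {"nnz_per_row": nnz / n, "csr_bytes": csr_bytes, "q_bytes": q_bytes}


est = storage_estimate(n=430128, nnz=193_000_000, m=15)
print(f"{est['nnz_per_row']:.1f} nnz/row, A ~ {est['csr_bytes'] / 1e9:.2f} GB, "
      f"Q ~ {est['q_bytes'] / 1e6:.1f} MB")
```

Under these assumptions this prints about 448.7 nonzeros per row, roughly 2.32 GB for $A$ and 51.6 MB for $Q$: the matrix, not the Krylov basis, dominates the disk footprint, and a restart with a small $m$ keeps $Q$ modest.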


World-wide grid experiments: results

[Figure: results on the world-wide platforms; curves for 116 (single-proc. Orsay + dual-proc. Tsukuba, all procs used), 58 (single-proc. Lille + dual-proc. Tsukuba, all procs used), 58 (single-proc. Lille) and 29 (single-proc. Lille)]


Grid'5000 experiments, step 1: results

[Figure: results on Grid'5000; curves for 116 (single-proc. Orsay + single-proc. Lille + dual-proc. Sophia, 1 proc. used), 116 (single-proc. Orsay + dual-proc. Sophia, all procs used), 58 (single-proc. Lille + dual-proc. Sophia, all procs used), 58 (single-proc. Orsay) and 29 (single-proc. Orsay)]


Grid'5000 experiments, step 2: results

Details for N = 430128, m = 15, k = 1 (wall-clock times in seconds)

                                              206 CPUs   412 CPUs
Lanczos tridiagonalization
  MVP                                            10106      12311
  Reorthogonalization                              129        159
  Sending new columns of Q                          22         20
Bisection + inverse iteration                      < 1          9
Ritz eigenvector                                     9         11
Accuracy test $\|Au - \lambda u\| < \varepsilon$   691        810
Total wall-clock time                            10962      13150

Evaluation of the wall-clock time of one MVP with the matrix A:
- In the tridiagonalization: 15 (m) × 5 (restarts) = 75 MVPs, i.e. 134 s (206 CPUs) and 164 s (412 CPUs) per MVP
- In the convergence tests: 5 (restarts) MVPs, i.e. 138 s (206 CPUs) and 162 s (412 CPUs) per MVP
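The per-MVP figures come from dividing the time spent in MVPs by the number of MVPs performed. A small sketch of that arithmetic, using the MVP and accuracy-test totals measured in step 2; the variable and function names are illustrative only.

```python
M, RESTARTS = 15, 5  # Lanczos steps per restart, number of restarts

# Seconds spent in MVPs, per CPU count (step 2 measurements)
totals = {
    206: {"tridiagonalization": 10106, "convergence_tests": 691},
    412: {"tridiagonalization": 12311, "convergence_tests": 810},
}


def per_mvp(total_seconds: float, n_mvp: int) -> float:
    """Average wall-clock time of one matrix-vector product."""
    return total_seconds / n_mvp


per_mvp_times = {
    cpus: (per_mvp(t["tridiagonalization"], M * RESTARTS),  # 75 MVPs
           per_mvp(t["convergence_tests"], RESTARTS))        # 1 MVP per restart
    for cpus, t in totals.items()
}
for cpus, (tri, tests) in per_mvp_times.items():
    print(f"{cpus} CPUs: {tri:.1f} s/MVP (tridiagonalization), "
          f"{tests:.1f} s/MVP (convergence tests)")
```

This gives 134.7 and 164.1 s per MVP in the tridiagonalization and 138.2 and 162.0 s in the convergence tests, matching the figures above to within a second: doubling the CPU count makes each MVP slower, not faster.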


Progress of YML

YML 1.0.5
- Stability, error reporting
- Collections of data
- Out-of-core
- Variable lists of parameters
- Parameters in/out of the workflow
Mainly developed at the PRiSM laboratory, University of Versailles
http://yml.prism.uvsq.fr/ (Olivier Delannoy, Nahid Emad)


Resolution of the eigenproblem with YML

- No data persistence
- Future work: binary cache
- Re-usability / aggregation of components


Experiments with YML & OmniRPC back-end

[Table: wall-clock times (min) with YML + OmniRPC back-end, wall-clock times (min) with OmniRPC alone, and overhead (%)]

Sources of overhead:
- No computation in the YvetteML workflow
- Scheduler, (un)packing of the parameters
- Transfers of binaries


Conclusion (1/3)

Reminder of the scope of this work
Large-scale grid computing and HPC are complementary tools
- Used by people who have no access to HPC
- Significant computations (size of the problem)
- We do not (cannot) target high performance
The resources are not dedicated
- Slow networks, heterogeneous machines, external perturbations, etc.
Linear algebra problems are useful for many general applications
Differences with HPC and cluster computing
- We must not take a "speed-up" approach to the computations
- Recommendations to save resources on the nodes


Conclusion (2/3)

We propose a scalable real symmetric eigensolver for large grids
- Next expected bounding limit: disk space for much larger or very dense matrices
Key choices must be made before implementing the method:
- Numerical methods and programming paradigms: bisection (task farming), restarted scheme (memory and disk), out-of-core (memory), data persistence (communication)
New version of YML (in progress)
- Workflow of the eigensolver and re-usable components


Conclusion (3/3)

Topics of study for the eigensolver
- Improving the distribution of A
- Testing more matrices: different kinds of matrices (e.g. sparse, dense), larger matrices
- Scheduling level: adapting the workload balancing to the heterogeneity of the platforms
Current and future work on YML
- Finishing the multi-back-end support
- Binary cache
