MueLu: The Trilinos Multigrid Framework · ML has a better scaling SPGEMM (slower in serial)...

29
MueLu: The Trilinos Multigrid Framework Andrey Prokopenko Jonathan Hu Sandia National Labs October 29, 2014 SAND2014-19281 PE Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000. SAND NO. 2011-XXXP Andrey Prokopenko MueLu: The Trilinos Multigrid Framework TUG 2014 1 / 19

Transcript of MueLu: The Trilinos Multigrid Framework · ML has a better scaling SPGEMM (slower in serial)...

Page 1: MueLu: The Trilinos Multigrid Framework · ML has a better scaling SPGEMM (slower in serial) Relative setup performance Relative solve performance Andrey Prokopenko MueLu: The Trilinos

MueLu: The Trilinos Multigrid Framework

Andrey Prokopenko

Jonathan Hu

Sandia National Labs

October 29, 2014

SAND2014-19281 PE

Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly

owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy’s National Nuclear Security

Administration under contract DE-AC04-94AL85000. SAND NO. 2011-XXXP

Andrey Prokopenko MueLu: The Trilinos Multigrid Framework TUG 2014 1 / 19

Page 2: MueLu: The Trilinos Multigrid Framework · ML has a better scaling SPGEMM (slower in serial) Relative setup performance Relative solve performance Andrey Prokopenko MueLu: The Trilinos

MueLu

Team

Andrey Prokopenko (SNL)

Tobias Wiesner (TUM)

Jonathan Hu (SNL)

Chris Siefert (SNL)

Ray Tuminaro (SNL)

Paul Tsuji (SNL)

First public release

Trilinos 11.12, October 2014

Andrey Prokopenko MueLu: The Trilinos Multigrid Framework TUG 2014 2 / 19

Page 3: MueLu: The Trilinos Multigrid Framework · ML has a better scaling SPGEMM (slower in serial) Relative setup performance Relative solve performance Andrey Prokopenko MueLu: The Trilinos

MueLu

Team

Andrey Prokopenko (SNL)

Tobias Wiesner (TUM)

Jonathan Hu (SNL)

Chris Siefert (SNL)

Ray Tuminaro (SNL)

Paul Tsuji (SNL)

First public release

Trilinos 11.12, October 2014

Andrey Prokopenko MueLu: The Trilinos Multigrid Framework TUG 2014 2 / 19

Page 4: MueLu: The Trilinos Multigrid Framework · ML has a better scaling SPGEMM (slower in serial) Relative setup performance Relative solve performance Andrey Prokopenko MueLu: The Trilinos

Algebraic Multigrid (AMG)

Spost1

Spost0

Spre1

Spre0

S2

A0

A1

A2

Main idea

Capture errors at multiple resolutions.

Andrey Prokopenko MueLu: The Trilinos Multigrid Framework TUG 2014 3 / 19

Page 5: MueLu: The Trilinos Multigrid Framework · ML has a better scaling SPGEMM (slower in serial) Relative setup performance Relative solve performance Andrey Prokopenko MueLu: The Trilinos

Algebraic Multigrid (AMG)

Spost1

Spost0

Spre1

Spre0

S2

A0

A1

A2

Two main components

Smoothers

Approximate solve on each level

‘‘Cheap’’ reduction of oscillatory error

(high energy)

SL ≈ A−1L on the coarsest level L

Grid transfers (prolongators and restrictors)

Main idea

Capture errors at multiple resolutions.

Andrey Prokopenko MueLu: The Trilinos Multigrid Framework TUG 2014 3 / 19

Page 6: MueLu: The Trilinos Multigrid Framework · ML has a better scaling SPGEMM (slower in serial) Relative setup performance Relative solve performance Andrey Prokopenko MueLu: The Trilinos

Algebraic Multigrid (AMG)

P2R2

P1R1

Ai = RiAi−1Pi

Spost1

Spost0

Spre1

Spre0

S2

A0

A1

A2

Two main components

Smoothers

Approximate solve on each level

‘‘Cheap’’ reduction of oscillatory error

(high energy)

SL ≈ A−1L on the coarsest level L

Grid transfers (prolongators and restrictors)

Data movement between levels

Reduction of smooth error (low energy)

Main idea

Capture errors at multiple resolutions.

Andrey Prokopenko MueLu: The Trilinos Multigrid Framework TUG 2014 3 / 19

Page 7: MueLu: The Trilinos Multigrid Framework · ML has a better scaling SPGEMM (slower in serial) Relative setup performance Relative solve performance Andrey Prokopenko MueLu: The Trilinos

Capabilities

Can use either Epetra (32-bit) or TpetraTemplate types: Local and global indices, scalar, compute node

Grid transfers

Smoothed and unsmoothed aggregation

Petrov-Galerkin

Energy minimization

Maxwell

Smoothers (Ifpack/Ifpack2)Relaxation: Jacobi, SOR, l1 Gauss-Seidel

Incomplete factorizations: ILU(k), ILUT, ILUTP*

Others: Chebyshev, additive Schwarz, Krylov, Vanka, . . .

Direct solvers (Amesos/Amesos2)KLU2, SuperLU, . . .

Load balancing (Zoltan/Zoltan2)RCB, multijagged (Zoltan2 only)

Andrey Prokopenko MueLu: The Trilinos Multigrid Framework TUG 2014 4 / 19

Page 8: MueLu: The Trilinos Multigrid Framework · ML has a better scaling SPGEMM (slower in serial) Relative setup performance Relative solve performance Andrey Prokopenko MueLu: The Trilinos

MueLu/ML Feature Comparison

Similarities

Algorithmic capabilities

Performance (with some caveats)

Simple application interfaces

Simple input decks

Differences

MueLu can solve problems with > 2.1b DOFs

MueLu can use Kokkos (MPI+X)

MueLu has much stronger unit testing than ML

ML has a better scaling SPGEMM (slower in serial)

Relative setup performance Relative solve performance

Andrey Prokopenko MueLu: The Trilinos Multigrid Framework TUG 2014 5 / 19

Page 9: MueLu: The Trilinos Multigrid Framework · ML has a better scaling SPGEMM (slower in serial) Relative setup performance Relative solve performance Andrey Prokopenko MueLu: The Trilinos

Some Performance Results

Weak scalability of GMRES/SA-AMG pressure

solve on BG/Q

Strong scalability of GMRES/SA-AMG

pressure solve on BG/Q

Andrey Prokopenko MueLu: The Trilinos Multigrid Framework TUG 2014 6 / 19

Page 10: MueLu: The Trilinos Multigrid Framework · ML has a better scaling SPGEMM (slower in serial) Relative setup performance Relative solve performance Andrey Prokopenko MueLu: The Trilinos

User interfaces

Natural parameter lists (recommended)

Suitable for beginners and experts

Support most common use-cases

Provide a reasonable subset of all MueLu parameters

Fully validated

Hierarchical parameter lists

Suitable for experts

Reflect module dependencies in MueLu

ML-style parameter lists

Oriented toward former ML users

Strive to provide some backwards compability with MLBut: MueLu and ML have different defaults

C++ API

Through Stratimikos

Andrey Prokopenko MueLu: The Trilinos Multigrid Framework TUG 2014 7 / 19

Page 11: MueLu: The Trilinos Multigrid Framework · ML has a better scaling SPGEMM (slower in serial) Relative setup performance Relative solve performance Andrey Prokopenko MueLu: The Trilinos

Natural interface

1 <ParameterList name="MueLu">2 <Parameter name="verbosity" type="string" value="high"/>3 <Parameter name="max levels" type="int" value="10"/>4 <Parameter name="coarse: max size" type="int" value="2000"/>5 </ParameterList>

Uses reasonable defaults

Generates smoothed aggregation AMG

Andrey Prokopenko MueLu: The Trilinos Multigrid Framework TUG 2014 8 / 19

Page 12: MueLu: The Trilinos Multigrid Framework · ML has a better scaling SPGEMM (slower in serial) Relative setup performance Relative solve performance Andrey Prokopenko MueLu: The Trilinos

Natural interface

1 <ParameterList name="MueLu">2 <Parameter name="verbosity" type="string" value="high"/>3 <Parameter name="max levels" type="int" value="10"/>4 <Parameter name="coarse: max size" type="int" value="2000"/>5 <Parameter name="multigrid algorithm" type="string"6 value="unsmoothed"/>7 </ParameterList>

Generates unsmoothed aggregation AMG

Andrey Prokopenko MueLu: The Trilinos Multigrid Framework TUG 2014 8 / 19

Page 13: MueLu: The Trilinos Multigrid Framework · ML has a better scaling SPGEMM (slower in serial) Relative setup performance Relative solve performance Andrey Prokopenko MueLu: The Trilinos

Natural interface

1 <ParameterList name="MueLu">2 <Parameter name="verbosity" type="string" value="high"/>3 <Parameter name="max levels" type="int" value="10"/>4 <Parameter name="coarse: max size" type="int" value="2000"/>5 <Parameter name="multigrid algorithm" type="string"6 value="unsmoothed"/>7 <Parameter name="smoother: type" type="string"8 value="CHEBYSHEV"/>9 <ParameterList name="smoother: params">10 <Parameter name="chebyshev: degree" type="int" value="3"/>11 </ParameterList>12 </ParameterList>

Generates unsmoothed aggregation AMG

Use third degree polynomial smoother

Andrey Prokopenko MueLu: The Trilinos Multigrid Framework TUG 2014 8 / 19

Page 14: MueLu: The Trilinos Multigrid Framework · ML has a better scaling SPGEMM (slower in serial) Relative setup performance Relative solve performance Andrey Prokopenko MueLu: The Trilinos

Natural interface

1 <ParameterList name="MueLu">2 <Parameter name="verbosity" type="string" value="high"/>3 <Parameter name="max levels" type="int" value="10"/>4 <Parameter name="coarse: max size" type="int" value="2000"/>5 <Parameter name="multigrid algorithm" type="string"6 value="unsmoothed"/>7 <ParameterList name="level 2">8 <Parameter name="smoother: type" type="string"9 value="CHEBYSHEV"/>10 <ParameterList name="smoother: params">11 <Parameter name="chebyshev: degree" type="int" value="3"/>12 </ParameterList>13 </ParameterList>14 </ParameterList>

Generates unsmoothed aggregation AMG

Use third degree polynomial smoother on level 2

Use default smoother (symmetric Gauss-Seidel) for all other levelsAndrey Prokopenko MueLu: The Trilinos Multigrid Framework TUG 2014 8 / 19

Page 15: MueLu: The Trilinos Multigrid Framework · ML has a better scaling SPGEMM (slower in serial) Relative setup performance Relative solve performance Andrey Prokopenko MueLu: The Trilinos

MueLu’s master list

Single place for all MueLu parameters.

1 <parameter>2 <name>smoother: type</name>3 <type>string</type>4 <default>"RELAXATION"</default>5 <Poisson>"CHEBYSHEV"</Poisson>6 <description>Smoother type</description>7 <visible>true</visible>8 </parameter>

XSL transformations to

ParameterListUsed internally in MueLu

LaTEXUsed in User’s Manual

HTML

Used for website

Documentation is always consistent!

Andrey Prokopenko MueLu: The Trilinos Multigrid Framework TUG 2014 9 / 19

Page 16: MueLu: The Trilinos Multigrid Framework · ML has a better scaling SPGEMM (slower in serial) Relative setup performance Relative solve performance Andrey Prokopenko MueLu: The Trilinos

MueLu as a preconditioner in Belos

1 // Create A, B, X ...2 Teuchos::RCP<Tpetra::CrsMatrix<> > A;3 Teuchos::RCP<Tpetra::MultiVector<> > B, X;

4 // Construct preconditioner5 std::string optionsFile = "mueluOptions.xml";6 Teuchos::RCP<MueLu::TpetraOperator> mueLuPreconditioner =7 MueLu::CreateTpetraPreconditioner(A, optionsFile);8 // Construct problem9 Belos::LinearProblem<> problem(A, X, B);10 problem->setLeftPrec(mueLuPreconditioner);11 bool set = problem.setProblem();12 // Set Belos parameters13 Teuchos::ParameterList belosList;14 belosList.set("Maximum Iterations", 100);15 // Solve the problem16 Belos::BlockCGSolMgr<> solver(rcp(&problem,false), rcp(&

belosList,false));17 Belos::ReturnType ret = solver.solve();

Andrey Prokopenko MueLu: The Trilinos Multigrid Framework TUG 2014 10 / 19

Page 17: MueLu: The Trilinos Multigrid Framework · ML has a better scaling SPGEMM (slower in serial) Relative setup performance Relative solve performance Andrey Prokopenko MueLu: The Trilinos

MueLu as a preconditioner in Belos

1 // Create A, B, X ...2 Teuchos::RCP<Tpetra::CrsMatrix<> > A;3 Teuchos::RCP<Tpetra::MultiVector<> > B, X;4 // Construct preconditioner5 std::string optionsFile = "mueluOptions.xml";6 Teuchos::RCP<MueLu::TpetraOperator> mueLuPreconditioner =7 MueLu::CreateTpetraPreconditioner(A, optionsFile);

8 // Construct problem9 Belos::LinearProblem<> problem(A, X, B);10 problem->setLeftPrec(mueLuPreconditioner);11 bool set = problem.setProblem();12 // Set Belos parameters13 Teuchos::ParameterList belosList;14 belosList.set("Maximum Iterations", 100);15 // Solve the problem16 Belos::BlockCGSolMgr<> solver(rcp(&problem,false), rcp(&

belosList,false));17 Belos::ReturnType ret = solver.solve();

Andrey Prokopenko MueLu: The Trilinos Multigrid Framework TUG 2014 10 / 19

Page 18: MueLu: The Trilinos Multigrid Framework · ML has a better scaling SPGEMM (slower in serial) Relative setup performance Relative solve performance Andrey Prokopenko MueLu: The Trilinos

MueLu as a preconditioner in Belos

1 // Create A, B, X ...2 Teuchos::RCP<Tpetra::CrsMatrix<> > A;3 Teuchos::RCP<Tpetra::MultiVector<> > B, X;4 // Construct preconditioner5 std::string optionsFile = "mueluOptions.xml";6 Teuchos::RCP<MueLu::TpetraOperator> mueLuPreconditioner =7 MueLu::CreateTpetraPreconditioner(A, optionsFile);8 // Construct problem9 Belos::LinearProblem<> problem(A, X, B);10 problem->setLeftPrec(mueLuPreconditioner);11 bool set = problem.setProblem();

12 // Set Belos parameters13 Teuchos::ParameterList belosList;14 belosList.set("Maximum Iterations", 100);15 // Solve the problem16 Belos::BlockCGSolMgr<> solver(rcp(&problem,false), rcp(&

belosList,false));17 Belos::ReturnType ret = solver.solve();

Andrey Prokopenko MueLu: The Trilinos Multigrid Framework TUG 2014 10 / 19

Page 19: MueLu: The Trilinos Multigrid Framework · ML has a better scaling SPGEMM (slower in serial) Relative setup performance Relative solve performance Andrey Prokopenko MueLu: The Trilinos

MueLu as a preconditioner in Belos

1 // Create A, B, X ...2 Teuchos::RCP<Tpetra::CrsMatrix<> > A;3 Teuchos::RCP<Tpetra::MultiVector<> > B, X;4 // Construct preconditioner5 std::string optionsFile = "mueluOptions.xml";6 Teuchos::RCP<MueLu::TpetraOperator> mueLuPreconditioner =7 MueLu::CreateTpetraPreconditioner(A, optionsFile);8 // Construct problem9 Belos::LinearProblem<> problem(A, X, B);10 problem->setLeftPrec(mueLuPreconditioner);11 bool set = problem.setProblem();12 // Set Belos parameters13 Teuchos::ParameterList belosList;14 belosList.set("Maximum Iterations", 100);

15 // Solve the problem16 Belos::BlockCGSolMgr<> solver(rcp(&problem,false), rcp(&

belosList,false));17 Belos::ReturnType ret = solver.solve();

Andrey Prokopenko MueLu: The Trilinos Multigrid Framework TUG 2014 10 / 19

Page 20: MueLu: The Trilinos Multigrid Framework · ML has a better scaling SPGEMM (slower in serial) Relative setup performance Relative solve performance Andrey Prokopenko MueLu: The Trilinos

MueLu as a preconditioner in Belos

1 // Create A, B, X ...2 Teuchos::RCP<Tpetra::CrsMatrix<> > A;3 Teuchos::RCP<Tpetra::MultiVector<> > B, X;4 // Construct preconditioner5 std::string optionsFile = "mueluOptions.xml";6 Teuchos::RCP<MueLu::TpetraOperator> mueLuPreconditioner =7 MueLu::CreateTpetraPreconditioner(A, optionsFile);8 // Construct problem9 Belos::LinearProblem<> problem(A, X, B);10 problem->setLeftPrec(mueLuPreconditioner);11 bool set = problem.setProblem();12 // Set Belos parameters13 Teuchos::ParameterList belosList;14 belosList.set("Maximum Iterations", 100);15 // Solve the problem16 Belos::BlockCGSolMgr<> solver(rcp(&problem,false), rcp(&

belosList,false));17 Belos::ReturnType ret = solver.solve();

Andrey Prokopenko MueLu: The Trilinos Multigrid Framework TUG 2014 10 / 19

Page 21: MueLu: The Trilinos Multigrid Framework · ML has a better scaling SPGEMM (slower in serial) Relative setup performance Relative solve performance Andrey Prokopenko MueLu: The Trilinos

Documentation

User’s Guide (packages/muelu/doc/UsersGuide)

Geared towards new users

Complete list of user options (new options are caught automatically)

Tutorial (packages/muelu/doc/Tutorial)

Examples and tests (packages/muelu/{examples,tests})

Mailing lists

{muelu-users,muelu-developers}@software.sandia.gov

Doxygen

Best used as reference

Andrey Prokopenko MueLu: The Trilinos Multigrid Framework TUG 2014 11 / 19

Page 22: MueLu: The Trilinos Multigrid Framework · ML has a better scaling SPGEMM (slower in serial) Relative setup performance Relative solve performance Andrey Prokopenko MueLu: The Trilinos

MueLu Tutorial and virtual machine

PDF guide along with interactive Python script

Provides a step-by-step tutorial for new MueLu users

with practical examples

Easy to try multigrid methods

Comes with a VirtualBox image, no Trilinoscompilation

The MueLu tutorialTobias Wiesner

Michael Gee

Andrey Prokopenko

Jonathan Hu

SAND2014-18624 R

Andrey Prokopenko MueLu: The Trilinos Multigrid Framework TUG 2014 12 / 19

Page 23: MueLu: The Trilinos Multigrid Framework · ML has a better scaling SPGEMM (slower in serial) Relative setup performance Relative solve performance Andrey Prokopenko MueLu: The Trilinos

Future Plans

Support for 64-bit Epetra

Incorporation of Kokkos kernels directly into MueLu

Setup cost reduction, improved solver robustness

Reuse of algorithmic components between solves

New algorithms research for

higher order methods

semistructured problems (i.e., extruded meshes)

UQ ensembles

Andrey Prokopenko MueLu: The Trilinos Multigrid Framework TUG 2014 13 / 19

Page 24: MueLu: The Trilinos Multigrid Framework · ML has a better scaling SPGEMM (slower in serial) Relative setup performance Relative solve performance Andrey Prokopenko MueLu: The Trilinos

Andrey Prokopenko MueLu: The Trilinos Multigrid Framework TUG 2014 14 / 19

Page 25: MueLu: The Trilinos Multigrid Framework · ML has a better scaling SPGEMM (slower in serial) Relative setup performance Relative solve performance Andrey Prokopenko MueLu: The Trilinos

Backup slides

Andrey Prokopenko MueLu: The Trilinos Multigrid Framework TUG 2014 15 / 19

Page 26: MueLu: The Trilinos Multigrid Framework · ML has a better scaling SPGEMM (slower in serial) Relative setup performance Relative solve performance Andrey Prokopenko MueLu: The Trilinos

C++ interface

1 Hierarchy H(fineA); // generate hierarchy using fine level2 // matrix3

4 H.Setup(); // call multigrid setup (create hierarchy)5

6 H.Iterate(B, nIts, X);// perform nIts iterations with multigrid7 // algorithm (V-Cycle)

Uses reasonable defaults

Generates smoothed aggregation AMG

Andrey Prokopenko MueLu: The Trilinos Multigrid Framework TUG 2014 16 / 19

Page 27: MueLu: The Trilinos Multigrid Framework · ML has a better scaling SPGEMM (slower in serial) Relative setup performance Relative solve performance Andrey Prokopenko MueLu: The Trilinos

C++ interface

1 Hierarchy H(fineA); // generate hierarchy using fine level2 // matrix3 RCP<TentativePFactory> PFact = rcp(new TentativePFactory());4 FactoryManager M; // construct factory manager5 M.SetFactory("P", PFact); // define tentative prolongator6 // factory as default factory for7 // generating P8 H.Setup(M); // call multigrid setup (create hierarchy)9

10 H.Iterate(B, nIts, X);// perform nIts iterations with multigrid11 // algorithm (V-Cycle)

Generates unsmoothed aggregation AMG

Andrey Prokopenko MueLu: The Trilinos Multigrid Framework TUG 2014 17 / 19

Page 28: MueLu: The Trilinos Multigrid Framework · ML has a better scaling SPGEMM (slower in serial) Relative setup performance Relative solve performance Andrey Prokopenko MueLu: The Trilinos

C++ interface

1 Hierarchy H(fineA); // generate hierarchy using fine level2 // matrix3 Teuchos::ParameterList smootherParams;4 smootherParams.set("chebyshev: degree", 3);5 RCP<SmootherPrototype> smooProto =6 rcp(new TrilinosSmoother("CHEBYSHEV", smootherParams);7 RCP<SmootherFactory> smooFact =8 rcp(new SmootherFactory(smooProto));9 FactoryManager M;10 M.SetFactory("Smoother", smooFact);11

12 H.Setup(M); // call multigrid setup (create hierarchy)13

14 H.Iterate(B, nIts, X);// perform nIts iterations with multigrid15 // algorithm (V-Cycle)

Generates smoothed aggregation AMG

Use third degree polynomial smootherAndrey Prokopenko MueLu: The Trilinos Multigrid Framework TUG 2014 18 / 19

Page 29: MueLu: The Trilinos Multigrid Framework · ML has a better scaling SPGEMM (slower in serial) Relative setup performance Relative solve performance Andrey Prokopenko MueLu: The Trilinos

References

1 P. Lin, M. Bettencourt, S. Domino, T. Fisher, M. Hoemmen, J.J. Hu, E. Phipps, A. Prokopenko, S. Rajamanickam, C. Siefert, S. Kennon, Towards extreme-scale

simulations for low Mach fluids with second- generation Trilinos (to appear)

2 L. Olson, J. Schroder, and R.S. Tuminaro, A general interpolation strategy for algebraic multi- grid using energy minimization SIAM Journal on ScientiïňĄc

Computing, 33(2):966âĂŞ991, 2011.

3 M. Sala and R.S. Tuminaro, A new Petrov-Galerkin smoothed aggregation preconditioner for nonsymmetric linear systems SIAM Journal on ScientiïňĄc

Computing, 31(1):143âĂŞ166, 2008.

4 T.A. Wiesner, Flexible aggregation-based algebraic multigrid methods for contact and flow problems, PhD thesis, 2014.

5 T.A. Wiesner, M.W. Gee, A. Prokopenko, and J.J. Hu, The MueLu tutorial, http://trilinos.org/packages/muelu/muelu-tutorial, 2014. SAND2014- 18624R.

6 T.A. Wiesner, R.S. Tuminaro, W.A. Wall and M.W. Gee, Multigrid transfers for nonsymmetric systems based on Schur complements and Galerkin projections, Numer.

Linear Algebra Appl., 2013

7 A. Popp, Ph. Farah, T.A. Wiesner, W.A. Wall, Efficient parallel solution methods for mortar finite element discretizations in computational contact mechanics, 11th

World Congress on Computational Mechanics (WCCM XI), Barcelona/Spain, 2014

8 T.A. Wiesner, A. Popp, M.W. Gee, W.A. Wall, Aggregation based algebraic multigrid methods for mortar methods in contact mechanics, (in preparation)

Andrey Prokopenko MueLu: The Trilinos Multigrid Framework TUG 2014 19 / 19