Future of LAPACK and ScaLAPACK


Progress and directions for LAPACK and ScaLAPACK as of 2005. See LAPACK at netlib.


The Future of LAPACK and ScaLAPACK

Jason Riedy, Yozo Hida, James Demmel

EECS Department, University of California, Berkeley

November 18, 2005

Outline

Survey responses: What users want

Improving LAPACK and ScaLAPACK
- Improved Numerics
- Improved Performance
- Improved Functionality
- Improved Engineering and Community

Two Example Improvements
- Numerics: Iterative Refinement for Ax = b
- Performance: The MRRR Algorithm for Ax = λx

2 / 16

Survey: What users want

- Survey available from http://www.netlib.org/lapack-dev/.

- 212 responses, over 100 different, non-anonymous groups

- Problem sizes:

  Size:   100   1K    10K   100K   1M    (other)
  Share:  8%    26%   24%   12%    6%    (24%)

- >80% interested in small-medium SMPs

- >40% interested in large distributed-memory systems

- Vendor libs seen as faster, buggier

- Over 20% want > double precision, 70% out-of-core

- Requests: high-level interfaces, low-level interfaces, parallel redistribution∗, and tuning

3 / 16

Participants

- UC Berkeley
  - Jim Demmel, Ming Gu, W. Kahan, Beresford Parlett, Xiaoye Li,
    Osni Marques, Christof Vomel, David Bindel, Yozo Hida,
    Jason Riedy, Jianlin Xia, Jiang Zhu, undergrads, . . .
- U Tennessee, Knoxville
  - Jack Dongarra, Julien Langou, Julie Langou, Piotr Luszczek,
    Stan Tomov, . . .
- Other Academic Institutions
  - UT Austin, UC Davis, U Kansas, U Maryland, North Carolina SU,
    San Jose SU, UC Santa Barbara, TU Berlin, FU Hagen, U Madrid,
    U Manchester, U Umea, U Wuppertal, U Zagreb
- Research Institutions
  - CERFACS, LBL, UEC (Japan)
- Industrial Partners
  - Cray, HP, Intel, MathWorks, NAG, SGI

You?

4 / 16

Improved Numerics

Improved accuracy with standard asymptotic speed: some are faster!

- Iterative refinement for linear systems, least squares
  Demmel / Hida / Kahan / Li / Mukherjee / Riedy / Sarkisyan

- Pivoting and scaling for symmetric systems
  - Definite and indefinite

- Jacobi SVD (and faster) — Drmac / Veselic

- Condition numbers and estimators
  Higham / Cheng / Tisseur

- Useful approximate error estimates

5 / 16

Improved Performance

Improved performance with at least standard accuracy

- MRRR algorithm for eigenvalues, SVD
  Parlett / Dhillon / Vomel / Marques / Willems / Katagiri

- Fast Hessenberg QR & QZ
  Byers / Mathias / Braman, Kagstrom / Kressner

- Fast reductions and BLAS2.5
  van de Geijn, Bischof / Lang, Howell / Fulton

- Recursive data layouts
  Gustavson / Kagstrom / Elmroth / Jonsson

- Generalized SVD — Bai, Wang

- Polynomial roots from semi-separable form
  Gu / Chandrasekaran / Zhu / Xia / Bindel / Garmire / Demmel

- Automated tuning, optimizations in ScaLAPACK, . . .

6 / 16

Improved Functionality

Algorithms

- Updating / downdating factorizations — Stewart, Langou

- More generalized SVDs: products, CSD — Bai, Wang

- More generalized Sylvester, Lyapunov solvers
  Kagstrom, Jonsson, Granat

- Quadratic eigenproblems — Mehrmann

- Matrix functions — Higham

Implementations

- Add “missing” features to ScaLAPACK

- Generate LAPACK, ScaLAPACK for higher precisions

7 / 16

Improved Engineering and Community

Use new features without a rewrite

- Use modern Fortran 95, maybe 2003
  - DO . . . END DO, recursion, allocation (in wrappers)

- Provide higher-level wrappers for common languages
  - F95, C, C++

- Automatic generation of precisions, bindings
  - Full automation (FLAME, etc.) not quite ready for all functions

- Tests for algorithms, implementations, installations

Open development

Need a community for long-term evolution.
http://www.netlib.org/lapack-dev/

Lots of work to do, research and development.

8 / 16

Two Example Improvements

Recent, locally developed improvements

Improved Numerics

Iterative refinement for linear systems Ax = b:

- Extra precision ⇒ small error, dependable estimate

- Both normwise and componentwise

- (See LAWN 165 for full details.)

Improved Performance

MRRR algorithm for eigenvalue, SVD problems

- Optimal complexity: O(n) per value/vector

- (See LAWNs 162, 163, 166, 167, . . . for more details.)

9 / 16

Numerics: Iterative Refinement
Improve solution to Ax = b

Repeat: r = b − Ax, dx = A⁻¹r, x = x + dx

Until: good enough

Not-too-ill-conditioned ⇒ error O(√n ε)
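The loop above can be sketched in a few lines of Python. This is an illustration of the idea only, not LAPACK code: `solve2` and `chop` are hypothetical helpers, and rounding to four significant digits stands in for a lower-precision factorization while the residual is computed in full double precision.

```python
# Sketch of the refinement loop on a 2x2 system; solve2 and chop are
# hypothetical helpers, not LAPACK routines. chop rounds to 4 significant
# digits to mimic a low-precision solve, while the residual r is computed
# in full double precision -- the key to why refinement works.

def solve2(A, b):
    # Cramer's rule, fine at this size
    det = A[0][0]*A[1][1] - A[0][1]*A[1][0]
    return [(b[0]*A[1][1] - b[1]*A[0][1]) / det,
            (A[0][0]*b[1] - A[1][0]*b[0]) / det]

def chop(v, digits=4):
    # keep only `digits` significant digits of each entry
    return [float(("%%.%dg" % digits) % e) for e in v]

A = [[4.0, 3.0], [6.0, 3.0]]
b = [10.0, 13.0]               # exact solution: x = [3/2, 4/3]
x = chop(solve2(A, b))         # "working precision" solve: x[1] == 1.333
for _ in range(3):
    r = [b[i] - sum(A[i][j]*x[j] for j in range(2)) for i in range(2)]
    dx = chop(solve2(A, r))    # correction from the same cheap solver
    x = [x[i] + dx[i] for i in range(2)]
# x now agrees with [1.5, 4/3] to far better than the 4-digit solver alone
```

Each pass cuts the error by roughly the rounding level of the cheap solver, precisely because the residual is formed more accurately than the solve.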

[Figure: two scatter plots of log10 Enorm against log10 κnorm, each titled “Normwise error vs. condition number κnorm” (2,000,000 cases).]

10 / 16

Numerics: Iterative Refinement
Improve solution to Ax = b

Repeat: r = b − Ax, dx = A⁻¹r, x = x + dx

Until: good enough

Dependable normwise relative error estimate

[Figure: two scatter plots of log10 Bnorm against log10 Enorm, each titled “Normwise Error vs. Bound” (2,000,000 cases).]

11 / 16

Numerics: Iterative Refinement
Improve solution to Ax = b

Repeat: r = b − Ax, dx = A⁻¹r, x = x + dx

Until: good enough

Also small componentwise errors and dependable estimates
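A tiny sketch (the function names here are ours, not LAPACK's) of why both measures matter: a large relative error in a tiny component is invisible normwise but caught componentwise.

```python
# Two error measures for a computed solution xhat vs. a reference x.
# Normwise error divides by the norm of the whole vector; componentwise
# error takes the worst relative error entry by entry.

def norm_inf(v):
    return max(abs(e) for e in v)

def err_normwise(xhat, x):
    return norm_inf([a - b for a, b in zip(xhat, x)]) / norm_inf(x)

def err_componentwise(xhat, x):
    return max(abs(a - b) / abs(b) for a, b in zip(xhat, x))

x    = [1.0, 1e-6]
xhat = [1.0, 2e-6]   # the tiny component is off by 100%
en = err_normwise(xhat, x)        # ~1e-6: looks fine normwise
ec = err_componentwise(xhat, x)   # ~1.0: caught componentwise
```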

[Figure: scatter plots titled “Componentwise error vs. condition number κcomp” and “Componentwise Error vs. Bound” (2,000,000 cases).]

12 / 16

Relying on Condition Numbers

Need condition numbers for dependable estimates.

The challenge is picking the right condition number and estimating it well.
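For intuition, κ∞ can be computed exactly for a 2 × 2 matrix. This is a toy sketch of what the condition number measures; LAPACK's estimators avoid forming A⁻¹ explicitly.

```python
# Toy condition number kappa_inf(A) = ||A||_inf * ||A^{-1}||_inf for a
# 2x2 matrix, with the inverse formed explicitly (fine at this size,
# ruinous in general).

def cond_inf_2x2(A):
    det = A[0][0]*A[1][1] - A[0][1]*A[1][0]
    Ainv = [[ A[1][1]/det, -A[0][1]/det],
            [-A[1][0]/det,  A[0][0]/det]]
    ninf = lambda M: max(abs(row[0]) + abs(row[1]) for row in M)
    return ninf(A) * ninf(Ainv)

k_id   = cond_inf_2x2([[1.0, 0.0], [0.0, 1.0]])    # 1.0 for the identity
k_near = cond_inf_2x2([[1.0, 1.0], [1.0, 1.0001]]) # ~4e4: nearly singular
```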

[Figure: scatter plot titled “κnorm: single vs. double precision” (2,000,000 cases), comparing log10 κnorm computed in single and in double precision.]

13 / 16

Performance: The MRRR Algorithm

Multiple Relatively Robust Representations

- 1999 Householder Award honorable mention for Dhillon

- Optimal complexity with small error!
  - O(nk) flops for k eigenvalues/vectors of an n × n tridiagonal matrix
  - Small residuals: ‖T x_i − λ_i x_i‖ = O(nε)
  - Orthogonal eigenvectors: |x_i^T x_j| = O(nε)

- Similar algorithm for SVD.

- Eigenvectors computed independently ⇒ naturally parallelizable

- (LAPACK release 3 had bugs, missing cases)
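Both quality criteria are easy to check numerically. A toy sketch on a 2 × 2 symmetric matrix [[a, b], [b, c]] with closed-form eigenpairs; `eig2` is our helper, not an MRRR implementation.

```python
import math

# Check the two MRRR quality criteria -- small residual and orthogonal
# eigenvectors -- on [[a, b], [b, c]] using the closed-form eigenpairs.

def eig2(a, b, c):
    # eigenpairs of [[a, b], [b, c]]; assumes b != 0
    mid, rad = (a + c) / 2.0, math.hypot((a - c) / 2.0, b)
    pairs = []
    for lam in (mid + rad, mid - rad):
        vx, vy = b, lam - a          # eigenvector direction: T v = lam v
        n = math.hypot(vx, vy)
        pairs.append((lam, (vx / n, vy / n)))
    return pairs

a, b, c = 2.0, 1.0, 3.0
(l1, x1), (l2, x2) = eig2(a, b, c)
# residual ||T x - lambda x|| for the first eigenpair
res = math.hypot(a*x1[0] + b*x1[1] - l1*x1[0],
                 b*x1[0] + c*x1[1] - l1*x1[1])
# orthogonality |x1^T x2|
dot = abs(x1[0]*x2[0] + x1[1]*x2[1])
# both come out at roundoff level, matching the O(n eps) bounds above
```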

14 / 16

Performance: The MRRR Algorithm

[Performance plot not preserved. Caption: “fast DC”: Wilkinson, deflate like crazy]

15 / 16

Summary

- LAPACK and ScaLAPACK are open for improvement!

- Planned improvements in
  - numerics,
  - performance,
  - functionality, and
  - engineering.

- Forming a community for long-term development.

16 / 16