The Future of LAPACK and ScaLAPACK Jason Riedy, Yozo Hida, James Demmel EECS Department University of California, Berkeley November 18, 2005

Source: jriedy.users.sonic.net/resume/material/future-of-scalapack.pdf

The Future of LAPACK and ScaLAPACK

Jason Riedy, Yozo Hida, James Demmel

EECS Department, University of California, Berkeley

November 18, 2005


Outline

Survey responses: What users want

Improving LAPACK and ScaLAPACK
  - Improved Numerics
  - Improved Performance
  - Improved Functionality
  - Improved Engineering and Community

Two Example Improvements
  - Numerics: Iterative Refinement for Ax = b
  - Performance: The MRRR Algorithm for Ax = λx


Survey: What users want

- Survey available from http://www.netlib.org/lapack-dev/.
- 212 responses, over 100 different, non-anonymous groups
- Problem sizes:

  Size:       100   1K    10K   100K  1M    (other)
  Responses:  8%    26%   24%   12%   6%    (24%)

- >80% interested in small-medium SMPs
- >40% interested in large distributed-memory systems
- Vendor libs seen as faster, buggier
- Over 20% want > double precision, 70% out-of-core
- Requests: High-level interfaces, low-level interfaces, parallel redistribution∗ and tuning


Participants

- UC Berkeley
  - Jim Demmel, Ming Gu, W. Kahan, Beresford Parlett, Xiaoye Li, Osni Marques, Christof Vomel, David Bindel, Yozo Hida, Jason Riedy, Jianlin Xia, Jiang Zhu, undergrads, ...
- U Tennessee, Knoxville
  - Jack Dongarra, Julien Langou, Julie Langou, Piotr Luszczek, Stan Tomov, ...
- Other Academic Institutions
  - UT Austin, UC Davis, U Kansas, U Maryland, North Carolina SU, San Jose SU, UC Santa Barbara, TU Berlin, FU Hagen, U Madrid, U Manchester, U Umea, U Wuppertal, U Zagreb
- Research Institutions
  - CERFACS, LBL, UEC (Japan)
- Industrial Partners
  - Cray, HP, Intel, MathWorks, NAG, SGI

You?


Improved Numerics

Improved accuracy with standard asymptotic speed: Some are faster!

- Iterative refinement for linear systems, least squares
  Demmel / Hida / Kahan / Li / Mukherjee / Riedy / Sarkisyan
- Pivoting and scaling for symmetric systems
  - Definite and indefinite
- Jacobi SVD (and faster)
  Drmac / Veselic
- Condition numbers and estimators
  Higham / Cheng / Tisseur
- Useful approximate error estimates


Improved Performance

Improved performance with at least standard accuracy

- MRRR algorithm for eigenvalues, SVD
  Parlett / Dhillon / Vomel / Marques / Willems / Katagiri
- Fast Hessenberg QR & QZ
  Byers / Mathias / Braman, Kagstrom / Kressner
- Fast reductions and BLAS2.5
  van de Geijn, Bischof / Lang, Howell / Fulton
- Recursive data layouts
  Gustavson / Kagstrom / Elmroth / Jonsson
- Generalized SVD
  Bai, Wang
- Polynomial roots from semi-separable form
  Gu / Chandrasekaran / Zhu / Xia / Bindel / Garmire / Demmel
- Automated tuning, optimizations in ScaLAPACK, ...


Improved Functionality

Algorithms

- Updating / downdating factorizations
  Stewart, Langou
- More generalized SVDs: products, CSD
  Bai, Wang
- More generalized Sylvester, Lyapunov solvers
  Kagstrom, Jonsson, Granat
- Quadratic eigenproblems
  Mehrmann
- Matrix functions
  Higham

Implementations

- Add “missing” features to ScaLAPACK
- Generate LAPACK, ScaLAPACK for higher precisions


Improved Engineering and Community

Use new features without a rewrite

- Use modern Fortran 95, maybe 2003
  - DO ... END DO, recursion, allocation (in wrappers)
- Provide higher-level wrappers for common languages
  - F95, C, C++
- Automatic generation of precisions, bindings
  - Full automation (FLAME, etc.) not quite ready for all functions
- Tests for algorithms, implementations, installations

Open development

Need a community for long-term evolution.
http://www.netlib.org/lapack-dev/

Lots of work to do, research and development.


Two Example Improvements

Recent, locally developed improvements

Improved Numerics

Iterative refinement for linear systems Ax = b:

- Extra precision ⇒ small error, dependable estimate
- Both normwise and componentwise
- (See LAWN 165 for full details.)

Improved Performance

MRRR algorithm for eigenvalue, SVD problems

- Optimal complexity: O(n) per value/vector
- (See LAWNs 162, 163, 166, 167, ... for more details.)


Numerics: Iterative Refinement

Improve solution to Ax = b

Repeat: r = b − Ax, dx = A^{-1} r, x = x + dx
Until: good enough

Not-too-ill-conditioned ⇒ error O(√n ε)

[Figure, two panels with the same title: normwise error vs. condition number κ_norm (2000000 cases); x-axis log10 κ_norm, y-axis log10 E_norm.]
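The Repeat/Until loop above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the LAPACK implementation described in LAWN 165: the helper names (`lu_factor`, `lu_solve`, `residual_extra`, `refine`, `scaled_hilbert_system`) are hypothetical, the working precision is Python's double, exact rational arithmetic stands in for the extra-precise residual, and "good enough" is taken to mean that the correction no longer changes x noticeably. The test problem, an integer-scaled 8 × 8 Hilbert matrix with exact solution x = (1, ..., 1), is also an assumption chosen to be ill-conditioned but not too ill-conditioned.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def lu_factor(a):
    """Factor a copy of `a` in double precision with partial pivoting (PA = LU)."""
    n = len(a)
    lu = [row[:] for row in a]
    perm = list(range(n))
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        lu[k], lu[p] = lu[p], lu[k]
        perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return lu, perm


def lu_solve(lu, perm, b):
    """Solve A x = b from the factors: forward, then back substitution."""
    n = len(lu)
    y = [b[p] for p in perm]
    for i in range(n):
        for j in range(i):
            y[i] -= lu[i][j] * y[j]
    for i in reversed(range(n)):
        for j in range(i + 1, n):
            y[i] -= lu[i][j] * y[j]
        y[i] /= lu[i][i]
    return y


def residual_extra(a, x, b):
    """r = b - A x accumulated exactly (stand-in for extra precision), rounded once."""
    return [float(Fraction(bi) - sum(Fraction(aij) * Fraction(xj)
                                     for aij, xj in zip(row, x)))
            for row, bi in zip(a, b)]


def refine(a, b, max_iter=10, eps=2.0 ** -52):
    """Return (initial solution, refined solution) of A x = b."""
    lu, perm = lu_factor(a)
    x0 = lu_solve(lu, perm, b)
    x = x0[:]
    for _ in range(max_iter):
        r = residual_extra(a, x, b)      # r = b - A x
        dx = lu_solve(lu, perm, r)       # dx = A^{-1} r, reusing the factors
        x = [xi + di for xi, di in zip(x, dx)]
        # Assumed "good enough" test: the correction is at rounding level.
        if max(map(abs, dx)) <= eps * max(map(abs, x)):
            break
    return x0, x


def scaled_hilbert_system(n):
    """Integer-scaled Hilbert matrix with b = A * ones, so x = ones exactly."""
    scale = reduce(lambda u, v: u * v // gcd(u, v), range(1, 2 * n))
    a = [[float(scale // (i + j + 1)) for j in range(n)] for i in range(n)]
    b = [sum(row) for row in a]
    return a, b


a, b = scaled_hilbert_system(8)
x0, x = refine(a, b)
print("max error before refinement:", max(abs(v - 1.0) for v in x0))
print("max error after refinement: ", max(abs(v - 1.0) for v in x))
```

The factorization is computed once and reused for every correction, so each refinement step costs only a residual and two triangular solves. Only the residual is formed in higher precision; the solve and the update stay in working precision.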


Numerics: Iterative Refinement

Improve solution to Ax = b

Repeat: r = b − Ax, dx = A^{-1} r, x = x + dx
Until: good enough

Dependable normwise relative error estimate

[Figure, two panels with the same title: Normwise Error vs. Bound (2000000 cases); x-axis log10 E_norm, y-axis log10 B_norm.]


Numerics: Iterative Refinement

Improve solution to Ax = b

Repeat: r = b − Ax, dx = A^{-1} r, x = x + dx
Until: good enough

Also small componentwise errors and dependable estimates

[Figure, first panel: Componentwise error vs. condition number κ_comp (2000000 cases); x-axis log10 κ_comp, y-axis log10 E_comp.]

[Figure, second panel: Componentwise Error vs. Bound (2000000 cases); x-axis log10 E_comp, y-axis log10 B_comp.]


Relying on Condition Numbers

Need condition numbers for dependable estimates.

Picking the right condition number and estimating it well.

[Figure: κ_norm, single vs. double precision (2000000 cases); x-axis log10 κ_norm (single precision), y-axis log10 κ_norm (double precision).]
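To make "picking the right condition number" concrete, the sketch below compares two candidates on a tiny matrix before and after a bad row scaling. The slides do not define κ_norm or κ_comp, so the definitions here are assumptions: the usual infinity-norm condition number ‖A‖∞ ‖A^{-1}‖∞ for the normwise case, and the Skeel-style quantity ‖ |A^{-1}| |A| ‖∞ for the componentwise case. The function names are hypothetical, and the inverse is formed exactly in rational arithmetic purely for illustration; this computes the condition numbers directly rather than estimating them as a library routine would.

```python
from fractions import Fraction


def inverse_exact(a):
    """Gauss-Jordan inverse in exact rational arithmetic (fine for tiny matrices)."""
    n = len(a)
    m = [[Fraction(v) for v in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(a)]
    for k in range(n):
        # Exact arithmetic: any nonzero pivot is acceptable.
        p = next(i for i in range(k, n) if m[i][k] != 0)
        m[k], m[p] = m[p], m[k]
        pivot = m[k][k]
        m[k] = [v / pivot for v in m[k]]
        for i in range(n):
            if i != k:
                f = m[i][k]
                m[i] = [vi - f * vk for vi, vk in zip(m[i], m[k])]
    return [row[n:] for row in m]


def norm_inf(a):
    """Matrix infinity norm: largest absolute row sum."""
    return max(sum(abs(v) for v in row) for row in a)


def kappa_norm(a):
    """Assumed normwise condition number: ||A||_inf * ||A^{-1}||_inf."""
    exact = [[Fraction(v) for v in row] for row in a]
    return float(norm_inf(exact) * norm_inf(inverse_exact(a)))


def kappa_comp(a):
    """Assumed componentwise (Skeel-style) number: || |A^{-1}| |A| ||_inf."""
    n = len(a)
    inv = inverse_exact(a)
    prod = [[sum(abs(inv[i][k]) * abs(Fraction(a[k][j])) for k in range(n))
             for j in range(n)] for i in range(n)]
    return float(norm_inf(prod))


a = [[1.0, 2.0], [3.0, 4.0]]
scaled = [[1.0, 2.0], [3.0e6, 4.0e6]]   # second row multiplied by 1e6

print("kappa_norm:", kappa_norm(a), "->", kappa_norm(scaled))
print("kappa_comp:", kappa_comp(a), "->", kappa_comp(scaled))
```

Under these definitions the row scaling inflates the normwise number from 21 to about 1.4e7, while the componentwise number stays at 13, because |A^{-1}| |A| is unchanged when rows of A are rescaled. Which number is the right one depends on which error measure the estimate is meant to bound.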


Performance: The MRRR Algorithm

Multiple Relatively Robust Representations

- 1999 Householder Award honorable mention for Dhillon
- Optimal complexity with small error!
  - O(nk) flops for k eigenvalues/vectors of n × n tridiagonal matrix
  - Small residuals: ‖T x_i − λ_i x_i‖ = O(nε)
  - Orthogonal eigenvectors: ‖x_i^T x_j‖ = O(nε) for i ≠ j
- Similar algorithm for SVD.
- Eigenvectors computed independently ⇒ naturally parallelizable
- (LAPACK r3 had bugs, missing cases)
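The two accuracy criteria in the list above can be measured for any set of computed eigenpairs. The sketch below does not implement MRRR; it only evaluates the residual and orthogonality measures, and it does so on an assumed test case, the matrix tridiag(−1, 2, −1), whose eigenpairs are known in closed form. All function names are hypothetical.

```python
import math


def tridiag_matvec(diag, off, x):
    """y = T x for symmetric tridiagonal T with diagonal `diag`, off-diagonal `off`."""
    n = len(diag)
    y = [diag[i] * x[i] for i in range(n)]
    for i in range(n - 1):
        y[i] += off[i] * x[i + 1]
        y[i + 1] += off[i] * x[i]
    return y


def norm2(v):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(t * t for t in v))


def max_residual(diag, off, values, vectors):
    """max over i of ||T x_i - lambda_i x_i||_2."""
    worst = 0.0
    for lam, x in zip(values, vectors):
        tx = tridiag_matvec(diag, off, x)
        worst = max(worst, norm2([t - lam * xi for t, xi in zip(tx, x)]))
    return worst


def max_orthogonality_loss(vectors):
    """max over i != j of |x_i^T x_j|."""
    worst = 0.0
    for i in range(len(vectors)):
        for j in range(i):
            dot = sum(p * q for p, q in zip(vectors[i], vectors[j]))
            worst = max(worst, abs(dot))
    return worst


def laplacian_eigenpairs(n):
    """Closed-form eigenpairs of the n x n matrix tridiag(-1, 2, -1)."""
    h = math.pi / (n + 1)
    scale = math.sqrt(2.0 / (n + 1))     # makes each eigenvector unit length
    values = [2.0 - 2.0 * math.cos(k * h) for k in range(1, n + 1)]
    vectors = [[scale * math.sin(i * k * h) for i in range(1, n + 1)]
               for k in range(1, n + 1)]
    return values, vectors


n = 20
diag, off = [2.0] * n, [-1.0] * (n - 1)
values, vectors = laplacian_eigenpairs(n)
eps = 2.0 ** -52
print("residual / (n eps):     ", max_residual(diag, off, values, vectors) / (n * eps))
print("orthogonality / (n eps):", max_orthogonality_loss(vectors) / (n * eps))
```

Printing both measures in units of nε makes it easy to see whether a given eigensolver meets the O(nε) targets; the same two functions could be applied to the output of any tridiagonal eigensolver.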


Performance: The MRRR Algorithm

“fast DC”: Wilkinson, deflate like crazy


Summary

- LAPACK and ScaLAPACK are open for improvement!
- Planned improvements in
  - numerics,
  - performance,
  - functionality, and
  - engineering.
- Forming a community for long-term development.
