Future of LAPACK and ScaLAPACK

16
The Future of LAPACK and ScaLAPACK Jason Riedy, Yozo Hida, James Demmel EECS Department University of California, Berkeley November 18, 2005

description

Progress and directions for LAPACK and ScaLAPACK as of 2005. See LAPACK at netlib and

Transcript of Future of LAPACK and ScaLAPACK

Page 1: Future of LAPACK and ScaLAPACK

The Future of LAPACK andScaLAPACK

Jason Riedy, Yozo Hida, James Demmel

EECS DepartmentUniversity of California, Berkeley

November 18, 2005

Page 2: Future of LAPACK and ScaLAPACK

Outline

Survey responses: What users want

Improving LAPACK and ScaLAPACKImproved NumericsImproved PerformanceImproved FunctionalityImproved Engineering and Community

Two Example ImprovementsNumerics: Iterative Refinement for Ax = bPerformance: The MRRR Algorithm for Ax = λx

2 / 16

Page 3: Future of LAPACK and ScaLAPACK

Survey: What users want

I Survey available fromhttp://www.netlib.org/lapack-dev/.

I 212 responses, over 100 different, non-anonymous groups

I Problem sizes:100 1K 10K 100K 1M (other)8% 26% 24% 12% 6% (24%)

I >80% interested in small-medium SMPs

I >40% interested in large distributed-memory systems

I Vendor libs seen as faster, buggier

I over 20% want > double precision, 70% out-of-core

I Requests: High-level interfaces, low-level interfaces,parallel redistribution∗ and tuning

3 / 16

Page 4: Future of LAPACK and ScaLAPACK

ParticipantsI UC Berkeley

I Jim Demmel, Ming Gu, W. Kahan, Beresford Parlett, Xiaoye

Li, Osni Marques, Christof Vomel, David Bindel, Yozo Hida,

Jason Riedy, Jianlin Xia, Jiang Zhu, undergrads, . . .I U Tennessee, Knoxville

I Jack Dongarra, Julien Langou, Julie Langou, Piotr Luszczek,

Stan Tomov, . . .I Other Academic Institutions

I UT Austin, UC Davis, U Kansas, U Maryland, North Carolina

SU, San Jose SU, UC Santa Barbara, TU Berlin, FU Hagen,

U Madrid, U Manchester, U Umea, U Wuppertal, U ZagrebI Research Institutions

I CERFACS, LBL, UEC (Japan)I Industrial Partners

I Cray, HP, Intel, MathWorks, NAG, SGI

You?

4 / 16

Page 5: Future of LAPACK and ScaLAPACK

Improved Numerics

Improved accuracy with standard asymptotic speed:Some are faster!

I Iterative refinement for linear systems, least squaresDemmel / Hida / Kahan / Li / Mukherjee / Riedy / Sarkisyan

I Pivoting and scaling for symmetric systemsI Definite and indefinite

I Jacobi SVD (and faster) — Drmac / Veselic

I Condition numbers and estimatorsHigham / Cheng / Tisseur

I Useful approximate error estimates

5 / 16

Page 6: Future of LAPACK and ScaLAPACK

Improved Performance

Improved performance with at least standard accuracy

I MRRR algorithm for eigenvalues, SVDParlett / Dhillon / Vomel / Marques / Willems / Katagiri

I Fast Hessenberg QR & QZByers / Mathias / Braman, Kagstrom / Kressner

I Fast reductions and BLAS2.5van de Geijn, Bischof / Lang, Howell / Fulton

I Recursive data layoutsGustavson / Kagstrom / Elmroth / Jonsson

I generalized SVD — Bai, Wang

I Polynomial roots from semi-separable formGu / Chandrasekaran / Zhu / Xia / Bindel / Garmire / Demmel

I Automated tuning, optimizations in ScaLAPACK, . . .

6 / 16

Page 7: Future of LAPACK and ScaLAPACK

Improved Functionality

Algorithms

I Updating / downdating factorizations — Stewart, Langou

I More generalized SVDs: products, CSD — Bai, Wang

I More generalized Sylvester, Lyupanov solversKagstrom, Jonsson, Granat

I Quadratic eigenproblems — Mehrmann

I Matrix functions — Higham

Implementations

I Add “missing” features to ScaLAPACK

I Generate LAPACK, ScaLAPACK for higher precisions

7 / 16

Page 8: Future of LAPACK and ScaLAPACK

Improved Engineering and Community

Use new features without a rewrite

I Use modern Fortran 95, maybe 2003I DO . . . END DO, recursion, allocation (in wrappers)

I Provide higher-level wrappers for common languagesI F95, C, C++

I Automatic generation of precisions, bindingsI Full automation (FLAME, etc.) not quite ready for all

functions

I Tests for algorithms, implementations, installations

Open development

Need a community for long-term evolution.http://www.netlib.org/lapack-dev/

Lots of work to do, research and development.

8 / 16

Page 9: Future of LAPACK and ScaLAPACK

Two Example Improvements

Recent, locally developed improvements

Improved Numerics

Iterative refinement for linear systems Ax = b:

I Extra precision ⇒ small error, dependable estimate

I Both normwise and componentwise

I (See LAWN 165 for full details.)

Improved Performance

MRRR algorithm for eigenvalue, SVD problems

I Optimal complexity: O(n) per value/vector

I (See LAWNs 162, 163, 166, 167. . . for more details.)

9 / 16

Page 10: Future of LAPACK and ScaLAPACK

Numerics: Iterative RefinementImprove solution to Ax = b

Repeat: r = b − Ax , dx = A−1r , x = x + dx

Until: good enough

Not-too-ill-conditioned ⇒ error O(√

n ε)

log10

κnorm

log10E

norm

Normwise error vs. condition number κnorm. (2000000 cases)

167212 1156099

463800 22544

190085 260

0 5 10 15100

101

102

103

104

-9

-8

-7

-6

-5

-4

-3

-2

-1

0

1

log10 κnorm

log10E

norm

Normwise error vs. condition number κnorm. (2000000 cases)

40377

1539

821097 1136987

0 5 10 15100

101

102

103

104

-9

-8

-7

-6

-5

-4

-3

-2

-1

0

1

10 / 16

Page 11: Future of LAPACK and ScaLAPACK

Numerics: Iterative RefinementImprove solution to Ax = b

Repeat: r = b − Ax , dx = A−1r , x = x + dx

Until: good enough

Dependable normwise relative error estimate

log10 Enorm

log10B

norm

Normwise Error vs. Bound (2000000 cases)

133497 485930

1323311

56848

414

-8 -6 -4 -2 0100

101

102

103

104

105

-9

-8

-7

-6

-5

-4

-3

-2

-1

0

1

log10

Enorm

log10B

norm

Normwise Error vs. Bound (2000000 cases)

100

40359

343

1361 18

1957741 78

-8 -6 -4 -2 0100

101

102

103

104

105

-9

-8

-7

-6

-5

-4

-3

-2

-1

0

1

11 / 16

Page 12: Future of LAPACK and ScaLAPACK

Numerics: Iterative RefinementImprove solution to Ax = b

Repeat: r = b − Ax , dx = A−1r , x = x + dx

Until: good enough

Also small componentwise errors and dependable estimates

log10 κcomp

log10E

com

p

Componentwise error vs. condition number κcomp. (2000000 cases)

41755

32249

545427 1380569

0 5 10 15100

101

102

103

104

-9

-8

-7

-6

-5

-4

-3

-2

-1

0

1

PSfrag replacemen

log10 Ecomp

log10B

com

p

Componentwise Error vs. Bound (2000000 cases)

1 236

41702

13034

25307 53

1912961 6706

-8 -6 -4 -2 0100

101

102

103

104

105

-9

-8

-7

-6

-5

-4

-3

-2

-1

0

1

12 / 16

Page 13: Future of LAPACK and ScaLAPACK

Relying on Condition Numbers

Need condition numbers for dependable estimates.

Picking the right condition number and estimating it well.

log10 κnorm (single precision)

log10κ

norm

(double

pre

cisi

on)

κnorm: single vs. double precision (2000000 cases)

0.0% 28.9%

30.1%

19.2%

21.8% 0.0%

0 2 4 6 8 10 12 14100

101

102

103

104

0

2

4

6

8

10

12

14

13 / 16

Page 14: Future of LAPACK and ScaLAPACK

Performance: The MRRR Algorithm

Multiple Relatively Robust Representations

I 1999 Householder Award honorable mention for Dhillon

I Optimal complexity with small error!I O(nk) flops for k eigenvalues/vectors of n × n

tridiagonal matrixI Small residuals: ‖Txi − λixi‖ = O(nε)I Orthogonal eigenvectors: ‖xT

i xj‖ = O(nε)

I Similar algorithm for SVD.

I Eigenvectors computed independently ⇒ naturallyparallelizable

I (LAPACK r3 had bugs, missing cases)

14 / 16

Page 15: Future of LAPACK and ScaLAPACK

Performance: The MRRR Algorithm

“fast DC”: Wilkinson, deflate like crazy

15 / 16

Page 16: Future of LAPACK and ScaLAPACK

Summary

I LAPACK and ScaLAPACK are open for improvement!

I Planned improvements inI numerics,I performance,I functionality, andI engineering.

I Forming a community for long-term development.

16 / 16