Matrix Multiplication Based Computations of the ... · The most efficient routine in linear...

86
Introduction Matrix multiplication based linear algebra Computing the characteristic polynomial Conclusion and perspectives Matrix Multiplication Based Computations of the Characteristic Polynomial Clément PERNET, joint work with Arne Storjohann Symbolic Computation Group University of Waterloo Joint Lab Meeting ORCCA-SCG, February 9, 2007 Clément Pernet Matrix multiplication based Characteristic Polynomial

Transcript of Matrix Multiplication Based Computations of the ... · The most efficient routine in linear...

Page 1: Matrix Multiplication Based Computations of the ... · The most efficient routine in linear algebra. Several reasons: dedicated processor instruction fused-mac: z ←xy +z simple

IntroductionMatrix multiplication based linear algebraComputing the characteristic polynomial

Conclusion and perspectives

Matrix Multiplication Based Computations ofthe Characteristic Polynomial

Clément PERNET,joint work with Arne Storjohann

Symbolic Computation GroupUniversity of Waterloo

Joint Lab Meeting ORCCA-SCG,February 9, 2007

Clément Pernet Matrix multiplication based Characteristic Polynomial

Page 2: Matrix Multiplication Based Computations of the ... · The most efficient routine in linear algebra. Several reasons: dedicated processor instruction fused-mac: z ←xy +z simple

IntroductionMatrix multiplication based linear algebraComputing the characteristic polynomial

Conclusion and perspectives

Introduction

Dense Linear Algebra over a Field:one of the usual models for complexity in linear algebraapplied to

R: floating point linear algebraGF (q), Zp and Z ( using CRT)

Applications in exact computation:Cryptography : number field sievesRepresentation theory : null space basis

Topology : Smith normal formsGraph theory : characteristic polynomial

...

Clément Pernet Matrix multiplication based Characteristic Polynomial

Page 3: Matrix Multiplication Based Computations of the ... · The most efficient routine in linear algebra. Several reasons: dedicated processor instruction fused-mac: z ←xy +z simple

IntroductionMatrix multiplication based linear algebraComputing the characteristic polynomial

Conclusion and perspectives

Introduction

Dense Linear Algebra over a Field:one of the usual models for complexity in linear algebraapplied to

R: floating point linear algebraGF (q), Zp and Z ( using CRT)

Applications in exact computation:Cryptography : number field sievesRepresentation theory : null space basis

Topology : Smith normal formsGraph theory : characteristic polynomial

...

Clément Pernet Matrix multiplication based Characteristic Polynomial

Page 4: Matrix Multiplication Based Computations of the ... · The most efficient routine in linear algebra. Several reasons: dedicated processor instruction fused-mac: z ←xy +z simple

IntroductionMatrix multiplication based linear algebraComputing the characteristic polynomial

Conclusion and perspectives

Approach

ProblemCompute the characteristic polynomial of a dense matrix over afield

Deterministic or Las Vegas randomized algorithmns

Asymptotic time complexity...... and practical algorithms

⇒Balance between asymptotic complexity and practicalefficiency considerations

space complexity

Clément Pernet Matrix multiplication based Characteristic Polynomial

Page 5: Matrix Multiplication Based Computations of the ... · The most efficient routine in linear algebra. Several reasons: dedicated processor instruction fused-mac: z ←xy +z simple

IntroductionMatrix multiplication based linear algebraComputing the characteristic polynomial

Conclusion and perspectives

Approach

ProblemCompute the characteristic polynomial of a dense matrix over afield

Deterministic or Las Vegas randomized algorithmnsAsymptotic time complexity...... and practical algorithms

⇒Balance between asymptotic complexity and practicalefficiency considerationsspace complexity

Clément Pernet Matrix multiplication based Characteristic Polynomial

Page 6: Matrix Multiplication Based Computations of the ... · The most efficient routine in linear algebra. Several reasons: dedicated processor instruction fused-mac: z ←xy +z simple

IntroductionMatrix multiplication based linear algebraComputing the characteristic polynomial

Conclusion and perspectives

Approach

ProblemCompute the characteristic polynomial of a dense matrix over afield

Deterministic or Las Vegas randomized algorithmnsAsymptotic time complexity...... and practical algorithms⇒Balance between asymptotic complexity and practical

efficiency considerations

space complexity

Clément Pernet Matrix multiplication based Characteristic Polynomial

Page 7: Matrix Multiplication Based Computations of the ... · The most efficient routine in linear algebra. Several reasons: dedicated processor instruction fused-mac: z ←xy +z simple

IntroductionMatrix multiplication based linear algebraComputing the characteristic polynomial

Conclusion and perspectives

Approach

ProblemCompute the characteristic polynomial of a dense matrix over afield

Deterministic or Las Vegas randomized algorithmnsAsymptotic time complexity...... and practical algorithms⇒Balance between asymptotic complexity and practical

efficiency considerationsspace complexity

Clément Pernet Matrix multiplication based Characteristic Polynomial

Page 8: Matrix Multiplication Based Computations of the ... · The most efficient routine in linear algebra. Several reasons: dedicated processor instruction fused-mac: z ←xy +z simple

IntroductionMatrix multiplication based linear algebraComputing the characteristic polynomial

Conclusion and perspectives

Outline

1 Matrix multiplication based linear algebraMatrix multiplication: a building blockReductions to matrix multiplication

2 Computing the characteristic polynomialState of the artA new algorithmAlgorithm into practice

Clément Pernet Matrix multiplication based Characteristic Polynomial

Page 9: Matrix Multiplication Based Computations of the ... · The most efficient routine in linear algebra. Several reasons: dedicated processor instruction fused-mac: z ←xy +z simple

IntroductionMatrix multiplication based linear algebraComputing the characteristic polynomial

Conclusion and perspectives

Matrix multiplication: a building blockReductions to matrix multiplication

Outline

1 Matrix multiplication based linear algebraMatrix multiplication: a building blockReductions to matrix multiplication

2 Computing the characteristic polynomialState of the artA new algorithmAlgorithm into practice

Clément Pernet Matrix multiplication based Characteristic Polynomial

Page 10: Matrix Multiplication Based Computations of the ... · The most efficient routine in linear algebra. Several reasons: dedicated processor instruction fused-mac: z ←xy +z simple

IntroductionMatrix multiplication based linear algebraComputing the characteristic polynomial

Conclusion and perspectives

Matrix multiplication: a building blockReductions to matrix multiplication

Outline

1 Matrix multiplication based linear algebraMatrix multiplication: a building blockReductions to matrix multiplication

2 Computing the characteristic polynomialState of the artA new algorithmAlgorithm into practice

Clément Pernet Matrix multiplication based Characteristic Polynomial

Page 11: Matrix Multiplication Based Computations of the ... · The most efficient routine in linear algebra. Several reasons: dedicated processor instruction fused-mac: z ←xy +z simple

IntroductionMatrix multiplication based linear algebraComputing the characteristic polynomial

Conclusion and perspectives

Matrix multiplication: a building blockReductions to matrix multiplication

Asymptotic complexity

Matrix multiplication:

Folklore: 2n3 − n2

Strassen 1969: 7n2.807 + o(n2.807)

Winograd 1971: 6n2.807 + o(n2.807)

...Coppersmith Winograd 1990: O

(n2.376)

⇒O (nω), where ω denotes an admissible exponent

Clément Pernet Matrix multiplication based Characteristic Polynomial

Page 12: Matrix Multiplication Based Computations of the ... · The most efficient routine in linear algebra. Several reasons: dedicated processor instruction fused-mac: z ←xy +z simple

IntroductionMatrix multiplication based linear algebraComputing the characteristic polynomial

Conclusion and perspectives

Matrix multiplication: a building blockReductions to matrix multiplication

Efficiency in practice

The most efficient routine in linear algebra.Several reasons:

dedicated processor instruction fused-mac: z ← xy + z

simple structure of the dot-product (pipelining is easy)

sub-cubic algorithmused to be considered as not practicablebeware of unstability with floating point numbersbut improves efficiency over finite fields

Clément Pernet Matrix multiplication based Characteristic Polynomial

Page 13: Matrix Multiplication Based Computations of the ... · The most efficient routine in linear algebra. Several reasons: dedicated processor instruction fused-mac: z ←xy +z simple

IntroductionMatrix multiplication based linear algebraComputing the characteristic polynomial

Conclusion and perspectives

Matrix multiplication: a building blockReductions to matrix multiplication

Efficiency in practice

The most efficient routine in linear algebra.Several reasons:

dedicated processor instruction fused-mac: z ← xy + zsimple structure of the dot-product (pipelining is easy)

sub-cubic algorithmused to be considered as not practicablebeware of unstability with floating point numbersbut improves efficiency over finite fields

Clément Pernet Matrix multiplication based Characteristic Polynomial

Page 14: Matrix Multiplication Based Computations of the ... · The most efficient routine in linear algebra. Several reasons: dedicated processor instruction fused-mac: z ←xy +z simple

IntroductionMatrix multiplication based linear algebraComputing the characteristic polynomial

Conclusion and perspectives

Matrix multiplication: a building blockReductions to matrix multiplication

Efficiency in practice

The most efficient routine in linear algebra.Several reasons:

dedicated processor instruction fused-mac: z ← xy + zsimple structure of the dot-product (pipelining is easy)enables better memory management

sub-cubic algorithmused to be considered as not practicablebeware of unstability with floating point numbersbut improves efficiency over finite fields

Clément Pernet Matrix multiplication based Characteristic Polynomial

Page 15: Matrix Multiplication Based Computations of the ... · The most efficient routine in linear algebra. Several reasons: dedicated processor instruction fused-mac: z ←xy +z simple

IntroductionMatrix multiplication based linear algebraComputing the characteristic polynomial

Conclusion and perspectives

Matrix multiplication: a building blockReductions to matrix multiplication

Efficiency in practice

The most efficient routine in linear algebra.Several reasons:

dedicated processor instruction fused-mac: z ← xy + zsimple structure of the dot-product (pipelining is easy)enables better memory managementsub-cubic algorithm

used to be considered as not practicablebeware of unstability with floating point numbersbut improves efficiency over finite fields

Clément Pernet Matrix multiplication based Characteristic Polynomial

Page 16: Matrix Multiplication Based Computations of the ... · The most efficient routine in linear algebra. Several reasons: dedicated processor instruction fused-mac: z ←xy +z simple

IntroductionMatrix multiplication based linear algebraComputing the characteristic polynomial

Conclusion and perspectives

Matrix multiplication: a building blockReductions to matrix multiplication

Efficiency in practice

The most efficient routine in linear algebra.Several reasons:

dedicated processor instruction fused-mac: z ← xy + zsimple structure of the dot-product (pipelining is easy)enables better memory managementsub-cubic algorithm

used to be considered as not practicablebeware of unstability with floating point numbersbut improves efficiency over finite fields

Clément Pernet Matrix multiplication based Characteristic Polynomial

Page 17: Matrix Multiplication Based Computations of the ... · The most efficient routine in linear algebra. Several reasons: dedicated processor instruction fused-mac: z ←xy +z simple

IntroductionMatrix multiplication based linear algebraComputing the characteristic polynomial

Conclusion and perspectives

Matrix multiplication: a building blockReductions to matrix multiplication

Memory management considerations

CPU-Memory communication: bandwidth gap⇒Hierarchy of several cache memory levels

Imposes a structure for algorithms: operationsmust be blocked to increase data locality and fitin the cache

CPU

L1

RAM

L2

L3

Reuse of the dataWork� Data to amortize memory transfer⇒reach the peak performance of the CPU

Matrix multiplication: n3 � n2

⇒well suited for block design

Clément Pernet Matrix multiplication based Characteristic Polynomial

Page 18: Matrix Multiplication Based Computations of the ... · The most efficient routine in linear algebra. Several reasons: dedicated processor instruction fused-mac: z ←xy +z simple

IntroductionMatrix multiplication based linear algebraComputing the characteristic polynomial

Conclusion and perspectives

Matrix multiplication: a building blockReductions to matrix multiplication

Memory management considerations

CPU-Memory communication: bandwidth gap⇒Hierarchy of several cache memory levels

Imposes a structure for algorithms: operationsmust be blocked to increase data locality and fitin the cache ...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

Reuse of the dataWork� Data to amortize memory transfer⇒reach the peak performance of the CPU

Matrix multiplication: n3 � n2

⇒well suited for block design

Clément Pernet Matrix multiplication based Characteristic Polynomial

Page 19: Matrix Multiplication Based Computations of the ... · The most efficient routine in linear algebra. Several reasons: dedicated processor instruction fused-mac: z ←xy +z simple

IntroductionMatrix multiplication based linear algebraComputing the characteristic polynomial

Conclusion and perspectives

Matrix multiplication: a building blockReductions to matrix multiplication

Memory management considerations

CPU-Memory communication: bandwidth gap⇒Hierarchy of several cache memory levels

Imposes a structure for algorithms: operationsmust be blocked to increase data locality and fitin the cache ...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

...

Reuse of the dataWork� Data to amortize memory transfer⇒reach the peak performance of the CPU

Matrix multiplication: n3 � n2

⇒well suited for block design

Clément Pernet Matrix multiplication based Characteristic Polynomial

Page 20: Matrix Multiplication Based Computations of the ... · The most efficient routine in linear algebra. Several reasons: dedicated processor instruction fused-mac: z ←xy +z simple

IntroductionMatrix multiplication based linear algebraComputing the characteristic polynomial

Conclusion and perspectives

Matrix multiplication: a building blockReductions to matrix multiplication

Practical implementation over finite fields

Matrix multiplication

0

1000

2000

3000

4000

5000

6000

0 1000 2000 3000 4000 5000 6000 7000

Mfo

ps

n

Multiplication classique dans Z/65521Z sur un P4, 3.4 GHz

Standard Bloc-33

Clément Pernet Matrix multiplication based Characteristic Polynomial

Page 21: Matrix Multiplication Based Computations of the ... · The most efficient routine in linear algebra. Several reasons: dedicated processor instruction fused-mac: z ←xy +z simple

IntroductionMatrix multiplication based linear algebraComputing the characteristic polynomial

Conclusion and perspectives

Matrix multiplication: a building blockReductions to matrix multiplication

Practical implementation over finite fields

Matrix multiplication

0

1000

2000

3000

4000

5000

6000

0 1000 2000 3000 4000 5000 6000 7000

Mfo

ps

n

Multiplication classique dans Z/65521Z sur un P4, 3.4 GHz

fgemm Standard Bloc-33

Clément Pernet Matrix multiplication based Characteristic Polynomial

Page 22: Matrix Multiplication Based Computations of the ... · The most efficient routine in linear algebra. Several reasons: dedicated processor instruction fused-mac: z ←xy +z simple

IntroductionMatrix multiplication based linear algebraComputing the characteristic polynomial

Conclusion and perspectives

Matrix multiplication: a building blockReductions to matrix multiplication

Practical implementation over finite fields

Matrix multiplication

0

1000

2000

3000

4000

5000

6000

0 1000 2000 3000 4000 5000 6000 7000

Mfo

ps

n

Multiplication classique dans Z/65521Z sur un P4, 3.4 GHz

dgemm fgemm Standard Bloc-33

Clément Pernet Matrix multiplication based Characteristic Polynomial

Page 23: Matrix Multiplication Based Computations of the ... · The most efficient routine in linear algebra. Several reasons: dedicated processor instruction fused-mac: z ←xy +z simple

IntroductionMatrix multiplication based linear algebraComputing the characteristic polynomial

Conclusion and perspectives

Matrix multiplication: a building blockReductions to matrix multiplication

Practical implementation over finite fields

Matrix multiplication

600

800

1000

1200

1400

1600

1800

2000

0 500 1000 1500 2000 2500 3000

Mfo

ps

n

Multiplication rapide dans Z/65521Z sur un PIII-1.6Ghz (1Mo de cache L2)

ClassiqueWinograd 1 niveau

Winograd 2 niveauxWinograd 3 niveauxWinograd 4 niveauxWinograd 5 niveaux

Clément Pernet Matrix multiplication based Characteristic Polynomial

Page 24: Matrix Multiplication Based Computations of the ... · The most efficient routine in linear algebra. Several reasons: dedicated processor instruction fused-mac: z ←xy +z simple

IntroductionMatrix multiplication based linear algebraComputing the characteristic polynomial

Conclusion and perspectives

Matrix multiplication: a building blockReductions to matrix multiplication

Outline

1 Matrix multiplication based linear algebraMatrix multiplication: a building blockReductions to matrix multiplication

2 Computing the characteristic polynomialState of the artA new algorithmAlgorithm into practice

Clément Pernet Matrix multiplication based Characteristic Polynomial

Page 25: Matrix Multiplication Based Computations of the ... · The most efficient routine in linear algebra. Several reasons: dedicated processor instruction fused-mac: z ←xy +z simple

IntroductionMatrix multiplication based linear algebraComputing the characteristic polynomial

Conclusion and perspectives

Matrix multiplication: a building blockReductions to matrix multiplication

Other linear algebra problems

Asymptotic complexity:

Used to be in O(n3)

Room for improvement: O (nω) for everyone ?

Practical efficiency: reuse the efficient matrix multiplicationkernel

Clément Pernet Matrix multiplication based Characteristic Polynomial

Page 26: Matrix Multiplication Based Computations of the ... · The most efficient routine in linear algebra. Several reasons: dedicated processor instruction fused-mac: z ←xy +z simple

IntroductionMatrix multiplication based linear algebraComputing the characteristic polynomial

Conclusion and perspectives

Matrix multiplication: a building blockReductions to matrix multiplication

Other linear algebra problems

Asymptotic complexity:

Used to be in O(n3)

Room for improvement: O (nω) for everyone ?

Practical efficiency: reuse the efficient matrix multiplicationkernel

Clément Pernet Matrix multiplication based Characteristic Polynomial

Page 27: Matrix Multiplication Based Computations of the ... · The most efficient routine in linear algebra. Several reasons: dedicated processor instruction fused-mac: z ←xy +z simple

IntroductionMatrix multiplication based linear algebraComputing the characteristic polynomial

Conclusion and perspectives

Matrix multiplication: a building blockReductions to matrix multiplication

Reductions to matrix multiplication

Matrix Inversion [Strassen 69]

[A BC D

]−1

=

[A−1

I

] [I −B

I

] [I

(D − CA−1B)−1

] [I

CA−1 I

]

1: Compute E = A−1 (Recursive call)2: Compute F = D − CEB (MM)3: Compute G = F−1 (Recursive call)4: Compute H = −EB (MM)5: Compute J = HG (MM)6: Compute K = CE (MM)7: Compute L = E + JK (MM)8: Compute M = GK (MM)

9: Return[

E JM G

]

Clément Pernet Matrix multiplication based Characteristic Polynomial

Page 28: Matrix Multiplication Based Computations of the ... · The most efficient routine in linear algebra. Several reasons: dedicated processor instruction fused-mac: z ←xy +z simple

IntroductionMatrix multiplication based linear algebraComputing the characteristic polynomial

Conclusion and perspectives

Matrix multiplication: a building blockReductions to matrix multiplication

Reductions to matrix multiplication

Matrix Inversion [Strassen 69]

[A BC D

]−1

=

[A−1

I

] [I −B

I

] [I

(D − CA−1B)−1

] [I

CA−1 I

]1: Compute E = A−1 (Recursive call)2: Compute F = D − CEB (MM)3: Compute G = F−1 (Recursive call)4: Compute H = −EB (MM)5: Compute J = HG (MM)6: Compute K = CE (MM)7: Compute L = E + JK (MM)8: Compute M = GK (MM)

9: Return[

E JM G

]Clément Pernet Matrix multiplication based Characteristic Polynomial

Page 29: Matrix Multiplication Based Computations of the ... · The most efficient routine in linear algebra. Several reasons: dedicated processor instruction fused-mac: z ←xy +z simple

IntroductionMatrix multiplication based linear algebraComputing the characteristic polynomial

Conclusion and perspectives

Matrix multiplication: a building blockReductions to matrix multiplication

Reductions to matrix multiplication

TRSM: Multiple triangular system solving[A B

C

]−1 [DE

]=

[A−1

I

] [I −B

I

] [I

C−1

] [DE

]

1: Compute F = C−1E (Recursive call)2: Compute G = D − BF (MM)3: Compute H = A−1G (Recursive call)

4: Return[HF

]

Clément Pernet Matrix multiplication based Characteristic Polynomial

Page 30: Matrix Multiplication Based Computations of the ... · The most efficient routine in linear algebra. Several reasons: dedicated processor instruction fused-mac: z ←xy +z simple

IntroductionMatrix multiplication based linear algebraComputing the characteristic polynomial

Conclusion and perspectives

Matrix multiplication: a building blockReductions to matrix multiplication

Reductions to matrix multiplication

TRSM: Multiple triangular system solving[A B

C

]−1 [DE

]=

[A−1

I

] [I −B

I

] [I

C−1

] [DE

]

1: Compute F = C−1E (Recursive call)2: Compute G = D − BF (MM)3: Compute H = A−1G (Recursive call)

4: Return[HF

]

Clément Pernet Matrix multiplication based Characteristic Polynomial

Page 31: Matrix Multiplication Based Computations of the ... · The most efficient routine in linear algebra. Several reasons: dedicated processor instruction fused-mac: z ←xy +z simple

IntroductionMatrix multiplication based linear algebraComputing the characteristic polynomial

Conclusion and perspectives

Matrix multiplication: a building blockReductions to matrix multiplication

Reductions to matrix multiplication

LU decomposition[A BC D

]=

[LA

CU−1A LE

] [UA L−1

A BUE

]where E = D − CA−1B

1: Compute A = LAUA (Recursive call)2: Compute F = CU−1

A (TRSM)3: Compute G = L−1

A B (TRSM)4: Compute E = D − FG (MM)5: Compute E = LEUE (Recursive call)

6: Return([

LAF LE

],

[UA G

UE

])

Clément Pernet Matrix multiplication based Characteristic Polynomial

Page 32: Matrix Multiplication Based Computations of the ... · The most efficient routine in linear algebra. Several reasons: dedicated processor instruction fused-mac: z ←xy +z simple

IntroductionMatrix multiplication based linear algebraComputing the characteristic polynomial

Conclusion and perspectives

Matrix multiplication: a building blockReductions to matrix multiplication

Reductions to matrix multiplication

LU decomposition[A BC D

]=

[LA

CU−1A LE

] [UA L−1

A BUE

]where E = D − CA−1B

1: Compute A = LAUA (Recursive call)2: Compute F = CU−1

A (TRSM)3: Compute G = L−1

A B (TRSM)4: Compute E = D − FG (MM)5: Compute E = LEUE (Recursive call)

6: Return([

LAF LE

],

[UA G

UE

])

Clément Pernet Matrix multiplication based Characteristic Polynomial

Page 33: Matrix Multiplication Based Computations of the ... · The most efficient routine in linear algebra. Several reasons: dedicated processor instruction fused-mac: z ←xy +z simple

IntroductionMatrix multiplication based linear algebraComputing the characteristic polynomial

Conclusion and perspectives

Matrix multiplication: a building blockReductions to matrix multiplication

Reductions to matrix multiplication

Divide and conquer approach:⇒ involves computations with dimensions n

2i for i = 1 . . . log2 n

⇒ overall time complexity by geometric progression

log2 n∑i=1

2i( n

2i

= O (nω)

These reductions reduce to in O (nω) the following problemsdet, rank, rank profile,echelon form, inverse, system solving.

What about the characteristic polynomial ?

Clément Pernet Matrix multiplication based Characteristic Polynomial

Page 34: Matrix Multiplication Based Computations of the ... · The most efficient routine in linear algebra. Several reasons: dedicated processor instruction fused-mac: z ←xy +z simple

IntroductionMatrix multiplication based linear algebraComputing the characteristic polynomial

Conclusion and perspectives

Matrix multiplication: a building blockReductions to matrix multiplication

Reductions to matrix multiplication

Divide and conquer approach:⇒ involves computations with dimensions n

2i for i = 1 . . . log2 n

⇒ overall time complexity by geometric progression

log2 n∑i=1

2i( n

2i

= O (nω)

These reductions reduce to in O (nω) the following problemsdet, rank, rank profile,echelon form, inverse, system solving.

What about the characteristic polynomial ?

Clément Pernet Matrix multiplication based Characteristic Polynomial

Page 35: Matrix Multiplication Based Computations of the ... · The most efficient routine in linear algebra. Several reasons: dedicated processor instruction fused-mac: z ←xy +z simple

IntroductionMatrix multiplication based linear algebraComputing the characteristic polynomial

Conclusion and perspectives

Matrix multiplication: a building blockReductions to matrix multiplication

Reductions to matrix multiplication

Divide and conquer approach:⇒ involves computations with dimensions n

2i for i = 1 . . . log2 n

⇒ overall time complexity by geometric progression

log2 n∑i=1

2i( n

2i

= O (nω)

These reductions reduce to in O (nω) the following problemsdet, rank, rank profile,echelon form, inverse, system solving.

What about the characteristic polynomial ?

Clément Pernet Matrix multiplication based Characteristic Polynomial

Page 36: Matrix Multiplication Based Computations of the ... · The most efficient routine in linear algebra. Several reasons: dedicated processor instruction fused-mac: z ←xy +z simple

IntroductionMatrix multiplication based linear algebraComputing the characteristic polynomial

Conclusion and perspectives

State of the artA new algorithmAlgorithm into practice

Outline

1 Matrix multiplication based linear algebraMatrix multiplication: a building blockReductions to matrix multiplication

2 Computing the characteristic polynomialState of the artA new algorithmAlgorithm into practice

Clément Pernet Matrix multiplication based Characteristic Polynomial

Page 37: Matrix Multiplication Based Computations of the ... · The most efficient routine in linear algebra. Several reasons: dedicated processor instruction fused-mac: z ←xy +z simple

IntroductionMatrix multiplication based linear algebraComputing the characteristic polynomial

Conclusion and perspectives

State of the artA new algorithmAlgorithm into practice

Outline

1 Matrix multiplication based linear algebraMatrix multiplication: a building blockReductions to matrix multiplication

2 Computing the characteristic polynomialState of the artA new algorithmAlgorithm into practice

Clément Pernet Matrix multiplication based Characteristic Polynomial

Page 38: Matrix Multiplication Based Computations of the ... · The most efficient routine in linear algebra. Several reasons: dedicated processor instruction fused-mac: z ←xy +z simple

IntroductionMatrix multiplication based linear algebraComputing the characteristic polynomial

Conclusion and perspectives

State of the artA new algorithmAlgorithm into practice

Pre-Strassen age

Leverrier 1840: trace of powers of A, and Newton’s formulaimproved/rediscovered by Souriau, Faddeev,Frame and CsankyO

(n4), based on Matrix multiplication

Suited for parallel computation model

Danilevskii 1937: elementary row/column operations⇒O

(n3)

Hessenberg 1942: transformation to quasi-upper triangularand determinant expansion formula.⇒O

(n3)

But no trivial translation into a block algorithm with O (nω)complexity.

Clément Pernet Matrix multiplication based Characteristic Polynomial

Page 39: Matrix Multiplication Based Computations of the ... · The most efficient routine in linear algebra. Several reasons: dedicated processor instruction fused-mac: z ←xy +z simple

IntroductionMatrix multiplication based linear algebraComputing the characteristic polynomial

Conclusion and perspectives

State of the artA new algorithmAlgorithm into practice

Pre-Strassen age

Leverrier 1840: trace of powers of A, and Newton’s formulaimproved/rediscovered by Souriau, Faddeev,Frame and CsankyO

(n4), based on Matrix multiplication

Suited for parallel computation modelDanilevskii 1937: elementary row/column operations

⇒O(n3)

Hessenberg 1942: transformation to quasi-upper triangularand determinant expansion formula.⇒O

(n3)

But no trivial translation into a block algorithm with O (nω)complexity.

Clément Pernet Matrix multiplication based Characteristic Polynomial

Page 40: Matrix Multiplication Based Computations of the ... · The most efficient routine in linear algebra. Several reasons: dedicated processor instruction fused-mac: z ←xy +z simple

IntroductionMatrix multiplication based linear algebraComputing the characteristic polynomial

Conclusion and perspectives

State of the artA new algorithmAlgorithm into practice

Pre-Strassen age

Leverrier 1840: trace of powers of A, and Newton’s formulaimproved/rediscovered by Souriau, Faddeev,Frame and CsankyO

(n4), based on Matrix multiplication

Suited for parallel computation modelDanilevskii 1937: elementary row/column operations

⇒O(n3)

Hessenberg 1942: transformation to quasi-upper triangularand determinant expansion formula.⇒O

(n3)

But no trivial translation into a block algorithm with O (nω)complexity.

Clément Pernet Matrix multiplication based Characteristic Polynomial

Page 41: Matrix Multiplication Based Computations of the ... · The most efficient routine in linear algebra. Several reasons: dedicated processor instruction fused-mac: z ←xy +z simple

IntroductionMatrix multiplication based linear algebraComputing the characteristic polynomial

Conclusion and perspectives

State of the artA new algorithmAlgorithm into practice

Pre-Strassen age

Leverrier 1840: trace of powers of A, and Newton’s formulaimproved/rediscovered by Souriau, Faddeev,Frame and CsankyO

(n4), based on Matrix multiplication

Suited for parallel computation modelDanilevskii 1937: elementary row/column operations

⇒O(n3)

Hessenberg 1942: transformation to quasi-upper triangularand determinant expansion formula.⇒O

(n3)

But no trivial translation into a block algorithm with O (nω)complexity.

Clément Pernet Matrix multiplication based Characteristic Polynomial

Page 42: Matrix Multiplication Based Computations of the ... · The most efficient routine in linear algebra. Several reasons: dedicated processor instruction fused-mac: z ←xy +z simple

IntroductionMatrix multiplication based linear algebraComputing the characteristic polynomial

Conclusion and perspectives

State of the artA new algorithmAlgorithm into practice

Post-Strassen age

Preparata & Sarwate 1978: Update Csanky with fast matrixmultiplication⇒O

(nω+1)

Keller-Gehrig 1985, alg.1: computes (A2i)i=1... log2 n to form a

Krylov basis.

O (nω log n)the best complexity up to now

Keller-Gehrig 1985, alg.2: inspired by Danilevskii, blockoperations

O (nω)Only valid with generic matrices

Clément Pernet Matrix multiplication based Characteristic Polynomial

Page 43: Matrix Multiplication Based Computations of the ... · The most efficient routine in linear algebra. Several reasons: dedicated processor instruction fused-mac: z ←xy +z simple

IntroductionMatrix multiplication based linear algebraComputing the characteristic polynomial

Conclusion and perspectives

State of the artA new algorithmAlgorithm into practice

Post-Strassen age

Preparata & Sarwate 1978: Update Csanky with fast matrixmultiplication⇒O

(nω+1)

Keller-Gehrig 1985, alg.1: computes (A2i)i=1... log2 n to form a

Krylov basis.

O (nω log n)the best complexity up to now

Keller-Gehrig 1985, alg.2: inspired by Danilevskii, blockoperations

O (nω)Only valid with generic matrices

Clément Pernet Matrix multiplication based Characteristic Polynomial

Page 44: Matrix Multiplication Based Computations of the ... · The most efficient routine in linear algebra. Several reasons: dedicated processor instruction fused-mac: z ←xy +z simple

IntroductionMatrix multiplication based linear algebraComputing the characteristic polynomial

Conclusion and perspectives

State of the artA new algorithmAlgorithm into practice

Post-Strassen age

Preparata & Sarwate 1978: Update Csanky with fast matrixmultiplication⇒O

(nω+1)

Keller-Gehrig 1985, alg.1: computes (A2i)i=1... log2 n to form a

Krylov basis.

O (nω log n)the best complexity up to now

Keller-Gehrig 1985, alg.2: inspired by Danilevskii, blockoperations

O (nω)Only valid with generic matrices

Clément Pernet Matrix multiplication based Characteristic Polynomial

Page 45: Matrix Multiplication Based Computations of the ... · The most efficient routine in linear algebra. Several reasons: dedicated processor instruction fused-mac: z ←xy +z simple

IntroductionMatrix multiplication based linear algebraComputing the characteristic polynomial

Conclusion and perspectives

State of the artA new algorithmAlgorithm into practice

Outline

1 Matrix multiplication based linear algebraMatrix multiplication: a building blockReductions to matrix multiplication

2 Computing the characteristic polynomialState of the artA new algorithmAlgorithm into practice

Clément Pernet Matrix multiplication based Characteristic Polynomial

Page 46: Matrix Multiplication Based Computations of the ... · The most efficient routine in linear algebra. Several reasons: dedicated processor instruction fused-mac: z ←xy +z simple

IntroductionMatrix multiplication based linear algebraComputing the characteristic polynomial

Conclusion and perspectives

State of the artA new algorithmAlgorithm into practice

Statement

Theorem

If A is a n × n matrix over a field having more than 2n2

elements, the characteristic polynomial of A can be computedin O (nω) field operations by a Las Vegas randomized algorithm.

Clément Pernet Matrix multiplication based Characteristic Polynomial

Page 47: Matrix Multiplication Based Computations of the ... · The most efficient routine in linear algebra. Several reasons: dedicated processor instruction fused-mac: z ←xy +z simple

IntroductionMatrix multiplication based linear algebraComputing the characteristic polynomial

Conclusion and perspectives

State of the artA new algorithmAlgorithm into practice

Definition (degree d Krylov matrix of one vector v )

K =[v Av . . . Ad−1v

]Property

A× K = K ×

0 ∗1 ∗

. . . ∗1 ∗

⇒if d = n, K−1AK = CPAcar

⇒[Keller-Gehrig, alg. 2] computes K in O (nω)

Clément Pernet Matrix multiplication based Characteristic Polynomial

Page 48: Matrix Multiplication Based Computations of the ... · The most efficient routine in linear algebra. Several reasons: dedicated processor instruction fused-mac: z ←xy +z simple

IntroductionMatrix multiplication based linear algebraComputing the characteristic polynomial

Conclusion and perspectives

State of the artA new algorithmAlgorithm into practice

Definition (degree d Krylov matrix of one vector v )

K =[v Av . . . Ad−1v

]Property

A× K = K ×

0 ∗1 ∗

. . . ∗1 ∗

⇒if d = n, K−1AK = CPA

car

⇒[Keller-Gehrig, alg. 2] computes K in O (nω)

Clément Pernet Matrix multiplication based Characteristic Polynomial

Page 49: Matrix Multiplication Based Computations of the ... · The most efficient routine in linear algebra. Several reasons: dedicated processor instruction fused-mac: z ←xy +z simple

IntroductionMatrix multiplication based linear algebraComputing the characteristic polynomial

Conclusion and perspectives

State of the artA new algorithmAlgorithm into practice

Definition (degree d Krylov matrix of one vector v )

K =[v Av . . . Ad−1v

]Property

A× K = K ×

0 ∗1 ∗

. . . ∗1 ∗

⇒if d = n, K−1AK = CPA

car⇒[Keller-Gehrig, alg. 2] computes K in O (nω)

Clément Pernet Matrix multiplication based Characteristic Polynomial

Page 50: Matrix Multiplication Based Computations of the ... · The most efficient routine in linear algebra. Several reasons: dedicated processor instruction fused-mac: z ←xy +z simple

IntroductionMatrix multiplication based linear algebraComputing the characteristic polynomial

Conclusion and perspectives

State of the artA new algorithmAlgorithm into practice

Definition (degree k Krylov matrix of several vectors vi )

K =[

v1 . . . Ak−1v1 v2 . . . Ak−1v2 . . . vl . . . Ak−1vl]

Property

1

0

1

1

0

1

1

0

1

k k k

A× K = K×

Clément Pernet Matrix multiplication based Characteristic Polynomial

Page 51: Matrix Multiplication Based Computations of the ... · The most efficient routine in linear algebra. Several reasons: dedicated processor instruction fused-mac: z ←xy +z simple

IntroductionMatrix multiplication based linear algebraComputing the characteristic polynomial

Conclusion and perspectives

State of the artA new algorithmAlgorithm into practice

FactIf (d1, . . . dl) is lexicographically maximal such that

K =[

v1 . . . Ad1−1v1 . . . vl . . . Adl−1vl]

is non-singular, then

1

0

1

0

1

1

1

0

1

d2 dld1

K−1AK =

Clément Pernet Matrix multiplication based Characteristic Polynomial

Page 52: Matrix Multiplication Based Computations of the ... · The most efficient routine in linear algebra. Several reasons: dedicated processor instruction fused-mac: z ←xy +z simple

IntroductionMatrix multiplication based linear algebraComputing the characteristic polynomial

Conclusion and perspectives

State of the artA new algorithmAlgorithm into practice

Principle

k -shifted form:

1

0

1

1

0

1

1

0

1

k k ≤ k

try to inflate each slice by one ...... to obtain the k + 1-shifted form

Clément Pernet Matrix multiplication based Characteristic Polynomial

Page 53: Matrix Multiplication Based Computations of the ... · The most efficient routine in linear algebra. Several reasons: dedicated processor instruction fused-mac: z ←xy +z simple

IntroductionMatrix multiplication based linear algebraComputing the characteristic polynomial

Conclusion and perspectives

State of the artA new algorithmAlgorithm into practice

Principle

k -shifted form:

1

0

1

1

0

1

1

0

1

k k ≤ k

try to inflate each slice by one ...

... to obtain the k + 1-shifted form

Clément Pernet Matrix multiplication based Characteristic Polynomial

Page 54: Matrix Multiplication Based Computations of the ... · The most efficient routine in linear algebra. Several reasons: dedicated processor instruction fused-mac: z ←xy +z simple

IntroductionMatrix multiplication based linear algebraComputing the characteristic polynomial

Conclusion and perspectives

State of the artA new algorithmAlgorithm into practice

Principle

k + 1-shifted form:

1

0

1

0

1

0

1

1

1

≤ k + 1k + 1 d3

try to inflate each slice by one ...... to obtain the k + 1-shifted form

Clément Pernet Matrix multiplication based Characteristic Polynomial

Page 55: Matrix Multiplication Based Computations of the ... · The most efficient routine in linear algebra. Several reasons: dedicated processor instruction fused-mac: z ←xy +z simple

IntroductionMatrix multiplication based linear algebraComputing the characteristic polynomial

Conclusion and perspectives

State of the artA new algorithmAlgorithm into practice

Principle

Compute iteratively from 1-shifted form to d1-shifted form

each diagonal block appears in the increasing degreeuntil the shifted Hessenberg form is obtained:

1

0

1

0

1

1

1

0

1

d2 dld1

How to transform from k to k + 1-shifted form ?

Clément Pernet Matrix multiplication based Characteristic Polynomial

Page 56: Matrix Multiplication Based Computations of the ... · The most efficient routine in linear algebra. Several reasons: dedicated processor instruction fused-mac: z ←xy +z simple

IntroductionMatrix multiplication based linear algebraComputing the characteristic polynomial

Conclusion and perspectives

State of the artA new algorithmAlgorithm into practice

Principle

Compute iteratively from 1-shifted form to d1-shifted formeach diagonal block appears in the increasing degree

until the shifted Hessenberg form is obtained:

1

0

1

0

1

1

1

0

1

d2 dld1

How to transform from k to k + 1-shifted form ?

Clément Pernet Matrix multiplication based Characteristic Polynomial

Page 57: Matrix Multiplication Based Computations of the ... · The most efficient routine in linear algebra. Several reasons: dedicated processor instruction fused-mac: z ←xy +z simple

IntroductionMatrix multiplication based linear algebraComputing the characteristic polynomial

Conclusion and perspectives

State of the artA new algorithmAlgorithm into practice

Principle

Compute iteratively from 1-shifted form to d1-shifted formeach diagonal block appears in the increasing degreeuntil the shifted Hessenberg form is obtained:

1

0

1

0

1

1

1

0

1

d2 dld1

How to transform from k to k + 1-shifted form ?

Clément Pernet Matrix multiplication based Characteristic Polynomial

Page 58: Matrix Multiplication Based Computations of the ... · The most efficient routine in linear algebra. Several reasons: dedicated processor instruction fused-mac: z ←xy +z simple

IntroductionMatrix multiplication based linear algebraComputing the characteristic polynomial

Conclusion and perspectives

State of the artA new algorithmAlgorithm into practice

Principle

Compute iteratively from 1-shifted form to d1-shifted formeach diagonal block appears in the increasing degreeuntil the shifted Hessenberg form is obtained:

1

0

1

0

1

1

1

0

1

d2 dld1

How to transform from k to k + 1-shifted form ?

Clément Pernet Matrix multiplication based Characteristic Polynomial

Page 59: Matrix Multiplication Based Computations of the ... · The most efficient routine in linear algebra. Several reasons: dedicated processor instruction fused-mac: z ←xy +z simple

IntroductionMatrix multiplication based linear algebraComputing the characteristic polynomial

Conclusion and perspectives

State of the artA new algorithmAlgorithm into practice

Krylov normal extension

for any k -shifted form

1

0

1

1

0

1

1

0

1

k k ≤ k

Ak = c1 c2 c3

compute the n × (n + k) matrix

1

11

11

1

k kk

K = c1 c2 c3

and form K by picking its first linearly independent columns.

Clément Pernet Matrix multiplication based Characteristic Polynomial

Page 60: Matrix Multiplication Based Computations of the ... · The most efficient routine in linear algebra. Several reasons: dedicated processor instruction fused-mac: z ←xy +z simple

IntroductionMatrix multiplication based linear algebraComputing the characteristic polynomial

Conclusion and perspectives

State of the artA new algorithmAlgorithm into practice

Krylov normal extension

for any k -shifted form

1

0

1

1

0

1

1

0

1

k k ≤ k

Ak = c1 c2 c3

compute the n × (n + k) matrix

1

11

11

1

k kk

K = c1 c2 c3

and form K by picking its first linearly independent columns.

Clément Pernet Matrix multiplication based Characteristic Polynomial

Page 61: Matrix Multiplication Based Computations of the ... · The most efficient routine in linear algebra. Several reasons: dedicated processor instruction fused-mac: z ←xy +z simple

IntroductionMatrix multiplication based linear algebraComputing the characteristic polynomial

Conclusion and perspectives

State of the artA new algorithmAlgorithm into practice

Krylov normal extension

for any k -shifted form

1

0

1

1

0

1

1

0

1

k k ≤ k

Ak = c1 c2 c3

compute the n × (n + k) matrix

1

11

11

1

k kk

K = c1 c2 c3

and form K by picking its first linearly independent columns.

Clément Pernet Matrix multiplication based Characteristic Polynomial

Page 62: Matrix Multiplication Based Computations of the ... · The most efficient routine in linear algebra. Several reasons: dedicated processor instruction fused-mac: z ←xy +z simple

IntroductionMatrix multiplication based linear algebraComputing the characteristic polynomial

Conclusion and perspectives

State of the artA new algorithmAlgorithm into practice

Krylov normal extension

LemmaIf #F > 2n2, with high probability, the matrix K will have the form

1

11

1

1

1

k k ≤ k

K = c1 c2

and Ak+1 = K−1Ak K will be in k + 1 shifted form

Clément Pernet Matrix multiplication based Characteristic Polynomial

Page 63: Matrix Multiplication Based Computations of the ... · The most efficient routine in linear algebra. Several reasons: dedicated processor instruction fused-mac: z ←xy +z simple

IntroductionMatrix multiplication based linear algebraComputing the characteristic polynomial

Conclusion and perspectives

State of the artA new algorithmAlgorithm into practice

The algorithm

Form K : just copy the columns of Ak

Compute K : rank profile of KApply the similarity transformation K−1AkK

How to use matrix multiplication knowing the structure ?

Clément Pernet Matrix multiplication based Characteristic Polynomial

Page 64: Matrix Multiplication Based Computations of the ... · The most efficient routine in linear algebra. Several reasons: dedicated processor instruction fused-mac: z ←xy +z simple

IntroductionMatrix multiplication based linear algebraComputing the characteristic polynomial

Conclusion and perspectives

State of the artA new algorithmAlgorithm into practice

The algorithm

Form K : just copy the columns of Ak

Compute K : rank profile of K

Apply the similarity transformation K−1AkK

How to use matrix multiplication knowing the structure ?

Clément Pernet Matrix multiplication based Characteristic Polynomial

Page 65: Matrix Multiplication Based Computations of the ... · The most efficient routine in linear algebra. Several reasons: dedicated processor instruction fused-mac: z ←xy +z simple

IntroductionMatrix multiplication based linear algebraComputing the characteristic polynomial

Conclusion and perspectives

State of the artA new algorithmAlgorithm into practice

The algorithm

Form K : just copy the columns of Ak

Compute K : rank profile of KApply the similarity transformation K−1AkK

How to use matrix multiplication knowing the structure ?

Clément Pernet Matrix multiplication based Characteristic Polynomial

Page 66: Matrix Multiplication Based Computations of the ... · The most efficient routine in linear algebra. Several reasons: dedicated processor instruction fused-mac: z ←xy +z simple

IntroductionMatrix multiplication based linear algebraComputing the characteristic polynomial

Conclusion and perspectives

State of the artA new algorithmAlgorithm into practice

The algorithm

Form K : just copy the columns of Ak

Compute K : rank profile of KApply the similarity transformation K−1AkK

How to use matrix multiplication knowing the structure ?

Clément Pernet Matrix multiplication based Characteristic Polynomial

Page 67: Matrix Multiplication Based Computations of the ... · The most efficient routine in linear algebra. Several reasons: dedicated processor instruction fused-mac: z ←xy +z simple

IntroductionMatrix multiplication based linear algebraComputing the characteristic polynomial

Conclusion and perspectives

State of the artA new algorithmAlgorithm into practice

Permutations: compressing the dense columns

0

1

1

1

1

0

1

0

1

1

0

1

×PAk = = Q×c2c2c3c1 c3

c1

1

1

0

1

1

1

11

1

×P′K = = Q′× c2c1c1 c2

Clément Pernet Matrix multiplication based Characteristic Polynomial

Page 68: Matrix Multiplication Based Computations of the ... · The most efficient routine in linear algebra. Several reasons: dedicated processor instruction fused-mac: z ←xy +z simple

IntroductionMatrix multiplication based linear algebraComputing the characteristic polynomial

Conclusion and perspectives

State of the artA new algorithmAlgorithm into practice

Permutations: compressing the dense columns

0

1

1

1

1

0

1

0

1

1

0

1

×PAk = = Q×c2c2c3c1 c3

c1

1

1

0

1

1

1

11

1

×P′K = = Q′× c2c1c1 c2

Clément Pernet Matrix multiplication based Characteristic Polynomial

Page 69: Matrix Multiplication Based Computations of the ... · The most efficient routine in linear algebra. Several reasons: dedicated processor instruction fused-mac: z ←xy +z simple

IntroductionMatrix multiplication based linear algebraComputing the characteristic polynomial

Conclusion and perspectives

State of the artA new algorithmAlgorithm into practice

Reduction to Matrix multiplication

Rank profile: derived from LQUP⇒O

(k

(nk

)ω)

Similarity transformation: parenthesing

⇒O(k

(nk

)ω)Overall complexity: summing for each iteration:

n∑k=1

k(n

k

= nωn∑

k=1

(1k

)ω−1

= O (nω)

Clément Pernet Matrix multiplication based Characteristic Polynomial

Page 70: Matrix Multiplication Based Computations of the ... · The most efficient routine in linear algebra. Several reasons: dedicated processor instruction fused-mac: z ←xy +z simple

IntroductionMatrix multiplication based linear algebraComputing the characteristic polynomial

Conclusion and perspectives

State of the artA new algorithmAlgorithm into practice

Reduction to Matrix multiplication

Rank profile: derived from LQUP⇒O

(k

(nk

)ω)Similarity transformation: parenthesing

K−1AK = Q′T[

I ∗0 ∗

]P ′T Q

[I ∗0 ∗

]PQ′

[I ∗0 ∗

]P ′

⇒O(k

(nk

)ω)Overall complexity: summing for each iteration:

n∑k=1

k(n

k

= nωn∑

k=1

(1k

)ω−1

= O (nω)

Clément Pernet Matrix multiplication based Characteristic Polynomial

Page 71: Matrix Multiplication Based Computations of the ... · The most efficient routine in linear algebra. Several reasons: dedicated processor instruction fused-mac: z ←xy +z simple

IntroductionMatrix multiplication based linear algebraComputing the characteristic polynomial

Conclusion and perspectives

State of the artA new algorithmAlgorithm into practice

Reduction to Matrix multiplication

Rank profile: derived from LQUP⇒O

(k

(nk

)ω)Similarity transformation: parenthesing

K−1AK = Q′T([

I ∗0 ∗

](P ′T Q

([I ∗0 ∗

](PQ′

[I ∗0 ∗

]))))P ′

⇒O(k

(nk

)ω)Overall complexity: summing for each iteration:

n∑k=1

k(n

k

= nωn∑

k=1

(1k

)ω−1

= O (nω)

Clément Pernet Matrix multiplication based Characteristic Polynomial

Page 72: Matrix Multiplication Based Computations of the ... · The most efficient routine in linear algebra. Several reasons: dedicated processor instruction fused-mac: z ←xy +z simple

IntroductionMatrix multiplication based linear algebraComputing the characteristic polynomial

Conclusion and perspectives

State of the artA new algorithmAlgorithm into practice

Reduction to Matrix multiplication

Rank profile: derived from LQUP⇒O

(k

(nk

)ω)Similarity transformation: parenthesing

K−1AK = Q′T([

I ∗0 ∗

](P ′T Q

([I ∗0 ∗

](PQ′

[I ∗0 ∗

]))))P ′

⇒O(k

(nk

)ω)

Overall complexity: summing for each iteration:

n∑k=1

k(n

k

= nωn∑

k=1

(1k

)ω−1

= O (nω)

Clément Pernet Matrix multiplication based Characteristic Polynomial

Page 73: Matrix Multiplication Based Computations of the ... · The most efficient routine in linear algebra. Several reasons: dedicated processor instruction fused-mac: z ←xy +z simple

IntroductionMatrix multiplication based linear algebraComputing the characteristic polynomial

Conclusion and perspectives

State of the artA new algorithmAlgorithm into practice

Reduction to Matrix multiplication

Rank profile: derived from LQUP⇒O

(k

(nk

)ω)Similarity transformation: parenthesing

K−1AK = Q′T([

I ∗0 ∗

](P ′T Q

([I ∗0 ∗

](PQ′

[I ∗0 ∗

]))))P ′

⇒O(k

(nk

)ω)Overall complexity: summing for each iteration:

n∑k=1

k(n

k

= nωn∑

k=1

(1k

)ω−1

= O (nω)

Clément Pernet Matrix multiplication based Characteristic Polynomial

Page 74: Matrix Multiplication Based Computations of the ... · The most efficient routine in linear algebra. Several reasons: dedicated processor instruction fused-mac: z ←xy +z simple

IntroductionMatrix multiplication based linear algebraComputing the characteristic polynomial

Conclusion and perspectives

State of the artA new algorithmAlgorithm into practice

Outline

1 Matrix multiplication based linear algebraMatrix multiplication: a building blockReductions to matrix multiplication

2 Computing the characteristic polynomialState of the artA new algorithmAlgorithm into practice

Clément Pernet Matrix multiplication based Characteristic Polynomial

Page 75: Matrix Multiplication Based Computations of the ... · The most efficient routine in linear algebra. Several reasons: dedicated processor instruction fused-mac: z ←xy +z simple

IntroductionMatrix multiplication based linear algebraComputing the characteristic polynomial

Conclusion and perspectives

State of the artA new algorithmAlgorithm into practice

Heuristic improvement

The randomization:

the iterate vectors at the first iteration must be random vectors.or equivalently

the matrix has to be preconditioned: M−1AM for a randommatrix M.

⇒as expensive as the rest of the algorithm

Clément Pernet Matrix multiplication based Characteristic Polynomial

Page 76: Matrix Multiplication Based Computations of the ... · The most efficient routine in linear algebra. Several reasons: dedicated processor instruction fused-mac: z ←xy +z simple

IntroductionMatrix multiplication based linear algebraComputing the characteristic polynomial

Conclusion and perspectives

State of the artA new algorithmAlgorithm into practice

A block Krylov preconditoner

1: Pick n/c random vectors U =[u1 . . . un/c

].

2: M =[U AU . . . Ac−1

]

3: if M is non singular then4: M−1AM = Hc is in c-shifted form.5: else6: complete M into a non singular matrix M by adding some

columns at the end

7: then M−1

AM =

[Hc ∗

R

]8: end if

Clément Pernet Matrix multiplication based Characteristic Polynomial

Page 77: Matrix Multiplication Based Computations of the ... · The most efficient routine in linear algebra. Several reasons: dedicated processor instruction fused-mac: z ←xy +z simple

IntroductionMatrix multiplication based linear algebraComputing the characteristic polynomial

Conclusion and perspectives

State of the artA new algorithmAlgorithm into practice

A block Krylov preconditoner

1: Pick n/c random vectors U =[u1 . . . un/c

].

2: M =[U AU . . . Ac−1

]3: if M is non singular then4: M−1AM = Hc is in c-shifted form.

5: else6: complete M into a non singular matrix M by adding some

columns at the end

7: then M−1

AM =

[Hc ∗

R

]8: end if

Clément Pernet Matrix multiplication based Characteristic Polynomial

Page 78: Matrix Multiplication Based Computations of the ... · The most efficient routine in linear algebra. Several reasons: dedicated processor instruction fused-mac: z ←xy +z simple

IntroductionMatrix multiplication based linear algebraComputing the characteristic polynomial

Conclusion and perspectives

State of the artA new algorithmAlgorithm into practice

A block Krylov preconditoner

1: Pick n/c random vectors U =[u1 . . . un/c

].

2: M =[U AU . . . Ac−1

]3: if M is non singular then4: M−1AM = Hc is in c-shifted form.5: else6: complete M into a non singular matrix M by adding some

columns at the end

7: then M−1

AM =

[Hc ∗

R

]8: end if

Clément Pernet Matrix multiplication based Characteristic Polynomial

Page 79: Matrix Multiplication Based Computations of the ... · The most efficient routine in linear algebra. Several reasons: dedicated processor instruction fused-mac: z ←xy +z simple

IntroductionMatrix multiplication based linear algebraComputing the characteristic polynomial

Conclusion and perspectives

State of the artA new algorithmAlgorithm into practice

Efficiency balancing parameter

c small: full square matrix multiplications, but more opsc large: tends to matrix-vector products, but less ops

⇒parameter c balances efficiency

120

130

140

150

160

170

180

190

200

210

220

230

0 50 100 150 200

Tim

e (s

)

Preconditionning parameter c

Finding the optimal preconditionning paramater, n=5000

1 block of order 50005 blocks of order 100010 blocks of order 500

Clément Pernet Matrix multiplication based Characteristic Polynomial

Page 80: Matrix Multiplication Based Computations of the ... · The most efficient routine in linear algebra. Several reasons: dedicated processor instruction fused-mac: z ←xy +z simple

IntroductionMatrix multiplication based linear algebraComputing the characteristic polynomial

Conclusion and perspectives

State of the artA new algorithmAlgorithm into practice

Efficiency balancing parameter

c small: full square matrix multiplications, but more opsc large: tends to matrix-vector products, but less ops

⇒parameter c balances efficiency

120

130

140

150

160

170

180

190

200

210

220

230

0 50 100 150 200

Tim

e (s

)

Preconditionning parameter c

Finding the optimal preconditionning paramater, n=5000

1 block of order 50005 blocks of order 100010 blocks of order 500

Clément Pernet Matrix multiplication based Characteristic Polynomial

Page 81: Matrix Multiplication Based Computations of the ... · The most efficient routine in linear algebra. Several reasons: dedicated processor instruction fused-mac: z ←xy +z simple

IntroductionMatrix multiplication based linear algebraComputing the characteristic polynomial

Conclusion and perspectives

State of the artA new algorithmAlgorithm into practice

Efficiency balancing parameter

c small: full square matrix multiplications, but more opsc large: tends to matrix-vector products, but less ops

⇒parameter c balances efficiency

120

130

140

150

160

170

180

190

200

210

220

230

0 50 100 150 200

Tim

e (s

)

Preconditionning parameter c

Finding the optimal preconditionning paramater, n=5000

1 block of order 50005 blocks of order 100010 blocks of order 500

Clément Pernet Matrix multiplication based Characteristic Polynomial

Page 82: Matrix Multiplication Based Computations of the ... · The most efficient routine in linear algebra. Several reasons: dedicated processor instruction fused-mac: z ←xy +z simple

IntroductionMatrix multiplication based linear algebraComputing the characteristic polynomial

Conclusion and perspectives

State of the artA new algorithmAlgorithm into practice

Experiments

n LU-Krylov New algorithm200 0.024 0.032300 0.06s 0.088s500 0.248s 0.316s750 1.084s 1.288s

1000 2.42s 2.296s5000 267.6s 153.9s

10 000 1827s 991s20 000 14 652s 7097s30 000 48 887s 24 928s

Computation time for 1 Frobenius block matrices, Itanium2-64 1.3Ghz, 192Gb

Clément Pernet Matrix multiplication based Characteristic Polynomial

Page 83: Matrix Multiplication Based Computations of the ... · The most efficient routine in linear algebra. Several reasons: dedicated processor instruction fused-mac: z ←xy +z simple

IntroductionMatrix multiplication based linear algebraComputing the characteristic polynomial

Conclusion and perspectives

State of the artA new algorithmAlgorithm into practice

Experiments

0.01

0.1

1

10

100

1000

10000

100000

100 1000 10000 100000

Tim

e (s

)

Matrix order

Comparison for 1 frobenius block matrices over Z/(547909)

New algorithmLU−Krylov

Timing comparison between the new algorithm and LU-Krylov, logarithmicscales, Itanium2-64 1.3Ghz, 192Gb

Clément Pernet Matrix multiplication based Characteristic Polynomial

Page 84: Matrix Multiplication Based Computations of the ... · The most efficient routine in linear algebra. Several reasons: dedicated processor instruction fused-mac: z ←xy +z simple

IntroductionMatrix multiplication based linear algebraComputing the characteristic polynomial

Conclusion and perspectives

State of the artA new algorithmAlgorithm into practice

Comparison to Magma

n magma-2.11 LU-Krylov New algorithm100 0.010s 0.005s 0.006s300 0.830s 0.294s 0.105s500 3.810s 1.316s 0.387s800 15.64s 4.663s 1.387s1000 29.96s 10.21s 2.755s1500 102.1s 33.36s 7.696s2000 238.0s 79.13s 17.91s3000 802.0s 258.4s 61.09s5000 3793s 1177s 273.4s7500 MT 4209s 991.4s

10 000 MT 8847s 2080s

Computation time for 1 Frobenius block matrices, Athlon 2200, 1.8Ghz, 2Gb

MT: Memory thrashing

Clément Pernet Matrix multiplication based Characteristic Polynomial

Page 85: Matrix Multiplication Based Computations of the ... · The most efficient routine in linear algebra. Several reasons: dedicated processor instruction fused-mac: z ←xy +z simple

IntroductionMatrix multiplication based linear algebraComputing the characteristic polynomial

Conclusion and perspectives

Conclusion and perspectives

Results:

Las Vegas reduction to matrix multiplication,The Frobenius normal form is easily derivable in O (nω)......but no transformation matrixAdaptive combination with block Krylov in practice.

Still to be done:

Condition on the size of the field is a limitation. Eberly’salgorithm ?Ideally: derandomization ? (deterministic)

Clément Pernet Matrix multiplication based Characteristic Polynomial

Page 86: Matrix Multiplication Based Computations of the ... · The most efficient routine in linear algebra. Several reasons: dedicated processor instruction fused-mac: z ←xy +z simple

IntroductionMatrix multiplication based linear algebraComputing the characteristic polynomial

Conclusion and perspectives

Conclusion and perspectives

Results:

Las Vegas reduction to matrix multiplication,The Frobenius normal form is easily derivable in O (nω)......but no transformation matrixAdaptive combination with block Krylov in practice.

Still to be done:

Condition on the size of the field is a limitation. Eberly’salgorithm ?Ideally: derandomization ? (deterministic)

Clément Pernet Matrix multiplication based Characteristic Polynomial