[Source: diva-portal.org, 1038860/FULLTEXT01.pdf]

Matrix similarity

Isak Johansson and Jonas Bederoff Eriksson
Supervisor: Tilman Bauer

Department of Mathematics

2016


Abstract

This thesis will deal with similar matrices, a relation also referred to as matrix conjugation.

The first problem we will attack is whether or not two given matrices are similar over some field. To solve this problem we will introduce the Rational Canonical Form, RCF. From this normal form, also called the Frobenius normal form, we can determine whether or not the given matrices are similar over any field. We can also, given some field F, see whether they are similar over F or not. To be able to understand and prove the existence and uniqueness of the RCF we will introduce some additional module theory. The theory in this part will build up to the theorems regarding the RCF that can be used to solve our problem.

The next problem we will investigate concerns simultaneous conjugation, i.e. conjugation by the same matrix of a pair of matrices. When are two pairs of similar matrices simultaneously conjugated? Can we find any necessary or even sufficient conditions on the matrices? We will address this more complicated issue with the theory assembled in the first part.


Contents

1 Introduction

2 Modules
  2.1 Structure theorem for finitely generated modules over a P.I.D, existence
  2.2 Structure theorem for finitely generated modules over a P.I.D, uniqueness

3 Similar matrices and normal forms
  3.1 Linear transformations
  3.2 Rational Canonical Form
  3.3 Similar Matrices
  3.4 Smith Normal Form
  3.5 Finding a conjugating matrix

4 Simultaneous Conjugation


1 Introduction

Similarity between matrices is a well studied problem in the area of linear algebra. Most of the theory regarding matrix similarity has been known for a very long time. Ferdinand Georg Frobenius, after whom the normal form we will use in this report is named, lived over a hundred years ago.

Besides the purely mathematical reasons for this report to be interesting, it is interesting in that it can be used to find similarity between objects where the similarity is not immediately visible. If we have two linear transformations in different bases, how can we know if they are in fact the same? Or, given a linear transformation, which is the "nicest" way of representing it as a matrix? Is it always possible to write it in this "nice" form? These questions can be answered after reading this report.

We will assume that the reader is familiar with some basic abstract algebra. This includes the theory of groups, rings, principal ideal domains, fields and modules.

The structure of the report will be to define new concepts and then use them to prove useful theorems. We will also give some examples so the reader will have the opportunity to apply the theory and understand it better.

Our main source for this paper was the book Abstract Algebra by David S. Dummit and Richard M. Foote, but we have also read a couple of reports similar to this one.

From now on we let R denote a commutative unitary ring.


2 Modules

Definition 1. Let M be an R-module. The torsion of M consists of all elements of M for which there exists an r ∈ R − {0} such that their product is zero, i.e.

Tor(M) = {m ∈ M | ∃ r ∈ R − {0}, rm = 0}.

If M = Tor(M), M is said to be a torsion module.

Definition 2. Let M be an R-module and let A be a subset of M. If for every non-zero element x ∈ M there exist unique r1, r2, ..., rn ∈ R − {0} and a1, a2, ..., an ∈ A, ai ≠ aj when i ≠ j, for some n ∈ Z≥0, such that x = r1a1 + r2a2 + ... + rnan, then M is a free module on the set A. The set A is called a basis for M, and the rank of M is the number of basis elements, i.e. the cardinality of A.

Definition 3. Let M be an R-module. The annihilator of M consists of all elements of R whose product with every element of M is zero, i.e.

Ann(M) = {r ∈ R | rm = 0 ∀m ∈M}.

Definition 4. An R-module M is said to be finitely generated if it is generated by some finite subset, i.e. if there is some finite subset A = {a1, a2, ..., an} of M such that

M = RA = Ra1 + Ra2 + ... + Ran = {r1a1 + r2a2 + ... + rnan | ri ∈ R, 1 ≤ i ≤ n}.

Definition 5. An R-module M is said to be cyclic if it is generated by a single element, i.e. if there is some element x ∈ M such that

M = Rx = {rx | r ∈ R}.
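As a concrete illustration of Definitions 1–5, take R = Z and M = Z ⊕ Z/6Z. M is finitely generated by (1, 0) and (0, 1), and the summand Z/6Z = Z · 1 is cyclic; M is not free on these generators, since 6 · (0, 1) = 0 means coefficients are not unique. In summary:

```latex
\[
M = \mathbb{Z} \oplus \mathbb{Z}/6\mathbb{Z}, \qquad
\mathrm{Tor}(M) = 0 \oplus \mathbb{Z}/6\mathbb{Z}, \qquad
\mathrm{Ann}(M) = 0, \qquad
\mathrm{Ann}(\mathbb{Z}/6\mathbb{Z}) = (6).
\]
```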

Theorem 6. Let R be a P.I.D, M be a finitely generated free R-module of rank n and N be a submodule of M. Then the following statements are true.
i) N is free of rank m, m ≤ n.
ii) There exists a basis {y1, y2, ..., yn} for M such that {a1y1, a2y2, ..., amym} is a basis for N, where ai ∈ R − {0} and ai | ai+1.

Proof. The theorem holds for N = 0, so assume N is not equal to zero. Let Φ be any homomorphism from M into R. Then Φ(N) is an ideal in R.


Since R is a P.I.D there exists an element aΦ in R such that Φ(N) = (aΦ). Now we want to collect all the ideals in R obtained from mapping N into R:

Σ = {(aΦ) | Φ ∈ HomR(M,R)}

Since R is a P.I.D there exists an element of Σ which is not properly contained in any other element of Σ [1]. This means that there exists a φ ∈ HomR(M,R) such that φ(N) = (aφ) is maximal in Σ. Let a1 = aφ and let y be an element of N such that φ(y) = a1.

Next we show that a1 divides Φ(y) for every Φ ∈ HomR(M,R). Let d be a greatest common divisor of a1 and Φ(y); since R is a P.I.D we can write d = r1a1 + r2Φ(y) for some r1, r2 ∈ R, and (a1) ⊆ (d) and (Φ(y)) ⊆ (d). Now define the homomorphism ν = r1φ + r2Φ; then ν(y) = r1a1 + r2Φ(y) = d, so (a1) ⊆ (ν(y)) = (d). But by the maximality defining a1 we must have equality, (a1) = (ν(y)) = (d), and in particular (Φ(y)) ⊆ (a1).

Applying this to the natural projection homomorphisms πi relative to the basis {x1, x2, ..., xn} for M, we obtain

πi(y) = a1bi, bi ∈ R, 1 ≤ i ≤ n.

Now define y1 ∈ M as

y1 = b1x1 + b2x2 + ... + bnxn.

Then the equality a1y1 = y holds, and φ(y1) = 1 since a1 = φ(y) = φ(a1y1) = a1φ(y1). We will now prove that y1 can be taken as an element of a basis for M and a1y1 as an element of a basis for N, i.e.
a) M = Ry1 ⊕ ker(φ)
b) N = Ra1y1 ⊕ (N ∩ ker(φ))

For a) we can take any element x ∈ M and add and subtract φ(x)y1 to it,

x = φ(x)y1 + (x− φ(x)y1).

Note that x− φ(x)y1 is an element of ker(φ) since

φ(x− φ(x)y1) = φ(x)− φ(x)φ(y1) = 0


Therefore x can be written as a sum of an element of Ry1 and an element of ker(φ). To see that the sum is direct, suppose ry1 is an element of ker(φ); then

0 = φ(ry1) = rφ(y1) = r =⇒ r = 0.

To prove b) we first recall that a1 divides φ(z) for all z ∈ N, say φ(z) = ba1. In a similar way as before we rewrite z as

z = φ(z)y1 + (z − φ(z)y1),

which with φ(z) = ba1 gives

z = ba1y1 + (z − ba1y1).

So clearly, N = Ra1y1 + (N ∩ ker(φ)).

This is just a special case of a) so the sum is direct.

We will prove the first part of the theorem by induction. If m = 0, N must be a torsion module; but a submodule of a free module over an integral domain is torsion-free, so N = 0, and the theorem holds for N = 0 as noted earlier. By b) we see that

m = 1 + rank(N ∩ ker φ) =⇒ rank(N ∩ ker φ) = m − 1.

So by induction N ∩ ker φ is a free R-module of rank m − 1, and hence N = Ra1y1 ⊕ (N ∩ ker φ) is a free module of rank m.

We will again use induction and the identities a) and b) to prove the second part of the theorem. Applying a) to M we get that ker(φ) is a free R-module of rank n − 1. If we then replace M in a) with ker(φ) and N in b) with N ∩ ker φ, we get by induction that {y2, y3, ..., yn} is a basis for ker(φ) and that {a2y2, a3y3, ..., amym} is a basis for N ∩ ker φ. Here ai ∈ R, 2 ≤ i ≤ m, and we have the divisibility relation a2 | a3 | · · · | am. Using the identities once more, we see that {y1, y2, ..., yn} is a basis for M and {a1y1, a2y2, ..., amym} is a basis for N, since the sums are direct.

If we can show that a1 divides a2, we are done. Let ψ be the projection homomorphism from M to R such that ψ(y1) = ψ(y2) = 1 and ψ(yi) = 0 for all 3 ≤ i ≤ n. Then ψ(a1y1) = a1ψ(y1) = a1, so (a1) ⊆ ψ(N); but since (a1) cannot be properly contained in the image of N under any homomorphism of M into R, we must have equality (a1) = ψ(N). In a similar way for a2y2 we obtain ψ(a2y2) = a2ψ(y2) = a2, hence (a2) ⊆ ψ(N) = (a1), i.e. a1 | a2.


2.1 Structure theorem for finitely generated modules over a P.I.D, existence

Theorem 7. Let R be a P.I.D and M be a finitely generated R-module. Then M is isomorphic to the direct sum of a free part and cyclic modules of the form R/(ai),

M ∼= R^r ⊕ R/(a1) ⊕ R/(a2) ⊕ · · · ⊕ R/(am),

where the ai satisfy the divisibility relation a1 | a2 | · · · | am.

Proof. Let n ∈ Z≥0. Take any basis {b1, ..., bn} for R^n and generators {c1, ..., cn} for M, and define a homomorphism Φ by

Φ(bi) = ci for 1 ≤ i ≤ n.

Then by the first isomorphism theorem we have

R^n/ker(Φ) ∼= M.

By Theorem 6 we can choose another basis {y1, ..., yn} for R^n such that {a1y1, ..., amym} is a basis for the kernel of Φ. Here ai ∈ R for 1 ≤ i ≤ m, with the divisibility relation a1 | a2 | · · · | am. The last expression can now be written as

M ∼= (Ry1 ⊕ · · · ⊕ Ryn)/(Ra1y1 ⊕ · · · ⊕ Ramym).

We want to rewrite this expression using the map

Ry1 ⊕ · · · ⊕ Ryn → R/(a1) ⊕ · · · ⊕ R/(am) ⊕ R^(n−m),

which clearly has kernel Ra1y1 ⊕ · · · ⊕ Ramym. Hence,

M ∼= R^r ⊕ R/(a1) ⊕ R/(a2) ⊕ · · · ⊕ R/(am),

where r = n − m.

Definition 8. In Theorem 7, r is called the rank of M, and the elements a1, ..., am with the divisibility relation a1 | · · · | am are called the invariant factors of M.


Theorem 9. Let R, M and r be as in Theorem 7. Then M is isomorphic to a direct sum

M ∼= R^r ⊕ R/(p1^α1) ⊕ R/(p2^α2) ⊕ · · · ⊕ R/(pm^αm),

where the pi are primes and αi ∈ N. Note that the cyclic factors are not necessarily distinct.

Proof. This follows from applying the Chinese Remainder Theorem to each cyclic factor in Theorem 7.

Definition 10. The prime powers p1^α1, p2^α2, ..., pm^αm in Theorem 9 are called elementary divisors.

Theorem 11. Let R be a P.I.D, p be a prime in R and F be the field R/(p). Then

R^r/pR^r ∼= F^r.

Proof. Consider the natural map

Φ : R^r → F^r

defined by

Φ(x1, x2, ..., xr) = (x1 (mod p), x2 (mod p), ..., xr (mod p)),

where Im(Φ) = F^r and Ker(Φ) = pR^r. Since Φ is an R-module homomorphism, the First Isomorphism Theorem for modules gives the desired isomorphism

R^r/pR^r ∼= F^r.

Lemma 12. Let R be a P.I.D, p ∈ R be a prime and M the R-module M = R/(a1) ⊕ R/(a2) ⊕ · · · ⊕ R/(am), where p divides all ai. Then M/pM is isomorphic to F^m, where F is the field R/(p).

Proof. We will prove that if M = R/(a) with p | a, then M/pM ∼= R/(p); applying this to each summand proves the lemma.


The module pM = [(p) + (a)]/(a) is the image of the ideal (p) under the canonical homomorphism from R to R/(a). If p divides a then pM = (p)/(a) and

M/pM = (R/(a))/((p)/(a)) ∼= R/(p) = F.

Lemma 13. Given a list of elementary divisors, the corresponding invariantfactors are uniquely determined.

Proof. Suppose we have two lists of invariant factors, a1 | a2 | · · · | an and b1 | b2 | · · · | bm, with the same corresponding list of elementary divisors. Due to the divisibility relation, an and bm are each the product of the greatest powers of each prime among the elementary divisors, which implies that an = bm. Removing these prime powers, an−1 and bm−1 are now the products of the greatest powers of each prime remaining among the elementary divisors, which implies an−1 = bm−1. Continuing this procedure we see that aj = bj for all j, and we are done.
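The proof above is effectively an algorithm. The following Python sketch (our own illustration, for the integer case R = Z; the function names are ours) recovers the invariant factors from a list of prime-power elementary divisors:

```python
def smallest_prime_factor(n):
    """Smallest prime dividing n, found by trial division."""
    d = 2
    while d * d <= n:
        if n % d == 0:
            return d
        d += 1
    return n

def invariant_factors(elementary_divisors):
    """Recover a1 | a2 | ... | an from prime-power elementary divisors,
    following the proof of Lemma 13: the largest invariant factor is the
    product of the highest power of each prime; remove those and repeat."""
    powers = {}  # prime p -> exponents alpha occurring for p
    for q in elementary_divisors:
        p = smallest_prime_factor(q)
        alpha = 0
        while q % p == 0:
            q //= p
            alpha += 1
        assert q == 1, "elementary divisors must be prime powers"
        powers.setdefault(p, []).append(alpha)
    for alphas in powers.values():
        alphas.sort(reverse=True)  # highest power of each prime first
    n = max(len(alphas) for alphas in powers.values())
    factors = []
    for i in range(n):  # pass i builds the (n - i)-th invariant factor
        a = 1
        for p, alphas in powers.items():
            if i < len(alphas):
                a *= p ** alphas[i]
        factors.append(a)
    return factors[::-1]  # ascending, so a1 | a2 | ... | an

# Z/4 + Z/3 + Z/2 + Z/9 has elementary divisors 4, 3, 2, 9
# and invariant factors 6 | 36, i.e. it is isomorphic to Z/6 + Z/36.
print(invariant_factors([4, 3, 2, 9]))  # [6, 36]
```

The same procedure works verbatim over F[x], with prime-power polynomials in place of prime-power integers.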

2.2 Structure theorem for finitely generated modules over a P.I.D, uniqueness

Theorem 14. Two finitely generated R-modules M1 and M2 are isomorphicif and only if they have the same free rank and the same list of invariantfactors.

Proof. (⇐=). Assume M1 and M2 have the same free rank, r, and the same list of invariant factors, a1, a2, ..., am. Then they are trivially isomorphic, since

M1 ∼= R^r ⊕ R/(a1) ⊕ R/(a2) ⊕ · · · ⊕ R/(am) ∼= M2.

(=⇒). Assume now M1 ∼= M2. Then also M1/Tor(M1) ∼= M2/Tor(M2), i.e. R^r1 ∼= R^r2, where r1 and r2 are the free ranks of M1 and M2 respectively. If p is any non-zero prime in R we obtain R^r1/pR^r1 ∼= R^r2/pR^r2. By Theorem 11 we get that

R^r1/pR^r1 ∼= R^r2/pR^r2 =⇒ F^r1 ∼= F^r2,


where F = R/(p). Since these are vector spaces we must have that r1 = r2.

Since the invariant factors are uniquely determined by the elementary divisors, it is sufficient to prove that the modules have the same list of elementary divisors.

Given any prime p, we want to show that the lists of elementary divisors which are powers of p are the same for both M1 and M2.

If M1 and M2 are isomorphic, the submodules of M1 and M2 consisting of the cyclic factors whose elementary divisors are powers of p must also be isomorphic, since M1 and M2 share the same annihilator.

It is now sufficient to deal with the case where the elementary divisors of M1 and M2 are of the following form:

for M1: p, p, ..., p (n1 copies), p^α1, p^α2, ..., p^αs

for M2: p, p, ..., p (n2 copies), p^β1, p^β2, ..., p^βt

with all αi, βi ≥ 2. But then the elementary divisors of the pMi must be

for pM1: p^(α1−1), p^(α2−1), ..., p^(αs−1)

for pM2: p^(β1−1), p^(β2−1), ..., p^(βt−1)

that is, each power among the elementary divisors has decreased by one and the factors p have disappeared. If M1 ∼= M2 then pM1 ∼= pM2, and comparing the greatest elementary divisors of pM1 and pM2 yields αs − 1 = βt − 1, i.e. αs = βt. By induction we obtain αi = βi and s = t.

Using the previous lemma we obtain M1/pM1 ∼= F^(n1+s) and M2/pM2 ∼= F^(n2+t), which gives F^(n1+s) ∼= F^(n2+t), but then n1 = n2. This proves that M1 and M2 have the same list of elementary divisors, hence the same list of invariant factors.


3 Similar matrices and normal forms

In this section we will introduce the concept of similarity. Using the theoryof modules, most importantly the structure theorem, we can now constructthe normal forms useful to determine matrix similarity.

3.1 Linear transformations

Let F be a field. Every pair of a vector space V and a linear transformation T : V → V gives V the structure of an F[x]-module, with x acting as T:

T(α) = xα for all α ∈ V.

Conversely, on any F[x]-module V, multiplication by x defines a linear transformation T. In this chapter we will apply the theory of modules to this special case, where the module is a vector space with an F[x]-action.

3.2 Rational Canonical Form

Let V be a finite dimensional vector space considered as an F[x]-module, and let T : V → V be the linear transformation which acts by multiplication with x on V. Let (a(x)) be the annihilator of V. Since a(x) is unique up to a unit, we can take a(x) to be a monic polynomial,

a(x) = x^n + b_(n−1)x^(n−1) + · · · + b1x + b0.

Suppose first that V is cyclic, so that V ∼= F[x]/(a(x)) with basis {1, x, x^2, ..., x^(n−1)}. Now let T act on these basis elements:

1 → x
x → x^2
...
x^(n−1) → −b_(n−1)x^(n−1) − b_(n−2)x^(n−2) − · · · − b0.


Writing this in matrix notation with respect to the basis {1, x, x^2, ..., x^(n−1)}, we get a matrix with ones down the subdiagonal:

        [ 0  0  ···  0   −b0      ]
        [ 1  0  ···  0   −b1      ]
T =     [ 0  1  ···  0   −b2      ]
        [ :  :   ··  :    :       ]
        [ 0  0  ···  1   −b_(n−1) ]

Definition 15. The matrix C_a(x) above is said to be the companion matrix of the monic polynomial a(x) = x^n + b_(n−1)x^(n−1) + · · · + b0.
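The companion matrix can be built directly from the coefficients, and its characteristic polynomial recovers a(x) itself (as Lemma 19 below states). A small sympy sketch (our own helper function, not from the report):

```python
from sympy import Matrix, symbols, eye, expand

x = symbols('x')

def companion(b):
    """Companion matrix of the monic polynomial
    x**n + b[n-1]*x**(n-1) + ... + b[1]*x + b[0]."""
    n = len(b)
    C = Matrix.zeros(n, n)
    for i in range(1, n):
        C[i, i - 1] = 1          # ones down the subdiagonal
    for i in range(n):
        C[i, n - 1] = -b[i]      # last column holds -b0, ..., -b_(n-1)
    return C

# a(x) = x**2 - 2*x - 3, i.e. b0 = -3 and b1 = -2:
C = companion([-3, -2])
print(C)                          # Matrix([[0, 3], [1, 2]])
charpoly = (x * eye(2) - C).det()
print(expand(charpoly))           # x**2 - 2*x - 3
```

This particular companion matrix reappears in Example 27.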

Applying Theorem 7 to the vector space V, which has zero free rank (V is finite dimensional, hence a torsion F[x]-module), we get

V ∼= F[x]/(a1(x)) ⊕ F[x]/(a2(x)) ⊕ · · · ⊕ F[x]/(am(x)).

Let Vi ∼= F[x]/(ai(x)); then V ∼= ⊕Vi as F[x]-modules. Let T be the linear transformation acting by multiplication with x on each Vi componentwise. These subspaces are clearly T-invariant, i.e. T(v) ∈ Vi for all v ∈ Vi. Choosing for each Vi a basis as above, we get T(v) = C_(ai(x))·v for v ∈ Vi, where C_(ai(x)) is the companion matrix of the monic polynomial ai(x). T can now be written as a block diagonal matrix A,

A = ⊕ C_(ai(x)), i.e.

A = [ C_(a1(x))     0      ···     0       ]
    [    0      C_(a2(x))  ···     0       ]
    [    :          :       ··     :       ]
    [    0          0      ···  C_(am(x))  ].

Definition 16. The matrix A above is the Rational Canonical Form of the linear transformation T. Here ai(x) | ai+1(x).


3.3 Similar Matrices

Definition 17. Two n × n matrices A and B are said to be similar or conjugate if they represent the same linear transformation in possibly different bases, i.e.

A = PBP^(−1)

where P is an invertible change of basis matrix. That A is similar to B is written A ∼ B.

Theorem 18. Given any n × n matrix A with entries in F there exists a unique matrix B with entries in F such that B is in Rational Canonical Form and A ∼ B.

Proof. This is just a special case of Theorem 7 and Theorem 14. Let A : V → V, where V is a vector space isomorphic to a direct sum of the form

V ∼= F[x]/(a1(x)) ⊕ F[x]/(a2(x)) ⊕ · · · ⊕ F[x]/(am(x)).

One can now produce a matrix with the companion matrices of the invariant factors on the diagonal and zeros elsewhere. Thus the Rational Canonical Form for A exists.

By Theorem 14, two F[x]-modules are isomorphic if and only if they have the same invariant factors, hence the Rational Canonical Form must be unique.

Lemma 19. i) The characteristic polynomial of the companion matrix of a(x) is a(x).
ii) The characteristic polynomial of a matrix with companion matrices on the diagonal is the product of the characteristic polynomials of the companion matrices,

χ_A(x) = ∏_i χ_(Ci)(x).

Theorem 20. The characteristic polynomial of an n × n matrix A is the product of all invariant factors of the F[x]-module (F^n, A).

Proof. Let B be the matrix in Rational Canonical Form which A is similarto. Then the following is true.


χ_A(x) = Det(xI − A) = Det(xI − PBP^(−1)) = Det(P(xI − B)P^(−1))

= Det(P) Det(xI − B) Det(P)^(−1) = Det(xI − B) = χ_B(x).

Applying the previous lemma, we are done.

Theorem 21. Let A and B be two n × n matrices with entries in some field F. Then the following statements are equivalent.
i) A and B are similar over F.
ii) A and B have the same invariant factors.
iii) A and B have the same Rational Canonical Form.

Proof. i) =⇒ ii). Let A = PBP^(−1). We know that P is an isomorphism from the vector space V to V since it is invertible. Below we show that P is an F[x]-module isomorphism from VB to VA, where VB is the F[x]-module for V and B, and VA is the F[x]-module for V and A (using PB = AP):

P(x·α) = P(Bα) = (PB)α = (AP)α = A(Pα) = x·(Pα) for all α ∈ VB ∼= V.

Since the F[x]-modules are isomorphic, they must have the same list of invariant factors by Theorem 14.

ii) =⇒ iii). Since the matrices have the same list of invariant factors, they have the same list of companion matrices and hence the same Rational Canonical Form.

iii) =⇒ i). Since the matrices have the same Rational Canonical Form, they are representations of the same linear transformation in possibly different bases. So they are the same linear transformation up to a change of basis, hence similar.

Theorem 22. If A ∼ B are two n × n matrices then the following holds:
i) Tr(A) = Tr(B).
ii) Det(A) = Det(B).

Proof. Since A and B are similar, they have the same invariant factors by Theorem 21, and by Theorem 20 they also have the same characteristic polynomial. In particular they share trace and determinant, since these are (up to sign) coefficients of the characteristic polynomial.
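A quick numerical illustration of Theorem 22 (sympy; the matrices here are our own toy example, not from the report): conjugating a matrix by any invertible P leaves the trace, determinant and characteristic polynomial unchanged.

```python
from sympy import Matrix, symbols, expand

x = symbols('x')

B = Matrix([[0, 3], [1, 2]])
P = Matrix([[1, 1], [0, 1]])
A = P * B * P.inv()            # A ~ B by construction

assert A.trace() == B.trace()  # Tr(A) = Tr(B)
assert A.det() == B.det()      # Det(A) = Det(B)
chiA = A.charpoly(x).as_expr()
chiB = B.charpoly(x).as_expr()
assert expand(chiA - chiB) == 0  # same characteristic polynomial
print(chiA)                      # x**2 - 2*x - 3
```

Note the converse fails: equal trace and determinant do not imply similarity, which is exactly why the invariant factors of Theorem 21 are needed.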


Theorem 23. Let A and B be two n × n matrices with entries in some field F, and let K be an extension field of F. Then A and B are similar over K (entries of P in K) if and only if A and B are similar over F (entries of P in F).

Proof. (⇐=). Assume that A ∼ B over F. Then also A ∼ B over K, since F is embedded in K.

(=⇒). Assume that A ∼ B over K. Then A and B have the same Rational Canonical Form. By Theorem 18, this RCF has entries in F. Theorem 21 then implies that A and B are also similar over F.

3.4 Smith Normal Form

Up to this point we have shown the existence and uniqueness of the RCF. With this in mind, we will in this section show how to find this unique RCF. We will introduce another normal form, the Smith Normal Form, which can be found using simple matrix operations and from which the RCF of any matrix can be extracted.

Definition 24. Let A be any n × n matrix over some field F. Then the matrix B = xI − A has entries in F[x], and the following three operations on B are called the elementary row and column operations:
1) Interchanging any two rows or columns.
2) Adding a multiple in F[x] of one row or column to another row or column.
3) Multiplying any row or column by a unit in F[x], i.e. any non-zero element of the field F.

These operations are often referred to as the ERCOs. The notation used for the ERCOs is the following:

Ri ⇔ Rj

whenever the ith and jth rows are interchanged,

Ci + p(x)·Cj

whenever p(x) times the jth column is added to the ith column, and

Ri · u

Ri · u


whenever the ith row is multiplied by a factor u.

If two operations are done in one step, the one written first is the first one executed.

Definition 25. An n × n matrix A is said to be in Smith Normal Form, SNF, if it is diagonal and the elements 1, ..., 1, a1, a2, ..., am on the diagonal satisfy the divisibility relation a1 | a2 | · · · | am, i.e.

A = diag(1, ..., 1, a1, a2, ..., am).

Theorem 26. Given any n × n matrix A over some field F, the matrix xI − A can be put in Smith Normal Form using ERCOs. This normal form is unique, and the non-unit diagonal elements a1(x), a2(x), ..., am(x) are the invariant factors of A.

Proof. The proof of this theorem is omitted due to its length [1].

What we want to do now is to give an intuition for why this algorithm works, i.e. why using ERCOs on the matrix xI − A to put it in SNF yields the invariant factors of A on the diagonal.

First off, we can see that the ERCOs do not change the determinant of xI − A (up to a unit in F[x]). Since the characteristic polynomial of A is defined as Det(xI − A), it is therefore invariant under ERCOs up to a unit. Since this polynomial is monic by definition, we know that

χ_A(x) = Det(SNF),

where SNF is the unique Smith Normal Form we get from using ERCOs on xI − A. The SNF is diagonal, so its determinant is the product of the diagonal elements, which satisfy the divisibility relation a1(x) | a2(x) | · · · | am(x). So now we know that the product of the diagonal elements of the SNF is the


characteristic polynomial. We also know that they divide each other. Theseproperties are shared with the invariant factors of A.
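The ERCO algorithm is easiest to see over Z, which is also a P.I.D (the units are ±1). The following Python sketch is our own minimal implementation for the integer case, returning just the diagonal of the SNF; it is meant as an illustration of the procedure, not an optimized routine.

```python
from math import gcd

def smith_diagonal(M):
    """Diagonal of the Smith Normal Form of an integer matrix,
    computed with elementary row and column operations (ERCOs)."""
    A = [row[:] for row in M]
    n, m = len(A), len(A[0])
    t = 0
    while t < min(n, m):
        # move a nonzero entry of smallest absolute value to position (t, t)
        pivot = None
        for i in range(t, n):
            for j in range(t, m):
                if A[i][j] and (pivot is None
                                or abs(A[i][j]) < abs(A[pivot[0]][pivot[1]])):
                    pivot = (i, j)
        if pivot is None:
            break                          # the remaining submatrix is zero
        A[t], A[pivot[0]] = A[pivot[0]], A[t]
        for row in A:
            row[t], row[pivot[1]] = row[pivot[1]], row[t]
        clean = True
        for i in range(t + 1, n):          # clear column t with row operations
            q = A[i][t] // A[t][t]
            for j in range(t, m):
                A[i][j] -= q * A[t][j]
            clean = clean and A[i][t] == 0
        for j in range(t + 1, m):          # clear row t with column operations
            q = A[t][j] // A[t][t]
            for i in range(t, n):
                A[i][j] -= q * A[i][t]
            clean = clean and A[t][j] == 0
        if clean:
            t += 1                         # else retry with a smaller pivot
    d = [abs(A[i][i]) for i in range(min(n, m))]
    # enforce d1 | d2 | ... using diag(a, b) ~ diag(gcd(a, b), lcm(a, b))
    changed = True
    while changed:
        changed = False
        for i in range(len(d) - 1):
            if d[i] and d[i + 1] % d[i]:
                g = gcd(d[i], d[i + 1])
                d[i], d[i + 1] = g, d[i] * d[i + 1] // g
                changed = True
    return d

print(smith_diagonal([[2, 0], [0, 3]]))   # [1, 6]
```

The same sequence of steps, with polynomial division in place of integer division, is exactly what is carried out by hand on xI − A in Example 27 below.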

3.5 Finding a conjugating matrix

Finding a matrix P conjugating any matrix A into its RCF (i.e. such that P^(−1)AP is in RCF) is done by keeping track of the row operations used on xI − A to put it in SNF.

First off, we denote by di the degree of the ith invariant factor of A, so that ∑_i di = n, where n is the dimension of the matrix A. This is also the dimension of the vector space corresponding to the ith cyclic factor in the invariant factor decomposition.

Starting with the matrix P ′ = I, for every row operation used on xI −A, wechange the matrix P ′ according to the following rules.

1) If Ri ⇐⇒ Rj then Ci ⇐⇒ Cj for P ′.

2) If Ri + p(x)Rj then Cj − p(A)Ci for P ′.

3) If Ri · u then Ci · u−1 for P ′.

This can be seen as finding a generator for each invariant factor by taking F[x]-linear combinations of the standard basis vectors ei, where ei has a one in the ith position and zeros elsewhere.

When this is done we will have a matrix with n − m zero columns at the beginning, where m denotes the number of invariant factors of A. These columns are now removed, and with the remaining m non-zero columns (note that these columns correspond precisely to the m non-unit diagonal elements in the Smith Normal Form) we do the following. From the first (non-zero) column, now denoted C1, we will extract d1 columns, since the vector space corresponding to this cyclic factor has dimension d1. These columns will form a basis for this vector space and are extracted in the following way:

C1, AC1, A^2·C1, ..., A^(d1−1)·C1.

With respect to this order we now have the first d1 columns of P, i.e. the basis for the above mentioned vector space. Then we do the same thing


with C2, from which we will extract d2 columns to put in P in the same way as above. They will in the same manner form a basis for the vector space corresponding to the second cyclic factor of A. These basis vectors will become columns d1 + 1, d1 + 2, ..., d1 + d2 of P. We continue this process for the remaining (non-zero) columns of P′ to construct n linearly independent columns of P, i.e. n basis vectors in whose basis the linear transformation A is in Rational Canonical Form.

Example 27. Assume

A = [ 0   3   0   3 ]        B = [ 0   3   0  −3 ]
    [ 1   1  −1   0 ]            [ 1   2   0  −3 ]
    [ −1 −1   1   3 ]            [ 0   0   3   3 ]
    [ 0   1   1   2 ]            [ 0   0   0  −1 ]

Are A and B similar, i.e. is there some matrix P, say over Q, such that A = PBP^(−1)?

Solution. We will do this by finding the RCF for both A and B, and see if they are the same or not. To find the RCF we must first find the invariant factors, which we will do by using elementary row and column operations on xI − A and xI − B to put them in Smith Normal Form. We start with A, where from now on we define a(x) := x^2 − 2x − 3.

xI − A =
[ x    −3    0    −3  ]
[ −1   x−1   1     0  ]
[ 1     1   x−1   −3  ]
[ 0    −1   −1   x−2  ]

C2 − C4:
[ x     0    0    −3  ]
[ −1   x−1   1     0  ]
[ 1     4   x−1   −3  ]
[ 0    1−x  −1   x−2  ]

R2 + R3:
[ x     0    0    −3  ]
[ 0    x+3   x    −3  ]
[ 1     4   x−1   −3  ]
[ 0    1−x  −1   x−2  ]

R3 − R1:
[ x     0    0    −3  ]
[ 0    x+3   x    −3  ]
[ 1−x   4   x−1    0  ]
[ 0    1−x  −1   x−2  ]

R2 + x·R4:
[ x      0     0    −3   ]
[ 0    −a(x)   0   a(x)  ]
[ 1−x    4    x−1    0   ]
[ 0     1−x   −1   x−2   ]

C1 + C3:
[ x      0     0    −3   ]
[ 0    −a(x)   0   a(x)  ]
[ 0      4    x−1    0   ]
[ −1    1−x   −1   x−2   ]

C2 + C4:
[ x     −3     0    −3   ]
[ 0      0     0   a(x)  ]
[ 0      4    x−1    0   ]
[ −1    −1    −1   x−2   ]

R2 ⇔ R4:
[ x     −3     0    −3   ]
[ −1    −1    −1   x−2   ]
[ 0      4    x−1    0   ]
[ 0      0     0   a(x)  ]

R1 + x·R2:
[ 0    −x−3   −x   a(x)  ]
[ −1    −1    −1   x−2   ]
[ 0      4    x−1    0   ]
[ 0      0     0   a(x)  ]

C2 − C3, R1 ⇔ R2:
[ −1     0    −1   x−2   ]
[ 0     −3    −x   a(x)  ]
[ 0     5−x  x−1    0    ]
[ 0      0     0   a(x)  ]

C3 − C1, C1·(−1):
[ 1      0     0   x−2   ]
[ 0     −3    −x   a(x)  ]
[ 0     5−x  x−1    0    ]
[ 0      0     0   a(x)  ]

C4 + (2−x)·C1:
[ 1      0     0     0   ]
[ 0     −3    −x   a(x)  ]
[ 0     5−x  x−1     0   ]
[ 0      0     0   a(x)  ]

R2 − R4:
[ 1      0     0     0   ]
[ 0     −3    −x     0   ]
[ 0     5−x  x−1     0   ]
[ 0      0     0   a(x)  ]

R3 + 2·R2:
[ 1      0      0      0   ]
[ 0     −3     −x      0   ]
[ 0    −1−x   −1−x     0   ]
[ 0      0      0    a(x)  ]

C2 − C3, C3·(−1):
[ 1      0     0     0   ]
[ 0     x−3    x     0   ]
[ 0      0    x+1    0   ]
[ 0      0     0   a(x)  ]

C3 − C2:
[ 1      0     0     0   ]
[ 0     x−3    3     0   ]
[ 0      0    x+1    0   ]
[ 0      0     0   a(x)  ]

R3 − (x+1)/3·R2:
[ 1       0       0     0   ]
[ 0      x−3      3     0   ]
[ 0    −a(x)/3    0     0   ]
[ 0       0       0   a(x)  ]

C2 + (3−x)/3·C3:
[ 1       0       0     0   ]
[ 0       0       3     0   ]
[ 0    −a(x)/3    0     0   ]
[ 0       0       0   a(x)  ]

C2 ⇔ C3:
[ 1      0       0       0   ]
[ 0      3       0       0   ]
[ 0      0    −a(x)/3    0   ]
[ 0      0       0     a(x)  ]

C3·(−3), C2·(1/3):
[ 1   0    0      0   ]
[ 0   1    0      0   ]
[ 0   0   a(x)    0   ]
[ 0   0    0    a(x)  ]

This is now the Smith Normal Form of xI − A. We see that we have two equal invariant factors, a(x). Each of them has the corresponding companion matrix

C_a(x) = [ 0  3 ]
         [ 1  2 ].


This gives the Rational Canonical Form of A,

RCF(A) = [ 0  3  0  0 ]
         [ 1  2  0  0 ]
         [ 0  0  0  3 ]
         [ 0  0  1  2 ].
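This conclusion can be sanity-checked numerically (a sympy sketch, our own verification, not part of the original computation): since the invariant factors are a(x), a(x), the characteristic polynomial of A must be a(x)^2 and the minimal polynomial a(x), i.e. a(A) = 0.

```python
from sympy import Matrix, symbols, eye, zeros, expand

x = symbols('x')
A = Matrix([[0, 3, 0, 3],
            [1, 1, -1, 0],
            [-1, -1, 1, 3],
            [0, 1, 1, 2]])
a = x**2 - 2*x - 3

# product of the invariant factors = characteristic polynomial (Theorem 20)
assert expand(A.charpoly(x).as_expr() - a**2) == 0

# the largest invariant factor annihilates the module, so a(A) = 0
assert A**2 - 2*A - 3*eye(4) == zeros(4, 4)
print("invariant factors a(x), a(x) confirmed")
```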

If we now do the same procedure for B we get

xI − B =
[ x    −3    0     3  ]
[ −1   x−2   0     3  ]
[ 0     0   x−3   −3  ]
[ 0     0    0    x+1 ]

C2 + C1:
[ x    x−3   0     3  ]
[ −1   x−3   0     3  ]
[ 0     0   x−3   −3  ]
[ 0     0    0    x+1 ]

R1 − R2:
[ x+1   0    0     0  ]
[ −1   x−3   0     3  ]
[ 0     0   x−3   −3  ]
[ 0     0    0    x+1 ]

R2 + R3:
[ x+1   0    0     0  ]
[ −1   x−3  x−3    0  ]
[ 0     0   x−3   −3  ]
[ 0     0    0    x+1 ]

C3 − C2:
[ x+1   0    0     0  ]
[ −1   x−3   0     0  ]
[ 0     0   x−3   −3  ]
[ 0     0    0    x+1 ]

R4 + (x+1)/3·R3, R4·3:
[ x+1   0    0      0  ]
[ −1   x−3   0      0  ]
[ 0     0   x−3    −3  ]
[ 0     0   a(x)    0  ]

R3 ⇔ R4:
[ x+1   0    0      0  ]
[ −1   x−3   0      0  ]
[ 0     0   a(x)    0  ]
[ 0     0   x−3    −3  ]

R1 + (x+1)·R2:
[ 0    a(x)  0      0  ]
[ −1    0    0      0  ]
[ 0     0   a(x)    0  ]
[ 0     0   x−3    −3  ]

Wait — here the second row still carries x−3 in column 2; writing it out in full:

[ 0    a(x)  0      0  ]
[ −1   x−3   0      0  ]
[ 0     0   a(x)    0  ]
[ 0     0   x−3    −3  ]

C2 ⇔ C4, R1 ⇔ R4:
[ 0    −3   x−3     0  ]
[ −1    0    0     x−3 ]
[ 0     0   a(x)    0  ]
[ 0     0    0    a(x) ]

R1 ⇔ R2:
[ −1    0    0     x−3 ]
[ 0    −3   x−3     0  ]
[ 0     0   a(x)    0  ]
[ 0     0    0    a(x) ]

C4 + (x−3)·C1:
[ −1    0    0      0  ]
[ 0    −3   x−3     0  ]
[ 0     0   a(x)    0  ]
[ 0     0    0    a(x) ]

C3 + (x−3)/3·C2:
[ −1    0    0      0  ]
[ 0    −3    0      0  ]
[ 0     0   a(x)    0  ]
[ 0     0    0    a(x) ]

C2·(−1/3), C1·(−1):
[ 1   0    0      0   ]
[ 0   1    0      0   ]
[ 0   0   a(x)    0   ]
[ 0   0    0    a(x)  ]

This is the Smith Normal Form of xI − B. Since A and B have the same invariant factors, they have the same RCF, i.e.

RCF(B) = [ 0  3  0  0 ]
         [ 1  2  0  0 ]
         [ 0  0  0  3 ]
         [ 0  0  1  2 ]

and RCF(A) = RCF(B). Theorem 21 gives us that A ∼ B, i.e. A = PBP^(−1) for some invertible matrix P. By following the algorithm above we will find matrices PA, PB such that M = PA^(−1)·A·PA = PB^(−1)·B·PB, where M is the common RCF of A and B. Then we have that

A = PA·PB^(−1)·B·PB·PA^(−1) = (PA·PB^(−1))·B·(PA·PB^(−1))^(−1),

so that P = PA·PB^(−1). We start with PA.

    [ 1  0  0  0 ]
    [ 0  1  0  0 ]
    [ 0  0  1  0 ]
    [ 0  0  0  1 ]

C3 - C2:

    [ 1  0   0  0 ]
    [ 0  1  -1  0 ]
    [ 0  0   1  0 ]
    [ 0  0   0  1 ]

C1 + C3:

    [  1  0   0  0 ]
    [ -1  1  -1  0 ]
    [  1  0   1  0 ]
    [  0  0   0  1 ]

C4 - A·C2:

    [  1  0   0  -3 ]
    [ -1  1  -1  -1 ]
    [  1  0   1   1 ]
    [  0  0   0   0 ]

C2 ⇔ C4:

    [  1  -3   0  0 ]
    [ -1  -1  -1  1 ]
    [  1   1   1  0 ]
    [  0   0   0  0 ]

C2 - A·C1:

    [  1  0   0  0 ]
    [ -1  0  -1  1 ]
    [  1  0   1  0 ]
    [  0  0   0  0 ]

C1 ⇔ C2:

    [ 0   1   0  0 ]
    [ 0  -1  -1  1 ]
    [ 0   1   1  0 ]
    [ 0   0   0  0 ]

C4 + C2:

    [ 0   1   0  1 ]
    [ 0  -1  -1  0 ]
    [ 0   1   1  1 ]
    [ 0   0   0  0 ]

C2 - 2·C3:

    [ 0   1   0  1 ]
    [ 0   1  -1  0 ]
    [ 0  -1   1  1 ]
    [ 0   0   0  0 ]


C2 + ((A + I)/3)·C3:

    [ 0  0   0  1 ]
    [ 0  0  -1  0 ]
    [ 0  0   1  1 ]
    [ 0  0   0  0 ]  = P'_A.

For A we have that d1 = d2 = 2, so P'_A has the right number of zero columns. The first two columns of P_A are

    v1 = (0, -1, 1, 0)ᵀ   and   A·v1 = (-3, -2, 2, 0)ᵀ.

The last two columns are

    v2 = (1, 0, 1, 0)ᵀ   and   A·v2 = (0, 0, 0, 1)ᵀ.

This gives

    P_A = [  0  -3  1  0 ]
          [ -1  -2  0  0 ]
          [  1   2  1  0 ]
          [  0   0  0  1 ].

Now we do the same thing for the matrix B.

    [ 1  0  0  0 ]
    [ 0  1  0  0 ]
    [ 0  0  1  0 ]
    [ 0  0  0  1 ]

C2 + C1:

    [ 1  1  0  0 ]
    [ 0  1  0  0 ]
    [ 0  0  1  0 ]
    [ 0  0  0  1 ]

C3 - C2:

    [ 1  1  -1  0 ]
    [ 0  1  -1  0 ]
    [ 0  0   1  0 ]
    [ 0  0   0  1 ]

C3 - ((B + I)/3)·C4:

    [ 1  1  0  0 ]
    [ 0  1  0  0 ]
    [ 0  0  0  0 ]
    [ 0  0  0  1 ]

C4·(1/3):

    [ 1  1  0   0  ]
    [ 0  1  0   0  ]
    [ 0  0  0   0  ]
    [ 0  0  0  1/3 ]

C3 ⇔ C4:

    [ 1  1   0   0 ]
    [ 0  1   0   0 ]
    [ 0  0   0   0 ]
    [ 0  0  1/3  0 ]

C2 - (B + I)·C1:

    [ 1  0   0   0 ]
    [ 0  0   0   0 ]
    [ 0  0   0   0 ]
    [ 0  0  1/3  0 ]

C1 ⇔ C4:

    [ 0  0   0   1 ]
    [ 0  0   0   0 ]
    [ 0  0   0   0 ]
    [ 0  0  1/3  0 ]  = P'_B.


For the matrix B we of course also have d1 = d2 = 2, which gives the first two columns of P_B:

    w1 = (0, 0, 0, 1/3)ᵀ   and   B·w1 = (-1, -1, 1, -1/3)ᵀ.

The last two columns of P_B are given by

    w2 = (1, 0, 0, 0)ᵀ   and   B·w2 = (0, 1, 0, 0)ᵀ.

This gives the matrix

    P_B = [  0    -1   1  0 ]
          [  0    -1   0  1 ]
          [  0     1   0  0 ]
          [ 1/3  -1/3  0  0 ],

with inverse

    P_B⁻¹ = [ 0  0  1  3 ]
            [ 0  0  1  0 ]
            [ 1  0  1  0 ]
            [ 0  1  1  0 ].

This finally gives, by the above,

    P = P_A·P_B⁻¹ = [ 1  0  -2   0 ]
                    [ 0  0  -3  -3 ]
                    [ 1  0   4   3 ]
                    [ 0  1   1   0 ],

which conjugates B into A via A = PBP⁻¹. Note, however, that this is not the only matrix conjugating B into A. For example, the matrix

    P_2 = [ 1   0  1   0 ]
          [ 0   1  0  -1 ]
          [ 0  -1  1   1 ]
          [ 0   0  1   1 ]

also conjugates B into A in the same manner.
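This can be verified directly (a sketch in sympy; B is read off from the matrix xI - B above, and the common conjugate is computed from P rather than copied from earlier pages):

```python
from sympy import Matrix, eye, zeros

# B, read off from xI - B earlier in the example
B = Matrix([
    [0, 3, 0, -3],
    [1, 2, 0, -3],
    [0, 0, 3,  3],
    [0, 0, 0, -1],
])

P  = Matrix([[1, 0, -2, 0], [0, 0, -3, -3], [1, 0, 4, 3], [0, 1, 1, 0]])
P2 = Matrix([[1, 0, 1, 0], [0, 1, 0, -1], [0, -1, 1, 1], [0, 0, 1, 1]])

# Both conjugations land on the same matrix A
A = P * B * P.inv()
assert A == P2 * B * P2.inv()

# ... and that matrix satisfies a(A) = 0 for a(x) = x^2 - 2x - 3,
# consistent with RCF(A) consisting of two companion blocks of a(x)
assert A**2 - 2*A - 3*eye(4) == zeros(4, 4)
```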


4 Simultaneous Conjugation

Definition 28. Two pairs of matrices (A1, A2) and (B1, B2) are said to be simultaneously conjugated if there exists an invertible matrix P such that

    P.(A1, A2) := (P·A1·P⁻¹, P·A2·P⁻¹) = (B1, B2).

The notation used for this pairwise similarity will be (A1, A2) ∼ (B1, B2).

When we started investigating this pairwise similarity, we wanted to somehow use our prior knowledge of simple similarity to attack the problem of when two pairs of matrices are pairwise similar.

First of all, we see that we need A1 ∼ B1 and A2 ∼ B2. This is the most obvious necessary condition, and it is clearly not sufficient. We then realised that we could construct the products A1A2 and A2A1 and observe that another necessary condition is that A1A2 ∼ B1B2 and A2A1 ∼ B2B1. This follows straight from the definition, since

    A1 = P⁻¹·B1·P   and   A2 = P⁻¹·B2·P.

This gives

    A1·A2 = P⁻¹·B1B2·P   and   A2·A1 = P⁻¹·B2B1·P.

We could also show, by finding a counterexample, that this together with the simple similarity (A1 ∼ B1 and A2 ∼ B2) is not a sufficient condition.

Another approach we thought about was finding the explicit matrices P for each conjugation and checking whether they could coincide. The problem here is that, in general, there are many matrices P that conjugate a matrix into another one in the same similarity class. In special cases, e.g. for small matrices, it is possible to find stronger restrictions on the set of these matrices P that make it possible to find all of them. In the general case, however, this is very hard, and the conjugating matrices can look very different. This can be seen at the end of Example 27.


We continued the process by investigating whether the simple similarity condition together with the condition that

    Tr(A1^{i1} A2^{i2} A1^{i3} ... A2^{ik}) = Tr(B1^{i1} B2^{i2} B1^{i3} ... B2^{ik}),   i_j, k ∈ ℕ ∪ {0}, 1 ≤ j ≤ k,

was sufficient. Here too we could construct a counterexample, by looking at upper triangular matrices, to show that it was not.

Finally we looked at the strongest necessary simple conjugation condition we could find. By constructing finite products of powers of A1 and A2 we could see that a necessary condition is that

    A1^{i1} A2^{i2} A1^{i3} ... A2^{ik} ∼ B1^{i1} B2^{i2} B1^{i3} ... B2^{ik}.

The proof that this is not a sufficient condition follows below. Note that this condition is a generalization of all the conditions mentioned above. Hence we omit the proofs for these special cases, since they follow from Theorem 29.

Theorem 29. The condition

    A1^{i1} A2^{i2} A1^{i3} ... A2^{ik} ∼ B1^{i1} B2^{i2} B1^{i3} ... B2^{ik},   i_j, k ∈ ℕ ∪ {0}, 1 ≤ j ≤ k,

does not imply (A1, A2) ∼ (B1, B2).

Proof. We will prove this by finding a counterexample. Consider the following matrices:

    A1 = [ 1  3 ]    B1 = [ 1  1 ]
         [ 0  2 ]         [ 0  2 ]

    A2 = [ 2  5 ]    B2 = [ 2  1 ]
         [ 0  3 ]         [ 0  3 ]

One can see that

    A1^{i1} A2^{i2} A1^{i3} ... A2^{ik} = [ 2^J     *     ]
                                          [  0   2^I·3^J  ]

and

    B1^{i1} B2^{i2} B1^{i3} ... B2^{ik} = [ 2^J     **    ]
                                          [  0   2^I·3^J  ],

where

    I = Σ_{n=1}^{k/2} i_{2n-1}


and

    J = Σ_{n=1}^{k/2} i_{2n}.

The only thing we know about the upper right elements is that they are positive numbers, but since they do not affect the invariant factors, and hence not similarity, we denote them with stars. Clearly A1^{i1} A2^{i2} A1^{i3} ... A2^{ik} and B1^{i1} B2^{i2} B1^{i3} ... B2^{ik} have the same invariant factors, so they must be similar.
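The similarity of corresponding words can be spot-checked numerically (a randomized sketch, not a proof; the word length and exponent range are arbitrary choices, and we use that 2×2 upper triangular matrices with equal diagonals of distinct entries are similar):

```python
import random
from sympy import Matrix

A1 = Matrix([[1, 3], [0, 2]]); B1 = Matrix([[1, 1], [0, 2]])
A2 = Matrix([[2, 5], [0, 3]]); B2 = Matrix([[2, 1], [0, 3]])

random.seed(0)
for _ in range(20):
    exponents = [random.randint(0, 3) for _ in range(6)]
    wa = Matrix.eye(2)
    wb = Matrix.eye(2)
    for j, e in enumerate(exponents):
        # alternate A1/A2 (resp. B1/B2), as in the statement above
        wa = wa * (A1 if j % 2 == 0 else A2)**e
        wb = wb * (B1 if j % 2 == 0 else B2)**e
    # both words are upper triangular with the same diagonal (2^J, 2^I 3^J)
    assert wa[1, 0] == 0 and wb[1, 0] == 0
    assert wa[0, 0] == wb[0, 0] and wa[1, 1] == wb[1, 1]
```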

Since A1 and B1 are similar, there exists an invertible matrix P such that

    A1 = P·B1·P⁻¹  ⟺  A1·P = P·B1.

Computing both sides of A1·P = P·B1:

    LH = [ 1  3 ] [ p11  p12 ]  =  [ p11 + 3p21   p12 + 3p22 ]
         [ 0  2 ] [ p21  p22 ]     [    2p21         2p22    ]

    RH = [ p11  p12 ] [ 1  1 ]  =  [ p11   p11 + 2p12 ]
         [ p21  p22 ] [ 0  2 ]     [ p21   p21 + 2p22 ].

This implies that p21 = 0, and replacing p11 and p22 with the parameters s and t gives

    P = [ s  3t - s ]
        [ 0    t    ].

Now we apply the same procedure for the matrices A2 and B2 with the invertible change of basis matrix Q:

    LH = [ 2  5 ] [ q11  q12 ]  =  [ 2q11 + 5q21   2q12 + 5q22 ]
         [ 0  3 ] [ q21  q22 ]     [    3q21          3q22     ]

    RH = [ q11  q12 ] [ 2  1 ]  =  [ 2q11   q11 + 3q12 ]
         [ q21  q22 ] [ 0  3 ]     [ 2q21   q21 + 3q22 ].

This implies that q21 = 0, and replacing q11 and q22 with the parameters s′ and t′ gives

    Q = [ s′  5t′ - s′ ]
        [ 0      t′    ].

Assume now that (A1, A2) ∼ (B1, B2). Then there exists a choice of s, t, s′, t′ such that P = Q:

    [ s  3t - s ]  =  [ s′  5t′ - s′ ]
    [ 0    t    ]     [ 0      t′    ].


This implies that s = s′, t = t′ and

    3t - s = 5t′ - s′  ⟹  3t = 5t  ⟹  t = t′ = 0.

Thus we have reached a contradiction, since P and Q are supposed to be invertible matrices, and we are done.
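The two parametrizations derived in the proof, and the resulting contradiction, can be checked mechanically (a sketch in sympy; the symbols s and t stand for the free parameters above):

```python
from sympy import Matrix, symbols, zeros

s, t = symbols('s t')

# The four matrices from the counterexample
A1 = Matrix([[1, 3], [0, 2]]); B1 = Matrix([[1, 1], [0, 2]])
A2 = Matrix([[2, 5], [0, 3]]); B2 = Matrix([[2, 1], [0, 3]])

# General solution of A1*P = P*B1 (p21 = 0 is forced, as derived above)
P = Matrix([[s, 3*t - s], [0, t]])
assert (A1*P - P*B1).expand() == zeros(2, 2)

# General solution of A2*Q = Q*B2
Q = Matrix([[s, 5*t - s], [0, t]])
assert (A2*Q - Q*B2).expand() == zeros(2, 2)

# P = Q would force 3t - s = 5t - s, i.e. t = 0; but det(P) = s*t,
# so t = 0 makes P singular: no invertible simultaneous conjugator exists.
assert P.det() == s*t
```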
