Lecture Topic: Principal Components

Jakub Marecek and Sean McGarraghy (UCD), Numerical Analysis and Software, November 5, 2015

Principal Component Analysis

Principal component analysis (PCA) is one of the most widely used tools in statistics (data science, machine learning).

It transforms a matrix of observations of possibly correlated variables into a matrix of values of linearly uncorrelated variables called principal components (PCs), where each PC is defined by a linear combination of the columns of the original matrix.

The first principal component accounts for as much of the variability in the data as possible, and each subsequent PC has the highest variance possible subject to being orthogonal to all preceding PCs.

When you are given a matrix too large to print on a sheet of A4, the first step in understanding it should involve PCA or its regularised variant.
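As a rough illustration of this transformation (a sketch of ours, not from the slides), the following numpy snippet centres a small data matrix and obtains its principal components from the eigendecomposition of the sample covariance matrix; all variable names are illustrative.

import numpy as np

# A small data matrix: 6 observations (rows) of 3 possibly correlated variables (columns).
rng = np.random.default_rng(0)
A = rng.normal(size=(6, 3))
A = A - A.mean(axis=0)          # centre each column (zero mean)

C = A.T @ A / (A.shape[0] - 1)  # sample covariance matrix
vals, vecs = np.linalg.eigh(C)  # symmetric eigendecomposition, eigenvalues in ascending order
order = np.argsort(vals)[::-1]  # sort by explained variance, descending
loadings = vecs[:, order]       # columns are the loading vectors
scores = A @ loadings           # values of the principal components (linearly uncorrelated)

print(np.round(np.cov(scores, rowvar=False), 8))  # approximately diagonal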


An Example

Let A ∈ R^{n×n} denote a data matrix, which encodes n ratings of n users, e.g., of movies or books.

Let us assume the ratings of each user, i.e. the values of each column, sum up to 1.

The first two principal components could represent a new coordinate system with two axes, such as “likes horrors” and “likes romantic comedies”, depending on the actual ratings.

You could replace the ratings of one user in the original rating-per-movie form, i.e. one row, with a 2-vector, which would suggest how much the user likes horrors and how much the user likes romantic comedies.

For excellent interactive demonstrations, see: http://setosa.io/ev/principal-component-analysis/
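To make the reduction to a 2-vector concrete (an addition of ours, not from the slides), the sketch below builds a small random ratings matrix, normalises each column to sum to 1 as assumed above, and summarises each row by its coordinates along the first two principal directions; the data are made up.

import numpy as np

rng = np.random.default_rng(1)
n = 5
A = rng.random((n, n))
A = A / A.sum(axis=0)            # each user's (column's) ratings sum to 1
Ac = A - A.mean(axis=0)          # centre before extracting principal directions

# Rows of Vt are the principal directions (loading vectors) of the centred data.
U, s, Vt = np.linalg.svd(Ac, full_matrices=False)
two_vectors = Ac @ Vt[:2].T      # each row summarised by a 2-vector

print(two_vectors.shape)         # (5, 2): one 2-vector per row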


Key Concepts

Principal Component Analysis is a linear transformation that transforms data into a new coordinate system such that the projection of the data onto the first coordinate explains more variance in the data than the projection onto the second coordinate, which in turn explains more variance than the projection onto the third coordinate, etc.

The Eigenvalue Problem: given A ∈ R^{n×n}, find the unknowns x ∈ R^n and λ ∈ R such that Ax = λx.

Here, λ is called an eigenvalue of A and x is an eigenvector of A.

Power Method for computing the greatest eigenvalue by absolute value, based on x_{k+1} = A x_k / ‖A x_k‖.


Principal Component Analysis

Let A ∈ R^{m×n} denote a data matrix, which encodes m observations (samples) of dimension n each (e.g., n features) and which has been normalised so that each column has zero mean.

PCA extracts linear combinations of the columns of A while maximising the ℓ2 norm of Ax.

The extraction of the first combination is:

max_{x ∈ R^n} ‖Ax‖_2 such that ‖x‖_2 ≤ 1,   (2.1)

The optimum x∗ ∈ R^n is called the loading vector. Ax∗ is called the first principal component.
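A minimal numerical sketch of this optimisation problem (ours, written under the standard observation that the maximiser over the unit ball is the top right singular vector of A): it checks that the candidate loading vector attains the largest value of ‖Ax‖_2 among unit vectors.

import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(8, 4))
A = A - A.mean(axis=0)                 # zero-mean columns, as assumed above

U, s, Vt = np.linalg.svd(A, full_matrices=False)
x_star = Vt[0]                         # candidate loading vector, with ‖x_star‖_2 = 1

# ‖A x_star‖_2 matches the largest singular value and beats random unit vectors.
print(np.linalg.norm(A @ x_star), s[0])
for _ in range(5):
    v = rng.normal(size=4)
    v /= np.linalg.norm(v)
    assert np.linalg.norm(A @ v) <= s[0] + 1e-12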


Principal Component Analysis

Each row vector A_{i,:} of A is mapped to a new coordinate t = x · A_{i,:} with respect to the principal component x.

One can consider further principal components, usually sorted in decreasing order of the amount of variance they explain.

These can be obtained by running the same methods on the updated matrix A_{k+1} = A_k − x_k (x_k)^T A_k x_k (x_k)^T, which is known as Hotelling's deflation.

The combinations point in mutually orthogonal directions.
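A short sketch of Hotelling's deflation on a symmetric matrix (ours; it uses numpy's eigendecomposition rather than the power method for brevity): deflating the dominant eigenpair leaves a matrix whose largest eigenvalue is the original second-largest one.

import numpy as np

rng = np.random.default_rng(3)
B = rng.normal(size=(5, 5))
C = B @ B.T                          # a symmetric (covariance-like) matrix

vals, vecs = np.linalg.eigh(C)       # ascending eigenvalues
lam1, x1 = vals[-1], vecs[:, -1]     # dominant eigenpair, with ‖x1‖ = 1

# Hotelling's deflation: C2 = C - x1 (x1^T C x1) x1^T = C - lam1 x1 x1^T
C2 = C - np.outer(x1, x1) * (x1 @ C @ x1)

print(np.linalg.eigvalsh(C2)[-1], vals[-2])   # the two should agree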


The Derivation

Proof sketch. Consider:

max_{x ∈ R^n} ‖Ax‖_2 such that ‖x‖_2 ≤ 1,   (2.2)

where ‖·‖_2 is the ℓ2 norm.

The Lagrangian is L(x) = x^T A x − λ(x^T x − 1), where λ ∈ R is a newly introduced Lagrange multiplier.

The stationary points of L(x) satisfy

dL(x)/dx = 0   (2.3)
2 x^T A^T − 2 λ x^T = 0   (2.4)
Ax = λx.   (2.5)

Recalling Ax = λx in the definition of the eigenvalue problem, each eigenvalue λ of A is hence the value of L(x) at a corresponding eigenvector of A.


Eigenvalues

Notice that we have seen eigenvalues a number of times before:

We have seen that positive definite matrices A, i.e., x^T A x > 0 for all x ≠ 0, have x^T A x = λ x^T x > 0 and eigenvalues λ > 0.

We have seen the set σ(A) of all eigenvalues of A, called the spectrum.

The absolute value of the dominant eigenvalue is the spectral radius, which we have seen in the definition of a contraction mapping.

We have also seen the spectral condition number of symmetric A,

cond_rel(A) := max_{λ ∈ σ(A)} |λ| / min_{λ ∈ σ(A)} |λ|,

which was a measure of the distortion produced by A, i.e., the difference in expansion/contraction of eigenvectors that A can cause.
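As a quick numerical companion (ours, not from the slides), this sketch computes the spectrum, the spectral radius, and the spectral condition number of a random symmetric matrix.

import numpy as np

rng = np.random.default_rng(4)
B = rng.normal(size=(4, 4))
A = (B + B.T) / 2                      # a random symmetric matrix

spectrum = np.linalg.eigvalsh(A)       # real eigenvalues of a symmetric matrix
rho = np.max(np.abs(spectrum))         # spectral radius
cond_rel = np.max(np.abs(spectrum)) / np.min(np.abs(spectrum))  # spectral condition number

print(spectrum, rho, cond_rel)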


Eigenvalues

Prior to explaining more about eigenvalues and eigenvectors, let us see an interactive demonstration: http://setosa.io/ev/eigenvectors-and-eigenvalues/


The Use in Python

Python libraries for computing eigenvalues and eigenvectors:

import numpy as np

A = np.random.rand(2, 2)
vals, vecs = np.linalg.eig(A)
print(np.dot(A, vecs[:, 0]), vals[0] * vecs[:, 0])   # A x = λ x for the first eigenpair
print(np.dot(A, vecs[:, 1]), vals[1] * vecs[:, 1])   # ... and for the second


Eigenvalues: Bad News

The ith eigenvalue functional A ↦ λ_i(A) is not a linear functional beyond dimension one.

It is not convex (except for i = 1) or concave (except for i = n).

For any m ≥ 5, there is an m × m matrix with rational coefficients whose eigenvalues cannot be written using any expression involving rational numbers, addition, subtraction, multiplication, division, and taking kth roots.


Eigenvalues: Bad News Explained

Example (Trefethen and Bau, 25.1)

Consider the polynomial p(z) = z^m + a_{m−1} z^{m−1} + · · · + a_1 z + a_0. The roots of p(z) are equal to the eigenvalues of the companion matrix

[ 0                      −a_0     ]
[ 1   0                  −a_1     ]
[     1   ⋱               ⋮       ]
[         ⋱   0          −a_{m−2} ]
[             1          −a_{m−1} ]
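To illustrate the companion-matrix construction numerically (an addition of ours), the sketch below builds the matrix for a specific cubic and compares its eigenvalues with the roots returned by numpy; the coefficient values are arbitrary.

import numpy as np

# p(z) = z^3 + a2 z^2 + a1 z + a0, with arbitrary illustrative coefficients.
a0, a1, a2 = 2.0, -5.0, 1.0

# Companion matrix whose eigenvalues are the roots of p.
C = np.array([[0.0, 0.0, -a0],
              [1.0, 0.0, -a1],
              [0.0, 1.0, -a2]])

eig = np.sort_complex(np.linalg.eigvals(C))
roots = np.sort_complex(np.roots([1.0, a2, a1, a0]))  # numpy's own root finder
print(eig)
print(roots)                                          # the two agree up to rounding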


Eigenvalues: Good News

The ith eigenvalue functional A ↦ λ_i(A) is Lipschitz continuous for symmetric matrices, for fixed 1 ≤ i ≤ n.

There is a characterisation, resembling convexity:

Theorem (Courant-Fischer)

Let A ∈ R^{n×n} be symmetric. Then we have

λ_i(A) = sup_{dim(V)=i} inf_{v ∈ V: ‖v‖=1} v^T A v

λ_i(A) = inf_{dim(V)=n−i+1} sup_{v ∈ V: ‖v‖=1} v^T A v

for all 1 ≤ i ≤ n, where V ranges over all subspaces of R^n with the indicated dimension.


Eigenvalues: Perturbation Analysis

Let us consider symmetric matrices A, B ∈ R^{n×n} and view B as a perturbation of A. One can prove the so-called Weyl inequalities:

λ_{i+j−1}(A + B) ≤ λ_i(A) + λ_j(B),   (3.1)

for all i, j ≥ 1 and i + j − 1 ≤ n, the so-called Ky Fan inequality

λ_1(A + B) + … + λ_k(A + B) ≤ λ_1(A) + … + λ_k(A) + λ_1(B) + … + λ_k(B),   (3.2)

and the Tao inequality:

|λ_i(A + B) − λ_i(A)| ≤ ‖B‖_op = max(|λ_1(B)|, |λ_n(B)|).   (3.3)

These suggest that for symmetric matrices, the spectrum of A + B is close to that of A if B is small in operator norm.
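A small numerical sanity check of the last inequality (ours): it perturbs a random symmetric matrix and verifies that every eigenvalue moves by at most the operator norm of the perturbation.

import numpy as np

rng = np.random.default_rng(5)
M = rng.normal(size=(5, 5))
A = (M + M.T) / 2                      # symmetric A
N = 0.1 * rng.normal(size=(5, 5))
B = (N + N.T) / 2                      # small symmetric perturbation

lam_A = np.linalg.eigvalsh(A)          # ascending order in both cases
lam_AB = np.linalg.eigvalsh(A + B)
op_norm_B = np.max(np.abs(np.linalg.eigvalsh(B)))   # operator norm of symmetric B

print(np.max(np.abs(lam_AB - lam_A)), "<=", op_norm_B)
assert np.all(np.abs(lam_AB - lam_A) <= op_norm_B + 1e-12)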


Another Example

Google started out with a patent application for PageRank. There, A ∈ R^{n×n} is an “adjacency” matrix, which encodes the links between pairs of websites; n is the number of websites.

The eigenvector corresponding to the dominant eigenvalue suggested a reasonable rating of the websites for use in search results.

There, however, you need a very efficient method of computing the eigenvector, considering that n is very large.
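A toy version of this idea (ours: a 4-page link graph with made-up links, using plain power iteration on the column-normalised link matrix rather than Google's full algorithm with damping):

import numpy as np

# Entry (i, j) is 1 if page j links to page i; columns are normalised to sum to 1.
L = np.array([[0, 0, 1, 0],
              [1, 0, 0, 1],
              [1, 1, 0, 0],
              [0, 1, 0, 0]], dtype=float)
A = L / L.sum(axis=0)

x = np.full(4, 0.25)                  # start from the uniform distribution
for _ in range(100):
    x = A @ x
    x = x / np.linalg.norm(x, 1)      # renormalise to a probability vector

print(np.round(x, 3))                 # ranking scores of the four pages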


Power Method

The power method is a simple iterative algorithm for computing the largest eigenvalue in absolute value.

It is based on the iteration x_{k+1} = A x_k / ‖A x_k‖,

which is motivated by the fact that, given a matrix A and a basis of R^n comprising eigenvectors of A, vectors multiplied by A are expanded most in the direction of the eigenvector associated with the eigenvalue of the biggest magnitude.


An Example

Question: What happens as we repeatedly multiply a vector by a matrix?

To get some intuition, consider the following example.

Let A be a 2 × 2 matrix with eigenvalues 3 and 1/2 and corresponding eigenvectors y and z.

Recall that eigenvectors of distinct eigenvalues are always linearly independent. We can hence think of {y, z} as a basis of R^2.

Let x_0 be any linear combination of y and z, e.g., x_0 = y + z.

(In general, we could study x_0 = αy + βz for arbitrary scalars α, β, but here we are just letting α and β both equal 1, to simplify the discussion.)


An Example: Expansion/Contraction by Eigenvalue

Now let x_1 = A x_0 and, in general, x_k = A x_{k−1} = A^k x_0 for each k ∈ N.

Since matrix multiplication is distributive over addition,

x_1 = Ay + Az = 3y + (1/2) z.

Thus the y component of x0 is expanded while the z component is contracted.

Repeating this process k times, we get:

x_k = A x_{k−1} = A^k x_0 = 3^k y + (1/2)^k z.

Thus, xk expands in the y direction and contracts in the z direction.

Eventually, x_k will be almost completely made up of its y component, and the z component will be negligible.
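The following sketch (ours) builds a concrete 2 × 2 matrix with eigenvalues 3 and 1/2 and repeats the multiplication, showing that the normalised iterate lines up with the dominant eigenvector y; the specific eigenvectors chosen are arbitrary.

import numpy as np

# Construct A with eigenvalues 3 and 1/2 and chosen (arbitrary) eigenvectors y and z.
y = np.array([1.0, 1.0]) / np.sqrt(2)
z = np.array([1.0, -1.0]) / np.sqrt(2)
V = np.column_stack([y, z])
A = V @ np.diag([3.0, 0.5]) @ np.linalg.inv(V)

x = y + z                                   # x_0 = y + z
for k in range(1, 11):
    x = A @ x                               # x_k = A^k x_0 = 3^k y + (1/2)^k z
    direction = x / np.linalg.norm(x)
    print(k, np.round(direction, 6))        # converges to ±y = (0.707107, 0.707107)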


An Example: The Dominant Eigenvalue

Note that how many iterations k are needed depends on how much the biggest eigenvalue exceeds the second biggest in magnitude.

In the example, the ratio is 3 / (1/2) = 6, so expansion in the y direction is happening 6 times faster than in the z direction.

In general, for any matrix A, the growth rate of its powers A^k as k → ∞ depends on the eigenvalue(s) of A whose absolute value is greatest: this largest absolute value is of course the spectral radius ρ(A).

These eigenvalue(s) are generally called the dominant eigenvalue(s) of A.


Power Method: In General

Let us generalise the example. Denote the approximation to the eigenvector at iteration k by x_k.

The initial x_1 may be chosen randomly or set to an approximation to a dominant eigenvector x, if there is some knowledge of this.

The core of the method is the step

x_{k+1} := A x_k / ‖A x_k‖.

That is, at each iteration, x_k is left-multiplied by A and normalised (divided by its own norm), giving a unit vector in the direction of A x_k.

If x_k were an eigenvector of A, then x_{k+1} would be equal to x_k (since they are both unit vectors in the same direction). This suggests a Cauchy-type convergence criterion.


Power Method: In Python

Choose the starting x1 so that ‖x1‖ = 1.

import numpy as np

def Power(A, x0, tol=1e-5, limit=100):
    x = x0
    for iteration in range(limit):                   # range, not Python 2's xrange
        x_next = np.dot(A, x)                        # multiply by A ...
        x_next = x_next / np.linalg.norm(x_next)     # ... and normalise
        if np.allclose(x, x_next, atol=tol):         # Cauchy-type stopping criterion
            break
        x = x_next
    return x_next
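A short usage example (ours): running Power on a small symmetric matrix whose dominant eigenvalue is known, then recovering the eigenvalue via the Rayleigh quotient. It reuses numpy and the Power function from the listing above.

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])             # eigenvalues 3 and 1; dominant eigenvector (1, 1)/sqrt(2)
x0 = np.array([1.0, 0.0])              # ‖x0‖ = 1, with a nonzero component along the dominant direction

v = Power(A, x0)
lam = v @ A @ v                        # Rayleigh quotient of a unit vector
print(np.round(v, 6), lam)             # approximately (0.707107, 0.707107) and 3.0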


Power Method: Convergence

The method will converge if:

A has a dominant real eigenvalue λ_1, that is, |λ_1| is strictly larger than |λ| for any other eigenvalue λ of A;

the initial vector has a nonzero component in the direction of the eigenvector x corresponding to λ_1.

Certain classes of matrices are guaranteed to have real eigenvalues, e.g., symmetric matrices and positive definite matrices.

Also, the adjacency matrix of a connected graph will have a dominant real eigenvalue, by the Perron-Frobenius theorem.

On the other hand, the power iteration method will fail if A's dominant eigenvalue(s) have non-zero imaginary parts.


Power Method: Iteration Complexity

Theorem (Trefethen and Bau, 27.1)

Assuming |λ_1| > |λ_2| ≥ · · · ≥ |λ_m| ≥ 0 and that x_0 has a nonzero component in the direction of the dominant eigenvector x∗, the iterates of the power method satisfy:

‖x_k − (±x∗)‖ = O(|λ_2/λ_1|^k)   (4.1)

|(x_k)^T A x_k − λ_1| = O(|λ_2/λ_1|^{2k})   (4.2)

as k → ∞. The ± indicates that at each iteration k, the bound holds for one of the signs.

So its convergence is very fast, except when the largest and second largest eigenvalues of A are very close in absolute value.
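A brief check of this rate (ours): for a diagonal matrix with eigenvalues 3 and 1/2, the error in the direction of the dominant eigenvector should shrink by roughly |λ_2/λ_1| = 1/6 per step.

import numpy as np

A = np.diag([3.0, 0.5])                    # λ1 = 3, λ2 = 1/2, dominant eigenvector e1
x = np.array([1.0, 1.0]) / np.sqrt(2.0)    # x0 with a nonzero component along e1

prev_err = None
for k in range(1, 8):
    x = A @ x
    x = x / np.linalg.norm(x)
    err = abs(x[1])                        # distance from ±e1 is governed by the e2 component
    if prev_err is not None:
        print(k, err, err / prev_err)      # ratio approaches |λ2/λ1| = 1/6 ≈ 0.1667
    prev_err = err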

Power Method: Per-Iteration Complexity

Assuming A is dense, the worst-case complexity per iteration is n^2 multiplications in the computation of A x_k.

Computing the vector norm ‖A x_k‖ = √((A x_k) · (A x_k)) also takes n multiplications.

This gives a total of O(n^2) operations.

Assuming A is sparse, such as in PageRank computations, the number of floating point operations in the computation of A x_k is twice the number of non-zeros in A, independent of n and m.
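To illustrate the dense versus sparse cost (ours; it assumes scipy.sparse is available alongside numpy), one power-method step on a sparse matrix only touches the stored non-zeros:

import numpy as np
import scipy.sparse as sp

n = 10_000
A = sp.random(n, n, density=1e-4, format="csr", random_state=0)  # ~10^4 stored non-zeros, not 10^8 dense entries
x = np.random.default_rng(0).random(n)

y = A @ x                              # cost proportional to A.nnz, not n^2
x_next = y / np.linalg.norm(y)
print(A.nnz, x_next.shape)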


Power Method: Further Properties

Once the dominant eigenvalue is found, the deflation method can be used to find the second largest eigenvalue, etc. For finding all eigenvalues, decomposition methods may be preferable, though.

If the initial x_1 had a zero component in the x direction, we rely on rounding errors to produce a component in the x direction at some stage in the run. Once we get a non-zero x component, it will expand relative to the other components, but we may need to wait for this component to appear and then grow.

The expression Ax could be replaced based on

Ax = λx ⇒ x^T A x = λ x^T x = λ x · x = λ ‖x‖^2 ⇒ λ = (1/‖x‖^2) x^T A x,

which makes it possible to introduce additional checks to avoid under- and over-flows and division by very small numbers.

Otherwise, one only needs to store A and the two vectors x_k and A x_k. This is possible even for an n × n matrix A with n in the billions, such as for Google PageRank.
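A last sketch (ours) of the eigenvalue estimate above: after a few power steps, the Rayleigh quotient λ ≈ x^T A x / ‖x‖^2 recovers the dominant eigenvalue; the matrix is an arbitrary symmetric example.

import numpy as np

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])        # an arbitrary symmetric matrix

x = np.ones(3)
for _ in range(50):                    # a few power steps
    x = A @ x
    x = x / np.linalg.norm(x)

lam = (x @ A @ x) / (x @ x)            # Rayleigh quotient: λ = x^T A x / ‖x‖^2
print(lam, np.max(np.linalg.eigvalsh(A)))   # the two should agree closely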