Transcript of Multivariate Statistics: Matrix Algebra I

Page 1: Multivariate Statistics

Multivariate Statistics

Matrix Algebra I

W. M. van der Veld
University of Amsterdam

Page 2: Multivariate Statistics

Overview

• Introduction
• Definitions
• Special names
• Matrix transposition
• Matrix addition
• Matrix multiplication

Page 3: Multivariate Statistics

Introduction

• The mathematics in which multivariate analysis is cast is matrix algebra.

• We will present enough matrix algebra to describe the operations needed to understand the multivariate analyses discussed in this course. This basic understanding is also necessary for the more advanced courses of the Research Master.

• Basically, all we need is a few basic tricks, at least at first. Let us summarize them, so that you will have some idea of what is coming and, more importantly, of why these topics must be mastered.

Page 4: Multivariate Statistics

Introduction

• Our point of departure is always a multivariate data matrix with a certain number, n, of rows for the individual observation units, and a certain number, m, of columns for the variables.

• In most applications of multivariate analysis, we shall not be interested in variable means. They have their interest, of course, in each study, but multivariate analysis instead focuses on variances and covariances. Therefore, the data matrix will in general be transformed into a matrix where columns have zero means and where the numbers in the column represent deviations from the mean.

• Such an n by m matrix of deviation scores is the basis for the variance-covariance matrix, which has m rows and m columns. For a variable i, the variance is defined as Σxi²/n, whereas for two variables i and j the covariance is defined as Σxixj/n, xi and xj being taken as deviations from the mean. Variances and covariances are collected in the variance-covariance matrix: the number in row i, column i (on the diagonal) gives the variance of variable i, while the number in row i, column j (i ≠ j) gives the covariance between the pair of variables i and j, and is the same number as in row j, column i.

• An often useful transformation is to standardize the data matrix: we first take deviations from the mean for each column, then divide the deviation from the mean by the standard deviation for the same column. The result is that values in a column will have zero mean and unit variance.

• The standardized data matrix is then the basis for calculating a correlation matrix, which is nothing but a variance-covariance matrix for standardized variables. On the diagonal of this matrix we therefore find values equal to unity. In the other cells we find correlations: in row i, column j, we find the correlation coefficient rij = Σxixj/(nσiσj).
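The steps above (centering, covariances, standardization, correlations) can be sketched in plain Python. The small 4 by 2 data matrix is a hypothetical illustration, not taken from the slides:

```python
# Sketch: variance-covariance and correlation matrices from a small
# hypothetical data matrix with n = 4 rows (units) and m = 2 columns (variables).
X = [[68, 7], [23, 2], [25, 6], [49, 4]]
n, m = len(X), len(X[0])

means = [sum(row[j] for row in X) / n for j in range(m)]
D = [[X[i][j] - means[j] for j in range(m)] for i in range(n)]  # deviation scores

# Covariance of variables i and j: sum of cross-products of deviations, over n.
def cov(i, j):
    return sum(D[r][i] * D[r][j] for r in range(n)) / n

C = [[cov(i, j) for j in range(m)] for i in range(m)]  # m-by-m variance-covariance matrix

# Correlation r_ij = cov(i, j) / (sigma_i * sigma_j); diagonal becomes 1.
sd = [C[j][j] ** 0.5 for j in range(m)]
R = [[C[i][j] / (sd[i] * sd[j]) for j in range(m)] for i in range(m)]
```

Note that C is symmetric with variances on the diagonal, exactly as the slide describes.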

Page 5: Multivariate Statistics

Introduction

• Very often we shall need a variable that is a linear compound of the initial variables. The linear compound is simply a variable whose values are obtained by a weighted addition of values of the original variables. For example, with two initial variables x1 and x2, values of the compound are defined as y = w1x1 + w2x2, where w1 and w2 are weights. A linear compound could also be called a weighted sum.
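A linear compound y = w1x1 + w2x2 can be sketched directly; the data values and the weights w1, w2 here are illustrative assumptions, not from the text:

```python
# Sketch of a linear compound (weighted sum) of two variables.
x1 = [68, 23, 25, 49]
x2 = [7, 2, 6, 4]
w1, w2 = 0.5, 2.0  # hypothetical weights

# each value of y is a weighted sum of the corresponding original values
y = [w1 * a + w2 * b for a, b in zip(x1, x2)]
print(y)
```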

• For some techniques of multivariate analysis, we need to be able to solve simultaneous equations. Doing so usually requires a computational routine called matrix inversion.

• Multivariate analysis nearly always comes down to finding a minimum or a maximum of some sort. A typical example is to find a linear compound of some variables that has maximum correlation with some other variable (multiple correlation), or to find a linear compound of the observed scores that has maximum variance (factor analysis). Therefore, among our stock of basic tricks, we need to include procedures for finding extreme values of functions.

• In addition, we shall often need to find maxima (or minima) of functions where the procedure is limited by certain side-conditions. For instance, we are given two sets of variables, and are required to find a linear compound from the first set, and another from the second set, such that the value of the correlation between these two compounds is maximum. This task can be reformulated as follows: find the two compounds in such a way that the covariance between them is maximum, given that the compounds both have unit variance.

• Very often in multivariate analysis, a maximization procedure under certain side-conditions takes on a very specific and recognizable form, namely, finding eigenvectors and eigenvalues of a given matrix.

Page 6: Multivariate Statistics

Definitions

• For multivariate statistics the most important matrix is the data matrix.

• The data matrix has a certain number, n, of rows for the individual observation units, and a certain number, m, of columns for the variables.

Data file (variables: idresp, Age, Satlife, SocTrst, PolTrst):

idresp  Age  Satlife  SocTrst  PolTrst
00000    49        4        2        3
00001    25        6        7        5
00002    23        2        7        5
00007    68        7        1        7

Data matrix (the idresp column dropped):

[49 4 2 3]
[25 6 7 5]
[23 2 7 5]
[68 7 1 7]

Page 7: Multivariate Statistics

Definitions

• In general a matrix has an n by m dimension.
• The convention is to denote matrices by boldface uppercase letters.
• The first subscript in a matrix element (xij) refers to the row and the second subscript refers to the column.
• It is important to remember this convention when matrix algebra is performed.

X = [x11 x12 x13]
    [x21 x22 x23]
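The row-first subscript convention can be sketched in plain Python (Python lists are 0-based, so x_ij corresponds to X[i-1][j-1]):

```python
# Each entry below spells out its own (row, column) position,
# so the indexing convention checks itself.
X = [[11, 12, 13],
     [21, 22, 23]]

print(X[0][2])  # x_13: first row, third column -> 13
print(X[1][1])  # x_22: second row, second column -> 22
```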

Page 8: Multivariate Statistics

Definitions

• A vector is a special type of matrix that has only one row (called a row vector) or one column (called a column vector). Below, a is a column vector while b is a row vector.

• The convention is to denote vectors by boldface lowercase letters.

a = [x1]        b = [x1 x2 x3]
    [x2]

Page 9: Multivariate Statistics

Definitions

• A scalar is a matrix with only one row and one column.

• The convention is to denote scalars by italicized, lower case letters (e.g., x).

Page 10: Multivariate Statistics

Special names

• If n = m then the matrix is called a square matrix.
• The data matrix is normally not square, but the variance-covariance matrix is; and many others.
• Matrix A is square but matrix B is not square.

A = [3  4  5]      B = [3  4  5]
    [2 12  5]          [2 12  5]
    [1  7  0]

Page 11: Multivariate Statistics

Special names

• A symmetric matrix is a square matrix in which xij = xji , for all i and j.

• The data matrix is normally not symmetric, but the variance-covariance matrix is.

• Matrix A is symmetric; matrix B is not symmetric.

A = [1  2  1]      B = [ 1  2  1]
    [2 12 10]          [10 12  2]
    [1 10  0]          [ 1 10  0]

Page 12: Multivariate Statistics

Special names

• A diagonal matrix is a symmetric matrix in which all the off-diagonal elements are 0.
• The data matrix is normally not diagonal, and neither is the variance-covariance matrix. The variance matrix (variances on the diagonal, zeros elsewhere) is diagonal.
• These matrices are often denoted by D; matrix D is diagonal.

D = [1  0 0]
    [0 12 0]
    [0  0 7]

Page 13: Multivariate Statistics

Special names

• An identity matrix is a diagonal matrix with 1s and only 1s on the diagonal; it is also sometimes called the unity matrix.
• This is a useful matrix in matrix algebra.
• The convention is to denote the identity matrix by I.

I = [1 0 0]
    [0 1 0]
    [0 0 1]

Page 14: Multivariate Statistics

Special names

• A unit vector is a vector containing only 1s.
• This is a useful vector in matrix algebra.
• The convention is to denote the unit vector by u.

u = [1]
    [1]
    [1]

Page 15: Multivariate Statistics

Matrix transposition

• Matrix transposition is a useful transformation, with many purposes.

• The transpose of a matrix is denoted by a prime (A’) or a superscript t or T (At or AT).

• What does it do? The first row of a matrix becomes the first column of the transpose, the second row of the matrix becomes the second column of the transpose, etc.

A = [1 5 1]      A' = [1 0]
    [0 2 1]           [5 2]
                      [1 1]
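The row-becomes-column rule can be sketched as a small Python helper, here applied to a 2 by 3 matrix:

```python
# A minimal transpose: row k of A becomes column k of A'.
A = [[1, 5, 1],
     [0, 2, 1]]

def transpose(M):
    # element (j, i) of the result is element (i, j) of M
    return [[M[i][j] for i in range(len(M))] for j in range(len(M[0]))]

At = transpose(A)
print(At)  # a 3-by-2 matrix
```

Transposing twice returns the original matrix, which is a quick sanity check on the helper.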

Page 16: Multivariate Statistics

Matrix transposition

• What is the transpose of A? And what are the dimensions of A'?

A = [1 3]      A' = ?
    [0 4]

• The transpose of a square matrix is a square matrix.
• What type of special matrix is this matrix? What is its transpose?

A = [1 3 5]      A' = ?
    [3 1 7]
    [5 7 1]

• The transpose of a symmetric matrix is simply the original matrix.

Page 17: Multivariate Statistics

Matrix transposition

• The transpose of a row vector will be a column vector, and the transpose of a column vector will be a row vector.

a = [2]      a' = [2 3]
    [3]

b = [4 0 2]      b' = [4]
                      [0]
                      [2]

Page 18: Multivariate Statistics

Matrix addition

• To add two matrices:
– they both must have the same number of rows, and
– they both must have the same number of columns.

• The elements of the two matrices are simply added together, element by element, to produce the results.

• That is, for R = A + B, then rij = aij + bij.

A = [1  2  1]    B = [ 1  2  1]    R = A + B = [ 2  4  2]
    [2 12 10]        [10 12  2]                [12 24 12]
    [1 10  0]        [ 1 10  0]                [ 2 20  0]

For example, r22 = a22 + b22 = 12 + 12 = 24.

Page 19: Multivariate Statistics

Matrix addition

• Matrix subtraction works in the same way, except that elements are subtracted instead of added.

• What is the result of this addition?

[2  3]   [-1 -3]
[0 -4] + [ 0  5]

• And what is the result when the second matrix is subtracted from the first instead?

Page 20: Multivariate Statistics

Matrix addition

• Rules for matrix addition and subtraction:
– A + B = B + A                Commutative
– (A + B) + C = A + (B + C)    Associative
– (A + B)' = A' + B'
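Element-wise addition and the rules above can be checked in a few lines of plain Python; the two small matrices are illustrative:

```python
# Element-wise matrix addition, plus a check of the listed rules.
def add(A, B):
    # both matrices must have the same dimensions
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def transpose(M):
    return [list(col) for col in zip(*M)]

A = [[2, 3], [0, -4]]
B = [[-1, -3], [0, 5]]

assert add(A, B) == add(B, A)                                   # commutative
assert transpose(add(A, B)) == add(transpose(A), transpose(B))  # (A + B)' = A' + B'
print(add(A, B))
```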

Page 21: Multivariate Statistics

Matrix multiplication

• Multiplication between a scalar and a vector.
• Each element in the product is simply the scalar multiplied by the corresponding element in the vector.
• That is, for p = xa, then pij = xaij for all i and j. Thus, with x = 4 and a = [2; 3]:

p = xa = 4 [2] = [4*2] = [ 8]
           [3]   [4*3]   [12]

• The following multiplication is also defined: p = ax. That is, scalar multiplication is commutative.

Page 22: Multivariate Statistics

Matrix multiplication

• Multiplication between two vectors.
• To perform this, the row vector must have as many columns as the column vector has rows.
• The product is simply the sum of the first row-vector element multiplied by the first column-vector element, plus the second row-vector element multiplied by the second column-vector element, plus the product of the third elements, etc.
• In algebra, if p = ab, then p = Σ(i=1 to n) aibi.

[0 1 2] [0] = 0*0 + 1*1 + 2*2 = 5
        [1]
        [2]
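The inner product p = Σ aibi can be sketched as a one-line helper:

```python
# Inner (dot) product of a row vector and a column vector:
# multiply matching elements and sum them.
def dot(a, b):
    assert len(a) == len(b)  # as many columns in a as rows in b
    return sum(x * y for x, y in zip(a, b))

print(dot([0, 1, 2], [0, 1, 2]))  # 0*0 + 1*1 + 2*2 = 5
```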

Page 23: Multivariate Statistics

Matrix multiplication

• Multiplication between two matrices.
• This is similar to the multiplication of two vectors.
• Specifically, in the expression P = AB, pij = ai•b•j, where ai• is the ith row vector in matrix A and b•j is the jth column vector in matrix B.
• Thus, if

A = [1 5 1]    B = [1 2]    P = AB = ?
    [0 2 1]        [0 4]
                   [7 1]

Page 24: Multivariate Statistics

Matrix multiplication

• Multiplication between two matrices.
• This is similar to the multiplication of two vectors.
• Specifically, in the expression P = AB, pij = ai•b•j, where ai• is the ith row vector in matrix A and b•j is the jth column vector in matrix B.
• Thus, if

A = [1 5 1]    B = [1 2]
    [0 2 1]        [0 4]
                   [7 1]

p11 = a1•b•1 = [1 5 1] [1] = 1*1 + 5*0 + 1*7 = 8      P = AB = [8 ?]
                       [0]                                     [? ?]
                       [7]

Page 25: Multivariate Statistics

Matrix multiplication

• Multiplication between two matrices.
• This is similar to the multiplication of two vectors.
• Specifically, in the expression P = AB, pij = ai•b•j, where ai• is the ith row vector in matrix A and b•j is the jth column vector in matrix B.
• Thus, if

A = [1 5 1]    B = [1 2]
    [0 2 1]        [0 4]
                   [7 1]

p12 = a1•b•2 = [1 5 1] [2] = 1*2 + 5*4 + 1*1 = 23     P = AB = [8 23]
                       [4]                                     [?  ?]
                       [1]

Page 26: Multivariate Statistics

Matrix multiplication

• Multiplication between two matrices.
• This is similar to the multiplication of two vectors.
• Specifically, in the expression P = AB, pij = ai•b•j, where ai• is the ith row vector in matrix A and b•j is the jth column vector in matrix B.
• Thus, if

A = [1 5 1]    B = [1 2]
    [0 2 1]        [0 4]
                   [7 1]

p21 = a2•b•1 = [0 2 1] [1] = 0*1 + 2*0 + 1*7 = 7      P = AB = [8 23]
                       [0]                                     [7  ?]
                       [7]

Page 27: Multivariate Statistics

Matrix multiplication

• Multiplication between two matrices.
• This is similar to the multiplication of two vectors.
• Specifically, in the expression P = AB, pij = ai•b•j, where ai• is the ith row vector in matrix A and b•j is the jth column vector in matrix B.
• Thus, if

A = [1 5 1]    B = [1 2]
    [0 2 1]        [0 4]
                   [7 1]

p22 = a2•b•2 = [0 2 1] [2] = 0*2 + 2*4 + 1*1 = 9      P = AB = [8 23]
                       [4]                                     [7  9]
                       [1]

Page 28: Multivariate Statistics

Matrix multiplication

• Summary of the multiplication procedure:

A = [a b c]    B = [g j]    AB = [ag+bh+ci  aj+bk+cl]
    [d e f]        [h k]         [dg+eh+fi  dj+ek+fl]
                   [i l]

Page 29: Multivariate Statistics

Matrix multiplication

• For matrix multiplication to be legal, the first matrix must have as many columns as the second matrix has rows. This, of course, is the requirement for multiplying a row vector by a column vector.
• The resulting matrix will have as many rows as the first matrix and as many columns as the second matrix.
• In the example A had 2 rows and 3 columns while B had 3 rows and 2 columns; the matrix multiplication was therefore defined, resulting in a matrix with 2 rows and 2 columns.
• Or in general:
– Dimension of A is na by ma, dimension of B is nb by mb.
– Then the product P = AB is defined if ma = nb.
– The dimension of P is na by mb.
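The full procedure — each pij as the inner product of row i of A with column j of B, with the conformability check — can be sketched as:

```python
# Matrix multiplication from the definition: p_ij is the inner product of
# row i of A with column j of B; defined only when A has as many columns
# as B has rows.
def matmul(A, B):
    n, k, m = len(A), len(B), len(B[0])
    assert len(A[0]) == k  # conformability requirement
    return [[sum(A[i][t] * B[t][j] for t in range(k)) for j in range(m)]
            for i in range(n)]

A = [[1, 5, 1],
     [0, 2, 1]]   # 2 by 3
B = [[1, 2],
     [0, 4],
     [7, 1]]      # 3 by 2

P = matmul(A, B)  # 2 by 2
print(P)
```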

Page 30: Multivariate Statistics

Matrix multiplication

• Rules for matrix and vector multiplication:
– AB ≠ BA                      Not commutative
– A(BC) = (AB)C                Associative
– A(B + C) = AB + AC           Distributive
– (B + C)A = BA + CA
– (AB)' = B'A'
– (ABC)' = C'B'A'

• Rules for scalar multiplication:
– xA = Ax                      Commutative
– x(A + B) = xA + xB           Distributive
– x(AB) = (xA)B = A(xB)        Associative
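Non-commutativity is worth seeing concretely; with two illustrative 2 by 2 matrices, AB and BA come out different:

```python
# AB and BA generally differ, even when both products are defined and square.
def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]  # swaps columns (AB) or rows (BA)

print(matmul(A, B))  # [[2, 1], [4, 3]]
print(matmul(B, A))  # [[3, 4], [1, 2]]
```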

Page 31: Multivariate Statistics

Matrix multiplication

• What is the product of:

[2]
[3] [2 4] = ?      Answer: [4  8]
[4]                        [6 12]
                           [8 16]

[2]
[3] [1 1] = ?      Answer: [2 2]
[4]                        [3 3]
                           [4 4]

[1 1] [2]
      [3] = ?      Not possible: [1x2][3x1]
      [4]
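The first two exercises are outer products: a column vector times a row vector gives a matrix whose (i, j) entry is simply ai*bj. A minimal sketch:

```python
# Outer product: column vector a times row vector b gives a matrix
# with entry (i, j) equal to a_i * b_j.
def outer(a, b):
    return [[ai * bj for bj in b] for ai in a]

print(outer([2, 3, 4], [2, 4]))  # a 3-by-2 matrix
print(outer([2, 3, 4], [1, 1]))  # a 3-by-2 matrix of repeated columns
```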

Page 32: Multivariate Statistics

Matrix multiplication

• What is the product of:

[4  8]
[6 12] [2 3 4] = ?       Not defined: [3x2] by [1x3]
[8 16]

[4  8] [1]   [4*1 + 8*0 ]   [4]
[6 12] [0] = [6*1 + 12*0] = [6]
[8 16]       [8*1 + 16*0]   [8]

[2 1 0] [3 7]   [2*3+1*5+0*7  2*7+1*5+0*3]   [11 19]
[0 1 0] [5 5] = [0*3+1*5+0*7  0*7+1*5+0*3] = [ 5  5]
[0 2 3] [7 3]   [0*3+2*5+3*7  0*7+2*5+3*3]   [31 19]

Page 33: Multivariate Statistics

Matrix multiplication

• Matrix division.
• For simple numbers, division can be reduced to multiplication by the reciprocal of the divisor:
– 32 divided by 4 is the same as
– 32 multiplied by ¼, or
– 32 multiplied by 4⁻¹,
– where 4⁻¹ is defined by the general equality a⁻¹a = 1.

• When working with matrices, we shall adopt the latter idea, and therefore not use the term division at all; instead we take multiplication by an inverse matrix as the equivalent of division.

• However, the computation of the inverse matrix is quite complex, and will be discussed next time.
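As a small preview of next time, the 2 by 2 case has a simple closed form: A⁻¹ = (1/det)[d -b; -c a] with det = ad - bc, and multiplying by A⁻¹ plays the role of division since A⁻¹A = I. A sketch, using an illustrative matrix:

```python
# Closed-form inverse of a 2-by-2 matrix; only a preview of the general case.
def inv2(A):
    (a, b), (c, d) = A
    det = a * d - b * c
    assert det != 0  # a singular matrix has no inverse
    return [[d / det, -b / det], [-c / det, a / det]]

def matmul(A, B):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*B)]
            for row in A]

A = [[1, 3],
     [0, 4]]
Ainv = inv2(A)

# "Dividing" by A means multiplying by its inverse: A^{-1} A = I.
print(matmul(Ainv, A))
```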