Matrix Algebraaksikas/opt_math.pdfq1qp 111p amam amam = mm mm aa aa L MOM L L MOM L L MOM L p i p AM...

12
Matrix Algebra Why Bother? most optimization problems are multivariable. simplifies notation. improves understanding provides a useful, standard shorthand. Some References: Anton, H, Elementary Linear Algebra, Wiley, 1977. Edgar, T.F., and D.M. Himmelblau, Optimization of Chemical Processes, McGraw-Hill, 1988 (Appendix B & C). Horn, R.A, and C.R. Johnson, Matrix Analysis, Cambridge University Press, 1985 (Chapter 0). Ortega, J.M., Matrix Theory, Plenum Press, 1987 (Chapter 1). Three Entities: 1) scalars (numbers or single variables). 2) vectors (sets of numbers or variables). 3) matrices (sets of vectors).

Transcript of Matrix Algebraaksikas/opt_math.pdfq1qp 111p amam amam = mm mm aa aa L MOM L L MOM L L MOM L p i p AM...

Matrix Algebra

Why Bother?

→ most optimization problems are multivariable.→ simplifies notation.→ improves understanding→ provides a useful, standard shorthand.

Some References:

Anton, H, Elementary Linear Algebra, Wiley, 1977.Edgar, T.F., and D.M. Himmelblau, Optimization of

Chemical Processes, McGraw-Hill, 1988 (Appendix B &C).

Horn, R.A, and C.R. Johnson, Matrix Analysis, CambridgeUniversity Press, 1985 (Chapter 0).

Ortega, J.M., Matrix Theory, Plenum Press, 1987 (Chapter1).

Three Entities:1) scalars (numbers or single variables).2) vectors (sets of numbers or variables).3) matrices (sets of vectors).

Matrix Algebra

!!!

"

#

$$$

%

&

''

n

1

v

v

vector vor M v

!!!

"

#

$$$

%

&

''((

pnp1

1n11

npnp

mm

mm

matrix Mor

K

MOM

L

M

1) Scalars→ a number (x,2, π).

2) Vectors→ a set of numbers or variables in a row or column.

→ we will use vectors to describe: state of a system, solution to a problem.

3) Matrices→ a set of numbers arranged in an array.

→ we will use matrices to describe relationshipsbetween vectors.

Multiplication:

by a scalar:

by a vector:

Matrix Algebra

!!!

"

#

$$$

%

&

'

!!!

"

#

$$$

%

&

'

pnp1

1n11

n

1

smsm

smsm

s and

sv

sv

s

L

MOM

L

M Mv

[ ]

[ ]

vxvx

vxvx

= vv

x

x

vx =

v

v

xx

nq1q

n111

n1

q

1

T

n

1=i

ii

n

1

n1

T

!!!

"

#

$$$

%

&

!!!

"

#

$$$

%

&

'

!!!

"

#

$$$

%

&

' (

L

MOM

L

LM

ML

xv

vx

Matrix Algebra

Multiplication (cont'd):

by a matrix,

!!!!!

"

#

$$$$$

%

&

!!!

"

#

$$$

%

&

!!!

"

#

$$$

%

&

'

(

(

=

=

n

i

i

n

i

1

pi

1

i1i

n

1

pnp1

1n11

vm

vm

=

v

v

mm

mm

MM

L

MOM

L

Mv

!!!!!

"

#

$$$$$

%

&

!!!

"

#

$$$

%

&

!!!

"

#

$$$

%

&

'

((

((==

p

1=i

inqi

p

1=i

i1qi

1

in1i

1

i11i

pnp1

1n11

qpq1

1p11

mama

mama

=

mm

mm

aa

aa

L

MOM

L

L

MOM

L

L

MOM

L

p

i

p

i

AM

vector

matrix

(note: AM ≠ MA )

Inverse:

For the square, nonsingular matrix Mnxn there is amatrix M-1 such that:

(note: Mnxn is nonsingular if rank[M]=n.)

Matrix inverses are going to be particularly useful forsolving sets of simultaneous, linear equations:

Matrix Algebra

!!!!!!

"

#

$$$$$$

%

&

'

1000

0100

0010

0001

= = = 1-1

L

L

MMOMM

L

L

IMMMM

identity matrix

bMx

bMx

=

1-

=

Transpose:

→ denoted by a superscripted T (or sometimes a ').→ interchange rows and columns.

1) for vectors,

2) for matrices,

if MT=M then M is a symmetric matrix.

Matrix Algebra

[ ]

!!!

"

#

$$$

%

&

'

!!!

"

#

$$$

%

&

'

'

!!!

"

#

$$$

%

&

'

(

pn1n

p111

T

pn1p

1n11

T

np

n1

T

n

1

T

mm

mm

mm

mm

vv

v

v

L

MOM

L

L

MOM

L

LM

M

v

Rank:→ a very important concept in systems theory.

→ number of independent rows or columns in amatrix.

→ by definition:rank [x] = 1rank [v] = 1rank [Mpxn] ≤ min(p,n).

→ can be found by Gaussian Elimination, etc..

→ for example, tells us about the solution to:Mv = b

for Mnxn

Matrix Algebra

set of simultaneous,linear equations.

if rank[M] = n a unique solution.if rank[M] < n many or no solution.{

Some Useful Matrix Facts

1. (AB)T = BT AT

2. (AB)-1 = B-1 A-1 (only if A & B are non-singular)

3. rank[AB] ≤ min(rank[A], rank[B])rank[xTv] = 1rank[vTx] = 1

4. [ ]AA

A!

"

#

$%

&

'( =

!

!

#

$%

&

'(

1 = 1

adj

for a 2 2 matrix:

1

a b

c d ad-bc

d b

c a

First Derivatives:

1) For the scalar function P(x) of the vector variable x:

• convention adopted for this course.• allows direct application of the chain rule.• some texts define as a column vector.

2) For the vector function f(x) of the vector variable x:

Matrix Calculus

!"

#$%

&

'

'

'

'(()

nx

P

x

PPP L

1

wrt ofgradient )( xxx

row vector

!!!!!

"

#

$$$$$

%

&

'

'

'

'

'

'

'

'

!!!

"

#

$$$

%

&

(

(

)(

!!!

"

#

$$$

%

&

)

n

mm

n

x

f

x

f

x

f

x

f

f

f

f

f

L

MOM

L

M

M

1

1

1

1

m

1

m

1

=

)(

)(

)(

)(

)(

)(

x

x

xf

x

x

xf

x

x

x

Jacobian of f wrt x

Second Derivatives:

For the scalar function P(x) of the vector variable x:

Note that when P(x) is at least twice continuouslydifferentiable in x then the Hessian is symmetricor:

Matrix Calculus

!!!!!!!!!!

"

#

$$$$$$$$$$

%

&

'

'

''

'

''

'

'

'

(()

2

2

1

2

1

2

2

1

2

2 wrt of Hessian )(

nn

n

x

P

xx

P

xx

P

x

P

PP

L

MOM

L

xxxx

[ ] )( = )( 2T 2xx

xxxxPP !!

Taylor Series Expansion:

If the scalar function P(x) is at least twice continuouslydifferentiable in the vector variable x at some fixed pointxo then:

Note the difference with many texts.

We are going to make extensive use of the Taylor Seriesrepresentation of nonlinear functions.

Matrix Calculus

) - O( + )-( )-(

+ )-( + )( = )(

3

oo

2T

o21

oo

o

o

xxxxxx

xxxx

xxx

xx

P

PPP

!

!

Mathematical Definitions

Linearity:

the function f(z) is linear if f(αx+βy) = αf(x) + βf(y).

This is a very important concept and we will use it inclassify problems.