UVA CS 6316/4501 – Fall 2016 Machine Learning Lecture 2 ... · UVA CS 6316/4501 – Fall 2016...

UVACS6316/4501–Fall2016

MachineLearning

Lecture2:AlgebraandCalculusReview

Dr.YanjunQi

UniversityofVirginiaDepartmentof

ComputerScience

9/1/16 Dr.YanjunQi/UVACS6316/f16 1

9/1/16 2

⎡ ⎤ ⎡ ⎤⎢ ⎥ ⎢ ⎥=⎢ ⎥ ⎢ ⎥⎣ ⎦ ⎣ ⎦

1 2 7 103 4 and = 8 115 6 9 12

A B C=A–B=?

C=A+B=?

⎡ ⎤ ⎡ ⎤⎢ ⎥ ⎢ ⎥=⎢ ⎥ ⎢ ⎥⎣ ⎦ ⎣ ⎦

1 4 7 1 42 5 8 and = 2 53 6 9 3 6

A B C=AB=?

C=BA=?

Dr.YanjunQi/UVACS6316/f16

Minimumrequirement

Today:

q DataRepresentaMonforMLsystems

q ReviewofLinearAlgebraandMatrixCalculus

9/1/16 3Dr.YanjunQi/UVACS6316/f16

ATypicalMachineLearningPipeline

9/1/16

Low-level sensing

Pre-processing

Feature Extract

Feature Select

Inference, Prediction, Recognition

Label Collection

Dr.YanjunQi/UVACS6316/f16 4

Evaluation

Optimization

e.g. Data Cleaning Task-relevant

e.g. SUPERVISED LEARNING

•  Find function to map input space X to output space Y

•  Generalisation:learnfuncNon/hypothesisfrompastdatainorderto“explain”,“predict”,“model”or“control”newdataexamples

9/1/16 5Dr.YanjunQi/UVACS6316/f16KEY

ADataset

•  Data/points/instances/examples/samples/records:[rows]•  Features/a0ributes/dimensions/independentvariables/covariates/

predictors/regressors:[columns,exceptthelast]•  Target/outcome/response/label/dependentvariable:special

columntobepredicted[lastcolumn]

MainTypesofColumns

•  Con7nuous:arealnumber,forexample,ageorheight

•  Discrete:asymbol,like“Good”or“Bad”

e.g. SUPERVISED Classification

•  e.g.Here,targetYisadiscretetargetvariable

9/1/16 8f(x?)

Trainingdatasetconsistsofinput-outputpairs

Today:

q DataRepresentaMonforMLsystems

q ReviewofLinearAlgebraandMatrixCalculus

DEFINITIONS - SCALAR

◆ ��a scalar is a number –  (denoted with regular type: 1 or 22)

10 9/1/16 Dr.YanjunQi/UVACS6316/f16

DEFINITIONS - VECTOR

◆ Vector: a single row or column of numbers – denoted with bold small letters – row vector a = –  column vector (default) b =

[ ]54321

⎥⎥⎥⎥

⎢⎢⎢⎢

DEFINITIONS - VECTOR

9/1/16 12

•  Vector in Rn is an ordered set of n real numbers. –  e.g. v = (1,6,3,4) is in R4

–  A column vector: –  A row vector: ⎟⎟

⎟⎟⎟

⎜⎜⎜⎜⎜

( )4361

DEFINITIONS - MATRIX ◆ A matrix is an array of numbers A = ◆ Denoted with a bold Capital letter ◆ All matrices have an order (or dimension): that is, the number of rows * the number of

columns. So, A is 2 by 3 or (2 * 3). u  A square matrix is a matrix that has the

same number of rows and columns (n * n)

⎥⎦⎤

⎢⎣⎡

232221

131211

aaaaaa

DEFINITIONS - MATRIX

•  m-by-n matrix in Rmxn with m rows and n columns, each entry filled with a (typically) real number:

•  e.g. 3*3 matrix

9/1/16 14

⎟⎟⎟

⎜⎜⎜

2396784821

Squarematrix

Special matrices

⎟⎟⎟

⎜⎜⎜

fedcba

⎟⎟⎟

⎜⎜⎜

000000

⎟⎟⎟⎟⎟

⎜⎜⎜⎜⎜

⎟⎟⎟

⎜⎜⎜

diagonal upper-triangular

tri-diagonal lower-triangular

⎟⎟⎟

⎜⎜⎜

100010001

I (identity matrix)

Special matrices: Symmetric Matrices

Review of MATRIX OPERATIONS

1)  Transposition 2)  Addition and Subtraction 3)  Multiplication 4)  Norm (of vector) 5)  Matrix Inversion 6)  Matrix Rank 7)  Matrix calculus

(1) Transpose

Transpose: You can think of it as –  �flipping� the rows and columns

( )baba T

=⎟⎟⎠

⎞⎜⎜⎝

⎟⎟⎠

⎞⎜⎜⎝

⎛=⎟⎟⎠

⎞⎜⎜⎝

⎛dbca

dcba T

(2) Matrix Addition/Subtraction

•  Matrix addition/subtraction – Matrices must be of same size.

(2) Matrix Addition/SubtractionAnExample

•  Ifwehave

⎡ ⎤ ⎡ ⎤ ⎡ ⎤⎢ ⎥ ⎢ ⎥ ⎢ ⎥=⎢ ⎥ ⎢ ⎥ ⎢ ⎥⎣ ⎦ ⎣ ⎦ ⎣ ⎦

1 2 7 10 8 12+ = 3 4 + 8 11 = 11 15

5 6 9 12 14 18C A B

⎡ ⎤ ⎡ ⎤⎢ ⎥ ⎢ ⎥=⎢ ⎥ ⎢ ⎥⎣ ⎦ ⎣ ⎦

1 2 7 103 4 and = 8 115 6 9 12

thenwecancalculateC=A+Bby

(2) Matrix Addition/SubtractionAnExample

•  Similarly,ifwehave

⎡ ⎤ ⎡ ⎤ ⎡ ⎤⎢ ⎥ ⎢ ⎥ ⎢ ⎥=⎢ ⎥ ⎢ ⎥ ⎢ ⎥⎣ ⎦ ⎣ ⎦ ⎣ ⎦

1 2 7 10 -6 -8- = 3 4 - 8 11 = -5 -7

5 6 9 12 -4 -6C A B

⎡ ⎤ ⎡ ⎤⎢ ⎥ ⎢ ⎥=⎢ ⎥ ⎢ ⎥⎣ ⎦ ⎣ ⎦

1 2 7 103 4 and = 8 115 6 9 12

thenwecancalculateC=A-Bby

OPERATION on MATRIX

(3)ProductsofMatrices•  WewritethemulNplicaNonoftwomatricesAandBasAB

•  Thisisreferredtoeitheras•  pre-mulNplyingBbyA or

•  post-mulNplyingAbyB•  SoformatrixmulNplicaNonAB,Aisreferredtoasthepremul7plierandBisreferredtoasthepostmul7plier

(3)ProductsofMatrices

9/1/16 24

CondiMon:n=q

m x n q x p m x p

(3)ProductsofMatrices

•  InordertomulNplymatrices,theymustbeconformable(thenumberofcolumnsinthepremulNpliermustequalthenumberofrowsinpostmulNplier)

•  Notethat•  an(mxn)x(nxp)=(mxp)•  an(mxn)x(pxn)=cannotbedone•  a(1xn)x(nx1)=ascalar(1x1)

ProductsofMatrices

•  IfwehaveA(3x3)andB(3x2)then

⎡ ⎤ ⎡ ⎤ ⎡ ⎤⎢ ⎥ ⎢ ⎥ ⎢ ⎥= =⎢ ⎥ ⎢ ⎥ ⎢ ⎥⎣ ⎦ ⎣ ⎦⎣ ⎦

11 12 13 11 12 11 12

21 22 23 21 22 21 22

31 32 31 3231 32 33

a a a b b c ca a a x b b = c c

b b c ca a aAB C

11 11 11 12 21 13 31

12 11 12 12 22 13 32

21 21 11 22 21 23 31

22 21 12 22 22 23 32

31 31 11 32 21 33 31

32 31 12 32 22 33 32

c = a b + a b +a bc = a b +a b +a bc = a b +a b +a bc = a b +a b +a bc = a b +a b +a bc = a b +a b +a b

where test

MatrixMulNplicaNonAnExample

•  Ifwehave

⎡ ⎤⎡ ⎤ ⎡ ⎤ ⎡ ⎤⎢ ⎥⎢ ⎥ ⎢ ⎥ ⎢ ⎥=⎢ ⎥⎢ ⎥ ⎢ ⎥ ⎢ ⎥⎣ ⎦ ⎣ ⎦ ⎣ ⎦⎣ ⎦

c c1 4 7 1 4 30 662 5 8 x 2 5 = c c = 36 813 6 9 3 6 42 96c c

( ) ( ) ( )( ) ( ) ( )( ) ( ) ( )( ) ( ) ( )( ) ( ) ( )

11 11 11 12 21 13 31

12 11 12 12 22 13 32

21 21 11 22 21 23 31

22 21 12 22 22 23 32

31 31 11 32 21 33 31

32 31 12 32 22 3

c = a b + a b + a b =1 1 + 4 2 +7 3 = 30

c = a b + a b + a b =1 4 + 4 5 +7 6 = 66

c = a b + a b + a b = 2 1 +5 2 +8 3 = 36

c = a b + a b + a b = 2 4 +5 5 +8 6 = 81

c = a b + a b + a b = 3 1 +6 2 +9 3 = 42

c = a b + a b + a ( ) ( ) ( )3 32b = 3 4 +6 5 +9 6 = 96

⎡ ⎤ ⎡ ⎤⎢ ⎥ ⎢ ⎥=⎢ ⎥ ⎢ ⎥⎣ ⎦ ⎣ ⎦

1 4 7 1 42 5 8 and = 2 53 6 9 3 6

SomeProperNesofMatrixMulNplicaNon

•  Notethat•  Evenifconformable,ABdoesnotnecessarilyequalBA(i.e.,matrixmulNplicaNonisnotcommuta7ve)

• MatrixmulNplicaNoncanbeextendedbeyondtwomatrices

• matrixmulNplicaNonisassocia7ve,i.e.,A(BC)=(AB)C

SomeProperNesofMatrixMulNplicaNon

◆ Multiplication and transposition (AB)T = BTAT

u Multiplication with Identity Matrix

SpecialUsesforMatrixMulNplicaNon

•  ProductsofScalars&MatricesèExample,Ifwehave

⎡ ⎤ ⎡ ⎤⎢ ⎥ ⎢ ⎥=⎢ ⎥ ⎢ ⎥⎣ ⎦ ⎣ ⎦

1 2 3.5 7.03.5 3 4 = 10.5 14.0

5 6 17.5 21.0bA

⎡ ⎤⎢ ⎥=⎢ ⎥⎣ ⎦

1 23 4 and b = 3.55 6

thenwecancalculatebAby

NotethatbA=Abifbisascalar9/1/16 30Dr.YanjunQi/UVACS6316/f16

•  Dot(orInner)ProductoftwoVectors• PremulNplicaNonofacolumnvectorabyconformablerowvectorbyieldsasinglevaluecalledthedotproductorinnerproduct-If

aT = 3 4 6!"

#$ and b =

aTb = a•b = 3 4 6!"

&& = 3 5( )+ 4 2( )+ 6 8( )= 71 = bTa

thentheirinnerproductgivesus

whichisthesumofproductsofelementsinsimilarposiNonsforthetwovectors9/1/16 31Dr.YanjunQi/UVACS6316/f16

•  OuterProductoftwoVectors• PostmulNplicaNonofacolumnvectorabyconformablerowvectorbyieldsamatrixcontainingtheproductsofeachpairofelementsfromthetwomatrices(calledtheouterproduct)-If

aT = 3 4 6!"

#$ and b =

abT =346

5 2 8!"

152030

243248

thenabTgivesus

•  OuterProductoftwoVectors,e.g.aspecialcase:

•  SumtheSquaredElementsofaVector• PremulNplyacolumnvectorabyitstranspose–If

thenpremulNplicaNonbyarowvectoraT

willyieldthesumofthesquaredvaluesofelementsfora,i.e.

⎡ ⎤⎢ ⎥=⎢ ⎥⎣ ⎦

aT = 5 2 8!"

aTa = 5 2 8!"

&& = 52 +22 + 82 = 93

•  Matrix-VectorProducts(I)

•  Matrix-VectorProducts(II)

•  Matrix-VectorProducts(III)

•  Matrix-VectorProducts(IV)

MATRIX OPERATIONS

A norm of a vector ||x|| is informally a measure of the �length� of the vector.

–  Common norms: L1, L2 (Euclidean)

–  Linfinity

(4) Vector norms

VectorNorm(L2,whenp=2)

VectorNorms(e.g.,)

MoreGeneral:Norm

•  AnormisanyfuncNong()thatmapsvectorstorealnumbersthatsaNsfiesthefollowingcondiNons:

Orthogonal&Orthonormal

Ifu•v=0,||u||2!=0,||v||2!=0àuandvareorthogonal

Ifu•v=0,||u||2=1,||v||2=1àuandvareorthonormal

x•y=

InnerProductdefinedbetweencolumnvectorxandy,asè

Orthogonal matrices

• Aisorthogonalif:

• NotaNon:

Example:

Orthogonal matrices

• NotethatifAisorthogonal,iteasytofinditsinverse:

Property:

•  DefiniMon:Givenavectornorm||x||,thematrixnormdefinedbythevectornormisgivenby:

•  Whatdoesamatrixnormrepresent?•  Itrepresentsthemaximum“stretching”thatAdoestoavectorx->(Ax).

MatrixNorm

Ax 0max

TheoremA:Thematrixnormcorrespondingto1-normismaximumabsolutecolumnsum:

Proof:Frompreviousslide,wecanhaveAlso,whereAjisthej-thcolumnofA.

Matrix1-Norm

iijjaA

111max AxAx =

Ax = x1A1 + x2A2 +!+ xnAn = x jA jj=1

MATRIX OPERATIONS

(5)InverseofaMatrix

•  TheinverseofamatrixAiscommonlydenotedbyA-1orinvA.

•  TheinverseofannxnmatrixAisthematrixA-1suchthatAA-1=I=A-1A

•  Thematrixinverseisanalogoustoascalarreciprocal

•  Amatrixwhichhasaninverseiscallednonsingular

(5)InverseofaMatrix

•  ForsomenxnmatrixA,aninversematrixA-1

maynotexist.•  Amatrixwhichdoesnothaveaninverseissingular.

•  AninverseofnxnmatrixAexistsiff|A|not0

THE DETERMINANT OF A MATRIX

◆ The determinant of a matrix A is denoted by |A| (or det(A) or det A).

◆ Determinants exist only for square matrices.

◆ E.g. If A = ⎥⎦

⎤⎢⎣

21122211 aaaaA −=

diagonal matrix:

HOW TO FIND INVERSE MATRIXES? An example,

◆ If ◆  and |A| not 0

⎥⎦

⎤⎢⎣

⎡−

)det(11

⎥⎦

⎤⎢⎣

Matrix Inverse

•  The inverse A-1 of a matrix A has the property: AA-1=A-1A=I

•  A-1 exists only if

•  Terminology –  Singular matrix: A-1 does not exist –  Ill-conditioned matrix: A is close to being singular

PROPERTIES OF INVERSE MATRICES

◆ ��

( ) 111 -- ABAB =−

AT( )−1 = A-1( )T

( ) AA =−11-

Inverse of special matrix

•  For diagonal matrices

•  For orthogonal matrices –  a square matrix with real entries whose columns and rows

are orthogonal unit vectors (i.e., orthonormal vectors)

Pseudo-inverse

•  The pseudo-inverse A+ of a matrix A (could be non-square, e.g., m x n) is given by:

•  It can be shown that:

MATRIX OPERATIONS

(6) Rank: Linear independence

•  A set of vectors is linearly independent if none of them can be written as a linear combination of the others.

x3=−2x1+x2

èNOTlinearlyindependent

(6) Rank: Linear independence

•  Alternative definition: Vectors v1,…,vk are linearly independent if c1v1+…+ckvk = 0 implies c1=…=ck=0

9/1/16 62

⎟⎟⎟

⎜⎜⎜

⎟⎟⎟

⎜⎜⎜

⎟⎟⎟

⎜⎜⎜

⎟⎟⎟

⎜⎜⎜

⎛=⎟⎟⎠

⎞⎜⎜⎝

⎟⎟⎟

⎜⎜⎜

313201

(u,v)=(0,0),i.e.thecolumnsarelinearlyindependent.

(6) Rank of a Matrix

•  rank(A) (the rank of a m-by-n matrix A) is = The maximal number of linearly independent columns =The maximal number of linearly independent rows

•  If A is n by m, then –  rank(A)<= min(m,n) –  If n=rank(A), then A has full row rank –  If m=rank(A), then A has full column rank

Rank=? Rank=?

(6) Rank of a Matrix

•  Equal to the dimension of the largest square sub-matrix of A that has a non-zero determinant.

Example: has rank 3

(6) Rank and singular matrices

MATRIX OPERATIONS

( ) ( )0

f a h f ah→

+ −iscalledthederivaNveofat.f a

Wewrite: ( ) ( ) ( )0

f x h f xf x

+ −′ =

�ThederivaNveoffwithrespecttoxis…�

TherearemanywaystowritethederivaMveof ( )y f x=

Review:DerivaNveofaFuncNon

èe.g.definetheslopeofthecurvey=f(x)atthepointx9/1/16 67Dr.YanjunQi/UVACS6316/f16

-3 -2 -1 1 2 3x

2 3y x= −

( ) ( )2 2

3 3limh

x h xy

+ − − −′ =

x xh h xy

+ + −′ =

2y x′ =-6-5-4-3-2-10

123456

-3 -2 -1 1 2 3x

0lim2h

y x h→

′ = +0

Review:DerivaNveofaQuadraNcFuncNon

SomeimportantrulesfortakingderivaNves

Review:DefiniNonsofgradient(Matrix_calculus/Scalar-by-matrix)

Inprinciple,gradientsareanaturalextensionofparNalderivaNvestofuncNonsofmulNplevariables.

èDenominatorlayout

Review:DefiniNonsofgradient(Matrix_calculus/Scalar-by-vector)

9/1/16 71

•  Sizeofgradientisalwaysthesameasthesizeof

èDenominatorlayout

For Examples

Exercise:asimpleexample

f (w)=wTx = w1 ,w2 ,w3!"

***=w1+2w2+3w3

∂ f∂w

=∂wTx∂w

= x =123

èDenominatorlayout

EvenmoregeneralMatrixCalculus:TypesofMatrixDerivaMves

ByThomasMinka.OldandNewMatrixAlgebraUsefulforStaNsNcs

Review:HessianMatrix/n==2case

•  1stderivaNvetogradient,

•  2ndderivaNvetoHessian

f (x, y)

g =∇f =∂f∂x

∂f∂y

H =∂2 f∂x2

∂2 f∂x∂y

∂2 f∂y2

759/1/16 Dr.YanjunQi/UVACS6316/f16

SinglevariateàmulNvariate

Review:HessianMatrix

TodayRecap

q DataRepresentaMonq LinearAlgebraandMatrixCalculusReview1)  Transposition 2)  Addition and Subtraction 3)  Multiplication 4)  Norm (of vector) 5)  Matrix Inversion 6)  Matrix Rank 7)  Matrix calculus

Extra:

•  HW1isreleasedtoday@Collab•  HW1isduenextSat@midnight

•  HandoutforLecture2hasbeenposted@hxp://www.cs.virginia.edu/yanjun/teach/2016f/schedule.html

•  Thefollowingtopicsarecoveredbyhandout,butnotbythisslide(willbecovered…)– Trace()– Eigenvalue/Eigenvectors– PosiNvedefinitematrix,Grammatrix– QuadraNcform– ProjecNon(vectoronaplane,oronavector)

References

q hxp://www.cs.cmu.edu/~zkolter/course/linalg/index.html

q Prof.JamesJ.Cochran’stutorialslides“MatrixAlgebraPrimerII”

q hxp://www.cs.cmu.edu/~aarN/Class/10701/recitaNon/LinearAlgebra_Matlab_Review.ppt

q Prof.AlexanderGray’sslidesq  Prof. George Bebis’ slides q  Prof. Hal Daum ́e III’ notes

UVA CS 6316/4501 – Fall 2016 Machine Learning Lecture 2 ... · UVA CS 6316/4501 – Fall 2016...

Documents

Transcript of UVA CS 6316/4501 – Fall 2016 Machine Learning Lecture 2 ... · UVA CS 6316/4501 – Fall 2016...

UVA CS 4501: Machine Learning Lecture 10: K-nearest ... · q Classiﬁcaon (supervised) q Unsupervised models ... model, vs. global model like linear ... , Steinbach , Kumar’s ...

UVA CS 6316: Machine Learning Lecture 18: Decision Tree ...Machine Learning Lecture 18: Decision Tree / Bagging ... of Virginia Department of Computer Science . Course Content Plan

UVA CS 6316: Machine Learning Lecture 9: K-nearest-neighbor · e.g., support vector machine, decision tree, logistic regression, e.g. neural networks (NN), deep NN 2. Generative:

UVA$CS$6316$$ –$Fall$2015$Graduate:$$ Machine$Learning ...€¦ · • k-Nearest neighbor classifier is a lazy learner – Does not build model explicitly. – Classifying unknown

UVA CS 6316: Machine Learning Lecture 5: Non-Linear ...

Intro to Neural Networks and Deep Learningjjl5sw/documents/DeepLearningIntro.pdf · Intro to Neural Networks and Deep Learning Jack Lanchantin Dr. Yanjun Qi 1 UVA CS 6316. Neurons

UVA CS 4501: Machine Learning Lecture 22: Unsupervised ...€¦ · Machine Learning Lecture 22: Unsupervised Clustering (I) Dr. Yanjun Qi University of Virginia Department of ...

AGR 4501 Pasture Management 1 AGR 4501 PASTURE MANAGEMENT.

Dr. Yanjun Qi / UVA CS 6316 / f16 UVA CS 6316/4501 – Fall ... · LR with non-linear basis funcGons • LR does not mean we can only deal with linear relaGonships • We are free

UVA CS 6316: Machine Learning Lecture 3: Linear Regression ... · Machine Learning Lecture 3: Linear Regression Basics Dr. YanjunQi ... qGraphical models qReinforcement Learning 9/18/19

UVA CS 6316: Machine Learning Lecture 11b: Support Vector ... · Machine Learning Lecture 11b: Support Vector Machine (nonlinear) Kernel Trick and in Practice Dr. YanjunQi University

Yanjun&Qi&/&UVA&CS&4501/01/6501/07& UVACS$4501$+$001 ...€¦ · 10/30/14 11 & & & 21 Method&II:&&LRwith&batch&Steepest descent/&&Gradientdescent& – This&is&as&abatch&gradientdescentalgorithm&

Dr. Yanjun Qi / UVA CS 6316 / f16 UVA CS 6316/4501 – Fall ...€¦ · UVA CS 6316/4501 – Fall 2016 Machine Learning Lecture 15: K-nearest-neighbor Classiﬁer / Bias-Variance

Whirlpool Awe 6316 w

6316 vonnie ct_riverside

HIT2316-6316 Module 1

UVA CS 6316: Machine Learning Lecture 10: Bias-Variance Tradeoff€¦ · Adapt From Carols’ probtutorial . 10/3/19 Dr. Yanjun Qi / UVA CS 11 Review: Mean and Variance of RV •Variance:

UVA CS 6316: Machine Learning Lecture 19c:Unsupervised ... · Dr. Yanjun Qi / UVA CS 6316 / f16. 11/25/19 11 Other partitioning Methods •Partitioning around medoids(PAM): instead

UVA CS 6316/4501 – Fall 2016 Machine Learning Lecture 21: EM … · 2020-01-12 · UVA CS 6316/4501 – Fall 2016 Machine Learning Lecture 21: EM (Extra) Dr. Yanjun Qi University

UVA CS 6316/4501 – Fall 2016 Machine Learning Lecture 19 ... · UVA CS 6316/4501 – Fall 2016 Machine Learning Lecture 19: Unsupervised Clustering (I) Dr. Yanjun Qi University