Some Tricks - Alex Smolaalex.smola.org/teaching/10-701-2015/website/recitation/trick.pdfAlgebraic...

21
Some Tricks For efficient implementation

Transcript of Some Tricks - Alex Smolaalex.smola.org/teaching/10-701-2015/website/recitation/trick.pdfAlgebraic...

Page 1: Some Tricks - Alex Smolaalex.smola.org/teaching/10-701-2015/website/recitation/trick.pdfAlgebraic Tricks •Hopefully if you will solve HW5 bonus and get a multi-variate student t-distribution

Some TricksFor efficient implementation

Page 2: Some Tricks - Alex Smolaalex.smola.org/teaching/10-701-2015/website/recitation/trick.pdfAlgebraic Tricks •Hopefully if you will solve HW5 bonus and get a multi-variate student t-distribution

Logistic Regression

• Another popular classification model

• Usual setting• Observe data

• with labels

• Assume the label probability follows:

Page 3: Some Tricks - Alex Smolaalex.smola.org/teaching/10-701-2015/website/recitation/trick.pdfAlgebraic Tricks •Hopefully if you will solve HW5 bonus and get a multi-variate student t-distribution

Analysing further

• Probability for other class

• Thus, overall we have:

Page 4: Some Tricks - Alex Smolaalex.smola.org/teaching/10-701-2015/website/recitation/trick.pdfAlgebraic Tricks •Hopefully if you will solve HW5 bonus and get a multi-variate student t-distribution

Training LR

• Maximum Likelihood Estimation

• Equivalently

• Add 𝐿2 regularizer

• Let’s solve this optimization problem in an efficient manner!

Page 5: Some Tricks - Alex Smolaalex.smola.org/teaching/10-701-2015/website/recitation/trick.pdfAlgebraic Tricks •Hopefully if you will solve HW5 bonus and get a multi-variate student t-distribution

Logistic Regression vs SVM

• Recall SVM basically solves

• LR basically solves

• That is just replace max with softmax!

Page 6: Some Tricks - Alex Smolaalex.smola.org/teaching/10-701-2015/website/recitation/trick.pdfAlgebraic Tricks •Hopefully if you will solve HW5 bonus and get a multi-variate student t-distribution

Gradient Descent to solve LR• The objective function is:

• How to evaluate this?J=0;for i=1:n

inner_product = 0;for j=1:d

inner_product = inner_product + w(j)*x(i,j);endJ = J + log( 1 + exp( - y(i)*inner_product ) );

endfor j=1:d

J = J + lambda*w(j)^2;end

Page 7: Some Tricks - Alex Smolaalex.smola.org/teaching/10-701-2015/website/recitation/trick.pdfAlgebraic Tricks •Hopefully if you will solve HW5 bonus and get a multi-variate student t-distribution

Computing Objective Function• The objective function is:

• How to evaluate this?J=0;for i=1:n

inner_product = 0;for j=1:d

inner_product = inner_product + w(j)*X(i,j);endJ = J + log( 1 + exp( - y(i)*inner_product ) );

endfor j=1:d

J = J + lambda*w(j)^2;end

Never!

Page 8: Some Tricks - Alex Smolaalex.smola.org/teaching/10-701-2015/website/recitation/trick.pdfAlgebraic Tricks •Hopefully if you will solve HW5 bonus and get a multi-variate student t-distribution

Computing Objective Function• The objective function is:

• How to evaluate this?

J = 0;for i=1:n

J = J + log( 1 + exp( - y(i)*X(i,:)*w ) );endJ = J + sum(w.^2);

Not even this!

Page 9: Some Tricks - Alex Smolaalex.smola.org/teaching/10-701-2015/website/recitation/trick.pdfAlgebraic Tricks •Hopefully if you will solve HW5 bonus and get a multi-variate student t-distribution

Computing Objective Function• The objective function is:

• How to evaluate this?

• Short code!

• Matrix-vector products and summing vectors are highly optimized

J = sum( log( 1 + exp( - (X*w).*y ) ) ) + lambda*sum(w.^2);

Page 10: Some Tricks - Alex Smolaalex.smola.org/teaching/10-701-2015/website/recitation/trick.pdfAlgebraic Tricks •Hopefully if you will solve HW5 bonus and get a multi-variate student t-distribution

Matrix Multiplication

• Never write vector or matrix operations by yourself!

• Always use libraries• 100x faster!

• MKL or BLAS maybe intimidating to use directly

• Good News:• Matlab already does it for you

• Eigen as wrapper• Almost matlab like API in C++

Page 11: Some Tricks - Alex Smolaalex.smola.org/teaching/10-701-2015/website/recitation/trick.pdfAlgebraic Tricks •Hopefully if you will solve HW5 bonus and get a multi-variate student t-distribution

Exercise: Computing Gradient

• For the gradient descent approach, next thing needed is the gradient!

Page 12: Some Tricks - Alex Smolaalex.smola.org/teaching/10-701-2015/website/recitation/trick.pdfAlgebraic Tricks •Hopefully if you will solve HW5 bonus and get a multi-variate student t-distribution

Exercise: Computing Gradient

• For the gradient descent approach, next thing needed is the gradient!

• Get the entire gradient vector at one go!

• One way using repmat

b = ( 1 + exp( (X*w).*y ) ) ) .* yb = repmat(b,1,5);

g = sum(X./b)’ + 2*lambda*w;

Page 13: Some Tricks - Alex Smolaalex.smola.org/teaching/10-701-2015/website/recitation/trick.pdfAlgebraic Tricks •Hopefully if you will solve HW5 bonus and get a multi-variate student t-distribution

Exercise: Computing Gradient

• For the gradient descent approach, next thing needed is the gradient!

• Get the entire gradient vector at one go!

• More memory efficient

b = ( 1 + exp( (X*w).*y ) ) ) .* yg = sum(bsxfun(@rdivide, X,b));

g = g’ + 2*lambda*w;

Page 14: Some Tricks - Alex Smolaalex.smola.org/teaching/10-701-2015/website/recitation/trick.pdfAlgebraic Tricks •Hopefully if you will solve HW5 bonus and get a multi-variate student t-distribution

Computing Gram Matrices

nsq=sum(X.^2,2);

K=bsxfun(@minus,nsq,(2*X)*X.'); K=bsxfun(@plus,nsq.',K); K=exp(-K);

Page 15: Some Tricks - Alex Smolaalex.smola.org/teaching/10-701-2015/website/recitation/trick.pdfAlgebraic Tricks •Hopefully if you will solve HW5 bonus and get a multi-variate student t-distribution

Algebraic Tricks

• Hopefully if you will solve HW5 bonus and get a multi-variate student t-distribution for the posterior predictive of Normal Inverse Wishart:

• So you need the determinant and inverse of Σ – expensive 𝑂 𝑑3

• Moreover, posterior predictive has to be computed many times for different x

PDF of a general 𝑡𝜈 𝐱; 𝝁, 𝚺 =

Page 16: Some Tricks - Alex Smolaalex.smola.org/teaching/10-701-2015/website/recitation/trick.pdfAlgebraic Tricks •Hopefully if you will solve HW5 bonus and get a multi-variate student t-distribution

Cholesky Decomposition

• The update in posterior predictive for Σ would be

• So instead of computing this update:• Suppose we have cholesky decomposition of Σ𝑛

• Then we calculate only the rank-one update to obtain Σ

Page 17: Some Tricks - Alex Smolaalex.smola.org/teaching/10-701-2015/website/recitation/trick.pdfAlgebraic Tricks •Hopefully if you will solve HW5 bonus and get a multi-variate student t-distribution

Cholesky Updates

Page 18: Some Tricks - Alex Smolaalex.smola.org/teaching/10-701-2015/website/recitation/trick.pdfAlgebraic Tricks •Hopefully if you will solve HW5 bonus and get a multi-variate student t-distribution

Cholesky Update

Page 19: Some Tricks - Alex Smolaalex.smola.org/teaching/10-701-2015/website/recitation/trick.pdfAlgebraic Tricks •Hopefully if you will solve HW5 bonus and get a multi-variate student t-distribution

Nice Properties

Page 20: Some Tricks - Alex Smolaalex.smola.org/teaching/10-701-2015/website/recitation/trick.pdfAlgebraic Tricks •Hopefully if you will solve HW5 bonus and get a multi-variate student t-distribution

Triangular Solver

Page 21: Some Tricks - Alex Smolaalex.smola.org/teaching/10-701-2015/website/recitation/trick.pdfAlgebraic Tricks •Hopefully if you will solve HW5 bonus and get a multi-variate student t-distribution

Miscellaneous Tricks

• Finding the min/max of a matrix of N-d array

• Try to avoid inverse of a matrix!• Typically you only need x = 𝐴\b

• This invokes appropriate linear solver

• Much more efficient and numerically stable

[MinValue, MinIndex] = min( A(:) ); %find minimum element in AMinSub = ind2sub(size(A), MinIndex); %convert MinIndex to subscripts