A principled way to principal components analysis
Teaching activity objectives
• Visualize large data sets.
• Transform the data to aid in this visualization.
• Cluster data.
• Implement basic linear algebra operations.
• Connect these operations to neuronal models and brain function.
Context for the activity
• Homework assignment in 9.40 Intro to Neural Computation (sophomore/junior).
• In-class activity in 9.014 Quantitative Methods and Computational Models in Neuroscience (1st-year PhD).
Data visualization and performing PCA:
the MNIST data set
28 × 28 pixel, 8-bit grayscale images
These images live in a 784-dimensional space (28 × 28 = 784).
http://yann.lecun.com/exdb/mnist/
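For readers who download the files from the link above, the raw IDX format is simple to parse. This sketch (the function name and the synthetic demo buffer are mine) follows the header layout documented on that page: four big-endian uint32s, then raw pixel bytes.

```python
import struct
import numpy as np

def parse_idx_images(buf: bytes) -> np.ndarray:
    """Parse an MNIST IDX image buffer: a big-endian header of four
    uint32s (magic 2051, image count, rows, cols) followed by the
    raw uint8 pixel values in row-major order."""
    magic, n, rows, cols = struct.unpack(">IIII", buf[:16])
    if magic != 2051:
        raise ValueError("not an IDX image file")
    pixels = np.frombuffer(buf, dtype=np.uint8, offset=16)
    return pixels.reshape(n, rows, cols)

# Self-contained demo: a synthetic buffer holding two blank 28x28 images.
fake = struct.pack(">IIII", 2051, 2, 28, 28) + bytes(2 * 28 * 28)
imgs = parse_idx_images(fake)
```

Flattening each 28 × 28 image into a length-784 vector is what puts the data in the pixel space discussed below.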
Can we cluster images in the pixel space?
One possible visualization
With 784 pixels, there are more than 300,000 possible pairwise pixel plots!
Is there a more principled way?
• Represent the data in a new basis set.
• This aids in visualization and potentially in clustering and dimensionality reduction.
• PCA provides such a basis set by looking at the directions that capture the most variance.
• The directions are ranked by decreasing variance.
• PCA diagonalizes the covariance matrix.
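In symbols (the notation here is mine, chosen to match the bullets above): if $C$ is the data covariance matrix, PCA finds the orthonormal eigenvectors that diagonalize it,

```latex
C = W \Lambda W^{\top}, \qquad
\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \dots, \lambda_d), \qquad
\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_d \ge 0 ,
```

where the columns of $W$ are the principal directions and $\lambda_i$ is the variance captured along the $i$-th direction.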
Pedagogical approach
• Guide them step by step to implement PCA.
• Emphasize visualizations and a geometrical approach/intuition.
• We don’t use the MATLAB canned function for PCA.
• We want students to get their hands “dirty”. This helps build confidence and deep understanding.
PCA Mantra
• Reshape the data into the proper format for PCA.
• Center the data by performing mean subtraction.
• Construct the data covariance matrix.
• Perform SVD to obtain the eigenvalues and eigenvectors of the covariance matrix.
• Compute the variance explained per component and plot it.
• Reshape the eigenvectors and visualize them as images.
• Project the mean-subtracted data onto the eigenvector basis.
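The mantra above can be sketched step by step in NumPy (the class uses MATLAB; this Python version, with a random stand-in for the MNIST matrix, is only for illustration):

```python
import numpy as np

# Stand-in for the MNIST matrix: n images flattened to 784-pixel rows.
rng = np.random.default_rng(0)
X = rng.random((500, 784))            # reshape step: one row per image

# Center the data by subtracting the mean image.
Xc = X - X.mean(axis=0)

# Construct the pixel covariance matrix (784 x 784).
C = Xc.T @ Xc / (Xc.shape[0] - 1)

# SVD of the symmetric covariance matrix gives its eigendecomposition:
# columns of U are eigenvectors, S holds eigenvalues in decreasing order.
U, S, Vt = np.linalg.svd(C)

# Variance explained per component (each entry could be plotted).
var_explained = S / S.sum()

# Project the mean-subtracted data onto the eigenvector basis.
scores = Xc @ U
```

Each eigenvector is itself a length-784 vector, so reshaping it to 28 × 28 lets it be viewed as an image, as in the next slide.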
First 9 Eigenvectors
Projections onto the first 2 axes
• The first two PCs capture ~37% of the variance.
• The data forms clear clusters that are almost linearly separable.
Building models: synapses and PCA
Hebbian learning (Donald Hebb)
• 1949 book, 'The Organization of Behavior': a theory of the neural basis of learning.
• Learning takes place at synapses.
• Synapses get modified: they get stronger when the pre- and postsynaptic cells fire together.
• "Cells that fire together, wire together."
• The plain Hebbian rule is unstable: the weights grow without bound.
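The instability can be seen in the simplest rate-based form of the rule (the notation here is an assumption, not from the slides): for weights $w$, zero-mean input $x$, output $y = w^{\top} x$, and learning rate $\eta$,

```latex
\Delta w = \eta \, y \, x
\quad\Longrightarrow\quad
\langle \Delta w \rangle = \eta \, \langle x x^{\top} \rangle \, w = \eta \, C w ,
```

so on average the weights are repeatedly multiplied by the positive semi-definite covariance matrix $C$ and grow without bound.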
Building Hebbian synapses
Oja’s rule (Erkki Oja)
"A simplified neuron model as a principal component analyzer." Journal of Mathematical Biology, 15:267–273 (1982).
• Adds a feedback/forgetting term that acts as a regularizer.
• Stabilizes the Hebbian rule.
• Leads to a covariance learning rule: the weights converge to the first eigenvector of the covariance matrix.
• Similar to the power iteration method.
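A minimal simulation of Oja's rule (the synthetic data and parameter values are my own choices, not from the slides) shows the weights converging to the first eigenvector of the covariance matrix:

```python
import numpy as np

# Synthetic zero-mean data whose covariance has one clearly dominant
# direction (a stand-in for real data; dimensions chosen arbitrarily).
rng = np.random.default_rng(1)
n, d = 5000, 5
X = rng.normal(size=(n, d)) * np.array([3.0, 1.0, 0.8, 0.5, 0.3])
X -= X.mean(axis=0)

# Oja's rule: dw = eta * y * (x - y*w), with output y = w . x.
# The extra -eta*y^2*w "forgetting" term is what stabilizes the plain
# Hebbian update and keeps the weight norm bounded near 1.
w = rng.normal(size=d)
w /= np.linalg.norm(w)
eta = 0.01
for x in X:
    y = w @ x
    w += eta * y * (x - y * w)

# The learned weights should align with the top covariance eigenvector.
C = X.T @ X / (n - 1)
eigvals, eigvecs = np.linalg.eigh(C)       # eigenvalues in ascending order
v1 = eigvecs[:, -1]                        # first principal direction
alignment = abs(w @ v1) / np.linalg.norm(w)
```

On average, each update applies the covariance matrix to $w$ (the Hebbian part) and renormalizes it (the forgetting part), which is exactly the structure of the power iteration method.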
Learning outcomes
• Visualize and manipulate a relatively large and complex data set.
• Perform PCA by building it step by step.
• Gain an intuition for the geometry involved in a change of basis and projections.
• Start thinking about basic clustering algorithms.
• Discuss dimensionality reduction and other PCA applications.
Learning outcomes (cont.)
• Discuss the assumptions, limitations, and shortcomings of applying PCA in different contexts.
• Build a model of how PCA might actually take place in neural circuits.
• Follow-up: eigenfaces. Is the brain doing PCA to recognize faces?