

Reducing the Dimensionality of Data with Neural Networks

Andrea Castro
May 14, 2019

The curse of dimensionality

• High-dimensional data often has more features than observations
• As more variables are added, it becomes more difficult to make accurate predictions
• Example: finding a cell in a 2D petri dish (25 cm²) vs. a 3D beaker (125 cm³)

(Source: https://www.statisticshowto.datasciencecentral.com/dimensionality/)


Reducing dimensionality

• Principal Components Analysis (PCA), sketched below
  • Finds the directions of greatest variance
  • Represents each data point by its coordinates along these directions

(Source: http://www.nlpca.org/pca_principal_component_analysis.html)
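As a concrete illustration of this slide, here is a minimal PCA sketch in Python/NumPy; the function name and the toy data are our own, not from the presentation.

```python
import numpy as np

def pca(X, k):
    """Project X (n_samples x n_features) onto its top-k principal components."""
    X_centered = X - X.mean(axis=0)
    # The directions of greatest variance are the top right-singular
    # vectors of the centered data matrix.
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:k]                 # k directions of greatest variance
    codes = X_centered @ components.T   # coordinates along those directions
    return codes, components

# Toy usage: 200 points in 10 dimensions reduced to 2.
X = np.random.default_rng(0).normal(size=(200, 10))
codes, components = pca(X, k=2)
```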

Autoencoders

• Composed of encoder and decoder networks (see the sketch below)
  • Encoder: high-dimensional data -> low-dimensional code
  • Decoder: recovers the original data from the low-dimensional code
• Trained to minimize the discrepancy between input and output
• Difficult to perform gradient descent without well-initialized weights
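To make the encoder/decoder structure concrete, here is a minimal single-hidden-layer autoencoder trained by plain gradient descent in NumPy; the layer sizes, learning rate, and toy data are illustrative assumptions, not values from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Toy autoencoder: 8-D input -> 3-D code -> 8-D reconstruction.
n_in, n_code = 8, 3
W1, b1 = rng.normal(0, 0.1, (n_in, n_code)), np.zeros(n_code)  # encoder
W2, b2 = rng.normal(0, 0.1, (n_code, n_in)), np.zeros(n_in)    # decoder

X = rng.random((500, n_in))  # toy data in [0, 1]
lr = 1.0
for step in range(2000):
    code = sigmoid(X @ W1 + b1)      # encoder: data -> low-dimensional code
    out = sigmoid(code @ W2 + b2)    # decoder: code -> reconstruction
    err = (out - X) / len(X)         # gradient of the mean squared discrepancy
    # Backpropagate the reconstruction error through both networks.
    d_out = err * out * (1 - out)
    d_code = (d_out @ W2.T) * code * (1 - code)
    W2 -= lr * code.T @ d_out; b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_code;  b1 -= lr * d_code.sum(axis=0)
```

With many such layers stacked, this same backpropagation gets stuck unless the weights start near a good solution, which is the motivation for the pretraining described next.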


Pretraining to optimize weights

• Train layer-by-layer as restricted Boltzmann machines (RBMs)
• The learned feature activations of one layer are used as the input data for the next layer

RBMs are energy-based models

The hidden units model the distribution

$p(v, h) = \frac{e^{-E(v,h)}}{Z}$,  where  $E(v, h) = -\sum_i b_i v_i - \sum_j b_j h_j - \sum_{i,j} v_i h_j w_{ij}$.

The energy can be raised or lowered by adjusting the biases and the weight matrix.

[Figure: an RBM, with visible units v_1, v_2, v_3, ..., v_i (biases b_i) and hidden units h_1, h_2, ..., h_j (biases b_j) forming two fully connected layers.]
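A direct transcription of this energy function in NumPy (the function and variable names are ours):

```python
import numpy as np

def rbm_energy(v, h, b_vis, b_hid, W):
    """E(v, h) = -sum_i b_i v_i - sum_j b_j h_j - sum_{i,j} v_i h_j w_ij."""
    return -(b_vis @ v) - (b_hid @ h) - (v @ W @ h)

# Raising a weight w_ij lowers the energy of configurations in which
# v_i and h_j are both on, making those configurations more probable.
rng = np.random.default_rng(0)
v = rng.integers(0, 2, 4).astype(float)   # 4 binary visible units
h = rng.integers(0, 2, 3).astype(float)   # 3 binary hidden units
W = rng.normal(0, 0.1, (4, 3))
print(rbm_energy(v, h, np.zeros(4), np.zeros(3), W))
```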


RBMs are energy-based models

The network assigns a probability to every possible image:

$p(v) = \frac{1}{Z} \sum_h e^{-E(v,h)}$

The conditional distribution is easier to calculate, because it factorizes over the hidden units:

$p(h_j = 1 \mid v) = \sigma\big(b_j + \sum_i v_i w_{ij}\big)$  (and vice versa for $p(v_i = 1 \mid h)$)

[Same RBM figure as above.]
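Both conditionals are one line each in NumPy; this is a sketch with names of our own choosing.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def p_h_given_v(v, b_hid, W):
    # p(h_j = 1 | v) = sigmoid(b_j + sum_i v_i w_ij), independently for each j
    return sigmoid(b_hid + v @ W)

def p_v_given_h(h, b_vis, W):
    # ...and vice versa: p(v_i = 1 | h) = sigmoid(b_i + sum_j h_j w_ij)
    return sigmoid(b_vis + W @ h)
```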

Derivation (1/2)

Why the conditional factorizes: write $W_j x = \sum_i w_{ij} x_i$ for the total input to hidden unit $j$ from visible vector $x$. Then:

$p(h \mid x) = \frac{p(x, h)}{p(x)} = \frac{e^{-E(x,h)}}{\sum_{h'} e^{-E(x,h')}}$   (joint over marginal)

$= \frac{e^{\sum_j (b_j + W_j x) h_j}}{\sum_{h'} e^{\sum_j (b_j + W_j x) h'_j}}$   (expansion; terms not dependent on $h$ cancel)

$= \frac{\prod_j e^{(b_j + W_j x) h_j}}{\sum_{h'} \prod_j e^{(b_j + W_j x) h'_j}}$   (exponential of a sum is a product of exponentials)

$= \frac{\prod_j e^{(b_j + W_j x) h_j}}{\prod_j \sum_{h'_j \in \{0,1\}} e^{(b_j + W_j x) h'_j}}$   (independent $h_j$: the sum over $h'$ distributes across the product)

$= \frac{\prod_j e^{(b_j + W_j x) h_j}}{\prod_j \big(1 + e^{b_j + W_j x}\big)}$   (expand the $h'_j = 0$ and $1$ cases)

$= \prod_j \frac{e^{(b_j + W_j x) h_j}}{1 + e^{b_j + W_j x}}$   (combine both $\prod_j$)

Each factor is a Bernoulli distribution over a single $h_j$, with $p(h_j = 1 \mid x) = \frac{e^{b_j + W_j x}}{1 + e^{b_j + W_j x}}$.   (note the distribution)


Derivation (2/2)

Multiplying numerator and denominator by $e^{-(b_j + W_j x)}$:

$p(h_j = 1 \mid x) = \frac{1}{1 + e^{-(b_j + W_j x)}} = \sigma(b_j + W_j x)$

i.e., the logistic sigmoid of the unit's total input.

RBM training

Given an input image $v$, each hidden unit state $h_j$ is set to 1 with probability $\sigma(b_j + \sum_i v_i w_{ij})$.

Next, a "confabulation" image is produced by setting each pixel $v_i$ to 1 with probability $\sigma(b_i + \sum_j h_j w_{ij})$.

Finally, the hidden unit states are updated once more to represent the confabulated image's features. (These three steps are sketched in code below.)
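Putting the three steps together gives one step of contrastive-divergence (CD-1) learning. The slide stops at the confabulation; the weight update used here is the $\langle v_i h_j \rangle_{data} - \langle v_i h_j \rangle_{confab}$ rule from the original paper, and all names and sizes are our own sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def cd1_step(v0, W, b_vis, b_hid, lr=0.1):
    """One contrastive-divergence update on a batch v0 of binary rows."""
    # 1. Set hidden states to 1 with probability sigmoid(b_j + sum_i v_i w_ij).
    p_h0 = sigmoid(b_hid + v0 @ W)
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)
    # 2. Produce a "confabulation" by sampling each pixel given the hidden states.
    p_v1 = sigmoid(b_vis + h0 @ W.T)
    v1 = (rng.random(p_v1.shape) < p_v1).astype(float)
    # 3. Update hidden probabilities to represent the confabulation's features.
    p_h1 = sigmoid(b_hid + v1 @ W)
    # Weight update: <v h>_data - <v h>_confabulation, averaged over the batch.
    n = len(v0)
    W += lr * (v0.T @ p_h0 - v1.T @ p_h1) / n
    b_vis += lr * (v0 - v1).mean(axis=0)
    b_hid += lr * (p_h0 - p_h1).mean(axis=0)

# Toy usage: 6 visible units, 4 hidden units, a batch of 32 binary vectors.
W = rng.normal(0, 0.1, (6, 4))
b_vis, b_hid = np.zeros(6), np.zeros(4)
cd1_step(rng.integers(0, 2, (32, 6)).astype(float), W, b_vis, b_hid)
```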


Unfolding and finetuning

Each subsequent RBM is trained on the previous hidden layer of feature detectors.

The autoencoder is created by unfolding/mirroring the stacked RBMs, as sketched below.

Finetune using standard backpropagation.
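A minimal sketch of the unfolding, assuming two pretrained RBMs; random matrices stand in for weights learned layer-by-layer (e.g. via cd1_step above), and the 784-256-30 layer sizes are illustrative, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Stand-ins for the weights and biases learned by two stacked RBMs.
W1, b_h1, b_v1 = rng.normal(0, 0.01, (784, 256)), np.zeros(256), np.zeros(784)
W2, b_h2, b_v2 = rng.normal(0, 0.01, (256, 30)), np.zeros(30), np.zeros(256)

def unfolded_autoencoder(x):
    # Encoder: the RBM stack, bottom-up.
    h1 = sigmoid(x @ W1 + b_h1)
    code = sigmoid(h1 @ W2 + b_h2)
    # Decoder: the same weights transposed, top-down (the mirror image).
    h1_rec = sigmoid(code @ W2.T + b_v2)
    x_rec = sigmoid(h1_rec @ W1.T + b_v1)
    return code, x_rec

code, x_rec = unfolded_autoencoder(rng.random((1, 784)))
# Finetuning then backpropagates the reconstruction error through all four
# layers, letting the encoder and decoder weights diverge from each other.
```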

Examples on images

Reconstruction error on test data (mean squared error per image):

  6D autoencoder:  1.44 MSE   vs.  6D logistic PCA:  7.64 MSE
  30D autoencoder: 3.00 MSE   vs.  30D logistic PCA: 8.01 MSE
  30D autoencoder: 126 MSE    vs.  30D PCA:          135 MSE

[Figure: rows of test images alongside their autoencoder and PCA reconstructions.]


Example: 2D MNIST code visualization

[Figure: 2-D codes for MNIST digits, LDA vs. autoencoder.]

Example: 2D document class visualization

[Figure: 2-D codes for document classes, Latent Semantic Analysis vs. autoencoder.]