Sammon's map Word2vec Regularization t-SNE Representation … · 2020. 2. 10. · Context word...
![Page 1: Sammon's map Word2vec Regularization t-SNE Representation … · 2020. 2. 10. · Context word Context word Target word Context word involving respiratory system and other chest symptoms](https://reader033.fdocuments.in/reader033/viewer/2022053118/609dbf31052944674d2937df/html5/thumbnails/1.jpg)
Representation learning
Deep learning overview, representation learning methods in detail (Sammon's map, t-SNE), the backprop algorithm in detail, and regularization and its impact on optimization.
Topics: Dimensionality reduction, Word2vec, PCA, Sammon's map, Regularization, t-SNE, Factorized Embeddings, Latent variable models, ALI/BiGAN
Slides by Joseph Paul Cohen 2020
![Page 2: Sammon's map Word2vec Regularization t-SNE Representation … · 2020. 2. 10. · Context word Context word Target word Context word involving respiratory system and other chest symptoms](https://reader033.fdocuments.in/reader033/viewer/2022053118/609dbf31052944674d2937df/html5/thumbnails/2.jpg)
Dim reduction overview
We would like a mapping from $\mathbb{R}^{100}$ (or anything) to $\mathbb{R}^{2}$ to visualize it. The result is also known as: encoding, code, embedding, representation
Invertible vs non-invertible: if we allow for the loss of information, we cannot have a lossless reconstruction. Some methods just preserve distances; no mapping is learned.
![Page 3: Sammon's map Word2vec Regularization t-SNE Representation … · 2020. 2. 10. · Context word Context word Target word Context word involving respiratory system and other chest symptoms](https://reader033.fdocuments.in/reader033/viewer/2022053118/609dbf31052944674d2937df/html5/thumbnails/3.jpg)
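PCA for dim reduction (quick recap)
In 2D data this vector (the first principal component) captures the most variance
The columns of A are the eigenvectors of the covariance matrix of X. A's columns are orthonormal, so $x = A A^{\top} x$ for centered x when every eigenvector is kept, and the leading columns of A form a basis which encodes the most variance.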
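A minimal NumPy sketch of the recap above, assuming toy random data and a hypothetical helper named `pca_project` (neither is from the slides). It keeps the top-k eigenvectors of the covariance matrix and reconstructs with $A A^{\top}$.

```python
import numpy as np

def pca_project(X, k):
    """Project the rows of X onto the top-k principal directions."""
    mean = X.mean(axis=0)
    Xc = X - mean                               # center the data
    cov = np.cov(Xc, rowvar=False)              # (D, D) covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]       # indices of the k largest
    A = eigvecs[:, order]                       # (D, k), orthonormal columns
    Z = Xc @ A                                  # low-dimensional code
    X_hat = Z @ A.T + mean                      # reconstruction via A A^T
    return A, Z, X_hat

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5)) @ rng.normal(size=(5, 5))   # toy correlated data
A, Z, X_hat = pca_project(X, k=2)           # lossy 2-D code
A_full, _, X_full = pca_project(X, k=5)     # all directions kept
```

With k=5 the reconstruction matches X up to floating point error; with k=2 some variance is lost, which is the non-invertible case from the overview slide.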
![Page 4: Sammon's map Word2vec Regularization t-SNE Representation … · 2020. 2. 10. · Context word Context word Target word Context word involving respiratory system and other chest symptoms](https://reader033.fdocuments.in/reader033/viewer/2022053118/609dbf31052944674d2937df/html5/thumbnails/4.jpg)
word2vec
Presented at NeurIPS 2013
Strategy to learn representations for word tokens given their context.
Relevant to problems where context defines concepts (with redundancy)
Tomas Mikolov
![Page 5: Sammon's map Word2vec Regularization t-SNE Representation … · 2020. 2. 10. · Context word Context word Target word Context word involving respiratory system and other chest symptoms](https://reader033.fdocuments.in/reader033/viewer/2022053118/609dbf31052944674d2937df/html5/thumbnails/5.jpg)
What to do with word embeddings?
● We can compose them to create paragraph embeddings (bag of embeddings).
● Use in place of words for an RNN
● Augment learned representations on small datasets
[Cultural Shift or Linguistic Drift, Hamilton, 2016]
Study how meaning varies between two texts (or hospitals, or doctors)
[Pennington, 2014][Mikolov, 2013]
Study the compositionality of the learned latent space
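A small sketch of the "bag of embeddings" idea from the list above, assuming hand-made 3-d vectors and a hypothetical helper `bag_of_embeddings`. Real vectors would come from a trained model. Averaging is one simple way to compose; it ignores word order.

```python
import numpy as np

# Hypothetical 3-d embeddings, invented for illustration only.
embeddings = {
    "chest": np.array([0.9, 0.1, 0.0]),
    "pain": np.array([0.7, 0.3, 0.1]),
    "fever": np.array([0.1, 0.8, 0.2]),
}

def bag_of_embeddings(tokens, table):
    """Average the vectors of the known tokens; unknown tokens are skipped."""
    vectors = [table[t] for t in tokens if t in table]
    if not vectors:
        dim = len(next(iter(table.values())))
        return np.zeros(dim)                 # no known token: zero vector
    return np.mean(vectors, axis=0)

paragraph = "chest pain and fever".split()
paragraph_vec = bag_of_embeddings(paragraph, embeddings)
```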
![Page 6: Sammon's map Word2vec Regularization t-SNE Representation … · 2020. 2. 10. · Context word Context word Target word Context word involving respiratory system and other chest symptoms](https://reader033.fdocuments.in/reader033/viewer/2022053118/609dbf31052944674d2937df/html5/thumbnails/6.jpg)
Token representations
One-hot encoding: binary vector per token
Example:
cat = [0 0 1 0 0 0 0 0 0 0 0 0 0 0 … 0]
dog = [0 0 0 0 1 0 0 0 0 0 0 0 0 0 … 0]
house = [1 0 0 0 0 0 0 0 0 0 0 0 0 0 … 0]
Note! If x is one-hot, the matrix-vector product Wx is a single column of W: (M x N)(N x 1) = (M x 1)
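The note above can be checked numerically. This sketch assumes a toy five-word vocabulary and a random W; both are illustrative and not from the slides.

```python
import numpy as np

vocab = ["house", "the", "cat", "a", "dog"]   # toy vocabulary (assumed)
N, M = len(vocab), 3                          # vocabulary size, embedding size

def one_hot(token):
    """Return an N-vector with a single 1 at the token's index."""
    x = np.zeros(N)
    x[vocab.index(token)] = 1.0
    return x

rng = np.random.default_rng(0)
W = rng.normal(size=(M, N))                   # M x N weight matrix

x = one_hot("cat")
column = W @ x                                # (M x N)(N) -> M
```

This is why an embedding layer can be implemented as a table lookup: multiplying by a one-hot vector only ever reads one column.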
![Page 7: Sammon's map Word2vec Regularization t-SNE Representation … · 2020. 2. 10. · Context word Context word Target word Context word involving respiratory system and other chest symptoms](https://reader033.fdocuments.in/reader033/viewer/2022053118/609dbf31052944674d2937df/html5/thumbnails/7.jpg)
word2vec
Example text: "involving respiratory system and other chest symptoms" (one target word, surrounded by context words)
1. Each word is a training example
2. Each word is used in many contexts
3. The context defines each word
Context window = 2
[Figure: input vector over the vocabulary (involving, respiratory, doctor, chest, system, ...) marking the context words, mapped to the learned embedding values]
Mikolov, Efficient Estimation of Word Representations in Vector Space, 2013
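One way to turn the example text into training pairs with a context window of 2. This is a sketch: the helper `context_pairs` is hypothetical, and keeping every token (no stop-word removal) is an assumption, since the slide does not say how the text was preprocessed.

```python
def context_pairs(tokens, window=2):
    """Return (target, context) pairs for every position in the token list."""
    pairs = []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)                   # left edge of the window
        hi = min(len(tokens), i + window + 1)     # right edge (exclusive)
        for j in range(lo, hi):
            if j != i:                            # a word is not its own context
                pairs.append((target, tokens[j]))
    return pairs

sentence = "involving respiratory system and other chest symptoms".split()
pairs = context_pairs(sentence, window=2)
system_context = [c for t, c in pairs if t == "system"]
```

Every word appears once as a target and several times as context, which matches points 1 and 2 of the list.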
![Page 8: Sammon's map Word2vec Regularization t-SNE Representation … · 2020. 2. 10. · Context word Context word Target word Context word involving respiratory system and other chest symptoms](https://reader033.fdocuments.in/reader033/viewer/2022053118/609dbf31052944674d2937df/html5/thumbnails/8.jpg)
Learning in progress
![Page 9: Sammon's map Word2vec Regularization t-SNE Representation … · 2020. 2. 10. · Context word Context word Target word Context word involving respiratory system and other chest symptoms](https://reader033.fdocuments.in/reader033/viewer/2022053118/609dbf31052944674d2937df/html5/thumbnails/9.jpg)
king + (woman - man) = ?
The point that is closest is queen!
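A toy version of the analogy, assuming hand-picked 2-d vectors (axis 0 loosely "royalty", axis 1 loosely "gender"). These are not trained embeddings; they only show the arithmetic and the nearest-neighbor lookup by cosine similarity.

```python
import numpy as np

# Hypothetical vectors chosen by hand so the analogy is easy to follow.
vectors = {
    "king":  np.array([1.0,  1.0]),
    "queen": np.array([1.0, -1.0]),
    "man":   np.array([0.0,  1.0]),
    "woman": np.array([0.0, -1.0]),
    "apple": np.array([-1.0, 0.0]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def closest(query, table, exclude=()):
    """Return the word whose vector has the highest cosine similarity to query."""
    candidates = {w: v for w, v in table.items() if w not in exclude}
    return max(candidates, key=lambda w: cosine(query, candidates[w]))

query = vectors["king"] + (vectors["woman"] - vectors["man"])
answer = closest(query, vectors, exclude={"king", "woman", "man"})
```

Excluding the query words from the candidates is a common convention, because the closest vector is often one of the inputs.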
![Page 10: Sammon's map Word2vec Regularization t-SNE Representation … · 2020. 2. 10. · Context word Context word Target word Context word involving respiratory system and other chest symptoms](https://reader033.fdocuments.in/reader033/viewer/2022053118/609dbf31052944674d2937df/html5/thumbnails/10.jpg)
Try it yourself! https://colab.research.google.com/drive/1VU4mm_DThBaQc9t0Cf6ajjHQDEw-Q1H2
![Page 11: Sammon's map Word2vec Regularization t-SNE Representation … · 2020. 2. 10. · Context word Context word Target word Context word involving respiratory system and other chest symptoms](https://reader033.fdocuments.in/reader033/viewer/2022053118/609dbf31052944674d2937df/html5/thumbnails/11.jpg)
Sammon's map
Described by John W. Sammon in 1969
Method of non-linear dim reduction based on gradient descent.
Basic method of preserving distances in a low dim space.
John W. Sammon
![Page 12: Sammon's map Word2vec Regularization t-SNE Representation … · 2020. 2. 10. · Context word Context word Target word Context word involving respiratory system and other chest symptoms](https://reader033.fdocuments.in/reader033/viewer/2022053118/609dbf31052944674d2937df/html5/thumbnails/12.jpg)
Colab Notebook: https://colab.research.google.com/drive/1FDJ2FlVfN5PYYrNKEW2w48_BuSknhKif
![Page 13: Sammon's map Word2vec Regularization t-SNE Representation … · 2020. 2. 10. · Context word Context word Target word Context word involving respiratory system and other chest symptoms](https://reader033.fdocuments.in/reader033/viewer/2022053118/609dbf31052944674d2937df/html5/thumbnails/13.jpg)
First: a basic non-linear dimensionality reduction
Learn a representation that maintains pairwise distances.
Embedding-space distance = distance computed between each pair of learned representations
Data-space distance = distance function (or matrix) that you want to represent
![Page 14: Sammon's map Word2vec Regularization t-SNE Representation … · 2020. 2. 10. · Context word Context word Target word Context word involving respiratory system and other chest symptoms](https://reader033.fdocuments.in/reader033/viewer/2022053118/609dbf31052944674d2937df/html5/thumbnails/14.jpg)
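Embedding-space distance = distance computed between each pair of learned representations
Data-space distance = distance function (or matrix) that you want to represent
Constant! (the normalizing term depends only on the data-space distances)
Scale the discrepancy by the true distance. Small distance = more important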
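The slide's formula did not survive extraction. For reference, the stress from Sammon's 1969 paper is usually written as below, where $d^{*}_{ij}$ is the data-space distance and $d_{ij}$ the embedding-space distance (this symbol choice follows the paper and may differ from the slide).

```latex
E = \frac{1}{\sum_{i<j} d^{*}_{ij}} \sum_{i<j} \frac{\left(d^{*}_{ij} - d_{ij}\right)^{2}}{d^{*}_{ij}}
```

The leading factor contains no $d_{ij}$, so it does not change where the minimum is. Dividing each term by $d^{*}_{ij}$ gives errors on small true distances more weight.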
![Page 15: Sammon's map Word2vec Regularization t-SNE Representation … · 2020. 2. 10. · Context word Context word Target word Context word involving respiratory system and other chest symptoms](https://reader033.fdocuments.in/reader033/viewer/2022053118/609dbf31052944674d2937df/html5/thumbnails/15.jpg)
Embedding-space distance (target_d) = distance computed between each pair of learned representations
Data-space distance (source_d) = distance function (or matrix) that you want to represent
source_d = torch.pdist(source)
target_d = torch.pdist(target)
stress = (((target_d - source_d)**2)/(source_d+1)).sum()
![Page 16: Sammon's map Word2vec Regularization t-SNE Representation … · 2020. 2. 10. · Context word Context word Target word Context word involving respiratory system and other chest symptoms](https://reader033.fdocuments.in/reader033/viewer/2022053118/609dbf31052944674d2937df/html5/thumbnails/16.jpg)
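The output of torch.pdist is a non-square (non-redundant) set of distances between vectors: a flat vector holding only the upper triangle of the distance matrix, N(N-1)/2 entries for N vectors.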
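A quick check of that output layout on three hand-picked points, with `torch.cdist` used only to build the full square matrix for comparison.

```python
import torch

points = torch.tensor([[0.0, 0.0],
                       [3.0, 4.0],
                       [6.0, 8.0]])

condensed = torch.pdist(points)          # pairs (0,1), (0,2), (1,2)
square = torch.cdist(points, points)     # full 3 x 3 matrix, for comparison

# The condensed vector is the upper triangle of the square matrix, row by row.
rows, cols = torch.triu_indices(3, 3, offset=1)
upper = square[rows, cols]
```

Each pair is stored once, so sums over `pdist` output do not double count distances.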
![Page 17: Sammon's map Word2vec Regularization t-SNE Representation … · 2020. 2. 10. · Context word Context word Target word Context word involving respiratory system and other chest symptoms](https://reader033.fdocuments.in/reader033/viewer/2022053118/609dbf31052944674d2937df/html5/thumbnails/17.jpg)
import torch
source = torch.Tensor(data.values) # data is assumed to be a pandas DataFrame of input features
target = torch.randn(source.shape[0], 2, requires_grad=True) # random 2-D starting points
optimizer = torch.optim.SGD([target], lr=0.5)
optimizer.zero_grad() # get ready for new gradients
source_d = torch.pdist(source) # compute distances in the data space
target_d = torch.pdist(target) # compute distances in the embedding space
stress = (((target_d - source_d)**2)/(source_d+1)).sum()
stress.backward() # compute gradients for target
optimizer.step() # adjust the target tensor
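The lines above show a single update. A sketch of the full loop follows, assuming random toy data in place of `data.values` and a smaller learning rate (0.01) than the slide's 0.5, because the summed stress scales with the number of pairs. Both choices are assumptions for this toy setup.

```python
import torch

torch.manual_seed(0)
source = torch.randn(50, 10)                       # stand-in for data.values
target = torch.randn(source.shape[0], 2, requires_grad=True)

optimizer = torch.optim.SGD([target], lr=0.01)
source_d = torch.pdist(source)                     # fixed data-space distances

history = []
for step in range(200):
    optimizer.zero_grad()                          # clear old gradients
    target_d = torch.pdist(target)                 # current embedding distances
    stress = (((target_d - source_d) ** 2) / (source_d + 1)).sum()
    stress.backward()                              # gradients w.r.t. target
    optimizer.step()                               # move the 2-D points
    history.append(stress.item())
```

Note that the denominator here is `source_d + 1` as on the slide, which avoids division by zero for duplicate points but is not the exact weighting from Sammon's paper.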
![Page 18: Sammon's map Word2vec Regularization t-SNE Representation … · 2020. 2. 10. · Context word Context word Target word Context word involving respiratory system and other chest symptoms](https://reader033.fdocuments.in/reader033/viewer/2022053118/609dbf31052944674d2937df/html5/thumbnails/18.jpg)
Sammon's paper calls the learning rate the "magic factor"!
![Page 19: Sammon's map Word2vec Regularization t-SNE Representation … · 2020. 2. 10. · Context word Context word Target word Context word involving respiratory system and other chest symptoms](https://reader033.fdocuments.in/reader033/viewer/2022053118/609dbf31052944674d2937df/html5/thumbnails/19.jpg)
Exercise (regularization)
How to control the representation learned?
Adjust the objective function so that the minimum has the property you want.
dloss += 0.01*torch.pdist(target[label==6]).mean()
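A sketch of the exercise on toy data, assuming random inputs, made-up integer labels, and a hypothetical helper `fit`. The extra term is the mean pairwise distance inside one class (label 3 here, where the slide uses 6), and the weight is exaggerated so the effect is visible on this toy problem.

```python
import torch

torch.manual_seed(0)
source = torch.randn(40, 10)                  # toy data (assumed)
label = torch.arange(40) % 4                  # toy integer labels 0..3 (assumed)
source_d = torch.pdist(source)

def fit(reg_weight, steps=300, lr=0.01):
    """Run the distance-preserving loop and return the spread of class 3."""
    torch.manual_seed(1)                      # same starting points for each run
    target = torch.randn(40, 2, requires_grad=True)
    optimizer = torch.optim.SGD([target], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        target_d = torch.pdist(target)
        loss = (((target_d - source_d) ** 2) / (source_d + 1)).sum()
        # Regularizer: pull the points of one class toward each other.
        loss = loss + reg_weight * torch.pdist(target[label == 3]).mean()
        loss.backward()
        optimizer.step()
    return torch.pdist(target[label == 3]).mean().item()

spread_plain = fit(reg_weight=0.0)
spread_regularized = fit(reg_weight=100.0)
```

The regularized run trades some distance preservation for a tighter cluster, which is the point of adjusting the objective.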
![Page 20: Sammon's map Word2vec Regularization t-SNE Representation … · 2020. 2. 10. · Context word Context word Target word Context word involving respiratory system and other chest symptoms](https://reader033.fdocuments.in/reader033/viewer/2022053118/609dbf31052944674d2937df/html5/thumbnails/20.jpg)
Discussion
Simple cases? Hard cases?
What does a learning rate over 1 mean?
What are the drawbacks of Sammon's map?
What does regularization change about training?
![Page 21: Sammon's map Word2vec Regularization t-SNE Representation … · 2020. 2. 10. · Context word Context word Target word Context word involving respiratory system and other chest symptoms](https://reader033.fdocuments.in/reader033/viewer/2022053118/609dbf31052944674d2937df/html5/thumbnails/21.jpg)
t-SNE
TL;DR: Sammon's map but distances decay exponentially
$p_{j|i}$ = conditional probability that $x_i$ is next to $x_j$ given a Gaussian centered at $x_i$
p is computed in the data space, q in the embedding space
Ratio between distances weighted by source data distance. Drives p and q to be equal but only for nearby points.
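The slide's formulas were images. For reference, the definitions from the published t-SNE method are usually written as follows, with $x$ in the data space, $y$ in the embedding space, and $N$ points. The slide may have shown a simplified variant.

```latex
p_{j|i} = \frac{\exp\left(-\lVert x_i - x_j \rVert^{2} / 2\sigma_i^{2}\right)}
               {\sum_{k \neq i} \exp\left(-\lVert x_i - x_k \rVert^{2} / 2\sigma_i^{2}\right)},
\qquad
p_{ij} = \frac{p_{j|i} + p_{i|j}}{2N}

q_{ij} = \frac{\left(1 + \lVert y_i - y_j \rVert^{2}\right)^{-1}}
              {\sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^{2}\right)^{-1}},
\qquad
C = \mathrm{KL}(P \,\|\, Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}
```

Each log-ratio is weighted by $p_{ij}$, which is close to zero for distant points, so the cost mostly enforces agreement for neighbors.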
![Page 22: Sammon's map Word2vec Regularization t-SNE Representation … · 2020. 2. 10. · Context word Context word Target word Context word involving respiratory system and other chest symptoms](https://reader033.fdocuments.in/reader033/viewer/2022053118/609dbf31052944674d2937df/html5/thumbnails/22.jpg)
Setting the σ (perplexity)
t-SNE performs a binary search for the value of $\sigma_i$ that produces a $P_i$ with a fixed perplexity.
Perplexity = $2^{H(P_i)}$, so Perp = 30 -> H ≈ 4.9 bits
[Figure: data space (but in 2D), showing the neighborhood for a small σ and for a large σ]
Image from: https://www.dataminingapps.com/2019/11/a-refresher-on-t-sne/
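A sketch of that search for a single point, assuming toy random data and hypothetical helpers `conditional_p`, `perplexity`, and `find_sigma`. It bisects on a log scale, relying on the fact that perplexity grows as σ grows. Library implementations differ in details (many search over the precision rather than σ).

```python
import numpy as np

def conditional_p(sq_dists, sigma):
    """p_{j|i} for one point i, given squared distances to all other points."""
    logits = -sq_dists / (2.0 * sigma ** 2)
    logits -= logits.max()                        # numerical stability
    p = np.exp(logits)
    return p / p.sum()

def perplexity(p):
    entropy = -np.sum(p * np.log2(p + 1e-12))     # Shannon entropy H in bits
    return 2.0 ** entropy

def find_sigma(sq_dists, target_perplexity, lo=1e-3, hi=1e3, iters=60):
    """Bisection on sigma: a larger sigma gives a flatter P_i, so higher perplexity."""
    for _ in range(iters):
        mid = np.sqrt(lo * hi)                    # midpoint on a log scale
        if perplexity(conditional_p(sq_dists, mid)) > target_perplexity:
            hi = mid                              # too flat: shrink sigma
        else:
            lo = mid                              # too peaked: grow sigma
    return np.sqrt(lo * hi)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
sq_dists = np.sum((X[1:] - X[0]) ** 2, axis=1)    # point 0 to the other 99
sigma_0 = find_sigma(sq_dists, target_perplexity=30.0)
achieved = perplexity(conditional_p(sq_dists, sigma_0))
```

Perplexity can be read as the effective number of neighbors each point considers.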
![Page 23: Sammon's map Word2vec Regularization t-SNE Representation … · 2020. 2. 10. · Context word Context word Target word Context word involving respiratory system and other chest symptoms](https://reader033.fdocuments.in/reader033/viewer/2022053118/609dbf31052944674d2937df/html5/thumbnails/23.jpg)
Discussion
How does t-SNE differ from Sammon's map?
Which distances are meaningful?
![Page 24: Sammon's map Word2vec Regularization t-SNE Representation … · 2020. 2. 10. · Context word Context word Target word Context word involving respiratory system and other chest symptoms](https://reader033.fdocuments.in/reader033/viewer/2022053118/609dbf31052944674d2937df/html5/thumbnails/24.jpg)
Factorized Embeddings
TL;DR: two spaces of non-linear embeddings.
Conditioned on each other to predict data.
GTEx Dataset: 8,910 samples, 56,000 genes
[Trofimov et al. ICML WCB 2017]
Tissue
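A rough sketch of the idea as described on this slide: one embedding per sample, one per gene, and a small network that predicts expression from the pair. The class name, layer sizes, and the random stand-in data are all assumptions; this is not the architecture from the cited paper.

```python
import torch
from torch import nn

class FactorizedEmbedding(nn.Module):
    """Predict a value from a (sample, gene) pair of learned embeddings."""

    def __init__(self, n_samples, n_genes, dim=2, hidden=32):
        super().__init__()
        self.sample_emb = nn.Embedding(n_samples, dim)   # sample space
        self.gene_emb = nn.Embedding(n_genes, dim)       # gene space
        self.net = nn.Sequential(
            nn.Linear(2 * dim, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, sample_idx, gene_idx):
        pair = torch.cat(
            [self.sample_emb(sample_idx), self.gene_emb(gene_idx)], dim=-1
        )
        return self.net(pair).squeeze(-1)

torch.manual_seed(0)
model = FactorizedEmbedding(n_samples=20, n_genes=30)
sample_idx = torch.randint(0, 20, (64,))
gene_idx = torch.randint(0, 30, (64,))
expression = torch.randn(64)          # random stand-in for measured expression

optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
losses = []
for _ in range(100):
    optimizer.zero_grad()
    loss = nn.functional.mse_loss(model(sample_idx, gene_idx), expression)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())
```

After training, `model.sample_emb.weight` is the 2-d sample space that can be plotted and colored, for example by tissue.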
![Page 25: Sammon's map Word2vec Regularization t-SNE Representation … · 2020. 2. 10. · Context word Context word Target word Context word involving respiratory system and other chest symptoms](https://reader033.fdocuments.in/reader033/viewer/2022053118/609dbf31052944674d2937df/html5/thumbnails/25.jpg)
Evaluating sample space
Color represents expression predicted when conditioned on a gene
Value predicted for entire space
![Page 26: Sammon's map Word2vec Regularization t-SNE Representation … · 2020. 2. 10. · Context word Context word Target word Context word involving respiratory system and other chest symptoms](https://reader033.fdocuments.in/reader033/viewer/2022053118/609dbf31052944674d2937df/html5/thumbnails/26.jpg)
Latent variable models
We learn a mapping from a latent variable z to a complicated x, where p(z) = something simple like a Gaussian.
The conditional probability p(x|z) is modeled by a neural network and the latent space is a distribution we understand.
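The slide's equation was an image. A standard way to write such a model, assuming a unit Gaussian prior (the slide only says "something simple like a Gaussian"), is:

```latex
p(x) = \int p(x \mid z)\, p(z)\, dz, \qquad p(z) = \mathcal{N}(0, I)
```

Sampling is then two steps: draw $z$ from the prior, then draw $x$ from $p(x \mid z)$.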
Image from Ward, A. D, 3D Surface Parameterization Using Manifold Learning for Medial Shape Representation
![Page 27: Sammon's map Word2vec Regularization t-SNE Representation … · 2020. 2. 10. · Context word Context word Target word Context word involving respiratory system and other chest symptoms](https://reader033.fdocuments.in/reader033/viewer/2022053118/609dbf31052944674d2937df/html5/thumbnails/27.jpg)
ALI/BiGAN
ICLR 2017. Two papers, same idea.
Matches joint distribution p(x,z).
Trains an encoder and decoder.
Vincent Dumoulin Jeff Donahue
![Page 28: Sammon's map Word2vec Regularization t-SNE Representation … · 2020. 2. 10. · Context word Context word Target word Context word involving respiratory system and other chest symptoms](https://reader033.fdocuments.in/reader033/viewer/2022053118/609dbf31052944674d2937df/html5/thumbnails/28.jpg)
ALI/BiGAN
Matches the joint distribution p(z,x) with q(z,x) using an adversarial loss.
p(z,x)
q(z,x)
Classifier learns to tell the difference between inputs. Update E and G so D cannot distinguish.
Typically a Gaussian generates points here.
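For reference, the adversarial objective in the ALI/BiGAN papers is usually written as below, with encoder $E$, generator (decoder) $G$, and discriminator $D$ acting on pairs. Here $q(x)$ is the data distribution and $p(z)$ the prior; the slide's own notation may differ.

```latex
\min_{G, E} \max_{D} \;
\mathbb{E}_{x \sim q(x)}\left[\log D\big(x, E(x)\big)\right]
+ \mathbb{E}_{z \sim p(z)}\left[\log\big(1 - D(G(z), z)\big)\right]
```

The discriminator sees (x, E(x)) pairs from the encoder side and (G(z), z) pairs from the generator side, so fooling it requires the two joint distributions to match.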
![Page 29: Sammon's map Word2vec Regularization t-SNE Representation … · 2020. 2. 10. · Context word Context word Target word Context word involving respiratory system and other chest symptoms](https://reader033.fdocuments.in/reader033/viewer/2022053118/609dbf31052944674d2937df/html5/thumbnails/29.jpg)
Quick intro to adversarial distribution matching
The generator learns to match the target distribution to fool the discriminator
[Figure: Noise -> Generator -> Fake Image; the Discriminator receives fake and real images and answers "Real or Fake?"]
![Page 30: Sammon's map Word2vec Regularization t-SNE Representation … · 2020. 2. 10. · Context word Context word Target word Context word involving respiratory system and other chest symptoms](https://reader033.fdocuments.in/reader033/viewer/2022053118/609dbf31052944674d2937df/html5/thumbnails/30.jpg)
Homework
1) Find a single/multi cell RNA-seq dataset and compute a PCA, Sammon's map, and t-SNE. Color points by some relevant value.
2) What are the challenges for representation learning?
3)
![Page 31: Sammon's map Word2vec Regularization t-SNE Representation … · 2020. 2. 10. · Context word Context word Target word Context word involving respiratory system and other chest symptoms](https://reader033.fdocuments.in/reader033/viewer/2022053118/609dbf31052944674d2937df/html5/thumbnails/31.jpg)