
Visualizing Data using t-SNE: An Intuitive Introduction

Simon Carbonnelle

Université Catholique de Louvain, ICTEAM
12th of May, 2016

Visualization and Dimensionality Reduction

Intuition behind t-SNE

Visualizing representations

Visualization and Dimensionality Reduction

Visualization is key to understanding data easily.

Data on house areas in m² and prices in thousands of euros.

Area (m²)   Price (k€)
43.69       298.71
28.82       308.
102.22      426.68
36.32       307.53
48.35       315.4
59.04       324.48
90.13       373.8
59.24       325.71
94.89       396.69
27.72       313.53
65.2        323.43
92.38       379.56
77.86       337.77
73.48       349.15
52.19       311.86

Question: Is the relation linear?


Dimensionality Reduction is a helpful tool for visualization.

- Dimensionality reduction algorithms
  - map high-dimensional data to a lower dimension
  - while preserving structure
- They are used for
  - visualization
  - performance
  - the curse of dimensionality
- A ton of algorithms exist
- t-SNE is specialised for visualization
- ... and has gained a lot of popularity


Intuition behind t-SNE

Dimensionality Reduction techniques solve optimization problems.

$\mathcal{X} = \{x_1, x_2, \dots, x_n \in \mathbb{R}^h\} \;\rightarrow\; \mathcal{Y} = \{y_1, y_2, \dots, y_n \in \mathbb{R}^l\}$

$\min_{\mathcal{Y}} C(\mathcal{X}, \mathcal{Y})$

Three approaches for Dimensionality Reduction:

- Distance preservation
- Topology preservation
- Information preservation

t-SNE is distance-based but tends to preserve topology.


SNE computes pair-wise similarities.

SNE converts Euclidean distances into similarities that can be interpreted as probabilities:

$p_{j|i} = \frac{\exp(-\|x_i - x_j\|^2 / 2\sigma_i^2)}{\sum_{k \neq i} \exp(-\|x_i - x_k\|^2 / 2\sigma_i^2)}$

$q_{j|i} = \frac{\exp(-\|y_i - y_j\|^2)}{\sum_{k \neq i} \exp(-\|y_i - y_k\|^2)}$

$p_{i|i} = 0, \quad q_{i|i} = 0$

Hence the name Stochastic Neighbor Embedding...
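To make the formula concrete, here is a minimal NumPy sketch (my own illustration, not the author's code) that computes the conditional similarities $p_{j|i}$ for a single point $i$, assuming a data matrix `X` of shape (n, h) and a given bandwidth `sigma_i`:

```python
import numpy as np

def conditional_probabilities(X, i, sigma_i):
    """Compute p_{j|i} for all j, given data matrix X (n x h) and bandwidth sigma_i."""
    # Squared Euclidean distances from point i to every other point
    diff = X - X[i]                       # (n, h)
    sq_dist = np.sum(diff ** 2, axis=1)   # (n,)

    # Unnormalized Gaussian similarities, excluding the point itself
    affinities = np.exp(-sq_dist / (2.0 * sigma_i ** 2))
    affinities[i] = 0.0                   # p_{i|i} = 0 by definition

    # Normalize so the similarities form a probability distribution
    return affinities / affinities.sum()
```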


Pair-wise similarities should stay the same.

$p_{j|i} \;\Leftrightarrow\; q_{j|i}$

Kullback-Leibler Divergence measures the faithfulness with which $q_{j|i}$ models $p_{j|i}$.

- $P_i = \{p_{1|i}, p_{2|i}, \dots, p_{n|i}\}$ and $Q_i = \{q_{1|i}, q_{2|i}, \dots, q_{n|i}\}$ are the distributions over the neighbors of data point $i$.
- The Kullback-Leibler Divergence (KL) compares two distributions:

$C = \sum_i KL(P_i \| Q_i) = \sum_i \sum_j p_{j|i} \log \frac{p_{j|i}}{q_{j|i}}$

- KL divergence is asymmetric.
- KL divergence is always non-negative.
- We have our minimization problem: $\min_{\mathcal{Y}} C(\mathcal{X}, \mathcal{Y})$
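As a sketch of how this cost could be evaluated (again an illustration, not the reference implementation), assume `P` and `Q` are (n, n) matrices holding $p_{j|i}$ and $q_{j|i}$ with zero diagonals:

```python
import numpy as np

def sne_cost(P, Q, eps=1e-12):
    """Sum of KL divergences KL(P_i || Q_i) over all data points i.

    P, Q: (n, n) matrices with P[i, j] = p_{j|i} and Q[i, j] = q_{j|i};
    the diagonals are zero by convention.
    """
    # Clip to avoid log(0); entries with p_{j|i} = 0 contribute nothing
    P_safe = np.maximum(P, eps)
    Q_safe = np.maximum(Q, eps)
    return np.sum(P * np.log(P_safe / Q_safe))
```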


Some remaining questions.

$p_{j|i} = \frac{\exp(-\|x_i - x_j\|^2 / 2\sigma_i^2)}{\sum_{k \neq i} \exp(-\|x_i - x_k\|^2 / 2\sigma_i^2)}, \quad q_{j|i} = \frac{\exp(-\|y_i - y_j\|^2)}{\sum_{k \neq i} \exp(-\|y_i - y_k\|^2)}$

1. Why a radial basis function (the exponential)? It makes the similarities focus on local geometry. This is why t-SNE can also be interpreted as topology-based.
2. Why probabilities? A small distance does not necessarily mean proximity on the manifold; probabilities are appropriate to model this uncertainty.
3. How do you choose $\sigma_i$? This is addressed next, via entropy and perplexity.

The entropy of $P_i$ increases with $\sigma_i$.

Entropy: $H(P) = -\sum_i p_i \log_2 p_i$

Perplexity, a smooth measure of the number of neighbors.

Perplexity: $Perp(P) = 2^{H(P)}$

Example: an entropy of 1.055 corresponds to a perplexity of 2.078; an entropy of 3.800 corresponds to a perplexity of 13.929.
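In practice the user fixes a target perplexity and each $\sigma_i$ is found numerically so that $Perp(P_i)$ matches it. A minimal sketch of that idea using plain bisection (the helper `conditional_probabilities` is the hypothetical one from the earlier sketch; implementations often search over the precision $1/2\sigma^2$ instead):

```python
import numpy as np

def perplexity(p, eps=1e-12):
    """Perplexity 2^H(P) of a discrete distribution p."""
    p = np.maximum(p, eps)
    entropy = -np.sum(p * np.log2(p))
    return 2.0 ** entropy

def find_sigma(X, i, target_perplexity, tol=1e-4, max_iter=50):
    """Bisection search for sigma_i so that Perp(P_i) matches the target."""
    lo, hi = 1e-10, 1e10
    for _ in range(max_iter):
        sigma = 0.5 * (lo + hi)
        p = conditional_probabilities(X, i, sigma)  # hypothetical helper defined earlier
        perp = perplexity(p)
        if abs(perp - target_perplexity) < tol:
            break
        if perp > target_perplexity:
            hi = sigma   # too many effective neighbors: shrink sigma
        else:
            lo = sigma   # too few effective neighbors: grow sigma
    return sigma
```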


From SNE to t-SNE.

SNE

Model:
$p_{j|i} = \frac{\exp(-\|x_i - x_j\|^2 / 2\sigma_i^2)}{\sum_{k \neq i} \exp(-\|x_i - x_k\|^2 / 2\sigma_i^2)}$
$q_{j|i} = \frac{\exp(-\|y_i - y_j\|^2)}{\sum_{k \neq i} \exp(-\|y_i - y_k\|^2)}$
Cost function: $C = \sum_i KL(P_i \| Q_i)$
Gradient: $\frac{dC}{dy_i} = 2 \sum_j (p_{j|i} - q_{j|i} + p_{i|j} - q_{i|j})(y_i - y_j)$

⇒ Symmetric SNE (faster computation)

Model:
$p_{ij} = \frac{p_{j|i} + p_{i|j}}{2n}$
$q_{ij} = \frac{\exp(-\|y_i - y_j\|^2)}{\sum_{k \neq l} \exp(-\|y_k - y_l\|^2)}$
Cost function: $C = KL(P \| Q)$
Gradient: $\frac{dC}{dy_i} = 4 \sum_j (p_{ij} - q_{ij})(y_i - y_j)$

⇒ t-SNE (even faster computation, better behaviour)

Model:
$p_{ij} = \frac{p_{j|i} + p_{i|j}}{2n}$
$q_{ij} = \frac{(1 + \|y_i - y_j\|^2)^{-1}}{\sum_{k \neq l} (1 + \|y_k - y_l\|^2)^{-1}}$
Cost function: $C = KL(P \| Q)$
Gradient: $\frac{dC}{dy_i} = 4 \sum_j (p_{ij} - q_{ij})(y_i - y_j)(1 + \|y_i - y_j\|^2)^{-1}$
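The t-SNE column translates almost directly into NumPy. Below is a minimal sketch (my own illustration) of the low-dimensional similarities $q_{ij}$ and the gradient $dC/dy_i$, assuming a precomputed joint-probability matrix `P` with zero diagonal:

```python
import numpy as np

def tsne_q_matrix(Y):
    """Student-t joint similarities q_ij for an embedding Y of shape (n, l)."""
    sq_dist = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)  # (n, n)
    inv = 1.0 / (1.0 + sq_dist)        # (1 + ||y_i - y_j||^2)^-1
    np.fill_diagonal(inv, 0.0)         # q_ii = 0 by convention
    return inv / inv.sum(), inv

def tsne_gradient(P, Y):
    """Gradient dC/dY of the t-SNE cost with respect to the embedding Y."""
    Q, inv = tsne_q_matrix(Y)
    # (p_ij - q_ij)(1 + ||y_i - y_j||^2)^-1, shape (n, n)
    coeff = (P - Q) * inv
    # dC/dy_i = 4 * sum_j coeff_ij * (y_i - y_j), written in matrix form
    return 4.0 * (np.diag(coeff.sum(axis=1)) - coeff) @ Y
```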


The "Crowding problem"

There is much more space in high dimensions.

Mismatched tails can compensate for mismatched dimensionalities.

The Student-t distribution has heavier tails than the Gaussian.
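A quick numerical illustration of this claim (my own example, not from the slides): compare the unnormalized Gaussian kernel used by SNE with the Student-t kernel used by t-SNE at a few distances and note how much more slowly the latter decays:

```python
import numpy as np

distances = np.array([1.0, 2.0, 5.0, 10.0])
gaussian = np.exp(-distances ** 2)          # SNE kernel (unnormalized)
student_t = 1.0 / (1.0 + distances ** 2)    # t-SNE kernel (unnormalized)

for d, g, t in zip(distances, gaussian, student_t):
    print(f"d={d:5.1f}  gaussian={g:.2e}  student-t={t:.2e}")
# At d=5 the Gaussian weight is ~1e-11 while the Student-t weight is ~4e-02,
# so moderately distant points still interact noticeably under the t kernel.
```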


Last but not least: Optimization

$\min_{\mathcal{Y}} C(\mathcal{X}, \mathcal{Y})$

$C = KL(P \| Q) = \sum_i \sum_j p_{ij} \log \frac{p_{ij}}{q_{ij}}$

- Non-convex
- Gradient descent + momentum + adaptive learning rate

$\mathcal{Y}^{(t)} = \mathcal{Y}^{(t-1)} + \eta^{(t)} \frac{\delta C}{\delta \mathcal{Y}} + \alpha(t)\left(\mathcal{Y}^{(t-1)} - \mathcal{Y}^{(t-2)}\right)$

- Two tricks:
  - Early Compression
  - Early Exaggeration
- Illustration: Colah's blog
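Putting the pieces together, a bare-bones version of this optimization loop might look like the sketch below. It reuses the hypothetical `tsne_gradient` helper from the earlier sketch, and the learning rate, momentum schedule, and exaggeration settings are illustrative values rather than the ones prescribed by the paper.

```python
import numpy as np

def tsne_optimize(P, n_dims=2, n_iter=1000, learning_rate=100.0,
                  exaggeration=4.0, exaggeration_iters=100):
    """Gradient descent with momentum and early exaggeration (illustrative values)."""
    n = P.shape[0]
    rng = np.random.default_rng(0)
    Y = 1e-4 * rng.standard_normal((n, n_dims))   # small random initial embedding
    Y_prev = Y.copy()

    for t in range(n_iter):
        # Early exaggeration: temporarily inflate P so clusters form early on
        P_eff = P * exaggeration if t < exaggeration_iters else P
        grad = tsne_gradient(P_eff, Y)            # hypothetical helper from the earlier sketch

        momentum = 0.5 if t < 250 else 0.8        # example momentum schedule
        Y_next = Y - learning_rate * grad + momentum * (Y - Y_prev)
        Y_prev, Y = Y, Y_next
    return Y
```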


Visualizing representations

Mapping raw data to distributed representations.

- Feature engineering is often laborious.
- The new tendency is to automatically learn adequate features or representations.
- Ultimate goal: enable AI to extract useful features from raw sensory data.

t-SNE can be used to make sense of the learned representations!
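Once learned representations are available as a matrix of vectors, projecting them to 2-D takes only a few lines with scikit-learn. A minimal sketch, using the scikit-learn digits dataset as a stand-in for learned features (in practice `X` would be the activations of a trained model):

```python
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

# Stand-in for learned representations: 64-dimensional digit images
X, labels = load_digits(return_X_y=True)

# Project to 2-D with t-SNE; labels are used only to color the scatter plot
embedding = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)

plt.scatter(embedding[:, 0], embedding[:, 1], c=labels, s=5, cmap="tab10")
plt.title("t-SNE projection of (stand-in) learned representations")
plt.show()
```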


Using t-SNE to explore a Word embedding.

- The system outputs 1 if the central word is in the right context, 0 otherwise.
- The algorithm learns the representation and the classification simultaneously.

From Machine Learning to Machine Reasoning, L. Bottou (2011)

Goal: the representation captures syntactic and semantic similarity.

Word embedding visualization: http://colah.github.io/

Exploring a Wikipedia article embedding.

http://colah.github.io/

Exploring game state representations: Google DeepMind plays Atari games.

Playing Atari with Deep Reinforcement Learning, V. Mnih et al.

Goal: learning to play Space Invaders from score feedback and raw pixel values.

Exploring game state representations: Google DeepMind plays Atari games.

- A representation is learned with a convolutional neural network.
- From 84x84x4 = 28,224 pixel values to 512 neurons.
- It predicts the expected score if a certain action is taken.

Human-level control through deep reinforcement learning, V. Mnih et al. (Nature, 2015)


Using t-SNE to explore image representations: classifying dogs and cats.

https://indico.io/blog/visualizing-with-t-sne/

- Each data point is an image of a dog or a cat.
- Red = cats, blue = dogs.

Using t-SNE to explore image representations: classifying dogs and cats.

Representation: convolutional net trained for Image Classification (1000 classes).

https://indico.io/blog/visualizing-with-t-sne/


Conclusion

- The t-SNE algorithm reduces dimensionality while preserving local similarity.
- The t-SNE algorithm has been built heuristically.
- t-SNE is commonly used to visualize representations.