Day 9: Unsupervised learning, dimensionality reduction

Introduction to Machine Learning Summer School, June 18, 2018 - June 29, 2018, Chicago. Instructor: Suriya Gunasekar, TTI Chicago. 28 June 2018. Day 9: Unsupervised learning, dimensionality reduction

Transcript of Day 9: Unsupervised learning, dimensionality reduction

  • Introduction to Machine Learning Summer School, June 18, 2018 - June 29, 2018, Chicago

    Instructor: Suriya Gunasekar, TTI Chicago

    28 June 2018

    Day 9: Unsupervised learning, dimensionality reduction

  • Topics so far

    • Linear regression
    • Classification
      o Logistic regression
      o Maximum margin classifiers, kernel trick
      o Generative models
      o Neural networks
      o Ensemble methods
    • Today and tomorrow
      o Unsupervised learning – dimensionality reduction, clustering
      o Review

  • Unsupervised learning

    • Unsupervised learning: requires data x ∈ 𝒳, but no labels
    • Goal?: a compact representation of the data by detecting patterns
      o e.g., group emails by topic
    • Useful when we don't know what we are looking for
      o makes evaluation tricky
    • Applications in visualization, exploratory data analysis, semi-supervised learning

  • Clustering


  • Clustering languages

  • Clustering species (phylogeny)

  • Image clustering / segmentation

    Current trend is to use datasets with labels for such tasks, e.g., MS COCO

  • Dimensionality reduction

    • Input data x ∈ 𝒳 may have thousands or millions of dimensions!
      o e.g., text data represented as bag of words
      o e.g., video stream of images
      o e.g., fMRI data: #voxels × #time steps
    • Dimensionality reduction: represent data with fewer dimensions
      o easier learning in subsequent tasks (preprocessing)
      o visualization
      o discover intrinsic patterns in the data

  • Manifolds


  • Embeddings


  • Low dimensional embedding

    • Given a high dimensional feature x = (x_1, x_2, …, x_d),
      find transformations z(x) = (z_1(x), z_2(x), …, z_k(x))
      so that "almost all useful information" about x is retained in z(x)
    • In general k ≪ d, and z(x) is not invertible
    • Transformation learned from a dataset of examples of x
      S = {x^(i) ∈ ℝ^d : i = 1, 2, …, N}
      o Note: typically no labels y

  • Linear dimensionality reduction

    • Given a high dimensional feature
      x = (x_1, x_2, …, x_d)
      find transformations
      z = z(x) = (z_1(x), z_2(x), …, z_k(x))
    • Restrict z(x) to be a linear function of x:
      z_1 = w_1 · x
      z_2 = w_2 · x
      ⋮
      z_k = w_k · x
    • Stacking w_1, w_2, …, w_k as the rows of a matrix W, this is
      z = W x
      where z ∈ ℝ^k, W ∈ ℝ^{k×d}, x ∈ ℝ^d
    • Only question is: which W?
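
    As a quick shape check, a minimal numpy sketch of the linear map (W here is just a random placeholder, not a learned projection):

    ```python
    import numpy as np

    d, k = 100, 5                      # original and reduced dimensions
    rng = np.random.default_rng(0)
    W = rng.normal(size=(k, d))        # some W in R^{k x d} (placeholder, not learned)
    x = rng.normal(size=d)             # one data point x in R^d
    z = W @ x                          # linear embedding z in R^k
    print(z.shape)                     # (5,)
    ```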

  • Linear dimensionality reduction: 2D example

    • Given points S = {x^(i) : i = 1, 2, …, N} in 2D, we want a 1D representation
      o project x^(i) onto a line w · x = 0
      o find w that minimizes the sum of squared distances to the line

  • Vector projections

    • x · u = ‖x‖ ‖u‖ cos θ
    • Assuming ‖u‖ = 1,
    • x · u = ‖x‖ cos θ = z_u → value of x along u
    • distance of x to its projection is ‖z_u u − x‖ = ‖(x · u) u − x‖

    [Figure: projection of x onto a unit vector u at angle θ, with ‖x‖ cos θ = z_u]
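
    A tiny numpy check of these identities (x and u here are arbitrary, with u normalized to unit length):

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=3)
    u = rng.normal(size=3)
    u = u / np.linalg.norm(u)            # make u a unit vector

    z_u = x @ u                          # value of x along u  (= ||x|| cos(theta))
    proj = z_u * u                       # projection of x onto the line spanned by u
    dist = np.linalg.norm(proj - x)      # distance of x to its projection
    print(z_u, dist)
    ```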

  • Principal component analysis

    • For a 1D embedding along direction u, the distance of x to its projection along u is given by
      ‖z_u u − x‖ = ‖(x · u) u − x‖
    • More generally, for a k dimensional embedding:
      o find an orthonormal basis of the k dimensional subspace, u_1, u_2, …, u_k ∈ ℝ^d, i.e., u_i · u_j = 1 if i = j, and 0 otherwise
      o let U ∈ ℝ^{k×d} be the matrix with u_1, u_2, …, u_k along the rows
      o distance of x to its projection onto span{u_1, u_2, …, u_k} is
        ‖UᵀU x − x‖
      o also, from orthonormality of u_1, u_2, …, u_k, check that U Uᵀ = I
    • PCA objective
      min_{U ∈ ℝ^{k×d}}  Σ_{i=1}^{N} ‖UᵀU x^(i) − x^(i)‖²    s.t.  U Uᵀ = I

  • PCA

    • PCA objective
      min_{U ∈ ℝ^{k×d}}  (1/N) Σ_{i=1}^{N} ‖UᵀU x^(i) − x^(i)‖²    s.t.  U Uᵀ = I
    • Also, for all U with U Uᵀ = I,
      ‖UᵀU x − x‖² = ‖x‖² + xᵀUᵀU UᵀU x − 2 xᵀUᵀU x = ‖x‖² − xᵀUᵀU x = ‖x‖² − ‖U x‖²
    • Equivalent PCA objective
      max_U  (1/N) Σ_{i=1}^{N} ‖U x^(i)‖²  =  Σ_{j ∈ [k]} u_jᵀ Σ_xx u_j    s.t.  U Uᵀ = I
      where Σ_xx = (1/N) Σ_{i=1}^{N} x^(i) x^(i)ᵀ  (derivation on board)
    • This is the same as finding the top k eigenvectors of Σ_xx
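
    Since the derivation is left to the board, here is a brief sketch of the algebra behind the equivalent objective (standard manipulation, not taken from the slides):

    ```latex
    \frac{1}{N}\sum_{i=1}^{N} \|U x^{(i)}\|^2
      = \frac{1}{N}\sum_{i=1}^{N} \sum_{j=1}^{k} \big(u_j^\top x^{(i)}\big)^2
      = \sum_{j=1}^{k} u_j^\top \Big(\frac{1}{N}\sum_{i=1}^{N} x^{(i)} x^{(i)\top}\Big) u_j
      = \sum_{j=1}^{k} u_j^\top \Sigma_{xx}\, u_j
    ```

    Maximizing the last expression over orthonormal u_1, …, u_k is achieved by the top k eigenvectors of Σ_xx, e.g., by applying the Rayleigh quotient argument one direction at a time.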


  • PCA algorithm

    • Given S = {x^(i) ∈ ℝ^d : i = 1, 2, …, N}
    • Let X ∈ ℝ^{N×d} be the data matrix
      o make sure X is re-centered so that each column mean is 0
    • Σ_xx = (1/N) Σ_{i=1}^{N} x^(i) x^(i)ᵀ = (1/N) XᵀX ∈ ℝ^{d×d}
    • u_1, u_2, …, u_k ∈ ℝ^d are the top k eigenvectors of Σ_xx
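
    A minimal numpy sketch of this procedure (the function name pca_eig and the toy data are illustrative, not from the slides):

    ```python
    import numpy as np

    def pca_eig(X, k):
        """PCA via eigendecomposition of the sample covariance.

        X : (N, d) data matrix, one example per row.
        k : number of principal components to keep.
        Returns (U, Z): U has the top-k eigenvectors as rows (k, d),
        Z = X_centered @ U.T are the k-dimensional embeddings (N, k).
        """
        X_centered = X - X.mean(axis=0)           # column means -> 0
        cov = X_centered.T @ X_centered / len(X)  # Sigma_xx = (1/N) X^T X, shape (d, d)
        eigvals, eigvecs = np.linalg.eigh(cov)    # ascending eigenvalues, eigenvectors in columns
        top = np.argsort(eigvals)[::-1][:k]       # indices of the k largest eigenvalues
        U = eigvecs[:, top].T                     # (k, d), rows u_1, ..., u_k
        return U, X_centered @ U.T

    # toy usage: 200 points in 5D that mostly vary along 2 directions
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 5)) + 0.01 * rng.normal(size=(200, 5))
    U, Z = pca_eig(X, k=2)
    print(U.shape, Z.shape)   # (2, 5) (200, 2)
    ```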


  • How to pick k?

    • Data assumed to be a low dimensional projection + noise
    • Only keep projections onto components with large eigenvalues and ignore the rest

    Slide credit: Arti Singh
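
    One common way to make this concrete (a heuristic, not spelled out on the slide) is to keep the smallest k whose eigenvalues account for most of the variance; a minimal numpy sketch:

    ```python
    import numpy as np

    def explained_variance(X):
        """Eigenvalues of the sample covariance (descending) and their cumulative fraction."""
        X_centered = X - X.mean(axis=0)
        eigvals = np.linalg.eigvalsh(X_centered.T @ X_centered / len(X))[::-1]
        return eigvals, np.cumsum(eigvals) / eigvals.sum()

    # keep the smallest k explaining, say, 95% of the total variance
    eigvals, cum_frac = explained_variance(X)      # X: (N, d) data matrix as in the earlier sketch
    k = int(np.searchsorted(cum_frac, 0.95)) + 1
    ```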

  • Eigenfaces

    • Turk and Pentland '91

  • SVD version

    • Given S = {x^(i) ∈ ℝ^d : i = 1, 2, …, N}
    • Let X ∈ ℝ^{N×d} be the data matrix
      o make sure X is re-centered so that each column mean is 0
    • Let X = Ṽ S̃ Ũᵀ be the Singular Value Decomposition (SVD) of X, where
      o Ṽ ∈ ℝ^{N×d} has orthonormal columns, i.e., ṼᵀṼ = I
        § columns of Ṽ are called left singular vectors
      o Ũ ∈ ℝ^{d×d} also has orthonormal columns, i.e., ŨᵀŨ = I
        § columns of Ũ are called right singular vectors
      o S̃ = diagonal(σ_1, σ_2, …, σ_d) ∈ ℝ^{d×d}
        § σ_1, σ_2, …, σ_d are called the singular values
    • The first k columns of Ũ are the u_1, u_2, …, u_k we want.
    • Representation of x ∈ ℝ^d as z(x) ∈ ℝ^k is given by z(x)_j = σ_j u_j · x for j = 1, 2, …, k
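
    The same directions can be computed from the SVD of the centered data matrix; a minimal numpy sketch (names are illustrative):

    ```python
    import numpy as np

    def pca_svd(X, k):
        """PCA directions via SVD of the centered data matrix X of shape (N, d)."""
        X_centered = X - X.mean(axis=0)
        # for N >= d, full_matrices=False gives V_tilde (N, d), singular values (d,), U_tilde^T (d, d)
        V_t, s, U_t = np.linalg.svd(X_centered, full_matrices=False)
        U = U_t[:k]               # rows are u_1, ..., u_k (top right singular vectors)
        Z = X_centered @ U.T      # plain projections; the slide additionally scales entry j by sigma_j
        return U, s[:k], Z

    U_svd, sigmas, Z = pca_svd(X, k=2)   # X as in the earlier PCA sketch
    ```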


  • Other linear dimensionality reduction

    • PCA: given data x ∈ ℝ^d, find U ∈ ℝ^{k×d} to minimize
      min ‖UᵀU x − x‖²_2    s.t.  U Uᵀ = I
    • Canonical correlation analysis: given two "views" of the data, x ∈ ℝ^d and x′ ∈ ℝ^{d′}, find U ∈ ℝ^{k×d} and U′ ∈ ℝ^{k×d′} to minimize
      ‖U x − U′ x′‖²_2    s.t.  U Uᵀ = U′ U′ᵀ = I
    • Sparse dictionary learning: learn a sparse representation of x as a linear combination of an over-complete dictionary
      x → D z, where D ∈ ℝ^{d×m}, z ∈ ℝ^m
      o unlike PCA, here m ≫ d so z is higher dimensional, but learned to be sparse!
    • Independent component analysis
    • Factor analysis
    • Linear discriminant analysis

  • Nonlinear dimensionality reduction

    • Isomap
    • Autoencoders
    • Kernel PCA
    • Locally linear embedding
    • Check out t-SNE for 2D visualization
    • …

  • Isomap


  • Isomap – algorithm

    • Dataset of N points S = {x^(i) ∈ ℝ^d : i = 1, 2, …, N}
    • Represent the points as a kNN graph with edge weights proportional to the distances between the points
    • The geodesic distance d(x, x′) between points on the manifold is the length of the shortest path in the graph
    • Any shortest path algorithm can be used to construct a matrix M ∈ ℝ^{N×N} of d(x^(i), x^(j)) for all x^(i), x^(j) ∈ S
    • MDS: find a (low dimensional) embedding z(x) of x so that distances are preserved (see the sketch below):
      min Σ_{i,j ∈ [N]} ( ‖z(x^(i)) − z(x^(j))‖ − M_ij )²
      o sometimes a variant of this objective normalized by M_ij is minimized instead
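
    A rough numpy sketch of this pipeline, assuming a small N (Floyd-Warshall shortest paths) and substituting classical MDS for the stress minimization written on the slide; all names are illustrative:

    ```python
    import numpy as np

    def isomap(X, k_neighbors=10, k_dims=2):
        """Rough Isomap sketch: kNN graph -> shortest paths -> classical MDS.

        X : (N, d) data; returns an (N, k_dims) embedding.
        Assumes the kNN graph is connected; Floyd-Warshall only suits small N.
        """
        N = len(X)
        # pairwise Euclidean distances
        D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
        # kNN graph: keep distances to the k nearest neighbours, infinity elsewhere
        G = np.full((N, N), np.inf)
        nn = np.argsort(D, axis=1)[:, 1:k_neighbors + 1]
        rows = np.repeat(np.arange(N), k_neighbors)
        G[rows, nn.ravel()] = D[rows, nn.ravel()]
        G = np.minimum(G, G.T)                      # symmetrize
        np.fill_diagonal(G, 0.0)
        # geodesic distances M via Floyd-Warshall
        M = G.copy()
        for m in range(N):
            M = np.minimum(M, M[:, m:m + 1] + M[m:m + 1, :])
        # classical MDS on the squared geodesic distance matrix
        J = np.eye(N) - np.ones((N, N)) / N         # centering matrix
        B = -0.5 * J @ (M ** 2) @ J
        eigvals, eigvecs = np.linalg.eigh(B)
        top = np.argsort(eigvals)[::-1][:k_dims]
        return eigvecs[:, top] * np.sqrt(np.maximum(eigvals[top], 0.0))

    # usage: embed a small Swiss-roll-like cloud into 2D
    rng = np.random.default_rng(0)
    t = 3 * np.pi * rng.random(300)
    X = np.stack([t * np.cos(t), 10 * rng.random(300), t * np.sin(t)], axis=1)
    Z = isomap(X, k_neighbors=10, k_dims=2)
    ```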


  • Autoencoders

    • Recall neural networks as feature learning
      o φ(x) was learned for some supervised learning task
      o weights learned by minimizing ℓ(v, y), where v is the network's output
      o but we don't have y anymore!

    [Figure: feed-forward network with inputs x_1, …, x_d, learned features φ(x)_1, …, φ(x)_k, and output v]

  • Autoencoders

    • Recall neural networks as feature learning
      o φ(x) was learned for some supervised learning task
      o weights learned by minimizing ℓ(v, y), where v is the network's output
      o but we don't have y anymore!
      o instead, use another "decoder" network to reconstruct x


  • Autoencoders

    • φ(x) = f_{W_1}(x)
    • x̂ = f_{W_2}(φ(x))
    • some loss ℓ(x̂, x)
      Ŵ_1, Ŵ_2 = argmin_{W_1, W_2} Σ_{i=1}^{N} ℓ( f_{W_2}(f_{W_1}(x^(i))), x^(i) )
    • learn using SGD with backpropagation

    [Figure: autoencoder network mapping inputs x_1, …, x_d through the bottleneck features φ(x)_1, …, φ(x)_k to reconstructions x̂_1, …, x̂_d]
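
    A minimal numpy sketch of such an autoencoder, assuming a single tanh hidden layer, squared-error loss, and plain SGD; the layer sizes, learning rate, and toy data are illustrative:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    N, d, k = 500, 20, 3                        # examples, input dim, bottleneck dim
    X = rng.normal(size=(N, 2)) @ rng.normal(size=(2, d))   # toy, roughly low-rank data

    W1 = 0.1 * rng.normal(size=(k, d))          # encoder weights
    W2 = 0.1 * rng.normal(size=(d, k))          # decoder weights
    lr = 0.01

    for epoch in range(100):
        for i in rng.permutation(N):            # SGD: one example at a time
            x = X[i]
            h = np.tanh(W1 @ x)                 # phi(x) = f_{W1}(x)
            x_hat = W2 @ h                      # reconstruction f_{W2}(phi(x))
            err = x_hat - x                     # gradient of 0.5 * ||x_hat - x||^2 w.r.t. x_hat
            # backpropagation through the two layers
            grad_W2 = np.outer(err, h)
            grad_h = W2.T @ err
            grad_W1 = np.outer(grad_h * (1 - h ** 2), x)   # tanh'(a) = 1 - tanh(a)^2
            W2 -= lr * grad_W2
            W1 -= lr * grad_W1

    Z = np.tanh(X @ W1.T)                       # learned k-dimensional representation
    ```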