Day 9: Unsupervised learning, dimensionality reduction

Introduction to Machine Learning Summer School, June 18, 2018 - June 29, 2018, Chicago. Instructor: Suriya Gunasekar, TTI Chicago. 28 June 2018. Day 9: Unsupervised learning, dimensionality reduction

Transcript of Day 9: Unsupervised learning, dimensionality reduction

  • Introduction to Machine Learning Summer School, June 18, 2018 - June 29, 2018, Chicago

    Instructor: Suriya Gunasekar, TTI Chicago

    28 June 2018

    Day 9: Unsupervised learning, dimensionality reduction

  • Topics so far

    • Linear regression
    • Classification
      o Logistic regression
      o Maximum margin classifiers, kernel trick
      o Generative models
      o Neural networks
      o Ensemble methods
    • Today and tomorrow
      o Unsupervised learning – dimensionality reduction, clustering
      o Review

  • Unsupervised learning

    • Unsupervised learning: requires data x ∈ 𝒳, but no labels
    • Goal?: a compact representation of the data by detecting patterns
      o e.g., group emails by topic
    • Useful when we don't know what we are looking for
      o makes evaluation tricky
    • Applications in visualization, exploratory data analysis, semi-supervised learning

  • Clustering


  • Clustering languages

  • Clustering species (phylogeny)

  • Image clustering / segmentation

    Current trend is to use datasets with labels for such tasks, e.g., MS COCO

  • Dimensionality reduction

    • Input data x ∈ 𝒳 may have thousands or millions of dimensions!
      o e.g., text data represented as bag of words
      o e.g., video stream of images
      o e.g., fMRI data: #voxels × #time steps
    • Dimensionality reduction: represent data with fewer dimensions
      o easier learning in subsequent tasks (preprocessing)
      o visualization
      o discover intrinsic patterns in the data

  • Manifolds


  • Embeddings


  • Low dimensional embedding

    • Given a high dimensional feature x = (x_1, x_2, …, x_d),
      find transformations z(x) = (z_1(x), z_2(x), …, z_k(x))
      so that "almost all useful information" about x is retained in z(x)
    • In general k ≪ d, and z(x) is not invertible
    • Transformation learned from a dataset of examples of x
      S = {x^(i) ∈ ℝ^d : i = 1, 2, …, N}
      o Note: typically no labels y

  • Linear dimensionality reduction

    • Given a high dimensional feature
      x = (x_1, x_2, …, x_d)
      find transformations
      z = z(x) = (z_1(x), z_2(x), …, z_k(x))
    • Restrict z(x) to be a linear function of x:
      z_1 = w_1 · x
      z_2 = w_2 · x
      ⋮
      z_k = w_k · x
    • Stacking w_1, w_2, …, w_k as the rows of a matrix W, this is
      z = W x
      where z ∈ ℝ^k, W ∈ ℝ^{k×d}, x ∈ ℝ^d
    • Only question is: which W?
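
    As a quick shape check, a minimal numpy sketch of the linear map (W here is just a random placeholder, not a learned projection):

    ```python
    import numpy as np

    d, k = 100, 5                      # original and reduced dimensions
    rng = np.random.default_rng(0)
    W = rng.normal(size=(k, d))        # some W in R^{k x d} (placeholder, not learned)
    x = rng.normal(size=d)             # one data point x in R^d
    z = W @ x                          # linear embedding z in R^k
    print(z.shape)                     # (5,)
    ```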

  • Linear dimensionality reduction: 2D example

    • Given points S = {x^(i) : i = 1, 2, …, N} in 2D, we want a 1D representation
      o project x^(i) onto a line w · x = 0
      o find w that minimizes the sum of squared distances to the line

  • Vector projections

    • x · u = ‖x‖ ‖u‖ cos θ
    • Assuming ‖u‖ = 1,
    • x · u = ‖x‖ cos θ = z_u → value of x along u
    • distance of x to its projection is ‖z_u u − x‖ = ‖(x · u) u − x‖

    [Figure: projection of x onto a unit vector u at angle θ, with ‖x‖ cos θ = z_u]
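
    A tiny numpy check of these identities (x and u here are arbitrary, with u normalized to unit length):

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=3)
    u = rng.normal(size=3)
    u = u / np.linalg.norm(u)            # make u a unit vector

    z_u = x @ u                          # value of x along u  (= ||x|| cos(theta))
    proj = z_u * u                       # projection of x onto the line spanned by u
    dist = np.linalg.norm(proj - x)      # distance of x to its projection
    print(z_u, dist)
    ```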

  • Principal component analysis

    • For a 1D embedding along direction u, the distance of x to its projection along u is given by
      ‖z_u u − x‖ = ‖(x · u) u − x‖
    • More generally, for a k dimensional embedding:
      o find an orthonormal basis of the k dimensional subspace, u_1, u_2, …, u_k ∈ ℝ^d, i.e., u_i · u_j = 1 if i = j, and 0 otherwise
      o let U ∈ ℝ^{k×d} be the matrix with u_1, u_2, …, u_k along the rows
      o distance of x to its projection onto span{u_1, u_2, …, u_k} is
        ‖UᵀU x − x‖
      o also, from orthonormality of u_1, u_2, …, u_k, check that U Uᵀ = I
    • PCA objective
      min_{U ∈ ℝ^{k×d}}  Σ_{i=1}^{N} ‖UᵀU x^(i) − x^(i)‖²    s.t.  U Uᵀ = I

  • PCA

    • PCA objective
      min_{U ∈ ℝ^{k×d}}  (1/N) Σ_{i=1}^{N} ‖UᵀU x^(i) − x^(i)‖²    s.t.  U Uᵀ = I
    • Also, for all U with U Uᵀ = I,
      ‖UᵀU x − x‖² = ‖x‖² + xᵀUᵀU UᵀU x − 2 xᵀUᵀU x = ‖x‖² − xᵀUᵀU x = ‖x‖² − ‖U x‖²
    • Equivalent PCA objective
      max_U  (1/N) Σ_{i=1}^{N} ‖U x^(i)‖²  =  Σ_{j ∈ [k]} u_jᵀ Σ_xx u_j    s.t.  U Uᵀ = I
      where Σ_xx = (1/N) Σ_{i=1}^{N} x^(i) x^(i)ᵀ  (derivation on board)
    • This is the same as finding the top k eigenvectors of Σ_xx
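
    Since the derivation is left to the board, here is a brief sketch of the algebra behind the equivalent objective (standard manipulation, not taken from the slides):

    ```latex
    \frac{1}{N}\sum_{i=1}^{N} \|U x^{(i)}\|^2
      = \frac{1}{N}\sum_{i=1}^{N} \sum_{j=1}^{k} \big(u_j^\top x^{(i)}\big)^2
      = \sum_{j=1}^{k} u_j^\top \Big(\frac{1}{N}\sum_{i=1}^{N} x^{(i)} x^{(i)\top}\Big) u_j
      = \sum_{j=1}^{k} u_j^\top \Sigma_{xx}\, u_j
    ```

    Maximizing the last expression over orthonormal u_1, …, u_k is achieved by the top k eigenvectors of Σ_xx, e.g., by applying the Rayleigh quotient argument one direction at a time.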


  • PCA algorithm

    • Given S = {x^(i) ∈ ℝ^d : i = 1, 2, …, N}
    • Let X ∈ ℝ^{N×d} be the data matrix
      o make sure X is re-centered so that each column mean is 0
    • Σ_xx = (1/N) Σ_{i=1}^{N} x^(i) x^(i)ᵀ = (1/N) XᵀX ∈ ℝ^{d×d}
    • u_1, u_2, …, u_k ∈ ℝ^d are the top k eigenvectors of Σ_xx
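
    A minimal numpy sketch of this procedure (the function name pca_eig and the toy data are illustrative, not from the slides):

    ```python
    import numpy as np

    def pca_eig(X, k):
        """PCA via eigendecomposition of the sample covariance.

        X : (N, d) data matrix, one example per row.
        k : number of principal components to keep.
        Returns (U, Z): U has the top-k eigenvectors as rows (k, d),
        Z = X_centered @ U.T are the k-dimensional embeddings (N, k).
        """
        X_centered = X - X.mean(axis=0)           # column means -> 0
        cov = X_centered.T @ X_centered / len(X)  # Sigma_xx = (1/N) X^T X, shape (d, d)
        eigvals, eigvecs = np.linalg.eigh(cov)    # ascending eigenvalues, eigenvectors in columns
        top = np.argsort(eigvals)[::-1][:k]       # indices of the k largest eigenvalues
        U = eigvecs[:, top].T                     # (k, d), rows u_1, ..., u_k
        return U, X_centered @ U.T

    # toy usage: 200 points in 5D that mostly vary along 2 directions
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2)) @ rng.normal(size=(2, 5)) + 0.01 * rng.normal(size=(200, 5))
    U, Z = pca_eig(X, k=2)
    print(U.shape, Z.shape)   # (2, 5) (200, 2)
    ```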


  • How to pick k?

    • Data assumed to be a low dimensional projection + noise
    • Only keep projections onto components with large eigenvalues and ignore the rest

    Slide credit: Arti Singh
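
    One common way to make this concrete (a heuristic, not spelled out on the slide) is to keep the smallest k whose eigenvalues account for most of the variance; a minimal numpy sketch:

    ```python
    import numpy as np

    def explained_variance(X):
        """Eigenvalues of the sample covariance (descending) and their cumulative fraction."""
        X_centered = X - X.mean(axis=0)
        eigvals = np.linalg.eigvalsh(X_centered.T @ X_centered / len(X))[::-1]
        return eigvals, np.cumsum(eigvals) / eigvals.sum()

    # keep the smallest k explaining, say, 95% of the total variance
    eigvals, cum_frac = explained_variance(X)      # X: (N, d) data matrix as in the earlier sketch
    k = int(np.searchsorted(cum_frac, 0.95)) + 1
    ```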

  • Eigenfaces

    • Turk and Pentland '91

  • SVD version

    • Given S = {x^(i) ∈ ℝ^d : i = 1, 2, …, N}
    • Let X ∈ ℝ^{N×d} be the data matrix
      o make sure X is re-centered so that each column mean is 0
    • Let X = Ṽ S̃ Ũᵀ be the Singular Value Decomposition (SVD) of X, where
      o Ṽ ∈ ℝ^{N×d} has orthonormal columns, i.e., ṼᵀṼ = I
        § columns of Ṽ are called left singular vectors
      o Ũ ∈ ℝ^{d×d} also has orthonormal columns, i.e., ŨᵀŨ = I
        § columns of Ũ are called right singular vectors
      o S̃ = diagonal(σ_1, σ_2, …, σ_d) ∈ ℝ^{d×d}
        § σ_1, σ_2, …, σ_d are called the singular values
    • The first k columns of Ũ are the u_1, u_2, …, u_k we want.
    • Representation of x ∈ ℝ^d as z(x) ∈ ℝ^k is given by z(x)_j = σ_j u_j · x for j = 1, 2, …, k
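
    The same directions can be computed from the SVD of the centered data matrix; a minimal numpy sketch (names are illustrative):

    ```python
    import numpy as np

    def pca_svd(X, k):
        """PCA directions via SVD of the centered data matrix X of shape (N, d)."""
        X_centered = X - X.mean(axis=0)
        # for N >= d, full_matrices=False gives V_tilde (N, d), singular values (d,), U_tilde^T (d, d)
        V_t, s, U_t = np.linalg.svd(X_centered, full_matrices=False)
        U = U_t[:k]               # rows are u_1, ..., u_k (top right singular vectors)
        Z = X_centered @ U.T      # plain projections; the slide additionally scales entry j by sigma_j
        return U, s[:k], Z

    U_svd, sigmas, Z = pca_svd(X, k=2)   # X as in the earlier PCA sketch
    ```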


  • Other linear dimensionality reduction

    • PCA: given data x ∈ ℝ^d, find U ∈ ℝ^{k×d} to minimize
      min ‖UᵀU x − x‖²_2    s.t.  U Uᵀ = I
    • Canonical correlation analysis: given two "views" of the data, x ∈ ℝ^d and x′ ∈ ℝ^{d′}, find U ∈ ℝ^{k×d} and U′ ∈ ℝ^{k×d′} to minimize
      ‖U x − U′ x′‖²_2    s.t.  U Uᵀ = U′ U′ᵀ = I
    • Sparse dictionary learning: learn a sparse representation of x as a linear combination of an over-complete dictionary
      x → D z, where D ∈ ℝ^{d×m}, z ∈ ℝ^m
      o unlike PCA, here m ≫ d so z is higher dimensional, but learned to be sparse!
    • Independent component analysis
    • Factor analysis
    • Linear discriminant analysis

  • Nonlinear dimensionality reduction

    • Isomap
    • Autoencoders
    • Kernel PCA
    • Locally linear embedding
    • Check out t-SNE for 2D visualization
    • …

  • Isomap


  • Isomap – algorithm

    • Dataset of N points S = {x^(i) ∈ ℝ^d : i = 1, 2, …, N}
    • Represent the points as a kNN graph with edge weights proportional to the distances between the points
    • The geodesic distance d(x, x′) between points on the manifold is the length of the shortest path in the graph
    • Any shortest path algorithm can be used to construct a matrix M ∈ ℝ^{N×N} of d(x^(i), x^(j)) for all x^(i), x^(j) ∈ S
    • MDS: find a (low dimensional) embedding z(x) of x so that distances are preserved (see the sketch below):
      min Σ_{i,j ∈ [N]} ( ‖z(x^(i)) − z(x^(j))‖ − M_ij )²
      o sometimes a variant of this objective normalized by M_ij is minimized instead
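
    A rough numpy sketch of this pipeline, assuming a small N (Floyd-Warshall shortest paths) and substituting classical MDS for the stress minimization written on the slide; all names are illustrative:

    ```python
    import numpy as np

    def isomap(X, k_neighbors=10, k_dims=2):
        """Rough Isomap sketch: kNN graph -> shortest paths -> classical MDS.

        X : (N, d) data; returns an (N, k_dims) embedding.
        Assumes the kNN graph is connected; Floyd-Warshall only suits small N.
        """
        N = len(X)
        # pairwise Euclidean distances
        D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
        # kNN graph: keep distances to the k nearest neighbours, infinity elsewhere
        G = np.full((N, N), np.inf)
        nn = np.argsort(D, axis=1)[:, 1:k_neighbors + 1]
        rows = np.repeat(np.arange(N), k_neighbors)
        G[rows, nn.ravel()] = D[rows, nn.ravel()]
        G = np.minimum(G, G.T)                      # symmetrize
        np.fill_diagonal(G, 0.0)
        # geodesic distances M via Floyd-Warshall
        M = G.copy()
        for m in range(N):
            M = np.minimum(M, M[:, m:m + 1] + M[m:m + 1, :])
        # classical MDS on the squared geodesic distance matrix
        J = np.eye(N) - np.ones((N, N)) / N         # centering matrix
        B = -0.5 * J @ (M ** 2) @ J
        eigvals, eigvecs = np.linalg.eigh(B)
        top = np.argsort(eigvals)[::-1][:k_dims]
        return eigvecs[:, top] * np.sqrt(np.maximum(eigvals[top], 0.0))

    # usage: embed a small Swiss-roll-like cloud into 2D
    rng = np.random.default_rng(0)
    t = 3 * np.pi * rng.random(300)
    X = np.stack([t * np.cos(t), 10 * rng.random(300), t * np.sin(t)], axis=1)
    Z = isomap(X, k_neighbors=10, k_dims=2)
    ```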


  • Autoencoders

    • Recall neural networks as feature learning
      o φ(x) was learned for some supervised learning task
      o weights learned by minimizing ℓ(v, y), where v is the network's output
      o but we don't have y anymore!

    [Figure: feed-forward network with inputs x_1, …, x_d, learned features φ(x)_1, …, φ(x)_k, and output v]

  • Autoencoders

    • Recall neural networks as feature learning
      o φ(x) was learned for some supervised learning task
      o weights learned by minimizing ℓ(v, y), where v is the network's output
      o but we don't have y anymore!
      o instead, use another "decoder" network to reconstruct x


  • Autoencoders

    • φ(x) = f_{W_1}(x)
    • x̂ = f_{W_2}(φ(x))
    • some loss ℓ(x̂, x)
      Ŵ_1, Ŵ_2 = argmin_{W_1, W_2} Σ_{i=1}^{N} ℓ( f_{W_2}(f_{W_1}(x^(i))), x^(i) )
    • learn using SGD with backpropagation

    [Figure: autoencoder network mapping inputs x_1, …, x_d through the bottleneck features φ(x)_1, …, φ(x)_k to reconstructions x̂_1, …, x̂_d]
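
    A minimal numpy sketch of such an autoencoder, assuming a single tanh hidden layer, squared-error loss, and plain SGD; the layer sizes, learning rate, and toy data are illustrative:

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    N, d, k = 500, 20, 3                        # examples, input dim, bottleneck dim
    X = rng.normal(size=(N, 2)) @ rng.normal(size=(2, d))   # toy, roughly low-rank data

    W1 = 0.1 * rng.normal(size=(k, d))          # encoder weights
    W2 = 0.1 * rng.normal(size=(d, k))          # decoder weights
    lr = 0.01

    for epoch in range(100):
        for i in rng.permutation(N):            # SGD: one example at a time
            x = X[i]
            h = np.tanh(W1 @ x)                 # phi(x) = f_{W1}(x)
            x_hat = W2 @ h                      # reconstruction f_{W2}(phi(x))
            err = x_hat - x                     # gradient of 0.5 * ||x_hat - x||^2 w.r.t. x_hat
            # backpropagation through the two layers
            grad_W2 = np.outer(err, h)
            grad_h = W2.T @ err
            grad_W1 = np.outer(grad_h * (1 - h ** 2), x)   # tanh'(a) = 1 - tanh(a)^2
            W2 -= lr * grad_W2
            W1 -= lr * grad_W1

    Z = np.tanh(X @ W1.T)                       # learned k-dimensional representation
    ```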