Jérôme Tubiana, Rémi Monasson Laboratoire de Physique...
![Page 1: Jérôme Tubiana, Rémi Monasson Laboratoire de Physique ...krzakala/WEBSITE_Stat2016/TALKS/jerome.pdf · • Why do some RBM work and others don’t ? • What mechanism produces](https://reader033.fdocuments.in/reader033/viewer/2022052008/601cf4529e51e1421444bf00/html5/thumbnails/1.jpg)
Jérôme Tubiana, Rémi Monasson
Laboratoire de Physique Théorique, Ecole Normale Supérieure
![Page 2: Jérôme Tubiana, Rémi Monasson Laboratoire de Physique ...krzakala/WEBSITE_Stat2016/TALKS/jerome.pdf · • Why do some RBM work and others don’t ? • What mechanism produces](https://reader033.fdocuments.in/reader033/viewer/2022052008/601cf4529e51e1421444bf00/html5/thumbnails/2.jpg)
Motivation
GoogleNet Deep Neural Network
Why does this network work? (and not others!)
![Page 4: Jérôme Tubiana, Rémi Monasson Laboratoire de Physique ...krzakala/WEBSITE_Stat2016/TALKS/jerome.pdf · • Why do some RBM work and others don’t ? • What mechanism produces](https://reader033.fdocuments.in/reader033/viewer/2022052008/601cf4529e51e1421444bf00/html5/thumbnails/4.jpg)
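Restricted Boltzmann Machines
Ackley, Hinton & Sejnowski 1985; Smolensky 1986; Hinton 2002
[Diagram: a hidden layer (h1, h2) connected to a visible layer of binary random variables (V1, V2, V3) through the weights $W_{\alpha,i}$]
• Graphical model constituted by two sets of random variables that are coupled together.
• RBMs learn a probability distribution over the visible layer.
$$P(v, h) = \frac{1}{Z} e^{-E(v,h)}$$
$$E(v, h) = -\sum_{i=1}^{N} g^v_i v_i + \sum_{\alpha=1}^{K} U_\alpha(h_\alpha) - \sum_{\alpha,i} W_{\alpha,i} v_i h_\alpha$$
$$P(v) = \int \prod_{\alpha=1}^{K} dh_\alpha \, P(v, h) = \frac{1}{Z} e^{-H_{\rm eff}(v)}$$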
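As a rough illustration of the marginalization over the hidden layer, the sketch below evaluates the energy and the effective Hamiltonian numerically for an arbitrary hidden-unit potential. The function names, the argument name `g_v` for the visible fields, the use of one shared potential for all hidden units, and the integration grid are assumptions made for this example; they are not taken from the talk.

```python
import numpy as np

def energy(v, h, W, g_v, potential):
    """E(v, h) = -sum_i g_i v_i + sum_a U(h_a) - sum_{a,i} W[a, i] v_i h_a."""
    return -g_v @ v + potential(h).sum() - h @ (W @ v)

def effective_hamiltonian(v, W, g_v, potential, h_grid):
    """H_eff(v) = -g_v.v - sum_a log( integral dh exp(-U(h) + h x_a) )."""
    x = W @ v                                    # inputs to the K hidden units
    dh = h_grid[1] - h_grid[0]                   # uniform grid spacing (assumed)
    # exponent[a, n] = -U(h_n) + h_n * x_a
    exponent = -potential(h_grid)[None, :] + np.outer(x, h_grid)
    shift = exponent.max(axis=1, keepdims=True)  # log-sum-exp trick for stability
    log_integral = shift[:, 0] + np.log(np.exp(exponent - shift).sum(axis=1) * dh)
    return -g_v @ v - log_integral.sum()

def gaussian_potential(h):
    """Quadratic potential U(h) = h^2 / 2, used here only as a test case."""
    return 0.5 * h**2
```

With the quadratic potential the integral is Gaussian, so the numerical value can be compared against the closed form $-\sum_i g_i v_i - \sum_a (x_a^2/2 + \tfrac{1}{2}\log 2\pi)$ as a sanity check.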
![Page 6: Jérôme Tubiana, Rémi Monasson Laboratoire de Physique ...krzakala/WEBSITE_Stat2016/TALKS/jerome.pdf · • Why do some RBM work and others don’t ? • What mechanism produces](https://reader033.fdocuments.in/reader033/viewer/2022052008/601cf4529e51e1421444bf00/html5/thumbnails/6.jpg)
Vanilla example
• Suppose a vector of four binary random variables (v1, v2, v3, v4).
• Observe strong correlations between all pairs of variables.
[Diagram: the four visible units V1, V2, V3, V4]
The Ising model explains correlations by direct couplings.
![Page 7: Jérôme Tubiana, Rémi Monasson Laboratoire de Physique ...krzakala/WEBSITE_Stat2016/TALKS/jerome.pdf · • Why do some RBM work and others don’t ? • What mechanism produces](https://reader033.fdocuments.in/reader033/viewer/2022052008/601cf4529e51e1421444bf00/html5/thumbnails/7.jpg)
Vanilla example
• Suppose a vector of four binary random variables (v1, v2, v3, v4).
• Observe strong correlations between all pairs of variables.
[Diagram: the four visible units V1, V2, V3, V4 and one hidden unit h1]
The RBM explains correlations by common input.
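The common-input picture can be checked on a toy case. The sketch below enumerates all 16 states of four binary visible units coupled to a single hidden unit and computes their exact covariance matrix. The choice of a binary (0/1) hidden unit with bias `b`, and the names `w`, `g`, `b`, are assumptions made for this illustration.

```python
import itertools
import numpy as np

def visible_covariance(w, g, b):
    """Exact covariance of 4 binary visible units sharing one binary hidden unit."""
    states = np.array(list(itertools.product([0, 1], repeat=4)), dtype=float)
    x = states @ w                          # input received by the hidden unit
    # Summing over h in {0, 1}: weight(v) = exp(g.v) * (1 + exp(b + x))
    log_weight = states @ g + np.logaddexp(0.0, b + x)
    p = np.exp(log_weight - log_weight.max())
    p /= p.sum()                            # normalized P(v)
    mean = p @ states                       # <v_i>
    second = states.T @ (states * p[:, None])  # <v_i v_j>
    return second - np.outer(mean, mean)
```

With all weights positive, every pair of visible units comes out positively correlated even though no visible unit is coupled directly to another: given h the visible units are independent, and the correlation comes only from the shared hidden unit.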
![Page 8: Jérôme Tubiana, Rémi Monasson Laboratoire de Physique ...krzakala/WEBSITE_Stat2016/TALKS/jerome.pdf · • Why do some RBM work and others don’t ? • What mechanism produces](https://reader033.fdocuments.in/reader033/viewer/2022052008/601cf4529e51e1421444bf00/html5/thumbnails/8.jpg)
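Sampling from RBM
[Diagram: alternating chain V(0) → h(0) → V(1) → h(1) → V(2) between the input layer (data) and the hidden layer (features). Going up: extract features from data. Going down: reconstruct data from features.]
• Compute the hidden layer inputs: $x_\alpha = \sum_i W_{\alpha,i} v_i$
• Sample each hidden unit independently: $p(h_\alpha | x_\alpha) \propto e^{-U_\alpha(h_\alpha) + h_\alpha x_\alpha}$
• Compute the visible layer inputs: $y_i = \sum_\alpha W_{\alpha,i} h_\alpha$
• Sample each visible unit independently: $p(v_i | y_i) \propto e^{v_i (g^v_i + y_i)}$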
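The four steps above can be sketched as one alternating (block Gibbs) update. This is a minimal sketch that assumes a quadratic hidden potential U(h) = h²/2, for which the hidden conditional is a unit-variance Gaussian centered on the input; the function name `gibbs_step` and the argument name `g_v` for the visible fields are hypothetical.

```python
import numpy as np

def gibbs_step(v, W, g_v, rng):
    """One v -> h -> v update for binary visible units and Gaussian hidden units."""
    x = W @ v                                 # hidden layer inputs, shape (K,)
    # Assumed potential U(h) = h^2 / 2, so p(h | x) is a normal N(x, 1)
    h = x + rng.standard_normal(x.shape)
    y = W.T @ h                               # visible layer inputs, shape (N,)
    # p(v_i = 1 | y_i) = sigmoid(g_i + y_i), written with tanh to avoid overflow
    p_on = 0.5 * (1.0 + np.tanh(0.5 * (g_v + y)))
    v_new = (rng.random(p_on.shape) < p_on).astype(float)
    return v_new, h
```

Calling `gibbs_step` repeatedly, feeding each returned `v_new` back in, produces the chain V(0), h(0), V(1), h(1), and so on. Other potentials only change the line that samples `h`.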
![Page 11: Jérôme Tubiana, Rémi Monasson Laboratoire de Physique ...krzakala/WEBSITE_Stat2016/TALKS/jerome.pdf · • Why do some RBM work and others don’t ? • What mechanism produces](https://reader033.fdocuments.in/reader033/viewer/2022052008/601cf4529e51e1421444bf00/html5/thumbnails/11.jpg)
The hidden unit potentials
Gaussian units:
$$h_\alpha \in \mathbb{R}, \quad U_\alpha(h_\alpha) = \frac{h_\alpha^2}{2}$$
$$H_{\rm eff}[v] = -\sum_i g^v_i v_i - \frac{1}{2} \sum_\alpha \Big( \sum_i W_{\alpha,i} v_i \Big)^2$$
Gaussian RBMs learn pairwise couplings (equivalent to the Hopfield model).
Non-Gaussian units:
In general, the effective Hamiltonian is not quadratic and has higher-order couplings. Examples:
• Bernoulli
• ReLU+
Different potentials correspond to different transfer functions.
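To make the link between potentials and transfer functions concrete, the sketch below computes the mean hidden activity ⟨h⟩ as a function of the input x for three cases. The exact potentials are assumptions for this example: the Bernoulli unit is taken as h ∈ {0, 1} with no bias, and the ReLU+ unit as h ≥ 0 with U(h) = h²/2 + threshold·h, which may differ from the definition used in the talk.

```python
import math

def gaussian_transfer(x):
    """U(h) = h^2/2 on the real line: p(h|x) is N(x, 1), so <h> = x (linear)."""
    return x

def bernoulli_transfer(x):
    """h in {0, 1}, no bias (assumed): <h> = sigmoid(x)."""
    return 0.5 * (1.0 + math.tanh(0.5 * x))

def relu_plus_transfer(x, threshold=0.0):
    """h >= 0 with U(h) = h^2/2 + threshold*h (assumed form).

    p(h|x) is N(x - threshold, 1) truncated to h >= 0, whose mean is
    mu + pdf(mu) / cdf(mu). Suitable for moderate inputs only: the cdf
    underflows to zero for mu below roughly -37.
    """
    mu = x - threshold
    pdf = math.exp(-0.5 * mu * mu) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * math.erfc(-mu / math.sqrt(2.0))
    return mu + pdf / cdf
```

The Gaussian unit responds linearly, which is why it only produces pairwise couplings. The other two responses are non-linear: the Bernoulli one saturates, and the ReLU+ one stays close to zero for strongly negative inputs and approaches x - threshold for large inputs.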
![Page 12: Jérôme Tubiana, Rémi Monasson Laboratoire de Physique ...krzakala/WEBSITE_Stat2016/TALKS/jerome.pdf · • Why do some RBM work and others don’t ? • What mechanism produces](https://reader033.fdocuments.in/reader033/viewer/2022052008/601cf4529e51e1421444bf00/html5/thumbnails/12.jpg)
RBMs learn data representations
Given a configuration (v1, …, vN), the hidden unit activations (h1, …, hK) define a representation of the data. Learning representations is a crucial task:
• In neuroscience: sensory information processing (e.g. from sensors to cortical areas).
• In machine learning: the success of learning algorithms depends on data representations. A good representation learning algorithm extracts the features that have variability in many different neighboring regions of the input space.
The success of deep learning algorithms lies in their ability to learn abstract representations. The RBM is one of the simplest representation learning algorithms that can be studied.
![Page 13: Jérôme Tubiana, Rémi Monasson Laboratoire de Physique ...krzakala/WEBSITE_Stat2016/TALKS/jerome.pdf · • Why do some RBM work and others don’t ? • What mechanism produces](https://reader033.fdocuments.in/reader033/viewer/2022052008/601cf4529e51e1421444bf00/html5/thumbnails/13.jpg)
Example: MNIST synthetic digits
MNIST: 60,000 images of digits of size 28x28
• ReLU+ RBM, K = 400. Log-likelihood: -63 nats
• Gaussian RBM, K = 400. Log-likelihood: -83 nats
Learning algorithm: PCD, PT (Tieleman 2008, Desjardins 2010)
![Page 14: Jérôme Tubiana, Rémi Monasson Laboratoire de Physique ...krzakala/WEBSITE_Stat2016/TALKS/jerome.pdf · • Why do some RBM work and others don’t ? • What mechanism produces](https://reader033.fdocuments.in/reader033/viewer/2022052008/601cf4529e51e1421444bf00/html5/thumbnails/14.jpg)
MNIST distributed representation
A subset of the features learnt for a ReLU+ RBM, K = 400. Each image represents a weight vector $W_\alpha$.
Each generated handwritten digit image is composed by superposing about 20 elementary strokes.
Different combinations of strokes produce different variants of the same digit.
![Page 18: Jérôme Tubiana, Rémi Monasson Laboratoire de Physique ...krzakala/WEBSITE_Stat2016/TALKS/jerome.pdf · • Why do some RBM work and others don’t ? • What mechanism produces](https://reader033.fdocuments.in/reader033/viewer/2022052008/601cf4529e51e1421444bf00/html5/thumbnails/18.jpg)
Phenomenology of learning
Key metrics monitored during learning:
• Log-likelihood $\mathcal{L} = \langle \log p(x|\theta) \rangle_{\rm validation}$: increases
• Weight amplitude $\langle \sum_i W_{\alpha,i}^2 \rangle_\alpha$: increases
• Weight sparsity: the weights are getting more sparse ($p$ = fraction of non-zero couplings)
• Average number of active hidden units $L$: increases (after a transient)
Learning algorithms: PCD & PT (Tieleman 2008, Desjardins 2010)
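Three of these metrics can be computed directly from a weight matrix and a batch of hidden activations, as in the sketch below. The layout of `W` (one row per hidden unit), the tolerance used to call a coupling non-zero, and the cutoff used to call a hidden unit active are all assumptions made for this example; the talk does not specify them.

```python
import numpy as np

def weight_amplitude(W):
    """Average over hidden units of sum_i W[a, i]^2, with W of shape (K, N)."""
    return float(np.mean(np.sum(W**2, axis=1)))

def weight_sparsity(W, tol=1e-2):
    """Fraction p of couplings whose magnitude exceeds tol (assumed cutoff)."""
    return float(np.mean(np.abs(W) > tol))

def mean_active_hidden(H, cutoff=1e-2):
    """Average number of hidden units above cutoff, for H of shape (samples, K)."""
    return float(np.mean(np.sum(H > cutoff, axis=1)))
```

Logging these three numbers at regular intervals during training is enough to reproduce the qualitative curves described above.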
![Page 20: Jérôme Tubiana, Rémi Monasson Laboratoire de Physique ...krzakala/WEBSITE_Stat2016/TALKS/jerome.pdf · • Why do some RBM work and others don’t ? • What mechanism produces](https://reader033.fdocuments.in/reader033/viewer/2022052008/601cf4529e51e1421444bf00/html5/thumbnails/20.jpg)
Questions:
• How can a bipartite network generate such data?
• Why do some RBMs work and others don’t?
• What mechanism produces distributed representations?
Statistical physics approach: study the properties of a random RBM with prescribed control parameters and the different phases (the outcome of the algorithm).
![Page 21: Jérôme Tubiana, Rémi Monasson Laboratoire de Physique ...krzakala/WEBSITE_Stat2016/TALKS/jerome.pdf · • Why do some RBM work and others don’t ? • What mechanism produces](https://reader033.fdocuments.in/reader033/viewer/2022052008/601cf4529e51e1421444bf00/html5/thumbnails/21.jpg)
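Random weight RBM model
Multitasking associative networks: Agliari et al., PRL 2012
• Visible units: $N \to \infty$ units with $v_i \in \{0, 1\}$, field $g^v$
• Hidden units: $\alpha N$ units with $h \in \mathbb{R}^+$ (ReLU+), threshold $g^h$
• Sparse random weights:
$$W_{\alpha,i} = \begin{cases} 0 & \text{with probability } 1 - p \\ +1 & \text{with probability } p/2 \\ -1 & \text{with probability } p/2 \end{cases}$$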
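A weight matrix with this distribution is easy to draw numerically. The sketch below is one way to do it with NumPy; the function name and the (K, N) layout are assumptions for this example, and any overall rescaling of the weights with N is left out.

```python
import numpy as np

def sample_sparse_weights(K, N, p, rng):
    """Draw a (K, N) matrix with entries 0 (prob. 1-p), +1 (p/2) or -1 (p/2)."""
    return rng.choice([0.0, 1.0, -1.0], size=(K, N), p=[1.0 - p, p / 2.0, p / 2.0])

rng = np.random.default_rng(0)
W = sample_sparse_weights(K=200, N=500, p=0.1, rng=rng)
fraction_nonzero = np.mean(W != 0)   # should be close to p for large matrices
```

Here K plays the role of αN, so the ratio K/N sets α and `p` sets the sparsity.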
![Page 22: Jérôme Tubiana, Rémi Monasson Laboratoire de Physique ...krzakala/WEBSITE_Stat2016/TALKS/jerome.pdf · • Why do some RBM work and others don’t ? • What mechanism produces](https://reader033.fdocuments.in/reader033/viewer/2022052008/601cf4529e51e1421444bf00/html5/thumbnails/22.jpg)
The phases
Glassy phase:
• All the hidden units are weakly activated
• All states have weak probability
Compositional phase:
• Several hidden units are strongly activated, and the others are quiet
• Number of regions with high probability is polynomial in N
Ferromagnetic phase:
• One hidden unit is strongly activated and the others are weakly activated
• Number of regions with high probability is linear in N
[Diagram arrows between the phases: $g^h \nearrow$, $p \searrow$; $\alpha \nearrow$]
![Page 23: Jérôme Tubiana, Rémi Monasson Laboratoire de Physique ...krzakala/WEBSITE_Stat2016/TALKS/jerome.pdf · • Why do some RBM work and others don’t ? • What mechanism produces](https://reader033.fdocuments.in/reader033/viewer/2022052008/601cf4529e51e1421444bf00/html5/thumbnails/23.jpg)
Stat mech of random RBM
• A replica theory computation is performed to estimate the free energy in the zero-temperature limit:
$$F(\alpha, p, g^v, g^h, L)$$
– $\alpha$: number of hidden units / number of visible units
– $p$: weight sparsity
– $g^v$: fields on visible units
– $g^h$: hidden units threshold
– $L$: number of active hidden units
$$L^\star \propto \frac{1}{p}$$
![Page 24: Jérôme Tubiana, Rémi Monasson Laboratoire de Physique ...krzakala/WEBSITE_Stat2016/TALKS/jerome.pdf · • Why do some RBM work and others don’t ? • What mechanism produces](https://reader033.fdocuments.in/reader033/viewer/2022052008/601cf4529e51e1421444bf00/html5/thumbnails/24.jpg)
Validation MNIST
Numerical experiment: ReLU+ RBMs are trained with a range of L1-like regularizations that control the weight matrix sparsity.
As the weights get sparser, the number of active hidden units increases.
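The talk does not spell out the L1-like penalty, so the sketch below only shows the generic idea with a plain L1 penalty: take a gradient ascent step on the log-likelihood, then apply soft-thresholding to the weights. The function name, the arguments, and the choice of a proximal (soft-threshold) update are assumptions for illustration, not the authors' training procedure.

```python
import numpy as np

def l1_proximal_step(W, grad_loglik, learning_rate, l1_strength):
    """Gradient ascent on the log-likelihood followed by L1 soft-thresholding."""
    W_new = W + learning_rate * grad_loglik   # plain ascent step
    shrink = learning_rate * l1_strength      # how far each weight moves toward zero
    # Weights smaller than `shrink` in magnitude are set exactly to zero
    return np.sign(W_new) * np.maximum(np.abs(W_new) - shrink, 0.0)
```

Larger values of `l1_strength` zero out more couplings per step, which is the knob that would be scanned to obtain weight matrices of different sparsity.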
![Page 25: Jérôme Tubiana, Rémi Monasson Laboratoire de Physique ...krzakala/WEBSITE_Stat2016/TALKS/jerome.pdf · • Why do some RBM work and others don’t ? • What mechanism produces](https://reader033.fdocuments.in/reader033/viewer/2022052008/601cf4529e51e1421444bf00/html5/thumbnails/25.jpg)
Conclusions
• Distributed encoding relies on two key properties:
– Non-quadratic potentials (i.e. non-linear transfer functions). They denoise the hidden layer inputs, allowing for better stability.
– Weight sparsity allows for activation of many hidden units that detect complementary features. The combinatorics creates a rich output distribution.
• Future:
– Dynamics of learning
– Deep models
![Page 26: Jérôme Tubiana, Rémi Monasson Laboratoire de Physique ...krzakala/WEBSITE_Stat2016/TALKS/jerome.pdf · • Why do some RBM work and others don’t ? • What mechanism produces](https://reader033.fdocuments.in/reader033/viewer/2022052008/601cf4529e51e1421444bf00/html5/thumbnails/26.jpg)
Acknowledgements
• Funding:
– Ecole Normale Supérieure
– CNRS: Inphyniti Challenge
• Discussions: A. Dubreuil, L. Posani