Varieties of Helmholtz Machine
Transcript of Varieties of Helmholtz Machine
Varieties of Helmholtz Machine
Peter Dayan and Geoffrey E. Hinton,
Neural Networks, Vol. 9, No. 8, pp.1385-1403, 1996.
Helmholtz Machines
• Hierarchical compression schemes would reveal the true hidden causes of the sensory data, and this would facilitate subsequent supervised learning.
– Enables unsupervised learning from unlabelled data.
Density Estimation with Hidden States
• log-likelihood of observed data vectors d:
log p(d|θ) = log Σ_α p(d, α|θ), where α ranges over the hidden states
• maximum likelihood estimation:
θ* = argmax_θ Σ_d log p(d|θ)
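A toy numeric sketch of this marginal likelihood, with one binary hidden cause and one binary data variable (the parameter values are hypothetical, for illustration only):

```python
import numpy as np

# Hypothetical model parameters: p(alpha|theta) and p(d|alpha, theta).
p_alpha = np.array([0.3, 0.7])           # p(alpha | theta)
p_d_given_alpha = np.array([[0.9, 0.1],  # p(d | alpha = 0, theta)
                            [0.2, 0.8]]) # p(d | alpha = 1, theta)

def log_likelihood(data):
    """log p(d | theta) summed over the data set, marginalizing alpha:
    log p(d | theta) = log sum_alpha p(d, alpha | theta)."""
    total = 0.0
    for d in data:
        joint = p_alpha * p_d_given_alpha[:, d]  # p(d, alpha | theta)
        total += np.log(joint.sum())             # sum over hidden states
    return total
```

Maximum likelihood estimation then searches for the parameters that maximize this quantity over the whole data set.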
The Helmholtz Machine
• The top-down weights
– the parameters of the generative model
– a unidirectional Bayesian network
– factorial within each layer
• The bottom-up weights
– the parameters of the recognition model
– another unidirectional Bayesian network
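A minimal sketch of the top-down generative pass, assuming binary stochastic units and hypothetical layer sizes; within each layer the units are sampled independently (factorially) given the layer above:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def generate(weights, biases):
    """Ancestral top-down sampling through a layered sigmoid belief net.
    `weights[k]` maps layer k to layer k+1; `biases[k]` is the bias of
    layer k (names and shapes are illustrative assumptions)."""
    layer = (rng.random(biases[0].shape) < sigmoid(biases[0])).astype(float)
    layers = [layer]
    for W, b in zip(weights, biases[1:]):
        p = sigmoid(layer @ W + b)          # factorial firing probabilities
        layer = (rng.random(p.shape) < p).astype(float)
        layers.append(layer)
    return layers                           # top layer ... data layer
```

The recognition model is the mirror image: the same kind of factorial stochastic pass, but running bottom-up from the data.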
Another view of HM
• Autoencoders
– the recognition model: the coding operation of turning inputs d into stochastic codes in the hidden layer
– the generative model: reconstructs its best guess of the input on the basis of the code that it sees
• Maximizing the likelihood of the data can be interpreted as minimizing the total number of bits it takes to send the data from sender to receiver
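This coding interpretation can be made concrete with the bits-back argument: the expected message length under a recognition distribution q is a sketch like the following (toy discrete distributions, hypothetical numbers):

```python
import numpy as np

def description_length_bits(q, p_joint):
    """Expected bits to communicate d using the bits-back argument:
    E_q[log2 q(alpha|d) - log2 p(d, alpha)].  When q is the exact
    posterior this equals -log2 p(d), the optimal code length; any
    other q costs more bits."""
    q, p_joint = np.asarray(q, float), np.asarray(p_joint, float)
    return float(np.sum(q * (np.log2(q) - np.log2(p_joint))))
```

Minimizing this description length over the model parameters is the same as maximizing the likelihood, which is the link between the two views on the slide.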
![Page 7: Varieties of Helmholtz Machine](https://reader036.fdocuments.in/reader036/viewer/2022081513/56814842550346895db55897/html5/thumbnails/7.jpg)
The deterministic HM - Dayan et al. 1995 (NC)
• Approximation inspired by mean-field methods
• replaces the stochastic firing probabilities in the recognition model by their deterministic mean values
• Advantage: powerful optimization methods can be used
• Disadvantage: fails to capture the recognition distribution correctly
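The contrast between the two recognition passes can be sketched in a few lines (hypothetical recognition weight matrix R):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def recognize_stochastic(d, R):
    """Stochastic HM: sample binary hidden states from their probabilities."""
    p = sigmoid(d @ R)
    return (rng.random(p.shape) < p).astype(float)

def recognize_deterministic(d, R):
    """Deterministic HM: propagate the mean values instead of samples."""
    return sigmoid(d @ R)
```

Propagating means makes the pass differentiable and deterministic, which enables powerful optimizers, but it discards the correlations a sampled representation would carry.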
![Page 8: Varieties of Helmholtz Machine](https://reader036.fdocuments.in/reader036/viewer/2022081513/56814842550346895db55897/html5/thumbnails/8.jpg)
The stochastic HM - Hinton et al. 1995 (Science)
• Captures the correlations between the activities in different hidden layers.
• Trained with the wake-sleep algorithm
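A compact sketch of one wake-sleep update for a one-hidden-layer binary HM; this is an illustrative simplification of the multi-layer version, with hypothetical sizes and learning rate:

```python
import numpy as np

rng = np.random.default_rng(1)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
sample = lambda p: (rng.random(p.shape) < p).astype(float)

n_vis, n_hid, lr = 6, 3, 0.1            # hypothetical sizes / learning rate
G = rng.normal(0, 0.1, (n_hid, n_vis))  # generative (top-down) weights
R = rng.normal(0, 0.1, (n_vis, n_hid))  # recognition (bottom-up) weights
b = np.zeros(n_hid)                     # generative bias of hidden layer

def wake_sleep_step(d):
    """One wake-sleep update (a sketch, not the paper's exact recipe)."""
    global G, R, b
    # Wake: recognize d, then move the generative model toward
    # reconstructing d from the inferred code (delta rule).
    h = sample(sigmoid(d @ R))
    G += lr * np.outer(h, d - sigmoid(h @ G))
    b += lr * (h - sigmoid(b))
    # Sleep: dream a fantasy top-down, then move the recognition model
    # toward recovering the fantasy's hidden code from its data.
    h_f = sample(sigmoid(b))
    d_f = sample(sigmoid(h_f @ G))
    R += lr * np.outer(d_f, h_f - sigmoid(d_f @ R))
```

Each phase uses only locally available pre- and post-synaptic activities, which is what makes the delta-rule updates so simple.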
![Page 9: Varieties of Helmholtz Machine](https://reader036.fdocuments.in/reader036/viewer/2022081513/56814842550346895db55897/html5/thumbnails/9.jpg)
Variants of the HM
• Unit activation function
• Reinforcement learning
• Alternative recognition models
• Supervised HM
• Modeling temporal structure
Unit Activation Function
• The wake-sleep algorithm is particularly convenient for changing the activation functions.
The Reinforcement Learning HM
• This methods only for correctly optimizing recognition weights.
• can makes learning very slow.
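A score-function (REINFORCE-style) sketch of such a recognition-weight update; `log_p_joint` is a hypothetical callable standing in for the generative model's log-probability:

```python
import numpy as np

rng = np.random.default_rng(2)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def reinforce_step(d, R, log_p_joint, lr=0.05, baseline=0.0):
    """Sample h ~ q(h|d), treat the generative log-probability as a
    reward, and reinforce grad log q(h|d).  The estimate is unbiased
    but high-variance, which is what makes learning slow."""
    q = sigmoid(d @ R)
    h = (rng.random(q.shape) < q).astype(float)
    reward = log_p_joint(d, h) - baseline
    grad_log_q = np.outer(d, h - q)          # d/dR of log q(h|d)
    return R + lr * reward * grad_log_q
```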
![Page 13: Varieties of Helmholtz Machine](https://reader036.fdocuments.in/reader036/viewer/2022081513/56814842550346895db55897/html5/thumbnails/13.jpg)
Alternative Recognition Models
• Recurrent Recognition
– Sophisticated mean-field methods
– Uses the EM algorithm
– Needs only the generative weights
– But gives poor results
![Page 14: Varieties of Helmholtz Machine](https://reader036.fdocuments.in/reader036/viewer/2022081513/56814842550346895db55897/html5/thumbnails/14.jpg)
Alternative Recognition Models
• Dangling Units
– For the XOR problem (the explaining-away problem)
– No modification of the wake-sleep algorithm is required
![Page 15: Varieties of Helmholtz Machine](https://reader036.fdocuments.in/reader036/viewer/2022081513/56814842550346895db55897/html5/thumbnails/15.jpg)
Alternative Recognition Models
• Other sampling methods
– Gibbs sampling
– Metropolis algorithm
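As a sketch of the Gibbs-sampling option, the following resamples each hidden unit in turn from its conditional given the data and the other hidden units; it assumes a one-hidden-layer sigmoid belief net with hypothetical generative weights G and hidden biases:

```python
import numpy as np

rng = np.random.default_rng(3)
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def gibbs_recognition(d, G, g_bias, n_sweeps=50):
    """Approximate a sample from p(h|d) by Gibbs sampling."""
    n_hid = G.shape[0]
    h = (rng.random(n_hid) < 0.5).astype(float)
    for _ in range(n_sweeps):
        for i in range(n_hid):
            h1, h0 = h.copy(), h.copy()
            h1[i], h0[i] = 1.0, 0.0

            def logp(hh):
                # log p(h, d) up to an additive constant
                a = hh @ G
                return (hh @ g_bias
                        + np.sum(d * np.log(sigmoid(a))
                                 + (1 - d) * np.log(sigmoid(-a))))

            # p(h_i = 1 | d, rest) via the log-probability gap
            p1 = 1.0 / (1.0 + np.exp(logp(h0) - logp(h1)))
            h[i] = float(rng.random() < p1)
    return h
```

Metropolis sampling differs only in how the move is proposed and accepted; both trade extra computation per case for a more faithful sample from the true posterior.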
![Page 16: Varieties of Helmholtz Machine](https://reader036.fdocuments.in/reader036/viewer/2022081513/56814842550346895db55897/html5/thumbnails/16.jpg)
Alternative Recognition Models
• The Lateral HM
– Recurrent weights within each hidden layer
– In the recognition model only
– Putting the recurrent connections into the generative pathway would instead turn the HM into a Boltzmann machine
![Page 17: Varieties of Helmholtz Machine](https://reader036.fdocuments.in/reader036/viewer/2022081513/56814842550346895db55897/html5/thumbnails/17.jpg)
Alternative Recognition Models
• The Lateral HM
– During the wake phase
• Stochastic Gibbs sampling is used
– During the sleep phase
• The generative weights are updated
• Samples are produced by the generative weights and the lateral weights
![Page 18: Varieties of Helmholtz Machine](https://reader036.fdocuments.in/reader036/viewer/2022081513/56814842550346895db55897/html5/thumbnails/18.jpg)
Alternative Recognition Models
• The Lateral HM
– Boltzmann machine learning methods can be used.
– Recognition model:
• Calculate p(yᵢ = 1 | d, y₋ᵢ) and p(yᵢ = 0 | d, y₋ᵢ)
• Use Boltzmann machine methods for learning
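In a network with symmetric lateral weights, the conditional for a single unit given all the others is a logistic function of its energy gap; a minimal sketch (hypothetical weight matrix L and biases b):

```python
import numpy as np

def p_unit_on(i, y, L, b):
    """p(y_i = 1 | the other units) for symmetric lateral weights L
    (zero diagonal) and biases b: the quantity both Gibbs sampling
    and Boltzmann machine learning need."""
    gap = b[i] + L[i] @ y - L[i, i] * y[i]   # exclude any self-connection
    return 1.0 / (1.0 + np.exp(-gap))
```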
![Page 19: Varieties of Helmholtz Machine](https://reader036.fdocuments.in/reader036/viewer/2022081513/56814842550346895db55897/html5/thumbnails/19.jpg)
Supervised HMs
• Supervised learning of p(d|e)
– e: input, d: output
• First model
– Not a good architecture
p(d | e, θ) = p(d, e | θ) / p(e | θ)
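A toy numeric illustration of this conditional, on a small discrete joint (the probabilities are hypothetical, chosen only to show the division):

```python
import numpy as np

# Toy joint p(d, e | theta): rows index d, columns index e.
p_joint = np.array([[0.1, 0.2],
                    [0.3, 0.4]])

def conditional_d_given_e(e):
    p_e = p_joint[:, e].sum()      # p(e | theta): marginalize out d
    return p_joint[:, e] / p_e     # p(d | e, theta)
```

The slide's criticism is that a model trained on the joint spends capacity modeling e, which is wasted when only the conditional p(d|e) is wanted.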
![Page 20: Varieties of Helmholtz Machine](https://reader036.fdocuments.in/reader036/viewer/2022081513/56814842550346895db55897/html5/thumbnails/20.jpg)
Supervised HMs
• The Side-Information HM
– e is given as an extra input to both the recognition and the generative pathways during learning
– The standard wake-sleep algorithm can be used.
![Page 21: Varieties of Helmholtz Machine](https://reader036.fdocuments.in/reader036/viewer/2022081513/56814842550346895db55897/html5/thumbnails/21.jpg)
Supervised HMs
• The Clipped HM
– Generates samples over d
– The standard wake-sleep algorithm is used to train the e pathway
– The extra generative connections to d are trained during wake phases once the weights for e have converged
![Page 22: Varieties of Helmholtz Machine](https://reader036.fdocuments.in/reader036/viewer/2022081513/56814842550346895db55897/html5/thumbnails/22.jpg)
Supervised HMs
• The Inverse HM
– Takes direct advantage of the capacity of the recognition model in the HM to learn inverse distributions
– After learning, the units above d can be discarded
![Page 23: Varieties of Helmholtz Machine](https://reader036.fdocuments.in/reader036/viewer/2022081513/56814842550346895db55897/html5/thumbnails/23.jpg)
The Helmholtz Machine Through Time (HMTT)
• The wake-sleep algorithm is used.