Associative Memories (II): Stochastic Networks and Boltzmann Machines
Davide Bacciu
Dipartimento di Informatica, Università di Pisa
[email protected]
Applied Brain Science - Computational Neuroscience (CNS)
Deterministic Networks
Consider the Hopfield network model introduced so far:
- Neuron output is a deterministic function of its inputs
- Recurrent network of all visible units
- Learns to encode a set of fixed training patterns V = [v^1 ... v^P]
What models do we obtain if we relax the deterministic input-output mapping?
Stochastic Networks
A network of units whose activation is determined by a stochastic function.
- The state of a unit at a given timestep is sampled from a given probability distribution
- The network learns a probability distribution P(V) from the training patterns
- The network includes both visible units v and hidden units h
- Network activity is a sample from the posterior probability given the inputs (visible data)
Neural Sampling Hypothesis
The activity of each neuron can be sampled from the network distribution.
- No distinction between input and output
- Natural way to deal with incomplete stimuli
- The stochastic nature of activation is coherent with neural response variability
- Spontaneous neural activity can be explained in terms of prior/marginal distributions
Probability and Statistics Refresher
On the blackboard
Stochastic Binary Neurons
- Spiking point neuron with binary output s_j
- Typically a discrete-time model, with time divided into small ∆t intervals
- At each time interval (t + 1 ≡ t + ∆t), the neuron can emit a spike with probability p_j^{(t)}

s_j^{(t)} = \begin{cases} 1, & \text{with probability } p_j^{(t)} \\ 0, & \text{with probability } 1 - p_j^{(t)} \end{cases}

The key is in the definition of the spiking probability, which needs to be a function of the local potential:

p_j^{(t)} \approx \sigma(x_j^{(t)})
General Sigmoidal Stochastic Binary Network
Network of N neurons with binary activations s_j
- Weight matrix M = [M_{ij}], i, j ∈ {1, ..., N}
- Bias vector b = [b_j], j ∈ {1, ..., N}

The local neuron potential x_j is defined as usual:

x_j^{(t+1)} = \sum_{i=1}^{N} M_{ij} s_i^{(t)} + b_j
A chosen neuron fires with spiking probability
p_j^{(t+1)} = P(s_j^{(t+1)} = 1 \mid s^{(t)}) = \sigma(x_j^{(t+1)}) = \frac{1}{1 + e^{-x_j^{(t+1)}}}
This formulation highlights the Markovian dynamics of the state update.
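To make the update rule concrete, here is a minimal NumPy sketch of one parallel update step for such a network; the variable names (M, b, s) and the tiny random demo are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def parallel_update(s, M, b, rng):
    """One synchronous update step: every neuron samples its new binary state."""
    x = M.T @ s + b                      # local potentials x_j = sum_i M_ij s_i + b_j
    p = sigmoid(x)                       # spiking probabilities p_j = sigma(x_j)
    return (rng.random(len(s)) < p).astype(int)

# Tiny demo: N = 5 neurons, random symmetric weights, zero biases (all illustrative)
rng = np.random.default_rng(0)
N = 5
M = rng.normal(size=(N, N))
M = (M + M.T) / 2
np.fill_diagonal(M, 0.0)
b = np.zeros(N)
s = rng.integers(0, 2, size=N)
for _ in range(10):
    s = parallel_update(s, M, b, rng)
print(s)
```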
Neurobiological Foundations
- Variability in transmitter release in synaptic vesicles ≈ Gaussian distribution (Katz 1954)
- Assuming independent distributions and a large number of synaptic connections:
  ⇒ central limit theorem ⇒ local (membrane) potential ≈ Gaussian distribution
- The conditional probability of a neuron spiking is a Gaussian CDF ≈ scaled sigmoid
Network Dynamics (I)
How does the network state (the activation of all neurons) evolve in time?

Assume neurons are updated in parallel every ∆t (parallel dynamics):
P(s^{(t+1)} \mid s^{(t)}) = \prod_{j=1}^{N} P(s_j^{(t+1)} \mid s^{(t)}) = T(s^{(t+1)} \mid s^{(t)})
yielding a Markov process for the state update:
P(s^{(t+1)} = s') = \sum_{s} T(s' \mid s) \, P(s^{(t)} = s)
Network Dynamics (II)
- Parallel dynamics assumes a synchronization clock exists (biologically implausible)
- Alternatively, one neuron at random can be chosen for update at each step (Glauber dynamics)
- There are no fixed-point guarantees for s, but the network has a stationary distribution at equilibrium when its connectivity is symmetric
Given F_j as the state-flip operator for the j-th neuron, if s^{(t+1)} = F_j s^{(t)} then

T(s^{(t+1)} \mid s^{(t)}) = \frac{1}{N} P(s_j^{(t+1)} \mid s^{(t)})

while if s^{(t+1)} = s^{(t)}

T(s^{(t+1)} \mid s^{(t)}) = 1 - \frac{1}{N} \sum_{j} P(s_j^{(t+1)} \mid s^{(t)})
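A corresponding sketch of Glauber dynamics, under the same illustrative naming as the previous snippet: at each step a single neuron, chosen uniformly at random, resamples its state from its spiking probability.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def glauber_step(s, M, b, rng):
    """Pick one neuron uniformly at random and resample its binary state."""
    j = rng.integers(len(s))
    x_j = M[:, j] @ s + b[j]             # local potential of the chosen neuron
    s = s.copy()
    s[j] = int(rng.random() < sigmoid(x_j))
    return s

# With symmetric M and no self-connections, iterating glauber_step defines a Markov
# chain whose stationary distribution is the Boltzmann-Gibbs distribution below.
```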
The Boltzmann-Gibbs Distribution
Symmetric connectivity enforces detailed balance condition
P(s)T (s′|s) = P(s′)T (s|s′)
This ensures reversible transitions, guaranteeing the existence of an equilibrium (Boltzmann-Gibbs) distribution:
P_\infty(s) = \frac{e^{-E(s)}}{Z}

where E(s) is the energy function and Z = \sum_{s} e^{-E(s)} is the partition function.
Boltzmann Machines
A stochastic recurrent network whose binary unit states are also random variables:
- visible units v ∈ {0, 1}
- latent (hidden) units h ∈ {0, 1}
- joint state s = [v h]
Boltzmann-Gibbs distribution with an energy function that is linear in the parameters:
E(s) = -\frac{1}{2} \sum_{ij} M_{ij} s_i s_j - \sum_{j} b_j s_j = -\frac{1}{2} s^T M s - b^T s
with symmetric connectivity and no self-recurrent connections.
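For a small machine the Boltzmann-Gibbs distribution can be computed exactly by enumerating all 2^N joint states. The sketch below (purely illustrative names, feasible only for small N) computes E(s), the partition function Z and P(s).

```python
import itertools
import numpy as np

def energy(s, M, b):
    """E(s) = -1/2 s^T M s - b^T s"""
    return -0.5 * s @ M @ s - b @ s

def boltzmann_gibbs(M, b):
    """Exact equilibrium distribution P(s) = exp(-E(s)) / Z by enumeration."""
    N = len(b)
    states = np.array(list(itertools.product([0, 1], repeat=N)), dtype=float)
    unnorm = np.exp(np.array([-energy(s, M, b) for s in states]))
    Z = unnorm.sum()                      # partition function
    return states, unnorm / Z
```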
Learning
Ackley, Hinton and Sejnowski (1985): Boltzmann machines can be trained so that the equilibrium distribution tends towards any arbitrary distribution over binary vectors, given samples from that distribution.
A couple of simplifications to start with:
- Bias b absorbed into the weight matrix M
- Consider only visible units, s = v
Use probabilistic learning techniques to fit the parameters, i.e. maximizing the log-likelihood
\mathcal{L}(M) = \frac{1}{L} \sum_{l=1}^{L} \log P(v^l \mid M)

given the L visible training patterns v^l.
Gradient Approach
First, the gradient for a single pattern
\frac{\partial \log P(v \mid M)}{\partial M_{ij}} = -\langle v_i v_j \rangle + v_i v_j

with free expectations \langle v_i v_j \rangle = \sum_{v} P(v) \, v_i v_j
Then, the log-likelihood gradient
\frac{\partial \mathcal{L}}{\partial M_{ij}} = -\langle v_i v_j \rangle + \langle v_i v_j \rangle_c

with clamped expectations \langle v_i v_j \rangle_c = \frac{1}{L} \sum_{l=1}^{L} v_i^l v_j^l
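As a concrete, deliberately brute-force illustration of this gradient, the sketch below computes the clamped and free expectations for a small, fully visible machine by exact enumeration; all names (V, M, b, eta) are illustrative assumptions, not part of the original material.

```python
import itertools
import numpy as np

def loglik_gradient(V, M, b):
    """dL/dM_ij = <v_i v_j>_clamped - <v_i v_j>_free for a fully visible machine."""
    N = M.shape[0]
    # clamped (wake) expectation: average over the L training patterns in V (shape L x N)
    clamped = (V.T @ V) / len(V)
    # free (dream) expectation: exact average under the model (small N only)
    states = np.array(list(itertools.product([0, 1], repeat=N)), dtype=float)
    E = -0.5 * np.einsum('si,ij,sj->s', states, M, states) - states @ b
    P = np.exp(-E)
    P /= P.sum()
    free = np.einsum('s,si,sj->ij', P, states, states)
    return clamped - free

# Gradient ascent then simply does: M += eta * loglik_gradient(V, M, b)
```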
Something we have already seen...
It is Hebbian learning again!
\underbrace{\langle v_i v_j \rangle_c}_{\text{wake}} - \underbrace{\langle v_i v_j \rangle}_{\text{dream}}
- The wake part is the usual Hebb rule, applied to the empirical distribution of the data that the machine sees coming in from the outside world
- The dream part is an anti-Hebbian term concerning the correlation between units when activity is generated by the internal dynamics of the machine
It can only capture quadratic correlations!
Learning with Hidden Units
- To efficiently capture higher-order correlations we need to introduce hidden units h
- Again, log-likelihood gradient ascent (with s = [v h]):
\frac{\partial \log P(v \mid M)}{\partial M_{ij}} = \sum_{h} s_i s_j P(h \mid v) - \sum_{s} s_i s_j P(s) = \langle s_i s_j \rangle_c - \langle s_i s_j \rangle
The expectations generally become intractable due to the partition function Z (exponential complexity).
Boltzmann Mean-field Approximation
Approximate the expectations using a simpler distribution than P(s):
Q(s) = \prod_{i=1}^{N} m_i^{s_i} (1 - m_i)^{1 - s_i}
The new parameters m_i describe the probability of unit i being on, and are chosen to minimize
m^* = \arg\min_{m} KL[Q(s) \,\|\, P(s)]
The solution is obtained by solving the N-dimensional system
m_i = \sigma\Big(\sum_{j} M_{ij} m_j\Big)
The m_i's allow us to approximate the intractable expectations:
\langle s_i s_j \rangle \approx m_i m_j
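The mean-field equations can be solved by simple fixed-point iteration, as in this minimal sketch (illustrative names; damping and convergence checks omitted):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mean_field(M, n_iter=200):
    """Iterate m_i = sigma(sum_j M_ij m_j) starting from all units half-on."""
    m = np.full(M.shape[0], 0.5)
    for _ in range(n_iter):
        m = sigmoid(M @ m)
    return m          # then <s_i s_j> is approximated by m[i] * m[j]
```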
Restricted Boltzmann Machines (RBM)
A special Boltzmann machine:
- Bipartite graph
- Connections only between hidden and visible units
Energy function, highlighting the bipartition into hidden (h) and visible (v) units:
E(v, h) = -v^T M h - b^T v - c^T h
Learning (and inference) becomes tractable thanks to the graph bipartition, which factorizes the distribution.
The RBM Catch
Hidden units are conditionally independent given the visible units, and vice versa:
P(h_j = 1 \mid v) = \sigma\Big(\sum_{i} M_{ij} v_i + c_j\Big)

P(v_i = 1 \mid h) = \sigma\Big(\sum_{j} M_{ij} h_j + b_i\Big)
They can be updated in batch!
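Thanks to this factorization, each conditional can be computed and sampled for a whole layer in one vectorized operation. A minimal sketch with illustrative shapes (v of length n_visible, h of length n_hidden, M of shape n_visible x n_hidden):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample_h_given_v(v, M, c, rng):
    """P(h_j = 1 | v) = sigma(sum_i M_ij v_i + c_j), for all hidden units at once."""
    p = sigmoid(v @ M + c)
    return (rng.random(p.shape) < p).astype(float), p

def sample_v_given_h(h, M, b, rng):
    """P(v_i = 1 | h) = sigma(sum_j M_ij h_j + b_i), for all visible units at once."""
    p = sigmoid(h @ M.T + b)
    return (rng.random(p.shape) < p).astype(float), p
```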
Training Restricted Boltzmann Machines
Again by likelihood maximization, which yields
\frac{\partial \mathcal{L}}{\partial M_{ij}} = \underbrace{\langle v_i h_j \rangle_c}_{\text{data}} - \underbrace{\langle v_i h_j \rangle}_{\text{model}}
A Gibbs sampling approach
Wake
- Clamp data on v
- Sample v_i h_j for all pairs of connected units
- Repeat for all elements of the dataset

Dream
- Don't clamp the units
- Let the network reach equilibrium
- Sample v_i h_j for all pairs of connected units
- Repeat many times to get a good estimate
Gibbs-Sampling RBM
\frac{\partial \mathcal{L}}{\partial M_{ij}} = \underbrace{\langle v_i h_j \rangle_c}_{\text{data}} - \underbrace{\langle v_i h_j \rangle}_{\text{model}}
It is difficult to obtain an unbiased sample of the second term
Gibbs-Sampling RBM: Plugging in the Data
1. Start with a training vector on the visible units
2. Alternate between updating all the hidden units in parallel and updating all the visible units in parallel (iterate)
\frac{\partial \mathcal{L}}{\partial M_{ij}} = \underbrace{\langle v_i h_j \rangle_0}_{\text{data}} - \underbrace{\langle v_i h_j \rangle_\infty}_{\text{model}}
Contrastive-Divergence Learning
Gibbs sampling can be painfully slow to converge
1. Clamp a training vector v^l on the visible units
2. Update all hidden units in parallel
3. Update all visible units in parallel to get a reconstruction
4. Update the hidden units again
\underbrace{\langle v_i h_j \rangle_0}_{\text{data}} - \underbrace{\langle v_i h_j \rangle_1}_{\text{reconstruction}}
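Putting the four steps together, here is a minimal CD-1 weight update sketch; the names are illustrative (v0 is a batch of training vectors of shape batch x n_visible, eta a learning rate), and, as is common practice, the last hidden update uses probabilities rather than samples.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, M, b, c, eta, rng):
    """One contrastive-divergence (CD-1) update of the RBM parameters."""
    # 1) clamp the data and sample the hidden units
    ph0 = sigmoid(v0 @ M + c)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # 2) update the visible units to get a reconstruction
    pv1 = sigmoid(h0 @ M.T + b)
    v1 = (rng.random(pv1.shape) < pv1).astype(float)
    # 3) update the hidden units again (probabilities suffice for the statistics)
    ph1 = sigmoid(v1 @ M + c)
    # 4) <v_i h_j>_0 - <v_i h_j>_1, averaged over the batch
    n = v0.shape[0]
    M = M + eta * (v0.T @ ph0 - v1.T @ ph1) / n
    b = b + eta * (v0 - v1).mean(axis=0)
    c = c + eta * (ph0 - ph1).mean(axis=0)
    return M, b, c
```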
What does Contrastive Divergence Learn?
A very crude approximation of the gradient of the log-likelihood:
- It does not even follow that gradient closely
- It more closely approximates the gradient of an objective function called the Contrastive Divergence
- However, it ignores one tricky term in this objective function, so it is not even following that gradient
- Sutskever and Tieleman (2010) have shown that it is not following the gradient of any function
So Why Use It?
Because He says so!
It works well enough in many significant applications
Character Recognition
Learning good features for reconstructing images of handwritten 2s.
Slide credit goes to G. Hinton
Weight Learning
Final Weights
Digit Reconstruction
Digit Reconstruction (II)
What would happen if we supply the RBM with a test digit that is not a 2?
It will try anyway to see a 2 in whatever we supply!
One Final Reason for Introducing RBMs
Deep Belief Network
- The fundamental building block for one of the most popular deep learning architectures
- A network of stacked RBMs trained layer-wise by Contrastive Divergence, plus a supervised read-out layer
Take Home Messages
- Stochastic networks as a paradigm that can explain neural variability and spontaneous activity in terms of distributions
- Boltzmann Machines:
  - Hidden neurons required to explain high-order correlations
  - Training is a mix of Hebbian and anti-Hebbian terms
  - Multifaceted nature (recurrent network, undirected graphical model and energy-based network)
- Restricted Boltzmann Machines:
  - Tractable model thanks to bipartite connectivity
  - Trained by a very short Gibbs sampling (contrastive divergence)
  - Can be very powerful if stacked (deep learning)
Next Lecture
Part I - Lecture (1h)
- Incremental learning in competitive networks
- Stability-plasticity dilemma
- Adaptive Resonance Theory (ART)

Part II - Lab (2h)
- Restricted Boltzmann Machines
- I suggest you have a good look at reference [1] on the course wiki (pages 3-6)