
Data Normalization in the Learning of Restricted Boltzmann Machines

Charlie Tang, Ilya Sutskever

Department of Computer Science, University of Toronto, Canada

Tech Report: www.cs.toronto.edu/~tang

Introduction

RBM training with Contrastive Divergence (CD) works well when the data is sparse (black background).

Training is much worse with inverted images.

We present a simple and effective solution to this problem.

Algorithm

The "zero-mean" solution: an extremely simple way to improve training.

The new algorithm requires the addition of only 3 lines of code.

[Figure: filters learned on MNIST, with AIS estimates of log-probability in nats, test (training). Many "dead" filters: -110 nats. Standard MNIST filters: -96 nats. Filters learned using zero-mean.]

Conclusions

Data normalization improves RBM training and should always be used.

It leads to models with higher log-probabilities.

It learns sparser features, which are better for classification.

Zero-mean works well with CD, PCD, and FPCD.
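The poster does not include the actual code, but the "zero-mean" idea can be illustrated with a minimal sketch: a single CD-1 update for a binary RBM in which the data mean is subtracted from the visible states wherever they enter the weight gradient. The function name `cd1_step`, the exact placement of the mean subtraction, and all hyperparameters are assumptions for illustration, not the authors' implementation; see the tech report for the real algorithm.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(W, b, c, v0, mu, lr=0.1, rng=None):
    """One CD-1 update for a binary RBM with a zero-mean modification.

    Hypothetical sketch: the only change from standard CD-1 here is that
    the data mean `mu` is subtracted from the visible states in the
    hidden-unit inputs and in the weight gradient (the few extra lines
    the poster alludes to). Not the authors' exact code.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    # Positive phase: hidden probabilities and samples from the data
    ph0 = sigmoid((v0 - mu) @ W + c)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # Negative phase: one Gibbs step back to the visibles and up again
    pv1 = sigmoid(h0 @ W.T + b)
    v1 = (rng.random(pv1.shape) < pv1).astype(float)
    ph1 = sigmoid((v1 - mu) @ W + c)
    # Parameter updates (averaged over the minibatch)
    n = v0.shape[0]
    W += lr * ((v0 - mu).T @ ph0 - (v1 - mu).T @ ph1) / n
    b += lr * (v0 - v1).mean(axis=0)
    c += lr * (ph0 - ph1).mean(axis=0)
    return W, b, c

# Tiny usage example on random binary "data"
rng = np.random.default_rng(1)
V, H, n = 6, 4, 8
W = 0.01 * rng.standard_normal((V, H))
b, c = np.zeros(V), np.zeros(H)
v0 = (rng.random((n, V)) < 0.3).astype(float)
mu = v0.mean(axis=0)          # per-pixel data mean
W, b, c = cd1_step(W, b, c, v0, mu)
```

With `mu = 0` this reduces to ordinary CD-1, which is consistent with the poster's claim that the change is only a few lines; the same subtraction can be dropped into PCD or FPCD negative phases unchanged.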