Unsupervised feature learning for audio classification using convolutional deep belief networks
Honglak Lee, Yan Largman, Peter Pham and Andrew Y. Ng
Presented by Bo Chen, 5.7,2010
Outline
• 1. What’s Deep Learning?
• 2. Why use Deep Learning?
• 3. Foundations of Deep Learning
• 4. Convolutional Deep Belief Networks
• 5. Results
Deep Architecture
• Deep architectures: compositions of many layers of adaptive non-linear components.
Difficulty: parameter searching (local minima)
• Deep belief nets: probabilistic generative models that are composed of multiple layers of stochastic, latent variables. (Hinton et al., 2006)
Deep Learning Wiki
Why Use Deep Learning
• Insufficient depth can hurt. In our experience, a one-layer machine yields only a set of generic dictionary elements unless a very large number of dictionary elements is used.
• The brain has a deep architecture
• Cognitive processes seem deep
• Learn feature hierarchies, or complicated functions, that can represent high-level abstractions
For example: Pixels → Edglets → Motifs → Parts → Objects → Scenes
Some material from Yoshua Bengio's course notes and Yann LeCun et al., 2010
One-layer dictionary
30 16x16 dictionary elements and reconstructed images
250 16x16 dictionary elements and reconstructed images
Restricted Boltzmann Machine
Figure from R. Salakhutdinov et al.
Energy function
Binary-valued
Real-valued
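For reference, the two energy functions can be written in the standard RBM form below. The notation is an assumption rather than taken from the slide: v denotes the visible units, h the binary hidden units, W the weights, b the hidden biases and c the visible biases; the real-valued case assumes unit-variance Gaussian visible units.

```latex
% Binary-valued visible units
E(\mathbf{v}, \mathbf{h}) = -\sum_{i,j} v_i W_{ij} h_j - \sum_j b_j h_j - \sum_i c_i v_i

% Real-valued visible units (unit variance assumed)
E(\mathbf{v}, \mathbf{h}) = \frac{1}{2} \sum_i v_i^2 - \sum_{i,j} v_i W_{ij} h_j - \sum_j b_j h_j - \sum_i c_i v_i
```

In both cases the joint distribution is P(v, h) ∝ exp(−E(v, h)); the only difference is the quadratic term on the visible units.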
Contrastive divergence is used to train the model approximately, because exact maximum-likelihood learning is intractable. (Hinton et al., 2006)
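As an illustration of contrastive divergence, here is a minimal NumPy sketch of a single CD-1 update for a binary RBM. The function name `cd1_step`, the parameter names and the learning rate are hypothetical choices for this example, not the presenters' implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, W, b, c, lr=0.1, rng=None):
    """One CD-1 update for a binary RBM (illustrative sketch).

    v0: (n_samples, n_visible) batch of binary data
    W:  (n_visible, n_hidden) weights
    b:  (n_hidden,) hidden biases; c: (n_visible,) visible biases
    """
    rng = np.random.default_rng() if rng is None else rng
    # Positive phase: hidden probabilities and samples given the data
    ph0 = sigmoid(v0 @ W + b)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # Negative phase: one Gibbs step back to the visibles and up again
    pv1 = sigmoid(h0 @ W.T + c)
    v1 = (rng.random(pv1.shape) < pv1).astype(float)
    ph1 = sigmoid(v1 @ W + b)
    # Approximate gradient: data statistics minus reconstruction statistics
    n = v0.shape[0]
    W_new = W + lr * (v0.T @ ph0 - v1.T @ ph1) / n
    b_new = b + lr * (ph0 - ph1).mean(axis=0)
    c_new = c + lr * (v0 - v1).mean(axis=0)
    return W_new, b_new, c_new
```

Calling `cd1_step` repeatedly over mini-batches gives a basic training loop; running more Gibbs steps in the negative phase would give CD-k.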
Deep Architectures
The RBMs in different layers can be trained independently, one layer at a time.
Convolutional Network Architecture
Figure from Yann LeCun et al., 1998
Intuitively, in each layer the weight matrix captures the 'structures' that recur most consistently across all of the images.
3-dimensional Dictionary elements in the second layer
Each dictionary element in the second layer is a 3-dimensional array.
D: the first-layer dictionary elements
E: the second-layer dictionary element
S: the convolution of the image and the first-layer elements.
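The following NumPy sketch shows why a second-layer element is 3-dimensional: it has one 2-D slice per first-layer feature map. It follows the D, E, S naming above; the function names are hypothetical, and the nonlinearity and pooling between the layers are left out for brevity (an assumption of this example).

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv2d_valid(x, k):
    """'Valid' 2-D convolution of x with kernel k (the kernel is flipped)."""
    windows = sliding_window_view(x, k.shape)  # (H-kh+1, W-kw+1, kh, kw)
    return np.einsum("ijkl,kl->ij", windows, k[::-1, ::-1])

def first_layer_maps(image, D):
    """S[k] is the image convolved with first-layer element D[k].

    D has shape (K, h, w); the result S has shape (K, H-h+1, W-w+1).
    """
    return np.stack([conv2d_valid(image, d) for d in D])

def second_layer_response(S, E):
    """Response of one second-layer element E of shape (K, h2, w2).

    Each slice E[k] is convolved with the matching map S[k],
    and the K results are summed into a single 2-D map.
    """
    return sum(conv2d_valid(S[k], E[k]) for k in range(S.shape[0]))
```

With K first-layer elements, S stacks K maps, so E needs K slices: its third dimension indexes the first-layer elements.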
Convolutional RBM with Probabilistic Max-Pooling Layer
Max-pooling Layer
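A minimal NumPy sketch of probabilistic max-pooling for a single feature map follows. Within each C x C block at most one detection unit is on, so the block behaves like a softmax over C*C + 1 states (the extra state is "all off"). The function name `prob_max_pool` and its argument layout are hypothetical choices for this example.

```python
import numpy as np

def prob_max_pool(I, C):
    """Probabilistic max-pooling over non-overlapping C x C blocks.

    I: (H, W) bottom-up input to the detection units; H and W divisible by C.
    Returns (ph, pp): ph[i, j] = P(detection unit on), shape (H, W);
                      pp[a, b] = P(pooling unit on), shape (H/C, W/C).
    """
    H, W = I.shape
    # Gather each C x C block into the last axis: (H/C, W/C, C*C)
    blocks = I.reshape(H // C, C, W // C, C).transpose(0, 2, 1, 3)
    blocks = blocks.reshape(H // C, W // C, C * C)
    # Stable softmax that includes the "all off" state, whose input is 0
    m = np.maximum(blocks.max(axis=-1, keepdims=True), 0.0)
    e = np.exp(blocks - m)
    off = np.exp(-m)
    Z = e.sum(axis=-1, keepdims=True) + off
    ph_blocks = e / Z
    pp = 1.0 - (off / Z)[..., 0]
    # Scatter the per-block probabilities back to the (H, W) layout
    ph = ph_blocks.reshape(H // C, W // C, C, C).transpose(0, 2, 1, 3)
    return ph.reshape(H, W), pp
```

The pooling unit is on exactly when one detection unit in its block is on, so the detection probabilities in a block sum to the pooling probability.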
Convolutional Deep Belief Networks
The weight matrix connecting pooling unit Pk to detection unit H'l.
Results on Natural Images
Results on Caltech-101 Images