Deep learning short introduction


Transcript of Deep learning short introduction

Page 1: Deep learning short introduction

Deep Learning
Introduction
July, 2015

Page 2: Deep learning short introduction

Refresher: Machine Learning

"A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E." — T. Mitchell, 1997

• Machine learning generally breaks down into two closely related application areas: classification and regression.

• Some intuitive examples of machine learning:

Learning from labelled examples (supervised learning), e.g. an email spam detector.

Discovering patterns (unsupervised learning), e.g. data clustering, pattern recognition.

Learning from right/wrong feedback (reinforcement learning), e.g. repeated game play with defined win/loss rules.

• Most machine learning methods work well because of human-designed, hand-engineered features (representations).

• Machine learning can be seen as curve fitting: optimising weights in order to make predictions.

• Unsupervised learning can also be viewed as a way to create a higher-level representation of the data.
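The "curve fitting" view can be sketched in a few lines of Python: a single weight fitted to a toy dataset by gradient descent on squared error. The data and learning rate here are illustrative assumptions, not values from the slides.

```python
# Learning as curve fitting: fit y = w * x to toy data by gradient
# descent on the mean squared error.
data = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2)]  # (x, y) pairs, roughly y = 2x

w = 0.0      # initial weight
lr = 0.05    # learning rate
for _ in range(200):
    # gradient of mean((w*x - y)^2) with respect to w
    grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
    w -= lr * grad

print(round(w, 2))  # close to the least-squares solution 28.5/14 ≈ 2.04
```

"Learning" here is nothing more than repeatedly nudging the weight in the direction that reduces the prediction error, which is exactly what neural network training does at much larger scale.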

Page 3: Deep learning short introduction

Neural Networks (Basics)

• A basic neural network takes some numbers as input, looks at some example answers, and guesses outputs for new inputs where there are no example answers (an example of supervised learning).

• Each node transforms its input through an activation function such as the sigmoid (logistic) function or the hyperbolic tangent.

• In simple neural networks there is an input layer, a hidden layer, and an output layer.

• A typical neural network consists of a set of layers with nodes that take input from the preceding layer, multiply these numbers by weights, sum up different inputs, and pass the result to the next layer.

• Multiple layers can be stacked one after another, and each layer can have a different number of nodes, depending on the design.
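The layer mechanics described above can be sketched in plain Python: each node takes a weighted sum of the preceding layer's outputs, adds a bias, and squashes the result with the sigmoid. The weights and inputs below are arbitrary illustrative values, not trained ones.

```python
# Forward pass through a tiny network: 2 inputs -> 2 hidden nodes -> 1 output.
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weights, biases):
    # each node: weighted sum of inputs from the preceding layer, plus bias,
    # passed through the sigmoid activation
    return [sigmoid(sum(w * x for w, x in zip(ws, inputs)) + b)
            for ws, b in zip(weights, biases)]

x = [0.5, -1.0]                                          # input layer
hidden = layer(x, [[0.4, 0.3], [-0.2, 0.9]], [0.1, 0.0]) # hidden layer
output = layer(hidden, [[1.2, -0.7]], [0.05])            # output layer
print(output)
```

Training consists of adjusting the weight and bias values so that the output matches the example answers; the forward pass itself stays exactly this simple.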

Page 4: Deep learning short introduction

Intuitive Example

• Imagine that you don't speak a word of Chinese, but your company is moving you to China next month. The company will sponsor Chinese lessons for you once you are there, but you want to prepare before you go.

• You decide to listen to a Chinese radio station.

• For a month, you bombard yourself with Chinese radio.

• This can be seen as an unsupervised learning experience, as you don't know the meaning of the Chinese words.

• Suppose that somehow your brain develops the capacity to recognise a few commonly occurring patterns, without knowing their meaning. In other words, you have developed a different level of representation for some part of Chinese by becoming more attuned to its common sounds and structures.

• Hopefully, when you arrive in China, you’ll be in a better position to start the lessons.

Example loosely taken from the lecture series by Prof. Yaser Abu-Mostafa

Page 5: Deep learning short introduction

Deep Learning

Image from blog http://www.datarobot.com/blog

Page 6: Deep learning short introduction

Learning Representations

Page 7: Deep learning short introduction

What changed?

• More data (Big Data): better performance is observed when you feed in lots and lots of unlabelled data.

• Faster hardware: GPUs, multi-core CPUs.

• Some key technical breakthroughs after 2006 (currently cutting-edge and active research areas):

Stacked Restricted Boltzmann Machines

Stacked Autoencoders

Pretrained layers

Sparse representations

• Working ideas on how to train deep architectures.

Page 8: Deep learning short introduction

What deep learning refers to

• Large neural networks used to learn feature hierarchies.

• Simply a continuation of the multi-decade advancement in our ability to make use of large-scale neural networks.

Page 9: Deep learning short introduction

Deep Learning in a nutshell

Unsupervised training tries to get the network to learn statistical regularities in the input space. To achieve this you define an objective function to maximise, which typically revolves around some kind of sparseness or information-preservation criterion in the context of dimensionality reduction. You try to get the network to learn the best way to retain as much information about the input as possible despite a data bottleneck. In this way the network hopefully learns a sort of conceptual vocabulary that best describes the typical data.

Train the network layer-wise: train the first layer, then train the second layer on inputs that have been pre-processed by the first, and so on. Once you have a network trained on unlabelled data, you can build a classifier (as one example) by passing the output of the network to a support vector machine, or by using other techniques.

The final layer that generates the classifications is trained using a fully labelled data set, which can have fewer examples than the data set used for the unsupervised pass.

This classifier ends up working very well because its input already represents the important statistical features of the input space.
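The layer-wise recipe above can be sketched in plain Python, assuming tiny linear autoencoders trained by gradient descent (a deliberate simplification: real systems use RBMs or non-linear autoencoders via a library such as Theano or Torch, and the data below is made up for illustration).

```python
# Greedy layer-wise pretraining sketch: each layer is an autoencoder that
# learns to reconstruct its input through a narrower bottleneck.
import random

random.seed(0)

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def train_autoencoder(data, hidden_dim, lr=0.05, epochs=500):
    """Learn encoder W and decoder V minimising ||x - V(Wx)||^2; return W."""
    d = len(data[0])
    W = [[random.uniform(-0.5, 0.5) for _ in range(d)] for _ in range(hidden_dim)]
    V = [[random.uniform(-0.5, 0.5) for _ in range(hidden_dim)] for _ in range(d)]
    for _ in range(epochs):
        for x in data:
            h = matvec(W, x)        # encode: the learned representation
            x_hat = matvec(V, h)    # decode: try to reconstruct the input
            err = [xh - xi for xh, xi in zip(x_hat, x)]
            back = [sum(V[i][j] * err[i] for i in range(d))
                    for j in range(hidden_dim)]
            for i in range(d):      # gradient step on the decoder
                for j in range(hidden_dim):
                    V[i][j] -= lr * err[i] * h[j]
            for j in range(hidden_dim):  # gradient step on the encoder
                for k in range(d):
                    W[j][k] -= lr * back[j] * x[k]
    return W

# Layer-wise: train layer 1 on raw data, then layer 2 on layer-1 codes.
data = [[1.0, 1.0, 0.0], [0.9, 1.1, 0.1], [0.0, 0.1, 1.0], [0.1, 0.0, 0.9]]
W1 = train_autoencoder(data, hidden_dim=2)
codes = [matvec(W1, x) for x in data]      # inputs pre-processed by layer 1
W2 = train_autoencoder(codes, hidden_dim=1)
features = [matvec(W2, c) for c in codes]  # candidate input for an SVM etc.
```

The bottleneck (3 inputs squeezed through 2, then 1, hidden units) is what forces each layer to keep only the most informative structure, which is the "information preservation despite a data bottleneck" idea in miniature.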

Page 10: Deep learning short introduction

Why deep learn?

• No need to hand-engineer features. Manually designed features are often over-specified, incomplete, and take a long time to design and validate.

• Theoretically, a good lower-level representation can be used for many distinct tasks. In deep learning terms, pre-trained deep network weights can be used as the starting point for further training on labelled data. This has shown better performance than other weight initialization strategies.

• For some problems, such as computer vision, the learnt features are better than the best hand-engineered features, and can be so complex that they would take far too much human time to engineer.

• Unlabelled data can be used to pre-train the network. In some domains there is plenty of unlabelled data, but labelled data is hard to find.

• It is a recent machine learning technique that is producing significant improvements in the quality of results.

Page 11: Deep learning short introduction

Deep Learning Impact

Computer Vision: image recognition (e.g. tagging faces in photos)

Audio Processing: voice recognition (e.g. voice-based search, Siri)

Natural Language Processing: automatic translation

Pattern detection (e.g. handwriting recognition)

Bioinformatics …

Page 12: Deep learning short introduction

Deep Learning Myths

• Deep learning outperforms all other algorithms in confined settings and is a 'cure-all' for ML.

• Deep learning takes advantage of an understanding of how the brain processes information, learns, makes decisions, or copes with large amounts of data. *

• Deep learning is commonplace: just another algorithm.

• Deep learning works out of the box. **

* http://spectrum.ieee.org/robotics/artificial-intelligence/machinelearning-maestro-michael-jordan-on-the-delusions-of-big-data-and-other-huge-engineering-efforts

** http://spectrum.ieee.org/automaton/robotics/artificial-intelligence/facebook-ai-director-yann-lecun-on-deep-learning

Page 13: Deep learning short introduction

Some cool demos and DIY

• Handwritten digit recognition: http://www.cs.toronto.edu/~hinton/adi/index.htm

• Open-source implementations of various state-of-the-art neural networks (cf. Caffe, cuda-convnet, Torch, Theano)

• Deep Learning Tutorial: http://deeplearning.net/tutorial/

• Deep Learning in Java: http://deeplearning4j.org/

• Deep Learning in R: http://cran.r-project.org/web/packages/deepnet/deepnet.pdf

• Microsoft Research free book: http://research.microsoft.com/apps/pubs/default.aspx?id=209355