Deep Learning for Data Mining - Aris Anagnostopoulos -...

Post on 25-Jul-2020

8 views 0 download

Transcript of Deep Learning for Data Mining - Aris Anagnostopoulos -...

Deep Learning for Data Mining

University of Rome "La Sapienza"Dep. of Computer, Control and Management Engineering A. Ruberti

Valsamis Ntouskos, ALCOR Lab

Outline

Deep Learning for Data Mining

• Introduction - Motivation

• Theoretical aspects

• Convolutional Neural Networks

• Recurrent Neural Networks

• Generative models

• Further directions

13/12/2018

Deep Learning with CNNs

Deep Learning for Data Mining

Compositional Models

Learned End-to-End

Hierarchy of Representations- vision: pixel, motif, part, object

- text: character, word, clause, sentence

- speech: audio, band, phone, word

concrete abstractlearning

Slides from Caffe framework tutorial @ CVPR2015

13/12/2018

Deep Learning with CNNs

Deep Learning for Data Mining

Compositional Models

Learned End-to-End

Back-propagation jointly learns

all of the model parameters to

optimize the output for the task.

Slides from Caffe framework tutorial @ CVPR2015

13/12/2018

Motivation of CNNs

Deep Learning for Data Mining

Inputs usually treated as general feature vectors

In some cases inputs have special structure:• Audio• Images• Videos

Signals: Numerical representations of physical quantities

Deep learning can be directly applied on signals by using suitable operators

13/12/2018

Motivation

Deep Learning for Data Mining

. . . 0.0468 0.0468 0.0468 0.0390 0.0390 0.0390 0.0546 0.0625 0.0625 0.0390 0.0312 0.0468 0.0625 . . .

1D data - (variable length) vectors

Audio

13/12/2018

Motivation

Deep Learning for Data Mining

Images

A sequence of images sampled through time - 3D data

2D data - matrices

Video

13/12/2018

What is a CNN?

Deep Learning for Data Mining 13/12/2018

Some theory

13/12/2018Deep Learning for Data Mining

Convolution

• Image filtering is

based on convolution

with special kernels

Some theory

Deep Learning for Data Mining

Pooling

• Introduces subsampling

13/12/2018

Some theory

Deep Learning for Data Mining

Activation

Standard way to model a neuron

f(x) = tanh(x) or f(x) = (1 + e-x)-1

Very slow to train (saturation)

Non-saturating nonlinearity (RELU)f(x) = max(0, x)Quick to train

13/12/2018

Some theory

Deep Learning for Data Mining

Regularization

Dropout

• Applied on the fully-connected layers

• During training prune nodes with probability α

• During testing nodes are weighed by α

Image from Srivastava et al.. "Dropout: A Simple Way to Prevent Neural Networks from Overfitting"

13/12/2018

Some theory

Deep Learning for Data Mining

Every convolutional layer of a CNN transforms the 3D input

volume to a 3D output volume of neuron activations.

A regular 3-layer Neural Network

Material from Fei-Fei’s group

13/12/2018

Some theory

Deep Learning for Data Mining

Each neuron is connected to a

local region in the input volume

spatially, but to all channels

The neurons still compute a dot

product of their weights with the

input followed by a non-linearity

Material from Fei-Fei’s group

13/12/2018

Algorithms

Deep Learning for Data Mining

• Each* neuron/layer is differentiable!

• Just apply backpropagation (chain-rule)

• Use standard gradient-based optimization algorithms

(SGD, AdaGrad, …)

• The devil lies in the details though …

▪Choosing hyperparameters / loss-function

▪Exploding/Vanishing gradients – batch normalization

▪Overfitting – Regularization

▪Cost of performing experiments

▪Convergence

▪…

*what about max-pooling?

13/12/2018

Kernels and

Feature maps

Deep Learning for Data Mining

Material from Fei-Fei’s group

13/12/2018

Case study: image classification

Deep Learning for Data Mining

Slides from Caffe framework tutorial @ CVPR2015

13/12/2018

Deep Learning for Data Mining

Case study: image classification

Slides from Caffe framework tutorial @ CVPR2015

13/12/2018

Brief history of CNNs

Foundational work done in the middle of the 1900s

• 1940s-1960s: Cybernetics [McCulloch and Pitts 1943,

Hebb 1949, Rosenblatt 1958]

• 1980s-mid 1990s: Connectionism [Rumelhart 1986,

Hinton 1989]

• 1990s: modern convolutional networks [LeCun et al.

1998], LSTM [Hochreiter & Schmidhuber 1997,

MNIST and other large datasets]

Deep Learning for Data Mining 13/12/2018

Brief history of CNNs

Deep Learning for Data Mining

Hubel & Wiesel [60s] Simple & Complex cells architecture:

Hubel & Wiesel [60s] Simple & Complex cells architecture Fukushima’s Neocognitron [70s]

Yann LeCun’s Early CNNs [80s]:

13/12/2018

Brief history of CNNs

Deep Learning for Data Mining

Convolutional Networks: 1989

LeNet: a layered model composed of convolution and subsampling operations followed

by a holistic representation and ultimately a classifier for handwritten digits. [ LeNet ]

13/12/2018

Recent success

• Parallel Computation (GPU)

• Larger training sets

• International Competitions

• Theoretical advancements

– Dropout

– ReLUs

– Batch Normalization

Deep Learning for Data Mining 13/12/2018

Recent success

ImageNet

• Over 15M labeled high resolution images

• Roughly 22K categories

• Collected from web and labeled by Amazon Mechanical Turk

Deep Learning for Data Mining

Larger training sets

13/12/2018

Recent success

ILSVRC

• Annual competition of image classification at large scale

• 1.2M images in 1K categories

• Classification: make 5 guesses about the image label

Deep Learning for Data Mining

Competitions

EntleBucher Appenzeller

13/12/2018

CNNs in Computer Vision

Deep Learning for Data Mining

• Image classification

13/12/2018

Evolution of CNNs for image classification

Deep Learning for Data Mining

AlexNet: a layered model composed of convolution, subsampling, and further operations

followed by a holistic representation and all-in-all a landmark classifier on

ILSVRC12. [ AlexNet ]

Convolutional Nets: 2012

AlexNet

13/12/2018

Evolution of CNNs for image classification

Deep Learning for Data Mining

Convolutional Nets: 2014

ILSVRC14 Winners: ~6.6% Top-5 error

- GoogLeNet: composition of multi-scale

dimension-reduced modules

+ depth

+ data

+ dimensionality reduction

13/12/2018

Evolution of CNNs for image classification

Deep Learning for Data Mining

Convolutional Nets: 2014

ILSVRC14 Winners: ~6.6% Top-5 error

- VGG: 16 layers of 3x3 convolution

interleaved with max pooling +

3 fully-connected layers

+ depth

+ data

+ dimensionality reduction

13/12/2018

Evolution of CNNs for image classification

Deep Learning for Data Mining

Convolutional Nets: 2015

ResNet

ILSVRC15 Winner: ~3.6% Top-5 error

Intuition: Easier to learn zero than identity function

13/12/2018

Reasonable questions

• Is this just for a particular dataset? – No!

Deep Learning for Data Mining

Slides from ICCV 2015 Math of Deep Learning tutorial

13/12/2018

Reasonable questions

Deep Learning for Data Mining

Object Localization[R-CNN, HyperColumns, Overfeat, etc.]

Pose estimation [Thomson et al, CVPR’15]

• Is this just for a particular task? – No!

Slides from ICCV 2015 Math of Deep Learning tutorial

13/12/2018

Reasonable questions

Deep Learning for Data Mining

Semantic Segmentation[Pinhero, Collobert, Dollar, ICCV’15]

• Is this just for a particular task? – No!

Slides from ICCV 2015 Math of Deep Learning tutorial

13/12/2018

Fine Tuning

Deep Learning for Data Mining

Dogs vs. Cats

top 10 in 10 minutes

Take a pre-trained model and fine-tune to new tasks [DeCAF] [Zeiler-Fergus] [OverFeat]

© kaggle.com

Your Task

Style

RecognitionLots of Data

ImageNet

13/12/2018

RNNs

Dealing with sequences

• data points are related

• order is important!

(C)NNs disregard this information

Deep Learning for Data Mining 13/12/2018

RNNs

Recurrent Neural Networks (RNNs) address this problem by introducing cells with loops

13/12/2018Deep Learning for Data Mining

• allow information to persist

• the value of output 𝐡𝑡 depends both on 𝐱𝑡 and the previous output 𝐡𝑡−1

Material from Colah’s blog

RNNs

Another way to see RNNs is to unfold the loop

• each cell dedicated to a different data sample

• output 𝐡𝑡 computed based also on 𝐡𝑡−1

13/12/2018Deep Learning for Data Mining

RNNs

RNN structure

13/12/2018Deep Learning for Data Mining

Material from Colah’s blog

RNNs

RNNs (and variants) are very effective in various domains:

• Speech recognition

• Translation

• Image/Video captioning

• Video summarization

• Activity recognition

• ...

13/12/2018Deep Learning for Data Mining

RNNs

a) "The clouds are in the _."

13/12/2018Deep Learning for Data Mining

ProblemRNNs cannot capture long-term dependencies• so-called vanishing/exploding gradients

problem

b) "I grew up in France... I speak fluently _."

LSTMs

Long Short Term Memories (LSTMs)

• RNNs with special structure

• designed to capture long-term dependencies

13/12/2018Deep Learning for Data Mining

Hochreiter & Schmidhuber (1997) Long short-term memory

LSTMs

Main idea: Separate cell output 𝐡𝑡 and cell state 𝐶𝑡• state is modified through a series of

structures called gates

13/12/2018Deep Learning for Data Mining

LSTMs – step by step

1st gate: forget mechanism

• drop elements of 𝐶𝑡−1 based on values of 𝐡𝑡−1and input 𝐱𝑡

13/12/2018Deep Learning for Data Mining

LSTMs – step by step

2nd and 3rd gate: update mechanism

i. decide which elements of 𝐶𝑡−1 to update

ii. compute the update vector ሚ𝐶𝑡

13/12/2018Deep Learning for Data Mining

LSTMs – step by step

2nd and 3rd gate: update mechanism

iii. update elements selected via 𝑖𝑡 with values computed in ሚ𝐶𝑡

13/12/2018Deep Learning for Data Mining

LSTMs – step by step

Last step: compute output

• Based on 𝐶𝑡 , 𝐡𝑡−1, 𝐱𝑡

13/12/2018Deep Learning for Data Mining

LSTM variants 1/3

Add peephole connections

All layers “see” the state 𝐶𝑡−1

13/12/2018Deep Learning for Data Mining

Gers & Schmidhuber (2000) Recurrent nets that time and count

LSTM variants 2/3

Coupled input and forget gates:

• only forget something if you are going to input something else

13/12/2018Deep Learning for Data Mining

LSTM variants 3/3

Gated Recurrent Units (GRUs)

• no cell state! - directly operate on output 𝐡𝑡• just two gates instead of three

13/12/2018Deep Learning for Data Mining

Yao et al. (2015) Depth-gated recurrent neural networks

LSTMs

Further improvements:

• Further modifications of gates/structure

• Bi-directional LSTMs

• Attention mechanisms

• ...

13/12/2018Deep Learning for Data Mining

LSTMs – typical use cases

One-to-one – classical NNs (no RNN)

One-to-many – sequence output (e.g. image captioning)

Many-to-one – sequence input (e.g. sentiment analysis)

Many-to-many – sequence to sequence (e.g. machine translation)

Many-to-many synced – e.g. frame per frame video analysis

13/12/2018Deep Learning for Data Mining

LSTMs – use case examples

Deep Learning for Data Mining

Visual Sequence Tasks

Jeff Donahue et al. CVPR’15

68

Based on Long short-term memory (LSTM) layers

13/12/2018

Using LSTMs

Some videos:

- Facebook AI

- Music classification

- Action recognition SecondHands project

13/12/2018Deep Learning for Data Mining

Beyond Regression/Classification

Generative models

• Variational Auto-Encoders (VAEs)

– focus on learning latent space structure

• Generative Adversarial Networks (GANs)

– focus on learning a distribution

– no latent space (in general)

13/12/2018Deep Learning for Data Mining

GANs

Goal

Sample from the input data distribution 𝒳

Idea

Invert a (Convolutional) Neural Network

13/12/2018Deep Learning for Data Mining

GANs

Encoder (conventional CNN)

• processes an image and produces a vector/code

13/12/2018Deep Learning for Data Mining

GANs

Decoder

• receives a code and produces an image

• uses “deconvolutional” layers

13/12/2018Deep Learning for Data Mining

Material from Frans’ blog

GANs

Problem

How to train the decoder to produce meaningful data?

Idea

Adversarial Training

13/12/2018Deep Learning for Data Mining

GANs

GAN is a combination of two networks

1. a generator network (decoder)

2. a discriminator network (critic)

13/12/2018Deep Learning for Data Mining

GANs

Roles

• Generator: produces realistic samples of 𝒳

• Discriminator: identifies if a sample actually comes from the (unknown) 𝒳 or not

Training

make the networks compete with each other

• generator tries to fool the discriminator in believing that the sample is ‘real’

• discriminator tries to discriminate as good as possible ‘real’ from ‘fake’ samples

13/12/2018Deep Learning for Data Mining

GANs

Example on CIFAR dataset (32x32 images)

Are these real images or generated?

13/12/2018Deep Learning for Data Mining

GANs

Example on CIFAR dataset (32x32 images)

Are these real images or generated?

13/12/2018Deep Learning for Data Mining

GANs

Example on CIFAR dataset (32x32 images)

Results at 300, 900 and 5700 iterations

13/12/2018Deep Learning for Data Mining

GANs

Example: Celebrity faces (1024x1024 images)

13/12/2018Deep Learning for Data Mining

VAEs

Goal

• modify data in specific directions

• identify meaningful directions in latent space

Examples

- ‘distort’ faces (change expression, add glasses)

- produce digits from different hand-writing styles

- distort 3D meshes

- ... 13/12/2018Deep Learning for Data Mining

VAEs

What is an auto-encoder?

- a combination of an encoder and a decoder

- trained based on reconstruction loss

- provides low-dimensional representation

13/12/2018Deep Learning for Data Mining

Material from Kurita’s blog

VAEs

Goal

Feed vectors and get realistic samples of 𝒳

AE problem: we don’t know if the vectors are valid or not

13/12/2018Deep Learning for Data Mining

VAEs

Main idea:

Encoder produces a distribution instead of a vector

Decoder operates on samples from this distribution

13/12/2018Deep Learning for Data Mining

VAEs

Problems

1. How to produce a distribution?

2. How to prevent degeneration?

Solutions

1. parametric distributions – typically Gaussian

– produce mean 𝜇 and variance Σ

2. add loss term based on Kullback-Leiblerdivergence

13/12/2018Deep Learning for Data Mining

VAEs

One more problem:

- Sampling operation is not differentiable

Solution:

- re-parametrization

13/12/2018Deep Learning for Data Mining

Images from Keita Kurita’s blog

VAEs

Example: faces (Hou et al., 2016)

13/12/2018Deep Learning for Data Mining

VAEs

Example: 3D Mesh deformation (Tan et al., 2018)

13/12/2018Deep Learning for Data Mining

Further directions

Few-shot and zero-shot learning

• meta-learner architectures

• Prototypical networks

13/12/2018Deep Learning for Data Mining

Further directions

Few-shot and zero-shot learning

• meta-learner architectures

• Prototypical networks

13/12/2018Deep Learning for Data Mining

One –show learning example: Mullet Zero–show learning example: Zebra (from horse)

Further directions

Few-shot and zero-shot learning

• meta-learner architectures

• Prototypical networks

Learning on manifolds

• learning functions defined on surfaces

13/12/2018Deep Learning for Data Mining

Further directions

Few-shot and zero-shot learning

• meta-learner architectures

• Prototypical networks

Learning on manifolds

• learning functions defined on surfaces

• learning functions defined on graphs

13/12/2018Deep Learning for Data Mining

Further directions

Few-shot and zero-shot learning

• meta-learner architectures

• Prototypical networks

Learning on manifolds

• learning functions defined on surfaces

• learning functions defined on graphs

Learning with constraints

• Frank-Wolf networks

13/12/2018Deep Learning for Data Mining

Resources

Frameworks:

• TensorFlow (Google) | C/C++, Python, Java, Go

• Torch/PyTorch (Facebook) | Lua/Python

• Caffe/Caffe 2 (UC Berkeley) | C/C++, Python, Matlab

• Theano (U Montreal) | Python

• CNTK (Microsoft) | Python, C++ , C#/.Net, Java

• MxNet (DMLC) | Python, C++, R, Perl, …

• Darknet (Redmon J.) | C

• …

Deep Learning for Data Mining 13/12/2018

Resources

High-level libraries:

• Keras | Backends: TensorFlow (TF), Theano

Models:

• Depends on the framework, e.g.

– https://github.com/BVLC/caffe/wiki/Model-Zoo (Caffe)

– https://github.com/tensorflow/models/tree/master/research (TF)

Interactive Interfaces:

• DIGITS (NVIDIA) | Caffe, TF, Torch

• TensorBoard (TF)

Tools:

• http://ethereon.github.io/netscope (for networks defined in protobuf )

Deep Learning for Data Mining 13/12/2018

Thank you!

13/12/2018Deep Learning for Data Mining