Machine Learning for biology - univ-rennes1.fr

Machine Learning for biology

V. Monbet

UFR de Mathématiques, Université de Rennes 1

V. Monbet (UFR Math, UR1) Machine Learning for biology (2019) 1 / 26

Introduction

Outline

1 Introduction

2 Dimension Reduction

3 Unsupervised learning

4 Supervised learning

5 Linear model (I)

6 Linear model (II)

7 Data driven supervised learning

8 Ensemble methods (I)

9 Ensemble methods (II)

10 Neural Networks

11 Deep Learning

12 Kernel methods (I)

13 Kernel methods (II)

Deep Learning

Outline

11 Deep Learning
    Introduction
    Pre-processing for deep learning for images
    Tricks to help the learning task


Deep Learning: Introduction




Deep Learning vs Machine Learning

Deep Learning is efficient for classification on very large databases of images (or other complex data).

Before Deep Learning, the standard Machine Learning process was:
1. Extract relevant features from the data.
2. Fit a model (linear, ANN, tree, ...).
3. Predict for new observations.

With Deep Learning, feature extraction is included in the algorithm. But the price to pay is that it requires:
- very large databases,
- a very long learning time.



Why is Deep Learning so famous?

In 2012, AlexNet (University of Toronto) won the ImageNet competition (1.2M images, 1000 labels) with an algorithm based on convolutional networks. It obtained a (top-5) classification error of around 15%, while the second-best team had a classification error of around 26%. The convolutional network acts as a black-box feature extractor.

In 2013, the Clarifai ConvNet reached about 11% error.

Recently these algorithms have been shown to be efficient for many problems.


Deep Learning: Pre-processing for deep learning for images




Deep learning

The main ideas/ingredients of deep learning are:

- pre-processing the data in order to extract typical features/patterns automatically. There are 2 usual methods for feature extraction (and neural network initialisation) in Deep Learning: convolutional neural networks (CNN) and autoencoders (AE);

- stacking neural networks.



Convolutional neural network

When we look at a picture of a dog, we can classify it as such if the picture has identifiable features such as paws or 4 legs.

In a similar way, the computer is able to perform image classification by looking for low-level features such as edges and curves, and then building up to more abstract concepts through a series of convolutional layers.



CNN scheme

A CNN aims at building new (latent) features to reduce the dimension of the input space and focus on the important characteristics of the object to recognize. The input of a CNN is an image or a time series, and the output is the set of new features for the given input. This pre-processing stage is based on a combination of convolution steps and pooling steps.



Convolution

Convolution helps to extract typical features from the data.

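Each image x of size r × c is scanned by all (r − a + 1) × (c − b + 1) overlapping patches x_s of size a × b, and the same filter

$f_s(x) = \sigma(\omega x_s + \beta)$

is applied to each patch. With k filters (one row of ω per filter), this gives an array of size k × (r − a + 1) × (c − b + 1).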


ω and β are "weights" to be learned.

Different values of ω will lead to different filters.


http://ufldl.stanford.edu/tutorial/supervised/FeatureExtractionUsingConvolution/
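To make the formula concrete, here is a minimal NumPy sketch, assuming (as in the UFLDL tutorial) that ω is a k × (ab) matrix with one row per filter, β a vector of k biases and σ the sigmoid. The function name `convolve_features` and the random, untrained weights are illustrative only. As in most deep-learning libraries, the filter is slid over the image without being flipped.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def convolve_features(x, omega, beta, a, b):
    """Apply f_s(x) = sigmoid(omega @ x_s + beta) to every a x b patch x_s of x.

    x     : (r, c) image
    omega : (k, a*b) weights, one row per filter (hypothetical layout)
    beta  : (k,) biases
    Returns an array of shape (k, r - a + 1, c - b + 1).
    """
    r, c = x.shape
    k = omega.shape[0]
    out = np.empty((k, r - a + 1, c - b + 1))
    for i in range(r - a + 1):
        for j in range(c - b + 1):
            x_s = x[i:i + a, j:j + b].reshape(-1)     # flatten the patch
            out[:, i, j] = sigmoid(omega @ x_s + beta)  # same filters for every patch
    return out

rng = np.random.default_rng(0)
x = rng.random((28, 28))             # toy 28 x 28 image
omega = rng.normal(size=(8, 5 * 5))  # k = 8 random (untrained) 5 x 5 filters
beta = np.zeros(8)
features = convolve_features(x, omega, beta, 5, 5)
print(features.shape)  # (8, 24, 24)
```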


Example of filters

https://towardsdatascience.com/




Example of filtering

Figure: the image to be filtered.



Example of filtering

Zoom on a part of the image

Focus on the vertical "line"; it may look like this:

The sum of the individual cell multiplications is 0 + 0 + 0 + 200 + 225 + 225 + 0 + 0 + 0 = 650. This is a relatively high value compared to what another arrangement of the filter matrix might produce, because both the image section and the filter have high values in the center column and low values elsewhere. So, for this initial filter action, we can say that the filter has detected a vertical line.

Credit: Peter Bruce & https://blogs.scientificamerican.com/observations/a-deep-dive-into-deep-learning/
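The arithmetic of this example can be reproduced in a few lines of NumPy. The 3 × 3 image section below is hypothetical: only its centre column (200, 225, 225) is chosen to match the sum above, and the filter is assumed to have 1s in its centre column and 0s elsewhere. A horizontal-line filter is shown for comparison.

```python
import numpy as np

# Hypothetical 3 x 3 image section: bright centre column, darker surroundings.
patch = np.array([[10, 200, 15],
                  [20, 225, 10],
                  [15, 225, 20]])

# Vertical-line filter: 1 in the centre column, 0 elsewhere.
vertical = np.array([[0, 1, 0],
                     [0, 1, 0],
                     [0, 1, 0]])
horizontal = vertical.T  # horizontal-line filter, for comparison

def response(patch, filt):
    """Filter response: sum of the element-wise products of patch and filter."""
    return int(np.sum(patch * filt))

print(response(patch, vertical))    # 200 + 225 + 225 = 650
print(response(patch, horizontal))  # 20 + 225 + 10 = 255, a weaker response
```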




Convolutional Neural Network

The first convolution layer extracts simple shapes (vertical lines, horizontal lines, ...) from the input pixels, the second layer extracts slightly more complex shapes from these basic shapes, and so on: each layer sees larger and larger portions of the image.



Convolution parameters

Filter size F (usually an odd value). Stacking small filters requires fewer parameters than using one large filter.

Zero-padding width P: number of zero pixels added around the borders of the image.

Stride S: step size by which the filter moves across the input.

Number of filters N (or feature maps) applied in the given layer.

Figure: left panel F=3, P=1, S=1, N=1; right panel F=3, P=1, S=2, N=1 (filter weights in green).

Size of a layer: if the input volume has spatial size W, the output volume has spatial size

$(W - F + 2P)/S + 1$

Example: If the input is 7×7 and the filter 5×5 with padding 0 and stride 1, the outputis 3×3.
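The formula can be wrapped in a small helper (the name `conv_output_size` is ours, not a library function). It raises an error when W − F + 2P is not a multiple of S, since the filter then does not tile the padded input evenly.

```python
def conv_output_size(W, F, P=0, S=1):
    """Spatial output size (W - F + 2P)/S + 1 of a convolution along one axis."""
    if (W - F + 2 * P) % S != 0:
        raise ValueError("filter does not tile the padded input evenly with this stride")
    return (W - F + 2 * P) // S + 1

print(conv_output_size(7, 5, P=0, S=1))  # 3, the example above
print(conv_output_size(7, 3, P=1, S=1))  # 7: with F=3, P=1 keeps the input size
print(conv_output_size(7, 3, P=1, S=2))  # 4
```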

http://cs231n.github.io/convolutional-networks/




Recap

To summarize, the conv layer:

Accepts a volume of size $W_1 \times H_1 \times D_1$.

Requires four hyperparameters:
- the number of filters N,
- their spatial extent F (default 3 or 5),
- the stride S (default S = 1),
- the amount of zero-padding P (default chosen to keep the dimension of the input).

Produces a volume of size $W_2 \times H_2 \times D_2$ where:

$W_2 = (W_1 - F + 2P)/S + 1, \quad H_2 = (H_1 - F + 2P)/S + 1, \quad D_2 = N$

With parameter sharing, it introduces $F^2 D_1$ weights per filter, for a total of $F^2 D_1 N$ weights and N biases.

In the output volume, the d-th depth slice (of size $W_2 \times H_2$) is the result of performing a valid convolution of the d-th filter over the input volume with a stride of S, and then offsetting it by the d-th bias.
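The recap can be checked with a naive NumPy forward pass. This is a sketch for illustration only (libraries use much faster vectorised routines); the function name `conv_layer`, the (N, F, F, D1) weight layout and the random weights are assumptions.

```python
import numpy as np

def conv_layer(x, weights, biases, S=1, P=0):
    """Naive forward pass of a convolutional layer.

    x       : input volume, shape (W1, H1, D1)
    weights : N filters, shape (N, F, F, D1)
    biases  : shape (N,)
    Returns the output volume, shape (W2, H2, N).
    """
    W1, H1, D1 = x.shape
    N, F = weights.shape[0], weights.shape[1]
    xp = np.pad(x, ((P, P), (P, P), (0, 0)), mode="constant")  # zero-pad spatial axes only
    W2 = (W1 - F + 2 * P) // S + 1
    H2 = (H1 - F + 2 * P) // S + 1
    out = np.empty((W2, H2, N))
    for d in range(N):  # d-th depth slice: d-th filter, then d-th bias
        for i in range(W2):
            for j in range(H2):
                window = xp[i * S:i * S + F, j * S:j * S + F, :]
                out[i, j, d] = np.sum(window * weights[d]) + biases[d]
    return out

rng = np.random.default_rng(0)
x = rng.random((32, 32, 3))               # e.g. a 32 x 32 RGB image
weights = rng.normal(size=(10, 3, 3, 3))  # N = 10 filters, F = 3, D1 = 3
biases = np.zeros(10)
y = conv_layer(x, weights, biases, S=1, P=1)
print(y.shape)                     # (32, 32, 10)
print(weights.size + biases.size)  # F*F*D1*N + N = 270 + 10 = 280
```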



Pooling

Once the convolved features are available, the dimension of the spatial space is reduced by pooling.

Each convolved image is split into patches (for instance 4 patches) and the mean (or the max) of the values in each patch is computed.

Example of max-pooling

This step significantly reduces the dimension of the learning space and, as a consequence, the number of parameters. It helps to control overfitting.



Pooling parameters

The pooling layer:

Accepts a volume of size $W_1 \times H_1 \times D_1$.

Requires two hyperparameters:
- the spatial extent F,
- the stride S.

Produces a volume of size $W_2 \times H_2 \times D_2$ where:

$W_2 = (W_1 - F)/S + 1, \quad H_2 = (H_1 - F)/S + 1, \quad D_2 = D_1$

It introduces zero parameters, since it computes a fixed function of the input.

For Pooling layers, it is not common to pad the input using zero-padding.
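A minimal NumPy sketch of a pooling layer with extent F and stride S (the name `pool` and the 4 × 4 toy input are illustrative). Passing `np.max` gives max-pooling and `np.mean` mean-pooling; the depth D1 is unchanged and nothing is learned.

```python
import numpy as np

def pool(x, F=2, S=2, op=np.max):
    """Pool each depth slice of x (shape W1 x H1 x D1) with extent F and stride S."""
    W1, H1, D1 = x.shape
    W2 = (W1 - F) // S + 1
    H2 = (H1 - F) // S + 1
    out = np.empty((W2, H2, D1))
    for i in range(W2):
        for j in range(H2):
            window = x[i * S:i * S + F, j * S:j * S + F, :]
            out[i, j, :] = op(window, axis=(0, 1))  # one value per depth slice
    return out

x = np.arange(16, dtype=float).reshape(4, 4, 1)
print(pool(x)[:, :, 0])              # max-pooling: [[5, 7], [13, 15]]
print(pool(x, op=np.mean)[:, :, 0])  # mean-pooling: [[2.5, 4.5], [10.5, 12.5]]
```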



ConvNet architecture

Choice of the architecture: in 90% or more of applications, the best solution is to look at whatever architecture currently works best on ImageNet, download a pretrained model and fine-tune it on your data.



CNN in practice

Example with Keras + Google Colab:

https://drive.google.com/open?id=1A7xgkLqVlWwi40FTqmDNmntsjD2R2SUb




Autoencoders

"Autoencoding" is a data compression algorithm where the compression and decompression functions are 1) data-specific, 2) lossy, and 3) learned automatically from examples rather than engineered by a human. Additionally, in almost all contexts where the term "autoencoder" is used, the compression and decompression functions are implemented with neural networks.

Autoencoders have 2 main goals:
- data denoising,
- dimensionality reduction.



Autoencoders

Autoencoders include:
- one (or more) neural network(s) to encode the original data (reduce the dimension)

$\varphi : X \to F$

- one (or more) neural network(s) to decode (i.e. retrieve the original data)

$\psi : F \to X$.

They are fitted so that the reconstruction error is small, i.e.

$x' = \psi(\varphi(x)) \approx x$

For instance, with a single layer in both the encoder and the decoder, we want to minimize

$E = \|x - x'\|^2 = \|x - \sigma'(W'\sigma(Wx + \beta) + \beta')\|^2$

where x is the input and x′ the output of the autoencoder.
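A minimal NumPy sketch of this single-layer autoencoder, trained by plain gradient descent on toy data. Assumptions: the encoder activation σ is the sigmoid, the output activation σ′ is taken to be the identity, the code space F has dimension 2, and the data, learning rate and number of epochs are made up for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
# Toy data: 200 points in dimension 5 lying close to a 2-D subspace.
Z = rng.normal(size=(200, 2))
X = Z @ rng.normal(size=(2, 5)) + 0.05 * rng.normal(size=(200, 5))
n, d = X.shape
h = 2  # dimension of the code space F

W = 0.1 * rng.normal(size=(h, d)); beta = np.zeros(h)              # encoder phi
W_dec = 0.1 * rng.normal(size=(d, h)); beta_dec = np.zeros(d)      # decoder psi (W', beta')

lr = 0.05
losses = []
for epoch in range(2000):
    H = sigmoid(X @ W.T + beta)     # codes phi(x) = sigma(W x + beta)
    X_rec = H @ W_dec.T + beta_dec  # x' = psi(phi(x)), sigma' = identity here
    R = X_rec - X
    losses.append(np.mean(np.sum(R ** 2, axis=1)))  # average of ||x - x'||^2
    # Back-propagation of the average reconstruction error
    G = 2 * R / n
    grad_W_dec, grad_beta_dec = G.T @ H, G.sum(axis=0)
    D = (G @ W_dec) * H * (1 - H)   # back through the sigmoid
    grad_W, grad_beta = D.T @ X, D.sum(axis=0)
    W -= lr * grad_W; beta -= lr * grad_beta
    W_dec -= lr * grad_W_dec; beta_dec -= lr * grad_beta_dec

print(losses[0], losses[-1])  # the reconstruction error should decrease
```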


Deep Learning: Tricks to help the learning task




Overfitting when training ANNs

Regularization: add a ridge (L2) penalty on the weights to the loss function.

Early stopping: the optimization algorithm stops when the RMSE computed on a validation set (e.g. an out-of-bag sample) starts to increase.

Dropout: at each training step, a random fraction of the units is temporarily set to zero.
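Dropout is easy to sketch. The version below is the common "inverted dropout" variant, which rescales the kept units during training so that the layer is left unchanged at test time (the original formulation instead rescales the weights at test time). The function name, the drop probability 0.5 and the all-ones activations are illustrative assumptions.

```python
import numpy as np

def dropout(h, p_drop, rng, training=True):
    """Inverted dropout: zero each unit with probability p_drop during training
    and rescale the survivors by 1/(1 - p_drop) to keep the expected activation."""
    if not training or p_drop == 0.0:
        return h  # at test time the layer is the identity
    mask = rng.random(h.shape) >= p_drop
    return h * mask / (1.0 - p_drop)

rng = np.random.default_rng(0)
h = np.ones((1000, 100))   # hypothetical hidden-layer activations
out = dropout(h, 0.5, rng)
print((out == 0).mean())   # close to 0.5: about half the units are dropped
print(out.mean())          # close to 1.0: expected activation preserved
```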



Other gradient descent algorithms used in (deep) learning

Momentum
SGD has trouble navigating ravines, i.e. areas where the surface curves much more steeply in one dimension than in another. Momentum helps accelerate SGD in the relevant direction.

$m_r = \alpha m_{r-1} + \gamma_r \nabla_\theta J(\theta; x_{i:i+k}, y_{i:i+k})$

$\theta_{r+1} = \theta_r - m_r$

α is usually fixed to a value close to 0.9.
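The effect of momentum in a ravine can be seen on a hypothetical quadratic whose curvature is 50 times larger along θ2 than along θ1. For simplicity the sketch uses the full gradient instead of a mini-batch and a constant learning rate γ = 0.03. Plain gradient descent cannot use a much larger γ without diverging along θ2 (it needs γ < 2/50), so it crawls along θ1, while momentum with α = 0.9 builds up speed in that direction.

```python
import numpy as np

# Hypothetical ravine: J(theta) = 0.5 * (theta_1**2 + 50 * theta_2**2).
curvature = np.array([1.0, 50.0])

def grad_J(theta):
    return curvature * theta

def gradient_descent(alpha, gamma=0.03, steps=100):
    """Full-batch gradient descent with momentum alpha (alpha = 0 gives plain GD)."""
    theta = np.array([10.0, 1.0])
    m = np.zeros_like(theta)
    for _ in range(steps):
        m = alpha * m + gamma * grad_J(theta)  # m_r = alpha m_{r-1} + gamma grad J
        theta = theta - m                      # theta_{r+1} = theta_r - m_r
    return theta

plain_dist = np.linalg.norm(gradient_descent(alpha=0.0))
momentum_dist = np.linalg.norm(gradient_descent(alpha=0.9))
print(plain_dist, momentum_dist)  # momentum ends much closer to the minimum (0, 0)
```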



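Other gradient descent algorithms used in (deep) learning

Adam (Adaptive Moment Estimation) computes adaptive learning rates for each parameter.

$g_{r,j} = \nabla_{\theta_j} J(\theta; x_{i:i+k}, y_{i:i+k})$

$m_r = \alpha_1 m_{r-1} + (1 - \alpha_1) g_{r,j}$

$\hat{m}_r = m_r / (1 - \alpha_1^r)$

$\nu_r = \alpha_2 \nu_{r-1} + (1 - \alpha_2) g_{r,j}^2$

$\hat{\nu}_r = \nu_r / (1 - \alpha_2^r)$

$m_r$ and $\nu_r$ are estimates of the first moment and the second moment of the gradients, respectively.

$\theta_{r+1} = \theta_r - \gamma_r \frac{\hat{m}_r}{\sqrt{\hat{\nu}_r} + \epsilon}$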
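A minimal NumPy sketch of these updates, assuming a constant learning rate γ, full-batch gradients and the usual default values α1 = 0.9, α2 = 0.999, ε = 1e−8. The quadratic test function and the name `adam` are illustrative. Because every operation is element-wise, each parameter θj gets its own effective step size.

```python
import numpy as np

def adam(grad, theta0, gamma=0.05, alpha1=0.9, alpha2=0.999, eps=1e-8, steps=1000):
    """Adam with the notation of the slide and a constant learning rate gamma."""
    theta = np.asarray(theta0, dtype=float)
    m = np.zeros_like(theta)   # first-moment estimate
    nu = np.zeros_like(theta)  # second-moment estimate
    for r in range(1, steps + 1):
        g = grad(theta)                          # one entry g_{r,j} per parameter j
        m = alpha1 * m + (1 - alpha1) * g
        nu = alpha2 * nu + (1 - alpha2) * g ** 2
        m_hat = m / (1 - alpha1 ** r)            # bias-corrected moments
        nu_hat = nu / (1 - alpha2 ** r)
        theta = theta - gamma * m_hat / (np.sqrt(nu_hat) + eps)
    return theta

# Hypothetical test problem: J(theta) = 0.5 * (theta_1**2 + 50 * theta_2**2).
curvature = np.array([1.0, 50.0])
theta = adam(lambda t: curvature * t, [1.0, 1.0])
print(theta)  # should be close to the minimum (0, 0)
```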



Transfer learning

Training a deep neural network requires a huge volume of data.

If your database is "small", you can use transfer learning: use a model trained on ImageNet to initialize your network.

There are 2 main strategies:
- use the features extracted by a pretrained network as input to a classifier;
- fine-tune the pretrained network (only the last layers).



Data augmentation

Data augmentation helps to increase the robustness of the model.

It consists of adding transformed images to the initial database.

Randomly add rotations or translations, change brightness, change colors, blur, etc.
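A minimal NumPy sketch of data augmentation for grey-level images, assuming pixel values in [0, 1] and square images. To stay dependency-free it only uses rotations by multiples of 90°, horizontal flips, small circular shifts (`np.roll`) and brightness scaling; arbitrary-angle rotations, colour changes and blur would need an image-processing library. The function name `augment` and the random "database" are hypothetical. In a classification task, each transformed image keeps the label of its original.

```python
import numpy as np

def augment(img, rng):
    """Return a randomly transformed copy of a square grey-level image in [0, 1]."""
    out = np.rot90(img, k=rng.integers(4))           # rotation by a multiple of 90 degrees
    if rng.random() < 0.5:
        out = np.fliplr(out)                         # horizontal flip
    dx, dy = rng.integers(-2, 3, size=2)
    out = np.roll(out, shift=(dx, dy), axis=(0, 1))  # small (circular) translation
    out = np.clip(out * rng.uniform(0.8, 1.2), 0.0, 1.0)  # brightness change
    return out

rng = np.random.default_rng(0)
images = rng.random((10, 28, 28))  # hypothetical database of 10 images
augmented = np.stack([augment(im, rng) for im in images])
database = np.concatenate([images, augmented])  # initial + transformed images
print(database.shape)  # (20, 28, 28)
```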



What should we retain about deep learning?

Deep Learning is mainly used for classification tasks.

Deep Learning works on large databases of structured data (like images, time series).

The main ideas are to stack ANN and use good optimization algorithms.



Deep Learning in Practice

There are several packages/modules to run deep learning methods.

Keras is available in Python and R (2 examples in the lab).

Google


