Deep Learning Training Course - morrisriedel.de · High Productivity Data Processing Group Juelich...

Introduction to Deep Learning Models

Dr. – Ing. Gabriele CavallaroPostdoctoral ResearcherHigh Productivity Data Processing GroupJuelich Supercomputing Centre, Germany

Introduction to Deep Learning for Remote Sensing and 1D/2D CNNs for Hyperspectral Images Classification

May 22th, 2019Juelich Supercomputing Centre, Juelich, Germany

Deep Learning Training Course

LECTURE 5

Outline of the Course

1. Introduction to Machine Learning Fundamentals

2. Deep Learning: Overview

3. Neural Networks

4. Convolutional Neural Networks (CNNs)

5. Introduction to Deep Learning for Remote Sensing and 1D/2D CNNs for

Hyperspectral Images Classification

6. 3D CNNs for Hyperspectral Images Classification, Training Set Selection and

Performance Evaluation

7. Recurrent Neural Networks

8. LSTM Neural Networks

9. Deep Reinforcement Learning

10. Course Summary and Lessons Learned

Lecture 5 - Introduction to DL for RS and 1D and 2D CNNs for Hyperspectral Images Classification 2

Outline


Outline

▪ Remote Sensing (RS) Background

▪ Challenges for Deep Learning (DL) in RS

▪ DL and Shallow Learning

▪ Classification of Hyperspectral Images

▪ 1D Convolutional Neural Networks (CNNs)

▪ 2D CNNs


Remote Sensing Background


▪ The term remote sensing was first used in the United States in the1950s by Ms. Evelyn Pruitt of the U.S. Office of Naval Research

Remote (without physical contact) Sensing (measurement of information)

▪ Measurement of radiation of differentwavelengths reflected or emitted fromdistant objects or materials

▪ They may be categorized by class/type,substance, and spatial distribution

[1] Satellite (1960)

[2] The Earth-Atmosphere Energy Balance

Remote Sensing


▪ Active Sensor: own source of illumination

▪ Capture image in day and night

▪ Any weather or cloud conditions

▪ Passive Sensor: natural light available

▪ Great quality satellite imagery

▪ Multispectral and Hyperspectral technology

▪ Platform: selected according to the application

[3] Active-and-passive-remote-sensing

Platforms and Sensors

Lecture 5 - Introduction to DL for RS and 1D and 2D CNNs for Hyperspectral Images Classification

DATA ACQUISITION

PROCESSING PRODUCTS

7

▪ Landscapes vary greatly in their spatial complexity

▪ Some may be represented clearly at coarse levels of detail

▪ Others are so complex that the finest level of detail is required

▪ The spatial minimum area discernable by a pixel (i.e., picture element)

▪ Influenced primarily by the sensor and the platform altitude

1m 10m 30m

Spatial Resolution


▪ Remote sensing observes at varied wavelengths

▪ Ability to define fine wavelength intervals

▪ The finer the spectral resolution, the narrower the wavelength range of a particular band

Landsat (7-11 bands) AVIRIS (256 bands)

[4] Airborne Hyperspectral

Spectrum of a mixture of three common minerals: kaolinite, Dolomite, Hematite

Spectral Resolution


▪ Maximum number of brightness levels available

▪ Depends on the number of bits used in representing the energy recorded

▪ Higher radiometric resolution, sharper images

Radiometric Resolution


▪ The time it takes for a satellite to complete one orbit cycle (revisit time)

▪ Depends on the satellite/sensor capabilities, swath overlap and latitude

Repeated imaging enables assessment of changes in the type or condition of surface features

[5] Tsunami Damage, Gleebruk, Indonesia January 7, 2005

Temporal Resolution


• Platform: Twin polar-orbiting satellites, phased at 180° to each other

• Temporal resolution of 5 days at the equator in cloud-free conditions

▪ Images for mapping land-use change, land-cover change, biophysical variables

▪ Monitor the coastal and inland waters and help with risk and disaster mapping

[6] Earth Observation Mission Sentinel 2

~23 TB data stored per day

Sentinel 2 Mission


▪ Different methods can be applied before further visual interpretation or digital analysis

▪ Often combined, as part of a chain of operations ▪ Start with a raw image and obtain a highly transformed output product

▪ The boundaries between this operations are not well defined in practice

Remote Sensing Image Processing


IMAGE PREPROCESSING

IMAGE ENHANCEMENT

IMAGE CLASSIFICATION

ANALYSIS OF CHANGE OVER TIME

DATA FUSION AND GIS INTEGRATION

HYPERSPECTRAL IMAGE ANALYSIS

BIOPHYSICAL MODELING

DATA ACQUISITION

PROCESSING PRODUCTS

13

• Perhaps the most common form of image interpretation

• Applications: environmental management, agricultural planning, health studies, climate and biodiversity monitoring, and land change detection

Generation of thematic maps

Classification of Remote Sensing Images


▪ Important task for regularly monitoring the Earth’s surface

▪ Generation of reliable maps with spatial consistency is challenging

▪ Data from several satellite orbit tracks, different dates and presence of clouds

▪ Cannot rely on field campaigns

▪ Use existing databases to build the reference sets for supervised classification

[7] CORINE land cover

Update of Land-Cover Maps at Continent Scale


▪ E.g., Corine Land Cover Programme▪ Every 6 years a new map▪ 44 classes in the hierarchical 3-level

15

▪ Global change detection and monitoring: Deforestation

▪ Environmental assessment and monitoring: Urban growth

[12] Deforestation in Bolivia from 1986 to 2001

[11] Population growth from 1975 - 2010 of Manila

1986 2001

Applications



Challenges for Deep Learning in Remote Sensing

17

▪ DL has risen to the top in numerous areas in the last years▪ Computer vision, speech recognition, etc.

▪ Deep learning is also taking off in RS

▪ RS offers a number of unique challenges for deep learning

[13] X. X. Zhu et al


Deep Learning in Remote Sensing

Exponential increase of the papers published

18

▪ Multimodal data: geometries and content are completely different

▪ From optical (multi- and hyperspectral), Lidar, and synthetic aperture radar (SAR) sensors

▪ High temporal resolutions data▪ Shift from individual image analysis to time-series processing

▪ Big remote sensing data: high spatial and more spectral dimensionality

▪ Traditional DL systems operate on relatively small grayscale or RGB imagery

Optical data: From four to hundreds of channelsSAR images: noisy data


Main Challenges for DL in RS

19

▪ The existing datasets (especially Hyperspectral) have several limitations

▪ Small scale of scene classes and the image numbers▪ Lack of image variations and diversity▪ Saturation of accuracy

▪ Several RGB and Multispectral datasets have been released

DatasetsImagetype

Image perclass

Scene classes

Annotation type

Total images

Spatial resolution (m)

Image sizes

Year Ref.

UC Merced Aerial RGB 100 21 Single/Multi label 2100 0.3 256x256 2010 [15]WHU-RS19 Aerial RGB ~50 19 Single label 1005 up to 0.5 600x600 2012 [16]

RSSCN7 Aerial RGB 400 7 Single label 2800 -- 400x400 2015 [17]SAT-6 Aerial MS -- 6 Single label 405000 1 28x28 2015 [18]

SIRI-WHU Aerial RGB 200 12 Single label 2400 2 200x200 2016 [19]RSC11 Aerial RGB ~100 11 Single label 1323 0.2 512x512 2016 [20]

Brazilian Coffee Satellite MS 1438 2 Single label 2876 -- 64x64 2016 [21]RESISC45 Aerial RGB 700 45 Single label 31500 ~30 to 0.2 256x256 2016 [22]

AID Aerial RGB ~300 30 Single label 10000 0.6 600x600 2016 [23]EuroSAT Satellite MS ~2500 10 Single label 27000 10 64x64 2017 [24]

RSI-CB128 Aerial RGB ~800 45 Single label 36000 0.3 to 3 128x128 2017 [25]RSI-CB256 Aerial RGB ~690 35 Single label 24000 0.3 to 3 256x256 2017 [25]PatternNet Aerial RGB ~800 38 Single label 30400 0.062~4.693 256X256 2017 [26]

[14] J. E. Ball


Limited Number of Remote Sensing Datasets

20

▪ ImageNet dataset: 14,197,122 images [28] ImageNet project


Most Recent Multispectral Dataset: BigEarthNet

DatasetsImagetype

Image perclass

Scene classes

Annotation type

Total images

Spatial resolution (m)

Image sizes

Year Ref.

BigEarthNet Satellite MS328 to 217119

43 Multi label 590,326102060

120x12060x6020x20

2018

[27] G. Sumbul

21


Deep Learning and Shallow Learning

22

How to Extract Spatial Features?

• Very High Spatial Resolution images: huge amount of details


DATA ACQUISITION

PREPROCESSINGFEATURE

EXTRACTION CLASSIFICATION

▪ Traditional feature extraction methods involve approaches to extract information

▪ Based on spatial, spectral, textural, morphological content, etc.

▪ The features are designed for a specific task

▪ E.g. Enhancing the spatial information (Self-Dual Attribute Profiles - SDAPs)

Representative: they contain salient structures of the input image

Non-redundant: same objects are present only in one or few levels of the SDAP

[29] G. Cavallaro


Traditional RS Pipeline

24

▪ Shallow learning: learning networks that usually have at most one to two layers

▪ E.G. Support Vector Machines (SVMs)

▪ They compute linear or nonlinear functions of the data (often hand-designed features)

▪ DL means a deeper network with many layers of non-linear transformations

▪ No universally accepted definition of how many layers constitute a “deep” learner

▪ Typical networks are typically at least four or five layers deep.


Deep Learning and Shallow Learning

25

http://www.cs.utoronto.ca/~rgrosse/cacm2011-cdbn.pdf


Deep Networks Learn Hierarchical Feature Representations

[30] H. Lee et al.

26


Classification of Hyperspectral Images

27

RGB 3 Channels Hyperspectral

Hyperspectral Images

Lecture 5 - Introduction to DL for RS and 1D and 2D CNNs for Hyperspectral Images Classification [31] Hyperspectral remote sensing

▪ Different materials produce distinct electromagnetic radiation spectra

▪ The curve of each class has its own visual shape which is different from other classes

▪ It is relatively difficult to distinguish some classes with human eye

▪ E.g., Gravel and bricks

Gravel

Bricks

Trees 28

Challenges for Deep Learning for Hyperspectral Data

▪ The application of deep learning with hyperspectral images is less straightforward than with other optical data (i.e., RGB images)

▪ Deep learning techniques designed for computer vision are not easily transferable

▪ Most hyperspectral sensors have a low spatial resolution

▪ The spectral dimension prevails over the spatial neighborhood in most cases

▪ Moreover, the low spatial resolution limits the number of training samples

▪ Not trivial classification process


Objects smaller than thespatial resolution are mixedwith their neighborhood

29

▪ Deep networks: most straightforward evolution from shallow machine learning to deeplearning

▪ Have been implemented since the 2000s with small networks, and brought up to daterecently Convolutional Neural Networks (CNNs)

▪ CNNs can achieve competitive and even better performance than human being in somevisual problems

▪ Different CNNs architectures can be adopted for classifying Hyperspectral images▪ With 1D convolutions▪ With 2D convolutions▪ With 3D convolutions▪ With a combination of the above

▪ What distinguish them?▪ Convolutional direction▪ Input shape▪ Output shape

Deep Learning for Classification of Hyperspectral Images


[32] P. Goel

30

Practicals


▪ Based on DeepHyperX▪ Available online at https://gitlab.inria.fr/naudeber/DeepHyperX/▪ Deep learning toolbox for Hyperspectral images▪ PyTorch implementation of several state-of-the-art CNN (i.e., 1D, 2D, 3D) ▪ Tested with different public hyperspectral datasets

▪ Login with Jupyter JSC and start Jupyter (use login node)▪ Once inside open the Terminal

[33] DeepHyperX

31

https://gitlab.inria.fr/naudeber/DeepHyperX/

Practicals


▪ Navigate to the project folder with $ cd /p/project/training1911/

▪ Create you own folder (with $ mkdir ) and get inside

▪ Copy your own copy of the toolbox in your own folder with$ cp -R /p/project/training1911/practicals/lecture_5_and_6/DeepHyperX .

[33] DeepHyperX

32

Start the Interactive Session and the Environment


▪ Start an interactive session by requiring a node with GPUs$ salloc --gres=gpu:4 --partition=gpus --nodes=1 --account=training1911 --time=02:00:00 --reservation=intro-dl-wed

▪ Load the virtual environment with $ source /p/project/training1911/practicals/lecture_5_and_6/load_environment_jureca.sh

▪ Check if you are on the right node (i.e., you can see the GPUs)$ srun nvidia-smi

33

Models for the Practicals


[33] DeepHyperX

34

Hyperspectral Dataset for the Practicals


▪ Acquired by the ROSIS sensor (aerial platform) with a spatial resolution of 1.3 m

▪ Data cube of 610×340×103

▪ 9 land cover classes were annotated covering 50% of the whole surface

[31] Hyperspectral remote sensingPavia University

35

• Classification: make a prediction for a whole input

• What are the classes and ranked list

• Localization or detection: towards fine-grained inference

• Classification and spatial location (e.g., bounding boxes)

• Semantic segmentation: fine-grained inference

• Make dense predictions inferring labels for every pixel

• Further improvements: Provide different instances of the same class

• Decomposition of already segmented classes into their components

• Many applications nourish from inferring knowledge from imagery

• Autonomous driving

• Human-machine interaction

• Computational photography

• Image search engines

• Remote sensing


Progression from Coarse to Fine Inference

[34] A. Garcia-Garcia

[35] Image Segmentation 36


Task for the Practicals: Pixel-Wise Classification

SAMPLER

TRAINING PHASE(learn)

MODEL(parameters)

TEST PHASE(performance)

CLASSIFIER

ANNOTATED SAMPLES(GROUDTRUTH)

TRAININGSET

TESTSET

VALIDATIONSET

▪ Training set: used to train the model▪ How do we ensure that the model is not overfitting to the data in the training set?

▪ Validation set: used to validate the model during training

▪ Its classification is based only on the model that is learnt from the training set▪ The model weights are updated based on this set▪ Help to adjust the hyperparameters (e.g., number of hidden layers, learning rate, etc.. )

▪ Test set: used to test the model after it has been trained

37


1D CNNs with Practicals

38

1D Approach


▪ 1D Spectral

▪ Most straightforward approach for classifying hyperspectral data

▪ Consider it as a set of 1D spectra makes sense given its finespectral resolution but low spatial resolution

▪ Each pixel corresponds to a spectral signature, that is a discretesignal to which a statistical model can be fit

39

1D Convolution


▪ 1D Convolution ▪ Calculate just in 1 direction

▪ E.g., time-axis▪ Output-shape is 1D array

▪ 1D Convolution with Hyperspectral images▪ Direction: spectrum dimension▪ Output: 1D vector ▪ Ability to discover close relations

[36] N. Audebert

E.g., Graph smoothing

feature extraction andrepresentation learning

fully connected and performs the classification

40

First Practical with 1D CNN


main.py

models.py

[37] Wei Hu

41



▪ Input layer: takes a pixel spectral vector with 𝑛1 bands

▪ Convolutional layer (C1) filters the 𝑛1 × 1 vector with 20 kernels of size 𝑘1 × 1

▪ 20 × 𝑛2 × 1 nodes (with 𝑛2 = 𝑛1 − 𝑘1 + 1)

▪ Max pooling layer (M2) contains 20 × 𝑛3 × 1 (with 𝑛3 =𝑛2

𝑘2)

▪ Fully connected layer (F3) has 𝑛4 nodes

▪ Output layer has 𝑛5 nodes (i.e., the number of classes of the dataset)

[37] Wei HuTotal number of trainable parameters20 × 𝑘1 + 1 + 20 × 𝑛3 + 1 × 𝑛4 + 𝑛4 + 1 × 𝑛5

42



models.py

43

Run the Classifier


▪ $ srun python main.py --model hu --dataset PaviaU --training_sample 0.1 --cuda 0

44


2D CNNs with Practicals

45

2D Approach


▪ 1D Spectral

▪ 2D Spatial

46

Class score

Series of convolution and pooling layers

Fully connected layers

▪ Convolutional layers: convolution operation on the input

▪ Emulate the response of an individual neuron to visual stimuli

▪ Each convolutional neuron processes data only for its receptive field

▪ Polling layers: progressively reduce the spatial size of the representation

▪ Reduce the amount of parameters and computation and control overfitting

▪ Fully connected layers connect every neuron in one layer to every neuron in another layer

▪ Same principle as the traditional multi-layer perceptron (MLP) network


Image Classification CNNs

[38] J. Long et al.

47

▪ Break the images into many small crops and classify the central pixel

▪ Redundant and computationally expensive (not optimal for large annotated datasets)

▪ Stores not only every pixel but also the surrounding pixels

▪ Increases the data size by a factor determined by the number of neighbouring pixels

▪ Very inefficient and not reusing shared features between overlapping patches

More visible with Hyperspectral images

Building

Building

Tree

Extract patchClassify center pixel with CNN


Approach for the Practicals: Sliding Window

48

Series of convolution and pooling layers

▪ Semantic segmentation tasks have input images with different sizes

▪ Fully convolutional networks can take input of any arbitrary size

▪ Produce correspondingly-sized dense output with efficient inference and learning

Fully connected layers:Inputs and feature maps of fixed size

Convolution can have inputs of any size

“Transforming a classification-purposed CNN to producespatial heatmaps by replacingfully connected layerswith convolutional ones”


[38] J. Long et al.

Semantic Segmentation Approach: Fully Convolutional

49

2D Convolution on Single Image


▪ 2D Convolutions▪ Consider data as usual image▪ 2-direction (x,y) to calculate the convolution ▪ Local spatial neighborhood ▪ Context information

▪ The 2D Convolution applied on a single image outputs an image▪ Input = [W,H], filter = [k,k], output = [W,H]

[39] D. Tran

50

2D Convolution on MultiChannel Images


▪ The 2D convolution applied on multichannel images also results in an image

▪ The filter depth = L must be matched with the input channels = L

▪ 2-direction (x,y) to calculate the convolution (not 3D)

▪ input = [W,H,L], filter = [k,k,L] output = [W,H]

[39] D. Tran

The spectral information of the multichannel data is lost right after each convolution operation

Local spatialglobal spectral

51

Layer of the 2D CNN – RBG Image


▪ The “neuron” does a weighted sum of the pixels across a small region define by the filter

▪ It then acts by adding a bias and feeding the result through its activation function

▪ By sliding the patch of weights across the image in both directions (x,y) ▪ Obtain as many output values as there were pixels in the image ▪ Padding is necessary at the edges

▪ To add more degrees of freedom, add N different set of weights (i.e., filters)▪ Can be done by adding a dimension to the tensor

[40] M. Görner

Then the output shape is 3D = 2D x N matrix

52

Layer of the 2D CNN – Hyperspectral Image


▪ The number of weights is proportional to the number of input channels

▪ Previous example: to generate one plane of output values using a patch size of 4x4 and an RGB image as the input, 4x4x3=48 weights were necessary

▪ With Hyperspectral images the number of weights increases dramatically

▪ For example, the shape of the weights tensor will become

[ 4 x 4 x C x N]

[39] D. Tran

Hundreds of spectral channels

53

2D CNN for Hyperspectral Images


▪ A popular approach to transpose CNNs to hyperspectral imaging is to reduce the spectral dimension in order to close the gap between hyperspectral and RGB images

▪ E.g., Principal Component Analysis (PCA) ▪ Project the hyperspectral data into a 3-channel tensor ▪ Then perform the classification using a standard 2D CNN architecture

[36] N. Audebert

54

Second Practical with 2D CNN


main.py

models.py

[42] V. Sharma

55



▪ Each band is treated as a separate image (i.e., gray-scale image)▪ Architecture:

▪ 3 convolutional layers, 2-fully connected layers 1 C-way softmax (C=number of classes)▪ All the convolution layers are followed by

▪ A batch normalization layer ▪ A rectification linear unit layer (ReLU)▪ A max pooling layer

▪ The ReLU and a dropout layer are applied to the output of the first fully connected layer

[42] V. Sharma

The number of trainable parameters for 5-layer network is 20 million

56



models.py

Note that this performs a 2D convolution on a single image

Refer to

torch.nn.Conv3d(in_channels, out_channels, kernel_size,…)

[43] Pytorch Parameters

57

Run the Classifier


▪ $ srun python main.py --model sharma --dataset PaviaU --training_sample 0.1 --cuda0 --epoch 50

58

Run time with 50 epochs: ~12hours

Free Practical (1): Hyperparameters


Consider the 1D CNN and change these parameters:

--epoch EPOCH Training epochs (optional, if absent will be set by the model)--patch_size PATCH_SIZE

Size of the spatial neighbourhood (optional, if absent will be set by the model)--lr LR Learning rate, set by the model if not specified.

Consider the 1D CNN and change its architecture:

models.py


▪ 1 Epoch: entire training set passed forward and backward through the network in once

▪ The training set is usually divided in batches since the data can be too large

▪ 1 iteration: entire batch passed forward and backward through the network in once

▪ If 1000 training samples and batch size set to 500, it means 2 iterations to complete 1 Epoch

Epochs and Iterations in DL

[43] Epoch vs Batch Size vs IterationsWhat is the right numbers of epochs?


Learning Rate

▪ Gradient: points the direction in which the function has the steepest rate of increase

▪ it does not tell how far along this direction we should step

▪ Learning rate: the size of the steps to take to reach a local minimum

[44] Minimizing cost functions with gradient descent


Other Optimizers

▪ RMSprop: uses a moving average of squared gradients to normalize the gradient itself

▪ Has an effect of balancing the step size

▪ Decrease the step for large gradient to avoid exploding

▪ Increase the step for small gradient to avoid vanishing

▪ Adagrad: adapts the learning rate to the parameters, performing:

▪ Low learning rates for parameters associated with frequently occurring features

▪ High learning rates for parameters associated with infrequent features

▪ Well-suited for dealing with sparse data

▪ Adam: inherits from RMSProp and AdaGrad

[45] An overview of gradient descent optimization algorithms


Principal Component Analysis (PCA)

Affine transformation to decorrelate the axis

INFORMATION ABSTRACTION

▪ Nearby bands tend to be correlated (correlation means redundancy - images “look alike”)

▪ Theoretically n bands = n dimensional data

▪ The “actual” dimension required to represent data with negligible information loss is lower

n = 103 bands

PCA finds the linear subspace that shows the largest variances (i.e., eigenvalue decomposition of the covariance matrix)

K = 4 Principal Components

Pick K dimensions with the highest amount of variance

103 components

OA 71.82% 70.55%

PREPROCESSING

Original PCA

[46] H. Hotelling

63

Free Practical (2): Reduce the Data with PCA and Classify


datasets.py

Add PCA

Hint: use Sklearn:https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html

https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html

Lecture Bibliography (1)

▪ [1] CORONA: American’s First Satellite Program: first photograph.

Online: https://www.oneonta.edu/faculty/baumanpr/geosat2/RS%20History%20II/RS-History-Part-2.html

▪ [2] The Earth-Atmosphere Energy Balance

Online: http://theatmosphere.pbworks.com/w/page/27058542/The%20Earth-Atmosphere%20Energy%20Balance

▪ [3] Active-and-passive-remote-sensing

Online: http://grindgis.com/remote-sensing/active-and-passive-remote-sensing

▪ [4] Airborne Hyperspectral Remote Sensing Systems

Online: http://uregina.ca/piwowarj/Satellites/Hyperspectral.html

▪ [5] Tsunami Damage, Gleebruk, Indonesia January 7, 2005

Online: https://earthobservatory.nasa.gov/IOTD/view.php?id=5145

▪ [6] Earth Observation Mission Sentinel 2

Online: https://sentinel.esa.int/web/sentinel/missions/sentinel-2

▪ [7] CORINE land cover

Online: https://land.copernicus.eu/pan-european/corine-land-cover

▪ [10] Military Grid Reference System

Online: https://en.wikipedia.org/wiki/Military_Grid_Reference_System


https://www.oneonta.edu/faculty/baumanpr/geosat2/RS%20History%20II/RS-History-Part-2.html

http://theatmosphere.pbworks.com/w/page/27058542/The%20Earth-Atmosphere%20Energy%20Balance

http://grindgis.com/remote-sensing/active-and-passive-remote-sensing

http://uregina.ca/piwowarj/Satellites/Hyperspectral.html

https://earthobservatory.nasa.gov/IOTD/view.php?id=5145

https://sentinel.esa.int/web/sentinel/missions/sentinel-2

https://land.copernicus.eu/pan-european/corine-land-cover

https://en.wikipedia.org/wiki/Military_Grid_Reference_System


▪ [11] Population growth from 1975 - 2010 of Manila

Online: https://manilabydaniellaandisabel.weebly.com/location-and-characteristics.html

▪ [12] Deforestation in Bolivia from 1986 to 2001

Online: https://www.satimagingcorp.com/gallery/more-imagery/aster/aster-deforestation-bolivia/

▪ [13] X. X. Zhu et al., "Deep Learning in Remote Sensing: A Comprehensive Review and List of Resources," in IEEE Geoscience and Remote Sensing Magazine, vol. 5, no. 4, pp. 8-36, Dec. 2017.

▪ [14] J. E. Ball, D. T. Anderson, and C. S. Chan, “A Comprehensive Survey of Deep Learning in Remote Sensing: Theories, Tools andChallenges for the Community,” in Proceedings of the SPIE Journal of Applied Remote Sensing, 2017.

▪ [15] Y. Yang and S. Newsam, "Bag-of-visual-words and spatial extensions for land-use classification," in Proc. ACM SIGSPATIAL Int. Conf. Adv. Geogr. Inform. Syst., 2010, pp. 270-279.

▪ [16] J. Li, J. M. Bioucas-Dias, and A. Plaza, “Spectral–spatial classification of hyperspectral data using loopy belief propagation and active learning,” IEEE Trans. Geosci. Remote Sens., vol. 51, no. 2, pp. 844-856, 2013.

▪ [17] Q. Zou, L. Ni, T. Zhang, and Q. Wang, “Deep learning based feature selection for remote sensing scene classification,” IEEEGeosci. Remote Sens. Lett., vol. 12, no. 11, pp. 2321-2325, 2015.

▪ [18] S. Basu, S. Ganguly, S. Mukhopadhyay, R. DiBiano, M. Karki, and R. Nemani. Deepsat: a learning framework for satellite imagery. In Proceedings of the 23rd SIGSPATIAL International Conference on Advances in Geographic Information Systems, page 37. ACM, 2015

▪ [19] B. Zhao, Y. Zhong, G.-S. Xia, and L. Zhang, “Dirichlet-derived multiple topic scene classification model for high spatial resolution remote sensing imagery,” IEEE Trans. Geosci. Remote Sens., vol. 54, no. 4, pp. 2108-2123, 2016

▪ [20] L. Zhao, P. Tang, and L. Huo, “Feature significance-based multibag-of-visual-words model for remote sensing image scene classification,” J. Appl. Remote Sens., vol. 10, no. 3, pp. 035004-035004, 2016.


https://manilabydaniellaandisabel.weebly.com/location-and-characteristics.html

https://www.satimagingcorp.com/gallery/more-imagery/aster/aster-deforestation-bolivia/


▪ [21] O. A. Penatti, K. Nogueira, and J. A. dos Santos, "Do deep features generalize from everyday objects to remote sensing and aerial scenes domains?," in Proc. IEEE Int. Conf. Comput. Vision Pattern Recognit. Workshops, 2015, pp. 44-51.

▪ [22] G. Cheng, J. Han and X. Lu, "Remote Sensing Image Scene Classification: Benchmark and State of the Art," in Proceedings of the IEEE, vol. 105, no. 10, pp. 1865-1883, Oct. 2017.

▪ [23] G.-S. Xia, J. Hu, F. Hu, B. Shi, X. Bai, Y. Zhong, and L. Zhang. Aid: A benchmark dataset for performance evaluation of aerial scene classication. arXiv preprint arXiv:1608.05167, 2016.

▪ [24] P. Helber, B. Bischke, A. Dengel, D. Borth. 2017. “EuroSAT: A Novel Dataset and Deep Learning Benchmark for Land Use and Land Cover Classification”, ArXiv

▪ [25] Li, H., Tao, C., Wu, Z., Chen, J., Gong, J., & Deng, M. (2017). RSI-CB: A Large Scale Remote Sensing Image Classification Benchmark via Crowdsource Data. Retrieved from https://arxiv.org/abs/1705.10450

▪ [26] Zhou, W., Newsam, S., Li, C., & Shao, Z. (2017). PatternNet: A Benchmark Dataset for Performance Evaluation of Remote Sensing Image Retrieval. arXiv preprint arXiv:1706.03424.

▪ [27] Sumbul, G., Charfuelan, M., Demir, B., & Markl, V. (2019). BigEarthNet: A Large-Scale Benchmark Archive For Remote Sensing Image Understanding. Retrieved from http://arxiv.org/abs/1902.06148

▪ [28] ImageNet project

Online: http://image-net.org/about-overview

▪ [29] G. Cavallaro, N. Falco, M. Dalla Mura and J. A. Benediktsson, "Automatic Attribute Profiles," in IEEE Transactions on Image Processing, vol. 26, no. 4, pp. 1859-1872, April 2017.

▪ [30] H. Lee, R. Grosse, R. Ranganath and A. Y. Ng, “Unsupervised Learning of Hierarchical Representations with Convolutional Deep Belief Networks”, in Commun. ACM, vol. 54, no. 10, pp. 95-103, 2011.


https://arxiv.org/abs/1705.10450

http://arxiv.org/abs/1902.06148

http://image-net.org/about-overview


▪ [31] Hyperspectral Remote Sensing Scenes

Online: http://www.ehu.eus/ccwintco/index.php/Hyperspectral_Remote_Sensing_Scenes

▪ [32] Goel, P. K., Prasher, S. O., Patel, R. M., Landry, J. A., Bonnell, R. B., & Viau, A. A. (2003). Classification of hyperspectral data by decision trees and artificial neural networks to identify weed stress and nitrogen status of corn. Computers and Electronics in Agriculture. https://doi.org/10.1016/S0168-1699(03)00020-6

▪ [33] DeepHyperX: Deep learning for Hyperspectral imagery

Online: https://gitlab.inria.fr/naudeber/DeepHyperX/

▪ [34] A. Garcia-Garcia, S. Orts-Escolano, S. Oprea, V. Villena-Martinez, J. Garcia-Rodriguez, “A Review on Deep Learning Techniques Applied to Semantic Segmentation”, in CoRR, 2017.

Online: http://arxiv.org/abs/1704.06857

▪ [35] Image Segmentation Using DIGITS 5

Online: https://devblogs.nvidia.com/image-segmentation-using-digits-5/

▪ [36] Audebert, N., Saux, B., & Lefèvre, S. (2019). Deep Learning for Classification of Hyperspectral Data: A Comparative Review. https://doi.org/10.1109/mgrs.2019.2912563

▪ [37] Wei Hu, Yangyu Huang, Li Wei, Fan Zhang and Hengchao Li, Deep Convolutional Neural Networks for Hyperspectral Image Classification Journal of Sensors, Volume 2015 (2015) https://www.hindawi.com/journals/js/2015/258619/

▪ [38] J. Long, E. Shelhamer and T. Darrell, "Fully convolutional networks for semantic segmentation," 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, 2015, pp. 3431-3440.

▪ [39] Tran, D., Bourdev, L., Fergus, R., Torresani, L., & Paluri, M. (2015). Learning spatiotemporal features with 3D convolutional networks. In Proceedings of the IEEE International Conference on Computer Vision. https://doi.org/10.1109/ICCV.2015.510

▪ [40] Martin Görner, Theory: convolutional networks

Online: https://sites.google.com/site/nttrungmtwiki/home/it/data-science---python/tensorflow/tensorflow-and-deep-learning-part-3?tmpl=%2Fsystem%2Fapp%2Ftemplates%2Fprint%2F&showPrintDialog=1


http://www.ehu.eus/ccwintco/index.php/Hyperspectral_Remote_Sensing_Scenes

https://doi.org/10.1016/S0168-1699(03)00020-6

https://gitlab.inria.fr/naudeber/DeepHyperX/

http://arxiv.org/abs/1704.06857

https://devblogs.nvidia.com/image-segmentation-using-digits-5/

https://doi.org/10.1109/mgrs.2019.2912563

https://www.hindawi.com/journals/js/2015/258619/

https://doi.org/10.1109/ICCV.2015.510

https://sites.google.com/site/nttrungmtwiki/home/it/data-science---python/tensorflow/tensorflow-and-deep-learning-part-3?tmpl=%2Fsystem%2Fapp%2Ftemplates%2Fprint%2F&showPrintDialog=1


▪ [41] Sharma, V., Diba, A., Tuytelaars, T., & Gool, L.V., Hyperspectral CNN for image classification & band selection, with application to face recognition, 2016

▪ [42] Pytorch Parameters

Online: https://pytorch.org/docs/stable/nn.html

▪ [43] Epoch vs Batch Size vs Iterations

Online: https://towardsdatascience.com/epoch-vs-iterations-vs-batch-size-4dfb9c7ce9c9

▪ [44] Minimizing cost functions with gradient descent

Online: https://mllog.github.io/2016/11/04/Python-ML-Chapter-2/

▪ [45] An overview of gradient descent optimization algorithms

Online: http://ruder.io/optimizing-gradient-descent/index.html#rmsprop

▪ [46] H. Hotelling, “Analysis of a complex of statistical variables into principal components", in Journal of Educational Psychology, 24, 417–441, and 498–520, 1933

Online: https://www.scribd.com/document/59617538/Analysis-of-a-Complex-of-Statistical-Variables-Into-Principal-Components


https://pytorch.org/docs/stable/nn.html

https://towardsdatascience.com/epoch-vs-iterations-vs-batch-size-4dfb9c7ce9c9

http://ruder.io/optimizing-gradient-descent/index.html#rmsprop

https://www.scribd.com/document/59617538/Analysis-of-a-Complex-of-Statistical-Variables-Into-Principal-Components

Deep Learning Training Course - morrisriedel.de · High Productivity Data Processing Group Juelich...

Documents

Transcript of Deep Learning Training Course - morrisriedel.de · High Productivity Data Processing Group Juelich...