Deep Learning Training Course - morrisriedel.de · High Productivity Data Processing Group Juelich...
Introduction to Deep Learning Models
Dr.-Ing. Gabriele Cavallaro
Postdoctoral Researcher
High Productivity Data Processing Group
Juelich Supercomputing Centre, Germany
Introduction to Deep Learning for Remote Sensing and 1D/2D CNNs for Hyperspectral Images Classification
May 22nd, 2019
Juelich Supercomputing Centre, Juelich, Germany
Deep Learning Training Course
LECTURE 5
Outline of the Course
1. Introduction to Machine Learning Fundamentals
2. Deep Learning: Overview
3. Neural Networks
4. Convolutional Neural Networks (CNNs)
5. Introduction to Deep Learning for Remote Sensing and 1D/2D CNNs for Hyperspectral Images Classification
6. 3D CNNs for Hyperspectral Images Classification, Training Set Selection and Performance Evaluation
7. Recurrent Neural Networks
8. LSTM Neural Networks
9. Deep Reinforcement Learning
10. Course Summary and Lessons Learned
Lecture 5 - Introduction to DL for RS and 1D and 2D CNNs for Hyperspectral Images Classification 2
Outline
▪ Remote Sensing (RS) Background
▪ Challenges for Deep Learning (DL) in RS
▪ DL and Shallow Learning
▪ Classification of Hyperspectral Images
▪ 1D Convolutional Neural Networks (CNNs)
▪ 2D CNNs
Remote Sensing Background
▪ The term remote sensing was first used in the United States in the 1950s by Ms. Evelyn Pruitt of the U.S. Office of Naval Research
Remote (without physical contact) Sensing (measurement of information)
▪ Measurement of radiation of different wavelengths reflected or emitted from distant objects or materials
▪ They may be categorized by class/type, substance, and spatial distribution
[1] Satellite (1960)
[2] The Earth-Atmosphere Energy Balance
Remote Sensing
▪ Active Sensor: own source of illumination
▪ Capture image in day and night
▪ Any weather or cloud conditions
▪ Passive Sensor: natural light available
▪ Great quality satellite imagery
▪ Multispectral and Hyperspectral technology
▪ Platform: selected according to the application
[3] Active-and-passive-remote-sensing
Platforms and Sensors
DATA ACQUISITION
PROCESSING PRODUCTS
▪ Landscapes vary greatly in their spatial complexity
▪ Some may be represented clearly at coarse levels of detail
▪ Others are so complex that the finest level of detail is required
▪ Spatial resolution: the minimum area discernible by a pixel (i.e., picture element)
▪ Influenced primarily by the sensor and the platform altitude
1m 10m 30m
Spatial Resolution
▪ Remote sensing observes at varied wavelengths
▪ Ability to define fine wavelength intervals
▪ The finer the spectral resolution, the narrower the wavelength range of a particular band
Landsat (7-11 bands) AVIRIS (224 bands)
[4] Airborne Hyperspectral
Spectrum of a mixture of three common minerals: kaolinite, Dolomite, Hematite
Spectral Resolution
▪ Maximum number of brightness levels available
▪ Depends on the number of bits used in representing the energy recorded
▪ Higher radiometric resolution, sharper images
Radiometric Resolution
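The link between bit depth and brightness levels can be made concrete with a small sketch. The function name and the example bit depths below are illustrative assumptions, not values taken from the slides.

```python
def brightness_levels(bits: int) -> int:
    """Return how many distinct brightness levels `bits` bits per sample can encode."""
    if bits < 1:
        raise ValueError("bit depth must be at least 1")
    return 2 ** bits


# Illustrative bit depths (assumed for the example only).
for bits in (8, 11, 12, 16):
    print(f"{bits:2d}-bit data: {brightness_levels(bits)} levels")
```

Each additional bit doubles the number of levels the sensor can distinguish.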
▪ The time it takes for a satellite to complete one orbit cycle (revisit time)
▪ Depends on the satellite/sensor capabilities, swath overlap and latitude
Repeated imaging enables assessment of changes in the type or condition of surface features
[5] Tsunami Damage, Gleebruk, Indonesia January 7, 2005
Temporal Resolution
• Platform: Twin polar-orbiting satellites, phased at 180° to each other
• Temporal resolution of 5 days at the equator in cloud-free conditions
▪ Images for mapping land-use change, land-cover change, biophysical variables
▪ Monitor the coastal and inland waters and help with risk and disaster mapping
[6] Earth Observation Mission Sentinel 2
~23 TB data stored per day
Sentinel 2 Mission
▪ Different methods can be applied before further visual interpretation or digital analysis
▪ Often combined, as part of a chain of operations
▪ Start with a raw image and obtain a highly transformed output product
▪ The boundaries between these operations are not well defined in practice
Remote Sensing Image Processing
IMAGE PREPROCESSING
IMAGE ENHANCEMENT
IMAGE CLASSIFICATION
ANALYSIS OF CHANGE OVER TIME
DATA FUSION AND GIS INTEGRATION
HYPERSPECTRAL IMAGE ANALYSIS
BIOPHYSICAL MODELING
DATA ACQUISITION
PROCESSING PRODUCTS
• Perhaps the most common form of image interpretation
• Applications: environmental management, agricultural planning, health studies, climate and biodiversity monitoring, and land change detection
Generation of thematic maps
Classification of Remote Sensing Images
▪ Important task for regularly monitoring the Earth’s surface
▪ Generation of reliable maps with spatial consistency is challenging
▪ Data from several satellite orbit tracks, different dates and presence of clouds
▪ Cannot rely on field campaigns
▪ Use existing databases to build the reference sets for supervised classification
[7] CORINE land cover
Update of Land-Cover Maps at Continent Scale
▪ E.g., Corine Land Cover Programme
▪ Every 6 years a new map
▪ 44 classes in a hierarchical 3-level nomenclature
▪ Global change detection and monitoring: Deforestation
▪ Environmental assessment and monitoring: Urban growth
[12] Deforestation in Bolivia from 1986 to 2001
[11] Population growth from 1975 - 2010 of Manila
1986 2001
Applications
Challenges for Deep Learning in Remote Sensing
▪ DL has risen to the top in numerous areas in recent years
▪ Computer vision, speech recognition, etc.
▪ Deep learning is also taking off in RS
▪ RS offers a number of unique challenges for deep learning
[13] X. X. Zhu et al
Deep Learning in Remote Sensing
Exponential increase of the papers published
▪ Multimodal data: geometries and content are completely different
▪ From optical (multi- and hyperspectral), Lidar, and synthetic aperture radar (SAR) sensors
▪ High temporal resolution data
▪ Shift from individual image analysis to time-series processing
▪ Big remote sensing data: high spatial and more spectral dimensionality
▪ Traditional DL systems operate on relatively small grayscale or RGB imagery
Optical data: from four to hundreds of channels
SAR images: noisy data
Main Challenges for DL in RS
▪ The existing datasets (especially Hyperspectral) have several limitations
▪ Small scale of scene classes and image numbers
▪ Lack of image variations and diversity
▪ Saturation of accuracy
▪ Several RGB and Multispectral datasets have been released
Dataset | Image type | Images per class | Scene classes | Annotation type | Total images | Spatial resolution (m) | Image size | Year | Ref.
UC Merced | Aerial RGB | 100 | 21 | Single/Multi label | 2100 | 0.3 | 256x256 | 2010 | [15]
WHU-RS19 | Aerial RGB | ~50 | 19 | Single label | 1005 | up to 0.5 | 600x600 | 2012 | [16]
RSSCN7 | Aerial RGB | 400 | 7 | Single label | 2800 | -- | 400x400 | 2015 | [17]
SAT-6 | Aerial MS | -- | 6 | Single label | 405000 | 1 | 28x28 | 2015 | [18]
SIRI-WHU | Aerial RGB | 200 | 12 | Single label | 2400 | 2 | 200x200 | 2016 | [19]
RSC11 | Aerial RGB | ~100 | 11 | Single label | 1323 | 0.2 | 512x512 | 2016 | [20]
Brazilian Coffee | Satellite MS | 1438 | 2 | Single label | 2876 | -- | 64x64 | 2016 | [21]
RESISC45 | Aerial RGB | 700 | 45 | Single label | 31500 | ~30 to 0.2 | 256x256 | 2016 | [22]
AID | Aerial RGB | ~300 | 30 | Single label | 10000 | 0.6 | 600x600 | 2016 | [23]
EuroSAT | Satellite MS | ~2500 | 10 | Single label | 27000 | 10 | 64x64 | 2017 | [24]
RSI-CB128 | Aerial RGB | ~800 | 45 | Single label | 36000 | 0.3 to 3 | 128x128 | 2017 | [25]
RSI-CB256 | Aerial RGB | ~690 | 35 | Single label | 24000 | 0.3 to 3 | 256x256 | 2017 | [25]
PatternNet | Aerial RGB | ~800 | 38 | Single label | 30400 | 0.062~4.693 | 256x256 | 2017 | [26]
[14] J. E. Ball
Limited Number of Remote Sensing Datasets
▪ ImageNet dataset: 14,197,122 images [28] ImageNet project
Most Recent Multispectral Dataset: BigEarthNet
Dataset | Image type | Images per class | Scene classes | Annotation type | Total images | Spatial resolution (m) | Image sizes | Year | Ref.
BigEarthNet | Satellite MS | 328 to 217119 | 43 | Multi label | 590,326 | 10, 20, 60 | 120x120, 60x60, 20x20 | 2018 | [27]
[27] G. Sumbul
Deep Learning and Shallow Learning
How to Extract Spatial Features?
• Very High Spatial Resolution images: huge amount of details
DATA ACQUISITION
PREPROCESSING
FEATURE EXTRACTION
CLASSIFICATION
▪ Traditional feature extraction methods involve approaches to extract information
▪ Based on spatial, spectral, textural, morphological content, etc.
▪ The features are designed for a specific task
▪ E.g. Enhancing the spatial information (Self-Dual Attribute Profiles - SDAPs)
Representative: they contain salient structures of the input image
Non-redundant: same objects are present only in one or few levels of the SDAP
[29] G. Cavallaro
Traditional RS Pipeline
▪ Shallow learning: learning networks that usually have at most one to two layers
▪ E.g., Support Vector Machines (SVMs)
▪ They compute linear or nonlinear functions of the data (often hand-designed features)
▪ DL means a deeper network with many layers of non-linear transformations
▪ No universally accepted definition of how many layers constitute a “deep” learner
▪ Deep networks are typically at least four or five layers deep
Deep Learning and Shallow Learning
http://www.cs.utoronto.ca/~rgrosse/cacm2011-cdbn.pdf
Deep Networks Learn Hierarchical Feature Representations
[30] H. Lee et al.
Classification of Hyperspectral Images
RGB 3 Channels Hyperspectral
Hyperspectral Images
[31] Hyperspectral remote sensing
▪ Different materials produce distinct electromagnetic radiation spectra
▪ The curve of each class has its own visual shape which is different from other classes
▪ It is relatively difficult to distinguish some classes with the human eye
▪ E.g., Gravel and bricks
Gravel
Bricks
Trees
Challenges for Deep Learning for Hyperspectral Data
▪ The application of deep learning with hyperspectral images is less straightforward than with other optical data (e.g., RGB images)
▪ Deep learning techniques designed for computer vision are not easily transferable
▪ Most hyperspectral sensors have a low spatial resolution
▪ The spectral dimension prevails over the spatial neighborhood in most cases
▪ Moreover, the low spatial resolution limits the number of training samples
▪ Classification is therefore not a trivial process
Objects smaller than the spatial resolution are mixed with their neighborhood
▪ Deep networks: most straightforward evolution from shallow machine learning to deep learning
▪ Have been implemented since the 2000s with small networks, and brought up to date recently
Convolutional Neural Networks (CNNs)
▪ CNNs can achieve competitive and even better performance than human beings in some visual problems
▪ Different CNN architectures can be adopted for classifying hyperspectral images
▪ With 1D convolutions
▪ With 2D convolutions
▪ With 3D convolutions
▪ With a combination of the above
▪ What distinguishes them?
▪ Convolutional direction
▪ Input shape
▪ Output shape
Deep Learning for Classification of Hyperspectral Images
[32] P. Goel
Practicals
▪ Based on DeepHyperX
▪ Available online at https://gitlab.inria.fr/naudeber/DeepHyperX/
▪ Deep learning toolbox for hyperspectral images
▪ PyTorch implementation of several state-of-the-art CNNs (i.e., 1D, 2D, 3D)
▪ Tested with different public hyperspectral datasets
▪ Log in with Jupyter JSC and start Jupyter (use login node)
▪ Once inside, open the Terminal
[33] DeepHyperX
Practicals
▪ Navigate to the project folder with $ cd /p/project/training1911/
▪ Create your own folder (with $ mkdir ) and change into it
▪ Copy the toolbox into your own folder with
$ cp -R /p/project/training1911/practicals/lecture_5_and_6/DeepHyperX .
[33] DeepHyperX
Start the Interactive Session and the Environment
▪ Start an interactive session by requesting a node with GPUs
$ salloc --gres=gpu:4 --partition=gpus --nodes=1 --account=training1911 --time=02:00:00 --reservation=intro-dl-wed
▪ Load the virtual environment with
$ source /p/project/training1911/practicals/lecture_5_and_6/load_environment_jureca.sh
▪ Check that you are on the right node (i.e., you can see the GPUs)
$ srun nvidia-smi
Models for the Practicals
[33] DeepHyperX
Hyperspectral Dataset for the Practicals
▪ Acquired by the ROSIS sensor (aerial platform) with a spatial resolution of 1.3 m
▪ Data cube of 610×340×103
▪ 9 land cover classes were annotated covering 50% of the whole surface
[31] Hyperspectral remote sensing
Pavia University
• Classification: make a prediction for a whole input
• Which classes are present, given as a ranked list
• Localization or detection: towards fine-grained inference
• Classification and spatial location (e.g., bounding boxes)
• Semantic segmentation: fine-grained inference
• Make dense predictions inferring labels for every pixel
• Further improvements: Provide different instances of the same class
• Decomposition of already segmented classes into their components
• Many applications benefit from inferring knowledge from imagery
• Autonomous driving
• Human-machine interaction
• Computational photography
• Image search engines
• Remote sensing
Progression from Coarse to Fine Inference
[34] A. Garcia-Garcia
[35] Image Segmentation
Task for the Practicals: Pixel-Wise Classification
SAMPLER
TRAINING PHASE (learn)
MODEL (parameters)
TEST PHASE (performance)
CLASSIFIER
ANNOTATED SAMPLES (GROUND TRUTH)
TRAINING SET
TEST SET
VALIDATION SET
▪ Training set: used to train the model
▪ How do we ensure that the model is not overfitting to the data in the training set?
▪ Validation set: used to validate the model during training
▪ Its classification is based only on the model that is learnt from the training set
▪ The model weights are not updated based on this set
▪ Helps to adjust the hyperparameters (e.g., number of hidden layers, learning rate, etc.)
▪ Test set: used to test the model after it has been trained
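A minimal sketch of how annotated pixels could be divided into the three disjoint sets. The 10% training fraction mirrors the `--training_sample 0.1` option used in the practicals; the validation fraction, the function name and the fixed seed are assumptions made for illustration, and DeepHyperX's own sampler may work differently.

```python
import random


def split_samples(indices, train_frac=0.1, val_frac=0.05, seed=0):
    """Shuffle annotated pixel indices and split them into disjoint train/val/test lists."""
    if not 0 < train_frac + val_frac < 1:
        raise ValueError("train_frac + val_frac must lie strictly between 0 and 1")
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducible splits
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]      # everything not used for training/validation
    return train, val, test


train, val, test = split_samples(range(1000))
print(len(train), len(val), len(test))
```

Only the training list drives weight updates; the validation list is used for monitoring and hyperparameter choices, and the test list is touched once at the end.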
1D CNNs with Practicals
1D Approach
▪ 1D Spectral
▪ Most straightforward approach for classifying hyperspectral data
▪ Considering the data as a set of 1D spectra makes sense given its fine spectral resolution but low spatial resolution
▪ Each pixel corresponds to a spectral signature, that is, a discrete signal to which a statistical model can be fit
1D Convolution
▪ 1D Convolution
▪ Calculated in just 1 direction
▪ E.g., time axis
▪ Output shape is a 1D array
▪ 1D Convolution with hyperspectral images
▪ Direction: spectrum dimension
▪ Output: 1D vector
▪ Ability to discover close relations
[36] N. Audebert
E.g., Graph smoothing
feature extraction and representation learning
fully connected and performs the classification
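A 1D convolution along the spectral axis can be sketched in plain Python. The toy 7-band spectrum and the moving-average (smoothing) kernel are made-up values for illustration; like CNN layers, the function slides the kernel without flipping it.

```python
def conv1d_valid(spectrum, kernel):
    """Slide `kernel` along `spectrum` without padding; output length is n - k + 1."""
    n, k = len(spectrum), len(kernel)
    if k > n:
        raise ValueError("kernel must not be longer than the spectrum")
    return [
        sum(spectrum[i + j] * kernel[j] for j in range(k))
        for i in range(n - k + 1)
    ]


spectrum = [0.1, 0.2, 0.4, 0.8, 0.4, 0.2, 0.1]  # toy pixel spectrum with 7 bands
smoothing = [1 / 3, 1 / 3, 1 / 3]               # moving-average kernel of size 3
smoothed = conv1d_valid(spectrum, smoothing)
print(len(smoothed))  # 7 - 3 + 1 = 5 output values
```

The output is again a 1D vector, and each value only depends on neighbouring bands, which is what lets the layer pick up local spectral relations.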
First Practical with 1D CNN
main.py
models.py
[37] Wei Hu
First Practical with 1D CNN
▪ Input layer: takes a pixel spectral vector with $n_1$ bands
▪ Convolutional layer (C1) filters the $n_1 \times 1$ vector with 20 kernels of size $k_1 \times 1$
▪ $20 \times n_2 \times 1$ nodes (with $n_2 = n_1 - k_1 + 1$)
▪ Max pooling layer (M2) contains $20 \times n_3 \times 1$ nodes (with $n_3 = n_2 / k_2$)
▪ Fully connected layer (F3) has $n_4$ nodes
▪ Output layer has $n_5$ nodes (i.e., the number of classes of the dataset)
[37] Wei Hu
Total number of trainable parameters:
$20 \times (k_1 + 1) + (20 \times n_3 + 1) \times n_4 + (n_4 + 1) \times n_5$
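The parameter count of this architecture, 20 × (k1 + 1) + (20 × n3 + 1) × n4 + (n4 + 1) × n5 with n2 = n1 − k1 + 1 and n3 = n2 / k2, can be checked with a few lines of Python. In the example call, n1 = 103 and n5 = 9 correspond to the Pavia University scene used in the practicals, while k1 = 11, k2 = 3 and n4 = 100 are assumed values chosen for illustration.

```python
def hu_1d_cnn_parameters(n1, k1, k2, n4, n5, kernels=20):
    """Count trainable parameters of the C1 -> M2 -> F3 -> output architecture."""
    n2 = n1 - k1 + 1                     # length after the unpadded convolution
    if n2 % k2 != 0:
        raise ValueError("k2 must divide n2 for this simple sketch")
    n3 = n2 // k2                        # length after max pooling
    conv = kernels * (k1 + 1)            # k1 weights + 1 bias per kernel
    fully_connected = (kernels * n3 + 1) * n4
    output = (n4 + 1) * n5
    return conv + fully_connected + output


# n1 and n5 follow the Pavia University data; k1, k2, n4 are assumptions.
print(hu_1d_cnn_parameters(n1=103, k1=11, k2=3, n4=100, n5=9))  # 63249
```

Almost all parameters sit in the first fully connected layer, so the pooling size k2 has a strong influence on model size.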
First Practical with 1D CNN
models.py
Run the Classifier
▪ $ srun python main.py --model hu --dataset PaviaU --training_sample 0.1 --cuda 0
2D CNNs with Practicals
2D Approach
▪ 1D Spectral
▪ 2D Spatial
Class score
Series of convolution and pooling layers
Fully connected layers
▪ Convolutional layers: convolution operation on the input
▪ Emulate the response of an individual neuron to visual stimuli
▪ Each convolutional neuron processes data only for its receptive field
▪ Pooling layers: progressively reduce the spatial size of the representation
▪ Reduce the number of parameters and computation and control overfitting
▪ Fully connected layers connect every neuron in one layer to every neuron in another layer
▪ Same principle as the traditional multi-layer perceptron (MLP) network
Image Classification CNNs
[38] J. Long et al.
▪ Break the images into many small crops and classify the central pixel
▪ Redundant and computationally expensive (not optimal for large annotated datasets)
▪ Stores not only every pixel but also the surrounding pixels
▪ Increases the data size by a factor determined by the number of neighbouring pixels
▪ Very inefficient and not reusing shared features between overlapping patches
More visible with Hyperspectral images
Building
Building
Tree
Extract patch
Classify center pixel with CNN
Approach for the Practicals: Sliding Window
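The storage overhead of the sliding-window approach can be illustrated with NumPy. The cube size, patch size and reflect padding below are assumptions for the sketch and are not taken from DeepHyperX.

```python
import numpy as np


def extract_patches(cube, patch_size):
    """Return one (patch_size, patch_size, bands) patch per pixel, centred on that pixel."""
    if patch_size % 2 == 0:
        raise ValueError("patch_size must be odd so that a centre pixel exists")
    r = patch_size // 2
    # Pad only the two spatial axes so that border pixels also get a full patch.
    padded = np.pad(cube, ((r, r), (r, r), (0, 0)), mode="reflect")
    h, w, bands = cube.shape
    patches = np.empty((h * w, patch_size, patch_size, bands), dtype=cube.dtype)
    for row in range(h):
        for col in range(w):
            patches[row * w + col] = padded[row:row + patch_size, col:col + patch_size]
    return patches


cube = np.random.default_rng(0).random((6, 5, 4))  # tiny stand-in for a hyperspectral cube
patches = extract_patches(cube, patch_size=3)
print(patches.shape)             # (30, 3, 3, 4)
print(patches.size / cube.size)  # 9.0, i.e. patch_size squared
```

Storing a 3x3 neighbourhood for every pixel multiplies the data volume by 9; with hundreds of bands per pixel this redundancy becomes costly quickly.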
Series of convolution and pooling layers
▪ Semantic segmentation tasks have input images with different sizes
▪ Fully convolutional networks can take input of any arbitrary size
▪ Produce correspondingly-sized dense output with efficient inference and learning
Fully connected layers: inputs and feature maps of fixed size
Convolution can have inputs of any size
“Transforming a classification-purposed CNN to produce spatial heatmaps by replacing fully connected layers with convolutional ones”
[38] J. Long et al.
Semantic Segmentation Approach: Fully Convolutional
2D Convolution on Single Image
▪ 2D Convolutions
▪ Consider data as a usual image
▪ 2 directions (x,y) to calculate the convolution
▪ Local spatial neighborhood
▪ Context information
▪ The 2D convolution applied on a single image outputs an image
▪ Input = [W,H], filter = [k,k], output = [W,H]
[39] D. Tran
2D Convolution on MultiChannel Images
▪ The 2D convolution applied on multichannel images also results in an image
▪ The filter depth = L must be matched with the input channels = L
▪ 2-direction (x,y) to calculate the convolution (not 3D)
▪ Input = [W,H,L], filter = [k,k,L], output = [W,H]
[39] D. Tran
The spectral information of the multichannel data is lost right after each convolution operation
Local spatial, global spectral
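A small NumPy sketch of a 2D convolution on a multichannel input shows why the result is a single [W,H] plane: the filter spans all L channels and only moves in x and y. Zero padding, an odd filter size and the averaging filter are assumptions made to keep the example short.

```python
import numpy as np


def conv2d_same(image, kernel):
    """2D convolution over x and y; image [W,H,L], kernel [k,k,L] -> output [W,H]."""
    w, h, bands = image.shape
    k = kernel.shape[0]
    if kernel.shape != (k, k, bands) or k % 2 == 0:
        raise ValueError("kernel must have shape [k,k,L] with odd k and matching L")
    r = k // 2
    padded = np.pad(image, ((r, r), (r, r), (0, 0)))  # zero padding at the edges
    out = np.zeros((w, h))
    for x in range(w):
        for y in range(h):
            # All L channels are summed at once: the spectral axis collapses here.
            out[x, y] = np.sum(padded[x:x + k, y:y + k, :] * kernel)
    return out


image = np.ones((8, 6, 103))                   # 103 bands, as in Pavia University
kernel = np.ones((3, 3, 103)) / (3 * 3 * 103)  # averaging filter spanning all bands
out = conv2d_same(image, kernel)
print(out.shape)  # (8, 6)
```

Setting L = 1 reproduces the single-image case with input [W,H] and filter [k,k].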
Layer of the 2D CNN – RGB Image
▪ The “neuron” does a weighted sum of the pixels across a small region defined by the filter
▪ It then acts by adding a bias and feeding the result through its activation function
▪ By sliding the patch of weights across the image in both directions (x,y)
▪ Obtain as many output values as there were pixels in the image
▪ Padding is necessary at the edges
▪ To add more degrees of freedom, add N different sets of weights (i.e., filters)
▪ Can be done by adding a dimension to the tensor
[40] M. Görner
Then the output shape is 3D: N stacked 2D matrices
Layer of the 2D CNN – Hyperspectral Image
▪ The number of weights is proportional to the number of input channels
▪ Previous example: to generate one plane of output values using a patch size of 4x4 and an RGB image as the input, 4x4x3=48 weights were necessary
▪ With Hyperspectral images the number of weights increases dramatically
▪ For example, the shape of the weights tensor will become
[ 4 x 4 x C x N]
[39] D. Tran
Hundreds of spectral channels
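The growth in weights can be worked out directly from the tensor shape [4 x 4 x C x N]. The function name and the choice of N = 32 filters are assumptions for illustration; C = 103 corresponds to the Pavia University data used in the practicals.

```python
def conv2d_weights(patch, in_channels, n_filters=1):
    """Number of weights (biases excluded) in a [patch x patch x C x N] tensor."""
    return patch * patch * in_channels * n_filters


print(conv2d_weights(4, 3))                  # RGB input, one filter: 48
print(conv2d_weights(4, 103))                # 103-band input, one filter: 1648
print(conv2d_weights(4, 103, n_filters=32))  # 103-band input, 32 filters: 52736
```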
2D CNN for Hyperspectral Images
▪ A popular approach to transpose CNNs to hyperspectral imaging is to reduce the spectral dimension in order to close the gap between hyperspectral and RGB images
▪ E.g., Principal Component Analysis (PCA)
▪ Project the hyperspectral data into a 3-channel tensor
▪ Then perform the classification using a standard 2D CNN architecture
[36] N. Audebert
Second Practical with 2D CNN
main.py
models.py
[42] V. Sharma
Second Practical with 2D CNN
▪ Each band is treated as a separate image (i.e., gray-scale image)
▪ Architecture:
▪ 3 convolutional layers, 2 fully connected layers, 1 C-way softmax (C = number of classes)
▪ All the convolution layers are followed by
▪ A batch normalization layer
▪ A rectified linear unit (ReLU) layer
▪ A max pooling layer
▪ The ReLU and a dropout layer are applied to the output of the first fully connected layer
[42] V. Sharma
The number of trainable parameters for 5-layer network is 20 million
Second Practical with 2D CNN
models.py
Note that this performs a 2D convolution on a single image
Refer to
torch.nn.Conv3d(in_channels, out_channels, kernel_size,…)
[43] Pytorch Parameters
Run the Classifier
▪ $ srun python main.py --model sharma --dataset PaviaU --training_sample 0.1 --cuda 0 --epoch 50
Run time with 50 epochs: ~12 hours
Free Practical (1): Hyperparameters
Consider the 1D CNN and change these parameters:
--epoch EPOCH  Training epochs (optional, if absent will be set by the model)
--patch_size PATCH_SIZE  Size of the spatial neighbourhood (optional, if absent will be set by the model)
--lr LR  Learning rate, set by the model if not specified.
Consider the 1D CNN and change its architecture:
models.py
▪ 1 epoch: the entire training set is passed forward and backward through the network once
▪ The training set is usually divided into batches since the data can be too large
▪ 1 iteration: one batch is passed forward and backward through the network once
▪ With 1000 training samples and a batch size of 500, it takes 2 iterations to complete 1 epoch
Epochs and Iterations in DL
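The relation between samples, batch size and iterations is a ceiling division. The helper name below is illustrative; the sketch assumes that a final, smaller batch is still processed rather than dropped.

```python
import math


def iterations_per_epoch(n_samples, batch_size):
    """Number of batches needed to pass every training sample through the network once."""
    if n_samples < 1 or batch_size < 1:
        raise ValueError("n_samples and batch_size must be positive")
    return math.ceil(n_samples / batch_size)


print(iterations_per_epoch(1000, 500))  # 2
print(iterations_per_epoch(1000, 300))  # 4 (the last batch holds only 100 samples)
```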
[43] Epoch vs Batch Size vs Iterations
What is the right number of epochs?
Learning Rate
▪ Gradient: points in the direction in which the function has the steepest rate of increase
▪ It does not tell us how far along this direction we should step
▪ Learning rate: the size of the steps to take to reach a local minimum
[44] Minimizing cost functions with gradient descent
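The effect of the step size can be seen on a toy problem. The quadratic cost f(x) = (x − 3)², the starting point and the three learning rates are assumptions picked for illustration.

```python
def gradient_descent(grad, x0, lr, steps):
    """Repeatedly step against the gradient; `lr` controls how far each step goes."""
    x = x0
    for _ in range(steps):
        x = x - lr * grad(x)
    return x


def grad_f(x):
    """Gradient of the toy cost f(x) = (x - 3)^2, whose minimum is at x = 3."""
    return 2.0 * (x - 3.0)


for lr in (0.01, 0.1, 1.1):
    x_final = gradient_descent(grad_f, x0=0.0, lr=lr, steps=50)
    print(f"lr={lr}: x after 50 steps = {x_final:.4f}")
```

With these settings a very small rate is still far from the minimum after 50 steps, a moderate rate gets close to x = 3, and a rate that is too large overshoots further on every step and diverges.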
Other Optimizers
▪ RMSprop: uses a moving average of squared gradients to normalize the gradient itself
▪ Has an effect of balancing the step size
▪ Decrease the step for large gradient to avoid exploding
▪ Increase the step for small gradient to avoid vanishing
▪ Adagrad: adapts the learning rate to the parameters, performing:
▪ Low learning rates for parameters associated with frequently occurring features
▪ High learning rates for parameters associated with infrequent features
▪ Well-suited for dealing with sparse data
▪ Adam: inherits from RMSProp and AdaGrad
[45] An overview of gradient descent optimization algorithms
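A scalar sketch of the Adagrad and RMSprop update rules, written from their textbook form. The hyperparameter defaults, the function names and the constant-gradient experiment are assumptions for illustration and do not reproduce any particular library's implementation.

```python
import math


def adagrad_step(param, grad, state, lr=0.1, eps=1e-8):
    """Adagrad: accumulate all squared gradients, so the step keeps shrinking."""
    state = state + grad ** 2
    return param - lr * grad / (math.sqrt(state) + eps), state


def rmsprop_step(param, grad, state, lr=0.1, decay=0.9, eps=1e-8):
    """RMSprop: normalize by a moving average of squared gradients."""
    state = decay * state + (1 - decay) * grad ** 2
    return param - lr * grad / (math.sqrt(state) + eps), state


def last_step_size(step_fn, grad, n_steps=100):
    """Apply `step_fn` with a constant gradient and return the size of the final step."""
    param, state, step = 0.0, 0.0, 0.0
    for _ in range(n_steps):
        new_param, state = step_fn(param, grad, state)
        step = abs(new_param - param)
        param = new_param
    return step


for g in (4.0, 0.04):  # a large and a small constant gradient
    print(g, last_step_size(adagrad_step, g), last_step_size(rmsprop_step, g))
```

In this experiment the step size ends up nearly independent of the gradient magnitude for both methods, which is the balancing effect described in the list; Adagrad's step additionally decays over time because its accumulator never forgets.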
Principal Component Analysis (PCA)
Affine transformation to decorrelate the axes
INFORMATION ABSTRACTION
▪ Nearby bands tend to be correlated (correlation means redundancy - images “look alike”)
▪ Theoretically n bands = n dimensional data
▪ The “actual” dimension required to represent data with negligible information loss is lower
n = 103 bands
PCA finds the linear subspace that shows the largest variances (i.e., eigenvalue decomposition of the covariance matrix)
K = 4 Principal Components
Pick K dimensions with the highest amount of variance
103 components
OA 71.82% 70.55%
PREPROCESSING
Original PCA
[46] H. Hotelling
Free Practical (2): Reduce the Data with PCA and Classify
datasets.py
Add PCA
Hint: use scikit-learn: https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html
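One possible starting point for this exercise is sketched below with NumPy only, following the eigenvalue decomposition of the covariance matrix. The function name, the random stand-in cube and K = 4 are assumptions; where the reduction is hooked into datasets.py is left open. With scikit-learn, `PCA(n_components=4).fit_transform(pixels)` on the flattened pixel matrix should give the same projection up to the sign of each component.

```python
import numpy as np


def pca_reduce(cube, n_components):
    """Project an (H, W, bands) cube onto its `n_components` leading principal components."""
    h, w, bands = cube.shape
    pixels = cube.reshape(-1, bands).astype(np.float64)  # one row per pixel
    centered = pixels - pixels.mean(axis=0)
    cov = np.cov(centered, rowvar=False)                 # (bands, bands) covariance
    eigvals, eigvecs = np.linalg.eigh(cov)               # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]     # keep the largest variances
    reduced = centered @ eigvecs[:, order]
    explained = eigvals[order].sum() / eigvals.sum()     # fraction of variance kept
    return reduced.reshape(h, w, n_components), explained


rng = np.random.default_rng(0)
cube = rng.random((20, 15, 103))  # random stand-in for a 103-band scene
reduced, explained = pca_reduce(cube, n_components=4)
print(reduced.shape)  # (20, 15, 4)
```

On real hyperspectral data neighbouring bands are strongly correlated, so a few components usually retain most of the variance; on the random stand-in used here the retained fraction is small.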
Lecture Bibliography (1)
▪ [1] CORONA: American’s First Satellite Program: first photograph.
Online: https://www.oneonta.edu/faculty/baumanpr/geosat2/RS%20History%20II/RS-History-Part-2.html
▪ [2] The Earth-Atmosphere Energy Balance
Online: http://theatmosphere.pbworks.com/w/page/27058542/The%20Earth-Atmosphere%20Energy%20Balance
▪ [3] Active-and-passive-remote-sensing
Online: http://grindgis.com/remote-sensing/active-and-passive-remote-sensing
▪ [4] Airborne Hyperspectral Remote Sensing Systems
Online: http://uregina.ca/piwowarj/Satellites/Hyperspectral.html
▪ [5] Tsunami Damage, Gleebruk, Indonesia January 7, 2005
Online: https://earthobservatory.nasa.gov/IOTD/view.php?id=5145
▪ [6] Earth Observation Mission Sentinel 2
Online: https://sentinel.esa.int/web/sentinel/missions/sentinel-2
▪ [7] CORINE land cover
Online: https://land.copernicus.eu/pan-european/corine-land-cover
▪ [10] Military Grid Reference System
Online: https://en.wikipedia.org/wiki/Military_Grid_Reference_System
Lecture Bibliography (2)
▪ [11] Population growth from 1975 - 2010 of Manila
Online: https://manilabydaniellaandisabel.weebly.com/location-and-characteristics.html
▪ [12] Deforestation in Bolivia from 1986 to 2001
Online: https://www.satimagingcorp.com/gallery/more-imagery/aster/aster-deforestation-bolivia/
▪ [13] X. X. Zhu et al., "Deep Learning in Remote Sensing: A Comprehensive Review and List of Resources," in IEEE Geoscience and Remote Sensing Magazine, vol. 5, no. 4, pp. 8-36, Dec. 2017.
▪ [14] J. E. Ball, D. T. Anderson, and C. S. Chan, “A Comprehensive Survey of Deep Learning in Remote Sensing: Theories, Tools and Challenges for the Community,” in Proceedings of the SPIE Journal of Applied Remote Sensing, 2017.
▪ [15] Y. Yang and S. Newsam, "Bag-of-visual-words and spatial extensions for land-use classification," in Proc. ACM SIGSPATIAL Int. Conf. Adv. Geogr. Inform. Syst., 2010, pp. 270-279.
▪ [16] J. Li, J. M. Bioucas-Dias, and A. Plaza, “Spectral–spatial classification of hyperspectral data using loopy belief propagation and active learning,” IEEE Trans. Geosci. Remote Sens., vol. 51, no. 2, pp. 844-856, 2013.
▪ [17] Q. Zou, L. Ni, T. Zhang, and Q. Wang, “Deep learning based feature selection for remote sensing scene classification,” IEEE Geosci. Remote Sens. Lett., vol. 12, no. 11, pp. 2321-2325, 2015.
▪ [18] S. Basu, S. Ganguly, S. Mukhopadhyay, R. DiBiano, M. Karki, and R. Nemani. Deepsat: a learning framework for satellite imagery. In Proceedings of the 23rd SIGSPATIAL International Conference on Advances in Geographic Information Systems, page 37. ACM, 2015
▪ [19] B. Zhao, Y. Zhong, G.-S. Xia, and L. Zhang, “Dirichlet-derived multiple topic scene classification model for high spatial resolution remote sensing imagery,” IEEE Trans. Geosci. Remote Sens., vol. 54, no. 4, pp. 2108-2123, 2016
▪ [20] L. Zhao, P. Tang, and L. Huo, “Feature significance-based multibag-of-visual-words model for remote sensing image scene classification,” J. Appl. Remote Sens., vol. 10, no. 3, pp. 035004-035004, 2016.
Lecture Bibliography (3)
▪ [21] O. A. Penatti, K. Nogueira, and J. A. dos Santos, "Do deep features generalize from everyday objects to remote sensing and aerial scenes domains?," in Proc. IEEE Int. Conf. Comput. Vision Pattern Recognit. Workshops, 2015, pp. 44-51.
▪ [22] G. Cheng, J. Han and X. Lu, "Remote Sensing Image Scene Classification: Benchmark and State of the Art," in Proceedings of the IEEE, vol. 105, no. 10, pp. 1865-1883, Oct. 2017.
▪ [23] G.-S. Xia, J. Hu, F. Hu, B. Shi, X. Bai, Y. Zhong, and L. Zhang. AID: A benchmark dataset for performance evaluation of aerial scene classification. arXiv preprint arXiv:1608.05167, 2016.
▪ [24] P. Helber, B. Bischke, A. Dengel, D. Borth. 2017. “EuroSAT: A Novel Dataset and Deep Learning Benchmark for Land Use and Land Cover Classification”, ArXiv
▪ [25] Li, H., Tao, C., Wu, Z., Chen, J., Gong, J., & Deng, M. (2017). RSI-CB: A Large Scale Remote Sensing Image Classification Benchmark via Crowdsource Data. Retrieved from https://arxiv.org/abs/1705.10450
▪ [26] Zhou, W., Newsam, S., Li, C., & Shao, Z. (2017). PatternNet: A Benchmark Dataset for Performance Evaluation of Remote Sensing Image Retrieval. arXiv preprint arXiv:1706.03424.
▪ [27] Sumbul, G., Charfuelan, M., Demir, B., & Markl, V. (2019). BigEarthNet: A Large-Scale Benchmark Archive For Remote Sensing Image Understanding. Retrieved from http://arxiv.org/abs/1902.06148
▪ [28] ImageNet project
Online: http://image-net.org/about-overview
▪ [29] G. Cavallaro, N. Falco, M. Dalla Mura and J. A. Benediktsson, "Automatic Attribute Profiles," in IEEE Transactions on Image Processing, vol. 26, no. 4, pp. 1859-1872, April 2017.
▪ [30] H. Lee, R. Grosse, R. Ranganath and A. Y. Ng, “Unsupervised Learning of Hierarchical Representations with Convolutional Deep Belief Networks”, in Commun. ACM, vol. 54, no. 10, pp. 95-103, 2011.
Lecture Bibliography (4)
▪ [31] Hyperspectral Remote Sensing Scenes
Online: http://www.ehu.eus/ccwintco/index.php/Hyperspectral_Remote_Sensing_Scenes
▪ [32] Goel, P. K., Prasher, S. O., Patel, R. M., Landry, J. A., Bonnell, R. B., & Viau, A. A. (2003). Classification of hyperspectral data by decision trees and artificial neural networks to identify weed stress and nitrogen status of corn. Computers and Electronics in Agriculture. https://doi.org/10.1016/S0168-1699(03)00020-6
▪ [33] DeepHyperX: Deep learning for Hyperspectral imagery
Online: https://gitlab.inria.fr/naudeber/DeepHyperX/
▪ [34] A. Garcia-Garcia, S. Orts-Escolano, S. Oprea, V. Villena-Martinez, J. Garcia-Rodriguez, “A Review on Deep Learning Techniques Applied to Semantic Segmentation”, in CoRR, 2017.
Online: http://arxiv.org/abs/1704.06857
▪ [35] Image Segmentation Using DIGITS 5
Online: https://devblogs.nvidia.com/image-segmentation-using-digits-5/
▪ [36] Audebert, N., Saux, B., & Lefèvre, S. (2019). Deep Learning for Classification of Hyperspectral Data: A Comparative Review. https://doi.org/10.1109/mgrs.2019.2912563
▪ [37] Wei Hu, Yangyu Huang, Li Wei, Fan Zhang and Hengchao Li, "Deep Convolutional Neural Networks for Hyperspectral Image Classification," Journal of Sensors, Volume 2015 (2015). https://www.hindawi.com/journals/js/2015/258619/
▪ [38] J. Long, E. Shelhamer and T. Darrell, "Fully convolutional networks for semantic segmentation," 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, 2015, pp. 3431-3440.
▪ [39] Tran, D., Bourdev, L., Fergus, R., Torresani, L., & Paluri, M. (2015). Learning spatiotemporal features with 3D convolutional networks. In Proceedings of the IEEE International Conference on Computer Vision. https://doi.org/10.1109/ICCV.2015.510
▪ [40] Martin Görner, Theory: convolutional networks
Online: https://sites.google.com/site/nttrungmtwiki/home/it/data-science---python/tensorflow/tensorflow-and-deep-learning-part-3?tmpl=%2Fsystem%2Fapp%2Ftemplates%2Fprint%2F&showPrintDialog=1
Lecture Bibliography (5)
▪ [41] Sharma, V., Diba, A., Tuytelaars, T., & Gool, L.V., Hyperspectral CNN for image classification & band selection, with application to face recognition, 2016
▪ [42] PyTorch Parameters
Online: https://pytorch.org/docs/stable/nn.html
▪ [43] Epoch vs Batch Size vs Iterations
Online: https://towardsdatascience.com/epoch-vs-iterations-vs-batch-size-4dfb9c7ce9c9
▪ [44] Minimizing cost functions with gradient descent
Online: https://mllog.github.io/2016/11/04/Python-ML-Chapter-2/
▪ [45] An overview of gradient descent optimization algorithms
Online: http://ruder.io/optimizing-gradient-descent/index.html#rmsprop
▪ [46] H. Hotelling, “Analysis of a complex of statistical variables into principal components”, in Journal of Educational Psychology, 24, 417–441, and 498–520, 1933
Online: https://www.scribd.com/document/59617538/Analysis-of-a-Complex-of-Statistical-Variables-Into-Principal-Components