
Multisource Remote Sensing Data Classification Based on Convolutional Neural Network

Xiaodong Xu, Student Member, IEEE, Wei Li, Senior Member, IEEE, Qiong Ran,

Qian Du, Senior Member, IEEE, Lianru Gao, Member, IEEE, and Bing Zhang, Senior Member, IEEE

Abstract— With a growing list of remotely sensed data sources available, how to efficiently exploit useful information from multisource data for better Earth observation has become an interesting but challenging problem. In this paper, the classification fusion of hyperspectral imagery (HSI) and data from other sensors, such as light detection and ranging (LiDAR) data, is investigated with a state-of-the-art deep learning model, named the two-branch convolutional neural network (CNN). More specifically, a dual-tunnel CNN framework is first developed to extract spectral–spatial features from HSI; in addition, a CNN with cascade blocks is designed for feature extraction from LiDAR or high-resolution visible imagery. In the feature fusion stage, the spatial and spectral features of HSI are first integrated in the dual-tunnel branch and then combined with the features extracted from the cascade network. Experimental results on several multisource data sets demonstrate that the proposed two-branch CNN achieves better classification performance than several existing methods.

Index Terms— Convolutional neural network (CNN), data fusion, deep learning, feature extraction, hyperspectral imagery (HSI).

    I. INTRODUCTION

HYPERSPECTRAL imagery (HSI) is acquired with hundreds of narrow contiguous spectral bands carrying plenty of spectral information. With the advantage of distinguishing subtle spectral differences, HSI classification has been applied in various fields, including deforestation monitoring, land cover mapping, and mineral exploration [1]–[5]. On the other hand, light detection and ranging (LiDAR) can provide elevation information on height and shape with respect to the sensor [6], [7]. LiDAR offers rich altitude information, which is valuable for better characterizing a scene that would otherwise be acquired solely by optical sensors [8]–[10].

Manuscript received June 29, 2017; revised August 1, 2017 and August 28, 2017; accepted September 19, 2017. Date of publication October 16, 2017; date of current version January 26, 2018. This work was supported in part by the National Natural Science Foundation of China under Grant NSFC-91638201 and Grant 61571033 and in part by the Higher Education and High-Quality and World-Class Universities under Grant PY201619. (Corresponding author: Wei Li.)

X. Xu, W. Li, and Q. Ran are with the College of Information Science and Technology, Beijing University of Chemical Technology, Beijing 100029, China (e-mail: [email protected]).

Q. Du is with the Department of Electrical and Computer Engineering, Mississippi State University, Starkville, MS 39762 USA (e-mail: [email protected]).

L. Gao and B. Zhang are with the Institute of Remote Sensing and Digital Earth, Chinese Academy of Sciences, Beijing 100094, China (e-mail: [email protected]; [email protected]).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

    Digital Object Identifier 10.1109/TGRS.2017.2756851

Furthermore, with the development of remote sensor technology, visible (VIS) images are collected at high spatial resolution using RGB channels (i.e., red, green, and blue), providing more spatial context details. Therefore, the fusion of the aforementioned data sources can help integrate diverse information to further improve Earth observation performance.

Numerous classification methods using HSI data have been developed [11]–[15]. Since different data sources have specific merits, various classification fusion strategies have been proposed to combine the multiple characteristics of different data sources [16]. For example, the fusion of HSI and LiDAR has achieved great success in a variety of applications, such as microclimate modeling [17], fuel-type mapping [16], and texture-related mapping [18]–[20]. There is no doubt that the combination of HSI and LiDAR can lead to higher classification accuracy than the use of HSI data alone. The fusion of HSI with other data sources has thus been of great interest for research and practical purposes. Morphological extinction profiles [21] were proposed for both HSI and LiDAR feature extraction. In [22], morphological attribute profiles were demonstrated to produce higher classification performance with two different types of remote sensing data. A decision fusion for HSI and LiDAR data classification was presented in [23], and multiple features extracted through different kernels have been fused in a decision phase [24].

On the other hand, the VIS image carries less spectral information but higher spatial resolution than the HSI data. The joint use of HSI and VIS has been studied to improve land-cover classification [25] and urban object detection [26]. In [25], combining spectral and spatial features from multiple data sources obtained superior classification performance. In [26], different features were extracted from the VIS and HSI data after principal component analysis was employed to reduce the feature subspace, and classification was carried out with a pixelwise support vector machine (SVM) classifier. In [27], long-wave infrared HSI and VIS data were classified separately via semisupervised local discriminant analysis and the SVM classifier, and then the results were merged.

In the field of remote sensing, various machine learning methods have been validated as efficient for high-dimensional classification tasks. Random forest, with its bootstrap-aggregation (bagging) strategy [13], [28], provides satisfactory performance in remote sensing data classification owing to its tolerance of large data volumes. The SVM [29], [30], with a nonlinear kernel function, has been commonly used for HSI classification, especially when the training set is small.




A network-based classification method called the extreme learning machine (ELM) [31] has been introduced with performance comparable to that of the SVM. Recently, deep learning-based methods have attracted great attention due to their ability to mine latent representations and features from raw data. A deep autoencoder considering both spatial and spectral information was designed for HSI classification [32]. A stacked autoencoder network was employed to learn features from the data after dimension reduction [33]. A deep belief network was proposed in [34] for HSI classification. Furthermore, the convolutional neural network (CNN) was employed for large-scale visual image classification [35]–[37]. The CNN has been widely studied in the remote sensing community and shown to be more powerful than the SVM [38]. A CNN-based pixel-pair feature framework was proposed for HSI classification [39], and a CNN combined with sparse representation was further developed [40]. In [41]–[43], more CNNs were employed for deep feature extraction. Due to its effectiveness, the technique has been applied to large-scale high-resolution remote sensing image classification [44]–[52] and has demonstrated superior performance over traditional methods.

In this paper, a novel two-branch CNN for multisource remote sensing data classification is proposed, where the CNN architecture is designed to combine features extracted from HSI and another data source, such as LiDAR or VIS. For the HSI branch, a dual-tunnel CNN is proposed for spectral–spatial feature extraction on a local HSI patch. More specifically, a pure 2-D CNN is utilized to focus on spatial information in the local patch window, and a 1-D CNN is employed to extract the spectral feature of the center pixel. Both types of features are subsequently fused in the fully connected layer through concatenation or stacking. Furthermore, a cascade network block, which passes multiscale features to the output, is developed to exploit spatial information in LiDAR or VIS. In the entire framework, the two network branches are first trained individually with the available labeled samples; then, the features to be merged are extracted by the branches with their classification layers removed. The network is further retrained to fine-tune the weights following the idea of transfer learning [53].

The main contributions of this paper can be summarized as follows. First, for the network architecture, a novel two-branch deep CNN framework is proposed for pixelwise classification that fuses multisource remote sensing data, e.g., HSI and LiDAR. To make full use of the spatial–spectral information in HSI, a dual-tunnel deep network architecture is developed. The network consists of pure 2-D and 1-D convolution operators, which extract features from the surrounding neighbors of a pixel while enhancing its spectral information, and a deep network with cascade blocks is further designed to extract features from LiDAR or VIS data. Second, in the feature fusion stage, the spatial and spectral features of HSI are combined with the features extracted from the cascade network, and all the features are concatenated in fully connected layers with the dropout [54] technique to overcome overfitting. Third, regarding the training procedure, simultaneously tuning the weights of a two-branch network to achieve high classification performance is difficult. In our strategy, the two branches are first trained individually; then, they are combined by updating some weights with fine-tuning while keeping others fixed.

This paper is organized as follows. In Section II, the details of the proposed classification network are described. In Section III, experimental results on several multisource data sets are presented. Section IV provides the concluding remarks.

    II. PROPOSED CLASSIFICATION FRAMEWORK

The main procedure of the proposed classification framework is shown in Fig. 1, including a dual-tunnel CNN branch for HSI and a cascade-block CNN branch for LiDAR or VIS, followed by a classifier consisting of fully connected layers with a softmax loss.

    A. Dual-Tunnel CNN Branch for HSI

First, a dual-tunnel CNN is designed for HSI feature extraction, as shown in Fig. 2. It consists of a spectral tunnel and a spatial tunnel. The spectral tunnel concentrates on the center pixel $H^{spec}_{ij}$ at location $p_{ij}$ and is made up of only simple operations, including a 1-D convolution layer, activation, max-pooling, and batch normalization [55], [56]. Regarding the detailed composition of the network, the leaky rectified linear unit (ReLU) is adopted as the activation, which attempts to fix the “dying ReLU” problem by taking a small negative-slope coefficient, and batch normalization allows a higher learning rate to accelerate convergence by normalizing the data of each training mini-batch. After the convolutional and max-pooling layers, the output spectral features $F^{spec}_{ij}$ are flattened.

The spatial tunnel takes a patch centered at the pixel $p_{ij}$ with a radius $r$ (e.g., 4) as input. The raw data patch $H^{spat}_{ij} \in \mathbb{R}^{k_{size} \times k_{size}}$ ($k_{size} = 2r + 1$) is fed into the 2-D CNN tunnel. To ensure the consistency of the spectral and spatial tunnels, the architecture of the spatial tunnel is the same as that of the spectral tunnel. The 2-D convolution and batch normalization produce the spatial features $F^{spat}_{ij}$ from the neighborhood surrounding the center target pixel $p_{ij}$.
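The two tunnels can be sketched in the Keras functional API, the library the authors report using. In the minimal sketch below, the layer widths, kernel sizes, and the choice to keep all bands as channels of the spatial patch are illustrative assumptions, not the exact configuration of Table V.

```python
# Minimal sketch of the dual-tunnel HSI branch (illustrative sizes only).
from tensorflow.keras import layers

n_bands, r = 144, 4                      # e.g., Houston HSI; patch radius r
ksize = 2 * r + 1                        # 9 x 9 window

# Spectral tunnel: 1-D convolution -> leaky ReLU -> batch norm -> max-pool
# applied to the spectrum of the center pixel.
spec_in = layers.Input(shape=(n_bands, 1))
x = layers.Conv1D(64, 11)(spec_in)
x = layers.LeakyReLU(0.1)(x)             # small slope against "dying ReLU"
x = layers.BatchNormalization()(x)
x = layers.MaxPooling1D(2)(x)
spec_feat = layers.Flatten()(x)          # F^spec_ij

# Spatial tunnel: the same layout with 2-D operators on the local patch.
spat_in = layers.Input(shape=(ksize, ksize, n_bands))
y = layers.Conv2D(64, (3, 3))(spat_in)
y = layers.LeakyReLU(0.1)(y)
y = layers.BatchNormalization()(y)
y = layers.MaxPooling2D((2, 2))(y)
spat_feat = layers.Flatten()(y)          # F^spat_ij
```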

The spatial–spectral features are then concatenated before they are fed into the fully connected layer. The output of the fully connected layer can be expressed as

$$F\left(T^{spat}_{ij}, T^{spec}_{ij}\right) = f\left(W \cdot \left(F^{spec}_{ij} \oplus F^{spat}_{ij}\right) + b\right) \tag{1}$$

where $\oplus$ denotes the operation of concatenating the spatial and spectral feature vectors, and $W$ and $b$ are the weights and bias of the fully connected layer, respectively. Then, the joint spatial–spectral feature, denoted as $F_{hsi}$, is fed into the softmax classification layer to predict the probability distribution expressed as

$$\mathrm{pred}(i, j) = \frac{1}{\sum_{j=1}^{C} \exp\left(\theta_j^{\top} F_{hsi}\right)}
\begin{bmatrix}
\exp\left(\theta_1^{\top} F_{hsi}\right) \\
\exp\left(\theta_2^{\top} F_{hsi}\right) \\
\vdots \\
\exp\left(\theta_C^{\top} F_{hsi}\right)
\end{bmatrix} \tag{2}$$


    Fig. 1. Flowchart of the proposed two-branch CNN for multisource remote sensing data classification.

Fig. 2. Architecture of the proposed dual-tunnel CNN for HSI feature extraction. Note that the input data are a local patch around its center pixel.

where $\theta_j$ ($j = 1, 2, \ldots, C$) is the $j$th column of the weights in the prediction layer, $C$ is the number of classes, and $\mathrm{pred}(i, j) \in \mathbb{R}^C$ is a 1-D vector, which determines the label of the testing pixel.

In the designed dual-tunnel CNN branch, the two tunnels forward simultaneously when the forward computation is needed, and the chain rule is followed in the backpropagation-based weight update.
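Continuing the sketch above, Eqs. (1) and (2) correspond to a concatenation followed by a fully connected layer and a softmax output. The hidden width of 128 is an assumption; `spec_in`, `spat_in`, `spec_feat`, and `spat_feat` come from the previous sketch.

```python
# Fusion head corresponding to Eqs. (1)-(2): concatenate the two tunnels'
# features, apply a fully connected layer f(W . + b), and predict the class
# distribution with softmax.
from tensorflow.keras import layers, models

C = 15                                                 # number of classes (Houston)
fused = layers.Concatenate()([spec_feat, spat_feat])   # concatenation in Eq. (1)
fc = layers.Dense(128, activation='relu')(fused)       # width is an assumption
pred = layers.Dense(C, activation='softmax')(fc)       # softmax of Eq. (2)

hsi_branch = models.Model([spec_in, spat_in], pred)
```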

    B. Cascade-Block CNN Branch for LiDAR or VIS

The flowchart of the cascade-block CNN branch, which extracts features from LiDAR or VIS, is shown in Fig. 3; the branch includes a convolution layer, cascade blocks, and a max-pooling layer. In this branch, the normalized data are first fed into the network through a convolutional operation with a spatial kernel (e.g., a 3 × 3 kernel). After the subsequent cascade and max-pooling operations, features $F_{lv}$ are extracted. A fully connected layer with dropout takes the flattened $F_{lv}$ as input, and the last layer predicts the class of the neighboring local patch with the highest probability.

Fig. 3. Architecture of the proposed CNN with cascade blocks for LiDAR/VIS feature extraction. Note that the input data are a local patch.

    The cascade block is defined as

$$y_m = g_m(x_1, \{W_i\}) + x_1 \tag{3}$$

$$y = g_s(x_s, \{W_j\}) + x_s \tag{4}$$


Fig. 4. Detailed components of the cascade block.

where $g_m(x_1, \{W_i\})$ and $g_s(x_s, \{W_j\})$ represent the function mappings between the two corresponding shortcut paths. Here, $x$ and $y$ are the input and output vectors of the cascade block, respectively. Note that $x_1$ and $x_s$ are the outputs of the first convolution and of the leaky ReLU, respectively, and the dimensionality of $y_m$ and $x_1$ must be the same.

The detailed architecture of the cascade block is shown in Fig. 4. Inspired by DenseNet [57] for image classification, the cascade block is designed to combine different-level features from unequal layers for feature reuse and propagation. The cascade block consists of seven layers of operations, including convolution, batch normalization, and leaky ReLU. Two paths bridge the first and middle convolutions and the two activation operations. Each path passes earlier features to a subsequent layer through simple elementwise addition, performed channel by channel between two feature maps of the same shape. The fused features are then propagated to the next layer in the forward phase; in the backpropagation-based weight update, the chain rule is adopted.
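A hedged sketch of one cascade block follows. The exact seven-layer composition of Fig. 4 is not fully recoverable from the text, so the ordering of batch normalization and activations here is our assumption; 'same' padding keeps the shapes equal so that the elementwise additions of Eqs. (3) and (4) are valid.

```python
# Sketch of a cascade block: two shortcut paths that add earlier feature maps
# to later ones, as in Eqs. (3)-(4).
from tensorflow.keras import layers

def cascade_block(inputs, n_filters=64):
    # First convolution; its output x1 is reused through the first shortcut.
    x1 = layers.Conv2D(n_filters, (3, 3), padding='same')(inputs)
    x1 = layers.LeakyReLU(0.1)(x1)

    # Middle convolution g_m; Eq. (3): y_m = g_m(x1, {W_i}) + x1.
    g = layers.Conv2D(n_filters, (3, 3), padding='same')(x1)
    g = layers.BatchNormalization()(g)
    y_m = layers.Add()([g, x1])
    x_s = layers.LeakyReLU(0.1)(y_m)     # output of the leaky ReLU

    # Final convolution g_s; Eq. (4): y = g_s(x_s, {W_j}) + x_s.
    g = layers.Conv2D(n_filters, (3, 3), padding='same')(x_s)
    g = layers.BatchNormalization()(g)
    y = layers.Add()([g, x_s])
    return layers.LeakyReLU(0.1)(y)
```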

    C. Network Training Strategy

All the weights, including biases, in the proposed network need to be trained, and jointly optimizing the parameters of the two branches is difficult. In our strategy, training data pairs are denoted as $\{[(H^{spat}_{ij}, H^{spec}_{ij}), L_{ij}] \sim C_{ij}\}$. The values $H^{spat}_{ij}$ and $H^{spec}_{ij}$ are fed into the HSI branch, and $L_{ij}$ into the LiDAR/VIS branch. The two branches are first trained using separate training data. In our experiments, a fine-tuning strategy is also employed. As is known, fine-tuning requires a model pretrained on the same or a larger data set, and loading the weights of a pretrained model can significantly reduce the computation time. The spatial–spectral CNN feature extractor for HSI and the cascade CNN feature extractor for LiDAR/VIS are therefore first trained on their individual training sets. More specifically, as described in Algorithm 1, the two branches are first trained in stage 1 with a large learning rate. When the two branches are merged in stage 2, the pretrained models, with their top fully connected layer and softmax prediction layer removed, extract the corresponding features from the training data pairs. The bottom layers of the two branches are either fixed or kept trainable with a smaller learning rate under the stochastic gradient descent rule. The weights and biases of the additional layers are first initialized with Glorot normal initialization and then updated with a very small learning rate.

Algorithm 1 Training the Two-Branch CNN for Classification of Multisource Remote Sensing Data

1: Initialize all weights
2: while epoch < epochs do
3:     stage 1:
           • Train the HSI branch
           • Train the LiDAR/VIS branch
4: end while
5: while epoch < epochs do
6:     stage 2:
           • Merge the separately trained models
           • Train the two-branch CNN
7: end while
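The two stages of Algorithm 1 can be sketched as follows. Here `hsi_branch` is the model from the earlier sketch, `lidar_branch` is an assumed analogous model built from the cascade-block branch, and the layer indices, head width, and learning rates are assumptions rather than reported values.

```python
# Sketch of Algorithm 1's two stages. Stage 1: each branch is trained on its
# own with a large learning rate (fit calls elided). Stage 2: the softmax
# heads are popped off, features are concatenated, and a new head is trained.
from tensorflow.keras import layers, models, optimizers

# Stage 1: train the branches individually.
hsi_branch.compile(optimizer=optimizers.Adam(learning_rate=0.01),
                   loss='categorical_crossentropy')
lidar_branch.compile(optimizer=optimizers.Adam(learning_rate=0.01),
                     loss='categorical_crossentropy')

# Stage 2: strip the prediction layers and merge the feature extractors.
hsi_feat = models.Model(hsi_branch.inputs, hsi_branch.layers[-2].output)
lidar_feat = models.Model(lidar_branch.inputs, lidar_branch.layers[-2].output)
for layer in hsi_feat.layers + lidar_feat.layers:
    layer.trainable = False              # or leave trainable with a small rate

merged = layers.Concatenate()([hsi_feat.output, lidar_feat.output])
merged = layers.Dense(128, activation='relu',
                      kernel_initializer='glorot_normal')(merged)
merged = layers.Dropout(0.5)(merged)     # dropout against overfitting
out = layers.Dense(C, activation='softmax')(merged)

two_branch = models.Model(hsi_feat.inputs + lidar_feat.inputs, out)
two_branch.compile(optimizer=optimizers.Adam(learning_rate=1e-4),
                   loss='categorical_crossentropy')
```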

Deep learning methods require a large amount of labeled data for training. For remote sensing data, the labeling process is expensive, and usually only a few training samples are available. To alleviate this problem, a random seed is generated in the training phase to rotate samples by 90° and to flip them left–right and up–down, which adds extra samples to the training set. To accelerate convergence, all data are scaled to [0, 1]. On the other hand, fine-tuning is usually applied to transfer a model pretrained on large-scale data to small-scale but similar data. Considering that the features extracted by each branch are specific to that branch, we utilize the transfer learning strategy [53] to train the two branches on a small and very similar training set.
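A minimal sketch of this augmentation and scaling, assuming patches are stored as (height, width, bands) NumPy arrays:

```python
# Random 90-degree rotations plus left-right and up-down flips, and scaling
# of the data to [0, 1], as described above.
import numpy as np

def scale_01(data):
    """Scale an array to the [0, 1] range to accelerate convergence."""
    mn, mx = data.min(), data.max()
    return (data - mn) / (mx - mn + 1e-12)

def augment(patch, rng=np.random.default_rng()):
    """Return a randomly rotated/flipped copy of one (H, W, bands) patch."""
    patch = np.rot90(patch, k=rng.integers(0, 4), axes=(0, 1))
    if rng.random() < 0.5:
        patch = patch[:, ::-1, :]        # left-right flip
    if rng.random() < 0.5:
        patch = patch[::-1, :, :]        # up-down flip
    return patch
```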

    D. Analysis on the Proposed Method

In the proposed dual-tunnel CNN for HSI feature extraction, the joint utilization of 2-D and 1-D CNNs integrates spatial and spectral information: the 2-D CNN concentrates on spatial information, while the 1-D CNN enhances spectral feature extraction. In the dual-tunnel CNN shown in Fig. 2, the 2-D and 1-D operations are separated. In [58], a 3-D convolution was also utilized for HSI classification. Compared with our method, its spatial feature extraction is almost the same; however, the spectra of all the pixels in a cube, not only the center pixel, are repeatedly convolved, which requires more computational memory and resources to update the network parameters.

On the other hand, the cascade block is designed for the LiDAR/VIS branch. LiDAR is rich in elevation information of the surface, which is useful for distinguishing objects that differ in altitude, and VIS has higher spatial resolution with true color. In the proposed CNN with cascade blocks, different cascade layers extract distinct features; for example, the first layer may collect edge, texture, and corner features from the raw input, while subsequent layers extract more abstract features. The cascade block can pass low-level features to high-level layers through a shortcut path, which makes different feature maps reusable. The cascade operation is the key to feature fusion at multiple scales.


    TABLE I

    NUMBER OF TRAINING AND TESTING SAMPLES FOR THE HOUSTON DATA

    TABLE II

    NUMBER OF TRAINING AND TESTING SAMPLES FOR THE TRENTO DATA


    III. EXPERIMENTAL RESULTS

In this section, the performance of the proposed two-branch CNN is evaluated on several multisource data sets, including HSI + LiDAR and HSI + VIS, and compared with other state-of-the-art methods. All the experiments are implemented in Python with TensorFlow (https://www.tensorflow.org/) and the high-level application programming interface Keras (https://keras.io/). TensorFlow is an open-source library for machine intelligence using data flow graphs; Keras is written in Python, runs on top of TensorFlow and Theano, and focuses on enabling fast experimentation. All the experiments are run on Ubuntu 14.04 with dual Intel E5 2683 CPUs, 16 GB of memory, and an Nvidia GTX 1080 GPU.

    A. Experiment Data

To evaluate the performance of the proposed two-branch CNN on HSI + LiDAR, the Houston and Trento data are employed. For the experiments on feature fusion using HSI + VIS, the public standard HSI classification data sets, i.e., Salinas Valley and University of Pavia, are employed, where the corresponding VIS images are simulated.


    TABLE III

NUMBER OF TRAINING AND TESTING SAMPLES FOR THE UNIVERSITY OF PAVIA DATA

    TABLE IV

    NUMBER OF TRAINING AND TESTING SAMPLES FOR THE SALINAS DATA

    TABLE V

    DETAILS OF THE PROPOSED TWO-BRANCH CNN ARCHITECTURE

1) Houston Data: The data are composed of the HSI and LiDAR images introduced for the 2013 GRSS Data Fusion Contest. The scene was acquired in 2012 by an airborne sensor over the University of Houston campus and the neighboring area. The size of the data is 349 × 1905 pixels with a 2.5-m spatial resolution. The HSI consists of 144 spectral bands with wavelengths ranging from 0.38 to 1.05 μm and includes 15 classes. Table I shows the available training and testing samples, which are the same as in [7], [22], and [23].

2) Trento Data: The image was captured over a rural area of the city of Trento, Italy. The scene has 600 × 166 pixels and includes six classes. The HSI consists of 63 bands ranging from 420.89 to 989.09 nm with a spectral resolution of 9.2 nm, and the spatial resolution is 1 m. Table II lists the number of samples in the different classes.


Fig. 5. Performance of fine-tuning on each class accuracy (%) and OA (%) for different data. (a) Houston. (b) Trento. (c) University of Pavia. (d) Salinas.

    TABLE VI

CLASSIFICATION PERFORMANCE WITH DIFFERENT NEIGHBORING WINDOW SIZES

3) Pavia Data: The Reflective Optics System Imaging Spectrometer sensor captured the data over the area of Pavia, northern Italy. The scene has 103 spectral bands ranging from 0.43 to 0.86 μm and 610 × 340 pixels covering the city. The spatial resolution of the data is 1.3 m. About 42 776 pixels are labeled with nine classes, and 200 samples per class are randomly selected from the ground-truth map as the training set. The number of training and testing samples is listed in Table III. To simulate the VIS image, the 53rd, 31st, and 7th bands of the original data are selected to form the simulated VIS data [59].

    TABLE VII

    CLASSIFICATION PERFORMANCE OF INDIVIDUAL BRANCH

Since a VIS image usually has a higher spatial resolution than the HSI data, a Gaussian downsampling operation is applied to decrease the spatial resolution of the HSI data; an upsampling operation with simple interpolation is then applied to the HSI data to match the size of the VIS image. In doing so, the spatial resolution of the VIS is 1.3 m, while that of the HSI is much lower.
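A sketch of this degradation, assuming SciPy is available; the blur strength and downsampling factor are assumptions, as the paper does not report them:

```python
# Blur the HSI spatially with a Gaussian low-pass filter, downsample it, and
# interpolate back to the original grid so that it aligns with the VIS image
# while having a lower effective spatial resolution.
from scipy.ndimage import gaussian_filter, zoom

def degrade_hsi(hsi, scale=4, sigma=2.0):
    """hsi: (H, W, bands) array -> blurred HSI resampled back to (H, W)."""
    blurred = gaussian_filter(hsi, sigma=(sigma, sigma, 0))  # no spectral blur
    low = blurred[::scale, ::scale, :]                       # downsample
    factors = (hsi.shape[0] / low.shape[0],
               hsi.shape[1] / low.shape[1], 1)
    return zoom(low, factors, order=1)                       # simple interpolation
```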

4) Salinas Data: The data were acquired by the Airborne Visible Infrared Imaging Spectrometer sensor, with 224 bands, over the area of Salinas Valley, California.


    TABLE VIII

    CLASS SPECIFIC AND OVERALL CLASSIFICATION ACCURACY (%) OF DIFFERENT METHODS FOR THE HOUSTON DATA

    TABLE IX

    CLASS SPECIFIC AND OVERALL CLASSIFICATION ACCURACY (%) OF DIFFERENT METHODS FOR THE TRENTO DATA

The data comprise 512 × 217 pixels with a spatial resolution of 3.7 m. There are 16 classes in these data, including bare soils, vegetables, and vineyard fields. Table IV lists the training and testing samples. Similar to the University of Pavia data, three bands of the original HSI data, i.e., the 50th, 20th, and 10th, are selected to make up the VIS data. The spatial resolution of the VIS remains 3.7 m, while that of the HSI is degraded by a Gaussian low-pass filtering procedure.

    B. Parameter Tuning

The classification performance is closely related to the designed architecture of the deep learning network. In our experiments, a Visual Geometry Group-like network [35] is first designed for the HSI branch. The bottom three layers, i.e., the fully connected, classification, and data input layers, are removed; thus, the entire architecture consists of five operation pairs, each including a convolutional layer and a ReLU activation layer. This branch network has a complex architecture with many parameters, and it therefore takes much time to train until convergence. Table V lists the details of the proposed two-branch CNN, where the size of each convolution kernel and the dimension of each feature map are given. Note that Conv2D_3 means a 2-D convolution with a kernel size of 3 × 3, and 256 is the dimension of the feature map.

In addition, we investigate the effect of different sizes of the neighboring patch. The performance with window sizes of 7 × 7, 9 × 9, and 11 × 11, tested on the Houston and Salinas data, is reported in Table VI. The study indicates that a window size of 9 × 9 offers slightly better performance than the others.

The learning rate is one of the factors that determine the convergence speed and thus the training performance. The learning rate is set to 0.01 with the Adam policy [60]. Different learning rates were tested on the Houston data; a learning rate of 0.001 needs more time to converge, and, based on our empirical study, a higher learning rate may not improve classification accuracy.

The fine-tuning strategy can significantly decrease the computational complexity: low-level features are extracted by pretrained layers and then fit into the following layers for the expected objective. In practice, fine-tuning helps achieve higher classification accuracy and a more robust network. In Fig. 5, we compare the performance with and without fine-tuning. From the results, the overall classification accuracy (OA) on Houston reaches only 84.37% without fine-tuning, and the network takes hours to converge. The conclusion is that training with fine-tuning clearly outperforms training without it. Moreover, the fine-tuning of the proposed two-branch model offers the potential for transfer learning to much smaller data sets.


    TABLE X

    CLASS SPECIFIC AND OVERALL CLASSIFICATION ACCURACY (%) OF DIFFERENT METHODS FOR THE UNIVERSITY OF PAVIA DATA

    TABLE XI

    CLASS SPECIFIC AND OVERALL CLASSIFICATION ACCURACY (%) OF DIFFERENT METHODS FOR THE SALINAS DATA

    C. Classification Performance

To illustrate the performance of the proposed two-branch CNN for multisource remote sensing data classification, we compare it with several traditional classifiers, such as the SVM and the ELM, and with the recently developed CNN pixel-pair features (CNN-PPF) [31], [38], [39]. The SVM and the ELM are implemented with their official open-source libraries using optimal parameters. Taking the Houston data as an example, SVM(H) indicates that the SVM is tested on the HSI data alone, while SVM(H + L) indicates that the classifier is run on data in which the HSI and the LiDAR are concatenated. Moreover, for a fair comparison with the other methods, all the training and testing samples are exactly the same.

Table VII shows that the cascade branch for LiDAR/VIS and the dual-tunnel branch for HSI each reach fairly high accuracy. Taking the Trento data as an example, the accuracy using LiDAR alone is 85.17%, and that using HSI alone is 95.35%; integrating them leads to a superior classification accuracy of 97.92%.

To evaluate the fusion of HSI and LiDAR, Tables VIII and IX list the class-specific accuracy and OA for the Houston and Trento data. The proposed method is clearly superior to the other methods.

    TABLE XII

ELAPSED TIME (m: MINUTES AND s: SECONDS) OF TRAINING AND TESTING TIME FOR THE PROPOSED METHOD USING THE EXPERIMENTAL DATA SETS

For example, in Table VIII, the proposed method yields an accuracy of 87.98%, an improvement of over 4% compared with CNN-PPF and approximately 6% and 8% over the ELM and SVM, respectively. Therefore, it can be concluded that the joint use of HSI and LiDAR results in higher classification accuracy, especially for similar classes, such as parking lots and roads. A similar conclusion can be drawn from the average accuracy and kappa coefficient metrics.

Tables X and XI provide the results of joint HSI and VIS on the University of Pavia and Salinas data, respectively. Note that in Table X, CNN-PPF using the HSI alone achieves an accuracy of 93.28%, which is lower than that reported in the literature [39]. The reason is that the University of Pavia data in our experiments have been processed with a spatial blurring procedure.


Fig. 6. Data set visualization and classification maps for the Houston data obtained with different methods, including (a) pseudocolor image for HSI, (b) gray image for LiDAR, (c) ground-truth map, (d) SVM (80.49%), (e) ELM (81.92%), (f) CNN-PPF (83.33%), and (g) proposed two-branch CNN (87.98%).

Generally, using HSI only, the proposed dual-tunnel CNN is superior to CNN-PPF and much better than the SVM and ELM; when combining HSI and VIS, the accuracy of the proposed method reaches 99.13% on the University of Pavia data, the best among all the classifiers. For the Salinas data, the accuracy of the proposed method is 97.72%, an improvement of 2.5% over CNN-PPF.

Figs. 6 and 7 show the classification maps obtained by the different methods on the Houston and University of Pavia data, respectively. To facilitate comparison, the ground-truth maps are also shown. These maps are consistent with the results listed in Tables VIII and X. It is obvious that the proposed two-branch CNN produces fewer mislabeled areas in the bottom half of the map than the SVM, ELM, and CNN-PPF.

In addition, classification performance versus the number of training samples has been explored. Fig. 8 shows the performance with different percentages of training samples, from 20% to 100%, where 100% represents exactly the number of training samples listed in Tables I–III. For the other percentages (e.g., 20%), training samples (with the same size per class) are selected randomly; to obtain more reliable results, each experiment is repeated ten times, and the average classification accuracy is reported. As shown in Fig. 8, even when the percentage is as small as 40%, the proposed framework still performs much better than the others.


Fig. 7. Data set visualization and classification maps for the University of Pavia data obtained with different methods, including (a) pseudocolor image for HSI, (b) color image for VIS, (c) ground-truth map, (d) SVM (89.89%), (e) ELM (89.83%), (f) CNN-PPF (97.95%), and (g) proposed two-branch CNN (99.13%).

For example, in Fig. 8(a), the accuracy of the proposed two-branch CNN is approximately 86%, while that of the SVM is only about 80%. The improvement gap is obvious, which verifies the effectiveness of the proposed framework.

The computational complexity of the training and testing procedures of the proposed two-branch CNN is summarized in Table XII. All the timing experiments are implemented with the same software and hardware configuration. For the training procedure, all the data sets are trained for 20 epochs.


Fig. 8. Classification performance of the methods with different percentages of the full training set. (a) Houston data. (b) Pavia data.

The number of feature maps and the fully connected layers in the network are the major components contributing to the time cost. The training procedure takes much longer, while testing a whole scene is relatively fast.

IV. CONCLUSION

In this paper, a novel CNN-based approach has been proposed for the classification of multisource remote sensing data, such as HSI and LiDAR, or HSI and VIS data. The proposed two-branch CNN model contains two different pipelines. A simple dual-tunnel CNN, which consists of 2-D and 1-D CNNs with the same architecture, is designed to reinforce the corresponding spatial–spectral information. In addition, a cascade network is devised to combine features at different levels through a shortcut path. Furthermore, an asynchronous training strategy and a fine-tuning technique are adopted in the training phase. In the testing procedure, the normalized raw data pairs are fed into the model. The experimental results on several multisource data sets demonstrate that, with the same training samples, the proposed method outperforms the traditional SVM and ELM as well as the state-of-the-art CNN-based method, i.e., CNN-PPF.

REFERENCES

[1] J. Li, P. R. Marpu, A. Plaza, J. M. Bioucas-Dias, and J. A. Benediktsson, “Generalized composite kernel framework for hyperspectral image classification,” IEEE Trans. Geosci. Remote Sens., vol. 51, no. 9, pp. 4816–4829, Sep. 2013.

[2] W. Li, Q. Du, and B. Zhang, “Combined sparse and collaborative representation for hyperspectral target detection,” Pattern Recognit., vol. 48, no. 12, pp. 3904–3916, 2015.

[3] X. Huang and L. Zhang, “An SVM ensemble approach combining spectral, structural, and semantic features for the classification of high-resolution remotely sensed imagery,” IEEE Trans. Geosci. Remote Sens., vol. 51, no. 1, pp. 257–272, Jan. 2013.

[4] B. Du and L. Zhang, “A discriminative metric learning based anomaly detection method,” IEEE Trans. Geosci. Remote Sens., vol. 52, no. 11, pp. 6844–6857, Nov. 2014.

[5] W. Li and Q. Du, “Joint within-class collaborative representation for hyperspectral image classification,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 7, no. 6, pp. 2200–2208, Jun. 2014.

[6] Y. Chen, C. Li, P. Ghamisi, C. Shi, and Y. Gu, “Deep fusion of hyperspectral and LiDAR data for thematic classification,” in Proc. IEEE Int. Geosci. Remote Sens. Symp. (IGARSS), Jul. 2016, pp. 3591–3594.

[7] P. Ghamisi, B. Höfle, and X. X. Zhu, “Hyperspectral and LiDAR data fusion using extinction profiles and deep convolutional neural network,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 10, no. 6, pp. 3011–3024, Jun. 2017.

[8] M. Pedergnana, P. R. Marpu, M. D. Mura, J. A. Benediktsson, and L. Bruzzone, “Classification of remote sensing optical and LiDAR data using extended attribute profiles,” IEEE J. Sel. Topics Signal Process., vol. 6, no. 7, pp. 856–865, Nov. 2012.

[9] M. Belgiu, I. Tomljenovic, T. J. Lampoltshammer, T. Blaschke, and B. Höfle, “Ontology-based classification of building types detected from airborne laser scanning data,” Remote Sens., vol. 6, no. 2, pp. 1347–1366, 2014.

[10] I. Tomljenovic, B. Höfle, D. Tiede, and T. Blaschke, “Building extraction from airborne laser scanning data: An analysis of the state of the art,” Remote Sens., vol. 7, no. 4, pp. 3826–3862, 2015.

[11] F. Melgani and L. Bruzzone, “Classification of hyperspectral remote sensing images with support vector machines,” IEEE Trans. Geosci. Remote Sens., vol. 42, no. 8, pp. 1778–1790, Aug. 2004.

[12] Y. Chen, N. M. Nasrabadi, and T. D. Tran, “Hyperspectral image classification using dictionary-based sparse representation,” IEEE Trans. Geosci. Remote Sens., vol. 49, no. 10, pp. 3973–3985, Oct. 2011.

[13] J. Ham, Y. Chen, M. M. Crawford, and J. Ghosh, “Investigation of the random forest framework for classification of hyperspectral data,” IEEE Trans. Geosci. Remote Sens., vol. 43, no. 3, pp. 492–501, Mar. 2005.

[14] W. Li and Q. Du, “Gabor-filtering-based nearest regularized subspace for hyperspectral image classification,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 7, no. 4, pp. 1012–1022, Apr. 2014.

[15] W. Li, E. W. Tramel, S. Prasad, and J. E. Fowler, “Nearest regularized subspace for hyperspectral classification,” IEEE Trans. Geosci. Remote Sens., vol. 52, no. 1, pp. 477–489, Jan. 2014.

[16] B. Koetz, F. Morsdorf, S. van der Linden, T. Curt, and B. Allgöwer, “Multi-source land cover classification for forest fire management based on imaging spectrometry and LiDAR data,” Forest Ecol. Manage., vol. 256, no. 3, pp. 263–271, 2008.

[17] U. Heiden, W. Heldens, S. Roessner, K. Segl, T. Esch, and A. Mueller, “Urban structure type characterization using hyperspectral remote sensing and height information,” Landscape Urban Planning, vol. 105, no. 4, pp. 361–375, 2012.

[18] G. P. Asner et al., “Invasive species detection in Hawaiian rainforests using airborne imaging spectroscopy and LiDAR,” Remote Sens. Environ., vol. 112, no. 5, pp. 1942–1955, 2008.

[19] G. A. Blackburn, “Remote sensing of forest pigments using airborne imaging spectrometer and LIDAR imagery,” Remote Sens. Environ., vol. 82, nos. 2–3, pp. 311–321, 2002.


[20] M. Voss and R. Sugumaran, “Seasonal effect on tree species classification in an urban environment using hyperspectral data, LiDAR, and an object-oriented approach,” Sensors, vol. 8, no. 5, pp. 3020–3036, 2008.

[21] P. Ghamisi, R. Souza, J. A. Benediktsson, X. X. Zhu, L. Rittner, and R. A. Lotufo, “Extinction profiles for the classification of remote sensing data,” IEEE Trans. Geosci. Remote Sens., vol. 54, no. 10, pp. 5631–5645, Oct. 2016.

[22] M. Khodadadzadeh, J. Li, S. Prasad, and A. Plaza, “Fusion of hyperspectral and LiDAR remote sensing data using multiple feature learning,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 8, no. 6, pp. 2971–2983, Jun. 2015.

[23] W. Liao, R. Bellens, A. Pižurica, S. Gautama, and W. Philips, “Combining feature fusion and decision fusion for classification of hyperspectral and LiDAR data,” in Proc. IEEE Int. Geosci. Remote Sens. Symp. (IGARSS), Jul. 2014, pp. 1241–1244.

[24] C. Zhao, X. Gao, Y. Wang, and J. Li, “Efficient multiple-feature learning-based hyperspectral image classification with limited training samples,” IEEE Trans. Geosci. Remote Sens., vol. 54, no. 7, pp. 4052–4062, Jul. 2016.

[25] J. Li, H. Zhang, M. Guo, L. Zhang, H. Shen, and Q. Du, “Urban classification by the fusion of thermal infrared hyperspectral and visible data,” Photogramm. Eng. Remote Sens., vol. 81, no. 12, pp. 901–911, 2015.

[26] M. Eslami and A. Mohammadzadeh, “Developing a spectral-based strategy for urban object detection from airborne hyperspectral TIR and visible data,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 9, no. 5, pp. 1808–1816, May 2016.

[27] X. Lu, J. Zhang, T. Li, and G. Zhang, “Synergetic classification of long-wave infrared hyperspectral and visible images,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 8, no. 7, pp. 3546–3557, Jul. 2015.

[28] P. O. Gislason, J. A. Benediktsson, and J. R. Sveinsson, “Random forests for land cover classification,” Pattern Recognit. Lett., vol. 27, no. 4, pp. 294–300, 2006.

[29] L. Gao et al., “Subspace-based support vector machines for hyperspectral image classification,” IEEE Geosci. Remote Sens. Lett., vol. 12, no. 2, pp. 349–353, Feb. 2015.

[30] W. Li, S. Prasad, J. E. Fowler, and L. M. Bruce, “Locality-preserving dimensionality reduction and classification for hyperspectral image analysis,” IEEE Trans. Geosci. Remote Sens., vol. 50, no. 4, pp. 1185–1198, Apr. 2012.

[31] W. Li, C. Chen, H. Su, and Q. Du, “Local binary patterns and extreme learning machine for hyperspectral imagery classification,” IEEE Trans. Geosci. Remote Sens., vol. 53, no. 7, pp. 3681–3693, Jul. 2015.

[32] X. Ma, H. Wang, and J. Geng, “Spectral–spatial classification of hyperspectral image based on deep auto-encoder,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 9, no. 9, pp. 4073–4085, Sep. 2016.

[33] Y. Chen, Z. Lin, X. Zhao, G. Wang, and Y. Gu, “Deep learning-based classification of hyperspectral data,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 7, no. 6, pp. 2094–2107, Jun. 2014.

[34] Y. Chen, X. Zhao, and X. Jia, “Spectral–spatial classification of hyperspectral data based on deep belief network,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 8, no. 6, pp. 2381–2392, Jun. 2015.

[35] K. Simonyan and A. Zisserman. (2014). “Very deep convolutional networks for large-scale image recognition.” [Online]. Available: https://arxiv.org/abs/1409.1556

[36] C. Szegedy et al., “Going deeper with convolutions,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2015, pp. 1–9.

[37] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2016, pp. 770–778.

[38] W. Hu, Y. Huang, L. Wei, F. Zhang, and H. Li, “Deep convolutional neural networks for hyperspectral image classification,” J. Sensors, vol. 2015, 2015, Art. no. 258619, doi: 10.1155/2015/258619.

[39] W. Li, G. Wu, F. Zhang, and Q. Du, “Hyperspectral image classification using deep pixel-pair features,” IEEE Trans. Geosci. Remote Sens., vol. 55, no. 2, pp. 844–853, Feb. 2017.

[40] H. Liang and Q. Li, “Hyperspectral imagery classification using sparse representations of convolutional neural network features,” Remote Sens., vol. 8, no. 2, p. 99, 2016.

[41] Y. Chen, H. Jiang, C. Li, X. Jia, and P. Ghamisi, “Deep feature extraction and classification of hyperspectral images based on convolutional neural networks,” IEEE Trans. Geosci. Remote Sens., vol. 54, no. 10, pp. 6232–6251, Oct. 2016.

[42] K. Makantasis, K. Karantzalos, A. Doulamis, and N. Doulamis, “Deep supervised learning for hyperspectral data classification through convolutional neural networks,” in Proc. IEEE Int. Geosci. Remote Sens. Symp. (IGARSS), Jul. 2015, pp. 4959–4962.

[43] S. Yu, S. Jia, and C. Xu, “Convolutional neural networks for hyperspectral image classification,” Neurocomputing, vol. 219, pp. 88–98, Jan. 2017.

[44] E. Maggiori, Y. Tarabalka, G. Charpiat, and P. Alliez, “Convolutional neural networks for large-scale remote-sensing image classification,” IEEE Trans. Geosci. Remote Sens., vol. 55, no. 2, pp. 645–657, Feb. 2017.

[45] F. Hu, G.-S. Xia, J. Hu, and L. Zhang, “Transferring deep convolutional neural networks for the scene classification of high-resolution remote sensing imagery,” Remote Sens., vol. 7, no. 11, pp. 14680–14707, 2015.

[46] M. Castelluccio, G. Poggi, C. Sansone, and L. Verdoliva. (2015). “Land use classification in remote sensing images by convolutional neural networks.” [Online]. Available: https://arxiv.org/abs/1508.00092

[47] F. Zhang, B. Du, and L. Zhang, “Scene classification via a gradient boosting random convolutional network framework,” IEEE Trans. Geosci. Remote Sens., vol. 54, no. 3, pp. 1793–1802, Mar. 2016.

[48] M. Oquab, L. Bottou, I. Laptev, and J. Sivic, “Is object localization for free?—Weakly-supervised learning with convolutional neural networks,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit., Jun. 2015, pp. 685–694.

[49] E. Maggiori, Y. Tarabalka, G. Charpiat, and P. Alliez, “Fully convolutional neural networks for remote sensing image classification,” in Proc. IEEE Int. Geosci. Remote Sens. Symp. (IGARSS), Jul. 2016, pp. 5071–5074.

[50] G. J. Scott, M. R. England, W. A. Starms, R. A. Marcum, and C. H. Davis, “Training deep convolutional neural networks for land-cover classification of high-resolution imagery,” IEEE Geosci. Remote Sens. Lett., vol. 14, no. 4, pp. 549–553, Apr. 2017.

[51] Y. Zhong, F. Fei, and L. Zhang, “Large patch convolutional neural networks for the scene classification of high spatial resolution imagery,” J. Appl. Remote Sens., vol. 10, no. 2, p. 025006, 2016.

[52] Q. Zou, L. Ni, T. Zhang, and Q. Wang, “Deep learning based feature selection for remote sensing scene classification,” IEEE Geosci. Remote Sens. Lett., vol. 12, no. 11, pp. 2321–2325, Nov. 2015.

[53] J. Yosinski, J. Clune, Y. Bengio, and H. Lipson, “How transferable are features in deep neural networks?” in Proc. Adv. Neural Inf. Process. Syst., 2014, pp. 3320–3328.

[54] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: A simple way to prevent neural networks from overfitting,” J. Mach. Learn. Res., vol. 15, no. 1, pp. 1929–1958, 2014.

[55] A. L. Maas, A. Y. Hannun, and A. Y. Ng, “Rectifier nonlinearities improve neural network acoustic models,” in Proc. 30th ICML, 2013, vol. 30, no. 1.

[56] S. Ioffe and C. Szegedy. (2015). “Batch normalization: Accelerating deep network training by reducing internal covariate shift.” [Online]. Available: https://arxiv.org/abs/1502.03167

[57] G. Huang, Z. Liu, K. Q. Weinberger, and L. van der Maaten. (2016). “Densely connected convolutional networks.” [Online]. Available: https://arxiv.org/abs/1608.06993

[58] Y. Li, H. Zhang, and Q. Shen, “Spectral–spatial classification of hyperspectral imagery with 3D convolutional neural network,” Remote Sens., vol. 9, no. 1, p. 67, 2017.

[59] Z. Mahmood, M. A. Akhter, G. Thoonen, and P. Scheunders, “Contextual subpixel mapping of hyperspectral images making use of a high resolution color image,” IEEE J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 6, no. 2, pp. 779–791, Apr. 2013.

[60] D. P. Kingma and J. Ba. (2014). “Adam: A method for stochastic optimization.” [Online]. Available: https://arxiv.org/abs/1412.6980

Xiaodong Xu (S’16) received the B.S. degree from the Beijing University of Chemical Technology, Beijing, China, in 2015, where he is currently pursuing the M.S. degree under the supervision of Dr. W. Li.



Wei Li (S’11–M’13–SM’16) received the B.E. degree in telecommunications engineering from Xidian University, Xi’an, China, in 2007, the M.S. degree in information science and technology from Sun Yat-sen University, Guangzhou, China, in 2009, and the Ph.D. degree in electrical and computer engineering from Mississippi State University, Starkville, MS, USA, in 2012.

He was a Postdoctoral Researcher with the University of California at Davis, Davis, CA, USA. He is currently with the College of Information Science and Technology, Beijing University of Chemical Technology, Beijing, China. His research interests include statistical pattern recognition, hyperspectral image analysis, and data compression.

Dr. Li received the 2015 Best Reviewer Award from the IEEE Geoscience and Remote Sensing Society for his service to the IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING (JSTARS). He is an active reviewer for the IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, the IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, and the IEEE JSTARS. He serves as a Guest Editor of special issues of the Journal of Real-Time Image Processing, Remote Sensing, and the IEEE JSTARS.

Qiong Ran received the Ph.D. degree from the Institute of Remote Sensing Applications, Chinese Academy of Sciences, Beijing, China, in 2009.

She is currently with the College of Information Science and Technology, Beijing University of Chemical Technology, Beijing. She has authored over ten papers in China and abroad. Her research interests include image acquisition, image processing, and hyperspectral image analysis and applications.

Qian Du (S’98–M’00–SM’05) received the Ph.D. degree in electrical engineering from the University of Maryland at Baltimore, Baltimore, MD, USA, in 2000.

She is currently a Bobby Shackouls Professor with the Department of Electrical and Computer Engineering, Mississippi State University, Starkville, MS, USA. Her research interests include hyperspectral remote sensing image analysis and applications, pattern classification, data compression, and neural networks.

Dr. Du is a fellow of the International Society for Optics and Photonics. She received the 2010 Best Reviewer Award from the IEEE Geoscience and Remote Sensing Society. She served as the Co-Chair of the Data Fusion Technical Committee of the IEEE Geoscience and Remote Sensing Society from 2009 to 2013 and the Chair of the Remote Sensing and Mapping Technical Committee of the International Association for Pattern Recognition from 2010 to 2014. She was the General Chair of the 4th IEEE Geoscience and Remote Sensing Society Workshop on Hyperspectral Image and Signal Processing: Evolution in Remote Sensing, Shanghai, China, in 2012. She served as an Associate Editor of the IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING, the Journal of Applied Remote Sensing, and the IEEE SIGNAL PROCESSING LETTERS. Since 2016, she has been the Editor-in-Chief of the IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING.

Lianru Gao (M’12) received the B.S. degree in civil engineering from Tsinghua University, Beijing, China, in 2002, and the Ph.D. degree in cartography and geographic information systems from the Institute of Remote Sensing Applications, Chinese Academy of Sciences (CAS), Beijing, in 2007.

He is currently a Professor with the Key Laboratory of Digital Earth Science, Institute of Remote Sensing and Digital Earth, CAS. In the last ten years, he has led ten scientific research projects at the national and ministerial levels, including projects funded by the National Natural Science Foundation of China and by the Key Research Program of the CAS. He has authored over 110 peer-reviewed papers, including 47 journal papers, and co-authored the academic book Hyperspectral Image Classification and Target Detection. He holds 12 national invention patents and four software copyright registrations. His research interests include hyperspectral image processing and information extraction.

Dr. Gao was recognized as a Best Reviewer of the IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING in 2015. His core research achievements received the Outstanding Science and Technology Achievement Prize of the CAS in 2016. He also received the China National Science Fund for Excellent Young Scholars in 2017.

Bing Zhang (M’11–SM’12) received the B.S. degree in geography from Peking University, Beijing, China, and the M.S. and Ph.D. degrees in remote sensing from the Institute of Remote Sensing Applications, Chinese Academy of Sciences (CAS), Beijing, China.

He is currently a Full Professor and the Deputy Director of the Institute of Remote Sensing and Digital Earth, CAS, where he has been leading key scientific projects in the area of hyperspectral remote sensing for over 20 years. He has lectured on hyperspectral remote sensing at the University of Chinese Academy of Sciences, Beijing, for over ten years, where he is also a Professor. He has developed five software systems for image processing and applications. He has been the advisor of 30 Ph.D. dissertations and over 13 M.S. dissertations. He has authored over 300 publications, including more than 170 journal papers. He has edited six books/contributed book chapters on hyperspectral image processing and subsequent applications; these books serve as the main materials for education and research in hyperspectral remote sensing in China. His research interests include the development of mathematical and physical models and image processing software for the analysis of hyperspectral remote sensing data in many different areas, such as geology, hydrology, ecology, and botany.

Dr. Zhang has been serving as a Technical Committee Member of the IEEE Workshop on Hyperspectral Image and Signal Processing: Evolution in Remote Sensing since 2011 and as the President of the Hyperspectral Remote Sensing Committee of the China National Committee of the International Society for Digital Earth since 2012. He was a Scientific Committee Member of the International Geoscience and Remote Sensing Symposium (IGARSS) in 2014, 2016, and 2017, and a Student Paper Competition Committee Member of IGARSS in 2015 and 2016. His creative achievements received eight important prizes from the Chinese government and special government allowances of the Chinese State Council. He also received the National Science Foundation for Distinguished Young Scholars of China in 2013 and the Outstanding Science and Technology Achievement Prize of the CAS, the highest level of award for CAS scholars, in 2016. He has been serving as an Associate Editor of the IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING (JSTARS) since 2011. He has been a Guest Editor of several special issues of the IEEE JSTARS, Pattern Recognition Letters, and the PROCEEDINGS OF THE IEEE.