Automatic Generation of Land-Use Maps


Automatic generation of land-use maps for a spatial decision support system for Puerto Rico

Johannes van der Kwast, Josefien Delrue, Luc Bertels, Inge Uljee, Stijn Van Looy, Joan Schepens and Guy Engelen
Unit Environmental Modelling, VITO, Mol, Belgium
Email: hans.vanderkwast@vito.be

Elias Gutiérrez
Graduate School of Planning, University of Puerto Rico, San Juan, Puerto Rico
Email: eliasgutierrez@yahoo.com

Glenda Román
Geographic Mapping Technologies Corp., San Juan, Puerto Rico
Email: groman@gmtgis.com

Abstract: This study proposes an automatic image processing procedure in order to facilitate regular updating of the land-use map of Puerto Rico, which is a key dataset for the Xplorah Planning Support Systems. The procedure is based on the contextual reclassification of digital high resolution aerial photographs that were preclassified using a decision tree classifier. For the contextual reclassification the Optimized Spatial Reclassification Kernel (OSPARK) is used, which is able to discriminate functional land-use classes and land cover based on the configuration of objects in a kernel. A unique property of OSPARK is that it automatically adapts the kernel size as a function of spatial variation in the neighborhood of each pixel to be classified. The processing chain has been implemented on a computer cluster, which enables parallel processing. Classification results were evaluated using independent land-use data derived from visual interpretation. It can be concluded that the procedure gives good classification results for the tiles that are used to train the algorithm, but that the extrapolation to other tiles resulted in much lower accuracies. Error sources have been identified and suggestions for improvements are given.

I. INTRODUCTION

The Xplorah Planning Support Systems, developed for the Puerto Rico Planning Board, enables planners and policy makers to forecast land-use changes as the result of various scenarios and to assess alternative planning and policy options in their fully integrated, dynamic and spatial context. The quality of land use predicted by Xplorah, as well as other land-use change models, relies heavily on the availability of high quality geographically referenced data. A high quality time series of land-use maps is necessary for calibration, validation and updating of the model. Land-use maps, however, are often lacking. Even if time series are available, inconsistencies in mapping methodologies, legends and scales often induce measured land-use changes that do not represent actual changes in land-use patterns. Furthermore, land-use maps are mainly derived from manual mapping, which is time-consuming and expensive.

This study evaluates the feasibility of using an automatic image processing procedure in order to update the land-use map to be used in Xplorah. The aim is to automatically derive land-use maps at 60 m resolution from digital aerial photographs with a classification accuracy of ≥ 66%. The procedure proposed in this study uses a contextual reclassification algorithm applied to a preliminary classification of digital aerial photographs. The processing chain has been implemented on a computer cluster, which enables parallel processing.

II. REMOTE SENSING AND GIS DATA

In the period from October to December 2009, thousands of multispectral images were acquired over Puerto Rico using the ADS40 SH52 digital image sensor of Fugro Earthdata, Inc. Each frame covers 10K by 10K pixels in four spectral bands (red, green, blue and near-infrared). Flying at an altitude of 2900 m, a ground resolution of 0.3 m was obtained. Histogram matching was applied during image pre-processing in order to ensure that all images have a comparable reflectance.

The reference land-use data consists of the Xplorah 2010 land-use map at 60 m resolution, which has been developed as part of the Xplorah project [1]. The map is derived by means of visual interpretation using remote sensing data, supplemented with ancillary datasets. The reported accuracy of the Xplorah 2010 land-use map is 97%, although it should be noted that this land-use map is a representation of reality with inherent uncertainties that are difficult to quantify. The goal of the remote sensing based classification, however, is to produce land-use maps similar to the Xplorah land-use map with higher temporal availability and lower costs. Therefore it should be noted that the statistics derived from the comparison between the automatic classification and the reference map do not necessarily reflect disagreement with reality.

978-1-4244-8657-1/11/$26.00 © 2011 IEEE


Fig. 1. Flowchart of the OSPARK algorithm. The shaded part shows the original SPARK algorithm that is iterated for a range of kernel sizes in the OSPARK algorithm [5].

III. THE OSPARK ALGORITHM

The Optimized SPARK (OSPARK) algorithm [2] is a contextual reclassifier, which is based on the Spatial Reclassification Kernel (SPARK [3]). Contextual reclassifiers are based on the concept that information captured in neighboring cells, or information about patterns surrounding the pixel of interest, may provide useful supplementary information in the classification process [4]. Previous research [3] has demonstrated a strong relationship between the spatial structure of urban areas and their functional characteristics. The SPARK algorithm examines the local spatial patterns of land cover in a square kernel, or moving window, and classifies the center pixel based on the arrangement of adjacent pixels. OSPARK is an extension to SPARK in the sense that it automatically adapts the kernel size to the spatial variation detected around the pixel to be classified. The classification consists of three phases [3]:

1) Producing a land-cover map using any type of pixel-based spectral classifier from a remotely sensed image, further referred to as 'initial land-cover map'.

2) Defining decision rules based on local spatial patterns of land cover in typical land-use types.

3) Reclassifying the initial land-cover map into land-use types based on the decision rules of phase 2.

Fig. 1 shows the flowchart of the OSPARK algorithm. The algorithm derives adjacency event matrices M by counting the frequency of the pixel-based classes positioned next to each other, as well as diagonally, within each template kernel. Next, the M-matrices are compared with template (T) matrices that are derived from kernels that are representative for the land-use classes to be derived. The similarity index $\Delta_k$ is used as a goodness-of-fit measure:

$$\Delta_k = 1 - \sqrt{0.5\,N^{-2} \sum_{i=1}^{C} \sum_{j=1}^{C} \left(M_{ij} - T_{kij}\right)^{2}} \tag{1}$$

where $M_{ij}$ is the adjacency event in a $C$ by $C$ matrix M, $T_{kij}$ is the adjacency event in a $C$ by $C$ matrix T, which is a template matrix for land-use class $k$, $N$ is the total number of adjacency events in the kernel and $C$ is the number of classes in the per-pixel classified input map. $\Delta_k$ can range from 0 to 1. If $\Delta_k$ equals 0, M is completely different from T, while a value of 1 means that they are identical.

Fig. 2. Flowchart of the classification procedure.
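The similarity index of Eq. (1) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names `adjacency_matrix` and `similarity` are hypothetical, and the sketch assumes that each unordered pair of 8-connected neighbours is counted once and stored in the upper triangle of the matrix, so that a 3 by 3 kernel yields N = 20 adjacency events. It also assumes that M and T come from kernels of the same size.

```python
import numpy as np


def adjacency_matrix(kernel, n_classes):
    """Count adjacency events in a square kernel of integer class labels.

    Assumption: each unordered pair of horizontally, vertically or
    diagonally adjacent pixels is counted once, at [min class, max class].
    """
    m = np.zeros((n_classes, n_classes), dtype=np.int64)
    rows, cols = kernel.shape
    # Right, down and the two diagonals cover every neighbour pair once.
    for dr, dc in ((0, 1), (1, 0), (1, 1), (1, -1)):
        r0, r1 = max(0, -dr), rows - max(0, dr)
        c0, c1 = max(0, -dc), cols - max(0, dc)
        a = kernel[r0:r1, c0:c1].ravel()
        b = kernel[r0 + dr:r1 + dr, c0 + dc:c1 + dc].ravel()
        np.add.at(m, (np.minimum(a, b), np.maximum(a, b)), 1)
    return m


def similarity(m, t):
    """Similarity index of Eq. (1) between matrix M and template T."""
    n = m.sum()  # total number of adjacency events in the kernel
    return 1.0 - np.sqrt(0.5 * np.sum((m - t) ** 2) / n ** 2)
```

Under these assumptions, a kernel compared with its own matrix gives an index of 1, and two kernels that each contain a single but different class give an index of 0, which matches the range stated for the index.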

OSPARK iteratively calculates the similarity index for kernel sizes with an apothem, i.e. the distance from the center pixel to a side of a square kernel, from 1 to $W$ pixels. The resulting stack, consisting of $W$ similarity maps, is analyzed by an integration operator, which assigns the class that corresponds with the optimal $\Delta_k$-value for each pixel. The optimal $\Delta_k$-value is determined based on two possible cases for the evolution of $\Delta_k$ with increasing kernel size [2]:

1) In the case that local maxima are present, the first local maximum above a user-defined minimum $\Delta_k$-threshold value is determined and the corresponding land-use class is assigned.

2) In the case that local maxima are absent, the curve converges to $\Delta_k \approx 1$ and the integration operator assigns the class to the pixel when the $\Delta_k$-value changes less than 0.05 between consecutive iterations and is higher than the threshold value.

The threshold prevents classification of pixels with a too low $\Delta_k$-value.

The derived land-use map and a map containing the $\Delta_k$-value corresponding to the optimal kernel size for each pixel are the outputs of the algorithm.
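The two cases of the integration operator can be sketched for a single pixel as follows. This is an illustrative reading of the description above, not the published implementation. The function name `integrate`, the default threshold of 0.5 and the `nodata` value are hypothetical, and the sketch assumes that the best-matching class and its similarity value have already been selected for each kernel size.

```python
def integrate(deltas, classes, threshold=0.5, eps=0.05, nodata=-1):
    """Pick a class for one pixel from its similarity curve.

    deltas[i] and classes[i] hold the similarity value and the class
    label for the kernel with apothem i + 1 (assumed input layout).
    Returns (class, similarity), or (nodata, None) if nothing qualifies.
    """
    n = len(deltas)
    # Indices where the curve has a strict local maximum.
    peaks = [i for i in range(1, n - 1)
             if deltas[i - 1] < deltas[i] > deltas[i + 1]]
    if peaks:
        # Case 1: first local maximum above the threshold.
        for i in peaks:
            if deltas[i] > threshold:
                return classes[i], deltas[i]
        return nodata, None
    # Case 2: no local maxima; accept the first value above the
    # threshold that changes less than eps from the previous iteration.
    for i in range(1, n):
        if deltas[i] > threshold and abs(deltas[i] - deltas[i - 1]) < eps:
            return classes[i], deltas[i]
    return nodata, None
```

For example, the curve `[0.2, 0.7, 0.6, 0.9]` has its first local maximum at the second kernel size, so the class found at that size is assigned. Pixels whose curve never exceeds the threshold stay unclassified.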

IV. THE PROCESSING CHAIN

Fig. 2 shows the workflow for the automatic classification of the orthorectified aerial photographs of 2009. The procedure consists of preprocessing, building the template database, running OSPARK in batch on a computer cluster, post-processing and accuracy assessment.

A. Preprocessing

First, the 1500 orthophoto tiles were batch converted to the IDL ENVI image format and resampled to tiles of 1000 by 1000 pixels with 3 m resolution. Next, each tile, containing blue, green, red and near-infrared channels, was classified using a decision tree classification. The decision tree classifier is an unsupervised classification method that performs a multi-stage classification by using a series of binary decisions in order to cluster pixels. This procedure resulted in 25 classes. A resolution of 3 m was considered optimal for the initial land-cover map, as the objects could be clearly defined at this resolution, while noise introduced by unnecessary spatial detail was avoided. The initial land-cover map was retiled to tiles of 3000 by 3000 pixels and converted to the PCRaster format, which is the input for the OSPARK algorithm. Open-source utilities distributed with the Geospatial Data Abstraction Library (GDAL, http://www.gdal.org) were used for this step. This tile size was considered optimal, since small tiles would result in many missing values after the OSPARK classification, while larger tiles could cause the system to run out of memory. Separate tiles were calculated to cover the tile edges that will have missing values after the OSPARK classification. In total, 178 tiles of 3000 by 3000 pixels, 150 tiles of 100 by 3000 (rows by columns) and 166 tiles of 3000 by 100 were used to classify the entire Commonwealth of Puerto Rico.

Fig. 3. Training tiles for building the OSPARK template database. 1 = urbanized area (San Juan), 2 = natural/rural area (around Bosque Estatal de Monte Guilarte), 3 = urbanized area (Mayaguez). The background map shows the OSPARK classification result. Legend classes: natural, forest, agriculture, construction, mining, industrial, high-density trade and services, high-density residential, forest reserves, mangroves and swamps, sea, beach, coral reef, water resources, public and recreation, utilities, infrastructure, rocky cliffs and shelves, rangelands, low-density trade and services, low-density residential.

B. Building the template database

The OSPARK algorithm needs a database of representative template matrices. For this purpose, three 3000 by 3000 tiles were selected (Fig. 3).

These training tiles were selected in order to include the most important land-use classes involved in urban dynamics, but also to represent natural and rural land-use classes. The center coordinates of the template kernels were derived by stratified random sampling of 50 points within each class of the land-use map. The same procedure was followed to derive an independent set of pixels for evaluation of the contextual classification of the tiles.
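Stratified random sampling of template centers can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the procedure used in the study. The function name `stratified_sample` and the `seed` parameter are hypothetical, and the land-use map is assumed to be a 2-D integer array of class codes.

```python
import numpy as np


def stratified_sample(land_use, n_per_class=50, seed=0):
    """Draw up to n_per_class random pixel positions from every class.

    Returns a dict mapping class code -> list of (row, col) tuples.
    Classes with fewer pixels than n_per_class contribute all of them.
    """
    rng = np.random.default_rng(seed)
    samples = {}
    for cls in np.unique(land_use):
        rows, cols = np.nonzero(land_use == cls)
        k = min(n_per_class, rows.size)
        # Sample without replacement so no position is drawn twice.
        pick = rng.choice(rows.size, size=k, replace=False)
        samples[int(cls)] = list(zip(rows[pick].tolist(),
                                     cols[pick].tolist()))
    return samples
```

An independent evaluation set could be drawn the same way with a different seed. Excluding positions that were already used as template centers would be an additional step that this sketch does not perform.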

In order to check the quality of the selected templates and their transferability to different areas, different combinations of templates have been used in the OSPARK classifications of the three tiles. The resulting maps were evaluated using contingency matrices with the independent reference data sampled from the Xplorah 2010 land-use map. Based on the quality of the derived land-use maps, templates were selected or removed from the database. The final set of templates was used to classify all tiles.
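The accuracy measures derived from a contingency matrix can be computed as in the sketch below. The function name `accuracy_report` is hypothetical, and the sketch assumes that rows hold the reference classes and columns hold the mapped classes. These are the standard definitions of overall accuracy, Cohen's kappa, producer's accuracy and user's accuracy, not code from the study.

```python
import numpy as np


def accuracy_report(cm):
    """Accuracy measures from a contingency matrix.

    cm[i, j] is the number of samples of reference class i that were
    mapped as class j (assumed orientation).
    """
    cm = np.asarray(cm, dtype=float)
    total = cm.sum()
    diag = np.diag(cm)
    overall = diag.sum() / total
    # Agreement expected by chance, from the row and column totals.
    expected = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / total ** 2
    kappa = (overall - expected) / (1.0 - expected)
    # Classes absent from the reference or the map yield NaN.
    with np.errstate(divide="ignore", invalid="ignore"):
        producers = diag / cm.sum(axis=1)  # correct / reference total
        users = diag / cm.sum(axis=0)      # correct / mapped total
    return overall, kappa, producers, users
```

For the two-class matrix `[[40, 10], [20, 30]]` the overall accuracy is 0.7 and kappa is 0.4, which shows how kappa discounts the agreement that would occur by chance.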

C. Implementation on a computer cluster

The OSPARK algorithm was applied to all tiles covering Puerto Rico, using the templates database derived with the procedure described in the previous section. After general preprocessing, consisting of preparing the input tiles and obtaining a good set of templates (T) for the database, OSPARK is run on a computer cluster. The cluster hardware consists of a server with a dual core Intel Xeon CPU (2.8 GHz) and 1 GB of RAM. The 19 nodes of the cluster each consist of 2 Intel Xeon CPUs and between 4 and 12 GB of RAM, which allows the parallel execution of up to 144 jobs. In the current setup of the algorithm, the maximum kernel apothem (W) was set to 30 pixels, which is a trade-off between calculation time and classification accuracy. With this configuration, four tiles can be processed in parallel on the cluster. The OSPARK algorithm applied to each tile consists of:

1) Loading the proper tile and templates database.

2) Parallel execution of SPARK for apothems ranging from 1 to W pixels, where W = 30 in this case.

3) Running the integration operator that estimates the optimal class for each cell based on the stack of similarity maps, and resampling the output from 3 to 60 m cells using a majority filter of 120 m.

In step 3, ocean and forest reserves are also copied from the Xplorah 2010 land-use map to the OSPARK classification, because the ocean class does not show much dynamics and the forest reserves class is determined by policy decisions and zoning documents rather than by morphology or reflective properties of the landscape. Therefore it is not feasible to derive this class by means of remote sensing techniques.
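The resampling in step 3 can be illustrated with a small majority-aggregation sketch. It rests on an interpretation that the text does not confirm: each 60 m output cell (20 by 20 input pixels of 3 m) receives the most frequent class found in a 120 m window (40 by 40 pixels) centered on that cell and clipped at the tile border. The function name `majority_resample` and the `nodata` value of 255 are hypothetical.

```python
import numpy as np


def majority_resample(classes, factor=20, window=40, nodata=255):
    """Aggregate a class map by majority vote in a centered window.

    factor: input pixels per output cell side (60 m / 3 m = 20).
    window: window side in input pixels (120 m / 3 m = 40).
    Ties go to the lowest class code, as np.argmax returns the first hit.
    """
    rows, cols = classes.shape
    out = np.full((rows // factor, cols // factor), nodata,
                  dtype=classes.dtype)
    pad = (window - factor) // 2  # extra pixels on each side of a cell
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            r0 = max(0, i * factor - pad)
            r1 = min(rows, (i + 1) * factor + pad)
            c0 = max(0, j * factor - pad)
            c1 = min(cols, (j + 1) * factor + pad)
            block = classes[r0:r1, c0:c1]
            valid = block[block != nodata]
            if valid.size:
                values, counts = np.unique(valid, return_counts=True)
                out[i, j] = values[np.argmax(counts)]
    return out
```

The explicit loops keep the sketch readable. For tiles of 3000 by 3000 pixels a vectorized or compiled version would be preferable.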

D. Postprocessing

After all tiles of all four sections are calculated, a general postprocessing routine mosaics all the classified tiles into land-use maps of Puerto Rico at 60 m resolution.

V. RESULTS

A. OSPARK results for training tiles

Analysis of the contingency matrices of the classification of the three tiles shows that the kappa and overall accuracy of the classification of the training tiles are not always higher than 66%. The producer's and user's accuracy of the individual classes show that some classes can be retrieved at an accuracy higher than 66%, while others are classified with a lower accuracy. The results vary per training tile. In training tile 1, the classes construction, mining, residential, sea, beach, water resources and utilities have a producer's and user's accuracy higher than 0.5. Other classes show a higher level of confusion. Training tile 2 shows a better result, but many classes are not present in the scene, which covers mainly an agricultural and forested area. Good results were obtained for the classes forest, trade and services, residential, water resources, public and recreation, and rangelands. For training tile 3, good results were obtained for the urban classes construction, industry, residential, public and recreation, utilities and infrastructure. In addition, good results were also obtained for the non-urban classes forest, agriculture, mangroves and swamps, sea, beach, water resources and rangelands.

An optimal database of templates was derived by trial-and-error based on the analysis of these three tiles. The optimal database was used to classify the entire Commonwealth of Puerto Rico.

B. OSPARK results for all tiles

In approximately one month, all tiles were processed by the computer cluster (Fig. 3). The overall accuracy is 66% and the kappa value is 0.57. These high figures are, however, biased by the large area of sea and forest reserves, which are not taken into account by OSPARK but directly derived from the Xplorah 2010 land-use map. A more detailed analysis of the accuracy reveals that most classes have a low user's and producer's accuracy. Exceptions are the relatively high user's and producer's accuracy for the forest and residential classes. Water resources and public and recreation facilities can be derived with an acceptable user's accuracy, although their producer's accuracy is low.

VI. DISCUSSION AND CONCLUSIONS

In this study, the feasibility of using a fully automated land-use classification procedure applied to high resolution remote sensing images has been investigated. A processing chain has been described for (1) preprocessing the aerial photographs, (2) performing a pre-classification of the blue, green, red and near-infrared channels of the orthomosaic based on a decision tree classification, (3) training of the OSPARK algorithm using three training tiles covering important land-use types and (4) running the algorithm on a computer cluster in order to improve the calculation times by parallel processing of the kernels.

Results of the classification procedure were compared with the Xplorah 2010 land-use classification, which has a reported overall accuracy of 97%. Although the results for the individual training tiles were promising and gave acceptable results for most land-use classes, the application of the algorithm to the entire Commonwealth of Puerto Rico resulted in a much lower accuracy for most classes. Classes that can be inferred with an acceptable accuracy using the proposed procedure are forest, residential, water resources, and public and recreation. The overall accuracy was 66%. This value is, however, biased by the sea and forest reserve classes that were not derived by the OSPARK classification, but were copied from the reference map.

The errors in the classification can be attributed to different sources. The main source of errors is the templates database that is used. Although the templates in the database gave good results for the three training tiles, the results for the entire Commonwealth of Puerto Rico indicate that the templates were not representative for all tiles and could not be extrapolated. Further research should focus on a better training of the template database using statistical or machine learning techniques. It should also be investigated whether it is feasible to classify Puerto Rico with only one representative set of templates, or whether a spatial stratification would yield better classification results.

Other sources of errors could be introduced by the maximum kernel size, the choice of which is a trade-off between calculation time and accuracy. Furthermore, the resolution of 3 m chosen for the initial land-cover map has an impact on the detection of homogeneous objects and, consequently, on the configuration of objects within a kernel to be classified by OSPARK. This problem is aggravated by the comparison of the automatically interpreted land-use map with the Xplorah 2010 land-use map, which is generated at 15 m resolution by means of visual interpretation. The visual interpretation will, based on human insight, generalize areas featuring a salt-and-pepper structure into the most meaningful land uses covering larger contiguous areas, while the automatic classification will consider the individual cells as meaningful contributors to each template analyzed. Examples of such generalizations are described in [1]. Other errors could be introduced by the histogram matching of the aerial photographs, which might cause a different illumination in the different regions. Future studies should also investigate these causes of inaccuracies.

In general, it can be concluded that the automatic derivation of 18 land-use classes by means of remote sensing techniques remains a challenge. The proposed processing chain, however, can contribute to more advanced methods of classification that can shorten the time interval between land-use maps, while reducing the production costs compared to the labor-intensive manual map production.

ACKNOWLEDGMENT

The research presented in this paper is funded by the Graduate School of Planning, University of Puerto Rico, in the frame of the Xplorah project. The reference land-use data were made available by GMT Corp.

REFERENCES

[1] G. Roman, A. Castro, and E. Carreras, "Generation of land-use maps required for the implementation phase of a spatial decision support system for Puerto Rico: Xplorah 2010 land-use map," Geographic Mapping Technologies Corporation, San Juan, Puerto Rico, Tech. Rep., 2010.

[2] J. van der Kwast, T. van de Voorde, F. Canters, I. Uljee, S. van Looy, and G. Engelen, "Inferring urban land use using the optimised spatial reclassification kernel (OSPARK)," Environmental Modelling & Software, in review.

[3] M. Barnsley and S. Barr, "Inferring urban land use from satellite sensor images using kernel-based analysis and classification," Photogramm. Eng. Rem. S., vol. 62, no. 8, pp. 949–958, 1996.

[4] S. M. de Jong and F. van der Meer, Remote sensing image analysis: including the spatial domain, ser. Remote sensing and digital image processing, 5. Kluwer Academic Publishers, 2004.

[5] J. van der Kwast, T. van de Voorde, F. Canters, G. Engelen, and C. Lavalle, "Using remote sensing derived spatial metrics for the calibration of land-use change models," in IEEE Proceedings of the 7th International Urban Remote Sensing Conference (URS 2009), Shanghai. IEEE, 2009.

Page 2: Automatic Generation of Land-Use Maps

7262019 Automatic Generation of Land-Use Maps

httpslidepdfcomreaderfullautomatic-generation-of-land-use-maps 24

Fig 1 Flowchart of the OSPARK algorithm The shaded part shows theoriginal SPARK algorithm that is iterated for a range of kernel sizes in theOSPARK algorithm [5]

III THE OSPARK ALGORITHM

The Optimized SPARK (OSPARK) algorithm [2] is a con-

textual reclassifier which is based on the Spatial Reclas-

sification Kernel (SPARK [3]) Contextual reclassifiers are

based on the concept that information captured in neighboring

cells or information about patterns surrounding the pixel of

interest may provide useful supplementary information in the

classification process [4] Previous research [3] has demon-

strated a strong relationship between the spatial structure of

urban areas and its functional characteristics The SPARK

algorithm examines the local spatial patterns of land cover

in a square kernel or moving window and classifies the center

pixel based on the arrangement of adjacent pixels OSPARK

is an extension to SPARK in the sense that it automatically

adapts the kernel size to the spatial variation detected around

the pixel to be classified The classification consists of three

phases [3]

1) Producing a land-cover map using any type of pixel-

based spectral classifier from a remotely sensed image

further referred to as lsquoinitial land-cover maprsquo

2) Defining decision rules based on local spatial patterns

of land cover in typical land-use types

3) Reclassifying the initial land-cover map into land-use

types based on the decision rules of phase 2

Fig 1 shows the flowchart of the OSPARK algorithm The

algorithm derives adjacency event matrices M by counting the

frequency of the pixel-based classes positioned next to each other, as well as diagonally, within each template kernel. Next, the M-matrices are compared with template (T) matrices that are derived from kernels that are representative of the land-use classes to be derived. The similarity index $\Delta_k$ is used as a goodness-of-fit measure:

$$\Delta_k = 1 - \sqrt{0.5\,N^{-2}\sum_{i=1}^{C}\sum_{j=1}^{C}\left(M_{ij}-T_{kij}\right)^{2}} \qquad (1)$$

where $M_{ij}$ is the adjacency event in a $C$ by $C$ matrix M, $T_{kij}$ is the adjacency event in a $C$ by $C$ matrix T, which is a template matrix for land-use class $k$, $N$ is the total number of adjacency events in the kernel and $C$ is the number of classes in the per-pixel classified input map. $\Delta_k$ can range from 0 to 1. If $\Delta_k$ equals 0, M is completely different from T, while a value of 1 means that they are identical.

Fig. 2. Flowchart of the classification procedure. (Box labels recoverable from the figure: Orthophotos 2010, 10k x 10k tiles @ 0.3 m; Resampling; Conversion to IDL; Initial land-cover map, 1k x 1k tiles @ 3 m; Histogram matching; GDAL retile; Initial land-cover map, 3k x 3k tiles @ 3 m; Conversion to PCRaster; Sample templates; Template database; OSPARK algorithm; OSPARK land-use map, 3k x 3k tiles @ 3 m; Mosaic; Resampling; Reclass; OSPARK and reference land-use maps @ 15 m, 60 m and 240 m; Validation.)
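As an illustration of Eq. (1), the following minimal sketch builds an adjacency matrix for a small kernel of class labels and computes the similarity index, i.e. one minus the square root of half the summed squared differences between M and T, scaled by the squared number of adjacency events. The function names are hypothetical, and the way adjacency events are counted (each neighbouring pair once in both M[a][b] and M[b][a]) is an assumption made for this example; it is not taken from the OSPARK implementation.

```python
from math import sqrt


def adjacency_matrix(kernel, n_classes):
    """Count adjacency events between class labels (0..n_classes-1) in a kernel.

    Assumption: every pair of neighbouring pixels (side by side or diagonal)
    is counted symmetrically, once in m[a][b] and once in m[b][a].
    """
    m = [[0] * n_classes for _ in range(n_classes)]
    rows, cols = len(kernel), len(kernel[0])
    # These four offsets visit every neighbouring pixel pair exactly once.
    offsets = ((0, 1), (1, 0), (1, 1), (1, -1))
    for r in range(rows):
        for c in range(cols):
            for dr, dc in offsets:
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    a, b = kernel[r][c], kernel[rr][cc]
                    m[a][b] += 1
                    m[b][a] += 1
    return m


def similarity_index(m, t):
    """Similarity between matrix m and template t, both built from kernels of equal size."""
    n = sum(sum(row) for row in m)  # total number of adjacency events
    squared_diff = sum(
        (m_ij - t_ij) ** 2
        for m_row, t_row in zip(m, t)
        for m_ij, t_ij in zip(m_row, t_row)
    )
    return 1.0 - sqrt(0.5 * squared_diff / n ** 2)


# Two homogeneous 3 x 3 kernels of different classes are maximally dissimilar.
kernel_a = [[0] * 3 for _ in range(3)]
kernel_b = [[1] * 3 for _ in range(3)]
m_a = adjacency_matrix(kernel_a, 2)
m_b = adjacency_matrix(kernel_b, 2)
print(similarity_index(m_a, m_a), similarity_index(m_a, m_b))
```

With this counting convention, identical kernels yield a similarity of 1 and two homogeneous kernels of different classes yield 0, which matches the range stated for the index.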

OSPARK iteratively calculates the similarity index for kernel sizes with an apothem, i.e. the distance from the center pixel to a side of a square kernel, from 1 to $W$ pixels, where $W$ is the maximum apothem. The resulting stack of similarity maps is analyzed by an integration operator, which assigns the class that corresponds with the optimal $\Delta$-value for each pixel. The optimal $\Delta$-value is determined based on two possible cases for the evolution of $\Delta$ with increasing kernel size [2]:

1) In the case that local maxima are present, the first local maximum above a user-defined minimum $\Delta$-threshold value is determined and the corresponding land-use class is assigned.

2) In the case that local maxima are absent, the curve converges to $\Delta \approx 1$ and the integration operator assigns the class to the pixel when the $\Delta$-value changes less than 0.05 between consecutive iterations and is higher than the threshold value.

The threshold prevents classification of pixels with a too low $\Delta$-value.

The derived land-use map and a map containing the $\Delta$-value corresponding to the optimal kernel size for each pixel are the outputs of the algorithm.
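The two cases above can be sketched for a single pixel as follows. This is an illustrative reading of the integration operator, not the OSPARK code: the function name and its arguments are hypothetical, and it is assumed that, for every apothem, the best-matching class and its similarity value have already been extracted from the stack of similarity maps.

```python
def integrate(best_delta, best_class, threshold, tolerance=0.05):
    """Pick a class for one pixel from its similarity curve.

    best_delta[i] and best_class[i] hold the highest similarity and the
    corresponding class for apothem i + 1 (assumed input layout).
    Returns (class, similarity, apothem), or None if the pixel stays unclassified.
    """
    n = len(best_delta)
    local_maxima = [
        i for i in range(1, n - 1)
        if best_delta[i] > best_delta[i - 1] and best_delta[i] > best_delta[i + 1]
    ]
    if local_maxima:
        # Case 1: take the first local maximum that exceeds the threshold.
        for i in local_maxima:
            if best_delta[i] > threshold:
                return best_class[i], best_delta[i], i + 1
        return None
    # Case 2: no local maxima, so wait until the curve has levelled off.
    for i in range(1, n):
        change = abs(best_delta[i] - best_delta[i - 1])
        if best_delta[i] > threshold and change < tolerance:
            return best_class[i], best_delta[i], i + 1
    return None


# A curve with a local maximum at apothem 2, and a monotonically rising curve.
print(integrate([0.2, 0.6, 0.5, 0.9, 0.8], ["a", "b", "b", "c", "c"], 0.55))
print(integrate([0.3, 0.5, 0.7, 0.72, 0.73], ["a", "a", "b", "b", "b"], 0.6))
```

Returning the apothem together with the class makes it straightforward to write both output maps, the land-use class and the similarity value at the selected kernel size.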

IV. THE PROCESSING CHAIN

Fig. 2 shows the workflow for the automatic classification of the orthorectified aerial photographs of 2009. The procedure consists of preprocessing, building the template database, running OSPARK in batch on a computer cluster, post-processing and accuracy assessment.

A. Preprocessing

First, the 1500 orthophoto tiles were batch-converted to the IDL ENVI image format and resampled to tiles of 1000 by 1000 pixels with 3 m resolution. Next, each tile, containing blue, green, red and near-infrared channels, was classified using a decision tree classification. The decision tree classifier is an unsupervised classification method that performs a multi-stage classification by using a series of binary decisions in order to cluster pixels. This procedure resulted in 25 classes. A resolution of 3 m was considered optimal for the initial land-cover map, as the objects could be clearly defined at this resolution, while noise introduced by unnecessary spatial detail was avoided. The initial land-cover map was retiled to tiles of 3000 by 3000 pixels and converted to the PCRaster format, which is the input for the OSPARK algorithm. Open-source utilities distributed with the Geospatial Data Abstraction Library (GDAL, http://www.gdal.org) were used for this step. This tile size was considered optimal, since smaller tiles would result in many missing values after the OSPARK classification, while larger tiles could cause the system to run out of memory. Separate tiles were calculated to cover the tile edges that will have missing values after the OSPARK classification. In total, 178 tiles of 3000 by 3000 pixels, 150 tiles of 100 by 3000 (rows by columns) and 166 tiles of 3000 by 100 were used to classify the entire Commonwealth of Puerto Rico.

Fig. 3. Training tiles for building the OSPARK template database: 1 = urbanized area (San Juan), 2 = natural/rural area (around Bosque Estatal de Monte Guilarte), 3 = urbanized area (Mayaguez). The background map shows the OSPARK classification result. Legend classes: natural, forest, agriculture, construction, mining, industrial, high-density trade and services, high-density residential, forest reserves, mangroves and swamps, sea, beach, coral reef, water resources, public and recreation, utilities, infrastructure, rocky cliffs and shelves, rangelands, low-density trade and services, low-density residential.

B. Building the template database

The OSPARK algorithm needs a database of representative template matrices. For this purpose, three 3000 by 3000 tiles were selected (Fig. 3).

These training tiles were selected to include the most important land-use classes involved in urban dynamics, but also to represent natural and rural land-use classes. The center coordinates of the template kernels were derived by stratified random sampling of 50 points within each class of the land-use map. The same procedure was followed to derive an independent set of pixels for evaluation of the contextual classification of the tiles.
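A minimal sketch of such a stratified random sampling is given below, assuming the land-use map is available as a two-dimensional list of class labels. The function name, the `seed` argument and the handling of classes with fewer than 50 pixels are assumptions made for this example.

```python
import random


def stratified_sample(class_map, n_per_class=50, seed=0):
    """Draw up to n_per_class random (row, col) positions for every class in the map."""
    rng = random.Random(seed)  # fixed seed so that the sample is reproducible
    positions = {}
    for r, row in enumerate(class_map):
        for c, label in enumerate(row):
            positions.setdefault(label, []).append((r, c))
    # Classes with fewer pixels than requested contribute all their pixels.
    return {
        label: rng.sample(cells, min(n_per_class, len(cells)))
        for label, cells in positions.items()
    }


# Toy map with two classes: the upper half is class 0, the lower half class 1.
toy_map = [[0] * 10 for _ in range(5)] + [[1] * 10 for _ in range(5)]
samples = stratified_sample(toy_map, n_per_class=5)
print({label: len(points) for label, points in samples.items()})
```

A second call with a different seed gives another sample, but the two sets are not guaranteed to be disjoint; an evaluation set that has to be strictly independent would additionally need to exclude the positions already used for the templates.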

In order to check the quality of the selected templates and their transferability to different areas, different combinations of templates were used in the OSPARK classifications of the three tiles. The resulting maps were evaluated using contingency matrices with the independent reference data sampled from the Xplorah 2010 land-use map. Based on the quality of the derived land-use maps, templates were selected or removed from the database. The final set of templates was used to classify all tiles.
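For reference, the accuracy measures that are commonly derived from a contingency matrix (overall accuracy, kappa, and producer's and user's accuracy per class) can be computed as in the sketch below. The function name and the input layout, two equally long lists of class labels for the sampled pixels, are assumptions made for this example; the formulas are the standard definitions.

```python
def accuracy_metrics(reference, predicted):
    """Return overall accuracy, kappa, producer's and user's accuracy per class."""
    classes = sorted(set(reference) | set(predicted))
    index = {label: i for i, label in enumerate(classes)}
    k = len(classes)
    # Contingency matrix: rows are reference classes, columns are predicted classes.
    cm = [[0] * k for _ in range(k)]
    for ref, pred in zip(reference, predicted):
        cm[index[ref]][index[pred]] += 1
    n = len(reference)
    row_totals = [sum(cm[i]) for i in range(k)]
    col_totals = [sum(cm[i][j] for i in range(k)) for j in range(k)]
    overall = sum(cm[i][i] for i in range(k)) / n
    # Agreement expected by chance, from the marginal totals.
    expected = sum(row_totals[i] * col_totals[i] for i in range(k)) / n ** 2
    kappa = 1.0 if expected == 1 else (overall - expected) / (1 - expected)
    producers = {
        label: (cm[i][i] / row_totals[i] if row_totals[i] else None)
        for label, i in index.items()
    }
    users = {
        label: (cm[i][i] / col_totals[i] if col_totals[i] else None)
        for label, i in index.items()
    }
    return overall, kappa, producers, users


reference = ["forest", "forest", "forest", "residential", "residential", "water"]
predicted = ["forest", "forest", "residential", "residential", "residential", "forest"]
overall, kappa, producers, users = accuracy_metrics(reference, predicted)
print(round(overall, 3), round(kappa, 3), producers, users)
```

Producer's accuracy is the fraction of reference pixels of a class that were classified correctly, while user's accuracy is the fraction of pixels assigned to a class that actually belong to it; a class that is never predicted has no defined user's accuracy, which the sketch reports as `None`.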

C. Implementation on a computer cluster

The OSPARK algorithm was applied to all tiles covering Puerto Rico, using the templates database derived with the procedure described in the previous section. After general preprocessing, consisting of preparing the input tiles and obtaining a good set of templates (T) for the database, OSPARK was run on a computer cluster. The cluster hardware consists of a server with a dual-core Intel Xeon CPU (2.8 GHz) and 1 GB of RAM. The 19 nodes of the cluster each consist of 2 Intel Xeon CPUs and between 4 and 12 GB of RAM, which allows the parallel execution of up to 144 jobs. In the current setup of the algorithm, the maximum kernel apothem (W) was set to 30 pixels, which is a trade-off between calculation time and classification accuracy. With this configuration, four tiles can be processed in parallel on the cluster. The OSPARK algorithm applied to each tile consists of:

1) Loading the proper tile and templates database.

2) Parallel execution of SPARK for apothems ranging from 1 to W pixels, where W = 30 in this case.

3) Running the integration operator that estimates the optimal class for each cell based on the stack of similarity maps, and resampling the output from 3 to 60 m cells using a majority filter of 120 m.

In step 3, the ocean and forest reserves classes are also copied from the Xplorah 2010 land-use map to the OSPARK classification, because the ocean class does not show much dynamics and the forest reserves class is determined by policy decisions and zoning documents rather than by morphology or reflective properties of the landscape. It is therefore not feasible to derive this class by means of remote sensing techniques.
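The resampling and overlay in step 3 can be sketched as follows. The interpretation of the majority filter is an assumption: each 60 m output cell (20 by 20 pixels of 3 m) receives the most frequent class within a 120 m (40 by 40 pixels) window centered on that cell. The function names and the nodata handling are hypothetical and serve only as an illustration.

```python
from collections import Counter


def majority_resample(grid, factor, window, nodata=None):
    """Aggregate a class grid by `factor`, using the most frequent class inside a
    `window` x `window` pixel neighbourhood centred on each output cell."""
    rows, cols = len(grid), len(grid[0])
    pad = (window - factor) // 2  # extra pixels taken on each side of the cell
    out = []
    for r0 in range(0, rows, factor):
        out_row = []
        for c0 in range(0, cols, factor):
            counts = Counter(
                grid[r][c]
                for r in range(max(0, r0 - pad), min(rows, r0 + factor + pad))
                for c in range(max(0, c0 - pad), min(cols, c0 + factor + pad))
                if grid[r][c] != nodata
            )
            # Ties are resolved in favour of the class encountered first.
            out_row.append(counts.most_common(1)[0][0] if counts else nodata)
        out.append(out_row)
    return out


def overlay_fixed_classes(classified, reference, fixed):
    """Copy cells whose reference class is in `fixed` into the classified map."""
    return [
        [ref if ref in fixed else cls for cls, ref in zip(c_row, r_row)]
        for c_row, r_row in zip(classified, reference)
    ]


# Toy example: aggregate a 4 x 4 grid by a factor of 2 without extra window.
grid = [[1, 1, 2, 2], [1, 3, 2, 2], [4, 4, 5, 6], [4, 4, 5, 5]]
coarse = majority_resample(grid, factor=2, window=2)
print(coarse)
print(overlay_fixed_classes(coarse, [[9, 2], [8, 7]], fixed={9, 8}))
```

Under the stated assumption, the setting described in step 3 would correspond to `factor=20` and `window=40` on the 3 m classification, with the sea and forest reserves classes passed as `fixed`.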

D. Postprocessing

After all tiles of all four sections have been calculated, a general postprocessing routine mosaics all the classified tiles into land-use maps of Puerto Rico at 60 m resolution.

V. RESULTS

A. OSPARK results for training tiles

Analysis of the contingency matrices of the classification of the three tiles shows that the kappa and overall accuracy of the classification of the training tiles are not always higher than 66%. The producer's and user's accuracies of the individual classes show that some classes can be retrieved at an accuracy higher than 66%, while others are classified with a lower accuracy. The results vary per training tile. In training tile 1, the classes construction, mining, residential, sea, beach, water resources and utilities have a producer's and user's accuracy higher than 0.5. Other classes show a higher level of confusion. Training tile 2 shows a better result, but many classes are not present in the scene, which covers mainly an agricultural and


forested area. Good results were obtained for the classes forest, trade and services, residential, water resources, public and recreation, and rangelands. For training tile 3, good results were obtained for the urban classes construction, industry, residential, public and recreation, utilities and infrastructure. In addition, good results were obtained for the non-urban classes forest, agriculture, mangroves and swamps, sea, beach, water resources and rangelands.

An optimal database of templates was derived by trial and error, based on the analysis of these three tiles. The optimal database was used to classify the entire Commonwealth of Puerto Rico.

B. OSPARK results for all tiles

All tiles were processed by the computer cluster in approximately one month (Fig. 3). The overall accuracy is 66% and the kappa value is 0.57. These high figures are, however, biased by the large area of sea and forest reserves, which are not classified by OSPARK but directly derived from the Xplorah 2010 land-use map. A more detailed analysis of the accuracy reveals that most classes have a low user's and producer's accuracy. Exceptions are the relatively high user's and producer's accuracies for the forest and residential classes. Water resources and public and recreation facilities can be derived with an acceptable user's accuracy, although their producer's accuracy is low.

VI. DISCUSSION AND CONCLUSIONS

In this study, the feasibility of using a fully automated land-use classification procedure applied to high-resolution remote sensing images has been investigated. A processing chain has been described for (1) preprocessing the aerial photographs, (2) performing a pre-classification of the blue, green, red and near-infrared channels of the orthomosaic based on a decision tree classification, (3) training the OSPARK algorithm using three training tiles covering important land-use types and (4) running the algorithm on a computer cluster in order to reduce calculation times by parallel processing of the kernels.

Results of the classification procedure were compared with the Xplorah 2010 land-use classification, which has a reported overall accuracy of 97%. Although the results for the individual training tiles were promising and acceptable for most land-use classes, the application of the algorithm to the entire Commonwealth of Puerto Rico resulted in a much lower accuracy for most classes. Classes that can be inferred with an acceptable accuracy using the proposed procedure are forest, residential, water resources, and public and recreation. The overall accuracy was 66%. This value is, however, biased by the sea and forest reserve classes, which were not derived by the OSPARK classification but were copied from the reference map.

The errors in the classification can be attributed to different sources. The main source of errors is the templates database that is used. Although the templates in the database gave good results for the three training tiles, the results for the entire Commonwealth of Puerto Rico indicate that the templates were not representative of all tiles and could not be extrapolated. Further research should focus on a better training of the template database using statistical or machine learning techniques. It should also be investigated whether it is feasible to classify Puerto Rico with only one representative set of templates, or whether a spatial stratification would yield better classification results.

Other errors could be introduced by the maximum kernel size, the choice of which is a trade-off between calculation time and accuracy. Furthermore, the resolution of 3 m chosen for the initial land-cover map has an impact on the detection of homogeneous objects and, consequently, on the configuration of objects within a kernel to be classified by OSPARK. This problem is aggravated by the comparison of the automatically interpreted land-use map with the Xplorah 2010 land-use map, which is generated at 15 m resolution by means of visual interpretation. The visual interpretation will, based on human insight, generalize areas featuring a salt-and-pepper structure into the most meaningful land uses covering larger contiguous areas, while the automatic classification will consider the individual cells as meaningful contributors to each template analyzed. Examples of such generalizations are described in [1]. Other errors could be introduced by the histogram matching of the aerial photographs, which might cause a different illumination in the different regions. Future studies should also investigate these causes of inaccuracies.

In general, it can be concluded that the automatic derivation of 18 land-use classes by means of remote sensing techniques remains a challenge. The proposed processing chain, however, can contribute to more advanced methods of classification that can shorten the time interval between land-use maps while reducing the production costs compared to labor-intensive manual map production.

ACKNOWLEDGMENT

The research presented in this paper is funded by the Graduate School of Planning, University of Puerto Rico, in the frame of the Xplorah project. The reference land-use data were made available by GMT Corp.

REFERENCES

[1] G. Roman, A. Castro, and E. Carreras, "Generation of land-use maps required for the implementation phase of a spatial decision support system for Puerto Rico: Xplorah 2010 land-use map," Geographic Mapping Technologies Corporation, San Juan, Puerto Rico, Tech. Rep., 2010.

[2] J. van der Kwast, T. van de Voorde, F. Canters, I. Uljee, S. van Looy, and G. Engelen, "Inferring urban land use using the optimised spatial reclassification kernel (OSPARK)," Environmental Modelling & Software, in review.

[3] M. Barnsley and S. Barr, "Inferring urban land use from satellite sensor images using kernel-based analysis and classification," Photogramm. Eng. Rem. S., vol. 62, no. 8, pp. 949–958, 1996.

[4] S. M. de Jong and F. van der Meer, Remote sensing image analysis: including the spatial domain, ser. Remote sensing and digital image processing, vol. 5. Kluwer Academic Publishers, 2004.

[5] J. van der Kwast, T. van de Voorde, F. Canters, G. Engelen, and C. Lavalle, "Using remote sensing derived spatial metrics for the calibration of land-use change models," in IEEE Proceedings of the 7th International Urban Remote Sensing Conference (URS 2009), Shanghai. IEEE, 2009.

