Remote sensing-based land cover classification and change...
Transcript of Remote sensing-based land cover classification and change...
Department of Thematic Studies
Environmental Change
MSc Thesis (30 ECTS credits)
Science for Sustainable Development
Malena Hesping
Supervisor: Martin Karlson, PhD (Linköping University, Department of Thematic Studies)
Co-supervisor: Markus Immitzer, PhD (BOKU University of Natural Resources and Life Sciences,
Vienna, Department of Landscape, Spatial and Infrastructure Sciences)
Remote sensing-based land cover
classification and change detection using
Sentinel-2 data and Random Forest
A case study of Rusinga Island, Kenya
Linköpings universitet, SE-581 83 Linköping, Sweden
Copyright
The publishers will keep this document online on the Internet – or its possible replacement –
for a period of 25 years starting from the date of publication barring exceptional circumstances.
The online availability of the document implies permanent permission for anyone to read, to
download, or to print out single copies for his/her own use and to use it unchanged for non-
commercial research and educational purpose. Subsequent transfers of copyright cannot revoke
this permission. All other uses of the document are conditional upon the consent of the
copyright owner. The publisher has taken technical and administrative measures to assure
authenticity, security and accessibility.
According to intellectual property law the author has the right to be mentioned when his/her
work is accessed as described above and to be protected against infringement.
For additional information about the Linköping University Electronic Press and its procedures
for publication and for assurance of document integrity, please refer to its www home page:
http://www.ep.liu.se/.
© Malena Hesping
i
Table of contents
List of figures ...................................................................................................................................... ii
List of tables ........................................................................................................................................ ii
Abstract ...................................................................................................................................... 1
List of abbreviations .................................................................................................................. 1
1. Introduction ........................................................................................................................... 3
1.1 Land use and land cover change in Africa and Kenya ............................................................. 3
1.2 Remote sensing for land use and land cover change monitoring ............................................ 4
1.3 Vegetation to reduce land degradation ...................................................................................... 5
1.4 Aim of this study .......................................................................................................................... 7
2. Materials and methods .......................................................................................................... 8
2.1 Study area ..................................................................................................................................... 8
2.2 Sentinel-2 data and pre-processing .......................................................................................... 10 2.2.1 Software ............................................................................................................................................... 12 2.2.2 Data acquisition and pre-processing .................................................................................................... 12 2.2.3 Vegetation indices ................................................................................................................................ 15
2.3 Classification scheme ................................................................................................................. 17
2.4 Reference data ............................................................................................................................ 20
2.5 Random Forest land cover classification model development ............................................... 23 2.5.1 Predictor datasets ................................................................................................................................. 24 2.5.2 Input feature selection and feature importance ranking ....................................................................... 25 2.5.3 Model accuracy assessment ................................................................................................................. 26
2.6 Post processing and land cover map creation ......................................................................... 27
2.7 Change detection ........................................................................................................................ 28
3. Results .................................................................................................................................. 31
3.1 Random Forest classification model selection ......................................................................... 31
3.2 Accuracy assessment.................................................................................................................. 33
3.3 Feature importance.................................................................................................................... 35
3.4 Land cover maps ........................................................................................................................ 39
3.5 Land cover change detection .................................................................................................... 43
4. Discussion ............................................................................................................................ 46
4.1 Sentinel-2 data for Random Forest land cover classification ................................................ 46 4.1.1 Single-date vs. multi-temporal datasets ............................................................................................... 46 4.1.2 Classification scheme ........................................................................................................................... 47 4.1.3 Seasonal differences and phenology .................................................................................................... 49 4.1.4 Vegetation indices ................................................................................................................................ 49
ii
4.1.5 Rusinga land cover maps ..................................................................................................................... 50
4.2 Post-classification change detection for land cover change and vegetation monitoring ..... 51
4.3. Concluding remarks and future outlook ................................................................................ 52
Acknowledgements .................................................................................................................. 53
References ................................................................................................................................ 54
List of figures
Figure 1: Study area. ................................................................................................................................................ 8 Figure 2: Erosion and vegetation restoration activities on Rusinga Island. ......................................................... 10 Figure 3: Workflow of data acquisition and geometric pre-processing. ............................................................... 15 Figure 4: Average spectral signatures of the five land cover classes and the four land cover classes. ................ 19 Figure 5: Locations and distribution of reference samples. .................................................................................. 22 Figure 6: Workflow of the land cover classification and change detection. . ........................................................ 30 Figure 7: Input feature importance ranking measured by the mean decrease in accuracy. .................................. 36 Figure 8: Normalised feature importance score of the individual dates compared to the mean NDVI of all scenes
of the overall land area of Rusinga Island. ............................................................................................................ 38 Figure 9: Land cover maps of Rusinga Island. ...................................................................................................... 39 Figure 10: Rusinga Island land cover map with additional buildings data from OpenStreetMap. ....................... 40 Figure 11: Comparison of the land cover map produced in this study with existing (global) land cover maps. .. 42 Figure 12: Land cover change map of Rusinga Island between May 2016 – April 2017 and May 2018 – April
2019. ....................................................................................................................................................................... 43 Figure 13: Land cover change maps of Rusinga Island between May 2016 – April 2017 and May 2018 – April
2019 highlighting vegetation increase and decrease. ............................................................................................ 44
List of tables
Table 1: Technical specifications of the Sentinel-2 satellites. ................................................................................ 11 Table 2: Overview of Sentinel-2 scenes selected for analysis.. .............................................................................. 13 Table 3: Vegetation indices used in this study. ...................................................................................................... 17 Table 4: Description of land cover classes. ........................................................................................................... 19 Table 5: Selected scenes for the predictor datasets. .............................................................................................. 25 Table 6: Classification performance based on internal OOB validation. .............................................................. 32 Table 7: Confusion matrix of the multi-temporal Random Forest land cover classification model after feature
selection (based on internal OOB validation). ....................................................................................................... 33 Table 8: Accuracy assessment results based on the independent test dataset. ...................................................... 34 Table 9: Confusion matrices of the classification predictions and the actual classes of the independent test
dataset. .................................................................................................................................................................... 35 Table 10: Input feature importance ranking grouped by spectral band and vegetation index. ............................. 37 Table 11: Input feature importance ranking grouped by date. .............................................................................. 38 Table 12: Adopted from Table 8 in results chapter: accuracy assessment results including number of scenes and
number of features included in the predictor dataset after feature selection. ........................................................ 47
1
Abstract
Healthy forests and soils are crucial for the very existence of mankind as they provide food,
clean water and air, shade and protection against floods and storms. With their photosynthetic
carbon storage ability, they mitigate climate change and fertilise and stabilise soils.
Unfortunately, deforestation and the loss of fertile soils are the bleak reality and among the
world’s most pressing challenges. Over the past decades Kenya has faced severe deforestation,
but efforts are being undertaken to reverse deforestation, revegetate degraded land and combat
erosion. Satellite remote sensing technology becomes increasingly useful for vegetation
monitoring as the data quality improves and the costs decrease. This thesis explores the
potential of free open access Sentinel-2 data for vegetation monitoring through Random Forest
land cover classification and post-classification change detection on Rusinga Island, Kenya.
Different single-date and multi-temporal predictor datasets differentiating respectively between
five and four classes were examined to develop the most suitable model. The classification
achieved acceptable results when assessed on an independent test dataset (overall accuracy of
90.06% with five classes and 96.89% with four classes), which should however be confirmed
on the ground and could potentially be improved with better reference data. In this study,
change detection could only be analysed over a time frame of two years, which is too short to
produce meaningful results. Nevertheless, the method was proven conceptually and could be
applied in the future to monitor land cover changes on Rusinga Island.
Keywords: Land cover classification, post-classification change detection, Random Forest,
remote sensing, Sentinel-2
List of abbreviations
AOT Aerosol Optical Thickness
ASTER Advanced Spaceborne Thermal Emission and Reflection Radiometer
AVHRR Advanced Very High-Resolution Radiometer
CCI Climate Change Initiative (ESA)
CNES National Centre for Space Studies (France)
DEM Digital Elevation Model
ENVISAT - MERIS Environmental Satellite - Medium Resolution Imaging Spectrometer
EOSDIS Earth Observing System Data and Information System
ESA European Space Agency
ETM+ Enhanced Thematic Mapper Plus (Landsat)
EU European Union
FAO United Nations Food and Agricultural Organisation
GIS Geographic Information System
GNDVI Green Normalised Difference Vegetation Index
GRVI Green Red Vegetation Index
HJ-1 Huanjing-1 (satellite system, China)
IPCC Intergovernmental Panel on Climate Change
2
IRECI Inverted Red-edge Chlorophyll Index
ITPS Intergovernmental Technical Panel on Soils
LiDAR Light Detection and Ranging
LP DAAC Land Processes Distributed Active Archive Center
MDA Mean Decrease in Accuracy
MDG Mean Decrease Gini
METI Ministry of Economy, Trade, and Industry (Japan)
MSI Multi-spectral Instrument
NASA National Aeronautics and Space Administration (United States of
America)
NDII Normalised Difference Infrared Index
NDVI Normalised Difference Vegetation Index
NIR Near-infrared
OA Overall Accuracy
OOB Out-of-bag
PA Producer’s Accuracy
PROBA-V Project for On-board Autonomy - Vegetation
S2 Sentinel-2
SAVI Soil-adjusted Vegetation Index
SPOT Satellites Pour l’Obversation de la Terre (CNES)
SWIR Short Wave Infrared
TM 5 Thematic Mapper 5 (Landsat)
UA User’s Accuracy
UN United Nations
UNEP United Nations Environment Programme
USGS United States Geological Survey
UTM/WGS Universal Transverse Mercator / World Geodetic System
VHR Very High Resolution
VRE Vegetation Red Edge
WCED World Commission on Environment and Development
WV Water Vapour
3
1. Introduction
The surface of the Earth is constantly changing for a great variety of reasons, many of which
are manmade. According to the recent special report on land by the Intergovernmental Panel
on Climate Change (IPCC, 2019) more than 70% of the global land surface is directly affected
by human use. Land use and land cover change, which often occurs in form of deforestation or
conversion from naturally vegetated areas to fields for agricultural use, is a major driver of
climate change, biodiversity loss and land degradation, which in turn have negative
implications for carbon cycling, ecosystems, and food security (FAO, 2016c; FAO & ITPS,
2015; IPCC, 2019; UNEP, 2016). Mapping and visualising these changes support
understanding their patterns, causes, and implications. Monitoring and analysing land use and
land cover change has emerged as a field of scientific research and found application in both
public and private sectors. Land cover monitoring provides valuable insights to advice policy-
and decision-making on different levels and influence strategies and regulations for
development and land use (Lunetta, Knight, Ediriwickrema, Lyon, & Worthy, 2006). The aim
of this thesis is to explore the suitability of free open access satellite data to classify land cover
and to detect land cover changes for vegetation monitoring purposes. The Kenyan island
Rusinga in Lake Victoria served as a case study.
This chapter introduces the topic of land use and land cover change in Africa and Kenya in the
current global development and climate change context. Moreover, it introduces remote sensing
and presents its application for land cover classification and change detection. This is followed
by a brief introduction of land degradation, soil erosion, and the role vegetation cover plays as
control measure to counteract erosion and land degradation. Finally, the aim of this study is
presented together with the research questions, which led the work. Chapter 2 presents the data
and methods used in this study and starts with a description of the study area. It then describes
the satellite and software and continues with the data acquisition and pre-processing, followed
by the description of the classification schemes and reference data. Next, the Random Forest
(Breiman, 2001) algorithm is introduced including a description of the predictor datasets used
in this study, feature selection and the accuracy assessment strategy. This is followed by a brief
description of the post-processing and map creation methods as well as the methodology for
the change detection. Chapter 3 presents the results of this study starting with the Random
Forest classification results including the feature selection and accuracy assessment. Then the
results of the feature importance analysis are presented before visualising the land cover maps.
Finally, the results of the change detection are presented by change maps. Chapter 4 provides a
discussion of the results guided by the two primary research questions including the sub-
questions and puts the outcomes in a wider context. Finally, it provides concluding remarks on
the suitability of land cover classification and change detection using Sentinel-2 data, Random
Forest and remotely collected reference data for vegetation monitoring by local organisations
on Rusinga Island.
1.1 Land use and land cover change in Africa and Kenya
Most dramatic land use and land cover change in Africa is seen in the reduction of forest cover,
often resulting from agricultural expansion and urbanisation. Additionally, mining causes land
4
use and land cover change in Africa, directly through clearing of forest for mining activities,
and indirectly through settlement of workers and the subsequently increased need for
agricultural production in the region (UNEP, 2016). Deforestation remains a major problem in
Africa. While the African continent held only 15.6% of the world’s total forest area in 2015,
84% of the global forest loss between 2010 and 2015 occurred in Africa (FAO, 2016b). This is
an immense deforestation rate for the rather small share of forest cover. Increasing world
population, economic growth and investment in large-scale commercial agriculture are the main
drivers for forest loss and other land cover changes in Africa (UNEP, 2016). African population
is projected to increase by 115% between 2013 and 2050 (FAO & ITPS, 2015). Increasing
wealth and shifting dietary preferences towards more livestock-based foods increase the need
for agricultural land and puts pressure on the food production system and soil health (FAO,
2013; Montanarella et al., 2016). Subsequent unsustainable land use and agricultural practices,
such as cropland expansion, increasing livestock populations, large-scale monocultures and
chemical fertilisers as well as deforestation and over exploitation of natural resources cause
increasing soil erosion, soil fertility loss, and biodiversity loss (Borrelli et al., 2017; Cebecauer
& Hofierka, 2008; FAO & ITPS, 2015; Pimentel & Kounang, 1998). Deforestation and poor
soil quality have implications for carbon cycling and the climate. Climate change is another
important driver of land cover change, but at the same time changes in land use and land cover
contribute to climate change. Some land use and land cover change becomes inevitable as
climate change contributes to soil- and land degradation (UNEP, 2016). Such degraded land
becomes useless for agriculture or natural vegetation and often turns into wasteland.
Conversely, changes such as deforestation, agricultural expansion, mining or urbanisation
contribute to climate change as they reduce natural carbon sinks, cause pollution and
greenhouse gas emissions and drive soil degradation.
1.2 Remote sensing for land use and land cover change monitoring
Remote sensing, which is the acquisition of geospatial data from space or air, plays a vital role
for the analysis and monitoring of land use and land cover change and vegetation dynamics.
Remote sensing systems can be used as an alternative or complementary method to traditional
ground-based data collection, which can be labour intensive, time consuming and expensive.
Other advantages of remote sensing are that these systems can cover large areas and long
periods of time and enable consistent observations with high revisit frequency. Moreover, they
are not disturbing the landscape and enable researchers to collect data in otherwise inaccessible
areas such as mountain regions, glaciers and jungles (Willis, 2015). Defining properties of
remote sensing systems are their spatial, temporal, radiometric and spectral resolution. The
spatial resolution describes the pixel size of the sensed image, which represents the area covered
on the ground. There is some discrepancy in the categorisation of the systems by their spatial
resolution (Karlson, 2015; Rees, 2012). However, Sentinel-2 data, which were used in this
study and have a spatial resolution of 10 m, can be classified as high resolution images (ESA,
2015), while very high resolution (VHR) images have pixel sizes of 5 m or smaller and the
pixel size of medium resolution images ranges from 50 to 500 m. Any coarser pixel size is
referred to as low resolution (Rees, 2012). The temporal resolution describes the revisit time of
the sensor system, thus how frequently a place is covered by the sensing instrument (Karlson,
5
2015; Lillesand, Kiefer, & Chipman, 2015). The sensors used in remote sensing detect
electromagnetic radiation of sunlight reflected from ground objects. The radiometric resolution
describes the capacity of the sensor to distinguish differences in light intensity. Multi-spectral
sensors, such as the one used for the collection of Sentinel-2 data, sense the radiation at different
wavelength ranges within the electromagnetic spectrum. One multi-spectral image is then
composed of a set of different spectral bands, each containing the reflectance data of a specific
wavelength range. The spectral resolution defines the ability of the sensor to differentiate
different wavelengths. Different types of ground surface have distinct reflectance properties
along the electromagnetic spectrum, called spectral signatures. The spectral signatures are
useful to detect and classify different types of land cover. Moreover, they are used to create
indices that provide more specific information on certain ground cover types. Various
vegetation indices utilise the distinct spectral signature of green vegetation (Albertz, 2009;
Lillesand et al., 2015).
Remotely sensed data has been used in numerous studies to detect, classify and monitor land
use and land cover changes, especially in degraded or vulnerable as well as in protected areas
(e.g. W. B. Cohen, Yang, Healey, Kennedy, & Gorelick, 2018; Fensholt & Proud, 2012;
Frampton, Dash, Watmough, & Milton, 2013; Islam, Jashimuddin, Nath, & Nath, 2018; Rawat
& Kumar, 2015; Turner et al., 2015; Willis, 2015). Furthermore, it is useful for understanding
and assessing impacts of land use and land cover change on erosion risk and to evaluate and
develop land management strategies (e.g. Leh, Bajwa, & Chaubey, 2013; Nyberg et al., 2015;
Willis, 2015). Remote sensing for land cover classification and change detection has emerged
as an intensely studied field of research, which has developed a large number of methodologies
and techniques to study land use and land cover change of various natures. In pixel-based
approaches spectral pattern recognition is used to categorise each image pixel according to their
spectral signatures and assign them a class (Lillesand et al., 2015). The Random Forest
classification algorithm (Breiman, 2001), which is described in more detail in Chapter 2.5, can
be used to predict pixel classes based on a predictor dataset and a training dataset, which are
fed into the model. The predictor dataset contains a set of so-called features, in this case the
spectral bands as well as vegetation indices. If it contains data sensed at one point in time, it is
referred to as single-date predictor dataset. Subsequently, multi-temporal predictor datasets
contain image data sensed at multiple points in time. The training dataset consists of
representative training areas to which the researcher assigned a class beforehand, ideally based
on ground evidence (Lillesand et al., 2015). In this study training data was collected remotely
using VHR satellite images. The study compared two different classification schemes, which
describe the different classes and were defined by the author. Change detection is one of the
most common applications of remote sensing image analysis and refers to the comparison of
an area over time. In post-classification change detection, the approach used in this study, two
images are separately classified and then the two classification outputs are compared to each
other (Lillesand et al., 2015; Tewkesbury, Comber, Tate, Lamb, & Fisher, 2015).
1.3 Vegetation to reduce land degradation
According to the IPCC (2014), global temperatures are increasing, precipitation patterns
become less predictable and extreme weather events causing droughts and floods are occurring
6
more frequently. Africa is among the regions that are most severely affected by these climatic
changes (Niang et al., 2014), which cause unhealthy soils and erosion. Precipitation changes
affect soil moisture and the water holding capacity of soils, while temperature increase affects
soil formation. Soil erosion was found to be the number one threat to soil function globally and
especially in sub-Saharan Africa (FAO & ITPS, 2015). It contributes to land degradation and
affects food security and livelihoods, ecosystems and water resources and poses risks of
landslides, flooding and desertification to humans and nature. The loss of fertile topsoil
desolates agricultural productivity and reduces infiltration capacities of soils, which can cause
floods, water pollution and destruction of infrastructure (Bastola, Dialynas, Bras, Noto, &
Istanbulluoglu, 2018). A recent globally consistent comparative study assessed soil erosion
rates and found that in 2012 6.1% of the global land surface was affected by erosion with the
highest soil erosion rate of 10% found in Africa (Borrelli et al., 2017).
A solution, which is repeatedly suggested by researchers and international organisations is
sustainable land and forest management (FAO & ITPS, 2015; IPCC, 2019; Niang et al., 2014;
UNEP, 2016). It is very broadly defined, adopted from the common definition of sustainable
development (WCED, 1987), as the use of land and forest resources to meet the current
changing needs of humanity while ensuring long-term functionality and productivity of these
resources in the future. More practically this includes approaches like conservation agriculture
and forestry, agroecology including permaculture, agroforestry and perennial cropping systems
(IPCC, 2019), all of which are based on the principle of permanently covering the soil with
vegetation (FAO, 2014, 2015, 2016a; Ferguson & Lovell, 2013). The increasing awareness
these forms of land management are getting in the global arena is beneficial for climate
resilience initiatives and contributes to mitigating and adapting to the adverse effects of climate
change in many regions of the world, especially those affected by desertification or severe land
degradation and erosion. Vegetation cover is proven to prevent soil erosion by reducing run-
off, stabilising soils and infiltrating water and nutrients (Borrelli et al., 2017; Pimentel &
Burgess, 2013). For example, Bastola et al. (2018) found that both backfilling and revegetation
measures are effective in reducing gully erosion development. In the long term, however,
revegetation measures are found to be more effective although being highly dependent on the
density and strength of the roots and the revegetation management practices. Similar results
have previously been found by Gomez et al. (2003), who conclude that woody vegetation is
more effective for gully erosion prevention than grassy vegetation. Nyssen et al. (2004) prove
that a combination of sediment holding structures, such as check dams, and revegetation are
effective for gully erosion control. Increased vegetation cover does not only benefit local soils
by reducing erosion and increasing its moisture holding capacity, it also has a cooling effect on
the local and regional climate through increased evapotranspiration and acts as carbon sink,
mitigating climate change (IPCC, 2019). The Kenyan Government has included sustainable
land and forest management and extensive afforestation in its ‘Vision 2030’ (The Presidency,
2018). It also launched a ‘Greening Kenya’ initiative, which contributes to the national goal of
achieving 10% forest cover by 2022 by planting 1.8 billion trees (UN Environment, 2018). On
Rusinga Island, which is extremely deforested, a community-based organisation called
Badilisha undertakes efforts to revegetate the island to reduce erosion and land degradation and
to sustainably manage its natural resources. A similar project has proved successful in Lesotho,
7
where highland communities built physical barriers and revegetated areas to combat soil
erosion and protect headwaters (Orange-Senqu River Commission, 2014).
1.4 Aim of this study
A number of institutions and projects have developed global land cover and land cover change
maps, which are freely available (e.g. ESA & Université Catholique de Louvain, 2010; ESA
CCI Land Cover Project, 2015; Hansen et al., 2013; Mayaux et al., 2003; National Geomatics
Center of China, 2010). However, many of these maps have a too coarse spatial resolution or
are not adjusted for application on a local scale. The aim of this study was to explore the
suitability of free open access high resolution Sentinel-2 satellite data and open source software
to classify and map land cover to support vegetation monitoring and erosion control on Rusinga
Island, Kenya. There is a wide range of VHR resolution satellite images available commercially
with resolutions as high as 0.3 m (DigitalGlobe, 2017). For free open access satellite images,
however, the 10 m resolution of the Sentinel-2 images is the highest spatial resolution currently
available. Using Rusinga Island as a case study, this study evaluated an established method for
remote sensing-based monitoring of vegetation changes using Random Forest (Breiman, 2001)
classification and post-classification change detection. Moreover, it aimed to support the
development of a tool, which can be used for processing Sentinel-2 imagery to identify,
visualise and analyse land cover change on Rusinga Island. The following questions led the
research in this study:
1. Are Sentinel-2 data suitable to classify land cover on Rusinga Island using
Random Forest classification and remotely collected reference data?
a. How does the performance of single-date predictor datasets
compare to the performance of multi-temporal predictor datasets?
b. How do seasonal differences in the landscape influence the
classification performance?
c. How does the definition of the classes in the classification scheme
affect the model performance?
d. How do vegetation indices contribute to classification accuracy
compared to the Sentinel-2 spectral bands?
2. Are the land cover classifications resulting from Sentinel-2 data suitable to detect
changes to support vegetation monitoring on Rusinga Island?
8
2. Materials and methods
2.1 Study area
Rusinga is an island of about 40 km2 (Tryon et al., 2014) on the western border of Kenya in the
north eastern part of Lake Victoria. It stretches approximately 10 km from north to south and
12 km from east to west. Since the 1980s it is connected to the mainland by an artificial
causeway, which bridges the 250 m wide channel between the island and the mainland (Tryon
et al., 2014). The island is characterised by a number of hills of which Ligongo Hill is the
highest with 300 m above lake level (see Figure 1; Andrews, 1973; Tryon et al., 2014). Lake
Victoria itself is located 1134 m above sea level (Andrews, 1973).
Figure 1: Study area: A: Kenya in Africa; B: Rusinga Island in Kenya; C: Satellite image of
Rusinga Island including its outline, highest peak and road network; D: Digital elevation model
(DEM) of Rusinga Island. The ASTER DEM was retrieved from the online Earth Explorer,
courtesy of the NASA EOSDIS Land Processes Distributed Active Archive Center (LP DAAC),
https://earthexplorer.usgs.gov/ ASTER GDEM is a product of Japan’s Ministry of Economy,
Trade, and Industry (METI) and NASA (NASA & METI, 2011).
The climate in Kenya is characterised by two rainy seasons. The so-called ‘long rains’ are
concentrated from March to May and are more intense and more reliable than the ‘short rains’
occurring in October and November (Nicholson, 2017). However, in the lake region the
precipitation is relatively spread throughout the year so that even the driest month received a
modest amount of rain (Andrews, 1973). The lake water level as well as vegetation around Lake
Victoria are strongly influenced by local precipitation, and have been throughout the history of
A B
D C
9
the lake (Tryon et al., 2014). However, it is perceived that climate change causes shifting
precipitation patterns, and droughts are becoming more common (Badilisha, n.d.). These
observations and concerns are confirmed by science observing and predicting more extreme
and less predictable weather events caused by climate change with increasingly negative
impacts on people living in regions that are already degraded (IPCC, 2019). In the regional
assessment for Africa (Niang et al., 2014) the IPCC reports observed and projected increases
in temperature in the past 50 to 100 years and for the 21st century. Precipitation patterns show
a less clear trend and reveal high spatial and temporal variation. Nevertheless, in eastern Africa
a decrease in precipitation was observed during the wet season (March – May). However,
models project more intense wet seasons (March – May and October – December) by the end
of the 21st century, but drier Augusts and Septembers. An increase of extreme events, such as
droughts, has been observed in East Africa over the past 30 to 60 years. Extreme precipitation
events are also projected to increase by the mid 21st century.
While it is difficult to find reliable population data for Rusinga Island, it is estimated that in
2012 the population was about 35,000 compared to only 5000 in the early 1980s (Byrne, 2013).
Official national census statistics counted a population of 24,275 on Rusinga in 2009 (Kenya
National Bureau of Statistics, 2010). However, no earlier or later statistics could be found. The
official census statistics for Homa Bay district, which includes Rusinga, report a population
increase from 745,040 in 1999 and 917,170 in 2009 to 1,131,950 in 2019 (Kenya National
Bureau of Statistics, 2012, 2019). This corresponds to an average yearly growth rate of
approximately 2.1%. The region is one of the poorest in Kenya with a large proportion of the
population living in extreme poverty. HIV/AIDS rates in the region are among the highest
nationwide. The island’s population is largely dependent on fishing for their livelihoods.
However, overfishing and the increasingly poor ecological state of Lake Victoria have caused
many residents to turn to agriculture and livestock keeping instead of or additional to fishing
(Badilisha, n.d.; Byrne, 2013; Kanyala Little Stars, n.d.). The region is considered highly food
insecure (UNEP, 2009).
The island used to be densely vegetated before deforestation increased driven by population
growth and the resulting need for building material, firewood or income generation.
Community elders narrate of lush forests covering the hills and of sufficient rain and water
resources on the island 35 to 50 years ago (Byrne, 2013). Although Andrews’ (1973) vegetation
study of Rusinga, which was published nearly five decades ago, depicts intense human
influence and anthropogenic forest clearance already in the 1970s, there is no doubt that the
situation worsened significantly since. Today the island is extremely deforested and its people
suffer from droughts, hunger, and poverty (Byrne, 2013; Mureithi, Mwagi, & Gruber, 2018).
The deforestation causes soil erosion, especially on the hill slopes, and increased risk of
landslides or mud floods as well as droughts since the soil loses the ability to capture and store
water. Land degradation decreases agricultural productivity and has negative impacts on the
island’s ecosystem. Subsequently, it undermines the livelihood of the local population. Several
large gullies have formed on Rusinga, especially on the hill slopes and on the foot of the hill
due to uncontrolled precipitation run-off during the rainy seasons over the past years as depicted
in Figure 2. Moreover, sand and gold mining are reported as major causes of land degradation
10
on the island. Additionally, trees are usually regarded solely as a source of energy (as firewood
or charcoal) by the local community and are subsequently planted mainly for this purpose
(Nyaga, 2018; Okolla, 2018). To reduce these risks and to prevent erosion, revegetation efforts
have been initiated on the island. One local organisation engaging in revegetation efforts and
erosion control is the Badilisha Eco-village Foundation Trust. Badilisha is a community-based
organisation engaging in various community development projects in the fields of food security,
care for vulnerable children, livelihoods for women, education, tree planting and permaculture.
The principles of permaculture provide the base for Badilisha, which means ‘change’. The
organisation initiated erosion control efforts especially on the hill slopes since the major rainy
season in 2018 through building sediment holding structures like check dams and through
revegetation by spreading different native grass seeds and planting trees (Figure 2; Mureithi et
al., 2018; Wagenknecht, 2018).
Figure 2: Erosion and vegetation restoration activities on Rusinga Island: A: Large gully
erosion on Rusinga Island; B: Community members working on environmental restoration:
building dams and stabilising them with plants; C: Seedling nursery at Badilisha community
centre; D: Loose rock dam stabilised with vegetation (picture source: Books for Trees).
2.2 Sentinel-2 data and pre-processing
This study relied on open access data and open source software as the evaluated method is
intended to be easily replicable by non-profit organisations and other stakeholders. The satellite
imagery was retrieved from the European Space Agency’s (ESA) satellite Sentinel-2, which is
D C
A B
11
a multispectral, high resolution, wide-swath twin-satellite system circling the Earth in a polar,
sun-synchronous orbit. Sentinel-2 collects data globally apart from open seas and the poles. The
first satellite was launched in June 2015, while the second followed in March 2017. Each
satellite has a revisit frequency of ten days. The satellites are phased at 180° to each other,
which allows for a combined revisit frequency of five days at the equator and two to three days
in mid-latitudes. Each satellite has a swath width of 290 km and carries a Multi-Spectral
Instrument (MSI), which collects data in 13 different spectral bands, four of which with a spatial
resolution of 10 m, six bands with 20 m resolution, and three bands at 60 m. Since the sensing
instrument works passively, using the reflectance of sunlight, the orbit synchronisation with the
sun is crucial as it minimises variations of the reflectance angle as well as potential shadows
(ESA, 2015). The technical specifications of the Sentinel-2 satellites are summarised in Table
1. The high resolution of the imagery is important for this study since it theoretically allows a
fine spatial scale and accurate classification, which is crucial for detecting small features. Since
the landscape on Rusinga Island is rather heterogenous and erosion damage usually occurs in
narrow but deep gullies, using high resolution imagery is crucial for the analysis. Moreover,
the revegetation efforts are also undertaken in small and specific areas rather than the creation
of large-scale plantations.
Table 1: Technical specifications of the Sentinel-2 satellites (ESA, 2015).
Satellites Sentinel-2 A
Sentinel-2 B
Launched in June 2015
Launched in March 2017
Sensing instrument Multi-Spectral Instrument (MSI)
Swath width 290 km
Temporal
resolution
5 days (at equator)
Spectral and
spatial resolution
Band Central
wavelength (nm)
Region Spatial
resolution (m)
1 443 Coastal aerosol 60
2 490 Blue 10
3 560 Green 10
4 665 Red 10
5 705 Vegetation red edge
(VRE)
20
6 740 Vegetation red edge 20
7 783 Vegetation red edge 20
8 842 Near-infrared (NIR) 10
8a 865 Narrow NIR 20
9 940 Water vapour 60
10 1375 Short wave infrared
(SWIR) cirrus
60
11 1610 SWIR 20
12 2190 SWIR 20
Two different Sentinel-2 data products at different processing levels are freely available for
users to download from ESA’s Copernicus Open Access Hub (ESA, 2019): level 1C, and
12
level 2A data. To prepare satellite data for representation and analysis, pre-processing of the
data is needed. This involves correction of geometric and radiometric errors in the data.
Sentinel-2 L1C data is geometrically and radiometrically corrected and comes in ortho-images
of 100 km by 100 km, so-called tiles or granules. These are top-of-atmosphere reflectance
images. Sentinel-2 L2A data have the same cartographic geometry as the L1C data, but are
additionally corrected for atmospheric, topographic, and adjacency effects, and are thus called
bottom-of-atmosphere reflectance (ESA, 2015). To avoid distortions, only L2A data were used
in this study. L1C data products for the entire operating period of the satellite system are freely
available for download from the Copernicus Open Access Hub (ESA, 2019). After a successful
pilot since May 2017, ESA was working on providing readily processed L2A data products to
users, starting with the Mediterranean region as of 26 March 2018. Worldwide coverage with
L2A data was planned to be achieved by summer 2018, but was extended to the end of 2018
(ESA, 2018b, 2018d). Since mid-December 2018 L2A products for Rusinga Island (T36MXE)
sensed after 17 December 2018 can be downloaded from the Copernicus Open Access Hub
(ESA, 2019). For the data sensed earlier than that date, the atmospheric correction to transform
L1C data into L2A data needs to be performed by the user using the Sen2Cor processor as
plugin in the Sentinel Application Platform (ESA, 2018c) or as command line programme. This
processor is available for download on ESA’s Sentinel website.
The Sentinel-2 data tiles are projected in UTM/WGS84 (ESA, 2015). In this projection,
Rusinga Island is located in zone 36S. Subsequently, all spatial analysis in this thesis was
performed using WGS84/36S (EPSG 32736) projection.
2.2.1 Software
Most processing and analyses were performed using the programming language R (R Core
Team, 2018) in RStudio version 1.1.447 (RStudio, 2018). Apart from base commands, the
following packages were used: ‘raster’ (Hijmans et al., 2019), ‘randomForest’ (Breiman,
Cutler, Liaw, & Wiener, 2018), ‘rgdal’ (Bivand et al., 2019), ‘RStoolbox’ (Leutner, Horning,
Schwalb-Willmann, & Hijmans, 2019), ‘stringi’ (Gagolewski, Tartanus, contributors (stringi
source code), IBM and other contributors (ICU4C source code), & Unicode Inc. (Unicode
Character Database), 2019), ‘caret’ (Kuhn et al., 2019), ‘grDevices’ (R Core Team, 2019),
‘gdalUtils’ (Greenberg & Mattiuzzi, 2018), ‘DescTools’ (Signorell et al., 2017), ‘plotrix’
(Lemon et al., 2019), and ‘xtable’ (Scott et al., 2019). Atmospheric correction using the
Sen2Cor processor version 5.5.2 was performed as command line programme on MacOS 10.13.
For tasks when a geographic information system (GIS) interface was needed, mainly for
reference data collection and map production, QGIS versions 3.0 to 3.6 (QGIS Development
Team, 2019) were used.
2.2.2 Data acquisition and pre-processing
Sentinel-2 data for the study area (tile 36MXE) were downloaded directly from ESA’s Sentinel
data hub (ESA, 2019) at processing level 1C for those sensed before 17 December 2018, and at
processing level 2A for those sensed after that date. Scenes where Rusinga Island is cloud-free
were visually identified on the platform and selected for download. An overview of the selected
scenes is presented in Table 2.
13
Table 2: Overview of Sentinel-2 scenes selected for analysis. * Scenes used for land cover
classification model selection as well as for change detection.
27.05.2016 15.08.2016 14.10.2016 12.01.2017 02.04.2017* Change
detection –
first period
06.06.2016 25.08.2016 23.12.2016 13.03.2017* 12.04.2017*
13.03.2017*
02.04.2017*
12.04.2017*
22.05.2017
01.07.2017
11.07.2017
31.07.2017
05.08.2017
15.08.2017
09.09.2017
04.10.2017
09.10.2017
29.10.2017
03.11.2017
18.11.2017
28.11.2017
13.12.2017
23.12.2017
28.12.2017
17.01.2018
22.01.2018
01.02.2018
11.02.2018
16.02.2018
26.02.2018
Land cover
classification
& model
selection
01.06.2018 26.07.2018 24.09.2018 07.01.2019 Change
detection –
second period
01.07.2018 31.07.2018 19.10.2018 12.01.2019
06.07.2018 09.09.2018 28.11.2018 18.03.2019
The data sensed before 17 December 2018 needed to be pre-processed to obtain data at
processing level 2A. The pre-processing was performed using the Sen2Cor processor as
command line tool. The scene classification algorithm detects snow, clouds, cirrus, and cloud
shadow and produces a scene classification map with a focus on cloud differentiation. This map
is used as an input for the cirrus removal of the subsequent atmospheric correction. The
atmospheric correction process consists of five steps. Firstly, Look-Up Tables are prepared,
which contain specific information on sensor and solar geometries, atmospheric parameters,
and ground elevation. Different Look-Up Tables are calculated for the specifics of the
respective tile and user configurations. Secondly, Aerosol Optical Thickness (AOT), which is
a measure for the visual transparency of the atmosphere, is retrieved using the Dense Dark
Vegetation algorithm described by Kaufman et al. (1997). Next, water vapour (WV) content
over land is retrieved using the Atmospheric Pre-corrected Differential Absorption algorithm
(Schläpfer, Borel, Keller, & Itten, 1998). Then, cirrus is removed using the classification map
produced by the previous scene classification algorithm. Especially the visible, near-infrared
(NIR), and short-wave infrared (SWIR) spectral bands are affected by disturbance of this cirrus
clouds, which are difficult to detect by broadband multispectral sensors. Therefore, the MSI of
Sentinel-2 contains a separate band (band 10: ~ 1337.5 – 1413 nm) to sense cirrus clouds.
Finally, surface reflectance is retrieved for all bands. As a result, a bottom-of-the-atmosphere
reflectance output at processing level 2A is produced. The processor was run at a resolution of
10 m. However, since cirrus correction is only possible at 20 and 60 m resolutions, the
processes have been run twice; first at 20 m and then at 10 m resolution (Müller-Wilm, 2018a,
2018b). Data naming of Sentinel-2 products has changed as of 06 December 2016 to overcome
pathname character limitations of Windows operating systems (ESA, 2018a). With the current
version of the Sen2Cor processor (v2.5.5) the old data naming format could not be processed.
However, for the study area data in the new naming format are available for download until as
early as 27 May 2016. Thus, for this thesis data conversion from processing L1C to L2A has
been performed for cloud-free scenes between 27 May 2016 and 28 November 2018. Five
earlier cloud-free scenes (between 29 November 2015 and 28 March 2016) in the old naming
format have been disregarded for this study.
14
The Sen2Cor 20 m resolution processing produces separate atmospherically corrected spectral
reflectance bands with 20 m resolution (originally 20 and 10 m resolutions, except B08 is
omitted in 20 m processing). Additionally, AOT and WV files are produced along with the
scene classification map. The 10 m resolution processing produces atmospherically corrected
files of the original 10 m resolution bands. Besides, AOT and WV files are produced as well as
a true colour image. The cirrus band (B10) is omitted in both cases as it does not provide any
ground information (Müller-Wilm, 2018a).
The 10 m spectral bands (B02, B03, B04, B08) and 20 m spectral bands (B05, B06, B07, B8A,
B11, B12) were combined for each scene (10 bands per scene). The remaining Sen2Cor outputs
were disregarded.
The spatial extent of each tile was cropped to a rectangle covering the area of Rusinga Island
and small parts of the contiguous mainland to reduce the size of the files and subsequently
computing power and memory space requirements. Moreover, the selected bands of each scene
were resampled to the highest spatial resolution (10 m). Finally, the processed bands of each
scene were composed to single files, which were saved in GeoTiff format. The produced multi-
band subsets were visually evaluated for their suitability. Images which showed disturbance
from clouds or cloud shadows were excluded from the dataset.
Even if all data is obtained from the same source and has undergone the same processing, some
irregularities might occur. Systematic errors, thus those occurring in all scenes sensed by the
instrument, are corrected by ESA in the Payload Data Ground Segment. Non-systematic errors
might occur in single images and cannot be systematically corrected. Thus, any non-systematic
correction needs to be performed by the user (ESA, 2015; Jones & Vaughan, 2010). ESA aims
for 3 m (95% confidence level) performance of the multi-temporal registration, which,
according to their recent Quality Report, is currently at an average of 12 m (Clerc & Team,
2018). This means that images might be slightly shifted. Therefore, the pixel shift of all images
used for this study was calculated and corrected, if a shift was detected. A master image (13
March 2017) was defined, which is the temporally closest cloud-free Sentinel-2 scene to the
data used for the collection of reference data (20 February 2017, as described in Chapter 2.4).
This master image was visually compared to very high resolution (VHR) images (Google
Satellite layer, acquired by the French National Centre for Space Studies (CNES), the Bing
Virtual Earth layer, the Esri satellite layer, all three embedded in QGIS, and a false colour
Pléiades-1A image) and found to be well aligned, which qualifies it to be used as master image.
Subsequently, the shifts of all images relative to the master image were calculated and corrected
if necessary. The algorithm calculates the best shift based on maximum mutual information,
which is a measure for the mutual dependence of two random variables in information theory
(Leutner et al., 2019). The workflow of data acquisition and pre-processing is visualised in
Figure 3.
15
Figure 3: Workflow of data acquisition and geometric pre-processing. Italic steps indicate
optional operations only for parts of the data.
2.2.3 Vegetation indices
Certain objects or ground covers reflect light differently at different wavelengths. There are
many factors influencing how light interacts with different ground characteristics. For example,
structural features, texture, water content and chlorophyll content determine how vegetation
reflects and absorbs light at different wavelengths. To measure and account for perturbing
parameters, different mathematical combinations of the spectral bands the satellite sensor
collects are used, so-called spectral indices. The use of indices allows for normalisation of the
spectral data of unrelated ground characteristics and for enhancement of sensitivity in a small
reflective spectrum. The use of vegetation indices is commonly adopted in studies of land cover
mapping and vegetation monitoring using remote sensing (e.g. Eckert, Hüsler, Liniger, &
Hodel, 2015; Lunetta et al., 2006; Motohka, Nasahara, Oguma, & Tsuchida, 2010; Nyberg et
al., 2015; Viña, Gitelson, Nguy-Robertson, & Peng, 2011; Wibowo, Ismullah, Dipokusumo, &
Wikantika, 2012; Yang et al., 2015). Vegetation indices are good and comparable proxies for
vegetation net primary production, vegetation trend analysis and vegetation phenology as they
make use of the vegetation’s sharp increase in reflectance between the red and the near-infrared
bands (Fensholt & Proud, 2012; Jones & Vaughan, 2010). Therefore, the ratio of those two
bands can be used as indicator for vegetation cover. While vegetated areas are represented by
a large difference in reflectance between those two bands, the difference in reflectance of bare
soil is low. Accordingly, vegetation indices are useful tools for vegetation monitoring through
change detection and classification and can serve as management and decision-making tools in
environmental management and revegetation activities. For example, bare soil areas can easily
be detected and together with a digital elevation model be used to prioritise areas where
interventions are needed most urgently. As input features for the Random Forest models,
vegetation indices provide additional spectral responses, which are used for the classification
algorithm to distinguish the different classes. Different indices have been developed for
different purposes and for different satellite sensors. Remote sensing sensors are constantly
being improved and developed further. Accordingly, the range of different spectral bands
increases and so does the data, which is retrieved from them. This means that vegetation indices
16
are regularly refined and developed further as the technological improvements advance (Jones
& Vaughan, 2010).
The normalised difference vegetation index (NDVI) (Rouse, Haas, Schell, Deering, & Harlan,
1974) is the most commonly used vegetation index (e.g. Eckert et al., 2015; Fensholt & Proud,
2012; Gandhi, Parthiban, Thummalu, & Christy, 2015; Lunetta et al., 2006). It utilises the fact
that the chlorophyll of vegetation absorbs light in the red spectrum (600 – 700 nm), while the
light in the NIR spectrum (700 – 1000 nm) is reflected. Hence, the NDVI is obtained by
dividing the difference between the NIR and the red band by the sum of the NIR and the red
band. Hence, dense vegetation cover with high photosynthetic activity results in high NDVI
values close to 1, while areas free from vegetation, such as bare soil or water, result in much
lower NDVI values closer to 0 or negative. One adjustment of the NDVI is the green normalised
difference vegetation index (GNDVI) (Gitelson, Kaufman, & Merzlyak, 1996). It is defined
exactly as the NDVI, but substitutes the red band for the green band, which improves the
sensitivity to dense vegetation as its wider dynamic range is more sensitive to higher
concentrations of chlorophyll in vegetation compared to the red band used in the NDVI.
Another variation of the NDVI is the green-red vegetation index (GRVI) (Motohka et al., 2010),
which substitutes the NIR band for the green band. This modification improves the sensitivity
to colouring of the leaves from green to yellow. Furthermore, the normalised difference infrared
index (NDII) (Kimes, Markham, Tucker, & McMurtrey, 1981) is a variation of the NDVI,
which substitutes the red band with the short-wave infrared band (SWIR), which is sensitive to
leaf water content. Since the MSI collects SWIR reflectance around 1610 nm (band 11) and
around 2190 nm (band 12), two NDII combinations can be derived: NDII11, which uses band
11 and NDII12, which used band 12. Huete (1988) developed a vegetation index, which also
derived from the NDVI: The soil-adjusted vegetation index (SAVI) corrects the influence of
soil reflectance on vegetation reflectance by adding a constant to the denominator of the NDVI
and adding a multiplication factor to keep the values within the original NDVI bound (-1 to 1).
Additionally, the inverted red-edge chlorophyll index (IRECI) was particularly developed for
Sentinel-2 data and makes use of the three different vegetation red edge (VRE) bands, which
are collected by the MSI in the spectrum between red and NIR (690 – 795 nm) (Frampton et
al., 2013). It divides the difference between the third red-edge band and the red band by the
ratio of the first and second red-edge bands. A summary of the vegetation indices used in this
study can be found in Table 3.
17
Table 3: Vegetation indices used in this study.
Name Equation Sentinel-2 bands used Reference
Normalised
Difference
Vegetation
Index
𝑁𝐷𝑉𝐼 = 𝑁𝐼𝑅 − 𝑅
𝑁𝐼𝑅 + 𝑅 =
𝐵08 − 𝐵04
𝐵08 + 𝐵04
(Rouse et
al., 1974)
Green
Normalised
Difference
Vegetation
Index
𝐺𝑁𝐷𝑉𝐼 = 𝜌𝑁𝐼𝑅 − 𝜌𝐺
𝜌𝑁𝐼𝑅 + 𝜌𝐺 =
𝐵08 − 𝐵03
𝐵08 + 𝐵03
(Gitelson et
al., 1996)
Green-Red
Vegetation
Index
𝐺𝑅𝑉𝐼 = 𝐺 − 𝑅
𝐺 + 𝑅 =
𝐵03 − 𝐵04
𝐵03 + 𝐵04
(Motohka et
al., 2010)
Normalised
Difference
Infrared
Index
𝑁𝐷𝐼𝐼 =𝜌𝑁𝐼𝑅 − 𝜌𝑆𝑊𝐼𝑅
𝜌𝑁𝐼𝑅 + 𝜌𝑆𝑊𝐼𝑅
𝑁𝐷𝐼𝐼11 =𝐵08 − 𝐵11
𝐵08 + 𝐵11
𝑁𝐷𝐼𝐼12 =𝐵08 − 𝐵12
𝐵08 + 𝐵12
(Kimes et
al., 1981)
Soil-adjusted
Vegetation
Index
𝑆𝐴𝑉𝐼
= (1 + 0,75)𝜌𝑁𝐼𝑅 − 𝜌𝑅
𝜌𝑁𝐼𝑅 + 𝜌𝑅 + 0,75
= (1 + 0,75)𝐵08 − 𝐵04
𝐵08 + 𝐵04 + 0,75
(Huete,
1988)
Inverted Red-
edge
Chlorophyll
Index
𝐼𝑅𝐸𝐶𝐼 =𝜌𝑅𝐸3 − 𝜌𝑅
𝜌𝑅𝐸1𝜌𝑅𝐸2
=𝐵07 − 𝐵04
𝐵05𝐵06
(Frampton
et al., 2013)
2.3 Classification scheme
Land cover classification is one of the most common applications of optical satellite data. Since
the beginning of systematic global satellite image processing the need for standardising land
use and land cover classification was recognised. Institutions like the United States Geological
Survey (USGS) and the United Nations Food and Agricultural Organisation (FAO) have
developed detailed and systematic classification schemes driven by the lack of uniformity and
comparability (Anderson, Hardy, Roach, & Witmer, 1976; Di Gregorio, 2005). It is crucial to
make a distinction between land cover and land use. In this study, the common definitions of
land cover and land use, as adopted for example by FAO and the European Union (EU), were
followed. While land cover describes the Earth’s observed (bio)physical cover, land use refers
to the way it is being used (Di Gregorio, 2005; Eurostat, 2018). A single land cover can serve
different use cases: a grassland, for example, can be used as a meadow to grow fodder for
animals, it can be used for animals to graze or for sports such as football or golf or it can also
be an inaccessible piece of grassland that is not being used for anything in particular. Then
again, a single land use type may appear in various land cover types: a recreational area can be
a grassland for sport activities, it can be a water body, a sandy beach, a forest, or a built-up
urban space. In this study the focus is solely laid on land cover. With the various uses of land
and the different purposes of land cover classification, the definition of classes varies greatly
18
within scientific literature depending on the choice of classification scheme and the purpose of
the land cover map (e.g. Congalton, Gu, Yadav, Thenkabail, & Ozdogan, 2014; Di Gregorio,
2005; Herold & Di Gregorio, 2012). While an agricultural map distinguishes precisely between
different types of crops, the fields may be uniformly classified as agricultural fields in a tourist
map regardless of the crop type, because it is not relevant for touristic purposes.
For the purpose of this study, a very fine classification is not necessary as the aim is to provide
a land cover classification map as baseline for future monitoring of vegetation cover
development and afforestation. Therefore, the classes in this study were kept broad to minimise
the potential for misclassifications. The development of the classification scheme in this study
was inspired by the first level classes of the EU’s land use and land cover survey: Artificial
land, cropland, woodland, shrubland, grassland, water areas, wetlands, and bare land (Eurostat,
2018). However, some modifications were made to account for the local character of Rusinga
Island. Wetlands do not exist on the island, so the class was disregarded. Artificial land was
included in bare land. On Rusinga roads are built of sand and gravel, so that they have very
similar spectral response patters as bare soil. Buildings as separate class produced very high
misclassification rates, probably because they are rather small and scattered all over the island,
making them difficult to detect with 10 m spatial resolution data. Roofs of different colours
further cause disturbance in the spectral response patterns. As buildings are often surrounded
by bare soil, they were included in the bare land class in this study. Moreover, the classes
shrubland and woodland were combined in this study, because the reference data were collected
only from satellite imagery and not in situ, which made a distinction between small trees and
shrubs impossible. No large or dense forests exist on the island, but woody vegetation is
scattered, which further complicated the distinction between woodland and shrubland.
A second classification scheme with four classes was introduced at a later stage of the research
and compared to the 5-class scheme described above. This was done as a response to high
misclassification rates especially between the grassland and the cropland classes. The confusion
is not surprising since the spectral signatures of the two classes are very similar (Figure 4).
While the woodland and grassland classes displayed similar spectral signatures, those of the
grassland and cropland classes aligned even more closely. Subsequently, the number of classes
was reduced to four where water and bare land remained unchanged, woodland was translated
to continuous vegetation and complemented by evergreen crops and other year-round, dense
(grassy) vegetation. The remaining cropland and grassland samples, which correspond to
scattered or seasonal vegetation form the fourth class. Table 4 lists and describes the land cover
classes for the two schemes used in this study.
19
Figure 4: Average spectral signatures of the five land cover classes (top) and the four land
cover classes (bottom).
Table 4: Description of land cover classes.
5 classes 4 classes
Name Description Name Description
Woodland high / woody / dark green /
dense vegetation
Continuous
vegetation
dense / evergreen vegetation;
trees
Grassland low / grassy / light green /
scattered vegetation
Scattered or
seasonal
vegetation
seasonal / scattered vegetation
Cropland rectangular fields; seasonal
vegetation
Bare land bare soil; roads; buildings Bare land bare soil; roads; buildings
Water open water (Lake Victoria) Water open water (Lake Victoria)
20
2.4 Reference data
Supervised classification requires reference data, which are used to train, verify, and test the
classification model. Reference data collection for this study was conducted through visual
image interpretation. Because the Sentinel-2 scenes have a maximum spatial resolution of 10 m,
it is difficult to visually identify different types of land cover in these images. Subsequently,
reference data were collected using VHR images. The Google Satellite View imagery, acquired
by CNES Pléiades-1A satellite, which has a spatial resolution of 0.5 m and can be loaded in
QGIS, was used to identify and classify reference samples. This imagery was obtained on 20
February 2017 for most of the study area. A small part in the north east of the island is covered
by imagery dated 30 June 2017 (Google, 2018). The Pléiades-1A scene from 20 February 2017
was also used as false colour composite. The false colour composite facilitated the identification
of different vegetation types as it represents vegetation cover more sensitively than the true
colour composite. Moreover, the Esri World Imagery, which can be embedded in QGIS and is
acquired by DigitalGlobe, sensed on 29 August 2016, with a spatial resolution of 0.46 m, was
used to verify the visual classification (DigitalGlobe, 2016). Consulting both VHR images is
useful, as they are acquired relatively close in time but represent different seasons. While the
CNES image from mid-February represents the peak of the major dry season on Rusinga, the
DigitalGlobe image from August is taken from between the rains, after the major rainy season.
Even if the DigitalGlobe image does not represent the rainy season, it shows a clear difference
to the CNES image and is much greener. Eventually, the reference dataset used in this study
consists of two initially separately collected datasets, which were combined at a later stage of
this thesis work. The reason for this is that first the training dataset consisted of polygons of
which the mean reflectance was used to train the models, while the validation dataset was
comprised of data points. However, it was decided to convert the training samples into point
data and combine the two datasets for two reasons. Firstly, because using the polygons’ mean
reflectance values implies using artificially constructed values, that do not really occur in the
images. Secondly, combining the two datasets results in more accurate model prediction
outcomes, because more samples can be used for training the models.
For the first part of the reference data, initially the training dataset, 322 samples were collected.
To best represent the actual landscape of the study area, 250 points were randomly spread
covering the island and an additional 150 m buffer around the island to cover the coastal areas
and the lake. For each point the class was visually identified using the VHR images and a
corresponding homogeneous polygon was defined at or as close to the point as possible. If it
was not possible to identify the class at the location of the point, a polygon of a different class
representing the nearby area was defined. A drawback of simple random sampling is the
potential underrepresentation of classes (Lillesand et al., 2015). A stratified random sampling
method would be useful with each class representing a stratum. Since the points need to be
randomly selected first before assigning classes to them, it is technically not possible to stratify
the random sample by class. Instead, a 500 x 500 m grid was laid over the island area and it
was ensured that each grid cell included at least one training area to achieve a spread-out spatial
distribution of polygons. If a cell did not contain a training sample, a new polygon was added
preferably representing a potentially underrepresented class. When defining the training
polygons, it was also considered that not only the spectral response pattern of each land cover
21
type is different, but that there are also variations within the classes, especially if the classes are
defined relatively broadly, as they are in this study. Therefore, it was attempted for the reference
data to cover as many of these intra-class variations as possible (Lillesand et al., 2015).
For the second part of the reference data, initially the validation dataset, areas, which could
clearly be classified, were visually identified using the VHR imagery described above.
Polygons were defined and assigned a class. Then, their centroids were calculated which were
used as reference pixels. The purposive sampling method was chosen to sample areas as
information-rich and accurate as possible. This resulted in a total of 215 samples.
For both reference datasets, plots of the NDVI reflectance for each sample over time were
produced and inspected. The plots showed disturbances and noise, which is supposedly caused
by impure data. To resolve this problem, the initial training data were converted from polygon
data to point data. Although this reduced the overall surface area used for training data
drastically, it also made the data purer and the spectral signatures were much clearer. Moreover,
all sample points were revised and corrected if necessary. On this occasion, the 4-classes
classification scheme was introduced and besides revising the assigned classes for the 5-classes
scheme, classes for the 4-classes scheme were assigned to each sample point. Eventually, the
process resulted in a total of 537 reference samples, which were each assigned a class in the 5-
class- as well as the 4-class scheme. Their distribution by geolocation and class is shown in
Figure 5. These samples were then randomly split into two datasets; one, containing 70% (376)
of the samples, for model training and validation, and the second, containing 30% (161) of the
samples, for independent testing of the models. This test dataset was exclusively used for
assessing the accuracy of the models on independent data, that is not associated with building
the models in any way. The use of independent test data is absolutely necessary, if the
comparison of kappa coefficients, which assumes independence of samples when assessing
accuracy, are used for accuracy assessment of the classification (Foody, 2004), which is the
case in the present study. Using the same data for training a model and for assessing its accuracy
results in overestimating the accuracy (Congalton, 1991).
22
Figure 5: Locations and distribution of reference samples (training and test data combined) in
5-class scheme (upper) and 4-class scheme (lower).
23
2.5 Random Forest land cover classification model development
Digital image classification approaches can broadly be divided into spectral- or spatial pattern
recognition, or a hybrid approach, such as object-based image analysis, which is receiving
increasing attention in scientific literature (Duro, Franklin, & Dubé, 2012; Hussain, Chen,
Cheng, Wei, & Stanley, 2013; Immitzer, Atzberger, & Koukal, 2012; Immitzer, Vuolo, &
Atzberger, 2016; Karlson, Ostwald, Reese, Bazié, & Tankoano, 2016; Myint, Gober, Brazel,
Grossman-Clarke, & Weng, 2011; Weih & Riggan, 2010). This study explored the potential of
using free open access Sentinel-2 data for land cover classification with Random Forest and
aimed to develop a tool for change detection and vegetation monitoring. There is a wide range
of VHR satellite images available commercially with resolutions as high as 0.3 m
(DigitalGlobe, 2017). For free open access satellite images, however, Sentinel-2 images with
10 m have the highest spatial resolution currently available. Studies comparing pixel-based and
object-based classification methods with 10 m spatial resolution imagery have found no
improved classification accuracy for the object-based approach (Duro et al., 2012; Immitzer et
al., 2016) while more accurate results have been shown for object-based classification with
VHR imagery with multispectral resolution of 2.0 – 2.4 m (Immitzer et al., 2012; Myint et al.,
2011). Thus, the 10 m spatial resolution of Sentinel-2 images is probably too low for spatial
pattern recognition, but their great variety of spectral bands provides a solid base for a spectral
pattern recognition approach to classify different types of land cover using a pixel-based
analysis. An ongoing ESA-funded research project on using Sentinel-2 data for automated
global land cover mapping also relies on pixel-based supervised classification with Random
Forest as it was found to be the most suitable approach (Lewinski et al., 2017).
Random Forest is a supervised non-parametric machine learning algorithm developed by
Breiman (2001) in the 1990s and early 2000s. It can be used to approach both regression and
classification problems. In this study, it was used for the latter. Random Forest has gained
increasing popularity to approach classification problems especially in remote sensing
(Immitzer, 2017; Karlson, 2015; Schultz et al., 2015; Zhu & Woodcock, 2014) and has been
proved to be an adequate classification strategy to produce land cover maps by a benchmark
study of state-of-the-art supervised classification methods (Inglada et al., 2015). It handles high
dimensional data, thus allows for a large number of input features and is robust against
overfitting. As non-parametric classifier, no assumptions about data distribution are needed
(Breiman, 2001). Moreover, it provides an internal error rate estimation and two feature
importance ranking tools. An additional advantage of Random Forest is its simplicity. Only two
parameters are needed to be set for the model to perform: the number of trees the model builds
(ntree) and the number of features randomly selected at each node (mtry). The Random Forest
model then grows the pre-determined number of decision trees. For each tree, a bootstrap
sample with replacement is drawn from the training data set. Two thirds of the data are used to
grow the decision tree while the remaining one third is used for internal model validation to
estimate the generalisation error, the so-called out-of-bag (OOB) error. The OOB data, which
is not used for training the model, is put down the decision trees and the resulting
misclassification rates provide the OOB error. It gives an estimate of the overall accuracy (OA)
of the classification model. Each tree is grown by randomly selecting a subset of input features
to be split at each node (mtry) and eventually votes for a class. The final classification is derived
24
from a majority vote of all trees in the forest (Breiman, 2001). In this study, ntree was set to
500, which is the default and has proved suitable in previous research (Breiman et al., 2018;
Immitzer et al., 2012; Liaw & Wiener, 2002). Different mtry were tested of which the default
mtry value, which is the square root of the number of input features (Breiman et al., 2018)
proved suitable. Thus, mtry = 4 for the single-date datasets with 17 input features and mtry = 20
for the multi-temporal datasets with 425 input features.
Since the internal OOB error estimates are based on bootstrapping and bootstrapping might be
problematic with small sample sizes (Kuhn & Johnson, 2013), the OOB bootstrap validation
results were compared to the results of repeated 10-fold cross-validation. Repeated 10-fold
cross validation was used, because it produces acceptable bias and variance results, even with
rather small sample sizes (Kuhn & Johnson, 2013). The model performance results were
compared based on the Kappa coefficient (J. Cohen, 1960), because it is particularly useful
when data are distributed unevenly among the classes (Kuhn, 2008), which is the case here
(Figure 5). Because the results did not differ much, the internal bootstrap validation was
considered reliable and in this study the model performance was estimated based on the
Random Forest OOB error.
2.5.1 Predictor datasets
As the second Ssentinel-2 satellite was launched in March 2017, there are much more data
available since. Therefore, the period March 2017 – February 2018 was selected for model
development. A range of different models were developed from different predictor datasets and
compared to assess which is the most suitable for land cover mapping on Rusinga Island using
Sentinel-2 data and Random Forest classification. Table 5 provides an overview of the different
predictor datasets and the scenes they are comprised of. Different datasets were tested. A major
distinction can be made between single-date and multi-temporal datasets. The single-date
datasets each include the 10 spectral bands as well as the seven vegetation indices of one
selected scene. One scene representing the major wet season from March to May and one scene
representing the dry season from January to February were selected. The selection for each
season was based on the lowest OOB error. As for the single-date datasets, the respective set
of seven vegetation indices was included in the multi-temporal datasets. One uses all 25 scenes
covering the selected study period. The second multi-temporal dataset was constructed of all
four wet season scenes, while the third used all six dry season scenes. For the fourth multi-
temporal dataset the seasons were combined, but the amount of input features was reduced only
one scene from each season, the one which produced the best-performing model in the selection
of the single-date datasets. The fifth and the sixth multi-temporal predictor datasets were
comprised of 6 and 12 randomly selected scenes respectively. In this way, the total amount of
input data was reduced compared to the dataset with all scenes, but no assumptions were made
about the suitability of different seasons. Therefore, the randomly selected datasets also
represent the entire year. For each of the randomly selected predictor datasets three sets of
random samples were produced to account for variability of the randomness. Comparing
classification accuracies of single-date and multi-temporal data as well as between different
seasons has been previously studied, mostly in favour of multi-temporal classification, but it
has to be recognised that conditions differ and in some cases single-date classifications
25
outperform multi-temporal approaches (Karlson et al., 2016; Langley, Cheshire, & Humes,
2001; Löw, Knöfel, & Conrad, 2015; Siachalou, Mallinis, & Tsakiri-Strati, 2015). Each of the
predictor datasets was used with the 5-class scheme as well as the 4-class scheme.
Table 5: Selected scenes for the predictor datasets (*same scenes for 5 classes and 4 classes).
Name Scene(s)
5 classes 4 classes
Single-date wet season 22.05.2017 13.03.2017
Single-date dry season 22.01.2018 17.01.2018
Multi-temporal all scenes* 13.03.2017
02.04.2017
12.04.2017
22.05.2017
01.07.2017
11.07.2017
31.07.2017
05.08.2017
15.08.2017
09.09.2017
04.10.2017
09.10.2017
29.10.2017
03.11.2017
18.11.2017
28.11.2017
13.12.2017
23.12.2017
28.12.2017
17.01.2018
22.01.2018
01.02.2018
11.02.2018
16.02.2018
26.02.2018
Multi-temporal wet season* 13.03.2017 02.04.2017 12.04.2017 22.05.2017
Multi-temporal dry season* 17.01.2018
22.01.2018
01.02.2018
11.02.2018
16.02.2018
26.02.2018
Multi-temporal combined seasons 22.05.2017 22.01.2018 13.03.2017 17.01.2018
Multi-temporal 6 random scenes
1st sample*
01.07.2017
31.07.2017
15.08.2017
04.10.2017
28.11.2017
22.01.2018
Multi-temporal 6 random scenes
2nd sample*
13.03.2017
09.10.2017
29.10.2017
22.01.2018
01.02.2018
11.02.2018
Multi-temporal 6 random scenes
3rd sample*
13.03.2017
02.04.2017
12.04.2017
11.07.2017
05.08.2017
28.11.2017
Multi-temporal 12 random scenes
1st sample*
13.03.2017
01.07.2017
31.07.2017
05.08.2017
15.08.2017
04.10.2017
09.10.2017
13.12.2017
28.12.2017
17.01.2018
22.01.2018
26.02.2018
Multi-temporal 12 random scenes
2nd sample*
13.03.2017
12.04.2017
22.05.2017
05.08.2017
09.10.2017
29.10.2017
13.12.2017
28.12.2017
22.01.2018
01.02.2018
11.02.2018
16.02.2018
Multi-temporal 12 random scenes
3rd sample*
13.03.2017
02.04.2017
12.04.2017
22.05.2017
11.07.2017
05.08.2017
04.10.2017
09.10.2017
18.11.2017
28.11.2017
13.12.2017
01.02.2018
2.5.2 Input feature selection and feature importance ranking
The input features influence the accuracy of the classification as not all input features are
equally relevant. The Random Forest classification algorithm provides two tools for input
feature importance ranking: the mean decrease in accuracy (MDA) and the mean decrease in
Gini (MDG). These tools generate a ranking of the input features based on their relative
importance to the classification results. The MDA is derived from randomly permuting the
values of each input feature based on the OOB data while keeping all other variables constant.
The feature’s relative importance is determined by the influence it has on the misclassification
rate. The greater the MDA of a feature, the greater it’s importance to the classification
(Breiman, 2001). The MDG is derived from the sum of the Gini impurity decreases normalised
26
over the total amount of trees. The Gini impurity is a measure of the probability of a feature
being misclassified at a node. The further down a tree, the purer the splits and the smaller the
Gini impurity at each node. Hence, the greater the MDG of a feature, the greater it’s importance
(Breiman & Cutler, 2004). Even if Random Forest handles high-dimensional data, hence can
deal with a large number of input features, optimisation of model performance can be achieved
by elimination of input features with low explanatory power (Hastie, Tibshirani, & Friedman,
2009). For the feature selection in this study, a recursive elimination process based on the MDA
measure was applied: the models were run n times with n = number of features, starting with n
and reducing this number by 1 in every iteration until the number of input features = 0. Of these
iterated models the one with maximum OA was selected as best model. If one or more iterations
reached equal maximum OA, of these the one with the lowest number of input features was
selected as best model (Immitzer, 2017). The classification performance of the different models
tested in this study were compared before and after the feature selection process based on the
OA estimate.
Moreover, the MDA was used to rank the input features after feature selection to evaluate the
relative importance of the different features. The features were grouped by type, which are the
10 spectral bands and the seven vegetation indices. For each feature type the frequency of
occurring and a score was calculated to compare the importance of the different feature types.
The feature ranking was sorted by MDA value and each feature was assigned a score from 1
for the lowest-ranking feature to n for the highest-ranking feature, where n = number of input
features after feature selection. The individual scores were summed and normalised per feature
type. The same procedure was applied for grouping the input features by date. A potential
correlation between the importance of the dates and vegetation seasonality was investigated by
plotting the relative importance of the dates in the different models against the surface
reflectance NDVI, which is highly sensitive to plant greenness and subsequently provides a
good representation of the vegetation’s phenological cycle (Willis, 2015).
2.5.3 Model accuracy assessment
Classification accuracy assessment is, as the classification itself, an important part of the
classification process; i.e. the classification is not complete until its accuracy has been assessed
(Congalton, 1991). When different models are compared, like in this study, it is essential to
evaluate their accuracies. The use of an independent test dataset for the accuracy assessment is
not only good practice, but crucial for certain approaches, such as the Cohen’s kappa coefficient
(J. Cohen, 1960). The samples used to calculate the kappa coefficient are assumed to be
independent, which makes the use of an independent test dataset, which was held out of the
model production process, inevitable (Foody, 2004).
One of the most commonly used techniques for such comparison in accuracy assessment is the
confusion matrix (Congalton, 1991; Lillesand et al., 2015), which compares the classes
predicted by the model to the classes of the independent test dataset. It does not only present
the OA, which is the percentage of correctly allocated classes, but also includes error measures
for the individual classes. This is particularly interesting when one is primarily interested in one
or few classes. The producer’s accuracy (PA) indicates the percentage with which the test data
are classified correctly. The user’s accuracy (UA), subsequently indicates the percentage with
27
which the modelled data represents the real world, that is the test data. The OA was calculated
as follows:
OA =∑ 𝑥𝑖𝑖
𝑟𝑖=1
𝑁
Equation 1
where
𝑁 = total number of observations
𝑟 = number of classes
𝑥𝑖𝑖 = number of correctly classified observations of each class
In addition, the kappa coefficient is often used in combination with the confusion matrix
(Immitzer et al., 2016; Karlson et al., 2016; Lillesand et al., 2015). It provides an indication of
the actual agreement between the validation data and the modelled classification outputs as it
accounts for chance agreement, i.e. the case that pixels are classified correctly just by chance
instead of by correct modelling. The kappa coefficient usually ranges between 0 and 1 where 1
indicates perfect agreement and 0 indicates no better classification than chance agreement.
Kappa () was calculated as follows (adopted from Lillesand et al., 2015):
=𝑁 ∑ 𝑥𝑖𝑖
𝑟𝑖=1 − ∑ (𝑥𝑖+ ∙ 𝑥+𝑖)
𝑟𝑖=1
𝑁2 − ∑ (𝑥𝑖+ ∙ 𝑥+𝑖)𝑟𝑖=1
Equation 2
where
𝑁 = total number of observations
𝑟 = number of classes
𝑥𝑖𝑖 = number of correctly classified observations of each class
𝑥𝑖+ = total number of observations of the modelled classification of each class
𝑥+𝑖 = total number of observations of the test data of each class
The prediction performance of the different models was assessed by predicting the classes of
the independent test dataset using the models and comparing the results to the actual classes,
which were assigned to the data by the author through visual interpretation. The model
accuracies were assessed by comparing their OA and their kappa coefficients.
2.6 Post processing and land cover map creation
The selected models were used to predict the classes of the entire image to produce a land cover
classification of the entire study area. The to-be-classified images need to be comprised of
exactly the same layers (i.e. features) as were used for developing the predictive model
(Hijmans et al., 2019).
To reduce noise and achieve a better visual representation of the classification map, a 3x3
majority filter was applied. The majority filter uses a 3x3 kernel to re-determine the class of
each pixel based on the majority of the particular pixel and its eight most direct neighbours.
28
Additionally, for the same purpose, a sieve filter was used, which disregards connected pixel
groups of the same class, which are smaller than six pixels. Diagonal pixels are considered
connected (Lillesand et al., 2015; NASA Applied Remote Sensing Training, 2018). This
procedure was also tested with a 5x5 majority filter and a 10-pixel sieve. However, the resulting
maps turned out too coarse and generalised.
2.7 Change detection
Change detection was performed to identify, visualise, and analyse land cover changes on the
island. As for image classification, a range of different change detection techniques exist. A
comprehensive review by Tewkesbury et al. (2015) concludes that despite the great variety of
different methods pixel based post-classification techniques are remaining the most popular
choices for change detection due to their fast, convenient, descriptive and easy-to-interpret
characteristics. Since useful Sentinel-2 data are only available since May 2016 for the study
area, change detection could only be performed over a time period of three years, which is very
short for land cover change detection. Huang et al. (2009) suggest a time period covering a
minimum of ten years for effective change detection, however with data acquired every two
years to account for rapid changes. Subsequently, it was hypothesised that no noteworthy
changes could be detected over the analysed time frame. However, this change detection
application also served as test and example for future change detection to be performed by local
organisations.
For the post-classification change detection in this study two time periods were compared:
1. May 2016 – April 2017
2. May 2018 – April 2019
Even if in this study the models with all 25 scenes were selected for further analysis of the land
cover classification as they resulted in the highest accuracies, the number of scenes for the
change detection was limited to 10 and 12 for the first and second time period respectively.
This was done to limit model complexity and justified with the high accuracy results of the
tested models with 12 randomly selected scenes, which comes very close to the accuracy of the
models with all scenes and is therefore accepted as alternative. Moreover, for the first time
period only 10 images of the study area are available. Thus, for the second period 12 scenes
were randomly selected from the available Sentinel-2 images of which the study area was
cloud-free. For the first time period all 10 cloud-free scenes were used to create the predictor
dataset. The selected scenes were downloaded and pre-processed as described in Chapter 2.2.
The vegetation indices were calculated, which, together with the spectral bands of each selected
scene, make up the predictor datasets. The training data was extracted from the predictor dataset
and used to train the Random Forest models and select the best model after feature selection for
each of the two time periods. For the proof of concept of the change detection in this study, the
same training and test samples were used for both time periods as they are temporally so close
to each other and to the reference data date. Thus, it was assumed that no major land cover
changes occurred during that time. For change detection covering a larger time span new
reference data needs to be collected or the existing reference data needs to be updated. The
accuracy of each model was assessed on the independent test dataset as described in
29
Chapter 2.5.3. The models were used to predict the classes of the entire images and
subsequently produce land cover maps. Filtering the maps was done after the comparison to
maintain the original precision at this stage. To allow for unique change values the classification
output of the first time period was reclassified by the following formula:
𝑟 = 𝑐 ∙ 5 + 1
Equation 3
where
𝑐 = initial class
𝑟 = new class
Then, the newer image was subtracted from the older image, resulting in a change map with
unique values for each change direction. For visual representation these change maps were
filtered in the same way as the land cover maps described in Chapter 2.6. Figure 6 presents an
overview of the entire workflow.
30
Figure 6: Workflow of the land cover classification and change detection.
S2 data
Pre-processing:
- atmospheric correction
- spectral band selection
- spatial extent selection
- resampling to 10 m
- image coregistration
Vegetation
indices:
- NDVI
- GNDVI
- GRVI
- NDII11
- NDII12
- SAVI
- IRECI
Pre-processed datasets
Predictor dataset extraction
VHR imagery Land cover classes
Reference data
Training data (70%) Test data (30%)
Random Forest land cover classification
modelling
validation method
comparison: internal
RF bootstrap
validation & repeated
10-fold cross-
validation
Feature
selection
Best model selection Accuracy assessment
Rusinga Island land
cover classification
Change detection:
- May 2016 - April 2017
- May 2018 - April 2019
Confusion matrix
Feature importance ranking
Post-processing for
map visualisation:
- majority filter
- sieve
Post-processing for
map visualisation:
- majority filter
- sieve
Rusinga Island land
cover map
Rusinga Island land
cover change maps
Change matrix
31
3. Results
3.1 Random Forest classification model selection
A land cover classification of Rusinga Island, Kenya was performed using Sentinel-2 images
and the Random Forest classification algorithm. Different models were tested and compared
based on different predictor datasets differentiating a range of single-date and multi-temporal
datasets and two classification schemes. Moreover, the model performances were compared
before and after the feature selection process. In the final step, the model accuracies were
assessed on an independent test dataset.
The comparison of the classification performance of the different models is summarised in
Table 6. It shows the overall accuracy (OA) and the kappa coefficient of each predictor dataset
before and after feature selection for the 5-class scheme and the 4-class scheme. The feature
selection resulted in higher performance values for all models except for the single-date dry
season, for which the performance decreased slightly after feature selection in the 5-class model
and remained the same in the 4-class model. However, this inconsistency can be attributed to
the natural variation of the results effected by the randomness component of the Random Forest
algorithm. The multi-temporal models achieved higher prediction performance than the single-
date models, except for the multi-temporal dry season model with five classes and the first
iteration of the 4-class multi-temporal model with six randomly selected scenes. Again, these
exceptions can be attributed to the randomness of the Random Forest. While the models with
all 25 scenes reached the highest accuracy, those with 12 randomly selected scenes followed
close behind. Generally, the models with four classes achieved higher accuracies than the
models with five classes. Regarding different seasons separately revealed that the wet season
models reached better classification performance than the dry season (Table 6). While for the
single-date models the performance of the wet and dry season differed only slightly, it is
interesting to note that the 5-class multi-temporal wet season model reached a much higher OA
than the dry season one. The difference is 5.32 percentage points (pp) while for the 4-class
multi-temporal models the difference is with 2.13 pp much smaller.
32
Table 6: Classification performance based on internal OOB validation, reported as Kappa
coefficient and overall accuracy (OA), of the Random Forest land cover classification models
with different predictor datasets before and after feature selection for the 5-class- and the 4-
class scheme (colour scaling relative for each column).
The classification models with four classes achieved much higher accuracies than those with
five classes. This is not surprising, since the spectral signatures (Figure 4) of the grassland and
cropland classes in the 5-class models were very similar, which causes confusion in the
prediction outputs of the models (exemplified in the confusion matrix of the 5-class multi-
temporal model with all scenes in Table 7). While the woodland and grassland classes displayed
similar spectral signatures, those of the grassland and cropland classes aligned even more
closely. The confusion matrix reveals that the grassland and the cropland classes achieved the
lowest classification accuracies with producer’s accuracies (PA) of 57% and 81% respectively
and user accuracies (UA) of 71% and 81% respectively, while the remaining classes achieved
accuracies over 93%. It also shows that of the 47 grassland samples 27 were classified correctly
while 20 were misclassified, 12 of which (60%) as cropland. Likewise, of the 67 cropland
samples 54 were correctly classified while 13 were misclassified, eight of which (62%) as
grassland. However, the confusion of grassland and cropland is not the only source of the low
PA of the two classes. While for both classes about 60% of the misclassified samples were
confused for the respective other, the remaining 40% were confused for either woodland or bare
land. Of the misclassified grassland samples 25% were confused for bare land while 15% were
confused for woodland. The opposite is the case for the misclassified cropland samples of which
23% were confused for woodland and 15% for bare land.
33
Table 7: Confusion matrix of the multi-temporal Random Forest land cover classification
model with all scenes and five classes after feature selection (based on internal OOB
validation).
For the further analysis, the multi-temporal models with all scenes for both classification
schemes were used as they achieved the highest prediction accuracy results. Feature selection
proved beneficial for classification accuracy. Thus, the models for further analysis were
considered after feature selection. Both classification schemes were considered delivering
valuable information. The 4-class models showed fewer prediction errors, which can be
attributed to the similarity of the grassland and the cropland class of the classification scheme
with five classes (Figure 4), and therefore seem more reliable. However, the 5-class models
provide more detail and additionally give an indication of the land use (i.e. agricultural use).
Since the purpose of the models is to map land cover to provide a tool for vegetation monitoring,
using both classification schemes and providing the option to choose might prove beneficial. A
larger number of scenes incorporated in the predictor dataset increased the prediction accuracy,
but also the complexity of the model and subsequently its computational needs. The predictor
dataset comprised of 12 randomly selected scenes also provided acceptable prediction accuracy
compared to the one comprised of all scenes but reduces complexity by using only half as many
scenes. Therefore, it could be worth considering using the 12-random-scenes models if the
complexity of the all-scenes models increases due to an increased amount of available satellite
image data.
3.2 Accuracy assessment
The results of the accuracy assessment are summarised in Table 8 and reflect the results of the
Random Forest internal performance assessment (Table 6). The 4-class models performed
better than the 5-class models and the multi-temporal models achieved higher accuracies than
the single-date models. While for the 4-class models the predictor dataset with all scenes
achieved the highest accuracy, for the 5-class models the highest accuracy was achieved by one
of the predictor datasets with six randomly selected scenes (Table 8). As mentioned previously,
this can be attributed to the randomness of the Random Forest algorithm. Moreover, the 4-class
models indicate that the more scenes are incorporated in the predictor datasets, the higher the
prediction accuracy. This trend is not represented by the 5-class models, which is assumed to
derive from the high misclassification rates between the grassland and the cropland classes.
34
Table 8: Accuracy assessment results based on the independent test dataset, reported as Kappa
coefficient () and overall accuracy (OA), for the Random Forest land cover classification
models with different predictor datasets after feature selection for the 5-class- and the 4-class
scheme (colour scaling relative for each column).
Kappa OA Kappa OA
Single-date wet season 0.7501 81.37% 0.8786 91.30%
Single-date dry season 0.7490 81.37% 0.8530 89.44%
Multi-temporal all scenes 0.8666 90.06% 0.9564 96.89%
Multi-temporal wet season 0.8233 86.96% 0.8771 91.30%
Multi-temporal dry season 0.7531 81.37% 0.8872 91.93%
Multi-temporal combined seasons 0.7989 85.09% 0.9132 93.79%
Multi-temporal 6_1 random scenes 0.8832 91.30% 0.9126 93.79%
Multi-temporal 6_2 random scenes 0.8167 86.34% 0.9476 96.27%
Multi-temporal 6_3 random scenes 0.8413 88.20% 0.9302 95.03%
Multi-temporal 12_1 random scenes 0.8673 90.06% 0.9477 96.27%
Multi-temporal 12_2 random scenes 0.8325 87.58% 0.9389 95.65%
Multi-temporal 12_3 random scenes 0.8246 86.96% 0.9390 95.65%
Predictor dataset
5-class models 4-class models
Table 9 shows the confusion matrices of the accuracy assessment for the selected models, those
containing all scenes. The models reached an OA of 90.06% ( = 0.8666) with five classes and
96.89% ( = 0.9564) with four classes. For the 5-class model, only the water class reached UA
and PA of 100%. While the UAs of the woodland and the bare land classes were both around
93%, the PA of the bare land class reached nearly 97% and the woodland PA reached nearly
98%. As expected, for the grassland and cropland classes the model performed much worse.
While the UA of the cropland class with nearly 93% kept up with the other classes, its PA
reached only 59%. The grassland UA equalled 68% and its PA equalled 90%. Of the 21
grassland samples 17 were classified correctly, while four were misclassified, two of which as
bare land and one each as woodland and cropland. Of the 22 cropland samples 13 were
classified correctly, while nine were misclassified, five of which as cropland and two each as
woodland and bare land. For the 4-class model the continuous vegetation and water classes both
achieved UAs and PAs of 100%. The scattered or seasonal vegetation class reached a UA of
nearly 95% and a PA of just over 92%. The UA and PA of the bare land class reached just over
95% and nearly 97% respectively. Of the 39 scattered or seasonal vegetation samples in the test
data, 36 were classified correctly, while three were misclassified as bare land. Likewise, of the
60 bare land samples in the test data, 58 were correctly classified, while two were misclassified
as scattered or seasonal vegetation.
35
Table 9: Confusion matrices of the classification predictions by the multi-temporal model with
all scenes after feature selection and the actual classes of the independent test dataset with five
classes (upper) and four classes (lower).
3.3 Feature importance
The input feature importance rankings of the selected models are depicted in Figure 7. Table
10 provides a summary of the importance grouped by feature type (the 10 spectral bands and
the 7 vegetation indices). In the 5-class model the predictor dataset was comprised of 47 input
features after the feature selection process, while the predictor dataset for the 4-class model
after feature selection was comprised of 46 input features. In the 5-class model, only 12 of the
17 feature types contributed to the classification. In the 4-class model this number decreased to
nine. Overall, the blue band is the feature with the highest explanatory power. It accounts for
31% and 25 % of the explanatory power in the 5-class model and the 4-class model respectively.
Additionally, the red band, NDVI, GNDVI and SAVI rank high. It is notable that the NIR,
narrow NIR, and VRE bands rank rather low or are not even occurring in the models after
feature selection.
36
Figure 7: Input feature importance ranking measured by the mean decrease in accuracy of the
multi-temporal models with all scenes with five classes (upper) and four classes (lower).
37
Table 10: Input feature importance ranking of the multi-temporal models with all scenes after
feature selection grouped by spectral band and vegetation index for the 5-class model (left) and
the 4-class model (right).
Feature Frequency Normalised score
Blue 10 0.31
Red 5 0.15
GNDVI 5 0.11
NDVI 4 0.11
SAVI 3 0.09
IRECI 5 0.06
Green 3 0.06
SWIR1 2 0.05
VRE1 2 0.02
NIR 2 0.02
VRE3 2 0.01
NDII12 2 0.01
SWIR2 1 0.00
VRE2 1 0.00
NarrowNIR 0 0.00
GRVI 0 0.00
NDII11 0 0.00
Multi-temporal model with all scenes
5 classes
Feature Frequency Normalised score
Blue 8 0.25
NDVI 9 0.23
SAVI 7 0.17
Red 5 0.10
SWIR1 5 0.08
GNDVI 3 0.08
IRECI 4 0.07
GRVI 1 0.01
NDII12 3 0.01
NarrowNIR 1 0.00
Green 0 0.00
VRE1 0 0.00
VRE2 0 0.00
VRE3 0 0.00
NIR 0 0.00
SWIR2 0 0.00
NDII11 0 0.00
Multi-temporal model with all scenes
4 classes
The importance ranking by scene (i.e. date) is summarised in Table 11. The 13 March 2017 and
22 May 2017 scenes can clearly be identified as the scenes resulting in the highest mean
decrease in accuracy (MDA). In the 5-class model they account respectively for 17% and 23%
of the explanatory power, while in the 4-class model it is 20% and 17% respectively. Moreover,
the scenes from 18 November 2017 and 28 November 2017 rank relatively high. It is notable
that all these dates represent the wet season. While March and May fall into the major wet
season, November falls into the minor wet season. To investigate a potential correlation of the
explanatory power of the dates and the seasonality of vegetation on Rusinga Island more
closely, the MDA scores for each date were plotted against the mean land area NDVI, which
represents the phenology (Figure 8). While no unambiguous correlation between the feature
importance of the dates and the phenology can be identified, Figure 8 clearly shows that the
dates with the highest explanatory power (13 March, 22 May, 18 November and 28 November)
correlate with the vegetation peaks represented by the mean land area NDVI.
38
Table 11: Input feature importance ranking of the multi-temporal models with all scenes after
feature selection grouped by date for the 5- (left) and the 4-class (right) models.
Date Frequency Normalised score
20170522 9 0.23
20170313 7 0.17
20171128 4 0.13
20171118 3 0.08
20170815 5 0.08
20170805 4 0.05
20180226 4 0.05
20171029 1 0.04
20180117 1 0.03
20170909 1 0.03
20170711 1 0.03
20170731 1 0.02
20171228 1 0.02
20170701 3 0.02
20180211 1 0.01
20171004 1 0.01
20170402 0 0.00
20170412 0 0.00
20171009 0 0.00
20171103 0 0.00
20171213 0 0.00
20171223 0 0.00
20180122 0 0.00
20180201 0 0.00
20180216 0 0.00
Multi-temporal with all scenes
5 classes
Date Frequency Normalised score
20170313 6 0.20
20170522 5 0.17
20171118 4 0.11
20171103 4 0.09
20180216 3 0.08
20171128 2 0.06
20180226 5 0.06
20171029 2 0.05
20171228 2 0.05
20170815 4 0.04
20171223 1 0.03
20180117 1 0.02
20180122 2 0.01
20180211 1 0.01
20171009 2 0.01
20171004 1 0.00
20170805 1 0.00
20170402 0 0.00
20170412 0 0.00
20170701 0 0.00
20170711 0 0.00
20170731 0 0.00
20170909 0 0.00
20171213 0 0.00
20180201 0 0.00
Multi-temporal model with all scenes
4 classes
Figure 8: Normalised feature importance score of the individual dates of the 5- and 4-class
multi-temporal models with all scenes compared to the mean NDVI of all scenes of the overall
land area of Rusinga Island.
39
3.4 Land cover maps
Figure 9: Land cover maps of Rusinga Island with five classes (upper) and four classes (lower)
based on the model with all scenes and for visualisation purposes filtered to reduce noise.
40
The maps in Figure 9 show the results of the Random Forest classification model with all scenes
and five classes (upper) and four classes (lower). To reduce noise for better visual
interpretability the maps were filtered. When including buildings (OpenStreetMap contributors,
2017) in the map, some bare land areas can be identified as residential areas (Figure 10). Most
obviously, the largest residential area right at the driveway to the mainland. It also shows that
the large bare land areas around the central hill, especially on its north-eastern, eastern, and
southern slopes are mostly free from residential use. The same applies to the smaller hill with
the tower on top (in the south-west of the central hill) and the large bare area in the very western
part of the island.
Figure 10: Rusinga Island land cover map with five classes based on the model with all scenes
and additional buildings data from OpenStreetMap.
Figure 11 shows a comparison of the land cover map created in this study with existing land
cover maps. While the Global Land Cover 2000 map (A; Mayaux et al., 2003) with a resolution
of 1 km merely just identifies the island, the two 300 m maps GlobCover 2009 and Land Cover
Map 2015 (B and C; ESA & Université Catholique de Louvain, 2010; ESA CCI Land Cover
Project, 2015) provide more detail and give a brief overview of the island’s land cover
composition. The Globeland30-2010 map (D; National Geomatics Center of China, 2010) with
a resolution of 30 m achieves a quite detailed discrimination between forest, shrubland,
grassland, and cultivated land. However, it does not detect any bare land or artificial land on
Rusinga. The Global Forest Cover Change map (E; Hansen et al., 2013), also with a spatial
resolution of 30 m, does not detect any changes in forest cover on Rusinga between 2000 and
2018. However, it gives a quite precise estimation of the island’s rather low forest cover. The
41
vector map Kenya Land Cover 2007 (F; Landsberg & Henninger, 2007) is neither detailed nor
accurate. The S2 Land Cover Map of Africa 2016 (G; ESA CCI Land Cover Project, 2017) with
a spatial resolution of 20 m, is very comparable to the one produced in this study (H) as it was
derived from the same satellite data. For better comparison, the map produced in this study was
adjusted to comply with the legend of the ESA Climate Change Initiative (CCI) land cover map.
The major differences are that this study better detected the bare areas at the hill slopes.
Moreover, the ESA CCI land cover map shows some grassland – cropland confusion on the
hilltop and it detects a few aquatic or regularly flooded vegetation areas, which this study did
not observe. This study seems to detect more woodland (including trees and shrubs), while the
ESA CCI map classified much of these areas as grassland. Moreover, this study detected much
more bare land / built-up areas.
42
Figure 11: Comparison of the land cover map produced in this study with existing (global) land
cover maps for the study area: A: European Commission Joint Research Centre: Global Land
Cover 2000 (based on SPOT 4 - VEGETATION 1 (spatial resolution 1 km); Mayaux et al.,
43
2003); B: ESA & UC Louvain: GlobCover 2009 (based on ENVISAT - MERIS (spatial
resolution 300 m); ESA & Université Catholique de Louvain, 2010); C: ESA Climate Change
Initiative: Land Cover Map 2015 (based on ENVISAT - MERIS, AVHRR, SPOT -
VEGETATION, and PROBA-V (spatial resolution 300 m); ESA CCI Land Cover Project,
2015); D: Chinese National Geomatics Center: Globeland 30 (based on Landsat TM 5, Landsat
ETM+, and HJ-1 (spatial resolution 30 m); National Geomatics Center of China, 2010); E:
USGS & University of Maryland: Global Forest Cover Change (based on Landsat (spatial
resolution 30 m); Hansen et al., 2013); F: World Resources Institute: Kenya Land Cover 2007
(vector data; Landsberg & Henninger, 2007); G: Prototype S2 Land Cover Map of Africa 2016
(based on Sentinel-2 (spatial resolution 20 m); ESA CCI Land Cover Project, 2017); H: this
study’s output: Rusinga Island Land Cover Map (based on Sentinel-2 (spatial resolution 10
m)).
3.5 Land cover change detection
Owing to the prominent confusion between grassland and cropland in the 5-class scheme the
resulting change maps for these scenarios present a relatively great amount of changes between
these classes, where it is impossible to distinguish between actual conversion and the result of
misclassification. Therefore, the results of the change detection are presented only for the 4-
class scheme. The map in Figure 12 presents the entire change map, while the two maps in
Figure 13 represent areas where vegetation cover increased or upgraded (upper) and decreased
or degraded (lower).
Figure 12: Land cover change map of Rusinga Island between May 2016 – April 2017 and May
2018 – April 2019.
44
Figure 13: Land cover change maps of Rusinga Island between May 2016 – April 2017 and
May 2018 – April 2019 highlighting vegetation increase (upper) and decrease (lower).
45
The vegetation cover change maps (Figure 13) reveal that increased or upgraded vegetation
cover is predominantly located in the north eastern part of the island and at the northern hill of
the central part of the island where the major changes are conversion from scattered or seasonal
vegetation to continuous vegetation cover. Moreover, some areas north of the residential area
at the driveway experienced conversion from bare land to scattered or seasonal vegetation. The
most common changes are scattered or seasonal vegetation to continuous vegetation and bare
land to scattered or seasonal vegetation. Changes from bare land or water to continuous
vegetation and from water to scattered or seasonal vegetation are neglectable. The most severe
decrease of vegetation cover is observed at the slopes of the island’s central hill, where scattered
or seasonal vegetation was converted to bare land, except for the north eastern slopes.
Furthermore, several rather scattered areas in the south western parts of the island experienced
vegetation decrease and degradation both from continuous vegetation to scattered or seasonal
vegetation and from scattered or seasonal vegetation to bare land.
46
4. Discussion
4.1 Sentinel-2 data for Random Forest land cover classification
One aim of this study was to explore the suitability of free open access Sentinel-2 data (ESA,
2019) and remotely collected reference data to classify land cover on Rusinga Island, Kenya
using the Random Forest classifier (Breiman, 2001). It is generally accepted among researchers
that in remote sensing applications and land management overall classification accuracies (OA)
should reach at least 85% (Kussul, Lavreniuk, Skakun, & Shelestov, 2017; Otukei & Blaschke,
2010; Willis, 2015; Wulder, Franklin, White, Linke, & Magnussen, 2006), as initially suggested
by Anderson (1971) when developing a standardised framework for land use and land cover
classification (Anderson et al., 1976). However, this benchmark is also questioned as evaluating
model performance highly depends on the purpose and the nature of the model (Laba et al.,
2002; Wulder et al., 2006). This is discussed in more detail in the sections below. In this study,
a feature selection process was undertaken to reduce the number of features to an optimum and
select those with the highest contribution to the model accuracy (Chapter 2.5.2). It was found
that the feature selection improved model performance. Subsequently, all models considered
for further comparisons have undergone the feature selection process. Generally, most of the
models examined in this study produce classification accuracies above the 85% benchmark with
OAs ranging between 83.51% and 95.48% ( = 0.7859 – 0.9380) in the internal out-of-bag
(OOB) assessment and 81.37% and 96.89% ( = 0.7490 – 0.9564) in the accuracy assessment
based on the independent test dataset. Only three out of the 24 tested models reach an OA of
less than 85% (Table 6 and Table 8).
4.1.1 Single-date vs. multi-temporal datasets
The models were compared by their different predictor datasets. Some of them were comprised
of single-date data, while others contained multi-temporal data. The multi-temporal models
clearly reached higher classification accuracies than the single-date models (Table 6 and Table
8). This result was expected as including remote sensing data collected at different points in
time in the predictor datasets broadens the variability of spectral signatures on which the model
is being trained. Thus, the greater the variability within the training data, the better the model
performs at correctly classifying image pixels which deviate from the most common spectral
signatures of the classes. This finding is in line with previous research, which has highlighted
the benefits of using multi-temporal datasets for remote sensing-based image classification
(Esch, Metz, Marconcini, & Keil, 2014; Guo, Price, & Stiles, 2003; Karlson et al., 2016; Senf,
Leitão, Pflugmacher, van der Linden, & Hostert, 2015; Tigges, Lakes, & Hostert, 2013). Vuolo,
Richter, and Atzberger (2011) found an increasing number of features included in the
classification model resulting in higher accuracies. However, in this study the trend only held
up to a certain number of features. Some features with low explanatory power do not contribute
to the classification accuracy and therefore can be eliminated to optimise model performance
(Hastie et al., 2009). The feature selection results of this study, which indicated higher
performance of the models after feature selection, thus including less features, seem to
contradict the assumption of more features resulting in higher accuracies. It is however
complementary. The more features were initially included, the greater are the chances of those
47
features with the highest explanatory power being selected in the feature selection process. The
results of the accuracy assessment in this study only partly confirm the hypothesis that more
features increase model accuracy. Table 12 (adopted from Table 8 in the results section)
summarises the classification accuracy and adds the number of sensing dates and the number
of features included in each dataset after feature selection. It clearly shows that in both cases
(with five classes and four classes) the model with all 25 scenes reached the highest accuracy
and also included the highest number of features (and scenes). However, the results show no
clear correlation between the model accuracy and the number of features or number of scenes.
Hence, the results indicate that other factors than the number of features or scenes included in
the predictor dataset, for example the season in which the data was sensed and the classification
scheme, play relevant roles in the classification accuracy of the models. These factors are
discussed in the following sections.
Table 12: Adopted from Table 8 in results chapter: accuracy assessment results of the 5-class
models (upper) and 4-class models (lower) after feature selection sorted by Kappa. Number of
scenes indicates the number of different sensing dates included in the predictor datasets after
feature selection. Number of features indicates the number of features included in the predictor
dataset after feature selection.
Predictor datasets 5-class models Kappa OA No. of scenes No. of features
Single-date dry season 0.7490 81.37% 1 11
Single-date wet season 0.7501 81.37% 1 2
Multi-temporal dry season 0.7531 81.37% 6 11
Multi-temporal combined seasons 0.7989 85.09% 2 9
Multi-temporal wet season 0.8233 86.96% 4 35
Multi-temporal 12 random scenes* 0.8415 88.20% 7 13
Multi-temporal 6 random scenes* 0.8471 88.61% 6 29
Multi-temporal all scenes 0.8666 90.06% 16 47
* Average of three iterations
Predictor datasets 4-class models Kappa OA No. of scenes No. of features
Single-date dry season 0.8530 89.44% 1 9
Multi-temporal wet season 0.8771 91.30% 2 5
Single-date wet season 0.8786 91.30% 1 16
Multi-temporal dry season 0.8872 91.93% 6 20
Multi-temporal combined seasons 0.9132 93.79% 2 15
Multi-temporal 6 random scenes* 0.9301 95.03% 4 13
Multi-temporal 12 random scenes* 0.9419 95.86% 10 34
Multi-temporal all scenes 0.9564 96.89% 17 46
* Average of three iterations
4.1.2 Classification scheme
The reason for distinguishing two different classification schemes throughout this study is the
rather high misclassification rate between the grassland and the cropland classes in the 5-class
scheme as described in Chapter 2.3. The challenge to correctly classify grassland and cropland
48
can be attributed to the similar spectral signatures of the two classes (Figure 4) and has been
discussed in previous research (Esch et al., 2014; Yin, Pflugmacher, Kennedy, Sulla-Menashe,
& Hostert, 2014). Nevertheless, the predictor datasets were tested with both five and four
classes, because even if the classification accuracy of the 5-class models is lower, the results
might still be useful for application purposes as the 5-class scheme provides a finer distinction
of land cover. Table 6 and Table 8 clearly show the higher prediction accuracy of the 4-class
models compared to the 5-class models. For the best performing models, the OAs when
discriminating four classes are around seven percentage points higher than those of the same
models discriminating five classes. The higher performance of the models discriminating fewer
classes is not surprising regarding the previously described similarity of spectral signatures of
the grassland and cropland classes. However, misclassification in the 5-class model does not
only occur between the grassland and cropland classes. The confusion matrices of the 5-class
model in Table 7 and Table 9 show that a rather large share of the misclassified grassland and
cropland samples, 38-40% in the internal validation and 44-75% in the external accuracy
assessment, were misclassified as either woodland or bare land. While grassland is more
commonly confused for bare land, cropland is more commonly misclassified as woodland. This
is assumed to result from the reference data, which includes some cropland samples that most
probably are perennial crops or fruit trees. Moreover, the distinction between grassland and
bare land in the reference data collection process was a challenge as the visual interpretation
was mainly performed on the very high resolution (VHR) image, which was taken in the dry
season where much of the grassland is dry and brown which makes it difficult to distinguish it
from bare soil. The use of the VHR image from August for verification reduced this limitation
to some extent. Another factor for the higher accuracy performance of the 4-class models is
simply the number of classes. The fewer classes to distinguish, the higher can the prediction
accuracy be expected. This is because the classification criteria become more distinct and, at
least in this study, the total amount of training samples is divided among fewer classes, resulting
in more training samples per class, which positively contributes to the classification accuracy.
That the number of classes to discriminate between affect the classification accuracy has also
been observed by Esch et al. (2014), whose classification model OA reached 90% when
discriminating only grassland and cropland, 86% when regarding five classes, and only 70%
when regarding 11 different classes. It is suggested here that the 5-class model will be more
useful for application purposes, assuming that the cropland – grassland discrimination provides
acceptable results that can be worked with on the ground. Being able to work with three
different vegetation classes, woodland, grassland and cropland, could benefit the vegetation
restoration and monitoring processs by differentiating the type and purpose of the vegetation.
While cropland areas would be useful to promote sustainable farming practices such as
perennial crops or agroforestry, the focus of restoring vegetation cover on grass- or woodland
(including shrubs) is stabilisation and subsequently nurturing and moisturinsing the soil.
However, in the 5-class model, the user’s accuracy (UA) of the grassland class with 68% is
particularly low while the remaining three classes all reach more than 92% (Table 9). In the 4-
class model the UAs all reach more than 94%. Thus, if the classification outputs prove
inconsistent with ground truth, it might be better to consider the 4-class model. Thus, as the
classification outputs were not validated in situ, the change detection was performed using a 4-
class model.
49
4.1.3 Seasonal differences and phenology
Determining the optimal timing for data acquisition is a re-occurring issue in remote sensing-
based land cover classification as it depends on the local ground circumstances, climate, data
availability and the purpose of the classification. For example, while Karlson et al. (2016) found
dry season imagery, thus post growing season, resulting in higher classification accuracies for
tree species discrimination in Burkina Faso, Tigges et al. (2013) found growing season imagery
slightly more accurate for the same purpose in Berlin, Germany. Regarding the results of the
single-date models in this study, which concerns land cover classification rather than tree
species discrimination, the growing (wet) season models reach better classification
performance. While for the single-date models the performance of the wet and dry season differ
only slightly, the multi-temporal wet season model with five classes reaches a much higher OA
than the dry season one. However, the difference is much smaller for the 4-class models
(Table 6 and Table 8). This indicates that the discrimination between grassland and cropland,
the class discrimination that was dissolved in the 4-class scheme, is dependent on the
phenology. While the vegetation is greener during the wet season, grassland and cropland might
be better distinguishable. Subsequently, it would be worth analysing the confusion matrices and
comparing the misclassification between grassland, cropland and bare land during the wet
season and the dry season. A land cover classification study from southern Portugal (Senf et
al., 2015) found different optimal timing of image acquisition for different land cover classes.
The authors suggest that in southern Portugal March, which represents the end of the rainy
season and the peak of vegetation productivity, is the best time for discriminating between
cropland and grassland. For Rusinga, the end of the rainy season and the peak of vegetation
productivity would be late May and June and, slightly less pronounced, late November and
early December. Besides considering the seasonal models, the importance of the dates in this
study was assessed on the multi-temporal models with all scenes (Table 11 and Figure 8). The
analysis shows that of the four most important dates two fall into the major wet season
(13 March 2017 and 22 May 2017) and two into the minor wet season (18 and 28 November
2017). Thus, these findings are in line with the suggestion by Senf et al. (2015). Nevertheless,
the seasonal models clearly produce lower accuracies than the models covering a wider time
period spread over the year (Table 8). For example, the multi-temporal models with combined
seasons, which include only the features from the single-date dry season and the single-date
wet season (i.e., two scenes), perform better than the multi-temporal single season models,
which include 4 and 6 scenes with an exception of the 5-class multi-temporal wet season model,
which scores higher. This indicates that not only the number of features included in the predictor
dataset influences the prediction accuracy, but also the time of the year when the data were
sensed. The models with input data covering a larger timeframe and different times of the year,
result in higher accuracies. This finding is supported by previous studies (Esch et al., 2014;
Karlson et al., 2016; Senf et al., 2015; Tigges et al., 2013).
4.1.4 Vegetation indices
One of the sub-questions of this study was how vegetation indices contribute to the
classification accuracy compared to the spectral bands. Many land cover classification studies
include one or more vegetation index measures in their data to account for phenological changes
and to improve identification and distinction of vegetation cover (e.g. Eckert et al., 2015;
50
Lunetta et al., 2006; Motohka et al., 2010; Nyberg et al., 2015; Viña et al., 2011; Wibowo et
al., 2012; Yang et al., 2015). Therefore, it was expected that vegetation indices contribute
additionally to the classification performance. To compare the contribution (i.e. importance) of
the seven different vegetation indices and the spectral bands used in this study, a feature
importance ranking was performed, which proved the high contribution of the vegetation
indices to the classification accuracy (Figure 7 and Table 10). The blue band ranks particularly
high, while the red band, the NDVI, the GNDVI and the SAVI follow. The high importance of
the blue band could be explained by the large amount water surrounding the study area.
Verifying this hypothesis by repeating the analysis with only the land area of Rusinga was,
however, not included in this thesis. The importance of the NDVI could be expected due to its
popularity as vegetation proxy. The relatively high importance of the SAVI was to be expected
as it corrects for soil reflectance on vegetation reflectance, which is particularly useful in
scattered or patchy vegetated areas (Huete, 1988) such as Rusinga Island. The high importance
of the GNDVI is somewhat more surprising as it was developed to increase the sensitivity to
dense vegetation (Gitelson et al., 1996), which is not particularly common on Rusinga Island.
The rather low importance of the two NDIIs was to be expected as they are sensitive to leaf
water content (Kimes et al., 1981), while the vegetation on Rusinga Island mainly consists of
scattered shrubs and grasses. Surprising was the low importance of the IRECI, which was
particularly developed for Sentinel-2 data to incorporate the newly included red-edge bands
(Frampton et al., 2013). However, Frampton et al. (2013) note the need for further validation
of their index. In the same line of argument, it is notable that the VRE as well as the NIR and
narrow NIR spectral bands all show relatively low importance in this study (Table 10).
Nevertheless, it could be worth testing to try variations of the vegetation indices for example
substituting the NIR band for the narrow NIR band, as is was designed to reduce the water
vapour disturbance from which the NIR band suffered in earlier Landsat missions (ESA, 2015).
Jin et al. (2013) developed the Multi-Index Integrated Change Analysis method, which was
developed for the USGS National Land Cover Database. Besides the NDVI as vegetation index
it includes the Normalised Burn Ratio, the Change Vector, and the Relative Change Vector
Maximum, which complement each other being sensitive to different kinds of changes.
Accordingly, it could have been worth to diversify the features in this study more and to pay
more attention to the complementation of the different input features. Nevertheless, this study
proves that including vegetation indices in the training data benefits classification performance.
Moreover, it confirms the suitability of the NDVI, but also shows that other vegetation indices
might be worth to consider.
4.1.5 Rusinga land cover maps
Even if the real suitability of the produced land cover maps needs to be confirmed locally on
the ground, the comparison with existing land cover maps (Figure 11) shows that this study
improved the quality of land cover maps for Rusinga Island. The comparison clearly shows the
improvement of remote sensing data over the years and the benefit of higher spatial resolution.
While the S2 Land Cover Map of Africa 2016 from the European Space Agency’s Climate
Change Initiative (ESA CCI Land Cover Project, 2017) was derived from data from same
satellite, the maps of this study seem to be more accurate. It has to be noted that the ESA CCI
map is a prototype and has a spatial resolution of 20 m. Nevertheless, the higher accuracy of
51
this study’s map is not surprising as the training data and the classification models in this study
were developed particularly for the study area, while the existing 2016 Sentinel-2 land cover
map was taken from an Africa-wide classification output, which cannot include as much local
detail in the model development. This shows the benefit of developing a classification model
particularly for the concerned area rather than on a large scale. However, land cover
classification maps of large areas would be valuable and there are several projects currently
working on automated land cover classification of entire continents, such as the ESA CCI Land
Cover Project (ESA CCI Land Cover Project, 2019) that developed the cited Africa map and
another ESA funded project led by the Space Research Centre of the Polish Academy of
Sciences (ESA, n.d.) that has produced a 10 m land cover map of entire Europe and aims for a
global scale.
4.2 Post-classification change detection for land cover change and
vegetation monitoring
Even if post-classification change detection is a convenient and often-used approach for
detecting and mapping land cover changes, post-classification methods are entirely dependent
on the input maps. Thus, any error in any of the maps is directly transferred to the change map
and subsequently makes it very dependent on the quality of the classification process (Coppin,
Jonckheere, Nackaerts, Muys, & Lambin, 2004; Tewkesbury et al., 2015). Therefore, it is
difficult to assess the quality of the change detection without being entirely certain of the
classification quality. Moreover, change detection validation is particularly challenging for two
reasons: Firstly, it requires validation data from the earlier and the later point in time of the
change period. Secondly, changes typically occur only over a rather small area compared to the
overall classified area.
The change maps produced in this study seem to achieve acceptable results, although the actual
quality should be verified on the ground. Several studies have utilised Sentinel-2 data and noted
their suitability for various types of land cover classification and change detection (Frampton
et al., 2013; Immitzer et al., 2016; Inglada et al., 2015; Pesaresi et al., 2016). Many of these
classifications are much more complex than the one in this study, discriminating not only
between different types of land cover, but between different species, of trees or crops for
example. Accordingly, it should be possible to achieve better classification accuracies, which
represent the ground truth more precisely. The major challenge for the classification in this
study is the questionable quality of the reference data, which was collected remotely and over
a limited period of time. Although potential intra-annual changes were accounted for by
consulting VHR satellite images sensed at different times of the year (August 2016 and
February 2017) and different scenes temporally spread over the time period from March 2017
to February 2018, the reliability of the ‘ground truth’, which the reference data should represent,
remains questionable as only satellite-based data and visual image interpretation was used.
Therefore, in-situ verification of the change detection is considered necessary. Moreover, to
perform a reliable change analysis covering a larger time span updated reference data is
necessary. With the methods used in this study, each classification model was developed based
on the corresponding predictor dataset and the reference samples. To avoid re-examining or
collecting new reference data, a method would be needed that allows a model to learn on a
52
given training dataset and predict any other input dataset accordingly. To complement the
change detection analysis a change matrix indicating the percentages of changes from one class
to another would be beneficial. However, this would make most sense when only the land area
of the island, including a small shoreline area, would be considered to reduce the amount of
water, as mentioned in relation to the feature importance (Chapter 4.1.4)
4.3. Concluding remarks and future outlook
This study found Sentinel-2 data and Random Forest useful to classify and map land cover on
Rusinga Island. As VHR satellite data can be costly and therefore financially unfeasible for
many applications, especially in poorer regions, data and software used in this study are
available free of charge. The trade-off between spatial and temporal resolution on the one hand
and the cost of data acquisition on the other hand is often referred to as a limitation of remote
sensing (Karlson et al., 2016; Willis, 2015). However, the rapid developments in remote sensing
technology reduce this trade-off as data becomes available at higher resolution and lower cost.
With a spatial resolution of up to 10 m for multi-spectral bands and a temporal resolution of
five days the freely available Sentinel-2 data, which have been awaited by the scientific
community, provide substantial improvements compared to for example the Landsat-8 satellite
launched in May 2013, which provides multi-spectral data at 30 m resolution and a
panchromatic band at 15 m resolution (Karlson, 2015). Sentinel-2’s high temporal resolution
also reduces issues with limited data availability caused by cloud or haze contamination, which
is a prominent issue in remote sensing applications (Esch et al., 2014; Karlson, 2015; Willis,
2015). Hence, using multi-temporal data for land cover classification, which has been proven
to provide better classification results than single-date data, became more feasible with the
improvements of Sentinel-2. Many scientists highlight the benefits of Sentinel-2 data for Earth
Observation and the twin-satellites undoubtfully set a new status quo in the field of freely
available high spatial and temporal resolution satellite data (Frampton et al., 2013; Immitzer et
al., 2016; Inglada et al., 2015; Karlson, 2015; Pesaresi et al., 2016). Despite no doubt about the
suitability of Sentinel-2 data for terrestrial monitoring, efforts are being made to evaluate the
usefulness and performance of combined data sources such as Sentinel-2 and Landsat 8 (Inglada
et al., 2015; Senf et al., 2015; Zhu & Woodcock, 2014). Moreover, additional input features
such as LiDAR data and digital elevation models could improve the classification.
Random Forest is a widely used and accepted classifier not only in the field of land use and
land cover change and performs well in this study. The Random Forest classifier has also been
found most suitable in an ESA-funded project aiming to automatically classify global land
cover using Sentinel-2 data (ESA CCI Land Cover Project, 2019). Nevertheless, many studies
compare different classification methods different specific purposes within the same field and
find several suitable classifiers among which Random Forest, Support Vector Machine and
Neural Networks (Duro et al., 2012; Lu & Weng, 2007; Otukei & Blaschke, 2010). Moreover,
with the increasing spatial resolution of freely available satellite data a scientific discussion has
started about the benefits of object-based classification compared to pixel-based classification
(Duro et al., 2012; Hussain et al., 2013; Immitzer et al., 2012; Immitzer et al., 2016; Karlson,
2015; Tewkesbury et al., 2015; Weih & Riggan, 2010; Yu, Zhou, Qian, & Yan, 2016).
53
The biggest challenge and limitation in this study is the remotely collected reference data.
Visual image interpretation, which was used to collect the reference data for this study, requires
experience and knowledge of the objects on the ground (Albertz, 2009), which could not be
assured in this study. If it could be assured that the classes in the training and test data were
allocated correctly, the models work quite well, however, any errors are drawn into the
classification and result in undetectable errors. Therefore, at least a sample test dataset should
be collected in situ or a random sample of the reference data should be verified on the ground.
Additionally, other existing land cover maps could be included to inform the remote reference
data collection. However, the availability and quality of existing maps cannot be guaranteed.
The change detection produced acceptable results, however, their usefulness for vegetation
monitoring is questionable. To allow for the detection of detailed vegetation changes, the
classification quality should be improved for example by including in situ reference data. Once
the reference data is improved, using a finer classification scheme could become possible for
example including a differentiation between trees, shrubs, grasses and crops as well as between
bare land and built-up areas. Additionally, to achieve more useful change detection results the
analysed time frame should be longer. However, for the ongoing vegetation restoration on the
island the maps produced in this study could prove useful for detecting eroded areas to support
the revegetation planning. To support the identification of areas at risk of erosion, the
development of an erosion model for example based on the Revised Universal Soil Loss
Equation (Renard, Foster, Weesies, McCool, & Yoder, 1997) would be beneficial. The results
of this study will be used to develop a tool for land cover classification and change detection to
be used by the organisations working on Rusinga Island to restore vegetation cover. The tool is
expected to support planning and monitoring of the planting activities.
Acknowledgements
I would like to express my gratitude to my supervisors Martin Karlson at the Department for
Thematic Studies – Environmental Change at Linköping University and Markus Immitzer at
the Department of Landscape, Spatial and Infrastructure Sciences – Geomatics at BOKU
University of Natural Resources and Life Sciences, Vienna. Thank you for your support, your
invaluable feedback and your patience. I also want to thank Sebastian Böck for supporting me
with data acquisition and programming. Further, I want to thank Bernhard Wagenknecht, the
founder of Books for Trees, for all the background information, stories and pictures that helped
me gaining an impression of Rusinga Island without having been there. Thank you for initiating
the idea for this thesis together with Herbert Formayer at BOKU University. Moreover, I
express my thanks to Isabella Ostovary and Annika Neid, who have been on Rusinga for Books
for Trees and shared their impressions, knowledge, stories and pictures with me. Thank you
also, Evans Odula, founder of Badilisha, for providing me with background information.
Finally, I would like to express my gratitude to my friends and family, also my partner’s family,
who supported me whenever possible. Special gratitude goes to my partner Felix, who endured
this thesis process with me, always had a motivating attitude and supported me with technical
and programming issues.
54
References
Albertz, J. (2009). Einführung in die Fernerkundung. Grundlagen der Interpretation von Luft-
und Satellitenbildern (4th ed.). Darmstadt, Gernamy: Wissenschaftliche
Buchgesellschaft.
Anderson, J. R. (1971). Land Use Classification Schemes used in selected recent geographic
applications of remote sensing. Photogrammetric Engineering & Remote Sensing,
April, 379-387.
Anderson, J. R., Hardy, E. E., Roach, J. T., & Witmer, R. E. (1976). A Land Use and Land
Cover Classification System for Use with Remote Sensor Data (Vol. 964).
Washington, DC, USA United States Government Printing Office.
Andrews, P. (1973). Vegetation of Rusinga Island. Journal of the East Africa Natural History
Society and National Museum, 142.
Badilisha. (n.d.). Rusinga Island Community. Retrieved from
http://www.badilishapermaculture.org/people/. Accessed: 31 January 2019
Bastola, S., Dialynas, Y. G., Bras, R. L., Noto, L. V., & Istanbulluoglu, E. (2018). The role of
vegetation on gully erosion stabilization at a severely degraded landscape: A case
study from Calhoun Experimental Critical Zone Observatory. Geomorphology, 308,
25-39. doi:10.1016/j.geomorph.2017.12.032
Bivand, R., Keitt, T., Rowlingson, B., Pebesma, E., Sumner, M., Hijmans, R. J., . . . Rundel,
C. (2019). Bindings for the 'Geospatial' Data Abstraction Library. Retrieved from
https://cran.r-project.org/web/packages/rgdal/rgdal.pdf. Accessed: 07 September 2019
Borrelli, P., Robinson, D. A., Fleischer, L. R., Lugato, E., Ballabio, C., Alewell, C., . . .
Panagos, P. (2017). An assessment of the global impact of 21st century land use
change on soil erosion. Nature Communications, 8(1). doi:10.1038/s41467-017-
02142-7
Breiman, L. (2001). Random Forests. Machine Learning, 45(1), 5-32.
doi:10.1023/A:1010933404324
Breiman, L., & Cutler, A. (2004). Random forests - classification description. Random
Forests. Retrieved from
https://www.stat.berkeley.edu/~breiman/RandomForests/cc_home.htm. Accessed: 14
December 2018
Breiman, L., Cutler, A., Liaw, A., & Wiener, M. (2018). Breiman and Cutler's Random
Forests for Classification and Regression. 29. Retrieved from https://cran.r-
project.org/web/packages/randomForest/randomForest.pdf. Accessed: 07 September
2019
Byrne, L. (2013). Helping small farmers help themselves, on Rusinga Island, Lake Victoria,
Kenya. Retrieved from https://permaculturenews.org/2013/01/29/helping-small-
farmers-help-themselves-on-rusinga-island-lake-victoria-kenya/ Accessed: 22
November 2019
Cebecauer, T., & Hofierka, J. (2008). The consequences of land-cover changes on soil erosion
distribution in Slovakia. Geomorphology, 98(3-4), 187-198.
doi:10.1016/j.geomorph.2006.12.035
Clerc, S., & Team, M. (2018). S2 MPC. L1C Data Quality Report (28). Retrieved from
https://earth.esa.int/documents/247904/3897638/Sentinel-
2_L1C_Data_Quality_Report/9699703d-556e-407a-a955-65c0c82dfcb5?version=1.2.
Accessed: 15 December 2018
Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and
Psychological Measurement(1), 37-46.
55
Cohen, W. B., Yang, Z., Healey, S. P., Kennedy, R. E., & Gorelick, N. (2018). A LandTrendr
multispectral ensemble for forest disturbance detection. Remote Sensing of
Environment, 205, 131-140. doi:https://doi.org/10.1016/j.rse.2017.11.015
Congalton, R. G. (1991). A review assessing the accuracy of classifications of remotely
sensed data. Remote Sensing of Environment, 37, 12.
Congalton, R. G., Gu, J., Yadav, K., Thenkabail, P., & Ozdogan, M. (2014). Global Land
Cover Mapping: A Review and Uncertainty Analysis. Remote Sensing, 6(12), 12070-
12093. doi:10.3390/rs61212070
Coppin, P., Jonckheere, I., Nackaerts, K., Muys, B., & Lambin, E. (2004). Digital change
detection methods in ecosystem monitoring: a review. International Journal of
Remote Sensing, 25(9), 1565-1596. doi:10.1080/0143116031000101675
Di Gregorio, A. (2005). Land Cover Classification System (LCCS), version 2: Classification
Concepts and User Manual. Retrieved from
http://www.fao.org/docrep/008/y7220e/y7220e02.htm#TopOfPage. Accessed: 08
October 2018
DigitalGlobe. (2016). World Imagery [Satellite Imagery]. Retrieved from:
https://www.arcgis.com/home/webmap/viewer.html?webmap=c1c2090ed8594e01931
94b750d0d5f83,
https://www.arcgis.com/home/item.html?id=10df2279f9684e4a9f6a7f08febac2a9.
Accessed: 08 May 2018
DigitalGlobe. (2017). WorldView-4. Retrieved from
http://worldview4.digitalglobe.com/#/main. Accessed: 27 March 2019
Duro, D. C., Franklin, S. E., & Dubé, M. G. (2012). A comparison of pixel-based and object-
based image analysis with selected machine learning algorithms for the classification
of agricultural landscapes using SPOT-5 HRG imagery. Remote Sensing of
Environment, 118, 259-272. doi:10.1016/j.rse.2011.11.020
Eckert, S., Hüsler, F., Liniger, H., & Hodel, E. (2015). Trend analysis of MODIS NDVI time
series for detecting land degradation and regeneration in Mongolia. Journal of Arid
Environments, 113, 16-28. doi:10.1016/j.jaridenv.2014.09.001
ESA. (2015). Sentinel-2 User Handbook. Retrieved from
https://sentinel.esa.int/documents/247904/685211/Sentinel-2_User_Handbook.
Accessed: 15 November 2018
ESA. (2018a). Naming Convention. Retrieved from https://sentinel.esa.int/web/sentinel/user-
guides/sentinel-2-msi/naming-convention. Accessed: 15 November 2018
ESA. (2018b). News. Upcoming Sentinel-2 Level-2A product evolution. Retrieved from
https://earth.esa.int/web/sentinel/missions/sentinel-2/news/-/article/upcoming-sentinel-
2-level-2a-product-evolution. Accessed: 15 November 2018
ESA. (2018c). Sentinel Application Platform (SNAP): European Space Agency. Accessed:
ESA. (2018d). User Guides. Level-2A. Retrieved from
https://sentinel.esa.int/web/sentinel/user-guides/sentinel-2-msi/product-types/level-2a.
Accessed: 15 November 2018
ESA. (2019). Copernicus Open Access Hub [Data Base]. Retrieved from
https://scihub.copernicus.eu/dhus/#/home. Accessed: 25 August 2019
ESA. (n.d.). Global Land Cover - Sentinel-2. Retrieved from http://s2glc.cbk.waw.pl/.
Accessed: 05 December 2019
ESA, & Université Catholique de Louvain. (2010). GlobCover 2009 [Map]. Retrieved from:
http://due.esrin.esa.int/page_globcover.php. Accessed: 02 August 2019
ESA CCI Land Cover Project. (2015). Land Cover Map 2015 [Map]. Retrieved from
http://maps.elie.ucl.ac.be/CCI/viewer/index.php. Accessed: 02 August 2019
56
ESA CCI Land Cover Project. (2017). CCI Land Cover - S2 Prototype Land Cover 20m Map
of Africa 2016 [Map]. Retrieved from:
http://2016africalandcover20m.esrin.esa.int/download.php. Accessed: 05 December
2019
ESA CCI Land Cover Project. (2019). ESA CCI Land Cover - S2 Prototype Land Cover 20 m
Map of Africa 2016. Retrieved from http://2016africalandcover20m.esrin.esa.int/.
Accessed: 05 December 2019
Esch, T., Metz, A., Marconcini, M., & Keil, M. (2014). Combined use of multi-seasonal high
and medium resolution satellite imagery for parcel-related mapping of cropland and
grassland. International Journal of Applied Earth Observation and Geoinformation,
28, 230-237. doi:10.1016/j.jag.2013.12.007
Eurostat. (2018). LUCAS - Land use and land cover survey. Retrieved from
https://ec.europa.eu/eurostat/web/lucas/overview. Accessed: 13 August 2019
FAO. (2013). World Livestock 2013 - Changing disease landscapes. Retrieved from
http://www.fao.org/3/i3440e/i3440e.pdf. Accessed: 04 March 2019
FAO. (2014). Perennial Crops for Food Security. Proceedings of the FAO Expert Workshop.
28-30 August, 2013, Rome, Italy. Retrieved from http://www.fao.org/3/a-i3495e.pdf.
Accessed: 09 August 2019
FAO. (2015). Agroforestry. Definition. Retrieved from
http://www.fao.org/forestry/agroforestry/80338/en/. Accessed: 09 August 2019
FAO. (2016a). Conservation Agriculture. Retrieved from http://www.fao.org/3/a-i6169e.pdf.
Accessed: 09 August 2019
FAO. (2016b). Global Forest Resources Assessment 2015. How are the world’s forests
changing? Retrieved from http://www.fao.org/3/a-i4793e.pdf. Accessed: 04 March
2019
FAO. (2016c). The State of Food and Agriculture. Climate Change, Agriculture and Food
Security. Retrieved from http://www.fao.org/3/a-i6030e.pdf. Accessed: 06 September
2019
FAO, & ITPS. (2015). Status of the World's Soil Resources (Main Report). Retrieved from
http://www.fao.org/3/a-i5199e.pdf. Accessed: 01 March 2019
Fensholt, R., & Proud, S. R. (2012). Evaluation of Earth Observation based global long term
vegetation trends — Comparing GIMMS and MODIS global NDVI time series.
Remote Sensing of Environment, 119, 131-147. doi:10.1016/j.rse.2011.12.015
Ferguson, R. S., & Lovell, S. T. (2013). Permaculture for agroecology: design, movement,
practice, and worldview. A review. Agronomy for Sustainable Development, 34(2),
251-274. doi:10.1007/s13593-013-0181-6
Foody, G. M. (2004). Thematic Map Comparison: Evaluating the Statistical Significance of
Differences in Classification Accuracy. Photogrammetric Engineering & Remote
Sensing, 70(5), 627-633.
Frampton, W. J., Dash, J., Watmough, G., & Milton, E. J. (2013). Evaluating the capabilities
of Sentinel-2 for quantitative estimation of biophysical variables in vegetation. ISPRS
Journal of Photogrammetry and Remote Sensing, 82, 83-92.
doi:10.1016/j.isprsjprs.2013.04.007
Gagolewski, M., Tartanus, B., contributors (stringi source code), IBM and other contributors
(ICU4C source code), & Unicode Inc. (Unicode Character Database). (2019).
Character String Processing Facilities. Retrieved from https://cran.r-
project.org/web/packages/stringi/stringi.pdf. Accessed: 07 September 2019
Gandhi, G. M., Parthiban, S., Thummalu, N., & Christy, A. (2015). Ndvi: Vegetation Change
Detection Using Remote Sensing and Gis – A Case Study of Vellore District.
Procedia Computer Science, 57, 1199-1210. doi:10.1016/j.procs.2015.07.415
57
Gitelson, A. A., Kaufman, Y. J., & Merzlyak, M. N. (1996). Use of a green channel in remote
sensing of global vegetation from EOS-MODIS. Remote Sensing of Environment,
58(3), 289-298. doi:10.1016/S0034-4257(96)00072-7
Gomez, B., Banbury, K., Marden, M., Trustrum, N. A., Peacock, D. H., & Hoskin, P. J.
(2003). Gully erosion and sediment production: Te Weraroa Stream, New Zealand.
Water Resources Research, 39(7). doi:10.1029/2002WR001342
Google. (2018). Google Earth (Version 7.3.1.4507): Google. Accessed: 2018-05-15
Greenberg, J. A., & Mattiuzzi, M. (2018). Wrappers for the Geospatial Data Abstraction
Library (GDAL) Utilities: RDocumentation. Retrieved from
https://www.rdocumentation.org/packages/gdalUtils/versions/2.0.1.14. Accessed: 07
September 2019
Guo, X., Price, K. P., & Stiles, J. (2003). Grasslands Discriminant Analysis Using Landsat
TM Single and Multitemporal Data. Photogrammetric Engineering & Remote
Sensing, 69(11), 1255-1262.
Hansen, M. C., Potapov, P. V., Moore, R., Hancher, M., Turubanova, S. A., Tyukavina, A., . .
. Townshend, J. R. G. (2013). High-Resolution Global Maps of 21st-Century Forest
Cover Change [Map]. Retrieved from http://earthenginepartners.appspot.com/science-
2013-global-forest. Accessed: 07 September 2019
Hastie, T., Tibshirani, R., & Friedman, J. (2009). Elements of Statistical Learning. Data
Mining, Inference, and Prediction (Second Edition; Corrected 12th printing ed.).
Stanford, CA, USA: Springer Science+Business Media, LLC.
Herold, M., & Di Gregorio, A. (2012). Evaluating Land-Cover Legends Using the UN Land-
Cover Classification System. In Remote Sensing of Land Use and Land Cover.
Principles and Applications (pp. 425). Boca Raton, FL, USA: Tylor & Francis Group.
Hijmans, R. J., van Etten, J., Cheng, J., Sumner, M., Mattiuzzi, M., Greenberg, J. A., . . .
Wueest, R. (2019). Geographic Data Analysis and Modeling. Retrieved from
https://cran.r-project.org/web/packages/raster/raster.pdf. Accessed: 07 September
2019
Huang, C., Goward, S. N., Schleeweis, K., Thomas, N., Masek, J. G., & Zhu, Z. (2009).
Dynamics of national forests assessed using the Landsat record: Case studies in
eastern United States. Remote Sensing of Environment, 113(7), 1430-1442.
doi:10.1016/j.rse.2008.06.016
Huete, A. R. (1988). A soil-adjusted vegetation index (SAVI). Remote Sensing of
Environment, 25(3), 295-309. doi:10.1016/0034-4257(88)90106-X
Hussain, M., Chen, D., Cheng, A., Wei, H., & Stanley, D. (2013). Change detection from
remotely sensed images: From pixel-based to object-based approaches. ISPRS Journal
of Photogrammetry and Remote Sensing, 80, 91-106.
doi:10.1016/j.isprsjprs.2013.03.006
Immitzer, M. (2017). Mapping Tree Species, Forest Calamities and Growing Stock using
High Resolution Satellite Imagery - Possibilities and Limits. Vienna, Austria:
University of Natural Resources and Life Sciences.
Immitzer, M., Atzberger, C., & Koukal, T. (2012). Tree species classification with random
forest using very high spatial resolution 8-band WorldView-2 satellite data. Remote
Sensing, 4(9), 2661-2693.
Immitzer, M., Vuolo, F., & Atzberger, C. (2016). First Experience with Sentinel-2 Data for
Crop and Tree Species Classifications in Central Europe. Remote Sensing, 8(3).
doi:10.3390/rs8030166
Inglada, J., Arias, M., Tardy, B., Hagolle, O., Valero, S., Morin, D., . . . Koetz, B. (2015).
Assessment of an Operational System for Crop Type Map Production Using High
58
Temporal and Spatial Resolution Satellite Optical Imagery. Remote Sensing, 7(9),
12356-12379. doi:10.3390/rs70912356
IPCC. (2014). Climate Change 2014: Synthesis Report. Contribution of Working Groups I, II
and III to the Fifth Assessment Report of the Intergovernmental Panel on Climate
Change. Retrieved from https://www.ipcc.ch/report/ar5/syr/. Accessed: 05 January
2020
IPCC. (2019). IPCC Special Report on Climate Change, Desertification, Land Degradation,
Sustainable Land Management, Food Security, and Greenhous gas fluxes in
Terrestrial Ecosystems. Summary for Policymakers. Retrieved from
https://www.ipcc.ch/srccl/. Accessed: 13 January 2020
Islam, K., Jashimuddin, M., Nath, B., & Nath, T. K. (2018). Land use classification and
change detection by using multi-temporal remotely sensed imagery: The case of
Chunati wildlife sanctuary, Bangladesh. The Egyptian Journal of Remote Sensing and
Space Science, 21(1), 37-47. doi:10.1016/j.ejrs.2016.12.005
Jin, S., Yang, L., Danielson, P., Homer, C., Fry, J., & Xian, G. (2013). A comprehensive
change detection method for updating the National Land Cover Database to circa
2011. Remote Sensing of Environment, 132, 159-175. doi:10.1016/j.rse.2013.01.012
Jones, H. G., & Vaughan, R. A. (2010). Remote Sensing of Vegetation. Principles,
Techniques, and Application. New York, USA: Oxford University Press.
Kanyala Little Stars. (n.d.). Where We Are - Rusinga Island. Retrieved from
https://kanyalalittlestars.wordpress.com/where-we-are/. Accessed: 31 January 2019
Karlson, M. (2015). Remote Sensing of Woodland Structure and Composition in the Sudano-
Sahelian zone : Application of WorldView-2 and Landsat 8. Linköping, Sweden:
Linköping University.
Karlson, M., Ostwald, M., Reese, H., Bazié, H. R., & Tankoano, B. (2016). Assessing the
potential of multi-seasonal WorldView-2 imagery for mapping West African
agroforestry tree species. International Journal of Applied Earth Observation and
Geoinformation, 50, 80-88. doi:10.1016/j.jag.2016.03.004
Kaufman, Y. J., Wald, A. E., Remer, L. A., Gao, B.-C., Li, R.-R., & Flynn, L. (1997). The
MODIS 2.1-/spl mu/m channel-correlation with visible reflectance for use in remote
sensing of aerosol. IEEE transactions on Geoscience Remote Sensing, 35(5), 1286-
1298.
Kenya National Bureau of Statistics. (2010). The 2009 Kenya Population and Housing
Census. Volume 1A. Population Distribution by Administrative Units. Retrieved from
https://www.knbs.or.ke/publications/. Accessed: 10 November 2019
Kenya National Bureau of Statistics. (2012). 2009 Kenya Population and Housing Census.
Analytical Report on Population Dynamics. Retrieved from
https://www.knbs.or.ke/?page_id=3142. Accessed: 10 November 2019
Kenya National Bureau of Statistics. (2019). 2019 Kenya Population and Housing Census.
Population by County and Sub-County. Retrieved from
https://www.knbs.or.ke/?wpdmpro=2019-kenya-population-and-housing-census-
volume-i-population-by-county-and-sub-county. Accessed: 10 November 2019
Kimes, D. S., Markham, B. L., Tucker, C. J., & McMurtrey, J. E. (1981). Temporal
relationships between spectral response and agronomic variables of a corn canopy.
Remote Sensing of Environment, 11, 401-411. doi:10.1016/0034-4257(81)90037-7
Kuhn, M. (2008). Building Predictive Models in R Using the caret Package. Journal of
Statistical Software, 28(5), 26.
Kuhn, M., Contributions from Jed Wing, Weston, S., Williams, A., Keefer, C., Engelhardt,
A., . . . Hunt, T. (2019). Classification and Regression Training. Retrieved from
https://cran.r-project.org/web/packages/caret/caret.pdf. Accessed: 07 September 2019
59
Kuhn, M., & Johnson, K. (2013). Applied Predictive Modeling. New York, NY, USA:
Springer Science+Business Media.
Kussul, N., Lavreniuk, M., Skakun, S., & Shelestov, A. (2017). Deep Learning Classification
of Land Cover and Crop Types Using Remote Sensing Data. IEEE Geoscience and
Remote Sensing Letters, 14(5), 778-782. doi:10.1109/LGRS.2017.2681128
Laba, M., Gregory, S. K., Barden, J., Ogurcak, D., Hill, E., Fegraus, E., . . . DeGloria, S. D.
(2002). Conventional and fuzzy accuracy assessment of the New York Gap Analysis
Project land cover map. Remote Sensing of Environment, 81, 443-455.
Landsberg, F., & Henninger, N. (2007). Kenya GIS Data [Map]. Retrieved from:
https://www.wri.org/resources/data-sets/kenya-gis-data. Accessed: 07 September 2019
Langley, S. K., Cheshire, H. M., & Humes, K. S. (2001). A comparison of single date and
multitemporal satellite image classifications in a semi-arid grassland. Journal of Arid
Environments, 49(2), 401-411.
Leh, M., Bajwa, S., & Chaubey, I. (2013). Impact of Land Use Change on Erosion Risk: An
Integrated Remote Sensing, Geographic Information System and Modeling
Methodology. Land Degradation & Development, 24(5), 409-421.
doi:10.1002/ldr.1137
Lemon, J., Toews, M., Biancotto, E., Levy, O., Engelmann, R., Hecker, M., . . . Baral, D.
(2019). Various Plotting Functions: RDocumentation. Retrieved from
https://www.rdocumentation.org/packages/plotrix/versions/3.7-6. Accessed: 13 March
2019
Leutner, B., Horning, N., Schwalb-Willmann, J., & Hijmans, R. J. (2019). Tools for Remote
Sensing Data Analysis. Retrieved from https://cran.r-
project.org/web/packages/RStoolbox/RStoolbox.pdf. Accessed: 07 September 2019
Lewinski, S., Nowakowski, A., Rybicki, M., Kukawska, E., Malinowski, R. K., Michal
Krätzschmar, Elke Hofmann, Peter, Bielski, C., . . . Prieto, D. F. (2017). Towards
Automatic Global Land Cover Classification on Sentinel-2 Data. Paper presented at
the WorldCover 2017 Conference, Frascati, Italy.
http://s2glc.cbk.waw.pl/sites/default/files/inline-images/CBK_poster_A0_v2.pdf.
Liaw, A., & Wiener, M. (2002). Classification and Regression by randomForest. R News,
2(3), 18-22.
Lillesand, T. M., Kiefer, R. W., & Chipman, J. W. (2015). Remote sensing and image
interpretation (7th ed.). Hoboken, NJ, USA: John Wiley & Sons.
Löw, F., Knöfel, P., & Conrad, C. (2015). Analysis of uncertainty in multi-temporal object-
based classification. ISPRS Journal of Photogrammetry and Remote Sensing, 105, 91-
106.
Lu, D., & Weng, Q. (2007). A survey of image classification methods and techniques for
improving classification performance. International Journal of Remote Sensing, 28(5),
823-870. doi:10.1080/01431160600746456
Lunetta, R. S., Knight, J. F., Ediriwickrema, J., Lyon, J. G., & Worthy, L. D. (2006). Land-
cover change detection using multi-temporal MODIS NDVI data. Remote Sensing of
Environment, 105(2), 142-154. doi:10.1016/j.rse.2006.06.018
Mayaux, P., Bartholom, E., Cabral, A., Cherlet, M., Defourny, P., Di Gregorio, A., . . .
Vasconcelos, M. (2003). The Land Cover Map for Africa in the Year 2000 [Map].
Retrieved from: https://forobs.jrc.ec.europa.eu/products/glc2000/products.php.
Accessed: 02 August 2019
Montanarella, L., Pennock, D. J., McKenzie, N., Badraoui, M., Chude, V., Baptista, I., . . .
Vargas, R. (2016). World's soils are under threat. Soil, 2(1), 79-82. doi:10.5194/soil-2-
79-2016
60
Motohka, T., Nasahara, K. N., Oguma, H., & Tsuchida, S. (2010). Applicability of Green-Red
Vegetation Index for Remote Sensing of Vegetation Phenology. Remote Sensing,
2(10), 2369-2387. doi:10.3390/rs2102369
Müller-Wilm, U. (2018a). Sen2Cor Configuration and User Manual. Retrieved from
http://step.esa.int/thirdparties/sen2cor/2.5.5/docs/S2-PDGS-MPC-L2A-SUM-
V2.5.5_V2.pdf. Accessed: 30 October 2018
Müller-Wilm, U. (2018b). Sen2Cor Software Release Note. Retrieved from
http://step.esa.int/thirdparties/sen2cor/2.5.5/docs/S2-PDGS-MPC-L2A-SRN-
V2.5.5.pdf. Accessed: 30 October 2018
Mureithi, D. S. M., Mwagi, A., & Gruber, B. [Unpublished Work]. Field Report: Training on
Permaculture with emphasis on erosion control in Rusinga Island, Homabay County.
(2018). Training Report Manuscript. Rusinga Island, Kenya.
Myint, S. W., Gober, P., Brazel, A., Grossman-Clarke, S., & Weng, Q. (2011). Per-pixel vs.
object-based classification of urban land cover extraction using high spatial resolution
imagery. Remote Sensing of Environment, 115(5), 1145-1161.
doi:https://doi.org/10.1016/j.rse.2010.12.017
NASA, & METI. (2011). Routine ASTER Global Digital Elevation Model (Publication no.
10.5067/ASTER/ASTGTM.002). Retrieved from https://earthexplorer.usgs.gov/.
Accessed: 17 December 2018
NASA Applied Remote Sensing Training (Producer). (2018, 05 April 2019). Advanced
Webinar: Change Detection for Land Cover Mapping. [Webinar] Accessed: 05 April
2019
National Geomatics Center of China. (2010). Globeland30-2010 [Map]. Retrieved from
http://globallandcover.com/GLC30Download/index.aspx. Accessed: 07 September
2019
Niang, I., Ruppel, O. C., Abdrabo, M. A., Essel, A., Lennard, C., Padgham, J., & Urquhart, P.
(2014). Africa. In V. R. Barros, C. B. Field, D. J. Dokken, M. D. Mastrandrea, K. J.
Mach, T. E. Bilir, M. Chatterjee, K. L. Ebi, Y. O. Estrada, R. C. Genova, B. Girma, E.
S. Kissel, A. N. Levy, S. MacCracken, P. R. Mastrandrea, & L. L. White (Eds.),
Climate Change 2014: Impacts, Adaptation, and Vulnerability. Part B: Regional
Aspects. Contribution of Working Group II to the Fifth Assessment Report of the
Intergovernmental Panel on Climate Change Cambridge, UK & New York, NY,
USA: Cambridge University Press.
Nicholson, S. E. (2017). Climate and climatic variability of rainfall over eastern Africa.
Reviews of Geophysics, 55(3), 590-635. doi:10.1002/2016RG000544
Nyaga, M. [Unpublished Work]. Data collection on woody vegetation species diversity in
Rusinga Island, Homa Bay county, Kenya. (2018). Student Field Report. University of
Nairobi, Department of Land Resource Management and Agricultural Technology.
Nairobi, Kenya.
Nyberg, G., Knutsson, P., Ostwald, M., Öborn, I., Wredle, E., Otieno, D. J., . . . Malmer, A.
(2015). Enclosures in West Pokot, Kenya: Transforming land, livestock and
livelihoods in drylands. Pastoralism, 5(1). doi:10.1186/s13570-015-0044-7
Nyssen, J., Veyret‐Picot, M., Poesen, J., Moeyersons, J., Haile, M., Deckers, J., & Govers, G.
(2004). The effectiveness of loose rock check dams for gully control in Tigray,
northern Ethiopia. Soil Use and Management, 20(1), 55-64. doi:10.1111/j.1475-
2743.2004.tb00337.x
Okolla, L. [Unpublished Work]. Reconnaissance field visit to Rusinga Island on 2nd - 4th
May 2018. (2018). Student Field Report. University of Nairobi, Department of Land
Resource Management and Agricultural Technology. Nairobi, Kenya.
61
OpenStreetMap contributors. (2017). OpenStreetMap Data in Layered GIS Format. Retrieved
from https://download.geofabrik.de/africa/kenya.html. Accessed: 03 December 2018
Orange-Senqu River Commission. (2014). Rehabilitating rangelands for healthy headwaters:
Steps Basotho Communities are taking to reverse land degradation at the source of
the Orange-Senqu River. Retrieved from
https://iwlearn.net/resolveuid/5196ea7381784c7caba124942cab109c. Accessed: 05
September 2019
Otukei, J. R., & Blaschke, T. (2010). Land cover change assessment using decision trees,
support vector machines and maximum likelihood classification algorithms.
International Journal of Applied Earth Observation and Geoinformation, 12, S27-
S31. doi:10.1016/j.jag.2009.11.002
Pesaresi, M., Corbane, C., Julea, A., Florczyk, A., Syrris, V., & Soille, P. (2016). Assessment
of the Added-Value of Sentinel-2 for Detecting Built-up Areas. Remote Sensing,
8(299), 1-18. doi:10.3390/rs8040299
Pimentel, D., & Burgess, M. (2013). Soil Erosion Threatens Food Production. Agriculture,
3(3), 443-463. doi:10.3390/agriculture3030443
Pimentel, D., & Kounang, N. (1998). Ecology of Soil Erosion in Ecosystems. Ecosystems, 1,
416-426.
QGIS Development Team. (2019). QGIS. A Free and Open Source Geographic Information
System (Version 3.0 Girona, 3.2 Bonn, 3.4 Madeira (LTR), 3.6 Noosa): QGIS.
Accessed: 2018-05-02
R Core Team. (2018). R: A Language and Environment for Statistical Computing. Vienna,
Austria: R Foundation for Statistical Computing. Retrieved from https://www.R-
project.org. Accessed: 02 May 2018
R Core Team. (2019). The R Graphics Devices and Support for Colours and Fonts:
RDocumentation. Retrieved from
https://www.rdocumentation.org/packages/grDevices/versions/3.6.1. Accessed: 07
September 2019
Rawat, J. S., & Kumar, M. (2015). Monitoring land use/cover change using remote sensing
and GIS techniques: A case study of Hawalbagh block, district Almora, Uttarakhand,
India. The Egyptian Journal of Remote Sensing and Space Science, 18(1), 77-84.
doi:10.1016/j.ejrs.2015.02.002
Rees, W. G. (2012). Physical Principles of Remote Sensing (3rd ed.). New York, USA:
Cambridge University Press.
Renard, K. G., Foster, G. R., Weesies, G. A., McCool, D. K., & Yoder, D. C. (1997).
Predicting Soil Erosion by Water: A Guide to Conservation Planning with the Revised
Universial Soil Loss Equation (RUSLE) (Vol. No. 703). Washingston, DC, USA: U.S.
Department of Agriculture.
Rouse, J. W., Haas, R. H., Schell, J. A., Deering, D. W., & Harlan, J. C. (1974). Monitoring
the vernal advancement and retrogradation (greenwave effect) of natural vegetation.
Accessed: 21 July 2018
RStudio. (2018). RStudio. Open Source and Enterprise-Ready Professional Software for R
(Version 1.1.447). Accessed: 02 May 2018
Schläpfer, D., Borel, C. C., Keller, J., & Itten, K. I. (1998). Atmospheric precorrected
differential absorption technique to retrieve columnar water vapor. Remote Sensing of
Environment, 65(3), 353-366.
Schultz, B., Immitzer, M., Formaggio, A., Sanches, I., Luiz, A., & Atzberger, C. (2015). Self-
Guided Segmentation and Classification of Multi-Temporal Landsat 8 Images for
Crop Type Mapping in Southeastern Brazil. Remote Sensing, 7(11), 14482-14508.
doi:10.3390/rs71114482
62
Scott, D., Mitchell, D., Shah, A., Laake, J., Henningsen, A., Andronic, L., . . . Garbade, S.
(2019). Export Tables to LaTeX or HTML: RDocumentation. Retrieved from
https://www.rdocumentation.org/packages/xtable/versions/1.8-4. Accessed: 07
September 2019
Senf, C., Leitão, P. J., Pflugmacher, D., van der Linden, S., & Hostert, P. (2015). Mapping
land cover in complex Mediterranean landscapes using Landsat: Improved
classification accuracies from integrating multi-seasonal and synthetic imagery.
Remote Sensing of Environment, 156, 527-536. doi:10.1016/j.rse.2014.10.018
Siachalou, S., Mallinis, G., & Tsakiri-Strati, M. (2015). A Hidden Markov Models Approach
for Crop Classification: Linking Crop Phenology to Time Series of Multi-Sensor
Remote Sensing Data. Remote Sensing, 7(4), 3633-3650. doi:10.3390/rs70403633
Signorell, A., Gel, Y., Mitchell, D., Barton, K., Chamely, S., Chhay, L., . . . Sabbe, N. (2017).
Tools for Descriptive Statistics: RDocumentation. Retrieved from
https://www.rdocumentation.org/packages/DescTools/versions/0.99.19. Accessed: 07
September 2019
Tewkesbury, A. P., Comber, A. J., Tate, N. J., Lamb, A., & Fisher, P. F. (2015). A critical
synthesis of remotely sensed optical image change detection techniques. Remote
Sensing of Environment, 160, 1-14. doi:10.1016/j.rse.2015.01.006
The Presidency. (2018). Kenya Vision 2030. Marking 10 Years of Progress (2008 - 2018).
Sector Progress & Project Updates, June 2018. Retrieved from
http://vision2030.go.ke/inc/uploads/2018/09/Kenya-Vision-2030-Sector-Progress-
Project-Updates-June-2018.pdf. Accessed: 05 September 2019
Tigges, J., Lakes, T., & Hostert, P. (2013). Urban vegetation classification: Benefits of
multitemporal RapidEye satellite data. Remote Sensing of Environment, 136, 66-75.
doi:10.1016/j.rse.2013.05.001
Tryon, C. A., Faith, J. T., Peppe, D. J., Keegan, W. F., Keegan, K. N., Jenkins, K. H., . . .
Beverly, E. J. (2014). Sites on the landscape: Paleoenvironmental context of late
Pleistocene archaeological sites from the Lake Victoria basin, equatorial East Africa.
Quaternary International, 331, 20-30. doi:10.1016/j.quaint.2013.05.038
Turner, W., Rondinini, C., Pettorelli, N., Mora, B., Leidner, A. K., Szantoi, Z., . . .
Woodcock, C. (2015). Free and open-access satellite data are key to biodiversity
conservation. Biological Conservation, 182, 173-176.
doi:10.1016/j.biocon.2014.11.048
UN Environment. (2018). UN Environment joins campaign to green Kenya [Press release].
Retrieved from https://www.unenvironment.org/news-and-stories/press-release/un-
environment-joins-campaign-green-kenya. Accessed: 05 September 2019
UNEP. (2009). Kenya: Atlas of Our Changing Environment. Nairobi, Kenya: Division of
Early Warning and Assessment (DEWA) & United Nations Environment Programme
(UNEP).
UNEP. (2016). GEO-6 Regional Assessment for Africa. Retrieved from
http://wedocs.unep.org/bitstream/handle/20.500.11822/7595/GEO_Africa_201611.pdf
?sequence=1&isAllowed=y. Accessed: 04 March 2019
Viña, A., Gitelson, A. A., Nguy-Robertson, A. L., & Peng, Y. (2011). Comparison of
different vegetation indices for the remote assessment of green leaf area index of
crops. Remote Sensing of Environment, 115(12), 3468-3478.
doi:10.1016/j.rse.2011.08.010
Vuolo, F., Richter, K., & Atzberger, C. (2011). Evaluation of time-series and phenological
indicators for land cover classification based on MODIS data. Paper presented at the
Remote Sensing for Agriculture, Ecosystems, and Hydrology XIII, Prague, Czech
Republic.
63
Wagenknecht, B. (Personal Communication). Background on Books for Trees, Badilisha Eco-
village Foundation Trust, and Rusinga Island. 2018, 16 March.
WCED. (1987). Our Common Future. Retrieved from http://www.un-documents.net/wced-
ocf.htm. Accessed: 09 August 2019
Weih, R. C., & Riggan, N. D. (2010). Object-Based Classification Vs. Pixel-Based
Classification: Comparative Importance of Multi-Resolution Imagery. The
International Archives of the Photogrammetry, Remote Sensing and Spatial
Information Sciences, XXXVIII-4(C7), 6.
Wibowo, A., Ismullah, I. H., Dipokusumo, B. S., & Wikantika, K. (2012). Land Degradation
Model Based on Vegetation and Erosion Aspects Using Remote Sensing Data. ITB
Journal of Sciences, 44(1), 19-34. doi:10.5614/itbj.sci.2012.44.1.3
Willis, K. S. (2015). Remote sensing change detection for ecological monitoring in United
States protected areas. Biological Conservation, 182, 233-242.
doi:10.1016/j.biocon.2014.12.006
Wulder, M. A., Franklin, S. E., White, J. C., Linke, J., & Magnussen, S. (2006). An accuracy
assessment framework for large‐area land cover classification products derived from
medium‐resolution satellite data. International Journal of Remote Sensing, 27(4), 663-
683. doi:10.1080/01431160500185284
Yang, X., Xu, B., Jin, Y., Qin, Z., Ma, H., Li, J., . . . Zhu, X. (2015). Remote sensing
monitoring of grassland vegetation growth in the Beijing–Tianjin sandstorm source
project area from 2000 to 2010. Ecological Indicators, 51, 244-251.
doi:10.1016/j.ecolind.2014.04.044
Yin, H., Pflugmacher, D., Kennedy, R. E., Sulla-Menashe, D., & Hostert, P. (2014). Mapping
Annual Land Use and Land Cover Changes Using MODIS Time Series. IEEE Journal
of Selected Topics in Applied Earth Observations and Remote Sensing, 7(8), 3421-
3427. doi:10.1109/jstars.2014.2348411
Yu, W., Zhou, W., Qian, Y., & Yan, J. (2016). A new approach for land cover classification
and change analysis: Integrating backdating and an object-based method. Remote
Sensing of Environment, 177, 37-47. doi:10.1016/j.rse.2016.02.030
Zhu, Z., & Woodcock, C. E. (2014). Continuous change detection and classification of land
cover using all available Landsat data. Remote Sensing of Environment, 144, 152-171.
doi:10.1016/j.rse.2014.01.011