Geography 309 Lab 5 Page 1
J.M. Piwowar 2010.11.05
LAB 5: SUPERVISED CLASSIFICATION
Question Sheets
Due Date: November 19
Objectives
to study some of the mechanics of supervised classification
Preparation
Re-read Chapters 12 & 14 in your text.
Notes
1. Image classification is a complex task and it takes considerable effort to become comfortable
with classification concepts. As in previous Labs, you are asked to do some basic image
manipulations “by hand” using a pencil and paper before attempting a similar operation on
the computer. In Part B you will apply the concepts you have learned in Part A to classify a
digital image on the computer. The manual techniques described here are very similar to the
tasks performed by the computer; however, the computer's great speed permits it to handle much
larger images with more channels and greater radiometric range.
2. All Figures and Tables referenced in Part A are included on the Answer Sheets attached to the
end of this Lab. Please use these Answer Sheets to submit your answers to Part A.
3. Detailed instructions for image classification using Geomatica can be found on-line by
following the Geomatica Visual Guide link from the bottom of the course homepage.
4. At any time when you are working with the Geomatica software, you can save a "snapshot"
of your work in progress by Saving it in a Project (look in the File menu). Be sure to save
your projects in your personal directory space.
A. Manual Image Classification1
1. Classification Using One Band
Data from a sample two-band image are shown in Figure 1. Figure 2 represents a ground
verification map for a particular environmental feature: "forest". Three sites have been positively
identified on the ground as being forested terrain, and the experts who collected this data are
further assured that the combination of these three sites is representative of all forest types to be
found in the area covered by the remotely sensed image. The ground verification map has been
geometrically registered to the image, so that the same point on the map and the image may now
be referenced by line and pixel coordinates. By using the spectral signature of these verified
forest areas, one may find all other "forest" pixels. This can be done by searching all pixels in the
scene for similar spectral signatures. The first task is then to define the spectral characteristics of
the given training sites.
1 This material is derived from material presented in the publication Introduction to Digital Images and Digital
Image Analysis Techniques by Tom Alföldi.
         BAND ‘A’                          BAND ‘B’
         PIXELS                            PIXELS
LINES    1  2  3  4  5  6  7      LINES    1  2  3  4  5  6  7
  1      5  3  4  5  4  5  5        1      5  5  4  6  7  7  7
  2      2  2  3  4  4  4  6        2      2  4  6  5  5  6  5
  3      2  2  3  3  6  6  8        3      5  3  5  7  6  6  8
  4      2  2  6  6  9  8  7        4      3  4  5  6  8  8  7
  5      3  6  8  8  8  7  4        5      3  5  8  8  8  7  1
  6      3  6  8  7  2  3  2        6      4  5  8  7  1  0  0
  7      4  6  7  3  3  2  1        7      3  6  7  0  0  0  0
Figure 1: Image 1.
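[Figure 2 is a 7 x 7 grid (pixels 1-7 across, lines 1-7 down) in which three pixels are marked ‘F’ (forest): one in line 1, one in line 2 and one in line 4. The pixel (column) positions of the marks are not recoverable from this copy.]
Figure 2: Ground verification map for Forest.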
Step 1: For each verified "forest" pixel in Figure 2, extract the intensity levels from the corresponding locations in Figure 1. Enter these values in Table A1 on the Answer Sheet.
Notice that although the intensity values are not identical for each "forest" pixel, they are very similar. The category or class, "forest", may then be assumed to be characterized by the range of intensities found. The range is defined by the minimum and maximum value from these three samples. Enter these ranges in Table A1.
Question 1: (1 mark) Submit your completed Table A1.
The actual classification process now involves searching for all pixels that have an intensity level
falling within the range of intensities found in the test sites. This is done separately for each
band.
Step 2: Using the band ‘A’ digital image of Figure 1, scan all of the pixels in the image for intensities falling in the range defined for band ‘A’ in Table A1. For any pixel with a band ‘A’ intensity falling in this range (inclusive of the minimum and maximum intensities), shade in the corresponding pixel in the band 'A' image in Figure 3. Repeat the process for band ‘B’.
[Figure 3 consists of two blank 7 x 7 grids (pixels 1-7 across, lines 1-7 down), one labelled BAND ‘A’ and one labelled BAND ‘B’, to be shaded in Step 2.]
Figure 3: Classified Forest maps.
The two forest theme maps in Figure 3 represent the same environmental feature (forest), yet are
different because each map was generated using information from one band only. The procedure
used to produce these theme maps is similar to a rudimentary form of intensity "slicing". One
specific range of intensities was sliced from the total available range. A more valid classification
may be produced if the intensities in both bands were to be considered simultaneously.
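The single-band slicing of Step 2 can be sketched in a few lines of Python. This is only an illustration, not part of the lab procedure: it uses the Figure 1 values and the forest ranges quoted in Section 2 (band ‘A’ 2 to 5, band ‘B’ 3 to 7), and the helper names `slice_band` and `show` are made up for this example.

```python
# Figure 1 (Image 1), indexed as band[line - 1][pixel - 1].
BAND_A = [
    [5, 3, 4, 5, 4, 5, 5],
    [2, 2, 3, 4, 4, 4, 6],
    [2, 2, 3, 3, 6, 6, 8],
    [2, 2, 6, 6, 9, 8, 7],
    [3, 6, 8, 8, 8, 7, 4],
    [3, 6, 8, 7, 2, 3, 2],
    [4, 6, 7, 3, 3, 2, 1],
]
BAND_B = [
    [5, 5, 4, 6, 7, 7, 7],
    [2, 4, 6, 5, 5, 6, 5],
    [5, 3, 5, 7, 6, 6, 8],
    [3, 4, 5, 6, 8, 8, 7],
    [3, 5, 8, 8, 8, 7, 1],
    [4, 5, 8, 7, 1, 0, 0],
    [3, 6, 7, 0, 0, 0, 0],
]


def slice_band(band, lo, hi):
    """Mark every pixel whose intensity lies in the inclusive range lo..hi."""
    return [[lo <= value <= hi for value in row] for row in band]


def show(mask):
    """Render a mask as text: '#' for a shaded pixel, '.' otherwise."""
    return "\n".join(" ".join("#" if m else "." for m in row) for row in mask)


forest_a = slice_band(BAND_A, 2, 5)  # forest range assumed for band 'A'
forest_b = slice_band(BAND_B, 3, 7)  # forest range assumed for band 'B'

print("BAND 'A'")
print(show(forest_a))
print("BAND 'B'")
print(show(forest_b))
```

The two printed maps differ because each one uses a single band, which is the point made in the paragraph above.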
2. Parallelepiped Classification
In order to classify an image in a multi-spectral (or multi-band) mode, the intensities in all bands
must be considered simultaneously. In Figure 4, the range of intensities in band ‘A’ representing
"forest" is represented along the x-axis by the shaded area from band ‘A’ intensity 2 to 5,
inclusively. Similarly for band ‘B’, "forest" is represented along the y-axis by the shaded area of
intensities 3 to 7, inclusively. The overlap of these two individual intensity ranges in this
two-dimensional diagram is a cross-hatched area representing the multispectral rectangular
spectral signature of "forest". In order to produce a multispectral classification of "forest", it is
necessary to find all pixels of the image whose spectral coordinates fall inside the cross-hatched
rectangular area.
Step 3: Using the digital images of Figure 1, scan all pixels for band ‘A’ intensities of 2, 3, 4, or 5. When a pixel has one of these band ‘A’ intensities, check if its band ‘B’ intensity is 3, 4, 5, 6 or 7. If a pixel agrees with both of these criteria, then shade in the corresponding pixel in Figure 5. Only the last four lines of the image need to be considered, since the first three lines are already mapped in this manner.
As a check on this classification, note that the forest map of Figure 5 is the overlap (logical
AND) between the band 'A' and band 'B' forest maps in Figure 3.
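The parallelepiped test of Step 3, and the logical AND check described above, can be sketched as follows. This is a minimal illustration assuming the Figure 1 values and the forest ranges given in the text; the function names `in_parallelepiped` and `classify` are hypothetical.

```python
# Figure 1 (Image 1), indexed as band[line - 1][pixel - 1].
BAND_A = [
    [5, 3, 4, 5, 4, 5, 5],
    [2, 2, 3, 4, 4, 4, 6],
    [2, 2, 3, 3, 6, 6, 8],
    [2, 2, 6, 6, 9, 8, 7],
    [3, 6, 8, 8, 8, 7, 4],
    [3, 6, 8, 7, 2, 3, 2],
    [4, 6, 7, 3, 3, 2, 1],
]
BAND_B = [
    [5, 5, 4, 6, 7, 7, 7],
    [2, 4, 6, 5, 5, 6, 5],
    [5, 3, 5, 7, 6, 6, 8],
    [3, 4, 5, 6, 8, 8, 7],
    [3, 5, 8, 8, 8, 7, 1],
    [4, 5, 8, 7, 1, 0, 0],
    [3, 6, 7, 0, 0, 0, 0],
]

# Forest signature: one inclusive (min, max) range per band, as quoted in the text.
FOREST = [(2, 5), (3, 7)]


def in_parallelepiped(vector, ranges):
    """True when every band value lies inside its inclusive range."""
    return all(lo <= value <= hi for value, (lo, hi) in zip(vector, ranges))


def classify(bands, ranges):
    """Return a lines x pixels mask of the pixels inside the parallelepiped."""
    n_lines, n_pixels = len(bands[0]), len(bands[0][0])
    return [
        [in_parallelepiped([band[r][c] for band in bands], ranges)
         for c in range(n_pixels)]
        for r in range(n_lines)
    ]


forest = classify([BAND_A, BAND_B], FOREST)

# Cross-check: the result must equal the AND of the two single-band maps.
mask_a = [[2 <= v <= 5 for v in row] for row in BAND_A]
mask_b = [[3 <= v <= 7 for v in row] for row in BAND_B]
and_map = [[a and b for a, b in zip(ra, rb)] for ra, rb in zip(mask_a, mask_b)]
assert forest == and_map

for row in forest:
    print(" ".join("F" if f else "." for f in row))
```

Because `ranges` is a list with one entry per band, the same function handles more than two bands without change.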
[Figure 4 is a feature space plot with Band ‘A’ Intensities (0 to 9) on the x-axis and Band ‘B’ Intensities (0 to 9) on the y-axis; the cross-hatched rectangle covers band ‘A’ intensities 2 to 5 and band ‘B’ intensities 3 to 7.]
Figure 4: Feature space portion representing Forest.
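[Figure 5 is a 7 x 7 grid (pixels 1-7 across, lines 1-7 down) to be shaded in Step 3; the shading already supplied for lines 1 to 3 is not reproduced here.]
Figure 5: Forest classification from Bands ‘A’ and ‘B’.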
The accuracy of every classification should be verified by comparing a sample of classified
pixels with the actual feature found on the ground (a process known as ground truthing).
Assume that the forest map of Figure 5 has been ground truthed by visual inspection and the
results are shown in Figure 6. Note that there may not be an exact correspondence between your
classification and the ground truth map due to limitations of the classification procedure and/or
errors.
The person who ground truthed the classification further divided the forest category into
homogeneous stands of coniferous and deciduous forest. Figure 6 shows the spatial distribution
of these two forest types. You want to use this information to refine your rectangular spectral
signatures to include these two forest types. The task is to delineate the portions of spectral
feature space corresponding to "coniferous" and "deciduous".
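[Figure 6 is a 7 x 7 grid (pixels 1-7 across, lines 1-7 down). The marks in each line, read left to right, are listed below; their pixel (column) positions are not recoverable from this copy.]
Line 1: D D D D D
Line 2: C D
Line 3: C C D
Line 4: C C
Line 5: C
Line 6: C
Line 7: C
Figure 6: Ground truth map: Forest Types.
C = coniferous; D = deciduous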
[Figure 7 is a blank feature space plot with Band ‘A’ Intensities (0 to 9) on the x-axis and Band ‘B’ Intensities (0 to 9) on the y-axis, to be completed in Step 4.]
Figure 7: "Forest" type designation in feature space.
Step 4: For each pixel identified in the ground verification map of Figure 6, find the corresponding band ‘A’ and band ‘B’ intensities in Figure 1. Plot each such spectral coordinate in Figure 7 using symbols ‘C’ (coniferous) and ‘D’ (deciduous). Define the rectangular spectral signature of "coniferous forest" by drawing a rectangle enclosing the minimum and maximum extents of the 'C' pixels in Figure 7. The two vertical sides of this rectangle are the lower and upper limits of the range of intensities for band ‘A’. The top and bottom lines of the rectangle are the upper and lower limits of the range of intensities for band ‘B’. Draw a similar rectangle for the spectral signature of "deciduous forest".
Note that the two rectangles in Figure 7 will partially overlap. It is this overlapping of spectral
signatures in feature space which is a major limiting factor to the usefulness of the parallelepiped
classification method. If a particular pixel has spectral coordinates that fall into the overlap
region, then you would be unable to determine whether this pixel represents coniferous or deciduous
forest.
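The rectangle construction of Step 4 and the overlap problem can be sketched as below. The training vectors in `SAMPLES` are hypothetical values chosen for illustration; they are not read from Figures 6 and 7, and the function names are made up for this example.

```python
# Hypothetical training vectors as (band A, band B) pairs.
# These are illustrative only and are NOT the values from Figures 6 and 7.
SAMPLES = {
    "coniferous": [(2, 3), (3, 4), (4, 3), (2, 5)],
    "deciduous": [(3, 7), (5, 5), (4, 6), (5, 7)],
}


def bounding_box(vectors):
    """Inclusive (min, max) range per band for a list of spectral vectors."""
    return [(min(band), max(band)) for band in zip(*vectors)]


def overlap(box_a, box_b):
    """Return the region shared by two boxes, or None if they are disjoint."""
    shared = [
        (max(lo_a, lo_b), min(hi_a, hi_b))
        for (lo_a, hi_a), (lo_b, hi_b) in zip(box_a, box_b)
    ]
    return shared if all(lo <= hi for lo, hi in shared) else None


boxes = {name: bounding_box(vectors) for name, vectors in SAMPLES.items()}
ambiguous = overlap(boxes["coniferous"], boxes["deciduous"])

for name, box in boxes.items():
    print(name, box)
print("overlap region:", ambiguous)
```

Any pixel whose spectral vector falls inside the reported overlap region satisfies both signatures, so a parallelepiped classifier alone cannot assign it to one class.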
Review
You have just completed one type of supervised classification. In this process, you used
representative samples of the environmental feature to be mapped (forest) to determine the range
of pixel intensities for this feature in each band. The ranges of intensities that correspond to the
feature of interest were plotted in feature space as spectral band versus spectral band. The
rectangle defined in two-dimensional feature space is known as the rectangular spectral
signature of the environmental feature. When more than two bands are used (the Landsat
Thematic Mapper has 7 bands, for example), N bands produce an N-dimensional
parallelepiped spectral signature, which is analogous to the (two-dimensional) rectangle
produced above. The frequent creation of overlapping spectral signatures is one of the prime
limitations of the parallelepiped classification approach.
3. Multispectral Vector Classification
A further refinement of the spectral signature, as defined by parallelepiped multispectral
classification, is possible. The multispectral vector classifier looks inside any feature space
rectangle to identify each spectral coordinate (also known as a cell or vector) and the number of
pixels which are associated with each coordinate. This defines the data density distribution in
feature space.
To illustrate, let's use a multispectral vector classification scheme to map coniferous and
deciduous forests in a different image. In this process, you will use the image data in Figures 6
and 7 to train the classifier and apply the derived signatures to a new image. In other words,
Figures 6 and 7 contain the test sites from which the spectral signatures of the two forest types
have been defined. These spectral signatures will then be extrapolated to another area (a new
image) to look for similar environmental features.
Figure 8 contains the digital images of band ‘A’ and band ‘B’ of the new scene ("Image 2"). A
vector is the spectral set of pixel values associated with a single spatial pixel. Using the data
from Figure 8 as an example, the spectral vector for the spatial pixel at Pixel 3, Line 5 is {5, 7}.
You are going to use the spectral signatures from "Image 1" (i.e., from the data in Figure 1) to
classify Image 2 into coniferous and deciduous forest types.
         BAND ‘A’                          BAND ‘B’
         PIXELS                            PIXELS
LINES    1  2  3  4  5  6  7      LINES    1  2  3  4  5  6  7
  1      3  4  1  1  2  2  2        1      7  7  0  0  0  3  3
  2      4  4  4  2  1  2  2        2      7  7  6  0  0  3  4
  3      3  3  5  2  2  2  2        3      4  4  7  1  1  3  4
  4      5  5  2  2  2  2  2        4      7  7  4  4  3  4  4
  5      4  5  5  2  2  2  2        5      6  7  7  4  5  5  5
  6      4  3  3  4  4  5  5        6      7  5  5  5  7  7  6
  7      5  5  3  4  4  5  4        7      7  6  6  7  7  7  7
Figure 8: Image 2.
Step 5: Figure 9 shows the Image 1 feature space representation of the spectral signatures of coniferous (C) and deciduous (D) forests, as previously constructed. For each pixel in Image 2 (Figure 8) look up the spectral vector (band ‘A’ and band ‘B’ intensity values) in Figure 9. In order to classify any pixel as one of the two forest types, the band ‘A’ and band ‘B’ intensities must fall within one of the cells in Figure 9 marked as ‘C’ or ‘D’. It is not sufficient for the spectral coordinates to merely fall within the rectangular limits: it is mandatory that the spectral coordinates being considered coincide with a cell marked by ‘C’ or ‘D’. Only in this manner will ambiguities related to the overlap region be avoided. Those pixels identified as coniferous or deciduous should be marked appropriately as ‘C’ or ‘D’ on the theme map of Figure 10 (on the Answer Sheet). You only need to classify the first four lines of Figure 8 since the last three lines have been mapped for you.
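[Figure 9 is a feature space grid with Band ‘A’ Intensities (0 to 9) on the x-axis and Band ‘B’ Intensities (0 to 9) on the y-axis. The marked cells in each band ‘B’ row, read left to right, are listed below; their band ‘A’ (column) positions are not recoverable from this copy. All other rows are empty.]
Band ‘B’ = 7: D D D
Band ‘B’ = 6: D D
Band ‘B’ = 5: C D
Band ‘B’ = 4: C C
Band ‘B’ = 3: C C C
Figure 9: Rectangular spectral signatures from Image 1.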
The above procedure is sometimes called N-dimensional training, to refer to the fact that the
intensities of more than one band are considered simultaneously. This type of training and
classification scheme is also known as non-parametric since the absolute location of spectral
coordinates in feature space is the criterion for defining an environmental feature and not
statistical parameters such as mean and standard deviation.
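A multispectral vector classifier amounts to an exact lookup of each pixel's spectral vector in a table of marked cells. The sketch below assumes a small made-up table: the cell coordinates in `SIGNATURE_CELLS` and the two-line test image are placeholders, not a transcription of Figure 9 or Figure 8, and the function names are hypothetical.

```python
# Hypothetical signature cells: (band A, band B) -> class label.
# Placeholder coordinates only; NOT a transcription of Figure 9.
SIGNATURE_CELLS = {
    (2, 3): "C", (2, 4): "C", (3, 3): "C",
    (4, 6): "D", (4, 7): "D", (5, 7): "D",
}


def classify_vector(vector, cells, unclassified="."):
    """Return the class of a spectral vector only if its exact cell is marked."""
    return cells.get(tuple(vector), unclassified)


def classify_image(band_a, band_b, cells):
    """Classify every pixel of a two-band image by exact cell lookup."""
    return [
        [classify_vector((a, b), cells) for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(band_a, band_b)
    ]


# Made-up two-line image used only to exercise the lookup.
band_a = [[2, 3, 4], [5, 3, 2]]
band_b = [[3, 5, 7], [7, 3, 9]]

theme_map = classify_image(band_a, band_b, SIGNATURE_CELLS)
for row in theme_map:
    print(" ".join(row))
```

In this example the vector (3, 5) stays unclassified even though it lies inside the bounding rectangle of the placeholder cells, which mirrors the rule in Step 5 that falling within the rectangular limits is not sufficient.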
4. Interpretations from Spectral Signatures
Although ground verification is essential for the accurate assignment of environmentally valid
names to features found on an image, it is still possible to make deductions concerning an
unverified feature from its spatial and spectral characteristics.
If band ‘B’ of Image 2 has been acquired in the near-IR part of the spectrum, we could expect a
one-dimensional histogram of this band to show a marked difference between a low intensity
mode corresponding to water and a high intensity mode corresponding to land features. This
characteristic can be seen in Figure 11. The left mode of the histogram, comprising band ‘B’
intensities less than 2, will relate to water pixels to a high degree of certainty. (The major
conflicting phenomenon would be shadow areas due to clouds or mountains, which would also
result in low near-IR intensities.)
Step 6: Find those pixels in Figure 8 that have band ‘B’ intensities less than 2 and mark the corresponding pixels in Figure 10 as ‘W’ (for water).
[Figure 11 is a histogram with Band ‘B’ intensities (0 to 9) on the x-axis and Number of Pixels (0 to 17) on the y-axis.]
Figure 11: One-dimensional histogram for the near-IR band of Image 2.
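The one-dimensional histogram and the water threshold of Step 6 can be reproduced from the band ‘B’ values of Figure 8. This is a minimal sketch using only the Python standard library; the variable names are made up for this example.

```python
from collections import Counter

# Band 'B' of Image 2 (Figure 8), indexed as band[line - 1][pixel - 1].
BAND_B = [
    [7, 7, 0, 0, 0, 3, 3],
    [7, 7, 6, 0, 0, 3, 4],
    [4, 4, 7, 1, 1, 3, 4],
    [7, 7, 4, 4, 3, 4, 4],
    [6, 7, 7, 4, 5, 5, 5],
    [7, 5, 5, 5, 7, 7, 6],
    [7, 6, 6, 7, 7, 7, 7],
]

# Count how many pixels carry each intensity level.
histogram = Counter(value for row in BAND_B for value in row)
for level in range(10):
    print(f"{level}: {'*' * histogram[level]}")

# Step 6: band 'B' intensities below 2 are taken to be water.
WATER_THRESHOLD = 2
water = [[value < WATER_THRESHOLD for value in row] for row in BAND_B]
n_water = sum(map(sum, water))
print("water pixels:", n_water)
```

The printed bars show the two modes described in the text: a small group at intensities 0 and 1, an empty level at 2, and the land pixels from 3 upward.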
Band ‘B’ Intensities
   9
   8
   7           1  8  8
   6           1  2  2
   5        3  2  1
   4        7  2
   3        5
   2
   1        2
   0     3  2
      0  1  2  3  4  5  6  7  8  9   Band ‘A’ Intensities
[The original figure also outlines the regions labelled "deciduous" and "coniferous".]
Figure 12: Two-dimensional histogram for Image 2.
Figure 12 shows the two-dimensional histogram for Image 2. Note the cluster of three
low-intensity cells that correspond to "water". As a check of your water classification in
Step 6, sum the number of pixels corresponding to these three cells and verify that the
number of pixels that you marked ‘W’ in Figure 10 agrees with this sum.
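A two-dimensional histogram simply counts how many pixels share each (band ‘A’, band ‘B’) vector. The sketch below builds it from the Figure 8 values and sums the low-intensity cells as a check; the names `hist2d` and `water_total` are made up for this example.

```python
from collections import Counter

# Image 2 (Figure 8), indexed as band[line - 1][pixel - 1].
BAND_A = [
    [3, 4, 1, 1, 2, 2, 2],
    [4, 4, 4, 2, 1, 2, 2],
    [3, 3, 5, 2, 2, 2, 2],
    [5, 5, 2, 2, 2, 2, 2],
    [4, 5, 5, 2, 2, 2, 2],
    [4, 3, 3, 4, 4, 5, 5],
    [5, 5, 3, 4, 4, 5, 4],
]
BAND_B = [
    [7, 7, 0, 0, 0, 3, 3],
    [7, 7, 6, 0, 0, 3, 4],
    [4, 4, 7, 1, 1, 3, 4],
    [7, 7, 4, 4, 3, 4, 4],
    [6, 7, 7, 4, 5, 5, 5],
    [7, 5, 5, 5, 7, 7, 6],
    [7, 6, 6, 7, 7, 7, 7],
]

# Density of each spectral vector (band A, band B) in feature space.
hist2d = Counter(
    (a, b)
    for row_a, row_b in zip(BAND_A, BAND_B)
    for a, b in zip(row_a, row_b)
)

# Print the grid with band 'B' on the vertical axis, highest value first.
for b in range(9, -1, -1):
    cells = "".join(f"{hist2d[(a, b)] or '':>3}" for a in range(10))
    print(f"{b:>4}{cells}")
print("    " + "".join(f"{a:>3}" for a in range(10)))

# Check for Step 6: pixels in cells with band 'B' below 2 (the water cluster).
water_total = sum(n for (a, b), n in hist2d.items() if b < 2)
print("pixels in water cells:", water_total)
```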
The portions of feature space that were previously defined as representing coniferous and
deciduous forest are indicated in Figure 12. It is reasonable to assume that the portion of feature
space that lies between two specific spectral signatures may represent a mixed environmental
target. Thus, the cells lying between the areas shown as ‘coniferous’ and ‘deciduous’ may be the
spectral representation of the mixture of these two forest types, namely mixedwood.
Step 7: Identify the pixels that are represented by the (three) cells between deciduous and coniferous, and mark them on Figure 10 as ‘M’. Verify that the correct number of pixels have been identified as mixedwood by summing the densities of the three cells in Figure 12.
Question 2: (1 mark) Submit your classified image (Figure 10).
Question 3: (1 mark) What is the basic limitation of the parallelepiped classification technique?
Question 4: (1 mark) List the spectral vector for the spatial pixel at Pixel 3, Line 5 of Image 1.
B. Digital Image Classification
Now that you have seen how a supervised classification works with test data, you are ready to try
one using Geomatica. Although classification is based on sound scientific principles, obtaining a
satisfactory result depends a lot on the skills and creativity of the image analyst. A typical
supervised classification session involves several iterations, or repetitions, before a final version
is accepted. After each iteration, the classified image is examined for accuracy and
completeness. If the analyst is not satisfied with the result, they will modify the training set(s)
based on ancillary data gathered from field observations, aerial photographs, maps, and other
sources, and try the classification again. The classes used in this lab are broad enough that you
won't need to use any of these ancillary data sources: you should be able to interpret land cover
directly from the imagery. You will probably have to make 2 or 3 iterations, however, before
you are satisfied with your result.
Because of the mismatch that usually exists between the information classes desired by the
analyst and the spectral classes inherent in the data, it is impossible to obtain a classification that
is 100% accurate. Your goal is to reach an acceptable level of accuracy and completeness within
the time and resource constraints of this lab.
Assignment
In Lab 1, you examined the appearance of several land cover / land use types in the Regina
imagery. Use these categories, as listed in Table B1, as the basis for a supervised classification
of the scene.
1. Add your supervised classification to the same file you used for your unsupervised
classification in Lab 4.
2. Following the procedure as outlined in the Geomatica Visual Guide, set up your image for a
Supervised Classification. Use the Session Configuration exactly as it is shown on the
Supervised Classification web page. Don't forget to add 2 new layers to your image.
3. Draw Training Sets for each of the classes listed in Table B1 (where applicable). Each
category (class) that you want in your final image will have its own Training Set. Each
Training Set will be composed of several individual Training Sites. You should select
Training Sites from different parts of your image to get a good representation of the spectral
values for that feature. It is preferable to have a Training Set composed of several small,
widely dispersed Training Sites rather than just one or two large, localized Training Sites.
4. A spectral signature is a statistical summary of the spectral values contained within each
training set. Geomatica creates spectral signatures for your training sets automatically when
you save them. In an ideal world, the spectral signature for each class will be unique with
respect to the other classes you are trying to identify. One tool that you can use to visually
examine the spectral separability of your signatures is to plot their distributions on a spectral
scatterplot.
Following the directions for generating Class Signature Ellipses (at the bottom of the
Supervised Classification page in the on-line Geomatica Visual Guide), prepare a scatterplot
for your spectral signatures.
o Plot the image's red band along the X-axis and the near-IR band along the Y-axis.
o Export the plot as a TIFF file.
Question 5: (2 marks) Submit a colour print of your class signature ellipses (see footnote 2). What can you say about the distribution of your classes in this feature space? Do you think this will affect your classification? Why or why not?
Question 6: (1 mark) Submit a colour map of an image showing the training sets you used (see footnote 2). Turn off all the other display layers so that just your training sets are showing. Add a Neatline, Border, Legend, and Title to your map.
5. Perform several classifications of your image according to the Parallelepiped, Minimum
Distance, and Maximum Likelihood algorithms and their various options. You will probably
have to fine-tune your training areas and classification algorithms until you create an image
that you feel best describes the land use / land cover of the Regina area. No more than about
10% of your image should be unclassified.
2 I require colour prints of your plots and images. If you do not have access to a colour printer, you may e-mail it to
me or I can copy it onto my USB memory stick during the lab.
Table B1: USGS Classification Categories
(see http://landcover.usgs.gov/classes.php for class definitions)
Land Cover / Land Use
1 Water                                               11 Open Water
                                                      12 Perennial Ice/Snow
2 Developed                                           21 Low Intensity Residential
                                                      22 High Intensity Residential
                                                      23 Commercial/Industrial/Transportation
3 Barren                                              31 Bare Rock/Sand/Clay
                                                      32 Quarries/Strip Mines/Gravel Pits
                                                      33 Transitional
4 Forested Upland                                     41 Deciduous Forest
                                                      42 Evergreen Forest
                                                      43 Mixed Forest
5 Shrubland                                           51 Shrubland
6 Non-Natural Woody                                   61 Orchards/Vineyards/Other
7 Herbaceous Upland Natural/Semi-natural Vegetation   71 Grasslands/Herbaceous
8 Herbaceous Planted/Cultivated                       81 Pasture/Hay
                                                      82 Row Crops
                                                      83 Small Grains
                                                      84 Fallow
                                                      85 Urban/Recreational Grasses
9 Wetlands                                            91 Woody Wetlands
                                                      92 Emergent Herbaceous Wetlands
6. Prepare a Map Composition to show your classified image.
Question 7: (3 marks) Submit a colour map of your classified image that you feel best describes the land use / land cover of the Regina area (see footnote 2).
7. Using the Classification Report, prepare a summary table to show the spatial extents of your
classes across your image. Use the following headings in your table and add a line along the
bottom that shows the total # of pixels, the total percentages and the total area.
Class #   Class Name   # pixels         Image Coverage
                                        (%)                 (area)
…         …            …                …                   …
Totals                 Total # pixels   Total percentages   Total area
Question 8: (1 mark) Submit a copy of your classification summary table.
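The arithmetic behind such a summary table can be sketched as follows. Everything in this example is assumed for illustration: the tiny `classified` grid, the class codes chosen from Table B1, and the 30 m pixel size are placeholders, and your own values should come from the Geomatica Classification Report.

```python
from collections import Counter

PIXEL_SIZE_M = 30.0  # assumed ground resolution in metres (placeholder)
CLASS_NAMES = {
    0: "Unclassified",
    11: "Open Water",
    21: "Low Intensity Residential",
    83: "Small Grains",
}

# Made-up classified image: one class code per pixel.
classified = [
    [11, 11, 83, 83],
    [11, 21, 83, 83],
    [21, 21, 83, 0],
]

counts = Counter(code for row in classified for code in row)
total = sum(counts.values())
pixel_area_ha = PIXEL_SIZE_M ** 2 / 10_000  # square metres to hectares

# One row per class: code, name, pixel count, percent of image, area in ha.
rows = [
    (code, CLASS_NAMES[code], n, 100.0 * n / total, n * pixel_area_ha)
    for code, n in sorted(counts.items())
]

print(f"{'Class #':<8}{'Class Name':<28}{'# pixels':>9}{'%':>8}{'ha':>8}")
for code, name, n, pct, area in rows:
    print(f"{code:<8}{name:<28}{n:>9}{pct:>8.1f}{area:>8.2f}")
print(f"{'Totals':<36}{total:>9}"
      f"{sum(r[3] for r in rows):>8.1f}{sum(r[4] for r in rows):>8.2f}")
```

The totals row is a useful check: the pixel counts should add up to the image size and the percentages to 100.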
8. Assess the accuracy of your classification.
In Focus, right-click on Classification Metalayer; select Post-classification Analysis > Accuracy Assessment
Select Classified Image… - select the image channel that contains your classification
Generate Random Sample… - set the Number of Samples to 50
For each random sample in the list at the bottom right of the window:
o Click on the sample number
o The image will automatically centre itself on that location
o Turn the classification layer off and on to reveal the colour composite image.
Determine what the real class is of this pixel.
o Select the real class of this pixel from the list of classes along the bottom left of
the window. If the real class of this pixel is not represented by any of the classes
in the list, leave this sample blank and move on to the next one.
When you have examined all of the random points click Accuracy Report
o Select the Sample Report Listing tab. Click Generate Report. Click Save
Report… Browse to your personal folder and give your report a name. Click
Append.
o Select the Error (Confusion) Matrix tab. Click Generate Report. Click Save
Report… Browse to the report file you just created. Click Append.
o Select the Accuracy Statistics tab. Click Generate Report. Click Save Report…
Browse to the report file you just created. Click Append.
Close the Accuracy Assessment window.
Open the accuracy assessment report file you just created using Microsoft Word and
examine its contents.
Question 9: (1 mark) Submit a copy of your accuracy assessment report file.
9. Write a summary paragraph describing your classification. Your paragraph should include
the following points:
Which classification algorithm you used; Why you selected that algorithm;
What is the Overall Accuracy of your classification? What is the Overall Kappa Statistic
of your classification? What does the kappa statistic tell us?
Which classes you feel are well classified in your image; refer to your accuracy
assessment report to substantiate your claim;
Which classes you feel are not-so-well classified in your image; refer to your accuracy
assessment report to substantiate your claim;
An analysis of why these classes were not well classified.
Question 10: (5 marks) Submit your classification summary paragraph.
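To help interpret the accuracy report, the sketch below computes overall accuracy and the kappa statistic from an error (confusion) matrix using the standard Cohen's kappa formula, kappa = (observed agreement - chance agreement) / (1 - chance agreement). The 3 x 3 matrix is made up for illustration, and it is an assumption that the software's report follows this same definition.

```python
def overall_accuracy(matrix):
    """Fraction of samples on the diagonal (classified class == real class)."""
    total = sum(map(sum, matrix))
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def kappa(matrix):
    """Cohen's kappa: agreement corrected for the agreement expected by chance."""
    total = sum(map(sum, matrix))
    observed = overall_accuracy(matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical error matrix for 50 samples: rows are the classified classes,
# columns are the real (reference) classes. Values are illustrative only.
CONFUSION = [
    [15, 2, 1],
    [3, 12, 2],
    [1, 2, 12],
]

accuracy = overall_accuracy(CONFUSION)
k = kappa(CONFUSION)
print(f"overall accuracy = {accuracy:.2%}")
print(f"kappa = {k:.3f}")
```

A kappa near 1 means the agreement is far better than chance, while a value near 0 means the classification does no better than a random assignment with the same class proportions.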
NAME: MARK
LAB 5: SUPERVISED CLASSIFICATION
Answer Sheet
Due Date: November 19
Question 1: (1 mark) Submit your completed Table A1.
Table A1: Intensity Ranges for Forest.
                                        Location 1   Location 2   Location 3   Range
                                                                               Minimum   Maximum
Enter (pixel,line) coordinates below    ( , )        ( , )        ( , )
Band 'A' Intensities
Band 'B' Intensities
Question 2: (1 mark) Submit your classified image (Figure 10).
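[Figure 10 is a 7 x 7 grid (pixels 1-7 across, lines 1-7 down). Lines 1 to 4 are blank, to be classified in Step 5. The marks supplied for lines 5 to 7, read left to right, are listed below; line 5 is complete, while the pixel (column) positions of the marks in lines 6 and 7 are not recoverable from this copy.]
Line 5: D D D C C C C
Line 6: D D D D
Line 7: D D D D D D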
Figure 10: Classified Image 2.
C = Coniferous Forest; D = Deciduous Forest