CLASSIFICATION OF STRAWBERRY FRUIT SHAPE BY MACHINE LEARNING
T. Ishikawa 1, A. Hayashi 2, S. Nagamatsu 3, Y. Kyutoku 1, I. Dan 1, T. Wada 3, K. Oku 3, Y. Saeki 3, T. Uto 3, T. Tanabata 2, S. Isobe 2,
N. Kochi 1, 2 *
1 Faculty of Science and Engineering, Chuo University, 1-13-27, Kasuga, Bunkyo-ku, Tokyo, 112-8551, Japan – [email protected]
2 Department of Frontier Research Plant Genomics and Genetics, Kazusa DNA Research Institute, 2-6-7, Kazusa-Kamatari, Kisarazu, Chiba, 292-0818, Japan
3 Agro-Environment Sciences and Biotechnology Div., Fukuoka Agriculture and Forestry Research Center, 587, Yoshiki, Chikushino, Fukuoka, 818-8549, Japan
Commission II, WG II/10
KEY WORDS: Shape Recognition, Classification, Measurement, Chain code, Machine learning, Random Forest, Agriculture
ABSTRACT:
Shape is one of the most important traits of agricultural products due to its relationships with the quality, quantity, and value of the
products. For strawberries, the nine types of fruit shape were defined and classified by humans based on the sample patterns of the
nine types. In this study, we tested the classification of strawberry shapes by machine learning in order to increase the accuracy of
the classification, and we introduce the concept of computerization into this field. Four types of descriptors were extracted from the
digital images of strawberries: (1) the Measured Values (MVs) including the length of the contour line, the area, the fruit length and
width, and the fruit width/length ratio; (2) the Ellipse Similarity Index (ESI); (3) Elliptic Fourier Descriptors (EFDs), and (4) Chain
Code Subtraction (CCS). We used these descriptors for the classification test along with the random forest approach, and eight of the
nine shape types were classified with combinations of MVs + CCS + EFDs. CCS is a descriptor that adds human knowledge to the
chain codes, and it showed higher robustness in classification than the other descriptors. Our results suggest machine learning's high
ability to classify fruit shapes accurately. We will attempt to increase the classification accuracy and apply the machine learning
methods to other plant species.
* Corresponding author
1. INTRODUCTION
Shape is one of the most important traits of agricultural
products due to its relationships with the quality, quantity and
value of the products. Shape is also important in cultivar
identification, as cultivars are defined by their morphological
characteristics. Although mechanization and automation are
used widely in agriculture (mainly for cultivation systems such
as planting and harvesting), automated technologies for the
identification of the shapes of agricultural products have not
been established. The identification of the shapes of agricultural
products has been done for centuries by visual assessment only,
and the criteria for judgement have not been well defined. It has
been considered challenging to transfer the classification of
shapes from visual assessment to computerization, in part
because it is difficult to verbally describe shapes in detail and in
a standard manner.
Strawberries are an important fruit species used worldwide as
fresh or processed products. In Japan, strawberries are third in
production value among agricultural crops (after rice and
tomatoes), contributing a significant portion of the agricultural
economy. Higher value is attached to shape as a trait for several
horticultural products — including strawberries — compared to
other crops, because it directly links to the price per unit. In
addition, a specific fruit shape is often preferred such as a
conical shape of strawberries for visual presentations. A
cultivar's specific shape is also important for a brand's value; an
example is the round shape of the strawberries of the cultivar
‘Fukuoka S6 Go’ (trademark name, ‘Amaou’).
Agricultural products are not generated from artificial designs,
and the shapes of the products are not uniform. In addition, the
shapes of these products can change depending on the growing
conditions. This is true for strawberries, and there are no well-
defined guidelines for the classification of strawberry shapes.
Therefore, sample patterns defined by agriculture authorities
have been used for shape classification, and the judgement is
done by a human's comparison of the sample pattern and the
fruit. This judgement often changes depending on the 'human
sensor' used, i.e., among individuals. Practice is thus required to
increase a human sensor's precision.
In Japan, the nine types of strawberry shape are defined in the
guideline issued for strawberries by the Plant Variety Protection
(PVP) office at the Ministry of Agriculture, Forestry and
Fisheries (MAFF, 2011). That guideline was issued for the
registration of strawberry varieties, and the nine shape types are
used for characterizing the shape of developed and registered
novel strawberry varieties.
In this study, we tested the classification of strawberry shapes
by machine learning with the goal of increasing the accuracy of
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLII-2, 2018 ISPRS TC II Mid-term Symposium “Towards Photogrammetry 2020”, 4–7 June 2018, Riva del Garda, Italy
This contribution has been peer-reviewed. https://doi.org/10.5194/isprs-archives-XLII-2-463-2018 | © Authors 2018. CC BY 4.0 License.
the classification, and in order to introduce computerization into
this field. We describe the types of strawberry shape in Section
2, our image analysis (Section 3), machine learning and feature
descriptions for classification (Section 4), our results (Section
5), and a discussion and summary (Section 6).
2. THE NINE TYPES OF STRAWBERRY SHAPE
The nine types of strawberry shape defined by the PVP office at
MAFF (Fragaria L. Char. 37, Fruit: shape) are shown in Figure
1: reniform, conical, cordate, ovoid, cylindrical, rhomboid,
obloid, globose, and wedged.
Figure 1. Strawberry shapes
(cited from Fragaria L. Char. 37, Fruit: shape)
Table 1. The combinations of the three criteria used for the
classification of the nine types of strawberry shape
Figure 2. The combinations of the three criteria: (a) proportions of length and width; (b) shape of the top part (raised, flat, curving); (c) shape of the bottom part
Figure 3. Fruit images of the nine types
No verbalized definition of the shapes is described in the
guideline. However, according to the strawberry breeders at the
Fukuoka Agricultural and Forestry Research Centre, several
criteria are commonly recognized for the classification of the
nine strawberry shapes: (a) the proportions of the length and
width of the strawberry; (b) the shape of the top part, i.e., the
part connecting the hull (raised, flat, or curving); and (c) the
shape of the bottom part (sharpness or roundness). The
combinations of these three criteria for classification are
summarized in Table 1 and illustrated in Figures 2 (a) – (c).
Examples of the nine sample patterns of the fruit are provided in
Figure 3.
3. IMAGE ANALYSIS
3.1 Image capture
A total of 2,969 strawberry fruit images were captured with a
digital camera (Canon EOS 40D). The 388 strawberry plants in
a Multi-parent Advanced Generation Inter-crosses (MAGIC)
population, which was derived from six founder lines, were
grown at the Fukuoka Agricultural and Forestry Research
Centre (Wada et al. 2017a). The harvested fruits were put on
Styrofoam trays (Figure 4), and the digital images were taken
under artificial lighting covered with light-scattering film.
A colour checker was not used in this study; we plan to use
one for the evaluation of breeding improvement in the future. The
image resolution of the system was 0.08 mm/pixel, and the
measuring accuracy was estimated as < 0.8 mm when the error
caused by lens distortion was assumed to be within 10 pixels
(calibrated result; 10 pixels × 0.08 mm/pixel = 0.8 mm). Because
an error of 0.5–1.0 mm was considered acceptable in practical
measurement, we judged that camera calibration was not essential
for our measuring system and skipped it to increase the
throughput of the work. Measurement accuracy was instead
supported by increasing the amount of learning data. The
digital images were processed by the following image analysis.
Figure 4. Input image
Figure 5. Extracted fruit images
3.2 Image analysis software
The length (i.e., the major axis of the ellipse), the width (the
minor axis of the ellipse) and the ratio of width/length in each
fruit image were determined based on the software developed
by Hayashi et al. (2017b). This software uses the algorithms
developed by Tanabata et al. (2012), and it automatically
extracts the fruit from the image's background based on colours.
The contour line of the fruit is determined by the software, and
ellipse approximation was performed for the measurement of
the length and width of the ellipse. Figures 4–6 are an input
image, extracted fruit images, and an approximate ellipse,
respectively.
We developed a program for generating chain codes (Freeman,
1974) of the fruit contour lines in order to add feature descriptors.
The orientations of the approximate ellipses were identified by
extracting the top and bottom coordinates of the approximate
ellipses (Figure 6). The start points of the chain codes were
identified, and the directions of the connection points were
determined. We performed a principal component analysis
(PCA) with Elliptic Fourier Descriptors (EFDs) that were
determined based on the chain codes, as well as other
feature descriptors.
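The chain-code step described above can be sketched as follows. This is a minimal illustration, not the program used in the study: the function name `chain_code`, the input format (an ordered list of 8-connected contour pixels) and the direction numbering (0 = east, increasing counter-clockwise) are assumptions, and the numbering in Figure 7 may differ.

```python
# Freeman 8-direction chain code for a closed, ordered pixel contour.
# Assumed convention: code 0 = +x (east), increasing counter-clockwise
# with y pointing up. The paper's own numbering (Figure 7) may differ.
DIRECTIONS = {
    (1, 0): 0, (1, 1): 1, (0, 1): 2, (-1, 1): 3,
    (-1, 0): 4, (-1, -1): 5, (0, -1): 6, (1, -1): 7,
}


def chain_code(contour):
    """Return the chain code of a closed contour given as (x, y) pixels.

    Consecutive points must be 8-connected neighbours; the contour is
    closed by linking the last point back to the first (the start point).
    """
    codes = []
    n = len(contour)
    for i in range(n):
        x0, y0 = contour[i]
        x1, y1 = contour[(i + 1) % n]
        step = (x1 - x0, y1 - y0)
        if step not in DIRECTIONS:
            raise ValueError(f"points {i} and {(i + 1) % n} are not 8-connected")
        codes.append(DIRECTIONS[step])
    return codes


# A unit square traversed counter-clockwise from its lower-left corner.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(chain_code(square))  # [0, 2, 4, 6]
```

Fixing the start point and the traversal direction, as the text describes, matters because the same contour yields a rotated or reversed code sequence otherwise.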
Figure 6. Approximate ellipse
Figure 7. Encoding direction
3.3 SHAPE (software for the calculation of the EFDs)
The Elliptic Fourier Descriptors (EFDs) method, a
mathematical method for the description of contours, is often
used for the investigation of organs in organisms (Furuta et al.
1995, Keith et al. 2013). In this study, we determined the EFDs
with the use of freeware, SHAPE (Iwata and Ukai, 2002), which
was designed for the quantitative evaluations of biological
shapes based on EFDs. SHAPE has functions for extracting
contour lines from the digital images of (plant) organs, and for
investigating the contours in quantitative ways based on EFDs
and principal component scores. The features extracted as
principal components are visualized, and the scores are used as
the descriptors of the features. SHAPE runs on Windows
OS, and includes four applications: (1) the extraction of contour
lines, (2) the derivation and normalization of the EFDs, (3) a
principal component analysis, and (4) visualization of the PCA
results (Figure 8). We input the chain codes data to SHAPE,
and we investigated the fruit shapes by using functions (2) to
(4). The obtained Fourier series was used for machine learning
as a feature descriptor.
Figure 8. The work of SHAPE
4. METHODS
4.1 Machine learning
Several studies have reported the classification of shapes by
machine learning: the classification of handwritten numeric
characters using chain codes and Fourier descriptors (G. G.
Rajput, 2010), the classification of individual fruit and plant
patterns (Anderson et al. 2010), and the classification of specific
varieties based on contour lines (Adams et al. 2017). Backhaus
et al. (2010) developed a phenotyping tool that represents
changes in the partial shape of a leaf as contour bending energy.
In fruit classification, Arivazhagan et al. (2010) reported the
identification of 15 different species from 2,635 digital fruit
images using features derived from wavelet
transformation. The classification of fruits of different species by
SVM and k-NN was also reported by Zhang et al. (2012) and
Zawbaa et al. (2014). Identifying fruits at the species level
(e.g., apple versus banana) is easier than classifying
fruit shape patterns within a single species (as in this study) because
fruits of different species differ obviously in shape.
Several studies have reported the
classification of fruits within a single species: separating 'good' and
'defective' strawberry fruits with a neural network (Morimoto et al.,
2000), classifying lemon fruits into three types by colour and
volume (Khojastehnazh et al. 2010), and classifying
strawberry fruits into four shape patterns (Nagata et al. 1996 and
Liming et al. 2010). Gonzalo et al. (2009) classified tomato
fruits into nine shape patterns for genetic analysis; in that case,
they used images of sliced fruit, which provide more
features than surface appearance. Our methods distinguish
finer differences than those previous studies, based on digital
images of surface appearance.
We first discuss the most appropriate machine learning
approach for the classification of strawberry shapes into the nine
types. We focused on two criteria: (1) the accuracy of
classification and (2) a small number of parameters, in order
to minimize the process of trial and error. Deep learning was
excluded from the candidates because of the small size of the
training and assessment data in this study (1,500 data points
across the nine types).
The random forest approach was reported to show higher
accuracy compared to other approaches (Kobayashi, 2011).
Among the programs available for use with a random forest
approach, the scikit-learn module in Python has the benefit of a
smaller number of main parameters that require settings by a
learner. Table 2 summarizes the results of the comparisons of
characteristics between a random forest classifier and a support
vector machine (SVM) classifier, which is currently the most
commonly used classifier. An SVM requires normalization of the
features and tuning of the kernel and its parameters.
In this study, we planned to perform tests 36 times (the number
of possible pairs from the nine types: 9C2 = 36) for determining
the feature descriptors. To avoid configuring the parameters 36
times, we used a random forest approach.
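The count of 36 pairwise tests follows directly from choosing 2 of the 9 shape types. A short sketch, with the types simply numbered 1 to 9 (an assumption for illustration):

```python
from itertools import combinations
from math import comb

# Shape types are numbered 1..9 here purely for illustration.
shape_types = range(1, 10)

# Every unordered pair of distinct types, e.g. (1, 2), (1, 3), ..., (8, 9).
pairs = list(combinations(shape_types, 2))

print(len(pairs), comb(9, 2))  # 36 36
```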
Table 2. Comparisons of random forest and SVM
The random forest was coded with the following parameters.
A: The number of binary decision trees in learning. A larger
number of binary decision trees increases the accuracy of
classification; but it requires more calculation. We used 500
binary decision trees in this study, in accord with the
recommendation of Breiman (2001).
B: The maximum number of feature descriptors used for the
learning in each binary decision tree. We used √M, where M is the
number of explanatory variables, according to Breiman (2001).
C: The maximum depth in each binary decision tree.
'Random forest' is an ensemble learning approach using binary
decision trees. The depth of each decision tree affects the
variances of the learning model. According to Breiman (2001),
the possibility of over-fitting is increased when the maximum
depth is increased. However, the proper depth changes
depending on the data set used. In this study, we configured the
maximum depth as 1 and selected effective feature descriptors
for the test as described below in section 5.1. In the test
explained in section 5.2, the entire condition of the
identification learning data was changed depending on the
selected feature descriptors. Therefore, the maximum depth was
configured as 7. The use of two maximum depth values was
expected to suppress the possibility of over-fitting.
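As a rough illustration, parameters A to C might map onto scikit-learn's `RandomForestClassifier` as sketched below. The mapping to `n_estimators`, `max_features` and `max_depth`, the helper name `make_forest` and the fixed `random_state` are assumptions; the paper does not list its code.

```python
from sklearn.ensemble import RandomForestClassifier


def make_forest(max_depth, seed=0):
    """Build a random forest configured along the lines of parameters A-C."""
    return RandomForestClassifier(
        n_estimators=500,      # A: number of decision trees
        max_features="sqrt",   # B: sqrt(M) candidate features (scikit-learn applies this per split)
        max_depth=max_depth,   # C: maximum depth of each tree
        random_state=seed,     # assumption: fixed only to make runs repeatable
    )


pairwise_forest = make_forest(max_depth=1)    # pairwise tests (section 5.1)
nine_class_forest = make_forest(max_depth=7)  # nine-type test (section 5.2)
```

With a depth of 1, each tree is a single threshold on one descriptor, so the pairwise tests mainly reveal which descriptors separate two types on their own.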
4.2 Feature descriptors
We used the following four feature descriptors: Measured
Values (MVs), the Ellipse Similarity Index (ESI), EFDs, and
Chain Code Subtraction (CCS).
4.2.1 Measured Values (MVs): Five measured values were
used: (1) the length of the contour line, (2) the area, (3) the
major axis of the approximate ellipse (i.e., the fruit length), (4)
the minor axis of the approximate ellipse (i.e., the fruit width),
and (5) the ratio of the fruit width/length.
4.2.2 Ellipse Similarity Index (ESI): To investigate the
ellipse similarity of the fruits, we used two feature descriptors:
the optimum ellipse area ratio, and the optimum ellipse
boundary length ratio.
(a) Optimum ellipse area ratio (ER) = Optimum ellipse area (EOA) / Fruit
area (FA), where EOA = a*b*π/4 (a = length, i.e., the long axis;
b = width, i.e., the short axis)
(b) Optimum ellipse boundary length ratio = Optimum ellipse
boundary length (L) / fruit boundary length (LI), where L is obtained with
the following equation:
\[ L = \pi \, \frac{a+b}{2} \left[ 1 + \frac{3\left(\frac{a-b}{a+b}\right)^{2}}{10 + \sqrt{4 - 3\left(\frac{a-b}{a+b}\right)^{2}}} \right] \qquad (1) \]
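A minimal sketch of the two ESI descriptors. It assumes that a and b are the full lengths of the long and short axes (consistent with the area a*b*π/4) and that Equation (1) is Ramanujan's approximation for the perimeter of an ellipse; the function names are hypothetical.

```python
import math


def ellipse_area(a, b):
    """Area of an ellipse whose full axis lengths are a and b."""
    return math.pi * a * b / 4.0


def ellipse_perimeter(a, b):
    """Ramanujan's approximation of the perimeter for full axis lengths a, b."""
    h = ((a - b) / (a + b)) ** 2
    return math.pi * (a + b) / 2.0 * (1.0 + 3.0 * h / (10.0 + math.sqrt(4.0 - 3.0 * h)))


def ellipse_similarity_index(length, width, fruit_area, fruit_perimeter):
    """Return (area ratio, boundary length ratio) of the fitted ellipse to the fruit."""
    area_ratio = ellipse_area(length, width) / fruit_area
    boundary_ratio = ellipse_perimeter(length, width) / fruit_perimeter
    return area_ratio, boundary_ratio


# Sanity check: a perfectly circular "fruit" of diameter 2 gives ratios of 1.
print(ellipse_similarity_index(2.0, 2.0, math.pi, 2.0 * math.pi))
```

Both ratios approach 1 as the fruit outline approaches its fitted ellipse, so deviations from 1 indicate how far the shape departs from an ellipse.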
4.2.3 Elliptic Fourier Descriptors (EFDs):
The EFD method (Kuhl and Giardina, 1982) uses X and Y
coordinates of the point of the contour as a function of the arc
length. Here, we determined the EFDs for the contour line of
strawberries. First, suppose a point 'P' circles a contour line of a
fruit from the starting point of 'S' at a constant speed. With the
coordinates of 'P' at the time point 't' as x(t) and y(t), P
represents a periodic function rotating with a period of 'T'
because P returns to the starting point 'S' in each round. A
periodic function is represented as a Fourier descriptor by
Fourier transform.
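Here, x(t) and y(t) are represented as follows:
\[ x(t) = \frac{a_0}{2} + \sum_{n=1}^{N} \left( a_n \cos\frac{2 n \pi t}{T} + b_n \sin\frac{2 n \pi t}{T} \right) \qquad (2) \]
\[ y(t) = \frac{c_0}{2} + \sum_{n=1}^{N} \left( c_n \cos\frac{2 n \pi t}{T} + d_n \sin\frac{2 n \pi t}{T} \right) \qquad (3) \]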
In this case, a_n, b_n, c_n, and d_n are the coefficients of the EFDs. The
first terms of the formulas, a_0 and c_0, represent the X and Y
coordinates of the origin. The shape information is thus contained in the
coefficients of the EFDs. Both formulas are Fourier series
with N harmonics, and therefore a more refined contour can be
represented with a higher N. Figure 9 shows the
reconstruction of a contour line of a strawberry from N=1
(ellipse) to a higher order (N=80), representing the contour
line with harmonic ellipses. We performed a PCA with the
coefficients in order to extract the major differences between the
shapes. We used the first to fifth principal components by
following the approach using the principal component scores
(Rohlf and Archie, 1984).
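To make the role of the coefficients concrete, the sketch below evaluates the truncated Fourier series for x(t) and y(t) from a given set of coefficients. It is an illustration only: the function name, the argument layout and the example coefficients are assumptions, and it does not reproduce the normalization performed by SHAPE.

```python
import math


def reconstruct_contour(a0, c0, coeffs, num_points=100, period=1.0):
    """Evaluate x(t), y(t) from EFD coefficients at evenly spaced t.

    coeffs is a list of (a_n, b_n, c_n, d_n) tuples for n = 1..N.
    """
    points = []
    for k in range(num_points):
        t = period * k / num_points
        x = a0 / 2.0
        y = c0 / 2.0
        for n, (an, bn, cn, dn) in enumerate(coeffs, start=1):
            angle = 2.0 * math.pi * n * t / period
            x += an * math.cos(angle) + bn * math.sin(angle)
            y += cn * math.cos(angle) + dn * math.sin(angle)
        points.append((x, y))
    return points


# N = 1 gives an ellipse: here semi-axes 3 and 2, centred at (5, 4).
ellipse = reconstruct_contour(a0=10.0, c0=8.0,
                              coeffs=[(3.0, 0.0, 0.0, 2.0)], num_points=4)
print(ellipse)  # approximately (8, 4), (5, 6), (2, 4), (5, 2)
```

Appending further coefficient tuples adds higher harmonics, which is how the reconstruction is refined from N=1 towards N=80.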
Figure 9. Reconstruction of a contour line of a strawberry by Fourier
series. White and blue lines show the contour lines of the
strawberry and the harmonic ellipses, respectively.
4.2.4 Chain Code Subtraction (CCS): The directions of the
connection points of the chain codes are shown with eight
direction codes, and we thus considered that the appearance
ratio of the eight directions represented the shape of the fruit.
Figure 10 shows the appearance ratio of chain codes (the
maximum ratio is 1) of the chain code representative value (CR)
of each fruit shape estimated based on the sample patterns in
Figure 3. The x- and y-axes represent the nine types of
strawberry shape and the average of the appearance ratio,
respectively. The colours of the bars show the eight directions.
The chain code subtraction (CCS) was calculated by subtracting
the chain codes of each fruit (CS) from the CR.
CCSn = CRn − CSn (n = fruit type 1–9) (4)
The histograms of CCS values are shown in Figure 11. The
distribution patterns of the histograms were different for each
shape type. Because the differences of distribution pattern were
more clearly represented in CCS, we added CCS in the
subsequent analysis as a feature descriptor.
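One possible reading of Equation (4) is sketched below: the appearance ratio of the eight directions is computed for a fruit (CS) and subtracted from the representative ratios of each shape type (CR). The function names and the representative values in the example are hypothetical, not taken from Figure 10.

```python
def direction_ratios(codes):
    """Appearance ratio of each of the eight chain-code directions (sums to 1)."""
    total = len(codes)
    return [codes.count(d) / total for d in range(8)]


def chain_code_subtraction(fruit_codes, representatives):
    """Return CCS_n = CR_n - CS for every shape type n.

    representatives maps a type number to its eight representative ratios (CR).
    """
    cs = direction_ratios(fruit_codes)
    return {
        shape_type: [cr_d - cs_d for cr_d, cs_d in zip(cr, cs)]
        for shape_type, cr in representatives.items()
    }


# Hypothetical representative ratios for two types only (illustrative values).
representatives = {
    1: [0.25, 0.0, 0.25, 0.0, 0.25, 0.0, 0.25, 0.0],
    2: [0.125] * 8,
}
ccs = chain_code_subtraction([0, 2, 4, 6], representatives)
print(ccs[1])  # all zeros: the fruit matches type 1's representative exactly
```

Under this reading, a fruit whose direction histogram is close to a type's representative gives CCS values near zero for that type, which is the signal the classifier can exploit.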
Figure 10. Distribution of CR.
Figure 11. Distribution of CCS
5. RESULTS
5.1 Exploring the variables that affect the classification of
fruit shapes
To explore the variables that affect the classification of fruit
shapes, we performed a classification test using each pair of the
nine types of shape, in a round-robin system (9C2 = 36 pairs). A
total of 85 to 100 images of the fruits were selected for each
shape type in order to avoid bias in the learning data. The
images of each type were divided at a 7:3 ratio, and the former
and latter images were used as the learning data and test data,
respectively. The numbers of images of each shape type are
shown in Table 3.
Table 3. Number of images of each shape type used in the
training data and test data
We tested the usefulness of the MVs (the length of the contour
line, the area, the fruit length, width, and width/length) for the
classification of strawberry shapes, and we then added the ESI,
EFD and CCS to the MVs and investigated the effectiveness of
these additions by using the random forest approach. The depth
of the decision tree was set as 1 because the test was for the
classification of pairs of the nine shape types. The coefficient of
agreement defined by Cohen (1960; hereinafter, the Kappa
coefficient) was used as the index for the classification test. The
Kappa coefficient is an index representing the degree of
agreement between two examiners. It is calculated by subtracting
the agreement expected by chance from the observed agreement
and dividing the result by one minus the chance agreement. In
the present study's test, the Kappa coefficient showed the
agreement between the shape types predicted by the classifier
trained on the learning data and the labelled types of
the test data. Kappa coefficients of ≥0.9, ≥0.8,
≥0.7, ≥0.6 and <0.6 are rated as great, good, OK (fair),
possible, and re-work needed, respectively (Imai and Shiomi,
2004). Here we used a Kappa coefficient of 0.7 as the
decision threshold for successful classification.
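The Kappa coefficient and the rating bands above can be written out as a short sketch. The function names and the example labels are illustrative assumptions; the formula is Cohen's standard definition, kappa = (p_o - p_e) / (1 - p_e).

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Cohen's Kappa: (observed - chance agreement) / (1 - chance agreement).

    Undefined (division by zero) when chance agreement equals 1.
    """
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    return (observed - expected) / (1.0 - expected)


def rate(kappa):
    """Rating bands as quoted from Imai and Shiomi (2004)."""
    if kappa >= 0.9:
        return "great"
    if kappa >= 0.8:
        return "good"
    if kappa >= 0.7:
        return "OK (fair)"
    if kappa >= 0.6:
        return "possible"
    return "re-work needed"


# Hypothetical test set for one pair (Type 1 vs Type 7) with one error in 12.
y_true = [1] * 6 + [7] * 6
y_pred = [1] * 5 + [7] * 7
kappa = cohen_kappa(y_true, y_pred)
print(round(kappa, 3), rate(kappa), kappa >= 0.7)  # 0.833 good True
```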
5.1.1 Classification by MVs: In Figures 12–17, the x-axis
shows the 36 pairs and the y-axis shows the Kappa coefficient.
Sixteen of the 36 pairs showed a Kappa coefficient ≥0.7,
suggesting that these 16 pairs were accurately classified by the MVs
(Figure 12). The results suggested that the MVs are not
sufficient for the classification of all of the strawberry shape
types, but the MVs are useful in several combinations. We
therefore concluded that the MVs could be used as the basic
information for classification, and we used them for the
subsequent analyses.
Figure 12. The results of classification by MVs
5.1.2 The ESI, EFD, and CCS: A further classification was
performed by adding the ESI, the EFD, and the CCS,
respectively to the MVs. The results are indicated in Figure 13.
Figure 13. The classification results by adding the ESI (blue),
the EFD (orange) and the CCS (grey) to the measured values.
The numbers of pairs that showed Kappa coefficients ≥0.7
were:
a) MVs + ESI = 21/36 pairs
b) MVs + EFD = 23/36 pairs
c) MVs + CCS = 33/36 pairs
Larger Kappa coefficients were shown for many of the pairs
with the additional descriptors. MVs + CCS showed the highest
classification ability. Based on the results described above, we
used the following combinations for a further classification test:
d) MVs + CCS + ESI
e) MVs + CCS + EFD
The results are illustrated in Figure 14.
Figure 14. The classification results by MVs + CCS + ESI
(blue) and MVs + CCS + EFD (orange)
The numbers of pairs that showed Kappa coefficients ≥0.7
were:
d) MVs + CCS + ESI = 33/36 pairs
e) MVs + CCS + EFD = 35/36 pairs.
The improvement of classification ability was observed by
adding EFD to MVs + CCS, i.e., combination c). The addition
of ESI did not produce significant differences in the
classification by MVs + CCS.
To test the classification ability in a different way, we
performed a 3-fold cross-validation for the combination of c)
and e) above, which showed the best and second-best results.
The structure of the data set is shown in Figure 15, and the
results are illustrated in Figures 16 and 17.
Figure 15. Structure of data sets in the 3-fold cross-validation
Figure 16. The classification results with MVs + CCS
in the 3-fold cross-validation.
Figure 17. The classification results with MVs + CCS + EFD
in the 3-fold cross-validation.
The numbers of pairs that showed Kappa coefficients ≥0.7 in all
three folds were:
c) MVs + CCS = 31/36 pairs
e) MVs + CCS + EFD = 34/36 pairs
Based on these results, we concluded that the best combination
for the classification was e), i.e., MVs + CCS + EFD.
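A sketch of how a 3-fold cross-validation of one pair could be run with scikit-learn, counting the pair as successful only if the Kappa coefficient reaches 0.7 in every fold. The stratified, shuffled split, the helper name `pair_passes_cv` and the synthetic feature data are assumptions; the actual data structure is the one shown in Figure 15.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import cohen_kappa_score
from sklearn.model_selection import StratifiedKFold


def pair_passes_cv(X, y, threshold=0.7, seed=0):
    """Run a 3-fold cross-validation and report whether every fold reaches the threshold."""
    folds = StratifiedKFold(n_splits=3, shuffle=True, random_state=seed)
    kappas = []
    for train_idx, test_idx in folds.split(X, y):
        forest = RandomForestClassifier(
            n_estimators=500, max_features="sqrt", max_depth=1, random_state=seed
        )
        forest.fit(X[train_idx], y[train_idx])
        kappas.append(cohen_kappa_score(y[test_idx], forest.predict(X[test_idx])))
    return all(k >= threshold for k in kappas), kappas


# Synthetic stand-in for the descriptors of two well-separated shape types.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (90, 4)), rng.normal(8.0, 1.0, (90, 4))])
y = np.array([1] * 90 + [7] * 90)

passed, kappas = pair_passes_cv(X, y)
print(passed, kappas)
```

Requiring the threshold in every fold is stricter than averaging across folds, which may explain why fewer pairs pass here than in the single 7:3 split.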
5.2 Classification test for the nine strawberry shape types
We performed a classification test for the nine strawberry shape
types with the combination of MVs + CCS + EFD for practical
applications. That is, the data of all nine shape types were used
at the same time for learning, and the classification ability was
determined. The recall ratio, i.e., the number of agreed / the number of
tested, was used to evaluate accuracy, because the
Kappa coefficient is not adequate for evaluating
accuracy in the classification of more than three types (Tsushima,
2002). The test design was as follows:
1) Seventy fruit images of each of the first eight types (Types
1–8) were used as learning data. For Type 9, 60 fruit images
were used due to the smaller total number of images. The total
number of fruit images for learning data was thus 620.
2) The fruit images for the test data did not overlap with the
learning data. A total of 871 fruit images were used (Table 4).
The numbers of images for each type ranged from 25 to 150.
3) The depth of the decision tree was set at 7 to increase the
accuracy of learning in the complex structure of learning data.
The results are shown in Table 4. A recall ratio ≥0.7 was
observed in eight of the nine types. The recall ratio of Type 1
was 0.71, which is only slightly above 0.7, and the
classification of this type is thus considered possible. Type 7
showed a lower recall ratio, 0.52, indicating that this type was
more difficult to classify than the other types.
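The recall ratio used here (number of agreed / number of tested, per shape type) can be computed as sketched below. The function name and the example labels are hypothetical; the learning-set arithmetic is taken from the test design.

```python
from collections import defaultdict


def recall_per_type(y_true, y_pred):
    """Recall ratio per shape type: agreed predictions / tested images."""
    tested = defaultdict(int)
    agreed = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        tested[t] += 1
        if t == p:
            agreed[t] += 1
    return {t: agreed[t] / tested[t] for t in sorted(tested)}


# Learning-set size from the test design: 70 images x 8 types + 60 for Type 9.
learning_total = 8 * 70 + 60  # 620

# Hypothetical labels for a handful of test images of Types 1 and 7.
y_true = [1, 1, 1, 1, 7, 7]
y_pred = [1, 1, 1, 7, 7, 1]
print(recall_per_type(y_true, y_pred))  # {1: 0.75, 7: 0.5}
```

Because recall is computed per type, it exposes a weak class such as Type 7 that a single overall accuracy figure would hide.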
Table 4. Classification results with MVs + CCS + EFD
and using all nine of the strawberry shape types at once
for the learning data
6. DISCUSSION AND SUMMARY
6.1 Results summary
The results described above in section 5.1.1 suggested that the
MVs (i.e., the length of the contour line, the area, the fruit
length and width, and the fruit width/length) were useful as
basal descriptors for the classification of strawberry shapes. The
next test, adding the ESI, the EFD, or the CCS to the MVs,
improved the classification ability, in the descending order
CCS > EFD > ESI. Based on the
subsequent 3-fold cross-validation, we concluded that the best
combination for the classification of strawberry shapes was
MVs + CCS + EFD. We further investigated the classification
ability of MVs + CCS + EFD by using the learning data with the
nine shape types simultaneously. Although the Type 7 results
indicated that re-working was necessary, the other types showed
recall ratios >0.7 (Types 1, 2, 3, 4, 5, 6, 8 and 9). These results
suggested that eight of the nine strawberry shape types
could be classified by using MVs + CCS + EFD.
Nagata et al. (1996) reported that the agreement ratio (recall)
between their method and human judgement was 71% in the
classification of four types of strawberry fruit shape (triangle,
square, round and abnormal), and they considered this a 'high' ratio
because the average agreement ratio of classifications among
multiple persons was 67%. In this study, we classified
strawberry fruits into nine types, which is a more complex
classification than that of Nagata et al. (1996). We therefore concluded
that an agreement ratio of 70% is a considerably high value.
6.2 Classification of type 7 and type 1
Both Type 7 and Type 1, which showed the worst and second-
worst accuracy rates, showed a horizontal ellipse-based shape.
The difference between these two types is the shape of the top
part; Type 1 is curving and Type 7 is flat. This difference is
often subtle and difficult to identify, because the degree of
curving of the top part changes depending on the direction of
the set position of the fruit on the tray. Therefore, for the
classification between Type 1 and Type 7, it is necessary to
improve the classification ability for the top part of the fruit.
We considered several options to improve the accuracy of
distinguishing Types 1 and 7: (1) zooming in on the top part of
the fruit and adding a further image analysis; (2) increasing the
number of chain code sections (e.g., from eight to 16) for higher
accuracy in the expression of the shape; (3) increasing the
number of sample images; and (4) trying reinforcement learning.
The solutions from (2) to (4) are general approaches and are
effective for a variety of targets. The effectiveness of
reinforcement learning has been reported and is expected to
become one of the most promising approaches for increasing
the accuracy of shape classification.
Kochi et al. (2018) reported a robust method for the reconstruction
of three-dimensional models of strawberry fruits from digital
images. The use of three-dimensional models would further improve
accuracy by supporting solutions (1) to (4),
because a larger number of contour lines (or descriptors) could be
extracted from the circumferential shape of 3D models than from 2D
images, increasing the data available for analysis and learning.
6.3 The classification abilities of CCS and EFD
Our present findings indicated that the Fourier series and chain
codes, which represent the features of shapes in a numeric form,
were effective in the classification of strawberry shapes. The
CCS in particular showed higher classification ability than the
EFD and the ESI. Because the CCS was determined based on
the classification results of the sample patterns by humans, it
includes the human knowledge in the descriptor. In other words,
CCS is a descriptor that adds human knowledge to the chain
codes, and the combination resulted in a more robust
classification than the other indices. Because the classification
with MVs + CCS showed a Kappa coefficient <0.7 only for the
classifications of Type 1 versus Type 7, Type 2 versus Type 4,
and Type 2 versus Type 5, we inferred that CCS better
classifies the shape of the bottom part of the strawberry (see
Figure 13).
EFDs quantify the features of fruit shapes and include
comprehensive information such as sharpness in the bottom part,
roundness of the fruit, and curving or raising of the top part. Their
effectiveness was lower than that of the chain codes. However, by
adding the EFD to MVs + CCS, Type 2 versus Type 4 and Type
2 versus Type 5 were successfully classified (see Figure 14).
We thus considered that using EFDs improved the classification
of the shape of the strawberries' top part.
6.4 Conclusion
The results obtained in this study suggested machine learning's
high ability to classify fruit shapes. Machine learning has so far
gained more attention in the study of plant genetics than in
the identification of plant shapes. Large-scale investigations of
the phenotypes of organisms are known as
'phenomics', which is becoming a significant field in biology. It
is expected that machine learning will contribute to phenomics
as well as genetics. Based on the present findings, we will
attempt to increase the classification accuracy of machine
learning and the number of target plant species. The system's
packaging and automation are also necessary for practical
applications.
6.5 Future perspectives
Automation and labour-saving in the classification of strawberry
fruit shapes are desired by strawberry farmers, distributors and
breeders. Classifying fruit by shape and packing it occupies
more than 60% of farmers' working time (Suenaga et al., 1989).
Farmers must pack the fruits within a limited time; for example,
20,000 fruits are classified and packed (Hayashi et al., 2004).
Accordingly, Hayashi et al. (2004) estimated that more than 80% of
workers desired automation of classification. Meanwhile, only a
few studies have reported the use of machine learning for the
classification of fruit shapes, for example into large or small
fruit sizes, or into four types (Cao et al., 1996). The results
obtained in this study suggest that machine learning is
effective for classifying fruit shape into nine types. In
order to automate the process of strawberry shape
classification, we will improve the classification accuracy of
machine learning and try to develop an automated system for
fruit classification.
The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLII-2, 2018 ISPRS TC II Mid-term Symposium “Towards Photogrammetry 2020”, 4–7 June 2018, Riva del Garda, Italy
This contribution has been peer-reviewed. https://doi.org/10.5194/isprs-archives-XLII-2-463-2018 | © Authors 2018. CC BY 4.0 License.
REFERENCES
Adams, B., Venitha, K., Upasana, S., Fawzi, M., and Sameerchand, P.,
2017. Automatic Recognition of Medicinal Plants using Machine
Learning Techniques. IJACSA, 8(4), pp. 166-175.
Anderson, R., Daniel, C., Hauagge, J.W., and Siome, G., 2010.
Automatic fruit and vegetable classification from images. Computers
and Electronics in Agriculture, 70(1), pp. 96–104.
Arivazhagan, S., Shebiah, R. N., Nidhyanandhan, S. S., and Ganesan,
L., 2010. Fruit recognition using color and texture features. Journal of
Emerging Trends in Computing and Information Sciences, 1(2), pp. 90-94.
Backhaus, A., Kuwabara, A., Bauch, M., Monk, N., Sanguinetti, G.,
and Fleming, A., 2010. LEAFPROCESSOR: a new leaf phenotyping
tool using contour bending energy and shape cluster analysis. New
Phytologist, 187(1), pp. 251-261.
Breiman, L., 2001. Random Forests. Machine Learning, 45(1), pp. 5-32.
Cao, Q., Nagata, M., Mitarai, M., Fujiki, T., and Kinoshita, O., 1996.
Study on Grade Judgment of Fruit Vegetables Using Machine Vision
(Part 2). Journal of SHITA, 8(4), pp. 228-236.
Cohen, J., 1960. A coefficient of agreement for nominal scales.
Educational and Psychological Measurement, 20(1), pp. 37-46. doi:
10.1177/001316446002000104.
Freeman, H., 1974. Computer processing of line-drawing images. ACM
Computing Surveys, 6(1), pp. 57-97.
Furuta, N., Ninomiya, S., Takahashi, N., Ohmori, H., and Ukai, Y., 1995.
Quantitative Evaluation of Soybean (Glycine max L. Merr.) Leaflet
Principal Component Scores Based on Elliptic Fourier Descriptor.
Breeding Science, 45(3), pp. 315-320.
Rajput, G.G., and Mali, S.M., 2010. Marathi Handwritten Numeral
Recognition using Fourier Descriptors and Normalized Chain Code.
IJCA, Special Issue on RTIPPR(3), pp. 141–145.
Gonzalo, M. J., Brewer, M. T., Anderson, C., Sullivan, D., Gray, S.,
and van der Knaap, E., 2009. Tomato fruit shape analysis using
morphometric and morphology attributes implemented in Tomato
Analyzer software program. Journal of the American Society for
Horticultural Science, 134(1), pp. 77-87.
Hayashi, S., Ota, T., Kubota, K., and Ajiki, K., 2004. The questionnaire
survey about laborsaving of the harvest fruit-sorting work of a
strawberry. Journal of the Japanese Society of Agricultural Machinery,
66 (Supplement), pp. 111-112. (in Japanese).
Hayashi, A., Tanabata, T., and Wada, T., 2017b. A proposal of image
analysis system for measuring strawberries. Hort J, 16(1), pp. 446. (in
Japanese).
Imai, Z., and Shiomi, T., 2004. Reliability of evaluation in physical
therapy Studies. J Phys Ther Sci, 19(3), pp. 261-265. (in Japanese).
Iwata, H., and Ukai, Y., 2002. SHAPE: A computer program package
for quantitative evaluation of biological shapes based on elliptic Fourier
descriptors. J Heredity, 93(5), pp. 384-385.
Keith, W., Jesse, M., Mark, S., 2013. Comparison of digital image
analysis using elliptic Fourier descriptors and major dimensions to
phenotype seed shape in hexaploid wheat (Triticum aestivum L.).
Euphytica, 190(1), pp. 99-116.
Khojastehnazhand, M., Omid, M., and Tabatabaeefar, A., 2010.
Development of a lemon sorting system based on color and
size. African Journal of Plant Science, 4(4), pp. 122-127.
Kobayashi, Y., Tanaka, S., and Tomiura, Y., 2011. Classification and
assessment of English scientific papers using random forests. IPSJ SIG
Technical Report, 2011-CH-90(6), pp. 1-8. (in Japanese).
Kochi, N., Tanabata, T., Hayashi, A., and Isobe, S., 2018. Development
of 3D measuring system and measurement assessment of strawberry
fruits. Large-Scale Point Cloud Processing, International Journal of
Automation Technology. Vol.12 No.3.
Kuhl, F.P., and Giardina, C.R., 1982. Elliptic Fourier features of a
closed contour. Computer Graphics and Image Processing, 18(3), pp.
236-258.
Liming, X., and Yanchao, Z., 2010. Automated strawberry grading
system based on image processing. Computers and Electronics in
Agriculture, 71, pp. S32-S39.
MAFF (Ministry of Agriculture, Forestry and Fisheries), Plant Variety
Protection PVP Office. Japan Test Guidelines. 2011. Fragaria L.
(strawberry).
http://www.hinsyu.maff.go.jp/info/sinsakijun/kijun/1289.pdf [accessed
January, 1, 2018].
Morimoto, T., Takeuchi, T., Miyata, H., and Hashimoto, Y., 2000.
Pattern recognition of fruit shape based on the concept of chaos and
neural networks. Computers and Electronics in Agriculture, 26(2),
pp. 171-186.
Nagata, M., Kinoshita, O., Asano, K., Cao, Q., and Hiyoshi, K., 1996.
Studies on Automatic Sorting System for Strawberry (Part 2):
Development of Sorting System Using Image Processing. Journal of the
Japanese Society of Agricultural Machinery, 58(6), pp. 61-67.
Rohlf, F.J., Archie, J.W., 1984. A comparison of Fourier methods for
the description of wing shape in mosquitoes (Diptera: Culicidae). Syst
Zool, 33(3), pp. 302-317.
Scikit-learn. Machine Learning in Python. http://scikit-learn.org/stable/
[accessed January 1, 2018].
Suenaga, T., Imamura, Y., Maeda, K., Yamada, T., and Takamatsu, M.,
1989. The workloads of farmers who sort and pack strawberries in
accordance with standards of shipment and their awareness of standards
of shipment. Journal of the Japanese Association of Rural Medicine,
38(4), pp. 895-907.
Tanabata, T., Shibuya, T., Hori, K., Ebana, K., and Yano, M., 2012.
SmartGrain: High-throughput phenotyping software for measuring seed
shape through image analysis. Plant Physiol, 160(4), pp. 1871-1880.
Tsushima, E., 2002. Application of Coefficient of Reliability to the
Studies of Physical Therapy. Rigakuryoho Kagaku, 17(3), pp. 181-187.
Wada, T., Oku, K., Nagano, S., Isobe, S., Suzuki, H., Mori, M., Takata,
K., Hirata, C., Shimomura, K., Tsubone, M., Katayama, T., Hirashima,
K., Uchimura, Y., Ikegami, H., Sueyoshi, T., Obu, K., Hayashida, T.,
and Shibato, Y., 2017a. Development and characterization of a
strawberry MAGIC population derived from crosses with six cultivars.
Breed. Sci, 67, pp. 370-381.
Zawbaa, H. M., Abbass, M., Hazman, M., and Hassenian, A. E., 2014.
Automatic fruit image recognition system based on shape and color
features. In International Conference on Advanced Machine Learning
Technologies and Applications, pp. 278-290. Springer, Cham.
Zhang, Y., and Wu, L., 2012. Classification of fruits using computer
vision and a multiclass support vector machine. Sensors, 12(9),
pp. 12489-12505.