CLASSIFICATION OF STRAWBERRY FRUIT SHAPE BY MACHINE … · extracts the fruit from the image's...

8
CLASSIFICATION OF STRAWBERRY FRUIT SHAPE BY MACHINE LEARNING T. Ishikawa 1 , A. Hayashi 2 , S.Nagamatsu 3 , Y. Kyutoku 1 , I. Dan 1, T. Wada 3 , K. Oku 3 , Y. Saeki 3 , T. Uto 3 , T. Tanabata 2 , S. ISOBE 2 N. Kochi 1, 2 * 1 Faculty of science and Engineering, Chuo University, 1-13-27, Kasuga, Bunkyo-ku, Tokyo, 112-8551, Japan [email protected] 2 Department of Frontier Research Plant Genomics and Genetics, Kazusa DNA Research Institute, 2-6-7, Kazusa-Kamatari, Kisarazu, Chiba, 292-0818, Japan 3 Agro-Environment Sciences and Biotechnology Div., Fukuoka Agriculture and Forestry Research Center, 587, Yoshiki, Chikushino, Fukuoka, 818-8549, Japan Commission II, WG II/10 KEY WORDS: Shape Recognition, Classification, Measurement, Chain code, Machine learning, Random Forest, Agriculture ABSTRACT: Shape is one of the most important traits of agricultural products due to its relationships with the quality, quantity, and value of the products. For strawberries, the nine types of fruit shape were defined and classified by humans based on the sampler patterns of the nine types. In this study, we tested the classification of strawberry shapes by machine learning in order to increase the accuracy of the classification, and we introduce the concept of computerization into this field. Four types of descriptors were extracted from the digital images of strawberries: (1) the Measured Values (MVs) including the length of the contour line, the area, the fruit length and width, and the fruit width/length ratio; (2) the Ellipse Similarity Index (ESI); (3) Elliptic Fourier Descriptors (EFDs), and (4) Chain Code Subtraction (CCS). We used these descriptors for the classification test along with the random forest approach, and eight of the nine shape types were classified with combinations of MVs + CCS + EFDs. CCS is a descriptor that adds human knowledge to the chain codes, and it showed higher robustness in classification than the other descriptors. Our results suggest machine learning's high ability to classify fruit shapes accurately. We will attempt to increase the classification accuracy and apply the machine learning methods to other plant species. * Corresponding author 1. INTRODUCTION Shape is one of the most important traits of agricultural products due to its relationships with the quality, quantity and value of the products. Shape is also important in cultivar identification, as cultivars are defined by their morphological characteristics. Although mechanization and automation are used widely in agriculture (mainly for cultivation systems such as planting and harvesting), automated technologies for the identification of the shapes of agricultural products have not been established. The identification of the shapes of agricultural products has been done for centuries by visual assessment only, and the criteria for judgement have not been well defined. It has been considered challenging to transfer the classification of shapes from visual assessment to computerization, in part because it is difficult to verbally describe shapes in detail and in a standard manner. Strawberries are an important fruit species used worldwide as fresh or processed products. In Japan, strawberries are third in production value among agricultural crops (after rice and tomatoes), contributing a significant portion of the agricultural economy. Higher value is attached to shape as a trait for several horticultural products — including strawberries — compared to other crops, because it directly links to the price per unit. In addition, a specific fruit shape is often preferred such as a conical shape of strawberries for visual presentations. A cultivar's specific shape is also important for a brand's value; an example is the round shape of the strawberries of the cultivar ‘Fukuoka S6 Go’ (Trademark name, ‘'Amaou.') Agricultural products are not generated from artificial designs, and the shapes of the products are not uniform. In addition, the shapes of these products can change depending on the growing conditions. This is true for strawberries, and there are no well- defined guidelines for the classification of strawberry shapes. Therefore, sample patterns defined by agriculture authorities have been used for shape classification, and the judgement is done by a human's comparison of the sample pattern and the fruit. This judgement often changes depending on the 'human sensor' used, i.e., among individuals. Practice is thus required to increase a human sensor's precision. In Japan, the nine types of strawberry shape are defined in the guideline issued for strawberries by the Plant Variety Protection (PVP) office at the Ministry of Agriculture, Forestry and Fisheries (MAFF, 2011). That guideline was issued for the registration of strawberry varieties, and the nine shape types are used in for characterizing the shape of developed and registered novel strawberry varieties. In this study, we tested the classification of strawberry shapes by machine learning with the goal of increasing the accuracy of The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLII-2, 2018 ISPRS TC II Mid-term Symposium “Towards Photogrammetry 2020”, 4–7 June 2018, Riva del Garda, Italy This contribution has been peer-reviewed. https://doi.org/10.5194/isprs-archives-XLII-2-463-2018 | © Authors 2018. CC BY 4.0 License. 463

Transcript of CLASSIFICATION OF STRAWBERRY FRUIT SHAPE BY MACHINE … · extracts the fruit from the image's...

Page 1: CLASSIFICATION OF STRAWBERRY FRUIT SHAPE BY MACHINE … · extracts the fruit from the image's background based on colours. The contour line of the fruit is determined by the software,

CLASSIFICATION OF STRAWBERRY FRUIT SHAPE BY MACHINE LEARNING

T. Ishikawa 1, A. Hayashi2, S.Nagamatsu3, Y. Kyutoku1, I. Dan1, T. Wada3, K. Oku3, Y. Saeki3, T. Uto3, T. Tanabata2, S. ISOBE2

N. Kochi1, 2 *

1 Faculty of science and Engineering, Chuo University, 1-13-27, Kasuga, Bunkyo-ku, Tokyo, 112-8551, Japan –

[email protected] 2 Department of Frontier Research Plant Genomics and Genetics, Kazusa DNA Research Institute, 2-6-7, Kazusa-Kamatari,

Kisarazu, Chiba, 292-0818, Japan 3 Agro-Environment Sciences and Biotechnology Div., Fukuoka Agriculture and Forestry Research Center, 587, Yoshiki,

Chikushino, Fukuoka, 818-8549, Japan

Commission II, WG II/10

KEY WORDS: Shape Recognition, Classification, Measurement, Chain code, Machine learning, Random Forest, Agriculture

ABSTRACT:

Shape is one of the most important traits of agricultural products due to its relationships with the quality, quantity, and value of the

products. For strawberries, the nine types of fruit shape were defined and classified by humans based on the sampler patterns of the

nine types. In this study, we tested the classification of strawberry shapes by machine learning in order to increase the accuracy of

the classification, and we introduce the concept of computerization into this field. Four types of descriptors were extracted from the

digital images of strawberries: (1) the Measured Values (MVs) including the length of the contour line, the area, the fruit length and

width, and the fruit width/length ratio; (2) the Ellipse Similarity Index (ESI); (3) Elliptic Fourier Descriptors (EFDs), and (4) Chain

Code Subtraction (CCS). We used these descriptors for the classification test along with the random forest approach, and eight of the

nine shape types were classified with combinations of MVs + CCS + EFDs. CCS is a descriptor that adds human knowledge to the

chain codes, and it showed higher robustness in classification than the other descriptors. Our results suggest machine learning's high

ability to classify fruit shapes accurately. We will attempt to increase the classification accuracy and apply the machine learning

methods to other plant species.

* Corresponding author

1. INTRODUCTION

Shape is one of the most important traits of agricultural

products due to its relationships with the quality, quantity and

value of the products. Shape is also important in cultivar

identification, as cultivars are defined by their morphological

characteristics. Although mechanization and automation are

used widely in agriculture (mainly for cultivation systems such

as planting and harvesting), automated technologies for the

identification of the shapes of agricultural products have not

been established. The identification of the shapes of agricultural

products has been done for centuries by visual assessment only,

and the criteria for judgement have not been well defined. It has

been considered challenging to transfer the classification of

shapes from visual assessment to computerization, in part

because it is difficult to verbally describe shapes in detail and in

a standard manner.

Strawberries are an important fruit species used worldwide as

fresh or processed products. In Japan, strawberries are third in

production value among agricultural crops (after rice and

tomatoes), contributing a significant portion of the agricultural

economy. Higher value is attached to shape as a trait for several

horticultural products — including strawberries — compared to

other crops, because it directly links to the price per unit. In

addition, a specific fruit shape is often preferred such as a

conical shape of strawberries for visual presentations. A

cultivar's specific shape is also important for a brand's value; an

example is the round shape of the strawberries of the cultivar

‘Fukuoka S6 Go’ (Trademark name, ‘'Amaou.')

Agricultural products are not generated from artificial designs,

and the shapes of the products are not uniform. In addition, the

shapes of these products can change depending on the growing

conditions. This is true for strawberries, and there are no well-

defined guidelines for the classification of strawberry shapes.

Therefore, sample patterns defined by agriculture authorities

have been used for shape classification, and the judgement is

done by a human's comparison of the sample pattern and the

fruit. This judgement often changes depending on the 'human

sensor' used, i.e., among individuals. Practice is thus required to

increase a human sensor's precision.

In Japan, the nine types of strawberry shape are defined in the

guideline issued for strawberries by the Plant Variety Protection

(PVP) office at the Ministry of Agriculture, Forestry and

Fisheries (MAFF, 2011). That guideline was issued for the

registration of strawberry varieties, and the nine shape types are

used in for characterizing the shape of developed and registered

novel strawberry varieties.

In this study, we tested the classification of strawberry shapes

by machine learning with the goal of increasing the accuracy of

The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLII-2, 2018 ISPRS TC II Mid-term Symposium “Towards Photogrammetry 2020”, 4–7 June 2018, Riva del Garda, Italy

This contribution has been peer-reviewed. https://doi.org/10.5194/isprs-archives-XLII-2-463-2018 | © Authors 2018. CC BY 4.0 License.

463

Page 2: CLASSIFICATION OF STRAWBERRY FRUIT SHAPE BY MACHINE … · extracts the fruit from the image's background based on colours. The contour line of the fruit is determined by the software,

the classification, and in order to introduce computerization into

this field. We describe the types of strawberry shape in Section

2, our image analysis (Section 3), machine learning and feature

descriptions for classification (Section 4), our results (Section

5), and a discussion and summary (Section 6).

2. THE NINE TYPES OF STRAWBERRY SHAPE

The nine types of strawberry shape defined by the PVP office at

MAFF (Fragaria L. Char. 37, Fruit: shape) are shown in Figure

1: reniform, conical, cordate, ovoid, cylindrical, rhomboid,

obloid, globose, and wedged.

Figure 1. Strawberry shapes

(cited from Fragaria L. Char. 37, Fruit: shape)

Table 1. The combinations of the three criteria used for the

classification of the nine types of strawberry shape

(a) Proportions of length and width (b) Shape of the top part

(raised, flat, curving)

(c) Shape of the bottom part

Figure 2. The combinations of the three criteria

Figure 3. Fruit images of the nine types

No verbalized definition of the shapes is described in the

guideline. However, according to the strawberry breeders at the

Fukuoka Agricultural and Forestry Research Centre, several

criteria are commonly recognized for the classification of the

nine strawberry shapes: (a) the proportions of the length and

width of the strawberry; (b) the shape of the top part, i.e., the

part connecting the hull (raised, flat, or curving); and (c) the

shape of the bottom part (sharpness or roundness). The

combinations of these three criteria for classification are

summarized in Table 1 and illustrated in Figures 2 (a) – (c).

Examples of the nine sample patterns of the fruit are provided in

Figure 3.

3. IMAGE ANARYSIS

3.1 Image capture

A total of 2,969 strawberry fruit images were captured by a

digital camera (Canon EOS 40D). The 388 strawberry plants in

a Multi-parent Advanced Generation Inter-crosses (MAGIC)

population, which was derived from six funder lines, were

grown at the Fukuoka Agricultural and Forestry Research

Centre (Wada et al. 2017a). The harvested fruits were put on

Styrofoam trays (Figure. 4), and the digital images were taken

under artificial lighting covered with scattering light film.

This time we don’t use the colour checker. We are going to use

it for evaluation of breed improvement in future. The image

resolution of the system in this study was 0.08 mm/ pixel and

measuring accuracy was estimated as < 0.8 mm when the range

of error by lens distortion was assumed within 10 pixels

(calibrated result). The acceptable range of error was considered

as of 0.5~1.0mm in practical measurement, therefore we

considered that camera calibration was not an essential process

in our measuring system and skipped the process to increase the

throughput of the work. The accuracy of measurement was

rather promoted by increase of number of leaning data. The

digital images were processed by the following image analysis.

Figure 4. Input image Figure 5. Extracted fruit images

3.2 Image analysis software

The length (i.e., the major axis of the ellipse), the width (the

minor axis of the ellipse) and the ratio of width/length in each

fruit image were determined based on the software developed

by Hayashi et al. (2017b). This software uses the algorithms

The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLII-2, 2018 ISPRS TC II Mid-term Symposium “Towards Photogrammetry 2020”, 4–7 June 2018, Riva del Garda, Italy

This contribution has been peer-reviewed. https://doi.org/10.5194/isprs-archives-XLII-2-463-2018 | © Authors 2018. CC BY 4.0 License.

464

Page 3: CLASSIFICATION OF STRAWBERRY FRUIT SHAPE BY MACHINE … · extracts the fruit from the image's background based on colours. The contour line of the fruit is determined by the software,

developed by Tanabata et al. (2012), and it automatically

extracts the fruit from the image's background based on colours.

The contour line of the fruit is determined by the software, and

ellipse approximation was performed for the measurement of

the length and width of the ellipse. Figures 4–6 are an input

image, extracted fruit images, and an approximate ellipse,

respectively.

We developed a program for generating chain codes (Freeman,

1974) of fruit contour lines in order to add feature descriptors.

The directions of approximate ellipses were identified by

extracting the top and bottom coordinates of the approximate

ellipses (Figure. 6). The start points of the chain codes were

identified, and the directions of connection points were

determined. We performed a principal component analysis

(PCA) with Elliptic Fourier Descriptors (EFDs) that were

determined based on the

chain codes as well as other

feature descriptors.

Figure 6. Approximate ellipse Figure 7. Encoding direction

3.3 SHAPE (software for the calculation of the EFDs)

The Elliptic Fourier Descriptors (EFDs) method, a

mathematical method for the description of contours, is often

used for the investigation of organs in organisms (Furuta et al.

1995, Keith et al. 2013). In this study, we determined the EFDs

with the use of freeware, SHAPE (Iwata and Ukai, 2002), which

was designed for the quantitative evaluations of biological

shapes based on EFDs. SHAPE has functions for extracting

contour lines from the digital images of (plant) organs, and for

investigating the contours in quantitative ways based on EFDs

and principal component scores. The features extracted as

principal components are visualized, and the scores are used as

the descriptors of the features. Shape is operated on Windows

OS, and includes four applications: (1) the extraction of contour

lines, (2) the derivation and normalization of the EFDs, (3) a

principal component analysis, and (4) visualization of the PCA

results (Figure 8). We input the chain codes data to SHAPE,

and we investigated the fruit shapes by using functions (2) to

(4). The obtained Fourier series was used for machine learning

as a feature descriptor.

Figure 8. The work of SHAPE

4. METHODS

4.1 Machine learning

Several studies were reported for the classification of shapes by

machine leaning; Classification of handwriting numeric

characters by using chain code and Fourier descriptors, (G. G.

Rajput, 2010), classification of individual fruits and plants

patterns (Anderson et al. 2010), and classification of specific

varieties based on contour lines. (Adams et al. 2017). Backhaus

et al. (2010) developed a phenotyping tool that represents the

change of partial shape of the leaf as contour bending energy.

In fruit classification, Arivazhagan et al. (2010) reported

identification of 15 different species from 2,635 digital fruit

images by using amount of characteristics derived from wavelet

transformation. Classification of fruits in different species by

SVM and K-NN were also reported by Zhang et al. (2012) and

Zawbaa et al. (2014). Identification of fruits in species level

(such as apple and banana) is more easier than classification of

fruit shape patterns in same species (such as this study) because

the target fruits shows obviously different shape among

different species. There were several studies reported

classification of fruits in same species; Separating 'good' and

'defective' strawberry fruits by neural network (Morimoto et al.,

2000), classification of lemon fruits in three types by color and

volume (Khojastehnazh et al. 2010), and classification of

strawberry fruits in four shape patterns (Nagata et al. 1996 and

Liming et al. 2010). Gonzalo et al. (2009) classified tomato

fruits in nine shape patterns for genetic analysis. In this case,

they use sliced fruit images that gave more amount of

characteristics than surface appearances. Our methods identify

more fine differences than previous studies based on digital

images of surface appearances.

We will first discuss the most appropriate machine learning

approach to the classification of strawberry shapes into the nine

types. We focused on two criteria: (1) the accuracy of

classification and (2) the smallest number of parameters needed

to minimize the process of trial and error. Deep learning was

passed over from the candidate because of the small size of

training and assessment data in this study (nine types in 1,500

data points).

The random forest approach was reported to show higher

accuracy compared to other approaches (Kobayashi, 2011).

Among the programs available for use with a random forest

approach, the scikit-learn module in Python has the benefit of a

smaller number of main parameters that require settings by a

learner. Table 2 summarizes the results of the comparisons of

characteristics between a random forest classifier and a support

vector machine (SVM) classifier, which is currently the most

commonly used classifier. An SVM requires the tuning of

kernels when normalized parameters and kernels are configured.

In this study, we planned to perform tests 36 times (the number

of possible pairs from the nine types: 9C2 = 36) for determining

the feature descriptors. To avoid configuring the parameters 36

times, we used a random forest approach.

Table 2. Comparisons of random forest and SVM

The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLII-2, 2018 ISPRS TC II Mid-term Symposium “Towards Photogrammetry 2020”, 4–7 June 2018, Riva del Garda, Italy

This contribution has been peer-reviewed. https://doi.org/10.5194/isprs-archives-XLII-2-463-2018 | © Authors 2018. CC BY 4.0 License.

465

Page 4: CLASSIFICATION OF STRAWBERRY FRUIT SHAPE BY MACHINE … · extracts the fruit from the image's background based on colours. The contour line of the fruit is determined by the software,

The random forest was coded with the following parameters.

A: The number of binary decision trees in learning. A larger

number of binary decision trees increases the accuracy of

classification; but it requires more calculation. We used 500

binary decision trees in this study, in accord with the

recommendation of Breiman (2001).

B: The maximum number of feature descriptors used for the

learning in each binary decision tree. We used √M, as M is an

explanatory variable, according to Breiman (2001).

C: The maximum depth in each binary decision tree.

'Random forest' is an ensemble learning approach using binary

decision trees. The depth of each decision tree affects the

variances of the learning model. According to Breiman (2001),

the possibility of over-fitting is increased when the maximum

depth is increased. However, the proper depth changes

depending on the data set used. In this study, we configured the

maximum depth as 1 and selected effective feature descriptors

for the test as described below in section 5.1. In the test

explained in section 5.2, the entire condition of the

identification learning data was changed depending on the

selected feature descriptors. Therefore, the maximum depth was

configured as 7. The use of two maximum depth values was

expected to suppress the possibility of over-fitting.

4.2 Feature descriptors

We used the following four feature descriptors: Measured

Values (MVs), the Ellipse Similarity Index (ESI). EFDs, and

Chain Code Subtraction (CCS).

4.2.1 Measured Values (MVs): Five measured values were

used: (1) the length of the contour line, (2) the area, (3) the

major axis of the approximate ellipse (i.e., the fruit length), (4)

the minor axis of the approximate ellipse (i.e., the fruit width),

and (5) the ratio of the fruit width/length.

4.2.2 Ellipse Similarity Index (ESI): To investigate the

ellipse similarity of the fruits, we used two feature descriptors:

the optimum ellipse area ratio, and the optimum ellipse

boundary length ratio.

(a) Optimum ellipse area ratio (ER) = Ellipse area (EA) / Fruit

area (FA), Optimum ellipse area (EOA) = a*b*π/4 (length; long

radius = a, width; short radius = b)

(b) Optimum ellipse boundary length ratio = Optimum ellipse

boundary length (L)/fruit boundary length (LI), obtained with

the following equation:

L

2

2

3410

3

12

ba

ba

ba

ba

baπ

(1)

4.2.3 Elliptic Fourier Descriptors (EFDs):

The EFD method (Kuhl and Giardina, 1982) uses X and Y

coordinates of the point of the contour as a function of the arc

length. Here, we determined the EFDs for the contour line of

strawberries. First, suppose a point 'P' circles a contour line of a

fruit from the starting point of 'S' at a constant speed. With the

coordinates of 'P' at the time point 't' as x(t) and y(t), P

represents a periodic function rotating with a period of 'T'

because P returns to the starting point 'S' in each round. A

periodic function is represented as a Fourier descriptor by

Fourier transform.

Here, x(t) and y(t) are represented as follows:

x(t)=2

0a+

N

nT

tnb

T

tna nn

1

2sin

2cos

ππ (2)

y(t)=2

0c+

N

nT

tnd

T

tnc nn

1

2sin

2cos

ππ (3)

In this case, an, bn, cn, and dn are coefficients of the EFDs. The

first term of the formulas, a0 and c0, represent the X and Y

coordinates of origin. The shape information thus includes the

coefficients of EFDs. Both formulas are Fourier descriptors

with N dimensions, and therefore, a more refined contour can be

represented with higher dimensions. Figure 9 shows the

reconstruction of a contour line of a strawberry from N=1

(ellipse) to higher dimensions (N=80) that represent the contour

line with harmony ellipses. We performed a PCA with the

coefficients in order to extract the major differences between the

shapes. We used the first to fifth principal components by

following the approach using the principal component scores

(Rohlf and Archie, 1984).

Figure 9. Restriction of a contour line of strawberry by Fourier

series. White and blue lines show the contour lines of

strawberry and harmony ellipses, respectively.

4.2.4 Chain Code Subtraction (CCS): The directions of the

connection points of the chain codes are shown with eight

direction codes, and we thus considered that the appearance

ratio of the eight directions represented the shape of the fruit.

Figure 10 shows the appearance ratio of chain codes (the

maximum ratio is 1) of the chain code representative value (CR)

of each fruit shape estimated based on the sample patterns in

Figure 3. The x- and y-axes represent the nine types of

strawberry shape and the average of the appearance ratio,

respectively. The colours of the bars shows the eight directions.

The chain code subtraction (CCS) was calculated by subtracting

the chain codes of each fruit (CS) from the CR.

CCSn = CRn − CSn (n = fruit type 1–9) (4)

The histograms of CCS values are shown in Figure 11. The

distribution patterns of the histograms were different for each

shape type. Because the differences of distribution pattern were

more clearly represented in CCS, we added CCS in the

subsequent analysis as a feature descriptor.

The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLII-2, 2018 ISPRS TC II Mid-term Symposium “Towards Photogrammetry 2020”, 4–7 June 2018, Riva del Garda, Italy

This contribution has been peer-reviewed. https://doi.org/10.5194/isprs-archives-XLII-2-463-2018 | © Authors 2018. CC BY 4.0 License.

466

Page 5: CLASSIFICATION OF STRAWBERRY FRUIT SHAPE BY MACHINE … · extracts the fruit from the image's background based on colours. The contour line of the fruit is determined by the software,

Figure 10. Distribution of CR.

Figure 11. Distribution of CCS

5. RESULTS

5.1 Exploring the variables that affect the classification of

fruit shapes

To explore the variables that affect the classification of fruit

shapes, we performed a classification test using each pair of the

nine types of shape, in a round-robin system (9C2 = 36 pairs). A

total of 85 to 100 images of the fruits were selected for each

shape type in order to avoid bias in the learning data. The

images of each type were divided at a 7:3 ratio, and the former

and latter images were used as the learning data and test data,

respectively. The numbers of images of each shape type are

shown in Table 3.

Table 3. Number of images of each shape type used in the

training data and test data

We tested the usefulness of the MVs (the length of the contour

line, the area, the fruit length, width, and width/length) for the

classification of strawberry shapes, and we then added the ESI,

EDF and CCS to the MVs and investigated the effectiveness of

these additions by using the random forest approach. The depth

of the decision tree was set as 1 because the test was for the

classification of pairs of the nine shape types. A coefficient of

agreement defined by Cohen (1960, hereinafter Kappa

coefficient) was used for the index of the classification test. The

Kappa coefficient is an index representing the degree of

coincidence by two examiners. It is calculated by subtracting

the theoretical coincidence from the practical coincidence. In

the present study's test, the Kappa coefficients showed the

coincidences of the predictions of classification of strawberry

shape types estimated by the learning data and the results from

the test data. Kappa coefficients of equal or more than 0.9, 0.8,

0.7, 0.6 and less than 0.6 are classified as great, good, OK (fair),

possible, and re-work needed, respectively (Imai and Shiomi,

2004). Here we used the Kappa coefficient of 0.7 as the

decision threshold of successful classification.

5.1.1 Classification by MVs: The x axis of Figure 12-17

shows the 36 pairs and the y axis shows the Kappa coefficient.

Sixteen of the 36 pairs showed a Kappa coefficient ≥0.7,

suggesting that the 16 pairs accurately classified by the MVs

(Figure 12). The results suggested that the MVs are not

sufficient for the classification of all of the strawberry shape

types, but the MVs are useful in several combinations. We

therefore concluded that the MVs could be used as the basic

information for classification, and we used them for the

subsequent analyses.

Figure 12. The results of classification by MVs

5.1.2 The ESI, EFD, and CCS: A further classification was

performed by adding the ESI, the EFD, and the CCS,

respectively to the MVs. The results are indicated in Figure 13.

Figure 13. The classification results by adding the ESI (blue),

the EFD (orange) and the CCS (grey) to the measured values.

The numbers of pairs that showed Kappa coefficients ≥0.7

were:

a) MVs + ESI = 21/36 pairs

b) MVs + EFD = 23/36 pairs

c) MVs + CCS = 33/36 pairs

Larger Kappa coefficients were shown for many of the pairs

with the additional descriptors. MVs + CCS showed the highest

classification ability. Based on the results described above, we

used the following combinations for a further classification test:

d) MVs + CCS + ESI

e) MVs + CCS + EFD

The results are illustrated in Figure 14.

■ESI ■ EFD ■ CCS

The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLII-2, 2018 ISPRS TC II Mid-term Symposium “Towards Photogrammetry 2020”, 4–7 June 2018, Riva del Garda, Italy

This contribution has been peer-reviewed. https://doi.org/10.5194/isprs-archives-XLII-2-463-2018 | © Authors 2018. CC BY 4.0 License.

467

Page 6: CLASSIFICATION OF STRAWBERRY FRUIT SHAPE BY MACHINE … · extracts the fruit from the image's background based on colours. The contour line of the fruit is determined by the software,

Figure 14. The classification results by MVs + CCS + ESI

(blue) and MVs + CCS + EFD (orange)

The numbers of pairs that showed Kappa coefficients ≥0.7

were:

d) MVs + CCS + ESI = 33/36 pairs

e) MVs + CCS + EFD = 35/36 pairs.

The improvement of classification ability was observed by

adding EFD to MVs + CCS, i.e., combination c). The addition

of ESI did not produce significant differences in the

classification by MVs + CCS.

To test the classification ability in a different way, we

performed a 3-fold cross-validation for the combination of c)

and e) above, which showed the best and second-best results.

The structure of the data set is shown in Figure 15, and the

results are illustrated in Figures 16 and 17.

Figure 15. Structure of data sets in the 3-hold cross-validation

Figure 16. The classification results with MVs + CCS

in the 3-fold cross-validation .

Figure 17. The classification results with MVs + CCS + EFD

in the 3-fold cross-validation.

The numbers of pairs that showed Kappa coefficients ≥0.7 in all

of the trees were:

d) MVs + CCS = 31/36 pairs

e) MVs + CCS + EFD = 34/36 pairs

Based on these results, we concluded that the best combination

for the classification was e), i.e., MVs + CCS + EFD.

5.2 Classification test for the nine strawberry shape types

We performed a classification test for the nine strawberry shape

types with the combination of MVs + CCS + EFD for practical

applications. That is, the data of all nine shape types were used

at the same time for learning, and the classification ability was

determined. Recall ratio, i.e., number of agreed / number of

tested, were used for investigation of accuracy, because use of

Kappa coefficients was not adequate for the investigation of

accuracy in classification of more than three types (Tsushima,

2002). The test design was as follows:

1) Each of 70 fruit images of each of the first eight types (Types

1–8) was used for learning data. For Type 9, 60 fruit images

were used due to a smaller total number of images. The total

number of fruit images for learning data was thus 620.

2) The fruit images for the test data did not overlap with the

learning data. A total of 871 fruit images were used (Table 4).

The numbers of images for each type ranged from 25 to 150.

3) The depth of the decision tree was set at 7 to increase the

accuracy of learning in the complex structure of learning data.

The results are shown in Table 4. The recall ratio ≥0.7 was

observed in eight of the nine types. The recall ratio of Type 1

was 0.71, which is slightly more than 0.7, and accuracy of

classification of this type is thus considered possible. Type 7

showed a smaller recall ratio, 0.52, and this type involved

difficulty in classification compared to the other types.

Table 4. Classification results with MVs + CCS + EFD

and using all nine of the strawberry shape types at once

for the learning data

■CCS & ESI ■CCS & EFD

The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLII-2, 2018 ISPRS TC II Mid-term Symposium “Towards Photogrammetry 2020”, 4–7 June 2018, Riva del Garda, Italy

This contribution has been peer-reviewed. https://doi.org/10.5194/isprs-archives-XLII-2-463-2018 | © Authors 2018. CC BY 4.0 License.

468

Page 7: CLASSIFICATION OF STRAWBERRY FRUIT SHAPE BY MACHINE … · extracts the fruit from the image's background based on colours. The contour line of the fruit is determined by the software,

6. DISCUSSION AND SUMMARY

6.1 Results summary

The results described above in section 5.1.1 suggested that the

MVs (i.e., the length of the contour line, the area, the fruit

length and width, and the fruit width/length) were useful as

basal descriptors for the classification of strawberry shapes. The

next test adding the ESI or the EFD or the CCS to the MVs

improved the classification ability, in the descending order

CCS > FED > ESI for classification ability. Based on the

subsequent 3-fold cross-validation, we concluded that the best

combination for the classification of strawberry shapes was

MVs + CCS + EFD. We further investigated the classification

ability of MVs + CCS+ EFD by using the learning data with the

nine shape types simultaneously. Although the Type 7 results

indicated that re-working was necessary, the other types showed

recall ratio >0.7 (Types 1, 2, 3, 4, 5, 6, 8 and 9). These results

suggested that the eight of the nine strawberry shape types

could be classified by using MVs + CCS + EFD.

Nagata et al. (1996) reported that the agreement ratio (recall)

between their method and personal decision was 71% in

classification of the four types strawberry fruit shape (triangle,

square, round and abnormal), and considered it was 'high' ratio

because the average agreement ratio of classifications among

multiple persons was 67%. In this study, we classified the

strawberry fruit types into nine types, which is more complex

classification that Nagata et al. (1996). Therefore we concluded

that the agreement ratio of 70% is a considerable high value.

6.2 Classification of type 7 and type 1

Both Type 7 and Type 1, which showed the worst and second-

worst accuracy rates, showed a horizontal ellipse-based shape.

The difference between these two types is the shape of the top

part; Type 1 is curving and Type 7 is flat. This difference is

often subtle and difficult to identify, because the degree of

curving of the top part changes depending on the direction of

the set position of the fruit on the tray. Therefore, for the

classification between Type 1 and Type 7, it is necessary to

improve the classification ability for the top part of the fruit.

We considered several options to improve the accuracy of

distinguishing Types 1 and 7: (1) zooming in on the top part of

the fruit and adding a further image analysis; (2) increasing the

number of chain code sections (e.g., from eight to 16) for higher

accuracy in the expression of the shape; (3) increasing the

number of sample images; and (4) trying reinforcement learning.

The solutions from (2) to (4) are general approaches and are

effective for a variety of targets. The effectiveness of

reinforcement learning has been reported and is expected to

become one of the most promising approaches for increasing

the accuracy of shape classification.

Kochi et al. (2018) reported a robust methods for reconstruction

of three-dimensional model in strawberry fruits from the digital

images. Use of three-dimensional models would enhance the

accuracy improvement by supporting the solutions of (1) - (4),

because larger number of contour lines ( or descriptors) would

detected from the circumference shape of 3D models than 2D

images, and increase the analysing and the learning data.

6.3 The classification abilities of CCS and EFD

Our present findings indicated that the Fourier series and chain

codes, which represent the features of shapes in a numeric form,

were effective in the classification of strawberry shapes. The

CCS in particular showed higher classification ability than the

EFD and the ESI. Because the CCS was determined based on

the classification results of the sample patterns by humans, it

includes the human knowledge in the descriptor. In other words,

CCS is a descriptor that adds human knowledge to the chain

codes, and the combination resulted in a more robust

classification than the other indices. Because the classification

with MVs + CCS showed a Kappa coefficient <0.7 only for the

classifications of Type 1 versus Type 7, Type 2 versus Type 4,

and Type 2 versus Type 5, it was predicted that CCS better

classifies the shape of the bottom part of the strawberry (show

Figure 13).

EFDs quantify the features of fruit shapes and include

comprehensive information such as sharpness in the bottom part,

roundness of the fruit, and curving or raising of the top part. Its

effectivity was less than that of the chain codes. However, by

adding the EFD to MVs + CCS, Type 2 versus Type 4 and Type

2 versus Type 5 were successfully classified (show Figure 14).

We thus considered that using EFDs improved the classification

of the shape of the strawberries' top part.

6.4 Conclusion

The results obtained in this study suggested machine learning's

high ability to classify fruit shapes. Machine learning has

gained more attention in the study of plant genetics rather than

the identification of plant shapes. The investigations of

phenotypes of organisms on a large scale are known as

'phenomics' which is becoming a significant field in biology. It

is expected that machine learning will contribute to phenomics

as well as genetics. Based on the present findings, we will

attempt to increase the classification accuracy of machine

learning and the number of target plant species. The system's

packaging and automation are also necessary for practical

applications.

6.5 Future perspectives

Automation and labour-saving in classification of strawberry

fruit shapes is desired by strawberry farmers, distributors and

breeders. The process of classification of fruit shape and

packing occupies more than 60% of working time of farmers

(Suenaga et al, 1989). The farmers should pack the fruits in

limited times, for example, 20,000 fruits are classified and

packed (Hayashi et al, 2004). Therefore, Hayashi et al. (2004) estimated that more than 80% workers desired automation of

classification. Meanwhile, a few study reported use of machine

leaning for classification of fruit shapes; large or small size of

fruit, four types of classification (Cao et al, 1996). The result

obtained in this study suggested that machine leaning is

effective for classification of fruit shape into nine types. In

order to realize automation of the process of strawberry shape

classification, we will improve the classification accuracy in

machine leaning and try to develop an automation system for

fruit classification.

The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLII-2, 2018 ISPRS TC II Mid-term Symposium “Towards Photogrammetry 2020”, 4–7 June 2018, Riva del Garda, Italy

This contribution has been peer-reviewed. https://doi.org/10.5194/isprs-archives-XLII-2-463-2018 | © Authors 2018. CC BY 4.0 License.

469

Page 8: CLASSIFICATION OF STRAWBERRY FRUIT SHAPE BY MACHINE … · extracts the fruit from the image's background based on colours. The contour line of the fruit is determined by the software,

REFERENCES

Adams, B., Venitha, K., Upasana, S., Fawzi, M., and Sameerchand, P.,

2017. Automatic Recognition of Medicinal Plants using Machine

Learning Techniques. IJACSA, 8(4), pp. 166-175.

Anderson, R., Daniel, C., Hauagge, J.W., and Siome, G., 2010.

Automatic fruit and vegetable classification from images. Computers

and Electronics in Agriculture, 70(1), pp. 96–104.

Arivazhagan, S., Shebiah, R. N., Nidhyanandhan, S. S., and Ganesan,

L., 2010, Fruit recognition using color and texture features. Journal of

Emerging Trends in Computing and Information Sciences, 1(2), 90-94.

Breiman, L., 2001. Random Forests. Machine Learning, 45(1), pp. 5-32.

Backhaus, A., Kuwabara, A., Bauch, M., Monk, N., Sanguinetti, G.,

and Fleming, A., 2010. LEAFPROCESSOR: a new leaf phenotyping

tool using contour bending energy and shape cluster analysis. New

phytologist, 187(1), pp. 251-261.

Cao, Q., Nagata, M., Mitarai, M., Fujiki, T., and Kinoshita, O., 1996,

Study on Grade Judgment of Fruit Vegetables Using Machine Vision

(Part 2), Journal of SHITA, 8(4), pp.228-236.

Cohen, J., 1960. A coefficient of agreement for nominal scales.

Educational and Psychological Measurement, 20(1), pp. 37-46. doi:

10.1177/001316446002000104.

Freeman, H., 1974. Computer processing of line-drawing images. ACM

Computing Surveys, 6(1), pp. 57-97

Furuta, N., Ninomiya, S., Takahashi, N., Ohmorii, H., Ukaii, Y., 1995,

Quantitative Eyaluation of Soybean (Gtycine max L. Merr.) Leaflet

Principal Component Scores Based on Elliptic Fourier Descriptor,

Breeding Science, 45(3), p.315-320

G.G, Rajput., S.M, Mali., 2010. Marathi Handwritten Numeral

Recognition using Fourier Descriptors and Normalized Chain Code.

IJCA, Special Issue on RTIPPR(3), pp. 141–145.

Gonzalo, M. J., Brewer, M. T., Anderson, C., Sullivan, D., Gray, S.,

and van der Knaap, E. 2009, Tomato fruit shape analysis using

morphometric and morphology attributes implemented in Tomato

Analyzer software program. Journal of the American Society for

Horticultural Science, 134(1), 77-87.

Hayashi, S., Ota, T., Kubota, K., and Ajiki, K., 2004, The questionnaire

survey about laborsaving of the harvest fruit-sorting work of a

strawberry, Journal of the Japanese Society of Agricultural Machinery,

66, Issue Supplement Pages 111-112. (in Japanese)

Hayashi, A., Tanabata, T., and Wada, T., 2017b. A proposal of image

analysis system for measuring strawberries. Hort J, 16(1), pp. 446. (in

Japanese).

Imai, Z., and Shiomi, T., 2004. Reliability of evaluation in physical

therapy Studies. J Phys Ther Sci, 19(3), pp. 261-265. (in Japanese).

Iwata, H., and Ukai, Y., 2002. SHAPE: A computer program package

for quantitative evaluation of biological shapes based on elliptic Fourier

descriptors. J Heredity, 93(5), pp. 384-385.

Keith, W., Jesse, M., Mark, S., 2013. Comparison of digital image

analysis using elliptic Fourier descriptors and major dimensions to

phenotype seed shape in hexaploid wheat (Triticum aestivum L.).

Euphytica, 190(1), pp .99-116.

Khojastehnazh, M., Omid, M., and Tabatabaeefar, A., 2010,

Development of a lemon sorting system based on color and

size. African Journal of Plant Science, 4(4), 122-127.

Kobayashi, Y., Tanaka, S., and Tomiura, Y., 2011. Classification and

assessment of English scientific papers using random forests. IPSJ SIG

Technical Report, 2011-CH-90(6), pp. 1-8. (in Japanese).

Kochi, N., Tanabata, T., Hayashi, A., and Isobe, S., 2018. Development

of 3D measuring system and measurement assessment of strawberry

fruits. Large-Scale Point Cloud Processing, International Journal of

Automation Technology. Vol.12 No.3.

Kuhl, F.P., and Giardina, C.R., 1982. Elliptic Fourier features of a

closed contour. Computer Graphics and Image Processing, 18(3), pp.

236-258.

Liming, X., and Yanchao, Z., 2010, Automated strawberry grading

system based on image processing. Computers and Electronics in

Agriculture, 71, S32-S39.

MAFF (Ministry of Agriculture, Forestry and Fisheries), Plant Variety

Protection PVP Office. Japan Test Guidelines. 2011. Fragaia L.

(strawberry).

http://www.hinsyu.maff.go.jp/info/sinsakijun/kijun/1289.pdf [accessed

January, 1, 2018].

Morimoto, T., Takeuchi, T., Miyata, H., and Hashimoto, Y., 2000,

Pattern recognition of fruit shape based on the concept of chaos and

neural networks. Computers and electronics in agriculture, 26(2), 171-

186.

Nagata, M., Kinoshita, O., Asano, K., Cao, Q., and Hiyoshi, K., 1996,

Studies on Automatic Sorting System for Strawberry (Part 2) :

Development of Sorting System Using Image Processing, Journal of the

Japanese Society of Agricultural Machinery, 58(6), pp.61-67.

Rohlf, F.J., Archie, J.W., 1984. A comparison of Fourier methods for

the description of wing shape in mosquitoes (Diptera: Culicidae). Syst

Zool, 33(3), pp. 302-317.

Scikit-learn. Machine Learning in Python. http://scikit-learn.org/stable/

[accessed January 1, 2018].

Suenaga, T., Imamura, Y., Maeda, K., Yamada, T., and Takamatsu, M.

1989, The workloads of farmers who sort and pack strawberries in

accordance with standards of shipment and their awareness of standards

of shipment, Journal of the Japanese Association of Rural Medicine,

38(4), pp.895-907

Tanabata, T., Shibuya, T., Hori, K., Ebana, K., and Yano, M., 2012.

SmartGrain: High-throughput phenotyping software for measuring seed

shape through image analysis. Plant Physiol, 160(4), pp. 1871-1880.

Tsushima, E., 2002. Application of Coefficient of Reliability to the

Sturdies of Physical Therapy. Rigakuryoho Kagaku, 17(3), pp. 181-187.

Wada, T., Oku, K., Nagano, S., Isobe, S., Suzuki H., Mori M., Takata,

K., Hirata, C., Shimomura, K., Tsubone, M., Katayama, T., Hirashima,

K., Uchimura, Y., Ikegami, H., Sueyoshi, T., Obu, K., Hayashida, T.,

and Shibato, Y., 2017a. Development and characterization of a

strawberry MAGIC population derived from crosses with six cultivars.

Breed. Sci, 67, pp. 370-381.

Zawbaa, H. M., Abbass, M., Hazman, M., and Hassenian, A. E. 2014,

Automatic fruit image recognition system based on shape and color

features. In International Conference on Advanced Machine Learning

Technologies and Applications(pp. 278-290). Springer, Cham.

Zhang, Y., and Wu, L. ,2012, Classification of fruits using computer

vision and a multiclass support vector machine. sensors, 12(9), 12489-

12505.

The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, Volume XLII-2, 2018 ISPRS TC II Mid-term Symposium “Towards Photogrammetry 2020”, 4–7 June 2018, Riva del Garda, Italy

This contribution has been peer-reviewed. https://doi.org/10.5194/isprs-archives-XLII-2-463-2018 | © Authors 2018. CC BY 4.0 License.

470