Data Augmentation using Generative Adversarial Network for ......Network for Gastrointestinal...

7
(IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 11, No. 11, 2020 Data Augmentation using Generative Adversarial Network for Gastrointestinal Parasite Microscopy Image Classification Mila Yoselyn Pacompia Machaca 1 , Milagros Lizet Mayta Rosas 2 , Eveling Castro-Gutierrez 3 , Henry Abraham Talavera D´ ıaz 4 and Victor Luis Vasquez Huerta 5 1 2345Universidad Nacional de San Agust´ ın de Arequipa Arequipa, Per´ u Abstract—Gastrointestinal parasitic diseases represent a latent problem in developing countries; it is necessary to create a support tools for the medical diagnosis of these diseases, it is required to automate tasks such as the classification of samples of the causative parasites obtained through the microscope using methods like deep learning. However, these methods require large amounts of data. Currently, collecting these images represents a complex procedure, significant consumption of resources, and long periods. Therefore it is necessary to propose a computational solution to this problem. In this work, an approach for generating sets of synthetic images of 8 species of parasites is presented, using Deep Convolutional Adversarial Generative Networks (DCGAN). Also, looking for better results, image enhancement techniques were applied. These synthetic datasets (SD) were evaluated in a series of combinations with the real datasets (RD) using the classification task, where the highest accuracy was obtained with the pre-trained Resnet50 model (99,2%), showing that increasing the RD with SD obtained from DCGAN helps to achieve greater accuracy. KeywordsGenerative Adversarial Network (GAN); Deep Con- volutional Generative Adversaria Network (DCGAN); gastrointesti- nal parasites; classification; deep learning I. I NTRODUCTION Diseases caused by parasites are a public health problem on a global scale; they can be of high risk and high prevalence, as shown by their incidence rates in the population. According to the World Health Organization (WHO), the malaria disease caused by the Plasmodium parasite causes 400000 deaths per year [1], and more than 2 billion people are infected with soil- transmitted helminthiases. [2]. According to the National Institute of Health of Peru (INS), intestinal parasitism increased its prevalence rate among sectors with fewer resources [3]. Although the cases registered in the south of the country with a diagnosis of intestinal parasitism were more frequent in children or people in school age 41.75%, this problem also affects people in adulthood or youth, who also presented a considerable percentage of incidence in this disease with 20.45% and 35.09%, respectively [4]. Here is the importance of addressing this problem using computational methods to implement support systems for medical diagnosis, recognition, and classification of images obtained through the microscope. However, methods based on deep learning that have had excellent results in similar applications require large datasets. The collection of medical data involves a lengthy procedure that may require applying different protocols and the intervention of a specialist. There- fore it is a task that represents a significant consumption of time and resources. In this work, the application of two techniques for image enhancement is presented, Wiener and Wavelet, in order to obtain an improvement in the quality of the images; also, an approach to increase data is presented for the generation of synthetic training samples of microscopy images of eight species of gastrointestinal parasites, using DCGAN a variation of GAN [5], from a reduced initial dataset. In total, three sets of augmented data are generated resulting from the training of DCGAN, both for the dataset resulting from the application of image enhancement with Wiener filter, the dataset resulting from image enhancement with Wavelet denoising, and the original dataset without enhancement. The three datasets are used for training in classification models pre-trained by transfer learning, Resnet34, Resnet50, among others, independently to verify a performance improvement and compare the results obtained. The article’s structure is explained below: In Section II, related works are addressed, Section III describes the method- ology used, to finish with the results and conclusions in Sections IV and V, respectively. II. RELATED WORKS According to the literature, various authors have applied deep learning in analyzing medical images [6], [7]. And in particular for the treatment of microscopy images of parasites [8], [9]. However, RD collection usually has a difficult and time- consuming procedure; this generates insufficient data for ex- perimentation with deep learning methods for tasks such as image segmentation and classification. For this reason, some authors focus on increasing datasets for better results. To address this problem in classifying in microscopy im- ages of parasites, some authors used: The Python Augmentor library, [10], or traditional data augmentation techniques, [11]. On the other hand, due to the problems encountered when working with layered laser scanning microscopy image data, in [12] M. Bloice et. al, present their own software package: Augmentor which is a stochastic pipeline-based augmentation www.ijacsa.thesai.org 765 | Page

Transcript of Data Augmentation using Generative Adversarial Network for ......Network for Gastrointestinal...

Page 1: Data Augmentation using Generative Adversarial Network for ......Network for Gastrointestinal Parasite Microscopy Image Classification Mila Yoselyn Pacompia Machaca1, Milagros Lizet

(IJACSA) International Journal of Advanced Computer Science and Applications,Vol. 11, No. 11, 2020

Data Augmentation using Generative AdversarialNetwork for Gastrointestinal Parasite Microscopy

Image Classification

Mila Yoselyn Pacompia Machaca1, Milagros Lizet Mayta Rosas2, Eveling Castro-Gutierrez3,Henry Abraham Talavera Dıaz4 and Victor Luis Vasquez Huerta5

12345Universidad Nacional de San Agustın de ArequipaArequipa, Peru

Abstract—Gastrointestinal parasitic diseases represent a latentproblem in developing countries; it is necessary to create asupport tools for the medical diagnosis of these diseases, it isrequired to automate tasks such as the classification of samplesof the causative parasites obtained through the microscope usingmethods like deep learning. However, these methods require largeamounts of data. Currently, collecting these images representsa complex procedure, significant consumption of resources, andlong periods. Therefore it is necessary to propose a computationalsolution to this problem. In this work, an approach for generatingsets of synthetic images of 8 species of parasites is presented, usingDeep Convolutional Adversarial Generative Networks (DCGAN).Also, looking for better results, image enhancement techniqueswere applied. These synthetic datasets (SD) were evaluated ina series of combinations with the real datasets (RD) using theclassification task, where the highest accuracy was obtained withthe pre-trained Resnet50 model (99,2%), showing that increasingthe RD with SD obtained from DCGAN helps to achieve greateraccuracy.

Keywords—Generative Adversarial Network (GAN); Deep Con-volutional Generative Adversaria Network (DCGAN); gastrointesti-nal parasites; classification; deep learning

I. INTRODUCTION

Diseases caused by parasites are a public health problemon a global scale; they can be of high risk and high prevalence,as shown by their incidence rates in the population. Accordingto the World Health Organization (WHO), the malaria diseasecaused by the Plasmodium parasite causes 400000 deaths peryear [1], and more than 2 billion people are infected with soil-transmitted helminthiases. [2].

According to the National Institute of Health of Peru(INS), intestinal parasitism increased its prevalence rate amongsectors with fewer resources [3]. Although the cases registeredin the south of the country with a diagnosis of intestinalparasitism were more frequent in children or people in schoolage 41.75%, this problem also affects people in adulthoodor youth, who also presented a considerable percentage ofincidence in this disease with 20.45% and 35.09%, respectively[4].

Here is the importance of addressing this problem usingcomputational methods to implement support systems formedical diagnosis, recognition, and classification of imagesobtained through the microscope. However, methods basedon deep learning that have had excellent results in similar

applications require large datasets. The collection of medicaldata involves a lengthy procedure that may require applyingdifferent protocols and the intervention of a specialist. There-fore it is a task that represents a significant consumption oftime and resources.

In this work, the application of two techniques for imageenhancement is presented, Wiener and Wavelet, in order toobtain an improvement in the quality of the images; also,an approach to increase data is presented for the generationof synthetic training samples of microscopy images of eightspecies of gastrointestinal parasites, using DCGAN a variationof GAN [5], from a reduced initial dataset. In total, three setsof augmented data are generated resulting from the training ofDCGAN, both for the dataset resulting from the application ofimage enhancement with Wiener filter, the dataset resultingfrom image enhancement with Wavelet denoising, and theoriginal dataset without enhancement. The three datasets areused for training in classification models pre-trained by transferlearning, Resnet34, Resnet50, among others, independently toverify a performance improvement and compare the resultsobtained.

The article’s structure is explained below: In Section II,related works are addressed, Section III describes the method-ology used, to finish with the results and conclusions inSections IV and V, respectively.

II. RELATED WORKS

According to the literature, various authors have applieddeep learning in analyzing medical images [6], [7]. And inparticular for the treatment of microscopy images of parasites[8], [9].

However, RD collection usually has a difficult and time-consuming procedure; this generates insufficient data for ex-perimentation with deep learning methods for tasks such asimage segmentation and classification. For this reason, someauthors focus on increasing datasets for better results.

To address this problem in classifying in microscopy im-ages of parasites, some authors used: The Python Augmentorlibrary, [10], or traditional data augmentation techniques, [11].On the other hand, due to the problems encountered whenworking with layered laser scanning microscopy image data,in [12] M. Bloice et. al, present their own software package:Augmentor which is a stochastic pipeline-based augmentation

www.ijacsa.thesai.org 765 | P a g e

Page 2: Data Augmentation using Generative Adversarial Network for ......Network for Gastrointestinal Parasite Microscopy Image Classification Mila Yoselyn Pacompia Machaca1, Milagros Lizet

(IJACSA) International Journal of Advanced Computer Science and Applications,Vol. 11, No. 11, 2020

library of images and includes features relevant to the domainof biomedical images, in order to to ensure that the new datais meaningful.

In recent works, GANs have been used for the task of rec-ognizing microscopic images of parasites [13], where throughtransfer learning, the authors intend to identify the parasiteToxoplasma gondii using Fuzzy Cycle Generative AdversarialNetwork (FCGAN) and in [10], where V. Fomene proposesa classifier for the diagnosis of malaria in rural areas usingMobileNet and GAN.

GAN has even been used to the transfer between differentmodalities of microscopy images, as presented in [14], theauthors investigate a conditional GAN in order to use themonitoring techniques of cells through the microscope: PhaseContrast (PC) and Differential Interference Contrast (DIC),passing from one to the other through the proposed algorithm.The focus of this work is to use GANs for the generationof data augmentation in order to face the problem of datadeficiency for the application of deep learning models in taskssuch as the segmentation [15], [16], and classification [17],[18], of microscopy images.

GANs have also been used to segmentation microscopicimages of pluripotent retinal pigmented epithelial stem cells[19], where M. Majurski et al., use GAN in one of theirapproaches to optimize the coefficients of their ConvolutionalNeural Network (CNN). Some authors used Conditional GAN(CGAN), a conditional version of GAN, in which auxiliaryinformation is fed to both the discriminator and the generatoras an additional input layer [20]. This version was used toincrease the dataset made up of polycrystalline iron images[15], where they also presented the transfer learning appli-cation with data fusion simulations obtained with the MonteCarlo Potts model and the image style information obtainedfrom real images. CGAN was also used to increase the datasetof microscopic images of red blood cells [17]. The architecturethat was proposed in [16], for the generation of the nuclei cellimage-mask pair, consists of 2 stages where they first use aGAN to learn the data distribution of the masks and generatea synthesized binary mask and then incorporate this masksynthesized in the second GAN which also learns a mappingof the random noise vector to perform a conditional generationof the synthesized image.

On the other hand, Deep Convolutional GAN (DCGAN)has also been considered for data augmentation. DCGAN,which is a variation of GAN that uses convolutional layers inthe discriminator and convolutional-transpose in the generator,where also the discriminator is composed of stridden convolu-tion layers, batch norm layers, and LeakyReLU activations, andthe generator is composed of convolutional-transpose layers,batch norm layers, and ReLU activations, [21].

R. Verma et al., present in [18], use DCGAN to generatesynthetic samples of 5 classes of proteins, which were used inthe classification task both before and after the increase of SD,comparing DCGAN results with those of traditional methods.In search of better results, some authors have modified theoriginal structure of the GANs by including two discriminators[14]; it has also been considered to include U-net structuralelements in their GANs [14], [19].

It has also tried to improve the quality of microscopic im-

ages as preprocessing before data augmentation, such as con-trast enhancement with Histogram Equalization (HE), Adap-tive Histogram Equalization (AHE), and Contrast LimitedAdaptive Histogram Equalization (CLAHE) [18].

Some of the resulting augmented datasets have been eval-uated in pre-trained deep learning models under a transferlearning approach, such as VGG, Resnet, NasNet, Inception,MobileNet. Because these have become popular in medicalimage classification and segmentation work [11], [13], [18].

The review of the related works shows that the use ofGANs for data augmentation is competitive, performs well,or even outperforms traditional augmentation methods.

This work’s contribution is a) the improvement of thequality of the dataset of microscopy images of eight species ofgastrointestinal parasites with the Wiener and Wavelet filters,b) the use of DCGAN to augment the dataset by generatingSD, and c) the resulting datasets for the training of deeplearning pre-trained classification models by transfer learningand finally the comparison of the results obtained.

III. MATERIALS AND METHODS

For the development of this experimental research, aprocess that is applied in steps was defined. The proposedmethodology consists of image preprocessing with improve-ment techniques, use of DCGAN for data augmentation, andevaluating synthetic datasets generated by classifying imageswith previously trained models using the transfer-based learn-ing approach. In Fig. 1, the methodology of the approach forthis work is detailed.

A. Dataset

The dataset consists of a vector of characteristics of eachimage to be trained. Microscopy images of parasites were used.There are a total of 954 images:

• Ascaris.

• Hookworms.

• Trichuris trichiura.

• Hymenolepis nana.

• Diphyllobothrium pacificum.

• Taenia solium.

• Fasciola hepatica.

• Enterobius vermicularis.

Each image from ground truth has 1200 x 1600 (heightby width) and has three channels. A zoomed sample of eachspecies of these parasites is presented in Fig. 2.

B. Image Enhancement

Two techniques were applied independently, Wiener filterand Wavelet denoising to improve the set of images.

1) Wiener filter: Wiener filter has been applied for imageenhancement to reduce the additive noise and add variationsin images. Wiener filter is one of the most known filters usedfor image enhancement [22].

www.ijacsa.thesai.org 766 | P a g e

Page 3: Data Augmentation using Generative Adversarial Network for ......Network for Gastrointestinal Parasite Microscopy Image Classification Mila Yoselyn Pacompia Machaca1, Milagros Lizet

(IJACSA) International Journal of Advanced Computer Science and Applications,Vol. 11, No. 11, 2020

2) Wavelet denoising: Wavelet denoising has been appliedwith the purpose to vary characteristics in the images [23].

Fig. 3, shows a sample of the results in terms of imageenhancement.

C. Generative Adversarial Network

CNNs are designed for data with spatial structures, and theyare composed of many filters, which convolve or slide throughthe data and produce an activation at each slide position. Theseactivations produce a feature map representing how much thedata in that region activated the filter.

On the other hand, GANs are a type of generative modelbecause they learn to copy the data distribution of the data theygive them. Therefore, they can generate novel images that lookalike. A GAN is called an “adversary” because it involves twocompeting networks (adversaries) trying to outwit each other.

The generator (G) is a neural network, which takes a vectorof random variables (Latent Space), and produces an imageigenerator.

The discriminator (D) is also a neural network, whichtakes an image, I, and produces a single output p, deciding

Fig. 1. Methodology. Top images should correspond to a specific speciesof parasite; then, it could be applied filter wiener or wavelet denoising asa preprocessing technique. After this, a DCGAN is able to generate imageswith one generator and one discriminator. Finally, ground truth and generatedimages are evaluated with classifiers Resnet34, Resnet50, and VGG. Theprocess of data augmentation is applied independently for each parasitespecies.

Fig. 2. Parasite species: A sample of each parasite species that is addressedin this work is presented, and special characteristics of the edges of these arehighlighted in the red boxes. Hymenolepis nana has (from outside to inside)a wide rim, a semitransparent middle part, and an inner part with a texturesimilar to the outer ring. Trichiuris trichiura has different color intensitieson edge. Hookworms have a semitransparent membrane with an irregularinternal shape. Enterobius vermicularis has a semitransparent membrane withan internal oval shape.

the probability that the image is real. When p = 1p = 1, thediscriminator strongly believes that the image is real, and whenp = 0p = 0, the discriminator strongly believes that the imageis false.

The discriminator (D) receives igenerator, and is taughtthat the image is false. In more concrete terms, the discrimina-tor maximizes log(1−pgenerator). The discriminator receivesa real image, ireal, and is taught that the image is real, ormaximizes log(preal).

The generator tries to do the exact opposite; it also triesto make the discriminator maximize the probability that itbelieves that the false image is real, so the generator will betrying to maximize log(pgenerator) [5].

As mentioned in Section II, a DCGAN uses convolutionallayers in the discriminator (D) and convolutional-transpose lay-ers in the generator (G). The discriminator, composed of strid-den convolution layers, batch norm layers, and LeakyReLUactivations, receives an image and returns a scalar probabilitythat it is from the distribution of real data.

The generator, consisting of convolutional-transpose layers,batch norm layers, and ReLU activations, receives a latent vec-tor, z, which is extracted from a standard normal distributionand returns an RGB image [21].

D. Data Augmentation

DCGAN was used as part of the proposed methodology forthe generation of synthetic images for data augmentation. Forthe DCGAN, binary cross-entropy is used as a loss function.And for discriminator and generator, Adam optimizers areapplied with a learning rate of 0.0002. In Table I, DCGANstructure used for this work is detailed. A class called GANis created. It imports the relevant classes and initializes thevariables. A generator model was created with the followinglayers:

• Convolutional transpose 2D layer with an input of100x100 image with 3 channels.

• Batch normalization: normalizes the data.

www.ijacsa.thesai.org 767 | P a g e

Page 4: Data Augmentation using Generative Adversarial Network for ......Network for Gastrointestinal Parasite Microscopy Image Classification Mila Yoselyn Pacompia Machaca1, Milagros Lizet

(IJACSA) International Journal of Advanced Computer Science and Applications,Vol. 11, No. 11, 2020

Fig. 3. Enhanced image samples. (a) Microscopy image of Fasciola hepatica parasite corresponding to the original dataset, (b) Image enhanced with Wienerfilter, (c) Image enhanced with Wavelet denoising.

• ReLU.

These last three layers are repeated and make a block of fourCBR (Convolutional Transposed 2D - Batch Normalization -ReLU) Finally, the following layers are applied:

• Convolutional transpose 2D

• Hyperbolic Tangent

A discriminator can also become a sequential model bygoing in the opposite direction. The discriminator was built asa model that is governed by the following directives:

• The first layer is for applying a convolution to a 64x64image with 3 channels.

• Add a Leaky ReLU activation feature.

Then, the following three layers are applied:

• Convolution 2D

• Batch normalization 2D

• Leaky ReLU.

Finally, the following layers are applied to get the outputdecision:

• Convolution 2D

• Sigmoid

E. Evaluation

Classification task is defined to evaluate the syntheticmicroscopy image datasets generated by DCGAN. For this, thepre-trained deep learning models Resnet34, Resnet50, VGG16,Densenet121, and Inceptionv3 are used under a transfer learn-ing approach.

1) Accuracy: The metric that was chosen to evaluate themodels is Accuracy1.

Accuracy =

∑c TPc + FNc∑

c TPc + TNc + FPc + FNc(1)

TABLE I. DCGAN STRUCTURE

LAYER DESCRIPTION

ConvTranspose2d Applies a 2D transposed convolution operator overthe image.

BatchNorm2d Applies Batch Normalization over a 4D input.

ReLU Applies the rectified linear unit function element-wise.

Tanh Applies hyperbolic tangent function element-wise.

Conv2d Applies a 2D convolution over an input signalcomposed of several input planes.

LeakyReLU Applies leaky rectified linear unit functionelement-wise.

Sigmoid Applies sigmoid function element-wise.

IV. EXPERIMENTATION AND RESULTS

A. Tools and technological infrastructure

For the experimentation, the following software tools wereused: Python (libraries such as Pytorch, Numpy, Scipy, ScikitImage) and Cuda. The preprocessing, training and classifica-tion tasks were performed in a workstation with a processorIntel Xeon Gold 5115 CPU, Memory 128 GB, and a Videocard Quadro P5000 16GB.

B. Data Augmentation Results

DCGAN was trained separately in isolation for each of theimage sets for each mentioned parasite species. The averageexecution time for the datasets without any image enhance-ment considering 15000 epochs was 11 hours; the averageexecution time for the datasets improved with the Wiener filter,considering 15000 epochs was 8 hours. As for the data setsimproved with the Wavelet denoising. Initially, 15000 epochswere considered. However, it had to be reduced to only 3000epochs due to the long execution time that represented a highcomputational cost and the small number of images generated,obtaining an average of 7 hours, Fig. 4.

The numbers of synthetic images obtained in these runswere 612 for the datasets without image enhancement, 470for the datasets enhanced with the Wiener filter, and 159 forthe datasets enhanced with Wavelet denoising. These results

www.ijacsa.thesai.org 768 | P a g e

Page 5: Data Augmentation using Generative Adversarial Network for ......Network for Gastrointestinal Parasite Microscopy Image Classification Mila Yoselyn Pacompia Machaca1, Milagros Lizet

(IJACSA) International Journal of Advanced Computer Science and Applications,Vol. 11, No. 11, 2020

represent an increase of 65.88%, 50.59%, and 17.12%, over theoriginal dataset, respectively. Fig. 5, shows the most significantamounts of synthetic images generated for each species ofparasite.

According to [16], large-scale GAN imaging and trainingstability is challenging. The original size of the images 1200x 1600 was kept for the training in DCGAN; the synthetic im-ages obtained have allowed magnifying the domain, providinga greater training ground for the classification of parasites’species. Fig. 6, shows a data grid with 64 generates syntheticimages.

Fig. 4. Execution time for generating synthetic images in DCGAN

Fig. 5. Increment of the dataset. Real Data: Images generated from real datawithout enhancement. Real Data + Wiener: Images generated from real dataenhanced with Wiener. Real Data + Wavelet: Images generated from real dataenhanced with Wavelet.

C. Results and Discussion

It was decided to train model using Resnet34, Resnet50,VGG16, Densenet121, and Inceptionv3, independently, resiz-ing the images from 1200 x 1600 to 64 x 64 to improvethe results and to evaluate the datasets and get a comparativeview of the accuracy metric, considering only 50 epochs, withthe datasets: First, only with the original RD without any

Fig. 6. Grid of synthetic images generated after 6000 epochs, parasite: Ascariswithout image enhancement before DCGAN training.

preprocessing or image enhancement applied. Second, with theoriginal RD without any preprocessing or image enhancementcombined with the SD generated. Third, only with the RDobtained after applying the Wiener filter. Fourth, with theRD obtained after applying the Wiener filter combined withthe SD generated from it. Fifth, only with the RD obtainedafter applying the Wavelet denoising and sixth, with the RDobtained after applying the Wavelet denoising combined withthe SD generated from it.

The results can be seen in Table II. The best accuracy:0.992 was obtained with the Resnet50 model, and the dataset:SD obtained after applying the Wiener filter + the RD. Thisshows that using SD in the image classification task improvesthe accuracy results.

According to the literature, no data augmentation workshave been found for classification of microscopy images ofthe eight species of gastrointestinal parasites using DCGAN.

However, the use of different techniques was found, forobjectives similar to those that have been raised in the presentwork. M. Roder et. al, present in [24], the use of RestrictedBoltzmann Machines for data augmentation and the classifica-tion of helminth eggs using Deep Belief Networks, obtaininga balanced accuracy of 92.09%, the authors worked withgrayscale microscopy images , while in this work RGB scaleimages were used.

The work of R. Verma et al. [18] is considered a similarmethodology to that described in the present work. Verma usedDCGAN for the generation of SD of 5 classes of proteinsand also classification models based on transfer learning withwhich they obtained a result of 0.924 for the accuracy measurein VGG16 training. The best result in classification wasobtained with the Resnet50 model with 0.992 for the accuracymeasure in this work. It should be noted that in their work, theauthors used different preprocessing techniques and datasets.

V. CONCLUSIONS

DCGAN was used to increase synthetic data, and theresults generated were compared through the classification

www.ijacsa.thesai.org 769 | P a g e

Page 6: Data Augmentation using Generative Adversarial Network for ......Network for Gastrointestinal Parasite Microscopy Image Classification Mila Yoselyn Pacompia Machaca1, Milagros Lizet

(IJACSA) International Journal of Advanced Computer Science and Applications,Vol. 11, No. 11, 2020

TABLE II. CLASIFICATION RESULTS.

Model Cassification Metric: Accuracy

No DatasetEnhancement

Dataset withWiener filter

Dataset withWavelet

denoising

RD RD + SD RD RD + SD RD RD + SD

ResNet34 0.619 0.974 0.978 0.978 0.935 0.990

ResNet50 0.989 0.961 0.967 0.992 0.978 0.930

VGG16 0.869 0.902 0.826 0.892 0.847 0.888

Densenet121 0.967 0.987 0.967 0.985 0.956 0.907

Inceptiov3 0.967 0.980 0.956 0.956 0.956 0.990

task so that greater precision was obtained with the Resnet50model, with an accuracy of 0.992, and the dataset: RD +SD obtained after applying the Wiener filter. Other pr-trainedmodels also showed similar results, such as Resnet34 and In-ceptionv3, with an accuracy of 0.990, both with the dataset: RD+ SD obtained after applying Wavelet denoising. Therefore, itis shown that the use of RD + SD provides greater accuracyin the classification compared to using only the RD.

The combinations of datasets that gave the best results withan average accuracy of 0.961 were: RD + SD obtained fromthe set of images without improvements and RD + SD obtainedfrom the set of images improved with the Wiener filter.

The highest amount of SD generated from RD was obtainedwithout improvements (612). On the other hand, the amount ofSD obtained after applying both the Wiener filter (470) and theWavelet denoising (159) was lower due to the characteristicsof the images as it highlights impurities present in the samples.

Despite the complexity and computational cost of theDCGAN training stage, an accuracy of 99.2% was achieved.However, there is still room for improvement. In the future, it isintended to find preprocessing techniques that allow improvingall image sets in order to generate quality synthetic imagesfor the eight species of parasites. It is recommended to usetechniques other than Wiener filter and Wavelet denoisingfor Uncinaria, Trichiuris trichiura, and Enterobius vermicularisparasites that highlight their morphological characteristics toobtain better samples of synthetic data.

ACKNOWLEDGMENT

The authors would like to thank the support and subventionof the UNIVERSIDAD NACIONAL DE SAN AGUSTIN DEAREQUIPA to the Project called: Assistance in the diagnosisof gastrointestinal parasitosis through prevalence rates andmicrographs, using high performance computational and com-puter vision applied to the Arequipa region with the contractNo IBA-BIOM-2018-1. Thanks to the CiTeSoft Contract: EC-0003-2017-UNSA for the equipment and the resources bringto the project.

REFERENCES

[1] World Health Organization and others, “Malaria eradication: benefits,future scenarios and feasibility,” 2020.

[2] World Health Organization and others, “Soil-transmitted helminthiases:eliminating as public health problem soil-transmitted helminthiases inchildren: progress report 2001-2010 and strategic plan 2011-2020,”2012.

[3] M. Beltran, R. Tello, and C. Naquira, “Manual de procedimientos delaboratorio para el diagnostico de los parasitos intestinales del hombre,”2003.

[4] G. Pajuelo Camacho, D. Lujan Roca, and B. Paredes Perez, “Estudiode enteroparasitos en el hospital de emergencias pediatricas, lima-peru.”Revista Medica Herediana, vol. 16, no. 3, pp. 178–183, 2005.

[5] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley,S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial nets,”Advances in neural information processing systems, pp. 2672–2680,2014.

[6] G. Litjens, T. Kooi, B. E. Bejnordi, A. A. A. Setio, F. Ciompi,M. Ghafoorian, J. A. Van Der Laak, B. Van Ginneken, and C. I.Sanchez, “A survey on deep learning in medical image analysis,”Medical image analysis, vol. 42, pp. 60–88, 2017.

[7] O. Z. Kraus, J. L. Ba, and B. J. Frey, “Classifying and segmenting mi-croscopy images with deep multiple instance learning,” Bioinformatics,vol. 32, no. 12, p. 52–59, 2016.

[8] A. Peixinho, S. Martins, J. Vargas, A. Falcao, J. Gomes, and C. Suzuki,“Diagnosis of human intestinal parasites by deep learning,” Computa-tional Vision and Medical Image Processing V: Proceedings of the 5thEccomas Thematic Conference on Computational Vision and MedicalImage Processing (VipIMAGE 2015, Tenerife, Spain, p. 107, 2015.

[9] J. A. Quinn, R. Nakasi, P. K. Mugagga, P. Byanyima, W. Lubega,and A. Andama, “Deep convolutional neural networks for microscopy-based point of care diagnostics,” in Machine Learning for HealthcareConference, 2016, pp. 271–281.

[10] V. Fomene, “Developing a machine learning model for malaria diagno-sis in rural areas,” 2018.

[11] M. M. Aladago, “Classification and quantification of malaria parasitesusing convolutional neural networks,” 2018.

[12] M. D. Bloice, P. M. Roth, and A. Holzinger, “Biomedical imageaugmentation using augmentor,” Bioinformatics, vol. 35, no. 21, pp.4522–4524, 2019.

[13] S. Li, A. Li, D. A. M. Lara, J. E. G. Marın, M. Juhas, and Y. Zhang,“A novel transfer learning approach for toxoplasma gondii microscopicimage recognition by fuzzy cycle generative adversarial network,”bioRxiv, pp. 567–891, 2019.

[14] L. Han and Z. Yin, “Transferring microscopy image modalities withconditional generative adversarial networks,” in Proceedings of theIEEE Conference on Computer Vision and Pattern Recognition Work-shops, 2017, pp. 99–107.

[15] B. Ma, X. Wei, C. Liu, X. Ban, H. Huang, H. Wang, W. Xue, S. Wu,M. Gao, Q. Shen et al., “Data augmentation in microscopic images formaterial data mining,” npj Computational Materials, vol. 6, no. 1, pp.1–9, 2020.

[16] S. Pandey, P. R. Singh, and J. Tian, “An image augmentation ap-proach using two-stage generative adversarial network for nuclei imagesegmentation,” Biomedical Signal Processing and Control, vol. 57, p.101782, 2020.

[17] O. Bailo, D. Ham, and Y. Min Shin, “Red blood cell image gener-ation for data augmentation using conditional generative adversarialnetworks,” in Proceedings of the IEEE Conference on Computer Visionand Pattern Recognition Workshops, 2019.

[18] R. Verma, R. Mehrotra, C. Rane, R. Tiwari, and A. K. Agariya,“Synthetic image augmentation with generative adversarial network forenhanced performance in protein classification,” Biomedical Engineer-ing Letters, vol. 10, no. 3, pp. 443–452, 2020.

[19] M. Majurski, P. Manescu, S. Padi, N. Schaub, N. Hotaling, C. Simon Jr,and P. Bajcsy, “Cell image segmentation using generative adversarialnetworks, transfer learning, and augmentations,” in Proceedings ofthe IEEE Conference on Computer Vision and Pattern RecognitionWorkshops, 2019.

[20] M. Mirza and S. Osindero, “Conditional generative adversarial nets,”arXiv preprint arXiv:1411.1784, 2014.

[21] A. Radford, L. Metz, and S. Chintala, “Unsupervised representationlearning with deep convolutional generative adversarial networks,”arXiv preprint arXiv:1511.06434, 2015.

[22] R. K. B. Vijayalakshmi A, “Deep learning approach to detect malariafrom microscopic images.” Multimedia Tools and Applications, vol. 79,p. 15297–15317, 2019.

www.ijacsa.thesai.org 770 | P a g e

Page 7: Data Augmentation using Generative Adversarial Network for ......Network for Gastrointestinal Parasite Microscopy Image Classification Mila Yoselyn Pacompia Machaca1, Milagros Lizet

(IJACSA) International Journal of Advanced Computer Science and Applications,Vol. 11, No. 11, 2020

[23] A. M. Wink and J. B. T. M. Roerdink, “Denoising functional mr images:a comparison of wavelet denoising and gaussian smoothing,” IEEETransactions on Medical Imaging, vol. 23, no. 3, pp. 374–387, 2004.

[24] M. Roder, L. A. Passos, L. C. F. Ribeiro, B. C. Benato, A. X. Falcao,

and J. P. Papa, “Intestinal parasites classification using deep beliefnetworks,” in International Conference on Artificial Intelligence andSoft Computing. Springer, 2020, pp. 242–251.

www.ijacsa.thesai.org 771 | P a g e