Chapter 6
Applications of the ESPNet architecture in medical imaging
Sachin Mehta1
Nicholas Nuechterlein1
Ezgi Mercan2
Beibin Li1
Shima Nofallah1
Wenjun Wu1
Ximing Lu1
Anat Caspi1
Mohammad Rastegari1
Joann Elmore3
Hannaneh Hajishirzi1
Linda Shapiro1
1 University of Washington, Seattle, WA, United States
2 Seattle Children's Hospital, Seattle, WA, United States
3 University of California, Los Angeles, CA, United States
6.1 Introduction
Medical imaging creates visual representations of the human body, including organs and tissues, to aid in diagnosis and treatment. These visual representations are effective in a variety of medical settings and have become integral in clinical decision making. A common example is X-ray-based radiography examinations that are used to capture visual representations of the interior structure of the body. Other examples include Papanicolaou (Pap) smear analysis for cervical cancer screening and whole slide biopsy analysis for multiple kinds of cancer, including breast cancer and melanoma. Today's human physician is no longer able to interpret the vast amount of information now available in medical imaging. For example, there are hundreds of thousands of cells on some of the whole slide images (WSIs) of a biopsy.
Machine learning is an effective tool that enables machines (or computers) to learn meaningful patterns from medical imaging data, which can be used to build computer-aided diagnostic systems [1,2]. With the recent advancements in hardware technology and the availability of a large amount of medical imaging data, deep learning-based methods, especially convolutional neural networks (CNNs) [3], are gaining attention in medical image analysis [4,5]. Researchers have applied deep learning-based methods to a variety of medical data, such as WSIs [6], magnetic resonance (MR) tumor scans [7], and electron microscopic recordings [8], and tasks, such as cancer diagnosis [2,9] and cellular-level entity detection [8,10].
In this chapter, we describe the ESPNet architecture [11], which has been successfully applied across a variety of visual recognition tasks, including image classification, object detection, semantic segmentation, and medical image analysis [6,12,13]. To demonstrate the modeling power of the ESPNet architecture, we study the application of ESPNet to two different medical imaging tasks: (1) tissue-level segmentation of breast biopsy WSIs and (2) tumor segmentation in 3D brain MR images. Our results show that the ESPNet architecture efficiently learns meaningful representations that allow it to deliver good accuracy on different tasks.
The rest of the chapter is organized as follows. Section 6.2 reviews the different types of convolutions that are used in the ESPNet architecture, which is described in Section 6.3. Experimental results on two different medical imaging datasets are provided in Section 6.4. Section 6.5 concludes the chapter.
6.2 Background
The convolution operation lies at the heart of many computer vision algorithms, including CNNs. In this section, we briefly review two types of convolutions, standard and dilated convolutions, which are used in the ESPNet architecture.
6.2.1 Standard convolution
For a given 2D input image X, the standard convolution moves a kernel K of size n × n¹ over every spatial location of the input X to produce the output Y. Mathematically, it can be defined as:

Y = X * K,

where * denotes the convolution operation.
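As a concrete illustration, the standard convolution above can be sketched directly in NumPy. The function below is illustrative only; it uses the cross-correlation form common in CNNs and the same-size zero padding shown in Fig. 6.1:

```python
import numpy as np

def conv2d_same(x, k):
    """Standard 2D convolution (cross-correlation form, as in CNNs) with
    zero padding so the output has the same spatial size as the input."""
    n = k.shape[0]            # kernel is n x n, n odd
    p = n // 2                # padding needed for a 'same'-size output
    xp = np.pad(x, p)         # zero-pad the input (white cells in Fig. 6.1)
    h, w = x.shape
    y = np.zeros_like(x, dtype=float)
    for i in range(h):
        for j in range(w):
            # weighted sum of the n x n neighborhood under the kernel
            y[i, j] = np.sum(xp[i:i + n, j:j + n] * k)
    return y

x = np.arange(25, dtype=float).reshape(5, 5)
k = np.ones((3, 3)) / 9.0     # 3 x 3 averaging kernel
y = conv2d_same(x, k)         # y has the same 5 x 5 shape as x
```

Each output pixel is the weighted sum of the n × n input neighborhood centered on it, which is all a standard convolution does.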
6.2.2 Dilated convolution
Dilated convolutions² are a special form of standard convolutions in which holes are inserted between kernel elements to increase the effective receptive field. These convolutions have been widely used in semantic segmentation networks [14,15]. For a given 2D input image X, the dilated convolution moves a kernel K of size n × n with a dilation rate of r over every spatial location of the input X to produce the output Y. Mathematically, it can be defined as:

Y = X *r K,

where *r denotes the dilated convolution operation with a dilation rate of r. The effective receptive field of a dilated convolutional kernel of size n × n with a dilation rate of r is [(n − 1)r + 1] × [(n − 1)r + 1]. Note that dilated convolutions are the same as standard convolutions when the dilation rate r is 1. An example comparing dilated and standard convolutions is shown in Fig. 6.1.
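The dilated convolution can be sketched the same way; a minimal, illustrative helper (not the ESPNet implementation) that samples the input with step r under the kernel. The single-active-pixel input mirrors the gridding example of Fig. 6.3:

```python
import numpy as np

def dilated_conv2d_same(x, k, r):
    """2D dilated convolution (cross-correlation form) with dilation rate r
    and zero padding so the output matches the input size.
    r = 1 recovers the standard convolution."""
    n = k.shape[0]                  # kernel is n x n
    eff = (n - 1) * r + 1           # effective receptive field per side
    p = eff // 2
    xp = np.pad(x, p)
    h, w = x.shape
    y = np.zeros_like(x, dtype=float)
    for i in range(h):
        for j in range(w):
            # sample the padded input with step r under the kernel
            y[i, j] = np.sum(xp[i:i + eff:r, j:j + eff:r] * k)
    return y

# a single active pixel convolved with a dilated kernel of ones:
# the non-zero outputs land on a sparse grid (the gridding pattern of Fig. 6.3)
x = np.zeros((5, 5)); x[2, 2] = 1.0
y = dilated_conv2d_same(x, np.ones((3, 3)), r=2)
```

With a 3 × 3 kernel and r = 2, the effective receptive field is (3 − 1) × 2 + 1 = 5 per side, and the delta input activates only every other output position, which is exactly the checkerboard artifact the HFF method in Section 6.3.1.1 is designed to remove.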
6.3 The ESPNet architecture
In this section, we elaborate on the details of the ESPNet architecture. We first describe the ESP module, the core building block of the ESPNet architecture, and then describe different variants of the ESPNet architecture for analyzing different types of medical images, such as whole slide biopsy images and 3D brain tumor images.
Figure 6.1 A comparison between standard and dilated convolution operations. The input X is padded with zeros (white cells) so that the output of the convolution operation is the same size as the input.

6.3.1 Efficient spatial pyramid unit
The core building block of the ESPNet architecture is the efficient spatial pyramid (ESP) unit. To be efficient, the ESP unit decomposes the standard convolution into a point-wise convolution and a spatial pyramid of dilated convolutions using the RSTM (reduce, split and transform, and merge) principle:
• Reduce: The ESP unit applies d point-wise³ (or 1 × 1) convolutions to an input tensor X of size W × H × M to produce an output tensor Y of size W × H × d, where W and H represent the width and height of the tensor, whereas M and d represent the number of channels in the input and output tensors.
• Split and Transform: To learn spatial representations from a large receptive field efficiently, the ESP unit applies K n × n dilated convolutions simultaneously to Y to produce outputs Z_k, 1 ≤ k ≤ K, using a different dilation rate, 2^(k−1), in each branch. This allows the ESP unit to learn spatial representations from an effective receptive field of [(n − 1)2^(K−1) + 1] × [(n − 1)2^(K−1) + 1], where n × n denotes the kernel size.
• Merge: The ESP unit concatenates these d-dimensional feature maps to produce an N-dimensional feature map Z = concat(Z_1, …, Z_K), where concat represents the concatenation operation and N = Kd.
Fig. 6.2 compares the ESP unit with the standard convolution. The ESP unit learns (MN + n²N²)/K parameters and performs WH(MN + n²N²)/K operations. Compared to the n²MN parameters and WHn²MN operations of the standard convolution, the RSTM principle reduces the total number of parameters and operations by a factor of Kn²M/(M + n²N), while simultaneously increasing the effective receptive field by approximately a factor of 2^(K−1) along each spatial dimension.
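As a sanity check on these counts, the arithmetic can be written out for one illustrative setting; the values of n, M, N, and K below are arbitrary, not taken from the chapter, and d = N/K follows the split described above:

```python
# Parameter counts for a standard n x n convolution (M -> N channels)
# versus the ESP factorization with K parallel branches (d = N / K).
n, M, N, K = 3, 128, 128, 4
d = N // K

standard_params = n * n * M * N            # n^2 * M * N
# point-wise reduce (M*d) plus K dilated n x n convolutions (each d -> d)
esp_params = M * d + K * (n * n * d * d)   # = (M*N + n^2 * N^2) / K

reduction = standard_params / esp_params   # matches K*n^2*M / (M + n^2*N)
receptive_side = (n - 1) * 2 ** (K - 1) + 1  # effective receptive field side
```

For this setting the ESP unit uses 3.6 times fewer parameters than the standard convolution while its receptive field grows from 3 × 3 to 17 × 17.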
6.3.1.1 Hierarchical feature fusion for degridding in the efficient spatial pyramid unit
Dilated convolutions insert zeros, controlled by the dilation rate, between kernel elements to increase the receptive field of the kernel. However, this introduces unwanted checkerboard or gridding artifacts in the output, as shown in Fig. 6.3. The ESP unit introduces a hierarchical feature fusion (HFF) method to remove these artifacts in a computationally efficient manner⁴. HFF hierarchically adds the branch outputs Z_k (i.e., Z'_k = Z_k ⊕ Z'_(k−1), with Z'_1 = Z_1) and then concatenates them to produce a high-dimensional feature map Z = concat(Z'_1, …, Z'_K), where ⊕ represents the element-wise addition operation. Fig. 6.4 visualizes the ESP unit with and without HFF.
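A minimal sketch of the HFF merge, assuming the K branch outputs are channels-first NumPy arrays of equal shape, already sorted by dilation rate (the function is illustrative, not the released ESPNet code):

```python
import numpy as np

def hff_merge(branch_outputs):
    """Hierarchical feature fusion: add each branch output to the running
    sum of the previous ones, then concatenate the fused maps along the
    channel axis. The additions smooth out the gridding artifacts."""
    fused = [branch_outputs[0]]
    for z in branch_outputs[1:]:
        fused.append(fused[-1] + z)           # element-wise addition
    return np.concatenate(fused, axis=0)      # channel-wise concatenation

# K = 4 branch outputs, each with d = 2 channels on a 4 x 4 grid
branches = [np.full((2, 4, 4), float(k)) for k in range(1, 5)]
merged = hff_merge(branches)                  # shape (8, 4, 4), i.e. N = K*d
```

The running sums (1, 1+2, 1+2+3, …) show why HFF costs only K − 1 additions: no extra convolution layer is needed to remove the artifacts.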
Figure 6.2 A comparison between the standard convolution and the ESP unit. In the ESP unit, the standard convolution is decomposed into a point-wise convolution and a spatial pyramid of dilated convolutions using the RSTM principle. Note that in dilated convolutions, zeros (represented in white) are inserted between the kernel elements to increase the receptive field.
Figure 6.3 This figure illustrates a gridding artifact example, where a 5 × 5 input with a single active pixel is convolved with a 3 × 3 dilated convolution kernel to produce an output with a gridding artifact. Active input, output, and kernel elements are shown in color.
6.3.2 Segmentation architecture
Most image segmentation architectures follow an encoder–decoder structure [16]. The encoder network is a stack of encoding units, such as the bottleneck unit in ResNet [17], and downsampling units that help the network learn multiscale representations. Spatial information is lost during filtering and downsampling operations in the encoder; the decoder tries to recover this lost information. The decoder can be viewed as an inverse of the encoder that stacks upsampling and decoding units to learn representations that can help produce either binary or multiclass segmentation masks. It is important to note that a vanilla encoder–decoder network does not share information between encoding and decoding units at each spatial level. To do so, U-Net [8] introduces a skip connection between the encoding and decoding units at each spatial level. This connection establishes a direct link between the encoding and decoding units and improves gradient flow, thus improving segmentation performance. Fig. 6.5 compares the vanilla encoder–decoder and U-Net-style encoder–decoder architectures.
The ESPNet architecture for semantic segmentation extends the U-Net. In contrast to using computationally expensive VGG-style [18] blocks for encoding and decoding the information, the ESPNet architecture uses computationally efficient ESP units.
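The encoder–decoder skip structure described above can be sketched with toy stages; everything below (function names, average pooling as the downsampler, nearest-neighbor upsampling) is illustrative rather than the actual network:

```python
import numpy as np

def downsample(x):   # 2x average pooling (stand-in for an encoder stage)
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).mean(axis=(2, 4))

def upsample(x):     # 2x nearest-neighbor upsampling (decoder stage)
    return x.repeat(2, axis=1).repeat(2, axis=2)

def unet_style_forward(x, depth=2):
    """Toy U-Net-style pass: the decoder concatenates the encoder feature
    map from the same spatial level (the skip connection) before moving up."""
    skips = []
    for _ in range(depth):
        skips.append(x)          # save features for the skip connection
        x = downsample(x)
    for skip in reversed(skips):
        x = upsample(x)
        x = np.concatenate([x, skip], axis=0)  # skip via concatenation
    return x

x = np.random.rand(3, 8, 8)
y = unet_style_forward(x)        # channels grow at each level due to skips
```

In a vanilla encoder–decoder the `np.concatenate` line would be absent; the skip connection is the only structural difference, yet it is what restores the spatial detail lost in the encoder.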
6.4 Experimental results
In this section, we provide results of the ESPNet architecture on two different medical imaging datasets: (1) breast biopsy whole slide imaging and (2) brain tumor segmentation.
6.4.1 Breast biopsy whole slide image dataset
6.4.1.1 Dataset
The breast biopsy dataset consists of 240 WSIs with haematoxylin and eosin (H&E) staining [19,20]. Three expert pathologists independently interpreted each of the 240 cases and then met to review each case together and provide a consensus reference diagnosis label, which we use as the gold-standard ground truth for each case. Additionally, the expert pathologists marked 428 regions of interest (ROIs) on these 240 WSIs that were representative of their diagnoses. Out of these 428 ROIs, 58 were manually segmented by an expert into eight different tissue-level segmentation labels: background, benign epithelium, malignant epithelium, normal stroma, desmoplastic stroma, secretion, blood, and necrosis. Fig. 6.6 visualizes some ROIs along with their tissue-level segmentation labels. Furthermore, we split these 58 ROIs into two roughly equally sized subsets, a training set (30 ROIs) and a test set (28 ROIs), while keeping the distribution of diagnostic categories similar (see Table 6.1). A total of 87 pathologists who were actively interpreting breast biopsies in their own clinical practices participated in the study and provided one of four diagnostic labels (benign, atypia, ductal carcinoma in situ, and invasive cancer) per WSI of a case. Each pathologist provided diagnostic labels for a randomly assigned subset of 60 patients' WSIs, producing an average of 22 diagnostic labels per case.

Figure 6.4 Block-level visualization of the ESP unit without and with hierarchical feature fusion (HFF).

Figure 6.5 Encoder–decoder architectures for tissue-level semantic segmentation. Colors represent different types of units in the encoder–decoder architecture: encoding unit in green, downsampling unit in red, upsampling unit in orange, and decoding unit in blue.
Table 6.1 Diagnostic category-wise distribution of the 58 regions of interest (ROIs) for the segmentation subset.

Expert consensus reference diagnostic category   Training   Test   Total
Benign                                                  4      5       9
Atypia                                                 11     11      22
DCIS                                                   12     10      22
Invasive                                                3      2       5
Total                                                  30     28      58
6.4.1.2 Training
The breast biopsy ROIs (and WSIs) have huge spatial dimensions, often on the order of gigapixels. Due to the high spatial dimensionality of these images and the limited computational capability of existing hardware, it is difficult to train conventional CNNs directly on these images. Therefore, we follow a patch-based approach that splits an ROI (or WSI) into small patches of size 384 × 384 with an overlap of 56 pixels between consecutive patches. We use standard augmentation strategies, such as random flipping, cropping, and resizing, to prevent overfitting. For training, we split the training set of 30 ROIs into training and validation subsets with a 90:10 ratio. We train our network for 100 epochs using stochastic gradient descent (SGD) with an initial learning rate of 0.0001, which is decreased by a factor of 0.5 after every 30 epochs. To evaluate the tissue-level segmentation performance, we use mean intersection over union (mIOU), a widely used metric for evaluating segmentation performance.
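A minimal sketch of the mIOU metric used above, assuming integer label masks; classes absent from both prediction and ground truth are skipped, which is one common convention:

```python
import numpy as np

def mean_iou(pred, gt, num_classes):
    """Mean intersection over union across classes.
    For each class: |pred == c AND gt == c| / |pred == c OR gt == c|."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:                  # skip classes absent from both masks
            ious.append(inter / union)
    return float(np.mean(ious))

gt   = np.array([[0, 0], [1, 1]])
pred = np.array([[0, 1], [1, 1]])
miou = mean_iou(pred, gt, num_classes=2)   # (1/2 + 2/3) / 2
```

In this toy case class 0 has IOU 1/2 and class 1 has IOU 2/3, so the mIOU is 7/12.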
The shapes and structures of objects of interest in breast biopsies vary in size, and splitting such structures into fixed-size patches limits the contextual information; CNNs tend to make segmentation errors, especially at the borders of patches (see Fig. 6.7). To avoid such errors, we centrally crop a 384 × 384 prediction to a size of 256 × 256, as shown in Fig. 6.8.
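The patch extraction and central cropping described above can be sketched as follows; `patch_starts` and `center_crop` are hypothetical helpers for illustration, not the pipeline's actual code:

```python
import numpy as np

def patch_starts(length, patch, overlap):
    """Top-left coordinates of overlapping patches covering one axis."""
    stride = patch - overlap
    starts = list(range(0, max(length - patch, 0) + 1, stride))
    if starts[-1] + patch < length:   # make sure the border is covered
        starts.append(length - patch)
    return starts

def center_crop(x, size):
    """Keep only the central size x size region of a prediction patch,
    discarding the error-prone border."""
    h, w = x.shape[:2]
    top, left = (h - size) // 2, (w - size) // 2
    return x[top:top + size, left:left + size]

# patches of 384 with 56-pixel overlap along a hypothetical 1000-pixel axis
starts = patch_starts(1000, patch=384, overlap=56)
cropped = center_crop(np.zeros((384, 384)), 256)   # central 256 x 256 region
```

The last patch is shifted back to end exactly at the image border, and the 64-pixel margin discarded by the center crop is where the border errors of Fig. 6.7 concentrate.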
Figure 6.6 A set of ROIs (top row) along with their tissue-level segmentation labels (bottom row) from the dataset.
Figure 6.7 Segmentation errors along the border of the patch.
6.4.1.3 Segmentation results
Tables 6.2 and 6.3 summarize the performance of different segmentation architectures.
Table 6.2 Impact of skip connections and different decoding units.

Encoding–decoding units   Skip connection (Add)   Skip connection (Concat)   Network parameters   mIOU
ESP–ESP                                                                      1.95 M               35.23
ESP–ESP                   ✓                                                  1.95 M               36.19
ESP–ESP                                           ✓                          2.25 M               38.03
ESP–PSP                                           ✓                          2.75 M               44.03
Table 6.3 Comparison with state-of-the-art methods.

Method                               Network parameters   mIOU
Superpixel + SVM                     NA                   25.8
SegNet                               12.80 M              37.6
SegNet + additive skip connections   12.80 M              38.1
Multiresolution                      26.03 M              44.20
Y-Net (ESP–PSP)                      2.75 M               44.03

Notes: The results in the first four rows are taken from Refs. [6,11,21].
6.4.1.4 Skip connections
Skip connections between the encoder and the decoder in Fig. 6.5 can be constructed using two operations: (1) element-wise addition and (2) concatenation. The impact of these connections is studied in Table 6.2. We can see that encoder–decoder networks with skip connections constructed using concatenation operations improve the accuracy of the vanilla encoder–decoder by about 4%, while encoder–decoder networks with skip connections constructed using element-wise addition operations improve it by about 2%.
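The two skip-connection variants compared in Table 6.2 differ only in how the encoder and decoder feature maps are merged; note that concatenation doubles the channel count seen by the next layer, which accounts for the extra parameters of the Concat rows (the shapes below are arbitrary, for illustration):

```python
import numpy as np

enc = np.random.rand(16, 32, 32)   # encoder feature map at some level
dec = np.random.rand(16, 32, 32)   # decoder feature map at the same level

added  = dec + enc                             # element-wise addition: same shape
concat = np.concatenate([dec, enc], axis=0)    # concatenation: channels double
```

Addition keeps the channel count (and hence the parameter count of the following layer) unchanged, while concatenation preserves both feature maps intact at the cost of more parameters.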
6.4.1.5 Pyramidal spatial pooling as a decoding unit
Traditionally, the decoding unit in encoder–decoder architectures, such as SegNet [16] and U-Net [8], is the same as the encoding unit. In our recent work [6,11,21], we introduced a general encoder–decoder network, sketched in Fig. 6.5, that allows the use of different encoding and decoding units. When we replaced the ESP unit with the pyramidal spatial pooling (PSP) unit [22] in the decoder, the performance of our network improved by about 6%. This is likely because the pooling operations in the PSP unit allow the capture of better global contextual representations (Fig. 6.9).
Figure 6.8 Center cropping to minimize segmentation errors along patch borders.
6.4.1.6 Comparison with state-of-the-art methods
Table 6.3 compares the performance of different methods. We can see that the asymmetric encoder–decoder structure, with the ESP as the encoding unit and the PSP as the decoding unit, allows us to build a light-weight network that has far fewer parameters than the network of Refs. [6,11,21] while delivering similar segmentation performance.
6.4.1.7 Tissue-level segmentation masks for computer-aided diagnosis
Tissue-level segmentation masks provide a powerful abstraction for diagnostic classification. To demonstrate the descriptive power of tissue-level segmentation masks, we extract and study the impact of two features, tissue-distribution and structural features [2], on a four-class breast cancer diagnostic task. For this analysis, we use the 428 ROIs. We use a support-vector machine classifier with a polynomial kernel in a leave-one-out cross-validation strategy. We measure the performance in terms of sensitivity, specificity, and accuracy. Table 6.4 summarizes the diagnostic classification results. We can see that simple features extracted from tissue-level segmentation masks are powerful, and we are able to attain accuracies similar to those of pathologists.
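The leave-one-out cross-validation protocol can be sketched as below. The chapter uses a support-vector machine with a polynomial kernel; the nearest-centroid classifier here is only a stand-in so the example stays self-contained, and any fit/predict pair could be plugged in:

```python
import numpy as np

def loocv_accuracy(features, labels, fit, predict):
    """Leave-one-out cross-validation: train on all samples but one,
    predict the held-out sample, and average the accuracy over samples."""
    n = len(labels)
    correct = 0
    for i in range(n):
        mask = np.arange(n) != i              # hold out sample i
        model = fit(features[mask], labels[mask])
        correct += int(predict(model, features[i]) == labels[i])
    return correct / n

# stand-in classifier: nearest class centroid
def fit(X, y):
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def predict(model, x):
    return min(model, key=lambda c: np.linalg.norm(x - model[c]))

# toy 2D features with two well-separated classes
X = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.1]])
y = np.array([0, 0, 1, 1])
acc = loocv_accuracy(X, y, fit, predict)
```

Leave-one-out is a natural choice here because the segmented subset is small (58 ROIs for segmentation, 428 for diagnosis), so every sample is used for both training and evaluation without leakage.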
Table 6.4 Impact of different features on diagnostic classification, from Ref. [2].

Diagnostic features           Accuracy   Sensitivity   Specificity

Invasive versus noninvasive
Tissue-distribution feature   0.94       0.70          0.95
Structural feature            0.91       0.49          0.96
Pathologists                  0.98       0.84          0.99

Atypia and DCIS versus benign
Tissue-distribution feature   0.70       0.79          0.41
Structural feature            0.70       0.85          0.45
Pathologists                  0.81       0.72          0.62

DCIS versus atypia
Tissue-distribution feature   0.83       0.88          0.78
Structural feature            0.85       0.89          0.80
Pathologists                  0.80       0.70          0.82

Notes: The best numbers are indicated in bold.

Figure 6.9 ROI-wise predictions. The first row shows different breast biopsy ROIs, the second row shows tissue-level ground-truth labels, and the third row shows predictions. ROI, region of interest.
6.4.2 Brain tumor segmentation
6.4.2.1 Dataset
The Multimodal Brain Tumor Segmentation Challenge (BraTS) 2018 training set consists of 285 multi-institutional preoperative multimodal MR tumor scans, each consisting of T1, postcontrast T1-weighted (T1ce), T2, and fluid-attenuated inversion recovery (FLAIR) volumes [23]. Each volume is annotated with voxel labels corresponding to the following tumor compartments: enhancing tumor, peritumoral edema, background, and necrotic core and nonenhancing tumor. Necrotic core and nonenhancing tumor share a single label. These data are coregistered to the standard Montreal Neurological Institute (MNI) anatomical template, interpolated to the same resolution, and skull-stripped. Ground-truth segmentations are manually drawn by radiologists. Fig. 6.10 visualizes some MR modalities along with their voxel-level segmentation labels.
6.4.2.2 Training
MR scans are high-dimensional: each case contains four large MR sequence volumes. We use standard augmentation strategies, such as random flipping, cropping, and resizing, to prevent overfitting. For training, we split the training set of 285 MR scans into train and validation subsets with an 80:20 ratio (228:57). We train our network for 300 epochs using SGD with an initial learning rate of 0.0001, decreased to 0.00001 after 200 epochs. We evaluate the segmentation performance using an online server that measures performance in terms of the Dice score. Furthermore, we adapt the ESP module for volume-wise segmentation by replacing the spatial dilated convolutions in Fig. 6.4 with volume-wise dilated convolutions. For more details, please see Ref. [13].
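The volume-wise adaptation replaces the spatial (2D) dilated convolutions with their 3D analogues. A single-channel sketch of such a volume-wise dilated convolution (illustrative, not the 3D-ESPNet implementation):

```python
import numpy as np

def dilated_conv3d_same(x, k, r):
    """Volume-wise dilated convolution (single channel) with zero padding,
    the 3D analogue of the spatial dilated convolution in the ESP unit."""
    n = k.shape[0]                  # kernel is n x n x n
    eff = (n - 1) * r + 1           # effective receptive field per side
    p = eff // 2
    xp = np.pad(x, p)
    d, h, w = x.shape
    y = np.zeros_like(x, dtype=float)
    for i in range(d):
        for j in range(h):
            for l in range(w):
                # sample the padded volume with step r along all three axes
                patch = xp[i:i + eff:r, j:j + eff:r, l:l + eff:r]
                y[i, j, l] = np.sum(patch * k)
    return y

# a single active voxel shows the 3D gridding pattern (cf. Fig. 6.3 in 2D)
x = np.zeros((5, 5, 5)); x[2, 2, 2] = 1.0
y = dilated_conv3d_same(x, np.ones((3, 3, 3)), r=2)
```

The receptive-field formula [(n − 1)r + 1] carries over per axis, so a 3 × 3 × 3 kernel with r = 2 covers a 5 × 5 × 5 neighborhood at the cost of a 3 × 3 × 3 one.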
6.4.2.3 Results
Table 6.5 summarizes the results on the BraTS 2018 online test and validation sets. Unlike the winning entries from the competition, which often use specific normalization techniques, model ensembling, and postprocessing methods such as conditional random fields (CRFs) to boost performance [7,24], our network was able to attain good segmentation performance while learning merely 3.8 million parameters, an order of magnitude fewer than existing methods. Furthermore, our visual inspection reveals convincing performance; we display segmentation results overlaid on different modalities in Fig. 6.11. It is important to note that the predictions made by our network are smooth and lack some of the granularity present in the ground-truth segmentations. This is likely because our model is not able to learn such granular details from a limited amount of training data. We believe that postprocessing techniques, such as CRFs, would help our network capture such granular details, and we will study such methods in the future.
Table 6.5 Results obtained by our method [13] on the BraTS 2018 online test and validation sets.

Networks            Whole tumor   Enhancing tumor   Tumor core
Ours (validation)   0.883         0.737             0.814
Ours (test)         0.850         0.665             0.782
Myronenko [24]      0.884         0.766             0.815

Notes: We also compare with the winning entry on the test set [24], which uses an ensemble of 10 models and special normalization techniques.

Figure 6.10 A set of MR modalities (left) along with their voxel-level segmentation labels (right). MR, magnetic resonance.
6.4.3 Other applications
The ESPNet architecture is a general-purpose architecture and can be used across a wide variety of applications, ranging from image classification to language modeling. In our work, we have used ESPNets for image classification [12], object detection [12], semantic segmentation [6,11,21], language modeling [12], and autism spectrum disorder prediction [25].
6.5 Conclusion
This chapter describes the ESPNet architecture, which is built using a novel convolutional unit, the ESP unit, introduced in Refs. [6,11,21]. To demonstrate the modeling power of the ESPNet architecture on medical imaging, we study it on two different datasets: (1) a breast biopsy WSI dataset and (2) a brain tumor segmentation MR imaging dataset. Our analysis shows that ESPNet can efficiently learn meaningful representations for different medical imaging data.
Acknowledgment
Research reported in this chapter was supported by Washington State Department of Transportation research grant T1461-47, NSF III (1703166), National Cancer Institute awards (R01 CA172343, R01 CA140560, R01 CA200690, and U01 CA231782), the NSF IGERT Program (award no. 1258485), the Allen Distinguished Investigator Award, a Samsung GRO award, and gifts from Google, Amazon, and Bloomberg. The authors would also like to thank Dr. Jamen Bartlett and Dr. Donald L. Weaver for helpful discussions and feedback. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing endorsements, either expressed or implied, of the National Cancer Institute, the National Institutes of Health, or the US Government.
Figure 6.11 This figure visualizes the segmentation results produced by our network overlaid on each of the four MR modalities in the BraTS dataset. MR, magnetic resonance.

References
[1] C.J. Lynch, C. Liston, New machine-learning technologies for computer-aided diagnosis, Nat. Med. 24 (2018) 1304–1305.
[2] E. Mercan, et al., Assessment of machine learning of breast pathology structures for automated differentiation of breast cancer and high-risk proliferative lesions, JAMA Netw. Open 2 (2019) e198777.
[3] Y. LeCun, Y. Bengio, Convolutional networks for images, speech, and time series, in: The Handbook of Brain Theory and Neural Networks, 1995.
[4] G. Litjens, et al., A survey on deep learning in medical image analysis, Med. Image Anal. 42 (2017) 60–88.
[5] D. Shen, G. Wu, H.-I. Suk, Deep learning in medical image analysis, Annu. Rev. Biomed. Eng. (2017) 221–248.
[6] S. Mehta, et al., Y-Net: joint segmentation and classification for diagnosis of breast biopsy images, in: International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2018, pp. 893–901.
[7] K. Kamnitsas, et al., Ensembles of multiple models and architectures for robust brain tumour segmentation, in: International MICCAI Brainlesion Workshop, 2017.
[8] O. Ronneberger, P. Fischer, T. Brox, U-Net: convolutional networks for biomedical image segmentation, in: International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2015, pp. 234–241.
[9] A. Esteva, et al., Dermatologist-level classification of skin cancer with deep neural networks, Nature 542 (2017) 115–118.
[10] D.C. Cireşan, A. Giusti, L.M. Gambardella, J. Schmidhuber, Mitosis detection in breast cancer histology images with deep neural networks, in: International Conference on Medical Image Computing and Computer-Assisted Intervention (MICCAI), 2013.
[11] S. Mehta, et al., ESPNet: efficient spatial pyramid of dilated convolutions for semantic segmentation, in: Proceedings of the European Conference on Computer Vision (ECCV), 2018.
[12] S. Mehta, M. Rastegari, L. Shapiro, H. Hajishirzi, ESPNetv2: a light-weight, power efficient, and general purpose convolutional neural network, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
[13] N. Nuechterlein, S. Mehta, 3D-ESPNet with pyramidal refinement for volumetric brain tumor image segmentation, in: International MICCAI Brainlesion Workshop, 2018.
[14] L.-C. Chen, et al., DeepLab: semantic image segmentation with deep convolutional nets, atrous convolution, and fully connected CRFs, IEEE Trans. Pattern Anal. Mach. Intell. (2017) 834–848.
[15] F. Yu, V. Koltun, Multi-scale context aggregation by dilated convolutions, in: International Conference on Learning Representations (ICLR), 2016.
[16] V. Badrinarayanan, A. Kendall, R. Cipolla, SegNet: a deep convolutional encoder–decoder architecture for image segmentation, IEEE Trans. Pattern Anal. Mach. Intell. (2017) 2481–2495.
[17] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 770–778.
[18] K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, in: International Conference on Learning Representations (ICLR), 2015.
[19] J.G. Elmore, et al., Diagnostic concordance among pathologists interpreting breast biopsy specimens, JAMA 313 (2015) 1122–1132.
[20] J.G. Elmore, et al., A randomized study comparing digital imaging to traditional glass slide microscopy for breast biopsy and cancer diagnosis, J. Pathol. Inform. 8 (2017) 12.
[21] S. Mehta, et al., Learning to segment breast biopsy whole slide images, in: IEEE Winter Conference on Applications of Computer Vision (WACV), 2018, pp. 663–672.
[22] H. Zhao, et al., Pyramid scene parsing network, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017, pp. 2881–2890.
[23] B.H. Menze, et al., The multimodal brain tumor image segmentation benchmark (BRATS), IEEE Trans. Med. Imaging 34 (2015) 1993–2024.
[24] A. Myronenko, 3D MRI brain tumor segmentation using autoencoder regularization, in: International MICCAI Brainlesion Workshop, 2018.
[25] B. Li, et al., A facial affect analysis system for autism spectrum disorder, in: IEEE International Conference on Image Processing (ICIP), 2019.
Footnotes
1 For simplicity, we assume that n is an odd number, such as 3, 5, 7, and so on.
2 Dilated convolutions are sometimes also referred to as atrous convolutions, wherein "trous" means holes in French.
3 Point-wise convolutions are a special case of standard convolutions when n = 1. These convolutions produce an output plane by learning linear combinations between input channels. They are widely used for either channel expansion or channel reduction operations.
4 Gridding artifacts can be removed by adding another convolution layer with no dilation after the concatenation operation in Fig. 6.4 (left); however, this will increase the computational complexity of the block.
Abstract
Medical imaging is a fundamental part of clinical care that creates informative, noninvasive, visual representations of the structure and function of the interior of the body. With advancements in technology and the availability of massive amounts of imaging data, data-driven methods, such as machine learning and data mining, have become popular in medical image analysis. In particular, deep learning-based methods, such as convolutional neural networks, now have the requisite volume of data and computational power to be considered practical clinical tools. We describe the architecture of the ESPNet network and provide experimental results for the task of semantic segmentation on two different types of medical images: (1) tissue-level segmentation of breast biopsy whole slide images and (2) 3D tumor segmentation in brain magnetic resonance images. Our results show that the ESPNet architecture is efficient and learns meaningful representations for different types of medical images, which allows ESPNet to perform well on these images.

Keywords: ESPNet; whole slide images; magnetic resonance images; medical imaging; tumor segmentation