Segmentation uncertainty and error estimation in medical ... · Manual delineation study...
Prof. Leo Joskowicz
CASMIP Lab, School of Computer Science and Engineering
The Hebrew University of Jerusalem, ISRAEL
Joint with: D. Cohen, Dr. N. Caplan, Prof. J. Sosna, Dept. of Radiology, Hadassah Univ. Medical Center, Jerusalem, Israel
Segmentation uncertainty and error estimation in medical imaging
@ Copyright L. Joskowicz 2018
Why do we measure?
“Measure what you can measure, and make measurable what is not measurable”
Galileo Galilei
“A science is as mature as its measurement tools”
Louis Pasteur
Motivation
Segmentation of anatomical structures and pathologies in medical images is important!
Clinical
• Identification and quantification of structures
[Figures: liver tumors, kidney structure, carotid stenosis; baseline (Jan 25, 2017) and follow-up (July 8, 2018)]
Motivation: clinical
• Diagnosis and disease progression
[Figures: external fixation, internal fixation]
Motivation: clinical
• Treatment planning
[Figures: radiosurgery, femur fracture surgery, 3D printed patient-specific surgical guides]
Motivation: clinical
• Treatment delivery: surgery
[Figures: robotic surgery, SpineAssist]
Motivation: technical
Segmentation is a fundamental problem in medical image processing:
• Structure modeling
• Atlas construction
• Registration
• Simulation
• Big data radiology
• …
Segmentation is a hard problem!
[Figure: segmentation leak]
Fuzzy/non-existing boundaries, low contrast, partial volume effect, …
Segmentation errors and their correction significantly hamper the use of 3D models in the clinic
Motivation
Clinical
• Definition of segmentation quality w.r.t. ideal/consensus
• Validation of segmentation results is time-consuming!
• No confidence level is provided
• Ground-truth generation is tedious and time-consuming
• Observer variability quantification
Technical
• Algorithm development: an essential measure!
• Comparison between methods
• Segmentation error detection and correction
Segmentation validation is essential!
Requires ground truth!
Segmentation evaluation
• Ground truth given
• Only error estimation
[Diagram: Image, Segmentation, Ground truth, Segmentation Error, Segmentation Quality]
Quality = f(Error)
Current practice
How do we create the ground truth?
Segmentation variability: example
[Figure: CT scan with the delineations of Radiologists 1 to 5, shown individually and all together]
Tumor volume? Tumor boundary?
Segmentation variability: example
Manual delineations by 10 radiologists
[Figure: a low-variability case and a high-variability case; one color per radiologist]
Observation
There is no single gold standard or ground truth!
• Measurement/delineation is intrinsically uncertain!
• The observer delineation variability is the uncertainty
• Sources of observer delineation variability:
o Subjective, observer-dependent
  • Manual hand-eye coordination skills
  • Attentiveness and thoroughness
  • Expertise and knowledge
o Objective, observer-independent
  • Imaging: scanning protocol, resolution, contrast, ...
  • Intrinsic: structure characteristics, fuzzy contours due to partial volume effects, neighboring structures, …
Segmentation quality evaluation: several observers
[Diagram: Image, Segmentation, Observers (Observer 1, Observer 2, …, Observer n), Mean contour, Ground truth, Segmentation Error, Variability, Segmentation Quality]
Quality = f(Error, Variability)
Segmentation quality evaluation: several observers
• Variability estimation from several observers
• Impractical!
[Diagram: Image, Segmentation, Observer 1, Observer 2, …, Observer n, Mean contour, Ground truth, Segmentation Error, Observer Variability, Segmentation Quality]
Quality = f(Error, Variability)
Previous work
Clinical
• Literature on delineation observer variability is limited!
• Measures are generic, not scan- and structure-specific
• No variability range is provided (e.g., tumor volume 23±4 cc)
Technical
• A dozen relevant works on segmentation evaluation with no ground truth: 2010-18 (Top 2011, Grady 2014, Saad 2010, …)
• No generic segmentation variability model, too specific
• Require ad-hoc models, e.g. probabilistic segmentation
• Small-scale validation
Goals
Quantify and estimate segmentation variability for common structures and pathologies
1. Understand: large-scale manual delineation study to quantify observer variability
2. Estimate: automatic method for estimating the variability of a delineation without ground truth
3. Detect and correct: automatic methods for segmentation error detection and correction ← Not in this talk
Segmentation quality evaluation: no ground truth, no observers!
• Automatic variability estimation
• No Error → only Variability!
[Diagram: Image, Segmentation, Segmentation priors, Variability estimation, Segmentation Error, Observer Variability, Segmentation Quality]
Quality = f(Error, Variability)
Segmentation variability: illustration
[Figure: a low-variability case and a high-variability case]
Estimated variability without ground truth
Is this possible at all?
Segmentation variability: definition
• S(I) = \{s_1(I), \ldots, s_N(I)\}: set of N delineations of image I
• Sets of voxels inside a delineation for which:
o At least one annotator agrees → Possible
o All annotators agree → Consensus
o Difference between Possible and Consensus → Variability (difference = union − intersection)

Possible(I) = \bigcup_{i=1}^{N} s_i(I)
Consensus(I) = \bigcap_{i=1}^{N} s_i(I)
Variability(I) = \bigcup_{i=1}^{N} s_i(I) - \bigcap_{i=1}^{N} s_i(I)
[Figure: delineations with their Consensus, Possible, and Variability regions]
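The three sets defined above map directly onto set operations. The following is a minimal sketch, assuming each delineation is represented as a Python set of voxel coordinates; the function names and the toy data are illustrative, not taken from the study.

```python
from typing import Sequence, Set, Tuple

Voxel = Tuple[int, int, int]  # assumed representation: (x, y, z) indices


def possible(delineations: Sequence[Set[Voxel]]) -> Set[Voxel]:
    """Voxels marked by at least one annotator (union)."""
    return set().union(*delineations)


def consensus(delineations: Sequence[Set[Voxel]]) -> Set[Voxel]:
    """Voxels marked by all annotators (intersection)."""
    return set.intersection(*map(set, delineations))


def variability(delineations: Sequence[Set[Voxel]]) -> Set[Voxel]:
    """Voxels in Possible but not in Consensus (union minus intersection)."""
    return possible(delineations) - consensus(delineations)


# Toy example: three annotators delineating along one image row.
s1 = {(0, 0, 0), (1, 0, 0), (2, 0, 0)}
s2 = {(1, 0, 0), (2, 0, 0), (3, 0, 0)}
s3 = {(1, 0, 0), (2, 0, 0)}
print(len(possible([s1, s2, s3])), len(consensus([s1, s2, s3])),
      len(variability([s1, s2, s3])))
```

With a single delineation the union and the intersection coincide, so the variability set is empty, which matches the "one annotator → no variability" property.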
Segmentation variability: properties
• Patient-, structure-, and scan-specific
• Depends on which annotators and how many of them
• One annotator → no variability
• As the number of annotators increases:
o Possible increases: voxels are added
o Consensus decreases: voxels are removed
o Variability increases: voxels are added to the difference
• After sufficient annotators, no more voxels are added/deleted ⇒ Variability converges
• What is the actual variability across all annotators?
Manual delineation study
Quantify delineation observer variability for common structures and pathologies
• Collected 18 representative CT scans of 4 structures
• Recruited 11 annotators: 4 residents, 2 mid-career, 4 experts, 1 neuro-radiologist. Paid them by the hour.
• Performed 3,193 manual CT slice annotations
• Protocol to produce expert-validated unbiased delineations

Structures:
• Liver tumors: 5 cases
• Lung tumors: 5 cases
• Kidneys: 6 cases
• Brain hematomas: 2 cases

CT resolution: 512 × 512 × 102-449 voxels, 0.5-0.98 × 0.5-0.98 × 1.0-3.3 mm3
14 scans at 1.5 mm, 2 scans at 3 mm, 2 scans at 1 mm spacing; hematomas 0.5 × 0.5 × 1.5 mm3
Experimental results
Variability by:
1. Manual tracing
2. Pairs of annotators
3. Groups of annotators
4. Disagreement between annotators
5. Case type and difficulty
6. Expertise of annotators
7. Surface distance difference
1. Manual tracing
How much variability comes from manual skills alone?
Kidney contours: no medical knowledge required!
Slice with the smallest variability for kidney contour:
Kidney contours: 16% [-8,+8]%, ~4 pixels
Lowest variability of 16% for scans of 0.75 × 0.75 × 1.5 mm3
2. Pairs of annotators
Volume Overlap Difference
[Plots, one per structure: Variability % vs. case number]
• Liver tumors: 18% [-6,+7]%
• Brain hematomas: 18% [-6,+6]%
• Kidney contours: 9% [-1,+1]%
• Lung tumors: 21% [-9,+10]%
Maximum difference between two annotators (values marked on the plots): 40%, 37%, 13%, 31%
Significant mean differences between structures and cases
Very significant discrepancies for liver and lung tumors
Discrepancy ranges from 5% to 57%
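The slides do not spell out how the Volume Overlap Difference is computed. The sketch below assumes the common definition 1 − |A ∩ B| / |A ∪ B|, expressed as a percentage, and reports the smallest and largest value over all annotator pairs. The names `volume_overlap_difference` and `pairwise_range` are hypothetical, as is the toy data.

```python
from itertools import combinations
from typing import Hashable, Sequence, Set, Tuple


def volume_overlap_difference(a: Set[Hashable], b: Set[Hashable]) -> float:
    """Assumed definition: 100 * (1 - |A ∩ B| / |A ∪ B|), in percent."""
    union = a | b
    if not union:  # two empty delineations agree trivially
        return 0.0
    return 100.0 * (1.0 - len(a & b) / len(union))


def pairwise_range(delineations: Sequence[Set[Hashable]]) -> Tuple[float, float]:
    """Minimum and maximum overlap difference over all pairs of annotators."""
    values = [volume_overlap_difference(a, b)
              for a, b in combinations(delineations, 2)]
    return min(values), max(values)


# Toy example with voxel identifiers instead of coordinates.
s1, s2, s3 = {0, 1, 2}, {1, 2, 3}, {1, 2}
low, high = pairwise_range([s1, s2, s3])
print(f"pairwise variability: {low:.1f}% to {high:.1f}%")
```

Reporting the minimum and maximum over pairs mirrors the "maximum difference between two annotators" figures on the slide, under the stated assumption about the metric.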
3. Groups of annotators
Liver tumors: Possible, Consensus, and Variability volume % as a function of the # of annotators (Volume Overlap Difference)
[Plot: Variability % vs. # of annotators; curves: Possible (+31%), Consensus (-26%), maximum diff. of two annotators, minimum diff. of two annotators; one annotator: no variability]
Mean variability of 2 annotators is much smaller than that of 10:
• Maximum: 10% vs. 31%
• Minimum: 7% vs. 26%
The variability of 5 vs. 10 annotators is also significant!
3. Groups of annotators
Liver tumors: Possible, Consensus, and Variability volume % as a function of the # of annotators
[Plot: Variability % vs. # of annotators; curves: Possible (+31%), Consensus (-26%)]
Variability range for <5 annotators is 20%!
Variability range for k annotators decreases slowly
3. Groups of annotators
[Plots: lung tumors, kidney contours]
• Liver tumors: [-24,+27]%
• Lung tumors: [-25,+31]%
• Kidney contours: [-12,+13]%
• Brain hematomas: [-24,+29]%
Similar progression rate for all structures
Significant differences by structure: [-12,+23]% to [-25,+31]%
3. Groups of annotators: normalized variability
[Plot: Normalized Variability % vs. # of annotators]
• Two annotators: 37%
• Three annotators: 53%
• Five annotators: 72%
• Eight annotators: 82%
• All annotators: 100%
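One way to obtain a curve like the one above is to average the variability over every group of k annotators and then normalize by the variability of the full group. The sketch below assumes delineations are sets of voxel identifiers and measures variability as a voxel count; the slides do not state the exact averaging procedure, so `mean_variability_by_group_size` and `normalized_variability` are hypothetical helpers.

```python
from itertools import combinations
from statistics import mean
from typing import Dict, Hashable, Sequence, Set


def variability_size(group: Sequence[Set[Hashable]]) -> int:
    """Number of voxels in the union but not in the intersection."""
    return len(set().union(*group) - set.intersection(*group))


def mean_variability_by_group_size(
        delineations: Sequence[Set[Hashable]]) -> Dict[int, float]:
    """Mean variability over all groups of k annotators, for k = 1..N."""
    n = len(delineations)
    return {k: mean(variability_size(g) for g in combinations(delineations, k))
            for k in range(1, n + 1)}


def normalized_variability(
        delineations: Sequence[Set[Hashable]]) -> Dict[int, float]:
    """Same curve as a percentage of the all-annotator variability."""
    curve = mean_variability_by_group_size(delineations)
    total = curve[len(delineations)]
    return {k: (100.0 * v / total if total else 0.0) for k, v in curve.items()}


# Toy example with three annotators.
annotators = [{0, 1, 2}, {1, 2, 3}, {1, 2}]
print(normalized_variability(annotators))
```

Because adding an annotator can only enlarge the union and shrink the intersection, the averaged curve never decreases with k, which is consistent with the progression reported on the slide.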
5. Cases by type
Volume agreement with mean
[Plots, one per structure: Mean Dice coefficient vs. case number]
• Liver tumors: 93% [-3,+2]%
• Kidney contours: 96% [-1,+1]%
• Lung tumors: 91% [-4,+4]%
• Brain hematomas: 93% [-2,+2]%
Very good agreement with the mean delineation for all structures, with low variability: 91-96% [-4,+4]%
Of course, the mean delineation is unknown in practice…
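Agreement with the mean delineation can be illustrated with the Dice coefficient, 2|A ∩ B| / (|A| + |B|). The slides do not say how the mean delineation is built; the sketch below assumes a strict majority vote over annotators, and `majority_delineation` is a hypothetical stand-in for it.

```python
from collections import Counter
from typing import Hashable, List, Sequence, Set


def dice(a: Set[Hashable], b: Set[Hashable]) -> float:
    """Dice coefficient: 2|A ∩ B| / (|A| + |B|); 1.0 if both sets are empty."""
    if not a and not b:
        return 1.0
    return 2.0 * len(a & b) / (len(a) + len(b))


def majority_delineation(delineations: Sequence[Set[Hashable]]) -> Set[Hashable]:
    """Assumed 'mean' delineation: voxels marked by more than half the annotators."""
    votes = Counter(v for s in delineations for v in s)
    return {v for v, count in votes.items() if 2 * count > len(delineations)}


def agreement_with_mean(delineations: Sequence[Set[Hashable]]) -> List[float]:
    """Dice coefficient of each annotator against the assumed mean delineation."""
    reference = majority_delineation(delineations)
    return [dice(s, reference) for s in delineations]


# Toy example with three annotators.
print(agreement_with_mean([{0, 1, 2}, {1, 2, 3}, {1, 2}]))
```

As the slide notes, such a reference exists only when several observers are available, which is what motivates estimating variability without it.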
6. Annotators' expertise
Volume Overlap
[Plots, one per structure (liver tumors, lung tumors, kidney contours, brain hematomas): Volume Overlap for novices, residents, mid-career, experts]
Values marked on the plots: 0.92 [-0.04,+0.03], 0.94 [-0.02,+0.01], 0.98 [-0.01,+0.01], 0.94 [-0.01,+0.02]
Some statistical differences between structures
No statistical difference between groups!
Summary of the study
• Significant volume variability differences between cases (easy/hard) and structures (liver/lung tumors): 27-78%
• Wide volume variability range for 2 annotators: 5-57%
• Mean volume variability range of 2 annotators is much smaller than for 10 annotators: 7-10% vs. 26-31%
• 40% of the variability is due to 1 annotator; 60% to 2
• Annotators' disagreement is similar in % and trends for all
• 37%, 53%, 72% of the variability is captured by 2, 3, 5 annotators → about 10 annotators are necessary!
• No statistical difference between annotators' expertise!
Automatic estimation of variability
Automatic method for estimating the variability of a given segmentation without ground truth
[Diagram: INPUT (CT scan, Delineation); Segmentation priors library; Contour analysis; Sensitivity; OUTPUT (Variability)]
Manually compiled for each structure, scan protocol, task
Local and global intensity, texture, shape, …

Segmentation priors
f_i(v): voxel → quality(v) = f(error(v), variability(v))
f_1(v), \ldots, f_k(v): image-, structure-, and task-specific feature priors
F(f_1(v), \ldots, f_k(v)) = F(f(v))
• Segmentation priors for each property
• Quality estimate: function of segmentation priors
• Function is similar to the objective/energy function in optimization-based segmentation
• BUT: it is used for evaluation, not for optimization → can be richer/more complex, no search!
• Variability estimation by sensitivity analysis of F(f(v))

Variability estimation by sensitivity analysis
• F(s): segmentation priors function
• s_0 \in S: initial segmentation
• \varepsilon: sensitivity threshold
• \Delta s_0: segmentation variability range

\forall s_i \in \Delta s_0:\quad F(s_i) - F(s_0) < \varepsilon
[Plot: F(s) around s_0, with the threshold \varepsilon and the range \Delta s_0]
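The sensitivity condition above can be illustrated on a toy 2D image. This is only a sketch under strong assumptions, not the method from the talk: the prior F is a hypothetical inside-minus-outside mean intensity contrast, candidate segmentations are generated by repeatedly growing and shrinking s_0 by one pixel, and a candidate is accepted while |F(s_i) − F(s_0)| < ε (the absolute value is an assumption). All function names are illustrative.

```python
def mean_contrast(image, seg):
    """Hypothetical prior F(s): mean intensity inside s minus mean outside."""
    inside, outside = [], []
    for r, row in enumerate(image):
        for c, value in enumerate(row):
            (inside if (r, c) in seg else outside).append(value)
    if not inside or not outside:
        return float("-inf")
    return sum(inside) / len(inside) - sum(outside) / len(outside)


def neighbors(p):
    r, c = p
    return [(r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]


def dilate(seg, shape):
    """Grow the segmentation by one pixel (4-connectivity), clipped to the image."""
    rows, cols = shape
    grown = set(seg)
    for p in seg:
        grown.update(q for q in neighbors(p)
                     if 0 <= q[0] < rows and 0 <= q[1] < cols)
    return grown


def erode(seg):
    """Shrink the segmentation by one pixel (4-connectivity)."""
    return {p for p in seg if all(q in seg for q in neighbors(p))}


def variability_range(image, s0, prior, eps, max_steps=3):
    """Return (consensus, possible, variability) of the accepted candidates."""
    shape = (len(image), len(image[0]))
    f0 = prior(image, s0)
    accepted = [set(s0)]
    for step in (lambda s: dilate(s, shape), erode):
        cand = set(s0)
        for _ in range(max_steps):
            cand = step(cand)
            # Stop in this direction once the prior changes by eps or more.
            if not cand or abs(prior(image, cand) - f0) >= eps:
                break
            accepted.append(cand)
    possible = set().union(*accepted)
    consensus = set.intersection(*accepted)
    return consensus, possible, possible - consensus


# Toy image: bright plus-shaped core (100), fuzzy corners (50), dark background.
image = [[0, 0, 0, 0, 0],
         [0, 50, 100, 50, 0],
         [0, 100, 100, 100, 0],
         [0, 50, 100, 50, 0],
         [0, 0, 0, 0, 0]]
s0 = {(2, 2), (1, 2), (3, 2), (2, 1), (2, 3)}
consensus, possible, variability = variability_range(image, s0, mean_contrast, eps=20)
print(len(consensus), len(possible), len(variability))
```

A larger ε admits more candidates and therefore a wider estimated variability region, while a very small ε accepts only s_0 and yields no variability. Because F is only evaluated, never optimized, it could combine many priors without affecting this loop.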
Automatic variability estimation
[Figures: lung tumor, actual vs. estimated variability]
Estimation validated with the same data as the manual delineation study
• Variability volume difference < 6%
• Variability volume agreement > 70%
High-quality prediction of volume variability!
Take-home messages
• There is no single segmentation ground truth!
• Significant manual segmentation variability of 5-57% by type of structure, case, and observer: 15-45%
• Annotator delineation variability can be quantified
• Delineation and variability estimation can be reliably computed automatically for many structures/pathologies
Many thanks to: N. Caplan (coordinator), M. Awad, K. Azzam, A. Beinshtein, E. Ben-David, D. Halevi, N. Lev-Cohen, N. Simanovsky, A. Soto
Thanks for your attention!