Segmentation uncertainty and error estimation in medical ... · Manual delineation study...

Prof. Leo Joskowicz, CASMIP Lab, School of Computer Science and Engineering

The Hebrew University of Jerusalem, ISRAEL

Joint with: D. Cohen, Dr. N. Caplan, Prof. J. Sosna, Dept. of Radiology, Hadassah Univ. Medical Center, Jerusalem, Israel

Segmentation uncertainty and error estimation in medical imaging

© Copyright L. Joskowicz 2018

Why do we measure?

“Measure what you can measure, and make measurable what is not measurable”

Galileo Galilei

“A science is as mature as its measurement tools”

Louis Pasteur

Motivation
Segmentation of anatomical structures and pathologies in medical images is important!

Clinical
• Identification and quantification of structures

[Figure panels: liver tumors, kidney structure, carotid stenosis]

[Figure panels: baseline (Jan 25, 2017) and follow-up (July 8, 2018) scans]


• Diagnosis and disease progression

[Figure panels: external fixation, internal fixation]


• Treatment planning

[Figure panels: radiosurgery, femur fracture surgery, 3D-printed patient-specific surgical guides]


• Treatment delivery: surgery

[Figure: robotic surgery (SpineAssist)]


Technical
Fundamental problem in medical image processing:
• Structure modeling
• Atlas construction
• Registration
• Simulation
• Big data radiology
• …

Segmentation is a hard problem!

[Figure: segmentation leak]

Fuzzy/non-existing boundaries, low contrast, partial volume effect, …

Segmentation errors and their correction significantly hamper the use of 3D models in the clinic


Motivation
Clinical
• Definition of segmentation quality w.r.t. the ideal/consensus
• Validation of segmentation results is time-consuming!
• No confidence level is provided
• Ground-truth generation is tedious and time-consuming
• Observer variability quantification

Technical
• Algorithm development: an essential measure!
• Comparison between methods
• Segmentation error detection and correction

Segmentation validation is essential!

Requires ground truth!

Quality = f(Error)

Segmentation evaluation: current practice

[Diagram: Image → Segmentation; Segmentation compared with Ground truth → Error → Segmentation Quality]

• Ground truth given
• Only error estimation

How do we create the ground truth?

Segmentation variability: example

[Figure: CT scan with tumor delineations by Radiologists 1 to 5 and by all radiologists overlaid]

Tumor volume? Tumor boundary?

Segmentation variability: example
Manual delineations by 10 radiologists

[Figure panels: low variability, high variability; one color per radiologist]

Observation: there is no single gold standard or ground truth!

• Measurement/delineation is intrinsically uncertain!
• The observer delineation variability is the uncertainty
• Sources of observer delineation variability:
  o Subjective, observer-dependent:
    • Manual hand-eye coordination skills
    • Attentiveness and thoroughness
    • Expertise and knowledge
  o Objective, observer-independent:
    • Imaging: scanning protocol, resolution, contrast, ...
    • Intrinsic: structure characteristics, fuzzy contours due to partial volume effects, neighboring structures, …

Segmentation quality evaluation: several observers

[Diagram: Observer 1, Observer 2, …, Observer n → mean contour (ground truth) and variability; Image → Segmentation; Segmentation vs. mean contour → Error; Error and Variability → Segmentation Quality]

Quality = f(Error, Variability)


Segmentation quality evaluation: several observers

[Diagram: Observer 1, Observer 2, …, Observer n → mean contour (ground truth) and observer variability; Image → Segmentation → Segmentation Error → Segmentation Quality]

• Variability estimation from several observers
• Impractical!

Previous work
Clinical
• Literature on delineation observer variability is limited!
• Measures are generic, not scan- and structure-specific
• No variability range is provided: tumor volume 23±4 cc

Technical
• A dozen relevant works on segmentation evaluation with no ground truth, 2010-18 (Top 2011, Grady 2014, Saad 2010, …)
• No generic segmentation variability model; existing ones are too specific
• Require ad-hoc models, e.g., probabilistic segmentation
• Small-scale validation


Goals
Quantify and estimate segmentation variability for common structures and pathologies

1. Understand: large-scale manual delineation study to quantify observer variability

2. Estimate: automatic method for estimating the variability of a delineation without ground truth

3. Detect and correct: automatic methods for segmentation error detection and correction ← Not in this talk


Segmentation quality evaluation: no ground truth, no observers!

[Diagram: Image → Segmentation; Segmentation priors → Variability estimation → Segmentation Quality]

• Automatic variability estimation
• No error → only variability!

Segmentation variability: illustration

[Figure panels: low variability, high variability]

Estimated variability without ground truth: is this possible at all?

Segmentation variability: definition
• $S(I) = \{s_1(I), \ldots, s_N(I)\}$: set of $N$ delineations of image $I$
• Set of voxels inside a delineation for which:
  o At least one annotator agrees → Possible
  o All annotators agree → Consensus
  o Difference between Possible and Consensus → Variability

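[Figure: delineations, consensus, possible; difference = union - intersection]

Possible: $\bigcup_{i=1}^{N} s_i(I)$

Consensus: $\bigcap_{i=1}^{N} s_i(I)$

Variability: $\bigcup_{i=1}^{N} s_i(I) \setminus \bigcap_{i=1}^{N} s_i(I)$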
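The Possible, Consensus, and Variability sets defined above map directly onto voxel-wise Boolean operations. The following is a minimal NumPy sketch, assuming each delineation $s_i(I)$ is stored as a binary mask of identical shape; the function name `variability_sets` and the toy 1D masks are illustrative, not taken from the study.

```python
import numpy as np

def variability_sets(masks):
    """Possible, Consensus, and Variability sets for N binary delineations.

    masks: sequence of N boolean arrays of identical shape, one per annotator.
    """
    stack = np.stack([np.asarray(m, dtype=bool) for m in masks])
    possible = stack.any(axis=0)          # union: at least one annotator agrees
    consensus = stack.all(axis=0)         # intersection: all annotators agree
    variability = possible & ~consensus   # union minus intersection
    return possible, consensus, variability

# Three toy 1D "delineations" of the same structure (hypothetical data)
s1 = np.array([0, 1, 1, 1, 0, 0], dtype=bool)
s2 = np.array([0, 0, 1, 1, 1, 0], dtype=bool)
s3 = np.array([0, 1, 1, 1, 1, 0], dtype=bool)
P, C, V = variability_sets([s1, s2, s3])
# P = [0 1 1 1 1 0], C = [0 0 1 1 0 0], V = [0 1 0 0 1 0]
```

Because Consensus is always a subset of Possible, the Variability voxel count is simply the Possible count minus the Consensus count.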

Segmentation variability: properties
• Patient-, structure-, and scan-specific
• Depends on which annotators, and how many of them
• One annotator → no variability
• As the number of annotators increases:
  o Possible increases: voxels are added
  o Consensus decreases: voxels are removed
  o Variability increases: voxels are added to the difference
• After sufficient annotators, no more voxels are added/deleted → variability converges
• What is the actual variability across all annotators?
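The monotonic behavior listed above follows from set algebra: adding one more delineation can only enlarge the union and shrink the intersection. A small sketch, assuming binary masks added in an arbitrary (hypothetical) annotator order, tracks the three voxel counts as annotators are added one at a time; `cumulative_volumes` is an illustrative helper, not the study's code.

```python
import numpy as np

def cumulative_volumes(masks):
    """(Possible, Consensus, Variability) voxel counts after 1..N annotators."""
    possible = np.asarray(masks[0], dtype=bool).copy()
    consensus = possible.copy()
    history = []
    for m in masks:
        m = np.asarray(m, dtype=bool)
        possible |= m    # union can only grow
        consensus &= m   # intersection can only shrink
        history.append((int(possible.sum()), int(consensus.sum()),
                        int((possible & ~consensus).sum())))
    return history

masks = [
    np.array([0, 1, 1, 1, 0, 0], dtype=bool),
    np.array([0, 0, 1, 1, 1, 0], dtype=bool),
    np.array([0, 1, 1, 1, 1, 0], dtype=bool),   # adds nothing new: a plateau
    np.array([1, 1, 1, 0, 0, 0], dtype=bool),
]
history = cumulative_volumes(masks)
# history == [(3, 3, 0), (4, 2, 2), (4, 2, 2), (5, 1, 4)]
```

The first entry shows the "one annotator, no variability" case, and the repeated third entry shows how the counts stall once a new annotator adds no voxels outside the current union and removes none from the current intersection.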

Manual delineation study
Quantify delineation observer variability for common structures and pathologies

• Collected 18 representative CT scans of 4 structures
• Recruited 11 annotators: 4 residents, 2 mid-career, 4 experts, 1 neuro-radiologist; paid them by the hour
• Performed 3,193 manual CT slice annotations
• Protocol to produce expert-validated, unbiased delineations

Liver tumors: 5 cases
Lung tumors: 5 cases
Kidneys: 6 cases
Brain hematomas: 2 cases

CT resolution: 512×512×102-449 voxels, 0.5-0.98 × 0.5-0.98 × 1.0-3.3 mm³

14 scans with 1.5 mm, 2 scans with 3 mm, and 2 scans with 1 mm slice spacing; hematomas 0.5×0.5×1.5 mm³
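Because the scans have different voxel sizes, delineation volumes are only comparable in physical units. A minimal sketch of the conversion from a voxel count to cubic centimeters, using the hematoma spacing of 0.5×0.5×1.5 mm³ listed above; the helper name `mask_volume_cc` and the voxel count are hypothetical.

```python
def mask_volume_cc(voxel_count, spacing_mm):
    """Volume in cc of `voxel_count` voxels with spacing (x, y, z) in mm."""
    sx, sy, sz = spacing_mm
    return voxel_count * sx * sy * sz / 1000.0  # 1 cc = 1000 mm^3

# Hematoma scans: 0.5 x 0.5 x 1.5 mm^3 = 0.375 mm^3 per voxel
volume = mask_volume_cc(80_000, (0.5, 0.5, 1.5))  # 30.0 cc
```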


Experimental results
Variability by:

1. Manual tracing
2. Pairs of annotators
3. Groups of annotators
4. Disagreement between annotators
5. Case type and difficulty
6. Expertise of annotators
7. Surface distance difference

1. Manual tracing
How much variability comes from manual skills alone?

Slice with the smallest variability for kidney contour:

Kidney contours: 16% [-8,+8]%

[Figure annotation: ~4 pixels]
Kidney contours: no medical knowledge required!

Lowest variability of 16% for scans of 0.75×0.75×1.5 mm³

2. Pairs of annotators
Volume overlap difference, mean variability [range]:
• Liver tumors: 18% [-6,+7]%
• Brain hematomas: 18% [-6,+6]%
• Kidney contours: 9% [-1,+1]%
• Lung tumors: 21% [-9,+10]%

Maximum difference between two annotators (figure annotations): 40%, 37%, 13%, 31%

[Plot: variability % vs. case number]

Significant mean differences between structures and cases

Very significant discrepancies for liver and lung tumors; discrepancy ranges from 5% to 57%
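Pairwise figures of this kind can be computed by comparing every pair of annotators. A minimal sketch, assuming binary masks and, because the slides do not state the normalization, expressing pairwise variability as the union-minus-intersection volume relative to the union volume (equivalently 100 × (1 − Jaccard)); `pairwise_variability`, `summarize`, and the toy masks are hypothetical.

```python
import itertools
import numpy as np

def pairwise_variability(masks):
    """Variability % for every pair of annotators' binary masks.

    ASSUMPTION: normalized by the union volume, i.e. 100 * |A xor B| / |A or B|;
    the slides do not state which reference volume is used.
    """
    values = []
    for a, b in itertools.combinations(masks, 2):
        a, b = np.asarray(a, dtype=bool), np.asarray(b, dtype=bool)
        union = int(np.logical_or(a, b).sum())
        diff = int(np.logical_xor(a, b).sum())  # union minus intersection
        values.append(100.0 * diff / union if union else 0.0)
    return values

def summarize(values):
    """Format as 'mean% [min - mean,max - mean]%', mirroring the slides."""
    m = float(np.mean(values))
    return f"{m:.0f}% [{min(values) - m:+.0f},{max(values) - m:+.0f}]%"

s1 = np.array([0, 1, 1, 1, 0, 0], dtype=bool)
s2 = np.array([0, 0, 1, 1, 1, 0], dtype=bool)
s3 = np.array([0, 1, 1, 1, 1, 0], dtype=bool)
v = pairwise_variability([s1, s2, s3])   # [50.0, 25.0, 25.0]
print(summarize(v))                      # 33% [-8,+17]%
```

The maximum over pairs corresponds to the "maximum difference between two annotators" annotation; with N annotators there are N(N-1)/2 pairs to compare.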

3. Groups of annotators: liver tumors

[Plot: Possible, Consensus, and Variability volume % (volume overlap difference) as a function of the # of annotators; one annotator: no variability; Possible +31%, Consensus -26%; maximum and minimum difference between two annotators marked]

Mean variability of 2 annotators is much smaller than that of 10:
• Maximum: 10% vs. 31%
• Minimum: 7% vs. 26%

The variability of 5 vs. 10 annotators is also significant!

3. Groups of annotators: liver tumors

[Plot: Possible (+31%), Consensus (-26%), and Variability volume % as a function of the # of annotators]

The variability range for fewer than 5 annotators is 20%!

The variability range for k annotators decreases slowly
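Curves like these can be obtained by averaging over every group of k annotators drawn from the full set. A minimal sketch, assuming binary masks and, since the slides do not name the reference volume, expressing the Possible and Consensus changes relative to the mean single-annotator volume; `group_curves` and the toy masks are hypothetical. With 10 annotators this enumerates 1,023 groups, which is still cheap.

```python
import itertools
import numpy as np

def group_curves(masks):
    """Mean (Possible %, Consensus %) volume change for every group size k.

    ASSUMPTION: changes are relative to the mean single-annotator volume;
    the slides do not state the reference volume.
    """
    masks = [np.asarray(m, dtype=bool) for m in masks]
    ref = float(np.mean([m.sum() for m in masks]))
    curves = {}
    for k in range(1, len(masks) + 1):
        plus, minus = [], []
        for group in itertools.combinations(masks, k):
            stack = np.stack(group)
            plus.append(100.0 * (stack.any(axis=0).sum() - ref) / ref)   # Possible
            minus.append(100.0 * (stack.all(axis=0).sum() - ref) / ref)  # Consensus
        curves[k] = (float(np.mean(plus)), float(np.mean(minus)))
    return curves

s1 = np.array([0, 1, 1, 1, 0, 0], dtype=bool)
s2 = np.array([0, 0, 1, 1, 1, 0], dtype=bool)
s3 = np.array([0, 1, 1, 1, 1, 0], dtype=bool)
curves = group_curves([s1, s2, s3])
for k, (p, c) in curves.items():
    print(f"k={k}: Possible {p:+.1f}%, Consensus {c:+.1f}%, Variability {p - c:.1f}%")
```

At k=1 every group's union equals its intersection, so the Variability (Possible minus Consensus) is zero, matching "one annotator, no variability".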

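3. Groups of annotators: lung tumors, kidney contours

[Plots: variability volume % as a function of the # of annotators]

Variability ranges:
• Liver tumors: [-24,+27]%
• Lung tumors: [-25,+31]%
• Kidney contours: [-12,+13]%
• Brain hematomas: [-24,+29]%

Similar progression rate for all structures

Significant differences by structure: from [-12,+13]% to [-25,+31]%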

3. Groups of annotators: normalized variability

[Plot: normalized variability % as a function of the # of annotators]
• Two annotators: 37%
• Three: 53%
• Five: 72%
• Eight: 82%
• All: 100%
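Normalizing each group size's mean variability by the all-annotator value turns the curve into "fraction of the variability captured", which makes it easy to ask how many annotators a target fraction requires. A minimal sketch under that assumption; the helpers are hypothetical and the curve values below are made-up toy numbers, not study data.

```python
def normalized_variability(mean_var_by_k):
    """Mean variability for each group size as % of the all-annotator value."""
    full = mean_var_by_k[max(mean_var_by_k)]
    return {k: 100.0 * v / full for k, v in sorted(mean_var_by_k.items())}

def annotators_needed(normalized, target_pct):
    """Smallest group size whose normalized variability reaches target_pct."""
    for k, pct in sorted(normalized.items()):
        if pct >= target_pct:
            return k
    return None  # target not reached even with all annotators

# Toy mean variability (%) per group size k: hypothetical numbers
curve = {1: 0.0, 2: 10.0, 3: 15.0, 5: 21.0, 8: 24.0, 10: 27.0}
norm = normalized_variability(curve)   # k=2 -> ~37%, k=8 -> ~89%, k=10 -> 100%
needed = annotators_needed(norm, 80)   # 8
```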

5. Cases by type
Volume agreement with the mean delineation, mean Dice coefficient [range]:
• Liver tumors: 93% [-3,+2]%
• Kidney contours: 96% [-1,+1]%
• Lung tumors: 91% [-4,+4]%
• Brain hematomas: 93% [-2,+2]%

[Plot: mean Dice coefficient vs. case number]

Very good agreement with the mean delineation for all structures, with low variability: 91-96% [-4,+4]%

Of course, the mean delineation is unknown in practice…
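The agreement figures use the Dice coefficient, 2|A∩B| / (|A| + |B|), between each delineation and the mean delineation. The slides do not say how the mean delineation is built, so this minimal sketch assumes a majority vote (voxels marked by more than half of the annotators); `dice` and `dice_vs_mean` are hypothetical helpers.

```python
import numpy as np

def dice(a, b):
    """Dice coefficient 2|A and B| / (|A| + |B|) of two binary masks."""
    a, b = np.asarray(a, dtype=bool), np.asarray(b, dtype=bool)
    total = int(a.sum() + b.sum())
    return 2.0 * int(np.logical_and(a, b).sum()) / total if total else 1.0

def dice_vs_mean(masks):
    """Dice of each delineation against a majority-vote 'mean' delineation.

    ASSUMPTION: the mean delineation is approximated by majority voting.
    """
    stack = np.stack([np.asarray(m, dtype=bool) for m in masks])
    mean_mask = stack.sum(axis=0) > stack.shape[0] / 2
    return [dice(m, mean_mask) for m in stack]

s1 = np.array([0, 1, 1, 1, 0, 0], dtype=bool)
s2 = np.array([0, 0, 1, 1, 1, 0], dtype=bool)
s3 = np.array([0, 1, 1, 1, 1, 0], dtype=bool)
scores = dice_vs_mean([s1, s2, s3])   # [6/7, 6/7, 1.0], about [0.857, 0.857, 1.0]
```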

6. Annotator expertise

[Plots: volume overlap by annotator group (novices, residents, mid-career, experts) for liver tumors, kidney contours, brain hematomas, and lung tumors]

Volume overlap (figure annotations): 0.92 [-0.04,+0.03]; 0.94 [-0.02,+0.01]; 0.98 [-0.01,+0.01]; 0.94 [-0.01,+0.02]

Some statistical differences between structures

No statistical difference between groups!

Summary of the study
• Significant volume variability differences between cases (easy/hard) and structures (liver/lung tumors): 27-78%
• Wide volume variability range for 2 annotators: 5-57%
• Mean volume variability range of 2 annotators is much smaller than for 10 annotators: 7-10% vs. 26-31%
• 40% of the variability is due to 1 annotator; 60% to 2
• Annotator disagreement is similar in % and trends for all
• 37%, 53%, 72% of the variability is captured by 2, 3, 5 annotators → about 10 annotators are necessary!
• No statistical difference between annotator expertise levels!

Automatic estimation of variability
Automatic method for estimating the variability of a given segmentation without ground truth

[Diagram: INPUT (CT scan, delineation) → contour analysis using the segmentation priors library → sensitivity analysis → OUTPUT (variability)]
• Segmentation priors library: manually compiled for each structure, scan protocol, and task
• Properties: local and global intensity, texture, shape, …

• Segmentation priors for each property
• Quality estimate: a function of the segmentation priors
• The function is similar to the objective/energy function in optimization-based segmentation
• BUT: it is used for evaluation, not for optimization → it can be richer/more complex, and no search is needed!
• Variability estimation by sensitivity analysis of $F(f(v))$

Segmentation priors
$f_i(v)$: voxel $\mapsto$ $\mathrm{quality}(v) = f(\mathrm{error}(v), \mathrm{variability}(v))$
$f_1(v), \ldots, f_k(v)$: image-, structure-, and task-specific feature priors
$F(f_1(v), \ldots, f_k(v)) = F(f(v))$

• $F(s)$: segmentation priors function
• $s_0 \in S$: initial segmentation
• $\varepsilon$: sensitivity threshold
• $\Delta s_0$: segmentation variability range

$$\forall s_i \in \Delta s_0:\quad |F(s_i) - F(s_0)| < \varepsilon$$

Variability estimation by sensitivity analysis
[Figure: $F(s)$ around $s_0$; the threshold $\varepsilon$ delimits the range $\Delta s_0$]
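The sensitivity condition above can be illustrated with a toy search: perturb $s_0$, keep every candidate $s_i$ whose priors score stays within $\varepsilon$ of $F(s_0)$, and report the union minus intersection of the kept candidates as the variability range. This is a minimal sketch under loudly stated assumptions, not the method's implementation: candidates are generated by repeated 4-connected dilation/erosion of a 2D mask, and `contrast_prior` is a hypothetical one-feature stand-in for $F$ (inside-minus-outside mean intensity), not the study's priors library.

```python
import numpy as np

def dilate(mask):
    """One step of 4-connected binary dilation of a 2D mask (NumPy only)."""
    p = np.pad(mask, 1, constant_values=False)
    return (p[1:-1, 1:-1] | p[:-2, 1:-1] | p[2:, 1:-1]
            | p[1:-1, :-2] | p[1:-1, 2:])

def erode(mask):
    """One step of 4-connected binary erosion (dual of dilation)."""
    return ~dilate(~mask)

def sensitivity_variability(s0, F, eps, max_steps=5):
    """Union minus intersection of all candidates with |F(s_i) - F(s0)| < eps.

    ASSUMPTION: candidates s_i come from repeatedly dilating or eroding s0.
    """
    s0 = np.asarray(s0, dtype=bool)
    f0 = F(s0)
    accepted = [s0]
    for step in (dilate, erode):
        s = s0
        for _ in range(max_steps):
            s = step(s)
            if not s.any() or abs(F(s) - f0) >= eps:
                break  # empty mask or priors changed too much: stop this direction
            accepted.append(s)
    stack = np.stack(accepted)
    return stack.any(axis=0) & ~stack.all(axis=0)

def contrast_prior(image):
    """Hypothetical toy F: mean intensity inside minus mean intensity outside."""
    def F(s):
        outside = image[~s].mean() if (~s).any() else 0.0
        return float(image[s].mean() - outside)
    return F

image = np.zeros((15, 15))
image[5:10, 5:10] = 1.0   # bright 5x5 "structure" with a sharp boundary
s0 = image > 0.5          # initial segmentation
F = contrast_prior(image)
strict = sensitivity_variability(s0, F, eps=0.01)                   # 0 voxels
loose = sensitivity_variability(s0, F, eps=10.0, max_steps=1)       # 36 voxels
```

A sharp boundary makes every perturbation change $F$ noticeably, so a strict $\varepsilon$ yields an empty range; a permissive $\varepsilon$ accepts one dilation (45 voxels) and one erosion (9 voxels), giving a 36-voxel band.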

Automatic variability estimation: lung tumor

[Figure panels: actual vs. estimated variability for two lung tumor examples]

Estimation validated with the same data as the manual delineation study:
• Variability volume difference < 6%
• Variability volume agreement > 70%

High-quality prediction of volume variability!

Take-home messages
• There is no single segmentation ground truth!
• Significant manual segmentation variability, 5-57%, by type of structure, case, and observer: 15-45%
• Annotator delineation variability can be quantified
• Delineation and variability estimation can be reliably computed automatically for many structures/pathologies

Many thanks to: N. Caplan (coordinator), M. Awad, K. Azzam, A. Beinshtein, E. Ben-David, D. Halevi, N. Lev-Cohen, N. Simanovsky, A. Soto

Thanks for your attention!
