Identifying outliers in series of neuroretinal rim estimates with the Heidelberg Retina Tomograph
7/30/2019 Identifying outliers in series of neuroretinal rim estimates with the Heidelberg Retina Tomograph
Paul Artes, Neil O'Leary
Dalhousie University, Halifax, Nova Scotia, Canada
Bad Apples of the Eye:
Identifying Outliers in HRT Rim Area using Robust Regression
Abstract
Paul Artes, Neil O'Leary. Ophthalmology and Visual Sciences, Dalhousie University, Halifax, NS, Canada
Identifying outliers in time series of HRT rim area values.
Purpose: Neuroretinal rim area estimates of the Heidelberg Retina Tomograph (HRT) occasionally have large errors that are not well modelled by Gaussian statistics (Owen et al., IOVS 2006). Such outliers compromise the validity of ordinary-least-squares (OLS) linear regression for interpreting rates of change in patients with glaucoma. We report on the prevalence of outliers in rim area time series and propose a method for identifying such data.
Methods: Patients with open-angle glaucoma (n=145, mean MD=-5.1 dB) were followed within a prospective longitudinal study, in intervals of 4 months, for a median of 48 months. Time series of HRT2 rim area were evaluated using a robust regression technique (lmrob). This technique iteratively re-estimates robustness weights (w) assigned to each observation to arrive at an accurate estimate of change and its statistical significance. For convenience, we arbitrarily labelled observations with w
Plan
Discuss outliers
Report on incidence of outliers in HRT rim area, in patients with glaucoma followed over time.
(touch on) causes...
Demonstrate MM-regression to identify suspect data.
Explain (roughly) how it works.
Argue pros & cons for using MM (rather than OLS)
Outliers
deviate markedly from rest of sample. (after Grubbs, 1969*)
Hard to define, and hard to classify.
Can destroy performance of Gaussian statistics, for example, rates of change with ordinary least-squares (OLS) regression.
* Grubbs, F. E.: 1969, Procedures for detecting outlying observations in samples. Technometrics 11, 1-21
Our Data...
focal (45), diffuse (42), sclerotic (44)

Rates of Change in the Visual Field and Optic Disc in Patients with Distinct Patterns of Glaucomatous Optic Disc Damage
Alexandre S. C. Reis, MD, Paul H. Artes, PhD, Anne C. Belliveau, BSc, Raymond P. LeBlanc, MD, Lesya M. Shuba, MD, PhD, Balwantray C. Chauhan, PhD, Marcelo T. Nicolela, MD
Purpose: To investigate the rate of visual field and optic disc change in patients with distinct patterns of glaucomatous optic disc damage.
Design: Prospective longitudinal study.
Participants: A total of 131 patients with open-angle glaucoma with focal (n = 45), diffuse (n = 42), and sclerotic (n = 44) optic disc damage.
Methods: Patients were examined every 4 months with standard automated perimetry (SAP, SITA Standard, 24-2 test, Humphrey Field Analyzer, Carl Zeiss Meditec, Dublin, CA) and confocal scanning laser tomography (CSLT, Heidelberg Retina Tomograph, Heidelberg Engineering GmbH, Heidelberg, Germany) for a period of 4 years. During this time, patients were treated according to a predefined protocol to achieve a target intraocular pressure (IOP). Rates of change were estimated by robust linear regression of visual field mean deviation (MD) and global optic disc neuroretinal rim area with follow-up time.
Main Outcome Measures: Rates of change in MD and rim area.
Results: Rates of visual field change in patients with focal optic disc damage (mean 0.34, standard deviation [SD] 0.69 dB/year) were faster than in patients with sclerotic (mean 0.14, SD 0.77 dB/year) and diffuse (mean 0.01, SD 0.37 dB/year) optic disc damage (P = 0.003, Kruskal-Wallis). Rates of optic disc change in patients with focal optic disc damage (mean 11.70, SD 25.5 × 10⁻³ mm²/year) were faster than in patients with diffuse (mean 9.16, SD 14.9 × 10⁻³ mm²/year) and sclerotic (mean 0.45, SD 20.6 × 10⁻³ mm²/year) optic disc damage, although the differences were not statistically significant (P = 0.11). Absolute IOP reduction from untreated levels was similar among the groups (P = 0.59).
Conclusions: Patients with focal optic disc damage had faster rates of visual field change and a tendency toward faster rates of optic disc deterioration when compared with patients with diffuse and sclerotic optic disc damage, despite similar IOP reductions during follow-up.
Financial Disclosure(s): Proprietary or commercial disclosure may be found after the references.
Ophthalmology 2012;119:294-303 © 2012 by the American Academy of Ophthalmology.

Rates of visual field and optic disc change are among the most relevant clinical parameters in the management of glaucoma, providing an indication of the adequacy of treatment and overall prognosis.[1-3] Most patients with glaucoma show evidence of change if observed sufficiently long enough. In some patients, these changes are detectable only after many years or even decades and may have minimal impact on quality of life. Other patients have rapid rates of change that cause a substantial risk of visual impairment.
Glaucoma is a progressive optic neuropathy with a wide clinical spectrum, and patients vary with respect to the sensitivity to intraocular pressure (IOP), presence of other ocular and systemic risk factors, and overall prognosis of the disease.[4-7] Although this diversity has been widely recognized, there have been relatively few attempts to identify subgroups of open-angle glaucoma (OAG) that have a more or less aggressive course of the disease.[8-11]
Different patterns of glaucomatous damage to the optic disc have been described.[12,13] There are patients who develop a more focal loss of tissue in the optic disc,[14,15] which occurs from within the cup (notch) and is more frequently identified at the superior and inferior poles. The remaining neuroretinal rim is usually well preserved. Other patients have a more diffuse loss of rim tissue, with concentric cup enlargement, and no localized areas of loss or pallor.[16] A third common pattern is sclerotic, where the optic disc cup is characteristically saucerized, which refers to a shallow cupping extending to the disc margins with retention of a central pale cup. This type of damage is associated with marked areas of peripapillary atrophy and choroidal sclerosis.[17] Examples of these patterns of optic disc damages are shown in Figure 1.
We undertook this study to investigate the rates of change in glaucomatous patients with these 3 distinct pat-
© 2012 by the American Academy of Ophthalmology. ISSN 0161-6420/12/$ see front matter. Published by Elsevier Inc. doi:10.1016/j.ophtha.2011.07.040
Results
small signal! -10.0 × 10⁻³ mm²
Figure 3. Rates of rim area change in patients with focal, diffuse, and sclerotic optic disc damage. The bold circles represent statistically significant (P < 0.05) negative or positive slopes, and the dashed line represents a criterion for rapid rate of change (-10.0 × 10⁻³ mm²/year). The horizontal and vertical lines represent the means and their 95% confidence intervals.
MM-Regression
robust (resistant to outliers, 25%)
efficient (85%, almost like OLS)
provides significance (p-value)
3-stage technique:
1) get high breakdown (S) estimate of slope & intercept
2) estimate the robust variance
3) refine slope and intercept
library(robustbase)
lmrob(y ~ x)
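As a rough illustration of the lmrob call above, the sketch below fits OLS and MM-regression to a single simulated rim area series. The data, the variable names (years, rim_area), the injected outlier and the weight cut-off are assumptions made for this example only; none of them are taken from the study, and the cut-off is a placeholder rather than the one used by the authors.

```r
library(robustbase)

set.seed(1)

# Simulated series (an assumption for illustration, not study data):
# 13 visits at 4-month intervals, slow linear decline plus Gaussian noise.
years    <- (0:12) / 3
rim_area <- 1.00 - 0.01 * years + rnorm(length(years), sd = 0.02)
rim_area[12] <- rim_area[12] + 0.30   # inject one gross outlier

fit_ols <- lm(rim_area ~ years)
fit_mm  <- lmrob(rim_area ~ years)    # MM-regression (robustbase default)

# Robustness weights: close to 1 for well-fitting points, close to 0 for outliers.
w <- weights(fit_mm, type = "robustness")

# Placeholder cut-off, chosen only for this example.
suspect <- which(w < 0.5)

print(rbind(OLS = coef(fit_ols), MM = coef(fit_mm)))  # compare slopes
print(summary(fit_mm)$coefficients)                   # includes p-values
print(round(w, 2))
print(suspect)
```

The OLS slope is pulled towards the late outlier, while the MM fit should down-weight that visit and stay closer to the simulated rate of change.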
Examples...
[Figure: series San.29Feb32.R. Global rim area (mm²) plotted against age (years, 74 to 78), with HRT images (axes x [mm], y [mm]) dated 2007-07-25, 2007-11-27 and 2008-03-26; rim area labels 0.89 mm² and 0.73 mm².]
Examples...
[Figure: series Mac.21Dec30.R. Global rim area (mm²) plotted against age (years, 75 to 79), with HRT images (axes x [mm], y [mm]) dated 2007-01-30 and 2008-05-21; rim area labels 1.05 mm² and 0.91 mm².]
Examples...
[Figure: series Mil.15Oct31.L. Global rim area (mm²) plotted against age (years, 73 to 77), with HRT images (axes x [mm], y [mm]) dated 2005-05-06 and 2005-09-20; rim area labels 0.96 mm² and 1.19 mm².]
Results: Robustness Weights
[Figure: histogram of robustness weights (weights2$weight; x-axis 0.0 to 1.0, y-axis frequency 0 to 1000), with markers labelled 10%, 5%, 2% and 1%.]
Results: Robustness Weights
[Figure: histogram of robustness weights for simulated Gaussian data (weights2$weight.norm; x-axis 0.0 to 1.0, y-axis frequency 0 to 1000), with markers labelled 10%, 5%, 2% and 1%.]
Results: Robustness Weights
[Figure: two histograms of robustness weights side by side, "Simulated Gaussian Data" and "Real Data" (x-axis robustness weights 0.0 to 1.0, y-axis frequency 0 to 1000), each with markers labelled 10%, 5%, 2% and 1%.]
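One possible way to build a Gaussian reference distribution of this kind is to fit lmrob to many simulated, outlier-free series and pool the robustness weights. The sketch below does that. The series length, noise level and number of series are assumptions, and reading the 10%, 5%, 2% and 1% labels as lower percentiles of the pooled weights is a guess, not something the slides state.

```r
library(robustbase)

set.seed(2)

years    <- (0:12) / 3   # 13 visits, 4 months apart (assumed)
n_series <- 200          # number of simulated series (assumed)

# Fit MM-regression to one outlier-free Gaussian series and
# return its robustness weights.
simulate_weights <- function() {
  y <- 1.00 - 0.01 * years + rnorm(length(years), sd = 0.02)
  weights(lmrob(y ~ years), type = "robustness")
}

w_gauss <- unlist(replicate(n_series, simulate_weights(), simplify = FALSE))

# Lower percentiles of the pooled weights under the Gaussian model.
cutoffs <- quantile(w_gauss, probs = c(0.01, 0.02, 0.05, 0.10))
print(round(cutoffs, 3))

hist(w_gauss, breaks = 50, xlab = "robustness weight",
     main = "Simulated Gaussian Data")
```

Weights from real series that fall below such percentiles occur more often than a purely Gaussian model would predict, which is one way to read the comparison between the two panels.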
Results: Weight vs Image Quality
[Figure: scatter plot of image quality (MPHSD, µm) versus robustness weight (0.0 to 1.0).]
Results
[Figure: three scatter plots comparing OLS with robust regression: significance (OLS) versus significance (robust), both axes from 0.1% to 10%; slopes2$ols.sdres versus slopes2$rob.sdres (0.00 to 0.20); and slopes2$ols.slope versus slopes2$rob.slope (-0.10 to 0.05).]
Summary
Outliers occur quite frequently in imaging (HRT, OCT).
New regression methods can take care of this.
Best use:
Highlight suspect data for clinician's attention?
Default for rate-of-change & significance?
References
Missing Values, Outliers, Robust Statistics & Non-parametric Methods
Shaun Burke, RHM Technology Ltd, High Wycombe, Buckinghamshire, UK.
LCGC Europe Online Supplement: statistics and data analysis

This article, the fourth and final part of our statistics refresher series, looks at how to deal with messy data that contain transcription errors or extreme and skewed results.

This is the last article in a series of short papers introducing basic statistical methods of use in analytical science. In the three previous papers (1-3) we have assumed the data has been tidy; that is, normally distributed with no anomalous and/or missing results. In the real world, however, we often need to deal with messy data, for example data sets that contain transcription errors, unexpected extreme results or are skewed. How we deal with this type of data is the subject of this article.

Transcription errors
Transcription errors can normally be corrected by implementing good quality control procedures before statistical analysis is carried out. For example, the data can be independently checked or, more rarely, the data can be entered, again independently, into two separate files and the files compared electronically to highlight any discrepancies. There are also a number of outlier tests that can be used to highlight anomalous values before other statistics are calculated. These tests do not remove the need for good quality assurance; rather they should be seen as an additional quality check.

Missing data
No matter how well our experiments are planned there will always be times when something goes wrong, resulting in gaps in the data. Some statistical procedures will not work as well, or at all, with some data missing. The best recourse is always to repeat the experiment to generate the complete data set. Sometimes, however, this is not feasible, particularly where readings are taken at set times or the cost of retesting is prohibitive, so alternative ways of addressing this problem are needed.
Current statistical software packages typically deal with missing data by one of three methods:

Casewise deletion excludes all examples (cases) that have missing data in at least one of the selected variables. For example, in ICP-AAS (inductively coupled plasma-atomic absorption spectroscopy) calibrated with a number of standard solutions containing several metal ions at different concentrations, if the aluminium value were missing for a particular test portion, all the results for that test portion would be disregarded (See Table 1).
This is the usual way of dealing with missing data, but it does not guarantee correct answers. This is particularly so in complex (multivariate) data sets where it is possible to end up deleting the majority of your data if the missing data are randomly distributed across cases and variables.

Pairwise deletion can be used as an alternative to casewise deletion in situations where parameters (correlation coefficients, for example) are calculated on successive pairs of variables (e.g., in a recovery experiment we may be interested in the correlations between material recovered and extraction time, temperature, particle size, polarity, etc. With pairwise deletion, if one solvent polarity measurement was missing only this single pair would be deleted from the correlation and the correlations for recovery versus extraction time and particle size would be unaffected) (see Table 2).
Pairwise deletion can, however, lead to serious problems. For example, if there is a hidden systematic distribution of missing points then a bias may result when calculating a correlation matrix (i.e., different correlation coefficients in the matrix can be based on different subsets of cases).

Mean substitution replaces all missing data in a variable by the mean value for that variable. Though this looks as if the

Table 1: Casewise deletion.

              Al          B       Fe     Ni
Solution 1    (missing)   94.5    578    23.1
Solution 2    567         72.1    673    7.6
Solution 3    (missing)   34.0    674    44.7
Solution 4    234         97.4    429    82.9

After casewise deletion (statistical analysis only carried out on the reduced data set):

              Al          B       Fe     Ni
Solution 2    567         72.1    673    7.6
Solution 4    234         97.4    429    82.9

References
(1) S. Burke, Scientific Data Management 1(1), 32-38, 1997.
(2) S. Burke, Scientific Data Management 2(1), 36-41, 1998.
(3) S. Burke, Scientific Data Management 2(2), 32-40, 1998.
(4) J.L. Schafer, Monographs on Statistics and Applied Probability 72: Analysis of Incomplete Multivariate Data, Chapman & Hall (1997), ISBN 0-412-04061-1.
(5) R.J.A. Little & D.B. Rubin, Statistical Analysis With Missing Data, John Wiley & Sons (1987), ISBN 0-471-80243-9.
(6) ISO 3534. Statistics: Vocabulary and Symbols. Part 1: Probability and general statistical terms, section 2.64. Geneva 1993.
(7) T.J. Farrant, Practical statistics for the analytical scientist: A bench guide, Royal Society of Chemistry 1997. (ISBN 0 85404 442 6).
(8) V. Barnett & T. Lewis, Outliers in Statistical Data, 3rd Edition, John Wiley (1994).
(9) William H. Kruskal & Judith M. Tanur, International Encyclopaedia of Statistics, Collier Macmillan Publishers, 1978. ISBN 0-02-917960-2.
(10) Analytical Methods Committee, Robust Statistics: How Not to Reject Outliers, Part 2. Analyst 1989, 114, 1693-7.
(11) D.C. Hoaglin, F. Mosteller & J.W. Tukey, Understanding Robust and Exploratory Data Analysis, John Wiley & Sons (1983), ISBN 0-471-09777-2.
(12) M. Hollander & D.A. Wolfe, Non-parametric statistical methods, Wiley & Sons, New York 1973.
(13) W.W. Daniel, Applied non-parametric statistics, Houghton Mifflin, Boston 1978.
(14) M. Sargent, VAM Bulletin, Issue 13, 4-5, Autumn. Laboratory of the Government Chemist, 1995.
How does MM work?
should be named SM
first step is S-estimate of slope, intercept, variance
final M estimate is a maximum-likelihood-type estimate (obtained from iteratively reweighted least squares, IRWLS), where weights are...
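The final IRWLS step can be sketched in base R as follows. The slide leaves the weight function open; this sketch assumes Tukey's bisquare with tuning constant k = 4.685 (the usual 95%-efficiency value; the 85% efficiency quoted earlier would correspond to a smaller constant). The starting values come from a crude Theil-Sen fit with a MAD scale, which only stands in for the S-estimate and is not what lmrob does. All function names and the simulated data are hypothetical.

```r
# Tukey bisquare weight: 1 at r = 0, decreasing smoothly to 0 for |r| >= k.
bisquare_weight <- function(r, k = 4.685) {
  ifelse(abs(r) < k, (1 - (r / k)^2)^2, 0)
}

# Crude robust start: Theil-Sen slope, median intercept, MAD scale.
# This stands in for the S-estimate and is NOT what lmrob does.
robust_start <- function(x, y) {
  pairs <- combn(length(x), 2)
  slope <- median((y[pairs[2, ]] - y[pairs[1, ]]) /
                  (x[pairs[2, ]] - x[pairs[1, ]]))
  icept <- median(y - slope * x)
  list(beta = c(icept, slope), scale = mad(y - icept - slope * x))
}

# M-step: IRWLS with the scale held fixed at its initial robust value.
irwls <- function(x, y, k = 4.685, tol = 1e-10, max_iter = 100) {
  X     <- cbind(1, x)
  start <- robust_start(x, y)
  beta  <- start$beta
  for (iter in seq_len(max_iter)) {
    # Weights from standardized residuals, then one weighted LS update.
    w         <- bisquare_weight(as.vector(y - X %*% beta) / start$scale, k)
    beta_new  <- as.vector(solve(crossprod(X, w * X), crossprod(X, w * y)))
    converged <- max(abs(beta_new - beta)) < tol
    beta      <- beta_new
    if (converged) break
  }
  # Recompute the weights at the final coefficients.
  w <- bisquare_weight(as.vector(y - X %*% beta) / start$scale, k)
  list(intercept = beta[1], slope = beta[2], weights = w)
}

# Simulated example (not study data): one gross outlier at visit 12.
set.seed(1)
years    <- (0:12) / 3
rim_area <- 1.00 - 0.01 * years + rnorm(13, sd = 0.02)
rim_area[12] <- rim_area[12] + 0.30

fit <- irwls(years, rim_area)
print(round(fit$weights, 2))
print(fit$slope)
```

Holding the scale fixed during the M-step is what lets the refined fit gain efficiency without giving up the resistance to outliers of the starting estimate.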
Helpful papers
Koller & Stahel, 2011: describes SMDM
Hampel thesis - good discussion & refs on how to analyze (and not just delete) outliers