Beyond Tax Evasion and Misreported Trade: Global Evidence on ...
NONPARAMETRIC ADJUSTMENT FOR MEASUREMENT …between the true and misreported ages (Mai et al 2011;...
Transcript of NONPARAMETRIC ADJUSTMENT FOR MEASUREMENT …between the true and misreported ages (Mai et al 2011;...
![Page 1: NONPARAMETRIC ADJUSTMENT FOR MEASUREMENT …between the true and misreported ages (Mai et al 2011; Ziogas and Anton-Culver, 2003). Ovarian cancer: Age of diagnosis was misreported](https://reader030.fdocuments.in/reader030/viewer/2022040721/5e2d8f03dd3c007f8258d6f4/html5/thumbnails/1.jpg)
NONPARAMETRIC ADJUSTMENT FOR MEASUREMENT ERROR IN TIME TO EVENT DATA: APPLICATION TO RISK PREDICTION MODELS
Malka Gorfine Tel Aviv University, Israel Joint work with
Danielle Braun and Giovanni Parmigiani, Department of Biostatistics, Harvard School of Public Health
BIRS 2016 1
![Page 2: NONPARAMETRIC ADJUSTMENT FOR MEASUREMENT …between the true and misreported ages (Mai et al 2011; Ziogas and Anton-Culver, 2003). Ovarian cancer: Age of diagnosis was misreported](https://reader030.fdocuments.in/reader030/viewer/2022040721/5e2d8f03dd3c007f8258d6f4/html5/thumbnails/2.jpg)
Setting Risk prediction model has first been developed based on error-free time to event data, and subsequently implemented in practical setting where time to event data can be error-prone.
BIRS 2016 2
![Page 3: NONPARAMETRIC ADJUSTMENT FOR MEASUREMENT …between the true and misreported ages (Mai et al 2011; Ziogas and Anton-Culver, 2003). Ovarian cancer: Age of diagnosis was misreported](https://reader030.fdocuments.in/reader030/viewer/2022040721/5e2d8f03dd3c007f8258d6f4/html5/thumbnails/3.jpg)
Motivation I Mendelian risk prediction models in genetic counseling: • Calculating the probability that an individual carries a cancer causing
inherited mutation based on his/her family history. • Predicting the absolute risk of developing the disease over time given
his/her mutation-carrier status and family history. These models are in wide clinical use and web-based patient-oriented tools: breast cancer, ovarian cancer, Lynch syndrome, pancreatic cancer, melanoma etc.
BIRS 2016 3
![Page 4: NONPARAMETRIC ADJUSTMENT FOR MEASUREMENT …between the true and misreported ages (Mai et al 2011; Ziogas and Anton-Culver, 2003). Ovarian cancer: Age of diagnosis was misreported](https://reader030.fdocuments.in/reader030/viewer/2022040721/5e2d8f03dd3c007f8258d6f4/html5/thumbnails/4.jpg)
These models were developed based on error-free data. Studies about accuracy of self-reported family history show that sensitivity and specificity for reported disease status vary by degree of relative and cancer type. Breast cancer: Disease status, sensitivity 65% - 95%; specificity 98% - 99%. Age of diagnosis was misreported for 3.1% of relatives, average of 4.5 years between the true and misreported ages (Mai et al 2011; Ziogas and Anton-Culver, 2003).
Ovarian cancer: Age of diagnosis was misreported for 4.2% of relatives, average of 4.2 years between the true and misreported ages (Ziogas and Anton-Culver, 2003). Misreporting of family history, especially in disease status, leads to distortions in predictions (Katki, 2006).
BIRS 2016 4
2% of truly unaffected are reported as affected
65% of truly affected are reported affected
![Page 5: NONPARAMETRIC ADJUSTMENT FOR MEASUREMENT …between the true and misreported ages (Mai et al 2011; Ziogas and Anton-Culver, 2003). Ovarian cancer: Age of diagnosis was misreported](https://reader030.fdocuments.in/reader030/viewer/2022040721/5e2d8f03dd3c007f8258d6f4/html5/thumbnails/5.jpg)
Q: Is it possible to develop prediction models based on error-prone data? A: Not in the context of Mendelian risk prediction models which relies on penetrance estimates from the literature, based on error-free data.
BIRS 2016 5
disease probability given carrier status
![Page 6: NONPARAMETRIC ADJUSTMENT FOR MEASUREMENT …between the true and misreported ages (Mai et al 2011; Ziogas and Anton-Culver, 2003). Ovarian cancer: Age of diagnosis was misreported](https://reader030.fdocuments.in/reader030/viewer/2022040721/5e2d8f03dd3c007f8258d6f4/html5/thumbnails/6.jpg)
Motivation II Survival prediction models: In some disease settings, such as cancer, TTP is one of the predictors for survival. Assume a model has been developed based on error-free TTP. In practice, TTP is error-prone: • Tumor assessment is done using imaging, which varies by observers. • Scans are taken at regularly intervals.
BIRS 2016 6
Time to progression – length of time until the disease gets worse or spread
![Page 7: NONPARAMETRIC ADJUSTMENT FOR MEASUREMENT …between the true and misreported ages (Mai et al 2011; Ziogas and Anton-Culver, 2003). Ovarian cancer: Age of diagnosis was misreported](https://reader030.fdocuments.in/reader030/viewer/2022040721/5e2d8f03dd3c007f8258d6f4/html5/thumbnails/7.jpg)
The current setting vs the common settings The usual measurement error setting: ü Error-prone covariate observed in the main study. ü The goal is estimating the relationship between the outcome and the true covariate.
Current setting: ü The relationship between the outcome and the true covariate is known. ü The goal is to use this model for risk estimation based on an error-prone covariate. Naïvely using the error-prone covariate will lead to biased results.
BIRS 2016 7
![Page 8: NONPARAMETRIC ADJUSTMENT FOR MEASUREMENT …between the true and misreported ages (Mai et al 2011; Ziogas and Anton-Culver, 2003). Ovarian cancer: Age of diagnosis was misreported](https://reader030.fdocuments.in/reader030/viewer/2022040721/5e2d8f03dd3c007f8258d6f4/html5/thumbnails/8.jpg)
The data: - outcome - the true failure time - the true right-censoring time Error-free predictor: (for simplicity, one relative) where Example: and the mother’s age at onset.
BIRS 2016 8
Y
T o
C
H = T ,δ( )
T = min T o ,C( ) δ = I T o ≤ C( )
Y = 0 or 1 T o
![Page 9: NONPARAMETRIC ADJUSTMENT FOR MEASUREMENT …between the true and misreported ages (Mai et al 2011; Ziogas and Anton-Culver, 2003). Ovarian cancer: Age of diagnosis was misreported](https://reader030.fdocuments.in/reader030/viewer/2022040721/5e2d8f03dd3c007f8258d6f4/html5/thumbnails/9.jpg)
Error-free predictor: Error-prone predictor: Example: the counselee doesn’t know that his/her relative had the disease, or the correct age at onset. Assumption: We have a validation study with
and
but no need for the outcome . The risk prediction model: Our goal is estimating
BIRS 2016 9
H = T ,δ( )
H* = T *,δ *( )
H = T ,δ( ) H* = T *,δ *( )
Y
Pr Y | H( )
Pr Y | H *( )
![Page 10: NONPARAMETRIC ADJUSTMENT FOR MEASUREMENT …between the true and misreported ages (Mai et al 2011; Ziogas and Anton-Culver, 2003). Ovarian cancer: Age of diagnosis was misreported](https://reader030.fdocuments.in/reader030/viewer/2022040721/5e2d8f03dd3c007f8258d6f4/html5/thumbnails/10.jpg)
Our goal is estimating
Main idea: Assumptions: • contains no information on predicting beyond . • The measurement error model is transportable.
BIRS 2016 10
Pr Y | H *( )
P Y | H *( ) = P Y , H | H *( )dHH∫
= P Y | H , H *( )P H | H *( )dHH∫
= P Y | H( )P H | H *( )dHH∫
H * H Y P H | H *( )
surrogacy assumption
![Page 11: NONPARAMETRIC ADJUSTMENT FOR MEASUREMENT …between the true and misreported ages (Mai et al 2011; Ziogas and Anton-Culver, 2003). Ovarian cancer: Age of diagnosis was misreported](https://reader030.fdocuments.in/reader030/viewer/2022040721/5e2d8f03dd3c007f8258d6f4/html5/thumbnails/11.jpg)
A non parametric estimator of Main idea: Assumptions: Conditional independence of event and censoring times given .
BIRS 2016 11
Pr H |H *( )
H *
H = T ,δ( ) H* = T *,δ *( )
P T ,δ |T *,δ *( ) = λ T |T *,δ *( )δ S T |T *,δ *( )h T |T *,δ *( )1−δ G T |T *,δ *( )
Conditional hazard and survival of true failure time
Conditional hazard and survival of true censoring time
Assume, for simplicity, one family member
The distribution is left unspecified
![Page 12: NONPARAMETRIC ADJUSTMENT FOR MEASUREMENT …between the true and misreported ages (Mai et al 2011; Ziogas and Anton-Culver, 2003). Ovarian cancer: Age of diagnosis was misreported](https://reader030.fdocuments.in/reader030/viewer/2022040721/5e2d8f03dd3c007f8258d6f4/html5/thumbnails/12.jpg)
These hazards and survival functions can be estimated non-parametrically by using the validation data – a large study population that does not involve the counselee. Validation data Use kernel smoothed Kaplan-Meier estimator (Beran, 1981).
BIRS 2016 12
P T ,δ |T *,δ *( ) = λ T |T *,δ *( )δ S T |T *,δ *( )h T |T *,δ *( )1−δ G T |T *,δ *( )
Ti = min Tio ,Ci( ) δ i = I Ti
o ≤ Ci( ) i = 1,…,n
Hi = Ti ,δ i( ) Hi* = Ti
*,δ i*( )
dependent individuals
![Page 13: NONPARAMETRIC ADJUSTMENT FOR MEASUREMENT …between the true and misreported ages (Mai et al 2011; Ziogas and Anton-Culver, 2003). Ovarian cancer: Age of diagnosis was misreported](https://reader030.fdocuments.in/reader030/viewer/2022040721/5e2d8f03dd3c007f8258d6f4/html5/thumbnails/13.jpg)
Kernel smoothed Kaplan-Meier estimator (Beran, 1981): and we get for . For estimating the survival function of the censoring time, apply the above while treating the censoring times as events and the event times as censoring.
BIRS 2016 13
Wi t;bnl ,l( ) = I δ i
* = l( )Kt −Ti
*
bnl
⎛
⎝⎜⎞
⎠⎟ i = 1,…,n l = 0,1
Known kernel function Bandwidth sequences
Nadaraya-Watson weight
S t | t*,l( ) = 1−Wi t*;bnl ,l( )Wj t*;bnl ,l( )j=1,δ j
*=l
n∑ I Tj ≥ Ti( )⎛
⎝
⎜⎜
⎞
⎠
⎟⎟Ti≤t ,δ i=1,δ i
*=l∏
S t | t*,0( ), S t | t*,1( ) t,t* ∈(0,τ ]
![Page 14: NONPARAMETRIC ADJUSTMENT FOR MEASUREMENT …between the true and misreported ages (Mai et al 2011; Ziogas and Anton-Culver, 2003). Ovarian cancer: Age of diagnosis was misreported](https://reader030.fdocuments.in/reader030/viewer/2022040721/5e2d8f03dd3c007f8258d6f4/html5/thumbnails/14.jpg)
Summary
BIRS 2016 14
provided
P Y | H *( ) = P Y | H( ) P H | H *( )dH
H∫
H = H1 ,…,HR( )H ∗ = H1
∗,…,HR∗( )
P H |H ∗( ) = P H j |H j∗( )
j=1
R
∏
P H |H ∗( ) = P H j |H j∗( )
j=1
R
∏
P H j | H j∗( ) = P Tj ,δ j |Tj
∗,δ j∗( )
= λ Tj |Tj∗,δ j
∗( )δ j S Tj |Tj∗,δ j
*( ) h Tj |Tj∗,δ j
∗( )1−δ j G Tj |Tj∗,δ j
∗( )
![Page 15: NONPARAMETRIC ADJUSTMENT FOR MEASUREMENT …between the true and misreported ages (Mai et al 2011; Ziogas and Anton-Culver, 2003). Ovarian cancer: Age of diagnosis was misreported](https://reader030.fdocuments.in/reader030/viewer/2022040721/5e2d8f03dd3c007f8258d6f4/html5/thumbnails/15.jpg)
In case integrating over all possible values of H is computational challenging, a Monte-Carlo estimator can be used, by sampling From and the final proposed estimator is given by
BIRS 2016 15
provided
H(1) ,…, H ( B)
P H | H *( )
P Y | H *( ) = P Y | H( ) P H | H *( )dH
H∫
P Y | H *( ) = 1
BP Y | H (b)( )
b=1
B
∑
![Page 16: NONPARAMETRIC ADJUSTMENT FOR MEASUREMENT …between the true and misreported ages (Mai et al 2011; Ziogas and Anton-Culver, 2003). Ovarian cancer: Age of diagnosis was misreported](https://reader030.fdocuments.in/reader030/viewer/2022040721/5e2d8f03dd3c007f8258d6f4/html5/thumbnails/16.jpg)
Application: Mendelian Risk Prediction Model A counselee provides information on R relatives: Let Aims: o Estimating
o Estimating .
BIRS 2016 16
Hi* = Ti
*,δ i*( ) instead of Hi = Ti ,δ i( ) i = 1,…, R
γ i = γ i1,…,γ iM( ), γ im = 0 or 1
γ im = 1 indicates carrying the genetic variant that confer disease risk
P γ 0 |H0 ,H1*,…,HR
*( ) P T0
o > t | H0 , H1*,…, HR
*( )
![Page 17: NONPARAMETRIC ADJUSTMENT FOR MEASUREMENT …between the true and misreported ages (Mai et al 2011; Ziogas and Anton-Culver, 2003). Ovarian cancer: Age of diagnosis was misreported](https://reader030.fdocuments.in/reader030/viewer/2022040721/5e2d8f03dd3c007f8258d6f4/html5/thumbnails/17.jpg)
Carrier probability Write and in practice is naively being used. We propose
BIRS 2016 17
P Y |H( ) = P γ 0 |H0 ,H1,…,HR( )
P γ 0 | H0 , H1,…, HR( ) =P γ 0( ) P Hi |γ i( )P γ 1,…,γ R |γ 0( )i=0
R∏γ 1,…,γ R∑
P γ 0( ) P Hi |γ i( )P γ 1,…,γ R |γ 0( )i=0
R∏γ 1,…,γ R∑γ 0
∑
BRCAPRO estimate it via meta-analysis, and family history information is verified using medical records
conditional independence of family members’ phenotype given their genotypes.
P Y |H *( ) = P γ 0 |H0 ,H1*,…,HR
*( )
P γ 0 | H *( ) = P γ 0 | H( ) P H | H *( )dH
H∫
![Page 18: NONPARAMETRIC ADJUSTMENT FOR MEASUREMENT …between the true and misreported ages (Mai et al 2011; Ziogas and Anton-Culver, 2003). Ovarian cancer: Age of diagnosis was misreported](https://reader030.fdocuments.in/reader030/viewer/2022040721/5e2d8f03dd3c007f8258d6f4/html5/thumbnails/18.jpg)
Survival probability Write and in practice is naively being used. We propose
BIRS 2016 18
P Y | H *( ) = P γ 0 | H0 , H1*,…, HR
*( )
P T0o > t | H0 , H1,…, HR ,γ 0( )
=P T0
o > t |γ 0( ) P Hi |γ i( )P γ 1,…,γ R |γ 0( )i=1
R∏γ 1,…,γ R∑
P Hi |γ i( )P γ 1,…,γ R |γ 0( )i=0
R∏γ 1,…,γ R∑
P Y | H( ) = P T0o > t | H0 , H1 ,…, HR ,γ 0( )
P T0
o > t | H *( ) = P T0o > t | H( ) P H | H *( )dH
H∫
![Page 19: NONPARAMETRIC ADJUSTMENT FOR MEASUREMENT …between the true and misreported ages (Mai et al 2011; Ziogas and Anton-Culver, 2003). Ovarian cancer: Age of diagnosis was misreported](https://reader030.fdocuments.in/reader030/viewer/2022040721/5e2d8f03dd3c007f8258d6f4/html5/thumbnails/19.jpg)
Simulation Study • Setting: with single gene BRCA1
• Two datasets were generated, one to model the measurement error distribution, and the other represents the counselees.
• The BRCA1 carrier probability 0.006098.
• The penetrance function from BRCAPRO version 2.08.
• Normal censoring, mean 55, SD 10.
• Measurement error in disease status: sen=0.954, spec=0.974; sen=0.649 and spec=0.990. • Measurement error in age:
BIRS 2016 19
P Y |H( ) = P γ 0 |H0 ,H1,…,HR( )
100,000 families, each with 5 members (mother, father, 3 daughters)
P H |γ( )
T* = T + ε ε ∼ N 0,σ 2( ) σ =1,3,5
T* = TU U ∼ Exp 1( )
50,000 counselees
![Page 20: NONPARAMETRIC ADJUSTMENT FOR MEASUREMENT …between the true and misreported ages (Mai et al 2011; Ziogas and Anton-Culver, 2003). Ovarian cancer: Age of diagnosis was misreported](https://reader030.fdocuments.in/reader030/viewer/2022040721/5e2d8f03dd3c007f8258d6f4/html5/thumbnails/20.jpg)
Results
BIRS 2016 20
O/E = I γ 0i = 1( ) /i∑ Pii∑
MSEP = n−1 Pi − Pi γ 0 | H( ){ }2
i∑
![Page 21: NONPARAMETRIC ADJUSTMENT FOR MEASUREMENT …between the true and misreported ages (Mai et al 2011; Ziogas and Anton-Culver, 2003). Ovarian cancer: Age of diagnosis was misreported](https://reader030.fdocuments.in/reader030/viewer/2022040721/5e2d8f03dd3c007f8258d6f4/html5/thumbnails/21.jpg)
Summary of simulation results • We are able to eliminate almost all the bias induced by ME in
histories (O/E).
• We are able to improve accuracy (MSEP).
• We are able to improve discrimination (ROC-AUC) only to some degree.
BIRS 2016 21
![Page 22: NONPARAMETRIC ADJUSTMENT FOR MEASUREMENT …between the true and misreported ages (Mai et al 2011; Ziogas and Anton-Culver, 2003). Ovarian cancer: Age of diagnosis was misreported](https://reader030.fdocuments.in/reader030/viewer/2022040721/5e2d8f03dd3c007f8258d6f4/html5/thumbnails/22.jpg)
Survival prediction - summary of simulation results (multip.)
BIRS 2016 22
![Page 23: NONPARAMETRIC ADJUSTMENT FOR MEASUREMENT …between the true and misreported ages (Mai et al 2011; Ziogas and Anton-Culver, 2003). Ovarian cancer: Age of diagnosis was misreported](https://reader030.fdocuments.in/reader030/viewer/2022040721/5e2d8f03dd3c007f8258d6f4/html5/thumbnails/23.jpg)
Survival prediction - summary of simulation results (multip.)
BIRS 2016 23
![Page 24: NONPARAMETRIC ADJUSTMENT FOR MEASUREMENT …between the true and misreported ages (Mai et al 2011; Ziogas and Anton-Culver, 2003). Ovarian cancer: Age of diagnosis was misreported](https://reader030.fdocuments.in/reader030/viewer/2022040721/5e2d8f03dd3c007f8258d6f4/html5/thumbnails/24.jpg)
Survival prediction - summary of simulation results (multip.)
BIRS 2016 24
![Page 25: NONPARAMETRIC ADJUSTMENT FOR MEASUREMENT …between the true and misreported ages (Mai et al 2011; Ziogas and Anton-Culver, 2003). Ovarian cancer: Age of diagnosis was misreported](https://reader030.fdocuments.in/reader030/viewer/2022040721/5e2d8f03dd3c007f8258d6f4/html5/thumbnails/25.jpg)
Concluding remarks ü A non-parametric adjustment is provided, for measurement error in
time to event predictor.
ü Ignoring the measurement error, provides miscalibrated models.
ü The proposed adjustment improves calibration and total accuracy. ü The proposed method can be easily incorporated in BayesMendel R
package for direct clinical use.
Model discrimination only partially improved.
BIRS 2016 25
![Page 26: NONPARAMETRIC ADJUSTMENT FOR MEASUREMENT …between the true and misreported ages (Mai et al 2011; Ziogas and Anton-Culver, 2003). Ovarian cancer: Age of diagnosis was misreported](https://reader030.fdocuments.in/reader030/viewer/2022040721/5e2d8f03dd3c007f8258d6f4/html5/thumbnails/26.jpg)
END
BIRS 2016 26
![Page 27: NONPARAMETRIC ADJUSTMENT FOR MEASUREMENT …between the true and misreported ages (Mai et al 2011; Ziogas and Anton-Culver, 2003). Ovarian cancer: Age of diagnosis was misreported](https://reader030.fdocuments.in/reader030/viewer/2022040721/5e2d8f03dd3c007f8258d6f4/html5/thumbnails/27.jpg)
Example – misreporting breast cancer Counselees: • Data from the Cancer Genetics Network (CGN) Model Evaluation
Study, with known carrier status. • 2038 families, 34310 relatives. • 9.2% of the relatives have breast cancer. • Only error-prone self-reported family history is available. Validation data: • Data from U of California at Irvine (UCI). • 719 cancer affected counselees (breast, ovarian or colon cancer). • 1521 female relatives, 19.3% with breast cancer. • Error-prone and error-free family history are available.
BIRS 2016 27
![Page 28: NONPARAMETRIC ADJUSTMENT FOR MEASUREMENT …between the true and misreported ages (Mai et al 2011; Ziogas and Anton-Culver, 2003). Ovarian cancer: Age of diagnosis was misreported](https://reader030.fdocuments.in/reader030/viewer/2022040721/5e2d8f03dd3c007f8258d6f4/html5/thumbnails/28.jpg)
Example – misreporting breast cancer Log of O/E and 95% confidence intervals for being a BRCA carrier for counselees in CGN dataset, stratified by risk decile: • Transportability? • Small sample?
Very small improvement in Brier score and ROC-AUC.
BIRS 2016 28