STATISTICAL LEARNING FROM DATA
“LIES, DAMNED LIES, AND STATISTICS”, Mark Twain.
Senate Approves Tighter Policing of Drug Makers, May 8, 2007
Mark van der Laan, www.stat.berkeley.edu/~laan
OVERVIEW
• How good is the human statistical intuition?
• Statistics are a tool of modern life.
• The loose statistical world of data analyses and scientific publishing: “Why more than half [data based] published research findings are false?” (John P.A. Ioannidis)
• The rigid statistical world of the FDA and drug manufacturers: the role of statistics in clinical trials.
• Advances in statistics can help but are heavily underused: the challenge.
• Can we reduce the cost of prescription drugs through improvements in statistical practice and design?
• Can statistics improve safety reviews of drugs in use?
• Concluding remarks.
THE QUIZ-MASTER PROBLEM
Behind one of these doors is a car. Behind each of the other two is a goat. Click on the door that you think the car is behind.
YOU SELECT DOOR 1
To keep it exciting, I open one of the other two doors without a car behind it. Obviously the car is not behind door 3. But before I open door 1, the door you selected, I'm going to let you switch to door 2 if you like. Again, click on the door that you think the car is behind.
Congratulations! You're a winner!
Recap: You originally picked door 1 and then switched to door 2.
Here is a summary of how previous contestants have fared.
                 # of Players   Winners   Percent Winners
Switched              131          88          67.2
Didn't Switch         116          42          36.2
The Quiz-Master Problem was raised by a reader in Marilyn vos Savant's Sunday Parade column.
Marilyn's answer to the reader was that the contestant should switch doors. She received nearly 10,000 responses from readers, most of them disagreeing with her. Hundreds of them came from mathematics professors and scientists, whose responses ranged from hostility to disappointment at the nation's lack of mathematical skills.
The problem was given the name The Monty Hall Paradox in honor of the long-time host of the television game show "Let's Make a Deal." Articles about the controversy appeared in the New York Times and other papers around the country.
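The roughly 2/3 versus 1/3 split in the contestant table can be checked by simulation. What follows is a minimal Python sketch, not part of the original quiz page. The function names `play` and `win_rate` are illustrative, and the sketch assumes the host always opens a goat door other than the contestant's pick, choosing at random when two such doors are available.

```python
import random

def play(switch, rng):
    """Play one round; return True if the contestant ends up with the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # Assumption: the host never opens the contestant's door or the car door.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Move to the single door that is neither picked nor opened.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car

def win_rate(switch, n_rounds=100_000, seed=0):
    """Fraction of simulated rounds won under a fixed strategy."""
    rng = random.Random(seed)
    wins = sum(play(switch, rng) for _ in range(n_rounds))
    return wins / n_rounds

print(f"switch: {win_rate(True):.3f}")
print(f"stay:   {win_rate(False):.3f}")
```

Switching wins exactly when the first pick was a goat, which happens with probability 2/3, so the simulated rates should settle near 0.667 and 0.333.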
Statistics are a tool of modern life
• Identifying associations, correlations and patterns
• Establishing causation based on randomized trials and observational studies
• Making predictions
• Shaping strategies and future behavior
  • Of people and societies
  • Of machines and computing
  • Of complex systems
Statistics are often cited to denote a certainty that does not, in fact, exist
• “There is increasing concern that in modern research, false findings may be the majority, or even the vast majority, of published research claims.”
• “Simulations show that for most study designs and settings it is more likely for a research claim to be false than true.”
• “For many current scientific fields, claimed research findings often may be simply accurate measures of the prevailing bias.”
- J.P.A. Ioannidis, “Why Most Published Research Findings are False”, Chance, Vol. 18, No. 4, 2005
Causal Inference and Curse of Dimensionality
Causal Model and Data
Text from Taubes NYTimes Article
False conclusions are expensive
• In medicine
  • False positives lead to expensive additional tests and anxiety
  • False negatives lead to delayed treatment with escalated costs and illness
• In drug discovery
  • False positives lead to failed trials
    • The average cost of a phase III clinical trial is $4m-$20m, some cost more than $100m
  • False negatives lead to failed trials
    • Missed contraindications, negative interactions and imprecise dosages
• In genomics, proteomics and chemoinformatics
  • False positives are abundant and lead to wasted time, effort and experimentation
  • False negatives lead to missed business opportunities
• In public policy
  • False positives and false negatives lead to action based on false premises and, frequently, public cynicism
Bias is a hazard of statistics
• Statistical data samples can be biased
  • The sample selected does not represent the population
  • Example: There are five redheads in a town of 100 people. Our sample of 20 people happens to include all five.
• Statistical methods for learning from data can be biased
  • The statistical model selected is not the one that best fits the data… for the question being asked!
• Statistical interpretations of findings can be biased.
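To put numbers on the redhead example: the short Python sketch below assumes the 20 people are drawn as a simple random sample without replacement (an assumption; the slide does not say how the sample was taken) and computes how likely it is to catch all five redheads, along with the share of redheads such a sample would suggest. The variable names are illustrative.

```python
from math import comb

population, redheads, sample_size = 100, 5, 20

# Hypergeometric probability that all five redheads land in the sample:
# choose the 15 remaining sample members from the 95 non-redheads.
p_all_five = comb(population - redheads, sample_size - redheads) / comb(population, sample_size)

true_share = redheads / population      # share of redheads in the town
sample_share = redheads / sample_size   # share suggested by the unlucky sample

print(f"P(sample contains all five) = {p_all_five:.6f}")
print(f"true share = {true_share:.2f}, sample share = {sample_share:.2f}")
```

Under random sampling this outcome is rare (about 2 in 10,000), yet when it happens the sample suggests 25% redheads against a true 5%. A sample that is not drawn at random can produce this kind of distortion far more often.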
Variable Importance of HIV resistance mutations
• Goal: Rank a set of genetic mutations based on their importance for determining an outcome
  • Mutations (A) in the HIV protease enzyme, measured by sequencing
  • Outcome (Y) = change in viral load 12 weeks after starting a new regimen containing saquinavir
• How important is each mutation for viral resistance to this specific protease inhibitor drug?
  • Inform genotypic scoring systems
Stanford Drug Resistance Database
• All Treatment Change Episodes (TCEs) in the Stanford Drug Resistance Database
• Patients drawn from 16 clinics in Northern CA
[Figure: timeline of a TCE (change >= 1 drug), showing baseline viral load, viral genotype, change in regimen and final viral load; intervals labeled "<24 weeks" and "12 weeks".]
• 333 patients on saquinavir regimen
Parameter of Interest
• Need to control for a range of other covariates W
  • Include: past treatment history, baseline clinical characteristics, non-protease mutations, other drugs in regimen
• Parameter of Interest: Variable Importance
  $\psi = E\big[\,E(Y \mid A_j = 1, W) - E(Y \mid A_j = 0, W)\,\big]$
  • For each protease mutation (indexed by j)
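To make the parameter concrete, here is a minimal Python sketch of a plain plug-in estimate of ψ for one binary mutation indicator: fit a working regression for E(Y|A,W), predict for everyone with A set to 1 and to 0, and average the difference. This is not the targeted estimator from the talk. The function name `plugin_importance`, the linear working model with an A-by-W interaction, and the simulated data are all assumptions made for illustration.

```python
import numpy as np

def plugin_importance(y, a, w):
    """Plug-in estimate of psi = E[E(Y|A=1,W) - E(Y|A=0,W)].

    Assumed working model: E(Y|A,W) = b0 + b1*A + W@b2 + A*(W@b3).
    """
    y = np.asarray(y, dtype=float)
    a = np.asarray(a, dtype=float)
    w = np.asarray(w, dtype=float).reshape(len(y), -1)
    n = len(y)

    def design(a_vec):
        # Columns: intercept, A, W, A*W
        return np.column_stack([np.ones(n), a_vec, w, a_vec[:, None] * w])

    beta, *_ = np.linalg.lstsq(design(a), y, rcond=None)
    q1 = design(np.ones(n)) @ beta    # predictions with A set to 1 for everyone
    q0 = design(np.zeros(n)) @ beta   # predictions with A set to 0 for everyone
    return float(np.mean(q1 - q0))

# Hypothetical simulated data in which W drives both A and Y (true psi = 1).
rng = np.random.default_rng(0)
n = 5000
w = rng.normal(size=n)
a = rng.binomial(1, 1.0 / (1.0 + np.exp(-w)))
y = 1.0 * a + 2.0 * w + 0.5 * a * w + rng.normal(size=n)

psi_hat = plugin_importance(y, a, w)
crude = y[a == 1].mean() - y[a == 0].mean()   # unadjusted difference in means
print(f"adjusted: {psi_hat:.2f}, crude: {crude:.2f}")
```

In this simulated setting the crude difference in means is inflated by W, while the adjusted estimate stays close to the true value. The same distinction separates an adjusted importance measure from an unadjusted association.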
Analytic approach
• Standard approach: fit a single multivariable regression E(Y|A,W)
  • i.e. regress clinical response on mutations and covariates
• Is this the best approach for answering the scientific question of interest?
• What is the scientific question?
  • Construct best predictor vs. estimate importance of each mutation
Prediction vs. Importance
Prediction – create a model that the clinician will use to help predict risk of a disease for the patient.
Explanation – trying to investigate the causal association of a treatment or risk factor and a disease outcome.
Targeted Maximum Likelihood
• MLE: aims to do a good job of estimating the whole density
• Targeted MLE: aims to do a good job at the parameter of interest
  • General decrease in bias for the parameter of interest
  • Protection under the null hypothesis
  • Honest p-values, inference, multiple testing
Targeted Maximum Likelihood
• In the regression case, implementation just involves adding a covariate h(A,W) to the regression model
• Requires estimating g(A|W)
  • E.g. distribution of each mutation given covariates
• Robust: the estimate of ψ is consistent if either
  • g(A|W) is estimated consistently, or
  • E(Y|A,W) is estimated consistently
More on this later . . .
$$h(A,W) = \frac{I(A=1)}{g(1 \mid W)} - \frac{I(A=0)}{g(0 \mid W)}, \qquad \text{where } g(a \mid W) = \Pr(A = a \mid W)$$
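A minimal Python sketch of the "add a covariate h(A,W)" step for a continuous outcome is given below. It is one common way to carry out the targeting step, not necessarily the exact implementation used in the talk: the initial fit is kept as an offset and the coefficient ε on h(A,W) = I(A=1)/g(1|W) - I(A=0)/g(0|W) is fitted by least squares. The function names, the simulated data, the use of the true g(1|W), and the deliberately poor initial fit are assumptions for illustration.

```python
import numpy as np

def clever_covariate(a, g1):
    """h(A,W) = I(A=1)/g(1|W) - I(A=0)/g(0|W), where g1 = g(1|W)."""
    a = np.asarray(a, dtype=float)
    g1 = np.asarray(g1, dtype=float)
    return a / g1 - (1.0 - a) / (1.0 - g1)

def targeted_update(y, a, g1, q_a, q_1, q_0):
    """One targeting step with a linear fluctuation.

    q_a, q_1, q_0 are initial estimates of E(Y|A,W) at the observed A,
    at A=1 and at A=0. Returns (updated estimate of psi, epsilon).
    """
    y = np.asarray(y, dtype=float)
    g1 = np.asarray(g1, dtype=float)
    h = clever_covariate(a, g1)
    # Least-squares coefficient on h, with the initial fit q_a as offset.
    eps = np.sum(h * (y - q_a)) / np.sum(h * h)
    q_1_star = q_1 + eps / g1            # h(1,W) =  1/g(1|W)
    q_0_star = q_0 - eps / (1.0 - g1)    # h(0,W) = -1/g(0|W)
    return float(np.mean(q_1_star - q_0_star)), float(eps)

# Hypothetical simulated data with true psi = 1 and a known g(1|W).
rng = np.random.default_rng(1)
n = 20000
w = rng.normal(size=n)
g1 = 1.0 / (1.0 + np.exp(-0.5 * w))
a = rng.binomial(1, g1)
y = 1.0 * a + 2.0 * w + rng.normal(size=n)

# Deliberately poor initial fit: a constant, whose plug-in estimate of psi is 0.
q_bad = np.full(n, y.mean())
psi_tmle, eps = targeted_update(y, a, g1, q_bad, q_bad, q_bad)
print(f"initial plug-in: 0.00, after targeting: {psi_tmle:.2f}")
```

The example illustrates the robustness property: even though the initial regression is badly wrong, a correct g(A|W) is enough for the updated estimate to land near the true value.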
Mutation Rankings Based on Variable Importance
Current Score   Mutation    VIM     VIM p-value   Crude   Crude p-value
35              90M          0.70   0.00           0.76   0.00
40              48VM         0.79   0.00           1.07   0.00
0               30N         -0.78   0.00          -1.06   0.00
10              82AFST       0.46   0.01           0.35   0.03
10              54VA         0.46   0.01           0.31   0.11
10              73CSTA       0.67   0.03           0.80   0.00
2               20IMRTVL     0.32   0.07           0.26   0.18
1               36ILVTA      0.28   0.10           0.27   0.12
2               10FIRVY      0.27   0.13           0.48   0.00
5               88DTG       -0.23   0.24          -0.50   0.33
2               71TVI        0.18   0.29           0.14   0.37
5               32I         -0.18   0.58          -0.20   0.55
2               63P          0.06   0.77           0.11   0.56
5               46ILV        0.13   0.98           0.27   0.10
“Better Evaluation Tools – Biomarkers and Disease”
• #1 highly-targeted research project in the FDA “Critical Path Initiative”
• Requests “clarity on the conceptual framework and evidentiary standards for qualifying a biomarker for various purposes”
• “Accepted standards for demonstrating comparability of results, … or for biological interpretation of significant gene expression changes or mutations”
Proper identification of biomarkers can . . .
• Identify patient risk or disease susceptibility
• Determine appropriate treatment regimen
• Detect disease progression and clinical outcomes
• Assess therapy effectiveness
• Determine level of disease activity
• etc . . .
Evaluation of Biomarker Discovery Methods
> Univariate Linear Regression
  • Importance measure: coefficient value with associated p-value
  • Measures marginal association
> RandomForest (Breiman 2001)
  • Importance measures (no p-values)
    RF1: variable’s influence on error rate
    RF2: mean improvement in node splits due to variable
> Variable Importance with LARS
  • Importance measure: causal effect
  • Formal inference, p-values provided
  • LARS used to fit initial E[Y|A,W] estimate
  • W = {marginally significant covariates}
> All p-values are FDR adjusted
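The slides do not say which FDR procedure was used. As an illustration only, the Python sketch below implements the Benjamini-Hochberg step-up adjustment, a common default. The function name `bh_adjust` is an assumption, not something taken from the talk.

```python
import numpy as np

def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(pvalues, dtype=float)
    m = len(p)
    order = np.argsort(p)
    # Scale the i-th smallest p-value by m / i.
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working down from the largest p-value.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.clip(scaled, 0.0, 1.0)
    return adjusted

print(bh_adjust([0.01, 0.04, 0.03, 0.005]))
```

A variable is then reported as significant at FDR level q when its adjusted p-value is at most q.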
> Test methods' ability to determine “true” variables under increasing correlation conditions
  • Ranking by measure and p-value
  • Minimal list necessary to get all “true”?
> Variables
  • Block diagonal correlation structure: 10 independent sets of 10
  • Multivariate normal distribution
  • Constant ρ, variance = 1
  • ρ = {0, 0.1, 0.2, 0.3, …, 0.9}
> Outcome
  • Main effect linear model
  • 10 “true” biomarkers, one variable from each set of 10
  • Equal coefficients
  • Noise term with mean = 0, sigma = 10 (“realistic noise”)
$E_p(Y \mid A = a, W) - E_p(Y \mid A = 0, W)$
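The simulation design described above can be sketched in Python as follows. The slides give the block structure, the values of ρ and the noise level, but not the sample size, the common coefficient value, or which variable in each block is "true"; the values `n`, `beta`, the seed, and the choice of the first variable per block are therefore assumptions. Only the univariate regression ranking is shown. LARS and randomForest are not reproduced here.

```python
import numpy as np

def simulate(n, rho, beta=4.0, sigma=10.0, n_blocks=10, block_size=10, seed=0):
    """Draw block-correlated normal variables and a main-effect linear outcome."""
    rng = np.random.default_rng(seed)
    block = np.full((block_size, block_size), rho)
    np.fill_diagonal(block, 1.0)                      # variance 1, correlation rho
    cov = np.kron(np.eye(n_blocks), block)            # independent blocks
    x = rng.multivariate_normal(np.zeros(n_blocks * block_size), cov, size=n)
    true_idx = np.arange(0, n_blocks * block_size, block_size)  # assumed: first of each block
    y = beta * x[:, true_idx].sum(axis=1) + rng.normal(0.0, sigma, size=n)
    return x, y, true_idx

def univariate_slopes(x, y):
    """Slope of a separate simple linear regression of y on each column of x."""
    xc = x - x.mean(axis=0)
    return xc.T @ (y - y.mean()) / (xc ** 2).sum(axis=0)

def minimal_list_length(scores, true_idx):
    """Length of the shortest top-ranked list containing every true variable."""
    order = np.argsort(-np.abs(scores))               # most important first
    ranks = np.empty(len(scores), dtype=int)
    ranks[order] = np.arange(1, len(scores) + 1)
    return int(ranks[true_idx].max())

for rho in (0.0, 0.4, 0.8):
    x, y, true_idx = simulate(n=2000, rho=rho)
    length = minimal_list_length(univariate_slopes(x, y), true_idx)
    print(f"rho={rho:.1f}: minimal list length = {length}")
```

A list length of 10 means the ten true biomarkers occupy the top ten ranks. As ρ grows, correlated neighbours of a true variable pick up marginal association, which is why a purely marginal ranking is expected to need a longer list.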
Methods Simulation Study
Simulation Results
• No appreciable difference in ranking by importance measure or p-value
  • The plot is with respect to ranked importance measures
• List length for linear regression and randomForest increases with increasing correlation; Variable Importance w/LARS stays near the minimum (10) through ρ=0.6, with only small decreases in power
• Linear regression list length is 2X the Variable Importance list length at ρ=0.4 and 4X at ρ=0.6
• RandomForest (RF2) list length is consistently shorter than that of linear regression but is still 50% longer than the Variable Importance list length at ρ=0.4, and twice as long at ρ=0.6
• Variable importance coupled with LARS estimates the true causal effect and outperforms both linear regression and randomForest
[Figure: Minimal list length to obtain all 10 “true” variables. x-axis: Correlation (0 to 0.9); y-axis: List Length (0 to 100); series: Linear Reg, VImp w/LARS, RF1, RF2.]
THE “RIGID” STATISTICAL WORLD OF THE FDA
Clinical Trials
• Rigid statistical methodology and designs required for FDA approval.
Clinical trials are expensive
• Time to market is critical
  • Half of the time-to-approval (currently 15.3 yrs) is spent in clinical trials
  • Each day of delay is expensive
    • Moderately successful drug: $1m per day in lost sales
    • Blockbuster drug: $3m per day
• A lot of money is involved
  • Spending on US-sponsored clinical trials was $25.6b in 2006
    • Biotech + Pharma = $22.6b, NIH = $3b
  • 9,937 trials this year
  • Pharma: 71% of R&D goes to drug development, 45% of this goes to clinical trials
Recruiting patients is expensive
• Direct costs of patient recruitment are high: $440m per year
• Indirect costs due to delays
  • #1 contributor to drug application delays
  • 94% of trials in US miss their enrollment deadlines (Europe: 82%)
  • 80% are delayed at least one month
• Drop outs are a major problem
  • 1 of 4 volunteers drops out of a study after it begins
MOVING TOWARDS ADAPTIVE CLINICAL TRIALS
“A widely noted survey by Accenture provided some alarming figures a few years ago: Eighty-nine percent of all drug candidates from the initiation of Phase I through FDA approval fail and many of them in the clinic.
Clearly, any techniques that could give an earlier read on these issues would be valuable. In too many cases, the chief result of a trial is to show that the trial itself was set up wrong, in ways that only became clear after the data were un-blinded. Did the numbers show that your dosage was suboptimal partway into a two year trial? Too bad - you probably weren't allowed to know that. Were several arms of your study obviously pointless from the start? Even if you know, what could you do about it without harming the validity of the whole effort? Over the last years, such concerns have stimulated an unprecedented amount of work on new approaches. Ideas have come from industry, academia, and regulatory agencies such as the FDA's critical path initiative. A common theme in these efforts has been to move toward adaptive clinical trials.”
Approval of Drugs, and Post-Market Safety Reviews of Drugs
• FDA approvals are based on inefficient and often biased statistical methods (e.g., in how they deal with informative drop out).
• FDA does not have expertise to do post-market safety reviews, since this requires expertise in the challenging field of causal inference.
• FDA needs to be modernized and needs to make strong alliances with academic and industrial centers of excellence.
Senate Approves Tighter Policing of Drug Makers, May 8, 2007
Statistical Innovations are Available
• Statistical Inference for Adaptive designs.
• Targeted (Maximum Likelihood) Learning in Biomarker Discovery
• Targeted (Maximum Likelihood) Learning of Causal Effects of drugs and other interventions
• Targeted Learning of Treatment effect modification due to genetic and genomic factors (Multiple Testing).
• Learning of Individualized Treatment Rules (e.g., individualized medicine).
• Super Learning in Prediction
CONCLUDING REMARKS:
• “Statistics” can do a lot of harm in the hands of people.
• Any published statistical analysis should be based on publicly available data.
• Any statistical analysis should be based on an a priori specified analysis plan (MACHINE LEARNING), and any HUMAN INDUCED deviation from it should be documented (just like the FDA requires!).
• Statistical tools used in practice need to be improved: FDA needs to be modernized with well-founded statistical innovations.