To Explain Or To Predict?
![Page 1: To Explain Or To Predict?](https://reader036.fdocuments.in/reader036/viewer/2022062313/55b9b364bb61ebe1388b456f/html5/thumbnails/1.jpg)
To Explain or To Predict?
Explanatory vs. Predictive Modeling in Scientific Research
Galit Shmuéli
Georgetown University
October 30, 2009
![Page 2: To Explain Or To Predict?](https://reader036.fdocuments.in/reader036/viewer/2022062313/55b9b364bb61ebe1388b456f/html5/thumbnails/2.jpg)
The path to discovery
![Page 3: To Explain Or To Predict?](https://reader036.fdocuments.in/reader036/viewer/2022062313/55b9b364bb61ebe1388b456f/html5/thumbnails/3.jpg)
Predict
Explain
![Page 4: To Explain Or To Predict?](https://reader036.fdocuments.in/reader036/viewer/2022062313/55b9b364bb61ebe1388b456f/html5/thumbnails/4.jpg)
What are
“explaining”?
“predicting”?
![Page 5: To Explain Or To Predict?](https://reader036.fdocuments.in/reader036/viewer/2022062313/55b9b364bb61ebe1388b456f/html5/thumbnails/5.jpg)
Statistical modeling in social science research
Purpose: test causal theory (“explain”)
Association-based statistical models
Prediction nearly absent
![Page 6: To Explain Or To Predict?](https://reader036.fdocuments.in/reader036/viewer/2022062313/55b9b364bb61ebe1388b456f/html5/thumbnails/6.jpg)
Lesson #1:
Whether statisticians like it or not,
in the social sciences,
association-based statistical models are used for testing causal theory.
Justification: a strong underlying theoretical model provides the causality.
![Page 7: To Explain Or To Predict?](https://reader036.fdocuments.in/reader036/viewer/2022062313/55b9b364bb61ebe1388b456f/html5/thumbnails/7.jpg)
Definition: Explanatory Model
A statistical model used for testing causal theory
(“proper” or not)
![Page 8: To Explain Or To Predict?](https://reader036.fdocuments.in/reader036/viewer/2022062313/55b9b364bb61ebe1388b456f/html5/thumbnails/8.jpg)
Definition: Predictive Model
An empirical model used for predicting new records/scenarios
![Page 9: To Explain Or To Predict?](https://reader036.fdocuments.in/reader036/viewer/2022062313/55b9b364bb61ebe1388b456f/html5/thumbnails/9.jpg)
![Page 10: To Explain Or To Predict?](https://reader036.fdocuments.in/reader036/viewer/2022062313/55b9b364bb61ebe1388b456f/html5/thumbnails/10.jpg)
Multi-page sections with theoretical justifications of each hypothesis
![Page 11: To Explain Or To Predict?](https://reader036.fdocuments.in/reader036/viewer/2022062313/55b9b364bb61ebe1388b456f/html5/thumbnails/11.jpg)
Concept operationalization
4 pages of such tables
Anger
Economic stability
Trust
Well-being
Poverty
![Page 12: To Explain Or To Predict?](https://reader036.fdocuments.in/reader036/viewer/2022062313/55b9b364bb61ebe1388b456f/html5/thumbnails/12.jpg)
Statistical model (here: path analysis)
![Page 13: To Explain Or To Predict?](https://reader036.fdocuments.in/reader036/viewer/2022062313/55b9b364bb61ebe1388b456f/html5/thumbnails/13.jpg)
“Statistical” conclusions
![Page 14: To Explain Or To Predict?](https://reader036.fdocuments.in/reader036/viewer/2022062313/55b9b364bb61ebe1388b456f/html5/thumbnails/14.jpg)
Research conclusions
![Page 15: To Explain Or To Predict?](https://reader036.fdocuments.in/reader036/viewer/2022062313/55b9b364bb61ebe1388b456f/html5/thumbnails/15.jpg)
Lesson #2
In the social sciences,
empirical analysis is mainly used for testing causal theory.
Empirical prediction is considered un-academic.
Some statisticians share this view:
“The two goals in analyzing data... I prefer to describe as ‘management’ and ‘science’. Management seeks profit... Science seeks truth.”
Parzen, Statistical Science 2001
![Page 16: To Explain Or To Predict?](https://reader036.fdocuments.in/reader036/viewer/2022062313/55b9b364bb61ebe1388b456f/html5/thumbnails/16.jpg)
Prediction in the Information Systems literature
![Page 18: To Explain Or To Predict?](https://reader036.fdocuments.in/reader036/viewer/2022062313/55b9b364bb61ebe1388b456f/html5/thumbnails/18.jpg)
“Examples of [predictive] theory in IS do not come readily to hand, suggesting that they are not common”
Gregor, MISQ 2006
1072 articles, of which 52 empirical with predictive claims
![Page 19: To Explain Or To Predict?](https://reader036.fdocuments.in/reader036/viewer/2022062313/55b9b364bb61ebe1388b456f/html5/thumbnails/19.jpg)
Breakdown of the 52 “predictive” articles
![Page 20: To Explain Or To Predict?](https://reader036.fdocuments.in/reader036/viewer/2022062313/55b9b364bb61ebe1388b456f/html5/thumbnails/20.jpg)
Why Predict?
Scientific use of empirical models
To Explain: test causal theory
To Predict: (utility), relevance, new theory, predictability
![Page 21: To Explain Or To Predict?](https://reader036.fdocuments.in/reader036/viewer/2022062313/55b9b364bb61ebe1388b456f/html5/thumbnails/21.jpg)
Why are statistical explanatory models different from predictive models?
![Page 22: To Explain Or To Predict?](https://reader036.fdocuments.in/reader036/viewer/2022062313/55b9b364bb61ebe1388b456f/html5/thumbnails/22.jpg)
Theory vs. its manifestation
?
![Page 23: To Explain Or To Predict?](https://reader036.fdocuments.in/reader036/viewer/2022062313/55b9b364bb61ebe1388b456f/html5/thumbnails/23.jpg)
“The goal of finding models that are predictively accurate differs from the goal of finding models that are true.”
![Page 24: To Explain Or To Predict?](https://reader036.fdocuments.in/reader036/viewer/2022062313/55b9b364bb61ebe1388b456f/html5/thumbnails/24.jpg)
Given the research environment in the social sciences, two critically important points are:
1. Explanatory power and predictive accuracy cannot be inferred from one another.
2. The “best” explanatory model is (nearly) never the “best” predictive model, and vice versa.
![Page 25: To Explain Or To Predict?](https://reader036.fdocuments.in/reader036/viewer/2022062313/55b9b364bb61ebe1388b456f/html5/thumbnails/25.jpg)
Point #1
Explanatory Power ≠ Predictive Power
Cannot infer one from the other
![Page 26: To Explain Or To Predict?](https://reader036.fdocuments.in/reader036/viewer/2022062313/55b9b364bb61ebe1388b456f/html5/thumbnails/26.jpg)
What is R²?
![Page 27: To Explain Or To Predict?](https://reader036.fdocuments.in/reader036/viewer/2022062313/55b9b364bb61ebe1388b456f/html5/thumbnails/27.jpg)
In-sample vs. out-of-sample evaluation
![Page 28: To Explain Or To Predict?](https://reader036.fdocuments.in/reader036/viewer/2022062313/55b9b364bb61ebe1388b456f/html5/thumbnails/28.jpg)
Performance Evaluation
Explanatory: goodness-of-fit, R², p-values, interpretation
Danger: type I, II errors
Predictive: out-of-sample prediction accuracy, costs, run time
Danger: over-fitting
![Page 29: To Explain Or To Predict?](https://reader036.fdocuments.in/reader036/viewer/2022062313/55b9b364bb61ebe1388b456f/html5/thumbnails/29.jpg)
Suggestion for social scientists:
Report predictive accuracy in addition to explanatory power
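One way to act on this suggestion is to report a holdout measure next to the usual in-sample fit. The sketch below is illustrative only: the simulated data, the choice of 15 mostly irrelevant predictors, and the helper names (`fit_ols`, `predict`, `r_squared`, `rmse`) are assumptions made for the example, not part of the talk.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit with an intercept column prepended."""
    X1 = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef

def predict(coef, X):
    return np.column_stack([np.ones(len(X)), X]) @ coef

def r_squared(y, y_hat):
    """1 - SS_res / SS_tot, computed on whatever sample is passed in."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return float(1.0 - ss_res / ss_tot)

def rmse(y, y_hat):
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

# Hypothetical data: only the first of 15 predictors matters.
rng = np.random.default_rng(42)
n_train, n_hold, p = 25, 1000, 15
X = rng.normal(size=(n_train + n_hold, p))
y = X[:, 0] + rng.normal(size=n_train + n_hold)

# Partition into training and holdout sets.
X_tr, y_tr = X[:n_train], y[:n_train]
X_ho, y_ho = X[n_train:], y[n_train:]

coef = fit_ols(X_tr, y_tr)
report = {
    "in_sample_r2": r_squared(y_tr, predict(coef, X_tr)),
    "in_sample_rmse": rmse(y_tr, predict(coef, X_tr)),
    "holdout_r2": r_squared(y_ho, predict(coef, X_ho)),
    "holdout_rmse": rmse(y_ho, predict(coef, X_ho)),
}
print(report)
```

With many predictors and few training records, the in-sample figures look far better than the holdout figures, which is why the two should be reported side by side.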
![Page 30: To Explain Or To Predict?](https://reader036.fdocuments.in/reader036/viewer/2022062313/55b9b364bb61ebe1388b456f/html5/thumbnails/30.jpg)
[Plot: Predictive Power (vertical axis) vs. Explanatory Power (horizontal axis)]
![Page 31: To Explain Or To Predict?](https://reader036.fdocuments.in/reader036/viewer/2022062313/55b9b364bb61ebe1388b456f/html5/thumbnails/31.jpg)
Point #2
Best explanatory model ≠ Best predictive model
![Page 32: To Explain Or To Predict?](https://reader036.fdocuments.in/reader036/viewer/2022062313/55b9b364bb61ebe1388b456f/html5/thumbnails/32.jpg)
Predict ≠ Explain
+ ?
“We should mention that not all data features were found to be useful. For example, we tried to benefit from an extensive set of attributes describing each of the movies in the dataset. Those attributes certainly carry a significant signal and can explain some of the user behavior. However, we concluded that they could not help at all for improving the accuracy of well tuned collaborative filtering models.”
Bell et al., 2008
![Page 33: To Explain Or To Predict?](https://reader036.fdocuments.in/reader036/viewer/2022062313/55b9b364bb61ebe1388b456f/html5/thumbnails/33.jpg)
Predict ≠ Explain
The FDA considers two products bioequivalent if the 90% CI of the relative mean of the generic to brand formulation is within 80%-125%
“We are planning to… develop predictive models for bioavailability and bioequivalence”
Lester M. Crawford, 2005
Acting Commissioner of Food & Drugs
![Page 34: To Explain Or To Predict?](https://reader036.fdocuments.in/reader036/viewer/2022062313/55b9b364bb61ebe1388b456f/html5/thumbnails/34.jpg)
Let’s dig in
![Page 35: To Explain Or To Predict?](https://reader036.fdocuments.in/reader036/viewer/2022062313/55b9b364bb61ebe1388b456f/html5/thumbnails/35.jpg)
Explanatory goal:
minimize model bias
Predictive goal:
minimize MSE (model bias + sampling variance)
![Page 36: To Explain Or To Predict?](https://reader036.fdocuments.in/reader036/viewer/2022062313/55b9b364bb61ebe1388b456f/html5/thumbnails/36.jpg)
What is Optimized?
Bias or Prediction MSE
Var(Y) = uncontrollable
bias² = model misspecification
estimation (sampling variance)
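The three labelled terms match the standard decomposition of the expected squared prediction error at a point x. As a sketch, assuming Y = f(x) + ε with E[ε] = 0 and Var(ε) = σ² (the notation f̂ for the fitted model is an assumption of this note, not taken from the slide):

```latex
\begin{aligned}
\mathrm{MSE}(x) &= E\big[(Y - \hat f(x))^2\big] \\
&= \underbrace{\mathrm{Var}(Y)}_{\sigma^2,\ \text{uncontrollable}}
 + \underbrace{\big(E[\hat f(x)] - f(x)\big)^2}_{\text{bias}^2,\ \text{model misspecification}}
 + \underbrace{\mathrm{Var}\big(\hat f(x)\big)}_{\text{estimation (sampling variance)}}
\end{aligned}
```

An explanatory analysis concentrates on the middle term alone, while a predictive analysis minimizes the sum of the last two.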
![Page 37: To Explain Or To Predict?](https://reader036.fdocuments.in/reader036/viewer/2022062313/55b9b364bb61ebe1388b456f/html5/thumbnails/37.jpg)
Linear Regression Example
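True model: $f(x) = \beta_1 x_1 + \beta_2 x_2$
Estimated model: $\hat f(x) = \hat\beta_1 x_1 + \hat\beta_2 x_2$
$MSE_1 = E\big(Y - \hat f(x)\big)^2 = \sigma^2 + 0 + Var(x_1\hat\beta_1 + x_2\hat\beta_2)$
Underspecified model: $f^*(x) = \gamma_1 x_1$
Estimated model: $\hat f^*(x) = \hat\gamma_1 x_1$
$MSE_2 = E\big(Y - \hat f^*(x)\big)^2 = \sigma^2 + (x_1 A \beta_2 - x_2 \beta_2)^2 + Var(x_1\hat\gamma_1)$, where $A = (x_1' x_1)^{-1} x_1' x_2$
MSE2 < MSE1 when:
σ² large
|β2| small
corr(x1,x2) high
limited range of x’s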
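The conditions above can be checked with a small Monte Carlo experiment. The following is a minimal sketch under assumed settings (bivariate normal predictors, no intercept, and the specific values of n, σ, β and the correlation are all hypothetical choices for illustration); `simulate_mse` is a name introduced here, not from the talk.

```python
import numpy as np

def simulate_mse(n_train=20, sigma=5.0, beta=(1.0, 0.05), rho=0.95,
                 n_test=200, n_reps=1000, seed=0):
    """Average out-of-sample squared error of the correctly specified
    model (x1 and x2) and the underspecified model (x1 only)."""
    rng = np.random.default_rng(seed)
    beta = np.asarray(beta, dtype=float)
    cov = np.array([[1.0, rho], [rho, 1.0]])
    err_full, err_under = [], []
    for _ in range(n_reps):
        # Fresh training sample and fresh new records in every replication.
        X = rng.multivariate_normal(np.zeros(2), cov, size=n_train)
        y = X @ beta + rng.normal(0.0, sigma, size=n_train)
        X_new = rng.multivariate_normal(np.zeros(2), cov, size=n_test)
        y_new = X_new @ beta + rng.normal(0.0, sigma, size=n_test)
        # Least-squares fits without intercept, as in the slide's models.
        b_full, *_ = np.linalg.lstsq(X, y, rcond=None)
        g_under, *_ = np.linalg.lstsq(X[:, :1], y, rcond=None)
        err_full.append(np.mean((y_new - X_new @ b_full) ** 2))
        err_under.append(np.mean((y_new - X_new[:, :1] @ g_under) ** 2))
    return float(np.mean(err_full)), float(np.mean(err_under))

# Noisy data, small |beta2|, highly correlated predictors.
mse1, mse2 = simulate_mse()
print("noisy case:  MSE1 =", round(mse1, 2), " MSE2 =", round(mse2, 2))

# Little noise, large |beta2|, weakly correlated predictors.
mse1_clean, mse2_clean = simulate_mse(sigma=0.5, beta=(1.0, 1.0), rho=0.3)
print("clean case:  MSE1 =", round(mse1_clean, 2), " MSE2 =", round(mse2_clean, 2))
```

In the first setting the "wrong" model should predict better because the variance it saves outweighs the squared bias it incurs; in the second setting the ordering should reverse.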
![Page 38: To Explain Or To Predict?](https://reader036.fdocuments.in/reader036/viewer/2022062313/55b9b364bb61ebe1388b456f/html5/thumbnails/38.jpg)
Two statistical modeling paths
China's Diverging Paths, photo by Clark Smith
![Page 39: To Explain Or To Predict?](https://reader036.fdocuments.in/reader036/viewer/2022062313/55b9b364bb61ebe1388b456f/html5/thumbnails/39.jpg)
Goal Definition
Design & Collection
Data Preparation
EDA
Variables?
Methods?
Evaluation, Validation & Model Selection
Model Use & Reporting
![Page 40: To Explain Or To Predict?](https://reader036.fdocuments.in/reader036/viewer/2022062313/55b9b364bb61ebe1388b456f/html5/thumbnails/40.jpg)
Study design & data collection
Hierarchical data
Observational or experiment?
Primary or secondary data?
Instrument (reliability + validity vs. measurement accuracy)
How much data?
How to sample?
![Page 41: To Explain Or To Predict?](https://reader036.fdocuments.in/reader036/viewer/2022062313/55b9b364bb61ebe1388b456f/html5/thumbnails/41.jpg)
Data preparation
reduced-feature models
missing
partitioning
![Page 42: To Explain Or To Predict?](https://reader036.fdocuments.in/reader036/viewer/2022062313/55b9b364bb61ebe1388b456f/html5/thumbnails/42.jpg)
outliers
PCA
SVD
trends
Interactive visualization
summary stats plots
![Page 43: To Explain Or To Predict?](https://reader036.fdocuments.in/reader036/viewer/2022062313/55b9b364bb61ebe1388b456f/html5/thumbnails/43.jpg)
Which variables?
Multicollinearity?
theory associations ex-post availability
A, B, A*B?
![Page 44: To Explain Or To Predict?](https://reader036.fdocuments.in/reader036/viewer/2022062313/55b9b364bb61ebe1388b456f/html5/thumbnails/44.jpg)
Methods / Models
Blackbox / interpretable
Mapping to theory
ensembles
boosting
PLS
PCR
ridge regression
variance vs. bias
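Ridge regression is a convenient example of a method that deliberately accepts some bias in exchange for lower variance. The sketch below uses the closed-form ridge estimate on simulated, nearly collinear predictors; the data, the penalty value of 5.0, and the function name `ridge_coefficients` are assumptions made for illustration.

```python
import numpy as np

def ridge_coefficients(X, y, lam):
    """Closed-form ridge estimate (X'X + lam*I)^{-1} X'y; lam=0 gives OLS."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

rng = np.random.default_rng(1)
n = 30
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)   # nearly collinear with x1
X = np.column_stack([x1, x2])         # design held fixed across replications

ols_draws, ridge_draws = [], []
for _ in range(500):
    # Same predictors, fresh noise: isolates the sampling variance.
    y = x1 + x2 + rng.normal(size=n)
    ols_draws.append(ridge_coefficients(X, y, lam=0.0))
    ridge_draws.append(ridge_coefficients(X, y, lam=5.0))

ols_sd = np.std(ols_draws, axis=0)       # spread of OLS coefficients
ridge_sd = np.std(ridge_draws, axis=0)   # spread of ridge coefficients
ridge_mean = np.mean(ridge_draws, axis=0)

print("OLS coefficient sd:   ", ols_sd)
print("ridge coefficient sd: ", ridge_sd)
print("ridge coefficient mean (true values are 1, 1):", ridge_mean)
```

The ridge coefficients are biased (their mean is pulled away from the true values) but far more stable, which is the trade a predictive modeler may accept and an explanatory modeler usually will not.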
![Page 45: To Explain Or To Predict?](https://reader036.fdocuments.in/reader036/viewer/2022062313/55b9b364bb61ebe1388b456f/html5/thumbnails/45.jpg)
Evaluation, Validation & Model Selection
Training data, Empirical model, Holdout data
Predictive power
Over-fitting analysis
Theoretical model
Empirical model
Data
Validation
Model fit ≠ Explanatory power
![Page 46: To Explain Or To Predict?](https://reader036.fdocuments.in/reader036/viewer/2022062313/55b9b364bb61ebe1388b456f/html5/thumbnails/46.jpg)
Inference
Model Use
Test causal theory
(utility) Predictions, Relevance, New theory, Predictability
Predictive performance
Over-fitting analysis
Null hypothesis
Naïve/baseline
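On the predictive side, the benchmark that plays the role of the null hypothesis is a naive predictor. A minimal sketch, assuming the naive rule is "always predict the training mean" (other baselines are equally possible) and using a hypothetical helper name `holdout_rmse_vs_naive`:

```python
import numpy as np

def holdout_rmse_vs_naive(y_train, y_holdout, y_pred):
    """Return (model RMSE, naive RMSE) on the holdout set, where the
    naive benchmark predicts the training mean for every record."""
    y_train = np.asarray(y_train, dtype=float)
    y_holdout = np.asarray(y_holdout, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    naive_pred = np.full_like(y_holdout, np.mean(y_train))
    model_rmse = float(np.sqrt(np.mean((y_holdout - y_pred) ** 2)))
    naive_rmse = float(np.sqrt(np.mean((y_holdout - naive_pred) ** 2)))
    return model_rmse, naive_rmse

# Tiny made-up example: training outcomes, holdout outcomes, model predictions.
model_rmse, naive_rmse = holdout_rmse_vs_naive([1, 2, 3], [2, 4], [2.5, 3.5])
print(model_rmse, naive_rmse)
```

A model whose holdout error is not clearly below the naive figure has little predictive power, whatever its in-sample fit.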
![Page 47: To Explain Or To Predict?](https://reader036.fdocuments.in/reader036/viewer/2022062313/55b9b364bb61ebe1388b456f/html5/thumbnails/47.jpg)
Goal Definition
Design & Collection
Data Preparation
EDA
Variables?
Methods?
Evaluation, Validation & Model Selection
Model Use & Reporting
![Page 48: To Explain Or To Predict?](https://reader036.fdocuments.in/reader036/viewer/2022062313/55b9b364bb61ebe1388b456f/html5/thumbnails/48.jpg)
How does all this impact
research
in the (social) sciences?
![Page 49: To Explain Or To Predict?](https://reader036.fdocuments.in/reader036/viewer/2022062313/55b9b364bb61ebe1388b456f/html5/thumbnails/49.jpg)
Three Current Problems
“While the value of scientific prediction… is beyond question… the inexact sciences [do not] have…the use of predictive expertise well in hand.”
Helmer & Rescher, 1959
Distinction blurred
Inappropriate modeling/assessment
Prediction underappreciated
![Page 50: To Explain Or To Predict?](https://reader036.fdocuments.in/reader036/viewer/2022062313/55b9b364bb61ebe1388b456f/html5/thumbnails/50.jpg)
Why?
What can be done?
Statisticians should acknowledge the difference and teach it!
![Page 51: To Explain Or To Predict?](https://reader036.fdocuments.in/reader036/viewer/2022062313/55b9b364bb61ebe1388b456f/html5/thumbnails/51.jpg)
It’s time for Change
To Predict
To Explain