Measuring Forecaster Performance
NATIONAL DEFENSE INTELLIGENCE COLLEGE
Measuring Forecaster Performance
Lt Col James E. Kajdasz, Ph.D., USAF
Scholarship of Intelligence Analysis
• “A comprehensive review of the literature indicates that while much has been written, largely there has not been a progression of thinking relative to the core aspect and competencies of doing intelligence analysis.” (Mangio & Wilkinson, 2008)
• “Do [they] teach structured methods because they are the best way to do analysis, or do they teach structured methods because that’s what they can teach?” (Marrin, 2009)
Grade forecasters on % correct?
• We could grade forecaster judgments for accuracy, similar to a T/F test (yes/no answers).
– Will Qadhafi still be in Libya at this time next year? No
– Will the government of Yemen fall in the next year? No
– Will I still be driving my 2001 Corolla in the year 2020? Yes
• Wait until outcomes occur/don’t occur, and calculate percent of correct forecasts.
• Compare Forecaster A to Forecaster B by seeing who has the higher % correct.
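The percent-correct grading described above can be sketched in a few lines. This is only an illustration: the function name `percent_correct`, the two forecasters' answers, and the outcomes are hypothetical placeholders, not data from the presentation.

```python
def percent_correct(forecasts, outcomes):
    """Return the percentage of yes/no forecasts that matched the outcomes."""
    if len(forecasts) != len(outcomes) or not forecasts:
        raise ValueError("forecasts and outcomes must be non-empty and equal in length")
    hits = sum(f == o for f, o in zip(forecasts, outcomes))
    return 100.0 * hits / len(forecasts)


# Hypothetical yes/no answers (True = yes) to the same three questions.
forecaster_a = [False, False, True]
forecaster_b = [True, False, True]
# Hypothetical outcomes, known only after the forecast period has passed.
outcomes = [False, True, True]

print(f"Forecaster A: {percent_correct(forecaster_a, outcomes):.0f}% correct")
print(f"Forecaster B: {percent_correct(forecaster_b, outcomes):.0f}% correct")
```

Note that this score ignores how confident each forecaster was in each answer.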
What about probabilistic judgments?
• When there is a high level of uncertainty, laypeople and even experts often qualify judgments.
– Will Qadhafi still be in Libya at this time next year? No (70% confidence)
– Will the government of Yemen fall in the next year? No (60% confidence)
– Will I still be driving my 2001 Corolla in the year 2020? Yes (95% confidence)
What about probabilistic judgments?
[Figure: probability scale from 0 to 1.0 in steps of .1, with verbal anchors running from low to high: Impossible; Highly unlikely; Somewhat unlikely; As likely as other two possibilities combined; Somewhat likely; Highly likely; Certainty]
Tetlock, 2005
Let’s Compare analysts…
                     Probability assigned
Event  Occurred?  Analyst 1  Analyst 2  Analyst 3
1      No (0)     0          0          0.1
2      Yes (1)    0.9        0.7        0.7
3      No (0)     0.1        0.3        0
4      Yes (1)    0.7        0.5        0.5
5      Yes (1)    0.9        1          1
• So which analyst performed best?
• It’s hard to say… We need a summary statistic to summarize total performance.
Mean Probability Score
• Probability Score (PS) or Brier Score
– Estimate:
  • Probability provided by forecaster
  • .00 – 1.00
– Outcome:
  • 0 (if event did not occur)
  • 1 (if event did occur)
$PS = (\text{Estimate} - \text{Outcome})^2$
Mean Probability Score
• Probability Score or Brier Score
– Forecaster says 70% probability X will occur.
– X occurs.
$PS = (\text{Estimate} - \text{Outcome})^2$
$PS = (.70 - 1)^2 = (-.3)^2 = .09$
Mean Probability Score
• Mean Probability Score ($\overline{PS}$) or Mean Brier Score
$PS = (.70 - 1)^2 = (-.3)^2 = .09$
$PS = (.50 - 0)^2 = (.5)^2 = .25$
$PS = (.10 - 0)^2 = (.10)^2 = .01$
$\overline{PS} = .12$
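The averaging step can be written out in general form. The block below is a sketch using the symbols $f_i$ (estimate) and $d_i$ (outcome) that appear later in the deck; the index $i$ and the count $N$ are notation added here for illustration.

```latex
\overline{PS} = \frac{1}{N}\sum_{i=1}^{N} (f_i - d_i)^2
              = \frac{.09 + .25 + .01}{3}
              = \frac{.35}{3}
              \approx .12
```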
Let’s Compare analysts…
                     Probability assigned
Event  Occurred?  Analyst 1  Analyst 2  Analyst 3
1      No (0)     0          0          0.1
2      Yes (1)    0.9        0.7        0.7
3      No (0)     0.1        0.3        0
4      Yes (1)    0.7        0.5        0.5
5      Yes (1)    0.9        1          1
Mean PS           0.02       0.09       0.07
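The mean scores in the table can be reproduced with a short script. This is a minimal sketch: the function name `mean_probability_score` is an illustrative choice, while the estimates and outcomes are taken from the table.

```python
def mean_probability_score(estimates, outcomes):
    """Mean Probability (Brier) Score: average squared gap between estimate and outcome."""
    if len(estimates) != len(outcomes) or not estimates:
        raise ValueError("estimates and outcomes must be non-empty and equal in length")
    return sum((f - d) ** 2 for f, d in zip(estimates, outcomes)) / len(estimates)


# Outcomes for events 1-5 (1 = occurred, 0 = did not occur), from the table.
outcomes = [0, 1, 0, 1, 1]

# Probabilities each analyst assigned to events 1-5, from the table.
analysts = {
    "Analyst 1": [0.0, 0.9, 0.1, 0.7, 0.9],
    "Analyst 2": [0.0, 0.7, 0.3, 0.5, 1.0],
    "Analyst 3": [0.1, 0.7, 0.0, 0.5, 1.0],
}

for name, estimates in analysts.items():
    print(f"{name}: mean PS = {mean_probability_score(estimates, outcomes):.2f}")
# Analyst 1: mean PS = 0.02
# Analyst 2: mean PS = 0.09
# Analyst 3: mean PS = 0.07
```

Lower is better, so Analyst 1 has the best overall score on these five events.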
Components of Total Forecaster Error
• Several things contribute to overall error, not all of which can be controlled by the forecaster.
Total Forecasting Error ($\overline{PS}$)
– Discrimination Errors
– Calibration Errors
– Variance of the Outcome
Decomposing Mean Probability Score
$\overline{PS} = \mathrm{Var}(d) + \mathrm{Bias}^2 + [\mathrm{Var}(d)\,\mathrm{Slope}](\mathrm{Slope} - 2) + \mathrm{Scatter}$
Components: Bias, Slope, Scatter, Var(d)
Decomposing PS: Bias
$\mathrm{Bias} = \bar{f} - \bar{d}$
Where:
$\bar{f}$ = Mean estimate
$\bar{d}$ = Mean outcome
[Figure: Estimated Probability of Survival (f) plotted against Outcome Index (d)]
Arkes, Dawson, Speroff, et al. (1995)
Decomposing PS: Slope
$\mathrm{Slope} = \bar{f}_1 - \bar{f}_0$
Where:
$\bar{f}_1$ = Mean estimate when outcome was 1
$\bar{f}_0$ = Mean estimate when outcome was 0
[Figure: Estimated Probability of Survival (f) plotted against Outcome Index (d)]
Arkes, Dawson, Speroff, et al. (1995)
Decomposing PS: Scatter
Scatter combines the variance of the estimates within each outcome group, where:
$\mathrm{Var}(f_1)$ = Variance of estimates when outcome was 1
$\mathrm{Var}(f_0)$ = Variance of estimates when outcome was 0
[Figure: Estimated Probability of Survival (f) plotted against Outcome Index (d)]
Arkes, Dawson, Speroff, et al. (1995)
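The decomposition can be checked numerically. The sketch below rests on assumptions the slides do not state: Scatter is taken to be the average of Var(f1) and Var(f0) weighted by how many outcomes fell in each group, and all variances are population variances (divide by N). The function name `decompose_mean_ps` is hypothetical. Under those assumptions the components recombine exactly into the mean PS.

```python
from statistics import fmean, pvariance


def decompose_mean_ps(estimates, outcomes):
    """Split mean PS into Bias, Slope, Scatter and Var(d), then recombine them."""
    f1 = [f for f, d in zip(estimates, outcomes) if d == 1]  # estimates when event occurred
    f0 = [f for f, d in zip(estimates, outcomes) if d == 0]  # estimates when it did not
    if not f1 or not f0:
        raise ValueError("need at least one occurrence and one non-occurrence")
    n = len(estimates)
    bias = fmean(estimates) - fmean(outcomes)        # mean estimate minus mean outcome
    slope = fmean(f1) - fmean(f0)                    # discrimination between outcomes
    var_d = pvariance([float(d) for d in outcomes])  # variance of the outcome
    # Assumption: scatter is the group-size-weighted mean of the two variances.
    scatter = (len(f1) * pvariance(f1) + len(f0) * pvariance(f0)) / n
    mean_ps = var_d + bias ** 2 + var_d * slope * (slope - 2) + scatter
    return {"bias": bias, "slope": slope, "scatter": scatter,
            "var_d": var_d, "mean_ps": mean_ps}


# Analyst 2 from the comparison table earlier in the deck.
estimates = [0.0, 0.7, 0.3, 0.5, 1.0]
outcomes = [0, 1, 0, 1, 1]

parts = decompose_mean_ps(estimates, outcomes)
direct = sum((f - d) ** 2 for f, d in zip(estimates, outcomes)) / len(estimates)
for key, value in parts.items():
    print(f"{key}: {value:.3f}")
print(f"directly computed mean PS: {direct:.3f}")
```

Comparing `parts["mean_ps"]` with the directly computed value is a quick sanity check that the components have been defined consistently.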
Patients: PS=.23 Bias=0.13 Slope=.13 Scat.=.05
Doctors: PS=.18 Bias=-0.11 Slope=.26 Scat.=.05
[Figure: two panels (Patients, Doctors), each plotting Estimated Probability of Survival (f) against Outcome Index (d)]
Arkes, Dawson, Speroff, et al. (1995)
Prediction Markets
A-priori Hypotheses:
• H1: Discrimination will improve as the event nears
– Slope measure will increase over time.
• H2: Scatter will decrease as the event nears
– Scatter measure will get smaller over time.
• H3: Analysts will be biased toward predicting the status quo
– Bias measure will be negative
T-70 Days
T-60 Days
T-50 Days
T-40 Days
T-30 Days
T-20 Days
T-10 Days
Total Error over Time
• PS is a measure of overall error
• Low PS is better
• Graph suggests curvilinear relationship with time
Components of Error
• PS composed of Bias, Slope, Scatter, and Variance of the outcome
• Graph suggests decrease in error is primarily due to improvement in slope
• Slope is a measure of discrimination
• High slope is better
Modeling Slope Over Time
• The observed slope was modeled.
• Curvilinear relationship modeled with Days and Days^2
• Adj R^2 = .834, p = .01
• H1 supported. Discrimination improves as date approaches.
[Figure: Slope (y-axis, -.2 to .6) plotted over time]
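A curvilinear model of this kind can be sketched as an ordinary least-squares fit on Days and Days^2 with an adjusted R^2. The presentation does not say what software or exact procedure was used, so the helper names below (`solve_linear_system`, `fit_quadratic`) are hypothetical, and the demo values are synthetic numbers generated from a known curve, not the study's observed slopes.

```python
def solve_linear_system(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_quadratic(days, values):
    """Fit values = b0 + b1*days + b2*days^2; return (coefficients, adjusted R^2)."""
    rows = [[1.0, float(t), float(t) ** 2] for t in days]
    # Normal equations: (X'X) b = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, values)) for i in range(3)]
    coeffs = solve_linear_system(xtx, xty)
    fitted = [sum(c * x for c, x in zip(coeffs, r)) for r in rows]
    mean_y = sum(values) / len(values)
    ss_res = sum((y - f) ** 2 for y, f in zip(values, fitted))
    ss_tot = sum((y - mean_y) ** 2 for y in values)
    r2 = 1.0 - ss_res / ss_tot
    n, p = len(values), 2  # p = number of predictors (Days, Days^2)
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
    return coeffs, adj_r2


# Seven time points, T-70 through T-10 days before the event.
days = [70, 60, 50, 40, 30, 20, 10]
# Synthetic values from a known quadratic, used only to exercise the code.
synthetic_slopes = [0.5 - 0.0001 * t * t for t in days]

coeffs, adj_r2 = fit_quadratic(days, synthetic_slopes)
print("coefficients (b0, b1, b2):", [round(c, 6) for c in coeffs])
print("adjusted R^2:", round(adj_r2, 4))
```

With real data, the adjusted R^2 penalizes the extra Days^2 term, which matters when only seven time points are available.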
Scatter Over Time
• Scatter is a measure of ‘spread’ of probability estimates.
• Slight linear trend not significant.
• H2 not supported.
Bias Over Time
• Questions recoded such that probability ‘0’ represents a continuation of the status quo, and probability ‘1’ represents a change in the status quo
• Analysts were biased toward predicting a change in the status quo
– Indicated by positive bias numbers
– t(6)=4.73, p < .01
• H3 not supported.
• BUT significant results in the direction opposite that hypothesized.
• Linear trend over time not statistically significant.
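A statistic of the form t(6) is consistent with a one-sample t-test of seven bias values (one per time point) against zero, though the presentation does not spell out the procedure. The sketch below makes that assumption; the function name `one_sample_t` and the demo numbers are hypothetical and are not the study's data.

```python
import math
import statistics


def one_sample_t(values, mu0=0.0):
    """Return (t statistic, degrees of freedom) for a test of mean(values) against mu0."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)        # sample standard deviation (n - 1 denominator)
    t = (mean - mu0) / (sd / math.sqrt(n))
    return t, n - 1


# Hypothetical placeholder values, used only to show the calculation.
demo_values = [1.0, 2.0, 3.0, 4.0, 5.0]
t_stat, df = one_sample_t(demo_values)
print(f"t({df}) = {t_stat:.2f}")
```

A positive t statistic indicates a mean above zero, which under the recoding described above corresponds to a lean toward predicting change. The p-value would then come from a t distribution with the reported degrees of freedom.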
Lt Col James E. Kajdasz, Ph.D., [email protected]
The views expressed in this presentation are those of the author and do not reflect the official policy or position of the Department of Defense or the U.S. Government.