Copyright © 2010 Lumina Decision Systems, Inc.
Statistical Hypothesis Testing
(8th Session in “Gentle Introduction to Modeling Uncertainty”)
Lonnie Chrisman, Ph.D.
Lumina Decision Systems
Analytica User Group
15 July 2010
Scope of Today’s Webinar
Included:
• Conceptual underpinnings of classical hypothesis testing.
• Interpretation of statistical significance (p-values).
• General methodology for applying it in any scenario.
Intended to promote conceptual understanding.
Building on Monte Carlo tools.
Not included:
• Standard canned hypothesis tests (t-tests, etc.)
Outline
• Motivating example
• Statistical significance
• The Statistic
• Methodology
• Modeling the Null hypothesis
• Computing the pValue
• Interpretation of results
• Drawbacks of methodology
• Additional exercise
Does Stock Market Volatility Vary with Day of Week?
• Randomly selected 100 trading days (from 2000-2010).
• Computed the day change, (close - open)/open, for the S&P 500 index.
Day of week # samples Volatility
Mon 20 19.4%
Tue 20 11.9%
Wed 20 21.5%
Thu 20 20.1%
Fri 20 14.3%
Total volatility: 18.1%
• Alice: “This shows that the market volatility does depend on the day of the week.”
• Bob: “No, the differences are just due to random sampling variation.”
Side note:
Annualized volatility := SDeviation * sqrt(T), where T = # trading days/yr = 250
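The side-note formula can be sketched in Python as follows. This is only an illustration: the function name and the example numbers are hypothetical, and it assumes SDeviation means the sample (n-1) standard deviation of the daily changes.

```python
import numpy as np

TRADING_DAYS_PER_YEAR = 250  # T from the side note


def annualized_volatility(day_changes, trading_days=TRADING_DAYS_PER_YEAR):
    """Sample standard deviation of daily changes, scaled by sqrt(T)."""
    day_changes = np.asarray(day_changes, dtype=float)
    # ddof=1 gives the sample (n-1) standard deviation; this is an assumption.
    return float(day_changes.std(ddof=1) * np.sqrt(trading_days))


# Hypothetical day changes, (close - open) / open, for illustration only.
example_changes = [0.004, -0.012, 0.009, -0.003, 0.015, -0.008]
print(f"Annualized volatility: {annualized_volatility(example_changes):.1%}")
```

Applying the same function to the 20 sampled changes for one weekday would give that weekday's entry in the table.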
Download Model with S&P Data
• Please download “Hypothesis Test S&P Volatility.ana”. The download link is at the bottom of the talk abstract on the Analytica Wiki.
• You’ll use this data for exercises…
Statistical Significance
• Alice: “This shows that the market volatility depends on the day of the week.”
• Alice’s mission: To show that this observed variation is unlikely if it is just due to random sampling variation.
• Null Hypothesis: The “true” underlying volatility is the same for every day of the week.
• Level of significance: The probability that at least this much variation in volatility would be observed if the Null Hypothesis is true. (termed the “p-value”)
Day of week # samples Observed Volatility
Mon 20 19.4%
Tue 20 11.9%
Wed 20 21.5%
Thu 20 20.1%
Fri 20 14.3%
Statistical Significance #2
• After her statistical analysis, Alice might say: “This shows at a significance level p=3% that market volatility varies with the day of the week.”
• By convention, p ≤ 5% is usually considered to be “statistically significant”. p>5% is said to be “not statistically significant”.
• What can you conclude if the p-value turns out to be 20%?
Day of week # samples Observed Volatility
Mon 20 19.4%
Tue 20 11.9%
Wed 20 21.5%
Thu 20 20.1%
Fri 20 14.3%
The “Statistic”
• We need a scalar metric to summarize the degree of conflict with the null hypothesis (H0).
Smaller value: more consistent with H0.
Larger value: greater disagreement with H0.
• Examples:
Max(vol,day) - Min(vol,day)
SDeviation(vol,day)
F = Variance(vol,day) / Total_volatility^2
• Exercise: Pick a statistic and compute its value for the S&P 500 dataset in your Analytica model.
Day # samples Observed Volatility (vol)
Mon 20 19.4%
Tue 20 11.9%
Wed 20 21.5%
Thu 20 20.1%
Fri 20 14.3%
Total volatility: 18.1%
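For readers working outside Analytica, the three example statistics might be computed from the table as in the sketch below. The Python function names are hypothetical, and it assumes SDeviation and Variance denote the sample (n-1) versions.

```python
import numpy as np

# Observed volatility by day (Mon..Fri) and total volatility, from the table.
vol = np.array([0.194, 0.119, 0.215, 0.201, 0.143])
total_volatility = 0.181


def range_stat(v):
    """Max(vol,day) - Min(vol,day)."""
    return float(np.max(v) - np.min(v))


def sd_stat(v):
    """SDeviation(vol,day), assumed to be the sample standard deviation."""
    return float(np.std(v, ddof=1))


def f_stat(v, total):
    """Variance(vol,day) / Total_volatility^2, using the sample variance."""
    return float(np.var(v, ddof=1) / total**2)


print("range:", range_stat(vol))
print("sd:   ", sd_stat(vol))
print("F:    ", f_stat(vol, total_volatility))
```

All three grow as the weekday volatilities spread apart, which is the property the slide asks for.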
Methodology
• Construct a model that simulates measurements given that the null-hypothesis is true.
Typically makes various assumptions.
• Use Monte Carlo simulation to produce several simulated data sets. Apply the statistic to each.
• pValue: Pr( Stat_sim ≥ Stat_meas )
[Diagram: Model of Null Hypothesis → Simulated Dataset → Statistic on simulated → pValue; Measured dataset → Statistic on measured → pValue]
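The methodology can be sketched as one generic Python function. The name `monte_carlo_p_value` and its parameters are hypothetical; the caller supplies the null-hypothesis model and the statistic, and the p-value is estimated as the fraction of simulated statistics at least as large as the measured one.

```python
import numpy as np


def monte_carlo_p_value(simulate_null, statistic, measured, n_sims=10_000, seed=0):
    """Estimate Pr(Stat_sim >= Stat_meas) by Monte Carlo simulation.

    simulate_null: function(rng) -> one simulated dataset under the null hypothesis
    statistic:     function(dataset) -> scalar, larger means more conflict with H0
    measured:      the measured dataset
    """
    rng = np.random.default_rng(seed)
    stat_meas = statistic(measured)
    stat_sim = np.array([statistic(simulate_null(rng)) for _ in range(n_sims)])
    return float(np.mean(stat_sim >= stat_meas))
```

Keeping the null model and the statistic as separate arguments mirrors the diagram: either one can be swapped without touching the p-value computation.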
Modeling the Null Hypothesis
• Null Hypothesis: The volatility is 18.1% on every day of the week.
• How could you simulate the data? (Hint: There are multiple possible approaches.)
What assumptions are you making?
• Some ideas:
Randomly generate each day’s price change from a LogNormal distribution.
Shuffle existing data.
• Exercise: Implement a model of the null-hypothesis in your Analytica model. (One random dataset for each item in Run)
Day # samples Observed Volatility (vol)
Mon 20 19.4%
Tue 20 11.9%
Wed 20 21.5%
Thu 20 20.1%
Fri 20 14.3%
Total volatility: 18.1%
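Both ideas can be sketched in Python. The function names are hypothetical and the assumptions are loud ones: the LogNormal version assumes zero drift, independent days, and that the standard deviation of the log price ratio approximates that of the day change; the shuffle version assumes only that day changes are exchangeable across weekdays.

```python
import numpy as np

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri"]
T = 250  # trading days per year


def simulate_lognormal(rng, n_per_day=20, annual_vol=0.181, trading_days=T):
    """Draw close/open from one LogNormal for every weekday (null hypothesis)."""
    sigma_daily = annual_vol / np.sqrt(trading_days)
    ratio = rng.lognormal(mean=0.0, sigma=sigma_daily, size=(len(DAYS), n_per_day))
    return ratio - 1.0  # day change, (close - open) / open


def simulate_shuffle(rng, observed_changes):
    """Randomly reassign the observed day changes to weekdays."""
    observed_changes = np.asarray(observed_changes, dtype=float)
    shuffled = rng.permutation(observed_changes.ravel())
    return shuffled.reshape(observed_changes.shape)


def volatility_by_day(changes, trading_days=T):
    """Annualized volatility for each weekday (one row per weekday)."""
    return np.std(changes, axis=1, ddof=1) * np.sqrt(trading_days)


rng = np.random.default_rng(0)
simulated = simulate_lognormal(rng)
print(dict(zip(DAYS, np.round(volatility_by_day(simulated), 3))))
```

The shuffle approach makes fewer distributional assumptions but needs the raw day changes, not just the summary table.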
Computing Statistic on Simulated
• Exercise: Apply your statistic to each simulated dataset.
Note: Larger statistic values occur when the variation in volatility by day is largest.
• Exercise: What fraction of simulated datasets have a statistic value at least as large as that of the actual data?
This is the p-value.
Is Alice’s hypothesis statistically significant?
Common Misuse of Paradigm: Multiple Hypotheses
• Scenario:
Alice identifies 20 other plausible hypotheses to test, e.g.:
Volatility on Tues is different from the other 4 days.
Volatility varies by month.
September has a higher volatility than other months.
…
She tests each of these individually and finds one of them to be statistically significant at a 5% level.
She publishes this result.
• What’s wrong here?
• What should she do differently?
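One way to see the problem is a short calculation, sketched here under the simplifying assumptions that all 20 null hypotheses are true and the tests are independent. The Bonferroni adjustment shown is one common remedy, not necessarily the one intended in the talk.

```python
n_tests = 20
alpha = 0.05

# Chance that at least one of the tests comes out "significant" purely by chance.
p_any_false_positive = 1 - (1 - alpha) ** n_tests

# Bonferroni correction: require each individual test to pass alpha / n_tests.
bonferroni_alpha = alpha / n_tests

print(f"P(at least one false positive) = {p_any_false_positive:.1%}")
print(f"Per-test threshold after Bonferroni = {bonferroni_alpha:.2%}")
```

Under these assumptions, finding one significant result among 20 tests is more likely than not even when none of the hypotheses is true.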
Interpreting p-Value
• Small value (p ≤ 5%)
Accept main hypothesis.
Data is inconsistent with the null hypothesis.
• Otherwise (p > 5%)
Conclude only that the data sample was too small to detect a relationship.
Hypothesis may still be true or false: “Larger research study required”
• P-value is not:
A measure of the strength of the relationship.
The probability that the hypothesis is true.
Drawbacks with Statistical Hypothesis Testing Paradigm
• 1 in 20 false hypotheses are accepted (at 5% significance level).
Often abused by people testing many hypotheses.
• Nearly any hypothesis is confirmed with a large enough sample.
Most hypotheses will have at least a minuscule “true” effect.
With enough data, even the most minuscule effect becomes statistically significant.
• The “uncertainty” about the hypothesis is not available. It doesn’t provide P(H), which would be useful in models that use the results.
• Numerous subjective components that are not recognized or reported explicitly.
• “Cookbook tests” are very often misapplied when assumptions don’t hold, leading to greater confidence than is warranted by the data.
New Exercise
Number of subjects (purely fictional data):
                  Parkinson’s   No Parkinson’s
Not exposed       10            140
Exposed to TCE    4             25
• Hypothesis: TCE exposure is associated with an increased risk of getting Parkinson’s disease.
• Null Hypothesis: Parkinson’s rates are the same among those exposed and not exposed to TCE.
• Exercise:
Identify an appropriate statistic.
Model the null hypothesis.
Compute the p-value.
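One possible approach to this exercise, among several, is sketched below in Python. The choice of statistic (difference in Parkinson’s rates, exposed minus not exposed) and the null model (shuffling the disease labels across all subjects) are assumptions made for illustration; other statistics and null models are equally defensible.

```python
import numpy as np

# Counts from the table (purely fictional data).
exposed_cases, exposed_total = 4, 4 + 25
unexposed_cases, unexposed_total = 10, 10 + 140

# One boolean per subject: exposure status and disease status.
exposed = np.concatenate([np.ones(exposed_total, bool),
                          np.zeros(unexposed_total, bool)])
has_pd = np.concatenate([
    np.ones(exposed_cases, bool), np.zeros(exposed_total - exposed_cases, bool),
    np.ones(unexposed_cases, bool), np.zeros(unexposed_total - unexposed_cases, bool),
])


def rate_difference(has_pd, exposed):
    """Statistic: Parkinson's rate among exposed minus rate among not exposed."""
    return has_pd[exposed].mean() - has_pd[~exposed].mean()


stat_meas = rate_difference(has_pd, exposed)

# Null model: disease status is unrelated to exposure, so shuffle the labels.
rng = np.random.default_rng(0)
n_sims = 10_000
stat_sim = np.array([rate_difference(rng.permutation(has_pd), exposed)
                     for _ in range(n_sims)])

p_value = float(np.mean(stat_sim >= stat_meas))
print(f"Statistic on measured data: {stat_meas:.4f}")
print(f"Estimated p-value: {p_value:.3f}")
```

The test is one-sided because the hypothesis is about an increased risk, so only simulated differences at least as large as the measured one count toward the p-value.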
Summary
• Statistical Hypothesis Testing tests:
Is the support for a hypothesis statistically significant given a dataset?
• Significance level (p-value) is:
Probability of seeing data at least as extreme as the actual data when the null hypothesis is true.
• p-value <= 5%: accept hypothesis.
p-value > 5%: conclude nothing, need more data.
• Methodology:
Identify statistic (scalar metric): a measure of divergence from the null hypothesis.
Build model of null hypothesis to “simulate” data sets.
Compute p-value.