Statistical Hypothesis Testing (8th Session in “Gentle Introduction to Modeling Uncertainty”)

Copyright © 2010 Lumina Decision Systems, Inc.


Lonnie Chrisman, Ph.D., Lumina Decision Systems

Analytica User Group, 15 July 2010


Scope of Today’s Webinar

Included:
• Conceptual underpinnings of classical hypothesis testing.
• Interpretation of statistical significance (p-values).
• General methodology for applying it in any scenario.

Intended to promote conceptual understanding. Building on Monte Carlo tools.

Not included:
• Standard canned hypothesis tests (like t-tests, etc.)


Outline

• Motivating example
• Statistical significance
• The Statistic
• Methodology
• Modeling the Null hypothesis
• Computing the pValue
• Interpretation of results
• Drawbacks of methodology
• Additional exercise


Does Stock Market Volatility Vary with Day of Week?

• Randomly selected 100 trading days (from 2000-2010).
• Computed the day change (close-open)/open for the S&P 500 index.

Day of week   # samples   Volatility

Mon 20 19.4%

Tue 20 11.9%

Wed 20 21.5%

Thu 20 20.1%

Fri 20 14.3%

• Alice: “This shows that the market volatility does depend on the day of the week.”

• Bob: “No, the variation is just due to random sampling variation.”

Side note:

Annualized volatility := SDeviation(day change) * sqrt(T), where T = # trading days/yr = 250

Total volatility: 18.1%
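To make the side note concrete, here is a minimal Python sketch of how the table's volatilities could be computed. It assumes each observation is a hypothetical (weekday, open, close) tuple and that SDeviation means the sample (n - 1) standard deviation; the Analytica model's actual data layout may differ, and the prices in the example are made up.

```python
import math
import statistics
from collections import defaultdict

TRADING_DAYS_PER_YEAR = 250  # T on the slide


def day_change(open_price: float, close_price: float) -> float:
    """Fractional change over one trading day: (close - open) / open."""
    return (close_price - open_price) / open_price


def annualized_volatility(changes: list[float], t: int = TRADING_DAYS_PER_YEAR) -> float:
    """Sample (n - 1) standard deviation of day changes, scaled by sqrt(T)."""
    return statistics.stdev(changes) * math.sqrt(t)


def volatility_by_day(records: list[tuple[str, float, float]]) -> dict[str, float]:
    """Group hypothetical (weekday, open, close) records and annualize each weekday."""
    changes: dict[str, list[float]] = defaultdict(list)
    for weekday, open_price, close_price in records:
        changes[weekday].append(day_change(open_price, close_price))
    return {day: annualized_volatility(c) for day, c in changes.items()}


# Made-up prices, for illustration only (not the S&P data from the model).
example = [("Mon", 100.0, 101.0), ("Mon", 100.0, 99.0), ("Tue", 50.0, 50.5), ("Tue", 50.0, 49.8)]
print(volatility_by_day(example))
```

Applied to all 100 sampled days at once, annualized_volatility corresponds to the "Total volatility" figure; applied per weekday, it gives the rows of the table.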


Download Model with S&P Data

• Please download: “Hypothesis Test S&P Volatility.ana”

The download link is at the bottom of the talk abstract on the Analytica Wiki.

• You’ll use this data for exercises…


Statistical Significance

• Alice: “This shows that the market volatility depends on the day of the week.”

• Alice’s mission: To show that this observed variation is unlikely if it is just due to random sampling variation.

• Null Hypothesis: The “true” underlying volatility is the same for every day of the week.

• Level of significance: The probability that at least this much variation in volatility would be observed if the Null Hypothesis is true (termed the “p-value”).



Statistical Significance #2

• After her statistical analysis, Alice might say: “This shows at a significance level p=3% that market volatility varies with the day of the week.”

• By convention, p ≤ 5% is usually considered “statistically significant”; p > 5% is said to be “not statistically significant”.

• What can you conclude if the p-value turns out to be 20%?



The “Statistic”

• We need a scalar metric to summarize the degree of conflict with the null hypothesis (H0).
  Smaller value: more consistent with H0.
  Larger value: greater disagreement with H0.

• Examples:
  Max(vol, day) - Min(vol, day)
  SDeviation(vol, day)
  F = Variance(vol, day) / Total_volatility^2

• Exercise: Pick a statistic and compute its value for the S&P 500 dataset in your Analytica model.

Day   # samples   Observed Volatility (vol)

Mon 20 19.4%

Tue 20 11.9%

Wed 20 21.5%

Thu 20 20.1%

Fri 20 14.3%
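A minimal Python sketch of the three example statistics, applied to the observed volatilities in the table above. The function names are hypothetical, and it assumes Analytica's Variance and SDeviation use the sample (n - 1) form. With the population form, the SDeviation and F values shift by a constant factor, which would not change a p-value, because the same statistic is applied to both the measured and the simulated data.

```python
import statistics

# Observed annualized volatility by weekday, copied from the table above.
observed_vol = {"Mon": 0.194, "Tue": 0.119, "Wed": 0.215, "Thu": 0.201, "Fri": 0.143}
TOTAL_VOLATILITY = 0.181  # "Total volatility" over all 100 sampled days


def stat_range(vol: dict[str, float]) -> float:
    """Max(vol, day) - Min(vol, day)."""
    return max(vol.values()) - min(vol.values())


def stat_sdev(vol: dict[str, float]) -> float:
    """SDeviation(vol, day), taken here as the sample (n - 1) standard deviation."""
    return statistics.stdev(list(vol.values()))


def stat_f(vol: dict[str, float], total_volatility: float = TOTAL_VOLATILITY) -> float:
    """F = Variance(vol, day) / Total_volatility^2, with sample (n - 1) variance."""
    return statistics.variance(list(vol.values())) / total_volatility ** 2


for name, fn in [("range", stat_range), ("sdev", stat_sdev), ("F", stat_f)]:
    print(f"{name}: {fn(observed_vol):.4f}")
```

Each of these grows as the weekday volatilities spread apart, which is the property the slide asks of a statistic.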



Methodology

• Construct a model that simulates measurements given that the null-hypothesis is true.

This model typically makes various assumptions.

• Use Monte Carlo simulation to produce several simulated data sets. Apply the statistic to each.

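• pValue: $\Pr(\mathrm{Stat}_{\mathrm{sim}} \ge \mathrm{Stat}_{\mathrm{meas}})$, the probability that the statistic on a simulated dataset is at least as large as the statistic on the measured dataset.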

[Diagram: Model of Null Hypothesis → Simulated dataset → Statistic on simulated; Measured dataset → Statistic on measured; both statistics → pValue]
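The methodology above can be sketched generically in Python. This is an illustration, not Analytica code: `monte_carlo_p_value`, `simulate_null`, and `statistic` are hypothetical names, with `simulate_null` standing in for the model of the null hypothesis and each loop iteration playing the role of one sample along Analytica's Run index. Ties count as hits, matching the ≥ in the pValue definition.

```python
import random
from typing import Callable, Optional, TypeVar

Dataset = TypeVar("Dataset")


def monte_carlo_p_value(
    measured: Dataset,
    simulate_null: Callable[[random.Random], Dataset],
    statistic: Callable[[Dataset], float],
    runs: int = 10_000,
    seed: Optional[int] = None,
) -> float:
    """Estimate Pr(Stat_sim >= Stat_meas) by simulating datasets under H0."""
    rng = random.Random(seed)
    stat_meas = statistic(measured)
    hits = 0
    for _ in range(runs):  # one simulated dataset per run
        stat_sim = statistic(simulate_null(rng))
        if stat_sim >= stat_meas:
            hits += 1
    return hits / runs


# Toy usage: the "null model" draws one Uniform(0, 1) value and the statistic is
# the value itself, so a measured value of 0.97 should give a p-value near 3%.
print(monte_carlo_p_value(0.97, lambda rng: rng.random(), lambda x: x, seed=42))
```

Keeping the null model and the statistic as separate arguments mirrors the diagram: either piece can be swapped without touching the p-value calculation.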


Modeling the Null Hypothesis

• Null Hypothesis: The volatility is 18.1% on every day of the week.

• How could you simulate the data? (Hint: There are multiple possible approaches.)
  What assumptions are you making?

• Some ideas:
  Randomly generate each day’s price change from a LogNormal distribution.
  Shuffle existing data.

• Exercise: Implement a model of the null-hypothesis in your Analytica model. (One random dataset for each item in Run)

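Both ideas listed on this slide can be sketched in Python as interchangeable null models; the names are hypothetical. Shuffling assumes the 100 observed day changes are exchangeable across weekdays under H0 (it ignores any dependence between consecutive days). The LogNormal version assumes zero drift, independent days, and a common 18.1% annual volatility, treating that volatility as the standard deviation of log(close/open); for daily-sized moves this is very close to the standard deviation of the change itself.

```python
import math
import random

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri"]


def simulate_by_shuffling(changes_by_day: dict[str, list[float]], rng: random.Random) -> dict[str, list[float]]:
    """Null model A: pool the observed day changes, shuffle, and re-deal them to
    weekdays with the original sample counts, so weekday labels carry no information."""
    pooled = [c for day in DAYS for c in changes_by_day[day]]
    rng.shuffle(pooled)
    simulated, start = {}, 0
    for day in DAYS:
        n = len(changes_by_day[day])
        simulated[day] = pooled[start:start + n]
        start += n
    return simulated


def simulate_lognormal(n_per_day: int, annual_vol: float, rng: random.Random, t: int = 250) -> dict[str, list[float]]:
    """Null model B: every weekday's close/open ratio is LogNormal with the same
    volatility (assumes zero drift and independent days)."""
    sigma_daily = annual_vol / math.sqrt(t)
    return {
        day: [math.exp(rng.gauss(0.0, sigma_daily)) - 1.0 for _ in range(n_per_day)]
        for day in DAYS
    }


rng = random.Random(0)
null_dataset = simulate_lognormal(20, 0.181, rng)  # 20 samples per weekday, 18.1% volatility
```

Shuffling reuses the real data and so inherits its fat tails; the LogNormal model is smoother but leans harder on its distributional assumption.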



Computing Statistic on Simulated

• Exercise: Apply your statistic to each simulated dataset.

Note: Larger statistic values occur when the variation in volatility by day is largest.

• Exercise: What fraction of simulated datasets have a larger statistic value than the actual data?

This is the p-value. Is Alice’s hypothesis statistically significant?


Common Misuse of Paradigm: Multiple Hypotheses

• Scenario:
  Alice identifies 20 other plausible hypotheses to test, e.g.:
    Volatility on Tues is different from the other 4 days.
    Volatility varies by month.
    September has a higher volatility than other months.
    …
  She tests each of these individually and finds one of them to be statistically significant at a 5% level.
  She publishes this result.

• What’s wrong here?
• What should she do differently?
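What goes wrong can be quantified. If all 20 null hypotheses are true and the tests are independent (an idealization, since Alice's hypotheses overlap), the chance that at least one reaches p ≤ 5% is 1 - 0.95^20, roughly 64%. The Python sketch below computes this exactly and by simulation, using the fact that a true null gives a Uniform(0, 1) p-value for a continuous statistic. The last line shows one common remedy that the slides do not name, the Bonferroni correction (testing each hypothesis at 5%/20), which brings the family-wise rate back under 5%. Function names are hypothetical.

```python
import random


def family_wise_error(alpha: float, m: int) -> float:
    """P(at least one of m independent tests is 'significant' when every null is true)."""
    return 1.0 - (1.0 - alpha) ** m


def simulate_family_wise_error(alpha: float, m: int, trials: int = 20_000, seed: int = 0) -> float:
    """Monte Carlo check: under a true null, a continuous test's p-value is Uniform(0, 1)."""
    rng = random.Random(seed)
    hits = sum(any(rng.random() <= alpha for _ in range(m)) for _ in range(trials))
    return hits / trials


print(family_wise_error(0.05, 20))           # exact, 20 tests at the 5% level
print(simulate_family_wise_error(0.05, 20))  # simulated estimate of the same quantity
print(family_wise_error(0.05 / 20, 20))      # Bonferroni-adjusted threshold
```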


Interpreting p-Value

• Small value (< 5%)
  Accept main hypothesis.
  Data is inconsistent with the null hypothesis.

• Otherwise (p > 5%)
  Conclude only that the data sample was too small to detect a relationship.
  Hypothesis may still be true or false: “Larger research study required.”

• P-value is not:
  A measure of the strength of relationship.
  The probability that the hypothesis is true.


Drawbacks with Statistical Hypothesis Testing Paradigm

• About 1 in 20 false hypotheses are accepted (at the 5% significance level), because when the null hypothesis is true, p ≤ 5% still occurs 5% of the time.
  Often abused by people testing many hypotheses.

• Nearly any hypothesis is confirmed with a large enough sample.
  Most hypotheses will have at least a minuscule “true” effect.
  With enough data, even the most minuscule effect becomes statistically significant.

• The “uncertainty” about the hypothesis is not available. It doesn’t provide P(H), which would be useful in models that use the results.

• Numerous subjective components are not recognized or reported explicitly.

• “Cookbook tests” are very often misapplied when their assumptions don’t hold, leading to greater confidence than is warranted by the data.
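The claim that a minuscule effect becomes significant with enough data can be checked with a quick calculation. This Python sketch uses a hypothetical coin whose true heads rate is 0.501 and, for speed, swaps the talk's Monte Carlo approach for the standard normal approximation; it also assumes the observed fraction lands exactly on 0.501. At n = 1,000,000 flips, z = 0.001 / (0.5 / 1000) = 2, which is just significant at the 5% level, while the same effect is nowhere near significant at 10,000 flips.

```python
import math


def approx_p_value(n: int, true_rate: float = 0.501, null_rate: float = 0.5) -> float:
    """Two-sided p-value (normal approximation) for n coin flips whose observed
    heads fraction lands exactly on true_rate, tested against H0: rate = null_rate."""
    standard_error = math.sqrt(null_rate * (1.0 - null_rate) / n)
    z = abs(true_rate - null_rate) / standard_error
    return math.erfc(z / math.sqrt(2.0))  # equals 2 * (1 - Phi(z))


for n in (10_000, 1_000_000, 100_000_000):
    print(f"n = {n:>11,}: p = {approx_p_value(n):.3g}")
```

The effect size never changes across the three rows; only the sample size does, which is why a small p-value says little about the strength of a relationship.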


New Exercise

Number of subjects (purely fictional data):

Exposure group   Parkinson’s   No Parkinson’s
Not exposed      10            140
Exposed to TCE   4             25

• Hypothesis: TCE exposure is associated with an increased risk of getting Parkinson’s disease.

• Null Hypothesis: Parkinson’s rates are the same among those exposed and not exposed to TCE.

• Exercise:
  Identify an appropriate statistic.
  Model the null hypothesis.
  Compute the p-value.
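One possible solution path, sketched in Python under assumptions that are choices rather than the talk's prescribed answer: the statistic is the Parkinson's rate among exposed subjects minus the rate among unexposed subjects, and the null model keeps the 14 cases and 29 exposed subjects fixed but assigns exposure at random (a permutation, or label-shuffling, model). Because the hypothesis is an increased risk, the p-value is one-sided. The function names are hypothetical.

```python
import random

# Counts from the (fictional) table above.
EXPOSED_CASES, EXPOSED_HEALTHY = 4, 25
UNEXPOSED_CASES, UNEXPOSED_HEALTHY = 10, 140


def rate_difference(exposed_cases: int, exposed_total: int, unexposed_cases: int, unexposed_total: int) -> float:
    """Statistic: Parkinson's rate among exposed minus rate among not exposed."""
    return exposed_cases / exposed_total - unexposed_cases / unexposed_total


def tce_p_value(runs: int = 20_000, seed: int = 0) -> tuple[float, float]:
    exposed_total = EXPOSED_CASES + EXPOSED_HEALTHY
    unexposed_total = UNEXPOSED_CASES + UNEXPOSED_HEALTHY
    total_cases = EXPOSED_CASES + UNEXPOSED_CASES
    n_subjects = exposed_total + unexposed_total
    observed = rate_difference(EXPOSED_CASES, exposed_total, UNEXPOSED_CASES, unexposed_total)

    rng = random.Random(seed)
    hits = 0
    for _ in range(runs):
        # Null model: exposure is unrelated to disease, so pick which subjects are
        # "exposed" at random. Subjects 0..total_cases-1 are the Parkinson's cases.
        exposed = rng.sample(range(n_subjects), exposed_total)
        sim_cases = sum(1 for s in exposed if s < total_cases)
        simulated = rate_difference(sim_cases, exposed_total, total_cases - sim_cases, unexposed_total)
        if simulated >= observed:  # one-sided: the hypothesis is *increased* risk
            hits += 1
    return observed, hits / runs


observed, p = tce_p_value()
print(f"observed rate difference = {observed:.4f}, p-value ~ {p:.3f}")
```

Other statistics (for example, a ratio of the two rates) would work too; what matters is that larger values mean stronger disagreement with the null hypothesis.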


Summary

• Statistical Hypothesis Testing tests: Is the support for a hypothesis statistically significant, given a dataset?

• Significance level (p-value) is: Probability of seeing data at least as extreme as the actual data when the Null hypothesis is true.

• p-value ≤ 5%: accept hypothesis.
  p-value > 5%: conclude nothing; need more data.

• Methodology:
  Identify statistic (scalar metric): A measure of divergence from null-hypothesis.
  Build model of null-hypothesis to “simulate” data sets.
  Compute p-value.
