Copyright © 2010 Lumina Decision Systems, Inc.

Statistical Hypothesis Testing

(8th Session in “Gentle Introduction to Modeling Uncertainty”)

Lonnie Chrisman, Ph.D.
Lumina Decision Systems

Analytica User Group, 15 July 2010

Scope of Today’s Webinar

Included:
• Conceptual underpinnings of classical hypothesis testing.
• Interpretation of statistical significance (p-values).
• General methodology for applying it in any scenario.

Intended to promote conceptual understanding. Building on Monte Carlo tools.

Not included:
• Standard canned hypothesis tests (like t-tests, etc.)

Outline

• Motivating example
• Statistical significance
• The Statistic
• Methodology
• Modeling the Null hypothesis
• Computing the pValue
• Interpretation of results
• Drawbacks of methodology
• Additional exercise

Does Stock Market Volatility Vary with Day of Week?

• Randomly selected 100 trading days (from 2000-2010).
• Computed day change (close-open)/open for S&P 500 index.

Day of week   # samples   Volatility
Mon           20          19.4%
Tue           20          11.9%
Wed           20          21.5%
Thu           20          20.1%
Fri           20          14.3%

Total volatility: 18.1%

• Alice: “This shows that the market volatility does depend on the day of the week.”

• Bob: “No, the variation is just due to random sampling variation.”

Side note:
Annualized volatility := SDeviation * sqrt(T), where T = # trading days/yr = 250
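
The annualized-volatility formula in the side note can be sketched in Python as follows. This is only an illustration: the price pairs are invented, the function name is hypothetical, and it assumes SDeviation means the sample (n-1) standard deviation.

```python
import math
import statistics

TRADING_DAYS_PER_YEAR = 250  # T from the side note


def annualized_volatility(daily_changes, trading_days=TRADING_DAYS_PER_YEAR):
    """Sample standard deviation of daily changes, scaled by sqrt(T)."""
    return statistics.stdev(daily_changes) * math.sqrt(trading_days)


# Hypothetical (open, close) pairs, invented purely for illustration.
prices = [(100.0, 101.0), (101.0, 100.0), (100.0, 100.5), (100.5, 99.0)]

# Day change as defined on the slide: (close - open) / open
daily_changes = [(close - open_) / open_ for open_, close in prices]

print(f"Annualized volatility: {annualized_volatility(daily_changes):.1%}")
```

Applying the same function separately to the changes observed on each weekday would give a per-day table like the one above.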

Download Model with S&P Data

• Please download: “Hypothesis Test S&P Volatility.ana”
  The download link is at the bottom of the talk abstract on the Analytica Wiki.

• You’ll use this data for exercises…

Statistical Significance

• Alice: “This shows that the market volatility depends on the day of the week.”

• Alice’s mission: To show that this observed variation is unlikely if it is just due to random sampling variation.

• Null Hypothesis: The “true” underlying volatility is the same for every day of the week.

• Level of significance: The probability that at least this much variation in volatility would be observed if the Null Hypothesis were true (termed the “p-value”).

Day of week   # samples   Observed Volatility
Mon           20          19.4%
Tue           20          11.9%
Wed           20          21.5%
Thu           20          20.1%
Fri           20          14.3%

Statistical Significance #2

• After her statistical analysis, Alice might say: “This shows at a significance level p=3% that market volatility varies with the day of the week.”

• By convention, p ≤ 5% is usually considered to be “statistically significant”. p>5% is said to be “not statistically significant”.

• What can you conclude if the p-value turns out to be 20%?

Day of week   # samples   Observed Volatility
Mon           20          19.4%
Tue           20          11.9%
Wed           20          21.5%
Thu           20          20.1%
Fri           20          14.3%

The “Statistic”

• We need a scalar metric to summarize degree of conflict with Null-hypothesis (H0).
  Smaller value: more consistent with H0.
  Larger value: greater disagreement with H0.

• Examples:
  Max(vol,day) – Min(vol,day)
  SDeviation(vol,day)
  F = Variance(vol,day) / Total_volatility^2

• Exercise: Pick a statistic and compute its value for the S&P 500 dataset in your Analytica model.

Day   # samples   Observed Volatility (vol)
Mon   20          19.4%
Tue   20          11.9%
Wed   20          21.5%
Thu   20          20.1%
Fri   20          14.3%

Total volatility: 18.1%
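
For readers working outside Analytica, the three example statistics can be sketched in Python using the table values. This assumes SDeviation and Variance refer to the sample (n-1) versions; the variable names are illustrative and not taken from the Analytica model.

```python
import statistics

# Observed annualized volatility by day, copied from the table.
vol = {"Mon": 0.194, "Tue": 0.119, "Wed": 0.215, "Thu": 0.201, "Fri": 0.143}
total_volatility = 0.181

values = list(vol.values())

# Max(vol,day) - Min(vol,day)
range_stat = max(values) - min(values)

# SDeviation(vol,day), assumed here to be the sample standard deviation
sd_stat = statistics.stdev(values)

# F = Variance(vol,day) / Total_volatility^2
f_stat = statistics.variance(values) / total_volatility ** 2

print(f"range = {range_stat:.3f}, sd = {sd_stat:.4f}, F = {f_stat:.4f}")
```

All three grow as the per-day volatilities spread apart, which is the property the slide asks for; which one to use is a judgment call.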

Methodology

• Construct a model that simulates measurements given that the null-hypothesis is true.

Typically makes various assumptions.

• Use Monte Carlo simulation to produce several simulated data sets. Apply the statistic to each.

• pValue: Pr( Stat_sim ≥ Stat_meas )

[Flow diagram with boxes: Model of Null Hypothesis, Simulated Dataset, Statistic on simulated, Measured dataset, Statistic on measured, pValue]
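
The methodology can be sketched as one generic Python function. This is a minimal illustration, not the Analytica implementation: the function name, its parameters, and the coin-flip null model used to exercise it are all hypothetical.

```python
import random


def monte_carlo_p_value(simulate_dataset, statistic, measured_dataset,
                        n_runs=10_000, seed=0):
    """Estimate Pr(Stat_sim >= Stat_meas) under the null-hypothesis model.

    simulate_dataset(rng) must return one dataset drawn from the null model;
    statistic(dataset) must return a scalar that grows with disagreement.
    """
    rng = random.Random(seed)
    stat_meas = statistic(measured_dataset)
    hits = 0
    for _ in range(n_runs):
        stat_sim = statistic(simulate_dataset(rng))
        if stat_sim >= stat_meas:
            hits += 1
    return hits / n_runs


# Toy null model (hypothetical): 10 flips of a fair coin.
def simulate_fair_flips(rng):
    return [rng.random() < 0.5 for _ in range(10)]


# Toy measurement (hypothetical): 8 heads in 10 flips; statistic = number of heads.
measured_flips = [True] * 8 + [False] * 2
p = monte_carlo_p_value(simulate_fair_flips, sum, measured_flips)
print(f"estimated p-value: {p:.3f}")
```

Keeping the null model and the statistic as separate arguments mirrors the separate boxes in the diagram: either one can be swapped without touching the other.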

Modeling the Null Hypothesis

• Null Hypothesis: The volatility is 18.1% on every day of the week.

• How could you simulate the data? (Hint: There are multiple possible approaches)
  What assumptions are you making?

• Some ideas:
  Randomly generate each day’s price change from a LogNormal distribution.
  Shuffle existing data.

• Exercise: Implement a model of the null-hypothesis in your Analytica model. (One random dataset for each item in Run)

Day   # samples   Observed Volatility (vol)
Mon   20          19.4%
Tue   20          11.9%
Wed   20          21.5%
Thu   20          20.1%
Fri   20          14.3%

Total volatility: 18.1%
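
Both ideas for simulating data under the null hypothesis can be sketched in Python. The sketch rests on assumptions that are not in the slides: the close/open ratio is LogNormal with a log-mean of zero, days are independent, and the daily log standard deviation is 18.1% / sqrt(250). All function names are hypothetical.

```python
import math
import random
import statistics

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri"]
T = 250                 # trading days per year
TOTAL_VOL = 0.181       # annualized volatility assumed under the null hypothesis
SAMPLES_PER_DAY = 20


def simulate_lognormal_null(rng):
    """One simulated dataset: the same LogNormal model for every weekday."""
    sigma_daily = TOTAL_VOL / math.sqrt(T)
    # close/open ~ LogNormal(0, sigma_daily), so day change = close/open - 1
    return {day: [rng.lognormvariate(0.0, sigma_daily) - 1.0
                  for _ in range(SAMPLES_PER_DAY)]
            for day in DAYS}


def simulate_by_shuffling(changes_by_day, rng):
    """One simulated dataset: reassign the existing day changes to weekdays at random."""
    pooled = [c for day in DAYS for c in changes_by_day[day]]
    rng.shuffle(pooled)
    shuffled, start = {}, 0
    for day in DAYS:
        n = len(changes_by_day[day])
        shuffled[day] = pooled[start:start + n]
        start += n
    return shuffled


def volatility_by_day(changes_by_day):
    """Annualized volatility per weekday: SDeviation * sqrt(T)."""
    return {day: statistics.stdev(changes) * math.sqrt(T)
            for day, changes in changes_by_day.items()}


rng = random.Random(0)
simulated = simulate_lognormal_null(rng)
print({day: f"{v:.1%}" for day, v in volatility_by_day(simulated).items()})
```

The LogNormal version assumes a distribution shape; the shuffle version assumes only that the day labels are exchangeable under the null hypothesis, but it needs the measured day changes as input.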

Computing Statistic on Simulated

• Exercise: Apply your statistic to each simulated dataset.

Note: Larger statistic values occur when the variation in volatility by day is largest.

• Exercise: What fraction of simulated datasets have a statistic value at least as large as that of the actual data?

  This is the p-value.
  Is Alice’s hypothesis statistically significant?

Common Misuse of Paradigm: Multiple Hypotheses

• Scenario:
  Alice identifies 20 other plausible hypotheses to test, e.g.:
    Volatility on Tues is different than the other 4 days.
    Volatility varies by month.
    September has a higher volatility than other months.
    …
  She tests each of these individually and finds one of them to be statistically significant at a 5% level.
  She publishes this result.

• What’s wrong here?
• What should she do differently?
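
A short calculation shows the scale of the problem. It assumes the 20 tests are independent and that every null hypothesis is actually true; neither assumption comes from the slides. The Bonferroni threshold at the end is one common remedy, not necessarily the one the speaker had in mind.

```python
alpha = 0.05    # significance level used for each individual test
n_tests = 20    # number of hypotheses Alice tests

# Chance that at least one test looks "significant" purely by chance.
p_at_least_one = 1 - (1 - alpha) ** n_tests

# Expected number of chance "discoveries" among the 20 tests.
expected_false_positives = alpha * n_tests

# Bonferroni correction: a stricter per-test threshold that keeps the
# overall chance of any false positive at or below alpha.
bonferroni_alpha = alpha / n_tests

print(f"P(at least one false positive) = {p_at_least_one:.3f}")
print(f"Expected false positives       = {expected_false_positives:.1f}")
print(f"Bonferroni per-test threshold  = {bonferroni_alpha:.4f}")
```

Under these assumptions, finding one significant result out of 20 is about what chance alone would produce.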

Interpreting p-Value

• Small value (< 5%)
  Accept main hypothesis.
  Data is inconsistent with Null-hypothesis.

• Otherwise (p > 5%)
  Conclude only that data sample was too small to detect relationship.
  Hypothesis may still be true or false: “Larger research study required”

• P-value is not:
  A measure of the strength of relationship.
  The probability that the hypothesis is true.

Drawbacks with Statistical Hypothesis Testing Paradigm

• 1 in 20 false hypotheses are accepted (at 5% significance level).
  Often abused by people testing many hypotheses.

• Nearly any hypothesis is confirmed with a large enough sample.
  Most hypotheses will have at least a minuscule “true” effect.
  With enough data, even the most minuscule effect becomes statistically significant.

• The “uncertainty” about the hypothesis is not available.
  Doesn’t provide P(H), which would be useful in models that use the results.

• Numerous subjective components that are not recognized or reported explicitly.

• “Cookbook tests” are very often misapplied when assumptions don’t hold, leading to greater confidence than is warranted by the data.

New Exercise

Number of subjects (purely fictional data):

                 Parkinson’s   No Parkinson’s
Not exposed      10            140
Exposed to TCE   4             25

• Hypothesis: TCE exposure is associated with an increased risk of getting Parkinson’s disease.

• Null Hypothesis: Parkinson’s rates are the same among those exposed and not exposed to TCE.

• Exercise:
  Identify an appropriate statistic.
  Model the null-hypothesis.
  Compute the p-Value.
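
One possible way to work this exercise is sketched below in Python. The choices are assumptions, not the speaker's solution: the statistic is the difference in Parkinson’s rates (exposed minus not exposed), and the null hypothesis is modeled by shuffling the disease labels across all subjects. Function names are hypothetical.

```python
import random

# Counts from the (fictional) table.
EXPOSED_CASES, EXPOSED_HEALTHY = 4, 25
UNEXPOSED_CASES, UNEXPOSED_HEALTHY = 10, 140


def rate_difference(exposed_cases, n_exposed, total_cases, n_total):
    """Statistic: Parkinson's rate among exposed minus rate among not exposed."""
    unexposed_cases = total_cases - exposed_cases
    n_unexposed = n_total - n_exposed
    return exposed_cases / n_exposed - unexposed_cases / n_unexposed


def estimate_p_value(n_runs=10_000, seed=0):
    rng = random.Random(seed)
    n_exposed = EXPOSED_CASES + EXPOSED_HEALTHY
    n_total = n_exposed + UNEXPOSED_CASES + UNEXPOSED_HEALTHY
    total_cases = EXPOSED_CASES + UNEXPOSED_CASES
    observed = rate_difference(EXPOSED_CASES, n_exposed, total_cases, n_total)

    # Null model: disease status is unrelated to exposure, so the same
    # disease labels are dealt out at random to all subjects.
    labels = [1] * total_cases + [0] * (n_total - total_cases)
    hits = 0
    for _ in range(n_runs):
        rng.shuffle(labels)
        sim_exposed_cases = sum(labels[:n_exposed])  # first n_exposed play the exposed group
        simulated = rate_difference(sim_exposed_cases, n_exposed, total_cases, n_total)
        if simulated >= observed:
            hits += 1
    return hits / n_runs


print(f"estimated p-value: {estimate_p_value():.3f}")
```

The comparison is one-sided because the stated hypothesis is about increased risk; a different statistic or null model would be an equally valid answer to the exercise.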

Summary

• Statistical Hypothesis Testing tests:
  Is the support for a hypothesis statistically significant given a dataset?

• Significance level (p-value) is:
  Probability of seeing data at least as extreme as the actual data when the Null hypothesis is true.

• p-value <= 5%: accept hypothesis.
  p-value > 5%: conclude nothing, need more data.

• Methodology:
  Identify statistic (scalar metric): A measure of divergence from null-hypothesis.
  Build model of null-hypothesis to “simulate” data sets.
  Compute p-value.