Distribution AnalysisReal dataSummary
References
Distribution Analysis
Finding the best distribution that explains your data
Sébastien Casault
ENMAX Energy Corporation
8 October, 2015
Sébastien Casault Distribution Analysis
IntroductionStatistical testsGoodness of fit
Introduction
We often fit observations to a model (e.g., a lognormal distribution).
How can we ensure that the model is appropriate?
Is there a model that would provide more accurate predictions?
Goodness of fit
Measures of goodness of fit typically summarize the discrepancy between observed values and the values expected under the model in question. – Wikipedia
Kolmogorov-Smirnov
The K-S statistic, $D_n$, is defined as
$$D_n = \sup_x \lvert F_n(x) - F(x) \rvert,$$
where $F$ is the cumulative distribution function (CDF) of the hypothesized distribution and $F_n$ is the empirical (sample) CDF.
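The following is a minimal Python sketch of the one-sample statistic. It assumes that the hypothesized CDF $F$ is fully specified, with no parameters estimated from the data. Because $F_n$ is a step function, the supremum can only occur at an observation. It is therefore enough to check the gap on both sides of each jump. The function names and the exponential example $F$ are illustrative and are not taken from the slides.

```python
import math
from typing import Callable, Sequence


def ks_statistic(sample: Sequence[float], cdf: Callable[[float], float]) -> float:
    """One-sample Kolmogorov-Smirnov statistic D_n = sup_x |F_n(x) - F(x)|."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # F_n jumps from (i - 1)/n to i/n at the i-th order statistic, so the
        # supremum is found by checking the gap just after and just before x.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def exp_cdf(x: float, rate: float = 1.0) -> float:
    """Hypothetical example F: exponential CDF with a fixed (not fitted) rate."""
    return 1.0 - math.exp(-rate * x) if x > 0 else 0.0
```

For example, `ks_statistic([0.1, 0.4, 0.7, 1.2, 2.5], exp_cdf)` measures how far that sample's ECDF strays from an exponential CDF with rate 1.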
Anderson-Darling
There are many other goodness-of-fit tests. Most are variations on the KS idea of measuring the distance between the empirical and hypothesized CDFs.
For example, the AD statistic, $A^2$, is defined as
$$A^2 = n \int_{-\infty}^{\infty} \frac{\left(F_n(x) - F(x)\right)^2}{F(x)\left(1 - F(x)\right)} \, dF(x)$$
and is a weighted integral of the squared difference between the hypothesized and sample CDFs. The weight 1/[F(x)(1 - F(x))] grows as F(x) approaches 0 or 1, which places more weight on observations in the tails.
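In practice the integral is rarely evaluated directly. For sorted observations x_(1) <= ... <= x_(n), it reduces to a standard computing formula, which the Python sketch below implements. The sketch assumes a fully specified F with 0 < F(x) < 1 at every observation, because the logarithms are undefined otherwise. The function name is illustrative.

```python
import math
from typing import Callable, Sequence


def anderson_darling(sample: Sequence[float], cdf: Callable[[float], float]) -> float:
    """A^2 via the computing formula for sorted data x_(1) <= ... <= x_(n):

    A^2 = -n - (1/n) * sum_{i=1..n} (2i - 1) * [ln F(x_(i)) + ln(1 - F(x_(n+1-i)))]
    """
    xs = sorted(sample)
    n = len(xs)
    u = [cdf(x) for x in xs]
    if any(not 0.0 < p < 1.0 for p in u):
        raise ValueError("F(x) must lie strictly between 0 and 1 at every observation")
    s = 0.0
    for i in range(1, n + 1):
        # u[n - i] is F(x_(n+1-i)) in 1-based order-statistic notation
        s += (2 * i - 1) * (math.log(u[i - 1]) + math.log(1.0 - u[n - i]))
    return -n - s / n
```

As a check, a single observation at the median of a uniform F gives A^2 = 2 ln 2 - 1. This matches a direct evaluation of the integral.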
P value statistics
The P value is the answer to this question:
If the two samples were randomly sampled from identical populations, what is the probability that the two cumulative frequency distributions would be as far apart as observed? More precisely, what is the chance that the value of the test statistic would be as large as, or larger than, the value observed?
If the P value is small, conclude that the two groups were sampled from populations with different distributions. The populations may differ in median, variability or the shape of the distribution.
For a one-sample goodness-of-fit test, the hypothesized distribution F takes the place of the second sample: a small P value suggests that the data were not drawn from F.
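For the one-sample KS test with a fully specified F, the P value can be approximated for large n. The approximation uses the limiting (Kolmogorov) distribution of sqrt(n) D_n. The Python sketch below implements only that asymptotic approximation, not an exact small-sample calculation. The function names are illustrative.

```python
import math


def kolmogorov_sf(t: float, terms: int = 100) -> float:
    """Asymptotic P(sqrt(n) * D_n > t) when F is fully specified.

    Uses the Kolmogorov series Q(t) = 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 t^2).
    """
    if t < 0.2:
        # Q(t) is 1 to within about 1e-12 here, and the series converges slowly.
        return 1.0
    total = 0.0
    for k in range(1, terms + 1):
        total += (-1) ** (k - 1) * math.exp(-2.0 * k * k * t * t)
    # Clamp tiny rounding excursions outside [0, 1].
    return min(1.0, max(0.0, 2.0 * total))


def ks_p_value(d: float, n: int) -> float:
    """Large-sample P value for an observed one-sample K-S statistic d from n points."""
    return kolmogorov_sf(math.sqrt(n) * d)
```

For example, sqrt(n) D_n = 1.358 corresponds to P of about 0.05, and 1.628 to about 0.01. This P value becomes too large (the test is conservative) when the parameters of F are estimated from the same data. In that case, statistical software generally relies on adjusted critical values or simulation instead.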
ContextCoal plant outagesDistribution fitting
Electricity prices in Alberta
Electricity is one of the most volatile commodities traded in wholesale markets.
Sources of volatility
An increasing portion of the supply portfolio is stochastic:
Wind accounts for 8.3% of Alberta's installed generating capacity
Coal-fired power plants undergo forced outages
Case study - Sundance 2
TransCanada’s Sundance A and B Power Purchase Arrangements entitle TransCanada to more than 900 megawatts (MW) of capacity from the Sundance Power Plant. TransCanada sells this electricity under long-term contracts and into the spot market. The Sundance Power Plant has a total of six generating units and is owned and operated by TransAlta.
Sundance A & B Power Purchase Arrangements
Power Purchase Arrangement Highlights
Sundance A PPA: 100 per cent of the output from Units 1 & 2 = 560 MW. Term expires in 2017.
Sundance B PPA: 50 per cent of the output from Units 3 & 4 = 353 MW. Term expires in 2020.
Location: The plant is located 70 kilometres (about 45 miles) west of Edmonton, Alberta, on the south shore of Lake Wabamun.
In-service dates: Unit 1, 1970; Unit 2, 1973; Unit 3, 1976; Unit 4, 1977.
Capacity: 2,029 MW.
Fuel: Coal from TransAlta’s Highvale mine.
Environmental features: Meets ISO 14001 standards; regulated by Alberta Environment and the Alberta Electric Utilities Board.
Owner: TransAlta Utilities Corporation.
Operator: TransAlta Utilities Corporation.
Outage statistics
The Sundance 2 unit has undergone several forced outages in 2015, often coinciding with wholesale market price spikes.
Distribution fitting
PROC UNIVARIATE DATA = WORK.ON ;
  VAR ON ;
  HISTOGRAM ON / NORMAL LOGNORMAL EXP WEIBULL ;  /* fit candidates, print GOF tests */
  CDFPLOT ON / WEIBULL ;  /* empirical CDF vs fitted Weibull CDF */
RUN ;
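The SAS call above fits several candidate families and reports goodness-of-fit tests for each. The Python sketch below illustrates the same idea but does not reproduce what PROC UNIVARIATE computes. It fits three of the families by closed-form maximum likelihood and ranks them by the K-S distance D_n. The Weibull is left out because its shape parameter has no closed-form estimator. The function names are illustrative, and the demo uses synthetic data, since the outage data set WORK.ON is not available here.

```python
import math
import random
from typing import Callable, Dict, List, Sequence, Tuple

Cdf = Callable[[float], float]


def ks_statistic(sample: Sequence[float], cdf: Cdf) -> float:
    """One-sample K-S distance D_n between the ECDF of `sample` and `cdf`."""
    xs = sorted(sample)
    n = len(xs)
    return max(
        max(i / n - cdf(x), cdf(x) - (i - 1) / n) for i, x in enumerate(xs, start=1)
    )


def normal_cdf(mu: float, sigma: float) -> Cdf:
    return lambda x: 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def fit_candidates(sample: Sequence[float]) -> Dict[str, Cdf]:
    """Closed-form maximum-likelihood fits; every observation must be > 0."""
    n = len(sample)
    mean = sum(sample) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in sample) / n)  # MLE divisor is n
    logs = [math.log(x) for x in sample]
    mu_log = sum(logs) / n
    sd_log = math.sqrt(sum((y - mu_log) ** 2 for y in logs) / n)
    log_cdf = normal_cdf(mu_log, sd_log)
    return {
        "normal": normal_cdf(mean, sd),
        "lognormal": lambda x: log_cdf(math.log(x)) if x > 0 else 0.0,
        "exponential": lambda x: 1.0 - math.exp(-x / mean) if x > 0 else 0.0,
    }


def rank_by_ks(sample: Sequence[float]) -> List[Tuple[float, str]]:
    """Candidate families sorted from smallest to largest D_n (best fit first)."""
    fits = fit_candidates(sample)
    return sorted((ks_statistic(sample, cdf), name) for name, cdf in fits.items())


if __name__ == "__main__":
    rng = random.Random(1)
    demo = [rng.expovariate(1.0) for _ in range(2000)]  # synthetic, not outage data
    for d, name in rank_by_ks(demo):
        print(f"{name:12s} D = {d:.4f}")
```

The parameters are estimated from the same data, so D_n serves here only to compare candidates, not to produce a formal P value.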
Distribution statistics and goodness of fit
Both fits provide an accurate description of the observed data.
There may be a more theoretical reason to choose the Weibull distribution.
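The slides do not state what that theoretical reason is. One standard argument from reliability analysis, offered here only as an assumption, concerns the hazard rate h(x) = (k/λ)(x/λ)^(k-1). Through it, the Weibull shape parameter k has a direct reading:

- k < 1 means a decreasing hazard.
- k = 1 reduces to the exponential, with a constant hazard.
- k > 1 means an increasing, wear-out-type hazard.

The maximum-likelihood estimate of k has no closed form. The minimal Python sketch below finds it by bisection. The function names are illustrative, and the demo data are synthetic.

```python
import math
import random
from typing import Sequence, Tuple


def fit_weibull_mle(sample: Sequence[float], iters: int = 100) -> Tuple[float, float]:
    """Return (shape k, scale lam) maximising the Weibull likelihood.

    k solves g(k) = sum(x^k ln x) / sum(x^k) - 1/k - mean(ln x) = 0, which is
    increasing in k, so bisection on a wide bracket is sufficient.
    """
    if any(x <= 0 for x in sample):
        raise ValueError("Weibull data must be strictly positive")
    c = max(sample)
    ys = [x / c for x in sample]  # rescale so y**k cannot overflow; g is scale-invariant
    logs = [math.log(y) for y in ys]
    mean_log = sum(logs) / len(logs)

    def g(k: float) -> float:
        w = [y ** k for y in ys]
        return sum(wi * li for wi, li in zip(w, logs)) / sum(w) - 1.0 / k - mean_log

    lo, hi = 1e-3, 1e3
    if g(hi) <= 0:
        raise ValueError("shape estimate outside the search bracket (data nearly constant?)")
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if g(mid) < 0:
            lo = mid
        else:
            hi = mid
    k = 0.5 * (lo + hi)
    lam = c * (sum(y ** k for y in ys) / len(ys)) ** (1.0 / k)  # undo the rescaling
    return k, lam


def weibull_hazard(x: float, k: float, lam: float) -> float:
    """Hazard h(x) = (k/lam) * (x/lam)**(k-1), for x > 0."""
    return (k / lam) * (x / lam) ** (k - 1)


if __name__ == "__main__":
    rng = random.Random(7)
    demo = [rng.weibullvariate(2.0, 1.5) for _ in range(5000)]  # scale 2, shape 1.5
    k_hat, lam_hat = fit_weibull_mle(demo)
    print(f"shape k = {k_hat:.3f}, scale = {lam_hat:.3f}")
```

Python's `random.weibullvariate(alpha, beta)` takes the scale first and the shape second. An estimated k well above 1 on real outage data would point to an increasing hazard, but that reading is an interpretation, not a result from the slides.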
Summary
Finding the right predictive model is important
There are several tests that can quantify how well a given model fits empirical data
Using these tests, we can obtain goodness-of-fit (GOF) statistics
Build more reliable models by choosing the right fit
Bibliography
1 Base SAS(R) 9.2 Procedures Guide: Statistical Procedures. UNIVARIATE Procedure, Goodness-of-Fit Tests.
2 Clauset, A., Shalizi, C. R., and Newman, M. E. J. (2009). Power-Law Distributions in Empirical Data. SIAM Review, 51, 661-703.
3 Hagiwara, Y. (1974). Probability of earthquake occurrence as obtained from a Weibull distribution analysis of crustal strain. Tectonophysics, 23, 313-318.