DISTRIBUTION OF NEW TEMPERATURE EXTREMES

by

BAHTIYAR BABANAZAROV, B.S.

A THESIS

IN

MATHEMATICS

Submitted to the Graduate Faculty of Texas Tech University in Partial Fulfillment of the Requirements for the Degree of

MASTER OF SCIENCE

Approved

Clyde Martin, Chairperson of the Committee

Akif Ibragimov

Accepted

John Borrelli, Dean of the Graduate School

December, 2006

ACKNOWLEDGEMENTS

First, I would like to thank my advisor, Horn Prof. Clyde Martin, who was very understanding, supportive and inspiring to me throughout my studies. He was helpful in all aspects of the thesis. This thesis would not have been possible without his support and motivation.

I would also like to thank Prof. Akif Ibragimov for serving on the committee and for useful discussions.

I would like to thank a very special and close friend of mine who supported me throughout my studies and always pushed me to work harder and harder. I would also like to thank all my friends in Lubbock for their continuous and endless moral support: Resul, Mehmet B., Mehmet K., Hakan, Abdulhadi, Emrah, Faruk and Kazim abi. I would also like to thank my friends who supported me from a distance: Resat abi, Saim abi, Tansel abi and Murat abi.

Finally, I would like to thank my wife Gulzira for her support and dedication to me. Now it is time to thank my parents. I have to thank them for almost everything...

CONTENTS

ACKNOWLEDGEMENTS

LIST OF FIGURES

I INTRODUCTION

1.1 Global Warming

1.1.1 Causes of Global Warming

1.1.2 Complexity of the problem

1.2 Our approach to the problem

1.2.1 What is extreme value theory and how are we applying it to this problem?

1.3 Thesis Outline

II HISTORY OF EXTREME VALUE THEORY

2.1 Historical Background

2.2 Some other applications of Extreme Value Theory

2.3 Models for Extreme Values

III EXTREME VALUE MODELS

3.1 Classical Block Maxima Models

3.1.1 Types of distributions

3.1.2 Outline Proof of the Extremal Types Theorem

3.2 Threshold Models

3.2.1 The Generalized Pareto Distribution

3.2.2 Proof of Theorem 3.3

IV SELECTING A MODEL FOR THE PROBLEM

4.1 Background

4.1.1 Filtering the data

4.1.2 Evaluating the data using Matlab/ Matlab Part

4.1.3 Table of extreme value exceedances

4.2 Picking a model

4.2.1 Picking Frechet distribution type

4.3 Least squares regression of the Model

4.4 Conclusion

LIST OF FIGURES

4.1 Frequency of extreme exceedances

CHAPTER I

INTRODUCTION

1.1 Global Warming

1.1.1 Causes of Global Warming

Global warming is one of the most widely discussed topics today. It is known as a human-caused problem, and its main cause is the burning of fossil fuels (coal, oil and gas), which releases carbon dioxide into the atmosphere. As a consequence, the atmosphere accumulates carbon dioxide, which blankets the earth and traps heat. The trapped heat causes global warming. [1]

In addition to carbon dioxide, other atmospheric greenhouse gases, such as chlorofluorocarbons and their substitutes, methane and nitrous oxide, have been observed to increase. It is also claimed that atmospheric carbon dioxide concentrations have increased since the mid-1700s through fossil fuel burning and changes in land use, with more than 80% of this increase occurring since 1900. For example, electricity generation alone causes 37% of global CO2 emissions. [2]

1.1.2 Complexity of the problem

Although research has identified the causes of global warming, it is not easy to answer questions such as:

• exactly how fast global warming is happening

• exactly how much it will change

• what part of the earth will be affected more

[3]

This difficulty is due to the complexity of the atmospheric system we live in, which is too complicated to be explained by a few causes and predictions. On the other hand, there are some predictions about which scientists are confident. According to these, warming over the middle of continents will be greater than over the oceans, and warming will be greater at higher latitudes. Some polar and glacial ice will melt and the oceans will warm; both effects will contribute to higher sea levels. The hydrologic cycle will change and intensify, leading to changes in water supply as well as in flood and drought patterns. There will be considerable regional variations in the resulting impacts. [3]

1.2 Our approach to the problem

As mentioned above, global warming is a complex problem with many parameters, including but not limited to temperature increases and decreases, changes in the atmospheric concentrations of greenhouse gases, and human activities that might affect the balance of nature.

Given its size and scope, this thesis is far from considering all of those parameters, and it does not claim to prove or disprove global warming. We focus mainly on how extreme temperatures were distributed between 1913 and 1964, and on what statistical inferences we can make using the statistical methodology called extreme value theory.

1.2.1 What is extreme value theory and how are we applying it to this problem?

We answer this question by first giving the historical background. Then we present the main types of extreme value models and their properties in detail. Following these steps, we turn to the main part of the thesis: picking an extreme value model appropriate to our approach and analyzing it. There we seek the answer to one of our main questions: is the frequency of extreme values increasing or decreasing? The answer to this question forms the main part of the conclusion.

1.3 Thesis Outline

Here is a brief outline of the thesis:

In Chapter 2, we discuss the historical background of extreme value theory, with major application examples.

In Chapter 3, we present the extreme value models, including the three types of limiting distributions for extreme values, and explore their properties in detail.

In Chapter 4, we discuss which model we use and why.

Finally, we conclude the thesis and explain the results.

CHAPTER II

HISTORY OF EXTREME VALUE THEORY

2.1 Historical Background

Unlike most statistical methods, which mainly deal with what goes on in the center of a distribution, Extreme Value Theory is concerned with what happens at its extreme ends. Since extreme events are by their nature rare, in most cases we do not have the comfort of many observations. This forces us to extrapolate beyond the observed data more often than in most statistical methods, where quantities can be estimated directly from plenty of data. Extreme Value Theory is thus a collection of methods for dealing with extreme or rare events. [4]

Emil Julius Gumbel, a German mathematician, is considered to be the founder of extreme value theory. He once said, "It seems that the rivers know the theory. It only remains to convince the engineers of the validity of this analysis." [4] He developed a distribution, now called the Gumbel distribution, which describes the sample maximum (or minimum) of samples drawn from many different distributions; the underlying distributions could be, for example, of the normal or exponential type. As the quote suggests, his original focus was predicting the maximum level of a river.

The result known as the three types theorem is the cornerstone of extreme value theory. It was first stated by Fisher and Tippett [4] and proved rigorously by Gnedenko. It states that only three types of distributions can arise as limiting distributions of extremes in random samples. [4]

Gumbel followed this theory and developed a statistical methodology for extreme values based on fitting extreme value distributions to data consisting of maxima or minima over fixed time intervals. [4] As an example, consider applying Gumbel's method to the annual maxima of a series of river flows over a certain period. Suppose you want to estimate how high a river might rise in a future year, given the list of annual maximum levels for the past fifty years. A Gumbel distribution is fitted to these annual maxima, and the fitted distribution is then used to predict the probability of maxima that might occur in the future. Such a prediction helps to determine how tall an embankment should be to prevent flooding.
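To make the river example concrete, the following Python sketch fits a Gumbel distribution to a list of annual maxima and computes the level expected to be exceeded once every T years. It assumes the method of moments as the fitting technique (Gumbel's own graphical procedure and maximum likelihood are alternatives), and the river levels are hypothetical numbers chosen only for illustration.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant

def fit_gumbel_moments(annual_maxima):
    """Method-of-moments estimates (mu, beta) for a Gumbel distribution.

    For a Gumbel(mu, beta) variable, mean = mu + gamma*beta and
    standard deviation = pi*beta/sqrt(6).
    """
    mean = statistics.mean(annual_maxima)
    sd = statistics.stdev(annual_maxima)
    beta = sd * math.sqrt(6) / math.pi
    mu = mean - EULER_GAMMA * beta
    return mu, beta

def return_level(mu, beta, period_years):
    """Level exceeded on average once every `period_years` years.

    Solves F(x) = 1 - 1/T for F(x) = exp(-exp(-(x - mu)/beta)).
    """
    p_not_exceeded = 1.0 - 1.0 / period_years
    return mu - beta * math.log(-math.log(p_not_exceeded))

# Hypothetical annual maximum river levels (metres), for illustration only.
levels = [3.1, 2.7, 3.8, 4.2, 2.9, 3.5, 3.3, 4.9, 3.0, 3.6]
mu, beta = fit_gumbel_moments(levels)
for T in (10, 50, 100):
    print(T, round(return_level(mu, beta, T), 2))
```

The 100-year level printed last is the kind of quantity an engineer would compare with the height of an embankment.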

2.2 Some other applications of Extreme Value Theory

The application of Extreme Value Theory (EVT) is no longer limited to natural phenomena, as it was when Gumbel started this method. Recently, the insurance and finance industries have been using EVT intensively. In insurance, a typical problem is pricing catastrophic losses, and EVT is used to predict such losses. In finance, a main application of EVT is to the stock market; for example, what is the probability of a stock market crash within 3 days? In addition to the stock market, EVT is widely used in industry to estimate industrial losses. [5]

Another area where EVT is used intensively is risk management, for example credit risk management. Expected loss, unexpected loss and stress loss are the main quantities that practitioners in this area try to estimate using EVT. [5]

Gumbel and similar distributions are therefore central to extreme value theory. We discuss the details of EVT, such as which distribution families and models exist, in the next chapter.

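Properties of the Gumbel distribution:

• With location $\mu$ and scale $\beta > 0$, the cumulative distribution function is $F(x) = \exp\{-\exp(-(x-\mu)/\beta)\}$

• The standard Gumbel distribution has $\mu = 0$ and $\beta = 1$, with cumulative distribution function $F(x) = \exp\{-\exp(-x)\}$

• and probability density function $f(x) = \exp(-x)\,\exp\{-\exp(-x)\}$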
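The properties above can be checked numerically. The Python sketch below (an illustration, not part of the original analysis) codes the Gumbel distribution function and density with location µ and scale β, compares the density with a finite-difference derivative of the distribution function, and checks that F(x + log n)^n = F(x), i.e. that the maximum of n standard Gumbel variables is again Gumbel, shifted by log n.

```python
import math

def gumbel_cdf(x, mu=0.0, beta=1.0):
    """F(x) = exp(-exp(-(x - mu)/beta))."""
    return math.exp(-math.exp(-(x - mu) / beta))

def gumbel_pdf(x, mu=0.0, beta=1.0):
    """f(x) = (1/beta) * exp(-z) * exp(-exp(-z)), with z = (x - mu)/beta."""
    z = (x - mu) / beta
    return math.exp(-z) * math.exp(-math.exp(-z)) / beta

# The density should equal the derivative of the distribution function.
h = 1e-6
for x in (-1.0, 0.0, 2.5):
    slope = (gumbel_cdf(x + h) - gumbel_cdf(x - h)) / (2 * h)
    print(x, gumbel_pdf(x), slope)

# Max-stability of the standard Gumbel: F(x + log n)**n == F(x).
n = 7
print(gumbel_cdf(1.3 + math.log(n)) ** n, gumbel_cdf(1.3))
```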

2.3 Models for Extreme Values

There are two main types of models for extreme values. We state them briefly here and discuss them in detail in the next chapter.

1. Block Maxima

2. Threshold Models

In recent years, the methodology originally used by Gumbel has shifted towards methods based on exceedances over thresholds rather than annual maxima. The limiting distribution in this context is the generalized Pareto distribution. [6]

There are two main reasons why threshold methods are preferred to annual maximum methods:

• The data are used more efficiently, because all exceedances over a certain threshold are taken into account

• They are easily extended to situations where one wants to study how the extreme levels of one variable, Y, depend on another variable, X
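The first advantage can be seen on a toy series. The Python sketch below uses hypothetical daily maxima split into two blocks of six days and extracts both the block maxima and the excesses over a threshold u = 100: the first block contains three large values but contributes only one maximum, while the second block contributes a maximum that is not extreme at all.

```python
def block_maxima(series, block_size):
    """Maximum of each complete block of `block_size` observations."""
    n_blocks = len(series) // block_size
    return [max(series[b * block_size:(b + 1) * block_size])
            for b in range(n_blocks)]

def threshold_excesses(series, u):
    """Excesses X - u for every observation X above the threshold u."""
    return [x - u for x in series if x > u]

# Hypothetical daily maxima (deg F) for two blocks of six days each.
daily = [91, 104, 106, 95, 99, 101,
         88, 92, 97, 96, 94, 93]
print(block_maxima(daily, 6))           # one value per block
print(threshold_excesses(daily, 100))   # every value above 100
```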

This completes the discussion of the historical background and the main types of models for extreme values. In the next chapter we examine the properties of these distribution families in detail.

CHAPTER III

EXTREME VALUE MODELS

The analysis of extreme events requires estimating the probabilities of such events. Extreme value models are developed using asymptotic arguments.

There are two families of models that describe extreme values. The first is the classical Block Maxima family, and the other is the family of Threshold Models, or Peaks Over Threshold Models. [6] Threshold models were developed more recently and have certain advantages over Block Maxima models. The main advantage is that the data are used more efficiently, because all exceedances over a certain threshold are taken into account, whereas a Block Maxima model may waste data if one block happens to contain more extreme events than another. [4] Although they seem to be two different families of models, later in this chapter we show that a distribution in one family has a corresponding distribution in the other.

3.1 Classical Block Maxima Models

Suppose $X_1, \ldots, X_n$ are independent random variables with common distribution function $F$:

$$F(x) = \Pr(X_j \le x) \quad \text{for all } j \text{ and } x.$$

By independence, the distribution function of the maximum $M_n = \max\{X_1, \ldots, X_n\}$ is $F^n$:

$$\Pr\{M_n \le x\} = \Pr\{X_1 \le x, X_2 \le x, \ldots, X_n \le x\} = \Pr\{X_1 \le x\} \cdots \Pr\{X_n \le x\} = F^n(x).$$

[4]

On its own this result is not very useful: for any fixed $x$ with $F(x) < 1$, $F^n(x) \to 0$ as $n \to \infty$, so the distribution of $M_n$ degenerates to a point mass at the upper end point of $F$. [4]
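Both facts are easy to see numerically. The Python sketch below takes the standard exponential F(x) = 1 − exp(−x) as an assumed example, compares a Monte Carlo estimate of Pr{M_n ≤ x} with F^n(x), and then shows F^n(x) collapsing towards 0 for fixed x as n grows.

```python
import math
import random

def exp_cdf(x):
    """Standard exponential distribution function F(x) = 1 - exp(-x)."""
    return 1.0 - math.exp(-x) if x > 0 else 0.0

# Monte Carlo check that Pr{M_n <= x} = F(x)**n for i.i.d. samples.
random.seed(1)
n, x, trials = 5, 2.0, 100_000
hits = sum(max(random.expovariate(1.0) for _ in range(n)) <= x
           for _ in range(trials))
print(hits / trials, exp_cdf(x) ** n)

# For fixed x with F(x) < 1, F(x)**m collapses towards 0 as m grows.
for m in (10, 100, 1000):
    print(m, exp_cdf(x) ** m)
```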

It turns out that we get a useful result by renormalizing. Define sequences of constants $a_n > 0$ and $b_n$ such that

$$\Pr\left\{\frac{M_n - b_n}{a_n} \le x\right\} = \Pr\{M_n \le a_n x + b_n\} = F^n(a_n x + b_n) \to H(x) \quad \text{as } n \to \infty,$$

where $H$ is a nondegenerate distribution function. [6, p. 45]

Determining the constants $a_n$ and $b_n$ in general is beyond the scope of this thesis, but we give examples below in which different choices of $a_n$ and $b_n$ lead to the different types of extreme value models.

3.1.1 Types of distributions

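Theorem 3.1 [6, p. 48] If there exist sequences of constants $\{a_n > 0\}$ and $\{b_n\}$ such that

$$\Pr\left\{\frac{M_n - b_n}{a_n} \le z\right\} \to G(z) \quad \text{as } n \to \infty,$$

where $G$ is a non-degenerate distribution function, then $G$ belongs to one of the following families:

1. Gumbel type:

• $G(z) = \exp\left\{-\exp\left[-\left(\frac{z-b}{a}\right)\right]\right\}$, for $-\infty < z < \infty$

2. Frechet type:

• $G(z) = 0$, if $z \le b$

• $G(z) = \exp\left\{-\left(\frac{z-b}{a}\right)^{-\alpha}\right\}$, if $z > b$

3. Weibull type:

• $G(z) = \exp\left\{-\left[-\left(\frac{z-b}{a}\right)\right]^{\alpha}\right\}$, if $z < b$

• $G(z) = 1$, if $z \ge b$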

Each family has a location parameter $b$ and a scale parameter $a > 0$; additionally, the Frechet and Weibull families have a shape parameter $\alpha > 0$. The importance of this theorem is that, whatever the underlying $F$, if the normalized maxima have a nondegenerate limit distribution at all, that limit must belong to one of the above families. In this sense it is the extreme value analogue of the central limit theorem.

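These three families can be combined into a single family of models having distribution functions of the form

$$G(z) = \exp\left\{-\left[1 + \xi\left(\frac{z-\mu}{\sigma}\right)\right]^{-1/\xi}\right\} \qquad (3.1)$$

defined on $\{z : 1 + \xi(z-\mu)/\sigma > 0\}$, where $-\infty < \mu < \infty$, $\sigma > 0$ and $-\infty < \xi < \infty$. This is called the generalized extreme value (GEV) family of distributions. The case $\xi = 0$ is interpreted as the limit $\xi \to 0$, which gives the Gumbel family, while $\xi > 0$ and $\xi < 0$ give the Frechet and Weibull families respectively. Theorem 3.1 can now be restated: if there exist sequences of constants $\{a_n > 0\}$ and $\{b_n\}$ such that

$$\Pr\left\{\frac{M_n - b_n}{a_n} \le z\right\} \to G(z) \quad \text{as } n \to \infty,$$

where $G$ is a non-degenerate distribution function, then $G$ is a member of the GEV family.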
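Equation (3.1) translates directly into code. In the Python sketch below (our own helper, not taken from the cited sources) the case ξ = 0 is handled as the Gumbel limit, and points outside the support {z : 1 + ξ(z − µ)/σ > 0} are assigned probability 0 (below the lower end point when ξ > 0) or 1 (above the upper end point when ξ < 0).

```python
import math

def gev_cdf(z, mu=0.0, sigma=1.0, xi=0.0):
    """G(z) = exp{-[1 + xi (z - mu)/sigma]^(-1/xi)} on its support.

    xi > 0 gives the Frechet type, xi < 0 the Weibull type, and
    xi = 0 is taken as the Gumbel limit exp{-exp(-(z - mu)/sigma)}.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    s = (z - mu) / sigma
    if xi == 0.0:
        return math.exp(-math.exp(-s))
    t = 1.0 + xi * s
    if t <= 0.0:
        # Outside the support: below the lower end point (xi > 0)
        # or above the upper end point (xi < 0).
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))
```

With ξ = 1/α, σ = a/α and µ = b + a, this reduces exactly to the Frechet form of Theorem 3.1, since then 1 + ξ(z − µ)/σ = (z − b)/a.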

3.1.2 Outline Proof of the Extremal Types Theorem

Here is the informal proof, following [6, pp. 49-51]:

Formal justification of the extremal types theorem is technical, though not especially complicated; see Leadbetter et al. (1983), for example. In this section we give an informal proof. First, it is convenient to make the following definition.

Definition 3.1 A distribution $G$ is said to be max-stable if, for every $n = 2, 3, \ldots$, there are constants $\alpha_n > 0$ and $\beta_n$ such that $G^n(\alpha_n z + \beta_n) = G(z)$.

Since $G^n$ is the distribution function of $M_n = \max\{X_1, \ldots, X_n\}$, where the $X_i$ are independent variables each having distribution function $G$, max-stability is a property satisfied by distributions for which the operation of taking sample maxima leads to an identical distribution, apart from a change of scale and location. The connection with the extreme value limit laws is made by the following result.

Theorem 3.2 A distribution is max-stable if, and only if, it is a generalized extreme value distribution.

It requires only simple algebra to check that all members of the GEV family are

indeed max-stable. The converse requires ideas from functional analysis that are

beyond the scope of this book.
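The "simple algebra" can be made explicit. Raising (3.1) to the n-th power gives G^n(z) = exp{−n[1 + ξ(z − µ)/σ]^{−1/ξ}}, and direct substitution shows that for ξ ≠ 0 one valid choice of constants is α_n = n^ξ and β_n = (1 − n^ξ)(µ − σ/ξ), because then 1 + ξ(α_n z + β_n − µ)/σ = n^ξ[1 + ξ(z − µ)/σ]. The Python sketch below, our own check with hypothetical parameter values, verifies G^n(α_n z + β_n) = G(z) numerically for one Frechet-type (ξ > 0) and one Weibull-type (ξ < 0) member.

```python
import math

def gev_cdf(z, mu, sigma, xi):
    """GEV distribution function (3.1) for xi != 0."""
    t = 1.0 + xi * (z - mu) / sigma
    if t <= 0.0:                 # outside the support
        return 0.0 if xi > 0 else 1.0
    return math.exp(-t ** (-1.0 / xi))

def max_stable_constants(n, mu, sigma, xi):
    """One choice of (alpha_n, beta_n) with G^n(alpha_n z + beta_n) = G(z)."""
    alpha_n = n ** xi
    beta_n = (1.0 - n ** xi) * (mu - sigma / xi)
    return alpha_n, beta_n

mu, sigma = 10.0, 2.0            # hypothetical location and scale
for xi in (0.3, -0.2):
    for n in (2, 5, 30):
        a_n, b_n = max_stable_constants(n, mu, sigma, xi)
        for z in (9.0, 12.0, 15.0):
            lhs = gev_cdf(a_n * z + b_n, mu, sigma, xi) ** n
            print(xi, n, z, lhs, gev_cdf(z, mu, sigma, xi))
```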

Theorem 3.2 is used directly in the proof of the extremal types theorem. The idea is to consider $M_{nk}$, the maximum random variable in a sequence of $n \times k$ variables for some large value of $n$. This can be regarded as the maximum of a single sequence of length $n \times k$, or as the maximum of $k$ maxima, each of which is the maximum of $n$ observations. More precisely, suppose the limit distribution of $(M_n - b_n)/a_n$ is $G$. So, for large enough $n$,

$$\Pr\left\{\frac{M_n - b_n}{a_n} \le z\right\} \approx G(z)$$

by Theorem 3.1. Hence, for any integer $k$, since $nk$ is large,

$$\Pr\left\{\frac{M_{nk} - b_{nk}}{a_{nk}} \le z\right\} \approx G(z). \qquad (3.2)$$

But, since $M_{nk}$ is the maximum of $k$ variables having the same distribution as $M_n$,

$$\Pr\{M_{nk} \le z\} = \left[\Pr\{M_n \le z\}\right]^k. \qquad (3.3)$$

Hence, by (3.2) and (3.3) respectively,

$$\Pr\{M_{nk} \le z\} \approx G\left(\frac{z - b_{nk}}{a_{nk}}\right)$$

and

$$\Pr\{M_{nk} \le z\} \approx G^k\left(\frac{z - b_n}{a_n}\right).$$

Therefore, $G$ and $G^k$ are identical apart from location and scale coefficients. It follows that $G$ is max-stable and therefore, by Theorem 3.2, a member of the GEV family. As cited above, this proof is reproduced as it appears in [6, pp. 49-51].

As mentioned before, determining $a_n$ and $b_n$ in the above setting is not easy, and it is beyond the scope of this thesis; the reader may refer to Leadbetter et al. (1983) for more details on how to determine these constants. [4] Below we look at some main examples, taken from [4]. Before the examples, let us discuss the other family of extreme value models, called Threshold Models or Peaks over Threshold Models.

3.2 Threshold Models

Let $X_1, \ldots, X_n$ be a sequence of independent and identically distributed random variables with distribution function $F$. We regard as extreme events those $X_i$ that exceed some high threshold $u$. Denoting an arbitrary term in the $X_i$ sequence by $X$, it follows that a description of the stochastic behavior of extreme events is given by

$$\Pr\{X > u + y \mid X > u\} = \frac{1 - F(u + y)}{1 - F(u)}, \qquad y > 0. \qquad (3.4)$$

Here we follow the same limit arguments as in the Block Maxima models.

3.2.1 The Generalized Pareto Distribution

The main result is contained in the following theorem.

Theorem 3.3 [6] Let $X_1, \ldots, X_n$ be a sequence of independent random variables with common distribution function $F$, and let

$$M_n = \max\{X_1, \ldots, X_n\}.$$

Denote an arbitrary term in the $X_i$ sequence by $X$, and suppose that $F$ satisfies Theorem 3.1, so that for large $n$,

$$\Pr\{M_n \le z\} \approx G(z),$$

where

$$G(z) = \exp\left\{-\left[1 + \xi\left(\frac{z-\mu}{\sigma}\right)\right]^{-1/\xi}\right\}$$

for some $\mu$, $\sigma > 0$ and $\xi$. Then, for large enough $u$, the distribution function of $(X - u)$, conditional on $X > u$, is approximately

$$H(y) = 1 - \left(1 + \frac{\xi y}{\tilde{\sigma}}\right)^{-1/\xi}, \qquad (3.5)$$

defined on $\{y : y > 0 \text{ and } (1 + \xi y/\tilde{\sigma}) > 0\}$, where

$$\tilde{\sigma} = \sigma + \xi(u - \mu). \qquad (3.6)$$

The family of distributions defined by (3.5) is called the generalized Pareto family. Theorem 3.3 implies that, if block maxima have approximating distribution $G$, then threshold excesses have a corresponding approximate distribution within the generalized Pareto family.
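Equations (3.5) and (3.6) can be sketched in Python as follows. This is our own illustration: the ξ = 0 case is taken as the exponential limit 1 − exp(−y/σ̃), values beyond the upper end point y = −σ̃/ξ (when ξ < 0) get probability 1, and the GEV parameters and threshold in the usage lines are hypothetical.

```python
import math

def gpd_cdf(y, sigma_tilde, xi):
    """H(y) = 1 - (1 + xi*y/sigma_tilde)^(-1/xi) for y > 0, eq. (3.5)."""
    if y <= 0:
        return 0.0
    if xi == 0.0:
        return 1.0 - math.exp(-y / sigma_tilde)  # exponential limit
    t = 1.0 + xi * y / sigma_tilde
    if t <= 0.0:          # beyond the upper end point when xi < 0
        return 1.0
    return 1.0 - t ** (-1.0 / xi)

def gpd_scale(sigma, xi, u, mu):
    """Threshold-dependent scale (3.6): sigma_tilde = sigma + xi*(u - mu)."""
    return sigma + xi * (u - mu)

# Hypothetical GEV parameters for the block maxima and a threshold u.
mu, sigma, xi, u = 100.0, 3.0, -0.2, 104.0
st = gpd_scale(sigma, xi, u, mu)
print(st, gpd_cdf(2.0, st, xi))
```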

3.2.2 Proof of Theorem 3.3

The proof of Theorem 3.3 below is reproduced as it appears in [6, pp. 76-77]:

This section provides an outline proof of Theorem 3.3. A more precise argument is given by Leadbetter et al. (1983).

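Let $X$ have distribution function $F$. By the assumption of Theorem 3.1, for large enough $n$,

$$F^n(z) \approx \exp\left\{-\left[1 + \xi\left(\frac{z-\mu}{\sigma}\right)\right]^{-1/\xi}\right\}$$

for some parameters $\mu$, $\sigma > 0$ and $\xi$. Hence,

$$n \log F(z) \approx -\left[1 + \xi\left(\frac{z-\mu}{\sigma}\right)\right]^{-1/\xi}. \qquad (3.7)$$

But for large values of $z$, a Taylor expansion implies that

$$\log F(z) \approx -\{1 - F(z)\}.$$

Substitution into (3.7), followed by rearrangement, gives

$$1 - F(u) \approx \frac{1}{n}\left[1 + \xi\left(\frac{u-\mu}{\sigma}\right)\right]^{-1/\xi}$$

for large $u$. Similarly, for $y > 0$,

$$1 - F(u + y) \approx \frac{1}{n}\left[1 + \xi\left(\frac{u+y-\mu}{\sigma}\right)\right]^{-1/\xi}. \qquad (3.8)$$

Hence,

$$\Pr\{X > u + y \mid X > u\} \approx \frac{n^{-1}\left[1 + \xi(u + y - \mu)/\sigma\right]^{-1/\xi}}{n^{-1}\left[1 + \xi(u - \mu)/\sigma\right]^{-1/\xi}} = \left[\frac{1 + \xi(u + y - \mu)/\sigma}{1 + \xi(u - \mu)/\sigma}\right]^{-1/\xi} = \left(1 + \frac{\xi y}{\tilde{\sigma}}\right)^{-1/\xi},$$

where $\tilde{\sigma} = \sigma + \xi(u - \mu)$, as required. This completes the proof of Theorem 3.3.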
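The last chain of equalities is pure algebra (the factors n^{-1} cancel), so it can be spot-checked numerically. The Python sketch below, with hypothetical values of µ, σ, ξ and u, compares the ratio of the two tail approximations from (3.8) with the generalized Pareto survival function (1 + ξy/σ̃)^{−1/ξ}.

```python
def tail_ratio(u, y, mu, sigma, xi):
    """Ratio of the tail approximations for 1 - F(u+y) and 1 - F(u)."""
    num = (1 + xi * (u + y - mu) / sigma) ** (-1 / xi)
    den = (1 + xi * (u - mu) / sigma) ** (-1 / xi)
    return num / den

def gpd_survival(y, sigma_tilde, xi):
    """(1 + xi*y/sigma_tilde)^(-1/xi), i.e. 1 - H(y) from (3.5)."""
    return (1 + xi * y / sigma_tilde) ** (-1 / xi)

mu, sigma, xi, u = 0.0, 1.5, 0.25, 6.0      # hypothetical values
sigma_tilde = sigma + xi * (u - mu)          # equation (3.6)
for y in (0.5, 2.0, 10.0):
    print(y, tail_ratio(u, y, mu, sigma, xi), gpd_survival(y, sigma_tilde, xi))
```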

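Having discussed the two families, we now look at some main examples to see possible values of $a_n$ and $b_n$ and the relation between the two models. In each threshold version below, $F_u(y) = \Pr\{X - u \le y \mid X > u\} = \{F(u + y) - F(u)\}/\{1 - F(u)\}$ denotes the distribution function of the excess over the threshold $u$, and $\sigma_u > 0$ is a scaling constant.

Example 1: The following three examples are from [4].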

First, consider the exponential distribution $F(x) = 1 - \exp(-x)$. Let $a_n = 1$ and $b_n = \log n$. Then

$$F^n(a_n x + b_n) = \left(1 - \exp(-x - \log n)\right)^n = \left(1 - \frac{\exp(-x)}{n}\right)^n \to \exp\{-\exp(-x)\},$$

using the limit $(1 + z/n)^n \to \exp(z)$ as $n \to \infty$, which is valid for any real or complex $z$. Therefore, in the case of the exponential distribution, the appropriate limiting form for the sample maxima is the Gumbel distribution.

Now let us look at the threshold version of the result. Set $\sigma_u = 1$. Then

$$F_u(\sigma_u z) = \frac{F(u + \sigma_u z) - F(u)}{1 - F(u)} = \frac{\exp(-u) - \exp(-u - z)}{\exp(-u)} = 1 - \exp(-z).$$

Therefore, in this case the exponential distribution is the exact distribution of exceedances over a threshold. It is thus automatically the limiting distribution as $u \to \infty$, and the exponential distribution is the special case $\xi = 0$ of the generalized Pareto distribution.
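Both halves of Example 1 can be checked numerically. In the Python sketch below (our own illustration), the normalized maximum F^n(x + log n) approaches the Gumbel value exp{−exp(−x)} as n grows, and the excess distribution over a threshold u equals 1 − exp(−z) whatever u is. The thresholds are chosen arbitrarily; for very large u, floating-point cancellation in 1 − F(u) would make this direct formula inaccurate.

```python
import math

def exp_cdf(x):
    """Exponential distribution function F(x) = 1 - exp(-x)."""
    return 1.0 - math.exp(-x) if x > 0 else 0.0

def normalized_max_cdf(x, n):
    """F^n(a_n x + b_n) with a_n = 1, b_n = log n (valid for x > -log n)."""
    return exp_cdf(x + math.log(n)) ** n

def gumbel_cdf(x):
    return math.exp(-math.exp(-x))

x = 0.5
for n in (10, 100, 10_000):
    print(n, normalized_max_cdf(x, n), gumbel_cdf(x))

def excess_cdf(z, u):
    """F_u(sigma_u z) with sigma_u = 1: (F(u + z) - F(u)) / (1 - F(u))."""
    return (exp_cdf(u + z) - exp_cdf(u)) / (1.0 - exp_cdf(u))

print(excess_cdf(1.0, 3.0), 1.0 - math.exp(-1.0))
```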

Example 2: Pareto-type tail

Suppose $1 - F(x) \sim c x^{-\alpha}$ as $x \to \infty$, with $c > 0$ and $\alpha > 0$. This form covers the Pareto distribution and also some well-known distributions such as the t and F distributions. Let $b_n = 0$ and $a_n = (nc)^{1/\alpha}$. Then for $x > 0$,

$$F^n(a_n x) \approx \left\{1 - c(a_n x)^{-\alpha}\right\}^n = \left(1 - \frac{x^{-\alpha}}{n}\right)^n \to \exp(-x^{-\alpha}).$$

So, in this case the limiting distribution is Frechet. Now let us look at the threshold form of this result. Let $\sigma_u = ub$, where $b > 0$ is to be determined. Then

$$F_u(\sigma_u z) = \frac{F(u + \sigma_u z) - F(u)}{1 - F(u)} \approx \frac{c u^{-\alpha} - c(u + ubz)^{-\alpha}}{c u^{-\alpha}} = 1 - (1 + bz)^{-\alpha}.$$

Now let $\xi = 1/\alpha$ and set $b = \xi$; the limit distribution is then exactly the generalized Pareto form $1 - (1 + \xi z)^{-1/\xi}$.
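Example 2 can be illustrated with the exact Pareto distribution F(x) = 1 − x^{−α} for x ≥ 1, which has c = 1. The Python sketch below uses a hypothetical tail index α = 2: the normalized maxima F^n(a_n x) approach the Frechet value exp(−x^{−α}), and for this exact power tail the excess distribution with σ_u = uξ coincides with the generalized Pareto form 1 − (1 + ξz)^{−1/ξ}, not just approximately.

```python
import math

ALPHA = 2.0   # hypothetical tail index; the constant c is 1

def pareto_cdf(x):
    """Exact Pareto: F(x) = 1 - x^(-ALPHA) for x >= 1."""
    return 1.0 - x ** (-ALPHA) if x >= 1 else 0.0

def frechet_cdf(x):
    return math.exp(-x ** (-ALPHA)) if x > 0 else 0.0

x = 1.5
for n in (10, 1000, 100_000):
    a_n = n ** (1 / ALPHA)            # a_n = (n c)^(1/alpha), b_n = 0
    print(n, pareto_cdf(a_n * x) ** n, frechet_cdf(x))

xi = 1 / ALPHA
u = 50.0
sigma_u = u * xi                      # sigma_u = u b with b = xi
z = 0.8
excess = (pareto_cdf(u + sigma_u * z) - pareto_cdf(u)) / (1 - pareto_cdf(u))
print(excess, 1 - (1 + xi * z) ** (-1 / xi))
```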

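Example 3: Suppose the upper end point of $F$ is finite, $\omega_F = \omega < \infty$, and $1 - F(\omega - y) \sim c y^{\alpha}$ as $y \downarrow 0$. Let $b_n = \omega$ and $a_n = (nc)^{-1/\alpha}$. Then for $x < 0$,

$$F^n(a_n x + b_n) = F^n(\omega + a_n x) \approx \left\{1 - c(-a_n x)^{\alpha}\right\}^n = \left\{1 - \frac{(-x)^{\alpha}}{n}\right\}^n \to \exp\{-(-x)^{\alpha}\}.$$

The corresponding limit when $x > 0$ is obviously 1. So this is a case of convergence to the Weibull type.

Again, for the threshold version of this result, let $u$ be very close to $\omega$ and consider $\sigma_u = b(\omega - u)$ for $b > 0$ to be determined. Then for $0 < z < 1/b$,

$$F_u(\sigma_u z) = \frac{F(u + \sigma_u z) - F(u)}{1 - F(u)} \approx \frac{c(\omega - u)^{\alpha} - c(\omega - u - \sigma_u z)^{\alpha}}{c(\omega - u)^{\alpha}} = 1 - (1 - bz)^{\alpha}.$$

Setting $\xi = -1/\alpha$ and $b = 1/\alpha$, we get the generalized Pareto form $1 - (1 + \xi z)^{-1/\xi}$.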
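Example 3 can be illustrated with the hypothetical distribution F(x) = 1 − (1 − x)^α on [0, 1], for which ω = 1, c = 1 and 1 − F(1 − y) = y^α exactly. The Python sketch below takes α = 2: with a_n = n^{−1/α} and b_n = 1 the normalized maxima approach the Weibull-type value exp{−(−x)^α}, and for a threshold u close to 1 the excess distribution with σ_u = (1 − u)/α matches the generalized Pareto form with ξ = −1/α.

```python
import math

ALPHA = 2.0   # hypothetical; F(x) = 1 - (1 - x)^ALPHA on [0, 1]

def cdf(x):
    """Distribution with finite upper end point omega = 1 and c = 1."""
    if x <= 0:
        return 0.0
    if x >= 1:
        return 1.0
    return 1.0 - (1.0 - x) ** ALPHA

def weibull_type_cdf(x):
    """exp{-(-x)^ALPHA} for x < 0 and 1 for x >= 0."""
    return math.exp(-(-x) ** ALPHA) if x < 0 else 1.0

x = -0.7
for n in (10, 1000, 100_000):
    a_n = n ** (-1 / ALPHA)          # a_n = (n c)^(-1/alpha), b_n = omega = 1
    print(n, cdf(1.0 + a_n * x) ** n, weibull_type_cdf(x))

xi = -1 / ALPHA
b = 1 / ALPHA
u = 0.99                              # threshold close to omega = 1
sigma_u = b * (1.0 - u)
z = 0.9                               # must satisfy 0 < z < 1/b
excess = (cdf(u + sigma_u * z) - cdf(u)) / (1.0 - cdf(u))
print(excess, 1 - (1 + xi * z) ** (-1 / xi))
```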


CHAPTER IV

SELECTING A MODEL FOR THE PROBLEM

4.1 Background

4.1.1 Filtering the data

This project is based entirely on data obtained from the National Climatic Data Center's web site: the maximum daily temperatures recorded at a fixed station near Lubbock between 1913 and 1964.

The data were originally in Microsoft Excel format. Since all calculations were made in Matlab, the original data were filtered and stored in an array. We call this step filtering because the data in their original form contained letters representing Fahrenheit (F) and some other related terms. Since numbers mixed with letters cannot be processed, we had to filter the data.
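The filtering step can be sketched as follows. The exact layout of the original spreadsheet is not described here, so the cell formats in this Python example (a temperature followed by the letter F, or a non-numeric code such as "M") are assumptions for illustration only; the hypothetical helper keeps the first number found in a cell and returns None when a cell contains no number.

```python
import re

# Matches an optionally signed integer or decimal number.
_NUMBER = re.compile(r"-?\d+(?:\.\d+)?")

def filter_cell(cell):
    """Return the first number in a raw cell such as '98F', or None."""
    match = _NUMBER.search(str(cell))
    return float(match.group()) if match else None

def filter_row(cells):
    """Apply filter_cell to every cell of one spreadsheet row."""
    return [filter_cell(c) for c in cells]

# Hypothetical raw cells: '98F', ' 101 F', a non-numeric code 'M', '87'.
print(filter_row(["98F", " 101 F", "M", "87"]))
```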

4.1.2 Evaluating the data using Matlab/ Matlab Part

Having stored the data in a matrix in Matlab, we now write some Matlab code to explore the data in more detail. Recall that our aim is to fit our data to one of the extreme value models. To do so, we need to know more about our data, for example whether the extreme values are increasing or decreasing, and whether the relative frequency of extreme values is increasing or decreasing.

The Matlab code is commented throughout, so we simply attach it. The following Matlab code computes the extreme exceedances for each year from 1914 to 1962.

A = [x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12 x13 x14 x15 x16 ...
     x17 x18 x19 x20 x21 x22 x23 x24 x25 x26 x27 x28 x29 x30 ...
     x31];
% xj is a column vector holding day j of each month of the record
% (one entry per month, starting with September 1913), so matrix A
% has one row per month and one column per day of the month.
% The '...' continuations keep all 31 vectors in a single row of the
% concatenation; without them each line would start a new row.
% Put the whole data set into one column vector z (month by month,
% day by day), in case we want to graph everything at once
k = 1;
for i = 1:599
    for j = 1:31
        z(k,1) = A(i,j);
        k = k+1;
    end
end
% putting data into one column vector ends here
% Put the data of 1913 (the first 4 months, 4*31 = 124 entries) into
% its own vector. 1913 is not given as a complete year (we only have
% September through December), and including it would distort the
% analysis, so it is taken out of the general data.
for i = 1:124
    y1913(i,1) = z(i,1);
end

% Copy everything after 1913 into z2 (z2 is not used further below)
[a,b] = size(z);
c = 1;
for i = 125:a
    z2(c,1) = z(i,1);
    c = c+1;
end
%=====================================================
% Put the rest of the data into years: column t of B holds the
% 12*31 = 372 entries of one year; t = 1 means 1914, t = 49 means 1962
t = 1;
v = 5;        % row 5 of A is January 1914
rown = 1;
while(t<50)
    for i = v:v+11
        for j = 1:31
            B(rown,t) = A(i,j);
            rown = rown+1;
        end
    end
    rown = 1;
    t = t+1;
    v = v+12;
end
% end putting the data into years (t = 1 means 1914, t = 49 means 1962)

% Now we count the exceedances of the running maximum temperature.
% For each year, runmax starts at the first day's value; every day
% that exceeds runmax becomes a new record, stored in maxmatrix(:,t).
% (runmax is used instead of max to avoid shadowing Matlab's max.)
t = 1;
c = 1;
while(t<50)
    v = B(:,t);
    runmax = v(1,1);
    for i = 1:372
        if(v(i,1)>runmax)
            runmax = v(i,1);
            maxmatrix(c,t) = runmax;
            c = c+1;
        end
    end
    % c is now (number of new records) + 1, i.e. the number of
    % records counting the first day of the year
    maxcountholder(t,1) = c;
    t = t+1;
    c = 1;
end
% End of calculating the exceedance of the maximum temperature
x = 1914;
y = 1;
for i = 1:49
    maxcountholder(i,2) = x;
    x = x+1;
    Y(i,1) = y;
    y = y+1;
end
% maxcountholder stores the number of exceedances in a year in its
% first column and the corresponding year in its second column. So,
% it tells us how many maximum exceedances occurred in each year
% between 1914 and 1962.

z4 = maxcountholder(:,1);
%plot(z4)
X = z4(:,1);
% Collect all nonzero record values of maxmatrix into one column z3
k = 1;
for i = 1:size(maxmatrix,1)
    for j = 1:49
        if(maxmatrix(i,j)~=0)
            z3(k,1) = maxmatrix(i,j);
            k = k+1;
        end
    end
end
% maxmatrix stores the record values that each year had. It is closely
% related to the vector maxcountholder. For example, for 1962
% maxcountholder has 5 and maxmatrix has [94 97 98 99]'. This means
% that in 1962 there were 5 exceedances (counting the first day), and
% each of these values exceeded all previous values of that year.
% Annual maximum temperatures; the '...' keeps this a single row, so
% the transpose gives a column vector
annualmax = [102 100 103 105 103 101 99 102 103 102 108 106 ...
             102 102 103 103 106 105 102 108 106 108 108 105 ...
             101 109 109 100 105 105 107 107 104 105 105 100 ...
             104 106 105 108 103 103 104 103 106 104 100 100]';
%=========== end of the program ================================

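Running the Matlab code above gives, for each year, the successive values by which the daily maximum temperature exceeded all earlier values of that year. These exceedance values are shown in the table below.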
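For readers without Matlab, the per-year logic of the loop above can be mirrored in a few lines of Python. This is our own transcription, assuming one year's daily maxima are given as a list: it returns the values that exceed every earlier value of the year (the entries stored in maxmatrix) together with the count stored in maxcountholder, which is one more than the number of such values because the first day counts as the initial record. The daily values below are hypothetical.

```python
def count_records(values):
    """Mimic one year of the Matlab exceedance loop.

    Returns (count, new_records): new_records are the values exceeding
    every earlier value of the year, and count = len(new_records) + 1.
    """
    running_max = values[0]
    new_records = []
    for v in values[1:]:
        if v > running_max:      # strict inequality, as in the Matlab code
            running_max = v
            new_records.append(v)
    return len(new_records) + 1, new_records

# Hypothetical daily maxima for one year (deg F).
print(count_records([88, 94, 90, 97, 97, 98, 85, 99]))
```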

4.1.3 Table of extreme value exceedances

1914 1915 1916 1917 1918 1919 1920 1921 1922 1923

70 59 69 68 68 68 77 76 36 71

74 66 74 69 75 79 80 81 60 82

76 67 75 70 76 81 82 82 71 83

79 70 78 74 79 84 83 83 74 87

87 72 81 75 84 86 84 86 77 88

92 76 82 77 85 87 86 88 79 90

95 81 85 86 89 90 95 91 83 94

96 83 93 88 92 93 99 92 87 95

99 84 96 91 95 95 0 94 90 102

102 86 97 93 96 98 0 97 93 0

0 88 99 94 98 99 0 98 94 0

0 91 102 98 99 101 0 102 95 0

0 92 103 100 100 0 0 0 100 0

0 98 0 105 102 0 0 0 101 0

0 99 0 0 103 0 0 0 103 0

0 100 0 0 0 0 0 0 0 0

0 100 0 0 0 0 0 0 0 0

The table above lists, for each year from 1914 to 1923, the successive maximum temperature exceedance values (in degrees Fahrenheit); 0 marks an empty entry. The table continues below for 1924 to 1962.

Table 4.3.1 cont’d

1924 1925 1926 1927 1928 1929 1930 1931 1932 1933

74 74 71 51 89 73 78 84 88 78

76 76 73 67 92 83 83 86 91 86

80 87 77 72 97 85 86 92 92 88

82 89 80 73 99 86 93 97 94 92

84 92 85 77 101 92 94 98 100 95

86 100 87 78 103 93 95 99 101 98

88 101 90 80 0 94 100 102 102 100

91 102 93 81 0 96 103 105 0 101

95 106 98 84 0 98 106 0 0 102

99 0 99 85 0 101 0 0 0 104

100 0 100 88 0 103 0 0 0 105

104 0 102 90 0 0 0 0 0 107

105 0 0 91 0 0 0 0 0 108

107 0 0 92 0 0 0 0 0 0

108 0 0 95 0 0 0 0 0 0

0 0 0 98 0 0 0 0 0 0

0 0 0 100 0 0 0 0 0 0

0 0 0 101 0 0 0 0 0 0

0 0 0 102 0 0 0 0 0 0

Table 4.3.1 cont’d

1934 1935 1936 1937 1938 1939 1940 1941 1942 1943

85 91 98 81 99 92 96 85 100 99

86 95 100 93 100 98 98 90 101 101

88 101 101 97 101 99 101 92 103 104

93 108 108 99 0 102 107 93 105 105

95 0 0 100 0 104 109 95 0 0

96 0 0 103 0 106 0 98 0 0

99 0 0 105 0 109 0 100 0 0

105 0 0 0 0 0 0 0 0 0

106 0 0 0 0 0 0 0 0 0

1944 1945 1946 1947 1948 1949 1950 1951 1952 1953

95 92 91 94 80 99 87 95 92 104

96 95 98 97 88 100 96 96 95 108

98 97 100 98 100 0 98 100 96 0

100 100 103 100 101 0 99 105 99 0

101 101 104 102 103 0 100 106 100 0

103 103 0 105 105 0 104 0 101 0

104 107 0 0 0 0 0 0 105 0

107 0 0 0 0 0 0 0 0 0

1954 1955 1956 1957 1958 1959 1960 1961 1962

99 95 96 101 96 90 100 93 94

100 98 97 102 100 98 0 94 97

101 102 99 103 105 99 0 99 98

102 103 100 0 106 103 0 100 99

103 0 104 0 0 104 0 0 0

Figure 4.1: Frequency of extreme exceedances (horizontal axis: year index, where 1 corresponds to 1914; vertical axis: number of maximum temperature exceedances in the year)

4.2 Picking a model

So far we have discussed the background of our work: we filtered the data and then calculated the frequency of exceedances of maximum values. Now it is time to pick a model that best describes the distribution of our data. As stated by Theorem 3.1 in Chapter 3, any nondegenerate limiting distribution of normalized maxima must be of one of three types:

1. Gumbel

2. Frechet

3. Weibull

We pick a model first by looking at the graph of the data, mainly its tail, and then perform a nonlinear regression to check whether the choice was correct. The graph of the annual number of maximum exceedances is given in Figure 4.1. Looking at the graph closely, we see that it can be modeled by the Frechet distribution family.

4.2.1 Picking Frechet distribution type

Suppose $F$ has a tail of power-law form,

$$1 - F(x) \sim c\,x^{-\alpha} \quad \text{as } x \to \infty, \text{ with } c > 0 \text{ and } \alpha > 0.$$

This form covers the Pareto distribution as well as the t and F distributions. [7]

Define the scaling constants $a_n = (nc)^{1/\alpha}$ and $b_n = 0$, and renormalize. Then for $x > 0$,

$$F^n(a_n x) \approx \left(1 - c\,(a_n x)^{-\alpha}\right)^n \qquad (4.1)$$

$$F^n(a_n x) \approx \left(1 - \frac{x^{-\alpha}}{n}\right)^n \qquad (4.2)$$

The expression in (4.2) converges to $\exp(-x^{-\alpha})$ as $n \to \infty$ [?]. Thus, the limiting distribution is of Frechet type.

4.3 Least squares regression of the Model

In Figure 4.1, the x coordinate represents time (years) from 1914 to 1962, so x = 1 refers to 1914, and so on. Denote $x_i$ by $t_i$, where $i = 1, \ldots, 49$. The y coordinate gives the number of maximum temperature exceedances; denote $y_i$ by $\beta_i$.

To check how well our data fit the chosen Frechet model, we perform a least squares regression. Substituting the model $\exp(-t_i^{-\alpha})$ into the least squares criterion, we get:

$$f(\alpha) = \frac{1}{2}\sum_{i=1}^{49}\left(\exp(-t_i^{-\alpha}) - \beta_i\right)^2 \qquad (4.3)$$

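Since this problem is nonlinear, we solve it iteratively in Matlab. Specifically, we minimize $f(\alpha)$ by applying Newton's method to the equation $f'(\alpha) = 0$. Recall that Newton's method for an equation $g(k) = 0$ is

$$k_{i+1} = k_i - \frac{g(k_i)}{g'(k_i)}. \qquad (4.4)$$

With $g = f'$, each step is $k_{i+1} = k_i - f'(k_i)/f''(k_i)$, so we need the first and second derivatives of $f(\alpha)$.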

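Differentiating equation (4.3) with respect to $\alpha$ (using Maple), we get

$$f'(\alpha) = \sum_{i=1}^{49}\left(\exp(-t_i^{-\alpha}) - \beta_i\right) t_i^{-\alpha}\,\ln(t_i)\,\exp(-t_i^{-\alpha}) \qquad (4.5)$$

Taking the second derivative of $f(\alpha)$ with respect to $\alpha$ yields:

$$f''(\alpha) = \sum_{i=1}^{49}\left[t_i^{-2\alpha}\left[\ln(t_i)\right]^2\left[\exp(-t_i^{-\alpha})\right]^2\right] - \sum_{i=1}^{49}\left[\left(\exp(-t_i^{-\alpha}) - \beta_i\right)t_i^{-\alpha}\left[\ln(t_i)\right]^2\exp(-t_i^{-\alpha})\right] + \sum_{i=1}^{49}\left[\left(\exp(-t_i^{-\alpha}) - \beta_i\right)t_i^{-2\alpha}\left[\ln(t_i)\right]^2\exp(-t_i^{-\alpha})\right]$$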
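Hand-derived formulas such as (4.5) and the expression for f''(α) are easy to get wrong, so a finite-difference comparison is a cheap safeguard. The Python sketch below is our own transcription, not the thesis's program: it evaluates f, f' and f'' and applies the Newton update α ← α − f'(α)/f''(α). The data in the usage lines are synthetic values generated from the model itself with a known α = 1.3, used only to check the formulas and the iteration.

```python
import math

def model(t, alpha):
    """Frechet-type curve exp(-t^(-alpha)) used in (4.3)."""
    return math.exp(-t ** (-alpha))

def objective(alpha, ts, betas):
    """f(alpha) = 1/2 * sum (exp(-t_i^(-alpha)) - beta_i)^2, eq. (4.3)."""
    return 0.5 * sum((model(t, alpha) - b) ** 2 for t, b in zip(ts, betas))

def derivatives(alpha, ts, betas):
    """Return (f'(alpha), f''(alpha)) from the analytic formulas."""
    d1 = d2 = 0.0
    for t, b in zip(ts, betas):
        p = t ** (-alpha)        # t_i^(-alpha)
        g = math.exp(-p)         # model value
        lt = math.log(t)
        r = g - b                # residual
        d1 += r * p * lt * g
        d2 += (p * lt * g) ** 2 - r * p * lt ** 2 * g + r * p ** 2 * lt ** 2 * g
    return d1, d2

def newton(alpha0, ts, betas, steps=10):
    """Newton's method applied to f'(alpha) = 0."""
    alpha = alpha0
    for _ in range(steps):
        d1, d2 = derivatives(alpha, ts, betas)
        alpha -= d1 / d2
    return alpha

ts = list(range(1, 11))
betas = [model(t, 1.3) for t in ts]   # synthetic data with known alpha = 1.3
print(newton(1.25, ts, betas))
```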

Here we attach the Matlab code for the nonlinear least squares fit discussed above:

% This code performs the nonlinear least squares fit of the data.
% The objective function f(alpha) is equation (4.3); here we look
% only at the numerical part of the solution. The program solves
% f'(alpha) = 0 by Newton's method, using f' from (4.5) and f''.
clear
clc
% t(i) = i is the year index (1 = 1914, ..., 49 = 1962)
t = (1:49)';
a = exp(t);   % computed in the original program; not used below
% vector alfa stores the number of exceedances of the maximum
% temperature in each year between 1914 and 1962
% (the '...' keeps it a single row before the transpose)
alfa = [11 17 14 15 16 13 9 13 16 10 16 10 13 20 7 12 10 9 8 ...
        14 10 5 5 8 4 8 6 8 5 5 9 8 6 7 7 3 7 6 8 3 6 5 6 4 ...
        5 6 2 5 5]';
% k(j) is the j-th Newton iterate for alpha, starting from k(1) = 0.
% v accumulates f'(k(j)) and w accumulates f''(k(j)); for the sake of
% clarity the terms of f'' are split into w1, w2 and w3.
k = zeros(11,1);
for j = 1:10
    v = 0;   % reset the sums at every Newton iteration
    w = 0;
    for i = 1:49
        p = t(i)^(-k(j));         % t_i^(-alpha)
        g = exp(-p);              % model value exp(-t_i^(-alpha))
        L = log(t(i));
        r = g - alfa(i);          % residual
        v = v + r*p*L*g;          % term of f'(alpha), equation (4.5)
        w1 = p^2*L^2*g^2;
        w2 = r*p*L^2*g;
        w3 = r*p^2*L^2*g;
        w = w + w1 - w2 + w3;     % term of f''(alpha)
    end
    k(j+1) = k(j) - v/w;          % Newton step for f'(alpha) = 0
end

Solving the above equation yields $\alpha = 3.3717 \times 10^4$.

4.4 Conclusion

In this thesis, we worked with data on the maximum daily temperatures recorded between 1913 and 1963. Our goal was to use the statistical methodology called Extreme Value Theory to support the idea that global warming is occurring. We applied the Gumbel and Frechet distributions from the block maximum model to our data, and the results we obtained contradict the existence of global warming. Therefore, we conclude that Extreme Value Theory does not work for this type of problem. We think the reasons why Extreme Value Theory methods do not work for this problem might be:

1. The underlying probability distribution is changing over time.

2. There might be some very important factors that we did not consider for this problem.


BIBLIOGRAPHY

[1] Global warming. Retrieved September 8, 2006, from http://www.globalwarming.org, (n.d.).

[2] American Geophysical Union (AGU). Human impacts on climate. Retrieved September 10, 2006, from http://www.agu.org/sci_soc/policy/climate_change_position.html, (n.d.).

[3] World Wildlife Fund. Retrieved October 2, 2006, from http://www.worldwildlife.org, (n.d.).

[4] Smith, R. Lecture notes on environmental statistics. Lecture presented at the University of North Carolina, Chapel Hill, NC. Retrieved September 15, 2006, from http://www.stat.unc.edu/postscript/rs/envnotes.pdf, (n.d.).

[5] Katz, R. Statistics of weather and climate extremes. Retrieved September 3, 2006, from http://www.isse.ucar.edu/extremevalues/extreme.html, (n.d.).

[6] Embrechts, P., Resnick, S., & Samorodnitsky, G. Extreme value theory as a risk management tool. North American Actuarial Journal, 3, 30-41, (1999).

[7] Coles, S. An introduction to statistical modeling of extreme values. New York: Springer, (2001).


PERMISSION TO COPY

In presenting this thesis in partial fulfillment of the requirements for a master’s

degree at Texas Tech University or Texas Tech University Health Sciences Center, I

agree that the Library and my major department shall make it freely available for

research purposes. Permission to copy this thesis for scholarly purposes may be granted

by the Director of the Library or my major professor. It is understood that any copying

or publication of this thesis for financial gain shall not be allowed without my further

written permission and that any user may be liable for copyright infringement.

Agree (Permission is granted.)

Bahtiyar Babanazarov (Student Signature)    11/20/06 (Date)

Disagree (Permission is not granted.)

_______________________ (Student Signature)    ____________ (Date)
