SADC Course in Statistics Importance of the normal distribution (Session 09)

Learning Objectives

At the end of this session you will be able to:

• discuss reasons why the normal probability distribution is important

• state the Central Limit Theorem and its value in approximating Binomial and Poisson probabilities by normal probabilities

• explain how the assumption of normality for a given random variable can be checked

Importance of Normal Distribution

• Many measurements can be closely approximated by the normal distribution, since many variables show normal variation as the result of many minor influences acting up and down

• Data which are not normal can often be transformed so that they follow a normal distribution more closely

• The normal distribution underpins a lot of inference ideas. We have seen that probability statements about any normally distributed variable can be made via the standard normal distribution N(0,1)
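
The last point can be illustrated with a small sketch. The values of mu, sigma and x below are hypothetical and chosen only for illustration; the code standardises x and shows that the probability obtained from N(0,1) agrees with the one computed directly from the original distribution.

```python
from statistics import NormalDist

# Hypothetical example values (not from the session): X ~ N(mu, sigma^2)
mu, sigma = 50.0, 10.0
x = 65.0

# Standardise: Z = (X - mu) / sigma follows N(0, 1)
z = (x - mu) / sigma

# P(X <= x) computed two ways: via N(0,1) and directly from N(mu, sigma^2)
p_via_standard = NormalDist(0, 1).cdf(z)
p_direct = NormalDist(mu, sigma).cdf(x)

print(f"z = {z:.2f}")
print(f"P(X <= {x}) via N(0,1): {p_via_standard:.4f}")
print(f"P(X <= {x}) directly:   {p_direct:.4f}")
```

Both routes give the same probability, which is why a single table (or function) for N(0,1) is enough for any normal variable.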

The Central Limit Theorem (CLT)

• One of the key reasons why the normal distribution is important is the Central Limit Theorem (CLT).

• This theorem states that the mean of a sample of independent observations on any random variable with finite variance has an approximately normal distribution, provided that the sample size is sufficiently large.
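
A rough simulation can make the theorem concrete. This is only a sketch under assumed settings: the exponential distribution with mean 1 (a strongly skewed choice), the sample size n = 40 and the number of repetitions are all illustrative and not taken from the session.

```python
import random
from statistics import mean, stdev

random.seed(1)  # fixed seed so the run is repeatable

def sample_mean(n):
    """Mean of n draws from an exponential distribution with mean 1 (skewed)."""
    return mean(random.expovariate(1.0) for _ in range(n))

n = 40        # sample size (assumed for illustration)
reps = 5000   # number of repeated samples (assumed for illustration)
means = [sample_mean(n) for _ in range(reps)]

centre = mean(means)   # should be close to the population mean, 1
spread = stdev(means)  # should be close to the standard error, 1/sqrt(n)

# If the sample means are roughly N(1, 1/n), about 95% of them
# should lie within 1.96 standard errors of 1.
se = (1.0 / n) ** 0.5
inside = sum(abs(m - 1.0) <= 1.96 * se for m in means) / reps

print(f"mean of sample means: {centre:.3f}")
print(f"sd of sample means:   {spread:.3f} (theory: {se:.3f})")
print(f"share within 1.96 se: {inside:.3f}")
```

Although single exponential observations are far from normal, the sample means behave much as a normal distribution would predict.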

Consequences of the Central Limit Theorem

• Many statistical techniques are based on the assumption that the sample mean follows a normal distribution

• As a consequence of the Central Limit Theorem, this assumption remains reasonable as long as the sample size is large enough, e.g. more than about 30.

• The CLT also implies that the binomial and Poisson probabilities approach the normal probabilities as n becomes large (see below).

Normal approximation to the binomial distribution

• Recall that the form of the binomial distribution for p=0.5 closely resembles the normal distribution

• This is because the binomial probabilities are symmetric when p=0.5

• However, even with p ≠ 0.5, the normal approximation holds for large n because a binomial random variable is the sum of n independent Bernoulli random variables (that is, n times their mean), so the CLT applies
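
As a hedged illustration of the approximation when p is not 0.5, the sketch below compares an exact binomial probability with its normal approximation. The values n = 100, p = 0.3 and k = 35 are illustrative only, and the continuity correction of 0.5 is a common refinement that the slides do not discuss.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 100, 0.3   # illustrative values with p != 0.5
k = 35            # we want P(X <= k)

# Exact binomial probability: sum of C(n, r) p^r (1-p)^(n-r) for r = 0..k
exact = sum(comb(n, r) * p**r * (1 - p)**(n - r) for r in range(k + 1))

# Normal approximation with mean np and variance np(1-p);
# the +0.5 is the usual continuity correction for a discrete variable
approx = NormalDist(n * p, sqrt(n * p * (1 - p))).cdf(k + 0.5)

print(f"exact binomial P(X <= {k}): {exact:.4f}")
print(f"normal approximation:      {approx:.4f}")
```

Repeating the comparison with a small n (say 10) shows a visibly poorer agreement, which is the sense in which the approximation needs n to be large.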

Normal approximation to the Poisson distribution

• Recall from the previous session (slides 8-12) that as the Poisson parameter λ becomes large, the shape of the Poisson distribution becomes bell-shaped and symmetrical

• This is again a consequence of the CLT, since λ is the mean of the Poisson distribution

More formally…

If $\bar{X}$ is an average of a series of n Bernoulli random variables (0,1 variables), then

$$Z = \frac{\bar{X} - p}{\sqrt{p(1-p)/n}}$$

has a normal distribution with mean 0 and variance 1 (standard normal) when the sample size n is large.

Note that $\bar{X} = r/n$, where r = number of successes in n trials, i.e. r is a binomial random variable.
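
A small worked example may help here. It standardises a sample proportion r/n by subtracting the success probability p and dividing by the square root of p(1-p)/n. The numbers n = 200, p = 0.4 and r = 92 are hypothetical and used only to show the arithmetic.

```python
from math import sqrt
from statistics import NormalDist

n = 200   # number of trials (hypothetical)
p = 0.4   # assumed true success probability (hypothetical)
r = 92    # observed number of successes (hypothetical)

x_bar = r / n                             # sample proportion, 0.46
z = (x_bar - p) / sqrt(p * (1 - p) / n)   # standardised value

# Approximate probability of a proportion at least this large,
# read off from the standard normal distribution
upper_tail = 1 - NormalDist().cdf(z)

print(f"x_bar = {x_bar:.2f}, z = {z:.3f}")
print(f"approximate P(Z >= z) = {upper_tail:.4f}")
```

Here z works out to about 1.73, so a proportion of 0.46 or more would be fairly unusual if p were really 0.4.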

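and further …

The same result is true for the Poisson average, i.e. Z defined below can be approximated by the standard normal distribution for large values of λ.

$$Z = \frac{\bar{Y} - \lambda}{\sqrt{\lambda/n}}$$

Here $\bar{Y}$ is the average of n Poisson random variables, each with mean λ.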
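
The Poisson case can be checked numerically in the same spirit. The sketch below treats a single Poisson count (n = 1) with an illustrative mean of 50 and compares the exact cumulative probability with the normal approximation that uses mean λ and variance λ. The name `poisson_cdf` is a helper written for this example, and the continuity correction of 0.5 is an added refinement not mentioned in the slides.

```python
from math import exp, sqrt
from statistics import NormalDist

lam = 50.0   # Poisson mean (illustrative)
k = 55       # we want P(Y <= k)

def poisson_cdf(k, lam):
    """Exact P(Y <= k) for a Poisson variable with mean lam."""
    term = exp(-lam)   # P(Y = 0)
    total = term
    for y in range(1, k + 1):
        term *= lam / y   # P(Y = y) obtained from P(Y = y - 1)
        total += term
    return total

exact = poisson_cdf(k, lam)

# Normal approximation with mean lam and variance lam,
# using a continuity correction of 0.5
approx = NormalDist(lam, sqrt(lam)).cdf(k + 0.5)

print(f"exact Poisson P(Y <= {k}): {exact:.4f}")
print(f"normal approximation:     {approx:.4f}")
```

With a small mean such as 2 the two values differ noticeably more, in line with the requirement that the Poisson mean be large.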

Checking for normality

Thus the normal distribution plays an important role in statistics.

Most of the techniques covered in Modules H2 and H8 are based on assuming that the key response of interest follows a normal distribution.

We therefore need to be able to check whether measurements on a given random variable follow a normal distribution.

This is done by producing a normal probability plot.

Statistics software packages generally have a facility for producing this plot.

Below is the plot for maize cob weights. In this plot, the Y-axis corresponds to values you would expect from an actual normal distribution. The X-axis corresponds to your data.

This implies that points lying close to a straight line indicate that the normality assumption is reasonable.

What do you deduce from the graph below?

[Figure: Normal Probability Plot of maize cob weights]
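
For readers who want to see what a package does behind the scenes, here is a minimal sketch of how the points of a normal probability plot can be computed. The data are simulated for illustration (they are not the maize cob weights), the plotting positions (i - 0.5)/n are one common convention among several, and the correlation between the two axes is used here only as a rough numerical summary of straightness.

```python
import random
from math import sqrt
from statistics import NormalDist, mean

def normal_plot_points(data):
    """Return (sorted data, expected normal scores) for a normal probability plot."""
    xs = sorted(data)
    n = len(xs)
    # Plotting positions (i - 0.5) / n are one common choice; packages differ
    scores = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return xs, scores

def correlation(a, b):
    """Pearson correlation; values near 1 mean the points lie near a straight line."""
    ma, mb = mean(a), mean(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / sqrt(saa * sbb)

random.seed(9)  # fixed seed so the run is repeatable
normal_data = [random.gauss(100, 15) for _ in range(200)]        # simulated, normal
skewed_data = [random.expovariate(1 / 100) for _ in range(200)]  # simulated, skewed

r_normal = correlation(*normal_plot_points(normal_data))
r_skewed = correlation(*normal_plot_points(skewed_data))

print(f"straightness for normal data: {r_normal:.3f}")
print(f"straightness for skewed data: {r_skewed:.3f}")
```

Plotting the sorted data against the expected normal scores gives the plot itself; the skewed sample bends away from a straight line, which shows up as a lower correlation.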
