Lecture 7

THE NORMAL AND STANDARD NORMAL DISTRIBUTIONS

Populations and samples

• When we gather data, the POPULATION is the reference set containing ALL POSSIBLE OBSERVATIONS (ALL scores, ALL reaction times, ALL IQs).

• Our own data are usually a selection or SAMPLE from the population.

• In statistics, our data are assumed to be samples from known THEORETICAL populations.

Distribution of 4000 IQs

A sample from a population

• This is a picture of the distribution of 4000 IQs.

• The histogram is symmetrical and bell-shaped.

• That’s because we have sampled from a NORMAL DISTRIBUTION.

• The normal distribution is the most important theoretical population in statistics.

Large samples

• From the Laws of Large Numbers, we can expect the values of sample statistics to be close to those of the corresponding population parameters.

• The mean of this large sample is 99.9 and the SD is 14.96. These values are quite close to the population mean of 100 and the population SD of 15.
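The Law of Large Numbers claim can be checked with a small simulation. This is a minimal sketch, assuming Python's standard `random` and `statistics` modules stand in for the lecture's data file: it draws 4000 hypothetical IQs from a normal population with mean 100 and SD 15 and computes the sample mean and SD. The exact values depend on the seed, so they will not reproduce the 99.9 and 14.96 quoted above, but they should land close to 100 and 15.

```python
import random
import statistics

# Hypothetical data: 4000 simulated IQs drawn from a normal population
# with mean 100 and SD 15 (not the lecture's actual data file).
random.seed(7)
iqs = [random.gauss(100, 15) for _ in range(4000)]

sample_mean = statistics.mean(iqs)   # sample statistic estimating the population mean
sample_sd = statistics.stdev(iqs)    # sample SD, using the n - 1 divisor

print(f"M = {sample_mean:.2f}, s = {sample_sd:.2f}")
```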

It makes sense to say that, if IQ is normally distributed with a mean of 100 and an SD of 15, we have sampled 4000 values from a normal distribution.

‘Population’ means ‘distribution’

• In these lectures, I shall use the terms ‘population’ and ‘distribution’ interchangeably.

Statistics versus parameters

• STATISTICS are characteristics of SAMPLES; PARAMETERS are characteristics of POPULATIONS.

• A normal population has two parameters:

1. the mean;

2. the standard deviation.

• The IQ population is (approximately) a normal distribution with a mean of 100 and an SD of 15.

A notational convention

• Letters from the Roman alphabet, such as M and s (for the mean and standard deviation, respectively), are used to denote the values of the STATISTICS of samples.

• Greek letters (μ, σ) are used to denote the values of the corresponding population characteristics or PARAMETERS.

The IQ example

• In this particular sample of 4000 IQs, M = 99.9 and s = 14.96.

• In the population, μ = 100 and σ = 15.

The caffeine experiment

• In the caffeine experiment, we are sampling from TWO populations:

1. the population of scores under the Placebo condition with mean μ1 and standard deviation σ1;

2. the population of scores under the Caffeine condition with mean μ2 and standard deviation σ2.

Specifying a normal distribution

• Suppose that a variable X has a normal distribution with mean μ and standard deviation σ.

• We write this as X ~ N(μ, σ).

There are many normal distributions

• There is an infinite number of normal distributions, each specified by particular values of μ and σ.

• IQ is approximately distributed as N(100, 15).

• The heights of men are approximately distributed as N(69, 2.6).

IQs of at least 130

• Suppose IQ has a normal distribution, with a mean of 100 and a standard deviation of 15.

• What proportion of people have IQs of at least 130?

At least 130? …

• If a variable is normally distributed, 95% of values lie within 1.96 standard deviations (2 approx.) on EITHER side of the mean.

• So only 2 ½ % (0.025) of values lie more than 1.96 SDs (about 2 SDs) above the mean.

[Figure: normal curve with 0.95 (95%) of the area between mean – 1.96×SD and mean + 1.96×SD, and 2 ½ % (.025) in each tail.]
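The 1.96 rule can be confirmed numerically. This sketch assumes Python 3.8 or later, whose standard-library `statistics.NormalDist` provides the normal cumulative distribution function; it is an illustration, not the lecture's SPSS method.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

# Area between -1.96 and +1.96 standard deviations of the mean
central = std_normal.cdf(1.96) - std_normal.cdf(-1.96)
# Area in the upper tail beyond +1.96
upper_tail = 1 - std_normal.cdf(1.96)

print(f"central area = {central:.4f}, upper tail = {upper_tail:.4f}")
# prints: central area = 0.9500, upper tail = 0.0250
```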

At least 130? …

• An IQ of 130 is 2 standard deviations above the mean.

• So only about 2 ½ % (0.025) of IQs lie beyond 130.


Probability

• A PROBABILITY is a measure of likelihood ranging from 0 (an impossibility) to 1 (a certainty).

• In the POPULATION, relative frequencies are probabilities.

Relative frequency as an area

[Figure: relative frequency of heights between 65 inches and 70 inches.]

Probability

• In the POPULATION, relative frequencies are PROBABILITIES.

• The area under the normal curve of height between 65 inches and 70 inches is the PROBABILITY of a height in that range.

Probability as an area

[Figure: probability of a height between 65 inches and 70 inches.]

IQ and probability

• If IQ is indeed normally distributed with a mean of 100 and an SD of 15, approximately 2.5% of values in the population are greater than 130.

• The PROBABILITY of an IQ of at least 130 is therefore approximately 0.025.

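[Figure: normal curve of IQ centred on 100, with 0.95 of the area in the central region and the area beyond 130, the probability of an IQ greater than 130, equal to 0.025.]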
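The 0.025 figure relies on treating 2 SDs as if they were 1.96 SDs. A sketch using Python's standard-library `statistics.NormalDist` (an assumption; the lecture itself uses SPSS) gives the exact upper-tail area above 130 and the IQ that actually cuts off the top 2.5%.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)   # assumed IQ population, N(100, 15)

# Pr(IQ > 130): 130 is exactly 2 SDs above the mean, not 1.96
p_exact = 1 - iq.cdf(130)

# The IQ that leaves exactly 2.5% of the population in the upper tail
cutoff = iq.inv_cdf(0.975)

print(f"Pr(IQ > 130) = {p_exact:.4f}")      # about 0.0228
print(f"97.5th percentile = {cutoff:.1f}")  # about 129.4
```

The difference between 0.0228 and 0.025 is small, which is why the rounded rule of thumb is usually good enough for quick estimates.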

Notation

• Intelligence is assumed to have a CONTINUOUS DISTRIBUTION: there is an infinite number of values between any two points.

• So the probability of any one exact value is zero.

• Consequently Pr(IQ ≥ 130) = Pr(IQ > 130) and Pr(IQ ≤ 130) = Pr(IQ < 130).

Probability density

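• Associated with each value x of IQ is a PROBABILITY DENSITY, which can be thought of as indicating how likely a value IN THE NEIGHBOURHOOD of x is: multiplied by the width of a narrow interval around x, it gives approximately the probability of a value in that interval.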

• The height of the normal curve above the value x is the probability density of x.
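The link between density and probability can be illustrated numerically. This sketch assumes Python's `statistics.NormalDist` and an arbitrarily chosen interval one IQ point wide around x = 100: the height of the curve times the width of the interval comes out very close to the exact area over that interval.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)

x = 100
width = 1.0   # a narrow neighbourhood of x: from 99.5 to 100.5

density = iq.pdf(x)   # height of the normal curve above x
area = iq.cdf(x + width / 2) - iq.cdf(x - width / 2)   # probability of the interval

# For a narrow interval, density * width approximates the interval's probability
print(f"density = {density:.5f}, density * width = {density * width:.5f}, area = {area:.5f}")
```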

Probability distribution

• The normal distribution is an example of a PROBABILITY DISTRIBUTION.

• It is so called because we can use it to obtain the probability of values of the variable within a specified range.

• There are several important probability distributions in statistics, and they are all used for this purpose.

The standard normal variable z

• To find out how many standard deviations an IQ of 130 is above the mean, we have to SUBTRACT the mean and DIVIDE by the value of the standard deviation, i.e., by 15.

• If X is the original variable (X is IQ in this example), we have transformed X into another variable, z = (X – μ)/σ, which is known as the STANDARD NORMAL VARIABLE.
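The subtract-the-mean, divide-by-the-SD step can be written as a small helper. This is a minimal sketch; the function name `z_score` is hypothetical, chosen only for illustration.

```python
def z_score(x: float, mu: float, sigma: float) -> float:
    """Number of standard deviations x lies above (+) or below (-) the mean mu."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma


# An IQ of 130 in a N(100, 15) population
print(z_score(130, 100, 15))   # 2.0
```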

The standard normal variable z

• If X is a normal variable, that is, X ~ N(μ, σ), z will also be normally distributed.

• z is known as the STANDARD NORMAL VARIABLE.

Standardisation

• Strictly speaking, z is defined in relation to the theoretical population mean μ and standard deviation σ.

• However, any set of scores X can be STANDARDISED by subtracting the sample mean from each score and dividing by the sample standard deviation.

• We shall investigate the effects of standardising the 4000 IQ scores in our large sample.
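Standardising a sample can be sketched outside SPSS as well. This example assumes Python's standard library and uses a simulated stand-in for the 4000 IQ scores (hypothetical data, not the lecture's file). After subtracting the sample mean and dividing by the sample SD, the mean of z is zero up to floating-point rounding and its SD is 1.

```python
import random
import statistics

random.seed(1)
scores = [random.gauss(100, 15) for _ in range(4000)]   # hypothetical IQs

m = statistics.mean(scores)
s = statistics.stdev(scores)   # sample SD (n - 1 divisor)

# Standardise: subtract the sample mean, divide by the sample SD
z = [(x - m) / s for x in scores]

print(f"mean of z = {statistics.mean(z):.2e}, SD of z = {statistics.stdev(z):.4f}")
```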

Sample distribution of 4000 IQs

The distribution of X

• This is the sample distribution of X, which is centred on 99.9 and has a standard deviation of 14.96 IQ points.

The Compute Variable procedure

Transforming IQ to z

Distribution of z

• The distribution of z is also normal, but it is centred on zero and has a standard deviation of 1.

Scientific notation

-1.4016E-4 means -1.4016 × 10⁻⁴ = -.00014016, which is zero within rounding error.
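Python reads and writes the same E notation, which gives a quick way to check the conversion above. This is only an illustration in Python, not part of the SPSS workflow.

```python
value = -1.4016E-4   # the E-notation value quoted above

print(f"{value:.8f}")   # -0.00014016: the same number in fixed-point form
print(f"{value:.3f}")   # -0.000: zero when rounded to three decimal places
```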

The statistics of z

• Just use the Descriptives procedure to find the mean and standard deviation of z.

• The mean is 0.

• The standard deviation is 1.

Effects of standardisation

Standardising a set of scores (or a population of scores) has two effects:

1. The mean becomes zero;

2. The standard deviation becomes 1.

The standard normal distribution

• In the notation I introduced earlier, we can represent the standard normal distribution as z ~ N(0, 1).

Distribution of z

• Standardising a set of scores does NOT make them normally distributed.

• If there’s a tail to the right (+ve skew) before transforming X to z, there will be one after the transformation.

• Nevertheless, whatever the shape of the original distribution, the mean of the standardised scores will be zero and their standard deviation will be 1.
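The claim that standardising keeps the shape of a distribution can be checked with skewed data. This sketch assumes Python's standard library, uses exponential draws as hypothetical positively skewed scores (for example, reaction times), and defines a hypothetical `skewness` helper using the moment formula. The skewness is identical before and after standardising, while the mean and SD change to 0 and 1.

```python
import random
import statistics


def skewness(data):
    """Moment coefficient of skewness: mean cubed deviation divided by SD cubed."""
    m = statistics.mean(data)
    sd = statistics.pstdev(data)
    return sum((x - m) ** 3 for x in data) / (len(data) * sd ** 3)


random.seed(3)
# Hypothetical positively skewed scores: exponential with mean 1
x = [random.expovariate(1.0) for _ in range(5000)]

m, s = statistics.mean(x), statistics.stdev(x)
z = [(xi - m) / s for xi in x]

print(f"skewness of X = {skewness(x):.3f}")
print(f"skewness of z = {skewness(z):.3f}")   # unchanged by standardising
print(f"mean of z = {statistics.mean(z):.2e}, SD of z = {statistics.stdev(z):.3f}")
```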

Deviations sum to zero

[Figure: points on either side of the mean, labelled -ve deviations, zero deviations and +ve deviations.]

The mean is the centre of gravity, or balance point. The deviations are the distances of the points from the balance point. They must sum to zero: the positives and negatives must cancel each other out.
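A tiny numerical check of the balance-point property, using an illustrative set of five scores (hypothetical, not from the lecture): the negative and positive deviations cancel exactly.

```python
scores = [2, 4, 6, 8, 15]            # small illustrative set of scores
m = sum(scores) / len(scores)        # the mean: 7.0
deviations = [x - m for x in scores]

print(deviations)        # [-5.0, -3.0, -1.0, 1.0, 8.0]
print(sum(deviations))   # 0.0
```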

The mean of z

• The numerator of z, (X – mean), is a DEVIATION SCORE.

• Since deviations about the mean sum to zero, the mean of the distribution of z is also zero.

• So the bell-shaped STANDARD NORMAL DISTRIBUTION is centred on zero.

The standard deviation of z
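Why the SD of z is 1 can be seen from a short derivation, sketched here for a sample of n scores in the notation introduced earlier (M for the sample mean, s for the sample SD, with the n – 1 divisor assumed). Because the deviations sum to zero, the mean of z is zero, and dividing every deviation by s divides the variance by s².

```latex
\begin{aligned}
z_i &= \frac{X_i - M}{s} \\
\bar{z} &= \frac{1}{n}\sum_{i=1}^{n} z_i = \frac{1}{ns}\sum_{i=1}^{n}(X_i - M) = 0 \\
s_z^2 &= \frac{1}{n-1}\sum_{i=1}^{n}(z_i - \bar{z})^2
       = \frac{1}{s^2}\cdot\frac{1}{n-1}\sum_{i=1}^{n}(X_i - M)^2
       = \frac{s^2}{s^2} = 1
\end{aligned}
```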

Using z

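[Figure: one normal curve with two aligned scales; 0.95 of the area lies between the outer marked values.]

X (IQ): 70, 100, 130 (130 ≈ 100 + 1.96 SD)

z: -1.96, 0, +1.96

• The probability that X (IQ) is at least 130 is ALSO the probability that z is at least +1.96.

• The probability that X (IQ) lies between 70 and 130 is ALSO the probability that z lies between -1.96 and +1.96.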

Referring questions from X to z

• What is the probability of an IQ of at least 130?

• This asks for the probability that X is at least 130, where X ~ N(100, 15).

• Transform X to z: z = (130 – 100)/15 = 2.

• We know that the probability that z exceeds 1.96 (about 2) is .025.

Between 100 and 130?

• Convert these values to values of z.

• If X = 100, z = 0.

• If X = 130, z = 2.

• Pr(z between 0 and 2) ≈ 0.475 (half of 0.95, treating 2 as 1.96).

[Figure: normal curve centred on μ, with 0.95 (95%) of the area between μ – 1.96×SD and μ + 1.96×SD (z from -1.96 to +1.96) and 2 ½ % (.025) in each tail.]
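The same question can be answered on either scale. This sketch assumes Python's `statistics.NormalDist`; it computes Pr(100 < X < 130) directly on the IQ scale and again after converting to z, and both give about 0.4772. The slightly smaller 0.475 above comes from using the 1.96 approximation.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)
std_normal = NormalDist()   # N(0, 1)

# Directly on the IQ scale
p_x = iq.cdf(130) - iq.cdf(100)
# After converting 100 -> z = 0 and 130 -> z = 2
p_z = std_normal.cdf(2) - std_normal.cdf(0)

print(f"{p_x:.4f} {p_z:.4f}")   # both about 0.4772
```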

Finding the probability of a range of values of X

• In the problems we have considered, the value of z has always been around +2 (about 1.96), so that we can find the probability from memory.

• Suppose z = 1, 0.5, or any value other than 1.96?

• Just standardise the value of X by converting it to z: z = (X – mean)/SD.

• Tables in standard statistics textbooks give the probability of any specified range of z. You can also use SPSS to find such probabilities.
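A miniature version of such a table can be generated in a few lines. This is a sketch in Python using `statistics.NormalDist` (an assumption; it is not the textbook tables or the SPSS procedure), listing the cumulative probability Pr(z ≤ value) and the upper-tail probability Pr(z > value) for a handful of z values.

```python
from statistics import NormalDist

std_normal = NormalDist()   # standard normal distribution: mean 0, SD 1

# A miniature z table: cumulative probability (below) and upper tail (above)
print(f"{'z':>5} {'below':>8} {'above':>8}")
for value in [0.0, 0.5, 1.0, 1.5, 1.96, 2.0, 2.5, 3.0]:
    below = std_normal.cdf(value)
    print(f"{value:>5.2f} {below:>8.4f} {1 - below:>8.4f}")
```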

The standard normal distribution

• There are countless normal distributions.

• But there is only ONE standard normal distribution, to which any of the others can be transformed by z = (X – mean)/SD.

• So only the probabilities of ranges of values of z need to be tabled.

• It would not be feasible to table the probabilities for ALL possible normal distributions.

To sum up …

• If we know the DISTRIBUTION of some variable, we can assign a probability of obtaining a value within a specified range.

• We can visualise the probability of such a value as the area under the curve of the distribution.

• If the distribution is normal, we can translate probability questions in the original units of measurement into questions about ranges of z, which, provided X is normally distributed, has the STANDARD NORMAL DISTRIBUTION.

Percentiles

• A PERCENTILE is the VALUE or SCORE below which a specified percentage or proportion of the distribution lies.

• The 30th percentile is the value below which 30% of the distribution lies.

• The 70th percentile is the value below which 70% of scores lie.
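For a normal distribution, percentiles come from the inverse of the cumulative distribution function. This sketch assumes Python's `statistics.NormalDist` and the N(100, 15) IQ population used earlier; the 30th and 70th percentiles fall symmetrically about the mean, at roughly 92.1 and 107.9.

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)   # assumed IQ population

p30 = iq.inv_cdf(0.30)   # value below which 30% of IQs lie
p70 = iq.inv_cdf(0.70)   # value below which 70% of IQs lie

print(f"30th percentile = {p30:.1f}, 70th percentile = {p70:.1f}")
```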

The 30th and 70th percentiles

[Figure: normal curves marking the 30th percentile, with an area of 0.30 below it, and the 70th percentile, with an area of 0.70 below it.]

Cumulative probability

• The CUMULATIVE PROBABILITY of a particular value is the probability of a value LESS THAN OR EQUAL TO the value.

• The cumulative probability of a value at the 30th percentile is 0.30. The cumulative probability of a value at the 70th percentile is 0.70.


Using cumulative probabilities

Given that height is normally distributed, with a mean of 69 inches and an SD of 2.6 inches, what is the probability of a man having a height between 65 and 70 inches?

[Figure: Pr of height between 65 & 70 = CumProb(70) – CumProb(65), the area below 70 minus the area below 65.]

The cumulative distribution function

Cumulative probability of 65

Cumulative probability of 70

• Name the new target variable and insert the value 70.

• Each cumulative probability will appear in a column whose length is the number of rows in the existing data set.

The cumulative probabilities

• There must be some data in the Editor already.

• SPSS will create the new Target variables you have specified and will enter the cumulative probabilities.

[Figure: CumProb(65) = 0.06 and CumProb(70) = 0.65, so Pr of height between 65 & 70 = 0.65 – 0.06 = 0.59.]
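The same subtraction of cumulative probabilities can be sketched outside SPSS. This example assumes Python's `statistics.NormalDist` and the N(69, 2.6) model for men's heights stated above; it reproduces CumProb(65) ≈ 0.06, CumProb(70) ≈ 0.65 and a difference of about 0.59.

```python
from statistics import NormalDist

height = NormalDist(mu=69, sigma=2.6)   # men's heights in inches, as assumed above

cum_65 = height.cdf(65)   # CumProb(65)
cum_70 = height.cdf(70)   # CumProb(70)
between = cum_70 - cum_65

print(f"CumProb(65) = {cum_65:.2f}, CumProb(70) = {cum_70:.2f}, Pr(65 to 70) = {between:.2f}")
```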

Multiple-choice example

Second example

SPSS exercise 1

• Open the SPSS data file containing 4000 IQs.

• Use the Compute procedure (in the Transform menu) to standardise the scores.

• Use Descriptives to obtain the mean and standard deviation of z.

SPSS exercise 2

• Assuming that height is normally distributed with a mean of 69 inches and an SD of 2.6 inches, what is the probability of a man having a height between 74.2 inches and 76.8 inches?

• Solve by using the CDF to find the cumulative probabilities directly and subtracting.

• Transform the heights to z and compare the cumulative probabilities you obtain with those you obtained using the first approach.
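To check your SPSS answers, the two approaches can be sketched side by side in Python, assuming `statistics.NormalDist` as a stand-in for the SPSS CDF. Approach 1 works on the original inch scale; approach 2 converts the limits to z first. The two probabilities should agree up to floating-point rounding.

```python
from statistics import NormalDist

height = NormalDist(mu=69, sigma=2.6)
std_normal = NormalDist()

lower, upper = 74.2, 76.8

# Approach 1: cumulative probabilities on the original (inches) scale
p_direct = height.cdf(upper) - height.cdf(lower)

# Approach 2: convert the limits to z, then use the standard normal CDF
z_lower = (lower - height.mean) / height.stdev
z_upper = (upper - height.mean) / height.stdev
p_via_z = std_normal.cdf(z_upper) - std_normal.cdf(z_lower)

print(f"z range: {z_lower:.2f} to {z_upper:.2f}")
print(f"direct: {p_direct:.4f}, via z: {p_via_z:.4f}")
```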