Lecture 1 Density curves and the CLT

Quantitative Methods Module I, Gwilym Pryce [email protected]


Page 1: Lecture 1 Density curves and the CLT

Lecture 1: Density curves and the CLT

Quantitative Methods Module I

Gwilym Pryce [email protected]

Page 2: Lecture 1 Density curves and the CLT

Notices:
– Register
– Feedback forms
– Labs:
• Who wants to do the afternoon lab?
• Who wants to do the evening lab?
– Class Reps and Staff Student committee.
– Message:
• those in Business taking the Master Class this week: come to seminar room 3 on the 3rd floor of the Business school at 10.00 on Wednesday. Thanks, Andy Furlong

Page 3: Lecture 1 Density curves and the CLT

Introduction:
In this lecture we introduce some statistical theory.
This theory sometimes seems abstract for an applied quants course:
– it is tempting just to use SPSS, which is a very powerful statistical package, without properly learning statistical theory
– … but a little knowledge is a dangerous thing...


Page 6: Lecture 1 Density curves and the CLT

Quants I (24/09/2005 - v23)
L1: Density Functions & CLT
L2: Calculating z-scores
L3: Introduction to Confidence Intervals
L4: Confidence Intervals for All Occasions
L5: Introduction to Hypothesis Tests
L6: Hypothesis Tests for All Occasions
L7: Relationships between Categorical Variables
L8: Regression


Page 8: Lecture 1 Density curves and the CLT

Aims & Objectives
Aim
• the aim of this lecture is to introduce the concepts that undergird statistical inference
Objectives
– by the end of this lecture students should be able to:
• understand what a density curve is
• understand the principles that allow us to make inferences about the population from samples

Page 9: Lecture 1 Density curves and the CLT

Plan
1. Review of Induction material
2. Density curves & Symmetrical Distributions
3. Normal Distribution
4. Central Limit Theorem

Page 10: Lecture 1 Density curves and the CLT

1. Review of Induction material
1. Measures of Central Tendency
– mean: $\bar{x} = \frac{\sum x_i}{n}$
2. Measures of Spread
– range, standard deviation: $s = \sqrt{\frac{1}{n-1} \sum (x_i - \bar{x})^2}$
– percentiles & outliers
– Symmetric distributions
3. Density curves
4. Distribution of means from repeated samples = central limit theorem.
5. Normal Distribution
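The mean and standard deviation reviewed above can be checked by hand with a few lines of Python. This is only a sketch: the course itself uses SPSS, and the function names and the five data values are invented for illustration.

```python
import math


def sample_mean(values):
    """Arithmetic mean: the sum of the observations divided by n."""
    return sum(values) / len(values)


def sample_sd(values):
    """Sample standard deviation, using the n - 1 divisor."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    x_bar = sample_mean(values)
    squared_deviations = sum((x - x_bar) ** 2 for x in values)
    return math.sqrt(squared_deviations / (n - 1))


# Hypothetical data, invented purely for illustration.
incomes = [12, 15, 18, 20, 35]
x_bar = sample_mean(incomes)
s = sample_sd(incomes)
print(x_bar, s)
```

Note that the single large value (35) pulls the mean above the median (18), which is the kind of skew discussed later in the lecture.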

Page 11: Lecture 1 Density curves and the CLT


2. Density curves: idealised histograms (rescaled so that area sums to one)

Page 12: Lecture 1 Density curves and the CLT

Properties of a density curve
Vertical axis indicates relative frequency over values of the variable X
– Entire area under the curve is 1
– The density curve can be described by an equation
– Density curves for theoretical probability models have known properties

Page 13: Lecture 1 Density curves and the CLT

Area under density curves:
the area under a density curve that lies between two numbers = the proportion of the data that lies between these two numbers:
• e.g. if the area between two numbers x1 and x2 = 0.6, then 60% of the xi lie between x1 and x2
– when the density curve is symmetrical, we make use of the fact that areas under the curve will also be symmetrical
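To make "area = proportion" concrete, the sketch below approximates the area under a density curve numerically. It assumes a standard normal density as the example curve (any density could be substituted); the function names and the choice of the trapezoidal rule are illustrative, not part of the lecture.

```python
import math


def normal_density(x, mean=0.0, sd=1.0):
    """Height of the normal density curve at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def area_under_curve(density, lower, upper, steps=10_000):
    """Approximate the area under `density` between lower and upper
    using the trapezoidal rule."""
    width = (upper - lower) / steps
    total = 0.5 * (density(lower) + density(upper))
    for i in range(1, steps):
        total += density(lower + i * width)
    return total * width


# The whole curve should have area very close to 1.
whole_area = area_under_curve(normal_density, -10.0, 10.0)
# Proportion of values lying between -1 and 1.
middle_area = area_under_curve(normal_density, -1.0, 1.0)
print(whole_area, middle_area)
```

Because the curve is symmetrical about 0, the area between -1 and 0 is half of `middle_area`.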

Page 14: Lecture 1 Density curves and the CLT

Symmetrical Distributions

Mean = median

Areas of segments symmetrical

50% of sample < mean

50% of sample > mean

Page 15: Lecture 1 Density curves and the CLT

Symmetrical Distributions
• If 60% of sample falls between a and b, what % is greater than b?
• What’s the probability of randomly choosing an observation greater than b?
[Figure: symmetrical density curve with 60% of the area marked between a and b.]

Page 16: Lecture 1 Density curves and the CLT

What’s the probability of being less than 6ft tall?
[Figure: density curve of height, with an area of 20% marked and 6ft shown on the height axis.]
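A worked version of the two questions above, under two assumptions that the transcript does not state explicitly: that a and b lie at equal distances either side of the mean, and that the 20% marked on the height figure is the area above 6ft.

```latex
\[
P(X > b) = \frac{1 - P(a < X < b)}{2} = \frac{1 - 0.6}{2} = 0.2
\]
\[
P(\text{height} < 6\,\text{ft}) = 1 - P(\text{height} > 6\,\text{ft}) = 1 - 0.2 = 0.8
\]
```

The first line uses symmetry: the 40% of the area outside (a, b) is split equally between the two tails.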

Page 17: Lecture 1 Density curves and the CLT


3. Normal distribution: 68% and 95% rules

Slide 10 of 13 of Christian’s.
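The 68% and 95% rules say that roughly 68% of a normal distribution lies within one standard deviation of the mean and roughly 95% within two. As a quick check, the sketch below computes these areas with Python's `math.erf`; the helper name `area_within` is made up for this example.

```python
import math


def area_within(k):
    """Area under a normal curve within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2.0))


within_one_sd = area_within(1)  # roughly 0.68
within_two_sd = area_within(2)  # roughly 0.95
print(round(within_one_sd, 4), round(within_two_sd, 4))
```

The result does not depend on the mean or the standard deviation, which is why the same rule applies to every normal distribution.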

Page 18: Lecture 1 Density curves and the CLT

Normal Curves are all related
Infinite number of possible normal distributions
– but they vary only by mean and S.D.
• so they are all related -- just shifted and rescaled versions of each other
A baseline normal distribution has been invented:
– called the standard normal distribution
– has a mean of zero and a standard deviation of one

Page 19: Lecture 1 Density curves and the CLT

[Figure: two histograms of NORM_2 illustrating standardisation: the values a, b and c on the original scale map to za, zb and zc on the z scale.]

Page 20: Lecture 1 Density curves and the CLT

Standard Normal Curve
we can standardise any observation from a normal distribution
– i.e. show where it fits on the standard normal distribution by:
• subtracting the mean from each value and dividing the result by the standard deviation.
• This is called the z-score = standardised value of any normally distributed observation.
$z_i = \frac{x_i - \mu}{\sigma}$
where $\mu$ = population mean and $\sigma$ = population S.D.
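The z-score formula translates directly into code. The sketch below is illustrative only: the function name and the height figures (mean 170 cm, S.D. 10 cm) are hypothetical.

```python
def z_score(x, population_mean, population_sd):
    """Standardise x: subtract the mean, then divide by the standard deviation."""
    if population_sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - population_mean) / population_sd


# Hypothetical example: a height of 185 cm in a population with
# mean 170 cm and standard deviation 10 cm.
z = z_score(185.0, 170.0, 10.0)
print(z)  # 1.5, i.e. one and a half standard deviations above the mean
```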

Page 21: Lecture 1 Density curves and the CLT

• Areas under the standard normal curve between different z-scores are equal to areas between corresponding values on any normal distribution
• Tables of areas have been calculated for each z-score,
– so if you standardise your observation, you can find out the area above or below it.
– But we saw earlier that areas under density functions correspond to probabilities:
• so if you standardise your observation, you can find out the probability of other observations lying above or below it.
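In place of a printed table, the area below a z-score can be computed from the error function. The following is a minimal sketch, not the method used in the course; the function names and the height figures are assumptions made for illustration.

```python
import math


def standard_normal_cdf(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def probability_below(x, mean, sd):
    """Probability that a normally distributed observation falls below x."""
    return standard_normal_cdf((x - mean) / sd)


def probability_above(x, mean, sd):
    """Probability that a normally distributed observation falls above x."""
    return 1.0 - probability_below(x, mean, sd)


# Hypothetical example: heights with mean 170 cm and S.D. 10 cm.
p_below = probability_below(185.0, 170.0, 10.0)
p_above = probability_above(185.0, 170.0, 10.0)
print(round(p_below, 4), round(p_above, 4))
```

Here 185 cm standardises to z = 1.5, so `p_below` is the table entry for 1.5 and `p_above` is the remaining upper-tail area.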

Page 22: Lecture 1 Density curves and the CLT

4. Distribution of means from repeated samples
We have looked at how to calculate the sample mean
What distribution of means do we get if we take repeated samples?

Page 23: Lecture 1 Density curves and the CLT


E.g. Suppose the distribution of income in the population looks like this:

Page 24: Lecture 1 Density curves and the CLT

Then suppose we ask a random sample of people what their income is.
– This sample will probably have a similar distribution of income to the population
• Positive skew: mean is “pulled-up” by the incomes of fat-cat, bourgeois capitalists.
• Since the median is a “resistant measure”, the mean is greater than the median
Then suppose we take a second sample, and then a third; and then compute the mean income of each sample:
– Sample 1: mean income = £20,500
– Sample 2: mean income = £18,006
– Sample 3: mean income = £21,230

Page 25: Lecture 1 Density curves and the CLT

As more samples are taken, normal distribution of mean emerges
[Figure: a sequence of histograms of NORM_2, redrawn as more samples are added.]

Page 26: Lecture 1 Density curves and the CLT

Why the normal distribution is useful:
Even if a variable is not normally distributed, the sampling distribution of its mean will be approximately normally distributed, provided the sample size n is large (rule of thumb: n > 30)
– i.e. some samples will have a mean that is way out of line from the population mean, but most will be reasonably close.
– “Central Limit Theorem”
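The theorem can be seen in a small simulation. The sketch below assumes an exponential distribution with mean 20 as a stand-in for a positively skewed income distribution (in thousands of pounds); that choice, the sample size of 50, the 2000 repetitions and the function names are all illustrative assumptions.

```python
import random
import statistics


def sample_means(draw, sample_size, n_samples, rng):
    """Return the mean of each of n_samples random samples."""
    return [
        statistics.fmean(draw(rng) for _ in range(sample_size))
        for _ in range(n_samples)
    ]


def draw_income(rng):
    # Assumed skewed population: exponential with mean 20.
    return rng.expovariate(1 / 20)


rng = random.Random(1)  # fixed seed so the run is repeatable
means = sample_means(draw_income, sample_size=50, n_samples=2000, rng=rng)

centre = statistics.fmean(means)   # should be close to the population mean, 20
spread = statistics.stdev(means)   # spread of the sample means
# Share of sample means within one standard deviation of their centre;
# for a normal distribution this would be roughly 68%.
within_one = sum(abs(m - centre) <= spread for m in means) / len(means)
print(centre, spread, within_one)
```

Although individual incomes are heavily skewed, the sample means cluster symmetrically around the population mean, with most of them reasonably close to it.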

Page 27: Lecture 1 Density curves and the CLT


– “The Central Limit Theorem is the fundamental sampling theorem. It is because of this theorem (and variations thereof), and not because of nature’s questionable tendency to normalcy, that the normal distribution plays such a key role in our work”

(Bradley & South)

Why….?

Page 28: Lecture 1 Density curves and the CLT

The standard error of the mean...
– When we are looking at the distribution of the sample mean, the standard deviation of this distribution is called the standard error of the mean
• i.e. SE = standard deviation of the sampling distribution.
– but we don’t usually know this
• i.e. if we don’t know the population mean (i.e. mean of all possible sample means), we are unlikely to know the standard error of sample means
– so what can we do?
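The slide leaves this question open. One standard answer, offered here as an assumption rather than something stated on the slide, is to estimate the standard error from a single sample as the sample standard deviation divided by the square root of n. The function name and the data are hypothetical.

```python
import math
import statistics


def estimated_standard_error(sample):
    """Estimate the standard error of the mean as s / sqrt(n)."""
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    return statistics.stdev(sample) / math.sqrt(n)


# Hypothetical sample, invented purely for illustration.
sample = [12, 15, 18, 20, 35]
se = estimated_standard_error(sample)
print(se)
```

The estimate shrinks as n grows, which matches the intuition that means of larger samples vary less from sample to sample.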

Page 29: Lecture 1 Density curves and the CLT

CLT: What about Proportions?
What proportion of 10 catchers were female?
What happens if I repeat the experiment?
– What would the distribution of sample proportions look like?
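Repeating the experiment can be simulated. The sketch below assumes each catcher is female with probability 0.5, independently of the others; that probability, the number of repetitions and the function name are assumptions for illustration, not figures from the lecture.

```python
import random
import statistics


def sample_proportion(n, p_female, rng):
    """Proportion of females among n randomly chosen catchers."""
    females = sum(rng.random() < p_female for _ in range(n))
    return females / n


rng = random.Random(2)  # fixed seed so the run is repeatable
proportions = [sample_proportion(10, 0.5, rng) for _ in range(5000)]

mean_proportion = statistics.fmean(proportions)  # close to the assumed 0.5
sd_proportion = statistics.stdev(proportions)    # spread of the sample proportions
print(mean_proportion, sd_proportion)
```

With only 10 catchers per experiment the sample proportion can only take the values 0, 0.1, ..., 1, but across many repetitions the values pile up around the assumed population proportion in a roughly bell-shaped pattern.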

Page 30: Lecture 1 Density curves and the CLT

Editing syntax files:
1. Start with an asterisk:
– Use *blah blah blah. to put headings in syntax
• anything between “ * ” and the closing full stop is ignored by SPSS.
• Important way of keeping your syntax files in order
• e.g.
*Descriptive Statistics on Income.
*---------------------------------.
2. Forward slash and an asterisk:
– Use /*blah blah blah */ to comment on lines
• Anything between /* and */ is ignored by SPSS.
• E.g.
COMPUTE z = x + y. /*Compute total income*/