RSS Hypothesis testing



Hypothesis testing by Dr. O. Yusuf as part of the 5th Research Summer School - Jeddah at KAIMRC - WR

Transcript of RSS Hypothesis testing

Page 1: RSS Hypothesis testing

Sampling Distributions, Standard Error, Confidence Interval

Oyindamola Bidemi YUSUF KAIMRC-WR


SAMPLE

Why do we sample?

Note: information in a sample may not fully reflect what is true in the population

We have introduced sampling error by studying only some of the population

Can we quantify this error?


SAMPLING VARIATIONS

Taking repeated samples: it is unlikely that the estimates would be exactly the same in each sample

However, they should be close to the true value

By quantifying the variability of these estimates, the precision of the estimate is obtained.

Sampling error is thereby assessed.
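As a rough illustration of sampling variation, the sketch below draws a few random samples from a made-up population and compares each sample mean with the true mean. The population (10,000 simulated systolic blood pressure values), the sample size of 50 and the seed are all assumptions chosen for the example, not values from the slides.

```python
import random
import statistics

random.seed(0)  # fixed seed so the illustration is reproducible

# Hypothetical population: 10,000 systolic blood pressure values (mm Hg).
population = [random.gauss(120, 15) for _ in range(10_000)]
true_mean = statistics.mean(population)

# Draw five independent random samples of 50 people and estimate the mean from each.
sample_means = [
    statistics.mean(random.sample(population, 50)) for _ in range(5)
]

print(f"True population mean: {true_mean:.1f}")
for i, m in enumerate(sample_means, start=1):
    print(f"Sample {i} mean: {m:.1f} (error {m - true_mean:+.1f})")

# The spread of the estimates is a simple way to quantify the sampling error.
spread = statistics.stdev(sample_means)
print(f"SD of the five sample means: {spread:.2f}")
```

Each run of five samples gives slightly different estimates, but they stay close to the true mean; their spread is what the later slides call the standard error.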


SAMPLING DISTRIBUTIONS

Distribution of sample estimates

- Means
- Proportions
- Variance

Take repeated samples and calculate estimates

Distribution is approximately normal


Mathematicians have examined the distribution of these sample estimates and their results are expressed in the central limit theorem


Central limit theorem

For sufficiently large samples, sampling distributions are approximately normally distributed regardless of the nature of the variable in the parent population

The mean of the sampling distribution is equal to the true population mean

Mean of sample means is an unbiased estimate of the true population mean

The standard deviation (SD) of the sampling distribution is directly proportional to the population SD and inversely proportional to the square root of the sample size
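These statements can be checked with a small simulation. The sketch below assumes a strongly skewed parent population (an exponential distribution with mean 1 and SD 1), a sample size of 40 and 5,000 repeated samples; all of these are illustrative choices, not values from the slides.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the illustration is reproducible

n = 40              # size of each sample (assumed)
num_samples = 5000  # number of repeated samples (assumed)

# Parent population: exponential with rate 1, which is right-skewed
# and has mean 1 and standard deviation 1.
pop_mean = 1.0
pop_sd = 1.0

sample_means = [
    statistics.mean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(num_samples)
]

mean_of_means = statistics.mean(sample_means)
sd_of_means = statistics.stdev(sample_means)
expected_se = pop_sd / math.sqrt(n)

# Share of sample means lying within 1.96 standard errors of the population mean.
within = sum(
    abs(m - pop_mean) <= 1.96 * expected_se for m in sample_means
) / num_samples

print(f"Mean of sample means: {mean_of_means:.3f} (population mean {pop_mean})")
print(f"SD of sample means:   {sd_of_means:.3f} (population SD / sqrt(n) = {expected_se:.3f})")
print(f"Share within 1.96 SE: {within:.1%}")
```

Even though the parent population is far from normal, the mean of the sample means should sit close to 1, their SD close to 1/√40, and roughly 95% of them within 1.96 standard errors of the population mean.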


SUMMARY: DISTRIBUTION OF SAMPLE ESTIMATES

NORMAL

Mean = True population mean

Standard deviation = Population standard deviation divided by square root of sample size

This standard deviation is called the standard error


ESTIMATION

A major purpose or objective of health research is to estimate certain population characteristics or phenomena

The characteristic or phenomenon can be quantitative, such as average SYSTOLIC BLOOD PRESSURE of adult men, or qualitative, such as proportion with MALNUTRITION

The estimate can be a POINT or an INTERVAL ESTIMATE


Point estimates

A parameter is a value describing a population, e.g. a mean or a proportion

We estimate the value of a parameter using data collected from a sample

This estimate is called a sample statistic and is a POINT ESTIMATE of the parameter, i.e. it takes a single value


STANDARD ERROR

Used to describe the variability of sample means

Depends on the variability of individual observations and the sample size

Relationship described as: Standard error = Standard deviation / Square root of sample size
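A minimal Python sketch of this relationship follows. The helper name standard_error and the cholesterol readings are made up for illustration; the sample SD is used as the estimate of the population SD, which is the usual practice when the population SD is unknown.

```python
import math
import statistics


def standard_error(sd: float, n: int) -> float:
    """Standard error of the mean: standard deviation / square root of sample size."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    return sd / math.sqrt(n)


# Hypothetical cholesterol readings (mmol/L) from nine subjects.
readings = [4.8, 5.2, 5.9, 4.4, 6.1, 5.0, 5.5, 4.9, 5.3]

sd = statistics.stdev(readings)         # variability of individual observations
se = standard_error(sd, len(readings))  # variability of the sample mean

print(f"Mean = {statistics.mean(readings):.2f}, SD = {sd:.2f}, SE = {se:.2f}")
```

With nine observations the SE is one third of the SD, since the square root of 9 is 3.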


[Diagram: Sample 1, Sample 2, Sample 3, ..., Sample n each give a mean; together these give the mean of the means]

The mean of the means will also have a standard deviation = SE (standard error)


Standard Deviation or Standard Error?

Quote the standard deviation if interest is in the variability of individuals as regards the level of the factor being investigated, e.g. SBP, age and cholesterol level.

Quote the standard error if emphasis is on the estimate of a population parameter. It is a measure of uncertainty in the sample statistic as an estimate of the population parameter.


Interpreting SE

Large SE indicates that estimate is imprecise

Small SE indicates that estimate is precise

How can SE be reduced?


Answer

If sample size is increased

If data are less variable
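Both effects follow directly from SE = SD / √n. The short sketch below tabulates the SE for a few assumed sample sizes and SDs; the numbers are illustrative only.

```python
import math


def standard_error(sd: float, n: int) -> float:
    """Standard error of the mean: SD divided by the square root of n."""
    return sd / math.sqrt(n)


# Effect of sample size, with the SD held at an assumed 14 mm Hg.
se_by_n = {n: standard_error(14.0, n) for n in (16, 64, 256)}
for n, se in se_by_n.items():
    print(f"n = {n:3d}: SE = {se}")
# n =  16: SE = 3.5
# n =  64: SE = 1.75
# n = 256: SE = 0.875

# Effect of variability, with the sample size held at 16.
se_less_variable = standard_error(7.0, 16)
print(f"SD = 7, n = 16: SE = {se_less_variable}")  # 1.75
```

Because of the square root, the sample size has to be quadrupled to halve the SE, whereas halving the SD halves the SE directly.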


INTERVAL ESTIMATE

Is SE particularly useful?

More helpful to incorporate this measure of precision into an interval estimate for the population parameter

How?

By using the knowledge of the theoretical probability distribution of the sample statistic to calculate a confidence interval (CI)


Not sufficient to rely on a single estimate

Other samples could yield other plausible estimates

It is more comfortable to give a range of values within which the plausible mean values are expected to lie


WHAT IS A CONFIDENCE INTERVAL?

The CI is a range of values, above and below a finding, in which the actual value is likely to fall.

The width of the confidence interval represents the precision of an estimate.

It is only by convention that the 95% confidence level is commonly chosen.

If the survey were repeated many times and a 95% confidence interval calculated each time, about 95 per cent of those intervals (19 out of 20) would contain the actual value.


CONFIDENCE INTERVAL

Statistic ± 1.96 S.E. (Statistic)

95% of the distribution of sample means lies within 1.96 standard errors (the SD of the sampling distribution) of the population mean


Interpretation

If the experiment is repeated many times, the calculated intervals would contain the true population mean on 95% of occasions

i.e. a range of values within which we are 95% confident that the true population mean lies
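The repeated-experiment interpretation can be illustrated by simulation. The sketch below assumes a normal population with mean 90 and SD 14, samples of 100 subjects and 2,000 repetitions (all illustrative choices), and counts how often the interval mean ± 1.96 SE contains the true mean.

```python
import math
import random
import statistics

random.seed(2)  # fixed seed so the illustration is reproducible

true_mean, true_sd = 90.0, 14.0  # assumed population values
n = 100                          # subjects per experiment (assumed)
repeats = 2000                   # number of repeated experiments (assumed)

covered = 0
for _ in range(repeats):
    sample = [random.gauss(true_mean, true_sd) for _ in range(n)]
    mean = statistics.mean(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    lower, upper = mean - 1.96 * se, mean + 1.96 * se
    if lower <= true_mean <= upper:
        covered += 1

coverage = covered / repeats
print(f"{coverage:.1%} of the {repeats} intervals contain the true mean")
```

The proportion should come out close to 95%. Any single interval either contains the true mean or does not; the 95% describes the long-run behaviour of the procedure.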


Issues in CI interpretation

How wide is it?

A wide CI indicates that the estimate is imprecise

A narrow one indicates a precise estimate

Width is dependent on the size of the SE, which in turn depends on the sample size (SS)


Factors affecting CI

A narrow or small confidence interval indicates that if we were to ask the same question of a different sample, we are reasonably sure we would get a similar result.

A wide confidence interval indicates that we are less sure and perhaps information needs to be collected from a larger number of people to increase our confidence.


Confidence intervals are influenced by the number of people that are being surveyed.

Typically, larger surveys will produce estimates with smaller confidence intervals compared to smaller surveys.


Why are CIs important?

Because confidence intervals represent the range of values that are likely if we were to repeat the survey.

Important to consider when generalizing results.

Consider random sampling and application of correct statistical test

Like comfort zones that encompass the true population parameter


Calculating confidence limits

The mean diastolic blood pressure from 16 subjects is 90.0 mm Hg, and the standard deviation is 14 mm Hg. Calculate its standard error and 95% confidence limits.


Standard error = Standard deviation / Square root of sample size

= 14 / √16


95% CI: Statistic ± 1.96 S.E. (Statistic)


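ANSWERS

Standard error: 14 / √16 = 3.5

95% confidence limits with the 1.96 multiplier: 90.0 ± 1.96 × 3.5 = 83.14 to 96.86

With only 16 subjects, the t multiplier for 15 degrees of freedom (2.131) is more appropriate: 90.0 ± 2.131 × 3.5 = 82.54 to 97.46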
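The arithmetic can be checked with a short Python sketch. It computes the limits with the 1.96 normal multiplier used in the slides and, because the sample is small, also with the t multiplier for 15 degrees of freedom (2.131, taken from standard t tables; this value is an assumption not stated in the slides). The helper name limits is made up for the example.

```python
import math

mean = 90.0  # mean diastolic blood pressure (mm Hg)
sd = 14.0    # standard deviation (mm Hg)
n = 16       # number of subjects

se = sd / math.sqrt(n)  # 14 / 4 = 3.5


def limits(multiplier: float) -> tuple[float, float]:
    """Return (lower, upper) confidence limits, rounded to 2 decimal places."""
    margin = multiplier * se
    return round(mean - margin, 2), round(mean + margin, 2)


z_limits = limits(1.96)   # normal multiplier from the slides
t_limits = limits(2.131)  # t multiplier for n - 1 = 15 degrees of freedom

print(f"Standard error: {se}")                    # 3.5
print(f"95% limits (1.96 multiplier): {z_limits}")  # (83.14, 96.86)
print(f"95% limits (t, 15 df):        {t_limits}")  # (82.54, 97.46)
```

The t-based interval is wider because, with a small sample, the SD itself is estimated with more uncertainty.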


CI for a proportion

P ± 1.96 S.E. (P)

SE(P) = √(p(1-p)/n)

Online calculators are available
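As a sketch of the same calculation in code, the function below applies the formula to counts from a survey. The function name proportion_ci and the example of 30 malnourished children out of 200 are hypothetical. Clamping the limits to the range 0 to 1 is a design choice for this example, and the normal approximation itself is unreliable for small samples or proportions close to 0 or 1.

```python
import math


def proportion_ci(successes: int, n: int, z: float = 1.96):
    """Return (p, SE(p), lower, upper) using p ± z * sqrt(p(1-p)/n)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    se = math.sqrt(p * (1 - p) / n)
    # Clamp so the limits stay inside the valid range for a proportion.
    lower = max(0.0, p - z * se)
    upper = min(1.0, p + z * se)
    return p, se, lower, upper


# Hypothetical survey: 30 of 200 children found to be malnourished.
p, se, lower, upper = proportion_ci(30, 200)
print(f"p = {p:.3f}, SE = {se:.4f}, 95% CI = {lower:.3f} to {upper:.3f}")
```

For these made-up counts the estimate is 15%, with a 95% CI of roughly 10% to 20%.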


In summary

SD versus SE

Meaning and interpretation of CI

Shopping for the right sampling distribution


THANK YOU