MBP1010 - Lecture 2: January 14, 2009 1. Density curves and standard normal distribution 2. Sampling...

Page 1: MBP1010 - Lecture 2: January 14, 2009 1. Density curves and standard normal distribution 2. Sampling distribution of the mean 4. Confidence Interval for.

MBP1010 - Lecture 2: January 14, 2009

1. Density curves and standard normal distribution

2. Sampling distribution of the mean

3. Confidence Interval for the mean

4. Hypothesis testing (1 sample t test)

Reading: Introduction to the Practice of Statistics: 1.3, 3.4, 5.2, 6.1-6.4 and 7.1

Page 2:

Standard deviation vs standard error for describing data

Table 1. Characteristics of study subjects (n=35)

Page 3:

Importance of Normal Distribution*

1. Distributions of real data are often close to normal.

2. It is mathematically easy to work with, so many statistical tests are designed for normal (or close to normal) distributions.

3. If the mean and SD of a normal distribution are known, you can make quantitative predictions about the population.

* also called Gaussian curve

Page 4:

Red bars = scores ≤ 6. Proportion = 0.303

Page 5:

Red area under the density curve = scores ≤ 6. Proportion = 0.293

Page 6:

The cumulative proportion for a value x is the proportion of all observations that are ≤ x; this is the area under the curve to the left of x.

Page 9:

“The 68-95-99.7 Rule”

Mean = 64.5 inches; SD = 2.5 inches
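The rule can be checked numerically. The R sketch below is one way to do it, using `pnorm` (the normal cumulative distribution function) with the mean and SD shown on this slide; the variable names are illustrative only.

```r
# Mean and SD from the slide (inches)
mu <- 64.5
sigma <- 2.5

# Proportion of a normal distribution within k SDs of the mean, for k = 1, 2, 3
k <- 1:3
within_k <- pnorm(mu + k * sigma, mean = mu, sd = sigma) -
  pnorm(mu - k * sigma, mean = mu, sd = sigma)

# Intervals and proportions side by side
rule <- data.frame(lower = mu - k * sigma,
                   upper = mu + k * sigma,
                   proportion = round(within_k, 3))
print(rule)
```

The three proportions come out close to 0.68, 0.95 and 0.997, whatever mean and SD are used.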

Page 10:

The standard normal distribution is a normal distribution with a mean of 0 and an SD of 1. Normal distributions can be transformed to standard normal distributions by the formula:

$$z = \frac{X - \mu}{\sigma}$$

where X is a score from the original normal distribution, μ is the mean of the original normal distribution, and σ is the standard deviation of the original normal distribution.

The standard normal distribution is sometimes called the z distribution.

Page 11:

Standardized Normal Distribution

Page 12:

Z-score

A z score always reflects the number of standard deviations above or below the mean a particular score is.

Ex. If a person scored 70 on a test with a mean of 50 and an SD of 10, then they scored 2 standard deviations above the mean. Converting the test scores to z scores, an X of 70 would be:

$$z = \frac{70 - 50}{10} = 2$$

So, a z score of 2 means the original score was 2 SD above the mean.
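As a small illustration, the same conversion can be written in R. The numbers are those of the test-score example; the call to `pnorm` assumes the scores are normally distributed, which the slide does not state explicitly.

```r
# Values from the worked example on the slide
x <- 70      # observed score
mu <- 50     # mean of the test scores
sigma <- 10  # SD of the test scores

z <- (x - mu) / sigma   # number of SDs above (+) or below (-) the mean
print(z)

# Cumulative proportion at or below this score, assuming the scores are normal
print(pnorm(z))
```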

Page 13:

Z Scores

- Provide a meaningful way to compare individuals from different normal distributions on the same scale, i.e. how many SD above or below the mean?

E.g.
- bone density measures
- growth charts: height of children at different ages
- “normalized” data

Page 14:

Quantile-Quantile (Q-Q) Plot

A Q-Q plot shows the theoretical quantiles versus the empirical quantiles. If the distribution is “normal”, we should observe a straight line.
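A rough sketch of how such a plot might be produced in R with the base functions `qqnorm` and `qqline`. The two data sets are simulated here and are not from the lecture; the correlation helper `qq_cor` is a hypothetical convenience for summarizing straightness, not a formal test of normality.

```r
set.seed(1)  # reproducible simulated data (hypothetical, not from the lecture)
normal_data <- rnorm(100, mean = 64.5, sd = 2.5)
skewed_data <- rexp(100, rate = 1)

# Theoretical vs empirical quantiles; qqline() adds a reference line
qqnorm(normal_data, main = "Normal data")
qqline(normal_data)

qqnorm(skewed_data, main = "Right-skewed data")
qqline(skewed_data)

# Correlation between theoretical and empirical quantiles as a rough
# numerical summary of straightness (closer to 1 = straighter)
qq_cor <- function(x) {
  qq <- qqnorm(x, plot.it = FALSE)
  cor(qq$x, qq$y)
}
print(c(normal = qq_cor(normal_data), skewed = qq_cor(skewed_data)))
```

The points for the normal sample should lie close to the reference line, while the skewed sample should curve away from it at the ends.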

Page 15:

Rice Virtual Lab in Statistics

http://onlinestatbook.com/rvls/

Hyperstat Online

Section 5. Normal Distribution - theory

Page 16:

Sampling and Estimation

Page 17:

Populations and Samples

Population: entire group of individuals that we want information about

Sample: a part of the population that we actually examine in order to gather information

Goal: to try to draw conclusions about the population from the sample

Page 18:

Whole population: mean = μ, SD = σ

Sample: mean = x̄, SD = s

Diagram labels: Sample, Inference

Page 19:

Parameter:

- a number that describes the population
- the number is fixed, but in practice we do not know its value (e.g. μ)

Statistic:

- a number that describes a sample (e.g. x̄)
- its value is known when we take a sample, but it can change from sample to sample
- often used to estimate an unknown parameter

Page 20:

Statistical inference is the process by which we draw conclusions about the population from the results observed in a sample.

Two main methods used in inferential statistics: estimation and hypothesis testing.

In estimation, the sample is used to estimate a parameter and a confidence interval about the estimate is constructed.

Page 21:

Random Sampling is Key!

- every individual in the population sampled must have a chance of being included in the sample

- the choice of one subject does not influence the chance of other subjects being chosen

- use a method of sampling in which chance alone operates
  - toss of a coin, draw from a hat
  - random number generators

- random assignment in clinical trials results in randomly selected groups

Page 22:

Simple Random Sampling (SRS)

- the chance of being selected is equal for each individual in the population
- every possible sample has an equal chance of being chosen

Stratified Sampling

- divide the population into strata
- choose an SRS in each stratum
- combine these SRSs to form the full sample
- e.g. strata: prognostic factors in cancer patients; male/female, age
- consult a statistician for more complex sampling
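The two schemes can be sketched in R with the base function `sample`. The population below (200 patients, 120 female and 80 male) and the sample sizes are invented for illustration.

```r
set.seed(42)

# Hypothetical population of 200 patients with a sex stratum
population <- data.frame(
  id  = 1:200,
  sex = rep(c("female", "male"), times = c(120, 80))
)

# Simple random sample: every set of 20 patients is equally likely
srs <- population[sample(nrow(population), size = 20), ]

# Stratified sample: an SRS of 10 within each stratum, then combined
strata <- split(population, population$sex)
stratified <- do.call(rbind, lapply(strata, function(stratum) {
  stratum[sample(nrow(stratum), size = 10), ]
}))

table(srs$sex)         # composition varies by chance
table(stratified$sex)  # exactly 10 per stratum by design
```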

Page 23:

Sample mean (x̄) as an estimator of the population mean (μ)

What would happen if we repeated the sample several times?

Sampling variability:
- repeated samples from the same population will not have the same mean
- depends partly on how variable the underlying population is and on the size of the sample selected

Page 24:

Sampling Distribution of x̄

- the distribution of values taken by the mean (x̄) in all possible samples of the same size from the same population

Page 25:

1. Mean of the sampling distribution of x̄ = μ

2. SD of the sampling distribution = σ/√n, called the standard error of the mean

3. Shape of the sampling distribution is approximately a normal curve, regardless of the shape of the population distribution, provided n is large enough (Central Limit Theorem)
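These three properties can be explored by simulation. The sketch below assumes a hypothetical right-skewed population (exponential, with mean 1 and SD 1) and draws 1000 samples of size 30; none of these choices come from the lecture.

```r
set.seed(2009)

n <- 30            # size of each sample
n_repeats <- 1000  # number of repeated samples

# Hypothetical right-skewed population: exponential with mean 1 and SD 1
sample_means <- replicate(n_repeats, mean(rexp(n, rate = 1)))

mean(sample_means)   # should be close to the population mean, 1
sd(sample_means)     # should be close to sigma / sqrt(n)
1 / sqrt(n)

# Despite the skewed population, the histogram of means looks roughly normal
hist(sample_means, main = "Sampling distribution of the mean (n = 30)")
```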

Page 26:

Simulation of Sampling Distribution: Central Limit Theorem

Rice Virtual Lab in Statistics

http://onlinestatbook.com/rvls/

Page 27:

Population: All MBP1010 students

n = 37, μ = 1.00 cup, σ = 1.07 cups

Page 28:

Population (n = 37): μ = 1.00, σ = 1.07

One randomly selected sample (n = 12): x̄ = 0.875, s = 0.78

Page 29:

Population (n = 37): μ = 1.00, σ = 1.07

Sampling distribution (1000 repeats of n = 12): mean = 1.00, SD = 0.26

Page 30:

Population (n = 37): μ = 1.00, σ = 1.07

Sampling distribution (1000 repeats of n = 12): mean = 1.00, SD = 0.26

One sample (n = 12): x̄ = 0.875, s = 0.78, SEM = s/√n = 0.23
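The SEM on this slide follows directly from the sample SD and the sample size. A minimal R sketch with the slide's numbers:

```r
s <- 0.78   # sample SD from the slide
n <- 12     # sample size

sem <- s / sqrt(n)   # standard error of the mean
round(sem, 2)
```

The single-sample estimate (about 0.23) is of the same order as the SD of the 1000 simulated sample means reported on the slide (0.26).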

Page 31:

Confidence Interval of the Mean

Page 32:

Standard Normal Distribution

Page 33:

95% Confidence Interval

Central area = 0.95; area in each tail = 0.025

z = −1.96 (2.5th percentile) and z = 1.96 (97.5th percentile)

Page 34:

95% Confidence Interval for a population mean

If population σ known (not realistic)

Express x̄ in standardized form (z statistic):

$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$

$$\Pr(-1.96 \le z \le 1.96) = 0.95$$

$$\Pr\left(-1.96 \le \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \le 1.96\right) = 0.95$$

$$\Pr\left(\bar{x} - 1.96\,\sigma/\sqrt{n} \le \mu \le \bar{x} + 1.96\,\sigma/\sqrt{n}\right) = 0.95$$

x̄ − 1.96(σ/√n) and x̄ + 1.96(σ/√n) are the limits of the 95 percent confidence interval for the population mean.
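A short R sketch of this known-σ interval. Purely for illustration, it reuses the numbers from the earlier class example (x̄ = 0.875, σ = 1.07, n = 12) and treats σ as known, which is an assumption made here rather than something the slides do.

```r
x_bar <- 0.875   # sample mean
sigma <- 1.07    # population SD, treated as known
n <- 12          # sample size

z_crit <- qnorm(0.975)   # 97.5th percentile of the standard normal, about 1.96
margin <- z_crit * sigma / sqrt(n)
ci <- c(lower = x_bar - margin, upper = x_bar + margin)
round(ci, 2)
```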

Page 35:

In the long run, 95% of all samples will have an interval that includes μ.

24 out of 25 samples included μ (96%)
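The long-run interpretation can be illustrated by simulation. The sketch below assumes a hypothetical normal population with known σ and counts how often the 95% interval covers the true mean over 2000 repeated samples; the population values and the number of repeats are arbitrary choices.

```r
set.seed(123)

mu <- 1.00         # hypothetical true mean
sigma <- 1.07      # hypothetical population SD, treated as known
n <- 12            # size of each sample
n_repeats <- 2000  # number of repeated samples

covers <- replicate(n_repeats, {
  x_bar <- mean(rnorm(n, mean = mu, sd = sigma))
  margin <- qnorm(0.975) * sigma / sqrt(n)
  (x_bar - margin <= mu) && (mu <= x_bar + margin)
})

mean(covers)   # proportion of intervals that include mu; expected to be near 0.95
```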

Page 36:

90% Confidence Interval

Central area = 0.90; area in each tail = 0.05

z = −1.645 (5th percentile) and z = 1.645 (95th percentile)

Page 37:

Confidence Interval for a population mean: population σ NOT known (usual)

- use the sample standard deviation (s) as an estimate of σ
- therefore, σ/√n is estimated from the sample using s/√n (standard error of the mean; SE)

- SE of the sample is the estimate of the SD that would be obtained from the means of a large number of samples drawn from that population

Page 38:

Critical ratio:

$$\frac{\bar{x} - \mu}{s/\sqrt{n}}$$

Problem: the critical ratio is not normally distributed

- need to consider the reliability of both x̄ and s as estimators of μ and σ, respectively
- the shape of the distribution depends on the sample size n

Therefore the critical ratio follows the t distribution.

Page 39:

t - distribution

- degrees of freedom refer to number of independent quantities among a series of numerical quantities

- a family of distributions indexed by the degrees of freedom (n-1)
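To see how the family changes with the degrees of freedom, the following R sketch prints the 97.5th percentile of the t distribution for a few arbitrarily chosen df values, using the base function `qt`.

```r
df <- c(2, 5, 10, 24, 100)
t_crit <- qt(0.975, df = df)   # 97.5th percentile of each t distribution

data.frame(df = df, t_crit = round(t_crit, 3))

qnorm(0.975)   # standard normal value that t_crit approaches as df grows
```

The critical value shrinks towards the normal value of about 1.96 as the degrees of freedom increase.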

Page 40:

Degrees of Freedom

For SD:

- there are n deviations around the mean
- there is one restriction: sum of deviations = 0
- therefore, once we have calculated n-1 deviations around the mean, the last one is already determined because the sum must be 0 (i.e. it is not independent)
- for n deviations around the mean there are n-1 degrees of freedom (DF)
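A small numerical illustration in R, using four made-up observations, of why only n-1 deviations are free to vary:

```r
x <- c(4, 7, 9, 12)          # hypothetical observations
deviations <- x - mean(x)

sum(deviations)              # 0: the one restriction on the deviations

# The last deviation is fixed by the first n - 1
last_from_others <- -sum(deviations[-length(x)])
all.equal(last_from_others, deviations[length(x)])

# sd() divides the sum of squared deviations by n - 1, the degrees of freedom
all.equal(sd(x), sqrt(sum(deviations^2) / (length(x) - 1)))
```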

Page 42:

95% Confidence Interval for a population mean: population σ NOT known (usual)

A sample consists of 25 mice with a mean tumor size of 2.1 cm and SD = 1.9 cm.

$$\bar{x} - t_{24,\,0.975} \times \frac{s}{\sqrt{n}}, \quad \bar{x} + t_{24,\,0.975} \times \frac{s}{\sqrt{n}}$$

$$t_{24,\,0.975} = 2.064 \text{ (from tables of the t distribution)}$$

$$2.1 - \left(2.064 \times \frac{1.9}{\sqrt{25}}\right), \quad 2.1 + \left(2.064 \times \frac{1.9}{\sqrt{25}}\right)$$

= 1.32, 2.88 cm
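The same interval can be computed in R from the summary statistics, with `qt` taking the place of the t table. This is a sketch with illustrative variable names:

```r
x_bar <- 2.1   # mean tumor size (cm)
s <- 1.9       # sample SD (cm)
n <- 25        # number of mice

t_crit <- qt(0.975, df = n - 1)   # about 2.064
se <- s / sqrt(n)                 # standard error of the mean
ci <- x_bar + c(-1, 1) * t_crit * se
round(ci, 2)
```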

Page 44:

Confidence interval for a Mean

Estimate of mean tumor size = 2.1 cm; n = 25.

95% CI = 1.32, 2.88 cm

Interpretation:
- 95% of the intervals that could be constructed from repeated random samples of size 25 contain the true population mean
- we are 95% confident that the mean tumor size is between 1.32 and 2.88 cm

Page 45:

Factors affecting the length of the confidence interval

$$\bar{x} \pm t_{n-1,\,0.975} \times \frac{s}{\sqrt{n}}, \qquad \frac{s}{\sqrt{n}} = \text{SE}$$

Sample size: as n increases, the length of the CI decreases

Variation: as s, which reflects the variability of the distribution of observations, increases, the length of the CI increases

Level of confidence: as the confidence desired increases (i.e. 90, 95, 99% CI), the length of the CI increases
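Each factor can be varied in turn with a small helper. The function `ci_length` below is a hypothetical name introduced for this sketch; the baseline values (s = 1.9, n = 25) are borrowed from the mouse tumor example.

```r
# Full length of a t-based confidence interval for a mean
ci_length <- function(s, n, level = 0.95) {
  t_crit <- qt(1 - (1 - level) / 2, df = n - 1)
  2 * t_crit * s / sqrt(n)
}

ci_length(s = 1.9, n = 25)                 # baseline
ci_length(s = 1.9, n = 100)                # larger n: shorter interval
ci_length(s = 3.8, n = 25)                 # larger s: longer interval
ci_length(s = 1.9, n = 25, level = 0.99)   # higher confidence: longer interval
```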

Page 46:

Standard deviation vs standard error for describing data

Table 1. Characteristics of study subjects (n=35)

Page 47:

Standard deviation vs standard error for describing data

If the purpose is to describe the data (e.g. to see if subjects are typical): standard deviation
- variability of the observations

If the purpose is to describe the results (outcome) of the study: standard error, confidence interval
- precision of the estimate of a population parameter

Note:
- can calculate one from the other
- indicate clearly whether reporting SD or SE

Page 48:

What Formal Statistical Inference Cannot Do

- tell you what population you should be interested in

- ensure that you sampled properly from the population

- determine whether measurements made are biased (systematically wrong)

What it DOES:
- give a quantitative indication of how much random variation may have affected your results

Page 49:

What/who are we trying to study?

Level | Example 1 | Example 2
Target population | Patients with rheumatoid arthritis | All voters
Population sampled | Patients admitted to a particular hospital | Telephone listings
Sample studied | Sample of records of above patients | Sample of above listings

Page 51:

Hypothesis Testing

Page 52:

[Figure: schematic (box) plots by GROUP, 1 = Low Fat and 2 = Control; vertical axis from 5 to 45]

Dietary fat intake in the low fat and control groups (n=151 intervention and 187 control)

Page 53:

[Figure: schematic (box) plots by GROUP, 1 = Low Fat and 2 = Control; vertical axis from 0.8 to 2.6]

Blood HDL-cholesterol levels in the low fat and control groups (n=163 intervention and 199 control)

Page 54:

mean = 1684 kcal/day; SD = 380.5 kcal/day

Page 55:

Examples of conclusions of hypothesis tests

The mean intake of dietary fat is significantly lower in the low-fat group as compared to the control group (17.5 vs 28.3 percent energy from fat; p < 0.001). (2 sample t test)

Does the energy intake of women in a sample differ from the “recommended” level of 1850 kcal? (1 sample t test)

Page 56:

Hypotheses

- hypotheses are stated in terms of the population parameters (true means)

- null hypothesis: Ho
  - statement of no effect or no difference
  - assess the strength of evidence against the null hypothesis

- alternative hypothesis: Ha
  - what we expect/hope to see
  - usually a 2 sided test

Page 57:

Control vs Intervention

μc = μT

x̄c vs x̄T

Page 58:

Overview of hypothesis testing

Compute the probability of obtaining a difference as large as or larger than the observed difference, assuming that, in fact, there is no difference in the true means.

If the probability is not very small, we conclude that observing such a difference is plausible even when the true means are equal, i.e. the data do not provide evidence that the true means are different.

If the probability is very small, we conclude there is a difference between the means.

Page 59:

A significance test answers the question:

Is chance or sampling variation a likely explanation of the discrepancy between a sample result and the null hypothesis population value?

Yes: the sample result is compatible with the idea that the sample is from a population in which the null hypothesis is true

No: the discrepancy is unlikely to be due to chance variation; the sample result is not compatible with the idea that the sample is from a population in which the null hypothesis is true

Page 60:

Steps in Hypothesis Testing

1. State hypothesis.

2. Specify the significance level.

3. Calculate the test statistic.

4. Determine p value.

5. State conclusion.

Page 61:

One Sample T test

Page 62:

One Sample T test: Energy intake in women

For a sample of 29 randomly selected women:

Mean energy intake = 1,684 kcal/day
Standard deviation (s) = 380.5 kcal/day

Does the energy intake of women in this study differ from the “recommended” level of 1850 kcal?

Page 63:

Example of energy intakes

1. State hypotheses:

Ho: the true mean energy intake of women in the trial is not different from 1,850 kcal/day

Ha: the true mean energy intake of women in the trial is different from 1,850 kcal/day

Specific notation:

Ho: μ = 1,850
Ha: μ ≠ 1,850 (2 sided)

Page 64:

2. Significance Level

- how much evidence against Ho we require to reject Ho (determine in advance)

- compare the p value with a fixed value that is considered decisive

- this value is called the significance level, denoted as α

- commonly use α = 0.05

Page 65:

Significance Level

α = 0.05
- require that the data give evidence against Ho so strong that it would happen not more than 5% of the time (1 in 20) when Ho is true

α = 0.01
- require that the data give evidence against Ho so strong that it would happen not more than 1% of the time (1 in 100) when Ho is true

Page 66:

3. Calculate the test statistic

- the test statistic measures compatibility between the null hypothesis and the data

- to assess how far the estimate is from the parameter: standardize the estimate

- z statistic (when σ known)

- t statistic (when σ not known)

Page 67:

One Sample t test

- use the t distribution when the population standard deviation (σ) is not known

degrees of freedom = n-1

To test the hypothesis Ho: μ = μ0 based on an SRS of size n, compute the t statistic:

$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$$

Page 68:

Step 3. Calculate test statistic.

Based on sample of 29 women:

x̄ = 1684 kcal/day; standard deviation (s) = 380.5 kcal/day

$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{1684 - 1850}{380.5/\sqrt{29}} = -2.35$$
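For reference, the same test statistic computed in R from the summary statistics (the variable names are illustrative):

```r
x_bar <- 1684   # sample mean (kcal/day)
mu_0 <- 1850    # value of the mean under Ho (kcal/day)
s <- 380.5      # sample SD (kcal/day)
n <- 29         # number of women

t_stat <- (x_bar - mu_0) / (s / sqrt(n))
round(t_stat, 2)
```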

Page 69:

Determine the p value

- probability of getting an outcome as extreme as or more extreme than the actually observed outcome

- extreme: far from what would be expected if the null hypothesis is true

- the smaller the p value, the stronger the evidence against the null hypothesis

Page 70:

Energy Intake in Women

$$t = \frac{1684 - 1850}{380.5/\sqrt{29}} = -2.35$$

2 sided test: P(t ≤ −2.35 or t ≥ 2.35)

P(t ≤ −2.35) = 0.0130

P(t ≥ 2.35) = 1 − 0.9870 = 0.0130

P value = 2P(t ≤ −2.35) = 0.026
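The tail areas can be obtained in R with `pt`, the cumulative distribution function of the t distribution. A minimal sketch using the summary statistics from this example:

```r
t_stat <- (1684 - 1850) / (380.5 / sqrt(29))
df <- 29 - 1

p_lower <- pt(-abs(t_stat), df = df)                      # P(t <= -2.35)
p_upper <- pt(abs(t_stat), df = df, lower.tail = FALSE)   # P(t >= 2.35)
p_two_sided <- p_lower + p_upper
round(c(p_lower, p_upper, p_two_sided), 4)
```

Because the t distribution is symmetric, the two tail areas are equal and the 2 sided p value is twice either one.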

Page 71:

Step 4. Determine p value.

Area below t = −2.35: p = 0.0130

Area above t = 2.35: p = 0.0130

2 sided p = 0.026

Page 72:

What does a “small” p value mean?

Either:

1. An unlikely event occurred (getting a large value for the test statistic by chance), or

2. The null hypothesis is false.

Page 73:

P value for a 2 sided test:

Probability of getting an outcome as extreme as or more extreme than the actually observed outcome, in either direction, if the null hypothesis is true.

Page 74:

Statistical Significance

In the example: p value = 0.026

There is a 2.6% chance of observing a sample mean at least as far from 1850 kcal/day as 1684 kcal/day if the true mean is not different from the recommended level of 1850 kcal/day. What do we conclude?

Page 75:

Statistical Significance

p value = 0.026

We reject the null hypothesis, Ho.

The mean energy intake of women is significantly lower than the recommended intake (p < 0.05).

The mean energy intake of women is significantly lower than the recommended intake (p = 0.03).

(Significant at the 5% but not the 1% level)

Page 76:

Using R – One Sample t-test

R code: t.test(energy.intake, mu=1850)

R Output:

        One Sample t-test

data:  energy.intake
t = -2.3493, df = 28, p-value = 0.02610
alternative hypothesis: true mean is not equal to 1850
95 percent confidence interval:
 1539.260 1828.741
sample estimates:
mean of x
 1684.001
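The raw `energy.intake` values are not included in the slides, so the sketch below substitutes simulated data with a similar mean and SD; its numbers will therefore differ from the output shown on the slide. It illustrates how the individual pieces of a `t.test` result can be extracted.

```r
set.seed(29)
# Simulated stand-in for the energy.intake vector (the raw data are not in the slides)
energy.intake <- rnorm(29, mean = 1684, sd = 380.5)

result <- t.test(energy.intake, mu = 1850)

result$statistic   # t statistic
result$parameter   # degrees of freedom (n - 1 = 28)
result$p.value     # two-sided p value
result$conf.int    # 95% confidence interval for the mean
```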

Page 77:

Statistical Significance

If the recommended level is 1750 kcal/day, then p = 0.36.

There is a 36% chance of observing a sample mean at least as far from 1750 kcal/day as 1684 kcal/day if the true mean is not different from the recommended level of 1750 kcal/day. What do we conclude?

Page 78:

Statistical Significance

p value = 0.36

We do not reject the null hypothesis, Ho.

The data do not provide evidence that the mean energy intake of women is different from the recommended level.

The mean energy intake of women in the study is not significantly different from the recommended level of 1750 kcal/day (p = 0.36).

Page 79:

One sided test

Ho: μ = 1850
Ha: μ < 1850

p = 0.0130

Page 80:

Probability values for one-tailed tests are one half the value for two-tailed tests as long as the effect is in the specified direction.
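The halving, and the importance of the direction, can be seen with the energy intake numbers. In this R sketch the alternative "less" matches the observed direction (sample mean below 1850), while "greater" does not:

```r
t_stat <- (1684 - 1850) / (380.5 / sqrt(29))
df <- 28

p_two_sided <- 2 * pt(-abs(t_stat), df = df)
p_less <- pt(t_stat, df = df)                          # Ha: mu < 1850 (observed direction)
p_greater <- pt(t_stat, df = df, lower.tail = FALSE)   # Ha: mu > 1850 (opposite direction)

round(c(two_sided = p_two_sided, less = p_less, greater = p_greater), 4)
```

Only the alternative in the observed direction gives half the two-sided p value; the opposite alternative gives a p value close to 1.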

Page 81:

One-sided vs two-sided tests

- one sided tests are rarely justified

- decide on the appropriate test prior to the experiment

- do not decide on a one-sided test after looking at the data

e.g. p value for 2 sided is 0.09; p value for 1 sided is 0.045

If any doubt: choose 2 sided test!

Page 82:

General guidelines for stating significance

If: | results are:
0.01 ≤ p < 0.05 | significant
0.001 ≤ p < 0.01 | highly significant
p < 0.001 | very highly significant
p > 0.05 | not statistically significant (NS)
0.05 ≤ p < 0.10 | trend towards statistical significance
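As a sketch, these guidelines could be encoded in a small R helper. The function name `significance_label` and the exact label strings are invented here; values from 0.05 up to 0.10 are reported as NS with a trend, which is one reading of the overlapping last two rows.

```r
# Map p values to the descriptive labels in the guideline table
significance_label <- function(p) {
  stopifnot(all(p >= 0 & p <= 1))
  ifelse(p < 0.001, "very highly significant",
  ifelse(p < 0.01,  "highly significant",
  ifelse(p < 0.05,  "significant",
  ifelse(p < 0.10,  "NS (trend towards significance)",
                    "NS"))))
}

# p values that appear elsewhere in the lecture
significance_label(c(0.026, 0.36, 0.0512))
```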

Page 83:

Reporting actual p values

A. p value = 0.0512
Conclude: result is NS, p > 0.05

If the effect is interesting and potentially important, we would probably want to:
- repeat the study
- check the power of the study

B. p value = 0.75
Conclude: result is NS, p > 0.05
- likely no effect

Page 84:

Comments/Cautions about hypothesis testing

Page 85:

Statistical vs clinical significance

- look at the size of the effect, not just the p value

- look at the confidence interval for the parameter of interest

- with a large sample size, a very small effect may be statistically significant

Page 86:

Exploratory data analysis vs hypothesis testing

- exploratory data analysis is important

- but we cannot test a hypothesis on the same data that first suggested it

- if reporting such findings, clearly state that they are post hoc

- need to design a new study to test the hypothesis

Page 87:

Relationship between confidence interval and p value

Page 88:

95% Confidence interval for a population mean

A sample consists of 25 mice with a mean tumor size of 2.1 cm and SD = 1.9 cm.

$$\bar{x} - t_{24,\,0.975} \times \frac{s}{\sqrt{n}}, \quad \bar{x} + t_{24,\,0.975} \times \frac{s}{\sqrt{n}}$$

$$t_{24,\,0.975} = 2.064 \text{ (from tables of the t distribution)}$$

$$2.1 - \left(2.064 \times \frac{1.9}{\sqrt{25}}\right), \quad 2.1 + \left(2.064 \times \frac{1.9}{\sqrt{25}}\right)$$

= 1.32, 2.88 cm

Page 89:

CI and Hypothesis Test

Ho: μ = 2.9
Ha: μ ≠ 2.9

x̄ = 2.1 cm, s = 1.9 cm

$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{2.1 - 2.9}{1.9/\sqrt{25}} = -2.105$$

p = 0.0459

95% CI for mean tumor size = 1.32, 2.88 cm

The hypothesized value of 2.9 cm lies outside the 95% CI, which agrees with rejecting Ho at the 5% level (p < 0.05).
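The link between the two approaches can be checked numerically. This R sketch recomputes the test and the interval from the summary statistics and confirms that the two-sided test at the 5% level and the 95% CI lead to the same decision; the variable names are illustrative.

```r
x_bar <- 2.1   # mean tumor size (cm)
s <- 1.9       # sample SD (cm)
n <- 25        # number of mice
mu_0 <- 2.9    # value of the mean under Ho (cm)

se <- s / sqrt(n)
t_stat <- (x_bar - mu_0) / se
p_value <- 2 * pt(-abs(t_stat), df = n - 1)
ci <- x_bar + c(-1, 1) * qt(0.975, df = n - 1) * se

round(t_stat, 3)
p_value < 0.05                   # TRUE when Ho is rejected at the 5% level
mu_0 < ci[1] || mu_0 > ci[2]     # TRUE when mu_0 lies outside the 95% CI
```

In general, a two-sided t test at level α rejects Ho exactly when the hypothesized mean falls outside the corresponding (1 − α) confidence interval.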