RMTD 404

Lecture 3

Distributions and Probability

From this point on, we are going to work extensively with distributions and probabilities of events.

Just about every situation that we deal with in statistics involves estimating probabilities of events given what we know about a distribution (e.g., the distribution of outcomes of a coin toss).

Most of the time those distributions are not known, so we make assumptions about those distributions.

Based on those assumptions, we’ll estimate the probability or likelihood of observing a specific event or series of events.

The primary distribution we have been discussing is the normal distribution.

Distributions and Probability (recap)

A normal distribution has the following characteristics:

• unimodal: peaked in the middle.
• symmetrical: the left and right sides of the distribution are mirror images.
• bell-shaped: probabilities taper in the tails of the distribution.
• unlimited: the tails of the distribution extend to infinity in both directions.

The normal distribution is useful for several reasons:

• It is a shape that we frequently see in the real world and everywhere else.
• We know the probability density function for the normal curve.
• Many of the statistics we deal with take a normally distributed shape under repeated sampling.
• Many statistical procedures have been developed that rely on an assumption of normally distributed data.

Distributions and Probability (recap)

Recall that we can standardize a variable by linearly transforming it to have a mean of 0 and standard deviation (and variance) of 1.

When you standardize a normally distributed variable, the underlying distribution (with a mean of 0 and variance of 1) is called a standard normal curve, represented as N(0,1). The standard normal curve is useful because it simplifies the interpretation of probabilities of events—most tables of normal curve probabilities are based on the standard normal curve.

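A “score” in a standard normal distribution is called a z score, and we can compute z scores via the following transformation:

z = (X − μ) / σ

where X is the raw score, μ is the mean of the distribution, and σ is its standard deviation.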

We interpret a z score as the number of standard deviation units that a particular element lies from the mean of the distribution. This is apparent from the form of the linear transformation that we apply to obtain z scores. Because of what we know about the standard normal curve, we can identify important probabilities associated with particular z scores.
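A minimal sketch of standardizing a variable, assuming a small set of made-up raw scores and using the population standard deviation (both are choices made for this illustration, not part of the lecture). Each value has the mean subtracted and is then divided by the standard deviation.

```python
from statistics import fmean, pstdev


def standardize(values):
    """Linearly transform values to z scores: (value - mean) / sd."""
    mean = fmean(values)
    sd = pstdev(values)  # population SD; an assumption of this sketch
    return [(x - mean) / sd for x in values]


scores = [400, 450, 500, 550, 600]  # hypothetical raw scores
z_scores = standardize(scores)

# After standardizing, the mean is 0 and the standard deviation is 1.
print([round(z, 2) for z in z_scores])
print(round(fmean(z_scores), 6), round(pstdev(z_scores), 6))
```

Each printed z score is the number of standard deviation units that the corresponding raw score lies from the mean.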

P(z ≤ 0) = .50
P(0 ≤ z ≤ 1) = .34
P(-1 ≤ z ≤ 1) = .68
P(z ≤ 1) = .5 + .34 = .84
P(-2 ≤ z ≤ 2) = .9544
P(-1.96 ≤ z ≤ 1.96) = .95
P(z ≤ -1.96 or z ≥ 1.96) = 1 - P(-1.96 ≤ z ≤ 1.96) = .05
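These landmark probabilities can be checked with software. The sketch below assumes Python's standard-library `statistics.NormalDist` (the lecture does not name a package); its `cdf` method returns the area under the curve to the left of a z score.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal curve, N(0, 1)

p_below_0 = std_normal.cdf(0)
p_0_to_1 = std_normal.cdf(1) - std_normal.cdf(0)
p_within_1 = std_normal.cdf(1) - std_normal.cdf(-1)
p_below_1 = std_normal.cdf(1)
p_within_2 = std_normal.cdf(2) - std_normal.cdf(-2)
p_within_196 = std_normal.cdf(1.96) - std_normal.cdf(-1.96)
p_outside_196 = 1 - p_within_196

for label, p in [
    ("P(z <= 0)", p_below_0),
    ("P(0 <= z <= 1)", p_0_to_1),
    ("P(-1 <= z <= 1)", p_within_1),
    ("P(z <= 1)", p_below_1),
    ("P(-2 <= z <= 2)", p_within_2),
    ("P(-1.96 <= z <= 1.96)", p_within_196),
    ("P(z <= -1.96 or z >= 1.96)", p_outside_196),
]:
    print(f"{label}: {p:.4f}")
```

The computed areas agree with the values listed above up to rounding.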

Z Table

Although statistical software packages routinely compute the proportion of area under a normal curve for you, it is useful to learn how to read tables of those values.

This table gives three proportions associated with a variety of z scores.

[Figure: Z table showing Column 2, Column 3, and Column 4, including negative z values]

Z Table

As you can see, we can get the area between any two z scores by adding and subtracting areas that cumulatively make up the area we’re interested in.

For example, what area is associated with each of the following statements?

P(1 < z < 2) =

P(-1 < z < 2) =

P (z < -2 or z > 2.5) =
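One way to check table-based answers to the three statements above is to add and subtract cumulative areas in code. This is a sketch that assumes Python's `statistics.NormalDist`; the helper name `area_between` is made up for this illustration.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal curve, N(0, 1)


def area_between(lo, hi):
    """Area under the standard normal curve between two z scores."""
    return std_normal.cdf(hi) - std_normal.cdf(lo)


area_1_to_2 = area_between(1, 2)        # P(1 < z < 2)
area_neg1_to_2 = area_between(-1, 2)    # P(-1 < z < 2)
# Two separate tails: below -2, plus above 2.5.
area_two_tails = std_normal.cdf(-2) + (1 - std_normal.cdf(2.5))

print(f"P(1 < z < 2) = {area_1_to_2:.4f}")
print(f"P(-1 < z < 2) = {area_neg1_to_2:.4f}")
print(f"P(z < -2 or z > 2.5) = {area_two_tails:.4f}")
```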

Other distributions we’ll use

• T Distribution

• F Distribution

• Chi-Square Distribution

Sampling Distributions and Hypothesis Testing

• We are now going to begin discussing how to use what we know about the distributional properties of the statistics and parameters we are interested in.

• We want to determine the likelihood that an observed statistic came from some hypothetical population and make judgments based on this likelihood – this is basic hypothesis testing


• Two important terms
– Sampling distribution is the distribution of the values of a particular statistic over a hypothetically infinite number of repeated samples of equal size taken from the same population.
– Standard error is the standard deviation of the sampling distribution.
– We are often interested in the mean, so with this information we want to know what the mean would look like over an infinite number of experiments.
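A small simulation can make these two terms concrete. The sketch below assumes a normal population with mean 500 and standard deviation 100, a sample size of 25, and 10,000 repeated samples as a finite stand-in for "infinitely many"; all of these numbers are illustrative choices. It also relies on the standard result that the standard error of the mean equals σ/√n, which the slides do not state explicitly.

```python
import random
from statistics import fmean, pstdev

random.seed(404)  # fixed seed so the sketch is repeatable

MU, SIGMA = 500, 100  # hypothetical population mean and SD
N = 25                # hypothetical sample size
REPS = 10_000         # finite stand-in for infinitely many samples

# Draw REPS samples of size N and record the mean of each one.
sample_means = [
    fmean([random.gauss(MU, SIGMA) for _ in range(N)])
    for _ in range(REPS)
]

# The SD of the simulated sample means approximates the standard error.
empirical_se = pstdev(sample_means)
theoretical_se = SIGMA / N ** 0.5  # sigma / sqrt(n)

print(f"mean of sample means: {fmean(sample_means):.1f}")
print(f"empirical standard error: {empirical_se:.2f}")
print(f"theoretical standard error: {theoretical_se:.2f}")
```

The list `sample_means` plays the role of the sampling distribution of the mean, and its standard deviation should land close to the theoretical standard error.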


[Figure: sampling distribution with regions labeled “Very likely,” “Less likely,” and “Unlikely”]


Steps for testing a hypothesis:

1. We generate a research hypothesis (in words)—a theory-based prediction. When written symbolically, the research hypothesis is called the alternative hypothesis (a.k.a. H1 or Ha).

2. We pretend that the data were chosen from a population with known characteristics. That is, we create a null hypothesis (H0)—one that, based on our theory, we believe to be incorrect.

3. We gather data (e.g., randomly sample people, randomly assign them to treatments, expose them to the treatments, and measure their responses to the treatments).

4. We compute the characteristics of the sampling distribution of the statistic assuming that the null hypothesis is true. (e.g. µ=0 σ=1)


5. We calculate the probability of obtaining a statistic as extreme as or more extreme than the one observed, based on the sampling distribution.

6. We decide whether the observed probability of that value (or a more extreme one) is too remote to be consistent with the null hypothesis.

• If the probability of obtaining the observed statistic is very small, then we reject our null hypothesis and retain our alternative hypothesis. That is, we retain our theory.

• If the probability of obtaining the observed statistic is not small, then we retain our null hypothesis and fail to support our alternative hypothesis. That is, we fail to support our theory (does not mean our theory is false!)

7. We make a substantive (word- and theory-driven) interpretation of the statistical test.

– *Knowing the shape of a sampling distribution allows us to determine the probability of observing a particular test or sample statistic under the assumption that the null hypothesis is true.
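Steps 4 through 6 can be sketched in code for the simplest case: a single observation, a normal null distribution, and an upper-tail alternative. The function name `upper_tail_test`, the default cutoff of .05, and the observed values 1.2 and 2.0 are all assumptions made for this illustration.

```python
from statistics import NormalDist


def upper_tail_test(observed, null_mean, null_sd, alpha=0.05):
    """Return the p-value and decision for an upper-tail test."""
    # Step 4: sampling distribution assuming the null hypothesis is true.
    null_dist = NormalDist(null_mean, null_sd)
    # Step 5: probability of a value as extreme as or more extreme than observed.
    p_value = 1 - null_dist.cdf(observed)
    # Step 6: decide whether that probability is small enough to reject H0.
    decision = "reject H0" if p_value <= alpha else "retain H0"
    return p_value, decision


p_small_z, decision_small_z = upper_tail_test(1.2, null_mean=0, null_sd=1)
p_large_z, decision_large_z = upper_tail_test(2.0, null_mean=0, null_sd=1)

print(f"observed 1.2: p = {p_small_z:.4f}, {decision_small_z}")
print(f"observed 2.0: p = {p_large_z:.4f}, {decision_large_z}")
```

Step 7, the substantive interpretation, is not something code can supply; the function only returns the statistical decision.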

Example

GRE quantitative scores are believed to be normally distributed with a mean of 500 and standard deviation of 100.

Suppose you have a student who participated in a new GRE preparation course. Advocates of the course claim that its success will demonstrate that quantitative GRE scores can be altered by targeted study.

The developers of the GRE claim that the course will not work because the quantitative GRE test measures skills that must be developed over a long period of study.

As you can see, there is controversy—one position suggests that the student who has experienced the preparation course will perform better than average and the other position suggests that the student’s performance will be “typical.”

Example

To go about determining whether this student is better than “typical,” we state our research hypothesis (in this example, from the perspective of the proponents of the preparation course): This student has a higher than average quantitative test score.

Symbolically, we write the research hypothesis as the alternative hypothesis:

Ha: μX > μ0, or μtest prep > μtypical, or μtest prep > 500 (the population from which this observation came has a mean greater than the typical mean of 500).

Then we state the converse of the alternative hypothesis as our null hypothesis:

H0: μX ≤ μ0, or μtest prep ≤ μtypical, or μtest prep ≤ 500 (i.e., this student has “typical” quantitative skills).

Note that the alternative and null hypotheses refer to parameters rather than statistics.

Example

[Figure: null distribution of GRE scores, with the upper tail labeled “Too remote to be plausible”]

Here’s a picture of our decision-making framework. Note that we need to identify only one point on the GRE scale where we believe that the possibility is too remote to be reasonable—a value that is too high to be believable. What value would you choose?

• Instead, maybe this score is part of another population

Example

Suppose that we record the student’s GRE score, and it equals 740. This observation seems to be more consistent with the claims of the course advocates than the claims of the test developers.

The question now becomes: Under the H0 assumption that this student is typical, how unusual is a score of 740 (or greater) on the quantitative section of the GRE?

That is, how unlikely is a score as or more extreme than 740? As specified in our decision making framework, we want to talk about the absolute magnitude of the score, relative to the population mean, by considering only the upper tail of the null distribution.

Example

• We can use z scores to estimate the probability:

P(x ≥ 740) = P(z ≥ (740 − 500) / 100) = P(z ≥ 2.4) ≈ .008, which rounds to .01
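The same calculation can be sketched in a few lines, assuming the null distribution from the example (mean 500, standard deviation 100) and Python's `statistics.NormalDist` in place of a printed z table.

```python
from statistics import NormalDist

mu, sigma = 500, 100  # null distribution of GRE quantitative scores
x = 740               # the student's observed score

z = (x - mu) / sigma                # distance from the mean in SD units
p_value = 1 - NormalDist().cdf(z)   # upper-tail area beyond z

print(f"z = {z:.1f}, P(z >= {z:.1f}) = {p_value:.4f}")
```

The upper-tail area is a little over .008, which is below 1%.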

Sampling Distributions and Hypothesis Testing – Rejecting the Null Hypothesis

Hence, we would observe a score as or more remote than 740 less than 1% of the time in the population of “typical” GRE test takers.

We refer to this probability as a p-value -- the probability of obtaining a score as extreme as or more extreme than the one observed under the assumption of the null hypothesis.

If we believe that this score is too improbable to have occurred by chance, then we would reject our null hypothesis and retain our alternative and research hypotheses, concluding that this student is not typical.

If we do not believe that this is too improbable to have occurred by chance, then we would retain our null hypothesis.

*Typically, we don’t conclude that our null hypothesis is true; we simply conclude that we don’t have sufficient evidence to support our research hypothesis.

Sampling Distributions and Hypothesis Testing – Rejection Regions and Critical Values

Two points are important:

First, our decision-making criterion is somewhat arbitrary: different people might use different criteria to define “improbable.”

Second, because we are making retain/reject decisions based on probabilities, we might be making a mistake—we could reject the null hypothesis when it is indeed true.

Researchers have adopted the convention that observations that could occur less than 5% of the time under the null hypothesis are improbable enough to reject the null hypothesis.

Other less common levels are 1% (i.e., a stricter rule, because it requires a more unusual result to “reject”) and 10% (a more lenient rule because it requires a less unusual result to “reject”).


This rejection level (a.k.a. significance level, denoted α) indicates how unlikely an event must be before we reject the null hypothesis.

So, by the conventional standard, the probability (p-value) must be .05 or less.

Two terms that are related to each other:

Rejection region: The area(s) under the sampling distribution where events are unlikely enough to warrant rejecting the null hypothesis;

Critical value: The raw score associated with the boundary of the rejection region.

In the GRE score example, the critical value equals 664:

CV = X where P(z > zX) < .05
CV = X where z = 1.64
CV = 100(1.64) + 500 = 664
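The critical value can also be obtained from the inverse of the normal cumulative distribution function rather than a table. This sketch assumes Python's `statistics.NormalDist.inv_cdf`; note that the untruncated z is about 1.645, so the software answer is slightly above the 664 obtained with the table value of 1.64.

```python
from statistics import NormalDist

mu, sigma = 500, 100  # null distribution of GRE quantitative scores
alpha = 0.05          # conventional rejection level

# z score that cuts off the upper 5% of the standard normal curve.
z_crit = NormalDist().inv_cdf(1 - alpha)

cv_software = mu + sigma * z_crit  # convert back to the raw score scale
cv_table = mu + sigma * 1.64       # table-based value used in the text

print(f"critical z = {z_crit:.3f}")
print(f"critical value (software) = {cv_software:.1f}")
print(f"critical value (table z = 1.64) = {cv_table:.0f}")
```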


• In our case, H0 is μtest prep = 500. The critical value for the extreme area under the curve is 664.
• Because our observed GRE score of 740 falls in the rejection region (that is, beyond the critical value), we reject the null hypothesis and retain the alternative hypothesis.

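[Figure: null distribution with the “Retain null” and “Reject null” regions separated at CV = 664; the observed score x = 740 falls in the “Reject null” region]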

Recap

• So far, we’ve introduced concepts that allow us to test the null hypothesis in two ways.

1. Compute critical values (by converting the relevant z score or scores associated with α to the raw score scale) and compare the observed statistic to the critical value(s).
– If the observed statistic is more extreme than the critical value(s), then reject the null hypothesis.

2. Compute the p-value of the observed raw score (by converting the observed raw score to a z score and finding the probability of that z score in the normal curve table) and compare the p-value to the chosen α.
– If the p-value is smaller than the chosen α, then reject the null hypothesis.
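The two approaches always lead to the same decision, which the sketch below illustrates for the upper-tail GRE setup. The function names and the list of trial scores are made up for this illustration, and the critical value is computed by software (about 664.5) rather than taken from a table.

```python
from statistics import NormalDist

null = NormalDist(500, 100)  # null distribution of GRE scores
alpha = 0.05
critical_value = null.inv_cdf(1 - alpha)  # raw-score boundary of the rejection region


def reject_by_critical_value(x):
    """Approach 1: compare the observed score to the critical value."""
    return x >= critical_value


def reject_by_p_value(x):
    """Approach 2: compare the upper-tail p-value to alpha."""
    return (1 - null.cdf(x)) <= alpha


trial_scores = [520, 600, 664, 665, 740]  # hypothetical observed scores
decisions = [
    (x, reject_by_critical_value(x), reject_by_p_value(x)) for x in trial_scores
]

for x, by_cv, by_p in decisions:
    print(f"x = {x}: critical-value approach -> {by_cv}, p-value approach -> {by_p}")
```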

Errors

• Since α equals the probability of incorrectly rejecting the null hypothesis, (1- α) equals the probability of correctly retaining the null hypothesis.

• In our example, we would correctly retain the null hypothesis 1-.05 = .95 or 95% of the time.

• *The level α corresponds to the critical value (here 664) and represents the probability of rejecting a true null hypothesis.
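A quick simulation illustrates what α means in practice. The sketch assumes the null hypothesis is true (scores drawn from a normal distribution with mean 500 and standard deviation 100) and counts how often a single score lands at or beyond the critical value of 664; the number of trials and the random seed are arbitrary choices.

```python
import random

random.seed(404)  # fixed seed so the sketch is repeatable

MU, SIGMA = 500, 100   # the null distribution is true in this simulation
CRITICAL_VALUE = 664
TRIALS = 100_000       # arbitrary number of simulated test takers

# Every rejection here is a Type I error, because H0 is true by construction.
rejections = sum(
    random.gauss(MU, SIGMA) >= CRITICAL_VALUE for _ in range(TRIALS)
)
type_i_rate = rejections / TRIALS
correct_retain_rate = 1 - type_i_rate

print(f"Type I error rate: {type_i_rate:.3f}")
print(f"correctly retained: {correct_retain_rate:.3f}")
```

The simulated rejection rate should sit close to .05, and the rate of correctly retaining the null close to .95.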

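[Figure: null distribution with the area 1-alpha below CV = 664 and the area alpha above it; the observed score x = 740 and its p-value area are marked in the upper tail]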

Errors

• Now consider this figure, which contains an arbitrarily chosen alternative distribution (shaded). This is one of many possible distributions that could have generated the observed score.
• When we retain the null hypothesis, we can make another type of error if this alternative is true.
• In this example, we may incorrectly reject the alternative distribution and retain the null distribution with a very high probability.

[Figure: null distribution and shaded alternative distribution, with the area labeled Beta]

This type of error is called a Type II error and is represented as the beta level (β) of the hypothesis test.

β represents the probability of incorrectly rejecting a true alternative distribution, that is, of incorrectly retaining the null hypothesis.

Errors

• Recall that (1-α) represents the probability of correctly retaining the null hypothesis. On the other hand, (1-β) represents the probability of correctly rejecting the null hypothesis.
• This probability is given a special name, statistical power or simply power.

[Figure: alternative distribution with the area labeled 1-Beta]

Power only applies when H0 is false. That is, when H0 is true, we cannot correctly reject it!
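To see how β and power are computed, we have to commit to one specific alternative distribution. The sketch below assumes an alternative with mean 600 and standard deviation 100; that choice is purely hypothetical, and a different alternative would give different values.

```python
from statistics import NormalDist

null = NormalDist(500, 100)         # null distribution of GRE scores
alternative = NormalDist(600, 100)  # hypothetical alternative distribution
critical_value = 664

# Area of the null distribution beyond the critical value.
alpha = 1 - null.cdf(critical_value)
# Area of the alternative distribution below the critical value:
# scores there lead us to retain H0 even though the alternative is true.
beta = alternative.cdf(critical_value)
power = 1 - beta

print(f"alpha = {alpha:.3f}")
print(f"beta  = {beta:.3f}")
print(f"power = {power:.3f}")
```

Under this assumed alternative, β is large and power is low, which matches the point that we may retain the null distribution with a very high probability.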

Summary of Errors

The table below summarizes the nature of statistical errors and the corresponding symbols.

However, also realize we will typically NOT know what the “Truth” is—if we did, we would not need to use statistics in our decision-making.

Hence, estimating statistical power requires us to make a lot of assumptions.

Decision      Truth: H0 True              Truth: H0 False

Reject H0     Type I error (α)            Power (1-β)

Retain H0     Correct decision (1-α)      Type II error (β)

One and two-tailed tests

• Our GRE example considered only one tail of the null distribution as fair game for rejecting the null hypothesis. That is, we considered only the possibility that the observed score was greater than the population mean.
• A one-tailed (a.k.a. directional) hypothesis test allows you to focus all of your attention on differences in one tail of the null distribution.

Your null hypothesis would state that the parameter you are interested in is less than or equal to, or greater than or equal to, some value (e.g., H0: μX ≤ 0 or H0: μX ≥ 0, depending on the expected direction), and your alternative hypothesis would state that the parameter is greater than or less than that value (e.g., H1: μX > 0 or H1: μX < 0, respectively).

[Figure: null distribution for a one-tailed test, with one tail labeled “irrelevant” and the other labeled “improbable”]

One and two-tailed tests

If you cannot confidently predict the direction of the expected difference, you should focus your attention on both tails of the null distribution. In this case, you would perform a two-tailed (a.k.a. non-directional) test.

Your null hypothesis would state that the parameter you are interested in equals some value (e.g., H0: μX = 0), and your alternative hypothesis would state that the parameter is simply not equal to that value (e.g., H1: μX ≠ 0).

A two-tailed test would be appropriate either when
(1) no theory exists for making a prediction about the direction of observed differences, or
(2) two competing theories predict the opposite outcomes.

*Many researchers use two-tailed tests even though they are seldom warranted.

One and two-tailed tests

• When you choose a two-tailed test, you choose to divide your Type I error rate (α) into both tails of the null distribution. As a result, you choose critical values for rejecting the null hypothesis that define the most extreme α/2 proportion in each tail.

[Figure: null distribution with regions labeled “irrelevant” and “improbable”]

One and two-tailed tests

• By using a one-tailed hypothesis test, you require a less extreme critical value, because all of α lies in a single tail of the distribution.

• Hence, when α = .05 in a one-tailed (directional) test, the 5% of the null distribution that constitutes the rejection region lies in the single tail that is relevant to the hypothesis test.

• On the other hand, in a two-tailed (non-directional) test, the 5% of the null distribution that constitutes the rejection region is divided into each tail (2.5% each).
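The difference in critical values can be seen directly by asking for the z score that cuts off 5% in one tail versus 2.5% in each of two tails. This sketch assumes Python's `statistics.NormalDist.inv_cdf` and works on the z scale, so it applies to any normal null distribution after standardizing.

```python
from statistics import NormalDist

std_normal = NormalDist()  # standard normal curve, N(0, 1)
alpha = 0.05

# One-tailed: all of alpha sits in the upper tail.
z_one_tailed = std_normal.inv_cdf(1 - alpha)
# Two-tailed: alpha / 2 sits in each tail, so the cutoffs are +/- this value.
z_two_tailed = std_normal.inv_cdf(1 - alpha / 2)

print(f"one-tailed critical z:  {z_one_tailed:.3f}")
print(f"two-tailed critical z: +/-{z_two_tailed:.3f}")
```

The one-tailed cutoff is closer to the mean than the two-tailed cutoffs, so an observation in the predicted direction needs to be less extreme to fall in the rejection region.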

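[Figure: one-tailed rejection region labeled alpha in a single tail, and two-tailed rejection regions labeled alpha/2 in each tail]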
