Math 3070-1 Review Sheet, Spring 2011


Descriptive Statistics

First we need to familiarize ourselves with the concepts and relationships among

• Population
• Samples
• Objects

The distribution of a population variable describes the likelihood of all possible values and it can be presented in one of the following forms:

• Graphs: histogram, stemplot, bar graph, pie chart, …

• Summary numbers (condensed and partial information):
  1. Sample mean or median (the center)
  2. Sample variance or sample standard deviation, or quartiles (the spread)

You need to know the formulas for these quantities (pay special attention to the formula for s).

• Mathematical formula for the distribution function

Probability

A probability space (system) is defined through a triplet: a sample space, a collection of events, and a probability measure. Make sure you understand the concepts of outcomes and events (corresponding to elements and subsets of a set). An outcome is an event, but an event is not limited to a single outcome. When we talk about the probability of something, we refer to events, which contain outcomes but can be more complicated, since they combine outcomes in different ways. Some of the crucial concepts are:

1. Unions and intersections: their probabilities, in relation to the probabilities of the individual events, are governed by the probability axioms (you need to be familiar with them). The formula for the union is quite intuitive:

P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

2. Counting problems (combinatorics): these are some of the most challenging problems in this course, even though they are not the focus of our studies. One question students ask all the time is which formula to use for a particular problem: permutation or combination. In many cases both can be made to work in one way or another, so there is no single answer. A useful strategy is to break the selection procedure into several stages, with each stage clearly defined and manageable; one can then use the product rule to combine the stage counts and calculate the probability in question.
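As a quick sanity check, Python's math module exposes both counting formulas, and the product rule is just multiplication of stage counts. The committee numbers below are hypothetical, not from the sheet:

```python
from math import comb, perm

# Permutations count ordered selections, combinations unordered ones:
# P(n, k) = n! / (n - k)!    C(n, k) = n! / (k! (n - k)!)
print(perm(5, 2))  # 20: ordered ways to pick 2 of 5
print(comb(5, 2))  # 10: unordered ways to pick 2 of 5

# Product rule over stages (a made-up committee problem):
# choose 2 of 4 men, then 3 of 5 women; multiply the stage counts.
print(comb(4, 2) * comb(5, 3))  # 60
```

The same breaking-into-stages habit works whether you end up counting with permutations, combinations, or a mix of both.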

3. Conditional probability and independence: these should be studied together. You can interpret the formula

P(A | B) = P(A ∩ B) / P(B)

by noting that if A and B are independent, then P(A | B) = P(A) follows from the formula above; this in turn leads to the definition of independence.
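The defining formula can be traced on a small sample space. The two-dice events below are hypothetical, chosen only for illustration:

```python
from fractions import Fraction
from itertools import product

# Hypothetical example: two fair dice; outcomes are ordered pairs.
omega = list(product(range(1, 7), repeat=2))

def prob(event):
    # probability of an event = (# favorable outcomes) / (# outcomes)
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

def A(w): return w[0] + w[1] == 7   # the sum is 7
def B(w): return w[0] == 3          # the first die shows 3

p_a_and_b = prob(lambda w: A(w) and B(w))
p_a_given_b = p_a_and_b / prob(B)   # P(A|B) = P(A∩B) / P(B)

print(p_a_given_b)                  # 1/6
print(p_a_given_b == prob(A))       # True: here A and B are independent
```

Knowing the first die does not change the chance that the sum is 7, which is exactly what independence says.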

4. Bayes’ theorem gives a way to compute conditional probabilities from other conditional probabilities with the roles interchanged, and there are many unexpected results from the use of this formula:

P(A_j | B) = P(B | A_j) P(A_j) / Σ_{i=1}^{k} P(B | A_i) P(A_i)
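A sketch of the formula with made-up diagnostic-test numbers (the 1% prevalence and the error rates below are hypothetical, purely for illustration):

```python
from fractions import Fraction

# Hypothetical numbers: A1 = has disease, A2 = healthy, B = positive test
p_disease = Fraction(1, 100)              # P(A1): prior
p_healthy = 1 - p_disease                 # P(A2)
p_pos_given_disease = Fraction(95, 100)   # P(B | A1)
p_pos_given_healthy = Fraction(5, 100)    # P(B | A2): false-positive rate

# Denominator: total probability P(B) = sum over i of P(B | Ai) P(Ai)
p_pos = p_pos_given_disease * p_disease + p_pos_given_healthy * p_healthy

# Bayes: P(A1 | B) = P(B | A1) P(A1) / P(B)
posterior = p_pos_given_disease * p_disease / p_pos
print(posterior)  # 19/118, about 0.16
```

This is one of those "unexpected results": even with a fairly accurate test, a positive result leaves only about a 16% chance of disease, because the disease is rare.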

Random Variables: Discrete and Continuous

Note: In formulas involving rv’s, we use upper-case letters to denote the random variables and lower-case letters for the values they assume. If you see a function with only lower-case variables, you are dealing with a function of deterministic variables. The probability distribution tells you everything there is to know about a random variable. If you opt for functions or graphs, you can choose from

• Probability mass function (pmf) for discrete rv’s, or probability density function (pdf) for continuous rv’s

• Cumulative distribution function (cdf): pay attention to where the dots and open circles are in its graph (for a discrete rv), and understand that the cdf is nondecreasing.

Make sure you know how to convert between the pmf and cdf, or the pdf and cdf.

The expected value (or the expectation, or the mean) of a rv h(X) that depends on the rv X (with the dependence specified by a formula) is the first important quantity you can compute from the distribution. The formulas are

E[h(X)] = Σ_{x ∈ D} h(x) · p(x) for a discrete rv, where p(x) is the pmf

E[h(X)] = ∫_{−∞}^{∞} h(x) f(x) dx for a continuous rv, where f(x) is the pdf
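A minimal sketch of the discrete formula, using a hypothetical pmf:

```python
from fractions import Fraction

# pmf of a hypothetical discrete rv X: value -> probability
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

def expectation(h, pmf):
    # E[h(X)] = sum over the support D of h(x) * p(x)
    return sum(h(x) * p for x, p in pmf.items())

e_x = expectation(lambda x: x, pmf)        # E[X] = 1
e_x2 = expectation(lambda x: x * x, pmf)   # E[X^2] = 3/2
print(e_x, e_x2)
```

Note how the same routine handles any h: the distribution of X is all you need.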

The variance measures the variability, and the formula is

V[h(X)] = E[h(X)²] − (E[h(X)])²

Here is a list of distribution formulas we may need in the final:

1. Discrete:


• Binomial pmf: b(x; n, p) = C(n, x) p^x (1 − p)^(n−x), where C(n, x) = n!/(x!(n−x)!)

• Poisson pmf: p(x; λ) = e^(−λ) λ^x / x!

Make sure you know the regions where the probability mass function assumes the value zero.
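Both pmfs are easy to evaluate directly; the sketch below (with hypothetical arguments) also returns zero outside the support, which is the point of the remark above:

```python
from math import comb, exp, factorial

def binom_pmf(x, n, p):
    # b(x; n, p) = C(n, x) p^x (1-p)^(n-x); zero outside x = 0, ..., n
    if not 0 <= x <= n:
        return 0.0
    return comb(n, x) * p**x * (1 - p)**(n - x)

def poisson_pmf(x, lam):
    # p(x; lambda) = e^(-lambda) lambda^x / x!; zero for x < 0
    if x < 0:
        return 0.0
    return exp(-lam) * lam**x / factorial(x)

print(round(binom_pmf(2, 10, 0.5), 6))  # 0.043945, i.e. 45/1024
print(round(poisson_pmf(0, 2.0), 6))    # 0.135335, i.e. e^(-2)
print(binom_pmf(11, 10, 0.5))           # 0.0: outside the support
```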

2. Continuous:

• Uniform distribution

• Normal distribution:

f(x) = (1 / √(2πσ²)) e^(−(x−µ)² / (2σ²))
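Probabilities under this density are computed through the standard normal cdf Φ rather than by integrating the formula directly. A sketch using the error function in place of a table lookup (the µ, σ, and x below are hypothetical):

```python
from math import erf, sqrt

def phi(z):
    # Standard normal cdf, computed from the error function
    return 0.5 * (1 + erf(z / sqrt(2)))

# Hypothetical parameters: X ~ N(mu, sigma)
mu, sigma = 100.0, 15.0
x = 115.0
# Standardization: P[X <= x] = Phi((x - mu) / sigma)
p = phi((x - mu) / sigma)
print(round(p, 4))  # 0.8413, i.e. Phi(1)
```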

Notice the standardization procedure

P[X ≤ x] = P[Z ≤ (x − µ)/σ] = Φ((x − µ)/σ),

where X has a normal distribution with mean µ and standard deviation σ, and Z has the standard normal distribution. The cdf for the standard normal is Φ(z), which is given in Table A.3.

• Exponential distribution:

f(x) = λe^(−λx) for x ≥ 0

Joint distribution: pmf p(x, y) for discrete rv’s, usually given in a table, and pdf f(x, y) for continuous rv’s, often given by a formula. How do we obtain the cdf from either a pmf or a pdf? The marginal pmf or pdf describes the distribution if we don’t care about one of the variables (that is, we cannot distinguish the different values the other variable assumes). Here is an easy way to see why: the marginal pdf of X does not depend on y, therefore it is obtained by integrating the pdf f(x, y) with respect to y. All y-values are covered in this integral, and the y-dependence vanishes as a consequence.

Conditional pdf: this is in the same spirit as conditional probability. For example, the conditional pdf of Y given that X = x is

f_{Y|X}(y | x) = f(x, y) / f_X(x)
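The marginal and conditional formulas can be traced through a small joint pmf table (the numbers below are hypothetical):

```python
from fractions import Fraction

# Hypothetical joint pmf p(x, y), the kind usually given in a table
joint = {
    (0, 0): Fraction(1, 8), (0, 1): Fraction(3, 8),
    (1, 0): Fraction(3, 8), (1, 1): Fraction(1, 8),
}

# Marginal pmf of X: sum out y (the y-dependence vanishes)
p_x = {}
for (x, y), p in joint.items():
    p_x[x] = p_x.get(x, Fraction(0)) + p

# Conditional pmf of Y given X = 0: p(0, y) / p_X(0)
p_y_given_x0 = {y: joint[(0, y)] / p_x[0] for y in (0, 1)}

print(p_x)           # {0: 1/2, 1: 1/2}
print(p_y_given_x0)  # {0: 1/4, 1: 3/4}
```

Note that the conditional probabilities sum to 1, as any pmf must: dividing by the marginal is exactly the renormalization.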

Covariance between two rv’s is

Cov(X, Y) = E[(X − µ_X)(Y − µ_Y)]

and the correlation is

ρ_{X,Y} = Cov(X, Y) / (σ_X · σ_Y).

The scaling factors are there so that −1 ≤ ρ_{X,Y} ≤ 1. The correlation takes the respective variances into account, so it is often the more useful quantity to quote. When we have several rv’s, we are often interested in the behavior of combinations of these rv’s; one of the simple cases is a linear combination of several rv’s, such as


the sum of several rv’s, or the average. For linear combinations, the expectation of the linear combination is the linear combination of the expectations. When we look at the variability, we should look at the variance first, rather than the standard deviation. The reason is that we have the following formula, assuming the rv’s are independent:

V(a₁X₁ + … + a_nX_n) = a₁²V(X₁) + … + a_n²V(X_n).
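The formula can be checked by enumerating a pair of independent discrete rv’s (the pmfs and coefficients below are all made up):

```python
from fractions import Fraction
from itertools import product

# Two independent hypothetical rv's, given by their pmfs
pmf_x = {0: Fraction(1, 2), 2: Fraction(1, 2)}   # V(X) = 1
pmf_y = {0: Fraction(1, 2), 4: Fraction(1, 2)}   # V(Y) = 4

def var(pmf):
    mean = sum(x * p for x, p in pmf.items())
    return sum((x - mean) ** 2 * p for x, p in pmf.items())

a, b = 3, -2
# Distribution of aX + bY; independence lets us multiply p(x) * p(y)
combo = {}
for (x, px), (y, py) in product(pmf_x.items(), pmf_y.items()):
    v = a * x + b * y
    combo[v] = combo.get(v, Fraction(0)) + px * py

print(var(combo))                             # 25
print(a**2 * var(pmf_x) + b**2 * var(pmf_y))  # 25: the formula agrees
```

Even with b negative, its square makes the variances add, never cancel.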

Here we need to be careful about the signs: even when some coefficients are negative, the variances always add up in the case of independence. If we have a sum of a large number of independent, identically distributed rv’s, centered at its mean and divided by the square root of the number of terms, this rv will behave like a normal rv. This is one of the most important theorems in probability theory: the Central Limit Theorem, which serves as the basis for the z test in later chapters.

Point Estimation

The general notation is

θ̂ for an estimator of the population parameter θ. There can be different estimators for the same parameter, and we would like to choose one for a particular application. Features to look for are the bias and the variance of the estimator; we prefer an unbiased estimator with minimum variance. In reality, however, such an estimator is not always available; sometimes you simply cannot find one with all the desired features.

A statistic is a quantity computed from a sample. Different random samples will result in different values of the statistic. The main focus of the latter part of the course is to use statistics computed from sample data to infer properties of the population under study.

Statistical Inference

There are two approaches here: confidence intervals and tests of significance, each of which can be applied to the following problems: inference about the mean, the proportion, or the variance of a population, and inference about the difference between two population means or between two population proportions.

1. Confidence Intervals for One Sample:

The basic form of a confidence interval is

θ̂ ± z* · SE or θ̂ ± t* · SE,

where z* and t* are the critical z-value and t-value determined by the confidence level and the type of interval (for the z-test and t-test respectively), and SE is the standard error.
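A sketch of the large-sample z-interval for a mean; the sample summary below is hypothetical, and 1.96 is the familiar 95% critical z-value:

```python
from math import sqrt

# Hypothetical sample summary: size, sample mean, sample sd
n, xbar, s = 50, 12.3, 2.1
z_star = 1.96              # critical z-value for 95% confidence
se = s / sqrt(n)           # standard error of the sample mean
lo, hi = xbar - z_star * se, xbar + z_star * se
print(round(lo, 3), round(hi, 3))  # 11.718 12.882
```

The same skeleton works for any θ̂: change the standard error formula and the critical value, and the ± structure stays the same.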


Inference about µ, normal distribution:
• Large sample: z-test, with zα or zα/2, SE = s/√n
• Sample not large: t-test, with tα,n−1 or tα/2,n−1, SE = s/√n

Inference about µ, distribution unknown:
• Large sample: same as above
• Sample not large: N/A

Inference about p (proportion):
• Large sample: similar to the above, except SE = √(p̂(1 − p̂)/n)
• Sample not large: use the binomial distribution

2. Test of Hypothesis for One Sample:

The test statistic is

(θ̂ − θ₀) / SE,

where θ̂ is the value of the statistic from the sample, θ₀ is the null value, and SE is the standard error, with the formulas given in the table above. The test to be used is also specified in that table.

The first step is always to state the hypotheses. The null hypothesis is usually obvious, but there is a decision to be made about the form of the alternative hypothesis, which is usually determined from the context of the problem. It is recommended that you read the question several times before you settle on an alternative hypothesis. There are two equivalent approaches to testing the hypotheses if you want to use your data to question the null hypothesis at the given level α: (a) compute the test statistic and compare it with the corresponding critical value; (b) compute the P-value and compare it with α. Notice that a large test statistic (in absolute value) usually points toward rejecting the null hypothesis.
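A sketch of one such test with hypothetical numbers, following approach (b); the error function stands in for the normal table:

```python
from math import erf, sqrt

def phi(z):
    # Standard normal cdf via the error function
    return 0.5 * (1 + erf(z / sqrt(2)))

# Hypothetical one-sample z-test of H0: mu = 10 vs Ha: mu > 10
n, xbar, s, mu0 = 64, 10.5, 2.0, 10.0
z = (xbar - mu0) / (s / sqrt(n))  # (theta-hat - theta_0) / SE
p_value = 1 - phi(z)              # upper-tailed alternative
print(round(z, 2), round(p_value, 4))  # 2.0 0.0228
print(p_value < 0.05)             # True: reject H0 at level 0.05
```

Approach (a) would instead compare z = 2.0 with the critical value z₀.₀₅ ≈ 1.645 and reach the same conclusion.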

3. Confidence Interval and Test of Hypothesis Involving Two Samples:

First you have to decide whether to use a two-sample test or a paired-data test. In the latter case the formulas are similar to the one-sample formulas, as long as you keep the pairs and study the differences. In a true two-sample test, the major change is the definition of the standard error.


For the difference between the means:

SE = √(s₁²/m + s₂²/n).

This is used in both the confidence interval and the two-sample t-procedure.

For differences in proportions, we use different formulas for the standard errors. In confidence interval computations, we use

SE = √(p̂₁q̂₁/m + p̂₂q̂₂/n),

and in significance tests for comparing two proportions, where the null hypothesis assumes the proportions are the same, we use

SE = √(p̂q̂(1/m + 1/n)),

where p̂ is the pooled proportion (and q̂ = 1 − p̂).
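Both standard errors in one place, with hypothetical counts (all numbers invented):

```python
from math import sqrt

# Hypothetical counts: x1 successes out of m trials, x2 out of n
x1, m = 60, 200
x2, n = 45, 180
p1, p2 = x1 / m, x2 / n

# SE for a confidence interval on p1 - p2 (unpooled)
se_ci = sqrt(p1 * (1 - p1) / m + p2 * (1 - p2) / n)

# SE for the test of H0: p1 = p2 (pooled proportion)
p_pool = (x1 + x2) / (m + n)
se_test = sqrt(p_pool * (1 - p_pool) * (1 / m + 1 / n))

print(round(se_ci, 4), round(se_test, 4))  # 0.0457 0.0459
```

The two values are close here because p̂₁ and p̂₂ are close, but they answer different questions: the pooled SE is only valid under the null hypothesis.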

For inference concerning two population variances, we should know the statistic s₁²/s₂² and that it has an F distribution under the null hypothesis of equal variances.

4. Computation of the probability of type-II errors: the formulas will be supplied, but you need to understand the meaning of type-II errors and why we care about them.

Good luck with the final and congratulations to those who are graduating!