United Stats Of AMERICA. Unit 7 chapters 26-27 Jordo, Rob III, Kins and Toph.
Chapter 26
Three types of Chi-squared tests:
1) Goodness of Fit
2) Homogeneity
3) Independence
Goodness (Gracious!) of Fit
The test is used when you have one categorical variable from a single population. It is used to determine whether sample data are consistent with a hypothesized distribution (how “good” the data fit the hypothesis).
Goodness of Fit
Conditions:
● The sampling method is random.
● The variable under study is categorical (counted).
● The expected number of sample observations in each level of the variable is at least 5 (expected cells ≥ 5).
Degrees of freedom: df = n - 1, where n = total number of categories (not the sample size)
Goodness of Fit
Hypotheses:
We need a null hypothesis (H0) and an alternative hypothesis (Ha). The hypotheses are mutually exclusive, so if one is true, the other must be false, and vice versa. For a chi-square goodness of fit test, the hypotheses take the following form.
H0: The data are consistent with a specified distribution.
Ha: The data are not consistent with a specified distribution.
Goodness of Fit
Acme Toy Company prints baseball cards. The company claims that
30% of the cards are rookies, 60% veterans, and 10% are All-Stars.
The cards are sold in packages of 100.
Suppose a randomly selected package of cards has 50 rookies, 45
veterans, and 5 All-Stars. Is this consistent with Acme's claim? Use a
0.05 level of significance.
Here there is only one categorical variable (card type), so entering the observed counts and the expected counts into the calculator and running a χ² GOF test is super easy.
                  Rookies  Veterans  All-Stars  Total
Observed               50        45          5    100
Expected (claim)       30        60         10    100
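The calculator step described above can be reproduced by hand. This is a minimal sketch in Python, assuming no statistics library is available: the helper names `chi2_sf` and `goodness_of_fit` are hypothetical, and the P-value comes from a hand-written power series for the chi-square upper tail rather than from the calculator (a library routine such as SciPy's `scipy.stats.chisquare` would normally do this instead).

```python
import math


def chi2_sf(stat, df, terms=500):
    """Upper-tail area P(X >= stat) for a chi-square distribution with df degrees of freedom.

    Uses the power series for the regularized lower incomplete gamma
    function P(a, x) with a = df / 2 and x = stat / 2, then returns 1 - P.
    """
    a, x = df / 2.0, stat / 2.0
    if x <= 0:
        return 1.0
    term = 1.0 / a  # n = 0 term of the series
    total = term
    for n in range(1, terms):
        term *= x / (a + n)
        total += term
    lower = math.exp(a * math.log(x) - x - math.lgamma(a)) * total
    return max(0.0, 1.0 - lower)


def goodness_of_fit(observed, proportions):
    """Chi-square goodness-of-fit test against hypothesized proportions."""
    n = sum(observed)
    expected = [n * p for p in proportions]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - 1  # number of categories minus 1
    return expected, stat, df, chi2_sf(stat, df)


# Acme example: one package of 100 cards.
observed = [50, 45, 5]        # rookies, veterans, All-Stars
claimed = [0.30, 0.60, 0.10]  # Acme's claimed proportions
expected, stat, df, p = goodness_of_fit(observed, claimed)

print("expected counts:", expected)
print("all expected >= 5:", all(e >= 5 for e in expected))
print(f"chi-square = {stat:.2f}, df = {df}, P-value = {p:.6f}")
```

The statistic works out to about 19.58 on 2 degrees of freedom, and every expected count (30, 60, 10) is at least 5, so the conditions hold and the P-value is far below 0.05.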
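No. The package is not consistent with Acme's claim: χ² ≈ 19.58 with df = 2, and the P-value is far below 0.05, so we reject H0.
Homogeneity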
This test is used for one categorical variable measured in two or more populations (or groups). It is used to determine whether the distribution of that variable is the same across the populations.
Homogeneity
Conditions
● Expected cells ≥ 5
● Categorical
● Random
Homogeneity
Hypotheses:
H0: The distribution of the categorical variable is the same in every population.
Ha: The distributions are not all the same.
Homogeneity
Viewing Preferences
                Lone Ranger  Sesame Street  The Simpsons  Row total
Boys                     50             30            20        100
Girls                    50             80            70        200
Column total            100            110            90        300
In a study of the television viewing habits of children, a developmental psychologist selects a random sample of 300 first graders - 100 boys and 200 girls. Each child is asked which of the following TV programs they like best: The Lone Ranger, Sesame Street, or The Simpsons. Results are shown in the contingency table above. Do the boys' preferences for these TV programs differ significantly from the girls' preferences? Use a 0.05 level of significance.
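The same hand calculation extends to a two-way table: each expected count is (row total × column total) / grand total, and df = (rows - 1)(columns - 1). This is a minimal sketch in Python with hypothetical helper names `chi2_sf` and `two_way_chi2`, using a hand-written chi-square tail series as a stand-in for the calculator's χ² test (SciPy's `scipy.stats.chi2_contingency` is the usual library route).

```python
import math


def chi2_sf(stat, df, terms=500):
    """Upper-tail area P(X >= stat) for a chi-square distribution with df degrees of freedom.

    Power series for the regularized lower incomplete gamma P(a, x),
    with a = df / 2 and x = stat / 2; the upper tail is 1 - P.
    """
    a, x = df / 2.0, stat / 2.0
    if x <= 0:
        return 1.0
    term = 1.0 / a
    total = term
    for n in range(1, terms):
        term *= x / (a + n)
        total += term
    lower = math.exp(a * math.log(x) - x - math.lgamma(a)) * total
    return max(0.0, 1.0 - lower)


def two_way_chi2(table):
    """Chi-square statistic, df, and P-value for a two-way table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    # Expected count = row total * column total / grand total
    expected = [[r * c / grand for c in col_totals] for r in row_totals]
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(col_totals) - 1)
    return expected, stat, df, chi2_sf(stat, df)


# Rows: boys, girls. Columns: Lone Ranger, Sesame Street, The Simpsons.
viewing = [
    [50, 30, 20],
    [50, 80, 70],
]
expected, stat, df, p = two_way_chi2(viewing)
for label, row in zip(["Boys", "Girls"], expected):
    print(label, [round(e, 2) for e in row])
print(f"chi-square = {stat:.2f}, df = {df}, P-value = {p:.6f}")
```

With these counts χ² ≈ 19.32 on 2 degrees of freedom, the smallest expected cell is 30, and the P-value is well below 0.05, so the boys' and girls' preferences differ significantly.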
Declaration of Independence
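● We use the test for Independence to find out whether two categorical variables measured on a single sample are related (associated). A significant result shows association, not that one variable causes the other.
● H0 will always be that X is INDEPENDENT of Y.
● Ha will always be that X is DEPENDENT ON Y (not independent).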
Independence
Example:
          Yes   No   Total
Male        2    6       8
Female      4    8      12
Total       6   14      20
● To find the degrees of freedom, compute (number of rows - 1)(number of columns - 1). For this table it would be (2 - 1)(2 - 1) = 1.
● We have to find the expected cell counts to make sure they are all greater than or equal to 5. Each expected count is (row total × column total) / grand total. For the Male/Yes cell that is 8 × 6 / 20 = 2.4, which is less than 5, so the conditions are not met and this example would not work.
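Checking every expected cell, not just one, can be sketched in a few lines of Python. The helper name `expected_counts` is hypothetical; the sketch only computes df and the expected counts and flags cells below 5, and deliberately stops there because the test should not be run when the condition fails.

```python
def expected_counts(table):
    """Expected count for each cell: row total * column total / grand total."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    return [[r * c / grand for c in col_totals] for r in row_totals]


rows = ["Male", "Female"]
cols = ["Yes", "No"]
table = [
    [2, 6],  # Male: yes, no
    [4, 8],  # Female: yes, no
]

df = (len(table) - 1) * (len(table[0]) - 1)  # (rows - 1)(columns - 1)
expected = expected_counts(table)

# Collect every cell whose expected count is below 5.
too_small = [
    (rows[i], cols[j], e)
    for i, row in enumerate(expected)
    for j, e in enumerate(row)
    if e < 5
]

print("df =", df)
for r, c, e in too_small:
    print(f"{r}/{c}: expected {e:.1f} < 5")
print("conditions met:", not too_small)
```

It flags Male/Yes (2.4) and also Female/Yes (12 × 6 / 20 = 3.6), so the expected-cells condition fails in two places.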
Independence
Conditions:
- Categorical
- Counted
- Expected cells ≥ 5
- Random
- Independent
Chapter 27
Regression Analysis
● We use regression analysis to determine whether a linear relationship exists between two quantitative variables.
● Chapter 27 is a throwback to earlier chapters
○ Chapter 8 - Scatterplots
H0: β1 = 0 (The slope is equal to 0, meaning there is no linear relationship.)
HA: β1 ≠ 0 (The slope is not equal to 0, so there is a linear relationship.)
Conditions
● Straight Enough (linear)
● Quantitative data
● Residual plot shows no pattern and roughly equal spread
● Random
● Nearly Normal (residuals)
● No outliers
Example
How to make the regression equation:
● The row labeled “Constant” (or “Intercept”) gives the y-intercept (Beta 0).
● The other row, which is usually labeled with the name of the x-variable, shows the slope (Beta 1).
Ŷ = 83.608 - 4.0888(x)
*****Make sure you talk about the slope and the r-squared in context*****
Making an inference
Since we are testing β1, which is the slope, we will look at the P-value in the row for the x-variable.
● P = 0.000 (rounded by the output, so P < 0.001)
● Conclusion: We have enough evidence to reject the null, and can conclude that there is a linear relationship between the two variables.
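The P-value in the output comes from the test statistic t = b1 / SE(b1). This is a minimal sketch in Python using the slope (-4.088) and standard error (0.3842) from the slide's confidence-interval step and the slide's t* = 2.1 as the critical value; it assumes that t* was found with the same df = n - 2 as the test, since the slide does not state n.

```python
# Values read off the slide's regression output (slope row).
b1 = -4.088      # estimated slope
se_b1 = 0.3842   # standard error of the slope
t_star = 2.1     # 95% two-sided critical value used on the slide (assumed same df)

# Test statistic for H0: beta1 = 0
t = (b1 - 0) / se_b1

print(f"t = {t:.2f}")
if abs(t) > t_star:
    print("|t| > t*: reject H0 at the 0.05 level (evidence of a linear relationship)")
else:
    print("|t| <= t*: fail to reject H0")
```

A t-statistic of about -10.64 lies far beyond ±2.1, which matches the P-value that rounds to 0.000.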
Confidence Intervals
The equation for a confidence interval for the slope is: b1 ± t*(SE), where b1 is the sample slope (the estimate of β1) and SE is its standard error.
The sample slope b1 is -4.088 and its SE is 0.3842. We'll make a 95% confidence interval, so we need the critical value t*, found with the inverse-t function on the calculator (invT on a TI-84) using df = n - 2. The interval comes out to be -4.088 ± 2.1(0.3842).
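The arithmetic of the interval can be checked with a short Python sketch. The function name `slope_confidence_interval` is hypothetical, and t* = 2.1 is taken from the slide rather than computed.

```python
def slope_confidence_interval(b1, se_b1, t_star):
    """Confidence interval for the true slope: b1 +/- t* * SE(b1)."""
    margin = t_star * se_b1  # margin of error
    return b1 - margin, b1 + margin


low, high = slope_confidence_interval(-4.088, 0.3842, 2.1)
print(f"95% CI for the slope: ({low:.2f}, {high:.2f})")  # (-4.89, -3.28)
```

The margin of error is 2.1 × 0.3842 ≈ 0.807, which gives the endpoints used in the conclusion.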
Conclusion: We are 95% confident that the true slope of the relationship between the two variables is between -4.89 and -3.28 (for each 1-unit increase in x, y decreases by between 3.28 and 4.89 units, on average).
*****Degrees of freedom for inference about the slope (simple linear regression) are always n - 2*****