the Course…cstl-cla.semo.edu/wmiller/ps240/Methods Guide.pdfdigits (the whole number and the first...

The Math Part of the Course…

Measures of Central Tendency

Mode: The number with the highest frequency in a dataset
Median: The middle number in a dataset
Mean: The average of the dataset

When to use each:
Mode: Good for non-numerical data and for frequent occurrences
Median: When an outlier may significantly influence the mean, use median
Mean: When data have no likely outlier, use mean

Measures of Dispersion

Range: The spread between the lowest and highest values in a dataset (describes the extremes around the typical case)
Standard deviation: Shows how much variation there is from the mean. A low standard deviation indicates that the data points tend to be very close to the mean, whereas a high standard deviation indicates that the data are spread out over a large range of values.

Population Standard Deviation Formula

$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$

Sample Standard Deviation Formula

$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$

Solving for population standard deviation: Assume the dataset: 1, 8, 14, 29, 46

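Step one: Solve for μ:

$\mu = \frac{1 + 8 + 14 + 29 + 46}{5} = \frac{98}{5} = 19.6$

Step two: Solve for $\sum (x - \mu)^2$:

x      μ       x − μ     (x − μ)²
1      19.6    -18.6     345.96
8      19.6    -11.6     134.56
14     19.6    -5.6      31.36
29     19.6    9.4       88.36
46     19.6    26.4      696.96
Sum                      1297.20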

Step three: Solve final equation

$\sigma = \sqrt{\frac{1297.20}{5}} = \sqrt{259.44} \approx 16.11$
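The three steps can also be checked with a few lines of Python. This is a minimal sketch using only the standard library; the variable names are illustrative and not part of the original guide.

```python
import math
import statistics

data = [1, 8, 14, 29, 46]

# Step one: the mean
mu = sum(data) / len(data)

# Step two: sum of squared deviations from the mean
sum_sq = sum((x - mu) ** 2 for x in data)

# Step three: divide by N (population) and take the square root
population_sd = math.sqrt(sum_sq / len(data))

# The sample version divides by n - 1 instead of N
sample_sd = math.sqrt(sum_sq / (len(data) - 1))

print(round(mu, 2), round(sum_sq, 2), round(population_sd, 2), round(sample_sd, 2))

# The standard library offers the same two calculations directly
print(statistics.pstdev(data), statistics.stdev(data))
```

The sample standard deviation is always a little larger than the population value for the same data, because the sum of squares is divided by a smaller number.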

The Normal Distribution

Say μ = 2 and σ = 1/3 in a normal distribution.

The graph of the normal distribution is as follows:

[Figure: normal curve with μ = 2, σ = 1/3]

The following graph represents the same information, but it has been standardized so that μ = 0 and σ = 1:

[Figure: standard normal curve with μ = 0, σ = 1]

The two graphs have different μ and σ, but have the same shape (if we tweak the axes).

The new distribution of the normal random variable Z with mean 0 and variance 1 (or standard deviation 1) is called a standard normal distribution. Standardizing the distribution like this makes it much easier to calculate probabilities.

Considering our example above where μ = 2, σ = 1/3, then

One-half standard deviation = σ/2 = 1/6, and

Two standard deviations = 2σ = 2/3

If we have mean μ and standard deviation σ, then

$Z = \frac{X - \mu}{\sigma}$

Since all the values of X falling between x1 and x2 have corresponding Z values between z1 and z2, it means:

The area under the X curve between X = x1 and X = x2 equals:

The area under the Z curve between Z = z1 and Z = z2.

Hence, we have the following equivalent probabilities:

P(x1 < X < x2) = P(z1 < Z < z2)

So ½ s.d. to 2 s.d. to the right of μ = 2 will be represented by the area from x1 = 2 + 1/6 = 13/6 ≈ 2.17 to x2 = 2 + 2/3 = 8/3 ≈ 2.67.

This area is graphed as follows:

[Figure: normal curve with μ = 2, σ = 1/3, with the area between x1 and x2 shaded]

The area above is exactly the same as the area z1 = 0.5 to z2 = 2 in the standard normal curve:

[Figure: standard normal curve with μ = 0, σ = 1, with the area between z1 = 0.5 and z2 = 2 shaded]
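The equality of the two areas can be confirmed numerically. The sketch below assumes a helper, normal_cdf, built on the error function from Python's standard library; the helper name is illustrative and not part of the original guide.

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """P(X <= x) for a normal distribution, computed with the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

mu, sigma = 2.0, 1.0 / 3.0

# One-half and two standard deviations to the right of the mean
x1 = mu + 0.5 * sigma
x2 = mu + 2.0 * sigma

# Standardize: Z = (X - mu) / sigma
z1 = (x1 - mu) / sigma
z2 = (x2 - mu) / sigma

area_x = normal_cdf(x2, mu, sigma) - normal_cdf(x1, mu, sigma)  # area under the X curve
area_z = normal_cdf(z2) - normal_cdf(z1)                        # area under the Z curve

print(round(area_x, 4), round(area_z, 4))
```

Both areas come out the same, which is why one table for the standard normal curve is enough for every normal distribution.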

Finding the Area Under the Normal Curve

In the standard normal curve, the mean is 0 and the standard deviation is 1.

The green shaded area in the diagram represents the area between the mean and 1.45 standard deviations to the right of the mean. The area of this shaded portion is 0.4265 (or 42.65% of the total area under the curve).

To get this area of 0.4265, we read down the left side of the table for the first 2 digits of the z-value (the whole number and the first number after the decimal point, in this case 1.4), then we read across the table for the "0.05" part (the top row represents the 2nd decimal place of the z-value that we are interested in).

z 0.00 0.01 0.02 0.03 0.04 0.05 0.06

1.4 0.4192 0.4207 0.4222 0.4236 0.4251 0.4265 0.4279

We have:

(left column) 1.4 + (top row) 0.05 = 1.45 standard deviations

The area represented by 1.45 standard deviations to the right of the mean is shaded in green in the standard normal curve above.

You can see how to find the value of 0.4265 in the full z-table below. Follow the "1.4" row across and the "0.05" column down until they meet at 0.4265.

z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09

0.0 0.0000 0.0040 0.0080 0.0120 0.0160 0.0199 0.0239 0.0279 0.0319 0.0359

0.1 0.0398 0.0438 0.0478 0.0517 0.0557 0.0596 0.0636 0.0675 0.0714 0.0753

0.2 0.0793 0.0832 0.0871 0.0910 0.0948 0.0987 0.1026 0.1064 0.1103 0.1141

0.3 0.1179 0.1217 0.1255 0.1293 0.1331 0.1368 0.1406 0.1443 0.1480 0.1517

0.4 0.1554 0.1591 0.1628 0.1664 0.1700 0.1736 0.1772 0.1808 0.1844 0.1879

0.5 0.1915 0.1950 0.1985 0.2019 0.2054 0.2088 0.2123 0.2157 0.2190 0.2224

0.6 0.2257 0.2291 0.2324 0.2357 0.2389 0.2422 0.2454 0.2486 0.2517 0.2549

0.7 0.2580 0.2611 0.2642 0.2673 0.2704 0.2734 0.2764 0.2794 0.2823 0.2852

0.8 0.2881 0.2910 0.2939 0.2967 0.2995 0.3023 0.3051 0.3078 0.3106 0.3133

0.9 0.3159 0.3186 0.3212 0.3238 0.3264 0.3289 0.3315 0.3340 0.3365 0.3389

1.0 0.3413 0.3438 0.3461 0.3485 0.3508 0.3531 0.3554 0.3577 0.3599 0.3621

1.1 0.3643 0.3665 0.3686 0.3708 0.3729 0.3749 0.3770 0.3790 0.3810 0.3830

1.2 0.3849 0.3869 0.3888 0.3907 0.3925 0.3944 0.3962 0.3980 0.3997 0.4015

1.3 0.4032 0.4049 0.4066 0.4082 0.4099 0.4115 0.4131 0.4147 0.4162 0.4177

1.4 0.4192 0.4207 0.4222 0.4236 0.4251 0.4265 0.4279 0.4292 0.4306 0.4319

1.5 0.4332 0.4345 0.4357 0.4370 0.4382 0.4394 0.4406 0.4418 0.4429 0.4441

1.6 0.4452 0.4463 0.4474 0.4484 0.4495 0.4505 0.4515 0.4525 0.4535 0.4545

1.7 0.4554 0.4564 0.4573 0.4582 0.4591 0.4599 0.4608 0.4616 0.4625 0.4633

1.8 0.4641 0.4649 0.4656 0.4664 0.4671 0.4678 0.4686 0.4693 0.4699 0.4706

1.9 0.4713 0.4719 0.4726 0.4732 0.4738 0.4744 0.4750 0.4756 0.4761 0.4767

2.0 0.4772 0.4778 0.4783 0.4788 0.4793 0.4798 0.4803 0.4808 0.4812 0.4817

2.1 0.4821 0.4826 0.4830 0.4834 0.4838 0.4842 0.4846 0.4850 0.4854 0.4857

2.2 0.4861 0.4864 0.4868 0.4871 0.4875 0.4878 0.4881 0.4884 0.4887 0.4890

2.3 0.4893 0.4896 0.4898 0.4901 0.4904 0.4906 0.4909 0.4911 0.4913 0.4916

2.4 0.4918 0.4920 0.4922 0.4925 0.4927 0.4929 0.4931 0.4932 0.4934 0.4936

2.5 0.4938 0.4940 0.4941 0.4943 0.4945 0.4946 0.4948 0.4949 0.4951 0.4952

2.6 0.4953 0.4955 0.4956 0.4957 0.4959 0.4960 0.4961 0.4962 0.4963 0.4964

2.7 0.4965 0.4966 0.4967 0.4968 0.4969 0.4970 0.4971 0.4972 0.4973 0.4974

2.8 0.4974 0.4975 0.4976 0.4977 0.4977 0.4978 0.4979 0.4979 0.4980 0.4981

2.9 0.4981 0.4982 0.4982 0.4983 0.4984 0.4984 0.4985 0.4985 0.4986 0.4986

3.0 0.4987 0.4987 0.4987 0.4988 0.4988 0.4989 0.4989 0.4989 0.4990 0.4990

3.1 0.4990 0.4991 0.4991 0.4991 0.4992 0.4992 0.4992 0.4992 0.4993 0.4993

3.2 0.4993 0.4993 0.4994 0.4994 0.4994 0.4994 0.4994 0.4995 0.4995 0.4995

3.3 0.4995 0.4995 0.4995 0.4996 0.4996 0.4996 0.4996 0.4996 0.4996 0.4997

3.4 0.4997 0.4997 0.4997 0.4997 0.4997 0.4997 0.4997 0.4997 0.4997 0.4998

3.5 0.4998 0.4998 0.4998 0.4998 0.4998 0.4998 0.4998 0.4998 0.4998 0.4998

3.6 0.4998 0.4998 0.4999 0.4999 0.4999 0.4999 0.4999 0.4999 0.4999 0.4999

3.7 0.4999 0.4999 0.4999 0.4999 0.4999 0.4999 0.4999 0.4999 0.4999 0.4999
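Each entry in the z-table is the area between the mean (z = 0) and the given z-value. As a rough sketch, a row of the table can be regenerated with the error function from Python's standard library. The function names here are illustrative, and individual entries may differ from a printed table in the last digit because of rounding.

```python
import math

def area_mean_to_z(z):
    """Area under the standard normal curve between 0 and z (one table entry)."""
    return 0.5 * math.erf(z / math.sqrt(2.0))

def table_row(first_digits):
    """Ten entries for z = first_digits + 0.00 through first_digits + 0.09."""
    return [round(area_mean_to_z(first_digits + j / 100.0), 4) for j in range(10)]

row_14 = table_row(1.4)
print(row_14)

# Row 1.4, column 0.05 corresponds to z = 1.45
print(row_14[5])
```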

Find the area under the standard normal curve for the following, using the z-table. Sketch each one.

(a) between z = 0 and z = 0.78

(b) between z = -0.56 and z = 0

(c) between z = -0.43 and z = 0.78

(d) between z = 0.44 and z = 1.50

(e) to the right of z = -1.33.

(a) 0.2823

(b) 0.2123

(c) 0.1664 + 0.2823 = 0.4487

(d) 0.4332 - 0.1700 = 0.2632

(e) 0.4082 + 0.5 = 0.9082
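The five answers can be cross-checked without a table. This sketch assumes Python 3.8 or later, where statistics.NormalDist is available; the helper area_between is an illustrative name.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1

def area_between(z_low, z_high):
    """Area under the standard normal curve between two z-values."""
    return Z.cdf(z_high) - Z.cdf(z_low)

answers = {
    "a": area_between(0, 0.78),
    "b": area_between(-0.56, 0),
    "c": area_between(-0.43, 0.78),
    "d": area_between(0.44, 1.50),
    "e": 1 - Z.cdf(-1.33),  # everything to the right of z = -1.33
}

for label, area in answers.items():
    print(label, round(area, 4))
```

Areas on opposite sides of the mean are added, as in (c), and areas on the same side are subtracted, as in (d).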

It was found that the mean length of 100 parts produced by a lathe was 20.05 mm with a standard deviation of 0.02 mm. Find the probability that a part selected at random would have a length

(a) between 20.03 mm and 20.08 mm

(b) between 20.06 mm and 20.07 mm

(c) less than 20.01 mm

X = length of part

(a) 20.03 is 1 standard deviation below the mean;

20.08 is 1.5 standard deviations above the mean

P(20.03<X<20.08) =P(-1<Z<1.5) =.3413+.4332 =.7745

So the probability is 0.7745.

(b) 20.06 is 0.5 standard deviations above the mean;

20.07 is 1 standard deviation above the mean

P(20.06<X<20.07) =P(.5<Z<1) =.3413-.1915 =.1498

So the probability is 0.1498.

(c) 20.01 is 2 s.d. below the mean.

P(X<20.01) =P(Z<-2) =.5-.4772 =.0228

So the probability is 0.0228.

A company pays its employees an average wage of $3.25 an hour with a standard deviation of 60 cents. If the wages are approximately normally distributed, determine

(a) the proportion of the workers getting wages between $2.75 and $3.69 an hour;
(b) the minimum wage of the highest 5%.

X = wage

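(a) 2.75 is (2.75 − 3.25)/0.60 = −0.833 standard deviations from the mean;

3.69 is (3.69 − 3.25)/0.60 = 0.733 standard deviations from the mean

P(2.75<X<3.69) = P(-.833<Z<.733) =.298 + .268 =.566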

So about 56.6% of the workers have wages between $2.75 and $3.69 an hour.

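(b) x = minimum wage of the highest 5%

We need P(X > x) = 0.05, so the area between the mean and z is 0.5 − 0.05 = 0.45.

z = 1.645 (from table)

(x − 3.25)/0.60 = 1.645, so x − 3.25 = .987 and x = 4.237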

So the minimum wage of the top 5% of salaries is $4.24.
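Both parts of the wage problem can be checked with statistics.NormalDist (Python 3.8 or later). This is a sketch under the problem's own assumption that wages are normally distributed; the variable names are illustrative.

```python
from statistics import NormalDist

# Wages: mean $3.25, standard deviation $0.60
wages = NormalDist(mu=3.25, sigma=0.60)

# (a) proportion of workers earning between $2.75 and $3.69
proportion = wages.cdf(3.69) - wages.cdf(2.75)

# (b) the wage that separates the top 5% from the rest,
# i.e. the value with 95% of the distribution below it
cutoff = wages.inv_cdf(0.95)

print(round(proportion, 3), round(cutoff, 2))
```

Part (a) goes from a value to a probability (cdf), while part (b) goes from a probability back to a value (inv_cdf), which is the same as reading the z-table in reverse.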

The average life of a certain type of motor is 10 years, with a standard deviation of 2 years. If the manufacturer is willing to replace only 3% of the motors that fail, how long a guarantee should he offer? Assume that the lives of the motors follow a normal distribution.

X = life of motor

x = guarantee period

Normal Curve: μ = 10, σ = 2

We need to find the value (in years) that will give us the bottom 3% of the distribution. These are the motors that we are willing to replace under the guarantee.

P(X < x) = 0.03

The area that we can find from the z-table is

0.5 - 0.03 = 0.47

The corresponding z-score is z = -1.88.

Since $z = \frac{x - \mu}{\sigma}$, we can write:

$\frac{x - 10}{2} = -1.88$

Solving this gives x = 10 + 2(−1.88) = 6.24.

So the guarantee period should be 6.24 years.

Measures of Association

                              Age Group
Monkey Favorability Rating    <12    12-24    >24
Low                             4        6     18
Medium                          8        9      9
High                           20        8      3

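Lambda:

An asymmetrical measure of association: the value varies depending on which variable is independent.

Ranges from 0 to 1

Formula: $\lambda = \frac{E_1 - E_2}{E_1}$, where E1 is the number of prediction errors made without knowing the independent variable and E2 is the number made when it is known.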

1. Calculate Row and Column Totals

                              Age Group
Monkey Favorability Rating    <12    12-24    >24    Total
Low                             4        6     18       28
Medium                          8        9      9       26
High                           20        8      3       31
Total                          32       23     30       85

2. Calculate E1: Find the mode of the dependent variable (the attribute that occurs the most often) and subtract it from N (sample size). E1=N-ƒ of the mode

E1=85-31=54

3. Calculate E2: Find the mode in each column (i.e., category of the independent variable). Subtract each value from the column (category) total and add them together. E2=(Column total – Column mode) + (Column total – Column mode) for all attributes of the independent variable.

E2=(32-20)+(23-9)+(30-18)=12+14+12=38

4. Find lambda.

$\lambda = \frac{E_1 - E_2}{E_1} = \frac{54 - 38}{54} = \frac{16}{54} \approx 0.30$

Taking the respondent's age into account reduces the errors in predicting monkey favorability by about thirty percent.
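The lambda calculation can be sketched in Python. The layout (rows for the dependent variable, columns for the independent variable) and all variable names are assumptions made for this illustration.

```python
# Rows: favorability (Low, Medium, High); columns: age group (<12, 12-24, >24)
table = [
    [4, 6, 18],
    [8, 9, 9],
    [20, 8, 3],
]

n = sum(sum(row) for row in table)
row_totals = [sum(row) for row in table]
columns = list(zip(*table))  # transpose so each entry is one age group

# E1: errors made when always guessing the modal category of the dependent variable
e1 = n - max(row_totals)

# E2: errors made when guessing the modal category within each column
e2 = sum(sum(col) - max(col) for col in columns)

lam = (e1 - e2) / e1
print(e1, e2, round(lam, 2))
```

Swapping the roles of rows and columns would generally give a different value, which is what makes lambda an asymmetrical measure.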

Gamma:

• A measure of association using ordinal variables

• It is a symmetrical measure, therefore you don't need to specify the IV and DV.

• Compares pairs of observations that are positive (going in the same direction) and negative (going in the opposite direction).

• Ranges from -1 to +1; the sign gives the direction of the relationship

• Formula: $\gamma = \frac{N_s - N_d}{N_s + N_d}$

• Ns=Count of Same order pairs (positive); Nd= Count of inverse order pairs (negative)

                              Age Group
Monkey Favorability Rating    <12    12-24    >24
Low                             4        6     18
Medium                          8        9      9
High                           20        8      3

To find Ns: Starting at the top left, multiply each cell frequency by the sum of all cells that are lower and to the right of that cell, then add the products.

Ns= 4(9+9+8+3) + 8(8+3) + 6(9+3) + 9(3)
Ns= 116 + 88 + 72 + 27 = 303

To find Nd: Starting at the top right, multiply each cell frequency by the sum of all cells that are lower and to the left of that cell, then add the products.

Nd= 18(9+8+8+20) + 9(8+20) + 6(8+20) + 9(20)
Nd= 810 + 252 + 168 + 180 = 1410

$\gamma = \frac{N_s - N_d}{N_s + N_d} = \frac{303 - 1410}{303 + 1410} = \frac{-1107}{1713} \approx -0.65$

Interpret: Using age to predict monkey favorability results in a proportional reduction of error of 65%. There is an inverse or negative relationship: as age increases, favorability of monkeys decreases.
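Counting pairs by hand is error-prone, so a direct count is a useful check. The sketch below is one straightforward way to do it, not necessarily the most efficient; the function name count_pairs is illustrative, and the table is assumed to have both variables ordered from low to high.

```python
def count_pairs(table):
    """Return (Ns, Nd): same-order and inverse-order pair counts for an ordered table."""
    n_rows, n_cols = len(table), len(table[0])
    ns = nd = 0
    for i in range(n_rows):
        for j in range(n_cols):
            for k in range(i + 1, n_rows):       # only rows below the current cell
                for m in range(n_cols):
                    if m > j:                    # lower and to the right: same order
                        ns += table[i][j] * table[k][m]
                    elif m < j:                  # lower and to the left: inverse order
                        nd += table[i][j] * table[k][m]
    return ns, nd

# Rows: favorability (Low, Medium, High); columns: age group (<12, 12-24, >24)
table = [
    [4, 6, 18],
    [8, 9, 9],
    [20, 8, 3],
]

ns, nd = count_pairs(table)
gamma = (ns - nd) / (ns + nd)
print(ns, nd, round(gamma, 2))
```

Pairs that are tied on either variable (same row or same column) are ignored, which is how gamma is defined.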

Chi-Square: Chi-square is a statistical test commonly used to compare observed data with the data we would expect to obtain according to a specific hypothesis. For example, if, according to Mendel's laws, you expected 10 of 20 offspring from a cross to be male and the actual observed number was 8 males, then you might want to know about the "goodness of fit" between the observed and expected. Were the deviations (differences between observed and expected) the result of chance, or were they due to other factors? How much deviation can occur before you, the investigator, must conclude that something other than chance is at work, causing the observed to differ from the expected? The chi-square test always tests what scientists call the null hypothesis, which states that there is no significant difference between the expected and observed result.

                              Age Group
Monkey Favorability Rating    <12    12-24    >24    Total
Low                             4        6     18       28
Medium                          8        9      9       26
High                           20        8      3       31
Total                          32       23     30       85

Hypotheses: H0: Age and favorability are independent; H1: Age and favorability are related

First step: Calculate the expected values of each cell. Our null hypothesis would be that age has no bearing on favorability of monkeys. As a result, the null hypothesis would expect the distribution of favorability ratings to be the same within each age group. To calculate the expected value of a cell:

$E = \frac{\text{row total} \times \text{column total}}{N}$

For example, for the Low, <12 cell: (28 × 32)/85 = 10.54.

                              Age Group (expected values in parentheses)
Monkey Favorability Rating    <12           12-24        >24           Total
Low                            4 (10.54)    6 (7.58)    18 (9.88)      28
Medium                         8 (9.79)     9 (7.04)     9 (9.18)      26
High                          20 (11.67)    8 (8.39)     3 (10.94)     31
Total                         32           23           30             85

Second step: Calculate the chi-square calculated value.

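Formula: $\chi^2 = \sum \frac{(O - E)^2}{E}$, where O is the observed frequency and E is the expected frequency of each cell.

$\chi^2 = \frac{(4 - 10.54)^2}{10.54} + \frac{(6 - 7.58)^2}{7.58} + \frac{(18 - 9.88)^2}{9.88} + \frac{(8 - 9.79)^2}{9.79} + \frac{(9 - 7.04)^2}{7.04} + \frac{(9 - 9.18)^2}{9.18} + \frac{(20 - 11.67)^2}{11.67} + \frac{(8 - 8.39)^2}{8.39} + \frac{(3 - 10.94)^2}{10.94}$

$\chi^2 = 4.06 + 0.33 + 6.67 + 0.33 + 0.55 + 0.00 + 5.94 + 0.02 + 5.76 \approx 23.66$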
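The first two steps can be sketched in Python using only built-in functions. Variable names are illustrative; the expected counts are computed from the row and column totals rather than typed in, so they are not rounded before use.

```python
# Rows: favorability (Low, Medium, High); columns: age group (<12, 12-24, >24)
observed = [
    [4, 6, 18],
    [8, 9, 9],
    [20, 8, 3],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# First step: expected count = (row total * column total) / N
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Second step: sum (O - E)^2 / E over all nine cells
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

for exp_row in expected:
    print([round(e, 2) for e in exp_row])
print(round(chi_square, 2))
```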

Third step: Determine the critical value

      Significance Level
df    .10        .05        .025       .01        .005
1     2.7055     3.8415     5.0239     6.6349     7.8794
2     4.6052     5.9915     7.3778     9.2104     10.5965
3     6.2514     7.8147     9.3484     11.3449    12.8381
4     7.7794     9.4877     11.1433    13.2767    14.8602
5     9.2363     11.0705    12.8325    15.0863    16.7496
6     10.6446    12.5916    14.4494    16.8119    18.5475

To use this table, we need to first determine our level of significance. For the purposes of this class, let's always work on the assumption that we want 95% confidence (α = .05). Next, we need to figure out our degrees of freedom (df): df = (number of rows − 1) × (number of columns − 1) = (3 − 1)(3 − 1) = 4.

As a result, our critical value for .05 at df = 4 is 9.4877.

Fourth step: Compare the calculated chi-square value with the critical value.

Chi-square calculated: 23.66; chi-square critical: 9.49

Because the calculated value exceeds the critical value, we REJECT the null. We can conclude that monkey favorability and age are related in some way.
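The table lookup can be cross-checked with a short calculation. When the degrees of freedom are even, the upper-tail probability of the chi-square distribution has a simple closed form, so no statistics package is needed. The sketch below relies on that closed form; the function name is illustrative, and the approach does not cover odd df.

```python
import math

def chi_square_upper_tail(x, df):
    """P(chi-square > x), valid only for a positive even number of degrees of freedom."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only covers positive even df")
    half = x / 2.0
    return math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))

rows, cols = 3, 3
df = (rows - 1) * (cols - 1)

# The tabled critical value should leave about 5% in the upper tail
p_at_critical = chi_square_upper_tail(9.4877, df)

# p-value for the calculated chi-square of 23.66
p_value = chi_square_upper_tail(23.66, df)
reject_null = p_value < 0.05

print(df, round(p_at_critical, 4), p_value, reject_null)
```

Comparing the p-value with α = .05 leads to the same decision as comparing the calculated chi-square with the critical value.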

Two Sample T-Test

Purpose: To compare responses from two groups. These two groups can come from different experimental treatments, or different natural "populations".

Assumptions:

• each group is considered to be a sample from a distinct population

• the responses in each group are independent of those in the other group

• the distributions of the variable of interest are normal

In a test of the hypothesis that females smile at others more than males, females and males were videotaped while interacting and the number of smiles emitted was recorded. Using the following number of smiles in the 5-minute interaction, test the null hypothesis that there are no gender differences between the number of smiles.

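Males    Females
8        15
11       19
13       13
4        11
2        18

Step One: Calculate the Means of Each Group

Males: $\bar{x}_1 = \frac{8 + 11 + 13 + 4 + 2}{5} = \frac{38}{5} = 7.6$

Females: $\bar{x}_2 = \frac{15 + 19 + 13 + 11 + 18}{5} = \frac{76}{5} = 15.2$

Step Two: Solve for the Variances of the Two Samples

Males
x      mean    x − mean    (x − mean)²
8      7.6     .4          .16
11     7.6     3.4         11.56
13     7.6     5.4         29.16
4      7.6     -3.6        12.96
2      7.6     -5.6        31.36
Sum of squared deviations: 85.2
Variance: 85.2 / (5 − 1) = 21.3

Females
x      mean    x − mean    (x − mean)²
15     15.2    -.2         .04
19     15.2    3.8         14.44
13     15.2    -2.2        4.84
11     15.2    -4.2        17.64
18     15.2    2.8         7.84
Sum of squared deviations: 44.8
Variance: 44.8 / (5 − 1) = 11.2

Step Three: Solve for t

$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} = \frac{7.6 - 15.2}{\sqrt{\frac{21.3}{5} + \frac{11.2}{5}}} = \frac{-7.6}{\sqrt{6.5}} \approx -2.98$

The sign only reflects which group was entered first; we compare the absolute value, 2.98, with the critical value.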

Step Four: Compare Calculated t-value with Critical t-value

To determine the critical t-value, we first need to determine the degrees of freedom (df). With t-tests, df = n1 + n2 − 2.

df = 5 + 5 − 2 = 8

At 95% confidence (α = .05), the critical t-value is consequently 2.306.

df    50%      60%      70%      80%      90%      95%      98%      99%      99.5%    99.8%    99.9%
1     1.000    1.376    1.963    3.078    6.314    12.71    31.82    63.66    127.3    318.3    636.6
2     0.816    1.061    1.386    1.886    2.920    4.303    6.965    9.925    14.09    22.33    31.60
3     0.765    0.978    1.250    1.638    2.353    3.182    4.541    5.841    7.453    10.21    12.92
4     0.741    0.941    1.190    1.533    2.132    2.776    3.747    4.604    5.598    7.173    8.610
5     0.727    0.920    1.156    1.476    2.015    2.571    3.365    4.032    4.773    5.893    6.869
6     0.718    0.906    1.134    1.440    1.943    2.447    3.143    3.707    4.317    5.208    5.959
7     0.711    0.896    1.119    1.415    1.895    2.365    2.998    3.499    4.029    4.785    5.408
8     0.706    0.889    1.108    1.397    1.860    2.306    2.896    3.355    3.833    4.501    5.041
9     0.703    0.883    1.100    1.383    1.833    2.262    2.821    3.250    3.690    4.297    4.781
10    0.700    0.879    1.093    1.372    1.812    2.228    2.764    3.169    3.581    4.144    4.587
11    0.697    0.876    1.088    1.363    1.796    2.201    2.718    3.106    3.497    4.025    4.437
12    0.695    0.873    1.083    1.356    1.782    2.179    2.681    3.055    3.428    3.930    4.318
13    0.694    0.870    1.079    1.350    1.771    2.160    2.650    3.012    3.372    3.852    4.221
14    0.692    0.868    1.076    1.345    1.761    2.145    2.624    2.977    3.326    3.787    4.140
15    0.691    0.866    1.074    1.341    1.753    2.131    2.602    2.947    3.286    3.733    4.073
16    0.690    0.865    1.071    1.337    1.746    2.120    2.583    2.921    3.252    3.686    4.015

t-score calculated: 2.98; t-score critical: 2.306

Because the calculated value exceeds the critical value, we REJECT the null. We can conclude that gender and smiling are related in some way.
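The four steps of the t-test can be sketched in Python with the standard library. The standard error below uses the two sample variances separately. That choice is an assumption about the formula, though with equal group sizes, as here, the pooled-variance version gives the same t. The critical value is taken from the t-table rather than computed.

```python
import math
import statistics

males = [8, 11, 13, 4, 2]
females = [15, 19, 13, 11, 18]

# Step one: group means
mean_m = statistics.mean(males)
mean_f = statistics.mean(females)

# Step two: sample variances (sum of squared deviations divided by n - 1)
var_m = statistics.variance(males)
var_f = statistics.variance(females)

# Step three: t = difference in means / standard error of the difference
standard_error = math.sqrt(var_m / len(males) + var_f / len(females))
t = (mean_m - mean_f) / standard_error

# Step four: compare |t| with the critical value for df = n1 + n2 - 2
df = len(males) + len(females) - 2
critical_t = 2.306  # read from the 95% column of the t-table at df = 8
reject_null = abs(t) > critical_t

print(round(mean_m, 1), round(mean_f, 1), round(var_m, 1), round(var_f, 1))
print(round(t, 2), df, reject_null)
```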

Regression

Regression is a tool for describing how, how strongly, and under what conditions an independent and dependent variable are associated. It can be used to make causal inferences. The ordinary least squares regression formula is Y = a + bX, which is the equation of a straight line:

– Y = dependent variable

– a = y-intercept (or constant)

– b = slope or coefficient

– X = independent variable

If b is positive, the relationship is positive; if b is negative, the relationship is negative.

Interpreting Regression

Data are gathered on 40 countries to study variations in birth rate. Consider this equation:

Y = 32 - .0018X
r = -.78
Seb = .00024

Where: Y = birth rate per 1000 population and X = per capita income

Identify the following: independent and dependent variables; regression coefficient; the constant; the correlation coefficient; the coefficient of determination; the standard error of the slope.

IV: Per capita income
DV: Birth rate per 1000 population
Regression coefficient: -.0018 (for every increase of 1 in per capita income, we predict a decrease of .0018 in birth rate per 1000 population)
Constant: 32 (the predicted value of Y would be 32 if X=0)
Correlation coefficient: -.78 (there is a strong, negative relationship)
Coefficient of determination: .6084 (-.78*-.78)
Standard error of the slope: .00024

What percent variation in birth rate is associated with per capita income?

60.84% (r2 = -.78*-.78 = .6084)

What is the direction of the relationship?

Negative

Calculate the t-ratio. What does this tell you?

t = b/Seb = -.0018/.00024 = -7.5

It allows us to test the hypothesis that b=0. df = 38 (n-2).

The critical t-value at 95% confidence and df = 38 is 2.024.

Because the absolute value of the t-ratio (7.5) exceeds 2.024, we REJECT the null. We can conclude that per capita income and birth rate are related in some way.

A country has a per capita income of $2000. Estimate its birth rate.

Y = 32-.0018X
Y= 32-.0018(2000)
Y= 32-3.6
Y= 28.4

28.4 births per 1000 population
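The quantities in this example all follow from the four reported numbers, which a short Python sketch makes explicit. The variable and function names are illustrative; the critical value 2.024 is taken from the text rather than computed.

```python
# Reported results for the 40-country birth rate example
a = 32.0          # constant (y-intercept)
b = -0.0018       # regression coefficient (slope)
r = -0.78         # correlation coefficient
se_b = 0.00024    # standard error of the slope
n = 40            # number of countries

r_squared = r ** 2          # coefficient of determination
t_ratio = b / se_b          # tests the hypothesis that b = 0
df = n - 2
critical_t = 2.024          # 95% confidence, df = 38 (from the text)
reject_null = abs(t_ratio) > critical_t

def predict_birth_rate(per_capita_income):
    """Predicted births per 1000 population from Y = a + bX."""
    return a + b * per_capita_income

print(round(r_squared, 4), round(t_ratio, 2), df, reject_null)
print(round(predict_birth_rate(2000), 1))
```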

Interpreting Multiple Regression

Model Summary

Model    R        R Square    Adjusted R Square    Std. Error of the Estimate
1        .638a    .407        .403                 19.469

a. Predictors: (Constant), ZZ11. PRE IWR OBS: R gender, Y6. Employment status, J1. Party ID: Does R think of self as Dem, Rep, Ind or what, Y1x. Age of Respondent, Y3. Highest grade of school or year of college R completed, C5ax. SUMMARY: R better/worse off than 1 year ago, F1ax. SUMMARY: economy better worse in last year, Y21a. Household income

R-Square is the proportion of variance in the dependent variable which can be predicted from the independent variables. This value indicates that 41% of the variance in the dependent variable can be predicted from the independent variables. Note that this is an overall measure of the strength of association, and does not reflect the extent to which any particular independent variable is associated with the dependent variable.

ANOVAb

Model            Sum of Squares    df      Mean Square    F          Sig.
1  Regression    352041.587        8       44005.198      116.098    .000a
   Residual      513212.737        1354    379.035
   Total         865254.324        1362

a. Predictors: (Constant), ZZ11. PRE IWR OBS: R gender, Y6. Employment status, J1. Party ID: Does R think of self as Dem, Rep, Ind or what, Y1x. Age of Respondent, Y3. Highest grade of school or year of college R completed, C5ax. SUMMARY: R better/worse off than 1 year ago, F1ax. SUMMARY: economy better worse in last year, Y21a. Household income

b. Dependent Variable: B1j. Feeling Thermometer: Republican Party

The F value is the Mean Square Regression divided by the Mean Square Residual (44005.198 / 379.035 = 116.098). The p value associated with this F value is very small (reported as .000, meaning less than .0005). These values are used to answer the question "Do the independent variables reliably predict the dependent variable?" The p value is compared to your alpha level (typically 0.05) and, if smaller, you can conclude "Yes, the independent variables reliably predict the dependent variable." If the p value were greater than 0.05, you would say that the group of independent variables does not show a significant relationship with the dependent variable. Note that this is an overall significance test assessing whether the group of independent variables, when used together, reliably predicts the dependent variable; it does not address the ability of any particular independent variable to predict the dependent variable. The ability of each individual independent variable to predict the dependent variable is addressed in the table below, where each of the individual variables is listed.
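The mean squares, the F value and R Square can all be recomputed from the sums of squares and degrees of freedom in the ANOVA table. A minimal sketch, with illustrative variable names:

```python
# Values copied from the ANOVA table
ss_regression, df_regression = 352041.587, 8
ss_residual, df_residual = 513212.737, 1354

# Mean square = sum of squares / degrees of freedom
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual

# F = Mean Square Regression / Mean Square Residual
f_value = ms_regression / ms_residual

# R Square = share of the total sum of squares explained by the regression
ss_total = ss_regression + ss_residual
r_squared = ss_regression / ss_total

print(round(ms_regression, 3), round(ms_residual, 3))
print(round(f_value, 3), round(r_squared, 3))
```

The R Square recovered this way matches the value in the Model Summary, since both describe the same split of the total variation.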

Coefficientsa

                                                                      Unstandardized Coefficients    Standardized Coefficients
Model                                                                 B          Std. Error          Beta                         t         Sig.
1  (Constant)                                                         62.215     3.569                                            17.430    .000
   C5ax. SUMMARY: R better/worse off than 1 year ago                  .418       .432                .021                         .966      .334
   F1ax. SUMMARY: economy better worse in last year                   3.763      .743                .113                         5.062     .000
   J1. Party ID: Does R think of self as Dem, Rep, Ind or what        7.393      .271                .601                         27.269    .000
   Y1x. Age of Respondent                                             .087       .034                .054                         2.546     .011
   Y3. Highest grade of school or year of college R completed         -.632      .243                -.062                        -2.601    .009
   Y6. Employment status                                              -1.772     2.398               -.016                        -.739     .460
   Y21a. Household income                                             .018       .106                .004                         .169      .865
   ZZ11. PRE IWR OBS: R gender                                        -2.877     1.072               -.057                        -2.684    .007

a. Dependent Variable: B1j. Feeling Thermometer: Republican Party

Feeling thermometer Republican Party = 62.215 + .418 Better/Worse Off + 3.763 Economy + 7.393 PartyID + .087 Age - .632 Education - 1.772 Unemployed + .018 Income - 2.877 Gender

(B) These estimates tell you about the relationship between the independent variables and the dependent variable. Each estimate is the change in Feeling Thermometer Republican that would be predicted by a 1 unit increase in that predictor, holding the other predictors constant.

(Beta) These are the values for a regression equation if all of the variables are standardized to have a mean of zero and a standard deviation of one. Because the standardized variables are all expressed in the same units, the magnitudes of the standardized coefficients indicate which variables have the greatest effects on the predicted value. This is not necessarily true of the unstandardized coefficients. Because the magnitudes of the unstandardized coefficients can largely depend on the units of the variables, the effects of the variable on the prediction can be difficult to gauge. While the standardized coefficients may vary significantly from the unstandardized coefficients in magnitude, the sign (positive or negative) of the coefficients is unchanged.

(t and Sig.) These columns provide the t value and 2 tailed p value used in testing the null hypothesis that the coefficient is 0. Coefficients having p values less than alpha are significant. For example, if you chose alpha to be 0.05, coefficients having a p value of 0.05 or less would be statistically significant (i.e., you can reject the null hypothesis and say that the coefficient is significantly different from 0).
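Reading the coefficients table comes down to two operations: plugging values into the equation, and comparing each Sig. value with alpha. The sketch below does both. The short predictor names and the predict function are illustrative, and because the guide does not say how each survey item is coded, the example input is hypothetical rather than a real respondent.

```python
intercept = 62.215

# name: (B, Sig.) copied from the coefficients table
coefficients = {
    "better_worse_off": (0.418, 0.334),
    "economy": (3.763, 0.000),
    "party_id": (7.393, 0.000),
    "age": (0.087, 0.011),
    "education": (-0.632, 0.009),
    "employment_status": (-1.772, 0.460),
    "income": (0.018, 0.865),
    "gender": (-2.877, 0.007),
}

def predict(values):
    """Predicted feeling thermometer score; predictors left out are treated as 0."""
    return intercept + sum(
        b * values.get(name, 0.0) for name, (b, _) in coefficients.items()
    )

# Which predictors are statistically significant at alpha = 0.05?
alpha = 0.05
significant = [name for name, (_, p) in coefficients.items() if p <= alpha]
print(significant)

# Hypothetical input: raising age by 10 units adds 10 * .087 to the prediction
print(round(predict({"age": 10}) - predict({}), 2))
```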