Inferential Statistics. With inferential statistics you can do the following: Determine probability...

41
Inferential Statistics

Transcript of Inferential Statistics. With inferential statistics you can do the following: Determine probability...

Page 1: Inferential Statistics. With inferential statistics you can do the following:  Determine probability of characteristics of population based on the characteristics.

Inferential Statistics

Page 2: Inferential Statistics. With inferential statistics you can do the following:  Determine probability of characteristics of population based on the characteristics.

Inferential Statistics

With inferential statistics you can do the following:

Determine probability of characteristics of population based on the characteristics of your sample. You know what the relationship “looks like” in the sample. You have the actual numbers. Inferential statistics tells you the probability that the same relationship exists in the population.

Assess the strength of the relationship between independent and dependent variables. Inferential statistics allows you to determine if the relationship is statistically significant and helps you decide if it is substantially significant (e.g., is the relationship strong enough to matter).

Page 3: Inferential Statistics. With inferential statistics you can do the following:  Determine probability of characteristics of population based on the characteristics.

Inferential StatisticsInferential statistics are used to test hypotheses

Research hypotheses state there is a relationship between two or more variables in your sample.

Initially you do NOT assume there is a relationship in the population (i.e., null hypothesis claims there is NO relationship).

You compute a statistic that indicates the probability that we can reject the null hypothesis and support the research hypothesis (e.g., the same relationship we see in our sample also exists in the population). If it is 95% probable that the sample relationship exists in the population, we say it is significant at the .05 level (5% chance of making an error, or we say there is also a relationship in the population from which this sample was drawn).

Page 4: Inferential Statistics. With inferential statistics you can do the following:  Determine probability of characteristics of population based on the characteristics.

How Do We Apply What We Learn From Inferential Statistics?

• BEFORE you use any Intervention, you should determine if there is evidence that it works.

• For instance, what is the probability that fertilizer will increase crop yield for farmers?

• BEFORE you work with any group, you may want to know the characteristics of that group.

• For instance, what proportion of abused women are

eventually killed by their partner? What is the probability that “risk” increases after they leave the partner?

• BEFORE you make recommendations, you want to understand the probabilities of success.

• What is the probability that those who participate in 4-H will graduate from college? Are they significantly more likely to graduate than those who do not participate?

Page 5: Inferential Statistics. With inferential statistics you can do the following:  Determine probability of characteristics of population based on the characteristics.

Inferential Statistics Can Answer the Following Questions

• Is the Intervention I am currently using worth my time?• Does it work with 5% or 95% of program participants?

Are participants more likely than general population to reach goals?

• What factors are most important when attempting to increase effectiveness of intervention?• If I use a second intervention, does it increase success

by 10%, 15%, or 60%?• Are characteristics of program participants important?

• Policy Implications - Is it worth the tax payer’s dollars?• Is this a Spurious Relationship?• Is there a difference between groups receiving

intervention and those not receiving it• Is it large enough to warrant use of limited resources?

(substantial significance)• Is it large enough to argue that this intervention “works”

in different settings/situations?

Page 6: Inferential Statistics. With inferential statistics you can do the following:  Determine probability of characteristics of population based on the characteristics.

Example of Applying Information Learned From Inferential Statistics

You implement an “on-line” program to improve communication amongst young married couples.

Using measurement of long term objectives, is it successful?• What percent of my respondents stayed married? How

much lower is the divorce rate for those who participated

than for general population?

What factors are most important?• Does required payment increase success?

• Do older/younger couples respond more

effectively to this counseling?

Policy Implications - Is it worth Tax Payer’s Dollars?• Is this relationship spurious (i.e., proactive individuals are

more likely to seek intervention and also more likely to have

good marriages)?

• Is the intervention “cost effective”? Is difference big

enough to matter?

• Is there evidence that this program could work in

other settings/situations or with other couples?

Page 7: Inferential Statistics. With inferential statistics you can do the following:  Determine probability of characteristics of population based on the characteristics.

Correlation and Regression

Page 8: Inferential Statistics. With inferential statistics you can do the following:  Determine probability of characteristics of population based on the characteristics.

Stating Hypothesis for Regression

• Null Hypothesis

• There is not relationship between any of the independent variables and the dependent variable

• Technically, all of the slopes are zero

• Research Hypothesis

• There is a relationship between at least one of the independent variables and the dependent variable

• Technically, at least one of the slopes are zero• This relationship could be positive or negative

Page 9: Inferential Statistics. With inferential statistics you can do the following:  Determine probability of characteristics of population based on the characteristics.

Inferential Statistics

First consider bi-variate statistics

Pearson Correlation

When is it used?When you have a continuous independent

variable and a continuous dependent variable.

How do you interpret it?When the probability associated with the

___ statistics is .05 or less then you can assume there is a relationship between the dependent and independent variableFor instance you may want to know if the number of hours participants spend in your program is positively related to their scores on school exams

* NOTE The Pearson Correlation and Bi-variate Regression are very similar

Page 10: Inferential Statistics. With inferential statistics you can do the following:  Determine probability of characteristics of population based on the characteristics.

Inferential Statistics

First consider bi-variate statistics

Bi-variate Regression

When is it used?When you have a continuous independent

variable and a continuous dependent (outcome) variableFor instance, you may want to know if the number of hours participants spend in your program is positively related to their scores on school exams

How do you interpret it?When the probability associated with the

F-statistic is .05 or less then you can assume there is a relationship between the dependent and the independent variable

• NOTE The Pearson Correlation and Bi-variate Regression

are very similar

Page 11: Inferential Statistics. With inferential statistics you can do the following:  Determine probability of characteristics of population based on the characteristics.

Pearson Correlation

Consists of a continuous independent and a continuous dependent variable (i.e., X and Y)

A Pearson correlation coefficient is used to estimate the strength of the relationship between X and Y in the population

A Pearson correlation coefficient ranges from -1 to +1 The closer to -1 or +1 it is, the stronger the

relationship between X and Y, and the lower the probability that we would make a mistake if we claimed there is a relationship between X and Y in the population

A scatter plot can give a visual representation of the relationship between X and Y

A scatter plot shows all of the data points/plots and their relationship, using an X and Y axis

On the following slide, respondents’ scores on BETA and SAT were plotted so that there is one data point for someone who scored 1000 on the SAT and 12 on the BETA.

Page 12: Inferential Statistics. With inferential statistics you can do the following:  Determine probability of characteristics of population based on the characteristics.

Bi-variate scatterplot showing a strong positive relationship

If all of the data points were on the regression line, then the correlation coefficient would be 1. This would indicate that if we know a person’s score on the SAT we can predict their score on

the BETA 100% of the time.

Page 13: Inferential Statistics. With inferential statistics you can do the following:  Determine probability of characteristics of population based on the characteristics.

Bi-variate scatterplot showing a strong negative/inverse relationship

.

The regression line or slope indicates where the data points would be if you could predict Y after knowing X 100% of the

time. It is the “predicted” Y.

Page 14: Inferential Statistics. With inferential statistics you can do the following:  Determine probability of characteristics of population based on the characteristics.

The following slide contains a computer generated correlation matrix. A correlation matrix can provide the following information:

Strength of the relationship between any two of the variables

The probability that you would make a mistake if you claimed any two variables are related in the population

At the top of the correlation matrix, the following information is reported:

The mean of each continuous variable

The sample size

The standard deviation of each continuous variable

The range of scores for each continuous variable

The standard deviation of each continuous variable

Correlation Matrix

Page 15: Inferential Statistics. With inferential statistics you can do the following:  Determine probability of characteristics of population based on the characteristics.

Pearson Correlation Coefficient

What does it tell us about the strength of the relationship between X and Y?

Strength of Relationship r value R2 values

Perfect 1.0 1.0

Strong .8 .64

Moderate .5 .25

Weak .2 .04

No Relationship 0 0

Weak - .2 - .04

Moderate - .5 - .25

Strong - .8 - .64

Perfect -1.0 -1.0

Strength of relationship (r or R2)The closer to 1 or -1 the R and R2 are, the stronger the

relationship. .

SignificanceThe stronger the relationship the more likely it is

significant.

Page 16: Inferential Statistics. With inferential statistics you can do the following:  Determine probability of characteristics of population based on the characteristics.

SR90 = number of men per every 100 womenTPOV90 = % of people living in povertyFHH = % of female headed households

EMPMAL = % of males employedEMPFEM90 = % of females employer

Correlation Matrix

Page 17: Inferential Statistics. With inferential statistics you can do the following:  Determine probability of characteristics of population based on the characteristics.

The first number in the matrix (marked by the maroon textbox) is the correlation coefficient. It indicates the strength of the relationship.

The second number is the probability. It must be .05 or less if you are to generalize to the population.

There may be a third number in the matrix. It would indicate the sample size.

Page 18: Inferential Statistics. With inferential statistics you can do the following:  Determine probability of characteristics of population based on the characteristics.

Bi-Variate Regression

This is a bi-variate regression printout. It focuses specifically on the relationship between two of the variables (e.g. FHH90 and TPOV90)

reported in the matrix on the previous slide. Note that the standardized estimate is the same as the correlation coefficient. If you square this

number (.51215714) you would get .2623 (the R square). If you square the standardized estimate you always get the R-square. It is the percent increase in you ability to predict Y if you know X. In this example, your ability to predict the poverty rate (TPOV90) in a city increases by 26% if

you know the percent of female headed households in that city..

The Prob>F is .0001 This indicates this relationship is significant. We are more than 99% sure that this relationship exists in the population.

Page 19: Inferential Statistics. With inferential statistics you can do the following:  Determine probability of characteristics of population based on the characteristics.

This is a SAS printout of a Pearson Correlation Matrix. This matrix reports the relationship between 3 continuous variables (i.e., GPR, grade in school

and number of times the student has skipped class).

Page 20: Inferential Statistics. With inferential statistics you can do the following:  Determine probability of characteristics of population based on the characteristics.

Bi-Variate Regression Continuous dependent variable Continuous independent variable Relationship between the two can be negative or

inverse, positive, linear or curvilinear.

Multiple Regression Continuous dependent variable You use multiple independent variables to predict

a continuous dependent variable For instance, you could use number of hours

participating in the program, score on attitude index and age to predict success in school (i.e., GPR)

A variation – you use one or more continuous independent variables and one categorical variable to predict a dependent variable—The categorical variable can have only two categories (i.e., male or female) For instance, you would use gender, number

of hours participating n the program, score on attitude index and age to predict success in school (i.e., GPR)

Types of Regression

Page 21: Inferential Statistics. With inferential statistics you can do the following:  Determine probability of characteristics of population based on the characteristics.

Logit Regression

Categorical dependent variable (two categories)

You use continuous independent variables to predict the probability of falling into one category or another

For instance, how does number of hours studying for exams, age, and number of classes skipped, influence the probability that a student will graduate from high school? Graduation is measured simply as did

or did not graduate

Regression – continued

Page 22: Inferential Statistics. With inferential statistics you can do the following:  Determine probability of characteristics of population based on the characteristics.
Page 23: Inferential Statistics. With inferential statistics you can do the following:  Determine probability of characteristics of population based on the characteristics.
Page 24: Inferential Statistics. With inferential statistics you can do the following:  Determine probability of characteristics of population based on the characteristics.
Page 25: Inferential Statistics. With inferential statistics you can do the following:  Determine probability of characteristics of population based on the characteristics.

This is a copy of a multiple regression

printout and includes a brief explanation of the

numbers reported.

Regression Printout

Page 26: Inferential Statistics. With inferential statistics you can do the following:  Determine probability of characteristics of population based on the characteristics.

Interpreting a regression printout

Pr > F is <.000l indicating we can reject the null hypothesis and at least one independent variable is significantly related.

R-Square is .5187 indicating that our ability to predict posself (self-esteem score) increases by 51% if we know the value of all of the independent

variables.Pr . is less than .0001 for grades (abc) and how much like they other

students (likestu) indicating that these are the independent variables that are related to posself (self-esteem score).

Page 27: Inferential Statistics. With inferential statistics you can do the following:  Determine probability of characteristics of population based on the characteristics.

Interpreting Parameter Estimates

• Parameter estimates are how much Y changes for every one unit change in X. From printout on previous slide, we see that for every grade increase (i.e., from C to B) then Posself (self esteem score) increases by .69 or 2/3 of a point.

• If the independent variable is a dummy variable then we interpret it slightly differently. It is how much Y changes when we go from one category to the other. From printout on previous slide we see that as we go from the category of male (coded 0) to female (coded 1) then Posself decreases by .21688.

Page 28: Inferential Statistics. With inferential statistics you can do the following:  Determine probability of characteristics of population based on the characteristics.

Interpreting Parameter Estimates

• Caution – When you interpret a parameter estimate, you must consider how you measured the X variables. If your parameter estimate is 1 and you measured money in $10,000 increments, then for every $10,000 you spend on your child, their SAT score would increase by 1. If your parameter estimate is 1 and you measure money in dollars, then for every dollar, their SAT score would increase by 1. $10,000 would result in an increase of 10,000 on their SAT.

Page 29: Inferential Statistics. With inferential statistics you can do the following:  Determine probability of characteristics of population based on the characteristics.

Why and when do we predict Y?

Explain results to a nonacademic audience

Explain regression in an interesting way

How do we predict?

First generate regression printout

THEN use the prediction formula – Y=a+b1x1+b2x2+b3x3+…….Where:

Y=the intercept or constant b=slope or parameter estimate which tells

you the change in Y for every one unit change in X

X=value of each independent variable that you select using the codebook

Y=the predicted value of your dependent variable as predicted by the combination of X variables

Predicting Y

Page 30: Inferential Statistics. With inferential statistics you can do the following:  Determine probability of characteristics of population based on the characteristics.
Page 31: Inferential Statistics. With inferential statistics you can do the following:  Determine probability of characteristics of population based on the characteristics.
Page 32: Inferential Statistics. With inferential statistics you can do the following:  Determine probability of characteristics of population based on the characteristics.
Page 33: Inferential Statistics. With inferential statistics you can do the following:  Determine probability of characteristics of population based on the characteristics.
Page 34: Inferential Statistics. With inferential statistics you can do the following:  Determine probability of characteristics of population based on the characteristics.
Page 35: Inferential Statistics. With inferential statistics you can do the following:  Determine probability of characteristics of population based on the characteristics.
Page 36: Inferential Statistics. With inferential statistics you can do the following:  Determine probability of characteristics of population based on the characteristics.
Page 37: Inferential Statistics. With inferential statistics you can do the following:  Determine probability of characteristics of population based on the characteristics.
Page 38: Inferential Statistics. With inferential statistics you can do the following:  Determine probability of characteristics of population based on the characteristics.
Page 39: Inferential Statistics. With inferential statistics you can do the following:  Determine probability of characteristics of population based on the characteristics.
Page 40: Inferential Statistics. With inferential statistics you can do the following:  Determine probability of characteristics of population based on the characteristics.

Now You can do it

Page 41: Inferential Statistics. With inferential statistics you can do the following:  Determine probability of characteristics of population based on the characteristics.

Contact Information

Dr. Carol AlbrechtUSU ExtensionAssessment [email protected]