Comprehensive Exam Review

Research and Program Evaluation, Part 4

Overview of Statistics

Clearly, it is not possible to present a comprehensive review of all statistics here. Therefore, what follows is a general overview of major principles of statistics. There are technical exceptions to (or variations of) most of what is presented. However, the information provided here is adequate for and applicable to most of the research in the counseling profession.

Parametric Statistics

Use of so-called parametric statistics is based on assumptions, including that: the data represent population characteristics that are continuous and symmetrical; the variable(s) has a distribution that is essentially normal in the population; and the sample statistic provides an estimate of the population parameter.

Recall that variables typically involved in research can be divided into the categories of discrete or continuous. Based on this distinction, (in general) all statistical analyses can be divided into Analyses of Relationships among variables or Analyses of Differences based on variables.

In the context of this general overview, all of the variables involved in analyses of relationships are continuous. Similarly, for analyses of differences, at least one variable must be continuous and at least one variable must be discrete.

Analyses of Relationships

The simplest (statistical) relationship involves only two (continuous) variables. In statistics, a relationship between two variables is known as a correlation. Calculation of the correlation coefficient allows us to address the question: What do we know (or what can we predict) about Y given that we know X (or vice versa)?

The correlation coefficient: is used to indicate the relationship between two variables; ranges in value from -1.00 through 0.00 to +1.00; is known more formally as the Pearson Product-Moment Correlation Coefficient; and is designated by a lowercase r.

When r = -1.00, there is a perfect negative, or inverse, relationship between the two variables. This means that as one variable is changing, the associated variable is changing in the opposite direction in a proportional manner. When r = +1.00, there is a perfect positive, or direct, relationship between the two variables.

This means that as one variable is changing, the associated variable is changing in the same direction in a proportional manner. When r = 0.00, there is no (linear) relationship between the two variables. This means that change in one variable is unrelated to change in the associated variable.
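The three reference points for r can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the function name `pearson_r` and the small data sets are illustrative assumptions, not part of the original presentation.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from the two means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each variable
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]                      # made-up scores
print(pearson_r(x, [2, 4, 6, 8, 10]))    # perfect direct relationship
print(pearson_r(x, [10, 8, 6, 4, 2]))    # perfect inverse relationship
print(pearson_r(x, [2, 1, 3, 1, 2]))     # no linear relationship
```

Real data almost never produce exactly -1.00, 0.00, or +1.00; the made-up scores were chosen so that each case lands on one of the reference points.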

The question to be confronted is: How do we know if the calculated correlation coefficient is any good? In general, there are two major ways to evaluate a correlation coefficient. One method is in regard to statistical significance. Statistical significance has to do with the probability (likelihood) that a result occurred strictly as a function of chance.

Evaluation based in probability is like a game of chance. The researcher decides whether it will be a high-stakes or a low-stakes situation, depending on the implications of being wrong. The results of the decision are operationalized in the alpha level selected for the study.

In the language of statistics, the alpha level (e.g., .01 or .05), sometimes called the level of significance, represents the (proportionate) chance that the researcher will be wrong in rejecting the null hypothesis (i.e., will reject it when it is actually true). That is, the alpha level also is the probability of making a Type I Error.

In the language of statistics, the p value is the (exact) probability of obtaining a result at least as extreme as the one observed in a statistical analysis if chance alone were operating (i.e., if the null hypothesis were true). Technically, the p value is compared to the alpha level to determine statistical significance; if p is less than alpha, the result is statistically significant.

Most computer programs generate p values (i.e., exact probabilities) from statistical analyses. However, most journal articles report results as comparisons of p values to alpha levels; that is, they report, for example, *p < .05, rather than, for example, *p = .0471.
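The comparison of a p value to an alpha level can be sketched in Python. The example below uses a permutation test, which is a substitute chosen here so that the code needs only the standard library; statistical packages typically derive the p value for r from a t distribution instead. The data, the function names, and the settings (5000 shuffles, seed 0) are all illustrative assumptions.

```python
import math
import random

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def permutation_p_value(x, y, n_permutations=5000, seed=0):
    """Approximate two-tailed p value: the share of random re-pairings of x and y
    whose |r| is at least as large as the observed |r|."""
    rng = random.Random(seed)
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    count = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break any real pairing between x and y
        if abs(pearson_r(x, shuffled)) >= observed:
            count += 1
    return count / n_permutations

ALPHA = 0.05                              # alpha level chosen before the analysis
x = [2, 4, 5, 7, 8, 10, 11, 13]           # made-up scores
y = [1, 3, 6, 6, 9, 9, 12, 14]
p = permutation_p_value(x, y)
print("statistically significant" if p < ALPHA else "not statistically significant")
```

The decision rule in the last line is the same whichever way the p value is obtained: the result is declared statistically significant only when p is less than alpha.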

There is at least one prominent limitation in the evaluation of a correlation coefficient based on statistical significance. This limitation is related to the conditions under which the statistical significance of the correlation coefficient is evaluated. The critical value is the value of the correlation coefficient necessary for it to be statistically significant at a given alpha level and for a given sample size.

In statistics, sample size is usually expressed in regard to degrees of freedom. For example, the degrees of freedom for a correlation coefficient is given by: df = N - 2. For the correlation coefficient, there is an inverse relationship between the critical values and degrees of freedom. As the degrees of freedom (i.e., sample size) increase, critical values (needed for statistical significance) decrease.

This means that a very small correlation coefficient can be statistically significant if the data are from a very large sample. Correlation coefficients cannot be evaluated as good or bad in an absolute sense; consideration must be given to the sample size from which the data were derived.
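The inverse relationship between critical values and degrees of freedom can be illustrated numerically. The sketch below assumes a two-tailed test at alpha = .05 and uses critical t values copied from a standard t table (they are not from the original presentation); the critical r follows from the usual relation t = r * sqrt(df) / sqrt(1 - r^2).

```python
import math

# Two-tailed critical t values at alpha = .05, copied from a standard t table
# (an assumption of this sketch). Keys are degrees of freedom.
T_CRITICAL_05 = {10: 2.228, 30: 2.042, 100: 1.984, 1000: 1.962}

def critical_r(df, t_crit):
    """Smallest |r| that reaches significance, solved from
    t = r * sqrt(df) / sqrt(1 - r**2)."""
    return t_crit / math.sqrt(t_crit ** 2 + df)

for df, t_crit in T_CRITICAL_05.items():
    n = df + 2  # df = N - 2 for a correlation coefficient
    print(f"N = {n:5d}  df = {df:5d}  critical r = {critical_r(df, t_crit):.3f}")
```

With these table values, the critical r falls from roughly .58 at df = 10 to roughly .06 at df = 1000, which is why a very small correlation can be statistically significant in a very large sample.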

Another way to evaluate a correlation coefficient is in terms of shared variance. Consider two variables: A and B. By definition, if A is a variable, it has variance (i.e., not every person receives the same score on measure A), and all (i.e., 100%) of that variance can be represented by a circle.

Similarly, because B is a variable, it has variance, and all (i.e., 100%) of the variance of B can be represented by a circle.

Of interest is how much variance variables A and B share.

The percentage of shared variance is equal to: r² × 100

The term r² is known as the coefficient of determination. The percentage of shared variance is how much of the variance of variable A is common to variable B, and vice versa. Another way to think of it is that the percentage of shared variance is the amount of the same thing measured by (or reflected in) both variables.

The good news is that the shared variance method as a basis for evaluating a correlation coefficient is not dependent upon sample size. The bad news is that there is no way to determine what is an acceptable level of shared variance. Ultimately, the research consumer has to be the judge of what is a good correlation coefficient.
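As a quick arithmetic check, the percentage of shared variance can be computed for a few correlation coefficients. The function name and the sample values of r below are illustrative assumptions.

```python
def shared_variance_percent(r):
    """Percentage of shared variance: the coefficient of determination (r squared) times 100."""
    return r ** 2 * 100

for r in (0.10, 0.30, 0.50, -0.70, 0.90):
    print(f"r = {r:+.2f}  shared variance = {shared_variance_percent(r):.0f}%")
```

Because r is squared, the sign of the correlation does not matter, and shared variance grows slowly at the low end: a correlation of .10 may be statistically significant in a very large sample, yet the two variables share only 1% of their variance.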

The Pearson Product-Moment Correlation coefficient can be used to predict one variable from another. That's helpful, but it has limited application because only two variables are involved.

Suppose we know of the relationships between Z and each of several other variables.

In multiple correlation, one variable is predicted from a (combined) set of other variables. The capital letter R is used to indicate the relationship between the set of variables and the variable being predicted. The variable being predicted is known as the criterion variable, and the variables in the set are known as the predictor variables.

In computing a multiple correlation coefficient, the most desirable situation is what is known as the Daisy Pattern. In the hypothetical Daisy Pattern, each predictor has a relatively high correlation with the criterion variable, and each of the predictor variables has a relatively low correlation with each of the other predictor variables.

If achieved, a true Daisy Pattern would look something like this.

The multiple correlation computational procedures lead to a weighted combination of (some of) the predictor variables and a specific correlation between the weighted combination and the criterion variable. The same two methods used to evaluate a Pearson Product-Moment Correlation coefficient can be used to evaluate a multiple correlation coefficient.

The methods of evaluating R include: statistical significance, although the sample size limitation is less of a concern if the sample is sufficient for the computations; and percentage of shared variance, where the expression R² × 100 represents the sum of the intersections of the predictors with the criterion variable.
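The value of the Daisy Pattern can be seen in the simplest case of two predictors, where R can be computed directly from the three pairwise correlations. The sketch below uses the standard two-predictor formula; the correlation values are hypothetical and the function name is an assumption of this example.

```python
import math

def multiple_r_two_predictors(r_y1, r_y2, r_12):
    """Multiple correlation R of a criterion with two predictors.

    r_y1, r_y2: correlation of each predictor with the criterion
    r_12:       correlation between the two predictors
    """
    r_squared = (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)
    return math.sqrt(r_squared)

# Hypothetical case: both predictors correlate .50 with the criterion,
# while the correlation between the predictors themselves varies.
for r_12 in (0.0, 0.4, 0.8):
    big_r = multiple_r_two_predictors(0.5, 0.5, r_12)
    print(f"predictor intercorrelation = {r_12:.1f}  R = {big_r:.3f}  "
          f"shared variance = {big_r ** 2 * 100:.1f}%")
```

In this hypothetical case, R is largest when the predictors are uncorrelated with each other and shrinks as they become more redundant, which is the reasoning behind preferring a Daisy Pattern.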

A canonical correlation (Rc) represents the relationship between a set of predictor variables and a set of criterion variables. A canonical correlation is usually expressed as a lambda coefficient, often Wilks' Lambda, which is the result of the statistical computations.

Graphically, a canonical correlation might look like this:

The statistical significance of the lambda coefficient can be readily determined. The percentage of shared variance also can be calculated. However, because the lambda coefficient can have a value greater than one, the calculation of shared variance involves more than just squaring the lambda coefficient.

The following chart summarizes the nature of the three preceding analyses of relationships.

Factor analysis, a special type of analysis of relationships among variables, is a general family of data reduction techniques. It is intended to reduce the redundancy in a set of correlated variables and to represent the variables with a smaller set of derived variables (aka factors). Factor analyses may be computed within either of two contexts: exploratory or confirmatory.

Factor analysis starts with input of the raw data. Next, an intervariable correlation matrix is generated from the input data. Then, using sophisticated matrix algebra procedures, an initial factor (loading) matrix is derived from the correlation matrix.
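The second step, generating the intervariable correlation matrix, can be sketched with the standard library alone. The item scores below are hypothetical, and the sketch stops at the correlation matrix; deriving the initial factor loading matrix from it requires the matrix algebra procedures mentioned above and is not attempted here.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(columns):
    """Intervariable correlation matrix for a list of equal-length score columns."""
    return [[pearson_r(a, b) for b in columns] for a in columns]

# Hypothetical raw data: each inner list holds six respondents' scores on one item.
item_scores = [
    [4, 2, 5, 3, 1, 4],   # item 1
    [5, 1, 4, 3, 2, 5],   # item 2
    [2, 4, 1, 3, 5, 1],   # item 3
]

matrix = correlation_matrix(item_scores)
for row in matrix:
    print("  ".join(f"{value:+.2f}" for value in row))
```

The matrix is symmetric with 1.00 on the diagonal, since every item correlates perfectly with itself.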

There are three major components to the factor loading matrix. The first is the set of item numbers, usually arranged in sequence and hierarchical order. The second is the factor identifications, usually represented by Roman numerals. The third is the factor loadings, usually provided as hundredths, with or without the decimal point.

The result might look something like this:

An important question is: How do we know how many factors to retain? In factor analysis, potentially there can be as many factors as items. However, usually one or some combination of three methods is used to decide how many factors to retain.

One common method is to retain factors having eigenvalues greater than one. Each factor has an eigenvalue, which is the sum of the squared factor loadings for the factor. Retaining factors having eigenvalues greater than one also is known as applying the Kaiser Criterion.

A second common method is to apply the scree test. The scree test is a visual, intuitive method of determining how many factors to retain by examining the graph of the eigenvalues from the initial factor loading matrix.

A third possible method is based on how much of the total variance is to be accounted for by the retained factors. The total possible variance is equal to the number of items. Therefore, the variance percentage for any factor is the eigenvalue divided by the total number of items, times 100. Factors can be retained by summing these percentages until the desired percentage is reached.
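The eigenvalue and percentage-of-variance retention rules can be worked through on a small loading matrix. The loadings below are hypothetical numbers invented for illustration (they are not the output of a real factor extraction), and the function names are assumptions of this sketch. The scree test is visual and is not represented.

```python
# Hypothetical initial loadings: 6 items (rows) by 3 factors (columns).
LOADINGS = [
    [0.80, 0.20, 0.10],
    [0.75, 0.30, 0.05],
    [0.70, -0.25, 0.15],
    [0.30, 0.70, 0.10],
    [0.25, 0.65, -0.20],
    [0.20, -0.60, 0.25],
]

def eigenvalues(loadings):
    """Eigenvalue of each factor: the sum of the squared loadings in its column."""
    n_factors = len(loadings[0])
    return [sum(row[j] ** 2 for row in loadings) for j in range(n_factors)]

def percent_of_variance(eigs, n_items):
    """Variance percentage per factor: eigenvalue / number of items * 100."""
    return [e / n_items * 100 for e in eigs]

def factors_needed(percentages, target):
    """Number of leading factors whose summed percentages reach the target, or None."""
    total = 0.0
    for count, pct in enumerate(percentages, start=1):
        total += pct
        if total >= target:
            return count
    return None

eigs = eigenvalues(LOADINGS)
pcts = percent_of_variance(eigs, n_items=len(LOADINGS))

# Kaiser Criterion: retain factors with eigenvalues greater than one.
kaiser_retained = [j + 1 for j, e in enumerate(eigs) if e > 1.0]

for j, (e, pct) in enumerate(zip(eigs, pcts), start=1):
    print(f"Factor {j}: eigenvalue = {e:.3f}, variance = {pct:.1f}%")
print("Retained under the Kaiser Criterion:", kaiser_retained)
print("Factors needed to reach 50% of total variance:", factors_needed(pcts, 50))
```

For these made-up loadings the two rules happen to agree, but with real data they can point to different numbers of factors, which is one reason the methods are often used in combination.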

Another important question in factor analysis is: How are the relationships among the factors to be conceptualized? A factor is two things. Conceptually, a factor is a representation of a construct. However, in regard to mathematics, a factor is a vector in n-dimensional space.

Factors as constructs may be separate and entirely distinct from one another or separate but conceptually related to one another. Factors as vectors reflect these possibilities by being positioned in n-dimensional space as either perpendicular to one another or as having an acute angle between them. The initial factor loading matrix is rotated to achieve the best mathematical representation and clarity among the constructs.

If the factors are assumed to be distinct (i.e., independent) from one another, the rotation is said to be orthogonal. An orthogonal rotation is one in which the angles between factors are maintained as right angles during and after the rotation. The most common orthogonal rotation is the Varimax procedure.
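What an orthogonal rotation does to a loading matrix can be sketched for the two-factor case. The code below is not the Varimax procedure: the rotation angle is picked by hand, whereas Varimax searches for the angle that best simplifies the loadings. The loadings and function names are hypothetical.

```python
import math

def rotate_orthogonal(loadings, degrees):
    """Rigidly rotate a two-factor loading matrix by the given angle.
    Both factor axes turn together, so they stay at right angles."""
    theta = math.radians(degrees)
    c, s = math.cos(theta), math.sin(theta)
    return [[a * c + b * s, -a * s + b * c] for a, b in loadings]

def communalities(loadings):
    """Sum of squared loadings for each item (row)."""
    return [sum(value ** 2 for value in row) for row in loadings]

# Hypothetical initial loadings: 4 items on 2 factors.
LOADINGS = [[0.70, 0.50], [0.65, 0.55], [0.60, -0.50], [0.55, -0.60]]

rotated = rotate_orthogonal(LOADINGS, 45)  # hand-picked angle, not Varimax
for before, after in zip(LOADINGS, rotated):
    print([f"{v:+.2f}" for v in before], "->", [f"{v:+.2f}" for v in after])
```

After this rotation each item loads mainly on one factor instead of splitting across both, while each item's communality (the sum of its squared loadings) is unchanged. That is the sense in which rotation clarifies the constructs without altering how much of each item the factors account for.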

If the factors are assumed to be related to (i.e., dependent on) one another, the rotation is said to be oblique. An oblique rotation is one in which the angles between factors are maintained as less than right angles during and after the rotation. The most common oblique rotation is the Oblimin procedure.

We assign to a factor a name that encompasses whatever is reflected in the items having their highest factor loadings on the factor.

There are a few important things to be remembered about factor analysis. First, a valid factor analysis requires lots of subjects, usually at least ten times as many subjects as items.

Another important point is that even though factor analysis is a sophisticated data analysis technique, quite a few relatively arbitrary decisions are made by the researcher in the process. Selection of the context and type of factor analysis to be used, determination of the number of factors to retain, and naming of the factors are just a few of the decisions to be made.

And finally, just because a research study contains a factor analysis doesn't necessarily mean that it is good research. The validity and appropriateness of the factor analysis must be evaluated in order to evaluate the worth of the research.

This concludes Part 4 of the presentation on RESEARCH AND PROGRAM EVALUATION.
