Chapter 8 compilation

VALIDITY AND RELIABILITY

Validity refers to the appropriateness, meaningfulness, correctness and usefulness of the inferences a researcher makes.

Reliability refers to the consistency of scores or answers from one administration of an instrument to another, and from one set of items to another.

Validation is the process of collecting and analyzing evidence to support inferences.

The term validity refers to the degree to which evidence supports any inferences a researcher makes based on the data he/she collects using a particular instrument.

The validation process focuses not on the instrument but on the inferences a researcher makes.

An appropriate inference is one that is relevant (related to the purposes of the study).

A meaningful inference says something about the meaning of the information (such as test scores) obtained through the use of an instrument.

Validity depends on the amount and type of evidence there is to support the interpretations researchers wish to make concerning data they have collected.

1) Content-related evidence of validity
2) Criterion-related evidence of validity
3) Construct-related evidence of validity

Content-related evidence refers to the content and format of the instrument.

The content and format must be consistent with the definition of the variable and the sample of subjects to be measured.

One key element in this kind of evidence is the adequacy of the sampling.

Content validation is a matter of determining if the content that the instrument contains is an adequate sample of the domain of content it is supposed to represent.

The other aspect of content validation has to do with the format of the instrument such as clarity of printing and appropriateness of language.

However, valid results cannot be obtained if adequate questions are presented in an inappropriate format (such as giving a test written in English to children whose English is minimal).

A common way to do this is to have someone look at the content and format of the instrument and judge whether or not it is appropriate.

However, the qualifications of the judges are always an important consideration, and the judges must keep in mind the characteristics of the intended sample.

Criterion – A second test or other assessment procedure presumed to measure the same variable.

Researchers usually compare performance on one instrument with performance on the criterion.

Predictive: A time interval is allowed to elapse between administering the instrument and obtaining the criterion scores.

Concurrent: Data from the instrument and from the criterion are collected at nearly the same time.

The correlation coefficient is a key index in both forms of criterion-related evidence.

It is symbolized by the letter r and indicates the degree of relationship that exists between the scores individuals obtain on the two instruments.

The relationship can be positive or negative; r ranges between +1.00 and -1.00.

If r is .00, there is no relationship.
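As a rough illustration of how r behaves, the sketch below computes the Pearson correlation between scores on an instrument and on a criterion. The function name `pearson_r` and all of the scores are made up for the example; they do not come from the chapter.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("scores must vary for r to be defined")
    return sxy / sqrt(sxx * syy)

# Hypothetical scores for five individuals.
instrument = [10, 12, 15, 18, 20]
criterion = [20, 24, 30, 36, 40]            # rises exactly with the instrument

r_positive = pearson_r(instrument, criterion)                # perfect positive
r_negative = pearson_r(instrument, [-c for c in criterion])  # perfect negative
r_none = pearson_r([1, 2, 3, 4, 5], [2, 1, 3, 1, 2])         # no linear relation

print(round(r_positive, 2), round(r_negative, 2), round(r_none, 2))
```

Real instrument and criterion scores almost never reach either extreme; the three cases only mark the ends and the middle of the scale.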

Construct-related evidence is more typically associated with research studies than with testing.

Multiple sources are used to collect evidence. A combination of observation, surveys, focus groups, and other measures is used to identify how much of the trait being measured is possessed by the person observed.

The variable being measured is clearly defined.

Hypotheses are formed about how people who possess a lot versus a little of the variable will behave in a particular situation.

The hypotheses are tested both logically and empirically.

Reliability refers to the consistency of the scores obtained.

If scores are completely inconsistent for a person, they provide no useful information.

The distinction between reliability and validity is shown in Figure 8.2.

Indeed, if the data are unreliable, they cannot be valid; if the data are valid, they must also be reliable.

Many factors lead to errors of measurement, such as:

1) Differences in motivation
2) Energy
3) Anxiety
4) Different testing situations

A reliability coefficient expresses the relationship between scores of the same individuals on the same instrument at two different times, or on two parts of the same instrument.

1) Test-retest method
2) Equivalent forms method
3) Internal-consistency methods

Test-retest reliability is a measure of reliability obtained by administering the same test twice over a period of time to a group of individuals. The scores from Time 1 and Time 2 can then be correlated in order to evaluate the test for stability over time.

A reliability coefficient is then calculated to indicate the relationship between the two sets of scores obtained.

Stability of scores over a two- to three-month period is usually viewed as sufficient evidence of test-retest reliability for most educational research.

In reporting test-retest reliability coefficients, the time interval between the two testings should always be reported.

As an example, a test designed to assess student learning in psychology could be given to a group of students twice, with the second administration perhaps coming a week after the first. The obtained correlation coefficient would indicate the stability of the scores.

The shorter the time gap, the higher the correlation; the longer the time gap, the lower the correlation.

This is because the two observations are related over time -- the closer in time we get the more similar the factors that contribute to error.

Since this correlation is the test-retest estimate of reliability, you can obtain considerably different estimates depending on the interval.

Two different but equivalent forms of an instrument are administered to the same group of people.

The forms are known as ‘alternate’ or ‘parallel’ forms.

They contain the same content but are constructed differently.

A high coefficient indicates strong evidence of reliability, and a low coefficient indicates weak evidence.

This method can be combined with the test-retest method, in which case the coefficient also covers consistency over time.

In the split-half procedure, two halves of the test (usually the odd versus the even items) are scored separately, and a correlation coefficient is calculated between the two sets of scores.

The degree to which the two halves agree describes the internal consistency of the test.

The reliability of the whole test is then calculated using the Spearman-Brown prophecy formula (pg. 156).

The reliability of a test may be increased by adding more items to it, provided they are similar to the original items.
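The step from a half-test correlation to whole-test reliability can be sketched with the standard form of the Spearman-Brown formula, 2r / (1 + r), and its general version for a test lengthened by any factor. The function name `spearman_brown` and the value .56 are assumptions for illustration; the book's own worked example is on the cited page.

```python
def spearman_brown(r, length_factor=2.0):
    """Predicted reliability when a test is lengthened by length_factor.

    r is the reliability (or half-test correlation) of the current test.
    With length_factor=2 this is the split-half correction 2r / (1 + r).
    """
    return (length_factor * r) / (1 + (length_factor - 1) * r)

r_halves = 0.56                                 # hypothetical odd-even correlation
whole_test = spearman_brown(r_halves)           # reliability of the full test
three_times_longer = spearman_brown(whole_test, 3)  # after tripling the item count

print(round(whole_test, 2), round(three_times_longer, 2))
```

The second call mirrors the point about adding items: the prediction rises with length, but only under the assumption that the new items behave like the original ones.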

The Kuder-Richardson approach is the most frequently applied method for determining internal consistency.

The KR21 formula needs three pieces of information:

1) The number of items on the test (K)
2) The mean (M)
3) The standard deviation of the test (SD)

Example: pg. 157
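The three quantities K, M, and SD are the inputs to the Kuder-Richardson KR21 formula, KR21 = [K / (K - 1)] × [1 - M(K - M) / (K × SD²)]. A minimal sketch follows; the function name `kr21` and the numbers are chosen only for illustration and are not presented as the book's example.

```python
def kr21(k, mean, sd):
    """Kuder-Richardson formula 21 reliability estimate.

    k    : number of items on the test
    mean : mean total score
    sd   : standard deviation of the total scores
    """
    if k < 2 or sd <= 0:
        raise ValueError("need at least 2 items and a positive SD")
    return (k / (k - 1)) * (1 - (mean * (k - mean)) / (k * sd ** 2))

# Hypothetical test: 50 items, mean score 40, standard deviation 4.
reliability = kr21(k=50, mean=40, sd=4)
print(round(reliability, 2))
```

KR21 assumes that all items are of roughly equal difficulty, which is why only the test-level mean and standard deviation are needed.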

The alpha coefficient is also known as Cronbach's alpha.

It is a general form of the KR20 formula.

It is used to calculate the reliability of items that are not scored as right versus wrong (more than one answer is possible).
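For items scored on a scale rather than right or wrong, alpha can be sketched as alpha = [k / (k - 1)] × [1 - (sum of item variances) / (variance of total scores)]. The function name `cronbach_alpha` and the ratings below are hypothetical.

```python
from statistics import pvariance

def cronbach_alpha(item_scores):
    """Cronbach's alpha.

    item_scores: list of rows, one row per respondent, one column per item.
    """
    k = len(item_scores[0])
    if k < 2:
        raise ValueError("alpha needs at least 2 items")
    columns = list(zip(*item_scores))                # one tuple per item
    sum_item_var = sum(pvariance(col) for col in columns)
    total_var = pvariance([sum(row) for row in item_scores])
    if total_var == 0:
        raise ValueError("total scores must vary")
    return (k / (k - 1)) * (1 - sum_item_var / total_var)

# Hypothetical ratings (1 to 5) from four respondents on three items.
ratings = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 2],
    [5, 5, 4],
]
alpha = cronbach_alpha(ratings)
print(round(alpha, 2))
```

Population variance is used for both the items and the totals; using sample variance for both gives the same alpha, because only the ratio of the two enters the formula.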

The standard error of measurement (SEM) is an index that shows the extent to which a measurement would vary under changed circumstances.

Hence, there are many possible standard errors for a given score.

e.g., IQ tests:

SEM over a one-year period with different content = 5 points

SEM over a 10-year period = 8 points

We doubled the standard error of measurement when computing the range within which the second score is expected to fall.

This was done so that we could be 95% sure that our estimates were correct.
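The doubling rule can be written out as a small calculation: the expected range is the first score plus or minus two standard errors. The SEM values of 5 and 8 points are the ones quoted above; the first score of 110 and the function name `score_band` are assumptions for the example.

```python
def score_band(score, sem, multiplier=2):
    """Range within which a second score is expected to fall.

    multiplier=2 corresponds to the roughly 95% band described in the text.
    """
    return (score - multiplier * sem, score + multiplier * sem)

first_score = 110                                  # hypothetical IQ score

band_one_year = score_band(first_score, sem=5)     # SEM over one year
band_ten_years = score_band(first_score, sem=8)    # SEM over ten years

print(band_one_year, band_ten_years)
```

The same first score gives a wider band over the longer period, which is the sense in which a single score has many possible standard errors.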

Instruments that use direct observation are highly vulnerable to observer differences.

Researchers are obliged to investigate and report the degree of scoring agreement.

Scoring agreement is enhanced by training the observers and increasing the number of observation periods.
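One simple way to report scoring agreement is the percentage of observations on which two observers give the same code. The chapter does not say which index to use, so this is only one possible choice; the function name `percent_agreement` and the observation codes are hypothetical.

```python
def percent_agreement(ratings_a, ratings_b):
    """Percentage of observations that two observers coded identically."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty lists of equal length")
    matches = sum(a == b for a, b in zip(ratings_a, ratings_b))
    return 100 * matches / len(ratings_a)

# Hypothetical codes from two observers watching the same five intervals.
observer_1 = ["on-task", "off-task", "on-task", "on-task", "off-task"]
observer_2 = ["on-task", "off-task", "off-task", "on-task", "off-task"]

agreement = percent_agreement(observer_1, observer_2)
print(agreement)
```

Percent agreement does not correct for agreement that would occur by chance, so it tends to look more favorable when there are only a few categories.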

All researchers should ensure that any inferences they draw that are based on data obtained through the use of an instrument are appropriate, credible, and backed up by evidence.

Internal validity refers to the degree to which observed differences on the dependent variable are directly related to the independent variable, not to some other (uncontrolled) variable.

When a study has internal validity, it means that any relationship observed between two or more variables should be unambiguous as to what it means rather than being due to “something else”.

The “something else” may be any one (or more) of a number of factors, such as the age or the ability of the subjects, the conditions under which the study is conducted, or the types of materials used.

If these factors are not in some way or another controlled or accounted for, the researcher can never be sure that they are not the reason for any observed results.

In qualitative research, a study is said to have good internal validity if the alternative explanations (the “something else”) have been systematically ruled out.

Regardless of whether the study is qualitative or quantitative, if these “rival hypotheses” are not controlled or accounted for in some way, the researcher can never be sure that they are not the reason for any observed results.

1) Subject characteristics
2) Loss of subjects (Mortality)
3) Location
4) Instrumentation
5) Data collector characteristics
6) Testing
7) History
8) Maturation
9) Attitude of subjects
10) Regression
11) Implementation

Selection bias happens when the selection of people for a study may result in the individuals (or groups) differing from one another in unintended ways that are related to the variables to be studied.

Some examples that might affect the results of a study include:

1) Age
2) Strength
3) Maturity
4) Gender, ethnicity
5) Coordination
6) Speed
7) Intelligence
8) Vocabulary
9) Attitude
10) Reading ability
11) Fluency
12) Manual dexterity
13) Socioeconomic status
14) Religious beliefs
15) Political beliefs

Mortality threat refers to the possibility that results are due to the fact that subjects who are for whatever reason “lost” to a study may differ from those who remain so that their absence has an important effect on the results of the study.

Subjects may be lost for reasons such as illness, family relocation, or the requirements of other activities, or they may simply drop out of the study.

Loss of subjects not only limits generalizability but also introduces bias if those subjects who are lost would have responded differently from those from whom data were obtained.

One attempt to eliminate the problem of mortality is to provide evidence that the subjects lost were similar to those remaining on pertinent characteristics such as age, gender, ethnicity, pretest scores, or other variables that presumably might be related to the study outcomes.

Indeed, the best solution to this threat is preventing or minimizing the loss of subjects.

Experimental mortality is also known as the loss of subjects.

Example: A Web-based instruction project entitled Eruditio started with 161 subjects, and only 95 of them completed the entire module. Those who stayed in the project all the way to the end may have been more motivated to learn and thus achieved higher performance.

A location threat happens when the particular location in which an intervention is carried out may create alternative explanations for results.

The best method to control this problem is to hold location constant; that is, keep it the same for all participants.

The particular location in which data are collected or in which an intervention is carried out may create alternative explanations for results. This is called a location threat.

Example: Classrooms in which students are taught may have more or fewer resources, workstations, lighting, or teachers who may skew the results inadvertently. The location in which tests are administered may affect responses. Parent assessments of their children may be different when done at home than at school, or when done individually rather than in groups.

Student performance on tests may be lower if tests are given in noisy or poorly lighted rooms. Observations of student interaction may be affected by the physical arrangement in certain classrooms.

The best method of control for a location threat is to hold location constant; that is, keep it the same for all participants.

Instrumentation is the process where instruments and procedures are used in collecting data in a study.

Instrument decay is the problem that arises when the nature of the instrument (including the scoring procedure) is changed in some way or another.

This is often the case when the instrument permits different interpretations of results (as in essay tests) or is especially long or difficult to score, thereby resulting in fatigue of the scorer.

Instruments are devices used by researchers to collect information.

Examples are questionnaires, surveys, tests, observation, participation, studies…

Instrument decay can be a problem if the nature of the instrument is changed over time.

This may be due to fatigue or repetition on the part of the person administering the test, taking the test, or correcting the test.

Fatigue often happens when a researcher scores a number of tests one after the other; he/she becomes tired and scores the tests differently.

For example, more rigorously at first, more generously later.

Data collector characteristics are an inevitable part of most instruments and can affect results. The individual who collects the data may affect the results unintentionally.

Example: People may be more willing to be interviewed by females rather than males. Other characteristics could be language patterns, ethnicity, age, size…

Also, individuals may present information differently, researchers may collect data differently, or counselors may use different tactics when presenting orally.

These threats, known as the implementer effect, need to be controlled for as much as possible.

The characteristics of the data gatherers may affect the data.

e.g., a female data collector may elicit more of a confession from the situation than a male data collector would.

Primary ways to control the threat:

1) Use the same data collector(s) throughout.
2) Analyze the data separately for each collector.
3) Ensure that each collector is used equally in every group.

The data collectors or the scorers may unconsciously distort the data.

e.g., more time is allowed in the exam, or interviewers ask leading questions.

Techniques to handle data collector bias:

1) Standardize all procedures.
2) Require some training.
3) Ensure data collectors lack the information they would need to distort the data (also known as planned ignorance).

Testing: the use of any instrumentation.

Testing threat: the subjects have already ‘practiced’ for the post-test by taking the pre-test given to them beforehand.

A pre-test is sometimes regarded as practice, making the subjects alert to or aware of the questions.

History threat: one or more unanticipated and unplanned events occur during the course of the study and can affect the outcome.

e.g., construction noise, or the death of an eminent person.

Researchers must be alert to any events or occurrences during the study.

Maturation threat: change during an intervention may be due to the passing of time rather than to the intervention itself.

It is a serious threat to pre-post studies or studies that span several years.

The best way to overcome this is to have a good comparison group in the study.

Hawthorne effect: a positive effect resulting from the increased attention and recognition given to subjects.

e.g., productivity increased when changes were made in physical working conditions (such as an increase in the number of breaks).

Opposite effect: a negative effect in which subjects in the control group become demoralized or resentful and hence perform more poorly than the treatment group.

e.g., productivity decreased when the control group received no treatment at all.

Remedy: provide the control or comparison group(s) with a special treatment comparable to that received by the experimental group.

A regression threat may be present whenever change is studied in a group that is extremely low or high in its pre-intervention performance.

e.g., a class of students of markedly low ability is given special help. Six months later, their average score on a test involving similar problems has improved, but not necessarily because of the special help.
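A small simulation can show why a low-scoring group tends to improve even with no special help at all. Everything in the sketch is an assumption made for illustration: the score scale, the amount of measurement error, the cutoff of 35, and the function name `simulate_regression`. No intervention is modeled; the second test differs from the first only by random error.

```python
import random

def simulate_regression(n=10000, cutoff=35.0, seed=0):
    """Select students with low first-test scores and retest them.

    Each observed score = stable ability + independent random error.
    Returns (mean first score, mean second score, group size) for the
    students whose first score fell below the cutoff.
    """
    rng = random.Random(seed)
    first, second = [], []
    for _ in range(n):
        ability = rng.gauss(50, 10)           # stable component
        test_1 = ability + rng.gauss(0, 10)   # first testing
        test_2 = ability + rng.gauss(0, 10)   # second testing, no treatment
        if test_1 < cutoff:                   # the "markedly low" group
            first.append(test_1)
            second.append(test_2)
    return sum(first) / len(first), sum(second) / len(second), len(first)

mean_first, mean_second, group_size = simulate_regression()
print(group_size, round(mean_first, 1), round(mean_second, 1))
```

The low group's average moves back toward the overall mean on the second test, because part of what placed those students in the group was bad luck on the first test. A comparison group selected in the same way would show the same drift, which is why it helps separate regression from a real treatment effect.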

Implementation threat: the possibility that an experimental group may be treated in ways that are unintended and not necessarily part of the method, yet which give them an advantage of one sort or another.

When different individuals implement different methods, differences among those individuals may lead to different outcomes.

Some individuals have a personal bias in favor of one method over the other.

Evaluate the individuals who implement each method on pertinent (relevant) characteristics, and then try to equate the treatment groups on these dimensions.

Require that each method be taught by all teachers in the study.

Some teachers may have different abilities to implement the different methods.

Use several different individuals to implement each method, thereby reducing the chances of an advantage to either method.

Allow individuals to choose the method they wish to implement.

Have all methods used by all implementers, but with their preferences known beforehand.

Standardizing the conditions under which the study occurs.

(Location, Instrumentation, Subject attitude and Implementation threats)

Obtaining and using more information on the subjects of the study.

(Subject characteristics, Mortality, Maturation and Regression threats)

Obtaining and using more information on the details of the study.

(Location, Instrumentation, History, Subject attitude and Implementation threats)

Choosing an appropriate design.

External validity is the degree to which results are generalizable, or applicable, to groups and environments outside the research setting.

The extent to which the results of a study can be generalized determines the external validity of the study.

External validity is related to generalizing. That's the major thing you need to keep in mind.

A study that has a large, randomly selected sample or a carefully matched sample is said to have external validity.

Recall that validity refers to the approximate truth of propositions, inferences, or conclusions, so external validity refers to the approximate truth of conclusions that involve generalizations.

In simpler words, external validity is the degree to which the conclusions in your study would hold for other persons in other places and at other times.

Population refers to any set of people or events from which the sample is selected and to which the study results will generalize.

Population generalizability refers to the degree to which a sample represents the population of interest.

However, if the results of a study only apply to the group being studied and if that group is fairly small or is narrowly defined, the usefulness of any findings is seriously limited.

This is why trying to find a representative sample is so important because researchers usually want the results of an investigation to be as widely applicable as possible.

Representativeness refers only to the essential, or relevant, characteristics of a population.

In science there are two major approaches to providing evidence for a generalization.

The first approach is the Sampling Model.

In the sampling model, you start by identifying the population you would like to generalize to. Then, you draw a fair sample from that population and conduct your research with the sample. Finally, because the sample is representative of the population, you can automatically generalize your results back to the population.

However, there are several problems with this approach.

First, perhaps you don't know at the time of your study who you might ultimately like to generalize to.

Second, you may not be easily able to draw a fair or representative sample.

Third, it's impossible to sample across all times that you might like to generalize to (like next year).

A non-random sample reduces the external validity of the study.

Much medical research is done on the patients one sees in the clinic; this is a non-random sample that is not representative of a larger population. The results may not generalize, but this is not a fatal flaw in the study.

A study with a non-random sample still identifies true facts about the sample and perhaps those findings will be true for others as well. It is best to define your population first, and then obtain a random sample.

The sample size required depends on the requirements of the study and size of the population.

As a rule the bigger the better. If the sample is too small then the performance of a few individuals can have a big effect on the data, and render the data less representative of the population.
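The effect of sample size can be made concrete with the standard error of the mean, SD / √n, which describes how much a sample mean is expected to vary from one sample to the next. This assumes simple random sampling from a large population, which the chapter does not state; the function name `standard_error_of_mean` and the standard deviation of 15 are hypothetical.

```python
from math import sqrt

def standard_error_of_mean(sd, n):
    """Expected sample-to-sample variation of the mean for samples of size n."""
    if n < 1:
        raise ValueError("sample size must be at least 1")
    return sd / sqrt(n)

sd = 15  # hypothetical standard deviation of scores in the population

for n in (25, 100, 400):
    print(n, standard_error_of_mean(sd, n))
```

Quadrupling the sample size only halves the standard error, so the gain from each extra subject shrinks as the sample grows.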

Researchers should describe the sample as thoroughly as possible (in detail: age, gender, ethnicity, and other characteristics) so that interested others can judge for themselves the degree to which the findings apply.

Replication: the researcher repeats the study using different groups of subjects in different situations.

Random samples have often not been used. Educational researchers may be unaware of the hazards involved in generalizing when one does not have a random sample.

It is simply not feasible for a researcher to invest the time, money or other resources to obtain a random sample.

The degree to which the results of a study can be extended to other settings or conditions.

The researcher must ensure that the important aspects match before generalizing the findings of one study to another setting.

What holds true for one subject, material, condition, or time will not necessarily hold true for another.

Hence, researchers must be careful in generalizing the findings of one study to another.

THANK YOU
