SELECTION OF MEASUREMENT INSTRUMENTS

3 WAYS TO COLLECT DATA

Administer a standardized instrument

Administer a self-developed instrument

Record naturally available data (GPA, absenteeism rates)

Measure physical performance data

Validity

The degree to which a test measures what it is supposed to measure.

Valid for what?

Valid for whom?

Types of Validity

Content validity

The test measures the intended content

Item validity: Test items measure intended content

Sampling validity: Test samples content adequately

Face (logical) validity: Test appears to measure what it is supposed to

Who Determines Validity?

NO SET TEST!

Researcher determines validity by comparing what should be included with what is included!

Types of Validity

Construct validity

The test measures the intended hypothetical construct

Construct- a non-observable trait

INTELLIGENCE

ANXIETY

RATE OF LEARNING

RPE

Types of Validity

Concurrent validity

Scores on your test are related to scores on a previously administered test, e.g., judges' ratings and tournament results

Steps

•Administer a VO2Max test

•Administer a treadmill walking test

•Correlate the two sets of test scores

•Evaluate the results

•If correlation is high, the treadmill walking test can be substituted for the VO2Max test
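
The correlation step can be sketched in a few lines of Python. This is only an illustration: the scores below are invented, and `pearson_r` is a helper name chosen for this example, not part of the original slides.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dev_x = [a - mean_x for a in x]
    dev_y = [b - mean_y for b in y]
    # Sum of cross-products divided by the root of the product of sums of squares
    cross = sum(a * b for a, b in zip(dev_x, dev_y))
    ss_x = sum(a * a for a in dev_x)
    ss_y = sum(b * b for b in dev_y)
    return cross / sqrt(ss_x * ss_y)

# Hypothetical scores for six subjects (not real data)
vo2max = [38.0, 42.5, 45.0, 50.5, 55.0, 47.0]      # criterion test
walk_test = [36.5, 41.0, 46.5, 49.0, 56.0, 45.5]   # test being validated

validity_coefficient = pearson_r(vo2max, walk_test)
print(round(validity_coefficient, 2))
```

The same calculation applies wherever two sets of scores are correlated, for example test and retest scores.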

Types of Validity

PREDICTIVE VALIDITY

Test predicts how well an individual will perform in the future

Examples

Predictive validity of GRE for graduate study

Prediction of NTE scores and success as a teacher

Prediction of population who will become obese

To Determine Predictive Validity

Steps

•Administer the GRE

•Wait until first-year GPA is established

•Correlate two sets of test scores

•Evaluate the results

•Determine validity coefficient

Interpretation

Scores range from 0 to 1

Higher score is better

Reliability

Consistency of test measurement

High test reliability means that when the test is retaken, the same scores would be earned

Reliability ranges from 0 to 1

How do validity and reliability relate?

A valid test is always reliable

A reliable test is not always valid

Tests with high reliability may not measure what is intended by the researcher

Why do tests have low reliability?

Errors in the test

Failure to follow procedures

Student fatigue

Inattention to detail

Ambiguous questions

Familiarity with the test

Unclear directions

Improper administration

Student mood

Test-Retest Reliability

Scores are consistent over time

Steps

1. Administer test to group

2. Administer test again after time has passed (1 or 2 weeks)

3. Correlate 2 sets of scores

Coefficient of Stability

Alternate Forms Reliability

Test A and Test B measure the same traits

Steps

1. Administer one test form

2. Administer second form to the same group

3. Correlate the 2 sets of scores

Coefficient of Equivalence

Split-half Reliability

Requires only one administration of the test

Steps

1. Administer the total test to group

2. Divide test into 2 comparable halves (odd and even questions)

3. Compute a set of scores for each half

4. Correlate the 2 sets of scores

5. Apply Spearman-Brown correction

COEFFICIENT OF EQUIVALENCE

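Spearman-Brown example

Spearman-Brown correction: used to predict the reliability of the full 50-item test from the correlation between its two 25-item halves

SPLIT-HALF RELIABILITY = 0.80

$$r_{\text{total}} = \frac{2\, r_{\text{split half}}}{1 + r_{\text{split half}}}$$

$$r_{\text{total}} = \frac{2 \times 0.80}{1.00 + 0.80} = \frac{1.60}{1.80} = 0.89$$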
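
The split-half procedure and the Spearman-Brown correction can be sketched as follows. The function names are chosen for this example only, and the odd/even split assumes each examinee's item scores are stored in test order.

```python
def split_half_scores(item_scores):
    """Return (odd_total, even_total) for one examinee's item scores."""
    odd_total = sum(item_scores[0::2])    # items 1, 3, 5, ...
    even_total = sum(item_scores[1::2])   # items 2, 4, 6, ...
    return odd_total, even_total

def spearman_brown(r_half):
    """Estimate full-test reliability from the correlation between two halves."""
    if r_half <= -1:
        raise ValueError("half-test correlation must be greater than -1")
    return 2 * r_half / (1 + r_half)

# Worked example from the slide: split-half reliability of 0.80
print(round(spearman_brown(0.80), 2))  # 0.89
```

The half totals from `split_half_scores` are what get correlated across examinees; that correlation is the value passed to `spearman_brown`.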

Rationale Equivalence Reliability

Requires only one administration of the test

Internal and External Reliability

External Reliability

Researcher status

Choice of subjects

Social situations and conditions

Analytic constructs

Methods of data collection and analysis

Internal Reliability

Inter-observer agreement

All team members trained the same way

All team members treat subjects identically

For videotape and transcript analysis, reliability is established before the experiment begins

Scorer/rater Reliability

Occurs when subjective scoring of test items is performed

Inter-judge reliability - 2 or more scorers rate the tests the same way

Intra-judge reliability - same scorer rates each test the same way

Reliability Coefficients

The closer to 1.0 the better

Achievement/aptitude tests – reliability should not be less than 0.9

Subtest reliability should be calculated for tests that have more than one component

Standard Error of Measurement

Used to express test reliability

Small standard error of measurement (SEM) indicates high reliability

Interpreted similarly to standard deviation

$$SEM = SD\sqrt{1 - r}$$
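
A minimal sketch of the SEM calculation, using the standard formula SEM = SD × √(1 − r). The standard deviation of 10 and reliability of 0.91 are made-up values for illustration; they give an SEM of about 3 score points.

```python
from math import sqrt

def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - r), where r is the test's reliability coefficient."""
    if sd < 0:
        raise ValueError("standard deviation cannot be negative")
    if not 0 <= reliability <= 1:
        raise ValueError("reliability must be between 0 and 1")
    return sd * sqrt(1 - reliability)

# Hypothetical test: SD = 10, reliability = 0.91
sem = standard_error_of_measurement(10, 0.91)
print(round(sem, 2))
```

As reliability approaches 1.0 the SEM shrinks toward 0, which is why a small SEM indicates high reliability.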

Four types of measurement scales

Nominal: subjects grouped by category, e.g., gender, race, fitness level

Ordinal: rank comparisons, e.g., rank tallest to shortest

Interval: supplies the order and the distance between scores (used with standard scores); there is no true zero point, so an IQ of 160 is not twice as smart as an IQ of 80

Ratio: used when there is a true zero point, so ratios between scores are meaningful

Standard Scores

Z scores: M = 0, SD = 1.0

Used to compare and contrast 2 different test scores, e.g., push-up and 40-yard dash

T scores: M = 50, SD = 10

Converts Z scores to all positive measures
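
The two conversions can be sketched as below. The push-up scores are invented, and using the population standard deviation (`pstdev`) is an assumption of this example; a sample standard deviation could be used instead.

```python
from statistics import mean, pstdev

def z_scores(raw):
    """Convert raw scores to Z scores (M = 0, SD = 1.0)."""
    m = mean(raw)
    sd = pstdev(raw)  # population SD; an assumption of this sketch
    if sd == 0:
        raise ValueError("all scores are identical, so Z scores are undefined")
    return [(x - m) / sd for x in raw]

def t_scores(raw):
    """Convert raw scores to T scores (M = 50, SD = 10)."""
    return [50 + 10 * z for z in z_scores(raw)]

# Hypothetical push-up scores for five subjects
push_ups = [20, 25, 30, 35, 40]
print([round(z, 2) for z in z_scores(push_ups)])
print([round(t, 1) for t in t_scores(push_ups)])
```

Once two different tests are both expressed as Z or T scores, a subject's standing on one can be compared directly with their standing on the other.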

Measuring Affective Behavior

Personality

Anxiety

Self-esteem

Social Behavior

RPEs

Types of Scales

LIKERT

STRONGLY AGREE    AGREE    UNDECIDED    DISAGREE    STRONGLY DISAGREE

1 2 3 4 5

High point values on a positive statement indicate a positive attitude.

SEMANTIC DIFFERENTIAL

NECESSARY __ __ __ __ __ __ __ UNNECESSARY

FAIR __ __ __ __ __ __ __ UNFAIR

3 2 1 0 -1 -2 -3

POSITIVE NEGATIVE

Personality Tests

Non-projective - uses a self-report instrument

Inventory - yes/no questions

Scale - used to determine what an individual feels or believes

May not be accurate due to societal influences
