TerraNova Evaluation of a Standardized Test Mini-Project 1 Teresa Frields and Mitzi Hoback.
TerraNova: Evaluation of a Standardized Test
Mini-Project 1
Teresa Frields and Mitzi Hoback
A. General Information
Title: TerraNova
Publisher: CTB/McGraw-Hill
Date of Publication: 1997
A. General Information: Cost
Varies according to the materials purchased
$122 per 30 Complete Battery Plus consumable test booklets
$92.50 per 30 Complete Battery Plus reusable test booklets
A. General Information: Administration Time
Varies by test and level
Typically given over a period of several test sessions or days
Fall, Winter, and Spring testing periods available
B. Brief Description of Purpose and Nature of Test
General Purpose of Test
Constructed as a “comprehensive modular assessment series” of student achievement
Promoted as a device to help diverse audiences understand student academic achievement and progress
Reports provide useful, informative data that allow national comparison of group and individual achievement
B. Brief Description of Purpose and Nature of Test
Population for which test is applicable
Reading/language arts and mathematics available for K-12
Science and social studies tests available for grades 1-12
B. Brief Description of Purpose and Nature of Test
Description of Content
Multiple-choice format
Generates precise norm-referenced achievement scores and a full complement of objective mastery scores
Designed to measure concepts, processes, and skills taught throughout the nation
Content areas measured are Reading/Language Arts, Mathematics, Science, and Social Studies
B. Brief Description of Purpose and Nature of Test
Appropriateness of Assessment Method
Selected-response items can provide information on basic knowledge and some patterns of reasoning
Does not provide evidence for performance standards/targets
Other TerraNova formats provide a combination of selected-response and constructed-response
C. Technical Evaluation: Norms/Standards
1. Type – The battery generates precise norm-referenced achievement scores and a full complement of objective mastery scores.
Types of scores provided: Scaled Scores, Grade Equivalents, National Percentiles, National Stanines, Normal Curve Equivalents
Reports are provided both for individual students and for groups of students.
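As an illustration of how two of these derived scores relate to percentile ranks, the sketch below uses the standard psychometric definitions (NCE = 50 + 21.06z, and the conventional stanine percentile cut points); these formulas are general conventions, not taken from the TerraNova manual itself:

```python
from statistics import NormalDist

# Conventional percentile boundaries between stanines 1-9.
STANINE_CUTS = [4, 11, 23, 40, 60, 77, 89, 96]

def percentile_to_nce(percentile: float) -> float:
    """Normal Curve Equivalent: 50 + 21.06 * z, where z is the
    normal deviate corresponding to the percentile rank."""
    z = NormalDist().inv_cdf(percentile / 100)
    return 50 + 21.06 * z

def percentile_to_stanine(percentile: float) -> int:
    """Stanine = 1 + number of standard cut points below the percentile."""
    return 1 + sum(1 for cut in STANINE_CUTS if percentile > cut)

# The 50th percentile sits at the center of both scales.
print(percentile_to_nce(50))      # 50.0
print(percentile_to_stanine(50))  # 5
```

Unlike percentiles, NCEs form an equal-interval scale, which is why they are preferred for averaging and comparing group scores.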
C. Technical Evaluation: Norms/Standards
2. Standardization Sample – Size: The norming sample was based on a stratified national sample.
295 schools
Fall and Spring norming studies involved between 860,000 and 1,720,000 students
C. Technical Evaluation: Norms/Standards
2. Standardization Sample – Representativeness:
Separate sampling designs were used for institutions of different types
Public schools stratified by region, community type, size, and Orshansky Percentile (an indicator of socioeconomic status)
C. Technical Evaluation: Norms/Standards
Standardization Sample – procedure followed in obtaining the sample:
Spring standardization: April 1996
Fall standardization: October 1996
Recommended test administration period is a five-week window centered on the norming periods
C. Technical Evaluation: Norms/Standards
3. Standardization Sample – Availability of subgroup norms
Questionnaire sent to participating schools
95% responded in the fall
100% responded in the spring
C. Technical Evaluation: Norms/Standards
3. Standard setting procedures employed – qualifications and selection of judges:
Nominations were made of experienced teachers and curriculum specialists with national reputations
Judges had to possess “deep understanding” of one of the five content areas
C. Technical Evaluation: Norms/Standards
3. Standard setting procedures employed – number of judges:
Two committees for each of five content areas: Primary/Elementary and Middle/High School
4-5 teachers per committee, one curriculum expert (external), and one CTB content expert (approximately 70 people total)
C. Technical Evaluation: Reliability
1. Types – Measures of internal consistency:
Kuder-Richardson Formula 20 (KR20)
Item-pattern KR20 (a unique measure that takes into account the additional accuracy associated with IRT item-pattern scoring)
Coefficient alpha
On individual student score reports, a student's score is reported along with a confidence band.
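A minimal sketch of how KR20 and a confidence band's standard error of measurement are computed from dichotomous (0/1) item responses; the formulas are the standard textbook definitions, and the data below are invented for illustration:

```python
import math

def kr20(responses):
    """Kuder-Richardson Formula 20 for dichotomous (0/1) item data.

    responses: list of per-student lists, one 0/1 entry per item.
    KR20 = (k / (k - 1)) * (1 - sum(p*q) / variance of total scores)
    """
    k = len(responses[0])                    # number of items
    n = len(responses)                       # number of students
    totals = [sum(student) for student in responses]
    mean = sum(totals) / n
    var = sum((t - mean) ** 2 for t in totals) / n   # population variance
    sum_pq = 0.0
    for item in range(k):
        p = sum(student[item] for student in responses) / n  # proportion correct
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / var)

def sem(responses):
    """Standard error of measurement: SD * sqrt(1 - reliability).
    A confidence band is typically reported as score +/- 1 SEM."""
    totals = [sum(student) for student in responses]
    mean = sum(totals) / len(totals)
    sd = math.sqrt(sum((t - mean) ** 2 for t in totals) / len(totals))
    return sd * math.sqrt(1 - kr20(responses))

# Toy data: 4 students, 3 items.
data = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(kr20(data))  # 0.75
```

The item-pattern KR20 mentioned above adjusts this calculation for IRT pattern scoring; the plain formula here corresponds to the classical number-correct version.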
C. Technical Evaluation: Reliability
2. Results:
Reliability coefficients were consistently in the .80s and .90s
Spelling was consistently lower
Grades 1 and 2 also had slightly lower coefficients
C. Technical Evaluation: Validity
1. Types – Content-related:
Numerous studies (e.g. classroom pilots, usability, sensitivity) conducted
Advisory panel of teachers, administrators, and content specialists from all parts of the country
Based on recommendations of the SCANS (Secretary's Commission on Achieving Necessary Skills) report
C. Technical Evaluation: Validity
1. Types – Content-related:
Developers and scorers worked together as constructed-response items were scored for consistency and accuracy of scoring guides and process
Reviewed various informational sources for children to determine topics of interest
C. Technical Evaluation: Validity
1. Types – Criterion-related:
Conducted a variety of research studies, such as correlations with the SAT, ACT, NAEP, and TIMSS
C. Technical Evaluation: Validity
1. Types – Construct-related:
Careful test development process to support content validity and comprehensiveness of test
Construct validity for skills, concepts and processes measured in each subject
C. Technical Evaluation: Validity
2. Results:
Provides achievement scores that are valid for several types of educational decision making
A thorough validity evaluation encompassed content-, criterion-, and construct-related evidence
Bias
Used the following procedures to reduce the amount of bias:
Ensured valid test plan
Followed stringent editorial guidelines
Conducted expert reviews
Analyzed student data for differential item functioning
Selected best items
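Differential item functioning is commonly screened with the Mantel-Haenszel procedure: students are matched on total score, and the odds of answering an item correctly are compared between a reference and a focal group within each score stratum. The sketch below illustrates that general technique with invented data; it is not CTB's documented methodology:

```python
def mantel_haenszel_odds_ratio(strata):
    """Mantel-Haenszel common odds ratio across score strata.

    strata: list of (ref_correct, ref_incorrect, focal_correct, focal_incorrect)
    tuples, one per total-score stratum. A ratio near 1.0 suggests the item
    functions similarly for both groups; large departures flag potential DIF.
    """
    numerator = 0.0
    denominator = 0.0
    for ref_c, ref_i, foc_c, foc_i in strata:
        total = ref_c + ref_i + foc_c + foc_i
        numerator += ref_c * foc_i / total
        denominator += ref_i * foc_c / total
    return numerator / denominator

# Identical performance in both groups at every score level -> ratio of 1.0.
strata = [(10, 5, 10, 5), (8, 2, 8, 2)]
print(mantel_haenszel_odds_ratio(strata))  # 1.0
```

Matching on total score is what separates genuine item bias from overall ability differences between the groups.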
D. Summary of MMY Reviews
Reviewed by Judith A. Monsaas, Associate Professor of Education, North Georgia College and State University, Dahlonega, GA
Tests are "very engaging and user friendly"; materials are well-constructed and attractive
Addition of performance standards is helpful for schools moving toward a standards-based curriculum framework
D. Review, continued
Claims to assist in decision making in many areas, including evaluation of student progress, instructional program planning, curriculum analysis, class grouping, etc. This reviewer believes they can support this claim
Has a particularly useful section for parents on “Using Test Results”
D. Review, continued
“Although these tests are attractive and more engaging than most achievement tests I have inspected, I doubt that students will forget that they are taking a test.”
The section on "Avoiding Misinterpretations" when using grade equivalents is helpful
D. Review, continued
Process used to develop the test and ensure content validity was very thorough and clearly explained
Norming and score reporting methods are well-developed
The reviewer's only concern is with the mastery classifications for the criterion-referenced interpretations, which she feels are arbitrarily defined
D. Review, continued
Reviewed by Anthony J. Nitko, Professor, Department of Educational Psychology, University of Arizona, Tucson, AZ
One change in the new edition is that items within each subtest are organized according to contextual themes, countering the criticism that standardized tests assess strictly decontextualized knowledge and skills
D. Review, continued
Developers carefully analyzed curriculum guides from around the country, as well as national and state standards and textbook series
Several usability studies were run. The results of these were used to improve test items, teachers’ directions, and page designs
D. Review, continued
Earlier editions were criticized for problems related to speed; this version corrects those. Typically fewer than 4% of students fail to respond to the last item on each subtest
"One of the better batteries of its type."
Teachers' materials are exceptionally well-done and informative
E. Critique of the Instrument
Our research on the TerraNova helps us to draw the following conclusions:
A complete and comprehensive test
Numerous measures and studies were done to ensure technical requirements were met
TerraNova takes pride in its overall test design, construction, norming, national standardization process, reliability, validity, and reduction of bias
E. Critique of the Instrument
Does a good job supporting its purpose as a measure of student achievement
Provides three main types of information: norm-referenced information, some criterion information, and standards-based performance information
Serves as a good measure for comparing student achievement with national performance
E. Critique of the Instrument
This is not a test that should be used by itself. It is simply one type of measure and cannot be the only measure used in making critical decisions
When used in conjunction with other test methods and teacher judgment, it is an effective measure of what it purports to measure
Caution should be used when tracking state standards with this assessment; although it purports to be accurately correlated with them, there is no substantial proof
E. Critique of the Instrument
Interesting Tidbits:
Del Harnish has done research on bias issues and has published work on the TerraNova
Testnote Clarity is a computer program for disaggregating data that allows the user to customize results and apply them to the district curriculum