Interpreting State Test

Post on 24-Feb-2016


Growth Model

Emeteric

PVAAS

PVAAS Growth Methods

Testing                        Subjects – Grades                                            Methodology
PSSA in consecutive years      Math 4-8, Reading 4-8                                        Growth Standard
PSSA not in consecutive years  Writing – 5, 8, 11; Science – 4, 8, 11; Math & Reading – 11  Predictive

The Growth Standard Methodology

• Each year a cohort’s estimated achievement (using all historical PSSA data available) will be located on the appropriate grade level distribution from the 2005-06 statewide distributions.

• The 2005-06 performance distributions are used to establish “typical” performance at each grade level so that growth in consecutive years can be measured relative to the same standard each year.

Growth Standard Methodology

Grades 4 through 8 – Reading & Math

A cohort makes one year’s growth when…

The estimated achievement for the current year maintains the same relative position as the estimated achievement for the previous year in the statewide database of all cohorts’ estimated achievement.

[Figure: two population histograms, the 4th Grade Distribution 2005-06 and the 5th Grade Distribution 2005-06, each with the Cohort Position marked.]

Predictive Methodology

Writing, Science, and Grade 11 Reading & Math

A cohort makes one year’s growth when…

The mean observed score from the actual test is not significantly different from the mean predicted score for the cohort. The mean predicted score is calculated based on all reading and math data in each student’s record in the cohort.

Mean Observed Score ≈ Mean Predicted Score

[Figure: the Mean Observed Score shown against the Mean Predicted Score ± error.]
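The comparison described on this slide can be sketched in Python. The slides do not say how the error band around the prediction is computed, so the `error` argument, the function name `meets_predicted_growth`, and the numbers in the usage lines are all assumptions made for illustration only.

```python
def meets_predicted_growth(mean_observed: float,
                           mean_predicted: float,
                           error: float) -> bool:
    """Return True when the observed mean lies within mean_predicted ± error.

    `error` is a hypothetical half-width of the band around the prediction;
    the slides do not specify how PVAAS derives it.
    """
    if error < 0:
        raise ValueError("error must be non-negative")
    return abs(mean_observed - mean_predicted) <= error


# Hypothetical cohort: predicted mean 1400 with an error band of ±15.
inside_band = meets_predicted_growth(1410.0, 1400.0, 15.0)
outside_band = meets_predicted_growth(1370.0, 1400.0, 15.0)
print(inside_band, outside_band)
```

Under these assumptions, the first cohort would count as having made one year’s growth, while the second would fall below the predicted band.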

Summary: One Year’s Growth

Growth Standard Method

Math and Reading – Grades 4 through 8

Cohort maintains its achievement position.

Predictive Method

Math & Reading – Grade 11

Writing – Grades 5, 8, 11

Science – Grades 4, 8, 11

Cohort’s actual performance is as expected.

Using the Growth Standard

• What is a Growth Standard and how is it set?

– The Growth Standard specifies the minimal acceptable academic gain from grade to grade for a cohort of students.

Using the Growth Standard

• How can we compare scores across different years?

– The Growth Standard converts PSSA scores to an equal-interval score that allows scores to be compared. Without the conversion, the scores cannot be compared.

Using the Growth Standard

• The use of a Growth Standard creates the possibility that ALL schools can demonstrate appropriate growth.

An Analogy

• Doctors plot a child’s length/height over time.

• Each child may have a unique growth curve.

When is growth “acceptable”?

• The length/height measurement is increasing over time.

• The length/height measurement maintains the approximate position in its length/height distributions as the child grows.

• The child’s length/height continues to increase in a consistent manner.

When is growth “acceptable”?

• The PSSA Growth Standard acts in a similar manner to a child’s growth chart.

• Deviation from “typical” → further investigation is needed.

What is the Growth Standard for a child’s length/height?

• The standard is that the child maintains approximately the same position in each of the increasing distributions of lengths/heights as the child grows.

• A significant deviation from that pattern does not indicate a problem; it indicates a need for further investigation.

Simulated Growth Standard Charts for Academic Achievement

• Let us build an Academic Achievement Growth Chart.

1. Collect the average performances of a large sample of students using a uniform assessment during each year of their career through school.

2. Plot curves to represent appropriate percentile patterns.

3. An example: Suppose the following table represents the means and SDs of a group of students on the PSSA beginning in 3rd grade and continuing through 8th grade and ultimately 11th grade.

[Figure: PSSA Math Scatter Plot; x-axis: Grade_2005_2013 (grades 3 to 12), y-axis ticks from 800 to 2000.]

A Growth Standard Chart for Academic Achievement

PSSA Math

Grade  Score  SD
3      1270   250
4      1290   280
5      1300   255
6      1285   276
7      1310   262
8      1335   260
11     1320   270
3      1520

An example of a cohort’s growth…

This cohort’s mean performances have met the Growth Standard since

1. The growth curve approximately maintains its position in the distribution of scores.

2. There are no significant deviations in the pattern of growth over time.
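One way to picture “maintaining position” with the simulated PSSA Math table: a cohort that scores 1520 in 3rd grade sits one SD above that grade’s mean (1270 + 250), so holding that position means staying one SD above the mean in every later grade. The Python sketch below computes that reference curve. The function and variable names are illustrative, and measuring position as a number of SDs from the mean is the assumption here.

```python
# Simulated PSSA Math means and SDs by grade, copied from the slide's table.
GRADE_STATS = {
    3: (1270, 250),
    4: (1290, 280),
    5: (1300, 255),
    6: (1285, 276),
    7: (1310, 262),
    8: (1335, 260),
    11: (1320, 270),
}


def z_score(score: float, mean: float, sd: float) -> float:
    """Number of standard deviations that `score` lies from `mean`."""
    return (score - mean) / sd


def constant_position_curve(start_grade: int, start_score: float) -> dict:
    """Scores that keep the starting z-score unchanged in every grade."""
    mean, sd = GRADE_STATS[start_grade]
    z = z_score(start_score, mean, sd)
    return {grade: m + z * s for grade, (m, s) in GRADE_STATS.items()}


curve = constant_position_curve(3, 1520)
for grade, score in curve.items():
    print(f"Grade {grade}: {score:.0f}")
```

A cohort whose means track this curve closely keeps its place in the distribution; a sharp departure from it is the kind of deviation that calls for further investigation.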

[Figure: PSSA Math Scatter Plot; x-axis: Grade_2005_2013 (grades 3 to 12), y-axis ticks from 800 to 2000.]

Note that there is a problem of comparing scaled scores across years…

A Problem…

• It will take six years to create an academic growth chart.

• We can use Base Year distributions.

• Distributions of the Base Year match the distributions of a single cohort over time.

[Figure: PSSA Math Scatter Plot; x-axis: Grade_in_2006 (grades 3 to 12), y-axis ticks from 800 to 2000.]

We use the base year distributions.

The base year for PVAAS is 2006.

Using the Base Year 2006

Suppose the distributions from 2006 are given by

Grade 3 4 5 6

Mean 1270 1290 1300 1285

SD 250 310 255 276

Conversion to NCE scores will use the Base Year distributions in their calculations.

Suppose the means of a cohort in two consecutive years are:

2007: 3rd grade, 1390
2008: 4th grade, 1450

NCE scores are calculated for both using the 2006 means and SDs.

Grade  3     4
Mean   1270  1290
SD     250   310

2007, 3rd grade (1390):

\[ z\text{-score} = \frac{1390 - 1270}{250} = 0.48, \qquad \text{NCE score} = 50 + 21.06 \times 0.48 = 60.11 \]

2008, 4th grade (1450):

\[ z\text{-score} = \frac{1450 - 1290}{310} \approx 0.52, \qquad \text{NCE score} = 50 + 21.06 \times 0.52 = 60.95 \]

All future PSSA scaled scores will be converted to NCE scores using the 2006 Base year parameters for the comparison to calculate the mean gain of a cohort of students.
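The base-year conversion can be sketched in Python as follows. The dictionary and function names are illustrative, not part of PVAAS. The optional `z_decimals` argument is an assumption added so the code can reproduce the slide’s arithmetic, which rounds each z-score to two decimals before multiplying; without that rounding the 4th grade value comes out near 60.87 rather than 60.95.

```python
# 2006 base-year parameters from the example: grade -> (mean, SD).
BASE_YEAR_2006 = {3: (1270, 250), 4: (1290, 310)}

NCE_MEAN = 50.0
NCE_SD = 21.06


def to_nce(scaled_score, grade, z_decimals=None):
    """Convert a scaled score to an NCE score using the 2006 base-year parameters."""
    mean, sd = BASE_YEAR_2006[grade]
    z = (scaled_score - mean) / sd
    if z_decimals is not None:
        # Round the z-score first, as the worked example on the slide does.
        z = round(z, z_decimals)
    return NCE_MEAN + NCE_SD * z


nce_2007 = to_nce(1390, 3, z_decimals=2)  # 3rd grade cohort mean
nce_2008 = to_nce(1450, 4, z_decimals=2)  # 4th grade cohort mean
gain = nce_2008 - nce_2007
print(round(nce_2007, 2), round(nce_2008, 2), round(gain, 2))
```

Because both years are placed on the same 2006 scale, the difference between the two NCE values can be read as the cohort’s gain.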

The NCE Growth Curves

NCE PSSA Math

Grade  Score  SD
3      50.00  21.06
4      50.00  21.06
5      50.00  21.06
6      50.00  21.06
7      50.00  21.06
8      50.00  21.06
11     50.00  21.06
3      71.06

[Figure: NCE PSSA Math Scatter Plot; x-axis: Grade (3 to 12), y-axis ticks from 0 to 100.]

Some Thoughts…

This Growth Standard concept demonstrates the need for longitudinal data when considering academic growth, since each student has his/her own academic growth curve.

But…

The example also exhibits the remaining two issues for PVAAS value-added methods:

1. Comparing scores from year to year

2. Estimating the “true” level of achievement for input into the growth curve.

Calculation of Gain from year to year

Student growth is measured by difference in performance in consecutive years.

Grade  3     4     5     6
Score  1290  1310  1330  1365
Gain         20    20    35

But there is a problem with this!

These scores are not comparable!

Comparing scaled scores on the PSSA from different years

PSSA tests have different means and standard deviations at each grade and for different years. For example, in 8th grade:

      Math          Reading
Year  Mean  SD      Mean  SD
2005  1370  222.2   1360  274.3
2004  1350  208.1   1370  239.7

A Solution: Conversion to NCE Scores

• NCE scores indicate the position of a scaled score on a reference scale (mean = 50, sd = 21.06) so that the scaled scores from different distributions with different scales can be compared.

• The use of NCE scores does not impose a normal distribution on the data, nor does the use of NCE scores have any relationship to norm-referenced tests.

• “NCEs are excellent for looking at scores over time.” (Victoria L. Bernhardt, Using Data to Improve Student Learning in High Schools)

NCE Scores Are About Position

To calculate an NCE score:

1. Calculate the z-score of the data value of interest, that is, the number of standard deviations the data value is from the mean of its distribution:

\[ z\text{-score} = \frac{\text{observed score} - \text{mean}}{\text{SD}} \]

2. The NCE score is calculated using the following formula:

\[ \text{NCE score} = 50 + 21.06 \times (z\text{-score}) \]

The need for uniform scales…

• George scores a 655 on the SAT mathematics exam.

• George also scores a 28 on the ACT mathematics exam.

Which score should he report to his colleges if he wants to provide the “better” score?

A Matter of Comparison

How do we compare George’s scores?

      Mean  SD   George
SAT   520   110  655
ACT   20.7  5.0  28

The nature of each distribution is irrelevant to the question of interest:

A Solution

• Conversion of both scores to NCE scores allows for the identification of the position of each score on the same scale.

• This identification of position provides the capability of comparison since the converted scores will be based on the same distribution parameters.

Which Score Should George Choose to Report?

Using an NCE scale with mean 50 and standard deviation 21.06…

SAT score:

\[ \text{NCE score} = 50 + 21.06 \times \frac{655 - 520}{110} = 75.85 \]

ACT score:

\[ \text{NCE score} = 50 + 21.06 \times \frac{28 - 20.7}{5.0} = 80.75 \]

SAT score of 655 → NCE score 75.85

ACT score of 28 → NCE score 80.75

Clearly, he should report his ACT score!
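George’s comparison can be checked with a few lines of Python. This is a minimal sketch using the means and SDs given in the slides; the function name `nce` and the variable names are illustrative.

```python
NCE_MEAN, NCE_SD = 50.0, 21.06


def nce(score, mean, sd):
    """Place a score on the NCE scale via its z-score in its own distribution."""
    return NCE_MEAN + NCE_SD * (score - mean) / sd


sat_nce = nce(655, 520, 110)   # SAT: mean 520, SD 110
act_nce = nce(28, 20.7, 5.0)   # ACT: mean 20.7, SD 5.0

better = "ACT" if act_nce > sat_nce else "SAT"
print(f"SAT NCE = {sat_nce:.2f}, ACT NCE = {act_nce:.2f}, report: {better}")
```

Once both scores are on the same scale, the choice reduces to comparing two numbers, regardless of how different the original SAT and ACT scales are.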

Consider Another Hypothetical Scenario…

In 2006, Wilma was in 4th grade and scored as follows on the 4th grade PSSA:

Mean for 4th Grade – 2006 = 1303.24

Standard Deviation for 4th Grade – 2006 = 164.20

Wilma’s scaled score = 1425

In 2005, Wilma was in 3rd grade and scored as follows on the 3rd grade PSSA:

Mean for 3rd Grade – 2005 = 1356.75

Standard Deviation for 3rd Grade – 2005 = 126.20

Wilma’s scaled score = 1425

Do these scores indicate that Wilma progressed during 4th grade?

Let’s Look at it Graphically…

Even though Wilma’s scaled scores were the same (both 1425), since the distributions were different, we really can’t compare the two scores…

[Figure: Wilma’s score of 1425 marked on the 2005 and 2006 distributions.]

A Tentative Solution: Conversion to Percentiles

In our example, Wilma’s score of 1425 was in the 66th percentile for 2005 but in the 76th percentile for 2006. These percentiles focus on Wilma’s position in each distribution.

[Figure: Wilma’s position marked on the 2005 and 2006 distributions.]

But…

• We cannot calculate Wilma’s gain – the difference of percentiles does not make sense…

• Percentiles are not meaningful for calculating means for different years, gains, etc., since they are calculated from different distributions.

The Complete Solution: Conversion to NCE Scores

• To establish a basis of comparison for different distributions from different schools in different years, we convert the scaled scores to units in the SAME scale.

• The scale we will use is from the NCE distribution with mean 50 and standard deviation approximately equal to 21.06.

The NCE Distribution and Wilma

Wilma’s NCE score for 2005 (3rd grade) is 61 while her score for 2006 (4th grade) is 66.

[Figure: the NCE distribution with Wilma’s 2005 (3rd grade) and 2006 (4th grade) positions marked.]

Wilma’s gain…

Wilma’s gain = 2006 NCE score (4th grade) – 2005 NCE score (3rd grade)

= 66 – 61

= +5

• The mean gain of all of the students in Wilma’s cohort (+5 NCE points) can now be compared to the Growth Standard for Wilma’s cohort.
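Wilma’s NCE scores and gain can be reproduced from the means and standard deviations quoted in the slides. The sketch below is illustrative Python: the function name is hypothetical, and rounding each NCE score to a whole number is an assumption made to match the 61 and 66 shown.

```python
def nce(score, mean, sd):
    """NCE score = 50 + 21.06 * z, where z = (score - mean) / sd."""
    return 50.0 + 21.06 * (score - mean) / sd


# Same scaled score (1425) in both years, but different distributions.
nce_2005 = round(nce(1425, 1356.75, 126.20))  # 3rd grade, 2005
nce_2006 = round(nce(1425, 1303.24, 164.20))  # 4th grade, 2006

wilma_gain = nce_2006 - nce_2005
print(nce_2005, nce_2006, wilma_gain)
```

Although the scaled score did not change, the position of 1425 within each year’s distribution did, which is what the NCE gain captures.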

What about estimating the true level of achievement of a cohort of students?

The Assessment Dilemma

True Student Achievement

Any test is just a snapshot in time!

PVAAS Statewide Methodology

Student A Test Score (2009) → Student A Base Year NCE Score (2006) → 2009 Observed School Mean NCE Scores

The Problem with the Mean of the Observed Scores

The mean of the observed NCE scores at best represents a single snapshot in time of student achievement of the PSSA Anchors…

Is it the most comprehensive assessment of the school’s TRUE level of achievement?

How about the Bad Day syndrome?

Observed vs. Composite Estimate…Which is better?

What if we combined the new, observed data with all of the prior PSSA assessment information that we have for this cohort of students?

Would not a longitudinal view of the cohort’s performance yield a more precise and reliable estimate of the true level of achievement?

This is the essence and power of the PVAAS value-added growth methodology!

Consider an Example…

Determine the percent of candies that are blue…

If you were to open only one bag and find that 13% of the candies are blue, how much confidence would you have in your estimate of the true percentage of blue candies for all candies?

Only One Sample? A Bit Risky…

Let’s open 50 bags and look at the distribution of the percents of blue candies…

Looking at these 50 bags, what would you estimate the “true” percent of blue candies to be for all candies?

What If?

Let’s open 50 more bags and add them to the 50 selected earlier…

Distribution with n = 50

Distribution with n = 100

With this additional data, we can make a better estimate of the true percent of blue candies!
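Why more bags give a better estimate can be quantified with the standard error of a proportion. The Python sketch below assumes each candy is independently blue with the same probability; the true share of 20% and the 55 candies per bag are hypothetical values chosen only for illustration, not figures from the slides.

```python
import math


def standard_error(p_blue: float, candies_per_bag: int, n_bags: int) -> float:
    """Standard error of the pooled proportion of blue candies.

    Assumes independent candies with a common probability p_blue.
    """
    total_candies = candies_per_bag * n_bags
    return math.sqrt(p_blue * (1.0 - p_blue) / total_candies)


# Hypothetical values: 20% blue, 55 candies per bag.
se_1 = standard_error(0.20, 55, 1)
se_50 = standard_error(0.20, 55, 50)
se_100 = standard_error(0.20, 55, 100)

for n_bags, se in ((1, se_1), (50, se_50), (100, se_100)):
    print(f"{n_bags:>3} bags: standard error = {100 * se:.2f} percentage points")
```

Under these assumptions, doubling the number of bags from 50 to 100 shrinks the standard error by a factor of the square root of 2, so the estimate tightens but with diminishing returns.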

The Function of Estimates

• The PVAAS methodology provides estimates of current and previous achievement, and subsequent gain for the school entity using all information for each student, no matter how complete or sparse.

• This process yields fair estimates of the impact of schooling on the rates of progress of the student populations and mitigates the problem of student mobility.

PVAAS Statewide Methodology

Computer

2009 Observed School Mean NCE Scores
2008 Estimated School Mean NCE Score
2007 Estimated School Mean NCE Score
2006 Estimated School Mean NCE Score

2009 Estimated School Mean NCE Scores

Gain = 2009 Estimate – 2008 Estimate

Compare to Growth Standard → School Rating

How to Measure Growth of a School?

Using a Growth Standard

• Student scaled scores are converted to NCE scores (2006 parameters).

• The mean NCE score for each school is calculated.

• PVAAS revises all earlier estimates based on the addition of the current data.

• PVAAS calculates an estimated NCE mean score.

Estimated Mean NCE Gain = Current Estimated NCE mean – Previous Estimated NCE mean

• Gain is compared to Growth Standard for School Effect Rating.

Here is the Fall 2009 PVAAS District/School Report

Gain Ratings

Mean NCE Gain for a cohort in a given year represents the progress of students in that cohort relative to the Growth Standard of 0.

Color ratings:

Green – mean gain greater than or equal to the Growth Standard: favorable indicator

Yellow – mean gain less than one SE below the Growth Standard: warning sign

Light Red – mean gain between one and two SEs below the Growth Standard: stronger caution

Red – mean gain more than two SEs below the Growth Standard: most serious warning
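The color ratings can be expressed as a small decision rule. The Python sketch below uses the Growth Standard of 0 stated on the slide; the function name and the example numbers are hypothetical, and the treatment of gains that fall exactly one or two SEs below the standard is an assumption, since the slides do not say which rating those boundary cases receive.

```python
GROWTH_STANDARD = 0.0  # stated on the slide


def gain_rating(mean_gain: float, standard_error: float) -> str:
    """Map a cohort's mean NCE gain and its standard error to a color rating."""
    if standard_error <= 0:
        raise ValueError("standard_error must be positive")
    shortfall = GROWTH_STANDARD - mean_gain  # how far below the standard
    if shortfall <= 0:
        return "Green"
    if shortfall < standard_error:
        return "Yellow"
    # Assumption: exactly 1 SE or exactly 2 SEs below counts as Light Red.
    if shortfall <= 2 * standard_error:
        return "Light Red"
    return "Red"


# Hypothetical estimates: gain = current estimated mean - previous estimated mean.
current_estimate, previous_estimate, se = 48.2, 50.0, 1.0
example_rating = gain_rating(current_estimate - previous_estimate, se)
print(example_rating)
```

Dividing the shortfall by the SE, rather than looking at the raw gain alone, is what lets the rating reflect how much evidence there is that the cohort truly fell below the standard.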

Level of Evidence – The Role of Standard Error

The color-coded ratings on the mean gain of cohorts are based on the level of confidence we have that the gain of the cohort is truly below the Growth Standard…

More than 2 SEs below Growth Standard: Significant Evidence of Lack of Progress

Between 1 and 2 SEs below Growth Standard: Greater Evidence of Lack of Progress

Less than 1 SE below Growth Standard: Slight Evidence of Lack of Progress

At or above the Growth Standard: THE GOAL

The Power of PVAAS

The power of this methodology is that it produces:

– Accurate estimates of the true level of achievement of the students in this school.

– Updated estimates of all prior mean performance estimates simultaneously as new data is input into the longitudinal data structure.

– Over time, more accurate and reliable estimates of the true level of understanding of the students in this grade or school.