Interpreting State Test


Transcript of Interpreting State Test

Page 1: Interpreting  State Test

Interpreting State Test

Growth Model
Emeteric

PVAAS

Page 2: Interpreting  State Test

PVAAS Growth Methods

Testing                         Subjects – Grades                         Methodology
PSSA in consecutive years       Math 4–8, Reading 4–8                     Growth Standard
PSSA not in consecutive years   Writing – 5, 8, 11; Science – 4, 8, 11;   Predictive
                                Math & Reading – 11

Page 3: Interpreting  State Test

The Growth Standard Methodology

• Each year a cohort’s estimated achievement (using all historical PSSA data available) will be located on the appropriate grade level distribution from the 2005-06 statewide distributions.

• The 2005-06 performance distributions are used to establish “typical” performance at each grade level so that growth in consecutive years can be measured relative to the same standard each year.

Page 4: Interpreting  State Test

Growth Standard Methodology
Grades 4 through 8 – Reading & Math

A cohort makes one year’s growth when…

The estimated achievement for the current year maintains the same relative position as the estimated achievement for the previous year in the statewide database of all cohorts’ estimated achievement.

[Figure: side-by-side population histograms of the 2005-06 statewide distributions – the 4th Grade Distribution and the 5th Grade Distribution – with the cohort’s position marked on each.]

Page 5: Interpreting  State Test

Predictive Methodology
Writing, Science, and Grade 11 Reading & Math

A cohort makes one year’s growth when…

The mean observed score from the actual test is not significantly different from the mean predicted score for the cohort. The mean predicted score is calculated based on all reading and math data in each student’s record in the cohort.

Mean Observed Score ≈ Mean Predicted Score ± error
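The predictive check described above can be sketched as a small function. This is an illustrative sketch, not the actual PVAAS implementation; the function name, the error band, and the two-standard-error threshold are assumptions.

```python
def makes_one_years_growth(mean_observed, mean_predicted, std_error, n_se=2.0):
    """A cohort makes one year's growth when its mean observed score is
    not significantly different from its mean predicted score, i.e. the
    difference falls within the error band around the prediction."""
    return abs(mean_observed - mean_predicted) <= n_se * std_error

# Hypothetical numbers: predicted mean 1450 with a standard error of 15
print(makes_one_years_growth(1438, 1450, 15))   # True: within the band
print(makes_one_years_growth(1400, 1450, 15))   # False: significantly below
```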

Page 6: Interpreting  State Test

Summary: One Year’s Growth

Growth Standard Method

Math and Reading – Grades 4 through 8

Cohort maintains its achievement position.

Predictive Method

Math & Reading – Grade 11
Writing – Grades 5, 8, 11
Science – Grades 4, 8, 11

Cohort’s actual performance is as expected.

Page 7: Interpreting  State Test

Using the Growth Standard

• What is a Growth Standard and how is it set?

– The Growth Standard specifies the minimal acceptable academic gain from grade to grade for a cohort of students.

Page 8: Interpreting  State Test

Using the Growth Standard

• How can we compare scores across different years?

– The Growth Standard converts PSSA scaled scores to an equal-interval scale that allows scores to be compared. Without the conversion, scores from different years cannot be compared.

Page 9: Interpreting  State Test

Using the Growth Standard

• The use of a Growth Standard creates the possibility that ALL schools can demonstrate appropriate growth.

Page 10: Interpreting  State Test

An Analogy

Page 11: Interpreting  State Test

An Analogy

• Doctors plot a child’s length/height over time.

• Each child may have a unique growth curve.

Page 12: Interpreting  State Test

When is growth “acceptable”?

• The length/height measurement is increasing over time.

• The length/height measurement maintains approximately the same position in the length/height distribution as the child grows.

• The child’s length/height continues to increase in a consistent manner.

Page 13: Interpreting  State Test

When is growth “acceptable”?

• The PSSA Growth Standard acts in a manner similar to a child’s growth chart.

• Deviation from “typical” means further investigation is needed.

Page 14: Interpreting  State Test

What is the Growth Standard for a child’s length/height?

• The standard is that the child maintains approximately the same position in each of the increasing distributions of length/height as the child grows.

• A significant deviation from that pattern does not indicate a problem; it indicates a need for further investigation.

Page 15: Interpreting  State Test

Simulated Growth Standard Charts for Academic Achievement

• Let us build an Academic Achievement Growth Chart.

1. Collect the average performances of a large sample of students using a uniform assessment during each year of their career through school.

2. Plot curves to represent appropriate percentile patterns.

3. An example: Suppose the following table represents the means and SDs of a group of students on the PSSA beginning in 3rd grade and continuing through 8th grade and ultimately 11th grade.

Page 16: Interpreting  State Test

A Growth Standard Chart for Academic Achievement

[Figure: PSSA Math scatter plot of scaled scores (800–2000) by grade (3–12) for 2005–2013.]

PSSA Math
Grade   Mean Score   SD
3       1270         250
4       1290         280
5       1300         255
6       1285         276
7       1310         262
8       1335         260
11      1320         270

Page 17: Interpreting  State Test

An example of a cohort’s growth…

This cohort’s mean performances have met the Growth Standard since

1. The growth curve approximately maintains its position in the distribution of scores.

2. There are no significant deviations in the pattern of growth over time.

[Figure: PSSA Math scatter plot of the cohort’s mean scaled scores (800–2000) by grade (3–12), 2005–2013.]

Note that there is a problem of comparing scaled scores across years…

Page 18: Interpreting  State Test

A Problem…

• It will take six years to create an academic growth chart.

• We can use Base Year distributions.

• The Base Year distributions stand in for the distributions a single cohort would pass through over time.

Page 19: Interpreting  State Test

[Figure: PSSA Math scatter plot of the base-year distributions, scaled scores (800–2000) by grade in 2006 (3–12).]

We use the base year distributions.

The base year for PVAAS is 2006.

Page 20: Interpreting  State Test

Using the Base Year 2006

Suppose the distributions from 2006 are given by

Grade   3      4      5      6
Mean    1270   1290   1300   1285
SD      250    310    255    276

Conversion to NCE scores will use the Base Year distributions in their calculations.

Page 21: Interpreting  State Test

Suppose the means of a cohort in two consecutive years are 1390 (3rd grade, 2007) and 1450 (4th grade, 2008).

NCE scores are calculated for both using the 2006 means and SD’s.

Grade   3      4
Mean    1270   1290
SD      250    310

2007 (3rd grade, scaled score 1390):
    z-score   = (1390 − 1270) / 250 = 0.48
    NCE score = 50 + 21.06 × 0.48 = 60.11

2008 (4th grade, scaled score 1450):
    z-score   = (1450 − 1290) / 310 = 0.52
    NCE score = 50 + 21.06 × 0.52 = 60.95

All future PSSA scaled scores will be converted to NCE scores using the 2006 Base year parameters for the comparison to calculate the mean gain of a cohort of students.
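The conversion above can be reproduced in a few lines. A minimal sketch, assuming the slide’s convention of rounding the z-score to two decimals before computing the NCE score (which is how 60.95 rather than 60.87 is obtained for 2008):

```python
def to_nce(score, base_mean, base_sd):
    """Convert a PSSA scaled score to an NCE score using Base Year
    (2006) parameters; z is rounded to two decimals as on the slide."""
    z = round((score - base_mean) / base_sd, 2)   # z-score vs. 2006 distribution
    return 50 + 21.06 * z                         # NCE scale: mean 50, SD 21.06

# 2006 Base Year parameters: grade 3 (mean 1270, SD 250), grade 4 (mean 1290, SD 310)
print(round(to_nce(1390, 1270, 250), 2))   # 60.11 (2007, 3rd grade)
print(round(to_nce(1450, 1290, 310), 2))   # 60.95 (2008, 4th grade)
```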

Page 22: Interpreting  State Test

The NCE Growth Curves

NCE PSSA Math
Grade   NCE Score   SD
3       50.00       21.06
4       50.00       21.06
5       50.00       21.06
6       50.00       21.06
7       50.00       21.06
8       50.00       21.06
11      50.00       21.06

[Figure: NCE PSSA Math scatter plot, NCE scores (0–100) by grade (3–12).]

Page 23: Interpreting  State Test

Some Thoughts…

This Growth Standard concept demonstrates the need for longitudinal data when considering academic growth, since each student has his/her own academic growth curve. But…

The example also exhibits the remaining two issues for PVAAS value-added methods:

1. Comparing scores from year to year

2. Estimating the “true” level of achievement for input into the growth curve

Page 24: Interpreting  State Test

Calculation of Gain from year to year

Student growth is measured by difference in performance in consecutive years.

Grade   3      4      5      6
Score   1290   1310   1330   1365
Gain    –      20     20     35

But there is a problem with this!

These scores are not comparable!

Page 25: Interpreting  State Test

Comparing scaled scores on the PSSA from different years

PSSA tests have different means and standard deviations at each grade and for different years. For example, in 8th grade:

        Math             Reading
Year    Mean    SD       Mean    SD
2005    1370    222.2    1360    274.3
2004    1350    208.1    1370    239.7

Page 26: Interpreting  State Test

A Solution: Conversion to NCE Scores

• NCE scores indicate the position of a scaled score on a reference scale (mean = 50, sd = 21.06) so that the scaled scores from different distributions with different scales can be compared.

• The use of NCE scores does not impose a normal distribution on the data, nor does the use of NCE scores have any relationship to normed referenced tests.

• NCEs are excellent for looking at scores over time. (Victoria L. Bernhardt, Using Data to Improve Student Learning in High Schools)

Page 27: Interpreting  State Test

NCE Scores Are About Position

To calculate an NCE score:

1. Calculate the z-score of the data value of interest, that is, the number of standard deviations the data value is from the mean of its distribution:

    z-score = (observed score − mean) / SD

2. The NCE score is calculated using the following formula:

    NCE score = 50 + 21.06 × (z-score)

Page 28: Interpreting  State Test

The need for uniform scales…

• George scores a 655 on the SAT mathematics exam.

• George also scores a 28 on the ACT mathematics exam.

Which score should he report to his colleges if he wants to provide the

“better” score?

Page 29: Interpreting  State Test

A Matter of Comparison

How do we compare George’s scores?

        Mean    SD     George
SAT     520     110    655
ACT     20.7    5.0    28

The nature of each distribution is irrelevant to the question of interest:

Page 30: Interpreting  State Test

A Solution

• Conversion of both scores to NCE scores allows for the identification of the position of each score on the same scale.

• This identification of position provides the capability of comparison since the converted scores will be based on the same distribution parameters.

Page 31: Interpreting  State Test

Which Score Should George Choose to Report?

Using an NCE scale with mean 50 and standard deviation 21.06…

SAT score:
    NCE score = 50 + 21.06 × (655 − 520) / 110 = 75.85

ACT score:
    NCE score = 50 + 21.06 × (28 − 20.7) / 5.0 = 80.74

SAT score of 655 → NCE score 75.85
ACT score of 28 → NCE score 80.74

Clearly, he should report his ACT score!
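The two conversions can be checked with a short script. A sketch of the slide’s arithmetic; 80.75 versus the slide’s 80.74 is only a rounding difference.

```python
def nce(score, mean, sd):
    """Position of a score on the NCE reference scale (mean 50, SD 21.06)."""
    return 50 + 21.06 * (score - mean) / sd

sat_nce = nce(655, 520, 110)    # SAT: mean 520, SD 110
act_nce = nce(28, 20.7, 5.0)    # ACT: mean 20.7, SD 5.0

print(round(sat_nce, 2))   # 75.85
print(round(act_nce, 2))   # 80.75 (the slide rounds to 80.74)
print(act_nce > sat_nce)   # True: report the ACT score
```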

Page 32: Interpreting  State Test

Consider Another Hypothetical Scenario…

In 2006, Wilma was in 4th grade and scored as follows on the 4th grade PSSA:

Mean for 4th Grade – 2006 = 1303.24

Standard Deviation for 4th Grade – 2006 = 164.20

Wilma’s scaled score = 1425

In 2005, Wilma was in 3rd grade and scored as follows on the 3rd grade PSSA:

Mean for 3rd Grade – 2005 = 1356.75

Standard Deviation for 3rd Grade – 2005 = 126.20

Wilma’s scaled score = 1425

Do these scores indicate that Wilma progressed during 4th grade?

Page 33: Interpreting  State Test

Let’s Look at it Graphically…

Even though Wilma’s scaled scores were the same (both 1425), since the distributions were different, we really can’t compare the two scores…

[Figure: the 2005 3rd-grade and 2006 4th-grade score distributions, with Wilma’s score of 1425 marked on each.]

Page 34: Interpreting  State Test

A Tentative Solution: Conversion to Percentiles

In our example, Wilma’s score of 1425 was in the 66th percentile for 2005 but was in the 76th percentile for 2006. These percentiles focus on Wilma’s position in each distribution.

[Figure: the two distributions with Wilma’s percentile position marked on each.]

Page 35: Interpreting  State Test

But…

• We cannot calculate Wilma’s gain – the difference of percentiles does not make sense…

• Percentiles are not meaningful for calculating means for different years, gains, etc., since they are calculated from different distributions.

Page 36: Interpreting  State Test

The Complete Solution: Conversion to NCE Scores

• To establish a basis of comparison for different distributions from different schools in different years, we convert the scaled scores to units in the SAME scale.

• The scale we will use is from the NCE distribution with mean 50 and standard deviation approximately equal to 21.06.


Page 37: Interpreting  State Test

The NCE Distribution and Wilma

Wilma’s NCE score for 2005 (3rd grade) is 61 while her score for 2006 (4th grade) is 66.

[Figure: the NCE distribution with Wilma’s 2005 (3rd grade) and 2006 (4th grade) positions marked.]

Page 38: Interpreting  State Test

Wilma’s gain…

Wilma’s gain = 2006 NCE score (4th Grade) – 2005 NCE score (3rd Grade)
             = 66 – 61
             = +5

• The mean gain of all of the students in Wilma’s cohort (+5 NCE points) can now be compared to the Growth Standard for Wilma’s cohort.
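Wilma’s two conversions and her gain can be sketched as follows, using each year’s distribution parameters as given on the slides:

```python
def nce(score, mean, sd):
    """Position of a scaled score on the NCE scale (mean 50, SD 21.06)."""
    return 50 + 21.06 * (score - mean) / sd

nce_2005 = nce(1425, 1356.75, 126.20)   # 3rd grade, 2005 distribution
nce_2006 = nce(1425, 1303.24, 164.20)   # 4th grade, 2006 distribution

print(round(nce_2005))                    # 61
print(round(nce_2006))                    # 66
print(round(nce_2006) - round(nce_2005))  # 5 (a gain of +5 NCE points)
```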

Page 39: Interpreting  State Test

What about estimating the true level of achievement of a cohort of students?

Page 40: Interpreting  State Test

The Assessment Dilemma

True Student Achievement

Any test is just a snapshot in time!

Page 41: Interpreting  State Test

PVAAS Statewide Methodology

[Diagram: each student’s 2009 test score is converted to a Base Year (2006) NCE score; these are averaged into the 2009 Observed School Mean NCE Scores.]

Page 42: Interpreting  State Test

The Problem with the Mean of the Observed Scores

The mean of the observed NCE scores at best represents a single snapshot in time of student achievement of the PSSA Anchors…

Is it the most comprehensive assessment of the school’s TRUE level of achievement?

How about the Bad Day syndrome?

Page 43: Interpreting  State Test

Observed vs. Composite Estimate…Which is better?

What if we combined the new, observed data with all of the prior PSSA assessment information that we have for this cohort of students?

Would not a longitudinal view of the cohort’s performance yield a more precise and reliable estimate of the true level of achievement?

This is the essence and power of the PVAAS value-added growth methodology!

Page 44: Interpreting  State Test

Consider an Example…

Determine the percent of candies that are blue…

If you were to open only one bag and find that 13% of the candies are blue, how much confidence would you have in your estimate of the true percentage of blue candies for all candies?

Page 45: Interpreting  State Test

Only One Sample? A Bit Risky…

Let’s open 50 bags and look at the distribution of the percents of blue candies…

Looking at these 50 bags, what would you estimate the “true” percent of blue candies for all candies?

Page 46: Interpreting  State Test

What If?

Let’s open 50 more bags and add them to the 50 selected earlier…

[Histograms: the distribution of percent blue candies with n = 50 bags and with n = 100 bags.]

With this additional data, we can make a better estimate of the true percent of blue candies!
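The bag-opening intuition can be simulated. This is an illustrative simulation with assumed numbers (a true blue rate of 13% and 55 candies per bag), not data from the slides:

```python
import random
import statistics

random.seed(1)
TRUE_P = 0.13      # assumed true proportion of blue candies
BAG_SIZE = 55      # assumed number of candies per bag

def percent_blue_per_bag(n_bags):
    """Open n_bags bags and record the percent of blue candies in each."""
    return [100 * sum(random.random() < TRUE_P for _ in range(BAG_SIZE)) / BAG_SIZE
            for _ in range(n_bags)]

# More bags -> a steadier estimate of the true percent blue
for n in (1, 50, 100):
    sample = percent_blue_per_bag(n)
    print(f"n = {n:3d} bags: mean estimate {statistics.mean(sample):.1f}% blue")
```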

Page 47: Interpreting  State Test

The Function of Estimates

• The PVAAS methodology provides estimates of current and previous achievement, and subsequent gain, for the school entity using all information for each student, no matter how complete or sparse.

• This process yields fair estimates of the impact of schooling on the rates of progress of the student populations and mitigates the problem of student mobility.

Page 48: Interpreting  State Test

PVAAS Statewide Methodology

[Diagram: the 2009 Observed School Mean NCE Scores are combined (by computer) with the 2006, 2007, and 2008 Estimated School Mean NCE Scores to produce the 2009 Estimated School Mean NCE Scores.]

Gain = 2009 Estimate – 2008 Estimate

The gain is compared to the Growth Standard to produce the School Rating.

Page 49: Interpreting  State Test

How to Measure Growth of a School?

Using a Growth Standard

• Student scaled scores are converted to NCE scores (2006 parameters).

• The mean NCE score for each school is calculated.

• PVAAS revises all earlier estimates based on the addition of the current data.

• PVAAS calculates an estimated NCE mean score.

Estimated Mean NCE Gain = Current Estimated NCE Mean – Previous Estimated NCE Mean

• The gain is compared to the Growth Standard for the School Effect Rating.

Page 50: Interpreting  State Test

Here is the Fall 2009 PVAAS District/School Report

Page 51: Interpreting  State Test

Gain Ratings

Mean NCE Gain for a cohort in a given year represents the progress of students in that cohort relative to the Growth Standard of 0.

Color ratings:

• Green – mean gain greater than or equal to the Growth Standard: favorable indicator

• Yellow – mean gain less than one SE below the Growth Standard: warning sign

• Light Red – mean gain between one and two SEs below the Growth Standard: stronger caution

• Red – mean gain more than two SEs below the Growth Standard: most serious warning
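The color scheme above can be expressed as a small function. A sketch under the stated thresholds; the function name and the numeric examples are illustrative:

```python
def gain_rating(mean_gain, std_error):
    """Map a cohort's mean NCE gain to the PVAAS-style color rating,
    relative to the Growth Standard of 0."""
    if mean_gain >= 0:
        return "Green"        # at or above the Growth Standard
    if mean_gain > -1 * std_error:
        return "Yellow"       # less than one SE below
    if mean_gain > -2 * std_error:
        return "Light Red"    # between one and two SEs below
    return "Red"              # more than two SEs below

print(gain_rating(1.2, 0.8))    # Green
print(gain_rating(-0.5, 0.8))   # Yellow
print(gain_rating(-1.2, 0.8))   # Light Red
print(gain_rating(-1.9, 0.8))   # Red
```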

Page 52: Interpreting  State Test

Level of Evidence – The Role of Standard Error

The color-coded ratings on the mean gain of cohorts are based on the level of confidence we have that the gain of the cohort is truly below the Growth Standard…

More than 2 SEs below the Growth Standard       → Significant Evidence of Lack of Progress
Between 1 and 2 SEs below the Growth Standard   → Greater Evidence of Lack of Progress
Less than 1 SE below the Growth Standard        → Slight Evidence of Lack of Progress
At or above the Growth Standard                 → THE GOAL

Page 53: Interpreting  State Test

The Power of PVAAS

The power of this methodology is that it produces:

– Accurate estimates of the true level of achievement of the students in this school.

– Updated estimates of all prior mean performance estimates simultaneously as new data is input into the longitudinal data structure.

– Over time, more accurate and reliable estimates of the true level of understanding of the students in this grade or school.

Page 54: Interpreting  State Test