Prepared for the U.S. Department of Education in 2010, the report Error Rates in Measuring Teacher and School Performance Based on Student Test Score Gains offers insight into the error rates involved in using value-added models (VAM) to label teachers as effective or ineffective.
As I wrote in an earlier post, it is inappropriate to use VAM or students’ standardized test results to measure the performance of an individual teacher. Since we do not hear much about the significant uncertainty associated with VAM, please share this report widely.
This paper addresses likely error rates for measuring teacher and school performance in the upper elementary grades using value-added models applied to student test score gain data. Using realistic performance measurement system schemes based on hypothesis testing, we develop error rate formulas based on OLS and Empirical Bayes estimators. Simulation results suggest that value-added estimates are likely to be noisy using the amount of data that are typically used in practice. Type I and II error rates for comparing a teacher’s performance to the average are likely to be about 25 percent with three years of data and 35 percent with one year of data. Corresponding error rates for overall false positive and negative errors are 10 and 20 percent, respectively. Lower error rates can be achieved if schools are the performance unit. The results suggest that policymakers must carefully consider likely system error rates when using value-added estimates to make high-stakes decisions regarding educators.
“Existing research has consistently found that teacher- and school-level averages of student test score gains can be unstable over time. Studies have found only moderate year-to-year correlations—ranging from 0.2 to 0.6—in the value-added estimates of individual teachers (McCaffrey et al. 2009; Goldhaber and Hansen 2008) or small to medium-sized school grade-level teams (Kane and Staiger 2002b). As a result, there are significant annual changes in teacher rankings based on value-added estimates. Studies from a wide set of districts and states have found that one-half to two-thirds of teachers in the top quintile or quartile of performance from a particular year drop below that category in the subsequent year (Ballou 2005; Aaronson et al. 2008; Koedel and Betts 2007; Goldhaber and Hansen 2008; McCaffrey et al. 2009).”