The Inevitability of the Use of Value-Added Measures in Teacher Evaluations

“Value added” or “VA” refers to the use of statistical techniques to measure teachers’ impacts on their students’ standardized test scores, controlling for such student characteristics as prior years’ scores, gender, ethnicity, disability, and low-income status.
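
To make the definition concrete, here is a minimal sketch in Python of the basic idea, under heavy simplifying assumptions.  It controls only for the prior year’s score, whereas real value-added models also control for the other student characteristics listed above and must deal with random error.  The function name, the record layout, and the sample scores are all hypothetical, made up purely for illustration; this is not the model used by any study or agency mentioned in this post.

```python
from collections import defaultdict
from statistics import mean


def value_added(records):
    """Estimate a crude value-added figure per teacher.

    records: list of (teacher, prior_score, current_score) tuples.
    Assumption: prior score is the ONLY control variable (a simplification).
    """
    priors = [p for _, p, _ in records]
    currents = [c for _, _, c in records]
    p_bar, c_bar = mean(priors), mean(currents)

    # Fit current = intercept + slope * prior by ordinary least squares.
    sxx = sum((p - p_bar) ** 2 for p in priors)
    if sxx == 0:
        raise ValueError("prior scores must vary to fit a regression")
    sxy = sum((p - p_bar) * (c - c_bar) for p, c in zip(priors, currents))
    slope = sxy / sxx
    intercept = c_bar - slope * p_bar

    # A student's residual is the actual score minus the predicted score.
    residuals = defaultdict(list)
    for teacher, p, c in records:
        residuals[teacher].append(c - (intercept + slope * p))

    # A teacher's value added is the average residual of his or her students.
    return {t: mean(r) for t, r in residuals.items()}


# Hypothetical data: both classrooms start with the same prior scores.
sample = [
    ("Teacher A", 50, 60), ("Teacher A", 70, 80),
    ("Teacher B", 50, 50), ("Teacher B", 70, 70),
]
print(value_added(sample))  # {'Teacher A': 5.0, 'Teacher B': -5.0}
```

In this toy example the two classrooms look identical going in, so the ten-point gap in average results is attributed entirely to the teachers.  With only two students per teacher, of course, a figure like this would be far too noisy to mean anything.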

Reports on a massive new study that seem to affirm the use of the technique have recently been splashed across the media and chewed over in the blogosphere.  Further from the limelight, developments in Wisconsin seem to ensure that in the coming years value-added analyses will play an increasingly important role in teacher evaluations across the state.  Assuming the analyses are performed and applied sensibly, this is a positive development for student learning.

The Chetty Study     

Since the first article touting its findings was published on the front page of the January 6 New York Times, a new research study by three economists assessing the value-added contributions of elementary school teachers and their long-term impact on their students’ lives – referred to as the Chetty article after the lead author – has created as much of a stir as could ever be expected for a dense academic study.

And an amazing study it is.  The researchers capitalized on their access to a treasure trove of data – twenty years of information on test scores and classroom and teacher assignments in grades 3 through 8 for more than 2.5 million students attending schools in a large urban school district, as well as federal tax records from 1996 through 2010 that the researchers were able to link to most of the individual students for whom they had test scores.

Using the data, the researchers studied whether value-added measures are fairly attributable to individual teachers.  Their first major conclusion is that value-added measures provide unbiased estimates of teachers’ causal impacts on test scores.  Next, they made use of the tax data to substantiate their second major conclusion that students assigned to high-VA teachers are more likely to do better in later life.

The researchers summarized the findings supporting their second conclusion this way:

Students assigned to high-VA teachers are more likely to attend college, attend higher-ranked colleges, earn higher salaries, live in higher SES neighborhoods, and save more for retirement.  They are also less likely to have children as teenagers.  Teachers have large impacts in all grades from 4 to 8.  On average, a one standard deviation improvement in teacher VA in a single grade raises earnings by about 1% at age 28.  Replacing a teacher whose VA is in the bottom 5% with an average teacher would increase the present value of students’ lifetime income by more than $250,000 for the average classroom in our sample.  We conclude that good teachers create substantial economic value and that test score impacts are helpful in identifying such teachers.

The study has garnered lots of attention.  See, e.g., here, here, and here.  I find myself agreeing with a pretty balanced assessment by Matthew DiCarlo in a post on the Shanker Blog, which is affiliated with the American Federation of Teachers.  Dr. DiCarlo writes:

The fact that teachers matter is not in dispute. The issues have always been how to measure teacher effectiveness at the individual-level and, more importantly, whether and how it can be improved overall.

On the one hand, the connection between value-added and important future outcomes does suggest that there may be more to test-based teacher productivity measures – at least in a low-stakes context – than may have been previously known. In other words, to whatever degree the findings of this paper can be generalized, these test-based measures may in fact be associated with long-term desired outcomes, such as earnings and college attendance. There is some strong, useful signal there.

On the other hand, this report’s findings do not really address important questions about the proper role for these estimates in measuring teacher “quality” at the individual level (as previously discussed here), particularly the critical details (e.g., the type of model used, addressing random error) that many states and districts using these estimates seem to be ignoring. Nor do they assess the appropriate relative role of alternative measures, such as principal observations, which provide important information about teacher effectiveness not captured by growth model estimates.

Most importantly, the results do not really speak directly to how teacher quality is best improved, except insofar as it adds to the body of compelling evidence that teachers are important and that successful methods for improving teacher quality – if and when they are identified and implemented – could yield benefits for a broad range of outcomes over the long-term.

I think it’s fair to note that the authors of the Chetty study probably wouldn’t disagree with this assessment.

Objections to Value-Added Analysis

There seem to be two principal objections to the use of value-added assessments on a conceptual level.  (On a nuts-and-bolts level, there can be plenty of more technical objections, on such matters as the sufficiency of the data used in a value-added analysis and the formulation of the regression analyses that yield its results.)

The first conceptual objection challenges whether the approach could ever be reliable and meaningful.  The Chetty study seems to go a long way toward demonstrating that, with enough data, value-added analyses can provide a uniquely valuable basis for comparative assessments of the contributions of individual teachers to student learning.

The second objection – and one noted in the Chetty study – is that the reliability of value-added assessments may be compromised if they become a significant part of teachers’ evaluations.  Teachers would have an incentive to adjust their approach to place more emphasis on test preparation, and “teaching to the test” could result, which might skew the value-added analysis.

The specter of “teaching to the test” is not one that bothers me much.  At least for us in Madison, I don’t find it to be a very compelling criticism or caution regarding the use of value-added analyses.  First, in Madison we won’t have a wholesale shift in emphasis whereby curriculum is narrowed down to only those subjects that are tested on standardized assessments.

Second, our good teachers will not be driven to “teach to the test.”  They won’t have to.  One of the ways in which their skills will be manifested will be their value-added measures.  I do not favor and cannot foresee our adopting any sort of merit pay scheme tied to value-added results, so successful teachers won’t have economic incentives to shortchange parts of the curriculum in order to put inordinate emphasis on reading and math.

Finally, it could be that struggling teachers would be more inclined to work hard to bring up their students’ scores on standardized tests if they need to show improvement in their value-added measures in order to hold on to their jobs.  There are a lot worse job-survival strategies that struggling teachers could employ.  Assuming the standardized tests accurately assess students’ knowledge on the topics we want them to understand, “teaching to the test” has never struck me as all that bad a thing for teachers whose students have not heretofore demonstrated sufficient academic growth.

Value-Added in Madison            

For the last several years, the Madison school district has received value-added reports prepared by the Value-Added Research Center, part of the UW Center for Education Research.

Initially, the reports only compared our schools against each other.  This year, for the first time, we received a report that looks at our value-added figures as compared with state averages.  We were told that “MMSD performs well relative to the state– VA for entire district positive on average in 2009-10, with stronger results for reading than for math.”  I don’t have any way to assess the actual significance of the extent to which the Madison value-added figures slightly exceed the state averages.   The entire value-added report we received in September can be found here.

We have not been presented with value-added measures for individual teachers.  I assume, without knowing for sure, that we don’t have enough data to derive such figures in a way that would be both reliable and meaningful.  I also suspect that we’re at least several years away from being able to do so.

Statewide Developments:  Value Added as an Alternative to No Child Left Behind 

As I wrote previously, the Obama administration has offered states the opportunity for waivers from the most onerous provisions of No Child Left Behind.  In order to qualify for a waiver, a state must:  (1) adopt college and career readiness standards in reading and math, as Wisconsin has done by adopting the Common Core standards; (2) establish an accountability system that, inter alia, calls for interventions for the lowest-performing 5% of schools in the state; and (3) establish a teacher and principal evaluation system that assesses performance based on student progress over time as well as other measures of professional practice.

According to information from the Wisconsin Association of School Boards, the Department of Public Instruction is readying its application for the state’s NCLB waiver in time for the submission deadline of February 21.  DPI expects to have the draft waiver application available for public comment by January 23.  The state Senate and Assembly Education Committees plan to hold a joint informational hearing on the waiver proposal on January 25.

As part of the effort to obtain an NCLB waiver, State Superintendent Tony Evers appointed a Wisconsin Educator Effectiveness Design Team.  The group has proposed a statewide evaluation framework for teachers and principals that is based half on “educator practice” and half on “student outcomes.”

The “educator practice” component of the evaluation is to include multiple observations supplemented by other measures of practice.  To the extent available, student outcomes are to be measured primarily on the basis of individual value-added data on statewide standardized assessments, district-specific standardized assessment results, and collaboratively-established and teacher-developed “Student Learning Objectives.”

If school districts’ analytical capabilities improve to the point where they can assign reliable value-added rankings to their individual teachers, I’d assume parents would clamor for those figures so they could try to funnel their kids to the top-ranked teachers.  In light of this natural tendency, it’s interesting to note that the DPI Design Team “recommends that the laws and regulations of the State of Wisconsin ensure that personally identifiable information in relation to the evaluation system is not subject to public disclosure.  As such, individual evaluation ratings (and subcomponents used to determine ratings) are not subject to open records requests.”

The proposed evaluation system will sort teachers into three categories:  “developing,” “effective,” and “exemplary.”  Teachers judged as “developing” will “undergo an intervention phase.”  If the intervention does not succeed in bumping the teacher up into the “effective” category, then “the district shall move to a removal phase.”

To meet the NCLB waiver requirements, this evaluation system is to be fully implemented by the 2014-15 school year, when this year’s first graders are in fourth grade.

The Coming Tectonic Shift in Teacher Evaluation in Wisconsin

DPI acknowledges that the new teacher evaluation system it is developing will mark a major shift for Wisconsin.  If the framework is implemented with fidelity, it certainly will.  There has been and presumably continues to be considerable resistance in the state to a high-stakes teacher evaluation system that takes into account student learning as measured through standardized tests, as value-added analyses do.

As part of the effort to qualify for federal Race to the Top funding a couple of years ago, the state took a stab at opening the door to teacher evaluations based in part on student performance.  The effort was less than an unqualified success.

One of the requirements to qualify for Race to the Top funds was that a state could not bar consideration of student achievement as part of the process for evaluating teachers.  This was inconvenient, because Wisconsin had a state law that did just that.

Prompted by the Race to the Top requirements, the legislature acted in 2009 to change this law.  However, while the legislature attempted to comply with the letter of the Race to the Top requirement by eliminating the specific bar on using the results of statewide assessments as part of the evaluation of teacher performance, it flouted the requirement’s spirit by maintaining the bar on relying in any way on those results to discharge a teacher or non-renew his or her contract.

The practical effect of the changes the law wrought was that results of students’ performance on the statewide assessment, whether the WKCE or its successor, were quite unlikely to be used in the evaluation of teacher performance.  A school district wasn’t going to have two separate teacher evaluation processes, one that takes into account student performance but that couldn’t be used for making non-renewal decisions, and a separate process that could be relied on in determining a teacher’s employment future with the district.

Last month, with 2011 Wisconsin Act 105, the legislature changed the law once more, removing the bar on relying on the results of standardized tests to discharge or non-renew a teacher.  Now, the results of standardized tests can be taken into account but cannot be used “as the sole reason” for this type of adverse employment action.  The statute also specifies that school districts may use value-added analyses of scores on standardized tests to evaluate teachers.

The new teacher evaluation system that DPI is developing would also represent a major change for Madison.  The current collective bargaining agreement between the school district and MTI provides that the criteria to be used in measuring a teacher’s performance are limited to “professional knowledge, professional interest, assignments to pupils, instructional preparation, rapport with and control of pupils, techniques of teaching.”

While the DPI Design Team’s first “guiding principle” is that “the ultimate goal of education is student learning,” measures of student learning cannot currently be relied upon to evaluate MMSD teachers.  This seems likely to change.
