Merit Pay Pro and Con

Susan Troller had a typically good and very substantive article in the Capital Times this week about merit pay for teachers and other dimensions of teacher evaluations.

Merit pay is an issue that highlights the culture clash between the new breed of educational reformers and the traditional education establishment that finds its foundation in teachers and their unions.

Educational reformers nowadays frequently come to education as an avocation after successful business careers.  These reformers, like Bill Gates and Eli Broad,  believe that our approach to education can be improved if we import the sort of approaches to quality and innovation that have proved effective in the business world.

So, for example, let’s figure out what’s the single most important school-based variable in determining student achievement.  Research indicates that it’s the quality of the teacher.  Well then, let’s evaluate teachers in a way that lets us assess that quality, let’s put in place professional development that will allow our teachers to enhance that quality, and let’s have compensation systems that allow us to reward that quality.

That’s not exactly the way we do it now.  Our teacher compensation system is based on seniority and the accumulation of professional degrees.  At best, these factors are only tangentially related to teacher quality.  So the reformers say, let’s overhaul the system and make it (to our way of thinking) more rational.  A keystone to any proposal for reform is a merit pay system that somehow links teacher compensation to teacher quality as manifested through enhanced student achievement.

And, by and large, teachers and their unions don’t like it, as the Cap Times article made clear.  A whole host of objections have been raised to merit pay for teachers.  Some of the objections have some validity and others do not.  Let’s try to sort through them.

Here are four more-or-less valid reasons why teachers may not like merit pay:

1. WKCE is a lousy test. Since our focus is enhanced student achievement, we want to assess the extent to which our teachers enhance our students’ learning.  The obvious way to do this is by looking at how a teacher’s students perform on a standardized test, since this is a common measure that enables us to draw comparisons across classrooms.  Unfortunately, the standardized test we currently have in Wisconsin is the WKCE, and it’s just not very good for these purposes.  For one thing, it is administered in the fall, so it can’t measure the performance of a student’s current teacher.  For another, it is not administered to high school students at any time after the fall of their sophomore year.  Any serious effort to use standardized test results as a meaningful part of teacher evaluation will probably have to wait until we have better standardized tests.  I am optimistic that better tests are on their way but they won’t be here for at least a few years.

2.  It’s a zero-sum game. Without more money overall, moving to a merit pay approach is a zero-sum game, with a loser for every winner.  Michelle Rhee, the superintendent of the Washington, D.C. school system, is able to offer significantly higher salaries to effective teachers in D.C. schools because she was able to attract a number of donors to contribute the millions necessary for the pay bumps.  We don’t have that here.  If we’re not able to increase the size of the overall pay pie, then for anyone who gets a bigger piece, someone else is going to have to make do with less.  If teachers generally tend to be risk-averse when it comes to salary, and I think that they probably do, a change to a more variable and less predictable pay system is not likely to be welcomed.

3. It’s too hard to figure out who is actually good. There are a number of challenges in evaluating teacher performance.  First, we’re not primarily concerned about teaching, we’re concerned about learning, so teaching excellence isn’t as easy to identify as other examples of stellar performance.  We’re not very sympathetic to the teacher who says, “I taught good but they learned lousy.”

Second, teachers tend to work in relative isolation.  Evaluators can sit in on a class and observe, but the very fact of observation is likely to affect the teacher’s performance in one way or another.  (A teacher’s students know the most about how he or she operates on a day-to-day basis, but we tend not to trust their evaluations.)

There’s also some irreducible mystery to teaching.  We’ve all had teachers who were somehow able to run a quiet and orderly classroom without ever raising their voice or showing the least effort.  Other teachers couldn’t control a classroom if you equipped them with a megaphone and a bullwhip.  It’s a challenge to reduce that intangible quality to a measurement matrix.

4. Would you want your professional future in the small hands of a bunch of nine-year-olds holding number 2 pencils?

And here are four not-so-great reasons why teachers may not like merit pay:

1. Merit pay discourages collaboration. One of the criticisms of merit pay voiced in the Cap Times article is that such a system would discourage collaboration among teachers.  I don’t see why this would be the case.  It’s not as if the teachers in each school would all be pitted against each other, with the lowest-ranking voted off the island each week.   Teachers should have the same incentive to collaborate to hone their own skills and to be helpful to their colleagues under a merit pay system as they do today. Effective teacher collaboration should lead to enhanced student performance, which is the goal of the enterprise.  A sound teacher evaluation system would value and reward effective collaboration.

2. We don’t want teaching to the test. Another criticism of merit pay is that it provides too much incentive for teachers to just “teach to the test.”  I have never understood why this is supposed to be such a bad thing. So long as the test that is taught to provides an accurate measure of the materials the students should have mastered, what’s wrong with teaching to the test?  It means that students will be studying and learning the materials we expect them to master.

The concern, I imagine, is that teachers will put undue emphasis on learning the materials that will be on the test, and will short-change the students with respect to other parts of the curriculum.  Good teachers won’t do this.  Poor teachers fall into that category because they are less able to effectively cover everything that a good teacher does.  If this is the case, and the less-good teachers will have to scale down their scope of activities, why not have them focus on teaching the materials that their students are expected to learn?

3. Some kids are smarter than others. A common criticism of attaching significance for a teacher to how well his or her students do on standardized tests is that some students are simply going to score higher on standardized tests than others no matter what the teacher does.  It’s unfair to base a teacher’s salary on test results, when so much of the difference in the scores is attributable to the luck of the draw in terms of what particular students are in the classroom.

This is a valid point, but it’s not a conversation stopper.  The way the variability in student abilities is taken into account is by utilizing a value-added method of analysis.  The value-added method follows individual students longitudinally and tracks how much they learn from year to year.   No matter where an individual student starts the year, a good teacher should help that student make at least a year’s worth of progress. An advanced student might still be advanced at the end of a school year, but if he or she made less than a year’s progress in a particular grade his or her teacher likely did not do the world’s greatest job for the student.  If a value-added approach has sufficient data it can lead to assessments of teacher effectiveness that control for the particular demographics of the teacher’s classroom.
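
To make the idea concrete, here is a rough sketch in Python of the simplest possible version of a value-added calculation. Everything in it is made up for illustration: the teacher names, the scores, and the `value_added` function are all hypothetical, and this is not the model used in Tennessee, at UW, or anywhere else. Real systems use far more elaborate statistics, several years of data, and adjustments for classroom demographics. The sketch only shows the core move, which is to compare each student’s score with that same student’s score the year before, rather than comparing raw scores across classrooms.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (teacher, prior-year score, current-year score).
records = [
    ("Teacher A", 310, 345),
    ("Teacher A", 420, 450),
    ("Teacher A", 380, 415),
    ("Teacher B", 450, 465),
    ("Teacher B", 470, 490),
    ("Teacher B", 440, 455),
]


def value_added(records):
    """Return each teacher's mean student gain minus the overall mean gain."""
    gains_by_teacher = defaultdict(list)
    for teacher, prior, current in records:
        # A student's gain is measured against his or her own starting point.
        gains_by_teacher[teacher].append(current - prior)
    # Average gain across every student, regardless of teacher.
    overall = mean(g for gains in gains_by_teacher.values() for g in gains)
    return {t: mean(g) - overall for t, g in gains_by_teacher.items()}


estimates = value_added(records)
for teacher, estimate in sorted(estimates.items()):
    print(f"{teacher}: {estimate:+.1f}")
```

In this invented data, Teacher B’s students post the higher raw scores, but Teacher A’s students gained more over the year, so Teacher A comes out ahead on the value-added measure. That is the sense in which the approach tries to take the luck of the draw out of the comparison.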

4. Teachers don’t have that much impact. One somewhat curious criticism that teachers make of merit pay is that, given the stresses in students’ lives, it would be unfair to hold teachers accountable for how much their students learn.  The Cap Times quotes MTI’s John Matthews making the point: “When students walk in the door each morning, the teacher doesn’t know whether a child had two parents at home, a single parent or no parents. They don’t know if they get dental care, so they might have a toothache. They don’t know if they had dinner or breakfast, so they might be hungry. It’s hard to concentrate on whether two plus two is four if you’re wondering about where your parents were the night before.” (Richard Rothstein’s book Class and Schools, which has been enthusiastically endorsed by Matthews, develops this argument.)

Teachers who contend that an individual student’s learning is so dependent on factors outside of the school’s control as to make it unfair to base a teacher evaluation system on student performance end up, surprisingly, arguing against the significance of their work.  They are saying there is only so much we can do for our students, because their capacity to learn may be overwhelmed by other factors in their lives.

It’s an odd thing.  Proponents of better evaluation systems justify their position by pointing to the significance of teacher quality in determining how much a student learns.  Teachers resistant to the approach respond, in effect, oh, no, we’re not really so important.  There is only so much we can do.

In this case, I think the teachers doth protest too much.  It’s one thing to argue that our ability to analyze the results of standardized tests on a value-added basis hasn’t advanced to the point where we can fairly rely upon them to judge teacher effectiveness.  It’s another to throw up our hands and say some of these kids can’t learn no matter how good the teacher.

* * * *

Where do I come down on this?  In principle, I’d like to see a merit pay system for teachers.  I’d want to base it on a value-added analysis of student performance on some standardized test that is better than the WKCE. I wouldn’t want to make the analysis too refined, but it should take into account significant deviations from the mean, both good and bad. I’d want to take into account other characteristics of teacher quality as well.  And I’d want teachers to play a critical role in any group that was charged with coming up with the evaluation method.

But what I want is unlikely to make any difference.  Given the strong opposition to the very notion of merit pay, particularly among teacher unions, it’s not going to happen around here anytime soon.

5 Responses to Merit Pay Pro and Con

  1. Laura Chern says:

    The biggest problem with merit pay is that the person who determines who gets the extra pay may be biased, despite any objective process. So, if principals determine merit pay, their friends may get the raises regardless of merit. Imagine if an administrator’s spouse gets a merit raise. I understand the advantages but I think this could be a time consuming can of worms – that is why public employees unions generally are against it.

  2. Larry Winkler says:

    “[W]hat’s the single most important school-based variable in determining student achievement. Research indicates that it’s the quality of the teacher.” Yes, and no. Statistically, teacher quality might be the most important single factor, but it accounts for perhaps 12% of the variation in student achievement, leaving 88% of the causes for student achievement unaccounted for. Therefore, it’s also true that “4. Teachers don’t have that much impact.” You’re trying to use the word “important” in its several definitions simultaneously, when “important” statistically has only one meaning, and it doesn’t mean important overall. This is a pervasive problem. You run into the same problems if you use legal terminology with a person who only knows those terms in common English usage.

    “1. WKCE is a lousy test. …Any serious effort to use standardized test results … will probably have to wait until we have better standardized tests.” A good beginner’s book on testing is W. James Popham’s “The Truth About Testing,” which would help disabuse you of the notion that “better standardized tests” will get you anywhere. But, we can go back to any of our own testing experiences to illustrate an important issue: there indeed may be poor tests and good tests, but even “good” tests can simply measure different things even if they purport to measure “achievement”. A timed multiple-choice test measures different things than a multiple-choice test that is not timed, which measures different things than a fill-in-the-blank test, which measures different things than a blue-book essay test, which measures different things than an open book test, which measures different things than a homework assignment. Therefore a test only measures what it is designed to test based on its particular characteristics. “Math Achievement” on a given test does not have the same meaning as “math achievement” used in common vernacular. Same problem as described above.

    For the not-so-great reasons you indicate:
    “2. We don’t want teaching to the test. … I have never understood why this is supposed to be such a bad thing.” So, I’ll tell you. It happens in a couple of ways. 1) Since a test can measure, if at all, only a very narrow set of skills which can be tested using a multiple-choice or short answer test within a limited testing time, one can gear instruction based on just that portion of the curriculum and the skills necessary to do well on the test. 2) In high stakes testing, where the merit of a teacher is based on getting kids to some minimal level overall, then a teacher can (will?) increase their “success” by finding those students at the border of the magic level, and focusing instruction to bring them just past that border, and focusing on those just above the level, to ensure they stay above. Efforts on students too far below, who have no chance of achieving sufficiently, will be a complete waste of time, and those sufficiently above the magic level we can also ignore because they will remain above that magic level, regardless.

    “No matter where an individual student starts the year, a good teacher should help that student make at least a year’s worth of progress.” Great mantra, but is that true? Do we know what is a year’s worth of progress? Is that testable? Would the improvement be apparent on a given test? “No” on all counts. Your mental model of learning expects to see some linear growth of knowledge, and skills, when in most experiences, the better model is a step function, not a linear function. That is, there may be a step up, but much time is actually spent on the plateau, before showing a “sudden” leap to the next step up. There can even be steps down in the process of learning. One can find that an approach that served you well learning prior information, actually limits your ability to get to the next level, requiring reassessment and relearning of older material using a new skill set. I have noted this on several occasions, when some musician, actor, sports person drops out for a prolonged time to relearn their craft.

    One should simply look at the decades of efforts wasted on just these kinds of simplistic answers and realize these are diversionary tactics, nothing more.

  3. Larry Winkler says:

    Addendum: Your single most important factor language looked familiar. And I finally found its source, and it’s scary. It’s from William Sanders, the prime pusher of the TVAAS (Tennessee Value-Added Assessment System), a statistical system now incorporated into SAS modules. The VAA stuff has been a recent juggernaut for proponents, is attractive to the uninitiated (it’s both very simplistic and statistical, so it has a high AWE factor), is now a great way to make a name for yourself in some academic circles, and there is plenty of money in it for grants. It has too many markings of flim-flam sauce — another educational bubble akin to the housing bubble.

    I reviewed some of Sanders’s studies supporting this model some five years ago and was not impressed with the experimental design. In one study, because he wants to make longitudinal predictions, and keep his sample sizes up, missing data is simply added to the sample of real data by assuming his model is correct. You cannot test your model if you invent the data to support it. (And there was a lot of missing data he had to invent).

    Then we get into his definition of teacher effectiveness. It’s circular, in which he *defines* an effective teacher as one who raises the average test score between the first and last test-taking on a particular test. Excessively naive.

    Then, he assumes that test scores are interval measures instead of ordinal; that is, he assumes a one point rise on a test in the middle of the scoring range is the same as a one point increase at either end of the scale. He uses norm-referenced tests which do not have that characteristic. He used items from McGraw-Hill’s CTBS. No way!

    What about stability of his measures of teacher effectiveness? That is, if you measure teacher effectiveness using one test, and get a ranking of effectiveness, if you change the test that you use to measure effectiveness, will the effectiveness ranking remain pretty much the same? Well, no! In one study, two math tests were used, one measuring math procedures (calc proficiency), and other math problem solving. Same set of teachers were being measured on the same set of kids using two different tests, and the set of effective teachers on the procedures test was different than the set of effective teachers on the problem solving test.

    Really, so you are going to make merit and hiring and firing decisions on this flimsy basis, or think that world is coming when the tests get better? You’re being snookered if you are committed to doing good.

  4. Thanks Larry for all the good points.

    I’ll add four more.

    1. The teacher a student has the next year has been found to statistically “influence” the value added measures of learning, indicating biases in assignment not accounted for in the model.

    2. By definition, a useful standardized test doesn’t measure what students know, but rather differentiates among them. If all or a vast majority of students answer an item correctly, that item doesn’t differentiate well and is rejected. Yet another way that the promise of test data-driven policies is misleading. I’m not saying ignore or eliminate all standardized tests, just that their limits need to be front and center.

    3. The TVAAS methods are proprietary, meaning that there is little transparency and the limitations are hidden. The system touted by UW researchers is more transparent, but statements by multiple Board Members indicate that they lack a basic understanding of what it is and isn’t. That makes using it in any intelligent manner impossible. One recent example is in the post above where “variability in student abilities” is equated with value added analysis. This is something that the model used by UW and MMSD does, but it isn’t intrinsic to VAA and similar renorming can be done without VAA. I’m not picking on Ed and maybe he knows this, but each layer of statistical manipulation makes it increasingly less likely that policy makers will understand the process and results. By the way, another way to look at these sorts of manipulations is that they are based on lowering expectations.

  5. 4) What about teachers in untested subjects?
