This week, millions of students across New York will begin taking the state’s annual round of standardized tests, starting with the Common Core English Language Arts test on Tuesday. But it’s not just students’ performance that will be graded — teachers are being evaluated just as much, if not more.
While 30 states require test scores to be considered in teacher evaluations, New York stands out in just how much weight it may soon be putting on the scores. In March, the state legislature approved a hotly contested measure backed by Gov. Andrew Cuomo that takes away district control of teacher evaluations and could allow the education commissioner to use test scores to determine as much as half of a teacher's evaluation, giving test scores and observations the same weight. Legislators appear unsure of the bill's practical implications and are reexamining its language. The education department has until June 30 to finalize the evaluation system.
The bill is just the latest in a flurry of teacher evaluation policy reforms that have been implemented across the country in recent years. New York had already reformed its teacher evaluations in 2012, but after 96 percent of teachers were deemed effective last year, Cuomo is looking to rework the system again.
According to the Center on Great Teachers and Leaders at American Institutes for Research, a nonprofit behavioral and social science research organization, every state but California has developed plans for a new system to be implemented at some point between 2011 and 2019. These new models of teacher evaluation draw on state tests, district tests, classroom observations, student and parent surveys, teacher self-assessments, lesson-plan reviews, measures of professional learning, and more.
All these attempts to reform teacher evaluations are happening while research on how best to assess teacher quality is still in its earliest stages, and every shift in policy affects teachers and students. "We're in the Wright brothers stage," said Jim Hull, a senior policy analyst for the Center for Public Education. "We're getting these policies in the air, and they're not designed to be as effective as they probably could be. There's still a lot to learn."
Carol Burris, principal of South Side High School in Rockville Centre, N.Y., a fellow with the nonprofit National Education Policy Center, and an outspoken opponent of high-stakes testing, said that New York's policy will be bad for students.
“It puts the test scores in a pre-eminent position, which will have a bad effect on kids,” she said. “We’ve already seen the effects that this has — a narrowing of curriculum and teaching to the test — and that will just become an even more prevalent practice.”
In Burris’s ideal world, test scores would play at most a secondary role in teacher evaluations. “I think the role of test scores should be outside the model. Teachers should be evaluated by their principal, and if for some reason that evaluation is way off the mark, then decent test scores can be used as a form of protection to challenge that evaluation,” she said.
Hull understands her anxiety about the new policy. “There’s definitely room for concern when you talk about 50 percent of an evaluation being based on test scores,” he said. “Especially if it’s not a high-quality assessment [of the students] and doesn’t assess a broad range of knowledge.” The quality of student tests is a particularly sticky issue in New York, where the first two years of Common Core test results, which have shown outsized gaps between demographic groups, have done little to ease critics’ concerns that scoring well on the tests requires skills that are irrelevant to the subject matter being tested.
There are competing views on how best to incorporate student test scores into teacher evaluations. The approach currently regarded as the most accurate is known as the "value-added model," so called because it aims to estimate how much of a student's progress over the course of the year can be attributed to his or her teacher. It compares each student's test scores with that student's own past scores and with other students' scores. "It's not perfect, but there is no other measure as good as [the value-added model]," Hull said. "But that's not to say that other measures shouldn't be used as well. You really want to have multiple measures of student outcomes."
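For readers curious about the mechanics, here is a deliberately simplified sketch of the value-added idea. It assumes that a single prior-year score is the only predictor, fits one straight line across all students to predict this year's score, and then credits each teacher with the average amount by which his or her students beat (or fell short of) that prediction. Real value-added models add many more controls, and the student records and teacher names below are invented for illustration.

```python
from collections import defaultdict

# Hypothetical records: (student_id, teacher, prior_score, current_score).
# All names and numbers are invented for illustration only.
records = [
    ("s1", "Teacher A", 60, 68),
    ("s2", "Teacher A", 75, 82),
    ("s3", "Teacher A", 85, 90),
    ("s4", "Teacher B", 62, 63),
    ("s5", "Teacher B", 70, 72),
    ("s6", "Teacher B", 88, 87),
]


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return mean_y - slope * mean_x, slope


def value_added(records):
    """Average gap between each teacher's students' actual and predicted scores."""
    priors = [r[2] for r in records]
    currents = [r[3] for r in records]
    # Predict this year's score from last year's, using every student in the pool.
    intercept, slope = fit_line(priors, currents)
    residuals = defaultdict(list)
    for _, teacher, prior, current in records:
        predicted = intercept + slope * prior
        residuals[teacher].append(current - predicted)
    # A positive number means the teacher's students beat the prediction on average.
    return {t: sum(r) / len(r) for t, r in residuals.items()}


for teacher, score in sorted(value_added(records).items()):
    print(f"{teacher}: {score:+.2f} points versus prediction")
```

Because the prediction is built from other students' results, a teacher's rating here is relative: in this toy data, one teacher can only score above the prediction if another scores below it.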
The Measures of Effective Teaching (MET) Project, the magnum opus of teacher-evaluation studies (every expert I spoke with on this topic pointed me to it), supports the view that student growth, often measured by test scores, should be included in evaluations alongside other metrics. The project, which was funded by the Bill & Melinda Gates Foundation, was a three-year research partnership between 3,000 teacher volunteers and dozens of independent research teams, who together developed and tested multiple ways of measuring teaching in order to better identify effective teachers. It didn't recommend one specific model, given that every district has its own needs, but it did highlight the fact that "heavily weighting a single measure may incentivize teachers to focus too narrowly on a single aspect of effective teaching and neglect its other important aspects."
The project found that the evaluation model that weighted state test scores most heavily was better than the other models at predicting student gains on those same tests, but it was the least predictive of how students performed on supplemental higher-order tests (usually given at the district or school level), and it produced the least reliable results, as measured by the year-to-year consistency of teachers' scores. Here are the results, from the project's research brief:
Texas is in the process of overhauling its evaluation system, and test scores will count for at most 20 percent of a teacher's evaluation, even less than in the MET Project's Model 4. Specifically, Texas's new model will count "student growth" as 20 percent of a teacher's score. "But there are four options that districts have for measuring student growth, and just one of them ties back to test scores," said Tim Regal, the director of educator evaluation and support for the Texas Education Agency.
Texas's new model, which will replace a 17-year-old system, is in its pilot year and being implemented in 57 districts. Next year, it will expand to 200 districts before statewide implementation in the 2016-17 school year.
But rather than just changing the way different factors are weighted, the overhaul is an attempt to transform the way teacher evaluations are used and understood. "Right now, when you talk to teachers in Texas about appraisal, they have a certain mindset because it's always been used just to determine renewal and nonrenewal," Regal said, referring to the common practice of using evaluations solely to decide which teachers to let go. "The only thing that mattered was their score. A lot of our focus during the pilot year has been about how to change that mindset to be about a process of professional development and growth for all teachers."
In addition to the 20 percent of teachers’ evaluations that will be based on student growth, another 70 percent will be based on observations from principals, assistant principals and other campus leaders, and 10 percent will be based on self-assessment.
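The arithmetic behind such a weighting scheme is a simple weighted average, sketched below. The 70/20/10 weights come from the Texas model described above; the 0-to-5 rating scale, the component ratings, and the even 50/50 split (a rough stand-in for New York's possible equal weighting of test scores and observations) are hypothetical, chosen only to show how the same teacher can land in a different place depending on the weights. Note that in Texas, the "student growth" component need not come from test scores at all.

```python
# Weights described for Texas's new model.
TEXAS_WEIGHTS = {
    "observation": 0.70,
    "student_growth": 0.20,
    "self_assessment": 0.10,
}

# Hypothetical even split between observations and test-based growth.
EVEN_SPLIT = {
    "observation": 0.50,
    "student_growth": 0.50,
}


def composite_score(components, weights):
    """Weighted average of component ratings; the weights must sum to 1."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(weights[name] * components[name] for name in weights)


# Hypothetical teacher: strong observations, weak student growth (0-5 scale).
teacher = {"observation": 4.0, "student_growth": 2.0, "self_assessment": 3.0}

print(f"{composite_score(teacher, TEXAS_WEIGHTS):.2f}")  # 2.8 + 0.4 + 0.3 = 3.50
print(f"{composite_score(teacher, EVEN_SPLIT):.2f}")     # 2.0 + 1.0 = 3.00
```

The heavier the weight on a single component, the more one weak (or noisy) measure drags down the overall rating, which is the concern the MET Project raised about relying heavily on any one measure.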
According to Angela Minnici, director of the Center on Great Teachers and Leaders, a change in mindset about how teacher evaluations are used is necessary. “We need to move away from thinking about evaluation systems as only ways to dismiss underperformers,” she said. “We really should be thinking about evaluations as a source of power and investment for moving teachers to a better level of performance overall.”
While New York is putting even more emphasis on testing, most states are moving toward a system more like the one in Texas.
“States are definitely making a big step forward,” Hull said. “Over a short period of time, evaluation systems have gone basically from an exercise in bureaucracy to being given immediate feedback and information on how to improve performance. That type of feedback is how we get the best doctors, best lawyers, and it’s also how we’ll get the best teachers.”