Umpires Are Less Blind Than They Used To Be

Dusty Dellinger knows how difficult it is to be an umpire. “There’s an old saying that they expect you to be perfect from day one and get better,” the former Major League Baseball official said over the phone. As the director of Minor League Baseball Umpire Development and the Minor League Baseball Umpire Training Academy, he knows how elusive perfection can be.

Correctly calling 140 pitches flying 90-plus mph and breaking six inches or more is a near-impossible standard. And when mistakes are made, players and managers aren’t bashful. Jonathan Papelbon said D.J. Reyburn should “go back to Triple A” after a confrontation over balls and strikes. Joe Girardi complained about inconsistency. Larry Andersen did too after he retired, labeling the men behind the plate arrogant. You don’t have to look too hard for more examples.

That’s led plenty of people to wonder when robots will come for the umps’ jobs. But lost amid those blue-sky dreams is what’s happened to the way we judge the blue behind the plate. Technology has changed how we can evaluate umps. It shows that umps are getting better, that there’s a significant gap between the best and worst, and that the best umps aren’t working the biggest games.

After every game, umpires receive a report from the league office that informs them about their accuracy, their correct calls, and the ones they missed. Pitchers, hitters and fans have near-instant access to information on an umpire’s accuracy, too. The chart below shows the accuracy rates for calling balls and strikes for each ump since 2008, when MLB installed the PITCHf/x tracking system in every stadium.¹

Umps are getting better, and they’re also remarkably consistent. An ump who makes more accurate calls in one year will likely do the same the next; an ump who misses more calls in a given season will likely be as bad the next. Umpire accuracy is steadier from year to year than a player’s batting average or a pitcher’s ERA, and about as consistent as OPS (on-base plus slugging) and wins above replacement.
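One common way to put a number on that kind of consistency is a year-to-year correlation: pair each ump's accuracy in one season with his accuracy in the next and see how tightly the two track. The article does not say which measure was actually used, so treat the following as a minimal sketch. The umpire labels and accuracy values are made up for illustration and are not from the PITCHf/x data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return cov / sqrt(var_x * var_y)


def year_to_year(accuracy_by_ump, year_a, year_b):
    """Correlate accuracy in year_a with accuracy in year_b.

    Only umps with a value in both seasons are included.
    """
    pairs = [
        (seasons[year_a], seasons[year_b])
        for seasons in accuracy_by_ump.values()
        if year_a in seasons and year_b in seasons
    ]
    xs, ys = zip(*pairs)
    return pearson(xs, ys)


# Hypothetical accuracy rates (share of called pitches judged correctly).
# These are placeholders, NOT real umpire numbers.
hypothetical_accuracy = {
    "ump_a": {2013: 0.880, 2014: 0.890},
    "ump_b": {2013: 0.860, 2014: 0.865},
    "ump_c": {2013: 0.840, 2014: 0.850},
    "ump_d": {2013: 0.850, 2014: 0.840},
}

r = year_to_year(hypothetical_accuracy, 2013, 2014)
print(f"year-to-year correlation (hypothetical data): {r:.2f}")
```

A value near 1 would mean that the umps who were most accurate one year were also the most accurate the next. The same function could be run on batting average or ERA to make the comparison the paragraph above describes.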

To see how this works, look at the performance of Lance Barksdale and Tim Welke. While they both follow the league’s general trend of increased accuracy — more about that later — they have, respectively, been one of the best and one of the worst umpires over the past seven years.

The difference between Barksdale and a league-average ump is about five correct calls per game; the difference between Barksdale and the league’s worst umpire is closer to 10 calls a game. At that larger gap, that’s about one judgment call per inning that a good ump is getting right and a bad ump is getting wrong. That might not sound like much, but if once every six outs a batter gets another swing after a third strike that wasn’t or a pitcher strikes a hitter out on a pitch that’s actually a ball, you can start to see the impact.
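The arithmetic behind those per-inning and per-out figures can be checked in a few lines. This is a rough sketch and not the article's own calculation: it assumes a regulation nine-inning game with 54 outs, and it borrows the 140-pitch figure quoted earlier as the number of calls an ump makes per game. The function name `gap_summary` is made up for this illustration.

```python
CALLED_PITCHES_PER_GAME = 140  # figure quoted earlier in the article
INNINGS_PER_GAME = 9           # assumption: regulation game, no extra innings
OUTS_PER_GAME = 54             # assumption: 27 outs per team, all halves played


def gap_summary(extra_correct_calls_per_game):
    """Translate a per-game gap in correct calls into other units."""
    return {
        # how many calls per inning separate the two umps
        "calls_per_inning": extra_correct_calls_per_game / INNINGS_PER_GAME,
        # how many outs pass, on average, between those calls
        "outs_per_call": OUTS_PER_GAME / extra_correct_calls_per_game,
        # the same gap expressed in percentage points of accuracy
        "accuracy_gap_points": 100 * extra_correct_calls_per_game / CALLED_PITCHES_PER_GAME,
    }


best_vs_average = gap_summary(5)   # Barksdale vs. a league-average ump
best_vs_worst = gap_summary(10)    # Barksdale vs. the league's worst ump

for label, summary in [("best vs. average", best_vs_average),
                       ("best vs. worst", best_vs_worst)]:
    print(label)
    for key, value in summary.items():
        print(f"  {key}: {value:.1f}")
```

Under these assumptions, a 10-call gap works out to a little more than one call per inning and one call every five to six outs, which lines up with the "once every six outs" framing.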

Given their differences, umps develop reputations. Near the end of infielder Mark DeRosa’s 16-year career, he knew what to expect from the umpire calling balls and strikes. “You gain knowledge over the course of being in the big leagues for the course of a couple of seasons,” he said. “You understand which umpires are a little bit wider in their zone, who are a little bit more north-south, who’s going to force the pitcher to come tight.”

Before games, he and his teammates would even talk about what they could expect during the game: “A comment would be passed back and forth, whether we should be pulling the trigger tonight or ‘this guy is normally a hitter’s umpire and likes to force the pitcher to come back over the plate, so let’s be a little bit more picky with what you’re going to swing at.’ ”

An umpire who understands what calls he is missing is an ump who can improve. “It was amazing how my perspective of the strike zone changed when I got this technology,” Dellinger said. “I thought pitches were on the plate, until you get that data back. You see that some of those pitches were not on the plate. It wasn’t something that was done intentionally. It was just your perception of the strike zone. I was able to quickly make adjustments based on having that information, which was huge to me.”

Seeing the data, however, can make fans less charitable. “They see a pitch that is out of the box, and they think, ‘Aw, he’s a bad umpire,’ ” Dellinger said. “I’m thinking, ‘You should have seen it 15 or 20 years ago.’ ”

He’s right: ump accuracy has improved since 2008. But the improvement has come on only one type of pitch: strikes.

While umps call balls no differently than they did seven years ago, they’re accurately gauging strikes at much higher rates. The change is so large that Brian Mills, a professor of tourism, recreation and sports management at the University of Florida, cites the increasing size of the strike zone as accounting for about half of the league’s 50-point drop in OPS since 2008.

In other words, steroid testing isn’t the only change responsible for MLB’s drop in offensive output. It’s also more called strikes.

While the league and the umpires association have access to data showing that specific umps tend to be better at calling balls and strikes, it does not appear that they use this information to reward those who are the most accurate with choice assignments, like the All-Star Game or the postseason.²

According to Peter Woodfork, senior vice president of baseball operations, balls and strikes play a role, but don’t write Lance Barksdale’s name into your World Series scorecard just yet. “Once you meet a standard, you’re in the mix,” Woodfork said, likening the selection process to that of the NCAA tournament. Assignments are doled out using a mix of analytics and judgment: “Balls and strikes is taken into account along with field work, rules, instant replay and handling situations. Professionalism also factors into grading umpires. The plate work may carry more weight in the evaluation, but they are all important.”

If plate work is important, it hasn’t shown in playoff assignments. According to numbers from BaseballSavant.com, umps who were No. 70, 71 and 76 in the accuracy rankings (out of 79) called balls and strikes in the ALCS last year, with only one of the top 10 umps receiving a league championship series or World Series spot. And this more exhaustive look at umps also finds that postseason spots do not appear to be linked to regular-season performance.

“Like any other profession, you can go up and go down, but the consistency over time often helps,” Woodfork said. “We don’t ignore what you’ve done in the past, but that year carries the most weight.” If that’s true, expect our old friend Barksdale to receive a high-profile opportunity, as his 90 percent accuracy rate through July 1 is far and away the best single-season number in our data.

But while decisions on postseason spots won’t come for several weeks, MLB has already had one opportunity to reward an umpire for past performance, getting to pick a home plate umpire for July’s All-Star Game. It chose Tim Welke, the same Tim Welke who has consistently had one of the league’s worst rankings since 2008.

Footnotes

¹ The data was collected from BaseballSavant.com. Umps in the data set saw at least 3,000 pitches (called balls or strikes) in each season, with a smaller restriction (1,800 pitches) for 2015.
² MLB declined to make specific umpires available for interviews but did let Peter Woodfork, senior vice president of baseball operations, and Randy Marsh, director of major league umpires, talk.

FiveThirtyEight
