Baseball Managers Are Getting Smarter About Handling Their Bullpens

Even in Major League Baseball’s enlightened sabermetric age, you’re liable to see at least one instance of obvious bullpen mismanagement on any given night. Such blunders usually come in the form of a manager hewing too closely to traditional inning-based roles — “saving” bullpen aces for a meaningless ninth frame when the game is on the line earlier, or bringing in a closer to hold a three-run lead in the ninth when a rookie would do just fine. These tactical errors can be frustrating to watch, as tenuous leads become deficits while the team’s best relievers are sitting in the bullpen waiting for “their” inning.

Logic and hard-won experience have shown that a team’s best relievers should pitch during the most important junctures of a game, regardless of when those moments occur. It’s a philosophy that saber-savvy analysts inside and outside the game have been pushing for some time now. And despite their occasional lapses in judgment, it appears that managers across the league, in addition to fielding some of the best bullpens ever in an absolute sense, are getting much better at deploying their relievers according to skill, regardless of age or experience.

To measure this, we ranked the relievers on each team¹ in every full season since 1988, from best to worst in each of three statistics: ERA, fielding-independent pitching (FIP), and deserved run average (DRA), which is Baseball Prospectus’s homegrown attempt to create a single, context-neutral statistic for evaluating pitcher performance.² We then ranked those same pitchers by the average leverage index — essentially, the importance (and pressure) of the moment — at the point when they first entered the game.³ Finally, we checked how well each team’s ranking of relievers by leverage index matched its rankings by ERA, FIP and DRA, and averaged those correlations across all 30 teams.⁴
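For readers who want to see the mechanics, here is a minimal sketch of the comparison for a single team. It assumes one common simplification of a weighted Spearman correlation: rank each statistic, then take an innings-weighted Pearson correlation of the ranks. The function names and the five-man bullpen below are made up for illustration and are not our actual code or data.

```python
from math import sqrt

def average_ranks(values):
    """Rank values from smallest (1) to largest; ties share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across any run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def weighted_pearson(x, y, w):
    """Pearson correlation of x and y, with each observation weighted by w."""
    total = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / total
    my = sum(wi * yi for wi, yi in zip(w, y)) / total
    cov = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    vx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    vy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y))
    return cov / sqrt(vx * vy)

def weighted_spearman(x, y, w):
    """Assumed simplification: weighted Pearson correlation of the ranks."""
    return weighted_pearson(average_ranks(x), average_ranks(y), w)

# Hypothetical bullpen (invented numbers): the better the ERA,
# the higher the leverage index at entry.
era = [1.90, 2.75, 3.40, 4.10, 5.20]
leverage = [1.85, 1.40, 1.10, 0.90, 0.60]
innings = [65.0, 70.0, 55.0, 48.0, 30.0]

rho = weighted_spearman(era, leverage, innings)
print(round(rho, 3))
```

In this invented bullpen the two rankings are exact mirror images, so the correlation comes out at the negative extreme. A real team would land somewhere in between, and the league-wide figure is the average of those team-level values.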

If managers have indeed gotten smarter about using their best relievers in the highest-leverage situations, we’d expect to see an increasingly negative correlation between each of the three statistics (for which lower is better) and leverage index (for which higher numbers represent more important moments) over time.⁵ And indeed, that’s (sort of) what we see for ERA and FIP — the correlation becomes increasingly negative as we move rightward along the graph. But the change here isn’t dramatic, nor is it statistically significant.

Far more interesting, though, is what’s been happening with DRA. As it turns out, over the last 28 years, DRA has become dramatically more correlated with average leverage index than ERA and FIP have, and at statistically significant levels.⁶ Outside of 1998, which was a clear outlier season for good bullpen usage, 2015 had the best correlation in our dataset, and 2014 had the second-best.⁷
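The trend test itself is just a straight line fit through the yearly correlations. The sketch below shows one way to compute the slope and its t-statistic with ordinary least squares. The six yearly values are placeholders invented for illustration, not our results, and `slope_and_t` is a hypothetical helper name.

```python
from math import sqrt

def slope_and_t(years, values):
    """Ordinary least squares fit of values on years.

    Returns the slope and its t-statistic (slope / standard error).
    Assumes at least three points and a fit that is not perfect.
    """
    n = len(years)
    mx = sum(years) / n
    my = sum(values) / n
    sxx = sum((x - mx) ** 2 for x in years)
    sxy = sum((x - mx) * (y - my) for x, y in zip(years, values))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Residual sum of squares around the fitted line.
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(years, values))
    std_err = sqrt(sse / (n - 2) / sxx)
    return slope, slope / std_err

# Placeholder yearly correlations (invented, for illustration only).
years = [2010, 2011, 2012, 2013, 2014, 2015]
correlations = [-0.10, -0.14, -0.13, -0.19, -0.22, -0.24]

slope, t_stat = slope_and_t(years, correlations)
print(slope, t_stat)
```

A negative slope means the correlation is getting more negative over time, which is the direction we would expect if managers are improving. Turning the t-statistic into a p-value requires a t distribution with n - 2 degrees of freedom, which a statistics library can supply.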

Managers, in short, are getting better at this whole bullpen-management thing. Way better.

And here’s the wacky part: DRA is by far the most sophisticated of the three metrics under consideration, and it wasn’t even invented until 2015. Analysts in the public sphere have been using ERA — and, later, FIP — to justify complaints about reliever usage for a while, but some of that supposed mismanagement is probably just noise. Cut it away, as DRA tries to do by including sophisticated variables for catcher and umpire quality,⁸ and it becomes apparent that managers have been getting a lot smarter about how they deploy their relievers for quite some time now. The analytics community just didn’t have the tools to measure their improvement until recently.

Of course, it hasn’t been all sunshine and roses for managers. The old canard about experienced relievers⁹ “knowing how to close out games” still led to some poor choices until fairly recently. For example, our analysis found that, until about 2000, older relievers were consistently given more than their fair share of high-leverage opportunities. That effect has basically disappeared since 2011, however, suggesting that managers nowadays are valuing performance more heavily than veteran intangibles.

We also found that managers were indefensibly fond of giving more save opportunities to relievers who had put up gaudy save totals in the past, regardless of their underlying performance. In 1997, for example, “proven closers” (whom we defined as relievers with 20 or more saves in the prior year¹⁰) with below-average DRAs were still handed an average of 33 save opportunities during these lousy seasons (of those 33, six were blown).¹¹ But that tic has also started to disappear in recent seasons. This year, subpar proven closers are down to an average of 13 save chances, which is on pace to be one of the lowest averages in our dataset even if it goes up in the last month of the season.
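The “proven closer” filter is easy to express in code. The sketch below assumes that “below-average DRA” means a DRA higher than the league mark (since lower is better) and uses a hypothetical `RelieverSeason` record with invented players and numbers.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class RelieverSeason:
    name: str
    prior_year_saves: int
    dra: float
    save_opportunities: int

def mean_chances_for_subpar_proven_closers(seasons, league_dra, min_prior_saves=20):
    """Average save opportunities handed to proven closers with a worse-than-league DRA.

    Returns None when no reliever meets both conditions.
    """
    chances = [
        s.save_opportunities
        for s in seasons
        if s.prior_year_saves >= min_prior_saves and s.dra > league_dra
    ]
    return mean(chances) if chances else None

# Invented records, for illustration only.
seasons = [
    RelieverSeason("Closer A", 35, 5.10, 30),   # proven, subpar: counted
    RelieverSeason("Closer B", 28, 3.00, 40),   # proven, but good: skipped
    RelieverSeason("Setup C", 5, 5.50, 4),      # subpar, not proven: skipped
    RelieverSeason("Closer D", 22, 4.90, 20),   # proven, subpar: counted
]

average = mean_chances_for_subpar_proven_closers(seasons, league_dra=4.50)
print(average)
```

Changing `min_prior_saves` is how you would check other cutoff points for what counts as a proven closer.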

And it’s not just that save opportunities are being shifted away from proven closers. Back in 1988, proven closers entered games with an average leverage index of about 1.90, which is quite high; the typical game has just a few such moments. This year, that figure is down to about 1.5, meaning the leverage burden has become significantly more spread out among all the relievers on an average staff (and reliever usage has become somewhat more optimal in the bargain).

Of course, there are quite a few caveats attached to this general trend of smarter bullpen management. The first has to do with reliever specialization. Today’s widespread use of 13-man pitching staffs has allowed some clubs to carry pitchers whose only role is to come in for one or two outs in some very particular situation, almost regardless of leverage. Our analysis doesn’t account for that, so it unfairly penalizes managers for using these specialists on occasion.

Another caveat relates to fatigue. Since even the hardiest of relievers needs a day off here and there — say, after pitching on three consecutive days — the best reliever available for any given game might not be the best reliever on the roster. We admittedly don’t have the data to account for this factor, but we think that adjusting for availability would affect only the magnitude, and not the direction, of the trend we observed.

So although head-scratching moments still abound, the evidence strongly suggests that big-league managers — so often pilloried as strategic dinosaurs by the sabermetric avant garde — had intuited the principles underlying today’s most advanced statistics long before those stats had even been invented and began adjusting their reliever usage accordingly. So the next time you see a manager make a headache-inducing bullpen decision, take some solace in the fact that these tactical errors are much less frequent than they used to be. And know also that maybe it’s not an error at all; maybe the manager is basing his decision on the next advanced pitching metric — the one that the public won’t find out about for another decade.

Footnotes

¹ Excluding relievers who had switched teams at midseason.
² In the interest of full disclosure, please note that Rian is a writer and editor at Baseball Prospectus, and that Rob participated in the design of DRA.
³ We pulled leverage index data from Fangraphs.
⁴ For the statistically initiated, we used a weighted Spearman correlation, with the weights determined by innings pitched.
⁵ In other words, we should expect to see relievers with low ERAs enter games with high leverage indices more often, and to see those same pitchers enter games with low leverage indices less often.
⁶ For those curious, a simple linear model predicting the correlation between DRA and leverage based on the year results in a p-value of 0.0008.
⁷ We didn’t include 2016 because the season is not yet complete.
⁸ And the weather!
⁹ As measured by service time and saves.
¹⁰ At other cutoff points — one, five, 10, 15 — the trend is the same.
¹¹ And, yes, there were fewer bad “proven closers” who got save opportunities overall as well.

FiveThirtyEight
