Baseball’s Savviest (And Crappiest!) Bullpen Managers

As September draws to a close with multiple teams still locked in tight playoff races, baseball fans across the country have ample reason to pore over every last detail of their managers’ decisions. And when it comes to bullpen management, they have a great deal to scrutinize. Mistakes in this arena — which, by definition, almost always occur late in games — usually come in the form of either saving an ace reliever for “his inning” even as the game slips away at an earlier stage, or, conversely, wasting top relievers by deploying them in unimportant situations.

Earlier this month, we showed that major league managers have gotten better at avoiding these types of errors over the past three decades. Instead of handing big moments to subpar relievers based on tired notions of seniority,¹ managers are increasingly entrusting important responsibilities to the best relievers available. But not every manager is equally adept at doing this. Grading individual skippers on their ability to consistently deploy their best relievers in the biggest moments, we find that bullpen management is a repeatable skill that can be fairly assigned to individual managers, and that the gap between good and bad bullpen management is worth something on the order of one win per season.

Here’s how it worked. First, we ranked the relievers on each team² in every full season since 2000³ from best to worst in deserved run average (DRA), which is Baseball Prospectus’s context-neutral metric for evaluating pitcher performance.⁴ We then ranked those same pitchers by the average leverage index — essentially, the importance (and pressure) of the moment — at the point when they first entered the game.⁵ Finally, we checked how well each team’s ranking of relievers by leverage index matched its ranking by DRA, a correlation⁶ we’re calling a team’s reliever management (RM) score. Effective bullpen managers use their best relievers (those with the lowest DRAs) in the most important moments (those with the highest leverage index), which pushes the RM score toward an ideal of -1.
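To make the RM score concrete, here is a minimal Python sketch. It assumes the weighted Spearman correlation is computed as an innings-weighted Pearson correlation of ordinary ranks. That is one of several possible definitions and may not match the exact calculation behind the numbers in this article. The function names and the five-reliever bullpen are hypothetical.

```python
from math import sqrt

def average_ranks(values):
    """Rank values ascending (1 = smallest); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across any run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def weighted_pearson(x, y, w):
    """Pearson correlation of x and y with observation weights w."""
    total = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / total
    my = sum(wi * yi for wi, yi in zip(w, y)) / total
    cov = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    vx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    vy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y))
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined when one ranking has no variation")
    return cov / sqrt(vx * vy)

def rm_score(dra, leverage, innings):
    """Assumed RM score: innings-weighted correlation of DRA ranks and leverage ranks."""
    return weighted_pearson(average_ranks(dra), average_ranks(leverage), innings)

# Hypothetical bullpen in which the lowest-DRA reliever always gets the highest leverage.
dra = [2.10, 3.00, 3.80, 4.50, 5.20]
leverage = [1.9, 1.5, 1.1, 0.8, 0.5]
innings = [65, 70, 55, 60, 40]
print(rm_score(dra, leverage, innings))  # approximately -1, the ideal
```

Reversing the leverage list, so that the worst relievers get the biggest moments, pushes the same score toward +1.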

In our last article, we refrained from assessing the reliever usage of individual skippers because we weren’t sure yet whether what we were grading was attributable to the manager’s ability or whether it was just a function of the bullpen he had at his disposal in any given year. So we decided to test that relationship out. If reliever management is indeed a skill, we’d expect to see the same group of skippers be good at it — or bad at it — year after year. You don’t wake up one morning and forget how to drive a car, but sometimes you do hit every red light on your commute to work, or, in this case, get handed a bad batch of relievers.

After calculating each team’s RM score, we assigned it to its manager of record that season (the one who managed the most games). Then we looked at whether individual managers’ RM scores were correlated with each other from year to year. Although the effect we found was rather weak (only about 10 percent of the variation in RM score year-over-year is likely attributable to managerial choices), it was statistically significant, even two years out.⁷ So it’s reasonable to assign at least some credit (or blame) for a team’s RM score to the man in the dugout.
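The year-to-year check can be sketched in a few lines of Python. This is only an illustration: the manager names and RM scores below are made up, and the helper names are hypothetical. Each manager's score in one season is paired with his score a set number of seasons later, and the pairs are then correlated.

```python
from math import sqrt

def consecutive_pairs(scores_by_manager, lag=1):
    """Pair each manager's RM score in one season with his score `lag` seasons later."""
    pairs = []
    for seasons in scores_by_manager.values():
        for year, score in seasons.items():
            if year + lag in seasons:
                pairs.append((score, seasons[year + lag]))
    return pairs

def pearson_r(pairs):
    """Plain Pearson correlation over a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    var_x = sum((x - mean_x) ** 2 for x, _ in pairs)
    var_y = sum((y - mean_y) ** 2 for _, y in pairs)
    return cov / sqrt(var_x * var_y)

# Made-up RM scores keyed by manager, then by season.
rm_scores = {
    "Manager A": {2010: -0.40, 2011: -0.35, 2012: -0.45},
    "Manager B": {2010: 0.10, 2011: 0.05, 2013: 0.00},
    "Manager C": {2011: -0.20, 2012: -0.10},
}

one_year = consecutive_pairs(rm_scores, lag=1)
two_year = consecutive_pairs(rm_scores, lag=2)
print(len(one_year), round(pearson_r(one_year), 2))
print(len(two_year))
```

A positive correlation across many such pairs is what you would expect if bullpen management were a persistent skill rather than luck.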

Still, there’s so much variation in team RM scores from year to year that we needed to use a more sophisticated statistical model to estimate each skipper’s overall bullpen-management ability.⁸ When we applied our chosen model to each manager’s raw RM scores for each season, we ended up with an aggregate measure of how likely any given manager was to optimally match his relievers to appropriate situations: good relievers to tense moments, worse relievers to calmer ones.
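The intuition behind that kind of model can be shown with a much simpler stand-in. The sketch below is not the model used here: it drops the year term and treats the variance components as known instead of estimating them. Under those assumptions, a one-way random-intercept model pulls each manager's average toward the league mean, and pulls harder when there are fewer seasons to go on. All names and numbers are hypothetical.

```python
def shrunken_estimate(season_scores, league_mean, noise_to_skill_ratio):
    """Partial-pooling estimate for one manager under a one-way random-intercept model.

    noise_to_skill_ratio is (season-to-season noise variance) divided by
    (between-manager variance). It is treated as known here, which is an assumption.
    """
    n = len(season_scores)
    manager_mean = sum(season_scores) / n
    weight = n / (n + noise_to_skill_ratio)  # approaches 1 as seasons accumulate
    return league_mean + weight * (manager_mean - league_mean)

LEAGUE_MEAN = -0.30   # hypothetical league-average RM score
RATIO = 9.0           # hypothetical: noise variance is nine times the skill variance

two_seasons = shrunken_estimate([-0.60, -0.60], LEAGUE_MEAN, RATIO)
nine_seasons = shrunken_estimate([-0.60] * 9, LEAGUE_MEAN, RATIO)
print(round(two_seasons, 3), round(nine_seasons, 3))
```

Both hypothetical managers posted the same raw scores, but the one with nine seasons keeps more of his edge, because a long track record is harder to explain away as noise.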

We’re calling the resulting metric weighted reliever management plus (wRM+), and in the style of other “plus” statistics, it’s been rescaled for ease of interpretation: 100 is average, each point above 100 means a manager is 1 percent better than average, and each point below 100 means he is 1 percent worse. For example, Joe Torre grades out as the best bullpen manager since 2000 with a score of 113, meaning his bullpen management was 13 percent better than average. Here’s the rest of the top 10:⁹

The best bullpen managers since 2000
MANAGER	WRM+
Joe Torre	113
Ozzie Guillen	111
Joe Girardi	111
Bruce Bochy	108
Jim Tracy	108
Bob Geren	108
Fredi Gonzalez	108
Bud Black	106
Buddy Bell	105
Eric Wedge	105
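The article doesn’t spell out the exact rescaling behind wRM+, so the following Python sketch is an assumption: it takes the ratio of a manager’s estimated RM score to the league average and multiplies by 100. Because good RM scores are negative, this only makes sense when the league average is itself negative. The function name and the input values are hypothetical.

```python
def plus_stat(manager_estimate, league_average):
    """Assumed 'plus' rescaling: 100 is league average, higher is better.

    Both inputs are RM-style scores, where more negative means better management.
    """
    if league_average >= 0:
        raise ValueError("this ratio-based rescaling assumes a negative league average")
    return 100.0 * manager_estimate / league_average

# Hypothetical inputs chosen to land on scores like those in the tables.
print(round(plus_stat(-0.339, -0.30)))  # a manager 13 percent better than average
print(round(plus_stat(-0.261, -0.30)))  # a manager 13 percent worse than average
```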

At a glance, this leaderboard passes the sniff test. Aside from interlopers such as erstwhile Braves skipper Fredi Gonzalez and former A’s manager Bob Geren, it’s a list of eight well-respected tacticians. Moreover, the first five men listed have all won Manager of the Year awards, as have seven of the top 10. While it is famously difficult to predict who will win that honor, which suggests the award might not be the most robust measure of managerial quality, it’s still good to know that our new metric isn’t coming completely out of left field. And the bottom-10 list also makes sense, as it could pass as a meeting of the Crusty Old Curmudgeons Society:

The worst bullpen managers since 2000
MANAGER	WRM+
Manny Acta	87
Clint Hurdle	90
Jerry Narron	90
Dusty Baker	91
John Gibbons	91
Tony La Russa	92
Jim Leyland	92
Bobby Valentine	93
Bob Melvin	93
Ron Washington	94

Several of these guys are on the record as advocating innings-based roles, which are the bane of optimal relief management. But even the worst bullpen managers can change their philosophies over time. Hurdle went from having one of the worst RM scores in the league in Colorado to having one of the best in Pittsburgh. His overall ranking is more of a testament to his earlier difficulties than to his current acumen, and to the influence that front offices can have on managerial decision-making.

So, now that we have a means of grading individual managers on reliever usage, how much is that actually worth in terms of wins and losses? To answer that, we looked at how many fewer runs were allowed — which in turn points to how many extra games were won — by good bullpen managers versus bad ones, sketching out a rough estimate of how many additional wins a manager’s bullpen smarts have been worth to his team.¹⁰

Perhaps surprisingly, we found that bullpen management — good or bad — doesn’t actually affect a team’s overall performance all that much. Certainly it’s not as important as, say, having good relievers to employ in the first place. A manager who’s bad at managing a bullpen (for example, Manny Acta) might be expected to win about 0.5 fewer games per season as a result of his bullpen-management problems than an average manager with the same ’pen, while a good one (such as Joe Girardi) might win 0.5 games more than average over the course of a season. The total effect of this skill has a range of perhaps one win per year.
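For a sense of scale, the arithmetic can be written out in Python using the common rule of thumb, also used in this analysis, that a win is worth about 10 runs. The half-win figures come from the paragraph above; the function and variable names are hypothetical.

```python
RUNS_PER_WIN = 10.0  # rule of thumb: roughly 10 runs per win

def wins_from_runs(runs_saved, runs_per_win=RUNS_PER_WIN):
    """Convert runs saved (or cost, if negative) into wins."""
    return runs_saved / runs_per_win

def runs_from_wins(wins, runs_per_win=RUNS_PER_WIN):
    """Convert wins back into the run total they imply."""
    return wins * runs_per_win

good_manager_wins = 0.5    # wins above average, per the paragraph above
bad_manager_wins = -0.5    # wins below average, per the paragraph above

spread_wins = good_manager_wins - bad_manager_wins
print(spread_wins, runs_from_wins(spread_wins))
```

By this conversion, the full gap between a good and a bad bullpen manager is about one win, or roughly 10 runs, over a whole season.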

In other words, bullpen management isn’t the be-all and end-all of managerial skills. That fits with what we already knew about managers: How they shape the chemistry and morale of the team tends to be vastly more important than their on-field tactical machinations, no matter how high-profile those machinations might be. And since every team is getting better and better at using its bullpen, the range of this skill is likely to shrink even further. More to the point, the single biggest determinant of team success, now and forever, remains the same: player quality.

The usual caveats, discussed in greater detail in our earlier article, still apply.¹¹ What’s more, for now we’re laying all the responsibility for the bullpen at the feet of the manager, when the front office and pitching coach probably also play a role. And our particular ranking method doesn’t account for fluctuations in reliever performance throughout the season — a guy who’s good in the first half but terrible in the second will be viewed as the average of the two — or for bullpens where the range of talent available to the manager is not wide (which makes for less obvious choices).

Still, we can say this with some certainty: Effective bullpen management is a skill attributable at least in part to managers, and is not just the result of random variation. Moreover, some managers are far better at handling their bullpens than others, probably to the tune of a win or so at the margins every year (which is not nothing — consider the tight wild-card races in both leagues this year). So your deepest suspicions about bullpen usage were always correct — unless you’re a Braves fan, in which case it’s probably worth sending a note of apology to Fredi Gonzalez.

CORRECTION (Sept. 21, 4:35 p.m.): A table in an earlier version of this article incorrectly listed Dale Sveum among the worst bullpen managers since 2000. Sveum shouldn’t have qualified for the list because he managed fewer than five MLB seasons during that period.

Footnotes

¹ Such as age, career saves or years of big-league experience.
² Excluding relievers who switched teams midseason.
³ Moving the cutoff up from 1988 to focus on managers who are still relatively fresh in the collective memory.
⁴ The stat accounts for, among other things, weather, team defense and umpire performance.
⁵ We pulled leverage index data from FanGraphs.
⁶ Specifically, a Spearman correlation weighted by innings pitched.
⁷ We found p-values under 0.02 in both our year-over-year and two-year correlations.
⁸ Specifically, we used a Gaussian random-effects model with terms for the manager and the year, since we previously determined that managers as a group have been getting better at bullpen management over time. A random-effects model, in contrast to a fixed-effects model, assumes a great deal of statistical noise around an uncertain mean, then strips that noise away to estimate, as accurately as possible, a “true talent” level over time.
⁹ To be especially sure we were isolating managerial skill, we limited the table to skippers who have managed at least five seasons since 2000.
¹⁰ For every team in our sample, we looked at how many runs the team gave up and how much the team over- or underperformed its run differential. We found that raw RM scores were significantly, if weakly, correlated with both runs allowed (with an r of 0.11 and a p-value of 0.015) and whether a team over- or underperformed its run differential (an r of 0.12 and a p-value of 0.02). We used linear regressions relating RM scores to these two numbers, along with the fact that each win is equivalent to about 10 runs, to derive a total run value for bullpen management.
¹¹ Notably, our simple correlation-based metric doesn’t take into account matchups, reliever fatigue or bullpens changing over the course of a year.

FiveThirtyEight
