FiveThirtyEight

Hall of Fame relief pitcher Richard “Goose” Gossage isn’t the biggest fan of the “Moneyball” revolution. Here at FiveThirtyEight, we don’t think his expletive-laced tirades about nerds ruining baseball have always found their target the way his fastballs once did. But on one point, he’s absolutely right: The save is a stupid [bleep]ing statistic.

Gossage recently lashed out against modern closers — including all-time saves leader Mariano Rivera — arguing that they aren’t used in the right situations and that cheaply earned saves exaggerate closers’ value compared to the pitchers of his day. “I would like to see these guys come into more jams, into tighter situations and finish the game. … In the seventh, eighth or ninth innings. I don’t think they’re utilizing these guys to the maximum efficiency and benefit to your ballclub,” Gossage said. “This is not a knock against Mo [Rivera],” he continued later. “[But] I’d like to know how many of Mo’s saves are of one inning with a three-run lead. If everybody in that [bleep]ing bullpen can’t save a three-run lead for one inning, they shouldn’t even be in the big leagues.”

Gossage is right about pretty much all of that. A pitcher probably shouldn’t get much credit for handling just the final inning when his team has a three-run lead. Moreover, the top relief pitchers today are less valuable than they were in Gossage’s heyday in the 1970s and ’80s. In large part, that’s because managers are trying to maximize the number of saves for their closer, as opposed to the number of wins for their team. They’re managing to a stat and playing worse baseball as a result.

But there’s a solution. Building on the work of Baseball Prospectus’s Russell Carleton, I’ve designed a statistic and named it the goose egg to honor (or troll) Gossage. The basic idea — aside from some additional provisions designed to handle inherited runners, which we’ll detail later — is that a pitcher gets a goose egg for a clutch, scoreless relief inning. Specifically, he gets credit for throwing a scoreless inning when it’s the seventh inning or later and the game is tied or his team leads by no more than two runs. A pitcher can get more than one goose egg in a game, so pitching three clutch scoreless innings counts three times as much as one inning does.

The goose egg properly rewards the contributions made by Gossage and other “firemen” of his era, who regularly threw two or three innings at a time, often came into the game with runners on base, and routinely pitched in tie games and not just in save situations. I’ve calculated goose eggs for all seasons since 1930 — plus select seasons since 1921 — based on play-by-play data from Retrosheet. While Gossage ranks only 23rd in major league history with 310 saves, he’s the lifetime leader in goose eggs (677) — ahead of Rivera and every other modern closer.

Note: Plus select seasons since 1921. Source: Retrosheet.

If managers want to squeeze every ounce of potential and talent out of their top relievers — maybe even doubling their value — it’s time to give up the save and embrace the goose.

 

Bullpens are still built around the save

While I come to bury the save, let me first sing some of its praises. The statistic, invented by the sportswriter Jerome Holtzman and officially adopted by Major League Baseball in 1969, came into the world with noble intentions. Relief pitchers were becoming more commonplace — the share of starts that ended in complete games would decline from 40 percent in 1950 to 22 percent in 1970. But these pitchers’ contributions were largely unheralded by fans, Holtzman correctly noted, because they rarely earned wins or losses and ERA did not reveal much about which relievers had been used in clutch situations.

Furthermore, some of the intuitions behind the save rule are correct. Modern statistics such as leverage index find that late-inning situations when a team holds a narrow lead are indeed quite important. For instance, an at-bat in the ninth inning when the pitcher’s team leads by one run has a leverage index of 3.3. That means it has more than three times as much impact on the game’s outcome as an average at-bat.

The problem is that there’s a fuzzy relationship between the most valuable relief situations and the ones that the save rewards. Take a look at the following chart, which shows the leverage index in different situations based on the inning and the game score:

Imagine that one evening, Pitcher A throws a scoreless eighth inning in a game where his team leads by one run — a situation that has a leverage index of 2.4 — before being pulled for his team’s closer. Meanwhile, in another ballgame on the other side of town, Pitcher B enters the game in the ninth inning when his team holds a three-run lead — a leverage index of just 0.9 — and gives up two runs but eventually records the final out. Pitcher A’s performance was quite valuable. Pitcher B’s was not — in fact, it was kind of crappy. But Pitcher B gets a save for his troubles whereas Pitcher A doesn’t. It doesn’t make a lot of sense.

The save has other problems, too. It doesn’t give a pitcher any additional reward for pitching multiple innings, even though two clutch innings pitched in relief are roughly twice as valuable as one. And a pitcher doesn’t get a save for pitching in a tie game, even though it’s one of the highest-leverage situations.

I know I’m not breaking much news here: Stat geeks have been complaining about the save for years. But don’t modern, post-“Moneyball” teams know better than this? Aren’t they using their best relievers in the highest-leverage situations, whether or not they yield a save? In a word: no. (In 11 words: Mostly not, except maybe for the Cleveland Indians and Andrew Miller.) The next table reflects how teams used their closers (as defined by closermonkey.com, a site that tracks bullpen usage obsessively) over the course of 2016, as measured by the number of innings the closer pitched in different situations:

The typical modern closer is really just a ninth-inning specialist. In 2016, the average closer threw 66 innings, and 56 of them came in the ninth inning. This included 11 innings in games where his team led by three runs in the ninth — a save situation, but not a high-leverage one. Conversely, it included just six innings in tie games in the ninth, which is not a save situation but is one of the highest-leverage situations you can find.

Again, this is pretty much how you’d use your bullpen if the goal was to maximize the number of saves for your closer (instead of the number of wins for your team). Managers seem so conditioned by the “only use your closer in the ninth inning with a lead” heuristic that they often use their closers in the ninth inning when their team leads by more than three runs, which is not a save situation and is even more of a waste of the closer’s supposed talent. Baseball teams have supposedly reached a state of statistical enlightenment, but their closer usage is every bit as stubborn as NFL teams’ too-frequent refusal to go for it on fourth down.

 

Defining a goose egg

If managers were thinking about goose eggs rather than saves, they’d find plenty of better ways to use their best relievers. So let’s define a goose egg, officially. Just as for the save rule, the formal definition is a bit more complicated than the quick-and-dirty version I described above. But here goes:

A relief pitcher records a goose egg for each inning in which:
  1. It’s the seventh inning or later;
  2. At the time the pitcher faces his first batter of the inning:
    1. His team leads by no more than two runs, or
    2. The score is tied, or
    3. The tying run is on base or at bat
  3. No runs (earned or unearned) are charged to the pitcher in the inning and no inherited runners score while the pitcher is in the game; and
  4. The pitcher either:
    1. Records three outs (one inning pitched), or
    2. Records at least one out, and the number of outs recorded plus the number of inherited runners totals at least three.

You’ll notice that the rules are more forgiving to pitchers who enter the game with runners on base, since these cases can have much higher leverage indexes than situations where the bases are empty. For instance, if a pitcher enters the game with two runners on and records a single out without allowing a run, he’ll earn a goose egg.

But the rule is strict about what it means by a scoreless inning. An unearned run cooks a goose egg, just as an earned run does. (The eggs are delicate.) And a pitcher doesn’t get a goose egg if a run scores while he’s in the game, even if the run was charged to another pitcher.
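The four conditions above can be sketched as a short Python check. This is a minimal illustration, not the code used for the article's calculations: the `ReliefInning` record and its field names are hypothetical, and the test for "tying run on base or at bat" (a lead no larger than the runners on base plus one) is my reading of that clause.

```python
from dataclasses import dataclass


@dataclass
class ReliefInning:
    """One reliever's work in one inning (hypothetical record layout)."""
    inning: int                        # inning number, 1-based
    lead: int                          # his team's runs minus the opponent's when he faces his first batter of the inning
    runners_on: int                    # inherited runners on base at that moment (0 to 3)
    outs_recorded: int                 # outs he recorded in the inning
    runs_charged: int = 0              # runs charged to him, earned or unearned
    inherited_runners_scored: int = 0  # inherited runners who scored while he was in the game


def is_goose_egg(r: ReliefInning) -> bool:
    # Rule 1: seventh inning or later.
    if r.inning < 7:
        return False
    # Rule 2: tied, leading by one or two, or the tying run is on base or at bat.
    # Assumption: with a lead, the tying run is on base or at bat exactly when
    # the lead is no larger than the number of runners on base plus one.
    tied_or_small_lead = 0 <= r.lead <= 2
    tying_run_threat = 1 <= r.lead <= r.runners_on + 1
    if not (tied_or_small_lead or tying_run_threat):
        return False
    # Rule 3: no runs charged to him and no inherited runners score.
    if r.runs_charged > 0 or r.inherited_runners_scored > 0:
        return False
    # Rule 4: a full inning, or at least one out with outs plus
    # inherited runners totaling at least three.
    if r.outs_recorded >= 3:
        return True
    return r.outs_recorded >= 1 and r.outs_recorded + r.runners_on >= 3


# The example from the text: two runners on, one out recorded, no runs.
print(is_goose_egg(ReliefInning(inning=8, lead=1, runners_on=2, outs_recorded=1)))
# A clean ninth with a three-run lead and the bases empty is a save but not a goose egg.
print(is_goose_egg(ReliefInning(inning=9, lead=3, runners_on=0, outs_recorded=3)))
```

Under these assumptions the first call returns `True` and the second returns `False`, which is the contrast between the two statistics that the article keeps returning to.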

Overall, these rules can yield high goose-egg totals among many types of relievers, not just closers. That’s clear when you look at the goose egg leaderboard for 2016, for example. The Indians’ Miller and the Mets’ Jeurys Familia tied for the major league lead with 42 goose eggs last year, but Familia was used as a typical modern closer (and led the majors with 51 saves) while Miller often entered the game in the seventh or eighth inning. Mets setup man Addison Reed tied for fourth in the majors with 39 goose eggs last season, meanwhile, even though he had just one save.

Note: ERA and W-L record cover relief appearances only. Sources: FanGraphs, Retrosheet.

Miller and Familia’s league-leading total would have been paltry by Gossage’s standards, however. In addition to being the lifetime leader in goose eggs, he’s also the single-season leader, having recorded 82 goose eggs (almost as many as Miller and Familia combined) in 1975, when he threw 141.2 (!) innings in relief for the Chicago White Sox.

The top firemen of Gossage’s day routinely had 60 goose eggs or more in a season, with their totals sometimes reaching into the 70s or — in the case of Gossage in 1975 and John Hiller in 1974 — the 80s.

Just one pitcher since 2000 — the Angels’ Scot Shields in 2005 — has had as many as 60 goose eggs in a season, however. These days, it’s rare for a pitcher to record even 50 goose eggs. League-leading goose-egg totals have plummeted even as saves have risen. The turning point seems to have been 1990, when Bobby Thigpen and Dennis Eckersley both beat the single-season saves record while rarely working more than one inning at a time. In the 1970s and 1980s, the average league leader in saves threw 112 innings over 69 appearances. Since 1990, by contrast, the average saves leader has also appeared in 69 games but has thrown only 71 innings.

Note: Plus select seasons since 1921. Source: Retrosheet.

 

Broken eggs and GWAR (goose wins above replacement)

Having only learned about the goose egg a few moments ago, you might still be a little suspicious of it. Sure, closers are pitching fewer innings than they used to and getting fewer goose eggs. But perhaps they’re pitching more efficiently and providing more overall value as a result? It goes without saying that pitchers like Miller and Zach Britton are really good at their jobs.

To properly value relievers, we need a companion statistic called the broken egg, which is to a goose egg as a blown save is to a save. (I wanted to call this companion stat a “blown goose,” but my editors decided that vaguely dirty jokes were the hill they wanted to die on.) We’ll define it as follows:

A relief pitcher records a broken egg for each inning in which:
  1. He could have gotten a goose egg if he’d recorded enough outs;
  2. At least one earned run is charged to the pitcher; and
  3. The pitcher does not close out the win for his team.

In other words, you get a broken egg when you could have gotten a goose egg but are charged with an earned run instead, with an exemption if you get the last out of the game. Note that this leaves some situations that result in neither goose eggs nor broken eggs, which we’ll call a “meh.” For instance, if a run scores while you’re in the game but it isn’t charged to you, that’s neither a goose egg nor a broken egg; it’s a meh. I’ll speak no more of mehs in this article because they’re pretty boring; when I use the phrase “goose opportunity,” it means a goose egg or a broken egg.
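The three-way split among goose eggs, broken eggs and mehs can be sketched as a small classifier. It is only an illustration: the function name, its parameters and the string labels are hypothetical, and it assumes the caller has already checked the goose-egg conditions.

```python
def classify_inning(eligible: bool, goose_conditions_met: bool,
                    earned_runs_charged: int, closed_out_win: bool) -> str:
    """Label one relief inning as 'goose egg', 'broken egg', 'meh' or 'none'.

    eligible: the inning and score conditions for a goose egg were met.
    goose_conditions_met: the scoreless and outs conditions were met too.
    earned_runs_charged: earned runs charged to the pitcher in the inning.
    closed_out_win: the pitcher recorded the final out of a win.
    """
    if not eligible:
        return "none"        # not a goose situation at all
    if goose_conditions_met:
        return "goose egg"
    if earned_runs_charged >= 1 and not closed_out_win:
        return "broken egg"
    return "meh"             # everything left over


# An inherited runner scores but the run is charged to someone else.
print(classify_inning(True, False, 0, False))   # meh
# An earned run is charged and the pitcher does not finish off a win.
print(classify_inning(True, False, 1, False))   # broken egg
# An earned run is charged, but he still gets the last out of a win.
print(classify_inning(True, False, 1, True))    # meh
```

One consequence of the definitions as written is that an unearned run spoils a goose egg without producing a broken egg, so that inning lands in the meh bucket as well.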

There are usually about three goose eggs for every broken egg, meaning that relievers convert about 75 percent of their goose opportunities. And unlike saves and blown saves, which are highly punitive to guys who aren’t closers, the goose system gives middle relievers a fair shake. For instance, Mark Eichhorn — a good-but-not-great middle reliever for the Blue Jays and other teams in the 1980s and ’90s — converted 76 percent of his lifetime goose opportunities, about the same rate as an average closer.
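The conversion rate is just goose eggs divided by goose opportunities. A minimal sketch follows. The function name is hypothetical, and the second example borrows Stu Miller's 1965 totals (79 goose eggs, 7 broken eggs), which the article cites further on.

```python
def conversion_rate(goose_eggs: int, broken_eggs: int) -> float:
    """Share of goose opportunities (goose eggs plus broken eggs) converted."""
    opportunities = goose_eggs + broken_eggs
    if opportunities == 0:
        raise ValueError("no goose opportunities")
    return goose_eggs / opportunities


# Three goose eggs for every broken egg.
print(conversion_rate(3, 1))             # 0.75
# Stu Miller, 1965 Orioles.
print(round(conversion_rate(79, 7), 3))  # 0.919
```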

Goose eggs and broken eggs, when taken together, also do a good job of replicating more complicated statistics. For instance, there’s a 0.78 correlation between a simple linear combination of these stats and the highly sophisticated statistic win probability added (WPA), which is arguably the best way to value relief pitchers. WPA is a lot of work to calculate, however, whereas goose eggs and broken eggs are relatively simple counting statistics that get you to mostly the same place. Saves and blown saves, on the other hand, have a much noisier relationship with WPA (a correlation of 0.50).

But if you take your statistics with an extra helping of rigor — and if you’ve read this far, you probably do — there are a few more things to consider. It’d be nice to adjust performance for a pitcher’s park and league; it was a lot easier to convert goose opportunities at Dodger Stadium in the low-offense 1960s than at Coors Field during the juiced-offense era. We’d also like to know how valuable a late-inning reliever is, which will require some notion of what the replacement level is for the goose statistic. Considering that a lot of high-performing closers — including Rivera — were once middling starters, is the job really that challenging?

To answer those questions, we need to create another new stat: goose wins above replacement (GWAR). To do that, I went back to the history books. Over time, the number of goose opportunities per game has increased (as teams pull their starting pitchers earlier) while the success rate for converting them has varied. The offense-friendly era from 1993 through 2009 was a rough one for relief pitchers, who converted a middling 73.8 percent of their goose opportunities. The best relievers from this era, such as Rivera and Trevor Hoffman, might be slightly underrated without considering this context. But since 2010, which has seen a revival of pitching, the goose-egg conversion rate has improved to 76.5 percent.

Source: Retrosheet

To determine the goose replacement level, I looked at the performance of pitchers since 1996 who made no more than 150 percent of the league’s minimum salary and who were acquired in free agency, on waivers, or through the Rule 5 draft. Essentially, these are the guys who are available to any major league team at any time for next to nothing — the literal definition of replacement-level players. But they actually weren’t too bad in goose situations. They converted 71.5 percent of their goose opportunities during this period, as compared to 74.7 percent for the league as a whole. To put that in more familiar terms, these relievers had a 3.91 ERA, weighted by their number of goose situations, as compared to a 3.64 weighted ERA for the league overall.

Therefore, a team shouldn’t be spending a lot for average relief pitching — the average relievers just aren’t that much better than the replacement-level guys. Pick up a few failed starters off the waiver wire, tell them to limit their repertoire to their two best pitches, and test them out in Triple-A or in low-leverage situations. You won’t necessarily have the next Gossage or Miller — those guys are scarcer and more valuable commodities — but you’ll probably find a couple of pretty good late-inning relievers without paying a lot to do it.
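To put a rough size on that gap, the two conversion rates can be turned into expected goose eggs over a season. This is back-of-the-envelope arithmetic, not the GWAR formula, which applies park and league adjustments and converts to wins. The function name is hypothetical, and the 45-opportunity workload is an assumption taken from the article's later estimate for a modern closer.

```python
LEAGUE_RATE = 0.747       # league-wide conversion rate quoted above
REPLACEMENT_RATE = 0.715  # replacement-level conversion rate quoted above


def goose_eggs_above_replacement(opportunities: int, rate: float,
                                 replacement_rate: float = REPLACEMENT_RATE) -> float:
    """Expected goose eggs beyond what a replacement-level reliever
    would convert in the same number of opportunities."""
    return opportunities * (rate - replacement_rate)


# An average reliever given 45 goose opportunities (assumed workload).
average_edge = goose_eggs_above_replacement(45, LEAGUE_RATE)
print(round(average_edge, 2))
```

With these inputs the average reliever comes out a bit under one and a half goose eggs ahead of the replacement-level pitcher over a full season.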

A complete formula for GWAR, which adjusts for a pitcher’s park as well as his league and converts performance in goose situations to wins, can be found in the footnotes.

The best relievers of all time, according to goose

Even with all this extra work, however, we come to basically the same conclusion that we did before: Most of the best relief seasons came a long time ago, and from pitchers who followed Gossage’s usage pattern rather than Rivera’s.

Note: Plus select seasons since 1921. Sources: Retrosheet, baseball-reference.com.

The best relief-pitching season of all time, according to this metric, belongs to Stu Miller, who had 79 goose eggs and just 7 broken eggs for the 1965 Baltimore Orioles. Miller’s traditional numbers looked pretty good that year — he went 14-7 with a 1.89 ERA and 24 saves in 119.1 innings pitched, finishing seventh in American League MVP balloting. His goose stats make it clear that he was almost unhittable in high-leverage situations, however. He contributed 7.5 wins above replacement according to GWAR, which is a Cy Young Award-caliber performance.

After Miller’s 1965 comes Gossage’s 1975, and then there’s a year from Rivera. But Rivera’s best season according to GWAR was not 2004, when he had a league-leading and career-high 53 saves, but 1996, when he was used as a setup man to John Wetteland and had just 5 saves in 107.2 innings of 2.09 ERA relief. Rivera was promoted to closer the next year, but his value declined as the Yankees held him to 71.2 innings despite the success he’d had in the fireman role.

Only two of the top 40 relief seasons have come in the past 10 years. You can be literally almost perfect — as Britton and his 0.54 ERA were last year — and yet still not provide as much value as pitchers like Gossage did because you didn’t have enough volume in high-leverage situations.

The lifetime GWAR leaderboard is somewhat more forgiving to modern closers. Rivera tops the list, with Hoffman second and Gossage third:

Note: Plus select seasons since 1921. Sources: Retrosheet, baseball-reference.com.

So perhaps you can argue that modern closer usage at least helps the best relievers to preserve their longevity, even if it almost certainly doesn’t maximize their value over the course of a given season. Then again, Rivera and Hoffman and Billy Wagner might just have been freaks; there’s been a ton of turnover in the closer ranks lately. Of the top 10 pitchers in saves in 2011, only three were still in the league in 2016, and only one (Craig Kimbrel) was still regularly working as a closer. As long as teams are burning through relief pitchers, they might as well try to get more value out of their best ones.

So how should an ace reliever be used?

Managers have a lot of room for improvement if they forget about saves and use goose eggs as a bullpen guide. A bare-bones workload for a goose-optimized closer would look something like this:

  • Pitch in all goose situations, including ties, in the ninth inning. For a typical team, that works out to about 40 or 45 innings over the course of the season.
  • Pitch in goose situations in the eighth inning when his team leads by one run exactly, with the plan of usually also pitching the ninth when the game remains in a goose situation. This will add another 15 innings or so.
  • Pitch in any goose situations in extra innings, up to a maximum of two total innings pitched for the game. Keep in mind that this will often be impossible because the closer will already have been used earlier in the game. Still, this should amount to another five or 10 innings in a typical season.

That will work out to a total of around 65 innings pitched for the season — about the same number that closers throw now — over roughly 50 appearances. But those innings would come with a super-high leverage index of about 2.5. And the pitcher would go from around 40 or 45 goose opportunities in a season to 60 or 65 instead, potentially generating nearly 50 percent more value as a result.
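The arithmetic behind those totals can be checked in a few lines. The figures come straight from the bullets and paragraph above. The variable names are hypothetical, and pairing the low estimates together and the high estimates together is my own simplification.

```python
# (low, high) innings estimates from the three bullets above
workload = {
    "ninth-inning goose situations": (40, 45),
    "eighth inning, one-run lead": (15, 15),
    "extra-inning goose situations": (5, 10),
}

total_low = sum(low for low, _ in workload.values())
total_high = sum(high for _, high in workload.values())
midpoint = (total_low + total_high) / 2

# Goose opportunities per season: typical closer now vs. goose-optimized
current_opps = (40, 45)
optimized_opps = (60, 65)
gain_low = optimized_opps[0] / current_opps[0] - 1
gain_high = optimized_opps[1] / current_opps[1] - 1

print(total_low, total_high, midpoint)
print(round(gain_low, 2), round(gain_high, 2))
```

The innings range of 60 to 70 centers on the 65 quoted above, and the gain in goose opportunities works out to between about 44 and 50 percent, which squares with "nearly 50 percent more value."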

For an older or injury-prone closer (say, the Los Angeles Angels’ Huston Street), that might be basically all the work he could handle. But there are a lot of teams that might want to replicate Miller’s success, and there are younger, fitter pitchers who could build on this minimal workload. Depending on the day, they could enter in the eighth inning in tie games, for instance. And they could come into the game with runners on, even in the seventh inning; it can be worth using your best reliever to get your team out of a jam in these cases even if you have to remove him from the game later. A pitcher picking up some of these situations might wind up throwing 85 or 90 innings, with a roughly equal number of goose opportunities, over the course of a season in which he makes 60 or 65 appearances. Those pitchers could have roughly double the value that modern closers do. It’s really not that radical a shift from how pitchers are used now.

But it doesn’t have to stop there. Modern teams have about 150 goose opportunities in a season. One day, they’ll find a guy with the right genetics and the right mentality to throw two or three innings every second or third day — someone who really could approach Gossage’s usage pattern — and when that happens, Gossage’s 82-goose-egg single-season record might come under threat. It would be a high bar to clear. But it would be an accomplishment worth chasing down, whereas a save record usually isn’t.

You can download detailed data on goose eggs and broken eggs for all pitchers since 1930 here.