How Our 2016 MLB Predictions Work

This methodology article is for an old version of our MLB forecast. See how our latest MLB predictions work.

During last year’s MLB playoffs, we introduced our baseball Elo ratings and used them to take a look at the best teams going into the playoffs and the World Series. Today we’re publishing two interactives using similar ratings: our 2016 MLB Predictions, which preview upcoming games and show the chances that each team will make the playoffs or win the World Series, and the Complete History Of MLB, which charts the successes and failures of every franchise throughout history.

Thanks to Retrosheet, we’ve collected game results and box scores back to 1871 and used them to create an Elo-based rating system and predictive model for baseball that incorporates home-field advantage, margin of victory, travel, rest and — most importantly — starting pitchers. The ratings are also adjusted for park and era effects and account for the fact that favorites are more likely to win in the postseason than in the regular season.

Elo is a simple but elegant system that can be tuned and customized endlessly to incorporate available data. In our baseball Elo system, each team has a rating (the average is about 1500), and after every game, the winning team gains some Elo points while the losing team loses the same number of points. The number of points exchanged is based on the chances our model gave each team to win the game and the margin of victory; a win by a big underdog results in a bigger exchange of Elo points than a win by a favorite, and the larger the margin of victory, the larger the exchange.
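The exchange described above can be sketched in Python using the parameters listed in the footnotes. This is a rough illustration, not the production model: the logistic win-probability curve with a 400-point scale is the conventional Elo formula and is an assumption here, as are the function names, the choice to measure the rating difference as winner minus loser, the use of the average BASE value of 3.4, and applying the postseason 4/3 multiplier only to the win probability.

```python
def win_probability(rating_diff):
    """Conventional Elo curve (assumed; the article does not spell it out)."""
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / 400.0))


def expected_margin(elo_diff, base=3.4):
    """Cubic from the footnotes; base=3.4 is the stated average of BASE."""
    return (elo_diff ** 3 * 5.46554876e-08
            + elo_diff ** 2 * 8.96073139e-06
            + elo_diff * 2.44895265e-03
            + base)


def flattened_margin(margin_adj):
    """Flattening from the footnotes; margin_adj is assumed to be already
    adjusted for era and stadium effects."""
    return ((abs(margin_adj) + 1) ** 0.7) * 1.41


def elo_shift(winner_rating, loser_rating, margin_adj, postseason=False):
    """Points the winner gains and the loser gives up after one game.

    Ratings are pre-game ratings with any home-field, travel, rest and
    pitcher adjustments already applied (assumption).
    """
    diff = winner_rating - loser_rating
    prob_diff = diff * 4.0 / 3.0 if postseason else diff
    k = 6.0 if postseason else 4.0
    surprise = 1.0 - win_probability(prob_diff)  # outcome (1) minus win prob
    return surprise * (flattened_margin(margin_adj) / expected_margin(diff)) * k


if __name__ == "__main__":
    print(elo_shift(1450, 1550, margin_adj=2))  # underdog wins
    print(elo_shift(1550, 1450, margin_adj=2))  # favorite wins, smaller shift
    print(elo_shift(1500, 1500, margin_adj=5))  # bigger margin, bigger shift
```

The cubic for the expected margin is only sensible over realistic rating gaps; for very large negative differences it would turn negative, so a production version would presumably guard against that.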

Before every game, we also adjust each team’s rating based on whether it has home-field advantage, how far it’s traveled to the game, how many days of rest it’s had and which pitcher is slated to start.

[Image: screenshot of a game preview showing each team’s pre-game rating adjustments]

Home-field advantage is worth 24 Elo points in our model, and travel and rest adjustments are worth up to about 5 points each;¹ these three components are combined into the “Home field, travel and rest” section in the image above.²
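As a minimal sketch, the three components can be combined as below, using the travel and rest formulas given in the footnotes. The function names are hypothetical. The article says travel and rest are worth "up to about 5 points each" but does not say how that ceiling is enforced or how rest days are counted, so this sketch applies the formulas as written without a cap.

```python
HOME_FIELD_POINTS = 24.0  # stated in the article


def travel_adjustment(miles_traveled):
    """Penalty that grows with the cube root of distance traveled."""
    return -(miles_traveled ** (1.0 / 3.0)) * 0.31


def rest_adjustment(days_rest):
    """Bonus per day of rest (no cap applied here; see the note above)."""
    return days_rest * 2.3


def pregame_adjustment(is_home, miles_traveled, days_rest):
    """Combined "home field, travel and rest" adjustment for one team."""
    home = HOME_FIELD_POINTS if is_home else 0.0
    return home + travel_adjustment(miles_traveled) + rest_adjustment(days_rest)


if __name__ == "__main__":
    # Home team that did not travel, one day of rest.
    print(pregame_adjustment(True, 0, 1))
    # Road team that flew 1,000 miles, one day of rest.
    print(pregame_adjustment(False, 1000, 1))
```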

Starting pitcher adjustments can have a substantial impact on pre-game team ratings and win probabilities. For example, in June of 2000, Pedro Martinez was worth about 109 Elo points to the Red Sox each time he started, which is the equivalent of a 15 percent boost to their chances of winning the game.³ This means Martinez was worth 109 points more than the average starting pitcher on his team (or even a bit more, since his starts were already incorporated into the team’s overall rating).
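To see how a rating bonus translates into win probability, here is a small worked example. It assumes the conventional Elo curve with a 400-point scale, which the article does not state explicitly but which reproduces the figure quoted above for an otherwise even matchup.

```python
def win_probability(rating_diff):
    """Conventional Elo curve with a 400-point scale (assumed)."""
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / 400.0))


even_matchup = win_probability(0)    # 0.5
with_bonus = win_probability(109)    # about 0.65
boost = with_bonus - even_matchup    # about 0.15

if __name__ == "__main__":
    print(round(boost, 3))
```

The boost is smaller when the team is already a heavy favorite, because the curve flattens as the rating gap grows.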

To generate these pitcher adjustments, we’re using a version of Bill James’s game scores proposed by Tangotiger (and slightly modified by us) to isolate pitching performances. After each game, the starting pitcher’s game score is calculated as:

47.4 + 1.5*outs + strikeouts – 2*walks – 2*hits – 3*runs – 4*homeruns
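In code, the raw formula looks like the sketch below. The function and argument names are illustrative, and the result is the unadjusted score: per the footnotes, the real scores are further normalized for era, stadium and opposing offense, which is not attempted here.

```python
def game_score(outs, strikeouts, walks, hits, runs, home_runs):
    """Raw starting-pitcher game score, before any era/park/opponent adjustment."""
    return (47.4
            + 1.5 * outs
            + strikeouts
            - 2 * walks
            - 2 * hits
            - 3 * runs
            - 4 * home_runs)


if __name__ == "__main__":
    # Hypothetical complete-game shutout: 27 outs, 10 strikeouts,
    # 1 walk, 3 hits, no runs, no home runs -> about 89.9
    print(game_score(outs=27, strikeouts=10, walks=1, hits=3, runs=0, home_runs=0))
```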

We maintain a running average of these game scores for each pitcher to produce his overall pitcher score.⁴ Here’s a list of the pitchers with the highest peak scores in history, the dates when they peaked and their corresponding Elo bonus:⁵

All-time pitcher peaks, based on pitcher score
NAME	FRANCHISE	DATE	PITCHER SCORE	RATING ADJ.
Pedro Martinez	BOS	6/8/2000	78.0	+108.6
Randy Johnson	ARI	5/16/2000	71.8	83.3
Greg Maddux	ATL	7/19/1995	71.8	67.3
Roger Clemens	TOR	7/28/1997	70.6	75.3
Dazzy Vance	LAD	5/5/1929	69.7	76.3
Curt Schilling	ARI	4/7/2002	69.1	64.3
Bob Gibson	STL	5/25/1969	68.6	76.6
Dwight Gooden	NYM	5/6/1986	68.5	74.4
Frank Tanana	LAA	6/24/1977	68.4	72.2
Bob Feller	CLE	8/4/1940	68.4	64.7
Jake Arrieta	CHC	10/7/2015	68.3	63.4
Sandy Koufax	LAD	10/14/1965	68.1	64.4
J.R. Richard	HOU	4/30/1980	67.7	67.1
Pete Alexander	PHI	7/13/1915	67.7	64.7
Johan Santana	MIN	10/5/2004	67.6	66.9
Kevin Brown	SD	10/8/1998	67.4	65.4
Tom Seaver	NYM	4/26/1972	67.3	72.8
Lefty Grove	OAK	6/13/1932	66.9	64.1
Mike Scott	HOU	5/18/1987	66.9	61.4
Ron Guidry	NYY	9/15/1978	66.9	62.4

Pitchers in the table above are listed only once, at their absolute peak. If we listed multiple peaks per player, Martinez, Greg Maddux, Randy Johnson and Roger Clemens would occupy 18 of the top 20 spots.

In addition to scores for each pitcher, we maintain a pitching score for every team — these are based on the game scores of every starting pitcher on that team. Each pitcher’s Elo adjustment is relative to his team’s pitching score; pitchers above the team average give the team a bonus when they start, and pitchers below the team average give the team a penalty. Note that in the table above, one pitcher may have a higher overall score than another pitcher but a smaller Elo adjustment; this generally means that his team had a better rotation or that he started more games and his game scores contributed more to the team’s overall average.
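A short sketch of that relationship, using the multiplier of 4.7 given in the footnotes. The function names are hypothetical, and the team pitching scores below are not published in the article: they are back-calculated from the rounded table values, so treat them as approximate.

```python
ELO_POINTS_PER_SCORE_POINT = 4.7  # multiplier from the footnotes


def pitcher_adjustment(pitcher_score, team_pitching_score):
    """Elo bonus (or penalty, if negative) when this pitcher starts."""
    return ELO_POINTS_PER_SCORE_POINT * (pitcher_score - team_pitching_score)


def implied_team_score(pitcher_score, rating_adj):
    """Back out the team pitching score implied by a published adjustment."""
    return pitcher_score - rating_adj / ELO_POINTS_PER_SCORE_POINT


if __name__ == "__main__":
    # Martinez at his peak: score 78.0, adjustment +108.6 in the table.
    team = implied_team_score(78.0, 108.6)       # roughly 54.9
    print(round(team, 1), round(pitcher_adjustment(78.0, team), 1))

    # 2016: Arrieta has the higher score but the smaller adjustment,
    # which implies a higher team pitching score for the Cubs.
    print(implied_team_score(67.8, 57.2) > implied_team_score(66.0, 63.5))
```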

These are the 2016 pitchers who give their team the biggest boost when they start:

Top 2016 pitchers by rating adjustment to their team
NAME	TEAM	SCORE	RATING ADJ.
Clayton Kershaw	LAD	66.0	+63.5
Jake Arrieta	CHC	67.8	57.2
Zack Greinke	ARI	59.4	43.1
Kenta Maeda	LAD	60.7	38.5
Jose Fernandez	MIA	57.3	36.5
Gerrit Cole	PIT	57.1	35.8
Dallas Keuchel	HOU	58.4	34.6
Chris Sale	CHW	60.0	32.3
David Price	BOS	57.9	28.6
Madison Bumgarner	SF	58.3	28.4

Because each team’s starting pitcher adjustment is added to its own rating, the adjustments largely cancel out when two top pitchers face each other and can produce a large swing when a strong pitcher faces a weak one. The biggest mismatch of all time according to our pitcher scores was a 137-point Elo swing back in 1997, when Randy Johnson (+87) faced Ricky Bones (-50).
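The arithmetic behind that swing is simple enough to sketch. The function name and the 1500 base ratings are illustrative assumptions; only the two pitcher adjustments come from the article.

```python
def matchup_rating_diff(team_a_rating, pitcher_a_adj, team_b_rating, pitcher_b_adj):
    """Rating gap between two teams once each starter's adjustment is applied."""
    return (team_a_rating + pitcher_a_adj) - (team_b_rating + pitcher_b_adj)


if __name__ == "__main__":
    # Two otherwise equal teams: Johnson (+87) against Bones (-50).
    print(matchup_rating_diff(1500, 87, 1500, -50))  # 137
    # Two aces with similar bonuses nearly cancel out.
    print(matchup_rating_diff(1500, 60, 1500, 58))   # 2
```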

Our 2016 preseason team ratings are a blend of our 2015 end-of-season ratings (reverted to the mean by one-third) and four projection systems (PECOTA, ZiPS, Steamer and Davenport). Our preseason team pitcher scores use our 2015 end-of-season pitcher scores combined with projected starts from the same four projection systems.
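The reversion step can be illustrated as follows. The league average of 1500 comes from the article; the function name is hypothetical. The weights used to blend the reverted rating with the four projection systems are not given, so that step is left out rather than guessed.

```python
LEAGUE_AVERAGE = 1500.0  # "the average is about 1500"


def revert_to_mean(end_of_season_rating, fraction=1.0 / 3.0):
    """Pull a rating part of the way back toward the league average."""
    return end_of_season_rating + fraction * (LEAGUE_AVERAGE - end_of_season_rating)


if __name__ == "__main__":
    print(revert_to_mean(1560))  # about 1540: a strong team keeps two-thirds of its edge
    print(revert_to_mean(1440))  # about 1460: a weak team recovers one-third of its deficit
```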

We use Monte Carlo simulations to play out the season thousands of times to see how often each team makes the playoffs or wins the World Series. As with our other forecasts, we run the simulations “hot,” meaning that a team’s rating changes within each simulation based on simulated results, including the bonus for playoff wins. For games where the starting pitcher is not yet known, we assume a pitcher of average strength will start.
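A toy version of a "hot" simulation is sketched below to show the idea of ratings drifting within each simulated season. Everything about it is simplified and assumed: the three-team league, the round-robin schedule, the conventional Elo curve, an update with K of 4 and no margin-of-victory term, and "finishing with the most wins" as a stand-in for making the playoffs.

```python
import random


def win_probability(rating_diff):
    """Conventional Elo curve with a 400-point scale (assumed)."""
    return 1.0 / (1.0 + 10.0 ** (-rating_diff / 400.0))


def simulate_hot(ratings, schedule, n_sims=2000, k=4.0, home_field=24.0, seed=0):
    """Return the share of simulations in which each team finishes first."""
    rng = random.Random(seed)
    first_place = {team: 0 for team in ratings}
    for _ in range(n_sims):
        sim_ratings = dict(ratings)  # every simulation starts from today's ratings
        wins = {team: 0 for team in ratings}
        for home, away in schedule:
            p_home = win_probability(sim_ratings[home] + home_field - sim_ratings[away])
            home_won = rng.random() < p_home
            # "Hot": the simulated result feeds back into the ratings
            # used for the rest of this simulated season.
            shift = k * ((1.0 if home_won else 0.0) - p_home)
            sim_ratings[home] += shift
            sim_ratings[away] -= shift
            wins[home if home_won else away] += 1
        best = max(wins.values())
        leaders = [team for team, w in wins.items() if w == best]
        first_place[rng.choice(leaders)] += 1  # break ties at random
    return {team: count / n_sims for team, count in first_place.items()}


if __name__ == "__main__":
    ratings = {"A": 1560.0, "B": 1500.0, "C": 1440.0}  # hypothetical teams
    schedule = [(h, a) for h in ratings for a in ratings if h != a] * 10
    print(simulate_hot(ratings, schedule))
```

Running simulations hot widens the spread of outcomes compared with holding ratings fixed, because a team that starts a simulated season well is treated as slightly stronger for the rest of it.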

Our Complete History Of MLB uses a slightly simplified Elo system that doesn’t take pitchers, travel or rest into account.⁶ Like the forecast, it will update after each game.

It’s early, but the Cubs are looking pretty good. We’re looking forward to the rest of the season and hope you’ll follow along with us.

Footnotes

¹ The travel adjustment is calculated as (-MILES_TRAVELED^(1/3))*0.31, and the rest adjustment is DAYS_REST*2.3.
² Here are some more of our baseball Elo parameters:
- The general K factor is 4, though it’s 6 for postseason games and also is adjusted based on margin of victory.
- Home-field advantage is worth 24 points.
- The difference in rating between two teams is multiplied by 4/3 for postseason games.
- Expected margins of victory are calculated with elo_diff^3*5.46554876e-08 + elo_diff^2*8.96073139e-06 + elo_diff*2.44895265e-03 + BASE (BASE is dependent on the year and stadium the game is played in but has an average of about 3.4).
- The actual margin of victory in each game is also adjusted for era/stadium effects and then flattened a bit with ((abs(margin_adj) + 1)^0.7)*1.41.
- Thus, the whole Elo shift is (outcome – winprob) * (adjusted_margin / expected_margin) * (6 if postseason else 4).
³ Starting pitcher adjustments give our model about a 1 percentage point improvement in the percentage of games correctly “called” and a corresponding improvement in mean squared error.
⁴ Just like the margin of victory modifier, game scores are normalized for eras and stadiums so pitchers are directly comparable. They’re also adjusted to take the opposing team’s offensive strength into account.
⁵ A pitcher’s Elo bonus is calculated with 4.7*(pitcher_score – team_pitching_score).
⁶ This means that the ratings in our Complete History interactive won’t always match the ratings in our 2016 MLB Predictions, but using separate rating systems gives us the flexibility to alter our forecast methodology from year to year but keep our Elo history interactives unchanged.

FiveThirtyEight
