After we broke down the voting data on ESPN’s MLB Forecast results last week, an alert FiveThirtyEight reader, Andrew Jondahl, pointed out something weird in the panel’s predictions. Andrew noticed that — from last season to this one — no team was projected to move more than one spot in its division.
So little movement would be highly unusual in the real world. Since 1998, when baseball moved to its current divisional format, nearly 30 percent of teams have moved up or down by two or more spots from one season to the next.
So why does the 2014 ESPN MLB Forecast panel call for so little movement? Is this a bug or a feature?
If we assume the panelists are trying to maximize predictive accuracy, then it’s a feature. This is true for the same reason it’s better to predict no more than, say, 35 home runs for any player in a given season, even though we know the majors’ leader typically hits at least 45 homers (if not 50 or more).
Why? We have no way of knowing which player will stray into the HR stratosphere, so it’s best to make regressive predictions for each of the dozen or so guys who could make a credible case for being the outlier; roughly half of the group will exceed their forecast, and half will fall short.
The same goes for division forecasts. From 1999 to 2012, an average of 11 MLB teams per season moved zero spots within their division, 10 moved a single spot, six moved two spots, two moved three spots and one moved four spots. But if we tried to parcel out specific teams into each category, the odds are we’d be less accurate than if we just predicted no movement for any team.
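The arithmetic behind that claim can be sketched in a few lines of Python. This is a rough illustration rather than the panel's method: it uses only the average counts quoted above, scores a prediction as correct when it names the exact number of spots moved (ignoring direction), and assumes the "parceling" strategy has no skill at all, meaning the right number of teams goes in each category but the teams are chosen at random. The variable names are made up for the example.

```python
# Average number of teams per season by spots moved within the division,
# 1999-2012, as quoted in the text (30 teams in total).
teams_by_spots_moved = {0: 11, 1: 10, 2: 6, 3: 2, 4: 1}
total_teams = sum(teams_by_spots_moved.values())

# Strategy A: predict "no movement" for every team.
# Only the teams that truly stay put are called correctly.
no_move_hit_rate = teams_by_spots_moved[0] / total_teams

# Strategy B (assumed, zero skill): hand out the labels 0..4 in the right
# proportions, but to randomly chosen teams. A team that truly moves k spots
# receives label k with probability n_k / total, so the expected number of
# correct calls is the sum of n_k * (n_k / total).
expected_hits = sum(n * n / total_teams for n in teams_by_spots_moved.values())
random_parcel_hit_rate = expected_hits / total_teams

print(f"Always predict no movement: {no_move_hit_rate:.1%}")
print(f"Random parceling:           {random_parcel_hit_rate:.1%}")
```

Under these assumptions, predicting no movement is right for about 37 percent of teams, against about 29 percent for random parceling. A forecaster with real insight into which teams will move would beat the random baseline, but would need a good deal of it to beat the no-movement forecast.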
A great illustration of this principle comes every year around this time. In the NCAA basketball tournament, certain first-round upset combinations (like a No. 12 seed beating a No. 5 seed) are very likely each year. However, there’s a big difference between knowing that fact and being able to capitalize on it by identifying the matchup in which an upset will occur. It’s just as easy to wreck a bracket by chasing false positives — upset picks that don’t happen — as it is to pick a favorite who loses.
No. 12 seeds have won roughly 1.5 round-of-64 games per tournament since the NCAA field expanded to 64 teams in 1985. But if we took the most likely No. 12 vs. No. 5 upset in the field (according to the teams’ pre-tournament Simple Ratings) and flipped a coin over whether to also pick the second-most likely upset, we’d pick winners at a rate 9 percent lower than if we just picked the No. 5 seed to win no matter what.
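To see why chasing upsets costs accuracy, here is a simplified back-of-the-envelope version in Python. It is not a reproduction of the rating-based comparison described above. It assumes that every No. 12 seed has the same chance of winning (1.5 wins spread over four games) and that the picker has no ability to tell which one will pull the upset. Both assumptions, and all the names in the code, are illustrative.

```python
GAMES = 4               # No. 5 vs. No. 12 games in the round of 64
P_UPSET = 1.5 / GAMES   # average share of those games won by the No. 12 seed


def expected_correct(upset_picks: int) -> float:
    """Expected correct picks when `upset_picks` games are called for the
    No. 12 seed and the rest for the No. 5 seed, assuming zero skill."""
    favorite_picks = GAMES - upset_picks
    return favorite_picks * (1 - P_UPSET) + upset_picks * P_UPSET


# Strategy A: pick the No. 5 seed in all four games.
favorites_rate = expected_correct(0) / GAMES

# Strategy B: always pick one upset, and flip a coin over picking a second.
chasing_rate = (0.5 * expected_correct(1) + 0.5 * expected_correct(2)) / GAMES

print(f"All favorites:  {favorites_rate:.3f}")
print(f"Chasing upsets: {chasing_rate:.3f}")
```

With no skill at spotting the upset, picking every favorite is right 62.5 percent of the time, while the upset-chasing strategy is right only about 53 percent of the time. Each upset pick trades a 62.5 percent chance of being right for a 37.5 percent chance, and only a picker who can reliably identify the right No. 12 seed gets that back.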
Now, maybe your NCAA tournament pool sweetens the deal by rewarding upsets enough to make chasing those No. 12 seeds a viable strategy, but the overall point stands. Knowing how often an event happens overall doesn’t mean we know whether it will happen in any specific case. The best we can do is be regressive in our forecasts, accepting that some will be wrong but that the predictions as a whole will be more accurate in the long run for it.