How Our 2017 College Football Playoff Predictions Work

FiveThirtyEight’s College Football Playoff forecast model is in some ways both my favorite and my least favorite of the many statistical models we publish. That’s because, instead of trying to predict the games themselves (we mostly¹ defer to ESPN’s Football Power Index for that), we try to predict the behavior of the small group of human beings who make up the playoff selection committee. This is a lot of “fun,” but also quite a challenge.

The challenge isn’t necessarily that the selection committee is inherently unpredictable. Most of the time, several of the playoff participants turn out to be fairly obvious, and our model has correctly predicted 11 of 12 playoff participants in the three years of its existence so far.²

The goal of a statistical model, however, is to represent events in a formal, mathematical way, and ideally, you’d like to be able to do that with a few relatively simple mathematical functions. Simpler is usually better when it comes to model-building. That doesn’t really work in the case of the selection committee, however. We finally have a reasonable amount of data to work with — 2017 will be the fourth year of the playoff. And what we’ve found is that even though our model can do a reasonably good job of anticipating the committee’s behavior, it has to account for the group behaving in somewhat complicated ways.

We discovered in 2014, for example — when the committee excluded TCU from the playoff despite the team holding the No. 3 spot in the committee’s penultimate rankings — that it isn’t always consistent from week to week. Instead, it can partly re-evaluate the evidence as it goes. For example, if the committee has an 8-0 team ranked behind a 7-1 team, there’s a reasonable chance that the 8-0 team will leapfrog the other in the next set of rankings even if both teams win their next game in equally impressive fashion. That’s because the committee defaults toward looking mostly at wins and losses among power conference teams while putting some emphasis on strength of schedule and less on margin of victory or “game control.” Therefore, our model does the same thing, based on a version of Elo ratings that attempts to mimic the committee’s behavior, along with a separate formula based simply on wins and losses. (For a more formal description of how our model works, see here.)
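
For readers unfamiliar with Elo, here is a minimal sketch of a textbook Elo update in Python. It is not our committee-mimicking variant, whose details are in the formal description. The 400-point scale, the K-factor of 20 and the function names are illustrative assumptions.

```python
def expected_score(rating_a, rating_b, scale=400.0):
    """Probability that team A beats team B under the standard Elo logistic curve."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


def update_ratings(rating_a, rating_b, a_won, k=20.0):
    """Return both teams' new ratings after one game (zero-sum update).

    k is an assumed K-factor; larger values make ratings react faster.
    """
    expected_a = expected_score(rating_a, rating_b)
    actual_a = 1.0 if a_won else 0.0
    delta = k * (actual_a - expected_a)
    return rating_a + delta, rating_b - delta
```

Under these assumed settings, two evenly rated teams each have an expected score of 50 percent, so the winner gains 10 points and the loser drops 10. A version meant to mimic the committee would weight wins and losses more heavily than margin of victory.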

We’ve added other wrinkles over the years. Before the 2015 season, for example, we added a bonus for teams that win their conference championships, since the committee explicitly says that it accounts for conference championships in its rankings (although exactly how much it weights them is difficult to say).³ And late last year, we added an adjustment for head-to-head results, another factor that the committee explicitly says it considers. The committee has been a bit more consistent about applying this criterion, according to our testing. If two teams have roughly equal résumés but one of them won a head-to-head matchup earlier in the season (say, Oklahoma over Ohio State), it’s a reasonably safe bet that the winner will end up ranked higher.

Still, there are no guarantees. Our college football forecasts, like all of our forecasts at FiveThirtyEight, are probabilistic. We account not only for the uncertainty in the results of the games themselves but also for the error in how accurately we can predict the committee’s rankings. I spent some time this week evaluating our model’s published forecasts from 2014 to 2016 and found that they were pretty well-calibrated. That is to say, teams that are given a 60 percent chance of making the playoff will actually make the playoff about six out of 10 times and fail to do so about four out of 10 times over the long run. Because the potential for error is greater the further you are from the playoff, uncertainty is higher earlier in the regular season. As of the launch of our forecast in early October, for example, as many as 15 or 20 teams still belong in the playoff “conversation.” That number will gradually be whittled down, probably to around five to seven teams before the committee releases its final rankings.
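
A calibration check of this kind can be sketched in a few lines of Python. This is an illustration, not the code used for our evaluation: the function name, the choice of five equal-width bins and the input format (one forecast probability and one 0/1 outcome per team) are all assumptions.

```python
from collections import defaultdict


def calibration_table(forecasts, outcomes, n_bins=5):
    """Group forecasts into probability bins and compare them with what happened.

    forecasts: probabilities in [0, 1], one per team.
    outcomes: 1 if that team made the playoff, else 0.
    Returns {bin_index: (mean_forecast, observed_rate, count)}.
    """
    bins = defaultdict(list)
    for p, made_it in zip(forecasts, outcomes):
        # A forecast of exactly 1.0 goes in the top bin rather than overflowing.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, made_it))

    table = {}
    for idx, rows in sorted(bins.items()):
        mean_forecast = sum(p for p, _ in rows) / len(rows)
        observed_rate = sum(y for _, y in rows) / len(rows)
        table[idx] = (mean_forecast, observed_rate, len(rows))
    return table
```

In a well-calibrated model, the mean forecast and the observed rate in each bin are close to each other. With only a few seasons of data, some bins will hold very few teams, so the per-bin counts matter when reading the table.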

We’ve made a few additional changes in preparation for launch this year, which I’ll briefly describe here:

First, we’re using the AP poll as a proxy for the committee’s rankings until the committee releases its first set of rankings on Oct. 31. This change has allowed us to launch our forecast earlier than in past seasons. (We’d previously waited until the committee’s first rankings were out.) Our model builds in additional uncertainty while the AP poll is being used, to account for the fact that the committee, which is made up mostly of former coaches and athletic directors, doesn’t size up the teams in quite the same way that the media voters in the AP poll do.
Second, game-by-game forecasts are now based on a combination of FPI ratings and committee (or AP) rankings, instead of solely FPI. We think FPI is a really good system, and we’re not saying that just because it was developed by our ESPN colleagues — it’s done an excellent job of predicting games over the past three years. In our testing this year, however, we found that accounting for the committee’s rankings (or the AP’s rankings before the committee’s rankings are available) contributes some predictive power (in addition to FPI). So game predictions are now based 75 percent on FPI and 25 percent on the rankings.⁴
And, finally, our system now gives teams from power conferences more advantages, because that’s how human voters tend to see them. We’ve calculated our Elo ratings back to the 1988 college football season. Between each season, ratings are reverted partly to the mean to account for roster turnover and so forth. In a change this year, teams are now reverted to the mean of all teams in their conference, rather than to the mean of all FBS teams. Thus, teams from power conferences — especially the SEC — start out with a higher default rating.⁵ This both yields more accurate predictions of game results and better mimics how committee and AP voters rank the teams. For better or worse, teams from non-power conferences (except Notre Dame) rarely got the benefit of the doubt under the old BCS system, and that’s been the case under the selection committee as well. In addition, we’ve made the conference championship bonus larger for teams from well-rated conferences; this also improves predictive accuracy.
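
The second and third changes above can be illustrated with a short Python sketch. It makes several assumptions not spelled out in this article: that the 75/25 blend is applied to win probabilities (rather than to underlying ratings), that mean reversion closes a fixed fraction of the gap to the conference mean, and that one-third is a plausible value for that fraction. The function and parameter names are hypothetical.

```python
from collections import defaultdict


def blend_win_probability(p_fpi, p_rankings, fpi_weight=0.75):
    """Weighted average of two win probabilities for the same team and game."""
    return fpi_weight * p_fpi + (1.0 - fpi_weight) * p_rankings


def revert_to_conference_mean(ratings, conference_of, reversion=1 / 3):
    """Pull each team's rating part of the way toward its conference's mean.

    ratings: {team: end-of-season rating}.
    conference_of: {team: conference name}.
    reversion: fraction of the gap to close (an assumed value).
    """
    by_conference = defaultdict(list)
    for team, rating in ratings.items():
        by_conference[conference_of[team]].append(rating)

    conference_mean = {
        conf: sum(values) / len(values) for conf, values in by_conference.items()
    }
    return {
        team: rating + reversion * (conference_mean[conference_of[team]] - rating)
        for team, rating in ratings.items()
    }
```

Because each team is pulled toward its own conference's average rather than the FBS-wide average, a team in a highly rated conference starts the next season with a higher default rating than it would under the old approach. An independent such as Notre Dame would need its own entry in `conference_of`; how that case is handled is not specified here.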

Our forecasts will update at the end of each game, as well as when new AP rankings or new committee rankings are released. We hope you’ll have fun following the season with us.

Footnotes

¹ In previous years, the game forecasts were based entirely on FPI. This year, they’re based mostly on FPI instead; see below for more detail.
² Based on game results through the final week of the regular season before the committee released its final rankings.
³ Determining how much a conference championship matters is tricky because a team that wins a championship game has a lot of other things going for it. For instance, by virtue of winning its conference’s championship game, a team gets an additional head-to-head win against another strong team, something the committee (and our model) already value highly. In the three years of the selection committee so far, it doesn’t appear that many decisions have come down to whether a team won its championship or not. Still, our testing suggests that the committee probably does reward winning championships, at least for teams in power conferences.
⁴ Since the committee ranks only the top 25 teams, we estimate how it rates the remaining 105 FBS teams based on our Elo ratings.
⁵ To be more precise, our model treats conferences as existing along a spectrum, rather than in binary groups of “power” and “minor” conferences. For instance, the American Athletic Conference, which has two teams in the AP top 25, is more highly rated than the Sun Belt Conference.
