Modern life has become the algorithmized life, a data-rich dreamscape in which the solution to nearly every problem lies somewhere inside a spreadsheet. Every problem, that is, except for college football’s.
On Tuesday night, the new College Football Playoff (CFP) Committee will release its ranking of the best teams in college football. It’s a list generated by 13 human experts. They’ll have the aid of simple statistics, sure, but ultimately the committee and its members’ human biases are the ones accountable. College football has moved the onus from the machines to the men.
But only because the machines got them in trouble. In an unlikely marriage, college football became an early adopter of numerically driven policymaking in 1998, when it ratified the Bowl Championship Series (BCS) to determine its consensus national champion. Billed as an enlightened merger between the old-guard media polls (thus preserving the sport’s strong sense of tradition) and the computer rankings that so easily proliferated in the tech-boom ’90s, the BCS was supposed to use data to help usher in a new era of college football.
Instead, all it produced was controversy, revolt and a system so universally loathed that its demise was one of the few initiatives for which President Obama was able to marshal bipartisan support. A great deal of the criticism centered on “the computers,” a faceless army of machines that supposedly wouldn’t know a 3-4 defense from a 4-3. One of the biggest selling points of the College Football Playoff has been that it involves people who do know defensive formations.
Yet there’s evidence that the switch from BCS to CFP won’t matter much, at least in terms of actually picking a champion with more efficiency. The big leap forward may simply be a lateral move.
College football’s champion has always been more beauty-pageant winner than undisputed warrior. There are far too many teams, playing far too few games, to rely on wins and losses as the sole arbiters of worth. NFL teams make the playoffs through their records alone, but college football teams, marooned in various conferences, play schedules of vastly differing quality. Any endeavor to pick a truly national champion has to, by necessity, grapple with the balance between performance and strength of opposition.
Originally, the media and coaches were the arbiters of who was great and who wasn’t, through the Associated Press Top 25 and the Coaches’ Poll. In theory, those who followed the sport most closely should produce a relatively equitable ranking of the country’s best teams. But the rankings became fraught with controversy and accusations of regional bias. The two major polls couldn’t always agree about which team was No. 1, producing a number of years in which multiple schools “won” the national championship. And college football’s longstanding system of bowl games, which act at once as postseason contests and meaningless exhibitions, occasionally complicated matters even further by contractually preventing the best teams from facing off even when there was clarity atop the polls.
The BCS, which mixed polls with the supposed objectivity of computers, was meant to fix all that. Mathematical ranking systems had existed in college football for at least 70 years by then, but since the AP began continuously issuing polls in 1936, these systems had never been the game’s preeminent selectors.
It didn’t go smoothly. The computers became an easy punching bag for everything that fans and media hated about the BCS as a whole. “I think over the years, the computers were a scapegoat,” algorithm-maker Richard Billingsley told ESPN’s Mark Schlabach in August. “If there was an issue or if somebody didn’t like the results, it was the computers’ fault, and that wasn’t fair at all.”
“Humans had more to do with the BCS than the computers did, but people were just wrong about it,” former BCS director Bill Hancock added. “I think the computers got a bum rap.”
Even so, computer ratings played a large role in the BCS, and there were a number of reasons why the foray into data-crunching failed. First, the formula concocted by BCS creator Roy Kramer was inelegant, stirring the polls and computer ratings into an arbitrary statistical mishmash that included team loss totals and an arcane strength of schedule calculation. Also, it was badly overfit. As Stewart Mandel writes in “Bowls, Polls, and Tattered Souls,” Kramer “had his minions test the formula by applying it to past seasons’ results and making sure it spit out the correct two teams each year.” When future seasons failed to play out as tidily as the test sample did, the BCS endlessly tweaked its formula to retroactively “fix” whatever the previous year’s controversy was, rather than anticipating future fusses.
And perhaps the BCS’s biggest sin of all was banishing computer rating systems that took into account a team’s margin of victory in its games. It was seeking to reduce the incentive for coaches to run up the score on overmatched opponents, but in doing so it also deprived the computer ratings of key data points. One of the most crucial findings in sabermetrics, across virtually all sports, is that the average margin by which a team wins or loses conveys more information than wins and losses alone. This is especially true in a sport like college football, where the sample of games is so small.
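To see why margin of victory matters so much in a short season, here is a rough simulation sketch in Python. Everything in it is an illustrative assumption rather than anything the BCS or its computer rankings actually used: team strengths are drawn from a bell curve, each game's margin is the strength gap plus random noise, and the names (`simulate`, `strength_sd`, `noise_sd`) and the numbers behind them are invented for the example. It compares how closely win percentage and average margin each track a team's true strength over a 12-game schedule.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n_teams=2000, n_games=12, strength_sd=10.0, noise_sd=14.0, seed=0):
    """Toy model (all parameters are assumptions, not real football data).

    Each team gets a hidden 'true strength' in points. Every game is played
    against a randomly drawn opponent, and the final margin is the strength
    gap plus game-to-game noise.
    """
    rng = random.Random(seed)
    strengths, win_pcts, avg_margins = [], [], []
    for _ in range(n_teams):
        strength = rng.gauss(0, strength_sd)
        margins = []
        for _ in range(n_games):
            opponent = rng.gauss(0, strength_sd)
            margins.append(strength - opponent + rng.gauss(0, noise_sd))
        strengths.append(strength)
        win_pcts.append(sum(m > 0 for m in margins) / n_games)
        avg_margins.append(sum(margins) / n_games)
    # How well does each summary statistic recover the hidden strength?
    return pearson(strengths, win_pcts), pearson(strengths, avg_margins)


corr_wins, corr_margin = simulate()
print(f"win percentage vs. true strength: {corr_wins:.3f}")
print(f"average margin vs. true strength: {corr_margin:.3f}")
```

In this toy setup, average margin tracks true strength more closely than win percentage does, because a win collapses each game into a yes or no and discards how decisive it was. The sketch does not model coaches running up the score, which was the behavior the BCS was trying to discourage.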
Perhaps a computerized system could work if it were deployed with more skill. But college football’s decision-makers have decided instead that using no data — or at least a fuzzy interpretation of what’s available — is better than rigidly adhering to a defective model.
And it may not make much of a difference.
There will likely be a great deal of crossover between the playoff committee’s selection and the teams the BCS would have listed in its top four slots. In the estimation of SB Nation’s Bill Connelly, no fewer than 75 percent of the top four teams in the BCS rankings each year from 1998 to 2012 — and probably closer to 85 percent to 90 percent — aligned perfectly with the teams a hypothetical playoff committee would have selected had the current system been in place over those years.
There also isn’t much distinction between the BCS’s and the CFP’s accuracy in determining the nation’s true best team. The CFP’s four-team bracket would be more likely to feature the deserving champion (a four-team playoff system has about a 45 percent greater chance of including the best team than a two-team setup like the BCS). But the CFP loses that advantage by forcing the top team to play an additional game, opening it up to becoming the victim of bad luck. According to past research of mine, a two-team playoff is won by the best team in the country about 29 percent of the time, while a four-team playoff crowns the best team at a 31 percent clip — hardly any improvement at all.
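Those three figures can be tied together with some back-of-the-envelope arithmetic. The sketch below is not the model behind the research cited above. It assumes, purely for illustration, that the best team has the same chance of winning any single playoff game, so that its title odds are its chance of making the field times that single-game chance, once for a two-team playoff and twice for a four-team one. The function and variable names are hypothetical.

```python
def implied_parameters(title_prob_two, title_prob_four, inclusion_boost):
    """Solve for what the quoted figures imply under a simplified model.

    Assumed model (an illustration, not the article's method):
        title_prob_two  = p2 * w
        title_prob_four = (inclusion_boost * p2) * w ** 2
    where p2 is the chance the best team makes a two-team field and w is
    its chance of winning any one playoff game.
    """
    # Dividing the second equation by the first isolates w.
    win_prob = title_prob_four / (title_prob_two * inclusion_boost)
    in_field_two = title_prob_two / win_prob
    in_field_four = inclusion_boost * in_field_two
    return win_prob, in_field_two, in_field_four


# Figures quoted in the text: 29 percent, 31 percent, and a 45 percent boost.
win_prob, in_field_two, in_field_four = implied_parameters(0.29, 0.31, 1.45)

print(f"single-game win probability:      {win_prob:.2f}")
print(f"chance of making two-team field:  {in_field_two:.2f}")
print(f"chance of making four-team field: {in_field_four:.2f}")
```

Under that assumption the numbers imply the best team wins a given playoff game roughly 74 percent of the time and reaches the field about 39 percent of the time with two slots versus 57 percent with four. The extra game, at roughly a one-in-four risk of an upset, eats up nearly all of the gain from the wider field.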
The debut of the College Football Playoff is being celebrated as progress because it returns to the simplicity of human debate. But data and formulae ultimately weren’t to blame for the BCS’s woes, and it’s unlikely that its committee-based successor will reduce the number of college football controversies. Only an emotionless algorithm would have it any other way.