Grading The Many, Many, Many College Football Bowls

College football’s bowl season officially kicks off on Saturday afternoon with the Gildan New Mexico Bowl between New Mexico and the University of Texas at San Antonio. And it’s pretty typical for lower-tier bowls: The combatants aren’t very good — the Lobos and Roadrunners rank No. 81 and 101, respectively, in ESPN’s Football Power Index rankings — but make for a pretty even matchup and are likely to put on an offensive show. (According to Sports-Reference.com’s Simple Rating System, or SRS, the two teams are projected to combine for 71 points, about 22 percent more than the typical FBS game.¹) It’s the kind of low-stakes pre-Christmas game meant primarily for fans, gamblers or otherwise inveterate college football junkies (raises hand).

(Disclosure: The Gildan New Mexico Bowl is one of 13 bowl games this year that are owned and operated by ESPN, the parent company of FiveThirtyEight.)

A couple of years ago, I wrote about the sport’s bloated bowl schedule, and it has only grown since. Including the College Football Playoff championship game on Jan. 9, the FBS postseason now comprises 41 bowl games over 24 days, tying the all-time record set last year.

And in terms of the quality of games, there are a lot more matchups that look like New Mexico-UTSA these days than, say, Ohio State vs. Clemson. To quantify this, I developed an index to rate the caliber of each bowl since the AP poll era began in 1936, grading each game on a 5-point scale (3 is average) based on three factors:

The quality of the teams involved. For this, I used the harmonic mean of the two teams’ pregame Elo ratings, our pet metric for determining a team’s strength at any given moment. (Why harmonic? To ensure that both teams in a matchup had a high rating for a bowl to get one.) The 2015 CFP championship game between Ohio State and Oregon rated as a “5” on my grading scale — it featured the 10th- and 12th-best teams in college football history (in terms of Elo at their peak) — and was followed closely by the 2006 Rose Bowl between Texas and USC. For a “1,” look to the 1947 Harbor Bowl between New Mexico and Montana State.
How close the matchup is. In addition to combining great teams, a quality bowl should feature a relatively even matchup. To measure that, I used the difference in the pregame Elo ratings of the two teams in each bowl; closer matchups earn a higher grade.² The closest bowl in my data set? The 1996 Carquest Bowl between Miami and Virginia earned a “5,” with each team sporting a +10.5 Elo rating going into the game. (Miami won, 31-21.) The most lopsided bowl, on the other hand, was the 1970 Tangerine Bowl — Elo favored Toledo by nearly 29 points over William & Mary. (Toledo won by 28.)
How much offense the game is likely to feature. This category is a bit more subjective than the others, because some people might not agree that more scoring makes for a better viewing experience. But truckloads of points are generally fun, and some of the hallmarks of lower-tier bowls are trick offenses and terrible defenses. Indeed, the average team in a pre-New Year’s bowl is 19 percent better on offense than on defense according to SRS.³ So I gauged every historical bowl by how much more the teams were projected to score (based on their respective offensive and defensive SRS ratings) than the per-game average was for FBS/Division I-A teams in the same season.⁴ The 1972 Peach Bowl between all-offense/no-defense West Virginia and N.C. State was a “5”; the 1963 Cotton Bowl with Texas and LSU (which combined to allow 9.6 points per game during the season) was a “1.”
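For readers who want to see the arithmetic, here is a minimal Python sketch of the raw inputs behind the three factors: the harmonic mean of two Elo ratings, the Elo gap, and the points projection described in the footnotes. The function names and every number in the usage lines are invented for illustration, and the Elo inputs are assumed to be positive ratings. The step that converts these raw values into 1-to-5 grades is not spelled out here, so it is left out.

```python
def team_quality(elo_a, elo_b):
    """Harmonic mean of two pregame Elo ratings (assumes both are positive)."""
    return 2 * elo_a * elo_b / (elo_a + elo_b)


def elo_gap(elo_a, elo_b):
    """Absolute difference in pregame Elo; a smaller gap means a closer matchup."""
    return abs(elo_a - elo_b)


def projected_total(avg_ppg, off_a, def_a, off_b, def_b):
    """Stat-based over/under for a game, following the footnoted formula.

    avg_ppg is the season's average points per game per team; the off_* and
    def_* values are each team's offensive and defensive SRS.
    """
    points_a = avg_ppg + off_a - def_b  # team A's offense vs. team B's defense
    points_b = avg_ppg + off_b - def_a  # team B's offense vs. team A's defense
    return points_a + points_b


def share_above_average(total, avg_ppg):
    """How far a projected total sits above a typical game's combined score."""
    return total / (2 * avg_ppg) - 1


# Hypothetical ratings: both pairs have the same ordinary average (1800),
# but the harmonic mean marks down the pair with one weak team.
print(team_quality(1800, 1800))  # 1800.0
print(team_quality(2100, 1500))  # 1750.0

# Hypothetical SRS values for two offense-heavy teams with poor defenses.
total = projected_total(avg_ppg=29.0, off_a=3.0, def_a=-4.0, off_b=1.0, def_b=-5.0)
print(total, round(share_above_average(total, 29.0), 3))
```

The harmonic mean is what makes the first factor demanding: one great team cannot carry a bowl's quality score if its opponent is weak.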

Add up the scores in each category, and you get a sort of total measure for the entertainment value of each bowl. Here’s how this year’s crop stacks up:

Because the average grade in each category is a 3, the typical bowl scores around a total of 9 across all three of the factors I considered. In practice, however, the average bowl’s score hovered slightly above that mark from the 1960s until the late 1990s, when it began to plunge sharply — unsurprisingly, that was around the time we saw a steep uptick in the number of bowls:

The top bowls are as good as they’ve ever been. The average grade for the top five bowls in each season, for instance, increased steadily from the late 1970s until the mid-’90s, and it’s stayed roughly level since then. The same goes for each season’s top 10 bowls. But the quality of the worst bowls each year has fallen off a cliff over time. The average grade for this year’s worst five bowls is 18 percent lower than it was in 1996 and 12 percent lower than it was a decade ago.

This isn’t to say that more football is a bad thing, even when it’s played by increasingly mediocre teams. Although the dregs of bowl season feature far worse programs than they used to, practically all of that dip has come on the defensive side of the ball — offensive grades for low-level bowls are steady (if not slightly up) since the early 1980s, despite the overall decrease in quality for the teams participating in them.

In other words: If obscure bowls can’t draw in good teams, they appear to have countered that by featuring teams that will at least play a high-scoring brand of football.

That’s partly why I’ll be tuned in for, say, Tulsa-Central Michigan or Navy-Louisiana Tech — games that may not carry much meaning for the neutral observer, yet somehow hold an appeal nonetheless. Some of that is probably wrapped up in my own nostalgia for the long holiday breaks of childhood, watching endless streams of college football at the dawn of the 1990s bowl explosion. But some is also the fun of watching unusual opponents score a ton of points on each other in the quasi-pageantry of a bowl atmosphere. Games like these are vestigial parts of a postseason system that seems hopelessly out of place in 2016 — but even so, they’re not completely devoid of charm.

Footnotes

¹ The formula for projecting the number of total points in a matchup is relatively simple: Take the average number of points per game for all FBS/Division I-A schools in a given season, add one school’s offensive SRS and subtract its opponent’s defensive SRS. Do the same for the reverse situation (school A’s defense against school B’s offense) and add the two numbers together to get a stat-based “over/under” for a game.
² To give you a sense of how predictive this is, the quarter of bowls since 1936 that were projected to be the closest ended up having a margin of victory 22 percent smaller than the quarter of bowls that were projected to be most lopsided.
³ Since 1947, the first season in the database that featured a bowl before New Year’s Day.
⁴ Since 1936, the correlation between projected and actual points across all bowls was 0.7.
