Late last month, NCAA officials met with some of basketball’s most prominent analytics experts to remake the way they select teams for the men’s NCAA tournament. Until now, they’ve used the ratings percentage index (RPI) to help guide their decisions, but that stat has become antiquated as far more advanced ranking systems have been developed. Efforts to replace the RPI, though, raise a lot of tricky questions.
According to multiple people I spoke to who were at the meeting, the NCAA is not interested in generating a completely new metric from scratch. Instead, officials favored using multiple ranking systems to create a composite index that would be a resource on Selection Sunday. But as the many controversies around college football’s Bowl Championship Series showed, developing a new rating, even one made up of accepted metrics, is full of twists and turns, roadblocks and landmines. Finding the right formula will require asking deep philosophical questions about what a ranking system should try to achieve, and whether certain statistical compromises are even possible.
What’s so wrong with the RPI?
Until the NCAA adopts a new metric, the committee is stuck with the RPI. Developed in 1980 by statistician Jim Van Valkenburg, the RPI was originally intended to adjust a team’s record for its strength of schedule, a noble cause in a sport where 351 teams face opponents of wildly dissimilar quality. Critics of the RPI, and there are many, are less concerned about its goal than its execution. It’s an arbitrary formula that mashes together a team’s winning percentage with those of its opponents (and opponents’ opponents), and as a result, it amplifies the importance of a strong schedule at the expense of everything else.
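To see why schedule strength dominates, it helps to write the formula down. The sketch below assumes the commonly cited 25/50/25 weighting of a team's winning percentage, its opponents' winning percentage and its opponents' opponents' winning percentage; the weights are not given in the text above, and the NCAA's actual computation includes further adjustments that are left out here. The function and parameter names are illustrative only.

```python
def rpi(wp, owp, oowp, weights=(0.25, 0.50, 0.25)):
    """Basic RPI-style blend of three winning percentages.

    wp   -- the team's own winning percentage (0.0 to 1.0)
    owp  -- its opponents' winning percentage
    oowp -- its opponents' opponents' winning percentage
    weights -- assumed 25/50/25 split; an assumption, not taken from the article
    """
    w_team, w_opp, w_opp_opp = weights
    return w_team * wp + w_opp * owp + w_opp_opp * oowp


# Hypothetical teams: A wins 90% of its games against a weak schedule,
# B wins 60% of its games against a strong one.
team_a = rpi(wp=0.90, owp=0.40, oowp=0.50)
team_b = rpi(wp=0.60, owp=0.60, oowp=0.55)
print(team_a, team_b)
```

With these made-up inputs, team B comes out ahead (roughly 0.59 to 0.55) despite winning far fewer of its games, because three-quarters of the formula depends on who a team played rather than how it fared.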
[Table: Correlation with committee rank, by year. Columns: Year, RPI rank, KenPom rank.]
The amount of sway the RPI still holds over the selection process in 2017 is a matter of debate. Officially, the NCAA maintains that the RPI is just one of many tools at the committee’s disposal, and the organization takes pains to show just how little influence the metric has whenever media members are invited to attend mock selection exercises. Bill Hancock, who spent 13 years as director of the men’s basketball tournament before moving on to lead the BCS and College Football Playoff, told me that the RPI’s clout with the committee has been waning for years. “In the first couple years I was there, it carried more weight,” Hancock recalled of the late 1980s, when he first joined the NCAA’s staff. “But by the time I left [in 2005], it really was just another factor — nothing more.” He said the tournament had tweaked the RPI formula “a time or two” during his tenure, and that it had even made a concerted effort to reduce the stat’s influence during the 1990s.
Even so, Ken Pomeroy, dean of college basketball statheads and one of the people invited to the NCAA meeting, thinks it’s impossible not to be affected by the omnipresence of RPI-related data points in the committee room. “In discussions with other committee members, they always stress to me, ‘Hey, we’re not just relying on the RPI, we’re allowed to use whatever we want,’” Pomeroy said. “But obviously it’s much more convenient to use the RPI, because the RPI is on their computer screen, in front of their face.”
Pomeroy himself helped intensify the public’s desire for something better than the RPI when he launched his tempo-free ratings at KenPom.com in 2004. In the ensuing 13 years, Pomeroy’s numbers have become the de facto industry standard for public-facing college basketball statistics; in turn, their increased popularity has driven fans and the media to pore over selections and seedings using tools far more advanced than the nearly four-decades-old RPI.
Coaches, too, know that the RPI isn’t up to snuff. The National Association of Basketball Coaches raised concerns last May about the metrics being used to evaluate their teams, and David Worlock, the NCAA’s director of media coordination and statistics, told me the organization pushed for the inclusion of more up-to-date stats in the selection process.
In a statement, the NABC said, “The NABC ad hoc group never had specific concerns about a single metric or metrics being included in a potential composite ranking. The coaches in the group simply expressed interest in utilizing both predictive and results-based metrics. The only concern expressed was that the coaches didn’t want to completely move away from using metrics that still factor in wins and losses.”
Worlock is spearheading the new-metrics initiative. “It’s important to stay relevant; it’s just as important to have justification and rationale for every decision that gets made during selection week,” he told me via email. “We recognize the flaws in the RPI, and while there isn’t a perfect metric or combination of metrics, we owe it to ourselves and to the committee to use additional data that exists so that we are not overly relying on the RPI to measure teams and sort data.”
A few easy solutions
The analytics meeting in Indianapolis was broad and open-ended, Pomeroy said, much more the beginning of a conversation than a definitive conclusion. The hope was that the league could find a way to take most of the committee’s mechanisms that are currently underpinned by RPI — like the so-called “nitty-gritty report,” which breaks down a team’s records against opponents from different ranking tiers — and replace them with similar mechanisms based on a blend of more modern metrics.
Everyone I talked to agreed that one of the most important (and easiest) reforms would be to find a better method of balancing the quality of a team’s home and away records. The committee’s current system emphasizes a team’s record, but makes no distinction between a close loss on the road and a close win at home. (The former could be more suggestive of a good team.) As of now, a home win against a top-25 team is considered better than a road win versus a top-50 team, though both of those wins could be equally difficult.
Every state-of-the-art power rating now makes a home-court adjustment, so a new, composite ranking could easily calibrate the strength of a team’s opponent to include a difficulty boost if the game was played on the road and a downgrade if it came at home. This kind of modification would instantly affect which bubble teams make it into the tournament and could even change schools’ scheduling habits in future seasons.
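A minimal sketch of what such a calibration might look like, assuming opponent ratings are expressed in points and that home court is worth a fixed number of points. The 3.5-point figure and all names here are hypothetical, chosen only for illustration; real rating systems estimate this value from data.

```python
HOME_EDGE = 3.5  # assumed home-court value in points; hypothetical, not from the article


def effective_opponent_rating(opp_rating, site):
    """Return the opponent's rating adjusted for where the game was played.

    site is given from the evaluated team's point of view:
    "away" makes the opponent tougher, "home" makes it easier,
    "neutral" leaves the rating unchanged.
    """
    if site == "away":
        return opp_rating + HOME_EDGE
    if site == "home":
        return opp_rating - HOME_EDGE
    if site == "neutral":
        return opp_rating
    raise ValueError(f"unknown site: {site!r}")


# A road game at a slightly weaker opponent can rate as the harder assignment.
road_game = effective_opponent_rating(12.0, "away")
home_game = effective_opponent_rating(18.0, "home")
print(road_game, home_game)
```

Under these assumptions, the road game against the 12.0-rated opponent (15.5 after adjustment) grades out as tougher than the home game against the 18.0-rated one (14.5), which is exactly the kind of distinction a venue-blind tier system misses.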
When the discussion turned to whether scoring margin should be considered by the metrics that feed the new rankings, however, the questions got more complex. On the one hand, research shows that a team’s point differential is more predictive of future outcomes than its win-loss record. On the other hand, the inclusion of victory margin could encourage coaches to run up the score or, less nefariously, could lead the metric to misconstrue how dominant a team was over an entire game by focusing only on the final tally.
“If a team is up 20, do they keep the starters in for the final 90 seconds to keep the lead at 20,” Worlock wonders, “or do they risk winning by ‘only’ 14 because the walk-ons allowed a couple of three-pointers? [And] are there injury risks if the decision to play the starters longer is the direction a coach chooses to go?”
For veterans of the college-sports ranking business, college basketball’s debate echoes what college football went through many years ago. After complaints that excessive score-padding carried potentially undeserving teams to the national title game in 2000 (Florida State) and 2001 (Nebraska), the BCS asked its computer systems to begin disregarding point differentials in the 2002 season. The change led to a mini-revolt among number-crunchers, several of whom recused themselves from the process rather than alter their formulas to remove what they saw as vital information about teams’ quality.
Today’s statisticians downplay such an uncompromising approach, however, pointing to such solutions as game control, which measures dominance through a team’s average in-game win probability rather than the final score, and strength of record, which measures how difficult it would be for a generic “good team” to earn a specific team’s record, given its opponents. Both use a mix of metrics that measure how dominant a team was without explicitly accounting for a team’s margin in a way that coaches could manipulate during garbage time.
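Both ideas can be sketched in a few lines. The code below is an illustration of the concepts as described above, not any published implementation: it assumes we already have, for each game on a schedule, the probability that a generic "good team" would win it, and, for a single game, a series of in-game win probabilities sampled at regular intervals. All function names are hypothetical.

```python
def wins_distribution(win_probs):
    """Probability of each possible win total, given per-game win probabilities.

    Treats games as independent and builds the distribution one game at a time.
    """
    dist = [1.0]  # before any games: 100% chance of zero wins
    for p in win_probs:
        new = [0.0] * (len(dist) + 1)
        for wins, mass in enumerate(dist):
            new[wins] += mass * (1 - p)   # benchmark team loses this game
            new[wins + 1] += mass * p     # benchmark team wins this game
        dist = new
    return dist


def prob_record_or_better(win_probs, wins):
    """Chance the benchmark team wins at least `wins` games on this schedule.

    A lower value means the record was harder to earn.
    """
    return sum(wins_distribution(win_probs)[wins:])


def game_control(in_game_win_probs):
    """Average in-game win probability across the sampled moments of one game."""
    return sum(in_game_win_probs) / len(in_game_win_probs)


# Two coin-flip games: going 2-0 happens a quarter of the time.
print(prob_record_or_better([0.5, 0.5], 2))
# A game led comfortably throughout scores high regardless of final margin.
print(game_control([0.5, 0.7, 0.9]))
```

Neither function ever looks at the final score, which is the point: a coach gains nothing from padding a lead in the last 90 seconds, because the result and the win probability were already settled.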
It’s (philosophically) complicated
But beyond the practical concerns of sportsmanship, the scoring-margin debate also speaks to a deeper philosophical question at the core of any ranking system: Should the NCAA’s new metric reward the best teams or the most accomplished ones?
The selection committee is only starting to wrap its head around the subtle distinction between the two categories. Due to luck, underperformance and other circumstantial factors, it’s very possible for the most talented team in the country to not have the most impressive record in the country. Since things like point differential are more predictive measures, they give us a better read on a team’s underlying talent (the “best” team), while a metric like strength of record is more retrospective (the “most deserving” team). Because there will always be incongruities between the two types of rankings, a good ranking system will explicitly decide beforehand whether it’s measuring talent for the future or rewarding accomplishment in the past — or, if it’s somewhere in between, what the intended mix is.
In the past, the NCAA has sent conflicting messages about whether its selection and seeding process is fundamentally a forward-looking endeavor or a backward-looking one. “In terms of how the committee should select teams, it actually says in [the NCAA guidelines] that they need to select the best 36 at-large teams — based on results,” Pomeroy said. “Best” and “based on results” don’t always line up, though. “It’s like a contradiction right in that sentence.”
The distinction may not lend itself to a tidy resolution. The coaches want the committee to be armed with better metrics, but they don’t want them to consider scoring margin because it affects in-game decision-making. The committee wants to reward a team’s entire body of work, but will also drop teams when they suffer a key injury on the eve of the selection. And whatever compromise the NCAA settles on will be served up for hungry fans and members of the media to instantly pick apart.
Still, the move toward a more modernized NCAA tournament selection system is an encouraging sign. Hancock told me that NCAA officials held a similar meeting during the 1990s, complete with some of the same ratings gurus, but nothing ever came of it. Although there are no guarantees that anything will change this time around, either, the climate seems more friendly to reform in 2017 than it has been in a long time. The NCAA’s next statistical guide may not be perfect, but any revisions it contains would represent four decades of progress and would likely serve as an official endorsement of metrics that the hoops cognoscenti have relied on to pick their own brackets for a long time.