How To Tell If A March Madness Underdog Is Going To Win

When the NCAA men’s basketball tournament picks up full speed Thursday, many fans will tune in with the hopes of seeing one thing: upsets. Some no doubt will come on last-minute buzzer beaters, but plenty will probably be long-simmering, the kinds of games that you can’t look away from. Those games will be all about tension: The underdog can’t possibly hold on to this lead, can it?

We can figure out the answer to that question. Or at least what we should expect to happen when an upset is brewing — specifically, at what point in the game an underdog with a lead is more likely to win than lose.

I analyzed play-by-play data from every NCAA tournament since 2004, the earliest year for which second-by-second scoring data is readily available. I considered all games between teams with different seeds, leaving me with about 700 games to analyze. In the analysis, I estimated the probability that the team with the worse seed (i.e., the underdog, according to the selection committee) wins the game, depending on the score and the time remaining.
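The article doesn't spell out the estimation procedure beyond conditioning on the score and the time remaining (and, per footnote 3, smoothing the result). A minimal sketch of the raw, unsmoothed step follows. It assumes each game has been reduced to snapshots of time remaining, underdog margin and final outcome, then bins the snapshots and takes the share of underdog wins in each bin. The `Snapshot` schema and the `empirical_win_prob` function are hypothetical names for illustration, and the four snapshots are toy values, not tournament data.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """One sampled game state (hypothetical schema, not the author's data format)."""
    game_id: int
    seconds_remaining: int  # regulation time left, 0 to 2400
    underdog_margin: int    # underdog score minus favorite score
    underdog_won: bool      # final outcome of the game


def empirical_win_prob(snapshots, minute_bin=1):
    """Return {(minutes_remaining_bin, margin): share of snapshots whose underdog won}.

    Assumes snapshots were sampled once per game per bin, so each game
    contributes at most one observation to any cell.
    """
    wins = defaultdict(int)
    totals = defaultdict(int)
    for s in snapshots:
        # Bin by whole minutes remaining, rounded down to a multiple of minute_bin.
        minute = (s.seconds_remaining // 60) // minute_bin * minute_bin
        key = (minute, s.underdog_margin)
        totals[key] += 1
        wins[key] += int(s.underdog_won)
    return {key: wins[key] / totals[key] for key in totals}


# Toy snapshots for illustration only.
snaps = [
    Snapshot(1, 300, 2, True),
    Snapshot(2, 330, 2, False),
    Snapshot(3, 350, 2, True),
    Snapshot(4, 30, -5, False),
]
probs = empirical_win_prob(snaps)
```

Here three toy games sit in the "5 minutes left, underdog up 2" cell and two of those underdogs won, so that cell's rate is two-thirds. With only a handful of games per cell, real rates computed this way bounce around, which is the jaggedness that footnote 3 says was smoothed.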

At the opening tipoff, the underdog has a 29 percent chance of winning the game. But if the game is tied or the underdog is ahead with five minutes remaining in the first half, the probability of an upset is higher than 50 percent.[1]

It is, of course, not that simple. There’s a big difference between an “underdog” that’s a No. 2 seed and one that’s a No. 16 seed.

To better distinguish between these two cases, I split the data into "big" and "small" upsets. Any game in which the teams' seeds differed by more than four was considered a potential big upset, and games in which the difference was four or fewer were counted in the "small upset" category. (A No. 10 seed beating a No. 7 seed is a small upset, a No. 11 seed beating a No. 6 seed is a big one, etc.)[2]
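The big/small split is a simple threshold on the gap between the two seeds. A minimal sketch of that rule follows; `upset_category` is a hypothetical helper, not code from the analysis. The tally at the end connects to footnote 2. It counts possible pairings of 16 seeds taken two at a time, not games actually played.

```python
from itertools import combinations


def upset_category(seed_a: int, seed_b: int) -> str:
    """Classify a matchup using the article's rule: a seed gap of more than
    four is a potential 'big' upset, a gap of one to four a 'small' one."""
    if not (1 <= seed_a <= 16 and 1 <= seed_b <= 16):
        raise ValueError("seeds must be between 1 and 16")
    gap = abs(seed_a - seed_b)
    if gap == 0:
        raise ValueError("equal seeds: no underdog, game excluded")
    return "big" if gap > 4 else "small"


# Tally every pairing of distinct seeds (there are 120) by category.
counts = {"big": 0, "small": 0}
for a, b in combinations(range(1, 17), 2):
    counts[upset_category(a, b)] += 1
```

A gap of d occurs in 16 - d pairings. That gives 54 "small" pairings (gaps 1 to 4) and 66 "big" ones (gaps 5 to 15). Spreading about 700 games over all 120 cells leaves under six games per pairing on average, and fewer for pairings that rarely meet. That is why the coarser two-way split is used.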

The graph compares the average upset, the big upset and the small upset. As you might expect, big underdogs begin the game with a lower probability of winning (about 20 percent, compared with 29 percent for all underdogs). Also unsurprisingly, a big underdog with the lead does not cross the 50 percent win-probability threshold until around halftime.

But the results are very different for small upsets. In these matchups, if the underdog leads or the game is tied at any point more than five or six minutes into the game, an upset is more likely than not.

We all know, however, that not all leads are created equal. For the rest of the article, I'll split the data a bit further based on the size of the underdog's or favorite's lead: a three-possession game (a lead of 7 or more points), a two-possession game (a 4-to-6-point lead), or a one-possession game (a 0-to-3-point lead).
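A sketch of the possession buckets follows. One-possession games are treated as symmetric (within 3 points either way), which matches how the gray band is described in the results. `lead_category` is a hypothetical helper name, and the margin is assumed to be measured from the underdog's side.

```python
def lead_category(underdog_margin: int) -> str:
    """Bucket a score margin by possessions, as defined in the article.

    `underdog_margin` is underdog points minus favorite points. Ties and
    leads of up to 3 points for either side count as one-possession games.
    """
    lead = abs(underdog_margin)
    if lead <= 3:
        return "one-possession game"
    leader = "underdog" if underdog_margin > 0 else "favorite"
    # Every lead of 7 or more is grouped as "three-possession", as in the article.
    size = "two-possession" if lead <= 6 else "three-possession"
    return f"{leader} {size} lead"
```

The top bucket is open-ended. A 12-point lead is strictly a four-possession game, but the article groups every lead of 7 or more into one category, and the sketch does the same.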

The figure above shows the results from this analysis.[3] When a game is close, within 3 points in either direction (gray), the average favorite is still more likely than not to win. But if the game stays within 3 points throughout, the chances of an upset rise as the clock runs down. By the last few minutes of a one-possession game, the average better-seeded team has only a slight edge in win probability. The probability that an average underdog with a two-possession lead (light green) will win crosses the 50 percent threshold with about five minutes left in the first half. And an underdog that leads by 7 or more points (dark green) perhaps shouldn't be considered an underdog at all.[4] Its odds of completing the upset exceed 50 percent very early in the first half.
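Reading off where a curve "crosses the 50 percent threshold" amounts to finding the first time at which the estimated win probability rises above 0.5. A minimal sketch follows. It assumes the curve is given as (minutes elapsed, probability) points sorted by time and interpolates linearly between points. `first_crossing` is a hypothetical helper, and the sample curve is made-up illustrative data, not the article's estimates.

```python
def first_crossing(curve, threshold=0.5):
    """Earliest elapsed time at which a win-probability curve exceeds `threshold`.

    `curve` is a list of (minutes_elapsed, probability) pairs sorted by time.
    Interpolates linearly between the two points that bracket the crossing.
    Returns None if the curve never exceeds the threshold.
    """
    if not curve:
        return None
    t_first, p_first = curve[0]
    if p_first > threshold:
        return t_first  # already above the threshold at the first point
    for (t0, p0), (t1, p1) in zip(curve, curve[1:]):
        if p0 <= threshold < p1:
            # p1 > p0 here, so the division is safe.
            return t0 + (threshold - p0) * (t1 - t0) / (p1 - p0)
    return None


# Illustrative toy curve, not data from the article.
toy_curve = [(0, 0.40), (10, 0.48), (20, 0.56), (30, 0.62)]
```

On the toy curve, the crossing falls a quarter of the way from minute 10 to minute 20, at 12.5 minutes elapsed. A curve that starts above 0.5, like the dark green one, returns its first time point.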

Things are slightly different in big-upset situations: the underdog must wait until the second half for a modest lead (4 to 6 points) to trump the seedings.[5] If, however, you're watching a game in which a big underdog has a three-possession lead in the first half, keep watching, because there's a good chance that it'll pull off the upset.

In smaller potential upsets, an underdog with at least a 4-point lead (light and dark green) at nearly any point in the game has a better chance of winning than losing. The underdog wins about 40 percent of one-possession games (gray), regardless of the time remaining.

So, as you watch games, don’t get too excited about a big underdog with a small lead, at least until the second half. If there is not a big gap in the seedings between the two teams, then the scoreboard, not the seedings, is what matters. And remember, this whole analysis is the aggregation of hundreds of games. Any one particular game can certainly defy the odds.


## Footnotes

1. The green line in the first chart does not fully reach 100 percent because games tied at the end of the second half are included in the analysis.

2. Ideally, I would break up the data more finely by looking at each possible seeding matchup, but with only 700 games and 120 possible matchups to work with, there was not enough data.

3. The jaggedness in the raw data has been smoothed with cubic regression.

4. Although there are almost no games in which a team is ahead by 7 points in the first minute or two, the cubic regression allows me to estimate these probabilities. There's more uncertainty about the exact estimates early in the game because there is less data, but this problem disappears just a few minutes into the game.

5. The tangling of the curves in the middle graph shouldn't lead us to believe that a 4-point lead very early in the game is better than a 7-point lead. It is more likely the result of a very small number of games with huge score differentials early in the game, which leave room for outliers to distort the left side of the curves.
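Footnotes 3 and 4 mention smoothing with a cubic regression, but the exact specification isn't given. The following is a sketch under an assumed setup: fit a least-squares cubic in elapsed time to binned win rates, then evaluate the fitted polynomial at the same times. It solves the normal equations in plain Python so it runs without dependencies. `fit_cubic` and `smooth` are hypothetical names. `numpy.polyfit(x, y, 3)` performs the same kind of fit, returning coefficients highest degree first.

```python
def fit_cubic(xs, ys):
    """Least-squares fit of y = b0 + b1*x + b2*x**2 + b3*x**3.

    Builds the 4x4 normal equations (X^T X) b = X^T y from power sums and
    solves them by Gaussian elimination with partial pivoting.
    Returns [b0, b1, b2, b3]. Needs at least four distinct x values.
    """
    n = 4
    a = [[float(sum(x ** (i + j) for x in xs)) for j in range(n)] for i in range(n)]
    rhs = [float(sum(y * x ** i for x, y in zip(xs, ys))) for i in range(n)]
    # Forward elimination: swap the largest pivot into place, then zero below it.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        rhs[col], rhs[pivot] = rhs[pivot], rhs[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
            rhs[r] -= factor * rhs[col]
    # Back substitution on the upper-triangular system.
    coef = [0.0] * n
    for i in reversed(range(n)):
        acc = rhs[i] - sum(a[i][j] * coef[j] for j in range(i + 1, n))
        coef[i] = acc / a[i][i]
    return coef


def smooth(xs, ys):
    """Replace raw rates with the fitted cubic evaluated at the same x values."""
    b0, b1, b2, b3 = fit_cubic(xs, ys)
    return [b0 + b1 * x + b2 * x ** 2 + b3 * x ** 3 for x in xs]
```

Because the fit is a single polynomial, it can be evaluated at times where almost no games supply data, such as a 7-point lead in the first minute. That is how footnote 4's early-game estimates can exist, and also why they are less certain. Normal equations on raw powers become poorly conditioned as x grows, so centering the time axis (or using `numpy.polyfit`) is the safer choice for real data.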

Stephen Pettigrew is a Ph.D. candidate in the Harvard Department of Government. In addition to studying American politics, Stephen writes about sports analytics on his blog, Rink Stats.