How Our 2019 Women’s World Cup Predictions Work

Editor’s note: This article is adapted from an article about our 2018 World Cup predictions.

The Women’s World Cup is back, and so is another edition of FiveThirtyEight’s Women’s World Cup predictions. For those of you familiar with our World Cup forecast for the men in 2018, or our club soccer predictions, much of our 2019 forecast will look familiar. We show the chance that each team will win, lose or tie every one of its matches, as well as a table that details how likely each team is to finish first, second or third in its group and advance to the knockout stage. Our predictions also incorporate in-game win probabilities that update in real time.

Below is a summary of how the forecast works, including a description of FiveThirtyEight’s Soccer Power Index (SPI) ratings, how we turn those ratings into a forecast and how we calculate our in-game win probabilities.

SPI ratings

At the heart of our forecast are FiveThirtyEight’s SPI ratings, which are our best estimate of overall team strength. In our system, every team has an offensive rating that represents the number of goals that it would be expected to score against an average team on a neutral field and a defensive rating that represents the number of goals that it would be expected to concede. These ratings, in turn, produce an overall SPI rating, which represents the percentage of points — a win is worth 3 points, a tie worth 1 point and a loss worth 0 points — the team would be expected to take if that match were played over and over again.

To generate our SPI ratings, we run through every past match in our database of women’s international matches — back to 1971 — evaluating the performance of both teams with four metrics:

The number of goals they scored.
The number of goals they scored, adjusted to account for red cards and the time and score of the match when each goal was scored.
The number of goals they were expected to score given the shots they took.
The number of goals they were expected to score given the nonshooting actions they took near the opposing team’s goal.

(These metrics are described in more detail in our post explaining how our club soccer predictions work. In matches for which we don’t have play-by-play data, only the final score is considered.)

Given a team’s performance in the metrics above and the defensive SPI rating of the opposing team, that team is assigned an offensive rating for the current match. It is also assigned a defensive rating based on its pre-match defensive rating and the attacking performance of the other team.

These match ratings are combined with the team’s pre-match ratings to produce new offensive and defensive SPI ratings for the team. The weight assigned to the new match’s ratings is relative to the game’s importance; a World Cup qualifier, for example, would be weighted more heavily than an international friendly.
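As a rough illustration of that blending step, here is a minimal sketch that treats the update as a weighted average. The weights, the function name and the averaging form itself are assumptions made for illustration; the text above only says that more important matches count for more.

```python
# Hypothetical importance weights; the real values are not given in this article.
MATCH_WEIGHTS = {"friendly": 0.05, "qualifier": 0.10, "world_cup": 0.15}

def update_rating(pre_match: float, match_rating: float, match_type: str) -> float:
    """Blend a pre-match rating with the rating earned in a single match."""
    weight = MATCH_WEIGHTS[match_type]
    return (1 - weight) * pre_match + weight * match_rating

offense = 2.0  # pre-match offensive rating, in goals against an average team
print(update_rating(offense, 3.0, "friendly"))   # small move toward 3.0
print(update_rating(offense, 3.0, "qualifier"))  # larger move toward 3.0
```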

Match forecasts

Given each team’s SPI rating, the process for generating win/loss/draw probabilities for a World Cup match has three steps:

We calculate the number of goals that we expect each team to score during the match. These projected match scores represent the number of goals that each team would need to score to keep its offensive rating exactly the same as it was going into the match.
Using our projected match scores and the assumption that goal scoring in soccer follows a Poisson process, which is essentially a way to model random events at a known rate, we generate two Poisson distributions around those scores. Those give us the likelihood that each team will score no goals, one goal, two goals, etc.
We take the two Poisson distributions and turn them into a matrix of all possible match scores, from which we can calculate the likelihood of a win, loss or draw for each team. To avoid undercounting draws, we increase the corresponding probabilities in the matrix.¹
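The three steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the production model: the function names are hypothetical, the 9 percent inflation is read as a multiplier on the diagonal cells, and the renormalization step is an added assumption so that the probabilities sum to 1.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k goals when lam goals are expected."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def outcome_probs(goals_a: float, goals_b: float,
                  draw_inflation: float = 0.09, max_goals: int = 10):
    """Return (win, draw, loss) probabilities for team A."""
    dist_a = [poisson_pmf(k, goals_a) for k in range(max_goals + 1)]
    dist_b = [poisson_pmf(k, goals_b) for k in range(max_goals + 1)]
    # Matrix of all scorelines: rows are team A's goals, columns team B's.
    matrix = [[pa * pb for pb in dist_b] for pa in dist_a]
    # Inflate the diagonal (equal scores) to avoid undercounting draws.
    for i in range(max_goals + 1):
        matrix[i][i] *= 1 + draw_inflation
    total = sum(sum(row) for row in matrix)  # assumption: renormalize to 1
    size = range(max_goals + 1)
    win = sum(matrix[i][j] for i in size for j in size if i > j) / total
    draw = sum(matrix[i][i] for i in size) / total
    loss = sum(matrix[i][j] for i in size for j in size if i < j) / total
    return win, draw, loss

print(outcome_probs(2.5, 0.7))
```

Truncating the matrix at 10 goals per team drops a negligible amount of probability for typical soccer scoring rates.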

Take, for example, the 2014 men’s World Cup opening match between Brazil and Croatia. Before the match, our model was very confident that Croatia would score no goals or one goal. Brazil’s distribution, however, was much wider, leading to its being a significant favorite (86 percent) in the match.

Although there is evidence that home-field advantage in soccer is shrinking, teams still get a boost in performance when playing the World Cup on home soil. Similarly, teams from the same confederation as the host nation experience a smaller but still measurable improvement in their performances. In the 2019 Women’s World Cup, we’re applying a home-field advantage for France of about 0.15 goals and a bonus about one-half that size to all teams from the UEFA confederation. These are both a bit smaller than the advantage that historical World Cup results suggest.
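A minimal sketch of how such bonuses might be looked up. The 0.15 figure and the one-half ratio come from the paragraph above; applying the bonus by adding it directly to a team's projected goals is an assumption, and the team set is a partial, illustrative list.

```python
HOST = "France"
HOST_BONUS = 0.15                     # goals, per the text above
CONFEDERATION_BONUS = HOST_BONUS / 2  # "about one-half that size"

# Illustrative subset of UEFA teams in the 2019 field, not the full list.
UEFA_TEAMS = {"France", "Germany", "England", "Netherlands", "Sweden"}

def goal_bonus(team: str) -> float:
    """Extra projected goals for playing at home or near home."""
    if team == HOST:
        return HOST_BONUS
    if team in UEFA_TEAMS:
        return CONFEDERATION_BONUS
    return 0.0

def adjusted_goals(projected_goals: float, team: str) -> float:
    # Assumption: the bonus is simply added to the team's projected goals.
    return projected_goals + goal_bonus(team)

print(adjusted_goals(1.8, "France"), adjusted_goals(1.8, "Japan"))
```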

Tournament forecast

Once we’re able to forecast individual matches, we turn those match-by-match probabilities into a tournament forecast using Monte Carlo simulations. This means that we simulate the tournament thousands of times, and the probability that a team wins the tournament is the share of simulations in which it comes out on top. As with our other forecasts, we run our Women’s World Cup simulations hot, which means that each team’s rating changes based on what is happening in a given simulation.
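To make the idea concrete, here is a toy Monte Carlo sketch for a four-team knockout bracket with "hot" ratings. Everything specific in it is an assumption made for illustration: the single strength number per team, the 1.3-goal baseline, the size of the hot update and the coin-flip shootout. It also skips the group stage entirely.

```python
import math
import random

def sample_poisson(lam: float, rng: random.Random) -> int:
    """Draw one Poisson-distributed goal count (Knuth's method)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def play(a, b, ratings, rng, hot_step=0.05):
    """Simulate one knockout match, update ratings in place, return the winner."""
    goals_a = sample_poisson(1.3 * ratings[a] / ratings[b], rng)
    goals_b = sample_poisson(1.3 * ratings[b] / ratings[a], rng)
    margin = goals_a - goals_b
    # "Hot" update: the simulated result nudges both ratings for later rounds.
    ratings[a] *= math.exp(hot_step * margin)
    ratings[b] *= math.exp(-hot_step * margin)
    if margin == 0:
        return rng.choice([a, b])  # stand-in for extra time and a shootout
    return a if margin > 0 else b

def simulate(base_ratings, bracket, n_sims=2000, seed=0):
    """Share of simulations in which each team wins the bracket."""
    rng = random.Random(seed)
    titles = dict.fromkeys(base_ratings, 0)
    for _ in range(n_sims):
        ratings = dict(base_ratings)  # fresh copy for every simulated tournament
        finalist_1 = play(bracket[0], bracket[1], ratings, rng)
        finalist_2 = play(bracket[2], bracket[3], ratings, rng)
        titles[play(finalist_1, finalist_2, ratings, rng)] += 1
    return {team: wins / n_sims for team, wins in titles.items()}

print(simulate({"A": 2.0, "B": 1.0, "C": 1.0, "D": 1.0}, ["A", "B", "C", "D"]))
```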

Live match forecasts

Our live match forecasts calculate each team’s chances of winning, losing or drawing a match in real time. These live win probabilities feed into our tournament forecast to give a real-time view of the World Cup as it plays out.

Because we lack enough play-by-play data for women’s international soccer to build a live model from scratch, the parameters described below were initially established while building our live model for the 2018 men’s World Cup. When possible, we’ve verified that these parameters and decisions carry over to the women’s game.

Our live model works essentially the same way as our pre-match forecasts. At any point in the match, we can calculate the number of goals we expect each team to score in the remaining time. We generate Poisson distributions based on those projected goals and a matrix of all possible scores for the remainder of the match. When the matrix is combined with the current score of the match, we can use it to calculate live win probabilities.
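A minimal sketch of that calculation, assuming independent Poisson scoring for the remainder of the match. The function names are hypothetical, and the draw adjustment used in the pre-match forecast is left out to keep the example short.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """Probability of exactly k more goals when lam more are expected."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def live_probs(score_a: int, score_b: int,
               remaining_a: float, remaining_b: float, max_goals: int = 10):
    """Return team A's (win, draw, loss) chances given the current score."""
    win = draw = loss = 0.0
    for i in range(max_goals + 1):        # further goals for team A
        for j in range(max_goals + 1):    # further goals for team B
            p = poisson_pmf(i, remaining_a) * poisson_pmf(j, remaining_b)
            final_a, final_b = score_a + i, score_b + j
            if final_a > final_b:
                win += p
            elif final_a == final_b:
                draw += p
            else:
                loss += p
    total = win + draw + loss  # renormalize for the truncated tail
    return win / total, draw / total, loss / total

# Tied 1-1 with team A expected to score more in the time left.
print(live_probs(1, 1, 0.6, 0.3))
```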

For example, in the 65th minute of that same Brazil vs. Croatia match, with the score tied 1-1, our projected distributions for the remainder of the match had narrowed considerably. A Brazil win was still the most likely outcome, but much less so than at the start of the match.

Before a match, we can determine each team’s rate of scoring based on the number of goals it’s projected to score over the entire match. This rate isn’t constant over the entire match, however, as more goals tend to be scored near the end of a match than near the beginning.² We account for this increase as the match progresses, which results in added uncertainty and variance toward the end of the match.
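One way to picture a scoring rate that rises during the match is the sketch below. The only number taken from this article is the footnoted ratio of about 1.4 between the 85th and fifth minutes; the exponential shape of the curve, the 90-minute length (ignoring added time) and the function names are assumptions.

```python
import math

# Per-minute growth chosen so that rate(85) / rate(5) = 1.4.
GROWTH = math.log(1.4) / 80

def remaining_share(minute: float, match_length: float = 90.0) -> float:
    """Share of a team's full-match expected goals still to come."""
    total = math.exp(GROWTH * match_length) - 1
    elapsed = math.exp(GROWTH * minute) - 1
    return 1 - elapsed / total

def remaining_goals(full_match_goals: float, minute: float) -> float:
    """Expected goals from `minute` until the final whistle."""
    return full_match_goals * remaining_share(minute)

# Slightly more than half of the expected goals are still to come at halftime.
print(remaining_goals(1.5, 45))
```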

We also account for added time. On average, a soccer match is 96 minutes long, with two minutes of added time in the first half and four minutes of added time in the second half. The data that powers our forecast doesn’t provide the exact amount of added time, but we can approximate the number of added minutes in the second half by looking at two things:

The number of bookings so far in the match. Historically, each second-half booking tends to add about 11 seconds of time to the end of the match.
Whether the match is close. There tends to be about 40 extra seconds of added time when the two teams are within a goal of each other in the 90th minute.
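The two rules of thumb above could be combined along these lines. The 11-second and 40-second figures come from the list; the baseline value, the additive form and the function name are assumptions, since the article does not say what the estimate starts from.

```python
def estimate_added_seconds(second_half_bookings: int, goal_difference: int,
                           baseline_seconds: float = 180.0) -> float:
    """Rough estimate of second-half added time, in seconds.

    baseline_seconds is a hypothetical starting point, not a published value.
    """
    seconds = baseline_seconds + 11.0 * second_half_bookings
    if abs(goal_difference) <= 1:  # teams within a goal of each other
        seconds += 40.0
    return seconds

print(estimate_added_seconds(second_half_bookings=2, goal_difference=0))
```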

Our live model also factors in overtime and shootouts, should we see any in the knockout phase of this World Cup. Our live shootout forecasts follow the same methodology described in this 2014 article.

Finally, we make three types of adjustments to each team’s scoring rates based on what has happened so far in the match itself.

Red cards are important. A one-player advantage is significant in soccer and adjusts scoring rates by about 1.1 goals per match, split between the two teams (one rate goes up; the other down). Put another way, a red card for the opposing team is worth roughly three times home-field advantage.

Consider a match in which our SPI-based goal projection is 1.50-1.50 and the home team has a 37 percent chance of winning before the match. If a red card were shown to the away team in the first minute, our projected goals would shift to 2.05-0.95, and the home team’s chance of winning would go up to 62 percent.
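The shift in that example can be reproduced with a short sketch. The 1.1-goal swing split evenly between the teams comes from the text; scaling the shift by the share of the match remaining is an assumption, as are the function and parameter names.

```python
RED_CARD_SWING = 1.1  # goals per full match, split between the two teams

def apply_red_card(goals_advantaged: float, goals_penalized: float,
                   share_of_match_remaining: float = 1.0):
    """Shift projected goals after a red card for the penalized team."""
    # Assumption: a later red card matters proportionally less.
    shift = RED_CARD_SWING / 2 * share_of_match_remaining
    return goals_advantaged + shift, max(goals_penalized - shift, 0.0)

# Red card in the first minute of a match projected at 1.50-1.50.
home, away = apply_red_card(1.50, 1.50)
print(round(home, 2), round(away, 2))
```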

Good teams tend to score at a higher rate than expected when losing. The most exciting matches to watch live are often ones in which the favored team goes down a goal or two and has to fight its way back. An exploration of the data behind our live model confirmed that any team that’s down by a goal tends to score at a higher rate than its pre-match rate would indicate, but the better the team that’s behind is, the bigger the effect.

Take the 2014 Brazil vs. Croatia match. Before the match, Brazil was a substantial favorite, with an 86 percent chance of winning, but it went down 1-0 after Marcelo’s own goal in the 11th minute. Without adjusting for this effect, our model would have given Brazil a 58 percent chance to come back and win the match, but with the adjustment, our model gave the team a 66 percent chance of winning. (Brazil went on to win the match 3-1.)

Non-shot expected goals are a good indication that a team is performing above or below expectation. Anyone who has watched soccer knows that a team can come very close to scoring even if it doesn’t get off a shot, perhaps stopped by a last-minute tackle or an offside call. A team that puts its opponent in a lot of dangerous situations may be dominating the game in a way that isn’t reflected by traditional metrics.

As a match progresses, each team accumulates non-shot expected goals (non-shot xG) as it takes actions near the opposing team’s goal. Each non-shot xG above our pre-match expectation is worth a 0.34 goal adjustment to the pre-match scoring rates. For example, if we expect non-shot xG accumulation to be 1.0-0.5 at halftime but it is actually 0.5-1.0, this would be a swing of 1.0 non-shot xG, and a 0.34 goal adjustment would be applied to the original scoring rates. This isn’t a huge adjustment; at halftime, the away team in this example would have about a 5-percentage-point better chance of winning the match than if non-shot xG were proceeding as expected.

In the case that there has been a red card in a match, the red card adjustment takes precedence over the non-shot xG adjustment.
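Here is one possible reading of the non-shot xG adjustment as code. The 0.34 weight and the precedence of red cards come from the text; applying the weight to each team's own surplus or shortfall is an assumption (it matches the halftime example, where the gap between the two rates moves by 0.34 in total), and the names are hypothetical.

```python
NON_SHOT_XG_WEIGHT = 0.34  # goals of adjustment per non-shot xG above expectation

def adjust_rate(pre_match_rate: float, expected_xg: float, actual_xg: float,
                red_card_shown: bool = False) -> float:
    """Adjust one team's scoring rate for its non-shot xG so far."""
    if red_card_shown:
        # The red card adjustment takes precedence, so skip this one.
        return pre_match_rate
    surplus = actual_xg - expected_xg
    return max(pre_match_rate + NON_SHOT_XG_WEIGHT * surplus, 0.0)

# Halftime example: expected 1.0-0.5, actual 0.5-1.0, pre-match rates 1.5 each.
home = adjust_rate(1.5, expected_xg=1.0, actual_xg=0.5)
away = adjust_rate(1.5, expected_xg=0.5, actual_xg=1.0)
print(round(home, 2), round(away, 2))
```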

We took particular care to calibrate the live model appropriately; that is, when our model says a team has a 32 percent chance of winning, it should win approximately 32 percent of the time. Just as important is having the appropriate amount of uncertainty around the tails of the model; when our model says a team has only a 1 in 1,000 chance of coming back to win the match, that should happen about once in every 1,000 such matches. The 2019 Women’s World Cup is only 52 matches, so it’s unlikely that our model will be perfectly calibrated over such a small sample, but we’re confident that it’s well-calibrated over the long run.
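A calibration check of this kind can be sketched by grouping forecasts into bins and comparing the average forecast in each bin with how often the outcome actually happened. The function name, the bin width and the sample data are illustrative assumptions, not the procedure used for the model.

```python
from collections import defaultdict

def calibration_table(forecasts, outcomes, bin_width=0.1):
    """Return (mean forecast, observed frequency, count) for each bin."""
    n_bins = int(round(1 / bin_width))
    bins = defaultdict(list)
    for prob, happened in zip(forecasts, outcomes):
        index = min(int(prob / bin_width), n_bins - 1)  # keep 1.0 in the top bin
        bins[index].append((prob, happened))
    table = []
    for index in sorted(bins):
        rows = bins[index]
        mean_forecast = sum(prob for prob, _ in rows) / len(rows)
        observed = sum(1 for _, happened in rows if happened) / len(rows)
        table.append((mean_forecast, observed, len(rows)))
    return table

# Made-up data: 100 forecasts of 32 percent, of which 32 came true.
forecasts = [0.32] * 100
outcomes = [True] * 32 + [False] * 68
print(calibration_table(forecasts, outcomes))
```

In a well-calibrated model, the first two numbers in each row are close to each other once the count is large enough.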

The U.S. — as usual — is one of the favorites this year, and we hope you follow along with us as the tournament plays out.

Check out our latest Women’s World Cup predictions.

Footnotes

¹ There has been some debate about what kind of distribution best models scoring in soccer. We’ve found that two independent Poisson distributions work well with the addition of diagonal inflation. That is, we generate the two distributions independently but increase the value of each cell in the matrix where the scores are equal by some constant (somewhere around 9 percent, but this differs by league and is based on the degree to which we would have undercounted draws had we not inflated the diagonal).
² The rate of scoring in the 85th minute is about 1.4 times the rate of scoring in the fifth minute.
