Today we’re publishing FiveThirtyEight’s club soccer predictions interactive, which includes team ratings, odds for upcoming matches and forecasts for the top five European domestic soccer leagues: the Premier League (England), La Liga (Spain), Bundesliga (Germany), Serie A (Italy) and Ligue 1 (France). It also covers the UEFA Champions League, Europe’s premier club competition. Our forecasts are available in both English and Spanish, and we’ll be adding more leagues in the future, likely starting in a few months with Liga MX, MLS and NWSL.
The forecasts are based on a substantially revised version of ESPN’s Soccer Power Index (SPI), a rating system originally devised by FiveThirtyEight editor-in-chief Nate Silver in 2009 for rating international soccer teams and last revised for the 2014 World Cup. For the interactive, we have updated and adapted SPI to incorporate club soccer scores going back to 1888 (from more than 550,000 matches in all)1, as well as newer play-by-play data from Opta that has been available since summer 2010.
In SPI, each team is assigned an offensive and a defensive rating, each expressed as the number of goals the team would be expected to score or concede against a middling team. A high offensive rating is therefore good, and a high defensive rating is bad.2 Given the ratings for any two teams, we can project the result of a match between them in a variety of formats, such as a league match, a home-and-away tie or a cup final. We can also simulate whole seasons to arrive at the probability that each team will win the league, qualify for the Champions League or be relegated to a lower division. After every match, a team’s ratings are adjusted based on its performance in that match and the strength of its opponent. Unlike in the Elo rating systems we use for several other sports, a soccer team’s ratings decline when it wins a match but performs worse than expected.
Underlying quality of play
Soccer can be tricky to model because so few goals are scored in each match. The final scoreline fairly often disagrees with most people’s impressions of the quality of each team’s play, and the low-scoring nature of the sport can lead to prolonged runs of luck, in which a team gets good results despite playing poorly (or vice versa).
To mitigate this randomness, and better estimate each team’s underlying quality of play, we’re using four metrics to evaluate a team’s performance after each match: goals, adjusted goals, shot-based expected goals and non-shot expected goals.
The first is simply how many goals a team scored in the match. The second, adjusted goals, accounts for the conditions under which each goal was scored. For adjusted goals, we reduce the value of goals scored when a team has more players on the field3, as well as goals scored late in a match when a team is already leading4. We then increase the value of all other goals so that the total number of adjusted goals equals the total number of goals scored.
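The adjustment can be sketched roughly as follows. This is a minimal illustration, not the model's actual code: the discount factors, the 70-minute cutoff and the names `Goal` and `adjusted_goal_values` are all assumptions made for the example, since the exact values are not given in the text.

```python
from dataclasses import dataclass

# Hypothetical discount factors and cutoff; the real values are not stated here.
EXTRA_PLAYER_WEIGHT = 0.8
LATE_LEAD_WEIGHT = 0.5
LATE_MINUTE = 70


@dataclass
class Goal:
    minute: int
    had_extra_player: bool  # scoring team had more players on the field
    was_leading: bool       # scoring team was already ahead


def discount(goal):
    """Return (weight, discounted?) for a single goal."""
    weight = 1.0
    discounted = False
    if goal.had_extra_player:
        weight *= EXTRA_PLAYER_WEIGHT
        discounted = True
    if goal.was_leading and goal.minute > LATE_MINUTE:
        weight *= LATE_LEAD_WEIGHT
        discounted = True
    return weight, discounted


def adjusted_goal_values(goals):
    """Discount some goals, then boost the rest so the total is unchanged."""
    weights = [discount(g) for g in goals]
    discounted_total = sum(w for w, d in weights if d)
    n_other = sum(1 for _, d in weights if not d)
    if n_other == 0:
        # Nothing left to boost, so the total cannot be preserved.
        return [w for w, _ in weights]
    boost = (len(goals) - discounted_total) / n_other
    return [w if d else boost for w, d in weights]


sample = [
    Goal(10, False, False),
    Goal(30, False, False),
    Goal(80, False, True),   # late goal while leading
    Goal(85, True, True),    # late, leading, and with an extra player
]
values = adjusted_goal_values(sample)
print(values, sum(values))
```

In practice the rescaling would be applied over a large sample of matches rather than a handful of goals, so the boost to ordinary goals would be small.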
Shot-based expected goals are an estimate of how many goals a team “should” have scored given the shots it took in that match. Each shot is assigned a probability of scoring based on its distance and angle from the goal, as well as the part of the body the shot was taken with, with an adjustment for the player who took the shot5. These individual shot probabilities are added together to produce a team’s shot-based expected goals for that match, which may be higher or lower than the number of goals it actually scored.
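As a rough sketch of that summation, assume each shot already carries a base scoring probability (from distance, angle and body part) and a per-player multiplier. The `Shot` class, the multiplier form of the player adjustment and the sample numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Shot:
    base_probability: float          # from distance, angle and body part
    player_multiplier: float = 1.0   # hypothetical finishing adjustment


def shot_based_xg(shots):
    """Sum per-shot scoring probabilities, capping each one at 1.0."""
    return sum(min(1.0, s.base_probability * s.player_multiplier) for s in shots)


# Made-up shots: a good chance for a strong finisher plus two speculative efforts.
example_shots = [Shot(0.30, 1.2), Shot(0.10), Shot(0.05)]
print(round(shot_based_xg(example_shots), 2))
```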
Non-shot expected goals are an estimate of how many goals a team “should” have scored based on non-shooting actions it took around the opposing team’s goal6: passes, interceptions, take-ons and tackles. For example, we know that intercepting the ball at the opposing team’s penalty spot results in a goal about 9 percent of the time, and a completed pass that is received six yards directly in front of the goal leads to a score about 14 percent of the time. We add up the values of these individual actions across an entire match to arrive at a team’s non-shot expected goals. Just as for shot-based expected goals, there is an adjustment for each action based on the success rates of the player or players taking the action (both the passer and the receiver in the case of a pass).
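One way to picture this is a lookup table of action values that is summed over a match. The sketch below uses the two figures quoted above; the zone labels, the tuple layout and the multiplicative player adjustment are assumptions made for illustration.

```python
# Goal probability per (action, zone). The two values are the ones quoted in
# the text; the zone names are hypothetical labels.
ACTION_VALUES = {
    ("interception", "penalty_spot"): 0.09,
    ("completed_pass", "six_yards_central"): 0.14,
}


def non_shot_xg(actions, player_adjustment=None):
    """Sum action values, scaled by an assumed multiplier for each player involved."""
    player_adjustment = player_adjustment or {}
    total = 0.0
    for action, zone, players in actions:
        value = ACTION_VALUES.get((action, zone), 0.0)
        for player in players:
            value *= player_adjustment.get(player, 1.0)
        total += value
    return total


match_actions = [
    ("interception", "penalty_spot", ("Player A",)),
    ("completed_pass", "six_yards_central", ("Player A", "Player B")),
]
print(round(non_shot_xg(match_actions), 3))
print(round(non_shot_xg(match_actions, {"Player B": 1.1}), 3))
```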
Take Sunday’s match between Everton and Manchester City, for example. Although Everton won 4-0, our model didn’t see the match as nearly so lopsided. Two of Everton’s goals came with the lead after the 70th minute. Furthermore, Everton took only six shots, and our shot-based expected goals model would expect only about 0.4 goals from those shots, not the four that were scored. Manchester City also was the better team according to our non-shot expected goals model. In all, our composite scores saw the final result as a 2.16-0.84 win for Everton, much narrower than 4-0.
Since all four metrics represent the number of goals a team scored or could have been expected to score during the match, they’re directly comparable, and a team’s composite offensive score is an average of the four metrics; its composite defensive score is an average of the four metrics for its opponent. “An average doesn’t sound very empirical,” you might say, but our testing indicates it does about as well as any other way of combining the metrics. If anything, the expected goals components should count a bit more toward the overall match rating than the goals-based measures, but we have only a little more than six seasons’ worth of data for those components, while we have goals data back to 1888. Therefore, we’re being a little cautious about incorporating this new data. A team is assigned an offensive and defensive rating for a match based on its composite score and the pre-match ratings of its opponent, and these game ratings are combined with the team’s pre-match ratings to produce its updated ratings.
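The averaging step is simple enough to sketch directly. The metric values below are invented for illustration and do not correspond to any real match; the function and key names are likewise hypothetical.

```python
from statistics import mean

METRICS = ("goals", "adjusted_goals", "shot_xg", "non_shot_xg")


def composite_scores(team, opponent):
    """Return (offensive, defensive) composite scores for `team`.

    Offense is the plain average of the team's own four metrics; defense is
    the same average computed over the opponent's metrics.
    """
    offense = mean(team[m] for m in METRICS)
    defense = mean(opponent[m] for m in METRICS)
    return offense, defense


# Hypothetical match: the home side won 2-1 but the underlying numbers were closer.
home = {"goals": 2, "adjusted_goals": 1.8, "shot_xg": 1.4, "non_shot_xg": 1.2}
away = {"goals": 1, "adjusted_goals": 1.0, "shot_xg": 0.9, "non_shot_xg": 0.7}
print(composite_scores(home, away))
```

Swapping the arguments gives the away side's view of the same match, with the two numbers reversed.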
As with our Elo-based rating systems, each team’s ratings change in the offseason. Rather than reverting each team toward the same mean, we revert it toward a time-weighted average of its final rating over the past five seasons. In addition, we adjust each team’s preseason rating based on players it acquires or sells in the offseason.7
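A minimal sketch of this kind of reversion follows, assuming linearly decaying weights over the last five seasons and a fixed reversion fraction. Both choices, and the function name, are assumptions; the text does not give the actual weights, and the transfer adjustment is left out.

```python
def preseason_rating(final_ratings, reversion=0.5, weights=(5, 4, 3, 2, 1)):
    """Revert last season's rating toward a time-weighted multi-season average.

    final_ratings: end-of-season ratings, most recent first (up to five used).
    reversion: assumed fraction of the gap to close (0 = none, 1 = full).
    weights: assumed weights, heaviest on the most recent season.
    """
    recent = list(final_ratings)[: len(weights)]
    if not recent:
        raise ValueError("need at least one season of ratings")
    used = weights[: len(recent)]
    target = sum(r * w for r, w in zip(recent, used)) / sum(used)
    return recent[0] + reversion * (target - recent[0])


# A club whose rating has improved steadily is pulled back toward its history.
print(round(preseason_rating([2.0, 1.8, 1.6, 1.4, 1.2]), 3))
```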
Once we’ve established ratings for every team in the leagues we cover, we forecast upcoming matches with a Poisson model that estimates the number of goals we expect each team to score. The parameters in the model are the offensive and defensive ratings of the two teams, home-field advantage8, and the number of days of rest for each team. We can use these goal forecasts to estimate the probability of each team winning, as well as the chance the match will end in any given score.
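To illustrate how goal forecasts turn into match odds, the sketch below takes each team's expected goals as given and treats the two scores as independent Poisson variables. The independence assumption and the truncation at `max_goals` are simplifications for the example, and the mapping from ratings, home-field advantage and rest days to expected goals is not shown because the text does not specify it.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k goals when lam goals are expected."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def score_probability(home_goals, away_goals, home_xg, away_xg):
    """Chance of one exact scoreline under independent Poisson scoring."""
    return poisson_pmf(home_goals, home_xg) * poisson_pmf(away_goals, away_xg)


def match_probabilities(home_xg, away_xg, max_goals=10):
    """Return (home win, draw, away win) probabilities."""
    home_win = draw = away_win = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = score_probability(h, a, home_xg, away_xg)
            if h > a:
                home_win += p
            elif h == a:
                draw += p
            else:
                away_win += p
    total = home_win + draw + away_win  # renormalize after truncation
    return home_win / total, draw / total, away_win / total


# Hypothetical forecast: home side expected to score 1.8 goals, away side 1.1.
print(match_probabilities(1.8, 1.1))
```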
We then run Monte Carlo simulations to play out each league’s season 10,000 times using our individual match forecasts. As with our other forecasts, we run our Monte Carlo simulations “hot,” meaning that instead of a team’s ratings remaining static within each simulated season, the ratings can rise or fall based on the simulated matches the team plays. In effect, this widens the distribution of possible outcomes by allowing a weak team to go on a winning streak and increase its ratings substantially, or providing for the possibility that a strong team loses its first few games of a simulated season and is penalized accordingly.
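The idea of a “hot” simulation can be sketched with a toy league. Everything specific here is an assumption for illustration: each team has a single rating rather than separate offensive and defensive ratings, and the baseline of 1.3 goals, the home advantage, the update step `k` and the random tiebreak are invented.

```python
import math
import random
from collections import Counter
from itertools import permutations


def sample_poisson(rng, lam):
    """Draw a Poisson-distributed goal count (Knuth's multiplication method)."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def simulate_season(ratings, rng, hot=True, k=0.05, home_adv=0.3):
    """Play a double round-robin and return points per team."""
    ratings = dict(ratings)  # copy so each simulated season starts fresh
    points = Counter({team: 0 for team in ratings})
    for home, away in permutations(ratings, 2):
        home_xg = max(0.1, 1.3 + home_adv + ratings[home] - ratings[away])
        away_xg = max(0.1, 1.3 + ratings[away] - ratings[home])
        hg, ag = sample_poisson(rng, home_xg), sample_poisson(rng, away_xg)
        if hg > ag:
            points[home] += 3
        elif hg < ag:
            points[away] += 3
        else:
            points[home] += 1
            points[away] += 1
        if hot:
            # Move ratings by how far the result beat or missed expectations.
            surprise = (hg - ag) - (home_xg - away_xg)
            ratings[home] += k * surprise
            ratings[away] -= k * surprise
    return points


def title_odds(ratings, n_sims=10_000, seed=0, hot=True):
    """Estimate each team's chance of finishing first."""
    rng = random.Random(seed)
    titles = Counter()
    for _ in range(n_sims):
        points = simulate_season(ratings, rng, hot=hot)
        best = max(points.values())
        leaders = [t for t, p in points.items() if p == best]
        titles[rng.choice(leaders)] += 1  # random tiebreak, a simplification
    return {team: titles[team] / n_sims for team in ratings}


print(title_odds({"Strong FC": 0.4, "Average FC": 0.0, "Weak FC": -0.4}, n_sims=2000))
```

Calling `title_odds` with `hot=False` on the same inputs gives a static-ratings baseline to compare against.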
Leagues and tiers
One challenge when building such a system is the large number of leagues around the world: we have over 400 in our database. Determining a team’s strength within its league is relatively straightforward, but figuring out its strength relative to teams in other leagues is harder, because there are often few matches between teams in different leagues or regions. For example, clubs in the Americas rarely play European clubs aside from the Club World Cup or summer warmup matches, for which European sides often don’t field their best teams.
To compare different leagues, we’ve come up with a tiered system. Each league belongs to a tier, and each successive tier is a bit weaker9 than the one above it. We calculated these tiers using both an analysis of interleague matches (e.g. Champions League or Europa League) and UEFA’s league-strength coefficients.
|Tier|Countries|
|1|England, Germany, Italy, Spain|
|4|Belgium, Czech Republic, Netherlands, Russia, Ukraine|
|5|Austria, Bulgaria, Croatia, Denmark, Finland, Greece, Hungary, Ireland, Israel, Norway, Poland, Romania, Scotland, Slovakia, Slovenia, Sweden, Switzerland, Turkey|
|6|Albania, Andorra, Armenia, Azerbaijan, Belarus, Bosnia, Cyprus, Estonia, Faroe Islands, Georgia, Iceland, Kazakhstan, Latvia, Lithuania, Luxembourg, Macedonia, Malta, Moldova, Montenegro, Northern Ireland, Serbia, Wales|
Right now we’re about halfway through the European club season, and several leagues have good races brewing for the last few months. You can follow along at our interactive.