FiveThirtyEight

Looking for a World Cup favorite? All you really need to know is this: The World Cup gets underway Thursday in Sao Paulo, and it’s really hard to beat Brazil in Brazil.

Today we’re launching an interactive that calculates every team’s chances of advancing past the group stage and eventually winning the tournament. The forecasts are based on the Soccer Power Index (SPI), an algorithm I developed in conjunction with ESPN in 2010. SPI has Brazil as the heavy favorite, with a 45 percent chance of winning the World Cup, well ahead of Argentina (13 percent), Germany (11 percent) and Spain (8 percent).1

Here’s where I’d insert the punchline about how you didn’t need a computer to tell you that Brazil is the favorite. But some of you apparently did.

True, Brazil is the betting favorite to win the World Cup — but perhaps not by as wide a margin as it should be. The team’s price at the betting market Betfair as of early Sunday evening implied that it has about a 23 percent chance of winning the World Cup2 — only a little better than Argentina (19 percent), Germany (13 percent) and Spain (13 percent).
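Because market prices across the whole field add up to slightly more than 100 percent, the implied chances have to be rescaled before they can be compared with SPI's. A minimal sketch of that proration in Python follows; the team names and raw figures are made-up placeholders, not actual Betfair prices.

```python
def prorate(implied):
    """Rescale implied probabilities so that they sum to exactly 1."""
    total = sum(implied.values())
    return {team: p / total for team, p in implied.items()}

# Hypothetical raw implied probabilities (not real market data); they sum to 1.05.
raw = {"Team A": 0.42, "Team B": 0.315, "Team C": 0.21, "Team D": 0.105}
fair = prorate(raw)
```

Each team keeps the same share of the total; only the scale changes, so the ordering of favorites is unaffected.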

Argentina, Germany and Spain, like Brazil, are wonderful soccer teams. You could perhaps debate which of the four would be favored if the World Cup were played on a hastily constructed soccer pitch somewhere in the middle of the desert.

But this World Cup is being played in Brazil. No country has beaten Brazil on its home turf in almost 12 years. Brazil’s last loss at home came in a friendly on Aug. 21, 2002. That game against Paraguay, incidentally, is one the Brazilians may not have been particularly interested in winning. Brazil had won the World Cup in Japan earlier that summer; the Paraguay match was the team’s homecoming. Although Brazil started most of its regulars, by midway through the game it substituted out almost all of its stars.

To find a loss at home in a match that mattered to Brazil (a World Cup qualifier, or part of some other tournament), you have to go back to 1975, when Brazil lost the first leg of the Copa América semifinal to Peru. None of the players on Brazil’s current World Cup roster was alive at the time.

It may be that the impact of home-field advantage is gradually declining in international soccer. Travel conditions are somewhat better than they were a few decades ago — provided you’re not flying coach, which international soccer stars normally aren’t. Meanwhile, the rise of the international transfer market means that those stars may be playing far from home to begin with. Of the 23 men coach Luiz Felipe Scolari selected for the Brazilian team, all but five play for club teams in Europe. (It’s hard to know for sure, but one imagines that if Pelé were playing today, it might be for Real Madrid or Bayern Munich — not Santos.)

Even so, home-field advantage is large in soccer as compared with other sports — especially in transcontinental competition, where travel distances are longer. In World Cups since 1990, a period that includes several hosted by countries that didn’t have winning soccer traditions, home teams have a record of 27 wins, six draws, and six defeats.3 SPI’s estimates of home-field advantage are based on more recent data still — games from late 2006 onward.

But Brazil’s edge is not based solely on home-field advantage.

The challenge of rating international soccer teams

Suppose we insist on a purist’s approach to rating the teams. First, we look only at relatively recent matches (those since the completion of the previous World Cup in South Africa). Second, we look only at important games, excluding all friendlies. Third, we pay no attention to the scoring margin — wins, losses and draws are all that matter. And fourth, we look only at games against top-flight competition — specifically against other teams that qualified for this year’s World Cup. Each team’s record in such games is as follows4:

[Table: each team’s wins, losses and draws in competitive matches against other 2014 World Cup qualifiers since the 2010 World Cup]

Our first problem comes with the small sample sizes. Brazil and Germany have played just six of these games in the almost four years since South Africa, for example. But it gets a lot worse. England has played only four. The Netherlands has played only two. Cameroon and Ghana haven’t played any at all. This isn’t a completely useless list. In fact, Brazil, Argentina, Spain and Germany emerge as a reasonably clear top four.

The other big problem is that almost all of this play occurred within continents, such as for continental championships and in World Cup qualifying matches. (The Confederations Cup, held in Brazil last year and dominated by the home team, was the major exception.) The United States’ record of five wins, three losses and one draw looks relatively promising, for instance. But all those games were played against the three other North American teams who qualified: Mexico, Honduras and Costa Rica. It’s pretty well established that the U.S. usually gets the better of Costa Rica and Honduras, and can hold its own against Mexico. That doesn’t say much about whether the U.S. can beat Germany, Ghana or Portugal.

There simply isn’t much information about how particular national soccer teams play against one another when they have the most on the line, especially in games involving teams from different continents. That’s why they play the World Cup, of course. But that isn’t very helpful in trying to anticipate the tournament’s outcome.

A quick introduction to SPI

I designed SPI to address some of these problems. SPI is a little complex as compared with something like our NCAA basketball projection model. Complexity isn’t necessarily a good thing when it comes to a forecasting model. Among other problems, more complex models may require more computational power (SPI takes a long time to run) and more time to prepare and clean data (SPI requires us to link players between club and international competition, not so easy given the state of soccer data). Also, more complex models may be less transparent and harder to explain. There’s something to be said for a simple model that you know to be flawed, so long as you can point out when and where those flaws are likely to occur.

With that said, we’ve been reasonably pleased with SPI’s results in 2010 and since, and it’s less complex in principle than in practice. The principles behind it are as follows:

  • It’s predictive, rather than retrospective. It’s not trying to reward teams for good play — it’s trying to guess who would win in a match played tomorrow.
  • It weights matches on a varying scale of importance based on the composition of lineups. Sometimes even friendly matches are taken quite seriously, such as if a team is playing against a historic rival, or if it badly needs a tune-up before an upcoming tournament. Sometimes even tournament matches are blown off if a team has already clinched its position. Where there is sufficient data to do so, SPI evaluates whether a team has its best lineup in the game by comparing it against the lineups used in the most important matches. We’d know that Brazil wasn’t taking its 2002 friendly against Paraguay all that seriously, for instance, because it pulled all the players who helped it win the World Cup just months earlier.
  • It assigns both offensive and defensive ratings to teams (as some basketball-rating systems like Ken Pomeroy’s do). The offensive and defensive ratings are meant to reflect how many goals a team would score and allow if it played an average international team.5 A lower defensive rating is therefore better, while a high offensive rating is good. Soccer is a fluid sport, so offense and defense aren’t easy to separate. Nevertheless, there are some useful reasons to handle things in this way. In particular, we’ve found that SPI defensive ratings have a little more predictive power than offensive ratings in games against elite competition, like most of those matches that will be played in the World Cup. This may reflect the fact that high offensive ratings can result from running up the score against inferior competition. (Among the “big four” teams this year, Germany is notable for having a prolific offense but a back line that sometimes concedes soft goals.)
  • Finally, in addition to rating national teams, SPI uses data from major international club leagues (England, Spain, Germany, Italy and, newly this year, France) and competitions (like the Champions League and the Europa League) to rate their players. This works by assigning a plus-minus rating to each player on the pitch for a given match (see here for much more detail). The plus-minus system isn’t that advanced because the data isn’t either — we basically have to make a lot of inferences from goals, bookings and starting lineups and substitutions. Still, merely knowing that a player is in the starting lineup for FC Barcelona or Chelsea tells you a fair amount about him.

Technically speaking, SPI is two rating systems rolled into one: one based solely on a national team’s play, and one that reflects a composite of player ratings for what SPI projects to be a team’s top lineup. Usually the two components are strongly correlated with one another. But there are some minor exceptions. The United States, for instance, would rank something like 15th in the world based solely on our national team’s play, but SPI has us a little lower because American players aren’t accomplishing much in Europe. The contrast would be a team like France — its national team results have been inconsistent, but it always has a lot of talent, which may or may not come together.

A tiny bit more housekeeping about SPI and the interactive: First, in addition to an offensive and defensive rating, each team also has an overall SPI rating (for instance, 89.1 for Spain). This reflects what percentage of the possible points a team would accumulate6 if it played a round-robin against every other national team.
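To make the definition concrete, here is a rough sketch of how a share of possible points could be computed from a team's chances against each opponent, at three points for a win and one for a draw. The function name and the matchup probabilities are hypothetical, chosen only for illustration; they are not SPI figures.

```python
def points_share(matchups):
    """matchups: list of (p_win, p_draw) pairs, one per opponent.

    Returns expected points as a percentage of the maximum possible
    (three points per match).
    """
    expected = sum(3 * p_win + 1 * p_draw for p_win, p_draw in matchups)
    return 100 * expected / (3 * len(matchups))

# Hypothetical chances against three opponents (not SPI figures).
example = [(0.8, 0.15), (0.6, 0.25), (0.5, 0.3)]
share = points_share(example)  # about 71.1
```

A team that won every match would score 100; one that drew every match would score about 33.3.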

This definition is fairly obscure; the more interesting question is about a team’s chances against the others it will actually face in Brazil. These are also listed in the interactive: For instance, the United States has a 38 percent chance of beating Ghana and a 29 percent chance of drawing with it. We’ll be updating the numbers at the conclusion of each match.7
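Advancement odds of this kind come from simulating the matches many times over. The sketch below is a bare-bones illustration, not the code behind the interactive. Only the United States vs. Ghana figures (38 percent win, 29 percent draw) come from this article; every other probability is a placeholder, and ties on points are broken at random, which is a simplification because the real tiebreakers use goals.

```python
import random
from itertools import combinations

def simulate_group(teams, probs, n_sims=20000, seed=0):
    """Estimate each team's chance of finishing in the top two of a group.

    probs[(a, b)] = (P(a wins), P(draw)) for each pair in the order
    produced by combinations(teams, 2).
    """
    rng = random.Random(seed)
    advance = {t: 0 for t in teams}
    for _ in range(n_sims):
        points = {t: 0 for t in teams}
        for a, b in combinations(teams, 2):
            p_win, p_draw = probs[(a, b)]
            r = rng.random()
            if r < p_win:
                points[a] += 3
            elif r < p_win + p_draw:
                points[a] += 1
                points[b] += 1
            else:
                points[b] += 3
        # Random secondary key: a stand-in for the real goal-based tiebreakers.
        ranked = sorted(teams, key=lambda t: (points[t], rng.random()), reverse=True)
        for t in ranked[:2]:
            advance[t] += 1
    return {t: advance[t] / n_sims for t in teams}

teams = ["Germany", "Portugal", "United States", "Ghana"]
probs = {
    ("Germany", "Portugal"): (0.50, 0.27),        # placeholder
    ("Germany", "United States"): (0.60, 0.23),   # placeholder
    ("Germany", "Ghana"): (0.62, 0.22),           # placeholder
    ("Portugal", "United States"): (0.42, 0.28),  # placeholder
    ("Portugal", "Ghana"): (0.45, 0.28),          # placeholder
    ("United States", "Ghana"): (0.38, 0.29),     # from the article
}
odds = simulate_group(teams, probs)
```

Because exactly two teams advance in every simulated group, the four estimates always sum to 2.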

There’s a lot more detail available on SPI here and here. The main improvement in the model since 2010 is that Alok Pattani and his colleagues at ESPN Stats & Info have put a lot more work into using SPI to predict the results of individual matches, and particularly the distribution of possible final scores. These match projections are calibrated based on the historical results that most resemble the World Cup, i.e. competitive (non-friendly) matches between the top 100 SPI teams. They use something called a diagonal inflated bivariate Poisson regression to estimate the distribution of possible outcomes. The fancy math is necessary because goals scored and goals allowed are used as tiebreakers in qualifying out of World Cup groups, so knowing the chance of a 1-0 win compared to a 2-1 win or a 2-0 win is sometimes important.
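For readers curious what a distribution of final scores looks like, here is a simplified sketch. It is not the ESPN Stats & Info model: it assumes each side's goals follow independent Poisson distributions (the real model is bivariate and fitted by regression) and then mixes in extra weight on drawn scores, which is the "diagonal inflation." The scoring rates and inflation parameters are hypothetical.

```python
import math

def poisson_pmf(k, lam):
    """Probability of exactly k goals when the expected number is lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def score_matrix(lam_a, lam_b, draw_inflation=0.05, draw_mean=1.0, max_goals=10):
    """grid[i][j] = probability that team A scores i and team B scores j."""
    size = max_goals + 1
    # Independent Poisson part, scaled down to leave room for the extra draw weight.
    grid = [[(1 - draw_inflation) * poisson_pmf(i, lam_a) * poisson_pmf(j, lam_b)
             for j in range(size)] for i in range(size)]
    # Diagonal inflation: extra probability placed on level scores (0-0, 1-1, ...).
    for k in range(size):
        grid[k][k] += draw_inflation * poisson_pmf(k, draw_mean)
    # Renormalize because the grid is truncated at max_goals.
    total = sum(map(sum, grid))
    return [[p / total for p in row] for row in grid]

def outcome_probs(grid):
    """Return (P(A wins), P(draw), P(B wins)) from a score grid."""
    win = sum(p for i, row in enumerate(grid) for j, p in enumerate(row) if i > j)
    draw = sum(grid[k][k] for k in range(len(grid)))
    return win, draw, 1 - win - draw

# Hypothetical scoring rates for the two sides.
grid = score_matrix(1.4, 1.1)
win, draw, loss = outcome_probs(grid)
p_one_nil, p_two_one = grid[1][0], grid[2][1]
```

Having the full grid, rather than just win, draw and loss chances, is what allows a simulation to apply goal-based tiebreakers.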

Travel distance and South America vs. Europe

We also put a lot of research into evaluating whether travel distance matters (above and beyond home-field advantage). Is it important, for instance, that Uruguay has a much shorter trip to Brazil than Russia or Japan does?

Our findings were a bit ambiguous. We found, first of all, that east-west distance traveled matters much more than north-south distance. In other words, any geographic advantage may reflect the avoidance of jet lag rather than the mere fact of being close to home. However, we also found that while the travel effect was reasonably significant when evaluated based on all World Cup matches dating back to 1952, it’s been much less significant in competitive matches taken from the era for which SPI has highly detailed data (from late 2006 onward).

This may be for the reasons I described above — international travel has probably improved, and the notion of a home country is a little different in a period when most of the best players for Brazil or Argentina now play in Europe anyway. We had a lot of debate about whether to include a “strong” adjustment for east-west distance traveled (one calibrated based on data from 1952 onward), a “weak” adjustment (one based on the much weaker signal from 2006 onward) or not to include it at all, and wound up going with the weak adjustment. The weak adjustment makes little difference — it might reduce the advancement odds for a team like Japan by a couple of percentage points, for instance, but not more than that.
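One simple way to separate the two kinds of distance is to compare longitudes and latitudes directly. This is only an illustration of the idea, not the measure used in SPI; the approximate city coordinates are stand-ins for each team's home base.

```python
def east_west_separation(lon_a, lon_b):
    """Smallest difference in longitude, in degrees (0 to 180)."""
    diff = abs(lon_a - lon_b) % 360
    return min(diff, 360 - diff)

def north_south_separation(lat_a, lat_b):
    """Difference in latitude, in degrees."""
    return abs(lat_a - lat_b)

# Approximate (latitude, longitude) pairs, used as assumptions.
sao_paulo = (-23.5, -46.6)
tokyo = (35.7, 139.7)
montevideo = (-34.9, -56.2)

japan_ew = east_west_separation(tokyo[1], sao_paulo[1])         # about 173.7
uruguay_ew = east_west_separation(montevideo[1], sao_paulo[1])  # about 9.6
```

By this measure Japan is nearly half a world away in the east-west direction, while Uruguay is practically next door.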

You’ll notice that SPI is nevertheless favorably disposed toward the South American teams. It’s not just Brazil — SPI is also slightly higher on teams like Chile, Uruguay and Colombia than other systems are. (The same was true in 2010, when Uruguay and Chile were good bets against the prevailing odds, according to the system.)

The South American teams to qualify for this year’s World Cup have compiled 16 wins, 11 losses and 14 draws against European qualifiers in games played since the completion of the last World Cup. All those matches except those in the Confederations Cup were friendlies, so they may not be that informative. Nevertheless, SPI is placing a big bet on the notion that the level of competition between national teams in South America is at least the equal of and perhaps slightly superior to the level of competition between national teams in Europe. Historically at least, the odds have been somewhat in South America’s favor when games are played in this hemisphere. In World Cups in the Americas since 1950, South American teams have 39 wins, 21 losses and 15 draws in games played against teams from Europe.8 Indeed, no European team has ever won a World Cup played in the Americas.

A whirlwind tour of the eight groups

On the off chance that your eyes didn’t glaze over after “diagonal inflated bivariate Poisson regression,” here’s how SPI sees the groups — and the United States’ chance of advancing.

Group A: Brazil, Cameroon, Croatia, Mexico

There’s little doubt that Brazil is the class of the group — SPI gives the team a 99.3 percent chance of advancing to the knockout stage — and that Cameroon is the weakest link. Just how weak is an open question given Cameroon’s lack of competitive matches against top-flight teams and a threatened boycott over how much its athletes will be paid. But most likely the second knockout slot will go to Mexico or Croatia.

SPI is not fond of either team. It sees Croatia as having the slightly better player talent but Mexico as playing a little better as a unit — despite its struggle to qualify for the World Cup at all. It’s worth mentioning that Mexico’s international record is not so bad outside of this cycle’s World Cup qualification — it dominated the 2011 Gold Cup, for instance.

Part of this is about how much to weigh a longer history of results against more recent ones. The SPI view is that a team’s form can vary a lot from competition to competition but not necessarily in a predictable way, and that you should generally err on the side of the team with the better long-term history. Either way, Brazil (and SPI) would really have to blow it to not pass through the group stage with relative ease.

Group B: Australia, Chile, Netherlands, Spain

This group — not the one the United States is in — is the “Group of Death,” with three teams ranked in the SPI top 10. That’s unfortunate for Australia, which is the odd team out and has less chance than any other squad of advancing to the knockout stage, according to SPI.

Instead the questions are, first, whether the Netherlands or Chile is superior, and second, whether both might be strong enough to deny Spain a place in the knockout stage.

SPI’s answer to the first question is Chile — but both teams are hard to rate. Chile has been prone to playing well against weaker competition but not so well against the world’s elite; that could be a sample-size fluke or it could be something real. The Netherlands, meanwhile, played quite miserably in Euro 2012 after advancing to the World Cup final in 2010. That could also be a fluke, but the team is aging, as Robin Van Persie, Wesley Sneijder and Arjen Robben each recently celebrated their 30th birthdays.

Put it like that, and Spain seems safe. But SPI estimates there’s a 20 percent chance that both the Netherlands and Chile play up to the higher end of the range, or they get a lucky bounce, or Australia pulls off a miracle, and Spain fails to advance despite wholly deserving to.

Group C: Colombia, Greece, Ivory Coast, Japan

This is one of the weaker groups and sets up nicely for Colombia, which has plenty to recommend it despite playing in its first World Cup since 1998. In contrast to Chile, Colombia has held up reasonably well against the world’s elite: a draw against the Netherlands in the Netherlands and against Argentina in Argentina; a win against Belgium in Belgium. The team also has some questionable results, however, like draws in friendlies against Senegal and Tunisia.

But who can challenge Colombia? Greece is just the sort of team that SPI usually isn’t keen on: a mid-tier European squad that lacks elite talent. However, the alternatives are Japan and Ivory Coast, and this does not look like a promising year for teams outside of Europe and South America, who collectively have just a 2.2 percent chance of winning the World Cup. Japan also has as far to travel as any team in the field and is indeed nearly antipodal to Brazil. The better bet is probably Ivory Coast, which is well ahead of both Japan and Greece in the player-ranking component of SPI, but whose captain, Didier Drogba, is now 36 years old. It’s a flawed group of opponents, although Colombia has sometimes lost or drawn against flawed opponents.

Group D: Costa Rica, England, Italy, Uruguay

Betting markets see England, Italy and Uruguay as about equally likely to advance while Costa Rica is in a distant fourth place. SPI, by contrast, has England and Uruguay ahead of Italy and views the group as middling enough that Costa Rica could pull off a huge upset.

Both England and Italy rank more highly according to the player-rating component of SPI than based on their play as national teams. This is a common state of affairs for England but less so for Italy, which rarely has among the best offenses in the world but which has normally played more consistent defense. Instead, Italy conceded 10 goals in five matches in the Confederations Cup last year.

It also might not matter much in the end. England, Italy and Uruguay are the sort of teams that might be able to entertain championship dreams in a World Cup with more parity, but not in one where they would have to overcome Brazil, Argentina, Germany or Spain at some point.

Group E: Ecuador, France, Honduras, Switzerland

SPI is bearish on most European teams as compared to the consensus. France is one of the closer things to an exception: SPI has it as the seventh-best team in the world, whereas it ranks 12th in the Elo ratings and 17th according to FIFA. (As a side note, the Elo ratings are perfectly reasonable whereas the FIFA ratings are not. FIFA ranked Brazil as just the 22nd-best team in the world a year ago — it has since climbed to third — a proposition about as ridiculous as hoping to host a World Cup in Qatar.) The reason is the player-rating side of SPI. France has arguably as much player talent as any team but Brazil, Germany, Spain or Argentina — but its national team results have been inconsistent for a long while.

But France is drawn into a reasonably good group. Switzerland, for some reason, ranks sixth in the world in the FIFA rankings but Elo has it 16th and we have it 21st. Ecuador, which has some credible results against European teams (a draw last month in the Netherlands; a win last year against Portugal) might be the tougher out.

Group F: Argentina, Bosnia-Herzegovina, Iran, Nigeria

It would be a major upset if Argentina failed to advance to the knockout stage — SPI gives the team a 93 percent chance of doing so, the second-highest total in the field after Brazil. SPI also has Argentina as the second-best team in the world, so that’s no huge surprise, but it has an easier draw than Germany or Spain.

Still, Bosnia-Herzegovina, playing in its first World Cup under that flag, is the 13th-best team in the world according to SPI. It’s also one of the most offense-minded teams in the field and not one that will allow opponents to play it safe. For that reason, the first match of the group — between Argentina and Bosnia-Herzegovina on Sunday — could dictate how the rest of the group plays out. Even a loss, however, probably wouldn’t prevent Argentina and Lionel Messi from advancing.

Group G: Germany, Ghana, Portugal, United States

American coach Jurgen Klinsmann is the anti-Joe Namath: He made news by predicting that the United States wouldn’t win the World Cup, and suggested that the team’s goal instead was to advance to the knockout stage.

Statistically speaking, Klinsmann’s assessment is prudent. The U.S., according to SPI, has a 36 percent chance of advancing through Group G, but only a 0.4 percent probability (about 1 chance in 250) of coming home with the World Cup trophy.

Every now and then teams defy the odds. The Villanova Wildcats had perhaps only a 1-in-800 chance of winning the NCAA men’s basketball tournament in 1985 based on their play up to that point, and they won. Still, in such a top-heavy World Cup — one where teams like England and Italy are stretching to entertain championship dreams — this probably isn’t the tournament for Cinderella stories.

And yet, there may be ever so slightly more pessimism than there ought to be about Klinsmann’s lesser goal of leading the U.S. to the knockout stage. The 36 percent chance SPI gives the United States isn’t great — and it’s fallen some since the World Cup draw was announced in December. But it’s a little higher than the prevailing betting odds, which put the Americans’ chances at about 26 percent.

It’s not that SPI takes an especially optimistic view of the U.S. team. The player-rating component of the system hurts it, as I mentioned. While Klinsmann has somewhat deliberately tried to steer the roster toward players who are seeking to gain experience in Europe rather than in MLS, there’s a big difference between playing for Stoke City or Sunderland and playing for Arsenal or Man U.

However, there may be a bit of irrational fear around Ghana. The African teams did little to distinguish themselves in the 2010 World Cup despite a wonderful opportunity in South Africa. They’re hard to peg because they don’t play competitive matches against the rest of the world all that often, but SPI does not have them on the rise this year.

Portugal? SPI is more down on the Team of Five than it seems it should be. In SPI’s defense, Portugal was a little underwhelming in World Cup qualifying, drawing twice with Israel and once with Northern Ireland. And the team isn’t deep: While Cristiano Ronaldo is one of the best two or three footballers in the world, Portugal has no other player who clearly belongs in the top 100.

Germany? Well, they’re really good. But as an offense-minded squad, the team might be ever so slightly more prone toward letting in a soft goal and drawing (although probably not losing) a game that it shouldn’t. Keep hope alive, America.

Group H: Algeria, Belgium, Russia, South Korea

This is the weakest group in the field by some margin just about any way you slice and dice it. According to SPI, it has both the worst best team (Belgium is dangerous but ranks 11th in the world — every other group has at least one team in the top 10) and the worst worst team (Algeria ranks 65th in the world per SPI, the worst in the 32-team field).

The group does provide an opportunity for Belgium to gain a little momentum. It will be important for the Red Devils to win the group outright because the second-place entrant from Group H will face the winner of Group G in the Round of 16 — probably Germany.

The biggest threat to Belgium is from Russia. Russia is hard to peg because it doesn’t play highly competitive international matches all that often as a national team, and because the entire roster is drawn from players whose club play is in Russia itself — not one of the leagues that SPI tracks. From what we can tell, however, Russia is a fairly defense-minded team — possibly a prudent approach in a weak group where a draw against Belgium and a 1-0 win against either Algeria or South Korea would do the trick. It’s also a team that, like the U.S., might take some pride in advancing. Russia will host the World Cup in 2018 but has never made it past the group stage.9

Footnotes

  1. These numbers are as of June 9, 2014. The numbers from Betfair referenced below are as of the evening of June 8.
  2. The Betfair odds for the 32 teams add up to slightly greater than 100 percent. I’ve prorated each team’s odds so that they equal 100 percent exactly.
  3. These totals count both Japan and South Korea as home teams in 2002. They account for games decided on penalty kicks as wins or losses; some soccer statisticians prefer to record such games as draws.
  4. These figures treat games won on penalty kicks as draws.
  5. With more than 200 national sides, an average international team is pretty bad: think of a team like Canada that isn’t atrocious but that would rarely qualify for the World Cup.
  6. At three points for a win and one point for a draw.
  7. These updates will immediately reflect the direct effects of a team’s performance. For instance, if the United States wins against Ghana, its chances of advancing from Group G will be considerably improved. There’s also a secondary or indirect effect: the fact that the U.S. beats Ghana will slightly increase its SPI score, which could further improve its chances in future matches. We’ll be running new simulations at the end of each match but new SPI figures only once per day (they are more computationally intensive than the simulations themselves), generally in the late evening. Thus, you may see a change in a team’s odds from the evening to the next morning even though no new games have been played.

    Finally, keep in mind that one team’s performance potentially affects the SPI for every other team. If Ghana beats the U.S., for example, that will slightly improve SPI’s estimates of how strong Africa is compared to other continents, and could thereby also improve the odds for teams like Nigeria and Ivory Coast.

  8. This particular statistic treats games decided by penalty shootouts as wins or losses rather than draws. I know I’ve been inconsistent about this, as I’ve been drawing data from different sources.
  9. The Soviet Union didn’t have much World Cup success either, but did finish in fourth place in 1966 and made the quarterfinals on other occasions.
