To mark the beginning of the official campaign period in advance of the May 7 election (now 38 days away), FiveThirtyEight is launching United Kingdom general election predictions, developed by me and my colleagues at electionforecast.co.uk.
Last week, we provided a crash course in the political history of the U.K. Here, we’ll introduce the key ideas behind the forecasts, as well as some of the major challenges of forecasting U.K. elections.
Interpreting polls of the United Kingdom
As in U.S. presidential elections, a lot of the polling here is done at the “national” level. These polls almost always exclude Northern Ireland, so they cover 632 of the 650 parliamentary constituencies in the U.K. This article focuses on our model for the constituencies in England, Scotland and Wales; we use an entirely separate model for Northern Ireland because of the separate party system and more limited data there.
We can draw on a lot of polling history to make sense of U.K. polls. Several databases and websites compile the topline figures for the Conservatives, Labour and the Liberal Democrats from all the polls back to the 1970s. They give a broad historical perspective on how useful U.K. polls were for predicting elections in the past. Like the forecasting methods developed by FiveThirtyEight editor-in-chief Nate Silver and others for use in other countries, ours pools all the polls, taking house effects (each pollster's systematic lean toward or away from particular parties) into account.
Once we do this and compare these polling averages to the results of subsequent elections, it becomes clear that there are consistent differences in the U.K. between what the polls show and what actually happens in elections. The polls evolve in certain patterns as the elections approach, and they tend to be biased in particular ways on election day.
What the average poll says now is not the best guess of what will happen in the subsequent election. Polls tend to overstate changes in support for one party or another since the last election. If, for example, six months before the election, the average poll has the Conservatives down 5 percentage points from their results in the last election, come election day, they end up about 3 percentage points down on average. We can estimate how the relative weight to put on the polls changes as elections approach and use that in our forecast.[1]
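The adjustment described above can be sketched as a simple shrinkage of the polled change toward zero. This is only an illustration: the function name and the 36 percent starting share are hypothetical, and the weight of 0.6 is merely what the "5 points down becomes about 3 points down" example implies at six months out, not an estimate taken from the actual model.

```python
def shrink_poll_change(last_result, poll_average, weight):
    """Return a forecast that keeps only part of the polled change.

    weight = 1.0 trusts the polls fully; weight = 0.0 assumes a repeat of
    the last election. In practice the weight would rise as the election
    approaches (an assumption drawn from the text, not a fitted value).
    """
    polled_change = poll_average - last_result
    return last_result + weight * polled_change


# Hypothetical party: 36.0 percent last time, polling at 31.0 percent now,
# so the polls show a 5-point drop. With weight 0.6 the forecast keeps
# 3 points of that drop.
forecast = shrink_poll_change(last_result=36.0, poll_average=31.0, weight=0.6)
print(round(forecast, 1))
```

The closer the weight gets to 1, the more the forecast simply repeats the current polling average.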
The historical data shows that the polls start moving around much more quickly in the month before each election, when Parliament is dissolved and an election announced. This year, for the first time, the election date was set far in advance[2] and the campaign period is a bit longer, so it is difficult to know how the polls will perform.
Setting aside exactly when and how quickly the predictiveness of the polls increases, we can get a pretty good idea from this historical analysis of the overall predictive power of the polls and how to weight them. But this only gets us a U.K. (minus Northern Ireland) vote share for each party. This is the easy part of the problem; the difficult part is translating that vote share into seats.
Interpreting polls of constituencies
In past elections, options for how to forecast seats were very limited. The classic solution was to use an approach called uniform national swing (UNS). That is, if our forecast of the U.K. vote share had Labour up 3 percentage points and the Conservatives down 3 percentage points, we would simply add 3 percentage points to Labour’s results in the last election in each constituency, subtract 3 from the Conservatives’ results in each constituency and tally up the winners using the resulting vote shares. This approach had the virtue of simplicity, and in a two-party world, it worked remarkably well. More sophisticated models applied UNS and then added some random noise to account for the observed degree of deviations in past elections.
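For readers who think in code, here is a minimal sketch of uniform national swing. The three constituencies and their vote shares are made up for illustration, the function names are hypothetical, and flooring shares at zero is an added assumption rather than part of the classic method.

```python
def uniform_national_swing(last_results, national_swing):
    """Add each party's national swing to its last result in every seat.

    last_results: {constituency: {party: vote share in percent}}
    national_swing: {party: change in national share, in points}
    """
    projected = {}
    for seat, shares in last_results.items():
        projected[seat] = {
            # Assumption: a share cannot fall below zero.
            party: max(share + national_swing.get(party, 0.0), 0.0)
            for party, share in shares.items()
        }
    return projected


def tally_winners(projected):
    """Count how many seats each party wins under first-past-the-post."""
    seats = {}
    for shares in projected.values():
        winner = max(shares, key=shares.get)
        seats[winner] = seats.get(winner, 0) + 1
    return seats


# Hypothetical results from the last election.
last_results = {
    "Seat A": {"Con": 45.0, "Lab": 40.0, "LD": 15.0},
    "Seat B": {"Con": 38.0, "Lab": 42.0, "LD": 20.0},
    "Seat C": {"Con": 50.0, "Lab": 30.0, "LD": 20.0},
}
# Labour up 3 points, Conservatives down 3, as in the example above.
swing = {"Con": -3.0, "Lab": 3.0}

projected = uniform_national_swing(last_results, swing)
seat_totals = tally_winners(projected)
print(seat_totals)
```

In this toy example the swing flips Seat A from the Conservatives to Labour and leaves the other two seats with their previous winners.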
But with the breakdown of two-party politics, UNS cannot predict how many seats each party will get. For example, the Scottish National Party is likely to increase its total national vote share from 1.7 percent in the last election to something in the 3 percent to 5 percent range in this election. But we know these gains will not be distributed uniformly across the U.K.; because the SNP only fields candidates in the 59 Scottish constituencies, such an increase implies that the party will gain something like 20 percentage points in each of those seats, enough to win most of them.
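The arithmetic behind that claim can be checked roughly. The sketch below assumes that every constituency casts about the same number of votes, which is a simplification, and uses a hypothetical helper name. Under that assumption a national gain is concentrated by a factor of 632/59 in the seats the party actually contests.

```python
SCOTTISH_SEATS = 59   # seats the SNP contests
GB_SEATS = 632        # seats covered by the "national" polls


def implied_swing_per_seat(national_gain, seats_contested, total_seats):
    """Spread a national gain over only the seats a party contests.

    Assumes equal votes cast per constituency (a simplification).
    """
    return national_gain * total_seats / seats_contested


last_share = 1.7  # SNP national share at the last election, in percent
for new_share in (3.0, 4.0, 5.0):
    gain = new_share - last_share
    per_seat = implied_swing_per_seat(gain, SCOTTISH_SEATS, GB_SEATS)
    print(f"national {new_share:.1f}% -> about {per_seat:.0f} points per Scottish seat")
```

Across the 3 percent to 5 percent range this gives per-seat gains from the mid-teens to the mid-30s, which brackets the "something like 20 percentage points" figure.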
Everyone knows that vote share swings vary from constituency to constituency; the question is what kinds of variation we can predict given available data. An obvious solution would be to apply uniform swing at the level of regions, rather than across the whole of Great Britain. And indeed, that is what the most successful forecasts of the 2010 general election did. Uniform regional swing assumes we can measure swing at the regional level (using occasionally published regional polls) and therefore capture whether changes vary between regions, although not within them.
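A regional version needs only one change: each seat looks up the swing for its own region. The sketch below is illustrative only. The seats, regions and swing values are invented, the function name is hypothetical, and flooring shares at zero is an added assumption.

```python
def uniform_regional_swing(last_results, seat_region, regional_swing):
    """Apply each region's swing to every seat in that region.

    last_results: {seat: {party: share in percent}}
    seat_region: {seat: region name}
    regional_swing: {region: {party: change in points}}
    """
    projected = {}
    for seat, shares in last_results.items():
        swing = regional_swing[seat_region[seat]]
        projected[seat] = {
            # Assumption: a share cannot fall below zero.
            party: max(share + swing.get(party, 0.0), 0.0)
            for party, share in shares.items()
        }
    return projected


# Hypothetical seats and swings, chosen only to show the mechanics.
last_results = {
    "Scottish seat": {"Lab": 45.0, "SNP": 20.0, "Con": 15.0, "LD": 20.0},
    "English seat": {"Con": 42.0, "Lab": 40.0, "LD": 18.0},
}
seat_region = {"Scottish seat": "Scotland", "English seat": "England"}
regional_swing = {
    "Scotland": {"SNP": 20.0, "Lab": -15.0},
    "England": {"Lab": 3.0, "Con": -3.0},
}

projected = uniform_regional_swing(last_results, seat_region, regional_swing)
winners = {seat: max(shares, key=shares.get) for seat, shares in projected.items()}
print(winners)
```

Here Labour loses ground in the Scottish seat while gaining in the English one, a pattern that a single national swing figure could not produce.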
But, clearly, we could take this logic a lot further. If we could figure out how to measure swings at the constituency level, our predictions could improve even more.
The primary obstacle to doing this in past elections was that very few constituency-level polls were published. The political parties were conducting such polls for internal purposes, but the media was not commissioning them for publication. U.K. parliamentary constituencies have small populations, polls are expensive, and the U.K. media operates primarily at the national level. Why would a national newspaper pay for a poll of a single constituency, which has fewer than 100,000 residents, and which would reveal information just about that constituency? Constituency polling is only useful if a lot of it is being done, across many constituencies. The resources are there to do this in the aggregate, but the media doesn’t have a strong incentive to commission these polls.
A new development in this election is that Michael Ashcroft, a (very wealthy) Conservative in the House of Lords, has been personally commissioning constituency polls at a rate of about a dozen a month for the past year. While there has been much debate about his motives, and about whether his polls are trustworthy, he is abiding by the disclosure rules of the British Polling Council and seems to be releasing all the polls that he commissions. These polls are one of the best resources we have on what is happening at the constituency level. We have also been able to get access to individual-level polling data from YouGov, the most frequent pollster at the national level, which means that we also have small samples for every constituency in addition to the larger Ashcroft polls for some constituencies.
One limitation with these polls is that we do not have the same historical record that we have for the national polls to help us calibrate how predictive they are. A major concern is that Ashcroft’s constituency-level polls reveal substantial differences in the relative support for the parties depending on how the questions are asked, and we have little evidence to indicate which of these questions is more predictive. Ashcroft asks two voting-intention questions in all his constituency polls. The first is the “generic” question that is widely used in U.K. polls: “If there was a general election tomorrow, which party would you vote for?” This is followed up with a more “specific” question: “Thinking specifically about your own parliamentary constituency at the next general election and the candidates who are likely to stand for election to Westminster there, which party’s candidate do you think you will vote for in your own constituency?”
In our U.K. political history article, we noted that support for the Liberal Democrats has fallen since the 2010 election, when the party won 23 percent of the U.K. vote, to under 10 percent in U.K. voting intention polls recently. Because of the first-past-the-post electoral system, the Liberal Democrats were able to secure only 9 percent of the seats with this 23 percent of the vote. With the party’s likely decline in national vote share, its seat forecast depends on how much support it retains where it has incumbent MPs. The generic and specific polling questions have very different implications here. The Liberal Democrats do far better in the specific question than they do in the generic question, especially where they are the incumbents.
We think the specific question is likely to be more accurate based on indirect evidence from the last election cycle, but we won’t really know until the election occurs because it hasn’t been deployed so widely before. We are using the results of the specific question where available in our prediction model and calibrating nonspecific questions asked by other pollsters in their constituency polls to match. All national polls use the generic form of the question, and so this discrepancy between the two question formats is part of why we expect the Liberal Democrats to outperform their current national polling level.
This may also be part of why the U.K. polls tend to overstate changes in support for the parties. When people respond to the standard generic question, they may be effectively answering the question: “Which party do you like the most?” But when they turn out to vote, and perhaps when they answer the more specific survey question, they are considering the candidates in their constituencies and which parties are competitive there. Since the same parties tend to be competitive in the same places from election to election, the results of the generic question might suggest that some people will change their party from the last election, even though they will not ultimately do so, because the party they like the most is never the party they actually vote for given the circumstances in the constituency they live in.
To use these sources of information about current vote intention, we have to do some modeling to fill in the gaps in the constituency polling and stretch the information in the YouGov data. We use a multilevel regression model to describe how vote intention at the constituency level depends on a variety of factors, including region, incumbency, constituency demographics and results in the last election. We then reconcile the constituency-level vote intentions we get from this data with the national-level forecast that we constructed using the national polls, by applying a swing model that we built from the historical record of constituency vote share swings from election to election.[3] This step is very important because it ensures that our constituency-level forecasts add up to our U.K.-level forecasts so that we get a consistent translation of votes into seats.
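To make the reconciliation idea concrete, here is a deliberately simple stand-in: shift every constituency's estimate by the same amount per party so that the vote-weighted average matches the national forecast. This is not the historical swing model described above, only an additive adjustment chosen for clarity, and all names and numbers are hypothetical.

```python
def reconcile_to_national(constituency_shares, votes_cast, national_target):
    """Shift constituency estimates so they average to the national forecast.

    constituency_shares: {seat: {party: estimated share in percent}}
    votes_cast: {seat: expected number of votes, used as weights}
    national_target: {party: national forecast share in percent}
    """
    total_votes = sum(votes_cast.values())
    # National share implied by the constituency estimates.
    implied = {
        party: sum(
            constituency_shares[seat][party] * votes_cast[seat]
            for seat in constituency_shares
        ) / total_votes
        for party in national_target
    }
    # Same additive shift for every seat (an illustrative simplification).
    shift = {party: national_target[party] - implied[party] for party in national_target}
    adjusted = {
        seat: {party: shares[party] + shift[party] for party in national_target}
        for seat, shares in constituency_shares.items()
    }
    return adjusted, implied


# Hypothetical constituency estimates and turnout.
estimates = {
    "Seat A": {"Con": 40.0, "Lab": 35.0, "LD": 25.0},
    "Seat B": {"Con": 30.0, "Lab": 50.0, "LD": 20.0},
}
votes = {"Seat A": 40000, "Seat B": 60000}
target = {"Con": 35.0, "Lab": 42.0, "LD": 23.0}

adjusted, implied = reconcile_to_national(estimates, votes, target)
print(implied)
print(adjusted)
```

A purely additive shift can push small shares below zero in real data, which is one reason a fitted swing model is preferable to this shortcut.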
We have checked our approach by “retrocasting” the 2010 election using the more limited data available before that election. We did much better than any published forecasts at the time, but, of course, we knew the results of that election when we designed the model. So the retrocast only shows that our approach is not obviously wrong. There is a lot more data to work with this election, which ought to help, but it could also just provide more rope for us to hang ourselves with. Incorporating new data sources is difficult because without a historical record, we can’t be certain of their accuracy. Much has changed since 2010, and what worked then may not work this year. We will have to wait for May 8 to know how well we’ve done this time.