The post-election wave of the British Election Study (BES) Internet panel allows us to take a closer look at possible causes of the polling miss during the recent General Election. We previously identified five possible explanations: 1) “don’t knows” shifting, 2) a late swing among voters, 3) Shy Tories, 4) problems achieving a representative sample and 5) differential turnout. This post outlines the evidence we have gathered so far (see our working paper here for a more detailed account).
Late swing and “don’t knows”
The post-election data immediately casts doubt on two of the theories. In our campaign wave, 7% of people said that they “don’t know” who they would vote for. In the post-election survey (when we can see how undecided respondents ended up voting), we find a very small edge for the Conservatives among previously undecided voters. However, “don’t knows” only contribute around 0.05 percentage points towards the polling gap, so they are unlikely to have been a major factor. Similarly, we find no difference between the proportion of respondents supporting the Conservatives in the campaign wave and the post-election wave, making a late swing unlikely.
Figure 1 Support for each party among BES respondents in the campaign and post-election waves
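As a rough illustration of why the undecided voters cannot account for much of the gap, the arithmetic can be sketched in a few lines of Python. The 7% undecided share comes from our campaign wave; the net Conservative edge among them is an illustrative figure chosen to reproduce the roughly 0.05-point contribution reported above.

```python
# Back-of-the-envelope: how much can "don't knows" move the headline gap?
# The 7% undecided share is from the BES campaign wave; the Conservative
# edge among them is illustrative, chosen to reproduce the ~0.05-point
# contribution reported in the post.

dont_know_share = 0.07      # share of campaign-wave respondents undecided
con_edge_among_dk = 0.007   # illustrative net Con-minus-Lab edge among them

# Their contribution to the Con-Lab gap is their share times their net edge.
contribution_pts = dont_know_share * con_edge_among_dk * 100

print(f"Contribution to Con-Lab gap: {contribution_pts:.3f} points")
```

Even a clear Conservative tilt among such a small group moves the headline gap by only a few hundredths of a point.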
We also have evidence against the Shy Tories theory. One way to examine it is to consider where social pressure on Conservative voters is likely to be strongest. For example, it seems unlikely that Tories would need to be shy in the heavily Conservative Shires, but it is more plausible that they would be shy in traditional Labour heartlands like Sunderland. Figure 2 shows that we actually observe the opposite pattern: the deviation between the proportion of BES respondents saying they voted Conservative and the actual proportion of voters who did is highest in strongly Conservative areas, where we would expect the least social pressure against voting Conservative.
Figure 2 Conservative 2015 vote share in the BES post-election survey and actual results according to 2010 Conservative and Labour shares
We also find no evidence for another aspect of the Shy Tories theory. Several pollsters have suggested that placing the vote intention question later in a survey makes respondents more willing to admit that they plan to vote Conservative. In the first three waves of the BES we randomized the placement of the vote intention question to be either at the start of the survey or at the end after all other questions. We find that the question placement makes no difference to the proportion of respondents intending to vote Conservative. Taking these findings together, we are doubtful whether Shy Tories were a major contributor to the polling miss.
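The question-placement comparison above amounts to testing whether two proportions differ. A minimal sketch of such a check, using illustrative counts rather than BES figures, might look like this:

```python
import math

# Sketch of the question-placement check: compare the share intending to
# vote Conservative when the vote-intention question comes first vs. last.
# The counts below are illustrative, not BES figures.

def two_prop_z(x1, n1, x2, n2):
    """Two-proportion z statistic for H0: p1 == p2 (pooled variance)."""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)                 # pooled proportion
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# Illustrative split: near-identical Conservative shares in both arms.
z = two_prop_z(x1=1700, n1=5000, x2=1712, n2=5000)
print(f"z = {z:.2f}")  # |z| well below 1.96 -> no detectable placement effect
```

A |z| below the conventional 1.96 threshold is consistent with the finding that question placement makes no difference to stated Conservative support.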
We also have more evidence about the representativeness of polling samples used before the election. One possible source of non-representativeness could be the groupings used for weighting by polling firms. For instance, in the graph below, we look at whether the age groupings used for weighting (both by YouGov and many other polling firms) hide variation within those groups (thanks to Helmut Platt for suggesting this possibility). The red line shows the Conservatives' lead in vote share over Labour by age. The vertical lines represent the breakpoints between the standard age bands used for weighting. The bars show the difference between the percentage of BES respondents of a particular age and the percentage of the 2011 population of the same age (e.g. positive bars mean that an age is over-represented in the BES).

The most important deviation is in the oldest age group, where younger (less Conservative-leaning) respondents are over-represented whilst older (more Conservative-leaning) respondents are under-represented. The net effect of this difference is to dampen the Conservative lead. The problem is even greater for the oldest respondents in the sample – those over age 80 make up 5.1% of the population, but only 0.5% of the BES. This evidence suggests there is some pro-Labour bias due to the age groupings used, but this might yet be cancelled out by other parts of the weighting scheme. We will need to examine all the weighting variables before we can draw conclusions about the contribution of non-representative samples to the polling miss.
Figure 3 Deviation in proportion of respondents of each age compared to census (left axis). Vertical lines designate the age boundaries of the weighting age groups. The red line is a LOWESS regression of Conservative-Labour lead against age among BES respondents in wave 6 (right axis).
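To see how a coarse weighting band can hide this kind of skew, consider a stylised simulation. We assume, purely for illustration, that the Conservative lead keeps rising with age inside the top weighting band, while the panel under-recruits the very oldest respondents; weighting the band to its correct total size then leaves the understatement in place.

```python
import random

random.seed(0)

# Stylised example of within-band skew. The linear lead-by-age function and
# the age ranges are illustrative assumptions, not BES estimates.

def con_lead(age):
    """Illustrative Con-Lab lead (points), rising with age within the band."""
    return 0.5 * (age - 60)          # e.g. +10 points at age 80

# True population in the top band: ages spread from 60 up to 90.
population = [random.uniform(60, 90) for _ in range(100_000)]
# Panel sample: same band, but skewed toward its younger end.
sample = [random.uniform(60, 75) for _ in range(100_000)]

true_lead = sum(con_lead(a) for a in population) / len(population)
panel_lead = sum(con_lead(a) for a in sample) / len(sample)

# Weighting the band to its correct *total* size cannot fix this gap,
# because the skew is inside the band.
print(f"true 60+ lead:  {true_lead:.1f} points")
print(f"panel 60+ lead: {panel_lead:.1f} points (understated)")
```

The band-level weight only corrects the band's overall size, so the panel's lead in the band stays below the population's, which is the mechanism behind the dampened Conservative lead described above.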
There is also new evidence for the differential turnout theory. 91.6% of our respondents claim to have voted, compared with 66.4% in Great Britain as a whole. While this partially reflects the fact that polling respondents tend to be more politically interested than the general population, we also have considerable evidence that respondents overstate their turnout: 20% of respondents in areas without local elections claim to have voted in them in 2015; 3–6% of respondents in the campaign wave claim to have voted by post before the postal ballots were actually issued; and 46% of respondents who we could not verify as registered to vote in June 2014 claim to have voted in the 2014 European Elections. In all of these cases, the fibbers lean significantly more Labour than other respondents.
We look at the impact of overstated turnout more precisely by building a predictive model of turnout based on the validated vote in the 2010 BES face-to-face survey. The model accounts for a respondent’s stated likelihood of voting prior to the election, turnout in previous elections, their age, marital status, household income, unemployment and trade union membership, as well as several constituency factors, including the overall turnout in their constituency in the previous General Election. After accounting for these factors, we estimate that our respondents’ turnout is likely to have actually been around 73.4%. Importantly, we can look at how vote intention differs among respondents who have different predicted probabilities of voting. Figure 4 shows that the Labour lead among unlikely voters grew hugely between 2010 and 2015, suggesting that differential turnout is an important factor in explaining the polling miss: considerably fewer of those saying they were going to vote Labour are likely to have actually turned out to vote. Re-weighting our respondents according to their predicted probabilities of voting explains about 25% of the gap in the Conservative lead between the pre-campaign wave of our survey and the actual election results.
Figure 4 Predicted probability of pre-election vote intention by predicted turnout probability in 2010 and 2015. The white bars show the distribution of predicted turnout probability in each year. The shaded areas illustrate the size of the Labour-Conservative gap amongst those less likely to vote for each year.
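The reweighting step itself is straightforward once each respondent has a predicted probability of voting: the headline shares are recomputed with those probabilities as weights. A minimal sketch, with made-up respondents rather than BES data:

```python
# Sketch of turnout reweighting. Each respondent carries a predicted
# probability of voting (from a turnout model); headline vote shares are
# recomputed with those probabilities as weights. All figures below are
# illustrative, not BES data.

respondents = [
    # (vote intention, predicted turnout probability)
    ("Con", 0.90), ("Con", 0.85), ("Con", 0.80),
    ("Lab", 0.90), ("Lab", 0.55), ("Lab", 0.40),
]

def share(party, rows, weighted):
    """Percentage share for `party`, optionally turnout-weighted."""
    w = [(p if weighted else 1.0) for _, p in rows]
    hits = [wi for (v, _), wi in zip(rows, w) if v == party]
    return 100 * sum(hits) / sum(w)

for weighted in (False, True):
    con = share("Con", respondents, weighted)
    lab = share("Lab", respondents, weighted)
    label = "turnout-weighted" if weighted else "raw"
    print(f"{label}: Con {con:.1f}%, Lab {lab:.1f}%, lead {con - lab:+.1f}")
```

Because the low-probability voters in this toy example lean Labour, weighting by turnout probability shrinks the Labour share and opens up a Conservative lead, mirroring the mechanism described above.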
The evidence in the BES suggests that the increased impact of differential turnout is not due to a change in relative enthusiasm between Labour and Conservative supporters since 2010: 84% of Labour supporters in 2015 said that it was “very likely” that they would vote, compared to 86% of Conservative supporters, while in 2010 the figures were 87% and 90% respectively. Rather, the data suggest that the increase in the turnout gap between Labour and the Conservatives can be explained by shifts in party support amongst those who are actually less likely to turn out to vote, even if they say they will. This evidence strongly suggests that differential turnout was a major factor in the polling miss.
If differential turnout is the primary cause of the polling problems, this is relatively good news for pollsters. It should be possible for them to fix many of their problems by using turnout weighting that accounts for the wider set of factors we have identified.
Our analysis of the post-election BES data makes us much more sceptical about late swing, “don’t knows” and Shy Tories. By contrast, we are leaning strongly towards differential turnout as part of the explanation and think that it’s likely that sampling and weighting also played at least some role.