The release of the British Election Study post-election face-to-face survey allows us to revisit the question of why the polls went wrong before the 2015 General Election. Based on our internet panel, we previously examined five possible explanations for the polling miss and concluded that the main culprits were differential turnout and the representativeness of polling samples. We stand by our previous conclusions, but based on our new data we now believe that the representativeness of samples is even more important than we previously thought. We now think that the primary reason the polls went wrong before the election is that they were drawn from unrepresentative samples. In this blog we focus on the representativeness of samples; you can find an updated version of our full paper – including further evidence against the ‘Shy Tory’ theory – here.
The BES face-to-face survey is a random sample face-to-face survey – the gold standard survey research method – that was designed to include respondents who are usually hard to reach in surveys and to maximise representativeness (for more details on the design and representativeness of the survey, see here). By comparing the face-to-face survey with our internet panel (which shows very similar results to the pre-election polls) we can assess where the polls failed to achieve representativeness and how this affected their results.
To start with, we can see that the face-to-face survey was much closer to the actual result than the internet panel. Unlike the panel survey, which underestimated the Conservative–Labour lead by more than 6 points, the face-to-face survey actually overestimates the Conservative lead by 1.47 points. The face-to-face survey is not perfect: it overestimates the Conservative and Labour shares of the vote and underestimates the Liberal Democrat and UKIP shares. We will have to wait until we have finished our vote validation process before we can say whether this represents systematic error, people misreporting turnout, or simply sampling error. Nonetheless, the errors are considerably smaller than those of the internet panel.
Why does the face-to-face survey do a better job of estimating the vote shares than the panel? In short, we think it is because it is much better at getting people who did not vote in the election to answer our survey: reported turnout in the face-to-face survey is 73.3%, higher than actual turnout at the election but considerably lower than the 91.2% reported turnout in the panel. It may seem counter-intuitive that having more non-voters in a sample affects the apparent level of party support among those who did vote, but the reason it does so is actually quite simple.
All surveys, whether they are conducted in person, by phone, or on the internet, aim to achieve a representative sample. They might do this in a number of ways, such as by using a random sampling design or using sampling quotas to ensure there are the correct numbers of certain types of respondents. Once the survey has been conducted, any remaining deficiencies in representativeness can be corrected by weighting the data.
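The weighting step described above can be sketched as a simple ratio adjustment: respondents in over-represented groups are down-weighted and those in under-represented groups are up-weighted until the weighted sample matches the demographic targets. The age bands and shares below are illustrative placeholders, not BES figures.

```python
# Minimal sketch of demographic weighting (all shares are hypothetical).
# Each group's weight is the ratio of its target share (e.g. from the
# census) to its achieved share in the sample.
target_share = {"18-34": 0.28, "35-54": 0.35, "55+": 0.37}  # population targets
sample_share = {"18-34": 0.20, "35-54": 0.35, "55+": 0.45}  # achieved sample

weights = {g: target_share[g] / sample_share[g] for g in target_share}

for group, w in weights.items():
    # Under-sampled groups get w > 1, over-sampled groups w < 1.
    print(f"{group}: weight {w:.2f}")
```

With these illustrative shares, the under-represented 18–34 group is weighted up (1.40) and the over-represented 55+ group is weighted down (0.82) – but note this only corrects the sample towards the *population*, which is exactly the limitation discussed next.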
Demographic targets for quotas and weighting are usually taken from information about the British population as a whole, from sources like the census. However, those who turn out to vote are not representative of the population as a whole – they tend to be older, more affluent, and better educated. The problem for political polling is that those who turn out to vote are also more likely to answer surveys. This may sound like an odd ‘problem’, but when combined with survey targets based on the population, rather than the electorate, it can lead to a distortion in the polls. For example, if young people who vote are more likely to answer surveys than young people who do not vote, then a survey might end up with too many young voters, even when it appears to have the right number of young people. Given that young people tend to be more Labour leaning, this might inflate the Labour share of the vote.
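The mechanism in the example above can be made concrete with some simple arithmetic. All the numbers below are made up for illustration – they are not BES or census estimates – but they show how a sample can hit its population quotas exactly and still over-state Labour support among voters, simply because its young respondents vote at an unrealistically high rate.

```python
# Hypothetical electorate: two age groups with different turnout
# and different Labour support (all figures invented for illustration).
pop_share = {"young": 0.40, "old": 0.60}     # matches population quotas
true_turnout = {"young": 0.45, "old": 0.75}  # young people vote less
labour_pref = {"young": 0.60, "old": 0.30}   # young voters lean Labour

def labour_share_among_voters(turnout):
    """Labour share among those who voted, given group turnout rates."""
    voters = {g: pop_share[g] * turnout[g] for g in pop_share}
    total = sum(voters.values())
    return sum(voters[g] / total * labour_pref[g] for g in voters)

# True Labour share among actual voters.
true_labour = labour_share_among_voters(true_turnout)

# A survey that matches the 40/60 population quota but whose young
# respondents are disproportionately voters (70% report voting
# instead of the true 45%).
survey_turnout = {"young": 0.70, "old": 0.75}
survey_labour = labour_share_among_voters(survey_turnout)

print(f"true Labour share among voters:     {true_labour:.1%}")
print(f"surveyed Labour share among voters: {survey_labour:.1%}")
```

Even though the survey has the "right" number of young people overall, its voter subsample contains too many young voters, and the apparent Labour share among voters rises by around three points in this toy example.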
The table below illustrates this problem by comparing the distribution of age in the face-to-face and panel surveys. When looking at the full sample (voters and non-voters), both surveys are very similar and representative of the population. However, if we only look at those who said they voted, large differences emerge between the two surveys. In particular, there are more young voters and fewer older voters in the panel.
If we compare the full and voter-only columns for each survey it is easy to see why: the full and voter-only columns for the face-to-face survey are much more different from each other than the same columns are for the panel. The sum of the absolute differences between the full and voter-only samples is 14.4 for the face-to-face survey and only 3.9 for the panel. Given that older people vote at much higher rates than young people, we should expect differences between the full sample and the voter-only sample like those the face-to-face survey shows. The panel does not contain enough non-voting respondents, and so these differences do not emerge.
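The summary statistic used above – the sum of absolute differences between two percentage distributions – is straightforward to compute. The age bands and percentages below are made-up placeholders, not the BES figures from the table.

```python
# Sum of absolute differences between a full-sample age distribution
# and a voter-only one (percentage points; all values hypothetical).
def total_abs_diff(full, voters):
    """Total absolute difference across categories, in points."""
    return sum(abs(full[k] - voters[k]) for k in full)

full_sample = {"18-34": 28.0, "35-54": 35.0, "55+": 37.0}
voters_only = {"18-34": 22.0, "35-54": 34.0, "55+": 44.0}

print(total_abs_diff(full_sample, voters_only))  # → 14.0
```

A large value, as with the face-to-face survey's 14.4, indicates that voters genuinely differ from the full sample; a small value, as with the panel's 3.9, suggests the survey's non-voters look suspiciously like its voters.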
In our full paper we conduct the same analysis on other demographic groups, including subjective social class, income, and working status, and find similar results. These all point to the same conclusion: because surveys undersample those who do not vote, by making the survey look representative of the population, pollsters may actually be making their polls less representative of those who turn out to vote.