By Jon Mellon and Chris Prosser
On Tuesday the inquiry into why the polls went wrong before the 2015 election released its preliminary findings. Its main finding agrees with our own research: unrepresentative samples are to blame for the 2015 polling miss. However, inaccurate polling due to unrepresentative samples is not a new problem, and pollsters did not substantially change their methodology from previous elections, which raises an important question – why did the polls go wrong this time?
As the inquiry makes clear, it is important to put the 2015 polling miss into context. Polls at previous elections have also been inaccurate (and perhaps increasingly so), but because they have tended to pick the winner correctly this seems to have gone unnoticed by the media and wider public. This is not to say that the 2015 polling miss was not important – it was the worst polling miss for nearly a quarter of a century – but it is important to see that miss as part of a long-standing problem of unrepresentative samples. The 2015 polling miss is particularly interesting because it came on the back of the 2010 election, when the pollsters were more accurate on the Conservative-Labour lead than at any election since 1979 but substantially overestimated the Liberal Democrat share of the vote. As we explain in this blog, these three things – getting the 2010 Con-Lab lead right, the 2010 Lib Dems wrong, and the 2015 Con-Lab lead wrong – are all connected, and help us understand what is wrong with the polls and how we might fix them.
Understanding and fixing unrepresentative samples
In our paper investigating the 2015 polling miss, the key problem we identify with the polls is a tendency to oversample politically engaged respondents, particularly those with high rates of turnout. This problem is exacerbated because polling samples are then weighted to resemble the general population rather than the voting population. Since turnout in polling samples is often as much as thirty percentage points too high, even demographic groups with low turnout often show high turnout rates in the samples. This makes the demographics of the sample of voters look too similar to the general population and not enough like the voting population: non-voters from low-turnout demographic groups are, in effect, replaced by voters from those same groups, skewing the voter sample.
To make clear what we mean by this, take a simplified example. In this simplified world there is only one demographic characteristic – whether people are young or old – and each of these groups makes up half the population. Politically there are two important differences between young and old people: old people are much more likely to turn out to vote than young people, and much more likely to vote Conservative when they do. As shown in the chart below, if there were an election in this world, half the young people would not vote, 40% would vote Labour, and the remaining 10% would vote Conservative. Only 20% of the old people would stay at home on election day, 20% of them would vote Labour, and the remaining 60% would vote Conservative. The result of this election is that the Conservatives win with about 54% of the vote, on a turnout of 65%.
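The arithmetic of this imaginary election can be checked in a few lines of Python (a sketch of the worked example above, not code from our paper):

```python
# The simplified world: counts per 100 people in each age group,
# taken from the example in the text.
young = {"non_voter": 50, "labour": 40, "conservative": 10}
old = {"non_voter": 20, "labour": 20, "conservative": 60}

voters = (young["labour"] + young["conservative"]
          + old["labour"] + old["conservative"])
total = sum(young.values()) + sum(old.values())

turnout = voters / total  # 130 voters out of 200 people
con_share = (young["conservative"] + old["conservative"]) / voters
lab_share = (young["labour"] + old["labour"]) / voters

print(f"Turnout: {turnout:.0%}")         # 65%
print(f"Conservative: {con_share:.1%}")  # 53.8%
print(f"Labour: {lab_share:.1%}")        # 46.2%
```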
Imagine now that we conduct a poll in this world with the same problem that real-world polls have – people who do not vote are also much less likely to respond to political polls – and suppose this is the only sampling problem the poll has. Because we either use demographic quotas or weight our sample, we end up with the correct ratio of half young people and half old people – the sample looks representative in terms of demographics, just like real polls do. The problem lies under the surface.
As the chart below shows, within both young and old people the ratio of Labour voters to Conservative voters is the same as in the population, and the ratio of young non-voters to old non-voters is correct too. The problem is that the ratio of voters to non-voters is completely wrong – in this sample the level of turnout is 86% (similar to the figures in real polls). Because young people are less likely to vote, young voters end up disproportionately overrepresented, which in turn leads to an overrepresentation of Labour voters. In this example the polling miss is even worse than the 2015 miss – despite the Conservatives' 8-point lead in the population, Labour now have a 6-point lead – all because non-voters largely failed to answer the poll.
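The mechanism can be sketched in Python. The example does not pin down exactly how much less likely non-voters are to respond, so the one-fifth response rate below is our assumption for illustration; the precise sample turnout and lead will therefore differ a little from the chart, but the direction of the bias is the same:

```python
# Assumed relative response rate of non-voters (illustrative, not from the text).
NONVOTER_RESPONSE = 0.2

# Population per 100 people in each age group (from the example above).
pop = {
    "young": {"non_voter": 50, "labour": 40, "conservative": 10},
    "old":   {"non_voter": 20, "labour": 20, "conservative": 60},
}

# Expected sample counts: voters respond in full, non-voters at the reduced rate.
sample = {
    group: {k: n * (NONVOTER_RESPONSE if k == "non_voter" else 1)
            for k, n in cells.items()}
    for group, cells in pop.items()
}

# Demographic weighting alone: give each age group half the total weight.
lab = con = turnout_w = 0.0
for cells in sample.values():
    group_total = sum(cells.values())
    lab += 0.5 * cells["labour"] / group_total
    con += 0.5 * cells["conservative"] / group_total
    turnout_w += 0.5 * (cells["labour"] + cells["conservative"]) / group_total

print(f"Apparent turnout in the sample: {turnout_w:.0%}")  # far above the true 65%
print(f"Labour share of sampled voters: {lab / turnout_w:.1%}")
print(f"Conservative share:             {con / turnout_w:.1%}")
```

The demographics come out at exactly 50/50, yet Labour edges ahead among the sampled voters even though the Conservatives lead comfortably in the population.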
It is hopefully clear from this example that the apparent levels of support for parties in polls can be drastically affected by the absence of people who do not vote at all. This is precisely what we think is to blame for the 2015 polling miss. In our paper we show that if we simulate the problem of non-response from non-voters on the British Election Study face-to-face survey (which got much closer to the result than the polls) – removing the non-voters from the sample and weighting the remaining respondents as if they were the population – we reproduce the same problem as the polls: almost a dead heat between the Conservatives and Labour.
Identifying the problem of misweighted samples due to missing non-voters also suggests a way in which polls might be improved. Ideally, polls would increase the number of non-voters in their samples by drawing representative samples that need minimal reweighting, like the BES face-to-face survey. But proper representative samples are expensive and time-consuming – not an ideal method for the fast-paced and budget-constrained world of political polling. However, the impact of missing non-voters could be reduced in much the same way as demographic imbalances are: through weighting. In our paper we demonstrate this on the BES Internet panel and show that simply weighting non-voters up to their correct proportion substantially increases the Conservative lead over Labour.
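To illustrate how this works, here is a sketch in the simplified young/old world: weighting on demographics alone leaves Labour ahead, while adding turnout to the weighting targets recovers the true Conservative lead. The one-fifth non-voter response rate is our illustrative assumption, not a figure from the text.

```python
R = 0.2  # assumed relative response rate of non-voters (illustrative)

# Expected sample counts (per 100 people in each age group).
sample = {
    ("young", "non_voter"): 50 * R, ("young", "labour"): 40.0, ("young", "conservative"): 10.0,
    ("old", "non_voter"): 20 * R, ("old", "labour"): 20.0, ("old", "conservative"): 60.0,
}

def vote_shares(targets, key):
    """Weight each weighting cell up to its population target share, then
    return (Labour, Conservative) shares among the weighted voters."""
    cell_totals = {}
    for row, n in sample.items():
        cell_totals[key(row)] = cell_totals.get(key(row), 0.0) + n
    lab = con = 0.0
    for (group, outcome), n in sample.items():
        w = targets[key((group, outcome))] / cell_totals[key((group, outcome))]
        if outcome == "labour":
            lab += w * n
        elif outcome == "conservative":
            con += w * n
    return lab / (lab + con), con / (lab + con)

# Demographic weighting alone: Labour ahead despite the true Conservative lead.
demo = vote_shares({"young": 0.5, "old": 0.5}, key=lambda r: r[0])

# Demographics plus turnout: recovers the true 54-46 Conservative lead.
both = vote_shares(
    {("young", True): 0.25, ("young", False): 0.25,
     ("old", True): 0.40, ("old", False): 0.10},
    key=lambda r: (r[0], r[1] != "non_voter"),
)
print(demo)  # Labour narrowly ahead
print(both)  # Conservatives back in front, ~54% of voters
```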
Of course, this is easy to do after the fact, when we know who voted and who did not, but what can we do before an election, particularly when we know that people overstate their likelihood of voting? In a new paper we show that the same principle of upweighting non-voters can be applied before elections by predicting each respondent's probability of actually voting and then adjusting those probabilities so that the expected level of turnout in the sample is pegged to a more realistic figure than the very high turnout the raw sample implies. Although the result is not perfect, making this adjustment to the BES Internet panel using only information available before the election would have substantially improved the accuracy of the survey.
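One simple way to implement this kind of pegging – a sketch of the general idea, not the exact adjustment used in the paper – is to shift the predicted probabilities on the logit scale until their mean matches a realistic national turnout figure:

```python
import math

def calibrate_turnout(probs, target_turnout):
    """Shift predicted turnout probabilities (each strictly between 0 and 1)
    on the logit scale, using bisection, so their mean equals target_turnout.
    The shifted probabilities keep their original ordering and can be used
    as turnout weights when tabulating vote intention."""
    def shifted(delta):
        return [1 / (1 + math.exp(-(math.log(p / (1 - p)) + delta)))
                for p in probs]
    lo, hi = -20.0, 20.0
    for _ in range(100):
        mid = (lo + hi) / 2
        if sum(shifted(mid)) / len(probs) > target_turnout:
            hi = mid
        else:
            lo = mid
    return shifted((lo + hi) / 2)

# Hypothetical panel whose stated likelihoods imply an implausible 88% turnout.
raw = [0.95, 0.9, 0.9, 0.85, 0.8]
adjusted = calibrate_turnout(raw, 0.65)  # now averages a realistic 65%
```

The choice of a logit shift is just one reasonable option; any monotone adjustment that preserves the ranking of respondents while hitting the turnout target would serve the same purpose.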
Given that the main problem we identified with the polls in 2015 relates to the fundamental and longstanding issue that polls are unrepresentative in terms of political attention, we must ask an important question: why did the UK polling industry perform so badly in 2015 rather than in some other election?
Those who have followed our research into why the polls went wrong will know that we have previously suggested that the miss may have been due to differential turnout. We no longer think this is the case. When we examine the BES validated vote data (which checks whether people actually voted against the marked electoral register) we find no evidence of different levels of turnout by whether people said they were going to vote for the Conservatives or Labour (more details are in our paper).
So why did we previously think we had evidence of differential turnout? Our earlier analysis relied on predicting respondents' probabilities of actually turning out to vote, using a variety of demographic and attitudinal variables. Although we did not realise it at the time, this method was not really predicting individual respondents' probabilities of voting; instead it was (inversely) predicting how likely they were to be overweighted within the survey because they looked like people who do not vote, even if they were actually voters themselves.
So what was actually different between 2010 and 2015? To answer this, we measure how overweighted 2015 respondents are by taking the ratio of each respondent's normal demographic weight to a demographic weight that also includes an adjustment for turnout – those with the highest ratios are the most overweighted respondents. The figure below plots the level of support for each party against this ratio and shows that the most overweighted respondents in 2015 were disproportionately Labour-leaning.
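In code, the measure is simply a ratio of two weights per respondent; the weight values below are invented purely to illustrate the calculation:

```python
# Hypothetical respondents with two sets of survey weights: demographic-only,
# and demographic-plus-turnout. The numbers are made up for illustration.
respondents = [
    {"vote": "labour", "w_demo": 2.1, "w_demo_turnout": 1.2},
    {"vote": "conservative", "w_demo": 0.9, "w_demo_turnout": 1.0},
]
for r in respondents:
    # Higher ratio = more overweighted once turnout is accounted for.
    r["overweighting"] = r["w_demo"] / r["w_demo_turnout"]
```

Plotting party support against this ratio across a full sample gives the kind of figure described in the text.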
We can conduct the same exercise for the 2010 BES Internet panel; the figure below shows the results. Whereas there is essentially no relationship between overweighting and Liberal Democrat vote in 2015, in 2010 the relationship is strong and positive. Conversely, the relationship between Labour support and overweighting is much weaker in 2010 than it was in 2015.
These results potentially explain the 2010 overestimation of the Liberal Democrats, when the polls overstated Liberal Democrat support by around 4 percentage points – a figure not much smaller than the 2015 miss. Research into that polling miss was unable to find a clear explanation for the overestimate, but concluded that it was highly unlikely to be solely the result of a late swing away from the Liberal Democrats. Our findings suggest instead that it resulted from polls oversampling and overweighting voters from low-turnout demographic groups, and that these voters tended to support the Liberal Democrats. The surprise surge of Liberal Democrat support among low-turnout groups such as students may in fact have saved the pollsters from a 2015-style miss in 2010, by drawing support away from Labour amongst overweighted respondents.
Given these findings, we think the most likely explanation for ‘why 2015?’ is a swing from the Liberal Democrats to Labour amongst the respondents most likely to be overrepresented in the polls. British polls have a long history of overestimating Labour support. The key to understanding why is the impact of missing non-voters on the effects of demographic weights. The key to understanding why the polls got different things wrong at the last two elections is the changing electoral fortunes of the Liberal Democrats amongst those who are overweighted in the polls.