By Jon Mellon and Chris Prosser
Tuesday night saw an eerily familiar feeling setting in as it became increasingly clear that the conventional wisdom driven by public polling had greatly miscalculated Trump’s chances just as it had with the UK Conservative party in 2015 and the EU referendum in June. It now appears that the polls slightly overestimated Clinton’s overall vote lead, but more importantly hugely overestimated her support in key rust-belt swing states. It’s still much too early to say exactly what went wrong with the US polls this time, but our investigation of the 2015 UK polling miss can give some initial hints about where we should be looking.
In our analysis (forthcoming in Public Opinion Quarterly), we considered five possible explanations for the UK polling miss that saw the Conservatives outperform the polling average by 4.2 percentage points:
- Unrepresentative samples
- Shy voters
- Turnout misreporting/differential turnout
- Late swing
- Shifts in undecided voters
In this blog post we go through how we looked for evidence for each of these explanations, and what initial evidence there is about whether they apply to the US.
Unrepresentative samples
At the end of our 2015 investigation we concluded that unrepresentative samples were the primary cause of the 2015 polling miss. Polls try to capture public opinion by recruiting a sample of respondents that closely resembles the target population, through a combination of sample selection and weighting. Because response rates to phone polls have fallen dramatically and internet polls generally use self-selecting samples, there is substantial room for samples to be systematically different from the population they try to capture.
In Britain we found that the key explanation for the polling miss was the undersampling of non-voters. Because some types of people are less likely to vote, the electorate is very different demographically from the population as a whole (e.g. voters tend to be older, richer, whiter, and more educated than non-voters). Voters also tend to be more likely to answer surveys than non-voters. When a survey sample has too few non-voters in it, the sample of voters appears demographically more similar to the general population than it is in reality. To put it another way, undersampling non-voters makes the electorate look younger and more ethnically diverse than it really is. This is because polling samples are weighted to look like the general population rather than the electorate. Voters who demographically resemble non-voters (e.g. young voters) are overweighted, taking the place of their missing non-voting demographic counterparts. The demographics of turnout and party support tend to be correlated, so the parties these overweighted voters support end up being overestimated in the poll – e.g. in Britain younger and poorer people are less likely to turn out to vote, but more likely to vote Labour when they do, and consequently British polls overestimate Labour support.
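The mechanism can be illustrated with a toy simulation (all numbers are invented for illustration, not estimates): a population split evenly between young and old, where the young turn out less but lean more towards a hypothetical party A. Voters are made more likely than non-voters to answer the poll, and the sample is then weighted back to the population's 50/50 age split.

```python
import random

random.seed(42)

# Invented illustrative parameters: turnout rate and party-A support by age group.
TURNOUT = {"young": 0.4, "old": 0.8}
PARTY_A = {"young": 0.6, "old": 0.35}

def person():
    age = random.choice(["young", "old"])
    votes = random.random() < TURNOUT[age]
    backs_a = random.random() < PARTY_A[age]
    return age, votes, backs_a

population = [person() for _ in range(200_000)]

# True party-A share among people who actually vote.
voters = [p for p in population if p[1]]
true_share = sum(p[2] for p in voters) / len(voters)

# Voters are twice as likely to respond to the poll as non-voters.
RESPONSE = {True: 0.30, False: 0.15}
sample = [p for p in population if random.random() < RESPONSE[p[1]]]

# Weight each respondent so the sample matches the 50/50 population age split.
weights = {}
for age in ("young", "old"):
    n_age = sum(1 for p in sample if p[0] == age)
    weights[age] = 0.5 * len(sample) / n_age

# Weighted party-A estimate among sampled voters.
polled_voters = [p for p in sample if p[1]]
poll_share = (sum(weights[p[0]] for p in polled_voters if p[2])
              / sum(weights[p[0]] for p in polled_voters))

print(f"true party-A share among voters:     {true_share:.3f}")
print(f"weighted poll estimate among voters: {poll_share:.3f}")
```

Because the weighting targets the general population rather than the electorate, young voters are upweighted to stand in for their missing non-voting counterparts, and the weighted estimate overstates party A's support relative to the true share among voters.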
The pattern of overestimating the support of the party backed by younger and poorer voters is what we would expect if unrepresentative samples were the problem with the US polls. However, we don't yet have high-quality probability samples against which to compare the results of public opinion polling, and until we do, it is hard to draw any strong conclusions about representativeness. Certainly, US pollsters face many of the same challenges as British ones. On the other hand, they do have the luxury of relatively high-quality data on the composition of the electorate (as opposed to the general population) in previous years, thanks to more extensive exit poll data than in the UK, which might make the representativeness of samples less of an issue in the US case.
In time the American National Election Study may well play the same role the British Election Study did in explaining the 2015 polling miss. It is also worth bearing in mind that probability samples are not perfect: they can be affected by any of the other biases highlighted here and can have representativeness problems of their own if they suffer from non-response bias.
Turnout misreporting/differential turnout
Another possible explanation for polling misses is that pollsters incorrectly believe that respondents supporting one party are more likely to vote than they actually are. Pollsters generally weight or filter their samples using a scale on which respondents rate their own likelihood of voting.
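As a rough sketch of those two approaches (with invented respondents and likelihood scores, not any pollster's actual method), filtering keeps only self-described likely voters, while weighting counts everyone in proportion to their stated likelihood:

```python
# Hypothetical respondents: (candidate, self-rated likelihood of voting, 0-10).
respondents = [
    ("Clinton", 10), ("Trump", 9), ("Clinton", 5),
    ("Trump", 10), ("Clinton", 8), ("Trump", 3),
    ("Clinton", 2), ("Trump", 10), ("Clinton", 10),
]

def share(pairs, candidate):
    """Weighted share of a candidate among (candidate, weight) pairs."""
    total = sum(w for _, w in pairs)
    return sum(w for c, w in pairs if c == candidate) / total

# Filtering: keep only respondents who rate themselves 8+ ("likely voters").
filtered = [(c, 1) for c, likelihood in respondents if likelihood >= 8]

# Weighting: count every respondent, weighted by likelihood / 10.
weighted = [(c, likelihood / 10) for c, likelihood in respondents]

print("filtered Clinton share:", round(share(filtered, "Clinton"), 3))
print("weighted Clinton share:", round(share(weighted, "Clinton"), 3))
```

Either way, the estimate hinges on respondents reporting their likelihood of voting accurately; if one candidate's supporters systematically overstate it, the poll will overstate that candidate.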
We initially suspected differential turnout was a major problem with the 2015 polls, as the implausibly high turnout in opinion polls could be put down to respondents lying about whether they had voted. However, when we matched our respondents against their actual voting records, we found that while a reasonable number did seem to have fibbed about voting, Labour supporters were no more likely to have done so than Conservatives. Instead, the apparent differences in turnout likelihood reflected the over-representation of politically engaged respondents.
US pollsters are likely to conduct a similar exercise this year, and many already suspect that a failure to adequately filter out unlikely voters is a possible explanation. Certainly, depressed turnout in black areas appears to have contributed to Clinton's loss, but it remains to be seen whether pollsters overestimated these voters' likelihood of voting before the election. Again, it is quite possible that polls oversampled engaged black voters rather than failing to filter out black non-voters who took their surveys.
Shy voters
As soon as any polling miss happens, people rush to claim that voters lied about who they intended to vote for because they were embarrassed or scared to admit support for a socially stigmatized candidate. However, the evidence for this in 2015 was weak: we found that the underestimation of the Conservative vote was strongest in the areas where the Conservatives had the highest support – precisely where social pressure against admitting Conservative support should have been weakest.
An early analysis of where the polling errors happened in the US elections suggests that the errors follow a strikingly similar pattern to the UK. The errors were largest in areas with strong previous Republican support: the areas where we would think that social pressure against Trump voting would be weaker. Once again, this is circumstantial evidence, but those who think people lied to pollsters about how they would vote need to explain why Trump voters were shy about their opinions in Oklahoma rather than in Hawaii.
Late swing
The next explanation we considered is late swing. On this account, the polls were wrong not because of problems with how they conducted their fieldwork but because voters changed their minds just before going into the polling booth. We were able to test this claim fairly effectively in 2015 because the British Election Study Internet Panel recontacted 30,000 respondents who had answered our questionnaire during the campaign. We found no evidence of a systematic swing to the Conservatives among these voters, making it very unlikely that late swing accounted for the error.
We will have to wait for recontact polls (likely already in the field) to assess this explanation in the US case. However, there is some limited evidence in the initial exit poll data, which shows that voters in Michigan, Pennsylvania, and Florida who made up their minds in the last week leaned more strongly towards Trump than voters who decided earlier. On the other hand, the same analysis applied to the 2012 exit poll would also imply a late swing to Romney in Michigan and Pennsylvania. In fact, Obama outperformed his polls in both states, so it is not clear how much weight we should put on this type of analysis.
Shifts in undecided voters
A related problem is that voters who said they were undecided in the polls may break differentially towards one candidate or the other on Election Day. In the 2015 British case, we found that this did not meaningfully contribute to the polling error: there was only a small difference in how undecided voters broke compared with other voters, and the overall level of undecided voters was fairly low during the campaign.
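A bit of arithmetic shows why this matters in a close race (figures invented for illustration): with 10% undecided, the gap between an even break and a 70/30 break is worth four points on the final margin.

```python
# Invented illustrative poll: 45% Clinton, 45% Trump, 10% undecided.
decided_clinton, decided_trump, undecided = 0.45, 0.45, 0.10

def trump_margin(undecided_trump_share):
    """Final Trump-minus-Clinton margin if undecideds break this way."""
    trump = decided_trump + undecided * undecided_trump_share
    clinton = decided_clinton + undecided * (1 - undecided_trump_share)
    return trump - clinton

print(round(trump_margin(0.5), 3))  # an even 50/50 break leaves the tie intact
print(round(trump_margin(0.7), 3))  # a 70/30 break opens a 4-point margin
```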
In the US, there were unusually high numbers of undecided voters in polls this year, so the potential is certainly there for an effect. However, there does not seem to be much of a correlation between the polling error and the proportion of undecided voters in pre-election polls. For instance, Florida and Pennsylvania both had fairly low proportions of undecided voters, while Michigan had one of the higher figures.
So what do we know so far? This analysis is based on the relatively little public data currently available, and we know from previous experience that new information can dramatically change our assessment of the different explanations. So far, though, the US data is at least consistent with the representativeness problems we saw in 2015. There is every chance that another explanation will turn out to be more important in the US, but as American pollsters and political scientists begin their investigations into what went wrong with the polls in 2016, the lessons from the 2015 UK polling miss should be a key consideration.