The British Election Study Internet Panel offers huge advantages for studying British politics. So much of the current political situation can only be understood in terms of how individual voters are changing their attitudes and behaviours, and the 16 waves of the panel conducted since 2014 offer an unparalleled picture of the volatile journey taken by British voters over this turbulent period.
However, panel data is not without its drawbacks. A respondent who took all 16 waves of the panel would have spent more than five hours answering BES surveys over this period. It is therefore reasonable to ask whether any normal person can be expected to do this, and whether even a normal person might be changed by spending so much time taking political surveys. In technical terms, these are the problems of panel attrition (politically engaged people are differentially likely to stay in a panel) and panel conditioning (taking surveys can change people’s opinions or behaviours).
In this post, I set aside the general issue of the representativeness of non-probability samples. This is an extremely important issue which we have discussed elsewhere; we continue to work on improvements to weighting to account for it and, most importantly, continue to conduct gold standard face-to-face probability surveys after each election. However, panel attrition and conditioning are additional concerns on top of these more general concerns about non-probability samples, and they deserve their own attention.
Unlike some studies, our panel is designed to remain cross-sectionally representative at each wave. That means you should be able to take the sample collected in wave 13 (the 2017 post-election wave) and make inferences about people’s attitudes at that time. To achieve this, YouGov samples replacement respondents who resemble the kind of people who have dropped out, so the raw sample at each timepoint remains representative of the population in terms of the factors on which YouGov quota samples. The potential concern is that, within these factors, the respondents who do stay in the sample will be unusual.
For instance, imagine that the only factors that matter are age (whether you are under or over 40) and political interest, and that young people have lower levels of political interest than older people. For the purposes of this hypothetical, we sample and weight on age but do not observe political interest.
In wave 1, we start with half the sample over 40 and half under, with both groups having reasonable levels of political interest. By wave 2, however, many of the politically disinterested will have dropped out, and because young people have lower political interest, we have lost more young people than old people. To make the sample representative again, we sample new young and old respondents in the correct proportions. The problem is that, while the young people who dropped out were disproportionately uninterested in politics, the respondents used to replace them have only typical levels of interest. At wave 3, the same thing happens again: politically disengaged people drop out while those who are interested in politics stay in the panel. Repeated over many waves, this gradually makes the remaining young people more and more interested in politics, because the disinterested top-ups tend to drop out while the politically interested top-ups tend to stay in.
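To see how this compounding works, here is a toy simulation of the hypothetical above. Every number in it is made up for illustration: interest is drawn from different distributions for the two age groups, less interested respondents drop out at a higher assumed rate each wave, and leavers are replaced by fresh respondents with typical interest for their age group.

```python
import numpy as np

rng = np.random.default_rng(42)
N = 10_000   # target size of each age group (hypothetical)
WAVES = 16

# Hypothetical political-interest distributions on a 0-1 scale:
# young respondents are drawn with a lower mean than older ones.
def draw_young(n):
    return rng.beta(2.0, 4.0, n)   # population mean 1/3

def draw_old(n):
    return rng.beta(4.0, 2.0, n)   # population mean 2/3

def attrit_and_topup(panel, draw_fresh):
    # Assumed retention rule: more interested respondents are more
    # likely to stay on for the next wave.
    stay = rng.random(panel.size) < 0.6 + 0.35 * panel
    kept = panel[stay]
    # Top-ups restore the age quota, but they are drawn from the
    # *typical* interest distribution, not from the kind of
    # disinterested people who just left.
    return np.concatenate([kept, draw_fresh(panel.size - kept.size)])

young, old = draw_young(N), draw_old(N)
for wave in range(2, WAVES + 1):
    young = attrit_and_topup(young, draw_young)
    old = attrit_and_topup(old, draw_old)

print(f"young population mean interest: {1 / 3:.3f}")
print(f"young panel mean interest at wave {WAVES}: {young.mean():.3f}")
```

Even though the age quota is met exactly at every wave, the surviving-plus-topped-up young sample ends up noticeably more interested in politics than the young population it is meant to represent.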
In practice, YouGov’s sampling procedures include many more factors than just age, so this problem should be substantially reduced. Most importantly, the sampling and weighting account for education, past turnout and political attention, which are probably the variables of most concern.
It is therefore an open empirical question whether the Internet panel’s cross-sectional samples are skewed by differential attrition in practice and, if so, by how much. To test whether this is a problem (and to provide a solution), we conducted a panel refresh in wave 16.
The idea was that, alongside the standard goal of maximising retention, a new sample of 18,000 respondents (18,217 in practice) would be drawn from YouGov’s panel without regard to whether they had taken part in previous BES surveys. By chance, many of these respondents will have taken previous surveys, but this 18,217-person sample represents what we would expect to achieve if we started an entirely new panel with YouGov.
Because the longitudinal element of the BES is highly important, we did not want the panel refresh to interfere with long-term panel retention. To achieve both goals, we increased the total sample size for wave 16 to 37,959, which allowed us to achieve normal retention and draw a fresh sample simultaneously.
While there is much work left to do comparing the samples, our initial analysis does not suggest that the standard cross-sectional wave 16 sample yields meaningfully different results from the fresh wave 16 sample. Comparisons for vote intention, European Parliament vote choice and turnout, likelihood to vote, and EU referendum vote intention are shown below using YouGov’s standard weighting:
|Vote intention|Fresh|Full|
|---|---|---|
|I would not vote|7.9|8.1|
|Scottish National Party (SNP)|2.8|2.7|
|Change UK – The Independent Group|1.8|1.4|

|European Parliament vote|Fresh|Full|
|---|---|---|
|Scottish National Party (SNP)|3.1|3.2|
|Did not vote|35.8|35.5|

|Likelihood to vote|Fresh|Full|
|---|---|---|
|Very unlikely that I would vote|9.5|9.6|
|Neither likely nor unlikely|4.5|4.7|
|Very likely that I would vote|67.3|66.9|

|EU referendum vote intention|Fresh|Full|
|---|---|---|
|Stay/remain in the EU|46.7|46.5|
|Leave the EU|39.8|39.8|
|I would/will not vote|6.6|6.8|
In each case, the weighted fresh and standard samples yield very similar results, and there is no clear indication of bias caused by the standard cycle of attrition and replenishment. The results of the panel refresh are encouraging for the ongoing quality of the panel. Panel attrition is not random, and respondents who have taken very large numbers of waves should still be analysed cautiously. However, there is no evidence so far that the overall quality of the Internet panel is declining, or that we should be more concerned about the representativeness of a cross-sectional sample conducted at wave 15 than at wave 2.
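One way to judge whether gaps of a few tenths of a percentage point are meaningful is to put approximate standard errors on the weighted proportions using the Kish effective sample size. The sketch below does this on synthetic data built to resemble the Remain figures in the table (the weights and the independence assumption are ours; in reality the fresh sample overlaps the full wave 16 sample, so this is only a rough check):

```python
import numpy as np

def weighted_prop(indicator, w):
    """Weighted proportion and its approximate standard error."""
    p = np.average(indicator, weights=w)
    ess = w.sum() ** 2 / (w ** 2).sum()   # Kish effective sample size
    se = np.sqrt(p * (1 - p) / ess)
    return p, se

# Synthetic stand-ins for the two samples (all numbers illustrative).
rng = np.random.default_rng(1)
n_fresh, n_full = 18_217, 37_959
remain_fresh = rng.random(n_fresh) < 0.467
remain_full = rng.random(n_full) < 0.465
w_fresh = rng.gamma(4.0, 0.25, n_fresh)   # toy weights, mean ~1
w_full = rng.gamma(4.0, 0.25, n_full)

p1, se1 = weighted_prop(remain_fresh, w_fresh)
p2, se2 = weighted_prop(remain_full, w_full)
# Rough z-score, treating the samples as independent (an
# approximation, since the fresh sample is a subset of the full one).
z = (p1 - p2) / np.hypot(se1, se2)
print(f"fresh {p1:.3f} (se {se1:.3f}) vs full {p2:.3f} (se {se2:.3f})")
```

With samples this large the standard errors come out below half a percentage point, so differences of the size seen in the tables sit comfortably within sampling noise.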
However, even if it later turns out that some important variable is affected by panel attrition or conditioning, the fresh sample in wave 16 ensures there is a subsample of the panel that can be used to diagnose the problem and, if needed, correct it. The indicator variable for the fresh sample is fresh_sampleW16 in the panel and the relevant weight is wt_fresh_W16.
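For anyone wanting to run this kind of fresh-versus-full comparison themselves, the pattern looks something like the sketch below. Only fresh_sampleW16 and wt_fresh_W16 come from the post; the vote-intention column name, the standard weight name, and all the data are made up for illustration:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for a wave 16 extract. In the real data you would
# load the released panel file instead; voteIntentionW16 and wt_standard
# are placeholder names.
rng = np.random.default_rng(7)
n = 1_000
df = pd.DataFrame({
    "voteIntentionW16": rng.choice(["Con", "Lab", "Other"], n),
    "fresh_sampleW16": rng.random(n) < 0.48,   # fresh-sample indicator
    "wt_fresh_W16": rng.gamma(4.0, 0.25, n),   # weight for fresh sample
    "wt_standard": rng.gamma(4.0, 0.25, n),    # placeholder full-sample weight
})

def weighted_shares(d, weight):
    # Weighted percentage breakdown of vote intention.
    w = d.groupby("voteIntentionW16")[weight].sum()
    return 100 * w / w.sum()

fresh = df[df["fresh_sampleW16"]]
comparison = pd.DataFrame({
    "Fresh": weighted_shares(fresh, "wt_fresh_W16"),
    "Full": weighted_shares(df, "wt_standard"),
})
print(comparison.round(1))
```

The key point is simply to subset on the indicator and switch to the fresh-sample weight when analysing the fresh subsample, and to use the standard weight for the full sample.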