As I work for a UK charity, I need to be very careful on social media during the election campaign. Charities are constrained by the requirements of both charity regulations and electoral law. Simply put, charities are forbidden to publicly support or oppose any candidate or party. Although I’m sure no-one sees this personal blog as the opinions of my employer, I will be cautious and conform to the rules above during the election period.
There was a little bit of concern, to put it mildly, about the accuracy of election polling in 2015. In response, polling companies have modified the way they collect, analyse and draw conclusions from polling data – but although each company has reacted, they have all done so in differing ways.
So, to make sense of any random 2017 poll, we really need to know three things – the polling company responsible, the date of the poll, and the type of poll.
Some people think that the political affiliation of the newspaper or website that publishes the poll also has an impact – in practice no reputable polling company would fudge their data to meet the political predilections of an editor.
But how to spot a reputable polling company? The easiest way is to check that they are members of the British Polling Council. Members are expected to comply with rules that require each company to share full details of their sampling and analysis methodologies. Though the BPC doesn’t endorse particular methodologies, it ensures that each is clearly documented with, where possible, the underlying data also disclosed.
There’s a similarity with the process of peer review – and as with peer review the lay reader such as you or I will assume that the methodologies and maths have been seen to make sense by other experts.
I’ve been waving the word “methodology” around a bit – this just means the way a sample is taken and the way this sample is analysed and extrapolated to give those all-important headline figures.
There are two main types of polls – phone polls involve ringing people up at random to gather a representative sample, whereas online polls take a large number of willing participants and select a representative sample from within these. Both attract common criticisms that can be quickly dismissed: there are demographic indicators (age, social class…) correlated with the likelihood of home phone use, and online participants are likely to be more politically engaged than other groups, but the analysis and extrapolation stage takes account of these differences.
One other red herring is the idea of “clustering”: some people who should know better claim that polls will aim to have results in line with other polls rather than risk being an outlier. While it is sensible to suggest that poll responses are influenced by other poll results (as indeed may be the election itself), the idea of polling companies massaging their figures to fit a trend line is ridiculous.
Let’s have some more definitions – a sample is a small segment of a larger population that is as representative as possible of the wider population. For elections, the wider population is everyone who will vote in the election, and the sample aims to reflect the make-up of this population as closely as possible.
Polling companies may take account of – for example – age, social class, location, previous or current political activity, voting history and likelihood of voting in developing a sample. For most companies a sample will be around 1,000 people.
Responding to 2015
At the last election, polls showed a likely hung parliament right up until the exit poll. This error was claimed by some to have affected the election campaigns, and there was serious disquiet about the state of polling from commentators and politicians. In response, the BPC commissioned a report into polling practice, which was published in March last year.
A big point of controversy around the 2015 polls concerned how polling samples are made up. The BPC report concluded:
Our conclusion is that the primary cause of the polling miss was unrepresentative samples. The methods the pollsters used to collect samples of voters systematically over-represented Labour supporters and under-represented Conservative supporters. The statistical adjustment procedures applied to the raw data did not mitigate this basic problem to any notable degree.
[We can] rule out the possibility that at least some of the errors might have been caused by flawed analysis, or by use of inaccurate weighting targets on the part of the pollsters. We were also able to exclude the possibility that postal voters, overseas voters, and unregistered voters made any detectable contribution to the polling errors. The ways that pollsters asked respondents about their voting intentions was also eliminated as a possible cause of what went wrong
The BPC simply felt that the samples used by polling companies contained too many people who were unlikely to vote, and too many people who supported Labour, to be a fair representation of the country as a whole.
Older people are more likely to vote. And they are more likely to vote Conservative. So some polling companies have focused on this correlation as a means of correcting for the 2015 errors.
Kantar Polling (formerly TNS), for example, has adjusted their sample weighting methodology to include more over 70s in the analysed data. YouGov have also increased the numbers of over 65s in their weighted samples.
Rather than adding older voters to the sample (which carries a risk of skewing the poll in other ways), some companies have focused on likelihood to vote as a key determinant of sample weighting.
Ipsos MORI, ICM and YouGov are using reported past voting behavior (did a participant vote in the 2015 election and/or the 2016 EU referendum?) as a sample weighting tool. ComRes use a statistical methodology based on weighting for age and social class instead of self-reported behavior.
Panelbase, as of last week, use 2015 voters not the general population as the basis of their sample weighting.
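To make the idea of sample weighting concrete, here is a minimal Python sketch of weighting a sample by age group. It is not any polling company's actual method: the two age bands, the target shares and the ten respondents are all invented for illustration, and the function names are my own.

```python
from collections import Counter

def poststratification_weights(sample_groups, target_shares):
    """Return one weight per respondent so that the weighted share of each
    group matches target_shares (hypothetical population shares)."""
    counts = Counter(sample_groups)
    n = len(sample_groups)
    return [target_shares[g] / (counts[g] / n) for g in sample_groups]

def weighted_share(responses, weights, party):
    """Weighted proportion of respondents who chose the given party."""
    matching = sum(w for r, w in zip(responses, weights) if r == party)
    return matching / sum(weights)

# Invented toy sample: over-65s make up 20% of respondents...
ages = ["under65"] * 8 + ["over65"] * 2
votes = ["Lab"] * 5 + ["Con"] * 3 + ["Con"] * 2
# ...but we assume (hypothetically) they should make up 25% of voters.
targets = {"under65": 0.75, "over65": 0.25}

weights = poststratification_weights(ages, targets)
unweighted = [1.0] * len(votes)

print(f"Con share, unweighted: {weighted_share(votes, unweighted, 'Con'):.2f}")
print(f"Con share, weighted:   {weighted_share(votes, weights, 'Con'):.2f}")
```

In this toy example, giving the under-represented older group more weight nudges the Conservative share upwards, which is the direction of the corrections described above. Real weighting schemes balance many variables at once.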
The “Don’t Know” problem
When you ask people who they will vote for, there will always be some who have not made a decision. The way “don’t knows” are handled in polling is a matter of no small controversy. The always entertaining UK Polling Report (run by Anthony J Wells of YouGov) has a good explanation of the background of this issue.
The TL;DR is that people who say that they don’t know for whom they will vote are likely to end up voting for the same party they voted for at the last election. Some (ICM, Populus) have historically used this as a weighted indicator of future voting, others (Ipsos MORI, ComRes) use “squeeze questions” to flush out a party preference which is then counted in a similar way as definite voting intentions. And there is YouGov, which simply did not include “don’t knows” in their samples, considering them less likely to vote.
The BPC report was pretty scathing on this whole mess, recommending that polling companies:
review current allocation methods for respondents who say they don’t know, or refuse to disclose which party they intend to vote for. Existing procedures are ad hoc and lack a coherent theoretical rationale. Model-based imputation procedures merit consideration as an alternative to current approaches.
So, by 2017, these controversial allocations have changed in some cases.
ICM are now going to add more “don’t knows” to parties previously supported (they used to add half of them; they now add three quarters to Conservative or Labour totals as applicable). They are also going to assume that those who don’t indicate a preference this time round and don’t know who they voted for last time are 20% more likely to vote for Conservatives and 20% less likely to vote for Labour.
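A rough Python sketch shows how moving from a half to three quarters changes the arithmetic. The respondent counts are invented and the function names are my own; this only illustrates the first ICM adjustment described above, not their actual code, and it leaves out the second (20%) adjustment entirely.

```python
def reallocate_dont_knows(decided, dont_knows_by_past_vote, fraction):
    """Add `fraction` of each party's former voters who now say
    "don't know" to that party's decided total."""
    adjusted = dict(decided)
    for party, count in dont_knows_by_past_vote.items():
        adjusted[party] = adjusted.get(party, 0) + fraction * count
    return adjusted

def shares(totals):
    """Convert raw totals into proportions of the overall total."""
    total = sum(totals.values())
    return {party: count / total for party, count in totals.items()}

# Invented counts for illustration only.
decided = {"Con": 400, "Lab": 350}
dont_knows = {"Con": 40, "Lab": 80}  # grouped by the party they voted for last time

for fraction in (0.5, 0.75):
    result = shares(reallocate_dont_knows(decided, dont_knows, fraction))
    summary = ", ".join(f"{p} {s:.1%}" for p, s in result.items())
    print(f"reallocating {fraction:.0%} of don't knows: {summary}")
```

With these made-up numbers the party with more undecided former voters gains most from the larger fraction, so the choice of fraction can shift the headline lead by itself.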
Kantar have added a squeeze question for “don’t knows”, and are developing a model to add even those who answer “I don’t know” to the squeeze question to some later polls – based on which leader they find most trustworthy and respondent demographics.
A note on dates
The key thing to look for is the dates during which field work (the actual collection of responses) was carried out, not the date of publication. Wikipedia lists polls according to the field work dates and, as such, has a useful trendline that reflects possible changes in votes over time.
The above has been a (hopefully readable) summary of how election polling works, but how can we use this information to make sense of polls and preserve our blood pressure? Here are a few tips from me:
- Only pay attention to polls from BPC members. Though others may be fun, we don’t know anything about how they were conducted or how they might skew.
- For analysing trends, only compare polls from the same company. The same or similar methodology producing different results on different dates is suggestive of a change in public opinion.
- For analysing the differences between polling companies, compare polls conducted on the same date. If you think company X’s methodology overrepresents party A, compare to polls conducted at similar times by companies Y and Z.
- Remember the margin of error. It is fair to assume that a poll of around 1,000 people will be accurate to within around 3 percentage points, 19 times out of 20. So a poll showing a party share of 40% may indicate support anywhere between 37% and 43%. This error shrinks slightly for larger samples.
- Beware unusual polls conducted in novel ways – a good recent example is the YouGov aggregated statistical model that startled everyone over the Bank Holiday. This is a highly experimental model based on extrapolating constituency-level results from very small samples using machine learning approaches. It might be interesting, but we don’t yet know what margin of error it may have, or how it compares to other more conventional polls.
- Beware outliers – polls at odds with the consensus are often shared and reported more widely than other, more “boring”, poll results. But take account of the margin of error, and the possibility that it just could be an unusual sample.
- Beware confirmation bias – reputable polls you don’t like are just as likely to be accurate as reputable polls that you do.
- Look for the data tables – as in all fields of research, publication of data tables allows us to take a more detailed view of the results. Is the sample “normal”? Are the extrapolations fair? Looking at the raw data can tell us.
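The margin of error tip above can be checked with the textbook formula for a simple random sample, sketched here in Python. This assumes a purely random sample and a 95% confidence level (the "19 times out of 20", z = 1.96); real polls are weighted, so their true margin is likely to be somewhat larger. The function name is my own.

```python
import math

def margin_of_error(share, sample_size, z=1.96):
    """Approximate 95% margin of error for a proportion, assuming a
    simple random sample (an idealisation of how polls actually work)."""
    return z * math.sqrt(share * (1 - share) / sample_size)

# A party polling at 40% with different sample sizes.
for n in (500, 1000, 2000):
    moe = margin_of_error(0.40, n)
    print(f"n={n}: 40% +/- {moe * 100:.1f} points")
```

Doubling the sample from 1,000 to 2,000 only trims the margin from about 3 points to about 2, which is why most companies settle for samples of around 1,000.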