What makes a good, robust, and useable research sample?

This is a question we see very often. It strikes at the heart of the purpose that research serves, and how we can maximise its value by creating a useable research sample. Therefore, we have put together this article to answer this common question.

Firstly, what is a research sample?

A sample is a subset of a larger population that is included in a research study. Instead of examining everyone or everything in a population, researchers select a sample to represent the whole.

A good, robust and useable research sample is characterised by four key pillars:

  • Size
  • Representativeness
  • Audience relevance
  • Recency

All four are important in evaluating the validity of a piece of research, and journalists are right to question all four when reporting on a study.

Sapio Research ensures that all research produced can be reported on by journalists by doing the following:

  1. Recommending sample sizes that we believe are both statistically robust and reportable
  2. Ensuring that each sub-group of the sample is appropriately representative. Sub-group examples include age, gender, and region in B2C studies, and sector, company size, and job role in B2B studies.
  3. Ensuring that robust screening criteria are in place so that the sample only includes individuals who are relevant and can discuss the topics in the questionnaire.
  4. Providing transparency over dates when the fieldwork was completed

Size

Quite simply, the larger the sample size, the more robust the data set. All samples have a ‘margin of error’: the amount, above or below, by which we would expect the survey results to differ if we were to interview the entire population of an audience rather than the research sample.

The chart below shows the expected margin of error for a range of sample sizes.

Source: Conjointly

Here we can see that a sample of 1,664 respondents will have a margin of error of +/-2%, whereas a sample of 188 respondents will have a margin of error of +/-6%. Sapio’s view is that if the margin of error of any sub-group is above +/-12%, the sub-group will be difficult to analyse and draw conclusions from. A margin of error of +/-12% equates to a sub-group size of approximately 50, which influences the sample sizes we suggest.
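The relationship between sample size and margin of error can be sketched with the standard formula for a proportion measured on a simple random sample. The 90% confidence level (a z-value of about 1.645) and the worst-case proportion of 50% used here are assumptions: the article does not state them, but they reproduce the figures quoted above. The function name is illustrative.

```python
import math


def margin_of_error(n, z=1.645, p=0.5):
    """Approximate margin of error (as a fraction) for a sample of size n.

    Assumptions (not stated in the article): simple random sampling,
    a 90% confidence level (z = 1.645) and a worst-case proportion p = 0.5.
    """
    return z * math.sqrt(p * (1 - p) / n)


for n in (1664, 188, 50):
    print(f"n={n}: +/-{margin_of_error(n) * 100:.1f}%")
# n=1664: +/-2.0%
# n=188: +/-6.0%
# n=50: +/-11.6%
```

Because the sample size sits under a square root, quadrupling the sample only halves the margin of error, which is where the diminishing returns come from.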

Example

  • A sample size of 2,000 is appropriate for a UK consumer study if you want to do cross-analysis by region. There are 12 regions in the UK, with the smallest being Northern Ireland, which makes up 2.9% of the UK population. 2.9% of 2,000 = 58 – which is the approximate sample size you would see for Northern Ireland on a nationally representative study of this size.
  • A sample size of 1,000 is appropriate for a UK consumer study if you want to do cross-analysis by age and gender for typical age groupings. Approximately 5% of the UK population are aged 18-24 and male – this tends to be the smallest age/gender grouping in the UK. 5% of 1,000 = 50 – which is the approximate sample size you would see for 18-24 year old men in a nationally representative study of this size.
  • A sample size of 400 is appropriate for a B2B study if you want to be able to break down by eight equal sectors. If each sector has 50 responses, then it will be possible to analyse in this way.
  • A sample size of 200 is appropriate for a B2B study if you want to focus on fewer sub-groups and the audience is niche or senior-level.
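The arithmetic behind these examples can be sketched as follows. This is a minimal illustration, assuming a nationally representative sample in which each sub-group appears in proportion to its population share. The function names are illustrative, and the default minimum of 50 comes from Sapio's guideline above.

```python
import math


def expected_subgroup_size(total_sample, population_share):
    """Expected respondents in a sub-group of a proportionally balanced sample."""
    return round(total_sample * population_share)


def min_total_sample(smallest_share, min_subgroup=50):
    """Smallest total sample that gives the smallest sub-group min_subgroup responses."""
    # Rounding first guards against floating-point noise before taking the ceiling.
    return math.ceil(round(min_subgroup / smallest_share, 9))


print(expected_subgroup_size(2000, 0.029))  # Northern Ireland: 58
print(expected_subgroup_size(1000, 0.05))   # men aged 18-24: 50
print(expected_subgroup_size(400, 1 / 8))   # one of eight equal sectors: 50
print(min_total_sample(1 / 8))              # 400
```

Working backwards from the smallest sub-group you want to report on is a quick way to sanity-check a proposed total sample size.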

From a statistical perspective, as shown in the chart, there is no ‘threshold’ for what makes a good or a bad sized sample, but the larger the sample is, the more robust it is, with diminishing returns.

A sample size of 2,000 may seem small to reflect the entire UK population, so we recommend that the margin of error (for 2,000 this is +/- x%) is clearly communicated to the audience receiving the research.

Representativeness

Often overlooked, the balance of the sample is crucial in providing robust research results. By ‘balance’ we mean that the distribution of sub-groups within your sample should be as close as possible to the distribution you see in the real world.

For example, the UK population is approximately 51% women and 49% men. In a nationally representative consumer study, you would expect the survey sample to replicate this closely. If it does not, there is less confidence that the results of the study can be extrapolated to the whole population.
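One simple way to check balance is to compare each sub-group's share of the sample with its share of the population. The sketch below uses the 51%/49% split quoted above. The survey counts and the function name are hypothetical, for illustration only.

```python
def share_gaps(sample_counts, population_shares):
    """Sample share minus population share, in percentage points, per sub-group."""
    total = sum(sample_counts.values())
    return {
        group: 100 * (sample_counts[group] / total - population_shares[group])
        for group in population_shares
    }


# Hypothetical survey of 1,000 respondents that over-represents women.
sample_counts = {"women": 560, "men": 440}
# Approximate UK population split quoted in the text.
population_shares = {"women": 0.51, "men": 0.49}

gaps = share_gaps(sample_counts, population_shares)
for group, gap in gaps.items():
    print(f"{group}: {gap:+.1f} percentage points")
# women: +5.0 percentage points
# men: -5.0 percentage points
```

How large a gap is acceptable is a judgement call. The further a sub-group drifts from its real-world share, the less safely the overall results can be extrapolated.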

Sapio Research not only carefully monitors and manages the sample balance in surveys, but also conducts its own research using publicly available data sources (e.g. ONS, Eurostat, Worldometer) to stay up to date on what a correct sample distribution should be.

The representativeness of a sample should be questioned by journalists as often as, if not more often than, the size itself. An unrepresentative sample means unreliable data. We recommend that journalists communicate the sample distribution when reporting (e.g. whether it is a nationally representative study).

Audience

The audience profile of the sample is important in determining the context of the responses. For example, if you were to ask a sample of people who live in West London what they thought of Chelsea Football Club, they would have a very different opinion to those in North London!

We recommend sense checking the audience to evaluate the validity of the research by asking a couple of questions:

  1. Do these individuals exist? A sample of 200 CEOs working for FTSE 100 companies has an obvious problem: the index contains only 100 companies, so a sample of that size calls the validity of the research into question.
  2. Will this audience be able to answer the questions they have been given? Using a similar example, will CEOs at FTSE 100 companies be able to talk about the inner workings of the IT department? Probably not.

Recency

As a rule of thumb, research that is over a year old is likely to be no longer relevant, so the preference is for research conducted within the last 12 months if it is to be reported on. This can also be subjective: an event occurring after the fieldwork may have affected the results, for example a sudden increase in energy prices after a study on the cost of living has been finalised.

We recommend that the fieldwork dates are clearly communicated when the research is reported.

We are passionate about providing research that is insightful, robust, and useable by journalists. If you have any questions about the sample we’ve used in our research studies, please contact us directly at team@sapioresearch.com.
