
Can we trust self-reported data?

Updated: Feb 24, 2022


The short answer is: we have to. Self-reporting is often the most convenient, and sometimes the only, way in which we can understand certain phenomena. Every visit to the doctor starts with the self-reporting of symptoms, which is instrumental in helping doctors establish a clear picture of your health status [1, 2, 3]. Self-reported accounts of one’s own or other people’s inappropriate behavior are an essential aspect of law enforcement [4, 5] and governmental oversight [6, 7]. Self-reported satisfaction with a product, for example the Net Promoter Score, has become an indispensable tool for forecasting company growth [8]. The question then becomes: how can we maximize the quality and trustworthiness of self-reported data?


This article very briefly discusses three major sources of error in self-report data and strategies that can be used to attenuate their impact. This is by no means a comprehensive list. In addition to the references cited here, I recommend “The Science of Self Report” [9] for a more detailed introduction to the subject.


Memory

In order to report something correctly, we must, of course, remember it. Turning an experience into a memory is called "encoding", and how well a memory is encoded directly affects how well it can later be retrieved. Sometimes we live through an experience without properly encoding it in the first place [10]. Have you ever arrived home with no memory of driving your usual route? Memories are encoded better when they are accompanied by strong emotions, either positive or negative [11]. Even after a memory is created, it can change over time: the act of retelling a memory can alter how it is stored, so the story comes out a little differently every time [12, 13].


The more time has elapsed since an event, the more difficult it can be to retrieve the memory of it [14]. Self-reporting should therefore take place as close as possible to the experience or event we want to study. The answer to “How well did you sleep?” will be more accurate if given as soon as the person wakes up than if given at the end of the day. Researchers refer to Experience Sampling [15] or Ecological Momentary Assessment [16] as methods for sampling experiences in real time and in realistic environments. These methods are greatly facilitated by current technologies, in particular the Internet and smartphones, and there are now easy-to-use Experience Sampling platforms with all the features needed to deploy this type of data collection.
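As an illustration of how an Experience Sampling schedule might be generated, here is a minimal Python sketch of a signal-contingent design, in which participants are prompted at random times during their waking hours. The function names `schedule_prompts` and `is_momentary`, the 60-minute minimum gap and the 15-minute response window are all hypothetical choices made for this example, not features of any particular platform.

```python
import random
from datetime import datetime, timedelta


def schedule_prompts(day_start, day_end, n_prompts, min_gap_minutes, rng=None):
    """Draw n_prompts random prompt times in [day_start, day_end],
    spaced at least min_gap_minutes apart (hypothetical helper)."""
    rng = rng or random.Random()
    window_s = int((day_end - day_start).total_seconds())
    gap_s = int(min_gap_minutes * 60)
    # Reserve the mandatory gaps first; the remaining "slack" is distributed at random.
    slack_s = window_s - (n_prompts - 1) * gap_s
    if slack_s < 0:
        raise ValueError("window too short for this many prompts and gap")
    offsets = sorted(rng.randint(0, slack_s) for _ in range(n_prompts))
    # Adding i * gap to the i-th sorted offset guarantees the minimum spacing.
    return [day_start + timedelta(seconds=off + i * gap_s)
            for i, off in enumerate(offsets)]


def is_momentary(prompt_time, response_time, max_delay_minutes=15):
    """Treat a response as 'momentary' only if it arrives shortly after the prompt
    (the 15-minute default is an assumption, not a standard)."""
    delay = response_time - prompt_time
    return timedelta(0) <= delay <= timedelta(minutes=max_delay_minutes)


if __name__ == "__main__":
    start = datetime(2022, 2, 24, 9, 0)
    end = datetime(2022, 2, 24, 21, 0)
    for t in schedule_prompts(start, end, n_prompts=6, min_gap_minutes=60,
                              rng=random.Random(42)):
        print(t.strftime("%H:%M"))
```

Reserving the minimum gaps before drawing the random offsets keeps prompts unpredictable for the participant without needing to re-draw until no two prompts are too close. Responses that fail `is_momentary` could then be analysed separately, since the longer the delay, the more the answer relies on recall rather than on the experience itself.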

Response Bias

This is a catch-all term for various emotional or cognitive tendencies that can lead to inaccurate self-reporting [17]. The most obvious one is outright lying, often to protect ourselves from negative consequences or to “fit in”. The presence of an interviewer, for example, may make people feel self-conscious and bias their answers to avoid negative judgment or scrutiny. It has been shown that people are less honest about sensitive information when reporting to an interviewer than when self-reporting [18].


The simplest way to minimize the fear of negative consequences is anonymity. Anonymity is seen as an essential practice to ensure whistleblowers are encouraged to report illegal activity [19, 20]. However, there is no scientific consensus on whether anonymity improves the accuracy of self-reports. Some studies conclude that it does [21, 22], and some conclude that it does not [23, 24].


Less controversial is the idea that appealing to the respondent’s sense of ethics and honor systematically reduces dishonesty in self-reports. One study, for example, found that asking participants to sign an “honesty pledge” consistently reduced dishonesty during a subsequent task [25]. Another study found that simply asking respondents to sign at the beginning of a self-report instead of at the end reduced dishonesty [26]. This study concluded that the act of signing makes ethics salient to respondents, priming them to be more honest while self-reporting.


Interpretation

How people interpret a question directly influences how they answer it. This may seem obvious and easy to address, but it is quite difficult for one person to imagine all the ways in which a question might be interpreted. Interpretation depends on the respondent’s understanding, culture, language, and situation [27, 28, 29]. The question “Have you been a victim of sexual harassment?” would probably be interpreted differently today than 50 years ago.


The work of “validating” a questionnaire includes checking that the questions are interpreted similarly by the majority of respondents, so that they produce reliable, that is repeatable, observations [30, 31]. Validated questionnaires are often published in scientific journals, but they tend to be very specific to a given task or target group. For many applications a full validation study is not required, but it can be helpful to check with a small group of people, ideally from different backgrounds, whether their interpretations are similar. More often than not, you will find that the questions can be improved.
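A lightweight version of the small-group check described above can be scripted. The sketch below assumes, hypothetically, that each pilot respondent has explained in their own words what a question means and that these explanations have been hand-coded into short labels. For each question it reports the share of respondents who agree with the most common interpretation and flags questions below an arbitrary 80% threshold. The names `modal_agreement` and `flag_ambiguous` are made up for this example, and the approach is a quick screening aid, not a substitute for a formal validation study.

```python
from collections import Counter


def modal_agreement(codes):
    """Return the most common interpretation label and the share of respondents giving it."""
    if not codes:
        raise ValueError("no pilot responses for this question")
    label, count = Counter(codes).most_common(1)[0]
    return label, count / len(codes)


def flag_ambiguous(pilot, threshold=0.8):
    """Map each question whose modal agreement falls below threshold to that agreement.
    The 0.8 default is an illustrative assumption."""
    flagged = {}
    for question, codes in pilot.items():
        _, share = modal_agreement(codes)
        if share < threshold:
            flagged[question] = share
    return flagged


if __name__ == "__main__":
    # Hypothetical coded interpretations from five pilot respondents.
    pilot = {
        "How many days did you exercise last week?": ["days/week"] * 5,
        "How well did you sleep?": ["quality", "duration", "quality",
                                    "felt rested", "duration"],
    }
    print(flag_ambiguous(pilot))  # prints {'How well did you sleep?': 0.4}
```

In this made-up example the sleep question is flagged because respondents split between sleep quality, sleep duration and feeling rested, which suggests rewording it to ask about one aspect at a time.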


References


