In this paper, Gajardo et al. highlight the role of big data on social determinants in better understanding the epidemiology of coronary heart disease (CHD) (1). The role of social status in chronic disease was already underlined by Marc Lalonde in the mid-1970s, when he introduced the "social health" conceptual model. It was underlined again nearly 20 years later, in the early 1990s, by Dahlgren and Whitehead. Their social "rainbow" model of health presented various economic, environmental and social inequalities as determinants of people's health status, of their ability to prevent sickness, and of their access to effective treatments.
Gajardo et al. discussed the report of a large-scale, population-based study of 137,408 CHD patients treated in primary care in Australia, which examined the social determinants of CHD (2). In this report, the authors found that only one half of CHD patients received adequate medication and that only one half were screened for CHD-associated risk factors. All these findings were analysed through the lens of patients' social status. In their discussion in the European Journal of Preventive Cardiology, Gajardo et al. (1) underlined the importance of using "Big Data" in epidemiologic research, with the study's large sample size being what qualified its data as "big".
Indeed, although epidemiological studies commonly include thousands of participants, it is far less common for a single study to include 100,000 or more individuals. "Big Data" is hard to define, but when you "see" it, you know it exists! Today, large amounts of information are stored in electronic health records, registries, medical images, social media channels and other sources, meeting the five "V"s that characterise Big Data: variety, volume, velocity, value and veracity. Without doubt, Big Data in epidemiologic research is essential to better describe and understand the "unknown" reference population, because sampling error becomes very small as the sample grows; selection bias, however, is not eliminated by sample size alone.
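To illustrate how sampling error shrinks as the sample grows, the following minimal Python sketch computes the standard error of a proportion, SE = sqrt(p(1 - p)/n), for a few sample sizes. The proportion p = 0.5 and the two smaller sample sizes are assumptions chosen for illustration; only n = 137,408 comes from the study discussed above, and the function name is hypothetical.

```python
import math


def standard_error_of_proportion(p: float, n: int) -> float:
    """Standard error of a sample proportion: sqrt(p * (1 - p) / n)."""
    if not 0.0 <= p <= 1.0 or n <= 0:
        raise ValueError("p must be in [0, 1] and n must be positive")
    return math.sqrt(p * (1.0 - p) / n)


# Illustrative sample sizes: two hypothetical cohorts and the 137,408 patients
# of the Australian study. p = 0.5 is an assumed proportion (e.g. the share of
# patients receiving adequate medication); it gives the largest possible SE.
for n in (1_000, 10_000, 137_408):
    se = standard_error_of_proportion(0.5, n)
    print(f"n = {n:>7,}: SE = {se:.4f}")

# Note: a smaller SE only reduces *random* sampling error. If the records come
# from a non-representative source, the estimate stays biased however large n is.
```

Because the SE falls with the square root of n, going from 1,000 to 137,408 records narrows it by a factor of roughly 11.7, but it does nothing to correct a biased sampling frame.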
Moreover, Big Data analytics may also help to develop personalised medicine, which seems to be the future of modern public health. However, as Gajardo et al. (1) also point out, low- and middle-income countries lack the resources needed to produce such Big Data, which widens the gaps in knowledge about social disparities in health. The United Nations (UN) "Global Pulse" programme was developed in recognition that the huge amount of digital data now available may offer great opportunities to better understand changes in human well-being and to obtain real-time feedback on the effectiveness of applied policies. It also aims to promote the use of Big Data for the study of non-communicable diseases, and thus to reduce the barriers to providing information that some countries face. An important issue that everyone interpreting Big Data needs to understand, however, is that when common analytical methods are applied to such large samples, statistical power becomes so high that even trivially small effects yield small p-values, so the "null" hypothesis is rejected almost routinely. Strictly speaking, the nominal Type I error rate (i.e., the probability of falsely rejecting a true "null" hypothesis) stays at the chosen significance level; what increases is the chance of obtaining results that are statistically significant but practically meaningless.
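The effect of sample size on p-values can be made concrete with a standard two-sided z-test for the difference between two proportions. In this minimal sketch, the proportions (50% versus 51% of patients screened in two social groups) are hypothetical, as is the assumption that a 1 percentage-point gap is clinically trivial; the larger group size is simply half of the 137,408 patients, and the function name is illustrative.

```python
import math


def two_proportion_z_test(p1: float, p2: float, n1: int, n2: int) -> tuple[float, float]:
    """Two-sided z-test for a difference between two independent proportions.

    Uses the pooled proportion under H0: p1 == p2. Returns (z, p_value).
    """
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    z = (p2 - p1) / se
    # Two-sided p-value from the standard normal: P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Hypothetical proportions: 50% vs. 51% of patients screened in two social groups,
# i.e. the same 1 percentage-point difference tested at two sample sizes.
for n_per_group in (500, 68_704):
    z, p = two_proportion_z_test(0.50, 0.51, n_per_group, n_per_group)
    print(f"n per group = {n_per_group:>6,}: z = {z:.2f}, p = {p:.4g}")
```

The effect is identical in both runs; only the sample size changes, yet the p-value drops from roughly 0.75 to roughly 0.0002. The test has not become "wrong"; it has simply become powerful enough to flag a difference that may not matter.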
Instead of focusing on the p-values of statistical tests, investigators should look at the magnitude of the effects. The same statistical power that produces meaningless significant p-values can be redirected towards estimation, yielding narrow confidence intervals around the effect sizes. Investigators can then decide that a phenomenon is not worth following up on, better understand the data and identify the truly large effects, which are typically those that give the most leverage for action. In conclusion, it is of major importance to improve access to population-level Big Data, but at the same time to use appropriate analytical techniques, in order to provide policy makers with robust tools to better understand and manage social inequalities in health.
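The shift from p-values to effect sizes and interval estimates can be sketched with a Wald confidence interval for a difference between two proportions, compared against a minimum difference considered relevant for policy. The proportions, group sizes, function name and the 5 percentage-point threshold below are all hypothetical assumptions for illustration; in practice such a threshold would have to be justified clinically or by policy makers.

```python
import math
from statistics import NormalDist


def risk_difference_ci(
    p1: float, p2: float, n1: int, n2: int, level: float = 0.95
) -> tuple[float, float, float]:
    """Wald confidence interval for the difference p2 - p1 between two proportions.

    Returns (estimate, lower, upper). The Wald interval is a simple large-sample
    approximation, adequate when n is large and proportions are not near 0 or 1.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 for a 95% interval
    diff = p2 - p1
    se = math.sqrt(p1 * (1.0 - p1) / n1 + p2 * (1.0 - p2) / n2)
    return diff, diff - z * se, diff + z * se


# Hypothetical threshold: assume a difference below 5 percentage points is not
# relevant for policy. This value is illustrative, not taken from the study.
MIN_RELEVANT_DIFFERENCE = 0.05

est, lo, hi = risk_difference_ci(0.50, 0.51, 68_704, 68_704)
print(f"difference = {est:.3f}, 95% CI = ({lo:.4f}, {hi:.4f})")
if hi < MIN_RELEVANT_DIFFERENCE:
    # The whole interval lies below the threshold: the effect is real but small.
    print("Statistically detectable, but reliably smaller than the relevant threshold.")
```

With about 68,700 patients per group, the interval spans only about one percentage point, so the large sample now works in the investigator's favour: it shows precisely that the effect, although "significant", is too small to act on.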