Both data-driven techniques and biophysical modelling have recently contributed to huge advances in the field of cardiovascular research.
Data-driven approaches, such as machine learning techniques, are able to make sense of complex and heterogeneous datasets and can detect patterns that might be challenging for the human eye. As detailed in a recent review by Mincholé and Rodriguez, their application to medical signals such as electrocardiograms has helped to identify abnormal rhythms and mechanical dysfunction, and to improve patient stratification. Biophysical computer models, based on widely accepted physiological and physical concepts, allow the simulation of personalized cardiac function. They provide full control over their parameters and can thereby help to better understand physiological mechanisms. A recent review by Niederer, Lumens and Trayanova discusses their potential to identify cardiovascular disease mechanisms, predict clinical outcomes and suggest potential therapeutic options for patients with cardiac diseases.
However, these two approaches, taken individually, still face considerable challenges before they can reach clinical practice. Here, we discuss these challenges and show how the combination of “data-driven” and “model-driven” approaches, rather than their individual use, can revolutionize our knowledge of cardiovascular mechanisms and improve patient care.
Data-driven approaches are increasingly applied to the classification and clustering of medical data such as electrocardiographic (ECG) signals, cardiovascular images, patient demographics and clinical outcome data. They can extract hidden patterns in patient populations, providing phenotypic subgroups that may relate to adverse outcomes or risk. However, one major drawback of such techniques is the lack of interpretability of the outcome, as the decision process followed by the algorithm is often elusive. In this case, combining the data-driven approach with biophysical model simulations may help to understand the physiological mechanisms behind unexpected classification outcomes. A successful example of this combined approach was presented by Lyon et al. in Europace for risk stratification in hypertrophic cardiomyopathy. In previous work, the authors identified subgroups of patients based on their ECG morphology using blinded unsupervised learning techniques. These groups were associated with different levels of arrhythmic risk. In the Europace publication, the authors developed a personalized imaging-based computer simulation framework to investigate the effect of electrophysiological and structural abnormalities on the ECG. This combined approach revealed two different potential mechanisms (i.e. ionic remodelling and Purkinje-myocardial coupling abnormalities) that could explain the different phenotypic subgroups, and even pointed to different therapeutic strategies for hypertrophic cardiomyopathy patients. This example shows the potential of a combined approach to improve our understanding of disease at various scales, from cellular mechanisms all the way to clinical patient classification.
Conversely, the data-driven approach may identify phenotypes that the biophysical model cannot explain. In this case, the benefit of the combined approach would be mutual: the insights from the data-driven approach would help refine the physiological assumptions of the model.
Another well-known limitation of current data-driven techniques (such as machine learning or deep learning) is the need for a large amount of data to train and validate the algorithms and ensure proper generalization. This is difficult to achieve in practice, as the amount, quality and completeness of clinical data are often insufficient. New approaches may take advantage of synthetic data simulated with biophysical computer models to enrich the available datasets and help train robust and generalizable machine learning algorithms. A successful proof-of-concept example of such an approach was recently published by Ledezma et al., who combined electrophysiological computer models with neural networks to detect ischemic events. The authors used cellular computer models to generate populations of models with electrophysiological properties in the range of experimental data. Biomarkers computed from these synthetic data were then used to train neural networks to detect different degrees of ischemia. This work showed the impact of inter-subject variability on ECG-based ischemia detection and provided novel sets of biomarkers to detect ischemic events.
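The synthetic-data workflow described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the method of Ledezma et al.: the surrogate function `simulate_biomarkers`, its formulas, the parameter names `g_k` and `g_ca`, the accepted APD range and the single logistic unit (standing in for a neural network) are all hypothetical and were chosen only to make the three steps concrete: build a population of models within an assumed experimental range, compute biomarkers under control and ischemic conditions, and train a classifier on the resulting synthetic data.

```python
import math
import random


def simulate_biomarkers(g_k, g_ca, ischemia, rng):
    """Hypothetical surrogate for a cellular electrophysiology model.

    Returns two toy biomarkers: an action-potential-duration-like value (ms)
    and a resting-potential-like value (mV). The formulas are illustrative
    assumptions, not a physiological model.
    """
    apd = 300.0 * (g_ca ** 0.3) / (g_k ** 0.4) * (1.0 - 0.35 * ischemia)
    rest = -85.0 + 15.0 * ischemia + rng.gauss(0.0, 1.0)
    return apd, rest


def build_population(n, rng, apd_range=(250.0, 350.0)):
    """Sample conductance scalings and keep only models whose control APD
    falls inside an assumed 'experimental' range."""
    population = []
    while len(population) < n:
        g_k, g_ca = rng.uniform(0.5, 1.5), rng.uniform(0.5, 1.5)
        apd, _ = simulate_biomarkers(g_k, g_ca, 0.0, rng)
        if apd_range[0] <= apd <= apd_range[1]:
            population.append((g_k, g_ca))
    return population


def make_dataset(population, rng):
    """Simulate each accepted model once in control and once under a random
    degree of ischemia; return normalized features and binary labels."""
    data = []
    for g_k, g_ca in population:
        for label, level in ((0, 0.0), (1, rng.uniform(0.3, 1.0))):
            apd, rest = simulate_biomarkers(g_k, g_ca, level, rng)
            data.append((((apd - 300.0) / 50.0, (rest + 80.0) / 5.0), label))
    return data


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train(data, rng, epochs=200, lr=0.1):
    """Fit a single logistic unit by stochastic gradient descent."""
    w, b = [0.0, 0.0], 0.0
    data = list(data)
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            err = sigmoid(w[0] * x[0] + w[1] * x[1] + b) - y
            w[0] -= lr * err * x[0]
            w[1] -= lr * err * x[1]
            b -= lr * err
    return w, b


def accuracy(model, data):
    """Fraction of samples whose predicted class matches the label."""
    w, b = model
    hits = sum((sigmoid(w[0] * x[0] + w[1] * x[1] + b) >= 0.5) == bool(y)
               for x, y in data)
    return hits / len(data)


rng = random.Random(0)
train_set = make_dataset(build_population(200, rng), rng)
test_set = make_dataset(build_population(100, rng), rng)
model = train(train_set, rng)
test_accuracy = accuracy(model, test_set)
```

In this sketch, `test_accuracy` is evaluated on a second, independently sampled virtual population, which mirrors the need to check that a classifier trained on synthetic data generalizes across inter-subject variability. In a real study, the surrogate function would be replaced by a validated electrophysiological model and the acceptance range by experimental measurements.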
“Data-driven” and “biophysical modelling” approaches are thus highly complementary. Indeed, data-driven approaches can provide insights at the population scale and can help characterize patients according to their phenotype, while modelling approaches can provide us with unique insight into the mechanisms leading to these phenotypes. Combining these approaches therefore provides a framework to improve disease understanding from cellular mechanisms to population dynamics. The combination of these two approaches can also help address the methodological challenges that they both face, with the generation of virtual patient data, and the refinement of biophysical model assumptions.
We believe in the enormous potential of integrating these computational techniques in cardiovascular research and expect that their combination will provide a powerful artificial intelligence framework to improve patient stratification, arrhythmic risk prediction and discovery of potential therapeutic targets.