Note: This short series focuses on highly topical applications of AI and machine learning in cardiac studies. It does not consist of detailed review articles, but focused short educational articles to help clinicians apprehend this currently “hot topic” and read between the lines.
The automation of cardiac segmentation is a long-standing research problem. Drastic improvements in recent years now facilitate its transfer to clinically-applicable tools. This article gives a brief overview of where such improvements come from, how trustworthy the results are, and what could come next for clinical practice. We invite readers eager to go further to refer to more complete reviews on this topic, such as Chen et al.1
What’s new with machine learning?
Many traditional segmentation methods2 rely on models (e.g. deformable models or active contours) that are imposed on the imaging data, controlled by a few hyper-parameters (i.e. parameters determined by the user) that balance the accuracy of the segmented regions against the smoothness of their contours. These models are not specific to the image databases they are applied to, which limits their performance.
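The accuracy/smoothness trade-off mentioned above can be made concrete with a small sketch. The function below is a hypothetical, simplified illustration (not any specific published method): a contour is discretized as a list of 2D points, a user-supplied `image_fit` term scores how well each point matches the image, and a single hyper-parameter weights the smoothness penalty.

```python
# Hypothetical sketch of a classical deformable-model energy:
# total energy = data term (image fit) + weight * smoothness term.
# `image_fit` and `smoothness_weight` are illustrative names, not from the article.
def contour_energy(points, image_fit, smoothness_weight):
    # Data term: how well each contour point sits on the image structure.
    data_term = sum(image_fit(p) for p in points)
    # Smoothness term: squared distance between consecutive points
    # of the closed contour (points[-1] wraps around to points[0]).
    smooth_term = sum(
        (points[i][0] - points[i - 1][0]) ** 2 +
        (points[i][1] - points[i - 1][1]) ** 2
        for i in range(len(points))
    )
    return data_term + smoothness_weight * smooth_term
```

Raising `smoothness_weight` favors smoother contours at the expense of fidelity to the image, which is exactly the balance the user tunes by hand in such methods.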
Current machine learning approaches for segmentation rely on much more flexible models that go beyond this limitation. By definition they are optimized for the data they are trained on: training means that the model parameters (i.e. parameters determined by the algorithm itself) are automatically adjusted to best solve the problem on these data. This also means that the model may be inaccurate on new data unseen during training, if the model optimization did not include proper validation or if the new data are very different from the training data (e.g. training on a given hospital's database and testing on databases from other hospitals, or training on 1.5T MR sequences and testing on 3T ones).
Segmentation is posed as a supervised learning problem. This means that:
- the training data are labeled (labels being the manual annotations made by some experts),
- the model inputs are images (or patches), with the labels serving as training targets,
- the model outputs are the category assigned to each pixel (e.g. myocardium, cavity, etc.),
- learning aims at finding the best parameters of the model such that the segmentations obtained by the model best fit those of the experts.
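The supervised set-up above can be sketched in a few lines. The snippet below is an illustrative toy (class names and function names are assumptions, not from a specific tool): the model assigns a category to each pixel, and learning seeks parameters that maximize agreement with the expert labels, here measured with simple pixel-wise accuracy.

```python
# Illustrative class indices for the supervised segmentation problem.
CLASSES = ("background", "myocardium", "cavity")

def pixel_accuracy(predicted, expert):
    """Fraction of pixels where the model's category matches the expert's.

    predicted, expert: 2D grids (lists of lists) of class indices.
    Training would adjust the model parameters to push this towards 1.0.
    """
    total = 0
    correct = 0
    for row_pred, row_expert in zip(predicted, expert):
        for p, e in zip(row_pred, row_expert):
            total += 1
            correct += (p == e)
    return correct / total
```

In practice a differentiable loss (e.g. cross-entropy or a Dice-based loss) replaces raw accuracy during optimization, but the principle is the same: quantify the fit between the model's segmentation and the expert's, then adjust the parameters to improve it.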
Why such results with deep learning?
Neural networks are a subcategory of machine learning techniques able to model complex relationships between inputs (here, the images) and outputs (here, the segmentations) (Figure 1).
Figure 1: Automatic cardiac segmentation means modeling the complex relationship between the input images and the output segmented regions (myocardium, cavity, etc.). A simple relationship can be linear, meaning that a linear combination (formulated through the coefficients of a matrix) relates inputs and outputs, as in standard regression models. In the case of segmentation, these relationships are non-linear, meaning that they cannot be expressed by a simple matrix. Nonetheless, they can be estimated by chaining several simple (linear and non-linear) operations as neural networks do.
(a) Generic view of a neural network used for segmentation.
(b) The U-Net architecture3, which became popular for (cardiac) segmentation. Here, convolutional layers replace the standard neurons depicted in (a).
A simple neural network consists of an intermediate layer of neurons connecting the inputs and outputs. Each neuron performs a simple non-linear operation to transform the input data towards the desired solution. The network parameters are the weights attributed to these connections (determined by the algorithm itself), while the hyper-parameters define the network itself (e.g. the number of neurons in the layer, the non-linear functions used by each neuron, all determined by the user). A deep learning network consists of several intermediate layers (at least three) connected to each other, and is therefore able to model more complex relationships, at the cost of more parameters to optimize. Such models are extremely interesting for segmentation because they can apprehend the complexity of this relationship (e.g. uniformity of the myocardium surrounded by brighter/darker pixels, image gradient at the transition, multi-scale image representation, etc.) without having to make it explicit.
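The "weighted connections followed by a non-linear function" idea can be illustrated with a minimal sketch (a toy forward pass, not a trainable implementation): each neuron computes a weighted sum of its inputs plus a bias, then applies a non-linearity (a sigmoid here, as an assumed choice), and chaining two such layers already gives a non-linear input-output relationship as in Figure 1a.

```python
import math

def sigmoid(x):
    # A classic non-linear activation; its choice is a hyper-parameter.
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights, biases):
    # One neuron per row of `weights`: weighted sum of the inputs,
    # plus a bias, passed through the non-linearity.
    # The weights and biases are the parameters adjusted during training.
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_w, inputs)) + b)
        for neuron_w, b in zip(weights, biases)
    ]

def tiny_network(inputs, hidden_w, hidden_b, out_w, out_b):
    # Chaining two simple (linear + non-linear) operations,
    # as in the generic network of Figure 1a.
    return layer(layer(inputs, hidden_w, hidden_b), out_w, out_b)
```

A deep network simply chains more such layers (and, for images, replaces the weighted sums with convolutions, as in the U-Net of Figure 1b).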
Note that, through an abuse of language, deep learning is often referred to as “artificial intelligence” (AI), while AI is actually a much broader field encompassing other techniques such as reinforcement learning, natural language processing, etc.
Database issues, recent advances and challenges
As the success of deep learning methods highly depends on the quality of the databases used for learning, several initiatives have emerged in recent years to offer such databases with expert annotations to the community. In particular, there has been huge interest in MRI sequences, driven by several public challenges (Kaggle, ACDC-MICCAI, etc.) and large database initiatives (Cardiac Atlas Project, UK Biobank, etc.), which led to segmentations of 2D slices near the level of performance of experts4. MRI sequences are actually well suited for segmentation as they exhibit rather well-defined boundaries between the different regions to segment, except in the presence of many trabeculations, artifacts, or thin myocardium. Other types of images such as echocardiography or late-enhancement MRI are more challenging, although promising segmentation results are starting to be released.
Recent works consider the full volume (e.g. stack of slices in MR or 3D B-mode images in echocardiography) as 3D input for the segmentation model. However, processing 3D data requires substantially more demanding computations and yields far fewer training samples than processing the volume as a set of independent 2D images. Conversely, in the 2D case the consistency of segmentations between the slices of the same volume, or across the frames of the same sequence, is not guaranteed, and careful double-checking is required in more challenging regions such as the apex or the right ventricle, until robust solutions are implemented in clinically-available tools.
Interpreting what guided the model towards a given segmentation result is not a must-have for clinical observers (apart from improving these models), contrary to other applications such as diagnosis or prognosis, where interpretability is a hot research topic.
Although deep learning solutions are in essence optimized for the data they are trained on, there exist well-known validation strategies to prevent overfitting to the training data and foster generalization to new datasets "not too different" from the training set. Nonetheless, the transferability of a model to databases with different properties, including from different fields (e.g. from computer vision to medical imaging), is a crucial challenge under active research. The specificity of the databases used for learning is also a key aspect under-addressed at the moment, as these mostly consist of a specific type of population (even for large databases), potentially from a single type of device, with expert-dependent annotations.
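The simplest of the validation strategies mentioned above is a hold-out split: part of the labeled data is set aside before training, so that overfitting shows up as a gap between training and validation performance. The sketch below is a minimal illustration of that idea (function and parameter names are assumptions for this example).

```python
import random

def split_dataset(cases, val_fraction=0.2, seed=0):
    """Hold out a validation subset before training.

    cases: any iterable of labeled cases (e.g. patient identifiers).
    Returns (training set, validation set); the validation cases are
    never shown to the optimizer, only used to monitor generalization.
    """
    cases = list(cases)
    random.Random(seed).shuffle(cases)  # deterministic shuffle for the sketch
    n_val = max(1, int(len(cases) * val_fraction))
    return cases[n_val:], cases[:n_val]
```

Note that this only guards against overfitting within one database; it says nothing about transfer to a hospital, scanner, or population absent from the training data, which is precisely the open challenge discussed above.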
Impact for the engineering and clinical communities
Although several challenges still exist, the recent advances in cardiac segmentation have already profoundly impacted the engineering and clinical communities.
Engineers witnessed a boom in the creation of annotated databases and in computational power, which enabled the use of neural networks for segmentation purposes. It also gave strong visibility to the field, even attracting researchers not originally dedicated to medical imaging, although medical images raise specific challenges as discussed above.
Reliable automatic segmentation not only frees clinicians from a time-consuming and subjective task, but also enables the automatic extraction of anatomical descriptors, to be complemented by motion tracking in the near future. The involvement of industrial actors provides user-friendly tools already applicable in the clinic, provided that the end-user keeps a critical view on the main limitations of these methods. The ESC and the EACVI recently launched a task force on regulatory issues associated with diagnostic imaging devices, including software, which should provide recommendations on database and software standards and discuss each actor’s responsibilities.
Conflict of Interest
Nothing to declare