Congenital heart disease (CHD), in which structural abnormalities of the heart arise during embryonic development, is the most common type of human birth defect, affecting ~1% of live births. An estimated 30-50% of cases are thought to have a simple genetic cause. Exome or whole genome sequencing is now routinely performed on many CHD cases, and sequence data from tens of thousands of individuals and family members are available for analysis. From this vast accumulation of data, it has become apparent that CHD has a high degree of genetic heterogeneity, with sequence variants associated with particular CHD cases having been identified in >100 genes. Furthermore, estimates suggest that there might be 300 or more CHD-associated genes yet to be discovered. How best to identify these genes quickly and efficiently is the subject of many research projects worldwide.
In this paper, Gonzalez-Teran et al. test the hypothesis that sequence variants in proteins that interact with known cardiac transcription factors will be candidates for causing CHD. They focussed on proteins interacting with two transcription factors long known to cause many cases of CHD: TBX5 (associated with Holt-Oram syndrome) and GATA4 (associated with septal defects, pulmonary stenosis and outflow tract abnormalities). To define sets of proteins capable of interacting with these transcription factors, they first differentiated human iPSCs into cardiac progenitors and cardiomyocytes using well-established protocols [3,4]. They then performed immunoprecipitation using TBX5 or GATA4 antibodies and identified the immunoprecipitated proteins by mass spectrometry. Parallel experiments using differentiated TBX5 or GATA4 knockout iPSCs served as controls to eliminate non-specifically immunoprecipitated proteins.
High-stringency filtering led to 272 potential interactors, which they termed the “interactome”. Unexpectedly, >85% of the corresponding genes were ubiquitously expressed according to the Human Protein Atlas, rather than being heart-specific. Importantly, the interactome included a few previously identified genes known to cause CHD, for example CHD7 for GATA4 and TAB2 for TBX5. The top two biological processes revealed by gene ontology enrichment analysis were “transcription regulation” and “chromatin modification”, as might be expected from proteins capable of interacting with a transcription factor. This gene list was cross-referenced with a sequence variation dataset of very rare (minor allele frequency <10⁻⁵) loss-of-function (LoF) or missense variants in the exome/genome sequences of 3,000 CHD trios. CHD cases had a significant enrichment of variants in the interactome genes compared to control genes, even when known cardiac development genes were removed from the list. In total, the authors identified 20 LoF and 53 missense variants as candidates for causing CHD in individual cases.
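The cross-referencing and enrichment step described above can be illustrated with a minimal sketch. This is not the authors' pipeline: the variant record layout (`gene`, `maf`, `consequence`), the function names and the use of a one-sided Fisher's exact test on a 2x2 table are all assumptions made for illustration. Only the frequency cut-off and the variant classes come from the text.

```python
from math import comb

MAF_CUTOFF = 1e-5                 # "very rare" threshold quoted in the text
DAMAGING = {"lof", "missense"}    # variant classes considered in the text


def is_candidate(variant, interactome):
    """True if a variant is very rare, LoF/missense, and in an interactome gene.

    `variant` is a hypothetical dict with keys 'gene', 'maf', 'consequence'.
    """
    return (
        variant["maf"] < MAF_CUTOFF
        and variant["consequence"] in DAMAGING
        and variant["gene"] in interactome
    )


def fisher_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns P(X >= a) under the hypergeometric null, e.g. with
    a = case variants in interactome genes, b = case variants elsewhere,
    c = control variants in interactome genes, d = control variants elsewhere.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    tail = sum(
        comb(col1, k) * comb(n - col1, row1 - k) for k in range(a, upper + 1)
    )
    return tail / comb(n, row1)


# Toy usage with made-up records (purely illustrative, not data from the paper).
interactome = {"GLYR1", "CHD7"}
toy_variants = [
    {"gene": "GLYR1", "maf": 0.0, "consequence": "missense"},
    {"gene": "GLYR1", "maf": 1e-3, "consequence": "missense"},   # too common
    {"gene": "OTHER", "maf": 0.0, "consequence": "lof"},         # not an interactor
]
candidates = [v for v in toy_variants if is_candidate(v, interactome)]
```

A small p-value from `fisher_greater` would indicate that cases carry more qualifying variants in interactome genes than expected by chance. The published analysis may well use a different statistical model.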
These potential CHD-causing sequence variants were prioritized for further analysis by in silico integration of a combination of widely used gene and variant metrics to assess variant deleteriousness, weighted depending on the gene’s expression in the developing mouse heart (E14.5). The resulting ranking was then validated in vitro using transcription co-activation assays. Encouragingly, the highest-scoring variants, in TBX5, GATA6, CHD4 and CHD7, had all been previously identified in the cohort. New high-scoring variants were found in the chromatin modifier genes BRD4, SMARCC and GLYR1 (for GATA4); and CSNK2A1 and SAP18 (for TBX5). In the remaining experiments, only the GLYR1 variant was analysed in detail, as proof-of-concept for the potential of this approach to identify new CHD genes.
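To make the idea of an expression-weighted prioritization concrete, here is a minimal sketch. It assumes that each deleteriousness metric has already been rescaled to the range 0 to 1, and that cardiac expression is summarised as a percentile between 0 and 1. The metric names, the weighted-average combination and the multiplicative expression weighting are hypothetical choices, not the scoring scheme used in the paper.

```python
def priority_score(metrics, expression_percentile, weights=None):
    """Combine rescaled deleteriousness metrics and weight by heart expression.

    metrics: dict of hypothetical metric name -> value in [0, 1]
    expression_percentile: expression in the developing heart, in [0, 1]
    weights: optional dict of metric name -> relative weight (default: equal)
    """
    if weights is None:
        weights = {name: 1.0 for name in metrics}
    total_weight = sum(weights[name] for name in metrics)
    deleteriousness = (
        sum(weights[name] * metrics[name] for name in metrics) / total_weight
    )
    # Assumption: expression acts as a simple multiplicative weight.
    return deleteriousness * expression_percentile


def rank_variants(variants):
    """Sort hypothetical variant records by descending priority score."""
    return sorted(
        variants,
        key=lambda v: priority_score(v["metrics"], v["expression_percentile"]),
        reverse=True,
    )


# Toy usage with invented values (not taken from the paper).
toy = [
    {"id": "variant_A", "metrics": {"m1": 0.9, "m2": 0.8}, "expression_percentile": 0.9},
    {"id": "variant_B", "metrics": {"m1": 0.9, "m2": 0.8}, "expression_percentile": 0.2},
]
ranked_ids = [v["id"] for v in rank_variants(toy)]
```

With identical deleteriousness metrics, the variant in the more highly expressed gene ranks first, which is the behaviour the weighting is meant to capture.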
GLYR1 is a chromatin reader involved in chromatin modification and regulation of gene expression through nucleosome demethylation. Co-expression of GLYR1 with GATA4 in vitro increases transcriptional activation, and the amino acid change caused by the variant destabilises the GLYR1-GATA4 interaction. ChIP-Seq suggests that GLYR1 is recruited to a large set of cardiac genes during cardiomyocyte differentiation from human iPSCs, with significant co-occupation of regulatory elements with GATA4. Finally, CRISPR/Cas9-mediated genome editing was used to introduce the human sequence variant into mouse Glyr1. 54% of homozygous and 16% of heterozygous pups carrying this allele died shortly after birth (compared with 4% of wild-type pups), and 15% of the homozygotes had ventricular septal defects. However, when mice carrying this allele were crossed with a strain carrying a Gata4 null allele, double heterozygous embryos had completely penetrant septal defects. Together, these data strongly support the hypothesis that this sequence variant is causative of CHD in this individual, or is “likely pathogenic” in the official ACMG nomenclature [5].
This paper is a tour de force, employing a bewildering battery of up-to-the-minute techniques to identify and verify new candidate CHD genes. As such, its conclusions are beyond reproach. However, the very nature of this paper highlights a real problem at the heart of recent studies of the genetic origins of CHD. Namely, all of this exemplary work has only revealed a single CHD case with a de novo, likely pathogenic, variant in GLYR1 out of a population of several thousand CHD cases. No other CHD case with a de novo or inherited variant in GLYR1 was identified, suggesting that the contribution of mutation of this gene to CHD in the wider population is at best low. One is left wondering whether this approach is viable or sustainable for future studies.