Increased Frequency of De Novo Copy Number Variations in Congenital Heart Disease by Integrative Analysis of SNP Array and Exome Sequence Data

Author manuscript; available in PMC: 2015 Oct 24.

Abstract

Rationale

Congenital heart disease (CHD) is among the most common birth defects. Most cases are of unknown etiology.

Objective

To determine the contribution of de novo copy number variants (CNVs) in the etiology of sporadic CHD.

Methods and Results

We studied 538 CHD trios using genome-wide dense single nucleotide polymorphism (SNP) arrays and/or whole exome sequencing (WES). Results were experimentally validated using digital droplet PCR. We compared validated CNVs in CHD cases to CNVs in 1,301 healthy control trios. The two complementary high-resolution technologies identified 63 validated de novo CNVs in 51 CHD cases. A significant increase in CNV burden was observed when comparing CHD trios with healthy trios, using either SNP array (_p_=7x10−5, Odds Ratio (OR)=4.6) or WES data (_p_=6x10−4, OR=3.5), and the increase remained significant after removing 16% of de novo CNV loci previously reported as pathogenic (_p_=0.02, OR=2.7). We observed recurrent de novo CNVs on 15q11.2 encompassing CYFIP1, NIPA1, and NIPA2 and single de novo CNVs encompassing DUSP1, JUN, JUP, MED15, MED9, PTPRE, SREBF1, TOP2A, and ZEB2, genes that interact with established CHD proteins NKX2-5 and GATA4. Integrating de novo variants in WES and CNV data suggests that ETS1 is the pathogenic gene altered by 11q24.2-q25 deletions in Jacobsen syndrome and that CTBP2 is the pathogenic gene in 10q sub-telomeric deletions.

Conclusions

We demonstrate a significantly increased frequency of rare de novo CNVs in CHD patients compared with healthy controls and suggest several novel genetic loci for CHD.

Keywords: De novo copy number variation, congenital heart disease, SNP-array, whole exome sequencing, CNV burden, congenital cardiac defect, microarray, genomics

INTRODUCTION

Congenital heart disease (CHD) is the most frequent birth defect, affecting approximately 7 in 1000 live births,1 and is a significant cause of childhood morbidity and mortality.2 Rare Mendelian disorders, specific chromosomal abnormalities, and copy number variants (CNVs) are known to explain a subset of CHD cases,2-4 but the cause of over 80% of CHD remains unexplained.5-12

The application of evolving technologies that detect structural variation throughout the genome has demonstrated a considerable contribution of CNVs to CHD. Early cytogenetic studies recognized an increased prevalence of de novo chromosomal abnormalities in syndromic CHD patients, observations that were replicated and extended to non-syndromic CHD with successive generations of CNV detection technologies, including array CGH and low-density SNP arrays. Using these techniques, researchers have demonstrated a significant burden of large de novo CNVs in some specific CHD lesions. Such CNVs are reported to occur in 13.9% of infants with single ventricles compared to 4.4% of controls,13 in 10% of patients with non-syndromic tetralogy of Fallot (TOF) compared to 4% of controls,5 and in 12.7% of children with hypoplastic left heart syndrome compared to 2% of controls.20 Across different CHD lesions, the frequency of large de novo CNVs is similar.20 While many large CNVs are unique to a single CHD patient, several recur in CHD cohorts. A 3-Mb 22q11.2 deletion is the most common recurrent de novo CNV associated with syndromic conotruncal defects (CTDs) and is found overall in at least 10% of TOF, 35% of truncus arteriosus, and 50% of interrupted aortic arch (IAA) type B cases.23 Recurrent de novo CNVs in CHD patients reported in multiple studies also occur at chromosomes 1q21.1, 3p25.1, 7q11.23, 8p23.1, 11q24-25, and 16p13.11.

The identification of CHD loci that are altered by CNVs provides opportunities to elucidate disease pathogenesis. However, discerning the causal gene(s) and inferring the networks and pathways that cause or contribute to CHD has been difficult, because the low-resolution technologies used in many studies (array CGH and low-density SNP arrays) typically detect only large CNVs (>100 kb) that involve many genes. To address these issues, we used two independent strategies, high-density SNP genotyping arrays (Illumina Omni-1.0 and 2.5M) and whole exome sequencing (WES), to detect smaller de novo CNVs in a family-based trio study of sporadic CHD cases with conotruncal, heterotaxy, and left ventricular outflow tract defects.24 We compared CNVs found in CHD trios to those identified in healthy control trios. Through these analyses we sought to compare the robustness of genome-wide CNV detection using array-based and sequence-based technologies; to determine whether CHD patients have an increased burden of smaller de novo CNVs, as has been demonstrated for larger CNVs; and to determine whether the smaller number of genes altered by these CNVs enables more precise identification of gene networks and pathways contributing to CHD pathogenesis.

METHODS

Ethics statement

The protocol was approved by the Institutional Review Boards of Boston Children's Hospital, Brigham and Women's Hospital, Great Ormond St. Hospital, Children's Hospital of Los Angeles, Children's Hospital of Philadelphia, Columbia University Medical Center, Icahn School of Medicine at Mount Sinai, University of Rochester School of Medicine and Dentistry, Steven and Alexandra Cohen Children's Medical Center of New York, and Yale School of Medicine. Written informed consent was obtained from each participating subject or their parent/guardian.

Patient cohorts

CHD probands and parents were recruited into the CHD Genes Study of the Pediatric Cardiac Genomics Consortium (CHD genes: ClinicalTrials.gov identifier NCT01196182) as previously described,24 using protocols approved by Institutional Review Boards of each institution. Trios selected for this study had no history of CHD in first-degree relatives. CHD diagnoses were obtained from echocardiograms, catheterization and operative reports; extra-cardiac findings were extracted from medical records and included dysmorphic features, major anomalies, non-cardiac medical problems, and deficiencies in growth or developmental delay. The etiologies for CHD were unknown; patients with previously identified cytogenetic anomalies or pathogenic CNVs identified through routine clinical evaluation were excluded. Whole blood samples were collected and genomic DNA extracted.

CHD trios were studied by SNP arrays (n=415) or by WES (n=356), including a subset (n=233) that were analyzed by both methods (538 unique trios in total). The distribution of CHD lesions in patients genotyped by arrays was: 403 (61%) left ventricular obstruction (LVO); 197 (30%) conotruncal defects (CTD); 49 (7%) heterotaxy (HTX); and 12 (2%) other cardiac diagnoses (Supplemental Table I). The distribution of CHD lesions in patients studied by WES was: 284 (46.1%) LVO; 235 (38.1%) CTD; 78 (12.7%) HTX; and 19 (3.1%) other cardiac diagnoses (Supplemental Table II).

Control trios comprised the unaffected sibling and parents of a child with autism, consented and recruited through the Simons Simplex Collection (SSC). CNVs were identified in control trios in the same way as in case trios, using SNP arrays (n=814) or WES (n=872), including a subset (n=385) analyzed by both methods (1,301 unique control trios in total).

Additional data on the distribution and prevalence of previously reported CNVs in the general population were derived from the Database of Genomic Variants (http://dgv.tcag.ca) and from 649 de-identified control subjects who had participated in an unrelated psychiatric case-control study and were genotyped on the same high-density SNP array platforms at the same genotyping center as the CHD probands (438 on the Illumina Omni-1M and 211 on the Illumina Omni-2.5M). These controls were used only to prioritize de novo CNVs identified by SNP array methods for confirmation analyses.

Array genotyping and CNV identification

A total of 360 CHD parental samples genotyped on the Omni1M and 654 on the Omni2.5M arrays were used to define genotype clusters with the Illumina Genome Studio clustering algorithm. Raw data are publicly available through the database of Genotypes and Phenotypes (dbGaP), National Heart, Lung, and Blood Institute (NHLBI) Bench to Bassinet Program: The Pediatric Cardiac Genetics Consortium (PCGC), under dbGaP Study Accession phs000571.v1.p1. We removed clusters with outlier values of SNP call rate, Hardy-Weinberg equilibrium, AA/AB/BB cluster means, and minor allele frequency, which reduced the intensity noise (Log R ratio standard deviation) of CHD samples from a mean of 0.2 (using the default Illumina cluster file) to 0.1. Individual samples were then filtered through a standard quality control pipeline.25 B-allele frequency (BAF) and Log R ratio (LRR) values were exported from Illumina Genome Studio. Only samples with SNP call rate >98%, standard deviation (SD) of normalized intensity (LRR) <0.3, absolute value of GC-corrected LRR <0.005, and CNV call count <800 for Omni1-Quadv1 or <300 for Omni2.5-8v1 were included.63 Samples with high inbreeding coefficients, duplicated samples, samples with gender mismatches, and trios with Mendelian errors >1% were removed from analyses. We started with 1,536 genotyped samples (512 trios), including 561 on the Illumina Omni-1M and 969 on the Illumina Omni-2.5M. Four hundred and sixty-one trios had the same array version for all family members. After these QC procedures, 1,245 samples (447 genotyped on the Illumina Omni-1M and 798 on the Illumina Omni-2.5M high-density SNP array platforms), constituting 415 complete trios, were taken forward for analysis (Supplemental Table III).
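
The sample- and trio-level filters above can be expressed as a simple predicate. The sketch below is a hypothetical illustration, not the study's pipeline: the threshold values come from the text, but the `SampleQC` record, its field names, and the way the Mendelian error rate is computed (errors divided by informative SNPs) are assumptions, and the inbreeding, duplicate, and gender checks are omitted.

```python
from dataclasses import dataclass

# Maximum raw CNV call count per array version (thresholds from the text).
MAX_CNV_CALLS = {"Omni1-Quadv1": 800, "Omni2.5-8v1": 300}


@dataclass
class SampleQC:
    """Hypothetical per-sample QC summary exported from the genotyping pipeline."""
    sample_id: str
    array: str          # key into MAX_CNV_CALLS
    call_rate: float    # fraction of SNPs called (0-1)
    lrr_sd: float       # standard deviation of Log R ratio
    gc_lrr: float       # GC-corrected LRR summary value
    cnv_calls: int      # number of raw CNV calls


def sample_passes(s: SampleQC) -> bool:
    """Apply the per-sample thresholds described in the text."""
    return (
        s.call_rate > 0.98
        and s.lrr_sd < 0.3
        and abs(s.gc_lrr) < 0.005
        and s.cnv_calls < MAX_CNV_CALLS[s.array]
    )


def trio_passes(child: SampleQC, mother: SampleQC, father: SampleQC,
                mendel_errors: int, informative_snps: int) -> bool:
    """Keep a trio only if all three members pass and Mendelian errors are <= 1%."""
    if informative_snps <= 0:
        return False
    error_rate = mendel_errors / informative_snps  # assumed definition of the error rate
    return all(sample_passes(s) for s in (child, mother, father)) and error_rate <= 0.01


if __name__ == "__main__":
    kid = SampleQC("kid", "Omni2.5-8v1", 0.995, 0.15, 0.001, 120)
    mom = SampleQC("mom", "Omni2.5-8v1", 0.990, 0.18, -0.002, 150)
    dad = SampleQC("dad", "Omni2.5-8v1", 0.970, 0.16, 0.000, 140)
    print(trio_passes(kid, mom, dad, mendel_errors=10, informative_snps=5000))  # False: father's call rate < 98%
```

Because the CNV-count ceiling differs by array version, keying it on the array label keeps one predicate usable for both platforms.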

Three groups (CHOP, Harvard, Yale) independently analyzed the genotyping data using slightly different algorithms to detect putative de novo CNVs. In each of the three analyses, CNVs were called for each subject using PennCNV,64 with its hidden Markov model (HMM) algorithm and custom population frequency of B-allele (PFB) and GC model files. CNVs were called when 10 or more consecutive probes demonstrated a consistent copy number change. The PennCNV detect_cnv --trio option, which incorporates transmission probabilities within the family, was applied to re-call CNVs initially scored as de novo. Fragmented CNV calls were merged using clean_cnv. All candidate CNVs were visually inspected to ensure that the pattern of LRR and B-allele frequency was consistent with the CNV call. Additionally, Gnosis,25 QuantiSNP,65 and Nexus (biodiscovery.com) were used to increase specificity. De novo CNVs were prioritized for quality by genomic length, number of probes, confidence score based on signal strength, at least 50% overlap between calls from two or more algorithms, low parental-origin p-value using infer_snp_allele, and visual BAF/LRR review. CNVs with a minor allele frequency >1% were removed, leaving only rare CNVs. All putative de novo CNVs were experimentally evaluated by digital droplet PCR (ddPCR; Supplemental Figure I), and only validated CNVs are reported.
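
The multi-algorithm prioritization criterion (at least 50% overlap between calls from two or more algorithms) can be sketched as follows. The text does not say how overlap was measured; this sketch assumes reciprocal overlap (the shared length must cover at least 50% of both calls) and requires matching CNV type. `CNVCall` and `consensus_calls` are hypothetical names, and the coordinates in the usage example are illustrative only.

```python
from typing import Dict, List, NamedTuple, Tuple


class CNVCall(NamedTuple):
    chrom: str
    start: int  # 0-based, half-open interval [start, end)
    end: int
    kind: str   # "del" or "dup"


def reciprocal_overlap(a: CNVCall, b: CNVCall) -> float:
    """Shared length as a fraction of the longer call (so both calls are covered at least this much)."""
    if a.chrom != b.chrom or a.kind != b.kind:
        return 0.0
    shared = min(a.end, b.end) - max(a.start, b.start)
    if shared <= 0:
        return 0.0
    return shared / max(a.end - a.start, b.end - b.start)


def consensus_calls(calls_by_tool: Dict[str, List[CNVCall]],
                    min_overlap: float = 0.5,
                    min_tools: int = 2) -> List[Tuple[str, CNVCall]]:
    """Return (tool, call) pairs supported by at least `min_tools` tools, counting the caller itself."""
    supported = []
    for tool, calls in calls_by_tool.items():
        for call in calls:
            n_tools = 1 + sum(
                any(reciprocal_overlap(call, other) >= min_overlap for other in other_calls)
                for other_tool, other_calls in calls_by_tool.items()
                if other_tool != tool
            )
            if n_tools >= min_tools:
                supported.append((tool, call))
    return supported


if __name__ == "__main__":
    calls = {
        "PennCNV": [CNVCall("chr15", 22_836_000, 23_062_000, "del")],
        "QuantiSNP": [CNVCall("chr15", 22_840_000, 23_050_000, "del")],
        "Nexus": [CNVCall("chr2", 1_000_000, 1_050_000, "dup")],
    }
    for tool, call in consensus_calls(calls):
        print(tool, call)
```

A reciprocal criterion prevents a very large call from "supporting" a small call that it merely contains; a one-sided overlap rule would be more permissive.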

De novo CNV loci previously reported as pathogenic were defined as those with reported recurrence in at least two publications using independent data. Although some of the CNVs reported here overlap with previously reported CNVs in CHD patients based on review of the literature,66 they do not meet our frequency criterion for previously reported pathogenic de novo CNV loci.

CNV identification and variant calling from WES data

WES data from 356 CHD trios were analyzed for de novo CNVs (Supplemental Table II). WES samples were captured with the Nimblegen SeqCap Exome V2 chemistry and sequenced on the Illumina HiSeq 2000 platform as previously described.26 Sequence reads were aligned to the human reference genome hg19 using Novoalign (http://novocraft.com), BWA,67 and ELAND.68 Duplicates were marked with Picard (http://picard.sourceforge.net). Indel realignment and base quality score recalibration were performed with GATK. XHMM, an algorithm that detects exon-level copy number variation and assigns CNV quality metrics,38 was used at four of the PCGC analysis sites (CHOP, Harvard, Columbia and Mount Sinai) to identify de novo CNVs (Supplemental Figure II). Candidate de novo CNVs were inspected visually. Putative de novo CNVs were prioritized for confirmation based on genomic length, low sequence depth variability, and low prevalence in the XHMM call set (AF<1%). All putative de novo CNVs were independently confirmed by ddPCR.

SNVs and short insertions/deletions (indels) were called from the Novoalign alignment of WES trios using a pipeline derived from GATK version 2.7 best practices.69 Briefly, aligned reads were first compressed using the GATK ReducedReads module and variants were called on all CHD WES trios using the UnifiedGenotyper joint variant calling module. Identified variants were filtered using GATK variant quality score recalibration. Variants were annotated using SnpEff.70 De novo SNVs and indels were independently confirmed using Sanger sequencing.

CNV confirmation with digital droplet PCR

Putative CNVs were experimentally confirmed with ddPCR as previously reported,71 using an 18-27 base pair FAM probe designed within each candidate CNV region, avoiding homopolymer runs and probes beginning with G. A VIC probe targeting the RPP30 gene was used as a reference. Reaction mixtures of 20 μL, comprising ddPCR Master Mix (Bio-Rad), the relevant forward and reverse primers and probe(s), and 100 ng of digested DNA, were prepared so that approximately 25-75% of the 10,000 droplets ultimately produced were positive for FAM or VIC signal. For de novo CNV confirmation, DNA from the CHD patient and parents was used. After thermal cycling, plates were transferred to a droplet reader (Bio-Rad) that flows droplets single-file past a two-color fluorescence detector. Droplets containing target were distinguished from those without target by applying a global fluorescence amplitude threshold in QuantaSoft (Bio-Rad). The threshold was set manually, based on visual inspection, at approximately the midpoint between the average fluorescence amplitudes of the positive and negative droplet clusters on each of the FAM and VIC channels. Confirmed duplications showed an approximately 50% higher ratio of positive to negative droplets than the reference channel. Conversely, confirmed deletions showed approximately half the ratio of positive to negative droplets of the reference channel.
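
The text calls CNVs from the ratio of positive to negative droplets relative to the RPP30 reference. A common refinement, sketched below as an alternative rather than the study's exact procedure, is to Poisson-correct the positive-droplet fraction before taking the target/reference ratio: at 25-75% positive droplets, many droplets hold more than one template molecule, so raw ratios are only approximately proportional to copy number. The function names, the assumption that RPP30 is present in two copies, and the cut-offs `DEL_MAX` and `DUP_MIN` are hypothetical.

```python
import math

DEL_MAX = 1.5  # hypothetical cut-off: estimated copy number below this suggests a deletion
DUP_MIN = 2.5  # hypothetical cut-off: estimated copy number above this suggests a duplication


def copies_per_droplet(positive: int, total: int) -> float:
    """Poisson-corrected mean template copies per droplet: lambda = -ln(fraction negative)."""
    negative = total - positive
    if not 0 < negative <= total:
        raise ValueError("need at least one negative droplet and a non-negative positive count")
    return -math.log(negative / total)


def estimate_copy_number(target_pos: int, target_total: int,
                         ref_pos: int, ref_total: int, ref_copies: int = 2) -> float:
    """Target copy number relative to a reference assumed to be present in `ref_copies` copies."""
    return ref_copies * copies_per_droplet(target_pos, target_total) / copies_per_droplet(ref_pos, ref_total)


def classify(copy_number: float) -> str:
    if copy_number < DEL_MAX:
        return "deletion"
    if copy_number > DUP_MIN:
        return "duplication"
    return "two copies"


if __name__ == "__main__":
    # Reference channel: 5,000 of 10,000 droplets positive.
    for target_pos in (2929, 5000, 6464):
        cn = estimate_copy_number(target_pos, 10_000, 5000, 10_000)
        print(target_pos, round(cn, 2), classify(cn))
```

For a de novo call, the same estimate would be expected near one copy (deletion) or three copies (duplication) in the proband and near two copies in each parent.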

Network analysis

Three bioinformatic tools were used: DAVID,35 DAPPLE,36 and WebGestalt.37 Four different gene lists derived from the de novo CNV loci were used (Supplemental Table IV). The lists were constructed as follows: (1) all genes contained within de novo CNV intervals; (2) published “causative” genes from previously reported CHD CNV intervals, in addition to all genes in novel CHD CNV intervals. “Causal” genes in previously reported CNV intervals included ELN (Williams syndrome), RAI1 (Smith-Magenis syndrome), TBX1 (22q11 deletion), GATA4 (8p23.1 deletion), GJA5 (1q21.1 duplication), and NKX2.5 (5q35.1 deletion); (3) genes contained solely within novel CHD CNV intervals (i.e., excluding genes from previously published CNVs); (4) genes contained within de novo CNV intervals that are highly expressed in the developing mouse heart (top quartile of all genes expressed in the E14.5 mouse heart).26 We anticipated that genes in lists 2 and 4 would have increased specificity for CHD compared with genes in list 1, and that genes in list 3 would be biased towards new disease networks.

We expanded the network analysis input gene lists by combining de novo CNV genes with genes carrying de novo single nucleotide variants (SNVs) that were previously identified in CHD probands by WES.26 Only de novo SNVs predicted to be deleterious (loss-of-function (LOF) variants, i.e., nonsense, frameshift, and splice site mutations, and missense variants that alter highly conserved amino acid residues or are predicted to be deleterious by SIFT or PolyPhen2) were included in the expanded gene lists. The additional gene lists were: (5) all genes within a de novo CNV interval (i.e., list 1) plus genes with protein-altering SNVs; and (6) published “causative” genes from previously reported CHD CNV intervals in addition to all genes in novel CHD CNV intervals (i.e., list 2) plus genes with protein-altering SNVs.
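
The six gene lists are set operations over a few input sets. The sketch below assumes hypothetical inputs: genes in previously reported CNV intervals, the "causal" genes chosen from them, genes in novel intervals, top-quartile heart-expressed genes, and a list of de novo variant records whose keys (`gene`, `consequence`, `conserved`, `predicted_damaging`) and consequence labels are invented for illustration. The gene names in the usage example are a toy input, not the study's lists.

```python
from typing import Dict, Iterable, Mapping, Set

LOF_CONSEQUENCES = {"nonsense", "frameshift", "splice_site"}  # hypothetical consequence labels


def deleterious_snv_genes(variants: Iterable[Mapping]) -> Set[str]:
    """Genes with de novo variants meeting the deleteriousness criteria described in the text."""
    genes = set()
    for v in variants:
        if v["consequence"] in LOF_CONSEQUENCES:
            genes.add(v["gene"])
        elif v["consequence"] == "missense" and (v["conserved"] or v["predicted_damaging"]):
            genes.add(v["gene"])
    return genes


def build_gene_lists(known_interval_genes: Set[str], known_causal_genes: Set[str],
                     novel_interval_genes: Set[str], heart_top_quartile: Set[str],
                     snv_genes: Set[str]) -> Dict[int, Set[str]]:
    list1 = known_interval_genes | novel_interval_genes      # all CNV genes
    list2 = known_causal_genes | novel_interval_genes        # causal genes + novel-interval genes
    list3 = novel_interval_genes - known_interval_genes      # solely within novel intervals
    list4 = list1 & heart_top_quartile                       # highly expressed in developing heart
    return {1: list1, 2: list2, 3: list3, 4: list4,
            5: list1 | snv_genes, 6: list2 | snv_genes}


if __name__ == "__main__":
    snvs = deleterious_snv_genes([
        {"gene": "ETS1", "consequence": "frameshift", "conserved": False, "predicted_damaging": False},
        {"gene": "GENE_X", "consequence": "missense", "conserved": False, "predicted_damaging": False},
    ])
    lists = build_gene_lists({"TBX1", "COMT", "GATA4"}, {"TBX1", "GATA4"},
                             {"CYFIP1", "NIPA1", "NIPA2"}, {"CYFIP1", "GATA4"}, snvs)
    for k, genes in lists.items():
        print(k, sorted(genes))
```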

Statistical analysis

Burden calculations were done with a Fisher exact test computed in the R statistical computing environment. For analyses using DAVID, networks with an enrichment of genes impacted by CNVs were assigned a _p_-value with Benjamini and Hochberg correction for multiple testing with a false discovery rate of 0.05. In DAPPLE, type I error was controlled through permutation. _p_-values of less than 0.05 were considered significant.

RESULTS

Identification of De Novo CNVs

We studied 415 CHD trios genotyped by SNP arrays and 356 trios by WES analysis, including 233 trios studied by both methods. No trio had an affected first-degree relative, and the genetic cause of CHD was unknown in all studied children (Supplemental Tables I and II).

Sixty-three de novo CNVs identified in CHD cases were independently confirmed by ddPCR. De novo CNVs were identified in 51 unique probands (9.5%). These CNVs ranged in size from 0.1 kb to 12.8 Mb; 48 (76%) were <500 kb, and half were smaller than 110 kb. The number of genes in the CNV intervals ranged from 0 to 175, with 42 (67%) containing ≤5 genes; four de novo intervals contained no genes. Six probands had two de novo CNVs, two had three CNVs, and one had four CNVs.

The parental origin of deletion CNVs was determined when the haplotype of the remaining copy could be uniquely assigned to one parent. Seven de novo CNVs arose on maternal chromosomes and 10 on paternal chromosomes. The remainder could not be assigned because there were too few informative SNPs to determine the parent of origin.

Comparison of SNP Array and WES CNV calling

To assess the accuracy of identifying de novo CNVs from SNP array data, we first considered a set of 40 high-confidence PennCNV de novo CNV calls that contained ≥10 adjacent SNPs, were >10 kb in length, and passed visual inspection. All 40 of these putative CNVs were experimentally validated by ddPCR in the proband, and 32 (80%) were confirmed to be de novo, a false positive rate of 20% for de novo status. For smaller de novo CNVs identified using the high-density array data, we considered a set of 97 high-confidence PennCNV putative de novo CNV calls based on 7-9 SNPs. Although 88% were validated by ddPCR in the proband, only four of the 97 (4%) were confirmed to be de novo.

From the WES data, we selected an initial set of 29 putative CNVs with sizes spanning more than four orders of magnitude, from 530 bases (two exons) to more than 8 Mb covering hundreds of exons. Twenty-six of the 29 CNVs (90%) were experimentally confirmed to be de novo, a false positive rate of 10%. The three false positive CNVs comprised one 530-bp region that contained only two exon targets and two inherited CNVs that were miscalled as de novo because both parents harbored CNVs at the locus. Based on these considerations, we restricted subsequent WES de novo CNV calls to those containing ≥3 exons and for which neither parental dataset contained CNVs within the locus.
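
The restriction adopted above, together with the XHMM call-set frequency filter (AF<1%) and the 1-Mb parental window used later in the burden analysis, can be sketched as a post-filter on exome CNV calls. The record layout, field names, and functions below are hypothetical and are not an XHMM interface; the coordinates in the usage example are illustrative.

```python
from typing import List, NamedTuple, Sequence

PARENT_WINDOW_BP = 1_000_000  # "no parental CNVs within 1 Mb"


class ExomeCNV(NamedTuple):
    chrom: str
    start: int
    end: int
    n_exons: int        # number of exon targets spanned by the call
    cohort_freq: float  # frequency of the CNV in the XHMM call set


def near(a: ExomeCNV, b: ExomeCNV, window: int) -> bool:
    """True if b overlaps a or lies within `window` bp of it."""
    return a.chrom == b.chrom and b.start <= a.end + window and b.end >= a.start - window


def filter_de_novo(child_calls: Sequence[ExomeCNV], parent_calls: Sequence[ExomeCNV],
                   min_exons: int = 3, max_freq: float = 0.01,
                   window: int = PARENT_WINDOW_BP) -> List[ExomeCNV]:
    """Keep proband calls that span enough exons, are rare, and have no nearby parental CNV."""
    kept = []
    for cnv in child_calls:
        if cnv.n_exons < min_exons or cnv.cohort_freq >= max_freq:
            continue
        if any(near(cnv, p, window) for p in parent_calls):
            continue  # a nearby parental CNV suggests an inherited event miscalled as de novo
        kept.append(cnv)
    return kept


if __name__ == "__main__":
    child = [
        ExomeCNV("chr2", 5_000_000, 5_190_000, 6, 0.001),    # kept
        ExomeCNV("chr5", 1_000_000, 1_000_530, 2, 0.001),    # too few exons
        ExomeCNV("chr7", 50_000_000, 50_400_000, 9, 0.001),  # parental CNV nearby
    ]
    parents = [ExomeCNV("chr7", 50_900_000, 51_000_000, 3, 0.002)]
    print(filter_de_novo(child, parents))
```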

To evaluate the false negative rates of the two platforms and analyses, we tested our ability to detect four CNVs (two 22q11 deletions, one 17p11 duplication, and one 10q terminal deletion; Supplemental Table V) in clinical cases previously diagnosed with these CNVs. These four CNVs served as positive controls and were distinct from the PCGC cohort. Both the SNP array and WES platforms detected each of these four large, clinically significant CNVs.

We also compared the results of de novo CNV analyses in the 233 trios studied by both SNP arrays and WES. Among 42 confirmed de novo CNVs in these trios, 24% (10/42) were identified by both platforms, 40% (17/42) only by SNP arrays, and 36% (15/42) only by WES (Figure 1). The recognized technical limitations of each platform prevented detection of some CNVs. For example, CNVs that occur exclusively in noncoding sequence are not captured by WES, whereas CNVs in coding or noncoding regions where SNP density is sparse can escape detection by SNP arrays.

Figure 1.


Comparison of SNP array and WES platforms in detection of the 42 validated de novo CNVs in the subset of 233 probands studied by both technologies. Ten of the 42 were detected by both methods and 32 by only one method. The numbers below the dotted line show the CNVs that were below the detection limits of the second platform (CNVs spanning <10 SNPs on SNP arrays or <3 exons on WES) and hence could not have been called by it. The numbers above the dotted line show the CNVs that had sufficient SNPs and/or exons to enable high confirmation rates but were nevertheless not called.

From these studies we deduced that de novo CNVs were accurately detected by arrays when ≥10 adjacent SNPs were affected, or by WES when three or more adjacent exons were affected. In our dataset, 29 of the 42 CNVs fulfilled both criteria and therefore should have been identified by both technologies (Figure 1). However, only 34% (10/29) of these ddPCR-confirmed CNVs were identified by both platforms; SNP arrays uniquely identified 34% (10/29) and WES analyses uniquely identified 31% (9/29). Because each CNV found by only one platform was missed by the other, SNP arrays missed 9/29 (31%) and WES missed 10/29 (34%) of these detectable CNVs; taken together, the false negative rate of each methodology is approximately 30-35%. Overall, the de novo CNVs identified genome-wide by SNP arrays were reasonably concordant with those identified from WES data, but each method also identified complementary CNVs. The minimum CNV size that we reliably detected was 10 kb by SNP arrays and 1 kb by WES, although some smaller CNVs identified by these techniques were validated.
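
The false negative estimate follows directly from the concordance counts among the 29 CNVs detectable by both platforms. A minimal sketch, assuming the ddPCR-confirmed detectable CNVs are treated as the truth set (the function name is hypothetical):

```python
from typing import Dict


def platform_false_negative_rates(both: int, array_only: int, wes_only: int) -> Dict[str, float]:
    """Per-platform false negative rates among confirmed CNVs detectable by both platforms.

    A CNV found only by WES counts as a miss for the arrays, and vice versa.
    CNVs missed by both platforms are unobserved, so these estimates are lower bounds.
    """
    total = both + array_only + wes_only
    return {"snp_array": wes_only / total, "wes": array_only / total}


if __name__ == "__main__":
    rates = platform_false_negative_rates(both=10, array_only=10, wes_only=9)
    print({k: round(v, 2) for k, v in rates.items()})  # {'snp_array': 0.31, 'wes': 0.34}
```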

CNV burden analysis

The burden of de novo CNVs in CHD case and control trios was first compared using SNP array analyses. De novo CNVs were assessed in 841 control trios, genotyped on the Illumina Omni1M array to match the case trio array resolution and called with PennCNV using computational parameters described previously.25 Nine de novo ddPCR-validated CNVs were identified among 841 control trios. Twenty-two de novo ddPCR-validated CNVs were identified among 462 CHD trios studied with SNP arrays. These data define a significant burden of CNVs in CHD cases compared to controls (OR: 4.6, Fisher p-value: 7 x 10−5; Table 2). After excluding nine previously identified CHD-associated CNVs, the burden of novel CNVs in CHD cases remained modestly significant (OR: 2.7, Fisher _p_=0.02).

Table 2.

Case-control de novo CNV burden

Platform    Cohort              N probands   N (%) CNVs   OR    P-value
SNP array   SSC1                841          9 (1%)       -     -
            PCGC: all CNVs      462          22 (4.7%)    4.6   7 × 10−5
            PCGC: novel loci                 13 (2.8%)    2.7   0.02
WES         SSC2                872          14 (1.6%)    -     -
            PCGC: all CNVs      356          19 (5.6%)    3.5   6 × 10−4
            PCGC: novel loci                 13 (3.9%)    2.3   0.03

To provide further support for this finding, we analyzed the burden of de novo CNVs identified by WES. WES data from CHD case and control trios were technically comparable, using the same Nimblegen V2 exome capture chemistry and similar sequence read depths obtained on identical Illumina platforms. Sixty percent of control trios were sequenced at the same site that sequenced the cases (Yale Center for Genome Analysis). For CNV burden analysis, raw sequence reads from cases and controls were processed with the same short-read aligner (Novoalign). SNP genotyping of CHD and control datasets followed by principal component analysis did not identify any systematic biases (Supplemental Figure V). Cases and controls were matched for gender as closely as possible, with a slight excess of male cases. Using an identical XHMM pipeline (CNVs involving ≥3 exons and no parental CNVs within 1 Mb), we identified 19 de novo ddPCR-validated CNVs in 356 CHD trios and 14 de novo ddPCR-validated CNVs in 872 control trios (OR: 3.5, Fisher _p_=6 x 10−4; Table 2). After excluding the six de novo CNVs previously identified as CHD-associated, the OR and _p_-value were similar to those from the SNP array data (OR: 2.3, Fisher _p_=0.03).

Our data identify an increased burden of de novo CNVs, detected by either SNP arrays or WES, in CHD patients compared to controls. The mean size of de novo CNVs was larger in CHD patients (3.6 Mb) than in controls (495 kb; t-test _p_=0.035), with the CHD distribution skewed by the largest CNVs identified in CHD cases. The median size of de novo CNVs in CHD cases (522 kb) was also significantly larger than in controls (118 kb; Mann-Whitney _p_=0.028). Among CNVs identified by SNP arrays, which can detect CNVs outside coding regions, there was a nonsignificant trend towards a higher proportion of de novo CNVs containing no coding exon in controls (4/9) than in PCGC cases (3/22; Fisher _p_=0.15).

Putative CHD loci at 15q11.2 and 2p13.3

Overlapping de novo CNVs found in multiple cases but not in controls are likely to contain disease genes. Sixteen of the 63 (25%) de novo CNVs in CHD probands occur at loci previously implicated in CHD,5 including four 22q11.2 deletions, three 8p23 deletions (involving GATA4), two 1q21.1 duplications (involving PRKAB2, PDIA3P, FMO5, CHD1L, BCL9, ACP6 and GJA5), one 22q11.2 distal duplication, one 2q22.3 deletion (which causes Mowat-Wilson syndrome), one 11q24.2-q25 deletion (which causes Jacobsen syndrome), and four CNVs at 15q11.2.

CNVs in four CHD probands (two deletions, two duplications) at the BP1-BP2 15q11.2 locus, spanning approximately 225 kb (chr15:22,836,000-23,062,000), were observed as recurrent de novo events (Supplemental Figures III and IV). Both patients with duplications (1-00192, 1-00315) and one with a deletion (1-00243) had LVO due to aortic coarctation. The remaining proband (1-01396) had TOF with pulmonary atresia. Because no de novo CNV was identified in this region among the 814 and 872 control trios studied by SNP arrays and WES, respectively, this locus has a significant burden of de novo CNVs in CHD cases (4/538 CHD vs. 0/1301 controls; Fisher _p_=0.007). CNVs at the 15q11.2 locus were observed at low frequency (AF<1%) in the Database of Genomic Variants (DGV). Among the three genes altered by this CNV (CYFIP1, NIPA1, and NIPA2), only CYFIP1 is highly expressed in the developing mouse heart.26 CYFIP1 encodes the cytoplasmic FMR1-interacting protein 1, which has dual roles in inhibiting local protein synthesis and in promoting actin remodeling.27 An earlier study observed an increased burden of inherited deletions in CHD cases at 15q11.21 and a recent paper identified a single proband with a 6-Mb de novo duplication at 15q11.2-q13.1,20 and two additional cases with inherited 300-400-kb duplications at 15q11.2. Our data provide additional evidence that de novo CNVs at 15q11.2 may contribute to disease risk in CHD.

In addition, a recurrent CNV was observed at a novel locus on chromosome 2p13.3. A 190-kb deletion arose de novo in a TOF proband (1-01536) and was maternally inherited in a proband with truncus arteriosus (1-01805). No 2p13.3 CNV was found in control samples or in DGV. Among the three genes in the CNV interval (ASPRV1, PCBP1 and PCBP1-AS1), only PCBP1 is highly expressed in the developing mouse heart.26 PCBP1 encodes a major cellular poly(rC)-binding protein, which controls translation of mRNAs containing the differentiation control element (DICE).28 In DECIPHER, patient 257771, who had an atrioventricular canal defect, carried a 7-Mb overlapping deletion of 2p13.3, suggesting that this locus may also contribute to disease risk in CHD.

Integration of CNV and sequence data to identify CHD genes

To improve the identification of specific genes within CNVs that might cause or contribute to CHD, we searched the WES data for de novo, rare loss-of-function (LOF) variants in genes located in CNV intervals. In one CHD patient (1-01486), we identified a terminal deletion of chromosome 11q24.2-q25, which causes Jacobsen syndrome; this patient had clinical manifestations typical of this dominant disorder (hypoplastic left heart, coarctation of the aorta, mitral and aortic valve atresia, strabismus, and short stature). ETS1 has been proposed as the critical CHD gene in the Jacobsen syndrome locus based on impaired ventricular development in an _Ets1_-null mouse.29 WES analyses identified a de novo ETS1 frameshift mutation (chr11:128350159GTCCT>G, c.1046_1049delAGGA, [p.K349fs]) in another CHD patient who did not carry the 11q24.2-q25 deletion but had cardiac abnormalities seen in Jacobsen syndrome (hypoplastic left heart and mitral valve atresia). Our data provide the first human genetic evidence suggesting that ETS1 mutations contribute to the cardiac malformations in Jacobsen syndrome.
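
The ETS1 variant is reported in both genomic (VCF-style) and coding (HGVS) notation. The hypothetical helper below, which is not part of the study's pipeline, shows how the two relate: removing TCCT after the anchor base G on the genomic plus strand corresponds to deleting AGGA in transcript orientation when the gene is read from the minus strand, as the reported annotation implies; a 4-base deletion is not a multiple of 3, so it shifts the reading frame; and c.1046 falls in codon 349. The coding start position is taken from the reported annotation rather than computed, because that requires transcript coordinates and HGVS normalization rules.

```python
from typing import Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def summarize_deletion(ref: str, alt: str, cds_start: int, strand: str) -> Tuple[str, bool, int]:
    """Summarize a simple left-anchored VCF-style deletion (alt is the single anchor base).

    Returns the deleted bases in transcript orientation, whether the deletion
    shifts the reading frame, and the codon containing the first deleted coding base.
    """
    if len(alt) != 1 or not ref.startswith(alt):
        raise ValueError("expected a left-anchored simple deletion")
    deleted = ref[1:]
    coding = reverse_complement(deleted) if strand == "-" else deleted
    frameshift = len(deleted) % 3 != 0
    codon = (cds_start + 2) // 3  # c.1-3 -> codon 1, c.4-6 -> codon 2, ...
    return coding, frameshift, codon


if __name__ == "__main__":
    print(summarize_deletion("GTCCT", "G", cds_start=1046, strand="-"))  # ('AGGA', True, 349)
```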

We also assessed whether de novo CNVs in combination with a rare or novel deleterious variant on the other allele might produce recessive forms of CHD. One CHD patient (1-01179) with a de novo 10q25-26 deletion also had a novel CTBP2 variant (p.R134W) on the remaining allele. This hemizygous variant was absent from public genome databases, was predicted to be damaging (PolyPhen2 score of 0.998), and altered a phylogenetically conserved residue (PhyloP score = 2.54). Cardiac abnormalities are present in approximately one third of patients with subterminal chromosome 10q deletions, and CTBP2 was recently proposed as a candidate CHD gene.32 The clinical manifestations of our patient, truncus arteriosus and right aortic arch, resemble the phenotypes identified in a _Ctbp2_-null mouse (failure of vascular remodeling and cardiac looping).33 We suggest that CTBP2 sequence analysis in individuals with chromosome 10q deletions may identify, in a subset of patients, additional variants that modify the phenotype.

Correlation of CHD phenotypes and CNVs

The frequency of de novo CNVs was 10% among conotruncal anomalies, 6% among left-sided obstructive lesions and 21% in heterotaxy. We observed a modest trend towards increased extra-cardiac manifestations such as developmental delay in patients with de novo CNVs (Supplemental Table VI). Approximately 31% of all CHD patients studied with SNP arrays or WES had extra-cardiac manifestations, whilst 40% (21/52; OR:1.5, Fisher _p_=0.2) of patients with de novo CNVs had extra-cardiac features. This association has been found in some,34 but not all,20 previous studies, perhaps due to differences in the ages of the CHD patients studied, methods of clinical data collection, and the definition of an extra-cardiac anomaly.

Gene networks impacted by CNVs in CHD

We performed pathway and network analyses with DAVID,35 DAPPLE,36 and WebGestalt,37 using as input four different lists of genes encoded within de novo CNV loci (Methods and Supplemental Table IV). The initial gene lists contained: (1) all genes encoded in a de novo CNV interval; (2) genes previously defined as causative within previously reported CNV intervals, plus all genes in novel de novo CNV intervals; (3) only genes contained within novel de novo CNV intervals; and (4) all genes contained within de novo CNV intervals that are highly expressed (top 25%) in the developing heart.26

DAVID identified enrichment of annotation terms for acetylation (p<2.3x10−4), phosphoprotein (p<3.9x10−4), and G protein-activated inward rectifier potassium channel (p<2.5x10−2) (Benjamini-Hochberg corrected). WebGestalt implicated an enrichment of previously identified CHD genes, including ELN, NKX2.5, GATA4, and ZEB2, contributing to the Gene Ontology processes anatomical structure formation involved in morphogenesis (p<0.03), cardioblast differentiation (p<0.03), and septum secundum development (p<0.02) (Benjamini-Hochberg corrected).
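
The Benjamini-Hochberg correction applied to these enrichment results can be illustrated with the standard step-up adjustment (the same quantity R's `p.adjust(method = "BH")` returns). This is a generic sketch, not the internal code of DAVID or WebGestalt, and the example p-values are made up.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Return BH-adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity and capping at 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    q = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
    print([round(x, 3) for x in q])  # [0.02, 0.04, 0.04, 0.02]
```

Under a false discovery rate of 0.05, a term is reported as enriched when its adjusted value is at most 0.05.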

Using DAPPLE, we identified two additional sub-networks of direct protein-protein interactions that were consistently observed across the four gene lists. Among genes encoded within CNVs that are highly expressed in the developing heart, a sub-network consisting of NKX2.5 and GATA4 (p<0.1, Figure 2a) and a sub-network consisting of ETS1, JUN, TOP2A, and MKI67 (p<0.01, Figure 2b) were identified. When the CNV gene lists were expanded to include genes with de novo LOF mutations, the ETS1/JUN/TOP2A sub-network was substantially extended and more strongly enriched (p<0.005). Each of these three genes was directly linked through protein-protein interactions to sub-networks containing ≥10 additional genes identified in either the CNV or WES datasets.26 This entire network incorporated over 60 genes implicated in CHD (Figure 2c). Because the ETS1/JUN/TOP2A sub-network was robust to the choice of de novo CNV gene list (criteria 2 above) and expanded with the addition of genes carrying rare de novo LOF mutations, the data suggest that this sub-network contains genes and pathways involved in CHD.
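
DAPPLE asks whether input proteins are more directly connected than expected by chance, using permutation. The sketch below illustrates that idea with a deliberately simplified, hypothetical test: it counts direct interactions among the input genes and compares the count with uniformly sampled gene sets of the same size. DAPPLE itself is described as using a within-degree node-label permutation, which controls for highly connected proteins; uniform sampling is substituted here only for brevity, and the toy network uses placeholder gene names rather than real interaction data.

```python
import random
from itertools import combinations
from typing import Dict, Iterable, Set, Tuple


def build_adjacency(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Undirected adjacency sets from protein-protein interaction pairs."""
    adj: Dict[str, Set[str]] = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def direct_edges(genes: Iterable[str], adj: Dict[str, Set[str]]) -> int:
    """Number of direct interactions among the given genes."""
    return sum(1 for a, b in combinations(sorted(set(genes)), 2) if b in adj.get(a, ()))


def connectivity_pvalue(seed_genes: Iterable[str], adj: Dict[str, Set[str]],
                        n_perm: int = 999, seed: int = 0) -> Tuple[int, float]:
    """Observed direct connectivity and an empirical p-value against random same-size gene sets."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    genes = sorted({g for g in seed_genes if g in adj})
    observed = direct_edges(genes, adj)
    hits = sum(direct_edges(rng.sample(nodes, len(genes)), adj) >= observed for _ in range(n_perm))
    return observed, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    toy_edges = [("A", "B"), ("B", "C"), ("A", "C")] + [(f"G{i}", f"G{i + 1}") for i in range(30)]
    print(connectivity_pvalue(["A", "B", "C"], build_adjacency(toy_edges)))
```

Adding one to both the hit count and the number of permutations keeps the empirical p-value from being reported as exactly zero.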

Figure 2.

Network analysis of CNV loci genes. Two networks of direct protein-protein interactions, (A) NKX2.5/GATA4 and (B) ETS1/JUN/TOP2A, were consistently identified in the DAPPLE analysis of de novo CNV loci. P-values shown are from the most restrictive gene list, genes highly expressed in the developing heart. (C) The ETS1/JUN/TOP2A network was substantially expanded by incorporating genes with deleterious de novo point mutations and indels from the WES analysis in addition to the CNV loci. Of note, two probands had de novo ETS1 variants (one CNV and one frameshift), two probands had de novo SMAD2 variants (a splice site mutation and a highly conserved missense variant), and two probands had de novo ELN variants (both Williams syndrome CNVs).

DISCUSSION

We report genome-wide CNV analyses using complementary detection technologies in a large cohort of CHD patients. CNV detection from WES has been investigated in schizophrenia38 and autism,39 but array-based and sequence-based strategies for detecting de novo CNVs have not previously been compared directly; our data highlight the differences between them. By defining small CNVs at high resolution and integrating these findings with WES data that identified rare deleterious mutations, we identified novel de novo CNVs and genes involved in the pathogenesis of CHD. We show that 9.8% (53/538) of CHD patients without a previously identified genetic etiology have rare de novo CNVs. We previously demonstrated that 10% of CHD patients in our cohort have likely damaging de novo single nucleotide or small insertion/deletion mutations in genes highly expressed in the developing heart.26 None of the CHD patients with rare de novo CNVs reported here carry these variants. Even if all of the de novo CNVs and predicted pathogenic de novo sequence variants we identified were causative, the etiology of the majority of CHD subjects in our study would remain unknown.

Our detection rate of approximately 10% for de novo CNVs in CHD patients is comparable to that of previous studies, despite our detection of smaller CNVs. Had we not excluded patients with known pathogenic CNVs identified through clinical care, we expect that de novo CNVs would have been identified in approximately 15% of CHD patients, based on the prevalence of common de novo CNVs in CHD (e.g., 22q11 deletions in 7% of TOF and 1q21 CNVs in 1% of TOF). In our study, these CNV loci accounted for <1% of CHD probands.

Despite these exclusion criteria, we identified a four-fold increased frequency of rare de novo CNVs in cases relative to the background de novo CNV frequencies in controls of 1.2% (detected by SNP arrays) and 1.8% (detected by WES) (_p_=7x10−5 and _p_=4x10−4, respectively). Even after excluding previously reported pathogenic CNVs, we still observed an approximately two-fold increase in novel rare de novo CNVs (_p_=0.02).
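Case-control burden comparisons like these, and the Fisher p-value reported for the 15q11.2 locus, are based on a 2x2 table of trios with and without a de novo CNV. A minimal sketch of a two-sided Fisher's exact test with the sample odds ratio is shown below. The text does not specify every test the authors ran, so treat this as a generic illustration; the counts in the example are toy values, not the study's data. `scipy.stats.fisher_exact` implements the same test.

```python
from math import comb
from typing import Tuple


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: cases, controls. Columns: trios with / without a de novo CNV.
    Returns (sample odds ratio, p-value).
    """
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    probs = [prob(x) for x in range(lo, hi + 1)]
    # Sum every table no more probable than the observed one (small tolerance for ties)
    p_value = sum(p for p in probs if p <= p_obs * (1 + 1e-7))
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    return odds_ratio, min(1.0, p_value)


if __name__ == "__main__":
    # Toy table (not study data): 8/10 cases vs 1/6 controls carry a de novo CNV
    odds_ratio, p = fisher_exact_two_sided(8, 2, 1, 5)
    print(odds_ratio, round(p, 4))  # 20.0 0.035
```

This returns the sample (cross-product) odds ratio; R's `fisher.test` reports a conditional maximum-likelihood estimate instead, so the two can differ slightly.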

Since the odds ratio of de novo CNVs in cases vs controls was 3.5-4.6, we estimate that between 50% and 70% of the de novo CNVs observed in cases may be disease causing. It is also possible that a higher percentage of de novo CNVs increase the risk of CHD but are not sufficient to cause CHD without other contributing genetic or environmental factors. Additionally, subtle anatomic defects in the heart may not have been diagnosed in the control group, since controls were not systematically examined by echocardiogram. Overall, our evidence suggests a model in which de novo CNVs contribute to CHD.

The comparison of dense array-based platforms and WES analyses for detecting independently validated CNVs indicates that each strategy identifies only ~70% of the CNVs that should fall within its detection limits. These two CNV methodologies therefore provide substantially complementary information. An important corollary is that previously published CNV analyses in human disease may have significantly underestimated the burden conveyed by these structural variants.
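The complementarity argument can be made concrete with a small calculation. The sketch below computes each method's sensitivity against a set of validated CNVs and, separately, the union sensitivity expected if the two methods missed CNVs independently (about 91% for two methods at ~70% each). Independence is an assumption for illustration only, and the CNV identifiers are hypothetical.

```python
from typing import Set


def sensitivity(called: Set[str], validated: Set[str]) -> float:
    """Fraction of validated CNVs that a detection method reported."""
    return len(called & validated) / len(validated)


def expected_union_sensitivity(s1: float, s2: float) -> float:
    """Union sensitivity of two methods, ASSUMING their misses are independent."""
    return 1 - (1 - s1) * (1 - s2)


if __name__ == "__main__":
    validated = {f"cnv{i}" for i in range(10)}     # hypothetical validated CNV IDs
    array_calls = {f"cnv{i}" for i in range(7)}     # cnv0..cnv6
    wes_calls = {f"cnv{i}" for i in range(3, 10)}   # cnv3..cnv9
    print(sensitivity(array_calls, validated))              # 0.7
    print(sensitivity(wes_calls, validated))                # 0.7
    print(sensitivity(array_calls | wes_calls, validated))  # 1.0
    print(round(expected_union_sensitivity(0.7, 0.7), 2))   # 0.91
```

In practice the two methods' misses are unlikely to be fully independent (for example, both may struggle with the smallest CNVs), so the real gain from combining them can be smaller than the independence estimate.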

Among all confirmed de novo CNVs, 61% (41) were deletions and 39% (26) were duplications. The proportions of these two classes are not significantly different; whether the trend toward more deletions in CHD is biologically meaningful or reflects greater sensitivity of these methods for detecting deletions will require further analysis. De novo CNVs ranged in size from less than 1 kb to 12.8 Mb, with a median size of 110 kb. Thus, half of the independently confirmed CNVs were smaller than the reported detection limit of most prior studies. For example, four CHD patients had 200-kb de novo CNVs on chromosome 15. While the pathogenicity of the identified CNVs remains to be determined, we propose that smaller CNVs involving fewer genes are particularly valuable for defining specific candidate CHD genes, in comparison to larger CNVs that typically include many more candidates. The ability to reliably detect small CNVs is especially helpful when they fall within previously identified large CNVs and define a critical interval of overlap. For example, we identified one de novo CNV that affected only JUN and another that altered only TOP2A, two genes implicated by network analyses as interacting with the transcription factors SMAD2, SMAD4, and ETS1, molecules that play important roles in cardiovascular development.
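The statement that deletions and duplications do not differ significantly in frequency can be checked with an exact two-sided binomial test against a 50:50 split. The sketch below applies a textbook version of that test to the 41 deletions and 26 duplications reported above, which gives a p-value above 0.05. We do not know which test the authors used; `scipy.stats.binomtest` provides an equivalent implementation.

```python
from math import comb


def binomial_test_two_sided(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial test: P(outcomes no more likely than k of n)."""
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    p_obs = probs[k]
    # Small relative tolerance so outcomes tied with the observed one are counted
    return min(1.0, sum(q for q in probs if q <= p_obs * (1 + 1e-7)))


if __name__ == "__main__":
    deletions, duplications = 41, 26  # counts reported in the text
    p = binomial_test_two_sided(deletions, deletions + duplications)
    print(f"two-sided p = {p:.3f}")
```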

Although there is considerable complexity in CHD phenotypes, we observed no significant difference in the frequencies of rare de novo CNVs among distinct CHD sub-classifications. While CHD patients with CNVs in our cohort were more likely to have extra-cardiac phenotypes (OR: 1.5), this trend fell short of significance. Whether this finding reflects shared developmental biologic pathways among different organ systems or the possibility that CNVs perturb multiple genes that individually contribute to organ system development is unknown.

We identified several de novo CNVs that impacted established CHD genes including GATA4 and GJA5. We also identified a CHD patient with a deletion of chromosome 5q34-q35.2, encompassing NKX2-5. LOF NKX2-5 mutations are an established cause of CHD, and CNVs encompassing NKX2-5 have been previously recognized in CHD.

We identified recurrent de novo CNVs involving deletions or duplications at chromosome 15q11.2. As the proximal region of chromosome 15 is meiotically unstable due to segmental duplications that serve as breakpoint hotspots, recurrent de novo events at this locus might simply reflect local genomic instability. However, the excess burden of de novo CNVs at this locus in CHD patients compared to controls (Fisher _p_=0.007) suggests otherwise. A previous report of an excess burden of deletions at this locus in CHD patients3 lends further evidence for pathogenicity, although that study lacked information on inheritance. As CNVs at chromosome 15q11.2 exhibit incomplete penetrance for both neuropsychiatric and CHD phenotypes, genes affected by these CNVs could participate in both inherited and sporadic CHD.

Chromosome 15q11.2 deletions and duplications are implicated in neurodevelopmental disorders including schizophrenia, intellectual disability, and autism.43-45 That chromosome 15q11.2 CNVs are also associated with CHD adds to a growing list of loci (22q11,46 1q21, 7q11.23,48 16p11.2, and 16p13.11) that link cardiac malformations and neurocognitive disorders. These (and other) genetic loci may partly explain the significant co-occurrence of heart and brain developmental phenotypes in many children.

By integrating CNV and sequencing data from WES, we also identified candidate genes within CNV regions that may cause dominant or recessive forms of CHD. We present the first human ETS1 LOF mutation that likely contributes to Jacobsen syndrome. We also identified a rare inherited and predicted deleterious CTBP2 missense variant that is hemizygous due to a de novo CNV deletion, associated with a CHD phenotype comparable to that observed in _Ctbp2-_null mice. Continued integration of CNV and sequence data should enable more comprehensive assessments of genetic causes of disease. The current study provides suggestive data, and sequencing large cohorts of CHD patients for mutations in these two genes will be necessary to unambiguously prove the role of these genes in CHD.

Network analysis with DAPPLE was more successful in elucidating novel network biology than DAVID and WebGestalt, which rely heavily on previously annotated gene sets and are challenged by the inclusion of unrelated genes encoded within CNV intervals alongside pathogenic genes. If pathogenic CNVs on average contain one main causal gene and approximately five unrelated genes, then we might expect DAVID and WebGestalt to be less informative for CNV network analyses.51 In contrast, DAPPLE, which is based on proteome-wide protein-protein interaction data rather than previously curated gene lists, calculates p-values through within-degree node-label permutation, an approach that is more tolerant of background noise.36
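The idea behind within-degree node-label permutation can be illustrated with a simplified sketch: each seed protein is swapped for a random protein with the same number of interactions, and the observed number of direct interactions among seeds is compared with the permuted distribution. This is a toy version under stated assumptions (exact degree matching rather than any degree binning the real tool may use, a single direct-connectivity statistic, and a hypothetical undirected interaction graph); it is not DAPPLE's code and omits the other connectivity statistics DAPPLE reports.

```python
import random
from collections import defaultdict
from itertools import combinations
from typing import Dict, Set


def direct_edges(graph: Dict[str, Set[str]], seeds: Set[str]) -> int:
    """Count direct interactions among the seed proteins (graph assumed undirected)."""
    return sum(1 for u, v in combinations(sorted(seeds), 2) if v in graph[u])


def within_degree_permutation_p(
    graph: Dict[str, Set[str]], seeds: Set[str], n_perm: int = 999, rng_seed: int = 0
) -> float:
    """Empirical p-value for seed connectivity under degree-matched relabeling."""
    rng = random.Random(rng_seed)
    by_degree = defaultdict(list)
    for node, neighbors in graph.items():
        by_degree[len(neighbors)].append(node)
    # How many seeds must be drawn from each degree class
    seed_degrees: Dict[int, int] = defaultdict(int)
    for s in seeds:
        seed_degrees[len(graph[s])] += 1
    observed = direct_edges(graph, seeds)
    at_least_as_connected = 0
    for _ in range(n_perm):
        random_set: Set[str] = set()
        for degree, count in seed_degrees.items():
            # Swap seeds for random proteins with exactly the same degree
            random_set.update(rng.sample(by_degree[degree], count))
        if direct_edges(graph, random_set) >= observed:
            at_least_as_connected += 1
    # Add-one correction keeps the empirical p-value strictly above zero
    return (at_least_as_connected + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Hypothetical graph: five disjoint 4-protein cliques, so every node has degree 3
    graph: Dict[str, Set[str]] = {}
    for c in range(5):
        members = [f"n{4 * c + j}" for j in range(4)]
        for m in members:
            graph[m] = set(members) - {m}
    print(within_degree_permutation_p(graph, {"n0", "n1", "n2", "n3"}))
```

Matching on degree matters because highly connected hub proteins interact with many partners by chance; without it, any gene list rich in hubs would appear spuriously interconnected.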

DAPPLE network analysis reinforced the central role of transcriptional regulation in congenital heart disease. The identification of the NKX2.5/GATA4 network provided a robust positive control, as both the protein-protein interaction between these molecules and their substantial contributions to CHD have been previously described. Direct protein-protein interactions among ETS1, JUN, and TOP2A have also been reported,54-56 but this network has not been previously implicated in CHD. In an expanded network analysis of these molecules that included rare LOF mutations identified by exome sequencing, JUN was linked to SMAD2 and SMAD4, molecules that participate in cardiac development through the TGF-β signaling pathway.57-60

We focused our current analysis solely on rare de novo CNVs. As the etiology of CHDs is known to be polygenic, and incomplete penetrance of genes for CHD has been previously described, future analyses of rare inherited CNVs may expand these findings.

The novel de novo CNVs we report should be considered provisional pending replication in independent studies. Replication of the overall effect and the magnitude of the risk of these identified variants is needed. While it is not yet possible to draw a conclusion about whether any particular de novo CNV is causal, the identification of additional CNVs and mutations in specific genes within the CNV intervals will be required to validate the new loci identified here.

In summary, integrating high-resolution complementary platforms for CNV and sequence data on large numbers of patients with CHD has proven valuable for defining the underlying genomic architecture of CHD and for expanding the genes and networks known to be involved in cardiac development; this approach is likely applicable to the study of other diseases.

Supplementary Material

304458R2 Acknowledgement Permissions

304458R2 Online Data Supplement

CircRes_CIRCRES-2014-304458.xml

Novelty and Significance.

What Is Known?

CHD is among the most common birth defects. Many genomic loci are implicated in CHD, but most cases are of unknown etiology.

What New Information Does This Article Contribute?

A significant increase in CNV burden was observed when comparing CHD trios with healthy trios, using either SNP array (_p_=7x10−5, Odds Ratio (OR)=4.6) or WES data (_p_=6x10−4, OR=3.5) and remained after removing 16% of de novo CNV loci previously reported as pathogenic (_p_=0.02, OR=2.7). We observed recurrent de novo CNVs on 15q11.2 encompassing CYFIP1, NIPA1, and NIPA2 and single de novo CNVs encompassing DUSP1, JUN, JUP, MED15, MED9, PTPRE, SREBF1, TOP2A, and ZEB2, genes that interact with established CHD proteins NKX2-5 and GATA4. Integrating de novo variants in WES and CNV data suggests that ETS1 is the pathogenic gene altered by 11q24.2-q25 deletions in Jacobsen syndrome and that CTBP2 is the pathogenic gene in 10q sub-telomeric deletions. This is the first large cohort study of CHD families using WES and dense, state-of-the-art SNP arrays for integrative de novo CNV discovery. The new loci implicated here may provide novel diagnostic markers for early detection of CHD and novel therapeutic drug targets.

Table 1.

Confirmed rare de novo CNVs in discovery cohort. Genomic coordinates refer to hg19.

| ID | Chr | Start | End | Band | CNV (copy number)¹ | Syndrome/gene | Analysis observed² | Cardiac lesion (diagnosis)³ | Parent of origin | Extra-cardiac | N genes | Size (kb) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1-01401 | 1 | 59247993 | 59251097 | p32.1 | 1 | JUN | A | LVOT (HLHS) | - | - | 1 | 3.1 |
| 1-03171 | 1 | 145586403 | 145799634 | q21.1 | 3 | 1q21.1 dup/GJA5⁴ | A, E | CTD (TOF/APVS) | - | - | 7 | 213.2 |
| 1-01036 | 1 | 146631133 | 147416212 | q21.1 | 3 | 1q21.1 dup/GJA5⁴ | E | CTD (TOF) | M | - | 15 | 785.1 |
| 1-01486 | 1 | 194201171 | 194304070 | q24.2-q25 | 3 | CDC73 | A | LVOT (HLHS) | - | Yes | 0 | 102.9 |
| 1-01518 | 1 | 248750565 | 248795110 | q44 | 3 | OR2T10, OR2T11 | A | LVOT (HLHS) | - | - | 2 | 44.5 |
| 1-01536 | 2 | 70168995 | 70359345 | p13.3 | 1 | PCBP1 | A | CTD (TOF/PA) | - | - | 5 | 190.4 |
| 1-01401 | 2 | 102493466 | 103001458 | q11.2-q12.1 | 1 | MAP4K4 | E | LVOT (HLHS) | - | - | 6 | 508.0 |
| 1-01401 | 2 | 145155868 | 145274931 | q22.3 | 1 | Mowat-Wilson/ZEB2⁴ | E | LVOT (HLHS) | - | - | 1 | 119.1 |
| 1-00762 | 3 | 60661 | 11712230 | p26.1 | 3 | ARL8B, ARPC4, CAMK1, CAV3, CRBN, EMC3, ITPR1, SEC13, SETD5, VGLL4 | A | ASD/PS (ASD) | - | Yes | 103 | 11651.6 |
| 1-01049 | 3 | 15637812 | 15643461 | p25.1 | 3 | BTD, HACL1 | E | CTD (TOF) | - | - | 2 | 5.6 |
| 1-01045 | 3 | 47780965 | 48309270 | p21.31 | 3 | CDC25A, DHX30, MAP4, SMARCC1 | A | LVOT (HLHS) | - | - | 14 | 528.3 |
| 1-02093 | 3 | 197143652 | 197186111 | q29 | 3 | BDH1 | A | CTD (TOF/PA) | - | Yes | 0 | 42.5 |
| 1-00771 | 4 | 185603346 | 185638397 | q34.1 | 1 | CENPU, PRIMPOL | E | CTD (DTGA/VSD) | P | Yes | 2 | 35.1 |
| 1-00789 | 5 | 136464 | 232969 | p15.33 | 3 | CCDC127, LRRC14B, PLEKHG4B, SDHA | A | CTD (TOF) | - | - | 4 | 96.5 |
| 1-00113 | 5 | 133706994 | 133730455 | q31.1 | 1 | UBE2B | A | CTD (TOF/PA) | - | Yes | 1 | 23.5 |
| 1-00296 | 5 | 166386727 | 173073664 | q34-q35.2 | 1 | NKX2.5⁴ | A | CTD (TOF) | M | Yes | 53 | 6686.9 |
| 1-01916 | 6 | 36646788 | 36651971 | p21.2 | 1 | CDKN1A | A | HTX (HTX) | - | - | 1 | 5.2 |
| 1-01049 | 6 | 43484783 | 43485159 | p21.1 | 3 | POLR1C | E | CTD (TOF) | - | - | 1 | 0.4 |
| 1-00096 | 7 | 50179707 | 50191153 | p12.2 | 1 | C7orf72 | E | CTD (TOF/PA) | - | Yes | 1 | 11.4 |
| 1-00800 | 7 | 72719386 | 74138603 | q11.23 | 1 | Williams syndrome⁴ | A | CTD (VSD/PS) | P | Yes | 34 | 1419.2 |
| 1-00540 | 7 | 72721123 | 74140708 | q11.23 | 1 | Williams syndrome⁴ | A | LVOT (ASD) | M | Yes | 34 | 1419.6 |
| 1-00977 | 7 | 138258252 | 143807632 | q24-q25 | 1 | C7orf55, FAM115A, LUC7L2, MKRN1, NDUFB2, UBN2, ZC3HAV1L, ZYX | E | CTD (TOF) | - | - | 175 | 5549.4 |
| 1-01995 | 7 | 142334207 | 142460871 | q34 | 1 | MTRNR2L6, PRSS1 | E | CTD (TOF) | M | - | 15 | 126.7 |
| 1-01562 | 8 | 8067768 | 12530976 | p22.1-p23.1 | 3 | GATA4⁴ | A | CTD (TOF) | - | - | 75 | 4463.2 |
| 1-02625 | 8 | 8102183 | 12190106 | p23.1 | 3 | GATA4⁴ | A | LVOT (CoA) | M | Yes | 62 | 4087.9 |
| 1-00566 | 8 | 11606428 | 11710963 | p23.1 | 1 | GATA4⁴ | A, E | CTD (TOF) | - | - | 6 | 104.5 |
| 1-00948 | 8 | 119053343 | 119064098 | q24.1 | 1 | EXT1 | A | LVOT (CoA) | P | Yes | 1 | 10.8 |
| 1-02360 | 9 | 5302500 | 5337760 | p24.1 | 3 | RLN1, RLN2 | A | CTD (ASD) | - | Yes | 3 | 35.3 |
| 1-01852 | 11 | 34458230 | 34460862 | p13 | 1 | CAT | A | CTD (VSD) | - | - | 1 | 2.6 |
| 1-00565 | 11 | 42968283 | 42970488 | p12 | 3 | HNRNPKP3 | A | LVOTO (ASD) | - | - | 0 | 2.2 |
| 1-01536 | 11 | 65157239 | 65408708 | q13.1 | 1 | EHBP1L1, LTBP3, MAP3K11, PCNXL3, SCYL1, SSSCA1 | A | CTD (TOF/PA) | - | - | 14 | 251.5 |
| 1-00230 | 11 | 86939592 | 87025456 | q14.2 | 1 | TMEM135 | A, E | LVOT (ASD) | P | Yes | 1 | 85.9 |
| 1-01486 | 11 | 125641368 | 134943190 | q24.2-q25 | 1 | Jacobsen/ETS1⁴ | A, E | LVOT (HLHS) | P | Yes | 73 | 9301.8 |
| 1-00795 | 11 | 134598043 | 134617838 | q25 | 3 | LOC283177 | A | CTD (VSD) | M | - | 0 | 19.8 |
| 1-00124 | 12 | 8003758 | 8123306 | p13.31 | 3 | SLC2A14, SLC2A3 | A | LVOT (As/HLHS) | - | - | 3 | 119.5 |
| 1-00050 | 12 | 52845952 | 52862783 | q13.13 | 1 | KRT6C | A | LVOT (HLHS) | - | - | 1 | 16.8 |
| 1-02411 | 14 | 58860893 | 58881694 | q23.1 | 1 | TIMM9, TOMM20L | A | CTD (TOF) | - | - | 2 | 20.8 |
| 1-01049 | 14 | 74551632 | 74551731 | q24.3 | 3 | LIN52 | E | CTD (TOF) | - | - | 1 | 0.1 |
| 1-00192 | 15 | 22296985 | 23161330 | q11.2 | 3 | 1 Mb from PW/CYFIP1⁴ | A | LVOT (CoA) | - | - | 20 | 864.3 |
| 1-00315 | 15 | 22750305 | 23140114 | q11.2 | 3 | 1 Mb from PW/CYFIP1⁴ | A | LVOT (CoA) | M | - | 5 | 389.8 |
| 1-01396 | 15 | 22750305 | 23228712 | q11.2 | 1 | 1 Mb from PW/CYFIP1⁴ | A, E | CTD (TOF/PA) | P | - | 6 | 478.4 |
| 1-00243 | 15 | 22835893 | 23062345 | q11.2 | 1 | 1 Mb from PW/CYFIP1⁴ | E | LVOT (CoA) | P | Yes | 4 | 226.5 |
| 1-01994 | 15 | 28389771 | 28446734 | q13.2 | 1 | HERC2 | E | LVOT (ASD) | P | - | 1 | 57.0 |
| 1-01696 | 15 | 44833588 | 44856873 | q21.1 | 1 | EIF3J, SPG11 | A, E | CTD (TriAtresia/DTGA) | - | - | 2 | 23.3 |
| 1-01941 | 15 | 88761539 | 88779300 | q25.3 | 3 | NTRK3 | A | CTD (TOF/DTGA) | P | - | 1 | 17.8 |
| 1-01427 | 17 | 21562473 | 22252439 | p11.2 | 1 | FAM27L, FLJ36000, MTRNR2L1 | A | HTX (HTX) | - | Yes | 7 | 690.0 |
| 1-00561 | 17 | 27962393 | 28099002 | q11.2 | 1 | SSH2 | A | LVOT (ASD) | - | Yes | 3 | 136.6 |
| 1-01995 | 17 | 38544624 | 38548586 | q21.1 | 1 | TOP2A | A, E | CTD (TOF) | - | - | 1 | 4.0 |
| 1-01049 | 17 | 39845210 | 39846477 | q21.2 | 3 | EIF1 | E | CTD (TOF) | - | - | 2 | 1.3 |
| 1-01588 | 18 | 65138642 | 78015180 | q22.1-q23 | 1 | NFATC1⁴ | A | LVOT (CoA) | - | Yes | 58 | 12876.5 |
| 1-02170 | 19 | 20601006 | 20717536 | p12 | 1 | ZNF826P | A | CTD (TOF) | - | Yes | 1 | 116.5 |
| 1-00174 | 19 | 40515744 | 40681387 | q13.2 | 1 | ZNF546, ZNF780A, ZNF780B | A | CTD (TOF/PA) | - | Yes | 4 | 165.6 |
| 1-01536 | 19 | 47792293 | 47905132 | q13.33 | 1 | C5AR1, C5AR2, DHX34 | A | CTD (TOF/PA) | - | - | 3 | 112.8 |
| 1-00730 | 20 | 14529657 | 14583899 | p12.2 | 1 | MACROD2, MACROD2-IT1 | A | CTD (DTGA) | - | - | 2 | 54.2 |
| 1-01194 | 22 | 18844632 | 21500000 | q11.2 | 1 | DiGeorge/TBX1⁴ | A | CTD (VSD) | P | Yes | 80 | 2655.4 |
| 1-00113 | 22 | 18886915 | 22000000 | q11.2 | 1 | DiGeorge/TBX1⁴ | A, E | CTD (TOF/PA) | P | Yes | 96 | 3113.1 |
| 1-01836 | 22 | 19020529 | 21380382 | q11.2 | 1 | DiGeorge/TBX1⁴ | A, E | CTD (TOF) | M | - | 70 | 2359.9 |
| 1-00988 | 22 | 20733495 | 21464479 | q11.2 | 1 | DiGeorge/TBX1⁴ | A | CTD (HLHS/HTX) | M | Yes | 31 | 731.0 |
| 1-02133 | 22 | 25661725 | 25919492 | q11.23 | 3 | 22q11 distal duplication⁴ | A | CTD (TOF) | - | - | 4 | 257.8 |
| 1-00425 | 22 | 36038076 | 36149338 | q12.3 | 1 | APOL5, APOL6, RBFOX2 | A, E | LVOT (HLHS) | - | - | 4 | 111.3 |
| 1-01427 | 22 | 42522638 | 42531210 | q13.2 | 3 | CYP2D6 | A | HTX (HTX) | - | Yes | 2 | 8.6 |
| 1-01941 | X | 23003525 | 23086619 | p22.11 | 3 | DDX53, RP11-40F8.2 | A | CTD (TOF/DTGA) | - | - | 1 | 83.1 |
| 1-00197 | X | 148685645 | 148693146 | q28 | 3 | TMEM185A | E | LVOT (HLHS) | - | Yes | 1 | 7.5 |

ACKNOWLEDGMENTS

The authors are grateful to the patients and families who participated in this research. We thank the following team members for contributions to patient recruitment: D. Awad, K. Celis, D. Etwaru, J. Kline, R. Korsin, A. Lanz, E. Marquez, J. K. Sond, A. Wilpers, R. Yee (Columbia Medical School); A. Roberts, K. Boardman, J. Geva, J. Gorham, B. McDonough, A. Monafo, J. Stryker (Harvard Medical School); N. Cross (Yale School of Medicine); S. M. Edman, J. L. Garbarini, J. E. Tusi, S. H. Woyciechowski (Children's Hospital of Philadelphia); R. Kim, J. Ellashek and N. Tran (Children's Hospital of Los Angeles); K. Flack (University College London); A. Romano, D. Gruber, N. Stellato (Steve and Alexandra Cohen Children's Medical Center of New York); D. Guevara, A. Julian, M. Mac Neal, C. Mintz (Icahn School of Medicine at Mount Sinai); and G. Porter and E. Taillie (University of Rochester School of Medicine and Dentistry). We also thank V. Spotlow, P. Candrea, K. Pavlik and M. Sotiropoulos for their expert production of exome sequences, and we thank M. Lemma, C. Kim, F. G. Otieno, M. Khan and K. Thomas for their expert production of genome-wide genotypes.

We are grateful to all of the families at the participating SFARI Simplex Collection (SSC) sites, as well as the principal investigators (A. Beaudet, R. Bernier, J. Constantino, E. Cook, E. Fombonne, D. Geschwind, D. Grice, A. Klin, D. Ledbetter, C. Lord, C. Martin, D. Martin, R. Maxim, J. Miles, O. Ousley, K. Pelphrey, B. Peterson, J. Piggot, C. Saulnier, M. State, W. Stone, J. Sutcliffe, C. Walsh, E. Wijsman) and the coordinators and staff at the SSC clinical sites.

The authors thank S. Tennstedt, B. Williams, D. Nash, J. Barenholtz, K. Cucchi, K. Dandreo, S. Yates, T. Hamza, and C. Taglienti of the New England Research Institutes (NERI).

SOURCES OF FUNDING

This work was supported by the National Institutes of Health (NIH) National Heart, Lung, and Blood Institute (NHLBI) Pediatric Cardiac Genomics Consortium (U01-HL098188, U01-HL098147, U01-HL098153, U01-HL098163, U01-HL098123, U01-HL098162) and in part by the Simons Foundation for Autism Research and the NIH Centers for Mendelian Genomics (5U54HG006504).

Nonstandard Abbreviations and Acronyms

CTD

conotruncal defect

LVOT

Left Ventricular Outflow Tract Obstruction

TA

truncus arteriosus

TOF

tetralogy of Fallot

HLHS

hypoplastic left heart syndrome

APVS

Absent pulmonary valve syndrome

ASD

Atrial septal defect

CoA

Coarctation of the Aorta

DTGA

dextro-Transposition of the great arteries

HTX

Heterotaxy

PA

Pulmonary Atresia

PS

Pulmonary Stenosis

TriAtresia

Tricuspid atresia

VSD

Ventricular Septal Defect

Footnotes

C.S., E.G., B.D.G., R.L., J.S., H.H., and W.K.C. are co-senior authors.

REFERENCES
