RNA splicing. The human splicing code reveals new insights into the genetic determinants of disease
Science. 2015 Jan 9;347(6218):1254806.
doi: 10.1126/science.1254806. Epub 2014 Dec 18.
Hui Y Xiong, Babak Alipanahi, Leo J Lee, Hannes Bretschneider, Daniele Merico, Ryan K C Yuen, Yimin Hua, Serge Gueroussov, Hamed S Najafabadi, Timothy R Hughes, Quaid Morris, Yoseph Barash, Adrian R Krainer, Nebojsa Jojic, Stephen W Scherer, Benjamin J Blencowe, Brendan J Frey
- PMID: 25525159
- PMCID: PMC4362528
- DOI: 10.1126/science.1254806
Hui Y Xiong et al. Science. 2015.
Abstract
To facilitate precision medicine and whole-genome annotation, we developed a machine-learning technique that scores how strongly genetic variants affect RNA splicing, whose alteration contributes to many diseases. Analysis of more than 650,000 intronic and exonic variants revealed widespread patterns of mutation-driven aberrant splicing. Intronic disease mutations that are more than 30 nucleotides from any splice site alter splicing nine times as often as common variants, and missense exonic disease mutations that have the least impact on protein function are five times as likely as others to alter splicing. We detected tens of thousands of disease-causing mutations, including those involved in cancers and spinal muscular atrophy. Examination of intronic and exonic variants found using whole-genome sequencing of individuals with autism revealed misspliced genes with neurodevelopmental phenotypes. Our approach provides evidence for causal variants and should enable new discoveries in precision medicine.
Copyright © 2015, American Association for the Advancement of Science.
Figures
Figure 1. The human splicing code
(a) For a given cell type, the computational model extracts the regulatory code from a test DNA sequence and predicts the percent of transcripts with the exon spliced in, Ψ. (b) Predictions were made for 10,689 test exons profiled in 16 tissues. Exons and tissues were binned according to their RNA-seq assessed values of Ψ, and for each bin (column) the distribution of code-predicted Ψ is plotted (_n_=56,104).
Figure 2. Accounting for RNA-binding proteins
(a) The splicing code accounts for the affinities of RNA-binding proteins assayed in 98 in vitro experiments (13). (b) When code-predicted Ψ values are subtracted from RNA-seq assessed values of Ψ, their correlations with the binding affinities mostly vanish.
Figure 3. Genome-wide analysis of genetic variations
(a) To assess the effect of a single nucleotide variation (SNV), the computational model is applied to the reference sequence and the variant. Then, the maximum difference ΔΨ across tissues is computed, along with a ‘regulatory score’ that also accounts for prediction confidence (Sec. S7). (b) The effect on Ψ of 658,420 intronic and exonic SNVs. (c) Locations and predicted ΔΨ of 81,608 disease annotated intronic SNVs and synonymous or missense exonic SNVs. In different sequence regions, the scores of disease SNVs tend to be larger than those of SNPs (Ansari-Bradley tests for equal dispersion, n includes both types).
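The ΔΨ step described in panel (a) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `predict_psi` is a hypothetical stand-in for the trained model, `toy_predict_psi` is a made-up placeholder, and reading "maximum difference" as the signed per-tissue difference of largest magnitude is an assumption. The regulatory score, which also accounts for prediction confidence (Sec. S7), is not reproduced here.

```python
from typing import Callable, Dict


def max_delta_psi(
    predict_psi: Callable[[str], Dict[str, float]],
    ref_seq: str,
    var_seq: str,
) -> float:
    """Signed per-tissue difference (variant minus reference) of largest magnitude."""
    ref = predict_psi(ref_seq)
    var = predict_psi(var_seq)
    deltas = [var[tissue] - ref[tissue] for tissue in ref]
    # Assumption: "maximum difference" means largest absolute change, sign kept.
    return max(deltas, key=abs)


def toy_predict_psi(seq: str) -> Dict[str, float]:
    # Hypothetical placeholder that maps GC content to a percent value.
    # It is NOT the splicing code; it only makes the sketch runnable.
    gc = sum(base in "GC" for base in seq) / len(seq)
    return {"brain": 100.0 * gc, "liver": 50.0 * gc}


# Reference and variant differ by a single nucleotide (T -> G at the last position).
delta = max_delta_psi(toy_predict_psi, "ACGT", "ACGG")
```

Passing the model in as a function keeps the comparison logic independent of how Ψ is actually predicted.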
Figure 4. Regulatory scores of GWAS SNPs
(a) Distributions of regulatory scores for GWAS-implicated SNPs (_n_=457), non-GWAS-implicated SNPs (_n_=262,347) and disease SNVs (_n_=18,291) in introns. (b) Regulatory scores of disease annotated intronic SNVs that are causal (_n_=17,631), supported by in vitro/vivo data (_n_=224), only associated (_n_=324), or associated but have additional functional evidence (_n_=112). t-test _P_-values.
Figure 5. The mutational landscape of spinal muscular atrophy
(a) Spinal muscular atrophy arises when there is homozygous loss of SMN1 function, but functional protein can be produced by modifying the regulation of SMN2, which differs from SMN1 in four nucleotides (red lightning bolts) and exhibits decreased inclusion of exon 7. (b) Three mutations that the splicing code predicts will increase exon 7 inclusion in SMN2 (green lightning bolts) were selected from predictions for all possible single-nucleotide substitutions 150 nt into the intron. These were validated using RT-PCR (c), along with the predicted differences in SMN1 and SMN2 regulation due to three individual substitutions and all four substitutions. Predictions and RT-PCR data have a Spearman correlation of 0.82 (_P_=0.017, one-sided permutation test). (d) Predicted ΔΨ for 85 individual mutations located in four regions are plotted against RT-PCR-assessed values; the Spearman correlation is 0.74 (_P_=5.7e-16, one-sided permutation test).
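The statistic reported in panels (c) and (d), a Spearman correlation with a one-sided permutation test, can be sketched in plain Python. The caption does not give the exact permutation procedure, so the shuffling scheme, the number of permutations, and the add-one p-value estimate below are assumptions, and the example values are made up rather than taken from the paper.

```python
import random
from typing import List, Sequence, Tuple


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = mean_rank
        i = j + 1
    return out


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman_permutation(
    x: Sequence[float], y: Sequence[float], n_perm: int = 2000, seed: int = 0
) -> Tuple[float, float]:
    """Spearman rho and a one-sided p-value for positive association."""
    rx, ry = ranks(x), ranks(y)
    observed = pearson(rx, ry)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        shuffled = ry[:]
        rng.shuffle(shuffled)
        # Small tolerance guards against floating-point noise in exact ties.
        if pearson(rx, shuffled) >= observed - 1e-12:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Made-up illustrative values, not data from the paper.
predicted = [1, 2, 3, 4, 5, 6, 7, 8]
measured = [2, 1, 4, 3, 6, 5, 8, 7]
rho, p_value = spearman_permutation(predicted, measured)
```

A permutation test avoids distributional assumptions, which matters when only a handful of mutations are compared.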
Figure 6. The mutational landscape of nonpolyposis colorectal cancer
(a) Predicted ΔΨ for mutations in MLH1 and MSH2 arising in patients with nonpolyposis colorectal cancer, or Lynch syndrome. Coding sequence (CDS) numbering is based on GenBank NM_000249.3 and NM_000251.2 and starts at A of the ATG translation initiation codon. (b) Validation using 134 MLH1 variations tested by RT-PCR (AUC=92.4%, _P_=2.8e-24, one-sided permutation test) and 73 MSH2 variations (AUC=93.8%, _P_=8.7e-15, one-sided permutation test).
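The AUC values in panel (b) summarize how well a score separates two groups of variants. A minimal sketch of the rank-based definition follows. Using a per-variant score such as |ΔΨ| and splitting variants by whether RT-PCR showed altered splicing is an assumption about the setup, and the example scores are made up.

```python
from typing import Sequence


def auc(scores_pos: Sequence[float], scores_neg: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Made-up illustrative scores, not data from the paper.
altered = [0.9, 0.8, 0.4]          # hypothetical scores for splice-altering variants
unaltered = [0.5, 0.3, 0.1, 0.4]   # hypothetical scores for variants with no effect
area = auc(altered, unaltered)
```

The pairwise form is quadratic in the number of variants, which is acceptable at the scale of a few hundred tested variants.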
Figure 7. Splicing misregulation in individuals with autism
(a) Genes containing at least one SNV that the computational model predicts will cause decreased exon inclusion were identified in five autism spectrum disorder (ASD) cases and twelve controls, by thresholding ΔΨ using either the 2nd or 3rd percentile of ΔΨ for SNPs. (b) Genes that our method predicts are misregulated in ASD cases more frequently have high expression in brain tissues than genes predicted to be misregulated in controls. (c) The effect of varying the threshold on ΔΨ, and thus the number of case and control genes, on the odds ratio for the enrichment of central nervous system development genes (GO:0007417); in all cases, _P_<0.05.
Figure 8. Misregulated genes and functional categories enriched in individuals with autism
Gene Ontology and pathway categories that are enriched (_P_≤0.01, Fisher's exact test) in misregulated genes from ASD cases compared to controls were identified (_n_=18), along with the corresponding set of genes from ASD cases. Each gene set is shown as a red or pink dot, depending on whether the 2nd or 3rd percentile threshold was used for detection (Fig. 7a), and size is proportional to the number of genes in the set. Edge thickness indicates the fraction of genes shared between two sets. Groups of functionally related gene sets are highlighted by blond discs. The names of novel genes that are not already implicated in ASD and have neural-related phenotypes are printed in black, the names of genes already implicated in ASD are printed in red, and otherwise gene names are printed in pale blue. If a gene is in multiple categories, the number of categories is written in superscript and genes in which a stop codon is introduced by the SNV are labeled ‘s’.
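The enrichment statistics used here and in Fig. 7c, an odds ratio and Fisher's exact test on a 2×2 table, can be sketched with the standard library. The table layout (case versus control genes, inside versus outside a category) and the one-sided alternative are assumptions, and the counts are made up.

```python
from math import comb


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds ratio for the 2x2 table [[a, b], [c, d]]; undefined if b or c is 0."""
    return (a * d) / (b * c)


def fisher_exact_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided p-value: chance of a top-left count >= a with all margins fixed."""
    row1, col1, total = a + b, a + c, a + b + c + d
    denom = comb(total, col1)
    p = 0.0
    for k in range(a, min(row1, col1) + 1):
        # Hypergeometric probability of exactly k category genes among cases.
        p += comb(row1, k) * comb(total - row1, col1 - k) / denom
    return p


# Made-up counts, not data from the paper:
# rows = ASD case genes, control genes; columns = in category, not in category.
table = (3, 1, 1, 3)
ratio = odds_ratio(*table)
p_enriched = fisher_exact_greater(*table)
```

An exact test suits this setting because the gene sets are small enough that asymptotic approximations can be unreliable.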
Comment in
- RNA. Prescribing splicing.
Guigó R, Valcárcel J. Science. 2015 Jan 9;347(6218):124-5. doi: 10.1126/science.aaa4864. PMID: 25574005. Free PMC article.
- Genetic variation and alternative splicing.
Estivill X. Nat Biotechnol. 2015 Apr;33(4):357-9. doi: 10.1038/nbt.3195. PMID: 25850059. No abstract available.
Similar articles
- Genetic variation and alternative splicing.
Estivill X. Nat Biotechnol. 2015 Apr;33(4):357-9. doi: 10.1038/nbt.3195. PMID: 25850059. No abstract available.
- RNA. Prescribing splicing.
Guigó R, Valcárcel J. Science. 2015 Jan 9;347(6218):124-5. doi: 10.1126/science.aaa4864. PMID: 25574005. Free PMC article.
- RNA analysis reveals splicing mutations and loss of expression defects in MLH1 and BRCA1.
Sharp A, Pichert G, Lucassen A, Eccles D. Hum Mutat. 2004 Sep;24(3):272. doi: 10.1002/humu.9267. PMID: 15300854.
- The contribution of alternative splicing to genetic risk for psychiatric disorders.
Reble E, Dineen A, Barr CL. Genes Brain Behav. 2018 Mar;17(3):e12430. doi: 10.1111/gbb.12430. Epub 2017 Dec 6. PMID: 29052934. Review.
- Pseudoexon activation in disease by non-splice site deep intronic sequence variation - wild type pseudoexons constitute high-risk sites in the human genome.
Petersen USS, Doktor TK, Andresen BS. Hum Mutat. 2022 Feb;43(2):103-127. doi: 10.1002/humu.24306. Epub 2021 Dec 5. PMID: 34837434. Review.
Cited by
- From computational models of the splicing code to regulatory mechanisms and therapeutic implications.
Capitanchik C, Wilkins OG, Wagner N, Gagneur J, Ule J. Nat Rev Genet. 2024 Oct 2. doi: 10.1038/s41576-024-00774-2. Online ahead of print. PMID: 39358547. Review.
- Variant Impact Predictor database (VIPdb), version 2: trends from three decades of genetic variant impact predictors.
Lin YJ, Menon AS, Hu Z, Brenner SE. Hum Genomics. 2024 Aug 28;18(1):90. doi: 10.1186/s40246-024-00663-z. PMID: 39198917. Free PMC article.
- Spatial multi-omics: deciphering technological landscape of integration of multi-omics and its applications.
Liu X, Peng T, Xu M, Lin S, Hu B, Chu T, Liu B, Xu Y, Ding W, Li L, Cao C, Wu P. J Hematol Oncol. 2024 Aug 24;17(1):72. doi: 10.1186/s13045-024-01596-9. PMID: 39182134. Free PMC article. Review.
- Quantum mechanical electronic and geometric parameters for DNA k-mers as features for machine learning.
Masuda K, Abdullah AA, Pflughaupt P, Sahakyan AB. Sci Data. 2024 Aug 22;11(1):911. doi: 10.1038/s41597-024-03772-5. PMID: 39174574. Free PMC article.
- Clinical, pathological and genetic characteristics of 17 unrelated children with Alagille Syndrome.
Yan J, Huang Y, Cao L, Dong Y, Xu Z, Wang F, Gao Y, Feng D, Zhang M. BMC Pediatr. 2024 Aug 20;24(1):532. doi: 10.1186/s12887-024-04973-y. PMID: 39164659. Free PMC article.
References
- Bernstein BE, et al. Nature. 2012;489:57–74.
- Barash Y, et al. Nature. 2010;465:53–9.
- Barbosa-Morais NL, et al. Science. 2012;338:1587–1593.
Grants and funding
- R37 GM042699/GM/NIGMS NIH HHS/United States
- R37-GM42699A/GM/NIGMS NIH HHS/United States
- CAPMC/ CIHR/Canada
- P30 CA045508/CA/NCI NIH HHS/United States
- R01 GM042699/GM/NIGMS NIH HHS/United States