Identification of common genetic variants controlling transcript isoform variation in human whole blood

Xiaoling Zhang et al. Nat Genet. 2015 Apr.

Abstract

An understanding of the genetic variation underlying transcript splicing is essential to dissect the molecular mechanisms of common disease. The available evidence from splicing quantitative trait locus (sQTL) studies has been limited to small samples. We performed genome-wide screening to identify SNPs that might control mRNA splicing in whole blood collected from 5,257 Framingham Heart Study participants. We identified 572,333 cis sQTLs involving 2,650 unique genes. Many sQTL-associated genes (40%) undergo alternative splicing. Using the National Human Genome Research Institute (NHGRI) genome-wide association study (GWAS) catalog, we determined that 528 unique sQTLs were significantly enriched for 8,845 SNPs associated with traits in previous GWAS. In particular, we found 395 (4.5%) GWAS SNPs with evidence of cis sQTLs but not gene-level cis expression quantitative trait loci (eQTLs), suggesting that sQTL analysis could provide additional insights into the functional mechanism underlying GWAS results. Our findings provide an informative sQTL resource for further characterizing the potential functional roles of SNPs that control transcript isoforms relevant to common diseases.
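The sQTL-only category described above is a set operation over SNP identifiers: a GWAS SNP qualifies if it is a cis sQTL and not a gene-level cis eQTL. Below is a minimal Python sketch that assumes the three SNP sets have already been computed. The function names and the `snp*` identifiers are hypothetical placeholders, not study data. The final line checks that 395 of 8,845 rounds to the reported 4.5%, which is consistent with using the 8,845 GWAS SNPs as the denominator.

```python
def sqtl_only_gwas_snps(gwas_snps, sqtl_snps, eqtl_snps):
    """GWAS SNPs that are cis sQTLs but not gene-level cis eQTLs (hypothetical helper)."""
    return (set(gwas_snps) & set(sqtl_snps)) - set(eqtl_snps)


def percent(part, whole):
    """Percentage of `part` relative to `whole`."""
    return 100.0 * part / whole


# Placeholder SNP identifiers, not real study data.
gwas = {"snp1", "snp2", "snp3", "snp4"}
sqtl = {"snp1", "snp2", "snp5"}
eqtl = {"snp2", "snp6"}

print(sorted(sqtl_only_gwas_snps(gwas, sqtl, eqtl)))  # ['snp1']
print(round(percent(395, 8845), 1))                   # 4.5
```

Here `snp2` drops out because it is also a gene-level eQTL. This leaves `snp1` as the only SNP whose functional interpretation depends on splicing rather than on overall expression.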

Conflict of interest statement

COMPETING FINANCIAL INTERESTS

The authors declare no competing financial interests.

Figures

Figure 1

Overview of sQTL analysis. Flowchart of the gene-level and exon-level _cis_-QTL analyses used to identify cis splicing QTL associations (cis sQTLs) using 5,257 whole-blood samples and 9,623,954 common SNPs (minor allele frequency (MAF) > 1%) within 50 kb of each gene. *Of the 8,799 genes identified by exon-level QTL analysis, 5,935 were also among the 6,137 genes identified by gene-level analysis. The remaining 2,864 genes had QTL associations only at the exon level and were defined as genes with sQTL associations. They account for 672,845 significant probe set–SNP pairs. After excluding 214 (7.5%) genes with suspect sQTL signals due to the SNP-in-probe problem (SNPs within probe sequences that can alter hybridization; corresponding to 760 (10%) unique exons), 572,333 cis sQTLs (2,650 genes) were finally identified.
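The gene bookkeeping in this caption reduces to a set difference followed by an exclusion step. The minimal Python sketch below reproduces the caption's arithmetic. The constant names and the `sqtl_gene_candidates` helper are hypothetical, and the toy gene labels are placeholders rather than study data.

```python
# Counts are taken from the Figure 1 caption; the names are hypothetical.
EXON_LEVEL_GENES = 8_799         # genes with exon-level cis-QTLs
SHARED_WITH_GENE_LEVEL = 5_935   # of those, also found by gene-level analysis
SNP_IN_PROBE_EXCLUDED = 214      # genes dropped for SNP-in-probe artifacts


def sqtl_gene_candidates(exon_level_genes, gene_level_genes):
    """Genes with exon-level QTLs but no gene-level QTL (a set difference)."""
    return set(exon_level_genes) - set(gene_level_genes)


# Toy example with placeholder gene labels.
print(sorted(sqtl_gene_candidates({"G1", "G2", "G3"}, {"G2", "G4"})))  # ['G1', 'G3']

exon_only = EXON_LEVEL_GENES - SHARED_WITH_GENE_LEVEL
final_genes = exon_only - SNP_IN_PROBE_EXCLUDED
excluded_pct = 100 * SNP_IN_PROBE_EXCLUDED / exon_only

print(exon_only)               # 2864
print(final_genes)             # 2650
print(round(excluded_pct, 1))  # 7.5
```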

Figure 2

An example of the PTGS1 gene with its associated cis sQTL located at the 3′ acceptor splice site. (a) Visualization of probe set 3188118 in the context of all other probe sets belonging to the PTGS1 gene (Affymetrix transcript ID 3188111). For each probe set, vertical gray bars show the fold change between the mean expression scores of the two homozygous genotypes (mean(AA)/mean(GG)) at rs3842788; probe set 3188118 is highlighted in green. The horizontal red line represents the fold change in whole-gene-level expression for SNP rs3842788. (b) Box plot of the expression signals for probe set 3188118 by genotype at SNP rs3842788, with an association P value of 2.2 × 10⁻²⁵ (P = 0.001 for the association between rs3842788 genotype and whole-gene expression of PTGS1). The sample size analyzed is shown under each genotype. The solid horizontal line within each box represents the median. The interquartile range (IQR) is Q3 − Q1, and the whiskers extend up to 1.5 × IQR beyond the box edges. (c) Schematic of the seven transcript isoforms of PTGS1 in the NCBI RefSeq database. A zoomed-in view of exon 3 of PTGS1 with Affymetrix Exon array core probe sets is shown below the exon. The significant probe set 3188118 is highlighted in blue. It corresponds to use of an alternative 3′ acceptor splice site that produces a shorter exon in transcript NM_001271367.1. SNP rs3842788 lies at the last position of the intron and is a G>A substitution that disrupts the consensus splice-site sequence.
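Panels (a) and (b) rest on two simple computations. The first is a homozygote fold change, mean(AA)/mean(GG), calculated for each feature (each probe set, or the whole gene). The second is the box-plot whisker limit at 1.5 × IQR. The minimal Python sketch below assumes that per-sample expression values and genotype labels are available as parallel lists. It also assumes whiskers follow the usual Tukey convention, ending at the most extreme data point within the fences. The function names and toy values are hypothetical.

```python
from statistics import mean, quantiles


def homozygote_fold_change(values, genotypes, hom_1="AA", hom_2="GG"):
    """Ratio mean(hom_1)/mean(hom_2) of expression values for one feature."""
    group_1 = [v for v, g in zip(values, genotypes) if g == hom_1]
    group_2 = [v for v, g in zip(values, genotypes) if g == hom_2]
    return mean(group_1) / mean(group_2)


def tukey_whiskers(values, k=1.5):
    """Whisker ends: most extreme data points within k * IQR of the box edges."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return min(inside), max(inside)


# Hypothetical toy data for one probe set at one SNP.
expr = [4.0, 6.0, 3.5, 3.0, 2.0, 3.0]
geno = ["AA", "AA", "AG", "AG", "GG", "GG"]
print(homozygote_fold_change(expr, geno))  # 2.0
print(tukey_whiskers([1, 2, 3, 4, 100]))   # (1, 4)
```

Applying `homozygote_fold_change` to each probe set gives the gray bars of panel (a), and applying it to the gene-level summary gives the red line. The figure highlights a probe set whose ratio departs from the gene-level ratio.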

Figure 3

Cis sQTLs are highly enriched in binding sites for RNA-binding proteins. Examples are shown for the RNA-binding proteins ELAVL1 and PABPC1, with binding-site data from the ENCODE GM12878 (lymphoid) and K562 (myeloid) cell lines, respectively. Error bars, s.d. Red data points mark the minimum, first quartile, median, third quartile and maximum of the enrichment scores obtained from 200 permutations, summarizing their overall distribution.
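The permutation scheme behind the red points can be sketched as follows. The caption does not define the enrichment score, so this sketch assumes one. The score is the number of sQTL SNPs that fall in RBP binding sites, divided by the same count for a same-size random draw from background SNPs. A pseudocount of 1 avoids division by zero. Binding sites are simplified to single positions, and all names and data are hypothetical.

```python
import random
from statistics import quantiles


def overlap_count(snps, sites):
    """Number of SNP positions that fall in a binding site (sites as a set)."""
    return sum(1 for s in snps if s in sites)


def permutation_enrichment(sqtl_snps, background_snps, sites, n_perm=200, seed=0):
    """One enrichment score per permutation: observed / permuted overlap.

    Assumed definition, with a pseudocount of 1 in numerator and denominator.
    """
    rng = random.Random(seed)
    observed = overlap_count(sqtl_snps, sites)
    scores = []
    for _ in range(n_perm):
        # Draw a null SNP set of the same size from the background list.
        null_snps = rng.sample(background_snps, len(sqtl_snps))
        permuted = overlap_count(null_snps, sites)
        scores.append((observed + 1) / (permuted + 1))
    return scores


def five_number_summary(values):
    """Minimum, first quartile, median, third quartile, maximum."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, q2, q3, max(values)


# Hypothetical toy data: positions 0..999, a binding site at every 10th position.
background = list(range(1000))
sites = set(range(0, 1000, 10))
sqtls = [0, 10, 20, 30, 40, 55]
scores = permutation_enrichment(sqtls, background, sites)
print(five_number_summary(scores))
```

The fixed seed makes the 200 permutations reproducible. The five-number summary is the kind of distribution the red points display.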

Figure 4

Functional annotation of genes and exons with _cis_-sQTL associations. (a) Concordance of sQTL-associated exons with known alternative splicing events. (b) Example of a bleeding-exon transcript isoform event: OAS2, probe set 3432520 versus rs117666908. (c) Example of a cassette-exon transcript isoform event: STXBP2, probe set 3819052 versus rs72994460.
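Panel (a) summarizes the concordance between sQTL-associated exons and known alternative splicing events. One simple way to compute such a concordance is the fraction of exons that overlap at least one annotated event. This sketch assumes that exons and events are given as half-open genomic intervals. The coordinates, the interval convention and the overlap criterion are illustrative assumptions, not the paper's stated method.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # exclusive


def overlaps(a, b):
    """True if two half-open intervals on the same chromosome share a base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def concordance(sqtl_exons, as_events):
    """Fraction of sQTL-associated exons overlapping at least one annotated event."""
    if not sqtl_exons:
        return 0.0
    hits = sum(1 for e in sqtl_exons if any(overlaps(e, ev) for ev in as_events))
    return hits / len(sqtl_exons)


# Hypothetical coordinates, not study data.
exons = [Interval("chr1", 100, 200), Interval("chr1", 300, 400), Interval("chr2", 50, 80)]
events = [Interval("chr1", 150, 160), Interval("chr2", 80, 90)]
print(round(concordance(exons, events), 3))  # 0.333
```

The pairwise loop keeps the sketch short. At genome scale, an interval tree or a sorted sweep would be the usual choice.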
