Detecting differential usage of exons from RNA-seq data

Simon Anders et al. Genome Res. 2012 Oct.

Abstract

RNA-seq is a powerful tool for the study of alternative splicing and other forms of alternative isoform expression. Understanding the regulation of these processes requires sensitive and specific detection of differential isoform abundance in comparisons between conditions, cell types, or tissues. We present DEXSeq, a statistical method to test for differential exon usage in RNA-seq data. DEXSeq uses generalized linear models and offers reliable control of false discoveries by taking biological variation into account. DEXSeq detects with high sensitivity genes, and in many cases exons, that are subject to differential exon usage. We demonstrate the versatility of DEXSeq by applying it to several data sets. The method facilitates the study of regulation and function of alternative exon usage on a genome-wide scale. An implementation of DEXSeq is available as an R/Bioconductor package.

Figures

Figure 1.

Flattening of gene models: This (fictional) gene has three annotated transcripts involving three exons (light shading), one of which has alternative boundaries. We form counting bins (dark shaded boxes) from the exons as depicted; the exon of variable length gets split into two bins.
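The flattening step in this caption can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `flatten_gene_model`, the half-open coordinate convention, and the example coordinates are all assumptions made here, and the sketch ignores strand and overlapping genes.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end); an assumed convention


def flatten_gene_model(transcripts: List[List[Interval]]) -> List[Interval]:
    """Cut the exons of all transcripts of one gene into disjoint counting bins."""
    # Unique exons across all transcripts of the gene.
    exons = {exon for transcript in transcripts for exon in transcript}
    # Every exon start or end is a potential bin boundary.
    boundaries = sorted({pos for start, end in exons for pos in (start, end)})
    bins = []
    for left, right in zip(boundaries, boundaries[1:]):
        # Keep the segment only if some exon covers it entirely (skips introns).
        if any(start <= left and right <= end for start, end in exons):
            bins.append((left, right))
    return bins


# Hypothetical gene: three exons, the second with two alternative 3' boundaries.
transcripts = [
    [(100, 200), (300, 400), (500, 600)],
    [(100, 200), (300, 450), (500, 600)],
    [(100, 200), (500, 600)],
]
bins = flatten_gene_model(transcripts)
for i, (start, end) in enumerate(bins, 1):
    print(f"E{i:03d}\t{start}\t{end}")
```

With these made-up coordinates, the exon of variable length is split into the bins (300, 400) and (400, 450), so the three exons yield four counting bins.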

Figure 2.

Dependence of dispersion on the mean. Each dot corresponds to one counting bin in the data of Brooks et al. (2010) (discussed in detail in the Results section); (_x_-axis) normalized count, averaged over all samples; (_y_-axis) estimate of the dispersion. The bars at the bottom denote dispersion values outside the plotting range (in particular, those cases in which the sample dispersion is close to zero). (Solid red line) The regression line; (dashed lines) the 1st, 5th, 95th, and 99th percentiles of the χ² distribution with 4 degrees of freedom, scaled such that it has the fitted mean.
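The scaling of the dashed lines can be written out explicitly. The notation here is introduced only for illustration and is not taken from the caption: $\bar{\alpha}(\mu)$ stands for the fitted dispersion at mean $\mu$ (the regression line) and $q_p$ for the $p$-quantile of the $\chi^2_4$ distribution. The only fact used is that a $\chi^2$ variable with 4 degrees of freedom has mean 4.

```latex
% X ~ chi^2 with 4 degrees of freedom, so E[X] = 4.
% Rescaling X by \bar{\alpha}(\mu)/4 gives a variable with mean \bar{\alpha}(\mu):
\[
  \mathbb{E}\!\left[\frac{\bar{\alpha}(\mu)}{4}\,X\right] = \bar{\alpha}(\mu),
  \qquad X \sim \chi^2_4 .
\]
% Each dashed line is therefore the curve
\[
  \mu \;\mapsto\; \frac{\bar{\alpha}(\mu)}{4}\, q_p ,
  \qquad p \in \{0.01,\ 0.05,\ 0.95,\ 0.99\},
\]
% with approximate quantiles q_{0.01} = 0.297, q_{0.05} = 0.711,
% q_{0.95} = 9.488, and q_{0.99} = 13.277.
```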

Figure 3.

The treatment of knocking down the splicing factor pasilla affects the fourth exon (counting bin E004) of the gene Ten-m (CG5723). (Top panel) Fitted values according to the linear model; (middle panel) normalized counts for each sample; (bottom panel) flattened gene model. (Red) Data for knockdown samples; (blue) control.

Figure 4.

Fold changes of exon usage versus averaged normalized count value for all tested counting bins for the Brooks and coworkers data. (Red) Significance at 10% FDR. Bars at the margin represent bins with fold changes outside the plotting range.
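To illustrate what flagging bins "at 10% FDR" involves, the sketch below adjusts per-bin p-values with the Benjamini-Hochberg procedure and thresholds them at 0.10. The caption does not name the adjustment method, so the use of Benjamini-Hochberg is an assumption here, and the p-values are made up for the example.

```python
from typing import List


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical per-bin p-values from a test for differential exon usage.
p_values = [0.001, 0.02, 0.03, 0.5, 0.8]
adjusted = benjamini_hochberg(p_values)
significant = [q <= 0.10 for q in adjusted]  # bins that would be drawn in red
print(significant)
```

In this toy input the first three bins pass the 10% threshold and the last two do not.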

Figure 5.

Ribosomal protein gene RpS14b (from the Brooks and coworkers data) is shown here as an example of a gene with heterogeneous dispersion. The first exon has zero counts in the paired-end sample untreated 2, in the single-end sample treated 2, and in the paired-end sample treated 3, and large nonzero counts in the four other samples. Colors are as in Figure 3.
