Statistical inferences for isoform expression in RNA-Seq

Hui Jiang et al. Bioinformatics. 2009.

Abstract

The development of RNA sequencing (RNA-Seq) makes it possible to measure transcription with unprecedented precision and throughput. However, challenges remain in understanding the source and distribution of the reads, modeling transcript abundance and developing efficient computational methods. In this article, we develop a method for the isoform expression estimation problem. The count of reads falling into a locus on the genome annotated with multiple isoforms is modeled as a Poisson variable. The expression of each individual isoform is estimated by solving a convex optimization problem, and statistical inferences about the parameters are obtained from the posterior distribution by importance sampling. Our results show that isoform expression inference in RNA-Seq is possible with appropriate statistical methods.
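The estimation step described above can be illustrated with a minimal sketch. It assumes, beyond what the abstract states, that the Poisson rate for each exon is linear in the isoform expressions, so the log-likelihood is concave in those expressions. The solver shown is a standard EM-style multiplicative update for Poisson linear models, swapped in for illustration; it is not necessarily the optimizer used by the authors. The locus, exon lengths, read counts and the names `A`, `x`, `fit_isoforms` and `log_likelihood` are all hypothetical.

```python
import math

# Hypothetical toy locus (not from the paper): 3 exons, 2 isoforms.
# A[j][i] = w * l_j * c_ij, where w is the total mapped reads (millions),
# l_j is the exon length (kb), and c_ij is 1 if isoform i contains exon j.
# Isoform 1 uses all three exons; isoform 2 skips exon 2.
w = 10.0
exon_kb = [1.0, 0.5, 0.8]
contains = [[1, 1], [1, 0], [1, 1]]
A = [[w * l * c for c in row] for l, row in zip(exon_kb, contains)]
x = [100, 20, 80]  # made-up read counts per exon


def log_likelihood(theta, A, x):
    """Poisson log-likelihood of counts x under rates A @ theta (constant dropped)."""
    total = 0.0
    for row, count in zip(A, x):
        rate = sum(a * t for a, t in zip(row, theta))
        if rate <= 0:
            return -math.inf if count > 0 else total
        total += count * math.log(rate) - rate
    return total


def fit_isoforms(A, x, iters=500):
    """Maximize the Poisson likelihood with EM-style multiplicative updates."""
    n_iso = len(A[0])
    col_sums = [sum(row[i] for row in A) for i in range(n_iso)]
    # Start from a flat, strictly positive guess.
    theta = [sum(x) / sum(col_sums)] * n_iso
    for _ in range(iters):
        rates = [sum(a * t for a, t in zip(row, theta)) for row in A]
        theta = [
            theta[i] / col_sums[i]
            * sum(A[j][i] * x[j] / rates[j] for j in range(len(A)) if rates[j] > 0)
            for i in range(n_iso)
        ]
    return theta


theta_hat = fit_isoforms(A, x)
print([round(t, 3) for t in theta_hat])
```

With these made-up counts, expressions of 4 and 6 reproduce the exon counts exactly, so the fit should settle close to those values. The update keeps every estimate non-negative without an explicit constraint, which is the main reason for choosing it in a sketch.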

Figures

Fig. 1.

Histogram of gene expressions in liver samples in the unit of RPKM. Genes are grouped into eight log-scaled bins according to their expressions. Genes are considered to be lowly (or highly) expressed if their RPKMs are below 1 (or above 100). Genes that have RPKMs between 1 and 100 are considered to be moderately expressed.
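The grouping in this caption can be sketched in a few lines. The example assumes the common definition of RPKM (reads per kilobase of transcript per million mapped reads), which the caption itself does not spell out, and uses the thresholds of 1 and 100 given above. The function names and the input numbers are hypothetical.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def expression_class(value):
    """Classify an RPKM value using the thresholds from the caption."""
    if value < 1:
        return "low"
    if value > 100:
        return "high"
    return "moderate"


# Made-up gene: 500 reads on a 2000 bp gene in a library of 10 million mapped reads.
example_rpkm = rpkm(500, 2000, 10_000_000)
example_class = expression_class(example_rpkm)
print(example_rpkm, example_class)
```

The eight log-scaled bins of the histogram are not reproduced here because the caption does not give their edges.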

Fig. 2.

(a) Visualization of RNA-Seq reads falling into mouse gene Pdlim5 in CisGenome Browser (Ji et al., 2008). The four horizontal tracks are, from top to bottom: genomic coordinates; gene structure, with exons magnified for better visualization; and the reads at each genomic coordinate in brain and muscle samples, where a red or blue bar represents the number of reads on the forward or reverse strand, respectively, that start at that position. (b–d) Visualization of mouse genes Dbi (b), Clk1 (c) and Fetub (d) in brain tissue.

Fig. 3.

Statistical inference using importance sampling for mouse gene Fetub in brain tissue (Fig. 2d). Histograms of the marginal posterior distributions of θ1, θ2 and θ3 are given in (a), (b) and (c), respectively. The gene expression is shown in (d) as a reference. The two red dotted vertical lines in each panel mark the boundaries of the 95% probability interval. Panels (e), (f) and (g) are heatmaps showing the joint marginal posterior distributions of all two-parameter combinations. The heatmaps show that θ1 is almost uncorrelated with the other two parameters, while θ2 and θ3 are negatively correlated.
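The kind of summary shown in this figure (marginal 95% intervals and correlations between parameters) can be sketched with self-normalized importance sampling. Everything in the example is an assumption made for illustration: a hypothetical two-isoform, three-exon locus rather than Fetub, a flat prior on non-negative expressions, and an independent Gaussian proposal centered at a point estimate with hand-picked spreads. The names `A`, `x`, `center` and `sds` are not from the paper.

```python
import math
import random

# Hypothetical toy locus (not data from the paper): 3 exons, 2 isoforms.
A = [[10.0, 10.0], [5.0, 0.0], [8.0, 8.0]]
x = [100, 20, 80]


def log_posterior(theta):
    """Poisson log-likelihood plus a flat prior on theta >= 0 (an assumption)."""
    if min(theta) < 0:
        return -math.inf
    total = 0.0
    for row, count in zip(A, x):
        rate = sum(a * t for a, t in zip(row, theta))
        if rate <= 0:
            return -math.inf
        total += count * math.log(rate) - rate
    return total


def log_normal_pdf(value, mean, sd):
    return -0.5 * ((value - mean) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))


def importance_sample(center, sds, n, seed=0):
    """Draw from an independent Gaussian proposal; return draws and normalized weights."""
    rng = random.Random(seed)
    draws, log_w = [], []
    for _ in range(n):
        theta = [rng.gauss(m, s) for m, s in zip(center, sds)]
        log_q = sum(log_normal_pdf(t, m, s) for t, m, s in zip(theta, center, sds))
        draws.append(theta)
        log_w.append(log_posterior(theta) - log_q)
    top = max(log_w)  # subtract the maximum for numerical stability
    weights = [math.exp(lw - top) for lw in log_w]
    total = sum(weights)
    return draws, [wt / total for wt in weights]


def weighted_quantile(values, weights, q):
    running = 0.0
    for value, weight in sorted(zip(values, weights)):
        running += weight
        if running >= q:
            return value
    return max(values)


def weighted_corr(u, v, weights):
    mu = sum(wt * a for a, wt in zip(u, weights))
    mv = sum(wt * b for b, wt in zip(v, weights))
    cov = sum(wt * (a - mu) * (b - mv) for a, b, wt in zip(u, v, weights))
    var_u = sum(wt * (a - mu) ** 2 for a, wt in zip(u, weights))
    var_v = sum(wt * (b - mv) ** 2 for b, wt in zip(v, weights))
    return cov / math.sqrt(var_u * var_v)


# center is the value that reproduces the toy counts exactly; sds are hand-picked.
center, sds = [4.0, 6.0], [1.5, 2.0]
draws, weights = importance_sample(center, sds, n=20000)
theta1 = [d[0] for d in draws]
theta2 = [d[1] for d in draws]
lo1 = weighted_quantile(theta1, weights, 0.025)
hi1 = weighted_quantile(theta1, weights, 0.975)
corr12 = weighted_corr(theta1, theta2, weights)
print("95% interval for theta1:", (round(lo1, 2), round(hi1, 2)))
print("posterior correlation of theta1 and theta2:", round(corr12, 2))
```

In this toy locus two of the three exons are shared by both isoforms, so reads there can be attributed to either one, and the sampled correlation between the two expressions comes out negative. A Gaussian proposal has lighter tails than a Poisson-based posterior, so a heavier-tailed proposal may be a safer choice when counts are small.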

References

    1. Cloonan N, et al. Stem cell transcriptome profiling via massive-scale mRNA sequencing. Nat. Methods. 2008;5:613–619.
    2. Ji H, et al. An integrated software system for analyzing ChIP-chip and ChIP-seq data. Nat. Biotechnol. 2008;26:1293–1300.
    3. Jiang H, Wong WH. SeqMap: mapping massive amount of oligonucleotides to the genome. Bioinformatics. 2008;24:2395–2396.
    4. Kapur K, et al. Cross-hybridization modeling on Affymetrix exon arrays. Bioinformatics. 2008;24:2887–2893.
    5. Karolchik D, et al. The UCSC Genome Browser Database: 2008 update. Nucleic Acids Res. 2008;36:D773–D779.
