A novel min-cost flow method for estimating transcript expression with RNA-Seq - PubMed (original) (raw)

A novel min-cost flow method for estimating transcript expression with RNA-Seq

Alexandru I Tomescu et al. BMC Bioinformatics. 2013.

Abstract

Background: Through transcription and alternative splicing, a gene can be transcribed into different RNA sequences (isoforms), depending on the individual, on the tissue the cell is in, or in response to some stimuli. Recent RNA-Seq technology allows for new high-throughput ways for isoform identification and quantification based on short reads, and various methods have been put forward for this non-trivial problem.

Results: In this paper we propose a novel radically different method based on minimum-cost network flows. This has a two-fold advantage: on the one hand, it translates the problem as an established one in the field of network flows, which can be solved in polynomial time, with different existing solvers; on the other hand, it is general enough to encompass many of the previous proposals under the least sum of squares model. Our method works as follows: in order to find the transcripts which best explain, under a given fitness model, a splicing graph resulting from an RNA-Seq experiment, we find a min-cost flow in an offset flow network, under an equivalent cost model. Under very weak assumptions on the fitness model, the optimal flow can be computed in polynomial time. Parsimoniously splitting the flow back into few path transcripts can be done with any of the heuristics and approximations available from the theory of network flows. In the present implementation, we choose the simple strategy of repeatedly removing the heaviest path.

Conclusions: We proposed a new very general method based on network flows for a multiassembly problem arising from isoform identification and quantification with RNA-Seq. Experimental results on prediction accuracy show that our method is very competitive with popular tools such as Cufflinks and IsoLasso. Our tool, called Traph (Transcrips in gRAPHs), is available at: http://www.cs.helsinki.fi/gsa/traph/.

PubMed Disclaimer

Figures

Figure 1

Figure 1

Example of an offset network. An input G to Problem UTEJC (a), and the offset network G* (b); arcs are labeled with their capacity, unlabeled arcs having infinite capacity

Figure 2

Figure 2

Performance on simulated data. Performance of IsoLasso, Cufflinks, and Traph on simulated data: single genes scenario (a), (b); batch mode scenario (c), (d)

Figure 3

Figure 3

Results on real human data. Histogram of the distribution of transcript lengths of the annotation, and of the ones reported by Traph, Cufflinks and IsoLasso

Figure 4

Figure 4

Results on real human data. Venn diagram of the intersections of the sets of reported transcripts

Similar articles

Cited by

References

    1. Mortazavi A, Williams BAA, McCue K, Schaeffer L, Wold B. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nature Methods. 2008;5:621–628. doi: 10.1038/nmeth.1226. - DOI - PubMed
    1. Pepke S, Wold B, Mortazavi A. Computation for ChIP-seq and RNA-seq studies. Nature methods. 2009;6:(11):s22–s32. - PMC - PubMed
    1. Shah S. et al.The clonal and mutational evolution spectrum of primary triple-negative breast cancers. Nature. 2012;486:(7403):395–399. - PMC - PubMed
    1. Trapnell C. et al.Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nature Biotechnology. 2010;28:511–515. doi: 10.1038/nbt.1621. - DOI - PMC - PubMed
    1. Heber S, Alekseyev M, Sze SH, Tang H, Pevzner PA. Splicing graphs and EST assembly problem. Bioinformatics. 2002;18(suppl 1):S181–S188. doi: 10.1093/bioinformatics/18.suppl_1.S181. - DOI - PubMed

Publication types

MeSH terms

Substances

LinkOut - more resources