Toward simplifying and accurately formulating fragment assembly
J Comput Biol. 1995 Summer;2(2):275-90.
doi: 10.1089/cmb.1995.2.275.
- PMID: 7497129
E W Myers. J Comput Biol. 1995 Summer.
Abstract
The fragment assembly problem is that of reconstructing a DNA sequence from a collection of randomly sampled fragments. Traditionally, the objective of this problem has been to produce the shortest string that contains all the fragments as substrings, but in the case of repetitive target sequences this objective produces answers that are overcompressed. In this paper, the problem is reformulated as one of finding a maximum-likelihood reconstruction with respect to the two-sided Kolmogorov-Smirnov statistic, and it is argued that this is a better formulation of the problem. Next, the fragment assembly problem is recast in graph-theoretic terms as one of finding a noncyclic subgraph with certain properties, and the objectives of being shortest or maximally likely are also recast in this framework. Finally, a series of graph reduction transformations is given that dramatically reduces the size of the graph to be explored in practical instances of the problem. This reduction is very important because the underlying problems are NP-hard. In practice, the transformed problems are so small that simple branch-and-bound algorithms successfully solve them, thus permitting auxiliary experimental information to be taken into account in the form of overlap, orientation, and distance constraints.
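To make the overcompression argument concrete, here is a minimal Python sketch of a two-sided Kolmogorov-Smirnov statistic computed on fragment start positions. It is an illustration under stated assumptions, not the paper's formulation: the abstract does not give the likelihood model, so the sketch assumes fragment starts should look uniform on `[0, span]`, and the function name `ks_two_sided` and both example layouts are hypothetical.

```python
def ks_two_sided(starts, span):
    """Two-sided Kolmogorov-Smirnov statistic of fragment start positions
    against the uniform distribution on [0, span].

    Assumption (not from the abstract): randomly sampled fragments have
    start positions that are uniform over the admissible range.
    """
    if not starts or span <= 0:
        raise ValueError("need at least one start position and a positive span")
    xs = sorted(starts)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = x / span  # uniform CDF evaluated at x
        # Compare against the empirical CDF just after and just before x.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


# Hypothetical layouts of the same eight fragments.
# "spread": fragments placed evenly along a longer reconstruction.
spread_starts = [0, 10, 20, 30, 40, 50, 60, 70]
# "piled": a shorter reconstruction in which repeat copies were merged,
# so several fragments stack up at the same position.
piled_starts = [0, 10, 20, 20, 20, 20, 30, 40]

d_spread = ks_two_sided(spread_starts, span=80)
d_piled = ks_two_sided(piled_starts, span=50)
print(d_spread, d_piled)
```

Under these assumptions the piled layout, although shorter, deviates more from uniform sampling than the spread layout, which is the kind of evidence a shortest-superstring objective ignores.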
Similar articles
- Short superstrings and the structure of overlapping strings.
Armen C, Stein C. J Comput Biol. 1995 Summer;2(2):307-32. doi: 10.1089/cmb.1995.2.307. PMID: 7497131
- A new algorithm for DNA sequence assembly.
Idury RM, Waterman MS. J Comput Biol. 1995 Summer;2(2):291-306. doi: 10.1089/cmb.1995.2.291. PMID: 7497130
- Reconstructing strings from substrings.
Skiena SS, Sundaram G. J Comput Biol. 1995 Summer;2(2):333-53. doi: 10.1089/cmb.1995.2.333. PMID: 7497132
- Prediction of function in DNA sequence analysis.
Gelfand MS. J Comput Biol. 1995 Spring;2(1):87-115. doi: 10.1089/cmb.1995.2.87. PMID: 7497122. Review.
Cited by
- The transcript catalogue of the short-lived fish Nothobranchius furzeri provides insights into age-dependent changes of mRNA levels.
Petzold A, Reichwald K, Groth M, Taudien S, Hartmann N, Priebe S, Shagin D, Englert C, Platzer M. BMC Genomics. 2013 Mar 16;14:185. doi: 10.1186/1471-2164-14-185. PMID: 23496936
- Paired de Bruijn graphs: a novel approach for incorporating mate pair information into genome assemblers.
Medvedev P, Pham S, Chaisson M, Tesler G, Pevzner P. J Comput Biol. 2011 Nov;18(11):1625-34. doi: 10.1089/cmb.2011.0151. Epub 2011 Oct 14. PMID: 21999285
- Improved assembly of noisy long reads by k-mer validation.
Carvalho AB, Dupim EG, Goldstein G. Genome Res. 2016 Dec;26(12):1710-1720. doi: 10.1101/gr.209247.116. Epub 2016 Oct 7. PMID: 27831497
- New advances in sequence assembly.
Phillippy AM. Genome Res. 2017 May;27(5):xi-xiii. doi: 10.1101/gr.223057.117. PMID: 28461322
- Contiguous and accurate de novo assembly of metazoan genomes with modest long read coverage.
Chakraborty M, Baldwin-Brown JG, Long AD, Emerson JJ. Nucleic Acids Res. 2016 Nov 2;44(19):e147. doi: 10.1093/nar/gkw654. Epub 2016 Jul 25. PMID: 27458204