The greedy path-merging algorithm for contig scaffolding
Published: 01 September 2002
Abstract
Given a collection of contigs and mate-pairs, the Contig Scaffolding Problem is to order and orient the contigs in a manner that is consistent with as many mate-pairs as possible. This paper describes an efficient heuristic, the greedy path-merging algorithm, for solving this problem. The method was originally developed as a key component of the compartmentalized assembly strategy at Celera Genomics. This interim approach was used at an early stage of the sequencing of the human genome to produce a preliminary assembly based on preliminary versions of both the whole-genome shotgun data produced at Celera and the human contigs produced by the Human Genome Project.
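To make the objective concrete, the following minimal sketch counts how many mate-pairs a candidate layout satisfies. It is not the greedy path-merging algorithm itself, only an illustration of the quantity a scaffolder tries to maximize. The names `Placement`, `MatePair`, `to_scaffold`, `is_satisfied`, and `count_satisfied` are hypothetical, and the consistency test (mates point toward each other and their distance lies within `k` standard deviations of the expected insert length) is an assumption, since the abstract does not define consistency.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    start: int     # scaffold coordinate of the contig's leftmost base
    length: int    # contig length in bases
    forward: bool  # True if the contig keeps its original orientation


@dataclass(frozen=True)
class MatePair:
    contig_a: str
    pos_a: int   # position of read A in contig A's own coordinates
    fwd_a: bool  # True if read A points along contig A's forward strand
    contig_b: str
    pos_b: int
    fwd_b: bool
    mean: int    # expected distance between the two reads
    std: int     # standard deviation of that distance


def to_scaffold(p, pos, fwd):
    """Map a contig-local position and direction into scaffold coordinates."""
    if p.forward:
        return p.start + pos, fwd
    # A reversed contig mirrors positions and flips read directions.
    return p.start + p.length - pos, not fwd


def is_satisfied(mp, layout, k=3):
    """Assumed test: mates face each other at roughly the expected distance."""
    xa, da = to_scaffold(layout[mp.contig_a], mp.pos_a, mp.fwd_a)
    xb, db = to_scaffold(layout[mp.contig_b], mp.pos_b, mp.fwd_b)
    if da == db:
        return False  # both reads point the same way
    # The read that points rightward must lie to the left of its mate.
    left, right = (xa, xb) if da else (xb, xa)
    dist = right - left
    return dist >= 0 and abs(dist - mp.mean) <= k * mp.std


def count_satisfied(mates, layout):
    """Number of mate-pairs consistent with the given ordering and orientation."""
    return sum(is_satisfied(mp, layout) for mp in mates)


mates = [MatePair("c1", 800, True, "c2", 700, False, mean=1000, std=100)]
layout_fwd = {"c1": Placement(0, 1000, True), "c2": Placement(1100, 1000, True)}
layout_rev = {"c1": Placement(0, 1000, True), "c2": Placement(1100, 1000, False)}
print(count_satisfied(mates, layout_fwd), count_satisfied(mates, layout_rev))
```

In this toy input, flipping the orientation of `c2` turns a satisfied mate-pair into an unsatisfied one, which is the kind of trade-off a scaffolding heuristic has to weigh across all mate-pairs at once.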
Published In
Journal of the ACM Volume 49, Issue 5
September 2002
137 pages
Copyright © 2002 ACM.
Publisher
Association for Computing Machinery
New York, NY, United States
Affiliations
Daniel H. Huson
Tübingen University, Tübingen, Germany
Knut Reinert
Free University Berlin, Berlin, Germany
Eugene W. Myers
University of California Berkeley, Berkeley, CA