The greedy path-merging algorithm for contig scaffolding (original) (raw)

Authors Info & Claims

Published: 01 September 2002 Publication History

Abstract

Given a collection of contigs and mate-pairs. The Contig Scaffolding Problem is to order and orientate the given contigs in a manner that is consistent with as many mate-pairs as possible. This paper describes an efficient heuristic called the greedy-path merging algorithm for solving this problem. The method was originally developed as a key component of the compartmentalized assembly strategy developed at Celera Genomics. This interim approach was used at an early stage of the sequencing of the human genome to produce a preliminary assembly based on preliminary whole genome shotgun data produced at Celera and preliminary human contigs produced by the Human Genome Project.

References

[1]

Benson, D. A., Karsch-Mizrachi, I., Lipman, D. J., Ostell, J., Rapp, B. A., and Wheeler, D. L. 2000. Genbank. Nuc. Acids Res. 28, 1, 15--8.

[2]

Bevington, P. R. 1969. Data Reduction and Error Analysis for the Physical Sciences. McGraw-Hill, Inc., New York.

[3]

Garey, M. R., and Johnson, D. S. 1979. Computers and Intractability, A Guide to the Theory of NP-completeness. Bell Telephone Laboratories, Inc.

[4]

Green, P. 1994. Documentation for Phrap. http://bozeman.mbt.washington.edu/phrap.docs/phrap.html.

[5]

Huson, D. H., Reinert, K., Kravitz, S. A., Remington, K. A., Delcher, A. L., Dew, I. M., Flanigan, M., Halpern, A. L., Lai, Z., Mobarry, C. M., Sutton, G. G., and Myers, E. W. 2001. Design of a compartmentalized shotgun assembler for the human genome. Bioinformatics (Proceedings of ISMB 2001) 17, 132--139.

[6]

Huson, D. H., Reinert, K., and Myers, E. W. 2001b. The greedy path-merging algorithm for sequence assembly. In Proceedings of the 5th Annual International Conference on Computational Molecular Biology (RECOMB-01), pp. 157--163.

[7]

International Human Genome Sequencing Consortium. 2001. Initial sequencing and analysis of the human genome. Nature 409, 6822, 860--921.

[8]

Lander, E. S., and Waterman, M. S. 1988. Genomic mapping by fingerprinting random clones: A mathematical analysis. Genomics 2, 231--239.

[9]

Myers, E. W., Sutton, G. G., Delcher, A. L., Dew, I. M., Fasulo, D. P., Flanigan, M. J., Kravitz, S. A., Mobarry, C. M., Reinert, K. H. J., Remington, K. A., Anson, E. L., Bolanos, R. A., Chou, H.-H., Jordan, C. M., Halpern, A. L., Lonardi, S., Beasley, E. M., Brandon, R. C., Chen, L., Dunn, P. J., Lai, Z., Liang, Y., Nusskern, D. R., Zhan, M., Zhang, Q., Zheng, X., Rubin, G. M., Adams, M. D., and Venter, J. C. 2000. A whole-genome assembly of Drosophila. Science 287, 2196--2204.

[10]

Sanger, F., Coulson, A. R., Hong, G. F., Hill, D. F., and Petersen, G. B. 1992. Nucleotide sequence of bacteriophage λ DNA. J. Mol. Bio. 162, 4, 729--773.

[11]

Sanger, F., Nicklen, S., and Coulson, A. R. 1977. DNA sequencing with chain-terminating inhibitors. Proc. Nat. Acad. Sci. 74, 12, 5463--5467.

[12]

U. S. Department of Energy, Office of Energy Research, and Office of Biological and Environmental Research. 1997. Human genome program report. http://www. ornl.gov/hgmis/publicat/97pr/.

[13]

Venter, J. C., Adams, M. D., Myers, E. W., et al. 2001. The sequence of the human genome. Science 291, 1145--1434.

[14]

Webber, J. L., and Myers, E. W. 1997. Human whole-genome shotgun sequencing. Gen. Res. 7, 5, 401--409.

Information & Contributors

Information

Published In

cover image Journal of the ACM

Journal of the ACM Volume 49, Issue 5

September 2002

137 pages

Copyright © 2002 ACM.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 September 2002

Published in JACM Volume 49, Issue 5

Permissions

Request permissions for this article.

Check for updates

Author Tag

  1. Genome assembly

Qualifiers

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

Reflects downloads up to 09 Jan 2025

Other Metrics

Citations

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Sign in

Full Access

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Affiliations

Daniel H. Huson

Tübingen University, Tübingen, Germany

Knut Reinert

Free University Berlin, Berlin, Germany

Eugene W. Myers

University of California Berkeley, Berkeley, CA