Genome assembly comparison identifies structural variants in the human genome

References

  1. Marth, G.T. et al. A general approach to single-nucleotide polymorphism discovery. Nat. Genet. 23, 452–456 (1999).
  2. Tsui, C. et al. Single nucleotide polymorphisms (SNPs) that map to gaps in the human SNP map. Nucleic Acids Res. 31, 4910–4916 (2003).
  3. Tuzun, E. et al. Fine-scale structural variation of the human genome. Nat. Genet. 37, 727–732 (2005).
  4. Lander, E.S. et al. Initial sequencing and analysis of the human genome. Nature 409, 860–921 (2001).
  5. Venter, J.C. et al. The sequence of the human genome. Science 291, 1304–1351 (2001).
  6. Myers, E.W., Sutton, G.G., Smith, H.O., Adams, M.D. & Venter, J.C. On the sequencing and assembly of the human genome. Proc. Natl. Acad. Sci. USA 99, 4145–4146 (2002).
  7. Adams, M.D., Sutton, G.G., Smith, H.O., Myers, E.W. & Venter, J.C. The independence of our genome assemblies. Proc. Natl. Acad. Sci. USA 100, 3025–3026 (2003).
  8. Istrail, S. et al. Whole-genome shotgun assembly and comparison of human genome assemblies. Proc. Natl. Acad. Sci. USA 101, 1916–1921 (2004).
  9. Waterston, R.H., Lander, E.S. & Sulston, J.E. On the sequencing of the human genome. Proc. Natl. Acad. Sci. USA 99, 3712–3716 (2002).
  10. Waterston, R.H., Lander, E.S. & Sulston, J.E. More on the sequencing of the human genome. Proc. Natl. Acad. Sci. USA 100, 3022–3024 (2003).
  11. Feuk, L., Carson, A.R. & Scherer, S.W. Structural variation in the human genome. Nat. Rev. Genet. 7, 85–97 (2006).
  12. Zhang, Z., Schwartz, S., Wagner, L. & Miller, W. A greedy algorithm for aligning DNA sequences. J. Comput. Biol. 7, 203–214 (2000).
  13. Mobarry, C. & Sutton, G. An assembly-to-assembly comparison tool. in Proceedings of the Third Annual RECOMB Satellite Meeting on DNA Sequencing Technologies and Computation (2003).
  14. Kent, W.J. BLAT–the BLAST-like alignment tool. Genome Res. 12, 656–664 (2002).
  15. Pruitt, K.D., Tatusova, T. & Maglott, D.R. NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. 33, D501–D504 (2005).
  16. Bailey, J.A. et al. Recent segmental duplications in the human genome. Science 297, 1003–1007 (2002).
  17. Iafrate, A.J. et al. Detection of large-scale variation in the human genome. Nat. Genet. 36, 949–951 (2004).
  18. Redon, R. et al. Global variation in copy number in the human genome. Nature (in the press).
  19. Wang, J. et al. dbRIP: a highly integrated database of retrotransposon insertion polymorphisms in humans. Hum. Mutat. 27, 323–329 (2006).
  20. Hillier, L.W. et al. The DNA sequence of human chromosome 7. Nature 424, 157–164 (2003).
  21. Scherer, S.W. et al. Human chromosome 7: DNA sequence and biology. Science 300, 767–772 (2003).
  22. Schmutz, J. et al. The DNA sequence and comparative analysis of human chromosome 5. Nature 431, 268–274 (2004).
  23. Shendure, J., Mitra, R.D., Varma, C. & Church, G.M. Advanced sequencing technologies: methods and goals. Nat. Rev. Genet. 5, 335–344 (2004).
  24. Bennett, S.T., Barnes, C., Cox, A., Davies, L. & Brown, C. Toward the 1,000 dollars human genome. Pharmacogenomics 6, 373–382 (2005).
  25. Service, R.F. Gene sequencing. The race for the $1000 genome. Science 311, 1544–1546 (2006).
  26. Cheung, J. et al. Genome-wide detection of segmental duplications and potential assembly errors in the human genome sequence. Genome Biol. 4, R25 (2003).
  27. Kent, W.J. et al. The human genome browser at UCSC. Genome Res. 12, 996–1006 (2002).
  28. Feuk, L. et al. Discovery of human inversion polymorphisms by comparative analysis of human and chimpanzee DNA sequence assemblies. PLoS Genet. 1, e56 (2005).
  29. Pfaffl, M.W. A new mathematical model for relative quantification in real-time RT-PCR. Nucleic Acids Res. 29, e45 (2001).
  30. Osborne, L.R. et al. A 1.5 million-base pair inversion polymorphism in families with Williams-Beuren syndrome. Nat. Genet. 29, 321–325 (2001).


Acknowledgements

We thank T. Tang, L. Wong, J. Wittnam, C.-F. Chu and W. Hwang of The Centre for Applied Genomics for technical assistance. Computational analyses were supported by the Shared Hierarchical Academic Research Computing Network (SHARCNET) and the Centre for Computational Biology at the Hospital for Sick Children. The work was supported by Genome Canada/Ontario Genomics Institute, the Canadian Institutes of Health Research (CIHR), the Canada Foundation for Innovation and the McLaughlin Centre for Molecular Medicine (all to S.W.S.). L.A. and X.E. are supported by Genoma España and Genome Canada joint R+D+I projects and by the Generalitat de Catalunya (Departament d'Universitats, 2005SGR00008, and Departament de Salut). L.F. is supported by CIHR. S.W.S. is an Investigator of CIHR and an International Scholar of the Howard Hughes Medical Institute.

Author information

Authors and Affiliations

  1. The Hospital for Sick Children and Department of Molecular and Medical Genetics, Program in Genetics and Genomic Biology, University of Toronto and The Centre for Applied Genomics, MaRS Centre, Toronto, M5G 1L7, Ontario, Canada
    Razi Khaja, Junjun Zhang, Jeffrey R MacDonald, Yongshu He, Ann M Joseph-George, John Wei, Muhammad A Rafiq, Cheng Qian, Mary Shago, Stephen W Scherer & Lars Feuk
  2. Department of Biosciences, Commission on Science and Technology for Sustainable Development in the South Institute of Information Technology (CIIT), Islamabad, 44000, Pakistan
    Muhammad A Rafiq
  3. Genes and Disease Program, Center for Genomic Regulation, Charles Darwin s/n, Barcelona Biomedical Research Park, Barcelona, 08003, Catalonia, Spain
    Lorena Pantano, Lluis Armengol & Xavier Estivill
  4. Genome Science Laboratory, Research Center for Advanced Science and Technology, University of Tokyo, 4-6-1 Komaba, Meguro, 153-8904, Tokyo, Japan
    Hiroyuki Aburatani
  5. Affymetrix, Inc., Santa Clara, 95051, California, USA
    Keith Jones
  6. The Wellcome Trust Sanger Institute, The Wellcome Trust Genome Campus, Hinxton, CB10 1SA, Cambridge, UK
    Richard Redon & Matthew Hurles
  7. Pompeu Fabra University, Charles Darwin s/n, and National Genotyping Centre, Passeig Marítim 37-49, Barcelona Biomedical Research Park, Barcelona, Catalonia, Spain
    Xavier Estivill
  8. Windber Research Institute, 620 7th Street, Windber, 15963-1331, Pennsylvania, USA
    Richard J Mural
  9. Department of Pathology, Brigham and Women's Hospital and Harvard Medical School, 20 Shattuck St., Boston, 02115, Massachusetts, USA
    Charles Lee

Authors

  1. Razi Khaja
  2. Junjun Zhang
  3. Jeffrey R MacDonald
  4. Yongshu He
  5. Ann M Joseph-George
  6. John Wei
  7. Muhammad A Rafiq
  8. Cheng Qian
  9. Mary Shago
  10. Lorena Pantano
  11. Hiroyuki Aburatani
  12. Keith Jones
  13. Richard Redon
  14. Matthew Hurles
  15. Lluis Armengol
  16. Xavier Estivill
  17. Richard J Mural
  18. Charles Lee
  19. Stephen W Scherer
  20. Lars Feuk

Contributions

The study was designed by R.K., S.W.S. and L.F. The GCA algorithm was created by R.K. Sequence alignment and computational analysis were performed by R.K., J.Z., J.R.M., J.W., C.Q., L.A. and R.J.M. FISH analysis was performed by Y.H., A.M.J.G., M.S. and C.L. PCR analysis was performed by M.A.R., L.P., L.A. and L.F. J.Z., J.R.M., J.W., C.Q., H.A., K.J., R.R., M.H., L.A., X.E., C.L., S.W.S. and L.F. contributed to the analysis of overlap with genomic features, creation of data sets for such analysis and interpretation of the data. S.W.S. and L.F. conceptualized, designed and coordinated the experiments. The paper was written by S.W.S. and L.F.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Table 1

Results for MegaBLAST and A2Amapper comparing R27c versus Build 35 and comparing Build 35 versus R27c. (PDF 18 kb)

Supplementary Table 2

List of copy-unmatched sequences identified by GCA; table also shows information on repeat content and re-BLAT versus Build 35, Build 36 and chimpanzee Build 1. (XLS 176 kb)

Supplementary Table 3

Intra- and interscaffold inversions identified by GCA between R27c and Build 35. (PDF 10 kb)

Supplementary Table 4

List of refined set of unmatched sequences used for analysis of overlap with genomic features; all entries in this list with an insertion point were used for genomic overlap analysis. (XLS 5127 kb)

Supplementary Table 5

Analysis of RefSeq genes and mRNAs. (XLS 377 kb)

Supplementary Table 6

Results and details for PCR-based assays. (XLS 34 kb)

Supplementary Table 7

Results and details for fluorescence in situ hybridization experiments. (PDF 37 kb)

Supplementary Table 8

Results of comparisons of single-base mismatches detected by GCA with dbSNP_125 and with HapMap QC+/QC− SNPs. (PDF 59 kb)

Supplementary Table 9

Comparison between assembly differences and other genomic features. (XLS 82 kb)

Supplementary Methods (PDF 105 kb)

Cite this article

Khaja, R., Zhang, J., MacDonald, J. et al. Genome assembly comparison identifies structural variants in the human genome. Nat Genet 38, 1413–1418 (2006). https://doi.org/10.1038/ng1921
