Genome assembly comparison identifies structural variants in the human genome
Acknowledgements
We thank T. Tang, L. Wong, J. Wittnam, C.-F. Chu and W. Hwang of The Centre for Applied Genomics for technical assistance. Computational analyses were supported by the Shared Hierarchical Academic Research Computing Network (SHARCNET) and the Centre for Computational Biology at the Hospital for Sick Children. The work was supported by Genome Canada/Ontario Genomics Institute, the Canadian Institutes of Health Research (CIHR), the Canada Foundation for Innovation and the McLaughlin Centre for Molecular Medicine (all to S.W.S.). L.A. and X.E. are supported by Genoma España and Genome Canada joint R+D+I projects and by the Generalitat de Catalunya (Departament d'Universitats, 2005SGR00008, and Departament de Salut). L.F. is supported by CIHR. S.W.S. is an Investigator of CIHR and an International Scholar of the Howard Hughes Medical Institute.
Author information
Authors and Affiliations
- The Hospital for Sick Children and Department of Molecular and Medical Genetics, Program in Genetics and Genomic Biology, University of Toronto and The Centre for Applied Genomics, MaRS Centre, Toronto, M5G 1L7, Ontario, Canada
Razi Khaja, Junjun Zhang, Jeffrey R MacDonald, Yongshu He, Ann M Joseph-George, John Wei, Muhammad A Rafiq, Cheng Qian, Mary Shago, Stephen W Scherer & Lars Feuk
- Department of Biosciences, Commission on Science and Technology for Sustainable Development in the South Institute of Information Technology (CIIT), Islamabad, 44000, Pakistan
Muhammad A Rafiq
- Genes and Disease Program, Center for Genomic Regulation, Charles Darwin s/n, Barcelona Biomedical Research Park, Barcelona, 08003, Catalonia, Spain
Lorena Pantano, Lluis Armengol & Xavier Estivill
- Genome Science Laboratory, Research Center for Advanced Science and Technology, University of Tokyo, 4-6-1 Komaba, Meguro, 153-8904, Tokyo, Japan
Hiroyuki Aburatani
- Affymetrix, Inc., Santa Clara, 95051, California, USA
Keith Jones
- The Wellcome Trust Sanger Institute, The Wellcome Trust Genome Campus, Hinxton, CB10 1SA, Cambridge, UK
Richard Redon & Matthew Hurles
- Pompeu Fabra University, Charles Darwin s/n, and National Genotyping Centre, Passeig Marítim 37-49, Barcelona Biomedical Research Park, Barcelona, Catalonia, Spain
Xavier Estivill
- Windber Research Institute, 620 7th Street, Windber, 15963-1331, Pennsylvania, USA
Richard J Mural
- Department of Pathology, Brigham and Women's Hospital and Harvard Medical School, 20 Shattuck St., Boston, 02115, Massachusetts, USA
Charles Lee
Authors
- Razi Khaja
- Junjun Zhang
- Jeffrey R MacDonald
- Yongshu He
- Ann M Joseph-George
- John Wei
- Muhammad A Rafiq
- Cheng Qian
- Mary Shago
- Lorena Pantano
- Hiroyuki Aburatani
- Keith Jones
- Richard Redon
- Matthew Hurles
- Lluis Armengol
- Xavier Estivill
- Richard J Mural
- Charles Lee
- Stephen W Scherer
- Lars Feuk
Contributions
The study was designed by R.K., S.W.S. and L.F. The GCA algorithm was created by R.K. Sequence alignment and computational analysis were performed by R.K., J.Z., J.R.M., J.W., C.Q., L.A. and R.J.M. FISH analysis was performed by Y.H., A.M.J.G., M.S. and C.L. PCR analysis was performed by M.A.R., L.P., L.A. and L.F. J.Z., J.R.M., J.W., C.Q., H.A., K.J., R.R., M.H., L.A., X.E., C.L., S.W.S. and L.F. contributed to the analysis of overlap with genomic features, creation of data sets for such analysis and interpretation of the data. S.W.S. and L.F. conceptualized, designed and coordinated the experiments. The paper was written by S.W.S. and L.F.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Supplementary information
Supplementary Table 1
Results for MegaBLAST and A2Amapper comparing R27c versus Build 35 and comparing Build 35 versus R27c. (PDF 18 kb)
Supplementary Table 2
List of copy-unmatched sequences identified by GCA; table also shows information on repeat content and re-BLAT versus Build 35, Build 36 and chimpanzee Build 1. (XLS 176 kb)
Supplementary Table 3
Intra- and interscaffold inversions identified by GCA between R27c and Build 35. (PDF 10 kb)
Supplementary Table 4
List of refined set of unmatched sequences used for analysis of overlap with genomic features; all entries in this list with an insertion point were used for genomic overlap analysis. (XLS 5127 kb)
Supplementary Table 5
Analysis of RefSeq genes and mRNAs. (XLS 377 kb)
Supplementary Table 6
Results and details for PCR-based assays. (XLS 34 kb)
Supplementary Table 7
Results and details for fluorescence in situ hybridization experiments. (PDF 37 kb)
Supplementary Table 8
Results of comparisons of single-base mismatches detected by GCA with dbSNP_125 and with HapMap QC+/QC− SNPs. (PDF 59 kb)
Supplementary Table 9
Comparison between assembly differences and other genomic features. (XLS 82 kb)
Supplementary Methods (PDF 105 kb)
About this article
Cite this article
Khaja, R., Zhang, J., MacDonald, J. et al. Genome assembly comparison identifies structural variants in the human genome. Nat Genet 38, 1413–1418 (2006). https://doi.org/10.1038/ng1921
- Received: 20 July 2006
- Accepted: 13 October 2006
- Published: 22 November 2006
- Issue Date: 01 December 2006
- DOI: https://doi.org/10.1038/ng1921