Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation
Nucleic Acids Res. 2016 Jan 4;44(D1):D733-45.
doi: 10.1093/nar/gkv1189. Epub 2015 Nov 8.
Mathew W Wright 1, J Rodney Brister 1, Stacy Ciufo 1, Diana Haddad 1, Rich McVeigh 1, Bhanu Rajput 1, Barbara Robbertse 1, Brian Smith-White 1, Danso Ako-Adjei 1, Alexander Astashyn 1, Azat Badretdin 1, Yiming Bao 1, Olga Blinkova 1, Vyacheslav Brover 1, Vyacheslav Chetvernin 1, Jinna Choi 1, Eric Cox 1, Olga Ermolaeva 1, Catherine M Farrell 1, Tamara Goldfarb 1, Tripti Gupta 1, Daniel Haft 1, Eneida Hatcher 1, Wratko Hlavina 1, Vinita S Joardar 1, Vamsi K Kodali 1, Wenjun Li 1, Donna Maglott 1, Patrick Masterson 1, Kelly M McGarvey 1, Michael R Murphy 1, Kathleen O'Neill 1, Shashikant Pujar 1, Sanjida H Rangwala 1, Daniel Rausch 1, Lillian D Riddick 1, Conrad Schoch 1, Andrei Shkeda 1, Susan S Storz 1, Hanzhen Sun 1, Francoise Thibaud-Nissen 1, Igor Tolstoy 1, Raymond E Tully 1, Anjana R Vatsan 1, Craig Wallin 1, David Webb 1, Wendy Wu 1, Melissa J Landrum 1, Avi Kimchi 1, Tatiana Tatusova 1, Michael DiCuccio 1, Paul Kitts 1, Terence D Murphy 1, Kim D Pruitt 2
- PMID: 26553804
- PMCID: PMC4702849
- DOI: 10.1093/nar/gkv1189
Nuala A O'Leary et al. Nucleic Acids Res. 2016.
Abstract
The RefSeq project at the National Center for Biotechnology Information (NCBI) maintains and curates a publicly available database of annotated genomic, transcript, and protein sequence records (http://www.ncbi.nlm.nih.gov/refseq/). The RefSeq project applies a combination of computation, manual curation, and collaboration to the data submitted to the International Nucleotide Sequence Database Collaboration (INSDC) to produce a standard set of stable, non-redundant reference sequences. The RefSeq project augments these reference sequences with current knowledge including publications, functional features and informative nomenclature. The database currently represents sequences from more than 55,000 organisms (>4800 viruses, >40,000 prokaryotes and >10,000 eukaryotes; RefSeq release 71), ranging from a single record to complete genomes. This paper summarizes the current status of the viral, prokaryotic, and eukaryotic branches of the RefSeq project, reports on improvements to data access and details efforts to further expand the taxonomic representation of the collection. We also highlight diverse functional curation initiatives that support multiple uses of RefSeq data including taxonomic validation, genome annotation, comparative genomics, and clinical testing. We summarize our approach to utilizing available RNA-Seq and other data types in our manual curation process for vertebrate, plant, and other species, and describe a new direction for prokaryotic genomes and protein name management.
Published by Oxford University Press on behalf of Nucleic Acids Research 2015. This work is written by (a) US Government employee(s) and is in the public domain in the US.