Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation

Nucleic Acids Res. 2016 Jan 4;44(D1):D733-45.

doi: 10.1093/nar/gkv1189. Epub 2015 Nov 8.

Nuala A O'Leary 1, Mathew W Wright 1, J Rodney Brister 1, Stacy Ciufo 1, Diana Haddad 1, Rich McVeigh 1, Bhanu Rajput 1, Barbara Robbertse 1, Brian Smith-White 1, Danso Ako-Adjei 1, Alexander Astashyn 1, Azat Badretdin 1, Yiming Bao 1, Olga Blinkova 1, Vyacheslav Brover 1, Vyacheslav Chetvernin 1, Jinna Choi 1, Eric Cox 1, Olga Ermolaeva 1, Catherine M Farrell 1, Tamara Goldfarb 1, Tripti Gupta 1, Daniel Haft 1, Eneida Hatcher 1, Wratko Hlavina 1, Vinita S Joardar 1, Vamsi K Kodali 1, Wenjun Li 1, Donna Maglott 1, Patrick Masterson 1, Kelly M McGarvey 1, Michael R Murphy 1, Kathleen O'Neill 1, Shashikant Pujar 1, Sanjida H Rangwala 1, Daniel Rausch 1, Lillian D Riddick 1, Conrad Schoch 1, Andrei Shkeda 1, Susan S Storz 1, Hanzhen Sun 1, Francoise Thibaud-Nissen 1, Igor Tolstoy 1, Raymond E Tully 1, Anjana R Vatsan 1, Craig Wallin 1, David Webb 1, Wendy Wu 1, Melissa J Landrum 1, Avi Kimchi 1, Tatiana Tatusova 1, Michael DiCuccio 1, Paul Kitts 1, Terence D Murphy 1, Kim D Pruitt 2

Nuala A O'Leary et al. Nucleic Acids Res. 2016.

Abstract

The RefSeq project at the National Center for Biotechnology Information (NCBI) maintains and curates a publicly available database of annotated genomic, transcript, and protein sequence records (http://www.ncbi.nlm.nih.gov/refseq/). The project combines data submitted to the International Nucleotide Sequence Database Collaboration (INSDC) with computation, manual curation, and collaboration to produce a standard set of stable, non-redundant reference sequences. It augments these reference sequences with current knowledge, including publications, functional features, and informative nomenclature. The database currently represents sequences from more than 55,000 organisms (>4800 viruses, >40,000 prokaryotes, and >10,000 eukaryotes; RefSeq release 71), with coverage per organism ranging from a single record to a complete genome. This paper summarizes the current status of the viral, prokaryotic, and eukaryotic branches of the RefSeq project, reports on improvements to data access, and details efforts to further expand the taxonomic representation of the collection. We also highlight diverse functional curation initiatives that support multiple uses of RefSeq data, including taxonomic validation, genome annotation, comparative genomics, and clinical testing. We summarize our approach to using available RNA-Seq and other data types in our manual curation process for vertebrate, plant, and other species, and describe a new direction for prokaryotic genomes and protein name management.
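The abstract distinguishes genomic, transcript, and protein records. RefSeq accessions encode this distinction in a two-letter prefix followed by an underscore (for example `NM_` for mRNA, `NP_` for protein, `NC_` for a complete genomic molecule), with `X`-prefixed accessions marking computationally predicted models. The sketch below, a hypothetical helper named `classify_accession`, parses an accession and reports its record type. The prefix table is a partial selection written for illustration, not the full official list; consult NCBI's RefSeq documentation for the authoritative set.

```python
import re

# Partial, illustrative map of RefSeq accession prefixes -> (molecule type, description).
# Hypothetical table for this sketch; NCBI documents the complete list.
REFSEQ_PREFIXES = {
    "NC": ("genomic", "complete genomic molecule"),
    "NG": ("genomic", "genomic region"),
    "NT": ("genomic", "contig"),
    "NW": ("genomic", "WGS supercontig"),
    "NZ": ("genomic", "WGS, unfinished"),
    "NM": ("transcript", "mRNA"),
    "NR": ("transcript", "non-coding RNA"),
    "XM": ("transcript", "predicted mRNA"),
    "XR": ("transcript", "predicted non-coding RNA"),
    "NP": ("protein", "protein"),
    "XP": ("protein", "predicted protein"),
    "YP": ("protein", "protein"),
    "WP": ("protein", "non-redundant protein"),
}

# Two capital letters, an underscore, an alphanumeric body, and an optional ".version".
ACCESSION_RE = re.compile(r"^([A-Z]{2})_([A-Z0-9]+)(?:\.(\d+))?$")


def classify_accession(accession: str) -> dict:
    """Return the record type encoded in a RefSeq-style accession (hypothetical helper)."""
    match = ACCESSION_RE.match(accession.strip())
    if match is None:
        raise ValueError(f"not a RefSeq-style accession: {accession!r}")
    prefix, _body, version = match.groups()
    if prefix not in REFSEQ_PREFIXES:
        raise ValueError(f"prefix {prefix!r} is not in this sketch's table")
    molecule, description = REFSEQ_PREFIXES[prefix]
    return {
        "prefix": prefix,
        "molecule": molecule,
        "description": description,
        # X-prefixed accessions denote computational model predictions.
        "predicted": prefix.startswith("X"),
        "version": int(version) if version else None,
    }


if __name__ == "__main__":
    for acc in ("NM_000546.6", "XP_011520123", "NZ_CP012345.1"):
        print(acc, classify_accession(acc))
```

A lookup table keeps the mapping explicit and easy to extend as prefixes are added; the version suffix is parsed separately because RefSeq increments it when the underlying sequence changes while the accession stays stable.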

Published by Oxford University Press on behalf of Nucleic Acids Research 2015. This work is written by (a) US Government employee(s) and is in the public domain in the US.
