NCBI Taxonomy: a comprehensive update on curation, resources and tools - PubMed

Review

Database (Oxford). 2020 Jan 1;2020:baaa062.

doi: 10.1093/database/baaa062.

Conrad L Schoch, Stacy Ciufo, Mikhail Domrachev, Carol L Hotton, Sivakumar Kannan, Rogneda Khovanskaya, Detlef Leipe, Richard Mcveigh, Kathleen O'Neill, Barbara Robbertse, Shobha Sharma, Vladimir Soussov, John P Sullivan, Lu Sun, Seán Turner, Ilene Karsch-Mizrachi

Conrad L Schoch et al. Database (Oxford). 2020.

Abstract

The National Center for Biotechnology Information (NCBI) Taxonomy includes organism names and classifications for every sequence in the nucleotide and protein sequence databases of the International Nucleotide Sequence Database Collaboration. Since the last review of this resource in 2012, it has undergone several improvements. Most notable is the shift from a single SQL database to a series of linked databases tied to a framework of data called NameBank. This means that relations among data elements can be adjusted in more detail, resulting in expanded annotation of synonyms, the ability to flag names with specific nomenclatural properties, enhanced tracking of publications tied to names and improved annotation of scientific authorities and types. Additionally, practices utilized by NCBI Taxonomy curators specific to major taxonomic groups are described, terms peculiar to NCBI Taxonomy are explained, external resources are acknowledged and updates to tools and other resources are documented. Database URL: https://www.ncbi.nlm.nih.gov/taxonomy.

Published by Oxford University Press 2020.

Figures

Figure 1

Summarized flow of NCBI Taxonomy information.

Figure 2

Species names added over time to NCBI Taxonomy. The first occurrence of each species in the NCBI Taxonomy was determined by the created date of its associated TaxNode. This date represents the first addition of the species into the database irrespective of subsequent name changes.
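
The counting described in this caption can be sketched roughly as follows. This is only an illustration, not NCBI's actual pipeline: the `(tax_id, created)` record layout and the function name `species_added_per_year` are assumptions made for the example.

```python
from collections import Counter
from datetime import date
from itertools import accumulate


def species_added_per_year(records):
    """Return {year: cumulative number of species first added by that year}.

    `records` is an iterable of (tax_id, created) pairs, where `created` is a
    datetime.date for the node's creation (hypothetical layout).
    """
    # Keep only the earliest created date per tax_id, so that later name
    # changes to the same node are not counted as new additions.
    first_seen = {}
    for tax_id, created in records:
        if tax_id not in first_seen or created < first_seen[tax_id]:
            first_seen[tax_id] = created

    # Count first additions per calendar year, then build a running total.
    per_year = Counter(d.year for d in first_seen.values())
    years = sorted(per_year)
    totals = accumulate(per_year[y] for y in years)
    return dict(zip(years, totals))
```

Keying on the earliest date per identifier mirrors the caption's point that a species is counted once, at its first addition, regardless of subsequent renaming.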

Figure 3

Estimate of the percentage of formal species names missing from the public NCBI databases. Curves were generated by plotting the number of formal species in the NCBI Taxonomy against the running total of described species in the corresponding group by the end of the year. The IJSEM was used as the source for bacteria. The International Plant Names Index (IPNI; 27) was used as the source for the green plants. The Species 2000 Annual Checklist (46) was used as the source for invertebrates and Fungi. Vertebrate data were collected from the Catalogue of Fishes (21), Amphibian Species of the World (17), the Reptile Database (32), Avibase (19) and the American Society of Mammalogists (18). Archaea and viruses were omitted for having a small number of species and a specialized process for reporting new species, respectively.
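
One plausible reading of this estimate, sketched under stated assumptions: for each year, the cumulative count of formal species in NCBI Taxonomy is compared with the running total of described species, and the shortfall is expressed as a percentage. The function name `percent_missing`, the dictionary inputs and the exact formula are assumptions for illustration; the caption does not spell out the computation.

```python
from itertools import accumulate


def percent_missing(described_per_year, ncbi_total_by_year):
    """Estimate the percentage of described species absent from NCBI per year.

    described_per_year: {year: species newly described in that year}
    ncbi_total_by_year: {year: cumulative formal species in NCBI by year end}
    Both layouts are hypothetical.
    """
    years = sorted(described_per_year)
    # Running total of described species by the end of each year.
    running = accumulate(described_per_year[y] for y in years)

    result = {}
    for year, described_total in zip(years, running):
        if described_total == 0 or year not in ncbi_total_by_year:
            continue  # nothing to compare for this year
        present = min(ncbi_total_by_year[year], described_total)
        result[year] = 100.0 * (described_total - present) / described_total
    return result
```

Clamping `present` to the described total keeps the estimate from going negative when the two sources disagree; whether the original analysis did this is not stated.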

Figure 4

Total number of names labeled as unpublished in NCBI Taxonomy, over time.

Figure 5

NCBI TaxBrowser example page.

References

    1. Karsch-Mizrachi I., Takagi T. and Cochrane G. (2018) The international nucleotide sequence database collaboration. Nucleic Acids Res., 46, D48–D51.
    2. Strasser B.J. (2008) GenBank: natural history in the 21st century? Science, 322, 537–538.
    3. Wilkinson M.D., Dumontier M., Aalbersberg I.J. et al. (2016) The FAIR guiding principles for scientific data management and stewardship. Sci. Data, 3, 160018.
    4. Schuler G.D., Epstein J.A., Ohkawa H. et al. (1996) Entrez: molecular biology database and retrieval system. Methods Enzymol., 266, 141–162.
    5. Federhen S. (2012) The NCBI taxonomy database. Nucleic Acids Res., 40, D136–D143.