NCBI Taxonomy: a comprehensive update on curation, resources and tools
Review
Database (Oxford). 2020 Jan 1;2020:baaa062.
doi: 10.1093/database/baaa062.
Conrad L Schoch, Stacy Ciufo, Mikhail Domrachev, Carol L Hotton, Sivakumar Kannan, Rogneda Khovanskaya, Detlef Leipe, Richard Mcveigh, Kathleen O'Neill, Barbara Robbertse, Shobha Sharma, Vladimir Soussov, John P Sullivan, Lu Sun, Seán Turner, Ilene Karsch-Mizrachi
- PMID: 32761142
- PMCID: PMC7408187
Conrad L Schoch et al. Database (Oxford). 2020.
Abstract
The National Center for Biotechnology Information (NCBI) Taxonomy includes organism names and classifications for every sequence in the nucleotide and protein sequence databases of the International Nucleotide Sequence Database Collaboration. Since the last review of this resource in 2012, it has undergone several improvements. Most notable is the shift from a single SQL database to a series of linked databases tied to a framework of data called NameBank. This means that relations among data elements can be adjusted in more detail, resulting in expanded annotation of synonyms, the ability to flag names with specific nomenclatural properties, enhanced tracking of publications tied to names and improved annotation of scientific authorities and types. Additionally, practices utilized by NCBI Taxonomy curators specific to major taxonomic groups are described, terms peculiar to NCBI Taxonomy are explained, external resources are acknowledged and updates to tools and other resources are documented. Database URL: https://www.ncbi.nlm.nih.gov/taxonomy.
Published by Oxford University Press 2020.
Figures
Figure 1
Summarized flow of NCBI Taxonomy information.
Figure 2
Species names added over time to NCBI Taxonomy. The first occurrence of each species in the NCBI Taxonomy was determined by the created date of its associated TaxNode. This date represents the first addition of the species into the database irrespective of subsequent name changes.
Figure 3
Estimate of the percentage of formal species names missing from the public NCBI databases. Curves were generated by plotting the number of formal species in the NCBI Taxonomy against the running total of described species in the corresponding group by the end of each year. The International Journal of Systematic and Evolutionary Microbiology (IJSEM) was used as the source for bacteria. The International Plant Names Index (IPNI; 27) was used as the source for green plants. The Species 2000 Annual Checklist (46) was used as the source for invertebrates and Fungi. Vertebrate data were collected from the Catalogue of Fishes (21), Amphibian Species of the World (17), the Reptile Database (32), Avibase (19) and the American Society of Mammalogists (18). Archaea and viruses were omitted because they have, respectively, a small number of species and a specialized process for reporting new species.
Figure 4
Total number of names labeled as unpublished in NCBI Taxonomy, over time.
Figure 5
NCBI TaxBrowser example page.
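The counting rule in the Figure 2 caption (each species counted once, dated by the creation of its TaxNode rather than by any name it has carried) can be sketched in Python. This is a minimal illustration under stated assumptions: the record layout `(taxnode_id, rank, created, name)`, the function name `species_added_per_year` and the sample names are hypothetical and do not describe NCBI's internal schema.

```python
from collections import Counter
from datetime import date
from itertools import accumulate


def species_added_per_year(records):
    """Return sorted (year, cumulative species count) pairs.

    Hypothetical record layout: (taxnode_id, rank, created, name).
    A renamed species keeps its TaxNode, so it may appear under several
    names; keying on the node id counts it only once.
    """
    created = {}
    for node_id, rank, created_on, _name in records:
        if rank != "species":
            continue  # only species-rank nodes contribute to the curve
        # Keep the earliest created date seen for this node.
        if node_id not in created or created_on < created[node_id]:
            created[node_id] = created_on
    # Number of species first added in each year, then a running total.
    per_year = Counter(d.year for d in created.values())
    years = sorted(per_year)
    totals = accumulate(per_year[y] for y in years)
    return list(zip(years, totals))
```

For example, a node that is later renamed ("Aus bus" becoming "Xus bus") still contributes a single species in the year its node was created, so two species nodes created in 1995 and one in 2003 yield `[(1995, 2), (2003, 3)]`.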
Similar articles
- Type material in the NCBI Taxonomy Database. Federhen S. Nucleic Acids Res. 2015 Jan;43(Database issue):D1086-98. doi: 10.1093/nar/gku1127. Epub 2014 Nov 14. PMID: 25398905.
- The NCBI Taxonomy database. Federhen S. Nucleic Acids Res. 2012 Jan;40(Database issue):D136-43. doi: 10.1093/nar/gkr1178. Epub 2011 Dec 1. PMID: 22139910.
- Database resources of the National Center for Biotechnology Information. Sayers EW, Beck J, Bolton EE, Brister JR, Chan J, Comeau DC, Connor R, DiCuccio M, Farrell CM, Feldgarden M, Fine AM, Funk K, Hatcher E, Hoeppner M, Kane M, Kannan S, Katz KS, Kelly C, Klimke W, Kim S, Kimchi A, Landrum M, Lathrop S, Lu Z, Malheiro A, Marchler-Bauer A, Murphy TD, Phan L, Prasad AB, Pujar S, Sawyer A, Schmieder E, Schneider VA, Schoch CL, Sharma S, Thibaud-Nissen F, Trawick BW, Venkatapathi T, Wang J, Pruitt KD, Sherry ST. Nucleic Acids Res. 2024 Jan 5;52(D1):D33-D43. doi: 10.1093/nar/gkad1044. PMID: 37994677.
- Database resources of the National Center for Biotechnology Information. Sayers EW, Beck J, Bolton EE, Bourexis D, Brister JR, Canese K, Comeau DC, Funk K, Kim S, Klimke W, Marchler-Bauer A, Landrum M, Lathrop S, Lu Z, Madden TL, O'Leary N, Phan L, Rangwala SH, Schneider VA, Skripchenko Y, Wang J, Ye J, Trawick BW, Pruitt KD, Sherry ST. Nucleic Acids Res. 2021 Jan 8;49(D1):D10-D17. doi: 10.1093/nar/gkaa892. PMID: 33095870. Review.
- Education resources of the National Center for Biotechnology Information. Cooper PS, Lipshultz D, Matten WT, McGinnis SD, Pechous S, Romiti ML, Tao T, Valjavec-Gratian M, Sayers EW. Brief Bioinform. 2010 Nov;11(6):563-9. doi: 10.1093/bib/bbq022. Epub 2010 Jun 22. PMID: 20570844. Review.
Cited by
- Taxometer: Improving taxonomic classification of metagenomics contigs. Kutuzova S, Nielsen M, Piera P, Nissen JN, Rasmussen S. Nat Commun. 2024 Sep 27;15(1):8357. doi: 10.1038/s41467-024-52771-y. PMID: 39333501.
- Commensal consortia decolonize Enterobacteriaceae via ecological control. Furuichi M, Kawaguchi T, Pust MM, Yasuma-Mitobe K, Plichta DR, Hasegawa N, Ohya T, Bhattarai SK, Sasajima S, Aoto Y, Tuganbaev T, Yaginuma M, Ueda M, Okahashi N, Amafuji K, Kiridoshi Y, Sugita K, Stražar M, Avila-Pacheco J, Pierce K, Clish CB, Skelly AN, Hattori M, Nakamoto N, Caballero S, Norman JM, Olle B, Tanoue T, Suda W, Arita M, Bucci V, Atarashi K, Xavier RJ, Honda K. Nature. 2024 Sep;633(8031):878-886. doi: 10.1038/s41586-024-07960-6. Epub 2024 Sep 18. PMID: 39294375.
- Multimodal Hox5 activity generates motor neuron diversity. Kc R, López de Boer R, Lin M, Vagnozzi AN, Jeannotte L, Philippidou P. Commun Biol. 2024 Sep 17;7(1):1166. doi: 10.1038/s42003-024-06835-w. PMID: 39289460.
- simona: a comprehensive R package for semantic similarity analysis on bio-ontologies. Gu Z. BMC Genomics. 2024 Sep 16;25(1):869. doi: 10.1186/s12864-024-10759-4. PMID: 39285315.
- The Resurgence of Mpox: A New Global Health Crisis. Acosta-España JD, Bonilla-Aldana DK, Luna C, Rodriguez-Morales AJ. Infez Med. 2024 Sep 1;32(3):267-271. doi: 10.53854/liim-3203-1. eCollection 2024. PMID: 39282537.
References
- Strasser B.J. (2008) GenBank—natural history in the 21st century? Science, 322, 537–538.
- Schuler G.D., Epstein J.A., Ohkawa H. et al. (1996) Entrez: molecular biology database and retrieval system. Methods Enzymol., 266, 141–162.