Assessment of BOLD and GenBank - Their accuracy and reliability for the identification of biological materials
Kelly A Meiklejohn et al. PLoS One. 2019.
Abstract
Taxonomic identification of biological materials can be achieved through DNA barcoding, where an unknown "barcode" sequence is compared to a reference database. In many disciplines, obtaining accurate taxonomic identifications can be imperative (e.g., evolutionary biology, food regulatory compliance, forensics). The Barcode of Life Data System (BOLD) and GenBank are the main public repositories of DNA barcode sequences. In this study, an assessment of the accuracy and reliability of sequences in these databases was performed. To achieve this, 1) curated reference materials for plants, macro-fungi and insects were obtained from national collections, 2) relevant barcode sequences (rbcL, matK, trnH-psbA, ITS and COI) from these reference samples were generated and used for searching against both databases, and 3) optimal search parameters were determined that ensure the best match to the known species in either database. While GenBank outperformed BOLD for species-level identification of insect taxa (53% and 35%, respectively), both databases performed comparably for plants and macro-fungi (~81% and ~57%, respectively). Results illustrated that using a multi-locus barcode approach increased identification success. This study outlines the utility of the BLAST search tool in GenBank and the BOLD identification engine for taxonomic identifications and identifies some precautions needed when using public sequence repositories in applied scientific disciplines.
Conflict of interest statement
The authors have declared that no competing interests exist.
Figures
Fig 1
Overall classification accuracies from BOLD and GenBank for: A) COI insect sequences (n = 17), B) ITS macro-fungi sequences (n = 14), and C) plant taxa using either a 2-locus approach (rbcL and matK; n = 53) or a 4-locus approach (rbcL, matK, trnH-psbA and ITS2; n = 28). Identification success at the genus level is denoted by the light color and at the species level by the dark color. Blue bars correspond to results from searches against BOLD and green bars to searches against GenBank.
Fig 2
Classification using BOLD and GenBank for: A-B) COI insect sequences (n = 17), C-D) ITS macro-fungi sequences (n = 14), and E-F) plant taxa using either a 2-locus approach (rbcL and matK; n = 53) or a 4-locus approach (rbcL, matK, trnH-psbA, and ITS2; n = 28). A,C,E) Assessment of the specificity of the top match(es) in both databases: reliable match, where all records sharing the top match statistics matched the expected taxon (dark blue), or ambiguous match, where records sharing the top match statistics represent more than one species (other colors; e.g., gray = undetermined species, light blue = congeneric species, etc.). B,D,F) Taxonomic level of classification. Taxa were correctly identified to the species level (dark blue) or to a higher taxonomic level (other colors; e.g., light blue = genus, green = class, etc.).
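The reliable/ambiguous distinction described in the Fig 2 caption can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the `Hit` record, the `classify_top_match` function, the use of a single numeric score as the "match statistic", and the extra "mismatch" and "no hits" labels are all assumptions made for illustration, and the example hits are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One database record returned by a search (hypothetical structure)."""
    species: str   # species name on the record, e.g. "Apis mellifera"
    score: float   # match statistic, e.g. percent identity; higher is better


def classify_top_match(hits, expected_species):
    """Label the top match(es) for a query of known identity.

    'reliable'  : every record tied for the best score is the expected species
    'ambiguous' : records tied for the best score span more than one species
    'mismatch'  : a single, unexpected species holds the best score
                  (label assumed here; not defined in the caption)
    'no hits'   : the search returned nothing (also assumed)
    """
    if not hits:
        return "no hits"
    best = max(h.score for h in hits)
    # Scores are compared exactly, on the assumption that tied records
    # report an identical statistic from the same search.
    top_species = {h.species for h in hits if h.score == best}
    if top_species == {expected_species}:
        return "reliable"
    if len(top_species) > 1:
        return "ambiguous"
    return "mismatch"


# Invented example hits for a query known to be Apis mellifera.
clean = [Hit("Apis mellifera", 99.8), Hit("Apis mellifera", 99.8),
         Hit("Apis cerana", 97.1)]
tied = [Hit("Apis mellifera", 99.8), Hit("Apis cerana", 99.8)]

print(classify_top_match(clean, "Apis mellifera"))
print(classify_top_match(tied, "Apis mellifera"))
```

Collecting the species names of the tied top records into a set keeps the check independent of how many records each species contributes, which matters when a database holds many sequences for one common species.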
Similar articles
- Taxonomic identification accuracy from BOLD and GenBank databases using over a thousand insect DNA barcodes from Colombia.
Baena-Bejarano N, Reina C, Martínez-Revelo DE, Medina CA, Tovar E, Uribe-Soto S, Neita-Moreno JC, Gonzalez MA. PLoS One. 2023 Apr 24;18(4):e0277379. doi: 10.1371/journal.pone.0277379. eCollection 2023. PMID: 37093820 Free PMC article.
- A protocol for obtaining DNA barcodes from plant and insect fragments isolated from forensic-type soils.
Meiklejohn KA, Jackson ML, Stern LA, Robertson JM. Int J Legal Med. 2018 Nov;132(6):1515-1526. doi: 10.1007/s00414-018-1772-1. Epub 2018 Feb 8. PMID: 29423711
- Two new computational methods for universal DNA barcoding: a benchmark using barcode sequences of bacteria, archaea, animals, fungi, and land plants.
Tanabe AS, Toju H. PLoS One. 2013 Oct 18;8(10):e76910. doi: 10.1371/journal.pone.0076910. eCollection 2013. PMID: 24204702 Free PMC article.
- Tiny insects, big troubles: a review of BOLD's COI database for Thysanoptera (Insecta).
Lindner MF, Gonçalves LT, Bianchi FM, Ferrari A, Cavalleri A. Bull Entomol Res. 2023 Oct;113(5):703-715. doi: 10.1017/S0007485323000391. Epub 2023 Aug 24. PMID: 37614126 Review.
- Fungal DNA barcoding.
Xu J. Genome. 2016 Nov;59(11):913-932. doi: 10.1139/gen-2016-0046. Epub 2016 Aug 30. PMID: 27829306 Review.
Cited by
- The buzz about honey-based biosurveys.
Vuong P, Griffiths AP, Barbour E, Kaur P. NPJ Biodivers. 2024 Apr 17;3(1):8. doi: 10.1038/s44185-024-00040-y. PMID: 39242847 Free PMC article. Review.
- WiPFIM: A digital platform for interlinking biocollections of wild plants, fruits, associated insects, and their molecular barcodes.
Onyango B, Copeland R, Mbogholi J, Wamalwa M, Kibet C, Tonnang HEZ, Senagi K. Ecol Evol. 2024 Jun 1;14(6):e11457. doi: 10.1002/ece3.11457. eCollection 2024 Jun. PMID: 38826163 Free PMC article.
- Guidelines for the Analysis of DNA Barcoding/Metabarcoding Sequencing Data and Interpretation of Publicly Available Databases.
Damaso N, Elwick KE, Robertson JM. Methods Mol Biol. 2024;2744:391-402. doi: 10.1007/978-1-0716-3581-0_25. PMID: 38683333
- DNA Barcoding and Metabarcoding Protocols for Species Identification.
Elwick KE, Damaso N, Robertson JM. Methods Mol Biol. 2024;2744:155-169. doi: 10.1007/978-1-0716-3581-0_9. PMID: 38683317
- rCRUX: A Rapid and Versatile Tool for Generating Metabarcoding Reference libraries in R.
Curd EE, Gal L, Gallego R, Silliman K, Nielsen S, Gold Z. Environ DNA. 2024 Jan;6(1):e489. doi: 10.1002/edn3.489. Epub 2023 Nov 29. PMID: 38370872
Grants and funding
This research was supported in part by an appointment to the Visiting Scientist Program at the FBI Laboratory Division, administered by the Oak Ridge Institute for Science and Education, through an interagency agreement between the US Department of Energy and the FBI. The funder had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.