BioProject and BioSample databases at NCBI: facilitating capture and organization of metadata

Nucleic Acids Res. 2012 Jan;40(Database issue):D57-63.

doi: 10.1093/nar/gkr1163. Epub 2011 Dec 1.

Tanya Barrett, Karen Clark, Robert Gevorgyan, Vyacheslav Gorelenkov, Eugene Gribov, Ilene Karsch-Mizrachi, Michael Kimelman, Kim D Pruitt, Sergei Resenchuk, Tatiana Tatusova, Eugene Yaschenko, James Ostell

PMID: 22139929
PMCID: PMC3245069
DOI: 10.1093/nar/gkr1163

Tanya Barrett et al. Nucleic Acids Res. 2012 Jan.

Abstract

As the volume and complexity of data sets archived at NCBI grow rapidly, so does the need to gather and organize the associated metadata. Although metadata has been collected for some archival databases, there was previously no centralized approach at NCBI for collecting this information and using it across databases. The BioProject database was recently established to facilitate organization and classification of project data submitted to NCBI, EBI and DDBJ databases. It captures descriptive information about research projects that result in high-volume submissions to archival databases, ties together related data across multiple archives and serves as a central portal by which to inform users of data availability. Concomitantly, the BioSample database is being developed to capture descriptive information about the biological samples investigated in projects. BioProject and BioSample records link to corresponding data stored in archival repositories. Submissions are supported by a web-based Submission Portal that guides users through a series of forms for input of rich metadata describing their projects and samples. Together, these databases offer improved ways for users to query, locate, integrate and interpret the masses of data held in NCBI's archival repositories. The BioProject and BioSample databases are available at http://www.ncbi.nlm.nih.gov/bioproject and http://www.ncbi.nlm.nih.gov/biosample, respectively.
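The abstract describes querying the two databases but does not name a programmatic interface. As a minimal sketch, assuming NCBI's E-utilities ESearch endpoint is used for this purpose (an assumption not stated in the source), a search URL for either database could be assembled as follows. The function name `build_search_url` and the search term are hypothetical, and the snippet only builds the URL without contacting NCBI.

```python
from urllib.parse import urlencode

# Assumption: the public E-utilities ESearch endpoint; not mentioned in the abstract.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Entrez database names assumed to correspond to the two databases described above.
SUPPORTED_DATABASES = ("bioproject", "biosample")


def build_search_url(db, term, retmax=20):
    """Return an ESearch URL for a free-text query (hypothetical helper)."""
    if db not in SUPPORTED_DATABASES:
        raise ValueError(f"expected one of {SUPPORTED_DATABASES}, got {db!r}")
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    # urlencode percent-encodes the values and encodes spaces as '+'.
    return EUTILS_ESEARCH + "?" + urlencode(params)


project_url = build_search_url("bioproject", "Escherichia coli")
sample_url = build_search_url("biosample", "Escherichia coli")
```

Fetching either URL would return matching record identifiers; retrieving the records themselves and following their links to archival data are separate steps not shown here.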

Figures

Figure 1.

Schematic depicting how BioProject, BioSample and data objects can be organized and linked. This example is composed of one umbrella project that encompasses three subprojects, each of which generated data derived from two BioSample records. Users can query either the BioProject or the BioSample database to retrieve the relevant records, and then navigate through links to the corresponding experimental data, which continue to be stored in NCBI's primary data archives, including GenBank, SRA, dbGaP and GEO. This schematic depicts direct links that can be applied between objects; it does not depict links to corresponding records in other NCBI databases, including PubMed, Gene, Genome and Taxonomy.
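The organization in Figure 1 can be sketched as a small tree of records. This is an illustrative model only, not NCBI's actual schema: the class names, field names and placeholder identifiers (such as `UMBRELLA-1` and `SAMPLE-1`) are hypothetical, and the sketch assumes each subproject uses two distinct samples.

```python
from dataclasses import dataclass, field


@dataclass
class BioSample:
    accession: str  # hypothetical placeholder ID, not a real accession


@dataclass
class BioProject:
    accession: str  # hypothetical placeholder ID
    samples: list = field(default_factory=list)      # BioSample records used by this project
    subprojects: list = field(default_factory=list)  # child projects when this is an umbrella
    data_links: dict = field(default_factory=dict)   # archive name -> list of data object IDs

    def all_samples(self):
        """Collect samples from this project and every project beneath it."""
        found = list(self.samples)
        for child in self.subprojects:
            found.extend(child.all_samples())
        return found


def build_example():
    """Build one umbrella project with three subprojects of two samples each."""
    umbrella = BioProject("UMBRELLA-1")
    sample_number = 0
    for i in range(1, 4):
        sub = BioProject(f"SUB-{i}")
        for _ in range(2):
            sample_number += 1
            sub.samples.append(BioSample(f"SAMPLE-{sample_number}"))
        # Assumption: each subproject links to data held in one archive (SRA here).
        sub.data_links["SRA"] = [f"RUN-{i}"]
        umbrella.subprojects.append(sub)
    return umbrella


umbrella = build_example()
sample_ids = [s.accession for s in umbrella.all_samples()]
```

Walking down from the umbrella project reaches every sample, which mirrors the navigation the caption describes: start from a project or sample record, then follow links to the data held in the primary archives.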

Figure 2.

Screenshot of a Genome Sequencing project that is a component of an umbrella project that encompasses data generated from an E. coli pathogen outbreak (upper panel) (17) and a corresponding sample record (lower panel). The records display the project title, summary, data type, locus_tag prefix and various project attributes including the scope and capture method (A). The Project Data section (B) lists the availability of corresponding sequence and assembly data in the Nucleotide and SRA databases, where the data can be downloaded. Navigation panels help users link to Genome-level resources for that organism (C), or to ‘Navigate Up’ to the parent umbrella project, or to ‘Navigate across’ to sibling projects that are part of that umbrella project, as well as any additional projects related by organism (D). The ‘Related Information’ panel (E) contains the full list of linkages for that record; clicking the BioSample link directs the user to the sample record shown in the lower panel, which lists the attributes that were collected for that sample, including the collection date, isolation source, country, strain and serovar (F).

Cited by

Toward Accurate and Quantitative Comparative Metagenomics.
Nayfach S, Pollard KS. Cell. 2016 Aug 25;166(5):1103-1116. doi: 10.1016/j.cell.2016.08.007. PMID: 27565341. Free PMC article. Review.
MarineMetagenomeDB: a public repository for curated and standardized metadata for marine metagenomes.
Nata'ala MK, Avila Santos AP, Coelho Kasmanas J, Bartholomäus A, Saraiva JP, Godinho Silva S, Keller-Costa T, Costa R, Gomes NCM, Ponce de Leon Ferreira de Carvalho AC, Stadler PF, Sipoli Sanches D, Nunes da Rocha U. Environ Microbiome. 2022 Nov 18;17(1):57. doi: 10.1186/s40793-022-00449-7. PMID: 36401317. Free PMC article.
BioPS: System for screening and assessment of biofuel-production potential of cyanobacteria.
Motwalli O, Essack M, Salhi A, Hanks J, Mijakovic I, Bajic VB. PLoS One. 2018 Aug 10;13(8):e0202002. doi: 10.1371/journal.pone.0202002. eCollection 2018. PMID: 30096176. Free PMC article.
PDX Finder: A portal for patient-derived tumor xenograft model discovery.
Conte N, Mason JC, Halmagyi C, Neuhauser S, Mosaku A, Yordanova G, Chatzipli A, Begley DA, Krupke DM, Parkinson H, Meehan TF, Bult CC. Nucleic Acids Res. 2019 Jan 8;47(D1):D1073-D1079. doi: 10.1093/nar/gky984. PMID: 30535239. Free PMC article.
Gypsy moth genome provides insights into flight capability and virus-host interactions.
Zhang J, Cong Q, Rex EA, Hallwachs W, Janzen DH, Grishin NV, Gammon DB. Proc Natl Acad Sci U S A. 2019 Jan 29;116(5):1669-1678. doi: 10.1073/pnas.1818283116. Epub 2019 Jan 14. PMID: 30642971. Free PMC article.

References

1. Cochrane G, Karsch-Mizrachi I, Nakamura Y. The International Nucleotide Sequence Database Collaboration. Nucleic Acids Res. 2011;39:D15–D18.
2. Benson DA, Karsch-Mizrachi I, Lipman DJ, Ostell J, Sayers EW. GenBank. Nucleic Acids Res. 2011;39:D32–D37.
3. Shumway M, Cochrane G, Sugawara H. Archiving next generation sequencing data. Nucleic Acids Res. 2010;38:D870–D871.
4. Barrett T, Troup DB, Wilhite SE, Ledoux P, Evangelista C, Kim IF, Tomashevsky M, Marshall KA, Phillippy KH, Sherman PM, et al. NCBI GEO: archive for functional genomics data sets–10 years on. Nucleic Acids Res. 2011;39:D1005–D1010.
5. Fingerman IM, McDaniel L, Zhang X, Ratzat W, Hassan T, Jiang Z, Cohen RF, Schuler GD. NCBI Epigenomics: a new public resource for exploring epigenomic data sets. Nucleic Acids Res. 2011;39:D908–D912.
