BioProject and BioSample databases at NCBI: facilitating capture and organization of metadata
Nucleic Acids Res. 2012 Jan;40(Database issue):D57-63.
doi: 10.1093/nar/gkr1163. Epub 2011 Dec 1.
Tanya Barrett, Karen Clark, Robert Gevorgyan, Vyacheslav Gorelenkov, Eugene Gribov, Ilene Karsch-Mizrachi, Michael Kimelman, Kim D Pruitt, Sergei Resenchuk, Tatiana Tatusova, Eugene Yaschenko, James Ostell
- PMID: 22139929
- PMCID: PMC3245069
- DOI: 10.1093/nar/gkr1163
Tanya Barrett et al. Nucleic Acids Res. 2012 Jan.
Abstract
As the volume and complexity of data sets archived at NCBI grow rapidly, so does the need to gather and organize the associated metadata. Although metadata has been collected for some archival databases, there was previously no centralized approach at NCBI for collecting this information and using it across databases. The BioProject database was recently established to facilitate organization and classification of project data submitted to NCBI, EBI and DDBJ databases. It captures descriptive information about research projects that result in high-volume submissions to archival databases, ties together related data across multiple archives and serves as a central portal by which to inform users of data availability. Concomitantly, the BioSample database is being developed to capture descriptive information about the biological samples investigated in projects. BioProject and BioSample records link to corresponding data stored in archival repositories. Submissions are supported by a web-based Submission Portal that guides users through a series of forms for input of rich metadata describing their projects and samples. Together, these databases offer improved ways for users to query, locate, integrate and interpret the masses of data held in NCBI's archival repositories. The BioProject and BioSample databases are available at http://www.ncbi.nlm.nih.gov/bioproject and http://www.ncbi.nlm.nih.gov/biosample, respectively.
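Beyond the web pages named above, both databases can also be searched programmatically. The sketch below is an illustration only: it assumes NCBI's public E-utilities interface (not mentioned in the abstract) and its `esearch` and `elink` endpoints, and it only builds request URLs without contacting the server. The function names and the UID `12345` are hypothetical placeholders.

```python
from urllib.parse import urlencode

# Assumption: NCBI's public E-utilities base URL; the abstract itself only
# names the web front ends for BioProject and BioSample.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(db: str, term: str, retmax: int = 20) -> str:
    """Build a search URL for the BioProject or BioSample database."""
    if db not in ("bioproject", "biosample"):
        raise ValueError("db must be 'bioproject' or 'biosample'")
    params = {"db": db, "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_BASE}/esearch.fcgi?{urlencode(params)}"


def elink_url(project_uid: str) -> str:
    """Build a URL asking for BioSample records linked to a BioProject UID."""
    params = {"dbfrom": "bioproject", "db": "biosample", "id": project_uid}
    return f"{EUTILS_BASE}/elink.fcgi?{urlencode(params)}"


if __name__ == "__main__":
    print(esearch_url("bioproject", "Escherichia coli"))
    print(elink_url("12345"))  # placeholder UID, not a real record
```

Fetching either URL, for example with `urllib.request.urlopen`, would return the matching record identifiers; that step is left out so the sketch runs without network access.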
Figures
Figure 1.
Schematic depicting how BioProject, BioSample and data objects can be organized and linked. This example is composed of one umbrella project that encompasses three subprojects, each of which generated data derived from two BioSample records. Users can query either the BioProject or the BioSample database to retrieve the relevant records, and then navigate through links to the corresponding experimental data which continue to be stored in NCBI's primary data archives, including GenBank, SRA, dbGaP and GEO. This schematic depicts direct links that can be applied between objects; it does not depict links to corresponding records in other NCBI databases, including PubMed, Gene, Genome and Taxonomy.
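The arrangement in Figure 1 can be sketched as a small data model. This is an illustration under stated assumptions rather than NCBI's actual schema: the class names, field names and identifiers such as `UMBRELLA-1` and `SAMPLE-1-1` are hypothetical, and each sample is assumed to belong to exactly one subproject.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BioSample:
    accession: str  # placeholder identifier, not a real accession
    data_links: List[str] = field(default_factory=list)  # records held in primary archives


@dataclass
class BioProject:
    accession: str
    title: str
    subprojects: List["BioProject"] = field(default_factory=list)
    samples: List[BioSample] = field(default_factory=list)

    def all_samples(self) -> List[BioSample]:
        """Collect samples from this project and every project below it."""
        found = list(self.samples)
        for sub in self.subprojects:
            found.extend(sub.all_samples())
        return found


def build_example() -> BioProject:
    """One umbrella project, three subprojects, two samples per subproject."""
    umbrella = BioProject("UMBRELLA-1", "Example umbrella project")
    for i in range(1, 4):
        sub = BioProject(f"SUB-{i}", f"Example subproject {i}")
        for j in range(1, 3):
            sub.samples.append(
                BioSample(f"SAMPLE-{i}-{j}", [f"archive-record-{i}-{j}"])
            )
        umbrella.subprojects.append(sub)
    return umbrella


if __name__ == "__main__":
    project = build_example()
    print(len(project.subprojects), len(project.all_samples()))
```

Walking down from the umbrella project with `all_samples()` mirrors how a user would start at a BioProject record and follow links to its samples and their archived data.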
Figure 2.
Screenshot of a Genome Sequencing project that is a component of an umbrella project that encompasses data generated from an E. coli pathogen outbreak (upper panel) (17) and a corresponding sample record (lower panel). The records display the project title, summary, data type, locus_tag prefix and various project attributes including the scope and capture method (A). The Project Data section (B) lists the availability of corresponding sequence and assembly data in the Nucleotide and SRA databases, where the data can be downloaded. Navigation panels help users link to Genome-level resources for that organism (C), ‘Navigate Up’ to the parent umbrella project, or ‘Navigate across’ to sibling projects that are part of that umbrella project, as well as to any additional projects related by organism (D). The ‘Related Information’ panel (E) contains the full list of linkages for that record; clicking the BioSample link directs the user to the sample record shown in the lower panel, which lists the attributes that were collected for that sample, including the collection date, isolation source, country, strain and serovar (F).
Similar articles
- METAGENOTE: a simplified web platform for metadata annotation of genomic samples and streamlined submission to NCBI's sequence read archive.
Quiñones M, Liou DT, Shyu C, Kim W, Vujkovic-Cvijin I, Belkaid Y, Hurt DE. BMC Bioinformatics. 2020 Sep 3;21(1):378. doi: 10.1186/s12859-020-03694-0. PMID: 32883210 Free PMC article.
- BioSamples database: an updated sample metadata hub.
Courtot M, Cherubin L, Faulconbridge A, Vaughan D, Green M, Richardson D, Harrison P, Whetzel PL, Parkinson H, Burdett T. Nucleic Acids Res. 2019 Jan 8;47(D1):D1172-D1178. doi: 10.1093/nar/gky1061. PMID: 30407529 Free PMC article.
- The CAIRR Pipeline for Submitting Standards-Compliant B and T Cell Receptor Repertoire Sequencing Studies to the National Center for Biotechnology Information Repositories.
Bukhari SAC, O'Connor MJ, Martínez-Romero M, Egyedi AL, Willrett D, Graybeal J, Musen MA, Rubelt F, Cheung KH, Kleinstein SH. Front Immunol. 2018 Aug 16;9:1877. doi: 10.3389/fimmu.2018.01877. eCollection 2018. PMID: 30166985 Free PMC article.
- Gene expression omnibus: microarray data storage, submission, retrieval, and analysis.
Barrett T, Edgar R. Methods Enzymol. 2006;411:352-69. doi: 10.1016/S0076-6879(06)11019-8. PMID: 16939800 Free PMC article. Review.
- Overview of FEED, the feeding experiments end-user database.
Wall CE, Vinyard CJ, Williams SH, Gapeyev V, Liu X, Lapp H, German RZ. Integr Comp Biol. 2011 Aug;51(2):215-23. doi: 10.1093/icb/icr047. Epub 2011 Jun 22. PMID: 21700574 Free PMC article. Review.
Cited by
- The International Nucleotide Sequence Database Collaboration.
Nakamura Y, Cochrane G, Karsch-Mizrachi I; International Nucleotide Sequence Database Collaboration. Nucleic Acids Res. 2013 Jan;41(Database issue):D21-4. doi: 10.1093/nar/gks1084. Epub 2012 Nov 24. PMID: 23180798 Free PMC article.
- GenBank.
Benson DA, Cavanaugh M, Clark K, Karsch-Mizrachi I, Lipman DJ, Ostell J, Sayers EW. Nucleic Acids Res. 2013 Jan;41(Database issue):D36-42. doi: 10.1093/nar/gks1195. Epub 2012 Nov 27. PMID: 23193287 Free PMC article.
- Pathogen metadata platform: software for accessing and analyzing pathogen strain information.
Chang WE, Peterson MW, Garay CD, Korves T. BMC Bioinformatics. 2016 Sep 15;17(1):379. doi: 10.1186/s12859-016-1231-2. PMID: 27634291 Free PMC article.
- KGHC: a knowledge graph for hepatocellular carcinoma.
Li N, Yang Z, Luo L, Wang L, Zhang Y, Lin H, Wang J. BMC Med Inform Decis Mak. 2020 Jul 9;20(Suppl 3):135. doi: 10.1186/s12911-020-1112-5. PMID: 32646496 Free PMC article.
- AgroSeek: a system for computational analysis of environmental metagenomic data and associated metadata.
Liang X, Akers K, Keenum I, Wind L, Gupta S, Chen C, Aldaihani R, Pruden A, Zhang L, Knowlton KF, Xia K, Heath LS. BMC Bioinformatics. 2021 Mar 10;22(1):117. doi: 10.1186/s12859-021-04035-5. PMID: 33691615 Free PMC article.