PubChem Substance and Compound databases

Nucleic Acids Res. 2016 Jan 4;44(D1):D1202-13.

doi: 10.1093/nar/gkv951. Epub 2015 Sep 22.

Sunghwan Kim, Paul A Thiessen, Evan E Bolton, Jie Chen, Gang Fu, Asta Gindulyte, Lianyi Han, Jane He, Siqian He, Benjamin A Shoemaker, Jiyao Wang, Bo Yu, Jian Zhang, Stephen H Bryant

Sunghwan Kim et al. Nucleic Acids Res. 2016.

Abstract

PubChem (https://pubchem.ncbi.nlm.nih.gov) is a public repository for information on chemical substances and their biological activities, launched in 2004 as a component of the Molecular Libraries Roadmap Initiatives of the US National Institutes of Health (NIH). Over the past 11 years, PubChem has grown into a sizable system that serves as a chemical information resource for the scientific research community. PubChem consists of three inter-linked databases: Substance, Compound and BioAssay. The Substance database contains chemical information deposited by individual data contributors to PubChem, and the Compound database stores unique chemical structures extracted from the Substance database. The BioAssay database contains biological activity data for chemical substances tested in assay experiments. This paper provides an overview of the PubChem Substance and Compound databases, including data sources and contents, data organization, data submission using PubChem Upload, chemical structure standardization, web-based interfaces for textual and non-textual searches, and programmatic access. It also briefly describes PubChem3D, a resource derived from theoretical three-dimensional structures of compounds in PubChem, and PubChemRDF, which provides PubChem data in Resource Description Framework (RDF) format for data sharing, analysis and integration with information contained in other databases.
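As an illustration of the programmatic access mentioned in the abstract, the sketch below builds request URLs for PUG-REST, one of PubChem's programmatic interfaces. The abstract does not name a specific interface, so the choice of PUG-REST is an assumption here, as are the helper names `compound_property_url` and `compound_by_name_url` and the example property names. The code only constructs URLs and performs no network request.

```python
from urllib.parse import quote

# Base path of the PUG-REST service (assumed interface; the abstract only
# says "programmatic access" without naming one).
PUG_REST_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def compound_property_url(cid, properties, output="JSON"):
    """Build a URL requesting properties for one Compound record (CID).

    Hypothetical helper: follows the PUG-REST pattern
    <base>/<domain>/<namespace>/<identifier>/<operation>/<output>.
    """
    if not properties:
        raise ValueError("at least one property name is required")
    prop_list = ",".join(properties)
    return f"{PUG_REST_BASE}/compound/cid/{int(cid)}/property/{prop_list}/{output}"


def compound_by_name_url(name, output="JSON"):
    """Build a URL that looks up CIDs for a chemical name.

    Hypothetical helper: the name is percent-encoded so that spaces and
    slashes cannot break the URL path.
    """
    return f"{PUG_REST_BASE}/compound/name/{quote(name, safe='')}/cids/{output}"


if __name__ == "__main__":
    # CID 1983 is the record shown in Figure 4 of the paper.
    print(compound_property_url(1983, ["MolecularFormula", "InChIKey"]))
    print(compound_by_name_url("tylenol"))
```

Keeping URL construction separate from the HTTP call makes the request format easy to inspect and test before any traffic is sent to the service.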

Published by Oxford University Press on behalf of Nucleic Acids Research 2015. This work is written by (a) US Government employee(s) and is in the public domain in the US.

Figures

Figure 1. Data organization in PubChem. SID, CID and AID are the identifiers for the Substance, Compound and BioAssay databases, respectively.

Figure 2. PubChem standardization process in which unique chemical structures are extracted from the Substance database and stored in the Compound database.

Figure 3. A snapshot of the Document Summary (DocSum) page returned from an Entrez Search for ‘tylenol’ against the PubChem Compound database.

Figure 4. A snapshot of the top portion of the Compound Summary page for CID 1983 (Tylenol).

Figure 5. A snapshot of the Chemical Structure Search tool.

Figure 6. Diagram showing the high-level overview of PubChemRDF semantic relationships.
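The relationship between Substance and Compound records described in the captions for Figures 1 and 2 can be sketched as a grouping step: many depositor-supplied substances (SIDs) collapse onto one unique structure (CID). The code below is a toy model under stated assumptions, not PubChem's standardization procedure. The `SubstanceRecord` class, the `structure_key` field (a stand-in for an already standardized structure) and the sequential CID numbering are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SubstanceRecord:
    """Hypothetical depositor record; not PubChem's actual schema."""
    sid: int            # Substance identifier
    depositor: str      # name of the data contributor
    structure_key: str  # assumed stand-in for a standardized structure


def group_substances_into_compounds(substances):
    """Map toy CIDs to the SIDs that share one structure_key.

    CIDs are assigned as 1, 2, 3, ... in first-seen order. Real CIDs are
    assigned by PubChem and cannot be derived this way.
    """
    key_to_cid = {}
    cid_to_sids = defaultdict(list)
    for record in substances:
        # setdefault returns the existing CID for a known key, or stores
        # and returns the next counter value for a new key.
        cid = key_to_cid.setdefault(record.structure_key, len(key_to_cid) + 1)
        cid_to_sids[cid].append(record.sid)
    return dict(cid_to_sids)


if __name__ == "__main__":
    deposits = [
        SubstanceRecord(101, "Depositor A", "KEY-1"),
        SubstanceRecord(102, "Depositor B", "KEY-2"),
        SubstanceRecord(103, "Depositor C", "KEY-1"),
    ]
    print(group_substances_into_compounds(deposits))
```

In this model the Substance records keep their provenance (who deposited what), while the Compound side holds one entry per distinct structure, which mirrors the division of roles stated in the abstract.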
