DrugBank: a comprehensive resource for in silico drug discovery and exploration (original) (raw)

Journal Article

Department of Computing Science and Department of Biological Sciences, University of Alberta Edmonton, AB, Canada T6G 2E8

Search for other works by this author on:

Department of Computing Science and Department of Biological Sciences, University of Alberta Edmonton, AB, Canada T6G 2E8

Search for other works by this author on:

Department of Computing Science and Department of Biological Sciences, University of Alberta Edmonton, AB, Canada T6G 2E8

Search for other works by this author on:

Department of Computing Science and Department of Biological Sciences, University of Alberta Edmonton, AB, Canada T6G 2E8

Search for other works by this author on:

Department of Computing Science and Department of Biological Sciences, University of Alberta Edmonton, AB, Canada T6G 2E8

Search for other works by this author on:

Department of Computing Science and Department of Biological Sciences, University of Alberta Edmonton, AB, Canada T6G 2E8

Search for other works by this author on:

Department of Computing Science and Department of Biological Sciences, University of Alberta Edmonton, AB, Canada T6G 2E8

Search for other works by this author on:

Department of Computing Science and Department of Biological Sciences, University of Alberta Edmonton, AB, Canada T6G 2E8

Search for other works by this author on:

Revision received:

08 October 2005

Accepted:

08 October 2005

Published:

01 January 2006

Cite

David S. Wishart, Craig Knox, An Chi Guo, Savita Shrivastava, Murtaza Hassanali, Paul Stothard, Zhan Chang, Jennifer Woolsey, DrugBank: a comprehensive resource for in silico drug discovery and exploration, Nucleic Acids Research, Volume 34, Issue suppl_1, 1 January 2006, Pages D668–D672, https://doi.org/10.1093/nar/gkj067
Close

Navbar Search Filter Mobile Enter search term Search

Abstract

DrugBank is a unique bioinformatics/cheminformatics resource that combines detailed drug (i.e. chemical) data with comprehensive drug target (i.e. protein) information. The database contains >4100 drug entries including >800 FDA approved small molecule and biotech drugs as well as >3200 experimental drugs. Additionally, >14 000 protein or drug target sequences are linked to these drug entries. Each DrugCard entry contains >80 data fields with half of the information being devoted to drug/chemical data and the other half devoted to drug target or protein data. Many data fields are hyperlinked to other databases (KEGG, PubChem, ChEBI, PDB, Swiss-Prot and GenBank) and a variety of structure viewing applets. The database is fully searchable supporting extensive text, sequence, chemical structure and relational query searches. Potential applications of DrugBank include in silico drug target discovery, drug design, drug docking or screening, drug metabolism prediction, drug interaction prediction and general pharmaceutical education. DrugBank is available at http://redpoll.pharmacy.ualberta.ca/drugbank/.

INTRODUCTION

Until the 1980s, most of our knowledge about drugs, drug mechanisms and drug receptors could fit in a few encyclopedic books and a couple dozen schematic figures. However, with the recent explosion in biological and chemical knowledge, this is no longer the case. There is simply too much data (images, models, structures and sequences) from too many sources. Unfortunately, most of this information still resides in textbooks or print journals. The limited drug or drug receptor data that is electronically available is either inaccessible (except through expensive subscriptions), inadequate or widely scattered among many different public databases. This state of affairs largely reflects the ‘two solitudes’ of cheminformatics and bioinformatics. Neither discipline has really tried to integrate with the other. As a consequence, the wealth of electronic sequence/structure data that exists today has never been well linked to the enormous body of drug or chemical knowledge that has accumulated over the past half century.

Recently, some notable efforts have been made to partially overcome this ‘informatics gap’. The Therapeutic Target Database or TTD is one such example (1). This very useful web-based resource contains linked lists of names for >1100 small molecule drugs and drug targets (i.e. proteins). In addition to the TTD, a number of more comprehensive small molecule databases have also emerged including KEGG (2), ChEBI (3) and PubChem (http://pubchem.ncbi.nlm.nih.gov/). Each contains tens of thousands of chemical entries—including hundreds of small molecule drugs. All three databases provide names, synonyms, images, structure files and hyperlinks to other databases. Furthermore, both KEGG and PubChem support structure similarity searches. Unfortunately, these databases were not specifically designed to be drug databases, and so they do not provide specific pharmaceutical information or links to specific drug targets (i.e. sequences). Furthermore, because these databases were designed to be synoptic (containing <15 fields per compound entry) they do not provide a comprehensive molecular summary of any given drug or its corresponding protein target. More specialized drug databases such as PharmGKB (4) or on-line pharmaceutical encyclopedias such as RxList (5) tend to offer much more detailed clinical information about many drugs (their pharmacology, metabolism and indications) but they were not designed to contain structural, chemical or physico-chemical information. Instead their data content is targeted more towards pharmacists, physicians or consumers.

Ideally, what is needed is something that combines the strengths of, say, PharmGKB, PubChem and Swiss-Prot to create a single, fully searchable in silico drug resource that links sequence, structure and mechanistic data about drug molecules (including biotech drugs) with sequence, structure and mechanistic data about their drug targets. Beyond its obvious educational value, this kind of database could potentially allow researchers to easily visualize and explore 3D drug interactions, compare drug similarities or perform in silico drug (or drug target) discovery. Here, we wish to describe just such a database—called DrugBank.

DATABASE DESCRIPTION

Fundamentally, DrugBank is a dual purpose bioinformatics–cheminformatics database with a strong focus on quantitative, analytic or molecular-scale information about both drugs and drug targets. In many respects it combines the data-rich molecular biology content normally found in curated sequence databases such as Swiss-Prot and UniProt (6) with the equally rich data found in medicinal chemistry textbooks and chemical reference handbooks. By bringing these two disparate types of information together into one unified and freely available resource, we wanted to allow educators and researchers from diverse disciplines and backgrounds (academic, industrial, clinical, non-clinical) to conduct the type of in silico learning and discovery that is now routine in the world of genomics and proteomics.

The diversity of data types and the required breadth of domain knowledge, combined with the fact that the data were mostly ‘paper-bound’ made the assembly of DrugBank both difficult and time-consuming. To compile, confirm and validate this comprehensive collection of data, more than a dozen textbooks, several hundred journal articles, nearly 30 different electronic databases, and at least 20 in-house or web-based programs were individually searched, accessed, compared, written or run over the course of four years. The team of DrugBank archivists and annotators included two accredited pharmacists, a physician and three bioinformaticians with dual training in computing science and molecular biology/chemistry.

DrugBank currently contains >4100 drug entries, corresponding to >12 000 different trade names and synonyms. These drug entries were chosen according to the following rules: the molecule must contain more than one type of atom, be non-redundant, have a known chemical structure and be identified as a drug or drug-like molecule by at least one reputable data source. To facilitate more targeted research and exploration, DrugBank is divided into four major categories: (i) FDA-approved small molecule drugs (>700 entries), (ii) FDA-approved biotech (protein/peptide) drugs (>100 entries), (iii) nutraceuticals or micronutrients such as vitamins and metabolites (>60 entries) and (iv) experimental drugs, including unapproved drugs, de-listed drugs, illicit drugs, enzyme inhibitors and potential toxins (3200 entries). These individual ‘Drug Types’ are also bundled into two larger categories including all FDA drugs (Approved Drugs) and All Compounds (Experimental + FDA + nutraceuticals). DrugBank's coverage for non-trivial FDA-approved drugs is ∼80% complete. In addition, >14 000 protein (i.e. drug target) sequences are linked to these drug entries. More complete information about the numbers of drugs, drug targets and non-redundant drug targets (including their sequences) is available in the DrugBank ‘download’ page. The entire database, including text, sequence, structure and image data occupies nearly 16 gigabytes of data—most of which can be freely downloaded.

DrugBank is a fully searchable web-enabled resource with many built-in tools and features for viewing, sorting and extracting drug or drug target data. Detailed instructions on where to locate and how to use these browsing/search tools are provided on the DrugBank homepage. As with any web-enabled database, DrugBank supports standard text queries (through the text search box located on the home page). It also offers general database browsing using the ‘Browse’ and ‘PharmaBrowse’ buttons located at the top of each DrugBank page. To facilitate general browsing, DrugBank is divided into synoptic summary tables which, in turn, are linked to more detailed ‘DrugCards’—in analogy to the very successful GeneCards concept (7). All of DrugBank's summary tables can be rapidly browsed, sorted or reformatted (using up to six different criteria) in a manner similar to the way PubMed abstracts may be viewed. Clicking on the DrugCard button found in the leftmost column of any given DrugBank summary table opens a webpage describing the drug of interest in much greater detail. Each DrugCard entry contains >80 data fields with half of the information being devoted to drug/chemical data and the other half devoted to drug target or protein data (see Table 1). In addition to providing comprehensive numeric, sequence and textual data, each DrugCard also contains hyperlinks to other databases, abstracts, digital images and interactive applets for viewing molecular structures (Figure 1). In addition to the general browsing features, DrugBank also provides a more specialized ‘PharmBrowse’ feature. This is designed for pharmacists, physicians and medicinal chemists who tend to think of drugs in clusters of indications or drug classes. This particular browsing tool provides navigation hyperlinks to >70 drug classes, which in turn list the FDA-approved drugs associated with the drugs. Each drug name is then linked to its respective DrugCard.

A key distinguishing feature of DrugBank from other on-line drug resources is its extensive support for higher level database searching and selecting functions. In addition to the data viewing and sorting features already described, DrugBank also offers a local BLAST (8) search that supports both single and multiple sequence queries, a boolean text search [using GLIMPSE; (9)], a chemical structure search utility and a relational data extraction tool (10). These can all be accessed via the database navigation bar located at the top of every DrugBank page.

The BLAST search (SeqSearch) is particularly useful as it can potentially allow users to quickly and simply identify drug leads from newly sequenced pathogens. Specifically, a new sequence, a group of sequences or even an entire proteome can be searched against DrugBank's database of known drug target sequences by pasting the FASTA formatted sequence (or sequences) into the SeqSearch query box and pressing the ‘submit’ button. A significant hit reveals, through the associated DrugCard hyperlink, the name(s) or chemical structure(s) of potential drug leads that may act on that query protein (or proteome).

DrugBank's structure similarity search tool (ChemQuery) can be used in a similar manner to its sequence search tools. Users may sketch (through ACD's freely available chemical sketching applet) or paste a SMILES string (11) of a possible lead compound into the ChemQuery window. Submitting the query launches a structure similarity search tool that looks for common substructures from the query compound that match DrugBank's database of known drug or drug-like compounds. High scoring hits are presented in a tabular format with hyperlinks to the corresponding DrugCards (which in turn links to the protein target). The ChemQuery tool allows users to quickly determine whether their compound of interest acts on the desired protein target. This kind of chemical structure search may also reveal whether the compound of interest may unexpectedly interact with unintended protein targets. In addition to these structure similarity searches, the ChemQuery utility also supports compound searches on the basis of chemical formula and molecular weight ranges.

DrugBank's data extraction utility (Data Extractor) employs a simple relational database system that allows users to select one or more data fields and to search for ranges, occurrences or partial occurrences of words, strings or numbers. The data extractor uses clickable web forms so that users may intuitively construct SQL-like queries. Using a few mouse clicks, it is relatively simple to construct very complex queries (‘find all drugs less than 600 daltons with LogPs less than 3.2 that are antihistamines’) or to build a series of highly customized tables. The output from these queries is provided as an HTML format with hyperlinks to all associated DrugCards.

QUALITY ASSURANCE, COMPLETENESS AND CURATION

Every effort is made to ensure that DrugBank is as complete, correct and current as possible. Each DrugCard is entered or prepared by one member of the curation team and separately validated by second member of the curation team. Additional spot checks are routinely performed on each entry by senior members of the curation group, including a physician, an accredited pharmacist and two PhD-level biochemists. Several software packages including text mining tools, chemical parameter calculators and protein annotation tools (10) have been modified or specifically developed to aid in DrugBank's data entry and data validation. These tools collate and display text (and images) from multiple sources allowing the curators to compare, assess, enter and correct drug or drug target information. In addition to using a CVS (Current Versioning System), all changes and edits to the central database are monitored, dated and displayed on the DrugBank ‘download’ page using a specially developed text tracking system. A second text tracking system has been implemented to monitor the completeness (0–100%) of each field (for all approved drugs) and to display up-to-date statistics on the number of drugs, drug targets and non-redundant sequences in various drug categories. This information is also displayed in the ‘download’ page. To ensure DrugBank is current, new drugs (approved and experimental) are identified using continuously running screen-scraping tools linked to the FDA, the PDB and RxList websites. Backfilling of older, more obscure and orphan drugs is ongoing and done manually. Drug targets are identified and confirmed using multiple sources (PubMed, TTD, FDA labels, RxList, PharmGKB, textbooks) as are all drug structures (KEGG, PubChem, images from FDA labels).

CONCLUSION

In summary, DrugBank is a comprehensive, web-accessible database that brings together quantitative chemical, physical, pharmaceutical and biological data about thousands of well-studied drugs and drug targets. DrugBank is primarily focused on providing the kind of detailed molecular data needed to facilitate drug discovery and drug development. This includes physical property data, structure and image files, pharmacological and physiological data about thousands of drug products as well as extensive molecular biological information about their corresponding drug targets. DrugBank is unique, not only in the type of data it provides but also in the level of integration and depth of coverage it achieves. In addition to its extensive small molecule drug coverage, DrugBank is certainly the only public database we are aware of that provides any significant information about the 110+ approved biotech drugs. DrugBank also supports an extensive array of visualizing, querying and search options including a structure similarity search tool and an easy-to-use relational data extraction system. It is hoped that DrugBank will serve as a useful resource to not only members of the pharmaceutical research community but to educators, students, clinicians and the general public.

A screenshot montage of the DrugBank Database showing several possible views of information describing the drug Ramipril. Not all fields are shown.

Figure 1

A screenshot montage of the DrugBank Database showing several possible views of information describing the drug Ramipril. Not all fields are shown.

Table 1

Summary of the data fields or data types found in each DrugCard

Drug or compound information	Drug target or receptor information
Generic name	Target name
Brand name(s)/synonyms	Target synonyms
IUPAC name	Target protein sequence
Chemical structure/sequence	Target no. of residues
Chemical formula	Target molecular weight
PubChem/KEGG/ChEBI Links	Target pI
Swiss-Prot/GenBank Links	Target gene ontology
FDA/MSDS/RxList Links	Target general function
Molecular weight	Target specific function
Melting point	Target pathways
Water solubility	Target reactions
pKa or pI	Target Pfam domains
LogP or hydrophobicity	Target signal sequences
NMR/MS spectra	Target transmembrane regions
MOL/SDF/PDF text files	Target essentiality
MOL/PDB image files	Target GenBank protein ID
SMILES string	Target Swiss-Prot ID
Indication	Target PDB ID
Pharmacology	Target cellular location
Mechanism of action	Target DNA sequence
Biotransformation/absorption	Target chromosome location
Patient/physician information	Target locus
Metabolizing enzymes	Target SNPs/mutations

Drug or compound information	Drug target or receptor information
Generic name	Target name
Brand name(s)/synonyms	Target synonyms
IUPAC name	Target protein sequence
Chemical structure/sequence	Target no. of residues
Chemical formula	Target molecular weight
PubChem/KEGG/ChEBI Links	Target pI
Swiss-Prot/GenBank Links	Target gene ontology
FDA/MSDS/RxList Links	Target general function
Molecular weight	Target specific function
Melting point	Target pathways
Water solubility	Target reactions
pKa or pI	Target Pfam domains
LogP or hydrophobicity	Target signal sequences
NMR/MS spectra	Target transmembrane regions
MOL/SDF/PDF text files	Target essentiality
MOL/PDB image files	Target GenBank protein ID
SMILES string	Target Swiss-Prot ID
Indication	Target PDB ID
Pharmacology	Target cellular location
Mechanism of action	Target DNA sequence
Biotransformation/absorption	Target chromosome location
Patient/physician information	Target locus
Metabolizing enzymes	Target SNPs/mutations

A more complete listing is provided on the DrugBank home page.

Table 1

Summary of the data fields or data types found in each DrugCard

Drug or compound information	Drug target or receptor information
Generic name	Target name
Brand name(s)/synonyms	Target synonyms
IUPAC name	Target protein sequence
Chemical structure/sequence	Target no. of residues
Chemical formula	Target molecular weight
PubChem/KEGG/ChEBI Links	Target pI
Swiss-Prot/GenBank Links	Target gene ontology
FDA/MSDS/RxList Links	Target general function
Molecular weight	Target specific function
Melting point	Target pathways
Water solubility	Target reactions
pKa or pI	Target Pfam domains
LogP or hydrophobicity	Target signal sequences
NMR/MS spectra	Target transmembrane regions
MOL/SDF/PDF text files	Target essentiality
MOL/PDB image files	Target GenBank protein ID
SMILES string	Target Swiss-Prot ID
Indication	Target PDB ID
Pharmacology	Target cellular location
Mechanism of action	Target DNA sequence
Biotransformation/absorption	Target chromosome location
Patient/physician information	Target locus
Metabolizing enzymes	Target SNPs/mutations

Drug or compound information	Drug target or receptor information
Generic name	Target name
Brand name(s)/synonyms	Target synonyms
IUPAC name	Target protein sequence
Chemical structure/sequence	Target no. of residues
Chemical formula	Target molecular weight
PubChem/KEGG/ChEBI Links	Target pI
Swiss-Prot/GenBank Links	Target gene ontology
FDA/MSDS/RxList Links	Target general function
Molecular weight	Target specific function
Melting point	Target pathways
Water solubility	Target reactions
pKa or pI	Target Pfam domains
LogP or hydrophobicity	Target signal sequences
NMR/MS spectra	Target transmembrane regions
MOL/SDF/PDF text files	Target essentiality
MOL/PDB image files	Target GenBank protein ID
SMILES string	Target Swiss-Prot ID
Indication	Target PDB ID
Pharmacology	Target cellular location
Mechanism of action	Target DNA sequence
Biotransformation/absorption	Target chromosome location
Patient/physician information	Target locus
Metabolizing enzymes	Target SNPs/mutations

A more complete listing is provided on the DrugBank home page.

The authors wish to thank Genome Prairie, a division of Genome Canada for financial support. Funding to pay the Open Access publication charges for this article was provided by Genome Canada.

Conflict of interest statement. None declared.

REFERENCES

Chen, X., Ji, Z.L., Chen, Y.Z.

2002

TTD: therapeutic target database

Nucleic Acids Res

412

–415

Kanehisa, M., Goto, S., Kawashima, S., Okuno, Y., Hattori, M.

2004

The KEGG resource for deciphering the genome

Nucleic Acids Res

D277

–D280

Brooksbank, C., Cameron, G., Thornton, J.

2005

The European Bioinformatics Institute's data resources: towards systems biology

Nucleic Acids Res

D46

–D53

Hewett, M., Oliver, D.E., Rubin, D.L., Easton, K.L., Stuart, J.M., Altman, R.B., Klein, T.E.

2002

PharmGKB: the Pharmacogenetics Knowledge Base

Nucleic Acids Res

163

–165

Hatfield, C.L., May, S.K., Markoff, J.S.

1999

Quality of consumer drug information provided by four web sites

Am. J. Health Syst. Pharm

2308

–2311

Bairoch, A., Apweiler, R., Wu, C.H., Barker, W.C., Boeckmann, B., Ferro, S., Gasteiger, E., Huang, H., Lopez, R., Magrane, M., et al.

2005

The Universal Protein Resource (UniProt)

Nucleic Acids Res

D154

–D159

Rebhan, M., Chalifa-Caspi, V., Prilusky, J., Lancet, D.

1998

GeneCards: a novel functional genomics compendium with automated data mining and query reformulation support

Bioinformatics

656

–664

Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., Lipman, D.J.

1997

Gapped BLAST and PSI-BLAST: a new generation of protein database search programs

Nucleic Acids Res

3389

–3402

Manber, U. and Bigot, P.

USENIX Symposium on Internet Technologies and Systems (NSITS'97)

1997

Monterey, CA pp.

231

–239

Sundararaj, S., Guo, A., Habibi-Nazhad, B., Rouani, M., Stothard, P., Ellison, M., Wishart, D.S.

2004

The CyberCell Database (CCDB): a comprehensive, self-updating, relational database to coordinate and facilitate in silico modeling of Escherichia coli

Nucleic Acids Res

D293

–D295

Weininger, D.

1988

SMILES 1. Introduction and encoding rules

J. Chem. Inf. Comput. Sci

–38

© The Author 2006. Published by Oxford University Press. All rights reserved The online version of this article has been published under an open access model. Users are entitled to use, reproduce, disseminate, or display the open access version of this article for non-commercial purposes provided that: the original authorship is properly and fully attributed; the Journal and Oxford University Press are attributed as the original place of publication with the correct citation details given; if an article is subsequently reproduced or disseminated not in its entirety but only in part or as a derivative work this must be clearly indicated. For commercial re-use, please contact journals.permissions@oxfordjournals.org

I agree to the terms and conditions. You must accept the terms and conditions.

Submit a comment

Name

Affiliations

Comment title

Comment

You have entered an invalid code

Thank you for submitting a comment on this article. Your comment will be reviewed and published at the journal's discretion. Please check for further notifications by email.

Citations

Views

Altmetric

Metrics

Total Views 28,880

22,277 Pageviews

6,603 PDF Downloads

Since 11/1/2016

Month:	Total Views:
November 2016	6
December 2016	9
January 2017	38
February 2017	86
March 2017	111
April 2017	82
May 2017	64
June 2017	82
July 2017	81
August 2017	83
September 2017	48
October 2017	80
November 2017	87
December 2017	246
January 2018	291
February 2018	308
March 2018	325
April 2018	252
May 2018	252
June 2018	250
July 2018	248
August 2018	301
September 2018	264
October 2018	255
November 2018	304
December 2018	287
January 2019	189
February 2019	204
March 2019	263
April 2019	307
May 2019	239
June 2019	252
July 2019	262
August 2019	230
September 2019	251
October 2019	257
November 2019	212
December 2019	162
January 2020	200
February 2020	262
March 2020	180
April 2020	188
May 2020	147
June 2020	276
July 2020	199
August 2020	219
September 2020	290
October 2020	306
November 2020	302
December 2020	285
January 2021	281
February 2021	337
March 2021	536
April 2021	375
May 2021	358
June 2021	280
July 2021	326
August 2021	412
September 2021	499
October 2021	538
November 2021	482
December 2021	471
January 2022	453
February 2022	607
March 2022	615
April 2022	558
May 2022	516
June 2022	391
July 2022	413
August 2022	395
September 2022	340
October 2022	409
November 2022	423
December 2022	424
January 2023	465
February 2023	436
March 2023	497
April 2023	355
May 2023	429
June 2023	281
July 2023	352
August 2023	365
September 2023	367
October 2023	371
November 2023	372
December 2023	371
January 2024	437
February 2024	410
March 2024	512
April 2024	464
May 2024	415
June 2024	367
July 2024	280
August 2024	373
September 2024	400

DrugBank: a comprehensive resource for in silico drug discovery and exploration (original) (raw)

Cite

Abstract

INTRODUCTION

DATABASE DESCRIPTION

QUALITY ASSURANCE, COMPLETENESS AND CURATION

CONCLUSION

REFERENCES

Citations

Views

Altmetric

Email alerts

Citing articles via

Latest

Most Cited

DrugBank: a comprehensive resource for in silico drug discovery and exploration (original) (raw)

Cite

Abstract

INTRODUCTION

DATABASE DESCRIPTION

QUALITY ASSURANCE, COMPLETENESS AND CURATION

CONCLUSION

REFERENCES

Citations

Views

Altmetric

Email alerts

Citing articles via

Latest

Most Read

Most Cited