ARDB—Antibiotic Resistance Genes Database

Abstract

The treatment of infections is increasingly compromised by the ability of bacteria to develop resistance to antibiotics through mutations or through the acquisition of resistance genes. Antibiotic resistance genes also have the potential to be used for bio-terror purposes through genetically modified organisms. To facilitate the identification and characterization of these genes, we have created a manually curated database, the Antibiotic Resistance Genes Database (ARDB), which unifies most of the publicly available information on antibiotic resistance. Each gene and resistance type is annotated with rich information, including resistance profile, mechanism of action, ontology, COG and CDD annotations, as well as external links to sequence and protein databases. Our database also supports sequence similarity searches and implements an initial version of a tool for characterizing common mutations that confer antibiotic resistance. The information we provide can be used as a compendium of antibiotic resistance factors, as well as to identify resistance genes in newly sequenced genes, genomes or metagenomes. Currently, ARDB contains resistance information for 13 293 genes, 377 types, 257 antibiotics, 632 genomes, 933 species and 124 genera. ARDB is available at http://ardb.cbcb.umd.edu/.

INTRODUCTION

The discovery of penicillin by Alexander Fleming in 1928 revolutionized the treatment of bacterial infections. The large-scale use of antibiotics, however, has also led to an increase in the number of microbes that can resist treatment. Drug-resistant bacteria are an increasing threat to public health, as highlighted by a recent estimate that in the US methicillin-resistant Staphylococcus aureus (MRSA) may contribute to more deaths than HIV (1). Methicillin-resistant strains of S. aureus were first documented in the 1960s (2) and have been associated with higher mortality rates than their drug-sensitive counterparts (3,4). Similar challenges are posed by the emergence of multidrug-resistant and extensively drug-resistant tuberculosis (MDR-TB and XDR-TB, respectively) (5,6). Antibiotic resistance can result from large genomic changes, such as the acquisition of entire plasmids or mobile elements encoding resistance factors. Recent studies, however, are revealing the important role that small mutations play in the evolution of resistance. For example, only 35 point mutations distinguish a vancomycin-resistant strain of S. aureus from its sensitive counterpart, and these mutations arose within just 3 months in an infected patient (7). Furthermore, antibiotic resistance genes could be exploited for bioterrorism through genetically modified organisms. These factors emphasize the urgent need for a better understanding of the mechanisms through which bacteria develop resistance, and for new techniques that rapidly identify resistance factors. The database presented in this article is a first component of an informatics infrastructure aimed at enabling such studies.

Bacteria have been shown to become resistant to antibiotics through several mechanisms (8): (i) production of enzymes that degrade or inactivate the antibiotic; (ii) efflux pumps that expel the drug from the cell; (iii) modifications to the cellular target of the antibiotic that prevent binding; (iv) activation of an alternative pathway that bypasses the drug's action; and (v), particularly in gram-negative bacteria, down-regulation or loss of the transmembrane porins through which drugs enter the cell (9). The annotation commonly associated with genes deposited in public databases is not detailed enough to represent this variety of resistance mechanisms and the additional meta-information relevant in this context. First, each resistance gene is associated with a resistance profile (the set of antibiotics or antibiotic classes to which the gene confers resistance), yet this information is usually not available. Second, resistance often requires the cooperation of multiple genes, usually within the same operon [e.g. vancomycin resistance encoded by the VanA operon requires seven genes (10)], while most annotation targets individual genes. Finally, resistance frequently results from the modification or disruption of an individual gene (e.g. a modified drug target), information that standard annotation procedures do not capture. Consequently, specialized resources are needed to annotate and catalog information related to antibiotic resistance.

Several recent efforts have partially unified this information, including Antibiotic Resistance Genes Online (ARGO) (11), MvirDB (12) and a compendium of TEM β-lactamase genes maintained at the Lahey Clinic (http://www.lahey.org/Studies/). All of them, however, have limited functionality. ARGO contains only a subset of the β-lactamase, vancomycin and tetracycline resistance genes. In addition, it lacks rich annotation such as resistance profile, mechanism of action, operon information or gene sequence. Furthermore, many of the links from ARGO to GenBank point to incorrect records (e.g. to a genome instead of the relevant gene record). MvirDB is a broad repository of virulence-associated genes, including toxins, virulence factors and antibiotic resistance genes; its antibiotic resistance information is simply a copy of the ARGO database. The Lahey Clinic website is a comprehensive collection of TEM-type β-lactamases that attempts to standardize the nomenclature for these genes. Beyond these specialized resources, antibiotic resistance information can be extracted only in a limited way from GenBank (13) and SwissProt (14), databases that lack many important types of information relevant to this domain.

To address the limitations of currently available public resources, and to facilitate the identification and characterization of antibiotic resistance genes, we have created a manually curated database, the Antibiotic Resistance Genes Database (ARDB), which unifies most of the publicly available genes and related information. Our motivations in creating ARDB are (i) to provide a centralized compendium of information on antibiotic resistance; (ii) to facilitate the consistent annotation of resistance information in newly sequenced organisms; and (iii) to facilitate the identification and characterization of new genes. We expect this resource to be useful to a broad range of scientists, including microbiologists, clinicians and the bio-defense research community.

DATABASE CONTENTS AND CONSTRUCTION

The diversity of antibiotic resistance genes, types and mechanisms, combined with the fact that related information such as resistance profiles is mostly ‘paper-bound’, made the construction of ARDB both difficult and time-consuming. To compile, confirm and validate this collection, we searched and summarized several textbooks and several hundred journal articles.

Most protein and nucleic acid sequences of known antibiotic resistance genes were retrieved from the NCBI nucleotide and protein databases, and additional sequences were retrieved from the Swiss-Prot database. Genes were grouped into resistance types based on protein sequence similarity, as follows. First, for every type of resistance, the sequence of an experimentally confirmed representative was identified from literature searches and the meta-information provided by the NCBI protein database. These representative resistance genes were then used to ‘fish out’ additional homologues through similarity searches against the NCBI nr database. The similarity cutoff was set at 80% unless a different value was recommended in the literature for a specific resistance type. Using this approach we identified 13 254 protein sequences putatively involved in antibiotic resistance. We filtered this set by removing vector sequences, synthetic constructs and redundant genes, resulting in a non-redundant set of 6206 proteins. This set was further refined by removing incomplete sequences, yielding a core set of 4554 antibiotic resistance proteins. Each sequence was associated with the corresponding CDD, COG, ontology and source organism information. Resistance types correspond to clusters of genes with similar resistance profiles, operon membership and mechanism of action. In addition, basic information about known antibiotics was extracted from KEGG DRUG (15), PubChem, the PubMed MeSH database and the Chemical Entities of Biological Interest (ChEBI) ontology. Although ARDB mainly targets antibiotic resistance genes, it also includes relevant information for 12 drug targets whose modification has been shown to confer resistance: 16S rRNA (16), 23S rRNA, gyrA (17), gyrB, parC, parE, rpoB, katG, pncA, embB, folP and dfr.
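The type-assignment and filtering steps above can be sketched in Python as follows. This is a minimal illustration, not ARDB's pipeline: it assumes the similarity search has already been run and its hits collected into records holding the resistance type of the representative, percent identity, sequence, definition line and a completeness flag. Only the 80% default cutoff comes from the text; the per-type override values, the keyword test for vectors and synthetic constructs, and exact-sequence deduplication as the redundancy criterion are assumptions.

```python
from dataclasses import dataclass

DEFAULT_CUTOFF = 80.0  # percent identity; default threshold stated in the text

@dataclass(frozen=True)
class Hit:
    """One homologue found with an experimentally confirmed representative."""
    resistance_type: str  # type of the representative used as the query
    subject_id: str       # accession of the homologue (hypothetical IDs below)
    pct_identity: float   # identity to the representative
    sequence: str         # protein sequence of the homologue
    description: str      # database definition line
    complete: bool        # assumed flag: full-length protein vs fragment

def assign_types(hits, cutoffs=None):
    """Keep hits at or above the type-specific identity cutoff (80% default)."""
    cutoffs = cutoffs or {}
    return [h for h in hits
            if h.pct_identity >= cutoffs.get(h.resistance_type, DEFAULT_CUTOFF)]

def filter_candidates(hits):
    """Return (non-redundant set, core set of complete sequences)."""
    excluded = ("vector", "synthetic construct")
    seen, non_redundant = set(), []
    for h in hits:
        if any(term in h.description.lower() for term in excluded):
            continue  # drop vector sequences and synthetic constructs
        if h.sequence in seen:
            continue  # drop exact duplicates (a simplification of "redundant")
        seen.add(h.sequence)
        non_redundant.append(h)
    core = [h for h in non_redundant if h.complete]  # drop incomplete sequences
    return non_redundant, core
```

Keeping per-type overrides in a dictionary mirrors the rule that 80% applies unless the literature recommends another value for a given type.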

The data flow of the curation process is summarized in Figure 1. ARDB is implemented as a MySQL relational database, whose schema is available on our website. The database is accessed through a CGI-based web interface.
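To make the relational design concrete, here is a hypothetical, heavily simplified schema that links genes to resistance types and types to the antibiotics in their resistance profile. It uses Python's built-in sqlite3 module in place of MySQL so that it runs without a server; the table and column names, and the example gene identifier, are illustrative and not taken from the published ARDB schema.

```python
import sqlite3

# Hypothetical, simplified schema (SQLite stands in for MySQL here).
SCHEMA = """
CREATE TABLE resistance_type (
    type_id   TEXT PRIMARY KEY,
    mechanism TEXT NOT NULL          -- mechanism of action term
);
CREATE TABLE gene (
    gene_id   TEXT PRIMARY KEY,      -- e.g. a protein accession
    type_id   TEXT NOT NULL REFERENCES resistance_type(type_id),
    organism  TEXT
);
CREATE TABLE antibiotic (
    name      TEXT PRIMARY KEY
);
-- Resistance profile: many-to-many link between types and antibiotics.
CREATE TABLE type_antibiotic (
    type_id    TEXT NOT NULL REFERENCES resistance_type(type_id),
    antibiotic TEXT NOT NULL REFERENCES antibiotic(name),
    PRIMARY KEY (type_id, antibiotic)
);
"""

def antibiotics_for_gene(conn, gene_id):
    """Return a gene's resistance profile, looked up through its type."""
    rows = conn.execute(
        """SELECT ta.antibiotic
             FROM gene AS g
             JOIN type_antibiotic AS ta ON ta.type_id = g.type_id
            WHERE g.gene_id = ?
            ORDER BY ta.antibiotic""",
        (gene_id,),
    )
    return [row[0] for row in rows]

# Example data: the VanA type confers resistance to vancomycin and teicoplanin.
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO resistance_type VALUES ('vanA', 'drug target modification')")
conn.executemany("INSERT INTO antibiotic VALUES (?)",
                 [("vancomycin",), ("teicoplanin",)])
conn.executemany("INSERT INTO type_antibiotic VALUES ('vanA', ?)",
                 [("vancomycin",), ("teicoplanin",)])
conn.execute("INSERT INTO gene VALUES ('GENE_0001', 'vanA', 'Enterococcus faecium')")
```

Storing the profile at the type level rather than per gene reflects the grouping of genes into types that share a resistance profile.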

Figure 1. ARDB curation data flow.

ONTOLOGY INFORMATION

No comprehensive ontology is currently available for annotating antibiotic resistance information. To facilitate the computational analysis of such information, we have created a set of ontology terms that characterize both the resistance profile conferred by a specific gene and its mechanism of action. Specifically, for every antibiotic X, we have created a corresponding ‘X resistance’ term. We also classify several mechanisms of action, including drug target modification, replacement or protection, drug enzymatic destruction and drug transport. Drug transport is further subclassified into ATP-binding cassette (ABC), major facilitator superfamily (MFS), small multidrug resistance (SMR) and resistance-nodulation-cell division (RND) drug efflux, following the terminology used in (18). These terms are defined within an Antibiotic Resistance (AR) ontology and are associated with each record in our database. We are currently working with the broader ontology community to further refine this information and integrate it with existing ontology development efforts.
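The way mechanism terms nest can be sketched as a small is_a hierarchy. The term names below follow the mechanism classes listed in this section, but the root term, the dictionary representation and the helper functions are assumptions made for illustration; they do not reproduce the actual AR ontology files.

```python
# Hypothetical child -> parent (is_a) links modeled on the mechanism classes
# in the text; the root term name is an assumption.
ROOT = "mechanism of antibiotic resistance"
IS_A = {
    "ABC drug efflux": "drug transport",
    "MFS drug efflux": "drug transport",
    "SMR drug efflux": "drug transport",
    "RND drug efflux": "drug transport",
    "drug transport": ROOT,
    "drug enzymatic destruction": ROOT,
    "drug target modification": ROOT,
    "drug target replacement": ROOT,
    "drug target protection": ROOT,
}

def resistance_term(antibiotic):
    """Build the 'X resistance' profile term for antibiotic X."""
    return f"{antibiotic} resistance"

def ancestors(term):
    """Follow is_a links from a term up to the root, nearest first."""
    path = []
    while term in IS_A:
        term = IS_A[term]
        path.append(term)
    return path

def annotate(mechanism, antibiotics):
    """All terms a record matches: its mechanism, inherited terms, profile."""
    terms = {mechanism, *ancestors(mechanism)}
    terms.update(resistance_term(a) for a in antibiotics)
    return terms
```

Propagating annotations up the is_a links is what would let a query for ‘drug transport’ retrieve records annotated only with a specific efflux family.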

DATA ACCESS AND DATA MINING

Users can access our database through a web interface at http://ardb.cbcb.umd.edu. This interface provides several modes of interaction as highlighted below.

Keyword searches

A simple keyword search is available at the top of every page of the ARDB website (Figure 2a), providing a quick way to search for a specific object in our database (gene, type, antibiotic, genome or genus) (Figure 2b and f). Users can search all of the data or restrict the search to a specific type of information. For example, users interested in the molecular mechanisms of resistance to tetracycline can search for the keyword ‘tetracycline’ within the ‘Resistance Type’ database. An advanced search function is also available, allowing users to choose from the keywords associated with each database field.
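Scoped keyword search can be illustrated with a toy in-memory version. The records, the database labels and the case-insensitive substring matching are assumptions for illustration; the text does not describe how ARDB matches keywords internally.

```python
from typing import Optional

# Toy records; identifiers and text are illustrative, not ARDB content.
RECORDS = [
    {"db": "Resistance Type", "id": "tetM",
     "text": "ribosomal protection protein conferring tetracycline resistance"},
    {"db": "Resistance Type", "id": "vanA",
     "text": "VanA ligase, vancomycin and teicoplanin resistance"},
    {"db": "Antibiotic", "id": "tetracycline",
     "text": "tetracycline, a protein synthesis inhibitor"},
]

def keyword_search(keyword: str, db: Optional[str] = None) -> list:
    """Case-insensitive substring search, optionally limited to one database."""
    kw = keyword.lower()
    return [r["id"] for r in RECORDS
            if (db is None or r["db"] == db) and kw in r["text"].lower()]
```

Searching ‘tetracycline’ across everything returns both the resistance type and the antibiotic record; passing `db="Resistance Type"` narrows it to the mechanism records, as in the example above.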

Figure 2. Sample web pages from ARDB. (a) Front page, (b) resistance type, (c) BLAST result, (d) mutation annotation, (e) browse and (f) genome information.

Similarity searches

BLAST

To help identify and annotate antibiotic resistance genes, ARDB also provides a BLAST interface. One or more gene sequences can be submitted to this interface as a multi-FASTA file, and both nucleotide and amino acid sequences are accepted. The results can be displayed as standard BLAST output; however, additional displays specific to antibiotic resistance information are also provided. Our ‘ARDB annotation format’ groups individual BLAST hits by resistance type, inferred from the level of similarity to the database genes associated with each type of resistance (Figure 2c). A second view allows users to download a tab-delimited spreadsheet summarizing the antibiotic resistance genes identified in the uploaded file.
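The grouping behind an annotation view of this kind can be sketched from BLAST+ tabular output (`-outfmt 6`, whose 12 default columns are listed in the code). This is a hedged sketch: the mapping from database genes to resistance types, the best-hit-by-bit-score rule and the 80% identity threshold (borrowed from the curation cutoff) are assumptions, not ARDB's documented annotation logic.

```python
import csv
import io
from collections import defaultdict

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]

def group_by_type(blast_tsv, gene_to_type, min_identity=80.0):
    """Assign each query the type of its best-scoring hit above min_identity."""
    best = {}  # query -> (resistance type, percent identity, bit score)
    for row in csv.reader(io.StringIO(blast_tsv), delimiter="\t"):
        if not row:
            continue  # skip blank lines
        hit = dict(zip(FIELDS, row))
        pident, bits = float(hit["pident"]), float(hit["bitscore"])
        if pident < min_identity or hit["sseqid"] not in gene_to_type:
            continue
        query = hit["qseqid"]
        if query not in best or bits > best[query][2]:
            best[query] = (gene_to_type[hit["sseqid"]], pident, bits)
    groups = defaultdict(list)
    for query, (rtype, pident, _) in sorted(best.items()):
        groups[rtype].append((query, pident))
    return dict(groups)

def summary_tsv(groups):
    """Tab-delimited summary with one row per annotated query."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["query", "resistance_type", "percent_identity"])
    for rtype in sorted(groups):
        for query, pident in groups[rtype]:
            writer.writerow([query, rtype, f"{pident:.1f}"])
    return out.getvalue()
```

Choosing the best hit by bit score rather than percent identity favours longer, more informative alignments when several types match the same query.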

RPSBLAST

In addition to BLAST, we also provide an RPSBLAST (19) interface that relies on position-specific scoring matrices (PSSMs) built from the sequences associated with each resistance type, using an approach similar to that of the NCBI Conserved Domain Database (20). The output of this interface is similar to that of the BLAST interface described above.
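A position-specific scoring matrix can be illustrated with the simplest textbook construction: log-odds scores computed from the column frequencies of an ungapped alignment, with a flat pseudocount and uniform background frequencies. This is a deliberate simplification; PSI-BLAST and CDD use sequence weighting and more elaborate pseudocount schemes, and RPS-BLAST performs gapped local alignment rather than the ungapped window scan shown here.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def build_pssm(alignment, pseudocount=1.0):
    """Log-odds PSSM (in bits) from equal-length, ungapped aligned sequences."""
    length = len(alignment[0])
    if any(len(s) != length for s in alignment):
        raise ValueError("sequences must be aligned to the same length")
    background = 1.0 / len(AMINO_ACIDS)  # uniform background (simplification)
    total = len(alignment) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for i in range(length):
        counts = Counter(s[i] for s in alignment)
        pssm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return pssm

def best_window_score(pssm, sequence):
    """Highest ungapped PSSM score over all windows; unknown residues score 0."""
    width = len(pssm)
    scores = [
        sum(pssm[j].get(sequence[i + j], 0.0) for j in range(width))
        for i in range(len(sequence) - width + 1)
    ]
    return max(scores) if scores else float("-inf")
```

Residues that are conserved across a resistance type receive high positive scores at their column, so a query matching the family's conserved pattern outscores an unrelated sequence even when pairwise identity to any single member is modest.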

Polymorphism detection

Additionally, a mutation-specific search function identifies polymorphisms previously shown to confer resistance (Figure 2d). For example, a G→C mutation at position 1058 of the 16S rRNA (Escherichia coli numbering) has been shown to confer resistance to tetracycline (21). Such mutations are detected from the detailed BLAST alignment between the query sequence and a reference sequence in our database. This function is currently available for 12 genes (16S rRNA, 23S rRNA, gyrA, gyrB, parC, parE, rpoB, katG, pncA, embB, folP, dfr), and we expect to extend it as more information becomes available in the literature.
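Checking a catalogued polymorphism amounts to mapping a reference coordinate through the pairwise alignment to the aligned query residue. The sketch below assumes the alignment is available as two gapped strings plus the reference start coordinate, and that reference coordinates use the same numbering as the catalogue (E. coli numbering for the 16S rRNA example); the catalogue structure is hypothetical.

```python
def query_residue_at(ref_aln, qry_aln, ref_start, ref_pos):
    """Return the query residue aligned to a 1-based reference position.

    ref_aln and qry_aln are the gapped aligned strings, as in a BLAST pairwise
    alignment; ref_start is the reference coordinate of the first aligned
    reference residue. Returns '-' if the query has a gap there and None if
    the position lies outside the aligned region.
    """
    pos = ref_start - 1
    for r, q in zip(ref_aln, qry_aln):
        if r != "-":  # insertions in the query do not advance the reference
            pos += 1
            if pos == ref_pos:
                return q
    return None

# Hypothetical catalogue: (gene, reference position, wild type, mutant, phenotype)
KNOWN_MUTATIONS = [("16S rRNA", 1058, "G", "C", "tetracycline resistance")]

def detect(gene, ref_aln, qry_aln, ref_start):
    """Report catalogued resistance polymorphisms present in the query."""
    found = []
    for g, pos, wt, mut, phenotype in KNOWN_MUTATIONS:
        if g == gene and query_residue_at(ref_aln, qry_aln, ref_start, pos) == mut:
            found.append(f"{wt}{pos}{mut} ({phenotype})")
    return found
```

Walking the alignment column by column, and advancing the coordinate only on non-gap reference characters, keeps the mapping correct even when the query carries insertions upstream of the site.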

Pre-annotated information

The antibiotic resistance profiles of 632 complete bacterial genomes have already been annotated and deposited in ARDB for rapid searching. This information can be retrieved through keyword searches against the ‘genome’ database, or through the ‘Genome Resistance Profiles Comparison’ link on the front page. The latter allows users to summarize and compare the resistance profiles of multiple organisms in our database.
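Comparing resistance profiles across genomes reduces to set operations on the resistance types annotated in each genome. The sketch assumes profiles are available as sets keyed by genome name; it is an illustration of the comparison, not the implementation behind the ARDB comparison page, and the genome names used with it are placeholders.

```python
def compare_profiles(profiles):
    """Presence/absence matrix plus types shared by all or unique to one genome.

    profiles maps a genome name to the set of resistance types annotated in it.
    """
    genomes = sorted(profiles)
    all_types = sorted(set().union(*profiles.values()))
    # One row per resistance type, one boolean column per genome.
    matrix = {t: [t in profiles[g] for g in genomes] for t in all_types}
    shared = set.intersection(*(set(profiles[g]) for g in genomes)) if genomes else set()
    unique = {
        g: profiles[g] - set().union(*(profiles[h] for h in genomes if h != g))
        for g in genomes
    }
    return genomes, matrix, shared, unique
```

The presence/absence matrix is the natural input for a side-by-side table, while the shared and unique sets answer the most common comparison questions directly.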

Browse

A ‘browse’ function allows users to view several classes of antibiotic resistance genes, grouped by resistance profile. This function is currently available for aminoglycoside, β-lactam, macrolide–lincosamide–streptogramin B, multidrug transporter, tetracycline and vancomycin resistance (Figure 2e).

Submission

To facilitate community-driven refinement of our database, we provide an interface through which users can submit information about novel resistance genes. This interface captures several types of information not commonly available in other databases [Minimum Inhibitory Concentration (MIC), resistance type, ontology, citation information, etc.]. We also provide a simple file format and an upload function for submitting information on multiple genes at once. Submitted information will be vetted before being inserted into the database. We also plan to develop an interface that adds community-deposited information directly to the database as ‘provisional’ records, pending further manual curation.
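Batch submission can be illustrated with a parser for a tab-delimited file. The column names, the µg/mL unit for MIC and the validation rules are hypothetical; the text states only that a simple file format is provided, not its layout.

```python
import csv
import io

# Hypothetical column layout; the actual ARDB submission format may differ.
REQUIRED = ["gene_name", "accession", "resistance_type", "antibiotic",
            "mic_ug_per_ml", "pubmed_id"]

def parse_submission(text):
    """Parse a tab-delimited batch submission into (records, errors)."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    missing_cols = [c for c in REQUIRED if c not in (reader.fieldnames or [])]
    if missing_cols:
        return [], [f"missing columns: {', '.join(missing_cols)}"]
    records, errors = [], []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        # Short rows yield None for absent fields, so treat None as empty.
        empty = [c for c in REQUIRED if not (row[c] or "").strip()]
        if empty:
            errors.append(f"line {line_no}: empty {', '.join(empty)}")
            continue
        try:
            row["mic_ug_per_ml"] = float(row["mic_ug_per_ml"])
        except ValueError:
            errors.append(f"line {line_no}: MIC is not a number")
            continue
        records.append(row)
    return records, errors
```

Returning per-line errors instead of rejecting the whole file lets curators vet the valid records while the submitter fixes the rest.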

CONCLUSION AND DISCUSSION

ARDB, the database described in this article, unifies most of the publicly available antibiotic resistance genes and provides a reliable annotation service to researchers investigating the molecular basis of resistance in bacteria. Because resistance genes are highly diverse and new ones are identified rapidly, the current version of ARDB is only a first catalog of available information, and it will continue to be updated over the coming months and years. We plan to coordinate our development efforts with researchers actively involved in antibiotic resistance research, as well as with the developers of biological ontologies and of databases storing related information (such as virulence factors or toxins). As part of these efforts, we aim to refine the structure of our database, better determine the types of information stored and identify additional requirements for the user interface. Future efforts will also target new approaches for cataloging and characterizing polymorphisms correlated with resistance, and for annotating changes to cellular regulatory networks that underlie the mechanisms of drug tolerance.

ACKNOWLEDGEMENTS

We would like to thank Kim Bishop-Lilly and Tim Read for providing initial feedback on our database and for their insightful comments and advice.

FUNDING

Uniformed Services University of the Health Sciences, administered by the Henry Jackson Foundation (HU001-06-1-0015 to M.P.). Funding for open access charge: Uniformed Services University of the Health Sciences, administered by the Henry Jackson Foundation (HU001-06-1-0015).

Conflict of interest statement. None declared.

REFERENCES

1. Antimicrobial resistance: it's not just for hospitals. JAMA, 2007, vol. 298, pg. 1803-1804.

2. Methicillin-resistant staphylococci. J. Clin. Pathol., 1961, vol. 14, pg. 385-393.

3. Nosocomial methicillin-resistant Staphylococcus aureus bacteremia: is it any worse than nosocomial methicillin-sensitive Staphylococcus aureus bacteremia? Infect. Control. Hosp. Epidemiol., 2000, vol. 21, pg. 645-648.

4. Mortality after infection with methicillin-resistant Staphylococcus aureus (MRSA) diagnosed in the community. BMC Med., 2008, vol. 6, pg. 2.

5. Detection of multidrug resistance in Mycobacterium tuberculosis. J. Clin. Microbiol., 2007, vol. 45, pg. 179-192.

6. Extensively drug-resistant tuberculosis as a cause of death in patients co-infected with tuberculosis and HIV in a rural area of South Africa. Lancet, 2006, vol. 368, pg. 1575-1580.

7. Tracking the in vivo evolution of multidrug resistance in Staphylococcus aureus by whole-genome sequencing. Proc. Natl Acad. Sci. USA, 2007, vol. 104, pg. 9451-9456.

8. Molecular mechanisms of antibacterial multidrug resistance. Cell, 2007, vol. 128, pg. 1037-1050.

9. Porins, efflux pumps and multidrug resistance in Acinetobacter baumannii. J. Antimicrob. Chemother., 2007, vol. 59, pg. 1210-1215.

10. Vancomycin resistance in gram-positive cocci. Clin. Infect. Dis., 2006, vol. 42, Suppl. 1, pg. S25-S34.

11. Antibiotic Resistance Genes Online (ARGO): a Database on vancomycin and beta-lactam resistance genes. Bioinformation, 2005, vol. 1, pg. 5-7.

12. MvirDB—a microbial database of protein toxins, virulence factors and antibiotic resistance genes for bio-defence applications. Nucleic Acids Res., 2007, vol. 35, pg. D391-D394.

13. GenBank. Nucleic Acids Res., 2008, vol. 36, pg. D25-D30.

14. The universal protein resource (UniProt). Nucleic Acids Res., 2008, vol. 36, pg. D190-D195.

15. KEGG for linking genomes to life and the environment. Nucleic Acids Res., 2008, vol. 36, pg. D480-D484.

16. Ribosomal RNA and protein mutants resistant to spectinomycin. EMBO J., 1990, vol. 9, pg. 735-739.

17. A double mutation in the gyrA gene is necessary to produce high levels of resistance to moxifloxacin in Campylobacter spp. clinical isolates. Int. J. Antimicrob. Agents, 2005, vol. 25, pg. 542-545.

18. Multiple molecular mechanisms for multidrug resistance transporters. Nature, 2007, vol. 446, pg. 749-757.

19. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res., 1997, vol. 25, pg. 3389-3402.

20. CDD: a Conserved Domain Database for protein classification. Nucleic Acids Res., 2005, vol. 33, pg. D192-D196.

21. 16S rRNA mutation associated with tetracycline resistance in a gram-positive bacterium. Antimicrob. Agents Chemother., 1998, vol. 42, pg. 1702-1705.

© 2008 The Author(s)

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.