HydDB: A web tool for hydrogenase classification and analysis

Dan Søndergaard et al. Sci Rep. 2016.

Abstract

H2 metabolism is proposed to be the most ancient and diverse mechanism of energy conservation. The metalloenzymes mediating this metabolism, hydrogenases, are encoded by over 60 microbial phyla and are present in all major ecosystems. We developed a classification system and web tool, HydDB, for the structural and functional analysis of these enzymes. We show that hydrogenase function can be predicted from primary sequence alone using an expanded classification scheme (comprising 29 [NiFe], 8 [FeFe], and 1 [Fe] hydrogenase classes) that defines 11 new classes with distinct biological functions. Using this scheme, we built a web tool that rapidly and reliably classifies hydrogenase primary sequences using a combination of k-nearest neighbors algorithms and Conserved Domain Database (CDD) referencing. Demonstrating its capacity, the tool reliably predicted hydrogenase content and function in 12 newly sequenced bacteria, archaea, and eukaryotes. HydDB lets users browse the amino acid sequences of 3248 annotated hydrogenase catalytic subunits and also contains a detailed repository of physiological, biochemical, and structural information about the 38 hydrogenase classes defined here. The database and classifier are freely and publicly available at http://services.birc.au.dk/hyddb/.

Figures

Figure 1. Sequence similarity network of hydrogenase sequences.

Nodes represent individual proteins and the edges show the BLAST E-values between them at the log E filter defined at the bottom-left of each panel. The sequences are colored by class as defined in the legends. Figure S1 shows the further delineation of the encircled [NiFe] hydrogenase classes.
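A minimal sketch of how such a network can be assembled may help when reading the panels. It assumes pairwise BLAST E-values are already available as a dictionary; the sequence names, E-values, and cutoffs below are invented for illustration and are not taken from the paper. An edge is kept only when log10(E) is at or below the chosen filter, so a stricter filter splits the network into more clusters.

```python
import math

def build_network(pairwise_evalues, log_e_cutoff):
    """Return an adjacency map keeping edges with log10(E-value) <= cutoff."""
    graph = {}
    for (a, b), evalue in pairwise_evalues.items():
        # Every protein is a node, even if all of its edges are filtered out.
        graph.setdefault(a, set())
        graph.setdefault(b, set())
        # An E-value reported as 0 is treated as infinitely significant.
        log_e = math.log10(evalue) if evalue > 0 else float("-inf")
        if log_e <= log_e_cutoff:
            graph[a].add(b)
            graph[b].add(a)
    return graph

def connected_components(graph):
    """Group nodes into clusters with a depth-first traversal."""
    seen = set()
    components = []
    for start in graph:
        if start in seen:
            continue
        stack = [start]
        component = set()
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        components.append(component)
    return components

# Hypothetical pairwise E-values (illustration only).
toy_evalues = {
    ("seqA", "seqB"): 1e-80,
    ("seqB", "seqC"): 1e-45,
    ("seqC", "seqD"): 1e-90,
    ("seqA", "seqD"): 1e-5,
}

loose = connected_components(build_network(toy_evalues, -40))
strict = connected_components(build_network(toy_evalues, -60))
print(len(loose), "cluster(s) at log E <= -40")
print(len(strict), "cluster(s) at log E <= -60")
```

Tightening the filter from -40 to -60 removes the weaker seqB/seqC edge, which is the kind of effect that separates classes across the panels of a similarity network.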

Figure 2. Evaluating the k-NN classifier for k = 1…10.

For each k, a 5-fold cross-validation was performed. The mean precision ± two standard deviations of the folds is shown in the figure (note the y-axis). k = 1 provides the most accurate classifier. However, k = 4 provides almost the same precision and is more robust to errors in the training set (reflected by the lower standard deviation). In general, the standard deviation is very small, indicating that the predictions are robust to changes in the training data.
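The evaluation procedure described in this caption can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the 2-D toy points stand in for whatever sequence-derived features HydDB uses, Euclidean distance, macro-averaged precision, and the sample standard deviation are assumptions, and the class names are hypothetical.

```python
import math
import random
import statistics
from collections import Counter

def knn_predict(train, query, k):
    """Majority vote among the k training points closest to the query.
    Ties go to the label seen first among the nearest neighbours."""
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

def macro_precision(true_labels, predicted_labels):
    """Average per-class precision over the classes that were predicted."""
    scores = []
    for cls in sorted(set(predicted_labels)):
        hits = [t for t, p in zip(true_labels, predicted_labels) if p == cls]
        scores.append(sum(t == cls for t in hits) / len(hits))
    return statistics.mean(scores)

def cross_validate(data, k, folds=5, seed=0):
    """Return (mean, sample standard deviation) of precision over the folds."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    scores = []
    for i in range(folds):
        test = shuffled[i::folds]  # every folds-th item is held out
        train = [item for j, item in enumerate(shuffled) if j % folds != i]
        predictions = [knn_predict(train, x, k) for x, _ in test]
        scores.append(macro_precision([y for _, y in test], predictions))
    return statistics.mean(scores), statistics.stdev(scores)

# Hypothetical toy data: three well-separated classes of 30 points each.
rng = random.Random(42)
centres = {"class_A": (0.0, 0.0), "class_B": (5.0, 5.0), "class_C": (0.0, 5.0)}
toy_data = [
    ((rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)), label)
    for label, (cx, cy) in centres.items()
    for _ in range(30)
]

results = {k: cross_validate(toy_data, k) for k in range(1, 11)}
for k, (mean, sd) in results.items():
    print(f"k={k:2d}  precision = {mean:.3f} ± {2 * sd:.3f}")
```

Comparing the mean and the spread for each k in this way mirrors the trade-off the caption describes: a k with slightly lower mean precision can still be preferable if its fold-to-fold variation is smaller.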
