Function-based classification of carbohydrate-active enzymes by recognition of short, conserved peptide motifs

Peter Kamp Busk et al. Appl Environ Microbiol. 2013 Jun.

Abstract

Functional prediction of carbohydrate-active enzymes is difficult due to low sequence identity. However, similar enzymes often share a few short motifs, e.g., around the active site, even when the overall sequences are very different. To exploit this notion for functional prediction of carbohydrate-active enzymes, we developed a simple algorithm, peptide pattern recognition (PPR), that can divide proteins into groups of sequences that share a set of short conserved sequences. When this method was used on 118 glycoside hydrolase 5 proteins with 9% average pairwise identity and representing four characterized enzymatic functions, 97% of the proteins were sorted into groups correlating with their enzymatic activity. Furthermore, we analyzed 8,138 glycoside hydrolase 13 proteins including 204 experimentally characterized enzymes with 28 different functions. There was a 91% correlation between group and enzyme activity. These results indicate that the function of carbohydrate-active enzymes can be predicted with high precision by finding short, conserved motifs in their sequences. The glycoside hydrolase 61 family is important for fungal biomass conversion, but only a few proteins of this family have been functionally characterized. Interestingly, PPR divided 743 glycoside hydrolase 61 proteins into 16 subfamilies useful for targeted investigation of the function of these proteins and pinpointed three conserved motifs with putative importance for enzyme activity. Furthermore, the conserved sequences were useful for cloning of new, subfamily-specific glycoside hydrolase 61 proteins from 14 fungi. In conclusion, identification of conserved sequence motifs is a new approach to sequence analysis that can predict carbohydrate-active enzyme functions with high precision.
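The grouping idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' PPR implementation. The peptide length `k`, the conservation cutoff `min_seqs`, and the linking rule are assumptions chosen for illustration: two sequences join the same group when they share at least `min_shared` conserved peptides, and groups are formed by single linkage. The sequences are invented toy data.

```python
from collections import Counter
from itertools import combinations


def peptides(seq: str, k: int) -> set[str]:
    """All distinct peptides of length k in a protein sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def conserved_peptides(seqs: dict[str, str], k: int, min_seqs: int) -> set[str]:
    """Peptides that occur in at least `min_seqs` different sequences."""
    counts = Counter()
    for seq in seqs.values():
        counts.update(peptides(seq, k))  # each peptide counted once per sequence
    return {p for p, n in counts.items() if n >= min_seqs}


def group_by_shared_peptides(seqs: dict[str, str], k: int = 4,
                             min_seqs: int = 2, min_shared: int = 2) -> list[list[str]]:
    """Link sequences sharing >= min_shared conserved peptides (assumed rule)
    and return the connected components as sorted groups."""
    conserved = conserved_peptides(seqs, k, min_seqs)
    profile = {name: peptides(s, k) & conserved for name, s in seqs.items()}
    parent = {name: name for name in seqs}  # union-find forest

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        if len(profile[a] & profile[b]) >= min_shared:
            parent[find(a)] = find(b)

    groups: dict[str, list[str]] = {}
    for name in seqs:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())


# Toy sequences (invented): a1/a2 share TWGH and WGHD, b1/b2 share NEPR and EPRY.
toy = {
    "a1": "MKTWGHDLLKR",
    "a2": "PPTWGHDQQ",
    "b1": "AANEPRYSS",
    "b2": "CCNEPRYTT",
    "c1": "VVVVVV",
}
print(group_by_shared_peptides(toy))  # [['a1', 'a2'], ['b1', 'b2'], ['c1']]
```

Because the grouping relies only on exact shared peptides rather than a full alignment, sequences with low overall identity can still fall into the same group if they share a few motifs. This is the property the abstract exploits.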

Figures

Fig 1

Correlation of PPR stringency and peptide length with correct prediction of GH5 protein function. The rate of correct function prediction for 118 GH5 proteins at each stringency (cutoff/number of peptides) was calculated as the average of the prediction rates obtained in PPR analyses with peptide lengths of 3, 4, 5, 6, 8, and 10 amino acids. Likewise, correct prediction rates over all stringencies were calculated for each peptide length, as indicated.
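The averaging in Fig 1 can be sketched as follows. The scoring rule is an assumption and is not taken from the paper: a protein is counted as correctly predicted when its known activity equals the most common activity in its PPR group. The function names, protein IDs, and groupings are hypothetical. To trace a curve like the one in the figure, one would call `rate_averaged_over_lengths` once per stringency setting.

```python
from collections import Counter
from statistics import mean


def correct_prediction_rate(groups: list[list[str]], activity: dict[str, str]) -> float:
    """Fraction of proteins whose known activity equals the majority activity
    of their group (assumed scoring rule). Groups must be non-empty."""
    correct = total = 0
    for group in groups:
        counts = Counter(activity[p] for p in group)
        correct += counts.most_common(1)[0][1]  # proteins agreeing with the majority
        total += len(group)
    return correct / total


def rate_averaged_over_lengths(groups_by_length: dict[int, list[list[str]]],
                               activity: dict[str, str]) -> float:
    """Mean prediction rate over PPR runs made with different peptide lengths
    at one stringency setting."""
    return mean(correct_prediction_rate(g, activity) for g in groups_by_length.values())


# Hypothetical GH5 proteins with known EC numbers.
activity = {"p1": "EC 3.2.1.4", "p2": "EC 3.2.1.4",
            "p3": "EC 3.2.1.78", "p4": "EC 3.2.1.78"}
# Hypothetical PPR groupings obtained with peptide lengths 3 and 4.
groups_by_length = {
    3: [["p1", "p2", "p3"], ["p4"]],   # 3 of 4 correct
    4: [["p1", "p2"], ["p3", "p4"]],   # 4 of 4 correct
}
print(rate_averaged_over_lengths(groups_by_length, activity))  # 0.875
```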

Fig 2

Analysis of the proteins by EC number and cross-comparison of the peptides in each GH5 subfamily.

Fig 3

Correlation between PPR subfamilies and CAZy subfamilies of 5,457 GH13 proteins. The proteins in each PPR subfamily are cross-compared with the same proteins in the CAZy subfamilies. The PPR subfamilies were ordered to place the highest numbers of shared proteins along the diagonal of the diagram.
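A cross-comparison like the one in Fig 3 amounts to a contingency table of shared proteins plus a row ordering. The sketch below assumes a greedy pairing rule (largest shared count first). This is a heuristic and may not give the largest possible diagonal. An exact ordering can be obtained by solving the assignment problem, for example with `scipy.optimize.linear_sum_assignment(..., maximize=True)`. The protein IDs and subfamily labels are toy data.

```python
from collections import Counter


def cross_table(ppr_of: dict[str, str], cazy_of: dict[str, str]) -> Counter:
    """Count proteins shared by each (PPR subfamily, CAZy subfamily) pair,
    using only proteins assigned in both classifications."""
    shared = sorted(ppr_of.keys() & cazy_of.keys())  # sorted for reproducible order
    return Counter((ppr_of[p], cazy_of[p]) for p in shared)


def order_for_diagonal(table: Counter, cazy_order: list[str]) -> list[str]:
    """Greedy heuristic: pair PPR and CAZy subfamilies by descending shared count,
    then list PPR subfamilies in the column order of their partners."""
    partner: dict[str, str] = {}
    used: set[str] = set()
    for (ppr, cazy), _n in sorted(table.items(), key=lambda kv: (-kv[1], kv[0])):
        if ppr not in partner and cazy not in used:
            partner[ppr] = cazy
            used.add(cazy)
    paired = sorted(partner, key=lambda ppr: cazy_order.index(partner[ppr]))
    unpaired = sorted({ppr for ppr, _ in table} - set(partner))
    return paired + unpaired


# Toy data: protein IDs and subfamily labels are invented.
ppr_of = {"g1": "P1", "g2": "P1", "g3": "P2", "g4": "P2", "g5": "P2", "g6": "P3"}
cazy_of = {"g1": "GH13_1", "g2": "GH13_1", "g3": "GH13_5",
           "g4": "GH13_5", "g5": "GH13_1", "g6": "GH13_1"}
table = cross_table(ppr_of, cazy_of)
cazy_order = ["GH13_5", "GH13_1"]
row_order = order_for_diagonal(table, cazy_order)
for ppr in row_order:
    print(ppr, [table[(ppr, c)] for c in cazy_order])
# P2 [2, 1]
# P1 [0, 2]
# P3 [0, 1]
```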

Fig 4

Cross-comparison and distribution of the conserved hexapeptides in the GH61 sequences. For each subfamily, the hexapeptide distribution was calculated as the number of hexapeptides mapping to each 20-amino-acid interval, as described in Materials and Methods. The accumulated hexapeptide frequency (vertical axis) is the sum of the distributions of all subfamilies in each 20-amino-acid interval. The horizontal axis shows the amino acid intervals.
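The interval counting described for Fig 4 can be sketched as below. Materials and Methods is not reproduced here, so the details are assumptions. Each conserved hexapeptide occurrence is assigned to the 20-residue interval containing its first residue, and positions are counted on one representative sequence per subfamily. The sequences and hexapeptides are invented.

```python
from collections import Counter

BIN_WIDTH = 20  # interval width in amino acids, as in Fig 4


def hexapeptide_distribution(seq: str, hexapeptides: set[str],
                             bin_width: int = BIN_WIDTH) -> Counter:
    """Count conserved-hexapeptide occurrences per interval, binning each
    occurrence by its 0-based start position."""
    dist = Counter()
    for i in range(len(seq) - 5):
        if seq[i:i + 6] in hexapeptides:
            dist[i // bin_width] += 1
    return dist


def accumulated_frequency(distributions: list[Counter]) -> Counter:
    """Sum the per-subfamily distributions interval by interval."""
    total = Counter()
    for dist in distributions:
        total.update(dist)  # Counter.update adds counts
    return total


# Invented representative sequences and conserved hexapeptides for two subfamilies.
sub1 = hexapeptide_distribution("M" * 5 + "HGFVQN" + "M" * 20 + "WFKIDQ",
                                {"HGFVQN", "WFKIDQ"})
sub2 = hexapeptide_distribution("G" * 22 + "HGYVSN", {"HGYVSN"})
acc = accumulated_frequency([sub1, sub2])
for interval, n in sorted(acc.items()):
    start = interval * BIN_WIDTH + 1  # 1-based residue numbering for the axis label
    print(f"{start}-{start + BIN_WIDTH - 1}: {n}")
# 1-20: 1
# 21-40: 2
```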

Fig 5

Mapping of conserved amino acid residues onto the surface of GH61E (Protein Data Bank [PDB] accession number 3EII). Conserved amino acids of GH61 subfamily 3 that map to the surface of the GH61E structure (12) are shown in yellow. Also shown is an alignment of GH61E with the conserved amino acid residues of the GH61 subfamilies in the regions depicted on the GH61E surface. Residues that are highly conserved between subfamilies are shown in red. Numbering above the alignment indicates amino acid positions relative to the start residue of GH61E.

Fig 6

Amplification of a GH61 gene with subfamily 1-specific primers. PCR was performed on Chaetomium thermophilum DNA with each of the 6 possible combinations of the 5 primers designed for GH61 subfamily 1, and the products were analyzed on a 2% agarose gel. Numbers and bars to the right of the gel indicate the migration of the DNA size marker bands, and numbers above the lanes indicate the primer combination used.

Fig 7

Characterization of the PCR products from 14 fungi of the orders Sordariales (S), Helotiales (H), and Eurotiales (E). The number of conserved peptides of the targeted GH61 subfamily was counted in each PCR product, and all sequences were aligned. Sequence derived from the primers was discarded before analysis.
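Counting conserved peptides in a PCR product after discarding primer-derived sequence could look like the sketch below. It assumes that the product has already been translated in the correct reading frame and that the numbers of residues encoded by the forward and reverse primers are known. These numbers are passed as `fwd_primer_aa` and `rev_primer_aa`, which are hypothetical parameters. The product sequence and peptides are invented.

```python
def count_conserved_peptides(product_aa: str, conserved: set[str],
                             fwd_primer_aa: int, rev_primer_aa: int) -> int:
    """Number of distinct conserved peptides found in a translated PCR product
    after discarding the residues encoded by the forward and reverse primers."""
    # Slice with an explicit end index so that rev_primer_aa == 0 keeps the full tail.
    core = product_aa[fwd_primer_aa:len(product_aa) - rev_primer_aa]
    return sum(1 for pep in conserved if pep in core)


# Invented translated product: 6 forward-primer residues, 12 internal residues,
# 5 reverse-primer residues.
product = "HGFVQN" + "KLTWFKIDQPRS" + "YGHGY"
conserved = {"HGFVQN", "WFKIDQ", "QPRSYG", "AAAAAA"}
print(count_conserved_peptides(product, conserved, 6, 5))  # 1 (only WFKIDQ)
print(count_conserved_peptides(product, conserved, 0, 0))  # 3 (primer-derived hits included)
```

Trimming matters. Without it, peptides lying partly or wholly within primer-derived sequence would be counted even though they reflect the primer design rather than the amplified gene.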

References

    1. Kim C, Lee B. 2007. Accuracy of structure-based sequence alignment of automatic methods. BMC Bioinformatics 8: 355. doi:10.1186/1471-2105-8-355
    2. Huang W, Umbach DM, Li L. 2006. Accurate anchoring alignment of divergent sequences. Bioinformatics 22: 29–34.
    3. Cantarel BL, Coutinho PM, Rancurel C, Bernard T, Lombard V, Henrissat B. 2009. The Carbohydrate-Active EnZymes database (CAZy): an expert resource for glycogenomics. Nucleic Acids Res. 37: D233–D238. doi:10.1093/nar/gkn663
    4. Henrissat B, Davies G. 1997. Structural and sequence-based classification of glycoside hydrolases. Curr. Opin. Struct. Biol. 7: 637–644.
    5. Henrissat B. 1991. A classification of glycosyl hydrolases based on amino acid sequence similarities. Biochem. J. 280(Part 2): 309–316.
