Biological systems discovery in silico: radical S-adenosylmethionine protein families and their target peptides for posttranslational modification

Daniel H Haft et al. J Bacteriol. 2011 Jun.

Abstract

Data mining methods in bioinformatics and comparative genomics commonly rely on working definitions of protein families from prior computation. Partial phylogenetic profiling (PPP), by contrast, optimizes family sizes during its searches for the cooccurring protein families that serve different roles in the same biological system. In a large-scale investigation of the incredibly diverse radical S-adenosylmethionine (SAM) enzyme superfamily, PPP aided in building a collection of 68 TIGRFAMs hidden Markov models (HMMs) that define nonoverlapping and functionally distinct subfamilies. Many identify radical SAM enzymes as molecular markers for multicomponent biological systems; HMMs defining their partner proteins also were constructed. Newly found systems include five groupings of protein families in which at least one marker is a radical SAM enzyme while another, encoded by an adjacent gene, is a short peptide predicted to be its substrate for posttranslational modification. The most prevalent, in over 125 genomes, featuring a peptide that we designate SCIFF (six cysteines in forty-five residues), is conserved throughout the class Clostridia, a distribution inconsistent with putative bacteriocin activity. A second novel system features a tandem pair of putative peptide-modifying radical SAM enzymes associated with a highly divergent family of peptides in which the only clearly conserved feature is a run of His-Xaa-Ser repeats. A third system pairs a radical SAM domain peptide maturase with selenocysteine-containing targets, suggesting a new biological role for selenium. These and several additional novel maturases that cooccur with predicted target peptides share a C-terminal additional 4Fe4S-binding domain with PqqE, the subtilosin A maturase AlbA, and the predicted mycofactocin and Nif11-class peptide maturases as well as with activators of anaerobic sulfatases and quinohemoprotein amine dehydrogenases. Radical SAM enzymes with this additional domain, as detected by TIGR04085, significantly outnumber lantibiotic synthases and cyclodehydratases combined in reference genomes while being highly enriched for members whose apparent targets are small peptides. Interpretation of comparative genomics evidence suggests unexpected (nonbacteriocin) roles for natural products from several of these systems.
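
As a rough illustration of how a search might optimize family size against a phylogenetic profile, the sketch below walks down a ranked hit list and keeps the cutoff depth whose genome membership is least likely under a binomial null. The function names, the binomial scoring, and the toy genome labels are all assumptions made for illustration; this is not the authors' PPP implementation.

```python
from math import comb


def binomial_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def best_cutoff(ranked_hit_genomes, profile, n_genomes):
    """Pick the hit-list depth that best matches a profile (hypothetical helper).

    ranked_hit_genomes: genome label of each hit, best hit first.
    profile: set of genomes in which the biological system is present.
    n_genomes: total number of genomes searched.
    """
    p = len(profile) / n_genomes  # chance that a random genome is in the profile
    seen = set()
    best_depth, best_score = 0, 1.0
    for depth, genome in enumerate(ranked_hit_genomes, start=1):
        seen.add(genome)  # count each genome once, even with several hits
        k = len(seen & profile)
        score = binomial_tail(k, len(seen), p)
        if score < best_score:
            best_depth, best_score = depth, score
    return best_depth, best_score


# Toy input (made up): the first three hits fall inside the profile.
ranked = ["g1", "g2", "g3", "g7", "g8"]
depth, score = best_cutoff(ranked, {"g1", "g2", "g3"}, n_genomes=10)
```

In this toy case the score stops improving once hits from genomes outside the profile start to appear, so the chosen family boundary falls after the third hit.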

Figures

Fig. 1.

The SCIFF system multiple sequence alignment and genomic regions. (A) The TIGRFAMs seed alignment TIGR03973 for the SCIFF (_s_ix _c_ysteines _i_n _f_orty-_f_ive residues) protein is shown shaded according to the degree of sequence identity in each column. Sequences more than 80% identical were removed. The six cysteines that are universal or nearly so are indicated with arrows. A run of 10 residues, SCQSACKTSC, is invariant except for two sequences with one conservative substitution each. The first, third, fourth, and fifth cysteines are flanked on one or both sides by amino acids with small side chains (Gly, Ser, or Ala), as is common for posttranslational modifications that cross-link cysteines to other residues during peptide maturation. The species of origin for the sequences shown, in order from top to bottom, are Clostridium perfringens ATCC 13124, Clostridium novyi NT, Thermosinus carboxydivorans Nor1, Desulfotomaculum reducens MI-1, Caldicellulosiruptor saccharolyticus DSM 8903, Clostridium sp. strain L2-50, Faecalibacterium prausnitzii M21/2, Paenibacillus larvae subsp. larvae BRL-230010, Clostridium scindens ATCC 35704, Epulopiscium sp. ‘N.t. morphotype B’, Anaerofustis stercorihominis DSM 17244, “Candidatus Desulforudis audaxviator” MP104C, Natranaerobius thermophilus JW/NM-WN-LF, Eubacterium biforme DSM 3989, Dethiobacter alkaliphilus AHT 1, Anaerococcus lactolyticus ATCC 51172, Acidaminococcus sp. strain D21, Shuttleworthia satelles DSM 14600, Selenomonas flueggei ATCC 43531, Eubacterium saphenum ATCC 49989, Desulfotomaculum acetoxidans DSM 771, Dialister invisus DSM 15470, Ammonifex degensii KC4, Subdoligranulum variabile DSM 15176, Clostridium hathewayi DSM 13479, Thermoanaerobacter italicus Ab9, Ethanoligenens harbinense YUAN-3, Filifactor alocis ATCC 35896, and Carboxydothermus hydrogenoformans Z-2901. 
(B) Genome region figure showing the SCIFF precursor and its maturase (red) appearing in a housekeeping gene context with the queuosine tRNA modification genes queA and tgt (green) and the Sec system subunit genes yajC, secD, and secF (black). In some species, an additional conserved hypothetical protein (c.h.p.) is also present (gray).
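
The removal of sequences more than 80% identical, mentioned in (A), can be sketched as a greedy filter over aligned sequences. This is a minimal illustration, assuming identity is counted over columns where neither sequence has a gap; the function names and the toy peptides are hypothetical and are not taken from TIGR03973.

```python
def percent_identity(a, b):
    """Percent identity over aligned columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def filter_redundant(aligned, threshold=80.0):
    """Keep a sequence only if it is at most `threshold` % identical to all kept ones."""
    kept = []
    for name, seq in aligned:
        if all(percent_identity(seq, other) <= threshold for _, other in kept):
            kept.append((name, seq))
    return kept


# Toy aligned peptides (made up): "b" is 90% identical to "a" and is dropped.
toy = [("a", "ACDEFGHIKL"), ("b", "ACDEFGHIKV"), ("c", "ACDEFWWWWW")]
kept_names = [name for name, _ in filter_redundant(toy)]
```

The greedy order matters: whichever sequence of a near-identical pair comes first is the one retained.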

Fig. 2.

Multiple sequence alignment of His-Xaa-Ser proteins. Sequences were aligned by MUSCLE and minimally hand edited at sites from the first His-Xaa-Ser repeat to the C terminus. The three shortest sequences are shown at their full lengths, although others have additional C-terminal sequence not shown. Three sequences, identified by genus names, were not previously identified as protein-coding features. The sequences shown, in order from top to bottom, are from Vibrio parahaemolyticus RIMD 2210633, Stigmatella aurantiaca DW4/3-1, Rhodobacter sp. strain SW2, Blautia hansenii DSM 20583, Phenylobacterium zucineum HLK1, Pseudomonas fluorescens SBW25, Bacteroides sp. strain D2, Victivallis vadensis ATCC BAA-548, Ralstonia eutropha H16, Desulfovibrio vulgaris strain Miyazaki F, “Candidatus Azobacteroides pseudotrichonymphae” genomovar CFP2, Aeromonas salmonicida subsp. salmonicida A449, and Opitutaceae bacterium TAV2. Member sequences occur in close proximity to paired radical SAM enzymes, one each from families TIGR03977 and TIGR03978.
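
A run of His-Xaa-Ser repeats can be located with a simple pattern match. The sketch below assumes the repeats are contiguous and that Xaa is any single residue written as an uppercase letter; the function name and the example peptide are invented for illustration.

```python
import re

# Two or more back-to-back His-Xaa-Ser triplets (one-letter codes H, any, S).
HXS_RUN = re.compile(r"(?:H[A-Z]S){2,}")


def find_hxs_runs(seq, min_repeats=2):
    """Return (start, n_repeats) for each run of contiguous His-Xaa-Ser repeats."""
    runs = []
    for match in HXS_RUN.finditer(seq):
        n_repeats = len(match.group()) // 3
        if n_repeats >= min_repeats:
            runs.append((match.start(), n_repeats))
    return runs


# Toy peptide (made up), with four repeats starting at 0-based position 4.
runs = find_hxs_runs("MKTLHASHGSHQSHTSAA")
```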

Fig. 3.

Multiple sequence alignment and genomic region view of selenobacteriocin precursor peptides. (A) Multiple alignment. The letter U represents UGA (normally a stop) codon at the start of a bacterial selenocysteine insertion element (SECIS) translated as selenocysteine (SeCys), the 21st amino acid. The two alignment columns that contain at least one U are indicated with arrows; all non-SeCys residues in those columns are Cys. Model TIGR04081 describes sequences up to the column immediately past the first selenocysteine-containing column. Sequences, in order from top to bottom, include putative (seleno)bacteriocins from Geobacter sulfurreducens PCA (extended), Geobacter sp. strain M18 (extended), Chlorobium phaeobacteroides BS1, Prosthecochloris aestuarii DSM 271, Desulfococcus oleovorans Hxd3 (no gene shown in GenBank), Desulfomicrobium baculatum DSM 4028 (extended), Desulfohalobium retbaense DSM 5692 (extended), Desulfurivibrio alkaliphilus AHT2, Desulfonatronospira thiodismutans ASO3-1, and Geobacter lovleyi SZ (extended). (B) Corrected genomic region for the GSU_1558/GSU_1559 and GSU_1560 genes from Geobacter sulfurreducens PCA. Diagonal arrows indicate the positions of the two predicted SeCys residues. Underneath the arrow diagram are the identified selenocysteine insertion elements, or SECIS. The SECIS elements begin with UGA codons that are translated as SeCys and are 80% identical through their first 30 bases.
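
The column property described in (A), that every column containing U has Cys in all other sequences, can be checked mechanically. The following is a small sketch under the assumption that gap characters are ignored; the function name and the three-sequence toy alignment are hypothetical and do not reproduce the figure.

```python
def selenocysteine_columns(alignment):
    """Return (column, others_all_cys) for each column with at least one U.

    alignment: list of equal-length aligned sequences, with '-' for gaps.
    """
    results = []
    for col in range(len(alignment[0])):
        residues = [seq[col] for seq in alignment]
        if "U" in residues:
            # Ignore gaps and the selenocysteines themselves.
            others = [r for r in residues if r not in ("U", "-")]
            results.append((col, all(r == "C" for r in others)))
    return results


# Toy alignment (made up): columns 2 and 4 each hold one U, the rest are Cys.
toy_alignment = ["MKCAUG", "MKCACG", "MRUACG"]
columns = selenocysteine_columns(toy_alignment)
```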

Fig. 4.

A ribosomal peptide natural product cassette in Clostridium botulinum A2 Kyoto. This six-gene cluster for a CLI_3235-type system shows two genes for which models were not constructed at the left and right (black). At the left is a transporter, CLM_3254, with both the ATP-binding and permease domains of ABC transporters. At the right is a peptidase, CLM_3249. The central four genes (red) are a locally cysteine-rich putative RTNP precursor of family TIGR04065, a radical SAM enzyme of family TIGR04068, a conserved hypothetical protein described by family TIGR04066, and an acyl carrier protein homolog described by TIGR04069. The related cassette in Clostridium botulinum F Langeland contains the same genes in the same order (plus one additional gene). Cassettes are flanked by unrelated genes in the genomes of these two C. botulinum strains.
