NRPquest: Coupling Mass Spectrometry and Genome Mining for Nonribosomal Peptide Discovery
J Nat Prod. 2014 Aug 22;77(8):1902-9.
doi: 10.1021/np500370c. Epub 2014 Aug 12.
- PMID: 25116163
- PMCID: PMC4143176
- DOI: 10.1021/np500370c
Hosein Mohimani et al. J Nat Prod. 2014.
Abstract
Nonribosomal peptides (NRPs) such as vancomycin and daptomycin are among the most effective antibiotics. While NRPs are biomedically important, the computational techniques for sequencing these peptides are still in their infancy. The recent emergence of mass spectrometry techniques for NRP analysis (capable of sequencing an NRP from small amounts of nonpurified material) revealed an enormous diversity of NRPs. However, as many NRPs have nonlinear structure (e.g., cyclic or branched-cyclic peptides), the standard de novo sequencing tools (developed for linear peptides) are not applicable to NRP analysis. Here, we introduce the first NRP identification algorithm, NRPquest, that performs mutation-tolerant and modification-tolerant searches of spectral data sets against a database of putative NRPs. In contrast to previous studies aimed at NRP discovery (that usually report very few NRPs), NRPquest revealed nearly a hundred NRPs (including unknown variants of previously known peptides) in a single study. This result indicates that NRPquest can potentially make MS-based NRP identification as robust as the identification of linear peptides in traditional proteomics.
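A minimal sketch of why tools built for linear peptides do not carry over to cyclic NRPs: a linear peptide of n residues breaks into n-1 prefix fragments, whereas a cyclic peptide can open at any bond and so yields n(n-1) contiguous fragments. The function names are hypothetical, ion-type offsets (proton, water) are deliberately ignored, and the residue masses are the standard monoisotopic values for Ala, Val, Leu, and Gly. This is an illustration of the combinatorics only, not the NRPquest scoring model.

```python
from typing import List

def linear_prefix_masses(residues: List[float]) -> List[float]:
    """Masses of the n-1 proper prefixes of a linear peptide (no ion offsets)."""
    masses, total = [], 0.0
    for residue_mass in residues[:-1]:
        total += residue_mass
        masses.append(total)
    return masses

def cyclic_fragment_masses(residues: List[float]) -> List[float]:
    """Masses of all n*(n-1) contiguous fragments of a cyclic peptide."""
    n = len(residues)
    doubled = residues + residues  # unrolls the ring so fragments can wrap around
    masses = []
    for start in range(n):
        total = 0.0
        for length in range(1, n):  # proper fragments only, lengths 1..n-1
            total += doubled[start + length - 1]
            masses.append(total)
    return sorted(masses)

# Monoisotopic residue masses: Ala, Val, Leu, Gly
peptide = [71.03711, 99.06841, 113.08406, 57.02146]
linear = linear_prefix_masses(peptide)
cyclic = cyclic_fragment_masses(peptide)
print(len(linear), len(cyclic))
```

For this four-residue example the linear model predicts 3 fragment masses and the cyclic model 12, which is the gap a linear-only sequencing tool cannot account for.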
Figures
Figure 1
NRPquest pipeline starts with mining the microbial genome for putative NRPs using standard tools such as NRPSpredictor2 and constructing a database of putative NRPs. In the green rectangle, the results of NRPSpredictor2 are illustrated for Bacillus subtilis subsp. subtilis NCIMB 3610. This strain has two NRPS gene clusters, which according to NRPSpredictor2 produce two surfactins (7 amino acids each) and one plipastatin (10 amino acids). Adenylation domains are shown in red, condensation domains in blue, PCP domains in green, and thioesterase domains in light blue. Two blind modifications (with arbitrary offsets) are added to each NRP, and different possible structures (linear/cyclic/branched-cyclic) are considered (blue rectangle), resulting in ∼134 million modified peptides. The red rectangle illustrates PSMs formed between each spectrum and each putative modified NRP with feasible mass difference. PSMs are scored and their _p_-values are computed using MS-DPR. MS-DPR approximates the probability distribution of scores of PSMs formed by a random peptide and the spectrum and further derives the _p_-value as the area under the extreme tail of the distribution. Spectra are further analyzed by spectral networks to enlarge the set of identified statistically significant PSMs. The yellow rectangle illustrates a spectral network of surfactins. The red arrows in the network illustrate how annotations are propagated from a node with low _p_-value 2.4 × 10⁻¹⁰ (precursor m/z 1022.7 Da) to nodes with higher _p_-values (e.g., a node with precursor m/z 1030.7 Da), thus rescuing these nodes from being discarded as statistically insignificant.
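The definition of a PSM _p_-value as a tail area can be sketched with a naive Monte Carlo estimate. This is explicitly not MS-DPR: plain sampling cannot resolve _p_-values anywhere near 10⁻¹⁰ without an impractical number of trials, which is the regime the caption says MS-DPR is designed for. The shared-peak score, the tolerance, the linear-prefix fragment model, and all function names are assumptions made for illustration.

```python
import random
from typing import List, Sequence

def prefix_masses(residues: Sequence[float]) -> List[float]:
    """Proper prefix masses of a linear peptide (assumed toy fragment model)."""
    masses, total = [], 0.0
    for residue_mass in residues[:-1]:
        total += residue_mass
        masses.append(total)
    return masses

def shared_peak_count(theoretical: Sequence[float], peaks: Sequence[float],
                      tol: float = 0.02) -> int:
    """Toy PSM score: number of theoretical masses matched by some peak."""
    return sum(1 for t in theoretical if any(abs(t - p) <= tol for p in peaks))

def empirical_p_value(observed: float, null_scores: Sequence[float]) -> float:
    """Fraction of null scores >= observed, with the usual +1 correction."""
    exceed = sum(1 for s in null_scores if s >= observed)
    return (exceed + 1) / (len(null_scores) + 1)

def monte_carlo_p_value(residues: Sequence[float], peaks: Sequence[float],
                        alphabet: Sequence[float], trials: int = 1000,
                        seed: int = 0) -> float:
    """Score random peptides of the same length against the same spectrum."""
    rng = random.Random(seed)
    observed = shared_peak_count(prefix_masses(residues), peaks)
    null_scores = []
    for _ in range(trials):
        decoy = [rng.choice(alphabet) for _ in residues]
        null_scores.append(shared_peak_count(prefix_masses(decoy), peaks))
    return empirical_p_value(observed, null_scores)

# Monoisotopic residue masses: Gly, Ala, Val, Leu
alphabet = [57.02146, 71.03711, 99.06841, 113.08406]
candidate = [71.03711, 99.06841, 113.08406, 57.02146]
spectrum = prefix_masses(candidate)  # idealized, noise-free spectrum
p = monte_carlo_p_value(candidate, spectrum, alphabet)
```

With 1000 trials the smallest value this estimator can return is 1/1001, so it only illustrates what the tail area means, not how to compute it at the significance levels quoted in the caption.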
Figure 2
Peptide network (a), spectral network (b), and annotations of nodes in the spectral networks (c) in the case of tyrocidines. The multitag algorithm for rescoring PSMs starts from a node with a known annotation in the spectral network and propagates annotations from known to unknown peptides through the edges in the network. The peptide network and spectral network of the tyrocidines are shown in parts (a) and (b). In part (c), annotations of each node in the spectral network are shown. Note that the nine nodes in the spectral network correspond to nine singly charged tyrocidines shown in Table S2. The spectral network revealed two novel tyrocidine variants at masses 1294.7 Da (node 6) and 1338.7 Da (node 9).
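The propagation idea in this caption can be sketched as a breadth-first walk over the spectral network: starting from one annotated node, each reachable node inherits the family label together with its precursor mass shift relative to the seed. This is a simplified stand-in, not the multitag rescoring algorithm itself; the node names, masses, and the `propagate_annotations` function are hypothetical toy values chosen for readability.

```python
from collections import deque
from typing import Dict, Iterable, Tuple

def propagate_annotations(masses: Dict[str, float],
                          edges: Iterable[Tuple[str, str]],
                          seed_node: str,
                          seed_label: str) -> Dict[str, Tuple[str, float]]:
    """Label every node connected to the seed with (family, mass shift vs. seed)."""
    adjacency = {node: [] for node in masses}
    for a, b in edges:
        adjacency[a].append(b)
        adjacency[b].append(a)

    annotations = {seed_node: (seed_label, 0.0)}
    queue = deque([seed_node])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if neighbor not in annotations:
                shift = round(masses[neighbor] - masses[seed_node], 4)
                annotations[neighbor] = (seed_label, shift)
                queue.append(neighbor)
    return annotations

# Hypothetical toy network: three connected nodes and one isolated node.
toy_masses = {"a": 1000.0, "b": 1014.0, "c": 1028.0, "d": 900.0}
toy_edges = [("a", "b"), ("b", "c")]
result = propagate_annotations(toy_masses, toy_edges, "a", "family X")
```

Node "d" has no edge to the seed and therefore stays unannotated, mirroring how separate connected components are not reached by propagation.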
Figure 3
Spectral networks of six NRP families identified by NRPquest: (a) daptomycin, (b) arylomycin, (c) pristinamycin, (d) plipastatin, (e) surfactin, and (f) tyrocidine (Table S1). Only spectra forming the most statistically significant PSMs are shown. Each node in these spectral networks may represent either a single spectrum or a group of very similar spectra (with similar precursor masses) compressed into a single node to simplify the network (in the latter case, the m/z of a cluster is the average of m/z of spectra in the cluster). The thickness of the edges indicates the level of similarity between the nodes in the spectral networks. Two connected components (in the case of plipastatin and tyrocidine) correspond to two different charge states (currently, the spectral alignment algorithm may fail to connect spectra from related peptides with different charges by an edge).
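The node compression described in this caption can be sketched as grouping spectra with close precursor m/z values and reporting the average m/z per group. This is a simplification under stated assumptions: the caption also requires the spectra themselves to be very similar, which the sketch does not check, and the tolerance and the `cluster_precursors` name are hypothetical.

```python
from typing import List, Sequence, Tuple

def cluster_precursors(mz_values: Sequence[float],
                       tol: float = 0.5) -> List[Tuple[float, int]]:
    """Greedy 1-D clustering: consecutive sorted m/z values within tol share a node.

    Returns (average m/z, number of spectra) for each cluster.
    """
    clusters: List[List[float]] = []
    for mz in sorted(mz_values):
        if clusters and mz - clusters[-1][-1] <= tol:
            clusters[-1].append(mz)
        else:
            clusters.append([mz])
    return [(sum(c) / len(c), len(c)) for c in clusters]

# Hypothetical precursor m/z values
nodes = cluster_precursors([1000.4, 1014.0, 1000.0, 1000.2])
```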
Grants and funding
- GM097509/GM/NIGMS NIH HHS/United States
- 1-P41-RR024851-01/RR/NCRR NIH HHS/United States
- R01 GM086283/GM/NIGMS NIH HHS/United States
- P41 GM103484/GM/NIGMS NIH HHS/United States
- R01 GM097509/GM/NIGMS NIH HHS/United States
- P41 RR024851/RR/NCRR NIH HHS/United States
- GM086283/GM/NIGMS NIH HHS/United States