Automated Pipeline for De Novo Metabolite Identification Using Mass-Spectrometry-Based Metabolomics
Related papers
The Analyst, 2009
The chemical identification of mass spectrometric signals in metabolomic applications is important to provide conversion of analytical data to biological knowledge about metabolic pathways. The complexity of electrospray mass spectrometric data acquired from a range of samples (serum, urine, yeast intracellular extracts, yeast metabolic footprints, placental tissue metabolic footprints) has been investigated and has defined the frequency of different ion types routinely detected. Although some ion types were expected (protonated and deprotonated peaks, isotope peaks, multiply charged peaks), others were not expected (sodium formate adduct ions). In parallel, the Manchester Metabolomics Database (MMD) has been constructed with data from genome scale metabolic reconstructions, HMDB, KEGG, Lipid Maps, BioCyc and DrugBank to provide knowledge on 42,687 endogenous and exogenous metabolite species. The combination of accurate mass data for a large collection of metabolites, theoretical isotope abundance data and knowledge of the different ion types detected provided a greater number of electrospray mass spectrometric signals which were putatively identified and with greater confidence in the samples studied. To provide definitive identification, metabolite-specific mass spectral libraries for UPLC-MS and GC-MS have been constructed for 1,065 commercially available authentic standards. The MMD data are available at http://dbkgroup.org/MMD/
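The idea of combining accurate mass with knowledge of the ion types detected can be illustrated with a minimal sketch. It converts an observed m/z to a neutral mass under each plausible ion type and looks for library entries within a ppm tolerance. The adduct table, the three-compound library, the function name `candidates` and the 5 ppm default are assumptions made for illustration; they are not taken from the MMD or from the paper.

```python
# Monoisotopic masses in Da. The adduct table and the tiny library below are
# illustrative assumptions, not the contents of any real database.
PROTON = 1.007276
SODIUM_ION = 22.989221  # Na minus one electron

ADDUCTS = {
    "[M+H]+": PROTON,
    "[M+Na]+": SODIUM_ION,
    "[M-H]-": -PROTON,
}

LIBRARY = {
    "glucose": 180.06339,
    "citric acid": 192.02700,
    "tryptophan": 204.08988,
}


def candidates(mz, polarity, ppm=5.0):
    """Return (name, adduct, ppm_error) for library entries matching mz.

    polarity is "+" or "-" and restricts which ion types are tried.
    """
    hits = []
    for adduct, shift in ADDUCTS.items():
        if not adduct.endswith(polarity):
            continue
        neutral = mz - shift  # undo the mass added or removed by the ion type
        for name, mass in LIBRARY.items():
            error = (neutral - mass) / mass * 1e6
            if abs(error) <= ppm:
                hits.append((name, adduct, round(error, 2)))
    return hits
```

For example, `candidates(203.0526, "+")` matches glucose only as a sodium adduct. A search restricted to protonated ions would miss that signal, which is why the ion-type inventory matters.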
JUMPm: A Tool for Large-Scale Identification of Metabolites in Untargeted Metabolomics
Metabolites
Metabolomics is increasingly important for biomedical research, but large-scale metabolite identification in untargeted metabolomics is still challenging. Here, we present Jumbo Mass spectrometry-based Program of Metabolomics (JUMPm) software, a streamlined software tool for identifying potential metabolite formulas and structures in mass spectrometry. During database search, the false discovery rate is evaluated by a target-decoy strategy, where the decoys are produced by breaking the octet rule of chemistry. We illustrated the utility of JUMPm by detecting metabolite formulas and structures from liquid chromatography coupled tandem mass spectrometry (LC-MS/MS) analyses of unlabeled and stable-isotope labeled yeast samples. We also benchmarked the performance of JUMPm by analyzing a mixed sample from a commercially available metabolite library in both hydrophilic and hydrophobic LC-MS/MS. These analyses confirm that metabolite identification can be significantly improved by estimat...
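A target-decoy strategy of the kind mentioned above can be sketched as follows. This is a generic illustration, not JUMPm's implementation: it assumes each database match carries a score and a flag saying whether it hit a decoy, and it estimates the false discovery rate at a score threshold as the ratio of decoy hits to target hits. The function names and the example scores are hypothetical.

```python
def counts_above(matches, threshold):
    """Count target and decoy matches scoring at or above threshold.

    matches is a list of (score, is_decoy) pairs.
    """
    targets = sum(1 for s, d in matches if s >= threshold and not d)
    decoys = sum(1 for s, d in matches if s >= threshold and d)
    return targets, decoys


def score_cutoff(matches, max_fdr=0.05):
    """Return the lowest score threshold with estimated FDR <= max_fdr.

    FDR is estimated as decoys / targets (an assumed, commonly used form).
    Returns None if no threshold qualifies.
    """
    for threshold in sorted({s for s, _ in matches}):
        targets, decoys = counts_above(matches, threshold)
        if targets > 0 and decoys / targets <= max_fdr:
            return threshold
    return None


# Hypothetical scored matches: (score, is_decoy)
example = [
    (9.1, False), (8.7, False), (8.0, False), (7.5, True),
    (7.2, False), (6.0, True), (5.5, False),
]
```

With the example data, a 5% limit gives a cutoff of 8.0, the lowest score above which no decoy remains, while a looser 25% limit admits matches down to 7.2.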
Review Met Ide Ann Metabolomics 20130321
Identification of metabolites is a major challenge in biological studies and relies in principle on mass spectrometry (MS) and nuclear magnetic resonance (NMR) methods. The increased sensitivity and stability of both NMR and MS systems have made dereplication of complex biological samples feasible. Metabolic databases can be of help in the identification process. Nonetheless, there is still a lack of adequate spectral databases that contain high quality spectra, but new developments in this area will assist in the (semi-)automated identification process in the near future. Here, we discuss new developments for the structural elucidation of low abundant metabolites present in complex sample matrices. We describe how a recently developed combination of high resolution MS multistage fragmentation (MSⁿ) and high resolution one dimensional (1D)-proton (¹H)-NMR of liquid chromatography coupled to solid phase extraction (LC-SPE) purified metabolites can circumvent the need for isolating extensive amounts of the compounds of interest to elucidate their structures. The LC-MS-SPE-NMR hardware configuration in conjunction with high quality databases facilitates complete structural elucidation of metabolites even at sub-microgram levels of compound in crude extracts. However, progress is still required to optimally exploit the power of an integrated MS and NMR approach. Especially, there is a need to improve and expand both MSⁿ and NMR spectral databases. Adequate and user-friendly software is required to assist in candidate selection based on the comparison of acquired MS and NMR spectral information with reference data. It is foreseen that these focal points will contribute to a better transfer and exploitation of structural information gained from diverse analytical platforms.
MyCompoundID: Using an Evidence-Based Metabolome Library for Metabolite Identification
Analytical Chemistry, 2013
Identification of unknown metabolites is a major challenge in metabolomics. Without the identities of the metabolites, the metabolome data generated from a biological sample cannot be readily linked with the proteomic and genomic information for studies in systems biology and medicine. We have developed a web-based metabolite identification tool (http://www.mycompoundid.org) that allows searching and interpreting mass spectrometry (MS) data against a newly constructed metabolome library composed of 8 021 known human endogenous metabolites and their predicted metabolic products (375 809 compounds from one metabolic reaction and 10 583 901 from two reactions). As an example, in the analysis of a simple extract of human urine or plasma and the whole human urine by liquid chromatography-mass spectrometry and MS/MS, we are able to identify at least two times more metabolites in these samples than by using a standard human metabolome library. In addition, it is shown that the evidence-based metabolome library (EML) provides a much superior performance in identifying putative metabolites from a human urine sample, compared to the use of the ChemPub and KEGG libraries.
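One way to picture how a library of known metabolites grows into a much larger set of predicted products is to add the mass change of a metabolic reaction to each known mass. The sketch below does this for a single reaction step. The three reactions listed, the `expand` function and the one-entry input library are illustrative assumptions and do not reproduce the reaction set or data used by MyCompoundID.

```python
# Monoisotopic mass changes in Da for a few common biotransformations.
# This short list is an assumption chosen for illustration.
REACTIONS = {
    "+O (hydroxylation)": 15.994915,
    "+CH2 (methylation)": 14.015650,
    "-H2 (dehydrogenation)": -2.015650,
}


def expand(library):
    """Return the library plus one-reaction predicted products.

    library maps a metabolite name to its monoisotopic mass.
    """
    expanded = dict(library)
    for name, mass in library.items():
        for label, delta in REACTIONS.items():
            expanded[f"{name} {label}"] = round(mass + delta, 6)
    return expanded


one_step = expand({"tryptophan": 204.089878})
```

Applying `expand` to its own output would give a two-reaction library, which is where the rapid growth in candidate numbers comes from.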
Biologically Consistent Annotation of Metabolomics Data
Analytical chemistry, 2017
Annotation of metabolites remains a major challenge in liquid chromatography-mass spectrometry (LC-MS) based untargeted metabolomics. The current gold standard for metabolite identification is to match the detected feature with an authentic standard analyzed on the same equipment and using the same method as the experimental samples. However, there are substantial practical challenges in applying this approach to large data sets. One widely used annotation approach is to search spectral libraries in reference databases for matching metabolites; however, this approach is limited by the incomplete coverage of these libraries. An alternative computational approach is to match the detected features to candidate chemical structures based on their mass and predicted fragmentation pattern. Unfortunately, both of these approaches can match multiple identities with a single feature. Another issue is that annotations from different tools often disagree. This paper presents a novel LC-MS data ...
TrAC Trends in Analytical Chemistry, 2016
Mass spectrometry-based metabolomics is now widely used to obtain new insights into human, plant and microbial biochemistry, drug and biomarker discovery, nutrition research and food control. Despite this great shared interest, identifying and characterizing the structure of metabolites has become a major bottleneck for converting raw mass spectrometric data into biological knowledge. In this regard, comprehensive and well-annotated MS-based spectral databases play a key role towards converting raw spectral data into metabolite annotations and thus biological knowledge. The main characteristics of the mass spectral databases currently used in MS-based metabolomics are reviewed in this paper, underlining the advantages and limitations of each. Extending this, the overlap of compounds with MSⁿ (n ≥ 2) spectra from authentic chemical standards in most public and commercial databases has been calculated for the first time. Finally, future prospects for mass spectral databases are discussed in terms of the needs posed by novel applications and instrumental advancements.
Bioinformatics, 2011
The study of metabolites (metabolomics) is increasingly being applied to investigate microbial, plant, environmental and mammalian systems. One of the limiting factors is that of chemically identifying metabolites from mass spectrometric signals present in complex datasets. Results: Three workflows have been developed to allow for the rapid, automated and high-throughput annotation and putative metabolite identification of electrospray LC-MS-derived metabolomic datasets. The collection of workflows is defined as PUTMEDID_LCMS and performs feature annotation, matching of accurate m/z to the accurate mass of neutral molecules and associated molecular formula and matching of the molecular formulae to a reference file of metabolites. The software is independent of the instrument and data pre-processing applied. The number of false positives is reduced by eliminating the inaccurate matching of many artifact, isotope, multiply charged and complex adduct peaks through complex interrogation of experimental data. Availability: The workflows, standard operating procedure and further information are publicly available at http://www.mcisb.org/resources/putmedid.html. Contact:
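The feature-annotation step, in which isotope peaks are recognised so that they are not matched as separate metabolites, can be sketched in a much simplified form. The code below flags a peak as a likely 13C isotope when it sits about 1.003355 Da above a more intense peak. It ignores retention time and peak correlation, which a real workflow would also consider, and the function name, tolerance and example peaks are assumptions rather than details of PUTMEDID_LCMS.

```python
C13_SPACING = 1.003355  # mass difference between 13C and 12C, in Da


def flag_isotopes(peaks, tol=0.002):
    """Return indices of peaks that look like M+1 isotope peaks.

    peaks is a list of (mz, intensity) pairs. A peak is flagged when another
    peak lies C13_SPACING below it (within tol) and is more intense.
    """
    flagged = set()
    for mz_i, int_i in peaks:
        for j, (mz_j, int_j) in enumerate(peaks):
            if abs((mz_j - mz_i) - C13_SPACING) <= tol and int_j < int_i:
                flagged.add(j)
    return flagged


# Hypothetical peak list: (m/z, intensity)
peaks = [(181.0707, 1000.0), (182.0741, 66.0), (203.0526, 400.0)]
monoisotopic = [p for i, p in enumerate(peaks) if i not in flag_isotopes(peaks)]
```

Only the peaks left in `monoisotopic` would then go forward to accurate-mass matching, which removes one source of false positive assignments.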
Metabolomics technologies and metabolite identification
TrAC Trends in Analytical Chemistry, 2007
Metabolomics studies rely on the analysis of the multitude of small molecules (metabolites) present in a biological system. Most commonly, metabolomics is heavily supported by mass spectrometry (MS) and nuclear magnetic resonance (NMR) as parallel technologies that provide an overview of the metabolome and high-power compound elucidation. Over and above large-scale analysis, a major effort is needed for unequivocal identification of metabolites. The combination of liquid chromatography (LC)-MS and NMR is a powerful methodology for identifying metabolites. Better chemical characterization of the metabolome will undoubtedly enlarge knowledge of any biological system.