Highly sensitive feature detection for high resolution LC/MS

Ralf Tautenhahn et al. BMC Bioinformatics. 2008.

Abstract

Background: Liquid chromatography coupled to mass spectrometry (LC/MS) is an important analytical technology for, e.g., metabolomics experiments. Determining the boundaries, centres and intensities of the two-dimensional signals in the LC/MS raw data is called feature detection. For the subsequent analysis of complex samples such as plant extracts, which may contain hundreds of compounds corresponding to thousands of features, reliable feature detection is mandatory.

Results: We developed centWave, a new feature detection algorithm for high-resolution LC/MS data sets. It collects regions of interest (partial mass traces) in the raw data and applies continuous wavelet transformation and, optionally, Gaussian fitting in the chromatographic domain. We evaluated the algorithm on dilution series and mixtures of seed and leaf extracts, and estimated recall, precision and F-score for seed- and leaf-specific features in two experiments of different complexity.

Conclusion: The new feature detection algorithm meets the requirements of current metabolomics experiments. centWave can detect close-by and partially overlapping features, and it has the highest overall recall and precision of the algorithms compared, outperforming matchedFilter (the original algorithm of XCMS) and the centroidPicker from MZmine. The centWave algorithm was integrated into the Bioconductor R package XCMS and is available from Bioconductor (http://www.bioconductor.org/).

Figures

Figure 1

Mass trace and chromatographic peak of the biochanin A [M+H]+ mass signal. The upper panel shows the mass trace of the biochanin A [M+H]+ mass signal across 10 seconds, with colour-coded intensities. The corresponding chromatographic peak is shown below.
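The chromatographic peak in Figure 1 is what remains of the mass trace once the m/z dimension is collapsed. A minimal sketch of that step, producing an extracted ion chromatogram (EIC) like those in Figures 3 and 4, is shown below. The scan layout (retention time plus a list of centroided (m/z, intensity) pairs) and the function name are assumptions for illustration, not XCMS data structures.

```python
from typing import List, Tuple

# Assumed layout: one scan = (retention time in s, [(m/z, intensity), ...] centroids).
Scan = Tuple[float, List[Tuple[float, float]]]


def extract_ion_chromatogram(scans: List[Scan], mz_min: float, mz_max: float) -> List[Tuple[float, float]]:
    """Sum, per scan, the intensities of all centroids with mz_min <= m/z <= mz_max."""
    eic = []
    for rt, centroids in scans:
        total = sum(inten for mz, inten in centroids if mz_min <= mz <= mz_max)
        eic.append((rt, float(total)))  # scans without a matching centroid contribute 0.0
    return eic


scans = [
    (0.0, [(277.210, 5.0), (277.215, 100.0)]),
    (1.0, [(277.217, 250.0), (300.000, 80.0)]),
    (2.0, [(277.230, 40.0)]),
]
print(extract_ion_chromatogram(scans, 277.213, 277.221))
# -> [(0.0, 100.0), (1.0, 250.0), (2.0, 0.0)]
```

The window used here matches the 277.213–277.221 m/z range of Figure 3; the toy centroids themselves are made up.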

Figure 2

Region of interest (ROI) detection. Raw data in the chromatographic and m/z region around the [M+H]+ mass signal (1) of biochanin A. In addition to the three isotopic peaks (2–4), other mass signals are marked as ROIs.
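A simplified sketch of ROI collection follows, assuming centroided scans given as lists of (m/z, intensity) pairs. A centroid extends an ROI from the previous scan if it lies within a ppm tolerance of that ROI's mean m/z; otherwise it starts a new ROI. ROIs spanning fewer than `min_scans` consecutive scans are discarded. The names `detect_rois`, `ppm` and `min_scans` are hypothetical. This is not the XCMS implementation, which applies further criteria, such as an intensity prefilter.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(eq=False)  # compare ROIs by identity, never by their (possibly equal) contents
class ROI:
    mzs: List[float] = field(default_factory=list)
    scans: List[int] = field(default_factory=list)
    intensities: List[float] = field(default_factory=list)

    @property
    def mean_mz(self) -> float:
        return sum(self.mzs) / len(self.mzs)


def detect_rois(scans: List[List[Tuple[float, float]]], ppm: float = 25.0, min_scans: int = 3) -> List[ROI]:
    """Group centroids from consecutive scans into ROIs (partial mass traces)."""
    open_rois: List[ROI] = []   # ROIs that were extended in the previous scan
    finished: List[ROI] = []
    for i, centroids in enumerate(scans):
        candidates = list(open_rois)
        extended: List[ROI] = []
        for mz, intensity in sorted(centroids):
            # Closest candidate ROI whose mean m/z lies within the ppm tolerance.
            best = min(
                (r for r in candidates if abs(mz - r.mean_mz) <= r.mean_mz * ppm * 1e-6),
                key=lambda r: abs(mz - r.mean_mz),
                default=None,
            )
            if best is None:
                best = ROI()              # no match: start a new ROI
            else:
                candidates.remove(best)   # an ROI takes at most one centroid per scan
            best.mzs.append(mz)
            best.scans.append(i)
            best.intensities.append(intensity)
            extended.append(best)
        finished.extend(candidates)       # not extended in this scan: the trace has ended
        open_rois = extended
    finished.extend(open_rois)
    return [r for r in finished if len(r.mzs) >= min_scans]


example_scans = [
    [(300.1000, 10.0), (500.0, 5.0)],
    [(300.1003, 20.0)],
    [(300.0998, 15.0), (500.0, 6.0)],
    [(300.1001, 8.0)],
]
for roi in detect_rois(example_scans):
    print(round(roi.mean_mz, 5), roi.scans)
```

In this toy input the m/z 500.0 signal never appears in consecutive scans, so it never forms a three-scan ROI and is dropped, while the m/z 300.1 signal yields a single ROI over all four scans.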

Figure 3

Matched filter effects, example region 1. HPLC/ESI-QTOF-MS of an A. thaliana leaf extract. Extracted ion chromatogram (277.213–277.221 m/z) and matched filter results using a second-derivative Gaussian with different filter widths. Negative filter values were omitted.

Figure 4

Matched filter effects, example region 2. HPLC/ESI-QTOF-MS of an A. thaliana leaf extract. Extracted ion chromatogram (967.53–967.56 m/z; same sample as in Figure 3) and matched filter results using a second-derivative Gaussian with different filter widths. Negative filter values were clipped.
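To illustrate why the filter width matters in Figures 3 and 4, the sketch below filters a synthetic EIC containing two close Gaussian peaks with a kernel proportional to the negated second derivative of a Gaussian, and clips negative outputs to zero. Assumptions: the kernel is truncated at ±4σ, the signal is zero-padded at its edges, and the peak positions and widths are made up. This is an illustration, not the matchedFilter code in XCMS.

```python
import math
from typing import List


def second_derivative_gaussian_kernel(sigma: float) -> List[float]:
    """Negated 2nd derivative of a Gaussian (up to scale), sampled on [-4*sigma, 4*sigma]."""
    half = int(math.ceil(4 * sigma))
    return [(1 - (t / sigma) ** 2) * math.exp(-0.5 * (t / sigma) ** 2) for t in range(-half, half + 1)]


def matched_filter(signal: List[float], sigma: float) -> List[float]:
    """Filter `signal` with the symmetric kernel (zero padding) and clip negative outputs to 0."""
    kernel = second_derivative_gaussian_kernel(sigma)
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(signal):
                acc += w * signal[j]
        out.append(max(acc, 0.0))
    return out


def count_maxima(values: List[float]) -> int:
    """Count positive samples that exceed their left neighbour and are not below their right one."""
    return sum(1 for i in range(1, len(values) - 1)
               if values[i] > 0 and values[i] > values[i - 1] and values[i] >= values[i + 1])


# Two Gaussian peaks (sd 2 scans) whose apexes are 10 scans apart.
eic = [math.exp(-0.5 * ((t - 45) / 2) ** 2) + math.exp(-0.5 * ((t - 55) / 2) ** 2) for t in range(100)]
for sigma in (2.0, 10.0):
    print(sigma, count_maxima(matched_filter(eic, sigma)))
```

With σ = 2 the filtered trace keeps two separate maxima; with σ = 10 the wide filter merges the two peaks into a single maximum, which is the kind of width dependence the two figures show.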

Figure 5

Mexican Hat Wavelet. Mexican hat wavelet at different scales.
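For reference, the L2-normalised Mexican hat (Ricker) wavelet is ψ(t) = 2/(√3 π^(1/4)) (1 − t²) e^(−t²/2), and the wavelet at scale a is ψ_a(t) = a^(−1/2) ψ(t/a). The normalisation constant is the standard one; whether centWave uses exactly this scaling is not stated here, so treat it as an assumption.

```python
import math


def mexican_hat(t: float, scale: float = 1.0) -> float:
    """L2-normalised Mexican hat wavelet psi_a(t) = a**-0.5 * psi(t / a)."""
    x = t / scale
    norm = 2.0 / (math.sqrt(3.0) * math.pi ** 0.25)
    return norm * (1.0 - x * x) * math.exp(-0.5 * x * x) / math.sqrt(scale)


# Riemann-sum check that every scale keeps unit energy (integral of psi_a(t)**2 dt is 1).
dt = 0.01
for a in (1.0, 2.0, 4.0):
    energy = sum(mexican_hat(k * dt, a) ** 2 for k in range(-4000, 4001)) * dt
    print(a, round(energy, 6))
```

At scale a the wavelet crosses zero at t = ±a, so larger scales respond to broader chromatographic peaks while the total energy stays the same.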

Figure 6

centWave results for example region 1. The lower part shows the same extracted ion chromatogram (277.213–277.221 m/z) as in Figure 3, together with the chromatographic peaks detected by centWave, drawn as Gaussian fits. The upper part shows the CWT coefficients at the different scales. A cross marks the scale at which the peak was optimally localised. The vertical grey lines show the peak borders estimated from the coefficients at this scale.
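The scale selection and border estimation described in this caption can be sketched as follows, under simplifying assumptions: the CWT is evaluated directly as a sum over all samples, the "optimal" scale is taken to be the one with the largest coefficient, and the borders are found by walking outwards from the apex while the coefficients at that scale stay positive and keep decreasing. The actual centWave procedure may differ in detail, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def mexican_hat(t: float, scale: float) -> float:
    """L2-normalised Mexican hat wavelet at the given scale."""
    x = t / scale
    return 2.0 / (math.sqrt(3.0) * math.pi ** 0.25) * (1.0 - x * x) * math.exp(-0.5 * x * x) / math.sqrt(scale)


def cwt_row(signal: Sequence[float], scale: float) -> List[float]:
    """CWT coefficients C(scale, b) = sum_t x(t) * psi_scale(t - b) for every position b."""
    n = len(signal)
    return [sum(signal[t] * mexican_hat(t - b, scale) for t in range(n)) for b in range(n)]


def localise_peak(signal: Sequence[float], scales: Sequence[float]) -> Tuple[float, int, int, int]:
    """Return (best scale, apex index, left border, right border) for the strongest peak."""
    rows = {a: cwt_row(signal, a) for a in scales}
    best = max(scales, key=lambda a: max(rows[a]))        # scale with the largest coefficient
    row = rows[best]
    apex = max(range(len(row)), key=row.__getitem__)      # position of that coefficient
    left = apex
    while left > 0 and 0 < row[left - 1] <= row[left]:    # walk down the left flank
        left -= 1
    right = apex
    while right < len(row) - 1 and 0 < row[right + 1] <= row[right]:  # and the right flank
        right += 1
    return best, apex, left, right


eic = [math.exp(-0.5 * ((t - 50) / 3.0) ** 2) for t in range(100)]  # synthetic peak, sd 3 scans
print(localise_peak(eic, [1, 2, 4, 7, 12, 20]))
```

Under this normalisation, for a Gaussian peak of standard deviation s the apex coefficient is largest near a = √5·s in the continuous limit, which is why scale 7 wins for s = 3 among the scales tried.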

Figure 7

centWave results for example region 2. The lower part shows the same extracted ion chromatogram (967.53–967.56 m/z) as in Figure 4, together with the chromatographic peaks detected by centWave, drawn as Gaussian fits. The upper part shows the CWT coefficients at the different scales. A cross marks the scale at which the peak was optimally localised. The vertical grey lines show the peak borders estimated from the coefficients at this scale.
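Drawing the Gaussian fits in Figures 6 and 7 requires some fitting routine. The sketch below uses a closed-form substitute: a least-squares parabola fitted to ln(intensity) (sometimes called Caruana's method), rather than an iterative non-linear least-squares fit. We do not claim this is what centWave does. It assumes the intensities between the estimated peak borders are positive and close to noise-free; `fit_gaussian` and `solve3` are hypothetical names.

```python
import math
from typing import List, Sequence, Tuple


def solve3(m: List[List[float]], v: List[float]) -> List[float]:
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    a = [row[:] + [rhs] for row, rhs in zip(m, v)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, 3):
            f = a[r][col] / a[col][col]
            for c in range(col, 4):
                a[r][c] -= f * a[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):  # back substitution
        x[r] = (a[r][3] - sum(a[r][c] * x[c] for c in range(r + 1, 3))) / a[r][r]
    return x


def fit_gaussian(ts: Sequence[float], ys: Sequence[float]) -> Tuple[float, float, float]:
    """Fit y = h * exp(-(t - mu)**2 / (2 * sigma**2)) via least squares on ln y = a + b*u + c*u**2."""
    pts = [(t, math.log(y)) for t, y in zip(ts, ys) if y > 0]   # ln y needs y > 0
    t0 = sum(t for t, _ in pts) / len(pts)                      # centre t for numerical stability
    us = [(t - t0, ly) for t, ly in pts]
    s = [sum(u ** k for u, _ in us) for k in range(5)]          # sums of u^0 .. u^4
    r = [sum(u ** k * ly for u, ly in us) for k in range(3)]    # sums of u^k * ln y
    a, b, c = solve3([[s[0], s[1], s[2]], [s[1], s[2], s[3]], [s[2], s[3], s[4]]], r)
    if c >= 0:
        raise ValueError("ln(intensity) is not concave, so no Gaussian peak can be fitted")
    sigma = math.sqrt(-1.0 / (2.0 * c))
    mu = t0 - b / (2.0 * c)
    height = math.exp(a - b * b / (4.0 * c))
    return mu, sigma, height


ts = list(range(44, 57))  # e.g. the scans between two estimated peak borders
ys = [1000.0 * math.exp(-0.5 * ((t - 50.3) / 3.0) ** 2) for t in ts]
print(fit_gaussian(ts, ys))
```

Because ln of a Gaussian is exactly quadratic, noise-free input is recovered exactly; with noisy data the log transform over-weights low-intensity points, which is one reason iterative fits are often preferred.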

Figure 8

Venn Diagrams of Detected Features. Venn diagrams showing the number of features in seed and leaf extracts found by each of the three algorithms. Only the overlapping (green) subsets were used as ground truth.
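Building the ground truth amounts to keeping only the features that all three algorithms agree on. A minimal sketch follows, assuming each feature is reduced to an (m/z, retention time) pair and that two features match when both differences fall within tolerances. The tolerance values, function names and example feature lists are hypothetical, since the matching criteria are not given here.

```python
from typing import List, Tuple

Feature = Tuple[float, float]  # (m/z, retention time in s)


def same_feature(f: Feature, g: Feature, mz_tol: float, rt_tol: float) -> bool:
    """Two features match if both their m/z and retention times agree within the tolerances."""
    return abs(f[0] - g[0]) <= mz_tol and abs(f[1] - g[1]) <= rt_tol


def consensus(reference: List[Feature], others: List[List[Feature]],
              mz_tol: float = 0.01, rt_tol: float = 10.0) -> List[Feature]:
    """Keep the features of `reference` that are matched in every other algorithm's list."""
    return [f for f in reference
            if all(any(same_feature(f, g, mz_tol, rt_tol) for g in other) for other in others)]


algo_a = [(277.217, 300.0), (967.545, 612.0), (150.000, 50.0)]
algo_b = [(277.216, 302.0), (967.546, 615.0)]
algo_c = [(277.218, 298.5), (150.000, 51.0)]
print(consensus(algo_a, [algo_b, algo_c]))  # -> [(277.217, 300.0)]
```

Only the first feature is found by all three hypothetical algorithms, so it alone corresponds to the central (green) region of a Venn diagram.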

Figure 9

F-score values for Experiments 1 and 2. F-score (a combined measure of recall and precision, calculated from the ground-truth features) for dilution series of the seed and leaf extracts (left-most and middle parts) and for mixtures of the seed and leaf extracts (right-most part of the figure). Detected features that matched the respective ground-truth features were counted as true positives, while all other returned features were counted as false positives. Higher F-score values indicate better feature detection performance.
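The counting rule in this caption translates directly into precision and recall. The sketch assumes the usual balanced F-score, F = 2PR/(P + R), and a hypothetical tolerance-based match on (m/z, retention time) pairs. Precision counts detected features that match any ground-truth feature, and recall counts ground-truth features matched by at least one detection. The example features are made up.

```python
from typing import List, Tuple

Feature = Tuple[float, float]  # (m/z, retention time in s)


def matches(f: Feature, g: Feature, mz_tol: float = 0.01, rt_tol: float = 10.0) -> bool:
    """Hypothetical matching rule: m/z and retention time both agree within tolerance."""
    return abs(f[0] - g[0]) <= mz_tol and abs(f[1] - g[1]) <= rt_tol


def f_score(detected: List[Feature], truth: List[Feature]) -> Tuple[float, float, float]:
    """Return (precision, recall, F-score) of `detected` against the ground-truth features."""
    tp = sum(1 for d in detected if any(matches(d, t) for t in truth))     # true positives
    found = sum(1 for t in truth if any(matches(d, t) for d in detected))  # recovered truth
    precision = tp / len(detected) if detected else 0.0
    recall = found / len(truth) if truth else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


truth = [(100.0, 10.0), (200.0, 20.0), (300.0, 30.0), (400.0, 40.0)]
detected = [(100.001, 11.0), (200.002, 19.0), (300.0, 31.0), (555.0, 5.0), (600.0, 60.0)]
print(f_score(detected, truth))  # precision 3/5, recall 3/4, F-score 2/3 (up to rounding)
```

Under this rule several detections matching the same ground-truth feature would each count as true positives; a stricter one-to-one assignment is a possible alternative.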

Figure 10

F-score values for Experiments 1 and 2 (alternative parameter settings). F-score (a combined measure of recall and precision, calculated from the ground-truth features) for dilution series of the seed and leaf extracts (left-most and middle parts) and for mixtures of the seed and leaf extracts (right-most part of the figure). Detected features that matched the respective ground-truth features were counted as true positives, while all other returned features were counted as false positives. Higher F-score values indicate better feature detection performance. Alternative parameter settings were used (see Additional file 4).
