LC-MS/MS data analysis with xcms (original) (raw)
In this section we analyze a small SWATH data set consisting of a single mzML file with data from the same sample analyzed in the previous section but recorded in SWATH mode. We again read the data with the readMsExperiment()
function. The resulting object will contain all recorded MS1 and MS2 spectra in the specified file. Similar to the previous data file, we filter the file to signal between 230 and 610 seconds.
Below we determine the number of MS level 1 and 2 spectra in the present data set.
As described in the introduction, in SWATH mode all ions within pre-defined isolation windows are fragmented and MS2 spectra measured. The definition of these isolation windows (SWATH pockets) is imported from the mzML files and available as additional spectra variables. Below we inspect the respective information for the first few spectra. The upper and lower isolation window m/z is available with spectra variables "isolationWindowLowerMz"
and"isolationWindowUpperMz"
respectively and the target m/z of the isolation window with "isolationWindowTargetMz"
. We can use the spectraData()
function to extract this information from the spectra within our swath_data
object.
We could also access these variables directly with the dedicatedisolationWindowLowerMz()
and isolationWindowUpperMz()
functions.
In the present data set we use the value of the isolation window target m/z to define the individual SWATH pockets. Below we list the number of spectra that are recorded in each pocket/isolation window.
We have thus between 422 and 423 MS2 spectra measured in each isolation window.
To inspect the data we can also extract chromatograms from both the measured MS1 as well as MS2 data. For MS2 data we have to set parameter msLevel = 2L
and, for SWATH data, in addition also specify the isolation window from which we want to extract the data. Below we extract the TIC of the MS1 data and of one of the isolation windows (isolation window target m/z of 270.85) and plot these.
Without specifying the isolationWindowTargetMz
parameter, all MS2 spectra would be considered in the chromatogram extraction which would result in a_chimeric_ chromatogram such as the one shown below:
For MS2 data without specific, different, m/z isolation windows (such as e.g. Waters MSe data) parameter isolationWindowTargetMz
can be omitted in thechromatograms()
call in which case, as already stated above, all MS2 spectra are considered in the chromatogram calculation. Alternatively, if the isolation window is not provided or specified in the original data files, it would be possible to manually define a value for this spectra variable, such as in the example below (from which the code is however not evaluated) were we assign the value of the precursor m/z to the spectra’s isolation window target m/z.
Chromatographic peak detection in MS1 and MS2 data
Similar to a conventional LC-MS analysis, we perform first a chromatographic peak detection (on the MS level 1 data) with the findChromPeaks()
method. Below we define the settings for a _centWave_-based peak detection and perform the analysis.
cwp <- CentWaveParam(snthresh = 5, noise = 100, ppm = 10,
peakwidth = c(3, 30))
swath_data <- findChromPeaks(swath_data, param = cwp)
swath_data
## Object of class XcmsExperiment
## Spectra: MS1 (422) MS2 (3378)
## Experiment data: 1 sample(s)
## Sample data links:
## - spectra: 1 sample(s) to 3800 element(s).
## xcms results:
## - chromatographic peaks: 62 in MS level(s): 1
Next we perform a chromatographic peak detection in MS level 2 data separately for each individual isolation window. We use thefindChromPeaksIsolationWindow()
function employing the same peak detection algorithm reducing however the required signal-to-noise ratio. TheisolationWindow
parameter allows to specify which MS2 spectra belong to which isolation window and hence defines in which set of MS2 spectra chromatographic peak detection should be performed. As a default the "isolationWindowTargetMz"
variable of the object’s spectra is used.
cwp <- CentWaveParam(snthresh = 3, noise = 10, ppm = 10,
peakwidth = c(3, 30))
swath_data <- findChromPeaksIsolationWindow(swath_data, param = cwp)
swath_data
## Object of class XcmsExperiment
## Spectra: MS1 (422) MS2 (3378)
## Experiment data: 1 sample(s)
## Sample data links:
## - spectra: 1 sample(s) to 3800 element(s).
## xcms results:
## - chromatographic peaks: 370 in MS level(s): 1, 2
The findChromPeaksIsolationWindow()
function added all peaks identified in the individual isolation windows to the chromPeaks
matrix containing already the MS1 chromatographic peaks. These newly added peaks can be identified through the"isolationWindow"
column in the object’s chromPeakData
.
chromPeakData(swath_data)
## DataFrame with 370 rows and 6 columns
## ms_level is_filled isolationWindow isolationWindowTargetMZ
## <integer> <logical> <numeric> <numeric>
## CP01 1 FALSE NA NA
## CP02 1 FALSE NA NA
## CP03 1 FALSE NA NA
## CP04 1 FALSE NA NA
## CP05 1 FALSE NA NA
## ... ... ... ... ...
## CP366 2 FALSE 601.85 601.85
## CP367 2 FALSE 601.85 601.85
## CP368 2 FALSE 601.85 601.85
## CP369 2 FALSE 601.85 601.85
## CP370 2 FALSE 601.85 601.85
## isolationWindowLowerMz isolationWindowUpperMz
## <numeric> <numeric>
## CP01 NA NA
## CP02 NA NA
## CP03 NA NA
## CP04 NA NA
## CP05 NA NA
## ... ... ...
## CP366 388.8 814.9
## CP367 388.8 814.9
## CP368 388.8 814.9
## CP369 388.8 814.9
## CP370 388.8 814.9
Below we count the number of chromatographic peaks identified within each isolation window (the number of chromatographic peaks identified in MS1 is 62).
table(chromPeakData(swath_data)$isolationWindow)
##
## 163.75 208.95 244.05 270.85 299.1 329.8 367.35 601.85
## 2 38 32 14 105 23 62 32
We thus successfully identified chromatographic peaks in the different MS levels and isolation windows. As a next step we have to identify which of the measured signals represents data from the same original compound to _reconstruct_fragment spectra for each MS1 signal (chromatographic peak).
Reconstruction of MS2 spectra
Identifying the signal of the fragment ions for the precursor measured by each MS1 chromatographic peak is a non-trivial task. The MS2 spectrum of the fragment ion for each MS1 chromatographic peak has to be reconstructed from the available MS2 signal (i.e. the chromatographic peaks identified in MS level 2). For SWATH data, fragment ion signal should be present in the same isolation window that contains the m/z of the precursor ion and the chromatographic peak shape of the MS2 chromatographic peaks of fragment ions of a specific precursor should have a similar retention time and peak shape than the precursor’s MS1 chromatographic peak.
After detection of MS1 and MS2 chromatographic peaks has been performed, we can reconstruct the MS2 spectra using the reconstructChromPeakSpectra()
function. This function defines an MS2 spectrum for each MS1 chromatographic peak based on the following approach:
- Identify MS2 chromatographic peaks in the isolation window containing the m/z of the ion (the MS1 chromatographic peak) that have approximately the same retention time than the MS1 chromatographic peak (the accepted difference in retention time can be defined with the
diffRt
parameter). - Extract the MS1 chromatographic peak and all MS2 chromatographic peaks identified by the previous step and correlate the peak shapes of the candidate MS2 chromatographic peaks with the shape of the MS1 peak. MS2 chromatographic peaks with a correlation coefficient larger than
minCor
are retained. - Reconstruct the MS2 spectrum using the m/z of all above selected MS2 chromatographic peaks and their intensity; each MS2 chromatographic peak selected for an MS1 peak will thus represent one mass peak in the reconstructed spectrum.
To illustrate this process we perform the individual steps on the example of fenamiphos (exact mass 303.105800777 and m/z of [M+H]+ adduct 304.113077). As a first step we extract the chromatographic peak for this ion.
fenamiphos_mz <- 304.113077
fenamiphos_ms1_peak <- chromPeaks(swath_data, mz = fenamiphos_mz, ppm = 2)
fenamiphos_ms1_peak
## mz mzmin mzmax rt rtmin rtmax into intb
## CP34 304.1124 304.1121 304.1126 423.945 419.445 428.444 10697.34 10688.34
## maxo sn sample
## CP34 2401.849 618 1
Next we identify all MS2 chromatographic peaks that were identified in the isolation window containing the m/z of fenamiphos. The information on the isolation window in which a chromatographic peak was identified is available in the chromPeakData
.
keep <- chromPeakData(swath_data)$isolationWindowLowerMz < fenamiphos_mz &
chromPeakData(swath_data)$isolationWindowUpperMz > fenamiphos_mz
We also require the retention time of the MS2 chromatographic peaks to be similar to the retention time of the MS1 peak and extract the corresponding peak information. We thus below select all chromatographic peaks for which the retention time range contains the retention time of the apex position of the MS1 chromatographic peak.
keep <- keep &
chromPeaks(swath_data)[, "rtmin"] < fenamiphos_ms1_peak[, "rt"] &
chromPeaks(swath_data)[, "rtmax"] > fenamiphos_ms1_peak[, "rt"]
fenamiphos_ms2_peak <- chromPeaks(swath_data)[which(keep), ]
In total 24 MS2 chromatographic peaks match all the above conditions. Next we extract the ion chromatogram of the MS1 peak and of all selected candidate MS2 signals. To ensure chromatograms are extracted from spectra in the correct isolation window we need to specify the respective isolation window by passing its isolation window target m/z to thechromatogram()
function (in addition to setting msLevel = 2
). This can be done by either getting the isolationWindowTargetMz
of the spectra after the data was subset using filterIsolationWindow()
(as done below) or by selecting the isolationWindowTargetMz
closest to the m/z of the compound of interest.
rtr <- fenamiphos_ms1_peak[, c("rtmin", "rtmax")]
mzr <- fenamiphos_ms1_peak[, c("mzmin", "mzmax")]
fenamiphos_ms1_chr <- chromatogram(swath_data, rt = rtr, mz = mzr)
## Processing chromatographic peaks
rtr <- fenamiphos_ms2_peak[, c("rtmin", "rtmax")]
mzr <- fenamiphos_ms2_peak[, c("mzmin", "mzmax")]
## Get the isolationWindowTargetMz for spectra containing the m/z of the
## compound of interest
swath_data |>
filterIsolationWindow(mz = fenamiphos_mz) |>
spectra() |>
isolationWindowTargetMz() |>
table()
##
## 299.1
## 423
The target m/z of the isolation window containing the m/z of interest is thus 299.1 and we can use this in the chromatogram()
call below to extract the data from the correct (MS2) spectra.
fenamiphos_ms2_chr <- chromatogram(
swath_data, rt = rtr, mz = mzr, msLevel = 2L,
isolationWindowTargetMz = rep(299.1, nrow(rtr)))
## Processing chromatographic peaks
We can now plot the extracted ion chromatogram of the MS1 and the extracted MS2 data.
plot(rtime(fenamiphos_ms1_chr[1, 1]),
intensity(fenamiphos_ms1_chr[1, 1]),
xlab = "retention time [s]", ylab = "intensity", pch = 16,
ylim = c(0, 5000), col = "blue", type = "b", lwd = 2)
#' Add data from all MS2 peaks
tmp <- lapply(fenamiphos_ms2_chr@.Data,
function(z) points(rtime(z), intensity(z),
col = "#00000080",
type = "b", pch = 16))
Figure 5: Extracted ion chromatograms for Fenamiphos from MS1 (blue) and potentially related signal in MS2 (grey)
Next we can calculate correlations between the peak shapes of each MS2 chromatogram with the MS1 peak. We illustrate this process on the example of one MS2 chromatographic peaks. Note that, because MS1 and MS2 spectra are recorded consecutively, the retention times of the individual data points will differ between the MS2 and MS1 chromatographic data and data points have thus to be matched (aligned) before performing the correlation analysis. This is done automatically by the correlate()
function. See the help for the align
method for more information on alignment options.
compareChromatograms(fenamiphos_ms2_chr[1, 1],
fenamiphos_ms1_chr[1, 1],
ALIGNFUNARGS = list(method = "approx"))
## [1] 0.9997871
After identifying the MS2 chromatographic peaks with shapes of enough high similarity to the MS1 chromatographic peaks, an MS2 spectrum could be_reconstructed_ based on the m/z and intensities of the MS2 chromatographic peaks (i.e., using their "mz"
and "maxo"
or "into"
values).
Instead of performing this assignment of MS2 signal to MS1 chromatographic peaks manually as above, we can use the reconstructChromPeakSpectra()
function that performs the exact same steps for all MS1 chromatographic peaks in a DIA data set. Below we use this function to reconstruct MS2 spectra for our example data requiring a peak shape correlation higher than 0.9
between the candidate MS2 chromatographic peak and the target MS1 chromatographic peak.
swath_spectra <- reconstructChromPeakSpectra(swath_data, minCor = 0.9)
swath_spectra
## MSn data (Spectra) with 62 spectra in a MsBackendMemory backend:
## msLevel rtime scanIndex
## <integer> <numeric> <integer>
## CP01 2 239.458 NA
## CP02 2 240.358 NA
## CP03 2 329.577 NA
## CP04 2 329.771 NA
## CP05 2 346.164 NA
## ... ... ... ...
## CP58 2 551.735 NA
## CP59 2 551.735 NA
## CP60 2 575.134 NA
## CP61 2 575.134 NA
## CP62 2 574.942 NA
## ... 20 more variables/columns.
## Processing:
## Merge 1 Spectra into one [Tue Apr 15 20:59:12 2025]
As a result we got a Spectra
object of length equal to the number of MS1 peaks in our data. The length of a spectrum represents the number of peaks it contains. Thus, a length of 0 indicates that no matching peak (MS2 signal) could be found for the respective MS1 chromatographic peak.
lengths(swath_spectra)
## [1] 0 0 1 1 1 0 0 0 0 0 0 0 3 0 3 4 0 3 0 1 0 9 14 1 0
## [26] 0 15 4 1 1 2 4 6 15 12 11 2 4 13 0 0 0 0 1 2 0 1 0 0 0
## [51] 3 0 2 1 7 7 0 0 0 0 0 2
For reconstructed spectra additional annotations are available such as the IDs of the MS2 chromatographic peaks from which the spectrum was reconstructed ("ms2_peak_id"
) as well as the correlation coefficient of their chromatographic peak shape with the precursor’s shape ("ms2_peak_cor"
). Metadata column "peak_id"
contains the ID of the MS1 chromatographic peak:
spectraData(swath_spectra, c("peak_id", "ms2_peak_id", "ms2_peak_cor"))
## DataFrame with 62 rows and 3 columns
## peak_id ms2_peak_id ms2_peak_cor
## <character> <list> <list>
## CP01 CP01
## CP02 CP02
## CP03 CP03 CP063 0.950582
## CP04 CP04 CP105 0.95157
## CP05 CP05 CP153 0.924545
## ... ... ... ...
## CP58 CP58
## CP59 CP59
## CP60 CP60
## CP61 CP61
## CP62 CP62 CP334,CP329 0.918915,0.911944
We next extract the MS2 spectrum for our example peak most likely representing [M+H]+ ions of Fenamiphos using its chromatographic peak ID:
fenamiphos_swath_spectrum <- swath_spectra[
swath_spectra$peak_id == rownames(fenamiphos_ms1_peak)]
We can now compare the reconstructed spectrum to the example consensus spectrum from the DDA experiment in the previous section (variable ex_spectrum
) as well as to the MS2 spectrum for Fenamiphos from Metlin (with a collision energy of 10V). For better visualization we normalize also the peak intensities of the reconstructed SWATH spectrum with the same function we used for the experimental DDA spectrum.
fenamiphos_swath_spectrum <- addProcessing(fenamiphos_swath_spectrum,
scale_fun)
par(mfrow = c(1, 2))
plotSpectraMirror(fenamiphos_swath_spectrum, ex_spectrum,
ppm = 50, main = "against DDA")
plotSpectraMirror(fenamiphos_swath_spectrum, fenamiphos[2],
ppm = 50, main = "against Metlin")
Figure 6: Mirror plot comparing the reconstructed MS2 spectrum for Fenamiphos (upper panel) against the measured spectrum from the DDA data and the Fenamiphhos spectrum from Metlin
If we wanted to get the EICs for the MS2 chromatographic peaks used to generate this MS2 spectrum we can use the IDs of these peaks which are provided with$ms2_peak_id
of the result spectrum.
pk_ids <- fenamiphos_swath_spectrum$ms2_peak_id[[1]]
pk_ids
## [1] "CP199" "CP201" "CP211" "CP208" "CP200" "CP202" "CP217" "CP215" "CP205"
## [10] "CP212" "CP221" "CP223" "CP213" "CP207" "CP220"
With these peak IDs available we can extract their retention time window and m/z ranges from the chromPeaks
matrix and use the chromatogram()
function to extract their EIC. Note however that for SWATH data we have MS2 signal from different isolation windows. Thus we have to first filter the swath_data
object by the isolation window containing the precursor m/z with thefilterIsolationWindow()
to subset the data to MS2 spectra related to the ion of interest. In addition, we have to use msLevel = 2L
in the chromatogram()
call because chromatogram()
extracts by default only data from MS1 spectra and we need to specify the target m/z of the isolation window containing the fragment data from the compound of interest.
rt_range <- chromPeaks(swath_data)[pk_ids, c("rtmin", "rtmax")]
mz_range <- chromPeaks(swath_data)[pk_ids, c("mzmin", "mzmax")]
pmz <- precursorMz(fenamiphos_swath_spectrum)[1]
## Determine the isolation window target m/z
tmz <- swath_data |>
filterIsolationWindow(mz = pmz) |>
spectra() |>
isolationWindowTargetMz() |>
unique()
ms2_eics <- chromatogram(
swath_data, rt = rt_range, mz = mz_range, msLevel = 2L,
isolationWindowTargetMz = rep(tmz, nrow(rt_range)))
## Processing chromatographic peaks
Each row of this ms2_eics
contains now the EIC of one of the MS2 chromatographic peaks. We can also plot these in an overlay plot.
plotChromatogramsOverlay(ms2_eics)
Figure 7: Overlay of EICs of chromatographic peaks used to reconstruct the MS2 spectrum for fenamiphos
As a second example we analyze the signal from an [M+H]+ ion with an m/z of 376.0381 (which would matchProchloraz). We first identify the MS1 chromatographic peak for that m/z and retrieve the reconstructed MS2 spectrum for that peak.
prochloraz_mz <- 376.0381
prochloraz_ms1_peak <- chromPeaks(swath_data, msLevel = 1L,
mz = prochloraz_mz, ppm = 5)
prochloraz_ms1_peak
## mz mzmin mzmax rt rtmin rtmax into intb
## CP22 376.0373 376.037 376.0374 405.046 401.446 409.546 3664.051 3655.951
## maxo sn sample
## CP22 897.3923 278 1
prochloraz_swath_spectrum <- swath_spectra[
swath_spectra$peak_id == rownames(prochloraz_ms1_peak)]
lengths(prochloraz_swath_spectrum)
## [1] 9
The MS2 spectrum for the (tentative) MS1 signal for prochloraz reconstructed from the SWATH MS2 data has thus 9 peaks.
In addition we identify the corresponding MS1 peak in the DDA data set, extract all measured MS2 chromatographic peaks and build the consensus spectrum from these.
prochloraz_dda_peak <- chromPeaks(dda_data, msLevel = 1L,
mz = prochloraz_mz, ppm = 5)
prochloraz_dda_peak
## mz mzmin mzmax rt rtmin rtmax into intb
## CP034 376.0385 376.0378 376.0391 405.295 401.166 410.145 5082.157 5072.77
## maxo sn sample
## CP034 1350.633 310 1
The retention times for the chromatographic peaks from the DDA and SWATH data match almost perfectly. Next we get the MS2 spectra for this peak.
prochloraz_dda_spectra <- dda_spectra[
dda_spectra$chrom_peak_id == rownames(prochloraz_dda_peak)]
prochloraz_dda_spectra
## MSn data (Spectra) with 5 spectra in a MsBackendMzR backend:
## msLevel rtime scanIndex
## <integer> <numeric> <integer>
## 1 2 401.438 3253
## 2 2 402.198 3259
## 3 2 404.677 3306
## 4 2 405.127 3316
## 5 2 405.877 3325
## ... 37 more variables/columns.
##
## file(s):
## PestMix1_DDA.mzML
## Processing:
## Filter: select retention time [230..610] on MS level(s) [Tue Apr 15 20:58:49 2025]
## Filter: select MS level(s) 2 [Tue Apr 15 20:59:00 2025]
## Merge 1 Spectra into one [Tue Apr 15 20:59:00 2025]
In total 5 spectra were measured, some with a relatively high number of peaks. Next we combine them into a consensus spectrum.
prochloraz_dda_spectrum <- combineSpectra(
prochloraz_dda_spectra, FUN = combinePeaks, ppm = 20,
peaks = "intersect", minProp = 0.8, intensityFun = median, mzFun = median,
f = prochloraz_dda_spectra$chrom_peak_id)
## Backend of the input object is read-only, will change that to an 'MsBackendMemory'
## Warning in FUN(X[[i]], ...): 'combinePeaks' for lists of peak matrices is
## deprecated; please use 'combinePeaksData' instead.
At last we load also the Prochloraz MS2 spectra (for different collision energies) from Metlin.
prochloraz <- Spectra(
system.file("mgf", "metlin-68898.mgf", package = "xcms"),
source = MsBackendMgf())
## Start data import from 1 files ... done
To validate the reconstructed spectrum we plot it against the corresponding DDA spectrum and the MS2 spectrum for Prochloraz (for a collision energy of 10V) from Metlin.
prochloraz_swath_spectrum <- addProcessing(prochloraz_swath_spectrum, scale_fun)
prochloraz_dda_spectrum <- addProcessing(prochloraz_dda_spectrum, scale_fun)
par(mfrow = c(1, 2))
plotSpectraMirror(prochloraz_swath_spectrum, prochloraz_dda_spectrum,
ppm = 40, main = "against DDA")
plotSpectraMirror(prochloraz_swath_spectrum, prochloraz[2],
ppm = 40, main = "against Metlin")
Figure 8: Mirror plot comparing the reconstructed MS2 spectrum for Prochloraz (upper panel) against the measured spectrum from the DDA data and the Prochloraz spectrum from Metlin
The spectra fit relatively well. Interestingly, the peak representing the precursor (the right-most peak) seems to have a slightly shifted m/z value in the reconstructed spectrum. Also, by closer inspecting the spectrum two groups of peaks with small differences in m/z can be observed (see plot below).
plotSpectra(prochloraz_swath_spectrum)
Figure 9: SWATH-derived MS2 spectrum for prochloraz
These could represent fragments from isotopes of the original compound. DIA MS2 data, since all ions at a given retention time are fragmented, can contain fragments from isotopes. We thus below use the isotopologues()
function from the MetaboCoreUtils package to check for presence of potential isotope peaks in the reconstructed MS2 spectrum for prochloraz.
library(MetaboCoreUtils)
isotopologues(peaksData(prochloraz_swath_spectrum)[[1]])
## [[1]]
## [1] 3 4 5
##
## [[2]]
## [1] 6 7
Indeed, peaks 3, 4 and 5 as well as 6 and 7 have been assigned to a group of potential isotope peaks. While this is no proof that the peaks are indeed fragment isotopes of prochloraz it is highly likely (given their difference in m/z and relative intensity differences). Below we thus define a function that keeps only the monoisotopic peak for each isotope group in a spectrum.
## Function to keep only the first (monoisotopic) peak for potential
## isotopologue peak groups.
rem_iso <- function(x, ...) {
idx <- isotopologues(x)
idx <- unlist(lapply(idx, function(z) z[-1]), use.names = FALSE)
if (length(idx))
x[-idx, , drop = FALSE]
else x
}
prochloraz_swath_spectrum2 <- addProcessing(prochloraz_swath_spectrum,
rem_iso)
par(mfrow = c(1, 2))
plotSpectra(prochloraz_swath_spectrum)
plotSpectra(prochloraz_swath_spectrum2)
Figure 10: SWATH MS2 spectrum for prochloraz before (left) and after deisotoping (right)
Removing the isotope peaks from the SWATH MS2 spectrum increases also the spectra similarity score (since reference spectra generally will contain only fragments of the ion of interest, but not of any of its isotopes).
compareSpectra(prochloraz_swath_spectrum, prochloraz_dda_spectrum)
## [1] 0.4623719
compareSpectra(prochloraz_swath_spectrum2, prochloraz_dda_spectrum)
## [1] 0.5932303
Similar to the DDA data, the reconstructed MS2 spectra from SWATH data could be used in the annotation of the MS1 chromatographic peaks.