Creating new MsBackend classes (original) (raw)

The MsBackend class defines core methods that have to be implemented by a MS_backend_ as well as optional methods with default implementations that might be implemented for a new backend but don’t necessarily have to. These functions are described in sections Required methods and Optional methods, respectively.

To create a new backend a class extending the virtual MsBackend needs to be implemented. In the example below we create thus a simple class with adata.frame to contain general spectral properties (spectra variables) and two slots for m/z and intensity values. These are stored as NumericListobjects since both m/z and intensity values are expected to be of type numericand to allow to store data from multiple spectra into a single backend object. We also define a simple constructor function that returns an empty instance of our new class.

The 3 slots spectraVars, mz and intensity will be used to store our MS data, each row in spectraVars being data for one spectrum with the columns being the different spectra variables (i.e. additional properties of a spectrum such as its retention time or MS level) and each element in mz andintensity being a numeric with the m/z and intensity values of the respective spectrum.

We should ideally also add some basic validity function that ensures the data to be OK. The function below simply checks that the number of rows of thespectraVars slot matches the length of the mz and intensity slot.

We can now create an instance of our new class with the MsBackendTest()function.

Note that a backend class does not necessarily need to contain all the data like the one from our example. Backends such as the MsBackendMzR for example retrieve the data on the fly from the raw MS data files or the MsBackendSqlfrom the MsBackendSql a SQL database.

Required methods

Methods listed in this section must be implemented for a new class extendingMsBackend. Methods should ideally also implemented in the order they are listed here. Also, it is strongly advised to write dedicated unit tests for each newly implemented method or function already during the development.

`dataStorage()`

The dataStorage spectra variable of a spectrum provides some information how or where the data is stored. The dataStorage() method should therefor return acharacter vector with length equal to the number of spectra of a backend object with that information. For most backends the data storage information can be a simple string such as "memory" or "database" to specify that the data of a spectrum is stored within the object itself or in a database, respectively.

Backend classes that keep only a subset of the MS data in memory and need to load data from data files upon request will use this spectra variable to store and keep track of the original data file for each spectrum. An example is theMsBackendMzR backend that retrieves the MS data on-the-fly from the original data file(s) whenever m/z or intensity values are requested from the backend. Calling dataStorage() on an MsBackendMzR returns thus the names from the originating files.

For our example backend we define a simple dataStorage() method that simply returns the column "dataStorage" from the @svars (as a character).

setMethod("dataStorage", "MsBackendTest", function(object) {
    as.character(object@spectraVars$dataStorage)
})

`length()`

length() is expected to return a single integer with the total number of spectra that are available through the backend class. For our example backend we simply return the number of rows of the data.frame stored in the@spectraVars slot.

setMethod("length", "MsBackendTest", function(x) {
    nrow(x@spectraVars)
})

`backendInitialize()`

The backendInitialize() method is expected to be called after creating an instance of the backend class and should prepare (initialize) the backend which in most cases means that MS data is loaded. This method can take any parameters needed by the backend to get loaded/initialized with data (which can be file names from which to load the data, a database connection or object(s) containing the data). During backendInitialize() usually also the special spectra variables dataStorage and dataOrigin are set.

Below we define a backendInitialize() method that takes as arguments adata.frame with spectra variables and two lists with the m/z and intensity values for each spectrum.

setMethod(
    "backendInitialize", "MsBackendTest",
    function(object, svars, mz, intensity) {
        if (!is.data.frame(svars))
            stop("'svars' needs to be a 'data.frame' with spectra variables")
        if (is.null(svars$dataStorage))
            svars$dataStorage <- "<memory>"
        if (is.null(svars$dataOrigin))
            svars$dataOrigin <- "<user provided>"
        object@spectraVars <- svars
        object@mz <- NumericList(mz, compress = FALSE)
        object@intensity <- NumericList(intensity, compress = FALSE)
        validObject(object)
        object
    })

In addition to adding the data to object, the function also defined thedataStorage and dataOrigin spectra variables. The purpose of these two variables is to provide some information on where the data is stored (in memory as in our example) and from where the data is originating. ThedataOrigin would for example allow to specify from which original data files individual spectra derive.

We can now create an instance of our backend class and fill it with data. We thus first define our MS data and pass this to the backendInitialize() method.

## A data.frame with spectra variables.
svars <- data.frame(msLevel = c(1L, 2L, 2L),
                    rtime = c(1.2, 1.3, 1.4))
## m/z values for each spectrum.
mzs <- list(c(12.3, 13.5, 16.5, 17.5),
            c(45.1, 45.2),
            c(64.4, 123.1, 124.1))
## intensity values for each spectrum.
ints <- list(c(123.3, 153.6, 2354.3, 243.4),
             c(100, 80.1),
             c(12.3, 35.2, 100))

## Create and initialize the backend
be <- backendInitialize(MsBackendTest(),
                        svars = svars, mz = mzs, intensity = ints)
be

## An object of class "MsBackendTest"
## Slot "spectraVars":
##   msLevel rtime dataStorage      dataOrigin
## 1       1   1.2    <memory> <user provided>
## 2       2   1.3    <memory> <user provided>
## 3       2   1.4    <memory> <user provided>
## 
## Slot "mz":
## NumericList of length 3
## [[1]] 12.3 13.5 16.5 17.5
## [[2]] 45.1 45.2
## [[3]] 64.4 123.1 124.1
## 
## Slot "intensity":
## NumericList of length 3
## [[1]] 123.3 153.6 2354.3 243.4
## [[2]] 100 80.1
## [[3]] 12.3 35.2 100
## 
## Slot "readonly":
## [1] FALSE
## 
## Slot "version":
## [1] "0.1"

While this method works and is compliant with the MsBackend API (because there is no requirement on the input parameters for the backendInitialize() method), it would be good practice for backends that are supposed to support replacing data, to add an optional additional parameter data that would allow passing the complete MS data (including m/z and intensity values) to the function as aDataFrame. This would simplify the implementation of some replacement methods and would in addition also allow to change the backend of a Spectra using thesetBackend() to our new backend. We thus re-implement thebackendInitialize() method supporting also to initialize the backend with such a data frame and we also implement a helper function that checks spectra variables for the correct data type.

#' Helper function to check if core spectra variables have the correct
#' data type.
#'
#' @param x `data.frame` with the data for spectra variables.
#'
#' @param name `character` defining the column names (spectra variables) of `x`
#'     for which the correct data type should be evaluated.
.sv_valid_data_type <- function(x, name = colnames(x)) {
    sv <- coreSpectraVariables()[names(coreSpectraVariables()) %in% name]
    for (i in seq_along(sv)) {
        if (!is(x[, names(sv[i])], sv[i]))
            stop("Spectra variabe \"", names(sv[i]), "\" is not of type ",
                 sv[i], call. = FALSE)
    }
    TRUE
}

This function is then used to check the input data in our newbackendInitialize() method.

setMethod(
    "backendInitialize", "MsBackendTest",
    function(object, svars, mz, intensity, data) {
        if (!missing(data)) {
            svars <- as.data.frame(
                data[, !colnames(data) %in% c("mz", "intensity")])
            if (any(colnames(data) == "mz"))
                mz <- data$mz
            if (any(colnames(data) == "intensity"))
                intensity <- data$intensity
        }
        if (!is.data.frame(svars))
            stop("'svars' needs to be a 'data.frame' with spectra variables")
        if (is.null(svars$dataStorage))
            svars$dataStorage <- "<memory>"
        if (is.null(svars$dataOrigin))
            svars$dataOrigin <- "<user provided>"
        .sv_valid_data_type(svars)
        object@spectraVars <- svars
        object@mz <- NumericList(mz, compress = FALSE)
        object@intensity <- NumericList(intensity, compress = FALSE)
        validObject(object)
        object
    })

We below create the backend again with the updated backendInitialize().

## Create and initialize the backend
be <- backendInitialize(MsBackendTest(),
                        svars = svars, mz = mzs, intensity = ints)
be

## An object of class "MsBackendTest"
## Slot "spectraVars":
##   msLevel rtime dataStorage      dataOrigin
## 1       1   1.2    <memory> <user provided>
## 2       2   1.3    <memory> <user provided>
## 3       2   1.4    <memory> <user provided>
## 
## Slot "mz":
## NumericList of length 3
## [[1]] 12.3 13.5 16.5 17.5
## [[2]] 45.1 45.2
## [[3]] 64.4 123.1 124.1
## 
## Slot "intensity":
## NumericList of length 3
## [[1]] 123.3 153.6 2354.3 243.4
## [[2]] 100 80.1
## [[3]] 12.3 35.2 100
## 
## Slot "readonly":
## [1] FALSE
## 
## Slot "version":
## [1] "0.1"

The backendInitialize() method that we implemented for our backend class expects the user to provide the full MS data. This does however not always have to be the case. The backendInitialize() method of the MsBackendMzR backend takes for example the file names of the raw mzML, mzXML or CDF files as input and initializes the backend by importing part of the data from these. Also the backends defined by the MsBackendMgf or r Biocpkg("MsBackendMsp") packages work in the same way and thus allow to import MS data from these specific file formats. The backendInitialize() method of the backend defined in the MsBackendSql on the other hand takes only the connection to a database containing the data as input and performs some sanity checks on the data but does not load the data into the backend. Any subsequent data access is handled by the methods of the backend class through SQL calls to the database.

The purpose of the backendInitialize() method is to initialize and prepare the data in a way that it can be accessed by a Spectra object (through the initialized backend class). Whether the data is loaded by thebackendInitialize() method into memory or simply referenced to within the backend class does not matter as long as the backend is able to provide the data with its accessor methods.

Note also that a backendInitialize() function should ideally also perform some data sanity checks (e.g. whether spectra variables have the correct data type etc).

`spectraVariables()`

The spectraVariables() method should return a character vector with the names of all available spectra variables of the backend. While a backend class should support defining and providing their own spectra variables, eachMsBackend class must provide also the core spectra variables (in the correct data type). Since not all data file formats provide values for all these spectra variables they can however also be NA (with the exception of the spectra variable "dataStorage").

The coreSpectraVariables() function returns the full list of mandatory spectra variables along with their expected data type.

coreSpectraVariables()

##                 msLevel                   rtime          acquisitionNum 
##               "integer"               "numeric"               "integer" 
##               scanIndex                      mz               intensity 
##               "integer"           "NumericList"           "NumericList" 
##             dataStorage              dataOrigin              centroided 
##             "character"             "character"               "logical" 
##                smoothed                polarity             precScanNum 
##               "logical"               "integer"               "integer" 
##             precursorMz      precursorIntensity         precursorCharge 
##               "numeric"               "numeric"               "integer" 
##         collisionEnergy  isolationWindowLowerMz isolationWindowTargetMz 
##               "numeric"               "numeric"               "numeric" 
##  isolationWindowUpperMz 
##               "numeric"

A typical spectraVariables() method for a MsBackend class will thus be implemented similarly to the one for our MsBackendTest test backend: it will return the union of the core spectra variables and the names for all available spectra variables within the backend object.

setMethod("spectraVariables", "MsBackendTest", function(object) {
    union(names(coreSpectraVariables()), colnames(object@spectraVars))
})
spectraVariables(be)

##  [1] "msLevel"                 "rtime"                  
##  [3] "acquisitionNum"          "scanIndex"              
##  [5] "mz"                      "intensity"              
##  [7] "dataStorage"             "dataOrigin"             
##  [9] "centroided"              "smoothed"               
## [11] "polarity"                "precScanNum"            
## [13] "precursorMz"             "precursorIntensity"     
## [15] "precursorCharge"         "collisionEnergy"        
## [17] "isolationWindowLowerMz"  "isolationWindowTargetMz"
## [19] "isolationWindowUpperMz"

`spectraData()`

The spectraData() method should return the full spectra data within a backend as a DataFrame object (defined in the _S4Vectors_package). The second parameter columns allows to define the names of the spectra variables that should be returned in the DataFrame. Each row in this data frame should represent one spectrum, each column a spectra variable. Columns "mz" and "intensity" (if requested) have to contain each aNumericList with the m/z and intensity values of the spectra. The DataFrame must provide values (even if they are NA) for all requested spectra variables of the backend (including the core spectra variables).

In our toy backend class we keep the spectra variable data in a simpledata.frame without any constraints such as required columns etc. To ensure that spectraData() always returns all required core spectra variables (of the correct data type) we can use however the fillCoreSpectraVariables()function. This function adds eventually missing core spectra variables to a data.frame thus we use it in our implementation of the spectraData()method below:

setMethod(
    "spectraData", "MsBackendTest",
    function(object, columns = spectraVariables(object)) {
        if (!all(columns %in% spectraVariables(object)))
            stop("Some of the requested spectra variables are not available")
        res <- fillCoreSpectraVariables(
            object@spectraVars,
            columns = columns[!columns %in% c("mz", "intensity")])
        ## Add m/z and intensity values to the result
        res$mz <- object@mz
        res$intensity <- object@intensity
        as(res[, columns, drop = FALSE], "DataFrame")
})

As an alternative, we could also initialize the @spectraVars data frame within the backendInitialize() method adding columns for spectra variables that are not provided by the user and require that this data frame always contains all core spectra variables. Extracting spectra data (single spectra variables or the full data) might thus be more efficient then the on-the-fly initialization with eventual missing spectra variables, but the backend class would also have a larger memory footprint because even spectra variables with only missing values for all spectra need to be stored within the object.

We can now use spectraData() to either extract the full spectra data from the backend, or only the data for selected spectra variables.

## Full data
spectraData(be)

## DataFrame with 3 rows and 19 columns
##     msLevel     rtime acquisitionNum scanIndex                 mz
##   <integer> <numeric>      <integer> <integer>      <NumericList>
## 1         1       1.2             NA        NA 12.3,13.5,16.5,...
## 2         2       1.3             NA        NA          45.1,45.2
## 3         2       1.4             NA        NA   64.4,123.1,124.1
##                  intensity dataStorage      dataOrigin centroided  smoothed
##              <NumericList> <character>     <character>  <logical> <logical>
## 1  123.3, 153.6,2354.3,...    <memory> <user provided>         NA        NA
## 2              100.0, 80.1    <memory> <user provided>         NA        NA
## 3         12.3, 35.2,100.0    <memory> <user provided>         NA        NA
##    polarity precScanNum precursorMz precursorIntensity precursorCharge
##   <integer>   <integer>   <numeric>          <numeric>       <integer>
## 1        NA          NA          NA                 NA              NA
## 2        NA          NA          NA                 NA              NA
## 3        NA          NA          NA                 NA              NA
##   collisionEnergy isolationWindowLowerMz isolationWindowTargetMz
##         <numeric>              <numeric>               <numeric>
## 1              NA                     NA                      NA
## 2              NA                     NA                      NA
## 3              NA                     NA                      NA
##   isolationWindowUpperMz
##                <numeric>
## 1                     NA
## 2                     NA
## 3                     NA

## Selected variables
spectraData(be, c("rtime", "mz", "centroided"))

## DataFrame with 3 rows and 3 columns
##       rtime                 mz centroided
##   <numeric>      <NumericList>  <logical>
## 1       1.2 12.3,13.5,16.5,...         NA
## 2       1.3          45.1,45.2         NA
## 3       1.4   64.4,123.1,124.1         NA

## Only missing core spectra variables
spectraData(be, c("centroided", "polarity"))

## DataFrame with 3 rows and 2 columns
##   centroided  polarity
##    <logical> <integer>
## 1         NA        NA
## 2         NA        NA
## 3         NA        NA

`peaksData()`

The peaksData() method extracts the MS peaks data from a backend, which includes the m/z and intensity values of each MS peak of a spectrum. These are expected to be returned as a List of numerical matrices with columns in eachmatrix being the requested peaks variables (with the default being "mz"and "intensity") of one spectrum. Backends must provide at least these two peaks variables.

Below we implement the peaksData() method for our backend. We need to loop over the @mz and @intensity slots to merge the m/z and intensity of each spectrum into a matrix. Also, for simplicity reasons, we accept only c("mz", "intensity") for the columns parameter. This is the expected default behavior for a MsBackend, but in general the columns parameter is thought to allow the user to specify which peaks variables should be returned in each matrix.

setMethod(
    "peaksData", "MsBackendTest",
    function(object, columns = c("mz", "intensity")) {
        if (length(columns) != 2 && columns != c("mz", "intensity"))
            stop("'columns' supports only \"mz\" and \"intensity\"")
        mapply(mz = object@mz, intensity = object@intensity,
               FUN = cbind, SIMPLIFY = FALSE, USE.NAMES = FALSE)
    })

And with this method we can now extract the peaks data from our backend.

peaksData(be)

## [[1]]
##        mz intensity
## [1,] 12.3     123.3
## [2,] 13.5     153.6
## [3,] 16.5    2354.3
## [4,] 17.5     243.4
## 
## [[2]]
##        mz intensity
## [1,] 45.1     100.0
## [2,] 45.2      80.1
## 
## [[3]]
##         mz intensity
## [1,]  64.4      12.3
## [2,] 123.1      35.2
## [3,] 124.1     100.0

The peaksData() method is used in many data analysis functions of theSpectra object to extract the MS data, thus ideally this method should be implemented in an efficient way. For our backend we need to loop over the lists of m/z and intensity values which is obviously not ideal. Thus, storing the m/z and intensity values in separate slots as done in this backend might not be ideal. The MsBackendMemory backend for example stores the MS data already as alist of matrices which results in a more efficient peaksData() method (but comes also with a larger overhead when adding, replacing or checking MS data).

Note also that while a backend needs to provide m/z and intensity values, additional peak variables would also be supported. The MsBackendMemory class for example allows to store and provide additional peak variables that can then be added as additional columns to each returned matrix. In this case the default peaksVariables() method should also be overwritten to list the additionally available variables and the columns parameter of thepeaksData() method should allow selection of these additional peaks variables (in addition to the required "mz" and "intensity" variables).

`extractByIndex()` and `[`

The extractByIndex() and [ methods allows to subset MsBackend objects. This operation is expected to reduce a MsBackend object to the selected spectra. These methods must also support duplication (e.g. [c(1, 1, 1)] and extraction in any arbitrary order (e.g. [c(3, 1, 5, 3)]). While both methods subset the object, extractByIndex() only supports to subset with an integerindex, while [, to be compliant with the base R implementation, should support to subset by indices or logical vectors. An error should be thrown if indices are out of bounds, but the method should also support returning an empty backend with [integer()]. Note that the MsCoreUtils::i2index function can be used to check for correct input (and convert the input to an integer index).

The extractByIndex() method is used by the data operation and analysis methods on Spectra objects, while the [ is intended to be used by the end user (if needed). Below we implement extractByIndex() for our backend:

setMethod("extractByIndex", c("MsBackendTest", "ANY"), function(object, i) {
    object@spectraVars <- object@spectraVars[i, ]
    object@mz <- object@mz[i]
    object@intensity <- object@intensity[i]
    object
})

The [ does not need to be defined because a default implementation for the base MsBackend exists.

We can now subset our backend to the last two spectra.

a <- extractByIndex(be, 2:3)
spectraData(a)

## DataFrame with 2 rows and 19 columns
##     msLevel     rtime acquisitionNum scanIndex                mz
##   <integer> <numeric>      <integer> <integer>     <NumericList>
## 1         2       1.3             NA        NA         45.1,45.2
## 2         2       1.4             NA        NA  64.4,123.1,124.1
##           intensity dataStorage      dataOrigin centroided  smoothed  polarity
##       <NumericList> <character>     <character>  <logical> <logical> <integer>
## 1       100.0, 80.1    <memory> <user provided>         NA        NA        NA
## 2  12.3, 35.2,100.0    <memory> <user provided>         NA        NA        NA
##   precScanNum precursorMz precursorIntensity precursorCharge collisionEnergy
##     <integer>   <numeric>          <numeric>       <integer>       <numeric>
## 1          NA          NA                 NA              NA              NA
## 2          NA          NA                 NA              NA              NA
##   isolationWindowLowerMz isolationWindowTargetMz isolationWindowUpperMz
##                <numeric>               <numeric>              <numeric>
## 1                     NA                      NA                     NA
## 2                     NA                      NA                     NA

Or extracting the second spectrum multiple times.

a <- be[c(2, 2, 2)]
spectraData(a)

## DataFrame with 3 rows and 19 columns
##       msLevel     rtime acquisitionNum scanIndex            mz     intensity
##     <integer> <numeric>      <integer> <integer> <NumericList> <NumericList>
## 2           2       1.3             NA        NA     45.1,45.2   100.0, 80.1
## 2.1         2       1.3             NA        NA     45.1,45.2   100.0, 80.1
## 2.2         2       1.3             NA        NA     45.1,45.2   100.0, 80.1
##     dataStorage      dataOrigin centroided  smoothed  polarity precScanNum
##     <character>     <character>  <logical> <logical> <integer>   <integer>
## 2      <memory> <user provided>         NA        NA        NA          NA
## 2.1    <memory> <user provided>         NA        NA        NA          NA
## 2.2    <memory> <user provided>         NA        NA        NA          NA
##     precursorMz precursorIntensity precursorCharge collisionEnergy
##       <numeric>          <numeric>       <integer>       <numeric>
## 2            NA                 NA              NA              NA
## 2.1          NA                 NA              NA              NA
## 2.2          NA                 NA              NA              NA
##     isolationWindowLowerMz isolationWindowTargetMz isolationWindowUpperMz
##                  <numeric>               <numeric>              <numeric>
## 2                       NA                      NA                     NA
## 2.1                     NA                      NA                     NA
## 2.2                     NA                      NA                     NA

`backendMerge()`

The backendMerge() method merges (combines) MsBackend objects (of the same type!) into a single instance. For our test backend we thus need to combine the values in the @spectraVars, @mz and @intensity slots. To support also merging of data.frames with different set of columns we use theMsCoreUtils::rbindFill() function instead of a simple rbind() (this function joins data frames making an union of all available columns).

setMethod("backendMerge", "MsBackendTest", function(object, ...) {
    res <- object
    object <- unname(c(object, ...))
    res@mz <- do.call(c, lapply(object, function(z) z@mz))
    res@intensity <- do.call(c, lapply(object, function(z) z@intensity))
    res@spectraVars <- do.call(MsCoreUtils::rbindFill,
                               lapply(object, function(z) z@spectraVars))
    validObject(res)
    res
})

Again, this implementation which requires 3 loops might not be the most efficient - but it allows to merge backends of the type MsBackendTest.

a <- backendMerge(be, be[2], be)
a

## An object of class "MsBackendTest"
## Slot "spectraVars":
##    msLevel rtime dataStorage      dataOrigin
## 1        1   1.2    <memory> <user provided>
## 2        2   1.3    <memory> <user provided>
## 3        2   1.4    <memory> <user provided>
## 21       2   1.3    <memory> <user provided>
## 11       1   1.2    <memory> <user provided>
## 22       2   1.3    <memory> <user provided>
## 31       2   1.4    <memory> <user provided>
## 
## Slot "mz":
## NumericList of length 7
## [[1]] 12.3 13.5 16.5 17.5
## [[2]] 45.1 45.2
## [[3]] 64.4 123.1 124.1
## [[4]] 45.1 45.2
## [[5]] 12.3 13.5 16.5 17.5
## [[6]] 45.1 45.2
## [[7]] 64.4 123.1 124.1
## 
## Slot "intensity":
## NumericList of length 7
## [[1]] 123.3 153.6 2354.3 243.4
## [[2]] 100 80.1
## [[3]] 12.3 35.2 100
## [[4]] 100 80.1
## [[5]] 123.3 153.6 2354.3 243.4
## [[6]] 100 80.1
## [[7]] 12.3 35.2 100
## 
## Slot "readonly":
## [1] FALSE
## 
## Slot "version":
## [1] "0.1"

`$`

The $ method is expected to extract a single spectra variable from a backend. Parameter name should allow to name the spectra variable to return. Each MsBackend must support extracting the core spectra variables with this method (even if no data might be available for that variable). In our example implementation below we make use of the spectraData() method, but more efficient implementations might be available as well (that would not require to first subset/create a DataFrame with the full data and to then subset that again). Also, the $ method should check if the requested spectra variable is available and should throw an error otherwise.

setMethod("$", "MsBackendTest", function(x, name) {
    spectraData(x, columns = name)[, 1L]
})

With this we can now extract the MS levels

be$msLevel

## [1] 1 2 2

or a core spectra variable that is not available within the backend.

be$precursorMz

## [1] NA NA NA

or also the m/z values

be$mz

## NumericList of length 3
## [[1]] 12.3 13.5 16.5 17.5
## [[2]] 45.1 45.2
## [[3]] 64.4 123.1 124.1

`lengths()`

The lengths() method is expected to return an integer vector (same length as the number of spectra in the backend) with the total number of peaks per spectrum.

For our MsBackendTest we can simply use the lengths() method on the m/z or intensity values for that.

setMethod("lengths", "MsBackendTest", function(x, use.names = FALSE) {
    lengths(x@mz, use.names = use.names)
})

And we can now get the peaks count per spectrum:

lengths(be)

## [1] 4 2 3

`isEmpty()`

The isEmpty() method is expected to return for each spectrum the information whether it is empty, i.e. does not contain any MS peaks (and hence m/z or intensity values). The result of the method has to be a logical of length equal to the number of spectra represented by the backend with TRUE indicating whether a spectrum is empty and FALSE otherwise. For our implementation of theisEmpty() method we use the lenghts() method defined above that returns the number of MS peaks per spectrum.

setMethod("isEmpty", "MsBackendTest", function(x) {
    lengths(x) == 0L
})
isEmpty(be)

## [1] FALSE FALSE FALSE

`acquisitionNum()`

Extract the acquisitionNum core spectra variable. The method is expected to return an integer vector with the same length as there are spectra represented by the backend. For our backend we simply re-use the spectraData() method.

setMethod("acquisitionNum", "MsBackendTest", function(object) {
    spectraData(object, "acquisitionNum")[, 1L]
})
acquisitionNum(be)

## [1] NA NA NA

`centroided()`

Extract for each spectrum the information whether it contains _centroided_data. The method is expected to return a logical vector with the same length as there are spectra represented by the backend.

setMethod("centroided", "MsBackendTest", function(object) {
    spectraData(object, "centroided")[, 1L]
})
centroided(be)

## [1] NA NA NA

`collisionEnergy()`

Extract for each spectrum the collision energy applied to generate the fragment spectrum. The method is expected to return a numeric vector with the same length as there are spectra represented by the backend (with NA_real_ for spectra for which this information is not available, such as MS1 spectra).

setMethod("collisionEnergy", "MsBackendTest", function(object) {
    spectraData(object, "collisionEnergy")[, 1L]
})
collisionEnergy(be)

## [1] NA NA NA

`dataOrigin()`

Extract the data origin spectra variable for each spectrum. This spectra variable can be used to store the origin of each spectra. The method is expected to return a character vector of length equal to the number of spectra represented by the backend.

setMethod("dataOrigin", "MsBackendTest", function(object) {
    spectraData(object, "dataOrigin")[, 1L]
})
dataOrigin(be)

## [1] "<user provided>" "<user provided>" "<user provided>"

`intensity()`

Extract the intensity values for each spectrum in the backend. The result is expected to be a NumericList of length equal to the number of spectra represented by the backend. For our test backend we can simply return the@intensity slot since the data is already stored within a NumericList.

setMethod("intensity", "MsBackendTest", function(object) {
    object@intensity
})
intensity(be)

## NumericList of length 3
## [[1]] 123.3 153.6 2354.3 243.4
## [[2]] 100 80.1
## [[3]] 12.3 35.2 100

`isolationWindowLowerMz()`

Extract the core spectra variable isolationWindowLowerMz from the backend. This information is usually provided for each spectrum in the raw mzML files. The method is expected to return a numeric vector of length equal to the number of spectra represented by the backend.

setMethod("isolationWindowLowerMz", "MsBackendTest", function(object) {
    spectraData(object, "isolationWindowLowerMz")[, 1L]
})
isolationWindowLowerMz(be)

## [1] NA NA NA

`isolationWindowTargetMz()`

Extract the core spectra variable isolationWindowTargetMz from the backend. This information is usually provided for each spectrum in the raw mzML files. The method is expected to return a numeric vector of length equal to the number of spectra represented by the backend.

setMethod("isolationWindowTargetMz", "MsBackendTest", function(object) {
    spectraData(object, "isolationWindowTargetMz")[, 1L]
})
isolationWindowTargetMz(be)

## [1] NA NA NA

`isolationWindowUpperMz()`

Extract the core spectra variable isolationWindowUpperMz from the backend. This information is usually provided for each spectrum in the raw mzML files. The method is expected to return a numeric vector of length equal to the number of spectra represented by the backend.

setMethod("isolationWindowUpperMz", "MsBackendTest", function(object) {
    spectraData(object, "isolationWindowUpperMz")[, 1L]
})
isolationWindowUpperMz(be)

## [1] NA NA NA

`msLevel()`

Extract the MS level for each spectrum in the backend. This method is expected to return an integer of length equal to the number of spectra represented by the backend.

setMethod("msLevel", "MsBackendTest", function(object) {
    spectraData(object, "msLevel")[, 1L]
})
msLevel(be)

## [1] 1 2 2

`mz()`

Extract the m/z values for each spectrum in the backend. The result is expected to be a NumericList of length equal to the number of spectra represented by the backend. Also, the m/z values are expected to be ordered increasingly for each element (spectrum).

setMethod("mz", "MsBackendTest", function(object) {
    object@mz
})
mz(be)

## NumericList of length 3
## [[1]] 12.3 13.5 16.5 17.5
## [[2]] 45.1 45.2
## [[3]] 64.4 123.1 124.1

`polarity()`

Extract the polarity core spectra variable for each spectrum in the backend. This method is expected to return an integer of length equal to the number of spectra represented by the backend. Negative and positive polarity are expected to be encoded by 0L and 1L, respectively.

setMethod("polarity", "MsBackendTest", function(object) {
    spectraData(object, "polarity")[, 1L]
})
polarity(be)

## [1] NA NA NA

`precScanNum()`

Extract the acquisition number of the precursor for each spectrum. This method is expected to return an integer of length equal to the number of spectra represented by the backend. For MS1 spectra (or if the acquisition number of the precursor is not provided) NA_integer_ has to be returned.

setMethod("precScanNum", "MsBackendTest", function(object) {
    spectraData(object, "precScanNum")[, 1L]
})
precScanNum(be)

## [1] NA NA NA

`precursorCharge()`

Extract the charge of the precursor for each spectrum. This method is expected to return an integer of length equal to the number of spectra represented by the backend. For MS1 spectra (or if the charge of the precursor is not provided) NA_integer_ has to be returned.

setMethod("precursorCharge", "MsBackendTest", function(object) {
    spectraData(object, "precursorCharge")[, 1L]
})
precursorCharge(be)

## [1] NA NA NA

`precursorIntensity()`

Extract the intensity of the precursor for each spectrum. This method is expected to return an numeric of length equal to the number of spectra represented by the backend. For MS1 spectra (or if the precursor intensity for a fragment spectrum is not provided) NA_real_ has to be returned.

setMethod("precursorIntensity", "MsBackendTest", function(object) {
    spectraData(object, "precursorIntensity")[, 1L]
})
precursorIntensity(be)

## [1] NA NA NA

`precursorMz()`

Extract the precursor m/z for each spectrum. This method is expected to return an numeric of length equal to the number of spectra represented by the backend. For MS1 spectra (or if the precursor m/z for a fragment spectrum is not provided) NA_real_ has to be returned.

setMethod("precursorMz", "MsBackendTest", function(object) {
    spectraData(object, "precursorMz")[, 1L]
})
precursorMz(be)

## [1] NA NA NA

`rtime()`

Extract the retention time of each spectrum. This method is expected to return anumeric of length equal to the number of spectra represented by the backend.

setMethod("rtime", "MsBackendTest", function(object) {
    spectraData(object, "rtime")[, 1L]
})
rtime(be)

## [1] 1.2 1.3 1.4

`scanIndex()`

Extract the scan index core spectra variable. The scan index represents the relative index of the spectrum within the respective raw data file and can be different than the acquisitionNum (which is the index of a spectrum as recorded by the MS instrument). This method is expected to return a integer of length equal to the number of spectra represented by the backend.

setMethod("scanIndex", "MsBackendTest", function(object) {
    spectraData(object, "scanIndex")[, 1L]
})
scanIndex(be)

## [1] NA NA NA

`smoothed()`

Extract the smoothed core spectra variable that indicates whether a spectrum was smoothed. This variable is supported for backward compatibility but seldomly used. The method is expected to return a logical with length equal to the number of spectra represented by the backend.

setMethod("smoothed", "MsBackendTest", function(object) {
    spectraData(object, "smoothed")[, 1L]
})
smoothed(be)

## [1] NA NA NA

`spectraNames()`

The spectraNames() can be used to extract (optional) names (or IDs) for individual spectra of a backend, or NULL if not set. For our test backend we can use the rownames of the @spectraVars slot to store spectra names.

setMethod("spectraNames", "MsBackendTest", function(object) {
    rownames(object@spectraVars)
})
spectraNames(be)

## [1] "1" "2" "3"

These are all the methods that need to be implemented for a valid read-only MsBackend class and running a test on such an object as described in section_Testing the validity of the backend_ should not produce any errors. For backends that support also data replacement also the methods listed in the next section need to be implemented.

`tic()`

The tic() method should return the total ion count (i.e. the sum of intensities) for each spectrum. This information is usually also provided by the raw MS data files, but can also be calculated on the fly from the data. The parameter initial (which is by default TRUE) allows to define whether the provided original tic should be returned (for initial = TRUE) or whether the tic should be calculated on the actual data (initial = FALSE). The original tic values are usually provided by a spectra variable "totIonCurrent". Thus, for initial = TRUE, in our implementation below we return the value of such a spectra variable it it is avaialble or NA if it is not.

setMethod("tic", "MsBackendTest", function(object, initial = TRUE) {
    if (initial) {
        if (any(spectraVariables(object) == "totIonCurrent"))
            spectraData(object, "totIonCurrent")[, 1L]
        else rep(NA_real_, length(object))
    } else vapply(intensity(object), sum, numeric(1), na.rm = TRUE)
})

We can now either return the original (initial) TIC (which is not available).

tic(be)

## [1] NA NA NA

Or calculate the TIC based on the actual intensity values.

tic(be, initial = FALSE)

## [1] 2874.6  180.1  147.5

Data replacement methods

As stated in the general description, MsBackend implementations can also be purely read-only resources allowing to just access data, but not to replace the data. Thus, it is not strictly required to implement these methods, but for a fully functional backend it is suggested (as much as possible). A backend for a purely read-only MS data resource might even extend the MsBackendCachedbackend defined in the Spectra package that provides a mechanism to cache (spectra variable) data in a data.frame within the object. TheMsBackendMassbankSql implemented in the _MsBackendMassbank_package extends for example this backend and thus allows modifying some spectra variables without changing the original data in the MassBank SQL database.

`spectraData<-`

The spectraData<- method should allow to replace the data within a backend. The method should take a DataFrame with the full data as input value and is expected to replace the full data within the backend, i.e. all spectra variables as well as peak data. Also, importantly, the number of spectra before and after calling the spectraData<- method on an object has to be the same. For our implementation we can make use of the optional parameter datathat we added to the backendInitialize() method and that allows to fill aMsBackendTest object with the full data.

setReplaceMethod("spectraData", "MsBackendTest", function(object, value) {
    if (!inherits(value, "DataFrame"))
        stop("'value' is expected to be a 'DataFrame'")
    if (length(object) && length(object) != nrow(value))
        stop("'value' has to be a 'DataFrame' with ", length(object), " rows")
    object <- backendInitialize(MsBackendTest(), data = value)
    object
})

To test this new method we extract the full spectra data, add an additional column (spectra variable) and replace the data again.

d <- spectraData(be)
d$new_col <- c("a", "b", "c")

spectraData(be) <- d
be$new_col

## [1] "a" "b" "c"

`intensity<-`

The intensity<- method should allow to replace the intensity values of all spectra in a backend. This method is expected to only replace the values of the intensities, but must not change the number of intensities (and hence peaks) of a spectrum (that could be done with the peaksData<- method that allows to replace intensity and m/z values at the same time). The value for the method should ideally be a NumericList to ensure that all intensity values are indeed numeric. In addition to the method we implement also a simple helper function that checks for the correct length of value. Each data replacement method needs to check for that and this function thus reduces code duplication.

.match_length <- function(x, y) {
    if (length(x) != length(y))
        stop("Length of 'value' has to match the length of 'object'")
}

setReplaceMethod("intensity", "MsBackendTest", function(object, value) {
    .match_length(object, value)
    if (!(is.list(value) || inherits(value, "NumericList")))
        stop("'value' has to be a list or NumericList")
    if (!all(lengths(value) == lengths(mz(object))))
        stop("lengths of 'value' has to match the number of peaks per spectrum")
    if (!inherits(value, "NumericList"))
        value <- NumericList(value, compress = FALSE)
    object@intensity <- value
    object
})

We could now use this method to replace the intensities in our backend with modified intensities.

intensity(be)

## NumericList of length 3
## [[1]] 123.3 153.6 2354.3 243.4
## [[2]] 100 80.1
## [[3]] 12.3 35.2 100

intensity(be) <- intensity(be) - 10
intensity(be)

## NumericList of length 3
## [[1]] 113.3 143.6 2344.3 233.4
## [[2]] 90 70.1
## [[3]] 2.3 25.2 90

`mz<-`

The mz<- method should allow to replace the m/z values of all spectra in a backend. The implementation can be the same as for the intensity<-method. m/z values within each spectrum need to be increasingly ordered. We thus also check that this is the case for the provided m/z values. We take here the advantage that a efficient is.unsorted() implementation for NumericList is already available, which is faster than e.g. callingvapply(mz(be), is.unsorted, logical(1)).

setReplaceMethod("mz", "MsBackendTest", function(object, value) {
    .match_length(object, value)
    if (!(is.list(value) || inherits(value, "NumericList")))
        stop("'value' has to be a list or NumericList")
    if (!all(lengths(value) == lengths(mz(object))))
        stop("lengths of 'value' has to match the number of peaks per spectrum")
    if (!inherits(value, "NumericList"))
        value <- NumericList(value, compress = FALSE)
    if (any(is.unsorted(value)))
        stop("m/z values need to be increasingly sorted within each spectrum")
    object@mz <- value
    object
})

`peaksData<-`

The peaksData<- should allow to replace the peaks data (m/z and intensity values) of all spectra in a backend. In contrast to the mz<- and intensity<-methods this method should also support changing the number of peaks per spectrum (e.g. due to filtering). Parameter value has to be a list ofmatrix objects with columns "mz" and "intensity". The length of this list has to match the number of spectra in the backend. In the implementation for our backend class we need to loop over this list to extract the m/z and intensity values and assign them to the @mz and @intensity slots.

setReplaceMethod("peaksData", "MsBackendTest", function(object, value) {
    if (!(is.list(value) || inherits(value, "SimpleList")))
        stop("'value' has to be a list-like object")
    .match_length(object, value)
    object@mz <- NumericList(lapply(value, "[", , "mz"), compress = FALSE)
    object@intensity <- NumericList(lapply(value, "[", , "intensity"),
                                    compress = FALSE)
    validObject(object)
    object
})

Using the peaksData<- method we can now also for example remove peaks.

pd <- peaksData(be)
## Remove the first peak from the first spectrum
pd[[1L]] <- pd[[1L]][-1L, ]

lengths(be)

## [1] 4 2 3

peaksData(be) <- pd
lengths(be)

## [1] 3 2 3

`$<-`

The $<- method should allow to replace values for spectra variables or also to add additional spectra variables to the backend. As with all replacement methods, the length() of value has to match the number of spectra represented by the backend.

setReplaceMethod("$", "MsBackendTest", function(x, name, value) {
    .match_length(x, value)
    if (name == "mz") {
        mz(x) <- value
    } else if (name == "intensity") {
       intensity(x) <- value
    } else {
        x@spectraVars[[name]] <- value
    }
    .sv_valid_data_type(x@spectraVars, name)
    x
})

We can now replace for example existing spectra variables:

msLevel(be)

## [1] 1 2 2

be$msLevel <- c(2L, 1L, 2L)
msLevel(be)

## [1] 2 1 2

Or even add new spectra variables.

be$new_var <- c("a", "b", "c")
be$new_var

## [1] "a" "b" "c"

Replacement methods for all core spectra variables can be implemented similarly.

`selectSpectraVariables()`

The selectSpectraVariables() function should allow to reduce the information within the backend (parameter object) to the selected spectra variables (parameter spectraVariables). This is equivalent to a subset by columns/variables. For core spectra variables, if not specified by parameterspectraVariables, only their values are expected to be removed (since core spectra variables are expected to be available even if they are not defined within a backend). The implementation for our backend will remove any columns in the @spectraVars data frame not defined in the spectraVariablesparameter. Special care is given to the "mz" and "intensity" spectra variables: if they are not selected, the @mz and @intensity slots are initialized with empty NumericList (of length matching the number of spectra). Note also that some backends might throw an error if a spectra variable required for the backend is removed (such as "dataStorage" for aMsBackendMzR backend, which is required by the backend to allow retrieval of m/z and intensity values).

setMethod(
    "selectSpectraVariables", "MsBackendTest",
    function(object, spectraVariables = spectraVariables(object)) {
        keep <- colnames(object@spectraVars) %in% spectraVariables
        object@spectraVars <- object@spectraVars[, keep, drop = FALSE]
        if (!any(spectraVariables == "mz"))
            object@mz <- NumericList(vector("list", length(object)),
                                     compress = FALSE)
        if (!any(spectraVariables == "intensity"))
            object@intensity <- NumericList(vector("list", length(object)),
                                            compress = FALSE)
        validObject(object)
        object
    })

We can now use selectSpectraVariables() to remove for example the spectra variable "new_var" added above.

be2 <- be
be2 <- selectSpectraVariables(be2, c("msLevel", "rtime", "mz",
                                     "intensity", "dataStorage"))
spectraVariables(be2)

##  [1] "msLevel"                 "rtime"                  
##  [3] "acquisitionNum"          "scanIndex"              
##  [5] "mz"                      "intensity"              
##  [7] "dataStorage"             "dataOrigin"             
##  [9] "centroided"              "smoothed"               
## [11] "polarity"                "precScanNum"            
## [13] "precursorMz"             "precursorIntensity"     
## [15] "precursorCharge"         "collisionEnergy"        
## [17] "isolationWindowLowerMz"  "isolationWindowTargetMz"
## [19] "isolationWindowUpperMz"

The spectra variable "new_var" is now no longer be available. Note however that still all core spectra variables are listed, even if they were not selected with the spectraVariables parameter. While these variables (such as"dataOrigin") are still listed by spectraVariables(be2), their actual values have been removed:

dataOrigin(be)

## [1] "<user provided>" "<user provided>" "<user provided>"

dataOrigin(be2)

## [1] NA NA NA

If "mz" and "intensitity" are not selected, the m/z and intensity values get removed.

be2 <- selectSpectraVariables(be2, c("msLevel", "rtime", "dataStorage"))
mz(be2)

## NumericList of length 3
## [[1]] numeric(0)
## [[2]] numeric(0)
## [[3]] numeric(0)

intensity(be2)

## NumericList of length 3
## [[1]] numeric(0)
## [[2]] numeric(0)
## [[3]] numeric(0)

`centroided<-`

Replace the value for the centroided core spectra variable. The provided data type must be a logical. We re-use the function .sv_valid_data_type which was defined above for backendInitialize() to check for the correct data type of core spectra variables.

setReplaceMethod("centroided", "MsBackendTest", function(object, value) {
    object@spectraVars[["centroided"]] <- value
    .sv_valid_data_type(object@spectraVars, "centroided")
    object
})

Alternatively, we could also simply re-use the $<- replacement method above. This would make the whole code base for our backend cleaner as replacing or adding spectra variables would be handled in a single central function.

setReplaceMethod("centroided", "MsBackendTest", function(object, value) {
    object$centroided <- value
    object
})

centroided(be) <- c(TRUE, FALSE, TRUE)
centroided(be)

## [1]  TRUE FALSE  TRUE

`collisionEnergy<-`

Replace the values for the collision energy. Parameter value has to be of typenumeric.

setReplaceMethod("collisionEnergy", "MsBackendTest", function(object, value) {
    object$collisionEnergy <- value
    object
})

collisionEnergy(be) <- c(NA_real_, 20.0, 20.0)
collisionEnergy(be)

## [1] NA 20 20

`dataOrigin<-`

Replace the values for the data origin spectra variable. Parameter value has to be of type character.

setReplaceMethod("dataOrigin", "MsBackendTest", function(object, value) {
    object$dataOrigin <- value
    object
})

dataOrigin(be)

## [1] "<user provided>" "<user provided>" "<user provided>"

dataOrigin(be) <- c("unknown", "file a", "file b")
dataOrigin(be)

## [1] "unknown" "file a"  "file b"

`dataStorage<-`

Replace the values for the data storage spectra variable. Parameter value has to be of type character. Since our backend does not really make any use of this spectra variable, we can accept any character value. For other backends, that for example need to load data on-the-fly from data files, this spectra variable could be used to store the name of the data files and hence we would need to perform some additional checks within this replacement function.

setReplaceMethod("dataStorage", "MsBackendTest", function(object, value) {
    object$dataStorage <- value
    object
})

dataStorage(be)

## [1] "<memory>" "<memory>" "<memory>"

dataStorage(be) <- c("", "", "")
dataStorage(be)

## [1] "" "" ""

`isolationWindowLowerMz<-`

Replace the values for the isolation window lower m/z spectra variable. Parameter value has to be of type numeric (NA_real_ missing values are supported, e.g. for MS1 spectra).

setReplaceMethod(
    "isolationWindowLowerMz", "MsBackendTest", function(object, value) {
        object$isolationWindowLowerMz <- value
        object
    })

isolationWindowLowerMz(be) <- c(NA_real_, 245.3, NA_real_)
isolationWindowLowerMz(be)

## [1]    NA 245.3    NA

`isolationWindowTargetMz<-`

Replace the values for the isolation window target m/z spectra variable. Parameter value has to be of type numeric (NA_real_ missing values are supported, e.g. for MS1 spectra).

setReplaceMethod(
    "isolationWindowTargetMz", "MsBackendTest", function(object, value) {
        object$isolationWindowTargetMz <- value
        object
    })

isolationWindowTargetMz(be) <- c(NA_real_, 245.4, NA_real_)
isolationWindowTargetMz(be)

## [1]    NA 245.4    NA

`isolationWindowUpperMz<-`

Replace the values for the isolation window upper m/z spectra variable. Parameter value has to be of type numeric (NA_real_ missing values are supported, e.g. for MS1 spectra).

setReplaceMethod(
    "isolationWindowUpperMz", "MsBackendTest", function(object, value) {
        object$isolationWindowUpperMz <- value
        object
    })

isolationWindowUpperMz(be) <- c(NA_real_, 245.5, NA_real_)
isolationWindowUpperMz(be)

## [1]    NA 245.5    NA

`msLevel<-`

Replace the MS level of spectra in a backend. Parameter value has to be of type integer. Missing values (NA_integer_) are supported.

setReplaceMethod("msLevel", "MsBackendTest", function(object, value) {
    object$msLevel <- value
    object
})

msLevel(be)

## [1] 2 1 2

msLevel(be) <- c(1L, 1L, 2L)
msLevel(be)

## [1] 1 1 2

`polarity<-`

Replace the values for the polarity spectra variables. Parameter value has to be of type integer and should ideally also use the standard encoding 0Land 1L for negative and positive polarity (and NA_integer for missing). Thus, in our implementation we also make sure the input parameter contains the expected values (although this is not a strictly required).

setReplaceMethod("polarity", "MsBackendTest", function(object, value) {
    if (!all(value %in% c(0, 1, NA)))
        stop("'polarity' should be encoded as 0L (negative), 1L (positive) ",
             "with missing values being NA_integer_")
    object$polarity <- value
    object
})

polarity(be) <- c(0L, 0L, 0L)
polarity(be)

## [1] 0 0 0

`rtime<-`

Replace the retention times for the spectra represented by the backend. Parameter value must be of type numeric. Also, although it is not a strict requirement, retention times should ideally be ordered increasingly per sample and their unit should be seconds.

setReplaceMethod("rtime", "MsBackendTest", function(object, value) {
    object$rtime <- value
    object
})

rtime(be)

## [1] 1.2 1.3 1.4

rtime(be) <- rtime(be) + 2
rtime(be)

## [1] 3.2 3.3 3.4

`smoothed<-`

Replace the spectra variable smoothed that indicates whether some data smoothing operation was performed on the spectra. Parameter value must be of type logical.

setReplaceMethod("smoothed", "MsBackendTest", function(object, value) {
    object$smoothed <- value
    object
})

smoothed(be) <- rep(TRUE, 3)
smoothed(be)

## [1] TRUE TRUE TRUE

`spectraNames<-`

Replace the names of individual spectras within the backend. Same as fornames, colnames or rownames, spectraNames are expected to be of typecharacter. In our backend implementation we store the spectra names into therownames of the @spectraVars data frame.

setReplaceMethod("spectraNames", "MsBackendTest", function(object, value) {
    rownames(object@spectraVars) <- value
    object
})

spectraNames(be) <- c("a", "b", "c")
spectraNames(be)

## [1] "a" "b" "c"

Optional methods

Default implementations for these methods are available for MsBackendclasses, thus these methods don’t have to be implemented for each new backend. For some backends, depending on how the data is represented or accessed within it, different implementations might however be more efficient.

`backendBpparam()`

The backendBpparam() method is supposed to evaluate whether a provided (or the default) parallel processing setup is supported by the backend. Backends that do not support parallel processing should return SerialParam() instead.

The default implementation is shown below.

setMethod("backendBpparam", signature = "MsBackend",
          function(object, BPPARAM = bpparam()) {
              ## Return SerialParam() instead to disable parallel processing
              BPPARAM
          })

`backendParallelFactor()`

The backendParallelFactor() allows a backend to suggest a preferred way how the backend could be split for parallel processing. See also the notes on parallel processing above for more information. The default implementation returns factor() (i.e. a factor of length 0) hence not suggesting any splitting:

setMethod("backendParallelFactor", "MsBackend", function(object, ...) {
    factor()
})

`backendRequiredSpectraVariables()`

The backendRequiredSpectraVariables() method can be implemented if a backend needs specific spectra variables to work. The default implementation is:

setMethod("backendRequiredSpectraVariables", "MsBackend",
          function(object, ...) {
              character()
          })

The implementation for MsBackendMzR returns c("dataStorage", "scanIndex") as the backend needs these two spectra variables to load the MS data on-the-fly from the original data files.

`dropNaSpectraVariables()`

The dropNaSpectraVariables() is supposed to allow removing all spectra variables from a data set (storage) that contain only missing values (i.e. where the value of a spectra variable for each spectrum is NA). This function is intended to reduce memory requirements of backends such as the MsBackendMzRthat load values from all core spectra variables from the original data files, even if their values are only NA. Removing these missing values from the backend can hence reduce the size in memory of a backend without data loss (because methods extracting core spectra variables are supposed to always returnNA values even if no data is available for them - in such cases the NAvalues are supposed to be created on-the-fly.

The default implementation is shown below.

setMethod("dropNaSpectraVariables", "MsBackend", function(object) {
    svs <- spectraVariables(object)
    svs <- svs[!(svs %in% c("mz", "intensity"))]
    spd <- spectraData(object, columns = svs)
    keep <- !vapply1l(spd, function(z) all(is.na(z)))
    selectSpectraVariables(object, c(svs[keep], "mz", "intensity"))
})

`isReadOnly()`

isReadOnly() is expected to return a logical(1) with either TRUE orFALSE indicating whether the backend supports replacing data or not. The default implementation is shown below.

setMethod("isReadOnly", "MsBackend", function(object) {
    object@readonly
})

`peaksVariables()`

The peaksVariables() is expected to return a character vector with the names of the peaks variables (i.e. information and properties of individual mass peaks) available in the backend. The default implementation for MsBackendreturns by default c("mz", "intensity"). This method should only be implemented for backends that (eventually) also provide additional peaks variables. The default implementation is shown below.

setMethod("peaksVariables", "MsBackend", function(object) {
    c("mz", "intensity")
})

`uniqueMsLevels()`

This method should return the unique MS level(s) of all spectra within the backend. The default implementation is shown below.

setMethod("uniqueMsLevels", "MsBackend", function(object, ...) {
    unique(msLevel(object))
})

This method thus retrieves first the MS levels of all spectra and then callsunique() on them. Database-based backends might avoid such an eventually heavy operation by selecting the unique MS levels directly using an SQL call.

`precursorMz<-`

Replace the values for the precursor m/z spectra variable. Parameter value has to be of type numeric (NA_real_ missing values are supported, e.g. for MS1 spectra). The default implementation uses the$<- method:

setReplaceMethod("precursorMz", "MsBackend", function(object, ..., value) {
    object$precursorMz <- value
    object
})

`ionCount()`

The ionCount() method should return a numeric (length equal to the number of spectra represented by the backend) with the sum of all intensities within each spectrum. For empty spectra NA_real_ should be returned. The method below is the default implementation of the method.

setMethod("ionCount", "MsBackend", function(object) {
    vapply1d(intensity(object), sum, na.rm = TRUE)
})

`isCentroided()`

This method should return the information for each spectrum whether it is_centroided_. In contrast to the centroided() method a heuristic approach is used. The default implementation is shown below.

setMethod("isCentroided", "MsBackend", function(object, ...) {
    vapply1l(peaksData(object), .peaks_is_centroided)
})

`reset()`

This is a special method that backends may implement or support, but don’t necessary have to. This method will be called by the reset,Spectra method and is supposed to restore the data to its original state. The default implementation for MsBackend shown below simply returns the backend as-is. TheMsBackendSql backend from the MsBackendSql package in contrast re-initializes the data using the data from the database.

setMethod("reset", "MsBackend", function(object) {
    object
})

`export()`

This method should export the data from a MsBackend. The method is called by the export,Spectra method that passes itself as a second argument to the function. The export,MsBackend implementation is thus expected to take aSpectra object as second argument from which all data should be taken and exported. Implementation of this method is optional. The implementation of the method for the MsBackendMzR backend is shown below.

setMethod("export", "MsBackendMzR", function(object, x, file = tempfile(),
                                             format = c("mzML", "mzXML"),
                                             copy = FALSE,
                                             BPPARAM = bpparam()) {
    l <- length(x)
    file <- sanitize_file_name(file)
    if (length(file) == 1)
        file <- rep_len(file, l)
    if (length(file) != l)
        stop("Parameter 'file' has to be either of length 1 or ",
             length(x), ", i.e. 'length(x)'.", call. = FALSE)
    f <- factor(file, levels = unique(file))
    tmp <- bpmapply(.write_ms_data_mzR, split(x, f), levels(f),
                    MoreArgs = list(format = format, copy = copy),
                    BPPARAM = BPPARAM)
})

See alternatively also the MsBackendMgf package for an implementation for the MsBackendMgf backend.

`split()`

The split() method should allow to split a MsBackend into a list ofMsBackend objects. The default implementation is shown below.

setMethod("split", "MsBackend", function(x, f, drop = FALSE, ...) {
    split.default(x, f, drop = drop, ...)
})

`supportsSetBackend()`

Whether a MsBackend supports the setBackend() method that allows to change the backend of a Spectra object from one to another backend. To supportsetBackend() the backend needs to have a parameter data in itsbackendInitialize() method that allows it to be initialized with a DataFramecontaining the full spectra data. The default implementation is shown below.

setMethod("supportsSetBackend", "MsBackend", function(object, ...) {
    !isReadOnly(object)
})

`filterDataOrigin()`

The filter* methods are expected to take a MsBackend and to subset it based on some criteria. While also the [ method could be used to perform such subset operation, these methods might allow more efficient ways to subset the data e.g. by performing the operation within a database with a dedicated SQL call. A default implementation is available for every filter function and thus a method needs only to be implemented if the data storage/representation within a backend would allow a more efficient operation.

All filter methods are expected to return the subset backend (i.e. an instance of the same backend class with the same, or less spectra).

The filterDataOrigin() should subset the backend to spectra with theirdataOrigin spectra variable matching the values provided with the dataOriginparameter. The default implementation for MsBackend is shown below.

setMethod("filterDataOrigin", "MsBackend",
          function(object, dataOrigin = character()) {
              if (length(dataOrigin)) {
                  object <- object[dataOrigin(object) %in% dataOrigin]
                  if (is.unsorted(dataOrigin))
                      object[order(match(dataOrigin(object), dataOrigin))]
                  else object
              } else object
          })

`filterDataStorage()`

Similar to the filterDataOrigin(), the filterDataStorage() should subset a backend to spectra with their dataStorage spectra variable matching the values provided with the dataStorage parameter.

setMethod("filterDataStorage", "MsBackend",
          function(object, dataStorage = character()) {
              if (length(dataStorage)) {
                  object <- object[dataStorage(object) %in% dataStorage]
                  if (is.unsorted(dataStorage))
                      object[order(match(dataStorage(object), dataStorage))]
                  else object
              } else object
          })

`filterEmptySpectra()`

The filterEmptySpectra() should remove all empty spectra (i.e. spectra without any mass peaks) from the backend. The method is expected to return the subset backend. The default implementation for MsBackend is shown below.

setMethod("filterEmptySpectra", "MsBackend", function(object, ...) {
    if (!length(object)) return(object)
    object[as.logical(lengths(object))]
})

`filterIsolationWindow()`

The filterIsolationWindow() filters the backend to spectra with the providedmz value being within their isolationWindowLowerMz() andisolationWindowUpperMz(). The parameter mz defining this target m/z is expected to be a numeric of length 1. The default implementation forMsBackend is shown below.

setMethod("filterIsolationWindow", "MsBackend",
          function(object, mz = numeric(), ...) {
              if (length(mz)) {
                  if (length(mz) > 1)
                      stop("'mz' is expected to be a single m/z value")
                  keep <- which(isolationWindowLowerMz(object) <= mz &
                                isolationWindowUpperMz(object) >= mz)
                  object[keep]
              } else object
          })

`filterMsLevel()`

The filterMsLevel() method is expected to reduce the backend to spectra with the provided MS level(s). Parameter msLevel has to be an integer (any length). The default implementation for MsBackend is shown below.

setMethod("filterMsLevel", "MsBackend",
          function(object, msLevel = integer()) {
              if (length(msLevel)) {
                  object[msLevel(object) %in% msLevel]
              } else object
          })

`filterPolarity()`

The filterPolarity() method is expected to subset the backend to spectra matching the provided polarity (or polarities). Parameter polarity has to be an integer (of any length). The default implementation for MsBackend is shown below.

setMethod("filterPolarity", "MsBackend",
          function(object, polarity = integer()) {
              if (length(polarity))
                  object[polarity(object) %in% polarity]
              else object
          })

`filterPrecursorMzRange()`

The filterPrecursorMzRange() method filters the backend to spectra with theirprecursorMz being between the provided m/z range (parameter mz). This method was previously named filterPrecursorMz(). Parameter mz is expected to be anumeric of length 2 defining the lower and upper limit of this precursor m/z range. The default implementation for MsBackend is shown below.

library(MsCoreUtils)

## 
## Attaching package: 'MsCoreUtils'

## The following object is masked from 'package:IRanges':
## 
##     reduce

## The following objects are masked from 'package:Spectra':
## 
##     bin, entropy, smooth

## The following object is masked from 'package:stats':
## 
##     smooth

setMethod("filterPrecursorMzRange", "MsBackend",
          function(object, mz = numeric()) {
              if (length(mz)) {
                  mz <- range(mz)
                  keep <- which(between(precursorMz(object), mz))
                  object[keep]
              } else object
          })

`filterPrecursorMzValues()`

The filterPrecursorMzValues() method filters the backend to spectra with their m/z values matching to the provided m/z value(s). Parameters ppm andtolerance (both expected to be numeric of length 1) allow to define the conditions for the relaxed matching. Parameter mz has to be a numeric (of any length). The default implementation for MsBackend is shown below.

setMethod("filterPrecursorMzValues", "MsBackend",
          function(object, mz = numeric(), ppm = 20, tolerance = 0) {
              if (length(mz)) {
                  object[.values_match_mz(precursorMz(object), mz = mz,
                                          ppm = ppm, tolerance = tolerance)]
              } else object
          })

The .values_match_mz function used by this function is defined as:

.values_match_mz <- function(x, mz, ppm = 20, tolerance = 0) {
    o <- order(x, na.last = NA)
    cmn <- common(x[o], sort(mz), tolerance = tolerance, ppm = ppm,
                  duplicates = "keep", .check = FALSE)
    sort(o[cmn])
}

`filterPrecursorCharge()`

The filterPrecursorCharge() method filters the backend to spectra with matching precursor charge. Parameter z defining the requested precursor charge has to be an integer (of any length). The default implementation forMsBackend is shown below.

setMethod("filterPrecursorCharge", "MsBackend",
          function(object, z = integer()) {
              if (length(z)) {
                  keep <- which(precursorCharge(object) %in% z)
                  object[keep]
              } else object
          })

`filterPrecursorScan()`

The filterPrecursorScan() method filters the backend to parent (e.g. MS1) and children scans (e.g. MS2) of acquisition number acquisitionNum. Parameter fdefines how the backend should be split (by default by original data file) to avoid selecting spectra from different samples/files. The default implementation for MsBackend is shown below.

setMethod("filterPrecursorScan", "MsBackend",
          function(object, acquisitionNum = integer(), f = dataOrigin(object)) {
              if (length(acquisitionNum) && length(f)) {
                  if (!is.factor(f))
                      f <- factor(f, exclude = character())
                  keep <- unsplit(lapply(split(object, f = f), function(z, an) {
                      .filterSpectraHierarchy(acquisitionNum(z),
                                              precScanNum(z),
                                              an)
                  }, an = acquisitionNum), f = f)
                  object[keep]
              } else object
          })

`filterRt()`

The filterRt() method filters the backend to spectra with their retention time being between the provided rt range. Parameter rt is expected to be a numeric of length 2 defining the lower and upper bound of this range. Parameter msLevel. (note the . in the name of the parameter!) can be optionally used to restrict the filtering to the selected MS levels (i.e. the RT filter is only applied to spectra of the selected MS levels and all spectra with a different MS level are returned as well). The default implementation forMsBackend is shown below.

setMethod("filterRt", "MsBackend",
          function(object, rt = numeric(), msLevel. = integer()) {
              if (length(rt)) {
                  rt <- range(rt)
                  sel_ms <- msLevel(object) %in% msLevel.
                  sel_rt <- between(rtime(object), rt) & sel_ms
                  object[sel_rt | !sel_ms]
              } else object
          })

Implementation notes

In this tutorial we implemented a simple in-memory MsBackend from scratch. For many real-life situation it might however be better to extend some of the pre-defined backend classes from the Spectra package to avoid duplicating functionality. A good starting point might be the MsBackendMemorybackend for any in-memory data representation, or the MsBackendCached for backends that retrieve data from inherently read-only resources (such as database connection or raw data files) but still would need to support adding spectra variables or changing values of spectra variables. Similarly, if the only purpose of a backend is to import or export data in a specific format, theMsBackendMemory might be extended and a single method (backendInitialize()) would need to be implemented for the new class: this new backendInitialize()would then call the code to import the data from the new file format and store it within the available slots of the MsBackendMemory object. Examples would be the backends provided by the _MsBackendMgf_and MsBackendMsp classes.

Creating new MsBackend classes (original) (raw)

Required methods

dataStorage()

length()

backendInitialize()

spectraVariables()

spectraData()

peaksData()

extractByIndex() and [

backendMerge()

$

lengths()

isEmpty()

acquisitionNum()

centroided()

collisionEnergy()

dataOrigin()

intensity()

isolationWindowLowerMz()

isolationWindowTargetMz()

isolationWindowUpperMz()

msLevel()

mz()

polarity()

precScanNum()

precursorCharge()

precursorIntensity()

precursorMz()

rtime()

scanIndex()

smoothed()

spectraNames()

tic()

Data replacement methods

spectraData<-

intensity<-

mz<-

peaksData<-

$<-

selectSpectraVariables()

centroided<-

collisionEnergy<-

dataOrigin<-

dataStorage<-

isolationWindowLowerMz<-

isolationWindowTargetMz<-

isolationWindowUpperMz<-

msLevel<-

polarity<-

rtime<-

smoothed<-

spectraNames<-