CardinalIO: Parsing and writing imzML files

Introduction

CardinalIO provides fast and efficient parsing and writing of imzML files for storage of mass spectrometry (MS) imaging experiments. It is intended to take over all file importing and exporting duties for the Cardinal package for MS imaging data analysis. Only the most basic methods are provided here. Support for higher-level objects (e.g., MSImagingExperiment from Cardinal) should be provided in their respective packages.

The imzML format is an open standard for long-term storage of MS imaging experimental data. Each MS imaging dataset is composed of two files: (1) an XML metadata file ending in “.imzML” that contains experimental metadata and (2) a binary data file ending in “.ibd” that contains the actual m/z and intensity arrays. The files are linked by a UUID. Both files must be present to successfully import an MS imaging dataset.
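Because both files must be present, it can be useful to check for the companion file before importing. The following is a minimal base R sketch, assuming the two files sit in the same directory and share a base name that differs only by extension. The helper name has_ibd_companion is hypothetical and not part of CardinalIO.

```r
# Hypothetical helper (not part of CardinalIO): checks that an ".imzML" file
# has an ".ibd" file with the same base name in the same directory.
has_ibd_companion <- function(path) {
  # Only paths ending in ".imzML" are considered
  if (!grepl("\\.imzML$", path, ignore.case = TRUE)) return(FALSE)
  # Swap the ".imzML" extension for ".ibd" (assumes matching base names)
  ibd_path <- sub("\\.imzML$", ".ibd", path, ignore.case = TRUE)
  file.exists(path) && file.exists(ibd_path)
}

# Usage with throwaway files in a temporary directory
dir <- tempfile("imzml-demo-")
dir.create(dir)
xml_file <- file.path(dir, "example.imzML")
file.create(xml_file)
has_ibd_companion(xml_file)  # FALSE: no ".ibd" file yet
file.create(file.path(dir, "example.ibd"))
has_ibd_companion(xml_file)  # TRUE: both files present
```

This check only confirms that both files exist. It does not verify that the UUIDs match.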

The imzML specification is described in detail here along with example data files (two of which are included in this package). Software tools for converting vendor formats to imzML can be found here. A Java-based imzML validator is available here. A web-based imzML validator is available here.

Installation

CardinalIO can be installed via the BiocManager package.

if (!require("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install("CardinalIO")

The same function can be used to update CardinalIO and other Bioconductor packages.

Once installed, CardinalIO can be loaded with library():

library(CardinalIO)
## Loading required package: BiocParallel
## Loading required package: matter
## Loading required package: Matrix
## Loading required package: ontologyIndex

Structure of imzML files

Valid imzML datasets are composed of two files (“.imzML” and “.ibd”) and come in two types: “continuous” and “processed”.

XML

The XML (“.imzML”) file contains only human-readable experimental metadata in a structured plain text format using a controlled vocabulary. It can include many experimental details including sample preparation, instrument configuration, scan settings, etc. Note that an imzML file is also a valid mzML file, with additional requirements and constraints to accommodate the imaging modality.

Binary

The binary data (“.ibd”) file contains the binary m/z and intensity arrays. The structure of these files is defined by metadata in the XML file. Two arrangements of the internal binary data arrays are possible depending on the type of imzML file (“continuous” or “processed”).

[Figure: A visualization of the imzML format as described below]

Continuous

For “continuous” imzML files, all mass spectra share the same m/z values. Therefore, the m/z array is stored only once in the binary data file.
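The idea of storing the m/z array once can be illustrated with a toy binary file written in base R. This is only a sketch of the concept, not the actual “.ibd” layout: a real file also carries the UUID and takes its array offsets and data types from the XML metadata, both of which are omitted here. All values and the helper read_spectrum are made up for illustration.

```r
# Toy "continuous" layout: one shared m/z array, then one intensity array
# per spectrum. All values here are made up for illustration.
mz <- c(100.0, 100.5, 101.0, 101.5)
intensities <- list(c(0, 5, 2, 0), c(1, 0, 7, 3), c(0, 0, 4, 9))

bin_file <- tempfile(fileext = ".bin")
con <- file(bin_file, open = "wb")
writeBin(mz, con, size = 8)                        # m/z stored only once
for (x in intensities) writeBin(x, con, size = 8)  # then each spectrum
close(con)

# Hypothetical reader: returns the shared m/z array and intensity array i,
# where n is the number of values per array and bytes the size of each value
read_spectrum <- function(file, i, n, bytes = 8) {
  con <- file(file, open = "rb")
  on.exit(close(con))
  shared_mz <- readBin(con, what = "double", n = n, size = bytes)
  # Intensity array i starts after the m/z array and i - 1 earlier spectra
  seek(con, where = i * n * bytes)
  intensity <- readBin(con, what = "double", n = n, size = bytes)
  list(mz = shared_mz, intensity = intensity)
}

s2 <- read_spectrum(bin_file, i = 2, n = length(mz))
file.size(bin_file)  # (1 m/z array + 3 spectra) * 4 values * 8 bytes
```

Every spectrum read this way reuses the same m/z values, so adding a spectrum costs only one more intensity array.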

Processed

For “processed” imzML files, each mass spectrum has its own unique set of m/z values. Therefore, each m/z array is stored with its corresponding intensity array. This format is common for high mass resolution experiments, where storing the complete profile spectrum would be prohibitively large, so the spectra are stored sparsely instead.
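A small in-memory sketch can make the sparse idea concrete. It assumes that "sparse" means dropping points with zero intensity, which is a simplification chosen for illustration. The values and the helper to_processed are hypothetical.

```r
# Toy "processed" representation: each spectrum keeps its own m/z values.
# Values are made up; dropping zero intensities stands in for sparse storage.
mz_grid <- seq(100, 101, by = 0.25)  # 5 m/z values on a common grid
profile <- list(c(0, 8, 0, 0, 0), c(0, 0, 0, 3, 6))

# Hypothetical helper: keep only the points that carry signal
to_processed <- function(intensity, mz) {
  keep <- intensity != 0
  list(mz = mz[keep], intensity = intensity[keep])
}
processed <- lapply(profile, to_processed, mz = mz_grid)

processed[[1]]$mz  # m/z values kept for the first spectrum
processed[[2]]$mz  # a different set of m/z values for the second

# Number of stored values: paired m/z and intensity arrays per spectrum,
# compared with a single m/z array plus full-length intensity arrays
n_processed <- sum(vapply(processed, function(s) 2L * length(s$mz), integer(1)))
n_full_grid <- length(mz_grid) + sum(lengths(profile))
c(processed = n_processed, full_grid = n_full_grid)
```

The saving grows with the fraction of empty points, which is why this arrangement suits spectra with very fine m/z grids.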

Additional notes

Note that both imzML types may contain either profile or centroided spectra. The spectrum representation should be specified in the imzML metadata file. Further note that despite the name, the “processed” type does not imply that any spectral processing has been performed beyond basic processing performed by the instrument.

Parsing imzML files

Parsing imzML files is performed with parseImzML().

path <- exampleImzMLFile("continuous")
path
## [1] "/tmp/RtmpmGvOiy/Rinst10dc4d69a62f0e/CardinalIO/extdata/Example_Continuous_imzML1.1.1/Example_Continuous.imzML"
p <- parseImzML(path, ibd=TRUE)
p
## ImzML: /tmp/RtmpmGvOiy/Rinst10dc4d69a62f0e/CardinalIO/extdata/Example_Continuous_imzML1.1.1/Example_Continuous.imzML
## 
## $fileDescription(3): fileContent sourceFileList contact
## $sampleList(1): sample1
## $scanSettingsList(1): scansettings1
## $softwareList(2): Xcalibur TMC
## $instrumentConfigurationList(1): LTQFTUltra0
## $dataProcessingList(2): XcaliburProcessing TMCConversion
## $run(1): spectrumList
## $ibd(3): uuid mz intensity

By default, only the “.imzML” metadata is parsed. Using ibd=TRUE will also attach the mass spectra (without loading them into memory).

The resulting ImzML object is like a list, and can be traversed in the same way using the standard $, [ and [[ operators.
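The three operators behave as they do for ordinary lists. The sketch below uses a plain nested list as a stand-in for a parsed ImzML object so that it runs without any example data. The top-level names mirror the printed summary above, but the contents are hypothetical.

```r
# Plain nested list standing in for a parsed ImzML object. The names mirror
# the printed summary; the contents are hypothetical placeholders.
meta <- list(
  softwareList = list(Xcalibur = list(id = "Xcalibur"), TMC = list(id = "TMC")),
  run = list(spectrumList = list())
)

names(meta)                           # top-level sections
sw <- meta$softwareList               # `$` extracts one section by name
tmc <- meta[["softwareList"]][["TMC"]]  # `[[` can be chained for nested entries
run_only <- meta["run"]               # `[` returns a shorter list, not the element
names(sw)
```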

Mass spectra

If the option ibd=TRUE was used when parsing the imzML file, then the m/z and intensity arrays are attached as out-of-memory lists, so no spectral data is loaded until it is requested.

p$ibd$mz
## <9 length> matter_list :: out-of-core list
##              [1]      [2]      [3]      [4]      [5]      [6] ...
## $Scan=1 100.0833 100.1667 100.2500 100.3333 100.4167 100.5000 ...
##              [1]      [2]      [3]      [4]      [5]      [6] ...
## $Scan=2 100.0833 100.1667 100.2500 100.3333 100.4167 100.5000 ...
##              [1]      [2]      [3]      [4]      [5]      [6] ...
## $Scan=3 100.0833 100.1667 100.2500 100.3333 100.4167 100.5000 ...
##              [1]      [2]      [3]      [4]      [5]      [6] ...
## $Scan=4 100.0833 100.1667 100.2500 100.3333 100.4167 100.5000 ...
##              [1]      [2]      [3]      [4]      [5]      [6] ...
## $Scan=5 100.0833 100.1667 100.2500 100.3333 100.4167 100.5000 ...
##              [1]      [2]      [3]      [4]      [5]      [6] ...
## $Scan=6 100.0833 100.1667 100.2500 100.3333 100.4167 100.5000 ...
## ...
## (6.75 KB real | 0 bytes shared | 302.37 KB virtual)
p$ibd$intensity
## <9 length> matter_list :: out-of-core list
##         [1] [2] [3] [4] [5] [6] ...
## $Scan=1   0   0   0   0   0   0 ...
##         [1] [2] [3] [4] [5] [6] ...
## $Scan=2   0   0   0   0   0   0 ...
##         [1] [2] [3] [4] [5] [6] ...
## $Scan=3   0   0   0   0   0   0 ...
##         [1] [2] [3] [4] [5] [6] ...
## $Scan=4   0   0   0   0   0   0 ...
##         [1] [2] [3] [4] [5] [6] ...
## $Scan=5   0   0   0   0   0   0 ...
##         [1] [2] [3] [4] [5] [6] ...
## $Scan=6   0   0   0   0   0   0 ...
## ...
## (6.75 KB real | 0 bytes shared | 302.37 KB virtual)

These out-of-memory lists can be subset like normal lists. They can alternatively be pulled fully into memory using as.list().

mz1 <- p$ibd$mz[[1L]]
int1 <- p$ibd$intensity[[1L]]
plot(mz1, int1, type="l", xlab="m/z", ylab="Intensity")
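Once spectra are available as an ordinary list, the usual list tools apply. As a sketch, the hypothetical helper below (not part of CardinalIO) computes the total ion current of each spectrum. It is shown on made-up intensities so that it runs on its own.

```r
# Hypothetical helper (not part of CardinalIO): total ion current per
# spectrum, i.e. the sum of each intensity vector in a list.
total_ion_current <- function(intensity_list) {
  vapply(intensity_list, sum, numeric(1))
}

# Made-up intensities standing in for an in-memory list of spectra
spectra <- list(
  "Scan=1" = c(0, 5, 2),
  "Scan=2" = c(1, 0, 7),
  "Scan=3" = c(0, 0, 4)
)
tic <- total_ion_current(spectra)
tic[2:3]  # subsetting works as for any named vector
```

With a parsed file, the same helper could presumably be applied as total_ion_current(as.list(p$ibd$intensity)), at the cost of pulling all intensities into memory first.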

Writing imzML files

Writing imzML files is performed with writeImzML(). This is a generic function, so methods can be written to support new classes in other packages. CardinalIO provides methods for ImzML and ImzMeta.

Session information

sessionInfo()
## R version 4.5.0 beta (2025-04-02 r88102)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.2 LTS
## 
## Matrix products: default
## BLAS:   /home/biocbuild/bbs-3.22-bioc/R/lib/libRblas.so 
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0  LAPACK version 3.12.0
## 
## locale:
##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=en_GB              LC_COLLATE=C              
##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
## 
## time zone: America/New_York
## tzcode source: system (glibc)
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] CardinalIO_1.7.0    ontologyIndex_2.12  matter_2.11.0      
## [4] Matrix_1.7-3        BiocParallel_1.43.0 BiocStyle_2.37.0   
## 
## loaded via a namespace (and not attached):
##  [1] cli_3.6.4           knitr_1.50          magick_2.8.6       
##  [4] rlang_1.1.6         xfun_0.52           ProtGenerics_1.41.0
##  [7] generics_0.1.3      jsonlite_2.0.0      S4Vectors_0.47.0   
## [10] htmltools_0.5.8.1   tinytex_0.57        sass_0.4.10        
## [13] stats4_4.5.0        rmarkdown_2.29      grid_4.5.0         
## [16] evaluate_1.0.3      jquerylib_0.1.4     fastmap_1.2.0      
## [19] yaml_2.3.10         lifecycle_1.0.4     bookdown_0.43      
## [22] BiocManager_1.30.25 compiler_4.5.0      codetools_0.2-20   
## [25] irlba_2.3.5.1       Rcpp_1.0.14         lattice_0.22-7     
## [28] digest_0.6.37       R6_2.6.1            parallel_4.5.0     
## [31] magrittr_2.0.3      bslib_0.9.0         tools_4.5.0        
## [34] BiocGenerics_0.55.0 cachem_1.1.0