GitHub - waldronlab/XeniumIO: software for importing 10X Xenium data
Introduction
The `XeniumIO` package provides functions to import 10X Genomics Xenium Analyzer data into R. It is designed to work with the output of the Xenium Analyzer, which processes Xenium spatial gene expression data, and it creates a `TENxXenium` object that can be used with other Bioconductor packages.
Supported Formats
TENxIO
The 10X suite of packages supports multiple file formats. The following tables list the supported file formats and the corresponding classes that are imported into R.
Extension | Class | Imported as |
---|---|---|
.h5 | TENxH5 | SingleCellExperiment w/ TENxMatrix |
.mtx / .mtx.gz | TENxMTX | SummarizedExperiment w/ dgCMatrix |
.tar.gz | TENxFileList | SingleCellExperiment w/ dgCMatrix |
peak_annotation.tsv | TENxPeaks | GRanges |
fragments.tsv.gz | TENxFragments | RaggedExperiment |
.tsv / .tsv.gz | TENxTSV | tibble |
VisiumIO
Extension | Class | Imported as |
---|---|---|
spatial.tar.gz | TENxSpatialList | DataFrame list * |
.parquet | TENxSpatialParquet | tibble * |
XeniumIO
Extension | Class | Imported as |
---|---|---|
.zarr.zip | TENxZarr | (TBD) |
Installation
if (!require("BiocManager", quietly = TRUE)) install.packages("BiocManager")
BiocManager::install("XeniumIO")
Loading package
library(XeniumIO)
XeniumIO
The `TENxXenium` class has a `metadata` slot for the `experiment.xenium` file. The `resources` slot is a `TENxFileList` or `TENxH5` object containing the cell feature matrix. The `coordNames` slot is a vector specifying the names of the columns in the spatial data containing the spatial coordinates. The `sampleId` slot is a scalar specifying the sample identifier.
TENxXenium(
    resources = "path/to/matrix/folder/or/file",
    xeniumOut = "path/to/xeniumOut/folder",
    sample_id = "sample01",
    format = c("mtx", "h5"),
    boundaries_format = c("parquet", "csv.gz"),
    spatialCoordsNames = c("x_centroid", "y_centroid"),
    ...
)
The `format` argument specifies the format of the `resources` object, either "mtx" or "h5". The `boundaries_format` argument allows the user to choose whether to read in the data using the `parquet` or `csv.gz` format.
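Both arguments are declared with a vector of choices as their default. The README does not show how the constructor resolves them, so the following is only a minimal base R sketch, assuming the usual `match.arg()` idiom; `resolve_formats()` is a hypothetical stand-in and not part of `XeniumIO`.

```r
# Hypothetical stand-in for the constructor's argument handling; the real
# TENxXenium() internals are not shown here and may differ.
resolve_formats <- function(
    format = c("mtx", "h5"),
    boundaries_format = c("parquet", "csv.gz")
) {
    # match.arg() returns the first choice when the argument is left at its
    # default vector, and raises an error for a value outside the choices
    format <- match.arg(format)
    boundaries_format <- match.arg(boundaries_format)
    list(format = format, boundaries_format = boundaries_format)
}

# Leaving both arguments unset selects the first element of each vector
defaults <- resolve_formats()

# Explicit values override the defaults
custom <- resolve_formats(format = "h5", boundaries_format = "csv.gz")
```

Under that assumption, a call such as `TENxXenium(xeniumOut = "path/to/xeniumOut/folder", format = "h5", boundaries_format = "csv.gz")` would read the HDF5 matrix and the compressed CSV files instead of the defaults.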
Example Folder Structure
Note that the unzipped `xeniumOut` folder must contain the following files:
*outs
├── cell_feature_matrix.h5
├── cell_feature_matrix.tar.gz
| ├── barcodes.tsv*
| ├── features.tsv*
| └── matrix.mtx*
├── cell_feature_matrix.zarr.zip
├── experiment.xenium
├── cells.csv.gz
├── cells.parquet
├── cells.zarr.zip
[...]
Note that the `zarr` format is not yet supported because the infrastructure is still under development.
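Before calling the constructor, it can help to confirm which of the listed files are actually present. The sketch below uses only base R; `check_xenium_outs()` is a hypothetical helper (not part of `XeniumIO`), and its default file list is simply copied from the tree above, without the `zarr` entries. Which files are strictly required for a given `format` or `boundaries_format` is not stated here, so adjust `expected` as needed.

```r
# Hypothetical helper, not part of XeniumIO: report which expected files
# exist in an unzipped Xenium output folder.
check_xenium_outs <- function(
    xeniumOut,
    expected = c(
        "cell_feature_matrix.h5",
        "cell_feature_matrix.tar.gz",
        "experiment.xenium",
        "cells.csv.gz",
        "cells.parquet"
    )
) {
    present <- file.exists(file.path(xeniumOut, expected))
    names(present) <- expected
    present
}

# Demonstration on a throwaway folder that contains only two of the files
demo_dir <- file.path(tempdir(), "demo_outs")
dir.create(demo_dir, showWarnings = FALSE)
invisible(
    file.create(file.path(demo_dir, c("experiment.xenium", "cells.parquet")))
)

status <- check_xenium_outs(demo_dir)
missing_files <- names(status)[!status]
```

Here `missing_files` holds the names that were not found, which gives a quicker diagnosis than a failed import.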
Xenium class
The `resources` slot should either be the `TENxFileList` from the `mtx` format or a `TENxH5` instance from an `h5` file. The boundaries can either be a `TENxSpatialParquet` instance or a `TENxSpatialCSV`. These classes are automatically instantiated by the constructor function.
showClass("TENxXenium")
#> Class "TENxXenium" [package "XeniumIO"]
#>
#> Slots:
#>
#> Name: resources boundaries coordNames
#> Class: TENxFileList_OR_TENxH5 TENxSpatialParquet_OR_TENxSpatialCSV character
#>
#> Name: sampleId colData metadata
#> Class: character TENxSpatialParquet XeniumFile
`import` method
The `import` method for a `TENxXenium` instance returns a `SpatialExperiment` class object. Dispatch is only done on the `con` argument. See `?BiocIO::import` for details on the generic. The `import` function call is meant to be a simple call without much input. For more details on the package, see `?TENxXenium`.
getMethod("import", c(con = "TENxXenium"))
#> Method Definition:
#>
#> function (con, format, text, ...)
#> {
#>     sce <- import(con@resources, ...)
#>     metadata <- import(con@metadata)
#>     coldata <- import(con@colData)
#>     SpatialExperiment::SpatialExperiment(assays = list(counts = assay(sce)),
#>         rowData = rowData(sce), mainExpName = mainExpName(sce),
#>         altExps = altExps(sce), sample_id = con@sampleId, colData = as(coldata,
#>             "DataFrame"), spatialCoordsNames = con@coordNames,
#>         metadata = list(experiment.xenium = metadata, polygons = import(con@boundaries)))
#> }
#> <bytecode: 0x622eb1e584c8>
#> <environment: namespace:XeniumIO>
#>
#> Signatures:
#>         con          format text
#> target  "TENxXenium" "ANY"  "ANY"
#> defined "TENxXenium" "ANY"  "ANY"
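To make the assembly step in the method body more concrete, here is a toy sketch that builds a `SpatialExperiment` by hand in the same way: a counts assay, a `colData` table that carries the centroid columns, and `spatialCoordsNames` naming those columns. All values (gene names, cell names, coordinates, metadata) are made up for illustration, and the sketch assumes the `SpatialExperiment` and `S4Vectors` packages are installed.

```r
library(SpatialExperiment)

# Made-up counts: 2 genes x 3 cells
counts <- matrix(
    c(1L, 0L, 3L, 2L, 5L, 0L),
    nrow = 2,
    dimnames = list(c("geneA", "geneB"), c("cell1", "cell2", "cell3"))
)

# Made-up per-cell table; the centroid columns mimic the default
# spatialCoordsNames = c("x_centroid", "y_centroid")
coldata <- S4Vectors::DataFrame(
    cell_id = colnames(counts),
    x_centroid = c(10.5, 20.1, 30.7),
    y_centroid = c(5.2, 6.3, 7.4),
    row.names = colnames(counts)
)

# Same constructor call pattern as in the method body above
spe <- SpatialExperiment(
    assays = list(counts = counts),
    colData = coldata,
    sample_id = "sample01",
    spatialCoordsNames = c("x_centroid", "y_centroid"),
    metadata = list(experiment.xenium = list(note = "toy example"))
)

# The named columns are exposed as the spatial coordinates matrix
coords <- spatialCoords(spe)
```

In the object returned by `import`, the same accessor, `spatialCoords()`, gives the centroid coordinates, and `metadata()` holds the `experiment.xenium` and `polygons` entries.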
Importing an Example Xenium Dataset
The following code snippet demonstrates how to import a Xenium Analyzer output into R. The `TENxXenium` object is created by specifying the path to the `xeniumOut` folder. The `TENxXenium` object is then imported into R using the `import` method for the `TENxXenium` class.
First, we cache the ~12 MB file to avoid downloading it multiple times (via `BiocFileCache`).
zipfile <- paste0(
    "https://mghp.osn.xsede.org/bir190004-bucket01/BiocXenDemo/",
    "Xenium_Prime_MultiCellSeg_Mouse_Ileum_tiny_outs.zip"
)
destfile <- XeniumIO:::.cache_url_file(zipfile)
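The caching call above uses an internal helper (accessed with `:::`), whose implementation is not shown here. For your own files, a similar effect can be sketched with the exported `BiocFileCache` API. The function name `cache_url()` is hypothetical, and this is an assumption about a reasonable equivalent, not a copy of what `XeniumIO` does internally.

```r
# Hypothetical helper built on the public BiocFileCache API.
# bfcrpath() returns the local path of a cached resource and, when the
# resource name is not yet in the cache, adds (downloads) it first.
cache_url <- function(
    url,
    cache = BiocFileCache::BiocFileCache(ask = FALSE)
) {
    BiocFileCache::bfcrpath(cache, rnames = url)
}

# Usage (requires BiocFileCache and network access on the first call):
# destfile <- cache_url(zipfile)
```

Repeated calls with the same URL should then return the cached copy instead of downloading it again.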
We then create an output folder for the contents of the zipped file. We use the same name as the zip file but without the extension (with `tools::file_path_sans_ext`).
outfold <- file.path(
    tempdir(), tools::file_path_sans_ext(basename(zipfile))
)
if (!dir.exists(outfold))
    dir.create(outfold, recursive = TRUE)
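As a quick check of what the folder name resolves to, the two base R calls can be run on their own. The URL is the one used in this example; only the variable name `folder_name` is introduced here for illustration.

```r
zipfile <- paste0(
    "https://mghp.osn.xsede.org/bir190004-bucket01/BiocXenDemo/",
    "Xenium_Prime_MultiCellSeg_Mouse_Ileum_tiny_outs.zip"
)

# basename() keeps the last path component of the URL;
# file_path_sans_ext() then drops the trailing ".zip"
folder_name <- tools::file_path_sans_ext(basename(zipfile))
folder_name
#> [1] "Xenium_Prime_MultiCellSeg_Mouse_Ileum_tiny_outs"
```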
We now unzip the file and use `outfold` as the `exdir` argument to `unzip`. The `outfold` variable and folder will be used as the `xeniumOut` argument in the `TENxXenium` constructor. Note that we use the `ref = "Gene Expression"` argument in the `import` method to pass down to the internal `splitAltExps` function call. This will set the `mainExpName` in the `SpatialExperiment` object.
unzip(
    zipfile = destfile, exdir = outfold, overwrite = FALSE
)
TENxXenium(xeniumOut = outfold) |>
    import(ref = "Gene Expression")
#> class: SpatialExperiment
#> dim: 5006 36
#> metadata(2): experiment.xenium polygons
#> assays(1): counts
#> rownames(5006): ENSMUSG00000052595 ENSMUSG00000030111 ... ENSMUSG00000055670 ENSMUSG00000027596
#> rowData names(3): ID Symbol Type
#> colnames(36): aaamobki-1 aaclkaod-1 ... olbjkpjc-1 omjmdimk-1
#> colData names(13): cell_id transcript_counts ... segmentation_method sample_id
#> reducedDimNames(0):
#> mainExpName: Gene Expression
#> altExpNames(5): Deprecated Codeword Genomic Control Negative Control Codeword Negative Control Probe Unassigned Codeword
#> spatialCoords names(2) : x_centroid y_centroid
#> imgData names(0):
Note that you may also use the `swapAltExp` function to set a `mainExpName` in the `SpatialExperiment`, but this is not recommended. The operation returns a `SingleCellExperiment`, which has to be coerced back into a `SpatialExperiment`. The coercion also loses some metadata, in particular the `spatialCoords`.
TENxXenium(xeniumOut = outfold) |>
    import() |>
    swapAltExp(name = "Gene Expression") |>
    as("SpatialExperiment")
#> class: SpatialExperiment
#> dim: 5006 36
#> metadata(1): TENxFileList
#> assays(1): counts
#> rownames(5006): ENSMUSG00000052595 ENSMUSG00000030111 ... ENSMUSG00000055670 ENSMUSG00000027596
#> rowData names(3): ID Symbol Type
#> colnames(36): aaamobki-1 aaclkaod-1 ... olbjkpjc-1 omjmdimk-1
#> colData names(13): cell_id transcript_counts ... segmentation_method sample_id
#> reducedDimNames(0):
#> mainExpName: Gene Expression
#> altExpNames(5): Genomic Control Negative Control Codeword Negative Control Probe Unassigned Codeword Deprecated Codeword
#> spatialCoords names(0) :
#> imgData names(0):
The dataset was obtained from the 10X Genomics website under the X0A v3.0 section and is a subset of the Xenium Prime 5K Mouse Pan Tissue & Pathways Panel. The link to the data is the URL assigned to `zipfile` above.
Session Info
R version 4.5.0 Patched (2025-04-15 r88148)
Platform: x86_64-pc-linux-gnu
Running under: Ubuntu 24.04.2 LTS
Matrix products: default
BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.26.so; LAPACK version 3.12.0
locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
time zone: America/New_York
tzcode source: system (glibc)
attached base packages:
[1] stats4 stats graphics grDevices utils datasets methods base
other attached packages:
[1] BiocStyle_2.37.0 XeniumIO_1.1.1 TENxIO_1.11.1 SingleCellExperiment_1.31.0
[5] SummarizedExperiment_1.39.0 Biobase_2.69.0 GenomicRanges_1.61.0 GenomeInfoDb_1.45.3
[9] IRanges_2.43.0 S4Vectors_0.47.0 BiocGenerics_0.55.0 generics_0.1.3
[13] MatrixGenerics_1.21.0 matrixStats_1.5.0 colorout_1.3-2
loaded via a namespace (and not attached):
[1] tidyselect_1.2.1 dplyr_1.1.4 blob_1.2.4 arrow_19.0.1.1 filelock_1.0.3
[6] fastmap_1.2.0 BiocFileCache_2.99.0 digest_0.6.37 lifecycle_1.0.4 RSQLite_2.3.9
[11] magrittr_2.0.3 compiler_4.5.0 rlang_1.1.6 tools_4.5.0 utf8_1.2.4
[16] yaml_2.3.10 knitr_1.50 VisiumIO_1.5.1 askpass_1.2.1 S4Arrays_1.9.0
[21] bit_4.6.0 curl_6.2.2 DelayedArray_0.35.1 abind_1.4-8 rsconnect_1.3.4
[26] withr_3.0.2 purrr_1.0.4 sys_3.4.3 grid_4.5.0 cli_3.6.5
[31] rmarkdown_2.29 crayon_1.5.3 rstudioapi_0.17.1 httr_1.4.7 tzdb_0.5.0
[36] rjson_0.2.23 BiocBaseUtils_1.11.0 DBI_1.2.3 cachem_1.1.0 assertthat_0.2.1
[41] parallel_4.5.0 BiocManager_1.30.25 XVector_0.49.0 vctrs_0.6.5 Matrix_1.7-3
[46] jsonlite_2.0.0 hms_1.1.3 bit64_4.6.0-1 archive_1.1.12 magick_2.8.6
[51] credentials_2.0.2 glue_1.8.0 codetools_0.2-20 BiocIO_1.19.0 UCSC.utils_1.5.0
[56] tibble_3.2.1 pillar_1.10.2 rappdirs_0.3.3 htmltools_0.5.8.1 openssl_2.3.2
[61] R6_2.6.1 dbplyr_2.5.0 httr2_1.1.2 gert_2.1.5 vroom_1.6.5
[66] evaluate_1.0.3 lattice_0.22-7 readr_2.1.5 SpatialExperiment_1.19.0 memoise_2.0.1
[71] Rcpp_1.0.14 SparseArray_1.9.0 whisker_0.4.1 xfun_0.52 pkgconfig_2.0.3