ClustSIGNAL: a spatial clustering method

ClustSIGNAL: **Clust**ering of **S**patially **I**nformed **G**ene expression with **N**eighbourhood **A**dapted **L**earning.

ClustSIGNAL is an R package for spatially-informed cell type clustering of high-resolution spatial transcriptomics data. The method has a two-fold motivation: overcoming data sparsity by using neighbourhood heterogeneity to guide adaptive smoothing, and performing spatially-informed clustering by embedding spatial context into gene expression. To achieve both, we calculate entropy as a measure of the heterogeneity of each cell's neighbourhood and use it to generate weight distributions for adaptive smoothing of gene expression. Homogeneous neighbourhoods generally have low entropy, so smoothing is performed over more cells in these neighbourhoods. Conversely, heterogeneous neighbourhoods have high entropy and are smoothed over a much smaller region. The resulting adaptively smoothed gene expression is used for clustering.

For a tutorial on how to use ClustSIGNAL, see the vignette at this website.

Installation

To install ClustSIGNAL via Bioconductor:
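A standard BiocManager installation would look like the following, assuming the package is released on Bioconductor under the name clustSIGNAL (the name loaded by library() later in this document):

```r
# Install BiocManager first if it is not already available
if (!requireNamespace("BiocManager", quietly = TRUE)) {
  install.packages("BiocManager")
}

# Assumption: the Bioconductor package name matches the library name
BiocManager::install("clustSIGNAL")
```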

To install ClustSIGNAL from GitHub:

# install.packages("devtools")
devtools::install_github("SydneyBioX/clustSIGNAL")

Method description

Figure: ClustSIGNAL method overview.

Here, we present ClustSIGNAL, a spatial clustering method developed to handle data sparsity while considering the variability in cell arrangement of tissue regions. The core steps involved in the method are sequential:

1. The method starts with non-spatial clustering and subclustering (Louvain clustering by default) to classify cells into clusters and subclusters, which we refer to as ‘initial clusters’ and ‘initial subclusters’, respectively.

2. The neighbourhood of each cell is then defined in terms of initial subcluster composition.

3. The cells in the neighbourhood are also sorted so that neighbours belonging to the same initial cluster as the index cell are placed closer to it.

4. Neighbourhood heterogeneity is measured as entropy, where a high entropy value indicates more heterogeneity in the neighbourhood and a low entropy value indicates a more homogeneous neighbourhood.

5. The entropy values are used to generate weight distributions specific to each neighbourhood.

6. The gene expression of each cell is adaptively smoothed using the entropy-guided weight distributions; cells in heterogeneous neighbourhoods (high entropy regions) are smoothed over a smaller region, whereas cells in homogeneous neighbourhoods (low entropy regions) are smoothed over a larger region.

7. Non-spatial clustering is performed with adaptively smoothed gene expression to generate ClustSIGNAL clusters.
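Steps 2 to 6 above can be illustrated with a minimal base R sketch. This is not the package's implementation. The function names, the use of Shannon entropy in bits, the fixed number of nearest neighbours `k`, and the half-Gaussian weight rule over neighbour rank are all assumptions chosen for illustration.

```r
# Shannon entropy (in bits) of the subcluster composition of one neighbourhood.
neighbourhood_entropy <- function(labels) {
  p <- as.numeric(table(labels)) / length(labels)
  p <- p[p > 0]  # drop empty factor levels so 0 * log2(0) never occurs
  -sum(p * log2(p))
}

# Hypothetical weight rule: a half-Gaussian over neighbour rank whose width
# shrinks as entropy grows, so mixed neighbourhoods rely on the nearest cells.
entropy_weights <- function(entropy, n_neighbours, max_entropy) {
  scaled <- if (max_entropy > 0) entropy / max_entropy else 0
  width <- n_neighbours * (1 - 0.9 * scaled)  # from k down to 0.1 * k
  w <- dnorm(seq_len(n_neighbours) - 1, mean = 0, sd = width)
  w / sum(w)
}

# expr: genes x cells matrix; coords: cells x 2 matrix;
# clusters / subclusters: initial labels, one per cell.
adaptive_smooth <- function(expr, coords, clusters, subclusters, k = 3) {
  stopifnot(k <= ncol(expr))
  d <- as.matrix(dist(coords))
  max_entropy <- log2(length(unique(subclusters)))
  smoothed <- expr
  entropy <- numeric(ncol(expr))
  for (i in seq_len(ncol(expr))) {
    # k nearest cells; the index cell itself comes first (distance 0)
    nbrs <- order(d[i, ])[seq_len(k)]
    # place neighbours sharing the index cell's initial cluster first
    same <- clusters[nbrs] == clusters[i]
    nbrs <- c(nbrs[same], nbrs[!same])
    entropy[i] <- neighbourhood_entropy(subclusters[nbrs])
    w <- entropy_weights(entropy[i], k, max_entropy)
    # weighted average of the neighbours' expression
    smoothed[, i] <- as.vector(expr[, nbrs, drop = FALSE] %*% w)
  }
  list(smoothed = smoothed, entropy = entropy)
}

# Toy data: eight cells on a line, a homogeneous region followed by a mixed one
coords <- cbind(x = 1:8, y = rep(0, 8))
clusters <- c(1, 1, 1, 1, 2, 2, 2, 2)
subclusters <- c("A", "A", "A", "A", "B", "C", "B", "C")
expr <- rbind(gene1 = rep(1, 8),
              gene2 = c(5, 5, 5, 5, 0, 9, 0, 9))

res <- adaptive_smooth(expr, coords, clusters, subclusters, k = 3)
print(round(res$entropy, 3))
print(round(res$smoothed, 3))
```

In this toy setting, cells in the all-"A" region have zero entropy and receive broad weights, while cells in the mixed region receive weights concentrated on the index cell. The smoothed matrix would then be passed to a non-spatial clustering method, as in step 7.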

ClustSIGNAL parameters

The ClustSIGNAL package takes a SpatialExperiment object as input. We provide users with a number of parameters to explore and experiment with, as well as previously tested default values for quick runs. ClustSIGNAL can be used for single-sample or multi-sample analysis with just one function call. Below is the list of the parameters offered and their possible values:

Running ClustSIGNAL

Before running ClustSIGNAL, it is important to ensure that the input SpatialExperiment object has spatial coordinates stored in the spatialCoords matrix. Otherwise, the method will throw an error asking the user to provide spatial coordinates.

# load required packages
library(clustSIGNAL)

# load the example data, which provides the SpatialExperiment object 'spe'
data(ClustSignal_example)

# the sample labels are stored in the 'sample_id' column
res <- clustSIGNAL(spe, samples = "sample_id", outputs = "a")
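The spatial coordinate requirement can also be checked up front, before calling clustSIGNAL. The sketch below uses base R only. `check_coords` is a hypothetical helper, not part of the package, and it assumes that coordinates are supplied as a numeric matrix with one row per cell and at least two columns.

```r
# Hypothetical helper (not part of clustSIGNAL): validate a coordinate matrix
# such as the one returned by SpatialExperiment::spatialCoords(spe).
check_coords <- function(xy, n_cells) {
  if (!is.matrix(xy) || !is.numeric(xy)) {
    return("coordinates must be a numeric matrix")
  }
  if (ncol(xy) < 2) {
    return("coordinates need at least two columns (e.g. x and y)")
  }
  if (nrow(xy) != n_cells) {
    return("coordinates need one row per cell")
  }
  if (anyNA(xy)) {
    return("coordinates contain missing values")
  }
  "ok"
}

# Toy matrices standing in for spatialCoords(spe)
good <- cbind(x = c(0.5, 1.2, 3.1), y = c(2.0, 0.4, 1.7))
empty <- matrix(numeric(0), nrow = 3, ncol = 0)

print(check_coords(good, n_cells = 3))
print(check_coords(empty, n_cells = 3))

# With a real object (requires the SpatialExperiment package):
# check_coords(SpatialExperiment::spatialCoords(spe), ncol(spe))
```

Returning a message rather than stopping makes it easy to report the specific problem before starting a longer run.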