An introduction to the MoleculeExperiment Class (original) (raw)

Use case 1: from dataframes to ME object

Here we demonstrate how to work with an ME object from toy data, representing a scenario where both the detected transcripts information and the boundary information have already been read into R. This requires the standardisation of the data with the dataframeToMEList() function.

The flexibility of the arguments in dataframeToMEList() enable the creation of a standard ME object across dataframes coming from different vendors of molecule-based spatial transcriptomics technologies.

Generate a toy transcripts data.frame.

moleculesDf <- data.frame(
  sample_id = rep(c("sample1", "sample2"), times = c(30, 20)),
  features = rep(c("gene1", "gene2"), times = c(20, 30)),
  x_coords = runif(50),
  y_coords = runif(50)
)
head(moleculesDf)
#>   sample_id features   x_coords  y_coords
#> 1   sample1    gene1 0.92015495 0.4950360
#> 2   sample1    gene1 0.78873036 0.6171017
#> 3   sample1    gene1 0.06555999 0.4093783
#> 4   sample1    gene1 0.39189833 0.8546513
#> 5   sample1    gene1 0.86206024 0.5022486
#> 6   sample1    gene1 0.76114128 0.5978346

Generate a toy boundaries data.frame.

boundariesDf <- data.frame(
  sample_id = rep(c("sample1", "sample2"), times = c(16, 6)),
  cell_id = rep(
    c(
      "cell1", "cell2", "cell3", "cell4",
      "cell1", "cell2"
    ),
    times = c(4, 4, 4, 4, 3, 3)
  ),
  vertex_x = c(
    0, 0.5, 0.5, 0,
    0.5, 1, 1, 0.5,
    0, 0.5, 0.5, 0,
    0.5, 1, 1, 0.5,
    0, 1, 0, 0, 1, 1
  ),
  vertex_y = c(
    0, 0, 0.5, 0.5,
    0, 0, 0.5, 0.5,
    0.5, 0.5, 1, 1,
    0.5, 0.5, 1, 1,
    0, 1, 1, 0, 0, 1
  )
)
head(boundariesDf)
#>   sample_id cell_id vertex_x vertex_y
#> 1   sample1   cell1      0.0      0.0
#> 2   sample1   cell1      0.5      0.0
#> 3   sample1   cell1      0.5      0.5
#> 4   sample1   cell1      0.0      0.5
#> 5   sample1   cell2      0.5      0.0
#> 6   sample1   cell2      1.0      0.0

Standardise transcripts dataframe to ME list format.

moleculesMEList <- dataframeToMEList(moleculesDf,
  dfType = "molecules",
  assayName = "detected",
  sampleCol = "sample_id",
  factorCol = "features",
  xCol = "x_coords",
  yCol = "y_coords"
)
str(moleculesMEList, max.level = 3)
#> List of 1
#>  $ detected:List of 2
#>   ..$ sample1:List of 2
#>   .. ..$ gene1: tibble [20 × 2] (S3: tbl_df/tbl/data.frame)
#>   .. ..$ gene2: tibble [10 × 2] (S3: tbl_df/tbl/data.frame)
#>   ..$ sample2:List of 1
#>   .. ..$ gene2: tibble [20 × 2] (S3: tbl_df/tbl/data.frame)

Standardise boundaries dataframe to ME list format.

boundariesMEList <- dataframeToMEList(boundariesDf,
  dfType = "boundaries",
  assayName = "cell",
  sampleCol = "sample_id",
  factorCol = "cell_id",
  xCol = "vertex_x",
  yCol = "vertex_y"
)
str(boundariesMEList, 3)
#> List of 1
#>  $ cell:List of 2
#>   ..$ sample1:List of 4
#>   .. ..$ cell1: tibble [4 × 2] (S3: tbl_df/tbl/data.frame)
#>   .. ..$ cell2: tibble [4 × 2] (S3: tbl_df/tbl/data.frame)
#>   .. ..$ cell3: tibble [4 × 2] (S3: tbl_df/tbl/data.frame)
#>   .. ..$ cell4: tibble [4 × 2] (S3: tbl_df/tbl/data.frame)
#>   ..$ sample2:List of 2
#>   .. ..$ cell1: tibble [3 × 2] (S3: tbl_df/tbl/data.frame)
#>   .. ..$ cell2: tibble [3 × 2] (S3: tbl_df/tbl/data.frame)

Create an ME object by using the MoleculeExperiment object constructor.

toyME <- MoleculeExperiment(
  molecules = moleculesMEList,
  boundaries = boundariesMEList
)
toyME
#> MoleculeExperiment class
#> 
#> molecules slot (1): detected
#> - detected:
#> samples (2): sample1 sample2
#> -- sample1:
#> ---- features (2): gene1 gene2
#> ---- molecules (30)
#> ---- location range: [0.07,0.97] x [0.01,0.94]
#> -- sample2:
#> ---- features (1): gene2
#> ---- molecules (20)
#> ---- location range: [0.05,0.98] x [0,0.97]
#> 
#> 
#> boundaries slot (1): cell
#> - cell:
#> samples (2): sample1 sample2
#> -- sample1:
#> ---- segments (4): cell1 cell2 cell3 cell4
#> -- sample2:
#> ---- segments (2): cell1 cell2

Add boundaries from an external segmentation algorithm.

In this example, we use the extent of the molecules of generated for toyME to align the boundaries with the molecules. In general, the extent of the segmentation is required for this alignment.

repoDir <- system.file("extdata", package = "MoleculeExperiment")
segMask <- paste0(repoDir, "/BIDcell_segmask.tif")
boundaries(toyME, "BIDcell_segmentation") <- readSegMask(
  # use the molecule extent to define the boundary extent
  extent(toyME, assayName = "detected"),
  path = segMask, assayName = "BIDcell_segmentation",
  sample_id = "sample1", background_value = 0
)

toyME
#> MoleculeExperiment class
#> 
#> molecules slot (1): detected
#> - detected:
#> samples (2): sample1 sample2
#> -- sample1:
#> ---- features (2): gene1 gene2
#> ---- molecules (30)
#> ---- location range: [0.07,0.97] x [0.01,0.94]
#> -- sample2:
#> ---- features (1): gene2
#> ---- molecules (20)
#> ---- location range: [0.05,0.98] x [0,0.97]
#> 
#> 
#> boundaries slot (2): cell BIDcell_segmentation
#> - cell:
#> samples (2): sample1 sample2
#> -- sample1:
#> ---- segments (4): cell1 cell2 cell3 cell4
#> -- sample2:
#> ---- segments (2): cell1 cell2
#> - BIDcell_segmentation:
#> samples (1): sample1
#> -- sample1:
#> ---- segments (88): 54748 54771 ... 58184 58186

Displayed below is the BIDcell segmentation image added to toyME as another boundaries

BIDcell_segmask_img <- EBImage::readImage(segMask)
EBImage::display(BIDcell_segmask_img, method = "raster")

Finally, a digital in-situ, with cell boundaries and BIDcell segmentation boundaries (red polygons in sample1) can be plotted.

ggplot_me() + 
  geom_polygon_me(toyME, assayName = "cell", byFill = "segment_id", colour = "black", alpha = 0.3) + 
  geom_polygon_me(toyME, assayName = "BIDcell_segmentation", fill = NA, colour = "red" ) + 
  geom_point_me(toyME, assayName = "detected", byColour = "feature_id", size = 1) +
  theme_classic()

Use case 2: from machine’s output directory to ME object

The MoleculeExperiment package also provides functions to directly work with the directories containing output files of commonly used technologies. This is especially useful to work with data from multiple samples. Here we provide convenience functions to read in data from Xenium (10X Genomics), CosMx (Nanostring) and Merscope (Vizgen).

repoDir <- system.file("extdata", package = "MoleculeExperiment")
repoDir <- paste0(repoDir, "/xenium_V1_FF_Mouse_Brain")

me <- readXenium(repoDir, keepCols = "essential", addBoundaries = "cell")
me
#> MoleculeExperiment class
#> 
#> molecules slot (1): detected
#> - detected:
#> samples (2): sample1 sample2
#> -- sample1:
#> ---- features (137): 2010300C02Rik Acsbg1 ... Zfp536 Zfpm2
#> ---- molecules (962)
#> ---- location range: [4900,4919.98] x [6400.02,6420]
#> -- sample2:
#> ---- features (143): 2010300C02Rik Acsbg1 ... Zfp536 Zfpm2
#> ---- molecules (777)
#> ---- location range: [4900.01,4919.98] x [6400.16,6419.97]
#> 
#> 
#> boundaries slot (1): cell
#> - cell:
#> samples (2): sample1 sample2
#> -- sample1:
#> ---- segments (5): 67500 67512 67515 67521 67527
#> -- sample2:
#> ---- segments (9): 65043 65044 ... 65070 65071

readXenium() standardises the transcript and boundary information such that the column names are consistent across technologies when handling ME objects.

In addition, the keepCols argument of readXenium() enables the user to decide if they want to keep all data that is vendor-specific (e.g., column with qv score), some columns of interest, or only the essential columns. By default, it is set to essential, which refers to feature names, x and y locations in the transcripts file, and segment ids, x and y locations for the vertices defining the boundaries in the boundaries file.

For CosMx and Merscope data we provide convenience functions that standardise the raw transcripts data into a MoleculeExperiment object and additionally read the boundaries included in the standard data releases.

repoDir <- system.file("extdata", package = "MoleculeExperiment")
repoDir <- paste0(repoDir, "/nanostring_Lung9_Rep1")

meCosmx <- readCosmx(repoDir, keepCols = "essential", addBoundaries = "cell")
meCosmx
#> MoleculeExperiment class
#> 
#> molecules slot (1): detected
#> - detected:
#> samples (2): sample_1 sample_2
#> -- sample_1:
#> ---- features (969): AATK ABL1 ... YES1 ZFP36
#> ---- molecules (23844)
#> ---- location range: [924.01,1031.94] x [26290,26398]
#> -- sample_2:
#> ---- features (943): AATK ABL1 ... YBX3 ZFP36
#> ---- molecules (7155)
#> ---- location range: [2894.03,3002] x [26290.05,26398]
#> 
#> 
#> boundaries slot (1): cell
#> - cell:
#> samples (2): sample_1 sample_2
#> -- sample_1:
#> ---- segments (113): 1 2 ... 112 113
#> -- sample_2:
#> ---- segments (83): 114 115 ... 195 196

repoDir <- system.file("extdata", package = "MoleculeExperiment")
repoDir <- paste0(repoDir, "/vizgen_HumanOvarianCancerPatient2Slice2")
meMerscope <- readMerscope(repoDir,
  keepCols = "essential",
  addBoundaries = "cell"
)
meMerscope
#> MoleculeExperiment class
#> 
#> molecules slot (1): detected
#> - detected:
#> samples (1): vizgen_HumanOvarianCancerPatient2Slice2
#> -- vizgen_HumanOvarianCancerPatient2Slice2:
#> ---- features (486): ACKR3 ACTA2 ... ZBED2 ZEB1
#> ---- molecules (15160)
#> ---- location range: [10219.02,10386.83] x [8350.93,8395.3]
#> 
#> 
#> boundaries slot (1): cell
#> - cell:
#> samples (1): vizgen_HumanOvarianCancerPatient2Slice2
#> -- vizgen_HumanOvarianCancerPatient2Slice2:
#> ---- segments (24): 45862 45865 ... 54562 54564