Identify and visualise regions of cell type colocalization in multiplexed imaging data that has been segmented at a single-cell resolution.
lisaClust 1.15.2
if (!require("BiocManager")) {
install.packages("BiocManager")
}
BiocManager::install("lisaClust")
# load required packages
library(lisaClust)
library(spicyR)
library(ggplot2)
library(SingleCellExperiment)
library(SpatialDatasets)
Clustering local indicators of spatial association (LISA) functions is a
methodology for identifying consistent spatial organisation of multiple
cell-types in an unsupervised way. This can be used to enable the
characterization of interactions between multiple cell-types
simultaneously and can complement traditional pairwise analysis. In our
implementation our LISA curves are a localised summary of an L-function
from a Poisson point process model. Our framework lisaClust
can be
used to provide a high-level summary of cell-type colocalization in
high-parameter spatial cytometry data, facilitating the identification
of distinct tissue compartments or identification of complex cellular
microenvironments.
To illustrate our lisaClust
framework, we consider a very simple
toy example where two cell-types are completely separated spatially. We
simulate data for two different images.
set.seed(51773)
x <- round(c(
runif(200), runif(200) + 1, runif(200) + 2, runif(200) + 3,
runif(200) + 3, runif(200) + 2, runif(200) + 1, runif(200)
), 4) * 100
y <- round(c(
runif(200), runif(200) + 1, runif(200) + 2, runif(200) + 3,
runif(200), runif(200) + 1, runif(200) + 2, runif(200) + 3
), 4) * 100
cellType <- factor(paste("c", rep(rep(c(1:2), rep(200, 2)), 4), sep = ""))
imageID <- rep(c("s1", "s2"), c(800, 800))
cells <- data.frame(x, y, cellType, imageID)
ggplot(cells, aes(x, y, colour = cellType)) +
geom_point() +
facet_wrap(~imageID) +
theme_minimal()
First we store our data in a SingleCellExperiment
object.
SCE <- SingleCellExperiment(colData = cells)
SCE
## class: SingleCellExperiment
## dim: 0 1600
## metadata(0):
## assays(0):
## rownames: NULL
## rowData names(0):
## colnames: NULL
## colData names(4): x y cellType imageID
## reducedDimNames(0):
## mainExpName: NULL
## altExpNames(0):
We can then use the convenience function lisaClust
to simultaneously
calculate local indicators of spatial association (LISA) functions and perform
k-means clustering. The number of clusters can be specified with the k =
parameter.
In the example below, we’ve chosen k = 2
, resulting in a total of 2 clusters. The cell type column can be specified using the cellType =
argument. By default, lisaClust
uses the column named cellType
.
The clusters identified by lisaClust
are stored in colData
of the SingleCellExperiment
object as a new column called regions
.
SCE <- lisaClust(SCE, k = 2)
colData(SCE) |> head()
## DataFrame with 6 rows and 5 columns
## x y cellType imageID region
## <numeric> <numeric> <factor> <character> <character>
## 1 36.72 38.58 c1 s1 region_2
## 2 61.38 41.29 c1 s1 region_2
## 3 33.59 80.98 c1 s1 region_2
## 4 50.17 64.91 c1 s1 region_2
## 5 82.93 35.60 c1 s1 region_2
## 6 83.13 2.69 c1 s1 region_2
lisaClust
also provides the convenient hatchingPlot
function to
visualise the different regions that have been demarcated by the
clustering. hatchingPlot
outputs a ggplot
object where the regions
are marked by different hatching patterns. In a real biological dataset,
this allows us to plot both regions and cell-types on the same
visualization.
In the example below, we can visualise our stimulated data where our 2
cell types have been separated neatly into 2 distinct regions based on
which cell type each region is dominated by. region_2
is dominated by
the red cell type c1
, and region_1
is dominated by the blue cell
type c2
.
hatchingPlot(SCE, useImages = c("s1", "s2"))
## Using other clustering methods.
While the lisaClust
function is convenient, we have not implemented an exhaustive
suite of clustering methods as it is very easy to do this yourself. There are
just two simple steps.
We can calculate local indicators of spatial association (LISA) functions
using the lisa
function. Here the LISA curves are a
localised summary of an L-function from a Poisson point process model. The radii
that will be calculated over can be set with Rs
.
lisaCurves <- lisa(SCE, Rs = c(20, 50, 100))
head(lisaCurves)
## 20_c1 20_c2 50_c1 50_c2 100_c1 100_c2
## cell_1 5.556700 -2.764143 15.631209 -6.910357 11.733097 -9.198914
## cell_2 4.833149 -2.764143 13.940407 -6.910357 9.532662 -8.543440
## cell_3 5.918476 -2.764143 9.008588 -6.910357 9.157887 -7.813862
## cell_4 4.109597 -2.764143 11.907928 -6.910357 8.404425 -8.140036
## cell_5 3.024270 -2.764143 10.159278 -6.910357 9.006286 -8.283564
## cell_6 7.986742 -2.764143 8.675070 -6.910357 12.859615 -13.820714
The LISA curves can then be used to cluster the cells. Here we use k-means
clustering. However, other clustering methods like SOM could also be used. We can store these
cell clusters or cell “regions” in our SingleCellExperiment
object.
# Custom clustering algorithm
kM <- kmeans(lisaCurves, 2)
# Storing clusters into colData
colData(SCE)$custom_region <- paste("region", kM$cluster, sep = "_")
colData(SCE) |> head()
## DataFrame with 6 rows and 6 columns
## x y cellType imageID region custom_region
## <numeric> <numeric> <factor> <character> <character> <character>
## 1 36.72 38.58 c1 s1 region_2 region_2
## 2 61.38 41.29 c1 s1 region_2 region_2
## 3 33.59 80.98 c1 s1 region_2 region_2
## 4 50.17 64.91 c1 s1 region_2 region_2
## 5 82.93 35.60 c1 s1 region_2 region_2
## 6 83.13 2.69 c1 s1 region_2 region_2
Next, we apply our lisaClust
framework to two images of breast cancer obtained by Keren et al. (2018).
We will start by reading in the data from the SpatialDatasets
package
as a SingleCellExperiment
object. Here the data is in a format consistent with
that outputted by CellProfiler.
kerenSPE <- SpatialDatasets::spe_Keren_2018()
This data includes annotation of the cell-types of each cell. Hence, we can
move directly to performing k-means clustering on the local indicators of
spatial association (LISA) functions using the lisaClust
function, remembering
to specify the imageID
, cellType
, and spatialCoords
columns in colData
. For the purpose of demonstration, we will be using
only images 5 and 6 of the kerenSPE
dataset.
kerenSPE <- kerenSPE[,kerenSPE$imageID %in% c("5", "6")]
kerenSPE <- lisaClust(kerenSPE,
k = 5
)
These regions are stored in colData
and can be extracted.
colData(kerenSPE)[, c("imageID", "region")] |>
head(20)
## DataFrame with 20 rows and 2 columns
## imageID region
## <character> <character>
## 21154 5 region_4
## 21155 5 region_4
## 21156 5 region_4
## 21157 5 region_3
## 21158 5 region_3
## ... ... ...
## 21169 5 region_3
## 21170 5 region_3
## 21171 5 region_1
## 21172 5 region_3
## 21173 5 region_1
lisaClust
also provides a convenient function, regionMap
, for examining which
cell types are located in which regions. In this example, we use this to check
which cell types appear more frequently in each region than expected by chance.
Here, we clearly see that healthy epithelial and mesenchymal tissue are highly concentrated in region 1, immune cells are concentrated in regions 2 and 4, whilst tumour cells are concentrated in region 3.
We can further segregate these cells by increasing the number of clusters, i.e.,
increasing the parameter k =
in the lisaClust()
function. For the purposes
of demonstration, let’s take a look at the hatchingPlot
of these regions.
regionMap(kerenSPE,
type = "bubble"
)
Finally, we can use hatchingPlot
to construct a ggplot
object where
the regions are marked by different hatching patterns. This allows us to
visualize the 5 regions and 17 cell-types simultaneously.
hatchingPlot(kerenSPE, nbp = 300)
sessionInfo()
## R Under development (unstable) (2024-10-21 r87258)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.1 LTS
##
## Matrix products: default
## BLAS: /home/biocbuild/bbs-3.21-bioc/R/lib/libRblas.so
## LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_GB LC_COLLATE=C
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: America/New_York
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] SpatialDatasets_1.5.0 SpatialExperiment_1.17.0
## [3] ExperimentHub_2.15.0 AnnotationHub_3.15.0
## [5] BiocFileCache_2.15.0 dbplyr_2.5.0
## [7] SingleCellExperiment_1.29.0 SummarizedExperiment_1.37.0
## [9] Biobase_2.67.0 GenomicRanges_1.59.0
## [11] GenomeInfoDb_1.43.0 IRanges_2.41.0
## [13] S4Vectors_0.45.0 BiocGenerics_0.53.1
## [15] generics_0.1.3 MatrixGenerics_1.19.0
## [17] matrixStats_1.4.1 ggplot2_3.5.1
## [19] spicyR_1.19.0 lisaClust_1.15.2
## [21] BiocStyle_2.35.0
##
## loaded via a namespace (and not attached):
## [1] RColorBrewer_1.1-3 jsonlite_1.8.9
## [3] MultiAssayExperiment_1.33.0 magrittr_2.0.3
## [5] spatstat.utils_3.1-0 magick_2.8.5
## [7] farver_2.1.2 nloptr_2.1.1
## [9] rmarkdown_2.28 zlibbioc_1.53.0
## [11] vctrs_0.6.5 memoise_2.0.1
## [13] minqa_1.2.8 spatstat.explore_3.3-3
## [15] tinytex_0.54 rstatix_0.7.2
## [17] htmltools_0.5.8.1 S4Arrays_1.7.1
## [19] curl_5.2.3 broom_1.0.7
## [21] SparseArray_1.7.0 Formula_1.2-5
## [23] sass_0.4.9 bslib_0.8.0
## [25] plyr_1.8.9 cachem_1.1.0
## [27] mime_0.12 lifecycle_1.0.4
## [29] pkgconfig_2.0.3 Matrix_1.7-1
## [31] R6_2.5.1 fastmap_1.2.0
## [33] GenomeInfoDbData_1.2.13 digest_0.6.37
## [35] numDeriv_2016.8-1.1 colorspace_2.1-1
## [37] AnnotationDbi_1.69.0 tensor_1.5
## [39] RSQLite_2.3.7 ggpubr_0.6.0
## [41] labeling_0.4.3 filelock_1.0.3
## [43] fansi_1.0.6 spatstat.sparse_3.1-0
## [45] httr_1.4.7 polyclip_1.10-7
## [47] abind_1.4-8 mgcv_1.9-1
## [49] compiler_4.5.0 bit64_4.5.2
## [51] withr_3.0.2 backports_1.5.0
## [53] BiocParallel_1.41.0 carData_3.0-5
## [55] DBI_1.2.3 highr_0.11
## [57] ggupset_0.4.0 ggforce_0.4.2
## [59] coxme_2.2-22 ggsignif_0.6.4
## [61] MASS_7.3-61 concaveman_1.1.0
## [63] rappdirs_0.3.3 DelayedArray_0.33.1
## [65] rjson_0.2.23 tools_4.5.0
## [67] goftest_1.2-3 glue_1.8.0
## [69] nlme_3.1-166 grid_4.5.0
## [71] ClassifyR_3.11.0 reshape2_1.4.4
## [73] gtable_0.3.6 spatstat.data_3.1-2
## [75] class_7.3-22 tidyr_1.3.1
## [77] data.table_1.16.2 car_3.1-3
## [79] utf8_1.2.4 XVector_0.47.0
## [81] spatstat.geom_3.3-3 BiocVersion_3.21.1
## [83] pillar_1.9.0 stringr_1.5.1
## [85] splines_4.5.0 dplyr_1.1.4
## [87] tweenr_2.0.3 lattice_0.22-6
## [89] survival_3.7-0 bit_4.5.0
## [91] deldir_2.0-4 tidyselect_1.2.1
## [93] Biostrings_2.75.0 knitr_1.48
## [95] V8_6.0.0 bookdown_0.41
## [97] xfun_0.49 pheatmap_1.0.12
## [99] fftwtools_0.9-11 scam_1.2-17
## [101] stringi_1.8.4 UCSC.utils_1.3.0
## [103] yaml_2.3.10 boot_1.3-31
## [105] evaluate_1.0.1 codetools_0.2-20
## [107] tibble_3.2.1 BiocManager_1.30.25
## [109] cli_3.6.3 munsell_0.5.1
## [111] jquerylib_0.1.4 Rcpp_1.0.13
## [113] spatstat.random_3.3-2 png_0.1-8
## [115] bdsmatrix_1.3-7 spatstat.univar_3.0-1
## [117] parallel_4.5.0 ggh4x_0.2.8
## [119] blob_1.2.4 lme4_1.1-35.5
## [121] ggthemes_5.1.0 lmerTest_3.1-3
## [123] scales_1.3.0 purrr_1.0.2
## [125] crayon_1.5.3 rlang_1.1.4
## [127] KEGGREST_1.47.0