nichecompass.benchmarking.compute_clisis

nichecompass.benchmarking.compute_clisis(adata, cell_type_key='cell_type', batch_key=None, spatial_knng_key='spatial_knng', latent_knng_key='nichecompass_latent_knng', spatial_key='spatial', latent_key='nichecompass_latent', n_neighbors=90, n_jobs=1, seed=0)

Compute the Cell Type Local Inverse Simpson’s Index Similarity (CLISIS). The CLISIS measures how accurately the latent nearest neighbor graph preserves the local neighborhood cell type heterogeneity of the spatial nearest neighbor graph. It ranges between ‘0’ and ‘1’, with higher values indicating better preservation of local neighborhood cell type heterogeneity.

The CLISIS is computed by first calculating the Cell Type Local Inverse Simpson’s Index (CLISI), as proposed by Luecken, M. D. et al. Benchmarking atlas-level data integration in single-cell genomics. Nat. Methods 19, 41–50 (2022), on the latent and spatial nearest neighbor graph(s) respectively.* Afterwards, the ratio of the two CLISI scores is taken and logarithmized, as proposed by Heidari, E. et al. Supervised spatial inference of dissociated single-cell data with SageNet. bioRxiv 2022.04.14.488419 (2022) doi:10.1101/2022.04.14.488419, leveraging the properties of the log that np.log2(x/y) = -np.log2(y/x) and np.log2(x/x) = 0. At this stage, values closer to 0 indicate better local neighborhood cell type heterogeneity preservation. The resulting value is then normalized by the maximum possible value, which would occur in the case of minimal local neighborhood cell type heterogeneity preservation, to scale the metric between ‘0’ and ‘1’. Finally, the median of the absolute normalized scores is computed and subtracted from 1, so that values closer to ‘1’ indicate better local neighborhood cell type heterogeneity preservation.
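The aggregation steps described above (log ratio, normalization, median, subtraction from 1) can be sketched in plain Python. This is a minimal illustration, not the library's implementation: the helper name `clisis_from_clisi` is hypothetical, the per-cell CLISI scores are made-up numbers, and the maximum log ratio is assumed to be log2(number of cell types / 1), on the assumption that a CLISI score lies between 1 and the number of cell types.

```python
import math
from statistics import median


def clisis_from_clisi(latent_clisi, spatial_clisi, n_cell_types):
    """Aggregate per-cell CLISI scores into one CLISIS-like value (sketch)."""
    if n_cell_types < 2:
        raise ValueError("at least two cell types are needed to normalize")
    # Assumption: CLISI lies in [1, n_cell_types], so the largest possible
    # absolute log ratio is log2(n_cell_types / 1).
    max_log_ratio = math.log2(n_cell_types)
    # Absolute normalized log ratio per cell; 0 means identical heterogeneity.
    abs_norm = [
        abs(math.log2(lat / spa)) / max_log_ratio
        for lat, spa in zip(latent_clisi, spatial_clisi)
    ]
    # Median over cells, flipped so that 1 indicates perfect preservation.
    return 1 - median(abs_norm)


# Toy per-cell CLISI scores for three cells and four cell types
spatial = [1.0, 2.0, 4.0]
print(clisis_from_clisi(spatial, spatial, n_cell_types=4))          # identical scores
print(clisis_from_clisi([4.0, 4.0], [1.0, 1.0], n_cell_types=4))    # maximal mismatch
```

Because the absolute value of the log ratio is used, swapping the latent and spatial scores gives the same result, which is the symmetry property of the logarithm mentioned above.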

If a ´batch_key´ is provided, a separate spatial nearest neighbor graph is computed for each batch, and the spatial CLISI scores are computed for each batch separately.

If they exist, the precomputed nearest neighbor graphs stored in ´adata.obsp[spatial_knng_key + ‘_connectivities’]´ and ´adata.obsp[latent_knng_key + ‘_connectivities’]´ are used. Otherwise, the graphs are computed on the fly using ´spatial_key´, ´latent_key´ and ´n_neighbors´, and stored in ´adata.obsp[spatial_knng_key + ‘_connectivities’]´ and ´adata.obsp[latent_knng_key + ‘_connectivities’]´ respectively.
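To see in advance whether the graphs would be reused or recomputed, one can check which of the expected ´adata.obsp´ keys are present. The helper below is a hypothetical sketch (it is not part of nichecompass) and only reproduces the key naming described above; how the function behaves when only one of the two graphs is missing is not stated here.

```python
def missing_knng_keys(obsp_keys,
                      spatial_knng_key="spatial_knng",
                      latent_knng_key="nichecompass_latent_knng"):
    """Return the expected '<key>_connectivities' entries that are absent."""
    expected = [
        spatial_knng_key + "_connectivities",
        latent_knng_key + "_connectivities",
    ]
    present = set(obsp_keys)
    return [key for key in expected if key not in present]


# With a real AnnData object, pass adata.obsp.keys() instead of this toy list.
print(missing_knng_keys(["spatial_knng_connectivities"]))
# ['nichecompass_latent_knng_connectivities']
```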

* The Inverse Simpson’s Index measures the expected number of samples that need to be drawn before two are drawn from the same category. The Local Inverse Simpson’s Index (LISI) combines perplexity-based neighborhood construction with the Inverse Simpson’s Index to account for distances between neighbors. The CLISI score is the LISI applied to cell nearest neighbor graphs with cell types as categories, and indicates the effective number of different cell types represented in the local neighborhood of each cell. If the cells are well mixed, we might expect the CLISI score to be close to the number of unique cell types (e.g. neighborhoods with an equal number of cells from 2 cell types get a CLISI of 2). Note, however, that even under perfect mixing, the value would be smaller than the number of unique cell types if the absolute number of cells is different for different cell types.
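The two-cell-type example in the note above can be checked with a small, unweighted Inverse Simpson’s Index, computed as 1 divided by the sum of squared category proportions. This sketch deliberately omits the perplexity-based distance weighting that the LISI adds, and the function name and toy labels are illustrative only.

```python
from collections import Counter


def inverse_simpson_index(labels):
    """Unweighted Inverse Simpson's Index: 1 / sum of squared proportions."""
    counts = Counter(labels)
    total = sum(counts.values())
    return 1.0 / sum((count / total) ** 2 for count in counts.values())


balanced = ["A"] * 5 + ["B"] * 5   # equal number of cells from two cell types
skewed = ["A"] * 9 + ["B"] * 1     # same two cell types, unequal counts

print(inverse_simpson_index(balanced))  # 2.0
print(inverse_simpson_index(skewed))    # below 2, despite two cell types
```

The skewed neighborhood illustrates the caveat: two cell types are present, but the effective number of cell types is well below 2 because one type dominates.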

Parameters:
  • adata (AnnData) – AnnData object with cell type annotations stored in ´adata.obs[cell_type_key]´, and either precomputed nearest neighbor graphs stored in ´adata.obsp[spatial_knng_key + ‘_connectivities’]´ and ´adata.obsp[latent_knng_key + ‘_connectivities’]´, or spatial coordinates stored in ´adata.obsm[spatial_key]´ together with the latent representation from a model stored in ´adata.obsm[latent_key]´.

  • cell_type_key (str (default: 'cell_type')) – Key under which the cell type annotations are stored in ´adata.obs´.

  • batch_key (Optional[str] (default: None)) – Key under which the batches are stored in ´adata.obs´. If ´None´, ´adata´ is assumed to contain a single batch.

  • spatial_knng_key (str (default: 'spatial_knng')) – Key under which the spatial nearest neighbor graph is / will be stored in ´adata.obsp´ with the suffix ‘_connectivities’.

  • latent_knng_key (str (default: 'nichecompass_latent_knng')) – Key under which the latent nearest neighbor graph is / will be stored in ´adata.obsp´ with the suffix ‘_connectivities’.

  • spatial_key (Optional[str] (default: 'spatial')) – Key under which the spatial coordinates are stored in ´adata.obsm´.

  • latent_key (Optional[str] (default: 'nichecompass_latent')) – Key under which the latent representation from a model is stored in ´adata.obsm´.

  • n_neighbors (Optional[int] (default: 90)) – Number of neighbors used to construct the nearest neighbor graphs from the spatial coordinates and from the latent representation of a model, if these graphs are not precomputed.

  • n_jobs (int (default: 1)) – Number of jobs to use for parallelization of neighbor search.

  • seed (int (default: 0)) – Random seed for reproducibility.

Return type:

float

Returns:

clisis: The Cell Type Local Inverse Simpson’s Index Similarity.
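A call using the signature documented above might look like the following sketch. The wrapper name `clisis_for` is hypothetical, and it assumes an AnnData object that already carries the cell type annotations, the spatial coordinates and the latent representation under the default keys. The import is done lazily inside the wrapper so the sketch can be defined without nichecompass installed.

```python
def clisis_for(adata, cell_type_key="cell_type", batch_key=None):
    """Hypothetical convenience wrapper around compute_clisis."""
    # Lazy import: requires the nichecompass package at call time only.
    from nichecompass.benchmarking import compute_clisis

    return compute_clisis(
        adata,
        cell_type_key=cell_type_key,
        batch_key=batch_key,            # None: a single batch is assumed
        spatial_knng_key="spatial_knng",
        latent_knng_key="nichecompass_latent_knng",
        spatial_key="spatial",
        latent_key="nichecompass_latent",
        n_neighbors=90,
        n_jobs=1,
        seed=0,
    )
```

The returned float lies between 0 and 1, with values closer to 1 indicating better preservation of local cell type heterogeneity.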