nichecompass.utils.add_gps_from_gp_dict_to_adata

nichecompass.utils.add_gps_from_gp_dict_to_adata(gp_dict, adata, genes_uppercase=True, gp_targets_mask_key='nichecompass_gp_targets', gp_targets_categories_mask_key='nichecompass_gp_targets_categories', targets_categories_label_encoder_key='nichecompass_targets_categories_label_encoder', gp_sources_mask_key='nichecompass_gp_sources', gp_sources_categories_mask_key='nichecompass_gp_sources_categories', sources_categories_label_encoder_key='nichecompass_sources_categories_label_encoder', gp_names_key='nichecompass_gp_names', source_genes_idx_key='nichecompass_source_genes_idx', target_genes_idx_key='nichecompass_target_genes_idx', genes_idx_key='nichecompass_genes_idx', min_genes_per_gp=1, min_source_genes_per_gp=0, min_target_genes_per_gp=0, max_genes_per_gp=None, max_source_genes_per_gp=None, max_target_genes_per_gp=None, filter_genes_not_in_masks=False, add_fc_gps_instead_of_gp_dict_gps=False, plot_gp_gene_count_distributions=False)

Add gene programs defined in a gene program dictionary to an AnnData object. This is done by converting the lists of target and source genes of each gene program into binary masks and aligning the masks with the genes whose expression is available in the AnnData object.
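
The mask construction described above can be sketched in plain Python. This is a minimal, hypothetical illustration: the helper build_gp_masks is not part of NicheCompass, and it covers only the mask step (no storage in adata.uns, no filtering). It assumes each gene program maps to a dict with 'targets' and 'sources' lists, matches gene names case-insensitively when genes_uppercase is set, and builds genes x gene programs 0/1 matrices whose rows follow the gene order of the expression data, which is the row layout that arrays in adata.varm must have.

```python
from typing import Dict, List, Tuple


def build_gp_masks(
    gp_dict: Dict[str, Dict[str, List[str]]],
    var_names: List[str],
    genes_uppercase: bool = True,
) -> Tuple[List[str], List[List[int]], List[List[int]]]:
    """Hypothetical sketch: return gp names plus genes x gps target/source masks."""

    def norm(name: str) -> str:
        # Optionally compare gene names in uppercase.
        return name.upper() if genes_uppercase else name

    # Map each (normalized) gene name to its row in the expression data.
    gene_to_row = {norm(g): i for i, g in enumerate(var_names)}

    gp_names = list(gp_dict.keys())
    n_genes, n_gps = len(var_names), len(gp_names)
    targets_mask = [[0] * n_gps for _ in range(n_genes)]
    sources_mask = [[0] * n_gps for _ in range(n_genes)]

    for col, gp_name in enumerate(gp_names):
        for key, mask in (("targets", targets_mask), ("sources", sources_mask)):
            for gene in gp_dict[gp_name][key]:
                row = gene_to_row.get(norm(gene))
                # Genes whose expression was not measured are skipped.
                if row is not None:
                    mask[row][col] = 1
    return gp_names, targets_mask, sources_mask
```

For example, with var_names ["Cxcl12", "CXCR4", "Actb"], a gene program with targets ["Cxcr4"] and sources ["Cxcl12", "Tnf"] gets a 1 in the CXCR4 row of the target mask and in the Cxcl12 row of the source mask, while "Tnf" is dropped because it is not among the measured genes.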

Parts of the implementation are inspired by https://github.com/theislab/scarches/blob/master/scarches/utils/annotations.py#L5 (01.10.2022).

Parameters:

gp_dict (dict) – Nested dictionary containing the gene programs, with keys being gene program names and values being dictionaries with the keys 'targets' and 'sources'. 'targets' contains a list of the names of the genes in the gene program that are used to reconstruct the gene expression of the node itself (receiving node); 'sources' contains a list of the names of the genes in the gene program that are used to reconstruct the gene expression of the node’s neighbors (transmitting nodes).
adata (AnnData) – AnnData object to which the gene programs will be added.
genes_uppercase (bool (default: True)) – If True, convert the gene names in the adata and in the gene program dictionary to uppercase for comparison.
gp_targets_mask_key (str (default: 'nichecompass_gp_targets')) – Key in adata.varm where the binary gene program mask for target genes will be stored (target genes are used to reconstruct the gene expression of the node itself, i.e. the receiving node).
gp_sources_mask_key (str (default: 'nichecompass_gp_sources')) – Key in adata.varm where the binary gene program mask for source genes will be stored (source genes are used to reconstruct the gene expression of the node’s neighbors, i.e. the transmitting nodes).
gp_names_key (str (default: 'nichecompass_gp_names')) – Key in adata.uns where the gene program names will be stored.
source_genes_idx_key (str (default: 'nichecompass_source_genes_idx')) – Key in adata.uns where the indices of the source genes that are in the gene program mask will be stored.
target_genes_idx_key (str (default: 'nichecompass_target_genes_idx')) – Key in adata.uns where the indices of the target genes that are in the gene program mask will be stored.
genes_idx_key (str (default: 'nichecompass_genes_idx')) – Key in adata.uns where the indices of a concatenated vector of target and source genes that are in the gene program masks will be stored.
min_genes_per_gp (int (default: 1)) – Minimum number of genes in a gene program, including both target and source genes, that need to be available in the adata (i.e., whose gene expression has been probed) for the gene program to be retained.
min_source_genes_per_gp (int (default: 0)) – Minimum number of source genes in a gene program that need to be available in the adata (i.e., whose gene expression has been probed) for the gene program to be retained.
min_target_genes_per_gp (int (default: 0)) – Minimum number of target genes in a gene program that need to be available in the adata (i.e., whose gene expression has been probed) for the gene program to be retained.
max_genes_per_gp (Optional[int] (default: None)) – Maximum number of genes in a gene program, including both target and source genes, that can be available in the adata (i.e., whose gene expression has been probed) for the gene program to be retained.
max_source_genes_per_gp (Optional[int] (default: None)) – Maximum number of source genes in a gene program that can be available in the adata (i.e., whose gene expression has been probed) for the gene program to be retained.
max_target_genes_per_gp (Optional[int] (default: None)) – Maximum number of target genes in a gene program that can be available in the adata (i.e., whose gene expression has been probed) for the gene program to be retained.
filter_genes_not_in_masks (bool (default: False)) – If True, remove the genes that are not in the gene program masks from the adata object.
add_fc_gps_instead_of_gp_dict_gps (bool (default: False)) – Note: this parameter is only used for ablation studies. If True, ignore the gene programs from the gp dict and instead create a mask of fully-connected gene programs (as many as there are gene programs in the gp dict).
plot_gp_gene_count_distributions (bool (default: False)) – If True, display the distribution of gene programs by number of source and target genes.
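
The min/max parameters above act as a per-gene-program filter on the number of genes actually found in the adata. The sketch below is a hypothetical helper (keep_gp is not a NicheCompass function) that expresses this rule under two stated assumptions not confirmed by this page: the total gene count is the sum of matched target and source genes, and a maximum of None means no upper bound.

```python
from typing import Optional


def keep_gp(
    n_target_genes: int,
    n_source_genes: int,
    min_genes_per_gp: int = 1,
    min_source_genes_per_gp: int = 0,
    min_target_genes_per_gp: int = 0,
    max_genes_per_gp: Optional[int] = None,
    max_source_genes_per_gp: Optional[int] = None,
    max_target_genes_per_gp: Optional[int] = None,
) -> bool:
    """Hypothetical sketch: decide whether a gene program passes all thresholds."""
    # Assumption: targets and sources are counted separately, so a gene
    # listed in both contributes twice to the total.
    n_genes = n_target_genes + n_source_genes

    def within(count: int, lo: int, hi: Optional[int]) -> bool:
        # Assumption: None disables the upper bound.
        return count >= lo and (hi is None or count <= hi)

    return (
        within(n_genes, min_genes_per_gp, max_genes_per_gp)
        and within(n_source_genes, min_source_genes_per_gp, max_source_genes_per_gp)
        and within(n_target_genes, min_target_genes_per_gp, max_target_genes_per_gp)
    )
```

Under these assumptions, the default thresholds drop only gene programs with no matched genes at all; every other combination of limits must be set explicitly.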