nichecompass.utils.add_multimodal_mask_to_adata

nichecompass.utils.add_multimodal_mask_to_adata(adata, adata_atac, gene_peak_mapping_dict, filter_peaks_based_on_genes=True, filter_hvg_peaks=False, n_hvg_peaks=4000, batch_key='batch', gene_peaks_mask_key='nichecompass_gene_peaks', gp_targets_mask_key='nichecompass_gp_targets', gp_sources_mask_key='nichecompass_gp_sources', gp_names_key='nichecompass_gp_names', ca_targets_mask_key='nichecompass_ca_targets', ca_sources_mask_key='nichecompass_ca_sources', source_peaks_idx_key='nichecompass_source_peaks_idx', target_peaks_idx_key='nichecompass_target_peaks_idx', peaks_idx_key='nichecompass_peaks_idx')

Derive atac target and source gene program masks from the rna gene program masks stored in ´adata´ by mapping the genes of each gene program to the peaks defined in the mapping dictionary. Only peaks that are present in ´adata_atac´ are considered, and the results are stored as sparse boolean matrices in ´adata_atac.varm´. A gene peak mapping mask is also stored in ´adata´.
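
The mapping step described above can be pictured with a small, library-independent sketch. It is not the NicheCompass implementation: the gene, peak, and gene program names are made up, the helper ´map_gene_mask_to_peak_mask´ is hypothetical, and plain Python lists stand in for the sparse boolean matrices.

```python
# Hypothetical toy data; none of these names come from NicheCompass.
genes = ["GENE_A", "GENE_B", "GENE_C"]
peaks = ["chr1:100-200", "chr1:500-600", "chr2:50-150"]

# Gene program mask with shape (n_genes, n_gene_programs):
# GENE_A is a target in program 0, GENE_B is a target in program 1.
gp_targets_mask = [
    [True, False],
    [False, True],
    [False, False],
]

# Uppercase genes as keys, lists of peaks as values.
gene_peak_mapping = {
    "GENE_A": ["chr1:100-200", "chr1:500-600"],
    "GENE_B": ["chr2:50-150", "chr9:1-2"],  # "chr9:1-2" is not in `peaks`
}


def map_gene_mask_to_peak_mask(gene_mask, genes, peaks, mapping):
    """Return a (n_peaks, n_gene_programs) mask derived from a gene mask."""
    peak_position = {peak: idx for idx, peak in enumerate(peaks)}
    n_gps = len(gene_mask[0]) if gene_mask else 0
    peak_mask = [[False] * n_gps for _ in peaks]
    for gene_idx, gene in enumerate(genes):
        for peak in mapping.get(gene.upper(), []):
            peak_idx = peak_position.get(peak)
            if peak_idx is None:
                continue  # skip peaks that are not in the atac data
            for gp_idx in range(n_gps):
                if gene_mask[gene_idx][gp_idx]:
                    # A peak belongs to a program if any mapped gene does.
                    peak_mask[peak_idx][gp_idx] = True
    return peak_mask


ca_targets_mask = map_gene_mask_to_peak_mask(
    gp_targets_mask, genes, peaks, gene_peak_mapping
)
```

In this toy case, both peaks mapped to GENE_A inherit membership in the first gene program, and the single in-data peak mapped to GENE_B inherits membership in the second.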

Parameters:
  • adata (AnnData) – AnnData rna object with rna gene program masks stored in ´adata.varm[gp_targets_mask_key]´ and ´adata.varm[gp_sources_mask_key]´, and gene program names stored in ´adata.uns[gp_names_key]´.

  • adata_atac (AnnData) – AnnData atac object to which the atac gene program masks will be added.

  • gene_peak_mapping_dict (dict) – A mapping dictionary with uppercase genes as keys and the corresponding list of peaks as values.

  • filter_peaks_based_on_genes (bool (default: True)) – If ´True´, filter ´adata_atac´ to only keep peaks that are mapped to genes in ´gene_peak_mapping_dict´.

  • filter_hvg_peaks (bool (default: False)) – If ´True´, filter ´adata_atac´ to only keep the ´n_hvg_peaks´ highly variable peaks. This filter is applied after the gene-based peak filter.

  • n_hvg_peaks (int (default: 4000)) – Number of highly variable peaks to keep if ´filter_hvg_peaks´ is ´True´.

  • batch_key (str (default: 'batch')) – Key in ´adata.obs´ where the batches for highly variable peak filtering are stored if ´filter_hvg_peaks´ is ´True´.

  • gene_peaks_mask_key (str (default: 'nichecompass_gene_peaks')) – Key in ´adata.varm´ where the binary mapping mask from genes to peaks will be stored.

  • gp_targets_mask_key (str (default: 'nichecompass_gp_targets')) – Key in ´adata.varm´ where the binary gene program mask for target genes of a gene program is stored.

  • gp_sources_mask_key (str (default: 'nichecompass_gp_sources')) – Key in ´adata.varm´ where the binary gene program mask for source genes of a gene program is stored.

  • gp_names_key (str (default: 'nichecompass_gp_names')) – Key in ´adata.uns´ where the gene program names are stored.

  • ca_targets_mask_key (str (default: 'nichecompass_ca_targets')) – Key in ´adata_atac.varm´ where the binary gene program mask for target peaks of a gene program will be stored.

  • ca_sources_mask_key (str (default: 'nichecompass_ca_sources')) – Key in ´adata_atac.varm´ where the binary gene program mask for source peaks of a gene program will be stored.

  • source_peaks_idx_key (str (default: 'nichecompass_source_peaks_idx')) – Key in ´adata_atac.uns´ where the index of the source peaks that are in the atac source mask will be stored.

  • target_peaks_idx_key (str (default: 'nichecompass_target_peaks_idx')) – Key in ´adata_atac.uns´ where the index of the target peaks that are in the atac target mask will be stored.

  • peaks_idx_key (str (default: 'nichecompass_peaks_idx')) – Key in ´adata_atac.uns´ where the index of a concatenated vector of target and source peaks that are in the atac masks will be stored.
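
As an illustration of the expected shape of ´gene_peak_mapping_dict´ and of what gene-based peak filtering and the gene peak mapping mask amount to, here is a minimal pure-Python sketch. The gene and peak names are invented, the variables ´atac_peaks´ and ´rna_genes´ stand in for ´adata_atac.var_names´ and ´adata.var_names´, and the logic is an assumption based on the parameter descriptions above, not the library's code.

```python
# Hypothetical mapping: uppercase genes as keys, lists of peaks as values.
gene_peak_mapping_dict = {
    "GENE_A": ["chr1:100-200", "chr1:500-600"],
    "GENE_B": ["chr2:50-150"],
}

# Stand-ins for adata_atac.var_names and adata.var_names (assumed names).
atac_peaks = ["chr1:100-200", "chr1:500-600", "chr2:50-150", "chr3:10-90"]
rna_genes = ["GENE_A", "GENE_B", "GENE_C"]

# Gene-based peak filter: keep only peaks mapped to at least one gene.
mapped_peaks = {
    peak
    for peak_list in gene_peak_mapping_dict.values()
    for peak in peak_list
}
kept_peaks = [peak for peak in atac_peaks if peak in mapped_peaks]

# Binary gene-to-peak mapping mask with shape (n_genes, n_kept_peaks).
gene_peaks_mask = [
    [peak in gene_peak_mapping_dict.get(gene.upper(), []) for peak in kept_peaks]
    for gene in rna_genes
]
```

Here the unmapped peak is dropped by the gene-based filter, and the gene without an entry in the dictionary gets an all-´False´ row in the mask.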

Return type:

Tuple[AnnData, AnnData]

Returns:

adata:

The modified AnnData rna object with the gene peak mask stored.

adata_atac:

The modified AnnData atac object with atac gene program masks stored.