nichecompass.utils.extract_gp_dict_from_collectri_tf_network

nichecompass.utils.extract_gp_dict_from_collectri_tf_network(species, tf_network_file_path='collectri_tf_network.csv', load_from_disk=False, save_to_disk=False, plot_gp_gene_count_distributions=True, gp_gene_count_distributions_save_path=None)

Retrieve 1072 mouse or 1186 human transcription factor (TF) target gene gene programs from CollecTRI via decoupler. CollecTRI is a comprehensive resource containing a curated collection of TFs and their transcriptional targets, compiled from 12 different resources. This collection provides increased coverage of transcription factors and superior performance in identifying perturbed TFs compared to the DoRothEA network and other literature-based gene regulatory networks (GRNs) (see https://decoupler-py.readthedocs.io/en/latest/notebooks/dorothea.html).

Parameters:

species (Literal['mouse', 'human']) – Species for which the gene programs will be extracted.
tf_network_file_path (Optional[str] (default: 'collectri_tf_network.csv')) – Path of the file where the CollecTRI TF network will be stored (if ´save_to_disk´ is ´True´) or loaded from (if ´load_from_disk´ is ´True´).
load_from_disk (bool (default: False)) – If ´True´, the CollecTRI TF network will be loaded from disk instead of from the decoupler library.
save_to_disk (bool (default: False)) – If ´True´, the CollecTRI TF network will additionally be stored on disk. Only applies if ´load_from_disk´ is ´False´.
plot_gp_gene_count_distributions (bool (default: True)) – If ´True´, display the distribution of gene programs per number of source and target genes.
gp_gene_count_distributions_save_path (Optional[str] (default: None)) – Path of the file where the gene program gene count distribution plot will be saved if ´plot_gp_gene_count_distributions´ is ´True´.
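
A call using the parameters above might look like the following minimal sketch. The wrapper name load_collectri_gp_dict is hypothetical and not part of NicheCompass. The species check is an added convenience, and the import is placed inside the function only so that the sketch can be defined without NicheCompass installed. Retrieving the network via decoupler is assumed to require network access.

```python
from typing import Literal


def load_collectri_gp_dict(species: Literal["mouse", "human"]) -> dict:
    """Hypothetical wrapper: retrieve the CollecTRI gene programs without plotting."""
    # Fail early on unsupported species (only 'mouse' and 'human' are documented).
    if species not in ("mouse", "human"):
        raise ValueError(f"species must be 'mouse' or 'human', got {species!r}")

    # Imported lazily so this sketch can be defined without nichecompass installed.
    from nichecompass.utils import extract_gp_dict_from_collectri_tf_network

    return extract_gp_dict_from_collectri_tf_network(
        species=species,
        load_from_disk=False,  # retrieve the network via decoupler
        save_to_disk=False,    # do not additionally store it on disk
        plot_gp_gene_count_distributions=False,  # skip the distribution plot
    )
```

Calling load_collectri_gp_dict("mouse") would then return the nested dictionary described under Returns.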

Return type:

dict

Returns:

gp_dict: Nested dictionary containing the CollecTRI TF target gene gene programs. Keys are gene program names, and values are dictionaries with the keys ´sources´, ´targets´, ´sources_categories´, and ´targets_categories´. Here, ´sources´ and ´targets´ contain the CollecTRI TFs and target genes, and ´sources_categories´ and ´targets_categories´ contain the categories of all genes (´tf´ or ´target_gene´).
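
The structure described above can be illustrated with a small hand-written dictionary. This is a sketch only: the gene program names and gene symbols are placeholders rather than actual CollecTRI entries, and it assumes that the category lists run parallel to the gene lists. The helper summarize_gp_dict is hypothetical.

```python
# Placeholder gene programs mimicking the documented structure (not real CollecTRI data).
gp_dict = {
    "TF_A_GP": {
        "sources": ["TF_A"],
        "targets": ["GENE_1", "GENE_2", "GENE_3"],
        "sources_categories": ["tf"],
        "targets_categories": ["target_gene", "target_gene", "target_gene"],
    },
    "TF_B_GP": {
        "sources": ["TF_B"],
        "targets": ["GENE_2", "GENE_4"],
        "sources_categories": ["tf"],
        "targets_categories": ["target_gene", "target_gene"],
    },
}


def summarize_gp_dict(gp_dict: dict) -> dict:
    """Map each gene program name to (number of sources, number of targets)."""
    return {
        name: (len(gp["sources"]), len(gp["targets"]))
        for name, gp in gp_dict.items()
    }


summary = summarize_gp_dict(gp_dict)
for name, (n_sources, n_targets) in summary.items():
    print(f"{name}: {n_sources} source gene(s), {n_targets} target gene(s)")
```

The same helper could be applied to the dictionary returned by the function to inspect gene program sizes without producing the plot.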