nichecompass.utils.extract_gp_dict_from_collectri_tf_network
- nichecompass.utils.extract_gp_dict_from_collectri_tf_network(species, tf_network_file_path='collectri_tf_network.csv', load_from_disk=False, save_to_disk=False, plot_gp_gene_count_distributions=True, gp_gene_count_distributions_save_path=None)
Retrieve 1072 mouse or 1186 human gene programs of transcription factors (TFs) and their target genes from CollecTRI via decoupler. CollecTRI is a comprehensive resource containing a curated collection of TFs and their transcriptional targets, compiled from 12 different resources. This collection provides increased TF coverage and superior performance in identifying perturbed TFs compared to the DoRothEA network and other literature-based gene regulatory networks (GRNs) (see https://decoupler-py.readthedocs.io/en/latest/notebooks/dorothea.html).
- Parameters:
species (Literal['mouse','human']) – Species for which the gene programs will be extracted.
tf_network_file_path (default: 'collectri_tf_network.csv') – Path of the CollecTRI TF network file on disk, used when ´load_from_disk´ or ´save_to_disk´ is ´True´.
load_from_disk (bool (default: False)) – If ´True´, the CollecTRI TF network will be loaded from disk instead of from the decoupler library.
save_to_disk (bool (default: False)) – If ´True´, the CollecTRI TF network will additionally be stored on disk. Only applies if ´load_from_disk´ is ´False´.
plot_gp_gene_count_distributions (bool (default: True)) – If ´True´, display the distribution of gene programs per number of source and target genes.
gp_gene_count_distributions_save_path (Optional[str] (default: None)) – Path of the file where the gene program gene count distribution plot will be saved if ´plot_gp_gene_count_distributions´ is ´True´.
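A minimal call sketch using the parameters above. It assumes that nichecompass is installed and that the CollecTRI network can be retrieved through decoupler. The wrapper name `load_mouse_collectri_gps` is hypothetical and not part of NicheCompass; the import is deferred into the function so that defining the wrapper has no side effects.

```python
def load_mouse_collectri_gps(tf_network_file_path="collectri_tf_network.csv"):
    """Hypothetical wrapper: fetch mouse CollecTRI gene programs.

    Assumes nichecompass is installed; the keyword arguments mirror the
    signature documented above.
    """
    # Deferred import so that this sketch can be defined without
    # nichecompass being importable.
    from nichecompass.utils import extract_gp_dict_from_collectri_tf_network

    return extract_gp_dict_from_collectri_tf_network(
        species="mouse",
        tf_network_file_path=tf_network_file_path,
        load_from_disk=False,   # retrieve the network via decoupler
        save_to_disk=True,      # additionally store the network on disk
        plot_gp_gene_count_distributions=False,  # skip the distribution plot
    )
```

On a later run, the same call with `load_from_disk=True` and `save_to_disk=False` would read the network from disk instead of from the decoupler library.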
- Return type:
dict
- Returns:
gp_dict: Nested dictionary containing the CollecTRI TF target gene programs. Keys are gene program names; values are dictionaries with the keys ´sources´, ´targets´, ´sources_categories´, and ´targets_categories´. ´sources´ and ´targets´ contain the CollecTRI TFs and target genes, respectively, and ´sources_categories´ and ´targets_categories´ contain the category of each gene (‘tf’ or ‘target_gene’).
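The nested structure described above can be illustrated with a small hand-written dictionary. The gene program name and gene symbols below are made up for illustration and are not taken from CollecTRI, and the actual naming of the keys in the returned dictionary may differ. The helper `count_gp_genes` is hypothetical and not part of NicheCompass.

```python
# Toy dictionary mimicking the documented return structure.
# The gene program name and gene symbols are illustrative assumptions,
# not actual CollecTRI content.
toy_gp_dict = {
    "Example_TF_target_genes_GP": {
        "sources": ["Tf1"],
        "targets": ["GeneA", "GeneB", "GeneC"],
        "sources_categories": ["tf"],
        "targets_categories": ["target_gene", "target_gene", "target_gene"],
    },
}


def count_gp_genes(gp_dict):
    """Return {gp_name: (n_sources, n_targets)} for a gene program dict."""
    return {
        gp_name: (len(gp["sources"]), len(gp["targets"]))
        for gp_name, gp in gp_dict.items()
    }


gp_gene_counts = count_gp_genes(toy_gp_dict)
print(gp_gene_counts)  # {'Example_TF_target_genes_GP': (1, 3)}
```

Such per-program counts are the kind of quantity summarized when ´plot_gp_gene_count_distributions´ is ´True´.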