nichecompass.utils.extract_gp_dict_from_nichenet_lrt_interactions

nichecompass.utils.extract_gp_dict_from_nichenet_lrt_interactions(species, version='v2', keep_target_genes_ratio=1.0, max_n_target_genes_per_gp=250, load_from_disk=False, save_to_disk=False, lr_network_file_path='nichenet_lr_network.csv', ligand_target_matrix_file_path='../data/gene_programs/nichenet_ligand_target_matrix.csv', gene_orthologs_mapping_file_path='../data/gene_annotations/human_mouse_gene_orthologs.csv', plot_gp_gene_count_distributions=True, gp_gene_count_distributions_save_path=None)

Retrieve the NicheNet ligand-receptor network and ligand-target gene regulatory potential matrix as described in Browaeys, R., Saelens, W. & Saeys, Y. NicheNet: modeling intercellular communication by linking ligands to target genes. Nat. Methods 17, 159–162 (2020). From these, extract 1287 mouse or 1226 human gene programs, each consisting of a ligand with its corresponding receptors and its top target genes ranked by NicheNet regulatory potential score.

Parameters:
  • species (Literal['mouse', 'human']) – Species for which the gps will be extracted. If version is ´v1´, the NicheNet data is human, and for ´mouse´ the human genes are mapped to mouse orthologs using a mapping file (see ´gene_orthologs_mapping_file_path´). NicheCompass contains a default mapping file stored under “<root>/data/gene_annotations/human_mouse_gene_orthologs.csv”, which was created with Ensembl BioMart (http://www.ensembl.org/info/data/biomart/index.html).

  • version (Literal['v1', 'v2'] (default: 'v2')) – Version of the NicheNet ligand receptor network and ligand target gene regulatory potential matrix. ´v2´ is an improved version of ´v1´, and has separate files for mouse and human.

  • keep_target_genes_ratio (float (default: 1.0)) – Ratio of target genes that are kept compared to total target genes. This ratio is applied over the entire matrix (not on gene program level), and determines the ´all_gps_score_keep_threshold´, which will be used to filter target genes according to their regulatory potential scores.

  • max_n_target_genes_per_gp (int (default: 250)) – Maximum number of target genes per gene program. If a gene program has more target genes than ´max_n_target_genes_per_gp´, only the ´max_n_target_genes_per_gp´ target genes with the highest regulatory potential scores will be kept. The default value is chosen based on the MultiNicheNet specification (see Browaeys, R. et al. MultiNicheNet: a flexible framework for differential cell-cell communication analysis from multi-sample multi-condition single-cell transcriptomics data. bioRxiv (2023) doi:10.1101/2023.06.13.544751).

  • load_from_disk (bool (default: False)) – If ´True´, the NicheNet files will be loaded from disk instead of the web.

  • save_to_disk (bool (default: False)) – If ´True´, the NicheNet files will additionally be stored on disk.

  • lr_network_file_path (Optional[str] (default: 'nichenet_lr_network.csv')) – Path of the file where the NicheNet ligand receptor network will be stored (if ´save_to_disk´ is ´True´) or loaded from (if ´load_from_disk´ is ´True´).

  • ligand_target_matrix_file_path (Optional[str] (default: '../data/gene_programs/nichenet_ligand_target_matrix.csv')) – Path of the file where the NicheNet ligand target matrix will be stored (if ´save_to_disk´ is ´True´) or loaded from (if ´load_from_disk´ is ´True´).

  • gene_orthologs_mapping_file_path (Optional[str] (default: '../data/gene_annotations/human_mouse_gene_orthologs.csv')) – Path of the file where the gene orthologs mapping is stored. Relevant if version is ´v1´ and species is ´mouse´.

  • plot_gp_gene_count_distributions (bool (default: True)) – If ´True´, display the distribution of gene programs per number of sources and targets.

  • gp_gene_count_distributions_save_path (Optional[str] (default: None)) – Path of the file where the gene program gene count distribution plot will be saved if ´plot_gp_gene_count_distributions´ is ´True´.
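The interplay of ´keep_target_genes_ratio´ (a threshold derived from all scores in the matrix) and ´max_n_target_genes_per_gp´ (a cap applied per gene program) can be illustrated with a minimal pure-Python sketch. This is not the NicheCompass implementation: the helper ´filter_targets´, the toy scores, and the placeholder ligand and gene names are all hypothetical, and the exact threshold and tie-handling rules used by the library may differ.

```python
def filter_targets(scores, keep_target_genes_ratio=1.0,
                   max_n_target_genes_per_gp=250):
    """Hypothetical helper; `scores` maps ligand -> {target gene: score}."""
    if not 0.0 < keep_target_genes_ratio <= 1.0:
        raise ValueError("keep_target_genes_ratio must be in (0, 1]")

    # Pool the scores of all ligands and sort them in descending order, so
    # that the ratio is applied over the entire matrix rather than per
    # gene program.
    all_scores = sorted(
        (s for targets in scores.values() for s in targets.values()),
        reverse=True)
    if not all_scores:
        return {ligand: [] for ligand in scores}

    # Assumption: the threshold is the score at the cut-off position.
    n_keep = max(1, int(len(all_scores) * keep_target_genes_ratio))
    score_keep_threshold = all_scores[n_keep - 1]

    filtered = {}
    for ligand, targets in scores.items():
        # Step 1: drop targets below the matrix-wide threshold.
        kept = [(gene, s) for gene, s in targets.items()
                if s >= score_keep_threshold]
        # Step 2: keep at most the top-scoring targets of this gene program.
        kept.sort(key=lambda gene_score: gene_score[1], reverse=True)
        filtered[ligand] = [gene for gene, _ in
                            kept[:max_n_target_genes_per_gp]]
    return filtered


# Toy regulatory potential scores (made up for illustration).
toy_scores = {
    "LigandA": {"Gene1": 0.9, "Gene2": 0.5, "Gene3": 0.1},
    "LigandB": {"Gene1": 0.2, "Gene4": 0.8, "Gene5": 0.7, "Gene6": 0.6},
}

by_ratio = filter_targets(toy_scores, keep_target_genes_ratio=0.5)
by_cap = filter_targets(toy_scores, max_n_target_genes_per_gp=2)
```

In this sketch, the ratio removes low-scoring targets regardless of which ligand they belong to, so a ligand with generally weak scores can lose most of its targets, whereas the cap only trims gene programs that are still larger than the limit.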

Return type:

dict

Returns:

gp_dict: Nested dictionary containing the NicheNet ligand receptor target gene programs with keys being gene program names and values being dictionaries with keys ´sources´, ´targets´, ´sources_categories´, and ´target_categories´, where ´sources´ contains the NicheNet ligands, ´targets´ contains the NicheNet receptors and target genes, ´sources_categories´ contains the categories of the sources, and ´target_categories´ contains the categories of the targets.
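To make the nested structure concrete, the following sketch builds a small stand-in dictionary in the documented shape and summarizes the gene counts per gene program, similar in spirit to what ´plot_gp_gene_count_distributions´ visualizes. The gene program names, gene names, and helper functions are hypothetical, and the category keys are omitted for brevity; in practice ´gp_dict´ would be the return value of the function.

```python
from collections import Counter

# In practice (requires NicheCompass and, by default, web access):
# gp_dict = extract_gp_dict_from_nichenet_lrt_interactions(species="mouse")

# Hypothetical stand-in with placeholder names: each gene program maps to a
# dictionary whose "sources" hold the ligand and whose "targets" hold the
# receptors and target genes.
toy_gp_dict = {
    "LigandA_GP": {"sources": ["LigandA"],
                   "targets": ["ReceptorA", "Gene1", "Gene2"]},
    "LigandB_GP": {"sources": ["LigandB"],
                   "targets": ["ReceptorB", "Gene4"]},
    "LigandC_GP": {"sources": ["LigandC"],
                   "targets": ["ReceptorC", "Gene1", "Gene5"]},
}


def gene_counts_per_gp(gp_dict):
    """Return {gene program name: (number of sources, number of targets)}."""
    return {name: (len(gp["sources"]), len(gp["targets"]))
            for name, gp in gp_dict.items()}


def target_count_histogram(gp_dict):
    """Count how many gene programs have a given number of targets."""
    return Counter(len(gp["targets"]) for gp in gp_dict.values())


counts = gene_counts_per_gp(toy_gp_dict)
histogram = target_count_histogram(toy_gp_dict)
```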