nichecompass.utils.extract_gp_dict_from_omnipath_lr_interactions

nichecompass.utils.extract_gp_dict_from_omnipath_lr_interactions(species, min_curation_effort=2, load_from_disk=False, save_to_disk=False, lr_network_file_path='../data/gene_programs/omnipath_lr_network.csv', gene_orthologs_mapping_file_path='../data/gene_annotations/human_mouse_gene_orthologs.csv', plot_gp_gene_count_distributions=True, gp_gene_count_distributions_save_path=None)

Retrieve 724 human ligand-receptor interactions from OmniPath and extract them into a gene program dictionary. OmniPath is a database of molecular biology prior knowledge that combines intercellular communication data from many different resources (all resources for intercellular communication included in OmniPath can be queried via ´op.requests.Intercell.resources()´). If ´species´ is ´mouse´, orthologs from human interactions are returned.

Parts of the implementation are inspired by https://workflows.omnipathdb.org/intercell-networks-py.html (01.10.2022).
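The ortholog step mentioned above can be pictured with a small standard-library sketch. It is not the NicheCompass implementation: the column names ´human_gene´ and ´mouse_gene´, the toy rows, and the choice to skip genes without an ortholog are all assumptions made for illustration.

```python
import csv
import io

# Hypothetical mapping file contents; the real file's column names may differ.
mapping_csv = io.StringIO(
    "human_gene,mouse_gene\n"
    "CCL5,Ccl5\n"
    "CCR5,Ccr5\n"
    "TGFB1,Tgfb1\n"
)

# Build a lookup table from human gene symbol to mouse ortholog.
human_to_mouse = {
    row["human_gene"]: row["mouse_gene"] for row in csv.DictReader(mapping_csv)
}


def map_to_mouse(genes, mapping):
    """Translate human gene symbols, skipping genes without a known ortholog."""
    return [mapping[gene] for gene in genes if gene in mapping]


human_ligands = ["CCL5", "TGFB1", "IL8"]  # IL8 has no entry in the toy mapping
mouse_ligands = map_to_mouse(human_ligands, human_to_mouse)
print(mouse_ligands)  # ['Ccl5', 'Tgfb1']
```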

Parameters:
  • species (Literal['mouse', 'human']) – Species for which the gene programs will be extracted. This argument is required and has no default. If ´mouse´, human genes are mapped to mouse orthologs using a mapping file. NicheCompass contains a default mapping file stored under “<root>/data/gene_annotations/human_mouse_gene_orthologs.csv”, which was created with Ensembl BioMart (http://www.ensembl.org/info/data/biomart/index.html).

  • min_curation_effort (int (default: 2)) – Minimum number of times an interaction has to be described in a paper and mentioned in a database to be included in the retrieval.

  • load_from_disk (bool (default: False)) – If ´True´, the OmniPath ligand-receptor interactions will be loaded from ´lr_network_file_path´ instead of being retrieved via the OmniPath library.

  • save_to_disk (bool (default: False)) – If ´True´, the retrieved OmniPath ligand-receptor interactions will additionally be stored at ´lr_network_file_path´. Only applies if ´load_from_disk´ is ´False´.

  • lr_network_file_path (Optional[str] (default: '../data/gene_programs/omnipath_lr_network.csv')) – Path of the file where the OmniPath ligand-receptor interactions will be stored (if ´save_to_disk´ is ´True´) or loaded from (if ´load_from_disk´ is ´True´).

  • gene_orthologs_mapping_file_path (Optional[str] (default: '../data/gene_annotations/human_mouse_gene_orthologs.csv')) – Path of the file where the gene orthologs mapping is stored. Only used if ´species´ is ´mouse´.

  • plot_gp_gene_count_distributions (bool (default: True)) – If ´True´, display the distribution of gene programs per number of source and target genes.

  • gp_gene_count_distributions_save_path (Optional[str] (default: None)) – Path of the file where the gene program gene count distribution plot will be saved if ´plot_gp_gene_count_distributions´ is ´True´.
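A call using the parameters above might look like the following sketch. The keyword names and file paths come from the documented signature; the particular values and the wrapper function ´extract_mouse_gp_dict´ are illustrative assumptions. The import is done lazily inside the wrapper so that the block can be read and loaded without NicheCompass installed; calling the wrapper requires the package and, with ´load_from_disk=False´, access to OmniPath.

```python
# Keyword arguments taken from the documented signature; the values are illustrative.
mouse_gp_kwargs = {
    "species": "mouse",
    "min_curation_effort": 2,
    "load_from_disk": False,  # retrieve the interactions via the OmniPath library
    "save_to_disk": True,     # additionally store them at lr_network_file_path
    "lr_network_file_path": "../data/gene_programs/omnipath_lr_network.csv",
    "gene_orthologs_mapping_file_path": (
        "../data/gene_annotations/human_mouse_gene_orthologs.csv"
    ),
    "plot_gp_gene_count_distributions": False,
    "gp_gene_count_distributions_save_path": None,
}


def extract_mouse_gp_dict(kwargs=mouse_gp_kwargs):
    """Hypothetical wrapper: call the documented function with the kwargs above."""
    # Imported lazily so that defining this sketch does not require NicheCompass.
    from nichecompass.utils import extract_gp_dict_from_omnipath_lr_interactions

    return extract_gp_dict_from_omnipath_lr_interactions(**kwargs)
```

On later runs, setting ´load_from_disk´ to ´True´ with the same ´lr_network_file_path´ would read the stored interactions back instead of retrieving them again.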

Return type:

dict

Returns:

gp_dict: Nested dictionary containing the OmniPath ligand-receptor interaction gene programs. Keys are gene program names; values are dictionaries with the keys ´sources´ (the OmniPath ligands), ´targets´ (the OmniPath receptors), ´sources_categories´ (the categories of the sources), and ´targets_categories´ (the categories of the targets).
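
To make the nested structure concrete, here is a toy dictionary with the four documented keys and a small helper that counts source and target genes per gene program. The gene program names, genes, and category labels are made up for illustration and are not actual output of the function; ´count_genes´ is a hypothetical helper, not part of NicheCompass.

```python
# Toy stand-in for the returned gp_dict; all entries are hypothetical.
toy_gp_dict = {
    "CCL5_ligand_receptor_GP": {
        "sources": ["CCL5"],
        "targets": ["CCR1", "CCR5"],
        "sources_categories": ["ligand"],
        "targets_categories": ["receptor", "receptor"],
    },
    "TGFB1_ligand_receptor_GP": {
        "sources": ["TGFB1"],
        "targets": ["TGFBR1", "TGFBR2", "TGFBR3"],
        "sources_categories": ["ligand"],
        "targets_categories": ["receptor", "receptor", "receptor"],
    },
}


def count_genes(gp_dict):
    """Return {gene program name: (number of sources, number of targets)}."""
    return {
        name: (len(gp["sources"]), len(gp["targets"]))
        for name, gp in gp_dict.items()
    }


gene_counts = count_genes(toy_gp_dict)
for name, (n_sources, n_targets) in gene_counts.items():
    print(f"{name}: {n_sources} source gene(s), {n_targets} target gene(s)")
```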