nichecompass.utils.get_gene_annotations

nichecompass.utils.get_gene_annotations(adata, adata_atac=None, gtf_file_path='../data/gene_annotations/gencode.vM25.chr_patch_hapl_scaff.annotation.gtf.gz', adata_join_col_name=None, gtf_join_col_name='gene_name', by_func=None, drop_unannotated_genes=True)

Get genomic annotations, including the chromosomal base-pair positions of genes, by joining with a GTF file from GENCODE. The GTF file is provided but can also be downloaded, for example, from https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M32/gencode.vM32.chr_patch_hapl_scaff.annotation.gff3.gz.

Parts of the implementation are adapted from Cao, Z.-J. & Gao, G. Multi-omics single-cell data integration and regulatory inference with graph-linked embedding. Nat. Biotechnol. 40, 1458–1466 (2022); see https://github.com/gao-lab/GLUE/blob/master/scglue/data.py#L86 (14.04.23).
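
As an illustration of what a GENCODE GTF file contains, the following minimal sketch parses a tiny in-memory GTF with pandas and extracts one row per gene. It is not the NicheCompass implementation; the toy records and the helper names (`GTF_COLUMNS`, `toy_gtf`, `genes`) are assumptions made for this example. The nine column names follow the GTF format, and the same `pd.read_csv` call should also accept a path to a gzip-compressed file.

```python
import io

import pandas as pd

# The nine tab-separated fields defined by the GTF format.
GTF_COLUMNS = [
    "seqname", "source", "feature", "start", "end",
    "score", "strand", "frame", "attribute",
]

# Toy records (made up for illustration, not real annotations).
toy_gtf = (
    '##description: toy annotation\n'
    'chr1\tHAVANA\tgene\t11\t20\t.\t+\t.\tgene_id "G1.1"; gene_name "GeneA";\n'
    'chr1\tHAVANA\texon\t11\t15\t.\t+\t.\tgene_id "G1.1"; gene_name "GeneA";\n'
    'chr2\tHAVANA\tgene\t101\t250\t.\t-\t.\tgene_id "G2.3"; gene_name "GeneB";\n'
)

# Header lines starting with '#' are skipped via comment="#".
gtf = pd.read_csv(
    io.StringIO(toy_gtf), sep="\t", comment="#", header=None, names=GTF_COLUMNS
)

# Keep gene-level records only and pull the join keys out of the attribute field.
genes = gtf.loc[gtf["feature"] == "gene"].copy()
for key in ("gene_id", "gene_name"):
    genes[key] = genes["attribute"].str.extract(rf'{key} "([^"]+)"', expand=False)

genes = genes[
    ["seqname", "start", "end", "strand", "gene_id", "gene_name"]
].reset_index(drop=True)
print(genes)
```

The extracted `gene_id` and `gene_name` columns correspond to the kind of values that `gtf_join_col_name` refers to.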

Parameters:
  • adata (AnnData) – AnnData RNA object for which to get gene annotations.

  • adata_atac (Optional[AnnData] (default: None)) – Optional AnnData ATAC object, returned annotated alongside ´adata´.

  • adata_join_col_name (Optional[str] (default: None)) – Column in ´adata.var´ that is used to merge with GTF file. If ´None´, ´adata.var_names´ is used.

  • gtf_file_path (Optional[PathLike] (default: '../data/gene_annotations/gencode.vM25.chr_patch_hapl_scaff.annotation.gtf.gz')) – Path to the GTF file used to get gene annotations.

  • gtf_join_col_name (Optional[str] (default: 'gene_name')) – Column in the GTF file that is used to merge with ´adata.var´, e.g. ´gene_id´ or ´gene_name´.

  • by_func (Optional[Callable] (default: None)) – An element-wise function used to transform the merging fields before the join, e.g. to remove a suffix from gene IDs.

  • drop_unannotated_genes (bool (default: True)) – If ´True´, drop genes for which no annotation was found.
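
The interplay of the join column, `by_func`, and `drop_unannotated_genes` can be sketched with plain pandas. This is a hedged illustration under stated assumptions, not the library's actual code: `strip_suffix`, `var`, `gtf_genes`, and the toy gene IDs are hypothetical, and the real function operates on `adata.var` and a parsed GTF file.

```python
import pandas as pd


def strip_suffix(gene_id: str) -> str:
    """Hypothetical by_func: drop everything after the first '.' in a gene ID."""
    return gene_id.split(".")[0]


# Stand-in for adata.var (genes as index) and for gene rows parsed from a GTF.
var = pd.DataFrame(index=pd.Index(["G1.1", "G2.7", "G9.2"], name="gene"))
gtf_genes = pd.DataFrame(
    {
        "gene_id": ["G1.4", "G2.3"],
        "chrom": ["chr1", "chr2"],
        "start": [11, 101],
        "end": [20, 250],
    }
)

# Apply the element-wise function to the merging fields on both sides.
var_key = var.index.to_series().map(strip_suffix).to_numpy()
gtf_key = gtf_genes["gene_id"].map(strip_suffix)

# Deduplicate the GTF side so the left join keeps one row per gene.
gtf_unique = gtf_genes.assign(join_key=gtf_key).drop_duplicates("join_key")
annotated = var.assign(join_key=var_key).merge(gtf_unique, how="left", on="join_key")
annotated.index = var.index  # merge resets the index; restore the gene names

# Analogue of drop_unannotated_genes=True: remove genes without a match.
kept = annotated.dropna(subset=["chrom"])
print(kept)
```

Without the suffix-stripping step, none of the toy IDs would match, because the suffixes differ between the two tables.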

Return type:

Tuple[AnnData, AnnData]

Returns:

adata:

The annotated AnnData RNA object.

adata_atac:

The annotated AnnData ATAC object.

Note

The genomic locations are converted to 0-based coordinates, as specified by the BED format, rather than the 1-based coordinates specified by the GTF format.
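
The coordinate conversion mentioned in the note can be illustrated as follows. GTF intervals are 1-based and end-inclusive, whereas BED intervals are 0-based and end-exclusive, so only the start shifts by one. This is a minimal sketch with made-up coordinates; the output column names `chromStart` and `chromEnd` follow the BED convention and are an assumption about naming, not a statement about the columns this function writes.

```python
import pandas as pd

# Toy GTF-style intervals: 1-based, end-inclusive (made-up values).
gtf = pd.DataFrame(
    {"chrom": ["chr1", "chr2"], "start": [11, 101], "end": [20, 250]}
)

# BED-style intervals: 0-based start, end-exclusive.
# Only the start is decremented; the end value stays the same.
bed = pd.DataFrame(
    {
        "chrom": gtf["chrom"],
        "chromStart": gtf["start"] - 1,
        "chromEnd": gtf["end"],
    }
)

# The interval length is unchanged by the conversion.
gtf_length = gtf["end"] - gtf["start"] + 1
bed_length = bed["chromEnd"] - bed["chromStart"]
assert (gtf_length == bed_length).all()
print(bed)
```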