nichecompass.utils.get_gene_annotations
- nichecompass.utils.get_gene_annotations(adata, adata_atac=None, gtf_file_path='../data/gene_annotations/gencode.vM25.chr_patch_hapl_scaff.annotation.gtf.gz', adata_join_col_name=None, gtf_join_col_name='gene_name', by_func=None, drop_unannotated_genes=True)
Get genomic annotations, including chromosomal base-pair positions of genes, by joining with a GTF file from GENCODE. The GTF file is provided but can also be downloaded, for example, from https://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_mouse/release_M32/gencode.vM32.chr_patch_hapl_scaff.annotation.gff3.gz.
Parts of the implementation are adapted from Cao, Z.-J. & Gao, G. Multi-omics single-cell data integration and regulatory inference with graph-linked embedding. Nat. Biotechnol. 40, 1458–1466 (2022) -> https://github.com/gao-lab/GLUE/blob/master/scglue/data.py#L86; 14.04.23.
- Parameters:
adata (AnnData) – AnnData rna object for which to get gene annotations.
adata_join_col_name (Optional[str] (default: None)) – Column in ´adata.var´ that is used to merge with GTF file. If ´None´, ´adata.var_names´ is used.
gtf_file_path (Optional[PathLike] (default: '../data/gene_annotations/gencode.vM25.chr_patch_hapl_scaff.annotation.gtf.gz')) – Path to the GTF file used to get gene annotations.
gtf_join_col_name (Optional[str] (default: 'gene_name')) – Column in GTF file that is used to merge with ´adata.var´, e.g. ´gene_id´, or ´gene_name´.
by_func (Optional[Callable] (default: None)) – An element-wise function used to transform merging fields, e.g. for removing suffix in gene IDs.
drop_unannotated_genes (bool (default: True)) – If ´True´, drop genes for which no annotation was found.
- Return type:
- Returns:
- adata:
The annotated AnnData rna object.
- adata_atac:
The annotated AnnData atac object.
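The interplay of the join columns, by_func and drop_unannotated_genes can be illustrated with a minimal pandas sketch. It is not the library's implementation: the data frames, gene IDs, coordinates and the helper strip_version are hypothetical stand-ins for ´adata.var´ and a parsed GTF file, and it assumes the transform is applied to the keys on both sides before a left join.

```python
import pandas as pd


def strip_version(gene_id):
    """Hypothetical by_func: drop a trailing version suffix ('geneA.1' -> 'geneA')."""
    return gene_id.split(".")[0]


# Toy stand-in for adata.var; the index plays the role of adata.var_names.
var = pd.DataFrame({"n_cells": [10, 20, 30]}, index=["geneA", "geneB", "geneC"])

# Toy stand-in for gene records parsed from a GTF file (made-up values).
gtf = pd.DataFrame({
    "gene_id": ["geneA.1", "geneB.12"],
    "chrom": ["chr1", "chr2"],
    "start": [101, 501],
    "end": [200, 900],
})

# Transform the merging fields element-wise on both sides, then left-join
# so that every gene of the expression data is kept at first.
left = var.copy()
left["join_key"] = [strip_version(g) for g in var.index]
right = gtf.copy()
right["join_key"] = [strip_version(g) for g in gtf["gene_id"]]

merged = left.merge(right, how="left", on="join_key")
merged.index = var.index  # row order of the left frame is preserved

# Analogue of drop_unannotated_genes=True: remove genes without a GTF match.
annotated = merged.dropna(subset=["chrom"])
print(annotated[["chrom", "start", "end"]])
```

In this toy example, geneC has no matching GTF record and is therefore dropped in the last step.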
Note
The genomic locations are converted to 0-based coordinates, as specified in the BED format, rather than 1-based coordinates, as specified in the GTF format.
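As a rough illustration of this conversion, the sketch below maps a 1-based, end-inclusive GTF interval to a 0-based, end-exclusive BED interval. The function name gtf_to_bed_interval and the example coordinates are hypothetical; the sketch follows the general BED convention (start shifted by one, end unchanged) and makes no claim about how the library handles the end coordinate internally.

```python
def gtf_to_bed_interval(start, end):
    """Convert a GTF interval (1-based, closed) to BED (0-based, half-open).

    Only the start shifts by one; the end value stays the same because the
    BED end is exclusive while the GTF end is inclusive.
    """
    if start < 1 or end < start:
        raise ValueError("expected a 1-based interval with start <= end")
    return start - 1, end


# A feature covering bases 101..200 in GTF terms (made-up coordinates).
chrom_start, chrom_end = gtf_to_bed_interval(101, 200)

# The feature length is the same in both conventions:
# GTF: end - start + 1, BED: chrom_end - chrom_start.
length = chrom_end - chrom_start
```

With BED-style coordinates, the length of a feature is simply the end minus the start, which is one reason the half-open convention is convenient for downstream interval arithmetic.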