nichecompass.data.initialize_dataloaders

nichecompass.data.initialize_dataloaders(node_masked_data, edge_train_data=None, edge_val_data=None, edge_batch_size=64, node_batch_size=64, n_direct_neighbors=-1, n_hops=1, shuffle=True, edges_directed=False, neg_edge_sampling_ratio=1.0)

Initialize edge-level and node-level training and validation dataloaders.

Parameters:
  • node_masked_data (Data) – PyG Data object with node-level split masks.

  • edge_train_data (Optional[Data] (default: None)) – PyG Data object containing the edge-level training set.

  • edge_val_data (Optional[Data] (default: None)) – PyG Data object containing the edge-level validation set.

  • edge_batch_size (Optional[int] (default: 64)) – Batch size for the edge-level dataloaders.

  • node_batch_size (int (default: 64)) – Batch size for the node-level dataloaders.

  • n_direct_neighbors (int (default: -1)) – Number of direct neighbors to sample for each node in the current batch and include in that batch. Defaults to -1, which includes all direct neighbors.

  • n_hops (int (default: 1)) – Number of neighbor hops (levels) used when sampling neighbors to include in the current batch. For example, a value of 2 includes not only the sampled direct neighbors of the current batch nodes but also the sampled neighbors of those direct neighbors.

  • shuffle (bool (default: True)) – If True, shuffle the dataloaders.

  • edges_directed (bool (default: False)) – If False, edges are treated as undirected: each edge is represented by two symmetric edge index pairs, (i, j) and (j, i), and both pairs are included in the same edge-level batch.

  • neg_edge_sampling_ratio (float (default: 1.0)) – Ratio of sampled negative edges to positive edges. Negative sampling is currently implemented in an approximate way, i.e. sampled negative edges may contain false negatives (node pairs that are in fact connected).
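
To make the sampling parameters above more concrete, here is a minimal pure-Python sketch of what n_direct_neighbors, n_hops and neg_edge_sampling_ratio control. It is an illustration only, not the NicheCompass or PyG implementation: the helper names sample_neighborhood and sample_negative_edges, the adjacency-dict graph format and the toy graph are all assumptions made for this example.

```python
import random


def sample_neighborhood(adj, seed_nodes, n_direct_neighbors=-1, n_hops=1, rng=None):
    """Return the seed nodes plus neighbors sampled over n_hops levels.

    adj maps each node to a list of its neighbors (hypothetical format).
    n_direct_neighbors=-1 keeps every neighbor instead of sampling.
    """
    rng = rng or random.Random(0)
    batch = set(seed_nodes)
    frontier = set(seed_nodes)
    for _ in range(n_hops):
        next_frontier = set()
        for node in sorted(frontier):
            neighbors = adj.get(node, [])
            # Subsample only if a limit is set and the node exceeds it
            if n_direct_neighbors != -1 and len(neighbors) > n_direct_neighbors:
                neighbors = rng.sample(neighbors, n_direct_neighbors)
            next_frontier.update(neighbors)
        # Only nodes not seen before are expanded in the next hop
        frontier = next_frontier - batch
        batch |= next_frontier
    return batch


def sample_negative_edges(pos_edges, n_nodes, neg_edge_sampling_ratio=1.0, rng=None):
    """Draw random node pairs as negative edges.

    The pairs are NOT checked against pos_edges, so a drawn pair may be a
    true edge (a false negative). This mirrors the idea of approximate
    negative sampling, not any specific library's algorithm.
    """
    rng = rng or random.Random(0)
    n_neg = int(len(pos_edges) * neg_edge_sampling_ratio)
    return [(rng.randrange(n_nodes), rng.randrange(n_nodes)) for _ in range(n_neg)]


# Toy path graph 0 - 1 - 2 - 3 - 4
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(sorted(sample_neighborhood(path, [0], n_hops=1)))  # [0, 1]
print(sorted(sample_neighborhood(path, [0], n_hops=2)))  # [0, 1, 2]

pos_edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
neg_edges = sample_negative_edges(pos_edges, n_nodes=5, neg_edge_sampling_ratio=0.5)
print(len(neg_edges))  # 2
```

Raising n_hops widens the neighborhood pulled into each batch, while a positive n_direct_neighbors caps how many neighbors each node contributes per hop. Because the negative pairs are drawn without consulting the positive edges, some of them can coincide with real edges, which is the kind of false negative the parameter description refers to.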

Return type:

dict

Returns:

loader_dict: Dictionary containing training and validation PyG LinkNeighborLoader (for edge reconstruction) and NeighborLoader (for gene expression reconstruction) objects.
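
A call might look like the following sketch. The keyword names and the import path come from the signature above; the wrapper build_loaders, the LOADER_KWARGS values and the assumption that the three Data objects were prepared beforehand are illustrative. The keys of the returned dictionary are not listed on this page, so the sketch iterates over them generically instead of naming them.

```python
# Illustrative settings; every key is a parameter from the documented signature
LOADER_KWARGS = dict(
    edge_batch_size=128,
    node_batch_size=64,
    n_direct_neighbors=-1,  # include all direct neighbors
    n_hops=2,               # also include neighbors of neighbors
    shuffle=True,
    edges_directed=False,
    neg_edge_sampling_ratio=1.0,
)


def build_loaders(node_masked_data, edge_train_data, edge_val_data):
    """Hypothetical wrapper around initialize_dataloaders.

    The three arguments are assumed to be PyG Data objects that were
    split into node-level and edge-level sets beforehand.
    """
    # Imported lazily so this sketch can be read without nichecompass installed
    from nichecompass.data import initialize_dataloaders

    loader_dict = initialize_dataloaders(
        node_masked_data=node_masked_data,
        edge_train_data=edge_train_data,
        edge_val_data=edge_val_data,
        **LOADER_KWARGS,
    )
    # Key names are not documented here, so list whatever is returned
    for name, loader in loader_dict.items():
        print(name, type(loader).__name__)
    return loader_dict
```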