xchrom.tr.Generator

class xchrom.tr.Generator(seq_path, adata, cell_input_key, peakid=None, m=None, batch_size=128)

Generate input data for XChrom model training and create TensorFlow datasets.

This class combines batch generation and TensorFlow dataset creation in one object; its data generation is designed to work with tf.data.Dataset.from_generator.
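The from_generator pattern can be illustrated with a minimal sketch. ToyBatchGenerator below is hypothetical and not part of xchrom: it only shows the general form of a zero-argument callable that yields NumPy batches, which is what tf.data.Dataset.from_generator consumes. What the real Generator puts in each batch is not documented on this page.

```python
import numpy as np


class ToyBatchGenerator:
    """Hypothetical stand-in (not part of xchrom) that yields (inputs, targets) batches."""

    def __init__(self, x, y, batch_size=128):
        self.x = np.asarray(x)
        self.y = np.asarray(y)
        self.batch_size = batch_size

    def __call__(self):
        # tf.data.Dataset.from_generator takes a callable that returns a fresh
        # iterator; each call to this object starts a new pass over the data.
        n = len(self.x)
        for start in range(0, n, self.batch_size):
            stop = min(start + self.batch_size, n)
            yield self.x[start:stop], self.y[start:stop]


gen = ToyBatchGenerator(np.zeros((5, 2), dtype=np.float32), np.arange(5), batch_size=2)
shapes = [xb.shape for xb, _ in gen()]
```

With TensorFlow installed, such an object could be wrapped as tf.data.Dataset.from_generator(gen, output_signature=(tf.TensorSpec(shape=(None, 2), dtype=tf.float32), tf.TensorSpec(shape=(None,), dtype=tf.int64))); the leading None in each shape accommodates a smaller final batch.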

Parameters:
  • seq_path (str) – The path to the sequence HDF5 file generated by the make_h5_sparse function

  • adata (anndata.AnnData) – The AnnData object containing:
      - adata.X: the raw count matrix
      - adata.obs['b_zscore']: the sequencing depth vector
      - adata.obsm[cell_input_key]: the initial cell embedding matrix

  • cell_input_key (str) – The key of the initial cell embedding in adata.obsm, for example 'zscore32_perpc' as used in the examples. The signature gives no default, so this argument is required

  • peakid (array-like, optional) – The peak (column) indices used to select data from adata.X. If None, all peaks are used: sorted(range(adata.shape[1]))

  • m (scipy.sparse matrix or numpy.ndarray, optional) – The peak-by-cell matrix, i.e. the transpose of adata.X. It can be loaded with sparse.load_npz('m.npz') or built with adata.X.toarray().T. If None, adata.X.toarray().T is used

  • batch_size (int, default 128) – The batch size for the dataset
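The defaults for peakid and m can be mirrored in a short sketch. resolve_inputs is a hypothetical helper, not an xchrom function; it only reproduces what the parameter descriptions state (all peaks in order; m as the dense transpose of adata.X) and shows caching m as an .npz file so it can later be reloaded with sparse.load_npz.

```python
import os
import tempfile

import numpy as np
from scipy import sparse


def resolve_inputs(X, peakid=None, m=None):
    """Hypothetical helper mirroring the documented defaults for peakid and m.

    X plays the role of adata.X: a cell-by-peak matrix (dense or scipy.sparse).
    """
    if peakid is None:
        # Documented default: every peak, in column order.
        peakid = sorted(range(X.shape[1]))
    if m is None:
        # Documented default: adata.X.toarray().T, a dense peak-by-cell matrix.
        dense = X.toarray() if sparse.issparse(X) else np.asarray(X)
        m = dense.T
    return list(peakid), m


# 2 cells x 3 peaks, stored sparse like a typical adata.X
X = sparse.csr_matrix(np.array([[1, 0, 2], [0, 3, 0]]))
peakid, m = resolve_inputs(X)

# Cache the peak-by-cell matrix once so it can be reloaded with sparse.load_npz.
path = os.path.join(tempfile.mkdtemp(), "m.npz")
sparse.save_npz(path, sparse.csr_matrix(m))
m_loaded = sparse.load_npz(path)
```

Precomputing and saving m avoids densifying adata.X every time a generator is built, which matters for large peak-by-cell matrices.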

Attributes:
  • n_cells (int) – The number of cells

  • n_peaks (int) – The number of peaks

Examples

>>> from xchrom.tr import Generator
>>> gen1 = Generator(seq_path='sequence.h5', adata=ad_train, m=m_train,
...                 cell_input_key='zscore32_perpc', batch_size=256)
>>> train_ds = gen1.create_dataset(shuffle=True)
>>> gen2 = Generator(seq_path='sequence.h5', adata=ad_val, m=m_val,
...                 cell_input_key='zscore32_perpc', batch_size=256)
>>> val_ds = gen2.create_dataset(shuffle=False)
>>> # Batching is set by each Generator; Keras expects no batch_size for datasets
>>> model.fit(train_ds, validation_data=val_ds,
...           epochs=1000, callbacks=callbacks_list)
>>> gen1.close()
>>> gen2.close()
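The example takes ad_train/m_train and ad_val/m_val as given. A minimal sketch of producing them consistently follows, assuming (this page does not confirm it) that the split is over cells. split_cells is hypothetical; the point it illustrates is that m is peak-by-cell, so its columns must be sliced with the same cell indices used for the AnnData rows (for example ad_train = adata[train_idx].copy()).

```python
import numpy as np


def split_cells(m, n_cells, val_fraction=0.1, seed=0):
    """Hypothetical helper: split cell indices and slice a peak-by-cell matrix to match."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_cells)
    n_val = int(round(n_cells * val_fraction))
    val_idx = np.sort(perm[:n_val])
    train_idx = np.sort(perm[n_val:])
    # Cells are the columns of m, so slice columns with the same indices
    # that would select the AnnData rows (adata[train_idx], adata[val_idx]).
    return train_idx, val_idx, m[:, train_idx], m[:, val_idx]


m = np.arange(20).reshape(4, 5)  # 4 peaks x 5 cells
train_idx, val_idx, m_train, m_val = split_cells(m, n_cells=5, val_fraction=0.2)
```

The same column indexing also works on a scipy.sparse CSR matrix, so a cached m does not need to be densified before splitting.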

__init__(seq_path, adata, cell_input_key, peakid=None, m=None, batch_size=128)

Methods

  • __init__(seq_path, adata, cell_input_key[, ...])

  • close() – Explicitly close the HDF5 file.

  • create_dataset([shuffle]) – Create a TensorFlow dataset from the generator.

  • get_dataset_info() – Get information about the dataset.
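Because close() releases the HDF5 file, one way to guarantee it runs is contextlib.closing, which calls close() when the with block exits. The sketch uses a hypothetical StandInGenerator rather than xchrom's class, and it assumes, without confirmation from this page, that batches are read from the open file lazily. Under that assumption, training must finish inside the block, which is what the stand-in demonstrates.

```python
from contextlib import closing


class StandInGenerator:
    """Hypothetical stand-in for an object that reads batches from an open file."""

    def __init__(self, n_batches=3):
        self.n_batches = n_batches
        self.closed = False

    def __call__(self):
        for i in range(self.n_batches):
            # Reading after close() fails, like reading from a closed HDF5 handle.
            if self.closed:
                raise RuntimeError("file handle already closed")
            yield i

    def close(self):
        self.closed = True


def consume_all(gen):
    # Iterate the generator fully; stands in for model.fit consuming a dataset.
    return list(gen())


with closing(StandInGenerator()) as gen:
    batches = consume_all(gen)  # consume while the handle is still open
```

With the real class, the same pattern would place both create_dataset and model.fit inside a with closing(Generator(...)) as gen: block.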