xchrom.tr.Generator
- class xchrom.tr.Generator(seq_path, adata, cell_input_key, peakid=None, m=None, batch_size=128)
Generate input data for XChrom model training and create TensorFlow datasets.
This class combines data generation and dataset creation functionality, compatible with tf.data.Dataset.from_generator.
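As a rough illustration of what compatibility with tf.data.Dataset.from_generator involves, the sketch below builds a callable that returns a fresh batch iterator each time it is called. BatchGenerator, its arguments, and the (features, labels) batch layout are hypothetical stand-ins chosen for the example; they are not XChrom's actual implementation or output structure.

```python
import numpy as np


class BatchGenerator:
    """Hypothetical stand-in (not xchrom.tr.Generator): yields (x, y) batches."""

    def __init__(self, features, labels, batch_size=128, shuffle=False, seed=0):
        self.features = features
        self.labels = labels
        self.batch_size = batch_size
        self.shuffle = shuffle
        self.rng = np.random.default_rng(seed)

    def __call__(self):
        # from_generator expects a callable that returns a new iterator,
        # so each call (for example, each epoch) starts from the first batch.
        order = np.arange(len(self.features))
        if self.shuffle:
            self.rng.shuffle(order)
        for start in range(0, len(order), self.batch_size):
            idx = order[start:start + self.batch_size]
            yield self.features[idx], self.labels[idx]


# Toy data: 10 samples with 2 features each.
features = np.arange(20, dtype=np.float32).reshape(10, 2)
labels = np.arange(10, dtype=np.float32)

gen = BatchGenerator(features, labels, batch_size=4)
batch_sizes = [len(x) for x, _ in gen()]

# With TensorFlow installed, the same callable could be wrapped like this:
# ds = tf.data.Dataset.from_generator(
#     gen,
#     output_signature=(
#         tf.TensorSpec(shape=(None, 2), dtype=tf.float32),
#         tf.TensorSpec(shape=(None,), dtype=tf.float32),
#     ),
# )
```

The last batch is smaller than batch_size when the sample count is not a multiple of it, which is why the leading dimension in the commented output_signature is left as None.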
- Parameters:
seq_path (str) – The path to the sequence HDF5 file, generated by make_h5_sparse function
adata (anndata.AnnData) – The AnnData object containing:
- adata.X: The raw count matrix
- adata.obs['b_zscore']: The sequencing depth vector
- adata.obsm[cell_input_key]: The initial cell embedding matrix
cell_input_key (str, default 'zscore32_perpc') – The key name of the initial cell embedding in adata.obsm
peakid (array-like, optional) – The array of peak indices to extract adata.X data. If None, uses sorted(range(adata.shape[1]))
m (scipy.sparse matrix, optional) – The peak-cell matrix. Can be loaded with sparse.load_npz('m.npz') or computed as adata.X.toarray().T. If None, uses adata.X.toarray().T
batch_size (int, default 128) – The batch size for the dataset
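The documented defaults for peakid and m can be reproduced on a toy matrix. The sketch below is only an illustration: X is a small made-up stand-in for adata.X (cells by peaks), and an in-memory buffer takes the place of an 'm.npz' file on disk.

```python
import io

import numpy as np
from scipy import sparse

# Toy count matrix standing in for adata.X: 3 cells x 4 peaks.
X = sparse.csr_matrix(np.array([[1, 0, 2, 0],
                                [0, 0, 1, 1],
                                [3, 0, 0, 0]]))

# Documented default for peakid: every peak index, in ascending order.
peakid = sorted(range(X.shape[1]))

# Documented default for m: the dense transpose, i.e. peaks x cells.
m_dense = X.toarray().T

# Alternative from the parameter description: a sparse peak-cell matrix
# saved with sparse.save_npz and read back with sparse.load_npz.
buffer = io.BytesIO()  # in-memory substitute for a file such as 'm.npz'
sparse.save_npz(buffer, X.T.tocsr())
buffer.seek(0)
m_sparse = sparse.load_npz(buffer)
```

Both routes give a matrix with one row per peak and one column per cell; the .npz route keeps it sparse, while the default produces a dense array.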
- n_cells
The number of cells
- Type:
int
- n_peaks
The number of peaks
- Type:
int
Examples
>>> from xchrom.tr import Generator
>>> # Training data: batches are produced by the generator itself
>>> gen1 = Generator(seq_path='sequence.h5', adata=ad_train, m=m_train,
...                  cell_input_key='zscore32_perpc', batch_size=256)
>>> train_ds = gen1.create_dataset(shuffle=True)
>>> # Validation data: keep the original order
>>> gen2 = Generator(seq_path='sequence.h5', adata=ad_val, m=m_val,
...                  cell_input_key='zscore32_perpc', batch_size=256)
>>> val_ds = gen2.create_dataset(shuffle=False)
>>> # The datasets are already batched, so batch_size is not passed to fit()
>>> model.fit(train_ds, validation_data=val_ds,
...           epochs=1000, callbacks=callbacks_list)
Methods
__init__(seq_path, adata, cell_input_key[, ...])
close() – Explicitly close the HDF5 file.
create_dataset([shuffle]) – Create a TensorFlow dataset from the generator.
get_dataset_info() – Get information about the dataset.
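Because the class exposes close(), one possible way to make sure the HDF5 file is released even when training fails is the standard-library contextlib.closing helper, which calls close() on exit. The sketch below uses a hypothetical FakeGenerator stand-in so it runs without a sequence file; whether the real Generator also supports the with statement directly is not stated here.

```python
from contextlib import closing


class FakeGenerator:
    """Hypothetical stand-in exposing only close(), like the documented method."""

    def __init__(self):
        self.closed = False

    def close(self):
        # The real method is documented as closing the HDF5 file.
        self.closed = True


gen = FakeGenerator()
try:
    with closing(gen):
        # Dataset creation and model.fit(...) would go here.
        raise RuntimeError("simulated training failure")
except RuntimeError:
    pass  # close() has already run by the time the error is handled
```

The same pattern should carry over to any object with a close() method, so closing(Generator(...)) is a plausible drop-in under that assumption.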