xchrom.tl.calc_ism_from_bed

xchrom.tl.calc_ism_from_bed(cell_embedding_ad: AnnData, peak_bed: str | Path, fasta_file: str | Path, XChrom_model: tensorflow.keras.Model, output_path: str | Path, cellembed_raw: str = 'X_pca_harmony', seq_len: int = 1344, save_individual: bool = True, **calc_ism_kwargs)[source]

Calculate in silico mutagenesis (ISM) matrices for the peaks in a BED file.

This function performs an end-to-end ISM calculation starting from genomic coordinates in BED format. It extracts the sequence for each peak from the genome FASTA file, one-hot encodes it, and computes an ISM matrix for every peak.
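As an illustration of the one-hot encoding and mutagenesis steps described above, here is a minimal NumPy sketch. It is not the XChrom implementation: `one_hot_encode`, `toy_ism`, and `gc_score` are hypothetical names, the A/C/G/T column order is an assumption, and the scoring function is a toy stand-in for a trained model. It also assumes that each ISM entry is the mutated score minus the reference score, which may differ from how `calc_ism` defines it.

```python
import numpy as np

BASES = "ACGT"  # assumed column order; the order XChrom uses may differ


def one_hot_encode(seq: str) -> np.ndarray:
    """Return a (len(seq), 4) one-hot array; non-ACGT bases (e.g. N) stay all-zero."""
    encoded = np.zeros((len(seq), 4), dtype=np.float32)
    for pos, base in enumerate(seq.upper()):
        idx = BASES.find(base)
        if idx >= 0:
            encoded[pos, idx] = 1.0
    return encoded


def toy_ism(onehot: np.ndarray, score_fn) -> np.ndarray:
    """Score every single-base substitution relative to the reference sequence."""
    ref_score = score_fn(onehot)
    ism = np.zeros(onehot.shape, dtype=np.float32)
    for pos in range(onehot.shape[0]):
        for base in range(4):
            if onehot[pos, base] == 1.0:
                continue  # reference base: effect is zero by construction
            mutated = onehot.copy()
            mutated[pos] = 0.0
            mutated[pos, base] = 1.0
            ism[pos, base] = score_fn(mutated) - ref_score
    return ism


def gc_score(x: np.ndarray) -> float:
    """Toy stand-in for a model prediction: the number of C and G bases."""
    return float(x[:, 1].sum() + x[:, 2].sum())


onehot = one_hot_encode("ACGTN")
ism = toy_ism(onehot, gc_score)
print(ism.shape)
```

A single scalar score yields one `(seq_len, 4)` matrix. A model that predicts one value per cell would yield one such matrix per cell, which matches the `(n_cells, seq_len, 4)` shape documented under Returns.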

Parameters:
  • cell_embedding_ad (anndata.AnnData) – AnnData object containing the initial cell embeddings

  • cellembed_raw (str) – Key of the raw cell embedding in cell_embedding_ad that is used to generate the model input, default 'X_pca_harmony'

  • peak_bed (Union[str, Path]) – Path to BED file containing peak coordinates

  • fasta_file (Union[str, Path]) – Path to genome FASTA file

  • XChrom_model (tf.keras.Model) – XChrom model with trained weights

  • output_path (Union[str, Path]) – Directory to save ISM results

  • seq_len (int) – Sequence length, default 1344

  • save_individual (bool) – Whether to save individual peak ISM files, default True

  • **calc_ism_kwargs – Additional keyword arguments passed to the calc_ism function

Returns:

List of ISM matrices for all peaks, each with shape (n_cells, seq_len, 4)

Return type:

list
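The returned list can be post-processed with plain NumPy. The sketch below is only an illustration: it uses random placeholder arrays in the documented `(n_cells, seq_len, 4)` shape instead of real results, and the summary it computes (largest absolute effect per position, averaged over cells) is one possible choice, not something the library prescribes.

```python
import numpy as np

rng = np.random.default_rng(0)
n_cells, seq_len = 3, 8

# Placeholder standing in for the returned list: two peaks of random values.
ism_results = [rng.normal(size=(n_cells, seq_len, 4)) for _ in range(2)]

# Stack the per-peak matrices into one array: (n_peaks, n_cells, seq_len, 4).
stacked = np.stack(ism_results)

# Largest absolute effect over the four bases: (n_peaks, n_cells, seq_len).
position_importance = np.abs(stacked).max(axis=-1)

# Average over cells to get one profile per peak: (n_peaks, seq_len).
mean_over_cells = position_importance.mean(axis=1)

# Index of the most sensitive position in each peak: (n_peaks,).
top_position = mean_over_cells.argmax(axis=1)
print(stacked.shape, mean_over_cells.shape, top_position.shape)
```

Stacking works only because every peak shares the same `seq_len`, which the fixed `seq_len` parameter suggests.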

Examples

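>>> import xchrom as xc
>>> # cell_embedding_ad and trained_model are assumed to be loaded already
>>> ism_results = xc.tl.calc_ism_from_bed(
...     cell_embedding_ad=cell_embedding_ad,
...     peak_bed='peaks.bed',
...     fasta_file='hg38.fa',
...     XChrom_model=trained_model,
...     output_path='./ISM_results/'
... )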
>>> print(f"Processed {len(ism_results)} peaks")
>>> print(f"Each ISM matrix shape: {ism_results[0].shape}")

Files Created

output_path/peakN_ism.npy : Individual ISM matrix for peak N (if save_individual=True)

output_path/all_peaks_ism.npy : Combined ISM matrices for all peaks

output_path/peak_coordinates.txt : Peak coordinates reference file
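The saved files can be read back later without rerunning the model. The sketch below is a hedged illustration: `load_ism_outputs` is a hypothetical helper, not part of xchrom. It assumes that `all_peaks_ism.npy` holds a regular numeric array readable by `np.load` and treats `peak_coordinates.txt` as plain text lines, since its exact layout is not documented here. The demo writes placeholder files to a temporary directory so that the snippet runs on its own.

```python
import tempfile
from pathlib import Path

import numpy as np


def load_ism_outputs(output_path):
    """Load the combined ISM array and the raw lines of the coordinates file."""
    output_path = Path(output_path)
    combined = np.load(output_path / "all_peaks_ism.npy")
    coords = (output_path / "peak_coordinates.txt").read_text().splitlines()
    return combined, coords


# Demo against placeholder files that mimic the documented file names only.
with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = Path(tmp)
    placeholder = np.zeros((2, 3, 8, 4), dtype=np.float32)
    np.save(tmp_dir / "all_peaks_ism.npy", placeholder)
    (tmp_dir / "peak_coordinates.txt").write_text("chr1\t100\t108\nchr2\t200\t208\n")
    combined, coords = load_ism_outputs(tmp_dir)

print(combined.shape, len(coords))
```

If the combined file was written as an object array, `np.load` would need `allow_pickle=True`. That case is not handled here.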