xchrom.tl.calc_ism_from_bed
- xchrom.tl.calc_ism_from_bed(cell_embedding_ad: AnnData, peak_bed: str | Path, fasta_file: str | Path, XChrom_model: tensorflow.keras.Model, output_path: str | Path, cellembed_raw: str = 'X_pca_harmony', seq_len: int = 1344, save_individual: bool = True, **calc_ism_kwargs)
Calculate in silico mutagenesis (ISM) matrices for the peaks listed in a BED file.
This function performs an end-to-end ISM calculation starting from genomic coordinates in BED format. It extracts the sequence for each peak from the genome FASTA file, converts it to one-hot encoding, and computes an ISM matrix for every peak.
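The one-hot encoding and mutation steps described above can be illustrated with a small NumPy sketch. This is not XChrom's implementation: `one_hot_encode`, `ism_scores`, and `toy_score` are hypothetical names introduced here, and the toy scoring function stands in for the trained model. It also returns one score per sequence, whereas the real function produces one score per cell.

```python
import numpy as np

BASES = "ACGT"  # column order assumed for this sketch only


def one_hot_encode(seq: str) -> np.ndarray:
    """Encode a DNA string as a (len(seq), 4) array; non-ACGT bases become all-zero rows."""
    encoded = np.zeros((len(seq), 4), dtype=np.float32)
    for i, base in enumerate(seq.upper()):
        j = BASES.find(base)
        if j >= 0:
            encoded[i, j] = 1.0
    return encoded


def ism_scores(onehot: np.ndarray, score_fn) -> np.ndarray:
    """Return a (seq_len, 4) array holding the score change of every single-base substitution."""
    reference_score = score_fn(onehot)
    scores = np.zeros(onehot.shape, dtype=np.float32)
    for pos in range(onehot.shape[0]):
        for base in range(4):
            if onehot[pos, base] == 1.0:
                continue  # reference base: the change is zero by definition
            mutated = onehot.copy()
            mutated[pos] = 0.0
            mutated[pos, base] = 1.0
            scores[pos, base] = score_fn(mutated) - reference_score
    return scores


def toy_score(onehot: np.ndarray) -> float:
    """Hypothetical stand-in for a model: counts C and G bases."""
    return float(onehot[:, 1].sum() + onehot[:, 2].sum())


seq_onehot = one_hot_encode("ACGTN")
toy_ism = ism_scores(seq_onehot, toy_score)
print(toy_ism.shape)  # (5, 4)
```

With a real model, `score_fn` would be replaced by a prediction call that also takes the cell embeddings, which is where the extra `n_cells` axis in the output comes from.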
- Parameters:
cell_embedding_ad (anndata.AnnData) – AnnData object containing the initial cell embeddings
cellembed_raw (str) – Key of the raw cell embedding in the cell embedding AnnData, used to generate the model input; default 'X_pca_harmony'
peak_bed (Union[str, Path]) – Path to BED file containing peak coordinates
fasta_file (Union[str, Path]) – Path to genome FASTA file
XChrom_model (tf.keras.Model) – XChrom model with trained weights
output_path (Union[str, Path]) – Directory to save ISM results
seq_len (int) – Sequence length, default 1344
save_individual (bool) – Whether to save individual peak ISM files, default True
**calc_ism_kwargs – Additional keyword arguments passed to the calc_ism function
- Returns:
List of ISM matrices for all peaks, each with shape (n_cells, seq_len, 4)
- Return type:
list
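As a sketch of how the returned list might be post-processed, the snippet below stacks per-peak matrices and reduces them to a per-position importance profile. The data here is synthetic (random arrays with the documented shape), and averaging over cells followed by a max over absolute values is just one possible summary, not something the library prescribes.

```python
import numpy as np

rng = np.random.default_rng(0)
n_peaks, n_cells, seq_len = 3, 5, 1344

# Synthetic stand-in for the list returned by calc_ism_from_bed:
# one (n_cells, seq_len, 4) array per peak.
ism_results = [rng.normal(size=(n_cells, seq_len, 4)) for _ in range(n_peaks)]

# Stack into a single array of shape (n_peaks, n_cells, seq_len, 4).
stacked = np.stack(ism_results)

# Average the effect of each substitution over cells: (n_peaks, seq_len, 4).
mean_over_cells = stacked.mean(axis=1)

# Summarise each position by its largest absolute effect: (n_peaks, seq_len).
position_importance = np.abs(mean_over_cells).max(axis=-1)

# Index of the most sensitive position in each peak: (n_peaks,).
top_positions = position_importance.argmax(axis=1)
print(stacked.shape, position_importance.shape, top_positions.shape)
```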
Examples
>>> ism_results = calc_ism_from_bed(
...     cell_embedding_ad=cell_embedding_ad,
...     peak_bed='peaks.bed',
...     fasta_file='hg38.fa',
...     XChrom_model=trained_model,
...     output_path='./ISM_results/'
... )
>>> print(f"Processed {len(ism_results)} peaks")
>>> print(f"Each ISM matrix shape: {ism_results[0].shape}")
Files Created
output_path/peakN_ism.npy : Individual ISM matrix for peak N (if save_individual=True)
output_path/all_peaks_ism.npy : Combined ISM matrices for all peaks
output_path/peak_coordinates.txt : Peak coordinates reference file
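The saved files could be read back along the following lines. This is a sketch under assumptions: `load_ism_outputs` is a hypothetical helper, the combined file is assumed to be a plain numeric array that `np.load` can read without `allow_pickle`, and the coordinates file is read as raw text lines because its exact layout is not documented here. The demo writes synthetic files to a temporary directory so that it runs on its own.

```python
import tempfile
from pathlib import Path

import numpy as np


def load_ism_outputs(output_path):
    """Load the combined array, the coordinate lines, and any per-peak arrays."""
    output_path = Path(output_path)
    combined = np.load(output_path / "all_peaks_ism.npy")
    coords = (output_path / "peak_coordinates.txt").read_text().splitlines()
    # Per-peak files exist only if save_individual=True was used.
    individual = {
        p.name: np.load(p) for p in sorted(output_path.glob("peak*_ism.npy"))
    }
    return combined, coords, individual


# Demo with synthetic files; the peak0/peak1 names and the tab-separated
# coordinate lines are illustrative assumptions, not the documented format.
with tempfile.TemporaryDirectory() as tmp:
    out_dir = Path(tmp)
    np.save(out_dir / "all_peaks_ism.npy", np.zeros((2, 3, 8, 4)))
    np.save(out_dir / "peak0_ism.npy", np.zeros((3, 8, 4)))
    np.save(out_dir / "peak1_ism.npy", np.zeros((3, 8, 4)))
    (out_dir / "peak_coordinates.txt").write_text("chr1\t100\t108\nchr2\t200\t208\n")
    combined, coords, individual = load_ism_outputs(out_dir)

print(combined.shape, len(coords), sorted(individual))
```

The glob pattern `peak*_ism.npy` does not match `all_peaks_ism.npy`, so the combined file is not loaded a second time as a per-peak file.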