xchrom.tl.generate_tf_activity_data
- xchrom.tl.generate_tf_activity_data(bed_file: str | Path, input_fasta: str | Path, motif_file: str | Path, output_dir: str | Path, n_samples: int = 1000, seq_len: int = 1344, n_motif_instances: int = 1000, seed: int = 10)
Prepare motif data and background sequences for TF activity calculation
- Parameters:
bed_file (Union[str, Path]) – BED file path, containing peak regions
input_fasta (Union[str, Path]) – Reference genome FASTA file path
motif_file (Union[str, Path]) – MEME format motif file path
output_dir (Union[str, Path]) – Output directory path for the generated data
n_samples (int, default 1000) – Number of sampled peaks
seq_len (int, default 1344) – Sequence length
n_motif_instances (int, default 1000) – Number of instances to generate for each motif
seed (int, default 10) – Random seed
- Returns:
(background_fasta_path, motif_dir_path) - background sequence file path and motif directory path
- Return type:
tuple
Examples
>>> import xchrom
>>> bg_fasta, motif_dir = xchrom.tl.generate_tf_activity_data(
...     bed_file="peaks.bed",
...     input_fasta="hg38.fa",
...     motif_file="motifs.meme",
...     output_dir="./motif_fasta",
...     n_samples=1000,
...     seed=10,
... )
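The interaction between n_samples, seq_len, and seed can be illustrated with a small standalone sketch. It is not XChrom's implementation: the helper name sample_centered_windows and the choice to center each window on the peak midpoint are assumptions made only for illustration. It shows one plausible way to draw a reproducible subset of peaks from BED records and resize each one to a fixed length.

```python
import random


def sample_centered_windows(bed_lines, n_samples=1000, seq_len=1344, seed=10):
    """Hypothetical helper (not part of XChrom): sample peaks reproducibly
    and resize each one to a fixed-length window around its midpoint."""
    peaks = []
    for line in bed_lines:
        line = line.strip()
        # Skip blank lines and common BED header/comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        peaks.append((chrom, int(start), int(end)))

    # A dedicated generator keeps the draw reproducible for a given seed.
    rng = random.Random(seed)
    k = min(n_samples, len(peaks))
    sampled = rng.sample(peaks, k)

    windows = []
    for chrom, start, end in sampled:
        mid = (start + end) // 2
        # Clamp at 0 so windows near a chromosome start keep the full length.
        new_start = max(0, mid - seq_len // 2)
        windows.append((chrom, new_start, new_start + seq_len))
    return windows


example_bed = [
    "chr1\t1000\t1500",
    "chr1\t5000\t5600",
    "chr2\t200\t700",
]
windows = sample_centered_windows(example_bed, n_samples=2, seq_len=1344, seed=10)
```

Every returned window spans exactly seq_len bases, and calling the helper again with the same seed yields the same peaks. The sketch does not clamp at chromosome ends, which would require chromosome lengths from the reference FASTA.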