xchrom.tl.generate_tf_activity_data

xchrom.tl.generate_tf_activity_data(bed_file: str | Path, input_fasta: str | Path, motif_file: str | Path, output_dir: str | Path, n_samples: int = 1000, seq_len: int = 1344, n_motif_instances: int = 1000, seed: int = 10)

Prepare motif data and background sequences for transcription factor (TF) activity calculation.

Parameters:
  • bed_file (Union[str, Path]) – BED file path, containing peak regions

  • input_fasta (Union[str, Path]) – Reference genome FASTA file path

  • motif_file (Union[str, Path]) – MEME format motif file path

  • output_dir (Union[str, Path]) – Output directory path for the generated data

  • n_samples (int, default 1000) – Number of peaks to sample from bed_file

  • seq_len (int, default 1344) – Length of each sequence, in base pairs

  • n_motif_instances (int, default 1000) – Number of instances to generate for each motif

  • seed (int, default 10) – Random seed
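The interplay of n_samples, seq_len and seed can be illustrated with a minimal sketch. This is not xchrom's implementation: the helper name sample_peak_windows, the choice to centre each window on the peak midpoint, and the clamping at position 0 are all assumptions made for illustration. It only shows how a fixed seed makes peak sampling reproducible and how every sampled peak can be turned into a window of exactly seq_len bases.

```python
import random


def sample_peak_windows(bed_lines, n_samples=1000, seq_len=1344, seed=10):
    """Hypothetical helper: sample peaks reproducibly and return
    fixed-length (chrom, start, end) windows centred on each peak."""
    peaks = []
    for line in bed_lines:
        line = line.strip()
        # Skip blank lines and common BED header/comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        peaks.append((chrom, int(start), int(end)))

    # A local RNG keeps the sampling reproducible without touching
    # the global random state.
    rng = random.Random(seed)
    chosen = rng.sample(peaks, min(n_samples, len(peaks)))

    windows = []
    for chrom, start, end in chosen:
        mid = (start + end) // 2
        # Clamp at 0 so windows near the chromosome start stay valid.
        # The chromosome end is not checked in this sketch.
        win_start = max(0, mid - seq_len // 2)
        windows.append((chrom, win_start, win_start + seq_len))
    return windows


# Toy BED records (made up for the example).
bed = ["chr1\t1000\t1500", "chr1\t5000\t5600", "chr2\t200\t700"]
windows = sample_peak_windows(bed, n_samples=2, seq_len=1344, seed=10)
```

Calling the helper again with the same seed returns the same windows, which is the property the seed parameter is meant to provide.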

Returns:

(background_fasta_path, motif_dir_path) – path to the background sequence FASTA file and path to the motif directory

Return type:

tuple

Examples

>>> import xchrom
>>> bg_fasta, motif_dir = xchrom.tl.generate_tf_activity_data(
...     bed_file="peaks.bed",
...     input_fasta="hg38.fa",
...     motif_file="motifs.meme",
...     output_dir="./motif_fasta",
...     n_samples=1000,
...     seed=10
... )
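
The motif_file argument expects a MEME-format file. For a quick test without a full motif database, a small file in the minimal MEME motif format can be written by hand. The sketch below is an illustration only: the helper write_meme, the motif name TOY_MOTIF and its probabilities are made up, and whether xchrom requires additional fields beyond the minimal format is not confirmed here.

```python
from pathlib import Path
import tempfile


def write_meme(path, motifs):
    """Hypothetical helper: write {name: rows of [A, C, G, T] probabilities}
    as a minimal MEME-format motif file."""
    lines = [
        "MEME version 4",
        "",
        "ALPHABET= ACGT",
        "",
        "strands: + -",
        "",
        "Background letter frequencies",
        "A 0.25 C 0.25 G 0.25 T 0.25",
        "",
    ]
    for name, rows in motifs.items():
        lines.append(f"MOTIF {name}")
        # w is the motif width; nsites and E are placeholder values.
        lines.append(
            f"letter-probability matrix: alength= 4 w= {len(rows)} nsites= 20 E= 0"
        )
        for row in rows:
            lines.append(" ".join(f"{p:.6f}" for p in row))
        lines.append("")
    Path(path).write_text("\n".join(lines))


# A made-up 3-bp motif; each row sums to 1 over A, C, G, T.
toy = {
    "TOY_MOTIF": [
        [0.7, 0.1, 0.1, 0.1],
        [0.1, 0.1, 0.7, 0.1],
        [0.1, 0.7, 0.1, 0.1],
    ]
}
out = Path(tempfile.mkdtemp()) / "motifs.meme"
write_meme(out, toy)
text = out.read_text()
```

The resulting path can then be passed as motif_file, alongside a BED file and a reference FASTA.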