xchrom.tr.train_XChrom

xchrom.tr.train_XChrom(input_folder: str | Path, cell_embedding_ad: str | Path, out_path: str | Path = './train_out', bottleneck: int = 32, batch_size: int = 128, lr: float = 0.01, epochs: int = 1000, save_freq: int = 1000, trackscore: bool = False, celltype: str = 'celltype', seed: int = 20, train_split: float = 0.9, cellembed_raw: str = 'X_pca', verbose: Literal[0, 1, 2] = 1, print_scores: bool = False, **kwargs) → Dict[str, Any]

Train an XChrom model.

Parameters:
  • input_folder (Union[str, Path]) – Folder of preprocessed data; it should contain trainval_seqs.h5, splits.h5, ad_trainval.h5ad and m_trainval.npz

  • cell_embedding_ad (Union[str, Path]) – Path to the scRNA-seq AnnData file containing the raw cell embedding

  • out_path (Union[str, Path], default './train_out') – Output path

  • bottleneck (int, default 32) – Bottleneck layer size; should equal the dimension of the raw cell embedding

  • batch_size (int, default 128) – Batch size

  • lr (float, default 0.01) – Learning rate

  • epochs (int, default 1000) – Number of training epochs

  • save_freq (int, default 1000) – Model saving frequency

  • trackscore (bool, default False) – Whether to compute score metrics every epoch

  • celltype (str, default 'celltype') – Cell type label column name (used when trackscore=True)

  • seed (int, default 20) – Random seed

  • train_split (float, default 0.9) – Proportion of the training/validation data assigned to the training set

  • cellembed_raw (str, default 'X_pca') – Raw cell embedding key in cell embedding adata

  • verbose (int, default 1) – Training verbosity mode. 0=silent, 1=progress bar, 2=one line per epoch

  • print_scores (bool, default False) – Whether to print the ns and ls scores every epoch when trackscore=True

  • **kwargs (dict) – Additional parameters

Returns:

Dictionary containing training history and model information

Return type:

Dict[str, Any]

Examples

>>> import xchrom as xc
>>> history = xc.tr.train_XChrom(
...     input_folder='./data/1_within_sample/train_data/',
...     cell_embedding_ad='./data/1_within_sample/m_brain_paired_rna.h5ad',
...     cellembed_raw='X_pca',
...     out_path='./data/1_within_sample/train_out/',
...     trackscore=True,
...     celltype='pc32_leiden',
...     epochs=1000,
...     save_freq=1000,
...     verbose=0,  # silent mode, no progress bar
...     print_scores=False,  # whether to print ns, ls scores every epoch when trackscore=True
... )
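
Two requirements listed under Parameters can be checked before a long training run: input_folder should contain four specific files, and bottleneck should equal the dimension of the raw cell embedding. The following is a minimal sketch of such a pre-flight check. The helper names missing_inputs and bottleneck_matches are hypothetical and not part of xchrom, and the sketch assumes the embedding is a two-dimensional (n_cells, n_dims) array.

```python
from pathlib import Path

# File names taken from the input_folder parameter description.
REQUIRED_FILES = (
    "trainval_seqs.h5",
    "splits.h5",
    "ad_trainval.h5ad",
    "m_trainval.npz",
)


def missing_inputs(input_folder):
    """Return the required file names that are absent from input_folder.

    Hypothetical helper, not part of xchrom.
    """
    folder = Path(input_folder)
    return [name for name in REQUIRED_FILES if not (folder / name).is_file()]


def bottleneck_matches(bottleneck, embedding_shape):
    """Return True if bottleneck equals the embedding dimension.

    Hypothetical helper, not part of xchrom. embedding_shape is assumed to be
    the (n_cells, n_dims) shape of the raw cell embedding.
    """
    return len(embedding_shape) == 2 and embedding_shape[1] == bottleneck
```

If the embedding is stored in the AnnData object's obsm mapping under the cellembed_raw key (an assumption here), its shape could be read as adata.obsm['X_pca'].shape and passed to bottleneck_matches together with the intended bottleneck value.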