XChrom quick start guide

This notebook walks through a minimal end-to-end run to verify that XChrom is installed and working: it loads the bundled demo data, trains a small model, and plots the training history.

Workflow:

  • Import the XChrom package.

  • List and access the bundled demo data.

  • Train an XChrom model on the demo data.

  • Save the model and plot the training curves.

Demo data (model input)

  • The demo data consists of a small subset (100 cells and 1000 peaks) sampled from the original dataset.

  • Preprocessed training and test data are provided in the xchrom/data/train_data/ directory.

  • The raw cell embedding h5ad file is provided at xchrom/data/test_rna.h5ad.
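
The training log later in this notebook reports that the raw embedding (`X_pca`) is z-score normalized before training and stored under `zscore32_perpc`. XChrom's own implementation is not shown here. The following is only a minimal sketch of what such a step could look like, assuming that "per PC" means each embedding dimension is standardized across cells and that the population standard deviation is used. The function name `zscore_per_column` is hypothetical and not part of the XChrom API.

```python
from statistics import fmean, pstdev


def zscore_per_column(rows):
    """Standardize each column (e.g. each PC) across rows (cells).

    Hypothetical helper for illustration only; not taken from XChrom.
    """
    columns = list(zip(*rows))
    means = [fmean(col) for col in columns]
    # Fall back to 1.0 for constant columns to avoid dividing by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return [
        [(value - mean) / std for value, mean, std in zip(row, means, stds)]
        for row in rows
    ]


# Two cells, two embedding dimensions; the second dimension is constant.
toy_embedding = [[1.0, 10.0], [3.0, 10.0]]
print(zscore_per_column(toy_embedding))
```

After this transformation every non-constant column has mean 0 and unit variance, which keeps any single embedding dimension from dominating the others.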

Output

  • Trained model: ./data/quick_start/train_out/E1000best_model.h5

  • Training history (pickle): ./data/quick_start/train_out/history.pickle

  • Training history plot (PDF): ./data/quick_start/train_out/train_history_plot.pdf

[1]:
import xchrom as xc
[2]:
# list all available files and directories in xchrom test data
print("available files and directories:", xc.list_items())
# get data directory path
print("data directory:", xc.get_data_dir())
available files and directories: ['test_rna.h5ad', 'train_data']
data directory: /picb/bigdata/project/miaoyuanyuan/train/XChrom_test/XChrom/xchrom/data
[3]:
data_path = xc.get_data_dir()
[4]:
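history = xc.tr.train_XChrom(
    input_folder=f'{data_path}/train_data',          # preprocessed training data
    cell_embedding_ad=f'{data_path}/test_rna.h5ad',  # h5ad file holding the raw cell embedding
    cellembed_raw='X_pca',                           # .obsm key of the raw cell embedding
    out_path='./data/quick_start/train_out/',        # directory for the model and training history
    epochs=10,
    verbose=1
)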
=== Start training XChrom model ===
Input folder: /picb/bigdata/project/miaoyuanyuan/train/XChrom_test/XChrom/xchrom/data/train_data
Cell embedding file: /picb/bigdata/project/miaoyuanyuan/train/XChrom_test/XChrom/xchrom/data/test_rna.h5ad
Raw cell embedding key: X_pca
Output path: /picb/bigdata/project/miaoyuanyuan/train/XChrom_test/XChrom/source/data/quick_start/train_out
Model parameters: bottleneck=32, batch_size=128, lr=0.01
1. Load raw cell embedding and make z-score normalization...
Raw cell embedding saved to: /picb/bigdata/project/miaoyuanyuan/train/XChrom_test/XChrom/xchrom/data/test_rna.h5ad.obsm['X_pca']
Initial cell embedding saved to: /picb/bigdata/project/miaoyuanyuan/train/XChrom_test/XChrom/xchrom/data/test_rna.h5ad.obsm['zscore32_perpc']
Initial cell embedding shape: (100, 32)
2. Load training data...
3. Prepare train/val data split...
Training peak number: 810, Validation peak number: 91
4. Create TensorFlow dataset...
2025-08-22 15:03:28.083032: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations:  SSE4.1 SSE4.2 AVX AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2025-08-22 15:03:29.667224: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 20361 MB memory:  -> device: 0, name: NVIDIA GeForce RTX 3090, pci bus id: 0000:31:00.0, compute capability: 8.6
2025-08-22 15:03:29.667868: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 22350 MB memory:  -> device: 1, name: NVIDIA GeForce RTX 3090, pci bus id: 0000:4b:00.0, compute capability: 8.6
2025-08-22 15:03:29.668297: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:2 with 22350 MB memory:  -> device: 2, name: NVIDIA GeForce RTX 3090, pci bus id: 0000:98:00.0, compute capability: 8.6
2025-08-22 15:03:29.668681: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:3 with 22350 MB memory:  -> device: 3, name: NVIDIA GeForce RTX 3090, pci bus id: 0000:b1:00.0, compute capability: 8.6
5. Build and compile model...
Model: "model"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to
==================================================================================================
sequence (InputLayer)           [(None, 1344, 4)]    0
__________________________________________________________________________________________________
stochastic_reverse_complement ( ((None, 1344, 4), () 0           sequence[0][0]
__________________________________________________________________________________________________
stochastic_shift (StochasticShi (None, 1344, 4)      0           stochastic_reverse_complement[0][
__________________________________________________________________________________________________
gelu (GELU)                     (None, 1344, 4)      0           stochastic_shift[0][0]
__________________________________________________________________________________________________
conv1d (Conv1D)                 (None, 1344, 288)    19584       gelu[0][0]
__________________________________________________________________________________________________
batch_normalization (BatchNorma (None, 1344, 288)    1152        conv1d[0][0]
__________________________________________________________________________________________________
max_pooling1d (MaxPooling1D)    (None, 448, 288)     0           batch_normalization[0][0]
__________________________________________________________________________________________________
gelu_1 (GELU)                   (None, 448, 288)     0           max_pooling1d[0][0]
__________________________________________________________________________________________________
conv1d_1 (Conv1D)               (None, 448, 288)     414720      gelu_1[0][0]
__________________________________________________________________________________________________
batch_normalization_1 (BatchNor (None, 448, 288)     1152        conv1d_1[0][0]
__________________________________________________________________________________________________
max_pooling1d_1 (MaxPooling1D)  (None, 224, 288)     0           batch_normalization_1[0][0]
__________________________________________________________________________________________________
gelu_2 (GELU)                   (None, 224, 288)     0           max_pooling1d_1[0][0]
__________________________________________________________________________________________________
conv1d_2 (Conv1D)               (None, 224, 323)     465120      gelu_2[0][0]
__________________________________________________________________________________________________
batch_normalization_2 (BatchNor (None, 224, 323)     1292        conv1d_2[0][0]
__________________________________________________________________________________________________
max_pooling1d_2 (MaxPooling1D)  (None, 112, 323)     0           batch_normalization_2[0][0]
__________________________________________________________________________________________________
gelu_3 (GELU)                   (None, 112, 323)     0           max_pooling1d_2[0][0]
__________________________________________________________________________________________________
conv1d_3 (Conv1D)               (None, 112, 363)     586245      gelu_3[0][0]
__________________________________________________________________________________________________
batch_normalization_3 (BatchNor (None, 112, 363)     1452        conv1d_3[0][0]
__________________________________________________________________________________________________
max_pooling1d_3 (MaxPooling1D)  (None, 56, 363)      0           batch_normalization_3[0][0]
__________________________________________________________________________________________________
gelu_4 (GELU)                   (None, 56, 363)      0           max_pooling1d_3[0][0]
__________________________________________________________________________________________________
conv1d_4 (Conv1D)               (None, 56, 407)      738705      gelu_4[0][0]
__________________________________________________________________________________________________
batch_normalization_4 (BatchNor (None, 56, 407)      1628        conv1d_4[0][0]
__________________________________________________________________________________________________
max_pooling1d_4 (MaxPooling1D)  (None, 28, 407)      0           batch_normalization_4[0][0]
__________________________________________________________________________________________________
gelu_5 (GELU)                   (None, 28, 407)      0           max_pooling1d_4[0][0]
__________________________________________________________________________________________________
conv1d_5 (Conv1D)               (None, 28, 456)      927960      gelu_5[0][0]
__________________________________________________________________________________________________
batch_normalization_5 (BatchNor (None, 28, 456)      1824        conv1d_5[0][0]
__________________________________________________________________________________________________
max_pooling1d_5 (MaxPooling1D)  (None, 14, 456)      0           batch_normalization_5[0][0]
__________________________________________________________________________________________________
gelu_6 (GELU)                   (None, 14, 456)      0           max_pooling1d_5[0][0]
__________________________________________________________________________________________________
conv1d_6 (Conv1D)               (None, 14, 512)      1167360     gelu_6[0][0]
__________________________________________________________________________________________________
batch_normalization_6 (BatchNor (None, 14, 512)      2048        conv1d_6[0][0]
__________________________________________________________________________________________________
max_pooling1d_6 (MaxPooling1D)  (None, 7, 512)       0           batch_normalization_6[0][0]
__________________________________________________________________________________________________
gelu_7 (GELU)                   (None, 7, 512)       0           max_pooling1d_6[0][0]
__________________________________________________________________________________________________
conv1d_7 (Conv1D)               (None, 7, 256)       131072      gelu_7[0][0]
__________________________________________________________________________________________________
batch_normalization_7 (BatchNor (None, 7, 256)       1024        conv1d_7[0][0]
__________________________________________________________________________________________________
gelu_8 (GELU)                   (None, 7, 256)       0           batch_normalization_7[0][0]
__________________________________________________________________________________________________
reshape (Reshape)               (None, 1, 1792)      0           gelu_8[0][0]
__________________________________________________________________________________________________
dense (Dense)                   (None, 1, 32)        57344       reshape[0][0]
__________________________________________________________________________________________________
cell_embed (InputLayer)         [(None, 91, 32)]     0
__________________________________________________________________________________________________
batch_normalization_8 (BatchNor (None, 1, 32)        128         dense[0][0]
__________________________________________________________________________________________________
lambda (Lambda)                 (None, 91, 32)       0           cell_embed[0][0]
__________________________________________________________________________________________________
dropout (Dropout)               (None, 1, 32)        0           batch_normalization_8[0][0]
__________________________________________________________________________________________________
layer_normalization (LayerNorma (None, 91, 32)       64          lambda[0][0]
__________________________________________________________________________________________________
gelu_9 (GELU)                   (None, 1, 32)        0           dropout[0][0]
__________________________________________________________________________________________________
dense_1 (Dense)                 (None, 91, 64)       2112        layer_normalization[0][0]
__________________________________________________________________________________________________
tf.compat.v1.squeeze (TFOpLambd (None, 32)           0           gelu_9[0][0]
__________________________________________________________________________________________________
sequencing_depth (InputLayer)   [(None, 91)]         0
__________________________________________________________________________________________________
final_cellembed (Dense)         (None, 91, 32)       2080        dense_1[0][0]
__________________________________________________________________________________________________
tf.expand_dims (TFOpLambda)     (None, 32, 1)        0           tf.compat.v1.squeeze[0][0]
__________________________________________________________________________________________________
tf.expand_dims_1 (TFOpLambda)   (None, 91, 1)        0           sequencing_depth[0][0]
__________________________________________________________________________________________________
tf.linalg.matmul (TFOpLambda)   (None, 91, 1)        0           final_cellembed[0][0]
                                                                 tf.expand_dims[0][0]
__________________________________________________________________________________________________
dense_2 (Dense)                 (None, 91, 1)        2           tf.expand_dims_1[0][0]
__________________________________________________________________________________________________
tf.compat.v1.squeeze_2 (TFOpLam (None, 91)           0           tf.linalg.matmul[0][0]
__________________________________________________________________________________________________
tf.compat.v1.squeeze_1 (TFOpLam (None, 91)           0           dense_2[0][0]
__________________________________________________________________________________________________
tf.__operators__.add (TFOpLambd (None, 91)           0           tf.compat.v1.squeeze_2[0][0]
                                                                 tf.compat.v1.squeeze_1[0][0]
__________________________________________________________________________________________________
tf.math.sigmoid (TFOpLambda)    (None, 91)           0           tf.__operators__.add[0][0]
==================================================================================================
Total params: 4,524,068
Trainable params: 4,518,218
Non-trainable params: 5,850
__________________________________________________________________________________________________
6. Set training callbacks...
7. Start training...
Model will be saved to: data/quick_start/train_out/E1000best_model.h5
Epoch 1/10
2025-08-22 15:03:32.399414: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:185] None of the MLIR Optimization Passes are enabled (registered 2)
2025-08-22 15:03:34.311489: I tensorflow/stream_executor/cuda/cuda_dnn.cc:369] Loaded cuDNN version 8800
2025-08-22 15:03:34.384349: I tensorflow/stream_executor/cuda/cuda_blas.cc:1760] TensorFloat-32 will be used for the matrix multiplication. This will only be logged once.
7/7 [==============================] - 10s 379ms/step - loss: 0.8043 - binary_accuracy: 0.7448 - auc: 0.5669 - pr: 0.1801 - val_loss: 178.2907 - val_binary_accuracy: 0.8591 - val_auc: 0.5000 - val_pr: 0.1409
Epoch 2/10
7/7 [==============================] - 1s 74ms/step - loss: 0.4698 - binary_accuracy: 0.8192 - auc: 0.6475 - pr: 0.2310 - val_loss: 6.9911 - val_binary_accuracy: 0.8555 - val_auc: 0.5062 - val_pr: 0.1476
Epoch 3/10
7/7 [==============================] - 1s 73ms/step - loss: 0.4251 - binary_accuracy: 0.8224 - auc: 0.6977 - pr: 0.2798 - val_loss: 1.7770 - val_binary_accuracy: 0.8257 - val_auc: 0.6334 - val_pr: 0.2380
Epoch 4/10
7/7 [==============================] - 1s 73ms/step - loss: 0.4052 - binary_accuracy: 0.8393 - auc: 0.7175 - pr: 0.3142 - val_loss: 0.7914 - val_binary_accuracy: 0.8022 - val_auc: 0.6343 - val_pr: 0.2196
Epoch 5/10
7/7 [==============================] - 1s 77ms/step - loss: 0.3960 - binary_accuracy: 0.8454 - auc: 0.7162 - pr: 0.3061 - val_loss: 0.5019 - val_binary_accuracy: 0.8017 - val_auc: 0.6692 - val_pr: 0.2455
Epoch 6/10
7/7 [==============================] - 1s 71ms/step - loss: 0.3873 - binary_accuracy: 0.8467 - auc: 0.7273 - pr: 0.3299 - val_loss: 0.4448 - val_binary_accuracy: 0.7977 - val_auc: 0.6930 - val_pr: 0.2637
Epoch 7/10
7/7 [==============================] - 1s 73ms/step - loss: 0.3859 - binary_accuracy: 0.8438 - auc: 0.7382 - pr: 0.3403 - val_loss: 0.4215 - val_binary_accuracy: 0.8044 - val_auc: 0.7035 - val_pr: 0.2759
Epoch 8/10
7/7 [==============================] - 1s 73ms/step - loss: 0.3791 - binary_accuracy: 0.8476 - auc: 0.7433 - pr: 0.3548 - val_loss: 0.3979 - val_binary_accuracy: 0.8332 - val_auc: 0.7113 - val_pr: 0.2880
Epoch 9/10
7/7 [==============================] - 1s 71ms/step - loss: 0.3783 - binary_accuracy: 0.8477 - auc: 0.7458 - pr: 0.3519 - val_loss: 0.3838 - val_binary_accuracy: 0.8457 - val_auc: 0.7182 - val_pr: 0.2903
Epoch 10/10
7/7 [==============================] - 1s 73ms/step - loss: 0.3785 - binary_accuracy: 0.8485 - auc: 0.7446 - pr: 0.3521 - val_loss: 0.3801 - val_binary_accuracy: 0.8504 - val_auc: 0.7290 - val_pr: 0.3060
8. Save training results...
=== Training completed! ===
Best model: /picb/bigdata/project/miaoyuanyuan/train/XChrom_test/XChrom/source/data/quick_start/train_out/E1000best_model.h5
Training history: /picb/bigdata/project/miaoyuanyuan/train/XChrom_test/XChrom/source/data/quick_start/train_out/history.pickle
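The layer shapes and parameter counts in the model summary above can be cross-checked with a little arithmetic. The sketch below uses only numbers printed in the summary; the pooling sizes are inferred from the output shapes, not read from the XChrom source, so treat them as an assumption.

```python
# Sequence length after each pooling layer. Pool sizes are inferred from the
# output shapes in the summary (1344 -> 448 -> 224 -> ... -> 7).
pool_sizes = [3, 2, 2, 2, 2, 2, 2]
lengths = [1344]
for size in pool_sizes:
    lengths.append(lengths[-1] // size)
print(lengths)  # [1344, 448, 224, 112, 56, 28, 14, 7]

# The reshape layer flattens 7 positions x 256 channels.
flattened = lengths[-1] * 256
print(flattened)  # 1792

# dense: 57,344 parameters = 1792 inputs x 32 outputs, which is consistent
# with a dense layer that has no bias term.
dense_params = flattened * 32
print(dense_params)  # 57344

# A Keras BatchNormalization layer stores 4 values per channel; the moving
# mean and moving variance (half of them) are non-trainable.
batch_norm_params = [1152, 1152, 1292, 1452, 1628, 1824, 2048, 1024, 128]
non_trainable = sum(batch_norm_params) // 2
print(non_trainable)  # 5850
```

The last value matches the "Non-trainable params: 5,850" line of the summary, and the dense output width of 32 matches the `bottleneck=32` setting reported at the start of the log.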
[5]:
xc.pl.plot_train_history(
    history=history['history'],
    savefig=True,
    out_file='./data/quick_start/train_out/train_history_plot.pdf'
)
[Figure: training history curves]
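
The training curves can also be summarized numerically. The sketch below picks the best epoch for a given metric; `best_epoch` is a hypothetical helper, not an XChrom function, and the values are copied from the training log above.

```python
def best_epoch(metrics, key="val_auc", mode="max"):
    """Return (1-based epoch, value) of the best entry in metrics[key]."""
    values = metrics[key]
    pick = max if mode == "max" else min
    index = pick(range(len(values)), key=values.__getitem__)
    return index + 1, values[index]


# Per-epoch validation metrics copied from the training log.
logged = {
    "val_auc": [0.5000, 0.5062, 0.6334, 0.6343, 0.6692,
                0.6930, 0.7035, 0.7113, 0.7182, 0.7290],
    "val_loss": [178.2907, 6.9911, 1.7770, 0.7914, 0.5019,
                 0.4448, 0.4215, 0.3979, 0.3838, 0.3801],
}
print(best_epoch(logged, key="val_auc"))               # (10, 0.729)
print(best_epoch(logged, key="val_loss", mode="min"))  # (10, 0.3801)
```

If `history['history']` is a Keras-style dictionary mapping metric names to per-epoch lists (an assumption; its structure is not shown in this notebook), it could be passed to `best_epoch` directly in place of `logged`.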

If all cells above ran without errors, XChrom is installed and working, and you can now use it for analysis.
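
As a final check, the output files can be verified on disk. This is a small sketch using only the standard library; `missing_outputs` is a hypothetical helper, the file names are those shown in the training log and the plotting call, and the directory is the `out_path` used for training.

```python
from pathlib import Path

# File names taken from the training log and the plotting call in this notebook.
EXPECTED_OUTPUTS = ("E1000best_model.h5", "history.pickle", "train_history_plot.pdf")


def missing_outputs(out_dir, expected=EXPECTED_OUTPUTS):
    """Return the expected output files that are absent from out_dir."""
    out_dir = Path(out_dir)
    return [name for name in expected if not (out_dir / name).is_file()]


missing = missing_outputs("./data/quick_start/train_out")
print("all outputs present" if not missing else f"missing: {missing}")
```

An empty list means the model, the training history, and the plot were all written.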