XChrom quick start guide
This notebook walks users through a tiny end-to-end run to verify that XChrom is installed and working. We will load example data, train a small model, and visualize the training history.
Workflow:
Import the XChrom package.
List and access the bundled demo data.
Train an XChrom model on the demo data.
Save the model and plot the training curves.
Demo data (model input)
The demo data consists of a small subset (100 cells and 1000 peaks) sampled from the original dataset.
Preprocessed training and test data are provided in the xchrom/data/train_data/ directory.
The raw cell embeddings are provided in the xchrom/data/test_rna.h5ad file.
Output
Trained model: ./data/quick_start/train_out/E1000best_model.h5
Training history (pickle): ./data/quick_start/train_out/history.pickle
Training history plot (PDF): ./data/quick_start/train_out/train_history_plot.pdf
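Once the cells below have run, it can be useful to confirm that the three output files were actually written. The following is a minimal sketch using only the standard library. The helper name `missing_outputs` is hypothetical and not part of XChrom; the file names come from the Output list, and the directory is left as an argument because it depends on the `out_path` passed to training (`./data/quick_start/train_out/` in this notebook).

```python
from pathlib import Path

# File names taken from the Output list; the directory is passed in separately.
EXPECTED_OUTPUTS = ("E1000best_model.h5", "history.pickle", "train_history_plot.pdf")


def missing_outputs(out_dir, expected=EXPECTED_OUTPUTS):
    """Return the expected file names that are absent or empty in out_dir.

    Hypothetical helper for this guide, not an XChrom function.
    """
    out_dir = Path(out_dir)
    missing = []
    for name in expected:
        path = out_dir / name
        # A zero-byte file usually means the write was interrupted.
        if not path.is_file() or path.stat().st_size == 0:
            missing.append(name)
    return missing
```

Calling `missing_outputs('./data/quick_start/train_out')` after the run should return an empty list if all three files exist and are non-empty.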
[1]:
import xchrom as xc
[2]:
# list all available files and directories in xchrom test data
print("available files and directories:", xc.list_items())
# get data directory path
print("data directory:", xc.get_data_dir())
available files and directories: ['test_rna.h5ad', 'train_data']
data directory: /picb/bigdata/project/miaoyuanyuan/train/XChrom_test/XChrom/xchrom/data
[3]:
data_path = xc.get_data_dir()
[4]:
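history = xc.tr.train_XChrom(
    input_folder=f'{data_path}/train_data',          # preprocessed demo training data
    cell_embedding_ad=f'{data_path}/test_rna.h5ad',  # h5ad file holding the raw cell embedding
    cellembed_raw='X_pca',                           # .obsm key of the raw cell embedding
    out_path='./data/quick_start/train_out/',        # best model and history are written here
    epochs=10,                                       # short run, enough for an installation check
    verbose=1,
)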
=== Start training XChrom model ===
Input folder: /picb/bigdata/project/miaoyuanyuan/train/XChrom_test/XChrom/xchrom/data/train_data
Cell embedding file: /picb/bigdata/project/miaoyuanyuan/train/XChrom_test/XChrom/xchrom/data/test_rna.h5ad
Raw cell embedding key: X_pca
Output path: /picb/bigdata/project/miaoyuanyuan/train/XChrom_test/XChrom/source/data/quick_start/train_out
Model parameters: bottleneck=32, batch_size=128, lr=0.01
1. Load raw cell embedding and make z-score normalization...
Raw cell embedding saved to: /picb/bigdata/project/miaoyuanyuan/train/XChrom_test/XChrom/xchrom/data/test_rna.h5ad.obsm['X_pca']
Initial cell embedding saved to: /picb/bigdata/project/miaoyuanyuan/train/XChrom_test/XChrom/xchrom/data/test_rna.h5ad.obsm['zscore32_perpc']
Initial cell embedding shape: (100, 32)
2. Load training data...
3. Prepare train/val data split...
Training peak number: 810, Validation peak number: 91
4. Create TensorFlow dataset...
2025-08-22 15:03:28.083032: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE4.1 SSE4.2 AVX AVX2 AVX512F FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2025-08-22 15:03:29.667224: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 20361 MB memory: -> device: 0, name: NVIDIA GeForce RTX 3090, pci bus id: 0000:31:00.0, compute capability: 8.6
2025-08-22 15:03:29.667868: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 22350 MB memory: -> device: 1, name: NVIDIA GeForce RTX 3090, pci bus id: 0000:4b:00.0, compute capability: 8.6
2025-08-22 15:03:29.668297: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:2 with 22350 MB memory: -> device: 2, name: NVIDIA GeForce RTX 3090, pci bus id: 0000:98:00.0, compute capability: 8.6
2025-08-22 15:03:29.668681: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1510] Created device /job:localhost/replica:0/task:0/device:GPU:3 with 22350 MB memory: -> device: 3, name: NVIDIA GeForce RTX 3090, pci bus id: 0000:b1:00.0, compute capability: 8.6
5. Build and compile model...
Model: "model"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
sequence (InputLayer) [(None, 1344, 4)] 0
__________________________________________________________________________________________________
stochastic_reverse_complement ( ((None, 1344, 4), () 0 sequence[0][0]
__________________________________________________________________________________________________
stochastic_shift (StochasticShi (None, 1344, 4) 0 stochastic_reverse_complement[0][
__________________________________________________________________________________________________
gelu (GELU) (None, 1344, 4) 0 stochastic_shift[0][0]
__________________________________________________________________________________________________
conv1d (Conv1D) (None, 1344, 288) 19584 gelu[0][0]
__________________________________________________________________________________________________
batch_normalization (BatchNorma (None, 1344, 288) 1152 conv1d[0][0]
__________________________________________________________________________________________________
max_pooling1d (MaxPooling1D) (None, 448, 288) 0 batch_normalization[0][0]
__________________________________________________________________________________________________
gelu_1 (GELU) (None, 448, 288) 0 max_pooling1d[0][0]
__________________________________________________________________________________________________
conv1d_1 (Conv1D) (None, 448, 288) 414720 gelu_1[0][0]
__________________________________________________________________________________________________
batch_normalization_1 (BatchNor (None, 448, 288) 1152 conv1d_1[0][0]
__________________________________________________________________________________________________
max_pooling1d_1 (MaxPooling1D) (None, 224, 288) 0 batch_normalization_1[0][0]
__________________________________________________________________________________________________
gelu_2 (GELU) (None, 224, 288) 0 max_pooling1d_1[0][0]
__________________________________________________________________________________________________
conv1d_2 (Conv1D) (None, 224, 323) 465120 gelu_2[0][0]
__________________________________________________________________________________________________
batch_normalization_2 (BatchNor (None, 224, 323) 1292 conv1d_2[0][0]
__________________________________________________________________________________________________
max_pooling1d_2 (MaxPooling1D) (None, 112, 323) 0 batch_normalization_2[0][0]
__________________________________________________________________________________________________
gelu_3 (GELU) (None, 112, 323) 0 max_pooling1d_2[0][0]
__________________________________________________________________________________________________
conv1d_3 (Conv1D) (None, 112, 363) 586245 gelu_3[0][0]
__________________________________________________________________________________________________
batch_normalization_3 (BatchNor (None, 112, 363) 1452 conv1d_3[0][0]
__________________________________________________________________________________________________
max_pooling1d_3 (MaxPooling1D) (None, 56, 363) 0 batch_normalization_3[0][0]
__________________________________________________________________________________________________
gelu_4 (GELU) (None, 56, 363) 0 max_pooling1d_3[0][0]
__________________________________________________________________________________________________
conv1d_4 (Conv1D) (None, 56, 407) 738705 gelu_4[0][0]
__________________________________________________________________________________________________
batch_normalization_4 (BatchNor (None, 56, 407) 1628 conv1d_4[0][0]
__________________________________________________________________________________________________
max_pooling1d_4 (MaxPooling1D) (None, 28, 407) 0 batch_normalization_4[0][0]
__________________________________________________________________________________________________
gelu_5 (GELU) (None, 28, 407) 0 max_pooling1d_4[0][0]
__________________________________________________________________________________________________
conv1d_5 (Conv1D) (None, 28, 456) 927960 gelu_5[0][0]
__________________________________________________________________________________________________
batch_normalization_5 (BatchNor (None, 28, 456) 1824 conv1d_5[0][0]
__________________________________________________________________________________________________
max_pooling1d_5 (MaxPooling1D) (None, 14, 456) 0 batch_normalization_5[0][0]
__________________________________________________________________________________________________
gelu_6 (GELU) (None, 14, 456) 0 max_pooling1d_5[0][0]
__________________________________________________________________________________________________
conv1d_6 (Conv1D) (None, 14, 512) 1167360 gelu_6[0][0]
__________________________________________________________________________________________________
batch_normalization_6 (BatchNor (None, 14, 512) 2048 conv1d_6[0][0]
__________________________________________________________________________________________________
max_pooling1d_6 (MaxPooling1D) (None, 7, 512) 0 batch_normalization_6[0][0]
__________________________________________________________________________________________________
gelu_7 (GELU) (None, 7, 512) 0 max_pooling1d_6[0][0]
__________________________________________________________________________________________________
conv1d_7 (Conv1D) (None, 7, 256) 131072 gelu_7[0][0]
__________________________________________________________________________________________________
batch_normalization_7 (BatchNor (None, 7, 256) 1024 conv1d_7[0][0]
__________________________________________________________________________________________________
gelu_8 (GELU) (None, 7, 256) 0 batch_normalization_7[0][0]
__________________________________________________________________________________________________
reshape (Reshape) (None, 1, 1792) 0 gelu_8[0][0]
__________________________________________________________________________________________________
dense (Dense) (None, 1, 32) 57344 reshape[0][0]
__________________________________________________________________________________________________
cell_embed (InputLayer) [(None, 91, 32)] 0
__________________________________________________________________________________________________
batch_normalization_8 (BatchNor (None, 1, 32) 128 dense[0][0]
__________________________________________________________________________________________________
lambda (Lambda) (None, 91, 32) 0 cell_embed[0][0]
__________________________________________________________________________________________________
dropout (Dropout) (None, 1, 32) 0 batch_normalization_8[0][0]
__________________________________________________________________________________________________
layer_normalization (LayerNorma (None, 91, 32) 64 lambda[0][0]
__________________________________________________________________________________________________
gelu_9 (GELU) (None, 1, 32) 0 dropout[0][0]
__________________________________________________________________________________________________
dense_1 (Dense) (None, 91, 64) 2112 layer_normalization[0][0]
__________________________________________________________________________________________________
tf.compat.v1.squeeze (TFOpLambd (None, 32) 0 gelu_9[0][0]
__________________________________________________________________________________________________
sequencing_depth (InputLayer) [(None, 91)] 0
__________________________________________________________________________________________________
final_cellembed (Dense) (None, 91, 32) 2080 dense_1[0][0]
__________________________________________________________________________________________________
tf.expand_dims (TFOpLambda) (None, 32, 1) 0 tf.compat.v1.squeeze[0][0]
__________________________________________________________________________________________________
tf.expand_dims_1 (TFOpLambda) (None, 91, 1) 0 sequencing_depth[0][0]
__________________________________________________________________________________________________
tf.linalg.matmul (TFOpLambda) (None, 91, 1) 0 final_cellembed[0][0]
tf.expand_dims[0][0]
__________________________________________________________________________________________________
dense_2 (Dense) (None, 91, 1) 2 tf.expand_dims_1[0][0]
__________________________________________________________________________________________________
tf.compat.v1.squeeze_2 (TFOpLam (None, 91) 0 tf.linalg.matmul[0][0]
__________________________________________________________________________________________________
tf.compat.v1.squeeze_1 (TFOpLam (None, 91) 0 dense_2[0][0]
__________________________________________________________________________________________________
tf.__operators__.add (TFOpLambd (None, 91) 0 tf.compat.v1.squeeze_2[0][0]
tf.compat.v1.squeeze_1[0][0]
__________________________________________________________________________________________________
tf.math.sigmoid (TFOpLambda) (None, 91) 0 tf.__operators__.add[0][0]
==================================================================================================
Total params: 4,524,068
Trainable params: 4,518,218
Non-trainable params: 5,850
__________________________________________________________________________________________________
6. Set training callbacks...
7. Start training...
Model will be saved to: data/quick_start/train_out/E1000best_model.h5
Epoch 1/10
2025-08-22 15:03:32.399414: I tensorflow/compiler/mlir/mlir_graph_optimization_pass.cc:185] None of the MLIR Optimization Passes are enabled (registered 2)
2025-08-22 15:03:34.311489: I tensorflow/stream_executor/cuda/cuda_dnn.cc:369] Loaded cuDNN version 8800
2025-08-22 15:03:34.384349: I tensorflow/stream_executor/cuda/cuda_blas.cc:1760] TensorFloat-32 will be used for the matrix multiplication. This will only be logged once.
7/7 [==============================] - 10s 379ms/step - loss: 0.8043 - binary_accuracy: 0.7448 - auc: 0.5669 - pr: 0.1801 - val_loss: 178.2907 - val_binary_accuracy: 0.8591 - val_auc: 0.5000 - val_pr: 0.1409
Epoch 2/10
7/7 [==============================] - 1s 74ms/step - loss: 0.4698 - binary_accuracy: 0.8192 - auc: 0.6475 - pr: 0.2310 - val_loss: 6.9911 - val_binary_accuracy: 0.8555 - val_auc: 0.5062 - val_pr: 0.1476
Epoch 3/10
7/7 [==============================] - 1s 73ms/step - loss: 0.4251 - binary_accuracy: 0.8224 - auc: 0.6977 - pr: 0.2798 - val_loss: 1.7770 - val_binary_accuracy: 0.8257 - val_auc: 0.6334 - val_pr: 0.2380
Epoch 4/10
7/7 [==============================] - 1s 73ms/step - loss: 0.4052 - binary_accuracy: 0.8393 - auc: 0.7175 - pr: 0.3142 - val_loss: 0.7914 - val_binary_accuracy: 0.8022 - val_auc: 0.6343 - val_pr: 0.2196
Epoch 5/10
7/7 [==============================] - 1s 77ms/step - loss: 0.3960 - binary_accuracy: 0.8454 - auc: 0.7162 - pr: 0.3061 - val_loss: 0.5019 - val_binary_accuracy: 0.8017 - val_auc: 0.6692 - val_pr: 0.2455
Epoch 6/10
7/7 [==============================] - 1s 71ms/step - loss: 0.3873 - binary_accuracy: 0.8467 - auc: 0.7273 - pr: 0.3299 - val_loss: 0.4448 - val_binary_accuracy: 0.7977 - val_auc: 0.6930 - val_pr: 0.2637
Epoch 7/10
7/7 [==============================] - 1s 73ms/step - loss: 0.3859 - binary_accuracy: 0.8438 - auc: 0.7382 - pr: 0.3403 - val_loss: 0.4215 - val_binary_accuracy: 0.8044 - val_auc: 0.7035 - val_pr: 0.2759
Epoch 8/10
7/7 [==============================] - 1s 73ms/step - loss: 0.3791 - binary_accuracy: 0.8476 - auc: 0.7433 - pr: 0.3548 - val_loss: 0.3979 - val_binary_accuracy: 0.8332 - val_auc: 0.7113 - val_pr: 0.2880
Epoch 9/10
7/7 [==============================] - 1s 71ms/step - loss: 0.3783 - binary_accuracy: 0.8477 - auc: 0.7458 - pr: 0.3519 - val_loss: 0.3838 - val_binary_accuracy: 0.8457 - val_auc: 0.7182 - val_pr: 0.2903
Epoch 10/10
7/7 [==============================] - 1s 73ms/step - loss: 0.3785 - binary_accuracy: 0.8485 - auc: 0.7446 - pr: 0.3521 - val_loss: 0.3801 - val_binary_accuracy: 0.8504 - val_auc: 0.7290 - val_pr: 0.3060
8. Save training results...
=== Training completed! ===
Best model: /picb/bigdata/project/miaoyuanyuan/train/XChrom_test/XChrom/source/data/quick_start/train_out/E1000best_model.h5
Training history: /picb/bigdata/project/miaoyuanyuan/train/XChrom_test/XChrom/source/data/quick_start/train_out/history.pickle
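Step 1 of the log above reports a z-score normalization of the raw cell embedding, stored under the key `zscore32_perpc`. A minimal sketch of per-component z-scoring is shown here in plain Python, assuming that "per PC" means each embedding column is standardized across cells. The function name, the use of the population standard deviation, and the small `eps` are assumptions made for illustration; XChrom's actual implementation may differ.

```python
from statistics import fmean, pstdev


def zscore_per_column(matrix, eps=1e-8):
    """Standardize each column of a cells x components matrix.

    Illustrative only: population std and eps are assumptions,
    not confirmed details of XChrom.
    """
    columns = list(zip(*matrix))              # transpose: one tuple per component
    means = [fmean(col) for col in columns]
    stds = [pstdev(col) for col in columns]
    return [
        [(value - m) / (s + eps) for value, m, s in zip(row, means, stds)]
        for row in matrix
    ]


# Toy embedding: 3 cells x 2 components on very different scales.
toy_embedding = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
for row in zscore_per_column(toy_embedding):
    print([round(v, 3) for v in row])
```

After this transformation both columns have mean close to 0 and standard deviation close to 1, so no single component dominates because of its scale. On a NumPy array `X` the same idea is usually written as `(X - X.mean(axis=0)) / X.std(axis=0)`.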
[5]:
xc.pl.plot_train_history(
    history=history['history'],
    savefig=True,
    out_file='./data/quick_start/train_out/train_history_plot.pdf',
)
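Besides plotting, the history can be inspected numerically, for example to find the epoch with the best validation score. The sketch below assumes the history is a dictionary mapping metric names to per-epoch lists, which is what `history['history']` suggests, and that the saved pickle has the same layout (not confirmed here). The helper `best_epoch` is hypothetical. To stay self-contained, it uses the validation values printed in the training log above and round-trips them through a temporary pickle file instead of reading the real `history.pickle`.

```python
import pickle
import tempfile
from pathlib import Path


def best_epoch(history, metric="val_loss", mode="min"):
    """Return (1-based epoch, value) of the best entry for a metric."""
    values = history[metric]
    pick = min if mode == "min" else max
    index = pick(range(len(values)), key=values.__getitem__)
    return index + 1, values[index]


# Validation values copied from the training log in this notebook.
logged = {
    "val_loss": [178.2907, 6.9911, 1.7770, 0.7914, 0.5019,
                 0.4448, 0.4215, 0.3979, 0.3838, 0.3801],
    "val_auc": [0.5000, 0.5062, 0.6334, 0.6343, 0.6692,
                0.6930, 0.7035, 0.7113, 0.7182, 0.7290],
}

# Round-trip through pickle to mimic loading a saved history file
# (the {'history': ...} layout is an assumption).
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "history.pickle"
    path.write_bytes(pickle.dumps({"history": logged}))
    with path.open("rb") as fh:
        loaded = pickle.load(fh)["history"]

print(best_epoch(loaded, "val_loss", "min"))  # (10, 0.3801)
print(best_epoch(loaded, "val_auc", "max"))   # (10, 0.729)
```

In this run both metrics are still improving at epoch 10, and the very large validation loss in the first epoch has settled to about the training loss by the end, which is the behaviour one would hope to see in a quick installation check.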
XChrom has been successfully installed and loaded! You can now use XChrom for analysis.