# AlloGen Inference Guide

This guide covers how to score binder designs and apply guidance with the bundled Q_θ checkpoint. The public release supports inference and guidance only; training is not included.

> **Env var.** Throughout this doc, `${ALLOGEN_ROOT}` is the path to the cloned repo. Either `cd` into it and use relative paths, or `export ALLOGEN_ROOT=/path/to/AlloGen`.

> **Python.** Use the environment defined in `environment.yml` / `requirements.txt`. All scripts insert `code/` into `sys.path` via a `_CODE_DIR` boot block, so they work from any CWD.

---

## 1. Checkpoint

The Phase 2 weights `checkpoints/Q_theta_phase2.pt` are the **v4-S2 target-swap split** model used in the paper. Phase 1 (`Q_theta_phase1.pt`) is the DockQ regression intermediate.

Pull via Git LFS:

```bash
git lfs install
git lfs pull
```
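
If `git lfs pull` was skipped, the checkpoint path holds a small text pointer instead of the weights, and loading it fails with an unhelpful error. The sketch below is one way to catch that early. It is not part of the repo: `is_lfs_pointer` and `check_checkpoint` are hypothetical helper names, and the only fact relied on is that Git LFS pointer files begin with the `version https://git-lfs.github.com/spec/v1` line.

```python
from pathlib import Path

# First line of a Git LFS pointer file, per the Git LFS pointer format.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if `path` looks like an un-pulled Git LFS pointer stub."""
    with open(path, "rb") as fh:
        head = fh.read(len(LFS_POINTER_PREFIX))
    return head == LFS_POINTER_PREFIX


def check_checkpoint(path="checkpoints/Q_theta_phase2.pt"):
    """Hypothetical helper: fail early if the checkpoint is missing or a pointer."""
    p = Path(path)
    if not p.is_file():
        raise FileNotFoundError(f"{p} not found; run from the repo root")
    if is_lfs_pointer(p):
        raise RuntimeError(f"{p} is a Git LFS pointer; run `git lfs pull`")
    return p
```

Calling `check_checkpoint()` from the repo root before constructing a scorer turns a confusing deserialization error into a direct hint to pull the LFS objects.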

---

## 2. Score binders

### 2a. Python API

Load the checkpoint, register the holo and apo receptor structures, then score a design against each state:

```python
import sys
sys.path.insert(0, 'code')  # relative path: run this from the repo root

from models.differentiable_features import DifferentiableQTheta


# Load the Phase 2 checkpoint onto the chosen device.
scorer = DifferentiableQTheta(
    checkpoint='checkpoints/Q_theta_phase2.pt',
    device='cuda:0',
)
# Register both receptor conformations (holo and apo).
scorer.load_receptor(
    holo_path='holo.pdb', rec_chain='A',
    apo_path='apo.pdb', apo_chain='A',
)
# Score the same design against each receptor state.
q_holo = scorer.score('design.pdb', binder_chain='B', state='holo')
q_apo = scorer.score('design.pdb', binder_chain='B', state='apo')
# S is the holo score minus the apo score.
print(f'S = {q_holo - q_apo:.3f}')
```
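
To compare several designs, the same two `score` calls can be wrapped in a loop. The following is a minimal sketch rather than a repo utility: `rank_by_selectivity` is a hypothetical name, it relies only on the `score(path, binder_chain=..., state=...)` call shown above, and it assumes the returned value can be converted with `float()`.

```python
def rank_by_selectivity(scorer, design_paths, binder_chain="B"):
    """Score each design in both states; sort by S = q_holo - q_apo, descending.

    `scorer` is any object exposing score(path, binder_chain=..., state=...),
    for example a DifferentiableQTheta with the receptor already loaded.
    """
    rows = []
    for path in design_paths:
        # Assumption: the score is scalar-like, so float() is safe.
        q_holo = float(scorer.score(path, binder_chain=binder_chain, state="holo"))
        q_apo = float(scorer.score(path, binder_chain=binder_chain, state="apo"))
        rows.append({"design": path, "q_holo": q_holo, "q_apo": q_apo,
                     "S": q_holo - q_apo})
    # Largest holo-minus-apo difference first.
    return sorted(rows, key=lambda row: row["S"], reverse=True)


# Usage (file names are placeholders):
# for row in rank_by_selectivity(scorer, ["design_1.pdb", "design_2.pdb"]):
#     print(f"{row['design']}: S = {row['S']:.3f}")
```

Taking the scorer as an argument keeps the receptor loading in one place and lets the ranking logic be exercised with a stub object when no GPU or checkpoint is available.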

### 2b. CLI on the bundled sample

```bash
python code/scripts/evaluate.py \
    --target cam \
    --checkpoint checkpoints/Q_theta_phase2.pt \
    --data_dir data/sample/ \
    --outdir /tmp/cam_inference \
    --no_wandb
```

This scores every binder in `data/sample/cam/test.pkl` and writes `tables/eval_cam_test.json`, which reports Spearman ρ, AUC, and the selectivity gap.
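
The resulting JSON can be inspected from Python. The sketch below makes no assumption about the key names inside the file, since they are not documented here; `load_eval_metrics` and `format_metrics` are hypothetical helpers, and the location of `tables/` relative to `--outdir` is also an assumption.

```python
import json


def load_eval_metrics(json_path):
    """Load the evaluation JSON and return its contents as a Python object."""
    with open(json_path, "r", encoding="utf-8") as fh:
        return json.load(fh)


def format_metrics(metrics):
    """Render top-level entries as 'key: value' lines without assuming key names."""
    if not isinstance(metrics, dict):
        return [repr(metrics)]
    return [f"{key}: {value}" for key, value in metrics.items()]


# Usage (assumes tables/ is created under the --outdir given above):
# metrics = load_eval_metrics("/tmp/cam_inference/tables/eval_cam_test.json")
# print("\n".join(format_metrics(metrics)))
```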

---

## 3. Guidance methods (PXDesign)

The shipped guidance code wraps **PXDesign** as the prior and uses Q_θ as the gradient / classifier signal.

| Script | Method |
|---|---|
| `code/scripts/pxdesign_guidance/langevin_pxdesign.py` | Post-hoc Langevin refinement |
| `code/scripts/pxdesign_guidance/smc_pxdesign.py` | Sequential Monte Carlo |
| `code/scripts/pxdesign_guidance/tds_pxdesign.py` | Twisted Diffusion Sampler |
| `code/scripts/pxdesign_guidance/guided_pxdesign.py` | Classifier guidance |
| `code/scripts/pxdesign_guidance/iterative_refinement.py` | Iterative refinement loop |
| `code/scripts/pxdesign_guidance/qtheta_pxdesign.py` | Q_θ wrapper used by the above |

Common flags:

- `--checkpoint checkpoints/Q_theta_phase2.pt`
- `--holo_pdb your_holo.pdb` / `--apo_pdb your_apo.pdb`
- `--output_dir designs/`
- `--device cuda:0`
- `--seed 42`

Method-specific arguments (steps, batch sizes, guidance scales) are in each script's `argparse` block.
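
When launching several guidance runs, for example over a range of seeds, it can help to assemble the command line in Python. This sketch uses only the common flags listed above; `build_guidance_cmd` and `run_guidance` are hypothetical helpers, not part of the repo, and any method-specific flags passed through `extra_args` should be checked against the script's `argparse` block first.

```python
import subprocess
import sys


def build_guidance_cmd(script, holo_pdb, apo_pdb, output_dir,
                       checkpoint="checkpoints/Q_theta_phase2.pt",
                       device="cuda:0", seed=42, extra_args=()):
    """Assemble an argv list for one guidance script using the common flags."""
    return [
        sys.executable, script,
        "--checkpoint", checkpoint,
        "--holo_pdb", holo_pdb,
        "--apo_pdb", apo_pdb,
        "--output_dir", output_dir,
        "--device", device,
        "--seed", str(seed),
        *extra_args,  # method-specific flags, passed through unchanged
    ]


def run_guidance(cmd):
    """Run the command; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(cmd, check=True)


# Usage (run from the repo root; PDB names are placeholders):
# cmd = build_guidance_cmd(
#     "code/scripts/pxdesign_guidance/smc_pxdesign.py",
#     holo_pdb="your_holo.pdb", apo_pdb="your_apo.pdb", output_dir="designs/",
# )
# run_guidance(cmd)
```

Because `argparse` generates a `--help` flag by default, running a script with `--help` should list its method-specific arguments without opening the source.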

To plug Q_θ into RFdiffusion, Proteina-ComplexA, or any other backbone prior, see `code/scripts/README.md`.

---

## 4. Bundled sample data

`data/sample/cam/test.pkl` is the held-out test split for Calmodulin (CaM), small enough to run on a laptop CPU in under a minute. **It is the only data shipped in the repo.** To score your own targets, use the Python API in §2a, which takes raw PDBs as input.

---

## 5. Training reproduction

Training data, training scripts, and per-target processed graphs are NOT shipped in this public release. The paper's main result (Phase 2 on the **v4-S2 target-swap** split) is provided as a frozen checkpoint at `checkpoints/Q_theta_phase2.pt`. Retraining requires the full pipeline, which must be requested separately.

---

## 6. Citation

```bibtex
@inproceedings{cao2026allogen,
  title     = {AlloGen: State-Selective Scoring for Allosteric Binder Design},
  author    = {Cao, Hanqun and others},
  booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
  year      = {2026}
}
```