---
license: apache-2.0
tags:
- protein-design
- allosteric
- state-selectivity
- guided-generation
- rfdiffusion
- pxdesign
- proteina
library_name: pytorch
---

# AlloGen

State-selectivity scoring + guided generation for allosteric binder design.

🧪 **One-click demo for biology users:** [Open in Colab](https://colab.research.google.com/#fileId=https%3A//huggingface.co/ChatterjeeLab/AlloGen/raw/main/notebooks/AlloGen_CaM_demo.ipynb). Score CaM binders and run Q_θ-guided PXDesign sampling in 5 minutes. The notebook lives at [`notebooks/AlloGen_CaM_demo.ipynb`](notebooks/AlloGen_CaM_demo.ipynb).

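AlloGen trains a scorer Q_θ(X, Y) ∈ (0, 1) that rates how well a binder Y engages a receptor conformation X. Comparing these scores on a target's **holo** (active) state X¹ and its **apo** (inactive) state X⁰ measures how well Y discriminates between the two states. The selectivity score is:

$$S(Y) = Q_\theta(X^1, Y) - Q_\theta(X^0, Y)$$

Because both terms lie in (0, 1), S(Y) lies in (−1, 1); S(Y) > 0 means Y is scored higher against the holo state than against the apo state.

Q_θ serves both as a best-of-K re-ranker and as a guidance signal for generation on top of frozen priors (RFdiffusion, PXDesign, Proteina-ComplexA), via Langevin dynamics, sequential Monte Carlo (SMC), TDS, or classifier guidance.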
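
As a minimal sketch of best-of-K re-ranking, assume Q_θ has already scored K candidate binders against both states (for example with the `scorer.score(...)` calls in the scoring section). The helpers `selectivity` and `rank_by_selectivity` are hypothetical illustrations, not part of the AlloGen codebase, and the numbers are made up:

```python
from typing import List, Sequence, Tuple


def selectivity(q_holo: float, q_apo: float) -> float:
    """S(Y) = Q_theta(X1, Y) - Q_theta(X0, Y); in (-1, 1) when both scores are in (0, 1)."""
    return q_holo - q_apo


def rank_by_selectivity(
    names: Sequence[str],
    q_holo: Sequence[float],
    q_apo: Sequence[float],
    k: int = 1,
) -> List[Tuple[str, float]]:
    """Hypothetical best-of-K re-ranker: return the top-k (name, S) pairs, highest S first."""
    if not (len(names) == len(q_holo) == len(q_apo)):
        raise ValueError("names, q_holo and q_apo must have the same length")
    scored = [(n, selectivity(h, a)) for n, h, a in zip(names, q_holo, q_apo)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Illustrative numbers only, not real Q_theta outputs.
top = rank_by_selectivity(
    names=["design_0", "design_1", "design_2"],
    q_holo=[0.82, 0.91, 0.40],
    q_apo=[0.30, 0.88, 0.10],
    k=2,
)
print(top)
```

Here `design_1` has the highest holo score but is dropped, because it scores almost as well on the apo state; ranking by S rather than by Q_θ(X¹, Y) alone is what enforces state selectivity.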

This repository accompanies the paper *AlloGen: Conformation-Selective Binder Generation with Differential State Scoring* (arXiv 2026).

## Installation

```bash
conda env create -f environment.yml
conda activate allogen
```

Or pip-only:

```bash
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
```

Python 3.10 + PyTorch 2.x are required. A CUDA GPU is recommended for guidance, but CPU works for scoring single designs.
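
A quick way to confirm that an environment matches these requirements is a small check like the sketch below. It treats any Python ≥ 3.10 as acceptable (an assumption; the pinned `environment.yml` is authoritative) and still returns a report if PyTorch is not installed. The function name is hypothetical:

```python
import sys
from typing import Any, Dict


def check_environment() -> Dict[str, Any]:
    """Compare the local setup with the README's requirements (Python 3.10, PyTorch 2.x, optional CUDA)."""
    report: Dict[str, Any] = {
        "python": ".".join(str(v) for v in sys.version_info[:3]),
        # Assumption: any Python >= 3.10 is acceptable; environment.yml is authoritative.
        "python_ok": sys.version_info[:2] >= (3, 10),
        "torch": None,
        "torch_ok": False,
        "cuda": False,
    }
    try:
        import torch
    except ImportError:
        return report  # PyTorch missing: torch_ok stays False
    report["torch"] = str(torch.__version__)
    report["torch_ok"] = int(str(torch.__version__).split(".")[0]) >= 2
    report["cuda"] = torch.cuda.is_available()
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key:<10} {value}")
```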

## Inference quickstart

```bash
# Score the bundled CaM inference sample against the v4-S2 (target-swap) checkpoint
python code/scripts/evaluate.py \
  --target cam \
  --checkpoint checkpoints/Q_theta_phase2.pt \
  --data_dir data/sample/ \
  --outdir /tmp/cam_inference \
  --no_wandb
```

See [`inference.md`](inference.md) for the scoring API + guidance command lines.

## Repo layout

```
code/
  data/         dataset / graph construction, PDB I/O, target YAMLs
  models/       Q_θ scorer (graph transformer) + differentiable wrapper
  trainers/     two-phase training loop (DockQ regression + selectivity)
  utils/        PDB I/O, backbone frames, SAM optimizer
  scripts/      evaluate, rescore, PXDesign guidance (see scripts/README.md)
checkpoints/    Q_θ paper weights (v4-S2 target-swap split, via Git LFS)
data/sample/    tiny CaM inference sample (test split only)
```

## Checkpoints

Paper weights for the **v4-S2 target-swap** split are bundled via **Git LFS**:

```bash
git lfs install
git lfs pull
```

| File | Use |
|---|---|
| `checkpoints/Q_theta_phase1.pt` | Phase 1 (DockQ regression) intermediate checkpoint |
| `checkpoints/Q_theta_phase2.pt` | Phase 2 (selectivity), main paper result |
| `checkpoints/Q_theta_train_curve.csv` | Training curve metadata |
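
If `git lfs pull` was skipped, the files in `checkpoints/` are small text pointer stubs and the `.pt` files will fail to load. A check like the sketch below can catch this early. It relies only on the public Git LFS pointer format (pointer files are tiny and begin with `version https://git-lfs.github.com/spec/v1`); the helper name `is_lfs_pointer` is hypothetical:

```python
from pathlib import Path
from typing import Union

LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path: Union[str, Path]) -> bool:
    """True if `path` is still a Git LFS pointer stub rather than the real file."""
    path = Path(path)
    # Pointer files are under 1 KiB; real checkpoints are far larger.
    if path.stat().st_size > 1024:
        return False
    with path.open("rb") as fh:
        return fh.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX


if __name__ == "__main__":
    for name in ("Q_theta_phase1.pt", "Q_theta_phase2.pt", "Q_theta_train_curve.csv"):
        p = Path("checkpoints") / name
        if not p.exists():
            print(f"{p}: missing")
        elif is_lfs_pointer(p):
            print(f"{p}: LFS pointer only, run `git lfs pull`")
        else:
            print(f"{p}: OK")
```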

## Scoring a single design

```python
import sys
sys.path.insert(0, 'code')  # make code/ importable when running from the repo root

import torch
from models.differentiable_features import DifferentiableQTheta

# Phase 2 (selectivity) checkpoint; fall back to CPU, which is fine for single designs.
device = 'cuda:0' if torch.cuda.is_available() else 'cpu'
scorer = DifferentiableQTheta(
    checkpoint='checkpoints/Q_theta_phase2.pt',
    device=device,
)

# Register both receptor states: holo (X¹) and apo (X⁰).
scorer.load_receptor(
    holo_path='your_holo.pdb', rec_chain='A',
    apo_path='your_apo.pdb', apo_chain='A',
)

# Score binder chain B against each state, then S = Q_θ(X¹, Y) − Q_θ(X⁰, Y).
q_holo = scorer.score('design.pdb', binder_chain='B', state='holo')
q_apo = scorer.score('design.pdb', binder_chain='B', state='apo')
print(f'S = {q_holo - q_apo:.3f}')
```
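
To score a directory of designs with the same calls, a thin loop like the hypothetical `score_designs` below is enough. It assumes only that `scorer.score(path, binder_chain=..., state=...)` behaves as in the snippet above and returns a float or a 0-dimensional tensor; `DesignScore`, `SupportsScore`, and sorting by S are illustrative choices, not AlloGen APIs:

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Iterable, List, Protocol, Union


class SupportsScore(Protocol):
    """Anything with the scoring call pattern used above (e.g. DifferentiableQTheta)."""

    def score(self, pdb_path: str, *, binder_chain: str, state: str) -> float: ...


@dataclass
class DesignScore:
    path: str
    q_holo: float
    q_apo: float

    @property
    def selectivity(self) -> float:
        # S = Q_theta(X1, Y) - Q_theta(X0, Y)
        return self.q_holo - self.q_apo


def score_designs(
    scorer: SupportsScore,
    pdb_paths: Iterable[Union[str, Path]],
    binder_chain: str = "B",
) -> List[DesignScore]:
    """Score every design against both states and sort by S, most holo-selective first."""
    results = []
    for path in pdb_paths:
        # float() also converts a 0-dim torch tensor, in case score() returns one.
        q_holo = float(scorer.score(str(path), binder_chain=binder_chain, state="holo"))
        q_apo = float(scorer.score(str(path), binder_chain=binder_chain, state="apo"))
        results.append(DesignScore(str(path), q_holo, q_apo))
    return sorted(results, key=lambda r: r.selectivity, reverse=True)


# Usage with the scorer configured above (the designs/ directory is a placeholder):
# for r in score_designs(scorer, sorted(Path("designs").glob("*.pdb"))):
#     print(f"{r.path}\tS = {r.selectivity:.3f}")
```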

## Guidance methods

The shipped guidance code wraps **PXDesign** as the prior and uses Q_θ as the gradient / classifier signal. All four method variants (Langevin, SMC, TDS, classifier guidance) live in `code/scripts/pxdesign_guidance/`.
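
For intuition about the SMC variant, here is a simplified sketch of one reweight-and-resample step, assuming particle weights proportional to exp(λ·S) with a hypothetical inverse temperature `lam`. A full SMC or TDS sampler would typically use incremental weights between consecutive denoising steps, and the shipped code may use a different potential or resampling scheme; all helper names are illustrative:

```python
import math
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def selectivity_weights(s_values: Sequence[float], lam: float) -> List[float]:
    """Normalized weights w_i proportional to exp(lam * S_i), stabilized by subtracting the max."""
    m = max(s_values)
    unnorm = [math.exp(lam * (s - m)) for s in s_values]
    total = sum(unnorm)
    return [u / total for u in unnorm]


def effective_sample_size(weights: Sequence[float]) -> float:
    """ESS = 1 / sum(w_i^2) for normalized weights; equals N when weights are uniform."""
    return 1.0 / sum(w * w for w in weights)


def resample(particles: Sequence[T], weights: Sequence[float], rng: random.Random) -> List[T]:
    """Multinomial resampling: draw len(particles) particles with replacement."""
    return rng.choices(particles, weights=weights, k=len(particles))


if __name__ == "__main__":
    rng = random.Random(0)
    particles = ["p0", "p1", "p2", "p3"]  # stand-ins for partially denoised designs
    s_values = [0.40, -0.10, 0.05, 0.35]  # illustrative S values, not real outputs
    weights = selectivity_weights(s_values, lam=5.0)
    # Resample only when the weights have degenerated (a common ESS heuristic).
    if effective_sample_size(weights) < 0.5 * len(particles):
        particles = resample(particles, weights, rng)
    print(weights, particles)
```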

See [`inference.md`](inference.md) §3 for command lines.

To deploy Q_θ with **RFdiffusion**, **Proteina-ComplexA**, or any other backbone prior, see [`code/scripts/README.md`](code/scripts/README.md). The `DifferentiableQTheta` wrapper exposes ∇_Y S(Y), the gradient of the selectivity score with respect to the binder, and the PXDesign code is a worked template to mirror.
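
Whatever the prior, the core change is adding the scaled selectivity gradient λ∇S to the prior's update. As a generic sketch (not necessarily the exact update in `pxdesign_guidance/`), one unadjusted Langevin step targeting p(x)·exp(λ·S(x)) on binder coordinates x is x ← x + η(∇ log p(x) + λ∇S(x)) + √(2η)·ξ with ξ ~ N(0, I). Here `prior_score` and `selectivity_grad` are hypothetical callables on flat coordinate lists; in practice the first would come from the frozen prior and the second from `DifferentiableQTheta` via autograd:

```python
import math
import random
from functools import partial
from typing import Callable, List, Optional

Vector = List[float]


def guided_langevin_step(
    x: Vector,
    prior_score: Callable[[Vector], Vector],       # hypothetical: grad log p(x) from the prior
    selectivity_grad: Callable[[Vector], Vector],  # hypothetical: grad S(x) from Q_theta
    step_size: float,                              # eta
    guidance_scale: float,                         # lambda
    noise: Optional[Callable[[], float]] = None,   # draws one N(0, 1) sample per coordinate
) -> Vector:
    """One unadjusted Langevin step targeting p(x) * exp(guidance_scale * S(x))."""
    draw = noise if noise is not None else partial(random.Random().gauss, 0.0, 1.0)
    prior = prior_score(x)
    grad_s = selectivity_grad(x)
    if not (len(x) == len(prior) == len(grad_s)):
        raise ValueError("x, prior score and selectivity gradient must have equal length")
    scale = math.sqrt(2.0 * step_size)
    return [
        xi + step_size * (pi + guidance_scale * gi) + scale * draw()
        for xi, pi, gi in zip(x, prior, grad_s)
    ]
```

For example, with a toy standard-normal prior (`prior_score=lambda x: [-v for v in x]`) and zero noise, a constant positive ∇S pulls every coordinate toward higher S at each step, and `guidance_scale` controls how strongly it competes with the prior.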

## Citation

```bibtex
@article{cao2026allogen,
  title         = {AlloGen: Conformation-Selective Binder Generation with Differential State Scoring},
  author        = {Cao, Hanqun and Quinn, Zachary and Pal, Aastha and Kimura, Sumi and Zhang, Jingjie and Heng, Pheng Ann and Chatterjee, Pranam},
  year          = {2026},
  eprint        = {2606.05474},
  archivePrefix = {arXiv},
  primaryClass  = {q-bio.BM}
}
```