	---
	license: apache-2.0
	tags:
	- protein-design
	- allosteric
	- state-selectivity
	- guided-generation
	- rfdiffusion
	- pxdesign
	- proteina
	library_name: pytorch
	---

	# AlloGen


	![allogen](https://cdn-uploads.huggingface.co/production/uploads/64cd5b3f0494187a9e8b7c69/et5-pzgiGiAH0uVqvs8tM.png)

	State-selectivity scoring + guided generation for allosteric binder design.

	🧪 One-click demo for biology users:
	[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/#fileId=https%3A//huggingface.co/ChatterjeeLab/AlloGen/raw/main/notebooks/AlloGen_CaM_demo.ipynb) — score CaM binders and run Q_θ-guided PXDesign sampling in 5 minutes. Notebook lives at [`notebooks/AlloGen_CaM_demo.ipynb`](notebooks/AlloGen_CaM_demo.ipynb).

	AlloGen trains a scorer Q_θ(X, Y) ∈ (0,1) that ranks how well a binder Y discriminates a target's holo (active) state X¹ from its apo (inactive) state X⁰. The selectivity score is:

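	S(Y) = Q_θ(X¹, Y) − Q_θ(X⁰, Y)

	Because Q_θ ∈ (0,1), S(Y) lies in (−1, 1); a positive S(Y) means Q_θ scores the binder higher against the holo state than against the apo state.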

	Q_θ serves as both a re-ranker (best-of-K) and a gradient signal for guided generation on top of frozen priors (RFdiffusion, PXDesign, Proteina-ComplexA) via Langevin, SMC, TDS, or classifier guidance.
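
As a minimal sketch of the best-of-K re-ranking described above, the snippet below sorts candidates by S(Y) computed from already-available holo and apo scores. The `ScoredDesign` container, the `best_of_k` helper, and the numeric scores are illustrative assumptions, not part of the AlloGen codebase.

```python
from typing import NamedTuple


class ScoredDesign(NamedTuple):
    """Hypothetical container for one binder and its two Q_theta scores."""
    name: str
    q_holo: float  # Q_theta(X1, Y), score against the holo state
    q_apo: float   # Q_theta(X0, Y), score against the apo state

    @property
    def selectivity(self) -> float:
        # S(Y) = Q_theta(X1, Y) - Q_theta(X0, Y)
        return self.q_holo - self.q_apo


def best_of_k(designs, k=1):
    """Return the k designs with the highest selectivity, best first."""
    return sorted(designs, key=lambda d: d.selectivity, reverse=True)[:k]


# Made-up scores, for illustration only.
candidates = [
    ScoredDesign("design_a", q_holo=0.80, q_apo=0.70),
    ScoredDesign("design_b", q_holo=0.60, q_apo=0.10),
    ScoredDesign("design_c", q_holo=0.30, q_apo=0.50),
]
top = best_of_k(candidates, k=2)
for d in top:
    print(f"{d.name}: S = {d.selectivity:+.2f}")
```

In this toy data `design_a` has the highest holo score but ranks below `design_b`, because ranking uses the difference between states rather than either score alone.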

	This repository accompanies the paper *AlloGen: Conformation-Selective Binder Generation with Differential State Scoring* (arXiv 2026).

	## Installation

	```bash
	conda env create -f environment.yml
	conda activate allogen
	```

	Or pip-only:

	```bash
	python -m venv .venv && source .venv/bin/activate
	pip install -r requirements.txt
	```

	Python 3.10 + PyTorch 2.x are required. A CUDA GPU is recommended for guidance, but CPU works for scoring single designs.
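
To compare an environment against these requirements, a small check along the following lines may help. It is a sketch, not a script shipped with the repository; `describe_environment` is a hypothetical helper that only reports what it finds and leaves the comparison to the reader.

```python
import importlib.util
import platform


def describe_environment():
    """Report the Python version, the PyTorch version (if installed), and CUDA availability."""
    info = {"python": platform.python_version(), "torch": None, "cuda": False}
    # Only import torch when it is installed, so the check also runs in a bare environment.
    if importlib.util.find_spec("torch") is not None:
        import torch

        info["torch"] = torch.__version__
        info["cuda"] = bool(torch.cuda.is_available())
    return info


env = describe_environment()
print(env)
```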

	## Inference quickstart

	```bash
	# Score the bundled CaM inference sample against the v4-S2 (target-swap) checkpoint
	python code/scripts/evaluate.py \
	    --target cam \
	    --checkpoint checkpoints/Q_theta_phase2.pt \
	    --data_dir data/sample/ \
	    --outdir /tmp/cam_inference \
	    --no_wandb
	```

	See [`inference.md`](inference.md) for the scoring API + guidance command lines.

	## Repo layout

	```
	code/
	  data/        dataset / graph construction, PDB I/O, target YAMLs
	  models/      Q_θ scorer (graph transformer) + differentiable wrapper
	  trainers/    two-phase training loop (DockQ regression + selectivity)
	  utils/       PDB I/O, backbone frames, SAM optimizer
	  scripts/     evaluate, rescore, PXDesign guidance (see scripts/README.md)
	checkpoints/   Q_θ paper weights (v4-S2 target-swap split, via Git LFS)
	data/sample/   tiny CaM inference sample (test split only)
	```

	## Checkpoints

	Paper weights for the v4-S2 target-swap split are bundled via Git LFS:

	```bash
	git lfs install
	git lfs pull
	```

	| File | Use |
	|---|---|
	| `checkpoints/Q_theta_phase1.pt` | Phase 1 (DockQ regression) intermediate checkpoint |
	| `checkpoints/Q_theta_phase2.pt` | Phase 2 (selectivity), main paper result |
	| `checkpoints/Q_theta_train_curve.csv` | Training curve metadata |
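
If `git lfs pull` has not run, files tracked by Git LFS remain small text pointer stubs that begin with the line `version https://git-lfs.github.com/spec/v1`, and loading one as a checkpoint will fail. The sketch below detects that case before any model code runs. The helper names `is_lfs_pointer` and `unresolved_checkpoints` are assumptions made for illustration, not part of the repository.

```python
from pathlib import Path

# First line of every Git LFS pointer file, as defined by the Git LFS pointer format.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if the file at `path` is a Git LFS pointer stub, not real data."""
    with open(path, "rb") as fh:
        return fh.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX


def unresolved_checkpoints(paths):
    """List the existing files among `paths` that are still LFS pointer stubs."""
    return [str(p) for p in map(Path, paths) if p.exists() and is_lfs_pointer(p)]


stubs = unresolved_checkpoints([
    "checkpoints/Q_theta_phase1.pt",
    "checkpoints/Q_theta_phase2.pt",
])
if stubs:
    print("Still LFS pointers; run `git lfs pull`:", ", ".join(stubs))
```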

	## Scoring a single design

	```python
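	import sys

	sys.path.insert(0, 'code')  # make the repo's code/ directory importable
	from models.differentiable_features import DifferentiableQTheta

	scorer = DifferentiableQTheta(
	    checkpoint='checkpoints/Q_theta_phase2.pt',
	    device='cuda:0',  # per Installation, CPU also works for scoring single designs
	)
	# Register both conformations of the target; later calls select one via `state`.
	scorer.load_receptor(
	    holo_path='your_holo.pdb', rec_chain='A',
	    apo_path='your_apo.pdb', apo_chain='A',
	)
	# Score the same binder chain against each receptor state.
	q_holo = scorer.score('design.pdb', binder_chain='B', state='holo')
	q_apo = scorer.score('design.pdb', binder_chain='B', state='apo')
	print(f'S = {q_holo - q_apo:.3f}')  # S(Y) = Q_θ(holo, Y) − Q_θ(apo, Y)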
	```
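
The same two calls extend to a folder of designs. The sketch below uses only the `scorer.score(path, binder_chain=..., state=...)` call shown above. It assumes the return value converts with `float()` and that one scorer with a loaded receptor can be reused across designs; the `score_designs` helper is hypothetical.

```python
from pathlib import Path


def score_designs(scorer, design_paths, binder_chain="B"):
    """Score each design against both receptor states and rank by selectivity S."""
    rows = []
    for path in design_paths:
        # Assumption: score() returns a number or 0-dim tensor that float() accepts.
        q_holo = float(scorer.score(str(path), binder_chain=binder_chain, state="holo"))
        q_apo = float(scorer.score(str(path), binder_chain=binder_chain, state="apo"))
        rows.append({
            "design": Path(path).name,
            "q_holo": q_holo,
            "q_apo": q_apo,
            "S": q_holo - q_apo,
        })
    # Highest selectivity first.
    rows.sort(key=lambda row: row["S"], reverse=True)
    return rows


# Usage, with `scorer` built and load_receptor() called as in the example above:
#   for row in score_designs(scorer, sorted(Path("designs").glob("*.pdb"))):
#       print(f"{row['design']}: S = {row['S']:.3f}")
```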

	## Guidance methods

	The shipped guidance code wraps PXDesign as the prior and uses Q_θ as the gradient / classifier signal. All four method variants (Langevin, SMC, TDS, classifier guidance) live in `code/scripts/pxdesign_guidance/`.

	See [`inference.md`](inference.md) §3 for command lines.

	To deploy Q_θ with RFdiffusion, Proteina-ComplexA, or any other backbone prior, see [`code/scripts/README.md`](code/scripts/README.md) — Q_θ exposes `DifferentiableQTheta` for `∇_x S(x)`, and the PXDesign code is a worked template to mirror.
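
To make the role of `∇_x S(x)` concrete, here is a toy sketch of an unadjusted Langevin update, `x' = x + η ∇S(x) + sqrt(2 η T) ε`, written in pure Python. It is not the shipped guidance code: `toy_score` and `toy_grad` are stand-ins for S and its gradient, the step size and temperature are arbitrary, and the way the real scripts interleave such a step with PXDesign's sampler is not covered by this README.

```python
import math
import random


def langevin_step(x, grad_fn, step_size, temperature=1.0, rng=random):
    """One unadjusted Langevin step that drifts x uphill on the score S.

    x' = x + step_size * grad S(x) + sqrt(2 * step_size * temperature) * noise
    """
    grad = grad_fn(x)
    sigma = math.sqrt(2.0 * step_size * temperature)
    return [xi + step_size * gi + sigma * rng.gauss(0.0, 1.0)
            for xi, gi in zip(x, grad)]


# Toy stand-in for S(x): a concave bowl with its maximum at CENTER (assumption).
CENTER = [1.0, -2.0, 0.5]


def toy_score(x):
    return -sum((xi - ci) ** 2 for xi, ci in zip(x, CENTER))


def toy_grad(x):
    return [-2.0 * (xi - ci) for xi, ci in zip(x, CENTER)]


x = [0.0, 0.0, 0.0]
rng = random.Random(0)
start_score = toy_score(x)
for _ in range(200):
    # temperature=0.0 removes the noise term, leaving plain gradient ascent.
    x = langevin_step(x, toy_grad, step_size=0.05, temperature=0.0, rng=rng)
print(f"score: {start_score:.3f} -> {toy_score(x):.6f}")
```

With a positive temperature the same update samples around high-score regions instead of converging to a single point, which is the behavior a guidance scheme typically wants from the noise term.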

	## Citation

	```bibtex
	@article{cao2026allogen,
	  title = {AlloGen: Conformation-Selective Binder Generation with Differential State Scoring},
	  author = {Cao, Hanqun and Quinn, Zachary and Pal, Aastha and Kimura, Sumi and Zhang, Jingjie and Heng, Pheng Ann and Chatterjee, Pranam},
	  year = {2026},
	  eprint = {2606.05474},
	  archivePrefix = {arXiv},
	  primaryClass = {q-bio.BM}
	}
	```