How to use worstcoder/SD3.5M-DiffusionNFT-MultiReward with Diffusers:

```shell
pip install -U diffusers transformers accelerate
```

```python
import torch
from diffusers import DiffusionPipeline

# switch to "mps" for apple devices
pipe = DiffusionPipeline.from_pretrained("worstcoder/SD3.5M-DiffusionNFT-MultiReward", dtype=torch.bfloat16, device_map="cuda")

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
image = pipe(prompt).images[0]
```
| base_model: | |
| - stabilityai/stable-diffusion-3.5-medium | |
| library_name: diffusers | |
| pipeline_tag: text-to-image | |
| # Model Card | |
| ## Model Details | |
| ### Model Description | |
| This is a reproduction of a LoRA for SD3.5-Medium, post-trained with DiffusionNFT on multiple reward models, as presented in the paper [Diffusion Negative-aware FineTuning (DiffusionNFT)](https://huggingface.co/papers/2509.16117). | |
| ### Paper Abstract | |
| Online reinforcement learning (RL) has been central to post-training language | |
| models, but its extension to diffusion models remains challenging due to | |
| intractable likelihoods. Recent works discretize the reverse sampling process | |
| to enable GRPO-style training, yet they inherit fundamental drawbacks, | |
| including solver restrictions, forward-reverse inconsistency, and complicated | |
| integration with classifier-free guidance (CFG). We introduce Diffusion | |
| Negative-aware FineTuning (DiffusionNFT), a new online RL paradigm that | |
| optimizes diffusion models directly on the forward process via flow matching. | |
| DiffusionNFT contrasts positive and negative generations to define an implicit | |
| policy improvement direction, naturally incorporating reinforcement signals | |
| into the supervised learning objective. This formulation enables training with | |
| arbitrary black-box solvers, eliminates the need for likelihood estimation, and | |
| requires only clean images rather than sampling trajectories for policy | |
| optimization. DiffusionNFT is up to 25× more efficient than FlowGRPO in | |
| head-to-head comparisons, while being CFG-free. For instance, DiffusionNFT | |
| improves the GenEval score from 0.24 to 0.98 within 1k steps, while FlowGRPO | |
| achieves 0.95 with over 5k steps and additional CFG employment. By leveraging | |
| multiple reward models, DiffusionNFT significantly boosts the performance of | |
| SD3.5-Medium in every benchmark tested. | |
| ### Model Sources | |
| - **Repository:** https://github.com/NVlabs/DiffusionNFT | |
| - **Paper:** https://huggingface.co/papers/2509.16117 | |
| - **Project Page:** https://research.nvidia.com/labs/dir/DiffusionNFT | |
| ## Uses | |
| Please refer to the evaluation script in the GitHub repository: https://github.com/NVlabs/DiffusionNFT | |
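
Because the model description calls this checkpoint a LoRA of SD3.5-Medium, one way to try it outside the evaluation script might be to load the base pipeline and attach the adapter on top. The following is a minimal sketch under several assumptions, not the authors' procedure: it assumes the files in this repository are in a layout that `load_lora_weights` in Diffusers can read, and that `guidance_scale=1.0` (which disables classifier-free guidance in the SD3 pipeline) matches the CFG-free setting mentioned in the abstract. The helper names `build_pipeline` and `generate`, the step count, and the seed are illustrative choices only.

```python
BASE_MODEL = "stabilityai/stable-diffusion-3.5-medium"   # base model listed in the card metadata
LORA_REPO = "worstcoder/SD3.5M-DiffusionNFT-MultiReward"  # this repository


def build_pipeline(device="cuda"):
    """Load the SD3.5-Medium base pipeline and attach the LoRA (hypothetical helper)."""
    # Imports are local so the module can be inspected without torch/diffusers installed.
    import torch
    from diffusers import StableDiffusion3Pipeline

    pipe = StableDiffusion3Pipeline.from_pretrained(BASE_MODEL, torch_dtype=torch.bfloat16)
    # Assumption: the adapter weights here are stored in a format this loader accepts.
    pipe.load_lora_weights(LORA_REPO)
    return pipe.to(device)


def generate(pipe, prompt, steps=40, seed=0):
    """Sample one image without classifier-free guidance (hypothetical helper)."""
    import torch

    generator = torch.Generator(device="cpu").manual_seed(seed)
    result = pipe(
        prompt,
        num_inference_steps=steps,  # arbitrary default, not taken from the paper
        guidance_scale=1.0,         # assumption: CFG off, per the "CFG-free" claim
        generator=generator,
    )
    return result.images[0]
```

A call such as `generate(build_pipeline(), "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k")` would return a PIL image if the assumptions above hold. If the adapter fails to load this way, the evaluation script in the GitHub repository remains the reference path.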