# Qwen3.5-35B-A3B-Turbo-SWE v0.0.1
A coding-focused fine-tune of Qwen3.5-35B-A3B using SFT + GRPO on a mixture of real-world coding agent trajectories from Codex, Claude Code, and OpenCode sessions.
## What is this?

Qwen3.5-35B-A3B is a 35B-parameter MoE model with only ~3B active parameters per token. This fine-tune improves its coding capabilities by training on real-world software engineering trajectories.
## Training Pipeline
- Data Extraction: 4,551 training pairs extracted from 4,756 coding agent sessions (Codex, Claude Code, OpenCode)
- Labeling: 3,580 pairs labeled with Claude Opus 4.6 for quality scoring (avg reward 0.477)
- SFT Phase: bf16 LoRA (rank 64) on 2,674 high-quality pairs. Loss: 1.438 -> 0.509 (-65%)
- GRPO Phase: Group Relative Policy Optimization with G=8 sampling, 200 prompts x 8 completions = 1,600 scored samples. Execution-based reward function (compile + run).
- GRPO Weight Update: rejection fine-tuning (RFT) on the 161 best completions (reward >= 0.5). Loss converged to 1.97.
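The execution-based reward can be sketched roughly as below. The staging (parse, then compile, then run) follows the pipeline description, but the partial-credit values and timeout are illustrative assumptions, not the trained scorer:

```python
import ast
import subprocess
import sys
import tempfile

def execution_reward(code: str, timeout: float = 5.0) -> float:
    """Staged execution-based reward: parse -> compile -> run.
    Partial-credit values here are illustrative, not the actual ones."""
    try:
        ast.parse(code)  # stage 1: does it parse?
    except SyntaxError:
        return 0.0
    try:
        compile(code, "<candidate>", "exec")  # stage 2: does it compile?
    except Exception:
        return 0.2
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:  # stage 3: does it run cleanly in a subprocess?
        proc = subprocess.run([sys.executable, path],
                              capture_output=True, timeout=timeout)
        return 1.0 if proc.returncode == 0 else 0.5
    except subprocess.TimeoutExpired:
        return 0.3
```

Running candidates in a subprocess with a timeout keeps broken or looping completions from stalling the sampling loop.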
## Architecture
| Property | Value |
|---|---|
| Base Model | Qwen3.5-35B-A3B |
| Total Parameters | 35B |
| Active Parameters | ~3B per token |
| Experts | 256 total, 8 routed + 1 shared |
| Attention | Hybrid: 30x Gated DeltaNet + 10x Full Attention |
| Context | 262K native |
| Quantization | Q4_K_M (4.88 BPW) |
## Performance
| Metric | Value |
|---|---|
| Prompt eval | 139 tok/s (M3 Ultra) |
| Decode | 70.7 tok/s (M3 Ultra) |
| Model size | 20GB (Q4_K_M) |
| VRAM usage | ~21GB |
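As a sanity check, multiplying the total parameter count by the Q4_K_M bits-per-weight figure reproduces the listed model size:

```python
params = 35e9  # total parameters (all experts are stored, even though only ~3B are active per token)
bpw = 4.88     # Q4_K_M bits per weight, as listed above
size_gib = params * bpw / 8 / 2**30
print(f"{size_gib:.1f} GiB")  # ~19.9 GiB, i.e. the ~20GB Q4_K_M file
```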
## Usage

### llama.cpp

```bash
llama-server -m Qwen3.5-35B-A3B-Turbo-SWE-v0.0.1-Q4_K_M.gguf --port 8080
```
### Ollama

```bash
echo 'FROM Qwen3.5-35B-A3B-Turbo-SWE-v0.0.1-Q4_K_M.gguf' > Modelfile
ollama create turbo-swe -f Modelfile
ollama run turbo-swe
```
### OpenAI-compatible API

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="none")
response = client.chat.completions.create(
    model="turbo-swe",
    messages=[{"role": "user", "content": "Write a prime checker in Python"}],
)
print(response.choices[0].message.content)
```
## Training Details
- Hardware: RTX PRO 6000 Blackwell (96GB VRAM)
- Training Data: Mixture of real coding agent trajectories from Codex, Claude Code, and OpenCode sessions (coding, debugging, refactoring)
- SFT Duration: ~3 hours (670 steps)
- GRPO Sampling: ~2 hours (1,600 completions)
- GRPO Update: ~12 minutes (123 steps)
- Reward Function: Execution-based (parse + compile + run)
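The group-relative scoring that gives GRPO its name, and the reward threshold used to pick the RFT set, can be sketched as follows (the reward values are made up for illustration; real rewards come from the execution-based scorer):

```python
from statistics import mean, pstdev

def group_advantages(rewards):
    """GRPO advantage: each completion's reward is normalized against
    the G rewards sampled for the same prompt (G=8 in this run)."""
    mu, sigma = mean(rewards), pstdev(rewards)
    if sigma == 0:
        return [0.0] * len(rewards)
    return [(r - mu) / sigma for r in rewards]

def rft_filter(prompt, completions, threshold=0.5):
    """Keep only completions whose execution reward clears the
    threshold; these become the RFT training pairs."""
    return [(prompt, text) for text, r in completions if r >= threshold]

# Illustrative group of G=4 (the real run used G=8 over 200 prompts)
print(group_advantages([1.0, 0.5, 0.0, 0.5]))
print(rft_filter("fix the bug", [("patch A", 1.0), ("patch B", 0.2)]))
```

Normalizing within each prompt's group means a completion is rewarded for being better than its siblings, not for the prompt being easy.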
## License
Apache 2.0 (same as base model)