# Qwen3.5-35B-A3B-Turbo-SWE v0.0.1
A coding-focused fine-tune of Qwen3.5-35B-A3B using SFT + GRPO on a mixture of real-world coding agent trajectories from Codex, Claude Code, and OpenCode sessions.
## What is this?

Qwen3.5-35B-A3B is a 35B-parameter MoE model with only ~3B active parameters per token. This fine-tune improves its coding capabilities by training on real-world software engineering trajectories.
## Training Pipeline
- Data Extraction: 4,551 training pairs extracted from 4,756 coding agent sessions (Codex, Claude Code, OpenCode)
- Labeling: 3,580 pairs labeled with Claude Opus 4.6 for quality scoring (avg reward 0.477)
- SFT Phase: bf16 LoRA (rank 64) on 2,674 high-quality pairs. Loss: 1.438 -> 0.509 (-65%)
- GRPO Phase: Group Relative Policy Optimization with G=8 sampling, 200 prompts x 8 completions = 1,600 scored samples. Execution-based reward function (compile + run).
- GRPO Weight Update: rejection fine-tuning (RFT) on the 161 best completions (reward >= 0.5). Loss converged to 1.97.
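The execution-based reward can be sketched roughly as below. The staging (parse, then compile, then run) follows the pipeline description, but the partial-credit values and timeout are illustrative assumptions, not the trained scorer:

```python
import ast
import subprocess
import sys
import tempfile

def execution_reward(code: str, timeout: float = 5.0) -> float:
    """Staged execution-based reward: parse -> compile -> run.
    Partial-credit values here are illustrative, not the actual ones."""
    try:
        ast.parse(code)  # stage 1: does it parse?
    except SyntaxError:
        return 0.0
    try:
        compile(code, "<candidate>", "exec")  # stage 2: does it compile?
    except Exception:
        return 0.2
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:  # stage 3: does it run cleanly in a subprocess?
        proc = subprocess.run([sys.executable, path],
                              capture_output=True, timeout=timeout)
        return 1.0 if proc.returncode == 0 else 0.5
    except subprocess.TimeoutExpired:
        return 0.3
```

Running candidates in a subprocess with a timeout keeps broken or looping completions from stalling the sampling loop.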
## Architecture
| Property | Value |
|---|---|
| Base Model | Qwen3.5-35B-A3B |
| Total Parameters | 35B |
| Active Parameters | ~3B per token |
| Experts | 256 total, 8 routed + 1 shared |
| Attention | Hybrid: 30x Gated DeltaNet + 10x Full Attention |
| Context | 262K native |
| Quantization | Q4_K_M (4.88 BPW) |
## Performance
| Metric | Value |
|---|---|
| Prompt eval | 139 tok/s (M3 Ultra) |
| Decode | 70.7 tok/s (M3 Ultra) |
| Model size | 20GB (Q4_K_M) |
| VRAM usage | ~21GB |
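As a sanity check, multiplying the total parameter count by the Q4_K_M bits-per-weight figure reproduces the listed model size:

```python
params = 35e9  # total parameters (all experts are stored, even though only ~3B are active per token)
bpw = 4.88     # Q4_K_M bits per weight, as listed above
size_gib = params * bpw / 8 / 2**30
print(f"{size_gib:.1f} GiB")  # ~19.9 GiB, i.e. the ~20GB Q4_K_M file
```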
## Usage

### llama.cpp

```bash
llama-server -m Qwen3.5-35B-A3B-Turbo-SWE-v0.0.1-Q4_K_M.gguf --port 8080
```
### Ollama

```bash
echo 'FROM Qwen3.5-35B-A3B-Turbo-SWE-v0.0.1-Q4_K_M.gguf' > Modelfile
ollama create turbo-swe -f Modelfile
ollama run turbo-swe
```
### OpenAI-compatible API

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="none")
response = client.chat.completions.create(
    model="turbo-swe",
    messages=[{"role": "user", "content": "Write a prime checker in Python"}],
)
print(response.choices[0].message.content)
```
## Training Details
- Hardware: RTX PRO 6000 Blackwell (96GB VRAM)
- Training Data: Mixture of real coding agent trajectories from Codex, Claude Code, and OpenCode sessions (coding, debugging, refactoring)
- SFT Duration: ~3 hours (670 steps)
- GRPO Sampling: ~2 hours (1,600 completions)
- GRPO Update: ~12 minutes (123 steps)
- Reward Function: Execution-based (parse + compile + run)
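The group-relative scoring that gives GRPO its name, and the reward threshold used to pick the RFT set, can be sketched as follows (the reward values are made up for illustration; real rewards come from the execution-based scorer):

```python
from statistics import mean, pstdev

def group_advantages(rewards):
    """GRPO advantage: each completion's reward is normalized against
    the G rewards sampled for the same prompt (G=8 in this run)."""
    mu, sigma = mean(rewards), pstdev(rewards)
    if sigma == 0:
        return [0.0] * len(rewards)
    return [(r - mu) / sigma for r in rewards]

def rft_filter(prompt, completions, threshold=0.5):
    """Keep only completions whose execution reward clears the
    threshold; these become the RFT training pairs."""
    return [(prompt, text) for text, r in completions if r >= threshold]

# Illustrative group of G=4 (the real run used G=8 over 200 prompts)
print(group_advantages([1.0, 0.5, 0.0, 0.5]))
print(rft_filter("fix the bug", [("patch A", 1.0), ("patch B", 0.2)]))
```

Normalizing within each prompt's group means a completion is rewarded for being better than its siblings, not for the prompt being easy.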
## License
Apache 2.0 (same as base model)