Qwen3.5-35B-A3B-Turbo-SWE v0.0.1

A coding-focused fine-tune of Qwen3.5-35B-A3B, trained with SFT + GRPO on a mixture of real-world coding-agent trajectories from Codex, Claude Code, and OpenCode sessions.

What is this?

Qwen3.5-35B-A3B is a 35B parameter MoE model with only 3B active parameters per token. This fine-tune improves its coding capabilities by training on real-world software engineering trajectories.
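The gap between 35B total and ~3B active parameters comes from sparse expert routing: for each token, the router scores all experts and executes only the top-k (here 8 routed plus 1 shared). A toy sketch of top-k gating, purely illustrative and not Qwen's actual router implementation:

```python
import math

def route(scores, k=8):
    """Toy top-k expert routing: keep the k highest-scoring experts
    and softmax-normalize their gate weights. Illustrative only --
    not the actual Qwen3.5 router."""
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    exps = [math.exp(scores[i]) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]

# 256 expert scores for one token; only 8 experts actually run.
scores = [0.01 * ((i * 37) % 256) for i in range(256)]
chosen = route(scores, k=8)
print(len(chosen))  # 8 experts active out of 256
```

Only the chosen experts' weights are touched per token, which is why decode cost tracks the ~3B active parameters rather than the full 35B.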

Training Pipeline

  1. Data Extraction: 4,551 training pairs extracted from 4,756 coding agent sessions (Codex, Claude Code, OpenCode)
  2. Labeling: 3,580 pairs labeled with Claude Opus 4.6 for quality scoring (avg reward 0.477)
  3. SFT Phase: bf16 LoRA (rank 64) on 2,674 high-quality pairs. Loss: 1.438 -> 0.509 (-65%)
  4. GRPO Phase: Group Relative Policy Optimization with G=8 sampling, 200 prompts x 8 completions = 1,600 scored samples. Execution-based reward function (compile + run).
  5. GRPO Weight Update: rejection fine-tuning (RFT) on the 161 best completions (reward >= 0.5). Loss converged to 1.97.
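The "group relative" part of step 4 can be sketched as follows: for each prompt, G completions are sampled, and each completion's advantage is its reward normalized against its own group's mean and standard deviation. This is the standard GRPO formulation; the exact normalization used in this training run is an assumption.

```python
import statistics

def group_advantages(rewards):
    """GRPO-style advantages: normalize each completion's reward
    against its own group's mean and std. Sketch of the common
    formulation; this run's exact details may differ."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against all-equal groups
    return [(r - mean) / std for r in rewards]

# One group: G=8 completions for a single prompt, execution-based rewards.
rewards = [1.0, 0.0, 0.5, 1.0, 0.0, 0.0, 1.0, 0.5]
advs = group_advantages(rewards)
print([round(a, 2) for a in advs])
# [1.15, -1.15, 0.0, 1.15, -1.15, -1.15, 1.15, 0.0]
```

Because advantages are computed within each group, no separate value network is needed: completions that beat their siblings get positive advantage, the rest get negative.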

Architecture

| Property | Value |
|---|---|
| Base Model | Qwen3.5-35B-A3B |
| Total Parameters | 35B |
| Active Parameters | ~3B per token |
| Experts | 256 total, 8 routed + 1 shared |
| Attention | Hybrid: 30x Gated DeltaNet + 10x Full Attention |
| Context | 262K native |
| Quantization | Q4_K_M (4.88 BPW) |

Performance

| Metric | Value |
|---|---|
| Prompt eval | 139 tok/s (M3 Ultra) |
| Decode | 70.7 tok/s (M3 Ultra) |
| Model size | 20GB (Q4_K_M) |
| VRAM usage | ~21GB |
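As a rough sanity check, these throughput numbers translate to wall-clock latency per request (token counts below are an illustrative example, not a benchmark):

```python
# Back-of-envelope latency from the measured M3 Ultra numbers above.
prompt_tokens, response_tokens = 2000, 500
prefill_s = prompt_tokens / 139      # prompt eval at 139 tok/s
decode_s = response_tokens / 70.7    # decode at 70.7 tok/s
print(f"prefill {prefill_s:.1f}s + decode {decode_s:.1f}s = {prefill_s + decode_s:.1f}s")
# prefill 14.4s + decode 7.1s = 21.5s
```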

Usage

llama.cpp

llama-server -m Qwen3.5-35B-A3B-Turbo-SWE-v0.0.1-Q4_K_M.gguf --port 8080

Ollama

echo 'FROM Qwen3.5-35B-A3B-Turbo-SWE-v0.0.1-Q4_K_M.gguf' > Modelfile
ollama create turbo-swe -f Modelfile
ollama run turbo-swe

OpenAI-compatible API

from openai import OpenAI

# Point the client at the local llama-server; any non-empty api_key works.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="none")
response = client.chat.completions.create(
    model="turbo-swe",
    messages=[{"role": "user", "content": "Write a prime checker in Python"}],
)
print(response.choices[0].message.content)

Training Details

  • Hardware: RTX PRO 6000 Blackwell (96GB VRAM)
  • Training Data: Mixture of real coding agent trajectories from Codex, Claude Code, and OpenCode sessions (coding, debugging, refactoring)
  • SFT Duration: ~3 hours (670 steps)
  • GRPO Sampling: ~2 hours (1,600 completions)
  • GRPO Update: ~12 minutes (123 steps)
  • Reward Function: Execution-based (parse + compile + run)
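The parse + compile + run stages of the reward can be sketched as below. The staging and the partial-credit weights here are assumptions for illustration; the actual reward function used in training is not published in this card.

```python
import os
import subprocess
import sys
import tempfile

def execution_reward(code: str, timeout: float = 5.0) -> float:
    """Staged parse -> compile -> run reward. A sketch of an
    execution-based reward; the real stages and weights are
    assumptions, not the published training code."""
    try:
        compile(code, "<candidate>", "exec")  # parse + bytecode-compile
    except SyntaxError:
        return 0.0
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        path = f.name
    try:
        proc = subprocess.run([sys.executable, path],
                              capture_output=True, timeout=timeout)
        # Compiles but crashes at runtime: partial credit.
        return 1.0 if proc.returncode == 0 else 0.5
    except subprocess.TimeoutExpired:
        return 0.5
    finally:
        os.unlink(path)

print(execution_reward("print('ok')"))  # runs cleanly -> 1.0
print(execution_reward("def f(:"))      # syntax error -> 0.0
```

A threshold on a reward like this is consistent with step 5 of the pipeline, which keeps only completions scoring >= 0.5 for the weight update.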

License

Apache 2.0 (same as base model)
