
MiniMax-SLURPY

A mathematically unique blend of MiniMax-M2.5 and MiniMax-M2.7 — neither parent, entirely its own model.

SLURPY inherits M2.5's architect-first coding style and MIT-licensed freedom, and absorbs M2.7's RL-tuned precision on multi-agent collaboration and real-world engineering, all without a single training step. It outperforms both parents on HumanEval pass@5 (89.6% vs. M2.5's 85.4%) with zero retraining.

Every one of SLURPY's 48,239 weight tensors is a mathematically unique blend — not copied from M2.5, not copied from M2.7, belonging entirely to neither parent.


What SLURPY inherits

SLURPY's weights are a forensically driven interpolation of two complementary parents. The merge schedule is derived from a full-model scan of all 96,103 tensor pairs (48,239 weight tensors plus 47,864 FP8 scale tensors), tuning each tensor's interpolation ratio to the empirically measured delta between the parents.

From M2.5 — the architect

M2.5 is the foundation-builder: strong on greenfield engineering, deep reasoning, and research-grade benchmarks.

| Benchmark | M2.5 Published |
|---|---|
| SWE-Bench Verified | 80.2% |
| BrowseComp (with context mgmt) | 76.3% |
| Multi-SWE-Bench | 51.3% |
| AIME 2025 | 86.3 |
| GPQA Diamond | 85.2 |
| SciCode | 44.4 |
| IFBench | 70.0 |
| HLE (w/o tools) | 19.4 |
| GDPval-MM (office work) | 59.0% avg win rate |

From M2.7 — the operator

M2.7 is the execution specialist: RL-tuned for multi-step tool use, terminal ops, agentic scaffolding, and production-grade software engineering.

| Benchmark | M2.7 Published |
|---|---|
| SWE-Pro | 56.2% (matches GPT-5.3-Codex) |
| SWE Multilingual | 76.5% |
| Multi-SWE-Bench | 52.7% |
| MLE Bench Lite | 66.6% medal rate (22 ML competitions) |
| VIBE-Pro | 55.6% (near Opus 4.6) |
| TerminalBench 2 | 57.0% |
| NL2Repo | 39.8% |
| GDPval-AA | ELO 1495 (highest open-weight) |
| Toolathon | 46.3% accuracy |
| MM Claw (skill compliance) | 97% across 40+ skills |
| MM Claw (end-to-end) | 62.7% (near Sonnet 4.6) |

SLURPY — best of both

SLURPY's merge schedule preserves M2.5's deep reasoning character in the early-to-mid layers (where the two models barely differ) while absorbing M2.7's agentic improvements in the late layers (where M2.7's training signal concentrates). The result is a model that carries both parents' strengths without the training cost of either.


Merge method

Per-tensor empirical SLERP — each of the 48,239 mergeable weight tensors gets its own interpolation ratio t(k) derived from the measured cosine similarity between M2.5 and M2.7 on that specific tensor:

delta(k)      = 1 - cos(M2.5_k, M2.7_k)
delta_norm(k) = clip(delta(k) / delta_p99, 0, 1)
t(k)          = 0.50 + 0.35 * delta_norm(k)
  • Tensors that barely changed (cos ~ 1.0): t ~ 0.50 — neutral midpoint, preserving both parents
  • Tensors that changed the most (layer 61 MoE experts): t = 0.85 — absorbing M2.7's concentrated training signal
  • FP8 weights: dequantized to BF16 before SLERP, re-quantized with fresh block-wise scales
  • No scale_inv pass-through: forensics confirmed 0% bit-identical scales between parents — all 47,864 FP8 scale tensors are recomputed, not copied
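The per-tensor rule above can be sketched in NumPy. This is an illustrative reimplementation, not the actual merge pipeline: `slerp` and `interp_ratio` are hypothetical helper names, and real use would iterate over safetensors shards rather than toy arrays.

```python
import numpy as np

def slerp(a, b, t):
    """Spherical linear interpolation between two weight tensors of the same shape."""
    a_f = a.ravel().astype(np.float64)
    b_f = b.ravel().astype(np.float64)
    cos_theta = np.dot(a_f, b_f) / (np.linalg.norm(a_f) * np.linalg.norm(b_f))
    theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))
    if theta < 1e-6:
        # Near-parallel tensors: SLERP degenerates, so fall back to plain LERP.
        merged = (1.0 - t) * a_f + t * b_f
    else:
        s = np.sin(theta)
        merged = (np.sin((1.0 - t) * theta) / s) * a_f + (np.sin(t * theta) / s) * b_f
    return merged.reshape(a.shape)

def interp_ratio(a, b, delta_p99):
    """Per-tensor ratio t(k) from the measured cosine delta, per the schedule above."""
    a_f = a.ravel().astype(np.float64)
    b_f = b.ravel().astype(np.float64)
    cos_k = np.dot(a_f, b_f) / (np.linalg.norm(a_f) * np.linalg.norm(b_f))
    delta_norm = np.clip((1.0 - cos_k) / delta_p99, 0.0, 1.0)
    return 0.50 + 0.35 * delta_norm
```

Identical tensors yield t = 0.50 (the neutral midpoint), while any tensor whose delta reaches the 99th-percentile cap is merged at t = 0.85, matching the two endpoints listed above.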

Forensic highlights

  • 99.18% of tensors sit in a tight cosine cluster around 0.9946 — most weights barely moved between M2.5 and M2.7
  • Layer 61 MoE experts {76, 74, 61, 30, 43, 138, 226, 126, 58, 159} have deltas 2-5x baseline — this is where M2.7's RL training signal concentrates
  • lm_head.weight (cos=0.9905, rel_l2=0.139) carries M2.7's vocabulary-level improvements
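The two metrics quoted in these highlights (cosine similarity and relative L2 delta) can be reproduced with a short NumPy sketch; `tensor_forensics` is an illustrative helper, not part of the released tooling.

```python
import numpy as np

def tensor_forensics(w_a, w_b):
    """Cosine similarity and relative L2 delta between a pair of parent tensors."""
    a = w_a.ravel().astype(np.float64)
    b = w_b.ravel().astype(np.float64)
    cos = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    rel_l2 = float(np.linalg.norm(b - a) / np.linalg.norm(a))
    return {"cos": cos, "rel_l2": rel_l2}
```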

Architecture

Identical to MiniMax-M2.5 / M2.7 — weight merge only, no architecture changes:

  • Model type: minimax_m2 / MiniMaxM2ForCausalLM
  • Parameters: 228.7B total, ~10B active (MoE)
  • Layers: 62
  • Hidden size: 3072
  • MoE: 256 experts, top-8, sigmoid routing + learned bias
  • Attention: 48 query / 8 KV heads (GQA 6:1), head_dim=128
  • Quantization: FP8 (float8_e4m3fn), block size [128, 128]
  • Vocab: 200,064 tokens
  • Context: up to 196,608 tokens
  • Thinking: Interleaved <think>...</think> (always-on)
  • trust_remote_code=True required

Serving with vLLM

Recommended command (8x H100 80GB):

SAFETENSORS_FAST_GPU=1 vllm serve \
    Ex0bit/MiniMax-SLURPY --trust-remote-code \
    --enable-expert-parallel --tensor-parallel-size 8 \
    --enable-auto-tool-choice --tool-call-parser minimax_m2 \
    --reasoning-parser minimax_m2_append_think \
    --enforce-eager

For 4x GPU (no expert parallel):

SAFETENSORS_FAST_GPU=1 vllm serve \
    Ex0bit/MiniMax-SLURPY --trust-remote-code \
    --tensor-parallel-size 4 \
    --enable-auto-tool-choice --tool-call-parser minimax_m2 \
    --reasoning-parser minimax_m2_append_think

If you encounter CUDA memory errors, add:

--compilation-config '{"cudagraph_mode": "PIECEWISE"}'

Recommended sampling parameters

| Parameter | Value |
|---|---|
| temperature | 1.0 |
| top_p | 0.95 |
| top_k | 40 |
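A sketch of a request to vLLM's OpenAI-compatible `/v1/chat/completions` endpoint with these parameters. Note that `top_k` is a vLLM extension, not part of the standard OpenAI schema; the user prompt and `max_tokens` value here are illustrative.

```python
import json

# Request payload using the recommended sampling parameters from the table above.
payload = {
    "model": "Ex0bit/MiniMax-SLURPY",
    "messages": [{"role": "user", "content": "Explain GQA in two sentences."}],
    "temperature": 1.0,
    "top_p": 0.95,
    "top_k": 40,       # vLLM-specific extension to the OpenAI schema
    "max_tokens": 1024,
}
body = json.dumps(payload)
```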

Important: preserve thinking in conversation history

MiniMax-M2 uses interleaved thinking. The model outputs <think>...</think> blocks during generation. You must pass these back verbatim in conversation history. Removing them degrades performance.
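A minimal sketch of correct history handling. The assistant text and helper variable names are made up for illustration; the point is that the assistant turn goes back into `messages` unmodified, `<think>` block included.

```python
import re

history = [{"role": "user", "content": "Reverse a linked list in Python."}]

# Suppose the model returned interleaved reasoning plus an answer:
assistant_text = "<think>Iterate, flipping next pointers.</think>Here is the function..."

# Correct: append the assistant turn verbatim, thinking included.
history.append({"role": "assistant", "content": assistant_text})
history.append({"role": "user", "content": "Now make it recursive."})

# Incorrect (degrades performance): stripping the think block before re-sending.
stripped = re.sub(r"<think>.*?</think>", "", assistant_text, flags=re.DOTALL)
```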


Tool calling

Same format as MiniMax-M2.7. Tool calls use <minimax:tool_call> / </minimax:tool_call> XML wrappers:

<minimax:tool_call>
<invoke name="get_weather">
<parameter name="city">San Francisco</parameter>
</invoke>
</minimax:tool_call>

Enable with --enable-auto-tool-choice --tool-call-parser minimax_m2 in vLLM.
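When serving, vLLM's `minimax_m2` parser extracts these calls for you. The regex sketch below only illustrates the wire format shown above and is not production parsing code.

```python
import re

# Example model output containing a tool call in the MiniMax-M2 wrapper format.
response = """<minimax:tool_call>
<invoke name="get_weather">
<parameter name="city">San Francisco</parameter>
</invoke>
</minimax:tool_call>"""

# Pull out the invoked tool name and its parameters.
invoke = re.search(r'<invoke name="([^"]+)">(.*?)</invoke>', response, re.DOTALL)
name = invoke.group(1)
params = dict(re.findall(r'<parameter name="([^"]+)">(.*?)</parameter>', invoke.group(2)))
print(name, params)  # get_weather {'city': 'San Francisco'}
```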


Using with Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained(
    "Ex0bit/MiniMax-SLURPY",
    trust_remote_code=True,
    torch_dtype="auto",
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(
    "Ex0bit/MiniMax-SLURPY",
    trust_remote_code=True,
)

messages = [{"role": "user", "content": "Write a Python function that reverses a linked list."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    output = model.generate(
        input_ids,
        max_new_tokens=2048,
        do_sample=True,
        temperature=1.0,
        top_p=0.95,
        top_k=40,
    )

print(tokenizer.decode(output[0, input_ids.shape[1]:], skip_special_tokens=True))

Config notes

  • use_mtp is set to False in config.json (MTP tensors don't exist in the checkpoint)
  • quantization_config is preserved — native FP8
  • Chat template and tokenizer are sourced from M2.7

Files

  • 43 safetensors shards (~5 GB each, 214.3 GB total)
  • Native FP8 (float8_e4m3fn) with block-wise [128, 128] scale factors
  • chat_template.jinja — M2.7's chat template with tool calling support
  • modeling_minimax_m2.py / configuration_minimax_m2.py — custom model code

License

Modified MIT — same as MiniMax-M2.5. See LICENSE for full text.

The only modification to the standard MIT license: if the Software (or any derivative works) is used for commercial products or services with more than 100 million monthly active users or more than $30M annual recurring revenue, you must prominently display "MiniMax M2" on the user interface.


Citation

@misc{minimax-slurpy-2026,
  title={MiniMax-SLURPY: Per-tensor empirical SLERP merge of MiniMax-M2.5 and M2.7},
  author={Ex0bit},
  year={2026},
  url={https://huggingface.co/Ex0bit/MiniMax-SLURPY}
}

Acknowledgments

  • MiniMax for the M2.5 and M2.7 base models
  • Merge infrastructure adapted from the PRISM abliteration pipeline