Learning from Language Feedback via Variational Policy Distillation Paper • 2605.15113 • Published 4 days ago • 6
MiroThinker-1.7 & H1: Towards Heavy-Duty Research Agents via Verification Paper • 2603.15726 • Published Mar 16 • 186
SkillOrchestra: Learning to Route Agents via Skill Transfer Paper • 2602.19672 • Published Feb 23 • 58
Alibaba-Apsara/Superior-Reasoning-SFT-gpt-oss-120b Viewer • Updated Jan 31 • 306k • 1.66k • 347
Least-Loaded Expert Parallelism: Load Balancing An Imbalanced Mixture-of-Experts Paper • 2601.17111 • Published Jan 23 • 5
DR Tulu: Reinforcement Learning with Evolving Rubrics for Deep Research Paper • 2511.19399 • Published Nov 24, 2025 • 63
Kimi Linear: An Expressive, Efficient Attention Architecture Paper • 2510.26692 • Published Oct 30, 2025 • 133