Sangsang/rlsd_Qwen3-4B-Instruct-2507_lora32_n2048_seed42_lr1e-06_mcl8192_within_batch Text Generation • Updated 4 days ago • 18
Sangsang/rlsd_Qwen3-4B-Base_lora32_n2048_seed42_lr1e-06_mcl8192_within_batch Text Generation • Updated 4 days ago • 16
Sangsang/rlsd_Qwen3-4B_lora32_n2048_seed42_lr1e-06_mcl16384_within_batch Text Generation • Updated 4 days ago • 11
Sangsang/sdpo_Qwen3-4B-Instruct-2507_lora32_n2048_seed42_lr1e-05_mcl8192_full_voacb Text Generation • Updated 4 days ago • 17
Sangsang/sdpo_Qwen3-4B-Base_lora32_n2048_seed42_lr1e-05_mcl8192_full_voacb Text Generation • Updated 4 days ago • 10
Sangsang/sdpo_Qwen3-4B_lora32_n2048_seed42_lr1e-05_mcl16384_full_voacb Text Generation • Updated 4 days ago • 16
Sangsang/grpo_Qwen3-4B-Instruct-2507_lora32_n2048_seed42_lr1e-05_mcl8192 Text Generation • Updated 4 days ago • 19
Sangsang/grpo_Qwen3-4B-Instruct-2507_lora32_n2048_seed42_lr1e-06_mcl8192 Text Generation • Updated 4 days ago • 19
Sangsang/grpo_Qwen3-4B-Base_lora32_n2048_seed42_lr1e-06_mcl8192 Text Generation • Updated 4 days ago • 16
Sangsang/grpo_Qwen3-4B_lora32_n2048_seed42_lr1e-06_mcl16384 Text Generation • Updated 4 days ago • 15
Sangsang/Olmo-3-7B-Instruct-SFT-ContextGRPOwDistill_2x4_eps20 Text Generation • Updated about 1 month ago • 11
Sangsang/feedback_asymmetric_kl_fixed_ema_Qwen2.5-7B-Instruct_bw0p75_fw0p25_ema0p999_ep30 Text Generation • Updated Apr 23 • 2
Sangsang/feedback_asymmetric_kl_fixed_ema_Qwen2.5-7B-Instruct_bw0p25_fw0p75_ema0p999_ep30 Text Generation • Updated Apr 23 • 2
Sangsang/feedback_asymmetric_kl_fixed_ema_Llama-3.1-8B-Instruct_bw0p75_fw0p25_ema0p999_ep30 Text Generation • Updated Apr 23 • 4
Sangsang/feedback_asymmetric_kl_fixed_ema_Llama-3.1-8B-Instruct_bw0p25_fw0p75_ema0p999_ep30 Text Generation • Updated Apr 23 • 4
Sangsang/feedback_asymmetric_kl_fixed_ema_Qwen3-14B_bw0p5_fw0p5_ema0p999_ep30 Text Generation • Updated Apr 15 • 5
Sangsang/grpo_Qwen3-0.6B_bs16_g16_mb128_lr1e-6_b1e-3_clip0p2_temp0p7_ep30 Text Generation • Updated Apr 12 • 2