Sebastian Gabarain

Locutusque

SebastianG74019

AI & ML interests

Pushing performance in small language models

Recent Activity

liked a dataset 2 days ago

Locutusque/esmeralda-agentic

posted an update 2 days ago

🚀 Introducing Esmeralda-Llama-3.1-8B-control

The first release in the Esmeralda model family by Locutusque. This model is intentionally small and experimental. It is a control/baseline proof-of-concept designed to answer one question: "How strong is my new Locutusque/esmeralda-agentic dataset before scaling to larger runs?"

Training Details
- Base: Llama 3.1 8B
- Training precision: bf16 mixed precision
- Chat template: modified ChatML
- Dataset size: ~37k examples
- Examples actually used for this run: ~5k

The dataset includes:
- multi-turn agentic traces
- reasoning traces
- structured assistant behavior
- generalist instruction data

Benchmark Results

Compared against:
- Llama 3.1 8B Instruct
- Hermes-3-Llama-3.1-8B

HumanEval
- Esmeralda: 57.3
- Llama 3.1 Instruct: 56.1
- Hermes-3: 52.4

MBPP
- Esmeralda: 53.2
- Llama 3.1 Instruct: 56.8
- Hermes-3: 48.2

GPQA Diamond
- Esmeralda: 15.7
- Llama 3.1 Instruct: 15.7
- Hermes-3: 18.2

EQ-Bench
- Esmeralda: 59.2
- Llama 3.1 Instruct: 61.1
- Hermes-3: 63.1

EQ-Bench Parseable (Syntax Stability)
- 🔥 Esmeralda: 100.0%
- Llama 3.1 Instruct: 92.4%
- Hermes-3: 91.2%

Here Be Dragons 🐉

I also experimented with a new TruthfulQA free-generation evaluation setup.
- Responses were judged by Gemma 4 26B A4B
- The judge compared generations directly against ground-truth answers
- Models were evaluated in 8-bit quantized form to speed up inference

TruthfulQA (LLM Judge)
- Esmeralda-Llama-3.1-8B-control: 0.682
- Hermes-3-Llama-3.1-8B: 0.587 (reported MC2 score; methodology differs)

For a lightweight control run trained on only a fraction of the dataset, I'm pretty encouraged by the results. The model is released under the standard Llama 3.1 license, and I'd genuinely love feedback from people testing it in real workflows.

Model: https://huggingface.co/Locutusque/Esmeralda-Llama-3.1-8B-control
Dataset: https://huggingface.co/datasets/Locutusque/esmeralda-agentic
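As a reading aid, the benchmark figures reported in the post can be turned into per-benchmark gaps between Esmeralda and each baseline. The sketch below is only an illustration: the scores are copied from the post, while the `SCORES` layout and the `deltas` helper are hypothetical names introduced here and are not part of the release. The TruthfulQA numbers are left out because the post notes that the two scores use different methodologies.

```python
# Scores copied from the post; the dict layout and helper name are
# illustrative assumptions, not part of the Esmeralda release.
SCORES = {
    "HumanEval": {"Esmeralda": 57.3, "Llama 3.1 Instruct": 56.1, "Hermes-3": 52.4},
    "MBPP": {"Esmeralda": 53.2, "Llama 3.1 Instruct": 56.8, "Hermes-3": 48.2},
    "GPQA Diamond": {"Esmeralda": 15.7, "Llama 3.1 Instruct": 15.7, "Hermes-3": 18.2},
    "EQ-Bench": {"Esmeralda": 59.2, "Llama 3.1 Instruct": 61.1, "Hermes-3": 63.1},
    "EQ-Bench Parseable (%)": {"Esmeralda": 100.0, "Llama 3.1 Instruct": 92.4, "Hermes-3": 91.2},
}


def deltas(scores, model="Esmeralda"):
    """Return {benchmark: {baseline: model_score - baseline_score}}, rounded to 0.1."""
    out = {}
    for bench, row in scores.items():
        out[bench] = {
            name: round(row[model] - value, 1)
            for name, value in row.items()
            if name != model
        }
    return out


if __name__ == "__main__":
    # Positive values mean Esmeralda scored higher than the baseline.
    for bench, row in deltas(SCORES).items():
        cells = ", ".join(f"{name}: {diff:+.1f}" for name, diff in row.items())
        print(f"{bench}: {cells}")
```

Read this way, the reported numbers show Esmeralda ahead of both baselines on HumanEval and on the EQ-Bench parseable rate, level with Llama 3.1 Instruct on GPQA Diamond, and behind at least one baseline on MBPP, GPQA Diamond, and EQ-Bench.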

liked a model 2 days ago

Locutusque/Esmeralda-Llama-3.1-8B-control

Locutusque's datasets (81)

Locutusque/esmeralda-agentic

Viewer • Updated 2 days ago • 37k • 81 • 1

Locutusque/Hermes-3-shuffled

Viewer • Updated 9 days ago • 959k • 97

Locutusque/lordx64-claude-opus-4.7-max-cleaned

Viewer • Updated 9 days ago • 4.81k • 81 • 1

Locutusque/hermes-agent-reasoning-traces-glm-5.1-formatted

Viewer • Updated 9 days ago • 7.06k • 70

Locutusque/claude-opus-4.7-reasoning-4k

Viewer • Updated 9 days ago • 4.03k • 78 • 1

Locutusque/liberalis-cogitator

Viewer • Updated Oct 24, 2025 • 1.1M • 78 • 2

Locutusque/Medical-R1-Distill-Data-ShareGPT

Viewer • Updated Oct 24, 2025 • 22k • 36 • 1

Locutusque/lmsys-best-2

Viewer • Updated Oct 3, 2025 • 26.9k • 12

Locutusque/ultra-dpo-data

Viewer • Updated Aug 17, 2025 • 16.6k • 15

Locutusque/FalseReject-sharegpt

Viewer • Updated Jul 28, 2025 • 14.6k • 26 • 1

Locutusque/WebInstruct-verified-sharegpt

Viewer • Updated Jul 28, 2025 • 233k • 26

Locutusque/Mind-Corpus

Viewer • Updated Jul 28, 2025 • 125 • 39 • 9

Locutusque/deeplm-training-data

Viewer • Updated Apr 11, 2025 • 2.17M • 301 • 3

Locutusque/Wild-GPT-4-Turbo-Cleaned-EN

Viewer • Updated Apr 4, 2025 • 28.6k • 14 • 1

Locutusque/prm800k_phase_2_original

Viewer • Updated Mar 27, 2025 • 97.8k • 8

Locutusque/prm800k_phase_1_original

Viewer • Updated Mar 27, 2025 • 949 • 5

Locutusque/Dark-Sentience-V2

Viewer • Updated Mar 21, 2025 • 145 • 15 • 9

Locutusque/reasoning-v1-small-sample

Viewer • Updated Mar 21, 2025 • 62.6k • 5 • 2

Locutusque/s1K-claude-3-7-sonnet-sharegpt

Viewer • Updated Mar 16, 2025 • 1k • 14 • 3

Locutusque/codeforces-sharegpt

Viewer • Updated Mar 16, 2025 • 47.8k • 30

Locutusque/OpenR1-Math-220k-Default-ShareGPT

Viewer • Updated Mar 16, 2025 • 93.7k • 15

Locutusque/unnatural_instructions_gpt-4o-mini_scale_x2

Viewer • Updated Mar 15, 2025 • 119k • 7 • 1

Locutusque/lmsys-best

Viewer • Updated Mar 15, 2025 • 20.8k • 77 • 1

Locutusque/Platinum-CoT-v0.1-Flagged-ShareGPT

Viewer • Updated Mar 11, 2025 • 2.14k • 5

Locutusque/hercules-v6.9

Viewer • Updated Feb 15, 2025 • 2.96M • 435 • 5

Locutusque/Math-Evol-Instruct-v0.1-ShareGPT

Viewer • Updated Feb 15, 2025 • 1.29k • 3 • 1

Locutusque/Platinum-CoT-v0.1-ShareGPT

Viewer • Updated Feb 15, 2025 • 2.42k • 6 • 1

Locutusque/Math-Evol-Instruct-v0.1

Viewer • Updated Feb 15, 2025 • 1.29k • 25 • 2

Locutusque/ColumnedChatCombined

Updated Feb 12, 2025 • 128 • 2

Locutusque/preference-mix-40k-cleaned

Viewer • Updated Jan 16, 2025 • 30k • 14