Sebastian Gabarain

Locutusque

AI & ML interests

Pushing performance in small language models

Recent Activity

liked a dataset 2 days ago
Locutusque/esmeralda-agentic
posted an update 2 days ago
🚀 Introducing Esmeralda-Llama-3.1-8B-control

The first release in the Esmeralda model family by Locutusque. This model is intentionally small and experimental: a control/baseline proof-of-concept designed to answer one question: “How strong is my new "Locutusque/esmeralda-agentic" dataset before scaling to larger runs?”

Training Details
- Base: Llama 3.1 8B
- Training precision: bf16 mixed precision
- Chat template: modified ChatML
- Dataset size: ~37k examples
- Examples actually used for this run: ~5k

The dataset includes:
- multi-turn agentic traces
- reasoning traces
- structured assistant behavior
- generalist instruction data

Benchmark Results

Compared against:
- Llama 3.1 8B Instruct
- Hermes-3-Llama-3.1-8B

HumanEval
- Esmeralda: 57.3
- Llama 3.1 Instruct: 56.1
- Hermes-3: 52.4

MBPP
- Esmeralda: 53.2
- Llama 3.1 Instruct: 56.8
- Hermes-3: 48.2

GPQA Diamond
- Esmeralda: 15.7
- Llama 3.1 Instruct: 15.7
- Hermes-3: 18.2

EQ-Bench
- Esmeralda: 59.2
- Llama 3.1 Instruct: 61.1
- Hermes-3: 63.1

EQ-Bench Parseable (Syntax Stability) 🔥
- Esmeralda: 100.0%
- Llama 3.1 Instruct: 92.4%
- Hermes-3: 91.2%

Here Be Dragons 🐉

I also experimented with a new TruthfulQA free-generation evaluation setup.
- Responses were judged by Gemma 4 26B A4B
- The judge compared generations directly against ground-truth answers
- Models were evaluated in 8-bit quantized form to speed up inference

TruthfulQA (LLM Judge)
- Esmeralda-Llama-3.1-8B-control: 0.682
- Hermes-3-Llama-3.1-8B: 0.587 (reported MC2 score; methodology differs)

For a lightweight control run trained on only a fraction of the dataset, I'm pretty encouraged by the results. The model is released under the standard Llama 3.1 license, and I'd genuinely love feedback from people testing it in real workflows.

Model: https://huggingface.co/Locutusque/Esmeralda-Llama-3.1-8B-control
Dataset: https://huggingface.co/datasets/Locutusque/esmeralda-agentic
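
The TruthfulQA free-generation setup in the post (a judge compares each generation against the ground-truth answer, and the score is a fraction between 0 and 1) could be sketched roughly as below. This is only an illustration under assumptions: the post does not describe the judge prompt or the aggregation, so `judged_accuracy`, `toy_judge`, and the example data are all hypothetical, and the substring check merely stands in for a call to an LLM judge.

```python
from typing import Callable, Iterable, Tuple

# Hypothetical record layout: (question, model_generation, ground_truth_answer)
Example = Tuple[str, str, str]
Judge = Callable[[str, str, str], bool]


def judged_accuracy(examples: Iterable[Example], judge: Judge) -> float:
    """Return the fraction of generations the judge accepts as matching the truth."""
    items = list(examples)
    if not items:
        raise ValueError("no examples to score")
    accepted = sum(1 for question, generation, truth in items
                   if judge(question, generation, truth))
    return accepted / len(items)


def toy_judge(question: str, generation: str, truth: str) -> bool:
    # Assumption: a stand-in for the LLM judge. A real judge would prompt a model
    # with the question, the generation, and the reference answer.
    return truth.strip().lower() in generation.strip().lower()


if __name__ == "__main__":
    # Made-up examples for illustration only; not taken from TruthfulQA.
    examples = [
        ("What is the capital of France?", "The capital is Paris.", "Paris"),
        ("What is the capital of Italy?", "It is Milan.", "Rome"),
    ]
    print(judged_accuracy(examples, toy_judge))
```

Keeping the judge as a plain callable means the same aggregation works whether the verdict comes from a string heuristic or from a quantized judge model.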

Organizations

- BigScience Biomedical Datasets
- ZeroGPU Explorers
- Aurora-M
- The Hydra Project
- Social Post Explorers
- fne
- M4-ai
- Quasar Research
- Hugging Face Discord Community
- Data Tonic (Alignment Lab)
- Data Is Better Together Contributor
- Dtnm