Failed LFM2 Experiment

This model is a failed LFM2 experiment. It uses a 150M-parameter configuration of the LFM2 architecture as implemented in the Transformers library.
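
Assuming the checkpoint loads through the standard Transformers Auto* classes (this is an assumption; a recent Transformers release with LFM2 support would be needed), usage would look roughly like this:

```python
# Minimal loading sketch. The repo id is taken from this page; everything
# else assumes stock transformers behavior for a causal LM checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "MostLime/LFM2-150M-1.5B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("The quick brown fox", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```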

The plan was to train it on 4B UltraFineWeb tokens using the Muon optimizer. However, due to budget constraints, training stopped early at around 1.5B tokens.

As a result, the model never converged properly: it was still in its high-LR phase when training stopped, so it did not get to consolidate its language modeling capabilities.
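
To illustrate what "high-LR phase" means here, the sketch below shows a generic warmup-stable-decay style schedule. The exact schedule and values used in this run are assumptions, but stopping at ~1.5B of a planned 4B tokens would land well inside the flat, high-LR region, before any decay.

```python
# Illustrative warmup-stable-decay schedule (hypothetical values; the card
# only says training stopped while the learning rate was still high).
def lr_at_step(step, total_steps, peak_lr=1e-3, warmup_frac=0.01, decay_frac=0.2):
    warmup_steps = int(total_steps * warmup_frac)
    decay_start = int(total_steps * (1 - decay_frac))
    if step < warmup_steps:          # linear warmup
        return peak_lr * step / max(warmup_steps, 1)
    if step < decay_start:           # flat "high-LR" phase
        return peak_lr
    # linear decay to zero over the final fraction of training
    return peak_lr * (total_steps - step) / max(total_steps - decay_start, 1)
```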

I used this experiment as a way to learn about pre-training, model architectures, and how Muon works.
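
For reference, the core idea of Muon is to take the momentum-smoothed gradient of each 2D weight matrix and orthogonalize it with a few Newton-Schulz iterations before applying it as the update direction. The sketch below is illustrative only and omits details of the real implementation (per-shape scaling, Nesterov momentum, mixed precision); the hyperparameters are placeholders, not the values used in this run.

```python
import torch

def newton_schulz_orthogonalize(g: torch.Tensor, steps: int = 5) -> torch.Tensor:
    # Quintic Newton-Schulz iteration that approximately maps G to the
    # orthogonal factor of its SVD; coefficients are the commonly
    # published Muon values.
    a, b, c = 3.4445, -4.7750, 2.0315
    x = g / (g.norm() + 1e-7)
    transposed = x.size(0) > x.size(1)
    if transposed:
        x = x.T
    for _ in range(steps):
        A = x @ x.T
        x = a * x + (b * A + c * (A @ A)) @ x
    return x.T if transposed else x

@torch.no_grad()
def muon_step(weight, grad, momentum_buf, lr=0.02, beta=0.95):
    # Plain momentum, then replace the update direction with its
    # orthogonalized form (simplified sketch, not the full optimizer).
    momentum_buf.mul_(beta).add_(grad)
    weight.add_(newton_schulz_orthogonalize(momentum_buf), alpha=-lr)
```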
