Failed LFM2 Experiment

This model is a failed LFM2 experiment. It uses a 150M-parameter configuration of the LFM2 architecture as implemented in the Transformers library.
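
Assuming the checkpoint loads through the standard Transformers Auto* classes (this is an assumption; a recent Transformers release with LFM2 support would be needed), usage would look roughly like this:

```python
# Minimal loading sketch. The repo id is taken from this page; everything
# else assumes stock transformers behavior for a causal LM checkpoint.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "MostLime/LFM2-150M-1.5B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("The quick brown fox", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```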

The plan was to train it on 4B UltraFineWeb tokens using the Muon optimizer. However, due to budget constraints, training stopped early at around 1.5B tokens.

As a result, the model never converged properly: it was still in its high-LR phase when training stopped, so it did not get to consolidate its language modeling capabilities.
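
To illustrate what "high-LR phase" means here, the sketch below shows a generic warmup-stable-decay style schedule. The exact schedule and values used in this run are assumptions, but stopping at ~1.5B of a planned 4B tokens would land well inside the flat, high-LR region, before any decay.

```python
# Illustrative warmup-stable-decay schedule (hypothetical values; the card
# only says training stopped while the learning rate was still high).
def lr_at_step(step, total_steps, peak_lr=1e-3, warmup_frac=0.01, decay_frac=0.2):
    warmup_steps = int(total_steps * warmup_frac)
    decay_start = int(total_steps * (1 - decay_frac))
    if step < warmup_steps:          # linear warmup
        return peak_lr * step / max(warmup_steps, 1)
    if step < decay_start:           # flat "high-LR" phase
        return peak_lr
    # linear decay to zero over the final fraction of training
    return peak_lr * (total_steps - step) / max(total_steps - decay_start, 1)
```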

I used this experiment as a way to learn about pre-training, model architectures, and how Muon works.
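
For reference, the core idea of Muon is to take the momentum-smoothed gradient of each 2D weight matrix and orthogonalize it with a few Newton-Schulz iterations before applying it as the update direction. The sketch below is illustrative only and omits details of the real implementation (per-shape scaling, Nesterov momentum, mixed precision); the hyperparameters are placeholders, not the values used in this run.

```python
import torch

def newton_schulz_orthogonalize(g: torch.Tensor, steps: int = 5) -> torch.Tensor:
    # Quintic Newton-Schulz iteration that approximately maps G to the
    # orthogonal factor of its SVD; coefficients are the commonly
    # published Muon values.
    a, b, c = 3.4445, -4.7750, 2.0315
    x = g / (g.norm() + 1e-7)
    transposed = x.size(0) > x.size(1)
    if transposed:
        x = x.T
    for _ in range(steps):
        A = x @ x.T
        x = a * x + (b * A + c * (A @ A)) @ x
    return x.T if transposed else x

@torch.no_grad()
def muon_step(weight, grad, momentum_buf, lr=0.02, beta=0.95):
    # Plain momentum, then replace the update direction with its
    # orthogonalized form (simplified sketch, not the full optimizer).
    momentum_buf.mul_(beta).add_(grad)
    weight.add_(newton_schulz_orthogonalize(momentum_buf), alpha=-lr)
```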
