Failed LFM2 Experiment
This model is a failed LFM2 experiment: a 150M-parameter version of the LFM2 architecture from the Transformers library.
It was meant to be trained on 4B UltraFineWeb tokens with the Muon optimizer, but budget constraints cut training short at around 1.5B tokens.
As a result, the model never converged: training stopped while it was still in its high-learning-rate phase, before it could consolidate its language-modeling capabilities.
I used this experiment as a way to learn about pre-training, model architectures, and how Muon works.
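For context on what Muon does differently from Adam-style optimizers: its core idea is to approximately orthogonalize each weight matrix's (momentum-accumulated) gradient before applying it, using a few Newton-Schulz iterations. Below is a minimal NumPy sketch of that orthogonalization step; the quintic coefficients follow the commonly cited Muon reference implementation, but treat the exact constants and step count as assumptions rather than a definitive recipe.

```python
import numpy as np

def orthogonalize(G, steps=5):
    """Approximately orthogonalize G (push its singular values toward 1)
    via a quintic Newton-Schulz iteration, as in the Muon optimizer.
    Coefficients (a, b, c) are taken from the reference implementation
    (assumption); 5 steps is the commonly used default."""
    a, b, c = 3.4445, -4.7750, 2.0315
    # Normalize so the spectral norm is <= 1, a precondition for convergence.
    X = G / (np.linalg.norm(G) + 1e-7)
    # Work with the "wide" orientation so X @ X.T is the smaller Gram matrix.
    transposed = X.shape[0] > X.shape[1]
    if transposed:
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * (A @ A)) @ X
    return X.T if transposed else X
```

In a full Muon step, this orthogonalized update (computed from the momentum buffer, not the raw gradient) is scaled by the learning rate and applied to 2D weight matrices only; embeddings, norms, and other 1D parameters are typically handled by a standard optimizer instead.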