Growing models

In this blog, and in our Discord channel, we discuss training in detail. A topic that is often overlooked is how to grow a model. Especially in incentivized, collaborative, and distributed training, this is a key ingredient. This post explores the concept of model growth with concrete Python code examples. Read more…
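As a flavor of what "growing a model" can mean, here is a minimal sketch of widening a single linear layer by duplicating existing units, loosely in the spirit of Net2Net-style growth. All names (`grow_linear`, `new_out`) are illustrative, not taken from the post, and a fully function-preserving scheme would also rescale the duplicated units' outgoing weights in the next layer:

```python
import numpy as np

def grow_linear(weight, bias, new_out, rng=None):
    """Widen a linear layer from weight.shape[0] output units to new_out.

    New units are created by copying randomly chosen existing rows.
    This sketch shows only the widening step; preserving the network's
    function would additionally require adjusting the *next* layer's
    columns for the duplicated units.
    """
    rng = rng or np.random.default_rng()
    old_out, _ = weight.shape
    assert new_out >= old_out, "can only grow, not shrink"
    # Indices of existing units to duplicate.
    idx = rng.integers(0, old_out, size=new_out - old_out)
    new_weight = np.vstack([weight, weight[idx]])
    new_bias = np.concatenate([bias, bias[idx]])
    return new_weight, new_bias
```

The same idea extends to depth (inserting identity-initialized layers) and to attention heads, which the full post covers in more detail.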

On training (part 1)

One of the first objectives for subnet 29 was to see whether sample packing during training makes a difference for the achievable loss. TL;DR: sample packing makes a significant difference. The per-token loss of a trained model is generally higher when training on packed samples and evaluating on single samples. Or, Read more…
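For readers unfamiliar with the term, sample packing concatenates tokenized samples into fixed-length training blocks so that no compute is wasted on padding. A minimal sketch of the idea, with illustrative names (`pack_samples`, `block_size`, `eos_id` are assumptions, not from the post):

```python
def pack_samples(tokenized_samples, block_size, eos_id):
    """Concatenate tokenized samples, separated by an EOS token, into
    fixed-length blocks; the trailing remainder is discarded."""
    stream = []
    for sample in tokenized_samples:
        stream.extend(sample)
        stream.append(eos_id)  # mark the sample boundary
    # Slice the flat token stream into full blocks only.
    return [stream[i:i + block_size]
            for i in range(0, len(stream) - block_size + 1, block_size)]
```

Evaluating on single (unpacked) samples while training on such blocks is the mismatch the post measures: tokens near a packed boundary attend across unrelated samples unless attention is masked per sample.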