training

Growing models in the hidden size dimension

As explained in our previous article, growing a model in most dimensions is quite simple, but increasing the hidden size comes with a few problems. This article takes a deep dive and shows how it can be done. Why, again? We can simply grow a model's MLP (intermediate size) or number of …

By coldint, October 25, 2024
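The excerpt above calls growing the MLP intermediate size simple. A minimal sketch of why, assuming a plain (non-gated) two-layer MLP `w_out @ gelu(w_in @ x)`: append new rows to the up-projection with small random values and new, zero-initialized columns to the down-projection, so the grown model computes exactly the same function before training resumes. The helper names `mlp` and `grow_intermediate` and the `scale` and `seed` parameters are hypothetical, chosen for illustration; the articles' own method may differ.

```python
import math
import random


def gelu(v):
    # tanh approximation of the GELU activation
    return 0.5 * v * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (v + 0.044715 * v ** 3)))


def matvec(m, x):
    # m is a list of rows; returns m @ x
    return [sum(w * xi for w, xi in zip(row, x)) for row in m]


def mlp(w_in, w_out, x):
    # w_in: (intermediate, hidden), w_out: (hidden, intermediate)
    h = [gelu(v) for v in matvec(w_in, x)]
    return matvec(w_out, h)


def grow_intermediate(w_in, w_out, extra, scale=0.02, seed=0):
    """Hypothetical helper: add `extra` intermediate units without changing the output."""
    rng = random.Random(seed)
    hidden = len(w_in[0])
    # New up-projection rows get small random values so the new units can learn.
    new_in = [row[:] for row in w_in] + [
        [rng.gauss(0.0, scale) for _ in range(hidden)] for _ in range(extra)
    ]
    # New down-projection columns are zero, so the new units contribute nothing yet.
    new_out = [row[:] + [0.0] * extra for row in w_out]
    return new_in, new_out


rng = random.Random(42)
hidden, d_ff = 4, 8
w_in = [[rng.gauss(0.0, 0.5) for _ in range(hidden)] for _ in range(d_ff)]
w_out = [[rng.gauss(0.0, 0.5) for _ in range(d_ff)] for _ in range(hidden)]
x = [rng.gauss(0.0, 1.0) for _ in range(hidden)]

g_in, g_out = grow_intermediate(w_in, w_out, extra=4)
print(mlp(w_in, w_out, x) == mlp(g_in, g_out, x))  # True: the output is unchanged
```

For a gated MLP (as in LLaMA-style models) the same idea would apply: the extra gate and up-projection rows can be random, as long as the matching down-projection columns start at zero.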

training

Growing models

In this blog and in our Discord channel, we discuss training in detail. A topic that is often overlooked is how to grow a model. Especially in incentivized, collaborative and distributed training, this is a key ingredient. This post explores the concept of model growth with concrete Python code examples …

By coldint, October 1, 2024

training

On training (part 2)

This post discusses one of the essential ingredients for training a model: how to see whether your model is actually improving or, in other words, how to benchmark the result of your training. Please read our prior blog posts (e.g. On incentive (part 1)) to understand the concept of sample …

By coldint, September 17, 2024

training

On training (part 1)

One of the first objectives for subnet 29 was to see whether sample packing during training makes a difference to the achievable loss. TL;DR: sample packing makes a significant difference. The per-token loss of a trained model is generally higher when it is trained on packed samples and evaluated on single samples. Or, …

By coldint, 1 year ago
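The excerpt above compares training on packed samples with evaluating on single samples. A minimal sketch of what packing means, assuming the common greedy scheme: tokenized samples are concatenated into one stream, each followed by an end-of-sequence token, and the stream is cut into fixed-length rows. The function name `pack_samples`, the `eos_id` value and the choice to drop the trailing partial row are assumptions for illustration, not necessarily what subnet 29 uses.

```python
from typing import List


def pack_samples(samples: List[List[int]], seq_len: int, eos_id: int) -> List[List[int]]:
    """Hypothetical helper: greedily pack token lists into rows of exactly seq_len tokens."""
    stream: List[int] = []
    for sample in samples:
        stream.extend(sample)
        stream.append(eos_id)  # mark the boundary between samples
    n_rows = len(stream) // seq_len  # the trailing partial row is dropped
    return [stream[i * seq_len:(i + 1) * seq_len] for i in range(n_rows)]


rows = pack_samples([[1, 2, 3], [4, 5], [6, 7, 8, 9]], seq_len=4, eos_id=0)
print(rows)  # [[1, 2, 3, 0], [4, 5, 0, 6], [7, 8, 9, 0]]
```

Note that a sample can be split across rows (here `[6, 7, 8, 9]`), and, without a per-sample attention mask, tokens in a packed row can attend to tokens of preceding, unrelated samples. Evaluating on single samples, by contrast, feeds each sample on its own.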