So we registered a subnet. Time to turn all our ideas into actual plans and start building right away: get a working validator up and running within a week, build a website, register on GitHub, W&B and HuggingFace, and open a Discord channel.

With so many ideas and so little time, we have to be smart. Start from something people know, start from a codebase we know, and build on that.

Why start a subnet at all?

Subnet 29 is born out of miner frustration on other subnets. Miners are incentivized, more than anyone else, to scrutinize the validator and its weighting logic, which can end up playing a bigger role than the actual training that should form the core of the subnet. That is fine as long as the subnet's goal is aligned with the validation logic, but as soon as misalignment creeps in, miners will stop pursuing the goal the subnet intends them to.

Flaws

This is what we saw as miners on SN6, the finetuning subnet, from March to May of 2024. Various flaws in the validation logic meant that training a chatbot that was objectively better at answering general questions wouldn't win you a dime. As miners scrutinized the validation data source, they soon found out that it contained repeating samples. (Over)training on those samples therefore resulted in models achieving extremely low losses, winning those miners significant amounts of TAO. The models themselves were useless in practice.

Bugs

Things would only get worse as miners discovered that they could in fact inject their own data into the validation data stream. They would upload models and inject the associated samples a hundred-fold, so those models scored losses close to zero, meaning they would always win and receive all the TAO emitted.

“Undocumented features”

But even before that, all kinds of metagaming tactics were employed, such as model copying. Fake miners would clone models and upload them within a minute of a model being published, typically duplicating the models of players known to produce good ones. The scoring logic as implemented (accidentally) rewarded the first copy of the top model with some 10% of the reward the top model got, which is quite significant compared to the (lack of) work done.

The battle of metagamers

To beat the fake miners at their own game, fake-fake-miners started committing the model hashes submitted by honest miners within the same block. They would track pending extrinsics and push their own within the 12-second block window (see our previous blog post about pushing and tracking register_subnet extrinsics).

Trying to earn some honest TAO

Reporting these kinds of issues did not help much, unfortunately, so we all just lived with it and employed defensive tactics, keeping an eye out for the next bug or exploit. Going through the process described above, we were amazed to see that only contributions in the form of models were rewarded, while much of the improvement was to be found in eliminating bugs and metagaming opportunities.

All in all, it felt like a race to the bottom, which, for SN6, it actually turned out to be.

Our turn! Where to start?

In our four-month mining career, we scrutinized the codebases of SN6 and SN9 in our attempts to reach the top of the respective leaderboards. Although we failed often, we sometimes succeeded. We even expanded the validator codebases with our own logging for analysis – why does our model (not) win, why does another model perform better, why does uploading five copies help? – so it seemed like a good starting point to take the SN9 validator with our changes and publish that. Apart from this, some low-hanging fruit can be implemented right away. And so SN29 will be version 2 of SN9, for now at least.

Changes in week 0

These are the major changes to the validator after forking from SN9, as published in week 0:

  • scoring metric awards one win per sample
  • early-upload advantage decays with time
  • winrate-to-weight transformation changed from softmax to winrate^f (see the sketch after this list)
  • samples are evaluated individually, not packed into 4096-token blocks
  • a bug bounty mechanism dubbed “Hall of Fame” is implemented
  • preparations are in place for having multiple competitions (like SN6 used to have)
  • codebase is cleaned up a little, removing all kinds of backward compatibility
  • upload script is improved
  • excessive forking (run_in_subprocess()) is reduced; proper timeout handling in the bittensor code will be researched soon
  • various tables showing relative model scores are added to the logging output
  • a “benchmark.json” feature for miners is implemented, allowing models to be injected into the evaluation loop while the validator is running
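To make the winrate^f change concrete, here is a minimal sketch of both transformations. This is not the validator code itself; the function names, the default temperature and the exponent value are illustrative assumptions. It does show the behavioural difference we care about: a softmax swings from near-uniform to winner-take-all depending on its temperature, while winrate^f lets a close runner-up keep a meaningful share of the weight.

```python
import numpy as np

def softmax_weights(winrates: np.ndarray, temperature: float = 1.0) -> np.ndarray:
    """Old-style transformation: softmax over win rates.

    The temperature controls how spiky the result is; a very low temperature
    pushes almost all weight onto the single best model.
    """
    scaled = winrates / temperature
    exps = np.exp(scaled - scaled.max())  # subtract max for numerical stability
    return exps / exps.sum()

def power_weights(winrates: np.ndarray, f: float = 3.0) -> np.ndarray:
    """New-style transformation: winrate raised to a power f, then normalized.

    Higher win rates still dominate, but the drop-off toward second place
    is smoother than a low-temperature softmax.
    """
    powered = winrates ** f
    return powered / powered.sum()

# Example: three models with win rates 0.50, 0.45 and 0.05.
winrates = np.array([0.50, 0.45, 0.05])
print(softmax_weights(winrates))  # ~[0.39, 0.37, 0.25]: nearly flat at temperature 1.0,
                                  # winner-take-all at very low temperatures
print(power_weights(winrates))    # ~[0.58, 0.42, 0.00]: the best model leads,
                                  # but a close second still earns real weight
```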