In the previous post on incentive, we described how loss values are used to score models against each other, and how the relative standing between models changes over time due to decaying advantage. From that conceptual idea, a few steps are needed to get to a ranking of models and to the distribution of incentive.
The measurement quantum
When models are listed or summarized, “the average loss of the model” is often shown: the mean of a collection of per-sample losses. To compare models accurately, however, individual sample losses are calculated and compared between models. A random set of samples is first selected, and then the loss on each of these samples is determined for every competing model. From this point on, comparison strategies diverge.
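As a minimal sketch of this measurement step (illustrative Python, not SN29 source; the toy models and sample set are assumptions), the snippet below builds the per-sample loss matrix that all later comparisons operate on:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for real models: each maps a sample to a loss value.
models = {
    "model_a": lambda s: 2.0 + 0.10 * s,
    "model_b": lambda s: 2.1 + 0.05 * s,
}

# One shared random sample set, evaluated by every competing model.
samples = rng.uniform(0.0, 1.0, size=8)

# loss_matrix[i, j] = loss of model i on sample j.
loss_matrix = np.array([[model(s) for s in samples]
                        for model in models.values()])

print(loss_matrix.mean(axis=1))  # the often-quoted "average loss"
print(loss_matrix)               # the per-sample losses actually compared
```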
“Win rate”
One way to arrive at a “win rate” per model is to take each model and, for every sample, count the number of competing models that are worse on that particular sample. Although this may sound simple, it introduces complex meta-gaming dynamics: every competitor is able to submit a string of additional copies of a model that lose every sample against that first model, significantly increasing the first model’s win rate. Now imagine every competing player doing this: the validators will be evaluating many times more models than are actually competing, slowing down the evaluation of models, while ultimately rewarding not the objectively best model, but the smartest competitor.
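To make the exploit concrete, here is an illustrative sketch (toy numbers, not subnet code) of the naive counting rule, and of how a deliberately worse copy inflates the original’s score:

```python
import numpy as np

def naive_win_rate(losses):
    # losses: (num_models, num_samples) matrix of per-sample losses.
    # For each model and sample, count competitors with a strictly
    # higher loss, then normalize by samples * competitors.
    n_models, n_samples = losses.shape
    beaten = (losses[:, None, :] < losses[None, :, :]).sum(axis=1)
    return beaten.sum(axis=1) / (n_samples * (n_models - 1))

honest = np.array([
    [1.0, 2.0, 3.0],   # model 0
    [2.0, 1.0, 2.0],   # model 1
])
print(naive_win_rate(honest))   # [0.333, 0.667]

# Exploit: model 0's owner uploads a copy that is strictly worse on
# every sample. Model 0's score rises without any real improvement.
rigged = np.vstack([honest, honest[0] + 0.1])
print(naive_win_rate(rigged))   # model 0 now ties model 1
```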
SN29 concept of “win rate”
To address this issue, and to make uploading copies a pure waste of time and bandwidth with nothing to gain, we changed the logic slightly: per sample, one winning model is determined. The number of wins a model scores is indicative of its performance. It is easily seen that a copy will not score a single win.
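A minimal sketch of the single-winner rule (illustrative; the tie-break toward the earlier-listed model is an assumption of this sketch, but it shows why an exact copy never wins):

```python
import numpy as np

def win_counts(losses):
    # losses: (num_models, num_samples). Exactly one winner per
    # sample: the model with the lowest loss. On exact ties, argmin
    # picks the first (earlier-listed) model, so a copy never wins.
    winners = np.argmin(losses, axis=0)
    return np.bincount(winners, minlength=losses.shape[0])

losses = np.array([
    [1.0, 2.0, 3.0],   # original model
    [2.0, 1.0, 2.0],   # competitor
    [1.0, 2.0, 3.0],   # exact copy of the original
])
print(win_counts(losses))  # [1 2 0]: the copy scores zero wins
```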
From win rate to incentive: reward broadly trained models
The ability of a model to score well on a range of samples is incentivized. Consider a situation where miner A uploads two models that both score 25% wins, and miner B uploads one model that scores 50% wins. Miner B should be rewarded more, because in practice his model is the most useful one. To achieve a skew towards that end, various mathematical strategies can be chosen. In SN29 the win rate is raised to some power (>1), and the resulting numbers are normalized. The power factor is chosen to be 1.2 initially, so in the example given miner B will receive an intermediate score of 0.5^1.2 ≈ 0.435 and miner A will receive an intermediate score of 2 × 0.25^1.2 ≈ 0.379. After normalization, miner B will receive roughly 53% of the incentive.
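The skew and normalization step fits in a few lines; this sketch reproduces the example numbers from the text:

```python
import numpy as np

POWER = 1.2  # the initial power factor from the text

def incentive_shares(win_rates):
    # Raise each model's win rate to a power > 1, then normalize,
    # so one broadly strong model out-earns two narrow ones with
    # the same combined win rate.
    scores = np.asarray(win_rates) ** POWER
    return scores / scores.sum()

# Miner A's two models at 25% wins each vs miner B's one at 50%.
shares = incentive_shares([0.25, 0.25, 0.50])
print(shares[2])                 # miner B: ~0.53
print(shares[0] + shares[1])     # miner A combined: ~0.47
```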
Multiple competitions
Within the subnet, various competitions are available for miners, each with its own set of parameters. These may include limits on the number of parameters, the tokenizer used, the model type, and so on. Competition parameters are updated weekly and reflect the direction(s) of research. For example, by choosing to relax a particular constraint, miners are incentivized to focus on how best to apply the freedom gained to improve a model. By doing this in incremental steps, we hope to avoid creating a winner-takes-all situation where one miner can dominate the competition indefinitely through sheer processing/spending power. We hope to unlock smart solutions at every turn – and by forcing frequent publishing of models we effectively force participants to lay their cards on the table, so that we all learn. Each competition gets a set percentage of the total reward, which is then divided per competition using the incentive formulas described above.
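A sketch of the per-competition split (the competition names and percentages here are made-up examples, not the live schedule):

```python
# Assumed example weights: each competition gets a fixed slice of the
# total reward, then that slice is split among its miners using the
# win-rate formula described above.
competition_share = {"comp_a": 0.6, "comp_b": 0.4}

def split_rewards(total_reward, win_rates_per_comp, power=1.2):
    rewards = {}
    for comp, win_rates in win_rates_per_comp.items():
        scores = [w ** power for w in win_rates]
        pool = total_reward * competition_share[comp]
        rewards[comp] = [pool * s / sum(scores) for s in scores]
    return rewards

print(split_rewards(100.0, {"comp_a": [0.5, 0.25, 0.25],
                            "comp_b": [0.7, 0.3]}))
```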
Rewarding other contributions: Hall of Fame
But before anything else, we are most thankful to those individuals who help us out, notify us about bugs (instead of exploiting them) or in any way do something good for the subnet or the community. If an event warrants handing out a bounty, we add the hotkey of the contributor to the Hall of Fame, along with the starting block, a short description of the contribution and the rewarded amount, in units of epochs of incentive. The validators will award the hotkey a decaying amount that will eventually add up to the total reward: each epoch the hotkey receives a fraction of the remaining bounty, so the payouts form a geometric series converging to the full amount. The decay rate is 0.5% per epoch, which amounts to roughly 50% per week. Bounties are awarded first in the incentive distribution logic, with a cap of 40% of emissions, after which the remaining emission is distributed according to model scores.
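A sketch of the payout schedule (illustrative code; it assumes roughly 140 epochs per week, which is consistent with the stated “roughly 50% per week” at 0.5% per epoch):

```python
DECAY = 0.005  # 0.5% of the remaining bounty per epoch

def bounty_payouts(total, epochs):
    # Each epoch pays out a fixed fraction of what is still owed,
    # a geometric series that converges to the full bounty.
    remaining = total
    for _ in range(epochs):
        payout = remaining * DECAY
        remaining -= payout
        yield payout

week = list(bounty_payouts(100.0, 140))
print(sum(week))  # ~50.4: about half the bounty after one week
```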
From weight to tao
Within the subnet, validators perform everything described above to compute “weight values” for each participating miner. In the blockchain code, the weight values from all validators are then combined into a single incentive value per miner, using a consensus algorithm that removes outliers; this incentive value translates into the actual reward in tao.
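As a rough illustration of outlier removal, and emphatically not the chain’s actual consensus algorithm, the sketch below aggregates validator weights with a stake-weighted median, so that a single deviating validator barely moves the result:

```python
import numpy as np

def aggregate(weights, stake):
    # weights: (num_validators, num_miners); stake: (num_validators,).
    # Take the stake-weighted median of each miner's column as an
    # outlier-robust consensus value, then normalize to incentives.
    stake = stake / stake.sum()
    consensus = []
    for m in range(weights.shape[1]):
        order = np.argsort(weights[:, m])
        w = weights[order, m]
        cum = np.cumsum(stake[order])
        consensus.append(w[np.searchsorted(cum, 0.5)])
    consensus = np.array(consensus)
    return consensus / consensus.sum()

weights = np.array([[0.7, 0.3],
                    [0.6, 0.4],
                    [0.0, 1.0]])   # the last validator is an outlier
stake = np.array([1.0, 1.0, 1.0])
print(aggregate(weights, stake))   # ~[0.6, 0.4]: outlier has little effect
```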