What problem would you like to see Reinforcement Learning applied to?
We often talk about the dangers and challenges of AI and self-improving agents, but I'm curious what you view as potential beneficial applications of AI - if any! As an ML researcher I encounter a lot of positivity and hype in the field, so the very different perspective of the...
An important distinction here is that the number of tokens a model was trained on should not be confused with the number of tokens in the dataset: if each token in the dataset is seen exactly once during training, then the model has been trained for one "epoch".
In my experience, scaling continues for quite a few epochs over the same dataset. Overfitting only kicks in, and scaling only breaks down, when the model has more parameters than the dataset has tokens and training runs for more than 10 epochs.
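To make the tokens-versus-epochs distinction concrete, here is a minimal Python sketch. The function names and all the numbers are hypothetical, chosen only for illustration, and the overfitting check just encodes the rule of thumb from my own experience above. It is not an established result.

```python
def epochs_trained(tokens_seen: int, dataset_tokens: int) -> float:
    """Number of passes over the dataset implied by the tokens seen in training."""
    if dataset_tokens <= 0:
        raise ValueError("dataset_tokens must be positive")
    return tokens_seen / dataset_tokens


def overfitting_likely(n_params: int, dataset_tokens: int, epochs: float) -> bool:
    """Anecdotal rule of thumb from the post: both conditions must hold."""
    return n_params > dataset_tokens and epochs > 10


# Hypothetical numbers, for illustration only.
dataset_tokens = 1_000_000_000   # unique tokens in the dataset
tokens_seen = 4_000_000_000      # tokens processed during training
n_params = 7_000_000_000         # model parameters

epochs = epochs_trained(tokens_seen, dataset_tokens)
print(epochs)  # 4.0
print(overfitting_likely(n_params, dataset_tokens, epochs))  # False
```

So a model "trained on 4B tokens" may only have seen 1B distinct tokens, four times each. Under the rule of thumb, this hypothetical run has more parameters than dataset tokens but is still well short of the 10-epoch mark.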