A couple more thoughts on “what dataset/environments are necessary for training AGI”:
I also find it odd that Bio Anchors does not talk much about data requirements, and I'm glad you pointed that out.
Thus, to get timelines, we'd also need to estimate what dataset/environments are necessary for training AGI. But I'm not sure we know what these datasets/environments look like.
I suspect this could be easier to answer than we think. After all, a typical human has only a certain number of skills and a certain number of experiences. These may be numerous, but they are finite. If we can enumerate and analyze all of them, we may be able to get a lot of insight into what is “necessary for training AGI”.
If I were to try to come up with an estimate, here is one way I might approach it: