CEO at Redwood Research.
AI safety is a highly collaborative field--almost all the points I make were either explained to me by someone else, or developed in conversation with other people. I'm saying this here because it would feel repetitive to say "these ideas were developed in collaboration with various people" in all my comments, but I want to have it on the record that the ideas I present were almost entirely not developed by me in isolation.
Please contact me via email (bshlegeris@gmail.com) instead of messaging me on LessWrong.
This can matter for deployment as well as research! Back in 2021, a friend of mine made this mistake while training a customer service model, leading to the model making an extremely inappropriate sexual comment while being demoed to a potential customer; he eventually figured out that a user had said that to the model at some point.
That said, I'm not actually sure why it would in general be a mistake in practice to train on the combination. Models often improve when you train them on side tasks that have some relevance to what they're supposed to be doing---that's the whole point of pretraining. How much are you saying that it's a mistake to do this for deployment, rather than problematic when you're trying to experiment on generalization?
Even though the basic ideas are kind of obvious, I think that our thinking them through and pushing on them has made a big difference in what companies are planning to do.
The bioanchors post was released in 2020. I really wish you had bothered to get basic facts right when being so derisive about people's work.
I also think it's bad manners for you to criticize other people for making clear predictions given that you didn't make such predictions publicly yourself.
Notably, bioanchors doesn't say that we should be confident AI won't arrive before 2040! Here's Ajeya's distribution in the report (which was finished in about July 2020).
My core argument in this post isn't really relevant to anything that was happening in 2020, because people weren't really pushing on concrete changes to safety practices at AI companies yet.
Ugh, I think you're totally right and I was being sloppy; I totally unreasonably interpreted Eliezer as saying that he was wrong about how long/how hard/how expensive it would be to get between capability levels. (But maybe Eliezer misinterpreted himself the same way? His subsequent tweets are consistent with this interpretation.)
I totally agree with Eliezer's point in that post, though I do wish that he had been clearer about what exactly he was saying.
Another important point here is that if there had been substantial economic incentive to build strong Go players, then powerful Go players would have been built earlier, and the time between players of those two levels would probably have been longer.
Thanks heaps for pulling this up! I totally agree with Eliezer's point there.
It really depends what you mean by a small amount of time. On a cosmic scale, ten years is indeed short. But I definitely interpreted Eliezer back then (for example, while I worked at MIRI) as making a way stronger claim than this: that we'd go from AI that was almost totally incapable of intellectual work to AI that can overpower humanity within, e.g., a few days/weeks/months. And I think you need to believe that much stronger claim in order for a lot of the predictions about the future that MIRI-sphere people were making back then to make sense. I wish we had all been clearer at the time about what specifically everyone was predicting.
FWIW, it looks to me like they restrict their linked roles to things that are vaguely related to safety or alignment. (I think that the 80,000 Hours job board does include some roles that don't have a plausible mechanism for improving AI outcomes except via the route of making Anthropic more powerful, e.g. the alignment fine-tuning role.)