Davidad's Bold Plan for Alignment: An In-Depth Explanation
Gabin Kolly and Charbel-Raphaël Segerie contributed equally to this post. Davidad proofread this post. Thanks to Vanessa Kosoy, Siméon Campos, Jérémy Andréoletti, Guillaume Corlouer, Jeanne S., Vladimir I. and Clément Dumas for useful comments.

Context

Davidad has proposed an intricate architecture aimed at addressing the alignment problem, which necessitates extensive...
- The formal models don't need to be open and public, and probably shouldn't be. Of course, this adds a layer of difficulty: it is harder to coordinate on an international scale and invite many researchers to help with your project when you also want some protection against your model being stolen or published on the internet. Making it open source is perhaps acceptable if training a model in this simulation is so expensive that no other group can afford it.
- Good question. I don't know, and I don't think that I have a good model of what the simulation would look like. Here