Gabin


Davidad's Bold Plan for Alignment: An In-Depth Explanation


Apr 19, 2023

Replying to Davidad's Bold Plan for Alignment: An In-Depth Explanation

Gabin, 3y

The formal models don't need to be open and public, and probably shouldn't be. Of course, this adds a layer of difficulty: it is harder to coordinate internationally and invite many researchers to help with your project when you also want to protect your model from being stolen or published on the internet. Open-sourcing it is perhaps okay if training a model in this simulation is so expensive that no other group can afford it.
Good question. I don't know, and I don't think that I have a good model of what the simulation would look like. Here

Charbel-Raphaël, Gabin

Gabin Kolly and Charbel-Raphaël Segerie contributed equally to this post. Davidad proofread this post.

Thanks to Vanessa Kosoy, Siméon Campos, Jérémy Andréoletti, Guillaume Corlouer, Jeanne S., Vladimir I. and Clément Dumas for useful comments.

Context

Davidad has proposed an intricate architecture aimed at addressing the alignment problem, which necessitates extensive knowledge to comprehend fully. We believe that there are currently insufficient public explanations of this ambitious plan. The following is our understanding of the plan, gleaned from discussions with Davidad.

This document adopts an informal tone. The initial sections offer a simplified overview, while the latter sections delve into questions and relatively technical subjects. This plan may seem extremely ambitious, but the appendix provides further elaboration on...
