Constructability: Plainly-coded AGIs may be feasible in the near future
Charbel-Raphaël Segerie and Épiphanie Gédéon contributed equally to this post. Many thanks to Davidad, Gabriel Alfour, Jérémy Andréoletti, Lucie Philippon, Vladimir Ivanov, Alexandre Variengien, Angélina Gentaz, Simon Cosson, Léo Dana and Diego Dorn for useful feedback. TLDR: We present a new method for a safer-by design AI development. We think using plainly coded AIs may be feasible in the near future and may be safe. We also present a prototype and research ideas on Manifund. Epistemic status: Armchair reasoning style. We think the method we are proposing is interesting and could yield very positive outcomes (even though it is still speculative), but we are less sure about which safety policy would use it in the long run. Current AIs are developed through deep learning: the AI tries something, gets it wrong, then gets backpropagated and all its weight adjusted. Then it tries again, wrong again, backpropagation again, and weights get adjusted again. Trial, error, backpropagation, trial, error, backpropagation, ad vitam eternam ad nauseam. Of course, this leads to a severe lack of interpretability: AIs are essentially black boxes, and we are not very optimistic about post-hoc interpretability. We propose a different method: Constructability or AI safety via pull request.[1] By pull request, we mean that instead of modifying the neural network through successive backpropagations, we construct and design plainly-coded AIs (or hybrid systems) and explicitly modify its code using LLMs in a clear, readable, and modifiable way. This plan may not be implementable right now, but might be as LLMs get smarter and faster. We want to outline it now so we can iterate on it early. One possible long-term vision that constructability could lead to, in which we make use of a black-box superhuman coder to create code that we then audit and deploy. Overview If the world released a powerful and autonomous agent in the wild, white box or black box, or any color really, human