ChatGPT on Designing Friendly Superintelligence

May 24, 2023

Now that we have, for example, the head of OpenAI suggesting to the US Senate that a license (to be issued by a new regulatory agency) should be required to own and operate an advanced AI… “singularity politics” has become a topic in mainstream headlines. So let’s instead see how things are progressing on the technical front.

Back in January, I could ask ChatGPT for its thoughts on the topic of benevolent superhuman AI, and it would come up with questions and answers like these.

Now it’s late May, and I am able to generate a 50-page paper, “Designing a Human-Friendly Superintelligent Deep Learning Network”, by asking ChatGPT the question:

Could you write a multi-chapter technical paper describing a design for a deep learning network that would become superintelligent while still being human-friendly?

Its initial reply was just an abstract and a table of contents, but when asked, it obligingly provided the full text of each subsection. If we leave out the prompting and the formatting, steps that could also be automated, I’d estimate that the text took between 5 and 10 minutes to generate.
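For the curious, here is a minimal sketch of how that section-by-section prompting could be automated, assuming the OpenAI Python client as it existed in 2023 (openai<1.0). The model name, prompts, and section list are illustrative stand-ins, not the ones I actually used:

```python
# Hypothetical sketch: automate the outline-then-subsections prompting loop.
# Assumes the 2023-era OpenAI Python client (openai<1.0); prompts and the
# section list are illustrative.
import openai

openai.api_key = "sk-..."  # your API key

def ask(messages):
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=messages,
    )
    return response["choices"][0]["message"]["content"]

topic = ("a design for a deep learning network that would become "
         "superintelligent while still being human-friendly")

# Step 1: ask for an abstract and a table of contents.
history = [{
    "role": "user",
    "content": (f"Could you write a multi-chapter technical paper "
                f"describing {topic}? Start with an abstract and a "
                f"table of contents."),
}]
outline = ask(history)
history.append({"role": "assistant", "content": outline})

# Step 2: ask for the full text of each subsection in turn.
# (In practice the subsection list would be parsed out of the outline.)
sections = ["1.1", "1.2", "2.1"]  # illustrative
paper = [outline]
for sec in sections:
    history.append({"role": "user",
                    "content": f"Please write the full text of section {sec}."})
    text = ask(history)
    history.append({"role": "assistant", "content": text})
    paper.append(text)

print("\n\n".join(paper))
```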

What of the paper itself? It’s purely qualitative, with no equations or references. You could treat it as a project manifesto: it lists everything that has to come together to create a friendly superintelligence, and proposes a variety of techniques and protocols that might be relevant, but it doesn’t actually specify how they should be integrated (though it does acknowledge that system integration is part of the process; see section 7.1).

As far as I can see, the technical ideas that are referenced – with one exception – come from the standard theory and practice of deep learning. So really what it’s given us is a manual, and a bunch of ideas, for safely and effectively carrying out a generic deep learning project.

The one exception is “Coherent Extrapolated Volition”, which shows up in Chapter 5, Value Alignment, as the most advanced form of value learning. It’s rather charming that this idea from the visionary fringe of AI safety, conceived precisely to tackle the problem of making human-friendly superintelligence, showed up quietly integrated into ChatGPT’s manifesto, alongside conventional technical concepts of deep learning, without any attempt on my part to make it appear.

A determined skeptic might be unimpressed by this document. It’s just words; in a sense it’s just rhetoric, and we already knew that ChatGPT can produce words. The idea of a “friendly superintelligence” is still just as much vaporware as it was before I conducted my latest experiment – right?

Right. But language models can not only write, they can also code. Turning this manifesto into code would be a lot more involved than turning it into a PDF, and of course the code would not work on the first try. Now suppose human beings look at that failed first try, learn from what went wrong, and try again; and keep repeating that process. On the tenth try, we still wouldn’t have a friendly superintelligence, but maybe we would have actual working code, along with training and testing protocols. And on the hundredth try? And the thousandth?
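To make that iterative process concrete, here is a hypothetical sketch of the generate-test-refine loop just described, again assuming the 2023-era OpenAI client. The test harness, file names, and prompts (manifesto.txt, candidate.py, the pytest suite) are all stand-ins for whatever a real attempt would use:

```python
# Hypothetical sketch of a generate-test-refine loop: ask the model for code,
# run the tests, feed failures back, and repeat. Assumes the 2023-era OpenAI
# client; file names and the pytest harness are illustrative.
import subprocess
import openai

def generate_code(spec, feedback=""):
    prompt = f"Implement the following design as Python code:\n{spec}"
    if feedback:
        prompt += f"\nThe previous attempt failed with:\n{feedback}\nPlease fix it."
    response = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return response["choices"][0]["message"]["content"]

def run_tests(code):
    """Write the candidate code to disk and run the test suite against it."""
    with open("candidate.py", "w") as f:
        f.write(code)
    result = subprocess.run(["pytest", "tests/"], capture_output=True, text=True)
    return result.returncode == 0, result.stdout + result.stderr

spec = open("manifesto.txt").read()  # the 50-page design document
feedback = ""
for attempt in range(100):
    code = generate_code(spec, feedback)
    ok, feedback = run_tests(code)
    if ok:
        break  # working code and passing tests – though still no friendly superintelligence
```

In the version I describe above, a human reviews each failed try rather than a script, but the shape of the loop is the same; the interesting question is what accumulates across the hundredth and thousandth iterations.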

It’s inevitable that these unexpected AIs of 2023 will be used to help design their successors. That’s undoubtedly already happening in the big tech companies. And that’s why I think this document is interesting. It shows you what ChatGPT “thinks” when it is asked, in a very unadorned way, how to make a friendly superintelligence. These are the principles and methods that it spontaneously suggests; and they may be among the principles and methods that it would try to apply, in a more sophisticated attempt to design and create its successors.
