I asked ChatGPT how to make a human-friendly superintelligence, and it gave me a 50-page plan. There are no technical novelties, nor any attempt to deal with the dangers that are supposed to be peculiar to superintelligence or self-enhancing systems. The manifesto is really just a potpourri of principles and methods that might apply to any deep learning system meant to interact with human beings. And yet I suspect startups have been founded on far less.

To me, this was interesting because it is what ChatGPT comes up with when you ask it in the laziest possible way, with no prompt engineering at all, how to do this thing. As such, it is representative of its "spontaneous thoughts": the first things that come to its mind when it is presented with this task. It is therefore also a glimpse of how an LLM-based agent might proceed when similarly instructed.

I would be interested to see the results of experiments similar to this one. As I said, I made no effort at all; I just asked the question and then got it to expand on its own outline. The people who are actually developing large language models must have conducted similar, but far more sophisticated, experiments by now.
