Review

A 25-minute interview with Ilya, conducted by Sven Strohband, was released on July 17: https://www.youtube.com/watch?v=xym5f0XYlSc

This interview has a section dedicated to AI safety (about 7 minutes, starting at 14:56). Ilya is now the co-lead of the OpenAI "superalignment" effort, and his thinking will likely be particularly influential in how that effort evolves.

What he is saying seems to be somewhat different from what is in the consensus OpenAI "superalignment" documents. It's compatible, but the emphasis is rather different. In particular, thinking about humans controlling or steering a superintelligent system is limited to an analogy with controlling a nuclear reactor to prevent a meltdown, and a more collaborative approach between humans and AIs is emphasized instead.

(I am not sure when the interview was recorded, but no earlier than July 6, since it mentions Introducing Superalignment.)

Here is my attempt at editing the YouTube transcript of that part of the conversation. The truly interesting part starts at 20:07. He hopes that a collaboration with superintelligence could solve the issues of misuse (so, no, he is not aiming to make superintelligence alignable to arbitrary goals; designing the proper goals is likely to be a collaborative activity between humans and AIs). I've put some bold marks for emphasis.

15:03 Sven: It's worthwhile to also talk about AI safety, and OpenAI has just recently released a document where you're one of the undersigners. Sam has testified in front of Congress. What worries you most about AI safety?

15:27 Ilya: Yeah, I can talk about that. So let's take a step back and talk about the state of the world. So, you know, we've had this AI research happening, and it was exciting, and now you have the GPT models, and now you all get to play with all the different chatbots and assistants, you know, Bard and ChatGPT, and people say, okay, that's pretty cool, it can do things; and indeed they already can. You can start perhaps worrying about the implications of the tools that we have today, and I think that is a very valid thing to do, but that's not where I allocate my concern.

16:14 The place where things get really tricky is when you imagine fast-forwarding some number of years, a decade let's say: how powerful will AI be? Of course, this incredible future power of AI will be difficult to imagine, frankly. With an AI this powerful you could do incredible, amazing things that are perhaps even outside of our dreams, if you can really have a dramatically powerful AI. But the place where things get challenging is directly connected to the power of the AI. It is powerful, it is going to be extremely, unbelievably powerful, and it is because of this power that the safety issues come up. I'll mention three that I personally see. You alluded to the letter that we posted at OpenAI a few days ago, actually yesterday, about some ideas that we think would be good to implement to navigate the challenges of superintelligence.

17:46 Now, what is superintelligence, and why did we choose to use the term "superintelligence"? The reason is that superintelligence is meant to convey something that's not just like an AGI. With AGI we said, well, you have something kind of like a person, kind of like a co-worker. Superintelligence is meant to convey something far more capable than that. When you have such a capability, can we even imagine how it will be? But without question it's going to be unbelievably powerful; it could be used to solve incomprehensibly hard problems. If it is used well, if we navigate the challenges that superintelligence poses, we could radically improve the quality of life. But the power of superintelligence is so vast, and so are the concerns.

18:37 The concern number one has been expressed a lot, and this is the scientific problem of alignment. You might want to think of it as an analog to nuclear safety. You know, you build a nuclear reactor, you want to get the energy, you need to make sure that it won't melt down even if there's an earthquake and even if someone tries to, I don't know, smash a truck into it. (Sven: Yep.) So this is superintelligence safety, and it must be addressed in order to contain the vast power of the superintelligence. It's called the alignment problem. One of the suggestions that we had in the post was an approach where an international organization could create various standards at this very high level of capability, and I want to make this other point, you know, about the post and also about our CEO Sam Altman's Congressional testimony where he advocated for regulation of AI. The intention is primarily to put rules and standards of various kinds on the very high level of capability. You know, you could maybe start looking at GPT-4, but that's not really what is interesting or relevant here; what is relevant is something which is vastly more powerful than that. When you have a technology so powerful, it becomes obvious that you need to do something about this power. That's the first concern, the first challenge to overcome.

20:08 The second challenge to overcome is that, of course, we are people, we are humans, "humans of interests", and if you have superintelligences controlled by people, who knows what's going to happen... I do hope that at this point we will have the superintelligence itself try to help us solve the challenges in the world that it creates. This is no longer an unreasonable thing to say. If you imagine a superintelligence that indeed sees things more deeply than we do, much more deeply, that understands reality better than us, we could use it to help us solve the challenges that it creates.

20:43 Then there is the third challenge, which is the challenge, maybe, of natural selection. You know what the Buddhists say: change is the only constant. So even if you do have your superintelligences in the world, and we've managed to solve alignment, and no one wants to use them in very destructive ways, and we've managed to create a life of unbelievable abundance (not just material abundance, but health, longevity, all the things we don't even try dreaming about because they're so obviously impossible), if you've got to this point, then there is the third challenge of natural selection. Things change, you know... Natural selection applies to ideas, to organizations, and that's a challenge as well.

21:28 Maybe the Neuralink solution of people becoming part AI will be one way we will choose to address this. I don't know. But I would say that this kind of describes my concern. And just as the concerns are big, it is so worthwhile to overcome them if you manage, because then we could create truly unbelievable lives for ourselves, lives that are completely unimaginable today. So it is a challenge that's really, really worth overcoming.

22:00 Sven: I very much like the idea that there needs to be a sort of threshold above which we really, really should pay attention. Because, you know, speaking as a German: with European-style regulation, often coming from people that don't really know very much about the field, you can also completely kill innovation, which would be a little bit of a pity.

My own final comments:

  • I really like the emphasis on a collaborative, non-adversarial approach to interactions between humans and AIs. I think we need to find a way to combine a security mindset with this collaborative, non-adversarial approach, or we'll be completely doomed.
  • In particular, I like that steering and controlling is understood in terms of avoiding a blow-up, and that the overall interaction is supposed to be collaborative, including the goal-making (although the phrase "no one wants to use them in very destructive ways" does suggest a rather radical reshaping of the world's structure and of its inhabitants by this human-AI collaboration, and one might worry about what that would imply and whether it would go well).
  • Speaking of merging humans and AIs, I'd prefer people to focus more on the intermediate solutions before jumping to Neuralink-grade ones. In particular, high-end augmented reality and high-end non-invasive brain-computer interfaces can go a long way and are much easier to accelerate rapidly, so I wish people would not gloss over those intermediate solutions, but would talk about them more.
Comments:

Thanks for posting this. I am still a bit fuzzy on what exactly the Superalignment plan is, or if there even is a firm plan at this stage. Hope we can learn more soon.

I think they had a reasonably detailed (but unfortunately unrealistic) plan for aligning superintelligence before Ilya became a co-lead of the Superalignment team. That plan had been published in multiple installments.

The early July text https://openai.com/blog/introducing-superalignment was the last of those installments, and most of its technical content was pre-Ilya (as far as I knew), but it also introduced Ilya as a co-lead.

But the problem with most such alignment plans, including this one, had always been that they didn't have much chance of working for a self-improving superintelligent AI or ecosystem of AIs, that is, exactly when we start really needing them to work.

I think Ilya understood this very well, and he started to revise plans and to work in new directions accordingly; we were seeing various bits of his thinking on that in his interviews. (In addition to what he said here, one other motif he kept returning to in recent months was that it is desirable for superintelligent AIs to think of themselves as something like parents and of us as something like their children, so one of the questions is what we should do to achieve that.)

But I don't know if he would want to publish details going forward (successful AI safety research is capability research, there is no way to separate them, and the overall situation might be getting too close to the endgame). He will certainly share something, but the core novel technical work will more and more be produced via intellectual collaboration with cutting-edge advanced (pre-public-release, in-house) AI systems, and they would probably want to at least introduce a delay before sharing something as sensitive as this.