Thanks for posting this. I am still a bit fuzzy on what exactly the Superalignment plan is, or if there even is a firm plan at this stage. Hope we can learn more soon.
I think they had a reasonably detailed (but unfortunately unrealistic) plan for aligning superintelligence before Ilya became a co-lead of the Superalignment team. It had been published in multiple installments.
The early July text https://openai.com/blog/introducing-superalignment was the last of those installments; most of its technical content was pre-Ilya (as far as I can tell), but it also introduced Ilya as a co-lead.
But the problem with most such alignment plans, including this one, has always been that they don't have much chance of working for a self-improving superintelligent AI or an ecosystem of such AIs, that is, at exactly the point when we really start needing them to work.
I think Ilya understood this very well, and he had started to revise the plans and to work in new directions accordingly; we were seeing bits of his thinking on this in his various interviews. (In addition to what he said here, another motif he kept returning to in recent months was that it is desirable for superintelligent AIs to think of themselves as something like parents and of us as something like their children, so one of the questions is what we should do to achieve that.)
But I don't know if he would want to publish details going forward (successful AI safety research is capability research; there is no way to separate them, and the overall situation might be getting too close to the endgame). He will certainly share something, but the core novel technical work will increasingly be produced via intellectual collaboration with cutting-edge advanced (pre-public-release, in-house) AI systems, and they would probably want to at least introduce a delay before sharing anything as sensitive as that.
There is a 25-minute interview with Ilya, conducted by Sven Strohband and released on July 17: https://www.youtube.com/watch?v=xym5f0XYlSc
The interview has a section dedicated to AI safety (7 minutes, starting at 14:56). Ilya is now a co-lead of the OpenAI "superalignment" effort, and his thinking will likely be particularly influential in how that effort evolves.
What he says seems somewhat different from what is in the consensus OpenAI "superalignment" documents. It's compatible with them, but the emphasis is rather different. In particular, the notion of humans controlling or steering a superintelligent system is limited to an analogy with controlling a nuclear reactor to prevent a meltdown, and a more collaborative approach between humans and AIs seems to be emphasized instead.
(I am not sure when the interview was recorded, but it was no earlier than July 6, since it mentions Introducing Superalignment.)
Here is my attempt at editing the YouTube transcript of that part of the conversation. The truly interesting part starts at 20:07: he hopes that a collaboration with superintelligence could solve the issues of misuse (so, no, he is not aiming to make superintelligence alignable to arbitrary goals; designing the proper goals is likely to be a collaborative activity between humans and AIs). I've added bold in places for emphasis.
My own final comments: