Exploring non-anthropocentric aspects of AI existential safety: https://www.lesswrong.com/posts/WJuASYDnhZ8hs5CnD/exploring-non-anthropocentric-aspects-of-ai-existential (this is a relatively non-standard approach to AI existential safety, but this general direction looks promising).
Same here :-)
I do see feasible scenarios where these things are sustainably nice.
But whether we end up reaching those scenarios... who knows...
I really don’t think it’s crazy to believe that humans figure out a way to control AGI at least.
They want to, yes. But is it feasible?
One problem is that "AGI" is a misnomer (the road to superintelligence goes not via human equivalence, but around it; we have a situation where AI systems are wildly superhuman along a larger and larger number of dimensions, yet still deficient along some important dimensions compared to humans, preventing us from calling them "AGIs"; by the time they are no longer deficient along any important dimensions, they are already wildly superhuman along way too many dimensions).
Another problem is that a "narrow AGI" (in the sense defined by Tom Davidson, https://www.lesswrong.com/posts/Nsmabb9fhpLuLdtLE/takeoff-speeds-presentation-at-anthropic, so we are still talking about very "sub-AGI" systems) is almost certainly sufficient for "non-saturating recursive self-improvement", so one has a rapidly moving target for one's control ambitions (it's also likely that it's not too difficult to reach the "non-saturating recursive self-improvement" mode, so if one freezes one's AI and prevents it from self-modification, others will bypass its capabilities).
In 2023 Ilya sounded like he had a good grasp of these complexities, and he was clearly way above par in the quality of his thinking about AI existential safety: https://www.lesswrong.com/posts/TpKktHS8GszgmMw4B/ilya-sutskever-s-thoughts-on-ai-safety-july-2023-a
Of course, it might be just the stress of this very adversarial situation, talking to hostile lawyers, with his own lawyer pushing him hard to say as little as possible, so I would hope this is not a reflection of any genuine evolution in his thinking. But we don't know...
I also think it’s possible that the U.S. and China might already be talking behind the scenes about a superintelligence ban.
Even if they are talking about this, too many countries and orgs are likely to have a feasible route to superintelligence. For example, Japan is one of those countries (they have Sakana AI, for instance), and their views on superintelligence are very different from our Western views, so it would be difficult to convince them to join a ban; e.g. quoting from https://www.lesswrong.com/posts/Yc6cpGmBieS7ADxcS/japan-ai-alignment-conference-postmortem:
A second difficulty in communicating alignment ideas was based on differing ontologies. A surface-level explanation is that Japan is quite techno-optimistic compared to the west, and has strong intuitions that AI will operate harmoniously with humans. A more nuanced explanation is that Buddhist- and Shinto-inspired axioms in Japanese thinking lead to the conclusion that superintelligence will be conscious and aligned by default. One senior researcher from RIKEN noted during the conference that "it is obviously impossible to control a superintelligence, but living alongside one seems possible." Some visible consequences of this are that machine consciousness research in Japan is taken quite seriously, whereas in the West there is little discussion of it.
Other countries which are contenders include the UK, a number of European countries including Switzerland, Israel, Saudi Arabia, the UAE, Singapore, South Korea, and, of course, Brazil and Russia, and I doubt this is a complete list.
We are already seeing recursive self-improvement efforts take longer to saturate than they did a couple of years ago. I doubt they'll keep saturating for long.
Thanks for posting that deposition.
It’s really strange how he phrases it here.
On one hand, he switched from focusing on the ill-defined “AGI” to focusing on superintelligence a while ago, yet here he is using this semi-obsolete “AGI” terminology.
On the other hand, he seemed to have understood a couple of years ago that no one could be “in charge” of such a system, that at most one could perhaps be in charge of a privileged access to it and privileged collaboration with it (and even that is only feasible if the system chooses to cooperate in maintaining this kind of privileged access).
So it’s very strange, almost as if he has backtracked a few years in his thinking. Of course, this comes right after a break in page numbers: this is page 300, and the previous one is page 169 (I guess there is a process for deciding which parts of this material, marked as “highly confidential”, get released).
I think the idea is that non-profit money is much closer to public property, especially when the non-profit has this kind of charter. So the complaints about a potentially unfair deal are legitimate.
But I am not sure that the non-profit ends up with less expected future money as a result, even before accounting for the yet-undisclosed warrants.
On one hand, it presumably has a smaller share than before (it’s tricky to know exactly, with capped profits for other investors and such; one really needs to calculate more precisely and not just presume). On the other hand, the restructuring is expected to increase OpenAI’s future market share by enabling it to expand faster. So, in expectation, this is, presumably, a smaller share of a larger pie. Whether this smaller share of a larger pie is smaller in expectation than under the pre-existing arrangement is not clear.
The material effect of this restructuring is non-obvious: one needs to do some quantitative modeling, taking into account the undisclosed terms of the warrants and so on, in order to figure this out.
(It’s probably not an accident that Microsoft’s equity in OpenAI is not far from 10x what they invested, their original profit cap was 100x, and the warrants require another 10x growth in valuation before they start kicking in. It is likely that the board was trying to formulate a fair replacement for that 100x profit cap when formulating the warrants (but it’s really annoying that the terms of those warrants don’t seem to be disclosed; or might they actually be disclosed somewhere deep in the filed documents?).)
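To illustrate the kind of back-of-the-envelope comparison I have in mind, here is a minimal sketch with purely made-up numbers (the actual shares, growth effects, and warrant terms are not public, so this only shows the structure of the calculation, not its answer):

```python
# "Smaller share of a larger pie" comparison with purely hypothetical numbers.
# None of these values come from the actual deal terms, which are undisclosed.

old_share = 0.35   # hypothetical non-profit share under the old structure
old_pie   = 1.0    # expected future value under the old structure (normalized)

new_share = 0.26   # hypothetical (smaller) share after restructuring
growth    = 1.5    # hypothetical factor by which restructuring grows the pie

old_expected = old_share * old_pie
new_expected = new_share * old_pie * growth

print(f"old expectation: {old_expected:.3f}")  # 0.350
print(f"new expectation: {new_expected:.3f}")  # 0.390 -- larger, despite the smaller share
# Set growth = 1.2 and the comparison reverses. That is the whole point:
# without the undisclosed terms, even the sign of the change is unclear.
```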
Yes, in this sense you are right. In many countries, regulatory barriers are all-important. Although a good chunk of the world can start adopting fast (and medical tourism does exist).
I think the main body of OpenAI will be dealing with the key safety issues, and not even the whole main body, but the "core group". They have to: the key safety problems are of such a nature that they can't be dealt with from the outside, and the non-profit is "the outside" in this sense. It can direct/advise/assent/review the plans, but it can't do more than that; it just doesn't know how to do the work itself. We got a glimpse of OpenAI's current thinking on "core safety" from Jakub Pachocki during the latest livestream (that's whom they now have instead of Ilya). It sounded good modulo the main difficulty, and we don't know whether they are well prepared to address that difficulty: maintaining invariant properties through the accelerating recursive self-improvement provided by artificial AI researchers, so not letting those properties diverge, and tightening the delta between what those properties ideally should be and what they are at the moment; making sure that the probability of a big disaster per unit of time not only does not grow, but diminishes fast enough that the accumulated probability of a big disaster remains moderate in the infinite limit.
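To make that last condition concrete, here is a toy numerical sketch (my own illustration, not anything OpenAI has published): if the per-period disaster probability stays constant, the accumulated disaster probability goes to 1; if it shrinks fast enough that the per-period probabilities sum to a finite number, the accumulated probability stays bounded below 1 even over an unbounded horizon.

```python
# Accumulated disaster probability over many periods, assuming independent
# per-period risks p_t. A toy model: real risks are neither independent nor
# known; this only illustrates why the per-period risk must keep shrinking.

def cumulative_disaster(risks):
    survival = 1.0
    for p in risks:
        survival *= (1.0 - p)   # probability of getting through every period
    return 1.0 - survival

T = 10_000
constant  = [0.01] * T                          # risk never improves
shrinking = [0.01 * 0.99**t for t in range(T)]  # risk decays geometrically

print(cumulative_disaster(constant))   # ~1.0  -- disaster almost certain eventually
print(cumulative_disaster(shrinking))  # ~0.63 -- bounded, since sum(p_t) converges to ~1.0
```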
The other big project led by the non-profit, the cybersecurity improvements, shows that the non-profit is ready to lead on externalities, on systemic safety problems downstream of AI development. They are better equipped to do that: they have connections across the industry, and this kind of work requires systemic action and a lot of coordination.
(I presume their biomedical project will also try to quietly (or not so quietly) include prevention of artificial pandemics, which is another big downstream safety externality of AI development. The non-profit is capable of driving that.)
But with the core safety of self-modifying, self-improving systems, one can't split safety and capability. It has to be the same group of people: a group of leading AI researchers who are strongly mindful of existential safety, who have the right approach to collaborating with AI systems on that set of issues, and who can drive a take-off jointly with those collaborating AI systems (I don't know whether OpenAI has the right group of people in this sense these days).
I think those “lines in the sand” are very artificial. That’s especially true about AGI, because the road to superintelligence goes not via human equivalence, but around it.
So at any point in time we have AI systems which are still too deficient compared to humans along some dimensions to be called “true AGI”, but are already strongly superhuman along a larger and larger number of dimensions. At the point in time when all important deficiencies compared to humans are gone and we can call a system “AGI” without reservations, it is already wildly superhuman along many dimensions (including many capabilities related to biomedical research).
We also expect continuous progress, not saturation, so at any given point in time, any given task will be easier to accomplish in the future than it is now. But that’s not a good reason to postpone, because we usually need the solution ASAP. People are dying now, more than a million each week, and the sooner we can start to meaningfully decrease this number, the better.
In any case, AIs need to get better at biomedical research in order to be helpful with this, and it takes time. I doubt there is a generic intelligence capability from which everything follows automatically and super rapidly. The direction is towards artificial research assistants, then to artificial researchers, then to very superhuman artificial researchers, but one still needs to push it for any given application field. (Of course, people prioritize AI research first, for obvious reasons, and that’s also where the most formidable existential safety challenges come from, because artificial AI researchers do mean straightforward non-saturating recursive self-improvement, so safety-wise we should talk about that aspect first. But it’s good that they are pushing towards research help in more applied areas too, when those applied areas are urgent. It grounds the whole thing in the right values and the right priorities to some extent. If it slows down the rush to superintelligence a bit, it might be a positive thing too. Although I don’t really expect a slowdown from that; I think AI practitioners and AIs themselves will learn a lot from those “biomedical exercises”.)
Health and curing diseases. The OpenAI Foundation will fund work to accelerate health breakthroughs so everyone can benefit from faster diagnostics, better treatments, and cures. This will start with activities like the creation of open-sourced and responsibly built frontier health datasets, and funding for scientists.
The first seems like a generally worthy cause that is highly off mission. There’s nothing wrong with health and curing diseases, but pushing this now does not advance the fundamental mission of OpenAI. They are going to start with, essentially, doing AI capabilities research and diffusion in health, and funding scientists to do AI-enabled research. A lot of this will likely fall right back into OpenAI and be good PR.
Again, that’s a net positive thing to do, happy to see it done, but that’s not the mission.
I don't think that's correct. The mission is to ensure that AGI benefits all humanity. There are various facets of it, but dealing with health, diseases(, and aging) is one of the main ways smarter and smarter AI systems are expected to benefit all humanity.
AI systems are strong enough already to start contributing in this sense, so it's time for OpenAI to start pushing explicitly in this direction. Also it would be good if AIs see that we actually value this direction.
But going deeper into that is probably not for this comment. In your previous post you wrote:
Sam Altman: We have a safety strategy that relies on 5 layers: Value alignment, Goal alignment, Reliability, Adversarial robustness, and System safety. Chain-of-thought faithfulness is a tool we are particularly excited about, but it is somewhat fragile and requires drawing a boundary and a clear abstraction.
All five of these are good things, but I notice (for reasons I will not attempt to justify here) that I do not expect he who approaches the problem in this way to have a solution that scales to true automated AI researchers. The Tao is missing.
That's certainly correct. None of what they have been saying sheds any light on how to scale this safety strategy to the situation when one has true automated AI researchers. We should be discussing various aspects of this fundamental problem more.
Still, value alignment is fundamental, and taking care of humans' health is an important part of value alignment, so it's a good thing for them to start emphasizing that.
I think unilateralism + leadership is quite inconceivable right now.
I am interested in any scenario you have in mind (not with the intent to fight whatever you suggest, just to see if there are ideas or mechanisms I may be missing).
I think, with a G20 country, it's very easy to imagine. Here is one such scenario.
Xi Jinping decides (for whatever reason) that ASI needs to be stopped. He orders a secret study, and if the study indicates that there are feasible pathways, he orders proceeding along some of them (perhaps in parallel).
For example, he might demand international negotiations and threaten nuclear war, and he is capable of making China line up behind him in support of this policy.
On the other hand, if that study suggests a realistic path to a unilateral pivotal act, he might also order a secret project towards performing that pivotal act.
With a democracy, it's more tricky, especially given that democratic institutions are in bad shape right now.
But if the labor market is a disaster due to AI, and the state is not stepping in adequately to make people whole in the material sense, I can imagine anti-AI forces taking power via democratic means (the main objection is timelines, 4 years is like infinity these days). The incumbent politicians might also start changing their positions on this, if things are bad and there is enough pressure.
A more exotic scenario is an AI executive figuring out how to take over a nuclear-weapons-armed country while armed only with a specialized sub-AGI system, and then deciding to impose a freeze on AI development. "A sub-AGI-powered human-led coup, followed by a freeze". The country in question might support this, depending on the situation.
Another exotic scenario is a group of military officers performing a coup, and their platform might include "stop AI" as one of the clauses. The country will consist of people who support them and people who are mostly silent due to fear.
I think it's not difficult to generate scenarios. None of these scenarios is very pleasant, there is that, unfortunately... (And there is no guarantee that any such scenario will actually succeed at stopping the ASI. That's the problem with all these bans on AI, and scary state forces, and nuclear threats. It's not clear whether they would actually prevent the development of an ASI by a small actor; there are too many unknowns.)
It's an interesting question.
If one dives into the essay itself, one sees some auxiliary PNRs which the author thinks have already occurred.
For instance, in the past, it would have been conceivable for a single country of the G20 to unilaterally make it their priority to ban the development of ASI and its precursors.
In the past, it would have been conceivable for any country in the West to decide to fight off Big Tech and lead the collective fight.
I think both are still quite conceivable (both in a democracy and in a non-democratic state, with an additional remark that there is no guarantee that all countries in the West stay democratic). So here I disagree with the author.
But the Soft PNR is tricky. In the essay, the author defines the Soft PNR differently than in this post:
The Soft PNR is when AI systems are so powerful that, although they “can” theoretically be turned off, there is not enough geopolitical will left to do so.
And geopolitical will is something that can fluctuate. Currently there is no geopolitical will to do so, but in the future it might emerge (and then disappear again, and so on).
When something can fluctuate in this fashion, in what sense can one talk about a "point of no return"?
But as an initial approximation, this framing might still be useful (before one starts diving into the details of how the inevitable disagreements over this matter, between countries and between political forces, might be resolved; when one dives into those disagreements, one discovers different "points of no return", e.g. how likely it is that the coalition of pro-AI people and AIs is effectively undefeatable).
I think we do tend to underestimate differences between people.
We know theoretically that people differ a lot, but we usually don’t viscerally feel how strong those differences are. One of the most remarkable examples of that is described here:
https://www.lesswrong.com/posts/NyiFLzSrkfkDW4S7o/why-it-s-so-hard-to-talk-about-consciousness
With AI existential safety, I think our progress is so slow because people mostly pursue anthropocentric approaches. Just like with astronomy, one needs a more invariant point of view to make progress.
I’ve done a bit of scribbling along those lines: https://www.lesswrong.com/posts/WJuASYDnhZ8hs5CnD/exploring-non-anthropocentric-aspects-of-ai-existential
But that’s just a starting point, a seed of what needs to be done in order to make progress…