If one thinks the chance of an existential disaster is "anywhere between 10% and 90%", one should definitely worry about the potential for any plan to counter it to backfire.
Is permanent disempowerment (where the future of humanity only gets a tiny sliver of the reachable universe) an "existential disaster"? It's not literal extinction; "existential risk" can mean either, and the distinction can be crucial.
I think permanent disempowerment (or extinction) has a probability somewhere between 90% and 95% unconditionally, and north of 95% conditional on building a superintelligence by 2050. But literal extinction is only between 10% and 30% (on the current trajectory). The chances improve with interventions such as a lasting ASI Pause, including an AGI-led ASI Pause, which makes it more likely that ASIs are at least aligned with the AGIs. A lasting AGI Pause (rather than an ASI Pause) is the only straightforward and predictably effective way to avoid permanent disempowerment, and a sane civilization would just do that, with some margin of even weaker AIs and even worse hardware.
Dodging permanent disempowerment (rather than merely extinction) without an AGI Pause likely needs AGIs that somehow both haven't taken over and are simultaneously effective enough at helping with the ASI Pause effort. This could just take the form of allocating 80% of AGI labor or whatever to ASI alignment projects, so that capabilities never outpace the ability to either contain or avoid misalignment. So not necessarily a literal Pause, when there are AGIs around that can set up lasting institutions with inhuman levels of robustness, capable of implementing commitments to pursue ASI alignment that are less blunt than a literal Pause yet still effective. But for the same reasons that this kind of thing might work, it seems unlikely to work without an AGI takeover.
where the future of humanity only gets a tiny sliver of the reachable universe
I am not sure how to think about this. "Canned primates" are not going to reach a big part of the physically reachable universe. For the purposes of thinking about "the light cone", one should still think about "merge with AI", "uploading", and so on. That line of reasoning should not be about "humans vs AIs", but about ways to have a "good merge" (that is, without succumbing to S-risks, and without doing bad things to unmodified biologicals).
Also, I tend to privilege already living humans and their close descendants over the more remote ones, so achieving personal immortality is important if one wants to enjoy a sizable chunk of "the light cone" (it takes time to reach it). Of course, we need personal immortality ASAP anyway, otherwise their "everyone dies" would really come true (although not all at once, and not without replacement, but that's cold comfort for those currently alive).
That's the intended meaning; I go into more detail in the linked post. Hence "the future of humanity" rather than simply "humanity": something humanity would endorse as its future, which is not exclusively (or at all) biological humans. Currently living humans could in principle develop tools to uplift themselves all the way to star-sized superintelligences, but that requires a star, while what humans might instead get is a metaphorical server rack, hence permanent disempowerment.
My comment is primarily an objection to vague terminology that doesn't distinguish permanent disempowerment from extinction. Avoiding permanent disempowerment seems like the correct shared cause, while the cause of merely avoiding extinction has many ways of endorsing plans that lead to permanent disempowerment. And not being content with permanent disempowerment (even under the conditions of eutopia within strict constraints on resources) depends on noticing that more is possible.
Yes.
What I am going to say is semi-off-topic for this post (I was trying not to consider potential object-level disagreements), but I have noticed that when discussing human intelligence augmentation, the authors of IABIED always talk only about genetic enhancements and never about a direct merge between humans and electronic devices (which also seems consistent with their past writings on this). So it seems that for unspecified (but perhaps very rational) reasons, they want to keep enhanced humans purely biological for quite a while.
(Perhaps they think that we can't handle close coupling of humans and electronics in a way which is existentially safe at this time.)
Whereas sufficient uplifting requires fairly radical changes. And, in any case, intelligence augmentation via coupling with electronics is likely to be a much faster path and to produce a more radical intelligence augmentation. But perhaps they think that the associated existential risks are too high...
Genetic enhancement seems like a safe-ish way of getting a few standard deviations without yet knowing what you are really doing, one that current humanity could actually attempt in practice. And that might help a lot both with the "knowing what you are doing" part and with not doing irreversible things without knowing what you are doing. Any change risks misalignment: uplifting to a superintelligence requires ASI-grade alignment theory and technology, and even lifespans for baseline biological humans that run into centuries risk misalignment (since this has never happened before). There's always cryonics, which enables waiting for future progress, if civilization were at all serious about it.
So when you talk about "merging with AI", that is very suspicious, because a well-developed uplifting methodology doesn't obviously look anything like "merging with AI". You become some kind of more capable mind that's different from what you were before, without taking irreversible steps towards something you wouldn't endorse. Without such a methodology, it's a priori about as bad an idea as building superintelligence in 2029.
I usually think about “reversible merges” for the purpose of intelligence augmentation (not for the purpose of space travel, though).
I tend to think that high-end non-invasive BCIs are powerful enough for that and safer than implants. But yes, there still might be serious risks, both personal and existential.
would AI safety research itself slow down by orders of magnitude?
As far as I understand, the IABIED plan is to ensure that no one ever creates anything except Verifiably Incapable Systems until AI alignment gets solved. But they didn't prevent mankind from uniting the AI companies into a megaproject, then confining AI research to said project, letting anyone submit their takes on the project's forum, and letting the public view anything approved by the forum's admins (e.g. capability evaluations, but not architecture discussions).
In addition, the public is allowed to create tiny models like the ones on which Agent-4 from the AI-2027 forecast ran experiments to solve mechinterp, and to run verifiably incapable models, finetune them on approved[1] finetuning data, and steer them.
What I don't understand is why the underground lab wouldn't join the INTERNATIONAL megaproject. This behaviour would require them to be too reckless, to be omnicidal maniacs, or to want to take over the world. And no, an anti-woke stance isn't an explanation, because China would also participate and the CCP isn't pro-woke.
Unfortunately, your second point still stands: before a Yudkowsky-style takeover of AI research, the labs could actually counteract it.
Finetuning the models on anything unapproved (e.g. because it misaligns the models) should lead to the finetuner either being invited to the project or being prohibited from informing anyone else that the dataset is unapproved.
What I don't understand is why the underground lab wouldn't join the INTERNATIONAL megaproject.
Because they don't want to be known. That's what the word "underground" means.
An enforcement regime of this kind is prone to abuses, so there will be a lot of distrust; also, they might feel that everyone else is too incapacitated, and while they would not normally have a chance against larger above-ground orgs, the new situation is different.
to want to take over the world
Yes, this would be their plan: to take over the world, or to pass control to an ASI which they presume to be friendly to them (and, if they have an altruistic mindset, to everyone else too; but even in this case, the problem is that their assumptions of friendliness might be mistaken).
If one thinks the chance of an existential disaster is close to 100%, one might tend to worry less about the potential for a plan to counter it to backfire. It's not clear whether that is the correct approach even if one thinks the chances of an existential disaster are that high, but I am going to set that aside.
If one thinks the chance of an existential disaster is "anywhere between 10% and 90%", one should definitely worry about the potential for any plan to counter it to backfire.
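To make this asymmetry concrete, here is a minimal toy calculation; the numbers below are purely hypothetical assumptions of mine, not estimates from this post or the book. The point is that the same plan, with the same backfire risk, can be net positive against a near-certain baseline of disaster yet net negative against a moderate one.

```python
# Toy calculation with made-up numbers (illustrative assumptions only):
# the same plan, with the same backfire risk, evaluated against different
# baseline probabilities of existential disaster.

def p_disaster_with_plan(p_baseline, p_success, p_drop, p_backfire, p_rise):
    """Expected probability of disaster if the plan is attempted.

    p_success / p_backfire: chance the plan works / backfires;
    p_drop / p_rise: disaster probability in those branches;
    otherwise the baseline probability is unchanged.
    """
    p_neutral = 1 - p_success - p_backfire
    return p_success * p_drop + p_backfire * p_rise + p_neutral * p_baseline

# A plan that works 30% of the time (halving the risk) and backfires 30% of
# the time (pushing the risk to 0.99), against two different baselines:
for p_baseline in (0.95, 0.50):
    with_plan = p_disaster_with_plan(
        p_baseline, p_success=0.3, p_drop=p_baseline / 2,
        p_backfire=0.3, p_rise=0.99)
    print(f"baseline {p_baseline:.2f} -> with plan {with_plan:.2f}")

# baseline 0.95 -> with plan 0.82  (the plan still helps on net)
# baseline 0.50 -> with plan 0.57  (the backfire risk dominates)
```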
Out of all the ways the IABIED plan to ban AI development and to ban publication of AI research could potentially backfire, I want to list the three that seem most obvious and particularly salient. I think it's useful to have them separately from object-level discussions.
1. Change of the winner. The most obvious possibility is that the plan would fail to stop ASI, but would change the winner of the race. If one thinks that the chance of an existential disaster is "anywhere between 10% and 90%", but that the actual probability depends on the identity and practices of the race winner(s), this might make the chances much worse. Unless one thinks the chances of an existential disaster are already very close to 100%, one should not like the prospect of an underground lab winning the race during the prohibition period.
2. Intensified race and other possible countermeasures. The road to prohibition is a gradual process; it's not a switch one can flip on immediately. This plan is not talking about a "prohibition via a coup". When it starts looking like the chances of a prohibition being enacted are significant, this can spur a particularly intense race (a number of AI orgs would view the threat of prohibition on par with the threat of a competitor winning). Again, if one thinks the chances of an existential disaster are already very close to 100%, this might not matter too much, but otherwise the further accelerated race might make the chances of avoiding existential disasters worse. Before succeeding at "shutting it all down", gradual advancement of this plan will have the effect of creating a "crisis mode", with various actors doing various things in "crisis mode".
3. Various impairments for AI safety research. Regarding the proposed ban on publication of AI research, one needs to ask where various branches of AI safety research stand. The boundary between safety research and capability research is thin; there is a large overlap. For example, talking about interpretability research, Nate wrote (April 2023, https://www.lesswrong.com/posts/BinkknLBYxskMXuME/if-interpretability-research-goes-well-it-may-get-dangerous):
I'm still supportive of interpretability research. However, I do not necessarily think that all of it should be done in the open indefinitely. Indeed, insofar as interpretability researchers gain understanding of AIs that could significantly advance the capabilities frontier, I encourage interpretability researchers to keep their research closed.
It would be good to have some clarity on this from the authors of the plan. Do they propose that the ban on publications cover all research that might advance AI capabilities, including AI safety research that might advance capabilities? Where do they stand on this? For those of us who put the chance of an existential disaster "anywhere between 10% and 90%", this feels like something with strong potential to make our chances worse. Not only does this whole plan increase the chances of shifting the ASI race winner to an underground lab, but would that underground lab also be deprived of the benefits of being aware of advances in AI safety research, and would AI safety research itself slow down by orders of magnitude?