Just jaunt superquantumly to another quantum world instead of superluminally to an unobservable galaxy. What about these two physically impossible counterfactuals is less than perfectly isomorphic? Nothing, except some mere ease of false-to-fact visualization inside a human imagination that finds it easier to track nonexistent imaginary Newtonian billiard balls than existent quantum clouds of amplitude; and the latter case, in reality, covers both unobservable galaxies distant in space and unobservable galaxies distant in phase space.
I reiterate the galaxy example; saying that you could counterfactually make an observation by violating physical law is not the same as saying that something's meaning cashes out to anticipated experiences. Consider the (exact) analogy between believing that galaxies exist after they go over the horizon, and believing that other quantum worlds go on existing after we decohere them away from us by observing ourselves being inside only one of them. Predictivism is exactly the sort of ground on which some people have tried to claim that MWI isn't meaningful, and they're correct in that predictivism renders MWI meaningless, just as it renders the claim "galaxies go on existing after we can no longer see them" meaningless. The reply "If we had methods to make observations outside our quantum world, we could see the other quantum worlds" would be correctly rejected by them, because it is not an argument from within predictivism; it is an argument from outside predictivism, and it presumes that correspondence theories of truth can be defined meaningfully by imagining an account from outside the universe of how the things that we've observed have their own causal processes generating those observations, such that, having thus identified the causal processes through observation, we may speak of unobservable but fully identified variables with no observable-to-us consequences, such as the continued existence of distant galaxies and other quantum worlds.
One minor note: among the reasons I haven't looked especially hard into the origins of "verificationism"(?) as a theory of meaning is that I do in fact - as I understand it - explicitly deny that theory. The meaning of a statement is not the future experimental predictions that it brings about, nor is it isomorphic up to those predictions; all meaning about the causal universe derives from causal interactions with us, but you can have meaningful statements with no experimental consequences, for example: "Galaxies continue to exist after the expanding universe carries them over the horizon of observation from us." For my actual theory of meaning, see the "Physics and Causality" subsequence of Highly Advanced Epistemology 101 For Beginners.
That is: among the reasons why I am not more fascinated with the antecedents of my supposed verificationist theory of meaning is that I explicitly reject a verificationist account of meaning.
My point is that plausible scenarios for Aligned AGI give you AGI that remains aligned only when run within power bounds, and this seems to me like one of the largest facts affecting the outcome of arms-race dynamics.
This all assumes that AGI does whatever its supposed operator wants it to do, and that other parties believe as much? I think the first part of this is very false, though the second part alas seems very realistic, so I think this misses the key thing that makes an AGI arms race lethal.
I expect that a dignified apocalypse looks like, "We could do limited things with this software and hope to not destroy the world, but as we ramp up the power and iterate the for-loops more times, the probability of destroying the world goes up along a logistic curve." In "relatively optimistic" scenarios it will be obvious to operators and programmers that this curve is being ascended - that is, running the for-loops with higher bounds will produce an AGI with visibly greater social sophistication, increasing big-picture knowledge, visible crude attempts at subverting operators or escaping or replicating outside boxes, etc. We can then imagine the higher-ups demanding that crude patches be applied to get rid of the visible problems in order to ramp up the for-loops further, worrying that, if they don't do this themselves, the Chinese will do that first with their stolen copy of the code. Somebody estimates a risk probability, somebody else tells them too bad, they need to take 5% more risk in order to keep up with the arms race. This resembles a nuclear arms race and deployment scenario where, even though there's common knowledge that nuclear winter is a thing, you still end up with nuclear winter because people are instructed to incrementally deploy another 50 nuclear warheads at the cost of a 5% increase in triggering nuclear winter, and then the other side does the same. But this is at least a relatively more dignified death by poor Nash equilibrium, where people are taking everything as seriously as they took nuclear war back in the days when Presidents weren't retired movie actors.
In less optimistic scenarios - ones that realistically reflect the actual levels of understanding being displayed by programmers and managers in the most powerful organizations today - the programmers themselves just patch away the visible signs of impending doom and keep going, thinking that they have "debugged the software" rather than merely eliminated visible warning signs. They stay in denial, for internal political reasons, about how this is climbing a logistic probability curve towards ruin, and about how fast that curve is being climbed. Not really having a lot of mental fun thinking about the doom they're heading into, they ward it off by saying, "But if we slow down, our competitors will catch up, and we don't trust them to play nice," along of course with, "Well, if Yudkowsky was right, we're all dead anyways, so we may as well assume he was wrong," and generally skip straight to the fun part of running the AGI's for-loops with as much computing power as is available to do the neatest possible things; and so we die in a less dignified fashion.
My point is that what you depict - multiple organizations worried about what other organizations will successfully do with an AGI operated at maximum power, an AGI believed to do whatever its operator wants it to do - reflects a scenario where everybody dies really fast, because they all share a mistaken optimistic belief about what happens when you operate AGIs at increasing capability. The real lethality of the arms race is that blowing past hopefully-visible warning signs, or patching them out, and running your AGI at increasing power creates an increasing risk of the whole world ending immediately. Your scenario is one where people don't understand that and think that AGIs do whatever the operators want, so it's a scenario where the outcome of the multipolar tensions is instant death as soon as the computing resources are sufficient for lethality.
Thank you very much! It seems worth distinguishing the invention of the concept from the brainstorming of the name, in a case like this one, but I now agree that Rob Miles invented the word itself.
The technical term corrigibility, coined by Robert Miles, was introduced to the AGI safety/alignment community in the 2015 MIRI/FHI paper titled Corrigibility.
E.g., I'd suggest that, to avoid confusion, this kind of language should be something like "The technical term corrigibility, a name suggested by Robert Miles to denote concepts previously discussed at MIRI, was introduced..." &c.
Seems rather obvious to me that the sort of person who is like, "Oh, well, we can't possibly work on this until later" will, come Later, be like, "Oh, well, it's too late to start doing basic research now, we'll have to work with whatever basic strategies we came up with already."
Why do you think the term "corrigibility" was coined by Robert Miles? My autobiographical memory tends to be worryingly fallible, but I remember coining this term myself after some brainstorming (possibly at a MIRI workshop). This is the kind of thing that I usually try to avoid enforcing, because it would look bad if all of the concepts that I did in fact invent were cited as traceable to me - the truth about how much of this field I invented does not look good for the field or for humanity's prospects - but outright errors of this sort should still be avoided, if an error it is.
Agent designs that provably meet more of them have since been developed, for example here.
This is the first I've seen of this paper; I haven't had a chance to look at it yet, and I would be very surprised if it fulfilled the claims made in the abstract. Those are very large claims, and you should not take them at face value without a lot of careful looking.
Lots of people work for their privileges! I practiced writing for a LONG time - and remain continuously aware that other people cannot be expected to express their ideas clearly, even assuming their ideas to be clear, because I have Writing Privilege and they do not. Does my Writing Privilege have an innate component? Of course it does; my birth lottery placed me in a highly literate household full of actually good books, which combined with genuine genetic talent got me a 670 Verbal score on the pre-restandardized SAT at age eleven; but most teens with 670V SAT scores can't express themselves at all clearly, and it was a long long time and a lot of practice before I started being able to express myself clearly ever even on special occasions. It remains a case of Privilege, and would be such even if I'd obtained it entirely by hard work starting from an IQ of exactly 100, not that this is possible, but if it were possible it would still be Privilege. People who study hard, work hard, compound their luck, and save up a lot of money, end up with Financial Privilege, and should keep that in mind before expecting less financially privileged friends to come with them on a non-expenses-paid fun friendly trip. We are all locally-Privileged in one aspect or another, even that kid at the center of Omelas, and all we can do is keep it in mind.