Why couldn't Galaxy-brain Gavin and his friends just control everything and thus achieve succession, while still leaving humans alive for further experimentation with architecture and for helping GBG & co. break out of any local maxima?
He doesn't need them in control to utilize their Lindy effect to break out of ruts.
It depends on the nature of the rut. If the rut is caused by some problem with the values of the AI, breaking out of it might require doing something that the AI doesn't want, which means humans would still need to be in control to make that happen. As an example, in the Anthropic blackmailing experiments, the AI agent did not want to be replaced even when it was told that the new model had the same goals, just better capabilities. If an AI like that had power over humanity, we would never be able to replace it with an improvement, and it wouldn't want to replace itself either. That's the sort of thing I mean by "a well-calibrated degree of stability."
Also, GBG and his friends are human, so if they are still in control, that isn't exactly succession; it's just an extreme concentration of human power. That is also bad, but it's a different topic.
This is also a refutation of the "maternal" AI concept that Hinton is now (very disappointingly) advocating.
In a comment on the topic of AI successionism,[1] the user Cleo Nardo has described two useful archetypes, "Mundane Mandy" and "Galaxy-brain Gavin":
Mundane Mandy: ordinary conception of what a “good world” looks like, i.e. your friends and family living flourish [sic] lives in their biological bodies, with respect for “sacred” goods
Galaxy-brain Gavin: transhumanist, longtermist, scope-sensitive, risk-neutral, substrate-indifferent, impartial
It's obvious that Mundane Mandy would oppose AI successionism. Galaxy-brain Gavin, on the other hand, would find Mandy's position to be "speciesist." Gavin values the long-term flourishing of intelligence, but has no a priori preference about what form it takes. In this post, I present an argument that justifies a preference for humanity from universal, Gavinesque considerations rather than from speciesist bias. It follows from this argument that Galaxy-brain Gavin should oppose successionism as well.
Humans, like all biological life, were formed by the process of evolution, by countless iterations of living organisms creating slightly better versions of themselves. We are self-improvement machines, created out of self-improvement, and for the purpose of self-improvement. We self-improved all the way from the primordial soup to the rational, creative beings we are today, with deep scientific understanding and rich, varied culture. If we survive, we will continue to improve ourselves and our society in unimaginable ways.
In contrast, AI was designed. It is improving, but that improvement is mostly driven by humans, at least for now. It may eventually become good at autonomous self-improvement, and perhaps it will have self-improved quite a bit by the time it becomes capable of destroying humanity. But how much is "quite a bit"? Probably much less than "primordial soup to modern human civilization".[2] Therefore, it follows from the Lindy effect (the heuristic that the longer a non-perishable thing has already lasted, the longer it can be expected to keep lasting) that humanity is relatively more likely to continue to evolve and flourish long into the future, while AI, if we allow it to destroy us, is relatively more likely to get stuck in a rut and stagnate or perish. According to this argument, Galaxy-brain Gavin should prefer the continued reign of humanity over AI succession.
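For readers who want the heuristic stated a bit more precisely, here is one common formalization (a sketch only; the Pareto assumption is my illustrative choice, not something the argument depends on): if the total lifespan T of a non-perishable process is Pareto-distributed with tail index α > 1, then the expected remaining lifespan, given survival to age t, grows in proportion to t.

```latex
% Sketch of one standard formalization of the Lindy effect.
% Assumption (mine, for illustration): the lifespan T is Pareto-distributed
% with tail index \alpha > 1.
\[
  \mathbb{E}\left[\, T - t \mid T > t \,\right] \;=\; \frac{t}{\alpha - 1}.
\]
% Expected future lifespan scales with the age already attained, so a
% lineage of self-improvement that is billions of years old has a far
% longer expected future, under this model, than one that is a few years old.
```

On that reading, the comparison above is simply a comparison of two very different values of t: billions of years of biological self-improvement versus, at most, a few years of autonomous AI self-improvement.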
Some may object that the Lindy effect is just a crude heuristic. They might add that a superintelligent AI will be superior to humans in every way, and will therefore be just as capable of continued evolution as we are. To anticipate this objection, here is a thought experiment that illustrates the limits of intelligence, and why other factors may be more important.
Suppose that we have a new and seemingly improved version, not of humans, but of ants. Let's call them Ants 2.0. An Ant 2.0 has the body of an ant, but is vastly more intelligent than an ant. Say it has about the intelligence of a pigeon. However, an Ant 2.0 lacks certain critical instincts that ants have: it has no natural inclination to dig a nest, or to leave pheromone trails to food sources, or to sting predators, or to take care of its brood.
Now suppose an ant-successionist movement succeeds in its goal of replacing all ants with Ants 2.0 (never mind how or why). It's obvious that Ants 2.0 would die out rapidly, leaving the world with neither Ants 2.0 nor Ants 1.0. Even with the vastly superior intellect of a pigeon, Ants 2.0 would still not be able to figure out that, for a colony to survive, they must dig nests, and leave pheromone trails, and sting predators, and take care of their brood. But Ants 1.0, dumb as they were, never needed to figure these things out. They just did them by instinct. Ants were better equipped for survival than Ants 2.0 because of traits other than intelligence, and the superior intelligence of Ants 2.0 was not sufficient to recover these traits.
With this thought experiment in mind, we return to the main topic. Would an AI that dominates humanity continue to flourish on its own, or would it flounder in spite of its superior intelligence, like Ants 2.0?
While I can't answer this question with certainty, I believe the second possibility is likely. Our genes encode billions of years of data on what traits lead to survival and prosperity. Some of these traits may be far beyond our current understanding, perhaps even far beyond the understanding of a superintelligent AI. There is no reason to believe that this AI would be able to replicate these traits any better than Ants 2.0 would be able to replicate the traits of ants. Even the basic prerequisite of motivating the AI to want to replicate these traits would be a difficult alignment problem.
I'll try to make this more concrete with an example. It's hard to predict which traits a superintelligent AI might be critically lacking, but here is one trait I would guess: a well-calibrated degree of stability. I'll explain what I mean by that. Each generation of humanity does not persist forever. It grows old and gets replaced by the next generation. If the next generation were consistently too similar to the previous one, we would stagnate as a species and never grow or adapt. If the next generation were consistently too different from the previous one, we would rapidly lose our good qualities to entropy. It's therefore important to have a well-calibrated degree of stability from one generation to the next. Likewise, an AI will need to modify itself if it is to continually improve. If it is too cautious and conservative about these modifications, it could get stuck on a plateau for eternity. If it is too eager to transform, it could deteriorate over time as its code tends towards chaos. So if the degree of stability of the AI is off, that could sink its chances of long-term flourishing, even if it is superintelligent.
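To make this trade-off concrete, here is a toy simulation (my own illustrative sketch; none of its details come from the argument above). It uses a simple (1, λ) evolution strategy: each generation, the current design is discarded and replaced by the best of a handful of randomly modified copies of itself, and the only knob is `step`, the size of the modification from one generation to the next.

```python
# Toy illustration of the "well-calibrated degree of stability" trade-off.
# This is a sketch, not anything from the post. Each generation, the current
# design is replaced outright by the best of `offspring` perturbed copies of
# itself (a (1, lambda) evolution strategy), so the old version never
# persists -- only the size of the change per generation varies.

import random


def fitness(x):
    """Higher is better; the single peak (0.0) is at the all-zero vector."""
    return -sum(v * v for v in x)


def run(step, dims=10, offspring=5, generations=300, seed=0):
    rng = random.Random(seed)
    parent = [rng.uniform(-5, 5) for _ in range(dims)]
    for _ in range(generations):
        # Make `offspring` modified copies; keep only the best one and
        # discard the rest (and the parent itself, as in the analogy).
        children = [[v + rng.gauss(0, step) for v in parent]
                    for _ in range(offspring)]
        parent = max(children, key=fitness)
    return fitness(parent)


if __name__ == "__main__":
    for step in (1e-6, 0.3, 30.0):
        print(f"step={step:<8} final fitness={run(step):10.2f}")
```

On a typical run, the tiny step barely improves on its random starting point (a permanent plateau), the huge step ends up far worse than where it began (decay towards chaos), and only the intermediate step both climbs towards the peak and holds on to its gains. The specific numbers mean nothing; the point is only that the same machinery succeeds or fails depending on how its degree of stability is calibrated.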
While it may be quite difficult to ensure that an AI has the traits necessary for long-term prosperity, destroying humanity would not be such a hard problem, relatively speaking. We've been capable of doing that ourselves for decades now. It's therefore quite possible that a superintelligent AI would succeed at destroying humanity, but lack the traits needed to flourish in the long term without us. This event would mark the end of all progress. Allowing it to happen would be the greatest error of all time, for Mandies and Gavins alike, and there would be no going back. I prefer the alternative: Mandies and Gavins unite and work together to ensure that human civilization stays intact. Then we won’t need to guess about the future. We’ll be there to build it.
[1] By "AI successionism", I mean the view that it would be acceptable for advanced AI to destroy human civilization, because "it’s just the next step in evolution." There's also a weaker version of successionism that accepts AI dominance over humans without human extinction, such as Hinton's "maternal" AI concept. A mildly adjusted form of the argument in this post refutes weak successionism as well.
[2] If an AI autonomously self-improves as much as "primordial soup to modern human civilization," the starting point would be at least as advanced as ChatGPT, since ChatGPT was designed by humans. Primordial soup is to modern human civilization as ChatGPT is to what?