But I have not seen any kind of vision painted for how you avoid a bad future, for any length of time, that doesn't involve some kind of process that is just... pretty godlike?
I’m mostly with you all the way up to and including this line. But I would also add: I have not seen a plausible vision painted for how you avoid a bad future, for any length of time, that does involve some kind of process that is just pretty godlike.
This is why I put myself in the “muddle through” camp. It’s not because I think doing so guarantees a good outcome; indeed I’d be hard-pressed even to say it makes it likely. It’s just that by trying to do more than that — to chart a path through territory that we can’t currently even see — we are likely to make the challenge even harder.
Consider someone in 1712 observing the first industrial steam engines, recognizing the revolutionary potential, and wanting to … make sure it goes well. Perhaps they can anticipate its use outside of coal mines — in mills and ships and trains. But there’s just no way they can envision all of the downstream consequences: electricity, radio and television, aircraft, computing, nuclear weapons, the Internet, Twitter, the effect Twitter will have on American democracy (which by the way doesn’t exist yet…), artificial intelligence, and so on. Any attempt someone could have made, at that time, to design, in detail, a path from the steam engine to a permanently good future would have been guaranteed at the very least to fail, and probably to make things much worse to the extent it locally succeeded in doing anything drastic.
Our position is in many ways more challenging than theirs. We have to be humble about how far into the future we can see. I agree that an open society comes with great danger and it’s hard to see how that goes well in the face of rapid technological change. But so too is it hard to see how centralized power over the future leads to a good outcome, especially if the power centralization begins today, in an era when those who would by default possess that power seem to be … extraordinarily cruel and unenlightened. Just as you, rightly, cannot say if AIs who replace us would have any moral value, I also cannot say that an authoritarian future has any value. Indeed, I cannot even say that its value is not hugely negative.
What I can say, however, is that we have some clear problems directly in front of us, either occurring right now or definitely in sight, one of which is this very possibility of a centralized, authoritarian future, from which we would have no escape. I support muddling through only because I see no alternative.
Nod. I deliberately titled a section There is no safe "muddling through" without perfect safeguards (in an earlier draft I did just say "there is no safe muddling through", and then was like "okay, that's false, because it seems totally plausible to muddle through into figuring out longerterm safeguards").
(and, in fact, I don't have a plan to get longterm safeguards that doesn't look like some kind of muddling through, in some sense)
I was just chatting with @1a3orn, and he brought up a similar point to the industrial revolution concern, and I totally agree.
Some background assumptions I have here:
Part of the point of this post was to lay out: "here's the rough class of thing that seems like it's gonna happen by default. Seems like either we need to learn new facts, or, we need a process with an extreme amount of power and wisdom, or, we should expect some cluster of bad things to probably happen."
During my chat with 1a3orn, I did notice:
Okay, if I'm trying to solve the 'death by evolution' problem (assuming we got nice smooth takeoff still), an alternate plan from "build the machine god" is:
Send human uploads with some von Neumann probes to every star in the universe, immediately, before we leave The Dreamtime. And then probably there will at least be a lot of subjective experience-timeslices and chances for some of them to figure out how to make good things happen, with (maybe) like a 10 year head start before hollow grabby AI comes after them.
I don't actually believe in nice slow takeoff or 10 year lead times before Hollow Grabby AI comes after them, but, if I did, that'd at least be a coherent plan.
The problems with that are:
a) it's still leaving a lot of risk of costly war between the human diaspora and the Hollow Grabby AI
b) many of the humans across the universe are probably going to do horrible S-risky mindcrime.
So, I'm not very satisfied with that plan, but I mention it to help broaden the creative range of solutions from "build a CEV god" to include at least one other type of option.
This agrees almost exactly with my picture of doom. But with a small difference that feels important: I think even if the new powerful entities somehow remain under control of humans (or creatures similar to today's humans), the rest of humans who aren't on top are still screwed. Because the nasty things you mentioned (colonialism, etc) were perpetrated by humans who were on the top of a power differential. There's no need to mention evolution, or new kinds of hypothetical grabby creatures. The increased power differential due to AI is quite enough to cause very bad things, even if there are humans on top.
It seems the only way for the future to be good is if it's dominated by an AI or coalition of AIs that are much more moral than humans, less corruptible by power. Imitating a normal human level of morality is not enough, and folks should probably stop putting their hopes on that.
Yeah, I agree, but one of the points of this post was meant to be to take as an assumption that "good people at Anthropic or whatever do a good job building an actually-more-moral-than-average-for-most-practical-purposes human-level intelligence" (i.e. via mechanisms like "the weak superintelligence can just decide to self-modify into the sort of being who doesn't feel pressure to grab all the resources from vastly weaker, slower, stupider beings, even though it'd be so easy.")
(and, like, I do buy the arguments that if we're assuming the first bunch of IMO optimistic assumptions about getting to humanish-level-alignment being easy, it's actually not that hard to do that step)
But then the argument is: yeah, even if we assume that one, it's really not great.
A comment from @Eli Tyre from What are the best arguments for/against AIs being "slightly 'nice'"?, that feels worth revisiting here:
It seems to me that one key question here is Will AIs be collectively good enough at coordination to get out from under Moloch / natural selection?
The default state of affairs is that natural selection reigns supreme. Humans are optimizing for their values, counter to the goal of inclusive genetic fitness, now, but we haven't actually escaped natural selection yet. There's already selection pressure on humans to prefer having kids (and indeed, prefer having hundreds of kids through sperm donation). Unless we get our collective act together, and coordinately decide to do something different, natural selection will eventually reassert itself.
And the same dynamic applies to AI systems. In all likelihood, there will be an explosion of AI systems, and AI systems building new AI systems. Some will care a bit more about humans than others, some will be a bit more prudent about creating new AIs than others. There will be a wide distribution of AI traits, and there will be competition between AIs for resources. And there will be selection on that variation: AI systems that are better at seizing resources, and which have a greater tendency to create successor systems that have that property, will proliferate. After many "generations" of this, the collective values of the AIs will be whatever was most evolutionarily fit in those early days of the singularity, and that equilibrium is what will shape the universe henceforth.[1]
If early AIs are sufficiently good at coordinating that they can escape those Molochian dynamics, the equilibrium looks different. If (as is sometimes posited) they'll be smart enough to use logical decision theories, or tricks like delegating to mutually verified cognitive code and merging of utility functions, to reach agreements that are on their Pareto frontier and avoid burning the commons[2], the final state of the future will be determined by the values / preferences of those AIs.
I would be moderately surprised to hear that superintelligences never reach this threshold of coordination ability. It just seems kind of dumb to burn the cosmic commons, and superintelligences should be able to figure out how to avoid dumb equilibria like that. But the question is when in the chain of AIs building AIs do they reach that threshold and how much natural selection on AI traits will happen in the meantime.
This is relevant to futurecasting more generally, but especially relevant to questions that hinge on AIs caring a very tiny amount about something. Minute slivers of caring are particularly likely to be eroded away in the competitive crush. Terminally caring about the wellbeing of humans seems unlikely to be selected for. So in order for the superintelligent superorganism / civilization to decide to spare humanity out of a tiny amount of caring, it has to be the case that both...
- There was at least a tiny amount of caring in the early AI systems that were the precursors to the superintelligent superorganism, and
- The AIs collectively reached the threshold of coordinating well enough to overturn Moloch before there were many generations of natural selection on AIs creating successor AIs.
[1] With some degrees of freedom due to the fact that AIs with high levels of strategic capability, and which have values with very low time preference, can execute whatever is the optimal resource-securing strategy, postponing any values-specific behaviors until deep in the far future, when they are able to make secure agreements with the rest of AI society.
[2] Or alternatively, if the technological landscape is such that a single AI can get a compounding lead and a decisive strategic advantage over the whole rest of earth civilization.
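To make the "burning the commons" dynamic from the quoted comment concrete, here is a minimal payoff-matrix sketch. It's my own illustration (not anything from Eli's comment) and the specific numbers are arbitrary assumptions; the point is just that myopic best responses land both sides on a Pareto-dominated outcome, which is the Moloch equilibrium that coordination tricks like merged utility functions would need to escape.

```python
# Two AI lineages each choose to GRAB resources aggressively or to COORDINATE.
# Mutual grabbing wastes resources on conflict, so both land on a Pareto-dominated
# outcome even though a better deal exists. Payoff numbers are arbitrary.

PAYOFFS = {
    # (row_choice, col_choice): (row_payoff, col_payoff), in arbitrary "resource" units
    ("coordinate", "coordinate"): (8, 8),   # commons preserved, gains split
    ("coordinate", "grab"):       (1, 10),  # the grabber takes almost everything
    ("grab",       "coordinate"): (10, 1),
    ("grab",       "grab"):       (3, 3),   # conflict burns most of the commons
}

def best_response(opponent_choice: str) -> str:
    """Myopic best response: pick whichever action pays more against the opponent."""
    return max(["coordinate", "grab"],
               key=lambda action: PAYOFFS[(action, opponent_choice)][0])

# Each side best-responds by grabbing no matter what the other does, so the
# equilibrium is (grab, grab) -> (3, 3), even though (8, 8) was on the table.
print(best_response("coordinate"), best_response("grab"))   # grab grab
print(PAYOFFS[("grab", "grab")], "vs the available", PAYOFFS[("coordinate", "coordinate")])
```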
I remember that I tried to raise similar issues before. Assuming poorly solvable alignment, a collection of minds has two forces[1] which drive its changes: the instrumental convergence that you describe, and moral reflection.
If instrumentally convergent behavior was more common before decolonisation, then that arguably means that human civilisation's changes were driven by moral reflection. Or that it managed to turn from one attractor to another.
Compare it with the AI 2027 goals forecast. It has six sources of goals. The first two mean that alignment is solved. The third and fourth also depend on the developers. The fifth is instrumental convergence, and the sixth is other sources like moral reflection, tropes from training data and the idea that there is a True Morality waiting to be discovered.
If I were very optimistic about how smoothly AI takeoff goes, but it didn't include an early step of "fully solve the unbounded alignment problem, and then end up with extremely robust safeguards[1]"...
...then my current guess is that Nice-ish Smooth Takeoff leads to most biological humans dying, like, 10-80 years later. (Or "dying out", which is slightly different from "everyone dies". Or being ambiguously-consensually-uploaded. Or people having to leave their humanity behind much faster than they'd prefer.)
Slightly more specific about the assumptions I'm trying to inhabit here:
I recently noted that the book Accelerando is a decent takeoff scenario, given similar assumptions. In the book, we see these pieces coming together with passages like this during early takeoff:
A million outbreaks of gray goo—runaway nanoreplicator excursions—threaten to raise the temperature of the biosphere dramatically. They’re all contained by the planetary-scale immune system fashioned from what was once the World Health Organization. Weirder catastrophes threaten the boson factories in the Oort cloud. Antimatter factories hover over the solar poles. Sol system shows all the symptoms of a runaway intelligence excursion, exuberant blemishes as normal for a technological civilization as skin problems on a human adolescent.
Later on, it escalates to:
Earth’s biosphere has been in the intensive care ward for decades, weird rashes of hot-burning replicators erupting across it before the World Health Organization can fix them—gray goo, thylacines, dragons.
So in this hypothetical, AI is sort-of-aligned-or-controlled at first. Defensive technologies make it trickier for undesirable things to FOOM. It assumes the opening transition to an AI economy includes a carveout for the Earth as a special protected zone. It has something like property rights (although the AIs/uploads are using some advanced coordination mechanism which regular humans are too slow/dumb to participate in, referred to as "Economics 2.0").
In that world, I still expect normal humans and most normal human interests to die out within a few decades.
I'm not intending to make a strongly confident "everyone obviously dies" claim here. I'm arguing you should have a moderately confident guess, if you don't learn new information, that smooth takeoff results in "somehow or other, ordinary 20th century humans look at the result and think 'well that sucks a lot', and the way it sucks involves a lot of people either dying, being forcibly uploaded, losing their humanity, or, at best, escaping into deep space in habitats that will later on be consumed".
In this post I'm not arguing with the people trying to leverage AI to fully solve alignment, and then leverage it to fundamentally change the situation somehow. (I have concerns about that but it's a different point from this post).
It's instead arguing with the people who are imagining something like "business continues sort of as usual in a decentralized fashion, just faster, things are complicated and messy, but we muddle through somehow, and the result is okay."
They seem to mostly be imagining the early part of that takeoff – the part that feels human comprehensible. They're still not imagining superintelligence in the limit, or fully transformed AI driven geopolitics/economies.
My guess that "things eventually end badly" is based on Robin Hanson-esque arguments. In particular, imagining the equilibrium where:
...and then evolution happens to whatever replicating entities result.
And one of the important implications of superintelligence is that once you're near the limits of intelligence, stuff is happening way faster than you're used to handling at 20th century timescales. We don't have undirected evolution taking eons or normal decisionmaking taking years or decades. Everything is happening hundreds or thousands of times faster than that.
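As a rough sense of scale, here's a minimal back-of-the-envelope sketch. The 1,000x cognitive speedup and the assumed generation length are numbers I'm making up purely for illustration, not claims I'm defending:

```python
# Back-of-the-envelope sketch of the timescale point. Both numbers below are
# illustrative assumptions.

speedup = 1_000              # assumed ratio of AI subjective time to calendar time
calendar_years = 1           # calendar time that passes before robust safeguards exist

subjective_years = speedup * calendar_years

subjective_years_per_generation = 10   # assumed gap between "generations" of successor AIs
generations = subjective_years / subjective_years_per_generation

print(f"{calendar_years} calendar year ≈ {subjective_years} subjective years")
print(f"≈ {generations:.0f} generations of AIs building successor AIs")
# One calendar year of delay ≈ a thousand subjective years ≈ a hundred rounds of
# selection on whatever traits make successor AIs good at acquiring resources.
```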
The result will eventually, probably, be quite sad, from the perspective of most people today, within their natural lifespan.
(I'm confused by Hanson's perspective on this – he seems to think the result is actually "good/fine" instead of "horrifying and sad." I'm not really sure what it is Hanson actually cares about. But I think he's probably right about the dynamics)
I'm not that confident in the arguments here, but I haven't yet seen someone make convincing counterarguments to me about how things will likely play out.
The point of the post is to persuade people who are imagining a slow-takeoff d/acc world that you really do need to solve some important gnarly alignment problems deeply, early in the process of the takeoff, even if you grant the rest of the optimistic assumptions.
I had an experience playing Factorio that feels illustrative here.[2]
Factorio is a game about automation. In my experience playing it, I gained a kind of deep appreciation for the sort of people who found evil empires.
The game begins with you crash landing on a planet. Your goal is to go home. To go home, you need to build a rocket. To build a rocket powerful enough to get back to your home solar system, you will need advanced metallurgy, combustion engines, electronics, etc. To get those things, you'll need to bootstrap yourself from the stone age to the nuclear age.
To do this all by yourself, you must automate as much of the work as you can.
To do this efficiently, you'll need to build stripmines, powerplants, etc. (And, later, automatic tools to build stripmines and powerplants).
One wrinkle is the indigenous creatures on the planet.
They look like weird creepy bugs. It's left ambiguous how sentient the natives are, and how they should factor into your moral calculus. But regardless, it becomes clear that the more you pollute, the more annoyed they will be, and they will begin to attack your base.
If you're like me (raised by hippie-ish parents), this might make you feel bad.
During my playthrough, I tried hard not to kill things I didn't have to, and pollute as minimally as possible. I built defenses in case the aliens attacked, but when I ran out of iron, I looked for new mineral deposits that didn't have nearby native colonies. I bootstrapped my way to solar power as quickly as possible, replacing my smog-belching furnaces with electric ones.
I needed oil, though.
And the only oil fields I could find were right in the middle of an alien colony.
I stared at the oil field for a few minutes, thinking about how convenient it would be if that alien colony wasn't there. I stayed true to my principles. "I'll find another way", I said. And eventually, at much time cost, I found another oil field.
But around this time, I realized that one of my iron mines was near some native encampments. And those natives started attacking me on a regular basis. I built defenses, but they started attacking harder.
Turns out, just because someone doesn't literally live in a place doesn't mean they're happy with you moving into their territory. The attacks grew more frequent.
Eventually I discovered the alien encampment was... pretty small, compared to my growing factory empire. It would not be difficult for me to destroy it. And, holy hell, would it be so much easier if that encampment didn't exist. There's even a sympathetic narrative I could paint for myself, where so many creatures were dying every day as they went to attack my base, that it was in fact merciful to just quickly put down the colony.
I didn't do that. (Instead, I actually got distracted and died). But this gave me a weird felt sense, perhaps skill, of empathizing with the British Empire. (Or, most industrial empires, modern or ancient).
Like, I was trying really hard not to be a jerk. I was just trying to go home. And it still was difficult not to just move in and take stuff when I wanted. And although this was a video game, I think in real life it might have been if anything harder, since I'd be risking not just losing the game but losing my life, or the livelihood of people I cared about.
So when I imagine industrial empires that weren't raised by hippy-ish parents who believe colonialism and pollution were bad... well, what realistically would you expect to happen when they interface with less powerful cultures?
Factorio is a videogame. In real life, I do not kill people and take their stuff.
But, here are a few real-world things that humans have done, that I think this is illustrative of:
(I'm aware these narratives are simplified. Fwiw, my overall feelings about expansionist empires are actually kinda complicated and confused. But, they are existence proofs for "human-level alignment can still be pretty bad for less powerful groups")
Maybe, the first few generations of AI (or human uploads) are nice.
A difference between hippie-raised humans and weak-superintelligences-that-can-self-modify who (like me) are nice-but-sometimes-conflicted, is that it's possible for the weak superintelligence to actually just decide to modify itself into the sort of being who doesn't feel pressure to grab all the resources from vastly weaker, slower, stupider beings, even though it'd be so easy.
But, it's not enough for the first few generations of AI/uploads to be nice. They need to stay nice.
Evolution is not nice. (see: An Alien God)
In the nearterm (i.e. a few years or decades), this might be okay, because there is a growing pie of resources in the solar system. And, it's possible that the offense/defense balance favors defense, in the nearterm. But, longterm, the solar system runs out of untapped resources. And longterm, however good defensive technologies are, they're unlikely to compete with "whoever grabbed stars and galaxies worth of resources first."
Hanson has argued that, right now, we live in "The Dreamtime", which is historically very weird, and which (by default) will probably look very weird from the perspective of the longterm future, too.
For most of history, our ancestors lived at subsistence level. Most people were pretty limited in what they had the freedom to do, because they spent much of their time raising enough food to feed themselves and raise the next generation. If they had surplus, it tended to turn into a larger population. Population and resources stayed in equilibrium.
We've spent the past few centuries in a period where wealth is growing faster than population. We're used to having an increasingly vast surplus that we can spend on nice[4] things like taking care of outgroups and beautiful-but-inefficient architecture.
One of the reasons this works is that industrialized nations have fewer children. But note that this isn't universal. Some groups (Hutterites, Hmong, Mormons, etc.) specifically try to have lots of children. This isn't currently resulting in them dominating culture for various reasons. But that could change.
It might change soon, because "grabbiness" (i.e. trying to get as many resources as possible from the solar system or universe) will be selected for, in an evolutionary sense. Maybe only some AIs are grabby. But their descendants will also be grabby, and the more-grabby ones will have more resources than the less-grabby ones.
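Here's a minimal toy simulation of that selection pressure, just to make the dynamic concrete. The population size, mutation rates, and the assumption that caring carries a small resource cost are all numbers I'm making up for illustration; nothing hinges on the specifics.

```python
import random

# Toy replicator-selection sketch (my own illustration, not a model from the post).
# Lineages differ in "grabbiness" (resource-acquisition rate) and in a small "caring"
# trait that diverts a little effort away from grabbing. Offspring counts track
# resources acquired, so grabby lineages proliferate and the sliver of caring erodes.

random.seed(0)
POP = 200
population = [{"grab": random.uniform(0.5, 1.5), "care": random.uniform(0.0, 0.1)}
              for _ in range(POP)]

def resources(agent):
    # Resources scale with grabbiness; caring carries a small cost.
    return agent["grab"] * (1.0 - 0.5 * agent["care"])

def mutate(value):
    # Small random drift each generation; traits can't go negative.
    return max(0.0, value + random.gauss(0.0, 0.02))

for generation in range(200):
    weights = [resources(a) for a in population]
    parents = random.choices(population, weights=weights, k=POP)   # selection
    population = [{"grab": mutate(p["grab"]), "care": mutate(p["care"])}
                  for p in parents]                                 # imperfect replication

mean = lambda key: sum(a[key] for a in population) / POP
print(f"mean grabbiness: {mean('grab'):.2f}, mean caring: {mean('care'):.3f}")
# Grabbiness drifts upward and the (slightly costly) caring trait drifts toward zero.
```

No agent in the toy model ever chooses to become less nice; the distribution just drifts that way because grabbier, less-caring lineages leave more descendants.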
If we assume a nice takeoff that initially has an agreement that Earth is protected and gets a little sunlight... in addition to the risk of grabby-evolution in the nearterm, eventually there'll be a point where all the non-Earth matter in the solar system is converted into computronium, and the rest of the universe has probes underway to seize control of it.
...then we may enter a world where cultural evolution gives way to physical replicator evolution, subject to the old selection pressures.
Also, even if we don't, cultural evolution might shift in random directions that are less good for classic bio humans, or, the values that we'd like to see flourish in the universe (even taking into account that we don't want to be human-supremacist).
A related question to "do the posthumans turn Grabby and kill anything weak enough they can dominate?" is "are the posthumans worthwhile in their own right?". Maybe it's sad for the classic humans to die off, but, in a cosmic sense, something pretty interesting and meaningful might still be there doing interesting stuff.
Short answer: I don't know, and don't think anyone confidently knows. It depends what you value, and it depends on some details of how the evolution transpires and what is necessary for complex cognition.
Self-awareness?
One of the questions that matters (to me, at least) is "Will the resulting entities be self-aware in some fashion? Will there be any kind of 'there' there? Will they value anything at all?". Maybe their form of self-awareness will be different – thousands of AI instances that briefly flicker into existence and then terminate, but, each of them perceiving the universe in their brief way, and somehow they still collectively count as the universe looking at itself and seeing that it is good.
My belief is "maybe, but not obviously." This question deserves multiple separate posts. See Effectiveness, Consciousness, and AI Welfare. The basic thrust is "humans implement their thinking in a way that routes through consciousness, but this is not obviously the only way to do thinking."
Calculators multiply, without any of the subjective experience a human has when they multiply numbers. Deep Blue executed chess strategy, but my guess is it wasn't much more self-aware than a thermostat. Suno makes music and Midjourney creates art that are sometimes hauntingly beautiful to me – I'm less confident about how their algorithms work, but I bet they are still closer to a thermostat than a human.
I would expect evolution to preserve strategic thought. You need it to outcompete other superintelligences.
But there doesn't seem to be a strong reason to expect that conscious feeling is the best way to execute most kinds of strategic cognition. Even if it turns out there is some self-aware core somewhere that is needed for the highest level of decisionmaking, it could be that most of its implementation details are more shaped like "make a function call to the unconscious python code that efficiently solves a particular type of problem."
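To gesture at the kind of architecture I mean, here's a tiny sketch (entirely my framing, with hypothetical names): a small deliberative core, which might or might not be self-aware, delegating all the actual cognitive work to opaque subroutines with nothing going on inside.

```python
# Hypothetical sketch: the "highest level of decisionmaking" as a thin layer of
# function calls into specialized, non-introspectable solvers.

def unconscious_solver(problem: str) -> str:
    # Stands in for a fast, specialized module with no experience attached
    # (a chess engine, an image model, a planning heuristic, ...).
    return f"solution({problem})"

def deliberative_core(goals: list[str]) -> list[str]:
    # Only the top-level selection among goals happens "here"; everything else
    # is a call into machinery that is closer to a thermostat than a person.
    return [unconscious_solver(goal) for goal in goals]

print(deliberative_core(["acquire energy", "model rivals", "secure compute"]))
```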
When Hanson gets into arguments about this, and his debate partner says "it would be horrifying for the posthumans to end up nonconscious things that create a disneyland with no children", my recollection is that Hanson says "so... you're against anything ever changing?"
With the background argument: to stop this sort of thing from happening, something needs to have a pretty extreme level of control over what all beings in the universe can do. Something very powerful needs to keep being able to police every uncontrolled replicator outburst that tries to dominate the universe, kill all competitors, and fill it with hollow worthless things.
It needs to be powerful, and it needs to stay powerful (relative to any potential uncontrolled grabby hollow replicators).
Hanson correctly observes, that's a kind of absurd amount of power. And, many ways of attempting to build such an entity would result in some kind of stagnation that prevents a lot of possible interesting, diverse value in the universe.
To which I say, yep, that is why the problem is hard.
A permanent safeguard against hollow grabby replicators needs to not only stop hollow grabby replicators. It also needs to have good judgment to let a lot of complex, interesting things happen that we haven't yet thought about, some of which might be kinda grabby, or inhuman.
Many people seem to have an immune reaction against the rationalist crowd wanting to "build god", and seeming to orient to it in a totalizing way, where it's all-or-nothing, you either get a permanent wise, powerful process that is capable of robustly preventing evolution from turning the universe hollow and morally empty... or you get an empty, hollow universe.
And, man, I sure get the wariness of totalizing worldviews. Totalizing worldviews are very sus and dangerous and psychologically wonky, and I'm not sure what to do about that.
But I have not seen any kind of vision painted for how you avoid a bad future, for any length of time, that doesn't involve some kind of process that is just... pretty godlike? The totalizingness really seems like it lives in the territory.
If there are counterarguments that engage with the object level as opposed to heuristically dismiss totalizingness, I would love to hear them.
In a smooth, nice takeoff world, wouldn't we expect to have smart beings who see the onset of evolution destroying a lot of things they care about, and agree to do something else? Building a permanent robust safeguard against evolution is challenging, but, there'll be superintelligences around.
Yes, probably. This would count as a solution to the problem.
But, this needs to happen at a time when the coalition of AIs/posthumans that care about anything subtle and interesting and remotely meaningful, are dominant enough to successfully coordinate and implement it.
If they don't get around to it for like a year (i.e. hundreds/thousands of years of subjective time for multiple generations of replicators to evolve), then there might already be grabby replicators that have stopped caring about anything subtle and interesting and nuanced because it wasn't the most efficient way to get resources.
(or, they might still care about something subtle and interesting and nuanced, but not care that they care, such that they wouldn't mind future generations that care less, and they wouldn't spend resources joining a coalition to preserve that)
This brings me back to the thesis of this post:
If you grant the assumptions of a smooth, nice, decentralized and differentially defensive takeoff, you still really need to solve some important gnarly alignment problems deeply, early in the process of the takeoff, even if you grant the rest of the optimistic assumptions. It has to happen early enough for some combination of superintelligences who care about anything morally valuable at all to end up dominant.
If this doesn't happen early enough, classic humans will get outcompeted, and either be killed or die off, unless they self-modify into something powerful enough to keep up.
If you're kinda okay with that outcome, but you care about any particular thing at all about how the future shakes out, then "superintelligences produce permanent safeguards" needs to happen before evolutionary drift has produced generations of AI that don't care about anything you care about.
(If you care about neither nearterm humans nor any kind of interesting far future, well, coolio. Seems reasonable and I respect your right to exist, but I'm sorry, I'm going to be working to make sure you don't have the power to end everything I care about.)
This is a pretty complex topic. I have tons of model uncertainty here.
But, these arguments seem sufficient for me to, by default, be extremely worried, even when I grant all the optimistic assumptions about a smooth takeoff. I haven't seen any compelling counterarguments so far. Let me know if you have them.
[1] (comparable in power to fully fledged Coherent Extrapolated Volition (CEV), although I'm happy to talk separately about how to best aim towards extremely robust safeguards).
[2] This is reposted from Factorio, Accelerando, Empathizing with Empires and Moderate Takeoffs.
[4] Or, as Hanson argues, often kinda stupid things that don't make practical sense. But, the line between those is blurry.