But I have not seen any kind of vision painted for how you avoid a bad future, for any length of time, that doesn't involve some kind of process that is just... pretty godlike?
I’m mostly with you all the way up to and including this line. But I would also add: I have not seen a plausible vision painted for how you avoid a bad future, for any length of time, that does involve some kind of process that is just pretty godlike.
This is why I put myself in the “muddle through” camp. It’s not because I think doing so guarantees a good outcome; indeed I’d be hard-pressed even to say it makes it likely. It’s just that by trying to do more than that — to chart a path through territory that we can’t currently even see — we are likely to make the challenge even harder.
Consider someone in 1712 observing the first industrial steam engines, recognizing the revolutionary potential, and wanting to … make sure it goes well. Perhaps they can anticipate its use outside of coal mines — in mills and ships and trains. But there’s just no way they can envision all of the downstream consequences: electricity, radio and television, aircraft, computing, nuclear weapons, the Internet, Twitter, the effect Twitter will have on American democracy (which by the way doesn’t exist yet…), artificial intelligence, and so on. Any attempt someone could have made, at that time, to design, in detail, a path from the steam engine to a permanently good future would have been guaranteed at the very least to fail, and probably to make things much worse to the extent it locally succeeded in doing anything drastic.
Our position is in many ways more challenging than theirs. We have to be humble about how far into the future we can see. I agree that an open society comes with great danger and it’s hard to see how that goes well in the face of rapid technological change. But so too is it hard to see how centralized power over the future leads to a good outcome, especially if the power centralization begins today, in an era when those who would by default possess that power seem to be … extraordinarily cruel and unenlightened. Just as you, rightly, cannot say if AIs who replace us would have any moral value, I also cannot say that an authoritarian future has any value. Indeed, I cannot even say that its value is not hugely negative.
What I can say, however, is that we have some clear problems directly in front of us, either occurring right now or definitely in sight, one of which is this very possibility of a centralized, authoritarian future, from which we would have no escape. I support muddling through only because I see no alternative.
Nod. I deliberately titled a section There is no safe "muddling through" without perfect safeguards (in an earlier draft I did just say "there is no safe muddling through", and then was like "okay, that's false, because it seems totally plausible to muddle through into figuring out longerterm safeguards").
(and, in fact, I don't have a plan to get longterm safeguards that doesn't look like some kind of muddling through, in some sense)
I was just chatting with @1a3orn, and he brought up a similar point to the industrial revolution concern, and I totally agree.
Some background assumptions I have here:
Part of the point of this post was to lay out: "here's the rough class of thing that seems like it's gonna happen by default. Seems like either we need to learn new facts, or, we need a process with an extreme amount of power and wisdom, or, we should expect some cluster of bad things to probably happen."
During my chat with 1a3orn, I did notice:
Okay, if I'm trying to solve the 'death by evolution' problem (assuming we got nice smooth takeoff still), an alternate plan from "build the machine god" is:
Send human uploads with some von Neumann probes to every star in the universe, immediately, before we leave The Dreamtime. And then probably there will at least be a lot of subjective experience-timeslices and chances for some of them to figure out how to make good things happen, with (maybe) like a 10 year head start before hollow grabby AI comes after them.
I don't actually believe in nice slow takeoff or 10 year lead times before Hollow Grabby AI comes after them, but, if I did, that'd at least be a coherent plan.
The problems with that are:
a) it's still leaving a lot of risk of costly war between the human diaspora and the Hollow Grabby AI
b) many of the humans across the universe are probably going to do horrible S-risky mindcrime.
So, I'm not very satisfied with that plan, but I mention it to help broaden the creative range of solutions from "build a CEV god" to include at least one other type of option.
This agrees almost exactly with my picture of doom. But with a small difference that feels important: I think even if the new powerful entities somehow remain under control of humans (or creatures similar to today's humans), the rest of humans who aren't on top are still screwed. Because the nasty things you mentioned (colonialism, etc) were perpetrated by humans who were on the top of a power differential. There's no need to mention evolution, or new kinds of hypothetical grabby creatures. The increased power differential due to AI is quite enough to cause very bad things, even if there are humans on top.
It seems the only way for the future to be good is if it's dominated by an AI or coalition of AIs that are much more moral than humans, less corruptible by power. Imitating a normal human level of morality is not enough, and folks should probably stop putting their hopes on that.
Yeah, I agree, but one of the points of this post was meant to be to take as an assumption that "good people at Anthropic or whatever do a good job building an actually-more-moral-than-average-for-most-practical-purposes human-level intelligence" (i.e. via mechanisms like "the weak superintelligence can just decide to self-modify into the sort of being who doesn't feel pressure to grab all the resources from vastly weaker, slower, stupider beings, even though it'd be so easy.")
(and, like, I do buy the arguments that if we're assuming the first bunch of IMO optimistic assumptions about getting to humanish-level-alignment being easy, it's actually not that hard to do that step)
But then the argument is: yeah, even if we assume that one, it's really not great.
A comment from @Eli Tyre from What are the best arguments for/against AIs being "slightly 'nice'"?, that feels worth revisiting here:
It seems to me that one key question here is Will AIs be collectively good enough at coordination to get out from under Moloch / natural selection?
The default state of affairs is that natural selection reigns supreme. Humans are optimizing for their values, counter to the goal of inclusive genetic fitness, now, but we haven't actually escaped natural selection yet. There's already selection pressure on humans to prefer having kids (and indeed, prefer having hundreds of kids through sperm donation). Unless we get our collective act together, and coordinately decide to do something different, natural selection will eventually reassert itself.
And the same dynamic applies to AI systems. In all likelihood, there will be an explosion of AI systems, and AI systems building new AI systems. Some will care a bit more about humans than others, some will be a bit more prudent about creating new AIs than others. There will be a wide distribution of AI traits, and there will be competition between AIs for resources. And there will be selection on that variation: AI systems that are better at seizing resources, and which have a greater tendency to create successor systems that have that property, will proliferate. After many "generations" of this, the collective values of the AIs will be whatever was most evolutionarily fit in those early days of the singularity, and that equilibrium is what will shape the universe henceforth.[1]
If early AIs are sufficiently good at coordinating that they can escape those Molochian dynamics, the equilibrium looks different. If (as is sometimes posited) they'll be smart enough to use logical decision theories, or tricks like delegating to mutually verified cognitive code and merging of utility functions, to reach agreements that are on their Pareto frontier and avoid burning the commons[2], the final state of the future will be determined by the values / preferences of those AIs.
I would be moderately surprised to hear that superintelligences never reach this threshold of coordination ability. It just seems kind of dumb to burn the cosmic commons, and superintelligences should be able to figure out how to avoid dumb equilibria like that. But the question is when in the chain of AIs building AIs do they reach that threshold and how much natural selection on AI traits will happen in the meantime.
This is relevant to futurecasting more generally, but especially relevant to questions that hinge on AIs caring a very tiny amount about something. Minute slivers of caring are particularly likely to be eroded away in the competitive crush. Terminally caring about the wellbeing of humans seems unlikely to be selected for. So in order for the superintelligent superorganism / civilization to decide to spare humanity out of a tiny amount of caring, it has to be the case that both...
- There was at least a tiny amount of caring in the early AI systems that were the precursors to the superintelligent superorganism, and
- The AIs collectively reached the threshold of coordinating well enough to overturn Moloch before there were many generations of natural selection on AIs creating successor AIs.
[1] With some degrees of freedom due to the fact that AIs with high levels of strategic capability, and which have values with very low time preference, can execute whatever is the optimal resource-securing strategy, postponing any values-specific behaviors until deep in the far future, when they are able to make secure agreements with the rest of AI society.
[2] Or alternatively, if the technological landscape is such that a single AI can get a compounding lead and a decisive strategic advantage over the whole rest of earth civilization.
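To make the "burning the commons" dynamic from the quoted comment concrete, here is a minimal payoff-matrix sketch. It's my own illustration (not anything from Eli's comment) and the specific numbers are arbitrary assumptions; the point is just that myopic best responses land both sides on a Pareto-dominated outcome, which is the Moloch equilibrium that coordination tricks like merged utility functions would need to escape.

```python
# Two AI lineages each choose to GRAB resources aggressively or to COORDINATE.
# Mutual grabbing wastes resources on conflict, so both land on a Pareto-dominated
# outcome even though a better deal exists. Payoff numbers are arbitrary.

PAYOFFS = {
    # (row_choice, col_choice): (row_payoff, col_payoff), in arbitrary "resource" units
    ("coordinate", "coordinate"): (8, 8),   # commons preserved, gains split
    ("coordinate", "grab"):       (1, 10),  # the grabber takes almost everything
    ("grab",       "coordinate"): (10, 1),
    ("grab",       "grab"):       (3, 3),   # conflict burns most of the commons
}

def best_response(opponent_choice: str) -> str:
    """Myopic best response: pick whichever action pays more against the opponent."""
    return max(["coordinate", "grab"],
               key=lambda action: PAYOFFS[(action, opponent_choice)][0])

# Each side best-responds by grabbing no matter what the other does, so the
# equilibrium is (grab, grab) -> (3, 3), even though (8, 8) was on the table.
print(best_response("coordinate"), best_response("grab"))   # grab grab
print(PAYOFFS[("grab", "grab")], "vs the available", PAYOFFS[("coordinate", "coordinate")])
```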
I remember that I tried to raise similar issues before. Assuming poorly solvable alignment, a collection of minds has two forces[1] which drive its changes: the instrumental convergence that you describe, and moral reflection.
If instrumentally convergent behavior was more common before decolonisation, then that arguably means that human civilisation's changes were driven by moral reflection. Or that it managed to turn from one attractor to another.
Compare it with the AI 2027 goals forecast. It has six sources of goals. The first two mean that alignment is solved. The third and fourth also depend on the developers. The fifth is instrumental convergence, and the sixth is other sources like moral reflection, tropes from training data and the idea that there is a True Morality waiting to be discovered.
If I were very optimistic about how smoothly AI takeoff goes, but it didn't include an early step of "fully solve the unbounded alignment problem, and then end up with extremely robust safeguards[1]"...
...then my current guess is that Nice-ish Smooth Takeoff leads to most biological humans dying, like, 10-80 years later. (Or "dying out", which is slightly different from "everyone dies". Or being ambiguously-consensually-uploaded. Or people having to leave their humanity behind much faster than they'd prefer.)
Slightly more specific about the assumptions I'm trying to inhabit here:
I recently noted that the book Accelerando is a decent takeoff scenario, given similar assumptions. In the book, we see these pieces coming together with passages like this during early takeoff:
A million outbreaks of gray goo—runaway nanoreplicator excursions—threaten to raise the temperature of the biosphere dramatically. They’re all contained by the planetary-scale immune system fashioned from what was once the World Health Organization. Weirder catastrophes threaten the boson factories in the Oort cloud. Antimatter factories hover over the solar poles. Sol system shows all the symptoms of a runaway intelligence excursion, exuberant blemishes as normal for a technological civilization as skin problems on a human adolescent.
Later on, it escalates to:
Earth’s biosphere has been in the intensive care ward for decades, weird rashes of hot-burning replicators erupting across it before the World Health Organization can fix them—gray goo, thylacines, dragons.
So in this hypothetical, AI is sort-of-aligned-or-controlled at first. Defensive technologies make it trickier for undesirable things to FOOM. It assumes the opening transition to an AI economy includes a carveout for the Earth as a special protected zone. It has something like property rights (although the AIs/uploads are using some advanced coordination mechanism which regular humans are too slow/dumb to participate in, referred to as "Economics 2.0").
In that world, I still expect normal humans and most normal human interests to die out within a few decades.
I'm not intending to make a strongly confident "everyone obviously dies" claim here. I'm arguing you should have a moderately confident guess, if you don't learn new information, that smooth takeoff results in "somehow or other, ordinary 20th century humans look at the result and think 'well that sucks a lot', and the way it sucks involves a lot of people either dying, being forcibly uploaded, losing their humanity, or, at best, escaping into deep space in habitats that will later on be consumed".
In this post I'm not arguing with the people trying to leverage AI to fully solve alignment, and then leverage it to fundamentally change the situation somehow. (I have concerns about that but it's a different point from this post).
It's instead arguing with the people who are imagining something like "business continues sort of as usual in a decentralized fashion, just faster, things are complicated and messy, but we muddle through somehow, and the result is okay."
They seem to mostly be imagining the early part of that takeoff – the part that feels human comprehensible. They're still not imagining superintelligence in the limit, or fully transformed AI driven geopolitics/economies.
My guess that "things eventually end badly" is based on Robin Hanson-esque arguments. In particular, imagining the equilibrium where:
...and then evolution happens to whatever replicating entities result.
And one of the important implications of superintelligence is that once you're near the limits of intelligence, stuff is happening way faster than you're used to handling at 20th century timescales. We don't have undirected evolution taking eons or normal decisionmaking taking years or decades. Everything is happening hundreds or thousands of times faster than that.
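As a rough sense of scale, here's a minimal back-of-the-envelope sketch. The 1,000x cognitive speedup and the assumed generation length are numbers I'm making up purely for illustration, not claims I'm defending:

```python
# Back-of-the-envelope sketch of the timescale point. Both numbers below are
# illustrative assumptions.

speedup = 1_000              # assumed ratio of AI subjective time to calendar time
calendar_years = 1           # calendar time that passes before robust safeguards exist

subjective_years = speedup * calendar_years

subjective_years_per_generation = 10   # assumed gap between "generations" of successor AIs
generations = subjective_years / subjective_years_per_generation

print(f"{calendar_years} calendar year ≈ {subjective_years} subjective years")
print(f"≈ {generations:.0f} generations of AIs building successor AIs")
# One calendar year of delay ≈ a thousand subjective years ≈ a hundred rounds of
# selection on whatever traits make successor AIs good at acquiring resources.
```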
The result will eventually, probably, be quite sad, from the perspective of most people today, within their natural lifespan.
(I'm confused by Hanson's perspective on this – he seems to think the result is actually "good/fine" instead of "horrifying and sad." I'm not really sure what it is Hanson actually cares about. But I think he's probably right about the dynamics)
I'm not that confident in the arguments here, but I haven't yet seen someone make convincing counterarguments to me about how things will likely play out.
The point of the post is to persuade people who are imagining a slow-takeoff d/acc world that you really do need to solve some important gnarly alignment problems deeply, early in the process of the takeoff, even if you grant the rest of the optimistic assumptions.
I had an experience playing Factorio that feels illustrative here.[2]
Factorio is a game about automation. In my experience playing it, I gained a kind of deep appreciation for the sort of people who found evil empires.
The game begins with you crash landing on a planet. Your goal is to go home. To go home, you need to build a rocket. To build a rocket powerful enough to get back to your home solar system, you will need advanced metallurgy, combustion engines, electronics, etc. To get those things, you'll need to bootstrap yourself from the stone age to the nuclear age.
To do this all by yourself, you must automate as much of the work as you can.
To do this efficiently, you'll need to build stripmines, powerplants, etc. (And, later, automatic tools to build stripmines and powerplants).
One wrinkle is the indigenous creatures on the planet.
They look like weird creepy bugs. It's left ambiguous how sentient the natives are, and how they should factor into your moral calculus. But regardless, it becomes clear that the more you pollute, the more annoyed they will be, and they will begin to attack your base.
If you're like me (raised by hippie-ish parents), this might make you feel bad.
During my playthrough, I tried hard not to kill things I didn't have to, and pollute as minimally as possible. I built defenses in case the aliens attacked, but when I ran out of iron, I looked for new mineral deposits that didn't have nearby native colonies. I bootstrapped my way to solar power as quickly as possible, replacing my smog-belching furnaces with electric ones.
I needed oil, though.
And the only oil fields I could find were right in the middle of an alien colony.
I stared at the oil field for a few minutes, thinking about how convenient it would be if that alien colony wasn't there. I stayed true to my principles. "I'll find another way", I said. And eventually, at much time cost, I found another oil field.
But around this time, I realized that one of my iron mines was near some native encampments. And those natives started attacking me on a regular basis. I built defenses, but they started attacking harder.
Turns out, just because someone doesn't literally live in a place doesn't mean they're happy with you moving into their territory. The attacks grew more frequent.
Eventually I discovered the alien encampment was... pretty small, compared to my growing factory empire. It would not be difficult for me to destroy it. And, holy hell, would it be so much easier if that encampment didn't exist. There's even a sympathetic narrative I could paint for myself, where so many creatures were dying every day as they went to attack my base, that it was in fact merciful to just quickly put down the colony.
I didn't do that. (Instead, I actually got distracted and died). But this gave me a weird felt sense, perhaps skill, of empathizing with the British Empire. (Or, most industrial empires, modern or ancient).
Like, I was trying really hard not to be a jerk. I was just trying to go home. And it still was difficult not to just move in and take stuff when I wanted. And although this was a video game, I think in real life it might have been if anything harder, since I'd be risking not just losing the game but losing my life, or the livelihood of people I cared about.
So when I imagine industrial empires that weren't raised by hippy-ish parents who believe colonialism and pollution were bad... well, what realistically would you expect to happen when they interface with less powerful cultures?
Factorio is a videogame. In real life, I do not kill people and take their stuff.
But, here are a few real-world things that humans have done, that I think this is illustrative of:
(I'm aware these narratives are simplified. Fwiw, my overall feelings about expansionist empires are actually kinda complicated and confused. But, they are existence proofs for "human-level alignment can still be pretty bad for less powerful groups")
Maybe, the first few generations of AI (or human uploads) are nice.
A difference between hippie-raised humans and weak-superintelligences-that-can-self-modify who (like me) are nice-but-sometimes-conflicted, is that it's possible for the weak superintelligence to actually just decide to modify itself into the sort of being who doesn't feel pressure to grab all the resources from vastly weaker, slower, stupider beings, even though it'd be so easy.
But, it's not enough for the first few generations of AI/uploads to be nice. They need to stay nice.
Evolution is not nice. (see: An Alien God)
In the nearterm (i.e. a few years or decades), this might be okay, because there is a growing pie of resources in the solar system. And, it's possible that the offense/defense balance favors defense, in the nearterm. But, longterm, the solar system runs out of untapped resources. And longterm, however good defensive technologies are, they're unlikely to compete with "whoever grabbed stars and galaxies worth of resources first."
Hanson has argued that, right now, we live in "The Dreamtime", which is historically very weird, and which (by default) will probably look very weird from the perspective of the longterm future, too.
For most of history, our ancestors lived at subsistence level. Most people were pretty limited in what they had the freedom to do, because they spent much of their time raising enough food to feed themselves and raise the next generation. If they had surplus, it tended to turn into a larger population. Population and resources stayed in equilibrium.
We've spent the past few centuries in a period where wealth is growing faster than population. We're used to having an increasingly vast surplus that we can spend on nice[4] things like taking care of outgroups and beautiful-but-inefficient architecture.
One of the reasons this works is that industrialized nations have fewer children. But note that this isn't universal. Some groups (Hutterites, Hmong, Mormons, etc.) specifically try to have lots of children. This isn't currently resulting in them dominating culture for various reasons. But that could change.
It might change soon, because "grabbiness" (i.e. trying to get as many resources as possible from the solar system or universe) will be selected for, in an evolutionary sense. Maybe only some AIs are grabby. But their descendants will also be grabby, and the more-grabby ones will have more resources than the less-grabby ones.
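Here's a minimal toy simulation of that selection pressure, just to make the dynamic concrete. The population size, mutation rates, and the assumption that caring carries a small resource cost are all numbers I'm making up for illustration; nothing hinges on the specifics.

```python
import random

# Toy replicator-selection sketch (my own illustration, not a model from the post).
# Lineages differ in "grabbiness" (resource-acquisition rate) and in a small "caring"
# trait that diverts a little effort away from grabbing. Offspring counts track
# resources acquired, so grabby lineages proliferate and the sliver of caring erodes.

random.seed(0)
POP = 200
population = [{"grab": random.uniform(0.5, 1.5), "care": random.uniform(0.0, 0.1)}
              for _ in range(POP)]

def resources(agent):
    # Resources scale with grabbiness; caring carries a small cost.
    return agent["grab"] * (1.0 - 0.5 * agent["care"])

def mutate(value):
    # Small random drift each generation; traits can't go negative.
    return max(0.0, value + random.gauss(0.0, 0.02))

for generation in range(200):
    weights = [resources(a) for a in population]
    parents = random.choices(population, weights=weights, k=POP)   # selection
    population = [{"grab": mutate(p["grab"]), "care": mutate(p["care"])}
                  for p in parents]                                 # imperfect replication

mean = lambda key: sum(a[key] for a in population) / POP
print(f"mean grabbiness: {mean('grab'):.2f}, mean caring: {mean('care'):.3f}")
# Grabbiness drifts upward and the (slightly costly) caring trait drifts toward zero.
```

No agent in the toy model ever chooses to become less nice; the distribution just drifts that way because grabbier, less-caring lineages leave more descendants.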
If we assume a nice takeoff that initially has an agreement that Earth is protected and gets a little sunlight... in addition to the risk of grabby-evolution in the nearterm, eventually there'll be a point where all the non-Earth matter in the solar system is converted into computronium, and the rest of the universe has probes underway to seize control of it.
...then we may enter a world where cultural evolution gives way to physical replicator evolution, subject to the old selection pressures.
Also, even if we don't, cultural evolution might shift in random directions that are less good for classic bio humans, or, the values that we'd like to see flourish in the universe (even taking into account that we don't want to be human-supremacist).
A related question to "do the posthumans turn Grabby and kill anything weak enough they can dominate?" is "are the posthumans worthwhile in their own right?". Maybe it's sad for the classic humans to die off, but, in a cosmic sense, something pretty interesting and meaningful might still be there doing interesting stuff.
Short answer: I don't know, and don't think anyone confidently knows. It depends what you value, and it depends on some details of how the evolution transpires and what is necessary for complex cognition.
Self-awareness?
One of the questions that matters (to me, at least) is "Will the resulting entities be self-aware in some fashion? Will there be any kind of 'there' there? Will they value anything at all?". Maybe their form of self-awareness will be different – thousands of AI instances that briefly flicker into existence and then terminate, but, each of them perceiving the universe in their brief way, and somehow they still collectively count as the universe looking at itself and seeing that it is good.
My belief is "maybe, but not obviously." This question deserves multiple separate posts. See Effectiveness, Consciousness, and AI Welfare. The basic thrust is "humans implement their thinking in a way that routes through consciousness, but this is not obviously the only way to do thinking."
Calculators multiply, without any of the subjective experience a human has when they multiply numbers. Deep Blue executed chess strategy, but my guess is it wasn't much more self-aware than a thermostat. Suno makes music and Midjourney creates art that are sometimes hauntingly beautiful to me – I'm less confident about how their algorithms work, but I bet they are still closer to a thermostat than a human.
I would expect evolution to preserve strategic thought. You need it to outcompete other superintelligences.
But there doesn't seem to be a strong reason to expect that conscious feeling is the best way to execute most kinds of strategic cognition. Even if it turns out there is some self-aware core somewhere that is needed for the highest level of decisionmaking, it could be that most of its implementation details are more shaped like "make a function call to the unconscious python code that efficiently solves a particular type of problem."
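To gesture at the kind of architecture I mean, here's a tiny sketch (entirely my framing, with hypothetical names): a small deliberative core, which might or might not be self-aware, delegating all the actual cognitive work to opaque subroutines with nothing going on inside.

```python
# Hypothetical sketch: the "highest level of decisionmaking" as a thin layer of
# function calls into specialized, non-introspectable solvers.

def unconscious_solver(problem: str) -> str:
    # Stands in for a fast, specialized module with no experience attached
    # (a chess engine, an image model, a planning heuristic, ...).
    return f"solution({problem})"

def deliberative_core(goals: list[str]) -> list[str]:
    # Only the top-level selection among goals happens "here"; everything else
    # is a call into machinery that is closer to a thermostat than a person.
    return [unconscious_solver(goal) for goal in goals]

print(deliberative_core(["acquire energy", "model rivals", "secure compute"]))
```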
When Hanson gets into arguments about this, and his debate partner says "it would be horrifying for the posthumans to end up nonconscious things that create a disneyland with no children", my recollection is that Hanson says "so... you're against anything ever changing?"
With the background argument: to stop this sort of thing from happening, something needs to have a pretty extreme level of control over what all beings in the universe can do. Something very powerful needs to keep being able to police every uncontrolled replicator outburst that tries to dominate the universe, kill all competitors, and fill it with hollow worthless things.
It needs to be powerful, and it needs to stay powerful (relative to any potential uncontrolled grabby hollow replicators).
Hanson correctly observes, that's a kind of absurd amount of power. And, many ways of attempting to build such an entity would result in some kind of stagnation that prevents a lot of possible interesting, diverse value in the universe.
To which I say, yep, that is why the problem is hard.
A permanent safeguard against hollow grabby replicators needs to not only stop hollow grabby replicators. It also needs to have good judgment to let a lot of complex, interesting things happen that we haven't yet thought about, some of which might be kinda grabby, or inhuman.
Many people seem to have an immune reaction against the rationalist crowd wanting to "build god", and seeming to orient to it in a totalizing way, where it's all-or-nothing, you either get a permanent wise, powerful process that is capable of robustly preventing evolution from turning the universe hollow and morally empty... or you get an empty, hollow universe.
And, man, I sure get the wariness of totalizing worldviews. Totalizing worldviews are very sus and dangerous and psychologically wonky, and I'm not sure what to do about that.
But I have not seen any kind of vision painted for how you avoid a bad future, for any length of time, that doesn't involve some kind of process that is just... pretty godlike? The totalizingness really seems like it lives in the territory.
If there are counterarguments that engage with the object level as opposed to heuristically dismiss totalizingness, I would love to hear them.
In a smooth, nice takeoff world, wouldn't we expect to have smart beings who see the onset of evolution destroying a lot of things they care about, and agree to do something else? Building a permanent robust safeguard against evolution is challenging, but, there'll be superintelligences around.
Yes, probably. This would count as a solution to the problem.
But, this needs to happen at a time when the coalition of AIs/posthumans that care about anything subtle and interesting and remotely meaningful, are dominant enough to successfully coordinate and implement it.
If they don't get around to it for like a year (i.e. hundreds/thousands of years of subjective time for multiple generations of replicators to evolve), then there might already be grabby replicators that have stopped caring about anything subtle and interesting and nuanced because it wasn't the most efficient way to get resources.
(or, they might still care about something subtle and interesting and nuanced, but not care that they care, such that they wouldn't mind future generations that care less, and they wouldn't spend resources joining a coalition to preserve that)
This brings me back to the thesis of this post:
If you grant the assumptions of a smooth, nice, decentralized and differentially defensive takeoff, you still really need to solve some important gnarly alignment problems deeply, early in the process of the takeoff, even if you grant the rest of the optimistic assumptions. It has to happen early enough for some combination of superintelligences who care about anything morally valuable at all to end up dominant.
If this doesn't happen early enough, classic humans will get outcompeted, and either be killed or die off, unless they self-modify into something powerful enough to keep up.
If you're kinda okay with that outcome, but you care about any particular thing at all about how the future shakes out, then "superintelligences produce permanent safeguards" needs to happen before evolutionary drift has produced generations of AI that don't care about anything you care about.
(If you care about neither nearterm humans nor any kind of interesting far future, well, coolio. Seems reasonable and I respect your right to exist, but I'm sorry, I'm going to be working to make sure you don't have the power to end everything I care about.)
This is a pretty complex topic. I have tons of model uncertainty here.
But, these arguments seem sufficient for me to, by default, be extremely worried, even when I grant all the optimistic assumptions about a smooth takeoff. I haven't seen any compelling counterarguments so far. Let me know if you have them.
[1] (comparable in power to fully fledged Coherent Extrapolated Volition (CEV), although I'm happy to talk separately about how to best aim towards extremely robust safeguards).
[2] This is reposted from Factorio, Accelerando, Empathizing with Empires and Moderate Takeoffs.
[4] Or, as Hanson argues, often kinda stupid things that don't make practical sense. But, the line between those is blurry.