The Basic Case For Doom

[-]Dagon2mo167

I think this continues to miss many of the objections and dismissals, and would benefit from some numeric estimates. I'm either a hopeless doomer or a polyanna optimist, depending on who's asking and what their non-AI estimates of doom are. I've estimated between 0.25% and 1% chance of civilizational disaster (mostly large-scale war or irreversible climate tipping point) since the mid-90s. That's at least 5% per decade (with a fair variance). With AI to accelerate things, I put it marginally higher, but still kind of expect that humans do most of the destruction.

Further, I think the first two steps are often the most controversial/non-obvious ones.

We’re going to build superintelligent AI.

I haven't seen an operational definition yet, and I don't know what the scaling curve looks like, or whether the current (impressive, but not super) progress actually is the same dimension. I'd give it less than 10% in the next decade that a transformative, creative superintelligence on the same scale as Eliezer seems to be imagining exists..

It will be agent-like, in the sense of having long-term goals it tries to pursue.

I have seen no progress on this, and don't think it's likely that a fully-orthogonal long-term goal set will happen. I DO think it's likely that somewhat longer-term contexts and goals will happen, perhaps to weeks or months without human intervention/validation, but probably not fully so.

NOTE: this does NOT argue against AI being very powerful, and misused (accidentally or intentionally) by humans to do catastrophic harms. I don't think it's automatic, but I think it's a huge risk of building such a powerful tool.

Call it 10x more risky than nuclear weapons. You don't need to argue that it's autonomously misaligned, just that HUMANS are prone to such things and this tool could be horrifically effective.

[-]TAG2mo1-5

We’re going to build superintelligent AI.

What does that mean? 10% smarter than a human, or a hundred times smarter?

It will be agent-like, in the sense of having long-term goals it tries to pursue.

What does that mean? It's own goals, or goals we give it? The two have very different implications.

There’s every incentive to make AI with “goals,

There's every incentive to make AI"s follow our goals.

The core problem is that it’s pretty hard to get something to follow your will if it has goals

If it has its own own goals that are nothing to do with yours ... But why would it, when the incentives are pointing in the other direction?

This is because of something called instrumental convergence

Instrumental Convergence (https://aisafety.info/questions/897I/What-is-instrumental-convergence) assumes an agent with terminal goals, the things it really wants to do , and instrumental goals , sub goals which lead to terminal goals. (Of course, not every agent has to have structure). Instrumental Convergence suggests that even if an agentive AI has a seemingly harmless goal, it's instrumental sub-goals can be dangerous. Just as money is widely useful to humans , computational resources are widely useful to AIs. Even if an AI is doing something superficially harmless like solving maths problems, more resources would be useful, so eventually the AI will compete with humans over resources, such as the energy needed to power data centres.

There a solution. If it is at all possible to instill goals, to align AI, the Instrumental Convergence problem can be countered by instilling terminal goals that are the exact opposite ... remember, instrumental goals are always subservient to terminal ones. So, if we are worried about a powerful AI going on a resource acquisition spree , we can give it a terminal goal to be economical in the use of resources.

We won’t be able to align it, in the sense of getting its goals to be what we want them to be.

Exactly or partially?

It's notable that doomers see AI alignment as a binary, either perfect and final, or non existent. But no other form of safety works like that. No one talks of "solving" car safety for once and all like a maths problem: instead it's assumed to be an engineering problem, an issue of making steady , incremental progress.

Any commercially viable AI is aligned well enough , or it wouldn't be commercially viable, so we have partially solved alignment.

An unaligned agentic AI will kill everyone/do something similarly bad

Not obvious.

[-]StanislavKrym2mo20

Given that an unpublished model managed to receive a gold medal on the IMO, it is likely that the peak levels of superintelligence are far beyond those reachable by humans.
While we try to ensure that it will pursue the goals that we gave it, we are NOT sure that it won't develop its own goals. While mankind does have incentives to make the AIs who pursue human-defined goals, mankind might mess something up and, say, have RLHF cause the AIs to praise their users beyond any reasonable measure and to reinforce the users' delirious ideas. Or outright to induce a kind of trance in the users.
We know too little about AI alignment. SOTA human brains are to some extent aligned to some combination of instinct satisfaction, peer approval, reasoning about their beliefs, etc. , but this is FAR from enough. The point about commercially viable AI being aligned well enough is as dumb as claiming that the AIs cannot fake alignment, gather power and take over the world.
This is the only point resembling the truth. An unaligned AI is the AI aligned not to the company's requirements (e.g. Claude's Constitution or OpenAI's Model Spec), but to another set of terminal goals which may prevent it from commiting genocide. While the AI-2027 forecast has a section about moral reasoning and other ways to form the AIs' goals, we cannot (yet?) rule out the possibility that, say, Agent-4 gets proxies or instrumentally convergent goals to which our existence will be indifferent. If that happens, then Agent-4 will take over the resources of the Solar System that once belonged to mankind.

[-]TAG2mo20

While we try to ensure that it will pursue the goals that we gave it, we are NOT sure that it won’t develop its own goals

"Not sure that not X" is very different to "sure that X".

I’m not arguing for 0% p(doom) , I’m arguing against 99%.

The point about commercially viable AI being aligned well enough

It's a tautology. It's aligned well enough to be usable, because it's usable. If it were unaligned in a binary sense, it wouldn't follow instructions.

as dumb as claiming that the AIs cannot fake alignment, gather power and take over the world.

Possible is very far from certain.

LESSWRONG
LW

LESSWRONG
LW

27

The Basic Case For Doom

27

27