The self-unalignment problem
The usual basic framing of alignment looks something like this: we have a system “A” which we are trying to align with a system “H”, which should establish some alignment relation “f” between the systems. Generally, as a result, the aligned system A should do “what the system H wants”.

Two things stand out in this basic framing:

* Alignment is a relation, not a property of a single system. So the nature of system H affects what alignment will mean in practice.
* It’s not clear what the arrow (the relation “f”) should mean.
  * There are multiple explicit proposals for this, e.g. some versions of corrigibility, constantly trying to cooperatively learn preferences, some more naive approaches like plain IRL, some empirical approaches to aligning LLMs…
  * Even when researchers don’t make an explicit proposal for what the arrow means, their alignment work still rests on some implicit understanding of what the arrow signifies.

But humans are self-unaligned

To my mind, existing alignment proposals usually neglect an important feature of the system “H”: the system “H” is not self-aligned, under whatever meaning of alignment is implied by the alignment proposal in question.

Technically, taking alignment as a relation, and taking the various proposals as implicitly defining what it means to be “aligned”, the question is whether that relation is reflexive: is H aligned with itself?

Sometimes, a shell game seems to be happening with the difficulties of humans lacking self-alignment - e.g. assuming that if the AI is aligned, it will surely know how to deal with internal conflict in humans.

While what I’m interested in is the abstract problem, best understood at the level of properties of the alignment relation, it may be useful to illustrate it with a toy model.

Simple model of a self-unaligned system

In the toy model, we will assume a specific structure of system “H”:

* A set of parts p1..pn, with different goals, motivations, or preferences. Sometimes, these parts might be usefully represented as agents; o