Thanks, agree that 'emergent dynamics' is woolly above.
I guess I don't think the y-axis should be the temporal dimension. To give some cartoon examples:
I do think the y-axis is pretty correlated with temporal scales, but I don't think it's the same. I also don't think it's the same as physical violence, though that's probably also correlated (cf. the backsliding example, which is very power-seeking but not very violent).
The thing I had in mind was more like, should I imagine some actor consciously trying to bring power concentration about? To the extent that's a good model, it's power-seeking. Or should I imagine that no actor is consciously planning this, but the net result of the system is still extreme power concentration? If that's a good model, it's emergent dynamics.
Idk, I see that this is messy, and there's probably some better concept here.
I think I agree that, once an AI-enabled coup has happened, the expected remaining AI takeover risk would be much lower. This is partly because it ends the race within the country where the coup happened (though it wouldn't necessarily end the international race), but also partly just because of the evidential update: apparently AI is now capable of taking over countries, and apparently someone could instruct the AIs to do that, and the AIs handed the power right back to that person! Seems like alignment is working.
I don't currently agree that the remaining AI takeover risk would be much lower:
Curious what you think.
I think it might be a bit clearer to communicate the stages by naming them based on the main vector of improvement throughout the entire stage, i.e. 'optimization of labor' for stage one, 'automation of labor' for stage two, 'miniaturization' for stage three.
I think these are better names for the underlying dynamics, at least - thanks for suggesting them. (I'm less clear that they are better labels for the stages overall, as they are a bit more abstract.)
Changed to 'motivation', thanks for the suggestion.
I agree that centralising to make AI safe would make a difference. It seems a lot less likely to me than centralising to beat China (there's already loads of 'beat China' rhetoric, and it doesn't seem very likely to go away).
"it is potentially a lot easier to stop a single project than to stop many projects simultaneously" -> agree.
I think I still believe the thing we initially wrote:
Eventually they need to start making revenue, right? They can't just exist on investment forever.
(I am also not an economist, though, and I'm interested in pushback.)
Thanks, I expect you're right that there's some confusion in my thinking here.
Haven't got to the bottom of it yet, but on the 'more incentive to steal the weights' point:
- partly I'm reasoning in the way that you guess: more resources -> more capabilities -> more incentives
- I'm also thinking "stronger signal that the US is all in and thinks this is really important -> raises p(China should also be all in) from a Chinese perspective -> more likely China invests hard in stealing the weights"
- these aren't independent lines of reasoning, as the stronger signal is sent by spending more resources
- but I tentatively think it's not the case that, at a fixed capability level, the incentives to steal the weights are the same. I think they'd be higher with a centralised project, as conditional on a centralised project there's more reason for China to believe a) AGI is the one thing that matters, and b) the US is out to dominate
Thanks, I agree this is an important argument.
Two counterpoints:
Thanks! Fwiw I agree with Zvi on "At a minimum, let’s not fire off a starting gun to a race that we might well not win, even if all of humanity wasn’t very likely to lose it, over a ‘missile gap’ style lie that we are somehow not currently in the lead."
All of 1-4 seem plausible to me, and I don't centrally expect that power concentration will lead to everyone dying.
Even if all of 1-4 hold, I think the future will probably be a lot less good than it could have been:
- 4 is more likely to mean that Earth becomes a nature reserve for humans (or something like that) than that the stars are equitably allocated
- I'm worried that there are bad selection effects, such that 3 already screens out some kinds of altruists (e.g. ones who aren't willing to strategy-steal). Some good stuff might still happen for existing humans, but the future will miss out on some values completely
- I'm worried about power corrupting, there being no checks and balances, and there being no incentives to keep doing good stuff for others