All of yieldthought's Comments + Replies

This is a fair point. I don’t know what economic cost Russia paid by reducing gas nor if they could expect to make that up by shipping more later on. Perhaps this was a relatively low-cost and sensible extension of the military positioning.

I guess I have updated to: could we have known that Putin was fully prepared for war and making a credible threat of invasion? I didn’t really see discussion of that so early, and would still love to find sources that did.

Also: a threat implies demands and negotiation. If we think in these terms, did Putin make demands that, if genuinely fulfilled, would have avoided the war? Or was he driven by internal needs?

If Putin had gotten a deal that satisfied some of his demands without waging a war, that would likely have been popular at home. But it might have been impossible for Kyiv to agree to such a deal without far-right militias attempting another coup. As Everyone is talking about Minsk but what does it mean for Ukraine? [] puts it:

This is a good one and the timing suggests it is true at least in the short term. The Olympics only started in Feb ‘22 though. Do we have any indication that China made Putin wait for several months?

International diplomacy involves a lot of anticipation rather than direct messaging, but even so, in this case I'd say it's definitely more about Putin looking out for China than the other way around. The Winter Olympics were only China's second Olympics ever, and possibly China's last Olympics ever, and the ruling party clearly treated them as a Schelling Point []. It would have been a bad idea to fray ties with China at such a critical time.

I guess my point is that individual humans are already misaligned with humanity’s best interests. If each human had the power to cause extinction at will, would we survive long enough for one of them to do it by accident?

[comment deleted]
No. It's similar to why fictional superpowers would usually be bad by default in the real world, unless the superpowers had always been there.

To the extent that reinforcement models could damage the world or become a self-replicating plague, they will do so much earlier in the takeoff when given direct, aligned reward for doing so.

Consider someone consistently giving each new AI release the instructions “become superintelligent and then destroy humanity”. This is not the control problem, but surely doing this will manifest x-risk behaviour at least somewhat earlier than giving innocuous instructions would?

I think this failure mode would happen extremely close to ordinary AI risk; I don't think that e.g. solving this failure mode while keeping everything else the same buys you significantly more time to solve the control problem.

A thoughtful decomposition. If we take the time dimension out and consider that AGI just appears ready to go, I think I would directionally agree with this.

My key assertion is that we will get sub-AGI capable of causing meaningful harm when deliberately used for this purpose significantly ahead of getting full AGI capable of causing meaningful harm through misalignment. I should unpack that a little more:

  1. Alignment primarily becomes a problem when solutions produced by an AI are difficult for a human to comprehensively verify. Stable Diffusion could be embedding
... (read more)
I agree that there will be potential for harm as people abuse AIs that aren't quite superintelligent for nefarious purposes. However, in order for that harm to prevent us from facing existential risk due to the control problem, the harm from nefarious use of sub-superintelligent AI would itself have to be at x-risk level, and I don't really see that being the case.
the gears to ascension, 8mo:
I think you may be underestimating the degree to which these models are like kindling: a powerful reinforcement learner could suddenly slurp all of this stuff up and fuck up the world really badly. I personally don't think a reinforcement learner that is trying to take over the world would be likely to succeed, but the key worry is that we may be able to create a form of life that, like a plague, is not adapted to the limits of its environment, makes use of forms of fast growth that can take over very quickly, and then crashes most of life in the process.

Most folks here also assume that such an agent would be able to survive on its own after it killed us, which I think is very unlikely given how many orders of magnitude more competent you have to be to run the entire world. GPT-3 has been able to give me good initial instructions for how to take over the world when pressured to do so (summary: cyberattacks against infrastructure, then threaten people; this is already considered a standard international threat, not something newly invented by GPT-3). But when I then turned around and pressured it to explain why that was a bad idea, it immediately went into detail about how hard it is to run the entire world. Obviously these are all generalizations humans have talked about before, but I still think it's a solid representation of reality.

That said, because such an agent would in my view likely also be misaligned with itself, I agree with your perspective that humans who are misaligned with each other (i.e., have not successfully deconflicted their agency) are a much greater threat to humanity as a whole.