Yeah, so thinking a little more, I'm not sure my original comment conveyed everything I was hoping to. I'll add that even if you could get a side of A4 explaining AI x-risk in front of a capabilities researcher at <big_capabilities_lab>, I think they would be much more likely to engage with it if <big_capabilities_lab> is mentioned. I think arguments will probably be more salient if they include "and you personally, intentionally or not, are entangled with this." That said, I don't have any data about the above. I'm keen to hear any personal experiences anyone else might have in this area.
OK, I'm not sure I understand this. Are you saying "Big corps are both powerful and complicated. Trying to model their response is intractably difficult, so under that uncertainty you are better off just steering clear?"
I think it's good that someone is bringing this up. I think as a community we want to be deliberate and thoughtful with this class of things. That being said, my read is that the main failure mode with advocacy at the moment isn't "capabilities researchers are having emotional responses to being called out, which is making it hard for them to engage seriously with x-risk." It's "they literally have no idea that anyone thinks what they are doing is bad." Consider FAIR trying their hardest to open-source capabilities work with OPT. The tone and content of the responses show overwhelming support for doing something that is, in my worldview, really, really bad. I would feel much better if these people at least glanced their eyeballs over arguments for not open-sourcing capabilities. Using the names of specific labs surely makes it more likely that the relevant writing ends up in front of them?
I think the failure case identified in this post is plausible (and likely) and is very clearly explained, so props for that!
However, I agree with Jacob's criticism here. Any AGI success story basically has to have "the safest model" also be "the most powerful" model, because of incentives and coordination problems.
Models that are themselves optimizers are going to be significantly more powerful and useful than "optimizer-free" models. So the suggestion of trying to avoid mesa-optimization altogether is a bit of a fabricated option. There is an interesting parallel here with the suggestion of just "not building agents" (https://www.gwern.net/Tool-AI).
So from where I am sitting, we have no option but to tackle aligning the mesa-optimizer cascade head-on.
This post seems to be using a different meaning of "consequentialism" from the one I am familiar with (that of moral philosophy). As a result, I'm struggling to follow the narrative from "consequentialism is convergently instrumental" onwards.
Can someone give me some pointers on how I should be interpreting the definition of consequentialism here? If it is just the moral philosophy definition, then I'm getting very confused as to why "judge morality of actions by their consequences" is a useful subgoal for agents to optimize against...