I'm working on an article for the Open Phil AI worldview contest. I am thinking of explaining my interpretation of Nate's take on the agency of advanced models (see here). Generally, I just want to explain what Nate argues: that more ambitious tasks require more "agentic" behaviour. But I want to illustrate it with a few examples that, to me, make the argument clearer than typical MIRI discussions of this issue do.
One worry about discussing this issue is that, if you argue compellingly that agency is required for more advanced AI, you might convince some people working on advanced AI to look for ways to make their systems more agentic. This could accelerate capabilities in a potentially undesirable direction.
I've found that MIRI staff's explanations of this point often amount to "I've thought about it, and if you think about it I think you'll end up agreeing with me". It's plausible that this is motivated by a desire not to make the argument too clear. I'm pretty unsure about that, though, because if you believe an idea is likely to be harmful to share, then I would think the appropriate policy is not to talk about it at all, rather than to talk about it vaguely. And to the extent that this idea does suggest pathways to higher capability, there are probably lots of people in the AI business who can put 2 and 2 together, so to speak.
My own view is:
- AI researchers will pay a lot more attention to successful experiments than to abstract ideas, so discussions like this are probably much less compelling to AI developers than a demonstration would be
- If someone is convinced by an argument to try agency-promoting experiments, then the argument must have been plausible enough a priori that, given how many people are working on novel AI ideas, someone else would probably have tried a similar experiment fairly soon anyway (on a timescale of roughly a couple of months, and the counterfactual impact is further reduced by other advances that happen during that window)
- Also, most such experiments aren't especially compelling in themselves
- On the other hand, there's a large upside to people engaged in AI x-risk questions having a good idea of how this is likely to play out, and impact on this front doesn't depend on anyone going out and running a successful experiment
A limiting case is one in which everyone in the AI x-safety community agrees that AI has to be highly agentic to carry out ambitious tasks. In this case, I think it's likely that some developers explore ways to make their AI more agentic earlier than they otherwise would have, but the x-safety community is much better coordinated around such models. It's murky, but I think this is probably good overall.
So I think it's probably best to talk about it plainly. What are your thoughts?
One consideration pointing the other way: I think talk about how AI might be a winner-takes-all game may have encouraged the "full speed ahead" approach to developing AGI.
(Also, if you do succeed in writing what you think is an infohazardously-good explanation, you can just ask someone you trust to read it privately before posting it publicly.)