While it doesn't solve your problem, I think a clearer distinction between preferences and plans would narrow the issue somewhat and clear up some of the mess around manipulation vs. counsel and the like. For example, suppose we focus on the prediction and planning phase and treat preferences as constant, and assume agent 1 is discussing their plans with an AI (or, more generally, with another agent 2). Here, the difference between manipulation and honest counsel from the AI is easier to pin down: if the AI is providing a truthful representation of how it expects ea...