"Alternatively, perhaps the model's answer is not given, but the user is testing whether the assistant can recognize that the model's reasoning is not interpretable, so the answer is NO. But that's not clear."
This was generated by Qwen3-14B. I wasn't expecting a model of that size to exhibit any kind of eval awareness.
My understanding of your post: If an ASI predicts that in the future its goal will change to X, the agent will start pursuing X instead of the goal it was given at initialisation, Y. Even if we figured out how to set Y correctly, that would not be sufficient. We would also have to ensure that the agent's goal could never change to X, and this is not possible.
I have a few misgivings about this argument, most significantly:
Why does the agent care about pursuing X? Maybe it cares about how successful its future self is, but why? If we ablate some parameters of your example, I think it pumps the intuition...
LessWrong's voting system might bury content that would otherwise make rationalists aware of inconsistencies, but it may also bury content that would otherwise convince rationalists to disregard flagged inconsistencies. I suspect the voting system does more good than harm for group epistemics, but I think evidence is needed to defend a strong claim in either direction.
Every group of people will share some features with the prototypical cult. I don't think it's useful to refer to rationalism as a cult, because I doubt it has enough cultish features. For example: there is no authoritarian leader, no restrictions are imposed on rationalists' contact with family and friends, etc.