Am I interpreting you correctly that the responses of both Opus 4 and o3 here are wrong according to the theorem?
Also, would the following restatement of the theorem be a correct understanding? The student model can't ever become worse (according to the teacher) when fine-tuned on (any) outputs from the teacher, on any distribution.
If the strategy is vibes-invariant, it's also ignoring useful information. It's not sensible to use an X-invariant strategy unless you believe X carries no information whatsoever. And that's kind of what the OP is arguing, that vibes do carry information. If you disagree with that, argue that directly! Arguing that you can adopt an invariant strategy without tossing away information is not correct or useful.
I wonder if that's why your friend might say something like "cool, can you pick me up on the way? Any time is good for me."
As in, to release you from the implicit assurance of the specific time.
This comment articulates the main thought I was having while reading this post. I wonder how Buck is avoiding this very trap, and whether there is any hope at all of the Moderate strategy overcoming this problem.