x
Do LLMs Learn Hidden Preferences from Neutral Feedback? — LessWrong