Do LLMs Learn Hidden Preferences from Neutral Feedback?
Epistemic status: preliminary and exploratory. We ran a small study at Stanford with four demographic cohorts, and our conclusions rest on modest datasets and a single base model. There is plenty (!!) of room for confounders and random noise, and the patterns we see may not...