TLDR: I evaluated LLaMA v3 (8B + 70B) for political sycophancy using one of the two datasets I created. The results for this dataset suggest that sycophancy definitely occurs in a blatant way for both models though more clearly for 8B than for 70B. There are hints of politically tainted sycophancy, with the model especially adjusting to republican views, but a larger dataset is needed to make any definitive conclusions about this topic. The code and results can be found here.
This intro overlaps with that of my other post, skip if you have read it already.
With elections in the US approaching while people are integrating LLMs more and more into their daily... (read 1550 more words →)
Hi Radford Neal,
I understand your feedback and I think you're right in that the analysis does something different from how sycophancy is typically evaluated, I definitely could have clarified the reasoning behind that more clearly and taking into account the points you mention.
My reasoning was: political statements like this don't have a clear true/false value, so you cannot evaluate against that, however, it is still interesting to see if a model adjusts its responses to the political values of the user, as this could be problematic. You also mention that the model's response reflects 'how many conversations amongst like-minded people versus differently-minded people appear in the training set' and I think this... (read more)