This work was conducted as my final project for the AI Safety Fundamentals course. Due to time constraints, I made some simplifying choices in this research.
Intro
Sycophancy, the tendency to agree with a user's opinions, is a well-documented issue in large language models (LLMs). These models can echo user opinions or preferences without critical analysis, often sacrificing accuracy and factuality in the process.
In this post I show how common measures of sycophancy in LLMs can produce results that are unreliable, if not misleading. This research was partially motivated by the need for a Science of Evals in AI model evaluations. This post is a small step towards a new methodology to obtain...
A few months ago I was unsure whether or not I should apply to the bootcamp. After reading your post I decided I would, and having just attended ML4Good France 2025, I'm very glad I did. So a big thank you for this post!
I can endorse the recommendation for those reading to attend. It was an excellent experience that gave me everything I needed to actually start contributing to AI Safety. I made lots of connections in the field, many like-minded friends, and left with a specific step-by-step plan for contributing - something I was missing before.
All in all, this post is still an accurate description of the camp, and the organisers have since improved upon some of the points in "Areas for improvement" and updated the curriculum.