
andrejfsantos — LessWrong




Save the date: Swiss AI Safety Days 2026 (7-8 November, ETH Zurich)

TL;DR: Following the Zurich AI Safety Day in 2025, the Swiss AI Safety Days 2026 are a two-day AI safety event (7-8 November, ETH Zurich); RSVP here.

Swiss AI Safety Days 2026 is the next chapter of the Zurich AI Safety Day. Last year, our inaugural 2025 event was named...

Towards a Science of Evals for Sycophancy

This work was conducted as my final project for the AI Safety Fundamentals course. Due to time constraints, certain choices were made to simplify some aspects of this research.

Intro

Sycophancy, the tendency to agree with someone’s opinions, is a well-documented issue in large language models (LLMs). These models can...

Feb 1, 2025•8