TL;DR: We show that LLM agents can figure out who you are from your anonymous online posts. Across Hacker News, Reddit, LinkedIn, and anonymized interview transcripts, our method identifies users with high precision – and scales to tens of thousands of candidates. While it has been known that individuals can...
OpenAI’s updates of GPT-4o in April 2025 famously induced absurd levels of sycophancy: the model would agree with everything users said, no matter how outrageous. After fixing it, OpenAI released a postmortem; while it was widely discussed, I find it curious that this sentence received little attention: > Similarly,...
tl;dr: LLMs rapidly improving at software engineering and math means lots of projects are better off as Google Docs until your AI agent intern can implement them. Implementation keeps getting cheaper Writing research code has gotten a lot faster over the past few years. Since 2021 and OpenAI Codex, new...
Consider the following two questions: Is this move good or bad? Is this forecast accurate? In both cases, the ground truth is not known to us humans. Furthermore, in both cases there either already exist superhuman AI systems (as in the case of chess), or researchers are actively working to...
Thanks to Flo Dorner for feedback on the technical content in this post. I have recently been accepted to the SERI ML Alignment Theory Scholars Program, where applicants are paired with mentors. The application process was nonstandard: each mentor has a set of open-ended questions on safety research you...