
LESSWRONG


Benjamin Plaut — LessWrong



AI Safety via Generalization and Caution: A Research Agenda

This post is a condensed version of the full paper. You can also watch this talk for an overview of the conceptual arguments (although the talk focuses on one project out of five). Suggested reading options: 1. Just the summary 2. Summary + "A generalization-based framing of AI safety" +...

Technical AI Safety research taxonomy attempt (2025)

Preface

I'm a postdoc at CHAI working on AI safety. This document contains my rough attempt at a taxonomy of technical AI safety research in 2025. It’s certainly imperfect, and likely biased towards the research areas that I have the most experience with. I’m sure I’ve accidentally omitted some important...

Aug 27, 2025