vedant-badoni — LessWrong

33

5mo

Exploring Reinforcement Learning Effects on Chain-of-Thought Legibility

by Julian H, RohanS, Baram Sosis, vedant-badoni, and The-Turtle

This project was conducted as part of the SPAR Fall 2025 cohort.

TL;DR

* Chain-of-thought (CoT) monitoring may serve as a core pillar for AI safety if further advancements in AI capabilities do not significantly degrade the monitorability of LLM serial reasoning.
* As such, we studied the effects of...