Julian H

Exploring Reinforcement Learning Effects on Chain-of-Thought Legibility

This project was conducted as part of the SPAR Fall 2025 cohort.

TL;DR
* Chain-of-thought (CoT) monitoring may serve as a core pillar for AI safety if further advancements in AI capabilities do not significantly degrade the monitorability of LLM serial reasoning.
* As such, we studied the effects of...

Jan 6 · 41

Introducing the XLab AI Security Guide

This work was supported by UChicago XLab. Today, we are announcing our first major release of the XLab AI Security Guide: a set of online resources and coding exercises covering canonical papers on jailbreaks, fine-tuning attacks, and proposed methods to defend AI systems from misuse. Each page on the course...

Dec 27, 2025 · 19

LESSWRONG