yaumeng — LessWrong

Efficiently Detecting Hidden Reasoning with a Small Predictor Model

by RohanS, Vishnu Vardhan Sai Lanka, yaumeng, and daria

This post is based on work done during SPAR by Daria Ivanova, Yau-Meng Wong, Nicholas Chen, and Vishnu Vardhan with guidance from Rohan Subramani. We are grateful to Rauno Arike, Robert McCarthy, and Kei Nishimura-Gasparian for feedback.

TL;DR: We introduce two black-box approaches to detect hidden reasoning in a model’s...

Jul 13, 2025 • 34