Julie Steele — LessWrong

Prompted CoT Early Exit Undermines the Monitoring Benefits of CoT Uncontrollability

Julie Steele, 2mo

Another question: why did you put the CoT replacement instruction in the user message but the CoT skipping instruction in the system prompt?

Prompted CoT Early Exit Undermines the Monitoring Benefits of CoT Uncontrollability

Julie Steele, 2mo

Hey, really cool post! These accuracy numbers aren't conditional on controllability task success, right? What does the accuracy look like when you condition on controllability success?