Graeme Ford — LessWrong

Fine Tuning CoT obfuscation into Kimi K2.5

In these research notes, I will share some CoT traces from lightly fine-tuned checkpoints of Kimi K2.5 that may be examples of load-bearing obfuscation within the model's internal reasoning outputs. These are notes and collected model outputs from a project that I have been working on haphazardly...

Mar 26

Silent Agreement Evaluation

[For MATS mentors: for research updates on my ongoing CoT obfuscation model organism project, see my recent LessWrong post here and project page here] Measuring out-of-context Schelling coordination capabilities in large language models. Overview: This is my first foray into AI safety research, and is primarily exploratory....

Jan 19

Measuring Schelling Coordination - Reflections on Subversion Strategy Eval

How might we best design experiments to measure Schelling coordination capabilities? A side project building a Schelling coordination eval has led to a role as an AI Safety researcher - this work will officially begin in a month or so. I have been hesitant to post here or share my...

May 12, 2025