Load-Bearing Obfuscation and Self-Jailbreaking CoT
In these research notes, I share some chain-of-thought (CoT) traces from lightly fine-tuned checkpoints of Kimi K2.5 that may be examples of load-bearing obfuscation in the model's internal reasoning. These are notes and collected model outputs from a project that I have been working on haphazardly...