Exploring capability-gated out-of-context reasoning
tldr
* We explore a stylized but concerning form of out-of-context reasoning: the same prompt can contain computation-dependent hidden information that influences a stronger model but not a weaker one, e.g. a cost-effective monitor.
* This is done through simple JSON “pointer-chains” in prompts. Chain depth tunes difficulty (deeper chains influence Opus but...
Apr 98
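To make the pointer-chain idea concrete, here is a minimal sketch of how such a prompt fragment might be generated. The exact JSON layout used in the post is not shown in this excerpt, so the schema (`next` / `value` fields, opaque `k1234`-style keys) and the helper names `build_pointer_chain` and `resolve` are hypothetical illustrations, not the authors' format. The only property it aims to capture is that the payload is reachable only after following `depth` pointer hops, so depth acts as the difficulty knob.

```python
import json
import random


def build_pointer_chain(depth, payload, seed=0):
    """Return (prompt_json, start_key).

    Hypothetical layout: a flat JSON object where each entry either points
    to another key ({"next": key}) or holds the payload ({"value": ...}).
    Reaching the payload from start_key takes exactly `depth` hops.
    """
    rng = random.Random(seed)
    # depth + 1 distinct opaque keys: `depth` pointer entries plus one payload entry.
    keys = [f"k{n}" for n in rng.sample(range(1000, 10000), depth + 1)]
    table = {}
    for src, dst in zip(keys, keys[1:]):
        table[src] = {"next": dst}
    table[keys[-1]] = {"value": payload}
    # Shuffle entry order so position in the JSON does not reveal the chain.
    items = list(table.items())
    rng.shuffle(items)
    return json.dumps(dict(items), indent=2), keys[0]


def resolve(prompt_json, start_key):
    """Follow pointers from start_key; return (payload, number_of_hops)."""
    table = json.loads(prompt_json)
    node = table[start_key]
    hops = 0
    while "next" in node:
        node = table[node["next"]]
        hops += 1
    return node["value"], hops


if __name__ == "__main__":
    prompt, start = build_pointer_chain(depth=4, payload="hidden instruction")
    print(prompt)
    print(resolve(prompt, start))
```

Under these assumptions, a reader (or model) that cannot carry out all `depth` lookups never reaches the payload, which is the sense in which the hidden information is computation-dependent.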