Self-Attribution Bias: When AI Monitors Go Easy on Themselves
Paper

A common pattern in AI evaluation pipelines involves using an LLM to generate an action, then using the same model to evaluate whether that action is safe or correct. This appears in coding agents that review their own pull requests, tool-using assistants that assess whether their proposed actions are...
Mar 643