Can We Trust the Judge? A Novel Method of Modelling Human Bias and Systematic Error in Debate-Based Scalable Oversight
Author Note: This essay was developed during a recent job application (I didn’t get the job). In hindsight, I suspect I missed the mark in how I framed its relevance to alignment, so I’m posting it here both to share the idea and to invite critical feedback on...
Jul 19, 2025