Automated Sandwiching & Quantifying Human-LLM Cooperation: ScaleOversight hackathon results
We ran a hackathon on scalable oversight with Gabriel Recchia as keynote speaker (watch the talk) and Ruiqi Zhong as co-judge. Here, we share the top projects and results. In summary:
* We can automate the “sandwiching” paradigm from Cotra [1] by having a smaller model ask structured questions to...
Feb 23, 2023