Researcher at Ought
Thanks for this post, Paul!
NOTE: Response to this post has been even greater than we expected. We received more applications for experiment participants than we currently have the capacity to manage, so we are temporarily taking the posting down. If you've applied and don't hear from us for a while, please excuse the delay! Thanks to everyone who has expressed interest - we're hoping to get back to you and work with you soon.
It's correct that, so far, Ought has been running small-scale experiments with people who know the research background. (What is amplification? How does it work? What problem is it intended to solve?)
Over time, we also think it's necessary to run larger-scale experiments. We plan to start by running more (and longer) experiments with contractors rather than volunteers, probably over the next month or two. Longer-term, it's plausible that we'll build a platform similar to what this post describes. (See here for related thoughts.)
The reason we've focused on small-scale experiments with a select audience is that it's easy to do busywork that doesn't tell you anything about the question of interest. The purpose of our experiments so far has been to get high-quality feedback on the setup, not to gather object-level data. As a consequence, the experiments have been changing a lot from week to week. The biggest recent change is the switch from task decomposition (analogous to amplification with imitation learning as the distillation step) to decomposition of evaluation (analogous to amplification with RL as the distillation step). Given how much the setup has changed, I think that if we had stopped at any point so far and focused on scaling up instead of refining it, it would have been a mistake.
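To make the task-decomposition vs. evaluation-decomposition distinction concrete, here is a minimal hypothetical sketch (the function names, the two-subquestion split, and the scoring aspects are all illustrative assumptions, not Ought's actual setup). Task decomposition breaks a question into subquestions and composes their answers, which pairs naturally with imitation learning (distill the composed answers). Decomposition of evaluation instead breaks the scoring of a candidate answer into sub-evaluations, which pairs naturally with RL (distill the scores into a reward signal).

```python
# Hypothetical sketch of the two decomposition styles (illustrative only).

def decompose_task(question, sub_answer):
    """Task decomposition: answer a question by combining sub-answers.

    `sub_answer` maps a subquestion to its answer; it stands in for
    whatever workers or a distilled model would do in practice."""
    subquestions = ["sub1 of " + question, "sub2 of " + question]
    sub_answers = [sub_answer(q) for q in subquestions]
    return " and ".join(sub_answers)  # compose sub-answers into one answer

def decompose_evaluation(question, answer, sub_score):
    """Decomposition of evaluation: score a candidate answer by combining
    sub-scores.

    `sub_score` judges one aspect of the answer on a 0.0-1.0 scale; the
    combined score could serve as a reward signal for RL."""
    aspects = ["relevance", "correctness"]
    return sum(sub_score(question, answer, a) for a in aspects) / len(aspects)

# Toy usage with trivial stand-in judges:
answer = decompose_task("Q", lambda q: "ans(" + q + ")")
score = decompose_evaluation("Q", answer, lambda q, a, asp: 1.0)
```

The structural point is that the first function recursively delegates *answering*, while the second recursively delegates *judging*; the object being decomposed differs even though the tree shape is similar.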
The log is taken from this tree. There isn't much more to see than what's visible in the screenshot. Building out more complete versions of meta-reasoning trees like this is on our roadmap.
What I'd do differently now: