GPT-3 is similar enough to "prosaic" AGI that we can work on key alignment problems without relying on conjecture or speculative analogies. And because GPT-3 is already being deployed in the OpenAI API, its misalignment matters to OpenAI’s bottom line — it would be much better if we had an API that was trying to help the user instead of trying to predict the next word of text from the internet.
I think this puts our team in a great place to have an impact:
- If our research succeeds I think it will directly reduce existential risk from AI. This is not meant to be a warm-up problem, I think it’s the real thing.
- We are working with state of the art systems that could pose an existential risk if scaled up, and our team’s success actually matters to the people deploying those systems.
- We are working on the whole pipeline from “interesting idea” to “production-ready system,” building critical skills and getting empirical feedback on whether our ideas actually work.
We have the real-world problems to motivate alignment research, the financial support to hire more people, and a research vision to execute on. We are bottlenecked by excellent researchers and engineers who are excited to work on alignment.
What the team does
In the past Reflection focused on fine-tuning GPT-3 using a reward function learned from human feedback. Our most recent results are here, and had the unusual virtue of simultaneously being exciting enough to ML researchers to be accepted at NeurIPS while being described by Eliezer as “directly, straight-up relevant to real alignment problems.”
We’re currently working on three things:
- [20%] Applying basic alignment approaches to the API, aiming to close the gap between theory and practice.
- [60%] Extending existing approaches to tasks that are too hard for humans to evaluate; in particular, we are training models that summarize more text than human trainers have time to read. Our approach is to use weaker ML systems operating over shorter contexts to help oversee stronger ones over longer contexts. This is conceptually straightforward but still poses significant engineering and ML challenges.
- [20%] Conceptual research on domains that no one knows how to oversee and empirical work on debates between humans (see our 2019 writeup). I think the biggest open problem is figuring out how and if human overseers can leverage “knowledge” the model acquired during training (see an example here).
If successful, ideas will eventually move up this list, from the conceptual stage to ML prototypes to real deployments. We’re viewing this as practice for integrating alignment into transformative AI deployed by OpenAI or another organization.
What you’d do
Most people on the team do a subset of these core tasks:
- Design+build+maintain code for experimenting with novel training strategies for large language models. This infrastructure needs to support a diversity of experimental changes that are hard to anticipate in advance, work as a solid base to build on for 6-12 months, and handle the complexity of working with large language models. Most of our code is maintained by 1-3 people and consumed by 2-4 people (all on the team).
- Oversee ML training. Evaluate how well models are learning, figure out why they are learning badly, and identify+prioritize+implement changes to make them learn better. Tune hyperparameters and manage computing resources. Process datasets for machine consumption; understand datasets and how they affect the model’s behavior.
- Design and conduct experiments to answer questions about our models or our training strategies.
- Design+build+maintain code for delegating work to ~70 people who provide input to training. We automate workflows like sampling text from books, getting multiple workers’ answers to questions about that text, running a language model on those answers, then showing the results to someone else for evaluation. It also involves monitoring worker throughput and quality, automating decisions about what tasks to delegate to whom, and making it easy to add new work or change what people are working on.
- Participate in high-level discussion about what the team should be working on, and help brainstorm and prioritize projects and approaches. Complicated projects seem to go more smoothly if everyone understands why they are doing what they are doing, is on the lookout for things that might slip through the cracks, is thinking about the big picture and helping prioritize, and cares about the success of the whole project.