Alignment Research @ EleutherAI
The past and future of AI alignment at Eleuther

Initially, EleutherAI focused mainly on supporting open source research. AI alignment was acknowledged by many of the core members as important, but it was not the primary focus. We mainly discussed the topic in the #alignment channel and other parts of our Discord while we worked on other projects.

As EAI grew, AI alignment started to be taken more seriously, especially by its core members. What started off as a single channel turned into a whole host of channels about different facets of alignment. We also hosted several reading groups related to alignment, such as a modified version of Richard Ngo's curriculum and an interpretability reading group. Eventually, alignment became the central focus for a large segment of EAI's leadership, so much so that all of our previous founders left to do full-time alignment research at Conjecture and OpenAI.

Today, the current leadership believes that making progress in AI alignment is very important. The organization as a whole is involved in a mix of alignment research, interpretability work, and other projects that we find interesting. Moving forward, EAI remains committed to facilitating and enabling open source research, and plans to ramp up its alignment and interpretability research efforts. We want to increase our understanding and control of modern ML systems and minimize the existential risks posed by artificial intelligence.

Our meta-level approach to alignment

It is our impression that AI alignment is still a very pre-paradigmatic field. Progress in the field often matches the research pattern we see in the ELK report: high-level strategies are proposed, and problems or counterexamples are found. Sometimes these issues can be fixed, but oftentimes fundamental issues are identified that make an initially promising approach less interesting. A consequence of this is that it's difficult to commit to an object-level strategy to make