The Problem With the Word ‘Alignment’
This post was written by Peli Grietzer, inspired by internal writings by TJ (tushita jha), for AOI[1]. The original post, published on Feb 5, 2024, can be found here: https://ai.objectives.institute/blog/the-problem-with-alignment.

The purpose of our work at the AI Objectives Institute (AOI) is to direct the impact of AI towards human autonomy and human flourishing. In the course of articulating our mission and positioning ourselves -- a young organization -- in the landscape of AI risk orgs, we’ve come to notice what we think are serious conceptual problems with the prevalent vocabulary of ‘AI alignment.’ This essay will discuss some of the major ways in which we think the concept of ‘alignment’ creates bias and confusion, as well as our own search for clarifying concepts.

At AOI, we try to think about AI within the context of humanity’s contemporary institutional structures: How do contemporary market and non-market (e.g. bureaucratic, political, ideological, reputational) forces shape AI R&D and deployment, and how will the rise of AI-empowered corporate, state, and NGO actors reshape those forces? We increasingly feel that ‘alignment’ talk tends to obscure or distort these questions.

The trouble, we believe, is the idea that there is a single so-called Alignment Problem. Talk about an ‘Alignment Problem’ tends to conflate a family of related but distinct technical and social problems, including:

P1: Avoiding takeover from emergent optimization in AI agents
P2: Ensuring that AI’s information processing (and/or reasoning) is intelligible to us
P3: Ensuring AIs are good at solving problems as specified (by user or designer)
P4: Ensuring AI systems enhance, and don’t erode, human agency
P5: Ensuring that advanced AI agents learn a human utility function
P6: Ensuring that AI systems lead to desirable systemic and long-term outcomes

Each of P1-P6 is known as ‘the Alignment Problem’ (or as the core research problem in ‘Alignment Research’) to at least some part of the AI safety community.