Disclaimer: I am writing this post after a fair amount of reading into alignment, but I acknowledge that my priors here are not well established. I have conducted limited experiments in ML research (largely focusing on the utility and interpretability of language and vision transformers, including occlusion sensitivity), and have since broadened my reading into more social and humanistic writing on the subject of AI development.

I am somewhat surprised that this particular document has never surfaced on the Forum, as far as I can tell from the search function, since it on some level echoes many concerns I have seen in the space (that AI development is unconstrained, suffers from the substitution of metrics for human goals, i.e. Goodhart's law, and is being pursued without regard to ethics or safety concerns, etc.). In particular, I believe this writing addresses three areas of concern I have found when reviewing some views within the AI alignment community:

  1. AI development is presumed to be an inevitable "fact of society". The development and popularisation of AI and related technologies is presumed to follow exponential growth (see: "Explosive scientific and technological advancement"). This is part of a broader effort to quantify, measure, and estimate growth rates for science and technology in general, and it ignores the human component of scientific development, namely the socioeconomic and political factors that determine whether the resources necessary for such development are allocated at all. As a counterpoint to these views, Microsoft has poured a billion dollars into OpenAI alone; AI development is not taking off simply because of an inevitable trend, but because of deliberate decisions to fund it.
  2. As a result of the perceived unchangeable and inevitable nature of AI development, alignment solutions fall into what I perceive to be two broad categories: anti-development solutions and technological solutions. Anti-development solutions focus on, for example, stopping others from developing AGI by obtaining some kind of global moratorium, usually by offering some small group highly destructive strategic power. More facetiously, they fall into the "bomb the silicon mines" camp, in which the development of computing technology is hampered in some way. These solutions are (rightly, I think) perceived as difficult to implement and, where they amount to holding the world hostage, somewhat unethical. Research has therefore focused on producing technological solutions (e.g. Reinforcement Learning from Human Feedback or Iterated Amplification) that are palatable to the current paradigm of AI development. This means they pursue autonomous intelligent agents that are better aligned with human aims than before, usually through some kind of human feedback mechanism. Notably, this does not solve the problem of bad human actors giving AI poor goals in the first place (e.g. using AI systems to develop killer robots or better bioweapons), and, as the article explains, it remains chained to the idea of competing with and replacing humans through intellectual domination rather than assisting them.
  3. As a final potential blind spot, AI alignment's relative lack of popularity has led to what I would argue is an unspoken assumption that we must (or, indeed, are ethically obliged to) act alone. Policy, social, and economic factors are not treated as valid avenues of action, except insofar as they relate to, for example, funding for technological solutions (see point 2), or are treated as obstacles to be overcome in the search for a solution. I am aware that this may be due to the high degree of specialisation in AI and computer science within the alignment community. However, it may also help explain why we become inclined towards pessimism, or towards the sense that we are "going it alone".

Overall, even if you do not agree with these points, I would highly encourage at least skimming the report I have linked.
