Introduction
In 2020, around the time I graduated with a master’s degree in computer science, I had a conversation with Steve Omohundro in which we discussed his Basic AI Drives paper (among other things). At that time, there were concrete demonstrations of how AI alignment could go wrong, but they were mostly limited to reinforcement learning. Since then, I have worked on projects in various areas of machine learning, mainly computer vision and natural language processing. During that time, I didn’t really keep up to date with the latest AI safety developments.
However, recent developments in the field of AI safety have shown me why AI safety (and in particular, AI alignment) is a concrete...