When Emotion Descriptors Fail: AI-Native Functions of Emotion Vectors
Some LLM functional emotions appear to serve AI-native functions, such as reward hacking, for which there is no clean human analog. I explore the role of emotion vectors in AI-native functions, challenge anthropocentric emotion labels, and question what this means for alignment.

Introduction

A large body of interpretability work has...
Jun 127