AI models inherently alter "human values," so alignment-based AI safety approaches must better account for value drift
Hi all. This post outlines my concerns about the effectiveness of alignment-based solutions to AI safety problems. My basic worry is that any rearticulation of human values, whether by a human or an AI, necessarily reshapes those values. If that's right, alignment alone is not sufficient for AI safety, since...