AI models inherently alter "human values," so alignment-based AI safety approaches must better account for value drift
Hi all. This post outlines my concerns about the effectiveness of alignment-based solutions to AI safety problems. My basic worry is that any rearticulation of human values, whether by a human or an AI, necessarily reshapes those values. If that's right, alignment alone is not sufficient for AI safety, since...