Purpose and Introduction
Much AI Safety research has focused on ensuring that AIs conform to the intentions humans implicitly give them, but this approach alone may be insufficient (Ji et al., 2024, Section 4.3). Some recent research has explicitly used human values and preferences as benchmarks for AI moral reasoning (e.g., Hendrycks et al., 2020; Jin et al., 2022) and concluded that the study of human moral psychology could be instrumental in improving AI moral reasoning and behavior. While identifying human values and evaluating AIs against them is important, it may not be the only way to apply human psychology to AI moral alignment. I think it is important to evaluate...