[ Question ]

Since figuring out human values is hard, what about, say, monkey values?

by shminux 1 min read1st Jan 202013 comments

36


So, human values are fragile, vague and possibly not even a well defined concept, yet figuring it out seems essential for an aligned AI. It seems reasonable that, faced with a hard problem, one would start instead with a simpler one that has some connection to the original problem. For someone not working in the area of ML or AI alignment, it seems obvious that researching simpler-than-human values might be a way to make progress. But maybe this is one of those false obvious ideas that non-experts tend to push after a cursory learning about a complex research topic.

That said, assuming that the value complexity scales with intelligence, studying less intelligent agents and their version of values maybe something to pursue. Dolphin values. Monkey values. Dog values. Cat values. Fish values. Amoeba values. Sure, we lose the inside view in this case, but the trade-off seems at least being worthy of exploring. Is there any research going in that area?

New Answer
Ask Related Question
New Comment

2 Answers

Yes. See:

Mammalian Value Systems

Gopal P. Sarma, Nick J. Hay(Submitted on 28 Jul 2016 (v1), last revised 21 Jan 2019 (this version, v4))

Characterizing human values is a topic deeply interwoven with the sciences, humanities, art, and many other human endeavors. In recent years, a number of thinkers have argued that accelerating trends in computer science, cognitive science, and related disciplines foreshadow the creation of intelligent machines which meet and ultimately surpass the cognitive abilities of human beings, thereby entangling an understanding of human values with future technological development. Contemporary research accomplishments suggest sophisticated AI systems becoming widespread and responsible for managing many aspects of the modern world, from preemptively planning users' travel schedules and logistics, to fully autonomous vehicles, to domestic robots assisting in daily living. The extrapolation of these trends has been most forcefully described in the context of a hypothetical "intelligence explosion," in which the capabilities of an intelligent software agent would rapidly increase due to the presence of feedback loops unavailable to biological organisms. The possibility of superintelligent agents, or simply the widespread deployment of sophisticated, autonomous AI systems, highlights an important theoretical problem: the need to separate the cognitive and rational capacities of an agent from the fundamental goal structure, or value system, which constrains and guides the agent's actions. The "value alignment problem" is to specify a goal structure for autonomous agents compatible with human values. In this brief article, we suggest that recent ideas from affective neuroscience and related disciplines aimed at characterizing neurological and behavioral universals in the mammalian class provide important conceptual foundations relevant to describing human values. We argue that the notion of "mammalian value systems" points to a potential avenue for fundamental research in AI safety and AI ethics.

Consider the trilobites. If there had been a trilobite-Friendly AI using CEV, invincible articulated shells would comb carpets of wet muck with the highest nutrient density possible within the laws of physics, across worlds orbiting every star in the sky. If there had been a trilobite-engineered AI going by 100% satisfaction of all historical trilobites, then trilobites would live long, healthy lives in a safe environment of adequate size, and the cambrian explosion (or something like it) would have proceeded without them.

https://www.lesswrong.com/posts/cmrtpfG7hGEL9Zh9f/the-scourge-of-perverse-mindedness?commentId=jo7q3GqYFzhPWhaRA