So, human values are fragile, vague, and possibly not even a well-defined concept, yet figuring them out seems essential for an aligned AI. It seems reasonable that, faced with a hard problem, one would start instead with a simpler one that has some connection to the original. For someone not working in ML or AI alignment, it seems obvious that researching simpler-than-human values might be a way to make progress. But maybe this is one of those false obvious ideas that non-experts tend to push after only a cursory look at a complex research topic.

That said, assuming that value complexity scales with intelligence, studying less intelligent agents and their version of values may be something to pursue. Dolphin values. Monkey values. Dog values. Cat values. Fish values. Amoeba values. Sure, we lose the inside view in this case, but the trade-off seems at least worth exploring. Is there any research going on in that area?

Yes. See:

Mammalian Value Systems

Gopal P. Sarma, Nick J. Hay (Submitted on 28 Jul 2016 (v1), last revised 21 Jan 2019 (this version, v4))

Characterizing human values is a topic deeply interwoven with the sciences, humanities, art, and many other human endeavors. In recent years, a number of thinkers have argued that accelerating trends in computer science, cognitive science, and related disciplines foreshadow the creation of intelligent machines which meet and ultimately surpass the cognitive abilities of human beings, thereby entangling an understanding of human values with future technological development. Contemporary research accomplishments suggest sophisticated AI systems becoming widespread and responsible for managing many aspects of the modern world, from preemptively planning users' travel schedules and logistics, to fully autonomous vehicles, to domestic robots assisting in daily living. The extrapolation of these trends has been most forcefully described in the context of a hypothetical "intelligence explosion," in which the capabilities of an intelligent software agent would rapidly increase due to the presence of feedback loops unavailable to biological organisms. The possibility of superintelligent agents, or simply the widespread deployment of sophisticated, autonomous AI systems, highlights an important theoretical problem: the need to separate the cognitive and rational capacities of an agent from the fundamental goal structure, or value system, which constrains and guides the agent's actions. The "value alignment problem" is to specify a goal structure for autonomous agents compatible with human values. In this brief article, we suggest that recent ideas from affective neuroscience and related disciplines aimed at characterizing neurological and behavioral universals in the mammalian class provide important conceptual foundations relevant to describing human values. We argue that the notion of "mammalian value systems" points to a potential avenue for fundamental research in AI safety and AI ethics.

Thanks, that's interesting! They don't do a lot with the question, but at least they ask it.

avturchin:
There are a couple of follow-up articles by the authors, which can be found by searching for the title of this article on Google Scholar and looking at its citations.

Consider the trilobites. If there had been a trilobite-Friendly AI using CEV, invincible articulated shells would comb carpets of wet muck with the highest nutrient density possible within the laws of physics, across worlds orbiting every star in the sky. If there had been a trilobite-engineered AI going by 100% satisfaction of all historical trilobites, then trilobites would live long, healthy lives in a safe environment of adequate size, and the Cambrian explosion (or something like it) would have proceeded without them.

https://www.lesswrong.com/posts/cmrtpfG7hGEL9Zh9f/the-scourge-of-perverse-mindedness?commentId=jo7q3GqYFzhPWhaRA

9 comments

My background: I've spent a lot more time with animals than with rational humans.

To point me in the right direction before trying to write more: I would like a definition of "values" to work with.

An instant find via an image search for "human values". Anywhere near?!

I think an interesting line of research would be to start with simpler animals and move towards more complex ones, looking for evidence of valence-based values. Since we're looking for neurological feedback mechanisms, and for signs that control signals in those systems are what create valence and interact to ultimately create values, starting with brains that have fewer neurons seems like a smart approach.

Right, it makes sense to me, just surprised it's not being actively pursued.

If I could get my hands on some money, I'd very much like to see it channeled into neuroscience research to better scan brains so we can identify the mechanisms at work. I'm not sure how much differential impact this would have, though: my outsider impression is that this kind of work is already happening, and at best I could only slightly nudge it towards directions I think are useful. It would largely be bottlenecked by the need for more fundamental, broadly useful work in neuroscience to be done first.

The other possibility would be to retrain myself as a neuroscientist so I could do this work myself, but I think that works against my comparative advantages at this point in my life.

Losing the inside view is a fundamental change in the question, and is a different definition of value than we usually use.

Indeed. And some exploration of this change in view and figuring out some invariant definition of value might be a good place to start.

Given that animals don't act like expected utility maximizers, what do you mean when you talk about their values? For humans, you can ground a definition of "true values" in philosophical reflection (and reflection about how that reflection relates to their true values, and so on), but non-human animals can't do philosophy.

Given that animals don't act like expected utility maximizers

Why do you think so? Higher animals behave very much like humans.

non-human animals can't do philosophy

I would not consider it a disadvantage, or to mean that they don't have a version of values.

Depending on how you define EU maximisation, everything is doing it, nothing is doing it, or anything in between.
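
To illustrate why the most permissive definition is vacuous, here is a minimal sketch (my own toy example, not from the thread): any finite history of observed choices can be "rationalized" as expected-utility maximization by a trivially constructed utility function that assigns 1 to whatever was chosen and 0 to everything else. The option names are purely illustrative.

```python
# Toy illustration: under the loosest definition, any observed behavior
# maximizes *some* utility function, so the claim "animals aren't EU
# maximizers" only has teeth with a more restrictive definition.

def trivial_utility(choice_history):
    """Return a utility function assigning 1 to every chosen option
    and 0 to everything else, so the observed behavior maximizes it."""
    chosen = set(choice_history)
    return lambda option: 1.0 if option in chosen else 0.0

# A hypothetical "agent" and the options it picked at each decision point.
history = ["sleep", "eat", "sleep", "chase_ball"]
u = trivial_utility(history)

# Every chosen option scores at least as high as every alternative.
alternatives = ["sleep", "eat", "chase_ball", "work"]
assert all(u(c) >= u(a) for c in history for a in alternatives)
print("Observed choices maximize the constructed utility function.")
```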