[ Question ]

How is reinforcement learning possible in non-sentient agents?

by SomeoneKind | 1 min read | 5th Jan 2021 | 3 comments

Reinforcement Learning, Suffering, AI

(Probably a stupid nooby question that won't help solve alignment)

Suppose you implement a goal in an AI through a reinforcement learning system. Why does the AI really "care" about this goal? Why does it obey? It does because it is punished and/or rewarded, which motivates it to achieve that goal.

Okay. So why does the AI really care about punishment and reward in the first place? Why does it follow its implemented goal?

Sentient beings do because they feel pain and pleasure. They have no choice but to care about punishment and reward. They inevitably do it because they feel it. Assuming that our AI does not feel, what is the nature of its system of punishments and rewards? How is it possible to punish or reward a non-sentient agent? 

My intuitive response would be "It is just physics. What we call 'reward' and 'punishment' are just elements of a program forcing an agent to do something", but I don't understand how this RL physics is different from the physics in our carbon-based animal brains.
"Do Artificial Reinforcement Learners Matter Morally?", written by Brian Tomasik, makes the distinction even less obvious to me. What am I missing?

2 Answers

An oversimplified picture of a reinforcement-learning agent (in particular, roughly a Q-learning agent with a single state) could be as follows. A program has two numerical variables: go_left and go_right. The agent chooses to go left or right based on which of these variables is larger. Suppose that go_left is 3 and go_right is 1. The agent goes left. The environment delivers a "reward" of -4. Now go_left gets updated to 3 - 4 = -1 (which is not quite the right math for Q-learning, but ok). So now go_right > go_left, and the agent goes right.
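
A minimal sketch of that toy example in Python, in case it helps to see it run. The function names `choose_action` and `toy_update` are made up for illustration, and the update simply adds the reward to the stored value, which (as noted above) is the simplified rule rather than real Q-learning.

```python
def choose_action(values):
    """Greedy choice: return the action whose stored value is largest."""
    return max(values, key=values.get)


def toy_update(values, action, reward):
    """Simplified rule from the example: add the reward to the chosen action's value.

    This is NOT the real Q-learning update; it only mirrors the 3 - 4 = -1 arithmetic.
    """
    values[action] += reward


values = {"go_left": 3.0, "go_right": 1.0}

first = choose_action(values)      # go_left, because 3 > 1
toy_update(values, first, -4.0)    # go_left becomes 3 - 4 = -1
second = choose_action(values)     # go_right, because 1 > -1
```

Nothing in the program "feels" the -4. The number only changes which variable is larger the next time the comparison runs.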

So what you said is exactly correct: "It is just physics. What we call 'reward' and 'punishment' are just elements of a program forcing an agent to do something". And I think our animal brains do the same thing: they receive rewards that update our inclinations to take various actions. However, animal brains have lots of additional machinery that simple RL agents lack. The actions we take are influenced by a number of cognitive processes, not just the basic RL machinery. For example, if we were just following RL mechanically, we might keep eating candy for a long time without stopping, but our brains are also capable of influencing our behavior via intellectual considerations like "Too much candy is bad for my health". It's possible these intellectual thoughts lead to their own "rewards" and "punishments" that get applied to our decisions, but at least it's clear that animal brains make choices in very complicated ways compared with barebones RL programs.

You wrote: "Sentient beings do because they feel pain and pleasure. They have no choice but to care about punishment and reward." The way I imagine it (which could be wrong) is that animals are built with RL machinery (along with many other cognitive mechanisms) and are mechanically driven to care about their rewards in a similar way as a computer program does. They also have cognitive processes for interpreting what's happening to them, and this interpretive machinery labels some incoming sensations as "good" and some as "bad". If we ask ourselves why we care about not staying outside in freezing temperatures without a coat, we say "I care because being cold feels bad". That's a folk-psychology way to say "My RL machinery cares because being outside in the cold sends rewards of -5 at each time step, and taking the action of going inside changes the rewards to +1. And I have other cognitive machinery that can interpret these -5 and +1 signals as pain and pleasure and understand that they drive my behavior."
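
The cold-weather example can be sketched the same way. This is only an illustration under assumptions that are not in the text above: a single state, two hypothetical actions named `stay_outside` and `go_inside`, a learning rate `ALPHA` of 0.5, and a bandit-style update that nudges each stored value toward the reward just received.

```python
ALPHA = 0.5  # learning rate; an assumed value, chosen only for illustration

# Rewards taken from the example: -5 per step outside, +1 after going inside.
REWARDS = {"stay_outside": -5.0, "go_inside": 1.0}


def run(steps=5):
    """Run a greedy learner for a few steps and return its values and choices."""
    q = {"stay_outside": 0.0, "go_inside": 0.0}
    history = []
    for _ in range(steps):
        # Greedy choice; on a tie, max() returns the first key, i.e. "stay_outside".
        action = max(q, key=q.get)
        reward = REWARDS[action]
        # Move the stored value part of the way toward the observed reward.
        q[action] += ALPHA * (reward - q[action])
        history.append(action)
    return q, history


q, history = run()
```

After one cold step the value of staying outside drops below the value of going inside, and the learner goes inside on every later step. The "caring" is just that comparison.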

Assuming this account is correct, the main distinction between simple programs and ourselves is one of complexity -- how much additional cognitive machinery there is to influence decisions and interpret what's going on. That's the reason I argue that simple RL agents have a tiny bit of moral weight. The difference between them and us is one of degree.

Seems to me that there must be more to pain and pleasure than mere -1 and +1 signals, because there are multiple ways to make some behavior more or less likely. Pain and pleasure are one such option, habits are another, unconscious biases yet another. Each of them makes some behavior more likely and some other behavior less likely, but they feel quite different from the inside. Compared to habits and unconscious biases, pain and pleasure have some extra quality because of how they are implemented in our bodies.

The simple RL agents, unless they have the specific circuits to feel pain and pleasure, are in my opinion more analogous to habits or unconscious biases.

You don't need a brain for evolution to work on you. As long as there is a selective pressure, reproduction with mutation, and death linked to fitness, you will have improvements towards whatever is fittest for the environment.
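
A rough sketch of those three ingredients, with no brain anywhere in the loop. Everything specific here is an assumption made up for illustration: each organism is a single number, fitness is closeness to a hypothetical `target` of 10, the less fit half dies each generation, and each survivor leaves one slightly mutated child.

```python
import random


def fitness(genome, target=10.0):
    """Selective pressure: genomes closer to the (assumed) target are fitter."""
    return -abs(genome - target)


def evolve(generations=50, pop_size=20, seed=0):
    """Return the best fitness in the first and in the last generation."""
    rng = random.Random(seed)
    population = [rng.uniform(-5.0, 5.0) for _ in range(pop_size)]
    initial_best = max(fitness(g) for g in population)

    for _ in range(generations):
        # Death linked to fitness: only the fitter half survives.
        population.sort(key=fitness, reverse=True)
        survivors = population[: pop_size // 2]
        # Reproduction with mutation: each survivor leaves one perturbed child.
        children = [g + rng.gauss(0.0, 0.5) for g in survivors]
        population = survivors + children

    final_best = max(fitness(g) for g in population)
    return initial_best, final_best


initial_best, final_best = evolve()
```

Because survivors are carried over unchanged, the best fitness in the population can never get worse from one generation to the next, even though no individual wants or feels anything.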

The majority of living things on earth don't care about anything in a manner we can empathise with. They either don't have brains at all, or their brains and senses are so different from ours that our chance of understanding their subjective experience and motivations is nil.

The above being said, within the animal kingdom there are several branches that clearly experience emotion as we understand it. We feel, but so do organisms as different as birds and octopodes. Clearly this is a useful adaptation (at least within organisms big enough for us to see). Perhaps it will be useful for AI too?