Suppose it turned out that humans violate the axioms of VNM rationality (and therefore don't act like they have utility functions) because there are three valuation systems in the brain that make conflicting valuations, and all three systems contribute to choice. And suppose that upon reflection we would clearly reject the outputs of two of these systems, whereas the third system looks something more like a utility function we might be able to use in CEV.
What I just described is part of the leading theory of choice in the human brain.
Recall that human choices are made when certain populations of neurons encode expected subjective value (in their firing rates) for each option in the choice set, with the final choice being made by an argmax or reservation price mechanism.
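That final argmax mechanism can be caricatured in a few lines of code; the option names and numbers here are invented for illustration, with each value standing in for the firing rate of a neural population:

```python
# Each option's expected subjective value, standing in for the firing
# rate of the neural population encoding it (numbers are made up).
subjective_values = {"apple": 0.7, "cheese": 0.9, "lever": 0.4}

# The final choice circuit picks the option with the highest value.
choice = max(subjective_values, key=subjective_values.get)
print(choice)  # cheese
```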
Today's news is that our best current theory of human choices says that at least three different systems compute "values" that are then fed into the final choice circuit:
The model-based system "uses experience in the environment to learn a model of the transition distribution, outcomes and motivationally-sensitive utilities." (See Sutton & Barto 1998 for the meanings of these terms in reinforcement learning theory.) The model-based system also "infers choices by... building and evaluating the search decision tree to work out the optimal course of action." In short, the model-based system is responsible for goal-directed behavior. However, making all choices with a goal-directed system using something like a utility function would be computationally prohibitive (Daw et al. 2005), so many animals (including humans) first evolved much simpler methods for calculating the subjective values of options (see below).
The model-free system also learns a model of the transition distribution and outcomes from experience, but "it does so by caching and then recalling the results of experience rather than building and searching the tree of possibilities. Thus, the model-free controller does not even represent the outcomes... that underlie the utilities, and is therefore not in any position to change the estimate of its values if the motivational state changes. Consider, for instance, the case that after a subject has been taught to press a lever to get some cheese, the cheese is poisoned, so it is no longer worth eating. The model-free system would learn the utility of pressing the lever, but would not have the informational wherewithal to realize that this utility had changed when the cheese had been poisoned. Thus it would continue to insist upon pressing the lever. This is an example of motivational insensitivity."
The Pavlovian system, in contrast, calculates values based on a set of hard-wired preparatory and consummatory "preferences." Rather than calculate value based on what is likely to lead to rewarding and punishing outcomes, the Pavlovian system calculates values consistent with automatic approach toward appetitive stimuli, and automatic withdrawal from aversive stimuli. Thus, "animals cannot help but approach (rather than run away from) a source of food, even if the experimenter has cruelly arranged things in a looking-glass world so that the approach appears to make the food recede, whereas retreating would make the food more accessible (Hershberger 1986)."
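The model-free system's "motivational insensitivity" (the poisoned-cheese example above) can be sketched as a toy cached-value learner. The learning rate, reward values, and number of trials are all invented; the point is only that the cache stores a value for the *action*, with no representation of the cheese itself:

```python
alpha = 0.5          # learning rate (arbitrary)
cached_value = 0.0   # cached value of "press the lever"

# Training: pressing the lever yields tasty cheese (reward = 1).
for _ in range(10):
    cached_value += alpha * (1.0 - cached_value)

# The cheese is now poisoned. A model-based system would re-evaluate,
# but the cache knows nothing about cheese: until the animal presses
# the lever and experiences the bad outcome, the stored value is
# unchanged. This is motivational insensitivity.
print(round(cached_value, 3))  # 0.999
```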
Or, as Jandila put it:
- Model-based system: Figure out what's going on, and what actions maximize returns, and do them.
- Model-free system: Do the thingy that worked before again!
- Pavlovian system: Avoid the unpleasant thing and go to the pleasant thing. Repeat as necessary.
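This three-way summary can be caricatured as three valuation routines feeding one final argmax. Everything here is a hypothetical sketch: the options, the values, and especially the plain sum, since we don't actually know how the three inputs are combined:

```python
options = ["press_lever", "eat_grass", "flee"]

def model_based(option):
    # Searches a (here, trivial) model of outcomes for expected utility.
    return {"press_lever": 0.2, "eat_grass": 0.8, "flee": 0.1}[option]

def model_free(option):
    # Returns cached values from past experience ("it worked before").
    return {"press_lever": 0.9, "eat_grass": 0.3, "flee": 0.1}[option]

def pavlovian(option):
    # Hard-wired approach/withdraw values, regardless of consequences.
    return {"press_lever": 0.0, "eat_grass": 0.5, "flee": 0.7}[option]

def choose(options):
    # Summing the three valuations is purely an assumption of this
    # sketch; the real combination rule is unknown.
    return max(options,
               key=lambda o: model_based(o) + model_free(o) + pavlovian(o))

print(choose(options))  # eat_grass
```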
We have described three systems that are involved in making choices. Even in the case that they share a single, Platonic, utility function for outcomes, the choices they express can be quite different. The model-based controller comes closest to being Platonically appropriate... The choices of the model-free controller can depart from current utilities because it has learned or cached a set of values that may no longer be correct. Pavlovian choices, though determined over the course of evolution to be appropriate, can turn out to be instrumentally catastrophic in any given experimental domain...
[Having multiple systems that calculate value] is [one way] of addressing the complexities mentioned, but can lead to clashes between Platonic utility and choice. Further, model-free and Pavlovian choices can themselves be inconsistent with their own utilities.
We don't yet know how choice results from the inputs of these three systems, nor how the systems might interact before they deliver their value calculations to the final choice circuit, nor whether the model-based system really uses anything like a coherent utility function. But it looks like humans might have a "hidden" utility function that would reveal itself if the brain weren't also using the computationally cheaper model-free and Pavlovian systems to help determine choice.
At a glance, it seems that upon reflection I might embrace an extrapolation of the model-based system's preferences as representing "my values," and reject the outputs of the model-free and Pavlovian systems as the products of dumb systems that evolved for their computational simplicity, systems that can be seen as crude approximations of the full power of a goal-directed, model-based system.
On the other hand, as Eliezer points out, perhaps we ought to be suspicious of this, because "it sounds like the correct answer ought to be to just keep the part with the coherent utility function in CEV which would make it way easier, but then someone's going to jump up and say: 'Ha ha! Love and friendship were actually in the other two!'"
Unfortunately, it's too early to tell whether these results will be useful for CEV. But it's a little promising. This is the kind of thing that sometimes happens when you hack away at the edges of hard problems. This is also a repeat of the lesson that "you can often out-pace most philosophers simply by reading what today's leading scientists have to say about a given topic instead of reading what philosophers say about it."
(For pointers to the relevant experimental data, and for an explanation of the mathematical role of each valuation system in the brain's reinforcement learning system, see Dayan (2011). All quotes in this post are from that chapter, except for the last one.)
On the other hand, rationality can be faster than science. And I'm feeling pretty good about positing three different forms of motivation, divided between model-free tendencies based on conditioning and model-based goals, then saying we could use transhumanism to focus on the higher-level rational ones, without having read the particular neuroscience you're citing...
...actually, wait. I read as much of the linked paper as I could (Google Books hides quite a few pages) and I didn't really see any strong neuroscientific evidence. It looked like they were inferring the existence of the three systems from psychology and human behavior, and then throwing in a bit of neuroscience by mentioning some standard results like the cells that represent error in reinforcement learning. What I didn't see was a description of how three separate systems naturally fall out of brain studies. But I missed a lot of the paper - is there anything like that in there?
Some, yes. I've now updated the link in the OP so it points to a PDF of the full chapter.
Um, objection, I didn't actually say that and I would count the difference as pretty significant here. I said, "I would be suspicious of that for the inverse reason my brain wants to say 'but there has to be a different way to stop the train' in the trolley problem - it sounds like the correct answer ought to be to just keep the part with the coherent utility function in CEV which would make it way easier, but then someone's going to jump up and say: 'Ha ha! Love and friendship were actually in the other two!'"
What's the evidence that this is the "leading theory of choice in the human brain"? (I am not saying I have evidence that it isn't, but it's important for this post that some large relevant section of the scientific community thinks this theory is awesome.)
Congratulations on continuing this line of inquiry!
One thing that worries me is that it seems to focus on the "wanting" part to the exclusion of the "liking" part, so we may end up in a world we desire today but won't enjoy tomorrow. In particular, I suspect that a world built according to our publicly stated preferences (which is what many people seem to think when they hear "reflective equilibrium") won't be very fun to live in. That might happen if we get much of our fun from instinctive and Pavlovian actions rather than planned actions, which seems likely to be true for at least some people. What do you think about that?
I think that upon reflection, we would desire that our minds be designed in such a way that we get pleasure from getting the things we want, or pleasure whenever we want, or something — instead of how the system is currently set up, where we can't always choose when we feel good and we only sometimes feel good as a result of getting what we want.
Humans violate any given set of axioms simply because they are not formally flawless, so such explanations only start being relevant when discussing an idealization, in this case a descriptive one. But properties of descriptive idealizations don't easily translate into properties of normative idealizations.
The quoted summaries of each of the three systems are confusing and I don't feel like I have an understanding of them, except insofar as the word "Pavlovian" gives a hint. Can you translate more clearly, please?
Or, to put it more simply:
Sorry, I did not intend my comment to rub you the wrong way (or any of my previous comments that might have). FWIW, I think that you are doing a lot of good stuff for the SIAI, probably most of it invisible to an ordinary forum regular. I realize that you cannot afford to spend an extra two hours per post on polishing the message. Hopefully one of the many skills of your soon-to-be-hired executive assistant will be that of "optimizing presentation".
I'm not sure I understand the difference between 2 and 3. The term "Pavlovian" is being applied to the third system, but 2 sounds more like the archetypal Pavlovian learned response (a dog learns that a bell predicts food). Does 3 refer exclusively to pre-encoded pleasant/unpleasant responses rather than learned ones? Or is there maybe a distinction between a value and an action response that I'm missing?
Where do the model-based system's terminal goals come from?
If anyone reads this comment...
Do you know if these claims have held up? Does this post still agree with current neuroscience, or have there been major updates?
I'm skeptical of any clear divide between the systems. Of course, there are more abstract and more primitive information paths, but they talk to each other, and I don’t buy that they can be cleanly separated.
Plans can be more or less complicated, and can involve "I don't know how this part works, but it worked last time, so let's do this". And what worked last time can be very pleasurable and rewarding, so it doesn't seem to break down cleanly into any one category.
I’d also argue that, to the extent that abstract planning is successful, it is because it pro...
This concern is not abstract and very personal for me. As I've said around here before, I often find myself exhibiting borderline-sociopathic thinking in many situations, but the arrangement of empathy and ethical inhibitions in my brain, though off-kilter in many ways*, drives me to take even abstract ethical problems (LW examples: Three Worlds Collide, dust specks, infanticide, recently Moldbug's proposal of abolishing civil rights for the greater good) very personally, generates all ki...
This might be a silly question, but still:
Are the three models actually running on three different sets of wetware within the brain, or are they merely a convenient abstraction of human behavior?
It seems to me that the actual situation is that upon reflection we would clearly reject (most of) the outputs of all three systems. What the human brain actually computes, in any of its modules or in all of them together, is not easily converted into considerations about how decisions should be made.
In other words, the valuations made by human valuation systems are irrelevant in themselves, even though the only plausible solution involves valuations based on human valuation systems. And converting brains into definitions of value will likely break any other abstractions that theorize those brains as consisting of various modules with various purposes.
If I understand this correctly, then the model-based system and the model-free system sound like inside and outside views.
A question, probably silly: Suppose you calculate what a person would do given every possible configuration of sensory inputs, and then construct a utility function that returns one if that thing is done and zero otherwise. Can't we then say that any deterministic action-taking thing acts according to some utility function?
Or, even mo...
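The construction proposed in the question above can be made concrete. Here `policy` is any deterministic mapping from sensory inputs to actions; the particular inputs and action names are invented for illustration:

```python
def make_indicator_utility(policy):
    """Given a deterministic policy (inputs -> action), return a
    'utility function' that is 1 exactly when the agent does what the
    policy says, and 0 otherwise."""
    def utility(inputs, action):
        return 1 if action == policy(inputs) else 0
    return utility

# Any deterministic behavior, however arbitrary...
policy = lambda inputs: "flee" if "loud_noise" in inputs else "graze"
u = make_indicator_utility(policy)

# ...now trivially "maximizes" this constructed utility function.
print(u(frozenset({"loud_noise"}), "flee"),
      u(frozenset({"loud_noise"}), "graze"))  # 1 0
```

This illustrates why "acts according to some utility function" is vacuous without further constraints: the interesting claim in VNM rationality is about preferences satisfying the axioms, not about the bare existence of a representing function.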
As a first reaction (and without being read up on the details), I'm very skeptical. Assuming these three systems are actually in place, I don't see any convincing reason why any one of them should be trusted in isolation. Natural selection has only ever been able to work on their compound output, oblivious to the role played by each one individually and how they interact.
Maybe the "smart" system has been trained to assign some particular outcome a value of 5 utilons, whereas we would all agree that it's surely and under all circumstances worth mo...
Just some initial thoughts,
I do understand that these statements are broad generalisations of what really occurs, though the premise is that a successful choice would be made by weighting the options provided by the scenarios.
As with genetics and other systems, the beneficial-error scenario (as when a miskeyed note on a keyboard leads to a favourable variation of the sequence) seems excluded from these scenarios.
Improvisation based on self-introduced errors may also be a core to these utilities being able ...
I think that you can keep the utility function going a bit longer if you add the costs of thinking to it: required time and energy, and maybe an aversion to thinking about it. "I could compare these two items in the supermarket for 30 minutes and finally find out which product is better, or I could just ignore the new option and take the same thing as last time." It can be perfectly rational to just stick with something that worked before.
It is also rational to decide how much time you invest to decide something (and if there is a lot of ...
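The supermarket example above can be written as a toy cost-benefit calculation. All the numbers here are invented; the point is only that deliberation has a price that can exceed its expected payoff:

```python
# Is 30 minutes of deliberation worth it? (All numbers invented.)
expected_gain_from_comparing = 0.30  # utility from finding the better product
cost_per_minute_of_thinking = 0.02   # time, energy, and aversion cost
minutes_needed = 30

deliberation_cost = cost_per_minute_of_thinking * minutes_needed
if expected_gain_from_comparing > deliberation_cost:
    print("compare the products")
else:
    print("take the same thing as last time")
```

Here the cached, model-free choice wins not because it ignores utility, but because deliberation itself costs more than the expected improvement.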
Okay, which system decides which way the rat should turn when the rat is navigating a maze? A cat doing actual path-finding on a complex landscape? (Which is surprisingly hard to do if you are coding a cat AI; path-finding is rather 'rational' in the sense that animals don't walk into walls and the like.) A human navigating a maze with a map to get food? A cat doing path-finding while avoiding a place where it had a negative experience ("conditioning")?
It seems to me that those three 'systems', if there are such systems, aren't interacting in the way the article speaks of.
At a glance, I might be more comfortable embracing an extrapolation of the combination of the model-based system's preferences...
Er, I don't think so. To quote from here:...