# 28

This continues my previous post on Robin Hanson's pre-rationality, by offering some additional comments on the idea.

The reason I re-read Robin's paper recently was to see if it answers a question that's related to another of my recent posts: why do we human beings have the priors that we do? Part of that question is why are our priors pretty close to each other, even if they're not exactly equal. (Technically we don't have priors because we're not Bayesians, but we can be approximated as Bayesians, and those Bayesians have priors.) If we were created by a rational creator, then we would have pre-rational priors. (Which, since we don't actually have pre-rational priors, seems to be a good argument against us having been created by a rational creator. I wonder what Aumann would say about this?) But we have other grounds for believing that we were instead created by evolution, which is not a rational process, in which case the concept doesn't help to answer the question, as far as I can see. (Robin never claimed that it would, of course.)

The next question I want to consider is a normative one: is pre-rationality rational? Pre-rationality says that we should reason as if we were pre-agents who learned about our prior assignments as information, instead of just taking those priors as given. But then, shouldn't we also act as if we were pre-agents who learned about our utility function assignments as information, instead of taking them as given? In that case, we're led to the conclusion that we should all have common utility functions, or at least that pre-rational agents should have values that are much less idiosyncratic than ours. This seems to be a reductio ad absurdum of pre-rationality, unless there is an argument why we should apply the concept of pre-rationality only to our priors, and not to our utility functions. Or is anyone tempted to bite this bullet and claim that we should apply pre-rationality to our utility functions as well? (Note that if we were created by a rational creator, then we would have common utility functions.)

The last question I want to address is one that I already raised in my previous post. Assuming that we do want to be pre-rational, how do we move from our current non-pre-rational state to a pre-rational one? This is somewhat similar to the question of how do we move from our current non-rational (according to ordinary rationality) state to a rational one. Expected utility theory says that we should act as if we are maximizing expected utility, but it doesn't say what we should do if we find ourselves lacking a prior and a utility function (i.e., if our actual preferences cannot be represented as maximizing expected utility).

The fact that we don't have good answers for these questions perhaps shouldn't be considered fatal to pre-rationality and rationality, but it's troubling that little attention has been paid to them, relative to defining pre-rationality and rationality. (Why are rationality researchers more interested in knowing what rationality is, and less interested in knowing how to be rational? Also, BTW, why are there so few rationality researchers? Why aren't there hordes of people interested in these issues?)

As I mentioned in the previous post, I have an idea here, which is to apply some concepts related to UDT, in particular Nesov's trading across possible worlds idea. As I see it now, pre-rationality is mostly about the (alleged) irrationality of disagreements between counterfactual versions of the same agent, when those disagreements are caused by irrelevant historical accidents such as the random assortment of genes. But how can such agents reach an agreement regarding what their beliefs should be, when they can't communicate with each other and coordinate physically? Well, at least in some cases, they may be able to coordinate logically. In my example of an AI whose prior was picked by the flip of a coin, the two counterfactual versions of the AI are similar enough to each other and symmetrical enough, for each to infer that if it were to change its prior from O or P to Q, where Q(A=heads)=0.5, the other AI would do the same, but this inference wouldn't be true for any Q' != Q, due to lack of symmetry.
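One way to read the symmetry argument is sketched below in Python. It is only an illustration under stated assumptions: the two counterfactual AIs are treated as exact mirror images under swapping heads and tails, the numbers chosen for O and P are hypothetical, and the helper names `mirror` and `agree_after_switch` are invented here. Under those assumptions, if one AI reasons its way to a new prior Q, the other AI, running the mirrored reasoning, ends up at the mirrored prior, so the two agree only when Q is its own mirror image.

```python
# Hypothetical values: O favors heads, P is its mirror image favoring tails.
O_HEADS = 0.9
P_HEADS = 0.1


def mirror(q_heads: float) -> float:
    """Prior the counterfactual twin adopts if it runs the mirrored reasoning."""
    return 1.0 - q_heads


def agree_after_switch(q_heads: float, tol: float = 1e-12) -> bool:
    """True if an AI switching to Q can infer that its twin lands on the same Q."""
    return abs(q_heads - mirror(q_heads)) < tol


# Scan a grid of candidate priors Q(A=heads) and keep the self-mirrored ones.
candidates = [i / 10 for i in range(11)]
agreeing = [q for q in candidates if agree_after_switch(q)]
print(agreeing)
```

On this grid the only candidate that survives is Q(A=heads)=0.5, which matches the claim that the inference fails for any Q' != Q.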

Of course, in the actual UDT, such "changes of prior" do not literally occur, because coordination and cooperation between possible worlds happen naturally as part of deciding acts and strategies, while one's preferences stay constant. Is that sufficient, or do we really need to change our preferences and make them pre-rational? I'm not sure.

Or is anyone tempted to bite this bullet and claim that we should apply pre-rationality to our utility functions as well?

Yup. (Or something like it.)

I just don't see pre-rationality being much tied to whether you in fact had a rational creator. The point is, as you say, to consider the info in the way you were created. I certainly do think one should also consider the info in the preferences we were given, as well as the beliefs, but I just don't see this implying that we should have the common preferences. If you could prove something similar to what I proved about common priors for common utility functions, that would be very interesting.

In your paper, you defined pre-rationality as having the same beliefs as a hypothetical pre-agent who learns about your prior assignment. Let's extend this definition to values. You are extended-pre-rational if each counterfactual version of you (e.g., where you inherited a different assortment of genes from your parents) has the same beliefs and values as a pre-agent who learns about his prior and utility function assignments. Since values (i.e., utility functions) don't change or update upon learning new info, all counterfactual versions of you must have the same utility functions, if they are extended-pre-rational.

Does that make sense, or do I need to formalize the argument?

You can define a similar "pre" condition, but it is far less clear why one should satisfy such a condition. Beliefs are about the world out there, so it seems clearer that you don't want your beliefs to change when you change but the world out there doesn't change. Values are about you, so it seems reasonable for your values to change even when the world out there doesn't change.

Are beliefs just about the world out there, or are they also partly about you? Certainly, as a matter of fact, people's beliefs do change when they change but the outside world doesn't change. According to standard normative rationality (i.e., expected utility maximization) that's irrational, but under EU maximization it's also irrational to change one's values, since that causes inconsistencies between decisions made at different points in time.

I think there is a line between the objective and subjective parts of preference (or as you put it, what's about you and what's about the world), but perhaps it should be drawn somewhere other than between the prior and the utility function. But right now that's little more than a vague idea.

Well among economists it is accepted as rational for your preferences to change with context, including time. As you probably know there are EU equivalence theorems that for any p0,U0, there are many other p1,U1; p2,U2; etc. that produce all the same choices. I break this symmetry by saying the p is about the world while the U is about you. The patterns of choice that are explained by changes in you should go in U, and the patterns of choices that are explained by changes in what you believe about the world go in p.
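The equivalence mentioned above can be illustrated with a small Python sketch. This is one standard construction, assumed here for illustration and not taken from the comment: with state-dependent utilities, multiply each state's probability by a positive weight c(s), renormalize, and divide the utilities in that state by c(s). Every action's expected utility is then scaled by the same positive constant, so all choices are unchanged. The states, actions, numbers, and function names are hypothetical.

```python
def expected_utility(p, U, action):
    """Expected utility of an action under prior p and state-dependent utility U."""
    return sum(p[s] * U[action][s] for s in p)


def best_action(p, U):
    """Action with the highest expected utility."""
    return max(U, key=lambda a: expected_utility(p, U, a))


def rescale(p, U, c):
    """Build (p1, U1) from (p0, U0) using positive per-state weights c."""
    z = sum(p[s] * c[s] for s in p)                      # normalizing constant
    p1 = {s: p[s] * c[s] / z for s in p}                 # reweighted prior
    U1 = {a: {s: U[a][s] / c[s] for s in p} for a in U}  # compensating utilities
    return p1, U1


p0 = {"rain": 0.3, "sun": 0.7}
U0 = {
    "umbrella": {"rain": 5.0, "sun": 1.0},
    "no_umbrella": {"rain": -10.0, "sun": 3.0},
}
c = {"rain": 4.0, "sun": 1.0}  # arbitrary positive weights

p1, U1 = rescale(p0, U0, c)
print(best_action(p0, U0), best_action(p1, U1))
```

The two pairs disagree about how likely rain is, yet no observed choice can tell them apart, which is why some extra convention is needed to decide what goes in p and what goes in U.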

Well among economists it is accepted as rational for your preferences to change with context, including time.

That's surprising for me to hear, and seems to contradict the information given at http://en.wikipedia.org/wiki/Time_inconsistency#In_behavioral_economics

Exponential discounting and, more generally, time-consistent preferences are often assumed in rational choice theory, since they imply that all of a decision-maker's selves will agree with the choices made by each self.

Later on it says:

This would imply disagreement by people's different selves on decisions made and a rejection of the time consistency aspect of rational choice theory.

But I thought this rejection means rejection as a positive/descriptive theory of how humans actually behave, not as a normative theory of what is rational. Are you saying that economists no longer consider time consistency to be normative?
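To make the time-consistency property concrete, here is a minimal Python sketch. The hyperbolic discount function is brought in as a contrast from the behavioral economics literature and is an assumption of this example, as are the reward amounts, dates, and parameter values. Each "self" at time `now` picks the option with the highest discounted value; selves with exponential discounting agree with each other, while hyperbolic selves can reverse the choice.

```python
def exponential(delay, delta=0.9):
    """Exponential discount factor for a reward `delay` periods away."""
    return delta ** delay


def hyperbolic(delay, k=1.0):
    """Hyperbolic discount factor, a common model of time-inconsistent choice."""
    return 1.0 / (1.0 + k * delay)


def choice(discount, now, options):
    """Option (amount, time) that the self at time `now` values most."""
    return max(options, key=lambda o: o[0] * discount(o[1] - now))


# Hypothetical options: 100 units at t=10, or 110 units at t=11.
OPTIONS = [(100, 10), (110, 11)]

exp_choices = [choice(exponential, t, OPTIONS) for t in (0, 10)]
hyp_choices = [choice(hyperbolic, t, OPTIONS) for t in (0, 10)]
print(exp_choices, hyp_choices)
```

With these numbers the exponential discounter picks the same option at t=0 and t=10, while the hyperbolic discounter prefers the later reward at t=0 and the earlier one at t=10.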

ETA: Whoever is voting Robin down, why are you doing that?

Conflicts are unfortunate, but hardly irrational. If it is not irrational for two different people at the same time to have different preferences, it is not irrational for the same person at different times to have different preferences.

I have to admit, I always thought of time consistency as a standard part of individual rationality, and didn't consider that anyone might take the position that you're taking. I'll have to think about this some more. In the meantime, what about my other question, how to actually become pre-rational? Have you looked at this comment yet?

If people could cheaply bind their future selves, and didn't directly prefer not to do so, it would be irrational of them to let their future selves have different preferences.

If you owned any slave and could cheaply do so, you'd want to mold it to share exactly your preferences. But should you treat your future selves as your slaves?

Upon further reflection, I think altruism towards one's future selves can't justify having different preferences, because there should be a set of compromise preferences such that both your current self and your future selves are better off if you bind yourself (both current and future) to that set.

The logical structure of this argument is flawed. Here's another argument that shares the same structure, but is clearly wrong:

If you owned any slave and could cheaply do so, you'd want to ensure it doesn't die of neglect. But should you treat your future selves as your slaves?

Here's another version that makes more sense:

If you had an opportunity to mold a friend to share exactly your preferences, and could do so cheaply, you might still not want to do so, and wouldn't be considered irrational for it. So why should you be considered irrational for not molding your future selves to share exactly your preferences?

One answer here might be that changing your friend's preferences is wrong because it hurts him according to his current preferences. Doing the same to your future selves isn't wrong because they don't exist yet. But I think Robin's moral philosophy says that we should respect the preferences of nonexistent people, so his position seems consistent with that.

This seems like the well-worn discussion on whether rational agents should be expected to change their preferences. Here's Omohundro on the topic:

"Their utility function will be precious to these systems. It encapsulates their values and any changes to it would be disastrous to them. If a malicious external agent were able to make modifications, their future selves would forevermore act in ways contrary to their current values. This could be a fate worse than death! Imagine a book loving agent whose utility function was changed by an arsonist to cause the agent to enjoy burning books. Its future self not only wouldn’t work to collect and preserve books, but would actively go about destroying them. This kind of outcome has such a negative utility that systems will go to great lengths to protect their utility functions."

He goes on to discuss the issue in detail and lists some exceptional cases.

Re: why are there so few rationality researchers? Why aren't there hordes of people interested in these issues?

Rationality is not on the curriculum. People typically learn about it through osmosis in the science classes. Along with critical thinking, it has been considered to be too simple to be a subject in its own right. So, it fell somewhere between the science and math stools - and got lost down there.

I should say that people typically fail to learn about it through osmosis.

(Too simple a subject, indeed. What a prime example of a statement that's Not Even Wrong. Perhaps "too removed from ordinary human experience" is a better description.)

Simple - at least compared to science or maths, surely. If you look at the school curriculum, you often have to be a big and complex subject to get your own dedicated slot.

I'm not denigrating the subject - just trying to see what happened to its timetable in the context of the school curriculum.

Well, it depends on the definition of "rationality" used. Many components are taught formally and are anything but simple - such as probability theory.

Probability theory is a pretty small subset of maths - plus it is probably already being taught anyway in the maths curriculum.

That article's now moved to a new URL. (In case it moves again in future, it's Daniel T. Willingham's "Critical Thinking: Why Is It So Hard to Teach?", published in the summer 2007 American Educator.)

Yes, they do. The average user exposed to them may not apply them that way, but they certainly exist.

I think I made a statement that is too strong.

I'll quote from the article instead:

One issue is that the common conception of critical thinking or scientific thinking (or historical thinking) as a set of skills is not accurate. Critical thinking does not have certain characteristics normally associated with skills—in particular, being able to use that skill at any time. If I told you that I learned to read music, for example, you would expect, correctly, that I could use my new skill (i.e., read music) whenever I wanted. But critical thinking is very different. As we saw in the discussion of conditional probabilities, people can engage in some types of critical thinking without training, but even with extensive training, they will sometimes fail to think critically. This understanding that critical thinking is not a skill is vital. ‡

‡ Although this is not highly relevant for K-12 teachers, it is important to note that for people with extensive training, such as Ph.D.-level scientists, critical thinking does have some skill-like characteristics. In particular, they are better able to deploy critical reasoning with a wide variety of content, even that with which they are not very familiar. But, of course, this does not mean that they will never make mistakes.

And another quote:

Unfortunately, metacognitive strategies can only take you so far. Although they suggest what you ought to do, they don’t provide the knowledge necessary to implement the strategy. For example, when experimenters told subjects working on the band problem that it was similar to the garden problem, more subjects solved the problem (35 percent compared to 19 percent without the hint), but most subjects, even when told what to do, weren’t able to do it. Likewise, you may know that you ought not accept the first reasonable-sounding solution to a problem, but that doesn’t mean you know how to come up with alternative solutions or weigh how reasonable each one is. That requires domain knowledge and practice in putting that knowledge to work.

"The next question I want to consider is a normative one: is pre-rationality rational? Pre-rationality says that we should reason as if we were pre-agents who learned about our prior assignments as information, instead of just taking those priors as given. But then, shouldn't we also act as if we were pre-agents who learned about our utility function assignments as information, instead of taking them as given?"

As I understand it, preferences and therefore utility functions are by nature a-rational since they are their own ends. Choosing to alter your own utility function involves another part of the same function deciding that there is more overall utility in changing the other part (for example, wishing that you didn't like the taste of chocolate so you won't get fat). Thus we cannot escape our priors in this regard.

I have been more concerned with the fickle nature of utility functions, and what that means for making predictions of future utility, especially in the face of irreversible decisions (a good example is the decision to birth a child, though going to graduate school fits in many ways too). Should humans reduce future utility calculations to only those functions which remain stable over time and in many circumstances? I fear much subtlety is lost if we consider preference too broadly, but that might be my present self selfishly weighting her preferences too heavily.

Another take at clarifying what UDT seems to say, to add to the discussion at the end of the post: there is no way to change not only the past and the future (see: free will), but also counterfactuals. What was, and what will be is fixed to what it actually was and what it actually will be, but the same applies to what could be, that is what could be is also fixed to what it actually could be. This doesn't interfere with the ability to determine what will be, what was, and what could be. And that's all UDT says: one can't tamper with what one could've done, in the sense of changing it, although one can determine it.

To bind preference, shouldness to this factual setting, it's better to forget the details of expected utility maximization algorithm, and say that what the agent thinks it should do (at the moment, which may be a very limited perspective that doesn't reflect all of the agent's preference, but only that part that acts at that moment) is what it actually does. Thus, different preferences simply correspond to different algorithms for choosing actions, or more generally to different ways in which the agent determines the (dependence between) past, future, and counterfactuals.

Now if we get back to the point that all these things are "fixed" and are "as they actually are", we can see that there can be no rational disagreement, about anything, ever. One can't disagree about facts, but one also can't disagree about values, when values are seen as a counterpart of actions, which are facts also. Of course, different agents are different systems, and so they get located by different observations and perform different actions, and in this sense can be said to have different states of knowledge and act on different values, but this is a fact about dots on the picture, not about the picture as a whole.

(Of course, counterfactuals are the only real thing in the context of this discussion, "past" and "future" aren't concepts appropriately homogeneous here, when I say "determine the future", I mean "determine the 'counterfactuals' that branch in the future".)

What was, and what will be is fixed to what it actually was and what it actually will be, but the same applies to what could be, that is what could be is also fixed to what it actually could be.

As it could have been in the beginning, so it could have been now, and forever could have been going to be.

I'm a fan of the particular prior tense myself. http://www.somethingawful.com/d/news/today-we-learn.php

Years ago, Jennifer Rodriguez-Mueller and I invented the genre of Tense Poetry. An example:

It did not do
what would have been done