Making decisions using multiple worldviews

I like the problem this post poses. I think the "policy approach" is totally the wrong direction.

As I see it, the most-important-in-practice reason that the "epistemic approach" fails is that it treats a worldview as a single monolithic black box model of the whole world. As a general rule, worldviews in practice are "local" in the sense that they focus on a few particular constraints on how-the-world-works, and mostly don't make strong predictions about everything else. You can see this a lot in e.g. last year's MIRI discussions: a very common pattern is "my model strongly predicts X, but does not strongly predict anything about Y". That locality is the key to integrating multiple worldviews: in most places, either neither model makes a strong claim, or only one model makes a strong claim. The places where the two conflict tend to be rare (more precisely, out-of-distribution relative to today's world), and require thinking about generalization properties of the underlying constraints on which each worldview focuses.
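A toy sketch of the locality point, under loudly stated assumptions: each worldview is encoded (hypothetically) as a dict mapping only the questions it makes strong predictions about to a yes/no answer, and the `integrate` function is an illustration, not anyone's actual method. Where only one worldview speaks, its claim is kept; where both make opposite strong claims, the conflict is recorded as an explicit open question instead of being averaged away.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding of a "local" worldview: only questions it makes strong
# predictions about appear as keys; it is silent on everything else.
Worldview = Dict[str, bool]


def integrate(a: Worldview, b: Worldview) -> Tuple[Dict[str, bool], List[str]]:
    """Combine two local worldviews question by question.

    Returns the merged strong predictions plus the questions where the two
    worldviews make opposite strong claims. Conflicts are kept as explicit
    open questions rather than resolved by averaging or discarding a view.
    """
    merged: Dict[str, bool] = {}
    conflicts: List[str] = []
    for question in sorted(set(a) | set(b)):
        if question in a and question in b and a[question] != b[question]:
            conflicts.append(question)  # both speak, and they disagree
        elif question in a:
            merged[question] = a[question]  # a speaks; b agrees or is silent
        else:
            merged[question] = b[question]  # only b makes a strong claim
    return merged, conflicts


# Hypothetical worldviews loosely echoing the X/Y/Z examples in this thread.
worldview_a = {"X and Y can co-occur": False, "alignment is easy": False}
worldview_b = {"Z solves most problems": True, "alignment is easy": True}
merged, conflicts = integrate(worldview_a, worldview_b)
print(merged)     # {'X and Y can co-occur': False, 'Z solves most problems': True}
print(conflicts)  # ['alignment is easy']
```

Most questions pass through untouched because only one worldview addresses them; the single conflict is exactly where thinking about how each constraint generalizes is needed.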

The "policy approach" goes in basically the opposite direction - rather than opening the black box, it treats the black box as even more impenetrable. It doesn't even try to probe the internal epistemic gears producing policy proposals.

(And this complaint carries over to the analogous alignment strategies and models of cognition mentioned at the end of the post.)

Richard_Ngo

As a general rule, worldviews in practice are "local" in the sense that they focus on a few particular constraints on how-the-world-works, and mostly don't make strong predictions about everything else. ... That locality is the main key to integrating multiple worldviews

Interesting point. I think there are some models of the world for which this is true - e.g. models developed within specific academic disciplines. So you can merge economics with biology pretty well just by combining them locally. But I'm mainly focusing on worldviews which are broader, such that even if they don't make strong predictions about a given area, they still have background beliefs which clash strongly with other worldviews. E.g. an environmentalist meets a neoliberal: the neoliberal knows few details about the environment, the environmentalist knows few details about the economy, but each is confident that the actual details of reality are going to vindicate their own high-level worldview.

(In the MIRI discussions, this tends to look like Eliezer saying "my model doesn't have strong predictions about X or Y or Z, except that you can't find settings of all of these variables which solve the alignment problem". It's not that he's making predictions in different places, it's that he's making predictions at a different level of abstraction.)

the most-important-in-practice reason that the "epistemic approach" fails is that it treats a worldview as a single monolithic black box model of the whole world

So I totally agree that breaking down the black box is going to work better, when you can do it. The question is: given that you're strongly constrained on breaking-down-black-boxes, where should you spend that effort?

johnswentworth

So, part of my argument here is that "limited in breaking down the black boxes" is the wrong way to view the limitation. It's limited attention, time, etc. That doesn't necessarily translate into a limitation in breaking down black boxes, especially if you have some general knowledge about how to usefully break down the relevant black boxes.

And that's where the "a few constraints" part comes in. Like, when Eliezer says "my model doesn't have strong predictions about X or Y or Z, except that you can't find settings of all of these variables which solve the alignment problem", that's actually a fairly simple constraint on its own. It's simpler and more useful than a black-box of Eliezer's predictions or policy suggestions. That's the sort of thing we want to extract.

Richard_Ngo

when Eliezer says "my model doesn't have strong predictions about X or Y or Z, except that you can't find settings of all of these variables which solve the alignment problem", that's actually a fairly simple constraint on its own. It's simpler and more useful than a black-box of Eliezer's predictions or policy suggestions. That's the sort of thing we want to extract.

But when Paul (or most other alignment researchers) say "in fact you can find settings of those variables which solve the alignment problem", now we've got another high-level claim about the world which is inconsistent with the first one. So if your strategy for operating multiple worldviews is to build a new worldview by combining claims like these, then you'll hit a contradiction pretty quickly; and in this case it's highly nontrivial to figure out how to resolve that contradiction, or produce a sensible policy from a starting set of contradictory beliefs. Whereas if you calculate the separate policies first, it may well be the case that they're almost entirely consistent with each other.

(As an aside: I once heard someone give a definition of "rationalists" as "people who try to form a single coherent worldview". Feels relevant here.)

johnswentworth

Let's walk through an example a bit more. Eliezer says something like "my model doesn't have strong predictions about X or Y, except that you can't have both X and Y at the same time, and that's what you'd need in order for alignment to be easy". Then e.g. Rohin comes along and says something like "my model is that Z generally solves most problems most of the time, therefore it will also solve alignment". And clearly these make incompatible predictions in the case of alignment specifically. But they're both models which say lots of stuff about lots of things other than alignment. Other than alignment, the models mostly make predictions about different things, so it's not easy to directly test them against each other.

The actual strategy I want here is to take both of those constraints and say "here's one constraint, here's another constraint, both of them seem to hold across a pretty broad range of situations but they're making opposite predictions about alignment difficulty, so one of them must not generalize to alignment for some reason". And if you don't currently understand which situations the two constraints do and do not generalize to, or where the loopholes are, then that's the correct epistemic state. It is correct to say "yup, there's two constraints here which both make lots of correct predictions in mostly-different places and one of them must be wrong in this case but I don't know which".

... and this is totally normal! Like, of course people have lots of different heuristics and model-fragments which mostly don't overlap but do occasionally contradict each other. That's fine, that's a very ordinary epistemic state for a human, we have lots of experience with such epistemic states. Humans have lots of intuitive practice with things like "agonize about which of those conflicting heuristics we trust more in this situation" or "look for a policy which satisfies both of these conflicting model-fragments" or "consider the loopholes in each of these heuristics; does one of them have loopholes more likely to apply to this problem?".

Of course we still try to make our worldview more coherent over time; a conflict is evidence that something in there is wrong. But throwing away information - whether by abandoning one constraint wholesale, or by black-boxing things in various ways - is not the way to do that. We resolve the contradiction by thinking more about the internal details of how and why and when each model works, not by thinking less about the internal details. And if we don't want to do the hard work of thinking about how and why and when each model works, then we shouldn't be trying to force a resolution of the contradiction. Without doing that work, the most-accurate epistemic state we can reach is a model in which we know there's a contradiction, we know there's something wrong, but we don't know what's wrong (and, importantly, we know that we don't know what's wrong). Forcing a resolution to the contradiction, without figuring out where the actual problem is, would make our epistemic state less correct/accurate, not more.

Richard_Ngo

I'm curious what part of your comment you think I disagree with. I'm not arguing for "forcing a resolution", except insofar as you need to sometimes actually make decisions under worldview uncertainty. In fact, "forcing a resolution" by forming "all-things-considered" credences is the thing I'm arguing against in this post.

I also agree that humans have lots of experience weighing up contradictory heuristics and model-fragments. I think all the mechanisms you gave for how humans might do these are consistent with the thing I'm advocating. In particular, "choose which heuristics to apply" or "search for a policy consistent with different model-fragments" seem like basically what the policy approach would recommend (e.g. searching for a policy consistent with both the Yudkowsky model-fragment and the Christiano model-fragment). By contrast, I don't think this is an accurate description of the way most EAs currently think about epistemic deference, which is the concrete example I'm contrasting my approach against.

(My model here is that you see me as missing a mood, like I'm not being sufficiently anti-black-boxes. I also expect that my proposal sounds more extreme than it actually is, because for simplicity I'm focusing on the limiting case of having almost no ability to resolve disagreements between worldviews.)

johnswentworth

Huh. Yeah, I got the impression from the post that you wanted to do something like replace "epistemic deference plus black-box tests of predictive accuracy" with "policy deference plus simulated negotiation". And I do indeed feel like a missing anti-black-box mood is the main important problem with epistemic deference. But it's plausible we're just in violent agreement here :).

Oliver Sourbut

I like the way you tie real-world advice to principles in ML and RL. In general I think there are a lot of risks to naively applying epistemic deference and worldview aggregation, and you articulate some of them really nicely here.

Something I've noticed with a few of your posts is that they often contain a lot of nuggets of ideas! And for you they seem to cohere into maybe a single high-level thought, but I sometimes want to pull them into smaller chunks[1]. For example, I imagine you (or others) might want to refer individually to the core idea in the paragraph beginning

However, even if in practice we end up mostly evaluating worldviews based on their epistemic track record, I claim that it’s still valuable to consider the epistemic track record as a proxy for the quality of their advice, rather than using it directly to evaluate how much we trust each worldview...

Now, the rest of the post gives this core idea context and support, but I think it stands on its own as well.

One compromise :D between putting lots of ideas together and splitting them apart too atomically could be to add meaningful sub-headings. (This also incidentally makes it easy to link out to the specific part of the text from another place via # links.)

[1] Maybe we differ in the number of effective working memory slots we have available (for what I mean, see https://www.sciencedaily.com/releases/2008/04/080402212855.htm, though see https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4159388/, which challenges this).

Richard_Ngo

Just wanted to note that this comment was quite helpful for me, and has influenced other blog posts that I'm writing. Thanks!

Caerulea-Lawrence

Hi Richard,

The reason I'm commenting is that I liked this post, so I tried to find something useful to add. My argument is that unless we apply the same approach to our current worldview, we will probably fall prey to many of the traps the policy approach tries to avoid.

Moreover, since I want to apply it to my inner self, I am treating the conclusions I have already drawn as a separate worldview.

I am loosely using Erik Erikson's eight stages of psychosocial development to construct examples, but the model itself isn't that important. The main idea is that during development, and even through genetic inheritance, we form or inherit a lot of presumptions. So we have already integrated multiple perspectives, but perhaps not very consciously, and therefore possibly not in the best way going forward.

Here is a simplified example of what I am talking about:

Let's say you were afraid of snakes growing up. In adolescence you lived in a culture that was very positive about snakes, and so as not to be seen as weak or different, you decided to force yourself to overcome the fear, and you chose to do so by staying with snakes until you didn’t feel ‘fear’ anymore. As you noticed other, smaller ‘fears’ in yourself, which you believed were phobias, you applied this approach repeatedly, and it made you feel strong and earned you positive social feedback.

If you are evaluating ‘track record’ and advice, this might look good. But in this case, and probably many others, it is easy to fall victim to the illusion this inner part has created: advice like ‘it is good to face your fears/phobias until they are conquered’. If we look closely, what is the logic under which this advice was created? It is: to fit in with my peers.

Now, you might have invested a lot of energy into this advice and followed it for years, even urging others to do the same, and consequently built up a vested interest in it being true. Therefore, if you do not pay attention, you might give this worldview a default say when it comes to feelings related to fear. The worldview itself, however, is fundamentally interested in fitting in with one’s peers. For this approach to work, you do not need to know the underlying logic of the worldview the advice came from; knowing it would still help, but it is rarely clear-cut.

So when, during the process, you come across things in your life you have already evaluated, you may add them without weighing them neutrally. This in turn disrupts the evaluation of strategies and decision-making, because it confuses cause and effect, and mistakes the illusion of interest for the true area of decision-interest. This difference might also lead to subtle changes in the end result.

Here is a simple application of your model, adding the conclusions you already have as a separate worldview:

1. «Conquering my fears/phobias is good advice, and it has a good track record (10 fewer phobias, less stress, etc., so far)»

Even though this looks fine on the surface, I will have to tread carefully, and even more so if I do not know the worldview it comes from. Do I have any other beliefs about fear/phobias that might be useful or legitimate, but got discarded along the way? Could they, given some time and effort, be better than, or complementary to, my current belief?

2. «Conquering fears is a part of my identity, and I feel strong and get positive reactions socially. Therefore I care about any and all areas related to fear».

These two sentences look related, but does sentence A imply sentence B? There are many other options. Since it seems I have a vested interest in this view (it is part of my identity), I should be extra discerning. Furthermore, I might have suppressed other worldviews’ viewpoints regarding fear, and will have to look closely for any other valid beliefs about fear.

The third point is the most interesting one, but also the most complex. The example so far can serve as one part of the overall strategy.

Let's say you are planning to move in with a significant other (SO). One part of the equation is that you fear you will lose parts of your autonomy.

Adding in all the former points, this is what it would look like.

A little side note:
*Conquering fear can look very different in this case. Since we are cheating by knowing its origin, we know it aims at fitting in. So instead of fighting for our autonomy, we might just get it over with and get used to having less, giving it up until the fear is gone.*

Since the relationship is important, this belief loudly suggests that you apply it: give up your autonomy until your fear is gone.

However, since you have isolated this as just one worldview’s belief, your search for a different/discarded view has yielded the following: «Allowing and expressing fear can foster a closer bond in close relationships.» You have found some evidence for this as well.

The conquering-fears strategy is interested in anything regarding fear, and since this is a new, relational fear, it makes clear that it wants the final say.

The belief that allowing and expressing fear can foster a closer bond in close relationships also has an interest in the area of fear. However, since the other belief has a vested interest, this new belief does not shout; it whispers. Being aware of this, you try to listen as best you can.

So, what to do now? That is not easy to answer, and beyond the scope of my comment. Moreover, even with this extra precaution, we are usually blind to our blind spots, but at least you are trying to find some.

I also wanted to thank you for sharing this, and putting in all the work. It is a small node in a growing cluster of nodes concerned with Rationality and Epistemology.

LESSWRONG