On the limits of idealized values

Very nice and clear writing, thank you! This is exactly the kind of stuff I'd love to see more on LW:

Suppose I can create either this galaxy Joe’s favorite world, or a world of happy puppies frolicking in the grass. The puppies, from my perspective, are a pretty safe bet: I myself can see the appeal.

Though I think some parts could use more work, shorter words and clearer images:

Second (though maybe minor/surmountable): even if your actual attitudes yield determinate verdicts about the authoritative form of idealization, it seems like we’re now giving your procedural/meta evaluative attitudes an unjustified amount of authority relative to your more object-level evaluative attitudes.

But most of the post is good.

R. Scott Bakker made a related point in Crash Space:

The reliability of our heuristic cues utterly depends on the stability of the systems involved. Anyone who has witnessed psychotic episodes has firsthand experience of consequences of finding themselves with no reliable connection to the hidden systems involved. Any time our heuristic systems are miscued, we very quickly find ourselves in ‘crash space,’ a problem solving domain where our tools seem to fit the description, but cannot seem to get the job done.

And now we’re set to begin engineering our brains in earnest. Engineering environments has the effect of transforming the ancestral context of our cognitive capacities, changing the structure of the problems to be solved such that we gradually accumulate local crash spaces, domains where our intuitions have become maladaptive. Everything from irrational fears to the ‘modern malaise’ comes to mind here. Engineering ourselves, on the other hand, has the effect of transforming our relationship to all contexts, in ways large or small, simultaneously. It very well could be the case that something as apparently innocuous as the mass ability to wipe painful memories will precipitate our destruction. Who knows? The only thing we can say in advance is that it will be globally disruptive somehow, as will every other ‘improvement’ that finds its way to market.

Human cognition is about to be tested by an unparalleled age of ‘habitat destruction.’ The more we change ourselves, the more we change the nature of the job, the less reliable our ancestral tools become, the deeper we wade into crash space.

In other words, yeah, I can imagine an alter ego who sees more and thinks better than me. As long as it stays within human evolutionary bounds, I'm even okay with trusting it more than myself. But once it steps outside these bounds, it seems like veering into "crash space" is the expected outcome.

Joe Carlsmith, 4y

Glad you liked it, and thanks for sharing the Bakker piece -- I found it evocative.

paulfchristiano, 4y

I feel like "Something is good to the extent that an idealized version of me would judge it good" is a useful heuristic about goodness, but I agree that it doesn't really work as a definition and I liked this post.

It seems like an important heuristic if we are in a bad position to figure out what is good directly (e.g. because we are spending our time fending off catastrophe or competing with each other), where it feels possible to construct an idealization that we'd trust more than ourselves (e.g. by removing the risk of extinction or some kinds of destructive conflict).

In particular, it seems we could (often) trust them to figure out how to perform further idealization better than we would. We don't want to pick just any self-ratifying idealization, but we can hope to get around this by taking little baby steps each of which ratifies the next. The very simple version of this heuristic quickly loses its value once we take a few steps and fix the most obviously broken+urgent things about our situation. Then we are left with hard questions about which kinds of idealizations are "best," and then eventually with hard object-level questions.

(I do think any of these calls, even the apparently simple ones, is value-laden. For example, it seems like a human is not a single coherent entity, and different processes of idealization could lead to different balances between conflicting desires or ways of being. This kind of problem is more obvious for groups than individuals, since it's clear from everyday life how early steps of "idealization" can fundamentally change the balance of power, but I think it's also quite important for incoherent individuals. Not to mention more mundane forms of wrongness, that seem possible even for the simplest kinds of idealization and mean that no idealization is really a free lunch.)

I wrote about something a bit like your "ghost civilization" here (under "Finding Earth" and then "Extrapolation").

Joe Carlsmith, 4y

I agree that it's a useful heuristic, and the "baby steps" idealization you describe seems to me like a reasonable version to have in mind and to defer to over ourselves (including re: how to continue idealizing). I also appreciate that your 2012 post actually went through and sketched a process in that amount of depth/specificity.

habryka, 4y

Promoted to curated: I really liked this post. I've had some thoughts along similar lines for a while, and this post clarified a bunch of them in much better ways than I have succeeded at so far. It also seems like a pretty important topic. Thank you for writing this!

Charlie Steiner, 4y

Good stuff! I'm not sure if you pulled your punches at the end in service of hope - different ghost councils will lead you to different decisions, as will different ways of consulting ghost councils, and different ways of choosing consultation methods, and so on ad infinitum. There is no Cartesian boundary between you and the ghosts that lets you pick what ghosts to listen to from a point of infinite distance and infinite leverage; you just kinda get the ghosts you get. You have to make peace not only with your own agency, but also with your own contingency, to end up in a place where maybe doing your best really can be enough.

Joe Carlsmith, 4y

Thanks :). I didn't mean for the ghost section to imply that the ghost civilization solves the problems discussed in the rest of the post re: e.g. divergence, meta-divergence, and so forth. Rather, the point was that taking responsibility for making the decision yourself (this feels closely related to "making peace with your own agency"), in consultation with/deference towards whatever ghost civilizations etc you want, changes the picture relative to e.g. requiring that there be some particular set of ghosts that already defines the right answer.

Wei Dai, 4y

On a popular view about meta-ethics, what you should value is determined by what an idealized version of you would value. Call this view “idealizing subjectivism.”

What do you think of the view that “idealizing subjectivism” is just an "interim meta-ethics", a kind of temporary placeholder until we figure out the real nature of morality? As an analogy, consider "what you should believe (about math and science) is determined by what an idealized version of you would believe (about math and science)." This doesn't seem very attractive to us, given available alternative philosophies of math and science that are more direct and less circular, but might have made a good interim philosophy of math and science in the past, when we were much more confused about these topics.

Another comment is that I wish all meta-ethics, including this one, would engage more with the idea that the functional role of morality in humans is apparently some combination of:

a tool - for cooperation and increasing group fitness
a weapon - to coordinate and attack enemies/rivals with (e.g., bringing someone down by creating a mob to accuse them of some moral violation that you may have just recently invented)
a game - to gain status by displaying virtue/morality (by ordinary people) or intelligence/wisdom/sophistication (by philosophers)

This seems like it ought to have some implications for meta-ethics, but I'm not sure what exactly, and again wish there was more engagement with it. (See also related comment here.) Perhaps one relevant question is, should you think of your idealized self as existing in an environment where morality still plays these roles? Why or why not?

Joe Carlsmith, 4y

In the past, I've thought of idealizing subjectivism as something like an "interim meta-ethics," in the sense that it was a meta-ethic I expected to do OK conditional on each of the three meta-ethical views discussed here, e.g.:

1. Internalist realism (value is independent of your attitudes, but your idealized attitudes always converge on it)
2. Externalist realism (value is independent of your attitudes, but your idealized attitudes don't always converge on it)
3. Idealizing subjectivism (value is determined by your idealized attitudes)

The thought was that on (1), idealizing subjectivism tracks the truth. On (2), maybe you're screwed even post-idealization, but whatever idealization process you were going to do was your best shot at the truth anyway. And on (3), idealizing subjectivism is just true. So, you don't go too far wrong as an idealizing subjectivist. (Though note that we can run similar lines of argument for using internalist or externalist forms of realism as the "interim meta-ethics." The basic dynamic here is just that, regardless of what you think about (1)-(3), doing your idealization procedures is the only thing you know how to do, so you should just do it.)

I still feel some sympathy towards this, but I've also since come to view attempts at meta-ethical agnosticism of this kind as much less innocent and straightforward than this picture hopes. In particular, I feel like I see meta-ethical questions interacting with object-level moral questions, together with other aspects of philosophy, at tons of different levels (see e.g. here, here, and here for a few discussions), so it has felt correspondingly important to just be clear about which view is most likely to be true.

Beyond this, though, for the reasons discussed in this post, I've also become clearer in my skepticism that "just do your idealization procedure" is some well-defined thing that we can just take for granted. And I think that once we double click on it, we actually get something that looks less like any of 1-3, and more like the type of active, existentialist-flavored thing I tried to point at in Sections X and XI.

Re: functional roles of morality, one thing I'll flag here is that in my view, the most fundamental meta-ethical questions aren't about morality per se, but rather are about practical normativity more generally (though in practice, many people seem most pushed towards realism by moral questions in particular, perhaps due to the types of "bindingness" intuitions I try to point at here -- intuitions that I don't actually think realism on its own helps with).

Should you think of your idealized self as existing in a context where morality still plays these (and other) functional roles? As with everything about your idealization procedure, on my picture it's ultimately up to you. Personally, I tend to start by thinking about individual ghost versions of myself who can see what things are like in lots of different counterfactual situations (including, e.g., situations where morality plays different functional roles, or in which I am raised differently), but who are in some sense "outside of society," and who therefore aren't doing much in the way of direct signaling, group coordination, etc. That said, these ghost version selves start with my current values, which have indeed resulted from my being raised in environments where morality is playing roles of the kind you mentioned.

Wei Dai, 4y

so it has felt correspondingly important to just be clear about which view is most likely to be true.

I guess this means you've rejected both versions of realism as unlikely? Have you explained why somewhere? What do you think about position 3 in this list?

As with everything about your idealization procedure, on my picture it’s ultimately up to you.

This sounds like a version of my position 4. Would you agree? I think my main problem with it is that I don't know how to rule out positions 1, 2, 3, 5, and 6.

therefore aren’t doing much in the way of direct signaling, group coordination, etc.

Ok, interesting. How does your ghost deal with the fact that the real you is constrained/motivated by the need to do signaling and coordination with morality? (For example does the ghost accommodate the real you by adjusting its conclusions to be more acceptable/useful for these purposes?) Is "desire for status" a part of your current values that the ghost inherits, and how does that influence its cognition?

Joe Carlsmith, 4y (edited)

I haven't given a full account of my views of realism anywhere, but briefly, I think that the realism the realists-at-heart want is a robust non-naturalist realism, a la David Enoch, and that this view implies:

an inflationary metaphysics that it just doesn't seem like we have enough evidence for,
an epistemic challenge (why would we expect our normative beliefs to correlate with the non-natural normative facts?) that realists have basically no answer to except "yeah idk but maybe this is a problem for math and philosophy too?" (Enoch's chapter 7 covers this issue; I also briefly point at it in this section, in talking about why the realist bot would expect its desires and intuitions to correlate with the contents of the envelope buried in the mountain), and
an appeal to a non-natural realm that a lot of realists take as necessary to capture the substance and heft of our normative lives, but which I don't think is necessary for this, at least when it comes to caring (I think moral "authority" and "bindingness regardless of what you care about" might be a different story, but one that "the non-natural realm says so" doesn't obviously help with, either). I wrote up my take on this issue here.

Also, most realists are externalists, and I think that externalist realism severs an intuitive connection between normativity and motivation that I would prefer to preserve (though this is more of an "I don't like that" than a "that's not true" objection). I wrote about this here.

There are various ways of being a "naturalist realist," too, but the disagreement between naturalist realism and anti-realism/subjectivism/nihilism is, in my opinion, centrally a semantic one. The important question is whether anything normativity-flavored is in a deep sense something over and above the standard naturalist world picture. Once we've denied that, we're basically just talking about how to use words to describe that standard naturalist world picture. I wrote a bit about how I think of this kind of dialectic here:

This is a familiar dialectic in philosophical debates about whether some domain X can be reduced to Y (meta-ethics is a salient comparison to me). The anti-reductionist (A) will argue that our core intuitions/concepts/practices related to X make clear that it cannot be reduced to Y, and that since X must exist (as we intuitively think it does), we should expand our metaphysics to include more than Y. The reductionist (R) will argue that X can in fact be reduced to Y, and that this is compatible with our intuitions/concepts/everyday practices with respect to X, and hence that X exists but it’s nothing over and above Y. The nihilist (N), by contrast, agrees with A that it follows from our intuitions/concepts/practices related to X that it cannot be reduced to Y, but agrees with R that there is in fact nothing over and above Y, and so concludes that there is no X, and that our intuitions/concepts/practices related to X are correspondingly misguided. Here, the disagreement between A vs. R/N is about whether more than Y exists; the disagreement between R vs. A/N is about whether a world of only Y “counts” as a world with X. This latter often begins to seem a matter of terminology; the substantive questions have already been settled.

There's a common strain of realism in utilitarian circles that tries to identify "goodness" with something like "valence," treats "valence" as a "phenomenal property", and then tries to appeal to our "special direct epistemic access" to phenomenal consciousness in order to solve the epistemic challenge above. I think this doesn't help at all (the basic questions about how the non-natural realm interacts with the natural one remain unanswered -- and this is a classic problem for non-physicalist theories of consciousness as well), but that it gets its appeal centrally via running through people's confusion/mystery relationship with phenomenal consciousness, which muddies the issue enough to make it seem like the move might help. I talk about issues in this vein a bit in the latter half of my podcast with Gus Docker.

Re: your list of 6 meta-ethical options, I'd be inclined to pull apart the question of

(a) do any normative facts exist, and if so, which ones, vs.
(b) what's the empirical situation with respect to deliberation within agents and disagreement across agents (e.g., do most agents agree and if so why; how sensitive is the deliberation of a given agent to initial conditions, etc).

With respect to (a), my take is closest to 6 ("there aren't any normative facts at all") if the normative facts are construed in a non-naturalist way, and closest to "whatever, it's mostly a terminology dispute at this point" if the normative facts are construed in a naturalist way (though if we're doing the terminology dispute, I'm generally more inclined towards naturalist realism over nihilism). Facts about what's "rational" or "what decision theory wins" fall under this response as well (I talk about this a bit here).

With respect to (b), my first-pass take is "I dunno, it's an empirical question," but if I had to guess, I'd guess lots of disagreement between agents across the multiverse, and a fair amount of sensitivity to initial conditions on the part of individual deliberators.

Re: my ghost, it starts out valuing status as much as I do, but it's in a bit of a funky situation insofar as it can't get normal forms of status for itself because it's beyond society. It can, if it wants, try for some weirder form of cosmic status amongst hypothetical peers ("what they would think if they could see me now!"), or it can try to get status for the Joe that it left behind in the world, but my general feeling is that the process of stepping away from the Joe and looking at the world as a whole tends to reduce its investment in what happens to Joe in particular, e.g.:

Perhaps, at the beginning, the ghost is particularly interested in Joe-related aspects of the world. Fairly soon, though, I imagine it paying more and more attention to everything else. For while the ghost retains a deep understanding of Joe, and a certain kind of care towards him, it is viscerally obvious, from the ghost’s perspective, unmoored from Joe’s body, that Joe is just one creature among so many others; Joe’s life, Joe’s concerns, once so central and engrossing, are just one tiny, tiny part of what’s going on.

That said, insofar as the ghost is giving recommendations to me about what to do, it can definitely take into account the fact that I want status to whatever degree, and am otherwise operating in the context of social constraints, coordination mechanisms, etc.

Wei Dai, 4y

an epistemic challenge (why would we expect our normative beliefs to correlate with the non-natural normative facts?) that realists have basically no answer to except “yeah idk but maybe this is a problem for math and philosophy too?”

I think this doesn’t help at all (the basic questions about how the non-natural realm interacts with the natural one remain unanswered—and this is a classic problem for non-physicalist theories of consciousness as well), but that it gets its appeal centrally via running through people’s confusion/mystery relationship with phenomenal consciousness, which muddies the issue enough to make it seem like the move might help.

It seems that you have a tendency to take "X'ists don't have an answer to question Y" as strong evidence for "Y has no answer, assuming X" and therefore "not X", whereas I take it as weak evidence for such because it seems pretty likely that even if Y has an answer given X, humans are just not smart enough to have found it yet. It looks like this may be the main crux that explains our disagreement over meta-ethics (where I'm much more of an agnostic).

but my general feeling is that the process of stepping away from the Joe and looking at the world as a whole tends to reduce its investment in what happens to Joe in particular

This doesn't feel very motivating to me (i.e., why should I imagine idealized me being this way), absent some kind of normative force that I currently don't know about (i.e., if there was a normative fact that I should idealize myself in this way). So I'm still in a position where I'm not sure how idealization should handle status issues (among other questions/confusions about it).

Rana Dexsin, 4y

Approximately what I might have said had I attempted to actually make it coherent! I look forward to seeing what comes out of this.

(One exception: if galaxy Rana were to match anything like your description, I have a rough pre-existing protocol (which is only partially computed and also hard to describe) for trying to work this out. I think this might not generalize well to other value systems or mind architectures, though, and I doubt it invalidates the thought experiment as such.)

Ofer, 4y

Or consider the idea that idealization involves or is approximated by “running a large number of copies of yourself, who then talk/argue a lot with each other and with others, […]”

Later in the "Ghost civilizations" section you mentioned the idea of ghost copies "supervising/supporting/scrutinizing an explorer trying some sort of process or stimulus that could lead to going off the rails". It's interesting to think about technologies like lie-detectors in this context, for mitigating risks like the "memetic hazards that are fatal from an evaluative perspective" that you mentioned. For example, suppose that a Supervisor Copy asks many Explorer Copies to enter a secure room that is then locked. The Explorer Copies then pursue a certain risky line of thought X. They then get to write down their conclusion, but the Supervisor Copy only gets to read it if all the Explorer Copies pass a lie-detector test in which they claim that they did not stumble upon any "memetic hazard" etc.

As an aside, all those copies can be part of a single simulation that we run for this purpose, in which they all get treated very well (even if they end up without the ability to affect anything outside the simulation).

Related to what you wrote near the end ("In a sense, I can use the image of them…"), I just want to add that using an imaginary idealized version of oneself as an advisor may be a great way to mitigate some harmful cognitive biases and also just a great productivity trick.

TAG, 4y

A rejection of certain types of robust realism about value, on which value is just a brute feature of the world “out there.”

It's a three-horse race, not a two-horse race. There isn't just realism and subjectivism (individual-level relativism); there's also group-level ethics.

It's a fact that it exists, and that it exists to shape behaviour... otherwise there would not be such behaviour-shaping social phenomena as praise and blame, punishment and reward.

A related embrace of a kind of Humeanism about means and ends. The world can tell you the means to your ends, but it cannot tell you what ends to pursue — those must in some sense be there already, in your (idealized?) heart.

But society can tell you what not to do.

You have noticed that some subjects might have murdery values. So you can't get any intuitively satisfactory ethics out of everyone doing what they value, since some people want to murder.

Your solution is... the "ideal" part of ideal subjectivism? But it's not clear that would turn murdery people into non-murdery people... and it's voluntary anyway: if they don't value reflective equilibrium, they're not going to do it.

An aspiration to maintain some kind of deep connection between what’s valuable, and what actually moves us to act (though note that this connection is not universalized — e.g., what’s valuable relative to you may not be motivating to others).

Why? If you want an entirely voluntary system of ethics, I suppose that is valuable.

There's a sense in which ethical motivation always comes from what individuals value, but that doesn't imply that motivation has to come from a subjective or solipsistic process. Group morality also has a solution: society punishes you, or threatens to, and that works on your subjective (but shared) desire not to be punished.

There's probably a moral question about what values people should or would voluntarily pursue, once the problems of do-not-steal and do-not-kill have been solved: making a voluntary, private decision to achieve your own values.

But the thou-shalt-nots, the aspects of ethics that are basically public, basically obligatory, and basically about not putting negative value on other people, are more important. That's built into the word "supererogatory".

Joe Carlsmith, 4y

I agree that there are other meta-ethical options, including ones that focus more on groups, cultures, agents in general, and so on, rather than individual agents (an earlier draft had a brief reference to this). And I think it's possible that some of these are in a better position to make sense of certain morality-related things, especially obligation-flavored ones, than the individually-focused subjectivism considered here (I gesture a little at something in this vicinity at the end of this post). I wanted a narrower focus in this post, though.

TAG, 4y (edited)

Ok. I'm glad you noticed, in the linked post, that utilitarianism doesn't have a decent model of obligation.

Charlie Steiner, 4y

Now I'm trying to recall a reference. Was there a LW post in the last few years about treating society, rather than individuals, as the subject of value learning? Maybe also something about how non-western societies are less likely to put individual values as paramount?

Rohin Shah, 4y

This one?

Charlie Steiner, 4y

Yes!

I. Clarifying the view

II. The appeal

III. Which idealization?

IV. Galaxy Joe

V. Mind-hacking vs. insight

VI. Privileged procedures

VII. Appeals to actual attitudes

VIII. Appeals to idealized attitudes

IX. Hoping for convergence, tolerating indeterminacy

X. Passive and active ethics

XI. Ghost civilizations