Today's post, Invisible Frameworks, was originally published on 22 August 2008. A summary (taken from the LW wiki):


A particular system of values is analyzed, and is used to demonstrate the idea that anytime you consider changing your morals, you do so using your own current meta-morals. Forget this at your peril.


Discuss the post here (rather than in the comments to the original post).

This post is part of the Rerunning the Sequences series, where we'll be going through Eliezer Yudkowsky's old posts in order so that people who are interested can (re-)read and discuss them. The previous post was No License To Be Human, and you can use the sequence_reruns tag or rss feed to follow the rest of the series.

Sequence reruns are a community-driven effort. You can participate by re-reading the sequence post, discussing it here, posting the next day's sequence reruns post, or summarizing forthcoming articles on the wiki. Go here for more details, or to have meta discussions about the Rerunning the Sequences series.

6 comments

There is no choice a supermajority of agents would make instrumentally.

For example: I am a happiness maximizer. If I could press a button that would give my future self free energy, this would result in my future self making people happy, so I would press it. If a happiness minimizer were given the same choice, and could press a button that would give my future self free energy, that would also result in my future self making people happy, so it would not press the button.
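A toy Python sketch of that example, with a made-up happiness figure (the +10 is purely illustrative, not from the comment):

```python
# Toy version of the comment's example. Pressing the button gives the
# *maximizer's* future self free energy, which it will spend making people
# happy. The "+10 happiness" figure is made up for illustration.
HAPPINESS_IF_PRESSED = 10   # future (maximizing) self uses the free energy
HAPPINESS_IF_NOT = 0

def presses_button(utility):
    """An agent presses iff pressing scores higher under its own utility."""
    return utility(HAPPINESS_IF_PRESSED) > utility(HAPPINESS_IF_NOT)

maximizer = lambda happiness: happiness    # values more happiness
minimizer = lambda happiness: -happiness   # values less happiness

print(presses_button(maximizer))  # True  -- presses the button
print(presses_button(minimizer))  # False -- refuses the same button
```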

Free energy is only generally instrumentally useful when it's future-you that gets to use it. That's how things commonly work, but there's nothing fundamental about it.

supermajority

You realize that word just means "more than a simple majority", right? Usually two-thirds, but it seems like "majority" fits your argument just fine.

I'm also assuming you're referring to the space of all possible agents, rather than the space of all actual agents, because goodness, there's LOTS that 90+% of all humans agree on.

I'm also assuming you're referring to the space of all possible agents, rather than the space of all actual agents, because goodness, there's LOTS that 90+% of all humans agree on.

Yeah.

My point is that if you flip your utility function, it flips your actions. It just doesn't seem that way because if you flip your utility function and future-you's utility function, then a lot of actions stay the same.

Not necessarily. Suppose there are two switches next to each other: switch A and switch B. If you leave switch A alone, I'll write it "a"; if you flip it, it's "A" (and likewise "b"/"B" for switch B). The payoffs look like this:

ab: +1
aB: -1
Ab: +3
AB: -3
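In case it helps, here is a minimal Python sketch of that payoff table (the dict keys and variable names are mine, not anything from the comments), showing which actions actually change when the whole utility function is negated:

```python
# Sketch of the two-switch example above. Outcomes are (flip_A, flip_B) pairs;
# payoffs are the ones given in the parent comment.
payoffs = {
    (False, False): +1,  # ab
    (False, True):  -1,  # aB
    (True,  False): +3,  # Ab
    (True,  True):  -3,  # AB
}

best_original = max(payoffs, key=payoffs.get)            # maximize u
best_flipped = max(payoffs, key=lambda o: -payoffs[o])   # maximize -u

print(best_original)  # (True, False): flip A, leave B
print(best_flipped)   # (True, True):  flip A, flip B
# Negating the utility function flips the choice about B, but the choice
# about A ("flip it") stays the same -- so not every action flips.
```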

I mean, it flips a binary action. If you replace the "you" deciding whether or not to flip switch A with an agent that has the opposite utility function, that choice flips, so that future-you does less damage (from the flipped agent's perspective) when it gets to choose B.

Ah, I see. Interesting distinction to make, but I don't think it was intended by roko.
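A minimal sketch of the sequential reading behind that distinction, under the assumed framing that present-you chooses switch A first and future-you, still maximizing the original payoffs, then chooses switch B:

```python
# Sequential version of the same example. Only present-you's utility is
# flipped; future-you still maximizes the original payoffs. (This framing is
# an assumption, not spelled out in the comments.)
payoffs = {
    (False, False): +1,  # ab
    (False, True):  -1,  # aB
    (True,  False): +3,  # Ab
    (True,  True):  -3,  # AB
}

def future_choice(flip_a):
    """Future-you picks B to maximize the original payoff, given A."""
    return max([False, True], key=lambda flip_b: payoffs[(flip_a, flip_b)])

def present_choice(sign):
    """Present-you picks A to maximize sign * payoff, anticipating future-you."""
    return max([False, True],
               key=lambda flip_a: sign * payoffs[(flip_a, future_choice(flip_a))])

print(present_choice(+1))  # True:  original present-you flips A (ends at +3)
print(present_choice(-1))  # False: flipped present-you leaves A alone,
                           #        so future-you can only reach +1
```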