# All of lackofcheese's Comments + Replies

"Solving" selfishness for UDT

I think there are some rather significant assumptions underlying the idea that they are "non-relevant". At the very least, if the agents were distinguishable, I think you should indeed be willing to pay to make n higher. On the other hand, if they're indistinguishable then it's a more difficult question, but the anthropic averaging I suggested in my previous comments leads to absurd results.

**Stuart_Armstrong** (7y, score 1): The anthropic averaging leads to absurd results only because it wasn't a utility function over states of the world. Under heads, it ranked 50% Roger + 50% Jack differently from the average utility of those two worlds.
"Solving" selfishness for UDT

I don't think that's entirely correct; SSA, for example, is a halfer position and it does exclude worlds where you don't exist, as do many other anthropic approaches.

Personally I'm generally skeptical of averaging over agents in any utility function.

**Stuart_Armstrong** (7y, score 1): Which is why I don't use anthropic probability, because it leads to these kinds of absurdities. The halfer position is defined in the top post (as is the thirder), and your setup uses aspects of both approaches. If it's incoherent, then SSA is incoherent, which I have no problem with. SSA != halfer.
**Stuart_Armstrong** (7y, score 1): Averaging makes a lot of sense if the number of agents is going to be increased and decreased in non-relevant ways. E.g.: you are an upload. Soon, you are going to experience eating a chocolate bar, then stubbing your toe, then playing a tough but intriguing game. During this time, you will be simulated on n computers, all running exactly the same program of you experiencing this, without any deviations. But n may vary from moment to moment. Should you be willing to pay to make n higher during pleasant experiences or lower during unpleasant ones, given that you will never detect this change?
"Solving" selfishness for UDT

You definitely don't have a 50% chance of dying in the sense of "experiencing dying". In the sense of "ceasing to exist" I guess you could argue for it, but I think that it's much more reasonable to say that both past selves continue to exist as a single future self.

Regardless, this stuff may be confusing, but it's entirely conceivable that with the correct theory of personal identity we would have a single correct answer to each of these questions.

**Stuart_Armstrong** (7y, score 1): Conceivable. But it doesn't seem to me that such a theory is necessary, as its role seems merely to be able to state probabilities that don't influence actions.
"Solving" selfishness for UDT

OK, the "you cause 1/10 of the policy to happen" argument is intuitively reasonable, but under that kind of argument divided responsibility has nothing to do with how many agents are subjectively indistinguishable and instead has to do with the agents who actually participate in the linked decision.

On those grounds, "divided responsibility" would give the right answer in Psy-Kosh's non-anthropic problem. However, this also means your argument that SIA+divided = SSA+total clearly fails, because of the example I just gave before, and beca... (read more)

**Stuart_Armstrong** (7y, score 1): The divergence between the reference class (of identical people) and the reference class (of agents with the same decision) is why I advocate for ADT (which is essentially UDT in an anthropic setting).
"Solving" selfishness for UDT

As I mentioned earlier, it's not an argument against halfers in general; it's against halfers with a specific kind of utility function, which sounds like this: "In any possible world I value only my own current and future subjective happiness, averaged over all of the subjectively indistinguishable people who could equally be "me" right now."

In the above scenario, there is a 1/2 chance that both Jack and Roger will be created, a 1/4 chance of only Jack, and a 1/4 chance of only Roger.

Before finding out who you are, averaging would lead ... (read more)

**Stuart_Armstrong** (7y, score 1): Oh. I see. The problem is that that utility takes a "halfer" position on combining utility (averaging) and a "thirder" position on counterfactual worlds where the agent doesn't exist (removing them from consideration). I'm not even sure it's a valid utility function - it seems to mix utility and probability. For example, in the heads world, it values "50% Roger vs 50% Jack" at the full utility amount, yet values only one of "Roger" and "Jack" at full utility. The correct way of doing this would be to value "50% Roger vs 50% Jack" at 50% - and then you just have a rescaled version of the thirder utility.

I think I see the idea you're getting at, but I suspect that the real lesson of your example is that that mixed halfer/thirder idea cannot be made coherent in terms of utilities over worlds.
"Solving" selfishness for UDT

I don't think linked decisions make the halfer paradox I brought up go away. Any counterintuitive decisions you make under UDT are simply ones that lead to you making a gain in counterfactual possible worlds at the cost of a loss in actual possible worlds. However, in the instance above you're losing both in the real scenario in which you're Jack, and in the counterfactual one in which you turned out to be Roger.

Granted, the "halfer" paradox I raised is an argument against having... (read more)

**Stuart_Armstrong** (7y, score 2): Did I make a mistake? It's possible - I'm exhausted currently. Let's go through this carefully. Can you spell out exactly why you think that halfers are such that:

1. They are only willing to pay 1/2 for a ticket.
2. They know that they must either be Jack or Roger.
3. They know that upon finding out which one they are, regardless of whether it's Jack or Roger, they would be willing to pay 2/3.

I can see 1) and 2), but, thinking about it, I fail to see 3).
"Solving" selfishness for UDT

> But SIA also has some issues with order of information, though it's connected with decisions

Can you illustrate how the order of information matters there? As far as I can tell it doesn't, and hence it's just an issue with failing to consider counterfactual utility, which SIA ignores by default. It's definitely a relevant criticism of using anthropic probabilities in your decisions, because failing to consider counterfactual utility results in dynamic inconsistency, but I don't think it's as strong as the associated criticism of SSA.

**Stuart_Armstrong** (7y, score 2): Yes, that's essentially it. However, the idea of divided responsibility has been proposed before (though not in those terms) - it's not just a hack I made up. Basic idea is, if ten people need to vote unanimously "yes" for a policy that benefits them all, do they each consider that their vote made the difference between the policy and no policy, or that it contributed a tenth of that difference? Divided responsibility actually makes more intuitive sense in many ways, because we could replace the unanimity requirement with "you cause 1/10 of the policy to happen" and it's hard to see what the difference is (assuming that everyone votes identically).

But all these approaches (SIA and SSA and whatever concept of responsibility) fall apart when you consider that UDT allows you to reason about agents that will make the same decision as you, even if they're not subjectively indistinguishable from you. Anthropic probability can't deal with these - worse, it can't even consider counterfactual universes where "you" don't exist, and doesn't distinguish well between identical copies of you that have access to distinct, non-decision-relevant information.

Ah, subjective anticipation... That's an interesting question. I often wonder whether it's meaningful. If we create 10 identical copies of me and expose 9 of them to one stimulus and 1 to another, what is my subjective anticipation of seeing one stimulus over the other? 10% is one obvious answer, but I might take a view of personal identity that fails to distinguish between identical copies of me, in which case 50% is correct. What if identical copies will be recombined later? Eliezer had a thought experiment where agents were two-dimensional, and could get glued to or separated from each other, and wondered whether this made any difference. I do too. And I'm also very confused about quantum measure, for similar reasons.
"Solving" selfishness for UDT

That's not true. The SSA agents are only told about the conditions of the experiment after they're created and have already opened their eyes.

Consequently, isn't it equally valid for me to begin the SSA probability calculation with those two agents already excluded from my reference class?

Doesn't this mean that SSA probabilities are not uniquely defined given the same information, because they depend upon the order in which that information is incorporated?

**Stuart_Armstrong** (7y, score 2): Yep. The old reference class problem. Which is why, back when I thought anthropic probabilities were meaningful, I was an SIAer. But SIA also has some issues with order of information, though it's connected with decisions (http://lesswrong.com/lw/4fl/dead_men_tell_tales_falling_out_of_love_with_sia/). Anyway, if your reference class consists of people who have seen "this is not room X", then "divided responsibility" is no longer 1/3, and you probably have to go whole UDT.
"Solving" selfishness for UDT

I think that argument is highly suspect, primarily because I see no reason why a notion of "responsibility" should have any bearing on your decision theory. Decision theory is about achieving your goals, not avoiding blame for failing.

However, even if we assume that we do include some notion of responsibility, I think that your argument is still incorrect. Consider this version of the incubator Sleeping Beauty problem, where two coins are flipped.
HH => Sleeping Beauties created in Room 1, 2, and 3
HT => Sleeping Beauty created in Room 1

**Stuart_Armstrong** (7y, score 2): The SSA probability of HH is 1/4, not 1/3. Proof: before opening their eyes, the SSA agents divide probability as: 1/12 HH1 (HH and they are in room 1), 1/12 HH2, 1/12 HH3, 1/4 HT, 1/4 TH, 1/4 TT. Upon seeing a sign saying "this is not room X", they remove one possible agent from the HH world, and one possible world from the remaining three. So this gives odds of HH:¬HH of (1/12+1/12):(1/4+1/4) = 1/6:1/2, or 1:3, which is a probability of 1/4. This means that SSA+divided responsibility says EU(A) is $3, and EU(B) is $3.30 - exactly the same ratios as the first setup, with B as the best choice.
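Stuart's odds bookkeeping above can be double-checked mechanically. The sketch below is my own arithmetic in exact fractions; the world and observer counts, and the assumption that the sign rules out one HH observer and exactly one single-observer world, are taken from his proof:

```python
# SSA check for the two-coin incubator: HH creates observers in rooms
# 1-3; HT, TH, TT each create one observer. Each world has prior 1/4,
# and SSA splits a world's probability uniformly among its observers.
from fractions import Fraction

prior = Fraction(1, 4)

# Seeing "this is not room X" removes one of the three HH observers,
# and removes exactly one of the three single-observer worlds (the one
# whose sole observer is in room X).
weight_HH = 2 * (prior / 3)   # two surviving HH observers, 1/12 each
weight_not_HH = 2 * prior     # two surviving single-observer worlds

p_HH = weight_HH / (weight_HH + weight_not_HH)
assert p_HH == Fraction(1, 4)  # odds 1/6 : 1/2, i.e. 1:3
```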
"Solving" selfishness for UDT

> There's no "should" - this is a value set.

The "should" comes in giving an argument for why a human rather than just a hypothetically constructed agent might actually reason in that way. The "closest continuer" approach makes at least some intuitive sense, though, so I guess that's a fair justification.

> The halfer is only being strange because they seem to be using naive CDT. You could construct a similar paradox for a thirder if you assume the ticket pays out only for the other copy, not themselves.

I think there's more t... (read more)

**Stuart_Armstrong** (7y, score 2): Linked decisions are also what make the halfer paradox go away. To get a paradox that hits at the "thirder" position specifically, in the same way as yours did, I think you need only replace the ticket with something mutually beneficial - like putting on an enjoyable movie that both can watch. Then the thirder would double-count the benefit of this, before finding out who they were.
"Solving" selfishness for UDT

On 1), I agree that "pre-chewing" anthropic utility functions appears to be something of a hack. My current intuition in that regard is to reject the notion of anthropic utility (although not anthropic probability), but a solid formulation of anthropics could easily convince me otherwise.

On 2), if it's within the zone of validity then I guess that's sufficient to call something "a correct way" of solving the problem, but if there is an equally simple or simpler approach that has a strictly broader domain of validity I don't think you can be justified in calling it "the right way".

"Solving" selfishness for UDT

That's a reasonable point, although I still have two major criticisms of it.

1. What is your resolution to the confusion about how anthropic reasoning should be applied, and to the various potential absurdities that seem to come from it? Non-anthropic probabilities do not have this problem, but anthropic probabilities definitely do.
2. How can anthropic probability be the "right way" to solve the Sleeping Beauty problem if it lacks the universality of methods like UDT?
**Manfred** (7y, score 1): 1 - I don't have a general solution; there are plenty of things I'm confused about - and certain cases where anthropic probability depends on your action are at the top of the list. There is a sense in which a certain extension of UDT can handle these cases if you "pre-chew" indexical utility functions into world-state utility functions for it (like a more sophisticated version of what's described in this post, actually), but I'm not convinced that this is the last word. Absurdity and confusion have a long (if slightly spotty) track record of indicating a lack in our understanding, rather than a lack of anything to understand.

2 - Same way that CDT gets the right answer on how much to pay for a 50% chance of winning $1, even though CDT isn't correct. The Sleeping Beauty problem is literally so simple that it's within the zone of validity of CDT.
"Solving" selfishness for UDT

The strongest argument against anthropic probabilities in decision-making comes from problems like the Absent-Minded Driver, in which the probabilities depend upon your decisions.

If anthropic probabilities don't form part of a general-purpose decision theory, and you can get the right answers by simply taking the UDT approach and going straight to optimising outcomes given the strategies you could have, what use are the probabilities?

I won't go so far as to say they're meaningless, but without a general theory of when and how they should be used I definitely think the idea is suspect.

**Manfred** (7y, score 4): Probabilities have a foundation independent of decision theory, as encoding beliefs about events. They're what you really do expect to see when you look outside. This is an important note about the absent-minded driver problem et al. that gets lost if one gets comfortable in the effectiveness of UDT. The agent's probabilities are still accurate, and still correspond to the frequency with which they see things (truly!) - but they're no longer related to decision-making in quite the same way. "The use" is then to predict, as accurately as ever, what you'll see when you look outside yourself.

And yes, probabilities can sometimes depend on decisions, not only in some anthropic problems but more generally in Newcomb-like ones. Yes, the idea of having a single unqualified belief, before making a decision, doesn't make much sense in these cases. But Sleeping Beauty is not one of these cases.
"Solving" selfishness for UDT

OK; I agree with you that selfishness is ill-defined, and the way to actually specify a particular kind of selfishness is to specify a utility function over all possible worlds (actual and counterfactual). Moreover, the general procedure for doing this is to assign "me" or "not me" label to various entities in the possible worlds, and derive utilities for those worlds on the basis of those labels. However, I think there are some issues that still need to be resolved here.

If I don't exist, I value the person that most closely resembles

**Stuart_Armstrong** (7y, score 2): Indeed. That's a valid consideration. In the examples above, this doesn't matter, but it makes a difference in the general case.
**Stuart_Armstrong** (7y, score 2): There's no "should" - this is a value set. This is the extension of the classical selfish utility idea. Suppose that future you joins some silly religion and does some stupid stuff and so on (insert some preferences of which you disapprove here). Most humans would still consider that person "them" and would (possibly grudgingly) do things to make them happy. But now imagine that you were duplicated, and the other duplicate went on and did things you approved of more. Many people would conclude that the second duplicate was their "true" self, and redirect all their efforts towards them. This is very close to Nozick's "closest continuer" approach (http://www.iep.utm.edu/nozick/#H4).

It seems the simplest extension of classical selfishness is that the utility function assigns preferences to the physical being that it happens to reside in. This allows it to assign preferences immediately, without first having to figure out their location. But see my answer to the next question (the real issue is that our normal intuitions break down in these situations, making any choice somewhat arbitrary).

UDT (or CDT with precommitments) forces selfish agents who don't know who they are into behaving the same as copy-altruists. Copy-altruism and adding/averaging come apart under naive CDT. (Note that for averaging versus adding, the difference can only be detected by comparing with other universes with different numbers of people.)

The halfer is only being strange because they seem to be using naive CDT. You could construct a similar paradox for a thirder if you assume the ticket pays out only for the other copy, not themselves.
Anthropic decision theory for selfish agents

First of all, I think your argument from connection of past/future selves is just a specific case of the more general argument for reflective consistency, and thus does not imply any kind of "selfishness" in and of itself. More detail is needed to specify a notion of selfishness.

I understand your argument against identifying yourself with another person who might counterfactually have been in the same cell, but the problem here is that if you don't know how the coin actually came up you still have to assign amounts of "care" to the poss... (read more)

Introducing Corrigibility (an FAI research subfield)

That's definitely a more elegant presentation.

I'm not too surprised to hear you had already discovered this idea, since I'm familiar with the gap between research and writing speed. As someone who is not involved with MIRI, consideration of some FAI-related problems is at least somewhat disincentivized by the likelihood that MIRI already has an answer.

As for flaws, I'll list what I can think of. First of all, there are of course some obvious design difficulties, including the difficulty of designing US in the first place, and the difficulty of choosing th... (read more)

**So8res** (7y, score 8): Yeah, sorry about that -- we are taking some actions to close the writing/research gap and make it easier for people to contribute fresh results, but it will take time for those to come to fruition. In the interim, all I can provide is LW karma and textual reinforcement. Nice work! (We are in new territory now, FWIW.)

I agree with these concerns; specifying US is really hard and making it interact nicely with UN is also hard. Roughly, you add correction terms f1(a1), f2(a1, o1, a2), etc. for every partial history, where each one is defined as E[Ux | A1=a1, O1=o1, ..., do(On rel Press)]. (I think.)

Things are certainly difficult, and the dependence upon this particular agent's expectations is indeed weird/brittle. (For example, consider another agent maximizing this utility function, where the expectations are the first agent's expectations. Now it's probably incentivized to exploit places where the first agent's expectations are known to be incorrect, although I haven't the time right now to figure out exactly how.) This seems like potentially a good place to keep poking.
Anthropic decision theory for selfish agents

I already have a more detailed version here; see the different calculations for E[T] vs E[IT]. However, I'll give you a short version. From the gnome's perspective, the two different types of total utilitarian utility functions are:
T = total $ over both cells
IT = total $ over both cells if there's a human in my cell, 0 otherwise.
and the possible outcomes are
p=1/4 for heads + no human in my cell
p=1/4 for heads + human in my cell
p=1/2 for tails + human in my cell.

As you can see, these two utility functions only differ when there is no human in the gnome's ... (read more)
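Under linked decisions (both gnomes give the same advice, and each human who buys a ticket at price x wins $1 on tails), these two utility functions give different breakeven prices. A quick sketch of my reading of the setup, with the 2/3 and 4/5 figures from elsewhere in the thread falling out of the arithmetic:

```python
# Linked-decision expected utilities from one gnome's perspective.
# Outcomes: 1/4 heads & no human in my cell, 1/4 heads & human in my
# cell, 1/2 tails & human in both cells; on tails each human buys a
# ticket at price x and wins $1.
from fractions import Fraction

def E_T(x):
    # T: total $ over both cells (counts the other cell's human too)
    return Fraction(1, 4) * (-x) + Fraction(1, 4) * (-x) \
         + Fraction(1, 2) * 2 * (1 - x)

def E_IT(x):
    # IT: same total, but worth 0 when my own cell is empty
    return Fraction(1, 4) * 0 + Fraction(1, 4) * (-x) \
         + Fraction(1, 2) * 2 * (1 - x)

assert E_T(Fraction(2, 3)) == 0    # T breakeven: pay up to 2/3
assert E_IT(Fraction(4, 5)) == 0   # IT breakeven: pay up to 4/5
```

The only term that differs is the empty-cell heads outcome, which T counts as a loss and IT ignores; that single term moves the breakeven from 2/3 to 4/5.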

**Manfred** (7y, score 0): Thanks for giving this great example. This works because in the total utilitarian case (and average utilitarian, and other more general possibilities) the payoff of one gnome depends on the action of the other, so they have to coordinate for maximum payoff. This effect doesn't exist in any selfish case, which is what I was thinking about at the time. But this definitely shows that isomorphism can be more complicated than what I said.
What false beliefs have you held and why were you wrong?

The distinction is that a theory of "unicorns" is a theory that describes how and why other people (and probably you yourself) think about unicorns, while a theory of unicorns would explain actual unicorns. The latter would clearly fail as a theory, because you're never going to actually see a unicorn.

The same distinction doesn't apply to Newtonian mechanics, because Newtonian mechanics is a theory of mechanics, not a theory of how people think about mechanics.

On those grounds, I think it's quite reasonable to say that virtual particles are real, and "unicorns" are real, but unicorns are not real.

**shminux** (7y, score 0): Not sure if you read anything I wrote in this thread. Note that both Newton's laws and "unicorn" laws are models. You don't find Newton's laws in Nature, just like you don't find "unicorn" laws. You don't find virtual particles, either, as they are but terms in the perturbative expansion of a particular quantum field theory (which is also a model, and not found in the wild). Anyway, disengaging now.
Introducing Corrigibility (an FAI research subfield)

I think I can give a concise unification of my idea with Karl's. In short, the comment in the paper that

> The concerns in Section 4.2 could potentially be addressed by some form of counterfactual (rather than conditional) reasoning.

is correct, and the fix is a pretty simple one. Basically, we want the following:

1. In selecting a1, the agent should act as though it was indifferent between the counterfactual possibilities of shutting down and not shutting down, conditional on the same actions and observations.
2. In selecting a2, the agent should desire to shu
**So8res** (7y, score 4): Thanks, and nice work! Yeah, this is pretty key. You need it to optimize for both cases as if the probability of the button being pressed is fixed and independent of whether the programmers actually press the button. We can achieve this via a causal intervention on whether or not the button is pressed, and then clean up your U a bit by redefining it as follows:

U(a1, o, a2) := UN(a1, o, a2) + E[US | do(O in Press)] if o not in Press;
U(a1, o, a2) := US(a1, o, a2) + E[UN | do(O not in Press)] otherwise.

(Choosing how to compare UN values to US values makes the choice of priors redundant. If you want the priors to be 2:1 in favor of US then you could also have just doubled US in the first place instead; the degree of freedom in the prior is the same as the degree of freedom in the relative scaling. See also [Loudness Priors](http://intelligence.org/files/LoudnessPriors.pdf), a technical report from the last workshop.)

This method does seem to fulfill all the desiderata in the paper, although we're not too confident in it yet (it took us a little while to notice the "managing the news" problem in the first version, and it seems pretty likely that this too will have undesirable properties lurking somewhere). I'm fairly pleased with this solution, though, and a little miffed -- we found something similar to this a little while back (our research outstrips our writing speed, unfortunately) and now you've gone and ruined the surprise! :-)

(In seriousness, though, nice work. Next question is, can we pick any holes in it?)
What false beliefs have you held and why were you wrong?

Ah, but then you're talking about a theory of "unicorns" rather than a theory of unicorns.

**shminux** (7y, score 1): Not sure what you are saying. My guess is that you are implying that [the quotation is not the referent](http://lesswrong.com/lw/ok/the_quotation_is_not_the_referent/), and unicorns are hypothetical magical creatures, while "unicorns" are vivid and very real descriptions of them in the stories often read and written by the local bronies. If so, then all I have to say is that unicorn is not an accurate or fertile theory, while "unicorn" most definitely is. The difference is the domain of validity: can you go outside and find one running around, or can you mostly encounter them in books and movies? But that applies to most theories. If you go slow, Newtonian mechanics is adequate; if you study fast-moving objects, Newton gives bad predictions. Similarly, if you apply the predictions of the "unicorn" model beyond the domain of its validity, you are going to be disappointed, though occasionally you might discover a new applicable domain, such as a cosplay or an SFF convention.
Anthropic decision theory for selfish agents

The deeper point is important, and I think you're mistaken about the necessary and sufficient conditions for an isomorphism here.

If a human appears in a gnome's cell, then that excludes the counterfactual world in which the human did not appear in the gnome's cell. However, on UDT, the gnome's decision does depend on the payoffs in that counterfactual world.

Thus, for the isomorphism argument to hold, the preferences of the human and gnome must align over counterfactual worlds as well as factual ones. It is not sufficient to have the same probabilities for ... (read more)

**Manfred** (7y, score 1): Could you give a worked example of the correct action for the gnome with a human in their cell depending on the payoffs for the gnome without a human in their cell? (Assuming they know whether there's a human in their cell, and know the three different possible sets of payoffs for the available actions - if these constraints were relaxed I think it would be clearly doable. As it is I'm doubtful.)
Introducing Corrigibility (an FAI research subfield)

I think this means "indifference" isn't really the right term any more, because the agent is not actually indifferent between the two sets of observations, and doesn't really need to be.

So, how about U(a1, o, a2) =
UN(a1, o, a2) + max_b(US(a1, o, b)), if o is not in Press
US(a1, o, a2) + max_b(UN(a1, o, b)), if o is in Press

or, in your notation, U(a1, o, a2) = g(a1, o) + UN(a1, o, a2) if o is not in Press, or US(a1, o, a2) + f(a1, o) if o is in Press.
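To make the proposal concrete, here is a toy sketch of that combined utility. Everything below (the action names, the example UN and US) is hypothetical scaffolding for illustration, not notation from the paper:

```python
# Toy sketch of the combined utility above: each branch gets its own
# utility plus the best *achievable* value of the other branch's
# utility.

def make_U(UN, US, press, A2):
    """UN, US: functions (a1, o, a2) -> float; press: observations that
    count as a button-press; A2: the available second actions."""
    def U(a1, o, a2):
        if o not in press:
            # normal branch: actual UN, topped up by the best US
            return UN(a1, o, a2) + max(US(a1, o, b) for b in A2)
        else:
            # shutdown branch: actual US, topped up by the best UN
            return US(a1, o, a2) + max(UN(a1, o, b) for b in A2)
    return U

# Example: UN rewards working, US rewards shutting down.
UN = lambda a1, o, a2: 1.0 if a2 == "work" else 0.0
US = lambda a1, o, a2: 1.0 if a2 == "shutdown" else 0.0
U = make_U(UN, US, press={"button"}, A2=["work", "shutdown"])

# The best attainable U is the same whether or not the button is
# pressed, so the agent need not manipulate the button to do well:
assert max(U("a1", "ok", b) for b in ["work", "shutdown"]) == 2.0
assert max(U("a1", "button", b) for b in ["work", "shutdown"]) == 2.0
```

The max_b top-up terms play the role of f and g here: they equalize the best attainable value across the two branches without affecting which a2 is optimal within a branch.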

Introducing Corrigibility (an FAI research subfield)

OK, you're right on that point; I misunderstood the "managing the news" problem because I hadn't quite realised that it was about shifting observations between the Press/NotPress sets. As you've said, the only resolution is to select a1 based on
E[max_b(UN(a1, O, b)) | O; a1]
and not
E[max_b(UN(a1, O, b)) | O not in Press; a1]

Introducing Corrigibility (an FAI research subfield)

OK, following on from Karl's comment below, which seems to resolve Section 4.2, I think it shouldn't be that difficult to resolve some or all of the Section 4.1 concerns as well. In order to satisfy Desideratum 4, we need the agent to be incentivised to build in safeguards to ensure that its possible future actions will be rated well according to US in cases where the shutdown button is pressed. The most obvious way to do this is to penalise situations where there is no good shutdown action according to US and reward situations where there is such an actio... (read more)

**Karl** (7y, score 5): Firstly, the important part of my modification to the indifference formalism is not about conditioning on the actual o; it's the fact that in evaluating the expectation of UN it takes the action in A2 (for a given pair (a1, o)) which maximizes UN, instead of the action which maximizes U (note that U is equal to US in the case that o is not in Press).

Secondly, an agent which chooses a1 by simply maximizing E[UN | NotPress; a1] + E[US | Press; a1] does exhibit pathological behaviors. In particular, there will still be incentives to manage the news, but from both sides now (there is an incentive to cause the button to be pressed in the event of information which is bad news from the point of view of UN, and incentives to cause the button to not be pressed in the event of information which is bad news from the point of view of US).
Anthropic decision theory for selfish agents

I guess your comment means that you must have blinked an eye, so your comment can't be completely true. That said, as discussions of pre-emptively submissive gnomes go, I would generally expect the amount of eye-blinking on LW to be well below average ^_~

**Lumifer** (7y, score -1): I arched my eyebrow :-P
Anthropic decision theory for selfish agents

OK, time for further detail on the problem with pre-emptively submissive gnomes. Let's focus on the case of total utilitarianism, and begin by looking at the decision in unlinked form, i.e. we assume that the gnome's advice affects only one human if there is one in the room, and zero humans otherwise. Conditional on there being a human in cell B, the expected utility of the human in cell B buying a ticket for $x is, indeed, (1/3)(-x) + (2/3)(1-x) = 2/3 - x, so the breakeven is obviously at x = 2/3. However, if we also assume that the gnome in the other cel... (read more)

**Lumifer** (7y, score -1): One of the aspects of what makes LW what it is -- people with serious expressions on their faces discuss the problems with pre-emptively submissive gnomes and nobody blinks an eye X-D
Anthropic decision theory for selfish agents

Yep, I think that's a good summary. UDT-like reasoning depends on the utility values of counterfactual worlds, not just real ones.

**Stuart_Armstrong** (7y, score 2): I'm starting to think this is another version of the problem of personal identity... But I want to be thorough before posting anything more.
Anthropic decision theory for selfish agents

I don't think that works, because 1) isn't actually satisfied. The selfish human in cell B is indifferent over worlds where that same human doesn't exist, but the gnome is not indifferent.

Consequently, I think that as one of the humans in your "closest human" case you shouldn't follow the gnome's advice, because the gnome's recommendation is being influenced by a priori possible worlds that you don't care about at all. This is the same reason a human with utility function T shouldn't follow the gnome recommendation of 4/5 from a gnome with utili... (read more)

**Stuart_Armstrong** (7y, score 2): Let's ditch the gnomes; they are contributing little to this argument. My "average utilitarian = selfish" argument was based on the fact that if you changed the utility of everyone who existed from one system to the other, then people's utilities would be the same, given that they existed. The argument here is that if you changed the utility of everyone from one system to the other, then this would affect their counterfactual utility in the worlds where they don't exist. That seems... interesting. I'll reflect further.
**Stuart_Armstrong** (7y, score 1): I think I'm starting to see the argument...
Anthropic decision theory for selfish agents

Having established the nature of the different utility functions, it's pretty simple to show how the gnomes relate to these. The first key point to make, though, is that there are actually two distinct types of submissive gnomes and it's important not to confuse the two. This is part of the reason for the confusion over Beluga's post.
Submissive gnome: I adopt the utility function of any human in my cell, but am completely indifferent otherwise.
Pre-emptively submissive gnome: I adopt the utility function of any human in my cell; if there is no human in my c... (read more)

1Stuart_Armstrong7yI like your analysis. Interestingly, the gnomes advise in the T and A cases for completely different reasons than in the S case. But let me modify the case slightly: now the gnomes adopt the utility function of the closest human. This makes no difference to the T and A cases. But now in the S case, the gnomes have a linked decision, and E[S] = 0.25(-x) + 0.25(-x) + 0.5(1-x) = 0.5-x This also seems to satisfy "1) Their utility functions coincide exactly over all a priori possible worlds. 2) The humans do not have any extra information that the gnomes do not." Also, the gnomes are now deciding the T, A and S cases for the same reasons (linked decisions).
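As a quick check of the arithmetic in the reply above, here is a minimal sketch; the 1/4, 1/4, 1/2 weights and the payoffs are taken directly from the comment's expression for E[S]:

```python
from fractions import Fraction

def expected_S(x):
    """E[S] for paying price x, per the comment: two heads-world
    branches weighted 1/4 each (pay x, win nothing) and one tails
    branch weighted 1/2 (pay x, win 1)."""
    x = Fraction(x)
    return Fraction(1, 4) * (-x) + Fraction(1, 4) * (-x) + Fraction(1, 2) * (1 - x)

# The expression simplifies to 1/2 - x, so the break-even price is x = 1/2.
print(expected_S(Fraction(1, 2)))  # → 0
print(expected_S(0))               # → 1/2
```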
Anthropic decision theory for selfish agents

I think I can resolve the confusion here, but as a quick summary, I'm quite sure Beluga's argument holds up. The first step is to give a clear statement of what the difference is between the indexical and non-indexical versions of the utility functions. This is important because the UDT approach translates to "What is the optimal setting for decision variable X, in order to maximise the expected utility over all a priori possible worlds that are influenced by decision variable X?" On the basis of UDT or UDT-like principles such as an assumption o... (read more)

4lackofcheese7yHaving established the nature of the different utility functions, it's pretty simple to show how the gnomes relate to these. The first key point to make, though, is that there are actually two distinct types of submissive gnomes and it's important not to confuse the two. This is part of the reason for the confusion over Beluga's post. Submissive gnome: I adopt the utility function of any human in my cell, but am completely indifferent otherwise. Pre-emptively submissive gnome: I adopt the utility function of any human in my cell; if there is no human in my cell I adopt the utility function they would have had if they were here. The two are different precisely in the key case that Stuart mentioned---the case where there is no human at all in the gnome's cell. Fortunately, the utility function of the human who will be in the gnome's cell (which we'll call "cell B") is entirely well-defined, because any existing human in the same cell will always end up with the same utility function. The "would have had" case for the pre-emptively submissive gnomes is a little stranger, but it still makes sense---the gnome's utility would correspond to the anti-indexical component JU of the human's utility function U (which, for selfish humans, is just zero). Thus we can actually remove all of the dangling references in the gnome's utility function, as per the discussion between Stuart and Beluga. If U is the utility function the human in cell B has (or would have), then the submissive gnome's utility function is IU (note the indexicalisation!) whereas the pre-emptively submissive gnome's utility function is simply U. Following Beluga's post here [http://lesswrong.com/r/discussion/lw/l58/anthropic_decision_theory_for_selfish_agents/bhlz] , we can use these ideas to translate all of the various utility functions to make them completely objective and observer-independent, although some of them reference cell B specifically. 
If we refer to the second cell as "cell C", swapping between
Anthropic decision theory for selfish agents

There's some confusion here that needs to be resolved, and you've correctly pinpointed that the issue is with the indexical versions of the utility functions, or, equivalently, the gnomes who don't see a human at all.

I think I have a comprehensive answer to these issues, so I'm going to type it up now.

On Caring

A good point. By abuse I wouldn't necessarily mean anything blatant though, just that selfish people are happy to receive resources from selfless people.

Sure, and there isn't really anything wrong with that as long as the person receiving the resources really needs them.

Valuing people equally by default when their instrumental value isn't considered. I hope I didn't misunderstand you. That's about as extreme as it gets, but I suppose you could get even more extreme by valuing other people more highly than yourself.

The term "altruism" is often ... (read more)

On Caring

That's one way to put it, yes.

On Caring

One can reasonably argue the other way too. New children are easier to make than new adults.

True. However, regardless of the relative value of children and adults, it is clear that one ought to devote significantly more time and effort to children than to adults, because they are incapable of supporting themselves and are necessarily in need of help from the rest of society.

Since she has finite resources, is there a practical difference?

Earlier I specifically drew a distinction between devoting time and effort and valuation; you don't have to value ... (read more)

2hyporational7yA good point. By abuse I wouldn't necessarily mean anything blatant though, just that selfish people are happy to receive resources from selfless people. Valuing people equally by default when their instrumental value isn't considered. I hope I didn't misunderstand you. That's about as extreme as it gets, but I suppose you could get even more extreme by valuing other people more highly than yourself.
One Life Against the World

If you have the values already and you don't have any reason to believe the values themselves could be problematic, does it matter how you got them?

It may be that an altruistic high in the past has led you to value altruism in the present, but what matters in the present is whether you value the altruism itself over and above the high.

On Caring

Accounting for possible failure modes and the potential effects of those failure modes is a crucial part of any correctly done "morality math".

Granted, people can't really be relied upon to actually do it right, and it may not be a good idea to "shut up and multiply" if you can expect to get it wrong... but then failing to shut up and multiply can also have significant consequences. The worst thing you can do with morality math is to only use it when it seems convenient to you, and ignore it otherwise.

However, none of this talk of failu... (read more)

On Caring

Probably not just any random person, because one can reasonably argue that children should be valued more highly than adults.

However, I do think that the mother should hold other peoples' children as being of equal value to her own. That doesn't mean valuing her own children less, it means valuing everyone else's more.

Sure, it's not very realistic to expect this of people, but that doesn't mean they shouldn't try.

1hyporational7yOne can reasonably argue the other way too. New children are easier to make than new adults. Since she has finite resources, is there a practical difference? It seems to me extreme altruism is so easily abused that it will inevitably wipe itself out in the evolution of moral systems.
On Caring

So, either there is such a thing as the "objective" value and hence, implicitly, you should seek to approach that value, or there is not.

I don't see any reason to believe in an objective worth of this kind, but I don't really think it matters that much. If there is no single underlying value, then the act of assigning your own personal values to people is still the same thing as "passing judgement on the worth of humans", because it's the only thing those words could refer to; you can't avoid the issue simply by calling it a subjective ... (read more)

3Lumifer7ySo, for example, you believe that to a mother the value of her own child should be similar to that of a random person anywhere on Earth -- right? It's a "mere circumstance" that this particular human happens to be her child.
On Caring

My actions alone don't necessarily imply a valuation, or at least not one that makes any sense.

There are a few different levels at which one can talk about what it means to value something, and revealed preference is not the only one that makes sense.

2hyporational7yIs this basically another way of saying that you're not the king of your brain, or something else?
On Caring

I'm not entirely sure what a "personal perception of the value of a human being" is, as distinct from the value or worth of a human being. Surely the latter is what the former is about?

Granted, I guess you could simply be talking about their instrumental value to yourself (e.g. "they make me happy"), but I don't think that's really the main thrust of what "caring" is.

3Lumifer7yThe "worth of a human being" implies that there is one, correct, "objective" value for that human being. We may not be able to observe it directly so we just estimate it, with some unavoidable noise and errors, but theoretically the estimates will converge to the "true" value. The worth of a human being is a function with one argument: that human being. The "personal perception of the value of a human being" implies that there are multiple, different, "subjective" values for the same human being. There is no single underlying value to which the estimates converge. The personal perception of a value is a function with two arguments: who is evaluated and who does the evaluation.
A few thoughts on a Friendly AGI (safe vs friendly, other minds problem, ETs and more)

I can (and do) believe that consciousness and subjective experience are things that exist, and are things that are important, without believing that they are in some kind of separate metaphysical category.

0the-citizen7yI understand, but I just want to urge you to examine the details of that really closely, starting with examining "consciousness"'s place in Dualist thought. What I'm suggesting is that many of us have got a concept from a school of thought we explicitly disagree with embedded in our thinking, and that's worth looking into. It's always alluring to dismiss things that run contrary to the existence of something we feel is important, but sometimes it's in those rare times when we question our core values and thoughts that we make the most profound leaps forward.
One Life Against the World

There is no need for morality to be grounded in emotional effects alone. After all, there is also a part of you that thinks that there is, or might be, something "horrible" about this, and that part also has input into your decision-making process.

Similarly, I'd be wary of your point about utility maximisation. You're not really a simple utility-maximising agent, so it's not like there's any simple concept that corresponds to "your utility". Also, the concept of maximising "utility generally" doesn't really make sense; there i... (read more)

0hyporational7yThe high is a mechanism by which values are established. Reward or punishment in the past but not necessarily in the present is sufficient for making you value something in the present. Because of our limited memories introspection is pretty useless for figuring out whether you value something because of the high or not.
Questions on Theism

It's a rather small sample size, isn't it? I don't think you can draw much of a conclusion from it.

Superintelligence Reading Group - Section 1: Past Developments and Present Capabilities

The game AIs for popular strategy games are often bad because the developers don't actually have the time and resources to make a really good one, and it's not a high priority anyway - most people playing games like Civilization want an AI that they'll have fun defeating, not an AI that actually plays optimally.

I think you're mostly correct on this. Sometimes difficult opponents are needed, but for almost all games that can be trivially achieved by making the AI cheat rather than improving the algorithms. That said, when playing a game vs an AI you do w... (read more)

Superintelligence Reading Group - Section 1: Past Developments and Present Capabilities

I wouldn't say that poker is "much easier than the classic deterministic games", and poker AI still lags significantly behind humans in several regards. Basically, the strongest poker bots at the moment are designed around solving for Nash equilibrium strategies (of an abstracted version of the game) in advance, but this fails in a couple of ways:

1. These approaches haven't really been extended past 2- or 3-player games.
2. Playing a NE strategy makes sense if your opponent is doing the same, but your opponent almost always won't be. Thus, in ord
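To make the "solve for an equilibrium in advance" idea concrete, here is a hedged sketch on a toy game. Real poker bots work on large abstracted games with far more sophisticated equilibrium-finding methods (e.g. counterfactual regret minimisation); this only shows simple fictitious play recovering the mixed Nash equilibrium of matching pennies:

```python
# Fictitious play on matching pennies: each player repeatedly
# best-responds to the opponent's empirical action frequencies.
# For two-player zero-sum games the empirical frequencies converge
# to a Nash equilibrium (here, the 50/50 mix over heads/tails).
payoff = [[1, -1], [-1, 1]]  # row player wants a match

def best_response(opp_counts, row_player):
    # Expected payoff of each of our actions against the opponent's empirical mix
    if row_player:
        ev = [sum(payoff[a][b] * opp_counts[b] for b in range(2)) for a in range(2)]
    else:
        ev = [sum(-payoff[a][b] * opp_counts[a] for a in range(2)) for b in range(2)]
    return 0 if ev[0] >= ev[1] else 1

row_counts, col_counts = [1, 1], [1, 1]
for _ in range(20000):
    r = best_response(col_counts, row_player=True)
    c = best_response(row_counts, row_player=False)
    row_counts[r] += 1
    col_counts[c] += 1

row_heads = row_counts[0] / sum(row_counts)
print(round(row_heads, 2))  # close to 0.5
```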
Superintelligence Reading Group - Section 1: Past Developments and Present Capabilities

Although computers beat humans at board games without needing any kind of general intelligence at all, I don't think that invalidates game-playing as a useful domain for AGI research.

The strength of AI in games is, to a significant extent, due to human input: the incorporation of substantial domain knowledge into the relatively simple algorithms that game AIs are built on.

However, it is quite easy to make game AI into a far, far more challenging problem (and, I suspect, a rather more widely applicable one)---consider the design of algorithms... (read more)

On Caring

I agree; I don't see a significant difference between thinking that I ought to value other human beings equally but failing to do so, and actually viewing them equally and not acting accordingly. If I accept either (1) or (2) it's still a moral failure, and it is one that I should act to correct. In either case, what matters is the actions that I ought to take as a result (i.e. effective altruism), and I think the implications are the same in both cases.

That being said, I guess the methods that I would use to correct the problem would be different in eithe... (read more)

3Jiro7yYou seem to be agreeing by not really agreeing. What does it even mean to say "I value other people equally but I don't act on that"? Your actions imply a valuation, and in that implied valuation you clearly value yourself more than other people. It's like saying "I prefer chocolate over vanilla ice cream, but if you give me them I'll always pick the vanilla". Then you don't really prefer chocolate over vanilla, because that's what it means to prefer something.
On Caring

Yes, if I really ought to value other human beings equally then it means I ought to devote a significant amount of time and/or money to altruistic causes, but is that really such an absurd conclusion?

Perhaps I don't do those things, but that doesn't mean I can't and it doesn't mean I shouldn't.

1Jiro7yYou can say either 1. You ought to value other human beings equally, but you don't. 2. You do value other human beings equally, and you ought to act in accordance with that valuation, but you don't. You appear to be claiming 2 and denying 1. However, I don't see a significant difference between 1 and 2; 1 and 2 result in exactly the same actions by you and it ends up just being a matter of semantics.
Applications of logical uncertainty

Here's some of the literature:
Heuristic search as evidential reasoning by Hansson and Mayer
A Bayesian Approach to Relevance in Game Playing by Baum and Smith

and also work following Stuart Russell's concept of "metareasoning"
On Optimal Game-Tree Search using Rational Meta-Reasoning by Russell and Wefald
Principles of metareasoning by Russell and Wefald
and the relatively recent
Selecting Computations: Theory and Applications by Hay, Russell, Tolpin and Shimony.

On the whole, though, it's relatively limited. At a bare minimum there is plenty of room ... (read more)

Applications of logical uncertainty

Surely probability or something very much like it is conceptually the right way to deal with uncertainty, whether it's logical uncertainty or any other kind? Granted, most of the time you don't want to deal with explicit probability distributions and Bayesian updates because the computation can be expensive, but when you work with approximations you're better off if you know what it is you're approximating.

In the area of search algorithms, I think these kinds of approaches are woefully underrepresented, and I don't think it's because they aren't particula... (read more)

9lackofcheese7yHere's some of the literature: Heuristic search as evidential reasoning [http://arxiv.org/abs/1304.1509] by Hansson and Mayer A Bayesian Approach to Relevance in Game Playing [http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.26.7961] by Baum and Smith and also work following Stuart Russell's concept of "metareasoning" On Optimal Game-Tree Search using Rational Meta-Reasoning [http://ijcai.org/Past%20Proceedings/IJCAI-89-VOL1/PDF/053.pdf] by Russell and Wefald Principles of metareasoning [http://www.agent.ai/doc/upload/200403/russ91_1.pdf] by Russell and Wefald and the relatively recent Selecting Computations: Theory and Applications [http://arxiv.org/abs/1207.5879] by Hay, Russell, Tolpin and Shimony. On the whole, though, it's relatively limited. At a bare minimum there is plenty of room for probabilistic representations in order to give a better theoretical foundation, but I think there is also plenty of practical benefit to be gained from those techniques as well. As a particular example of the applicability of these methods, there is a phenomenon referred to as "search pathology" or "minimax pathology", in which for certain tree structures searching deeper actually leads to worse results, when using standard rules for propagating value estimates up a tree (most notably minimax). From a Bayesian perspective this clearly shouldn't occur, and hence this phenomenon of pathology must be the result of a failure to correctly update on the evidence.
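As a toy illustration of the information loss involved in backing up point estimates (my own example, not from the thread): suppose move A is known to be worth 0.5, while move B is worth 0 or 1 with equal probability. Propagating means with max rates the position at 0.5, but the Bayesian value, which accounts for the fact that the uncertainty about B can be resolved before the choice is made, is higher:

```python
# Move A: value known to be 0.5.
# Move B: value 0.0 or 1.0, each with probability 1/2.
a_value = 0.5
b_outcomes = [0.0, 1.0]
b_mean = sum(b_outcomes) / len(b_outcomes)

# Backing up point estimates: max of the means.
max_of_means = max(a_value, b_mean)
# Bayesian backup: expected value of the max, E[max(A, B)].
mean_of_max = sum(max(a_value, b) for b in b_outcomes) / len(b_outcomes)

print(max_of_means)  # 0.5
print(mean_of_max)   # 0.75: resolving the uncertainty about B has positive expected value
```

The gap between the two numbers (0.25 here) is exactly the kind of quantity that the metareasoning literature above tries to estimate when deciding which computations are worth performing.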