# All of Chantiel's Comments + Replies

It doesn't matter how many fake versions of you hold the wrong conclusion about their own ontological status, since those fake beliefs exist in fake versions of you. The moral harm caused by a single real Chantiel thinking they're not real is infinitely greater than infinitely many non-real Chantiels thinking they are real.

Interesting. When you say "fake" versions of myself, do you mean simulations? If so, I'm having a hard time seeing how that could be true. Specifically, what's wrong with me thinking I might not be "real"? I mean, if I thought I was i...

"If the real Chantiel is so correlated with you that they will do what you will do, then you should believe you're real so that the real Chantiel will believe they are real, too. This holds even if you aren't real."

By "real", do you mean non-simulated? Are you saying that even if 99% of Chantiels in the universe are in simulations, then I should still believe I'm not in one? I don't know how I could convince myself of being "real" if 99% of Chantiels aren't.

Do you perhaps mean I should act as if I were non-simulated, rather than literally being non-simulated?

MackGopherSena
[edited]

Thanks for the response, Gwern.

he is explicit that the minds in the simulation may be only tenuously related to 'real'/historical minds;

Oh, I guess I missed this. Do you know where Bostrom said the "simulations" can be only tenuously related to real minds? I was rereading the paper but didn't see mention of this. I'm just surprised, because normally I don't think zoo-like things would be considered simulations.

This falls under either #1 or #2, since you don't say what human capabilities are in the zoo or explain how exactly this zoo situation matters to

...

I've realized I'm somewhat skeptical of the simulation argument.

The simulation argument proposed by Bostrom argued, roughly, that either almost exactly all Earth-like worlds don't reach a posthuman level, almost exactly all such civilizations don't go on to build many simulations, or that we're almost certainly in a simulation.
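The paper makes the trilemma quantitative with one formula. Roughly as Bostrom writes it (treat the exact notation as my paraphrase): if $f_P$ is the fraction of human-level civilizations that reach a posthuman stage and $\bar{N}$ is the average number of ancestor-simulations such a civilization runs, then the fraction of observers with human-type experiences who are simulated is $f_{\text{sim}} = f_P \bar{N} / (f_P \bar{N} + 1)$. A minimal sketch showing how each horn corresponds to a regime of the two parameters:

```python
def fraction_simulated(f_p: float, n_bar: float) -> float:
    """Fraction of observers with human-type experiences who live in simulations.

    f_p:   fraction of human-level civilizations that reach a posthuman stage.
    n_bar: average number of ancestor-simulations run by such a civilization.
    """
    x = f_p * n_bar
    return x / (x + 1)


# Horn 1: almost no civilizations reach a posthuman stage.
print(fraction_simulated(1e-9, 1e6))
# Horn 2: posthuman civilizations almost never run simulations.
print(fraction_simulated(0.1, 1e-9))
# Horn 3: otherwise, almost all such observers are simulated.
print(fraction_simulated(0.1, 1e6))
```

Unless $f_P \bar{N}$ is small, $f_{\text{sim}}$ is close to 1, which is why the argument only needs one of the three disjuncts to hold.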

Now, if we knew that the only two sorts of creatures that experience what we experience are either in simulations or the actual, original, non-simulated Earth, then I can see why the argument would be reasonable. However, I don't kno...

gwern
I think you should reread the paper [https://www.simulation-argument.com/simulation.pdf]. This falls under either #1 or #2, since you don't say what human capabilities are in the zoo or explain how exactly this zoo situation matters to running simulations; do we go extinct at some time long in the future when our zookeepers stop keeping us alive (and "go extinct before reaching a “posthuman” stage"), having never become powerful zookeeper-level civs ourselves, or are we not permitted to ("extremely unlikely to run a significant number of simulations")? This is just fork #3: "we are in a simulation". At no point does fork #3 require it to be an exact true perfect-fidelity simulation of an actual past, and he is explicit that the minds in the simulation may be only tenuously related to 'real'/historical minds; if aliens would be likely to create Earth-like worlds, for any reason, that's fine because that's what's necessary, because we observe an Earth-like world (see the indifference principle section).
TLW
Interesting. I am also skeptical of the simulation argument, but for different reasons. My main issue is: the normal simulation argument requires violating the Margolus–Levitin theorem[1], as it requires that you can do an arbitrary amount of computation[2] via recursively simulating[3]. This either means that the Margolus–Levitin theorem is false in our universe (which would be interesting), that we're a 'leaf' simulation where the Margolus–Levitin theorem holds but there are many universes where it does not (which would also be interesting), or that we have a non-zero chance of not being in a simulation. This is essentially a justification for 'almost exactly all such civilizations don't go on to build many simulations'.

1. A fundamental limit on computation: $\le 6 \times 10^{33}$ operations/second/Joule.
2. Note: I'm using 'amount of computation' as shorthand for 'operations / second / Joule'. This is a little bit different than normal, but meh.
3. Call the scaling factor (the amount of computation simulated per unit of computation spent on simulating it) $C$. So e.g. $C = 0.5$ means that to simulate 1 unit of computation you need 2 units of computation. If $C \ge 1$, then you can violate the Margolus–Levitin theorem simply by recursively sub-simulating far enough. If $C < 1$, then a universe that can do $X$ computation can simulate no more than $CX$ total computation regardless of how deep the tree is, in which case there's at least a $1 - C$ chance that we're in the 'real' universe.
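The bound in footnote 3 can be checked in a toy model. The sketch below makes a simplifying assumption that is mine, not TLW's: a single chain of nested simulations in which every level keeps a fraction `f` of its computation for its own inhabitants and spends the rest simulating one child at efficiency `C`. Writing $r = C(1 - f)$, the share of all experienced computation that belongs to the base universe tends to $1 - r$, which is at least $1 - C$; when $r \ge 1$ the simulated total grows without bound as the chain deepens.

```python
def experienced_computation(C: float, f: float, depth: int, X: float = 1.0):
    """Hypothetical chain of nested simulations.

    Each level keeps a fraction f of its computation for its own inhabitants
    and spends the remaining (1 - f) simulating one child at efficiency C.
    Returns (real, simulated): computation experienced in the base universe,
    and the total experienced across all simulated levels.
    """
    real = f * X
    budget = X
    simulated = 0.0
    for _ in range(depth):
        budget = C * (1 - f) * budget  # total computation one level further down
        simulated += f * budget        # that level's inhabitants use a fraction f
    return real, simulated


def probability_real(C: float, f: float, depth: int) -> float:
    """Share of experienced computation that is not simulated."""
    real, simulated = experienced_computation(C, f, depth)
    return real / (real + simulated)


print(probability_real(0.5, 0.1, 1000))           # approaches 1 - 0.5 * 0.9 = 0.55
print(experienced_computation(2.0, 0.1, 50)[1])   # r = 1.8: far more than the base X
```

The single-chain shape is only one possible tree; branching into several children at each level splits the same budget, so it does not change the $C < 1$ bound.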

For robustness, you have a dataset that's drawn from the wrong distribution, and you need to act in a way that you would've acted if it was drawn from the correct distribution. If you have an amplification dynamic that moves models towards few attractors, then changing the starting point (training distribution compared to target distribution) probably won't matter. At that point the issue is for the attractor to be useful with respect to all those starting distributions/models. This doesn't automatically make sense, comparing models by usefulness doesn't

...

I've been thinking about what you've said about iterated amplification, and there are some things I'm unsure of. I'm still rather skeptical of the benefit of iterated amplification, so I'd really appreciate a response.

You mentioned that iterated amplification can be useful when you have only very limited, domain-specific models of human behavior, where such models would be unable to come up with the ability to create code. However, there are two things I'm wondering about. The first is that it seems to me that, for a wide range of situations, you need a ge...

Amplification induces a dynamic in the model space, it's a concept of improving models (or equivalently in this context, distributions). This can be useful when you don't have good datasets, in various ways. For robustness, you have a dataset that's drawn from the wrong distribution, and you need to act in a way that you would've acted if it was drawn from the correct distribution. If you have an amplification dynamic that moves models towards few attractors, then changing the starting point (training distribution compared to target distribution) probably won't matter. At that point the issue is for the attractor to be useful with respect to all those starting distributions/models. This doesn't automatically make sense, comparing models by usefulness doesn't fall out of the other concepts. For chess, you'd use the idea of winning games (better models are those that win more, thus amplification should move models towards winning), which is not inherent in any dataset of moves. For AGI, this is much more nebulous, but things like reflection (thinking about a problem longer, conferring with others, etc.) seem like a possible way of bootstrapping a relevant amplification, if goodharting is kept in check throughout the process.
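The attractor point can be illustrated with a toy dynamic. In this sketch (all names and numbers are hypothetical), a "model" is a distribution over three chess moves, and one amplification step reweights the moves by an assumed win rate, the usefulness criterion that is not inherent in any dataset of moves. Two very different starting distributions, standing in for the training and target distributions, end up at the same attractor.

```python
import math


def amplify(dist, win_rates, beta=1.0):
    """One amplification step: shift probability toward moves that win more often."""
    weights = [p * math.exp(beta * w) for p, w in zip(dist, win_rates)]
    total = sum(weights)
    return [w / total for w in weights]


def run(dist, win_rates, steps=100):
    """Apply the amplification step repeatedly."""
    for _ in range(steps):
        dist = amplify(dist, win_rates)
    return dist


win_rates = [0.2, 0.5, 0.9]       # hypothetical usefulness of each move
training_start = [0.7, 0.2, 0.1]  # model fit to the "wrong" distribution
target_start = [0.1, 0.1, 0.8]    # model fit to the "right" distribution

print([round(p, 4) for p in run(training_start, win_rates)])
print([round(p, 4) for p in run(target_start, win_rates)])
```

The convergence comes entirely from the win rates: remove them and the dynamic has no reason to move anywhere. The multiplicative update also never revives a move that starts with probability zero, one small instance of the starting point still mattering.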

I hadn't fully appreciated the difficulty that could result from AIs having alien concepts, so thanks for bringing it up.

However, it seems to me that this would not be a big problem, provided the AI is still interpretable. I'll provide two ways to handle this.

For one, you could potentially translate the human concepts you care about into statements using the AI's concepts. Even if the AI doesn't use the same concepts people do, AIs are still incentivized to form a detailed model of the world. If you can have access to all the AI's world model, but still ca...

Another problem is that the system cannot represent and communicate the whole predicted future history of the universe to us.

This is a good point and one that I, foolishly, hadn't considered.

However, it seems to me that there is a way to get around this. Specifically, just give the query-answerers the option to refuse to evaluate the utility of a description of a possible future. If this happens, the AI won't be able to have its utility function return a value for such a possible future.

To see how to do this, note that if a description of a possible ...
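A minimal sketch of the refusal option, assuming the query-answerers' verdicts can be modelled as a partial utility function. The names (`evaluate`, `best_action`, the example descriptions) are hypothetical, and the rule of simply never selecting an action whose predicted future went unevaluated is my assumption, not necessarily how the rest of the proposal handles such futures.

```python
from typing import Dict, Optional


def evaluate(description: str, verdicts: Dict[str, Optional[float]]) -> Optional[float]:
    """Utility of a described future, or None if the query-answerers refused to rate it."""
    return verdicts.get(description)


def best_action(predicted: Dict[str, str],
                verdicts: Dict[str, Optional[float]]) -> Optional[str]:
    """Pick the action whose predicted future has the highest utility.

    Futures the query-answerers refused to evaluate get no utility value,
    so actions leading to them are never selected by this rule.
    """
    scored = {}
    for action, description in predicted.items():
        utility = evaluate(description, verdicts)
        if utility is not None:
            scored[action] = utility
    if not scored:
        return None  # nothing evaluable: the agent has no basis for choosing
    return max(scored, key=scored.get)


predicted = {"a": "short, checkable future", "b": "future too large to describe"}
verdicts = {"short, checkable future": 3.0, "future too large to describe": None}
print(best_action(predicted, verdicts))
```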

Sorry for taking a ridiculously long time to get back to you. I was dealing with some stuff.

This works great when you can recognize good things within the representation the AI uses to think about the world. But what if that's not true?

Yes, that is correct. As I said in the article, a high degree of interpretability is necessary to use the idea.

It's true that interpretability is required, but the key point of my scheme is this: interpretability is all you need for intent alignment, provided my scheme is correct. I don't know of any other alignment strate...

Charlie Steiner
My point was that you don't just need interpretability, you need the AI to "meet you halfway" by already learning the right concept that you want to interpret. You might also need it to not learn "spurious" concepts that fit the data but generalize poorly. This doesn't happen by default AFAICT, it needs to be designed for.

I've made a few posts that seemed to contain potentially valuable ideas related to AI safety. However, I got almost no feedback on them, so I was hoping some people could look at them and tell me what they think. They still seem valid to me, and if they are, they could potentially be very valuable contributions. And if they aren't valid, then I think knowing the reason for this could potentially help me a lot in my future efforts towards contributing to AI safety.

The posts are:

...

FWIW, this conclusion is not clear to me. To return to one of my original points: I don't think you can dodge this objection by arguing from potentially idiosyncratic preferences, even perfectly reasonable ones; rather, you need it to be the case that no rational agent could have different preferences. Either that, or you need to be willing to override otherwise rational individual preferences when making interpersonal tradeoffs.

Yes, that's correct. It's possible that there are some agents with consistent preferences that really would wish to get extra...

If the impact measure were poorly implemented, then I think such an impact-reducing AI could indeed result in the world turning out that way. However, note that the technique in the paper is intended to make the world in which the AI is turned on as similar as possible, across a very wide range of variables, to the world in which it isn't. So, you can potentially avoid the AI-controlled-drone scenario by including the variable "number of AI-controlled drones in the world" or something correlated with it, as these variables could have quite diffe...
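The comparison can be sketched as a penalty over a chosen set of world variables. Everything here is hypothetical: the variable names, the numbers, and the use of a plain sum of absolute differences in expected values as the distance (the paper's actual measure may differ). The point is only that once a variable like the drone count is included, a drone-heavy outcome registers as high impact.

```python
from typing import Dict


def impact_penalty(on: Dict[str, float], off: Dict[str, float]) -> float:
    """Sum of absolute differences between expected variable values when the AI is
    turned on versus when the stochastic process leaves it off (hypothetical measure)."""
    return sum(abs(on[name] - off[name]) for name in off)


world_off = {"gdp_growth": 0.02, "ai_controlled_drones": 0.0}
quiet_plan = {"gdp_growth": 0.02, "ai_controlled_drones": 0.0}
drone_plan = {"gdp_growth": 0.02, "ai_controlled_drones": 1e6}

print(impact_penalty(quiet_plan, world_off))
print(impact_penalty(drone_plan, world_off))
```

Dropping `ai_controlled_drones` from `world_off` makes both plans score zero, which is the failure mode the reply is worried about.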

I have some concerns about an impact measure proposed here. I'm interested in working on impact measures, and these seem like very serious concerns to me, so it would be helpful to see what others think about them. I asked Stuart, one of the authors, about these concerns, but he said he was too busy to work on dealing with them.

First, I'll give a basic description of the impact measure. Have your AI be turned on by some sort of stochastic process that may or may not result in the AI being turned on. For example, consider sending a photon through a semi-si...

I think this framing muddies the intuition pump by introducing sadistic preferences, rather than focusing just on unboundedness below. I don't think it's necessary to do this: unboundedness below means there's a sense in which everyone is a potential "negative utility monster" if you torture them long enough. I think the core issue here is whether there's some point at which we just stop caring, or whether that's morally repugnant.

Fair enough. So I'll provide a non-sadistic scenario. Consider again the scenario I previously described in which you have a...

conchis
FWIW, this conclusion is not clear to me. To return to one of my original points: I don't think you can dodge this objection by arguing from potentially idiosyncratic preferences, even perfectly reasonable ones; rather, you need it to be the case that no rational agent could have different preferences. Either that, or you need to be willing to override otherwise rational individual preferences when making interpersonal tradeoffs.  To be honest, I'm actually not entirely averse to the latter option: having interpersonal trade-offs determined by contingent individual risk-preferences has never seemed especially well-justified to me (particularly if probability is in the mind [https://www.lesswrong.com/posts/f6ZLxEWaankRZ2Crv/probability-is-in-the-mind]). But I confess it's not clear whether that route is open to you, given the motivation for your system as a whole.

That makes sense, thanks.

Also, in addition to my previous response, I want to note that the issues with unbounded satisfaction measures are not unique to my infinite ethical system. Instead, they are common potential problems with a wide variety of aggregate consequentialist theories.

For example, suppose you're a classical utilitarian with an unbounded utility measure per person. And suppose you know that the universe is finite and will consist of a single inhabitant whose utility follows a Cauchy distribution. Then your expected utilities are...
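Why the expectation fails can be checked directly. For the standard Cauchy density $f(x) = 1/(\pi(1 + x^2))$, the contribution of an interval $[a, b]$ to $\int x f(x)\,dx$ is $\frac{\ln(1 + b^2) - \ln(1 + a^2)}{2\pi}$. The sketch below (function names are illustrative) shows that the result depends on how the two limits go to infinity, so the expected utility is undefined rather than merely large.

```python
import math


def partial_expectation(a: float, b: float) -> float:
    """Integral of x / (pi * (1 + x^2)) over [a, b], in closed form."""
    return (math.log1p(b * b) - math.log1p(a * a)) / (2 * math.pi)


def numeric_check(a: float, b: float, n: int = 200_000) -> float:
    """Midpoint-rule approximation of the same integral, as a sanity check."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * h
        total += x / (math.pi * (1 + x * x))
    return total * h


for T in (1e2, 1e4, 1e6):
    symmetric = partial_expectation(-T, T)     # exactly 0 for every T
    lopsided = partial_expectation(-T, 3 * T)  # tends to ln(3) / pi instead
    print(T, symmetric, lopsided)
```

Letting the upper limit grow $k$ times as fast as the lower one gives a limit of $\ln(k)/\pi$, so different truncation schemes can produce any real number at all.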

conchis
Agreed!

Thanks. I've toyed with similar ideas previously myself. The advantage, if this sort of thing works, is that it conveniently avoids a major issue with preference-based measures: that they're not unique and therefore incomparable across individuals. However, this method seems fragile in relying on a finite number of scenarios: doesn't it break if it's possible to imagine something worse than whatever the currently worst scenario is? (E.g. just keep adding 50 more years of torture.) While this might be a reasonable approximation in some circumstances, it do

...
conchis
Fair. Intuitively though, this feels more like a rescaling of an underlying satisfaction measure than a plausible definition of satisfaction to me. That said, if you're a preferentist, I accept this is internally consistent, and likely an improvement on alternative versions of preferentism.

Yes, and I am obviously not proposing a solution to this problem! More just suggesting that, if there are infinities in the problem that appear to correspond to actual things we care about, then defining them out of existence seems more like deprioritising the problem than solving it.

I think this framing muddies the intuition pump by introducing sadistic preferences, rather than focusing just on unboundedness below. I don't think it's necessary to do this: unboundedness below means there's a sense in which everyone is a potential "negative utility monster" if you torture them long enough. I think the core issue here is whether there's some point at which we just stop caring, or whether that's morally repugnant.

Sorry, sloppy wording on my part. The question should have been "does this actually prevent us having a consistent preference ordering over gambles over universes" (even if we are not able to represent those preferences as maximising the expectation of a real-valued social welfare function)? We know (from lexicographic preferences) that "no-real-valued-utility-function-we-are-maximising-expectations-of" does not immediately imply "no-consistent-preference-ordering" (if we're willing to accept orderings that violate continuity). So pointing to undefined expectations doesn't seem to immediately rule out consistent choice.
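The lexicographic point can be made concrete. Python compares tuples lexicographically, so the sketch below (the outcome encoding is hypothetical) gives a complete, transitive preference ordering in which no amount of the second component can ever outweigh the first. That is exactly the continuity violation mentioned above; the standard example of a consistent ordering with no real-valued utility representation is the lexicographic ordering on pairs of real numbers.

```python
import itertools

# Hypothetical outcomes: (first priority, second priority). The first component
# strictly dominates; the second only breaks ties.
outcomes = [(1, 0.0), (0, 1e12), (1, 5.0), (0, 0.0), (2, -1e12)]

ranked = sorted(outcomes, reverse=True)  # tuples compare lexicographically
print(ranked)

# The ordering is complete and transitive on this set: every pair is comparable,
# and whenever x >= y and y >= z, also x >= z.
for x, y, z in itertools.permutations(outcomes, 3):
    if x >= y and y >= z:
        assert x >= z

# No quantity of the second component compensates for a unit of the first.
print((1, 0.0) > (0, 1e12))
```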

For the record, according to my intuitions, average consequentialism seems perfectly fine to me in a finite universe.

That said, if you don't like using average consequentialism in a finite case, I don't personally see what's wrong with just having a somewhat different ethical system for finite cases. I know it seems ad-hoc, but I think there really is an important distinction between finite and infinite scenarios. Specifically, people have the moral intuition that larger numbers of satisfied lives are more valuable than smaller numbers of them, which avera...

In $p \cdot 1 = (p + \epsilon) \cdot u$, where $p$ is the old probability of being in the first group, the epsilon is smaller than any real number and there is no real small enough that it could characterise the difference between 1 and u.

Could you explain why you think so? I had already explained why would be real, so I'm wondering if you had an issue with my reasoning. To quote my past self:

Remember that if you decide to take a certain action, that implies that other agents who are sufficiently similar to you and in sufficiently similar ...

It's possible that (a) is true, and much of your response seems like it's probably (?) targeted at that claim, but FWIW, I don't think this case can be convincingly made by appealing to contingent personal values: e.g. suggesting that another 50 years of torture wouldn't much matter to you personally won't escape the objection, as long as there's a possible agent who would view their life-satisfaction as being materially reduced in the same circumstances.

To some extent, whether or not life satisfaction is bounded just comes down to how you want to measu...

conchis

Thanks. I've toyed with similar ideas previously myself. The advantage, if this sort of thing works, is that it conveniently avoids a major issue with preference-based measures: that they're not unique and therefore incomparable across individuals. However, this method seems fragile in relying on a finite number of scenarios: doesn't it break if it's possible to imagine something worse than whatever the currently worst scenario is? (E.g. just keep adding 50 more years of torture.) While this might be a reasonable approximation in some circumstances, it doesn't seem like a fully coherent solution to me. IMO, the problem highlighted by the utility monster objection is fundamentally a prioritarian one. A transformation that guarantees boundedness above seems capable of resolving this, without requiring boundedness below (and thus avoiding the problematic consequences that boundedness below introduces).

Given issues with the methodology proposed above for constructing bounded satisfaction functions, it's still not entirely clear to me that this is really a decision, as opposed to an empirical question (which we then need to decide how to cope with from a normative perspective). This seems like it may be a key difference in our perspectives here.

Well, in general terms the answer to this question has to be either (a) bite a bullet, or (b) find another solution that avoids the uncomfortable trade-offs. It seems to me that you'll be willing to bite most bullets here. (Though I confess it's actually a little hard for me to tell whether you're also denying that there's any meaningful tradeoff here; that case still strikes me as less plausible.) If so, that's fine, but I hope you'll understand why to some of us that might feel less like a solution to the issue of infinities, than a decision to just not worry about them on a particular dimension. Perhaps that's ultimately necessary, but it's definitely non-ideal from my perspective.

A final random thought/question: I get

Thanks for the response.

Third, the average view prefers arbitrarily small populations over very large populations, as long as the average wellbeing was higher. For example, a world with a single, extremely happy individual would be favored to a world with ten billion people, all of whom are extremely happy but just ever-so-slightly less happy than that single person.

In an infinite universe, there's already infinitely-many people, so I don't think this applies to my infinite ethical system.

First, consider a world inhabited by a single person enduring ...

conchis

YMMV, but FWIW allowing a system of infinite ethics to get finite questions (which should just be a special case) wrong seems a very non-ideal property to me, and suggests something has gone wrong somewhere. Is it really never possible to reach a state where all remaining choices have only finite implications?

Under my error model you run into trouble when you treat any transfinite amount the same. From that perspective recognising two transfinite amounts that could be different is progress.

I guess this is the part I don't really understand. My infinite ethical system doesn't even think about transfinite quantities. It only considers the prior probability over ending up in situations, which is always real-valued. I'm not saying you're wrong, of course, but I still can't see any clear problem.

Another attempt to throw a situation you might not be able to hand ...

Slider

In $p \cdot 1 = (p + \epsilon) \cdot u$, where $p$ is the old probability of being in the first group, the epsilon is smaller than any real number and there is no real small enough that it could characterise the difference between 1 and u. If you have some odds or expectations that deal with groups and you have other considerations that deal with a finite amount of individuals, you either have the finite people not impact the probabilities at all, or the probabilities will stay infinitesimally close (for which I see a~b being used as I am reading up on infinities), which will conflict with the desiderata of

In the usual way, lexical priorities enter the picture because of something large, but in your system there is a lexical priority because of something small: distinctions so faint that they become separable from the "big league" issues.
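Slider's claim that $\epsilon$ is "smaller than any real number" can be illustrated with a toy number system of the form $a + b\epsilon$, ordered lexicographically. This is a deliberately minimal stand-in (first-order terms only, hypothetical class name), not a construction of the hyperreals. It shows that for the equation $P \cdot 1 = (P + \epsilon) \cdot u$, with $P > 0$ the old probability of being in the first group, no real $u$ works: the $\epsilon$ coefficient forces $u = 0$ while the real part forces $u = 1$.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EpsNumber:
    """Toy number a + b*eps, where eps is a positive infinitesimal (first order only)."""
    a: float  # real (standard) part
    b: float  # coefficient of eps

    def scale(self, r: float) -> "EpsNumber":
        """Multiply by a real number r."""
        return EpsNumber(self.a * r, self.b * r)

    def __lt__(self, other: "EpsNumber") -> bool:
        # Real parts decide first; eps terms only break ties (lexicographic order).
        return (self.a, self.b) < (other.a, other.b)


EPS = EpsNumber(0.0, 1.0)
ZERO = EpsNumber(0.0, 0.0)


def real(r: float) -> EpsNumber:
    return EpsNumber(r, 0.0)


# eps is positive, yet below every positive real we try.
assert ZERO < EPS
assert all(EPS < real(r) for r in (1.0, 1e-9, 1e-300))

# (P + eps) * u == P * 1 has no real solution u.
P = 0.25
lhs = real(P)
candidates = [1.0, 1.0 - 1e-12, 0.999, 0.0]
print([EpsNumber(P, 1.0).scale(u) == lhs for u in candidates])
```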

My point was more that, even if you can calculate the expectation, standard versions of average utilitarianism are usually rejected for non-infinitarian reasons (e.g. the repugnant conclusion) that seem like they would plausibly carry over to this proposal as well.

If I understand correctly, average utilitarianism isn't rejected due to the repugnant conclusion. In fact, it's the opposite: the repugnant conclusion is a problem for total utilitarianism, and average utilitarianism is one way to avoid the problem. I'm just going off what I read on The Stanfo...
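The contrast between the total and average views can be checked with simple arithmetic. The population sizes and wellbeing levels below are made-up numbers chosen only to reproduce the two cases under discussion: the repugnant conclusion, which troubles the total view, and the single-happy-individual case raised against the average view.

```python
def total(pop: int, wellbeing: float) -> float:
    """Total view: sum of wellbeing over everyone."""
    return pop * wellbeing


def average_uniform(pop: int, wellbeing: float) -> float:
    """Average view for a world where everyone has the same wellbeing."""
    return wellbeing


A = (10_000_000_000, 100.0)  # ten billion very happy people
Z = (10**15, 0.01)           # vastly more people, lives barely worth living
S = (1, 100.1)               # one person, slightly happier than anyone in A

for name, world in (("A", A), ("Z", Z), ("S", S)):
    print(name, "total:", total(*world), "average:", average_uniform(*world))
```

The total view ranks Z above A (the repugnant conclusion), while the average view ranks A above Z but also ranks S above A.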

conchis
Re boundedness: I realise now that I may have moved through a critical step of the argument quite quickly above, which may be why this quote doesn't seem to capture the core of the objection I was trying to describe. Let me take another shot.

I am very much not suggesting that 50 years of torture does virtually nothing to [life satisfaction - or whatever other empirical value you want to take as axiologically primitive; happy to stick with life satisfaction as a running example]. I am suggesting that 50 years of torture is terrible for [life satisfaction]. I am then drawing a distinction between [life-satisfaction] and the output of the utility function that you then take expectations of. The reason I am doing this is that it seems to me that whether [life satisfaction] is bounded is a contingent empirical question, not one that can be settled by normative fiat in order to make it easier to take expectations.

If, as a matter of empirical fact, [life satisfaction] is bounded, then the objection I describe will not bite.

If, on the other hand, [life-satisfaction] is not bounded, then requiring the utility function you take expectations of to be bounded forces us to adopt some form of sigmoid mapping from [life satisfaction] to "utility", and this in turn forces us, at some margin, to not care about things that are absolutely awful (from the perspective of [life satisfaction]). (If an extra 50 years of torture isn't sufficiently awful for some reason, then we just need to pick something more awful for the purposes of the argument.)

Perhaps because I didn't explain this very well the first time, what's not totally clear to me from your response is whether you think:

(a) [life satisfaction] is in fact bounded; or

(b) even if [life satisfaction] is unbounded, it's actually ok to not care about stuff that is absolutely (infinitely?) awful from the perspective of [life-satisfaction] because it lets us take expectations more conveniently.

[Intentionally provocati
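The sigmoid worry shows up numerically. The sketch assumes, purely for illustration, that utility is `tanh` of an unbounded life-satisfaction score in arbitrary units; any bounded increasing mapping behaves the same way in the limit. The same extra harm costs less and less utility the worse things already are, until in floating point it costs nothing at all.

```python
import math


def utility(satisfaction: float) -> float:
    """A bounded (sigmoid) mapping from unbounded life satisfaction to utility in (-1, 1)."""
    return math.tanh(satisfaction)


extra_harm = 5.0  # stand-in for "an extra 50 years of torture", arbitrary units
for baseline in (0.0, -5.0, -20.0):
    cost = utility(baseline) - utility(baseline - extra_harm)
    print(f"baseline {baseline:6.1f}: utility cost of the extra harm = {cost:.2e}")
```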
conchis
Re the repugnant conclusion: apologies for the lazy/incorrect example. Let me try again with better illustrations of the same underlying point. To be clear, I am not suggesting these are knock-down arguments; just that, given widespread (non-infinitarian) rejection of average utilitarianisms, you probably want to think through whether your view suffers from the same issues and whether you are ok with that.  Though there's a huge literature on all of this, a decent starting point is here [https://www.utilitarianism.net/population-ethics#the-average-view]:

Oh, I'm sorry; you're right. I messed up on step two of my proposed proof that your technique would be vulnerable to the same problem.

However, it still seems to me that agents using your technique would also be concerningly likely to fail to cross, or to suffer from other problems. Like last time, suppose and that . So if the agent decides to cross, it's either because of the chicken rule, because not crossing counterfactually results in utility -10, or because crossing counterfactually results in utility greater than -10...

abramdemski
Right. This is precisely the sacrifice I'm making in order to solve Troll Bridge. Something like this seems to be necessary for any solution, because we already know that if your expectations of consequences entirely respect entailment, you'll fall prey to the Troll Bridge! In fact, your "stop thinking"/"rollback" proposals have precisely the same feature: you're trying to construct expectations which don't respect the entailment. So I think if you reject this, you just have to accept Troll Bridge. Well, this is precisely not what I mean when I say that the counterfactuals line up with reality. What I mean is that they should be empirically grounded, so, in cases where the condition is actually met, we see the predicted result. Rather than saying this AI's counterfactual expectations are "wrong in reality", you should say they are "wrong in logic" or something like that. Otherwise you are sneaking in an assumption that (a) counterfactual scenarios are real, and (b) they really do respect entailment. We can become confident in my strange counterfactual by virtue of having seen it play out many times, eg, crossing similar bridges many times. This is the meat of my take on counterfactuals: to learn them in a way that respects reality, rather than trying to deduce them. To impose empiricism on them, ie, the idea that they must make accurate predictions in the cases we actually see. And it simply is the case that if we prefer such empirical beliefs to logic, here, we can cross. So in this particular example, we see a sort of evidence that respecting entailment is a wrong principle for counterfactual expectations. The 5&10 problem can also be thought of as evidence against entailment as counterfactual. You have to realize that reasoning in this way amounts to insisting that the correct answer to Troll Bridge is not crossing, because the troll bridge variant you are proposing just punishes anyone whose reasoning differs from entailment. And again, you were also propo

Thanks for clearing some things up. There are still some things I don't follow, though.

You said my system would be ambivalent between sand and insult. I just wanted to make sure I understand what you're saying here. Is insult specifically throwing sand at the same people that get it thrown at in dust, and they get the same amount of sand thrown at them at the same throwing speed? If so, then it seems to me that my system would clearly prefer sand to insult. This is because there is some non-zero chance of an agent, conditioning only on being in this uni...

Slider
Yes, insult is supposed to add to the injury. Under my error model you run into trouble when you treat any transfinite amount the same. From that perspective recognising two transfinite amounts that could be different is progress.

Another attempt to throw a situation you might not be able to handle. Instead of having 2 infinite groups of unknown relative size all receiving the same bad thing, as compensation for the abuse give 1 slice of cake to one group and 2 slices of cake to the second group. Could there be a difference in the group size that perfectly balances the cake slice difference in order to keep cake expectation constant?

Additional challenging situation. Instead of giving 1 or 2 slices of cake, say that each slice is 3 cm wide, so the original choices are between 3 cm of cake and 6 cm of cake. Now take some custom amount of cake slice (say 2.7 cm), then determine what the group size would be to keep the world cake expectation the same. Then add 1 person to that group. Then convert that back to a cake slice width that keeps cake expectation the same. How wide is the slice? Another formulation of the same challenge: define a real number r for which converting that to a group size would get you a group of 5 people.

Did you get on board with the difference between "help all the stars" and "all the stars as they could have been"?

The fact that it's lavishly uncomputable is a problem for using it in practice, of course :-).

Yep. To be fair, though, I suspect any ethical system that respects agents' arbitrary preferences would also be incomputable. As a silly example, consider an agent whose terminal values are, "If Turing machine T halts, I want nothing more than to jump up and down. However, if it doesn't halt, then it is of the utmost importance to me that I never jump up and down and instead sit down and frown." Then any ethical system that cares about those preferences is inco...
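The halting example can be made concrete: a program can confirm that a machine halts, but never that it runs forever. The sketch below (hypothetical helper names, with Python generators standing in for Turing machines) evaluates the agent's preference under a step budget and has to answer "unknown" whenever the budget runs out first.

```python
from typing import Callable, Iterator, Optional


def halts_within(machine: Callable[[], Iterator[None]], budget: int) -> bool:
    """Run a generator-based 'machine' for at most `budget` steps; True if it finished."""
    steps = machine()
    for _ in range(budget):
        try:
            next(steps)
        except StopIteration:
            return True
    return False


def preferred_action(machine: Callable[[], Iterator[None]], budget: int) -> Optional[str]:
    """The agent wants to jump if T halts, and to sit and frown if it never halts."""
    if halts_within(machine, budget):
        return "jump up and down"
    return None  # not halted yet: "never halts" and "halts later" look the same


def halting_machine() -> Iterator[None]:
    for _ in range(10):
        yield


def looping_machine() -> Iterator[None]:
    while True:
        yield


print(preferred_action(halting_machine, 1000))
print(preferred_action(looping_machine, 1000))
```

Raising the budget never turns `None` into "sit down and frown", which is the sense in which respecting this agent's preferences is uncomputable.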

If we define "bad reasoning" as "crossing when there is a proof that crossing is bad" in general, this begs the question of how to evaluate actions. Of course the troll will punish counterfactual reasoning which doesn't line up with this principle, in that case. The only surprising thing in the proof, then, is that the troll also punishes reasoners whose counterfactuals respect proofs (EG, EDT).

I'm concerned that you may not realize that your own current take on counterfactuals respects logic to some extent, and that, if I'm reasoning correctly, could res...

abramdemski
If your point is that there are a lot of things to try, I readily accept this point, and do not mean to argue with it. I only intended to point out that, for your proposal to work, you would have to solve another hard problem.

Ordinary Bayesian EDT has to finish its computation (of its probabilistic expectations) in order to proceed. What you are suggesting is to halt those calculations midway. I think you are imagining an agent who can think longer to get better results. But vanilla EDT does not describe such an agent. So, you can't start with EDT; you have to start with something else (such as logical induction EDT) which does already have a built-in notion of thinking longer. Then, my concern is that we won't have many guarantees for the performance of this system. True, it can stop thinking if it knows thinking will be harmful. However, if it mistakenly thinks a specific form of thought will be harmful, it has no device for correction. This is concerning because we expect "early" thoughts to be bad -- after all, you've got to spend a certain amount of time thinking before things converge to anything at all reasonable. So we're between a rock and a hard place here: we have to stop quite early, because we know the proof of troll bridge is small. But we can't stop early, because we know things take a bit to converge.

So I think this proposal is just "somewhat-logically-updateless-DT", which I don't think is a good solution. Generally I think rollback solutions are bad. (Several people have argued in their favor over the years; I find that I'm just never intrigued by that direction...) Some specific remarks:

* Note that if you literally just roll back, you would go forward the same way again. So you need to somehow modify the rolled back state, creating a "pseudo-ignorant" belief state where you're not really uninformed, but rather, reconstruct something merely similar to an uninformed state.
* It is my impression that this causes problems.
2abramdemski1y
To elaborate a little, one way we could think about this would be that "in a broad variety of situations" the agent would think this property sounded pretty bad. For example, the hypothetical "PA proves ⊥" would be evaluated as pretty bad by a proof-based agent, in many situations; it would not expect its future self to make decisions well, so it would often have pretty poor performance bounds for its future self (e.g. the lowest utility available in the given scenario).

So far so good -- your condition seems like one which a counterfactual reasoner would broadly find concerning. It also passes the sniff test of "would I think the agent is being dumb if it didn't cross for this reason?" The fact that there's a troll waiting to blow up a bridge if I'm empirically incorrect about that very setup should not, in itself, make me too reluctant to cross a bridge. If I'm very confident that the situation is indeed as described, then intuitively, I should confidently cross.

But it seems that, if I believe your proof, I would not believe this any more. You don't prove whether the agent crosses or not, but you do claim to prove that if the agent crosses, it in fact gets blown up. It seems you think the correct counterfactual (for such an agent) is indeed that it would get blown up if it crosses:

So if the proof is to be believed, it seems like the philosophical argument falls flat? If the agent fails to cross for this reason, then it seems you think it is reasoning correctly. If it crosses and explodes, then it fails because it had wrong counterfactuals. This also does not seem like much of an indictment of how it was reasoning -- garbage in, garbage out. We can concern ourselves with achieving more robust reasoners, for sure, so that sometimes garbage in -> treasure out. But that's a far cry from the usual troll bridge argument, where the agent has a 100% correct description of the situation, and nonetheless appears to mishandle it.

To summarize:

* The usual troll bri

So let's try again. The key thing in your system is not a program that outputs a hypothetical being's stream of experiences, it's a program that outputs a complete description of a (possibly infinite) universe and also an unambiguous specification of a particular experience-subject within that universe. This is only possible if there are at most countably many experience-subjects in said universe, but that's probably OK.

That's closer to what I meant. By "experience-subject", I think you mean a specific agent at a specific time. If so, my system doesn't ...

2gjm1y
No, I don't intend "experience-subject" to pick out a specific time. (It's not obvious to me whether a variant of your system that worked that way would be better or worse than your system as it is.) I'm using that term rather than "agent" because -- as I think you point out in the OP -- what matters for moral relevance is having experiences rather than performing actions.

So, anyway, I think I now agree that your system does indeed do approximately what you say it does, and many of my previous criticisms do not in fact apply to it; my apologies for the many misunderstandings. The fact that it's lavishly uncomputable is a problem for using it in practice, of course :-). I have some other concerns, but haven't given the matter enough thought to be confident about how much they matter. For instance: if the fundamental thing we are considering probability distributions over is programs specifying a universe and an experience-subject within that universe, then it seems like maybe physically bigger experience-subjects get treated as more important because they're "easier to locate", and that seems pretty silly. But (1) I think this effect may be fairly small, and (2) perhaps physically bigger experience-subjects should on average matter more because size probably correlates with some sort of depth-of-experience?

The interactions are all supposed to be negative in peace, punch, dust, insult. The surprising thing to me would be that the system would be ambivalent between dust and insult being a bad idea. If we don't necessarily prefer D to C when helping, does it matter if we torture our people a lot or a little, as it's going to get infinity-saturated anyway?

Could you explain what insult is supposed to do? You didn't say in the previous comment. Does it causally hurt infinitely many people?

Anyways, it seems to me that my system would not be ambivalent about wh...

2Slider1y
Insult is when you do both punch and dust, i.e. make a negative impact on an infinite number of people and an additional negative impact on a single person. If degree of torture matters, then dusting and punching the same person would be relevant. I guess the theory per se would treat it differently if the punched person was not one of the dusted ones.

"doesn't aggregate anything" - "aggregates the expected value of satisfaction in these situations"

When we form the expectation of what is going to happen in the described situation, I imagine breaking it down into sad stories and good stories. The expectation sways upwards if there are more good stories and downwards if there are more bad stories. My life will turn out somehow, which can differ from my "storymates'" outcomes. I didn't try to hit any special term but just refer to the cases the probabilities of the stories refer to.

Thanks for responding. As I said, the measure of satisfaction is bounded. And all bounded random variables have a well-defined expected value. Source: Stack Exchange.

Oh, I'm sorry; I misunderstood you. When you said the average of utilities, I thought you meant the utility averaged among all the different agents in the world. Instead, it's just, roughly, an average over the probability density function of utility. I say roughly because I guess integration isn't exactly an average.

Please see this comment for an explanation.

RE: scenario one:

All these worlds come out exactly the same, so "infinitely many happy, one unhappy" is indistinguishable from "infinitely many unhappy, one happy"

It's not clear to me how they are indistinguishable. As long as the unhappy agent and its circumstances can be described with a finite description length, there would be a non-zero probability of an agent ending up as that one. Thus, making the agent unhappy would decrease the moral value of the world.

I'm not sure what would happen if the single unhappy agent has infinite co...

2gjm1y
It sounds as if my latest attempt at interpreting what your system proposes doing is incorrect, because the things you're disagreeing with seem to me to be straightforward consequences of that interpretation. Would you like to clarify how I'm misinterpreting now? Here's my best guess. You wrote about specifications of an experience-subject's universe and situation in it. I mentally translated that to their stream of experiences because I'm thinking in terms of Solomonoff induction. Maybe that's a mistake. So let's try again. The key thing in your system is not a program that outputs a hypothetical being's stream of experiences, it's a program that outputs a complete description of a (possibly infinite) universe and also an unambiguous specification of a particular experience-subject within that universe. This is only possible if there are at most countably many experience-subjects in said universe, but that's probably OK. So that ought to give a well-defined (modulo the usual stuff about uncomputability) probability distribution over experience-subjects-in-universes. And then you want to condition on "being in a universe with such-and-such characteristics" (which may or may not specify the universe itself completely) and look at the expected utility-or-utility-like-quantity of all those experience-subjects-in-universes after you rule out the universes without such-and-such characteristics. It's now stupid-o'-clock where I am and I need to get some sleep. I'm posting this even though I haven't had time to think about whether my current understanding of your proposal seems like it might work, because on past form there's an excellent chance that said understanding is wrong, so this gives you more time to tell me so if it is :-). If I don't hear from you that I'm still getting it all wrong, I'll doubtless have more to say later...

By one logic, because we prefer B to A, if we "acausalize" this we should still preserve this preference (because "the amount of copies granted" would seem to be even-handed), so we would expect to prefer D to C. However, in a system where all infinities are of equal size, C=D and we become ambivalent between the options.

We shouldn't necessarily prefer D to C. Remember that one of the main things you can do to increase the moral value of the universe is to try to causally help other creatures so that other people who are in sufficiently similar cir...

2Slider1y
You can't causally help people without also acausally helping in one go. Your acausal "influence" forces people matching your description to act the same. Even if it is possible to consider the directly helped and the indirectly helped to be the same, they could also be different. In order to be fair we should also extend this to C. What if the people helped by all the acausal copies are in fact the same person? (If there is a proof it can't be, why doesn't that apply when the patient group is large?)

The interactions are all supposed to be negative in peace, punch, dust, insult. The surprising thing to me would be that the system would be ambivalent between dust and insult being a bad idea. If we don't necessarily prefer D to C when helping, does it matter if we torture our people a lot or a little, as it's going to get infinity-saturated anyway?

The basic situation is that I have intuitions which I can't formulate that well. I will try another route. Suppose I help one person and then there is either a finite or infinite number of people in my world. Finite impact over finite people leads to a real and finite kick. Finite impact over infinite people leads to an infinitesimal kick. Ah, but acausal copies of the finites! Yeah, but what about the acausal copies of the infinites? When I say "world has finite or infinite people", that is "within description": say that there are infinitely many people because I believe there are infinitely many stars. Then all the acausal copies of Sol are going to have their own "out there" stars. Acts that "help all the stars" and "all the stars as they could have been" are different. At least until we consider that any agent that decides to "help all the stars" will have acausal shadows "that could have been". But still this consideration increases the impact on the multiverse (or keeps it the same if moving from a monoverse to a multiverse in the same step). One way to slither out of this is to claim that world-predescription-expansion need

I'll begin at the end: What is "the expected value of utility" if it isn't an average of utilities?

I'm just using the regular notion of expected value. That is, let $P(u)$ be the probability density that you get utility $u$. Then, the expected value of utility is $\int_{[a,b]} u \, P(u) \, du$, where $\int$ denotes Lebesgue integration, for greater generality. Above, I take utility to be in a bounded interval $[a, b]$.

Also note that my system cares about a measure of satisfaction, rather than specifically utility. In this case, just take $P(u)$ to be the probability density of that measure of life satisfaction instead of a utility.
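To make the formula concrete, here is a minimal numerical sketch, assuming a simple density on a bounded interval. The density P(u) = 2u on [0, 1] and the function name `expected_utility` are illustrative choices, not part of the original comment, and a midpoint-rule Riemann sum stands in for the Lebesgue integral (the two agree for a density this well-behaved).

```python
def expected_utility(density, a, b, steps=100_000):
    """Approximate E[u] = integral over [a, b] of u * P(u) du with the midpoint rule.

    Assumes `density` is a probability density supported on the bounded
    interval [a, b]; because a <= u <= b there, the exact integral lies in [a, b].
    """
    width = (b - a) / steps
    total = 0.0
    for i in range(steps):
        u = a + (i + 0.5) * width  # midpoint of the i-th subinterval
        total += u * density(u) * width
    return total


# Hypothetical example: satisfaction bounded in [0, 1] with density P(u) = 2u.
ev = expected_utility(lambda u: 2 * u, 0.0, 1.0)
print(ev)  # close to the exact value 2/3
```

Because the integrand is bounded and the density integrates to 1, the integral always converges, which is the property the "bounded random variables have a well-defined expected value" remark relies on.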

Als...

2gjm1y
If you are just using the regular notion of expected value then it is an average of utilities. (Weighted by probabilities.) I understand that your measure of satisfaction need not be a utility as such, but "utility" is shorter than "measure of satisfaction which may or may not strictly speaking be utility".

Post is pretty long-winded, a bit wall-of-texty: a lot of text for what seems like a fixed amount of content, while being very claimy and less showy about the properties.

Yeah, I see what you mean. I have a hard time balancing being succinct against providing sufficient support and detail. It actually used to be shorter, but I lengthened it to address concerns brought up in a review.

My suspicion is that the acausal impact ends up being infinitesimal anyway. Even if one would get finite probability impact for probabilities concerning an infinite universe for

...
2Slider1y
The "nearby" acausal relatedness gives a certain multiplier (that is transfinite). That multiplier should be the same for all options in that scenario. Then if you have one option with a finite multiplier and one with an infinite multiplier, the "simple" option is "only" infinite overall, but the "large" option is "doubly" infinite, because each of your likenesses has an infinite impact alone already (plus as an aggregate it would gain an infinite quality that way too). Now cardinalities don't really support "doubly infinite": ℵ0+ℵ0 is just ℵ0. However, for transfinite values cardinality and ordinality diverge, and for example with surreal numbers one has ω+ω>ω and, relevantly here, ω<ω².

As I understand it, there are four kinds of impact: A="direct impact of helping one", B="direct impact of helping an infinite number", C="acausal impact of choosing to help 1" and D="acausal impact of choosing to help an infinite number". You claim that B and C are either equivalent or roughly equivalent and A and B are not. But there is a lurking paralysis if D and C are (roughly) equivalent. By one logic, because we prefer B to A, if we "acausalize" this we should still preserve this preference (because "the amount of copies granted" would seem to be even-handed), so we would expect to prefer D to C. However, in a system where all infinities are of equal size, C=D and we become ambivalent between the options. To me it would seem natural, and the boundary conditions nearly force it, that D has just as vast a gap over C as B has over A.

In the above, "roughly" can be somewhat translated into more precise language as "are within finite multiples of each other", i.e. they are not relatively infinite, i.e. they belong to the same Archimedean class (helping 1 person and helping 2 people are not the same, but they both represent the case of "help a fixed finite number of people"). Within the example it seems we need to identify at least 3 such classes. Moving within a class is "easy", well-understood real math. But when you n
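Slider's point about ω+ω, ω² and "within finite multiples" can be made concrete with a toy model. This is an illustration under stated assumptions, not anything from the thread: it only handles numbers of the form c0 + c1·ω + c2·ω² + ... with integer coefficients (a small corner of the surreals), and the class name `OmegaPoly` is hypothetical. It shows that, unlike cardinal addition where ℵ0+ℵ0 = ℵ0, here ω < ω+ω < ω², and that ω and ω+ω share an Archimedean class while ω² does not.

```python
from itertools import zip_longest


class OmegaPoly:
    """A toy surreal c0 + c1*w + c2*w**2 + ..., where w stands for omega and
    the ci are integers. Coefficients are stored lowest degree first."""

    def __init__(self, *coeffs):
        c = list(coeffs)
        while c and c[-1] == 0:  # strip trailing zeros so equal values compare equal
            c.pop()
        self.c = tuple(c)

    def __add__(self, other):
        return OmegaPoly(*(a + b for a, b in zip_longest(self.c, other.c, fillvalue=0)))

    def __neg__(self):
        return OmegaPoly(*(-a for a in self.c))

    def __mul__(self, other):
        out = [0] * (len(self.c) + len(other.c))
        for i, a in enumerate(self.c):
            for j, b in enumerate(other.c):
                out[i + j] += a * b
        return OmegaPoly(*out)

    def __eq__(self, other):
        return self.c == other.c

    def __lt__(self, other):
        # Since w exceeds every finite number, the sign of a difference is
        # the sign of its highest-degree nonzero coefficient.
        diff = (other + -self).c
        return bool(diff) and diff[-1] > 0

    def archimedean_class(self):
        """Degree of the leading term. Two nonzero values with positive leading
        coefficients are within finite multiples of each other exactly when
        their degrees match."""
        return len(self.c) - 1


w = OmegaPoly(0, 1)  # omega
one = OmegaPoly(1)
print(w < w + w, w + w < w * w)  # omega < 2*omega < omega**2
print(one.archimedean_class(), (w + w).archimedean_class(), (w * w).archimedean_class())
```

In this model "helping 1 person" and "helping 2 people" land in class 0, ω and 2ω in class 1, and ω² in class 2, mirroring the three classes the comment says the example needs.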

Of course you can make moral decisions without going through such calculations. We all do that all the time. But the whole issue with infinite ethics -- the thing that a purported system for handling infinite ethics needs to deal with -- is that the usual ways of formalizing moral decision processes produce ill-defined results in many imaginable infinite universes. So when you propose a system of infinite ethics and I say "look, it produces ill-defined results in many imaginable infinite universes", you don't get to just say "bah, who cares about the deta

...
2gjm1y
I'll begin at the end: What is "the expected value of utility" if it isn't an average of utilities? You originally wrote: What is "the expected value of your life satisfaction [] conditioned on you being an agent in this universe but [not] on anything else" if it is not the average of the life satisfactions (utilities) over the agents in this universe? (The slightly complicated business with conditional probabilities that apparently weren't what you had in mind were my attempt at figuring out what else you might mean. Rather than trying to figure it out, I'm just asking you.)

You say, "There must be some reasonable way to calculate this."

(where "this" is Pr(I'm satisfied | I'm some being in such-and-such a universe)) Why must there be? I agree that it would be nice if there were, of course, but there is no guarantee that what we find nice matches how the world actually is.

To use probability theory to form accurate beliefs, we need a prior. I didn't think this was controversial. And if you have a prior, as far as I can tell, you can then compute Pr(I'm satisfied | I'm some being in such-and-such a universe) by simply updat...

2gjm1y
OK, so I think I now understand your proposal better than I did. So if I'm contemplating making the world be a particular way, you then propose that I should do the following calculation (as always, of course I can't do it because it's uncomputable, but never mind that):

* Consider all possible computable experience-streams that a subject-of-experiences could have.
* Consider them, specifically, as being generated by programs drawn from a universal distribution.
* Condition on being in the world that's the particular way I'm contemplating making it -- that is, discard experience-streams that are literally inconsistent with being in that world.
* We now have a probability distribution over experience-streams. Compute a utility for each, and take its expectation.

And now we compare possible universes by comparing this expected utility.

(Having failed to understand your proposal correctly before, I am not super-confident that I've got it right now. But let's suppose I have and run with it. You can correct me if not. In that case, some or all of what follows may be irrelevant.)

I agree that this seems like it will (aside from concerns about uncomputability, and assuming our utilities are bounded) yield a definite value for every possible universe. However, it seems to me that it has other serious problems which stop me finding it credible.

SCENARIO ONE. So, for instance, consider once again a world in which there are exactly two sorts of experience-subject, happy and unhappy. Traditionally we suppose infinitely many of both, but actually let's also consider possible worlds where there is just one happy experience-subject, or just one unhappy one. All these worlds come out exactly the same, so "infinitely many happy, one unhappy" is indistinguishable from "infinitely many unhappy, one happy". That seems regrettable, but it's a bullet I can imagine biting -- perhaps we just don't care at all about multiple instantiations of the exact same stream o
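Since the real calculation is uncomputable, here is only a toy, finite stand-in for the four steps gjm lists, under loudly stated assumptions: a handful of made-up "programs" with made-up lengths and utilities, and hypothetical names (`Hypothesis`, `conditional_expected_utility`). It follows gjm's reading of the proposal, which Chantiel may not endorse.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    program_length: int  # length in bits of a hypothetical prefix-free program
    stream: str          # toy stand-in for the experience-stream it outputs
    utility: float       # bounded utility assigned to that stream


def conditional_expected_utility(hypotheses, consistent_with_world):
    """Weight each program by 2**-length, discard streams inconsistent with the
    world under consideration, renormalise, and take the expected utility."""
    kept = [(2.0 ** -h.program_length, h.utility)
            for h in hypotheses if consistent_with_world(h.stream)]
    total = sum(weight for weight, _ in kept)
    if total == 0:
        raise ValueError("no hypothesis is consistent with this world")
    return sum(weight * u for weight, u in kept) / total


hypotheses = [
    Hypothesis(3, "happy", 1.0),
    Hypothesis(5, "unhappy", 0.0),
    Hypothesis(4, "alien", 0.5),
]
# A world containing happy and unhappy subjects (in any numbers) but no aliens.
world = {"happy", "unhappy"}
print(conditional_expected_utility(hypotheses, lambda s: s in world))
```

Because the world enters only through which streams it is consistent with, not how many subjects have each stream, "infinitely many happy, one unhappy" and "infinitely many unhappy, one happy" get the same score here, which is exactly the Scenario One concern.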

Kind of hard to get a handle.

Are you referring to it being hard to understand? If so, I appreciate the feedback and am interested in the specifics of what is difficult to understand. Clarity is a top priority for me.

If I have a choice of (finitely) helping a single human, and I believe there to be infinitely many humans, then the probability of a human being helped in my world will nudge by less than any positive real number. And if we want to stick with probabilities being real, then the rounding will produce infinitarian paralysis.

You are correct that a single human would have 0...

2Slider1y
Post is pretty long-winded, a bit wall-of-texty: a lot of text for what seems like a fixed amount of content, while being very claimy and less showy about the properties.

My suspicion is that the acausal impact ends up being infinitesimal anyway. Even if one would get finite probability impact for probabilities concerning an infinite universe for claims like "should I help this one person", claims like "should I help these infinite persons" would still have an infinity-class jump between the statements (even if both need to have an infinite kick into the universe to make a dent, there is an additional level to one of these statements, and not all infinities are equal).

I am going to anticipate that your scheme will try to rule out statements like "should I help these infinite persons" for a reason like "it's not of finite complexity". I am not convinced that finite-complexity descriptions are good guarantees that the described condition makes up a finite proportion of possibility space. I think "Getting a perfect bullseye" is a description of finite complexity, but it describes an outcome of (real) 0 probability. Being positive is no guarantee of finitude; infinitesimal chances would spell trouble for the theory. And if statements like "Slider or (near equivalent) gets a perfect bullseye" are disallowed for not being finitely groundable, then most references to infinite objects are ruled out anyway. It's not exactly an infinite ethic if it is not allowed to refer to infinite things.

I am also slightly worried that "description cuts" will allow "doubling the ball" [https://en.wikipedia.org/wiki/Banach%E2%80%93Tarski_paradox] kind of events where total probability doesn't get preserved. That phenomenon gets around the theoretical problems by designating some sets non-measurable. But then being a set doesn't mean it's measurable. I am worried that "descriptions always have a usable probability" is too lax and will bleed from the edges like a naive assumption that all

(Assuming you've read my other response to this comment):

I think it might help if I give a more general explanation of how my moral system can be used to determine what to do. This is mostly taken from the article, but it's important enough that I think it should be restated.

Suppose you're considering taking some action that would benefit our world or future life cone. You want to see what my ethical system recommends.

Well, for almost all possible circumstances in this universe that an agent could end up in, I think your action would have effectively no causal or ...

4gjm1y
Your comments are focusing on (so to speak) the decision-theoretic portion of your theory, the bit that would be different if you were using CDT or EDT rather than something FDT-like. That isn't the part I'm whingeing about :-). (There surely are difficulties in formalizing any sort of FDT, but they are not my concern; I don't think they have much to do with infinite ethics as such.) My whingeing is about the part of your theory that seems specifically relevant to questions of infinite ethics, the part where you attempt to average over all experience-subjects. I think that one way or another this part runs into the usual average-of-things-that-don't-have-an-average sort of problem which afflicts other attempts at infinite ethics. As I describe in another comment, the approach I think you're taking can move where that problem arises but not (so far as I can currently see) make it actually go away.

How is it a distribution over possible agents in possible universes (plural) when the idea is to give a way of assessing the merit of one possible universe?

I do think JBlack understands the idea of my ethical system and is using it appropriately.

My system provides a method of evaluating the moral value of a specific universe. The point of moral agents is to try to make the universe one that scores highly on this moral valuation. But we don't know exactly what universe we're in, so to make decisions, we need to consider all universes we could be in, and...
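A minimal sketch of the decision step described here, assuming a finite list of candidate universes, made-up credences and moral-value scores, and hypothetical function names (`expected_moral_value`, `best_action`). How the credences should depend on the action (causal, evidential, or something FDT-like) is deliberately left abstract.

```python
def expected_moral_value(credences, moral_value):
    """Credence-weighted moral value: sum over candidate universes of
    P(universe) * moral_value(universe). Assumes credences sum to 1 and
    moral_value is bounded, so the sum is well defined."""
    return sum(p * moral_value(u) for u, p in credences.items())


def best_action(actions, credences_after, moral_value):
    """Pick the action whose resulting credences over universes give the
    highest expected moral value."""
    return max(actions, key=lambda a: expected_moral_value(credences_after(a), moral_value))


# Hypothetical numbers: two candidate universes and two actions.
scores = {"U1": 0.7, "U2": 0.4}
credences_after = {
    "help": {"U1": 0.6, "U2": 0.4},
    "do_nothing": {"U1": 0.5, "U2": 0.5},
}
print(best_action(list(credences_after), credences_after.get, scores.get))
```

The hard part discussed in the thread, computing the moral value of a single infinite universe, is hidden inside `moral_value` here; this sketch only shows the outer credence-weighted step.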

4gjm1y
As I said to JBlack, so far as I can tell none of the problems I think I see with your proposal become any easier to solve if we switch from "evaluate one possible universe" to "evaluate all possible universes, weighted by credence". Why not? Of course you can make moral decisions without going through such calculations. We all do that all the time. But the whole issue with infinite ethics -- the thing that a purported system for handling infinite ethics needs to deal with -- is that the usual ways of formalizing moral decision processes produce ill-defined results in many imaginable infinite universes. So when you propose a system of infinite ethics and I say "look, it produces ill-defined results in many imaginable infinite universes", you don't get to just say "bah, who cares about the details?" If you don't deal with the details you aren't addressing the problems of infinite ethics at all! It's nice that your system gives the expected result in a situation where the choices available are literally "make everyone in the world happy" and "destroy the world". (Though I have to confess I don't think I entirely understand your account of how your system actually produces that output.) We don't really need a system of ethics to get to that conclusion! What I would want to know is how your system performs in more difficult cases. We're concerned about infinitarian paralysis, where we somehow fail to deliver a definite answer because we're trying to balance an infinite amount of good against an infinite amount of bad. So far as I can see, your system still has this problem. E.g., if I know there are infinitely many people with various degrees of (un)happiness, and I am wondering whether to torture 1000 of them, your system is trying to calculate the average utility in an infinite population, and that simply isn't defined. So, I think this is what you have in mind; my apologies if it was supposed to be obvious from the outset. We are doing something like Solomonof
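gjm's claim that "the average utility in an infinite population ... simply isn't defined" can be illustrated with a small sketch (not from the thread; the function names are hypothetical). Both generators enumerate a population with infinitely many happy agents (utility 1) and infinitely many unhappy ones (utility 0), so each is a rearrangement of the other, yet the running average tends to a different limit depending on the order of enumeration.

```python
from itertools import islice


def alternating():
    """Enumeration A: happy (1), unhappy (0), happy, unhappy, ..."""
    while True:
        yield 1.0
        yield 0.0


def two_happy_one_unhappy():
    """Enumeration B of the same kind of population: happy, happy, unhappy, ..."""
    while True:
        yield 1.0
        yield 1.0
        yield 0.0


def running_average(stream, n):
    """Average utility of the first n agents in the given enumeration."""
    return sum(islice(stream, n)) / n


print(running_average(alternating(), 3000))            # tends to 1/2
print(running_average(two_happy_one_unhappy(), 3000))  # tends to 2/3
```

With no privileged ordering of the agents, there is no fact about which limit is "the" average, and torturing finitely many agents does not change either limit.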

How does that cash out if not in terms of picking a random agent, or random circumstances in the universe?

So, remember, the moral value of the universe according to my ethical system depends on P(I'll be satisfied | I'm some creature in this universe).

There must be some reasonable way to calculate this, and one that doesn't rely on impossibly taking a uniform sample from a set that has no uniform distribution. Now, we haven't fully formalized reasoning and priors yet. But there is some reasonable prior probability distribution over situations you could end up in. And aft...

2gjm1y
You say (where "this" is Pr(I'm satisfied | I'm some being in such-and-such a universe)) Why must there be? I agree that it would be nice if there were, of course, but there is no guarantee that what we find nice matches how the world actually is. Does whatever argument or intuition leads you to say that there must be a reasonable way to calculate Pr(X is satisfied | X is a being in universe U) also tell you that there must be a reasonable way to calculate Pr(X is even | X is a positive integer)? How about Pr(the smallest n with x <= n! is even | x is a positive integer)?

I should maybe be more explicit about my position here. Of course there are ways to give a meaning to such expressions. For instance, we can suppose that the integer n occurs with probability 2^-n, and then e.g. if I've done my calculations right then the second probability is the sum of 2^-0! + (2^-2!-2^-3!) + (2^-4!-2^-5!) + ... which presumably doesn't have a nice closed form (it's transcendental for sure) but can be calculated to high precision very easily. But that doesn't mean that there's any such thing as the way to give meaning to such an expression. We could use some other sequence of weights adding up to 1 instead of the powers of 1/2, for instance, and we would get a substantially different answer. And if the objects of interest to us were beings in universe U rather than positive integers, they wouldn't come equipped with a standard order to look at them in. Why should we expect there to be a well-defined answer to the question "what fraction of these beings are satisfied"?

No, because I do not assign any probability to being happy in that universe. I don't know a good way to assign such probabilities and strongly suspect that there is none. You suggest doing maximum entropy on the states of the pseudorandom number generator being used by the AI making this universe. But when I was describing that universe I said nothing about AIs and nothing about pseudorandom number gene
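gjm's point that the answer depends on the chosen weights can be checked numerically. The sketch below (function names hypothetical) uses the 2^-x weighting from the comment and, for contrast, one other geometric weighting; it reads "the smallest n with x <= n!" as ranging over n >= 0, which is an assumption. It sums the weights directly rather than relying on any closed form. The two weightings already disagree for "x is even" (1/3 versus 1/4), which is the non-uniqueness gjm describes.

```python
def geometric_weight(q):
    """Hypothetical weighting P(x) = (1 - q) * q**(x - 1) on x = 1, 2, 3, ...
    The weights sum to 1; q = 1/2 gives P(x) = 2**-x."""
    return lambda x: (1 - q) * q ** (x - 1)


def smallest_factorial_index(x):
    """Smallest n >= 0 with x <= n! (so x = 1 gives n = 0, since 0! = 1)."""
    n, fact = 0, 1
    while fact < x:
        n += 1
        fact *= n
    return n


def probability(predicate, weight, max_x=400):
    """Total weight of the x in 1..max_x satisfying predicate. The ignored
    tail beyond max_x has weight q**max_x, negligible here."""
    return sum(weight(x) for x in range(1, max_x + 1) if predicate(x))


def is_even(x):
    return x % 2 == 0


def factorial_index_is_even(x):
    return smallest_factorial_index(x) % 2 == 0


for q in (1 / 2, 1 / 3):
    w = geometric_weight(q)
    print(q, probability(is_even, w), probability(factorial_index_is_even, w))
```

Nothing about the integers singles out q = 1/2 over q = 1/3, and beings in a universe U do not even come with the ordering that these weightings presuppose.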

Thank you for responding. I actually had someone else bring up the same way in a review; maybe I should have addressed this in the article.

The average life satisfaction is undefined in a universe with infinitely-many agents of varying life-satisfaction. Thus a moral system using it suffers from infinitarian paralysis. My system doesn't worry about averages, and thus does not suffer from this problem.

1conchis1y
My point was more that, even if you can calculate the expectation, standard versions of average utilitarianism are usually rejected for non-infinitarian reasons (e.g. the repugnant conclusion) that seem like they would plausibly carry over to this proposal as well. I haven't worked through the details though, so perhaps I'm wrong. Separately, while I understand the technical reasons for imposing boundedness on the utility function, I think you probably also need a substantive argument for why boundedness makes sense, or at least is morally acceptable. Boundedness below risks having some pretty unappealing properties, I think.  Arguments that utility functions are in fact bounded in practice seem highly contingent, and potentially vulnerable e.g. to the creation of utility-monsters [https://en.wikipedia.org/wiki/Utility_monster], so I assume what you really need is an argument that some form of sigmoid transformation from an underlying real-valued welfare, u = s(w), is justified. On the one hand, the resulting diminishing marginal utility for high-values of welfare will likely be broadly acceptable to those with prioritarian intuitions. But I don't know that I've ever seen an argument for the sort of anti-prioritarian results you get as a result of increasing marginal utility at very low levels of welfare. Not only would this imply that there's a meaningful range where it's morally required to deprioritise the welfare of the worse off, this deprioritisation is greatest for the very worst off. Because the sigmoid function essentially saturates at very low levels of welfare, at some point you seem to end up in a perverse version of Torture vs. dust specks [https://www.lesswrong.com/posts/3wYTFWY3LKQCnAptN/torture-vs-dust-specks] where you think it's ok (or indeed required) to have 3^^^3 people (whose lives are already sufficiently terrible) horribly tortured for fifty years without hope or rest, to avoid someone in the middle of the welfare distribution getting a du
1tivelen1y
Your system may not worry about average life satisfaction, but it does seem to worry about expected life satisfaction, as far as I can tell. How can you define expected life satisfaction in a universe with infinitely-many agents of varying life-satisfaction? Specifically, given a description of such a universe (in whatever form you'd like, as long as it is general enough to capture any universe we may wish to consider), how would you go about actually doing the computation? Alternatively, how do you think that computing "expected life satisfaction" can avoid the acknowledged problems of computing "average life satisfaction", in general terms?

I think this system may have the following problem: It implicitly assumes that you can take a kind of random sample that in fact you can't.

You want to evaluate universes by "how would I feel about being in this universe?", which I think means either something like "suppose I were a randomly chosen subject-of-experiences in this universe, what would my expected utility be?" or "suppose I were inserted into a random place in this universe, what would my expected utility be?". (Where "utility" is shorthand for your notion of "life satisfaction", and you a

...
2gjm1y
I don't think I understand why your system doesn't require something along the lines of choosing a uniformly-random agent or place. Not necessarily exactly either of those things, but something of that kind. You said, in the OP: How does that cash out if not in terms of picking a random agent, or random circumstances in the universe?

If I understand your comment correctly, you want to deal with that by picking a random description of a situation in the universe, which is just a random bit-string with some constraints on it, which you presumably do in something like the same way as choosing a random program when doing Solomonoff induction: cook up a prefix-free language for describing situations-in-the-universe, generate a random bit-string with each bit equally likely 0 or 1, and see what situation it describes. But now everything depends on the details of how descriptions map to actual situations, and I don't see any canonical way to do that or any anything-like-canonical way to do it. (Compare the analogous issue with Solomonoff induction. There, everything depends on the underlying machine, but one can argue at-least-kinda-plausibly that if we consider "reasonable" candidates, the differences between them will quickly be swamped by all the actual evidence we get. I don't see anything like that happening here.) What am I missing?

Your example with an AI generating people with a PRNG is, so far as it goes, fine. But the epistemic situation one needs to be in for that example to be relevant seems to me incredibly different from any epistemic situation anyone is ever really in. If our universe is running on a computer, we don't know what computer or what program or what inputs produced it. We can't do anything remotely like putting a uniform distribution on the internal states of the machine. Further, your AI/PRNG example is importantly different from the infinitely-many-random-people example on which it's based. You're supposing that your AI's PRNG has an internal s
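gjm's worry about non-canonical description languages can be made concrete. With a prefix-free code and fair coin flips, a codeword of length L is reached with probability 2^-L, so the induced distribution over situations is fixed entirely by which codewords map to which situations. The sketch below uses hypothetical codebooks and function names in a two-situation toy, nothing like a real description language for a universe, and shows two equally simple languages assigning opposite probabilities to being satisfied.

```python
import random


def sample_description(codebook, rng):
    """Flip fair coins until the bits read so far form a codeword, and return
    the situation that codeword describes. Assumes the codebook is prefix-free
    and complete (its Kraft sum is exactly 1)."""
    max_len = max(len(word) for word in codebook)
    bits = ""
    while bits not in codebook:
        if len(bits) >= max_len:
            raise ValueError("codebook is not complete")
        bits += rng.choice("01")
    return codebook[bits]


def exact_distribution(codebook):
    """A codeword of length L is reached with probability 2**-L."""
    dist = {}
    for word, situation in codebook.items():
        dist[situation] = dist.get(situation, 0.0) + 2.0 ** -len(word)
    return dist


# Two hypothetical prefix-free description languages for the same two situations.
code_a = {"0": "satisfied", "10": "unsatisfied", "11": "satisfied"}
code_b = {"0": "unsatisfied", "10": "satisfied", "11": "unsatisfied"}

print(exact_distribution(code_a))  # satisfied 0.75, unsatisfied 0.25
print(exact_distribution(code_b))  # satisfied 0.25, unsatisfied 0.75

rng = random.Random(0)
draws = [sample_description(code_a, rng) for _ in range(10_000)]
print(draws.count("satisfied") / len(draws))  # near 0.75
```

In Solomonoff induction the analogous machine-dependence is argued to wash out under accumulating evidence; nothing in this toy washes out, because there is no evidence stream, only the choice of codebook.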

I'm not entirely sure what you consider to be a "bad" reason for crossing the bridge. However, I'm having a hard time finding a way to define it that causes agents using evidential counterfactuals to necessarily fail without also causing other agents to fail.

One way to define a "bad" reason is as an irrational one (or the chicken rule). However, if this is what is meant by a "bad" reason, it seems like this is an avoidable problem for an evidential agent, as long as that agent has control over what it decides to think about.

To illustrate, consider what I would ...

abramdemski · 1y
Ok. This threw me for a loop briefly. It seems like I hadn't considered your proposed definition of "bad reasoning" (i.e. "it's bad if the agent crosses despite it being provably bad to do so") -- or had forgotten about that case.

I'm not sure I endorse the idea of defining "bad" first and then considering the space of agents who pass/fail according to that notion of "bad"; how this is supposed to work is, rather, that we critique a particular decision theory by proposing a notion of "bad" tailored to that particular decision theory. For example, if a specific decision theorist thinks proofs are the way to evaluate possible actions, then "PA proves ⊥" will be a convincing notion of "bad reasoning" for that specific decision theorist.

If we define "bad reasoning" as "crossing when there is a proof that crossing is bad" in general, this begs the question of how to evaluate actions. Of course the troll will punish counterfactual reasoning which doesn't line up with this principle, in that case. The only surprising thing in the proof, then, is that the troll also punishes reasoners whose counterfactuals respect proofs (e.g., EDT).

If I had to make a stab at a generic notion of "bad", it would be "the agent's own way of evaluating consequences says that the consequences of its actions will be bad". But this is pretty ambiguous in some cases, such as the chicken rule. I think a more appropriate way to generally characterize "bad reasoning" is just to say that proponents of the decision theory in question should agree that it looks bad.

This is an open question, even for the examples I gave! I've been in discussions about Troll Bridge where proponents of proof-based DT (aka MUDT) argue that it makes perfect sense for the agent to think its action can control the consistency of PA in this case, so the reasoning isn't "bad", so the problem is unfair. I think it's correct to identify this as the crux of the argument -- whether I think the troll bridge argument incriminates proof

I'm certain that ants do in fact have preferences, even if they can't comprehend the concept of preferences in abstract or apply them to counterfactual worlds. They have revealed preferences to quite an extent, as does pretty much everything I think of as an agent.

I think the question of whether insects have preferences is morally pretty important, so I'm interested in hearing what made you think they do have them.

I looked online for "do insects have preferences?", and I saw articles saying they did. I couldn't really figure out why they thought they di...

Right, I suspected the evaluation might be something like that. It does have the difficulty of being counterfactual and so possibly not even meaningful in many cases.

Interesting. Could you elaborate?

I suppose counterfactuals can be tricky to reason about, but I'll provide a little more detail on what I had in mind. Imagine making a simulation of an agent that is a fully faithful representation of its mind. However, run the agent simulation in a modified environment that both gives it access to infinite computational resources and makes it ask, an...

JBlack · 1y
I'm certain that ants do in fact have preferences, even if they can't comprehend the concept of preferences in abstract or apply them to counterfactual worlds. They have revealed preferences to quite an extent, as does pretty much everything I think of as an agent. They might not be communicable, numerically expressible, or even consistent, which is part of the problem. When you're doing the extrapolated satisfaction, how much of what you get reflects the actual agent and how much the choice of extrapolation procedure?

Presumably the evaluation is not just some sort of average-over-actual-lifespan of some satisfaction rating for the usual reason that (say) annihilating the universe without warning may leave average satisfaction higher than allowing it to continue to exist, even if every agent within it would counterfactually have been extremely dissatisfied if they had known that you were going to do it. This might happen if your estimate of the current average satisfaction was 79% and your predictions of the future were that the average satisfaction over the next trill

...
JBlack · 1y
Right, I suspected the evaluation might be something like that. It does have the difficulty of being counterfactual and so possibly not even meaningful in many cases, but I do like the fact that it's based on agent-situations rather than individual agent-actions. On the other hand, evaluations from the point of view of agents that are sapient beings might be ethically completely dominated by those of 10^12 times as many agents that are ants, and I have no idea how such counterfactual evaluations might be applied to them at all.
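The worry about ants can be put as simple arithmetic. This is a minimal sketch assuming the evaluation is a plain per-agent average; the satisfaction values 0.9 and 0.5 and the helper `population_average` are hypothetical, chosen only to show the effect of a 10^12 population ratio.

```python
def population_average(groups):
    """Per-agent average satisfaction from (count, satisfaction) pairs."""
    total_agents = sum(count for count, _ in groups)
    total_satisfaction = sum(count * satisfaction for count, satisfaction in groups)
    return total_satisfaction / total_agents


# Hypothetical values for illustration only, not empirical estimates.
HUMANS = (1, 0.9)        # one "unit" of sapient agents
ANTS = (10**12, 0.5)     # 10^12 times as many ants

baseline = population_average([HUMANS, ANTS])
# Make every sapient agent perfectly satisfied and see how little moves.
improved = population_average([(1, 1.0), ANTS])

if __name__ == "__main__":
    print(baseline)             # essentially the ants' value, 0.5
    print(improved - baseline)  # on the order of 1e-13
```

Under this assumed averaging rule, the sapient agents' evaluations barely register, which is the domination the comment describes.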

I'm not sure how this system avoids infinitarian paralysis. For all actions with finite consequences in an infinite universe (whether in space, time, distribution, or anything else), the change in the expected value resulting from those actions is zero.

The causal change from your actions is zero. However, there are still logical connections between your actions and the actions of other agents in very similar circumstances. And you can still consider these logical connections to affect the total expected value of life satisfaction.

It's true, though, that...
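A minimal sketch contrasting the causal and logical views, under loudly hypothetical assumptions: a finite population of size N stands in for an infinite one, and a fixed fraction `correlated_fraction` of all agents are in circumstances similar enough that they make the same choice you do. Counting only your own causal effect, a fixed improvement `delta` shrinks toward zero as N grows (the paralysis worry); counting the logically correlated agents too, the change stays at `correlated_fraction * delta` however large N gets. The function names and parameters are illustrative, not part of Chantiel's write-up.

```python
def causal_change(delta, population, affected=1):
    """Change in per-agent average when your choice raises the satisfaction
    of `affected` agents by `delta` and nothing else moves."""
    return affected * delta / population


def correlated_change(delta, correlated_fraction):
    """Change in per-agent average if every agent in relevantly similar
    circumstances makes the same choice, so a fixed fraction of the whole
    population gains `delta`. This does not depend on population size."""
    return correlated_fraction * delta


if __name__ == "__main__":
    delta = 0.1                 # hypothetical improvement per affected agent
    correlated_fraction = 1e-3  # hypothetical share of "very similar" agents
    for population in (10**3, 10**6, 10**9, 10**12):
        print(population,
              causal_change(delta, population),
              correlated_change(delta, correlated_fraction))
```

At N = 1000 the two numbers coincide, since the correlated fraction is then just one agent; as N grows, only the correlated view keeps the effect of your choice from shrinking toward zero.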

JBlack · 2y
Yes, that does clear up both of my questions. Thank you! Presumably the evaluation is not just some sort of average-over-actual-lifespan of some satisfaction rating for the usual reason that (say) annihilating the universe without warning may leave average satisfaction higher than allowing it to continue to exist, even if every agent within it would counterfactually have been extremely dissatisfied if they had known that you were going to do it. This might happen if your estimate of the current average satisfaction was 79% and your predictions of the future were that the average satisfaction over the next trillion years would be only 78.9%. I'm not sure what your idea of the evaluation actually is though, and how it avoids making it morally right (and perhaps even imperative) to destroy the universe in such situations.
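The arithmetic behind this example can be sketched as a duration-weighted average, assuming the naive "average over actual lifespan" evaluation that the comment says the proposal presumably does not use. The 79% and 78.9% figures and the trillion-year horizon come from the comment; the ten-billion-year length of history so far and the helper `lifetime_average` are hypothetical.

```python
def lifetime_average(segments):
    """Duration-weighted average satisfaction over (years, satisfaction) segments."""
    total_years = sum(years for years, _ in segments)
    return sum(years * satisfaction for years, satisfaction in segments) / total_years


PAST = (10**10, 0.79)      # hypothetical length of history so far, at 79%
FUTURE = (10**12, 0.789)   # the next trillion years, predicted at 78.9%

stop_now = lifetime_average([PAST])             # annihilate the universe now
continue_on = lifetime_average([PAST, FUTURE])  # let it keep existing

if __name__ == "__main__":
    # Stopping keeps the average at 0.79; continuing pulls it toward 0.789,
    # so this naive average ranks annihilation higher.
    print(stop_now, continue_on, stop_now > continue_on)
```

Any predicted future that averages even slightly below the past makes stopping look better under this rule, which is why JBlack expects the actual evaluation to be something different.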

I've come up with a system of infinite ethics intended to provide more reasonable moral recommendations than previously-proposed ones. I'm very interested in what people think of this, so comments are appreciated. I've made a write-up of it below.

One unsolved problem in ethics is that aggregate consequentialist ethical theories tend to break down if the universe is infinite. An infinite universe could contain both an infinite amount of good and an infinite amount of bad. If so, you are unable to change the total amount of good or bad in the universe, which ...

JBlack · 2y
I'm not sure how this system avoids infinitarian paralysis. For all actions with finite consequences in an infinite universe (whether in space, time, distribution, or anything else), the change in the expected value resulting from those actions is zero. Actions that may have infinite consequences thus become the only ones that can matter under this theory in an infinite universe. You could perhaps drag in more exotic forms of arithmetic such as surreal numbers or hyperreals, but then you need to rebuild measure theory and probability from the ground up in that basis. You will likely also need to adopt some unusual axioms such as some analogue of the Axiom of Determinacy to ensure that every distribution of satisfactions has an expected value. I'm also not sure how this differs from Average Utilitarianism with a bounded utility function.