Why does this pose an issue for reinforcement learning? Forgive my ignorance; I do not have a background in the subject. I don't believe I have information that distinguishes cereal from granola in terms of which has worse highest-severity consequences (given the smallness of those numbers and my inability to conceive of them, I strongly suspect anything I could come up with would exclusively represent epistemic and not aleatoric uncertainty). But even if I accept that I do, the theory would tell me, correctly, that I should act based on that level. If that seems wrong, then it's evidence we've incorrectly identified an implicit severity class in our imagination of the hypothetical, not that severity classes are incoherent (i.e. if I really have reason to believe that eating cereal even slightly increases the chance of Universe Destruction compared to eating granola, shouldn't that make my decision for me?).

I would argue that many actions are sufficiently isolated that, while they'll certainly have high-severity ripple effects, we have no reason to believe that in expectation the high-severity consequences are worse than they would have been for a different action.

If the non-Archimedean framework really does "collapse" to an Archimedean one in practice, that's fine with me. It exists to respond to a theoretical question about qualitatively different forms of utility, without biting a terribly strange bullet. Collapsing the utility function would mean assigning weight 0 to all but the maximal severity level, which seems very bad in that we certainly prefer no dust specks in our eyes to dust specks (ceteris paribus), and this should be accurately reflected in our evaluation of world states, even if the ramped function does lead to the same action preferences in many/most real-life scenarios for a sufficiently discerning agent (which maybe AI will be, but I know I am not).
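To make the point about collapsing concrete, here is a minimal sketch. It assumes, purely for illustration, that a world state is scored by a two-component utility vector ordered most severe level first; the names and numbers are hypothetical and not taken from the post.

```python
# Hypothetical utility vectors, ordered most severe level first:
# (torture-level utility, dust-speck-level utility). All numbers are made up.
no_specks = (-1, 0)
with_specks = (-1, -1000)


def lexicographic_value(world):
    """Keep every severity level; Python compares tuples lexicographically."""
    return tuple(world)


def collapsed_value(world):
    """Collapse to an Archimedean utility by keeping only the top level."""
    return world[0]


# The full vector still registers that fewer dust specks is better...
prefers_no_specks = lexicographic_value(no_specks) > lexicographic_value(with_specks)
# ...while the collapsed utility cannot tell the two worlds apart.
collapsed_indifferent = collapsed_value(no_specks) == collapsed_value(with_specks)

print(prefers_no_specks, collapsed_indifferent)
```

Under these assumptions the two evaluations agree whenever the top components differ, and come apart only when they tie, which is the case the collapsed function throws away.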

If we had infinite compute, that would not eliminate empirical uncertainty. There are many things you cannot compute because you just don't have enough information. This is why in learning theory sample complexity is distinct from computational complexity, and applies to algorithms with unlimited computational resources. So, you would definitely still need to take expectations.

Thanks for letting me know about this! Another thing I haven't studied.

I agree that small decisions have high-severity impacts; my point was that it is only worth the time to evaluate that impact if there aren't other decisions I could spend that time making which have much greater impact and for which my decision time will be more effectively allocated. This is a comment about how using the non-Archimedean framework goes in practice. Certainly, if we had infinite compute and time to make every decision, we should focus on the most severe impacts of those decisions, but that is not the world we are in (and if we were, it would change quite a lot of other things too, effectively eliminating empirical uncertainty and the need to take expectations at all).

Yes! This is all true. I thought set differences of infinite unions and quotients would only make the post less accessible for non-mathematicians though. I also don't see a natural way to define the filtration without already having defined the severity classes.

The thing you called "pseudograding" is normally called "filtration".

Ah, thanks! I knew there had to be something for that, just couldn't remember what it was. I was embarrassed posting with a made-up word, but I really did look (and ask around) and couldn't find what I needed.

...Although, reading the definition, I'm not sure it's exactly the same...the severity classes aren't nested, and I think this is probably an important distinction to the conceptual framing, even if the math is equivalent. If I start with a filtration proper, I need to extract the severity classes in a way that seems slightly more convoluted than what I did.

In practice, because of the complexity of the world, and especially because of the presence of probabilistic uncertainty, an agent following a non-Archimedean utility function will always consider only the component corresponding to the absolute maximum of I, since there will never be a choice between A and B such that these components just happen to be exactly equal. So it will be equivalent to an Archimedean agent whose utility is this worst component.

See my response to Dacyn.

we find the need for a weird cut off point, like a broken arm

For the cut-off point on a broken arm, I recommend the elbow [not a doctor].

Suppose there was a strong clustering effect in human psychology, such that less than a week of torture left people's minds in one state, and more than a week left them broken. I would still expect the possibility of some intermediate cases on the borderlines. With things as messy as human psychology, I would not expect a perfectly sharp black-and-white cutoff. If we zoom in enough, we find that the space of possible quantum wavefunctions is continuous.

I agree! You've made my point for me: it is precisely this messiness which grants us continuity on average. Some people will take longer than others to have qualitatively incomparably damaging effects from torture, and as such the expected impact of any significant torture will have a component on the severity level of 50 years torture. Hence, comparable (on expectation).

Once you introduce any meaningful uncertainty into a non-Archimedean utility framework, it collapses into an Archimedean one. This is because even a very small difference in the probabilities of some highly positive or negative outcome outweighs a certainty of a lesser outcome that is not Archimedean-comparable. And if the probabilities are exactly aligned, it is more worth your time to do more research so that they will be less aligned, than to act on the basis of a hierarchically less important outcome.

I don't think this is true. As an example, when I wake up in the morning I make the decision between granola and cereal for breakfast. Universe Destruction is undoubtedly high up on the severity scale (certainly higher than crunch satisfaction utility), so your argument suggests that I should spend time researching which choice is more likely to impact that. However, the difference in expected impact in these options is so averse to detection that, despite the fact that I literally act on this choice every single day of my life, it would never be worth the time to research breakfast foods instead of other choices which have stronger (i.e. measurable by the human mind) impacts on Universe Destruction.

This is not a bug, but an incredible feature of the non-Archimedean framework. It allows you to evaluate choices only on the highest severity level at which they actually occur, which is in fact how humans seem to make their decisions already, to some approximation.
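A rough sketch of how that plays out for the breakfast choice, under loudly stated assumptions: outcomes are scored by hypothetical two-level utility vectors (Universe Destruction first, crunch satisfaction second), expectations are taken component-wise, and options are compared lexicographically. None of these numbers come from the discussion; they only illustrate the mechanism.

```python
from fractions import Fraction

# A lottery is a list of (probability, utility_vector) pairs.
# Utility vectors are ordered most severe level first:
# (Universe Destruction, crunch satisfaction). All values are hypothetical.


def expected_utility(lottery):
    """Component-wise expectation of the utility vectors in a lottery."""
    levels = len(lottery[0][1])
    return tuple(sum(p * u[i] for p, u in lottery) for i in range(levels))


def choose(options):
    """Pick the option whose expected utility vector is lexicographically largest."""
    return max(options, key=lambda name: expected_utility(options[name]))


granola = [(Fraction(1), (0, 3))]
cereal = [(Fraction(1), (0, 5))]

# With nothing to distinguish the options at the top level, the choice
# is made at the highest level where they actually differ: crunch.
print(choose({"granola": granola, "cereal": cereal}))

# If there were real evidence of even a tiny top-level difference,
# that difference would decide the matter regardless of crunch.
p = Fraction(1, 10**12)
risky_cereal = [(p, (-1, 5)), (1 - p, (0, 5))]
print(choose({"granola": granola, "cereal": risky_cereal}))
```

Exact fractions are used so that a one-in-a-trillion probability is not lost to floating-point rounding. The second case is the one the collapse argument relies on; the first is the everyday case where no such information is available.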

As for the car example, your analysis seems sound (assuming there's no positive expected utility at or above the severity level of car crash injuries to counterbalance it, which is not necessarily the case--e.g. driving somewhere increases the chance that you meet more people and consequently find the love(s) of your life, which may well be worth a broken limb or two. Alternatively, if you are driving to a workshop on AI risk then you may believe yourself to be reducing the expected disutility from unaligned AI, which appears to be incomparable with a car crash). But, forgiving my digression and argument of the hypothetical: the claim that not driving is (often) preferable to driving feels much more reasonable to me than the claim that some number of dust specks is worse than torture.

If SPECKS is preferable to TORTURE, then for some N and some level of torture X, you must prefer that 10N people be tortured at level X rather than N people at a slightly higher level X'. This is unreasonable, since X' is only slightly higher than X, while you are forcing 10 times as many people to suffer the torture.

I'm not sure I understand this properly. To clarify, I don't believe that any non-torture suffering is incomparable with torture, merely that dust specks are. I think "slightly higher level" is potentially misleading here--if it's in a different severity class, then by definition there is nothing slight about it. Depending on the order type, there may not even be a level immediately above torture X, and it may be that there are infinitely many severity classes sitting between X and any distinct X' (think of a dense order such as ℚ or ℝ).

I agree that delineating the precise boundaries of comparability classes is a uniquely challenging task. Nonetheless, it does not mean they don't exist--to me your claim feels along the same lines as classical induction "paradoxes" involving classifying sand heaps. While it's difficult to define exactly what a sand heap is, we can look at many objects and say with certainty whether or not they are sand heaps, and that's what matters for living in the world and making empirical claims (or building sandcastles anyway).

I suspect it's quite likely that experiences you may be referring to as "higher quantities of themselves" within a single person are in fact qualitatively different and no longer comparable utilities in many cases. Consider the dust specks: they are assumed to be minimally annoying and almost undetectable to the bespeckèd. However, if we even slightly upgrade them so as to cause a noticeable sting in their targeted eye, they appear to reach a whole different level. I'd rather spend my life plagued by barely noticeable specks (assuming they have no interactions) than have one slightly burn my eyeball.