Anthony DiGiovanni

(Formerly "antimonyanthony.") I'm an s-risk-focused AI safety researcher at the Center on Long-Term Risk. I (occasionally) write about altruism-relevant topics on my Substack. All opinions my own.

The branch that's about sequential decision-making, you mean? I'm unconvinced by this too; see e.g. here — I'd appreciate more explicit arguments for this being "nonsense."

my response: "The arrow does not point toward most Sun prayer decision rules. In fact, it only points toward the ones that are secretly bayesian expected utility maximization. Anyway, I feel like this does very little to address my original point that there is this big red arrow pointing toward bayesian expected utility maximization and no big red arrow pointing toward Sun prayer decision rules."

I don't really understand your point, sorry. "Big red arrows towards X" are only a problem for doing Y if (1) they tell me that doing Y is inconsistent with doing [the form of X that's necessary to avoid leaving value on the table]. And these arrows aren't action-guiding for me unless (2) they tell me which particular variant of X to do. I've argued that there is no sense in which either (1) or (2) is true. Further, I think there are various big green arrows towards Y, as sketched in the SEP article and Mogensen paper I linked in the OP, though I understand if these aren't fully satisfying positive arguments. (I tentatively plan to write such positive arguments up elsewhere.)

I'm just not swayed by vibes-level "arrows" if there isn't an argument that my approach is leaving value on the table by my lights, or that you have a particular approach that doesn't do so.

Addendum: The approach I take in "*Ex ante* sure losses are irrelevant if you never actually occupy the *ex ante* perspective" has precedent in Hedden's (2015) defense of "time-slice rationality," which I highly recommend. Relevant quote:

I am unmoved by the Diachronic Dutch Book Argument, whether for Conditionalization or for Reflection. This is because from the perspective of Time-Slice Rationality, it is question-begging. It is uncontroversial that collections of distinct agents can act in a way that predictably produces a mutually disadvantageous outcome without there being any irrationality. The defender of the Diachronic Dutch Book Argument must assume that this cannot happen with collections of time-slices of the same agent; if a collection of time-slices of the same agent predictably produces a disadvantageous outcome, there is ipso facto something irrational going on. Needless to say, this assumption will not be granted by the defender of Time-Slice Rationality, who thinks that the relationship between time-slices of the same agent is not importantly different, for purposes of rational evaluation, from the relationship between time-slices of distinct agents.

I reject the premise that my beliefs are equivalent to my betting odds. My betting odds are a decision, which I derive from my beliefs.

It's not that I "find it unlikely on priors" — I'm literally asking what your prior on the proposition I mentioned is, and why you endorse that prior. If you answered that, I could answer why I'm skeptical that that prior really is the unique representation of your state of knowledge. (It might well be the unique representation of the most-salient-to-you intuitions about the proposition, but that's not your state of knowledge.) I don't know what further positive argument you're looking for.

really ridiculously strong claim

What's your prior that in 1000 years, an Earth-originating superintelligence will be aligned to object-level values close to those of humans alive today [for whatever operationalization of "object-level" or "close" you like]? And why do you think that prior uniquely accurately represents your state of knowledge? Seems to me like the view that a single prior *does* accurately represent your state of knowledge is the strong claim. I don’t see how the rest of your comment answers this.

(Maybe you have in mind a very different conception of “represent” or “state of knowledge” than I do.)

And indeed, it is easy to come up with a case where the action that gets chosen is not best according to any distribution in your set of distributions: let there be one action which is uniformly fine, and, for each distribution in the set, let there be an action which is great according to that distribution and disastrous according to every other distribution. The uniformly fine action gets selected, but this isn't EV max for any distribution in your representor.

Oops sorry, my claim had the implicit assumptions that (1) your representor includes all the convex combinations, and (2) you can use mixed strategies. ((2) is standard in decision theory, and I think (1) is a reasonable assumption — if I feel clueless as to how much I endorse distribution p vs distribution q, it seems weird for me to still be confident that I don't endorse a mixture of the two.)

If those assumptions hold, I think you can show that the max-regret-minimizing action maximizes EV w.r.t. some distribution in your representor. I don't have a proof on hand but would welcome counterexamples. In your example, you can check that either the uniformly fine action does best on a mixture distribution, or a mix of the other actions does best (lmk if spelling this out would be helpful).
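To make the example above concrete, here's a minimal sketch with made-up numbers: a representor with two extreme distributions p and q, a uniformly fine action, and two actions that are each great under one distribution and disastrous under the other. The expected utilities are illustrative, not from the thread. It checks that the minimax-regret choice is the uniformly fine action, and that this action *is* EV-maximal under the 50/50 convex combination of p and q, even though it isn't under p or q alone.

```python
# Hypothetical expected utilities: EV[action] = (EU under p, EU under q).
# a0 is uniformly fine; a1/a2 are great under one extreme distribution
# and disastrous under the other.
EV = {
    "a0": (1, 1),
    "a1": (10, -10),
    "a2": (-10, 10),
}

def ev_under_mixture(action, lam):
    """Expected utility under the convex combination lam*p + (1-lam)*q."""
    ep, eq = EV[action]
    return lam * ep + (1 - lam) * eq

def max_regret(action):
    """Worst-case regret over the two extreme distributions."""
    regrets = []
    for i in range(2):  # i indexes the distribution (0 = p, 1 = q)
        best = max(EV[a][i] for a in EV)
        regrets.append(best - EV[action][i])
    return max(regrets)

# Minimax regret picks the uniformly fine action...
mmr_action = min(EV, key=max_regret)

# ...and once convex combinations are in the representor, that same action
# maximizes EV w.r.t. the 50/50 mixture of p and q.
best_at_half = max(EV, key=lambda a: ev_under_mixture(a, 0.5))
print(mmr_action, best_at_half)  # a0 a0
```

With these numbers, a0 is EV-maximal for any mixture weight between 0.45 and 0.55, so the minimax-regret choice is representable after the fact as precise EV maximization w.r.t. a distribution in the convex-closed representor, consistent with the claim above.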

If you buy the CCT's assumptions, then you literally do have an argument that anything other than precise EV maximization is bad

No, you have an argument that {anything that *cannot be represented after the fact* as precise EV maximization, with respect to some utility function and distribution} is bad. This doesn't imply that an agent who maintains imprecise beliefs will do badly.

Maybe you're thinking something like: "The CCT says that my policy is guaranteed to be Pareto-efficient iff it maximizes EV w.r.t. some distribution. So even if I don't know which distribution to choose, and even though I'm not guaranteed *not* to be Pareto-efficient if I follow Maximality, I at least **know** I don't violate Pareto-efficiency if I do precise EV maximization"?

If so: I'd say that there are several imprecise decision rules that can be represented after the fact as precise EV max w.r.t. some distributions, so the CCT doesn't rule them out. E.g.:

- The minimax regret rule (sec 5.4.2 of Bradley (2012)) is equivalent to EV max w.r.t. the distribution in your representor that induces maximum regret.
- The maximin rule (sec 5.4.1) is equivalent to EV max w.r.t. the most pessimistic distribution.
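The maximin equivalence in the second bullet can be illustrated with a toy instance (all numbers hypothetical, not from Bradley): two actions, three states, and a two-element representor. The sketch identifies the "most pessimistic" distribution as the one under which the best achievable EV is lowest, and checks that maximin's choice coincides with plain EV maximization under that distribution.

```python
# Hypothetical utilities: U[action][state].
U = [
    [5.0, 0.0, 0.0],   # risky action: great in state 0, nothing otherwise
    [2.0, 2.0, 2.0],   # safe action: fine everywhere
]
# A two-element representor: distributions over the three states.
representor = [
    [0.8, 0.1, 0.1],   # optimistic about state 0
    [0.1, 0.8, 0.1],   # pessimistic about state 0
]

def ev(action, dist):
    """Expected utility of an action under one distribution."""
    return sum(u * p for u, p in zip(U[action], dist))

# Maximin (sec 5.4.1): pick the action with the highest worst-case EV.
maximin_action = max(range(len(U)),
                     key=lambda a: min(ev(a, d) for d in representor))

# "Most pessimistic" distribution: the one under which the best
# achievable EV is lowest (the least favorable member of the representor).
pessimistic = min(representor,
                  key=lambda d: max(ev(a, d) for a in range(len(U))))

# Precise EV maximization w.r.t. that distribution picks the same action.
ev_max_action = max(range(len(U)), key=lambda a: ev(a, pessimistic))
print(maximin_action == ev_max_action)  # True
```

In this instance the safe action wins both ways, so maximin is representable after the fact as precise EV maximization w.r.t. a member of the representor; in general this kind of equivalence relies on the convexity/mixed-strategy assumptions flagged earlier in the thread.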

You might say "Then why not just do precise EV max w.r.t. those distributions?" But the whole problem you face as a decision-maker is, **how do you decide which distribution?** Different distributions recommend different policies. If you endorse precise beliefs, it seems you'll commit to one distribution that you think best represents your epistemic state. Whereas someone with imprecise beliefs will say: "My epistemic state is not represented by just one distribution. I'll evaluate the imprecise decision rules based on which decision-theoretic desiderata they satisfy, then apply the most appealing decision rule (or some way of aggregating them) w.r.t. my imprecise beliefs." If the decision procedure you follow is psychologically equivalent to my previous sentence, then I have no objection to your procedure — I just think it would be misleading to say you endorse precise beliefs in that case.

Thanks for the detailed answer! I won't have time to respond to everything here, but:

I like the canonical arguments for bayesian expected utility maximization (https://www.alignmentforum.org/posts/sZuw6SGfmZHvcAAEP/complete-class-consequentialist-foundations; also https://web.stanford.edu/~hammond/conseqFounds.pdf seems cool (though I haven't read it properly)). I've never seen anything remotely close for any of this other stuff

But the CCT only says that if you satisfy [blah], your policy is *consistent* with precise EV maximization. This doesn't imply your policy is *inconsistent* with Maximality, nor (as far as I know) does it tell you which distribution you should maximize precise EV with respect to in order to satisfy [blah] (or even that such a distribution is unique). So I don’t see a positive case here for precise EV maximization [ETA: as a procedure to guide your decisions, that is]. (This is also my response to your remark below about “equivalent to "act consistently with being an expected utility maximizer".”)

e.g. if one takes the cost of thinking into account in the calculation, or thinks of oneself as choosing a policy

Could you expand on this with an example? I don’t follow.

people often talk about things like default actions, permissibility, and preferential gaps, and these concepts seem bad to me. More precisely, they seem unnatural/unprincipled/confused/[I have a hard time imagining what they could concretely cache out to that would make the rule seem non-silly/useful].

Maximality and imprecision don’t make any reference to “default actions,” so I’m confused. I also don’t understand what’s unnatural/unprincipled/confused about permissibility or preferential gaps. They seem quite principled to me: I have a strict preference for taking action A over B (/ B is impermissible) only if I’m justified in beliefs according to which I expect A to do better than B.

basically everything becomes permissible, which seems highly undesirable

This is a much longer conversation, but briefly: I think it’s ad hoc / putting the cart before the horse to shape our epistemology to fit our intuitions about what decision guidance we should have.

My claim is that your notion of "utter disaster" presumes that a consequentialist under deep uncertainty has some sense of what to do, such that they don't consider ~everything permissible. This begs the question against severe imprecision. I don't really see why we should expect our pretheoretic intuitions about the verdicts of a value system as weird as impartial longtermist consequentialism, under uncertainty as severe as ours, to be a guide to our epistemics.

I agree that intuitively it's a very strange and disturbing verdict that ~everything is permissible! But that seems to be the fault of impartial longtermist consequentialism, not imprecise beliefs.