Anthony DiGiovanni

(Formerly "antimonyanthony.") I'm an s-risk-focused AI safety researcher at the Center on Long-Term Risk. I (occasionally) write about altruism-relevant topics on my Substack. All opinions my own.


Thanks! Can you say a bit about why you find the kinds of motivations discussed in (edit: changed reference) Sec. 2 of the paper here ad hoc and unmotivated, if you're already familiar with them (no worries if not)? (I would at least agree that rationalizing people's intuitive ambiguity aversion is ad hoc and unmotivated.)


Predicting the long-term future, mostly. (I think imprecise probabilities might be relevant more broadly, though, as an epistemic foundation.)


I think I just don't understand / probably disagree with the premise of your question, sorry. I'm taking as given whatever distinction between these two ontologies is noted in the post I linked. These don't need to be mathematically precise in order to be useful concepts.


*shrug* — I guess it's not worth rehashing pretty old-on-LW decision theory disagreements, but: (1) I just don't find the pre-theoretic verdicts in that paper nearly as obvious as the authors do, since these problems are so out-of-distribution. Decision theory is hard. Also, some interpretations of logical decision theories give the pre-theoretically "wrong" verdict on "betting on the past." (2) I pre-theoretically find the kind of logical updatelessness that some folks claim follows from the algorithmic ontology pretty bizarre. (3) On its face it seems more plausible to me that algorithms just aren’t ontologically basic, they’re abstractions we use to represent (physical) input-output processes.


Thanks, that's helpful!

I am indeed interested in decision theory that applies to agents other than AIs that know their own source code. Though I'm not sure why it's a problem for the physicalist ontology that the agent doesn't know the exact details of itself — seems plausible to me that "decisions" might just be a vague concept, which we still want to be able to reason about under bounded rationality. E.g. under physicalist EDT, what I ask myself when I consider a decision to do X is, "What consequences do I expect conditional on my brain-state going through the process that I call 'deciding to do X' [and conditional on all the other relevant info I know including my own reasoning about this decision, per the Tickle Defense]?" But I might miss your point.

Re: mathematical universe hypothesis: I'm pretty unconvinced, though I at least see the prima facie motivation (IIUC: we want an explanation for why the universe we find ourselves in has the dynamical laws and initial conditions it does, rather than some others). Not an expert here, this is just based on some limited exploration of the topic. My main objections:

- The move from "fundamental physics is very well described by mathematics" to "physics *is* (some) mathematical structure" seems like a map-territory error. I just don't see the justification for this.
- I worry about giving description-length complexity a privileged status when setting priors / judging how "simple" a hypothesis is. The Great Meta-Turing Machine in the Sky as described by Schmidhuber scores very poorly by the speed prior.
- It's very much not obvious to me that conscious experience is computable. (This is a whole can of worms in this community, presumably :).)


Thanks — do you have a specific section of the paper in mind? Is the idea that this ontology is motivated by "finding a decision theory that recommends verdicts in such and such decision problems that we find pre-theoretically intuitive"?


Not sure what you mean by "the math" exactly. I've heard people cite the algorithmic ontology as a motivation for, e.g., logical updatelessness, or for updateless decision theory generally. In the case of logical updatelessness, I think (low confidence!) the idea is that if you don't see yourself as this physical object that exists in "the real world," but rather see yourself as an algorithm instantiated in a bunch of possible worlds, then it might be sensible to follow a policy that doesn't update on e.g. the first digit of pi being odd.

I continue to be puzzled as to why many people on LW are very confident in the "algorithmic ontology" about decision theory:

> So I see all axes *except* the "algorithm" axis as "live debates" -- basically anyone who has thought about it very much seems to agree that you control "the policy of agents who sufficiently resemble you" (rather than something more myopic like "your individual action")

Can someone point to resources that clearly argue for this position? (I don't think that, e.g., the intuition that you ought to cooperate with your exact copy in a Prisoner's Dilemma — much as I share it — is an argument for this ontology. You could endorse the physicalist ontology + EDT, for example.)


> Remember, the default outcome in an n-round prisoner's dilemma in CDT is still constant defect, because you just argue inductively that you will definitely be defected on in the last round. So it being single shot isn't necessary.

I think the inductive argument just isn't *that* strong, when dealing with real agents. If, for whatever reason, you believe that your counterpart will respond in a tit-for-tat manner even in a finite-round PD, even if that's not a Nash equilibrium strategy, your best response is not necessarily to defect. So CDT in a vacuum doesn't prescribe always-defect; you need assumptions about the players' beliefs, and I think the assumption of Nash equilibrium, or of common knowledge of backward induction plus iterated deletion of dominated strategies, is questionable.
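To make this concrete, here's a toy sketch (payoff values are the standard textbook ones I've chosen for illustration, not anything from the discussion above) showing that against a tit-for-tat counterpart in a finite-round PD, always-defect does worse than a conditional strategy:

```python
# Payoffs for (my move, their move), standard PD values:
# T=5 (temptation), R=3 (reward), P=1 (punishment), S=0 (sucker)
PAYOFF = {("D", "C"): 5, ("C", "C"): 3, ("D", "D"): 1, ("C", "D"): 0}

def play(my_strategy, rounds=10):
    """Play `rounds` of PD against tit-for-tat; return my total payoff."""
    their_move = "C"  # tit-for-tat opens by cooperating
    total = 0
    for r in range(rounds):
        my_move = my_strategy(r, rounds)
        total += PAYOFF[(my_move, their_move)]
        their_move = my_move  # tit-for-tat copies my previous move
    return total

always_defect = lambda r, n: "D"
# Cooperate until the final round, then defect. (Not claiming this is
# optimal -- just that it beats always-defect against tit-for-tat.)
defect_last = lambda r, n: "D" if r == n - 1 else "C"

print(play(always_defect))  # 5 + 9*1 = 14
print(play(defect_last))    # 9*3 + 5 = 32
```

The point is just that the backward-induction verdict depends on believing the counterpart is also doing backward induction; drop that assumption and defection stops dominating.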

Also, of course, CDT agents can use conditional commitment + coordination devices.

> the whole problem with TDT-ish arguments is that we have very little principled foundation of how to reason when two actors are quite imperfect decision-theoretic copies of each other

Agreed!

Exactly — and I don't see how this is in tension with imprecision. The motivation for imprecision is that no single prior seems to accurately represent our actual state of knowledge/ignorance.
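For illustration (a toy sketch with made-up numbers, not anything from this thread): imprecision just means representing your epistemic state with a *set* of priors and reporting the interval of expectations they induce, rather than committing to one prior:

```python
# A credal set: several priors over two hypotheses, none of which we
# single out as "the" representation of our state of knowledge.
# (Toy numbers of my own choosing.)
priors = [
    {"H1": 0.2, "H2": 0.8},
    {"H1": 0.5, "H2": 0.5},
    {"H1": 0.7, "H2": 0.3},
]
utility = {"H1": 10.0, "H2": 0.0}  # payoff of some act under each hypothesis

def expectation(p):
    """Expected utility of the act under a single prior p."""
    return sum(p[h] * utility[h] for h in p)

values = [expectation(p) for p in priors]
# The "imprecise" verdict is the lower/upper expectation pair,
# not a single number.
print(min(values), max(values))  # 2.0 7.0
```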