abramdemski

> The fact that I could gamble more wisely if I had access to more computation doesn't seem to undercut the reasons for using probabilities when I don't.

I am not trying to undercut the use of probability in the broad sense of using numbers to represent degrees of belief.

However, if "probability" means "the Kolmogorov axioms", we can easily undercut these by the argument you mention: we can consider a (quite realistic!) case where we don't have enough computational power to enforce the Kolmogorov axioms precisely. We conclude that we should avoid easily-computed Dutch books, but may be vulnerable to some hard-to-compute Dutch books.
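To make "easily-computed Dutch book" concrete, here is a minimal sketch (a standard textbook case; the numbers are hypothetical): when an agent's credences in A and not-A sum to more than 1, a bookie can sell both tickets and guarantee the agent a loss in every outcome.

```python
# Toy sketch of an easily-computed Dutch book (hypothetical numbers):
# credences violating the Kolmogorov axioms, here P(A) + P(not-A) > 1,
# let a bookie guarantee the agent a loss in every outcome.

def agent_net_payoffs(p_a: float, p_not_a: float):
    """Bookie sells the agent a $1-if-A ticket at price p_a and a
    $1-if-not-A ticket at price p_not_a. Return the agent's net payoff
    in the two possible outcomes (A true, A false)."""
    cost = p_a + p_not_a
    # Exactly one of the two tickets pays out $1, whichever way A turns out.
    return (1.0 - cost, 1.0 - cost)

# Incoherent credences: P(A) + P(not-A) = 1.2 > 1.
payoff_if_a, payoff_if_not_a = agent_net_payoffs(0.6, 0.6)
assert payoff_if_a < 0 and payoff_if_not_a < 0  # guaranteed loss either way
```

With coherent prices (summing to exactly 1) the same pair of bets nets to zero, so no guaranteed loss exists; detecting the violation here takes only one addition, which is what makes this book "easily computed".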

> Now in the extreme adversarial case, a bookie could come along who knows my computational limits and only offers me bets where I lose in expectation. But this is also a problem for empirical uncertainty; in both cases, if you literally face a bookie who is consistently winning money from you, you could eventually infer that they know more than you and stop accepting their bets. I still see no fundamental difference between empirical and logical uncertainties.

Yes, exactly. In the perspective I am offering, the only difference between bookies who we stop betting with due to a history of losing money, vs bookies we stop betting with due to a priori knowing better, is that the second kind corresponds to something we already knew (already had high prior weight on).

In the classical story, however, there are bookies we avoid a priori as a matter of logic alone (we could say that the classical perspective insists that the Kolmogorov axioms are known a priori -- which is completely fine and good if you've got the computational power to do it).

Here's an explanation that may help.

You can think of classical Bayesian reasoning as justified by Dutch Book arguments. However, for a Dutch Book argument to be convincing, there's an important condition that we need: the bookie needs to be just as ignorant as the agent. If the bookie makes money off the agent because the bookie knows an insider secret about the horse race, we don't think of this as "irrational" on the part of the agent.

This assumption is typically packaged into the part of a Dutch Book argument where we say the Dutch Book "guarantees a net loss" -- if the bookie is using insider knowledge, then it's not a "guarantee" of a net loss. This "guarantee" needs to be made with respect to all the ways things could empirically turn out.

However, this distinction becomes fuzzier when we consider embedded agency, and in particular, computational uncertainty. If the agent has observed the length of two sides of a right triangle, then it is possible to compute the length of the remaining side. Should we say, on the one hand, that there is a Dutch Book against agents who do not correctly compute this third length? Or should we complain that a bookie who has completed the computation has special insider knowledge, which our agent may lack due to not having completed the computation?
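The triangle case can be made concrete with a toy sketch (all numbers and credences here are hypothetical, chosen only for illustration): the bookie has finished the computation and so bets with certainty, while the agent's logical uncertainty is still unresolved.

```python
import math

# Toy sketch of "logical insider knowledge" (hypothetical numbers).
# The agent has observed the two legs of a right triangle but has not yet
# computed the hypotenuse; the bookie has.
leg_a, leg_b = 3.0, 4.0
true_hypotenuse = math.hypot(leg_a, leg_b)  # logically determined: 5.0

# Pre-computation, the agent might assign only, say, 0.5 credence to the
# hypotenuse exceeding 4.9; the bookie, having done the computation,
# assigns credence 1.0 and profits in expectation by betting on it.
agent_credence = 0.5
bookie_credence = float(true_hypotenuse > 4.9)

assert true_hypotenuse == 5.0 and bookie_credence == 1.0
```

Nothing here is empirical: the bookie's edge comes entirely from having performed a computation the agent could in principle perform too, which is exactly what makes the "insider knowledge" complaint hard to adjudicate.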

If we bite the "no principled distinction" bullet, we can develop a theory where we learn to avoid making logical mistakes (such as classical Dutch Books, or the triangle example) in exactly the same manner that we learn to avoid empirical mistakes (such as learning that the sun rises every morning). Instead of getting a guarantee that we never give in to a Dutch Book, we get a bounded-violations guarantee; we can only lose so much money that way before we wise up.

Yeah, in hindsight I realize that my iterated mugging scenario only communicates the intuition to people who already have it. The Lizard World example seems more motivating.

You can do exploration, but the problem is that (unless you explore into non-fixed-point regions, violating epistemic constraints) your exploration can never confirm the existence of a fixed point which you didn't previously believe in. However, I agree that the situation is analogous to the handstand example, assuming it's true that you'd never try the handstand. My sense is that the difficulties I describe here are "just the way it is" and only count against FixDT in the sense that we'd be happier with FixDT if somehow these difficulties weren't present.

I think your idea for how to find repulsive fixed-points could work if there's a trader who can guess the location of the repulsive point exactly rather than approximately, and has the wealth to precisely enforce that belief on the market. However, the wealth of that trader will act like a martingale; there's no reliable profit to be made (even on average) by enforcing this fixed point. Therefore, such a trader will go broke eventually. On the other hand, attractive fixed points allow profit to be made (on average) by approximately guessing their locations.

Repulsive points effectively "drain willpower".
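The attractive/repulsive asymmetry above can be seen in a toy iteration (a hedged sketch: the quadratic map below stands in for whatever the market's actual update dynamics are, and is chosen only because its fixed-point structure is easy to verify):

```python
# Toy illustration (not FixDT itself; map chosen only for simplicity):
# f(x) = x**2 has an attractive fixed point at 0 (|f'(0)| = 0 < 1) and a
# repulsive fixed point at 1 (|f'(1)| = 2 > 1). An approximate guess
# converges to the attractive point but is driven away from the repulsive
# one, which is why approximate guessing only pays off for attractive
# fixed points.

def iterate(f, x, n):
    for _ in range(n):
        x = f(x)
    return x

f = lambda x: x ** 2
near_attractive = iterate(f, 0.1, 10)   # error shrinks toward 0
near_repulsive = iterate(f, 1.01, 10)   # error explodes away from 1

assert abs(near_attractive - 0.0) < 1e-6
assert abs(near_repulsive - 1.0) > 100
```

An exact guess (starting at exactly 1.0) would stay put, mirroring the point that a trader who can name the repulsive point precisely could enforce it, while any approximate guess gets pushed away.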

> (Some might believe that most people are mediocre, and their music would end up bland and similar to each other’s — but if no one else is in the feedback loop, how could that even happen? And if it did, what would this neutral, universally human music look like? A folk song from Borneo? Classical music? Hip-hop?)

This doesn't seem like the default to me. The default is AI companies doing centralized work to make a good product. All the users are in the feedback loop. Some customization for individual users is valuable, but the prior that's been developed through interaction with lots of people is going to do a ton of the work. Your intuition that music becomes super-individualized seems based on an intuition that the AI customization "grows with you", going deep down a rabbit hole over years. This doesn't seem like the sort of thing the companies are incentivized to create. The experience for new users is much more important for adoption.


I think so, yes, but I want to note that my position is consistent with nosy-neighbor hypotheses not making sense. A big part of my point is that there's a lot of nonsense in a broad prior. I think it's hard to rule out the nonsense without learning. If someone thought nosy neighbors always 'make sense', it could be an argument against my whole position. (Because that person might be just fine with UDT, thinking that my nosy-neighbor 'problems' are just counterfactual muggings.)

Here's an argument that nosy neighbors can make sense.

For values, as I mentioned, a nosy-neighbors hypothesis is a value system which cares about what happens in many different universes, not just the 'actual' universe. For example, a utility function which assigns some value to statements of mathematics.

For probability, a nosy-neighbor is like the Lizard World hypothesis mentioned in the post: it's a world where what happens there depends a lot on what happens in other worlds.

I think what you wrote about staples vs paperclips nosy-neighbors is basically right, but maybe a rephrasing can 'make more sense' of it: "I (actual me) value paperclips being produced in the counterfactual(-from-my-perspective) world where I (counterfactual me) don't value paperclips."

Anyway, whether or not it makes intuitive sense, it's mathematically fine. The idea is that a world will contain facts that are a good lens into alternative worlds (such as facts of Peano Arithmetic), which utility hypotheses / probabilistic hypotheses can care about. So although a hypothesis is only mathematically defined as a function of worlds where it holds, it "sneakily" depends on stuff that goes on in other worlds as well.


> I disagree with this framing. Sure, if you have 5 different cakes, you can eat some and have some. But for any particular cake, you can't do both. Similarly, if you face 5 (or infinitely many) identical decision problems, you can choose to be updateful in some of them (thus obtaining useful Value of Information, that increases your utility in some worlds), and updateless in others (thus obtaining useful strategic coherence, that increases your utility in other worlds). The fundamental dichotomy remains as sharp, and it's misleading to imply we can surmount it. It's great to discuss, given this dichotomy, which trade-offs we humans are more comfortable making. But I've felt this was obscured in many relevant conversations.

I don't get your disagreement. If your view is that you can't eat one cake and keep it too, and my view is that you can eat some cakes and keep other cakes, isn't the obvious conclusion that these two views are compatible?

I would also argue that you can slice up a cake and keep some slices but eat others (this corresponds to mixed strategies), but this feels like splitting hairs rather than getting at some big important thing. My view is mainly about iterated situations (more than one cake).

Maybe your disagreement would be better stated in a way that didn't lean on the cake analogy?

> My point is that the theoretical work you are shooting for is so general that it's closer to "what sorts of AI designs (priors and decision theories) should always be implemented", rather than "what sorts of AI designs should humans in particular, in this particular environment, implement".
> And I think we won't gain insights on the former, because there are no general solutions, due to fundamental trade-offs ("no-free-lunches").
> I think we could gain many insights on the latter, but that the methods better fit for that are less formal/theoretical and way messier/"eye-balling"/iterating.

Well, one way to continue this debate would be to discuss the concrete promising-ness of the pseudo-formalisms discussed in the post. I think there are some promising-seeming directions.

Another way to continue the debate would be to discuss theoretically whether theoretical work can be useful.

It sort of seems like your point is that theoretical work always needs to be predicated on simplifying assumptions. I agree with this, but I don't think it makes theoretical work useless. My belief is that we should continue working to make the assumptions more and more realistic, but the 'essential picture' is often preserved under this operation. (EG, Newtonian gravity and general relativity make most of the same predictions in practice. Kolmogorov axioms vindicated a lot of earlier work on probability theory.)


> This was very thought-provoking, but unfortunately I still think this crashes head-on with the realization that, a priori and in full generality, we can't differentiate between safe and unsafe updates. Indeed, why would we expect that no one will punish us by updating on "our own beliefs" or "which beliefs I endorse"? After all, that's just one more part of reality (without a clear boundary separating it).

I'm comfortable explicitly assuming this isn't the case for nice clean decision-theoretic results, so long as it looks like the resulting decision theory also handles this possibility 'somewhat sanely'.

> It sounds like you are correctly explaining that our choice of prior will be, in some important sense, arbitrary: we can't know the correct one in advance; we always have to rely on extrapolating contingent past observations.
> But then, it seems like your reaction is still hoping that we can have our cake and eat it: "I will remain uncertain about which beliefs I endorse, and only later will I update on the fact that I am in this or that reality. If I'm in the Infinite Counterlogical Mugging... then I will just eventually change my prior because I noticed I'm in the bad world!". But then again, why would we think this update is safe? That's just not being updateless, and losing out on the strategic gains from not updating.

My thinking is more that we should accept the offer finitely many times, or some fraction of the time, so that we reap some of the gains from updatelessness while also 'not sacrificing too much' in particular branches.

That is: in this case at least it seems like there's concrete reason to believe we can have some cake and eat some too.

> Since a solution doesn't exist in full generality, I think we should pivot to more concrete work related to the "content" (our particular human priors and our particular environment) instead of the "formalism".

This content-work seems primarily aimed at discovering and navigating actual problems similar to the decision-theoretic examples I'm using in my arguments. I'm more interested in gaining insights about what sorts of AI designs humans should implement. IE, the specific decision problem I'm interested in doing work to help navigate is the tiling problem.


You're right, I was overstating there. I don't think it's probable that everything cancels out, but a more realistic statement might be something like "if UDT starts with a broad prior which wasn't designed to address this concern, there will probably be many situations where its actions are more influenced by alternative possibilities (delusional, from our perspective) than by what it knows about the branch that it is in".


Yeah, I expect the Lizard World argument to be the more persuasive argument for a similar point. I'm thinking about reorganizing the post to make it more prominent.