ArthurB


How I Lost 100 Pounds Using TDT

Indeed, there is nothing epistemically irrational about having a hyperbolic time preference. However, it does mean that a classical decision algorithm is not conducive to achieving long-term goals.

One way around this problem is to use TDT; another is to modify your preferences to be geometric (i.e., exponential discounting, which is time-consistent).

A geometric time preference is a bit like a moral preference... it's a para-preference. Not something you want in the first place, but something you benefit from wanting when interacting with other agents (including your future self).
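
To illustrate the difference, here is a minimal sketch (the discount parameters and reward sizes are invented for illustration, not anything from the comment): a hyperbolic discounter reverses its preference between a smaller-sooner and a larger-later reward as both draw near, while a geometric (exponential) discounter never does.

```python
# Illustrative comparison of hyperbolic vs. geometric (exponential) discounting.
# All parameter values and reward sizes below are arbitrary.

def hyperbolic(value, delay, k=1.0):
    # Hyperbolic discounting: value / (1 + k * delay)
    return value / (1.0 + k * delay)

def geometric(value, delay, gamma=0.8):
    # Geometric (exponential) discounting: value * gamma ** delay
    return value * gamma ** delay

small, large = 10.0, 25.0  # smaller-sooner vs. larger-later (3 steps later)

for t in (0, 10):  # evaluate the same pair of options up close and far away
    for name, discount in (("hyperbolic", hyperbolic), ("geometric", geometric)):
        choice = "small" if discount(small, t) > discount(large, t + 3) else "large"
        print(f"delay {t:2d}, {name:10s}: prefers {choice}")

# The hyperbolic agent prefers "large" while both rewards are distant but
# switches to "small" as they approach; the geometric agent is consistent.
```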

preferences:decision theory :: data:code

The second bullet point is part of the problem description. You're saying it's irrelevant, but you can't just parachute in a payoff matrix in which causality runs backward in time.

Find any example you like; as long as it's physically possible, the payoff will be tied either to your decision algorithm (Newcomb's) or to your preference set (Solomon's).

preferences:decision theory :: data:code

I'm making a simple, logical argument. If it's wrong, it should be trivial to debunk. You're relying on an outside view to judge it, which is pretty weak.

As I've clearly said, I'm entirely aware that I'm making a rather controversial claim. I never bother to post on LessWrong, so I'm clearly not whoring for attention or anything like that. Look at it this way: to present a point this unorthodox at all, I have to be pretty damn sure it's solid.

preferences:decision theory :: data:code

That's certainly possible; it's also possible that you do not understand the argument.

To make things absolutely clear, I'm relying on the following definition of EDT:

The policy that picks the action $a^{*} = \arg\max_{a_i} \sum_j P(W_j \mid W, a_i)\, U(W_j)$, where $\{a_i\}$ are the possible actions, $W$ is the current state of the world, $P(W' \mid W, a)$ is the probability of moving to state $W'$ after doing $a$, and $U$ is the utility function.
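
In code, a minimal sketch of that rule (finite actions and states; the names and toy numbers are placeholders, not anything from the thread):

```python
# EDT as an argmax over actions of expected utility, conditioning on the
# current state of the world W and the candidate action.

def edt_choice(actions, states, P, U, W):
    # Pick a* = argmax_a sum_j P(Wj | W, a) * U(Wj)
    def expected_utility(a):
        return sum(P(Wj, W, a) * U(Wj) for Wj in states)
    return max(actions, key=expected_utility)

# Toy usage with two actions and two successor states.
states = ["good", "bad"]
utilities = {"good": 1.0, "bad": 0.0}
transitions = {("a1", "good"): 0.7, ("a1", "bad"): 0.3,
               ("a2", "good"): 0.4, ("a2", "bad"): 0.6}

best = edt_choice(
    actions=["a1", "a2"],
    states=states,
    P=lambda Wj, W, a: transitions[(a, Wj)],
    U=utilities.get,
    W="now",
)
print(best)  # -> "a1"
```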

I believe the argument I made in the case of Solomon's problem is the clearest and strongest; would you care to rebut it?

I've challenged you to clarify the mechanism by which someone with the cancer gene would decide to chew gum, and you haven't properly answered.

  • If your decision algorithm is EDT, the only free variables that determine your decisions are your preferences and your sensory input.
  • The only way the gene can cause you to chew gum in any meaningful sense is to make you prefer to chew gum.
  • An EDT agent has knowledge of its own preferences. Therefore, an EDT agent already knows whether it falls in the "likely to get cancer" population, and the act of chewing gum gives it no further evidence about cancer (see the numerical sketch below).
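
Here is a minimal numerical sketch of that argument (the probabilities are invented for illustration): the gene raises both the probability of cancer and the probability of preferring gum, but once the agent conditions on its own, known preference, the act of chewing carries no additional evidence.

```python
# Toy model: gene -> cancer, gene -> preference for gum, action = preference
# (for a fixed decision algorithm). All numbers are invented.

P_GENE = 0.1
P_CANCER_GIVEN_GENE = {True: 0.9, False: 0.01}
P_PREF_GIVEN_GENE = {True: 0.8, False: 0.2}

def p_cancer(pref, chew=None):
    # P(cancer | preference[, action]); the action is a deterministic
    # function of the known preference, so conditioning on it changes nothing.
    num = den = 0.0
    for gene in (True, False):
        p = P_GENE if gene else 1 - P_GENE
        p *= P_PREF_GIVEN_GENE[gene] if pref else 1 - P_PREF_GIVEN_GENE[gene]
        if chew is not None and chew != pref:
            p = 0.0  # inconsistent with the (deterministic) decision rule
        num += p * P_CANCER_GIVEN_GENE[gene]
        den += p
    return num / den

print(p_cancer(pref=True))             # P(cancer | prefers gum)
print(p_cancer(pref=True, chew=True))  # identical: chewing adds no evidence
```

With the preference already in the conditioning set, EDT evaluates chewing purely on its direct payoff.
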
preferences:decision theory :: data:code

Yes, the causality is from the decision process to the reward. The decision process may or may not be known to the agent, but its preferences are (data can be read, but the code can only be read if introspection is available).

You can and should self-modify so that you prefer to act in the way you would benefit from being predicted to act. You then get one-boxing behavior in Newcomb's problem, and this is still CDT/EDT (which are really equivalent, as shown).

Yes, you could implement this behavior in the decision algorithm itself, and yes, the two approaches are very much isomorphic. Evolution's way to implement better cooperation has been moral preferences, though, which feels like a more natural design.
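
For the Newcomb case, the payoff arithmetic behind "you benefit from being predicted to one-box" is easy to spell out (standard Newcomb payoffs; the predictor accuracy is an assumption for illustration):

```python
# Expected payoffs as a function of the disposition the predictor reads.
# Standard Newcomb payoffs: the opaque box holds $1,000,000 iff the predictor
# expects one-boxing; the transparent box always holds $1,000.

ACCURACY = 0.99  # assumed predictor accuracy

def expected_payoff(disposition):
    if disposition == "one-box":
        # Predicted correctly: $1,000,000; mispredicted: $0.
        return ACCURACY * 1_000_000 + (1 - ACCURACY) * 0
    else:  # "two-box"
        # Predicted correctly: $1,000; mispredicted: $1,001,000.
        return ACCURACY * 1_000 + (1 - ACCURACY) * 1_001_000

for d in ("one-box", "two-box"):
    print(d, expected_payoff(d))
# ~$990,000 vs. ~$11,000: committing to the one-boxing disposition
# (whether via preferences or via the algorithm) is what pays.
```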

preferences:decision theory :: data:code

Typo; I do mean that EDT two-boxes.

preferences:decision theory :: data:code

According to Wikipedia, the definition of EDT is:

Evidential decision theory is a school of thought within decision theory according to which the best action is the one which, conditional on your having chosen it, gives you the best expectations for the outcome.

This is not the same as "being a randomly chosen member of a group of people...", and I've explained why. The information about group membership is already contained in the filtration.
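
A compact way to state the same point (my notation, not the comment's): writing $\mathcal{F}$ for the agent's information set, which includes its own preferences,

$$P(\text{cancer} \mid \mathcal{F}, A = \text{chew}) = P(\text{cancer} \mid \mathcal{F}),$$

because for a fixed decision algorithm the action $A$ is a function of $\mathcal{F}$ and so carries no evidence beyond it.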

preferences:decision theory :: data:code

You're saying EDT causes you not to chew gum because the cancer gene gives you EDT? Where does the gum appear in the equation?

preferences:decision theory :: data:code

The claim is generally that EDT chooses not to chew gum.

preferences:decision theory :: data:code

No, it can't. If you use a given decision theory, your actions are entirely determined by your preferences and your sensory inputs.
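
In the data/code analogy, a trivial sketch of the point (illustrative names, nothing from the thread): once the decision algorithm is fixed, the action is a pure function of preferences and sensory input, so a gene has no channel to influence the action except through one of those two.

```python
# For a fixed decision algorithm (the "code"), the action is a pure function
# of the preferences (the "data") and the sensory input; nothing else enters.
def act(decision_algorithm, preferences, sensory_input):
    return decision_algorithm(preferences, sensory_input)
```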
