Jesse Richardson

ERA fellow researching technical AI safety, July-August 2023. 

Interested in prediction markets and AI alignment.




You recognise this in the post and so set things up as follows: a non-myopic optimiser decides the preferences of a myopic agent. But this means your argument doesn’t vindicate coherence arguments as traditionally conceived. Per my understanding, the conclusion of coherence arguments was supposed to be: you can’t rely on advanced agents not to act like expected-utility-maximisers, because even if these agents start off not acting like EUMs, they’ll recognise that acting like an EUM is the only way to avoid pursuing dominated strategies. I think that’s false, for the reasons that I give in my coherence theorems post and in the paragraph above. But in any case, your argument doesn’t give us that conclusion. Instead, it gives us something like: a non-myopic optimiser of a myopic agent can shift probability mass from less-preferred to more-preferred outcomes by probabilistically precommitting the agent to take certain trades in a way that makes its preferences complete. That’s a cool result in its own right, and maybe your post isn’t trying to vindicate coherence arguments as traditionally conceived, but it seems worth saying that it doesn’t.


I might be totally wrong about this, but if you have a myopic agent with preferences A>B, B>C and C>A, it's not totally clear to me why they would change those preferences to act like an EUM. Sure, if you keep offering them a trade where they can pay small amounts to move in these directions, they'll go round and round the cycle and only lose money, but do they care? At each timestep, their preferences are being satisfied. To me, the reason you can expect a suitably advanced agent to not behave like this is that they've been subjected to a selection pressure / non-myopic optimiser that is penalising their losses.
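A toy simulation of the cycle described here (the option names and the per-trade fee are illustrative, not from the post): a myopic agent with cyclic preferences happily accepts each individual trade, yet ends up back where it started, minus the fees.

```python
# Toy money pump: an agent with cyclic preferences A > B, B > C, C > A
# pays a small fee each time it trades into a strictly preferred option.

PREFERS = {("A", "B"), ("B", "C"), ("C", "A")}  # (x, y) means x is preferred to y
FEE = 1

def accepts(current, offered):
    """A myopic agent accepts any trade to a strictly preferred option."""
    return (offered, current) in PREFERS

def run_cycle(start="A", offers=("C", "B", "A"), wealth=0):
    holding = start
    for offered in offers:
        if accepts(holding, offered):
            holding = offered
            wealth -= FEE  # each accepted trade costs a small fee
    return holding, wealth

holding, wealth = run_cycle()
print(holding, wealth)  # back at A, having paid 3 in fees
```

At every step the agent's (myopic) preferences are satisfied, which is exactly the point: only an outside, non-myopic optimiser has reason to count the accumulated losses against it.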

If the non-myopic optimiser wants the probability of a dominated strategy lower than that, it has to make the agent non-myopic. And in cases where an agent with incomplete preferences is non-myopic, it can avoid pursuing dominated strategies by acting in accordance with the Caprice Rule.

This seems right to me. It feels weird to talk about an agent that has been sufficiently optimized for not pursuing dominated strategies but not for non-myopia. Doesn't non-myopia dominate myopia in many reasonable setups?

Can you explain more how this might work? 

Epistemic Status: Really unsure about a lot of this.

It's not clear to me that the randomization method here is sufficient for the condition of not missing out on sure gains with probability 1.

Scenario: B is preferred to A, but there is a preference gap between A & C and between B & C, as in the post.

Suppose both your subagents agree that the only trades that will ever be offered are A->C and C->B. These trades are offered according to Poisson distributions, with λ = 1 for the first trade and λ = 3 for the second. Any trade that is offered must be immediately declined or accepted. If I understand your logic correctly, this would mean randomizing the preferences such that

P(accept C->B) = 1/3,

P(accept A->C) = 1

In the world where one of each trade is offered, the agent always accepts A->C but will only accept C->B 1/3 of the time, thus the whole move from A->B only happens with probability 1/3. So the agent misses out on sure gains with probability 2/3. 
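The 1/3 figure can be checked with a quick Monte Carlo, conditioning (as the scenario does) on exactly one offer of each trade arriving, with the acceptance probabilities above:

```python
import random

def trial(rng):
    """One run of the scenario: the agent starts holding A, is offered
    A->C (accepted with probability 1) and then C->B (probability 1/3)."""
    holding = "A"
    holding = "C"            # A->C is always accepted
    if rng.random() < 1 / 3:  # C->B is accepted only 1/3 of the time
        holding = "B"
    return holding

rng = random.Random(0)
n = 100_000
ends_at_b = sum(trial(rng) == "B" for _ in range(n)) / n
print(ends_at_b)  # close to 1/3: the sure gain A->B is missed ~2/3 of the time
```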

In other words, I think you've sufficiently shown that this kind of contract can take a strongly-incomplete agent and make them not-strongly-incomplete with probability >0, but this is not the same as making them not-strongly-incomplete with probability 1, which seems to me to be necessary for expected utility maximization.

Something I have a vague inkling about, based on what you and Scott have written, is that the same method by which we can rescue the Completeness axiom (i.e. via contracts/commitments) may also doom the Independence axiom. As in, you can have one of them (under certain premises) but not both?

This may follow rather trivially from the post I linked above so it may just come back to whether that post is 'correct', but it might also be a question of trying to marry/reconcile these two frameworks by some means. I'm hoping to do some research on this area in the next few weeks, let me know if you think it's a dead end I guess!

Really enjoyed this post! My question is: how does this intersect with issues stemming from other VNM axioms, e.g. Independence as referenced by Scott Garrabrant?

It seems to me that you don't get expected utility maximizers solely from not-strong-Incompleteness, as there are other conditions that are necessary to support that conclusion. 

Hi EJT, I'm starting research on incomplete preferences / subagents and would love to see this entry too if possible!

Furthermore, human values are over the “true” values of the latents, not our estimates - e.g. I want other people to actually be happy, not just to look-to-me like they’re happy.

I'm not sure that I'm convinced of this. I think when we say we value reality over our perception it's because we have no faith in our perception to stay optimistically detached from reality. If I think about how I want my friends to be happy, not just appear happy to me, it's because of a built-in assumption that if they appear happy to me but are actually depressed, the illusion will inevitably break. So in this sense I care not just about my estimate of a latent variable, but what my future retroactive estimates will be. I'd rather my friend actually be happy than be perfectly faking it for the same reason I save money and eat healthy - I care about future me. 

What about this scenario: my friend is unhappy for a year while I think they're perfectly happy, then at the end of the year they are actually happy but they reveal to me they've been depressed for the last year. Why is future me upset in this scenario, and why does current me want to avoid it? Well, because latent variables aren't time-specific: I care about the value of latent variables in the future and the past, albeit less so. To summarize: I care about my own happiness across time, and future me cares about my friend's happiness across time, so I end up caring about the true value of the latent variable (my friend's happiness). But this is an instrumental value; I care about the true value because it affects my estimates, which I care about intrinsically.

Would it perhaps be helpful to think of agent-like behavior as that which takes abstractions as inputs, rather than only raw physical inputs? E.g. an inanimate object such as a rock only interacts with the world on the level of matter, not on the level of abstraction. A rock is affected by wind currents according to the same laws, regardless of the type of wind (breeze, tornado, hurricane), while an agent may take different actions or assume different states depending on the abstractions the wind has been reduced to in its world model.
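One way to sketch that distinction in code (the wind categories, thresholds, and responses are invented purely for illustration): the rock's behavior is a fixed function of the raw physical magnitude, while the agent's behavior branches on an intermediate abstraction.

```python
# Rock vs. agent: the rock responds to the raw physical quantity by the same
# law at every magnitude; the agent first maps the input to an abstraction
# in its world model and acts on that category.

def classify_wind(speed_mph):
    """Illustrative abstraction layer (thresholds are made up)."""
    if speed_mph < 30:
        return "breeze"
    elif speed_mph < 75:
        return "gale"
    return "hurricane"

def rock_response(speed_mph):
    # Same physical law regardless of wind type: displacement ~ force.
    return 0.01 * speed_mph ** 2

def agent_response(speed_mph):
    # Behavior depends on the abstraction, not the raw number.
    policy = {"breeze": "go outside",
              "gale": "close the shutters",
              "hurricane": "evacuate"}
    return policy[classify_wind(speed_mph)]

print(rock_response(10), agent_response(10))
print(rock_response(80), agent_response(80))
```

Note the discontinuity: the agent's output changes qualitatively at the category boundaries, while the rock's response varies smoothly with the input.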
