Are GPI / the Forethought Foundation missing?
No, I was simply mistaken. Thanks for correcting my intuitions on the topic!
If this is the case, this seems more like a difference in exploration/exploitation strategies.
We do have positively valenced heuristics for exploration, such as curiosity and excitement.
I think that the intuition for this argument comes from something like gradient ascent under an approximate utility function. The agent will spend most of its time near what it perceives to be a local(ish) maximum.
So I suspect the argument here is that Optimistic Errors have a better chance of locking the agent into a single local maximum or strategy, which gets reinforced enough (or not punished enough), even though it is bad in total.
Pessimistic Errors are ones in which the agent strategically avoids locking into maxima, perhaps by Hedonic Adaptation as Dagon suggested. This may miss big opportunities if there are actual, territorial, big maxima, but that may not be as bad (from a satisficer point of view at least).
And kudos for the neat explanation and an interesting theoretical framework :)
I'd expect the preference at each point to mostly go in the direction of either axis.
However, this analysis should be interesting in non-cooperative games, where the vector might represent a mixed strategy, with its magnitude perhaps representing the expected payoff.
I may be mistaken. I tried reversing your argument, and I've bolded the part that doesn't feel right.
Optimistic errors are no big deal. The agent will randomly seek behaviors that get rewarded, but as long as these behaviors are reasonably rare (and are not that bad) then that’s not too costly.
But pessimistic errors are catastrophic. The agent will systematically make sure not to fall into behaviors that avoid high punishment, and will use loopholes to avoid penalties even if that results in the loss of something really good. So even if these errors are extremely rare initially, they can totally mess up my agent.
So I think that maybe there is inherently an asymmetry between reward and punishment when dealing with maximizers.
But my intuition comes from somewhere else. If the difference between pessimism and optimism is just a shift by a constant, then it ought not to matter for a utility maximizer. But your definition looks at errors conditional on the actual outcome, which should perhaps behave differently.
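To make the constant-shift point concrete, here is a minimal sketch. The utility numbers and the helper `best_action` are made up for illustration. It shows that adding the same constant to every estimate leaves a maximizer's choice unchanged, while an error on a single action can change it.

```python
def best_action(utilities):
    """Return the index of the action with the highest estimated utility."""
    return max(range(len(utilities)), key=lambda i: utilities[i])

# Hypothetical utility estimates for four actions (illustrative numbers only).
estimates = [0.2, 1.5, -0.7, 0.9]

# Uniform optimism or pessimism: every estimate is shifted by the same constant.
uniformly_optimistic = [u + 10.0 for u in estimates]
uniformly_pessimistic = [u - 10.0 for u in estimates]

# An error that depends on the action: only action 2 is overrated.
conditionally_optimistic = list(estimates)
conditionally_optimistic[2] += 10.0

print(best_action(estimates))                 # baseline choice
print(best_action(uniformly_optimistic))      # same as baseline
print(best_action(uniformly_pessimistic))     # same as baseline
print(best_action(conditionally_optimistic))  # switches to the overrated action
```

So any asymmetry has to come from errors that vary across outcomes, not from a uniform bias.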
Pessimistic errors are no big deal. The agent will randomly avoid behaviors that get penalized, but as long as those behaviors are reasonably rare (and aren’t the only way to get a good outcome) then that’s not too costly. But optimistic errors are catastrophic. The agent will systematically seek out the behaviors that receive the high reward, and will use loopholes to avoid penalties when something actually bad happens. So even if these errors are extremely rare initially, they can totally mess up my agent.
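A toy version of the quoted argument, under assumptions I am making up for illustration (five actions, one large error at a time, a pure maximizer over the estimates): an optimistic error on a bad action captures the agent, while a pessimistic error costs something only when it lands on the best action, and then only the gap to the runner-up.

```python
def regret(true_values, estimated_values):
    """True utility lost by maximizing the estimates instead of the true values."""
    chosen = max(range(len(estimated_values)), key=lambda i: estimated_values[i])
    return max(true_values) - true_values[chosen]

def with_error(values, index, error):
    """Copy of `values` with `error` added to a single action's estimate."""
    out = list(values)
    out[index] += error
    return out

# Hypothetical true utilities of five actions.
true_values = [1.0, 2.0, 3.0, 4.0, 5.0]

# One large error at a time (the size 10.0 is an arbitrary assumption).
optimistic_on_worst = with_error(true_values, 0, +10.0)
pessimistic_on_worst = with_error(true_values, 0, -10.0)
pessimistic_on_best = with_error(true_values, 4, -10.0)

print(regret(true_values, optimistic_on_worst))   # agent is drawn to the worst action
print(regret(true_values, pessimistic_on_worst))  # error is never acted on
print(regret(true_values, pessimistic_on_best))   # agent falls back to the runner-up
```

This only illustrates the argument in a world with a close runner-up. It does not settle whether the asymmetry survives in a suitably symmetric environment.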
I'd love to see someone analyze this thoroughly (or I'll do it if there is interest). I don't think it's that simple, and it seems like this is the main analytical argument.
For example, if the world is symmetric in the appropriate sense in terms of what actions get you rewarded or penalized, and you maximize expected utility instead of satisficing in some way, then the argument is wrong. I'm sure there is good literature on how to model evolution as a player, and the modeling of the environment shouldn't be difficult.
I find the classification of the elements of robust agency helpful. Thanks for the write-up and the recent edit.
I have some issues with Coherence and Consistency:
First, I'm not sure what you mean by these, so I'll take my best guess, which in its idealized form is something like: Coherence is being free of self-contradictions, and Consistency is having the tools to commit oneself to future actions. This is going by the last paragraph of that section:
There are benefits to reliably being able to make trades with your future-self, and with other agents. This is easier if your preferences aren’t contradictory, and easier if your preferences are either consistent over time, or at least predictable over time.
Second, the only case given for Coherence is that it helps you make trades with your future self. My reasons for it are more strongly related to avoiding compartmentalization, resolving confusions, and making clever choices in real time given my limited rationality.
Similarly, I do not view trades with future self as the most important reason for Consistency. It seems that the main motivator here for me is some sort of trade between various parts of me. Or more accurately, hacking away at my motivation schemes and conscious focus, so that some parts of me will have more votes than others.
Third, there are other mechanisms for Consistency. Accountability is a major one. Also, reducing noise in the environment externally and building actual external constraints can be helpful.
Fourth, Coherence can be generalized to a skill that allows you to use your gears-level understanding of yourself and your agency to update your gears to whatever would be most useful. This makes me wonder whether the scope here is too large, and whether gears-level understanding and deliberate agency aren't that closely related to the main points. These may all help one to be trustworthy, in that one's reasoning can be judged to be adequate (including by oneself), which is the main thing I'm taking away from this.
Fifth (sorta), I have reread the last section, and I think I now understand that your main motivation for Coherence and Consistency is that conversation between rationalists can be made much more effective, in that they can more easily understand each other's point of view. I view this as related to Game Theoretic Soundness, more than to the internal benefits of Coherence and Consistency, which are probably more meaningful overall.
Non-Bayesian utilitarians who are ambiguity-averse sometimes need to sacrifice "expected utility" to gain more certainty (in quotes because that need not be well defined).
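As a rough sketch of what that sacrifice can look like, here is one standard way to model ambiguity aversion: maxmin expected utility over a set of candidate priors. The choice of model, the payoffs, the set of priors, and the "reference" prior are all assumptions made for illustration.

```python
def expected_utility(payoffs, p_state_a):
    """Expected payoff of an act given the probability of state A."""
    payoff_a, payoff_b = payoffs
    return p_state_a * payoff_a + (1.0 - p_state_a) * payoff_b

# Two hypothetical acts, each with a payoff in state A and in state B.
acts = {
    "risky": (1.0, 0.0),
    "safe": (0.4, 0.4),
}

# The ambiguity-averse agent entertains several priors for P(state A).
candidate_priors = [0.3, 0.5, 0.7]
# A Bayesian would commit to a single prior (assumed here to be 0.5).
reference_prior = 0.5

# Bayesian choice: maximize expected utility under the single prior.
bayes_choice = max(acts, key=lambda a: expected_utility(acts[a], reference_prior))

# Maxmin choice: maximize the worst-case expected utility over all priors.
maxmin_choice = max(
    acts,
    key=lambda a: min(expected_utility(acts[a], p) for p in candidate_priors),
)

print(bayes_choice, maxmin_choice)
```

Under these numbers the maxmin agent takes the safe act even though the risky act has the higher expected utility under the reference prior. That gap is the "expected utility" given up for certainty.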