In a lot of discussion here, there's been talk about how decision algorithms would do on the Prisoner's Dilemma (PD), Newcomb's Problem, Parfit's Hitchhiker, and Counterfactual Mugging.
There's a reasonable chain there, but especially for the last one, I've had a bit of a concern about the overall pattern. Specifically, while we're optimizing for extreme cases, we want to make sure we're not hurting our decision algorithm's ability to deal with less bizarre cases.
Part of the reasoning for the last one could be stated as "be/have the type of algorithm that would be expected to do well when a Counterfactual Mugger showed up. That is, would have a net positive expected utility, etc..." This is reasonable, especially given that there seem to be lines of reasoning (like Timeless Decision Theory) that _automatically_ get this right using the same rules that would get it to succeed with PD or any other such thing. But I worry about, well, actually it would be better for me to show an example:
Consider the Pathological Decision Challenge.
Omega shows up and presents a Decision Challenge, consisting of some assortment of your favorite decision theory puzzlers. (Newcomb, etc etc etc...)
Unbeknownst to you, however, Omega also has a secret additional test: If the decisions you make are all something _OTHER_ than the normal rational ones, then Omega will pay you some huge superbonus of utilons, vastly dwarfing any cost of losing all of the individual challenges...
However, Omega also models you, and if you would have willingly "failed" _HAD YOU KNOWN_ about the extra challenge above (but not about this extra extra criterion), then you get no bonus for failing everything.
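A rough sketch of the challenge as a payoff function may make the structure easier to see. Everything concrete here is an assumption made for illustration and not part of the setup above: the payoff numbers, the names `ordinary_payoff` and `pathological_payoff`, and the simplification that an agent either makes the normal rational choice on every puzzle or on none of them.

```python
from typing import Callable

# A policy maps "was I told about the hidden bonus?" to
# "do I make the normal rational choice on every puzzle?"
Policy = Callable[[bool], bool]

NUM_PUZZLES = 4    # assumed number of puzzles in the challenge
PUZZLE_GAIN = 1    # assumed utilons for winning one puzzle
SUPERBONUS = 1000  # assumed size of the secret bonus


def ordinary_payoff(policy: Policy) -> int:
    """The same puzzles with no hidden bonus anywhere."""
    return NUM_PUZZLES * PUZZLE_GAIN if policy(False) else 0


def pathological_payoff(policy: Policy) -> int:
    """Payoff in the Pathological Decision Challenge."""
    # The bonus is secret, so the agent actually acts uninformed.
    fails_uninformed = not policy(False)
    # Omega's model: would the agent fail on purpose if told about the bonus?
    would_fail_if_told = not policy(True)
    base = 0 if fails_uninformed else NUM_PUZZLES * PUZZLE_GAIN
    bonus = SUPERBONUS if (fails_uninformed and not would_fail_if_told) else 0
    return base + bonus


def sensible(told_about_bonus: bool) -> bool:
    # Plays normally, but throws the puzzles once it knows failing pays more.
    return not told_about_bonus


def perverse(told_about_bonus: bool) -> bool:
    # Fails when it sees no reason to, plays normally when told failing pays.
    return told_about_bonus


def always_fail(told_about_bonus: bool) -> bool:
    # Fails every puzzle regardless of what it knows.
    return False


for policy in (sensible, perverse, always_fail):
    print(policy.__name__, ordinary_payoff(policy), pathological_payoff(policy))
```

Under these assumptions, only `perverse` collects the superbonus, and it is also a policy that scores zero on the ordinary version of the same puzzles.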
A decision algorithm that would tend to win in this contrived situation would tend to lose in regular situations, right? Again, yes, I can see the argument that being the type of algorithm that can be successfully counterfactually mugged can arise naturally from a simple rule that automatically gives the right answer for many other more reasonable situations. But I can't help but worry that as we construct more... extreme cases, we'll end up with this sort of thing, where optimizing our decision algorithm to win in the latest "decision challenge" stops it from doing as well in more, for lack of a better word, "normal" situations.
Further, I'm not sure yet how to more precisely separate out pathological cases from more reasonable "weird" challenges. Just to clarify, this post isn't a complaint or direct objection to considering things like Newcomb's Problem, just a concern I had about a possible way we might go wrong.