Counterfactual mugging is a mug's game in the first place - that's why it's called a "mugging" and not a "surprising opportunity". The agent doesn't know that Omega actually flipped a coin, would have paid out counterfactually if the agent were the sort of person to pay in this scenario, would have flipped the coin at all in that case, and so on. The agent can't know these things, because the scenario specifies that they have no idea that Omega does any such thing, or even that Omega existed, before being approached. So a relevant rational decision-theoretic parameter is an estimate of how much such an agent would benefit, on average, if asked for money in such a manner.
A relevant prior is that there are many known scammers in the world who will say anything to extract cash, versus zero known cases of trustworthy omniscient beings approaching people with such deals. So the rational decision is "don't pay", except in worlds where the agent does know that omniscient trustworthy beings vastly outnumber untrustworthy beings (whether omniscient or not), and that those omniscient trustworthy beings make these sorts of deals quite frequently.
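To make "how much such an agent would benefit, on average" concrete, here is a minimal sketch in Python. The usual $100 payment and $10,000 counterfactual reward are assumed, along with a crude two-hypothesis model (genuine Omega vs. ordinary scammer); all of it is illustrative rather than part of the scenario as stated.

```python
# Rough per-encounter expected value of adopting a "pay when asked" policy,
# evaluated from the agent's position of ignorance about who is asking.
# Illustrative assumed amounts: pay $100 when asked, counterfactual reward $10,000.

def ev_pay_policy(p_genuine: float, ask: float = 100.0, reward: float = 10_000.0) -> float:
    """Average benefit of being a payer, given that someone has asked for money.

    If the asker is a genuine Omega, a payer's policy averages
    0.5 * reward - 0.5 * ask per coin flip (the updateless accounting).
    If the asker is a scammer, paying simply loses `ask`.
    """
    genuine_case = 0.5 * reward - 0.5 * ask   # +4950 with the assumed amounts
    scam_case = -ask                          # -100, no counterfactual upside
    return p_genuine * genuine_case + (1.0 - p_genuine) * scam_case

print(ev_pay_policy(p_genuine=0.0))    # -100.0: zero credence in genuine Omegas -> don't pay
print(ev_pay_policy(p_genuine=0.05))   # +152.5: paying only starts to pay off around here
# Break-even credence is ask / (0.5 * reward + 0.5 * ask) ~= 0.0198.
```

The exact amounts barely matter; with the prior just described, p_genuine is effectively zero and the policy is a straightforward loss.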
Your argument is even worse. Even broad decision theories that cover counterfactual worlds such as FDT and UDT still answer the question "what decision benefits agents identical to Bob the most across these possible worlds, on average". Bob does not benefit at all in a possible world in which Bob was Alice instead. That's nonexistence, not utility.
Yet our AI systems, even the most advanced, focus almost exclusively on logical, step-by-step reasoning.
This is absolutely false.
We design them to explain every decision, show their work and follow clear patterns of deduction.
We are trying to design them to be able to explain their decisions and follow clear patterns of deduction, but we are still largely failing. In practice they often arrive at an answer in a flash (whether correct or incorrect), and this was almost universal for earlier models, which lacked the more recent development of "chain of thought".
Even in "reasoning" models there is plenty of evidence that they often still do have an answer largely determined before starting any "chain of thought" tokens and then make up reasons for it, sometimes including lies.
Yes, you can use yourself as a random sample, but at best only within a reference class of "people who use themselves as a random sample for this question in a sufficiently similar context to you". That might be a population of 1.
For example, suppose someone without symptoms has just found out that they have genes for a disease that always progresses to serious illness. They have a mathematics degree and want to use their statistical knowledge to estimate how long they have before becoming debilitated.
They are not a random sample from the reference class of people who have these genes. They are a sample from people who have the genes, didn't show symptoms before finding that out, found out during adulthood (almost certainly), live in a time and place, and with sufficient capacity, to earn a mathematics degree, are of a suitable mindset to ask themselves this question, and so on.
Any of these may be relevant information for estimating the distribution, especially if the usual age of onset is in childhood or the disease also reduces intellectual capacity or affects personality in general.
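As a toy illustration of how much the conditioning matters, here is a short Python sketch. The lognormal onset distribution, the age of 30, and every number in it are invented for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented onset-age distribution: most onsets in childhood, with a long tail
# into adulthood. The numbers are purely illustrative.
onset_age = rng.lognormal(mean=np.log(12), sigma=0.8, size=1_000_000)

current_age = 30  # the symptom-free mathematics graduate in the example

# Naive estimate: mean years until onset for a random gene carrier.
# With typical onset in childhood this can even come out negative.
naive_years_left = np.mean(onset_age) - current_age

# Narrower reference class: carriers still symptom-free at current_age
# (the other conditions - adult diagnosis, intact cognition, etc. - would
# narrow it further).
still_healthy = onset_age[onset_age > current_age]
conditioned_years_left = np.mean(still_healthy - current_age)

print(f"naive estimate:       {naive_years_left:6.1f} years")
print(f"conditioned estimate: {conditioned_years_left:6.1f} years")
```

Adding the further conditions mentioned above would shift the estimate again, which is exactly why the "random sample from everyone with the genes" framing fails.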
Relating back to the original Doomsday problem: suppose that in the reference class of all civilizations, most discover some principle that conclusively resolves the Doomsday problem not long after formulating it (within a few hundred years or so). It doesn't really matter what that resolution happens to be; there are plenty of possibilities.
If that is the case, then most people who even bother to ask the Doomsday question without already knowing the answer are those in that narrow window of time where their civilization is sophisticated enough to ask the question without being sophisticated enough to answer it, regardless of how long those civilizations might last or how many people exist after resolving the question.
To the extent that the Doomsday reasoning is valid at all (which it may not be), all that it provides is an estimate of time until most people stop asking the Doomsday question in a similar context to yours. Destruction of the species is not required for that. Even it becoming unfashionable is enough.
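A toy Monte Carlo of that point, with every number (civilization lifetimes, the few-hundred-year asking window) invented purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n_civs = 100_000

# Invented numbers throughout: civilizations survive 1,000 to 1,000,000 years
# after first formulating the Doomsday question...
lifetime_after_question = rng.uniform(1_000, 1_000_000, size=n_civs)
# ...but only ask it, without knowing the answer, for a few hundred years
# before some resolution makes it moot.
asking_window = rng.uniform(100, 500, size=n_civs)

# An observer currently asking the question sits at a uniform random point
# inside the asking window, not inside the civilization's whole lifetime.
elapsed = rng.uniform(0.0, 1.0, size=n_civs) * asking_window

# Crude Doomsday-style inference from the asker's viewpoint: "with 50% odds
# the total is no more than twice what has elapsed so far."
doomsday_estimate_of_remaining = elapsed          # 2 * elapsed total => elapsed remaining
actual_remaining_survival = lifetime_after_question - elapsed

print(f"median Doomsday-style remaining time: {np.median(doomsday_estimate_of_remaining):>12,.0f} years")
print(f"median actual remaining survival:     {np.median(actual_remaining_survival):>12,.0f} years")
```

The inference tracks when observers in your situation stop asking, not when they stop existing, which is all the argument can deliver even when it works.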
Yes, player 2 loses with extremely low probability even for a 1-bit hash (on the order of 2^-256). For a more commonly used hash, or for 2^24 searches on their second-last move, they reduce their probability of loss by a huge factor more.
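For what it's worth, here is the back-of-the-envelope arithmetic behind those figures, under the assumption that player 2 can try roughly n candidate moves, each independently hashing to the losing value with probability 2^-b for a b-bit hash. The ~256-candidate figure is back-solved from the 2^-256 quoted above rather than taken from the problem statement.

```python
# log2 of the probability that *every* candidate move hashes to the losing value,
# assuming n independent candidates, each failing with probability 2**-b.
def log2_p_forced_loss(n_candidates: int, hash_bits: int) -> int:
    return -n_candidates * hash_bits

print(log2_p_forced_loss(256, 1))       # -256: 1-bit hash, ~256 candidates -> ~2^-256
print(log2_p_forced_loss(2**24, 1))     # -16777216: 2^24 searches on one move, vastly smaller
print(log2_p_forced_loss(256, 256))     # -65536: a 256-bit hash with the same candidate count
```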
This paragraph also misses the possibility of constructing an LLM and/or training methodology such that it will learn certain functions, or can't learn certain functions. There is also a conflation of "reliable" with "provable" on top of that.
Perhaps there is some provision made elsewhere in the text that addresses these objections. Nonetheless, I am not going to search for it. The abstract smells enough like bullshit that I'd rather do something else.
I'll try to make it clearer:
Suppose b "knows" that Omega runs this experiment with every program in the role of b. Then the optimal behaviour for a competent b (by a ridiculously small margin) is to 1-box.
Suppose b suspects that box-choosing programs are slightly less likely to be run if they 1-box on equal inputs. Then the optimal behaviour for b is to 2-box, because the average extra payoff for 1-boxing on equal inputs is utterly insignificant while the average penalty for not being chosen to run is very much greater. Anything that shifts the probability of being run as box-chooser by more than 1000/|P| (which is on the order of 1/10^10^10^10^100) matters far more than what the program actually does.
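Spelling out that comparison as a rough sketch (the standard Newcomb amounts of $1,000 and $1,000,000, the ~1/|P| chance of equal inputs, and the ~$1,000 value of a typical run are all assumptions layered on top of the problem as stated, and the stand-in size of P is far too small):

```python
# Rough comparison of the two terms for a candidate box-choosing program b.
# Every number here is an illustrative assumption.

P_size = 10 ** 100            # stand-in for |P|; the real value is astronomically larger
p_equal_inputs = 1 / P_size   # chance the other input happens to equal b's own source
gain_if_equal = 999_000       # 1-boxing vs 2-boxing when the prediction matches ($1M - $1k)
avg_payoff_if_run = 1_000     # rough value of being run at all (the transparent box)

expected_gain_from_1boxing = p_equal_inputs * gain_if_equal

# A policy choice that shifts b's chance of being selected to run by delta
# changes b's expected payoff by about delta * avg_payoff_if_run, so:
break_even_delta = expected_gain_from_1boxing / avg_payoff_if_run
print(break_even_delta)       # ~1000 / |P|: any larger selection effect dominates the box choice
```

The exact constants don't matter; any term of order 1/|P| is swamped by even the faintest selection effect.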
In the original Newcomb problem, you know that you are going to get money based on your decision. In this problem, a running program does not know this. It doesn't know whether it's a or b or both, and every method for selecting a box-chooser is a different problem with different optimal strategies.
As a function of M, |P| is very likely to be exponential, and so it will take O(M) symbols to specify a member of P. Under many encodings there isn't a program in P that can even check whether its inputs are equal before running out of time.
That aside, why are you assuming that program b "wants" anything? Essentially none of the programs in P will have any sort of "want" at all. If it is a precondition of the problem that b is such a program, what selection procedure is assumed among those that do "want" money from this scenario? Note that being selected for running is also a precondition for getting any money at all, so this selection procedure is critically important - far more so than anything the program might output!
Regarding the first paragraph: every purported rational decision theory maps actions to expected values. In most decision theory thought experiments, the agent is assumed to know all the conditions of the scenario, and so they can be taken as absolute facts about the world, leaving only the unknown random variables to feed into the decision-making process. In the Counterfactual Mugging, that is explicitly not true. The scenario states that the agent knows nothing of Omega, or of any such arrangement, before being approached.
So it's not enough to ask what a rational agent with full knowledge of the rest of the scenario should do. That's irrelevant. We know it as omniscient outside observers, but the agent in question knows only what the mugger tells them. If they believe it then there is a reasonable argument that they should pay up, but there is nothing given in the scenario that makes it rational to believe the mugger. The prior evidence is massively against believing the mugger. Any decision theory that ignores this is broken.
Regarding the second paragraph: yes, indeed there is that additional argument against paying up and rationality does not preclude accepting that argument. Some people do in fact use exactly that argument even in this very much weaker case. It's just a billion times stronger in the "Bob could have been Alice instead" case and makes rejecting the argument untenable.