My standard answer to the kinds of points you're making here is in A Simple Toy Coherence Theorem, specifically this section.
Coherence is not about whether a system "can be well-modeled as a utility maximizer" for some utility function over anything at all, it's about whether a system can be well-modeled as a utility maximizer for utility over some specific stuff.
The utility in the toy coherence theorem in this post is very explicitly over final states, and the theorem says nontrivial things mainly when the agent is making decisions at earlier times in order to influence that final state - i.e. the agent is optimizing the state "far away" (in time) from its current decision. That's the prototypical picture in my head when I think of coherence. Insofar as an incoherent system can be well-modeled as a utility maximizer, its optimization efforts must be dominated by relatively short-range, myopic objectives. Coherence arguments kick in when optimization for long-range objectives dominates.
My understanding based on this is that your definition of “reasonable” as per my post is “non-myopic” or “concerned with some future world state”?
Yes.
In my head I usually think of it as non-myopic in spacetime (as opposed to just time), but the version which is (somewhat) justified by the Toy Coherence Theorem is non-myopia over time.
For example, if I appear to have very different preferences at different points in time (e.g. I prefer to hold a red apple on odd hours and a green apple on even hours), you can extract money from me, and that seems "irrational" to us.
This is everyday rational behaviour, disguised by the details of the example. Sometimes I want something to eat and sometimes I do not. Are the managers of restaurants and food shops money-pumping me by selling me sustenance over and over? If I take the train to work in the morning, and take the train home in the evening, is the train operator money-pumping me? If I have a bed for when I prefer to sleep, and chairs for when I prefer to wake, is the furniture store money-pumping me? No, I am gaining from each purchase and have no cause to regret any of them.
Another classic example: someone who mostly has consistent preferences, which can be simply described by a utility function, but also prefers apples to bananas, bananas to oranges, and oranges to apples.
At the moment there are cherries, plums, and oranges in my fruit bowl. (There are! I am not making this up!) Which of them I choose, when I do choose, is negatively correlated with my most recent choices. So which of them do I prefer? The ontology of the question is wrong. My preference function, if there is one, is not over "cherries", "plums", "oranges", or any other types of fruit. In the longer term, almost every fruit I see in the shops will rotate through my fruit bowl on occasion.
If someone's preferences look incoherent to me, maybe I am mistaken about what sorts of things their preferences are over. Behaviour reveals nothing, without the motive.
"Utility Maximizer" exists in the map, not the territory. It's something we can apply to model the behaviour of things in the territory. As in all cases, models make a trade-off between simplicity and accuracy.
Some entities are particularly well modelled (by me) as carrying out a strategy of "Maximize [X]" where [X] is a short description of some outcome.
(The classic example of "Stockfish" being well modelled by "Achieve wins in chess" comes to mind. Someone might well model a company as executing a strategy of "Maximize your profits" or a politician as executing a strategy of "Maximize your popularity".)
This isn't perfect, obviously. We might need to add some extra information. For example, we can describe a chess player as executing "Win chess" but with an extra variable of "ELO = 1950" which describes the power of that utility maximizer. Likewise, you might model a doctor as executing a strategy of "Cure patients" but subject to a limited set of knowledge. This isn't really what people mean by "irrational" though, since these are mostly just limitations.
What really makes an entity "irrational" is when your model of it contains pretty much any other kind of behaviour. For example, those Go bots whose behaviour is well modelled as "Win at Go, ELO = Superhuman, EXCEPT behave as if you think cyclic patterns are unbeatable". In that case, the Go engine is behaving irrationally under our model of the world.
(Another classic example: someone who mostly has consistent preferences, which can be simply described by a utility function, but also prefers apples to bananas, bananas to oranges, and oranges to apples. This puts an epicycle in our model if we have to model their fruit-swapping behaviour.)
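To make the "epicycle" idea concrete, here is a minimal toy sketch in Python (my own illustration, not anything from the original comments; the utility numbers and the override table are made up). The agent's behaviour is mostly captured by a simple utility table, but the cyclic fruit choices force us to bolt on a special-case override, and counting such overrides is one crude way to cash out "degree of irrationality under this model":

```python
# Toy sketch of an "epicycle": a simple utility table plus a special-case override
# that no single assignment of utilities to fruits could reproduce.

base_utility = {"money": 1.0, "leisure": 0.8, "apple": 0.5, "banana": 0.5, "orange": 0.5}

# The epicycle: a cyclic override for pairwise fruit choices.
cyclic_override = {
    frozenset({"apple", "banana"}): "apple",   # prefers apples to bananas
    frozenset({"banana", "orange"}): "banana", # prefers bananas to oranges
    frozenset({"orange", "apple"}): "orange",  # prefers oranges to apples (!)
}

def choose(option_a: str, option_b: str) -> str:
    """Predict the agent's choice: apply the override if it covers this pair, else maximise base utility."""
    winner = cyclic_override.get(frozenset({option_a, option_b}))
    if winner is not None:
        return winner
    return option_a if base_utility[option_a] >= base_utility[option_b] else option_b

print(choose("apple", "banana"), choose("banana", "orange"), choose("orange", "apple"))
# -> apple banana orange: the three answers form a cycle, so no utility numbers over
#    fruits alone can reproduce them; the override table is the extra epicycle the model needs.
```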
An entity is irrational under some model to the degree that modelling its behaviour requires epicycles. A person might appear like a utility maximizer (and thus, very rational) to a much stupider person (who would not be able to model their behaviour in any other way), but very unlike a utility maximizer to a superintelligent AI. Since most humans don't vary in intelligence by that much, most of the time we're working under similar models, so we can just talk about entities being irrational.
Caveat:
We might want to talk about which agents are more or less rational in general, which then means we're making a claim that our models reflect some aspect of reality. A more (or less) rational agent is then one which is overall considered more (or less) rational under a wide variety of high-accuracy low-complexity models.
AI risk arguments often gesture at smarter AIs being “more rational”/“closer to a perfect utility maximizer" (and hence being more dangerous) but what does this mean, concretely? Almost anything can be modeled as a maximizer of some utility function.
The only way I can see to salvage this line of reasoning is to restrict the class of utility functions an agent can have, such that the agent's best-fit utility function cannot be maximized until it gets very capable. The restriction may be justified on the basis of which kinds of agents are unstable under real-world conditions or will get outcompeted by other agents.
With people, you can appeal to the notion of reasonable vs. unreasonable utility functions, and hence look at their divergence from a maximizer of the best-fit "reasonable" utility function. For example, if I appear to have very different preferences at different points in time (e.g. I prefer to hold a red apple on odd hours and a green apple on even hours), you can extract money from me, and that seems "irrational" to us. But it’s only truly irrational if you require that I’m not indifferent to money and that I don’t prefer different fruit depending on the current time.
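To spell out the extraction step, here is a minimal money-pump sketch in Python (my own illustration, relying on exactly those two assumptions: the agent cares about money, and its apple preference really does flip by the hour; the fee size is made up):

```python
# Minimal money-pump sketch for the odd/even-hour apple preference: the trader swaps
# the same two apples back and forth, charging a small fee for each swap the agent
# is happy to make, so the fees accumulate while the agent never ends up with more fruit.

FEE = 0.05  # assumed: the agent will pay up to this for the colour it currently prefers

def preferred_colour(hour: int) -> str:
    return "red" if hour % 2 == 1 else "green"

held, extracted = "green", 0.0   # start of the day, holding the green apple
for hour in range(24):
    if preferred_colour(hour) != held:
        held = preferred_colour(hour)  # trader hands over the other apple...
        extracted += FEE               # ...and pockets the fee

print(round(extracted, 2))  # 1.15 extracted over one day; the agent still holds exactly one apple
```

Relax either assumption (let the agent be indifferent to money, or allow genuinely time-indexed preferences) and the "extraction" stops being a loss by the agent's own lights, which is the caveat above.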
You can also informally constrain the set of "reasonable" utility functions by what people say they want. Like if I say "I want to win a chess tournament" you might consider me irrational if I get drunk on the day of the tournament. In any particular real-world situation where people discuss rationality and preferences, we can use a rough situation-specific model of "what kinds of things can one have preferences over" and this allows us to constrain the set of valid utility functions.
My point here is that it's easy to model something as an EU-maximizer if you allow "unreasonable"-seeming utility functions. If you're saying it's only rational when the utility function you're maximizing is the one you "want" to maximize, how do you define "want" here in a non-circular way?
Simplicity per se doesn't make sense as the criterion: e.g. if I want to maximize the value of my coins and you want to maximize the total size of all your coins, it doesn't seem relevant which of those goals is simpler.
There are only four axioms, so this doesn't provide a lot of differences in degree.
A narrow type of "reasonableness" restriction might be that you're not allowed to prefer different things depending on the current time, otherwise over time you'll bleed real-world resources (like energy or money) and get outcompeted by other agents that don't have time-varying preferences.
However, such a restriction seems insufficient. For example, I could say that Deep Blue is just as rational/EU-maximizing as Stockfish if each engine's utility function is just a lookup table of which move it prefers to take in each board state.
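To make the lookup-table move explicit, here's a small sketch (my own illustration with made-up names, not anyone's actual engine): given any deterministic policy at all, you can write down a utility function over (state, move) pairs that the policy maximises exactly, which is why "can be modelled as an EU maximizer" does no work on its own.

```python
from typing import Callable, Dict, List

def lookup_table_utility(policy: Callable[[str], str]) -> Callable[[str, str], float]:
    """Utility over (state, move) that the given policy maximises: 1 for the chosen move, 0 otherwise."""
    return lambda state, move: 1.0 if move == policy(state) else 0.0

def as_maximizer(utility: Callable[[str, str], float],
                 legal_moves: Dict[str, List[str]]) -> Callable[[str], str]:
    """Recover a policy by picking the utility-maximising move in each state."""
    return lambda state: max(legal_moves[state], key=lambda move: utility(state, move))

# A deliberately silly 'engine' that always plays the alphabetically first legal move.
legal = {"start": ["e4", "a4", "h4"]}
silly_policy = lambda state: sorted(legal[state])[0]

u = lookup_table_utility(silly_policy)
recovered = as_maximizer(u, legal)
print(recovered("start"), silly_policy("start"))  # a4 a4 -- the silly policy is a perfect maximiser of u
```

The utility function exists, but it is exactly as complicated as the behaviour it "explains", so it does no explanatory work.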
If you prefer different things depending on extremely subtle changes, you're not robust to noise in perception and computation. Therefore you're unlikely to be able to fulfil your preferences under real-world constraints.
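A toy way to see this (my own construction, with arbitrary numbers): compare an agent whose preference tracks a coarse feature of the options with one whose preference hinges on a very fine-grained feature, and add a little perception noise; the latter's revealed choices flip from one presentation to the next, which is exactly the kind of across-time inconsistency discussed above.

```python
import random

# Toy sketch: small perception noise makes a preference that hinges on a subtle
# feature flip from one look to the next, while a coarse preference stays stable.

random.seed(1)
NOISE = 0.01  # assumed perceptual noise

def perceive(x: float) -> float:
    return x + random.gauss(0.0, NOISE)

def coarse_choice(a: float, b: float) -> str:
    return "A" if perceive(a) > perceive(b) else "B"  # prefers the clearly larger value

def fragile_choice(a: float, b: float) -> str:
    # preference hinges on the parity of the 4th decimal digit of the perceived difference
    return "A" if int(abs(perceive(a) - perceive(b)) * 10_000) % 2 == 0 else "B"

a, b = 7.30, 5.10                                      # a clearly better and a clearly worse option
for chooser in (coarse_choice, fragile_choice):
    picks = [chooser(a, b) for _ in range(1_000)]      # present the same pair 1,000 times
    print(chooser.__name__, picks.count("A") / len(picks))
# coarse_choice picks A essentially every time; fragile_choice comes out roughly 50/50,
# so across time its behaviour looks just like the inconsistent preferences above.
```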
I think a combination of A4 and A5 is the way to go here; when people discuss "approximation of a utility maximizer" what they really mean is "approximation of a utility maximizer whose preferences are consistent over time and under small perturbations".