Humans don't. "Utility" is part of the map, not part of the territory. We make choices, but utility theory is only a modeling language used to describe choice-making processes.
One area of research you may want to investigate is "Revealed Preference", a concept developed primarily by Paul Samuelson.
"Revealed Preference" has issues, because of things like circular preferences - although it's a mistake to conclude that circular preferences are proof that humans are irrational. Rather, it demonstrates that utility theory in general is just a model, and an incomplete one.
The fundamental issue is that utility, as a model, attempts to compress a topography of many dimensions (human preferences) into a topography of exactly one: a single utility value for each potential choice. Impossible objects, i.e. "contradictions" such as circular preferences, are to be expected in the compressed topography.
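To make the circular-preferences point concrete, here is a minimal sketch (my own toy illustration, not anything from the revealed-preference literature): a finite set of options with strict pairwise preferences can be assigned a single real-valued utility only if the preference graph contains no cycle.

```python
def utility_from_preferences(options, prefers):
    """prefers is a set of (a, b) pairs meaning 'a is strictly preferred to b'.
    Returns {option: utility} if the preferences are acyclic, else None."""
    better_than = {o: {b for a, b in prefers if a == o} for o in options}
    utilities, visiting, done = {}, set(), set()

    def depth(o):  # utility = length of the longest chain of options strictly below o
        if o in visiting:                      # found a cycle: no consistent utility exists
            raise ValueError("circular preferences")
        if o not in done:
            visiting.add(o)
            utilities[o] = max((depth(b) + 1 for b in better_than[o]), default=0)
            visiting.discard(o)
            done.add(o)
        return utilities[o]

    try:
        return {o: depth(o) for o in options}
    except ValueError:
        return None

print(utility_from_preferences("ABC", {("A", "B"), ("B", "C")}))
# {'A': 2, 'B': 1, 'C': 0} -- a consistent one-dimensional compression exists
print(utility_from_preferences("ABC", {("A", "B"), ("B", "C"), ("C", "A")}))
# None -- the cycle cannot be squeezed into a single real number per option
```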
Well, there is one case in which naive utility theory makes perfect sense: when the utility function is just measuring the value of some real-valued random variable inside the epistemic model (i.e. when reading a number off your map tells you the utility of the territory). Since utility theory was invented to deal with economics, where such a random variable exists and is called "money", nobody ever bothered to ask what happens when you don't have such a convenient real-valued, assumed-monotonic random variable.
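As a toy illustration of that special case (my own sketch; the names and numbers are made up), when the model already contains a real-valued, assumed-monotonic variable like money, "utility" reduces to reading that number off each possible outcome and picking the action that maximizes its expectation:

```python
# Minimal sketch: utility = the money variable in the model, so choosing an
# action is just maximizing expected money. Purely illustrative numbers.

def expected_money(outcomes):
    """outcomes: list of (probability, money_in_that_outcome) pairs."""
    return sum(p * money for p, money in outcomes)

actions = {
    "safe bet":  [(1.0, 100)],
    "coin flip": [(0.5, 0), (0.5, 250)],
}
best = max(actions, key=lambda a: expected_money(actions[a]))
print(best, {a: expected_money(o) for a, o in actions.items()})
# coin flip {'safe bet': 100.0, 'coin flip': 125.0}
```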
True. Although I think most utility theorists would be somewhat horrified if you suggested that money was the only thing worth measuring when measuring utility.
Well of course, because they conceived of utility theory as giving value to money, and so they invented a utility theory that only really applies to measuring money. It was a kind of doublethink: if real human preferences don't fit a model constructed to deal with money, economists conclude that humans are Irrational (in a capital-letter, ideological sense) rather than trying to come up with a model of evaluative reasoning that actually explains the data gathered from real people.
And utility starts to become absurd when discussing multi-agent systems. For instance, if every person assigns utility preferences to every other person, at different levels of confidence depending on their mentalising ability... Voting theory constructs some solutions to this problem, but if I've ever come across someone who's familiar enough with voting theory for us to interact in optimal ways, I've never known it. I would also recommend looking up the social and cognitive determinants of altruistic behaviour and kindness, to supplement the dry economic blurb you'll get by looking at the research on revealed preference.
This general problem has been studied by Stuart Russell, Andrew Ng, and others. It's called "Inverse Reinforcement Learning", and the general idea is to infer an approximation of an agent A's utility function from training data that includes A's actual decisions, and then use that approximation in an RL agent B, so that B can satisfy A's goals, perhaps better than A itself (by thinking faster and/or predicting the future better).
You need to start with some sensible priors over A's utility function for the problem to be well formed, but after that it becomes a machine learning problem.
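Here is a deliberately simplified sketch of that setup (my own toy example, not the algorithms from the IRL papers): the demonstrator A picks options from menus according to a hidden linear utility over features, and we recover an approximation of the weights by gradient ascent on a softmax-choice likelihood plus a weak Gaussian prior.

```python
import numpy as np

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])     # A's hidden utility weights (used only to generate demo data)

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

# Simulated demonstrations: 200 menus of 4 options, each option described by 2 features.
menus = rng.normal(size=(200, 4, 2))
choices = [rng.choice(4, p=softmax(menu @ true_w)) for menu in menus]

# Infer w: gradient ascent on the log-posterior (softmax-choice likelihood + Gaussian prior).
w, lr, prior_var = np.zeros(2), 0.05, 100.0
for _ in range(500):
    grad = np.zeros(2)
    for menu, c in zip(menus, choices):
        p = softmax(menu @ w)
        grad += menu[c] - p @ menu              # d log P(choice c | menu, w) / d w
    grad = grad / len(menus) - w / prior_var    # average likelihood gradient + prior gradient
    w += lr * grad

print("true weights:", true_w, " inferred:", np.round(w, 2))
```

In real IRL settings the decisions are sequential and evaluating the likelihood means solving the demonstrator's planning problem, but the inference structure is the same: a prior over utility functions, updated on observed decisions.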
What does this method produce if there is no utility function that accurately models the agent's decisions?
I'm not sure, but I'd guess it wouldn't produce much. For example, if the agent is just making random decisions, you won't be able to learn much from that.
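Continuing the toy softmax-choice sketch from above (again just an illustration): if the demonstrated choices are uniformly random, the likelihood is roughly flat in the weights, and the zero-weight (uniform) model fits at least as well as any particular utility function, so the procedure mostly reports that no utility explains the data.

```python
import numpy as np

rng = np.random.default_rng(1)
menus = rng.normal(size=(200, 4, 2))              # same kind of menus as before
random_choices = rng.integers(0, 4, size=200)     # but no utility behind the choices

def avg_log_likelihood(w):
    total = 0.0
    for menu, c in zip(menus, random_choices):
        s = menu @ w
        total += s[c] - (s.max() + np.log(np.exp(s - s.max()).sum()))  # log softmax prob of choice c
    return total / len(menus)

for w in [np.zeros(2), np.array([2.0, -1.0]), np.array([-3.0, 3.0])]:
    print(w, round(avg_log_likelihood(w), 3))
# The uniform model (w = 0) scores about log(1/4) = -1.386 and the candidate utility
# functions do worse: the fitted "utility" says essentially nothing about what the agent will do.
```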
The IRL research so far has used training data provided by humans, and can infer human-goal-shaped utility functions, at least for the fairly simple problem domains that have been tested. Most of this research was done almost a decade ago and hasn't been as active recently. In particular, if you scaled it up with modern tech, I bet that IRL techniques could learn the score function of Atari games just from watching humans play, for example.
One relevant idea is that there is a duality between assigning utilities to actions (or equivalently, being able to pick your favorite option out of a probabilistic mix of actions), and assigning utilities to outcomes. Acting consistently in one way implies that you are also acting consistently in the other.
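For concreteness, one direction of that duality is just the standard expected-utility identity (textbook von Neumann-Morgenstern material, not anything specific to this discussion): if outcomes o carry utilities U(o) and action a induces an outcome distribution P(o | a), then

$$U(a) \;=\; \mathbb{E}\left[\,U(o) \mid a\,\right] \;=\; \sum_{o} P(o \mid a)\, U(o).$$

The other direction says that any agent whose choices over probabilistic mixes of actions are consistent (in the VNM sense) behaves as if it were maximizing the expectation of some such U(o), unique up to a positive affine transformation.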
Since humans are much better at picking actions than we are at evaluating entire world-states, this is pretty handy (though it comes nowhere near solving the entire problem). Paul Christiano has a writeup of what a naive-ish application of this idea would look like here.
So if I understand this correctly, Alice and the sovereign are identically omniscient, and the sovereign additionally has some power and influence upon the world that Alice does not. In the case where Alice herself is the sovereign, the problem is solved, right? The sovereign just has to figure out what she prefers and do that. The solution, then, is to simulate the scenario where Alice has the power to make the decision herself, and then match Alice's decision. This solves both 1 and 2.
My short answer to the broader "How do we know what sacks of meat / circuits / whatever prefer" question is "you look at the behavioral output". Here, if Alice can make the decision herself, the decision represents her behavioral output.
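In code-shaped pseudocode, the proposal amounts to something like the sketch below. Everything here is hypothetical scaffolding: simulate_agent stands in for whatever (currently nonexistent) machinery could run a faithful copy of Alice with the sovereign's knowledge and report the option she would choose.

```python
# Toy sketch of "simulate Alice with the sovereign's knowledge, then match her decision".
# `simulate_agent`, `alice_model`, and the dict fields are hypothetical placeholders, not a real API.

def sovereign_choose(alice_model, sovereign_knowledge, options, simulate_agent):
    """Enact whatever the simulated, fully informed Alice would decide."""
    empowered_alice = dict(alice_model, knowledge=sovereign_knowledge, can_decide=True)
    return simulate_agent(empowered_alice, options)  # her decision is her behavioral output
```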
(I'm about halfway through writing up how to make this idea more workable without resorting to omniscient things with consistent preferences; if I still like the idea after writing it out, I'll cross-post it on LW.)
It seems like a good portion of the whole "maximizing utility" strategy that might be used by a sovereign relies on actually being able to consolidate human preferences into utilities. I think there are a few stages here, each of which may present obstacles. I'm not sure what the current state of the art is with regard to overcoming these, and I'm curious to find out.
First, here are a few assumptions that I'm using just to make the problem a bit more navigable (dealing with one or two hard problems instead of a bunch at once); I'll need to go back and do away with each of these (and each combination thereof) and see what additional problems result.
So Alice can conclude anything and everything, pretty much (and so can our sovereign). The sovereign is faced with the problem of figuring out what action to take to maximize across Alice's preferences. However, Alice is basically a sack of meat that has certain emotions in response to certain experiences or certain conclusions about the world, and it doesn't seem obvious how to get a preference ordering over the different worldlines out of these emotions. Some difficulties:
So, to rehash my actual request: what's the state of the art with regard to these difficulties, and how confident are we that we've reached a satisfactory answer?