This is a simple transformation of the standard expected utility formula that I found conceptually interesting.
For simplicity, let's consider a finite discrete probability space with a probability mass function p(x) that is non-zero at each point, and a utility function u(x) defined on its sample space. The expected utility of an event A (a set of points of the sample space) is the average value of the utility function over the event, weighted by probability, and is written as

EU(A) = Σ_{x∈A} u(x)·p(x) / Σ_{x∈A} p(x).
Expected utility is a way of comparing events (sets of possible outcomes) that correspond to, for example, available actions. Event A is said to be preferable to event B when EU(A) > EU(B). The preference relation doesn't change when the utility function is transformed by a positive affine transformation. Since the sample space is assumed finite, we can assume without loss of generality that u(x) > 0 for all x. Such a utility function can additionally be rescaled so that, for the whole sample space X,

EU(X) = Σ_x u(x)·p(x) = 1.
Now, if we define

q(x) = u(x)·p(x),
the expected utility can be rewritten as

EU(A) = Σ_{x∈A} q(x) / Σ_{x∈A} p(x),

or

EU(A) = Q(A)/P(A).
Here, P and Q are two probability measures. It's easy to see that this form of the expected utility formula has the same expressive power, so a preference relation can be defined directly by a pair of probability measures on the same sample space, instead of by using a utility function.
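As a quick sanity check, a minimal Python sketch (with made-up numbers for p and u) shows that the two forms of the formula agree once u is rescaled:

```python
# Hypothetical three-point sample space: probability p and positive utility u.
p = {"x1": 0.2, "x2": 0.5, "x3": 0.3}
u = {"x1": 4.0, "x2": 1.0, "x3": 2.0}

# Rescale u so that the expected utility of the whole sample space is 1.
z = sum(u[x] * p[x] for x in p)
u = {x: u[x] / z for x in u}

# Define shouldness q(x) = u(x) * p(x); Q is then a probability measure.
q = {x: u[x] * p[x] for x in p}

def EU_utility(A):
    """Expected utility of event A computed from u and p."""
    return sum(u[x] * p[x] for x in A) / sum(p[x] for x in A)

def EU_measures(A):
    """The same quantity as a ratio of the two measures, Q(A)/P(A)."""
    return sum(q[x] for x in A) / sum(p[x] for x in A)

A = {"x1", "x3"}
assert abs(EU_utility(A) - EU_measures(A)) < 1e-12
```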
Expected utility written in this form uses only the probability of the whole event in both measures, without looking at the individual points. I tentatively call the measure Q "shouldness", with P being "probability". A conceptual advantage of this form is that probability and utility are now on equal footing, and it's possible to work with both of them using the familiar Bayesian updating, in exactly the same way. To compute the expected utility of an event given additional information, just use the posterior shouldness and probability:

EU(A|B) = Q(A|B)/P(A|B).
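To illustrate the updating step (a sketch with hypothetical numbers): conditioning both measures on evidence B and taking the ratio agrees with the expected utility of A∩B up to the constant factor 1/EU(B), which doesn't depend on A, so the preference relation among events given B is preserved:

```python
# Hypothetical measures on a four-point space; both sum to 1.
p = {"a": 0.1, "b": 0.4, "c": 0.2, "d": 0.3}
q = {"a": 0.3, "b": 0.2, "c": 0.4, "d": 0.1}

def P(A): return sum(p[x] for x in A)
def Q(A): return sum(q[x] for x in A)
def EU(A): return Q(A) / P(A)

B = {"a", "b", "c"}                      # observed evidence

def P_post(A): return P(A & B) / P(B)    # Bayesian update of P on B
def Q_post(A): return Q(A & B) / Q(B)    # the same update applied to Q

A = {"a", "c"}
eu_given_B = Q_post(A) / P_post(A)       # posterior expected utility
# It equals EU(A∩B) rescaled by 1/EU(B), a constant independent of A:
assert abs(eu_given_B - EU(A & B) / EU(B)) < 1e-12
```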
If events are drawn as points (vectors) in (P,Q) coordinates, expected utility is monotone in the polar angle of the vectors. Since the coordinates are measures of events, a vector depicting a union of non-intersecting events is equal to the sum of the vectors depicting these events:

V(A∪B) = V(A) + V(B) for A∩B = ∅, where V(A) = (P(A), Q(A)).
This makes it possible to see graphically some of the structure of simple sigma-algebras on the sample space together with a preference relation defined by a pair of measures. See also this comment for some examples of applying this geometric representation of preference.
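A small sketch of the geometric picture (with made-up dyadic weights so the vector sums are exact): disjoint events add as vectors, and a larger polar angle corresponds to higher expected utility:

```python
import math

# Hypothetical measures; dyadic values keep float sums exact.
p = {"a": 0.25, "b": 0.5, "c": 0.25}
q = {"a": 0.125, "b": 0.25, "c": 0.625}

def vec(A):
    """Event A as the vector (P(A), Q(A))."""
    return (sum(p[x] for x in A), sum(q[x] for x in A))

def angle(v):
    return math.atan2(v[1], v[0])        # polar angle of the vector

def eu(v):
    return v[1] / v[0]                   # expected utility Q(A)/P(A)

A, B = {"a"}, {"b", "c"}                 # disjoint events
vA, vB = vec(A), vec(B)
assert vec(A | B) == (vA[0] + vB[0], vA[1] + vB[1])

# Expected utility is monotone in the polar angle:
assert (angle(vA) < angle(vB)) == (eu(vA) < eu(vB))
```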
A preference relation defined by expected utility in this way also doesn't depend on constant factors in the measures, so it's unnecessary to require the measures to sum to 1.
Since P and Q are just devices representing the preference relation, there is nothing inherently "epistemic" about P. Indeed, it's possible to mix P and Q together without changing the preference relation. A pair (p',q') defined by

p'(x) = (1 - a)·p(x) + a·q(x)
q'(x) = (1 - b)·p(x) + b·q(x)

gives the same preference relation whenever b > a, since EU'(A) = Q'(A)/P'(A) is then an increasing function of EU(A).
(The coefficients can be negative or greater than 1, but the values of p' and q' must remain positive.)
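The mixing transformation can be checked numerically; this sketch (hypothetical measures and coefficients with b > a) verifies that the preference ordering of a few events is unchanged:

```python
# Hypothetical measures on a three-point space.
p = {"a": 0.25, "b": 0.5, "c": 0.25}
q = {"a": 0.1, "b": 0.25, "c": 0.65}

a_coef, b_coef = 0.2, 0.7               # any pair with b_coef > a_coef works
p2 = {x: (1 - a_coef) * p[x] + a_coef * q[x] for x in p}
q2 = {x: (1 - b_coef) * p[x] + b_coef * q[x] for x in p}

def eu(P_, Q_, A):
    """Expected utility Q(A)/P(A) for a given pair of measures."""
    return sum(Q_[x] for x in A) / sum(P_[x] for x in A)

events = [{"a"}, {"b"}, {"c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}]
order = sorted(events, key=lambda A: eu(p, q, A))
order2 = sorted(events, key=lambda A: eu(p2, q2, A))
assert order == order2                  # same preference relation
```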
Conversely, given a fixed measure P, it isn't possible to define an arbitrary preference relation by varying only Q (or the utility function). For example, for a sample space of three elements a, b and c, if p(a)=p(b)=p(c), then EU(a)>EU(b)>EU(c) implies EU(a+c)>EU(b+c), so it isn't possible to choose q such that EU(a+c)<EU(b+c). If we are free to choose p, however, an example with these properties (allowing zero values for simplicity) is, in (P,Q) coordinates, a=(0,1/4), b=(1/2,3/4), c=(1/2,0), with a+c=(1/2,1/4) and b+c=(1,3/4), so EU(a+c) = 1/2 < 3/4 = EU(b+c).
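The counterexample can be checked numerically; here each point is given directly as its (P, Q) pair from the text:

```python
# Points as (P, Q) pairs; p(a) = 0 makes EU(a) infinite, as allowed above.
pts = {"a": (0.0, 0.25), "b": (0.5, 0.75), "c": (0.5, 0.0)}

def vec(A):
    """Vector (P(A), Q(A)) of event A."""
    return tuple(sum(pts[x][i] for x in A) for i in (0, 1))

def eu(A):
    P_, Q_ = vec(A)
    return Q_ / P_ if P_ > 0 else float("inf")

# EU(a) > EU(b) > EU(c), and yet EU(a+c) < EU(b+c):
assert eu({"a"}) > eu({"b"}) > eu({"c"})
assert eu({"a", "c"}) < eu({"b", "c"})
```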
A prior is an integral part of preference, and it works in exactly the same way as shouldness. Manipulations with probabilities, or Bayesian "levels of certainty", are manipulations with "half of preference". The problem of choosing Bayesian priors is, in general, the problem of formalizing preference; it can't be solved completely without considering utility, without formalizing values, and values are very complicated. No simple morality, no simple probability.
I finally deciphered this post just now, so I'll explain how I'm interpreting it for the convenience of future readers. Basically, we start in a world state with various timelines branching off it: the points of the initial probability distribution. Each timeline has a particular utility (how much we like it) and a particular probability (how much we expect it). So you can sum utility times probability over all timelines to get the total expected value of the state of the world we're in right now.
However, we have the option of taking some action, the "event" referenced in the post, which rules out some set of timelines. The remaining set of timelines, the ones we can restrict our future to by performing the action, accounts for some proportion of the total expected value of our current state. That proportion is Q(A), derived from summing the expected value of each timeline in the set and dividing by the expected value of this present state - which is the same as normalizing the present state's expected value to 1.
If we perform the action, those timelines keep their probability weights, but with the other timelines now ruled out, we rescale them to sum to 1, in the sense of Bayesian updating (our action is evidence that we're in that set of timelines rather than some other set). We do this by dividing by the proportion of probability mass they had in our initial state, i.e. their total probability, which is P(A).
So Q(A)/P(A) is essentially a "score multiplier". If the action restricts the future to a set of timelines whose proportion of total expected value, from the perspective of the pre-action starting state, is greater than their total probability, this normalized expected value of the action will be greater than 1: we've improved our position, forced the universe into a world state which gives us a better bet than we had before. On the other hand, it could be less than 1 if we restrict to a set of timelines whose ratio of value proportion to probability is too low: we've thrown away some potential value that was originally available to us.
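The "score multiplier" reading can be sketched with a few hypothetical timelines: normalize the present state's expected value to 1, and an action restricting the future to event A multiplies that baseline by Q(A)/P(A):

```python
# Hypothetical timelines with probabilities and utilities.
p = {"t1": 0.25, "t2": 0.25, "t3": 0.5}      # probability of each timeline
u = {"t1": 2.0, "t2": 1.0, "t3": 0.5}        # utility of each timeline

total = sum(u[t] * p[t] for t in p)          # expected value of the present state
q = {t: u[t] * p[t] / total for t in p}      # each timeline's share of that value

def multiplier(A):
    """Score multiplier Q(A)/P(A) for restricting the future to A."""
    return sum(q[t] for t in A) / sum(p[t] for t in A)

assert multiplier({"t1", "t2"}) > 1.0        # keeps the high-value timelines
assert multiplier({"t3"}) < 1.0              # throws potential value away
```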
The fun thing is that since Q and P both look like probability distributions - ways of weighting timelines as proportions of the whole - we can modify them with linear transformations in such a way that the preference ordering of Q(A)/P(A) remains unchanged. But that's where my currently reached understanding stops. I'll have to analyze the rest of the post to get a better sense of how that transformation works and why it would be useful.