Okay, I think this makes sense. The idea is trying to re-interpret the various functions in the utility function as a single function and asking about the notion of complexity on that function which combines the complexity of producing a circuit which computes that function and the complexity of the circuit itself.But just to check: is T over S×A×O→S? I thought T in utility functions only depended on states and actions S×A→S? Maybe I am confused by what you mean by S. I thought it was the state space, but that isn't consistent with r in your post which was defined over A×O→Q? As a follow up: defining r as depending on actions and observations instead of actions and states (which e.g. the definition in POMDP on Wikipedia) seems like it changes things. So I'm not sure if you intended the rewards to correspond with the observations or 'underlying' states.
One more question, this one about the priors: what are they a prior over exactly? I will use the letters/terms from https://en.wikipedia.org/wiki/Partially_observable_Markov_decision_process to try to be explicit. Is the prior capturing the "set of conditional observation probabilities" (O on Wikipedia)? Or is it capturing the "set of conditional transition probabilities between states" (T on Wikipedia)? Or is it capturing a distribution over all possible T and O? Or are you imaging that T is defined with U (and is non-random) and O is defined within the prior? I ask because the term DKL(ζ0||ζ) will be positive infinity if ζ is zero for any value where ζ0 is non-zero. Which makes the interpretation that it is either O or T directly pretty strange (for example, in the case where there are two states s1 and s2 and two obersvations o1 and o2 an O where P(si|oi)=1 and P(si|oj)=0 if i≠j would have a KL divergence of infinity from the ζ0 if ζ0 had non-zero probability on P(s1|o2)). So, I assume this is a prior over what the conditional observation matrices might be. I am assuming that your comment above implies that T is defined in the utility function U instead, and is deterministic?
I am not sure I understand your use of C(U) in the third from last paragraph where you define goal directed intelligence. As you define C it is a complexity measure over programs P. I assume this was a typo and you mean K(U)? Or am I misunderstanding the definition of either U or C?
Is there a place to look for papers and posts people have already submitted (and want to be public)?