AIL — LessWrong

Okay, I think this makes sense. The idea is to reinterpret the various functions in the utility function as a single function, and then to ask about a notion of complexity on that function which combines the complexity of producing a circuit that computes it with the complexity of the circuit itself.

But just to check: is T over ? I thought T in utility functions depended only on states and actions, $S \times A \to S$?

Maybe I am confused by what you mean by $S$. I thought it was the state space, but that isn't consistent with $r$ in your post, which was defined over $A \times O \to Q$? As a follow-up: defining $r$ as depending on actions and observations instead of actions and states (which is, e.g., the definition in the POMDP article on Wikipedia) seems like it changes things. So I'm not sure whether you intended the rewards to correspond to the observations or to the 'underlying' states.

One more question, this one about the priors: what are they a prior over, exactly? I will use the letters/terms from https://en.wikipedia.org/wiki/Partially_observable_Markov_decision_process to try to be explicit. Is the prior capturing the "set of conditional observation probabilities" (O on Wikipedia)? Or is it capturing the "set of conditional transition probabilities between states" (T on Wikipedia)? Or is it capturing a distribution over all possible T and O? Or are you imagining that T is defined within U (and is non-random) and O is defined within the prior?
I ask because the term $D_{KL}(ζ_{0} \| ζ)$ will be positive infinity if $ζ$ is zero for any value where $ζ_{0}$ is non-zero. This makes the interpretation that it is directly either O or T pretty strange. For example, in the case where there are two states $s_{1}$ and $s_{2}$ and two observations $o_{1}$ and $o_{2}$, an O where $P(s_{i} | o_{i}) = 1$ and $P(s_{i} | o_{j}) = 0$ if $i \neq j$ would have a KL divergence of infinity from $ζ_{0}$ if $ζ_{0}$ had non-zero probability on $P(s_{1} | o_{2})$. So, I assume this is a prior over what the conditional observation matrices might be. I am assuming that your comment above implies that T is defined in the utility function U instead, and is deterministic?
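
To make the infinity concern concrete, here is a minimal Python sketch of the discrete KL divergence. The function name `kl_divergence` and all the numbers are my own illustrative assumptions, not anything from the post; each list just stands in for a single discrete distribution over two outcomes.

```python
import math

def kl_divergence(p, q):
    """D_KL(p || q) for two discrete distributions given as equal-length lists."""
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0.0:
            continue  # usual convention: 0 * log(0 / q_i) = 0
        if q_i == 0.0:
            return math.inf  # p puts mass where q has none
        total += p_i * math.log(p_i / q_i)
    return total

# Hypothetical numbers, chosen only for illustration.
zeta_0 = [0.9, 0.1]       # reference distribution with mass on both outcomes
zeta_det = [1.0, 0.0]     # deterministic distribution: zero on the second outcome
zeta_soft = [0.99, 0.01]  # nearly deterministic, but with full support

kl_det = kl_divergence(zeta_0, zeta_det)    # infinite
kl_soft = kl_divergence(zeta_0, zeta_soft)  # finite and positive
```

Under these assumed numbers, any $ζ$ that is exactly zero where $ζ_{0}$ is not gives an infinite divergence, while a $ζ$ with full support stays finite however peaked it is. That is the sense in which a deterministic O or T looks like a strange thing to plug in directly.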

Vanessa Kosoy's Shortform


I am not sure I understand your use of in the third-from-last paragraph, where you define goal-directed intelligence. As you define $C$, it is a complexity measure over programs $P$. I assume this was a typo and you mean $K(U)$? Or am I misunderstanding the definition of either $U$ or $C$?
