This post is an answer to: http://intelligence.org/files/RealisticWorldModels.pdf

> In Solomonoff’s induction problem, the agent and its environment are fundamentally separate processes, connected only by an observation channel. In reality, agents are embedded within their environment; the universe consists of some ontologically continuous substrate (atoms, quantum fields) and the “agent” is just a part of the universe in which we have a particular interest. What, then, is the analogous prediction problem for agents embedded within (and computed by) their environment?

> This is the naturalized induction problem, and it is not yet well understood. A good formalization of this problem, on par with Solomonoff’s formalization of the computable sequence induction problem, would represent a significant advance in the theory of general reasoning.

In Solomonoff’s induction, an algorithm learns about the world (modeled as a Turing machine) from observations (modeled as output from that Turing machine). Solomonoff’s induction is uncomputable, but there are computable approximations.
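
For reference, the universal prior behind Solomonoff’s induction is usually written as below. The notation is the standard textbook one and is not taken from the post: $U$ is assumed to be a universal monotone Turing machine, $p$ ranges over minimal programs whose output begins with the bit string $x$, and $|p|$ is the length of $p$ in bits.

```latex
M(x) = \sum_{p \,:\, U(p) = x*} 2^{-|p|},
\qquad
M(x_{t+1} \mid x_{1:t}) = \frac{M(x_{1:t}\, x_{t+1})}{M(x_{1:t})}
```

The first expression weights every program consistent with the data by its length; the second is the resulting prediction for the next bit.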

1) Suggestion of agent design

Consider an agent fully embedded in an environment. Every turn, the agent receives one bit of observation and can perform one bit of action. The design we propose for this agent comes in two steps: learning and deciding.

1.1) Learning:

The agent models the entire world, including the agent itself, as one unknown, output-only Turing machine. Both observations and the agent’s own actions are seen as world outputs. The agent calculates the probability distribution over hypotheses for the entire world, using (a computable approximation of) Solomonoff’s induction.

This suggestion completely removes any boundary between the agent and the rest of the world in the agent’s world model.
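
The learning step can be sketched in miniature. This is only a toy under loud assumptions: the four hypotheses, their names, and their description lengths are made up for illustration, and a tiny finite class stands in for the space of all Turing machines. Each hypothesis is an output-only world that emits one bit stream, in which observation bits and the agent’s own action bits are treated alike.

```python
from fractions import Fraction

# Hypothetical toy hypothesis class (not from the post). Each entry is
# (made-up description length in bits, deterministic output-only world).
# world(t) is the t-th output bit; observations and the agent's own
# actions are both just bits in this single output stream.
HYPOTHESES = {
    "all_zeros": (2, lambda t: 0),
    "all_ones": (2, lambda t: 1),
    "alternating": (3, lambda t: t % 2),
    "period_four": (5, lambda t: 1 if t % 4 == 3 else 0),
}


def prior():
    """Length-based prior: weight 2^-length, normalized over the toy class."""
    weights = {
        name: Fraction(1, 2 ** length)
        for name, (length, _) in HYPOTHESES.items()
    }
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


def posterior(history):
    """Condition the prior on every bit seen so far.

    Hypotheses are deterministic, so each one either reproduces the
    history exactly (and keeps its prior weight) or is ruled out.
    """
    weights = {}
    for name, p in prior().items():
        _, world = HYPOTHESES[name]
        consistent = all(world(t) == bit for t, bit in enumerate(history))
        weights[name] = p if consistent else Fraction(0)
    total = sum(weights.values())
    if total == 0:
        raise ValueError("no hypothesis in the toy class matches the history")
    return {name: w / total for name, w in weights.items()}
```

Note that nothing in `posterior` distinguishes action bits from observation bits, which is the point of the design: the agent’s own behavior is just more data about which world it is in.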

1.2) Deciding:

The agent must also have a decision process for choosing what action to take. Because the agent models its own actions as deterministic outputs from a complete world model, we cannot use the decision procedure used by Hutter’s AIXI. This is not just a problem of counterfactuals: our agent’s internal world model has no input channel.

Instead we suggest the following: For each available action, the agent calculates the expected utility, conditional on observing itself perform that action. The agent chooses the action that gives the highest expected utility. Alternatively, the agent chooses semi-randomly, with higher probability for actions that result in higher expected utility.
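
A minimal sketch of this decision rule, again with assumptions stated up front: the posterior table, its probabilities, and its utilities are invented, each hypothesis is taken to fix both the agent’s next action and a utility, and the softmax with a `temperature` parameter is just one possible reading of “semi-randomly”.

```python
import math
import random

# Hypothetical posterior over world hypotheses after the learning step.
# Each row is (probability, next action fixed by that world, utility of
# that world). All numbers are made up for illustration.
POSTERIOR = [
    (0.4, 0, 1.0),
    (0.1, 0, 5.0),
    (0.3, 1, 2.0),
    (0.2, 1, 4.0),
]


def conditional_expected_utility(posterior, action):
    """E[utility | the agent observes itself performing `action`]."""
    mass = sum(p for p, a, _ in posterior if a == action)
    if mass == 0:
        # Conditioning on a probability-zero action is undefined; how to
        # handle this case is an open choice, not something the post fixes.
        return None
    return sum(p * u for p, a, u in posterior if a == action) / mass


def score_actions(posterior, actions=(0, 1)):
    """Map each action with nonzero probability to its conditional utility."""
    scored = {a: conditional_expected_utility(posterior, a) for a in actions}
    return {a: u for a, u in scored.items() if u is not None}


def greedy_action(posterior, actions=(0, 1)):
    """Pick the action with the highest conditional expected utility."""
    scored = score_actions(posterior, actions)
    return max(scored, key=scored.get)


def softmax_action(posterior, actions=(0, 1), temperature=1.0, rng=random):
    """Pick semi-randomly, favoring actions with higher expected utility."""
    scored = score_actions(posterior, actions)
    best = max(scored.values())
    # Subtracting the maximum keeps the exponentials numerically stable.
    weights = [math.exp((u - best) / temperature) for u in scored.values()]
    return rng.choices(list(scored), weights=weights, k=1)[0]
```

With the table above, `greedy_action(POSTERIOR)` compares the two conditional expectations and picks action 1. The `None` branch marks a case the sketch does not resolve: an action that every surviving hypothesis rules out has no conditional expected utility.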

An advantage of this agent design is that the decision process does not have to care about sequences of actions. Instead, different possible future actions are accounted for by separate world hypotheses. Furthermore, this naturally takes into account situations where the agent loses control over its own actions, e.g. if it breaks down.

2) Scoring agents

Solomonoff’s induction is the formally optimal solution to an associated scoring rule. Having such a scoring rule is useful for testing how well approximations do. (Right?)

I don’t know what the associated scoring rule would be for this suggestion. Sorry :(

3) Measuring utility

The decision process is based on the agent being able to detect utility in any given world hypothesis. This is a very hard problem, which we do not attempt to solve here.

This sweeps some of the essential problems under the rug; if you formalize it a bit more, you'll see them.

It's not an artificial restriction, for instance, that a Solomonoff Induction oracle machine doesn't include things like itself in its own hypothesis class, since the question of "whether a given oracle machine matches the observed data" is a question that sometimes cannot be answered by an oracle machine of equivalent power. (There are bounded versions of this obstacle as well.)

Now, there are some ways around this problem (all of them, so far as I know, found by MIRI): modal agents, reflective oracle machines and logical inductors manage to reason about hypothesis classes that include objects like themselves. Outside of MIRI, people working on multiagent systems make do with agents that each assume the other is smaller/simpler/less meta than itself (so at least one of those agents is going to be wrong).

But this entire problem is hidden in your assertion that the agent, which is a Turing machine, "models the entire world, including the agent itself, as one unknown, output-only Turing machine". The only way to find the other problems swept under the rug here is to formalize or otherwise unpack your proposal.