Beyond Rewards and Values: A Non-dualistic Approach to Universal Intelligence

[-]Roman Leventov3y10

I think that in the ACI model, you correctly capture that agents are not bestowed with the notion of good and bad "from above", as in the AIXI model (implicitly, encoded as rewards). ACI appears to be an uncomputable, idealised version of predictive processing.

However, after removing ethics "from the outside", ACI is left without an adequate replacement. I. e., this is an agent devoid of ethics as a cognitive discipline, which appears to be intimately related to foresight. ACI lacks constructive foresight, too, it always "looks back", which warranted periodic "supervised learning" stages that seem like a patch-up. This doesn't appear scalable, too.

In Active Inference, which is an extension of predictive processing, intelligent behaviour is modelled as perpetual minimisation of the expected free energy (EFE) of the agent:

Where $τ$ is the expected future action trajectory^[1] (from which the agent takes the first action and performs it at every iteration of its observe-orient-decide-act loop), $Q$ denotes the current inference of the world state, combined with the probabilistic generative model of how the world will change in the future from the current state ( $~ x$ denotes the expected future trajectory of the states in the domain of the world model, and $~ y$ denotes the expected future trajectory of the external observations from which the world state is inferred), and $P (~ y)$ denotes the prior preferences of the system (agent), specified as a probability distribution over future observations.

Note that the agent's intelligence could be roughly divided into three big disciplines:

Learning of $Q$ , or epistemology. This is what ACI is chiefly doing.
Learning of $P$ , or ethics.
Minimisation of EFE ( $G$ ), or rationality.

(Of course, all three could hardly be disentangled, and under different classifications could be seen as sub-disciplines of each other.)

For a natural intelligence, there is a perfect source of right experiences: its evolutionary history. All the actions of a living individual and its ancestors must be “right”, at least in their own environments, otherwise this individual won’t be here at all. If we have the knowledge of all those experiences, we can infer how this individual would behave.

In Active Inference parlance, this is called self-evidencing [in the ecological/phenotypical niche that has proven to support my life up to this point]. For a formal treatment of evolution through the lens of Active Inference, see "Life as we know it" (Friston 2013).

^{^}
The horizon of this future trajectory is termed the cognitive light cone (Levin 2019). There is yet no theory of how this light cone extends farther in the future, as far as I know.

[-]Akira Pyinya3y10

Thank you for your comment. I have spent some time reading the book Active Inference. I think active inference is a great theory, but focuses on some aspects of intelligence different from what ACI does.

ACI learns to behave the same way as examples, so it can also learn ethics from examples. For example, if behaviors like “getting into a very cold environment” is excluded from all the examples, either by natural selection or artificial selection, an ACI agent can learn ethics like “always getting away from cold”, and use it in the future. If you want to achieve new ethics, you have to either induce from the old ones or learn from selection in something like “supervised stages”.

Unlike Active Inference or the “AIXI+ValueLearning” combination, ACI does not divide the learning process into “information gain” and “pragmatic value learning”, but learns them as a whole. For example, bacterias can learn policies like following nutrient gradients from successful behaviors proved by natural selection.

The problem of dividing the learning process is that, without value learning, we don’t know what information is important and need to learn, but without enough information it would be difficult to understand values. That’s why active inference indicates that “pragmatic and epistemic values need to be pursued in tandem”. However, the ACI model works in a little different way, it induces and updates policies directly from examples, and practices epistemic learning only when the policy asks to, such as when the policy involves pursuing some goal states.

In the active inference model, both information gain and action are considered as “minimizing the discrepancy between our model and our world through perception and action”. For example, when a person senses his body temperature is much higher than expected, he should change his model of body temperature, or take action to lower his body temperature. He always chooses the latter, because “we are considerably more confident about our core temperature because it underwrites our existence”. In the words of ACI, that is “from experience (including genetic information), we know that as long as we (and our homeotherm ancestors) are alive, we always act to keep our core temperature stay the same”.

In order to have a better description of this body temperature example, I suggest a small improvement on the active inference model: “minimizing the discrepancy between our model and our world”, including minimizing the discrepancy between the model of our action and our actual action. In this body temperature example, the person has choices of acting to lower his temperature or not to act, the former action can minimize the discrepancy between the model of our action (always keep our core temperature the same) and our actual action.

[-]Roman Leventov3y10

ACI learns to behave the same way as examples, so it can also learn ethics from examples. For example, if behaviors like “getting into a very cold environment” is excluded from all the examples, either by natural selection or artificial selection, an ACI agent can learn ethics like “always getting away from cold”, and use it in the future. If you want to achieve new ethics, you have to either induce from the old ones or learn from selection in something like “supervised stages”.

You didn't respond to the critical part of my comment: "However, after removing ethics "from the outside", ACI is left without an adequate replacement. I. e., this is an agent devoid of ethics as a cognitive discipline, which appears to be intimately related to foresight. ACI lacks constructive foresight, too, it always "looks back", which warranted periodic "supervised learning" stages that seem like a patch-up. This doesn't appear scalable, too."

Let me try to rephrase: ACI appears fundamentally inductive, but inductivism doesn't appear to be a philosophy of science that really leads to general intelligence. A general intelligence should adopt some form of constructivism (note that in my "decomposition" of the "faculties" of general intelligence, based on Active Inference, in the comment above: namely, epistemology, rationality, and ethics, -- are all deeply intertwined, and "ethics" is really about any foresight and normativity, including constructivism). AIXI could be general intelligence because the constructive, normative aspect of intelligence is "assumed away" to the external entity that assigns rewards to different outcomes; with ACI, you basically still assume this aspect of intelligence away, relying on the "caretaker" that will decide what, when and how to teach the ACI. If it's some other AI that does it, how does that AI know? So, there is an infinite regress, and ACI couldn't be a universal model of general intelligence.

Also, cf. Safron (2022) discussion of FEP/Active Inference and AIXI.

A few corrections about your reading of Active Inference:

ACI does not divide the learning process into “information gain” and “pragmatic value learning”

First, Active Inference doesn't really "divide" them such, it's one of the decompositions of EFE (the other is into ambiguity and risk). Second, it's just "pragmatic value" here, not "pragmatic value learning".

In the active inference model, both information gain and action are considered as “minimizing the discrepancy between our model and our world through perception and action”.

Information gain is not coupled/contraposed with action. Action is only contraposed with perception. Perception != information gain. Perception is tuned to minimise VFE (rather than EFE); VFE = complexity + accuracy, but information gain doesn't feature in VFE.

[-]Akira Pyinya3y*10

I think I have already responded to that part. Who is the “caretaker that will decide what, when and how to teach the ACI”? The answer is natural selection or artificial selection, which work like filters. AIXI’s “constructive, normative aspect of intelligence is ‘assumed away’ to the external entity that assigns rewards to different outcomes”, while ACI’s constructive, normative aspect of intelligence is also assumed away to the environment that have determined which behavior was OK and which behavior would get a possible ancestor out of the gene pool. Since the the reward circuit of natural intelligence is shaped by natural selection, ACI is also eligible to be a universal model of intelligence.

Thank you for your correction about Active Inference reading, I will read more then respond to that.

LESSWRONG
LW

LESSWRONG
LW

10

Beyond Rewards and Values: A Non-dualistic Approach to Universal Intelligence

10

10

1. AIXI as a dualistic model

Dualistic agent and embedded agent

AIXI is dualistic while Solomonoff induction is not

AIXI has set two barriers

2. Introducing ACI: Agents that need no reward

Value learning

Case-based systems

Definition of ACI

3. Types of ACI

Natural ACI

Artificial ACI

Policies and sub-policies

IRL + AIXI is a subset of aACI

4. An example: paperclips making aACI

Supervised stage

Unsupervised stage

Another supervised stage

Human-AI coevolution for AI alignment

Could ACI be wireheaded?

5. Conclusions

Acknowledgements