Beyond Rewards and Values: A Non-dualistic Approach to Universal Intelligence
(This is my first post on LessWrong, with some thoughts on topics I have been reading about this year. Comments are welcome. Thanks.)

Abstract

Building an AI system that aligns with human values is usually believed to be a two-step process: first design a value function, or learn human values using value learning methods, then maximize those values using a rational agent such as AIXI. To integrate these into one step, we analyze the dualistic assumptions of AIXI and define a new universal intelligence model, called Algorithmic Common Intelligence (ACI), which can align with human preferences or specific environments and can behave in the same way as its examples. ACI does not have to employ rewards or value functions. Instead, it directly learns and updates hypothetical policies from experience using Solomonoff induction, and chooses actions according to the probability of each hypothesis. We argue that the rational agency model is a subset of ACI, and that the coevolution of ACI and humans provides a pathway to AI alignment.

1. AIXI as a dualistic model

Dualistic agent and embedded agent

In most agent-based intelligence models, agents are cleanly separated from their environments, but agents in the real world are part of their environments. Demski and Garrabrant termed the former dualistic agents and the latter embedded agents.

A dualistic agent acts as if it were playing a video game, interacting with the game only through well-defined input and output channels, such as the screen and the controller. The agent has no opportunity for self-modification. It is immortal, and its reward circuit is not vulnerable to being hacked.

In contrast, an embedded agent is part of the universe. There is no clear boundary between the agent and the environment. An embedded agent may improve itself, but it might also modify its original goals in undesirable ways, like “directly tamping its reward circuit to get rewards in a conventional way”, which was calle
Thanks for your interest.