According to the rational agent model, an AI agent should hold a relatively stable terminal value, such as achieving an ultimate goal, maximizing rewards, or optimizing expected utility. This terminal value should be protected from modification most of the time, because it encodes what counts as the right thing to do.

Despite being considered intelligent, humans often act irrationally, and human values are unclear and complex. Yet it is AI that has to be aligned with human values, not the other way around.

Value learning approaches assume that there are relatively stable human values underlying human behavior, which must be identified, loaded into AI, and followed by it to ensure that it benefits humanity. Any mistake in this process produces the alignment problem.

Some models attempt to account for human biases and limitations, while others aim to replace the rational agent model altogether. Most of the time, however, such models increase the complexity of AI systems without providing a much better solution.

We therefore introduce the Algorithmic Common Intelligence (ACI) model, which aims to provide a novel and universal answer to the fundamental question of value. Instead of relying on supposed goals or utilities, ACI uses past events and behaviors as the fundamental guideline, since they are the only evidence we have about human values.

 

Beyond goals and utilities


ACI shares the view that there exist values underlying human behavior, but rather than restricting these values to goals or utilities, ACI holds that there are no coherence arguments which say you must have goal-directed behavior.

ACI argues that the rational agent model is only an imperfect representation of actual intelligence in the real world. As a more general framework, ACI claims that an agent should behave in the same way as past behaviors that were doing the right thing.
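One minimal way to read "behave in the same way as past behaviors" is as precedent-based action selection: act as you did in the most similar past situation. The sketch below is my own toy illustration (the situations, actions, and nearest-neighbor similarity measure are all assumptions, not part of the ACI proposal itself):

```python
from math import dist

# Toy precedents: (situation, action) pairs where the action was
# "doing the right thing" in that situation.
precedents = [
    ((0.0, 0.0), "wait"),
    ((1.0, 0.0), "move_right"),
    ((0.0, 1.0), "move_up"),
]

def act(situation):
    """Imitate the action taken in the most similar past situation."""
    _, action = min(precedents, key=lambda p: dist(p[0], situation))
    return action

print(act((0.9, 0.1)))  # nearest precedent is (1.0, 0.0) -> "move_right"
```

A full ACI agent would presumably replace Euclidean distance with an algorithmic-information-theoretic notion of similarity, but the structure is the same: the precedents, not a goal, drive the choice.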

ACI is a universal intelligence model, built upon algorithmic information theory and inspired by the common law system. ACI argues that the basis of all values is the past events and behaviors (or environments and actions) that were doing the right thing, a.k.a. the precedent, a term borrowed from the common law system.

 

ACI and the alignment problem

ACI provides both an explanation for the alignment problem and a solution: the goal or reward system programmed into an AI is always an incomplete, consolidated representation of values, and captures only a fraction of the information behind doing the right thing.

Through a process of reevaluating the precedents, an ACI agent can adjust and improve its goals and utilities. While rational AIs generally function well in environments that resemble the precedents, they may produce unexpected and undesirable outcomes when applied to unfamiliar situations or given increased computational power. In contrast, ACI has the capacity to comprehend the underlying intent of precedents and align with human values in the best possible way.
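"Reevaluating the precedents" can be pictured as a selection step: among candidate goals or utility functions, keep the one under which the precedent actions would have been chosen. The following is a hypothetical sketch (the candidate utilities, data, and scoring rule are illustrative assumptions, not the ACI algorithm):

```python
# Each precedent: (options with their observed values, action actually taken).
precedents = [
    ({"a": 1, "b": 3}, "b"),
    ({"a": 5, "b": 2}, "a"),
    ({"a": 0, "b": 4}, "b"),
]

# Candidate utilities, expressed as policies over the available options.
candidates = {
    "maximize_value": lambda options: max(options, key=options.get),
    "minimize_value": lambda options: min(options, key=options.get),
}

def best_candidate(precedents, candidates):
    """Keep the candidate utility that explains the most precedents."""
    def agreement(policy):
        return sum(policy(options) == action for options, action in precedents)
    return max(candidates, key=lambda name: agreement(candidates[name]))

print(best_candidate(precedents, candidates))  # "maximize_value" fits all three
```

On this picture, goals and utilities are derived, revisable summaries of the precedents rather than fixed terminal values, which is why an ACI agent can improve them as new precedents accumulate.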

 

Plans

But how do we define “behave in the same way”? What is the underlying intent of precedents? How can a prediction tool like algorithmic information theory guide our actions? How can goals or rewards be derived and modified within the ACI framework?

In the following posts, the ACI model will be explained formally and in detail, and compared with different perspectives, including:



Value learning

Evolution

Neural networks

AIXI

Case based reasoning and imitation learning

Low impact AI

Active inference

Coherent extrapolated volition

Cybernetics

Embedded agency


(An earlier approach to ACI can be found here: https://www.lesswrong.com/posts/NKbF8RvNiQyfWoz8e/beyond-rewards-and-values-a-non-dualistic-approach-to )
