@beren discusses the assumption that intelligent systems would be well factored into a world model, objectives/values and a planning system.

He highlights that this factorisation doesn't describe the intelligent agents produced by ML (e.g. model-free RL) well. Model-free RL agents don't have cleanly factored architectures; instead, they tend to learn value functions/policies directly from the reward signal.

Such systems are much less general than their fully model-based counterparts: the policies they learn may be optimal under one reward function yet perform very poorly under another.
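To make the contrast concrete, here's a minimal sketch (my own illustration; the tiny chain MDP and all names are assumptions, not from beren's post). The factored agent re-plans against its explicit world model when the reward function is swapped, whereas a policy amortised under the old reward keeps doing the old thing:

```python
# Hypothetical sketch (the tiny chain MDP and all names are mine, not from the post).
import numpy as np

n_states, n_actions, gamma = 5, 2, 0.9

# Explicit world model: P[s, a, s'] = probability of landing in state s'.
P = np.zeros((n_states, n_actions, n_states))
for s in range(n_states):
    P[s, 0, max(s - 1, 0)] = 1.0               # action 0: step left
    P[s, 1, min(s + 1, n_states - 1)] = 1.0    # action 1: step right

def plan(reward, iters=100):
    """Well-factored agent: value iteration against the world model and a given reward."""
    V = np.zeros(n_states)
    for _ in range(iters):
        Q = reward[:, None] + gamma * (P @ V)  # Q[s, a]
        V = Q.max(axis=1)
    return Q.argmax(axis=1)                    # greedy policy, one action per state

reward_a = np.array([0., 0., 0., 0., 1.])      # reward at the right end of the chain
reward_b = np.array([1., 0., 0., 0., 0.])      # reward moved to the left end

cached_policy = plan(reward_a)   # stands in for a model-free agent's policy,
                                 # amortised (trained) under reward_a only

print(plan(reward_b))            # factored agent re-plans under reward_b: all 0s (go left)
print(cached_policy)             # cached policy still says all 1s (go right): poor under reward_b
```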

 

Yet contemporary ML favours such systems over their well-factored counterparts because they are much more efficient:

  • Inference costs can be paid up front by learning a function approximator of the optimal policy and amortised over the agent's lifetime
    • A single inference step is just a forward pass through the function approximator in a non-factored system, vs searching through a solution space to determine the optimal plan/strategy in a well-factored system (see the sketch after this list)
  • The agent doesn't need to learn features of the environment that aren't relevant to its reward function
  • The agent can exploit the structure of the underlying problem domain
    • Specific recurring patterns can be better amortised
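
As a rough illustration of the amortisation point above (hypothetical code; the toy environment, the linear "policy network", and all function names are mine, not beren's): the well-factored agent pays a search cost at every decision, while the model-free agent pays a training cost once and thereafter acts with a single forward pass.

```python
# Hypothetical sketch (toy environment and all names are mine, not from the post).
import itertools
import numpy as np

rng = np.random.default_rng(0)
state_dim, n_actions, horizon = 8, 4, 6

def rollout_score(state, action_seq):
    """Stand-in world model: score a candidate plan by rolling it forward."""
    score = 0.0
    for a in action_seq:
        state = np.tanh(state + 0.1 * a)
        score += state.sum()
    return score

def act_by_search(state):
    # Well-factored agent: exhaustive search over 4**6 = 4096 candidate plans,
    # a cost paid afresh at every single decision.
    best_plan = max(itertools.product(range(n_actions), repeat=horizon),
                    key=lambda seq: rollout_score(state, seq))
    return best_plan[0]

# Amortised policy: a linear map standing in for a policy trained by model-free RL.
W = rng.normal(size=(n_actions, state_dim))

def act_by_policy(state):
    # Model-free agent: one matrix-vector product per decision.
    return int(np.argmax(W @ state))

state = rng.normal(size=state_dim)
print(act_by_search(state), act_by_policy(state))
```

The amortised policy is only this cheap because it is specific: it bakes in the reward and environment structure it was trained under, which is exactly the generality cost discussed below.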

 

Beren attributes this tradeoff between specificity and generality to no-free-lunch theorems.

Attaining full generality is prohibitively expensive; as such, full orthogonality is not the default or ideal case, but merely one end of a Pareto tradeoff curve, with different architectures occupying various positions along it.

The future of AGI systems will be shaped by the slope of the Pareto frontier across the range of general capabilities, determining whether we see fully general AGI singletons, multiple general systems, or a large number of highly specialised systems.


I don’t think this post really has much to do with the “orthogonality thesis”, as I understand the term. The orthogonality thesis says:

  • (A) It’s possible for there to be an arbitrarily capable agent whose overriding priority is maximizing the number of paperclips in the distant future
  • (B) It’s possible for there to be an arbitrarily capable agent whose overriding priority is solving the Riemann Hypothesis
  • (C) It’s possible … etc.

I don’t think the orthogonality thesis requires that all these agents are identical except for different weights within a small data structure labeled “goals” in the source code, or whatever. The orthogonality thesis doesn’t require these agents to have any relation whatsoever. It’s just saying they can all exist.

Other than the (mis)use of the term “orthogonality thesis”, what do I think of the post?

From my perspective, I’d say that, holding compute fixed and assuming an approach that scales to radically superhuman AI, agent (A) will almost definitely wind up with better knowledge of metallurgy than agent (B), and agent (B) will almost definitely wind up with better knowledge of prime numbers than agent (A), even though “knowledge” is part of the world-model, not the value function or policy. This seems pretty obvious to me. I think that means I agree with the main substance of the post.

I don’t think this post really has much to do with the “orthogonality thesis”, as I understand the term.

I didn't read the post as having much to do with the orthogonality thesis either, and hence I made no mention of the orthogonality thesis in my summary.

I nonetheless think the idea of a spectrum, from systems that are well factored into objectives/goals/values, world models, and reasoners/planners, to systems where these components are all intertwined, is useful and valuable.

And the post correctly identifies the relevant considerations, tradeoffs, etc. I found it well worth reading.

FWIW, here's a summary courtesy of GPT-4:

The orthogonality thesis, which assumes that goals and core intelligence of an AGI system can be cleanly factored apart, has been challenged by recent developments in machine learning (ML). Model-free reinforcement learning (RL) systems, unlike their well-factored counterparts, learn policies or value functions directly from the reward signal, leading to less general agents. These non-factored agents are favored in contemporary ML because they are more efficient, amortizing the cost of planning and exploiting problem domain structure.

However, the tradeoff between specificity and generality, a consequence of the no-free-lunch theorem, means that achieving full generality is prohibitively expensive. Instead of viewing full orthogonality as the default or ideal case, it should be considered one end of a Pareto tradeoff curve, with different architectures occupying various positions along it.

The future of AGI systems will be shaped by the slope of the Pareto frontier across the range of general capabilities, determining whether we see fully general AGI singletons, multiple general systems, or a large number of highly specialized systems.

 

Let me know which you prefer.

[I adapted some parts of GPT-4's summary that seemed insufficiently covered in my original account.]
