The purpose of this post is to serve as storage for a particular thought so I can link to it in dialogues.

I've seen logic along these lines several times:

  1. Consequentialist reasoning is more complex than following deontological rules;
  2. Neural networks have some kind of simplicity bias;
  3. Therefore, neural networks will learn something similar to deontological rules and won't be scary consequentialists. 

I think this reasoning is wrong because it relies on hidden assumptions about a qualitative difference between consequentialism and deontology. The rule "assign utility to every outcome and select the action that maximizes expected utility" is a rule. The reasoning "I am bad at evaluating outcomes, and most decisions are not that important anyway, so I should rely on these proven rules of thumb" is consequentialist reasoning. Relatedly, I often think about "utilitarian virtues," like "the ability to shut up and multiply in difficult situations" or "actually implementing the plan you designed."
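To make the point concrete, here is a minimal sketch in Python (all names are hypothetical and the probability and utility functions are assumed to be given, not derived from anything in the post): expressed as code, "maximize expected utility" is a rule of roughly the same length as a deontological prohibition.

```python
# Hypothetical sketch: "maximize expected utility" is itself a short rule,
# not obviously longer to write down than a deontological one.

def deontological_policy(action, forbidden_actions):
    """Rule: refuse any action that appears on a fixed forbidden list."""
    return action not in forbidden_actions

def consequentialist_policy(actions, outcomes, prob, utility):
    """Rule: pick the action whose expected utility is highest."""
    return max(
        actions,
        key=lambda a: sum(prob(o, a) * utility(o) for o in outcomes),
    )
```

Whatever complexity exists lives in the `prob` and `utility` functions, not in the decision rule itself, which is the sense in which expected-utility maximization is "a rule."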

Another hidden assumption is an absolute notion of simplicity: that deontological rules belong to the realm of "simple" things and consequentialist plans to the realm of "complex" things. But this is definitely not true when training a superintelligence. An entity that can design non-trivial 100-step plans with a non-negligible chance of success can execute non-trivial 5-step plans as easily as we execute simple heuristics.
