Conditions under which misaligned subagents can (not) arise in classifiers
Core claim: Misaligned subagents are very unlikely to arise in a classification algorithm unless that algorithm is directly or indirectly (e.g., in a subtask) modeling interactions through time at a significant level of complexity.

Definition 1: Agent - a function from inputs and internal state (or memory) to an output...
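As a rough illustration of Definition 1, the visible part of the definition can be written as a type signature. This is only a sketch of what the excerpt states (inputs and internal state mapped to an output); the definition is truncated here, so anything beyond that signature is not covered. The names `Agent`, `parity_agent`, and `stateless_classifier` are hypothetical and chosen purely for illustration.

```python
from typing import Callable, TypeVar

Input = TypeVar("Input")
State = TypeVar("State")
Output = TypeVar("Output")

# Definition 1, as far as the excerpt states it:
# a function from inputs and internal state (memory) to an output.
Agent = Callable[[Input, State], Output]


def parity_agent(observation: int, memory: int) -> str:
    """Hypothetical toy agent: its output depends on the input AND its memory."""
    return "even" if (observation + memory) % 2 == 0 else "odd"


def stateless_classifier(observation: int) -> str:
    """For contrast: a plain classifier depends on the current input only."""
    return "even" if observation % 2 == 0 else "odd"
```

The contrast is the point of the sketch: the same observation can yield different outputs from `parity_agent` depending on its memory, whereas `stateless_classifier` carries nothing through time.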
Jul 11, 2018