Goal-directedness: exploring explanations

I'll probably update this comment after I finish some edits to previous posts.

I like this line of thinking. I'd say the big thing we've swept under the rug here is the question of what counts as an agent-shaped model. After all, we don't care about just any simple yet powerful explanation (otherwise the Standard Model would be much more relevant to ethics than it is); we care about the parameters of the agent-shaped models specifically.

Morgan_Rogers (4y)

I see what you're getting at. For an arbitrary explanation, we need to take into account not only the complexity of the explanation itself, but also how difficult it is to compute a relevant prediction from that explanation. According to my criteria, the Standard Model (or any sufficiently detailed theory of physics that accurately explains phenomena within a conservative range of the low-ish energy environments encountered on Earth) would count as a very good explanation of any behaviour relative to its complexity, but that ignores the fact that it would be impossible to actually compute those predictions.

While I made the claim that there is a clear dividing line between (accuracy and power) and (complexity), this strikes me as an issue straddling complexity and explanatory power, which muddies the water a little.

Since I've appealed to physics explanations in my post, I'm glad you've made me think about these points. Moving forward, though, I expect the classes of explanation under consideration to be so constrained as to make this issue insignificant. That is, I expect to be directly comparing explanations taking the form of goals to explanations taking the form of algorithms or similar. Each of these has a clear interpretation in terms of its predictions. While the former might be harder to compute, I expect the difference in difficulty to be suitably uniform across the classes (after accounting for the complexity of the explanations), so I feel justified in ignoring it until later.

Morgan_Rogers (4y)

A note on judging explanations

I should address a point that the post didn't cover, and which may otherwise cause confusion going forward: the quality of an explanation can be high according to my criteria even if it isn't empirically correct. Some explanations of behaviour are falsifiable. If I am observing a robot, I could explain its behaviour in terms of an algorithm, and one way to "test" that explanation would be to discover the algorithm which the robot is in fact running. However, whatever the result of this test, the judged quality of the explanation is not affected. There are two possible outcomes. Either the actual algorithm provides a better explanation overall, or our explanatory algorithm is a simpler algorithm with the same effects. In the second case it is a better explanation than the true one, since using this simpler algorithm is a more efficient way to predict the robot's behaviour than simulating the robot's actual algorithm.

This might seem counterintuitive at first, but it's really just Occam's razor in action. Functionally speaking, the explanations I'm talking about in this post aren't intended to be recovering the specific algorithm the robot is running (just as we don't need the specifics of its hardware or operating system); I am only concerned with accounting for the robot's behaviour.
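As a toy illustration of the second outcome, here is a minimal Python sketch. Everything in it is hypothetical: `robot_actual` stands in for the algorithm the robot really runs, `explanation` for a simpler algorithm with the same effects, and the step counter is only a crude stand-in for the cost of computing a prediction, not one of the complexity measures from the post.

```python
def robot_actual(n):
    """Hypothetical 'true' algorithm: add up 1..n one term at a time.

    Returns the output together with a count of loop iterations,
    used here as a crude proxy for the cost of simulating the robot.
    """
    total, steps = 0, 0
    for k in range(1, n + 1):
        total += k
        steps += 1
    return total, steps


def explanation(n):
    """Hypothetical explanatory algorithm: closed form, counted as one step."""
    return n * (n + 1) // 2, 1


def same_behaviour(inputs):
    """Check that both algorithms produce the same output on every given input."""
    return all(robot_actual(n)[0] == explanation(n)[0] for n in inputs)


if __name__ == "__main__":
    print(same_behaviour(range(50)))  # behaviours agree on the sampled inputs
    print(robot_actual(100))          # output and step count of the 'true' algorithm
    print(explanation(100))           # same output, fewer counted steps
```

Discovering that the robot really runs `robot_actual` would not make `explanation` any worse at accounting for its behaviour; on the sampled inputs it predicts the same outputs at lower cost.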

That assumption may not be valid, of course; in existing AI we have explicit access to the source code, although not necessarily in a form that is useful from an explanatory perspective. I don't explore that possibility in this post, but...

I think that some relation between competence and goal-directedness is inevitable, since an agent that has a goal but no idea how to achieve it might act essentially randomly, so that whether or not it has a goal is hard to detect.

A relation $R : X \looparrowright Y$ is called total if for each $x \in X$ there exists some $y \in Y$ with $x R y$. This guarantees that each explanation is "valid" in the sense of describing some possible behaviour, although it may describe several behaviours.
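For a finite case, the totality condition can be checked directly. The sketch below is only illustrative: it represents a relation as a set of `(x, y)` pairs, and the explanation and behaviour names (`goal_a`, `left`, and so on) are made up for the example.

```python
def is_total(relation, domain):
    """Return True if every x in `domain` is related to at least one y.

    `relation` is a set of (x, y) pairs standing in for R : X -> Y.
    """
    sources = {x for (x, _) in relation}
    return all(x in sources for x in domain)


def behaviours_of(relation, x):
    """Return the set of all y with x R y (possibly more than one)."""
    return {y for (x2, y) in relation if x2 == x}


if __name__ == "__main__":
    explanations = {"goal_a", "goal_b"}  # hypothetical set X
    R = {("goal_a", "left"), ("goal_a", "right"), ("goal_b", "left")}

    print(is_total(R, explanations))     # every explanation describes some behaviour
    print(behaviours_of(R, "goal_a"))    # one explanation may describe several behaviours

    R_partial = {("goal_a", "left")}     # goal_b now describes no behaviour
    print(is_total(R_partial, explanations))
```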

I point out challenges and choices throughout this post. My intention in doing so is twofold. First, I want to emphasise that anyone employing any of these formulations will need to be explicit about their choices. Second, I want to be deliberate in pointing out where I am postponing choices for later.

For some suitably strong notion of surjectivity. Smooth and almost-everywhere a submersion is definitely enough, but this only makes sense if $B$ is nice enough, in the sense of admitting a smooth structure. Assuming a topological structure on $B$, we could employ the concept of topological submersion as a corresponding sufficient condition.

There is a sense in which deficiency in scope is complementary or dual to the dimensional deficiency of the previous section: the scope is measured in terms of the number of dependent variables, whereas the dimension is measured in terms of the number of independent variables.

A subtle but important point: the algorithm outputs the explanation (the goal, say), not the behaviour predicted by the explanation!

Maybe $M$ and $N$ represent the operations "multiply by 8" and "multiply by 3" respectively, while the second language only allows the operation $m$ of "multiplication by 2".
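To make this example concrete, here is a small Python sketch. It assumes that programs in the second language are just finite compositions of $m$, which is one reading of "only allows the operation $m$". The helper name `doublings_needed` is hypothetical.

```python
def doublings_needed(k):
    """Return how many applications of m ("multiply by 2") compose to
    "multiply by k", or None if no finite composition of m does so.

    Assumes the second language contains only compositions of m.
    """
    if k < 1:
        return None
    count = 0
    while k % 2 == 0:
        k //= 2
        count += 1
    # Only powers of two can be reached by repeated doubling.
    return count if k == 1 else None


if __name__ == "__main__":
    print(doublings_needed(8))  # M: a single symbol in the first language
    print(doublings_needed(3))  # N: check whether any composition of m works
```

Under that assumption, $M$ has length 1 in the first language but needs three copies of $m$ in the second, while $N$ has no expression in the second language at all.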

Goal-directedness: exploring explanations


What constitutes a good explanation?

Accurate explanations

Direct mapping and extrinsic measurements of accuracy

Rewards and evaluative measurements of accuracy

Relational versions

Measure-theoretic versions

Powerful Explanations

Lower dimensional explanations

Explanations of Restricted Scope

Simple Explanations

Algorithmic complexity

Computational complexities

Conclusions

The naïve picture and its flaws

What's missing?