Definition A:

a system is ascription universal if, relative to our current epistemic state, its explicit beliefs contain just as much information as any other way of ascribing beliefs to it.

Definition B:

a family of question-answering systems A[·] are ascription universal (w.r.t. 𝔼 and L) if A[C] epistemically dominates C for every computation C.

This definition is more technical, but Christiano describes the underlying intuition as follows:

"ascription universality,” which tries to capture the property that a question-answering system A is better-informed than any particular simpler computation C.

I'm totally failing to see the connection between these two definitions.

A few things I find especially confusing:

1. Definition A seems to measure the transparency/honesty of a single system, whereas definition B seems to measure the knowledge of one system relative to another. Why are these concepts related?

2. Wouldn't an automatic photosensitive streetlight be "ascription universal" by definition A? The explicit way of assigning beliefs to the streetlight "if the light is on, the system believes it's dark outside; otherwise the system believes it's light outside". From our epistemic standpoint, there's no way to ascribe other beliefs to it that contain more information than the explicit beliefs, so it appears to satisfy the criterion. However, I don't think a streetlight has any of the nice properties you would want an "ascription universal" system to have (in particular, it wouldn't be an effective overseer in IDA...)

Edit: After thinking about this some more, my best guess is that definition A is meant to describe the epistemically dominated computation from definition B. I think a computation satisfying definition A would be epistemically dominated by our current epistemic state (though I'm not sure if that's a coherent thing to say given that epistemic dominance is also defined relative to our current epistemic state). Similarly, if a particular computation C satisfies definition A, then an overseer that can understand A's explicit beliefs can (I think??) epistemically dominate C.

If this line of thinking is correct, I'm marginally less confused...


New Answer
Ask Related Question
New Comment

New to LessWrong?