To me, impact measurement research crystallizes how agents affect (or impact) each other; the special case of this is about how an AI will affect us (and what it even means for us to be "affected").

A distinction between "difference in world models" and "differences in what we are able to do" is subtle, and enlightening (at least to me). It allows a new terminology in which I can talk about the impact of artificial intelligence.

I find this important as well. With this understanding, we can easily consider how a system of agents affects the world and each other throughout their deployment.

The concept of impact appears to neighbor other relevant alignment concepts, like mild optimization, corrigibility, safe shutdowns, and task AGIs. I suspect that even if impact measures are never actually used in practice, there is still some potential that drawing clear boundaries between these concepts will help clarify approaches for designing powerful artificial intelligence.

This is essentially my model for why some AI alignment researchers believe that deconfusion is helpful. Developing a rich vocabulary for describing concepts is a key feature of how science advances. Particularly clean and insightful definitions help clarify ambiguity, allowing researchers to say things like "That technique sounds like it is a combination of X and Y without having the side effect of Z."

A good counterargument is that there isn't any particular reason to believe that this concept requires priority for deconfusion. It would be bordering on a motte and bailey to claim that some particular research will lead to deconfusion and then when pressed I appeal to research in general. I am not trying to do that here. Instead, I think that impact measurements are potentially good because they focus attention on a subproblem of AI, in particular catastrophe avoidance. And I also think there has empirically been demonstrable progress in a way that provides evidence that this approach is a good idea.

IMO: Deconfusion isn't a motte and bailey according to the private information I have; to me, the substantial deconfusion is a simple fact. Also from my point of view, many people seem wildly underexcited about this direction in general (hence the upcoming sequence).

There's a natural kind here, and there's lovely math for it. The natural kind lets us formalize power, and prove when and why power differentials exist. The natural kind lets us formalize instrumental convergence, and prove when and why it happens. (Or, it will, and I'm working out the details now.) The natural kind lets us understand why instrumental convergence ends up being bad news for us.

Now, when I consider the effects of running an AI, many more facets of my thoughts feel clear and sharp and well-defined. "Low-impact AGI can't do really ambitious stuff" seems like a true critique (for now! and with a few other qualifications), but it seems irrelevant to the main reasons I'm excited about impact measurement these days. IMO: there's so much low-hanging fruit, so many gold nuggets floating down the stream, so much gemstone that there's more gem than stone - we should exhaustively investigate this, as this fruit, these nuggets, these gems may^[1] later be connected to other important facts in AI alignment theory.

There is a power in the truth, in all the pieces of the truth which interact with each other, which you can only find by discovering as many truths as possible.

In fact, the deconfusion already connects to important facts: instrumental convergence is important to understand. ↩︎

Reply

Moderation Log

LESSWRONG
LW

LESSWRONG
LW

21

Four Ways An Impact Measure Could Help Alignment

21

Ω 10

21

Ω 10

Impact as a regularizer

Impact as a safety protocol

Impact as an influence-limiter

Impact as deconfusion