AI alignment is a multidisciplinary research program. This means that potentially relevant knowledge and skill is scattered across different disciplines. But it also means that people schooled in only one narrow discipline will face a hurdle when they try to work on a problem in AI alignment. One such discipline is economics, from which decision theory and game theory originated.

In this post I want to explore the idea that we should try to create a collection of “alignment-problems-for-economists”, packaged in a way that economists who have relevant knowledge and skill but no background in ML/CS/AF can work on them.

There seem to be sub-problems in AI alignment that economists might be able to work on. However, of the economists I’ve spoken to, some are enthusiastic about this but see working on it as a personal career risk because they do not understand the computer science. So if we can take sub-problems in alignment and package them in a way that lets economists start working on them immediately, then we might be able to tap intellectual resources (economists) that would otherwise have gone to something else.

Two types of economists to target

1. Economists who also understand basic ML/CS to some degree.

2. Economists who do not.

I don’t find it very plausible that we could find sub-problems for the second type to work on, but it doesn’t seem entirely impossible: there could be specific problems in mechanism design or social choice that would be useful for alignment but don’t require any ML/CS.

Desirable properties of alignment-problems-for-economists:

1. Publishable in economics journals. I have spoken to economists who are interested in the alignment problem but are hesitant to work on it: it is a risky career move to work on alignment if they cannot publish in the journals they are used to.

2. High work/statement ratio. How long does it take to solve the problem, compared to stating it? If 90% of the work is stating the problem in a form an economist can work on, packaging it would likely not be efficient. The problem should be relatively easy to communicate clearly to an economist while taking considerably longer to solve.

3. No strong reliance on CS/ML tools. Many economists are somewhat familiar with basic ML techniques, but if a problem relies too heavily on knowledge of CS or ML, this increases the career risk of working on it.

4. Not necessarily specifically x-risk related. If a problem in alignment is not specifically x-risk related, it is less (or not at all) embarrassing to work on, and therefore less of a career risk. That said, most problems in AI alignment seem important even if you don’t believe that AI poses an x-risk, so I don’t think this requirement is that important.

5. Does not have to be high-impact. If a problem has only a small chance of being somewhat impactful, it might still be worth packaging it as an economics problem, since the economists who could work on it would not otherwise work on alignment problems at all.

I do not yet have a list of such problems, but it seems possible to make one:

For example, economists might work on problems in mechanism design and social choice for AGIs in virtual containment. Can we create mechanisms with desirable properties for the amplification phase in Christiano’s program, to align a collection of distilled agents? Can we prove that such mechanisms are robust under certain assumptions? Can we create mechanisms that robustly incentivize AGIs with unaligned utility functions to tell us the truth? Can we use social choice theory to learn about the properties of agents that consist of sub-agents?
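
To make the truth-telling question a bit more concrete, here is a minimal sketch (my own illustration, not taken from any alignment proposal) of the standard result that a strictly proper scoring rule, such as the logarithmic score, makes an honest probability report optimal for an agent that simply maximizes its expected payment. The open question for alignment is whether anything like this stays robust when the agent is strategic across rounds, cares about things other than the payment, or can influence the outcome itself.

```python
import numpy as np

def log_score(report, outcome):
    """Logarithmic scoring rule: payment for reporting the
    probability vector `report` when `outcome` occurs."""
    return np.log(report[outcome])

def expected_score(report, belief):
    """Expected payment, evaluated under the agent's true belief."""
    return sum(p * log_score(report, o) for o, p in enumerate(belief))

belief = np.array([0.7, 0.2, 0.1])       # the agent's true belief over 3 outcomes
honest = belief                          # truthful report
distorted = np.array([0.5, 0.3, 0.2])    # a misreport

print(expected_score(honest, belief))     # ~ -0.802
print(expected_score(distorted, belief))  # ~ -0.887, strictly worse: honesty pays
```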

Economists also work on strategic communication between agents (cheap talk), which might be helpful in the design of safe containment systems for non-superintelligent AGI. Information economics studies the game-theoretic properties of different allocations of information, and might be useful for such mechanisms as well. Economists also work on voting and decision theory.
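
As a toy illustration of why the social-choice angle is nontrivial (again my own sketch): even aggregating the preferences of three sub-agents by pairwise majority vote can fail to yield a coherent overall preference, the classic Condorcet cycle.

```python
from itertools import combinations

# Three sub-agents, each with a strict ranking over options A, B, C.
# This is the classic Condorcet profile.
rankings = [
    ["A", "B", "C"],
    ["B", "C", "A"],
    ["C", "A", "B"],
]

def majority_prefers(x, y):
    """True if a strict majority of sub-agents rank x above y."""
    votes = sum(r.index(x) < r.index(y) for r in rankings)
    return votes > len(rankings) / 2

for x, y in combinations("ABC", 2):
    winner, loser = (x, y) if majority_prefers(x, y) else (y, x)
    print(f"majority: {winner} beats {loser}")
# Prints: A beats B, C beats A, B beats C -- the pairwise winners cycle,
# so there is no coherent "group preference" to read off the sub-agents.
```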

I want your feedback:

1. What kind of problems have you encountered that might be added to this list?

2. Do you have reasons to think that this project is doomed to fail (or not)? If so, I want to find out as soon as possible so that I don’t waste time on it. Despite having written this post, I don’t assign a high probability of success, but I’d like to hear people’s views.

Comments

The "desirable properties" all seem to conspire to make any particular problem fairly low impact, so I personally wouldn't be excited about doing this project myself, but obviously there are lots of good projects I wouldn't do myself. I don't think this means it's "doomed to fail" but it seemed worth bringing up.

One thing I can think of in this direction is this paper by Gans: https://arxiv.org/abs/1711.04309

This also seems like a topic that might be of interest to Robin Hanson.