AMA: Paul Christiano, alignment researcher

Thanks for these thoughts about the causal agenda. I basically agree with you on the facts, though I have a more favourable interpretation of how they bear on the potential of the causal incentives agenda. I've paraphrased the three bullet points, and responded in reverse order:

3) Many important incentives are not captured by the approach - e.g. sometimes an agent has an incentive to influence a variable, even if that variable does not cause reward attainment. 

-> Agreed. We're starting to study "side-effect incentives" (improved name pending), which have this property. We're still figuring out whether we should just care about the union of SE incentives and control incentives, or whether, or when, SE incentives should be considered less dangerous. Whether the causal style of incentive analysis captures much of what we care about will, I think, be borne out by applying it and alternatives to a bunch of safety problems.

2) Sometimes we need more specific quantities than just "D affects A".

-> Agreed. We've privately discussed directional quantities like "do(D=d) causes A=a" as being more safety-relevant, and are happy to hear other ideas.

1) eliminating all control-incentives seems unrealistic

-> Strongly agree it's infeasible to remove CIs on all variables. My more modest goal would be, for particular variables (or classes of variables) such as a shutdown button or a human's values, to either: 1) prove how to remove control (+ side-effect) incentives, or 2) prove why this is impossible, given realistic assumptions. If (2), then that theoretical case could justify allocation of resources to learning-oriented approaches.

Overall, I concede that we haven't engaged much on safety issues in the last year. Partly, that's because the projects have had to fit within people's PhDs, which will also be true this year. But with some of the framework stuff behind us, we should still be able to study safety more, and gain a sense of how addressable concerns like these are, and to what extent causal decision problems/games are a really useful ontology for AI safety.

MIRI location optimization (and related topics) discussion

I think moving to the country could possibly be justified despite harms to recruitment and the rationality community, but in the official MIRI explanations, the downsides are quite underdiscussed.

What is going on in the world?

Interesting that about half of these "narratives" or "worldviews" are suffixed with "-ism": Malthusianism, Marxism, Georgism, effective altruism, transhumanism. But most of the (newer and less popular) rationalist narratives haven't yet been named in this way. This would be one heuristic for finding other worldviews.

More generally, if you want people to know and contrast a lot of these worldviews, it'd be useful to name them all in 1-2 words each.

RyanCarey's Shortform

Causal prediction markets.

Prediction markets (and prediction tournaments more generally) may be useful for telling us not only what will happen, but which actions will achieve our goals. One proposal for getting prediction markets to help with this is to get users to make conditional predictions. For example, we can ask whether "if Biden wins the election, GDP will be higher than if Trump wins" and use that as evidence about who to elect, and so on.

But conditional predictions only predict the effect of an action if the event (e.g. who is elected) is unconfounded with the outcome (GDP). It may be that higher GDP and Biden being elected have a common cause, even if electing Biden does not increase GDP directly. One way to address this would be to have the market only pay out if Biden barely wins, or Trump barely wins, so that the confounders can be assumed to be in a similar state.

Another strategy for identifying the causal effect would be to randomise. We can't randomise the election result, but we can randomise other quantities. For instance, "we generate a number from 1-100, and audit company X if we generate 1. If we generate the number 1, how much tax evasion will we find?". In general, in order to design action-guiding prediction markets, it may be important to draw on identification strategies from the causal inference literature.
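To make the confounding point concrete, here is a minimal simulation sketch. Everything in it is an illustrative assumption rather than a model of any real election or market: a hidden common cause (`confounder`) drives both a hypothetical `vote_margin` and the outcome, and the action's true causal effect (`TRUE_EFFECT`) is set to zero. It compares three ways of estimating the gap in outcomes between the two actions: conditioning naively on who won, restricting to narrow wins (roughly in the spirit of a close-election regression discontinuity), and assigning the action at random.

```python
import random

TRUE_EFFECT = 0.0  # assumption: the action has no causal effect on the outcome


def estimated_gap(mode, n=100_000, seed=0, margin=0.1):
    """Mean outcome under action 1 minus mean outcome under action 0.

    mode="naive":      action is decided by the vote margin; use all samples.
    mode="close":      same, but keep only samples where |vote_margin| <= margin.
    mode="randomised": action is assigned by a fair coin, ignoring the margin.
    """
    rng = random.Random(seed)
    totals = {0: 0.0, 1: 0.0}
    counts = {0: 0, 1: 0}
    for _ in range(n):
        confounder = rng.gauss(0, 1)               # hidden common cause
        vote_margin = confounder + rng.gauss(0, 1)  # confounder sways the vote
        if mode == "randomised":
            action = rng.randrange(2)
        else:
            action = 1 if vote_margin > 0 else 0
        # The outcome depends on the confounder, and on the action only
        # through TRUE_EFFECT (which is zero here).
        outcome = 2.0 * confounder + TRUE_EFFECT * action + rng.gauss(0, 1)
        if mode == "close" and abs(vote_margin) > margin:
            continue  # this market would not pay out; drop the sample
        totals[action] += outcome
        counts[action] += 1
    return totals[1] / counts[1] - totals[0] / counts[0]


naive_gap = estimated_gap("naive")
close_gap = estimated_gap("close")
randomised_gap = estimated_gap("randomised")
print(naive_gap, close_gap, randomised_gap)
```

Under these assumptions, the naive conditional gap comes out well above zero even though the action does nothing, while the narrow-win and randomised estimates land much closer to the true effect of zero. The narrow-win version discards most samples, so it trades bias for noise.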

I haven't yet checked for existing literature on this topic. Does anyone know of any?

A vastly faster vaccine rollout

A response from @politicalmath, based on Smallpox: The Death of a Disease by DA Henderson:

1) There were no phases, it was just "show up, get poked"
2) There were plenty of vaccines to go around. Countries typically had millions of smallpox vaccine doses ready to go just in case
3) with no lockdowns, they could go to schools / churches / offices & line people up
4) the smallpox vaccine was incredibly heat-stable. There were batches that were still efficacious after being stored for a year at 113F
5) the public health infrastructure had a lot of practice with mass vaccinations (they did them all the time in other countries)
And it did cause a city-wide panic. Smallpox has like a 20% fatality rate so people were pretty motivated to get the vaccine. All this is from this book

A vastly faster vaccine rollout

New York has apparently distributed 35% of the vaccine that it has. Maybe they are focusing on other bottlenecks? Though my naive guess would be that the main problem is that the staff at US agencies are more numerous, less competent, and more regulated, as part of the aging process of any bureaucracy, compounded by the declining prestige of governmental jobs.

The Case for a Journal of AI Alignment

One alternative would be to try to raise funds (e.g. perhaps from the EA LTF fund) to pay reviewers to perform reviews.

The Case for a Journal of AI Alignment

I don't (and perhaps shouldn't) have a guaranteed trigger - probably I will learn a lot more about what the trigger should be over the next couple years. But my current picture would be that the following are mostly true:

  • The AIS field is publishing 3-10x as many papers per year as the causal inference field is now.
  • We have ~3 highly aligned tenured professors at top-10 schools, and ~3 mostly-aligned tenured professors with ~10k citations, who want to be editors of the journal
  • The number of great papers that can't get into other top AI journals is >20 per year. I figure it's currently like ~2.
  • The chance that some other group creates a similar (worse) journal for safety in the subsequent 3 years is >20%

The Case for a Journal of AI Alignment

This idea has been discussed before, though it's an important one, so I don't think it's a bad thing for us to bring it up again. My perspective now and previously is that this would be fairly bad at the moment, but might be good in a couple of years' time.

My background understanding is that the purpose of a conference or journal in this case (and in general) is primarily to certify the quality of some work (and to a lesser extent, the field of inquiry). This in turn helps with growing the AIS field, and the careers of AIS researchers.

This is only effective if the conference or journal is sufficiently prestigious. Presently, publishing AI safety papers in NeurIPS, AAAI, JMLR, JAIR serves to certify the validity of the work, and boosts the field of AI safety, whereas publishing in (for example) Futures or AGI doesn't. If you create a new publication venue, by default, its prestige would be comparable to, or less than, that of Futures or AGI, and so it wouldn't really help to serve the role of a journal.

Currently, the flow of AIS papers into the likes of NeurIPS and AAAI (and probably soon JMLR, JAIR) is rapidly improving. New keywords have been created at several conferences, along the lines of "AI safety and trustworthiness" (I forget the exact wording), so that you can nowadays expect to receive reviewers who average out to neutral, or even vaguely sympathetic to AIS research. Ten or so papers were published in such venues in the last year, and all these authors will become reviewers under that keyword when the conference comes around next year. Yes, things like "Logical Inductors" or "AI safety via debate" are very hard to publish. There's some pressure to write research that's more "normie". All of that sucks, but it's an acceptable cost for being in a high-prestige field. And overall, things are getting easier, fairly quickly.

If you create a journal with too little prestige, you can generate blowback. For example, there was some criticism on Twitter about Pearl's "Journal of Causal Inference", even though his field is somewhat more advanced than ours.

In 1.5-3 years' time, I think the risk-benefit calculus will probably change. The growth of AIS work (which has been fast) may outpace the virtuous cycle that's currently happening with AI conferences and journals, such that a lot of great papers are getting rejected. There could be enough tenure-track professors at top schools to make the journal decently high-status (more so than Futures and AGI). We might even be nearing the point where some unilateral actor will go and make a worse journal if we don't make one. I'd say when a couple of those things are true, that's when we should pull the trigger and make this kind of conference/journal.
