This is a linkpost for

Scott Aaronson is a computer scientist at the University of Texas at Austin, whose research mainly focuses on quantum computing and complexity theory. He's at least very adjacent to the Rationalist/LessWrong community. After some comments on his blog and then conversations with Jan Leike, he's decided to work for one year on AI safety at OpenAI.

To me this is a reasonable update that people who are sympathetic to AI safety can be convinced to actually do direct work. 

Aaronson might be one of the easier people to induce to do AI safety work, but I imagine there are also other people who are worth talking to about doing direct work on AI safety. 


Yes! I actually just discussed this with one of my advisors (an expert on machine learning), and he told me that if he could get funding to do it, he would definitely be interested in dedicating a good chunk of his time to researching AGI safety. (For any funders who might read this and might be interested in providing that funding, please reach out to me by email. I'm going to try to reach out to some potential funders next week.)

I think that there are a lot of researchers who are sympathetic to AI risk concerns, but they either lack the funding to work on it or they don't know how they might apply their area of expertise to do so. The former can definitely be fixed if there's an interest from funding organizations. The latter can be fixed in many cases by reaching out and talking to the researcher.

There's been discussion about there being a surplus of funding in EA and not enough people who want to get funded to do important work. If that is true, shouldn't it be relatively easy for your presumably competent advisor to get such funding to work on AI safety?

I think it might be only very recently that mainstream academics have started expressing this sentiment, probably because the harbingers are now obvious, and yes, this is probably not that hard to get funded by certain EA funders. Or how much money does he actually want?

Hopefully. I have a feeling it won't be so easy, but we'll see.

If it ends up not being easy, it seems to me like that means that we are in fact funding constrained. Is that true or am I missing something?

(The advisor in question is just one person. If it were only them who wanted to work in AI safety but couldn't due to a lack of funds, that wouldn't be a big deal. But I am assuming that there are lots of similar people in a similar boat, in which case the lack of funding would be an important problem.)

(I know this topic has been discussed previously. I bring it up again here because the situation with this advisor seems like a really good concrete example.)

My impression - which I kind of hope is wrong - has been that it is much easier to get an EA grant the more you are an "EA insider" or have EA insider connections. The only EA connection that my professor has is me. On the other hand, I understand the reluctance to some degree in the case of AI safety because funders are concerned that researchers will take the money and go do capabilities research instead.

Non-rhetorically, what's the difference between AI risk questions and ordinary scientific questions, in this respect? "There aren't clear / precise / interesting / tractable problems" is a thing we hear, but why do we hear that about AI risk as opposed to other fields with sort of undefined problems? Hasn't a lot of scientific work started out asking imprecise, intuitive questions, or no? Clearly there's some difference.

In fact, starting a scientific field, as opposed to continuing one, is poorly funded in general; it's not just AI risk. Another way to say this is that AI risk, as a scientific field, is pre-paradigmatic.


Given he is going to be doing this at literal OpenAI, how confident are we that this is on net a good idea? I'm especially interested in Christiano's opinion here, since he was Aaronson's student and also worked at OpenAI before leaving.

Here's a 1-year-old answer from Christiano to the question "Do you still think that people interested in alignment research should apply to work at OpenAI?". Generally pretty positive about people going there to "apply best practices to align state of the art models". That's not exactly what Aaronson will be doing, but it seems like alignment theory should have even less probability of differentially accelerating capabilities.


He says he will be doing alignment work; the worst thing I can think of that can realistically happen is that he gives OpenAI unwarranted confidence in how aligned their AIs are. Working at OpenAI isn’t intrinsically bad; publishing capabilities research is.


The field of alignment has historically been pretty divorced (IMO) from how the technology of machine learning works, so it would benefit the field to be closer to the ground reality. Also, any possible solution to alignment is going to need to be integrated with capabilities when it comes time. (Again, IMO.)

However good an idea it is, it's not as good an idea as Aaronson just taking a year off and doing it on his own time, collaborating and sharing whatever he deems appropriate with the greater community. Might be financially inconvenient but is definitely something he could swing.

There's a thought that's been circulating in my mind for a while that social proof is important here. I presume that seeing a reputable person like Scott Aaronson going to work on AI safety would do a lot to convince others (researchers, funders, policymakers) that it is an important and legitimate problem.

Honestly I suspect this is going to be the single largest benefit from paying Scott to work on the problem. Similarly, when I suggested in an earlier comment that we should pay other academics in a similar manner, in my mind the largest benefit of doing so is because that will help normalize this kind of research in the wider academic community. The more respected researchers there are working on the problem, the more other researchers start thinking about it as well, resulting (hopefully) in a snowball effect. Also, researchers often bring along their grad students!

Right, I was going to bring up the snowball effect as well but I forgot. I think that's a huge point.

I wonder what it would take to bring Terence Tao on board...

At any rate, this is good news: the more high-status people in academia take alignment seriously, the easier it becomes to convince the next one, in what I hope is a virtuous cycle!

I always assumed that "Why don't we give Terence Tao a million dollars to work on AGI alignment?" was using Tao to refer to a class of people. Your comment implies that it would be especially valuable for Tao specifically to work on it. 

Why should we believe that Tao would be especially likely to be able to make progress on AGI alignment (e.g. compared to other recent Fields Medal winners like Peter Scholze)?

I've also been perplexed by the focus on Tao in particular. In fact, I've long thought that if it's a good idea to recruit a top mathematician to alignment, then Peter Scholze would be a better choice since

  1. he's probably the greatest active mathematician
  2. he's built his career out of paradigmatizing pre-paradigmatic areas of math
  3. he has an interest in computer proof-checking.

That said, I'm quite confident that Scholze is too busy revolutionizing everything he touches in mathematics to be interested in switching to alignment, so this is all moot.

(Also, I recognize that playing the "which one mathematician would be the single best to recruit to alignment?" game is not actually particularly useful, but it's been a pet peeve of mine for a while that Tao is the poster child of the push to recruit a mathematician, hence this comment.)


Thanks, I’ve added him to my list of people to contact. If someone else wants to do it instead, reply to this comment so that we don’t interfere with each other.

I've always used "Tao" to mean "brilliant mathematicians", but I also think he has surprisingly eclectic research interests and in particular has done significant work in image processing, which shows a willingness to work on applied mathematics and may be relevant for AI work.

I must say, however, that I've changed my mind on this issue: I now think AI alignment research would be better served by hiring a shit ton of PhD students with a promise of giving 80% of them 3-5 year short-term research positions after their PhD and giving 80% of those tenure afterward. I think we made a mistake by assuming that pre-paradigmatic research means only geniuses are useful; a pure numbers strategy would help a lot. (Also, genius mathematicians willing to work on interesting problems are not that interested in money, otherwise they would work in finance, but they are very much interested in getting a fixed, stable position before 35. The fact that France is still relevant in mathematical research while underpaying its researchers by a lot is proof of that.)

IIRC you were looking into these ideas more seriously, any progress?


We should send somebody or somebodies to the Heidelberg Laureate Forum. High EV.

I always assumed that "Why don't we give Terence Tao a million dollars to work on AGI alignment?" was using Tao to refer to a class of people. Your comment implies that it would be especially valuable for Tao specifically to work on it. 

When I've talked about this, I've always meant both literally hire Tao and try to find young people of the same ability.  

While I too was using Tao as a reference class, that's not the only reason for mentioning him. I expect that people with IQs that ridiculously high are simply better suited to tackling novel topics, and I do mean novel: building a field from scratch, ideally with mathematical precision.

All the more if they have a proven track record, especially in mathematics, and I suspect that if Tao could be convinced to work on the problem, he would have genuinely significant insight. That and a cheerleader effect, which wouldn't be necessary in an ideal world, but that's hardly the one we live in, is it?

Why should we believe that Tao would be especially likely to be able to make progress on AGI alignment (e.g. compared to other recent Fields Medal winners like Peter Scholze)?

Well, his name is alliterative, so there's that.

(I'm being glib here, but I agree that there's a much broader class of people who have a similar level of brilliance to Tao, but less name recognition, who could contribute quite a lot if they were to work on the problem.)

That's a great point! It'll also help with communicating the difficulty of the problem if they conclude that the field is in trouble and time is running out (in case that's true; experts disagree here). I think AI strategy people should consider trying to get more ambassadors on board. (I think I see the ambassador effect as more important now than those people's direct contributions, but you definitely only want ambassadors whose understanding of AI risk is crystal clear.)

Edit: That said, bringing in reputable people from outside ML may not be a good strategy to convince opinion leaders within ML, so this could backfire. 


There are already people taking care of that, see this question I asked recently.

Just saying I appreciate this post being so short <3 

(and still informative)

My first reaction was somewhat skeptical. But I think it's actually good.

I don't think Scott Aaronson will do much to directly solve AI alignment immediately. But blue-sky research is still valuable, and if there are interesting complexity theory problems related to e.g. interpretability vs. steganography, I think it's great to encourage research on these questions.