TLDR: As someone who talks to researchers about AI alignment, I’m curious if there are ways to predict how well a conversation might go. For example, does demographic information help predict whether a conversation will have lasting effects months later? To answer this, I sifted through two sets of results, focusing on the newly-released quantitative analysis of 97 AI researcher interviews. It was a messy process and I’ve given up on releasing something that shows my work. However, jump down to the "Overall Takeaways" section to read what updates I’m making about predicting research interest in AI alignment.
(Note: I highly recommend you read the version of this post where the table formatting looks reasonable.)
In February-March 2022, I conducted 40-60 min interviews with 97 AI researchers about their perceptions of AI and the future of the field. The core of these conversations was discussing potential risks from advanced AI systems – I presented some arguments for why we might be concerned, then we discussed what they thought and why.
Maheen Shermohammed and I recently released a report (interactive graph version) analyzing these interviews. It’s an enormous report, and the main findings are described in this summary post.
However, the summary post doesn’t discuss the interaction effects between all of the variables we investigated. By understanding these interactions, we can answer a question I’m very interested in:
If you talk to an AI researcher, and you know something about them – demographics from their website, or information about their knowledge of AI safety, sympathy towards AI safety, or their timelines to AGI – can you predict anything about their future sympathy or whether the conversation will be useful to them?
All the relevant information is in our report. However, it is admirably massive! Ideally there would be a separate document just on the important interaction effects. To that end, I’ve spent more than 40 hours working through the interaction effect analyses, checking my independent understanding of the graphs against Maheen’s careful observations, incorporating some additional results from an EA Forum post that analyzed AI Impacts 2022 survey data, lovingly creating this massive document that lays out all my reasoning as I squint at dozens of graphs and correlations…
…and then I realized how much editing it’d need to post publicly. And you know what, I give up. Message me if you’d like the full, messy, graph-by-graph reasoning doc. Otherwise, here are my takeaways, mostly oriented towards AI safety field-building.
This post is organized as follows:
Acknowledgements: Thanks to Maheen Shermohammed for doing this entire analysis, including writing her interpretations for each graph, which were great to cross-check against. Thanks to Lukas Trötzmüller, Michael Keenan, and David Spearman for helping edit the post.
Demographic Variables | ||
h-index | Report | |
Age | Report | Based on university graduation date |
Field | Report | Field was evaluated in two ways: asking the participant directly in the interview (Field1) and by looking up participants’ websites and Google Scholar Interests (Field2). All analyses here use Field2. |
Sector | Report | Academia vs. industry |
This dataset includes additional demographic measures, which appear in the "Demographics Correlation Matrix" section. Demographic information was sourced from Google Scholar and personal websites / LinkedIn. | ||
Measures of “Sympathy to AI Risk Arguments” | ||
“Alignment + Instrumental” | Report | This was the primary measure of what I’m calling “sympathy”. In these interviews, I described the alignment problem, and asked researchers if this argument seemed valid or invalid. I then described the idea of instrumental incentives, and asked researchers if that argument seemed valid or invalid. “Alignment + Instrumental” is a combined measure:
This combined measure is meant to be more robust than agreement with either “Alignment Problem” or “Instrumental Incentives” alone. |
Alignment Problem | Report | The core question was: What do you think of the argument: “highly intelligent systems will fail to optimize exactly what their designers intended them to, and this is dangerous?” |
Instrumental Incentives | Report | The core question was: What do you think about the argument: “highly intelligent systems will have an incentive to behave in ways to ensure that they are not shut off or limited in pursuing their goals, and this is dangerous?” |
Work on this [alignment research] | This question was asked in a bunch of different ways, but the core was: Would you work on AI alignment research? It could be considered a sympathy measure, and I included it as such for some sections below. However, I don’t think it tracks “sympathy” very well: see the “What is the 'work on this' variable tracking?” section for details. | |
Main Variables | ||
When will we get AGI? | Report | This was asked in different ways, and advanced AI systems / AGI was imprecisely defined, but roughly: When do you think we’ll have very general capable systems, perhaps with the cognitive capabilities to replace all current human jobs (so you could have a CEO AI or a scientist AI), if we do? |
Heard of AI safety? | Report | |
Heard of AI alignment? | Report | |
Work on this [alignment research] | Report | This question was asked in a bunch of different ways, but the core was: Would you work on AI alignment research? |
Did you change your mind? | Report | “Have you changed your mind on anything during this interview and how was this interview for you?” |
Follow-up Questions: Lasting Effects | Report | I emailed researchers 5-6 months later, and asked this binary question (Y/N): “Did the interview have a lasting effect on your beliefs?” |
Follow-up Questions: New Actions | Report | I emailed researchers 5-6 months later, and asked this binary question (Y/N): “Did the interview cause you to take any new actions in your work?” |
I also analyzed graphs from this EA Forum post that looked at interaction effects in the AI Impacts 2022 Expert Survey on Progress in AI data. I did not independently verify the results from that post. Those variables are included in the Appendix.
More sympathy for AI risk arguments, and sooner AGI timelines, among top researchers
Timelines are important
When asked about starting alignment research, AI researchers want to know what research directions exist that are close to their current research interests and skillsets
If you’ve already heard of AI safety / alignment before, you’re (weakly but fairly robustly) more likely to be sympathetic to the AI risk arguments
These interviews seemed useful
The demographics that matter for prediction are probably AI subfield and h-index
You can ask people if they changed their mind during the conversation as a litmus test
Besides the “changed mind” question, it’s hard to know who will be affected by the interview
The remaining sections are excerpts from the full messy doc, which is the source for the takeaways above. I first analyze the demographics variables, then all of the other variables we hypothesized could be predictive. For each section, there’s a table showing what graphs I analyzed (follow the “Report” links to see them), and then an interim conclusion.
This report | |
Alignment + Instrumental Combined, split by Field2 | Report |
Work on this, split by Field2 | Report |
When will we get AGI?, split by Field2 | Report |
Have you heard of AI safety?, split by Field2 | Report |
Have you heard of AI alignment?, split by Field2 | Report |
AI Impacts 2022, split by “By Specific AI Field” | |
Society should invest more / much more in AI safety research | |
>=5% chance that HLMI would be extremely bad | |
>=5% chance AI leading to bad outcomes | |
>=5% chance humans can’t control AI leading to bad outcomes |
The data in this section was way more chaotic than the data in any other section, and I don’t trust most of it. That said, here are my conclusions about which subfields of AI I’m planning to pay more attention to. Take them with a grain of salt. Note that I’m focusing on the subfields whose skillsets overlap with existing research directions in alignment.
This report | |
Alignment + Instrumental Combined, split by Age | Report |
Work on this, split by Age | Report |
When will we get AGI?, split by Age | Report |
AI Impacts 2022, split by “By Time in Career” | |
Society should invest more / much more in AI safety research | |
>=5% chance AI leading to bad outcomes | |
>=5% chance humans can’t control AI leading to bad outcomes |
In my data, age doesn’t seem to particularly affect AGI timelines or sympathy to the arguments, though people who think the AI risk arguments are invalid (rather than valid) are maybe slightly younger (by a couple of years). In the AI Impacts data, early-career people seem more concerned about risks from AI (this holds directionally across three questions, though the strength of the effect differs).
(This section includes the two extra columns from the full messy doc. Most of the extra columns aren’t as neat as this one.)
Alignment + Instrumental Combined, split by h-index | Report | Invalid tends to have lower h-index |
Work on this, split by h-index | Report | No differences |
When will we get AGI?, split by h-index | Report | Shorter timelines may tend to have higher h-indices, but the error bars overlap |
Researchers who thought the AI risk arguments (the alignment problem and instrumental incentives arguments) were invalid had a lower mean h-index than researchers who thought they were valid.
Researchers with shorter AGI timelines tended to have higher h-indices on average than researchers with longer AGI timelines, though this effect was not strong.
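The h-index values in this dataset were presumably read off Google Scholar profiles rather than computed by hand. For readers who haven’t worked with the metric, here is a minimal Python sketch of its standard definition: the largest h such that h of a researcher’s papers each have at least h citations. The function name and the citation counts are hypothetical and only illustrative.

```python
def h_index(citations: list[int]) -> int:
    """Largest h such that at least h papers have at least h citations each."""
    # Rank papers from most-cited to least-cited.
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        # The paper at position `rank` supports h = rank only if it has >= rank citations.
        if count < rank:
            break
        h = rank
    return h


# Hypothetical citation counts for one researcher's five papers.
print(h_index([10, 8, 5, 4, 3]))  # → 4
```

Sorting first means the loop can stop at the first paper whose citation count falls below its rank.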
(This section includes the two extra columns from the full messy doc. Most of the extra columns aren’t as neat as this one.)
Note that we’re ignoring data from researchers at research institutes (n=3/97) and only comparing academic and industry researchers.
This report | ||
Alignment + Instrumental Combined, split by Sector | Report | No differences |
Work on this, split by Sector | Report | Academics somewhat more interested than industry |
When will we get AGI?, split by Sector | Report | Academic and industry researchers appear in relatively similar proportions across all timelines, but industry researchers tend towards shorter timelines (an observation from the count graph rather than the proportion graph, and those counts are very low). “Won’t happen” does something separate. |
AI Impacts 2022, split by “By Industry” | |||
>=5% chance that HLMI would be extremely bad | | Industry is more worried than academia |
>=5% chance humans can’t control AI leading to bad outcomes | | Industry is more worried than academia |
In our report, academics are slightly more interested in working on AI alignment research than industry researchers.
In our report, there are no big differences in sympathy towards AI risk arguments between people in industry and academia. In the AI Impacts data, people in industry are more worried about AI risk than people in academia.
In our report, industry researchers tend towards shorter timelines, but this is a weak effect. (The usual claim is that industry researchers have short timelines and academic researchers have long ones, which this data doesn’t particularly support.)
Did we miss anything with high correlations in "Demographics x Main Questions"? In particular, we’re interested in correlations we haven’t seen in the graphs above because the variables didn’t have associated “split-by” graphs.
Demographics X Main Questions, Using Field2 Labels | Report |
My actual summary of the new information we haven’t seen above is (ranked by correlation strength, where the first is by far the strongest):
There are some other relevant correlations here, but they’re less statistically significant (higher p-values), so I’m not including them.
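The correlation matrices report rho values. I’m assuming these are Spearman rank correlations (the usual meaning of rho), so check the report’s methods for the exact procedure and how categorical answers were coded. For anyone who wants to sanity-check a number, here is a minimal stdlib-only Python sketch of Spearman’s rho: replace each value with its rank (averaging ranks across ties), then take the Pearson correlation of the ranks. The function names are hypothetical, and the sketch doesn’t compute p-values.

```python
from statistics import mean


def average_ranks(values: list[float]) -> list[float]:
    """Rank values 1..n, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values tied with the one at position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        # Positions i..j (0-indexed) share ranks i+1..j+1, so use their mean.
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x: list[float], y: list[float]) -> float:
    """Spearman's rho: the Pearson correlation of tie-averaged ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / (sx * sy)
```

With two yes/no answers coded 0/1, the tie-averaged ranks are just a rescaling of the 0/1 codes, so Spearman’s rho equals the phi coefficient of the 2x2 table. For example, `spearman_rho([1, 1, 0, 0], [1, 0, 0, 0])` gives 1/sqrt(3), about 0.577.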
Main Questions X Main Questions | Report |
Alignment + Instrumental Combined, split by "When will we get AGI?" | Report |
Work on this, split by "When will we get AGI?" | Report |
Thinking AGI will happen at all seems to be roughly a prerequisite for being concerned about AI risk, and earlier AGI timelines correspond to more interest in doing alignment research.
Follow-up Questions: Lasting Effects, split by "When will we get AGI?" | Report |
Follow-up Questions: New Actions, split by "When will we get AGI?" | Report |
Follow-up Questions: Lasting Effects, split by alignment+instrumental | Report, also Report and Report |
Follow-up Questions: New Actions, split by alignment+instrumental | Report, also Report and Report |
Short answer: No.
We’re asking whether AGI timelines and sympathy towards AI risk arguments are predictive for either of two potential effects:
I don’t think the data points clearly in any direction. This is a wash, and the answer to the question is no.
This report | |
Main Questions x Main Questions, “Meaningful effects” (The relevant split for “meaningful effects” would be “Lasting Effects” and “New Actions” split by “Heard of AI safety” and “Heard of AI alignment”. We don’t have those splits, but do have Main Questions x Main Questions correlations.) | Report |
Main Questions x Main Questions, “Sympathy” | Report |
Alignment + Instrumental Combined, split by “Heard of AI safety” | Report, also Report and Report |
Alignment + Instrumental Combined, split by “Heard of AI alignment” | Report, also Report and Report |
Work on this, split by “Heard of AI safety” | Report |
Work on this, split by “Heard of AI alignment” | Report |
AI Impacts 2022 | |
Split by “By How Much Thought on HLMI” | Post |
Split by “By How Much Thought on Social Impacts” | Post |
What I’m actually taking away from this: if you’ve already heard of AI safety / alignment before, you’re (weakly but fairly robustly) more likely to be sympathetic to the AI risk arguments, though there’s nothing particularly actionable there. There may also be an effect where, if you’re new to AI safety / alignment, the interview is more likely to have a long-term effect on you, but it’s hard to tell.
(This section includes the two extra columns from the full messy doc. Most of the extra columns aren’t as neat as this one.)
Main Questions x Main Questions | “Did you change your mind?” Report; Lasting effects Report | The second most significant correlation in the Main Questions x Main Questions matrix was between “Did you change your mind?” and “Lasting effects” (rho=0.4851420, n=50, p=0.0003559), which is also plotted in a graph. (Note: the highest-ranking correlation is uninteresting, since it was a foregone conclusion that “Heard of AI alignment” and “Heard of AI safety” would correlate.) This means that people who said they changed their minds during the interview were more likely to report later that the interview had a lasting effect on their beliefs. Similarly, people who said they didn’t change their minds were less likely to report a lasting effect.
| |
Follow-up Questions: Lasting effects, split by “Did you change your mind?” | Report | People who said “yes” to whether they changed their minds during the interview were more likely than people who said “no” to say that the interview had a lasting effect on their beliefs. People whose responses were tagged as “ambiguous” basically never said the interview had a lasting effect on their beliefs. People with “None/NA” responses were split on whether the interview had a lasting effect. (I’m going to ignore most of this except the main Yes vs. No effect, since that’s the clearest one.) |
Follow-up Questions: New actions, split by “Did you change your mind?” | Report | Interviewees who said they had not changed their mind during the interview never reported that the interview caused them to take a new action(s) in their work. |
This finding is super interesting! It turns out that people saying they changed their mind during the interview (which isn’t even that rare: Yes: 24/58 (41%), Ambiguous: 12/58 (21%), No: 22/58 (38%), though there’s high selection and temporal bias for this question, so the true “Yes” rate is lower) is very indicative of them saying the interview had a lasting effect on their beliefs 5-6 months later! That’s great: it means you can ask that question during a conversation and the answer means something.
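If you want to use “Did you change your mind?” as a litmus test in your own conversations, the number to track is the share of people reporting a lasting effect within each answer category. Here is a minimal Python sketch. It assumes you keep one (answer, lasting-effect yes/no) record per person who answered both questions; only a subset will, just as the correlation above had n=50. The function name and the records are hypothetical, not the interview data.

```python
from collections import Counter


def lasting_effect_rates(pairs: list[tuple[str, bool]]) -> dict[str, tuple[int, int, float]]:
    """Map each 'Did you change your mind?' answer to (n_lasting, n_total, share_lasting).

    `pairs` holds one (answer, reported_lasting_effect) tuple per person who
    answered both the interview question and the follow-up email.
    """
    totals = Counter()
    lasting = Counter()
    for answer, had_lasting_effect in pairs:
        totals[answer] += 1
        if had_lasting_effect:
            lasting[answer] += 1
    return {answer: (lasting[answer], n, lasting[answer] / n) for answer, n in totals.items()}


# Hypothetical records, not the interview data.
records = [("Yes", True), ("Yes", True), ("Yes", False),
           ("No", False), ("No", True), ("No", False), ("No", False)]
for answer, (k, n, share) in lasting_effect_rates(records).items():
    print(f"{answer}: {k}/{n} reported a lasting effect ({share:.0%})")
```

Keeping the raw counts next to the share matters here. With group sizes this small, a single person moves a category’s percentage by several points.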
There’s also the weaker finding that people who don’t believe AGI will happen tend to not report changing their minds during the interview. This is perhaps expected.
My original question here was whether there’s a way to tell how flexible individuals are in their beliefs, and whether that correlates with anything.
There are two variables directly related to “flexibility of beliefs”: “Did you change your mind?” and “Did the interview have a lasting effect on your beliefs?”
I’m most interested in the correlations here ("Demographics x Main Questions" and "Main Questions x Main Questions") rather than any split-by breakdowns (almost all of which we’ve already looked at, where available).
In particular, I’m looking for any “Demographics x Main Questions” correlations under p < .05 that involve “chgmind” or “lastingeffects_yes”. After that, even more liberally, I’m looking for any correlations with rho >= 0.20. And then I’m doing a similar search within “Main Questions x Main Questions”.
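The screening procedure above is essentially a two-pass filter over the correlation tables. Here is a minimal Python sketch. It assumes the tables have been exported as rows of (variable, variable, rho, p). The `Correlation` type and `shortlist` function are hypothetical. I’m also assuming the rho >= 0.20 cutoff applies to the absolute value of rho, so that negative correlations are caught too. Only the first example row uses numbers from this post; the rest are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Correlation:
    var_a: str
    var_b: str
    rho: float
    p: float


def shortlist(corrs, targets, p_cutoff=0.05, rho_cutoff=0.20):
    """Two-pass screen for correlations involving any variable in `targets`.

    Pass 1 keeps pairs with p < p_cutoff. Pass 2 (more liberal) adds the
    remaining pairs with |rho| >= rho_cutoff. Each pass is sorted strongest first.
    """
    involved = [c for c in corrs if c.var_a in targets or c.var_b in targets]
    significant = [c for c in involved if c.p < p_cutoff]
    liberal = [c for c in involved if c.p >= p_cutoff and abs(c.rho) >= rho_cutoff]
    significant.sort(key=lambda c: abs(c.rho), reverse=True)
    liberal.sort(key=lambda c: abs(c.rho), reverse=True)
    return significant, liberal


corrs = [
    Correlation("chgmind", "lastingeffects_yes", 0.4851420, 0.0003559),  # values reported above
    Correlation("chgmind", "var_x", 0.24, 0.08),             # hypothetical
    Correlation("lastingeffects_yes", "var_y", 0.10, 0.45),  # hypothetical
    Correlation("var_x", "var_y", 0.60, 0.001),              # hypothetical; no target variable
]
first_pass, second_pass = shortlist(corrs, {"chgmind", "lastingeffects_yes"})
print([(c.var_a, c.var_b) for c in first_pass])
print([(c.var_a, c.var_b) for c in second_pass])
```

Passing `{"workon_interestedOrYes"}` as `targets` gives the equivalent search for the “work on this” variable.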
Demographics X Main Questions, Using Field2 Labels, “Did you change your mind?” | Report |
Main Questions x Main Questions, “Did you change your mind?” | Report |
Demographics X Main Questions, Using Field2 Labels, Lasting effects | Report |
Main Questions x Main Questions, Lasting effects | Report |
Overall, I’d say the answer is “no, we can’t predict individual flexibility in beliefs”.
(This is apart from what I mentioned earlier: the strong correlation between “changed mind” <> “lasting effects”, which is a bit circular and thus not really relevant here, and the weaker correlation (rho=.24) between “changed mind” <> “thinking AGI would happen”. Note that “lasting effects” <> “thinking AGI would happen” is only rho=.10.)
We’ve got some weak effects where women are less likely than men to report having changed their minds during the interview (there were only 8/97 (8%) women in this series). There are also some correlations with specific fields (the strongest: NLP is associated with reporting not changing one’s mind, and Computing with reporting a lasting effect) that are maybe loosely tied to “previous exposure to AI safety / alignment makes one less likely to report changing one’s mind”, but that could be spurious.
Something that’s come up when I’ve been plotting graphs for the “work on this” variable, and looking at “workon_interestedOrYes” correlations, is that I’d previously thought this variable was tracking something like “sympathy to the AI risk arguments”, and now I don’t think it’s tracking that. What is it tracking, though?
Let’s look at some correlations with “work on this”. I’m looking for correlations under p < .05 that involve workon_interestedOrYes, in "Demographics x Main Questions" or "Main Questions x Main Questions". After that, even more liberally, I’m looking for any correlations with rho >= 0.20. There are also some extra split-by graphs for inspection.
Demographics X Main Questions, Using Field2 Labels, workon_interestedOrYes | Report |
Qualitative response to Follow-up Question: New Actions | Not available elsewhere – this is everyone who sent me an optional qualitative note about this question |
Main Questions x Main Questions, workon_interestedOrYes | Report |
Follow-up Questions: Lasting Effects, split by “Work on this” | Report |
Follow-up Questions: New Actions, split by “Work on this” | Report |
Work on this, split by Alignment Problem | Report |
Work on this, split by Instrumental Incentives | Report |
This question is not an interaction effect, but I’m pretty curious anyway. We’ve got a few ways to measure this: whether people changed their mind during the interview, the two follow-up questions (lasting effects and new actions), and “work on this”. Here’s a quick skim through those results:
Did you change your mind? | Report | No: 22/58 (38%). Ambiguous: 12/58 (21%). Yes: 24/58 (41%). Note there’s bias for this question. |
Follow-up Questions: Lasting Effects | Report | Responses present for 82/86 (95%) emailed participants. Of the respondents, 42/82 (51%) said yes. |
Follow-up Questions: New Actions | Report | Responses present for 82/86 (95%) emailed participants. Of the respondents, 12/82 (15%) said yes. |
Work on this (noting that this isn’t exactly a sympathy measure) | Report | Yes: 3/97, “Interested in long-term safety but”: 13/97, No: 35/97, None/NA: 46/97, but only 55 people were asked. Note there’s bias for this question. |
I was quite surprised by how many researchers replied to my follow-up or reminder emails (82 of the 86 contacted). 51% is also a high share of people saying the interview had a lasting effect on their beliefs. 15% saying the interview caused them to take a new action(s) at work seems interesting (though note that none of those people had said during the interview that they’d be interested in working on AI alignment research). It’s hard to know what “a new action” was, or what the lasting effect on their beliefs was. Reassuringly, the qualitative commentary that some interviewees left with respect to “a lasting effect on beliefs” suggested this was tracking something meaningful to many people, and the people who left “no” comments with respect to “a new action(s)” were mentioning things like projects or decision-making (search “Qualitative response to Follow-up Question: New Actions” to find that data).
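These rates come from small samples, so it can help to see how wide the plausible range around each one is. Below is a minimal Python sketch using the Wilson score interval, a standard approximate 95% confidence interval for a proportion. It is my addition, not something from the report, and it only reflects sampling noise, not the selection and temporal biases noted above. The counts are the ones in the table above; the function name is hypothetical.

```python
from math import sqrt


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for a binomial proportion."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = (z / denom) * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
    return center - half_width, center + half_width


# Counts from the table above.
for label, k, n in [
    ("Replied to follow-up", 82, 86),
    ("Lasting effect: yes", 42, 82),
    ("New actions: yes", 12, 82),
    ("Changed mind: yes", 24, 58),
]:
    lo, hi = wilson_interval(k, n)
    print(f"{label}: {k}/{n} = {k / n:.0%} (95% CI {lo:.0%} to {hi:.0%})")
```

For example, the 42/82 lasting-effect figure comes out at roughly 41% to 62%. That is a reminder to read 51% as “roughly half” rather than as a precise rate.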
It’s also hard to know what a “Yes” to “Have you changed your mind on anything in this interview and how was this interview for you” means, and 24/58 (41%) is an inflated percentage because of the selection effect for this question, but 24 people saying they changed their minds during the interview is pretty cool. It’s especially neat considering how this variable correlates with the interview having a lasting effect 5-6 months later.
“Work on this” doesn’t seem to correlate well with the sympathy-to-AI-risk measures. I also wasn’t super convinced that anyone in the “Interested in long-term safety but” category was going to take action, and there’s a selection bias in how the question was asked. So I’m uncertain how to interpret this measure.