Poster Session on AI Safety

Neil Crawford

Context

I co-presented the above poster at the PPE Society Sixth Annual Meeting 2022 (henceforth ‘PPE Meeting’) in New Orleans on the 4th of November. Most of the 380 attendees were academics researching areas of philosophy that interact with politics or economics. The poster session, held at the end of a day of talks, lasted an hour and a half, and around six posters were presented. In addition to giving attendees a preview of the poster before the session, I gave them the following description:

In our poster session, we want to give an overview of the ‘alignment problem’ of artificial general intelligence (AGI). This is the problem of how to get AGI to do what we want. So far, it seems surprisingly and worryingly difficult. As things stand, AGI will likely be misaligned, resulting in catastrophic consequences. In addition to arguing why AGI is likely to be misaligned, we will also try to defend the assumption that AGI will be developed this century.

Goals

Practice talking with other academics about AI safety
Increase academics’ exposure to AI safety
Highlight AI safety as a special problem amongst the many problems within machine ethics and ethics of AI more generally
Increase my own understanding of AI safety and, in particular, get to grips with the core arguments for why we should be concerned
Gain a better understanding of how other academics conceive of AI safety, which arguments they perceive as unpersuasive and which they perceive as persuasive
Convince at least a couple of academics to take AI safety seriously and increase the likelihood that they connect their research to AI safety

Results

I believe I largely achieved these goals. However, I was disappointed by the low turnout at the poster session. Out of (supposedly) 380 attendees, I estimate that around 60 were aware of my presentation topic, 25 at least glanced at the poster, and 8 came to read the poster or talk with us.

Reactions

Which research question(s) am I working on?
Humans do all these things that I’ve claimed make AGI dangerous by default; are humans therefore just as dangerous?
Children do some of these things, and they don’t immediately share our values, but through praise and blame we mould them into alignment; can we not do something similar with AGI? In particular, could we train AGI systems to optimise for praise minus blame (or some similar function of the two), then deploy them and incrementally increase their intelligence and capabilities, as happens with human children?
Okay, AI compute will increase, but why think that this will lead to human-like intelligence and the ability to self-improve?

Good calls

I divided the poster's content into three sections, with two images in the centre. I think the images helped attract people's attention, and the sectioning made the structure of the arguments clearer and made it easy for people to stand on one side and read that side of the poster. With many people, I began by talking about the third (right-hand) section and only addressed the left-hand section when they were sceptical of us ever developing AGI.
I used bullet points instead of full-sentence paragraphs. I think this made the poster more interesting (or at least less boring) to read. It also meant that, with less text, I could increase the font size.

Bad calls

I expected people who were confused by terms like ‘AGI’ to come and ask me out of curiosity, but I think most preferred not to, perhaps out of unease about revealing their ignorance. For this reason, I should have made the poster easier to understand on its own.
I should have maxed out the size of the poster so that people could read it even from far away.
I should have displayed a list of promising research questions relevant to PPE academics. I had hoped to come up with a list of such questions but could only think of a couple and decided not to include any for the sake of space and because I'd have been unable to share many further insights in conversation.

Uncertainties

Should I have focused more on AI governance given the audience? I think that whilst AI governance might have been more relevant to the attendees, I would not have done a very good job of presenting on it.^[1] Additionally, I think that AI governance is difficult to motivate strongly unless the audience is already aware of the alignment problem.
Should I have name-dropped more academics who acknowledge the risks of misaligned AGI? As a philosopher, I've been trained to be suspicious of appeals to authority. However, it seems that many philosophy academics rely heavily on what the people they respect say.^[2] I was under the impression that Nick Bostrom and Toby Ord are not so well respected amongst philosophy academics but Stuart Russell is. However, the one time I did name-drop Stuart Russell, my interlocutor hadn't heard of him but already agreed with my conclusion because Peter Railton supposedly agrees too. Appeals to authority do seem to work for some people, so maybe I should have name-dropped more academics who are well respected amongst philosophers. Relatedly, much of my content, including the two images, is taken from the YouTuber Robert Miles and the alarmist and unaccredited Eliezer Yudkowsky. I chose not to cite them partly because ad hominems are just as psychologically powerful as appeals to authority.

Credits

Thanks to Marius Hobbhahn for sharing his poster and for telling me about his experience presenting on AI safety. Thanks to Andrew Gewecke for co-presenting this poster with me. Thanks to Nick Cohen for answering some of my questions before and after the presentation. Thanks also to Robert Miles and Eliezer Yudkowsky for the content, and apologies for not citing either of you in the poster itself.

Contact

Feedback is most welcome. Either post in the comments section or reach out to me directly. My contact information is listed on my profile. If you find my poster useful as a template for your own presentation, feel free to steal it as I did from others. Just make sure you share your own writeup and include a link to mine.

^[1] Relatedly, I feel that we need better online resources concerning AI governance. The topic doesn't even have its own Wikipedia page yet!
^[2] I suppose this is normal: many arguments are complex, we don't have enough time to figure out for ourselves which are sound and which are not, and what a well-reasoned thinker says is probably a good enough guide to truth in many circumstances.

I agree that having a section on "what to do about it" is really useful for getting people interested. Otherwise you have a lot of unresolved tension.

Totally! I'll make sure to include such a section next time I present on AI safety or AI governance. After a quick Google search I found the following link post which would have been useful prior to the PPE Society poster session: https://forum.effectivealtruism.org/posts/kvkv6779jk6edygug/some-ai-governance-research-ideas

Some quick comments based purely on the poster (which is probably the most important part of your funnel):

"Biological Anchors" is probably not a meaningful term for your audience.

We have a 50% chance of recreating that amount of relevant computation by 2060

This seems wrong in that we already have around brain-training levels of computation now, or will soon, well before 2060. The remaining uncertainty is over software/algorithms, not hardware. We already have the hardware or are about to.

Once AI is capable of ML programming, it could improve its algorithms, making itself better at ML programming

This is overly specific: why only ML programming? What if the lowest-hanging fruit is actually in CUDA programming? Or just moving to different hardware? Or designing new hardware? Or better networking tech? Or one weird trick to make a trillion dollars and quickly scale to more hardware? Etc., etc. The idea that there are enormous gains in further optimization of ML architecture alone, and that this unending cornucopia of low-hanging optimization fruit will still be bountiful and limitless by the time we actually get AGI, suggests a very naive view of ML and neuroscience.

Just replace "ML programming" with "science and engineering R&D" or similar.

Training AI requires us to select an objective function to be maximized, yet coming up with an unproblematic objective function is really hard.

Many smart people will bounce hard off this, because they have many, many examples where coming up with an unproblematic objective function isn't really hard at all. It's trivial to write the correct objective function for chess or Go. It was trivial to design the correct utility function for Atari, even for Minecraft (which doesn't have a score!), and it was also trivial for optimizing datacenter power usage, for generating high-quality images from text, and for every other modern example of DL.

I would change this to something like:

"Training AI requires us to select an objective function to be maximized, yet coming up with an unproblematic objective function for AGI (agents with general intelligence beyond that of humans) seems really hard."

Thanks, Jacob! This is helpful. I've made the relevant changes to my copy of the poster.

Regarding the 'biological anchors' point, I intended to capture the notion that it is not just the level/amount of computation that matters by prefixing it with the word 'relevant'. When expanding on that point in conversation, I am careful to point out that generating high levels of computation isn't sufficient for creating human-level intelligence. I agree with what you say. I also think you're right that the term "biological anchors" isn't very meaningful to my audience. Given that, in my experience, many academics see the poster but don't ask questions, it's probably a good idea for me to replace this term with another. Thanks!

Can I put this up at my school? Is there a good AIS poster like this elsewhere?

Sure! And yeah, there's one by Marius Hobbhahn here: Lessons learned from talking to >100 academics about AI safety — LessWrong

I don't think you need to view namedropping as an appeal to authority. The natural way to do it in a scholarly document, including a poster, would be to cite a source. That's giving the reader valuable information - a way to check out the authority behind it.

Of course, if the reader is familiar with the author cited and knows that their work is invariably strong, they might choose to take it on authority as a shortcut, but they have the info at hand to check into it if they wish.

I think that's right, but who I cite in this case matters a lot to whether people take it seriously. This is why I chose not to cite Miles or Yudkowsky, though I'm aware that this is academically bad practice. In hindsight, I could have included a quote from Peter Railton, but it doesn't feel right to do this just to add an authority to the list of citations. Thanks!