I am an adjunct instructor in history. Recently I was teaching the early Cold War in my American History survey class for my community college students and decided to go off on a 20-minute digression about how the 1950s nuclear arms race had implications not just for navigating the continuing risk of nuclear war in the present, but also for navigating the potential for an AI arms race. Most of my students either did not seem to be aware that there could potentially be an AI arms race, or were under the impression that such a race would merely be a subset of a larger military/economic competition between geopolitical rivals like Russia and China, touching on things like cybersecurity or military drone technology. I doubt any of my students happened to notice Eliezer's editorial on Time.com, for example. There seemed to be little awareness that AI by itself might present an existential risk to all of humanity in the near future on a level even surpassing that of nuclear weapons.

So I continued my digression and introduced my students (in history class) to some of the most basic concepts in AI existential risk, such as:

  1. Black boxes, i.e. ChatGPT is not just an "app" that some people created with conscious intent and known parameters.  Rather, it was "grown" or "evolved" through many rounds of automatic adjustment of its hundreds of billions of parameters based on how well it predicted the next token of text each round.  (This was as specific/accurate as I got for time's sake.)  Therefore, while we know the basic process that created ChatGPT, the final product is a "black box" that spits out responses for reasons we do not understand.  (See the toy training-loop sketch after this list.)
  2. Goodhart's Law, i.e. just like how I want to maximize "student wisdom" or some other hard-to-measure variable like that, OpenAI wants to maximize something equally difficult to measure like "helpful answers."  Accordingly, instructors use multiple-choice questions and essay questions as easier-to-measure proxies to gauge student progress towards the real goal, while OpenAI uses things like prediction error and human ratings as proxies for what it actually wants.  Just as multiple-choice questions and essays were decent proxies for student wisdom in a certain bygone era/ancestral environment/training distribution (but seem less and less so now), things like prediction error and human ratings seem like decent proxies for measuring the helpfulness of answers.  But in a different environment/context, such as one where students can copy essays from ChatGPT, or one where, say, a more powerful version of ChatGPT could take over the Earth and force humanity to sit in cages and give it continual thumbs-up ratings (oversimplifying, I know!), doing well on the proxy measures might diverge from attaining the originally intended goals (see the toy proxy sketch after this list).  I think my students understood this metaphor quite intuitively, since I had already described ChatGPT early in the semester while going over the syllabus and had explained that, for their essay responses to be meaningful reflections of their own thought, students needed to include "personalized engagement" with the topic (such as relating it to personal life, family history, and/or current political events) to show that they had actually integrated it.  (And I'm not even sure that this assignment strategy of mine is sufficient.)
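To make the "grown, not written" point in item 1 concrete, here is a minimal toy sketch in Python (my own illustration, nothing like the real architecture or scale of ChatGPT): a table of numbers gets nudged, over many rounds, in whatever direction reduces next-token prediction error, and the resulting behavior is something nobody wrote by hand.

```python
import math, random

# Toy "training data" and vocabulary.
corpus = "the cat sat on the mat . the dog sat on the rug .".split()
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}
V = len(vocab)

# The "parameters": one logit per (previous word, next word) pair, started at random.
# Real models have hundreds of billions of parameters; this toy has V * V of them.
random.seed(0)
logits = [[random.gauss(0, 0.1) for _ in range(V)] for _ in range(V)]

def next_word_probs(prev):
    """Softmax over the logits for the word following `prev`."""
    row = logits[idx[prev]]
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

# Many rounds of adjustment, each based only on next-token prediction error.
learning_rate = 0.5
for step in range(2000):
    i = random.randrange(len(corpus) - 1)
    prev, nxt = corpus[i], corpus[i + 1]
    p = next_word_probs(prev)
    for j in range(V):  # gradient of cross-entropy loss w.r.t. the logits is (p - target)
        target = 1.0 if j == idx[nxt] else 0.0
        logits[idx[prev]][j] -= learning_rate * (p[j] - target)

# Nobody wrote a rule saying "after 'the' comes a noun"; it emerged from the update
# loop, and the only record of it is an opaque table of numbers.
print({w: round(p, 2) for w, p in zip(vocab, next_word_probs("the")) if p > 0.05})
```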
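And for item 2, a toy proxy sketch (again my own illustration; the functions and numbers are made up, not anything OpenAI actually measures): a proxy that matches the true goal on the familiar range of behavior, but keeps rewarding "more of the same" outside that range, where the true goal collapses once something optimizes hard against it.

```python
def true_value(effort):
    """What we actually care about, e.g. genuinely helpful answers or student wisdom."""
    return effort if effort <= 5 else 10 - effort   # helps up to a point, then backfires

def proxy_score(effort):
    """What we can easily measure, e.g. thumbs-up ratings or test scores."""
    return effort                                   # just rewards more of the same

# On the familiar "training distribution" the proxy tracks the true goal perfectly...
familiar = range(0, 6)
print([(x, true_value(x), proxy_score(x)) for x in familiar])

# ...but a strong optimizer searches a wider space than the proxy was ever validated on.
candidates = range(0, 21)
best = max(candidates, key=proxy_score)
print("proxy-optimal:", best, "proxy:", proxy_score(best), "true value:", true_value(best))
# -> proxy-optimal: 20, proxy: 20, true value: -10; doing well on the proxy has
#    diverged from the originally intended goal.
```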

After explaining how something like ChatGPT, but more advanced, could potentially go off the rails even when its creators had good intentions in trying to "train" it to be helpful, some of my students were visibly unsettled.  I had to return to the main lesson for the sake of time, but I wanted to delve into it with my students some more, and it got me wondering...

To my knowledge, there are no courses, extra-curricular activities, or anything else at either the community college or the 4-year public university at which I teach that touch on the topic of AI risk, much less AI existential risk or AI alignment.  At the 4-year college there is a "Students in Tech" club, but judging from their past events and Discord server, they don't seem to be very active and mostly function as a generic career/promotional club, with little in the way of public lectures or other enrichment activities or public engagement, much less anything on AI existential risk.

Is this a problem?    

Take an academic field like history, for example.  I'm most familiar with that.  Academic fields tend to follow a pyramid-shaped pattern of engagement.  

According to studentscholarships.org, as per Bing AI (lol!), there are about 3,100 historians in the U.S.  Yeah, I'll buy that as a rough guess for the number of historians, i.e. people who actually write history or otherwise curate materials and contribute to the construction of new historical knowledge.  If you include archivists and museum curators, that figure might be a bit higher.

Then, according to the Bureau of Labor Statistics, once again as per Bing AI (lol!), there are about 25,100 post-secondary history teachers.  To the extent that these don't overlap with the 3,100 actual "historians" already mentioned, these are mostly going to be adjunct instructors who get paid to teach and do nothing else.  Some may try to write and research here and there on the side, but it's tough to find the time to contribute meaningfully that way while carrying a full course load.  Add to that about 160,000 high school history teachers, and you get roughly 185,000 history teachers whose main job is, in effect, transmitting, and trying to fairly represent and distill for popular audiences, what the 3,100 actual historians produce.

Then you probably have some fraction of Americans who have an amateur interest in history as a hobby.  Maybe they read a couple of history books each year.  Unfortunately, Bing AI was not so helpful at finding statistics on book sales in that subgenre.

Then you have the fraction of Americans who have had to pass through at least a high school history class, which is going to be the vast majority of adults.  (How much they retained from their high school football coach's history lectures is another matter, of course...)

Then you have the fraction of Americans for whom history might be at least minimally relevant when appreciating life/making political decisions/etc., which I like to think would be nearly everyone to some extent, even kids who only get little bits and pieces of history at first through cultural osmosis.  

This is what a mature field that has wormed its way squarely into the cultural canon looks like.  Of course, history seems to be dying, so that position in the cultural canon might not be permanent.  But how does this compare with a nascent field like AI Existential Risk?

According to Paul Christiano's recent interview on the Bankless Podcast, there might be something like 50-100 people directly involved in researching AI Existential Risk/Alignment, with another several hundred doing adjacent work.  That's way too low.  Even speaking as someone in the history field with a vested interest in it, I have to say that I'd prefer that history and AI Existential Risk swap places in the pyramid, at the very least, given how urgent the latter field seems to be.

So should we be shooting for an AI Existential Risk/Alignment field that looks like this?

  - 4,000+ researchers
  - 200,000+ educators/podcasters/popularizers
  - 10 million+ frequent viewers/hobbyists
  - 300 million+ Americans who at least have some vague familiarity with the topic

But perhaps we just skimped on fleshing out the bottom tiers of this pyramid and focused on pouring all the money into researchers.  Could we bootstrap the researcher tier even faster by pouring in still more money?  I doubt it.  Eliezer, Paul, and others have already commented that the bottlenecks are good talent, and funders' ability to recognize good talent and actual progress (as opposed to alignment-washing or even negative-value projects that put humanity in more danger), not money.  And how do you get that talent?  How do you get a funding ecosystem that can tell good researchers and projects from bad ones?

That's exactly why you need the 10 million+ frequent viewers/hobbyists and the 200,000+ educators/podcasters/popularizers.  The hobbyists give society the ability to distinguish progress from regress or stagnation.  And the educators help train the "farm teams," as it were.  If I were to start an "AI Risk Club" at my local community college, maybe 100 students pass through it over the next 5 years.  And maybe only 1 of those students actually has the IQ and inspiration needed to become more than just a hobbyist or popularizer in this field.  Well, if you get several thousand other clubs like that on college campuses, each producing a researcher or so, then you start approaching that 4,000-researcher benchmark (see the rough back-of-envelope sketch below).
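Here is the back-of-envelope version of that club math; every figure is just the rough guess from the paragraph above, not data:

```python
# Fermi estimate for the "farm team" argument; all numbers are guesses.
clubs = 3000                    # hypothetical campus AI Risk Clubs nationwide
students_per_club = 100         # students passing through each club over ~5 years
serious_fraction = 1 / 100      # share who go beyond hobbyist/popularizer

prospective_researchers = clubs * students_per_club * serious_fraction
print(prospective_researchers)  # 3000.0 -- in the neighborhood of the 4,000+ researcher tier
```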

That said, I am hesitant about making any overtures in this direction at my local colleges. Not so much because I'd be working outside of my field.  AI Existential Risk is partly technical, yes, but is still at such an early stage of development that it remains interdisciplinary.  I forget who it was who joked that fields always start out as subfields in "philosophy" until the problems become well-defined enough that they actually become their own thing.  

No, the bigger reason I'd be hesitant about trying to start some sort of "AI Risk Club" is that I'm a little wary of the Dunning-Kruger Effect.  Part of the inherent risk of cultivating that base of 10 million+ hobbyists is that you are going to have a whole lot more people who know just enough to think they know more about the topic than they really do.  That said, I think most Americans are going to hear something about AI risk as time goes on, and I'd rather it be me or others from the LessWrong quadrant of the tech industry doing the introducing than Kamala Harris or whichever other politician/media personality gets appointed "AI Tech Czar."

So, what do we think?  Is there any benefit to promoting broader but relatively shallow engagement with the AI Existential Risk field, in the hope that some of this engagement will bubble up into more eventual researchers who can do cutting-edge work?  Or would this just add yet more noise to a world already drowning in content and voices?  As my Marxist friends like to joke, does the world really need another Marxist podcast from a NYC flat with 50 viewers?  Does the world need more AI Risk podcasts?  Or would it just fuel the hype train and lower the sanity waterline?

1 comment:

I wouldn't worry about adding more noise; the cream will rise to the top. Regardless of the quality of the audience/discourse, there is definitely a benefit to promoting more engagement with the AI Existential Risk field, but perhaps a lighter introduction is called for to avoid unintentionally perpetuating the Dunning-Kruger effect. Start by spreading the larger and more accessible conversation about our relationship with technology (social media, genetic technology, materials science, robotics, AI, space exploration, etc.) and let people land in whichever fields they'd be most useful. AI risk sounds less hokey as part of a larger conversation, and more plausible when placed in the perspective of the many ways technology might go wrong.