Crossposted from the AI Alignment Forum. May contain more technical jargon than usual.

AI researchers and others are increasingly looking for an introduction to the alignment problem that is clearly written, credible, and supported by evidence and real examples. The Wikipedia article on AI Alignment has become such an introduction.


Aside from me, it has contributions from Mantas Mazeika, Gavin Leech, Richard Ngo, Thomas Woodside (CAIS), Sidney Hough (CAIS), other Wikipedia contributors, and copy editor Amber Ace. It also had extensive feedback from this community.

In the last month, it had ~20k unique readers and was cited by Yoshua Bengio.

We've tried hard to keep the article accessible for non-technical readers while also making sense to AI researchers.

I think Wikipedia is a useful format because it can include videos and illustrations (unlike papers) and it is more credible than blog posts. However, Wikipedia has strict rules and can be changed by anyone.

Note that we've announced this effort on the Wikipedia talk page and shared public drafts to let other editors give feedback and contribute.

If you edit the article, please keep in mind Wikipedia's rules, use reliable sources, and consider that we've worked hard to keep it concise because most Wikipedia readers spend <1 minute on the page. For the latter goal, it helps to focus on edits that reduce or don't increase length. To give feedback, feel free to post on the talk page or message me. Translations would likely be impactful.


Yeah, generally when competent people hear a new word (e.g. AI Alignment, Effective Altruism, etc), they go to wikipedia to get a first impression overview of what it's all about.

When you look at it like that, lots of pages e.g. Nick Bostrom and Effective Altruism, seem to have been surprisingly efficiently vandalized to inoculate new people against longtermism and EA, whereas Eliezer Yudkowsky and MIRI are basically fine.

EDIT: I didn't mean to imply anything against Yud or MIRI here, I was being absentminded, and if I had been paying more attention to that sort of thing at the time I wrote that, I would have gone and found a non-Yud third example of a Wikipedia article that was fine (which is most Wikipedia articles). In fact, I strongly think that if Yud and MIRI are being hated on by the forces of evil, people should mitigate/reject that by supporting them, and label/remember the people who personally gained status by hopping on the hate train.

Skimming the Nick Bostrom and Effective Altruism Wikipedia pages, there doesn't seem to be anything particularly wrong with them, certainly not anything that I would consider vandalism. What do you see as wrong with those articles?

Likely referring to the "Racist e-mail controversy" section on Bostrom and the pervasive FTX and Bankman-Fried references throughout the EA article.

Wikipedia is a trusted brand for introducing new topics and has great placement with search engines. There are three potential headaches though.

(1) The Neutral Point of View (NPOV) rules mean in theory that one side of the argument can't dictate how a topic is dealt with, so even without a concerted effort, weasel words and shifts in balance may creep in. 93% chance of happening. The impact on bias will be low, producing odd headaches but potentially improving the article. About a 30% chance of making some of the article unreadable to a newcomer and 15% chance of the lead being unreadable.

(2) A determined and coordinated group of editors with an agenda (or even a determined individual, which won't apply on an article as watched as AI alignment but may on more specialised subsidiary articles) can seriously change an article, particularly over the long term. Another commenter has said that this process seems to have happened with the Effective Altruism article. So if (when) alignment becomes controversial, it will attract detractors, and these may be a determined group. 70% chance of attracting at least one determined individual and a further 70% chance of them making sustained efforts on less watched articles. 30% chance of attracting a coordinated group of editors.

(3) Wikipedia culture skews to the American left. This will probably work for AI alignment, as it seems to be on track to become a cultural marker for the blue side, but it may create a deeply hostile environment on Wikipedia if it becomes something that liberalism finds problematic, for example as an obstacle to Democratic donors in tech or as a rival to greenhouse warming worries (I don't think either will happen, just that there are still plausible routes for the American left to become hostile to AI alignment). 15% chance of this happening, but the article will over time become actively harmful to awareness of AI alignment if it does.

I'd say there are two mitigations other than edit warring that I see. There may be many others.

(1) Links to other AI alignment resources, particularly in citations (these tend to survive unless there's a particularly effective and malevolent editor). Citation embedding will mean that the arguments can still be seen by more curious readers.

(2) Creating or reinforcing a recognised site which is acknowledged as a go-to introduction. Wikipedia only stays the first stop if there are no dedicated alternatives.

I think this is a great achievement and I wish I had the sense to be part of it, so none of this detracts from the achievement or recognition that it was much needed. And despite implied criticism of Wikipedia, I think it's a wonderful resource, just with its dangers.

Most Wikipedia readers spend less than a minute on a page?? I always read pages all the way through... even if they're about something that doesn't interest me much...

Often when I need a Wikipedia article I'm using only the first paragraph to refresh my memory, or to catch the broad strokes of something I encountered in a piece of media. The average use case is wondering, like, what the Burj Khalifa is, going to Wikipedia, and immediately knowing it's the tallest skyscraper in Dubai. After that, I don't really care too much, especially if I needed the information due to setting cues in some story.

Yeah I'm surprised by that figure too, it would imply most Wikipedia readers aren't even reading in any substantive way, just skimming and randomly stopping a few times at some keywords their brains happen to recognize.

But then again GPT-4's writings are more coherent than a lot of high school and college undergrad essays, so maybe I shouldn't be surprised that average human reading patterns are likewise incoherent...

Depends why I'm on the page, for me. Pretty often I'm looking for something like "How many counties are there in [state] again?" or "What was [Author's] third book in [series] called?" and it's a quick wiki search + ctrl+f, close the tab a few seconds later.

Reduced it by ~43kb, though I don't know if many readers will notice as most of the reduction is in markup.

The article does not appear to address the possibility that some group of humans might intentionally attempt to create a misaligned AI for nefarious purposes. Are there really any safeguards sufficient to prevent such a thing, particularly if for example a state actor seeks to develop an AI with the intent of disrupting another country through deceit and manipulation?