Meta is experimenting with using AI to write Wikipedia articles: https://ai.facebook.com/research/publications/generating-full-length-wikipedia-biographies-the-impact-of-gender-bias-on-the-retrieval-based-generation-of-women-biographies 

I personally have a very bad feeling about this. I'm most afraid of it making it easier to spam the encyclopedia with fake information that looks plausible on the surface and therefore doesn't get fully fact-checked. It could also create perverse incentives: SEO companies could put out false information online to bias Meta's algorithm and thereby sneak their way into the encyclopedia. The decision to make this open source seems incredibly foolish to me as well, considering how easily a service like this could be misused. (Edit: It has been pointed out to me that making it closed source wouldn't be great either, since then we would have no idea what it was doing under the hood. Either way I wouldn't be happy, so I'm not sure their choice counts as a point against them.)

Am I overreacting? Is this actually a good thing? Is this actually way worse than I think it is? Who knows!

What are your thoughts on this? 


FAIR publishing some research into long-form text generation is basically unrelated to someone generating Wikipedia articles and automatically uploading them. Researchers love using Wikipedia in various ways because it's free, pretty high quality, and there's a lot of it, so tons of publications do various things to and with Wikipedia data.

Yes, maybe someone could download their code and do something nefarious, but I doubt it's any more useful for that than other long-form text-generation approaches like GPT-3.

Thanks for the reassuring context :)

I'm positive that as these language models become more accessible and powerful, their misuse will grow massively. However, I believe open sourcing is the best option here: having access to such a model lets us build accurate automatic classifiers that detect its outputs. Media websites (e.g., Wikipedia, Twitter) could include such a classifier in their pipeline for new submissions.
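
For concreteness, here's a minimal sketch of what such a gate might look like. Everything here is a hypothetical placeholder: the toy training texts, the threshold, and the `screen_submission` helper are all made up, and the simple TF-IDF classifier is just a stand-in for whatever detector a site would actually train on large corpora of human- and model-written text.

```python
# Minimal sketch of the "detector in the submission pipeline" idea.
# NOTE: the training data, threshold, and function names below are
# hypothetical; a real detector would be trained on large corpora.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy labeled corpus: 1 = machine-generated, 0 = human-written (placeholder data).
texts = [
    "She was born in 1902 and is notable for her work in the field of chemistry.",
    "Her early research, though little cited at the time, reshaped the subfield.",
    "He was born in 1911 and is notable for his work in the field of physics.",
    "Colleagues remembered her wry humor as much as her meticulous lab notebooks.",
]
labels = [1, 0, 1, 0]

# TF-IDF features + logistic regression: a deliberately simple stand-in
# for a production-grade machine-text detector.
detector = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
detector.fit(texts, labels)

def screen_submission(text: str, threshold: float = 0.9) -> bool:
    """Return True if the submission should be flagged for human review."""
    p_machine = detector.predict_proba([text])[0][1]  # P(label == 1)
    return p_machine >= threshold

# Flagged submissions go to a review queue instead of being auto-published.
if screen_submission("He was born in 1923 and is notable for his work in botany."):
    print("flag for human review")
else:
    print("accept into normal editorial flow")
```

The design point is that the detector doesn't auto-reject anything; it only routes suspicious submissions to human reviewers, which fits how wiki-style moderation already works.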

Making such technologies closed source leaves researchers in the dark; given the scaling-transformer hype, only a tiny fraction of the world's population has the financial means to train a SOTA transformer model.

After some consideration, I agree with you. Still can’t say I’m happy about it, but it’s a better option than closed source, for sure.

Presumably it'd take less manpower to review each article the AI has written (i.e., read the citations and make sure the article accurately describes the subject) than it would to write articles from scratch. I'd guess this holds even if the claims seem plausible and fact-checking requires a fairly detailed read-through of the sources.

That would be a much more boring task for most people than writing directly, and I'd have to imagine it would attract fewer volunteers.

I think on balance this is a good thing. More, better, fairer information on Wikipedia is awesome.
I do worry about different standards of editing for machine- and human-generated contributions, and I foresee some pain in edit wars and reversions until processes and norms evolve to handle mixed-source topics.

When the AI is capable of edit wars and defending its actions on the talk page, then Wikipedia will be truly doomed.

I was thinking more about the humans running the AI than about the AI itself having an advantage in the edit wars. If the project gets special privileges and bypasses the normal (and sometimes painful) oversight by human volunteers, it can end up inserting incorrect or low-value information that's easier to create than to improve.

Agreed; if the AI is passing the "Wikipedia contributor" Turing test, then it's all over anyway.

if the AI is passing the "Wikipedia contributor" Turing test, then it's all over anyway.

This is a very strong statement! Would you be willing to make a specific prediction conditional on an AI passing the 'Wikipedia contributor' Turing test? (Something like "if that happens, I predict x will happen within [y unit of time] with z probability.")

Not that there'll necessarily be anyone around to register it if you're correct, but still...

Actually, I'll back off from my statement instead. Having seen some of the low-quality discussions in edit wars, it's not actually a very high bar.

lol I feel you on that one! 🙃