This is a linkpost for some interesting discussions of info security norms in AI. I threw the post below together in 2 hours, just to have a bunch of quotes and links for people, and to have the context in one place for a discussion here on LW (makes it easier for common knowledge of what the commenters have and haven't seen). I didn't want to assume people follow any news on LW, so for folks who've read a lot about GPT-2 much of the post is skimmable.
In February, OpenAI wrote a blogpost announcing GPT-2:
We’ve trained a large-scale unsupervised language model which generates coherent paragraphs of text, achieves state-of-the-art performance on many language modeling benchmarks, and performs rudimentary reading comprehension, machine translation, question answering, and summarization—all without task-specific training.
This has been a very important release, not least due to it allowing fans to try (and fail) to write better endings to Game of Thrones. Gwern used GPT-2 to write poetry and anime. There have been many Medium posts on GPT-2, some very popular, and at least one Medium post on GPT-2 written by GPT-2. There is a subreddit where all users are copies of GPT-2, and they imitate other subreddits. It got too meta when the subreddit imitated another subreddit about people play-acting robots-pretending-to-be-humans. Stephen Woods has lots of examples including food recipes.
Here in our rationality community, we created user GPT-2 trained on the entire corpus of LessWrong comments and posts and released it onto the comment section on April 1st (a user who we warned and then banned). And Nostalgebraist created a tumblr trained on the entire writings of Eliezer Yudkowsky (sequences+HPMOR), where Nostalgebraist picked their favourites to include on the Tumblr.
There was also very interesting analysis on LessWrong and throughout the community. The post that made me think most on this subject is Sarah Constantin's Humans Who Are Not Concentrating Are Not General Intelligences. Also see SlateStarCodex's Do Neural Nets Dream of Electric Hobbits? and GPT-2 As Step Toward General Intelligence, plus my teammate jimrandomh's Two Small Experiments on GPT-2.
However, these were all using a nerfed version of GPT-2, which only had 117 million parameters, rather than the fully trained model with 1.5 billion parameters. (If you want to see examples of the full model, see the initial announcement posts for examples with unicorns and more.)
Reasoning for only releasing a nerfed GPT-2 and response
Due to our concerns about malicious applications of the technology, we are not releasing the trained model.
While the post includes some discussion of how specifically GPT-2 could be used maliciously (e.g. automating false clickbait news, automated spam, fake accounts) the key line is here.
This decision, as well as our discussion of it, is an experiment: while we are not sure that it is the right decision today, we believe that the AI community will eventually need to tackle the issue of publication norms in a thoughtful way in certain research areas.
Is this out of character for OpenAI - a surprise decision? Not really.
Nearly a year ago we wrote in the OpenAI Charter: “we expect that safety and security concerns will reduce our traditional publishing in the future, while increasing the importance of sharing safety, policy, and standards research,” and we see this current work as potentially representing the early beginnings of such concerns, which we expect may grow over time.
Other disciplines such as biotechnology and cybersecurity have long had active debates about responsible publication in cases with clear misuse potential, and we hope that our experiment will serve as a case study for more nuanced discussions of model and code release decisions in the AI community.
Public response to decision
There has been discussion in news+Twitter; see here for an overview of what some people in the field/industry have said, and what the news media has written. The main response that's been selected for by news+Twitter is that OpenAI did this primarily as a publicity stunt.
For a source with a different bias than the news and Twitter (which selects heavily for anger and calling out of norm violation), I've searched through all Medium articles on GPT-2 and copied here any 'most highlighted comments'. Most posts actually didn't have any, which I think means they haven't had many viewers. Here are the three I found, in chronological order.
OpenAI's GPT-2: The Model, The Hype, and the Controversy
As ML researchers, we are building things that affect people. Sooner or later, we’ll cross a line where our research can be used maliciously to do bad things. Should we just wait until that happens to decide how we handle research that can have negative side effects?
OpenAI GPT-2: Understanding Language Generation through Visualization
Soon, these deepfakes will become personal. So when your mom calls and says she needs $500 wired to the Cayman Islands, ask yourself: Is this really my mom, or is it a language-generating AI that acquired a voice skin of my mother from that Facebook video she posted 5 years ago?
GPT-2, Counting Consciousness and the Curious Hacker
If we have a system charged with detecting what we can and can’t trust, we aren’t removing our need to invest trust, we are only moving our trust from our own faculties to those of the machine.
I wrote this linkpost to discuss the last one. See below.
From the initial OpenAI announcement:
We are aware that some researchers have the technical capacity to reproduce and open source our results. We believe our release strategy limits the initial set of organizations who may choose to do this, and gives the AI community more time to have a discussion about the implications of such systems.
Since the release, one researcher has tried to reproduce and publish OpenAI's result. Google has a program called TensorFlow Research Cloud that gives loads of free compute to researchers affiliated with various universities, which let someone train an attempted copy of GPT-2 with 1.5 billion parameters. They say:
I really can’t express how grateful I am to Google and the TFRC team for their support in enabling this. They were incredibly gracious and open to allowing me access, without requiring any kind of rigorous, formal qualifications, applications or similar. I can really only hope they are happy with what I’ve made of what they gave me.
...I estimate I spent around 200 hours working on this project.... I ended up spending around 600–800€ on cloud resources for creating the dataset, testing the code and running the experiments
That said, it turned out that the copy did not match up in skill level, and is weaker even than the nerfed model OpenAI released. The person who built it says (1) they think they know how to fix it and (2) releasing it as-is may still be a helpful "shortcut" for others interested in building a GPT-2-level system; I don't have the technical knowledge to assess these claims, and am interested to hear from others who do.
During the period when people didn't know that the attempted copy was not successful, the person who made the copy wrote a long and interesting post explaining their decision to release the copy (with multiple links to LW posts). It discussed reasons why this specific technology may cause us to better grapple with the misinformation that we encounter on the internet. The author is someone who had a strong object level disagreement with the policy people at OpenAI, and had thought pretty carefully about it. However, it opened thus:
Disclaimer: I would like it to be made very clear that I am absolutely 100% open to the idea that I am wrong about anything in this post. I don’t only accept but explicitly request arguments that could convince me I am wrong on any of these issues. If you think I am wrong about anything here, and have an argument that might convince me, please get in touch and present your argument. I am happy to say “oops” and retract any opinions presented here and change my course of action.
As the saying goes: “When the facts change, I change my mind. What do you do?”
TL;DR: I’m a student that replicated OpenAI’s GPT2–1.5B. I plan on releasing it on the 1st of July. Before criticizing my decision to do so, please read my arguments below. If you still think I’m wrong, contact me on Twitter @NPCollapse or by email (firstname.lastname@example.org) and convince me. For code and technical details, see this post.
And they later said:
[B]e assured, I read every single comment, email and message I received, even if I wasn’t able to respond to all of them.
On reading the initial post, I was genuinely delighted to see such pro-social and cooperative behaviour from the person who believed OpenAI was wrong. They considered unilaterally overturning OpenAI's decision but instead chose to spend 11,000 words explaining their views and a month reading others' comments and talking to people. This, I thought, is how one avoids falling prey to Bostrom's unilateralist's curse.
Their next post The Hacker Learns to Trust was released 6 days later, where they decided not to release the model. Note that they did not substantially change their opinions on the object level decision.
I was presented with many arguments that have made me reevaluate and weaken my beliefs in some of the arguments I presented in my last essay. There were also many, maybe even a majority of, people in full support of me. Overall I still stand by most of what I said.
...I got to talk to Jack Clark, Alec Radford and Jeff Wu from OpenAI. We had a nice hour long discussion, where I explained where I was coming from, and they helped me to refine my beliefs. They didn’t come in accusing me in any way, they were very clear in saying they wanted to help me gain more important insight into the wider situation. For this open and respectful attitude I will always be grateful. Large entities like OpenAI often seem like behemoths to outsiders, but it was during this chat that it really hit me that they were people just like me, and curious hackers to boot as well.
I quickly began to understand nuances of the situation I wasn’t aware of. OpenAI had a lot more internal discussion than their blog post made it seem. And I found this reassuring. Jack in particular also gave me a lot of valuable information about the possible dangers of the model, and a bit of insight into the workings of governments and intelligence agencies.
After our discussion, I had a lot to think about. But I still wasn’t really convinced to not release.
They then talked with Buck from MIRI (author of this great post). Talking with Buck led them to their new view.
[T]his isn’t just about GPT2. What matters is that at some point in the future, someone will create something truly dangerous and there need to be commonly accepted safety norms before that happens.
We tend to live in an ever accelerating world. Both the industrial and academic R&D cycles have grown only faster over the decades. Everyone wants “the next big thing” as fast as possible. And with the way our culture is now, it can be hard to resist the pressures to adapt to this accelerating pace. Your career can depend on being the first to publish a result, as can your market share.
We as a community and society need to combat this trend, and create a healthy cultural environment that allows researchers to take their time. They shouldn’t have to fear repercussions or ridicule for delaying release. Postponing a release because of added evaluation should be the norm rather than the exception. We need to make it commonly accepted that we as a community respect others’ safety concerns and don’t penalize them for having such concerns, even if they ultimately turn out to be wrong. If we don’t do this, it will be a race to the bottom in terms of safety precautions.
We as a community of researchers and humans need to trust one another and respect when one of us has safety concerns. We need to extend understanding and offer help, rather than get caught in a race to the bottom. And this isn’t easy, because we’re curious hackers. Doing cool things fast is what we do.
The person also came to believe that the AI (and AI safety) community was much more helpful and cooperative than they'd expected.
The people at OpenAI and the wider AI community have been incredibly helpful, open and thoughtful in their responses to me. I owe to them everything I have learned. OpenAI reached out to me almost immediately to talk and they were nothing but respectful and understanding. The same applies to Buck Shlegeris from MIRI and many other thoughtful and open people, and I am truly thankful for their help.
I expected a hostile world of skepticism and competition, and there was some of that to be sure. But overall, the AI community was open in ways I did not anticipate. In my mind, I couldn’t imagine people from OpenAI, or MIRI, or anywhere else actually wanting to talk to me. But I found that was wrong.
So this is the first lesson: The world of AI is full of smart, good natured and open people that I shouldn’t be afraid of, and neither should you.
Overall, the copy turned out not to be strong enough to change the ability of malicious actors to automate spam/clickbait, but I am pretty happy with the public dialogue and process that occurred. It was a process whereby, in a genuinely dangerous situation, the AI world would not fall prey to Bostrom's unilateralist's curse. It's encouraging to see that process starting to happen in the field of ML.
I'm interested to know if anyone has any different takes, info to add, or broader thoughts on information-security norms.
Edited: Thanks to 9eB1 for pointing out how nerfed the copy was, I've edited the post to reflect that.
As is always the case, this person changed their mind because they were made to feel valued. The community treated what they'd done with respect (even though, fundamentally, they were unsuccessful and the actual release of the model would have had no impact on the world), and as a result they capitulated.
While I agree that this is an important factor when modelling people’s decision-making, I think there is some straightforward evidence that this was not the primary factor here.
Firstly, after the person spent an hour talking to friendly and helpful people from the high-status company, they did not change their decision, which is evidence against the most parsimonious of status-based motives. (Relatedly, there was not a small set of people the author promised to read feedback from, but literally 100% of respondents, which is over-and-above what would be useful for getting the attention of key people.)
And secondly, which is more persuasive for me though harder to communicate, I read the extensive reasons they gave for their decision, and they seemed clear and well-reasoned, and then the reasons against were important factors that are genuinely nuanced and hard to notice. It seemed to me more of a situation where someone actually improves their understanding of the world than one in which they were waiting for certain high-status-to-them people to give them attention. My sense is that writing that explains someone’s decisions and is wholly motivated by status makes less sense than these two posts did.
You might still be right and I might have missed something, or just not have a cynical enough prior. Though I do believe people do sometimes change their actions due to good reasoning about the world and not solely due to immediate status considerations, and I feel very skeptical of any lens on the world that can’t (“As is always the case”) register a positive result on the question “Did this person make their decision due to updating their world model rather than short-sighted status-grabbing?”.
Am interested to hear further thoughts of yours on the broader topic of modelling people’s decision making as primarily status based, if you have more things to add to the discussion.
The phenomenon I was pointing out wasn't exactly that the person's decision was made because of status. It was that a prerequisite for them changing their mind was that they were taken seriously and engaged with respectfully. That said, I do think that it's interesting to understand the way status plays into these events.
First, they started the essay with a personality-focused explanation:
To explain how this all happened, and what we can learn from it, I think it’s important to learn a little bit more about my personality and with what kind of attitude and world model I came into this situation.
I have a depressive/paranoid streak, and tend to assume the worst until proven otherwise. At the time I made my first twitter post, it seemed completely plausible in my mind that no one, OpenAI or otherwise, would care or even notice me. Or, even worse, that they would antagonize me.
The narrative that the author themselves is setting up is that they had irrational or emotional reasons for behaving the way they did, then they considered longer and changed their mind. They also specifically call out their perceived lack of status as an influencing factor.
If someone has an irrational, status-focused explanation for their own initial reasoning, and then we see high-status people providing them extensive validation, it doesn't mean that they changed their mind because of the high-status people, but it's suggestive. My real model is that they took those ideas extra seriously because the people were nice and high status.
Imagine a counterfactual world where they posted their model, and all of the responses they received were the same logical argument, but instead made on 4Chan and starting with "hey fuckhead, what are you trying to do, destroy the world?" My priors suggest that this person would have, out of spite, continued to release the model.
The gesture they are making here, not releasing the model, IS purely symbolic. We know the model is not as good as mini-GPT2. Nonetheless, it may be useful to people who aren't being supported by large corporate interests, either for learning or just for understanding ML better for real hackers. Since releasing the model is not a bona fide risk, part of not releasing it is so they can feel like they are part of history. Note the end where they talk about the precedent they are setting now by not releasing it.
I think the fact that the model doesn't actually work is an important aspect of this. Many hackers would have done it as a cool project and released it without pomp, but this person put together a long essay, explicitly touting the importance of what they'd done and the impact it would have on history. Then, it turned out the model did not work, which must have been very embarrassing. It is fairly reasonable to suggest that the person then took the action that made them feel the best about their legacy and status: writing an essay about why they were not releasing the model for good rationalist-approved reasons. It is not even necessarily the case that the person is aware that this is influencing the decision; this is fully an Elephant in the Brain situation.
When I read that essay, at least half of it is heavily-laden with status concerns and psychological motivations. But, to reiterate: though pro-social community norms left this person open to having their mind changed by argument, probably the arguments still had to be made.
How you feel about this should probably turn on questions like "Who has the status in this community to have their arguments taken seriously? Do I agree with them?" and "Is it good for only well-funded entities to have access to current state-of-the-art ML models?"
I agree with a lot of claims in your comment, and I think it's valuable to think through how status plays a role in many situations, including this.
There is an approach in your comments toward explaining someone's behaviour that I disagree with, though it may just be a question of emphasis. A few examples:
My real model is that they took those ideas extra seriously because the people were nice and high status.
...a prerequisite for them changing their mind was that they were taken seriously and engaged with respectfully
These seem to me definitely true and simultaneously not that important*.
When I read that essay, at least half of it is heavily-laden with status concerns and psychological motivations. But, to reiterate: though pro-social community norms left this person open to having their mind changed by argument, probably the arguments still had to be made. (emphasis added)
The word 'probably' in that sentence feels false to me. I feel somewhat analogous to hearing someone argue that a successful tech startup is 100s of people working together in a company, and that basically running a tech startup is about status and incentives, though "probably code still had to be written" to make it successful. They're both necessary.
More generally, there are two types of games going on. One we're allowed to talk about, and one we're not, or at least not very directly. And we have to coordinate on both levels to succeed. This generally warps how our words relate to reality, because we're also using those words to do things we're pretending to ourselves we're not doing, to let everyone express their preferences and coordinate in the silent games. These silent games have real and crucial implications for how well we can coordinate and where resources must be spent. But once you realise the silent games are being played, it isn't the right move to say that the silent games are the only games, or always the primary games.
Again, I agree that something in this reference class is likely happening. But, for example, the long essay was not only about increasing the perceived importance of the action. It was also a strongly pro-social and cooperative move to the broader AI community to allow counterarguments to be presented, which is what successfully happened. There are multiple motives here, and (I think) it's the case that the motive you point to was not the main one, even while it is a silent motive folks systematically avoid discussing.
*Actually I think that Connor in particular would've engaged with arguments even if they'd not been delivered respectfully, given that he responded substantively to many comments on Twitter/HackerNews/Medium, some of which were predominantly snark.
When Robin Hanson is interviewed about The Elephant in the Brain, he is often asked "Are you saying that status accounts for all of our behaviour?". His reply is that he+KevinSimler aren't arguing that the hidden motives are the only motive, but that they're a far more common motive than we give credit for in our normal discourse. Here's an example of him saying this kind of thing on the 80k podcast:
As we just said the example that, in education, your motive isn’t to learn the material, or when you go to the doctor, your motive isn’t to get well primarily, and the hidden motives are the actual motive. Now, how could I know what the hidden motives are, you might ask? The plan here, that’s where the book is … In each area, we identify the usual story, then we collect a set of puzzles that don’t make sense from the point of view of the usual story, strange empirical patterns, and then we offer an alternative motive that makes a lot more sense of those empirical patterns, and then we suggest that that is a stronger motive than the one we usually say.
Now, just to be clear, almost every area of human life is complicated, and there’s a lot of people with a lot of different details and so, of course, almost every possible motive shows up in almost every area of human life, so we can’t be talking about the only motive, and so the usual motive does actually apply sometimes. Actually, you could think of the analogy to the excuse that the dog ate my homework. It only works because sometimes dogs eat homework. We don’t say the dragon ate my homework. That wouldn’t fly, so the usual story is part of the story. It’s just a smaller part than we like to admit, and what we’re going to call the hidden motive, the real motive is a bigger part of the story, but it’s still not the only part.
it turned out the model did not work... It is fairly reasonable to suggest that the person then took the action that made them feel the best about their legacy and status
Reading this I realise I developed most of my attitudes toward the topic when I believed that the copy was full-strength, and only in writing the post did I find out that it wasn't - in fact it seems that it was weaker than the initial 117M version OpenAI released. You're right that this makes the 'release' option less exciting from the perspective of one's personal status, which (the status lens) would then predict taking whichever different action would give more personal status, and this is arguably one of those actions.
Just now I found this comment in the Medium comment section, where Connor agrees with you about it being symbolic, and mentions how this affected his thinking.
...I did admit failure as I linked to said failure in the very first paragraph, and I have no intentions of hiding that. In fact, after learning of my failure I was convinced I might as well release, since most safety issues were no longer a threat anyways (though there remains the possibility it could be used as a “warm start” to train a better model). So if anything, my failure encouraged me to dump it, apologize and let history take its course.
My decision not to release is mostly symbolic. I’m doing it to signal good faith cooperation. Even if I failed today, some day someone will succeed, and we should have a default of cooperation before that.
(Meta: Wow, Medium requires you to click twice to go down one step in a comment thread! Turns out there are like 20 comments on the OP.)
Yeah, this is quite important, the attempted copy was weaker than the nerfed model OpenAI initially released. Thanks for emphasising this 9eB1, I've updated my post in a few places accordingly.
The phenomenon I was pointing out wasn't exactly that the person's decision was made because of status. It was that a prerequisite for them changing their mind was that they were taken seriously and engaged with respectfully.
Yeah, respectful and serious engagement with people’s ideas, even when you’re on the opposite sides of policy/norm disputes, is very important.
On reading that I was genuinely delighted to see such pro-social and cooperative behaviour from the person who believed OpenAI was wrong.
I think the pro-social and cooperative thing to do was to email OpenAI privately rather than issuing a public ultimatum.
I’m imagining here something like a policy of emailing OpenAI and telling them your plan and offering them as much time to talk as possible, and saying that in a week you’ll publicly publish your reasoning too so that other people can respond + potentially change your mind. I also think it would’ve been quite reasonable to not expect any response from a big organisation like OpenAI, and to be doing it only out of courtesy.
It seems from above that talking to OpenAI didn’t change Connor’s mind, and that public discourse was very useful. I expect Buck would not have talked to him if he hadn’t done this publicly (I will ask Buck when I see him) (Added: Buck says this is true). Given the OP I don’t think it would’ve been able to resolve privately, and I think I am quite actively happy that it has resolved the way it has: Someone publicly deciding to not unilaterally break an important new norm, even while they strongly believe this particular application of the norm is redundant/unhelpful.
I’d be interested to know if you think that it would’ve been perfectly pro-social to give OpenAI a week’s heads-up and then write your reasoning publicly and read everyone else’s critiques (100% of random people from Hacker News and Twitter and longer chats with Buck). I have a sense that you wouldn’t but I’m not fully sure why.
I also think it would’ve been quite reasonable to not expect any response from a big organisation like OpenAI, and to be doing it only out of courtesy.
Yeah, that seems reasonable, but it doesn't seem like you could reasonably have 99% confidence in this.
It seems from above that talking to OpenAI didn’t change Connor’s mind, and that public discourse was very useful. I expect Buck would not have talked to him if he hadn’t done this publicly (I will ask Buck when I see him).
I agree with this, but it's ex-post reasoning, I don't think this was predictable with enough certainty ex-ante.
Given the OP I don’t think it would’ve been able to resolve privately, but if it had I think I’d be less happy than with what actually happened, which is someone publicly deciding to not unilaterally break an important new norm, even while they strongly believe this particular application of the norm is redundant/unhelpful.
It's always possible to publicly post after you've come to the decision privately. (Also, I'm really only talking about what should have been done ex-ante, not ex-post.)
That seems fine, and very close to what I would have gone with myself. Maybe I would have first emailed OpenAI, and if I hadn't gotten a response in 2-3 days, then said I would make it public if I didn't hear back in another 2-3 days. (This is all assuming I don't know anyone at OpenAI, to put myself in the author's position.)
If you want to build a norm, publicly visible use helps establish it.
As I mentioned above, it's always possible to publicly post after you've come to the decision privately.
If people choose whether to identify with you at your first public statement, switching tribes after that can carry along lurkers.
Agreed that this is a benefit of what actually happened, but I want to note that if you're banking on this ex ante, you're deciding not to cooperate with a group X because you want to publicly signal allegiance to group Y with the expectation that you will then switch to group X and take along some people from group Y.
This is deceptive, and it harms our ability to cooperate. It seems pretty obvious to me that we should not do that under normal circumstances.
(I really do only want to talk about what should be done ex ante, that seems like the only decision-relevant thing here.)
I was coming up with reasons that a nearsighted consequentialist (aka not worried about being manipulative) might use. That said, getting lurkers to identify with you, then gathering evidence that will sway you, and them, one way or the other, is a force multiplier on an asymmetric weapon pointed towards truth. You need only see the possibility of switching sides to use this. He was open about being open to be convinced. It's like preregistering a study.
You're right, it's too harsh to claim that this is deceptive. That does seem more reasonable. I still think it isn't worth it given the harm to your ability to coordinate.
I was coming up with reasons that a nearsighted consequentialist (aka not worried about being manipulative) might use.
Sorry, I thought you were defending the decision. I'm currently only interested in decision-relevant aspects of this, which as far as I can tell means "how the decision should be made ex-ante", so I'm not going to speculate on nearsighted-consequentialist-reasons.
Given that status seems to have been coupled to reproductive success for a very long time, it should not be surprising that evolution wired up humans to be status seekers. This wasn't recognized back in the mid 90s and I got in considerable trouble for claiming this to be a common motive.