The Hacker Learns to Trust

[-]9eB16y100

As is always the case, this person changed their mind because they were made to feel valued. The community treated what they'd done with respect (even though, fundamentally, they were unsuccessful and the actual release of the model would have had no impact on the world), and as a result they capitulated.

[-]Ben Pace6y200

While I agree that this is an important factor when modelling people’s decision-making, I think there is some straightforward evidence that this was not the primary factor here.

Firstly, after the person spent an hour talking to friendly and helpful people from the high-status company, they did not change their decision, which is evidence against the most parsimonious of status-based motives. (Relatedly, there was not a small set of people the author promised to read feedback from, but literally 100% of respondents, which is over-and-above what would be useful for getting the attention of key people.)

And secondly, which is more persuasive for me though harder to communicate, I read the extensive reasons they gave for their initial decision, and they seemed clear and well-reasoned, while the reasons against were important factors that are genuinely nuanced and hard to notice. It seemed to me more of a situation where someone actually improves their understanding of the world than one in which they were waiting for certain high-status-to-them people to give them attention. My sense is that writing which explains a decision wholly motivated by status would make less sense than these two posts did.

You might still be right and I might have missed something, or just not have a cynical enough prior. Though I do believe people do sometimes change their actions due to good reasoning about the world and not solely due to immediate status considerations, and I feel very skeptical of any lens on the world that can’t (“As is always the case”) register a positive result on the question “Did this person make their decision due to updating their world model rather than short-sighted status-grabbing?”.

Am interested to hear further thoughts of yours on the broader topic of modelling people’s decision making as primarily status based, if you have more things to add to the discussion.

[-]9eB16y*200

The phenomenon I was pointing out wasn't exactly that the person's decision was made because of status. It was that a prerequisite for them changing their mind was that they were taken seriously and engaged with respectfully. That said, I do think that it's interesting to understand the way status plays into these events.

First, they started the essay with a personality-focused explanation:

To explain how this all happened, and what we can learn from it, I think it’s important to learn a little bit more about my personality and with what kind of attitude and world model I came into this situation.

and

I have a depressive/paranoid streak, and tend to assume the worst until proven otherwise. At the time I made my first twitter post, it seemed completely plausible in my mind that no one, OpenAI or otherwise, would care or even notice me. Or, even worse, that they would antagonize me.

The narrative that the author themselves is setting up is that they had irrational or emotional reasons for behaving the way they did, then they considered longer and changed their mind. They also specifically call out their perceived lack of status as an influencing factor.

If someone has an irrational, status-focused explanation for their own initial reasoning, and then we see high-status people providing them extensive validation, it doesn't mean that they changed their mind because of the high-status people, but it's suggestive. My real model is that they took those ideas extra seriously because the people were nice and high status.

Imagine a counterfactual world where they posted their model, and all of the responses they received were the same logical argument, but instead made on 4Chan and starting with "hey fuckhead, what are you trying to do, destroy the world?" My priors suggest that this person would have, out of spite, continued to release the model.

The gesture they are making here, not releasing the model, IS purely symbolic. We know the model is not as good as mini-GPT2. Nonetheless, it may be useful to people who aren't being supported by large corporate interests, either for learning or just for understanding ML better for real hackers. Since releasing the model is not a bona fide risk, part of not releasing it is so they can feel like they are part of history. Note the end where they talk about the precedent they are setting now by not releasing it.

I think the fact that the model doesn't actually work is an important aspect of this. Many hackers would have done it as a cool project and released it without pomp, but this person put together a long essay, explicitly touting the importance of what they'd done and the impact it would have on history. Then, it turned out the model did not work, which must have been very embarrassing. It is fairly reasonable to suggest that the person then took the action that made them feel the best about their legacy and status: writing an essay about why they were not releasing the model for good rationalist approved reasons. It is not even necessarily the case that the person is aware that this is influencing the decision, this is a fully Elephant in the Brain situation.

When I read that essay, at least half of it is heavily-laden with status concerns and psychological motivations. But, to reiterate: though pro-social community norms left this person open to having their mind changed by argument, probably the arguments still had to be made.

How you feel about this should probably turn on questions like "Who has the status in this community to have their arguments taken seriously? Do I agree with them?" and "Is it good for only well-funded entities to have access to current state-of-the-art ML models?"

[-]Ben Pace6y*240

I agree with a lot of claims in your comment, and I think it's valuable to think through how status plays a role in many situations, including this.

There is an approach in your comments toward explaining someone's behaviour that I disagree with, though it may just be a question of emphasis. A few examples:

My real model is that they took those ideas extra seriously because the people were nice and high status.

...a prerequisite for them changing their mind was that they were taken seriously and engaged with respectfully

These seem to me definitely true and simultaneously not that important*.

When I read that essay, at least half of it is heavily-laden with status concerns and psychological motivations. But, to reiterate: though pro-social community norms left this person open to having their mind changed by argument, probably the arguments still had to be made. (emphasis added)

The word 'probably' in that sentence feels false to me. It feels somewhat analogous to hearing someone argue that a successful tech startup is hundreds of people working together in a company, and that basically running a tech startup is about status and incentives, though "probably code still had to be written" to make it successful. They're both necessary.

More generally, there are two types of games going on. One we're allowed to talk about, and one we're not, or at least not very directly. And we have to coordinate on both levels to succeed. This generally warps how our words relate to reality, because we're also using those words to do things we're pretending to ourselves we're not doing, to let everyone express their preferences and coordinate in the silent games. These silent games have real and crucial implications for how well we can coordinate and where resources must be spent. But once you realise the silent games are being played, it isn't the right move to say that the silent games are the only games, or always the primary games.

I think the fact that the model doesn't actually work is an important aspect of this. Many hackers would have done it as a cool project and released it without pomp, but this person put together a long essay, explicitly touting the importance of what they'd done and the impact it would have on history. Then, it turned out the model did not work, which must have been very embarrassing. It is fairly reasonable to suggest that the person then took the action that made them feel the best about their legacy and status: writing an essay about why they were not releasing the model for good rationalist approved reasons. It is not even necessarily the case that the person is aware that this is influencing the decision, this is a fully Elephant in the Brain situation.

Again, I agree that something in this reference class is likely happening. But, for example, the long essay was not only about increasing the perceived importance of the action. It was also a strongly pro-social and cooperative move to the broader AI community to allow counterarguments to be presented, which is what successfully happened. There are multiple motives here, and (I think) it's the case that the motive you point to was not the main one, even while it is a silent motive folks systematically avoid discussing.

*Actually I think that Connor in particular would've engaged with arguments even if they'd not been delivered respectfully, given that he responded substantively to many comments on Twitter/HackerNews/Medium, some of which were predominantly snark.

[-]Ben Pace6y70

When Robin Hanson is interviewed about The Elephant in the Brain, he is often asked "Are you saying that status accounts for all of our behaviour?". His reply is that he and Kevin Simler aren't arguing that the hidden motives are the only motive, but that they're a far more common motive than we give credit for in our normal discourse. Here's an example of him saying this kind of thing on the 80k podcast:

As we just said the example that, in education, your motive isn’t to learn the material, or when you go to the doctor, your motive isn’t to get well primarily, and the hidden motives are the actual motive. Now, how could I know what the hidden motives are, you might ask? The plan here, that’s where the book is … In each area, we identify the usual story, then we collect a set of puzzles that don’t make sense from the point of view of the usual story, strange empirical patterns, and then we offer an alternative motive that makes a lot more sense of those empirical patterns, and then we suggest that that is a stronger motive than the one we usually say.

Now, just to be clear, almost every area of human life is complicated, and there’s a lot of people with a lot of different details and so, of course, almost every possible motive shows up in almost every area of human life, so we can’t be talking about the only motive, and so the usual motive does actually apply sometimes. Actually, you could think of the analogy to the excuse that the dog ate my homework. It only works because sometimes dogs eat homework. We don’t say the dragon ate my homework. That wouldn’t fly, so the usual story is part of the story. It’s just a smaller part than we like to admit, and what we’re going to call the hidden motive, the real motive is a bigger part of the story, but it’s still not the only part.

[-]Ben Pace6y*80

it turned out the model did not work... It is fairly reasonable to suggest that the person then took the action that made them feel the best about their legacy and status

Reading this I realise I developed most of my attitudes toward the topic when I believed that the copy was full-strength, and only in writing the post did I find out that it wasn't - in fact it seems that it was weaker than the initial 117M version OpenAI released. You're right that this makes the 'release' option less exciting from the perspective of one's personal status, which (the status lens) would then predict taking whichever different action would give more personal status, and this is arguably one of those actions.

Just now I found this comment in the Medium comment section, where Connor agrees with you about it being symbolic, and mentions how this affected his thinking.

...I did admit failure as I linked to said failure in the very first paragraph, and I have no intentions of hiding that. In fact, after learning of my failure I was convinced I might as well release, since most safety issues were no longer a threat anyways (though there remains the possibility it could be used as a “warm start” to train a better model). So if anything, my failure encouraged me to dump it, apologize and let history take its course.

My decision not to release is mostly symbolic. I’m doing it to signal good faith cooperation. Even if I failed today, some day someone will succeed, and we should have a default of cooperation before that.

(Meta: Wow, Medium requires you to click twice to go down one step in a comment thread! Turns out there are like 20 comments on the OP.)

[-]Ben Pace6y60

Yeah, this is quite important: the attempted copy was weaker than the nerfed model OpenAI initially released. Thanks for emphasising this, 9eB1; I've updated my post in a few places accordingly.

[-]Ben Pace6y40

The phenomenon I was pointing out wasn't exactly that the person's decision was made because of status. It was that a prerequisite for them changing their mind was that they were taken seriously and engaged with respectfully.

Yeah, respectful and serious engagement with people’s ideas, even when you’re on the opposite sides of policy/norm disputes, is very important.

[-]Rohin Shah6y60

On reading that I was genuinely delighted to see such pro-social and cooperative behaviour from the person who believed OpenAI was wrong.

I think the pro-social and cooperative thing to do was to email OpenAI privately rather than issuing a public ultimatum.

[-]Ben Pace6y*40

I’m imagining here something like a policy of emailing OpenAI and telling them your plan and offering them as much time to talk as possible, and saying that in a week you’ll publicly publish your reasoning too so that other people can respond + potentially change your mind. I also think it would’ve been quite reasonable to not expect any response from a big organisation like OpenAI, and to be doing it only out of courtesy.

It seems from above that talking to OpenAI didn’t change Connor’s mind, and that public discourse was very useful. I expect Buck would not have talked to him if he hadn’t done this publicly (I will ask Buck when I see him) (Added: Buck says this is true). Given the OP I don’t think it would’ve been able to resolve privately, and I think I am quite actively happy that it has resolved the way it has: Someone publicly deciding to not unilaterally break an important new norm, even while they strongly believe this particular application of the norm is redundant/unhelpful.

I’d be interested to know if you think that it would’ve been perfectly pro-social to give OpenAI a week’s heads-up and then write your reasoning publicly and read everyone else’s critiques (100% of random people from Hacker News and Twitter and longer chats with Buck). I have a sense that you wouldn’t but I’m not fully sure why.

[-]Rohin Shah6y20

I also think it would’ve been quite reasonable to not expect any response from a big organisation like OpenAI, and to be doing it only out of courtesy.

Yeah, that seems reasonable, but it doesn't seem like you could reasonably have 99% confidence in this.

It seems from above that talking to OpenAI didn’t change Connor’s mind, and that public discourse was very useful. I expect Buck would not have talked to him if he hadn’t done this publicly (I will ask Buck when I see him).

I agree with this, but it's ex-post reasoning; I don't think this was predictable with enough certainty ex-ante.

Given the OP I don’t think it would’ve been able to resolve privately, but if it had I think I’d be less happy than with what actually happened, which is someone publicly deciding to not unilaterally break an important new norm, even while they strongly believe this particular application of the norm is redundant/unhelpful.

It's always possible to publicly post after you've come to the decision privately. (Also, I'm really only talking about what should have been done ex-ante, not ex-post.)

I’d be interested to know if you think that it would’ve been perfectly pro-social to give OpenAI a week’s heads-up and then write your reasoning publicly and read everyone else’s critiques (100% of random people from Hacker News and Twitter and longer chats with Buck). I have a sense that you wouldn’t but I’m not fully sure why.

That seems fine, and very close to what I would have gone with myself. Maybe I would have first emailed OpenAI, and if I hadn't gotten a response in 2-3 days, then said I would make it public if I didn't hear back in another 2-3 days. (This is all assuming I don't know anyone at OpenAI, to put myself in the author's position.)

[-]Gurkenglas6y20

If you want to build a norm, publicly visible use helps establish it.

[-]Rohin Shah6y40

As I mentioned above, it's always possible to publicly post after you've come to the decision privately.

[-]Gurkenglas6y10

If people choose whether to identify with you at your first public statement, switching tribes after that can carry along lurkers.

[-]Rohin Shah6y40

Agreed that this is a benefit of what actually happened, but I want to note that if you're banking on this ex ante, you're deciding not to cooperate with a group X because you want to publicly signal allegiance to group Y with the expectation that you will then switch to group X and take along some people from group Y.

This is deceptive, and it harms our ability to cooperate. It seems pretty obvious to me that we should not do that under normal circumstances.

(I really do only want to talk about what should be done ex ante; that seems like the only decision-relevant thing here.)

[-]Gurkenglas6y50

I was coming up with reasons that a nearsighted consequentialist (aka not worried about being manipulative) might use. That said, getting lurkers to identify with you, then gathering evidence that will sway you, and them, one way or the other, is a force multiplier on an asymmetric weapon pointed towards truth. You need only see the possibility of switching sides to use this. He was open about being open to being convinced. It's like preregistering a study.

[-]Rohin Shah6y40

You're right, it's too harsh to claim that this is deceptive. That does seem more reasonable. I still think it isn't worth it given the harm to your ability to coordinate.

I was coming up with reasons that a nearsighted consequentialist (aka not worried about being manipulative) might use.

Sorry, I thought you were defending the decision. I'm currently only interested in decision-relevant aspects of this, which as far as I can tell means "how the decision should be made ex-ante", so I'm not going to speculate on nearsighted-consequentialist-reasons.

[-]hkhenson6y10

Given that status seems to have been coupled to reproductive success for a very long time, it should not be surprising that evolution wired up humans to be status seekers. This wasn't recognized back in the mid 90s and I got in considerable trouble for claiming this to be a common motive.
