Meta’s frontier AI models are fundamentally unsafe. Since Meta AI has released the model weights publicly, any safety measures can be removed. Before it releases even more advanced models – which will have more dangerous capabilities – we call on Meta to take responsible release seriously and stop irreversible proliferation. Join us for a peaceful protest at Meta’s office in San Francisco at 250 Howard St at 4pm PT.

RSVP on Facebook[1] or through this form.

Let’s send a message to Meta:

  • Stop irreversible proliferation of model weights. Meta’s models are not safe if anyone can remove the safety measures.
  • Take AI risks seriously.
  • Take responsibility for harms caused by your AIs.

All you need to bring is yourself, and a sign if you want to make your own. I will lead a trip to SF from Berkeley, but anyone can join at the location. We will have a sign-making party before the demonstration; stay tuned for details. We'll go out for drinks afterward 🙂

  1. ^ I like the irony.


Given that a high-stakes, all-out arms race for frontier foundation AGI models is heating up between the major powers, and Meta's public models are trailing, it doesn't seem clear at all that open-sourcing them is net safety-negative. One could argue the benefits of having wide access for safety research, along with tilting the world towards multi-polar scenarios, outweigh the (more minimal) risks.

I agree it is not clear whether it is net positive or negative that they open-source the models. Here are the main arguments for and against that I could think of:


Pros of open-sourcing models

- Gives AI alignment researchers access to smarter models to experiment on

- Decreases income for leading AI labs such as OpenAI and Google, since people can use open source models instead.



Cons of open-sourcing models

- Capability researchers can do better experiments on how to improve capabilities

- The open-source community could develop code to train and run inference on models faster, indirectly enhancing capability development.

- Better open source models could lead to more AI startups succeeding, which might lead to more AI research funding. This seems like a stretch to me.

- If Meta shared any meaningful improvements in how to train models, that would of course directly contribute to other labs' capabilities, but Llama doesn't seem that innovative to me. I'm happy to be corrected if I am wrong on this point.

^ All good points, but I think the biggest thing here is the policy of sharing weights continuing into the future with more powerful models.

Since Meta AI has released the model weights publicly, any safety measures can be removed.

They release base models as well, in addition to the tuned models. Base models have no safety measures, so talking about removal of safety measures (from the tuned models) sounds misleading. (Zvi also used this perplexing framing in a couple of recent posts.)

I actually did not realize they released the base model. There's research showing how easy it is to remove the safety fine-tuning, which is where I got the framing (and probably Zvi did too), but perhaps that was more of a proof of concept than the main concern in this case.

The concept of being able to remove fine-tuning is pretty important for safety, but I will change my wording where possible to also mention that releasing the base model without any safety fine-tuning is bad. I just requested to download Llama 2, so I'll see what options they give.

Here's my comment with references where I attempted to correct Zvi's framing. He probably didn't notice it, since he used the framing again a couple of weeks later.

To be fair, the tuned models are arguably the most dangerous models, since they are more easily guided towards specific objectives. The fact that they release tuned models, on which the safety measures can be removed, is particularly egregious.

Though your overall point is a valid one.

I want to make a hopefully-sandboxed comment:

This seems kind of cringe.

I don't think of myself as someone who thinks in terms of cringe, much, but apparently I have this reaction. I don't particularly endorse it, or any implications of it, but it's there. Maybe it means I have some intuition that the thing is bad to do, or maybe it means I expect it to have some weird unexpected social effect. Maybe it will be mocked in a way that shows that it's not actually a good symbolic social move. Maybe the intuition is something like: protests are the sort of thing that the weak side does, the side that will be mainly ignored, or perhaps mocked, and so making a protest puts stop-AI-ists in a weak social position. (I continue to not endorse any direct implication here, such as "this is bad to do for this reason".) Why would someone in power and reasoning in terms of power, like LeCun, take the stop-AI-ists seriously, when they've basically publicly admitted to not having social power, i.e. to being losers? Someone in power can't gain more power by cooperating with losers, and does not need to heed the demands of losers because they can't threaten them in the arena of power. (I do endorse trying to be aware of this sort of dynamic. I hope to see some version of the protest that is good, and/or some version of updating on the results or non-results.)

[ETA: and to be extra clear, I definitely don't endorse making decisions from within the frame of social symbolic moves and power dynamics and conflict. That sort of situation is something that we are to some extent always in, and sometimes forced to be in and sometimes want to be in, but that thinking-frame is never something that we are forced to or should want to restrict our thinking to.]


I had a similar gut reaction.  When I tried to run down my brain's root causes of the view, this is what it came out as:

There are two kinds of problem you can encounter in politics.  One type is where many people disagree with you on an issue.  The other type is where almost everyone agrees with you on the issue, but most people are not paying attention to it.

Protests as a strategy are valuable in the second case, but worthless or counterproductive in the first case.

If you are being knifed in an alleyway, your best strategy is to make as much noise as possible.  Your goal is to attract people to help you.  You don't really need to worry that your yelling might annoy people.  There isn't a meaningful risk that people will come by, see the situation, and then decide that they want to side with the guy knifing you.  If lots of people start looking at the situation, you win.  Your loss condition is 'no-one pays attention', not 'people pay attention but take the opposite side'.

And if you are in an isomorphic sort of political situation, where someone is doing something that basically everyone agrees is genuinely outrageous but nobody is really paying attention to, protests are a valuable strategy.  They will annoy people, but they will draw attention to this issue where you are uncontroversially right in a way that people will immediately notice.

But if you are in an argument where substantial numbers of people disagree with you, protests are a much less enticing strategy, and one that often seems to boil down to saying 'lots of people disagree with me, but I'm louder and more annoying than them, so you should listen to me'.

And 'AI development is a major danger' is very much the 'disagreement' kind of issue at the moment.  There is not broad social consensus that AI development is dangerous such that 'get lots of people to look at what Meta is doing' will lead to good outcomes.

I have no actual expertise in politics and don't actually know this to be true, but it seems to be what my subconscious thinks on this issue.

I think that, in particular, protesting Meta releasing their models to the public is a lot less likely to go well than protesting, say, OpenAI developing their models. Releasing models to the public seems virtuous on its face, both to the general public and to many technologists. Protesting that will draw attention to that choice specifically, and so will tend to paint the developers of more advanced models in a comparatively better light and their opponents in a comparatively worse light.

I agree with your assessment of the situation a lot, but I disagree that there is all that much controversy about this issue in the broader public. There is a lot of controversy on LessWrong, and in tech, but the public as a whole is in favor of slowing down and regulating AI development. (Also, other AI companies think sharing weights is really irresponsible, and there are anti-competitive issues with Llama 2's ToS, which is why it isn't actually open source.) https://theaipi.org/poll-shows-overwhelming-concern-about-risks-from-ai-as-new-institute-launches-to-understand-public-opinion-and-advocate-for-responsible-ai-policies/

The public doesn't understand the risks of sharing model weights, so getting media attention on this issue will be helpful.

A lot of things that are cringe are specifically that way because they violate a socially-maintained taboo. Overcoming the taboos against these sorts of actions and these sorts of topics is precisely what we should be doing.

The fact that it is cringey is exactly the reason I am going to participate.

What if there's a taboo against being able to pick and choose which taboos to "overcome"?

This seems likely to be the case; otherwise the taboo on incest, for example, would have disappeared long ago in the US, since there's easily a hardcore group, almost certainly >0.1% of the US population, who would be just as interested in 'overcoming' that taboo.

Particularly in the rationalist community, it seems like protesting is seen as a very outgroup thing to do. But why should that be? Good on you for expanding your comfort zone; hope to see you there :)

Seems very plausible to me.

Took me a while to figure out where the quoted line is in the post. Now I realize that you were the one cringing.

FWIW I plan to show up unless I have something unusually important to do that day.

I just tried to change it from being a quote to being in a box. But apparently you need a package to put a box around verbatim text in LaTeX. https://tex.stackexchange.com/questions/6260/how-to-draw-box-around-text-that-contains-a-verbatim-block

So, feature suggestion: boxes. Or LaTeX packages.
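
(For reference, outside the site editor this is solvable in plain LaTeX: the fancyvrb package's Verbatim environment can draw its own frame. A minimal sketch, assuming fancyvrb is acceptable for your use case:)

```latex
\documentclass{article}
\usepackage{fancyvrb} % provides a Verbatim environment with framing options

\begin{document}
% frame=single draws a box around the verbatim block
\begin{Verbatim}[frame=single]
Since Meta AI has released the model weights publicly,
any safety measures can be removed.
\end{Verbatim}
\end{document}
```

(The linked Stack Exchange question discusses further options using other packages.)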

I commend your introspection on this.

In my sordid past I did plenty of "finding the three people for nuanced logical mind-changing discussions amidst dozens of 'hey hey ho ho, outgroup has got to go'", so I'll do the same here (if I'm in town), but selection effects seem deeply worrying. (For example, you could go down to the soup kitchen or punk music venue and recruit all the young volunteers who are constantly sneering about how gentrifying techbros are evil and can't coordinate on whether their "Unabomber is actually based" argument is ironic or unironic, but you oughtn't. The fact that this is even a question, that if you have a "mass movement" theory of change you're constantly tempted to lower your standards in this way, is so intrinsically risky that no one should be comfortable that ML safety or alignment is resorting to this sort of thing.)

This strikes me as the kind of political thinking I think you’re trying to avoid. Contempt is not good for thought. Advocacy is not the only way to be tempted to lower your epistemic standards. I think you’re doing it right now when you other me or this type of intervention.

This seems kinda fair; I'd like to clarify: I largely trust the first few dozen people. I just expect that, depending on how growth/acquisition is done, if there are more than a couple instances of protests we'll have to deal with all the values diversity underlying the different reasons for joining in. This subject seems unusually fraught with potential to generate conflationary-alliance (https://www.lesswrong.com/s/6YHHWqmQ7x6vf4s5C) sorts of things.

Overall I didn't mean to other you. In fact, I never said this publicly, but a couple months ago there was a related post of yours that got me saying "yeah, we're lucky Holly is on this / she seems better suited than most would be to navigate this", cuz I've been consuming your essays for years. I also did not mean to insinuate that you hadn't thought it through; I meant to signal "here's a random guy who cares about this consideration", just as an outside vote of "hope this doesn't get triaged out". I basically assumed you had threat-modeled interactions with different strains of populism.

Yeah, I’ve been weighing a lot whether big tent approaches are something I can pull off at this stage or whether I should stick to “Pause AI”. The Meta protest is kind of an experiment in that regard and it has already been harder than I expected to get the message about irreversible proliferation across well. Pause is sort of automatically a big tent because it would address all AI harms. People can be very aligned on Pause as a policy without having the same motivations. Not releasing model weights is more of a one-off issue and requires a lot of inferential distance crossing even with knowledgeable people. So I’ll probably keep the next several events focused on Pause, a message much better suited to advocacy.

I'm confused about the game theory of this kind of protest. If protests don't work, fine, no harm done either way. But if they do work, what's to stop the "do publicize this!" crowd (accelerationists, open source AI people, etc) from protesting on their own? Also, I have no idea about the relative numbers, but what if they could protest with 10x the number of people?


I think the main thing stopping the accelerationists and open source enthusiasts from protesting with 10x as many people is that, whether for good reasons or not, there is much more opposition to AI progress and proliferation than support among the general public. (Admittedly this is probably less true in the Bay Area, but I would be surprised if it was even close to parity there and very surprised if it were 10x.)

Thanks, that's very helpful context. In principle, I wouldn't put too much stock in the specific numbers of a single poll, since those results depend too much on specific wording etc. But the trend in this poll is consistent enough over all questions that I'd be surprised if the questions could be massaged to get the opposite results, let alone ones 10x in favor of the accelerationist side.

(That said, I didn't like the long multi-paragraph questions further down in the poll. I felt like many were phrased to favor the cautiousness side somewhat, which biases the corresponding answers. Fortunately there were also plenty of short questions without this problem.)

In principle, I wouldn't put too much stock in the specific numbers of a single poll, since those results depend too much on specific wording etc.

I believe this has been replicated consistently across many polls. For the results to change, reality (in the sense of popular opinion) likely has to change, rather than polling techniques.

On the other hand, popular opinion changing isn't that unlikely, as it's not exactly something that either voters or the elites have thought much about, and (fortunately) this has not yet hewn along partisan lines.

One very common pattern: most people oppose a technology when it's new and unfamiliar; then, once it's been established for a little while and doesn't seem so strange, most people think it's great.

Yeah, I’m afraid of this happening with AI even as the danger becomes clearer. It’s one reason we’re in a really important window for setting policy.

This seems to me like a second-order correction which is unlikely to change the sign of the outcome.

In the case of issues where one "pulls sideways", politically speaking, I also expect indirect effects to be comparatively unimportant. But in political zero-ish-sum conflicts, I'm more apprehensive of indirect effects and arguments.

Change log: I removed the point about Meta inaccurately calling itself "open source" because it was confusing.