Zach Stein-Perlman

AI forecasting & strategy at AI Impacts. Blog: Not Optional.

My best LW posts are Framing AI strategy and Slowing AI: Foundations. My most useful LW posts are probably AI policy ideas: Reading list and Ideas for AI labs: Reading list.

Sequences

Slowing AI

Wiki Contributions


Comments

I am excited about this. I've also recently been interested in ideas like nudging researchers to write 1–5-page research agendas, then collecting them and advertising the collection.

Possible formats:

  • A huge google doc (maybe based on this post); anyone can comment; there are one or more maintainers; maintainers approve ~all suggestions from researchers about their own research topics and consider suggestions from random people.
  • A directory of google docs on particular agendas; each individual google doc is owned by a relevant researcher, who is responsible for maintaining it; some maintainer-of-the-whole-project occasionally nudges researchers to update their docs and reassigns a topic to someone else if necessary. Random people can make suggestions too.
  • (Alex, I think we can do much better than the best textbooks format in terms of organization, readability, and keeping up to date.)

I am interested in helping make something like this happen. Or if it doesn't happen soon I might try to do it (but I'm not taking responsibility for making this happen). Very interested in suggestions.

(One particular kind-of-suggestion: is there a taxonomy/tree of alignment research directions you like, other than the one in this post? (Note to self: taxonomies have to focus on either methodology or theory of change... probably organize by theory of change and don't hesitate to point to the same directions/methodologies/artifacts in multiple places.))

It's "a unified methodology" but I claim it has two very different uses: (1) determining whether a model is safe (in general or within particular scaffolding) and (2) directly making deployment safer. Or (1) model evals and (2) inference-time safety techniques.

Thanks!

I think there's another agenda: make untrusted models safe but useful by putting them in scaffolding/bureaucracy (filters, classifiers, LMs, humans, etc.) such that at inference time, takeover attempts are less likely to succeed and more likely to be caught; see the sketch later in this comment. See Untrusted smart models and trusted dumb models (Shlegeris 2023). Other relevant work:
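As a concrete illustration of the kind of scaffolding meant here: below is a minimal Python sketch in which an untrusted model generates an answer, a trusted (weaker) monitor scores it for suspicion, and the pipeline decides whether to use, audit, or block the output. All names and thresholds (`untrusted_model.generate`, `trusted_monitor.score_suspicion`, etc.) are illustrative assumptions, not anyone's actual implementation.

```python
# Illustrative trusted-monitoring scaffold; interfaces and thresholds are
# hypothetical, not any lab's real code.
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Decision:
    action: str       # "allow", "escalate", or "block"
    suspicion: float  # monitor's suspicion score in [0, 1]


def run_with_trusted_monitoring(
    task: str,
    untrusted_model,            # capable model we don't fully trust
    trusted_monitor,            # weaker model we do trust, used as a classifier
    escalate_threshold: float = 0.3,
    block_threshold: float = 0.8,
) -> Tuple[Optional[str], Decision]:
    """Generate with the untrusted model, then gate its output on a
    suspicion score from the trusted monitor before the output is used."""
    candidate = untrusted_model.generate(task)
    suspicion = trusted_monitor.score_suspicion(task, candidate)

    if suspicion >= block_threshold:
        # Looks like a possible takeover/sabotage attempt: discard and flag.
        return None, Decision("block", suspicion)
    if suspicion >= escalate_threshold:
        # Ambiguous: hold the output for human (or more expensive) audit.
        return candidate, Decision("escalate", suspicion)
    return candidate, Decision("allow", suspicion)
```

Most of the substance of the agenda is in what the monitor is, how the thresholds are set given a limited auditing budget, and what "escalate" routes to; the sketch is just the skeleton.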

Update: Greg Brockman quit.

Update: Sam and Greg say:

Sam and I are shocked and saddened by what the board did today.

Let us first say thank you to all the incredible people who we have worked with at OpenAI, our customers, our investors, and all of those who have been reaching out.

We too are still trying to figure out exactly what happened. Here is what we know:

- Last night, Sam got a text from Ilya asking to talk at noon Friday. Sam joined a Google Meet and the whole board, except Greg, was there. Ilya told Sam he was being fired and that the news was going out very soon.

- At 12:19pm, Greg got a text from Ilya asking for a quick call. At 12:23pm, Ilya sent a Google Meet link. Greg was told that he was being removed from the board (but was vital to the company and would retain his role) and that Sam had been fired. Around the same time, OpenAI published a blog post.

- As far as we know, the management team was made aware of this shortly after, other than Mira who found out the night prior.

The outpouring of support has been really nice; thank you, but please don’t spend any time being concerned. We will be fine. Greater things coming soon.

Update: three more resignations, including Jakub Pachocki's.

Update:

Sam Altman's firing as OpenAI CEO was not the result of "malfeasance or anything related to our financial, business, safety, or security/privacy practices" but rather a "breakdown in communications between Sam Altman and the board," per an internal memo from chief operating officer Brad Lightcap seen by Axios.

Update: Sam is planning to launch something (no details yet).

Update: Sam may return as OpenAI CEO.

Update: Tigris.

Update: talks with Sam and the board.

Update: Mira wants to hire Sam and Greg in some capacity; board still looking for a permanent CEO.

Update: Emmett Shear is interim CEO; Sam won't return.

Update: lots more resignations (according to an insider).

Update: Sam and Greg leading a new lab in Microsoft.

Update: total chaos.

Has anyone collected their public statements on various AI x-risk topics anywhere?

A bit, not shareable.

Helen is an AI safety person. Tasha is on the Effective Ventures board. Ilya leads superalignment. Adam signed the CAIS statement.

Thanks!

automating the world economy will take longer

I'm curious what fraction-of-2023-tasks-automatable and maybe fraction-of-world-economy-automated you think will occur at e.g. overpower time, and the median year for that. (I sometimes notice people assuming 99%-automatability occurs before all the humans are dead, without realizing they're assuming anything.)

@Daniel Kokotajlo it looks like you expect 1000x-energy 4 years after 99%-automation. I thought we get fast takeoff, all humans die, and 99% automation at around the same time (but probably in that order) and then get massive improvements in technology and massive increases in energy use soon thereafter. What takes 4 years?

(I don't think the part after fast takeoff or all humans dying is decision-relevant, but maybe resolving my confusion about this part of your model would help illuminate other confusions too.)

So why doesn't one of those thousand people run for president and win? (This is a rhetorical question; I know the answer.)

The answer is that there's a coordination problem.

It occurs to me that maybe these things are related. Maybe in a world of monarchies where the dynasty of so-and-so has ruled for generations, supporting someone with zero royal blood is like supporting a third-party candidate in the USA.

Wait, what is it that gave monarchic dynasties momentum, in your view?

In the future, sharing weights will enable misuse. For now, the main effect of sharing weights is boosting research, both capabilities and safety (e.g., the Llama releases definitely did this). The sign of that research-boosting currently seems negative to me, but there's lots of reasonable disagreement.

fwiw my guess is that OP didn't ask its grantees to do open-source LLM biorisk work at all; I think its research grantees generally have lots of freedom.

(I've worked for an OP-funded research org for 1.5 years. I don't think I've ever heard of OP asking us to work on anything specific, nor of us working on something because we thought OP would like it. Sometimes we receive restricted, project-specific grants, but I think those projects were initiated by us. Oh, one exception: Holden's standards-case-studies project.)
