Zach Stein-Perlman

AI strategy & governance. Blog: Not Optional.

Sequences

Slowing AI

Wiki Contributions


Comments

Constellation (which I think has some important FHI-like virtues, although it makes different tradeoffs and misses on others)

What is Constellation missing or what should it do? (Especially if you haven't already told the Constellation team this.)

Harry let himself be pulled, but as Hermione dragged him away, he said, raising his voice even louder, "It is entirely possible that in a thousand years, the fact that FHI was at Oxford will be the only reason anyone remembers Oxford!"

Yes, but possibly the lab has its own private scaffolding that is better for its model than any other existing scaffolding, perhaps because it trained the model to use that specific scaffolding, and it can initially withhold that scaffolding from users.

(Maybe it’s impossible to give API access to scaffolding and keep the scaffolding private? Idk.)

Edit: Plus what David says.

Suppose you can take an action that decreases net P(everyone dying) but increases P(you yourself kill everyone), and leaves all else equal. I claim you should take it; everyone is better off if you take it.

I deny "deontological injunctions." I want you and everyone to take the actions that lead to the best outcomes, not that keep your own hands clean. I'm puzzled by your expectation that I'd endorse "deontological injunctions."

This situation seems identical to the trolley problem in the relevant ways. I think you should avoid letting people die, not just avoid killing people.

[Note: I roughly endorse heuristics like if you're contemplating crazy-sounding actions for strange-sounding reasons, you should suspect that you're confused about your situation or the effects of your actions, and you should be more cautious than your naive calculations suggest. But that's very different from deontology.]

I guess I'm more willing to treat Anthropic's marketing as not-representing-Anthropic.

Like, when OpenAI marketing says "GPT-4 is our most aligned model yet!" you could say this shows that OpenAI deeply misunderstands alignment, but I tend to ignore it, mostly even when Sam Altman says it himself.

[Edit after habryka's reply: my weak independent impression is that often the marketing people say stuff that the leadership and most technical staff disagree with, and if you use marketing-speak to substantially predict what-leadership-and-staff-believe you'll make worse predictions.]

Thanks.

I guess I'm more willing to treat Anthropic's marketing as not-representing-Anthropic. Shrug. [Edit: like, maybe it's consistent-with-being-a-good-guy-and-basically-honest to exaggerate your product in a similar way to everyone else. (You risk the downsides of creating hype but that's a different issue than the integrity thing.)]

It is disappointing that Anthropic hasn't clarified its commitments after the post-launch confusion, one way or the other.

Sorry for using the poll to support a different proposition. Edited.

To make sure I understand your position (and Ben's):

  1. Dario committed to Dustin that Anthropic wouldn't "meaningfully advance the frontier" (according to Dustin)
  2. Anthropic senior staff privately gave AI safety people the impression that Anthropic would stay behind/at the frontier (although nobody has quotes)
  3. Claude 3 Opus meaningfully advanced the frontier? Or it only slightly advanced the frontier, but Anthropic marketed it as a substantial advance, so they're being similarly low-integrity?

...I don't think Anthropic violated its deployment commitments. I mostly believe y'all about 2 (I didn't know about 2 until people asserted it right after the Claude 3 release, but I haven't been around the community, much less well-connected in it, for long), but that feels like an honest miscommunication to me. If I'm missing "past Anthropic commitments," please point to them.

Most of us agree with you that deploying Claude 3 was reasonable, although I for one disagree with your reasoning. The criticism was mostly about the release being (allegedly) inconsistent with Anthropic's past statements/commitments on releases.

[Edit: the link shows that most of us don't think deploying Claude 3 increased AI risk, not that most of us think deploying Claude 3 was reasonable.]

"turning out to be right" is CAIS's strength

This is CAIP, not CAIS; CAIP doesn't really have a track record yet.
